
[ML] Trained Model: Fix start deployment with ML autoscaling and 0 active nodes #201256

Merged: 5 commits merged into elastic:main from ml-fix-autoscaling-check on Nov 26, 2024

Conversation

Contributor

@darnautov darnautov commented Nov 21, 2024

Summary

During my testing, I used the current user with all required privileges but failed to notice that, after switching to the internal `kibana_system` user, it lacked the `manage_autoscaling` privilege required for the `GET /_autoscaling/policy` API.

As a result, the `isMlAutoscalingEnabled` flag, which we rely on in the Start Deployment modal, was always set to `false`. This caused a bug in scenarios with zero active ML nodes, where falling back to deriving available processors from ML limits was not possible.
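For illustration, here is a minimal TypeScript sketch of how the flag could be derived once the privilege problem is accounted for: try the autoscaling policy as the current user and fall back to the lazy ML node count (the `getLazyMlNodeCount` helper referenced in the review below). The policy name, types, and signatures here are assumptions, not the exact PR code.

```ts
import type { ElasticsearchClient } from '@kbn/core/server';

// Hypothetical declaration for the helper referenced in this PR; the real signature may differ.
declare function getLazyMlNodeCount(client: ElasticsearchClient): Promise<number>;

export async function getIsMlAutoscalingEnabled(client: ElasticsearchClient): Promise<boolean> {
  try {
    // Requires the manage_autoscaling privilege, so this must run as the current user,
    // not as the internal kibana_system user. The 'ml' policy name is an assumption.
    const policy = await client.autoscaling.getAutoscalingPolicy({ name: 'ml' });
    if (policy) return true;
  } catch (e) {
    // Policy missing or insufficient privileges: fall through to the lazy node check.
  }
  // A non-zero lazy ML node count means ML nodes can still be added on demand.
  return (await getLazyMlNodeCount(client)) > 0;
}
```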

You can check the created deployment; it correctly identifies ML autoscaling:

(screenshot: deployment details showing ML autoscaling correctly detected)

Also fixes restoring vCPU levels from the API deployment params.
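As a rough illustration of that restore step, the vCPU level can be derived from the number of allocations and threads per allocation returned by the API. The thresholds below are placeholders, not the modal's actual boundaries.

```ts
type VCpuLevel = 'low' | 'medium' | 'high';

// Hypothetical reverse mapping from the API deployment params to the modal's vCPU level.
// Boundary values are placeholders for illustration only.
function restoreVCpuLevel(numberOfAllocations: number, threadsPerAllocation: number): VCpuLevel {
  const vCpus = numberOfAllocations * threadsPerAllocation;
  if (vCpus <= 2) return 'low'; // placeholder boundary
  if (vCpus <= 16) return 'medium'; // placeholder boundary
  return 'high';
}
```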

Checklist

Check that the PR satisfies the following conditions.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

@darnautov darnautov self-assigned this Nov 21, 2024
@darnautov darnautov added the Team:ML, backport:version, and v8.17.0 labels Nov 21, 2024
@darnautov darnautov added the v8.18.0, v8.16.2, and ci:cloud-deploy labels Nov 21, 2024
@darnautov darnautov marked this pull request as ready for review November 21, 2024 17:49
@darnautov darnautov requested a review from a team as a code owner November 21, 2024 17:49
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

Member

@jgowdyelastic jgowdyelastic left a comment


LGTM
I've added a comment about the possible redundancy of the getAutoscalingPolicy call.
If using the lazy node count is reliable, then I think we could just use that for setting isMlAutoscalingEnabled

// If it doesn't exist, then keep the false value
// If ml autoscaling policy doesn't exist or the user does not have privileges to fetch it,
// check the number of lazy ml nodes to determine if autoscaling is enabled.
const lazyMlNodeCount = await getLazyMlNodeCount(client);
Member


It seems like this check will always work, so we don't really need to attempt the getAutoscalingPolicy call as they'll produce the same results.

Contributor Author


There are cases where this check won't work, e.g.:

  • ML autoscaling may be disabled in the tier
  • We've hit the autoscaling limit
  • On-prem users can set the settings, but it won't scale

So checking the autoscaling policy as the current user is still worth trying.

Contributor

@peteharverson peteharverson left a comment


Testing against your cloud deployment, the check for auto-scaling looks good.

As discussed offline, when I update a deployment created at the low or medium vCPU level, it comes up saying it was 'high'.

(screenshot: update deployment modal showing the 'high' vCPU level)

@darnautov
Contributor Author

@peteharverson the vCPU levels issue is fixed in 37ee935

@darnautov
Contributor Author

@elasticmachine merge upstream

@darnautov
Contributor Author

@elasticmachine merge upstream

@elasticmachine
Contributor

elasticmachine commented Nov 25, 2024

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| ml | 4.7MB | 4.7MB | +154.0B |


cc @darnautov

@darnautov darnautov added the ci:cloud-redeploy label Nov 25, 2024
Contributor

@peteharverson peteharverson left a comment


Tested latest changes in your cloud instance and LGTM.
As discussed offline, we should aim to fix the lack of error messaging when a second deployment fails to start in a separate PR.

@darnautov darnautov merged commit 9827a07 into elastic:main Nov 26, 2024
24 checks passed
@kibanamachine
Contributor

Starting backport for target branches: 8.16, 8.17, 8.x

https://github.com/elastic/kibana/actions/runs/12028725291

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 26, 2024
…tive nodes (elastic#201256)

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 26, 2024
…tive nodes (elastic#201256)

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 26, 2024
…tive nodes (elastic#201256)

@kibanamachine
Contributor

💚 All backports created successfully

Branches: 8.16, 8.17, 8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Nov 26, 2024
…nd 0 active nodes (#201256) (#201746)

# Backport

This will backport the following commits from `main` to `8.16`:
- [[ML] Trained Model: Fix start deployment with ML autoscaling and 0
active nodes (#201256)](#201256)


### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Dima Arnautov <[email protected]>
kibanamachine added a commit that referenced this pull request Nov 26, 2024
…nd 0 active nodes (#201256) (#201747)

# Backport

This will backport the following commits from `main` to `8.17`:
- [[ML] Trained Model: Fix start deployment with ML autoscaling and 0
active nodes (#201256)](#201256)


### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Dima Arnautov <[email protected]>
kibanamachine added a commit that referenced this pull request Nov 26, 2024
…d 0 active nodes (#201256) (#201748)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[ML] Trained Model: Fix start deployment with ML autoscaling and 0
active nodes (#201256)](#201256)


### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Dima Arnautov <[email protected]>
paulinashakirova pushed a commit to paulinashakirova/kibana that referenced this pull request Nov 26, 2024
…tive nodes (elastic#201256)

@darnautov darnautov deleted the ml-fix-autoscaling-check branch December 6, 2024 08:38
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this pull request Dec 12, 2024
…tive nodes (elastic#201256)

Labels
backport:version, ci:cloud-deploy, ci:cloud-redeploy, Feature:3rd Party Models, :ml, release_note:fix, Team:ML, v8.16.2, v8.17.0, v8.18.0, v9.0.0
5 participants