Adds deploy model flag support for local model registration, fixes integration tests #350
Signed-off-by: Joshua Palis <[email protected]>
…ssociated integration test Signed-off-by: Joshua Palis <[email protected]>
Codecov Report
Attention:
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #350      +/-   ##
============================================
- Coverage     72.42%   72.27%   -0.16%
- Complexity      571      572       +1
============================================
  Files            72       73       +1
  Lines          2988     2997       +9
  Branches        226      230       +4
============================================
+ Hits           2164     2166       +2
- Misses          721      727       +6
- Partials        103      104       +1
☔ View full report in Codecov by Sentry.
Let's make sure we keep the "retry taskId until we get a resource ID" and "update resources in index" functionality in appropriate classes.
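The "retry taskId until we get a resource ID" behavior mentioned above can be pictured with a minimal sketch. Everything here is hypothetical: `ResultListener` is a local stand-in rather than the OpenSearch action listener class, and `RetryableTaskPoller`, its parameters, and the blocking `Thread.sleep` loop are illustrative assumptions, not the plugin's actual implementation.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical stand-in for an action listener; not the OpenSearch class. */
interface ResultListener<T> {
    void onResponse(T result);

    void onFailure(Exception e);
}

/** Illustrative poller: retries a task lookup until it yields a resource ID. */
final class RetryableTaskPoller {
    private RetryableTaskPoller() {}

    /**
     * Calls taskLookup up to (maxRetries + 1) times, waiting delayMillis between
     * attempts, and notifies the listener exactly once with the outcome.
     */
    static void pollForResourceId(
        Supplier<Optional<String>> taskLookup,
        int maxRetries,
        long delayMillis,
        ResultListener<String> listener
    ) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Optional<String> resourceId;
            try {
                resourceId = taskLookup.get();
            } catch (RuntimeException e) {
                // A failed lookup ends the polling and is reported to the caller.
                listener.onFailure(e);
                return;
            }
            if (resourceId.isPresent()) {
                // The step that owns the listener decides what to do next,
                // for example completing its future.
                listener.onResponse(resourceId.get());
                return;
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    listener.onFailure(e);
                    return;
                }
            }
        }
        listener.onFailure(new IllegalStateException("No resource ID after " + maxRetries + " retries"));
    }
}
```

Keeping the polling loop in one helper and handing the result to a listener lets each step class decide how to finish (for instance, recording the resource and completing its own future), which is the separation the comment asks to preserve.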
src/main/java/org/opensearch/flowframework/workflow/AbstractRetryableWorkflowStep.java
src/test/java/org/opensearch/flowframework/rest/FlowFrameworkRestApiIT.java
There are some flaky test failures for local model registration, and I'm curious whether deploying multiple local models is the root cause. I'll check whether undeploying the local model before proceeding with additional tests resolves this:
…e, ascertaining deprovision sequence from created resources Signed-off-by: Joshua Palis <[email protected]>
…d to 100 to avoid opening circuit breaker Signed-off-by: Joshua Palis <[email protected]>
I'm still observing flaky test failures. To mitigate this, I raised the native memory threshold from 90 to 100 to prevent the circuit breaker from opening (Documentation); however, the logs still show that the breaker is open. I will continue to look into this:
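For readers who want to try the same mitigation, here is a minimal sketch of building the request body for a cluster settings update. The setting key `plugins.ml_commons.native_memory_threshold` is my reading of the ML Commons cluster settings documentation and should be checked against the version in use; the class and method names are hypothetical, and the body is assumed to be sent with `PUT _cluster/settings`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative helper that builds a persistent cluster-settings body. */
final class MlCommonsThresholdSettings {
    // Assumed setting key; verify against the ML Commons documentation.
    static final String NATIVE_MEMORY_THRESHOLD = "plugins.ml_commons.native_memory_threshold";

    private MlCommonsThresholdSettings() {}

    /** Renders integer-valued settings as {"persistent":{"key":value,...}}. */
    static String persistentSettingsBody(Map<String, Integer> settings) {
        String entries = settings.entrySet()
            .stream()
            .map(e -> "\"" + e.getKey() + "\":" + e.getValue())
            .collect(Collectors.joining(","));
        return "{\"persistent\":{" + entries + "}}";
    }

    public static void main(String[] args) {
        Map<String, Integer> settings = new LinkedHashMap<>();
        // Raise the threshold from 90 to 100, as described in the comment above.
        settings.put(NATIVE_MEMORY_THRESHOLD, 100);
        System.out.println(persistentSettingsBody(settings));
    }
}
```

As the comment notes, raising this threshold did not stop the breaker from opening in these tests, and a later commit in this PR uses the ML JVM heap memory setting instead of the native memory setting.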
LGTM with one strong suggestion for improvement.
src/main/java/org/opensearch/flowframework/workflow/AbstractRetryableWorkflowStep.java
src/main/java/org/opensearch/flowframework/transport/DeprovisionWorkflowTransportAction.java
…aker issues Signed-off-by: Joshua Palis <[email protected]>
Looks good overall, with minor comments.
src/main/java/org/opensearch/flowframework/workflow/AbstractRetryableWorkflowStep.java
src/main/java/org/opensearch/flowframework/workflow/RegisterLocalModelStep.java
src/test/java/org/opensearch/flowframework/FlowFrameworkRestTestCase.java
…ry setting instead of native memory heap setting Signed-off-by: Joshua Palis <[email protected]>
…ith deployed flag, testing remote model registration with deploy step Signed-off-by: Joshua Palis <[email protected]>
LGTM. Approving with a comment that I'm not sure an id-suffix is needed any more. If it is, it's fine (or make it a constant or auto-generated).
src/main/java/org/opensearch/flowframework/workflow/RegisterLocalModelStep.java
Did the CI miss 2 checks? We have 21 in total.
The backport to
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/flow-framework/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/flow-framework/backport-2.x
# Create a new branch
git switch --create backport/backport-350-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 760177a82a329312ed24385a82bd4a8b21f3bb41
# Push it to GitHub
git push --set-upstream origin backport/backport-350-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/flow-framework/backport-2.x
Then, create a pull request where the
Adds deploy model flag support for local model registration, fixes integration tests (#350)
* Fixing local model integration test
* Added deploy model flag support for local model registration, added associated integration test
* Fixing comment
* Fixing deprovision workflow transport action, removing use of template, ascertaining deprovision sequence from created resources
* Removing rest status checks for deprovision API tests
* Increasing wait time for deprovision status
* Removing sdeprovision status checks for model deployment tests
* increasing timeout for local model registration test template
* Reverting timeout increase, setting ML Commons native memory threshold to 100 to avoid opening circuit breaker
* Passing an action listener to retryableGetMlTask
* Addressing PR comments, preserving order of resource map
* Testing if a wait time after deprovisioning will mitigate circuit breaker issues
* Increasing mlconfig index creation wait time
* Combining local model registration tests into one
* removing resource map from deprovision workflow transport action
* Fixing getResourceFromDeprovisionNOde and tests
* Separating out local model registration tests, using ml jvm heap memory setting instead of native memory heap setting
* Testing : removing second local model registration test
* Reducing model registration tests, testing local model registration with deployed flag, testing remote model registration with deploy step
* Removing suffix from simulated deploy model step
---------
Signed-off-by: Joshua Palis <[email protected]>
Description
This PR achieves multiple things:
* Updates retryableGetMLTask in AbstractRetryableWorkflowStep to accept an action listener, and moves completing the future and simulating model deployment into the step classes themselves.
* Updates DeprovisionWorkflowTransportAction so that the deprovision sequence is ascertained from the resources created, rather than from the provision sequence.
Resources Created output for the single-step register local model template:
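As an illustration of the deprovision change described above (this is not the Resources Created output itself), the following sketch derives a deprovision plan from an ordered list of created resources. It rests on stated assumptions: the `CreatedResource` type, the step-name mapping, and the choice to walk the resources in reverse creation order are all hypothetical and are not taken from the plugin's source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Illustrative planner: builds deprovision steps from created resources. */
final class DeprovisionPlanner {
    /** Hypothetical record of one resource created during provisioning. */
    static final class CreatedResource {
        final String stepName;
        final String resourceId;

        CreatedResource(String stepName, String resourceId) {
            this.stepName = stepName;
            this.resourceId = resourceId;
        }
    }

    // Assumed mapping from a provisioning step to the step that undoes it.
    private static final Map<String, String> DEPROVISION_STEP = Map.of(
        "create_connector", "delete_connector",
        "register_local_model", "delete_model",
        "register_remote_model", "delete_model",
        "deploy_model", "undeploy_model"
    );

    private DeprovisionPlanner() {}

    /**
     * Walks the created resources in reverse creation order (an assumption)
     * and emits "deprovisionStep:resourceId" for each one that has a mapping.
     */
    static List<String> plan(List<CreatedResource> created) {
        List<CreatedResource> reversed = new ArrayList<>(created);
        Collections.reverse(reversed);
        List<String> plan = new ArrayList<>();
        for (CreatedResource resource : reversed) {
            String step = DEPROVISION_STEP.get(resource.stepName);
            if (step != null) {
                plan.add(step + ":" + resource.resourceId);
            }
        }
        return plan;
    }
}
```

Under these assumptions, a model that was registered and then deployed would be undeployed before it is deleted. Working from the recorded resources rather than the template means the plan reflects what was actually created, including resources produced by a single step that both registers and deploys.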
Issues Resolved
Fixes #345
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.