-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to kill function executions #4334
Merged
vbustamante
merged 2 commits into
main
from
victor/eng-2633-add-ability-to-kill-functions
Aug 15, 2024
Merged
Add ability to kill function executions #4334
vbustamante
merged 2 commits into
main
from
victor/eng-2633-add-ability-to-kill-functions
Aug 15, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
github-actions
bot
added
A-si-settings
Area: Backend service settings management [Rust]
A-sdf
Area: Primary backend API service [Rust]
A-veritech
Area: Task execution backend service [Rust]
A-cyclone
Area: Function execution engine [Rust]
A-dal
A-bytes-lines-codec
A-config-file
A-si-test-macros
A-telemetry-rs
A-dal-test
A-si-data-nats
A-si-data-pg
A-si-std
A-telemetry-application-rs
A-pinga
A-object-tree
A-si-pkg
A-nats-subscriber
A-si-posthog-rs
A-buck2-resources
A-module-index
A-rebaser
A-si-layer-cache
A-si-pool-noodle
A-web
A-lang-js
labels
Aug 14, 2024
nickgerace
force-pushed
the
victor/eng-2633-add-ability-to-kill-functions
branch
3 times, most recently
from
August 15, 2024 17:18
fdde7ef
to
19f4428
Compare
nickgerace
force-pushed
the
victor/eng-2633-add-ability-to-kill-functions
branch
2 times, most recently
from
August 15, 2024 17:31
c93191a
to
418c127
Compare
sprutton1
reviewed
Aug 15, 2024
@@ -33,9 +33,7 @@ impl FuncDispatch for FuncBackendJsAttribute { | |||
before: Vec<BeforeFunction>, | |||
) -> Box<Self> { | |||
let request = ResolverFunctionRequest { | |||
// Once we start tracking the state of these executions, then this id will be useful, | |||
// but for now it's passed along and back, and is opaque | |||
execution_id: "tomcruise".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rip
github-merge-queue
bot
removed this pull request from the merge queue due to failed status checks
Aug 15, 2024
nickgerace
force-pushed
the
victor/eng-2633-add-ability-to-kill-functions
branch
from
August 15, 2024 21:25
418c127
to
2e7e654
Compare
This commit contains the ability to kill function executions as well as a variety of other enhancements and tuning to the function execution layer. Killing function executions occurs through the FuncRunner by using its new "cancel execution" functionality. This works by using the veritech client to request that veritech kill its ongoing function execution via a "tokio::sync::oneshot" channel. What about multiple veritechs? This commit is untested with multiple veritechs, but the use case was considered. The idea is that all veritechs will receive the message and only one of them will act upon it. The other veritechs will "no-op". This will require testing and potential tuning to the NATS subscriptions. For example, they should all consume the same messages for requests to cancel active executions. Future work: we need to de-duplicate a lot of the veritech server logic. We also need to keep an eye on the mutex locking and be sure that we cannot hit a deadlock. While we are there, it would be worth seeing if we can use our once cell lazy implementation of the "kill sender map" or if we should store it in an "Arc<Mutex<T>>" on the server itself. Finally, we should test multiple veritechs and "no-op" if a "kill sender" could not be found (only one veritech will have it). Primary changes: - Add ability to cancel func run executions via the FuncRunner - Add ability to kill function executions from the admin dashboard and in the API - Add "meta" subscriber and "meta" NATS subject prefix for veritech communication unrelated to literal function execution (i.e. communication for veritech itself) - Replace hardcoded execution ids with FuncRunId converted to a String (potential to make this strongly typed in the future) - We now thread through the FuncRunId in the FuncDispatchContext - Add "FuncRunState::Cancelled" for cancelled FuncRuns, but do not do the same for ActionRunState - They should still fail, but their reason for failing is due to a cancelled FuncRun - This also makes it clear that the ActionRun can be retried whereas if we added a similar "cancelled" ActionRunState, it would be unclear Secondary changes: - Add admin dashboard and endpoint (no change set required) - Add "ADMIN_PANEL_ACCESS" feature flag - Group veritech internal errors to be sent to a veritech client by a single kind stored by a const ("veritechServer") - Make "FunctionResultFailure" fields private and provide two constructor methods: one for generic use and one for internal veritech errors - We found existing locations where "FunctionResultFailure" was used for veritech internal errors by using the "veritechServer" kind, so we now group this functionality for clarity (and "veritechServer" is now stored in a const) - Add ability to determine if the history actor's email is a "systeminit" email - Add integration test for cancel execution - Remove unused "with_subject" functions from the veritech client - Make the "output_tx" option for veritech client requests - We do not need it for cancelling function executions Signed-off-by: Nick Gerace <[email protected]> Co-authored-by: Victor Bustamante <[email protected]>
nickgerace
force-pushed
the
victor/eng-2633-add-ability-to-kill-functions
branch
from
August 15, 2024 21:32
2e7e654
to
255668d
Compare
github-merge-queue
bot
removed this pull request from the merge queue due to failed status checks
Aug 15, 2024
Signed-off-by: Victor Bustamante <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-buck2-resources
A-bytes-lines-codec
A-config-file
A-cyclone
Area: Function execution engine [Rust]
A-dal
A-dal-test
A-lang-js
A-module-index
A-nats-subscriber
A-object-tree
A-pinga
A-rebaser
A-sdf
Area: Primary backend API service [Rust]
A-si-data-nats
A-si-data-pg
A-si-layer-cache
A-si-pkg
A-si-pool-noodle
A-si-posthog-rs
A-si-settings
Area: Backend service settings management [Rust]
A-si-std
A-si-test-macros
A-telemetry-application-rs
A-telemetry-rs
A-veritech
Area: Task execution backend service [Rust]
A-web
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
This PR contains the ability to kill function executions as well as a variety of other enhancements and tuning to the function execution layer.
Killing function executions occurs through the
FuncRunner
by using its new "cancel execution" functionality. This works by using theveritech
client to request thatveritech
kill its ongoing function execution via atokio::sync::oneshot
channel.What about multiple
veritechs
? This PR is untested with multipleveritechs
, but the use case was considered. The idea is that allveritechs
will receive the message and only one of them will act upon it. The otherveritechs
will "no-op". This will require testing and potential tuning to the NATS subscriptions. For example, they should all consume the same messages for requests to cancel active executions.Future Work
We need to de-duplicate a lot of the
veritech
server logic. We also need to keep an eye on the mutex locking and be sure that we cannot hit a deadlock. While we are there, it would be worth seeing if we can use our once cell lazy implementation of the "kill sender map" or if we should store it in anArc<Mutex<T>>
on the server itself. Finally, we should test multipleveritechs
and "no-op" if a "kill sender" could not be found (only oneveritech
will have it).Primary Changes
FuncRun
executions via theFuncRunner
veritech
communication unrelated to literal function execution (i.e. communication forveritech
itself)FuncRunId
converted to aString
(potential to make this strongly typed in the future)FuncRunId
in theFuncDispatchContext
FuncRunState::Cancelled
for cancelledFuncRuns
, but do not do the same forActionRunState
FuncRun
ActionRun
can be retried whereas if we added a similar "cancelled"ActionRunState
, it would be unclearSecondary Changes
ADMIN_PANEL_ACCESS
feature flagveritech
internal errors to be sent to averitech
client by a single kind stored by a const ("veritechServer"
)FunctionResultFailure
fields private and provide two constructor methods: one for generic use and one for internalveritech
errorsFunctionResultFailure
was used forveritech
internal errors by using the"veritechServer"
kind, so we now group this functionality for clarity (and"veritechServer"
is now stored in a const)"systeminit"
email"with_subject"
functions from theveritech
client"output_tx"
option forveritech
client requests