Add ability to kill function executions #4334

Merged 2 commits into main from victor/eng-2633-add-ability-to-kill-functions on Aug 15, 2024

nickgerace (Contributor) commented Aug 14, 2024

Description

This PR adds the ability to kill function executions, along with a variety of other enhancements and tuning to the function execution layer.

Killing function executions occurs through the FuncRunner by using its new "cancel execution" functionality. This works by using the veritech client to request that veritech kill its ongoing function execution via a tokio::sync::oneshot channel.
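
To illustrate the kill-signal idea, here is a minimal sketch, not the code from this PR. It uses std::sync::mpsc and a thread as a stand-in for tokio::sync::oneshot and an async task, so that it runs without external crates. The names Outcome, spawn_execution and UNIT are hypothetical.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical outcome of one execution (not a type from the PR).
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Finished,
    Killed,
}

/// Stand-in for the time one unit of work takes.
const UNIT: Duration = Duration::from_millis(10);

/// Starts a fake "function execution" that performs `steps` units of work
/// and checks for a kill signal between units. Returns the kill sender
/// together with the handle used to wait for the outcome.
pub fn spawn_execution(steps: u32) -> (Sender<()>, JoinHandle<Outcome>) {
    let (kill_tx, kill_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || {
        for _ in 0..steps {
            // Waiting on the kill channel doubles as the unit of work here.
            match kill_rx.recv_timeout(UNIT) {
                // A kill request arrived: stop immediately.
                Ok(()) => return Outcome::Killed,
                // No kill request during this unit: keep going.
                Err(RecvTimeoutError::Timeout) => {}
                // The sender was dropped, so nobody can kill this run anymore.
                Err(RecvTimeoutError::Disconnected) => thread::sleep(UNIT),
            }
        }
        Outcome::Finished
    });
    (kill_tx, handle)
}
```

Sending `()` on the returned sender plays the role of the kill request; the execution observes it at its next check and returns Outcome::Killed instead of running to completion.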

What about multiple veritechs? This PR is untested with multiple veritechs, but the use case was considered. The idea is that all veritechs will receive the message and only one of them will act upon it. The other veritechs will "no-op". This will require testing and potential tuning to the NATS subscriptions. For example, they should all consume the same messages for requests to cancel active executions.

Future Work

We need to de-duplicate a lot of the veritech server logic. We also need to keep an eye on the mutex locking and be sure that we cannot hit a deadlock. While we are there, it would be worth seeing if we can use our once cell lazy implementation of the "kill sender map" or if we should store it in an Arc<Mutex<T>> on the server itself. Finally, we should test multiple veritechs and "no-op" if a "kill sender" could not be found (only one veritech will have it).
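
A rough sketch of the Arc<Mutex<T>> option with the "no-op" behavior described above, assuming a map keyed by execution id. This is not the PR's implementation: KillSenders, CancelResult, register and cancel are hypothetical names, and std::sync::mpsc stands in for the oneshot channel.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Hypothetical result of a cancel request on one veritech instance.
#[derive(Debug, PartialEq)]
pub enum CancelResult {
    Killed,
    NoOp,
}

/// Hypothetical per-server "kill sender map", keyed by execution id.
#[derive(Clone, Default)]
pub struct KillSenders {
    inner: Arc<Mutex<HashMap<String, Sender<()>>>>,
}

impl KillSenders {
    /// Called when an execution starts; the execution keeps the receiver.
    pub fn register(&self, execution_id: &str) -> Receiver<()> {
        let (tx, rx) = mpsc::channel();
        self.inner
            .lock()
            .unwrap()
            .insert(execution_id.to_string(), tx);
        rx
    }

    /// Called for every cancel request this instance receives.
    pub fn cancel(&self, execution_id: &str) -> CancelResult {
        // The guard is a temporary, so the lock is released at the end of
        // this statement and is never held while sending.
        let sender = self.inner.lock().unwrap().remove(execution_id);
        match sender {
            Some(tx) => {
                // The execution may already have finished; ignore that case.
                let _ = tx.send(());
                CancelResult::Killed
            }
            // Another instance (or nobody) owns this execution.
            None => CancelResult::NoOp,
        }
    }
}
```

Keeping the critical section down to a single map operation is one way to limit the deadlock risk mentioned above, since no other lock or channel operation happens while the mutex is held.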

Primary Changes

  • Add ability to cancel FuncRun executions via the FuncRunner
  • Add ability to kill function executions from the admin dashboard and in the API
  • Add "meta" subscriber and "meta" NATS subject prefix for veritech communication unrelated to literal function execution (i.e. communication for veritech itself)
  • Replace hardcoded execution ids with FuncRunId converted to a String (potential to make this strongly typed in the future)
    • We now thread through the FuncRunId in the FuncDispatchContext
  • Add FuncRunState::Cancelled for cancelled FuncRuns, but do not do the same for ActionRunState
    • They should still fail, but their reason for failing is due to a cancelled FuncRun
    • This also makes it clear that the ActionRun can be retried whereas if we added a similar "cancelled" ActionRunState, it would be unclear
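
The relationship between the two state enums could be sketched as follows. Only FuncRunState::Cancelled is named in this PR; every other variant, plus the helpers action_state_for, failure_reason and can_retry, is an assumption made for illustration.

```rust
/// Only `Cancelled` is taken from the PR; the other variants are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FuncRunState {
    Running,
    Success,
    Failure,
    Cancelled,
}

/// Assumed variants; note that there is deliberately no `Cancelled` here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActionRunState {
    Running,
    Success,
    Failed,
}

/// Hypothetical mapping: a cancelled func run surfaces as a failed action.
pub fn action_state_for(func_run_state: FuncRunState) -> ActionRunState {
    match func_run_state {
        FuncRunState::Running => ActionRunState::Running,
        FuncRunState::Success => ActionRunState::Success,
        FuncRunState::Failure | FuncRunState::Cancelled => ActionRunState::Failed,
    }
}

/// Hypothetical helper: the cancellation is kept as the failure reason.
pub fn failure_reason(func_run_state: FuncRunState) -> Option<&'static str> {
    match func_run_state {
        FuncRunState::Failure => Some("func run failed"),
        FuncRunState::Cancelled => Some("func run was cancelled"),
        FuncRunState::Running | FuncRunState::Success => None,
    }
}

/// Hypothetical retry rule: any failed action can be retried, whatever the reason.
pub fn can_retry(action_state: ActionRunState) -> bool {
    matches!(action_state, ActionRunState::Failed)
}
```

With this shape, retry logic only has to look at the failed state, while the cancellation is still visible as the reason for the failure.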

Secondary Changes

  • Add admin dashboard and endpoint (no change set required)
  • Add ADMIN_PANEL_ACCESS feature flag
  • Group veritech internal errors to be sent to a veritech client by a single kind stored by a const ("veritechServer")
  • Make FunctionResultFailure fields private and provide two constructor methods: one for generic use and one for internal veritech errors
    • We found existing locations where FunctionResultFailure was used for veritech internal errors by using the "veritechServer" kind, so we now group this functionality for clarity (and "veritechServer" is now stored in a const)
  • Add ability to determine if the history actor's email is a "systeminit" email
  • Add integration test for cancel execution
  • Remove unused "with_subject" functions from the veritech client
  • Make the "output_tx" optional for veritech client requests
    • We do not need it for cancelling function executions
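
The FunctionResultFailure change might look roughly like the sketch below. The "veritechServer" value comes from the description above; the field names, the constant name and the constructor and accessor names are assumptions, not the PR's actual API.

```rust
/// The value is from the PR description; the constant's name is assumed.
pub const VERITECH_SERVER_KIND: &str = "veritechServer";

/// Fields are private, so values can only be built via the constructors.
/// The field names here are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct FunctionResultFailure {
    execution_id: String,
    kind: String,
    message: String,
}

impl FunctionResultFailure {
    /// Generic constructor: the caller picks the error kind.
    pub fn new(
        execution_id: impl Into<String>,
        kind: impl Into<String>,
        message: impl Into<String>,
    ) -> Self {
        Self {
            execution_id: execution_id.into(),
            kind: kind.into(),
            message: message.into(),
        }
    }

    /// Constructor for internal veritech errors: the kind is fixed.
    pub fn new_for_veritech_server_error(
        execution_id: impl Into<String>,
        message: impl Into<String>,
    ) -> Self {
        Self::new(execution_id, VERITECH_SERVER_KIND, message)
    }

    pub fn kind(&self) -> &str {
        &self.kind
    }

    pub fn is_veritech_server_error(&self) -> bool {
        self.kind == VERITECH_SERVER_KIND
    }
}
```

Routing every internal error through one constructor means the "veritechServer" string lives in a single place, which is the grouping the description aims for.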

@nickgerace nickgerace force-pushed the victor/eng-2633-add-ability-to-kill-functions branch 2 times, most recently from c93191a to 418c127 Compare August 15, 2024 17:31
@@ -33,9 +33,7 @@ impl FuncDispatch for FuncBackendJsAttribute {
before: Vec<BeforeFunction>,
) -> Box<Self> {
let request = ResolverFunctionRequest {
// Once we start tracking the state of these executions, then this id will be useful,
// but for now it's passed along and back, and is opaque
execution_id: "tomcruise".to_string(),

rip

@nickgerace nickgerace added this pull request to the merge queue Aug 15, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 15, 2024
@nickgerace nickgerace force-pushed the victor/eng-2633-add-ability-to-kill-functions branch from 418c127 to 2e7e654 Compare August 15, 2024 21:25
This commit contains the ability to kill function executions as well as
a variety of other enhancements and tuning to the function execution
layer.

Killing function executions occurs through the FuncRunner by using its
new "cancel execution" functionality. This works by using the veritech
client to request that veritech kill its ongoing function execution via
a "tokio::sync::oneshot" channel.

What about multiple veritechs? This commit is untested with multiple
veritechs, but the use case was considered. The idea is that all
veritechs will receive the message and only one of them will act upon
it. The other veritechs will "no-op". This will require testing and
potential tuning to the NATS subscriptions. For example, they should all
consume the same messages for requests to cancel active executions.

Future work: we need to de-duplicate a lot of the veritech server logic.
We also need to keep an eye on the mutex locking and be sure that we
cannot hit a deadlock. While we are there, it would be worth seeing if
we can use our once cell lazy implementation of the "kill sender map" or
if we should store it in an "Arc<Mutex<T>>" on the server itself.
Finally, we should test multiple veritechs and "no-op" if a "kill
sender" could not be found (only one veritech will have it).

Primary changes:
- Add ability to cancel func run executions via the FuncRunner
- Add ability to kill function executions from the admin dashboard and
  in the API
- Add "meta" subscriber and "meta" NATS subject prefix for veritech
  communication unrelated to literal function execution (i.e.
  communication for veritech itself)
- Replace hardcoded execution ids with FuncRunId converted to a String
  (potential to make this strongly typed in the future)
  - We now thread through the FuncRunId in the FuncDispatchContext
- Add "FuncRunState::Cancelled" for cancelled FuncRuns, but do not do
  the same for ActionRunState
  - They should still fail, but their reason for failing is due to a
    cancelled FuncRun
  - This also makes it clear that the ActionRun can be retried whereas
    if we added a similar "cancelled" ActionRunState, it would be
    unclear

Secondary changes:
- Add admin dashboard and endpoint (no change set required)
- Add "ADMIN_PANEL_ACCESS" feature flag
- Group veritech internal errors to be sent to a veritech client by a
  single kind stored by a const ("veritechServer")
- Make "FunctionResultFailure" fields private and provide two
  constructor methods: one for generic use and one for internal
  veritech errors
  - We found existing locations where "FunctionResultFailure" was used
    for veritech internal errors by using the "veritechServer" kind, so
    we now group this functionality for clarity (and "veritechServer" is
    now stored in a const)
- Add ability to determine if the history actor's email is a
  "systeminit" email
- Add integration test for cancel execution
- Remove unused "with_subject" functions from the veritech client
- Make the "output_tx" optional for veritech client requests
  - We do not need it for cancelling function executions

Signed-off-by: Nick Gerace <[email protected]>
Co-authored-by: Victor Bustamante <[email protected]>
@nickgerace nickgerace force-pushed the victor/eng-2633-add-ability-to-kill-functions branch from 2e7e654 to 255668d Compare August 15, 2024 21:32
@nickgerace nickgerace added this pull request to the merge queue Aug 15, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 15, 2024
@nickgerace nickgerace added this pull request to the merge queue Aug 15, 2024
@nickgerace nickgerace removed this pull request from the merge queue due to a manual request Aug 15, 2024
@vbustamante vbustamante added this pull request to the merge queue Aug 15, 2024
Merged via the queue into main with commit 9f070c2 Aug 15, 2024
9 checks passed
@vbustamante vbustamante deleted the victor/eng-2633-add-ability-to-kill-functions branch August 15, 2024 23:50
3 participants