Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a transform to easily create summary tasks #71

Open
ahal opened this issue Jun 28, 2022 · 7 comments
Open

Add a transform to easily create summary tasks #71

ahal opened this issue Jun 28, 2022 · 7 comments

Comments

@ahal
Copy link
Collaborator

ahal commented Jun 28, 2022

Often people want to have a task that can summarize the results of other tasks. Either the entire graph or some subset. In Gecko there's a code-review task that does just this:
https://searchfox.org/mozilla-central/source/taskcluster/ci/code-review/kind.yml

It basically waits for all tasks with the code-review attribute to finish, and then sends a pulse message to notify consumers that they're ready (the consumers do the actual status inspection).

I propose we:

  1. Create a transform file that does something similar to the code review transform
  2. Implement some pre-defined "behaviors" the task can follow. Here are some example behaviours:
  • noop - Task always passes, routes can notify that all tasks being summarized are finished
  • require_pass - Task passes if all dependencies pass, otherwise it fails (can be useful if using an on-exception notify route)
  • custom - Task runs some arbitrary command as normal (?)
@escapewindow
Copy link
Contributor

escapewindow commented Jun 28, 2022

I think one problem is that we don't have a "trigger this task when all the dependencies are in a resolved state, whether that's completed, failure, or exception" model. So if one or more of the dependencies fail, we won't get a notification. But I'm not sure if that would be a great model anyway, since we'd have a race condition when we cancel all tasks in a graph or hit the deadline for tasks in a graph.

This is why I was wondering if going outside of the graph, to something listening in pulse, might be better, though we don't necessarily want to wait for a 24h deadline to be notified when something is hung or in a failure state mid-graph either. Not sure.

But this could definitely be useful for telling us when all the dependent tasks are completed, definitely.

@ahal
Copy link
Collaborator Author

ahal commented Jun 28, 2022

I think one problem is that we don't have a "trigger this task when all the dependencies are in a resolved state, whether that's completed, failure, or exception" model.

We do, see the all-resolved value for requires in the task schema. Taskgraph already has hookups for this (it's also what the code-review task uses). I just found out about this myself :)

since we'd have a race condition when we cancel all tasks in a graph or hit the deadline for tasks in a graph.

Yeah, those edge cases would need some special consideration. Maybe the cancel-all action could cancel everything except summary tasks so they still fire. And maybe summary tasks could have a deadline equal to latest deadline in dependencies + 1 hour or something like that.

This is why I was wondering if going outside of the graph, to something listening in pulse, might be better, though we don't necessarily want to wait for a 24h deadline to be notified when something is hung or in a failure state mid-graph either. Not sure.

I agree, I think this would be better. Though I think we could implement this issue in a day or two of work, whereas outside of Taskgraph feels like weeks to quarters.

@escapewindow
Copy link
Contributor

I think one problem is that we don't have a "trigger this task when all the dependencies are in a resolved state, whether that's completed, failure, or exception" model.

We do, see the all-resolved value for requires in the task schema. Taskgraph already has hookups for this (it's also what the code-review task uses). I just found out about this myself :)

Nice, TIL :)
If an upstream task from your dependencies is broken, you may be waiting a long time, but perhaps that's sufficient for an initial generic solution.

since we'd have a race condition when we cancel all tasks in a graph or hit the deadline for tasks in a graph.

Yeah, those edge cases would need some special consideration. Maybe the cancel-all action could cancel everything except summary tasks so they still fire. And maybe summary tasks could have a deadline equal to latest deadline in dependencies + 1 hour or something like that.

Yeah, or maybe we don't want notifications for every canceled graph anyway. Might be a topic of discussion.

This is why I was wondering if going outside of the graph, to something listening in pulse, might be better, though we don't necessarily want to wait for a 24h deadline to be notified when something is hung or in a failure state mid-graph either. Not sure.

I agree, I think this would be better. Though I think we could implement this issue in a day or two of work, whereas outside of Taskgraph feels like weeks to quarters.

👍 Outside of taskgraph could tie into the git pushlog / generic dashboard / improved TC UI / generic treeherder replacement project, and live in future blue sky land.

@marco-c
Copy link

marco-c commented Jun 29, 2022

This would help with mozilla/code-coverage#1047. Currently we have a service listening to pulse messages that acts when it receives a "group finished" notification. Here's the source code of it if you're curious: https://github.com/mozilla/code-coverage/blob/master/events/code_coverage_events/workflow.py.

@bendk
Copy link

bendk commented Jul 12, 2022

FWIW, here's how we set something like this up in app-services: https://github.com/mozilla/application-services/blob/main/taskcluster/app_services_taskgraph/transforms/deps_complete.py

@ahal
Copy link
Collaborator Author

ahal commented Jul 21, 2022

Turns out there is a transform that somewhat helps here already:
https://github.com/taskcluster/taskgraph/blob/main/src/taskgraph/transforms/code_review.py

So maybe this issue is more about re-naming it to something more generic and including more features + a howto guide in the docs.

@JohanLorenzo
Copy link
Contributor

Hey there! @ahal just pointed me to this issue.

Just like app-services, the Android repos have their own implementation which actually around code_review.py. This task is called complete and make it depends on any task we want to track. complete automatically turns red if one of its parents tasks failed.

For the record, Android repos have had this implementation for years. I think it's mature-enough to be part of core taskgraph 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants