Add a transform to easily create `summary` tasks #71

ahal · 2022-06-28T20:13:30Z

Often people want to have a task that can summarize the results of other tasks. Either the entire graph or some subset. In Gecko there's a code-review task that does just this:
https://searchfox.org/mozilla-central/source/taskcluster/ci/code-review/kind.yml

It basically waits for all tasks with the code-review attribute to finish, and then sends a pulse message to notify consumers that they're ready (the consumers do the actual status inspection).

I propose we:

Create a transform file that does something similar to the code review transform
Implement some pre-defined "behaviors" the task can follow. Here are some example behaviours:

noop - Task always passes, routes can notify that all tasks being summarized are finished
require_pass - Task passes if all dependencies pass, otherwise it fails (can be useful if using an on-exception notify route)
custom - Task runs some arbitrary command as normal (?)

The text was updated successfully, but these errors were encountered:

escapewindow · 2022-06-28T20:19:30Z

I think one problem is that we don't have a "trigger this task when all the dependencies are in a resolved state, whether that's completed, failure, or exception" model. So if one or more of the dependencies fail, we won't get a notification. But I'm not sure if that would be a great model anyway, since we'd have a race condition when we cancel all tasks in a graph or hit the deadline for tasks in a graph.

This is why I was wondering if going outside of the graph, to something listening in pulse, might be better, though we don't necessarily want to wait for a 24h deadline to be notified when something is hung or in a failure state mid-graph either. Not sure.

But this could definitely be useful for telling us when all the dependent tasks are completed, definitely.

ahal · 2022-06-28T20:30:52Z

I think one problem is that we don't have a "trigger this task when all the dependencies are in a resolved state, whether that's completed, failure, or exception" model.

We do, see the all-resolved value for requires in the task schema. Taskgraph already has hookups for this (it's also what the code-review task uses). I just found out about this myself :)

since we'd have a race condition when we cancel all tasks in a graph or hit the deadline for tasks in a graph.

Yeah, those edge cases would need some special consideration. Maybe the cancel-all action could cancel everything except summary tasks so they still fire. And maybe summary tasks could have a deadline equal to latest deadline in dependencies + 1 hour or something like that.

This is why I was wondering if going outside of the graph, to something listening in pulse, might be better, though we don't necessarily want to wait for a 24h deadline to be notified when something is hung or in a failure state mid-graph either. Not sure.

I agree, I think this would be better. Though I think we could implement this issue in a day or two of work, whereas outside of Taskgraph feels like weeks to quarters.

escapewindow · 2022-06-28T21:21:21Z

I think one problem is that we don't have a "trigger this task when all the dependencies are in a resolved state, whether that's completed, failure, or exception" model.

We do, see the all-resolved value for requires in the task schema. Taskgraph already has hookups for this (it's also what the code-review task uses). I just found out about this myself :)

Nice, TIL :)
If an upstream task from your dependencies is broken, you may be waiting a long time, but perhaps that's sufficient for an initial generic solution.

since we'd have a race condition when we cancel all tasks in a graph or hit the deadline for tasks in a graph.

Yeah, those edge cases would need some special consideration. Maybe the cancel-all action could cancel everything except summary tasks so they still fire. And maybe summary tasks could have a deadline equal to latest deadline in dependencies + 1 hour or something like that.

Yeah, or maybe we don't want notifications for every canceled graph anyway. Might be a topic of discussion.

This is why I was wondering if going outside of the graph, to something listening in pulse, might be better, though we don't necessarily want to wait for a 24h deadline to be notified when something is hung or in a failure state mid-graph either. Not sure.

I agree, I think this would be better. Though I think we could implement this issue in a day or two of work, whereas outside of Taskgraph feels like weeks to quarters.

👍 Outside of taskgraph could tie into the git pushlog / generic dashboard / improved TC UI / generic treeherder replacement project, and live in future blue sky land.

marco-c · 2022-06-29T08:34:25Z

This would help with mozilla/code-coverage#1047. Currently we have a service listening to pulse messages that acts when it receives a "group finished" notification. Here's the source code of it if you're curious: https://github.com/mozilla/code-coverage/blob/master/events/code_coverage_events/workflow.py.

bendk · 2022-07-12T19:30:00Z

FWIW, here's how we set something like this up in app-services: https://github.com/mozilla/application-services/blob/main/taskcluster/app_services_taskgraph/transforms/deps_complete.py

ahal · 2022-07-21T13:13:40Z

Turns out there is a transform that somewhat helps here already:
https://github.com/taskcluster/taskgraph/blob/main/src/taskgraph/transforms/code_review.py

So maybe this issue is more about re-naming it to something more generic and including more features + a howto guide in the docs.

JohanLorenzo · 2023-02-09T15:01:24Z

Hey there! @ahal just pointed me to this issue.

Just like app-services, the Android repos have their own implementation which actually around code_review.py. This task is called complete and make it depends on any task we want to track. complete automatically turns red if one of its parents tasks failed.

For the record, Android repos have had this implementation for years. I think it's mature-enough to be part of core taskgraph 👍

ahal mentioned this issue Jul 20, 2022

(v103) Bug 1779797 - Android: release branch string uplifts should auto-merge if CI is green mozilla-mobile/fenix#26111

Merged

4 tasks

miladHakimi mentioned this issue Apr 17, 2023

fix: build-docker silent injection with dontbuild bug. #209

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a transform to easily create `summary` tasks #71

Add a transform to easily create `summary` tasks #71

ahal commented Jun 28, 2022

escapewindow commented Jun 28, 2022 •

edited

Loading

ahal commented Jun 28, 2022

escapewindow commented Jun 28, 2022

marco-c commented Jun 29, 2022

bendk commented Jul 12, 2022

ahal commented Jul 21, 2022

JohanLorenzo commented Feb 9, 2023

Add a transform to easily create summary tasks #71

Add a transform to easily create summary tasks #71

Comments

ahal commented Jun 28, 2022

escapewindow commented Jun 28, 2022 • edited Loading

ahal commented Jun 28, 2022

escapewindow commented Jun 28, 2022

marco-c commented Jun 29, 2022

bendk commented Jul 12, 2022

ahal commented Jul 21, 2022

JohanLorenzo commented Feb 9, 2023

Add a transform to easily create `summary` tasks #71

Add a transform to easily create `summary` tasks #71

escapewindow commented Jun 28, 2022 •

edited

Loading