Workflow management #3

Open
dabreegster opened this issue Oct 31, 2022 · 5 comments

@dabreegster
Contributor

I want to take simple scripts like https://github.com/acteng/abstreet-to-atip/blob/main/split_uk_osm.sh and https://github.com/acteng/abstreet-to-atip/blob/main/build_route_snappers.sh and run them better. Requirements:

  • Specify a simple DAG of tasks
  • Web UI to track the progress of that DAG and to interactively debug
  • Per task, stash STDOUT and STDERR logs somewhere, and track basic runtime
  • Parallelize loops, specifying the parallelism
  • Just run locally on one linux box, no distributed systems needed
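To make the requirements above concrete, here is a minimal sketch of a local runner in Python. It is not what any of the tools discussed in this thread does internally. The function names (`run_task`, `run_dag`), the log file naming, and the choice to skip dependents of a failed task are all assumptions made for illustration. It covers the DAG, the per-task logs and runtime, and bounded parallelism, but not the web UI.

```python
import subprocess
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library since Python 3.9
from pathlib import Path


def run_task(name, command, log_dir):
    """Run one shell command, saving STDOUT/STDERR to files and timing it."""
    start = time.monotonic()
    with open(log_dir / f"{name}.stdout", "wb") as out, \
            open(log_dir / f"{name}.stderr", "wb") as err:
        code = subprocess.run(command, shell=True, stdout=out, stderr=err).returncode
    return name, code, time.monotonic() - start


def run_dag(commands, deps, log_dir, parallelism=2):
    """Run a DAG of shell commands on the local machine.

    commands: {task name: shell command}
    deps:     {task name: set of task names that must succeed first}
    Returns   {task name: (exit code, runtime in seconds)} for tasks that ran.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in commands})
    sorter.prepare()
    results = {}
    pending = set()
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        while sorter.is_active():
            for name in sorter.get_ready():
                pending.add(pool.submit(run_task, name, commands[name], log_dir))
            if not pending:
                break  # everything left depends on a task that failed
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                name, code, seconds = future.result()
                results[name] = (code, seconds)
                if code == 0:
                    sorter.done(name)  # unblock dependents only on success
    return results
```

A call such as `run_dag({"split": "./split_uk_osm.sh", "snap": "./build_route_snappers.sh"}, {"snap": {"split"}}, "logs", parallelism=4)` would run the second script only if the first exits with 0; the dependency between the two scripts is assumed here for the sake of the example.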

I'm using https://github.com/Nukesor/pueue right now for A/B Street imports, but it lacks the UI.

At a glance, the most popular option out there is https://snakemake.readthedocs.io/en/stable/executing/monitoring.html, which has a (possibly WIP) web dashboard. I'll give it a quick try, but I guess another requirement is "easy install", and anything involving conda is definitely not.

@dabreegster
Contributor Author

Pueue might be one of the quickest things to iterate on here. https://github.com/Nukesor/pueue/wiki/Miscellaneous#summarising-pueue-status-job-states has some scripts for monitoring. Log retrieval for failed jobs isn't hard.
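As a rough illustration of that kind of monitoring, the sketch below counts job states from a JSON status dump. The JSON shape is an assumption: a top-level `tasks` object whose entries each carry a `status` that is either a plain string or a single-key object. The real output of `pueue status --json` may differ between pueue versions, so check it before relying on this. `summarise_states` is a hypothetical helper, not part of pueue.

```python
import json
from collections import Counter


def summarise_states(status_json):
    """Count tasks per state in a JSON status dump.

    Assumed shape (not verified against any specific pueue version):
    {"tasks": {"0": {"status": "Running"}, "1": {"status": {"Done": ...}}}}
    """
    data = json.loads(status_json)
    counts = Counter()
    for task in data.get("tasks", {}).values():
        status = task.get("status")
        if isinstance(status, dict):
            # Treat the single key of an object-valued status as the state name.
            status = next(iter(status), "Unknown")
        counts[str(status)] += 1
    return counts
```

Piping the status dump into a small script that prints `summarise_states(sys.stdin.read())` would give a one-line progress summary that can be refreshed with `watch`.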

@Robinlovelace

Interested to see which one you go for and keen to learn from the workflow management/build process. Could be handy for other projects. What about good old-fashioned make?

@dabreegster
Contributor Author

make only satisfies the first requirement. It doesn't even give you job logs, and definitely no nice summary.

The Apache Beam ecosystem is another option, but it's over-powered / meant for very different workflows.

@dabreegster
Contributor Author

I'm re-importing all UK maps into A/B Street right now using my existing pueue workflow, plus https://github.com/Nukesor/pueue/wiki/Miscellaneous#summarising-pueue-status-job-states to track progress:
[Screenshot from 2022-11-04 11-21-39]
The UX is way better than running grep | wc -l. It'd be pretty low effort to hack together something to list out failed jobs, browse job logs, etc. There've been a few people expressing interest in a terminal tool around pueue, so I kind of suspect if I started something, other people would flesh it out later.

But I still want to play with Airflow, in case it just does all of this stuff already.
