User story: Parallelize video encoding to speed up processing of videos #14

Closed
6 of 10 tasks
galenlynch opened this issue Nov 3, 2024 · 1 comment
Collaborator

galenlynch commented Nov 3, 2024

User story

As a user, I want to see results as soon as possible. Right now, videos are processed sequentially, and each one takes many hours. This can result in a delay of a day or more between doing an experiment and seeing the results.

Acceptance criteria

  • Allow encoding subprocesses to run simultaneously instead of in sequence.
  • Make the choice between sequential and parallel encoding an option.
  • Handle the output of each subprocess properly, so that outputs are not garbled in the log.
  • Ensure that any encoding error causes the entire job to fail, thereby preventing silent errors.
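
The last two criteria could be sketched roughly as follows. This is only an illustration, not the project's code: `run_logged` and the logger name `encode` are hypothetical, and the command is whatever argument list the job already builds. The idea is to capture everything a subprocess prints, emit it as a single log record once the process exits, and raise if the exit code is non-zero.

```python
import logging
import subprocess
import sys

logger = logging.getLogger("encode")  # hypothetical logger name


def run_logged(cmd: list[str]) -> str:
    """Run one command, log its output as a single record, raise on failure."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same captured stream
        text=True,
    )
    # One log call per subprocess keeps its lines together, even when
    # several commands finish at about the same time.
    logger.info("output of %s:\n%s", cmd[0], result.stdout)
    # Raises subprocess.CalledProcessError if the exit code is non-zero.
    result.check_returncode()
    return result.stdout


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    run_logged([sys.executable, "-c", "print('stand-in for an encode')"])
```

Folding stderr into stdout matters here because ffmpeg writes its diagnostics to stderr. The trade-off of buffering is that nothing from a given encode appears in the log until that encode has finished.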

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria are verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Notes

I think there are plenty of ways to do this with the stdlib, instead of using a heavy dependency like dask. One idea might be something like concurrent.futures.ProcessPoolExecutor to launch and monitor the subprocesses.
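
A minimal sketch of that idea, using only `concurrent.futures` and `subprocess`. The names `run_command` and `run_all` are made up for illustration. One deliberate difference from the suggestion above: the sketch defaults to `ThreadPoolExecutor`, since each worker only waits on a child process. `ProcessPoolExecutor` shares the same `Executor` interface and can be passed as `executor_cls`, provided `run_command` stays at module level so it can be pickled and the call sits under an `if __name__ == "__main__":` guard on platforms that spawn workers.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_command(cmd):
    """Run one command; raise CalledProcessError if it exits non-zero."""
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return done.stdout


def run_all(commands, max_workers=2, executor_cls=ThreadPoolExecutor):
    """Run all commands concurrently and re-raise the first failure seen."""
    outputs = {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {
            pool.submit(run_command, cmd): index
            for index, cmd in enumerate(commands)
        }
        try:
            for future in as_completed(futures):
                # result() re-raises any exception from the worker here.
                outputs[futures[future]] = future.result()
        except Exception:
            # Cancel work that has not started; running commands finish.
            for pending in futures:
                pending.cancel()
            raise
    # Return outputs in submission order, not completion order.
    return [outputs[index] for index in range(len(commands))]


if __name__ == "__main__":
    demo = [[sys.executable, "-c", f"print({n})"] for n in range(3)]
    print(run_all(demo))
```

Because `future.result()` re-raises worker exceptions, a single failed encode propagates out of `run_all` and can fail the whole job.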

One place you could easily introduce this parallelism is in the transform directory function. You could imagine building a list of transform specifications (file names, ffmpeg args) while crawling over the input directory, and then at the end dispatching those transforms in parallel if the job requests parallel encoding.
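
To make that concrete, here is a rough sketch of separating the crawl from the dispatch. Everything named here is an assumption for illustration: `TransformSpec`, `collect_specs`, `dispatch`, the `parallel` flag, and the list of video suffixes are not taken from the existing code. `run_one` stands in for whatever function actually runs a single encode.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TransformSpec:  # hypothetical name
    src: Path
    dst: Path
    ffmpeg_args: tuple = ()

    def command(self):
        # Extra arguments sit between the input and the output, so they
        # apply to the output file.
        return ["ffmpeg", "-i", str(self.src), *self.ffmpeg_args, str(self.dst)]


def collect_specs(input_dir, output_dir, suffixes=(".mp4", ".avi"), ffmpeg_args=()):
    """Crawl input_dir and describe each encode without running it."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    specs = []
    for src in sorted(input_dir.rglob("*")):
        if src.is_file() and src.suffix.lower() in suffixes:
            # Mirror the input directory layout under output_dir.
            dst = output_dir / src.relative_to(input_dir)
            specs.append(TransformSpec(src, dst, tuple(ffmpeg_args)))
    return specs


def dispatch(specs, run_one, parallel=False, max_workers=2):
    """Apply run_one to every spec, in parallel only if requested."""
    if not parallel:
        return [run_one(spec) for spec in specs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() consumes the iterator, so worker exceptions surface here.
        return list(pool.map(run_one, specs))
```

Keeping the specs as plain data means the sequential and parallel paths share one crawl and differ only in the final dispatch step, which should make the option cheap to add and easy to test.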

I also think doing this on a single node would be a fine place to start.

jwong-nd self-assigned this Nov 4, 2024
Contributor

jwong-nd commented Nov 4, 2024

#16
