

Task: Batching / Pydra Optimizations #148

Open
6 tasks
wilke0818 opened this issue Aug 15, 2024 · 3 comments
Labels: enhancement (New feature or request)

Comments

@wilke0818
Collaborator

Description

As the project develops, many of our tools operate on lists of Audio objects. The goal is for these tools to be optimized into Pydra workflows with easy-to-use pipelines, especially for users with minimal experience. Part of this task is understanding how to make this robust and easy to implement, so that a user can request a simple task without having to decide how to batch and split the audios to optimize for Pydra (e.g., given 64 audios, choosing between a GPU that can handle batch sizes of 8 and 8 CPUs/cores to thread across).
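To make the trade-off in the description concrete, here is a minimal sketch of the kind of decision logic a user shouldn't have to write themselves. All names here are hypothetical placeholders, not senselab's actual API; the heuristic (prefer the GPU when one is available, otherwise spread audios evenly across CPU workers) is only one possible policy, and the benchmarking task below is what would justify or replace it:

```python
from math import ceil
from typing import Optional


def plan_batches(n_audios: int, gpu_batch_size: Optional[int], n_cpu_workers: int) -> dict:
    """Hypothetical heuristic: use the GPU if one is available,
    otherwise split the audios evenly across CPU workers."""
    if gpu_batch_size:
        # e.g. 64 audios with a GPU batch size of 8 -> 8 sequential batches
        return {"device": "gpu", "n_batches": ceil(n_audios / gpu_batch_size)}
    # e.g. 64 audios across 8 CPU cores -> 8 audios per worker
    return {
        "device": "cpu",
        "n_batches": n_cpu_workers,
        "per_worker": ceil(n_audios / n_cpu_workers),
    }
```

With the numbers from the example above, `plan_batches(64, 8, 8)` would pick the GPU with 8 batches, while `plan_batches(64, None, 8)` would fall back to 8 audios per CPU worker.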

Tasks

  • Design a Pydra interface within senselab that allows for easy developmental integration (i.e., if you write a function that takes a list of Audios, we might provide logic that takes a reference to that function and optimizes its inputs; our APIs would then fall back to this logic when the user gives no specification)
    • Discuss whether a custom interface is needed, or whether Pydra's existing functionality is easy enough for our development to use directly
    • If a custom interface is needed, design and implement logic for turning existing methods into tasks, splitting data across these methods for parallelization (including when and how to batch them together), and making this choice for users within our existing APIs
    • Benchmark the efficiency of different solutions, possibly in different environments such as Openmind or Google Colab. This will inform when to split across many CPUs vs. when to use a GPU, and what batch size is optimal for a GPU
  • Replace existing interfaces/APIs with this logic under the hood to optimize our existing code
  • Ensure batching/batch size is exposed across our functionalities and used properly
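One way the first task's "take a reference to this function and optimize inputs to it" idea could look is sketched below, using only the standard library rather than Pydra itself (the wrapper name, batch size, and worker count are illustrative assumptions, not senselab's actual design):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallelize(
    fn: Callable[[List[T]], List[R]],
    items: List[T],
    batch_size: int = 8,
    max_workers: int = 8,
) -> List[R]:
    """Split `items` into batches, run `fn` on each batch in parallel
    threads, and flatten the per-batch results back into one list."""
    batches = [items[i : i + batch_size] for i in range(0, len(items), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(fn, batches)  # preserves batch order
    return [r for batch in results for r in batch]


# Usage: any function that takes a list can be wrapped transparently.
doubled = parallelize(lambda xs: [x * 2 for x in xs], list(range(64)))
```

A real implementation would presumably use Pydra's splitter/combiner machinery instead of a thread pool, but the shape of the API (wrap an existing list-taking function, choose the split automatically) would be the same.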

Freeform Notes

No response

@wilke0818 wilke0818 added the enhancement New feature or request label Aug 15, 2024
@wilke0818
Collaborator Author

I had some existing code/ideas from working on SER and trying to make a tutorial for it, but before I got to the benchmarking stage I was pulled into other work that took priority. I was developing from this Colab when I ran into the original Pydra issues that sidetracked me:
https://colab.research.google.com/drive/1dNR1omKar-weU94PCib3zapV-5Ab-zJK?usp=sharing

@adi611

adi611 commented Sep 29, 2024

Hi, I have some experience working with Pydra and would be happy to help with any tasks or subtasks related to it. cc: @wilke0818 @fabiocat93

@fabiocat93
Collaborator

> Hi, I have some experience working with Pydra and would be happy to help with any tasks or subtasks related to it. cc: @wilke0818 @fabiocat93

Thank you, @adi611! @wilke0818, would you mind outlining the issue you encountered in a simple, reproducible manner so that @adi611 can explore and help with it?

@fabiocat93 fabiocat93 moved this to Backlog in senselab Nov 18, 2024
@fabiocat93 fabiocat93 moved this from Backlog to Needs brainstorming in senselab Nov 18, 2024