

Task: Batching / Pydra Optimizations #148

Open
6 tasks
wilke0818 opened this issue Aug 15, 2024 · 3 comments
Labels: enhancement (New feature or request)

Comments

@wilke0818
Collaborator

Description

As the project develops, many of our tools operate on lists of Audio objects. The goal is for these tools to be optimized into Pydra workflows with easy-to-use pipelines, especially for users with minimal experience. Part of this task is understanding how to make this robust and easy to implement, so that a user can request a simple task without having to decide how to batch and split the audios to optimize for Pydra (e.g., given 64 audios, choosing between a GPU that can handle batch sizes of 8 and 8 CPUs/cores to thread across).
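To make the trade-off in the description concrete, here is a minimal sketch of the kind of decision logic a user shouldn't have to write themselves. All names here are hypothetical placeholders, not senselab's actual API; the heuristic (prefer the GPU when one is available, otherwise spread audios evenly across CPU workers) is only one possible policy, and the benchmarking task below is what would justify or replace it:

```python
from math import ceil
from typing import Optional


def plan_batches(n_audios: int, gpu_batch_size: Optional[int], n_cpu_workers: int) -> dict:
    """Hypothetical heuristic: use the GPU if one is available,
    otherwise split the audios evenly across CPU workers."""
    if gpu_batch_size:
        # e.g. 64 audios with a GPU batch size of 8 -> 8 sequential batches
        return {"device": "gpu", "n_batches": ceil(n_audios / gpu_batch_size)}
    # e.g. 64 audios across 8 CPU cores -> 8 audios per worker
    return {
        "device": "cpu",
        "n_batches": n_cpu_workers,
        "per_worker": ceil(n_audios / n_cpu_workers),
    }
```

With the numbers from the example above, `plan_batches(64, 8, 8)` would pick the GPU with 8 batches, while `plan_batches(64, None, 8)` would fall back to 8 audios per CPU worker.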

Tasks

  • Design a Pydra interface within senselab that allows for easy developmental integration (i.e., if you write a function that takes a list of Audios, we might provide logic that takes a reference to that function and optimizes its inputs; our APIs would then fall back to this logic when the user gives no specification)
    • Discuss whether a custom interface is needed, or whether Pydra's existing functionality is easy enough for our development to use directly
    • If a custom interface is needed, design and implement logic for turning existing methods into tasks, splitting data across these methods for parallelization (including when and how to batch them together), and making this choice for users within our existing APIs
    • Benchmark the efficiency of different solutions, possibly in different environments such as Openmind or Google Colab. This will inform when to split across many CPUs vs. when to use a GPU, and what batch size is optimal for a GPU
  • Replace existing interfaces/APIs with this logic under the hood to optimize our existing code
  • Ensure batching/batch size is exposed across our functionalities and used properly
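One way the first task's "take a reference to this function and optimize inputs to it" idea could look is sketched below, using only the standard library rather than Pydra itself (the wrapper name, batch size, and worker count are illustrative assumptions, not senselab's actual design):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallelize(
    fn: Callable[[List[T]], List[R]],
    items: List[T],
    batch_size: int = 8,
    max_workers: int = 8,
) -> List[R]:
    """Split `items` into batches, run `fn` on each batch in parallel
    threads, and flatten the per-batch results back into one list."""
    batches = [items[i : i + batch_size] for i in range(0, len(items), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(fn, batches)  # preserves batch order
    return [r for batch in results for r in batch]


# Usage: any function that takes a list can be wrapped transparently.
doubled = parallelize(lambda xs: [x * 2 for x in xs], list(range(64)))
```

A real implementation would presumably use Pydra's splitter/combiner machinery instead of a thread pool, but the shape of the API (wrap an existing list-taking function, choose the split automatically) would be the same.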

Freeform Notes

No response

@wilke0818 wilke0818 added the enhancement New feature or request label Aug 15, 2024
@wilke0818
Collaborator Author

I had some existing code/ideas from working on SER and trying to make a tutorial for it, but before I got to the benchmarking stage I was pulled into other work that took priority. I was developing from this Colab when I ran into the original Pydra issues that sidetracked me:
https://colab.research.google.com/drive/1dNR1omKar-weU94PCib3zapV-5Ab-zJK?usp=sharing

@adi611

adi611 commented Sep 29, 2024

Hi, I have some experience working with Pydra and would be happy to help with any tasks or subtasks related to it. cc: @wilke0818 @fabiocat93

@fabiocat93
Collaborator

> Hi, I have some experience working with Pydra and would be happy to help with any tasks or subtasks related to it. cc: @wilke0818 @fabiocat93

Thank you, @adi611! @wilke0818, would you mind outlining the issue you encountered in a simple, reproducible manner so that @adi611 can explore and help with it?

@fabiocat93 fabiocat93 moved this to Backlog in senselab Nov 18, 2024
@fabiocat93 fabiocat93 moved this from Backlog to Needs brainstorming in senselab Nov 18, 2024