
Partial weight loading for reduced RAM utilization #5626

Open
kushalpatil07 opened this issue Sep 25, 2024 · 1 comment
Assignees
Labels
feature A request for a proper, new feature. module: runtime Issues related to core runtime triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@kushalpatil07

🚀 The feature, motivation and pitch

I'm trying to use ExecuTorch for running LLMs on low-powered (low-RAM) devices. I would like the weights not to all be loaded into RAM when the model is initially loaded. Instead, weights should be loaded on demand as the forward pass progresses, reducing peak RAM utilization.

This can be further extended by predicting which sparse weights to load, as described in Apple's paper "LLM in a flash", which reported an inference speed-up along with lower RAM usage.
Link here
@iseeyuan
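As a rough illustration of the idea (not ExecuTorch's actual API), the pattern can be sketched with memory-mapped weights: the file is mapped rather than read, so the OS pages a layer's weights into RAM only when the forward pass first touches them. All names and shapes below are hypothetical.

```python
import os
import tempfile

import numpy as np

# Toy model geometry (hypothetical, for illustration only).
NUM_LAYERS = 3
LAYER_SHAPE = (4, 4)

# Write dummy weights to disk, standing in for a model file's constant segment.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
rng = np.random.default_rng(0)
all_weights = rng.standard_normal((NUM_LAYERS, *LAYER_SHAPE)).astype(np.float32)
all_weights.tofile(path)

# Memory-map the file instead of reading it eagerly: no layer is resident in
# RAM until its pages are first accessed.
mapped = np.memmap(path, dtype=np.float32, mode="r",
                   shape=(NUM_LAYERS, *LAYER_SHAPE))

def forward(x: np.ndarray) -> np.ndarray:
    """Run x through all layers, touching each layer's weights on demand."""
    for i in range(NUM_LAYERS):
        w = mapped[i]                # page-in happens here, not at load time
        x = np.maximum(x @ w, 0.0)   # toy linear + ReLU layer
    return x

out = forward(np.ones(4, dtype=np.float32))
```

With this layout, peak resident memory is driven by the working set the kernel keeps paged in, rather than the full weight file; the predictive sparse-loading idea from "LLM in a flash" would go further by prefetching only the weight rows expected to be active.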

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

@Olivia-liu Olivia-liu added feature A request for a proper, new feature. module: runtime Issues related to core runtime triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Sep 25, 2024
@a8nova

a8nova commented Oct 4, 2024

Hi, if no one is working on this, I can contribute to this task. Please let me know!


4 participants