Weight Stripping #3207
Comments
Looking to work on this as an extension of weight streaming; do we have a specific format already for a weights file or is that something that needs to be decided?
List of items for basic proof of concept:
Replaced current literals with a
Added the ability to write and save weights in the
Fixed an issue with writing weights and added a test that successfully reads the weights from the file and adds them back to the MXR file.
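For anyone following along, here is a minimal sketch of what such a write/read round trip could look like. The weights file format has not been decided in this thread, so everything here is an assumption for illustration only: the layout (an 8-byte length, a JSON index, then raw float32 data), the function names `write_weights` and `read_weight`, and the tensor names are all hypothetical and are not MIGraphX APIs.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_weights(path, tensors):
    """Write named float32 tensors to one file.

    Assumed layout (hypothetical, not an agreed format):
    8-byte little-endian header length, JSON index, raw float32 payload.
    """
    index = {}
    payload = bytearray()
    for name, values in tensors.items():
        # Record where this tensor starts inside the payload.
        index[name] = {"offset": len(payload), "count": len(values)}
        payload += struct.pack(f"<{len(values)}f", *values)
    header = json.dumps(index).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))
        f.write(header)
        f.write(payload)


def read_weight(path, name):
    """Read a single named tensor back without loading the whole payload."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        index = json.loads(f.read(header_len).decode("utf-8"))
        entry = index[name]
        # Payload begins right after the length field and the JSON index.
        f.seek(8 + header_len + entry["offset"])
        raw = f.read(4 * entry["count"])
    return list(struct.unpack(f"<{entry['count']}f", raw))


# Round trip: strip the weights into a file, then restore one of them.
with tempfile.TemporaryDirectory() as tmp:
    weights_path = Path(tmp) / "model.weights"
    write_weights(
        weights_path,
        {"conv1.weight": [0.5, -1.25, 2.0], "fc.bias": [1.0]},
    )
    restored = read_weight(weights_path, "conv1.weight")
```

Keeping an index in front of the payload means a loader can seek directly to the tensor it needs, which would fit with the weight streaming work mentioned above; whether the real format should look like this is still open.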
Finished the above; going to look into different quantization options next.
Enable creating engines (currently, MXR files, eventually perhaps dynamic objects) without embedding the weights in the engine.
Use cases:
(1) Support compilation for various batch sizes without duplicating the weights.
(2) Support multiple execution configurations with different quantization options (including mixed precision), without necessarily having to embed the weights in all the created engines.
(3) Multi-GPU execution may also benefit from this, especially when creating multiple multi-GPU execution configurations (partitions, execution schedules).
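To make the storage argument behind these use cases concrete, here is a rough sketch comparing engines that each embed their weights with stripped engines that only reference a shared store by name. The `StrippedEngine` class, the engine and tensor names, and the sizes are all made up for illustration; none of this reflects the actual MXR layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrippedEngine:
    """Hypothetical engine description that holds weight names, not weight data."""
    name: str
    batch_size: int
    precision: str
    weight_refs: tuple


# One shared store of raw weight bytes (zero-filled placeholders, sizes are arbitrary).
weight_store = {
    "conv1.weight": bytes(4 * 1000),
    "fc.weight": bytes(4 * 500),
}

refs = ("conv1.weight", "fc.weight")
engines = [
    StrippedEngine("batch1_fp32", 1, "fp32", refs),
    StrippedEngine("batch8_fp32", 8, "fp32", refs),
    StrippedEngine("batch8_fp16", 8, "fp16", refs),
]


def embedded_size(engines, store):
    """Bytes of weights stored if every engine embeds its own copy."""
    return sum(len(store[r]) for e in engines for r in e.weight_refs)


def stripped_size(engines, store):
    """Bytes of weights stored if engines share one copy of each tensor."""
    unique_refs = {r for e in engines for r in e.weight_refs}
    return sum(len(store[r]) for r in unique_refs)


embedded = embedded_size(engines, weight_store)
shared = stripped_size(engines, weight_store)
```

With three configurations the embedded total is three times the shared total. The fp16 entry assumes the conversion happens when the weights are loaded, which is one possible design and not something decided here.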
Technical considerations:
How do we treat literals?
Perhaps the MXR files need to contain the steps required to recreate the literals from the weights file, and that may require a new type (finalized literals vs. a future literal or meta-literal).
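One way to picture the finalized vs. future literal split is sketched below. This is only an illustration of the idea, not a proposal for the real types: `FinalizedLiteral`, `FutureLiteral`, `finalize`, the step names in `OPS`, and the quantization recipe are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FinalizedLiteral:
    """A literal whose data is present and ready to use."""
    values: list


@dataclass
class FutureLiteral:
    """A placeholder: the name of a tensor in the weights file plus the
    steps needed to recreate the literal from it (hypothetical design)."""
    weight_name: str
    steps: list = field(default_factory=list)  # list of (op_name, argument)


# Hypothetical step vocabulary; a real one would mirror the compiler's passes.
OPS = {
    "scale": lambda values, s: [v * s for v in values],
    "round_to_int": lambda values, _: [round(v) for v in values],
    "clamp": lambda values, b: [min(max(v, b[0]), b[1]) for v in values],
}


def finalize(literal, weights):
    """Resolve a literal against loaded weights by replaying its steps."""
    if isinstance(literal, FinalizedLiteral):
        return literal
    values = list(weights[literal.weight_name])
    for op_name, arg in literal.steps:
        values = OPS[op_name](values, arg)
    return FinalizedLiteral(values)


# Weights as they would come out of the weights file.
weights = {"fc.weight": [0.5, -1.0, 2.0]}

# A stripped engine records how to rebuild an int8-style literal.
future = FutureLiteral(
    "fc.weight",
    steps=[("scale", 64.0), ("round_to_int", None), ("clamp", (-127, 127))],
)
resolved = finalize(future, weights)
```

Storing the steps next to the placeholder is what would let several engines with different quantization options share a single weights file, at the cost of replaying those steps at load time.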