Weight Stripping #3207

hgaspar · 2024-06-21T10:39:29Z

Enable creating engines (currently MXR files, eventually perhaps dynamic objects) without embedding the weights in the engine.

Use cases:
(1) Support compilation for various batch sizes without duplicating the weights.
(2) Support multiple execution configurations with different quantization options (including mixed precision) without having to embed the weights in every engine created.
(3) Multi-GPU execution may also benefit, especially when creating multiple multi-GPU execution configurations (partitions, execution schedules).

Technical considerations:
How do we treat literals?
Perhaps the MXR files need to contain the steps required to recreate the literals from the weights file, and that may require a new type (finalized literals vs. future literals or meta-literals).
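
To make the "future literal" idea concrete, here is a minimal sketch of a placeholder that records only a key into the weights file plus the steps needed to rebuild the final data. Everything in it is an assumption made for illustration: the FutureLiteral type, the step names ("scale", "quantize_int8"), and the use of a plain dict as the weights file are hypothetical and are not part of MIGraphX.

```python
from dataclasses import dataclass, field


@dataclass
class FutureLiteral:
    """Hypothetical placeholder kept in the engine instead of raw weight data."""
    key: str                                   # name of the tensor in the weights file
    steps: list = field(default_factory=list)  # ordered (op, arg) pairs to replay


def finalize(lit, weights):
    """Recreate the concrete values by replaying the recorded steps.

    `weights` stands in for the weights file: a dict of name -> flat list of floats.
    """
    data = list(weights[lit.key])
    for op, arg in lit.steps:
        if op == "scale":
            data = [x * arg for x in data]
        elif op == "quantize_int8":
            # arg is the quantization scale; clamp to the int8 range
            data = [max(-128, min(127, round(x / arg))) for x in data]
        else:
            raise ValueError(f"unknown step: {op}")
    return data


weights = {"fc.weight": [0.5, -1.0, 2.0]}
fp32_engine = FutureLiteral("fc.weight")
int8_engine = FutureLiteral("fc.weight", [("quantize_int8", 0.5)])
```

Under this sketch, two engines with different quantization options refer to the same stored tensor and differ only in the recorded steps, which is the property use case (2) asks for.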

eddieliao · 2024-07-10T16:01:31Z

Looking to work on this as an extension of weight streaming; do we have a specific format already for a weights file or is that something that needs to be decided?

eddieliao · 2024-07-31T14:24:51Z

List of items for basic proof of concept:

Remove weights from .mxr files
Save weights to separate file(s)
Load in weights during runtime
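
The three items above can be sketched end to end as follows. This is only an illustration under stated assumptions: the program is modeled as a list of dicts rather than a real MIGraphX program, the placeholder op name and the weights file layout (an 8-byte header length, a JSON index of offsets and sizes, then the raw bytes) are invented here, and no weights file format has been decided in this thread.

```python
import json
import struct


def strip_weights(program, weights_path):
    """Move literal payloads out of `program` and into the file at `weights_path`."""
    index, blob, stripped = {}, bytearray(), []
    for i, ins in enumerate(program):
        if ins["op"] == "literal":
            key = f"lit{i}"
            index[key] = [len(blob), len(ins["data"])]  # [offset, size] in the blob
            blob += ins["data"]
            stripped.append({"op": "fetch_literal", "key": key})  # carries no data
        else:
            stripped.append(dict(ins))
    header = json.dumps(index).encode()
    with open(weights_path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))  # header length, little-endian u64
        f.write(header)
        f.write(blob)
    return stripped


def load_weights(program, weights_path):
    """Replace every placeholder with a literal read back from the weights file."""
    with open(weights_path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        index = json.loads(f.read(header_len))
        blob = f.read()
    restored = []
    for ins in program:
        if ins["op"] == "fetch_literal":
            offset, size = index[ins["key"]]
            restored.append({"op": "literal", "data": blob[offset:offset + size]})
        else:
            restored.append(dict(ins))
    return restored
```

A round trip through strip_weights and then load_weights should give back a program equal to the original, which is a convenient check for the proof of concept.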

eddieliao · 2024-08-01T20:20:01Z

Replaced the current literals with a fetch_literals dummy instruction that contains no data. This greatly reduces .mxr size, although I still need to investigate why reading the model fails.

eddieliao · 2024-08-08T19:25:31Z

Added the ability to write and save weights in the strip_weights pass. Still need to figure out how to pass the output location to the pass (and remove the hard-coded location).

simberg-amd · 2024-08-23T19:00:12Z

Fixed an issue with writing weights and added a test that successfully reads the weights from the file and adds them back to the MXR file.

simberg-amd · 2024-08-23T19:52:42Z

Might look into taking the extra pass out of target.cpp and moving the logic into write_literals. That way, instead of adding the literal first and then removing it, write_literals would add the dummy instruction directly and save the weights to the file when strip_weights is requested. This adds no time to compilation and removes the need for MIGRAPHX_COPY_LITERALS{}.
Could also add a check for whether a weights file already exists for the model being compiled and, if so, skip writing the weights again for the newly compiled MXR file.
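
One possible shape for the "weights file already exists" check is sketched below. It assumes the weights file is named after a hash of its contents, so that engines compiled from the same model (for example at different batch sizes) resolve to the same file. That naming scheme and the function names are hypothetical and are not taken from the linked work.

```python
import hashlib
import os


def weights_file_for(blob, directory):
    """Derive a file name from the weight bytes (hypothetical naming scheme)."""
    digest = hashlib.sha256(blob).hexdigest()[:16]
    return os.path.join(directory, f"weights-{digest}.bin")


def write_weights_once(blob, directory):
    """Write the weights only if no matching file exists yet.

    Returns (path, written), where `written` is False when the file was reused.
    """
    path = weights_file_for(blob, directory)
    if os.path.exists(path):
        return path, False
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(blob)
    os.replace(tmp_path, path)  # rename so a partial write is never seen as complete
    return path, True
```

Checking for the file by name alone is cheap, but it is only as reliable as the naming scheme; a path chosen by the user would need some other way to confirm that the existing file matches the model.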

simberg-amd · 2024-08-28T17:29:46Z

Finished the above; going to look into different quantization options next.

hgaspar added the enhancement New feature or request label Jun 21, 2024

eddieliao self-assigned this Jul 10, 2024

simberg-amd self-assigned this Aug 23, 2024

simberg-amd linked a pull request Sep 4, 2024 that will close this issue: Weight stripping #3416 (Open)
