feat: Publish/load pre-quantized models #34

carlosgjs · 2024-02-01T00:12:50Z

Running quantized models significantly reduces the GPU memory required for inference. Instead of downloading the full model and quantizing it while loading, we can quantize the model offline and save it. At runtime, the smaller quantized model is then loaded directly.
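A minimal sketch of the offline step with Hugging Face transformers and bitsandbytes follows. It assumes 4-bit NF4 quantization with float16 compute, and library versions recent enough to serialize 4-bit weights. The function names `quantize_and_save` and `load_prequantized` and the quantization settings are hypothetical and illustrative, not taken from this PR's notebook.

```python
def quantize_and_save(model_id: str, output_dir: str) -> None:
    """Load `model_id` with 4-bit bitsandbytes quantization and save the result."""
    # Imported lazily: these need torch, transformers, bitsandbytes and a CUDA GPU.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Assumed settings: NF4 4-bit weights, float16 compute dtype.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    # The full-precision weights are quantized as they are loaded onto the GPU.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Writes the quantized weights plus a config.json that records the quantization settings.
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)


def load_prequantized(path_or_repo: str):
    """Load a checkpoint written by quantize_and_save (local directory or Hub repo)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # No quantization_config argument: it is read back from the saved config.json.
    model = AutoModelForCausalLM.from_pretrained(path_or_repo, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(path_or_repo)
    return model, tokenizer
```

To publish instead of (or in addition to) saving locally, `model.push_to_hub(repo_id)` and `tokenizer.push_to_hub(repo_id)` upload the same files to the Hugging Face Hub, which requires an authenticated session. Only the smaller quantized checkpoint is downloaded at runtime.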

This PR includes 3 changes:

- Updates to the bitsandbytes and transformers versions that support quantizing
- A notebook for quantizing and publishing models
- Logic to map known models to their pre-quantized versions
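The third change, mapping known models to pre-quantized versions, can be sketched as a lookup table consulted before loading. This is a hypothetical illustration: the `QUANTIZED_MODELS` table, the `your-org/...-nf4` repository name, and `resolve_model` are placeholders, not the PR's actual identifiers.

```python
from typing import Dict, Tuple

# Hypothetical mapping from a base model ID to its pre-quantized counterpart.
QUANTIZED_MODELS: Dict[str, str] = {
    "meta-llama/Llama-2-7b-chat-hf": "your-org/Llama-2-7b-chat-hf-nf4",
}


def resolve_model(model_name: str, quantize: bool) -> Tuple[str, bool]:
    """Return (model_id_to_load, quantize_at_load_time)."""
    if quantize and model_name in QUANTIZED_MODELS:
        # A pre-quantized checkpoint exists: load it as-is, no runtime quantization.
        return QUANTIZED_MODELS[model_name], False
    # Unknown model or quantization not requested: load the original weights,
    # quantizing during the load only if the caller asked for it.
    return model_name, quantize
```

Returning the flag alongside the ID lets the caller skip passing a quantization config when the redirected checkpoint already carries one, while unknown models fall back to quantizing during the load.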

Closes #4
Closes #8

codecov-commenter · 2024-02-01T00:16:33Z

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (e7c86f5) 97.32% compared to head (c864e8c) 97.85%.

| Files | Patch % | Lines |
|---|---|---|
| src/autora/doc/runtime/predict_hf.py | 93.75% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #34      +/-   ##
==========================================
+ Coverage   97.32%   97.85%   +0.53%     
==========================================
  Files           5        5              
  Lines         224      233       +9     
==========================================
+ Hits          218      228      +10     
+ Misses          6        5       -1

uwcdc

Those were the only two things I had questions about, but I love the updated formatting in the other files.

notebooks/import_model.ipynb

uwcdc

Overall good. Just a couple of small fixes, plus a compliment.

notebooks/generate.ipynb

notebooks/import_model.ipynb

src/autora/doc/runtime/predict_hf.py

uwcdc

LGTM!

carlosgjs and others added 3 commits January 29, 2024 15:17

feat: Import/save models

d05c094

update deps

35dcbec

redirect to quantized model

2a98332

carlosgjs and others added 3 commits February 1, 2024 21:00

handle config for pre-quantized models

d5c5542

fix formatting

90c1099

unit tests

cc6bf7e

carlosgjs requested review from lsetiawan, anujsinha3 and uwcdc February 1, 2024 21:43

carlosgjs added 2 commits February 1, 2024 13:47

install cuda during testing

a12d245

Merge branch 'main' into carlosg/savemodel

416f11f

uwcdc reviewed Feb 1, 2024

notebooks/import_model.ipynb (resolved)

notebooks/import_model.ipynb (outdated, resolved)

carlosgjs and others added 2 commits February 2, 2024 10:08

Change quantized model path

b7ea55f

update notebook

90d2995

carlosgjs requested a review from uwcdc February 2, 2024 19:01

carlosgjs marked this pull request as ready for review February 2, 2024 19:01

carlosgjs added 4 commits February 2, 2024 11:56

formatting

4c052bd

remove huggingface login reqs from notebook

a169f53

small instruction tweak in notebook

be86df4

Save output as unicode

5dc1b91

uwcdc reviewed Feb 5, 2024

notebooks/generate.ipynb (resolved)

notebooks/import_model.ipynb (outdated, resolved)

src/autora/doc/runtime/predict_hf.py (resolved)

better HF auth

c864e8c

uwcdc approved these changes Feb 5, 2024

carlosgjs merged commit 7891902 into main Feb 5, 2024
9 checks passed

carlosgjs deleted the carlosg/savemodel branch February 5, 2024 23:41
