Extracted Subset of AutoGPTQ library into Accelerated-Peft Plugin #48
Merged
fabianlim merged 20 commits into foundation-model-stack:main from achew010:extracted_autogptq on Jul 15, 2024
Conversation
fabianlim requested changes Jul 3, 2024
it looks quite good overall, but requesting first round of changes
Review threads on plugins/accelerated-peft/src/fms_acceleration_peft/framework_plugin_autogptq.py
achew010 force-pushed the extracted_autogptq branch from 61e0b56 to b42d401 on July 4, 2024 06:45
fabianlim reviewed Jul 4, 2024
@achew010 this needs formatting, and some bench results
achew010 force-pushed the extracted_autogptq branch 6 times, most recently from dab9a8d to 0858912 on July 8, 2024 08:02
achew010 force-pushed the extracted_autogptq branch from 1cf8811 to 1f35ea4 on July 11, 2024 06:16
fabianlim reviewed Jul 11, 2024
fabianlim reviewed Jul 12, 2024
Review thread on ...used-ops-and-kernels/src/fms_acceleration_foak/fused_ops/unsloth_lora/gptq/triton/kernels.py
achew010 force-pushed the extracted_autogptq branch from 04b7817 to 61fe08c on July 15, 2024 04:22
fabianlim approved these changes Jul 15, 2024
approved
Merged
fabianlim added a commit that referenced this pull request on Nov 5, 2024
Signed-off-by: Yu Chin Fabian Lim <[email protected]>
fabianlim added a commit that referenced this pull request on Nov 8, 2024
* remove skip on test now #48 is complete
* fix fusedops test
* fix model patching in test
* fix test to tail on input grads
* fix dropout in fused_lora
* fmt + lint
Signed-off-by: Yu Chin Fabian Lim <[email protected]>
Description
This PR addresses #38 and extracts a subset of GPTQModel, a refactored fork of AutoGPTQ, into `fms_acceleration_peft/src/gptqmodel` to do away with the problematic installation of AutoGPTQ, which requires `cudatoolkit`.
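For context only, here is a minimal sketch of what swapping the external dependency for the vendored subset can look like; the import paths and alias below are assumptions for illustration, not the plugin's actual code:

```python
# Illustration only: prefer the vendored gptqmodel subset, fall back to the
# external AutoGPTQ package. Import paths here are assumptions.
try:
    # subset extracted from GPTQModel and shipped inside the plugin
    from fms_acceleration_peft.gptqmodel import GPTQModel as QuantBackend  # hypothetical path
except ImportError:
    # external package whose installation pulls in cudatoolkit
    from auto_gptq import AutoGPTQForCausalLM as QuantBackend  # real package, aliased only for illustration
```

Either way, the rest of the plugin can then talk to a single backend name instead of depending on how AutoGPTQ was installed.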
Additions
- `src/gptqmodel` containing extracted code
- `tests/test_gptq_model.py` to ensure the extracted `gptqmodel` subset maintains the same behaviour as the original (a hedged sketch of such a check follows this list)
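The following is only a sketch of the kind of equivalence check such a test can perform, assuming both implementations expose the same quantized linear layer; it is not the actual content of `tests/test_gptq_model.py`:

```python
# Sketch, not the actual test: given the same GPTQ checkpoint loaded once by
# the official AutoGPTQ package and once by the extracted gptqmodel subset,
# their forward outputs should agree within fp16 tolerance.
import torch


def assert_equivalent_forward(layer_ref, layer_new, hidden_size=4096, seq_len=16):
    # layer_ref / layer_new: the same quantized linear built by the two
    # implementations (how they are constructed is outside this sketch)
    torch.manual_seed(0)
    x = torch.randn(2, seq_len, hidden_size, dtype=torch.float16, device="cuda")
    with torch.no_grad():
        out_ref = layer_ref(x)
        out_new = layer_new(x)
    torch.testing.assert_close(out_new, out_ref, rtol=1e-3, atol=1e-3)
```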
Issues
- Comparing new benchmarks against our current reference in `scripts/benchmarks/ref`, we noticed that a non-zero `lora dropout` will incur some memory overhead that makes experiments for large models run out of memory (elaborated in Quantized Peft Benchmark Experiments Run Out of Memory with Non-Zero Lora Dropout #50). The comparison tool will pick up this difference in experiment results as an outlier, but it will also flag the parameter change in the report.
- Temporary fix to the FOAK dequantization Triton kernel to only apply the offset if using the official AutoGPTQ package (see the sketch after this list).
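On the kernel fix above: AutoGPTQ checkpoints historically pack zero points shifted down by one, so dequantization kernels add the one back when unpacking; applying that offset only for the official package suggests the extracted subset packs zeros without it. A rough pure-PyTorch illustration of the conditional offset (the real change lives in the FOAK Triton kernel, and these names are illustrative):

```python
# Illustration only: conditional zero-point offset during GPTQ dequantization.
import torch


def dequantize(qweight, zeros, scales, use_official_autogptq: bool) -> torch.Tensor:
    # qweight, zeros: already-unpacked integer tensors; scales: per-group fp16 scales
    if use_official_autogptq:
        # AutoGPTQ stores zeros minus one at pack time, so add it back here
        zeros = zeros + 1
    return scales * (qweight.float() - zeros.float())
```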
Benchmarks
There seems to be an improvement in throughput with the new library on FOAK. The tables below compare the throughput from our reference against the updated benches on Mistral-7B-GPTQ and Llama2-70B-GPTQ:
- We see throughput similar to the previous reference for the accelerated-peft-autogptq plugin.
- We see higher throughput on the FOAK rows.
[Benchmark tables: Mistral-7B-GPTQ (Reference vs. Updated) and Llama2-70B-GPTQ (Reference vs. Updated)]
Unit Tests
Comparison Tool
The tool compares the set of benchmark results against a previous reference. It generates a chart for every metric compared (e.g. `train_loss`, `train_tokens_per_second`, `mem_alloc`, ...) as well as a CSV file of outliers that are significantly different from the reference.
Usage
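The original usage snippet is not reproduced here. Purely as an illustration of the behaviour described above (flagging rows that drift from the reference into an outliers CSV), and not the repository's actual comparison script, here is a minimal pandas sketch with assumed file names, join keys, and threshold:

```python
# Illustration only: flag benchmark rows whose metrics drift from the reference.
import pandas as pd

METRICS = ["train_loss", "train_tokens_per_second", "mem_alloc"]  # metrics named above
THRESHOLD = 0.1  # assumed relative-difference cutoff

ref = pd.read_csv("scripts/benchmarks/ref/benchmarks.csv")  # assumed reference file
new = pd.read_csv("benchmark_outputs/benchmarks.csv")       # assumed new results file

# assumed join keys identifying an experiment
merged = ref.merge(new, on=["model_name", "framework_config"], suffixes=("_reference", "_new"))

outliers = []
for metric in METRICS:
    rel_diff = (merged[f"{metric}_new"] - merged[f"{metric}_reference"]).abs() / merged[
        f"{metric}_reference"
    ].abs()
    flagged = merged.loc[rel_diff > THRESHOLD].copy()
    flagged["metric"] = metric
    outliers.append(flagged)

pd.concat(outliers).to_csv("outliers.csv", index=False)  # analogous to the outlier.csv below
```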
Chart:
Generally we see that the new benchmark results from the extracted gptq package (New axis) match closely with those of the previous benchmark using the official autogptq package (Ref axis).
Table:
In the table below, the values in the `reference` column refer to values seen in previous benchmarks and the values in the `new` column refer to values seen in the current benchmark. Outliers will show a significant difference between the two columns. The outliers seen below are reported due to the OOM issue in #50.
Note: Any hyperparameter difference between the new bench results and the reference will appear in the rightmost columns, appended after `reference` and `new`.
outlier.csv