
fix: Deprecation Warnings in AutoCast API #113

Merged: 7 commits into foundation-model-stack:main on Dec 2, 2024

Conversation

@Abhishek-TAMU (Collaborator) commented Nov 27, 2024

This PR fixes issue: #107

Changes:
Modified the autocast decorators on the autograd function to @torch.amp.custom_fwd and @torch.amp.custom_bwd, which replace the deprecated torch.cuda.amp.custom_fwd and torch.cuda.amp.custom_bwd.
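
For context, a minimal sketch of what the change looks like on a torch.autograd.Function. This is not the repository's actual kernel code: ScaleByTwo is a hypothetical stand-in, and device_type="cuda" is assumed (the torch.amp decorators take an explicit device_type, unlike the deprecated torch.cuda.amp variants).

    # Minimal illustration of the decorator change; ScaleByTwo is a
    # hypothetical stand-in for the fused-ops autograd functions in this repo.
    import torch

    class ScaleByTwo(torch.autograd.Function):
        @staticmethod
        @torch.amp.custom_fwd(device_type="cuda")  # was: @torch.cuda.amp.custom_fwd (deprecated)
        def forward(ctx, x):
            return x * 2

        @staticmethod
        @torch.amp.custom_bwd(device_type="cuda")  # was: @torch.cuda.amp.custom_bwd (deprecated)
        def backward(ctx, grad_output):
            return grad_output * 2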

@fabianlim (Contributor) commented Nov 27, 2024

@Abhishek-TAMU it looks good in general.

I think it needs a lint pass; try tox -e fmt,lint in the accelerated-peft and fused-ops-and-kernels folders.

It would also be good to run a local bench test to see whether there is any regression in performance (I highly doubt there is).

To do this, run tox -e run-benches and then:

    PYTHONPATH=. \
    python scripts/compare_with_reference.py \
        --result_dir $RESULT_DIR \
        --reference_benchmark_filepath scripts/benchmarks/refs/a100_80gb.csv \
        --indices \
            framework_config model_name_or_path \
            num_gpus per_device_train_batch_size

If you want it to run faster, you can comment out the other models and leave just this line.

@Abhishek-TAMU (Collaborator, Author)

Thank you @fabianlim for the guidance.

I fixed the fmt and lint issues. I also ran tox -e run-benches and then compare_with_reference.py using the command above.
There is no outlier.csv in the benchmark_outputs folder, so I assume that is how you check for outliers and decide whether new code changes affect anything.

[screenshot: benchmark comparison output]
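
For illustration only, a rough sketch of how such an outlier check could work; this is not the actual compare_with_reference.py script, and the metric column name and the 5% tolerance are assumptions.

    # Hypothetical sketch: compare new results to a reference benchmark CSV
    # and flag rows whose metric drifted beyond a tolerance.
    import pandas as pd

    INDICES = [
        "framework_config", "model_name_or_path",
        "num_gpus", "per_device_train_batch_size",
    ]

    def find_outliers(result_csv, reference_csv, metric="train_loss", tol=0.05):
        result = pd.read_csv(result_csv).set_index(INDICES)
        reference = pd.read_csv(reference_csv).set_index(INDICES)
        joined = result[[metric]].join(
            reference[[metric]], lsuffix="_new", rsuffix="_ref"
        ).dropna()
        rel_diff = (
            joined[f"{metric}_new"] - joined[f"{metric}_ref"]
        ).abs() / joined[f"{metric}_ref"].abs()
        return joined[rel_diff > tol]  # empty frame -> nothing flagged

    # outliers = find_outliers("benchmark_outputs/benchmarks.csv",
    #                          "scripts/benchmarks/refs/a100_80gb.csv")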

@Abhishek-TAMU (Collaborator, Author)

Also, to run tox -e run-benches I had to add datasets and trl to the [testenv:run-benches] section of tox.ini (a sketch of the change follows below).
If this sounds good, I can push this change as well.

[screenshot]
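
For reference, the local workaround described above would look roughly like the fragment below. This is a hypothetical sketch: the existing contents of [testenv:run-benches] are elided, and (as the next comment notes) these dependencies normally come in via fms-hf-tuning.

    # Hypothetical tox.ini fragment; only the two added lines reflect the
    # change described above.
    [testenv:run-benches]
    deps =
        datasets
        trl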

@fabianlim (Contributor) commented Nov 28, 2024

It should not be the case, because those should come in with fms-hf-tuning. Also, I think something is wrong with the bench; it should take a while to run (a few hours). Can you open your benchmarks.csv and see what is inside?

@Abhishek-TAMU (Collaborator, Author) commented Nov 28, 2024

benchmarks.csv is empty except for the header below, after running tox -e run-benches and compare_with_reference.py:

    framework_config,mem_nvidia_mem_reserved,model_name_or_path,num_gpus,peft_method,per_device_train_batch_size,torch_dtype

Also, I commented out this line and the next line and just left the model TheBloke/Mistral-7B-v0.1-GPTQ.

@fabianlim (Contributor) commented Nov 28, 2024

@Abhishek-TAMU you need to inspect the benchmark_outputs folder and see what failed. If benchmarks.csv is empty it means everything failed.

@Abhishek-TAMU (Collaborator, Author)

Thanks for the input, Fabian. Sharing the benchmark_outputs folder here. There doesn't seem to be any regression in performance.

@fabianlim (Contributor) commented Nov 30, 2024

@Abhishek-TAMU looking at your benches, the train loss seems to be a bit higher; can you take a quick look?

  • Maybe take one example, revert the change, run it again, and see if you get the old loss.
  • If the loss is still higher after the revert, try reverting to older package versions and see whether it depends on that.

[screenshots: train-loss comparison plots]

@Abhishek-TAMU (Collaborator, Author)

@fabianlim Running the benchmark from the main branch code (without my changes) gives the train-loss plot below (the other plots are the same). Do you think this is a significant change in train loss compared to the loss with my code changes?

[plot: compare-train_loss]

@fabianlim (Contributor) commented Dec 2, 2024

@Abhishek-TAMU I see, OK, then it's due to variation. I approve. Also, did you verify that the warning messages went away?

@fabianlim (Contributor) left a review comment:

LGTM and there are no regressions

@Abhishek-TAMU (Collaborator, Author)

Also did you verify that the warning messages went away?

Yes, I checked by running fms-hf-tuning with the config arguments below and did not get those warnings again after the code changes in this PR.

"auto_gptq": ["triton_v2"],
"fp16": true,
"torch_dtype": "float16",
"fast_kernels": [true, true, true]

@fabianlim merged commit c70ffe0 into foundation-model-stack:main on Dec 2, 2024
7 checks passed