Using two 8xH100 nodes to train, I encounter the error "bf16 requested, but AMP is not supported on this GPU. Requires Ampere series or above." #1924
Comments
The same settings work in regular (single-node) training.
Settings in accelerate:
This is the snippet for the multi-node worker (slave) settings:
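The snippet itself did not survive the copy. For reference, a typical accelerate multi-node worker config sets fields roughly like the following; every value here is a hypothetical placeholder, not the reporter's actual config:

```yaml
# Hypothetical accelerate config for the second (worker) node.
# Note: mixed_precision set here can interact with axolotl's own bf16 handling.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
machine_rank: 1            # 0 on the main node, 1 on this worker
main_process_ip: 10.0.0.1  # placeholder IP of the main node
main_process_port: 29500
num_machines: 2
num_processes: 16          # 8 GPUs per node x 2 nodes
mixed_precision: bf16
```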
I recommend not using the accelerate config and removing that file; axolotl handles much of that automatically. See https://axolotlai.substack.com/p/fine-tuning-llama-31b-waxolotl-on
OK, so is it the accelerate config causing the issue?
Often, it is.
We tried that and still hit the same issue. We also went through https://axolotlai.substack.com/p/fine-tuning-llama-31b-waxolotl-on, but it requires Axolotl Cloud; I'm using my own two 8xH100 clusters. Are there any scripts that work?
@michaellin99999 , hey! From my understanding, those scripts should work for any systems as Lambda just provides bare compute. Can you let us know if you still get this issue and how we can help solve it? |
Please check that this issue hasn't been reported before.
Expected Behavior
This issue should not occur, as the H100 definitely supports bf16.
Current behaviour
outputs error: Value error, bf16 requested, but AMP is not supported on this GPU. Requires Ampere series or above.
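This validation error keys off the device's CUDA compute capability (bf16 AMP needs sm_80 / Ampere or newer; an H100 is sm_90). A quick diagnostic sketch, assuming only that torch is installed, which you can run on each node to confirm what each rank actually sees (`diagnose_bf16` is a hypothetical helper, not part of axolotl):

```python
import torch

def diagnose_bf16():
    """Report why torch might reject bf16 AMP on this machine."""
    if not torch.cuda.is_available():
        # On multi-node setups this often means the launcher's environment
        # (e.g. CUDA_VISIBLE_DEVICES, drivers) is wrong on the worker node.
        return "no CUDA device visible (check drivers / CUDA_VISIBLE_DEVICES)"
    major, minor = torch.cuda.get_device_capability()
    name = torch.cuda.get_device_name()
    if major >= 8:  # Ampere (8.x) and Hopper (9.x, e.g. H100) support bf16
        return f"{name} (sm_{major}{minor}) supports bf16"
    return f"{name} (sm_{major}{minor}) predates Ampere; bf16 unsupported"

print(diagnose_bf16())
```

If this prints "no CUDA device visible" on the worker node, the error is likely an environment problem in how the remote processes are launched rather than the GPU itself.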
Steps to reproduce
run the script https://github.com/axolotl-ai-cloud/axolotl/blob/main/docs/multi-node.qmd
Config yaml
Possible solution
No idea what is causing this issue.
Which Operating Systems are you using?
Python Version
3.11.9
axolotl branch-commit
none
Acknowledgements