Fix too sensitive "Unsloth currently does not support multi GPU setups" when training with a single GPU in a multi-GPU environment. #1295

giuliabaldini · 2024-11-15T08:26:51Z

Hi there,

this PR has the changes requested in #974. I unfortunately don't have a system where I can test this myself, but I have been testing it with other people on a cluster that has multiple GPUs.

The only problem is that I think that the fix at llama.py:1694 does not seem to work, as we are still getting the error. So to make it run we have actually removed this check. Any ideas of how to fix that? Is it problematic to remove that check there?

@hife-ai @Datta0 @Sehyo

danielhanchen · 2024-11-17T09:02:13Z

Will re-investigate this - apologies on the delay!

Datta0 · 2024-11-17T16:12:42Z

Btw just thinking out loud (or thinking as written text)
Should we consolidate all these multi GPU errors into a single function? rn I see there's check_nvidia and the other part of code in from_pretrained.

giuliabaldini · 2024-11-18T08:09:06Z

@Datta0, yeah, I definitely agree. However I am not incredibly familiar with patching functions this way, wouldn't the function have to be part of all the patched code, meaning that we have to rewrite it every time?

Peter-Fy · 2024-11-19T07:55:10Z

I tried deleting the check code in tokenizer_utils.py and llama.py, but I’m still getting the following error:

Traceback (most recent call last):

  File "/home/fdf/dpo_finetune.py", line 116, in <module>

    main()

  File "/home/fdf/dpo_finetune.py", line 108, in main

    trainer.train()

  File "<string>", line 40, in train

RuntimeError: Unsloth currently does not support multi GPU setups - but we are working on it!

However, I don’t know which line in unsloth triggered this error, so I can’t proceed to delete the check code further.

Sehyo · 2024-11-19T07:59:31Z

Hi guys, I have been a bit busy. I can submit version with all the fixes on either thursday or friday, have a hectic schedule until then.

giuliabaldini · 2024-11-19T09:05:33Z

Hi @Peter-Fy, did you try to install unsloth from this PR branch? Do you still get the error?

Peter-Fy · 2024-11-21T05:37:12Z

Hi @Peter-Fy, did you try to install unsloth from this PR branch? Do you still get the error?

Yes, I install unsloth from this PR branch, but I still get the error like:

Traceback (most recent call last):
  File "/home/fdf/qlora_finetune.py", line 133, in <module>
    main()
  File "/home/fdf/qlora_finetune.py", line 125, in main
    trainer.train()
  File "<string>", line 39, in train
RuntimeError: tokenizer_utils.py:971 Unsloth currently does not support multi GPU setups - but we are working on it!

So I delete the check code in tokenizer_utils.py:971, but I get another error like:

Traceback (most recent call last):
  File "/home/fdf/qlora_finetune.py", line 133, in <module>
    main()
  File "/home/fdf/qlora_finetune.py", line 125, in main
    trainer.train()
  File "<string>", line 40, in train
RuntimeError: Unsloth currently does not support multi GPU setups - but we are working on it!

Peter-Fy · 2024-11-21T05:38:34Z

Hi guys, I have been a bit busy. I can submit version with all the fixes on either thursday or friday, have a hectic schedule until then.

That will be helpful, looking forward to your fixes.

giuliabaldini · 2024-11-29T07:42:43Z

@Peter-Fy sorry, I did not see the full conversion. Are you using any vision models?

giuliabaldini and others added 3 commits November 13, 2024 10:56

Add check for multiple GPUs in check_nvidia

2df7d85

Merge branch 'unslothai:main' into main

dabcc6b

Add GPU ID checks to make the multi gpu error less sensitive

2569ceb

giuliabaldini and others added 2 commits December 5, 2024 17:13

Merge branch 'unslothai:main' into main

cfc325a

Update line numbers

5f920ef

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix too sensitive "Unsloth currently does not support multi GPU setups" when training with a single GPU in a multi-GPU environment. #1295

Fix too sensitive "Unsloth currently does not support multi GPU setups" when training with a single GPU in a multi-GPU environment. #1295

giuliabaldini commented Nov 15, 2024

danielhanchen commented Nov 17, 2024

Datta0 commented Nov 17, 2024

giuliabaldini commented Nov 18, 2024

Peter-Fy commented Nov 19, 2024

Sehyo commented Nov 19, 2024

giuliabaldini commented Nov 19, 2024

Peter-Fy commented Nov 21, 2024

Peter-Fy commented Nov 21, 2024

giuliabaldini commented Nov 29, 2024

Fix too sensitive "Unsloth currently does not support multi GPU setups" when training with a single GPU in a multi-GPU environment. #1295

Are you sure you want to change the base?

Fix too sensitive "Unsloth currently does not support multi GPU setups" when training with a single GPU in a multi-GPU environment. #1295

Conversation

giuliabaldini commented Nov 15, 2024

danielhanchen commented Nov 17, 2024

Datta0 commented Nov 17, 2024

giuliabaldini commented Nov 18, 2024

Peter-Fy commented Nov 19, 2024

Sehyo commented Nov 19, 2024

giuliabaldini commented Nov 19, 2024

Peter-Fy commented Nov 21, 2024

Peter-Fy commented Nov 21, 2024

giuliabaldini commented Nov 29, 2024