[BUG] GPU id error when restarting on a different instance #760

Open
maxjeblick opened this issue Jun 21, 2024 · 0 comments
Labels
type/bug Bug in code

Comments

@maxjeblick
Contributor

🐛 Bug

Hi :D

This is a follow up on #99.

I'm currently using LLM Studio on different instances that have varying numbers of GPUs. The data and output folders are stored persistently, and I attach them when starting a new instance.

On a single-GPU instance, when I start from a previous experiment that was using GPU id 2, I get the following error:

ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`

rather than an error like "Gpu id 2 does not exist on this machine, please change your configuration".

It seems that

  • config checks do not account for gpu id mismatch
  • train.py either silently switches to cpu or raises the error above
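
A minimal sketch of the kind of check that could produce the clearer error message, assuming GPU ids are zero-based indices. The names `find_missing_gpu_ids` and `check_gpu_ids` are hypothetical and not part of LLM Studio; the visible GPU count is passed in as a plain integer so the logic stays independent of any framework.

```python
from typing import List, Sequence


def find_missing_gpu_ids(requested: Sequence[int], available_count: int) -> List[int]:
    """Return the requested GPU ids that are not present on this machine.

    Assumes zero-based ids, so valid ids are 0 .. available_count - 1.
    """
    return [gpu_id for gpu_id in requested if not 0 <= gpu_id < available_count]


def check_gpu_ids(requested: Sequence[int], available_count: int) -> None:
    """Fail early with an explicit message instead of a misleading downstream error."""
    missing = find_missing_gpu_ids(requested, available_count)
    if missing:
        raise ValueError(
            f"GPU id(s) {missing} do not exist on this machine "
            f"({available_count} GPU(s) visible); please change your configuration."
        )
```

With PyTorch, the count could come from `torch.cuda.device_count()`, for example `check_gpu_ids(cfg_gpu_ids, torch.cuda.device_count())`, where `cfg_gpu_ids` stands in for whatever the restored experiment config stores. Running such a check before training starts would cover both symptoms: the misleading ImportError and the silent switch to CPU.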

To Reproduce

  1. Run an experiment on a machine with at least 2 GPUs and select GPU 1. Default settings are fine.
  2. Restart llm studio, with only gpu 0 as visible device.
  3. Click on New experiment from current experiment.
  4. The following message should appear: ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`
  5. With float16, etc., no error occurs, but the model trains on the CPU.

LLM Studio version

bc2ff4f

@maxjeblick maxjeblick added the type/bug Bug in code label Jun 21, 2024