Has anyone done fine-tuning on CPU instead of GPU? If you prefix the "accelerate" command with:
CUDA_VISIBLE_DEVICES=""
then it will run using CPU. For example:
CUDA_VISIBLE_DEVICES="" accelerate launch -m axolotl.cli.train config_lora_solar.yml
However, unlike other systems, it doesn't seem to use all the available CPUs / cores; it looks like it runs single-threaded. Has anyone been able to make Axolotl use all the available CPU horsepower? (A couple of generic knobs worth trying are sketched below.)
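For what it's worth, the single-threaded behaviour may just be default thread settings rather than anything Axolotl-specific. These are the standard PyTorch/OpenMP and accelerate knobs, not a confirmed Axolotl feature, and I haven't verified how accelerate's own defaults interact with them, so treat this as a sketch:
# Tell PyTorch/OpenMP how many intra-op threads to use (match your physical core count)
OMP_NUM_THREADS=16 MKL_NUM_THREADS=16 CUDA_VISIBLE_DEVICES="" accelerate launch -m axolotl.cli.train config_lora_solar.yml
# accelerate also has a per-process CPU thread flag, which defaults low:
CUDA_VISIBLE_DEVICES="" accelerate launch --num_cpu_threads_per_process 16 -m axolotl.cli.train config_lora_solar.yml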
Why run on CPU?
One word: RAM
For models that exceed the available local GPU RAM (nearly all models >7B parameters do), the only way to train locally is on the CPU, since the technology for splitting a model between GPU and CPU is still very young and doesn't work well.
If you are not in a rush, and you have a powerful set of CPUs with lots of cores, then running on CPU is perfectly feasible (and in the case of Apple, it can be equivalent to running on GPU). But the big advantage is that unlike GPU memory, CPU memory can be expanded and is (relatively) cheap. And the single biggest challenge with self-hosted training is memory.
For quick, light jobs (and for inference), commodity GPUs are the answer. But for training that can afford to run in the slow lane, CPU is a great alternative.
The problem we have is making the best use of the available threads. I don't see how Axolotl does this other than with the data_processes parameter, which is only used for data preprocessing, not the actual training.
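One quick way to check whether the training step really is limited to a single thread is to print PyTorch's intra-op thread count (plain PyTorch, nothing Axolotl-specific):
python -c "import torch; print(torch.get_num_threads())"
If that prints 1, exporting OMP_NUM_THREADS before launching (as above) should raise it.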
Anyone had any experience/ideas on this?