Has anyone done fine-tuning on CPU instead of GPU? If you prefix the "accelerate" command with:
CUDA_VISIBLE_DEVICES=""
then it will run using CPU. For example:
CUDA_VISIBLE_DEVICES="" accelerate launch -m axolotl.cli.train config_lora_solar.yml
However, unlike other systems, it doesn't seem to use all the available CPUs / cores; it looks like it runs single-threaded. Has anyone been able to make Axolotl use all the available CPU horsepower? (A couple of generic knobs worth trying are sketched below.)
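For what it's worth, the single-threaded behaviour may just be default thread settings rather than anything Axolotl-specific. These are the standard PyTorch/OpenMP and accelerate knobs, not a confirmed Axolotl feature, and I haven't verified how accelerate's own defaults interact with them, so treat this as a sketch:
# Tell PyTorch/OpenMP how many intra-op threads to use (match your physical core count)
OMP_NUM_THREADS=16 MKL_NUM_THREADS=16 CUDA_VISIBLE_DEVICES="" accelerate launch -m axolotl.cli.train config_lora_solar.yml
# accelerate also has a per-process CPU thread flag, which defaults low:
CUDA_VISIBLE_DEVICES="" accelerate launch --num_cpu_threads_per_process 16 -m axolotl.cli.train config_lora_solar.yml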
Why run on CPU?
One word: RAM
For models that exceed the available local GPU RAM (nearly all models >7B parameters do), the only way to train locally is on the CPU, since the technology for splitting a model between GPU and CPU is still very young and doesn't work well.
If you are not in a rush, and you have a powerful set of CPUs with lots of cores, then running on CPU is perfectly feasible (and in the case of Apple, it can be equivalent to running on GPU). But the big advantage is that unlike GPU memory, CPU memory can be expanded and is (relatively) cheap. And the single biggest challenge with self-hosted training is memory.
For quick, light jobs (and for inference), commodity GPUs are the answer. But for training that can afford to run in the slow lane, CPU is a great alternative.
The problem we have is making the best use of the available threads. I don't see how Axolotl does this other than with the data_processes parameter, which is only used for data preprocessing, not the actual training.
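One quick way to check whether the training step really is limited to a single thread is to print PyTorch's intra-op thread count (plain PyTorch, nothing Axolotl-specific):
python -c "import torch; print(torch.get_num_threads())"
If that prints 1, exporting OMP_NUM_THREADS before launching (as above) should raise it.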
Anyone had any experience/ideas on this?