Saving GGUF for Ollama: CUDA driver error: out of memory #1389

criogennn · 2024-12-05T21:00:59Z

Is it possible that my video memory is sufficient for training the model but insufficient for saving it in the GGUF format? I have an RTX 3050 with 8 GB of VRAM. I receive the error "CUDA driver error: out of memory" when running:

model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")

Does this error necessarily indicate a lack of memory, or could it mean something else? I would appreciate any assistance.


danielhanchen · 2024-12-12T10:06:47Z

@criogennn Apologies for the delay. Could you try lowering the memory cap when saving:

model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m", maximum_memory_usage = 0.7)
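If that still fails, it may help to check how much VRAM is actually free right before the save, since memory still held from training in the same process can leave little headroom. Below is a minimal sketch, assuming PyTorch with CUDA is installed. `report_gpu_memory` is a hypothetical helper name, not part of Unsloth; it relies only on `torch.cuda.empty_cache()` and `torch.cuda.mem_get_info()`.

```python
import gc


def report_gpu_memory():
    """Return (free_gib, total_gib) for the current CUDA device.

    Returns None when PyTorch or a CUDA device is not available.
    Hypothetical helper for illustration; not an Unsloth function.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    # Drop unreachable Python objects, then release cached blocks
    # that PyTorch is holding but no longer using.
    gc.collect()
    torch.cuda.empty_cache()

    # mem_get_info() returns (free_bytes, total_bytes) for the device.
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 2**30, total_bytes / 2**30


if __name__ == "__main__":
    info = report_gpu_memory()
    if info is None:
        print("No CUDA device visible to PyTorch.")
    else:
        free_gib, total_gib = info
        print(f"Free VRAM: {free_gib:.2f} GiB of {total_gib:.2f} GiB")
```

If the free figure is much lower than the card's 8 GB, the error is probably a genuine lack of memory rather than something else, and running the save in a fresh process after training may be worth trying.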
