Finetuning 2b out of memory on Kaggle T4 x 2 #54

Open
yudataguy opened this issue Oct 23, 2024 · 1 comment

Comments

@yudataguy

I'm following colabs/fine_tuning_tutorial.ipynb, but still ran out of memory at this step:
params = params_lib.load_and_format_params(ckpt_path)

Error message:

E1023 00:10:32.928098      30 pjrt_stream_executor_client.cc:2809] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 1049100288 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
             parameter allocation: 1000.50MiB
              constant allocation:         0B
        maybe_live_out allocation: 1000.50MiB
     preallocated temp allocation:         0B
                 total allocation:    1.95GiB
              total fragmentation:         0B (0.00%)
Peak buffers:
	Buffer 1:
		Size: 1000.50MiB
		Entry Parameter Subshape: bf16[256128,2048]
		==========================

	Buffer 2:
		Size: 1000.50MiB
		XLA Label: fusion
		Shape: bf16[256128,2048]
		==========================

I thought a 2b model shouldn't run out of memory on T4 x 2. How can I solve this issue?
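
For context, one way to check how much memory each GPU already has in use before the load is to query JAX's per-device memory stats (a minimal sketch; memory_stats() may return None on some backends):

```python
import jax

# Report per-device memory usage for every visible accelerator.
for device in jax.local_devices():
    stats = device.memory_stats()  # may be None depending on the backend
    if stats:
        in_use = stats.get("bytes_in_use", 0)
        limit = stats.get("bytes_limit", 0)
        print(f"{device}: {in_use / 1e9:.2f} GB in use of {limit / 1e9:.2f} GB")
    else:
        print(f"{device}: memory stats unavailable")
```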

@Gopi-Uppari

Hi @yudataguy,

I was able to reproduce the issue on Kaggle T4 x2 GPUs, but the error did not occur when I ran the same code in Google Colab with the v4 runtime (as mentioned in the tutorial notebook). On Kaggle, one GPU's memory is fully occupied, and params = params_lib.load_and_format_params(ckpt_path) does not automatically make use of the free GPU, which points to a memory-allocation issue. To work around the error, please refer to the code below and the linked gist for more details.
[screenshot of the suggested workaround code]
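
(The original screenshot is not reproduced here. Below is a rough sketch of this kind of workaround, not the exact code from the gist; it assumes the idea is to keep JAX from preallocating an entire GPU up front and to pin the checkpoint load to the GPU that still has free memory.)

```python
import os

# Must be set before JAX initializes its backends:
# allocate GPU memory on demand instead of grabbing most of one GPU up front.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

import jax
from gemma import params as params_lib

ckpt_path = "/kaggle/input/..."  # hypothetical; use your checkpoint path

# Pin the load to the GPU that still has memory available
# (index 1 is only an example; pick the free device on your machine).
free_gpu = jax.devices("gpu")[1]
with jax.default_device(free_gpu):
    params = params_lib.load_and_format_params(ckpt_path)
```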

Thank you.
