`XLA_DISABLE_FUNCTIONALIZATION=0` with ZeRO-1 diverges for Mistral on NxD #26

michaelbenayoun · 2024-07-17T14:01:08Z

It seems that, depending on the `XLA_DISABLE_FUNCTIONALIZATION` flag and on whether ZeRO-1 is enabled, either the loss does not converge or we run out of memory (OOM).

System info

aws-neuronx-runtime-discovery==2.9
libneuronxla==2.0.2335
neuronx-cc==2.14.213.0+013d129b
neuronx-distributed==0.8.0
torch==2.1.2
torch-neuronx==2.1.2.2.2.0
torch-xla==2.1.3
torchvision==0.16.2
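
To compare another environment against the versions listed above, a minimal sketch like the following could be used. It only relies on the standard-library `importlib.metadata`; the helper name `collect_versions` is an illustrative choice, not something from the original report.

```python
from importlib import metadata

# Package names copied from the "System info" list above.
PACKAGES = [
    "aws-neuronx-runtime-discovery",
    "libneuronxla",
    "neuronx-cc",
    "neuronx-distributed",
    "torch",
    "torch-neuronx",
    "torch-xla",
    "torchvision",
]


def collect_versions(packages=PACKAGES):
    """Return {package: installed version, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running it on the training instance prints one line per package in the same `name==version` form as the list above, which makes a diff straightforward.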

I ran the same training job under 4 settings: `XLA_DISABLE_FUNCTIONALIZATION` set to 0 or 1, each with ZeRO-1 enabled or disabled:

`XLA_DISABLE_FUNCTIONALIZATION=0` and ZeRO-1

In this case the loss is diverging.

Note: Since I am using Optimum Neuron, I am not sure whether this comes from my integration of the ZeroRedundancyOptimizer or whether it is an actual bug on your end and/or in torch_xla.

`XLA_DISABLE_FUNCTIONALIZATION=1` and ZeRO-1

In this case the loss diverges to inf.

`XLA_DISABLE_FUNCTIONALIZATION=0` and regular optimizer

In this case we OOM.

`XLA_DISABLE_FUNCTIONALIZATION=1` and regular optimizer

The loss converges.
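
The four settings above form a 2x2 matrix, which could be enumerated with a small driver such as the sketch below. Everything about the launch command is an assumption: `train_mistral.sh` stands in for the attached script (whose contents are not shown here), and `--zero_1` is an assumed name for the ZeRO-1 switch that the script may or may not forward. The code only builds the commands and environments; the actual launch is left as a comment.

```python
import itertools
import os

# Hypothetical placeholders: the real launcher is the attached train_mistral.sh.txt,
# and the ZeRO-1 switch name below is an assumption, not taken from the issue.
TRAIN_CMD = ["bash", "train_mistral.sh"]
ZERO1_FLAG = "--zero_1"


def build_runs():
    """Build the 4 (optimizer, functionalization flag) combinations from the report."""
    runs = []
    # Same order as in the report: both ZeRO-1 runs first, then both regular runs.
    for use_zero1, func_flag in itertools.product((True, False), ("0", "1")):
        env = dict(os.environ, XLA_DISABLE_FUNCTIONALIZATION=func_flag)
        cmd = TRAIN_CMD + ([ZERO1_FLAG] if use_zero1 else [])
        optimizer = "ZeRO-1" if use_zero1 else "regular optimizer"
        label = f"XLA_DISABLE_FUNCTIONALIZATION={func_flag} and {optimizer}"
        runs.append({"label": label, "env": env, "cmd": cmd})
    return runs


if __name__ == "__main__":
    for run in build_runs():
        print(run["label"], "->", " ".join(run["cmd"]))
        # To actually launch each run:
        # subprocess.run(run["cmd"], env=run["env"], check=False)
```

Keeping the environment variable in a per-run `env` dict, rather than mutating `os.environ`, avoids one setting leaking into the next run.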


gsnaws · 2024-07-17T20:09:17Z

Hi @michaelbenayoun. Could you please share a simple reproduction script? It would help narrow down the root cause.

michaelbenayoun · 2024-07-18T08:39:26Z

It is using Optimum Neuron.
You can install it from source:

pip install git+https://github.com/huggingface/optimum-neuron.git

Then you can use this script as the basis to test: train_mistral.sh.txt

jeffhataws changed the title ~~XLA_DISABLE_FUNCTIONALIZATION=0 with ZeRO-1 diverges~~ XLA_DISABLE_FUNCTIONALIZATION=0 with ZeRO-1 diverges for Mistral on NxD Jul 22, 2024

aws-taylor added the `bug` (Something isn't working) label Nov 11, 2024

