Fix and enable few ORTModule Unit Tests #19847

pengwa · 2024-03-11T09:47:34Z

Fix and enable few ORTModule Unit Tests

Fix 'test_bert_inputs_with_dynamic_shape' and 'test_bert_result_with_layerwise_recompute', which generate a NaN loss in the ORT run.

The root cause is that the logic generating the attention mask test data is incorrect: only 0 or 1 is allowed in the mask, but the generated data contains many other values. (We did not hit this with older versions of transformers, for example v4.4.2 or 4.16.2, because they do not contain huggingface/transformers@d3cb288, which increases the scaling to a larger number, causing an overflow to inf.)
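
A minimal sketch of why non-binary mask values matter, under stated assumptions: it assumes the referenced transformers change scales the inverted mask, (1 - mask), by the minimum value of the floating-point dtype, and it uses plain Python floats (float64) as a stand-in for the tensor dtype. The helper names `make_attention_mask`, `additive_bias`, and `softmax` are hypothetical and are not taken from the test code.

```python
import math
import random
import sys

# Assumption: stand-in for the dtype minimum used to scale the inverted mask.
# Python floats are float64, so this is the float64 minimum, not float32's.
DTYPE_MIN = -sys.float_info.max


def make_attention_mask(batch_size, seq_len, seed=0):
    """Hypothetical generator: random test mask restricted to the values 0 and 1."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(seq_len)] for _ in range(batch_size)]


def additive_bias(mask_value):
    """Convert one mask entry into the bias added to an attention score."""
    return (1.0 - mask_value) * DTYPE_MIN


def softmax(scores):
    """Numerically stabilized softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Valid mask values: 1 keeps the score, 0 pushes it to the dtype minimum.
valid = softmax([0.3 + additive_bias(1), 0.1 + additive_bias(0)])

# Invalid mask value 5: (1 - 5) * DTYPE_MIN overflows to +inf,
# and the softmax then evaluates inf - inf, which yields NaN.
broken = softmax([0.3 + additive_bias(5), 0.1 + additive_bias(1)])
```

With a small fixed scaling constant, an out-of-range mask value would only produce a large finite bias; once the scale is the dtype minimum, any mask value outside [0, 2] overflows, which is consistent with the NaN loss appearing only on newer transformers versions.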

Another improvement made during the investigation with the convergence tools:
Don't dump activations during the model export phase; otherwise the dumped data might contain results from the PyTorch run, which is confusing when comparing against stock PyTorch run results.

Motivation and Context

pengwa · 2024-03-12T02:49:34Z

Thanks @baijumeswani !

re-enable uts

86683d7

pengwa requested a review from askhade March 11, 2024 09:47

pengwa added the "training" label (issues related to ONNX Runtime training; typically submitted using template) Mar 11, 2024

pengwa requested review from jingyanwangms and baijumeswani March 11, 2024 09:48

pengwa added 2 commits March 11, 2024 02:50

fixes

2145da4

minor

1fdad1d

baijumeswani approved these changes Mar 12, 2024

pengwa merged commit 3e954da into main Mar 12, 2024
93 of 95 checks passed

pengwa deleted the pengwa/reenable_uts branch March 12, 2024 02:49
