Fix and enable few ORTModule Unit Tests #19847

Merged
merged 3 commits into from
Mar 12, 2024

pengwa commented Mar 11, 2024

Fix and enable few ORTModule Unit Tests

Fix 'test_bert_inputs_with_dynamic_shape' and 'test_bert_result_with_layerwise_recompute', which generated NaN loss in the ORT run.

The root cause is that the logic generating the attention mask test data is incorrect: only 0 or 1 is allowed in the mask, but the generated data contains many other values. (We did not hit this with older versions of transformers, for example v4.4.2 or 4.16.2, because they do not contain huggingface/transformers@d3cb288, which increases the scaling to a bigger number, causing an overflow to inf.)
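
For illustration, here is a minimal, library-free sketch of how a non-binary mask can turn into NaN. It is not the test code from this PR. It assumes the additive mask is computed as `(1 - mask) * MOST_NEGATIVE`, with `-sys.float_info.max` standing in for the dtype minimum; `make_binary_mask`, `additive_mask`, and `attention_probs` are hypothetical names.

```python
import math
import random
import sys

# Assumption: the additive mask is (1 - mask) * MOST_NEGATIVE. Python floats
# are float64, so -sys.float_info.max stands in for the dtype's minimum.
MOST_NEGATIVE = -sys.float_info.max


def make_binary_mask(length, seed=0):
    """Hypothetical test-data helper: every entry is exactly 0 or 1."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(length)]


def additive_mask(mask):
    """Map mask value 1 (attend) to a zero bias and 0 (ignore) to MOST_NEGATIVE."""
    return [(1.0 - m) * MOST_NEGATIVE for m in mask]


def attention_probs(scores, mask):
    """Softmax over scores plus the additive mask bias."""
    logits = [s + b for s, b in zip(scores, additive_mask(mask))]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.5, 0.5, 0.5, 0.5]
good = attention_probs(scores, [1, 0, 1, 1])  # valid 0/1 mask
bad = attention_probs(scores, [1, 5, 3, 0])   # invalid entries 5 and 3

print(all(math.isfinite(p) for p in good))  # probabilities stay finite
print(any(math.isnan(p) for p in bad))      # inf - inf inside softmax gives NaN
```

With a small scale factor the invalid entries would only distort the bias; with a scale near the representable limit, multiplying by anything other than 0 or 1 can overflow to inf, and the softmax then produces NaN.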

Another improvement made during the investigation using the convergence tools:
Don't dump activations during the model export phase. Otherwise, the dumped data might contain results from the PyTorch run performed during export, which is confusing when comparing against stock PyTorch run results.
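
The idea can be sketched as a guard flag around the export step. This is only an illustration under assumptions: `ActivationDumper`, `export_phase`, and `dump` are hypothetical names, not the API of the ONNX Runtime convergence tools.

```python
from contextlib import contextmanager


class ActivationDumper:
    """Hypothetical recorder that ignores dumps made while exporting."""

    def __init__(self):
        self.records = []
        self._exporting = False

    @contextmanager
    def export_phase(self):
        """Mark the enclosed block as the model export phase."""
        previous = self._exporting
        self._exporting = True
        try:
            yield
        finally:
            self._exporting = previous

    def dump(self, name, values):
        """Record an activation unless an export is in progress."""
        if self._exporting:
            return
        self.records.append((name, list(values)))


dumper = ActivationDumper()
with dumper.export_phase():
    dumper.dump("layer0", [0.1, 0.2])  # skipped: produced during export
dumper.dump("layer0", [0.3, 0.4])      # recorded: produced by the real run
```

Only the activation from the real run ends up in `dumper.records`, so a later comparison against a stock PyTorch run sees one set of values per layer.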

Motivation and Context

@pengwa requested a review from askhade March 11, 2024 09:47
@pengwa added the training label (issues related to ONNX Runtime training; typically submitted using template) Mar 11, 2024
@pengwa merged commit 3e954da into main Mar 12, 2024
93 of 95 checks passed
@pengwa deleted the pengwa/reenable_uts branch March 12, 2024 02:49

pengwa commented Mar 12, 2024

Thanks @baijumeswani !
