[torchlib] native_layer_norm_float32 is failing on MacOS CI with significant absolute difference #1470

Closed · justinchuby opened this issue on Apr 26, 2024 · 5 comments · Fixed by #1538
Labels: broken ci, topic: torch_lib (Related to the torch/aten function lib in development)


@justinchuby
Collaborator

justinchuby commented Apr 26, 2024

Observations: The test does not fail on other platforms. It started failing only after the GitHub-hosted macOS runners went down and were brought back up a day ago. All failing tests have 1-D, single-element inputs.

Example https://github.com/microsoft/onnxscript/actions/runs/8840793727/job/24276753517

_ TestOutputConsistencyEagerCPU.test_output_match_opinfo__native_layer_norm_cpu_float32 (inputs="['Tensor<torch.Size([1]), dtype=torch.float32>', (1,), 'Tensor<torch.Size([1]), dtype=torch.float32>', 'Tensor<torch.Size([1]), dtype=torch.float32>', 1e-05]", kwargs='{}', sample_num=8) _
[gw1] darwin -- Python 3.10.11 /Users/runner/work/onnxscript/onnxscript/.nox/test/bin/python
tests/function_libs/torch_lib/ops_test.py:279: in run_test_output_match
    torch.testing.assert_close(
E   AssertionError: Tensor-likes are not close!
E   
E   Mismatched elements: 1 / 1 (100.0%)
E   Greatest absolute difference: 1.154693603515625 at index (0,) (up to 0.00018 allowed)
E   Greatest relative difference: 0.0036514615640044212 at index (0,) (up to 3.7e-05 allowed)
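
A note on the numbers in this log (an inference, not something the log states): `torch.testing.assert_close` reports the relative difference as abs_diff / |expected|, so the expected value here has magnitude 1.154693603515625 / 0.0036514615640044212 ≈ 316.23, which is 1/sqrt(1e-5). For a single-element input the variance is 0, so the rstd output of native_layer_norm is exactly 1/sqrt(eps) and the normalized output collapses to the bias. The mismatch is therefore most consistent with the rstd output rather than the normalized output. The sketch below uses a hypothetical pure-Python helper, `reference_native_layer_norm`, written only to show this arithmetic; it is not the PyTorch or ONNX Runtime kernel.

```python
import math


def reference_native_layer_norm(xs, weight, bias, eps):
    """Hypothetical pure-Python layer norm over a 1-D list (all elements normalized together).

    Returns (output, mean, rstd), mirroring the three outputs of aten::native_layer_norm.
    """
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # biased variance, as layer norm uses
    rstd = 1.0 / math.sqrt(var + eps)
    out = [(x - mean) * rstd * w + b for x, w, b in zip(xs, weight, bias)]
    return out, mean, rstd


# Single-element input: the variance is 0, so the output equals the bias
# and rstd depends on eps alone.
out, mean, rstd = reference_native_layer_norm([0.7], weight=[1.3], bias=[-0.2], eps=1e-5)
print(out)   # [-0.2]
print(rstd)  # 1 / sqrt(1e-5), roughly 316.2278

# rel_diff = abs_diff / |expected|, so |expected| = abs_diff / rel_diff.
abs_diff = 1.154693603515625
rel_diff = 0.0036514615640044212
print(abs_diff / rel_diff)  # roughly 316.23, the same magnitude as rstd above
```

The input values 0.7, 1.3 and -0.2 are arbitrary; any single-element input gives the same rstd for a given eps.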

cc @thiagocrepaldi @titaiwangms @xiaowuhu @fatcat-z

justinchuby added the broken ci and topic: torch_lib labels on Apr 26, 2024
justinchuby assigned justinchuby and unassigned xiaowuhu and fatcat-z on May 3, 2024
@justinchuby
Collaborator Author

justinchuby commented May 10, 2024

Pasting the script here for easier access (also attached as native_layer_norm_issue.zip):

import os
import onnx
import torch
import numpy as np
import onnxruntime as ort

current_folder = os.path.dirname(os.path.abspath(__file__))
onnx_file_path = os.path.join(current_folder, "test_mac_issue.onnx")

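def prepare_onnx_inputs(input_node_names, torch_tensor):
    # Feed the graph inputs positionally, presumably (input, normalized_shape, weight, bias),
    # matching the argument order of the torch.native_layer_norm call in this script.
    onnx_inputs = {}
    onnx_tensor = torch_tensor.detach().cpu().numpy()

    onnx_inputs[input_node_names[0]] = onnx_tensor
    # normalized_shape as an explicit array; int64 is an assumption about the model's input type.
    onnx_inputs[input_node_names[1]] = np.array([1, 2, 3], dtype=np.int64)
    onnx_inputs[input_node_names[2]] = onnx_tensor
    onnx_inputs[input_node_names[3]] = onnx_tensor

    return onnx_inputs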

onnx_model = onnx.load(onnx_file_path)
onnx.checker.check_model(onnx_model)

providers = ['CPUExecutionProvider']
opt_without_optimization = ort.SessionOptions()
opt_without_optimization.graph_optimization_level = (
    ort.GraphOptimizationLevel.ORT_DISABLE_ALL
)
ort_session_no_optimization = ort.InferenceSession(onnx_file_path, opt_without_optimization, providers=providers)

input_node_names = [n.name for n in ort_session_no_optimization.get_inputs()]
torch_tensor = torch.rand(size=(1, 2, 3), dtype=torch.float32)

all_inputs = prepare_onnx_inputs(input_node_names, torch_tensor)

o_onnx_no_optimization = ort_session_no_optimization.run(None, all_inputs)
# torch.native_layer_norm returns a tuple (output, mean, rstd).
torch_output = torch.native_layer_norm(torch_tensor, [1, 2, 3], torch_tensor, torch_tensor, 0.5)


print("Actual ONNX without optimization:")
print(o_onnx_no_optimization)

print("Expected Torch output:")
print(torch_output)

print("==== Compare ORT and Torch results ====")
for ort_out, torch_out in zip(o_onnx_no_optimization, torch_output):
    np.testing.assert_allclose(ort_out, torch_out.numpy(), rtol=1e-04, atol=1e-03)

print("==== Pass ====")
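
When the comparison at the end of this script fails, `np.testing.assert_allclose` raises on the first mismatching output, and its message does not say which of native_layer_norm's three outputs (output, mean, rstd) it was. A minimal sketch of a per-output check follows, assuming each output has first been flattened to a plain Python list. `OUTPUT_NAMES`, `close`, and `mismatched_outputs` are hypothetical helpers, not part of onnxscript or ONNX Runtime, and the example values are illustrative only. For finite values, `close` applies the same criterion as `assert_allclose`: |actual - expected| <= atol + rtol * |expected|.

```python
# Order of aten::native_layer_norm's outputs (assumed to match the ONNX model's outputs).
OUTPUT_NAMES = ("output", "mean", "rstd")


def close(actual, expected, rtol, atol):
    """Elementwise check with numpy.testing.assert_allclose's criterion (finite values only)."""
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


def mismatched_outputs(actual_outputs, expected_outputs, rtol=1e-4, atol=1e-3):
    """Return the names of the outputs whose flattened values are not close."""
    return [
        name
        for name, a, e in zip(OUTPUT_NAMES, actual_outputs, expected_outputs)
        if not close(a, e, rtol, atol)
    ]


# Illustrative values only: a single-element case where just rstd disagrees.
expected = ([-0.2], [0.7], [316.2278])
actual = ([-0.2], [0.7], [310.0])
print(mismatched_outputs(actual, expected))  # ['rstd']
```

In the script above, the arguments would be `[o.ravel().tolist() for o in o_onnx_no_optimization]` and `[t.numpy().ravel().tolist() for t in torch_output]`.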

@justinchuby
Collaborator Author

@fatcat-z
Contributor

This appears to be an ONNX Runtime issue; I have filed one: microsoft/onnxruntime#20676

@justinchuby
Collaborator Author

Could you help skip the test for now?

@shubhambhokare1
Contributor

#1538

5 participants