Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. #21133

Merged
merged 38 commits into from
Jul 17, 2024

ranjitshs
Contributor

Description

Enablement of onnxruntime for AIX and fixing issues related to big-endian platform.

Motivation and Context

This PR contains the following changes:

  1. Enablement code for building onnxruntime on the AIX operating system.
  2. While testing the build on AIX, we found issues related to big-endian platforms. More details about a few of those issues can be found in #12921 (Big endian issue: Graph Transformation Attention Fusion tests are failing).

Below is the list of changed files, with a description of each change.

  1. cmake/CMakeLists.txt
    [BUILDING on AIX issue] Added a check for "IBMClang" when handling -Wno-unused-parameter
  2. cmake/external/onnxruntime_external_deps.cmake
    [BUILDING on AIX issue] Enabled gtest_disable_pthreads for AIX
  3. cmake/onnxruntime.cmake
    [BUILDING on AIX issue]
    o Disabled, for AIX, the code that generates generated_source.c, which further requires some symbol files
    o Added a NOT AIX guard around unsupported linker flags such as --Xlinker
    o Linking of iconv
  4. cmake/onnxruntime_framework.cmake
    [BUILDING on AIX issue] Added a NOT AIX guard around -Wl,-rpath='$ORIGIN'
  5. cmake/onnxruntime_mlas.cmake
    [BUILDING on AIX issue] POWER10-related macro/function definitions
  6. cmake/onnxruntime_providers_cpu.cmake
    [BUILDING on AIX issue] Added a NOT AIX guard around unsupported linker flags such as --Xlinker
  7. cmake/onnxruntime_unittests.cmake
    [BUILDING on AIX issue]
    o Added a NOT AIX guard around unsupported linker flags such as --Xlinker
    o Added the libraries required by the AIX linker for applications such as onnxruntime_shared_lib_test, onnxruntime_logging_apis, etc.
  8. cmake/patches/flatbuffers/flatbuffers.patch
    [BUILDING on AIX issue] Handling of TypeCode in include/flatbuffers/flatbuffers.h under AIX + clang
  9. onnxruntime/contrib_ops/cpu/murmur_hash3.cc
    [Big endian issue] Byte-order conversion handling in the compute() and getblock() routines
  10. onnxruntime/contrib_ops/cpu/quantization/matmul_nbits_impl.cc
    [Big endian issue] Fixed test failures by byte-swapping quant_value
  11. onnxruntime/core/framework/tensorprotoutils.cc
    [Big endian issue]
    Implemented SetRawDataInTensorProto and ConvertRawDataInTensorProto.
    o SetRawDataInTensorProto: wrapper for set_raw_data() that calls ConvertRawDataInTensorProto() on big-endian systems
    o ConvertRawDataInTensorProto: used mainly on big-endian systems to byte-swap tensor raw_data
  12. onnxruntime/core/framework/tensorprotoutils.h
    [Big endian issue]
    Declarations of SetRawDataInTensorProto and ConvertRawDataInTensorProto
  13. onnxruntime/core/graph/graph.cc
    [Big endian issue]
    o Call ConvertRawDataInTensorProto for the SPARSE_TENSOR type
    o Call ConvertRawDataInTensorProto in SaveToOrtFormat
  14. onnxruntime/core/mlas/lib/platform.cpp
    [BUILDING on AIX issue] POWER10 enablement for AIX
  15. onnxruntime/core/mlas/lib/power/qgemm_kernel_power10.cpp
    [BUILDING on AIX issue] Handling of __vector under AIX + clang
  16. onnxruntime/core/mlas/lib/qgemm.h
    [BUILDING on AIX issue] Added the _AIX flag
  17. onnxruntime/core/mlas/lib/qlmul.cpp
    [BUILDING on AIX issue] Handling of __vector under AIX + clang
  18. onnxruntime/core/optimizer/attention_fusion.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  19. onnxruntime/core/optimizer/compute_optimizer/shared_utils.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  20. onnxruntime/core/optimizer/constant_folding.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  21. onnxruntime/core/optimizer/embed_layer_norm_fusion.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  22. onnxruntime/core/optimizer/nchwc_transformer.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  23. onnxruntime/core/optimizer/qdq_transformer/avx2_weight_s8_to_u8.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  24. onnxruntime/core/optimizer/qdq_transformer/qdq_s8_to_u8.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  25. onnxruntime/core/optimizer/qdq_transformer/s8_to_u8.h
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  26. onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_actions.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  27. onnxruntime/core/optimizer/reshape_fusion.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  28. onnxruntime/core/optimizer/stft_decomposition.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  29. onnxruntime/core/optimizer/transpose_optimization/ort_optimizer_api_impl.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  30. onnxruntime/core/platform/path_lib.h
    [BUILDING on AIX issue] Replaced the template with a normal function call
  31. onnxruntime/core/platform/posix/env.cc
    [BUILDING on AIX issue] Excluded syscall.h on AIX
  32. onnxruntime/core/session/inference_session.cc
    [Big endian issue] Removed the ORT_RETURN_IF_NOT check on FLATBUFFERS_LITTLEENDIAN
  33. onnxruntime/test/flatbuffers/flatbuffer_utils_test.cc
    [Big endian issue] Call ConvertRawDataInTensorProto in CreateInitializer and ExternalWriteReadWithLoadInitializers
  34. onnxruntime/test/framework/sparse_kernels_test.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  35. onnxruntime/test/framework/tensorutils_test.cc
    [Big endian issue] Added the helper method ConvertEndianessForVector and called it where required
  36. onnxruntime/test/framework/test_tensor_loader.cc
    o [BUILDING on AIX issue] Handling of getcwd for AIX
    o [Big endian issue] Byte swapping in run_external_data_test
  37. onnxruntime/test/onnx/main.cc
    [Big endian issue] including for AIX
  38. onnxruntime/test/onnx/tensorprotoutils.cc
    [Big endian issue] Byte swapping in UnpackTensorWithRawData
  39. onnxruntime/test/optimizer/graph_transform_test.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  40. onnxruntime/test/optimizer/graph_transform_test_builder.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  41. onnxruntime/test/optimizer/graph_transform_test_builder.h
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  42. onnxruntime/test/optimizer/initializer_test.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  43. onnxruntime/test/optimizer/nchwc_optimizer_test.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  44. onnxruntime/test/providers/base_tester.cc
    [Big endian issue] Use the util function SetRawDataInTensorProto instead of set_raw_data
  45. onnxruntime/test/providers/cpu/generator/random_test.cc
    [BUILDING on AIX issue] Added an AIX check in MultinomialGoodCase
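Most of the [Big endian issue] items above follow from one rule: ONNX defines TensorProto raw_data as little-endian, so a big-endian host has to reverse the bytes of each element when it writes or reads that field. The following is a minimal sketch of that idea, not the code in this PR. The helper names IsLittleEndianHost, SwapElementBytes and ToLittleEndianRawData are hypothetical, and the raw buffer is assumed to be held in a std::string, as protobuf bytes fields are in C++.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: true when the host stores the least significant byte first.
inline bool IsLittleEndianHost() {
  const std::uint16_t probe = 1;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// Hypothetical helper (not an onnxruntime API): reverse, in place, the bytes
// of every element of width elem_size stored in a raw buffer.
inline void SwapElementBytes(std::string& raw, std::size_t elem_size) {
  assert(elem_size > 0 && raw.size() % elem_size == 0);
  if (elem_size == 1) return;  // single-byte types have no byte order
  for (std::size_t off = 0; off < raw.size(); off += elem_size) {
    auto first = raw.begin() + static_cast<std::ptrdiff_t>(off);
    std::reverse(first, first + static_cast<std::ptrdiff_t>(elem_size));
  }
}

// Hypothetical analogue of a "set raw data" wrapper: serialize typed values
// so the stored bytes are little-endian whatever the host byte order.
template <typename T>
std::string ToLittleEndianRawData(const std::vector<T>& values) {
  std::string raw(values.size() * sizeof(T), '\0');
  if (!raw.empty()) std::memcpy(&raw[0], values.data(), raw.size());
  if (!IsLittleEndianHost()) SwapElementBytes(raw, sizeof(T));
  return raw;
}
```

On a little-endian host ToLittleEndianRawData is a plain copy, which is why problems of this kind stay hidden until the tests run on a big-endian build such as AIX on POWER.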
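Item 9 concerns MurmurHash3, whose reference getblock routine loads each 32-bit block in the host's native byte order. On a big-endian host the same input bytes therefore produce different block values and a different hash. One portable approach, sketched here with a hypothetical function name (GetBlock32LE) and not taken from the PR's diff, is to assemble each block from individual bytes in little-endian order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical portable block read: build the i-th 32-bit block from its
// bytes in little-endian order, so the value (and the resulting hash) is the
// same on little- and big-endian hosts.
inline std::uint32_t GetBlock32LE(const std::uint8_t* data, std::size_t i) {
  assert(data != nullptr);
  const std::uint8_t* p = data + i * 4;
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Because the shifts operate on values rather than on memory layout, this read also avoids the unaligned-access concern that the reference getblock leaves to the platform.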

@tianleiwu
Contributor

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

Azure Pipelines successfully started running 3 pipeline(s).

Review comment on cmake/onnxruntime.cmake (outdated, resolved)
@tianleiwu
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@tianleiwu
Contributor

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

@tianleiwu
Contributor

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

Azure Pipelines successfully started running 3 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

@ranjitshs
Contributor Author

@tianleiwu
I see one failing check, orttraining-amd-gpu-ci-pipeline (Linux_Test_ubuntu), although it passed yesterday. Should we re-run it?

@tianleiwu
Contributor

tianleiwu commented Jul 16, 2024

Done.

@tianleiwu tianleiwu requested review from yihonglyu and liqunfu July 16, 2024 23:52
@tianleiwu
Contributor

@yihonglyu, @liqunfu please take a look at the MLAS changes.

@ranjitshs
Contributor Author

It's still failing. Could you please re-run it?

@ranjitshs
Contributor Author

Thanks @tianleiwu for the approval.

@yihonglyu, @liqunfu We are aiming to merge this PR in the upcoming v1.19 release, so we request that you complete the review/approval of the MLAS changes before the tentative feature-complete date of July 19. Thank you in advance.

@tianleiwu tianleiwu merged commit 6c7562b into microsoft:main Jul 17, 2024
85 of 87 checks passed
@snnn
Member

snnn commented Jul 17, 2024

What linker is used on AIX? Doesn't it support hiding private symbols?

@ranjitshs
Contributor Author

@snnn
We are using the AIX linker. It does support hiding symbols, but its process of symbol management is different from that on Linux. We will explore this.

@ranjitshs
Contributor Author

Thanks @tianleiwu @snnn @liqunfu @yihonglyu for your suggestions and help on this issue/PR.
We will monitor our local CI setup and create an issue if we see any build or test failures in the main branch.

@ranjitshs ranjitshs deleted the aix-main branch July 18, 2024 06:52
@snnn
Member

snnn commented Jul 18, 2024

Is there any public documentation I can take a look at?

@ranjitshs
Contributor Author

ranjitshs commented Jul 18, 2024

This is the AIX linker man page: https://www.ibm.com/docs/en/aix/7.3?topic=l-ld-command

For reference, I am attaching the link.txt generated by CMake for the onnxruntime library.
link.txt

In this file, we can see that CMake first creates exports.exp and then passes this file to the linker.

@snnn
Member

snnn commented Jul 18, 2024

So https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/gen_def.py should be extended to support this new format.

@ranjitshs
Contributor Author

Yes. This script needs some changes for AIX support.
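Extending gen_def.py for AIX amounts to emitting an export file like the exports.exp that CMake hands to the AIX linker (typically through its -bE: option) alongside the formats the script already produces. The sketch below is not gen_def.py; it only contrasts a GNU ld version script with an AIX export list for the same symbols. The enum and function names, the VERS_1.0 version tag, and the assumed AIX syntax (one symbol name per line) are illustrative assumptions to be checked against the ld documentation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical output styles for an exported-symbol file.
enum class SymbolFileStyle { kGnuVersionScript, kAixExportList };

// Hypothetical renderer: the GNU version script exports the listed symbols
// and hides everything else via "local: *;". The AIX export list is assumed
// to be the simplest form, one symbol name per line; symbols not listed are
// not exported from the shared object.
std::string RenderSymbolFile(const std::vector<std::string>& symbols,
                             SymbolFileStyle style) {
  assert(!symbols.empty());
  std::ostringstream out;
  if (style == SymbolFileStyle::kGnuVersionScript) {
    out << "VERS_1.0 {\n  global:\n";
    for (const auto& s : symbols) out << "    " << s << ";\n";
    out << "  local:\n    *;\n};\n";
  } else {
    for (const auto& s : symbols) out << s << '\n';
  }
  return out.str();
}
```

Keeping one symbol list and rendering it per platform mirrors the idea of a single generator script, so the AIX build would hide private symbols the same way the Linux build does.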


5 participants