Remove dnf update from docker build scripts (#17551)
### Description

1. Remove 'dnf update' from the docker build scripts, because it upgrades the TRT packages from CUDA 11.x to CUDA 12.x. To reproduce the problem, run the following commands in a UBI 8 (RHEL-compatible) CUDA 11.x docker image such as nvidia/cuda:11.8.0-cudnn8-devel-ubi8.

```
export v=8.6.1.6-1.cuda11.8
dnf install -y libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v} libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v} libnvinfer-headers-plugin-devel-${v}
dnf update -y
```

The last command generates the following output:

```
========================================================================================================================
 Package                             Architecture   Version                Repository   Size
========================================================================================================================
Upgrading:
 libnvinfer-devel                    x86_64         8.6.1.6-1.cuda12.0     cuda         542 M
 libnvinfer-headers-devel            x86_64         8.6.1.6-1.cuda12.0     cuda         118 k
 libnvinfer-headers-plugin-devel     x86_64         8.6.1.6-1.cuda12.0     cuda          14 k
 libnvinfer-plugin-devel             x86_64         8.6.1.6-1.cuda12.0     cuda          13 M
 libnvinfer-plugin8                  x86_64         8.6.1.6-1.cuda12.0     cuda          13 M
 libnvinfer-vc-plugin-devel          x86_64         8.6.1.6-1.cuda12.0     cuda         107 k
 libnvinfer-vc-plugin8               x86_64         8.6.1.6-1.cuda12.0     cuda         251 k
 libnvinfer8                         x86_64         8.6.1.6-1.cuda12.0     cuda         543 M
 libnvonnxparsers-devel              x86_64         8.6.1.6-1.cuda12.0     cuda         467 k
 libnvonnxparsers8                   x86_64         8.6.1.6-1.cuda12.0     cuda         757 k
 libnvparsers-devel                  x86_64         8.6.1.6-1.cuda12.0     cuda         2.0 M
 libnvparsers8                       x86_64         8.6.1.6-1.cuda12.0     cuda         854 k
Installing dependencies:
 cuda-toolkit-12-0-config-common     noarch         12.0.146-1             cuda         7.7 k
 cuda-toolkit-12-config-common       noarch         12.2.140-1             cuda         7.9 k
 libcublas-12-0                      x86_64         12.0.2.224-1           cuda         361 M
 libcublas-devel-12-0                x86_64         12.0.2.224-1           cuda         397 M

Transaction Summary
========================================================================================================================
```

As the output shows, every upgraded package is a CUDA 12 build. The problem can also be solved by locking the package versions with the "dnf versionlock" command right after installing the CUDA/TRT packages. However, going forward, for better reproducibility, I suggest pinning dnf package versions manually in the installation scripts, as we already do for TRT:

```bash
v="8.6.1.6-1.cuda11.8" &&\
    yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo &&\
    yum -y install libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v}\
        libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v} libnvinfer-headers-plugin-devel-${v}
```

When we need to upgrade a package because of a security alert or for some other reason, we change the version string manually instead of relying on "dnf update". Though this approach takes more effort, it can make our pipelines more stable.

2. Move the python test to docker.

### Motivation and Context

Right now the nightly GPU package mixes CUDA 11.x and CUDA 12.x, and the resulting package is unusable (it crashes every time).
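The pinning approach described above can be backed by a small guard that fails the image build if any TRT package drifts to another CUDA build. The following is a minimal sketch, not part of this commit: the function name `check_cuda_suffix` and its input format (one `<name> <version>-<release>` pair per line) are assumptions made for illustration.

```shell
# Hypothetical helper (not in the commit): print every package whose
# version-release string does not end in the expected CUDA suffix.
# Reads "<name> <version>-<release>" lines from stdin.
# Returns 0 when all packages match, 1 when at least one does not.
check_cuda_suffix() {
  local expected="$1" name version bad=0
  while read -r name version; do
    if [ -z "$name" ]; then
      continue                          # skip blank lines
    fi
    case "$version" in
      *".${expected}") ;;               # e.g. 8.6.1.6-1.cuda11.8 matches cuda11.8
      *) printf '%s %s\n' "$name" "$version"; bad=1 ;;
    esac
  done
  return "$bad"
}

# Possible use inside an image build, after the pinned install:
#   rpm -qa --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n' 'libnv*' \
#     | check_cuda_suffix cuda11.8
```

If the guard is run as the last step of the install script, an accidental upgrade like the one shown in the transaction output would surface as a build failure rather than as a crashing nightly package. The "dnf versionlock" alternative mentioned in the description comes from a separate dnf plugin, so it would have to be installed in the image before it could be used.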
Showing 32 changed files with 351 additions and 244 deletions.
117 changes: 117 additions & 0 deletions
tools/ci_build/github/azure-pipelines/templates/py-packaging-linux-test-cpu.yml
@@ -0,0 +1,117 @@
parameters:
- name: arch
  type: string

- name: base_image
  type: string

- name: devtoolset_rootpath
  type: string

- name: ld_library_path_arg
  type: string

- name: prepend_path
  type: string

- name: machine_pool
  type: string

- name: extra_job_id
  type: string
  default: ''

- name: python_wheel_suffix
  type: string
  default: ''


# TODO: Ideally it should fetch information from the build that triggers it
- name: cmake_build_type
  type: string
  default: 'Release'
  values:
  - Debug
  - Release
  - RelWithDebInfo
  - MinSizeRel

- name: timeout
  type: number
  default: 120

jobs:
- job: Linux_Test_CPU${{ parameters.extra_job_id }}_${{ parameters.arch }}
  timeoutInMinutes: ${{ parameters.timeout }}
  variables:
    skipComponentGovernanceDetection: true
  workspace:
    clean: all
  pool: ${{ parameters.machine_pool }}
  steps:
  - checkout: self
    clean: true
    submodules: none
  # The public ADO project
  - ${{ if eq(variables['System.CollectionId'], 'f3ad12f2-e480-4533-baf2-635c95467d29') }}:
    - download: current # pipeline resource identifier.
      artifact: 'drop-linux-cpu-${{ parameters.arch }}'

    - download: current # pipeline resource identifier.
      artifact: 'onnxruntime${{ parameters.python_wheel_suffix }}'

    - bash: |
        set -e -x
        mv "$(Pipeline.Workspace)/drop-linux-cpu-${{ parameters.arch }}" $(Build.BinariesDirectory)/${{parameters.cmake_build_type}}
        mv "$(Pipeline.Workspace)/onnxruntime${{ parameters.python_wheel_suffix }}" "$(Build.BinariesDirectory)/whl"
        cp -r "$(Build.BinariesDirectory)/whl" $(Build.BinariesDirectory)/tmp
        find "$(Build.BinariesDirectory)/tmp" -name '*.whl' -exec bash -c 'unzip -d "${1%.*}" "$1"' _ {} \;
  # The private ADO project
  - ${{ if eq(variables['System.CollectionId'], 'bc038106-a83b-4dab-9dd3-5a41bc58f34c') }}:
    - download: build # pipeline resource identifier.
      artifact: 'drop-linux-cpu-${{ parameters.arch }}'

    - download: build # pipeline resource identifier.
      artifact: 'onnxruntime${{ parameters.python_wheel_suffix }}'

    - bash: |
        set -e -x
        ls $(Pipeline.Workspace)/build
        mv "$(Pipeline.Workspace)/build/drop-linux-cpu-${{ parameters.arch }}" $(Build.BinariesDirectory)/${{parameters.cmake_build_type}}
        mv "$(Pipeline.Workspace)/build/onnxruntime${{ parameters.python_wheel_suffix }}" "$(Build.BinariesDirectory)/whl"
        cp -r "$(Build.BinariesDirectory)/whl" $(Build.BinariesDirectory)/tmp
        find "$(Build.BinariesDirectory)/tmp" -name '*.whl' -exec bash -c 'unzip -d "${1%.*}" "$1"' _ {} \;
  # The BinSkim task uses a dotnet program which doesn't support ARM CPUs yet
  - ${{ if eq(parameters.arch, 'x86_64') }}:
    - task: BinSkim@4
      displayName: 'Run BinSkim'
      inputs:
        AnalyzeTargetGlob: '$(Build.BinariesDirectory)/tmp/**/*.so'
      continueOnError: true

  #- task: PostAnalysis@2
  #  inputs:
  #    GdnBreakAllTools: true
  #    GdnBreakPolicy: M365
  #    GdnBreakPolicyMinSev: Error

  - template: get-docker-image-steps.yml
    parameters:
      Dockerfile: tools/ci_build/github/linux/docker/inference/x64/python/cpu/Dockerfile.manylinux2_28_cpu
      Context: tools/ci_build/github/linux/docker/inference/x64/python/cpu
      DockerBuildArgs: "--build-arg POLICY=manylinux_2_28 --build-arg BUILD_UID=$( id -u ) --build-arg BASEIMAGE=${{ parameters.base_image }} --build-arg PLATFORM=${{ parameters.arch }} --build-arg PREPEND_PATH=${{ parameters.prepend_path }} --build-arg LD_LIBRARY_PATH_ARG=${{ parameters.ld_library_path_arg }} --build-arg DEVTOOLSET_ROOTPATH=${{ parameters.devtoolset_rootpath }}"
      Repository: onnxruntimecpubuildpython${{ parameters.arch }}
      ${{ if eq(parameters.arch, 'aarch64') }}:
        UpdateDepsTxt: false

  - task: Bash@3
    displayName: 'Bash Script'
    inputs:
      targetType: filePath
      filePath: tools/ci_build/github/linux/run_python_dockertest.sh
      arguments: -d CPU -c ${{parameters.cmake_build_type}} -i onnxruntimecpubuildpython${{ parameters.arch }}

  - task: mspremier.PostBuildCleanup.PostBuildCleanup-task.PostBuildCleanup@3
    displayName: 'Clean Agent Directories'
    condition: always()
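The `find ... -exec bash -c 'unzip -d "${1%.*}" "$1"' _ {} \;` step in the template above unpacks each wheel into a directory named after the wheel file without its extension, which is what lets BinSkim scan the `.so` files inside. A minimal sketch of just the parameter expansion follows; the function name `wheel_extract_dir` and the wheel file name are hypothetical and only illustrate the idiom.

```shell
# Hypothetical helper: show the directory that "${1%.*}" yields for a wheel path.
# "%.*" strips the shortest trailing match of ".*", i.e. only the final extension.
wheel_extract_dir() {
  printf '%s\n' "${1%.*}"
}

# Example with a made-up wheel name:
wheel_extract_dir "/tmp/whl/onnxruntime-0.0.0-cp310-cp310-manylinux_2_28_x86_64.whl"
```

Because only the shortest suffix is removed, the dots inside the version number and platform tag stay intact, so every wheel gets its own extraction directory next to the original file.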
98 changes: 98 additions & 0 deletions
tools/ci_build/github/azure-pipelines/templates/py-packaging-linux-test-cuda.yml
@@ -0,0 +1,98 @@
parameters:
- name: arch
  type: string

- name: device
  type: string
  values:
  - CPU
  - GPU

- name: machine_pool
  type: string

- name: extra_job_id
  type: string
  default: ''

- name: python_wheel_suffix
  type: string
  default: ''


# TODO: Ideally it should fetch information from the build that triggers it
- name: cmake_build_type
  type: string
  default: 'Release'
  values:
  - Debug
  - Release
  - RelWithDebInfo
  - MinSizeRel

- name: timeout
  type: number
  default: 120

jobs:
- job: Linux_Test_GPU${{ parameters.extra_job_id }}_${{ parameters.arch }}
  timeoutInMinutes: ${{ parameters.timeout }}
  variables:
    skipComponentGovernanceDetection: true
  workspace:
    clean: all
  pool: ${{ parameters.machine_pool }}
  steps:
  - checkout: self
    clean: true
    submodules: none
  # The public ADO project
  # - ${{ if eq(variables['System.CollectionId'], 'f3ad12f2-e480-4533-baf2-635c95467d29') }}:

  # The private ADO project
  - ${{ if eq(variables['System.CollectionId'], 'bc038106-a83b-4dab-9dd3-5a41bc58f34c') }}:
    - download: build # pipeline resource identifier.
      artifact: 'drop-linux-gpu-${{ parameters.arch }}'

    - download: build # pipeline resource identifier.
      artifact: 'onnxruntime${{ parameters.python_wheel_suffix }}'

    - bash: |
        set -e -x
        ls $(Pipeline.Workspace)/build
        mv "$(Pipeline.Workspace)/build/drop-linux-gpu-${{ parameters.arch }}" $(Build.BinariesDirectory)/${{parameters.cmake_build_type}}
        mv "$(Pipeline.Workspace)/build/onnxruntime${{ parameters.python_wheel_suffix }}" "$(Build.BinariesDirectory)/whl"
        cp -r "$(Build.BinariesDirectory)/whl" $(Build.BinariesDirectory)/tmp
        find "$(Build.BinariesDirectory)/tmp" -name '*.whl' -exec bash -c 'unzip -d "${1%.*}" "$1"' _ {} \;
  # The BinSkim task uses a dotnet program which doesn't support ARM CPUs yet
  - ${{ if eq(parameters.arch, 'x86_64') }}:
    - task: BinSkim@4
      displayName: 'Run BinSkim'
      inputs:
        AnalyzeTargetGlob: '$(Build.BinariesDirectory)/tmp/**/*.so'
      continueOnError: true

  #- task: PostAnalysis@2
  #  inputs:
  #    GdnBreakAllTools: true
  #    GdnBreakPolicy: M365
  #    GdnBreakPolicyMinSev: Error

  - template: get-docker-image-steps.yml
    parameters:
      Dockerfile: tools/ci_build/github/linux/docker/Dockerfile.manylinux2_28_cuda11_8_tensorrt8_6
      Context: tools/ci_build/github/linux/docker
      DockerBuildArgs: "--network=host --build-arg POLICY=manylinux_2_28 --build-arg PLATFORM=x86_64 --build-arg PREPEND_PATH=/usr/local/cuda/bin --build-arg LD_LIBRARY_PATH_ARG=/usr/local/lib64 --build-arg DEVTOOLSET_ROOTPATH=/usr --build-arg BUILD_UID=$( id -u ) --build-arg PLATFORM=${{ parameters.arch }}"
      Repository: onnxruntimecuda118xtrt86build${{ parameters.arch }}

  - task: Bash@3
    displayName: 'Bash Script'
    inputs:
      targetType: filePath
      filePath: tools/ci_build/github/linux/run_python_dockertest.sh
      arguments: -d GPU -c ${{parameters.cmake_build_type}} -i onnxruntimecuda118xtrt86build${{ parameters.arch }}

  - task: mspremier.PostBuildCleanup.PostBuildCleanup-task.PostBuildCleanup@3
    displayName: 'Clean Agent Directories'
    condition: always()
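Both templates call `run_python_dockertest.sh` with the flags `-d`, `-c`, and `-i`. The script itself is not shown in this excerpt, so the following is only a hypothetical sketch of how such flags could be parsed with `getopts`; the function name, the variable names, and the reading of the flags as device, build configuration, and docker image are assumptions inferred from the values the templates pass.

```shell
# Hypothetical parser (the real script may differ): reads -d <device>,
# -c <build config> and -i <docker image> into global variables.
parse_test_args() {
  OPTIND=1                      # reset so the function can be called repeatedly
  DEVICE="CPU"
  BUILD_CONFIG="Release"
  DOCKER_IMAGE=""
  while getopts "d:c:i:" opt; do
    case "$opt" in
      d) DEVICE="$OPTARG" ;;
      c) BUILD_CONFIG="$OPTARG" ;;
      i) DOCKER_IMAGE="$OPTARG" ;;
      *) echo "usage: -d <CPU|GPU> -c <build config> -i <image>" >&2; return 1 ;;
    esac
  done
  if [ -z "$DOCKER_IMAGE" ]; then
    echo "missing -i <image>" >&2
    return 1
  fi
  return 0
}
```

Keeping the image name as an argument means the same test script can serve both the CPU and the CUDA template, with each template passing the repository name it built in the preceding `get-docker-image-steps.yml` step.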