This section gives an overview of how to build the Triton server. For information on building the Triton client libraries and examples see Client Libraries.
Triton server is built using the build.py script. The build.py script currently supports building for the following platforms. See Building on Unsupported Platforms if you are attempting to build Triton on a platform that is not listed here.
For Ubuntu-20.04 build.py supports both a Docker build and a non-Docker build.
-
Build using Docker and the TensorFlow and PyTorch Docker images from NVIDIA GPU Cloud (NGC).
The easiest way to build Triton is to use Docker. The result of the build will be a Docker image called tritonserver that will contain the tritonserver executable in /opt/tritonserver/bin and the required shared libraries in /opt/tritonserver/lib. The backends built for Triton will be in /opt/tritonserver/backends (note that as of the 21.02 release the TensorRT backend is still included in the core of Triton and so does not appear in /opt/tritonserver/backends).
The first step for any build is to checkout the triton-inference-server/server repo branch for the release you are interested in building (or the master branch to build from the development branch). Then run build.py as described below. The build.py script performs these steps when building with Docker.
-
Fetch the appropriate <xx.yy>-py3-min image from NGC. The min image is a minimal/base image that contains the CUDA, cuDNN, TensorRT and other dependencies that are required to build Triton.
-
Create a tritonserver_buildbase Docker image that adds additional build dependencies to the min image.
-
Run build.py within the tritonserver_buildbase image to actually build Triton. See Build without Docker for more details on that part of the build process. The result of this step is a tritonserver_build image that contains the built Triton artifacts.
-
Create the final tritonserver Docker image by extracting the appropriate libraries, executables and other artifacts from tritonserver_build.
By default, build.py does not enable any of Triton's optional features and so you must enable them explicitly. The following build.py invocation builds all features, backends, and repository agents.
$ ./build.py --cmake-dir=<path/to/repo>/build --build-dir=/tmp/citritonbuild --enable-logging --enable-stats --enable-tracing --enable-metrics --enable-gpu-metrics --enable-gpu --filesystem=gcs --filesystem=azure_storage --filesystem=s3 --endpoint=http --endpoint=grpc --repo-tag=common:<container tag> --repo-tag=core:<container tag> --repo-tag=backend:<container tag> --repo-tag=thirdparty:<container tag> --backend=ensemble --backend=tensorrt --backend=identity:<container tag> --backend=repeat:<container tag> --backend=square:<container tag> --backend=onnxruntime:<container tag> --backend=pytorch:<container tag> --backend=tensorflow1:<container tag> --backend=tensorflow2:<container tag> --backend=openvino:<container tag> --backend=python:<container tag> --backend=dali:<container tag> --repoagent=checksum:<container tag>
If you are building on master/main branch then <container tag> should be set to "main". If you are building on a release branch you should set <container tag> to match the branch name. For example, if you are building on the r21.02 branch you should set <container tag> to be "r21.02". You can use a different <container tag> for a component to instead use the corresponding branch/tag in the build. For example, if you have a branch called "mybranch" in the identity_backend repo that you want to use in the build, you would specify --backend=identity:mybranch.
To build Triton without using Docker you must install the build dependencies that are handled automatically when building with Docker. These dependencies include CUDA and cuDNN, TensorRT, and the dependencies listed in the create_dockerfile_buildbase() function of build.py.
Once you have installed these dependencies on your build system you can then use build.py with the --no-container-build flag to build Triton. See the build.py invocation in Build using Docker for an example of how to run build.py. You can use that same invocation with the --no-container-build flag to build without Docker.
The first step for any build is to checkout the triton-inference-server/server repo branch for the release you are interested in building (or the master branch to build from the development branch). Then run build.py as described below. The build.py script will perform the following steps (note that if you are building with Docker that these same steps will be performed during the Docker build within the tritonserver_build container).
-
Use the CMake files in build to build Triton's core shared library and tritonserver executable.
-
Fetch each requested backend and build it using the CMake file from the corresponding backend repo. For example, the ONNX Runtime backend is built using triton-inference-server/onnxruntime_backend/CMakeLists.txt. Some of the backends may use Docker as part of their build (for example ONNX Runtime and OpenVINO). If you don't want to use Docker in those cases you must consult the build process for those backends.
-
Fetch each repository agent and build it using the CMake file from the corresponding repo. For example, the Checksum repository agent is built using triton-inference-server/checksum_repository_agent/CMakeLists.txt.
By default build.py clones Triton repos from https://github.com/triton-inference-server. Use the --github-organization options to select a different URL.
The backends can also be built independently in each of the backend repositories. See the backend repo for more information.
For Triton to support NVIDIA GPUs you must install CUDA, cuBLAS and cuDNN. These libraries must be installed on system include and library paths so that they are available for the build. The version of the libraries used for a given release can be found in the Framework Containers Support Matrix.
For a given version of Triton you can attempt to build with non-supported versions of the libraries but you may have build or execution issues since non-supported versions are not tested.
The TensorRT includes and libraries must be installed on system include and library paths so that they are available for the build. The version of TensorRT used in a given release can be found in the Framework Containers Support Matrix.
For a given version of Triton you can attempt to build with non-supported versions of TensorRT but you may have build or execution issues since non-supported versions are not tested.
Under Construction
Under Construction
Building for an unsupported OS and/or hardware platform is possible. All of the build scripting and CMake files are included in the public repos. However, due to differences in compilers, libraries, package management, etc. you may have to make changes in the build scripts, CMake files and the source code.
You should familiarize yourself with the build process for supported platforms by reading the above documentation and then follow the process for the supported platform that most closely matches the platform you are interested in (for example, if you are trying to build for RHEL/x86-64 then follow the Building for Ubuntu 20.04 process. You will likely need to make changes in the following areas.
-
The build.py script installs dependencies for the build using platform-specific packaging tools, for example, apt-get for Ubuntu. You will need to change build.py to use the packaging tool appropriate for your platform.
-
The package and libraries names for your platform may differ from those used by build.py when installing dependencies. You will need to find the corresponding packages on libraries on your platform.
-
Your platform may use a different compiler or compiler version than the support platforms. As a result you may encounter build errors that need to be fixed by editing the source code or changing the compilation flags.
-
Triton depends on a large number of open-source packages that it builds from source. If one of these packages does not support your platform them you may need to disable the Triton feature that depends on that package. For example, Triton supports the S3 filesystem by building the aws-sdk-cpp package. If aws-sdk-cpp doesn't build for your platform then you can remove the need for that package by not specifying --filesystem=s3 when you run build.py. In general, you should start by running build.py with the minimal required feature set.
-
The TensorFlow backend extracts pre-built shared libraries from the TensorFlow NGC container as part of the build. This container is only available for Ubuntu-20.04 / x86-64 and so if you require the TensorFlow backend for your platform you will need download the TensorFlow container and modify its build to produce shared libraries for your platform. You must use the TensorFlow source and build scripts from within the NGC container because they contain Triton-specific patches that are required for the Triton TensorFlow backend.
-
By default, the PyTorch backend build extracts pre-built shared libraries from The PyTorch NGC container. But the build can also use PyTorch shared libraries that you build separately for your platform. See the pytorch_backend build process for details.