PyTriton server for serving model inference requests. The inference server runs in a Singularity container, as we are otherwise unable to install PyTriton in the cluster environment.
As per the Martinos Docs, it is important to create a symlink for the `~/.singularity` folder; otherwise, Singularity will generate many GB of data that exceed the user home quota.
If you previously used Singularity, first delete the folder in your home directory:
rm -rf ~/.singularity
Then, create a symlink to `/space/neo/4/.singularity` (or some other location):
ln -s /space/neo/4/.singularity ~/.singularity
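The two steps above can be wrapped in a small, re-runnable bash function. This is a sketch, not part of the repository: the name `link_singularity_cache` is hypothetical, the default target is the `/space/neo/4/.singularity` path from above, and, unlike `rm -rf`, it refuses to delete an existing real `~/.singularity` folder so that nothing is removed by accident.

```bash
#!/usr/bin/env bash
# Hypothetical helper: point ~/.singularity at a larger volume so that
# Singularity's cache does not fill the home-directory quota.
link_singularity_cache() {
  local target="${1:-/space/neo/4/.singularity}"  # default from the steps above
  local link="$HOME/.singularity"

  # Already linked to the requested target: nothing to do.
  if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
    return 0
  fi

  # A real directory (or file) is in the way: let the user decide what to delete.
  if [ -e "$link" ] && [ ! -L "$link" ]; then
    echo "Remove or move $link first (e.g. rm -rf $link)" >&2
    return 1
  fi

  mkdir -p "$target"          # make sure the destination exists
  ln -sfn "$target" "$link"   # -n: replace an existing symlink instead of following it
}

# Usage:
# link_singularity_cache                      # uses /space/neo/4/.singularity
# link_singularity_cache /some/other/.singularity
```

Running it twice is harmless, which makes it safe to call from setup scripts.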
This setup only needs to be completed if building a new image.
Images cannot be built on the cluster without `sudo` privileges. Instead, the image can be built on a development machine and uploaded to the cluster. See the Singularity docs for OS-dependent installation on your local machine; the steps for Windows are enumerated below.
The following examples may be helpful, though the `.def` file takes tips from several sources. Importantly, we do not use an Nvidia-provided base Docker image because 1) it takes a long time to download and caused (potentially VM-related) issues when generating a `.sif` file, and 2) PyTorch handles installation of CUDA and cuDNN, so we can keep the base image lighter.
- https://github.com/sylabs/examples/blob/master/machinelearning/intel-tensorflow/Singularity.mkl.def
- https://github.com/bdusell/singularity-tutorial
If on Linux or Mac, read the Singularity docs. On Linux, you need `sudo` privileges. You may be able to skip this step entirely by exploring the free Singularity remote build server. The following is what worked for building the image on Windows. In a nutshell, we use a prebuilt Vagrant box to run a VM with Singularity installed. From within the VM, we can then build a Singularity image from the `.def` file.
- Install the following:
- Git for Windows and/or Cygwin (for a Unix-like shell)
- VirtualBox (for running the VM)
- Vagrant (for convenient command line tools for managing the VM)
- Vagrant Manager (suggested by Singularity docs; may not be necessary)
- Create a folder to host your Singularity installation, e.g., `C:\Users\BRO7\vm-singularity`.
- From within the folder you just created, initialize a `Vagrantfile`:
  vagrant init sylabs/singularity-3.7-ubuntu-bionic64
  (We use this VM because, at the time of writing this README, the cluster was running Singularity 3.7.)
- You can edit the `Vagrantfile` to configure the VM. In particular, we want to bind the local directory containing the inference server code to `/triton` by adding the following line (with the first argument reflecting the correct path on your machine):
  config.vm.synced_folder "C:\\Users\\BRO7\\Documents\\triton", "/triton"
  You may want to increase the resources allocated to the VM:
  # Increase size of the primary drive
  # config.vm.disk :disk, size: "40GB", primary: true
  config.vm.provider "virtualbox" do |vb|
    # Customize the amount of memory on the VM:
    vb.memory = "2048"
    # Allocate additional CPUs
    vb.cpus = "2"
  end
- Now, power up the VM and log in (if asked for a password, it is `vagrant`):
  vagrant up
  vagrant ssh
- When first logged into the VM, check the Singularity version with `singularity version`.
- See the steps below for building the image while logged into the VM.
- You can exit the guest OS with `exit` and then (if done) bring down the VM with `vagrant halt`.
- If you want to delete the VM, you can run `vagrant destroy`. The next time you run `vagrant up`, it will reconstruct the VM based on your `Vagrantfile`.
The container running the inference server is defined by the `python_3.10.def` file. The purpose of the container is primarily to provide the build tools and environment necessary to install and run the `nvidia-pytriton` Python package (and any other packages we want). The `.def` file may need to be modified (and the container rebuilt) if additional tools need to be installed to support new or updated models. Documentation for definition files can be found at https://docs.sylabs.io/guides/3.0/user-guide/definition_files.html.
Python libraries are handled separately (on a per-project basis) by Pipenv. This 1) reduces the image size and 2) allows for multiple applications (e.g., including client scripts) to use the same image.
PyTriton docs state that it is most rigorously tested on Ubuntu 22.04, so we choose that OS for the base image. We rely on PyTorch to install CUDA support via pip. (At runtime, Singularity's `--nv` flag additionally makes the host's NVIDIA driver available inside the container.)
To build the image, make sure you are logged into the VM (if on Windows) and in the `/triton` directory you previously bound to the codebase on the local file system. (If on Linux with `sudo`, just `cd` to the codebase.)
You can then run:
sudo singularity build python_3.10.sif python_3.10.def
Upload the built image (`python_3.10.sif`) to the cluster, replacing your username below:
rsync --info=progress1 python_3.10.sif USERNAME@CLUSTER_HOST:/vast/neurobooth/sif/python_3.10.sif
Other tools, such as `sftp` or `scp`, may be used if you do not have `rsync` installed on your machine. Cygwin provides an `rsync` installation for Windows.
NOTE: In the past, storing the image on the same machine running the image (e.g., `/space/neo/4`) has caused server crashes. Storing the images under `/vast/neurobooth` sidesteps this issue.
This step is only necessary if updating the Python libraries or for first-time setup of a new project.
For this project, you can log into the cluster, `cd /space/drwho/3/neurobooth/applications/triton_server/`, and then run `./install_python_libs.sh`.
The section below explains some of what this script does.
After updating the libraries, check the new `Pipfile` and `Pipfile.lock` files into your code repository.
The Python libraries needed by the image are specified in the project's `Pipfile`.
To generate a new `Pipfile` on the cluster, first `cd` to your project directory (e.g., `/space/drwho/3/neurobooth/applications/triton_server`) and run the following command (modifying the package list as necessary):
HARG=$(pwd) # Store the absolute path to your current directory
singularity exec -H $HARG /vast/neurobooth/sif/python_3.10.sif pipenv install \
numpy \
"scipy>=1.10" \
pandas \
"torch>=2.1" \
torchvision \
torchaudio \
nvidia-pytriton \
lightning \
torchmetrics \
"pydantic>=2.0" \
pyyaml \
tqdm
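The `-H $HARG` option is necessary for this to work as intended. By default, Singularity binds your home directory to the container's home directory. The `-H` option overrides this behavior so that the container's home directory is bound to the specified directory on the cluster; for example, Pipenv creates its virtual environments under the home directory by default, so they end up inside the project directory rather than in your quota-limited home. This argument must be an absolute path (e.g., no `.` or `..`).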
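Because `-H` only accepts an absolute path, a script that receives a relative directory can resolve it first. A minimal sketch, assuming bash; `abs_dir` is a hypothetical helper, and it returns the logical path (the same form `$(pwd)` produces above) so that the container's home directory stays consistent between runs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn ".", "..", or any relative directory into the
# absolute path that singularity's -H option expects.
abs_dir() {
  local dir="${1:-.}"
  if [ ! -d "$dir" ]; then
    echo "abs_dir: not a directory: $dir" >&2
    return 1
  fi
  # cd runs in a subshell, so the caller's working directory is unchanged.
  # Plain pwd gives the logical path, matching HARG=$(pwd) above.
  (cd "$dir" && pwd)
}

# Usage:
# HARG=$(abs_dir .)
# singularity exec -H "$HARG" /vast/neurobooth/sif/python_3.10.sif pipenv --venv
```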
You can then validate that GPU support is enabled with the following command:
singularity exec -H $HARG --nv /vast/neurobooth/sif/python_3.10.sif pipenv run \
python -c "import torch; print(f'GPU Enabled: {torch.cuda.is_available()}, # GPUs: {torch.cuda.device_count()}')"
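The command above only prints whether a GPU is visible. For scripts, it can help to turn that into an exit status instead. A sketch assuming bash, the image path used above, and that it is run from the project directory on a machine with a GPU; `require_gpu` is a hypothetical name, not a script in this repository.

```bash
#!/usr/bin/env bash
# Image path from the commands above; override by exporting SIF.
SIF="${SIF:-/vast/neurobooth/sif/python_3.10.sif}"

# Hypothetical helper: returns non-zero when PyTorch inside the container
# cannot see a GPU, so callers can fail early (e.g. before starting the server).
require_gpu() {
  singularity exec -H "$(pwd)" --nv "$SIF" pipenv run \
    python -c "import sys, torch; sys.exit(0 if torch.cuda.is_available() else 1)"
}

# Usage:
# require_gpu || { echo "No GPU visible inside the container" >&2; exit 1; }
```

Forgetting `--nv` is a common reason for `torch.cuda.is_available()` to report `False`, which is why the flag is always passed here.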
To install local code repos as editable packages (`pip install -e`), you need to make sure their location on the host file system is mounted into the Singularity container using the `--bind` argument. For example:
singularity exec \
-H $(pwd) \
--bind "/space/neo/3/neurobooth/applications/neurobooth-analysis-tools:/dep/neurobooth-analysis-tools" \
/vast/neurobooth/sif/python_3.10.sif \
pipenv install -e /dep/neurobooth-analysis-tools
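When several local repos are needed, Singularity accepts multiple bind specifications as one comma-separated `--bind` value. The sketch below uses a hypothetical helper, `bind_spec`, to build that value with the `/dep/<repo-name>` convention from the example. It assumes the repo paths contain no commas.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a single comma-separated --bind value that mounts
# each host repository at /dep/<repo-name> inside the container.
bind_spec() {
  local spec="" repo entry
  for repo in "$@"; do
    repo="${repo%/}"                          # tolerate a trailing slash
    entry="$repo:/dep/$(basename "$repo")"
    spec="${spec:+$spec,}$entry"              # comma-separate the entries
  done
  printf '%s\n' "$spec"
}

# Usage (one editable install per repo, same mount layout as above):
# repo=/space/neo/3/neurobooth/applications/neurobooth-analysis-tools
# singularity exec -H "$(pwd)" --bind "$(bind_spec "$repo")" \
#   /vast/neurobooth/sif/python_3.10.sif \
#   pipenv install -e "/dep/$(basename "$repo")"
```

Note that a repo installed with `-e` must also be bound at the same container path whenever the environment is used later, not only during installation.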
To start the server, execute `./run_server.sh`.
Note: To run the server as a long-lived service, the container needs to be started as a Singularity instance. See https://docs.sylabs.io/guides/3.0/user-guide/running_services.html
A Python script can be run in the Singularity container with:
./singularity_exec.sh pipenv run python script.py
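The contents of `singularity_exec.sh` are not reproduced in this README. The following is a hypothetical sketch of what such a wrapper could look like, combining the `-H` and `--nv` options described above; the real script may differ (for example, in which `--bind` paths it adds).

```bash
#!/usr/bin/env bash
# Image path from the commands above; override by exporting SIF.
SIF="${SIF:-/vast/neurobooth/sif/python_3.10.sif}"

# Hypothetical wrapper in the spirit of singularity_exec.sh: run any command
# inside the image, with the current project directory as the container's
# home (so Pipenv finds its environment) and GPU support enabled.
singularity_exec() {
  if [ "$#" -eq 0 ]; then
    echo "usage: singularity_exec <command> [args...]" >&2
    return 2
  fi
  # -H needs an absolute path; $(pwd) always returns one.
  singularity exec -H "$(pwd)" --nv "$SIF" "$@"
}

# Usage (from the project directory):
# singularity_exec pipenv run python script.py
```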
Server status can be queried via:
curl -v localhost:8000/v2/health/live
The readiness of a model can be queried via:
curl -v localhost:8000/v2/models/model_name/ready
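Right after `./run_server.sh` starts, the server may still be loading models, so a single `curl` can fail. A sketch of a polling helper, assuming bash and `curl`; `wait_for_endpoint` is a hypothetical name, and `model_name` is the same placeholder used above. `curl -f` makes HTTP error responses count as failures, just like a refused connection.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll an endpoint once per second until it answers with
# a successful HTTP status or the timeout (in seconds) expires.
wait_for_endpoint() {
  local url="${1:-localhost:8000/v2/health/live}"
  local timeout="${2:-60}"
  local waited=0
  # -s: quiet, -f: non-zero exit on HTTP error status, -o: discard the body
  until curl -sf -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage: wait for the server, then for a specific model.
# wait_for_endpoint localhost:8000/v2/health/live 120
# wait_for_endpoint localhost:8000/v2/models/model_name/ready 120
```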