- Lock the GPU clock frequency and turn on ECC:
nvidia-smi -q -d CLOCK; nvidia-smi -ac 5001,1590 -i x
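A minimal setup sketch, assuming GPU index 0 and the clock values shown above (check the SUPPORTED_CLOCKS query for the values your GPU accepts; an ECC change only takes effect after a GPU reset or reboot):
# query current and supported clocks
nvidia-smi -q -d CLOCK -i 0
nvidia-smi -q -d SUPPORTED_CLOCKS -i 0
# pin application clocks to <memory,graphics> in MHz
nvidia-smi -ac 5001,1590 -i 0
# enable ECC (needs a GPU reset or reboot to take effect)
nvidia-smi -e 1 -i 0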
- Start the server container using nvidia-docker and the specified image:
nvidia-docker run --rm -it --name=triton_bert -p8000:8000 -p8001:8001 -v/path/to/model/repo:/models nvcr.io/nvidia/tensorrtserver:19.09-py3
- Organize your model repository with the model files and a config.pbtxt for each model (see the sketch below).
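A sketch of the expected layout, assuming the model name bert_trt_fp16 used by the perf_client example below and a serialized TensorRT engine (the paths and engine file name are placeholders). Each model gets its own directory with numbered version subdirectories; because the server below runs with --strict-model-config=False, it can derive most settings from the plan file itself, so the config.pbtxt can stay minimal:
# hypothetical layout; adjust the paths and engine file name to your setup
mkdir -p /path/to/model/repo/bert_trt_fp16/1
cp your_bert_fp16.plan /path/to/model/repo/bert_trt_fp16/1/model.plan
# minimal config.pbtxt; inputs/outputs are auto-completed when strict model config is off
cat > /path/to/model/repo/bert_trt_fp16/config.pbtxt <<'EOF'
name: "bert_trt_fp16"
platform: "tensorrt_plan"
max_batch_size: 8
EOF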
- Inside the server container, preload the plugin libraries before launching the server:
export LD_PRELOAD=/path/to/libcommon.so:/path/to/libbert_plugins.so:/path/to/libtf_fastertransformer.so
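Note that these .so paths must exist inside the server container. One way to arrange that (a sketch; /path/to/libs and /plugins are placeholders) is to mount the host directory holding the libraries and set LD_PRELOAD through the container environment when starting the server container:
# same run command as above, plus a mount for the plugin libraries and the preload variable
nvidia-docker run --rm -it --name=triton_bert -p8000:8000 -p8001:8001 \
  -v/path/to/model/repo:/models -v/path/to/libs:/plugins \
  -e LD_PRELOAD=/plugins/libcommon.so:/plugins/libbert_plugins.so:/plugins/libtf_fastertransformer.so \
  nvcr.io/nvidia/tensorrtserver:19.09-py3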
- Launch the server:
trtserver --model-store=/models --log-verbose=1 --strict-model-config=False
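Once the server log reports the model as ready, you can sanity-check it from the host over the HTTP port (these are the v1 TRTIS API endpoints exposed on port 8000):
# returns HTTP 200 when the server and its models are ready
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/api/health/ready
# prints server and per-model status
curl localhost:8000/api/status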
- In another terminal, run the client image:
docker run --rm -it --net=host nvcr.io/nvidia/tensorrtserver:20.02-py3-clientsdk
- In the client container, run perf_client (just an example; adjust the parameters according to the help output):
install/bin/perf_client -m bert_trt_fp16 -d -c1 -l2000 -p15000 -b8 -i grpc -u localhost:8001 -t1 --max-threads=64 --input-data=zero
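To sweep a range of concurrency levels instead of a single one, you can raise -c, assuming -d/-c/-l keep their usual perf_client meaning (dynamic mode: start at the -t concurrency and increase it until the -c maximum or the -l latency limit in ms is hit):
# search concurrency 1 through 8, stopping early if latency exceeds 2000 ms
install/bin/perf_client -m bert_trt_fp16 -d -c8 -l2000 -p15000 -b8 \
  -i grpc -u localhost:8001 -t1 --max-threads=64 --input-data=zero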