This tutorial will guide you through step-by-step instructions for:
- Installing Intel® Optimization for TensorFlow Serving as a Docker image
- Running an example - serving a ResNet-50 v1 SavedModel using the REST API and gRPC.
Access to a machine with the following resources:

Hardware recommendations
- minimum of 20 GB of free disk space (required)
- minimum of 8 logical cores (highly recommended)
- We recommend the following instance types for cloud VMs, which have the latest Intel® Xeon® Processors:
- AWS: C5 Instances (Helpful guides: Get_Started, Accessing_Instances)
- GCP: "Intel Skylake" or "Intel Cascade Lake" CPU Platform (Helpful guides: Get_Started, Accessing_Instances)
- Azure: Fsv2-series, Hc-series, or M-series (Helpful guides: Get_Started, Accessing_Instances)
Software recommendations

Install Docker CE
- Click here for Ubuntu instructions. For other OS platforms, see here.
- Set up Docker to be used as a non-root user, so that you can run docker commands without sudo. Exit and restart your SSH session so that your username takes effect in the docker group.
sudo usermod -aG docker `whoami`
- After exiting and restarting your SSH session, you should be able to run docker commands without sudo.
NOTE: If your machine is behind a proxy, see the HTTP/HTTPS proxy section here.
docker run hello-world
We will break down the installation into 2 steps:
- Step 1: Pull or build the Intel Optimized TensorFlow Serving Docker image
- Step 2: Verify the Docker image by serving a simple model - half_plus_two
The recommended way to use TensorFlow Serving is with Docker images. The easiest way to get an image is to pull the latest version from Docker Hub.
$ docker pull intel/intel-optimized-tensorflow-serving:2.2.0
- Log in to your machine via SSH, clone the TensorFlow Serving repository, and save the path of the cloned directory (also adding it to .bashrc) for ease of use for the remainder of this tutorial.
git clone https://github.com/tensorflow/serving.git
export TF_SERVING_ROOT=$(pwd)/serving
echo "export TF_SERVING_ROOT=$(pwd)/serving" >> ~/.bashrc
If you pulled the image and cloned the repository, you can move on to step 2. Alternatively, you can build an image with TensorFlow Serving optimized for Intel® Processors. You can build the docker images using this script or continue with the steps below.
- Using Dockerfile.devel-mkl, build an image with the Intel-optimized ModelServer. This creates an image with all the required development tools and builds from source. The image will be around 5 GB and the build will take some time; on an AWS c5.4xlarge instance (16 logical cores), it took about 25 minutes.
NOTE: It is recommended that you build an official release version using --build-arg TF_SERVING_VERSION_GIT_BRANCH="<release_number>", but if you wish to build the (unstable) head of master, omit the build argument and master will be used by default.
cd $TF_SERVING_ROOT/tensorflow_serving/tools/docker/
docker build \
  -f Dockerfile.devel-mkl \
  --build-arg TF_SERVING_BUILD_OPTIONS="--config=mkl" \
  --build-arg TF_SERVING_VERSION_GIT_BRANCH="2.2.0" \
  -t intel/intel-optimized-tensorflow-serving:2.2.0-devel .
- Next, using Dockerfile.mkl, build the serving image, which is a lightweight image without any development tools in it. Dockerfile.mkl builds the serving image by copying the Intel-optimized libraries and ModelServer from the development image built in the previous step - tensorflow/serving:latest-devel-mkl.
cd $TF_SERVING_ROOT/tensorflow_serving/tools/docker/
docker build \
  -f Dockerfile.mkl \
  --build-arg TF_SERVING_BUILD_OPTIONS="--config=mkl" \
  --build-arg TF_SERVING_VERSION_GIT_BRANCH="2.2.0" \
  -t intel/intel-optimized-tensorflow-serving:2.2.0 .
NOTE 1: Docker build commands require a . path argument at the end; see the docker examples for more background.
NOTE 2: If your machine is behind a proxy, you will need to pass proxy arguments to both build commands. For example:
--build-arg http_proxy="http://proxy.url:proxy_port" --build-arg https_proxy="http://proxy.url:proxy_port"
- Once you have built both images, you should be able to list them using the docker images command:
docker images
REPOSITORY                                   TAG           IMAGE ID       CREATED          SIZE
intel/intel-optimized-tensorflow-serving    2.2.0         d33c8d849aa3   7 minutes ago    520MB
intel/intel-optimized-tensorflow-serving    2.2.0-devel   a2e69840d5cc   8 minutes ago    5.21GB
ubuntu                                       18.04         20bb25d32758   13 days ago      87.5MB
hello-world                                  latest        fce289e99eb9   5 weeks ago      1.84kB
Let's test the server by serving a simple oneDNN version of the half_plus_two model, which is included in the repository we cloned in the previous step.
- Set the location of the test model data:
export TEST_DATA=$TF_SERVING_ROOT/tensorflow_serving/servables/tensorflow/testdata
- Start the container:
- with -d, run the container as a background process
- with -p, publish the container's port 8501 to the host's port 8501, where TensorFlow Serving listens for REST API requests
- with --name, assign a name to the container so you can access it later to check its status or kill it
- with -v, mount the host's local model directory $TEST_DATA/saved_model_half_plus_two_mkl on the container's /models/half_plus_two
- with -e, set an environment variable in the container, which is read by TensorFlow Serving
- with the intel/intel-optimized-tensorflow-serving:2.2.0 docker image
docker run \
  -d \
  -p 8501:8501 \
  --name tfserving_half_plus_two \
  -v $TEST_DATA/saved_model_half_plus_two_mkl:/models/half_plus_two \
  -e MODEL_NAME=half_plus_two \
  intel/intel-optimized-tensorflow-serving:2.2.0
- Query the model using the predict API:
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
  -X POST http://localhost:8501/v1/models/half_plus_two:predict
You should see the following output:
{ "predictions": [2.5, 3.0, 4.5] }
NOTE: If you see a response like the one below after sending the predict request, make sure to set your proxy (inside a corporate environment):
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
  -X POST http://localhost:8501/v1/models/half_plus_two:predict
<HTML>
<HEAD><TITLE>Redirection</TITLE></HEAD>
<BODY><H1>Redirect</H1></BODY>
Place this proxy information in your ~/.bashrc or /etc/environment:
export http_proxy="<http_proxy>"
export https_proxy="<https_proxy>"
export ftp_proxy="<ftp_proxy>"
export socks_proxy="<socks_proxy>"
export HTTP_PROXY=${http_proxy}
export HTTPS_PROXY=${https_proxy}
export FTP_PROXY=${ftp_proxy}
export SOCKS_PROXY=${socks_proxy}
export no_proxy=localhost,127.0.0.1,<add_your_machine_ip>,<add_your_machine_hostname>
export NO_PROXY=${no_proxy}
- After you are finished with querying, you can stop the container, which is running in the background. To restart a container with the same name, you first need to stop and remove the existing container. To view your running containers, run docker ps.
docker rm -f tfserving_half_plus_two
- Note: If you want to confirm that Intel® oneAPI Deep Neural Network Library (Intel® oneDNN) optimizations are being used, add -e MKLDNN_VERBOSE=1 to the docker run command. This will log Intel oneDNN messages in the docker logs, which you can inspect after a request is processed.
docker run \
  -d \
  -p 8501:8501 \
  --name tfserving_half_plus_two \
  -v $TEST_DATA/saved_model_half_plus_two_mkl:/models/half_plus_two \
  -e MODEL_NAME=half_plus_two \
  -e MKLDNN_VERBOSE=1 \
  intel/intel-optimized-tensorflow-serving:2.2.0
Query the model using the predict API as before:
curl -d '{"instances": [1.0, 2.0, 5.0]}' \ -X POST http://localhost:8501/v1/models/half_plus_two:predict
Result:
{ "predictions": [2.5, 3.0, 4.5] }
Then, you should see the Intel oneDNN verbose output like below when you display the container's logs:
docker logs tfserving_half_plus_two
Output:
...
mkldnn_verbose,exec,reorder,simple:any,undef,in:f32_nhwc out:f32_nChw16c,num:1,1x1x10x10,0.00488281
mkldnn_verbose,exec,reorder,simple:any,undef,in:f32_hwio out:f32_OIhw16i16o,num:1,1x1x1x1,0.000976562
mkldnn_verbose,exec,convolution,jit_1x1:avx512_common,forward_training,fsrc:nChw16c fwei:OIhw16i16o fbia:x fdst:nChw16c,alg:convolution_direct,mb1_g1ic1oc1_ih10oh10kh1sh1dh0ph0_iw10ow10kw1sw1dw0pw0,0.00805664
mkldnn_verbose,exec,reorder,simple:any,undef,in:f32_nChw16c out:f32_nhwc,num:1,1x1x10x10,0.012207
TensorFlow Serving requires the model to be in SavedModel format. In this example, we will:
- Download a pre-trained ResNet-50 v1 SavedModel
- Use the python client code from the TensorFlow Serving repository and query using two methods:
- Using REST API, which is simple to set up, but lacks performance when compared with gRPC
- Using gRPC, which has optimal performance but the client code requires additional dependencies to be installed
NOTE: NCHW data format is optimal for Intel-optimized TensorFlow Serving.
mkdir /tmp/resnet
curl -s http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v1_fp32_savedmodel_NCHW_jpg.tar.gz \
| tar --strip-components=2 -C /tmp/resnet -xvz
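If you have TensorFlow installed locally (for example, in the virtual environment created later in the gRPC section), you can optionally inspect the downloaded SavedModel's serving signature before serving it. This is a minimal sketch under that assumption; the numeric version directory name under /tmp/resnet depends on the downloaded model:
# Minimal sketch: inspect the serving signature of the downloaded SavedModel.
# Assumes TensorFlow is installed locally; the version directory name may vary.
import glob
import tensorflow as tf

model_dir = glob.glob("/tmp/resnet/*")[0]  # the single numeric version directory
loaded = tf.saved_model.load(model_dir)
serving_fn = loaded.signatures["serving_default"]
print(serving_fn.structured_input_signature)
print(serving_fn.structured_outputs)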
Querying using the REST API is simple to set up, but lacks performance when compared with gRPC.
- If a running container is using port 8501, you need to stop it. View your running containers with docker ps. To stop and remove the container, copy the CONTAINER ID from the docker ps output and run docker rm -f <container_id>.
- Start the container:
- with -d, run the container as a background process
- with -p, publish the container's port 8501 to the host's port 8501, where TensorFlow Serving listens for REST API requests
- with --name, assign a name to the container so you can access it later to check its status or kill it
- with -v, mount the host's local model directory /tmp/resnet on the container's /models/resnet
- with -e, set an environment variable in the container, which is read by TensorFlow Serving
- with the intel/intel-optimized-tensorflow-serving:2.2.0 docker image
docker run \
  -d \
  -p 8501:8501 \
  --name=tfserving_resnet_restapi \
  -v "/tmp/resnet:/models/resnet" \
  -e MODEL_NAME=resnet \
  intel/intel-optimized-tensorflow-serving:2.2.0
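Before sending predict requests, you can optionally confirm that the model has loaded by querying TensorFlow Serving's model status endpoint. A minimal Python sketch, assuming the requests package is installed:
# Minimal sketch: check that the resnet model is loaded before querying it.
# Assumes the `requests` package is installed.
import requests

status = requests.get("http://localhost:8501/v1/models/resnet")
status.raise_for_status()
print(status.json())  # look for a model_version_status entry with state "AVAILABLE"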
- If you don't already have them, install the prerequisites for running the Python client code:
sudo apt-get install -y python python-requests
- Run the example resnet_client.py script from the TensorFlow Serving repository:
python $TF_SERVING_ROOT/tensorflow_serving/example/resnet_client.py
You should see the following output:
Prediction class: 286, avg latency: 34.7315 ms
Note: The real performance you see will depend on your hardware, environment, and whether or not you have configured the server parameters optimally. See the General Best Practices for more information.
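For reference, the REST client essentially downloads a JPEG image, base64-encodes it, and posts it to the predict endpoint. The sketch below illustrates that flow; the image URL and the exact request/response fields are assumptions, so check resnet_client.py in your cloned repository for the details:
# Minimal sketch of a REST client for the jpg-input ResNet SavedModel.
# The image URL is only an example; any JPEG image will do.
import base64
import requests

SERVER_URL = "http://localhost:8501/v1/models/resnet:predict"
IMAGE_URL = "https://tensorflow.org/images/blogs/serving/cat.jpg"  # example image

# This SavedModel variant accepts base64-encoded JPEG bytes as input.
jpeg_bytes = requests.get(IMAGE_URL).content
payload = {"instances": [{"b64": base64.b64encode(jpeg_bytes).decode("utf-8")}]}

response = requests.post(SERVER_URL, json=payload)
response.raise_for_status()
print(response.json()["predictions"][0])  # contains the predicted class, e.g. 286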
- After you are finished with querying, you can stop the container, which is running in the background. To restart a container with the same name, you first need to stop and remove the existing container. To view your running containers, run docker ps.
docker rm -f tfserving_resnet_restapi
Querying using gRPC gives optimal performance, but the client code requires additional dependencies to be installed.
- If a running container is using port 8500, you need to stop it. View your running containers with docker ps. To stop and remove the container, copy the CONTAINER ID from the docker ps output and run docker rm -f <container_id>.
- Start a container:
- with -d, run the container as a background process
- with -p, publish the container's port 8500 to the host's port 8500, where TensorFlow Serving listens for gRPC requests
- with --name, assign a name to the container so you can access it later to check its status or kill it
- with -v, mount the host's local model directory /tmp/resnet on the container's /models/resnet
- with -e, set an environment variable in the container, which is read by TensorFlow Serving
- with the intel/intel-optimized-tensorflow-serving:2.2.0 docker image
docker run \
  -d \
  -p 8500:8500 \
  --name=tfserving_resnet_grpc \
  -v "/tmp/resnet:/models/resnet" \
  -e MODEL_NAME=resnet \
  intel/intel-optimized-tensorflow-serving:2.2.0
- You will need a few Python packages in order to run the client; we recommend installing them in a virtual environment.
sudo apt-get install -y python python-pip
pip install virtualenv
- Create and activate the Python virtual environment, then install the packages needed for the gRPC client.
cd ~
virtualenv -p python3 tfserving_venv
source tfserving_venv/bin/activate
pip install requests tensorflow tensorflow-serving-api
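As a quick sanity check, you can verify inside the activated virtual environment that the gRPC client dependencies import correctly (a minimal sketch):
# Quick sanity check: run inside the activated virtual environment.
import grpc
import tensorflow as tf
from tensorflow_serving.apis import prediction_service_pb2_grpc

print("TensorFlow version:", tf.__version__)
print("PredictionService stub available:", hasattr(prediction_service_pb2_grpc, "PredictionServiceStub"))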
- Run the example resnet_client_grpc.py script from the TensorFlow Serving repository, which you cloned earlier.
Note: You may have to migrate the script for TF2 compatibility, because it was not up to date last time we checked. To fix the script, you can search-and-replace tf.app with tf.compat.v1.app.
python $TF_SERVING_ROOT/tensorflow_serving/example/resnet_client_grpc.py
You should see output similar to the one below:
outputs {
  key: "classes"
  value {
    dtype: DT_INT64
    tensor_shape {
      dim {
        size: 1
      }
    }
    int64_val: 286
  }
}
outputs {
  key: "probabilities"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 1001
      }
    }
    float_val: 7.8115895974e-08
    float_val: 3.93756813821e-08
    float_val: 6.0871172991e-07
    .....
    .....
  }
}
model_spec {
  name: "resnet"
  version {
    value: 1538686758
  }
  signature_name: "serving_default"
}
- To deactivate your virtual environment:
deactivate
- After you are finished with querying, you can stop the container, which is running in the background. To restart a container with the same name, you first need to stop and remove the existing container. To view your running containers, run docker ps.
docker rm -f tfserving_resnet_grpc
If you have any problems while making a request, the best way to debug is to check the docker logs.
First, find the Container ID of your running docker container with docker ps, and then view its logs with docker logs <container_id>.
If you have added -e MKLDNN_VERBOSE=1 to the docker run command, you should see mkldnn_verbose messages too.