This guide shows how to build an Intel® Extension for TensorFlow* CC library from source and how to use tensorflow_cc to build C and C++ language bindings on Ubuntu.
Verified Hardware Platforms:
- Intel® CPU (Xeon, Core)
- Intel® Data Center GPU Flex Series
- Intel® Data Center GPU Max Series
- Intel® Arc™ Graphics (experimental)
To build Intel® Extension for TensorFlow*, install Bazel 5.3.0. Refer to Install Bazel for details.
Here are the recommended commands:
$ wget https://github.com/bazelbuild/bazel/releases/download/5.3.0/bazel-5.3.0-installer-linux-x86_64.sh
$ bash bazel-5.3.0-installer-linux-x86_64.sh --user
Check that Bazel installed successfully and is version 5.3.0:
$ bazel --version
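The version check above can also be scripted so the build fails fast on a mismatch. This is a minimal sketch: the `check_bazel_version` helper is hypothetical, and the string passed at the bottom is a sample standing in for real `bazel --version` output.

```shell
# check_bazel_version: compare the first line of `bazel --version` output
# against the required release (5.3.0). In a real environment, call it as:
#   check_bazel_version "$(bazel --version)"
check_bazel_version() {
  case "$1" in
    "bazel 5.3.0"*) echo "ok" ;;
    *)              echo "mismatch" ;;
  esac
}

check_bazel_version "bazel 5.3.0"   # sample input; prints "ok"
```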
$ git clone https://github.com/intel/intel-extension-for-tensorflow.git intel-extension-for-tensorflow
$ cd intel-extension-for-tensorflow/
- Install Conda.
- Create a virtual running environment:
$ conda create -n itex_build python=3.10
$ conda activate itex_build
Note: Python versions 3.8 through 3.11 are supported.
Install TensorFlow 2.15.0, and refer to Install TensorFlow for details.
$ pip install tensorflow==2.15.0
Check TensorFlow was installed successfully and is version 2.15.0:
$ python -c "import tensorflow as tf;print(tf.__version__)"
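The same check can be made strict in a build script. This sketch uses a hypothetical `check_tf_version` helper, and the value at the bottom is a sample standing in for the command's real output:

```shell
# Sketch: fail fast when the reported TensorFlow version is not 2.15.x.
# In a real environment, obtain the value with:
#   tf_version=$(python -c "import tensorflow as tf; print(tf.__version__)")
check_tf_version() {
  case "$1" in
    2.15.*) echo "TensorFlow version OK" ;;
    *)      echo "Unexpected TensorFlow version: $1" ;;
  esac
}

check_tf_version "2.15.0"   # sample value; prints "TensorFlow version OK"
```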
Install the Intel GPU driver on the build server; it is needed to build with GPU support and AOT (ahead-of-time compilation).
Refer to Install Intel GPU driver for details.
Note:
- Make sure to install the developer runtime packages before building Intel® Extension for TensorFlow*.
- AOT (Ahead-of-time compilation)
AOT is a compile option that reduces GPU kernel initialization at startup by generating binary code for a specified hardware platform at build time. AOT makes the installation package larger but shortens startup time.
Without AOT, Intel® Extension for TensorFlow* is translated to binary code for the local hardware platform during startup, which can prolong startup time to several minutes or more when using a GPU.
For more information, refer to Use AOT for Integrated Graphics (Intel GPU).
We recommend installing the oneAPI Base Toolkit using sudo (or as the root user) to the system directory /opt/intel/oneapi.
The following commands assume the oneAPI Base Toolkit is installed in /opt/intel/oneapi. If you installed it in a different folder, update the oneAPI path accordingly.
Refer to Install oneAPI Base Toolkit Packages for details.
The oneAPI Base Toolkit provides the compiler and libraries needed by Intel® Extension for TensorFlow*.
Enable oneAPI components:
$ source /opt/intel/oneapi/compiler/latest/env/vars.sh
$ source /opt/intel/oneapi/mkl/latest/env/vars.sh
Configure the build by running the ./configure command at the root of your cloned Intel® Extension for TensorFlow* source tree.
$ ./configure
Choose n to build for CPU only. Refer to Configure Example.
Configure the build by running the ./configure command at the root of your cloned Intel® Extension for TensorFlow* source tree. The script prompts you for the locations of Intel® Extension for TensorFlow* dependencies and for additional build configuration options (for example, the path to the DPC++ compiler).
$ ./configure
- Choose Y for Intel GPU support. Refer to Configure Example.
- Specify the location of the compiler (DPC++). The default is /opt/intel/oneapi/compiler/latest/linux/, which is the default installation path. Press Enter to confirm the default location. If yours is different, enter the correct compiler (DPC++) installation path.
- Specify the Ahead-of-Time (AOT) compilation platforms. The default is '', which means no AOT. Enter one or more device type strings for specific hardware platforms, such as ats-m150 or acm-g11. Here is the list of GPUs we've verified:
  - Intel® Data Center GPU Flex Series 170: ats-m150
  - Intel® Data Center GPU Flex Series 140: ats-m75
  - Intel® Data Center GPU Max Series: pvc
  - Intel® Arc™ A730M: acm-g10
  - Intel® Arc™ A380: acm-g11
  Refer to the Available GPU Platforms section at the end of the Ahead of Time Compilation document for more device types, or create an issue to ask for support. To get the full list of supported device types, use the OpenCL™ Offline Compiler (OCLOC) tool (installed as part of the GPU driver), run the following command, and look for the -device <device_type> field in the output:
  $ ocloc compile --help
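The lookup described above can be sketched as a simple filter. The help text below is a shortened, hypothetical sample standing in for the real ocloc output; in a real environment you would pipe the actual command instead:

```shell
# Sketch: filter help output for the -device field.
# Real usage would be:  ocloc compile --help 2>&1 | grep -- '-device'
sample_help='-file <filename>
-device <device_type>
-options <options>'

printf '%s\n' "$sample_help" | grep -- '-device'
```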
- Choose whether to build with oneMKL support. We recommend choosing y. The default path is /opt/intel/oneapi/mkl/latest, which is the default installation path. Press Enter to confirm the default location. If it is wrong, enter the correct oneMKL installation path.
For GPU support
$ bazel build -c opt --config=gpu //itex:libitex_gpu_cc.so
CC library location: <Path to intel-extension-for-tensorflow>/bazel-bin/itex/libitex_gpu_cc.so
NOTE: libitex_gpu_cc.so depends on libitex_gpu_xetla.so, so libitex_gpu_xetla.so should be copied to the same directory as libitex_gpu_cc.so:
$ cd <Path to intel-extension-for-tensorflow>
$ cp bazel-out/k8-opt-ST-*/bin/itex/core/kernels/gpu/libitex_gpu_xetla.so bazel-bin/itex/
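After copying, you can sanity-check that both libraries sit side by side. This sketch simulates the layout in a temporary directory; in practice, point `libdir` at `<Path to intel-extension-for-tensorflow>/bazel-bin/itex/` instead:

```shell
# Sketch: confirm libitex_gpu_xetla.so was copied next to libitex_gpu_cc.so.
# A temporary directory stands in for the real bazel-bin/itex/ output folder.
libdir=$(mktemp -d)
touch "$libdir/libitex_gpu_cc.so" "$libdir/libitex_gpu_xetla.so"

for lib in libitex_gpu_cc.so libitex_gpu_xetla.so; do
  if [ -e "$libdir/$lib" ]; then
    echo "$lib: present"
  else
    echo "$lib: missing"
  fi
done
```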
For CPU support
$ bazel build -c opt --config=cpu //itex:libitex_cpu_cc.so
To build with threadpool support, add the build option --define=build_with_threadpool=true and set the environment variable ITEX_OMP_THREADPOOL=0:
$ bazel build -c opt --config=cpu --define=build_with_threadpool=true //itex:libitex_cpu_cc.so
CC library location: <Path to intel-extension-for-tensorflow>/bazel-bin/itex/libitex_cpu_cc.so
NOTE: libitex_cpu_cc.so depends on libiomp5.so, so libiomp5.so should be copied to the same directory as libitex_cpu_cc.so:
$ cd <Path to intel-extension-for-tensorflow>
$ cp bazel-out/k8-opt-ST-*/bin/external/llvm_openmp/libiomp5.so bazel-bin/itex/
a. Download the TensorFlow* 2.15.0 Python package:
$ wget https://files.pythonhosted.org/packages/ed/1a/b4ab4b8f8b3a41fade4899fd00b5b2d2dad0981f3e1bb10df4c522975fd7/tensorflow-2.15.0.post1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
b. Unzip the TensorFlow* Python package:
$ unzip tensorflow-2.15.0.post1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl -d tensorflow_src
c. Create symbolic links:
$ cd ./tensorflow_src/tensorflow/
$ ln -s libtensorflow_cc.so.2 libtensorflow_cc.so
$ ln -s libtensorflow_framework.so.2 libtensorflow_framework.so
libtensorflow_cc.so location: <Path to tensorflow_src>/tensorflow/libtensorflow_cc.so
libtensorflow_framework.so location: <Path to tensorflow_src>/tensorflow/libtensorflow_framework.so
Tensorflow header file location: <Path to tensorflow_src>/tensorflow/include
a. Prepare TensorFlow* source code
$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout origin/r2.15 -b r2.15
b. Build libtensorflow_cc.so
$ ./configure
$ bazel build --jobs 96 --config=opt //tensorflow:libtensorflow_cc.so
$ ls ./bazel-bin/tensorflow/libtensorflow_cc.so
libtensorflow_cc.so location: <Path to tensorflow>/bazel-bin/tensorflow/libtensorflow_cc.so
c. Create symbolic link for libtensorflow_framework.so
$ cd ./bazel-bin/tensorflow/
$ ln -s libtensorflow_framework.so.2 libtensorflow_framework.so
libtensorflow_framework.so location: <Path to tensorflow>/bazel-bin/tensorflow/libtensorflow_framework.so
d. Build TensorFlow header files:
$ bazel build --config=opt tensorflow:install_headers
$ ls ./bazel-bin/tensorflow/include
Tensorflow header file location: <Path to tensorflow>/bazel-bin/tensorflow/include
Configure the linker environment variables with the Intel® Extension for TensorFlow* CC library (libitex_gpu_cc.so or libitex_cpu_cc.so) path:
$ export LIBRARY_PATH=$LIBRARY_PATH:<Path to intel-extension-for-tensorflow>/bazel-bin/itex/
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<Path to intel-extension-for-tensorflow>/bazel-bin/itex/
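If these exports live in a script that may be sourced more than once, the path can end up duplicated. This is a minimal sketch of an idempotent append; the `append_ld_path` helper and the path below are illustrative, not part of the build:

```shell
# Sketch: append a directory to LD_LIBRARY_PATH only if it is not already
# present (the path here is a placeholder for bazel-bin/itex/).
append_ld_path() {
  case ":$LD_LIBRARY_PATH:" in
    *":$1:"*) ;;   # already present: do nothing
    *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1" ;;
  esac
}

LD_LIBRARY_PATH=""
append_ld_path "/opt/example/bazel-bin/itex"
append_ld_path "/opt/example/bazel-bin/itex"   # second call is a no-op
echo "$LD_LIBRARY_PATH"
```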
TensorFlow* provides the C API TF_LoadPluggableDeviceLibrary to support pluggable device libraries.
To use the Intel® Extension for TensorFlow* CC library, modify the original C++ code as follows:
a. Add the header file "tensorflow/c/c_api_experimental.h":
#include "tensorflow/c/c_api_experimental.h"
b. Load libitex_gpu_cc.so or libitex_cpu_cc.so with TF_LoadPluggableDeviceLibrary:
TF_Status* status = TF_NewStatus();
TF_LoadPluggableDeviceLibrary(<lib_path>, status);
Here is the original simple example using the TensorFlow* C++ API:
// example.cc
#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"
int main() {
using namespace tensorflow;
using namespace tensorflow::ops;
Scope root = Scope::NewRootScope();
auto X = Variable(root, {5, 2}, DataType::DT_FLOAT);
auto assign_x = Assign(root, X, RandomNormal(root, {5, 2}, DataType::DT_FLOAT));
auto Y = Variable(root, {2, 3}, DataType::DT_FLOAT);
auto assign_y = Assign(root, Y, RandomNormal(root, {2, 3}, DataType::DT_FLOAT));
auto Z = Const(root, 2.f, {5, 3});
auto V = MatMul(root, assign_x, assign_y);
auto VZ = Add(root, V, Z);
std::vector<Tensor> outputs;
ClientSession session(root);
// Run and fetch VZ
TF_CHECK_OK(session.Run({VZ}, &outputs));
LOG(INFO) << "Output:\n" << outputs[0].matrix<float>();
return 0;
}
Here is the updated example with Intel® Extension for TensorFlow* enabled:
// example.cc
#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"
+ #include "tensorflow/c/c_api_experimental.h"
int main() {
using namespace tensorflow;
using namespace tensorflow::ops;
+ TF_Status* status = TF_NewStatus();
+ string xpu_lib_path = "libitex_gpu_cc.so";
+ TF_LoadPluggableDeviceLibrary(xpu_lib_path.c_str(), status);
+ TF_Code code = TF_GetCode(status);
+ if ( code == TF_OK ) {
+ LOG(INFO) << "intel-extension-for-tensorflow load successfully!";
+ } else {
+ string status_msg(TF_Message(status));
+ LOG(WARNING) << "Could not load intel-extension-for-tensorflow, please check! " << status_msg;
+ }
Scope root = Scope::NewRootScope();
auto X = Variable(root, {5, 2}, DataType::DT_FLOAT);
auto assign_x = Assign(root, X, RandomNormal(root, {5, 2}, DataType::DT_FLOAT));
auto Y = Variable(root, {2, 3}, DataType::DT_FLOAT);
auto assign_y = Assign(root, Y, RandomNormal(root, {2, 3}, DataType::DT_FLOAT));
auto Z = Const(root, 2.f, {5, 3});
auto V = MatMul(root, assign_x, assign_y);
auto VZ = Add(root, V, Z);
std::vector<Tensor> outputs;
ClientSession session(root);
// Run and fetch VZ
TF_CHECK_OK(session.Run({VZ}, &outputs));
LOG(INFO) << "Output:\n" << outputs[0].matrix<float>();
return 0;
}
Place a Makefile in the same directory as example.cc with the following contents:
- Replace <TF_INCLUDE_PATH> with the local TensorFlow* header file path, e.g. <Path to tensorflow_src>/tensorflow/include
- Replace <TFCC_PATH> with the local TensorFlow* CC library path, e.g. <Path to tensorflow_src>/tensorflow/
# Makefile
target = example_test
cc = g++
TF_INCLUDE_PATH = <TF_INCLUDE_PATH>
TFCC_PATH = <TFCC_PATH>
include = -I $(TF_INCLUDE_PATH)
lib = -L $(TFCC_PATH) -ltensorflow_framework -ltensorflow_cc
flag = -Wl,-rpath=$(TFCC_PATH) -std=c++17
source = ./example.cc
$(target): $(source)
$(cc) $(source) -o $(target) $(include) $(lib) $(flag)
clean:
rm $(target)
run:
./$(target)
Go to the directory containing example.cc and the Makefile, then build and run the example:
$ make
$ ./example_test
NOTE: For GPU support, set up the oneAPI environment variables before running the example:
$ source /opt/intel/oneapi/compiler/latest/env/vars.sh
$ source /opt/intel/oneapi/mkl/latest/env/vars.sh