Getting started: using the new features of MIGraphX 0.3
MIGraphX 0.3 supports the following new features:
- TensorFlow support
- Quantization support, part 1
- Horizontal fusion
This page provides examples of how to use these new features.
The initial release of quantization support quantizes graph weights and values from float32 to float16. A new "quantize" API has been added to the MIGraphX library. The function can be called from both the Python and the C++ interfaces. The examples below use the C++ API.
#include <migraphx/quantization.hpp>
void quantize(program& prog, const std::vector<std::string>& ins_names);
void quantize(program& prog);
The quantization function should be called after the program is loaded and before it is compiled, e.g.
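// Load the model, quantize it, then compile it for the GPU target.
migraphx::program prog = migraphx::parse_onnx(model_filename);
migraphx::quantize(prog);
prog.compile(migraphx::gpu::target{});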
When called with one argument, the quantize function converts all operations to float16. To quantize only particular operations, pass a vector of operation names as the second argument. For example, the following quantizes only addition and subtraction:
quantize(prog, {"add", "sub"});
Quantization from float32 to float16 can speed up programs, both by using faster GPU operations and by reducing the amount of data that must be copied between layers.
MIGraphX can now read models from the TensorFlow framework that have been frozen for inference. Frozen TensorFlow models are prepared in several steps that (a) remove training operators and (b) freeze the graph, including the model weights, into a *.pb file. These steps are illustrated below with an example from the TensorFlow Research Slim model library. We have verified MIGraphX using image models from this library.
The Slim model library includes a script named "export_inference_graph.py", which saves just the model definition to a file. Our first step is to call this script:
prompt% python3 ${TENSORFLOW_MODELS}/research/slim/export_inference_graph.py --model_name=inception_v4 --output_file=${INCEPTIONDIR}/inception_v4_model.pb --batch_size=1
A few items to note about this particular invocation:
- The script has an option to save either a training-focused version of the model or an inference-focused version. By default, it saves an inference graph, in which training-only operators such as dropout are removed. MIGraphX does not support such operators, so if you are saving your own graph, you may need to save a version specifically intended for inference.
- We pass a parameter that fixes a particular batch size in the graph. MIGraphX does not currently support graphs with a variable batch size.
The next step is to freeze the graph itself. Freezing pulls together the trained weights and the saved model definition and saves the combination as a frozen model. To freeze the graph, one needs to identify the "output nodes" of the last computation. We can find the names of the output nodes with the following command:
prompt% ${TENSORFLOWDIR}/bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=${INCEPTIONDIR}/inception_v4_model.pb
As the input graph, we pass the model file saved by the previous command. The output includes the following line:
Found 2 possible outputs: (name=InceptionV4/AuxLogits/Aux_logits/BiasAdd, op=BiasAdd) (name=InceptionV4/Logits/Predictions, op=Softmax)
We pass the name of the final output node, InceptionV4/Logits/Predictions, to the freeze_graph script as follows:
prompt% ${TENSORFLOWDIR}/bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=${INCEPTIONDIR}/inception_v4_model.pb \
--input_binary=true \
--input_checkpoint=${INCEPTIONDIR}/inception_v4.ckpt \
--output_node_names=InceptionV4/Logits/Predictions \
--output_graph=${INCEPTIONDIR}/inception_v4_i1.pb
This script combines the input checkpoint (which contains the trained weights) with the saved model definition and writes a frozen TensorFlow model, "inception_v4_i1.pb", that we can use with MIGraphX.
MIGraphX provides the following API to load a frozen TensorFlow model:
#include <migraphx/tf.hpp>
/// Create a program from a tf pb file (default is nhwc format)
program parse_tf(const std::string& name, bool is_nhwc);
This API is similar to the parse_onnx() routine previously available in MIGraphX, except that it accepts models in either NHWC or NCHW format, selected with the is_nhwc argument. The API is also available through the Python interface, with the current limitation that the Python API supports either TensorFlow or ONNX, but not both simultaneously. Setting the CMake variable
cmake -DMIGRAPHX_ENABLE_TF=On
at build time enables the Python API for TensorFlow.