Before reading this developer guide for bolt, you are strongly recommended to read the code architecture document first. It gives you a deep understanding of the overall design of bolt, which will help you develop bolt more efficiently. If you want to verify your model quickly, you can use the out-of-the-box C API or Java API to run inference and check the result. If your model runs on time series data, you can use Flow to accelerate the inference. Moreover, if you encounter unsupported operators during conversion or inference of your model, you can customize the unsupported operators step by step as described in detail in this document.
- Use out-of-the-box API to infer your model
  - C API
  - Java API
- Accelerate time series model by Flow
- Customize models with unsupported operators step by step
  - model conversion customization
  - tensor computing customization
  - inference's engine customization
- How to contribute
  - submit issue
  - pull request
## Use out-of-the-box API to infer your model

### C API

Bolt provides a C API document generated by doxygen to help you use the C API, together with an image classification example and a Chinese input method example. You can compile bolt and link the libbolt.so library into your C/C++ project.
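For orientation, here is a minimal sketch of such a C/C++ project. The entry points used below (CreateModel, PrepareModel, AllocAllResultHandle, RunModel, FreeResultHandle, DestroyModel) and the enum values are assumptions based on the doxygen-documented C API; signatures differ between bolt versions, so verify everything against bolt.h before use.

```cpp
// Hedged sketch of driving bolt's C API from C++ (all signatures assumed;
// check bolt.h for the authoritative declarations). Link with libbolt.so.
#include <vector>
#include "bolt.h"

int main()
{
    // Load a converted .bolt model onto the CPU (affinity value assumed).
    ModelHandle model = CreateModel("example.bolt", CPU_HIGH_PERFORMANCE, nullptr);

    // Describe the single input tensor (name and shape are placeholders).
    const char *inputName = "input";
    int n = 1, c = 3, h = 224, w = 224;
    DATA_TYPE dt = FP_32;
    DATA_FORMAT df = NCHW;
    PrepareModel(model, 1, &inputName, &n, &c, &h, &w, &dt, &df);

    // Bind input memory and run; outputs are read from the result handle.
    std::vector<float> inputData(1 * 3 * 224 * 224, 0.0f);
    void *inputPtrs[1] = {inputData.data()};
    ResultHandle result = AllocAllResultHandle(model);
    RunModel(model, result, 1, &inputName, inputPtrs);

    FreeResultHandle(result);
    DestroyModel(model);
    return 0;
}
```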
### Java API

Bolt provides a Java API document generated by doxygen to help you use the Java API, together with a detailed example. You can compile bolt and load libBoltModel.so through the Java Native Interface (JNI) in your Java project.
Bolt also provides an easy-to-use Python API for developers. Please check the usage of the Python API in Bolt.
## Accelerate time series model by Flow

Flow provides an API document generated by doxygen to help you use the Flow C++ header, together with examples (tinybert, faceSR, ASR). You can also use the Java API, and there is a simple GSR test.
- Use predefined flow protobuf standard to define a graph

  Here is an example graph file for the CV application faceSR: flow_facesr.prototxt. This graph has one input, one input node, one inference node and one output. The input node needs to be marked as Input, and the inference node as Inference. Each node can have multiple input or output tensors, and each node type has its own typical fields.
- Add an output tensor size inference function for each node, and register the function to the Flow function manager (optional)

  facesr doesn't need to post-process the final tensor, so the node's output tensor can be used directly. If you need to post-process the final tensor, refer to flow_tinybert, which defines a post-processing function (tinybertInferOutputSize) and registers it with the flowRegisterFunction API.
- Add an input tensor pre-processing function for each node, and register the function to the Flow function manager (optional)

  (same as the output tensor size inference function)
- Add an output tensor post-processing function for each node, and register the function to the Flow function manager (optional)

  (same as the output tensor size inference function)
- Declare a Flow object and set CPU cores and GPU. Describe the task in the Task format and use the enqueue API to add the task into the Flow heterogeneous executor.
- Use the dequeue API to get results in FIFO order. You can choose blocking mode to get all enqueued task results at the same time. The size function can be used to query the number of unfinished tasks. A sketch of this sequence is shown after this list.
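The sketch below ties these steps together, loosely modeled on the flow_tinybert and flow_facesr examples. The Flow::init, Task, enqueue and dequeue signatures, as well as the registered function's prototype, are assumptions here; consult the Flow doxygen document and the flow headers before relying on them.

```cpp
// Minimal Flow sketch under assumed signatures; verify against flow.h.
#include <map>
#include <memory>
#include <string>
#include <vector>
#include "flow.h"

// Optional post-processing / output-size-inference function (prototype
// assumed from the flow_tinybert example); registered by name below.
EE exampleInferOutputSize(std::map<std::string, std::shared_ptr<Tensor>> &inputs,
    std::shared_ptr<Tensor> &tmp,
    std::map<std::string, std::shared_ptr<Tensor>> &outputs,
    std::vector<std::string> parameter)
{
    // Set or resize the output tensors here if needed.
    return SUCCESS;
}

int main()
{
    std::string graphPath = "flow_facesr.prototxt";  // graph from step 1

    // Register the custom function with the Flow function manager.
    flowRegisterFunction("exampleInferOutputSize", exampleInferOutputSize);

    // Declare a Flow object bound to 8 CPU cores, GPU disabled.
    Flow flowExample;
    flowExample.init({graphPath}, DT_F32, AFFINITY_CPU_HIGH_PERFORMANCE, 8, false);

    // Describe a task: the graph plus its named input tensors
    // (filling the data map is omitted here).
    std::map<std::string, std::shared_ptr<Tensor>> data;
    Task task(graphPath, data);

    // Enqueue the task into the heterogeneous executor; dequeue(true)
    // blocks until all enqueued tasks finish and returns results in FIFO
    // order, while size() reports the number of unfinished tasks.
    flowExample.enqueue(task);
    std::vector<Task> results = flowExample.dequeue(true);
    return 0;
}
```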
## Customize models with unsupported operators step by step

### model conversion customization

In model_tools, you can define any operator for model conversion. The general steps are:
- Switch to the code of the specific framework (caffe/onnx/tflite) you are working on;
- Judge whether the operator is a weight-op or a non-weight-op;
- Define the operator parameter format;
- Extract the meta information of the operator;
- Extract the weight data if the operator is a weight-op; otherwise, skip this step.
Example: support `pooling` in caffe converter

1. Switch to model_tools/src/caffe, which is the caffe converter for bolt.
2. Judgment: pooling is a non-weight-op.
3. Define the `pooling` parameter format.

3.1 Modify the OperatorType data structure in common/uni/include/operator_type.h:

```cpp
typedef enum {
    ...
    OT_Pooling,  // Addition
    ...
} OperatorType;
```

3.2 Modify the inline const char* const* OperatorTypeName() function in common/uni/include/operator_type.h:

```cpp
inline const char* const* OperatorTypeName() {
    static const char* const names[] = {
        ...
        "OT_Pooling",  // Addition, keep consistent with OperatorType
        ...
    };
}
```
3.3 Add the `pooling` parameter definition of bolt in common/uni/include/parameter_spec.h:

```cpp
// Addition ======>
typedef struct {
    unsigned int kernel_t;
    unsigned int kernel_h;
    unsigned int kernel_w;
    unsigned int stride_t;
    unsigned int stride_h;
    unsigned int stride_w;
    unsigned int pad_before;
    unsigned int pad_after;
    unsigned int pad_top;
    unsigned int pad_bottom;
    unsigned int pad_left;
    unsigned int pad_right;
    RoundMode round_mode;
    PoolingMode mode;
} PoolingParamSpec;
// <====== Addition
```

3.4 Modify the int get_operator_parameter_size(OperatorType operatorType) function in common/uni/include/parameter_spec.h:

```cpp
std::map<OperatorType, int> operatorParameterSizeMap = {
    ...
    {OT_Pooling, sizeof(PoolingParamSpec)},  // Addition
};
```
4. Extract the meta information of the `pooling` operator in caffe.

4.1 Modify the OperatorType convert_caffe_type(std::string inputType) function in model_tools/src/caffe/caffe_adaptee.h. Add the caffe type mapping code as follows:

```cpp
OperatorType convert_caffe_type(std::string inputType) {
    std::map<std::string, OperatorType> operatorMap = {
        // Addition ======>
        {"Pooling", OT_Pooling},
        // <====== Addition
    };
}
```
4.2 Register the abstract adapt_Pooling() function in class ModelAdaptee in model_tools/src/model_adaptee.h if it has not been registered yet; otherwise, skip this step.

```cpp
virtual EE adapt_operator(OperatorType type, ParameterSpec *ps) {
    std::map<OperatorType, AdaptOperatorFunction> functions = {
        // Addition ======>
        {OT_Pooling, &ModelAdaptee::adapt_Pooling},
        // <====== Addition
    };
}

// Addition ======>
REGISTER_EMPTY_ADAPT_OPERATOR(adapt_Pooling)
// <====== Addition
```
4.3 Extract the meta information of the pooling operator from the caffe model: add a ParameterSpec adapt_Pooling() override function in model_tools/src/caffe/caffe_adaptee.h.

```cpp
// Addition ======>
ParameterSpec adapt_Pooling() override
{
    ParameterSpec ps;
    PoolingParamSpec p;
    memset(&p, 0, sizeof(p));
    p.kernel_t = 1;
    p.stride_t = 1;
    p.pad_before = 0;
    p.pad_after = 0;
    auto cp = layer.pooling_param();
    if (cp.has_kernel_w() && cp.has_kernel_h()) {
        p.kernel_w = cp.kernel_w();
        p.kernel_h = cp.kernel_h();
    } else {
        p.kernel_h = cp.kernel_size();
        p.kernel_w = p.kernel_h;
    }
    if (cp.has_stride_w() && cp.has_stride_h()) {
        p.stride_w = cp.stride_w();
        p.stride_h = cp.stride_h();
    } else {
        p.stride_h = cp.stride();
        p.stride_w = p.stride_h;
    }
    bool global_pooling = cp.global_pooling();
    if (global_pooling) {
        p.kernel_h = 0;
        p.kernel_w = 0;
        p.stride_h = 1;
        p.stride_w = 1;
    } else {
        CHECK_REQUIREMENT(p.kernel_h > 0);
    }
    if (cp.has_pad_w() && cp.has_pad_h()) {
        p.pad_left = cp.pad_w();
        p.pad_right = p.pad_left;
        p.pad_top = cp.pad_h();
        p.pad_bottom = p.pad_top;
    } else {
        p.pad_top = cp.has_pad() ? cp.pad() : 0;
        p.pad_bottom = p.pad_top;
        p.pad_left = p.pad_top;
        p.pad_right = p.pad_top;
    }
    if (cp.has_round_mode() && cp.round_mode() == 1) {
        p.round_mode = ROUND_FLOOR;
    } else {
        p.round_mode = ROUND_CEIL;
    }
    auto op = cp.pool();
    switch (op) {
        case caffe::PoolingParameter_PoolMethod_MAX: {
            p.mode = POOLING_MAX;
            break;
        }
        case caffe::PoolingParameter_PoolMethod_AVE: {
            p.mode = POOLING_MEAN;
            break;
        }
        default: {
            const google::protobuf::EnumDescriptor *descriptor =
                caffe::PoolingParameter::PoolMethod_descriptor();
            UNI_ERROR_LOG("can not map operator name:%s %s to Pooling.\n",
                this->layer.name().c_str(),
                descriptor->FindValueByNumber(op)->name().c_str());
        }
    }
    ps.pooling_spec = p;
    return ps;
}
// <====== Addition
```
5. Pooling is a non-weight-op, so skip this step.
Example: support `pooling` in onnx converter

1. Switch to model_tools/src/onnx, which is the onnx converter for bolt.
2. Judgment: pooling is a non-weight-op.
3. Define the `pooling` parameter format. Note: the definition actions are the same as step 3 of adding pooling to the caffe converter; please refer to the content above.
4. Extract the meta information of the `pooling` operator in onnx.

4.1 Modify the OperatorType convert_onnx_type(std::string inputType) function in model_tools/src/onnx/onnx_adaptee.h. Add the onnx type mapping code as follows:

```cpp
OperatorType convert_onnx_type(std::string inputType) {
    std::map<std::string, OperatorType> operatorMap = {
        // Addition ======>
        {"AveragePool", OT_Pooling},
        {"MaxPool", OT_Pooling},
        {"GlobalAveragePool", OT_Pooling},
        // <====== Addition
    };
}
```
4.2 Register the abstract adapt_Pooling() function in class ModelAdaptee in model_tools/src/model_adaptee.h if it has not been registered yet; otherwise, skip this step.

```cpp
virtual EE adapt_operator(OperatorType type, ParameterSpec *ps) {
    std::map<OperatorType, AdaptOperatorFunction> functions = {
        // Addition ======>
        {OT_Pooling, &ModelAdaptee::adapt_Pooling},
        // <====== Addition
    };
}

// Addition ======>
REGISTER_EMPTY_ADAPT_OPERATOR(adapt_Pooling)
// <====== Addition
```
4.3 Extract the meta information of the pooling operator from the onnx model: add a ParameterSpec adapt_Pooling() override function in model_tools/src/onnx/onnx_adaptee.h.

```cpp
// Addition ======>
ParameterSpec adapt_Pooling() override
{
    ParameterSpec ps;
    PoolingParamSpec p;
    memset(&p, 0, sizeof(p));
    std::string autoPad = get_string(this->onnxNode, "auto_pad");
    std::vector<int> kernels = get_ints(this->onnxNode, "kernel_shape");
    std::vector<int> strides = get_ints(this->onnxNode, "strides");
    std::vector<int> pads = get_ints(this->onnxNode, "pads");
    int ceil_mode = get_int(this->onnxNode, "ceil_mode", 0);
    const std::string &onnxNodeType = this->onnxNode.op_type();
    if (onnxNodeType == "AveragePool" || onnxNodeType == "ReduceMean" ||
        onnxNodeType == "GlobalAveragePool") {
        p.mode = POOLING_MEAN;
    } else {
        p.mode = POOLING_MAX;
    }
    if (ceil_mode) {
        p.round_mode = ROUND_CEIL;
    } else {
        p.round_mode = ROUND_FLOOR;
    }
    p.kernel_t = 0;
    p.kernel_h = 0;
    p.kernel_w = 0;
    if (kernels.size() == 3) {
        p.kernel_t = kernels[0];
        p.kernel_h = kernels[1];
        p.kernel_w = kernels[2];
    } else if (kernels.size() == 2) {
        p.kernel_t = 1;
        p.kernel_h = kernels[0];
        p.kernel_w = kernels[1];
    } else if (kernels.size() == 1) {
        p.kernel_t = 1;
        p.kernel_h = kernels[0];
        p.kernel_w = 1;
    }
    p.stride_t = 1;
    p.stride_h = 1;
    p.stride_w = 1;
    if (strides.size() == 3) {
        p.stride_t = strides[0];
        p.stride_h = strides[1];
        p.stride_w = strides[2];
    } else if (strides.size() == 2) {
        p.stride_h = strides[0];
        p.stride_w = strides[1];
    } else if (strides.size() == 1) {
        p.stride_h = strides[0];
    }
    p.pad_before = 0;
    p.pad_top = 0;
    p.pad_left = 0;
    p.pad_after = 0;
    p.pad_bottom = 0;
    p.pad_right = 0;
    if (pads.size() == 6) {
        p.pad_before = pads[0];
        p.pad_top = pads[1];
        p.pad_left = pads[2];
        p.pad_after = pads[3];
        p.pad_bottom = pads[4];
        p.pad_right = pads[5];
    } else if (pads.size() == 4) {
        p.pad_top = pads[0];
        p.pad_left = pads[1];
        p.pad_bottom = pads[2];
        p.pad_right = pads[3];
    } else if (pads.size() == 2) {
        p.pad_top = pads[0];
        p.pad_bottom = pads[1];
    } else if (autoPad == "SAME_UPPER") {
        p.pad_top = (p.kernel_h - 1) / 2;
        p.pad_bottom = (p.kernel_h - 1) - p.pad_top;
        p.pad_left = (p.kernel_w - 1) / 2;
        p.pad_right = (p.kernel_w - 1) - p.pad_left;
    }
    ps.pooling_spec = p;
    return ps;
}
// <====== Addition
```
5. Pooling is a non-weight-op, so skip this step.
Example: support `pooling` in tflite converter

1. Switch to model_tools/src/tflite, which is the tflite converter for bolt.
2. Judgment: pooling is a non-weight-op.
3. Define the `pooling` parameter format. Note: the definition actions are the same as step 3 of adding pooling to the caffe converter; please refer to the content above.
4. Extract the meta information of the `pooling` operator in tflite.

4.1 Modify the OperatorType convert_tflite_type(tflite::BuiltinOperator tfliteType) function in model_tools/src/tflite/tflite_adaptee.h. Add the tflite type mapping code as follows:

```cpp
OperatorType convert_tflite_type(tflite::BuiltinOperator tfliteType) {
    std::map<tflite::BuiltinOperator, OperatorType> operatorMap = {
        // Addition ======>
        {tflite::BuiltinOperator_MAX_POOL_2D, OT_Pooling},
        {tflite::BuiltinOperator_AVERAGE_POOL_2D, OT_Pooling},
        // <====== Addition
    };
}
```
4.2 Register the abstract adapt_Pooling() function in class ModelAdaptee in model_tools/src/model_adaptee.h if it has not been registered yet; otherwise, skip this step.

```cpp
virtual EE adapt_operator(OperatorType type, ParameterSpec *ps) {
    std::map<OperatorType, AdaptOperatorFunction> functions = {
        // Addition ======>
        {OT_Pooling, &ModelAdaptee::adapt_Pooling},
        // <====== Addition
    };
}

// Addition ======>
REGISTER_EMPTY_ADAPT_OPERATOR(adapt_Pooling)
// <====== Addition
```
4.3 Extract the meta information of the pooling operator from the tflite model: add a ParameterSpec adapt_Pooling() override function in model_tools/src/tflite/tflite_adaptee.h.

```cpp
// Addition ======>
ParameterSpec adapt_Pooling() override
{
    ParameterSpec ps;
    PoolingParamSpec p;
    memset(&p, 0, sizeof(p));
    p.kernel_t = 1;
    p.stride_t = 1;
    p.pad_before = 0;
    p.pad_after = 0;
    p.pad_top = 0;
    p.pad_bottom = 0;
    p.pad_left = 0;
    p.pad_right = 0;
    p.round_mode = ROUND_CEIL;
    const auto &inputTensor =
        this->tfliteTensors[this->tfliteOperators[this->tfliteOperatorIndex]->inputs[0]];
    const auto &inputShape = inputTensor->shape;
    CHECK_REQUIREMENT(inputShape.size() == 4);
    if (opCode == tflite::BuiltinOperator_MEAN) {
        // Interpret as global pooling
        const auto &axisTensor =
            this->tfliteTensors[this->tfliteOperators[this->tfliteOperatorIndex]->inputs[1]];
        const auto &axisData = tfliteModelBuffer[axisTensor->buffer]->data;
        auto axisPtr = reinterpret_cast<const int32_t *>(axisData.data());
        CHECK_REQUIREMENT(1 == axisPtr[0] && 2 == axisPtr[1]);
        p.mode = POOLING_MEAN;
        p.kernel_h = 0;
        p.kernel_w = 0;
        p.stride_h = 1;
        p.stride_w = 1;
    } else {
        const auto &tflitePoolOption =
            this->tfliteOperators[this->tfliteOperatorIndex]->builtin_options.AsPool2DOptions();
        p.kernel_h = tflitePoolOption->filter_height;
        p.kernel_w = tflitePoolOption->filter_width;
        p.stride_h = tflitePoolOption->stride_h;
        p.stride_w = tflitePoolOption->stride_w;
        int tfPaddingRoundMode = tflitePoolOption->padding;
        if (tfPaddingRoundMode == 0) {
            p.round_mode = ROUND_TF_SAME;
            int oLength = (inputShape[2] + p.stride_w - 1) / p.stride_w;
            int padLength = UNI_MAX((oLength - 1) * p.stride_w + p.kernel_w - inputShape[2], 0);
            p.pad_left = padLength / 2;
            p.pad_right = padLength - p.pad_left;
            oLength = (inputShape[1] + p.stride_h - 1) / p.stride_h;
            padLength = UNI_MAX((oLength - 1) * p.stride_h + p.kernel_h - inputShape[1], 0);
            p.pad_top = padLength / 2;
            p.pad_bottom = padLength - p.pad_top;
        } else if (tfPaddingRoundMode == 1) {
            p.round_mode = ROUND_TF_VALID;
        } else {
            UNI_ERROR_LOG("can not process operator location:%d Pooling round mode.\n",
                this->tfliteOperatorIndex);
        }
        if (opCode == tflite::BuiltinOperator_MAX_POOL_2D) {
            p.mode = POOLING_MAX;
        } else if (opCode == tflite::BuiltinOperator_AVERAGE_POOL_2D) {
            p.mode = POOLING_MEAN;
        }
        insertActivationOperator(
            getActivationOperatorType(tflitePoolOption->fused_activation_function));
    }
    ps.pooling_spec = p;
    return ps;
}
// <====== Addition
```
5. Pooling is a non-weight-op, so skip this step.
### tensor computing customization

In tensor, you can define the computation of any operator.

- Create a new operator file in compute/tensor/src;
- The computing implementations on various backends (CPU, GPU) usually differ. Add the corresponding operator implementation to the specific folder in compute/tensor/src depending on the target backend.
Example: add `pooling` operator in tensor

1. Create `pooling.cpp` in compute/tensor/src; for the complete implementation, refer to compute/tensor/src/pooling.cpp.
2. For CPU, create `pooling.cpp` in compute/tensor/src/cpu/arm (compute/tensor/src/cpu/arm/pooling.cpp), and dispatch to the implementations for the different data types (bnn/fp16/fp32/int8).
3. For GPU, create `pooling.cpp` in compute/tensor/src/gpu/mali (compute/tensor/src/gpu/mali/pooling.cpp); only fp16 is supported now (compute/tensor/src/gpu/mali/fp16/pooling_mali_fp16.cpp). Put your .cl file in compute/tensor/src/gpu/mali/cl (e.g. pooling_max.cl); the .cl file name must be the same as the kernel name. If your kernel has compile options, create a .sh file in common/gcl/tools/kernel_lib_compile/sh/compile; the .sh file name must also be the same as the kernel name.
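To show the shape such an entry point usually takes, here is an illustrative sketch of the backend dispatch; the helper names pooling_cpu and pooling_mali and the exact parameter list are assumptions for illustration, and compute/tensor/src/pooling.cpp is the authoritative reference.

```cpp
// Illustrative backend dispatch for a tensor-computing operator (names and
// signature are assumptions; see compute/tensor/src/pooling.cpp).
EE pooling(Tensor inputTensor, PoolingParamSpec p, Tensor tmpTensor,
    Tensor outputTensor, ArchInfo_t archInfo)
{
    EE ret = NOT_SUPPORTED;
    if (IS_CPU(archInfo->arch)) {
        // CPU path: further dispatches by data type (bnn/fp16/fp32/int8).
        ret = pooling_cpu(inputTensor, p, tmpTensor, outputTensor, archInfo);
    } else if (IS_GPU(archInfo->arch)) {
        // GPU (MALI OpenCL) path: fp16 only for now.
        ret = pooling_mali(inputTensor, p, tmpTensor, outputTensor, archInfo);
    }
    return ret;
}
```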
### inference's engine customization

In engine, you can define any operator for the inference of your model.

- Add the definition of the specific operator in inference/engine/include;
- If the operator's CPU implementation differs from its GPU implementation, provide separate CPU and GPU versions. If they are the same, skip this step.
Example: add `pooling` operator in inference/engine

1. Create `pooling.hpp` in inference/engine/include and add the definition of the `pooling` operator; for the complete implementation, refer to inference/engine/include/pooling.hpp.
2. The `pooling` operator's CPU implementation differs from its GPU implementation, so `pooling` needs two versions: CPU and GPU.

   (1) Create `pooling_cpu.hpp` and add the `pooling` CPU implementation in inference/engine/include/cpu; for the complete implementation, refer to inference/engine/include/cpu/pooling_cpu.hpp.

   (2) Create `pooling_ocl.hpp` and add the `pooling` GPU implementation in inference/engine/include/ocl; for the complete implementation, refer to inference/engine/include/ocl/pooling_ocl.hpp.
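As a rough guide to what such a header contains, here is a hedged sketch of an engine operator class; the base class, member names and helper calls are assumptions for illustration, and inference/engine/include/pooling.hpp together with its cpu/ocl variants is the authoritative reference.

```cpp
// Illustrative engine operator skeleton (class layout and method names are
// assumptions; see inference/engine/include/pooling.hpp for the real code).
class Pooling : public Operator {
public:
    Pooling(PoolingParamSpec p)
    {
        this->p = p;
    }

    OperatorType get_type() override
    {
        return OT_Pooling;
    }

    // Shape inference: called once before tensor memory is assigned,
    // typically delegating to the tensor-computing size-inference routine.
    EE infer_output_tensors_size(std::vector<Tensor *> inTensors,
        std::vector<Tensor *> outTensors) override
    {
        return pooling_infer_output_size(inTensors[0], this->p, outTensors[0],
            &this->archInfo);
    }

    // Execution: the CPU variant (pooling_cpu.hpp) calls the tensor-computing
    // pooling(), while the OCL variant (pooling_ocl.hpp) launches the kernel.
    void run() override
    {
        CHECK_STATUS(pooling(this->inputTensors[0], this->p, this->temp,
            this->outputTensors[0], &this->archInfo));
    }

protected:
    PoolingParamSpec p;
};
```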
## How to contribute

### submit issue

- question

  Submit any question you have encountered when using Bolt. You can give feedback to us by creating issues: go to https://github.com/huawei-noah/bolt/issues, create your new issue and submit it. The issue can be a bug in Bolt, a suggestion for Bolt, or anything you don't understand about Bolt.
- feature request

  Submit any feature that you want but has not been implemented in Bolt yet. We have created a special issue for feature requests, and you can leave a comment under that issue. We will seriously consider the needs of all users and continue to enrich the functions of Bolt.
### pull request

- add MIT license

  For consistency, please add the MIT license header at the top of your source files, indicating that your code will be open to all.
- provide an executable unit test

  Fork Bolt to your own GitHub account. Modify your code and make sure it passes all test cases. Commit the code and initiate a pull request on GitHub.