Before reading this developer guide for bolt, you are strongly recommended to read the code architecture document first. It gives you a deep understanding of the overall design of bolt, which will help you develop bolt more efficiently. If you want to verify your model quickly, you can use the out-of-the-box C API or Java API to run inference and check the result. If your model runs on time series data, you can use Flow to accelerate the inference. Moreover, if you encounter unsupported operators during conversion or inference of your model, you can customize the unsupported operators step by step as described in detail in this document.
- Use out-of-the-box API to infer your model
  - C API
  - Java API
- Accelerate time series model by Flow
- Customize models with unsupported operators step by step
  - model conversion customization
  - tensor computing customization
  - inference's engine customization
- How to contribute
  - submit issue
  - pull request
## Use out-of-the-box API to infer your model

### C API

Bolt provides a C API document generated by doxygen to help you use the C API, together with an image classification example and a Chinese input method example. You can compile bolt and link the libbolt.so library into your C/C++ project.
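For orientation, here is a minimal sketch of such a C/C++ project. The entry points used below (CreateModel, PrepareModel, AllocAllResultHandle, RunModel, FreeResultHandle, DestroyModel) and the enum values are assumptions based on the doxygen-documented C API; signatures differ between bolt versions, so verify everything against bolt.h before use.

```cpp
// Hedged sketch of driving bolt's C API from C++ (all signatures assumed;
// check bolt.h for the authoritative declarations). Link with libbolt.so.
#include <vector>
#include "bolt.h"

int main()
{
    // Load a converted .bolt model onto the CPU (affinity value assumed).
    ModelHandle model = CreateModel("example.bolt", CPU_HIGH_PERFORMANCE, nullptr);

    // Describe the single input tensor (name and shape are placeholders).
    const char *inputName = "input";
    int n = 1, c = 3, h = 224, w = 224;
    DATA_TYPE dt = FP_32;
    DATA_FORMAT df = NCHW;
    PrepareModel(model, 1, &inputName, &n, &c, &h, &w, &dt, &df);

    // Bind input memory and run; outputs are read from the result handle.
    std::vector<float> inputData(1 * 3 * 224 * 224, 0.0f);
    void *inputPtrs[1] = {inputData.data()};
    ResultHandle result = AllocAllResultHandle(model);
    RunModel(model, result, 1, &inputName, inputPtrs);

    FreeResultHandle(result);
    DestroyModel(model);
    return 0;
}
```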
### Java API

Bolt provides a Java API document generated by doxygen to help you use the Java API, together with a detailed example. You can compile bolt and load libBoltModel.so through the Java Native Interface (JNI) in your Java project.
Bolt also provides an easy-to-use Python API for developers. Please check the usage of the Python API in Bolt.
## Accelerate time series model by Flow

Flow provides an API document generated by doxygen to help you use the Flow C++ header, together with examples (tinybert, faceSR, ASR). You can also use the Java API, and there is a simple GSR test.
- Use predefined flow protobuf standard to define a graph

  Here is an example graph file for the CV application faceSR: flow_facesr.prototxt. This graph has one input, one input node, one inference node and one output. The input node needs to be marked as Input, and the inference node as Inference. Each node can have multiple input or output tensors, and each node type has its own typical fields.
- Add an output tensor size inference function for each node, and register the function to the Flow function manager (optional)

  facesr doesn't need to post-process the final tensor, so the node's output tensor can be used directly. If you need to post-process the final tensor, refer to flow_tinybert, which defines a post-processing function (tinybertInferOutputSize) and registers it with the flowRegisterFunction API.
- Add an input tensor pre-processing function for each node, and register the function to the Flow function manager (optional)

  (same as the output tensor size inference function)
- Add an output tensor post-processing function for each node, and register the function to the Flow function manager (optional)

  (same as the output tensor size inference function)
- Declare a Flow object and set CPU cores and GPU. Describe the task in the Task format and use the enqueue API to add the task into the Flow heterogeneous executor.
- Use the dequeue API to get results in FIFO order. You can choose blocking mode to get all enqueued task results at the same time. The size function can be used to query the number of unfinished tasks. A sketch of this sequence is shown after this list.
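The sketch below ties these steps together, loosely modeled on the flow_tinybert and flow_facesr examples. The Flow::init, Task, enqueue and dequeue signatures, as well as the registered function's prototype, are assumptions here; consult the Flow doxygen document and the flow headers before relying on them.

```cpp
// Minimal Flow sketch under assumed signatures; verify against flow.h.
#include <map>
#include <memory>
#include <string>
#include <vector>
#include "flow.h"

// Optional post-processing / output-size-inference function (prototype
// assumed from the flow_tinybert example); registered by name below.
EE exampleInferOutputSize(std::map<std::string, std::shared_ptr<Tensor>> &inputs,
    std::shared_ptr<Tensor> &tmp,
    std::map<std::string, std::shared_ptr<Tensor>> &outputs,
    std::vector<std::string> parameter)
{
    // Set or resize the output tensors here if needed.
    return SUCCESS;
}

int main()
{
    std::string graphPath = "flow_facesr.prototxt";  // graph from step 1

    // Register the custom function with the Flow function manager.
    flowRegisterFunction("exampleInferOutputSize", exampleInferOutputSize);

    // Declare a Flow object bound to 8 CPU cores, GPU disabled.
    Flow flowExample;
    flowExample.init({graphPath}, DT_F32, AFFINITY_CPU_HIGH_PERFORMANCE, 8, false);

    // Describe a task: the graph plus its named input tensors
    // (filling the data map is omitted here).
    std::map<std::string, std::shared_ptr<Tensor>> data;
    Task task(graphPath, data);

    // Enqueue the task into the heterogeneous executor; dequeue(true)
    // blocks until all enqueued tasks finish and returns results in FIFO
    // order, while size() reports the number of unfinished tasks.
    flowExample.enqueue(task);
    std::vector<Task> results = flowExample.dequeue(true);
    return 0;
}
```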
## Customize models with unsupported operators step by step

### model conversion customization

In model_tools, you can define any operator for model conversion. The general steps are:
- Switch to the code of the specific framework (caffe/onnx/tflite) you are working on;
- Judge whether the operator is a weight-op or a non-weight-op;
- Define the operator parameter format;
- Extract the meta information of the operator;
- Extract the weight data if the operator is a weight-op; otherwise, skip this step.
Example: support `pooling` in caffe converter

1. Switch to model_tools/src/caffe, which is the caffe converter for bolt.
2. Judgment: pooling is a non-weight-op.
3. Define the `pooling` parameter format.

3.1 Modify the OperatorType data structure in common/uni/include/operator_type.h:

```cpp
typedef enum {
    ...
    OT_Pooling,  // Addition
    ...
} OperatorType;
```

3.2 Modify the inline const char* const* OperatorTypeName() function in common/uni/include/operator_type.h:

```cpp
inline const char* const* OperatorTypeName() {
    static const char* const names[] = {
        ...
        "OT_Pooling",  // Addition, keep consistent with OperatorType
        ...
    };
}
```
3.3 Add the `pooling` parameter definition of bolt in common/uni/include/parameter_spec.h:

```cpp
// Addition ======>
typedef struct {
    unsigned int kernel_t;
    unsigned int kernel_h;
    unsigned int kernel_w;
    unsigned int stride_t;
    unsigned int stride_h;
    unsigned int stride_w;
    unsigned int pad_before;
    unsigned int pad_after;
    unsigned int pad_top;
    unsigned int pad_bottom;
    unsigned int pad_left;
    unsigned int pad_right;
    RoundMode round_mode;
    PoolingMode mode;
} PoolingParamSpec;
// <====== Addition
```

3.4 Modify the int get_operator_parameter_size(OperatorType operatorType) function in common/uni/include/parameter_spec.h:

```cpp
std::map<OperatorType, int> operatorParameterSizeMap = {
    ...
    {OT_Pooling, sizeof(PoolingParamSpec)},  // Addition
};
```
4. Extract the meta information of the `pooling` operator in caffe.

4.1 Modify the OperatorType convert_caffe_type(std::string inputType) function in model_tools/src/caffe/caffe_adaptee.h. Add the caffe type mapping code as follows:

```cpp
OperatorType convert_caffe_type(std::string inputType) {
    std::map<std::string, OperatorType> operatorMap = {
        // Addition ======>
        {"Pooling", OT_Pooling},
        // <====== Addition
    };
}
```
4.2 Register the abstract adapt_Pooling() function in class ModelAdaptee in model_tools/src/model_adaptee.h if it has not been registered yet; otherwise, skip this step.

```cpp
virtual EE adapt_operator(OperatorType type, ParameterSpec *ps) {
    std::map<OperatorType, AdaptOperatorFunction> functions = {
        // Addition ======>
        {OT_Pooling, &ModelAdaptee::adapt_Pooling},
        // <====== Addition
    };
}

// Addition ======>
REGISTER_EMPTY_ADAPT_OPERATOR(adapt_Pooling)
// <====== Addition
```
4.3 Extract the meta information of the pooling operator from the caffe model: add a ParameterSpec adapt_Pooling() override function in model_tools/src/caffe/caffe_adaptee.h.

```cpp
// Addition ======>
ParameterSpec adapt_Pooling() override
{
    ParameterSpec ps;
    PoolingParamSpec p;
    memset(&p, 0, sizeof(p));
    p.kernel_t = 1;
    p.stride_t = 1;
    p.pad_before = 0;
    p.pad_after = 0;
    auto cp = layer.pooling_param();
    if (cp.has_kernel_w() && cp.has_kernel_h()) {
        p.kernel_w = cp.kernel_w();
        p.kernel_h = cp.kernel_h();
    } else {
        p.kernel_h = cp.kernel_size();
        p.kernel_w = p.kernel_h;
    }
    if (cp.has_stride_w() && cp.has_stride_h()) {
        p.stride_w = cp.stride_w();
        p.stride_h = cp.stride_h();
    } else {
        p.stride_h = cp.stride();
        p.stride_w = p.stride_h;
    }
    bool global_pooling = cp.global_pooling();
    if (global_pooling) {
        p.kernel_h = 0;
        p.kernel_w = 0;
        p.stride_h = 1;
        p.stride_w = 1;
    } else {
        CHECK_REQUIREMENT(p.kernel_h > 0);
    }
    if (cp.has_pad_w() && cp.has_pad_h()) {
        p.pad_left = cp.pad_w();
        p.pad_right = p.pad_left;
        p.pad_top = cp.pad_h();
        p.pad_bottom = p.pad_top;
    } else {
        p.pad_top = cp.has_pad() ? cp.pad() : 0;
        p.pad_bottom = p.pad_top;
        p.pad_left = p.pad_top;
        p.pad_right = p.pad_top;
    }
    if (cp.has_round_mode() && cp.round_mode() == 1) {
        p.round_mode = ROUND_FLOOR;
    } else {
        p.round_mode = ROUND_CEIL;
    }
    auto op = cp.pool();
    switch (op) {
        case caffe::PoolingParameter_PoolMethod_MAX: {
            p.mode = POOLING_MAX;
            break;
        }
        case caffe::PoolingParameter_PoolMethod_AVE: {
            p.mode = POOLING_MEAN;
            break;
        }
        default: {
            const google::protobuf::EnumDescriptor *descriptor =
                caffe::PoolingParameter::PoolMethod_descriptor();
            UNI_ERROR_LOG("can not map operator name:%s %s to Pooling.\n",
                this->layer.name().c_str(),
                descriptor->FindValueByNumber(op)->name().c_str());
        }
    }
    ps.pooling_spec = p;
    return ps;
}
// <====== Addition
```
5. Pooling is a non-weight-op, so skip this step.
Example: support `pooling` in onnx converter

1. Switch to model_tools/src/onnx, which is the onnx converter for bolt.
2. Judgment: pooling is a non-weight-op.
3. Define the `pooling` parameter format. Note: the definition actions are the same as step 3 of adding pooling to the caffe converter; please refer to the content above.
4. Extract the meta information of the `pooling` operator in onnx.

4.1 Modify the OperatorType convert_onnx_type(std::string inputType) function in model_tools/src/onnx/onnx_adaptee.h. Add the onnx type mapping code as follows:

```cpp
OperatorType convert_onnx_type(std::string inputType) {
    std::map<std::string, OperatorType> operatorMap = {
        // Addition ======>
        {"AveragePool", OT_Pooling},
        {"MaxPool", OT_Pooling},
        {"GlobalAveragePool", OT_Pooling},
        // <====== Addition
    };
}
```
4.2 Register the abstract adapt_Pooling() function in class ModelAdaptee in model_tools/src/model_adaptee.h if it has not been registered yet; otherwise, skip this step.

```cpp
virtual EE adapt_operator(OperatorType type, ParameterSpec *ps) {
    std::map<OperatorType, AdaptOperatorFunction> functions = {
        // Addition ======>
        {OT_Pooling, &ModelAdaptee::adapt_Pooling},
        // <====== Addition
    };
}

// Addition ======>
REGISTER_EMPTY_ADAPT_OPERATOR(adapt_Pooling)
// <====== Addition
```
4.3 Extract the meta information of the pooling operator from the onnx model: add a ParameterSpec adapt_Pooling() override function in model_tools/src/onnx/onnx_adaptee.h.

```cpp
// Addition ======>
ParameterSpec adapt_Pooling() override
{
    ParameterSpec ps;
    PoolingParamSpec p;
    memset(&p, 0, sizeof(p));
    std::string autoPad = get_string(this->onnxNode, "auto_pad");
    std::vector<int> kernels = get_ints(this->onnxNode, "kernel_shape");
    std::vector<int> strides = get_ints(this->onnxNode, "strides");
    std::vector<int> pads = get_ints(this->onnxNode, "pads");
    int ceil_mode = get_int(this->onnxNode, "ceil_mode", 0);
    const std::string &onnxNodeType = this->onnxNode.op_type();
    if (onnxNodeType == "AveragePool" || onnxNodeType == "ReduceMean" ||
        onnxNodeType == "GlobalAveragePool") {
        p.mode = POOLING_MEAN;
    } else {
        p.mode = POOLING_MAX;
    }
    if (ceil_mode) {
        p.round_mode = ROUND_CEIL;
    } else {
        p.round_mode = ROUND_FLOOR;
    }
    p.kernel_t = 0;
    p.kernel_h = 0;
    p.kernel_w = 0;
    if (kernels.size() == 3) {
        p.kernel_t = kernels[0];
        p.kernel_h = kernels[1];
        p.kernel_w = kernels[2];
    } else if (kernels.size() == 2) {
        p.kernel_t = 1;
        p.kernel_h = kernels[0];
        p.kernel_w = kernels[1];
    } else if (kernels.size() == 1) {
        p.kernel_t = 1;
        p.kernel_h = kernels[0];
        p.kernel_w = 1;
    }
    p.stride_t = 1;
    p.stride_h = 1;
    p.stride_w = 1;
    if (strides.size() == 3) {
        p.stride_t = strides[0];
        p.stride_h = strides[1];
        p.stride_w = strides[2];
    } else if (strides.size() == 2) {
        p.stride_h = strides[0];
        p.stride_w = strides[1];
    } else if (strides.size() == 1) {
        p.stride_h = strides[0];
    }
    p.pad_before = 0;
    p.pad_top = 0;
    p.pad_left = 0;
    p.pad_after = 0;
    p.pad_bottom = 0;
    p.pad_right = 0;
    if (pads.size() == 6) {
        p.pad_before = pads[0];
        p.pad_top = pads[1];
        p.pad_left = pads[2];
        p.pad_after = pads[3];
        p.pad_bottom = pads[4];
        p.pad_right = pads[5];
    } else if (pads.size() == 4) {
        p.pad_top = pads[0];
        p.pad_left = pads[1];
        p.pad_bottom = pads[2];
        p.pad_right = pads[3];
    } else if (pads.size() == 2) {
        p.pad_top = pads[0];
        p.pad_bottom = pads[1];
    } else if (autoPad == "SAME_UPPER") {
        p.pad_top = (p.kernel_h - 1) / 2;
        p.pad_bottom = (p.kernel_h - 1) - p.pad_top;
        p.pad_left = (p.kernel_w - 1) / 2;
        p.pad_right = (p.kernel_w - 1) - p.pad_left;
    }
    ps.pooling_spec = p;
    return ps;
}
// <====== Addition
```
5. Pooling is a non-weight-op, so skip this step.
Example: support `pooling` in tflite converter

1. Switch to model_tools/src/tflite, which is the tflite converter for bolt.
2. Judgment: pooling is a non-weight-op.
3. Define the `pooling` parameter format. Note: the definition actions are the same as step 3 of adding pooling to the caffe converter; please refer to the content above.
4. Extract the meta information of the `pooling` operator in tflite.

4.1 Modify the OperatorType convert_tflite_type(tflite::BuiltinOperator tfliteType) function in model_tools/src/tflite/tflite_adaptee.h. Add the tflite type mapping code as follows:

```cpp
OperatorType convert_tflite_type(tflite::BuiltinOperator tfliteType) {
    std::map<tflite::BuiltinOperator, OperatorType> operatorMap = {
        // Addition ======>
        {tflite::BuiltinOperator_MAX_POOL_2D, OT_Pooling},
        {tflite::BuiltinOperator_AVERAGE_POOL_2D, OT_Pooling},
        // <====== Addition
    };
}
```
4.2 Register the abstract adapt_Pooling() function in class ModelAdaptee in model_tools/src/model_adaptee.h if it has not been registered yet; otherwise, skip this step.

```cpp
virtual EE adapt_operator(OperatorType type, ParameterSpec *ps) {
    std::map<OperatorType, AdaptOperatorFunction> functions = {
        // Addition ======>
        {OT_Pooling, &ModelAdaptee::adapt_Pooling},
        // <====== Addition
    };
}

// Addition ======>
REGISTER_EMPTY_ADAPT_OPERATOR(adapt_Pooling)
// <====== Addition
```
4.3 Extract the meta information of the pooling operator from the tflite model: add a ParameterSpec adapt_Pooling() override function in model_tools/src/tflite/tflite_adaptee.h.

```cpp
// Addition ======>
ParameterSpec adapt_Pooling() override
{
    ParameterSpec ps;
    PoolingParamSpec p;
    memset(&p, 0, sizeof(p));
    p.kernel_t = 1;
    p.stride_t = 1;
    p.pad_before = 0;
    p.pad_after = 0;
    p.pad_top = 0;
    p.pad_bottom = 0;
    p.pad_left = 0;
    p.pad_right = 0;
    p.round_mode = ROUND_CEIL;
    const auto &inputTensor =
        this->tfliteTensors[this->tfliteOperators[this->tfliteOperatorIndex]->inputs[0]];
    const auto &inputShape = inputTensor->shape;
    CHECK_REQUIREMENT(inputShape.size() == 4);
    if (opCode == tflite::BuiltinOperator_MEAN) {
        // Interpret as global pooling
        const auto &axisTensor =
            this->tfliteTensors[this->tfliteOperators[this->tfliteOperatorIndex]->inputs[1]];
        const auto &axisData = tfliteModelBuffer[axisTensor->buffer]->data;
        auto axisPtr = reinterpret_cast<const int32_t *>(axisData.data());
        CHECK_REQUIREMENT(1 == axisPtr[0] && 2 == axisPtr[1]);
        p.mode = POOLING_MEAN;
        p.kernel_h = 0;
        p.kernel_w = 0;
        p.stride_h = 1;
        p.stride_w = 1;
    } else {
        const auto &tflitePoolOption =
            this->tfliteOperators[this->tfliteOperatorIndex]->builtin_options.AsPool2DOptions();
        p.kernel_h = tflitePoolOption->filter_height;
        p.kernel_w = tflitePoolOption->filter_width;
        p.stride_h = tflitePoolOption->stride_h;
        p.stride_w = tflitePoolOption->stride_w;
        int tfPaddingRoundMode = tflitePoolOption->padding;
        if (tfPaddingRoundMode == 0) {
            p.round_mode = ROUND_TF_SAME;
            int oLength = (inputShape[2] + p.stride_w - 1) / p.stride_w;
            int padLength = UNI_MAX((oLength - 1) * p.stride_w + p.kernel_w - inputShape[2], 0);
            p.pad_left = padLength / 2;
            p.pad_right = padLength - p.pad_left;
            oLength = (inputShape[1] + p.stride_h - 1) / p.stride_h;
            padLength = UNI_MAX((oLength - 1) * p.stride_h + p.kernel_h - inputShape[1], 0);
            p.pad_top = padLength / 2;
            p.pad_bottom = padLength - p.pad_top;
        } else if (tfPaddingRoundMode == 1) {
            p.round_mode = ROUND_TF_VALID;
        } else {
            UNI_ERROR_LOG("can not process operator location:%d Pooling round mode.\n",
                this->tfliteOperatorIndex);
        }
        if (opCode == tflite::BuiltinOperator_MAX_POOL_2D) {
            p.mode = POOLING_MAX;
        } else if (opCode == tflite::BuiltinOperator_AVERAGE_POOL_2D) {
            p.mode = POOLING_MEAN;
        }
        insertActivationOperator(
            getActivationOperatorType(tflitePoolOption->fused_activation_function));
    }
    ps.pooling_spec = p;
    return ps;
}
// <====== Addition
```
5. Pooling is a non-weight-op, so skip this step.
### tensor computing customization

In tensor, you can define the computation of any operator.

- Create a new operator file in compute/tensor/src;
- The computing implementations on various backends (CPU, GPU) usually differ. Add the corresponding operator implementation to the specific folder in compute/tensor/src depending on the target backend.
Example: add `pooling` operator in tensor

1. Create `pooling.cpp` in compute/tensor/src; for the complete implementation, refer to compute/tensor/src/pooling.cpp.
2. For CPU, create `pooling.cpp` in compute/tensor/src/cpu/arm (compute/tensor/src/cpu/arm/pooling.cpp), and dispatch to the implementations for the different data types (bnn/fp16/fp32/int8).
3. For GPU, create `pooling.cpp` in compute/tensor/src/gpu/mali (compute/tensor/src/gpu/mali/pooling.cpp); only fp16 is supported now (compute/tensor/src/gpu/mali/fp16/pooling_mali_fp16.cpp). Put your .cl file in compute/tensor/src/gpu/mali/cl (e.g. pooling_max.cl); the .cl file name must be the same as the kernel name. If your kernel has compile options, create a .sh file in common/gcl/tools/kernel_lib_compile/sh/compile; the .sh file name must also be the same as the kernel name.
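To show the shape such an entry point usually takes, here is an illustrative sketch of the backend dispatch; the helper names pooling_cpu and pooling_mali and the exact parameter list are assumptions for illustration, and compute/tensor/src/pooling.cpp is the authoritative reference.

```cpp
// Illustrative backend dispatch for a tensor-computing operator (names and
// signature are assumptions; see compute/tensor/src/pooling.cpp).
EE pooling(Tensor inputTensor, PoolingParamSpec p, Tensor tmpTensor,
    Tensor outputTensor, ArchInfo_t archInfo)
{
    EE ret = NOT_SUPPORTED;
    if (IS_CPU(archInfo->arch)) {
        // CPU path: further dispatches by data type (bnn/fp16/fp32/int8).
        ret = pooling_cpu(inputTensor, p, tmpTensor, outputTensor, archInfo);
    } else if (IS_GPU(archInfo->arch)) {
        // GPU (MALI OpenCL) path: fp16 only for now.
        ret = pooling_mali(inputTensor, p, tmpTensor, outputTensor, archInfo);
    }
    return ret;
}
```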
### inference's engine customization

In engine, you can define any operator for the inference of your model.

- Add the definition of the specific operator in inference/engine/include;
- If the operator's CPU implementation differs from its GPU implementation, provide separate CPU and GPU versions. If they are the same, skip this step.
Example: add `pooling` operator in inference/engine

1. Create `pooling.hpp` in inference/engine/include and add the definition of the `pooling` operator; for the complete implementation, refer to inference/engine/include/pooling.hpp.
2. The `pooling` operator's CPU implementation differs from its GPU implementation, so `pooling` needs two versions: CPU and GPU.

   (1) Create `pooling_cpu.hpp` and add the `pooling` CPU implementation in inference/engine/include/cpu; for the complete implementation, refer to inference/engine/include/cpu/pooling_cpu.hpp.

   (2) Create `pooling_ocl.hpp` and add the `pooling` GPU implementation in inference/engine/include/ocl; for the complete implementation, refer to inference/engine/include/ocl/pooling_ocl.hpp.
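As a rough guide to what such a header contains, here is a hedged sketch of an engine operator class; the base class, member names and helper calls are assumptions for illustration, and inference/engine/include/pooling.hpp together with its cpu/ocl variants is the authoritative reference.

```cpp
// Illustrative engine operator skeleton (class layout and method names are
// assumptions; see inference/engine/include/pooling.hpp for the real code).
class Pooling : public Operator {
public:
    Pooling(PoolingParamSpec p)
    {
        this->p = p;
    }

    OperatorType get_type() override
    {
        return OT_Pooling;
    }

    // Shape inference: called once before tensor memory is assigned,
    // typically delegating to the tensor-computing size-inference routine.
    EE infer_output_tensors_size(std::vector<Tensor *> inTensors,
        std::vector<Tensor *> outTensors) override
    {
        return pooling_infer_output_size(inTensors[0], this->p, outTensors[0],
            &this->archInfo);
    }

    // Execution: the CPU variant (pooling_cpu.hpp) calls the tensor-computing
    // pooling(), while the OCL variant (pooling_ocl.hpp) launches the kernel.
    void run() override
    {
        CHECK_STATUS(pooling(this->inputTensors[0], this->p, this->temp,
            this->outputTensors[0], &this->archInfo));
    }

protected:
    PoolingParamSpec p;
};
```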
## How to contribute

### submit issue

- question

  Submit any question you have encountered when using Bolt. You can give feedback to us by creating issues: go to https://github.com/huawei-noah/bolt/issues, create your new issue and submit it. The issue can be a bug in Bolt, a suggestion for Bolt, or anything you don't understand about Bolt.
- feature request

  Submit any feature that you want but has not been implemented in Bolt yet. We have created a special issue for feature requests, and you can leave a comment under that issue. We will seriously consider the needs of all users and continue to enrich the functions of Bolt.
### pull request

- add MIT license

  For consistency, please add the MIT license header at the top of your source files, indicating that your code will be open to all.
- provide an executable unit test

  Fork Bolt to your own GitHub account. Modify your code and make sure it passes all test cases. Commit the code and initiate a pull request on GitHub.