From 5e8c0019d95c269943ceb9d32db501a054d4104f Mon Sep 17 00:00:00 2001
From: Varun Tripathi <v-tripathi@ti.com>
Date: Fri, 3 Mar 2023 15:08:59 +0530
Subject: [PATCH] [DOCS] Updated Supported OPs documentation & fixed cpp docs

---
 docs/supported_ops_rts_versions.md | 109 +++++++++++++----------------
 examples/tidlrt_cpp/README.md      |  19 -----
 2 files changed, 50 insertions(+), 78 deletions(-)
diff --git a/docs/supported_ops_rts_versions.md b/docs/supported_ops_rts_versions.md
index ebaedf1..0283116 100644
--- a/docs/supported_ops_rts_versions.md
+++ b/docs/supported_ops_rts_versions.md
@@ -1,14 +1,13 @@
 # Supported Operators & Runtimes
-
-# TIDL-Runtime Supported Layers Overview
+## TIDL-Runtime Supported Layers Overview:
 1. Convolution Layer
 2. Spatial Pooling Layer
     - Average and Max Pooling
 3. Global Pooling Layer
-    - Average and Max Pooling
+    - Average Pooling
 4. ReLU Layer 
 5. Element Wise Layer
-    - Add, Product and Max
+    - Add, Product
 6. Inner Product Layer
     - Fully Connected Layer
 7. Soft Max Layer
@@ -31,71 +30,63 @@
 24. Sigmoid Layer
 25. Batch Reshape Layer
 26. Data/Format conversion layer 
-
-## Core Layers/Operators Mapping & Notes
-| No | TIDL Layer Type                | Caffe Layer Type                    | Tensorflow Ops                          | ONNX Ops                                    | tflite Ops                                 | Notes |
-|:--:|:-------------------------------|:------------------------------------|:----------------------------------------|:--------------------------------------------|:-------------------------------------------|:------|
-| 1  | TIDL_ConvolutionLayer          | Convolution<br>ConvolutionDepthwise | Conv2D<br>DepthwiseConv2dNative         | Conv                                        | CONV_2D<br>DEPTHWISE_CONV_2D               | Regular & depth-wise conv will be imported as conv. <br> For TF and tflite DepthwiseConv2dNative, depth_multiplier shall be 1 in Number of input channels > 1. <br> ReLU & BN layers will be merged into conv to get better performance. 1x1 conv will be converted to innerproduct.<br>Validated kernel size: 1x1, 3x3, 5x5, 7x7,1x3,3x1,1x5,5x1,1x7,7x1.<br> If stride == 4, only supported kernel == 11x11.<br>if stride == 2, kernel should be less than 7. Even dimensions of kernel like 2x2, 4x4, 6x6 are not supported.<br>Depthwise Separable Convolution only supports 3x3,5x5,7x7 with stride 1 and 3x3 with stride 2.<br> Dilated Convolution is only supported for non-strided convolution<br> Its recommended to have kernelH*kernelW*input channel/groupNum+enableBias % 64 == 0 whereever possible as it results into better utilization of hardware.<br> **Note : Please refer MMALIB release notes for all supported configuration.**<br> **Note : Some of the kernel combination's are not optimized in current release, please refer MMALIB release notes for the same.** |
-| 2  | TIDL_BatchNormLayer            | BatchNorm                           | FusedBatchNorm                          | BatchNormalization                          |                                            | ReLU & Scale & Bias & PReLU & Leaky Relu will be merged & imported as BN.<br> All the channel-wise Broad cast operations are mapped to BN now.|
-| 3  | TIDL_PoolingLayer              | Pooling                             | MaxPooling<br>AvgPooling<br>Mean        | MaxPool<br>AveragePool<br>GlobalAveragePool | MAX_POOL_2D<br>AVERAGE_POOL_2D<br>MEAN     | Validated pooling size: 1x1(MAX, stride 1x1/2x2), 2x2, 3x3.<br>4x4 pooling is not optimal. |
-| 4  | TIDL_EltWiseLayer              | EltWise                             | Add<br>Mul                              | Add<br>Mul                                  | ADD<br>MUL                                 | Only support SUM/MAX/PRODUCT.<br>Only support 2 inputs. |
-| 5  | TIDL_InnerProductLayer         | InnerProduct                        | MatMul                                  | Gemm                                        | FULLY_CONNECTED                            | Input shape must be 1x1x1xN. Please use global pooling/flatten before innerproduct.<br>Feature size larger than 2048*2048 is not optimal. |
-| 6  | TIDL_SoftMaxLayer              | SoftMax                             | Softmax                                 | Softmax                                     | SOFTMAX                                    | Input shape must be 1x1x1xN. Please use global pooling/flatten before softmax. |
-| 7  | TIDL_Deconv2DLayer             | Deconvolution                       | Conv2DTranspose                         | ConvTranspose                               | TRANSPOSE_CONV                             | Only 8x8, 4x4 and 2x2 kernel with 2x2 stride is supported. Recommend to use Resize/Upsample to get better performance. The output feature-map size shall be 2x the input|
-| 8  | TIDL_ConcatLayer               | Concat                              | ConcatV2                                | Concat                                      | CONCATENATION                              | Concat will do channel-wise combination by default. Concat will be width-wise if coming after a flatten layer. used in the context of SSD.<br> Width/Height wise concat is supported with Caffe|
-| 9  | TIDL_SliceLayer                | Slice                               | Slice                                   | Split                                       | NA                                         | Only support channel-wise slice. |
-| 10 | TIDL_CropLayer                 | Crop                                | NA                                      | NA                                          | NA                                         |  |
-| 11 | TIDL_FlattenLayer              | Flatten                             | NA                                      | Flatten                                     | NA                                         | 16bit is not optimal in current version. |
-| 12 | TIDL_ArgMaxLayer               | ArgMax                              | Argmax                                  | ArgMax                                      | ARG_MAX                                    | Only support axis == 1, mainly for the last layer of sematic segmentation. |
-| 13 | TIDL_DetectionOutputLayer      | DetectionOutput                     | tensorflow Object Detection API         | NA                                          | NA                                         | Please refer to comment 1. |
-| 14 | TIDL_ShuffleChannelLayer       | ShuffleChannel                      | NA                                      | Reshape + Transpose + Reshape               | NA                                         |  |
-| 15 | TIDL_ResizeLayer               | NA                                  | ResizeNearestNeighbor<br>ResizeBilinear | UpSample                                    | RESIZE_NEAREST_NEIGHBOR<br>RESIZE_BILINEAR | Only support Power of 2 and symmetric resize. Note that any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. As an example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize  |
-| 16 | TIDL_DepthToSpaceLayer          | NA                                  | NA | DepthToSpace                      | DEPTH_TO_SPACE |  Supports non-strided convolution with upscale of 2, 4 and 8 | 
-| 17 | TIDL_SigmoidLayer          | SIGMOID/LOGISTIC                         | Sigmoid/Logistic | Sigmoid/Logistic                       | SIGMOID/LOGISTIC |   |
-| 18 | TIDL_PadLayer          | NA                                  | Pad | Pad                                    | PAD |   |
-| 19 | TIDL_ColorConversionLayer          | NA                                  | NA | NA                                    | NA |  Only YUV420 NV12 format conversion to RGB/BGR color format is supported |
-| 20 | TIDL_BatchReshapeLayer | NA                                  | NA | NA                                    | NA |  used to covert batch of images to format which suits TIDL-RT and then convert back, refer [here](tidl_fsg_batch_processing.md) for more details |
-| 21 | TIDL_DataConvertLayer          | NA                                  | NA | NA                                    | NA |  NA |
-
+***
+## Core Layers/Operators Mapping & Notes:
+| No | TIDL Layer Type                | ONNX Ops                                    | TFLite Ops                                 | Notes |
+|:--:|:-------------------------------|:--------------------------------------------|:-------------------------------------------|:------|
+| 1  | TIDL_ConvolutionLayer          | Conv                                        | CONV_2D<br>DEPTHWISE_CONV_2D               | Regular & depthwise convolution will be imported as convolution <br> For TFLite DepthwiseConv2dNative, depth_multiplier shall be 1 if the number of input channels > 1. <br> ReLU & Batchnorm layers will be merged into convolution to get better performance<br>Validated kernel sizes: 1x1, 3x3, 5x5, 7x7, 1x3, 3x1, 1x5, 5x1, 1x7, 7x1.<br> If stride == 4, the only supported kernel is 11x11.<br>If stride == 2, the kernel should be less than 7. Even kernel dimensions like 2x2, 4x4, 6x6 are not supported.<br>Depthwise Separable Convolution only supports 3x3, 5x5, 7x7 with stride 1 and 3x3 with stride 2.<br> Dilated Convolution is only supported for non-strided convolution<br> **Note : Please refer to MMALIB's release notes in your SDK for all supported configurations**<br> **Note : Some kernel combinations are not optimized in the current release; please refer to MMALIB's release notes for details** |
+| 2  | TIDL_BatchNormLayer            | BatchNormalization                          |                                            | ReLU, Scale, Bias, PReLU, Leaky ReLU, Hard Sigmoid & ELU will be merged & imported as batchnorm<br> All channel-wise broadcast operations are mapped to batchnorm |
+| 3  | TIDL_PoolingLayer              | MaxPool<br>AveragePool<br>GlobalAveragePool | MAX_POOL_2D<br>AVERAGE_POOL_2D<br>MEAN     | Pooling has been validated for the following kernel sizes: 3x3,2x2,1x1, with a maximum stride of 2 |
+| 4  | TIDL_EltWiseLayer              | Add<br>Mul                                  | ADD<br>MUL                                 | Support for 2 tensors validated extensively, multiple input tensors have had limited validation |
+| 5  | TIDL_InnerProductLayer         | Gemm                                        | FULLY_CONNECTED                            | Input shape must be 1x1x1xN. Please use global pooling/flatten before innerproduct<br>Feature size larger than 2048*2048 is not optimal |
+| 6  | TIDL_SoftMaxLayer              | Softmax                                     | SOFTMAX                                    | Input shape must be 1x1x1xN. Please use global pooling/flatten before softmax. |
+| 7  | TIDL_Deconv2DLayer             | ConvTranspose                               | TRANSPOSE_CONV                             | Only 8x8, 4x4 and 2x2 kernel with 2x2 stride is supported. It is recommended to use Resize/Upsample to get better performance|
+| 8  | TIDL_ConcatLayer               | Concat                                      | CONCATENATION                              | Concat is channel-wise by default. Concat will be width-wise if it comes after a flatten layer (used in the context of SSD) |
+| 9  | TIDL_SliceLayer                | Split                                       | NA                                         | Only channel-wise slice is supported |
+| 10 | TIDL_CropLayer                 | NA                                          | NA                                         |  |
+| 11 | TIDL_FlattenLayer              | Flatten                                     | NA                                         | 16-bit is not optimal in the current version|
+| 12 | TIDL_ArgMaxLayer               | ArgMax                                      | ARG_MAX                                    | Only axis == 1 is supported (For Semantic Segmentation) |
+| 13 | TIDL_DetectionOutputLayer      | NA                                          | NA                                         | Please refer to the [Meta Architecture Documentation](./tidl_fsg_od_meta_arch.md) for further details |
+| 14 | TIDL_ShuffleChannelLayer       | Reshape + Transpose + Reshape               | NA                                         |  |
+| 15 | TIDL_ResizeLayer               | UpSample                                    | RESIZE_NEAREST_NEIGHBOR<br>RESIZE_BILINEAR | Only power of 2 and symmetric resize is supported <br>Any resize ratio which is a power of 2 and greater than 4 will be replaced by a combination of 4x4 and 2x2 resize layers <br> For example, an 8x8 resize will be replaced by a 4x4 resize followed by a 2x2 resize  |
+| 16 | TIDL_DepthToSpaceLayer         | DepthToSpace                                | DEPTH_TO_SPACE                             |  Supports non-strided convolution with upscale factors of 2, 4 and 8 | 
+| 17 | TIDL_SigmoidLayer              | Sigmoid/Logistic                            | SIGMOID/LOGISTIC                           |   |
+| 18 | TIDL_PadLayer                  | Pad                                         | PAD                                        |   |
+| 19 | TIDL_ColorConversionLayer      | NA                                          | NA                                         |  Only YUV420 NV12 format conversion to RGB/BGR color format is supported |
+| 20 | TIDL_BatchReshapeLayer         | NA                                          | NA                                         |  |
+| 21 | TIDL_DataConvertLayer          | NA                                          | NA                                         |  |
+<br>
 ## Other compatible layers
-| No | Caffe Layer Type | Tensorflow Ops | ONNX Ops  | tflite Ops    | Notes |
-|:--:|:-----------------|:---------------|:----------|:--------------|-------|
-| 1  | Bias             | BiasAdd        |           |               | Bias will be imported as BN. |
-| 2  | Scale            |                |           |               | Scale will be imported as BN. |
-| 3  | ReLU             | Relu<br>Relu6  | Relu      | RELU<br>RELU6 | ReLU will be imported as BN. |
-| 4  | PReLU            |                | Prelu     |               | PReLU will be imported as BN. |
-| 5  | Split            |                | Split     |               | Split layer will be removed after import. |
-| 6  | Reshape          | Reshape        | Reshape   | RESHAPE       | Please refer to comment 1. |
-| 7  | Permute          |                |           |               | Please refer to comment 1. |
-| 8  | Priorbox         |                |           |               | Please refer to comment 1. |
-| 9  |                  | Pad            | Pad       | PAD           | Padding will be taken care of during import process, and this layer will be automatically removed by import tool. |
-| 10 |                  |                |           | MINIMUM       | For relu6 / relu8      |
-| 11 | DropOut          |                |           |               | This layer is only used in training, and this layer will be automatically removed during import process. |
-| 12 |                  | Squeeze        |           |               | Flatten       |
-| 13 |                  | Shape          |           |               | Resize       |
-| 14 |                  |                | Transpose |               | For ShuffleChannelLayer       |
-| 15 |                  |                | Clip      |               | Parametric activation threshold PACT       |
-| 16 |                  |                | LeakyRelu |  LEAKY_RELU   | Leaky Relu will be imported as BN       |
-
+| No | ONNX Ops  | TFLite Ops    | Notes |
+|:--:|:----------|:--------------|-------|
+| 1  | Split     |               | Split layer will be removed after import |
+| 2  | Reshape   | RESHAPE       | Please refer to [Meta Architecture Documentation](./tidl_fsg_od_meta_arch.md) for further details |
+| 3  |           | MINIMUM       | For ReLU6 / ReLU8      |
+| 4  |           |               | This layer is only used in training, and this layer will be automatically removed during import process |
+| 5  |           |               | Flatten       |
+| 6  |           |               | Resize       |
+| 7  | Transpose |               | For ShuffleChannelLayer only      |
+| 8  | Clip      |               | Parametric activation threshold PACT       |
 
+<br>
 
 ## For Unlisted Layers/Operators
 
 Any unrecognized layers/operators will be converted to TIDL_UnsupportedLayer as a place-holder. The shape & parameters might not be correct. You may get the TIDL-RT importer result, but with such situation imported model will not work for inference on target/PC. |
-
-* If this operation is supported by TIDL-RT inference, but not supported by TIDL-RT import tool:
-    - Please modify the import tool source code.
-* If this operation is not supported by TIDL-RT inference, use open source run time
+<br>
 
 # Supported model formats & operator versions
-Proto file from below version are used for validating pre-trained models. In most cases new version models also shall work since the basic operations like convolution, pooling etc don't change
-  - Caffe - 0.17 (caffe-jacinto in gitHub)
-  - Tensorflow - 1.12
+Proto files from the versions below are used for validating pre-trained models. In most cases, models from newer versions should also work, since the core operators tend to remain the same.
   - ONNX - 1.3.0 (opset 9 and 11)
   - TFLite - Tensorflow 2.0-Alpha
 
-*Since the Tensorflow 2.0 is planning to drop support for frozen buffer, we recommend to users to migrate to TFlite model format for Tensorflow 1.x.x as well. TFLite model format is supported in both TF 1.x.x and TF 2.x*
+*Since Tensorflow 2.0 is planning to drop support for frozen buffers, we recommend that users migrate to the TFLite model format for Tensorflow 1.x.x as well. The TFLite model format is supported in both TF 1.x.x and TF 2.x*
+
+# Feature set comparison across devices
 
-*Fixed-point models are only supported for TFLite & need calibrations images for TDA4VM"
+| Feature  | AM62A | AM68A |AM68PA (TDA4VM) | AM69A|
+| ------- |:-----------:|:-----------:|:-----------:|:-----------:|
+| Support for native inference of TFLite PTQ Models (int8)  | :heavy_check_mark: |:heavy_check_mark: | :x: |:heavy_check_mark:|
+| Support for LUT based operators  | :x: |:heavy_check_mark: | :heavy_check_mark:|:heavy_check_mark:|
 
-#Feature set comparison across devices:
+*TFLite fixed-point PTQ models will need calibration frames on AM68PA (TDA4VM)*
\ No newline at end of file
diff --git a/examples/tidlrt_cpp/README.md b/examples/tidlrt_cpp/README.md
index be2cc1f..ac59399 100644
--- a/examples/tidlrt_cpp/README.md
+++ b/examples/tidlrt_cpp/README.md
@@ -9,14 +9,8 @@
 
 ## Introduction
    - TIDL RT CPP APIs only supports the model inference for the models which can be fully offloaded to DSP. The user is expected  to run the [Python Examples](../osrt_python/README.md#python-example) on PC to generate the model artifacts.
-> Note : We are planing to clean-up and unify the user interface for CPP examples by next release. We are also planning to add more CPP examples.
-
 ## Setup
 - Prepare the Environment for the Model compilation by following the setup section [here](../../README.md#setup)
-<<<<<<< HEAD
-=======
-
->>>>>>> README link fix TIDL-2747
 
 ## Build 
   - Build the CPP examples using cmake from repository base directory
@@ -33,20 +27,7 @@
     ./bin/Release/tidlrt_clasification -l test_data/labels.txt -i test_data/airshow.jpg  -f model-artifacts/tfl/mobilenet_v1_1.0_224/ -d 1
     ```
 ## Validation on Target
-<<<<<<< HEAD
-<<<<<<< HEAD
-- Build and run steps remains same for PC emulation and target. Copy the below folders from PC to the EVM where this repo is cloned before running the examples
-
-=======
-<<<<<<< HEAD
-- Build and runt steps remains same for PC emulation and target. Copy the below folders from PC to the EVM where this repo is cloned before running the examples
-=======
-- Build and run steps remains same for PC emulation and target. Copy the below folders from PC to the EVM where this repo is cloned before running the examples
->>>>>>> README link fix TIDL-2747
->>>>>>> README link fix TIDL-2747
-=======
 - Build and run steps remains same for PC emulation and target. Copy the below folders from PC to the EVM where this repo is cloned before running the examples
->>>>>>> need link update
   
     ```
     ./model-artifacts