Unified Inference Frontend (UIF) 1.2 User Guide

4.3: Deploy Model for Target Platforms

4.3.1: Deploy Model for FPGA

4.3.1.1: In-Framework

WeGO (Whole Graph Optimizer) offers a smooth solution for deploying models on cloud DPUs by integrating the Vitis™ AI development kit with the TensorFlow 1.x, TensorFlow 2.x, and PyTorch frameworks.

The following platforms are supported for WeGO:

Versal™ AI Core series VCK5000-PROD, V70

For more information on setting up the host and running WeGO examples, see the WeGO section.

4.3.1.2: Native

The following platforms are supported for UIF 1.2:

Zynq® UltraScale+™ MPSoC ZU9EG, ZCU102
Zynq UltraScale+ MPSoC ZU7EV, ZCU104
Zynq UltraScale+ MPSoC, Kria KV260
Versal AI Core series VC1902, VCK190, V70
Versal Edge AI Core series VE2082, VEK280

Run Models on Edge Platform

To set up the board, refer to the MPSoC or Versal setup instructions, whichever matches your platform.

Download the VART runtime and install it.

tar -xzvf vitis-ai-runtime-3.0.0.tar.gz
cd vitis-ai-runtime-3.0.0/2022.2/aarch64/centos
rpm -ivh --force *.rpm

Download the pre-compiled model from Vitis AI Model Zoo.

Take resnet_v1_101_tf as an example. Copy the model to the board.
```
tar -xzvf resnet_v1_101_tf-zcu102_zcu104_kv260-r3.0.0.tar.gz -C /usr/share/vitis_ai_library/models
```
Download the test examples from the Vitis AI Library examples. For the resnet_v1_101_tf model, the classification example is used for testing.
Cross-compile the classification example on the host, then copy the executable program to the target.
```
cd classification
bash build.sh
```

Run the program on the target:

./test_jpeg_classification resnet_v1_101_tf sample_classification.jpg

To test the performance of the model, run the following command:

./test_performance_classification resnet_v1_101_tf test_performance_classification.list -t 8 -s 60

-t: <num_of_threads>
-s: <num_of_seconds>

Run Models on Cloud Platform

Download Vitis-AI, enter the Vitis-AI directory, and then start the Docker® software. For more information, see the Getting Started section in the Vitis AI™ development environment documentation.
For the V70 Versal card, follow the instructions in Set Up the V70 Accelerator Card to set up the host.
Run Vitis AI Library examples on V70. For more information, see Run Vitis AI Library Samples in the Vitis AI documentation.

4.3.2: Deploy Model for CPU

4.3.2.1: Run UIF Models with ZenDNN

This section describes how to use ZenDNN-optimized models with TensorFlow, PyTorch, and ONNXRT.

Run Examples with TensorFlow+ZenDNN

Install TensorFlow+ZenDNN. For more information, see the Installation section.

This tutorial uses ResNet50 as an example. Download the ResNet50 model. For more information, see the UIF Model Setup section.

Unzip the model package:

unzip tf_resnetv1_50_imagenet_224_224_6.97G_1.1_Z4.0.zip

Check the readme.md file for required dependencies. Run the run_bench.sh script for the FP32 model and the run_bench_quant.sh script for the quantized model to benchmark the performance of ResNet50:
```
cd tf_resnetv1_50_imagenet_224_224_6.97G_1.1_Z4.0
bash run_bench.sh 64 640
bash run_bench_quant.sh 64 640
```

Similarly, use the run_eval scripts for validating the accuracy. To set up the validation data, refer to the readme files provided with the model package.

Run Examples with PyTorch+ZenDNN

Install PyTorch+ZenDNN. For more information, see the Installation section.

This tutorial uses personreid-resnet50 as an example. Download the personreid-resnet50 model as described in the UIF Model Setup section.

Unzip the model package.

unzip pt_personreid-res50_market1501_256_128_5.3G_1.1_Z4.0.zip

Check the readme.md file for required dependencies. Run the run_bench.sh script for the FP32 model and the run_bench_quant.sh script for the quantized model to benchmark the performance of personreid-resnet50:
```
cd pt_personreid-res50_market1501_256_128_5.3G_1.1_Z4.0
bash run_bench.sh 64 640
bash run_bench_quant.sh 64 640
```

Similarly, use the run_eval scripts for validating the accuracy. To set up the validation data, refer to the readme files provided with the model package.

Run Examples with ONNXRT+ZenDNN

Install ONNXRT+ZenDNN. For more information, see the Installation section.

This tutorial uses ResNet50 as an example. Download the ResNet50 model as described in the UIF Model Setup section.

Unzip the model package.

unzip onnx_resnetv1_50_imagenet_224_224_6.97G_1.1_Z4.0.zip

Check the readme.md file for required dependencies. Run the run_bench.sh script for the FP32 model and the run_bench_quant.sh script for the quantized model to benchmark the performance of ResNet50:
```
cd onnx_resnetv1_50_imagenet_224_224_6.97G_1.1_Z4.0
bash run_bench.sh 64 640
bash run_bench_quant.sh 64 640
```

Similarly, use the run_eval scripts for validating the accuracy. To set up the validation data, refer to the readme files provided with the model package.

4.3.2.2: Run Custom Models with ZenDNN

Float Models

To run any single-precision (float) custom model on ZenDNN, follow the steps in the ZenDNN Installation section to install TensorFlow+ZenDNN, PyTorch+ZenDNN, or ONNXRT+ZenDNN. Once installation is complete, the model can be run with the standard inference steps of the chosen framework. One such example is provided in the example section.

Model Compression Techniques for ZenDNN

1. Pruning a Deep Learning Model

To apply pruning as a compression technique to a deep learning model, follow the steps in the 4.1: Prune Model with UIF Optimizer section. The resulting pruned models can be run on frameworks built with ZenDNN.

2. Quantizing a Deep Learning Model

Quantization for AMD CPUs is done in two steps:

1. Use the UIF Quantizer tool to quantize a model.
2. Run the quantized model generated in step 1 through the ZenDNN model converter tool to create a ZenDNN-optimized model that can be run on ZenDNN.

To make use of the ZenDNN model converter tool:

Set up the environment:
1. Install conda.
2. Set up the TensorFlow+ZenDNN environment by following the steps in the ZenDNN Installation section.
3. Install the model converter tool:
  
  From the .whl file provided for the model converter at /tools/zendnn, install using the following command:
```
python -m pip install ModelConverter-0.1-py3-none-linux_x86_64.whl
```
Convert the quantized model to a ZenDNN optimized model:

The quantized model generated with the UIF Quantizer tool for TensorFlow is the input to the model converter tool.

Run the model converter using the following command:
```
model_converter --model_file <path/to/the/model> --out_location <path/to/output/directory>
```
Parameter Descriptions
```
--model_file      : Graph/model to be used for optimization.
--out_location    : Path to where the optimized model should be saved.
```
Example usage is as follows:
```
model_converter \
--model_file ~/quantized/quantized_pruned_19.56B.pb \
--out_location ./outputs/
```
The result is an optimized graph saved at the specified output location. The output file keeps the input model's base name and replaces the .pb extension with _amd_opt.pb. In the example, the model is saved as quantized_pruned_19.56B_amd_opt.pb in the outputs folder. This optimized model can then be run on AMD CPUs through ZenDNN. For more information, refer to the AMD page for ZenDNN.
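
To illustrate the naming rule described above, the following Python sketch assembles the `model_converter` command line and predicts where the optimized model should be written. The helper names `build_converter_command` and `expected_output_path` are hypothetical and are not part of the model converter tool; the sketch assumes only the two flags and the `_amd_opt.pb` suffix described in this section.

```python
from pathlib import Path


def build_converter_command(model_file, out_location):
    """Return the model_converter invocation as an argument list."""
    return [
        "model_converter",
        "--model_file", str(model_file),
        "--out_location", str(out_location),
    ]


def expected_output_path(model_file, out_location):
    """Predict the optimized model path: <out_location>/<base name>_amd_opt.pb."""
    stem = Path(model_file).stem  # drops only the final ".pb" extension
    return Path(out_location) / f"{stem}_amd_opt.pb"


if __name__ == "__main__":
    model = "~/quantized/quantized_pruned_19.56B.pb"
    print(" ".join(build_converter_command(model, "./outputs/")))
    print(expected_output_path(model, "./outputs/"))
```

If the argument list is passed to `subprocess.run`, expand `~` first (for example with `Path.expanduser()`), because no shell is involved to perform the expansion.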

Note: Currently, the model converter tool supports only TensorFlow models quantized with the UIF Quantizer tool.

The model converter has been tested with ResNet v1 models (ResNet50, ResNet101, ResNet152), Inception models (InceptionV1, InceptionV3, InceptionV4), VGG models (VGG16, VGG19), EfficientNet models (EfficientNet-S, EfficientNet-M, EfficientNet-L), and RefineDet variants.

4.3.3: Deploy Model for GPU

Note: This GPU example assumes you run inside a Docker image started as described in section 1.1.3: Pull a UIF Docker Image and have downloaded the Resnet50v1.5 model as described in section 2.3: Get MIGraphX Models from UIF Model Zoo.

The following example describes the steps needed to run GPU inference with MIGraphX, using a model named resnet50_fp32.onnx from the Model Zoo.

For additional information and examples on running MIGraphX, refer to the ROCm Deep Learning Guide.

4.3.3.1: Preliminary Steps

Pull and run a GPU Docker image. For more information, refer to the installation instructions in Installation.


  prompt% docker pull amdih/uif-pytorch:uif1.2_rocm5.6.1_vai3.5_py3.8_pytorch1.13 

  prompt% docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add render --ipc=host --shm-size 8G amdih/uif-pytorch:uif1.2_rocm5.6.1_vai3.5_py3.8_pytorch1.13 - base
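
The `docker run` command exposes the GPU device nodes of the host to the container, so those nodes must exist before the container starts. The following Python sketch is a minimal pre-flight check under the assumption of a standard ROCm layout (`/dev/kfd` and `/dev/dri`). The function name `missing_gpu_devices` is hypothetical, and the check tests only for presence, not for permissions or driver health.

```python
import os

# Device nodes a ROCm container typically needs from the host (assumption:
# standard ROCm layout, matching the --device flags of the docker run command).
REQUIRED_DEVICES = ("/dev/kfd", "/dev/dri")


def missing_gpu_devices(paths=REQUIRED_DEVICES):
    """Return the subset of paths that do not exist on this host."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    missing = missing_gpu_devices()
    if missing:
        print("Missing device nodes:", ", ".join(missing))
    else:
        print("All required device nodes are present.")
```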

Download a trained model for the GPU. For more information, refer to Model Setup.


    prompt% cd ~ 

    prompt% git clone https://github.com/AMD/uif.git 

    prompt% cd uif/docs/2_model_setup

    prompt% python3 downloader.py

The following prompt appears:

    input:pt
    choose model
    0 : all
    1 : pt_resnet50v1.5_imagenet_224_224_8.2G_1.1_M2.4
    …
    input num: 1

The ResNet50 v1.5 PyTorch model is selected.

Choose the model type:

    0: all
    1: GPU
    2: MI100
    3: MI210
    ...

Select and download the MI210 model:

    input num: 3
    pt_resnet50v1.5_imagenet_224_224_8.2G_1.1_M2.4_MI210.zip
                                              100.0%|100% done

The desired model is downloaded.

Note: The model is tuned for the current hardware:

    prompt% env MIOPEN_FIND_ENFORCE=3 migraphx-driver run resnet50_fp32.mxr

Run a MIGraphX example using a downloaded model.

Note: This example is adapted from Performing Inference using MIGraphX Python Library in the ROCm™ software platform documentation. For more details, refer to resnet50_inference.ipynb.

4.3.3.2: Prepare the Example

Install the Python packages used in the example.

    prompt% pip install opencv-python==4.1.2.30 

    prompt% pip install matplotlib

Clone the MIGraphX repository to get the example.

    prompt% cd ~ 

    prompt% git clone https://github.com/ROCmSoftwarePlatform/AMDMIGraphX 

    prompt% cd AMDMIGraphX/examples/vision/python_resnet50

Download a sample video and name it sample_vid.mp4.


    prompt% apt install youtube-dl 

    prompt% youtube-dl https://youtu.be/TkqYmvH_XVs 

    prompt% mv sample_vid-TkqYmvH_XVs.mp4 sample_vid.mp4

4.3.3.3: Run the Example from Python

    prompt% python3

        import numpy as np
        from matplotlib import pyplot as plt
        import cv2
        import json
        import time
        import os.path
        from os import path
        import sys
        import migraphx

        # Load the class labels used to name the predictions
        with open('imagenet_simple_labels.json') as json_data:
            labels = json.load(json_data)
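
The labels file is read as a JSON array, so a class index returned by the model can be used directly as a list index. The sketch below illustrates this with a three-entry stand-in for `imagenet_simple_labels.json` (the real file is assumed to hold one entry per ImageNet class). `label_for` is a hypothetical helper that also guards against out-of-range indices.

```python
import json

# Stand-in for the contents of imagenet_simple_labels.json (illustrative only).
SAMPLE_JSON = '["tench", "goldfish", "great white shark"]'


def label_for(labels, class_index):
    """Map a class index to its label, or return 'unknown' if out of range."""
    if 0 <= class_index < len(labels):
        return labels[class_index]
    return "unknown"


sample_labels = json.loads(SAMPLE_JSON)
print(label_for(sample_labels, 1))    # goldfish
print(label_for(sample_labels, 999))  # unknown (the stand-in has three entries)
```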

4.3.3.4: Set Up the Video and Capture the Model


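    # Parse the ONNX model and compile it for the GPU target
    model = migraphx.parse_onnx("resnet50_fp32.onnx")
    model.compile(migraphx.get_target("gpu"))
    model.print()     # Printed in terminal

    # Open the sample video for frame-by-frame reading
    cap = cv2.VideoCapture("sample_vid.mp4")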
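
MIGraphX distinguishes between ONNX files, which are parsed with `migraphx.parse_onnx` and then compiled, and saved MIGraphX programs (`.mxr`), which are read with `migraphx.load`. The following sketch picks the loader from the file extension. `loader_name_for` is a hypothetical helper, and the commented lines assume that the `migraphx` Python module is importable, as it should be inside the UIF GPU Docker image.

```python
from pathlib import Path


def loader_name_for(model_path):
    """Return the name of the migraphx function suited to the file type."""
    suffix = Path(model_path).suffix.lower()
    if suffix == ".onnx":
        return "parse_onnx"   # ONNX graph: parse, then compile for a target
    if suffix == ".mxr":
        return "load"         # program previously saved by MIGraphX
    raise ValueError(f"Unsupported model file type: {model_path}")


print(loader_name_for("resnet50_fp32.onnx"))
print(loader_name_for("resnet50_fp32.mxr"))

# With migraphx installed, the returned name can be resolved on the module:
#   import migraphx
#   model = getattr(migraphx, loader_name_for(path))(path)
```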

4.3.3.5: Add Code for Preprocessing Video Frames

    def make_nxn(image, n):
        # Center-crop the frame to a square, then resize it to n x n
        width  = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        if height > width:
            dif = height - width
            bar = dif // 2
            square = image[(bar + (dif % 2)):(height - bar),:]
            return cv2.resize(square, (n, n))
        elif width > height:
            dif = width - height
            bar = dif // 2
            square = image[:,(bar + (dif % 2)):(width - bar)]
            return cv2.resize(square, (n, n))
        else:
            return cv2.resize(image, (n, n))

    def preprocess(img_data):
        # Scale to [0, 1], then normalize each channel by its mean and standard deviation
        mean_vec = np.array([0.485, 0.456, 0.406])
        stddev_vec = np.array([0.229, 0.224, 0.225])
        norm_img_data = np.zeros(img_data.shape).astype('float32')
        for i in range(img_data.shape[0]):
            norm_img_data[i,:,:] = (img_data[i,:,:]/255 - mean_vec[i]) / stddev_vec[i]
        return norm_img_data

    def predict_class(frame) -> int:
        # Crop and resize original image
        cropped = make_nxn(frame, 224)
        # Convert from HWC to CHW
        chw = cropped.transpose(2,0,1)
        # Apply normalization
        pp = preprocess(chw)
        # Add singleton dimension (CHW to NCHW)
        data = np.expand_dims(pp.astype('float32'),0)
        # Run the model
        results = model.run({'data':data})
        # Extract the index of the top prediction
        res_npa = np.array(results[0])
        return np.argmax(res_npa)
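
The cropping arithmetic in `make_nxn` and the normalization in `preprocess` can be checked without a video or a GPU. The sketch below reproduces both in plain Python. `center_crop_bounds` and `normalize_value` are hypothetical helpers: the first returns the row and column range that `make_nxn` keeps before resizing (taking the frame size as arguments instead of querying `cap`), and the second applies the same per-channel formula to a single pixel value.

```python
# Per-channel constants copied from preprocess()
MEAN = (0.485, 0.456, 0.406)
STDDEV = (0.229, 0.224, 0.225)


def center_crop_bounds(height, width):
    """Return (top, bottom, left, right) of the centered square crop."""
    if height > width:
        dif = height - width
        bar = dif // 2
        return bar + dif % 2, height - bar, 0, width
    if width > height:
        dif = width - height
        bar = dif // 2
        return 0, height, bar + dif % 2, width - bar
    return 0, height, 0, width


def normalize_value(value, channel):
    """Scale an 8-bit value to [0, 1], then standardize it for the channel."""
    return (value / 255 - MEAN[channel]) / STDDEV[channel]


top, bottom, left, right = center_crop_bounds(1080, 1920)
print(bottom - top, right - left)   # 1080 1080: the crop is square
print(round(normalize_value(255, 0), 3))
```

When the size difference is odd, the `dif % 2` term removes the extra row or column from the leading edge, so the result is always exactly square.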

4.3.3.6: Run the Complete Loop over Video


    while (cap.isOpened()):
        start = time.perf_counter()
        ret, frame = cap.read()
        if not ret: break

        top_prediction = predict_class(frame)

        end = time.perf_counter()
        fps = 1 / (end - start)
        fps_str = f"Frames per second: {fps:0.1f}"
        label_str = "Top prediction: {}".format(labels[top_prediction])

        # Draw the predicted label and the frame rate onto the frame
        labeled = cv2.putText(frame,
                              label_str,
                              (50, 50),
                              cv2.FONT_HERSHEY_SIMPLEX,
                              2,
                              (255, 255, 255),
                              3,
                              cv2.LINE_AA)
        labeled = cv2.putText(labeled,
                              fps_str,
                              (50, 1060),
                              cv2.FONT_HERSHEY_SIMPLEX,
                              2,
                              (255, 255, 255),
                              3,
                              cv2.LINE_AA)
        cv2.imshow("Resnet50 Inference", labeled)

        if cv2.waitKey(1) & 0xFF == ord('q'): # 'q' to quit
            break

    # Release the video and close the display window once the loop ends
    cap.release()
    cv2.destroyAllWindows()
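
The loop computes the frame rate from a single iteration, so the displayed number can jump from frame to frame. One possible refinement, offered here as an assumption rather than as part of the original example, averages the rate over a sliding window of recent frame times. The `FpsMeter` class is hypothetical; the timing still relies on `time.perf_counter`, as in the loop.

```python
import time
from collections import deque


class FpsMeter:
    """Average frames per second over the last `window` frame durations."""

    def __init__(self, window=30):
        self.durations = deque(maxlen=window)  # oldest entries drop out

    def add(self, seconds):
        """Record how long one frame took, in seconds."""
        self.durations.append(seconds)

    def fps(self):
        """Return the windowed average, or 0.0 before any frame is recorded."""
        total = sum(self.durations)
        return len(self.durations) / total if total > 0 else 0.0


meter = FpsMeter(window=3)
for _ in range(5):
    start = time.perf_counter()
    sum(range(10000))               # stand-in for predict_class(frame)
    meter.add(time.perf_counter() - start)
print(f"Frames per second: {meter.fps():0.1f}")
```

In the video loop, `meter.add(end - start)` would replace the direct `fps = 1 / (end - start)` computation, and `meter.fps()` would feed the on-screen string.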

License

UIF is licensed under Apache License Version 2.0. Refer to the LICENSE file for the full license text and copyright notice.

Technical Support

Contact uif_support@amd.com for questions, issues, and feedback on UIF.

Submit your questions, feature requests, and bug reports on the GitHub issues page.
