- Open a command prompt and execute:
```
git clone https://github.com/Xilinx/Vitis-AI.git
cd Vitis-AI
git checkout 1.4.1
```
- Follow the Vitis-AI installation process here
- Once the installation is completed open a terminal in the Vitis-AI directory and execute:
```
git clone https://github.com/RaffaeleBerzoini/SENECA.git
./docker_run.sh xilinx/vitis-ai-gpu:latest
```
The working directory should look similar to:
```
SENECA  # your WRK_DIR
.
├── application
├── build
├── charts
├── preprocessing
│   ├── extract_slices.py
│   └── prepare_dataset.sh
├── results
├── ...
└── .py files
```
- Download the dataset
- Data will be downloaded in a folder named OrganSegmentations. If not, rename it accordingly.
- Move the OrganSegmentations folder into WRK_DIR/preprocessing. Now the workspace should look like:
```
SENECA  # your WRK_DIR
.
├── application
├── preprocessing
│   ├── OrganSegmentations
│   │   ├── labels-0.nii.gz
│   │   ├── ...
│   │   └── volume-139.nii.gz
│   ├── extract_slices.py
│   └── prepare_dataset.sh
├── ...
└── .py files
```
- In the command prompt execute:
```
Vitis-AI /workspace > conda activate vitis-ai-tensorflow2
(vitis-ai-tensorflow2) Vitis-AI /workspace > cd SENECA
(vitis-ai-tensorflow2) Vitis-AI /workspace/SENECA > pip install -r requirements.txt
(vitis-ai-tensorflow2) Vitis-AI /workspace/SENECA > cd preprocessing
(vitis-ai-tensorflow2) Vitis-AI /workspace/SENECA/preprocessing > sh prepare_dataset.sh
(vitis-ai-tensorflow2) Vitis-AI /workspace/SENECA/preprocessing > cd ..
```
- Wait for the slice extraction; this could take several minutes (a sketch of what the extraction does follows the tree below)
Now you should be in the WRK_DIR with the following setup:
```
SENECA  # your WRK_DIR
.
├── ...
├── build
│   └── dataset
│       ├── input
│       └── target
├── ...
└── .py files
```
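For reference, the slice extraction boils down to walking each CT volume and saving its axial slices as input/target pairs. Here is a minimal sketch of that idea, not the actual extract_slices.py: the nibabel usage, the .npy output format, and the file naming are all assumptions.

```python
# Minimal sketch of the slice-extraction idea (NOT the actual extract_slices.py).
import glob
import os

import nibabel as nib
import numpy as np

os.makedirs("build/dataset/input", exist_ok=True)
os.makedirs("build/dataset/target", exist_ok=True)

for vol_path in sorted(glob.glob("preprocessing/OrganSegmentations/volume-*.nii.gz")):
    lbl_path = vol_path.replace("volume-", "labels-")
    volume = nib.load(vol_path).get_fdata()  # 3D CT volume (H, W, n_slices)
    labels = nib.load(lbl_path).get_fdata()  # matching 3D label map
    case = os.path.basename(vol_path).split(".")[0]  # e.g. "volume-0"
    for z in range(volume.shape[2]):  # one 2D sample per axial slice
        np.save(f"build/dataset/input/{case}-{z}.npy", volume[:, :, z])
        np.save(f"build/dataset/target/{case}-{z}.npy", labels[:, :, z])
```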
In the WRK_DIR execute:
```
python train.py --batchsize 8 --layers 4 --filters 8 --epochs 75
```
to train the 1-million-parameter model. To test the other configurations reported in the paper, follow this table (a sketch of how these flags scale the model size follows it):
| Configuration | --layers | --filters | Parameters [×10⁶] |
|---|---|---|---|
| 1M | 4 | 8 | ~1.034 |
| 2M | 5 | 6 | ~2.329 |
| 4M | 5 | 8 | ~4.136 |
| 8M | 5 | 11 | ~7.814 |
| 16M | 5 | 16 | ~16.522 |
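To get a feel for how --layers and --filters drive the parameter counts above, here is a hedged sketch of a U-Net-style builder in which the channel count doubles at each encoder level. It illustrates the scaling only; the actual architecture in train.py (convolutions per block, normalization, input shape, number of classes) may differ.

```python
import tensorflow as tf

def build_unet(layers, filters, input_shape=(256, 256, 1), classes=6):
    """Toy U-Net: `layers` encoder/decoder levels, `filters` base channels (shapes are assumptions)."""
    inputs = tf.keras.Input(shape=input_shape)
    x, skips = inputs, []
    for i in range(layers):  # encoder: channels double at every level
        x = tf.keras.layers.Conv2D(filters * 2 ** i, 3, padding="same", activation="relu")(x)
        skips.append(x)
        x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Conv2D(filters * 2 ** layers, 3, padding="same", activation="relu")(x)
    for i in reversed(range(layers)):  # decoder mirrors the encoder, with skip connections
        x = tf.keras.layers.Conv2DTranspose(filters * 2 ** i, 2, strides=2, padding="same")(x)
        x = tf.keras.layers.Concatenate()([x, skips[i]])
        x = tf.keras.layers.Conv2D(filters * 2 ** i, 3, padding="same", activation="relu")(x)
    outputs = tf.keras.layers.Conv2D(classes, 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

# Compare sizes the way the table does (exact counts depend on the real model):
for layers, filters in [(4, 8), (5, 6), (5, 8), (5, 11), (5, 16)]:
    print(layers, filters, build_unet(layers, filters).count_params())
```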
During training, each time validation results improve, a float model is saved in:
```
build/float_model/{val_loss:.4f}-f_model.h5
```
You can perform Post-Training Quantization (PTQ) or Fast Finetuning Quantization (FFQ) to quantize the float model. PTQ is preferable in terms of time and computational needs; try FFQ if you are experiencing performance losses after PTQ. A sketch of the underlying quantization API follows the commands below.
In the WRK_DIR execute:
```
python quantize.py -m build/float_model/0.1021-f_model.h5 --evaluate --calibration 500
```
- You may want to try different calibration set sizes if there is a significant performance loss after quantization
In the WRK_DIR execute:
```
python quantize.py -m build/float_model/0.1021-f_model.h5 --evaluate --calibration 100 --fastfinetuning --fftepochs 5
```
- Modify the fast_ft_epochs value (the --fftepochs argument) as you like
- Keep in mind that fast finetuning requires more memory as you increase the calibration dataset size and the number of finetuning epochs
Note that `0.1021-f_model.h5` here is just an example. Check your `build/float_model/` directory to see which float models were generated during training.
- The quantized model is saved in `build/quant_model/q_model.h5`.
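For context, quantize.py presumably wraps the Vitis-AI TensorFlow2 quantization API shipped in the vitis-ai-tensorflow2 conda environment. A minimal sketch under that assumption, with a placeholder calibration set (the real script's data loading and options will differ):

```python
import numpy as np
import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

float_model = tf.keras.models.load_model("build/float_model/0.1021-f_model.h5")

# Placeholder calibration data: in practice, a few hundred real training slices.
calib_dataset = np.zeros((500, 256, 256, 1), dtype=np.float32)

quantizer = vitis_quantize.VitisQuantizer(float_model)

# Plain PTQ:
q_model = quantizer.quantize_model(calib_dataset=calib_dataset)

# FFQ variant: also runs fast finetuning (needs more memory and time).
# q_model = quantizer.quantize_model(calib_dataset=calib_dataset,
#                                    include_fast_ft=True, fast_ft_epochs=5)

q_model.save("build/quant_model/q_model.h5")
```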
To compile the `q_model.h5` for the FPGA, execute one of these commands:
```
sh compile.sh ZCU102   # for the ZCU102
sh compile.sh ZCU104   # for the ZCU104
sh compile.sh vck190   # for the VCK190
sh compile.sh u50      # for the Alveo U50
```
For the ZCU104 (used for this work), the compiled model is saved in the `build/compiled_zcu104/` directory.
Set up the evaluation board (we used the ZCU104 for this work) as stated here.
In the WRK_DIR execute:
```
sh deployment_setup.sh 0 100 zcu104
```
- The first two args stand for:
  - the starting image in the images directory list
  - the number of images to be prepared
- Change the third arg (zcu104 in our case) if your target board is different
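For example, assuming the same argument order, `sh deployment_setup.sh 100 50 zcu102` would prepare 50 images starting from image 100 for a ZCU102 target.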
Copy the `build/target/` directory to your board with:
```
scp -r build/target/ root@192.168.1.227:~/
```
assuming that the target board IP address is 192.168.1.227; adjust this as appropriate for your system. You could also copy the folder directly to the board's SD card.
On the board execute:
```
root@xilinx-zcu104-2021_1:~# cd target
root@xilinx-zcu104-2021_1:~/target# python3 app_mt.py --threads 4 --model unet.xmodel --save
Command line options:
--image_dir : images
--threads : 4
--model : unet.xmodel
--save : True
------------------------------------
Pre-processing 100 images...
Starting 4 threads...
------------------------------------
Throughput=274.73 fps, total frames = 100, time=0.3640 seconds
Saving 100 predictions...
------------------------------------
```
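app_mt.py follows the usual Vitis-AI multithreaded application pattern built on the VART runtime. Here is a minimal single-inference sketch of that pattern, assuming the standard VART/XIR Python APIs shipped on the board image; the pre/post-processing and buffer dtypes are placeholders, not the actual app code.

```python
import numpy as np
import vart
import xir

# Locate the DPU subgraph inside the compiled model
graph = xir.Graph.deserialize("unet.xmodel")
subgraphs = graph.get_root_subgraph().toposort_child_subgraph()
dpu_subgraph = next(s for s in subgraphs
                    if s.has_attr("device") and s.get_attr("device").upper() == "DPU")

runner = vart.Runner.create_runner(dpu_subgraph, "run")
input_tensor = runner.get_input_tensors()[0]
output_tensor = runner.get_output_tensors()[0]

# Placeholder buffers: a real app fills in_buf with a scaled, preprocessed image.
in_buf = np.zeros(tuple(input_tensor.dims), dtype=np.int8)
out_buf = np.empty(tuple(output_tensor.dims), dtype=np.int8)

job_id = runner.execute_async([in_buf], [out_buf])  # app_mt.py runs one such loop per thread
runner.wait(job_id)
prediction = out_buf.argmax(axis=-1)  # per-pixel class map
```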
To evaluate results:
```
python3 scores.py
Command line options:
--image_dir : predictions
--label_dir : labels
------------------------------
------------------------------
Global dice :
Mean on slices: 88.77 +- 10.02
Weighted Mean on organs: 93.04 +- 0.07
------------------------------
Organs dice
Liver: 91.63 +- 0.09
Bladder: 79.21 +- 0.09
Lungs: 96.16 +- 0.09
Kidneys: 81.32 +- 0.08
Bones: 94.35 +- 0.03
```
The script also prints other metrics for a more complete analysis.
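The Dice score reported above is the standard overlap metric, Dice = 2|A∩B| / (|A| + |B|). Here is a minimal sketch of a per-organ computation in the spirit of scores.py, assuming integer class ids per organ and the file naming shown (both assumptions, not the actual script):

```python
import numpy as np

def dice_score(pred, target, class_id):
    """Dice = 2*|A∩B| / (|A| + |B|) for one class."""
    p = pred == class_id
    t = target == class_id
    denom = p.sum() + t.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(p, t).sum() / denom

# Hypothetical usage over one saved prediction/label pair:
pred = np.load("predictions/volume-0-42.npy")  # assumed file naming
label = np.load("labels/volume-0-42.npy")
for organ_id, name in enumerate(["Liver", "Bladder", "Lungs", "Kidneys", "Bones"], start=1):
    print(name, dice_score(pred, label, organ_id))
```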
If you find this repository useful, please use the following citation:
```
@inproceedings{berzoini2021onhow,
  title={On How to Push Efficient Medical Semantic Segmentation to the Edge: the SENECA approach},
  author={Berzoini, Raffaele and D'Arnese, Eleonora and Conficconi, Davide},
  booktitle={2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)},
  year={2022},
  organization={IEEE}
}
```