OCTADeeplearning

This repository contains the deep learning code for the project, organized into four parts: main, data, train, and test.
The data pre-processing code is available on this GitHub page.

  1. Environment Setting

A deep learning environment depends on several components that must match each other, such as the CUDA version, the NVIDIA driver, and the deep learning framework, so using Docker is highly recommended. The experiments here were also run inside a Docker container. The base environment is listed below.

  • Ubuntu (Linux OS for using Nvidia docker)
  • pytorch v1.11.0
  • cuda 11.3
  • cudnn 8

Installing these components separately is a bit tricky.
You don't need to worry about that, though: use the Dockerfile provided in this repository.

Dockerfile

You can also download the Docker image from Docker Hub.
Using the Dockerfile takes two steps, build and run. Run each command from a shell prompt.

  • Build example
 docker build . -t octa3d
  • Run example
 docker run -d -it \
 -v /data:/root/Share/OCTA3d/data \
 -v /home/Project/OCTA3d:/root/Share/OCTA3d \
 --name "octa3d" \
 --rm --gpus all octa3d:latest
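
Once the container is running, it can be useful to confirm that Python sees the versions listed above. The following is a minimal sketch, not part of the repository; the function name `describe_environment` is hypothetical, and it only reports what is installed without changing anything.

```python
import importlib.util


def describe_environment():
    """Report the PyTorch, CUDA, and cuDNN versions visible to Python."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, e.g. when run outside the container.
        return {"torch": None}
    import torch

    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,
        "cudnn": torch.backends.cudnn.version(),
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for name, value in describe_environment().items():
        print(f"{name}: {value}")
```

If `gpu_available` is reported as `False` inside the container, the `--gpus all` option and the host NVIDIA driver are the first things to check.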

  2. Main

The main function drives the overall process. It sets up the input data for training through Data_Handler in data.py. All command-line arguments defined by the argument parser are described in the main.py script.
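
As a rough illustration of how such an entry point is usually organized, here is a minimal argparse sketch. Every flag name and default below is a hypothetical placeholder, not the actual interface of main.py; the real arguments are documented in the script itself.

```python
import argparse


def build_parser():
    # All flag names and defaults here are hypothetical placeholders.
    parser = argparse.ArgumentParser(description="OCTA classification (sketch)")
    parser.add_argument("--dimension", choices=["2d", "3d"], default="3d")
    parser.add_argument("--model", default="resnet50")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--data-dir", default="/root/Share/OCTA3d/data")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # In the real script, this is where the data handler would be created
    # and the train or test routine would be called.
    return args
```

Calling `main(["--dimension", "2d"])` returns the parsed arguments, which makes the entry point easy to exercise without a shell.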

  3. Data

data.py handles the dataset, from pre-processing (splitting images into patches for 2D, clipping for 3D normalization) to a custom PyTorch Dataset. Each of these steps is implemented separately for the 2D and 3D cases. The details are described in the comments of the script.
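
The two pre-processing steps named above can be sketched with NumPy as follows. This is an illustration under assumptions, not the code in data.py: the function names are hypothetical, the patches are assumed to be square and non-overlapping, and the clipping bounds are assumed to be the 1st and 99th percentiles.

```python
import numpy as np


def split_into_patches(image, patch_size):
    """Split a 2D image (H, W) into non-overlapping square patches."""
    height, width = image.shape
    rows, cols = height // patch_size, width // patch_size
    # Drop any border that does not fill a whole patch.
    cropped = image[: rows * patch_size, : cols * patch_size]
    patches = cropped.reshape(rows, patch_size, cols, patch_size)
    # Reorder to (row, col, y, x), then flatten the patch grid.
    return patches.transpose(0, 2, 1, 3).reshape(-1, patch_size, patch_size)


def clip_and_normalize(volume, low_pct=1.0, high_pct=99.0):
    """Clip a 3D volume to percentile bounds, then scale it to [0, 1]."""
    # The percentile bounds are an assumption made for this sketch.
    low, high = (float(v) for v in np.percentile(volume, [low_pct, high_pct]))
    clipped = np.clip(volume.astype(np.float64), low, high)
    return (clipped - low) / max(high - low, 1e-8)
```

Clipping before scaling keeps a few extreme voxel intensities from compressing the rest of the volume into a narrow range.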

  4. Train

Training covers classification and autoencoder pre-training (customized from the Clinicadl method). We use established CNN models whose performance has already been demonstrated. The key point is that combining them with our pre-processing method improved the inference scores. The models we used are listed in the table below.

| Dimension | VGGNet | ResNet | Inception V3 Net | Efficient Net | Vision Transformer |
|---|---|---|---|---|---|
| 2D | 16, 19 | 50, 152 | O | O | O |
| 3D | 16 | 18, 50 | O | O | X |

Several libraries provide these models, and they are installed automatically by the provided Dockerfile. For the paper, we used VGG19, ResNet-50, ResNet-152, and Inception V3 for 2D, and ResNet-18, ResNet-50, and Inception V3 for 3D, because previous research has shown these models to be useful for retinal disease classification. The binary classification results showed that retaining volumetric information gives higher performance.

image

To leverage transfer learning, we adopt an autoencoder structure for pre-training and reuse its encoder, followed by a fully connected layer, for classification. Conventional transfer learning uses model parameters learned from natural images, which do not match the given medical data. Pre-training the autoencoder on the medical data itself is meant to overcome this limitation.
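
The encoder reuse described above comes down to copying the pre-trained encoder parameters into the classifier while leaving the new fully connected layer at its initial values. The sketch below shows that idea with plain dictionaries standing in for model state dicts; the layer names and the `encoder.` prefix are hypothetical and are not taken from this repository.

```python
def extract_encoder_weights(autoencoder_state, prefix="encoder."):
    """Keep only the parameters whose names start with the encoder prefix."""
    return {
        name: value
        for name, value in autoencoder_state.items()
        if name.startswith(prefix)
    }


def init_classifier_state(classifier_state, encoder_weights):
    """Overwrite matching classifier entries with pre-trained encoder weights."""
    merged = dict(classifier_state)
    for name, value in encoder_weights.items():
        if name in merged:
            merged[name] = value
    return merged


# Toy stand-ins for state dicts; the layer names are hypothetical.
autoencoder_state = {
    "encoder.conv1.weight": "pretrained_conv1",
    "encoder.conv2.weight": "pretrained_conv2",
    "decoder.deconv1.weight": "pretrained_deconv1",
}
classifier_state = {
    "encoder.conv1.weight": "random_conv1",
    "encoder.conv2.weight": "random_conv2",
    "fc.weight": "random_fc",
}
new_state = init_classifier_state(
    classifier_state, extract_encoder_weights(autoencoder_state)
)
```

With PyTorch models, the same filter could be applied to the autoencoder's `state_dict()` and the result loaded into the classifier with `load_state_dict(..., strict=False)`, so that the decoder is dropped and the fully connected layer keeps its fresh initialization.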

image

Currently, the multi-class classification module has been tested, and it will be combined with binary classification into a single classification module, since the two differ only in how they are scored. They will be integrated soon.
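
One common reading of "differ only in how they are scored" is that a binary model thresholds a sigmoid output while a multi-class model takes the most probable softmax class, with everything else shared. The sketch below illustrates that reading in plain Python; it is an assumption about the design, not the repository's scoring code, and the function names are hypothetical.

```python
import math


def predict_binary(logit, threshold=0.5):
    """Sigmoid on a single logit, then threshold to get class 0 or 1."""
    probability = 1.0 / (1.0 + math.exp(-logit))
    return int(probability >= threshold)


def predict_multiclass(logits):
    """Softmax over per-class logits, then pick the most probable class."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probabilities = [value / total for value in exps]
    return probabilities.index(max(probabilities))


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)
```

Because both prediction functions return a class index, a single `accuracy` function can serve either mode, which is what makes merging the two modules straightforward.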

  5. Test

The best models extracted during training were evaluated on the test set, which was split off as about 30% of the total data. To explain the classification process of each extracted model, we visualize it with Grad-CAM (customized from M3d-cam). Because 3D volumetric data is used, Grad-CAM has been customized to extend from 2D to 3D. The overall process is shown below.
image
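
The 30% hold-out mentioned above can be sketched as a seeded shuffle followed by a cut. This is a minimal illustration, not the repository's splitting code: the function name and the fixed seed are hypothetical, and whether the actual split is stratified by class is not stated here.

```python
import random


def split_train_test(items, test_ratio=0.3, seed=0):
    """Shuffle items reproducibly and hold out test_ratio of them."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    # Returns (train, test); the two lists never share an item.
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the seed keeps the test set identical across runs, so scores from different models stay comparable.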

With this process, improved retinal lesion detection was observed. When volumetric information is used, the retinal pathology is detected quite accurately; the results are shown in panels [D] and [E] of the figure below. Because a 3D attention map is extracted, we can simultaneously observe the z-axis information of the retinal lesion. With a 2D image, in contrast, only x-y information can be acquired, as in panels [B] and [C].

image
[A] is an X-Z image (B-scan image) of the OCT volume data. [B] is the projected X-Y image (en-face image). [C] is the 2D Grad-CAM result from [B]. [D] is an X-Y slice image extracted from the 3D OCTA volume (position: red dot). [E] is the 3D Grad-CAM result from [D].
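
The extension of Grad-CAM from 2D to 3D described in this section mainly changes which axes the gradients are averaged over. The NumPy sketch below shows the standard Grad-CAM weighting applied to volumetric feature maps; it is not the customized M3d-cam code used in this repository. The function name is hypothetical, and the activations and gradients are assumed to have already been captured from a convolutional layer.

```python
import numpy as np


def grad_cam_3d(activations, gradients):
    """Compute a 3D Grad-CAM map from one layer's feature maps.

    Both inputs have shape (channels, depth, height, width).
    """
    # Channel weights: gradients averaged over all three spatial axes
    # (2D Grad-CAM averages over height and width only).
    weights = gradients.mean(axis=(1, 2, 3))
    # Weighted sum over channels, then ReLU to keep positive evidence.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    # Scale to [0, 1] so the map can be overlaid on the volume.
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The result keeps the depth axis, so a single X-Y slice such as `cam[z]` can be inspected at any depth. In practice the map is usually upsampled to the input volume size before it is overlaid.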