loijilai/Fine-Tuning-DETR

Fine-Tuning DETR on Custom Dataset

Environment

  • System Information
    OS: Ubuntu 18.04
    CPU: Intel Xeon Silver 4110 (32) @ 1.7GHz
    GPU: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller
    GPU: NVIDIA Tesla V100 PCIe 16GB
    Memory: 20343MiB / 385656MiB
    GPU Driver: NVIDIA 460.91.03

How to run my code

First, clone the repository locally:

git clone https://github.com/loijilai/Fine-Tuning-DETR.git

Then, install PyTorch and torchvision:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Install pycocotools (for evaluation on COCO) and scipy (for training):

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

Data Augmentation Pipeline

  • Use this file to filter the train set and produce a new file called for_blip2.py to be used later. (Each filtered image contains only one category and one bounding box, except jellyfish images, which can have at most 6 bounding boxes.)
  • Create a separate conda environment called blip2
  • Run image captioning on all images listed in for_blip2.py by running blip2.py; the image captions are written to an output file called for_gligen.py
  • Create a separate conda environment called gligen
  • Run image generation with three different strategies
    bash ./GLIGEN/run_gen.sh
    
  • After 7 categories * 20 images * 3 strategies = 420 images have been generated, augment the train set annotations with add_train.py
  • Augment train set with move_pictures.py
  • Check the quality of the generated images using FID scores: manually select 140 real images, resize them using this script, and run
    python -m pytorch_fid path/to/dataset1 path/to/dataset2
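
The filtering rule in the first step of the pipeline above can be sketched in a few lines of Python. This is only an illustration, not the repository's filter script: it assumes the train annotations are a COCO-style dictionary with "images", "annotations", and "categories" keys, and the function name filter_single_object_images and its parameters are hypothetical.

```python
from collections import defaultdict


def filter_single_object_images(coco, multi_box_category="jellyfish", max_boxes=6):
    """Keep images with exactly one category and one box.

    Images of `multi_box_category` may keep up to `max_boxes` boxes.
    `coco` is assumed to be a COCO-style annotation dict (an assumption,
    not confirmed by the repository).
    """
    category_name = {c["id"]: c["name"] for c in coco["categories"]}

    # Group annotations by the image they belong to.
    anns_by_image = defaultdict(list)
    for ann in coco["annotations"]:
        anns_by_image[ann["image_id"]].append(ann)

    kept_images, kept_anns = [], []
    for img in coco["images"]:
        anns = anns_by_image.get(img["id"], [])
        categories = {a["category_id"] for a in anns}
        if len(categories) != 1:
            continue  # unannotated image or more than one category
        name = category_name[next(iter(categories))]
        limit = max_boxes if name == multi_box_category else 1
        if len(anns) <= limit:
            kept_images.append(img)
            kept_anns.extend(anns)

    return {
        "images": kept_images,
        "annotations": kept_anns,
        "categories": coco["categories"],
    }
```

The returned dictionary has the same layout as the input, so it can be dumped with json.dump and passed on to the captioning step.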
    

Training

  • Refer to this document for how to fine-tune DETR on a custom dataset. Use this script to get the pretrained model.

  • Train without data augmentation

    CUDA_VISIBLE_DEVICES=<YOUR_GPU_NUM> \
    python ./detr/main.py \
    --dataset_file your_dataset \
    --coco_path <PATH_TO_DATASET> \
    --epochs 350 \
    --lr=1e-4  \
    --batch_size=2 \
    --num_workers=4 \
    --output_dir=./outputs \
    --resume=<PATH_TO_CHECKPOINT>
    
  • Train with data augmentation (text grounding template2 only)

    bash ./run_text2.sh
    
  • Train with data augmentation (text and image grounding)

    bash ./run_text_image.sh
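
The script for obtaining the pretrained model is not reproduced here. A common way to prepare a COCO-pretrained DETR checkpoint for a dataset with a different number of classes is to drop the classification head so that it is re-initialized during fine-tuning. The sketch below illustrates that idea under stated assumptions: the helper names and the output filename are hypothetical, and the URL is the ResNet-50 checkpoint published with the official DETR release, which may not be the one this repository uses.

```python
def strip_class_head(checkpoint):
    """Return a copy of a DETR checkpoint without the classification head.

    Assumes the weights live under checkpoint["model"] and that the head's
    parameters are named "class_embed.weight" and "class_embed.bias".
    """
    model_state = {
        key: value
        for key, value in checkpoint["model"].items()
        if not key.startswith("class_embed.")
    }
    return {**checkpoint, "model": model_state}


def download_and_strip(output_path="detr-r50_no-class-head.pth"):
    """Download the pretrained weights and save them without the class head.

    `output_path` is a hypothetical filename chosen for this sketch.
    """
    import torch  # imported here so strip_class_head has no dependencies

    checkpoint = torch.hub.load_state_dict_from_url(
        url="https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth",
        map_location="cpu",
        check_hash=True,
    )
    torch.save(strip_class_head(checkpoint), output_path)
```

Calling download_and_strip() once would produce a file that can be passed to --resume, provided the training code tolerates the missing class-head keys when loading the checkpoint.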
    

Inference and Evaluation

To get output.json

CUDA_VISIBLE_DEVICES=<YOUR_GPU_NUM> \
python ./detr/infer_json.py \
--data_path <PATH_TO_DATASET> \
--resume <PATH_TO_CHECKPOINT> \
--output_dir <PATH_TO_OUTPUT_DIR>

To get visualization results

CUDA_VISIBLE_DEVICES=<YOUR_GPU_NUM> \
python ./detr/infer_visualize.py \
--data_path <PATH_TO_DATASET> \
--resume <PATH_TO_CHECKPOINT> \
--output_dir <PATH_TO_OUTPUT_DIR>
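
DETR predicts each box as normalized (center x, center y, width, height), so writing results to JSON or drawing them on an image typically involves a conversion to pixel coordinates. The helper below is a minimal sketch of that conversion; its name is hypothetical, and the corner format it returns is an assumption, since the exact layout of output.json is not documented here.

```python
def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to absolute pixel corners.

    Returns [x_min, y_min, x_max, y_max] for an image of size img_w x img_h.
    """
    cx, cy, w, h = box
    return [
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    ]


# A centered box covering half of a 200x100 image in each dimension.
print(cxcywh_to_xyxy((0.5, 0.5, 0.5, 0.5), img_w=200, img_h=100))
# prints [50.0, 25.0, 150.0, 75.0]
```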

To get mAP scores

python evaluate.py ./outputs/json/output.json ./hw1_dataset/annotations/val.json 

Utilities

To draw bounding boxes on images, use this script
