Fine-Tuning DETR on Custom Dataset

Environment

System Information
OS: Ubuntu 18.04
CPU: Intel Xeon Silver 4110 (32) @ 1.7GHz
GPU: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller
GPU: NVIDIA Tesla V100 PCIe 16GB
Memory: 20343MiB / 385656MiB
GPU Driver: NVIDIA 460.91.03

How to run my code

First, clone the repository locally:

```
git clone https://github.com/loijilai/Fine-Tuning-DETR.git
```

Then, install PyTorch and torchvision:

```
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
```

Install pycocotools (for evaluation on COCO) and scipy (for training):

```
conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
```
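Before moving on, you can confirm that the packages above import correctly and that PyTorch can see the GPU. This is a small hypothetical check script, not part of the repository. Its package list simply mirrors the install commands above.

```python
import importlib.util


def check_environment(packages=("torch", "torchvision", "scipy", "pycocotools")):
    """Report which packages are importable and, if PyTorch is present, whether CUDA is usable."""
    report = {name: importlib.util.find_spec(name) is not None for name in packages}
    if report.get("torch"):
        import torch  # imported lazily so the check still runs when PyTorch is missing

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:15s} {'OK' if ok else 'MISSING'}")
```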

Data Augmentation Pipeline

Use filter_trainset.py to filter the train set and produce a new file called for_blip2.py for later use. (Every filtered picture has only one category and one bounding box, except jellyfish pictures, which can have at most 6 bounding boxes.)
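The filtering rule in this step can be sketched as follows. This is a hypothetical reimplementation, not the code in filter_trainset.py. It assumes COCO-style train annotations and a category literally named `jellyfish`. It returns a plain list of kept images instead of writing for_blip2.py, because that file's format is not described here.

```python
from collections import defaultdict


def filter_images(coco, multi_box_category="jellyfish", max_multi_boxes=6):
    """Keep images that contain a single category and a single box.

    Images whose only category is `multi_box_category` may have up to
    `max_multi_boxes` boxes (assumed rule, mirroring the README text).
    """
    cat_names = {c["id"]: c["name"] for c in coco["categories"]}
    anns_by_image = defaultdict(list)
    for ann in coco["annotations"]:
        anns_by_image[ann["image_id"]].append(ann)

    kept = []
    for img in coco["images"]:
        anns = anns_by_image.get(img["id"], [])
        cats = {a["category_id"] for a in anns}
        if len(cats) != 1:
            continue  # skip images with no boxes or with mixed categories
        name = cat_names[next(iter(cats))]
        limit = max_multi_boxes if name == multi_box_category else 1
        if len(anns) <= limit:
            kept.append({"file_name": img["file_name"], "category": name,
                         "bboxes": [a["bbox"] for a in anns]})
    return kept
```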
Create a separate conda environment called blip2.
Run image captioning on all images listed in for_blip2.py by running blip2.py. The captions are written to an output file called for_gligen.py.
Create a separate conda environment called gligen.
Run image generation with three different strategies:
```
bash ./GLIGEN/run_gen.sh
```
After 7 categories * 20 images * 3 strategies = 420 images have been generated, add their annotations to the train set with add_train.py.
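One way to do the annotation step is to append new image and annotation records with fresh ids to the COCO train file. The sketch below shows that approach. It is a hypothetical helper, not add_train.py itself, and the layout of each `generated` entry is an assumption.

```python
import copy


def add_generated_images(coco, generated):
    """Return a copy of `coco` with generated images and their boxes appended.

    Each item in `generated` is assumed to look like
    {"file_name": str, "width": int, "height": int,
     "category_id": int, "bboxes": [[x, y, w, h], ...]}.
    """
    out = copy.deepcopy(coco)  # leave the original annotations untouched
    next_img_id = max((img["id"] for img in out["images"]), default=0) + 1
    next_ann_id = max((a["id"] for a in out["annotations"]), default=0) + 1
    for item in generated:
        out["images"].append({"id": next_img_id, "file_name": item["file_name"],
                              "width": item["width"], "height": item["height"]})
        for x, y, w, h in item["bboxes"]:
            out["annotations"].append({
                "id": next_ann_id, "image_id": next_img_id,
                "category_id": item["category_id"],
                "bbox": [x, y, w, h], "area": w * h, "iscrowd": 0,
            })
            next_ann_id += 1
        next_img_id += 1
    return out
```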
Augment the train set images with move_pictures.py.
Check the quality of the generated images with FID scores: manually select 140 real images, resize them with manually_select_140_photos.py, and run:
```
python -m pytorch_fid path/to/dataset1 path/to/dataset2
```
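For reference, FID is the Fréchet distance between two Gaussians fitted to Inception-v3 features of the two image sets, ||μ1 − μ2||² + Tr(Σ1 + Σ2 − 2(Σ1Σ2)^(1/2)). Lower values mean the two sets are more similar. The sketch below computes only that final formula from given means and covariances. Feature extraction is omitted, and `frechet_distance` is a hypothetical name rather than pytorch_fid's API.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1, sigma2 = np.asarray(sigma1, dtype=float), np.asarray(sigma2, dtype=float)
    diff = mu1 - mu2
    # Tr(sqrt(S1 S2)) is the sum of square roots of the eigenvalues of S1 S2,
    # which are real and non-negative for covariance matrices (up to round-off).
    eigvals = np.linalg.eigvals(sigma1 @ sigma2).real
    tr_covmean = np.sqrt(np.clip(eigvals, 0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2 * tr_covmean)
```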

Training

Refer to this document on how to fine-tune DETR on a custom dataset. Use this script to get the pretrained model.
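The pretrained-model script is not reproduced here. This sketch assumes a common approach for fine-tuning DETR on new classes. You take the public COCO-trained DETR-R50 checkpoint and delete its classification head (`class_embed.*`). A head sized for the custom classes is then trained from scratch, while the backbone, transformer, and box head are reused. The function names and output file name are hypothetical.

```python
def strip_class_head(checkpoint, head_prefix="class_embed."):
    """Delete classification-head tensors from a DETR checkpoint dict (in place)."""
    state = checkpoint["model"]
    for key in [k for k in state if k.startswith(head_prefix)]:
        del state[key]
    return checkpoint


def make_headless_checkpoint(url, out_path):
    """Download a DETR checkpoint, drop its class head, and save it to `out_path`."""
    import torch  # imported here so strip_class_head() works without PyTorch

    checkpoint = torch.hub.load_state_dict_from_url(url, map_location="cpu")
    torch.save(strip_class_head(checkpoint), out_path)
```

For example, `make_headless_checkpoint("https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth", "detr-r50_no-class-head.pth")` writes a checkpoint that could then be passed to `--resume`. Whether detr/main.py accepts a checkpoint with missing head keys (for instance by loading with `strict=False`) depends on this repository's copy of DETR.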

Train without data augmentation

```
CUDA_VISIBLE_DEVICES=<YOUR_GPU_NUM> \
python ./detr/main.py \
--dataset_file your_dataset \
--coco_path <PATH_TO_DATASET> \
--epochs 350 \
--lr=1e-4 \
--batch_size=2 \
--num_workers=4 \
--output_dir=./outputs \
--resume=<PATH_TO_CHECKPOINT>
```
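When fine-tuning on a custom dataset, the classification head must match the dataset's category ids. DETR's own source has a comment in `build()` in models/detr.py about this. It says `num_classes` corresponds to the maximum category id plus one, not the number of categories. For example, COCO's maximum id is 90, so DETR uses 91. Here is a hypothetical helper that derives this value from a COCO-style annotation dict:

```python
def detr_num_classes(coco):
    """DETR's `num_classes`: the maximum category id in the dataset plus one."""
    return max(c["id"] for c in coco["categories"]) + 1
```

How this repository sets `num_classes` for `--dataset_file your_dataset` is not shown here, so treat this as a value to check against detr/models/detr.py.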

Train with data augmentation (text grounding template2 only)
```
bash ./run_text2.sh
```
Train with data augmentation (text and image grounding)
```
bash ./run_text_image.sh
```

Inference and Evaluation

To get output.json

```
CUDA_VISIBLE_DEVICES=<YOUR_GPU_NUM> \
python ./detr/infer_json.py \
--data_path <PATH_TO_DATASET> \
--resume <PATH_TO_CHECKPOINT> \
--output_dir <PATH_TO_OUTPUT_DIR>
```
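DETR predicts a fixed set of queries. Each query has class logits over the dataset classes plus a trailing "no object" class. A common post-processing step, similar to the one in DETR's demo notebook, works in three stages:

* apply a softmax,
* drop the last column,
* keep the queries whose best score exceeds a threshold.

Below is a NumPy sketch of that step. The function name and the 0.7 default threshold are assumptions. What infer_json.py actually writes to output.json is not described here.

```python
import numpy as np


def keep_confident(pred_logits, threshold=0.7):
    """Filter DETR class logits of shape (num_queries, num_classes + 1).

    Returns (query indices, labels, scores) for queries whose best
    real-class probability exceeds `threshold`.
    """
    logits = np.asarray(pred_logits, dtype=float)
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))  # numerically stable softmax
    probs = (exp / exp.sum(axis=-1, keepdims=True))[:, :-1]  # drop the "no object" column
    scores = probs.max(axis=-1)
    labels = probs.argmax(axis=-1)
    keep = np.nonzero(scores > threshold)[0]
    return keep, labels[keep], scores[keep]
```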

To get visualization results

```
CUDA_VISIBLE_DEVICES=<YOUR_GPU_NUM> \
python ./detr/infer_visualize.py \
--data_path <PATH_TO_DATASET> \
--resume <PATH_TO_CHECKPOINT> \
--output_dir <PATH_TO_OUTPUT_DIR>
```
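DETR's box head outputs boxes as (center x, center y, width, height), normalized to [0, 1] relative to the image size. Drawing them therefore requires converting to absolute corner coordinates. Here is a minimal sketch of that conversion. The function name is hypothetical, and infer_visualize.py may do this differently.

```python
def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to absolute (x0, y0, x1, y1) pixels."""
    cx, cy, w, h = box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)
```

For example, `cxcywh_to_xyxy((0.5, 0.5, 0.5, 0.5), 200, 100)` gives `(50.0, 25.0, 150.0, 75.0)`. That is a box covering the middle half of a 200x100 image.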

To get mAP scores

```
python evaluate.py ./outputs/json/output.json ./hw1_dataset/annotations/val.json
```
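Detection mAP is built on IoU (intersection over union). A prediction counts as a true positive when it matches a not-yet-matched ground-truth box of the same class with IoU above a threshold. COCO-style mAP averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05. Which variant evaluate.py reports is not stated here. Here is a minimal IoU sketch for boxes in COCO (x, y, width, height) format. The function name is hypothetical.

```python
def iou_xywh(a, b):
    """Intersection over union of two boxes in COCO (x, y, width, height) format."""
    ax1, ay1 = a[0] + a[2], a[1] + a[3]  # right/bottom edges of box a
    bx1, by1 = b[0] + b[2], b[1] + b[3]  # right/bottom edges of box b
    inter_w = max(0.0, min(ax1, bx1) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay1, by1) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0
```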

Utilities

To draw bounding boxes on images, use visulization.py.
