- System Information
OS: Ubuntu 18.04
CPU: Intel Xeon Silver 4110 (32) @ 1.7GHz
GPU: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller
GPU: NVIDIA Tesla V100 PCIe 16GB
Memory: 20343MiB / 385656MiB
GPU Driver: NVIDIA 460.91.03
First, clone the repository locally:
git clone https://github.com/loijilai/Fine-Tuning-DETR.git
Then, install PyTorch and torchvision:
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
Install pycocotools (for evaluation on COCO) and scipy (for training):
conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
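After installation, a quick check that the packages resolve in the active conda environment can save a failed training run later. The following is a minimal sketch, not part of the repository; the helper name `missing_packages` and the `REQUIRED` tuple are hypothetical, and the check only confirms that each package can be found, not that CUDA works.

```python
import importlib.util

# Import names of the packages installed in the steps above.
REQUIRED = ("torch", "torchvision", "scipy", "pycocotools")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result means all four packages are importable from the current environment.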
- Use this file to filter the train set and produce a new file called for_blip2.py, to be used later. (Every filtered picture has only one category and one bounding box, except jellyfish pictures, which can have at most 6 bounding boxes.)
- Create a separate conda environment called blip2
- Run image captioning on all images in for_blip2.py by running blip2.py; the image captions are added to an output file called for_gligen.py
- Create a separate conda environment called gligen
- Run image generation with three different strategies
bash ./GLIGEN/run_gen.sh
- After generating 7 categories * 20 images * 3 strategies = 420 images, augment the train set annotations with add_train.py
- Augment the train set with move_pictures.py
- Check the generated image quality using FID scores: manually select 140 real images, resize them using this script, and run
python -m pytorch_fid path/to/dataset1 path/to/dataset2
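The filtering rule from the first step above (one category and one bounding box per picture, with jellyfish allowed up to 6 boxes) can be sketched as follows. This is an illustration under stated assumptions, not the repository's filter script: it assumes COCO-style annotations with `images`, `annotations`, and `categories` keys, assumes the category is literally named `jellyfish`, and the function name `filter_single_category` is hypothetical.

```python
from collections import defaultdict


def filter_single_category(coco, multi_box_name="jellyfish", max_boxes=6):
    """Keep images with exactly one category and one box.

    Images of the `multi_box_name` category (an assumed name) may have
    up to `max_boxes` boxes instead of one.
    """
    name_by_id = {cat["id"]: cat["name"] for cat in coco["categories"]}

    # Group annotations by the image they belong to.
    anns_by_image = defaultdict(list)
    for ann in coco["annotations"]:
        anns_by_image[ann["image_id"]].append(ann)

    keep_ids = set()
    for image_id, anns in anns_by_image.items():
        category_ids = {ann["category_id"] for ann in anns}
        if len(category_ids) != 1:
            continue  # more than one category in this image
        name = name_by_id[next(iter(category_ids))]
        limit = max_boxes if name == multi_box_name else 1
        if len(anns) <= limit:
            keep_ids.add(image_id)

    return {
        "images": [im for im in coco["images"] if im["id"] in keep_ids],
        "annotations": [a for a in coco["annotations"] if a["image_id"] in keep_ids],
        "categories": coco["categories"],
    }
```

Images without any annotation are dropped as well, since they never enter the grouping step.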
- Refer to this document on how to fine-tune DETR on a custom dataset. Use this script to get the pretrained model.
- Train without data augmentation
CUDA_VISIBLE_DEVICES=<YOUR_GPU_NUM> \
python ./detr/main.py \
--dataset_file your_dataset \
--coco_path <PATH_TO_DATASET> \
--epochs 350 \
--lr=1e-4 \
--batch_size=2 \
--num_workers=4 \
--output_dir=./outputs \
--resume=<PATH_TO_CHECKPOINT>
- Train with data augmentation (text grounding template2 only)
bash ./run_text2.sh
- Train with data augmentation (text and image grounding)
bash ./run_text_image.sh
To get output.json
CUDA_VISIBLE_DEVICES=<YOUR_GPU_NUM> \
python ./detr/infer_json.py \
--data_path <PATH_TO_DATASET> \
--resume <PATH_TO_CHECKPOINT> \
--output_dir <PATH_TO_OUTPUT_DIR>
To get visualization result
CUDA_VISIBLE_DEVICES=<YOUR_GPU_NUM> \
python ./detr/infer_visualize.py \
--data_path <PATH_TO_DATASET> \
--resume <PATH_TO_CHECKPOINT> \
--output_dir <PATH_TO_OUTPUT_DIR>
To get mAP scores
python evaluate.py ./outputs/json/output.json ./hw1_dataset/annotations/val.json
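COCO-style mAP decides whether a predicted box counts as a match by its intersection over union (IoU) with a ground-truth box. The sketch below shows only that building block for boxes in COCO `[x, y, width, height]` format. It is not taken from evaluate.py, whose internals are not shown here, and the function name `iou_xywh` is hypothetical.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

A prediction is usually treated as correct at a given threshold (for example 0.5) when its IoU with a ground-truth box of the same category reaches that threshold.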
To draw bounding boxes on images, use this script