This repository has been archived by the owner on Mar 12, 2024. It is now read-only.

Very low performance for segmentation task. #613

Open
lianzheng-research opened this issue Nov 13, 2023 · 0 comments


Thanks for your excellent work!

I pretrained DETR on the COCO dataset with the following command:

CUDA_VISIBLE_DEVICES=4,5,6,7 python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path ../datasets/coco/ --batch_size 6 --world_size 4 --output_dir outputs/detr-coco-20231028/box_model/

Now I'm trying to fine-tune it on the segmentation task with this command:

CUDA_VISIBLE_DEVICES=4,5,6,7 python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --masks --epochs 25 --lr_drop 15 --coco_path ../datasets/coco/ --coco_panoptic_path ../datasets/coco_panoptic/ --dataset_file coco_panoptic --frozen_weights outputs/detr-coco-20231028/box_model/checkpoint.pth --output_dir outputs/detr-coco-20231028/segm/ --batch_size 8 --world_size 4
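As a rough cross-check of this command against the training log, the number of iterations per epoch can be estimated from the per-GPU batch size and the number of processes. This is a minimal sketch, not code from the DETR repository. It assumes that the training split has 118,287 images, that the distributed sampler pads each process to an equal share, and that an incomplete final batch is dropped. The function name `iterations_per_epoch` is hypothetical.

```python
import math


def iterations_per_epoch(num_images: int, batch_size_per_gpu: int, world_size: int) -> int:
    """Estimate optimizer steps per epoch for data-parallel training."""
    # Assumption: each process receives ceil(num_images / world_size) samples.
    samples_per_process = math.ceil(num_images / world_size)
    # Assumption: the last incomplete batch on each process is dropped.
    return samples_per_process // batch_size_per_gpu


NUM_TRAIN_IMAGES = 118_287  # assumed size of the COCO train2017 split

# Values taken from the fine-tuning command: --batch_size 8 on 4 processes.
print(iterations_per_epoch(NUM_TRAIN_IMAGES, batch_size_per_gpu=8, world_size=4))  # 3696
print(8 * 4)  # effective batch size per step: 32
```

Under these assumptions the estimate matches the `3696` iterations per epoch shown in the log, which suggests that the full training split is being read with an effective batch size of 32.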

However, the total loss does not decrease: it stays at around 45 from the first iteration to the last. The class error behaves the same way and stays above 75.

Epoch: [0]  [   0/3696]  eta: 6:24:44  lr: 0.000100  class_error: 75.10  loss: 45.1652 (45.1652)  loss_bbox: 1.1062 (1.1062)  loss_bbox_0: 1.0980 (1.0980)  loss_bbox_1: 1.0900 (1.0900)  loss_bbox_2: 1.1241 (1.1241)  loss_bbox_3: 1.1097 (1.1097)  loss_bbox_4: 1.0968 (1.0968)  loss_ce: 5.2375 (5.2375)  loss_ce_0: 5.1831 (5.1831)  loss_ce_1: 5.2124 (5.2124)  loss_ce_2: 5.2343 (5.2343)  loss_ce_3: 5.2243 (5.2243)  loss_ce_4: 5.2440 (5.2440)  loss_dice: 0.9501 (0.9501)  loss_giou: 1.0081 (1.0081)  loss_giou_0: 1.0500 (1.0500)  loss_giou_1: 1.0273 (1.0273)  loss_giou_2: 1.0172 (1.0172)  loss_giou_3: 1.0156 (1.0156)  loss_giou_4: 1.0123 (1.0123)  loss_mask: 0.1242 (0.1242)  cardinality_error_unscaled: 88.5625 (88.5625)  cardinality_error_0_unscaled: 88.5625 (88.5625)  cardinality_error_1_unscaled: 88.5625 (88.5625)  cardinality_error_2_unscaled: 88.5625 (88.5625)  cardinality_error_3_unscaled: 88.5625 (88.5625)  cardinality_error_4_unscaled: 88.5625 (88.5625)  class_error_unscaled: 75.0977 (75.0977)  loss_bbox_unscaled: 0.2212 (0.2212)  loss_bbox_0_unscaled: 0.2196 (0.2196)  loss_bbox_1_unscaled: 0.2180 (0.2180)  loss_bbox_2_unscaled: 0.2248 (0.2248)  loss_bbox_3_unscaled: 0.2219 (0.2219)  loss_bbox_4_unscaled: 0.2194 (0.2194)  loss_ce_unscaled: 5.2375 (5.2375)  loss_ce_0_unscaled: 5.1831 (5.1831)  loss_ce_1_unscaled: 5.2124 (5.2124)  loss_ce_2_unscaled: 5.2343 (5.2343)  loss_ce_3_unscaled: 5.2243 (5.2243)  loss_ce_4_unscaled: 5.2440 (5.2440)  loss_dice_unscaled: 0.9501 (0.9501)  loss_giou_unscaled: 0.5041 (0.5041)  loss_giou_0_unscaled: 0.5250 (0.5250)  loss_giou_1_unscaled: 0.5137 (0.5137)  loss_giou_2_unscaled: 0.5086 (0.5086)  loss_giou_3_unscaled: 0.5078 (0.5078)  loss_giou_4_unscaled: 0.5062 (0.5062)  loss_mask_unscaled: 0.1242 (0.1242)  time: 6.2459  data: 2.2261  max mem: 11054
...
Epoch: [24]  [3695/3696]  eta: 0:00:04  lr: 0.000010  class_error: 75.27  loss: 45.3449 (44.8820)  loss_bbox: 1.1708 (1.1440)  loss_bbox_0: 1.1676 (1.1385)  loss_bbox_1: 1.1531 (1.1288)  loss_bbox_2: 1.1777 (1.1487)  loss_bbox_3: 1.1720 (1.1482)  loss_bbox_4: 1.1655 (1.1405)  loss_ce: 5.3088 (5.3016)  loss_ce_0: 5.2577 (5.2610)  loss_ce_1: 5.3025 (5.2981)  loss_ce_2: 5.3195 (5.3152)  loss_ce_3: 5.2854 (5.2848)  loss_ce_4: 5.3072 (5.2983)  loss_dice: 0.4689 (0.4755)  loss_giou: 0.9462 (0.9466)  loss_giou_0: 0.9980 (0.9972)  loss_giou_1: 0.9560 (0.9649)  loss_giou_2: 0.9686 (0.9571)  loss_giou_3: 0.9512 (0.9525)  loss_giou_4: 0.9495 (0.9480)  loss_mask: 0.0307 (0.0324)  cardinality_error_unscaled: 89.6562 (89.5641)  cardinality_error_0_unscaled: 89.6562 (89.5627)  cardinality_error_1_unscaled: 89.6562 (89.5637)  cardinality_error_2_unscaled: 89.6562 (89.5635)  cardinality_error_3_unscaled: 89.6562 (89.5636)  cardinality_error_4_unscaled: 89.6562 (89.5639)  class_error_unscaled: 79.0230 (77.3168)  loss_bbox_unscaled: 0.2342 (0.2288)  loss_bbox_0_unscaled: 0.2335 (0.2277)  loss_bbox_1_unscaled: 0.2306 (0.2258)  loss_bbox_2_unscaled: 0.2355 (0.2297)  loss_bbox_3_unscaled: 0.2344 (0.2296)  loss_bbox_4_unscaled: 0.2331 (0.2281)  loss_ce_unscaled: 5.3088 (5.3016)  loss_ce_0_unscaled: 5.2577 (5.2610)  loss_ce_1_unscaled: 5.3025 (5.2981)  loss_ce_2_unscaled: 5.3195 (5.3152)  loss_ce_3_unscaled: 5.2854 (5.2848)  loss_ce_4_unscaled: 5.3072 (5.2983)  loss_dice_unscaled: 0.4689 (0.4755)  loss_giou_unscaled: 0.4731 (0.4733)  loss_giou_0_unscaled: 0.4990 (0.4986)  loss_giou_1_unscaled: 0.4780 (0.4824)  loss_giou_2_unscaled: 0.4843 (0.4786)  loss_giou_3_unscaled: 0.4756 (0.4763)  loss_giou_4_unscaled: 0.4748 (0.4740)  loss_mask_unscaled: 0.0307 (0.0324)  time: 4.1762  data: 2.0446  max mem: 24500
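A quick sanity check on the two log lines above is to regroup the weighted loss terms by family and see which ones contribute to the total and which ones moved. The sketch below copies the values from the `Epoch: [0]` and `Epoch: [24]` lines. The split into "detection" terms (`loss_ce*`, `loss_bbox*`, `loss_giou*`) and "mask" terms (`loss_dice`, `loss_mask`) and the helper name `family_totals` are assumptions made for illustration.

```python
# Weighted loss terms copied from the log, ordered as: final layer, then _0 ... _4.
first_step = {
    "loss_ce": [5.2375, 5.1831, 5.2124, 5.2343, 5.2243, 5.2440],
    "loss_bbox": [1.1062, 1.0980, 1.0900, 1.1241, 1.1097, 1.0968],
    "loss_giou": [1.0081, 1.0500, 1.0273, 1.0172, 1.0156, 1.0123],
    "loss_dice": [0.9501],
    "loss_mask": [0.1242],
}
# The last line reports smoothed values, so its terms need not sum exactly
# to the reported total of 45.3449.
last_step = {
    "loss_ce": [5.3088, 5.2577, 5.3025, 5.3195, 5.2854, 5.3072],
    "loss_bbox": [1.1708, 1.1676, 1.1531, 1.1777, 1.1720, 1.1655],
    "loss_giou": [0.9462, 0.9980, 0.9560, 0.9686, 0.9512, 0.9495],
    "loss_dice": [0.4689],
    "loss_mask": [0.0307],
}

MASK_TERMS = ("loss_dice", "loss_mask")  # assumed grouping


def family_totals(step):
    """Sum the weighted terms into a detection total and a mask total."""
    mask = sum(sum(step[name]) for name in MASK_TERMS)
    detection = sum(sum(vals) for name, vals in step.items() if name not in MASK_TERMS)
    return detection, mask


for label, step in (("first step", first_step), ("last step", last_step)):
    detection, mask = family_totals(step)
    total = detection + mask
    print(f"{label}: detection={detection:.4f} mask={mask:.4f} "
          f"total={total:.4f} detection share={detection / total:.1%}")
```

At the first step the terms add up to the reported total of 45.1652, and the detection terms make up almost all of it. Between the two lines only `loss_dice` and `loss_mask` drop noticeably (0.9501 to 0.4689 and 0.1242 to 0.0307), while the classification and box terms stay where they started. That pattern would be consistent with only the mask head being updated when `--frozen_weights` is set, but this interpretation is an assumption rather than something the log proves.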

After fine-tuning on the COCO panoptic dataset for 25 epochs, I got the following scores:

Accumulating evaluation results...
DONE (t=16.20s).
Accumulating evaluation results...
DONE (t=28.51s).
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.026
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.051
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.024
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.005
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.031
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.069
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.051
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.083
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.099
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.023
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.101
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.190
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.039
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.064
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.039
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.010
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.046
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.090
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.065
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.108
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.130
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.040
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.127
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.240

...

          |    PQ     SQ     RQ     N
--------------------------------------
All       |   0.0    0.0    0.0   133
Things    |   0.0    0.0    0.0    80
Stuff     |   0.0    0.0    0.0    53
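For reference, the columns in this table follow the standard panoptic quality definition: a predicted segment counts as a true positive only if it overlaps a ground-truth segment of the same class with IoU above 0.5, SQ is the mean IoU over true positives, RQ is an F1-style ratio, and PQ = SQ × RQ. The sketch below is a minimal illustration of that formula and not the evaluation code used to produce the table. The function name `panoptic_quality` and its inputs are hypothetical.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Return (PQ, SQ, RQ) as fractions in [0, 1].

    matched_ious: IoU of every matched prediction/ground-truth pair
                  (each is assumed to already exceed the 0.5 threshold).
    """
    num_true_positives = len(matched_ious)
    denominator = (num_true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if num_true_positives == 0 or denominator == 0:
        # No matched segment: all three scores are zero.
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / num_true_positives  # segmentation quality
    rq = num_true_positives / denominator        # recognition quality
    return sq * rq, sq, rq


# With no matched segments, every score is zero regardless of the other counts.
print(panoptic_quality([], num_false_positives=10, num_false_negatives=10))

# Two matches plus one false positive and one false negative.
print(panoptic_quality([0.8, 0.6], num_false_positives=1, num_false_negatives=1))
```

Under this definition, a row of all zeros means that no predicted segment was matched to a ground-truth segment at all, which is a stronger failure than merely low mask quality.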

Could you help me understand why the scores are so low? Thank you very much!

@lianzheng-research lianzheng-research changed the title Please read & provide the following Very low performance for segmentation task. Nov 13, 2023