fix problem when training yolov3 with no-valid-bbox img #1555

Open
wants to merge 1 commit into
base: master
from

Conversation

Aktcob
Contributor

@Aktcob Aktcob commented Dec 2, 2020

With this change, YOLOv3 can be trained on negative images (images with no valid bounding box).

@mli
Member

mli commented Dec 2, 2020

Job PR-1555-1 is done.
Docs are uploaded to http://gluon-vision-staging.s3-website-us-west-2.amazonaws.com/PR-1555/1/index.html
Code coverage of this PR: pr.svg vs. Master: master.svg

@zhreshold
Member

@Aktcob Do you have real training results on a dataset that mixes in images with no ground truth? Does the change cause NaNs?

@Aktcob
Contributor Author

Aktcob commented Dec 3, 2020

@zhreshold Hi, I have not seen any NaNs so far when training on my custom datasets.

BTW, I have been training this way for a long time. I think many people run into this problem when training YOLO, so I created this PR.

@mli
Member

mli commented Dec 3, 2020

Job PR-1555-2 is done.
Docs are uploaded to http://gluon-vision-staging.s3-website-us-west-2.amazonaws.com/PR-1555/2/index.html
Code coverage of this PR: pr.svg vs. Master: master.svg

@chinakook
Member

chinakook commented Dec 24, 2020

I also have a solution for training Faster R-CNN with images that have no valid bbox. I found that the data transforms may change the values of the label (the box), so you need to restore the -1-valued box after the transforms.
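
A minimal sketch of restoring the placeholder after the transforms, assuming each label is a NumPy array of shape (N, 5) laid out as [xmin, ymin, xmax, ymax, class_id], with class_id == -1 marking the placeholder row. The helper name `restore_fake_boxes` is hypothetical and is not part of GluonCV.

```python
import numpy as np


def restore_fake_boxes(label):
    """Return a copy of `label` with every placeholder row reset to all -1.

    A row counts as a placeholder when its class id (column 4) is negative.
    This layout is an assumption made for illustration.
    """
    label = np.array(label, dtype=np.float32, copy=True)
    fake = label[:, 4] < 0      # rows whose class id is still -1
    label[fake] = -1.0          # put the coordinates back to -1 as well
    return label


# A placeholder whose coordinates were shifted by a transform, plus one real box.
transformed = np.array([[200, 200, 200, 200, -1],
                        [30, 40, 120, 160, 2]], dtype=np.float32)
restored = restore_fake_boxes(transformed)
print(restored)
```

Working on a copy keeps the transformed label intact, which may be convenient when the same array is reused elsewhere in a data pipeline.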

@Aktcob
Contributor Author

Aktcob commented Dec 25, 2020

@chinakook Yep. The data transforms may change the gt bbox [-1,-1,-1,-1,-1] to [xx, xx, xx, xx, -1]. However, the class is still -1, so during label assignment this fake bbox should be removed based on its class of -1.

BTW, the data transforms cannot change the size of this bbox, so all four xx values stay the same. For example, [-1,-1,-1,-1,-1] becomes [200,200,200,200,-1]. This fake bbox will not be assigned to any anchor because its IoU with every anchor is zero.

Refer to:
https://github.com/dmlc/gluon-cv/pull/1555/files#diff-6d664a77b7ba0b38ea36b82ebc52b4a3ff73a08ff485d45582b6b87b2e9019d7R106

if np_gt_ids[b, m, 0] < 0:  # ignore fake bbox
    break
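
For context, here is a sketch of how such a check behaves inside a label-assignment loop, assuming class ids are padded with -1 after the real objects (which is what the `break` in the quoted snippet implies). The function `count_valid_boxes` and the array shapes are illustrative assumptions, not the code from the PR.

```python
import numpy as np


def count_valid_boxes(np_gt_ids):
    """Count the real boxes per image, stopping at the first negative class id.

    np_gt_ids is assumed to have shape (B, M, 1), padded with -1 at the end.
    """
    batch, max_boxes = np_gt_ids.shape[:2]
    counts = []
    for b in range(batch):
        n = 0
        for m in range(max_boxes):
            if np_gt_ids[b, m, 0] < 0:  # ignore fake bbox
                break
            n += 1  # a real target assignment would happen here
        counts.append(n)
    return counts


# First image: two real objects and one pad row. Second image: no objects.
gt_ids = np.array([[[3], [0], [-1]],
                   [[-1], [-1], [-1]]], dtype=np.float32)
print(count_valid_boxes(gt_ids))  # [2, 0]
```

Note that `break` (rather than `continue`) is only safe if no real box follows a placeholder row in the same image.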

@chinakook
Member

chinakook commented Dec 25, 2020

@Aktcob In Faster R-CNN, the sampler would treat [-1,-1,-1,-1,-1] as an ignored box, but treat [200,200,200,200,-1] as a negative box, which will be trained (with only the softmax cross-entropy loss, and without box regression) together with the positive boxes, because the trailing -1 denotes that the class is background (negative).
I don't know how YOLO handles this, but I think we should ignore this box, because we only need the other negative boxes in the image. The [-1,-1,-1,-1,-1] sample should be there so that the model runs normally, but it should not be trained on.

@Aktcob
Contributor Author

Aktcob commented Dec 25, 2020

@chinakook But [200,200,200,200] is a special bbox: a point with zero width and zero height.
No anchor can match it.

An invalid box should have an area of 0.
https://github.com/dmlc/gluon-cv/blob/master/gluoncv/model_zoo/rcnn/faster_rcnn/rcnn_target.py#L50
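
The zero-IoU argument can be checked with a small standalone sketch. It assumes corner-format boxes [xmin, ymin, xmax, ymax] and the IoU convention without a "+1 pixel" offset. The `iou` function is written here for illustration and is not GluonCV's implementation.

```python
def iou(box_a, box_b):
    """IoU of two [xmin, ymin, xmax, ymax] boxes; 0.0 when the union is empty."""
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


point_box = [200, 200, 200, 200]   # zero width, zero height
anchor = [150, 150, 250, 250]      # an anchor that contains the point
print(iou(point_box, anchor))      # 0.0
```

Even an anchor that contains the point yields an IoU of zero, because the intersection area is zero. Under a "+1 pixel" convention the point would have an area of 1, so this argument depends on the convention used.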

@chinakook
Member

But I found no code that checks the area. The line below would mark [200,200,200,200,-1] as negative, not as ignored.

gt_score = F.sign(F.sum(gt_box, axis=-1, keepdims=True) + 1)
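
To see what that expression evaluates to, here is a NumPy stand-in for the quoted line, with `np.sign` and `np.sum` in place of `F.sign` and `F.sum`. It assumes `gt_box` holds only the four coordinates per box, with shape (B, M, 4). The sample values are made up for illustration.

```python
import numpy as np

# One image with three boxes: the untouched placeholder, the placeholder
# after a transform moved it, and an ordinary ground-truth box.
gt_box = np.array([[[-1, -1, -1, -1],
                    [200, 200, 200, 200],
                    [30, 40, 120, 160]]], dtype=np.float32)

# Same arithmetic as the quoted line: sign(sum of coordinates + 1).
gt_score = np.sign(np.sum(gt_box, axis=-1, keepdims=True) + 1)
print(gt_score.reshape(-1).tolist())  # [-1.0, 1.0, 1.0]
```

Only the all -1 row gets a score of -1. The transformed placeholder gets the same score as a real box, which is consistent with the point that it is no longer ignored.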

@Aktcob
Contributor Author

Aktcob commented Dec 25, 2020

@chinakook Yes, you are right. The target generator of Faster R-CNN needs some changes. But for now, training YOLOv3 works fine.

@chinakook
Member

> @chinakook Yes, you are right. The target generator of Faster R-CNN needs some changes. But for now, training YOLOv3 works fine.

Did it solve the NaN problem?

@Aktcob
Contributor Author

Aktcob commented Dec 25, 2020

> > @chinakook Yes, you are right. The target generator of Faster R-CNN needs some changes. But for now, training YOLOv3 works fine.
>
> Did it solve the NaN problem?

When training YOLOv3, there is no NaN problem.

@zhreshold
Member

@Aktcob Since the VOC detection dataset is modified, all object detectors will be affected. I will hold this until training of the other detectors is verified.
