
FPN ROI Choosing #5

Open · Max-Fu opened this issue Jul 15, 2017 · 29 comments

Max-Fu commented Jul 15, 2017

Hi there! As I was reading Feature Pyramid Networks for Object Detection, I found a section with a formula for choosing the feature map for an ROI based on the size of the region proposal. Can you show me how you implemented this? I would like to implement FPN on the new Object Detection API provided by TensorFlow.

Max-Fu (Author) commented Jul 15, 2017

k = ⌊k0 + log2(√(wh)/224)⌋, where k0 = 4, k is the pyramid level whose feature map the ROI is pooled from, and w and h are the width and height of the region proposal.
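
For a quick sanity check of the formula, here is a minimal sketch (`roi_level` is just an illustrative name; the clamp to P2~P5 follows the repo snippet quoted in the next comment):

```python
import numpy as np

def roi_level(w, h, k0=4):
    """Pyramid level for a w-by-h proposal, clamped to P2~P5."""
    return min(5, max(2, int(np.floor(k0 + np.log2(np.sqrt(w * h) / 224)))))

print(roi_level(224, 224))  # 4: a 224x224 (ImageNet-sized) box maps to P4
print(roi_level(112, 112))  # 3: half the canonical size, one level down
print(roi_level(896, 896))  # 5: would be k=6, clamped to the top level P5
```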

xmyqsh (Owner) commented Jul 20, 2017

In proposal_target_layer.py, around line 101:

```python
import numpy as np

def calc_level(width, height):
    # k = k0 + log2(sqrt(w * h) / 224) with k0 = 4, clamped to the P2~P5 range
    return min(5, max(2, int(4 + np.log2(np.sqrt(width * height) / 224))))

# roi layout: [batch_idx, x0, y0, x1, y1]
level = lambda roi: calc_level(roi[3] - roi[1], roi[4] - roi[2])

leveled_rois = [[], [], [], []]
leveled_rois[0] = [roi for roi in rois if level(roi) == 2]
leveled_rois[1] = [roi for roi in rois if level(roi) == 3]
leveled_rois[2] = [roi for roi in rois if level(roi) == 4]
leveled_rois[3] = [roi for roi in rois if level(roi) == 5]
```

This logic can be implemented either in proposal_target_layer or in roi_pooling_layer.
Implemented in proposal_target_layer, it needs to call four roi_pooling_layers, but may benefit from CPU and GPU parallelism.
Implemented in roi_pooling_layer, it needs just one roi_pooling_layer and benefits more from GPU acceleration (a rough sketch of this variant follows below).

Do you think the latter is the better choice?
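
For what it's worth, a rough sketch of the single-roi_pooling_layer variant (`pool_one_level` is a placeholder, not a function from this repo): group the ROIs by level, pool each group from its own feature map, then scatter the results back into the original ROI order so they still line up with the labels:

```python
import numpy as np

def calc_level(width, height):
    return min(5, max(2, int(4 + np.log2(np.sqrt(width * height) / 224))))

def multilevel_roi_pool(rois, pool_one_level):
    # rois: (N, 5) array of [batch_idx, x0, y0, x1, y1]
    # pool_one_level(k, rois_k): placeholder that pools rois_k from P{k}
    levels = np.array([calc_level(r[3] - r[1], r[4] - r[2]) for r in rois])
    pooled = [None] * len(rois)
    for k in range(2, 6):
        idxs = np.where(levels == k)[0]
        if idxs.size == 0:
            continue
        feats = pool_one_level(k, rois[idxs])
        for i, j in enumerate(idxs):
            pooled[j] = feats[i]  # restore the original ROI order
    return pooled
```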

Max-Fu (Author) commented Jul 20, 2017

Thank you for answering this question! I just finished my implementation of FPN on top of the new TensorFlow Object Detection API. I implemented this algorithm in the ROI pooling layer.

Max-Fu (Author) commented Jul 20, 2017

Implementing the ROI level selection in the ROI pooling layer is definitely the better choice. I was just confused about where to add this formula.

xmyqsh (Owner) commented Jul 24, 2017

Hey man,
How is your training result?
The rpn_loss in my training run is many times larger than the Fast R-CNN loss.
Do you think I should also add k = k0 + log2(√(wh)/224) into the anchor target layer?
Does the paper mention this? I think it would be a reasonable improvement.
(Set w and h to the width and height of the ground-truth bbox in this layer.)
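
If anyone wants to try that, a minimal sketch of the idea (this is the suggestion above, not something the paper specifies; the ground-truth box layout [x0, y0, x1, y1] is an assumption):

```python
import numpy as np

def gt_level(box, k0=4):
    # Apply the same level formula to a ground-truth box [x0, y0, x1, y1]
    w, h = box[2] - box[0], box[3] - box[1]
    return min(5, max(2, int(k0 + np.log2(np.sqrt(w * h) / 224))))
```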

Max-Fu (Author) commented Jul 25, 2017

The training result was not as good as the one reported in the paper (I only trained for 2 days). The RPN loss was also many times larger than the Fast R-CNN loss (I don't know much about Fast R-CNN though). You can definitely try your method and see if it is correct.

Zehaos commented Jul 26, 2017

Hi, @xmyqsh @Max-Fu
The authors state that they use 4-step training rather than end-to-end training (please refer to Section 5.2.2, Sharing Features).
I implemented FPN in MXNet and tried alternating training. The RPN result is good (8 points higher than the ResNet-50-C4 baseline), but the Fast R-CNN result is quite bad.

xmyqsh (Owner) commented Jul 28, 2017

@Zehaos
Good!
I will try it.
But how do you evaluate the RPN result, AP or AR?
Do you know where the AR in Table 1 is defined? Is it average recall?

Zehaos commented Jul 28, 2017

@xmyqsh
I used average recall for the evaluation (on the VOC dataset). Is Table 1 the eval result from the COCO tools? I'm not sure.

Johere commented Jul 31, 2017

@xmyqsh Hi, you mentioned that the logic of choosing the feature map for an ROI can be implemented either in proposal_target_layer or in roi_pooling_layer. I implemented this algorithm in the ROI pooling layer but got a bad result. However, I find that proposal_target_layer is not used in the 'TEST' stage, while roi_pooling_layer is used in both the 'TRAIN' and 'TEST' stages. So the implementations for these two situations should be different? Is there anything wrong with my understanding?

xmyqsh (Owner) commented Jul 31, 2017

@Johere
You are right.
Among the three layers of the RPN, only the proposal layer is used in the 'TEST' phase.
The anchor target layer generates the deltas of the anchors for RPN training. The proposal target layer generates the deltas of the proposal regions, as well as the proposal regions (ROIs) themselves, for Fast R-CNN training. The proposal layer generates the proposal regions (ROIs). ROI pooling crops the ROIs from the feature map, then pools them into unified 7x7 features.

I implemented the logic of choosing the feature map for each ROI (k = k0 + log2(√(wh)/224)), which might better be called P2~P5-aware ROI selection, in the proposal layer. My proposal layer in the 'TEST' phase outputs P2~P5-aware ROIs, which differs from its output in the 'TRAIN' phase.

Johere commented Jul 31, 2017

@xmyqsh
Thank you very much!
May I ask about your training results? I modified roi_pooling_layer to choose the feature map (P2/P3/P4/P5) before the ROI pooling operation, and the rest of the layer's code remains the same, but the result was bad... How about your implementation?

xmyqsh (Owner) commented Jul 31, 2017

@Johere

I implement the feature map (P2/P3/P4/P5) choosing operation in the proposal layer in the 'TEST' phase, and in the proposal target layer in the 'TRAIN' phase.

If you implement this in roi_pooling_layer, be aware that you must record the mapping between the feature maps (P2/P3/P4/P5) and the ROIs, because the mapping has to be used again in the backward pass (see the sketch after this comment).

If your RPN performance is not as good as the paper reports, and you use only one image per forward/backward pass instead of two as the paper does, I think you should use a lower learning rate than the paper's: there are not enough effective ROIs in the regression loss, so its gradient may be unstable, and a lower learning rate should be the better choice.

I'm still optimizing and testing my RPN performance. My previous end-to-end training result was bad.
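
A sketch of that bookkeeping (a minimal illustration, not this repo's actual layer; `LeveledRoiMapping` is a made-up name): record the ROI-to-level assignment in the forward pass and reuse it in the backward pass, so each pooled feature's gradient is routed back to the feature map it was cropped from.

```python
import numpy as np

class LeveledRoiMapping(object):
    """Remembers which ROI went to which pyramid level."""

    def forward(self, levels):
        # levels: per-ROI pyramid level, e.g. from calc_level()
        levels = np.asarray(levels)
        self.num_rois = len(levels)
        self.idxs = {k: np.where(levels == k)[0] for k in range(2, 6)}
        return self.idxs

    def backward(self, grads_per_level):
        # grads_per_level[k][i]: gradient for the i-th ROI pooled at level k;
        # scatter back so grads[j] matches the original ROI order.
        grads = [None] * self.num_rois
        for k, idxs in self.idxs.items():
            for i, j in enumerate(idxs):
                grads[j] = grads_per_level[k][i]
        return grads
```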

Johere commented Jul 31, 2017

@xmyqsh
OK. Thank you for answering me!

xmyqsh (Owner) commented Aug 2, 2017

@Zehaos
P6 should be included in the RPN head, but I ran into a numerical problem (NaN) during training when I added it.
Have you encountered a similar problem?

Zehaos commented Aug 2, 2017

@xmyqsh
No. I use max pooling to downsample P5 and allow border anchors during training; the training is smooth.

xmyqsh (Owner) commented Aug 2, 2017

@Zehaos
Same here.
What is your max pooling kernel size, 3x3 or 1x1?
And is your learning rate 0.02, as the paper says?

Zehaos commented Aug 2, 2017

@xmyqsh
Kernel size = 2 ... stride = 2.
I used lr = 0.002 due to a smaller batch size (1 img/GPU * 4 GPUs).
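
In TensorFlow 1.x terms, the P6 construction settled on here (a 2x2, stride-2 max pool of P5) would look roughly like this; `p5` and its shape are assumptions for illustration:

```python
import tensorflow as tf

# p5: NHWC feature map (shape assumed for illustration)
p5 = tf.placeholder(tf.float32, [None, None, None, 256])
# P6 = 2x2 max pool of P5 with stride 2, fed only to the RPN head
p6 = tf.nn.max_pool(p5, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
```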

xmyqsh (Owner) commented Aug 2, 2017

@Zehaos
After using kernel size = 2, the NaN disappeared...
Thank you!

Zehaos commented Aug 2, 2017

@xmyqsh You are welcome.

xmyqsh (Owner) commented Aug 7, 2017

@Zehaos
What image_batch_size do you use in the Fast R-CNN stage of alternating training?
Should a larger image_batch_size help training?

Zehaos commented Aug 7, 2017

@xmyqsh
I used image_batch_size = 2, roi_batch_size = 256. A larger image_batch_size should help because of lower ROI correlation.

Feynman27 commented Sep 15, 2017

@xmyqsh Your current implementation for choosing the pyramid level assigns all ROIs to every feature map. For example, you sample 128 ROIs and assign each of them to all 4 pyramid levels (P2~P5), resulting in 512 ROIs per image. Is this deliberate? Shouldn't each ROI be assigned to a unique level of the feature pyramid, given by the formula in the paper?

For example, compare `leveled_idxs` in the two implementations below (they are not the same):

1. RoI indexes end up appended to every level, because `[[]] * 4` creates four references to the same inner list:

```python
leveled_idxs = [[]] * 4
for idx, roi in enumerate(rois):
    level_idx = level(roi) - 2
    leveled_idxs[level_idx].append(idx)
```

2. RoI indexes assigned to distinct levels, determined by k = k0 + log2(√(wh)/224):

```python
leveled_idxs = [[], [], [], []]
for idx, roi in enumerate(rois):
    level_idx = level(roi) - 2
    leveled_idxs[level_idx].append(idx)
```
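
The difference between the two snippets is the classic Python list-aliasing pitfall; a tiny demonstration:

```python
buggy = [[]] * 4          # four references to the SAME inner list
buggy[0].append(7)
print(buggy)              # [[7], [7], [7], [7]]: every index lands in every level

fixed = [[], [], [], []]  # four distinct lists
fixed[0].append(7)
print(fixed)              # [[7], [], [], []]
```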

stillwalker1234 commented
@Feynman27

That's a really subtle error. Have you tried training with that modification?

xmyqsh (Owner) commented Sep 20, 2017

@Feynman27
Good!

Feynman27 commented
Yes, but surprisingly it didn't really change the mAP much. It actually dropped it by about 0.5-1.0 percentage points.

xmyqsh (Owner) commented Sep 20, 2017

@Feynman27
Have you changed the related code in proposal_layer.py and proposal_target_layer.py at the same time?


Feynman27 commented Sep 20, 2017 via email

hhchyer commented Mar 13, 2018

@Feynman27 The formula for choosing the level in the proposal layer should be a balance of speed and accuracy. The proposals could even benefit from the other layers.
