Not Using the First Feature (c11 and c21) for the flow estimation; Upsampling not present in the shallowest flow estimation #137

Open
Avishka-Perera opened this issue Feb 14, 2024 · 1 comment

@Avishka-Perera

Dear @deqings, @mingyuliutw, @jrenzhile, @KinglittleQ,

This issue concerns the file PyTorch/models/PWCNet.py, which was initially committed by @mingyuliutw. We have two questions.

  1. c11 and c21 are not used for flow estimation
  2. Upsampling operation is not done in the pyramid level 1

1. c11 and c21 are not used for flow estimation

We noticed that the PyTorch implementation does not use the first level of features for flow estimation. Specifically, on lines 182 and 183, the first-level features (c11 and c21) are computed through convolution operations, but they serve only as input to the next convolution stack and never enter the flow estimation.

In contrast, all the other features (c12, c22, c13, c23, ...) are used in successive flow estimations, where they are warped and used to compute the corresponding correlation volume.

As a result, the predicted flow will be smaller than expected by a factor of 2. ----(A)

We suspect that this can be handled by adding another flow estimation block, starting at line 264, that uses c11 and c21.
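To make the resolution argument in (A) concrete, here is a minimal sketch of the size bookkeeping. It assumes that every pyramid level comes from a 3x3 convolution with stride 2 and padding 1, and that flow is estimated at levels 6 down to 2 (our reading of the file, not something we have confirmed with the maintainers). The helper names `conv_out` and `pyramid_sizes` and the 256x512 input are hypothetical.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one axis (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pyramid_sizes(height, width, levels=6):
    """Spatial size of the features at pyramid levels 1..levels.

    Assumption: each level halves the previous one with a stride-2 conv.
    """
    sizes = {}
    h, w = height, width
    for level in range(1, levels + 1):
        h, w = conv_out(h), conv_out(w)
        sizes[level] = (h, w)
    return sizes


sizes = pyramid_sizes(256, 512)  # hypothetical input resolution

# Assumption: c11/c21 (level 1) are skipped, so the finest flow
# is estimated on the level-2 features.
finest_flow_size = sizes[2]

print("level 1 features:", sizes[1])
print("finest flow:     ", finest_flow_size)
```

Under these assumptions the level-1 features are half the input size, and the finest flow is estimated one level below that.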

2. Upsampling operation is not done in the pyramid level 1

A transposed convolution upsamples the predicted flow and the decoder features by a factor of 2 at every pyramid level (lines 206, 207; lines 220, 221; lines 234, 235; lines 250, 251) except the last one.

As a result, the predicted flow will be smaller than expected by a factor of 2. ----(B)

We suspect that this can be handled by adding another transposed convolution starting at line 264 and then sending the result through the context network.
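The effect of the four upsampling steps listed above can be traced with the standard output-size formula for a transposed convolution. This is only a sketch: the kernel size 4, stride 2 and padding 1 are our assumption about the deconvolution layers, and `deconv_out` and the 4x8 starting size (level 6 of a hypothetical 256x512 input) are illustrative.

```python
def deconv_out(size, kernel=4, stride=2, padding=1):
    """Output length of a transposed convolution along one axis.

    Standard formula without dilation or output padding.
    """
    return (size - 1) * stride - 2 * padding + kernel


# Assumption: the coarsest (level-6) features of a 256x512 input are 4x8.
h, w = 4, 8
trace = [(h, w)]

# One upsampling step per pair of lines cited above (four in total).
for _ in range(4):
    h, w = deconv_out(h), deconv_out(w)
    trace.append((h, w))

print(trace)
```

With these parameters each step exactly doubles the size, so after four steps the flow sits at the level-2 resolution and no step brings it up to level 1.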

As a combined result of (A) and (B), the final flow will be 4 times smaller than the input. We would like to know whether this is a mistake in the code, or whether we are supposed to simply interpolate the flow and scale it by a factor of 4 to compare against the ground truth.
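For clarity, this is roughly what we mean by interpolating and scaling by 4. It is a toy sketch on nested lists, not the repository's evaluation code: nearest-neighbour repetition stands in for bilinear interpolation, and the names `upscale_flow` and `mean_epe` as well as the sample values are hypothetical.

```python
import math


def upscale_flow(flow, factor=4):
    """Upsample a flow field and rescale its vectors by `factor`.

    `flow` is a list of rows of (u, v) displacements measured in pixels
    of the low-resolution grid. Nearest-neighbour repetition is used here
    as a simple stand-in for bilinear interpolation.
    """
    out = []
    for row in flow:
        wide = []
        for u, v in row:
            # A displacement of 1 low-res pixel spans `factor` full-res pixels.
            wide.extend([(u * factor, v * factor)] * factor)
        for _ in range(factor):
            out.append(list(wide))
    return out


def mean_epe(pred, gt):
    """Average end-point error between two flow fields of equal size."""
    errors = [
        math.hypot(pu - gu, pv - gv)
        for pred_row, gt_row in zip(pred, gt)
        for (pu, pv), (gu, gv) in zip(pred_row, gt_row)
    ]
    return sum(errors) / len(errors)


small = [[(1.0, 0.0), (0.0, -0.5)]]  # toy 1x2 quarter-resolution flow
full = upscale_flow(small)           # 4x8 flow in full-resolution pixels
print(len(full), len(full[0]), full[0][0], full[3][7])
```

The important detail is that the vectors are multiplied by the same factor as the grid, because the displacements are expressed in pixels of the resolution at which they were predicted.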

Any help is greatly appreciated 😃

Thank you,
Kind regards.

@Avishka-Perera Avishka-Perera changed the title Not Using the First Feature (c11 and c21) for the flow estimation; Upsampling not present in the final flow estimation Not Using the First Feature (c11 and c21) for the flow estimation; Upsampling not present in the shallowest flow estimation Feb 14, 2024
@Avishka-Perera
Author

Small update,

I figured out the reason for question 2, so I'll withdraw it.

Instead, wouldn't also refining the flow at the image level give better results? Is there a specific reason why this hasn't been implemented in PWCNet? I'm about to test this out. Before that, I was wondering whether you have any thoughts on it.

Kind regards,
Avishka Perera
