This issue concerns the file PyTorch/models/PWCNet.py, which was initially committed by @mingyuliutw. We have two problems:
c11 and c21 are not used for flow estimation
Upsampling is not performed at pyramid level 1
1. c11 and c21 are not used for flow estimation
We noticed that in the PyTorch implementation, the first-level features are not used for flow estimation. Specifically, on lines 182 and 183, the features corresponding to the first layer (c11 and c21) are retrieved through convolution operations, but they are used only as input to the next convolution stack, not for flow estimation.
In contrast, all the other features (c12, c22, c13, c23, ...) are used in successive flow estimations, where they are warped and used to compute the corresponding correlation volume.
As a result, the predicted flow will be smaller than expected by a factor of 2. ----(A)
We suspect that this can be handled by adding another flow estimation block, starting at line 264, that uses c11 and c21.
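The resolution argument in (A) can be checked with a little arithmetic. The sketch below is not taken from PWCNet.py: it assumes that every pyramid level begins with a stride-2 convolution with kernel size 3 and padding 1, and the helper name `pyramid_sizes` and the 384x512 input are made up for illustration.

```python
def pyramid_sizes(height, width, num_levels=6):
    """Return {level: (h, w)} for a feature pyramid.

    Assumption (not confirmed by the issue text): each level starts with a
    stride-2 convolution with kernel_size=3 and padding=1.
    """
    sizes = {}
    h, w = height, width
    for level in range(1, num_levels + 1):
        # Conv2d output size: floor((in + 2*padding - kernel) / stride) + 1
        h = (h + 2 * 1 - 3) // 2 + 1
        w = (w + 2 * 1 - 3) // 2 + 1
        sizes[level] = (h, w)
    return sizes


sizes = pyramid_sizes(384, 512)
print("level 1 (c11, c21):", sizes[1])
print("level 2 (c12, c22):", sizes[2])
```

Under these assumptions, a decoder that stops at level 2 produces a flow at the level-2 size, which is half the level-1 size in each dimension.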
2. Upsampling is not performed at pyramid level 1
A transposed convolution is applied to upsample the predicted flow and the decoder features by a factor of 2 at all pyramid levels (lines 206, 207; lines 220, 221; lines 234, 235; lines 250, 251) except for the last one.
As a result, the predicted flow will be smaller than expected by a factor of 2. ----(B)
We suspect that this can be handled by adding another transposed convolution, starting at line 264, and then sending the result through the context network.
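To make the factor-of-2 upsampling concrete, the following sketch evaluates the output-size formula documented for PyTorch's `ConvTranspose2d`. The hyperparameters (kernel size 4, stride 2, padding 1) are an assumption chosen because they double the spatial size exactly; the issue text does not state the values used in PWCNet.py, and `conv_transpose2d_out` is a hypothetical helper.

```python
def conv_transpose2d_out(size, kernel_size=4, stride=2, padding=1,
                         output_padding=0, dilation=1):
    """Spatial output size of a transposed convolution along one dimension.

    Formula as documented for torch.nn.ConvTranspose2d. The default
    hyperparameters here are assumptions, not values read from PWCNet.py.
    """
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# A 96x128 flow becomes 192x256 after one such layer.
print(conv_transpose2d_out(96), conv_transpose2d_out(128))
```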
As a combined result of (A) and (B), the final flow will be 4 times smaller than the input in each spatial dimension. We would like to know whether this is a mistake in the code, or whether we are supposed to simply interpolate the flow and scale it by a factor of 4 before comparing it against the ground truth.
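For reference, the "interpolate and scale" option could look roughly like the sketch below. It assumes the network output is a tensor of shape (N, 2, h, w) whose values are displacements in pixels of the low-resolution grid; `upsample_flow` is a hypothetical helper, and any additional flow scaling the repository may apply during training or inference is not handled here.

```python
import torch
import torch.nn.functional as F


def upsample_flow(flow: torch.Tensor, factor: int = 4) -> torch.Tensor:
    """Resize a (N, 2, h, w) flow to (N, 2, factor*h, factor*w).

    Assumption: flow values are pixel displacements on the low-resolution
    grid, so they are multiplied by `factor` after resizing.
    """
    up = F.interpolate(flow, scale_factor=float(factor),
                       mode="bilinear", align_corners=False)
    return up * factor


# Dummy quarter-resolution flow: a uniform shift of 1 pixel.
flow_quarter = torch.ones(1, 2, 16, 24)
flow_full = upsample_flow(flow_quarter)
print(flow_full.shape)
```

Bilinear resizing changes only the sampling grid, so the multiplication by `factor` is what converts the displacement magnitudes to full-resolution pixels.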
Any help is greatly appreciated 😃
Thank you,
Kind regards.
Avishka-Perera changed the title from "Not Using the First Feature (c11 and c21) for the flow estimation; Upsampling not present in the final flow estimation" to "Not Using the First Feature (c11 and c21) for the flow estimation; Upsampling not present in the shallowest flow estimation" on Feb 14, 2024
I figured out the reason for question 2, so I'll withdraw it.
Instead, wouldn't refining the flow at the image level as well give better results? Is there a specific reason why this hasn't been implemented in PWCNet? I'm about to test this out, but before that I was wondering whether you have any thoughts on it.