Instead of extracting patch descriptions, Dense-ContextDesc consumes full image inputs, densely extracts feature representations, and uses a Spatial Transformer Network to crop a set of "feature patches" around each keypoint.

For example, using the scale and orientation parameters obtained from the SIFT detector, we compose an affine transformation matrix, then crop a "feature patch" of size 8x8x128 around each keypoint location. The "feature patch" is then mapped to a 1x1x128 descriptor via a convolution with an 8x8 filter. This process is similar to LF-Net, except that scale and orientation are given in advance rather than predicted.
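The crop-and-describe step above can be sketched in plain NumPy. This is a minimal illustration, not the repository's implementation: the function names (`crop_feature_patch`, `bilinear_sample`), the feature-map size, and the random convolution weights are all assumptions made for the example; only the patch size (8x8), channel count (128), and the use of SIFT scale/orientation to build the affine sampling grid come from the text.

```python
import numpy as np

def bilinear_sample(fmap, sx, sy):
    """Bilinearly interpolate fmap (H, W, C) at continuous coords (sx, sy)."""
    H, W, _ = fmap.shape
    x0 = int(np.clip(np.floor(sx), 0, W - 2))
    y0 = int(np.clip(np.floor(sy), 0, H - 2))
    wx, wy = sx - x0, sy - y0
    return ((1 - wy) * ((1 - wx) * fmap[y0, x0]     + wx * fmap[y0, x0 + 1]) +
            wy       * ((1 - wx) * fmap[y0 + 1, x0] + wx * fmap[y0 + 1, x0 + 1]))

def crop_feature_patch(fmap, x, y, scale, theta, patch=8):
    """STN-style crop: sample a patch x patch grid around keypoint (x, y),
    rotated by orientation `theta` and spaced by `scale` pixels -- the
    affine transform composed from the SIFT parameters."""
    cos, sin = np.cos(theta), np.sin(theta)
    offs = (np.arange(patch) - (patch - 1) / 2.0) * scale
    out = np.empty((patch, patch, fmap.shape[2]), dtype=fmap.dtype)
    for i, dy in enumerate(offs):
        for j, dx in enumerate(offs):
            # rotate the grid offset by theta, then translate to the keypoint
            out[i, j] = bilinear_sample(fmap,
                                        x + cos * dx - sin * dy,
                                        y + sin * dx + cos * dy)
    return out

rng = np.random.default_rng(0)
fmap = rng.standard_normal((60, 80, 128))   # dense feature map, H x W x C (sizes illustrative)
fpatch = crop_feature_patch(fmap, x=40.0, y=30.0, scale=2.0, theta=0.5)

# An 8x8 convolution that collapses the 8x8x128 patch to a 1x1x128 descriptor
# is a single linear map: desc[k] = sum_{i,j,c} W[k,i,j,c] * patch[i,j,c].
weights = rng.standard_normal((128, 8, 8, 128)) * 0.01  # random stand-in for learned filters
desc = np.einsum('kijc,ijc->k', weights, fpatch)
print(fpatch.shape, desc.shape)
```

Because the 8x8 filter exactly covers the 8x8 feature patch, the convolution produces a single spatial position, which is why it reduces the patch to a 1x1x128 descriptor.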