owlvit/2 dynamic input resolution. #34764

bastrob · 2024-11-17T22:27:24Z

What does this PR do?

Towards #30579
Hey, this is a draft. I'm wondering how we can manage the variables affected by the dynamic input change in the OwlViTForObjectDetection class (self.sqrt_num_patches, self.box_bias).
Is there a better way to handle this?

Does interpolate_pos_encoding only allow new input sizes where height == width strictly holds?
If so, I should enforce this.
If not, I think sqrt_num_patches needs to be decomposed too (_h, _w). Some places where it might throw an exception:
https://github.com/bastrob/transformers/blob/30f3c2d56729974ec0d1d9e2fc4fd633ab697eb2/src/transformers/models/owlvit/modeling_owlvit.py#L1355
https://github.com/bastrob/transformers/blob/30f3c2d56729974ec0d1d9e2fc4fd633ab697eb2/src/transformers/models/owlvit/modeling_owlvit.py#L1459
https://github.com/bastrob/transformers/blob/30f3c2d56729974ec0d1d9e2fc4fd633ab697eb2/src/transformers/models/owlvit/modeling_owlvit.py#L1719
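
To make the (_h, _w) decomposition concrete, here is a minimal sketch in plain Python. The helper names `patch_grid` and `normalized_grid_coordinates` are hypothetical and not part of transformers, and the coordinate normalization is only an assumption about what a per-patch box bias would need. It shows why a single square-root value cannot describe a non-square patch grid.

```python
import math


def patch_grid(height, width, patch_size):
    """Return (rows, cols) of the patch grid; hypothetical helper."""
    return height // patch_size, width // patch_size


def normalized_grid_coordinates(rows, cols):
    """Row-major (x, y) coordinates in (0, 1], one pair per patch."""
    return [
        ((col + 1) / cols, (row + 1) / rows)
        for row in range(rows)
        for col in range(cols)
    ]


# Illustrative non-square input: 768 x 1024 pixels with 32-pixel patches.
rows, cols = patch_grid(height=768, width=1024, patch_size=32)
num_patches = rows * cols

# The patch count is not a perfect square here, so recovering the grid
# from sqrt(num_patches) alone would give the wrong shape.
assert math.isqrt(num_patches) ** 2 != num_patches

coords = normalized_grid_coordinates(rows, cols)
```

Keeping rows and cols separate means any reshape of the patch sequence back to a 2D feature map can use (rows, cols) instead of (sqrt_num_patches, sqrt_num_patches).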

Before submitting

Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@amyeroberts
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

qubvel

Hi @bastrob! Thanks for opening the PR!

It would be great to enable any image resolution, but if you cannot find a way to handle height != width images, we can limit it to square input images. We just have to make sure a proper error is raised.
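
If the square-only route is taken, the check could look roughly like the sketch below. The function name `square_grid_side` and the error wording are hypothetical, not existing transformers code. The point is only that a non-square input fails early with a clear message instead of a shape error further down.

```python
def square_grid_side(height, width, patch_size):
    """Return the number of patches per side, rejecting non-square inputs.

    Hypothetical helper for illustration only.
    """
    if height != width:
        raise ValueError(
            f"Expected a square input image, but got height={height} and "
            f"width={width}. Dynamic input resolution currently requires "
            "height == width."
        )
    return height // patch_size


# Usage: a square input passes, a non-square one raises a readable error.
side = square_grid_side(960, 960, patch_size=16)

try:
    square_grid_side(768, 1024, patch_size=32)
except ValueError as err:
    message = str(err)
```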

qubvel · 2024-11-18T10:53:36Z

src/transformers/models/owlv2/modeling_owlv2.py

+            _, _, height, width = pixel_values.shape
+            # height must eq width.
+            self.sqrt_num_patches = height // self.config.vision_config.patch_size
+            self.box_bias = self.compute_box_bias(self.sqrt_num_patches)


I assume we can always use this code path without if/else
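
A small sketch of why the branch may be unnecessary: deriving the grid side from the input shape gives the same value as the config-derived one whenever the input has the default size. The function name and the sizes used here are illustrative assumptions, not values read from the PR.

```python
def grid_side_from_input(pixel_shape, patch_size):
    """Derive patches-per-side from a (batch, channels, height, width) shape.

    Hypothetical helper; assumes height == width, as in the diff above.
    """
    _, _, height, _width = pixel_shape
    return height // patch_size


# Illustrative config values: 768-pixel images with 32-pixel patches.
image_size, patch_size = 768, 32
config_side = image_size // patch_size

# Default-sized input: the dynamic path reproduces the config-derived value.
default_side = grid_side_from_input((1, 3, image_size, image_size), patch_size)

# Larger square input: the same code path adapts without any if/else.
larger_side = grid_side_from_input((1, 3, 1024, 1024), patch_size)
```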

qubvel · 2024-11-18T10:54:05Z

src/transformers/models/owlv2/modeling_owlv2.py

+            _, _, height, width = pixel_values.shape
+            # height must eq width.
+            self.sqrt_num_patches = height // self.config.vision_config.patch_size
+            self.box_bias = self.compute_box_bias(self.sqrt_num_patches)
+        else:


owlvit/2 dynamic input resolution.

30f3c2d

qubvel added the Vision label Nov 18, 2024

qubvel mentioned this pull request Nov 18, 2024

Community contribution: enable dynamic resolution input for more vision models. #30579

Open

11 tasks

qubvel reviewed Nov 18, 2024
