Is the network suitable for long-text recognition? #6

Open
WudiJoey opened this issue Jun 15, 2021 · 6 comments

Comments

WudiJoey commented Jun 15, 2021

Thanks for your work!
I read your paper and noticed that input images are resized to [224, 224]. In the case of a long text line, does this affect the accuracy?
Looking forward to your reply!

@WudiJoey
Author

Adding: the width of a text image is often much greater than its height. Can the image information be preserved as much as possible if the image is resized to a square?
Looking forward to your reply~

@roatienza
Owner

Hi, the resized images (224x224) are still human-readable. The attention maps on square images also appear to give proper weights to each character region. Beyond these observations, there is no empirical evidence of how the resizing affects accuracy. An alternative is to resize to (100, 32) and then use padding to bring the image up to 224x224.
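
To make the resize-then-pad alternative concrete, here is a minimal sketch of the geometry involved. It reflects one possible reading of the suggestion (keep the aspect ratio, then pad to a square), not necessarily what the repository does. The helper names `letterbox_geometry` and `used_fraction` are hypothetical and only illustrate the arithmetic; the actual resizing and padding would be done with whatever image library you already use.

```python
def letterbox_geometry(width, height, target=224):
    """Fit a (width x height) image inside a target x target square.

    Returns (new_w, new_h, pad_left, pad_top): the resized dimensions that
    preserve the aspect ratio, and the offsets that centre the result.
    Hypothetical helper for illustration; not part of the repository.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Scale so that the longer side exactly fills the square.
    scale = target / max(width, height)
    new_w = min(target, max(1, round(width * scale)))
    new_h = min(target, max(1, round(height * scale)))
    # Split the leftover space evenly; an odd pixel goes to the right/bottom.
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def used_fraction(width, height, target=224):
    """Fraction of the square canvas covered by actual image content."""
    new_w, new_h, _, _ = letterbox_geometry(width, height, target)
    return (new_w * new_h) / (target * target)


if __name__ == "__main__":
    for size in [(100, 32), (1000, 32)]:
        print(size, letterbox_geometry(*size), used_fraction(*size))
```

With these assumptions, a 1000x32 text line ends up only 7 pixels tall inside the 224x224 canvas, which is why very long lines are the difficult case for this approach.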

@WudiJoey
Author

Thanks for your reply~
I will try your work.

luvwinnie commented Nov 10, 2021

I'm trying to resize a very long sentence. I resized the image to a height of 32 while keeping the aspect ratio, then padded it to 224x224; the result looks like the image below. @WudiJoey, have you ever tried training on very wide images? Does it affect the accuracy when the image is squeezed like this?
[Image: Screen Shot 2021-11-10 at 12 00 15]

@WudiJoey
Author

I haven't tried your resize method because I think the large blank area may introduce useless information. I just resize my images directly to a square, and it works. But I think there is a better way to process such wide images, such as cutting the image into segments and arranging them in rows.
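
The cut-and-arrange idea can be sketched roughly as follows. This is an untested-on-real-data illustration, not something from the repository: the image is modelled as a plain list of pixel rows so the example has no dependencies, and the function names `rows_for_square` and `stack_by_rows` are hypothetical. Note that a naive cut like this can split a character across two segments.

```python
import math


def rows_for_square(width, height):
    """Pick a segment count that makes the stacked image roughly square."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return max(1, round(math.sqrt(width / height)))


def stack_by_rows(image, n_rows, pad_value=0):
    """Cut a wide image into n_rows segments and stack them vertically.

    `image` is a list of pixel rows (each a list of values). The last
    segment is right-padded with `pad_value` so all segments share a width.
    """
    width = len(image[0])
    chunk_w = -(-width // n_rows)  # ceiling division
    stacked = []
    for i in range(n_rows):
        start = i * chunk_w
        for row in image:
            piece = row[start:start + chunk_w]
            # Pad the final segment if the width does not divide evenly.
            piece = piece + [pad_value] * (chunk_w - len(piece))
            stacked.append(piece)
    return stacked


if __name__ == "__main__":
    tiny = [[0, 1, 2, 3, 4],
            [5, 6, 7, 8, 9]]
    for row in stack_by_rows(tiny, n_rows=2):
        print(row)
```

The stacked result can then be resized to 224x224 with far less distortion than squeezing the original line directly, at the cost of the model having to learn the row order.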

@luvwinnie

Thank you for the reply! Cutting the image and arranging the pieces in rows seems like a very good approach; I would like to give it a try.

Hmm... however, it currently seems that the input size is fixed by the base VisionTransformer. Maybe we should find a way to handle variable-size images, as convolutional networks do. Maybe the base Vision Transformer could be improved by using one of the more recent vision-transformer-based architectures.
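
For context on why the input size is fixed: a standard ViT splits the image into non-overlapping patches and typically learns one position embedding per patch position, so the sequence length is tied to the image size. The sketch below only does the patch arithmetic. The patch size of 16 is an assumption (it is not stated in this thread), and `patch_grid` is a hypothetical helper.

```python
def patch_grid(width, height, patch=16):
    """Return (cols, rows, n_patches) for non-overlapping square patches.

    patch=16 is an assumed value for illustration only.
    """
    if width % patch or height % patch:
        raise ValueError("image sides must be multiples of the patch size")
    cols = width // patch
    rows = height // patch
    return cols, rows, cols * rows


if __name__ == "__main__":
    print(patch_grid(224, 224))   # square input
    print(patch_grid(1568, 32))   # wide input with the same patch count
```

Under that assumption, a 1568x32 strip yields the same number of patches as a 224x224 square, so a wide layout with an unchanged sequence length is at least arithmetically possible. The learned position embeddings would still correspond to a different 2D layout, so they would likely need retraining or interpolation.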
