Is the network suit for long-text recognition? #6

WudiJoey · 2021-06-15T12:20:29Z

Thanks for your work!
I read your paper and noticed that input images are resized to [224, 224]. For long text lines, does this affect the accuracy?
Look forward to your reply!

WudiJoey · 2021-06-16T01:52:47Z

Adding: the width of a text image is often greater than its height. Can the image information be preserved as much as possible if the image is resized to a square?
Look forward to your reply~

roatienza · 2021-06-16T09:42:05Z

Hi, the resized images (224x224) are still human-readable. The attention maps on square images also appear to give proper weights to each character region. Beyond these observations, there is no empirical evidence of how the resizing affects accuracy. An alternative is to resize to (100, 32) and use padding to reach 224x224.
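The resize-then-pad alternative mentioned above can be sketched as follows. This is only one reading of the suggestion, not the repository's preprocessing code: the function names are hypothetical, the inner image is centred on a black canvas by assumption, and a nearest-neighbour resize in NumPy stands in for whatever interpolation a real pipeline would use.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array.

    Stand-in for a proper interpolating resize (assumption for this sketch).
    """
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return img[rows[:, None], cols]

def pad_to_square(img, size=224, fill=0):
    """Centre img on a size x size canvas filled with `fill`."""
    h, w = img.shape[:2]
    canvas = np.full((size, size) + img.shape[2:], fill, dtype=img.dtype)
    top = (size - h) // 2
    left = (size - w) // 2
    canvas[top:top + h, left:left + w] = img
    return canvas

def resize_then_pad(img, inner_w=100, inner_h=32, size=224, fill=0):
    """Resize to (inner_w, inner_h), then pad to size x size."""
    small = resize_nearest(img, inner_h, inner_w)
    return pad_to_square(small, size, fill)
```

With the defaults, the text occupies a 100x32 region in the middle of the 224x224 input and the rest of the canvas is padding.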

WudiJoey · 2021-06-16T09:52:18Z

Thanks for your reply~
I will try your work.

luvwinnie · 2021-11-10T03:01:40Z

I'm trying to handle a very long sentence. I resized the image to a height of 32 with a fixed aspect ratio and padded it to 224x224; for example, the image looks like this. @WudiJoey, have you ever tried training on very wide images? Does it affect the accuracy even when the image is squeezed like this?
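To put numbers on the squeezing described above, here is a small arithmetic sketch. It assumes the image is first scaled to height 32 with its aspect ratio kept, then compressed horizontally only if it is still wider than the 224-pixel canvas; the function name and this exact procedure are assumptions, since the thread does not spell out the steps.

```python
def squeeze_stats(width, height, target_h=32, canvas=224):
    """Return (natural_w, final_w, squeeze, blank_fraction).

    natural_w:      width after scaling to target_h with aspect ratio kept
    final_w:        width after capping at the canvas width
    squeeze:        final_w / natural_w (1.0 means no horizontal distortion)
    blank_fraction: share of the canvas x canvas input that is padding
    """
    natural_w = round(width * target_h / height)
    final_w = min(natural_w, canvas)
    squeeze = final_w / natural_w
    blank_fraction = 1 - (final_w * target_h) / (canvas * canvas)
    return natural_w, final_w, squeeze, blank_fraction
```

For a 1600x64 text line, the aspect-preserving width at height 32 is 800 pixels, so fitting it into 224 pixels compresses it to 28% of that width, and the text strip still covers only one seventh of the 224x224 input.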

WudiJoey · 2021-11-15T03:22:59Z

> I'm trying to handle a very long sentence. I resized the image to a height of 32 with a fixed aspect ratio and padded it to 224x224; for example, the image looks like this. @WudiJoey, have you ever tried training on very wide images? Does it affect the accuracy even when the image is squeezed like this?

I haven't tried your resize method because I think the large blank area may introduce useless information. I just resize my images to a square directly, and it works. But I think there is a better way to process such wide images, like cutting the image into segments and arranging them in rows.
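The cut-and-stack idea could look roughly like the sketch below. It is an illustration only: the function name is hypothetical, the cut positions are fixed every `chunk_w` pixels (so a character can be split across two rows), and nothing in this thread reports how such an input affects accuracy.

```python
import numpy as np

def fold_into_rows(img, chunk_w=224, fill=0):
    """Cut a wide (H, W) or (H, W, C) image into chunk_w-wide segments
    and stack the segments top to bottom in reading order."""
    h, w = img.shape[:2]
    n_chunks = -(-w // chunk_w)               # ceiling division
    pad_w = n_chunks * chunk_w - w            # right-pad so the last segment is full width
    pad_spec = [(0, 0), (0, pad_w)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad_spec, constant_values=fill)
    segments = [padded[:, i * chunk_w:(i + 1) * chunk_w] for i in range(n_chunks)]
    return np.concatenate(segments, axis=0)   # shape: (n_chunks * H, chunk_w, ...)
```

A 32x800 line becomes a 128x224 block of four rows. At height 32, up to seven rows fit into a 224x224 input, which corresponds to a line up to 1568 pixels wide before any squeezing is needed.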

luvwinnie · 2021-11-15T03:30:42Z

Thank you for the reply! Cutting the image and arranging the segments in rows seems like a very good approach. I would like to give it a try.

Hmm... however, it currently seems that the input size is fixed by the base VisionTransformer. Maybe we should find a way to handle variable-size images, as convolutional networks do. Maybe the base Vision Transformer could be replaced with a more recent vision-transformer-based architecture.

WudiJoey changed the title ~~can the network suit for long~~ Is the network suit for long-text recognition? Jun 15, 2021
