I'm extracting data from a PDF with native text, and some rows of the table have their content shuffled, as you can see in this live example or here:
vs
I'm using Tesseract as OCR, but if I understood correctly, it should not be used since the text is native. I have also seen this behavior with some bold text (but not all of it); I don't know if it's related.
Is there a workaround? Or am I misusing some parameters in my configuration?
Thank you
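One possible workaround, if the extraction tool exposes word coordinates, is to rebuild each table row from geometry instead of trusting the emitted text order. The sketch below assumes word boxes shaped like the dicts that pdfplumber's `Page.extract_words()` returns (keys `text`, `x0`, `top`); the thread does not name the extraction library, and `rebuild_rows` and its `y_tolerance` default are hypothetical choices for illustration.

```python
from typing import Any, Dict, List

# Hypothetical word-box shape: needs at least "text", "x0" (left edge) and "top" (top edge)
Word = Dict[str, Any]


def rebuild_rows(words: List[Word], y_tolerance: float = 3.0) -> List[str]:
    """Group word boxes into rows by vertical position, then read each row left to right."""
    rows: List[List[Word]] = []
    for w in sorted(words, key=lambda w: (w["top"], w["x0"])):
        # Same row if this word's top edge is close to the first word of the current row
        if rows and abs(w["top"] - rows[-1][0]["top"]) <= y_tolerance:
            rows[-1].append(w)
        else:
            rows.append([w])
    # Within a row, order by the left edge so shuffled cells come back in reading order
    return [" ".join(str(w["text"]) for w in sorted(row, key=lambda w: w["x0"])) for row in rows]


if __name__ == "__main__":
    shuffled = [
        {"text": "120", "x0": 200, "top": 50.5},
        {"text": "Total", "x0": 10, "top": 50},
        {"text": "Item", "x0": 10, "top": 20},
        {"text": "Price", "x0": 200, "top": 21},
    ]
    print(rebuild_rows(shuffled))  # ['Item Price', 'Total 120']
```

This only reorders whole words; if the characters inside a single word come out reversed, the same idea would have to be applied to character-level boxes instead.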
I didn't. I tried a workaround with pattern matching, because my use case only needs to know whether a certain substring exists, but it's harder when the words are reversed.
Are you using Tesseract too? I don't think it's related, but maybe I'm wrong and it's the source of the issue.
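The pattern-matching workaround described above can be made tolerant of reversed text by also searching for the reversed forms of the phrase. This is a minimal sketch under the assumption that "reversed" means either the characters or the word order are flipped; `contains_phrase` and `contains_words_any_order` are hypothetical helpers, not part of any library mentioned in the thread.

```python
import re


def _normalize(s: str) -> str:
    # Collapse whitespace runs and lowercase, so line breaks in the extraction don't matter
    return re.sub(r"\s+", " ", s).strip().lower()


def contains_phrase(text: str, phrase: str) -> bool:
    """True if `phrase` occurs in `text` as-is, with its characters reversed,
    or with its word order reversed."""
    t = _normalize(text)
    p = _normalize(phrase)
    candidates = {
        p,                              # normal order
        p[::-1],                        # characters reversed: "grand total" -> "latot dnarg"
        " ".join(reversed(p.split())),  # word order reversed: "total grand"
    }
    return any(c in t for c in candidates)


def contains_words_any_order(text: str, phrase: str) -> bool:
    """Looser check: every word of `phrase` appears somewhere in `text` as a whole word."""
    words = set(re.findall(r"\w+", _normalize(text)))
    return all(w in words for w in re.findall(r"\w+", _normalize(phrase)))


if __name__ == "__main__":
    print(contains_phrase("Amount: latot dnarg 120", "Grand Total"))      # True
    print(contains_phrase("total grand 120", "grand total"))              # True
    print(contains_words_any_order("grand 120 the total", "Grand Total"))  # True
```

The strict check avoids false positives from words that merely appear in the same row; the looser check is a fallback for rows whose words are fully shuffled.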