tesseract does not recognize letters of good quality #3858
Comments
Either preprocess the image before running Tesseract (the latest version, not a rather old one), or use different software. Tesseract works best with black (or at least grey) letters on a white background. It can also handle inverted line images with white letters on a black background. The current code does not handle lines with a mix of normal and inverted text. In general, algorithmic layout and line detection seems to be difficult, and Tesseract is far from perfect on complex layouts. Other modern software uses trained neural networks for layout detection and might give better results. Tesseract still lacks that, mainly because of missing developer resources.
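To illustrate the kind of preprocessing meant here, below is a minimal sketch in plain Python. It is not part of Tesseract. The function name `to_black_on_white`, the fixed threshold of 128, and the "mostly dark means inverted" heuristic are all assumptions made for this example. The image is modelled as a list of rows of grayscale values (0 to 255). In practice you would load the pixels with an image library and probably choose the threshold adaptively.

```python
def to_black_on_white(gray, threshold=128):
    """Binarize a grayscale image and aim for dark text on a light background.

    gray: list of rows, each a list of ints in 0..255 (hypothetical input format).
    threshold: assumed fixed cut-off; real images may need an adaptive one.
    """
    # Binarize: pixels below the threshold become black (0), all others white (255).
    binary = [[0 if px < threshold else 255 for px in row] for row in gray]

    total = sum(len(row) for row in binary)
    dark = sum(px == 0 for row in binary for px in row)

    # Heuristic (an assumption): text usually covers less than half of the area,
    # so a mostly dark image is treated as white-on-black and inverted.
    if total and dark * 2 > total:
        binary = [[255 - px for px in row] for row in binary]
    return binary


# Tiny example: bright "letters" on a dark background.
inverted_line = [
    [10, 10, 10, 10],
    [10, 240, 240, 10],
    [10, 10, 10, 10],
]
result = to_black_on_white(inverted_line)

# Tiny example: already dark text on a light background.
normal_line = [[250, 20, 250]]
unchanged = to_black_on_white(normal_line)
```

Note that this sketch decides the polarity once for the whole input. For a page that mixes normal and inverted text, the same check would have to be applied separately to each line or region, since a single line containing both polarities is the case the comment above says is not handled.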
Some of the problems reported here might be fixed with draft pull request #3857:
Is this with Tess
No, it is with Tesseract 5. 😄
Environment
My observation is about the following image:
Current Behavior:
In
Tesseract does not recognize the following letters:
Expected Behavior:
I had expected that Tesseract would recognize these letters, because their quality is quite good.
I am just surprised that it does not recognize them. What am I missing?
Do you have an explanation for this behaviour?
I would be very interested in an answer.
Thank you.