Improving OCR recognition #8
Comments
I have been testing alternatives to Tesseract, and EasyOCR seems to do a much better job at recognition (although it messes up the output format a little if there is furigana; see JaidedAI/EasyOCR#575). I have barely any coding experience, but I'm looking into forking the project to try to add support for backends other than Tesseract. I will report back if I manage to do anything useful.
Improving the OCR accuracy is definitely an ongoing goal of this project. Including alternative backends does sound interesting; however, it seems like EasyOCR only supports Python. I think a better option would be to focus development efforts on fine-tuning Tesseract to recognize text better, along with some extra text processing. One of the first steps would be to implement a text processing stage that replaces many of the commonly missed characters with the expected ones, similar to how Kaku does it. Another thing to look into is further training the models to adapt to commonly missed fonts. I'm open to any contributions or ideas, so feel free to share your findings.
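To make the text processing idea a bit more concrete, here is a rough Python sketch of what such a stage could look like. It is only an illustration, not code from this project or from Kaku: the function name `postprocess`, the `REPLACEMENTS` table, and the specific character pairs in it are all hypothetical, and a real table would have to be built from misreads actually observed in Tesseract output. The sketch also strips the stray spaces that tend to show up between Japanese characters.

```python
import re

# Hypothetical substitution table: these pairs are illustrative only,
# not a measured list of Tesseract misreads.
REPLACEMENTS = {
    "!": "\uff01",  # ASCII exclamation mark -> full-width form
    "?": "\uff1f",  # ASCII question mark -> full-width form
}

# CJK punctuation, hiragana, katakana, common kanji, and full-width forms.
_JP = r"\u3000-\u303f\u3040-\u30ff\u4e00-\u9fff\uff00-\uffef"

# Whitespace that sits directly between two Japanese characters.
_SPACE_BETWEEN_JP = re.compile(rf"(?<=[{_JP}])\s+(?=[{_JP}])")


def postprocess(text: str) -> str:
    """Apply character substitutions, then drop spaces inside Japanese text."""
    for wrong, right in REPLACEMENTS.items():
        text = text.replace(wrong, right)
    text = _SPACE_BETWEEN_JP.sub("", text)
    return text.strip()
```

For instance, `postprocess("「 おいしい ! 上幅つかっ たか ? 」")` would return `「おいしい！上幅つかったか？」`. A stage like this only cleans up spacing and known substitutions; it cannot repair characters that were recognized as a different kanji.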
So here's one problem I found while OCRing Steins;Gate. When I change the Otsu Score Fraction to anything greater than or equal to Here's the relevant line of code: Let me know your thoughts on this, what I should test this setting on, etc.
Are there any options that can be played with to try to improve recognition? For example, this text:
is read as:
「 お しゝ ! 邊つかったか ? .
If I remove the furigana from the selection (not very convenient for multiline text), I get:
「 おいしい ! 上幅つかっ たか ? 」