How to get whitelist to work with pytesseract #23

Open
GoogleCodeExporter opened this issue Mar 21, 2016 · 0 comments

What steps will reproduce the problem?

I am trying to set a character whitelist for Tesseract with code like the following:

import tesseract  # python-tesseract SWIG bindings

ocr = tesseract.TessBaseAPI()
ocr.SetVariable("tessedit_char_whitelist", "0123456789;")
ocr.SetPageSegMode(tesseract.PSM_AUTO)
ocr.Init("C:\\Program Files (x86)\\Tesseract-OCR\\", "eng", tesseract.OEM_DEFAULT)

What is the expected output? What do you see instead?

Expected: image_to_string() recognizes only the characters "0123456789;".
Actual: with the code above, image_to_string() ignores the whitelist and
returns whatever characters it finds.

What version of the product are you using? On what operating system?

pytesseract-0.1, Python 2.7, Windows 8.1

Please provide any additional information below.

I have tried the approaches people use with Tesseract-OCR directly, but none of
them work with pytesseract. I have not found any documented way to apply a
whitelist to image_to_string(). Having one would help a lot with the accuracy
of the function.
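
One possible approach, sketched here for reference: pytesseract runs the tesseract command-line program rather than sharing state with a TessBaseAPI object, so settings made on such an object would not reach image_to_string(). The sketch below instead passes the whitelist through the `config` argument of image_to_string() using Tesseract's `-c name=value` option. It assumes the installed pytesseract exposes `config` and that the installed Tesseract accepts `-c`. The helper names `whitelist_config` and `ocr_with_whitelist` are hypothetical and not part of pytesseract.

```python
def whitelist_config(chars):
    """Build a Tesseract config string that sets tessedit_char_whitelist.

    Hypothetical helper, not part of pytesseract. Assumes `chars` contains
    no whitespace or quote characters, so no extra quoting is applied.
    """
    return "-c tessedit_char_whitelist={}".format(chars)


def ocr_with_whitelist(image_path, chars="0123456789;"):
    """Run pytesseract on an image file with a character whitelist.

    Hypothetical helper. The imports are local so that whitelist_config()
    can be used even when Pillow and pytesseract are not installed.
    """
    from PIL import Image
    import pytesseract

    image = Image.open(image_path)
    # The config string is forwarded to the tesseract command line.
    return pytesseract.image_to_string(image, config=whitelist_config(chars))
```

Calling `ocr_with_whitelist("scan.png")` (with `scan.png` standing in for a real image path) would then run OCR with the whitelist applied, provided the assumptions above hold for the installed versions. Whether the whitelist is honored can also depend on the Tesseract version and OCR engine in use.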

Thanks in advance for any help on the matter.

Original issue reported on code.google.com by [email protected] on 9 Jun 2015 at 6:58
