How to get whitelist to work with pytesseract #23

GoogleCodeExporter · 2016-03-21T05:55:55Z

What steps will reproduce the problem?

Trying to set a character whitelist for Tesseract with code like the following:

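import tesseract  # python-tesseract bindings (TessBaseAPI), a separate package from pytesseract

ocr = tesseract.TessBaseAPI()
ocr.SetVariable("tessedit_char_whitelist", "0123456789;")  # restrict recognition to these characters
ocr.SetPageSegMode(tesseract.PSM_AUTO)
ocr.Init("C:\\Program Files (x86)\\Tesseract-OCR\\","eng",tesseract.OEM_DEFAULT)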

What is the expected output? What do you see instead?

The intended output is that only the characters "0123456789;" are recognized 
when calling image_to_string(). With code like the above, image_to_string() 
ignores the whitelist and returns whatever characters it finds.
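pytesseract does not share state with a TessBaseAPI object: it runs the `tesseract` executable for each call, so a whitelist set through another binding never reaches it. One way to pass the whitelist is on the command line. The sketch below assumes that your pytesseract version's `image_to_string()` accepts a `config` string (current releases do; check yours) and that your Tesseract build accepts the `-c VAR=VALUE` option (recent versions do; older 3.x builds may not). The helpers `whitelist_config` and `ocr_with_whitelist` are hypothetical names used for illustration.

```python
def whitelist_config(chars: str) -> str:
    """Build a Tesseract option string that restricts recognition to `chars`.

    Hypothetical helper: returns the `-c tessedit_char_whitelist=...` flag,
    which pytesseract passes through to the tesseract executable.
    """
    if any(c.isspace() for c in chars):
        # Unquoted whitespace would split the option when the config is tokenized.
        raise ValueError("whitelist must not contain whitespace")
    return "-c tessedit_char_whitelist=" + chars


def ocr_with_whitelist(image_path: str, chars: str) -> str:
    """OCR the image at `image_path`, recognizing only `chars` (hypothetical wrapper)."""
    # Imported here so whitelist_config() works without these packages installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, config=whitelist_config(chars))


if __name__ == "__main__":
    print(whitelist_config("0123456789;"))
```

For this report the call would be `ocr_with_whitelist("digits.png", "0123456789;")`, where `digits.png` is a placeholder path.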

What version of the product are you using? On what operating system?

pytesseract-0.1, Python 2.7, Windows 8.1

Please provide any additional information below.

I've tried the whitelist approaches people use with Tesseract-OCR directly, but 
none of them work with pytesseract. I haven't found any way to apply a 
whitelist to the image_to_string() function, which would greatly improve its 
accuracy.

Thanks in advance for any help on the matter.

Original issue reported on code.google.com by [email protected] on 9 Jun 2015 at 6:58


GoogleCodeExporter added Priority-Medium Type-Defect auto-migrated labels Mar 21, 2016
