This page lists repositories with Tesseract 4 compatible tessdata (for --oem 1, the LSTM engine) contributed by the Tesseract community.
Such tessdata contributions should ideally document everything needed to reproduce the training process (fonts, images, ground truth, texts, scripts, documentation, ...).
Language Code | Language | Data File | Contributor | Info |
---|---|---|---|---|
khmLimon | Khmer | best | OpenInstituteCambodia/phyrumsk | PR in tessdata_best |
cop | Coptic | best | shreeshrii/tessdata_coptic | tesseract-ocr forum post |
jpn_vert | Japanese Vertical | best | zodiac3539/jpn_vert | tesseract-ocr forum post |
ocrb_plus | MRZ | best | shreeshrii/tessdata_ocrb | tesseract-ocr forum post |
jav_java | Aksara Jawa | best | Shreeshrii/tessdata_jav_java | tesseract-ocr forum post |
mrz | MRZ | best | DoubangoTelecom/tesseractMRZ | tesseract-ocr forum post |
dot_matrix | MRZ | best | ameera3/OCR_Expiration_Date | tesseract-ocr forum post |
e13b | E13B (or MICR) | best | ElMagoElGato/tess_e13b_training | tesseract-ocr forum post |
e13b | E13B (or MICR) | best | DoubangoTelecom/tesseractMICR | tesseract-ocr forum post |
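To use one of these contributed models, the usual approach is to download its `.traineddata` file into a local directory and point Tesseract at that directory. The following is a minimal sketch in Python, assuming the `tesseract` executable is on the `PATH`. The helper names `build_tesseract_command` and `run_ocr`, and the file and directory names in the usage note, are hypothetical and not part of any listed repository. Only the `--tessdata-dir`, `-l`, and `--oem` options come from the Tesseract command line itself.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, lang, tessdata_dir):
    """Build the argument list for one OCR run with a contributed model.

    `lang` is the language code from the table (e.g. "cop"), and
    `tessdata_dir` is the local directory holding `<lang>.traineddata`.
    """
    return [
        "tesseract",
        str(image),
        str(output_base),
        "--tessdata-dir", str(tessdata_dir),  # look for models here
        "-l", lang,                           # language code of the model
        "--oem", "1",                         # LSTM engine only
    ]


def run_ocr(image, output_base, lang, tessdata_dir):
    """Run Tesseract and return the path of the text file it writes."""
    model = Path(tessdata_dir) / f"{lang}.traineddata"
    if not model.is_file():
        # Fail early with a clear message instead of a Tesseract error.
        raise FileNotFoundError(f"missing model file: {model}")
    command = build_tesseract_command(image, output_base, lang, tessdata_dir)
    subprocess.run(command, check=True)
    # By default Tesseract writes plain text to <output_base>.txt.
    return Path(f"{output_base}.txt")
```

For example, with a hypothetical `./tessdata_coptic/cop.traineddata` downloaded from one of the repositories above, `run_ocr("page.png", "page", "cop", "./tessdata_coptic")` would run the equivalent of `tesseract page.png page --tessdata-dir ./tessdata_coptic -l cop --oem 1`.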