Tesseract User Manual

This user manual is for Tesseract versions 4.x.x and 5.0.0.x. For versions 3.05.02 and older, see the documentation for old versions.

Introduction

Tesseract is an open source text recognition (OCR) engine, available under the Apache 2.0 license.

The current official release is 4.1.1.
The master branch on GitHub can be used by those who want the latest code for LSTM (--oem 1) and legacy (--oem 0) Tesseract. The master branch uses 5.0.0 versioning because code modernization caused API compatibility issues with the 4.x release.
The 3.05 branch on GitHub can be used by those who want the bug fixes for the 3.05.02 release of legacy Tesseract.

Tesseract can be used directly from the command line or, for programmers, through an API to extract printed text from images. It supports a wide variety of languages. Tesseract doesn't have a built-in GUI, but several are available from the 3rdParty page.
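As a rough illustration of the command-line route, the sketch below builds a basic `tesseract` invocation in Python and runs it with `subprocess`. The helper names `build_command` and `run_ocr` are assumptions made for this example and are not part of Tesseract. The sketch also assumes that the `tesseract` executable is on `PATH` and that the requested language data is installed.

```python
import shutil
import subprocess


def build_command(image_path, lang="eng", output_base="stdout"):
    """Return the argv list for a basic Tesseract run.

    Passing "stdout" as the output base makes Tesseract print the
    recognized text instead of writing <output_base>.txt.
    """
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_command(image_path, lang),
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode the output as text
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Calling `run_ocr("page.png")`, where `page.png` is a placeholder file name, would return the recognized text as a string. Keeping the command construction in its own function makes it possible to inspect or log the exact command without running the engine.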

Tesseract can be used in your own project, under the terms of the Apache License 2.0. It has a fully featured API, and can be compiled for a variety of targets including Android and the iPhone. See the 3rdParty page for a sample of what has been done with it.

If you have a question, first read the documentation, particularly the FAQ, to see if your problem is addressed there. If not, search the Issues List, the Tesseract user forum, and the Tesseract developer forum. If you still can't find what you need, please ask a question in the Tesseract user forum Google group.

Also, Tesseract is free software, so if you want to pitch in and help, please do! If you find a bug and fix it yourself, the best thing to do is to attach the patch to your bug report in the Issues List.

Releases and Changelog

4.0 with LSTM

Tesseract 4.0x+ added a new OCR engine based on LSTM neural networks. It initially works (well) on x86/Linux, with official language model data available for 100+ languages and 35+ scripts. For detailed information about the different types of models, see Data Files.

Model files for version 4.00 are available from tessdata tagged 4.00. This set contains models from November 2016. The individual language files are available from the following link.

tessdata 4.00 November 2016

Model files for version 4.0.0 and later are available from tessdata tagged 4.0.0. This set has legacy models from September 2017 that have been updated with integer versions of the tessdata_best LSTM models. These traineddata files support both the legacy recognizer with --oem 0 and LSTM models with --oem 1. These models are available from the following GitHub repo.

tessdata

Two more sets of official traineddata, trained at Google, are made available in the following GitHub repos. These do not have the legacy models and only have LSTM models usable with --oem 1.
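The relationship between the data sets and the --oem option described above can be summarized in a small sketch. The names `SUPPORTED_OEM`, `check_oem`, and `oem_args` are hypothetical helpers for illustration only. The repository names `tessdata_best` and `tessdata_fast` are taken from the Technical Information links, on the assumption that they are the two LSTM-only sets referred to here.

```python
# Engine modes each official data set can serve, per the text above:
# 0 = legacy recognizer, 1 = LSTM recognizer.
SUPPORTED_OEM = {
    "tessdata": {0, 1},
    "tessdata_best": {1},  # assumed to be one of the LSTM-only sets
    "tessdata_fast": {1},  # assumed to be one of the LSTM-only sets
}


def check_oem(data_set, oem):
    """Return True if the data set ships models for the given --oem value."""
    return oem in SUPPORTED_OEM[data_set]


def oem_args(data_set, oem, tessdata_dir):
    """Build the extra command-line arguments for a given data set.

    tessdata_dir is the local directory holding the downloaded
    traineddata files (a placeholder path in this sketch).
    """
    if not check_oem(data_set, oem):
        raise ValueError(f"{data_set} has no models for --oem {oem}")
    return ["--tessdata-dir", tessdata_dir, "--oem", str(oem)]
```

Checking the combination before running the engine gives a clear error message when, for example, the legacy recognizer is requested with an LSTM-only data set.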

5.0.0.x

Tesseract 5.0.0.x source code is available in the 'master' branch of the repository. The master branch uses 5.0.0 versioning because code modernization caused API incompatibility with the 4.x release.

Binaries are available from:

The language model traineddata files listed above for version 4.0.0 can also be used with Tesseract 5.0.0.x. These are available from:

Compiling and Installation

Usage

Technical Information

Historical Technical Documentation
API/ABI changes review for Tesseract
Manual Pages
Source Documentation generated by Doxygen
Neural Nets in Tesseract 4.00
VGSL Specs
VGSL Specs info from Tensorflow
Network spec for tessdata_fast models
Network spec for tessdata_best models
DAS 2016 tutorial slides: slides #2, #6, and #7 have information about LSTM integration in Tesseract 4.0x.
4.0 Accuracy and Performance
Tesseract OpenCL - Experimental

