4.0 with LSTM

4.0+

Tesseract 4.0+ source code is available in the 'master' branch of the repository. It adds a new OCR engine based on LSTM neural networks. Initially it works (well) on x86/Linux. Model data for 101 languages is available in the tessdata, tessdata_best and tessdata_fast repositories.

Documentation

NeuralNetsInTesseract4.00
VGSLSpecs
VGSLSpecs info from TensorFlow
DAS 2016 tutorial slides
Slides #2, #6 and #7 contain information about LSTM integration in Tesseract 4.0.
4.0 Accuracy and Performance

Training Tesseract LSTM engine

4.x PPA

Ubuntu PPAs for Tesseract 4.x & Leptonica 1.7x:

https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr

Leptonica 1.74.1 package for Debian:

https://packages.debian.org/sid/libleptonica-dev

4.0.0-alpha for Windows

Unofficial experimental binaries of tesseract-ocr 4.0.0-alpha are available from the following links. Each one is built from a different commit of the master branch in early 2017. See the individual sites for more details:

Windows Installer made with MinGW-w64 from UB Mannheim
Zip file with CPPAN-generated .dll and .exe files. You must install the Visual C++ 2015 x86 redistributable from microsoft.com to run them.
Win64 build of tesseract 4.0.0-alpha, leptonica 1.74.1, and the charlesw/tesseract .NET wrapper, built using CPPAN for Visual Studio 2017.

4.0.0-alpha with GUI frontend

VietOCR

Windows binaries of tesseract-ocr 4.0.0-alpha with a GUI are available for VietOCR from:

VietOCR5.0alpha
The Visual C++ Redistributable for Visual Studio 2015 (vc_redist.x86.exe) is REQUIRED for VietOCR to run correctly.

VietOCR can be used to download the appropriate 4.0.0-alpha traineddata for additional languages.

gImageReader

Windows binaries of tesseract-ocr 4.0.0-alpha with a GUI are available for gImageReader from

Download 4.0.0-alpha traineddata for use with the above from the master branch of tessdata. For example, for Hindi, download the following file:

https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata

3.05-dev

The 3.05 branch on GitHub (https://github.com/tesseract-ocr/tesseract/tree/3.05) can be used by those who want the bug fixes for the 3.05.01 release.

An unofficial installer for Tesseract 3.05-dev for Windows is available from Tesseract at UB Mannheim. This includes the training tools.

Current official release

The current official release is 3.05.01.

Old wiki: no longer maintained. The pages were moved; see the new documentation.

As of 02/02/2020

These wiki pages are no longer maintained.

All pages were moved to tesseract-ocr/tessdoc.

The latest documentation is available at https://tesseract-ocr.github.io/.
