4.0 with LSTM

Shreeshrii edited this page Sep 11, 2019 · 58 revisions

4.0 +

Tesseract 4.0+ source code is available in the 'master' branch of the repository. It adds a new OCR engine based on LSTM neural networks. The initial target platform, on which it works well, is x86/Linux. Model data for 101 languages is available in the tessdata, tessdata_best, and tessdata_fast repositories.

Documentation

Training Tesseract LSTM engine

4.x ppa

Ubuntu PPAs for Tesseract 4.x & Leptonica 1.7x:

Leptonica 1.74.1 package for Debian:

4.0.0-alpha for Windows

Unofficial experimental binaries of tesseract-ocr 4.0.0-alpha are available from the following links. Each was built from a different commit on the master branch in early 2017. See the individual sites for more details:

4.0.0-alpha with GUI frontend

VietOCR

Windows binaries of tesseract-ocr 4.0.0-alpha with a GUI are available for VietOCR from

VietOCR can be used to download the appropriate 4.0.0-alpha traineddata for additional languages.

gImageReader

Windows binaries of tesseract-ocr 4.0.0-alpha with a GUI are available for gImageReader from

To use the binaries above, download the 4.0.0-alpha traineddata from the master branch of tessdata. For example, for Hindi, download the following file:

https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata
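The link above points at a repository page rather than at the file itself, so a scripted download needs the raw file URL. The following is a minimal sketch, not an official tool: it assumes the host follows GitHub's URL layout, where replacing `/blob/` with `/raw/` yields the file contents. The helper names `blob_to_raw` and `download_traineddata` are hypothetical and introduced only for illustration.

```python
import shutil
import urllib.request
from pathlib import Path


def blob_to_raw(url: str) -> str:
    """Turn a GitHub-style 'blob' page URL into the matching 'raw' file URL.

    Assumption: the host serves file contents under /raw/ the way github.com does.
    """
    # Replace only the first "/blob/" so a later "blob" path segment is left alone.
    return url.replace("/blob/", "/raw/", 1)


def download_traineddata(page_url: str, tessdata_dir: str) -> Path:
    """Download the file behind page_url into tessdata_dir and return its path."""
    raw_url = blob_to_raw(page_url)
    # Use the last URL segment (e.g. "hin.traineddata") as the local file name.
    target = Path(tessdata_dir) / raw_url.rsplit("/", 1)[-1]
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk instead of holding the whole file in memory.
    with urllib.request.urlopen(raw_url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling `download_traineddata` with the Hindi URL above and a directory such as `"tessdata"` would save `hin.traineddata` there. That directory can then be passed to the `tesseract` command line with `--tessdata-dir`, together with `-l hin`.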

3.05-dev

The [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05) can be used by those who want the bug fixes for the 3.05.01 release.

An unofficial installer for Tesseract 3.05-dev for Windows is available from Tesseract at UB Mannheim. This includes the training tools.

Current official release

The current official release is 3.05.01.

As of 02/02/2020


These wiki pages are no longer maintained.

All pages were moved to tesseract-ocr/tessdoc.

The latest documentation is available at https://tesseract-ocr.github.io/.

