
4.0 with LSTM

Shreeshrii edited this page Mar 26, 2017 · 58 revisions

4.0

The Tesseract 4.0 alpha source code is available in the 'master' branch of the repository. It adds a new OCR engine based on LSTM neural networks. Initially it works, and works well, on x86/Linux. Model data for 101 languages is available in the tessdata repository.

Documentation

Training Tesseract LSTM engine

Box files in the 3.0 format can be converted for use with LSTM training by marking the end of each text line with an extra box entry whose symbol is a tab character. The 'Mark EOL' and 'Mark EOL Bulk' functions under Edit in the Box Editor tab of the latest version of jTessBoxEditor (jTessBoxEditor-2.0-Beta) can do this automatically.
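
For illustration, here is a minimal Python sketch of that conversion, assuming plain 3.0 box lines of the form `<symbol> <left> <bottom> <right> <top> <page>`. A 3.0 box file does not record where a text line ends, so the helper `starts_new_line` uses a hypothetical heuristic: a new line begins when the page changes or the next box starts to the left of the previous one. The coordinates given to the tab marker (a 1-pixel box just right of the last character) are likewise an assumption and not necessarily what jTessBoxEditor writes. All function names are hypothetical.

```python
from typing import List, Tuple

# One box-file entry: (symbol, left, bottom, right, top, page)
Box = Tuple[str, int, int, int, int, int]


def parse_box_line(line: str) -> Box:
    """Parse '<symbol> <left> <bottom> <right> <top> <page>'."""
    # Split from the right so the symbol is everything before the five numbers.
    symbol, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return symbol, int(left), int(bottom), int(right), int(top), int(page)


def format_box(box: Box) -> str:
    """Serialize a box back into a space-separated box-file line."""
    return " ".join(str(field) for field in box)


def starts_new_line(prev: Box, nxt: Box) -> bool:
    # Hypothetical heuristic: the page changes, or the next box starts to the
    # left of the previous box (a "carriage return").
    return nxt[5] != prev[5] or nxt[1] < prev[1]


def eol_box(last: Box) -> Box:
    # Assumption: the tab marker gets a 1-pixel-wide box just right of the
    # last character, with the same vertical extent.
    _, _, bottom, right, top, page = last
    return ("\t", right, bottom, right + 1, top, page)


def add_eol_markers(lines: List[str]) -> List[str]:
    """Insert a tab-symbol box after the last box of each detected text line."""
    boxes = [parse_box_line(line) for line in lines if line.strip()]
    out: List[str] = []
    for i, box in enumerate(boxes):
        out.append(format_box(box))
        nxt = boxes[i + 1] if i + 1 < len(boxes) else None
        if nxt is None or starts_new_line(box, nxt):
            out.append(format_box(eol_box(box)))
    return out
```

For example, `add_eol_markers(["H 10 100 20 120 0", "i 22 100 26 120 0"])` returns the two original lines followed by `"\t 26 100 27 120 0"`. For real training data, the jTessBoxEditor functions above remain the practical route, since they let you check the line breaks visually.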

4.0.0-alpha ppa

Unofficial Ubuntu PPAs for Tesseract 4.00 & Leptonica 1.74:

Leptonica 1.74.1 package for Debian:

4.0.0-alpha for Windows

Unofficial experimental binaries of tesseract-ocr 4.0.0-alpha (Jan 30, 2017) are available from the following links:

4.0.0-alpha with GUI frontend

VietOCR

Windows binaries of tesseract-ocr 4.0.0-alpha with a GUI are available for VietOCR from

VietOCR can be used to download the appropriate 4.0.0-alpha traineddata for additional languages.

gImageReader

Windows binaries of tesseract-ocr 4.0.0-alpha with a GUI are available for gImageReader from

To use with the above, download the 4.0.0-alpha traineddata from the master branch of the tessdata repository. For example, for Hindi, download the following file:

https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata
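
A "blob" link opens GitHub's file viewer page rather than the file itself. Below is a minimal Python sketch for fetching a traineddata file directly, assuming GitHub's `raw` URL form on github.com and the standard-library `urllib.request.urlretrieve`. The function names and the destination directory are hypothetical, and the files on master may have been updated since this page was written.

```python
import os
import urllib.request

# Assumption: GitHub serves files from a branch via the ".../raw/<branch>/<path>" form.
TESSDATA_RAW = "https://github.com/tesseract-ocr/tessdata/raw/master"


def traineddata_url(lang: str) -> str:
    """Return the raw download URL for a language code such as 'hin'."""
    return f"{TESSDATA_RAW}/{lang}.traineddata"


def download_traineddata(lang: str, dest_dir: str) -> str:
    """Download <lang>.traineddata into dest_dir and return the saved path."""
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, f"{lang}.traineddata")
    urllib.request.urlretrieve(traineddata_url(lang), path)
    return path
```

For example, `download_traineddata("hin", "tessdata")` would save `hin.traineddata` into a local `tessdata` folder; point `dest_dir` at whichever tessdata folder your Tesseract build or GUI frontend actually reads.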

3.05-dev

An unofficial installer for Tesseract 3.05-dev for Windows is available from [Tesseract at UB Mannheim](https://github.com/UB-Mannheim/tesseract/wiki). This includes the training tools.

The [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05) can be used by those who want the bug fixes for the 3.04 release.

3.04.1

The current official release is [3.04.1](https://github.com/tesseract-ocr/tesseract/releases/tag/3.04.01).

As of 02/02/2020


These wiki pages are no longer maintained.

All pages were moved to tesseract-ocr/tessdoc.

The latest documentation is available at https://tesseract-ocr.github.io/.

