
4.0 with LSTM

Shreeshrii edited this page Mar 5, 2017 · 58 revisions

4.0

The Tesseract 4.0 alpha source code is available in the 'master' branch of the repository. It adds a new OCR engine based on LSTM neural networks. At this early stage it is known to work well on x86/Linux. Model data for 101 languages is available in the tessdata repository.

Documentation

4.0.0-alpha PPA

Unofficial Ubuntu PPAs for Tesseract 4.00 & Leptonica 1.74:

Leptonica 1.74.1 package for Debian:

4.0.0-alpha for Windows

Unofficial experimental binaries of tesseract-ocr 4.0.0-alpha (Jan 30, 2017) are available from the following links:

4.0.0-alpha with GUI frontend

VietOCR

Windows binaries of tesseract-ocr 4.0.0-alpha with a GUI are available for VietOCR from

VietOCR can also be used to download the appropriate 4.0.0-alpha traineddata for additional languages.

gImageReader

Windows binaries of tesseract-ocr 4.0.0-alpha with a GUI are available for gImageReader from

To use these binaries, download the 4.0.0-alpha traineddata from the master branch of tessdata. For example, for Hindi, download the following file:

https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata
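A `blob` link like the one above opens GitHub's HTML view of the file, so fetching it directly with a download tool may save a web page instead of the model. The sketch below is a minimal illustration, assuming GitHub's usual convention that replacing `/blob/` with `/raw/` in a file URL returns the raw file, and assuming the `github.com` host rather than the mirror host shown above. The helper names `raw_url` and `download_traineddata` are hypothetical, and the destination directory is whatever tessdata directory your Tesseract build actually uses.

```python
import urllib.request
from pathlib import Path


def raw_url(blob_url: str) -> str:
    """Turn a GitHub 'blob' (HTML view) URL into the matching 'raw' URL.

    Assumption: the URL follows GitHub's /<owner>/<repo>/blob/<branch>/<path> layout.
    """
    if "/blob/" not in blob_url:
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    # Only the first '/blob/' is the path segment that selects the HTML view.
    return blob_url.replace("/blob/", "/raw/", 1)


def download_traineddata(lang: str, tessdata_dir: Path) -> Path:
    """Hypothetical helper: fetch <lang>.traineddata from the tessdata master branch."""
    # Assumed host and path layout; adjust if you use a mirror.
    blob = f"https://github.com/tesseract-ocr/tessdata/blob/master/{lang}.traineddata"
    dest = tessdata_dir / f"{lang}.traineddata"
    tessdata_dir.mkdir(parents=True, exist_ok=True)
    # Network access required; writes the raw binary model file to dest.
    urllib.request.urlretrieve(raw_url(blob), dest)
    return dest
```

For example, `download_traineddata("hin", Path("tessdata"))` would save `tessdata/hin.traineddata`. Tesseract must then be pointed at that directory, for instance through the `TESSDATA_PREFIX` environment variable; check your version's documentation for the exact value it expects.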

3.05-dev

An unofficial installer for Tesseract 3.05-dev for Windows is available from [Tesseract at UB Mannheim](https://github.com/UB-Mannheim/tesseract/wiki). This includes the training tools.

The [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05) can be used by those who want the bug fixes to the 3.04 release.

3.04.1

The current official release is [3.04.1](https://github.com/tesseract-ocr/tesseract/releases/tag/3.04.01).

As of 02/02/2020, these wiki pages are no longer maintained.

All pages were moved to tesseract-ocr/tessdoc.

The latest documentation is available at https://tesseract-ocr.github.io/.

