4.0 with LSTM
Tesseract 4.0 alpha source code is available in the 'master' branch of the repository. It adds a new OCR engine based on LSTM neural networks. It initially works (well) on x86/Linux. Model data for 101 languages is available in the tessdata repository.
- DAS 2016 tutorial slides. Slides #2, #6, #7 have information about LSTM integration in Tesseract 4.0.
- [4.0 Accuracy and Performance](4.0 Accuracy and Performance)
- TrainingTesseract 4.00 - Replace Top Layer Example - Norwegian
- TrainingTesseract 4.00 - Replace Top Layer Example - Devanagari
Unofficial Ubuntu PPAs for Tesseract 4.00 & Leptonica 1.74:
Leptonica 1.74.1 package for Debian:
Unofficial experimental binaries of tesseract-ocr 4.0.0-alpha (Jan 30, 2017) are available from the following links:
- Windows Installer made with MinGW-w64 from UB Mannheim
- Zip file with cppan-generated .dll and .exe files. To run them, you have to install the VC2015 x86 redist from microsoft.com.
Windows binaries of tesseract-ocr 4.0.0-alpha with GUI interface are available for VietOCR from
-
Visual C++ Redistributable for Visual Studio 2015 runtime - vc_redist.x86.exe is REQUIRED for VietOCR to run correctly.
VietOCR can be used to download the appropriate 4.0.0-alpha traineddata for additional languages.
Windows binaries of tesseract-ocr 4.0.0-alpha with GUI interface are available for gImageReader from
- gImageReader_3.2.1_qt5_i686_tesseract4.0.0.git2f10be5.exe
- gImageReader_3.2.1_qt5_x86_64_tesseract4.0.0.git2f10be5.exe
Download the 4.0.0-alpha traineddata to use with the above from the master branch of tessdata. For example, for Hindi, download the following file:
https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata
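Once a traineddata file has been downloaded, using it from a script might look like the minimal Python sketch below. It assumes the `tesseract` executable is on the PATH and that the language file was saved into a local folder; the function names, `page.png`, and the `tessdata` folder name are hypothetical and chosen only for illustration.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, lang, tessdata_dir):
    """Build the argument list for one tesseract run (nothing is executed here)."""
    return [
        "tesseract",
        str(image),                           # input image
        str(output_base),                     # tesseract appends .txt to this base name
        "--tessdata-dir", str(tessdata_dir),  # folder holding the *.traineddata files
        "-l", lang,                           # language code, e.g. "hin" for Hindi
    ]


def run_ocr(image, output_base, lang, tessdata_dir):
    """Run tesseract after checking that the language model file is present."""
    model = Path(tessdata_dir) / f"{lang}.traineddata"
    if not model.is_file():
        raise FileNotFoundError(f"missing language model: {model}")
    cmd = build_tesseract_command(image, output_base, lang, tessdata_dir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if tesseract fails
    return cmd
```

For Hindi, a call such as `run_ocr("page.png", "out", "hin", "tessdata")` would write the recognized text to `out.txt`, provided `tessdata/hin.traineddata` exists. Checking for the model file first gives a clearer error than the one tesseract prints when a language cannot be loaded.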
An unofficial installer for Tesseract 3.05-dev for Windows is available from [Tesseract at UB Mannheim](https://github.com/UB-Mannheim/tesseract/wiki). This includes the training tools.
The [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05) can be used by those who want the bug fixes for the 3.04 release.
The current official release is [3.04.1](https://github.com/tesseract-ocr/tesseract/releases/tag/3.04.01).
Old wiki - no longer maintained. The pages were moved; see the new documentation.
These wiki pages are no longer maintained.
All pages were moved to tesseract-ocr/tessdoc.
The latest documentation is available at https://tesseract-ocr.github.io/.