4.0 with LSTM
The Tesseract 4.0 source code is available in the 'master' branch of the repository. It adds a new OCR engine based on LSTM neural networks, which initially works well on x86/Linux. Model data for 101 languages is available in the tessdata, tessdata_best, and tessdata_fast repositories.
- DAS 2016 tutorial slides
Slides #2, #6, #7 have information about LSTM integration in Tesseract 4.0.
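To try the LSTM engine from a script, the following is a minimal sketch that builds a `tesseract` command line with `--oem 1` (LSTM only) and runs it only if the binary is found on `PATH`. The helper names `build_command` and `run_ocr`, and the file names `page.png` and `out`, are hypothetical and chosen for illustration; the sketch assumes a 4.x build with the matching traineddata already installed.

```python
import shutil
import subprocess


def build_command(image, out_base, lang="eng", oem=1, tessdata_dir=None):
    """Build the argument list for the tesseract command-line tool.

    oem=1 selects the LSTM engine only; 0 is the legacy engine.
    """
    cmd = ["tesseract", image, out_base, "-l", lang, "--oem", str(oem)]
    if tessdata_dir is not None:
        # Optional: point tesseract at a directory holding *.traineddata files.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_ocr(image, out_base, lang="eng"):
    """Run tesseract if it is installed; return True on success (hypothetical helper)."""
    if shutil.which("tesseract") is None:
        return False
    result = subprocess.run(build_command(image, out_base, lang), check=False)
    return result.returncode == 0


# Show the command that would be run for a Hindi page image.
print(" ".join(build_command("page.png", "out", lang="hin")))
```

On success, `tesseract` writes the recognized text to `out.txt` next to the given output base name.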
Ubuntu PPAs for Tesseract 4.x & Leptonica 1.7x:
Leptonica 1.74.1 package for Debian:
Unofficial experimental binaries of tesseract-ocr 4.0.0-alpha are available from the following links. Each one was built from a different commit on the master branch in early 2017. See the individual sites for more details:
- Windows Installer made with MinGW-w64 from UB Mannheim
- Zip file with cppan-generated .dll and .exe files. You have to install the VC2015 x86 redistributable from microsoft.com in order to run them.
- Win64 build of tesseract 4.0.0 alpha, leptonica 1.74.1, and charlesw/tesseract .Net wrapper - built using CPPAN for Visual Studio 2017.
Windows binaries of tesseract-ocr 4.0.0-alpha with GUI interface are available for VietOCR from
-
The Visual C++ Redistributable for Visual Studio 2015 runtime (vc_redist.x86.exe) is REQUIRED for VietOCR to run correctly.
VietOCR can be used to download the appropriate 4.0.0-alpha traineddata for additional languages.
Windows binaries of tesseract-ocr 4.0.0-alpha with GUI interface are available for gImageReader from
- gImageReader_3.2.1_qt5_i686_tesseract4.0.0.git2f10be5.exe
- gImageReader_3.2.1_qt5_x86_64_tesseract4.0.0.git2f10be5.exe
To use the binaries above, download the 4.0.0-alpha traineddata from the master branch of tessdata. For example, for Hindi, download the following file:
https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata
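The download step can also be scripted. The sketch below derives the traineddata link for a language code from the URL pattern shown above, then rewrites it to a direct-download form. The helper names are hypothetical, and the `/blob/` to `/raw/` rewrite is an assumption based on the usual GitHub URL layout; it may not hold on every mirror host.

```python
import os
from urllib.request import urlretrieve

# Host and organization copied from the link above; adjust for a different mirror.
BASE = "https://github.com/tesseract-ocr"


def traineddata_page_url(lang, repo="tessdata", branch="master"):
    """Return the repository page URL for a language's traineddata file."""
    return f"{BASE}/{repo}/blob/{branch}/{lang}.traineddata"


def raw_download_url(page_url):
    """Turn a '/blob/' page URL into a '/raw/' URL (assumed GitHub-style layout)."""
    return page_url.replace("/blob/", "/raw/", 1)


def download_traineddata(lang, dest_dir="."):
    """Download <lang>.traineddata into dest_dir and return the local path."""
    url = raw_download_url(traineddata_page_url(lang))
    dest = os.path.join(dest_dir, f"{lang}.traineddata")
    urlretrieve(url, dest)  # network access happens only when this is called
    return dest


# Print the link for Hindi without downloading anything.
print(traineddata_page_url("hin"))
```

Calling `download_traineddata("hin", dest_dir)` with the tessdata directory of the installation would then fetch the file; the other repositories named earlier (tessdata_best, tessdata_fast) can be selected through the `repo` argument of `traineddata_page_url`.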
The [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05) can be used by those who want the bug fixes for the 3.05.01 release.
An unofficial installer for Tesseract 3.05-dev for Windows is available from Tesseract at UB Mannheim. This includes the training tools.
The current official release is 3.05.01.
Old wiki - no longer maintained. The pages were moved; see the new documentation.
All pages were moved to tesseract-ocr/tessdoc.
The latest documentation is available at https://tesseract-ocr.github.io/.