-
Notifications
You must be signed in to change notification settings - Fork 0
Compiling
Note: This wiki expects you to be familiar with compiling software on your operation system.
Autotools Leptonica
If they are not already installed, you need the following libraries (Ubuntu):
sudo apt-get install autoconf automake libtool
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libtiff4-dev
sudo apt-get install zlib1g-dev
sudo apt-get install libicu-dev # (if you plan to make the training tools)
sudo apt-get install libpango1.0-dev # (if you plan to make the training tools)
sudo apt-get install libcairo2-dev # (if you plan to make the training tools)
You also need to install Leptonica. There is an apt-get package libleptonica-dev, but if you are using an oldish version of Linux, the Leptonica version may be too old, so you will need to build from source.
** 3.01 requires at least v1.67 of Leptonica.
** 3.02 requires at least v1.69 of Leptonica. (Both available in Ubuntu 12.04 Precise Pangolin.)
** 3.03 requires at least v1.70 of Leptonica. (Both available in Ubuntu 14.04 Trusty Tahr.)**
** 3.04 requires at least v1.71 of Leptonica.**
The sources are at http://www.leptonica.org/. The instructions at Leptonica README are clear, but basically it is as described in Compilation below.
Ensure that the development headers for Leptonica are installed before compiling Tesseract. Note that if building Leptonica from source, you may need to ensure that /usr/local/lib is in your library path. This is a standard Linux bug, and the information at Stackoverflow is very helpful.
Tesseract uses a standard autotools based build system, so the compilation process should be familiar.
./autogen.sh
./configure
make
sudo make install
sudo ldconfig
On some systems autotools does not create m4 directory automatically (giving the error: "configure: error: cannot find macro directory 'm4'"). In this case you must create m4 directory (mkdir m4
), and then rerun the above commands starting with ./configure.
If you want the training tools (3.03), you will also need to run the following commands:
make training
sudo make training-install
Build of training tools is not available if you do not have necessary dependencies (pay attention to messages from ./configure script).
Tesseract can be configured to install anywhere, which makes it possible to install it without root access.
To install it in $HOME/local:
./autogen.sh
./configure --prefix=$HOME/local/
make install
To install it in $HOME/local using Leptonica libraries also installed in $HOME/local:
./autogen.sh
LIBLEPT_HEADERSDIR=$HOME/local/include ./configure \
--prefix=$HOME/local/ --with-extra-libraries=$HOME/local/lib
make install
- Download langugage data file (e.g. 'wget http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.01.eng.tar.gz' for 3.01 version)
- Decompress it ('tar xf tesseract-ocr-3.01.eng.tar.gz')
- Move it to installation of tessdata (e.g. 'mv tesseract-ocr/tessdata $TESSDATA_PREFIX' if defined TESSDATA_PREFIX)
You can also use:
export TESSDATA_PREFIX=/some/path/to/tessdata
to point to your tessdata directory (example: if your tessdata path is '/usr/local/share/tessdata' you have to use 'export TESSDATA_PREFIX='/usr/local/share/').
For tesseract-ocr 3.02 please follow instruction in Visual Studio 2008 Developer Notes for Tesseract-OCR.
Download these packages from the Downloads page:
-
tesseract-[version].tar.gz
- Tesseract source -
tesseract-[version]-win_vs.zip
- Visual studio (2008 & 2010) solution with necessary libraries -
tesseract-ocr-[version].eng.tar.gz
- English language file for tesseract (or download other language training file)
Unpack them to one directory (e.g. tesseract-3.01
). Note that tesseract-ocr-[version].eng.tar.gz
names the root directory 'tesseract-ocr'
instead of 'tesseract-[version]'
.
Windows relevant files are located in vs2008 directory (e.g. 'tesseract-3.01\vs2008'). The same build process as usual applies: Open tesseract.sln with VC++Express 2008 and build all (or just Tesseract.) It should compile (in at least release mode) without having to install anything further. The dll dependencies and Leptonica are included. Output will be in tesseract-3.01\vs2008\bin (or tesseract-3.01\vs2008\bin.rd or tesseract-3.01\vs2008\bin.dbg based on configuration build).
Have a look at blog How to build Tesseract 3.03 with Visual Studio 2013.
For Mingw+Msys have a look at blog Compiling Leptonica and Tesseract-ocr with Mingw+Msys.
For Cygwin have a look at blog How to build Tesseract on Cygwin. Simon Eigeldinger started provide tesseract compiled by cygwin.