Commit 89c329c

Update README.md
1 parent d5fdb4c commit 89c329c

1 file changed, +27 -3 lines changed

README.md

Lines changed: 27 additions & 3 deletions
@@ -1,6 +1,4 @@
-NB: The `clstm` subproject is now in its own repository at
-
-https://github.com/tmbdev/clstm
+Note: The text line recognizer has been ported to C++ and is now a separate project, the CLSTM project, available here: https://github.com/tmbdev/clstm
 
 ocropy
 ======
@@ -73,6 +71,32 @@ You can also generate training data using ocropus-linegen:
 This will create a directory "linegen/..." containing training data
 suitable for training OCRopus with synthetic data.
 
+## CLSTM vs OCRopy
+
+The CLSTM project (https://github.com/tmbdev/clstm) is a replacement for
+`ocropus-rtrain` and `ocropus-rpred` in C++ (it used to be a subproject of
+`ocropy` but has been moved into a separate project now). It is significantly faster than
+the Python versions and has minimal library dependencies, so it is suitable
+for embedding into C++ programs.
+
+Python and C++ models can _not_ be interchanged, both because the save file
+formats are different and because the text line normalization is slightly
+different. Error rates are about the same.
+
+In addition, the C++ command line tool (`clstmctc`) has different command line
+options and currently requires loading training data into HDF5 files, instead
+of being trained off a list of image files directly (image file-based training
+will be added to `clstmctc` soon).
+
+Generally, your best bet for CLSTM and OCRopy is to rely only on the command
+line tools; that makes it easy to replace different components. In addition, you
+should keep your OCR training data in .png/.gt.txt files so that you can easily
+retrain models as better recognizers become available.
+
+After making CLSTM a full replacement for `ocropus-rtrain`/`ocropus-rpred`, the
+next step will be to replace the binarization, text/image segmentation, and layout
+analysis in OCRopus with trainable 2D LSTM models.
+
 ## Solution for clang
 
 [Read README_OSX.md](README_OSX.md)
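
The README text added in this commit recommends keeping OCR training data as .png/.gt.txt files so that models can be retrained later. As a rough illustration of why that layout is convenient, here is a minimal Python sketch that collects image/transcript pairs from a directory tree. The function name `find_training_pairs` is hypothetical, and the pairing rule (a line image `x.png` or `x.bin.png` next to a transcript `x.gt.txt`) is an assumption about the naming convention, not something this commit specifies.

```python
from pathlib import Path


def find_training_pairs(root):
    """Return (image_path, transcript) pairs found under root, sorted by path.

    Assumed convention: a line image named `x.png` or `x.bin.png` is paired
    with a transcript `x.gt.txt` in the same directory. Images that have no
    transcript file are skipped.
    """
    pairs = []
    for image in sorted(Path(root).rglob("*.png")):
        # Strip ".png", then an optional ".bin", to get the shared base name.
        stem = image.name[: -len(".png")]
        if stem.endswith(".bin"):
            stem = stem[: -len(".bin")]
        gt = image.with_name(stem + ".gt.txt")
        if gt.is_file():
            transcript = gt.read_text(encoding="utf-8").strip()
            pairs.append((image, transcript))
    return pairs
```

Because the pairs are plain files on disk, the same list could feed a Python trainer today and be converted to whatever input a different recognizer expects later, without touching the data itself.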
