Description
Hi,
I'm facing two problems:
1 - I need to use ocropy to extract text from documents in Portuguese.
So far, I have added the required characters to chars.py, and I am training the network, starting from a previously trained model, following this guide: https://github.com/tmbdev/ocropy/wiki/Working-with-Ground-Truth.
2 - I know about the document quality requirements (300 dpi), but some of my images are poor scans. I've tried the same images with other APIs (such as Google Vision) and got better results, but I like ocropy.
I'm wondering whether there are preprocessing techniques that can improve the results.
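One generic recipe for poor scans is to upscale low-resolution images toward 300 dpi, remove speckle noise, and binarize. The sketch below does this with plain NumPy (nearest-neighbour upscaling, a 3x3 median filter, and Otsu's global threshold). It is a standard image-processing technique, not part of ocropy; ocropy's own ocropus-nlbin already binarizes pages, so treat this as an optional pre-step for experimenting. The `factor=2` default assumes roughly 150 dpi input.

```python
import numpy as np


def upscale(img, factor):
    """Nearest-neighbour upscaling by an integer factor (e.g. 2 for 150 dpi -> 300 dpi)."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)


def median3x3(img):
    """3x3 median filter: removes small isolated specks while keeping larger strokes."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    # Stack the nine shifted views of the image and take the per-pixel median.
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(windows), axis=0).astype(img.dtype)


def otsu_threshold(img):
    """Otsu's method: the grey level that maximises between-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    levels = np.arange(256)
    w0 = np.cumsum(hist)                 # pixel count at or below each level
    w1 = hist.sum() - w0                 # pixel count above each level
    sum0 = np.cumsum(hist * levels)
    mu0 = np.divide(sum0, w0, out=np.zeros(256), where=w0 > 0)
    mu1 = np.divide(sum0[-1] - sum0, w1, out=np.zeros(256), where=w1 > 0)
    return int(np.argmax(w0 * w1 * (mu0 - mu1) ** 2))


def preprocess(img, factor=2):
    """Upscale, despeckle and binarise an 8-bit greyscale page (0 = black ink)."""
    img = upscale(img, factor)
    img = median3x3(img)
    t = otsu_threshold(img)
    # Pixels brighter than the threshold become white, the rest black.
    return np.where(img > t, 255, 0).astype(np.uint8)
```

To try it on a real page, one option is Pillow: `np.array(Image.open("page.png").convert("L"))` to load and `Image.fromarray(out).save("page_clean.png")` to save. Nearest-neighbour upscaling keeps the sketch dependency-free; `scipy.ndimage.zoom` with `order=1` or higher gives smoother edges, and a global Otsu threshold can struggle with uneven lighting, where a local (adaptive) threshold may do better.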
So, what can I do? What is the best way to generate data for training the ocropy network?
Edit: does ocropy training support multithreading?
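For hand-made ground truth, the wiki workflow pairs each line image with a transcription file. The sketch below assumes the `<name>.bin.png` / `<name>.gt.txt` naming used there, plus a hypothetical `CHARSET` (ASCII letters, digits and punctuation plus Portuguese accented letters and ordinal indicators) that you would keep in sync with the character list you extended. It writes UTF-8 transcriptions and lists any characters in the ground truth that the character set does not cover, which is worth checking before training.

```python
# -*- coding: utf-8 -*-
import io
import string

# Hypothetical character set; adjust it to match the list used for training.
PORTUGUESE_EXTRA = u"áàâãéêíóôõúçÁÀÂÃÉÊÍÓÔÕÚÇªº"
CHARSET = set(string.ascii_letters + string.digits + string.punctuation + " ")
CHARSET |= set(PORTUGUESE_EXTRA)


def gt_path(image_path):
    """Map 'line.bin.png' (or 'line.png') to 'line.gt.txt' (assumed naming convention)."""
    base = image_path
    for ext in (".bin.png", ".png"):
        if base.endswith(ext):
            base = base[:-len(ext)]
            break
    return base + ".gt.txt"


def write_ground_truth(image_path, text):
    """Write one line's transcription as UTF-8 next to its image (text must be unicode)."""
    with io.open(gt_path(image_path), "w", encoding="utf-8") as f:
        f.write(text.strip())


def missing_chars(texts, charset=CHARSET):
    """Return the sorted characters that appear in texts but not in charset."""
    missing = set()
    for t in texts:
        missing.update(c for c in t if c not in charset)
    return sorted(missing)
```

Running `missing_chars` over all existing `.gt.txt` files before a long training run is a cheap way to catch an accent or symbol that was transcribed but never added to the character list.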
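Whether a single training run can use several cores is the open question here. Independently of that, per-page steps such as binarization are independent of each other and can be run side by side. The sketch below (Python 3 standard library, a generic helper rather than an ocropy feature) launches independent command lines concurrently; threads are enough because each one only waits on its own subprocess.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, workers=4):
    """Run independent command lines concurrently; return their exit codes in input order.

    Each command is a list of arguments, as accepted by subprocess.call.
    The real work happens in separate OS processes, so threads suffice.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(subprocess.call, commands))
```

For example, following the `ocropus-nlbin <image> -o <dir>` pattern from the ocropy README, one could build `[["ocropus-nlbin", page, "-o", "book_%03d" % i] for i, page in enumerate(pages)]`, where `pages` is a hypothetical list of image paths. Each page gets its own output directory so parallel runs do not write to the same files.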
Thanks!
-
Python version:
Python 2.7.14 :: Anaconda, Inc.
-
Git revision of ocropy:
commit e9b6121
Merge: 43381c4 289a58f
Author: Konstantin Baierer <kba@users.noreply.github.com>
Date: Mon Feb 19 19:24:12 2018 +0100

Merge pull request #236 from lehzwo/master

ocropus-gpageseg: Enable usage of masks to specify column separators/ ignore areas of scan
-
Operating System and version:
Linux ubuntu-virtual 4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:19:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux