How can I improve ocropus accuracy? #296

Hi,
I'm facing 2 problems:

1 - I need to use ocropy to extract text from documents in Portuguese.
So far, I have added the required characters to char.py, and I am retraining the network, starting from a previously trained model, following this guide: https://github.com/tmbdev/ocropy/wiki/Working-with-Ground-Truth.

2 - I know about the document quality requirements (300 dpi), but some of my images are bad scans. I tried the same images with other APIs (such as Google Vision) and got better results, but I like ocropy.
I'm wondering whether there are preprocessing techniques that can improve the results.
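
To illustrate the kind of preprocessing being asked about, here is a minimal sketch in plain Python. It assumes the scan is already loaded as rows of 0..255 grayscale values. The function names (`upscale_factor`, `stretch_contrast`) and the 5/95 percentile defaults are hypothetical choices for illustration, not part of ocropy, and `ocropus-nlbin` already applies its own normalization, so whether a step like this helps would need to be measured.

```python
def upscale_factor(scan_dpi, target_dpi=300):
    """Return the factor needed to resample a scan to roughly target_dpi.

    Scans at or above the target are left alone (factor 1.0).
    """
    if scan_dpi <= 0:
        raise ValueError("scan_dpi must be positive")
    return max(1.0, float(target_dpi) / scan_dpi)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    index = int(round(pct / 100.0 * (len(sorted_values) - 1)))
    return sorted_values[index]


def stretch_contrast(pixels, low_pct=5, high_pct=95):
    """Linearly map the low/high percentiles of a grayscale image to 0 and 255.

    `pixels` is a list of rows, each a list of ints in 0..255.
    Values outside the percentile range are clipped.
    """
    flat = sorted(v for row in pixels for v in row)
    lo = percentile(flat, low_pct)
    hi = percentile(flat, high_pct)
    if hi <= lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [list(row) for row in pixels]
    scale = 255.0 / (hi - lo)
    return [
        [int(round(min(255.0, max(0.0, (v - lo) * scale)))) for v in row]
        for row in pixels
    ]


# Hypothetical example: a faded 2x3 scan whose values only span 100..150.
faded = [[100, 110, 120], [130, 140, 150]]
stretched = stretch_contrast(faded)
factor = upscale_factor(150)  # a 150 dpi scan would need resampling by this factor
```

The actual resampling and file I/O are left out on purpose, since they depend on which imaging library is available.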

So, what can I do? What is the best way to generate training data for an ocropy network?
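
For context on the training data question, here is a minimal sketch of a sanity check that could be run before training. It assumes the layout described in the Working-with-Ground-Truth wiki page, where each line image `*.bin.png` sits next to a UTF-8 transcript `*.gt.txt`. The helper name `audit_ground_truth` and the `charset` argument are hypothetical; the character set would have to be built from whatever was added to char.py.

```python
import io
import os


def audit_ground_truth(root, charset):
    """Walk `root` and report problems with line-level ground truth.

    Returns (missing, uncovered):
      missing   - sorted paths of line images (*.bin.png) with no *.gt.txt
      uncovered - set of transcript characters that are not in `charset`
    """
    suffix = ".bin.png"
    missing, uncovered = [], set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            gt_path = os.path.join(dirpath, name[:-len(suffix)] + ".gt.txt")
            if not os.path.exists(gt_path):
                missing.append(os.path.join(dirpath, name))
                continue
            with io.open(gt_path, encoding="utf-8") as handle:
                text = handle.read().strip()
            # Collect every character the model's character set cannot emit.
            uncovered.update(ch for ch in text if ch not in charset)
    return sorted(missing), uncovered
```

Called as `audit_ground_truth("book", my_charset)`, it lists lines that still need a transcript and any Portuguese characters (for example accented vowels or `ç`) that appear in the ground truth but are absent from the character set.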
Edit: does ocropy training support multithreading?
Thanks!
  




* Python version:
Python 2.7.14 :: Anaconda, Inc.

* Git revision of ocropy: 
commit e9b6121de2637e54495125c6a97a4ef75d872a2e
Merge: 43381c4 289a58f
Author: Konstantin Baierer <kba@users.noreply.github.com>
Date:   Mon Feb 19 19:24:12 2018 +0100

    Merge pull request #236 from lehzwo/master
    
    ocropus-gpageseg: Enable usage of masks to specify column separators/ ignore areas of scan

* Operating System and version:
Linux ubuntu-virtual 4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:19:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


