
Commit ae84a8e

Merge branch 'master' of github.com:tmbdev/ocropy
2 parents 3b843f5 + 7eac431 commit ae84a8e

24 files changed: 1153 additions and 160 deletions

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,5 @@
 OLD
 JUNK
-.hg
 book/
 temp/
 models/
@@ -13,3 +12,5 @@ build/
 *.os
 *.a
 *.so
+.~*.vue
+doc/.ipynb_checkpoints/

.hgignore

Lines changed: 0 additions & 39 deletions
This file was deleted.

.travis.yml

Lines changed: 2 additions & 1 deletion
@@ -45,4 +45,5 @@ install:
 script:
 - mkdir ../test_folder
 - cd ../test_folder
-- ../ocropy/run-test
+- ../ocropy/tests/run-unit
+- ../ocropy/run-test-ci

README.md

Lines changed: 17 additions & 12 deletions
@@ -1,13 +1,12 @@
-------------------------
-| Project Announcements
-|:-----------------------
-| The text line recognizer has been ported to C++ and is now a separate project, the CLSTM project, available here: https://github.com/tmbdev/clstm
-| Please welcome @zuphilip and @kba as additional project maintainers. @tmb is busy developing new DNN models for document analysis (among other things). (10/15/2016)
-------------------------
-
 ocropy
 ======
 
+[![Build Status](https://travis-ci.org/tmbdev/ocropy.svg)](https://travis-ci.org/tmbdev/ocropy)
+[![CircleCI](https://circleci.com/gh/UB-Mannheim/ocropy/tree/pull%2F4.svg?style=svg)](https://circleci.com/gh/UB-Mannheim/ocropy/tree/pull%2F4)
+[![license](https://img.shields.io/github/license/tmbdev/ocropy.svg)](https://github.com/tmbdev/ocropy/blob/master/LICENSE)
+[![Wiki](https://img.shields.io/badge/wiki-11%20pages-orange.svg)](https://github.com/tmbdev/ocropy/wiki)
+[![Join the chat at https://gitter.im/tmbdev/ocropy](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/tmbdev/ocropy?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
+
 OCRopus is a collection of document analysis programs, not a turn-key OCR system.
 In order to apply it to your documents, you may need to do some image preprocessing,
 and possibly also train new models.
@@ -21,24 +20,22 @@ trace by default since it seems to confuse too many users).
 Installing
 ----------
 
-[![Build Status](https://travis-ci.org/tmbdev/ocropy.svg)](https://travis-ci.org/tmbdev/ocropy)
-[![Join the chat at https://gitter.im/tmbdev/ocropy](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/tmbdev/ocropy?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
-
 To install OCRopus dependencies system-wide:
 
 $ sudo apt-get install $(cat PACKAGES)
 $ wget -nd http://www.tmbdev.net/en-default.pyrnn.gz
 $ mv en-default.pyrnn.gz models/
 $ sudo python setup.py install
 
-Alternatively, dependencies can be installed into a [Python Virtual Environment]
-(http://docs.python-guide.org/en/latest/dev/virtualenvs/):
+Alternatively, dependencies can be installed into a
+[Python Virtual Environment](http://docs.python-guide.org/en/latest/dev/virtualenvs/):
 
 $ virtualenv ocropus_venv/
 $ source ocropus_venv/bin/activate
 $ pip install -r requirements.txt
 $ wget -nd http://www.tmbdev.net/en-default.pyrnn.gz
 $ mv en-default.pyrnn.gz models/
+$ python setup.py install
 
 An additional method using [Conda](http://conda.pydata.org/) is also possible:
 
@@ -97,6 +94,14 @@ suitable for training OCRopus with synthetic data.
 
 ## Roadmap
 
+------------------------
+| Project Announcements
+|:-----------------------
+| The text line recognizer has been ported to C++ and is now a separate project, the CLSTM project, available here: https://github.com/tmbdev/clstm
+| New GPU-capable text line recognizers and deep-learning based layout analysis methods are in the works and will be published as separate projects some time in 2017.
+| Please welcome @zuphilip and @kba as additional project maintainers. @tmb is busy developing new DNN models for document analysis (among other things). (10/15/2016)
+------------------------
+
 A lot of excellent packages have become available for deep learning, vision, and GPU computing over the last few years.
 At the same time, it has become feasible now to address problems like layout analysis and text line following
 through attentional and reinforcement learning mechanisms. I (@tmb) am planning on developing new software using these

circle.yml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+machine:
+  python:
+    version: 2.7.12
+  environment:
+    # Set matplotlb backend to the non-interactive antigrain image lib
+    MPLBACKEND: Agg
+dependencies:
+  pre:
+    # 'models' folder is cached, don't download twice
+    - cd models && wget -nc http://www.tmbdev.net/en-default.pyrnn.gz
+    # Pipe to cat to hide the progress bars
+    - pip install -r requirements.txt|cat
+  cache_directories:
+    - models
+test:
+  override:
+    - PATH=$PWD:$PATH ./run-test-ci

doc/dewarp.odg

14.9 KB
Binary file not shown.

doc/dewarp.png

69.2 KB

doc/line-normalization.ipynb

Lines changed: 290 additions & 0 deletions
Large diffs are not rendered by default.
File renamed without changes.

doc/workflow.html

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+<html><head><title>workflow.vue</title><script src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.1/jquery.min.js" type="text/javascript"></script><script type="text/javascript">
+jQuery.noConflict();
+</script>
+<script src="http://vue.tufts.edu/htmlexport-includes/jquery.maphilight.min.js" type="text/javascript"></script><script src="http://vue.tufts.edu/htmlexport-includes/v3/tooltip.min.js" type="text/javascript"></script><script type="text/javascript">jQuery(function() {jQuery.fn.maphilight.defaults = {
+fill: false,
+fillColor: '000000',
+fillOpacity: 0.2,
+stroke: true,
+strokeColor: '282828',
+strokeOpacity: 1,
+strokeWidth: 4,
+fade: true,
+alwaysOn: false
+}
+jQuery('.example2 img').maphilight();
+});
+</script>
+<style type="text/css">
+#tooltip{
+position:absolute;
+border:1px solid #333;
+background:#f7f5d1;
+padding:2px 5px;
+color:#333;
+display:none;
+}
+</style>
+</head><body>
+<div class="example2"><img class="map" src="workflow.png" width="959.0" height="626.0" usemap="#vuemap"><map name="vuemap"> <area id="node0" shape="rect" coords="162,312,293,365"></area>
+<area href="https://github.com/tmbdev/ocropy/wiki/Compute-errors-and-confusions" target="_blank" id="node1" shape="rect" coords="718,316,849,369"></area>
+<area href="https://github.com/tmbdev/ocropy/wiki/Page-Segmentation" target="_blank" id="node2" shape="rect" coords="241,215,385,268"></area>
+<area href="https://github.com/tmbdev/ocropy/wiki/Working-with-Ground-Truth" target="_blank" id="node3" shape="rect" coords="482,310,627,363"></area>
+<area id="node4" shape="rect" coords="490,113,621,166"></area>
+<area id="node5" shape="rect" coords="715,505,846,558"></area>
+<area id="node6" shape="rect" coords="15,212,146,265"></area>
+<area id="node7" shape="rect" coords="489,215,620,268"></area>
+<area id="node8" shape="rect" coords="716,420,847,473"></area>
+<area id="node9" shape="rect" coords="652,112,801,165"></area>
+<area href="https://github.com/tmbdev/ocropy/wiki/Working-with-Ground-Truth" target="_blank" id="node10" shape="rect" coords="476,419,632,472"></area>
+<area id="node11" shape="rect" coords="42,163,118,186"></area>
+<area id="node12" shape="rect" coords="716,586,777,609"></area>
+<area id="node13" shape="rect" coords="801,586,860,609"></area>
+<area href="https://github.com/tmbdev/hocr-tools#hocr-pdf" target="_blank" id="node14" shape="rect" coords="492,17,623,70"></area>
+<area id="node15" shape="rect" coords="707,231,799,254"></area>
+<area id="node16" shape="rect" coords="660,33,820,56"></area>
+<area id="node17" shape="rect" coords="870,319,942,357"></area>
+<area id="node18" shape="rect" coords="888,435,934,458"></area>
+
+</map></div></body></html>
