Commit fbc4858

add layout parser examples and docs
1 parent ac1a540 commit fbc4858

2 files changed, +35 -6 lines changed

docs/api-reference.rst

Lines changed: 7 additions & 5 deletions
@@ -87,16 +87,18 @@ The ``ocrpy.experimental`` module contains experimental features that are not ye
 or be removed in future releases.
 
 Currently it exposes the :class:`ocrpy.experimental.document_classifier.DocumentClassifier`
-class which can be used to classify documents into various categories.
-
-Other experimental features which are not exposed include document layout parsers which would enable
-you to extract the layout of a document and extract the text from the document with different type of parsers
-like table or text parsers.
+class which can be used to classify documents into various categories & :class:`ocrpy.experimental.layout_parser.DocumentLayoutParser`
+class which can be used to identify different components of a document like text, title, table, figures etc.
 
+These can be used along with the ocr pipelines, as preprocessing utils to identify different types of documents
+and their layout and launch appropriate ocr pipelines for custom processing.
 
 
 .. autoclass:: ocrpy.experimental.document_classifier.DocumentClassifier
 .. automethod:: ocrpy.experimental.document_classifier.DocumentClassifier.predict
 
+.. autoclass:: ocrpy.experimental.layout_parser.DocumentLayoutParser
+.. automethod:: ocrpy.experimental.layout_parser.DocumentLayoutParser.parse
+
 
 ``ocrpy.utils``
 ---------------
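
The added paragraph in this file describes using the classifier and layout parser as preprocessing steps that choose which OCR pipeline to launch. A minimal sketch of that routing idea follows. The pipeline functions and the category label ``"invoice"`` are hypothetical placeholders, not part of ``ocrpy``, and the category string is assumed to come from a classifier prediction.

```python
from typing import Callable, Dict


# Hypothetical pipeline functions; real OCR pipelines would be called here.
def run_invoice_pipeline(path: str) -> str:
    return f"invoice pipeline: {path}"


def run_generic_pipeline(path: str) -> str:
    return f"generic pipeline: {path}"


# Map predicted categories (assumed labels) to the pipeline that handles them.
PIPELINES: Dict[str, Callable[[str], str]] = {
    "invoice": run_invoice_pipeline,
}


def route_document(path: str, category: str) -> str:
    """Pick a pipeline from the predicted category, falling back to generic."""
    handler = PIPELINES.get(category, run_generic_pipeline)
    return handler(path)
```

Keeping the mapping in a dictionary means a new document type only needs a new entry, and unknown categories still reach the generic pipeline.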

docs/examples.rst

Lines changed: 28 additions & 1 deletion
@@ -146,4 +146,31 @@ When you run this for the first time, it will download the pretrained model weig
 Alternatively you can download or use your own pretrained model weights as well.
 For more info on this see Huggingface `transformers <https://huggingface.co/docs/transformers/index>`_ library documentation.
 
-For more information on the default model and the categories it classifies to, please refer `ocrpy.experimental.document_classifier`.
+For more information on the default model and the categories it classifies to, please refer `ocrpy.experimental.document_classifier`.
+
+Parse layout with ``ocrpy.experimental.layout_parser``
+---------------------------------------------------
+
+In this example let's look at how you can use ``ocrpy`` to parse layout from a document using the
+``ocrpy.experimental.layout_parser`` API.
+
+.. code-block:: python
+
+    from ocrpy import DocumentReader, TextParser
+    from ocrpy.experimental import LayoutParser
+
+    DOC_PATH = '../documents/document.img' # path to an image or pdf file on s3 bucket, gcs bucket or local directory.
+
+    reader = DocumentReader(file=DOC_PATH)
+    text_parser = TextParser()
+    layout_parser = LayoutParser()
+
+    parsed_layout = layout_parser.parse(reader, text_parser)
+
+.. note:: ``ocrpy`` uses Microsoft's LayoutParser library in the backend to perform the layout parsing.
+as such, please make sure you have the ``layoutparser`` library installed, if not please install it from
+`LayoutParser <https://github.com/Layout-Parser/layout-parser>`_.
+
+When you run this for the first time, it will download the pretrained model weights and store them in a local directory.
+Alternatively you can download or use your own pretrained model weights as well. The model weights can be downloaded from
+`LayoutParser Model Catalog <https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html#model-catalog>`_.
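
The diff does not show what ``parse`` returns, so the following is only a sketch of one way downstream code might organise layout results by component type (text, title, table, figure). The block structure used here, a list of dictionaries with ``type`` and ``text`` keys, is an assumption for illustration and may not match the actual ``ocrpy`` output.

```python
from collections import defaultdict
from typing import Dict, List


def group_blocks_by_type(blocks: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Collect block texts under their layout type, preserving input order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for block in blocks:
        grouped[block["type"]].append(block["text"])
    return dict(grouped)


# Hypothetical sample data standing in for a parsed layout.
sample_blocks = [
    {"type": "title", "text": "Quarterly report"},
    {"type": "text", "text": "Revenue grew."},
    {"type": "text", "text": "Costs fell."},
]

grouped = group_blocks_by_type(sample_blocks)
```

Grouping by type makes it straightforward to hand, for example, only the table blocks to a table-specific parser.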
