add layout parser examples and docs

bharathgs · bharathgs · commit fbc485843a70 · 2022-08-08T17:27:17.000+05:30
diff --git a/docs/api-reference.rst b/docs/api-reference.rst
@@ -87,16 +87,18 @@ The ``ocrpy.experimental`` module contains experimental features that are not ye
 or be removed in future releases. 
 
 Currently it exposes the :class:`ocrpy.experimental.document_classifier.DocumentClassifier` 
-class which can be used to classify documents into various categories.
-
-Other experimental features which are not exposed include document layout parsers which would enable 
-you to extract the layout of a document and extract the text from the document with different type of parsers 
-like table or text parsers.
+class, which can be used to classify documents into various categories, and the :class:`ocrpy.experimental.layout_parser.DocumentLayoutParser` 
+class, which can be used to identify different components of a document such as text, titles, tables, and figures.
 
+These can be used alongside the OCR pipelines as preprocessing utilities to identify the type and layout of a document
+and launch the appropriate OCR pipeline for custom processing.
 
 .. autoclass:: ocrpy.experimental.document_classifier.DocumentClassifier
 .. automethod:: ocrpy.experimental.document_classifier.DocumentClassifier.predict
 
+.. autoclass:: ocrpy.experimental.layout_parser.DocumentLayoutParser
+.. automethod:: ocrpy.experimental.layout_parser.DocumentLayoutParser.parse
+
 
 ``ocrpy.utils``
 ---------------
diff --git a/docs/examples.rst b/docs/examples.rst
@@ -146,4 +146,31 @@ When you run this for the first time, it will download the pretrained model weig
 Alternatively you can download or use your own pretrained model weights as well. 
 For more info on this see Huggingface `transformers <https://huggingface.co/docs/transformers/index>`_ library documentation.
 
-For more information on the default model and the categories it classifies to, please refer `ocrpy.experimental.document_classifier`.
+For more information on the default model and the categories it classifies documents into, please refer to `ocrpy.experimental.document_classifier`.
+
+Parse layout with ``ocrpy.experimental.layout_parser``
+------------------------------------------------------
+
+In this example, let's look at how you can use ``ocrpy`` to parse the layout of a document using the 
+``ocrpy.experimental.layout_parser`` API.
+
+.. code-block:: python
+
+    from ocrpy import DocumentReader, TextParser
+    from ocrpy.experimental import LayoutParser
+
+    DOC_PATH = '../documents/document.img'  # Path to an image or PDF file in an S3 bucket, a GCS bucket, or a local directory.
+
+    reader = DocumentReader(file=DOC_PATH)
+    text_parser = TextParser()
+    layout_parser = LayoutParser()
+
+    parsed_layout = layout_parser.parse(reader, text_parser)
+
+.. note:: ``ocrpy`` uses the LayoutParser library in the backend to perform the layout parsing.
+            As such, please make sure you have the ``layoutparser`` library installed; if not, please install it from 
+            `LayoutParser <https://github.com/Layout-Parser/layout-parser>`_.
+
+When you run this for the first time, it will download the pretrained model weights and store them in a local directory.
+Alternatively you can download or use your own pretrained model weights as well. The model weights can be downloaded from
+`LayoutParser Model Catalog <https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html#model-catalog>`_.