@datalab-to

Datalab

Developing state-of-the-art document intelligence models.

Pinned

  1. marker Public

    Convert PDF to markdown + JSON quickly with high accuracy

    Python · 28.5k stars · 1.9k forks

  2. surya Public

    OCR, layout analysis, reading order, table recognition in 90+ languages

    Python · 18.5k stars · 1.2k forks

  3. pdftext Public

    Extract structured text from PDFs quickly

    Python · 592 stars · 54 forks

Repositories

Showing 7 of 7 repositories
  • marker Public

    Convert PDF to markdown + JSON quickly with high accuracy

    Python · 28,470 stars · 1,863 forks · 272 open issues · 35 open pull requests · Updated Sep 7, 2025
  • surya Public

    OCR, layout analysis, reading order, table recognition in 90+ languages

    Python · 18,496 stars · 1,249 forks · 123 open issues · 12 open pull requests · Updated Sep 7, 2025
  • datalab-on-prem Public

    Scripts to run Datalab's self-service on-prem container

    Shell · 0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Aug 29, 2025
  • sdk Public
    HTML · 3 stars · MIT license · 2 forks · 2 open issues · 1 open pull request · Updated Aug 21, 2025
  • inference-mirror
    Python · 2 stars · 1 fork · 0 open issues · 1 open pull request · Updated Aug 13, 2025
  • docext Public

    An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

    Python · 4 stars · Apache-2.0 license · 1 fork · 0 open issues · 0 open pull requests · Updated Jun 18, 2025
  • pdftext Public

    Extract structured text from PDFs quickly

    Python · 592 stars · Apache-2.0 license · 54 forks · 8 open issues · 4 open pull requests · Updated Jun 11, 2025
