How to use custom chunking with external API (NLM Ingestor) in OpenWebUI? #475

@drspam1991

Description

I want to use a custom chunking method for document processing in OpenWebUI. Specifically, I have an external service (nlm-ingestor) that extracts semantic chunks from PDFs very accurately. I want to integrate this service with OpenWebUI's document handling pipeline.

Use Case

Currently, OpenWebUI supports document processing with built-in chunking, but I need to override this with my own chunking logic. I chunk the input document like this:

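from llmsherpa.readers import LayoutPDFReader

# URL of the locally running nlm-ingestor service
llmsherpa_api_url = "http://localhost:5010/api/parseDocument?renderFormat=all"
pdf_path = "/home/user/sample.pdf"

# Send the PDF to nlm-ingestor and get back a layout-aware document
pdf_reader = LayoutPDFReader(llmsherpa_api_url)
doc = pdf_reader.read_pdf(pdf_path)

# One plain-text string per semantic chunk
chunks = [chunk.to_text() for chunk in doc.chunks()]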

I want to inject these extracted chunks into OpenWebUI so that they can be indexed and retrieved via its RAG system.
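To make the goal concrete, here is a minimal sketch of one possible workaround, not a confirmed OpenWebUI feature: write each pre-computed chunk to its own small text file and upload the files through the REST API into a knowledge collection. The helper names (`write_chunks_as_files`, `upload_chunk_file`) are hypothetical. The endpoint paths `/api/v1/files/` and `/api/v1/knowledge/{id}/file/add`, the base URL, and the `id` field in the upload response are assumptions that should be checked against the API documentation of the installed version. OpenWebUI may still run its built-in splitter on each uploaded file, so this probably only preserves chunks that are shorter than the configured chunk size.

```python
from pathlib import Path


def write_chunks_as_files(chunks, out_dir, stem="chunk"):
    """Write each non-empty chunk to its own UTF-8 .txt file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(chunks):
        text = text.strip()
        if not text:
            continue  # skip empty chunks, there is nothing to index
        path = out / f"{stem}_{i:04d}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


def upload_chunk_file(base_url, token, knowledge_id, path):
    """Upload one chunk file and attach it to a knowledge collection.

    ASSUMPTION: the endpoint paths and the "id" response field below are
    not confirmed for every OpenWebUI version; verify before relying on them.
    """
    import requests  # third-party dependency, imported only when uploading

    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    with open(path, "rb") as fh:
        resp = requests.post(
            f"{base_url}/api/v1/files/", headers=headers, files={"file": fh}
        )
    resp.raise_for_status()
    file_id = resp.json()["id"]

    resp = requests.post(
        f"{base_url}/api/v1/knowledge/{knowledge_id}/file/add",
        headers=headers,
        json={"file_id": file_id},
    )
    resp.raise_for_status()
    return file_id
```

With the `chunks` list from above, usage would look roughly like `for p in write_chunks_as_files(chunks, "/tmp/chunks"): upload_chunk_file("http://localhost:3000", token, knowledge_id, p)`, where `token` and `knowledge_id` are placeholders for an API key and an existing knowledge collection. What I am really asking is whether there is a cleaner, supported way to bypass the built-in chunking entirely.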
