Description
I want to use a custom chunking method for document processing in OpenWebUI. Specifically, I have an external service (nlm-ingestor) that extracts semantic chunks from PDFs very accurately. I want to integrate this service with OpenWebUI's document handling pipeline.
Use Case
Currently, OpenWebUI supports document processing with built-in chunking, but I need to override this with my own chunking logic. I chunk the input document like this:
```python
from llmsherpa.readers import LayoutPDFReader

# nlm-ingestor server running locally
llmsherpa_api_url = "http://localhost:5010/api/parseDocument?renderFormat=all"
pdf_path = "/home/user/sample.pdf"

pdf_reader = LayoutPDFReader(llmsherpa_api_url)
doc = pdf_reader.read_pdf(pdf_path)

# One string per semantic chunk; to_text() returns the chunk's plain text
chunks = [chunk.to_text() for chunk in doc.chunks()]
```
I want to inject these extracted chunks into OpenWebUI so that they can be indexed and retrieved via its RAG system.
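To make the request concrete, here is a minimal sketch of what I would like to hand over: each nlm-ingestor chunk kept whole, tagged with its source file and its position in the chunk list. The function name `build_prechunked_payload`, the field names (`name`, `chunks`, `content`, `metadata`, `chunk_index`), and the idea of a "pre-chunked" ingestion path are all hypothetical. I am not assuming OpenWebUI exposes anything like this today.

```python
import json
from typing import Any, Dict, List


def build_prechunked_payload(file_name: str, chunks: List[str]) -> Dict[str, Any]:
    """Wrap externally produced chunks in a hypothetical ingestion payload.

    Each chunk is kept whole (no re-splitting), so it can be embedded and
    retrieved as a single unit.
    """
    items = []
    for position, text in enumerate(chunks):
        cleaned = text.strip()
        if not cleaned:
            # Skip empty chunks; they add nothing to the index
            continue
        items.append(
            {
                "content": cleaned,
                # chunk_index is the position in the original chunk list,
                # so a retrieved chunk can be traced back to doc.chunks()
                "metadata": {"source": file_name, "chunk_index": position},
            }
        )
    return {"name": file_name, "chunks": items}


if __name__ == "__main__":
    # Hypothetical stand-in for the list produced by the llmsherpa snippet
    example_chunks = ["  Section 1 text. ", "", "Section 2 text."]
    print(json.dumps(build_prechunked_payload("sample.pdf", example_chunks), indent=2))
```

The idea is that OpenWebUI would embed each `content` string as-is instead of running its own text splitter on the file, and would keep the metadata alongside the vector for citations.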