Split documents by page #3074
aleksitukiainen
started this conversation in
Ideas
Replies: 1 comment 1 reply
-
Hi @aleksitukiainen! I think this feature would be a nice extension of #2932, where we added the page number as metadata to Documents. It would be great if you could raise a feature request issue for this. Also, would you be interested in contributing this feature to Haystack yourself? Otherwise, we will put it in our backlog and work on it ourselves.
-
Hi, I think one natural extension of the PreProcessor.split(split_by: str) method would be to also support splitting a document by page. A single page often covers one subtopic, so page boundaries make natural chunk boundaries. I'm not sure how many others need this, but since document splitting is one of the key ways of producing suitably sized chunks for retrievers and readers, I think it would be a useful addition.
I need this right now, so I will probably build a manual workaround in the meantime.
Thanks!
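For anyone who needs the same thing, a manual workaround could look roughly like the sketch below. It is plain Python and does not call Haystack at all. It assumes the converted text marks page breaks with a form feed character ("\f"), which may not hold for every converter, so check your own output first. The function name split_by_page and the shape of the returned dicts are made up for illustration and are not part of the Haystack API.

```python
from typing import Any, Dict, List


def split_by_page(text: str, page_separator: str = "\f") -> List[Dict[str, Any]]:
    """Split text into one chunk per page (hypothetical helper, not Haystack API).

    Assumes pages are delimited by `page_separator` (form feed by default).
    """
    chunks: List[Dict[str, Any]] = []
    # Number pages from 1 so the value matches what a reader sees in the file.
    for page_number, page_text in enumerate(text.split(page_separator), start=1):
        content = page_text.strip()
        if not content:
            # Skip blank pages, but keep the numbering of later pages intact.
            continue
        chunks.append({"content": content, "meta": {"page": page_number}})
    return chunks


if __name__ == "__main__":
    sample = "Intro text\fMethods text\f\fResults text"
    for chunk in split_by_page(sample):
        print(chunk["meta"]["page"], chunk["content"])
```

Each returned dict could then be turned into a Document with the page number carried along as metadata, in the spirit of #2932.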