Dynamic Script Chunking for interviews and presentations #6800
Unanswered
TomLucidor asked this question in Questions
Replies: 1 comment
-
Also spotted this for reference: https://www.assemblyai.com/blog/text-segmentation-approaches-datasets-and-evaluation-metrics/
-
Currently the `DocumentSplitter` component only splits text by sentences, passages (the paragraph equivalent), or pages, and then groups those units into chunks of a set length with overlap. Sometimes, however, related information can end up in two separate chunks. This might not work well for oral content.
Reference to software: https://github.com/deepset-ai/haystack/blob/main/haystack/components/preprocessors/document_splitter.py
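To make the concern concrete, here is a minimal stand-alone sketch of fixed-size sentence windows with overlap. It is not the library's implementation: `split_sentences` and `window_chunks` are hypothetical helpers written for illustration, and only the parameter names `split_length` and `split_overlap` are borrowed from `DocumentSplitter`.

```python
import re


def split_sentences(text):
    # Naive sentence splitter (an assumption for this sketch):
    # break on whitespace that follows '.', '!' or '?'.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def window_chunks(sentences, split_length=3, split_overlap=1):
    # Group sentences into windows of `split_length`, where each window
    # repeats the last `split_overlap` sentences of the previous one.
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be >= 0 and < split_length")
    step = split_length - split_overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + split_length]))
        if start + split_length >= len(sentences):
            break  # the last window already reached the end
    return chunks
```

The window boundaries depend only on sentence counts, so a question and its answer can land in different chunks whenever they straddle a boundary by more than `split_overlap` sentences.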
For context: if the "document" in question is an interview script with no well-defined paragraphs, passages, or pages, only long runs of sentences along a timeline, how can the script be split without losing surrounding context?
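One possible direction, sketched here under stated assumptions rather than as a Haystack feature, is to cut where the vocabulary shifts instead of at a fixed count. The sketch compares bag-of-words cosine similarity between the sentences just before and just after each candidate boundary and starts a new chunk when it drops below a threshold. The function names, `window`, `threshold`, and `max_sentences` are all hypothetical, and the values are untuned.

```python
import re
from collections import Counter
from math import sqrt


def bag(sentences):
    # Lower-cased word counts for a group of sentences.
    words = Counter()
    for sentence in sentences:
        words.update(re.findall(r"[a-z']+", sentence.lower()))
    return words


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cohesion_chunks(sentences, window=2, threshold=0.1, max_sentences=8):
    # Start a new chunk before sentence i when the `window` sentences
    # before i share little vocabulary with the `window` sentences from i on,
    # or when the current chunk has reached `max_sentences`.
    chunks, current = [], []
    for i, sentence in enumerate(sentences):
        if current:
            left = bag(sentences[max(0, i - window):i])
            right = bag(sentences[i:i + window])
            if cosine(left, right) < threshold or len(current) >= max_sentences:
                chunks.append(current)
                current = []
        current.append(sentence)
    if current:
        chunks.append(current)
    return chunks


transcript = [
    "The cat sat on the mat.",
    "The cat liked the mat.",
    "Stocks fell sharply today.",
    "Stocks may recover tomorrow.",
]
for chunk in cohesion_chunks(transcript, window=1):
    print(" ".join(chunk))
```

On real transcripts, raw word overlap is dominated by filler and stop words, so this would likely need stop-word filtering or sentence embeddings in place of `bag`. The resulting chunks could then be wrapped as documents for the rest of the pipeline.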
Idea borrowed from: https://github.com/nicktill/YTRecap/blob/main/src/app.py
Side note: video chatbots can be made, see https://github.com/Anil-matcha/Youtube-to-chatbot