Commit 07baaa0

add docs for DocMuncher (#529)
1 parent b2b46d3 commit 07baaa0

6 files changed, +65 -1 lines changed

docs/blog/posts/docmuncher.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
---
category:
- Admin App
- API
date: 2025-04-15
---

# Eating documents for breakfast -- introducing DocMuncher

Content cards are great, but what if all your information lives in documents? No problem -- now you can just upload the documents and turn them into cards!

<!-- more -->

## What does DocMuncher do?

On the Admin App, you can now upload .pdf or .zip files in addition to creating content cards directly.

The backend then splits each document into bite-sized chunks, and each chunk becomes a content card.

This is our first version of document ingestion. Next, we plan to cluster semantically similar cards and to paraphrase cards so they're more readable.

## Doc references

- [DocMuncher](../../components/docmuncher/index.md)
101 KB

docs/components/docmuncher/index.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
# Document Ingestion aka DocMuncher

`/docmuncher` is the endpoint for document uploads. Each uploaded document is split into chunks that become content cards, and every card is tagged with the name of its source document.
This content is then searchable using the `/search` endpoints.

<img src="./docmuncher_api.png">

There are two kinds of endpoints: the `POST` endpoint accepts document uploads (.pdf or .zip) and creates a FastAPI background task for each uploaded document, and the `GET` endpoints return the status of the created jobs.
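
The following is a minimal, dependency-free sketch of the `POST`/`GET` job pattern described above. It makes two assumptions: a .zip upload is expanded into its .pdf members with one job per PDF, and a job's status is `queued`, `success` or `failed`. `JobStore`, `submit_upload`, `run_pending` and `get_status` are hypothetical names, not the Ask-A-Question API. Deferred callables stand in for FastAPI background tasks so the example runs without a server.

```python
from __future__ import annotations

import io
import uuid
import zipfile
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical status values; the real endpoint may report different ones.
QUEUED, SUCCESS, FAILED = "queued", "success", "failed"

# A processing function receives (filename, raw bytes), e.g. convert + chunk.
Processor = Callable[[str, bytes], None]


@dataclass
class Job:
    job_id: str
    filename: str
    status: str = QUEUED
    error: Optional[str] = None


@dataclass
class JobStore:
    """In-memory job registry (hypothetical); a real service would persist this."""

    jobs: dict[str, Job] = field(default_factory=dict)
    # Deferred callables, run after the upload "response" has been returned,
    # similar in spirit to FastAPI background tasks.
    pending: list[Callable[[], None]] = field(default_factory=list)

    def submit_upload(self, filename: str, data: bytes, process: Processor) -> list[str]:
        """POST side: accept a .pdf or .zip and queue one job per PDF."""
        lowered = filename.lower()
        if lowered.endswith(".pdf"):
            documents = [(filename, data)]
        elif lowered.endswith(".zip"):
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                documents = [
                    (name, archive.read(name))
                    for name in archive.namelist()
                    if name.lower().endswith(".pdf")
                ]
        else:
            raise ValueError(f"Unsupported file type: {filename}")

        job_ids = []
        for name, content in documents:
            job = Job(job_id=str(uuid.uuid4()), filename=name)
            self.jobs[job.job_id] = job
            # Bind job/content now so each deferred task sees its own document.
            self.pending.append(
                lambda job=job, content=content: self._run(job, content, process)
            )
            job_ids.append(job.job_id)
        return job_ids

    def _run(self, job: Job, content: bytes, process: Processor) -> None:
        try:
            process(job.filename, content)
            job.status = SUCCESS
        except Exception as exc:  # record the failure instead of crashing
            job.status = FAILED
            job.error = str(exc)

    def run_pending(self) -> None:
        """Run queued tasks in order (stands in for the background runner)."""
        while self.pending:
            self.pending.pop(0)()

    def get_status(self, job_id: str) -> dict[str, str]:
        """GET side: report a single job's status."""
        job = self.jobs[job_id]
        return {"job_id": job.job_id, "filename": job.filename, "status": job.status}
```

The client gets job IDs back immediately and polls the status side while the work runs. That is why uploads and status checks are separate endpoints. Keeping jobs in memory is only for illustration. A deployed service would more likely persist job state so it survives restarts.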

## Process flow for document ingestion
```mermaid
sequenceDiagram
autonumber
Admin App->>Backend: Document upload
Backend->>MistralAI: Convert document text to Markdown
MistralAI->>LangChain: Chunk Markdown text by headers
LangChain->>Backend: <Markdown chunks>
Backend->>Backend: Post-process chunks for continuity, formatting
Backend->>Admin App: <Final content cards + tags>
```
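
The chunking and post-processing steps in the diagram can be sketched without LangChain. The diagram assigns chunking to LangChain, and this is a dependency-free stand-in, not the backend's code. It splits on Markdown ATX headers (`#` to `######`). As one possible reading of "post-process for continuity", it merges any chunk shorter than a hypothetical `min_chars` threshold into the chunk before it. It then tags every card with the source document name. `ContentCard`, `split_by_headers`, `merge_short_chunks` and `markdown_to_cards` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# ATX header: 1-6 '#' characters followed by whitespace and the header text.
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class ContentCard:
    title: str
    text: str
    tags: list[str]


def _has_text(lines: list[str]) -> bool:
    return any(line.strip() for line in lines)


def split_by_headers(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (header, body) chunks at every ATX header line."""
    chunks: list[tuple[str, str]] = []
    header, body = "", []
    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            # Close the current chunk (text before the first header gets header "").
            if header or _has_text(body):
                chunks.append((header, "\n".join(body).strip()))
            header, body = match.group(2).strip(), []
        else:
            body.append(line)
    if header or _has_text(body):
        chunks.append((header, "\n".join(body).strip()))
    return chunks


def merge_short_chunks(chunks: list[tuple[str, str]], min_chars: int = 200) -> list[tuple[str, str]]:
    """Fold chunks whose body is shorter than min_chars into the previous chunk."""
    merged: list[tuple[str, str]] = []
    for header, body in chunks:
        if merged and len(body) < min_chars:
            prev_header, prev_body = merged[-1]
            addition = f"{header}\n{body}".strip() if header else body
            merged[-1] = (prev_header, f"{prev_body}\n\n{addition}".strip())
        else:
            merged.append((header, body))
    return merged


def markdown_to_cards(markdown: str, document_name: str, min_chars: int = 200) -> list[ContentCard]:
    """Chunk, post-process and tag every card with its source document name."""
    chunks = merge_short_chunks(split_by_headers(markdown), min_chars)
    return [
        ContentCard(title=header or document_name, text=body, tags=[document_name])
        for header, body in chunks
        if body
    ]
```

For example, `markdown_to_cards("# Intro\nWelcome text here.\n## Dosage\nTake one tablet daily with water.\n## Note\nShort.", "guide.pdf", min_chars=10)` yields two cards titled `Intro` and `Dosage`, both tagged `guide.pdf`. The short `Note` section is appended to the `Dosage` card. This simple splitter would also treat `#` lines inside fenced code blocks as headers.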


## Upcoming

- [ ] Semantic clustering: cluster cards with similar content
- [ ] Paraphrasing cards for brevity

docs/components/index.md

Lines changed: 9 additions & 0 deletions
@@ -83,4 +83,13 @@ There are 5 main components in Ask-A-Question.
 
 [:octicons-arrow-right-24: More info](./voice-service/index.md)
 
+
+- :material-api:{ .lg .middle .red} __Document Ingestion__
+
+---
+
+_(Optional)_ Supports .pdf or .zip file ingestion to create content cards
+
+[:octicons-arrow-right-24: More info](./docmuncher/index.md)
+
 </div>

docs/roadmap.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 | | Content Tags | :white_check_mark: | Add tags to your content for easy browsing (and more to come!) |
 | **Q3 2024** | Analytics for Feedback and Content | :white_check_mark: | See content use, questions that receive poor feedback, missing content, and more |
 | | Voice notes support | :white_check_mark: | Automatic Speech Recognition for audio messages to content matching |
-| | Multi-turn chat | :pencil: | Refine or clarify user questions through conversation. |
+| | Multi-turn chat | :white_check_mark: | Refine or clarify user questions through conversation. |
 | | Engineering Dashboard | :pencil: | Monitor uptime, response rates, throughput, and HTTP response codes |
 | Q4 2024 | Personalization and contextualization | :pencil: | Use contextual information to improve responses |
 | | Multimedia content | :pencil: | Respond with not just text but images and audio as well. |

mkdocs.yml

Lines changed: 6 additions & 0 deletions
@@ -44,16 +44,22 @@ nav:
 - Content Feedback: components/qa-service/content-feedback.md
 - Urgency Detection Service:
 - components/urgency-detection/index.md
+- Workspaces:
+- components/workspaces/index.md
 - Internal components:
 - LLM Proxy Server:
 - components/litellm-proxy/index.md
 - Hugging Face Embeddings:
 - components/huggingface-embeddings/index.md
 - How to use: components/huggingface-embeddings/how-to-use.md
+- Multi-turn Chat:
+- components/multi-turn-chat/index.md
 - Voice Service:
 - components/voice-service/index.md
 - In-house models: components/voice-service/in-house-models.md
 - External APIs: components/voice-service/external-apis.md
+- DocMuncher:
+- components/docmuncher/index.md
 
 - Integrations:
 - integrations/index.md
