README.md (7 additions, 5 deletions)
@@ -25,7 +25,7 @@
> 1. **Document Processing Upgrade**: Unstructured.io has been replaced with Docling for document parsing and extraction of text, tables, and images to be embedded.
> 2. **Enhanced RAG References**: Links to source documents and reference images present in the reranked retrieved chunks stored in local storage are added to the bottom of RAG responses.
>
- > To use Unstructured.io based solution, refer release - v2.0.
+ > To use the Unstructured.io based solution, refer to release [v2.0](https://github.com/souvikmajumder26/Multi-Agent-Medical-Assistant/tree/v2.0).
## 📚 Table of Contents
- [Overview](#overview)
@@ -116,14 +116,16 @@ If you like what you see and would want to support the project's developer, you
- 🤖 **Multi-Agent Architecture** : Specialized agents working in harmony to handle diagnosis, information retrieval, reasoning, and more

- - 🔍 **Advanced RAG Retrieval System** :
+ - 🔍 **Advanced Agentic RAG Retrieval System** :
  - Docling based parsing to extract text, tables, and images from PDFs.
  - Embedding markdown formatted text, tables and LLM based image summaries.
  - LLM based semantic chunking with structural boundary awareness.
  - LLM based query expansion with related medical domain terms.
  - Qdrant hybrid search combining BM25 sparse keyword search along with dense embedding vector search.
+ - HuggingFace Cross-Encoder based reranking of retrieved document chunks for accurate LLM responses.
  - Input-output guardrails to ensure safe and relevant responses.
+ - Links to source documents and images present in reference document chunks, provided with the response.
  - Confidence-based agent-to-agent handoff between RAG and Web Search to prevent hallucinations.

- 🏥 **Medical Imaging Analysis**
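For context, a minimal sketch of how the Cross-Encoder reranking step listed above could be wired up with the sentence-transformers `CrossEncoder` API; this is an illustration rather than the repository's actual code, and the model name and `top_k` default are assumptions.

```python
from sentence_transformers import CrossEncoder

# Assumed reranker checkpoint; the project may use a different model.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    """Score each (query, chunk) pair and keep the top_k highest-scoring chunks."""
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```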
@@ -242,7 +244,7 @@ docker run -d \
-v $(pwd)/uploads:/app/uploads \
medical-assistant
```

- The application will be available at: `http://localhost:8000`
+ The application will be available at: [http://localhost:8000](http://localhost:8000)
### 5️⃣ Ingest Data into Vector DB from Docker Container
- self.max_context_length = 8192  # ADD THIS LINE (Change based on your need) # 1024 proved to be too low and caused issue (retrieved content length > context length = no context added) in formatting context in response_generator code
- 1. Answer the query based ONLY on the information provided in the context.
- 2. If the context doesn't contain relevant information to answer the query, state: "I don't have enough information to answer this question based on the provided context."
- 3. Do not use prior knowledge not contained in the context.
- 5. Be concise and accurate.
- 6. Provide a well-structured response based on retrieved knowledge."""  # ADD THIS LINE
- self.include_sources = True  # ADD THIS LINE
- self.metrics_save_path = "./logs/rag_metrics.json"  # ADD THIS LINE
+ self.max_context_length = 8192  # (Change based on your need) # 1024 proved to be too low (retrieved content length > context length = no context added) in formatting context in response_generator code
+
+ self.include_sources = True  # Show links to reference documents and images along with corresponding query response

# ADJUST ACCORDING TO ASSISTANT'S BEHAVIOUR BASED ON THE DATA INGESTED:
- self.min_retrieval_confidence = 0.40  # the auto routing from RAG agent to WEB_SEARCH agent is dependent on this value
+ self.min_retrieval_confidence = 0.40  # The auto routing from RAG agent to WEB_SEARCH agent is dependent on this value

self.context_limit = 20  # include last 20 messages (10 Q&A pairs) in history
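For context, a hedged sketch of how the `min_retrieval_confidence` value above could drive the RAG-to-web-search handoff; this is illustrative only, and the function name and `retrieval_confidence` argument are assumptions rather than the repository's implementation.

```python
def route_query(retrieval_confidence: float, config) -> str:
    """Pick the next agent based on how confident the retrieval step was.

    If the best reranked chunk's confidence clears the configured threshold,
    the RAG agent answers; otherwise the query is handed off to web search
    to avoid answering from weak context.
    """
    if retrieval_confidence >= config.min_retrieval_confidence:
        return "RAG"
    return "WEB_SEARCH"
```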
@@ -147,21 +138,8 @@ def __init__(self):
temperature=0.1  # Keep deterministic for classification tasks
)

- class APIConfig:
-     def __init__(self):
-         self.host = "0.0.0.0"
-         self.port = 8000
-         self.debug = True
-         self.rate_limit = 10
-         self.max_image_upload_size = 5  # 1 MB max upload

class SpeechConfig:
    def __init__(self):
-         # self.tts_voice_id = "EXAVITQu4vr4xnSDxMaL"
-         # self.tts_stability = 0.5
-         # self.tts_similarity_boost = 0.8
-         # self.stt_model = "whisper-1"
-         # self.stt_language = "en"
        self.eleven_labs_api_key = os.getenv("ELEVEN_LABS_API_KEY")  # Replace with your actual key
        self.eleven_labs_voice_id = "21m00Tcm4TlvDq8ikWAM"  # Default voice ID (Rachel)
@@ -178,6 +156,14 @@ def __init__(self):
self.validation_timeout = 300
self.default_action = "reject"

+ class APIConfig:
+     def __init__(self):
+         self.host = "0.0.0.0"
+         self.port = 8000
+         self.debug = True
+         self.rate_limit = 10
+         self.max_image_upload_size = 5  # max upload size in MB
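A hypothetical usage sketch (not taken from the repository) showing how the re-introduced `APIConfig` values might be consumed when launching the API with uvicorn; the `app:app` import string and the `config` module path are assumptions.

```python
import uvicorn

from config import APIConfig  # assumed module path

if __name__ == "__main__":
    api_config = APIConfig()
    uvicorn.run(
        "app:app",                # assumed ASGI app location
        host=api_config.host,     # "0.0.0.0"
        port=api_config.port,     # 8000
        reload=api_config.debug,  # map the debug flag to auto-reload
    )
```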
In contrast to pixel-wise losses, which act on pixels independently and cannot enforce spatial constraints, the star-shape loss (Mirikharaji and Hamarneh, 2018) aims to capture class-label dependencies and preserve the target object structure in the predicted segmentation masks. Based upon prior knowledge about the shape of skin lesions, the star-shape loss $\mathcal{L}_{ssh}$ penalizes discontinuous decisions in the estimated output as follows:
$$\mathcal{L}_{ssh}(X, Y; \theta) = \sum_{i=1}^{N} \sum_{p \in \Omega} \sum_{q \in \ell_{pc}} \mathbb{1}_{y_{ip} = y_{iq}} \times \left| y_{ip} - \hat{y}_{ip} \right| \times \left| \hat{y}_{ip} - \hat{y}_{iq} \right|,$$
where $c$ is the lesion center, $\ell_{pc}$ is the line segment connecting pixels $p$ and $c$, and $q$ is any pixel lying on $\ell_{pc}$. This loss encourages all pixels lying between $p$ and $q$ on $\ell_{pc}$ to be assigned the same estimate whenever $p$ and $q$ have the same ground-truth label. The result is a radial spatial coherence from the lesion center.
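To make the formula concrete, here is a naive single-image NumPy sketch (an illustration, not code from any of the cited works): it brute-forces the discretized segment from each pixel to the lesion center, so it is far from an optimized implementation, and the array shapes and center argument are assumptions.

```python
import numpy as np

def star_shape_loss(y_true, y_pred, center):
    """Naive star-shape loss for one image.

    y_true, y_pred: (H, W) arrays of ground-truth labels and predicted probabilities.
    center: (row, col) coordinates of the lesion center c.
    For every pixel p, walk the discretized segment from p to c; whenever a pixel q
    on that segment shares p's ground-truth label, accumulate
    |y_p - yhat_p| * |yhat_p - yhat_q|.
    """
    h, w = y_true.shape
    cr, cc = center
    loss = 0.0
    for pr in range(h):
        for pc in range(w):
            n = max(abs(pr - cr), abs(pc - cc)) + 1  # samples along the segment
            rows = np.linspace(pr, cr, n).round().astype(int)
            cols = np.linspace(pc, cc, n).round().astype(int)
            for qr, qc in zip(rows, cols):
                if y_true[pr, pc] == y_true[qr, qc]:
                    loss += abs(y_true[pr, pc] - y_pred[pr, pc]) * \
                            abs(y_pred[pr, pc] - y_pred[qr, qc])
    return loss
```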
## 3.2.7. End-Point Error Loss
Many authors consider the lesion boundary the most challenging region to segment. The end-point error loss (Sarker et al., 2018; Singh et al., 2019) underscores borders by using the first derivative of the segmentation masks instead of their raw values:
$$\mathcal{L}_{epe}(X, Y; \theta) = \sum_{i=1}^{N} \sum_{p \in \Omega} \sqrt{ \left( \hat{y}_{ip}^{0} - y_{ip}^{0} \right)^{2} + \left( \hat{y}_{ip}^{1} - y_{ip}^{1} \right)^{2} },$$
where $\hat{y}_{ip}^{0}$ and $\hat{y}_{ip}^{1}$ are the directional first derivatives of the estimated segmentation map in the $x$ and $y$ spatial directions, respectively, and, similarly, $y_{ip}^{0}$ and $y_{ip}^{1}$ are the ground-truth derivatives. Thus, this loss function encourages the magnitude and orientation of the edges of the estimation and the ground truth to match, thereby mitigating vague boundaries in skin lesion segmentation.
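A minimal NumPy sketch of this loss for a single mask pair, using `np.gradient` as the directional first-derivative operator (an assumption; the cited works may use other derivative filters):

```python
import numpy as np

def end_point_error_loss(y_true, y_pred):
    """End-point error loss for one (H, W) mask pair."""
    gt_dy, gt_dx = np.gradient(y_true.astype(float))  # ground-truth derivatives
    pr_dy, pr_dx = np.gradient(y_pred.astype(float))  # predicted derivatives
    # Euclidean distance between the derivative vectors, summed over all pixels
    return np.sqrt((pr_dx - gt_dx) ** 2 + (pr_dy - gt_dy) ** 2).sum()
```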
## 3.2.8. Adversarial Loss
Another way to add high-order class-label consistency is adversarial training. Adversarial training may be employed along with traditional supervised training, using a discriminator to distinguish estimated segmentations from ground truths. The optimization objective weights a pixel-wise loss $\mathcal{L}_{s}$, which matches predictions to ground truths, against an adversarial loss, as follows:
$$\mathcal{L}_{adv}(X, Y; \theta, \theta_{a}) = \mathcal{L}_{s}(X, Y; \theta) + \lambda \left[ \mathcal{L}_{ce}(Y, 1; \theta_{a}) + \mathcal{L}_{ce}(\hat{Y}, 0; \theta, \theta_{a}) \right],$$
where $\theta_{a}$ are the adversarial model parameters. The adversarial loss employs a binary cross-entropy loss to encourage the segmentation model to produce prediction maps that are indistinguishable from the ground-truth maps. The adversarial objective (Eqn. (16)) is optimized in a mini-max game by simultaneously minimizing it with respect to $\theta$ and maximizing it with respect to $\theta_{a}$.
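A compact NumPy sketch of this combined objective, instantiating the pixel-wise term $\mathcal{L}_{s}$ as binary cross-entropy for concreteness; the weight `lam` and the discriminator interface are assumptions, not details from the cited works.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy averaged over all elements."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)).mean()

def adversarial_objective(y_true, y_pred, disc_on_true, disc_on_pred, lam=0.1):
    """Pixel-wise segmentation loss plus a weighted adversarial term.

    disc_on_true / disc_on_pred: discriminator outputs for the ground-truth
    and predicted masks, respectively.
    """
    seg_loss = bce(y_pred, y_true)                                # L_s term
    adv_loss = bce(disc_on_true, np.ones_like(disc_on_true)) + \
               bce(disc_on_pred, np.zeros_like(disc_on_pred))     # L_ce terms
    return seg_loss + lam * adv_loss
```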
Pixel-wise losses, such as cross-entropy (Izadi et al., 2018; Singh et al., 2019; Jiang et al., 2019), soft Jaccard (Sarker et al., 2019; Tu et al., 2019; Wei et al., 2019), end-point error (Tu et al., 2019; Singh et al., 2019), MSE (Peng et al., 2019), and MAE (Sarker et al., 2019; Singh et al., 2019; Jiang et al., 2019) losses have all been incorporated in adversarial learning of skin lesion segmentation. In addition, Xue et al. (2018) and Tu et al. (2019) presented a multi-scale adversarial term to match a hierarchy of local and global contextual features in the predicted maps and ground-truths. In particular, they minimize the MAE of multi-scale features extracted from different layers of the adversarial model.

## 3.2.9. Rank Loss
Assuming that hard-to-predict pixels lead to larger prediction errors while training the model, the rank loss (Xie et al., 2020b) is proposed to encourage learning more discriminative information for harder pixels. The image pixels are ranked based on their prediction errors, and the top $K$ pixels with the largest prediction errors from the lesion and background areas are selected. Letting $\hat{y}_{ij}^{0}$ and $\hat{y}_{il}^{1}$ be, respectively, the selected $j$-th hard-to-predict background pixel and the $l$-th hard-to-predict lesion pixel in image $i$, we have:
$$\mathcal{L}_{rank}(X, Y; \theta) = \sum_{i=1}^{N} \sum_{j=1}^{K} \sum_{l=1}^{K} \max \left\{ 0, \, \hat{y}_{ij}^{0} - \hat{y}_{il}^{1} + \mathrm{margin} \right\},$$
which encourages $\hat{y}_{il}^{1}$ to be greater than $\hat{y}_{ij}^{0}$ plus the margin.
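As an illustration, a small NumPy sketch of this hinge-style term for a single image, assuming the $K$ hardest background and lesion pixels have already been selected by prediction error (the margin value is an assumption):

```python
import numpy as np

def rank_loss(pred_bg_hard, pred_fg_hard, margin=0.3):
    """Rank loss over the K hardest background and K hardest lesion pixels.

    pred_bg_hard: predictions yhat^0 at the hardest background pixels, shape (K,).
    pred_fg_hard: predictions yhat^1 at the hardest lesion pixels, shape (K,).
    Every lesion prediction is pushed above every background prediction
    by at least `margin` via a pairwise hinge term.
    """
    pairwise = pred_bg_hard[:, None] - pred_fg_hard[None, :] + margin  # (K, K) pairs
    return np.maximum(0.0, pairwise).sum()
```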
Similar to the rank loss, the narrowband suppression loss (Deng et al., 2020) also adds a constraint between hard-to-predict pixels of the background and the lesion. Different from the rank loss, the narrowband suppression loss collects pixels in a narrowband of radius $r$ along the ground-truth lesion boundary, instead of all image pixels, and then selects the top $K$ pixels with the largest prediction errors.
## 4. Evaluation
Evaluation is one of the main challenges for any image segmentation task, skin lesions included (Celebi et al., 2015b). Segmentation evaluation may be subjective or objective (Zhang et al., 2008), the former involving the visual assessment of the results by a panel of human experts, and the latter involving the comparison of the results with ground-truth segmentations using quantitative evaluation metrics.
Subjective evaluation may provide a nuanced assessment of results, but because experts must grade each batch of results, it is usually too laborious to be applied, except in limited settings. In objective assessment, experts are consulted once, to provide the ground-truth segmentations, and that knowledge can then be reused indefinitely. However, due to intra- and inter-annotator variations, it raises the question of whether any individual ground-truth segmentation reflects the ideal 'true' segmentation, an issue we address in Section 4.2. It also raises the issue of choosing one or more evaluation metrics (Section 4.3).