
Commit 68f7bbf

Merge pull request #61 from souvikmajumder26/dev
Document path and app host modified
2 parents 243eb95 + fb742ff commit 68f7bbf

51 files changed: +589 / -592 lines changed


README.md

Lines changed: 7 additions & 5 deletions
@@ -25,7 +25,7 @@
 > 1. **Document Processing Upgrade**: Unstructured.io has been replaced with Docling for document parsing and extraction of text, tables, and images to be embedded.
 > 2. **Enhanced RAG References**: Links to source documents and reference images present in reranked retrieved chunks stored in local storage are added to the bottom of the RAG responses.
 >
-> To use Unstructured.io based solution, refer release - v2.0.
+> To use Unstructured.io based solution, refer release - [v2.0](https://github.com/souvikmajumder26/Multi-Agent-Medical-Assistant/tree/v2.0).
 
 ## 📚 Table of Contents
 - [Overview](#overview)
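
As context for the Docling swap described in this hunk, below is a minimal parsing sketch. It follows Docling's published quickstart (DocumentConverter, export_to_markdown); the input path is hypothetical and the repository's actual ingestion pipeline may wire this differently.

```python
# Minimal Docling parsing sketch (quickstart-style API; verify against the installed docling version).
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("data/raw/sample_medical_guideline.pdf")  # hypothetical input path

# Export the parsed document (text, tables, image placeholders) as Markdown for downstream chunking/embedding.
markdown_text = result.document.export_to_markdown()
print(markdown_text[:500])
```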
@@ -116,14 +116,16 @@ If you like what you see and would want to support the project's developer, you
 
 - 🤖 **Multi-Agent Architecture** : Specialized agents working in harmony to handle diagnosis, information retrieval, reasoning, and more
 
-- 🔍 **Advanced RAG Retrieval System** :
+- 🔍 **Advanced Agentic RAG Retrieval System** :
 
   - Docling based parsing to extract text, tables, and images from PDFs.
   - Embedding markdown formatted text, tables and LLM based image summaries.
   - LLM based semantic chunking with structural boundary awareness.
   - LLM based query expansion with related medical domain terms.
   - Qdrant hybrid search combining BM25 sparse keyword search along with dense embedding vector search.
+  - HuggingFace Cross-Encoder based reranking of retrieved document chunks for accurate LLM reponses.
   - Input-output guardrails to ensure safe and relevant responses.
+  - Links to source documents and images present in reference document chunks provided with reponse.
   - Confidence-based agent-to-agent handoff between RAG and Web Search to prevent hallucinations.
 
 - 🏥 **Medical Imaging Analysis**
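
The reranking bullet added in the hunk above corresponds to the cross-encoder model named later in config.py (cross-encoder/ms-marco-TinyBERT-L-6). A stand-alone sketch of that reranking step is shown below; the query, passages, and surrounding wiring are illustrative only, not the repository's actual code.

```python
# Stand-alone cross-encoder reranking sketch using sentence-transformers.
# The model name and reranker_top_k value come from config.py in this commit; everything else is illustrative.
from sentence_transformers import CrossEncoder

query = "What are the first-line treatments for hypertension?"  # example query
retrieved_chunks = [
    "Thiazide diuretics are commonly used as first-line therapy for hypertension...",
    "Skin lesion segmentation is evaluated with the Jaccard index...",
    "ACE inhibitors and ARBs are recommended first-line agents in several guidelines...",
]

reranker = CrossEncoder("cross-encoder/ms-marco-TinyBERT-L-6")
scores = reranker.predict([(query, chunk) for chunk in retrieved_chunks])

# Keep the highest-scoring chunks (reranker_top_k = 3 in the config).
top_k = 3
reranked = [chunk for _, chunk in sorted(zip(scores, retrieved_chunks), reverse=True)][:top_k]
print(reranked)
```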
@@ -242,7 +244,7 @@ docker run -d \
   -v $(pwd)/uploads:/app/uploads \
   medical-assistant
 ```
-The application will be available at: `http://localhost:8000`
+The application will be available at: [http://localhost:8000](http://localhost:8000)
 
 ### 5️⃣ Ingest Data into Vector DB from Docker Container
 
@@ -293,7 +295,7 @@ docker logs medical-assistant-app
 ```
 
 
-## 📌 Option 2: Manual Installation <a name="manual-setup"></a>
+## 📌 Option 2: Without Using Docker <a name="manual-setup"></a>
 
 ### 1️⃣ Clone the Repository
 ```bash
@@ -343,7 +345,7 @@ pip install -r requirements.txt
 ```bash
 python app.py
 ```
-The application will be available at: `http://localhost:8000`
+The application will be available at: [http://localhost:8000](http://localhost:8000)
 
 ### 6️⃣ Ingest additional data into the Vector DB
 Run any one of the following commands as required.

agents/rag_agent/__init__.py

Lines changed: 3 additions & 2 deletions
@@ -52,7 +52,7 @@ def ingest_directory(self, directory_path: str) -> Dict[str, Any]:
             raise ValueError(f"Directory not found: {directory_path}")
 
         # Get all files in the directory
-        files = [os.path.join(directory_path, f) for f in os.listdir(directory_path)
+        files = [os.path.join(directory_path + '/', f) for f in os.listdir(directory_path)
                  if os.path.isfile(os.path.join(directory_path, f))]
 
         if not files:
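
One editorial note on the change above: os.path.join only inserts a separator when one is missing, so appending '/' to directory_path is typically redundant on POSIX paths, although harmless. A quick stdlib-only illustration:

```python
import os

# os.path.join adds a separator only when the first component lacks one,
# so both calls below resolve to the same file; the '+ "/"' in the commit is redundant but harmless.
print(os.path.join("data/raw", "doc.pdf"))        # data/raw/doc.pdf
print(os.path.join("data/raw" + "/", "doc.pdf"))  # data/raw/doc.pdf
```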
@@ -210,7 +210,8 @@ def process_query(self, query: str, chat_history: Optional[List[Dict[str, str]]]
         response = self.response_generator.generate_response(
             query=query,
             retrieved_docs=reranked_documents,
-            picture_paths=reranked_top_k_picture_paths
+            picture_paths=reranked_top_k_picture_paths,
+            chat_history=chat_history
         )
 
         # Add timing information
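
The extra chat_history keyword threaded through above suggests the response generator now conditions on prior turns. Below is a minimal sketch of that pattern; the generate_response signature is hypothetical and only mirrors the keyword arguments visible in this hunk, not the repository's actual implementation.

```python
from typing import Any, Dict, List, Optional

def generate_response(
    query: str,
    retrieved_docs: List[Dict[str, Any]],
    picture_paths: Optional[List[str]] = None,
    chat_history: Optional[List[Dict[str, str]]] = None,
) -> str:
    """Hypothetical sketch: build an LLM prompt from retrieved context plus recent chat turns."""
    history = chat_history or []
    # Keep only the most recent turns (config.py caps history at 20 messages, i.e. 10 Q&A pairs).
    recent_turns = history[-20:]
    history_block = "\n".join(f'{t["role"]}: {t["content"]}' for t in recent_turns)
    context_block = "\n\n".join(d.get("text", "") for d in retrieved_docs)
    prompt = (
        f"Conversation so far:\n{history_block}\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )
    return prompt  # in the real agent this prompt would be sent to the LLM
```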

app.py

Lines changed: 2 additions & 2 deletions
@@ -26,7 +26,7 @@
 config = Config()
 
 # Initialize FastAPI app
-app = FastAPI(title="Multi-Agent Medical Chatbot", version="1.0")
+app = FastAPI(title="Multi-Agent Medical Chatbot", version="2.0")
 
 # Set up directories
 UPLOAD_FOLDER = "uploads/backend"
@@ -394,4 +394,4 @@ async def request_entity_too_large(request, exc):
     )
 
 if __name__ == "__main__":
-    uvicorn.run(app, host="127.0.0.1", port=8000)
+    uvicorn.run(app, host=config.api.host, port=config.api.port)
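
The host/port change above is what makes the PR title's "app host modified" concrete: the server now binds to whatever APIConfig provides (0.0.0.0:8000 per the config.py diff below) instead of a hard-coded 127.0.0.1. A self-contained sketch of the same pattern, using a stand-in config class rather than the project's Config:

```python
# Config-driven uvicorn launch, mirroring the change above.
# APIConfig here is a stand-in with the same host/port defaults seen in config.py.
import uvicorn
from fastapi import FastAPI

class APIConfig:
    def __init__(self):
        self.host = "0.0.0.0"  # bind on all interfaces (needed inside Docker)
        self.port = 8000

app = FastAPI(title="Multi-Agent Medical Chatbot", version="2.0")

@app.get("/health")
def health():
    # Illustrative endpoint, not from the repository.
    return {"status": "ok"}

if __name__ == "__main__":
    api = APIConfig()
    uvicorn.run(app, host=api.host, port=api.port)
```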

config.py

Lines changed: 14 additions & 28 deletions
@@ -65,7 +65,6 @@ def __init__(self):
         self.collection_name = "medical_assistance_rag" # Ensure a valid name
         self.chunk_size = 512 # Modify based on documents and performance
         self.chunk_overlap = 50 # Modify based on documents and performance
-        self.processed_docs_dir = "./data/processed" # Set a default value
         # self.embedding_model = "text-embedding-3-large"
         # Initialize Azure OpenAI Embeddings
         self.embedding_model = AzureOpenAIEmbeddings(
@@ -109,26 +108,18 @@ def __init__(self):
         )
         self.top_k = 5
         self.vector_search_type = 'similarity' # or 'mmr'
-        self.similarity_threshold = 0.75
-        self.huggingface_token = os.getenv("HUGGINGFACE_TOKEN")
 
-        self.chunking_strategy = "hybrid" # Options: semantic, sliding_window, recursive, hybrid
+        self.huggingface_token = os.getenv("HUGGINGFACE_TOKEN")
 
         self.reranker_model = "cross-encoder/ms-marco-TinyBERT-L-6"
         self.reranker_top_k = 3
 
-        self.max_context_length = 8192 # ADD THIS LINE (Change based on your need) # 1024 proved to be too low and caused issue (retrieved content length > context length = no context added) in formatting context in response_generator code
-        self.response_format_instructions = """Instructions:
-        1. Answer the query based ONLY on the information provided in the context.
-        2. If the context doesn't contain relevant information to answer the query, state: "I don't have enough information to answer this question based on the provided context."
-        3. Do not use prior knowledge not contained in the context.
-        5. Be concise and accurate.
-        6. Provide a well-structured response based on retrieved knowledge.""" # ADD THIS LINE
-        self.include_sources = True # ADD THIS LINE
-        self.metrics_save_path = "./logs/rag_metrics.json" # ADD THIS LINE
+        self.max_context_length = 8192 # (Change based on your need) # 1024 proved to be too low (retrieved content length > context length = no context added) in formatting context in response_generator code
+
+        self.include_sources = True # Show links to reference documents and images along with corresponding query response
 
         # ADJUST ACCORDING TO ASSISTANT'S BEHAVIOUR BASED ON THE DATA INGESTED:
-        self.min_retrieval_confidence = 0.40 #the auto routing from RAG agent to WEB_SEARCH agent is dependent on this value
+        self.min_retrieval_confidence = 0.40 # The auto routing from RAG agent to WEB_SEARCH agent is dependent on this value
 
         self.context_limit = 20 # include last 20 messsages (10 Q&A pairs) in history
 
@@ -147,21 +138,8 @@ def __init__(self):
             temperature = 0.1 # Keep deterministic for classification tasks
         )
 
-class APIConfig:
-    def __init__(self):
-        self.host = "0.0.0.0"
-        self.port = 8000
-        self.debug = True
-        self.rate_limit = 10
-        self.max_image_upload_size = 5 # 1 MB max upload
-
 class SpeechConfig:
     def __init__(self):
-        # self.tts_voice_id = "EXAVITQu4vr4xnSDxMaL"
-        # self.tts_stability = 0.5
-        # self.tts_similarity_boost = 0.8
-        # self.stt_model = "whisper-1"
-        # self.stt_language = "en"
         self.eleven_labs_api_key = os.getenv("ELEVEN_LABS_API_KEY") # Replace with your actual key
         self.eleven_labs_voice_id = "21m00Tcm4TlvDq8ikWAM" # Default voice ID (Rachel)
 

@@ -178,6 +156,14 @@ def __init__(self):
         self.validation_timeout = 300
         self.default_action = "reject"
 
+class APIConfig:
+    def __init__(self):
+        self.host = "0.0.0.0"
+        self.port = 8000
+        self.debug = True
+        self.rate_limit = 10
+        self.max_image_upload_size = 5 # max upload size in MB
+
 class UIConfig:
     def __init__(self):
         self.theme = "light"
@@ -198,7 +184,7 @@ def __init__(self):
         self.ui = UIConfig()
         self.eleven_labs_api_key = os.getenv("ELEVEN_LABS_API_KEY")
         self.tavily_api_key = os.getenv("TAVILY_API_KEY")
-        self.max_conversation_history = 40 # storing 20 sets of QnA in history, history is truncated based on this value
+        self.max_conversation_history = 20 # Include last 20 messsages (10 Q&A pairs) in history
 
 # # Example usage
 # config = Config()
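
The max_conversation_history change above aligns the global history cap with the RAG agent's context_limit (both 20 messages, i.e. 10 Q&A pairs). A small illustration of the truncation behaviour such a setting implies; the trim_history helper is hypothetical, not code from the repository:

```python
from typing import Dict, List

MAX_CONVERSATION_HISTORY = 20  # 20 messages = 10 Q&A pairs, as in the updated config

def trim_history(history: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Hypothetical helper: keep only the most recent messages within the configured cap."""
    return history[-MAX_CONVERSATION_HISTORY:]

# Example: 25 alternating user/assistant messages get trimmed to the last 20.
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(25)
]
print(len(trim_history(history)))  # 20
```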
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
## 3.2.6. Star-Shape Loss

In contrast to pixel-wise losses, which act on pixels independently and cannot enforce spatial constraints, the star-shape loss (Mirikharaji and Hamarneh, 2018) aims to capture class label dependencies and preserve the target object structure in the predicted segmentation masks. Based upon prior knowledge about the shape of skin lesions, the star-shape loss $\mathcal{L}_{ssh}$ penalizes discontinuous decisions in the estimated output as follows:

$$\mathcal{L}_{ssh}(X, Y; \theta) = \sum_{i=1}^{N} \sum_{p \in \Omega} \sum_{q \in \ell_{pc}} \mathbb{1}_{y_{ip} = y_{iq}} \times |y_{ip} - \hat{y}_{ip}| \times |\hat{y}_{ip} - \hat{y}_{iq}|,$$

where c is the lesion center, $\ell_{pc}$ is the line segment connecting pixels p and c, and q is any pixel lying on $\ell_{pc}$. This loss encourages all pixels lying between p and q on $\ell_{pc}$ to be assigned the same prediction whenever p and q have the same ground-truth label. The result is a radial spatial coherence from the lesion center.

## 3.2.7. End-Point Error Loss

Many authors consider the lesion boundary the most challenging region to segment. The end-point error loss (Sarker et al., 2018; Singh et al., 2019) underscores borders by using the first derivatives of the segmentation masks instead of their raw values:

$$\mathcal{L}_{epe}(X, Y; \theta) = \sum_{i=1}^{N} \sum_{p \in \Omega} \sqrt{(\hat{y}_{ip}^{0} - y_{ip}^{0})^{2} + (\hat{y}_{ip}^{1} - y_{ip}^{1})^{2}},$$

where $\hat{y}_{ip}^{0}$ and $\hat{y}_{ip}^{1}$ are the directional first derivatives of the estimated segmentation map in the x and y spatial directions, respectively, and $y_{ip}^{0}$ and $y_{ip}^{1}$ are the corresponding ground-truth derivatives. This loss function thus encourages the magnitude and orientation of the edges of the estimation and the ground truth to match, thereby mitigating vague boundaries in skin lesion segmentation.

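As a rough numerical illustration of the end-point error loss above, here is a small NumPy sketch; it uses np.gradient as the directional first derivative and array names of my own choosing, so treat it as one interpretation of the formula rather than a reference implementation.

```python
import numpy as np

def end_point_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Sum over pixels of the Euclidean distance between predicted and ground-truth gradients."""
    # Directional first derivatives (y- and x-direction) of prediction and ground truth.
    pred_dy, pred_dx = np.gradient(pred)
    gt_dy, gt_dx = np.gradient(gt)
    return float(np.sum(np.sqrt((pred_dx - gt_dx) ** 2 + (pred_dy - gt_dy) ** 2)))

# Tiny example: a soft prediction vs. a binary ground-truth mask.
gt = np.zeros((8, 8))
gt[2:6, 2:6] = 1.0
pred = np.clip(gt + 0.1 * np.random.default_rng(0).normal(size=gt.shape), 0, 1)
print(end_point_error(pred, gt))
```
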
## 3.2.8. Adversarial Loss

Another way to add high-order class-label consistency is adversarial training. Adversarial training may be employed along with traditional supervised training to distinguish estimated segmentations from ground truths using a discriminator. The optimization objective weights a pixel-wise loss $\mathcal{L}_{s}$, matching the prediction to the ground truth, against an adversarial loss, as follows:

$$\mathcal{L}_{adv}(X, Y; \theta, \theta_{a}) = \mathcal{L}_{s}(X, Y; \theta) - \lambda \left[ \mathcal{L}_{ce}(Y, 1; \theta_{a}) + \mathcal{L}_{ce}(\hat{Y}, 0; \theta, \theta_{a}) \right],$$

where $\theta_{a}$ are the adversarial model parameters. The adversarial loss employs a binary cross-entropy loss to encourage the segmentation model to produce prediction maps that are indistinguishable from ground-truth maps. The adversarial objective (Eqn. (16)) is optimized in a mini-max game by simultaneously minimizing it with respect to $\theta$ and maximizing it with respect to $\theta_{a}$.

Pixel-wise losses, such as cross-entropy (Izadi et al., 2018; Singh et al., 2019; Jiang et al., 2019), soft Jaccard (Sarker et al., 2019; Tu et al., 2019; Wei et al., 2019), end-point error (Tu et al., 2019; Singh et al., 2019), MSE (Peng et al., 2019), and MAE (Sarker et al., 2019; Singh et al., 2019; Jiang et al., 2019) losses have all been incorporated in adversarial learning of skin lesion segmentation. In addition, Xue et al. (2018) and Tu et al. (2019) presented a multi-scale adversarial term to match a hierarchy of

<!-- page_break -->

local and global contextual features in the predicted maps and ground truths. In particular, they minimize the MAE of multi-scale features extracted from different layers of the adversarial model.

## 3.2.9. Rank Loss

Assuming that hard-to-predict pixels lead to larger prediction errors while training the model, the rank loss (Xie et al., 2020b) is proposed to encourage learning more discriminative information for harder pixels. The image pixels are ranked based on their prediction errors, and the top K pixels with the largest prediction errors from the lesion or background areas are selected. Letting $\hat{y}_{ij}^{0}$ and $\hat{y}_{il}^{1}$ be, respectively, the selected j-th hard-to-predict pixel of the background and the l-th hard-to-predict pixel of the lesion in image i, we have:

$$\mathcal{L}_{rank}(X, Y; \theta) = \sum_{i=1}^{N} \sum_{j=1}^{K} \sum_{l=1}^{K} \max \{ 0, \hat{y}_{ij}^{0} - \hat{y}_{il}^{1} + \mathrm{margin} \},$$

which encourages $\hat{y}_{il}^{1}$ to be greater than $\hat{y}_{ij}^{0}$ plus the margin.

Similar to the rank loss, the narrowband suppression loss (Deng et al., 2020) also adds a constraint between hard-to-predict pixels of the background and the lesion. Different from the rank loss, the narrowband suppression loss collects pixels in a narrowband along the ground-truth lesion boundary with radius r instead of all image pixels, and then selects the top K pixels with the largest prediction errors.

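Similarly, a compact NumPy sketch of the rank loss for a single image is given below; the selection of the K largest-error pixels follows the description above, with all names and the margin value chosen for illustration only.

```python
import numpy as np

def rank_loss_single_image(pred: np.ndarray, gt: np.ndarray, k: int = 5, margin: float = 0.3) -> float:
    """Hinge between the K hardest background pixels and the K hardest lesion pixels (one image)."""
    errors = np.abs(pred - gt)
    bg_scores = pred[gt == 0]
    lesion_scores = pred[gt == 1]
    # Predictions at the K largest-error (hardest) pixels of background and lesion, respectively.
    bg_hard = bg_scores[np.argsort(errors[gt == 0])[-k:]]          # \hat{y}^0_{ij}, j = 1..K
    lesion_hard = lesion_scores[np.argsort(errors[gt == 1])[-k:]]  # \hat{y}^1_{il}, l = 1..K
    # Encourage every hard lesion prediction to exceed every hard background prediction by `margin`.
    diffs = bg_hard[:, None] - lesion_hard[None, :] + margin
    return float(np.sum(np.maximum(0.0, diffs)))

rng = np.random.default_rng(0)
gt = (rng.random((16, 16)) > 0.7).astype(float)
pred = np.clip(gt + 0.3 * rng.normal(size=gt.shape), 0, 1)
print(rank_loss_single_image(pred, gt))
```
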
## 4. Evaluation

Evaluation is one of the main challenges for any image segmentation task, skin lesions included (Celebi et al., 2015b). Segmentation evaluation may be subjective or objective (Zhang et al., 2008), the former involving the visual assessment of the results by a panel of human experts, and the latter involving the comparison of the results with ground-truth segmentations using quantitative evaluation metrics.

Subjective evaluation may provide a nuanced assessment of results, but because experts must grade each batch of results, it is usually too laborious to be applied, except in limited settings. In objective assessment, experts are consulted once, to provide the ground-truth segmentations, and that knowledge can then be reused indefinitely. However, due to intra- and inter-annotator variations, it raises the question of whether any individual ground-truth segmentation reflects the ideal 'true' segmentation, an issue we address in Section 4.2. It also raises the issue of choosing one or more evaluation metrics (Section 4.3).
