README.md (7 additions, 5 deletions)
@@ -25,7 +25,7 @@
> 1. **Document Processing Upgrade**: Unstructured.io has been replaced with Docling for document parsing and extraction of text, tables, and images to be embedded.
> 2. **Enhanced RAG References**: Links to source documents and reference images present in the reranked retrieved chunks stored in local storage are added to the bottom of RAG responses.
>
- > To use Unstructured.io based solution, refer release - v2.0.
+ > To use the Unstructured.io based solution, refer to release [v2.0](https://github.com/souvikmajumder26/Multi-Agent-Medical-Assistant/tree/v2.0).
## 📚 Table of Contents
- [Overview](#overview)
@@ -116,14 +116,16 @@ If you like what you see and would want to support the project's developer, you
- 🤖 **Multi-Agent Architecture** : Specialized agents working in harmony to handle diagnosis, information retrieval, reasoning, and more

- - 🔍 **Advanced RAG Retrieval System** :
+ - 🔍 **Advanced Agentic RAG Retrieval System** :
  - Docling based parsing to extract text, tables, and images from PDFs.
  - Embedding markdown formatted text, tables and LLM based image summaries.
  - LLM based semantic chunking with structural boundary awareness.
  - LLM based query expansion with related medical domain terms.
  - Qdrant hybrid search combining BM25 sparse keyword search along with dense embedding vector search.
+ - HuggingFace Cross-Encoder based reranking of retrieved document chunks for accurate LLM responses.
  - Input-output guardrails to ensure safe and relevant responses.
+ - Links to source documents and images present in reference document chunks, provided with the response.
  - Confidence-based agent-to-agent handoff between RAG and Web Search to prevent hallucinations.

- 🏥 **Medical Imaging Analysis**
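For context, a minimal sketch of how the Cross-Encoder reranking step listed above could be wired up with the sentence-transformers `CrossEncoder` API; this is an illustration rather than the repository's actual code, and the model name and `top_k` default are assumptions.

```python
from sentence_transformers import CrossEncoder

# Assumed reranker checkpoint; the project may use a different model.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    """Score each (query, chunk) pair and keep the top_k highest-scoring chunks."""
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```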
@@ -242,7 +244,7 @@ docker run -d \
-v $(pwd)/uploads:/app/uploads \
medical-assistant
```

- The application will be available at: `http://localhost:8000`
+ The application will be available at: [http://localhost:8000](http://localhost:8000)
### 5️⃣ Ingest Data into Vector DB from Docker Container
- self.max_context_length = 8192  # ADD THIS LINE (Change based on your need) # 1024 proved to be too low and caused issue (retrieved content length > context length = no context added) in formatting context in response_generator code
- 1. Answer the query based ONLY on the information provided in the context.
- 2. If the context doesn't contain relevant information to answer the query, state: "I don't have enough information to answer this question based on the provided context."
- 3. Do not use prior knowledge not contained in the context.
- 5. Be concise and accurate.
- 6. Provide a well-structured response based on retrieved knowledge."""  # ADD THIS LINE
- self.include_sources = True  # ADD THIS LINE
- self.metrics_save_path = "./logs/rag_metrics.json"  # ADD THIS LINE
+ self.max_context_length = 8192  # (Change based on your need) # 1024 proved to be too low (retrieved content length > context length = no context added) in formatting context in response_generator code
+
+ self.include_sources = True  # Show links to reference documents and images along with corresponding query response

# ADJUST ACCORDING TO ASSISTANT'S BEHAVIOUR BASED ON THE DATA INGESTED:
- self.min_retrieval_confidence = 0.40  # the auto routing from RAG agent to WEB_SEARCH agent is dependent on this value
+ self.min_retrieval_confidence = 0.40  # The auto routing from RAG agent to WEB_SEARCH agent is dependent on this value

self.context_limit = 20  # include last 20 messages (10 Q&A pairs) in history
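For context, a hedged sketch of how the `min_retrieval_confidence` value above could drive the RAG-to-web-search handoff; this is illustrative only, and the function name and `retrieval_confidence` argument are assumptions rather than the repository's implementation.

```python
def route_query(retrieval_confidence: float, config) -> str:
    """Pick the next agent based on how confident the retrieval step was.

    If the best reranked chunk's confidence clears the configured threshold,
    the RAG agent answers; otherwise the query is handed off to web search
    to avoid answering from weak context.
    """
    if retrieval_confidence >= config.min_retrieval_confidence:
        return "RAG"
    return "WEB_SEARCH"
```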
@@ -147,21 +138,8 @@ def __init__(self):
temperature=0.1  # Keep deterministic for classification tasks
)

- class APIConfig:
-     def __init__(self):
-         self.host = "0.0.0.0"
-         self.port = 8000
-         self.debug = True
-         self.rate_limit = 10
-         self.max_image_upload_size = 5  # 1 MB max upload

class SpeechConfig:
    def __init__(self):
-         # self.tts_voice_id = "EXAVITQu4vr4xnSDxMaL"
-         # self.tts_stability = 0.5
-         # self.tts_similarity_boost = 0.8
-         # self.stt_model = "whisper-1"
-         # self.stt_language = "en"
        self.eleven_labs_api_key = os.getenv("ELEVEN_LABS_API_KEY")  # Replace with your actual key
        self.eleven_labs_voice_id = "21m00Tcm4TlvDq8ikWAM"  # Default voice ID (Rachel)
@@ -178,6 +156,14 @@ def __init__(self):
self.validation_timeout = 300
self.default_action = "reject"

+ class APIConfig:
+     def __init__(self):
+         self.host = "0.0.0.0"
+         self.port = 8000
+         self.debug = True
+         self.rate_limit = 10
+         self.max_image_upload_size = 5  # max upload size in MB
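A hypothetical usage sketch (not taken from the repository) showing how the re-introduced `APIConfig` values might be consumed when launching the API with uvicorn; the `app:app` import string and the `config` module path are assumptions.

```python
import uvicorn

from config import APIConfig  # assumed module path

if __name__ == "__main__":
    api_config = APIConfig()
    uvicorn.run(
        "app:app",                # assumed ASGI app location
        host=api_config.host,     # "0.0.0.0"
        port=api_config.port,     # 8000
        reload=api_config.debug,  # map the debug flag to auto-reload
    )
```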
In contrast to pixel-wise losses, which act on pixels independently and cannot enforce spatial constraints, the star-shape loss (Mirikharaji and Hamarneh, 2018) aims to capture class-label dependencies and preserve the target object structure in the predicted segmentation masks. Based upon prior knowledge about the shape of skin lesions, the star-shape loss $\mathcal{L}_{ssh}$ penalizes discontinuous decisions in the estimated output as follows:
$$\mathcal{L}_{ssh}(X, Y; \theta) = \sum_{i=1}^{N} \sum_{p \in \Omega} \sum_{q \in \ell_{pc}} \mathbb{1}_{y_{ip} = y_{iq}} \times \left| y_{ip} - \hat{y}_{ip} \right| \times \left| \hat{y}_{ip} - \hat{y}_{iq} \right|,$$
where $c$ is the lesion center, $\ell_{pc}$ is the line segment connecting pixels $p$ and $c$, and $q$ is any pixel lying on $\ell_{pc}$. This loss encourages all pixels lying between $p$ and $q$ on $\ell_{pc}$ to be assigned the same estimate whenever $p$ and $q$ have the same ground-truth label. The result is a radial spatial coherence from the lesion center.
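To make the formula concrete, here is a naive single-image NumPy sketch (an illustration, not code from any of the cited works): it brute-forces the discretized segment from each pixel to the lesion center, so it is far from an optimized implementation, and the array shapes and center argument are assumptions.

```python
import numpy as np

def star_shape_loss(y_true, y_pred, center):
    """Naive star-shape loss for one image.

    y_true, y_pred: (H, W) arrays of ground-truth labels and predicted probabilities.
    center: (row, col) coordinates of the lesion center c.
    For every pixel p, walk the discretized segment from p to c; whenever a pixel q
    on that segment shares p's ground-truth label, accumulate
    |y_p - yhat_p| * |yhat_p - yhat_q|.
    """
    h, w = y_true.shape
    cr, cc = center
    loss = 0.0
    for pr in range(h):
        for pc in range(w):
            n = max(abs(pr - cr), abs(pc - cc)) + 1  # samples along the segment
            rows = np.linspace(pr, cr, n).round().astype(int)
            cols = np.linspace(pc, cc, n).round().astype(int)
            for qr, qc in zip(rows, cols):
                if y_true[pr, pc] == y_true[qr, qc]:
                    loss += abs(y_true[pr, pc] - y_pred[pr, pc]) * \
                            abs(y_pred[pr, pc] - y_pred[qr, qc])
    return loss
```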
## 3.2.7. End-Point Error Loss
Many authors consider the lesion boundary the most challenging region to segment. The end-point error loss (Sarker et al., 2018; Singh et al., 2019) underscores borders by using the first derivative of the segmentation masks instead of their raw values:
$$\mathcal{L}_{epe}(X, Y; \theta) = \sum_{i=1}^{N} \sum_{p \in \Omega} \sqrt{ \left( \hat{y}_{ip}^{0} - y_{ip}^{0} \right)^{2} + \left( \hat{y}_{ip}^{1} - y_{ip}^{1} \right)^{2} },$$
where $\hat{y}_{ip}^{0}$ and $\hat{y}_{ip}^{1}$ are the directional first derivatives of the estimated segmentation map in the $x$ and $y$ spatial directions, respectively, and, similarly, $y_{ip}^{0}$ and $y_{ip}^{1}$ are the ground-truth derivatives. Thus, this loss function encourages the magnitude and orientation of the edges of the estimation and the ground truth to match, thereby mitigating vague boundaries in skin lesion segmentation.
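A minimal NumPy sketch of this loss for a single mask pair, using `np.gradient` as the directional first-derivative operator (an assumption; the cited works may use other derivative filters):

```python
import numpy as np

def end_point_error_loss(y_true, y_pred):
    """End-point error loss for one (H, W) mask pair."""
    gt_dy, gt_dx = np.gradient(y_true.astype(float))  # ground-truth derivatives
    pr_dy, pr_dx = np.gradient(y_pred.astype(float))  # predicted derivatives
    # Euclidean distance between the derivative vectors, summed over all pixels
    return np.sqrt((pr_dx - gt_dx) ** 2 + (pr_dy - gt_dy) ** 2).sum()
```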
## 3.2.8. Adversarial Loss
Another way to add high-order class-label consistency is adversarial training. Adversarial training may be employed along with traditional supervised training, using a discriminator to distinguish estimated segmentations from ground truths. The optimization objective weights a pixel-wise loss $\mathcal{L}_{s}$, which matches predictions to ground truths, against an adversarial loss, as follows:
$$\mathcal{L}_{adv}(X, Y; \theta, \theta_{a}) = \mathcal{L}_{s}(X, Y; \theta) + \lambda \left[ \mathcal{L}_{ce}(Y, 1; \theta_{a}) + \mathcal{L}_{ce}(\hat{Y}, 0; \theta, \theta_{a}) \right],$$
where $\theta_{a}$ are the adversarial model parameters. The adversarial loss employs a binary cross-entropy loss to encourage the segmentation model to produce prediction maps that are indistinguishable from the ground-truth maps. The adversarial objective (Eqn. (16)) is optimized in a mini-max game by simultaneously minimizing it with respect to $\theta$ and maximizing it with respect to $\theta_{a}$.
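A compact NumPy sketch of this combined objective, instantiating the pixel-wise term $\mathcal{L}_{s}$ as binary cross-entropy for concreteness; the weight `lam` and the discriminator interface are assumptions, not details from the cited works.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy averaged over all elements."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)).mean()

def adversarial_objective(y_true, y_pred, disc_on_true, disc_on_pred, lam=0.1):
    """Pixel-wise segmentation loss plus a weighted adversarial term.

    disc_on_true / disc_on_pred: discriminator outputs for the ground-truth
    and predicted masks, respectively.
    """
    seg_loss = bce(y_pred, y_true)                                # L_s term
    adv_loss = bce(disc_on_true, np.ones_like(disc_on_true)) + \
               bce(disc_on_pred, np.zeros_like(disc_on_pred))     # L_ce terms
    return seg_loss + lam * adv_loss
```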
Pixel-wise losses, such as cross-entropy (Izadi et al., 2018; Singh et al., 2019; Jiang et al., 2019), soft Jaccard (Sarker et al., 2019; Tu et al., 2019; Wei et al., 2019), end-point error (Tu et al., 2019; Singh et al., 2019), MSE (Peng et al., 2019), and MAE (Sarker et al., 2019; Singh et al., 2019; Jiang et al., 2019) losses have all been incorporated in adversarial learning of skin lesion segmentation. In addition, Xue et al. (2018) and Tu et al. (2019) presented a multi-scale adversarial term to match a hierarchy of local and global contextual features in the predicted maps and ground-truths. In particular, they minimize the MAE of multi-scale features extracted from different layers of the adversarial model.

## 3.2.9. Rank Loss
Assuming that hard-to-predict pixels lead to larger prediction errors while training the model, the rank loss (Xie et al., 2020b) is proposed to encourage learning more discriminative information for harder pixels. The image pixels are ranked based on their prediction errors, and the top $K$ pixels with the largest prediction errors from the lesion and background areas are selected. Letting $\hat{y}_{ij}^{0}$ and $\hat{y}_{il}^{1}$ be, respectively, the selected $j$-th hard-to-predict background pixel and the $l$-th hard-to-predict lesion pixel in image $i$, we have:
$$\mathcal{L}_{rank}(X, Y; \theta) = \sum_{i=1}^{N} \sum_{j=1}^{K} \sum_{l=1}^{K} \max \left\{ 0, \, \hat{y}_{ij}^{0} - \hat{y}_{il}^{1} + \mathrm{margin} \right\},$$
which encourages $\hat{y}_{il}^{1}$ to be greater than $\hat{y}_{ij}^{0}$ plus the margin.
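As an illustration, a small NumPy sketch of this hinge-style term for a single image, assuming the $K$ hardest background and lesion pixels have already been selected by prediction error (the margin value is an assumption):

```python
import numpy as np

def rank_loss(pred_bg_hard, pred_fg_hard, margin=0.3):
    """Rank loss over the K hardest background and K hardest lesion pixels.

    pred_bg_hard: predictions yhat^0 at the hardest background pixels, shape (K,).
    pred_fg_hard: predictions yhat^1 at the hardest lesion pixels, shape (K,).
    Every lesion prediction is pushed above every background prediction
    by at least `margin` via a pairwise hinge term.
    """
    pairwise = pred_bg_hard[:, None] - pred_fg_hard[None, :] + margin  # (K, K) pairs
    return np.maximum(0.0, pairwise).sum()
```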
Similar to the rank loss, the narrowband suppression loss (Deng et al., 2020) also adds a constraint between hard-to-predict pixels of the background and the lesion. Different from the rank loss, the narrowband suppression loss collects pixels in a narrowband of radius $r$ along the ground-truth lesion boundary, instead of all image pixels, and then selects the top $K$ pixels with the largest prediction errors.
## 4. Evaluation
Evaluation is one of the main challenges for any image segmentation task, skin lesions included (Celebi et al., 2015b). Segmentation evaluation may be subjective or objective (Zhang et al., 2008), the former involving the visual assessment of the results by a panel of human experts, and the latter involving the comparison of the results with ground-truth segmentations using quantitative evaluation metrics.
Subjective evaluation may provide a nuanced assessment of results, but because experts must grade each batch of results, it is usually too laborious to be applied, except in limited settings. In objective assessment, experts are consulted once, to provide the ground-truth segmentations, and that knowledge can then be reused indefinitely. However, due to intra- and inter-annotator variations, it raises the question of whether any individual ground-truth segmentation reflects the ideal 'true' segmentation, an issue we address in Section 4.2. It also raises the issue of choosing one or more evaluation metrics (Section 4.3).