A document analysis and question-answering application built with Streamlit, LangChain, and Ollama. It lets users query their own documents in natural language using Retrieval-Augmented Generation (RAG).
- 📄 Multi-format Document Support (PDF, CSV, TXT, Word, PowerPoint, and images)
- 💬 Interactive Chat Interface
- 🔄 Batch Query Processing
- 📊 Document Processing with Advanced RAG
- 🚀 Optimized Retrieval System
- 📥 Exportable Results in JSON Format
- 🔄 Real-time Streaming Responses
- 🎯 Context-aware Document Analysis
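As a rough illustration of the retrieve-then-generate idea behind RAG, here is a minimal, dependency-free sketch. It is not this application's implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a FAISS index, and all function names (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words term-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (the 'retrieval' step)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, chunks, k=2):
    """Combine retrieved context with the question (the 'augmentation' step)."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Illustrative document chunks (made up for this example)
chunks = [
    "Invoices are due within 30 days of receipt.",
    "The office is closed on public holidays.",
    "Late invoices incur a 2% monthly fee.",
]
top = retrieve("When are invoices due?", chunks, k=1)
prompt = build_prompt("When are invoices due?", chunks, k=2)
```

In a real pipeline the prompt built here would then be sent to the LLM for the generation step.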
- Python 3.8+
- Streamlit
- LangChain
- Ollama (running locally or on a remote server)
- FAISS for vector storage
- Clone the repository:
git clone https://github.com/tankwin08/doc_intelligence_process.git
cd doc_intelligence_process
- Install the required dependencies:
pip install -r requirements.txt
- Start the Streamlit app:
streamlit run streamlit_app.py
- Open your browser and navigate to the URL displayed in the terminal (typically http://localhost:8501)
- PDF (.pdf)
- CSV (.csv)
- Text (.txt)
- Word Documents (.docx, .doc)
- PowerPoint (.ppt, .pptx)
- Images (.jpg, .jpeg, .png)
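The list above can be expressed as a simple extension check before a file is processed. This is a minimal sketch only; `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names and may not match how the application validates uploads.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above
SUPPORTED_EXTENSIONS = {
    ".pdf", ".csv", ".txt",
    ".docx", ".doc",
    ".ppt", ".pptx",
    ".jpg", ".jpeg", ".png",
}


def is_supported(filename):
    """Return True if the file's extension (case-insensitive) is in the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported("Report.PDF")` is true, while `is_supported("archive.zip")` is false.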
The application uses Ollama to run LLMs locally. By default, it uses:
- deepseek-r1:latest for text generation
- nomic-embed-text for embeddings

You can change the model in the sidebar of the application.
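Before launching the app, it can help to confirm that the default models are available in Ollama. The sketch below queries Ollama's `/api/tags` endpoint, assuming the server listens on its usual default address `http://localhost:11434`; adjust `base_url` for a remote server. The helper names (`normalize`, `missing_models`, `installed_models`) are hypothetical and not part of this project.

```python
import json
import urllib.request

# Default models named in this README
REQUIRED_MODELS = ("deepseek-r1:latest", "nomic-embed-text")


def normalize(name):
    """Ollama lists untagged models with an explicit ':latest' tag."""
    return name if ":" in name else f"{name}:latest"


def missing_models(installed, required=REQUIRED_MODELS):
    """Return the required models that are not in the installed list."""
    have = {normalize(n) for n in installed}
    return [m for m in required if normalize(m) not in have]


def installed_models(base_url="http://localhost:11434"):
    """Fetch the names of locally available models from the Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]
```

Calling `missing_models(installed_models())` returns an empty list when both models are present. Any model it reports can be fetched with `ollama pull <model>`. If the server is not running, `installed_models` raises an `OSError` subclass, so wrap the call in `try`/`except OSError` to report that case.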
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.