Commit 8d08f37: Added documentation & Screenshot

Parent 09c507a. 4 files changed, +204 -0 lines changed.

LICENSE

MIT License

Copyright (c) 2023 [VISHAL VERMA (stacksapien)]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

# QA with Your Documents

This project provides an interactive Streamlit web application that lets users upload PDF and CSV files, store their content in a vector database using LangChain and Chroma, and query the uploaded documents with OpenAI's LLMs (e.g., GPT-3.5-turbo). For each question, the app retrieves the most relevant passages from the documents and cites the source files in its answer.

---

## Features

- **Upload and Process Documents**:
  - Upload multiple PDF and CSV files.
  - Extract content using LangChain's document loaders.
- **Vector Database Storage**:
  - Store document embeddings in a persistent Chroma vector database.
- **Interactive Query System**:
  - Ask questions about the uploaded documents.
  - Retrieve answers along with source citations.
- **Download Cited Files**:
  - Easily download files cited in the query response.

---

## Technologies Used

- **Streamlit**: For creating the web interface.
- **LangChain**: For document processing and retrieval.
- **Chroma**: As the vector database for storing embeddings.
- **OpenAI API**: For LLM-based query answering.
- **Python**: The core language for building the application.

---

## Installation

### Prerequisites

- Python 3.10 or later
- OpenAI API key

### Steps

1. **Clone the Repository**:

   ```bash
   git clone git@github.com:stacksapien/smart-doc-search.git
   cd smart-doc-search
   ```

2. **Set Up a Virtual Environment**:

   ```bash
   python3 -m venv env
   source env/bin/activate  # On Windows: .\env\Scripts\activate
   ```

3. **Install Dependencies**:

   ```bash
   pip install -r requirements.txt
   ```

4. **Configure Environment Variables**:
   Create a file named `.env` in the project root directory and add your OpenAI API key:

   ```
   OPENAI_API_KEY=your_openai_api_key
   ```

5. **Run the Application**:

   ```bash
   streamlit run app.py
   ```

6. **Access the App**:
   Open your browser and navigate to:

   ```
   http://localhost:8501
   ```
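
Step 4 only requires that `OPENAI_API_KEY` be readable by the app; this commit does not show how `app.py` loads `.env`. Python projects commonly call `load_dotenv()` from the `python-dotenv` package for this. Purely as a dependency-free illustration, the sketch below uses a hypothetical `load_env_file` helper that parses `KEY=value` lines and, like `load_dotenv()` by default, does not overwrite variables that are already set.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical helper: parse KEY=value lines from a .env file.

    Blank lines, '#' comment lines, and lines without '=' are skipped.
    Variables already present in os.environ are left untouched.
    Returns the key/value pairs read from the file.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # No .env file: nothing to load.
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Strip optional surrounding quotes, e.g. KEY="value".
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        # setdefault: an existing environment variable wins over the file.
        os.environ.setdefault(key, value)
    return loaded
```

Calling `load_env_file()` from the project root and then checking `os.environ.get("OPENAI_API_KEY")` is one way to fail early with a clear message when the key is missing. Real `.env` files can use syntax this sketch does not handle (multi-line values, `export` prefixes, inline comments), which is a reason to prefer `python-dotenv` in practice.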

---

## Usage

### Upload Files

- Upload one or more PDF or CSV files using the file uploader.
- Uploaded files are processed and stored in the `uploaded_files` directory.
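
The README does not show how uploads are written to `uploaded_files`. The sketch below is one plausible approach, not the app's code: a hypothetical `save_upload` helper that keeps only the base file name (so a crafted name cannot escape the directory), accepts only `.pdf` and `.csv`, and creates the directory on first use. The commented Streamlit wiring uses `st.file_uploader` with `accept_multiple_files=True`, whose uploaded-file objects expose `.name` and `.getvalue()`.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".pdf", ".csv"}


def save_upload(name: str, data: bytes, dest_dir: str = "uploaded_files") -> Path:
    """Hypothetical helper: write one uploaded file into dest_dir.

    Only the final path component of `name` is kept, so a name such as
    "../../etc/passwd.csv" cannot escape dest_dir. Raises ValueError for
    file types other than PDF and CSV.
    """
    safe_name = Path(name).name
    if Path(safe_name).suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {name!r}")
    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # Create on first upload.
    target = target_dir / safe_name
    target.write_bytes(data)
    return target


# Possible Streamlit wiring (not executed here):
#
#   import streamlit as st
#   files = st.file_uploader("Upload PDF or CSV files",
#                            type=["pdf", "csv"], accept_multiple_files=True)
#   for f in files or []:
#       save_upload(f.name, f.getvalue())
```

In this sketch, re-uploading a file with the same name overwrites the earlier copy; whether the app behaves the same way is not stated in the README.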

### Ask Questions

- Enter your query in the text box provided.
- The app retrieves relevant answers from the uploaded documents and displays the sources.
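
The README does not describe the retrieval step in detail; the app itself relies on LangChain and embedding search in Chroma. To illustrate only the retrieve-then-cite idea, the sketch below swaps in a plain bag-of-words cosine similarity for embedding search. `Chunk`, `retrieve`, and `format_citations` are hypothetical names, and the `source`/`page` metadata keys are an assumption modeled on what LangChain's PDF loaders typically attach.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a stored document chunk."""
    text: str
    metadata: dict = field(default_factory=dict)  # e.g. {"source": ..., "page": ...}


def _vectorize(text: str) -> Counter:
    # Bag-of-words term counts; a real setup would use an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[tuple[float, Chunk]]:
    """Return up to k chunks most similar to the query, best first (score > 0 only)."""
    q = _vectorize(query)
    scored = [(_cosine(q, _vectorize(c.text)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pair for pair in scored[:k] if pair[0] > 0]


def format_citations(hits: list[tuple[float, Chunk]]) -> list[str]:
    """Turn retrieved chunks into 'source (page N)' citation strings."""
    cites = []
    for _, chunk in hits:
        src = chunk.metadata.get("source", "unknown")
        page = chunk.metadata.get("page")
        cites.append(f"{src} (page {page})" if page is not None else src)
    return cites
```

In a full pipeline, the texts of the top-ranked chunks would be passed to the LLM together with the question, and the citation strings shown alongside the answer. Replacing `_vectorize` and `_cosine` with a real embedding model changes how relevance is measured, not the overall shape of the flow.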

### Download Cited Files

- Files cited in the response are available for download.
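
Several retrieved chunks often come from the same file, so the cited sources need de-duplicating before they are offered as downloads. The hypothetical `cited_files` helper below assumes each chunk's metadata carries a `source` path; it keeps the order of first citation, skips files that no longer exist, and returns only files inside `uploaded_files`. The commented wiring uses Streamlit's `st.download_button`; `source_documents` is a hypothetical name for the retrieved documents.

```python
from pathlib import Path


def cited_files(metadatas: list[dict], upload_dir: str = "uploaded_files") -> list[Path]:
    """Hypothetical helper: unique, existing files cited by retrieved chunks.

    Each metadata dict is assumed to carry a "source" path. Order of first
    citation is preserved, duplicates (several chunks from one file) are
    dropped, and only files located inside upload_dir are returned.
    """
    root = Path(upload_dir).resolve()
    seen: set[Path] = set()
    result: list[Path] = []
    for meta in metadatas:
        source = meta.get("source")
        if not source:
            continue  # Chunk without a recorded source.
        path = Path(source).resolve()
        if path in seen or not path.is_file() or root not in path.parents:
            continue
        seen.add(path)
        result.append(path)
    return result


# Possible Streamlit wiring (not executed here):
#
#   for path in cited_files([doc.metadata for doc in source_documents]):
#       st.download_button(f"Download {path.name}", path.read_bytes(),
#                          file_name=path.name, key=str(path))
```

Restricting results to `uploaded_files` keeps the download buttons from exposing arbitrary paths, should a metadata entry ever point elsewhere.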

---

## File Structure

```
smart-doc-search/
├── app.py             # Main Streamlit application
├── requirements.txt   # List of Python dependencies
├── .env               # Environment variables (not included in Git)
├── uploaded_files/    # Directory for storing uploaded files
├── chromadb/          # Directory for persistent Chroma vector database
├── screenshots/       # Screenshots used in this README
├── LICENSE            # MIT License text
└── README.md          # Project documentation
```

---

## Deployment

### Deploy on AWS EC2

1. Launch an Ubuntu EC2 instance and configure its security group to allow inbound traffic on ports 22 (SSH) and 8501 (Streamlit).
2. SSH into the instance and set up Python, Streamlit, and the application by following the installation steps above.
3. Run the app inside a terminal multiplexer such as `tmux` or `screen` so that it keeps running after you close the SSH session.

### Use Custom Domain

- Configure a reverse proxy (e.g., Nginx) to serve the Streamlit app under your domain. The proxy must forward WebSocket connections, which Streamlit uses to communicate with the browser.
- Open ports 80 and 443 in the security group, then enable HTTPS with TLS certificates obtained through Certbot.

---

## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository.
2. Create a new branch: `git checkout -b feature-name`
3. Commit your changes: `git commit -m "Add feature-name"`
4. Push to the branch: `git push origin feature-name`
5. Submit a pull request.

---

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

---

## Acknowledgments

- [LangChain](https://langchain.com)
- [Streamlit](https://streamlit.io)
- [Chroma](https://www.trychroma.com)
- [OpenAI](https://openai.com)

---

## Issues

If you encounter any issues or have feature requests, please [open an issue](https://github.com/stacksapien/smart-doc-search/issues).

---

## Screenshots

### Upload Documents

![Upload Documents](screenshots/1.png)

### Query and Get Results

![Query Results](screenshots/2.png)

---

## Author

- **Vishal Verma** - [LinkedIn](https://www.linkedin.com/in/stacksapien)

Feel free to reach out with any questions or feedback!

screenshots/1.png (179 KB)

screenshots/2.png (356 KB)
