SiteBot is a Streamlit-based web application for asking questions about website content. Users load one or more website URLs and then ask questions about them; SiteBot answers from the loaded pages using Retrieval-Augmented Generation (RAG), with Groq for language model inference and Chroma as the vector store.
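The retrieve-then-generate loop behind RAG can be illustrated without any external services. The sketch below is a toy stand-in, not SiteBot's code: bag-of-words vectors and cosine similarity replace the HuggingFace embeddings and the Chroma store, and the final step only builds the prompt that would be sent to the Groq-hosted model. The helper names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for HuggingFace embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question (stand-in for a Chroma query)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: List[str], k: int = 2) -> str:
    """Assemble a context-grounded prompt from the top-k retrieved chunks."""
    context = "\n\n".join(retrieve(question, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    pages = [
        "Our pricing starts at 10 dollars per month for the basic plan.",
        "The company was founded in 2015 and is based in Berlin.",
        "Support is available by email and live chat.",
    ]
    print(build_prompt("How much does the basic plan cost?", pages, k=1))
```

With `k=1`, only the pricing sentence is placed in the context, because it shares the most words with the question. A real embedding model captures meaning rather than word overlap, but the shape of the pipeline (embed, rank, stuff into the prompt) is the same.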
- Website Content Analysis: Load multiple website URLs and query their content.
- Conversational Interface: Engage in a chat-like interaction with the assistant, maintaining conversation history.
- Suggested Queries: Automatically generates relevant questions based on loaded website content.
- Multi-Website Insights: Combines information from multiple websites, highlighting agreements, differences, and unique perspectives.
- Modern UI: Built with Streamlit, Tailwind CSS, and custom styling for an enhanced user experience.
- Robust Error Handling: Gracefully handles network issues, invalid URLs, and other exceptions.
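Handling invalid URLs gracefully can start with a validation step that runs before any network request. This is a minimal sketch under stated assumptions: only `http`/`https` URLs with a host are accepted, and a bare domain gets an `https://` scheme. `normalize_url` is a hypothetical helper, not part of SiteBot's documented code.

```python
from typing import Optional
from urllib.parse import urlparse, urlunparse


def normalize_url(raw: str) -> Optional[str]:
    """Return a cleaned http(s) URL, or None if the input is unusable.

    Assumption: a bare domain such as 'example.com' is given an https:// scheme.
    """
    candidate = raw.strip()
    # Reject empty input and anything containing whitespace.
    if not candidate or any(ch.isspace() for ch in candidate):
        return None
    if "://" not in candidate:
        candidate = "https://" + candidate
    try:
        parts = urlparse(candidate)
        _ = parts.port  # raises ValueError for a malformed or out-of-range port
    except ValueError:
        return None
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    # urlparse already lowercases the scheme; lowercase the netloc (host) too,
    # keep the path and query as typed, and drop the #fragment.
    return urlunparse(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", parts.params, parts.query, "")
    )
```

For example, `normalize_url("example.com")` gives `"https://example.com/"` and `normalize_url("ftp://example.com")` gives `None`. Only URLs that pass this check would be handed to the fetching step; network failures from `requests` would still need their own `try`/`except`, for instance around `requests.RequestException`.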
- Python 3.8+
- Streamlit: For the web interface.
- LangChain: For building the RAG pipeline.
- Chroma: For vector storage and retrieval.
- Groq: For language model inference.
- HuggingFace Embeddings: For text embeddings.
- BeautifulSoup: For web scraping.
- Requests: For HTTP requests.
- ChromaDB: For persistent vector storage.
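Before pages can be embedded and stored, the scraped HTML has to be reduced to visible text and split into chunks. The sketch below uses the standard library's `html.parser` as a stand-in for BeautifulSoup and a fixed-size overlapping splitter as a stand-in for a LangChain text splitter. The class and function names and the default `size`/`overlap` values are assumptions for illustration.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, joined with single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not text:
        return []
    step = size - overlap
    # Stop once a window already reaches the end of the text.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side; each chunk would then be embedded and added to the vector store.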
- Clone the Repository: `git clone https://github.com/your-username/sitebot.git`, then `cd sitebot`
- Create a Virtual Environment: `python -m venv venv`, then activate it with `source venv/bin/activate` (on Windows: `venv\Scripts\activate`)
- Install Dependencies: `pip install -r requirements.txt`
- Set Up Environment Variables: create a `.env` file in the project root and add your Groq API key: `GROQ_API_KEY=your_groq_api_key`
- Run the Application: `streamlit run app2.py`
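The `.env` step above can be sketched as follows, assuming the app loads the file at startup with python-dotenv (which appears in the dependency list) and stops early if the key is missing. `load_env_file` and `get_groq_api_key` are hypothetical helpers; the small fallback parser exists only so the snippet runs when python-dotenv is not installed.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Load KEY=VALUE lines into os.environ without overriding existing values."""
    try:
        from dotenv import load_dotenv  # python-dotenv, listed in requirements.txt
        load_dotenv(path)  # does not override variables that are already set
        return
    except ImportError:
        pass
    # Minimal fallback parser: skips blanks and comments, strips simple quotes.
    env_path = Path(path)
    if not env_path.is_file():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_groq_api_key() -> str:
    """Return GROQ_API_KEY, failing with a clear message if it is not configured."""
    load_env_file()
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to .env in the project root.")
    return key
```

Failing at startup with an explicit message is easier to diagnose than an authentication error surfacing later from the first model call.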
- Add Websites: In the sidebar, enter website URLs to load their content.
- Select Websites: Choose which loaded websites to query.
- Ask Questions: Use the chat input to ask questions about the selected websites.
- Explore Suggested Questions: Try pre-generated questions for quick insights.
- Clear Chat: Reset the conversation history using the "Clear Chat" button.
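In a Streamlit app, conversation history is usually kept in `st.session_state` so it survives reruns, and a "Clear Chat" button empties it. The plain-Python sketch below models only that bookkeeping. The `ChatHistory` class, the role names, and the `recent` window are hypothetical and are not taken from SiteBot's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChatHistory:
    """Ordered chat messages, as a Streamlit app might keep them in session state."""

    messages: List[Dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        if role not in ("user", "assistant"):
            raise ValueError(f"unknown role: {role!r}")
        self.messages.append({"role": role, "content": content})

    def recent(self, max_turns: int = 3) -> List[Dict[str, str]]:
        """Last `max_turns` user/assistant pairs, e.g. to give the model conversational context."""
        return self.messages[-2 * max_turns:] if max_turns > 0 else []

    def clear(self) -> None:
        """What a 'Clear Chat' button would trigger."""
        self.messages.clear()
```

Inside the app, one plausible wiring (an assumption) is to create the object once with `if "history" not in st.session_state: st.session_state.history = ChatHistory()` and call `st.session_state.history.clear()` when `st.button("Clear Chat")` returns `True`.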
- Add URLs such as `https://example.com` and `https://anotherexample.com`.
- Select both URLs from the multiselect dropdown.
- Ask: "What are the main topics covered by these websites?"
- SiteBot will provide a combined response with insights from both websites, including a comparative analysis if applicable.
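A combined, comparative answer depends on keeping the retrieved context labeled by source, so the model can attribute agreements and differences to specific websites. A minimal sketch, assuming retrieval has already returned chunks grouped by URL; `build_comparison_prompt` and its wording are hypothetical, not SiteBot's actual prompt.

```python
from typing import Dict, List


def build_comparison_prompt(question: str, chunks_by_url: Dict[str, List[str]]) -> str:
    """Group context per source URL and, for several sources, ask for a comparison."""
    sections = []
    for i, (url, chunks) in enumerate(chunks_by_url.items(), start=1):
        body = "\n".join(f"- {c}" for c in chunks) if chunks else "- (no relevant content found)"
        sections.append(f"[Source {i}: {url}]\n{body}")
    instructions = (
        "Answer the question using only the sources below. "
        "Cite sources by number."
    )
    if len(chunks_by_url) > 1:
        # Only request a comparison when there is more than one website to compare.
        instructions += (
            " Then note where the sources agree, where they differ, "
            "and any point only one source makes."
        )
    return f"{instructions}\n\n" + "\n\n".join(sections) + f"\n\nQuestion: {question}"
```

Marking a source with no matching content explicitly, rather than omitting it, lets the model say that one website does not cover the topic instead of silently answering from the other.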
sitebot/
├── app2.py # Main application script
├── requirements.txt # Python dependencies
├── .env # Environment variables (not tracked)
├── chroma_db/ # Chroma vector store data
└── README.md # Project documentation
See `requirements.txt` for the complete list. Key dependencies include:
- `streamlit`
- `langchain`
- `langchain-community`
- `langchain-groq`
- `chromadb`
- `requests`
- `beautifulsoup4`
- `python-dotenv`
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a feature branch (`git checkout -b feature/your-feature`).
- Commit your changes (`git commit -m 'Add your feature'`).
- Push to the branch (`git push origin feature/your-feature`).
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.