
kunalverma2512/Intellexa


Intellexa

Intellexa is a lightweight, experimental RAG (Retrieval-Augmented Generation) system that analyzes the content of a user's HTML-based website and generates answers or summaries about that content.

This project currently supports HTML-based websites only. Non-HTML sites and JavaScript-heavy SPAs are not yet supported.


💡 Objective

To create a simple and practical AI assistant that:

  • Crawls and processes a user-provided HTML website.
  • Converts the website content into vector embeddings.
  • Stores and indexes the embeddings in Qdrant (a vector database).
  • Uses a RAG pipeline with LangChain to retrieve relevant context and answer queries based on the content.
  • Provides a clean UI for inputting a website and asking questions.
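
The retrieval step in the list above can be illustrated with a minimal, dependency-free sketch. It is not the project's code: a toy bag-of-words vector stands in for the sentence-transformers embeddings, a plain Python list stands in for Qdrant, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list) -> str:
    """Assemble the text that would be sent to a language model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Hypothetical chunks, as if taken from a crawled website.
chunks = [
    "Our store ships orders within two business days.",
    "Returns are accepted within 30 days of purchase.",
    "Contact support by email for billing questions.",
]
question = "How many days do I have for returns?"
top = retrieve(question, chunks, k=1)
print(build_prompt(question, top))
```

In a real pipeline the prompt would be passed to a language model; the sketch stops at prompt assembly because the source does not say which model is used.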

This project is designed as a proof-of-concept and is not yet production-ready.


🧱 Tech Stack

🔹 Frontend (Vite + React)

  • Form for submitting a website URL, plus a form for entering text directly.
  • Chat-like UI for asking questions.
  • Loading indicators and basic user experience enhancements.

🔹 Node Backend (backend/)

  • Handles the initial website crawl and sends the crawled content to the Python backend.
  • Serves as a communication bridge between frontend and ML components.

🔹 Python Backend (python-backend/)

  • Extracts text from the HTML using LangChain.
  • Splits the text into chunks using LangChain's TextSplitter.
  • Converts chunks to embeddings using sentence-transformers.
  • Stores vectors in Qdrant for similarity search.
  • Answers user queries with RAG, using the retrieved chunks as context.
  • Placeholder for future Neo4j integration (currently unused).
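
The first two steps above (text extraction and chunking) can be sketched with the standard library alone. This is an illustration under assumptions, not the project's implementation: `html.parser` stands in for the LangChain HTML loader, a fixed-size character splitter with overlap stands in for LangChain's text splitter, and `TextExtractor`, `html_to_text`, and `split_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def split_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list:
    """Split text into fixed-size character chunks that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last chunk is not fully covered by the previous one.
    stop = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, stop, step)]


page = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Hello</h1><script>var x = 1;</script>"
    "<p>World of &amp; text</p></body></html>"
)
text = html_to_text(page)
for chunk in split_text(text, chunk_size=12, overlap=4):
    print(repr(chunk))
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both neighbouring chunks, which usually helps retrieval; the sizes here are arbitrary demo values.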

✅ Features Implemented

  • ✅ Website crawling and scraping (HTML only).
  • ✅ Chunking and embedding generation.
  • ✅ Vector storage and retrieval using Qdrant.
  • ✅ Question answering using LangChain RAG pipeline.
  • ✅ Seamless multi-service interaction (Node + Python + Frontend).
  • ✅ Clean GitHub structure with appropriate .gitignore in all folders.

📁 Folder Structure

Intellexa/
├── client/           # React frontend (Vite)
├── backend/          # Node.js backend (Crawling & API)
├── python-backend/   # FastAPI + LangChain backend
├── docker-compose.yml
└── README.md

🚀 How to Clone and Run This Project

📥 Clone the Repository

git clone https://github.com/kunalverma2512/Intellexa.git
cd Intellexa

💻 Start the React Frontend

cd client
npm install
npm run dev

🛠️ Start the Node.js Backend

In a separate terminal, from the project root, run:

cd backend
npm install
npm run dev

🐍 Start the Python Backend (FastAPI + LangChain)

Make sure you have Docker and Docker Compose installed.

From the root directory of the project, run:

docker-compose up --build
