InsightPDF is a full-stack application that lets you upload multiple PDF documents and ask questions about their content in a chat interface. It uses Google Gemini for embeddings and answer generation, FAISS for vector search, and S3 for scalable, serverless storage.
- Upload Multiple PDFs: Drag and drop or select multiple PDF files for analysis.
- AI-Powered Chat: Ask questions about your uploaded documents and get intelligent, context-aware answers.
- Per-Session Isolation: Each upload session is isolated with a unique session ID.
- Cloud-Native Storage: All vector indexes are stored and loaded directly from S3 (no local storage required).
- Amazon S3 Vectors Storage: Utilizes Amazon S3 Vectors for scalable, serverless vector storage and high-performance similarity search.
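
The per-session isolation listed above could look roughly like the following. This is a minimal sketch, not the project's actual code: the `sessions/<id>/` key layout, the `index.faiss` filename, and the helper names `new_session_id` and `index_key` are all assumptions made for illustration.

```python
import uuid


def new_session_id() -> str:
    """Return a random, hard-to-guess ID for one upload session."""
    return uuid.uuid4().hex  # 32 lowercase hex characters


def index_key(session_id: str, filename: str = "index.faiss") -> str:
    """Build the S3 object key for a session's vector index (hypothetical layout)."""
    # Accept only 32-character hex IDs so a crafted value such as
    # "../other-session" cannot point outside its own prefix.
    if len(session_id) != 32 or any(c not in "0123456789abcdef" for c in session_id):
        raise ValueError("invalid session id")
    return f"sessions/{session_id}/{filename}"
```

Keeping every object for a session under one prefix means a session can only ever read the index it created, and stale sessions can be cleaned up by prefix.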
- Python 3.11 or higher
- Node.js 20+ and npm
my-project/
├── backend/
│   ├── .dockerignore
│   ├── Dockerfile
│   ├── Dockerfile.local
│   ├── main.py
│   ├── ai_utils.py
│   ├── config.py
│   ├── pdf_utils.py
│   ├── vector_utils.py
│   ├── requirements.txt
│   └── .env.example
├── frontend/
│   ├── public/
│   │   ├── favicon.ico
│   │   └── logo.png
│   ├── src/
│   │   ├── components/
│   │   │   ├── ChatBot.tsx
│   │   │   ├── ChatInput.tsx
│   │   │   ├── GradientBG.tsx
│   │   │   ├── MessageBubble.tsx
│   │   │   └── PdfUploader.tsx
│   │   ├── App.tsx
│   │   ├── main.tsx
│   │   └── theme.tsx
│   ├── .env.example
│   ├── eslint.config.js
│   ├── index.html
│   ├── package-lock.json
│   ├── package.json
│   ├── vite-env.d.ts
│   └── vite.config.js
├── .gitignore
└── README.md
Upload PDFs:
- User uploads one or more PDF files via the frontend.
- Backend extracts text, chunks it, creates a FAISS vector index, and uploads it to S3.
- Backend returns a `session_id` to the frontend.

Chat with Documents:
- User asks questions in the chat UI.
- Frontend sends the question and `session_id` to the backend.
- Backend loads the FAISS index from S3, performs a similarity search, and uses Google Gemini to generate an answer.
- Answer is returned and displayed in the chat.
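
The chunking and similarity-search steps in this flow can be illustrated with a small, dependency-free sketch. It is not the project's implementation: a brute-force cosine ranking stands in for FAISS, the vectors are expected to come from an embedding model such as Gemini, and the chunk size, overlap, and function names are assumptions.

```python
import math


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # each new chunk starts this many characters later
    return [text[i:i + size] for i in range(0, len(text), step)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return the indexes of the k chunks most similar to the query (brute force)."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

In the real flow, the chunks returned by the search would be passed to the language model as context alongside the user's question. A vector index such as FAISS does the same ranking job far more efficiently on large collections.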
- GOOGLE_API_KEY=your-google-api-key (get your Google API key from https://aistudio.google.com/app/apikey)
- S3_BUCKET=your-s3-bucket-name
- CORS_ORIGINS=your-frontend-domain
- VITE_API_URL=http://localhost:8000
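
One way the backend could read and validate its variables at startup is sketched below. The `Settings` class and `load_settings` function are hypothetical names, and treating `CORS_ORIGINS` as a comma-separated list is an assumption, not something the project confirms.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    s3_bucket: str
    cors_origins: list[str]


def load_settings(env=None) -> Settings:
    """Read backend settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    # Fail fast with a clear message instead of erroring on the first request.
    missing = [name for name in ("GOOGLE_API_KEY", "S3_BUCKET") if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # Assumption: multiple origins are separated by commas.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    return Settings(env["GOOGLE_API_KEY"], env["S3_BUCKET"], origins)
```

Accepting an explicit mapping keeps the function easy to test without touching the real environment.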
- Change into the backend directory: `cd backend`
- Create and activate a Python virtual environment.
- Install dependencies: `pip install -r requirements.txt`
- Copy `.env.example` to `.env` and fill in your values.
- Run the server: `uvicorn main:app --reload`
- Change into the frontend directory: `cd frontend`
- Install dependencies: `npm install`
- Copy `.env.example` to `.env` and set `VITE_API_URL` if needed.
- Run the dev server: `npm run dev`
- Build the Docker image using the provided Lambda Dockerfile (`backend/Dockerfile`).
- Push the image to AWS ECR (Elastic Container Registry).
- Create or update an AWS Lambda function using the ECR image.
- Set environment variables in Lambda as needed.
- Create an API Gateway and connect it to your Lambda function.
- Configure CORS and endpoint security as required.
- Build the frontend with `npm run build`. Set `VITE_API_URL` before building.
- Upload the contents of the `dist` folder to your S3 bucket.
- Set up the S3 bucket for static website hosting (block public access, use OAC if needed).
- Create a CloudFront distribution pointing to the S3 bucket.
- (Optional) Set up a custom domain and SSL certificate using AWS ACM.
- Update DNS records to point your domain to the CloudFront distribution.
Backend:
- FastAPI (API framework)
- FAISS (vector search)
- Google Gemini (AI embeddings & chat)
- PyMuPDF, PyPDF2 (PDF parsing)
- Boto3 (AWS SDK for S3)
Frontend:
- React (UI library)
- Vite (build tool)
- Material UI (component library)
Cloud & DevOps:
- AWS Lambda (serverless backend)
- AWS API Gateway (API endpoint)
- AWS S3 (static hosting & vector storage)
- AWS CloudFront (CDN & SSL)
- AWS ECR (container registry)
- Docker (containerization)
This project is licensed under the MIT License. See the LICENSE file for details.
Created by Harsh Negi
- CI/CD Pipelines: Automate build, test, and deployment using GitHub Actions or AWS CodePipeline.
- Session Management & User Authentication: Add user accounts, persistent sessions, and secure authentication (OAuth, JWT, etc.).
Suggestions and contributions are welcome!