AI-Powered Document Insight Tool

Resume analysis platform that uses Sarvam AI and Gemini to detect document types and generate structured insights from uploaded PDF documents.

🎯 Project Overview

A document analysis platform that specializes in resume processing. The system automatically detects the type of each uploaded document and returns a structured analysis aimed at recruitment and career development workflows.

🔑 Key Specializations

Resume Analysis: Specialized parsing for professional documents with structured output
Document Type Detection: Intelligent classification of uploaded documents
Multi-AI Integration: Dual AI provider setup with intelligent fallback mechanisms
Real-time Processing: Asynchronous document processing with live progress tracking

🏗️ System Architecture

graph TB
    subgraph "Frontend Layer (Vercel)"
        A[React 18 + TypeScript]
        A1[TailwindCSS Styling]
        A2[Clerk Authentication]
        A3[Mobile Detection]
        A4[404 Error Handling]
    end
    
    subgraph "API Gateway (Vercel Serverless)"
        B[FastAPI Backend]
        B1[JWT Validation]
        B2[File Upload Handler]
        B3[Error Management]
        B4[Health Monitoring]
    end
    
    subgraph "Document Processing Engine"
        C[PDFPlumber Extractor]
        C1[Document Type Detector]
        C2[Content Sanitization]
        C3[Text Preprocessing]
    end
    
    subgraph "AI Analysis Layer"
        D[Sarvam AI Primary]
        E[Gemini AI Secondary]
        F[Intelligent Fallback]
        G[Response Structuring]
    end
    
    subgraph "Data Layer"
        H[MongoDB Atlas]
        H1[User Document Store]
        H2[Binary PDF Storage]
        H3[Analysis History]
    end
    
    subgraph "Authentication & Security"
        I[Clerk Identity Provider]
        I1[JWT Token Management]
        I2[User Session Handling]
    end
    
    A -->|HTTPS API Calls| B
    B -->|Document Upload| C
    C -->|Extracted Text| D
    D -->|Fallback on Error| E
    E -->|Final Fallback| F
    D -->|Success Response| G
    E -->|Success Response| G
    F -->|Keyword Analysis| G
    G -->|Structured Data| H
    B -->|Auth Validation| I
    H -->|User Data| B
    B -->|JSON Response| A
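
The fallback path in the diagram (Sarvam, then Gemini, then keyword analysis) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `analyze_with_fallback`, the stub analyzers, and the `"keyword-fallback"` label are hypothetical, and it assumes `is_fallback` is set only when the local keyword analysis is used.

```python
from typing import Callable, Sequence, Tuple

# An analyzer takes extracted text and returns a summary string.
Analyzer = Callable[[str], str]


def analyze_with_fallback(
    text: str,
    providers: Sequence[Tuple[str, Analyzer]],
    last_resort: Analyzer,
) -> dict:
    """Try each AI provider in order; use local analysis if all of them fail."""
    for name, analyzer in providers:
        try:
            return {"summary": analyzer(text), "provider": name, "is_fallback": False}
        except Exception:
            # Any provider error moves on to the next entry in the chain.
            continue
    # Assumption: the final fallback is reported with its own provider label.
    return {
        "summary": last_resort(text),
        "provider": "keyword-fallback",
        "is_fallback": True,
    }


def failing_provider(text: str) -> str:
    """Stub that simulates an unavailable primary provider."""
    raise RuntimeError("provider unavailable")


def stub_secondary(text: str) -> str:
    """Stub that simulates a working secondary provider."""
    return f"summary of {len(text)} characters"


result = analyze_with_fallback(
    "Sample resume text",
    [("sarvam", failing_provider), ("gemini", stub_secondary)],
    last_resort=lambda text: "keyword summary",
)
print(result["provider"], result["is_fallback"])
```

Passing the providers as an ordered list keeps the chain easy to reorder or extend without touching the control flow.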

🛠️ Technology Stack

Backend Infrastructure

| Component | Technology | Version | Purpose |
|---|---|---|---|
| Framework | FastAPI | 0.104.1 | High-performance async API |
| Language | Python | 3.12+ | Core backend logic |
| Database | MongoDB Atlas | 7.0 | Document storage & history |
| PDF Processing | PDFPlumber | 0.9.0 | Text extraction from PDFs |
| HTTP Client | httpx | 0.24.0 | Async AI API communications |
| Authentication | PyJWT | 2.8.0 | JWT token validation |
| Deployment | Vercel Serverless | - | Auto-scaling serverless functions |

Frontend Architecture

| Component | Technology | Version | Purpose |
|---|---|---|---|
| Framework | React | 18.2.0 | Component-based UI framework |
| Language | TypeScript | 5.0.2 | Type-safe development |
| Build Tool | Vite | 5.0+ | Fast development & bundling |
| Styling | TailwindCSS | 3.3.0 | Utility-first CSS framework |
| Authentication | Clerk React | 4.29.0 | User authentication SDK |
| HTTP Client | Axios | 1.6.0 | API communication |
| Icons | Lucide React | 0.263.0 | Consistent iconography |
| Routing | React Router | 6.8.0 | Client-side navigation |

🚀 Local Development Setup

Prerequisites

Node.js 18+ and npm
Python 3.9+
Git for version control
MongoDB Atlas account (free tier available)
Clerk account for authentication
AI API Keys (optional - fallback available)

1. Clone Repository

git clone https://github.com/TechyCSR/AI-Powered-Document-Insight-Tool.git
cd AI-Powered-Document-Insight-Tool

2. Backend Setup

cd backend

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Setup environment variables
cp env.example .env

Edit backend/.env with your configuration:

# MongoDB Configuration
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/document_insights?retryWrites=true&w=majority

# Clerk Configuration
CLERK_SECRET_KEY=sk_test_your_clerk_secret_key_here

# AI API Keys (Optional - fallback available)
SARVAM_API_KEY=your_sarvam_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here

# Application Settings
ENVIRONMENT=development
DEBUG=True
ALLOWED_ORIGINS=http://localhost:5173,http://localhost:3000
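
As an illustration of how the backend might read these variables, here is a small sketch using only the standard library. The `Settings` class and `load_settings` function are hypothetical names and not taken from the repository; the sketch assumes the two AI keys are optional, as noted above, and that `ALLOWED_ORIGINS` is a comma-separated list.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    mongodb_uri: str
    clerk_secret_key: str
    sarvam_api_key: Optional[str]
    gemini_api_key: Optional[str]
    environment: str
    debug: bool
    allowed_origins: Tuple[str, ...]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping of variables (defaults to os.environ)."""
    env = os.environ if env is None else env
    origins = tuple(
        origin.strip()
        for origin in env.get("ALLOWED_ORIGINS", "").split(",")
        if origin.strip()
    )
    return Settings(
        mongodb_uri=env["MONGODB_URI"],            # required
        clerk_secret_key=env["CLERK_SECRET_KEY"],  # required
        sarvam_api_key=env.get("SARVAM_API_KEY") or None,  # optional
        gemini_api_key=env.get("GEMINI_API_KEY") or None,  # optional
        environment=env.get("ENVIRONMENT", "development"),
        debug=env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"},
        allowed_origins=origins,
    )
```

A missing `MONGODB_URI` or `CLERK_SECRET_KEY` raises `KeyError` at startup in this sketch, which surfaces configuration mistakes before the first request.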

3. Frontend Setup

cd frontend

# Install dependencies
npm install

# Setup environment variables
cp env.example .env.local

Edit frontend/.env.local with your configuration:

VITE_CLERK_PUBLISHABLE_KEY=pk_test_your_clerk_publishable_key_here
VITE_API_URL=http://localhost:8000

4. Start Development Servers

Terminal 1 - Backend API:

cd backend
# Ensure virtual environment is activated
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Terminal 2 - Frontend Application:

cd frontend
npm run dev

5. Access Application

Frontend: http://localhost:5173
Backend API: http://localhost:8000
API Documentation: http://localhost:8000/docs
Health Check: http://localhost:8000/api/v1/health

6. Development Workflow

Backend Changes: Auto-reload enabled with --reload flag
Frontend Changes: Hot Module Replacement (HMR) active
Database: MongoDB Atlas is cloud-hosted, so no local database process is required
Authentication: Clerk handles dev/prod environments automatically

📝 Quick Setup Checklist

Python 3.9+ installed
Node.js 18+ installed
MongoDB Atlas account created
Clerk account setup with project created
Environment variables configured
Dependencies installed
Both servers running
Application accessible at localhost:5173

🌐 Production Deployment

Production URLs

Frontend Application: https://summary.techycsr.dev
Backend API: https://summaryapi.techycsr.me
Health Endpoint: https://summaryapi.techycsr.dev/api/v1/health

API Endpoints

Core Endpoints

GET    /api/v1/health                     # System health check
POST   /api/v1/upload-resume              # Document upload & analysis
GET    /api/v1/insights                   # User document history
GET    /api/v1/document/{id}/preview      # PDF preview (authenticated)

Health Check Response

{
  "status": "healthy",
  "timestamp": "2025-08-30T13:27:21.331972",
  "environment": "production",
  "database": {
    "connected": true,
    "status": "connected",
    "error": null,
    "insights_count": 42
  }
}
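
A monitoring script could interpret this payload along the following lines. This is only a sketch: `is_healthy` is a hypothetical helper, and the rule it applies (status is `healthy`, the database is connected, and no error is reported) is an assumption based on the fields shown above.

```python
import json

SAMPLE_RESPONSE = """
{
  "status": "healthy",
  "timestamp": "2025-08-30T13:27:21.331972",
  "environment": "production",
  "database": {
    "connected": true,
    "status": "connected",
    "error": null,
    "insights_count": 42
  }
}
"""


def is_healthy(payload: str) -> bool:
    """Return True when the API and its database both report a healthy state."""
    data = json.loads(payload)
    database = data.get("database") or {}
    return (
        data.get("status") == "healthy"
        and database.get("connected") is True
        and database.get("error") is None
    )


print(is_healthy(SAMPLE_RESPONSE))
```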

Upload Resume Request

POST /api/v1/upload-resume
Content-Type: multipart/form-data
Authorization: Bearer {jwt_token}

file: {pdf_file}
provider: "sarvam" | "gemini"
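
For reference, the request above could be assembled with the Python standard library roughly as shown here. `build_upload_request` is a hypothetical helper; the field names `file` and `provider` come from the request description, while the manual multipart encoding is just one way to do it (the backend stack lists httpx, which can build the same body for you).

```python
import urllib.request
import uuid

ALLOWED_PROVIDERS = {"sarvam", "gemini"}


def build_upload_request(
    base_url: str, pdf_bytes: bytes, filename: str, provider: str, jwt_token: str
) -> urllib.request.Request:
    """Build a multipart/form-data POST for /api/v1/upload-resume."""
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    boundary = uuid.uuid4().hex
    provider_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="provider"\r\n\r\n'
        f"{provider}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = provider_part + file_header + pdf_bytes + closing
    return urllib.request.Request(
        url=f"{base_url}/api/v1/upload-resume",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {jwt_token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The returned object can be sent with `urllib.request.urlopen(request)` once a valid Clerk-issued token is available.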

Upload Response (Resume)

{
  "summary": "**👤 Name:** John Doe\n**📧 Contact:** john.doe@email.com, +1-555-0123\n**💼 Professional Summary:** Experienced software engineer with 5+ years...",
  "provider": "sarvam",
  "is_fallback": false,
  "filename": "resume.pdf",
  "upload_date": "2025-08-30T13:27:21.331972",
  "document_id": "66d1b2c3d4e5f6789abcdef0"
}
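
The `summary` field is a single string with bold labels, so a client that wants individual fields has to split it. The sketch below shows one possible approach; `parse_summary` is a hypothetical helper and it assumes every field starts on its own line in the form `**<emoji> Label:** value`, as in the sample response.

```python
import re
from typing import Dict

# Matches lines such as "**👤 Name:** John Doe".
FIELD_PATTERN = re.compile(r"^\*\*(?P<label>.+?):\*\*\s*(?P<value>.*)$")


def parse_summary(summary: str) -> Dict[str, str]:
    """Split a labelled summary string into a {label: value} dictionary."""
    fields: Dict[str, str] = {}
    current = None
    for raw_line in summary.splitlines():
        line = raw_line.strip()
        match = FIELD_PATTERN.match(line)
        if match:
            # Drop the leading emoji and spaces so the key is plain text.
            current = re.sub(r"^\W+", "", match.group("label")).strip()
            fields[current] = match.group("value").strip()
        elif current and line:
            # Continuation lines (for example bullets) extend the current field.
            fields[current] = (fields[current] + "\n" + line).strip()
    return fields


sample = "**👤 Name:** John Doe\n**📧 Contact:** john.doe@email.com, +1-555-0123"
print(parse_summary(sample))
```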

🧠 AI Analysis Features

Resume Analysis Format

👤 Name: [Extracted full name]
📧 Contact: [Phone, Email, Location]
💼 Professional Summary: [Key qualifications highlights]
🎯 Core Skills: [Technical and soft skills]
💪 Experience Highlights:
  • [Most relevant role with quantified achievements]
  • [Second important position with metrics]
  • [Third significant role with impact data]
🎓 Education: [Degrees, institutions, GPA if available]
🏆 Notable Achievements:
  • [Top accomplishment with quantified results]
  • [Second significant achievement]
  • [Third notable accomplishment]
📊 Career Insights:
  • Years of Experience: [Calculated total]
  • Industry Focus: [Primary domain]
  • Career Level: [Entry/Mid/Senior/Executive]

General Document Analysis Format

📄 Document Type: [Auto-detected type]
📝 Document Summary: [Comprehensive overview]
🔍 Key Insights: [Major findings and observations]
📊 Main Topics: [Primary and secondary topics]
💡 Critical Information: [Important facts and recommendations]
🎯 Target Audience: [Intended readers]
📈 Key Takeaways: [Actionable insights]

Document Type Detection

The system automatically detects document types:

Resume/CV: Professional experience documents
Research Paper: Academic and scientific documents
Proposal: Project and business proposals
Legal Document: Contracts and agreements
Report: Analysis and findings documents
General Document: Other document types
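
The README does not describe how detection is implemented. One simple way to approximate it is keyword scoring, sketched below. The keyword lists, the `min_hits` threshold, and the function name are all assumptions made for illustration and are not taken from the project.

```python
# Hypothetical keyword lists, one per document type named above.
DOCUMENT_KEYWORDS = {
    "Resume/CV": ("experience", "education", "skills", "resume", "curriculum vitae"),
    "Research Paper": ("abstract", "methodology", "references", "introduction"),
    "Proposal": ("proposal", "budget", "deliverables", "timeline"),
    "Legal Document": ("agreement", "hereby", "party", "clause"),
    "Report": ("findings", "executive summary", "analysis", "recommendations"),
}


def detect_document_type(text: str, min_hits: int = 2) -> str:
    """Return the type whose keywords appear most often, or a general default."""
    lowered = text.lower()
    scores = {
        doc_type: sum(keyword in lowered for keyword in keywords)
        for doc_type, keywords in DOCUMENT_KEYWORDS.items()
    }
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    # Too few matches means there is not enough evidence for a specific type.
    return best_type if best_score >= min_hits else "General Document"


print(detect_document_type("Work experience: 5 years. Education: BSc. Skills: Python."))
```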

🚀 Core Functionality

1. Document Upload & Processing

File Validation: PDF-only, 10MB size limit
Text Extraction: Advanced PDFPlumber integration
Content Sanitization: Clean text preprocessing
Progress Tracking: Real-time upload status
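
The validation rules above (PDF only, 10MB limit) could be enforced with a check like the one below. It is a sketch under stated assumptions: `validate_pdf_upload` is a hypothetical name, 10MB is taken to mean 10 × 1024 × 1024 bytes, and the `%PDF-` header check is an added safeguard that the README does not mention.

```python
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumption: 10MB means 10 MiB


def validate_pdf_upload(filename: str, content: bytes) -> None:
    """Raise ValueError if the upload is not an acceptable PDF file."""
    if not filename.lower().endswith(".pdf"):
        raise ValueError("only PDF files are accepted")
    if len(content) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 10MB size limit")
    # PDF files begin with the "%PDF-" marker; checking it catches renamed files.
    if not content.startswith(b"%PDF-"):
        raise ValueError("file content is not a valid PDF")


validate_pdf_upload("resume.pdf", b"%PDF-1.7\n...")  # passes silently
```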

2. AI-Powered Analysis

Document Type Detection: Intelligent classification
Dual AI Integration: Primary (Sarvam) + Secondary (Gemini)
Intelligent Fallback: Keyword frequency analysis
Structured Output: Formatted, actionable insights
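
To give a sense of what a keyword frequency fallback might look like when neither AI provider responds, here is a minimal sketch. The stop-word list, the three-letter minimum, and the `top_keywords` name are illustrative assumptions, not the project's implementation.

```python
import re
from collections import Counter
from typing import List, Tuple

# A deliberately small stop-word list for illustration.
STOP_WORDS = frozenset(
    {"the", "and", "for", "with", "that", "this", "from", "are", "was", "have"}
)


def top_keywords(text: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Return the most frequent words of three or more letters, minus stop words."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(limit)


sample_text = "Python developer with Python and Django experience. Python testing."
print(top_keywords(sample_text, limit=3))
```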

3. User Management

Secure Authentication: Clerk-based JWT validation
Personal History: Complete analysis tracking
PDF Preview: Authenticated document viewing
Session Management: Persistent user sessions

4. Enterprise Features

Responsive Design: Mobile detection with desktop optimization
Error Handling: Professional 404 pages and error management
Health Monitoring: System status tracking
Performance Optimization: Async processing and caching

📱 User Experience Design

Responsive Behavior

Desktop Optimized: Full-featured dashboard experience
Mobile Detection: Automatic redirection to mobile-optimized messaging
Tablet Support: Warning banners for limited mobile functionality
Progressive Enhancement: Graceful degradation across devices

Interface Highlights

Modern Dashboard: Clean, professional layout
Drag-and-Drop Upload: Intuitive file handling
Real-time Feedback: Progress indicators and status updates
Theme Support: Dark/light mode toggle
Error Recovery: User-friendly error messages and recovery options

⚡ Performance & Scalability

Backend Optimization

Serverless Architecture: Auto-scaling Vercel functions
Async Processing: Non-blocking I/O operations
Connection Pooling: Optimized MongoDB connections
Error Recovery: Graceful degradation and reconnection logic

Frontend Optimization

Code Splitting: Dynamic imports for reduced bundle size
Lazy Loading: On-demand component loading
Caching Strategy: Optimized API response caching
Bundle Analysis: Size optimization and tree shaking

Database Performance

Indexed Queries: Optimized user-based data retrieval
Document Storage: Efficient binary PDF storage
Connection Management: Serverless-optimized pooling

🔒 Security Implementation

Authentication & Authorization

JWT Validation: Secure token-based authentication
User Isolation: Strict data access controls
Session Management: Secure session handling
API Protection: Authenticated endpoint access

Data Security

File Validation: Strict PDF-only upload enforcement
Size Limits: 10MB maximum file size
Content Sanitization: Safe text processing
Secure Storage: Encrypted MongoDB Atlas storage

Infrastructure Security

HTTPS Enforcement: SSL/TLS encryption
Environment Variables: Secure configuration management
API Rate Limiting: Protection against abuse
Error Handling: Error responses that avoid disclosing internal details
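
The README does not say how rate limiting is implemented. A sliding-window limiter is one common approach, sketched below with hypothetical names and limits. The sketch keeps its state in memory, so on a serverless platform each function instance would count requests separately; a shared store would be needed for a global limit.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per key within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Discard timestamps that have left the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
print([limiter.allow("user-1", now=t) for t in (0, 1, 2)])
```

The optional `now` argument exists so the limiter can be exercised with fixed timestamps instead of the real clock.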

📊 System Status

✅ Backend API: Operational (99.9% uptime)
✅ Frontend App: Deployed & Responsive
✅ Database: MongoDB Atlas Connected (42 documents)
✅ AI Services: Sarvam + Gemini Operational
✅ Authentication: Clerk Integration Active
✅ File Processing: PDF Upload & Analysis Working
✅ Mobile Support: Detection & Redirection Active
✅ Error Handling: 404 Pages & Recovery Implemented

🎯 Project Highlights

Production-Ready: Fully deployed and operational system
Enterprise-Grade: Professional UI/UX with comprehensive error handling
AI-Specialized: Optimized for resume analysis with fallback intelligence
Scalable Architecture: Serverless deployment with auto-scaling capabilities
Modern Tech Stack: Latest frameworks and best practices implementation
Security-First: Comprehensive authentication and data protection
Performance-Optimized: Fast loading times and efficient processing

Built with modern web technologies for professional document analysis workflows. Deployed and operational at summary.techycsr.me

      "filename": "resume.pdf",
      "upload_date": "2025-08-28T14:00:00Z",
      "provider": "sarvam",
      "summary": "Professional summary...",
      "is_fallback": false,
      "file_size": 1234567
    }
  ],
  "total_count": 1
}


## 🚀 Deployment

### Prerequisites for Deployment
1. **Vercel Account** - For hosting both frontend and backend
2. **MongoDB Atlas** - Cloud database
3. **Clerk Account** - Authentication service
4. **Domain** (optional) - For custom domain

### Backend Deployment (Vercel)

1. **Connect to Vercel**
```bash
cd backend
npm i -g vercel  # Install Vercel CLI
vercel  # Follow the prompts
```

2. **Set Environment Variables** Go to your Vercel dashboard and add:

MONGODB_URI
CLERK_SECRET_KEY
SARVAM_API_KEY
GEMINI_API_KEY
ENVIRONMENT=production
DEBUG=False
ALLOWED_ORIGINS=https://your-frontend-domain.vercel.app

### Frontend Deployment (Vercel)

1. **Connect to Vercel**

cd frontend
vercel  # Follow the prompts

2. **Set Environment Variables** Add to Vercel dashboard:

VITE_CLERK_PUBLISHABLE_KEY
VITE_API_BASE_URL=https://your-backend-domain.vercel.app/api/v1

### Post-Deployment Configuration

1. **Update Clerk Settings**
   - Add your production domains to allowed origins
   - Update redirect URLs
2. **Update API CORS**
   - Add production frontend URL to backend CORS settings
3. **Test the Deployment**
   - Verify authentication flow
   - Test file upload functionality
   - Check AI provider integration
## 👨‍💻 Developer

Built with ❤️ by @TechyCSR

Professional full-stack developer specializing in AI-powered applications and modern web technologies.
