Skip to content

TechyCSR/AI-Powered-Document-Insight-Tool

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

25 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

AI-Powered Document Insight Tool

Professional-grade resume analysis platform powered by advanced AI models with intelligent document type detection and comprehensive insight generation.

Live Demo API Status

🎯 Project Overview

An enterprise-grade document analysis platform specializing in resume processing with AI-powered insights. The system automatically detects document types and provides structured analysis optimized for professional recruitment and career development workflows.

πŸ”‘ Key Specializations

  • Resume Analysis: Specialized parsing for professional documents with structured output
  • Document Type Detection: Intelligent classification of uploaded documents
  • Multi-AI Integration: Dual AI provider setup with intelligent fallback mechanisms
  • Real-time Processing: Asynchronous document processing with live progress tracking

πŸ—οΈ System Architecture

graph TB
    subgraph "Frontend Layer (Vercel)"
        A[React 18 + TypeScript]
        A1[TailwindCSS Styling]
        A2[Clerk Authentication]
        A3[Mobile Detection]
        A4[404 Error Handling]
    end
    
    subgraph "API Gateway (Vercel Serverless)"
        B[FastAPI Backend]
        B1[JWT Validation]
        B2[File Upload Handler]
        B3[Error Management]
        B4[Health Monitoring]
    end
    
    subgraph "Document Processing Engine"
        C[PDFPlumber Extractor]
        C1[Document Type Detector]
        C2[Content Sanitization]
        C3[Text Preprocessing]
    end
    
    subgraph "AI Analysis Layer"
        D[Sarvam AI Primary]
        E[Gemini AI Secondary]
        F[Intelligent Fallback]
        G[Response Structuring]
    end
    
    subgraph "Data Layer"
        H[MongoDB Atlas]
        H1[User Document Store]
        H2[Binary PDF Storage]
        H3[Analysis History]
    end
    
    subgraph "Authentication & Security"
        I[Clerk Identity Provider]
        I1[JWT Token Management]
        I2[User Session Handling]
    end
    
    A -->|HTTPS API Calls| B
    B -->|Document Upload| C
    C -->|Extracted Text| D
    D -->|Fallback on Error| E
    E -->|Final Fallback| F
    D -->|Success Response| G
    E -->|Success Response| G
    F -->|Keyword Analysis| G
    G -->|Structured Data| H
    B -->|Auth Validation| I
    H -->|User Data| B
    B -->|JSON Response| A
Loading

πŸ› οΈ Technology Stack

Backend Infrastructure

Component Technology Version Purpose
Framework FastAPI 0.104.1 High-performance async API
Language Python 3.12+ Core backend logic
Database MongoDB Atlas 7.0 Document storage & history
PDF Processing PDFPlumber 0.9.0 Text extraction from PDFs
HTTP Client httpx 0.24.0 Async AI API communications
Authentication PyJWT 2.8.0 JWT token validation
Deployment Vercel Serverless - Auto-scaling serverless functions

Frontend Architecture

Component Technology Version Purpose
Framework React 18.2.0 Component-based UI framework
Language TypeScript 5.0.2 Type-safe development
Build Tool Vite 5.0+ Fast development & bundling
Styling TailwindCSS 3.3.0 Utility-first CSS framework
Authentication Clerk React 4.29.0 User authentication SDK
HTTP Client Axios 1.6.0 API communication
Icons Lucide React 0.263.0 Consistent iconography
Routing React Router 6.8.0 Client-side navigation


πŸš€ Local Development Setup

Prerequisites

  • Node.js 18+ and npm
  • Python 3.9+
  • Git for version control
  • MongoDB Atlas account (free tier available)
  • Clerk account for authentication
  • AI API Keys (optional - fallback available)

1. Clone Repository

git clone https://github.com/TechyCSR/AI-Powered-Document-Insight-Tool.git
cd AI-Powered-Document-Insight-Tool

2. Backend Setup

cd backend

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Setup environment variables
cp env.example .env

Edit backend/.env with your configuration:

# MongoDB Configuration
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/document_insights?retryWrites=true&w=majority

# Clerk Configuration
CLERK_SECRET_KEY=sk_test_your_clerk_secret_key_here

# AI API Keys (Optional - fallback available)
SARVAM_API_KEY=your_sarvam_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here

# Application Settings
ENVIRONMENT=development
DEBUG=True
ALLOWED_ORIGINS=http://localhost:5173,http://localhost:3000

3. Frontend Setup

cd frontend

# Install dependencies
npm install

# Setup environment variables
cp env.example .env.local

Edit frontend/.env.local with your configuration:

VITE_CLERK_PUBLISHABLE_KEY=pk_test_your_clerk_publishable_key_here
VITE_API_URL=http://localhost:8000

4. Start Development Servers

Terminal 1 - Backend API:

cd backend
# Ensure virtual environment is activated
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Terminal 2 - Frontend Application:

cd frontend
npm run dev

5. Access Application

6. Development Workflow

  1. Backend Changes: Auto-reload enabled with --reload flag
  2. Frontend Changes: Hot Module Replacement (HMR) active
  3. Database: MongoDB Atlas automatically syncs
  4. Authentication: Clerk handles dev/prod environments automatically

πŸ“ Quick Setup Checklist

  • Python 3.9+ installed
  • Node.js 18+ installed
  • MongoDB Atlas account created
  • Clerk account setup with project created
  • Environment variables configured
  • Dependencies installed
  • Both servers running
  • Application accessible at localhost:5173

🌐 Production Deployment

Production URLs

API Endpoints

Core Endpoints

GET    /api/v1/health                     # System health check
POST   /api/v1/upload-resume              # Document upload & analysis
GET    /api/v1/insights                   # User document history
GET    /api/v1/document/{id}/preview      # PDF preview (authenticated)

Health Check Response

{
  "status": "healthy",
  "timestamp": "2025-08-30T13:27:21.331972",
  "environment": "production",
  "database": {
    "connected": true,
    "status": "connected",
    "error": null,
    "insights_count": 42
  }
}

Upload Resume Request

POST /api/v1/upload-resume
Content-Type: multipart/form-data
Authorization: Bearer {jwt_token}

file: {pdf_file}
provider: "sarvam" | "gemini"

Upload Response (Resume)

{
  "summary": "**πŸ‘€ Name:** John Doe\n**πŸ“§ Contact:** john.doe@email.com, +1-555-0123\n**πŸ’Ό Professional Summary:** Experienced software engineer with 5+ years...",
  "provider": "sarvam",
  "is_fallback": false,
  "filename": "resume.pdf",
  "upload_date": "2025-08-30T13:27:21.331972",
  "document_id": "66d1b2c3d4e5f6789abcdef0"
}

🧠 AI Analysis Features

Resume Analysis Format

πŸ‘€ Name: [Extracted full name]
πŸ“§ Contact: [Phone, Email, Location]
πŸ’Ό Professional Summary: [Key qualifications highlights]
🎯 Core Skills: [Technical and soft skills]
πŸ’ͺ Experience Highlights:
  β€’ [Most relevant role with quantified achievements]
  β€’ [Second important position with metrics]
  β€’ [Third significant role with impact data]
πŸŽ“ Education: [Degrees, institutions, GPA if available]
πŸ† Notable Achievements:
  β€’ [Top accomplishment with quantified results]
  β€’ [Second significant achievement]
  β€’ [Third notable accomplishment]
πŸ“Š Career Insights:
  β€’ Years of Experience: [Calculated total]
  β€’ Industry Focus: [Primary domain]
  β€’ Career Level: [Entry/Mid/Senior/Executive]

General Document Analysis Format

πŸ“„ Document Type: [Auto-detected type]
πŸ“ Document Summary: [Comprehensive overview]
πŸ” Key Insights: [Major findings and observations]
πŸ“Š Main Topics: [Primary and secondary topics]
πŸ’‘ Critical Information: [Important facts and recommendations]
🎯 Target Audience: [Intended readers]
πŸ“ˆ Key Takeaways: [Actionable insights]

Document Type Detection

The system automatically detects document types:

  • Resume/CV: Professional experience documents
  • Research Paper: Academic and scientific documents
  • Proposal: Project and business proposals
  • Legal Document: Contracts and agreements
  • Report: Analysis and findings documents
  • General Document: Other document types

πŸš€ Core Functionality

1. Document Upload & Processing

  • File Validation: PDF-only, 10MB size limit
  • Text Extraction: Advanced PDFPlumber integration
  • Content Sanitization: Clean text preprocessing
  • Progress Tracking: Real-time upload status

2. AI-Powered Analysis

  • Document Type Detection: Intelligent classification
  • Dual AI Integration: Primary (Sarvam) + Secondary (Gemini)
  • Intelligent Fallback: Keyword frequency analysis
  • Structured Output: Formatted, actionable insights

3. User Management

  • Secure Authentication: Clerk-based JWT validation
  • Personal History: Complete analysis tracking
  • PDF Preview: Authenticated document viewing
  • Session Management: Persistent user sessions

4. Enterprise Features

  • Responsive Design: Mobile detection with desktop optimization
  • Error Handling: Professional 404 pages and error management
  • Health Monitoring: System status tracking
  • Performance Optimization: Async processing and caching

πŸ“± User Experience Design

Responsive Behavior

  • Desktop Optimized: Full-featured dashboard experience
  • Mobile Detection: Automatic redirection to mobile-optimized messaging
  • Tablet Support: Warning banners for limited mobile functionality
  • Progressive Enhancement: Graceful degradation across devices

Interface Highlights

  • Modern Dashboard: Clean, professional layout
  • Drag-and-Drop Upload: Intuitive file handling
  • Real-time Feedback: Progress indicators and status updates
  • Theme Support: Dark/light mode toggle
  • Error Recovery: User-friendly error messages and recovery options

⚑ Performance & Scalability

Backend Optimization

  • Serverless Architecture: Auto-scaling Vercel functions
  • Async Processing: Non-blocking I/O operations
  • Connection Pooling: Optimized MongoDB connections
  • Error Recovery: Graceful degradation and reconnection logic

Frontend Optimization

  • Code Splitting: Dynamic imports for reduced bundle size
  • Lazy Loading: On-demand component loading
  • Caching Strategy: Optimized API response caching
  • Bundle Analysis: Size optimization and tree shaking

Database Performance

  • Indexed Queries: Optimized user-based data retrieval
  • Document Storage: Efficient binary PDF storage
  • Connection Management: Serverless-optimized pooling

πŸ”’ Security Implementation

Authentication & Authorization

  • JWT Validation: Secure token-based authentication
  • User Isolation: Strict data access controls
  • Session Management: Secure session handling
  • API Protection: Authenticated endpoint access

Data Security

  • File Validation: Strict PDF-only upload enforcement
  • Size Limits: 10MB maximum file size
  • Content Sanitization: Safe text processing
  • Secure Storage: Encrypted MongoDB Atlas storage

Infrastructure Security

  • HTTPS Enforcement: SSL/TLS encryption
  • Environment Variables: Secure configuration management
  • API Rate Limiting: Protection against abuse
  • Error Handling: Secure error message disclosure

πŸ“Š System Status

βœ… Backend API: Operational (99.9% uptime)
βœ… Frontend App: Deployed & Responsive
βœ… Database: MongoDB Atlas Connected (42 documents)
βœ… AI Services: Sarvam + Gemini Operational
βœ… Authentication: Clerk Integration Active
βœ… File Processing: PDF Upload & Analysis Working
βœ… Mobile Support: Detection & Redirection Active
βœ… Error Handling: 404 Pages & Recovery Implemented

🎯 Project Highlights

  • Production-Ready: Fully deployed and operational system
  • Enterprise-Grade: Professional UI/UX with comprehensive error handling
  • AI-Specialized: Optimized for resume analysis with fallback intelligence
  • Scalable Architecture: Serverless deployment with auto-scaling capabilities
  • Modern Tech Stack: Latest frameworks and best practices implementation
  • Security-First: Comprehensive authentication and data protection
  • Performance-Optimized: Fast loading times and efficient processing

Built with modern web technologies for professional document analysis workflows. Deployed and operational at summary.techycsr.me "filename": "resume.pdf", "upload_date": "2025-08-28T14:00:00Z", "provider": "sarvam", "summary": "Professional summary...", "is_fallback": false, "file_size": 1234567 } ], "total_count": 1 }


## πŸš€ Deployment

### Prerequisites for Deployment
1. **Vercel Account** - For hosting both frontend and backend
2. **MongoDB Atlas** - Cloud database
3. **Clerk Account** - Authentication service
4. **Domain** (optional) - For custom domain

### Backend Deployment (Vercel)

1. **Connect to Vercel**
```bash
cd backend
npm i -g vercel  # Install Vercel CLI
vercel  # Follow the prompts
  1. Set Environment Variables Go to your Vercel dashboard and add:
  • MONGODB_URI
  • CLERK_SECRET_KEY
  • SARVAM_API_KEY
  • GEMINI_API_KEY
  • ENVIRONMENT=production
  • DEBUG=False
  • ALLOWED_ORIGINS=https://your-frontend-domain.vercel.app

Frontend Deployment (Vercel)

  1. Connect to Vercel
cd frontend
vercel  # Follow the prompts
  1. Set Environment Variables Add to Vercel dashboard:
  • VITE_CLERK_PUBLISHABLE_KEY
  • VITE_API_BASE_URL=https://your-backend-domain.vercel.app/api/v1

Post-Deployment Configuration

  1. Update Clerk Settings

    • Add your production domains to allowed origins
    • Update redirect URLs
  2. Update API CORS

    • Add production frontend URL to backend CORS settings
  3. Test the Deployment

    • Verify authentication flow
    • Test file upload functionality
    • Check AI provider integration

πŸ‘¨β€πŸ’» Developer

Built with ❀️ by @TechyCSR

Professional full-stack developer specializing in AI-powered applications and modern web technologies.


Β© 2025 TechyCSR β€’ AI-Powered Document Analysis Platform β€’ summary.techycsr.me