A production-grade backend system that automates the full lifecycle of candidate CV processing for an education recruitment agency – from bulk upload through AI-powered enhancement, PDF generation, geo-filtered organization matching, and targeted email outreach.
Built with Django 6, Celery, MinIO, PostgreSQL, Redis, and SendGrid, containerised with Docker.
- Overview
- Architecture
- Tech Stack
- Project Structure
- Getting Started – Docker
- Getting Started – Local Development
- Environment Variables
- API Reference
- Background Tasks
- Performance Notes
- Key Design Decisions
EduKai automates a recruitment agency's entire candidate workflow:
- **Bulk CV Upload** – Upload 500–1000 CVs at once. Each CV is stored in MinIO and queued for AI processing.
- **AI Processing** – A FastAPI AI service extracts candidate data, performs quality checks, and generates enhanced email content using Celery.
- **PDF Generation** – WeasyPrint generates a branded enhanced CV PDF stored in MinIO.
- **Availability Email** – Candidates are automatically emailed about new opportunities via SendGrid.
- **Organization Management** – Import 24,000+ schools from Excel and auto-geocode addresses using Nominatim (free, no API key).
- **Geo Filtering** – Find all organizations within N km of a candidate using their postcode.
- **Targeted Outreach** – Send candidate profiles to up to 1000 selected school contacts in one request.
- **Dashboard** – Real-time statistics, activity log, and notification system for the system operator.
```mermaid
graph TD
    subgraph Infrastructure
        Redis[Redis<br/>Broker + Cache<br/>:6379]
        Postgres[PostgreSQL<br/>Primary DB<br/>:5431]
        MinIO[MinIO<br/>File Storage<br/>:9000/:9001]
    end

    subgraph AI["AI Service"]
        AIApp[AI FastAPI<br/>:8080]
        AIWorker[AI Celery Worker<br/>GPT tasks]
    end

    subgraph Backend["Backend"]
        Django[Django<br/>:8000]
        CeleryDefault[Celery default<br/>concurrency: 4]
        CeleryPolling[Celery polling<br/>concurrency: 4]
        CeleryPDF[Celery pdf<br/>concurrency: 2]
        CeleryBeat[Celery beat<br/>every 5 min]
    end

    Redis --> AIApp
    Redis --> AIWorker
    Redis --> Django
    Redis --> CeleryDefault
    Redis --> CeleryPolling
    Redis --> CeleryPDF
    Redis --> CeleryBeat
    Postgres --> Django
    Postgres --> CeleryDefault
    Postgres --> CeleryPolling
    Postgres --> CeleryPDF
    MinIO --> Django
    MinIO --> CeleryDefault
    Django <-->|CV URL + task_id| AIApp
```
```mermaid
sequenceDiagram
    participant U as User
    participant D as Django API
    participant M as MinIO
    participant R as Redis Queue
    participant W1 as Worker (default)
    participant AI as AI Service
    participant W2 as Worker (polling)
    participant W3 as Worker (pdf)
    participant SG as SendGrid

    U->>D: POST /upload/ (CV files)
    D->>M: Store CV file
    D->>R: Queue process_cv_task (2s stagger)
    D-->>U: 202 Accepted + batch_id
    R->>W1: process_cv_task
    W1->>AI: POST /api/v1/regeneration/ (cv_url)
    AI-->>W1: { task_id }
    W1->>R: Queue poll_ai_result_task
    loop Poll every 30s (up to 60 rounds by default)
        R->>W2: poll_ai_result_task
        W2->>AI: GET /api/v1/tasks/{task_id}
        AI-->>W2: { status: PENDING }
    end
    AI-->>W2: { status: completed, result: {...} }
    W2->>M: Download + save profile photo
    W2->>R: Queue generate_pdf_task
    R->>W3: generate_pdf_task
    W3->>W3: WeasyPrint renders HTML to PDF
    W3->>M: Save enhanced CV PDF
    W3->>R: Queue send_availability_email
    R->>W1: send_availability_email_task
    W1->>SG: Send email to candidate
```
```mermaid
graph LR
    subgraph Queues
        Q1[default queue]
        Q2[polling queue]
        Q3[pdf queue]
        Q4[beat scheduler]
    end

    subgraph default_tasks["default tasks"]
        T1[process_cv_task]
        T2[send_availability_email]
        T3[send_to_contacts]
        T4[geocode tasks]
        T5[import_excel tasks]
    end

    subgraph polling_tasks["polling tasks"]
        T6[poll_ai_result_task]
        T7[poll_rewrite_result_task]
    end

    subgraph pdf_tasks["pdf tasks"]
        T8[generate_enhanced_cv_pdf_task]
    end

    subgraph beat_tasks["beat tasks"]
        T9[sync_batch_counts every 5 min]
    end

    Q1 --> default_tasks
    Q2 --> polling_tasks
    Q3 --> pdf_tasks
    Q4 --> beat_tasks
```
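In code, the queue layout above boils down to a Celery `task_routes` mapping. The following is a sketch only: the dotted module paths are assumptions inferred from the project structure, and the real mapping lives in `edukai/celery.py`.

```python
# Hypothetical task-to-queue routing; module paths are assumptions
# based on the repository layout, not copied from the codebase.
CELERY_TASK_ROUTES = {
    # Light, latency-tolerant work stays on the default queue.
    "candidate.tasks.process_cv.process_cv_task": {"queue": "default"},
    "candidate.tasks.send_email.send_availability_email_task": {"queue": "default"},
    "candidate.tasks.send_to_contacts.send_to_contacts_task": {"queue": "default"},
    # Status polling gets its own queue so it never waits behind PDF jobs.
    "candidate.tasks.poll_ai_result.poll_ai_result_task": {"queue": "polling"},
    "candidate.tasks.rewrite_cv.poll_rewrite_result_task": {"queue": "polling"},
    # Memory-heavy WeasyPrint rendering is isolated on its own queue.
    "candidate.tasks.generate_pdf.generate_enhanced_cv_pdf_task": {"queue": "pdf"},
}
```

Routing by task name (rather than starting one worker per task) keeps the worker commands generic: each container just subscribes to its queue with `--queues=<name>`.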
```mermaid
flowchart LR
    A[Candidate<br/>location string] --> B{Has lat/lng?}
    B -- No --> C[Nominatim geocode<br/>on demand]
    C --> D[Save lat/lng to DB]
    D --> E[Calculate distances]
    B -- Yes --> E
    E --> F[Filter orgs within radius<br/>and other parameters]
    F --> G[Return contacts<br/>sorted by distance]
    G --> H[User selects contacts]
    H --> I[POST send-to-contacts]
    I --> J[send_to_contacts_task<br/>queue: default]
    J --> K[SendGrid bulk send]
```
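The "calculate distances" step can be sketched as a plain haversine filter over stored coordinates. This is an illustration of the approach, not the project's actual implementation; the `lat`/`lng` field names and dict shapes are assumptions.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lng points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def orgs_within_radius(candidate, orgs, radius_km):
    """Return (org, distance_km) pairs within radius, sorted by distance.

    Organizations without coordinates are skipped, matching the note that
    not-yet-geocoded organizations are simply excluded from geo filtering.
    """
    hits = []
    for org in orgs:
        if org.get("lat") is None or org.get("lng") is None:
            continue
        d = haversine_km(candidate["lat"], candidate["lng"], org["lat"], org["lng"])
        if d <= radius_km:
            hits.append((org, d))
    return sorted(hits, key=lambda pair: pair[1])
```

For a few tens of thousands of rows this linear scan is fast enough in Python; a spatial index (PostGIS) only becomes worthwhile at much larger scales.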
| Layer | Technology |
|---|---|
| Web Framework | Django 6.0.2 + Django REST Framework |
| AI Service | FastAPI + Celery (separate service) |
| Task Queue | Celery 5.6 with Redis broker |
| Database | PostgreSQL 16 |
| Cache / Broker | Redis 7 |
| File Storage | MinIO (S3-compatible) |
| PDF Generation | WeasyPrint |
| Email Delivery | SendGrid |
| Geocoding | Nominatim / OpenStreetMap (free, no API key) |
| Auth | JWT via djangorestframework-simplejwt (HttpOnly cookies) |
| API Docs | drf-spectacular (Swagger + ReDoc) |
| Containerisation | Docker + Docker Compose |
```
EduKai-CV-Automation-Engine/
├── docker-compose.yml              # Orchestrates all 9 services
│
├── Backend/                        # Django backend (primary focus)
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── manage.py
│   ├── .env.example                # Copy to .env and configure
│   ├── Create_the_MinIO_Bucket.py  # One-time MinIO bucket setup
│   │
│   ├── edukai/                     # Django project config
│   │   ├── settings.py
│   │   ├── celery.py               # Celery app + task routing
│   │   └── urls.py
│   │
│   ├── account/                    # Auth, users, dashboard, activity log
│   │   ├── models.py               # User + ActivityLog models
│   │   ├── views.py                # Auth, dashboard, activity endpoints
│   │   ├── serializers.py
│   │   └── utils/
│   │       ├── activity.py         # log_activity() helper
│   │       ├── cookies.py          # HttpOnly JWT cookie helpers
│   │       └── password_reset.py   # OTP via Redis + SendGrid
│   │
│   ├── candidate/                  # Core candidate management
│   │   ├── models.py               # Candidate, CandidateUploadBatch
│   │   ├── views.py                # 15+ API endpoints
│   │   ├── serializers.py
│   │   ├── tasks/
│   │   │   ├── process_cv.py       # Task 1: submit CV to AI
│   │   │   ├── poll_ai_result.py   # Task 2: poll AI, save data, download photo
│   │   │   ├── generate_pdf.py     # Task 3: WeasyPrint PDF generation
│   │   │   ├── rewrite_cv.py       # AI rewrite polling task
│   │   │   ├── send_email.py       # Candidate availability email
│   │   │   ├── send_to_contacts.py # Bulk outreach to school contacts
│   │   │   ├── geocode.py          # On-demand candidate geocoding
│   │   │   ├── sync_batch.py       # Periodic batch progress sync
│   │   │   └── cleanup.py          # MinIO file cleanup on delete
│   │   ├── utils/
│   │   │   ├── minio_utils.py      # Pre-signed URL generation
│   │   │   └── pagination.py       # StandardPagination class
│   │   └── templates/
│   │       └── candidate/
│   │           └── enhanced_cv.html  # WeasyPrint CV template
│   │
│   ├── organization/               # School/organization management
│   │   ├── models.py               # Organization + OrganizationContact
│   │   ├── views.py                # CRUD + import + geo filter endpoints
│   │   ├── serializers.py
│   │   └── tasks/
│   │       ├── geocode.py          # Postcode to lat/lng via Nominatim
│   │       └── import_excel.py     # Bulk Excel import (24,000+ orgs)
│   │
│   └── Demo Data/
│       ├── Organizations.xlsx      # Sample organization data
│       ├── Contacts.xlsx           # Sample contact data
│       └── Demo CV/                # Sample CV PDFs for testing
│
└── AI/                             # FastAPI AI service (separate service)
    ├── app/
    │   ├── main.py                 # FastAPI app entry point
    │   ├── tasks.py                # Celery tasks (CV processing)
    │   ├── api/v1/routes.py        # /regeneration, /rewrite, /tasks endpoints
    │   ├── services/
    │   │   ├── ai_service.py       # OpenAI GPT integration
    │   │   └── file_service.py     # CV download and parsing
    │   └── prompts/                # GPT prompt templates
    └── requirements.txt
```
- Docker Desktop installed and running
- Git
1. Clone the repository

```bash
git clone https://github.com/Mehedi-Hasan-Rabbi/EduKai-CV-Automation-Engine
cd EduKai-CV-Automation-Engine
```

2. Configure environment variables

```bash
cp Backend/.env.example Backend/.env
cp AI/.env.example AI/.env
```

Open Backend/.env and set at minimum:

```bash
SECRET_KEY=your-50-char-secret-key-here
SENDGRID_API_KEY=SG....
SENDGRID_FROM_EMAIL=you@yourdomain.com
```

Open AI/.env and add:

```bash
OPENAI_API_KEY=sk-...
```

3. Build and start all services

```bash
docker compose up --build
```

This starts 9 containers. Wait until you see all workers report ready.
4. Create a superuser

In a new terminal:

```bash
docker compose exec backend python manage.py createsuperuser
```

5. Create the MinIO bucket

Option A – via script (recommended):

```bash
cd Backend
docker compose exec backend python Create_the_MinIO_Bucket.py
```

Option B – via browser:

- Open http://localhost:9001
- Login: `minioadmin` / `minioadmin123`
- Create a bucket named `edukai`
- Set the bucket access policy to Public
6. Verify everything is running
| Service | URL |
|---|---|
| Django API | http://localhost:8000 |
| Swagger Docs | http://localhost:8000/api/docs/ |
| ReDoc | http://localhost:8000/api/redoc/ |
| Django Admin | http://localhost:8000/admin/ |
| AI Service | http://localhost:8080 |
| MinIO Console | http://localhost:9001 |
7. Import demo data (optional)

```
POST http://localhost:8000/api/organizations/import/
Body: form-data → file: Backend/Demo Data/Organizations.xlsx

POST http://localhost:8000/api/organizations/import/contacts/
Body: form-data → file: Backend/Demo Data/Contacts.xlsx
```
Running without Docker requires starting each service manually. You need PostgreSQL, Redis, and MinIO running locally first.
```bash
# Redis
docker run --name edukai-redis -d -p 6379:6379 --rm redis

# MinIO
docker run -d -p 9000:9000 -p 9001:9001 \
  --name edukai-minio \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin123 \
  -v minio_data:/data \
  minio/minio server /data --console-address ":9001"

# PostgreSQL
docker run --name edukai-postgres \
  -e POSTGRES_PASSWORD=ultr4_instinct \
  -p 5431:5432 \
  -d postgres
```

```bash
cd AI

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy and configure env
cp .env.example .env
# Add OPENAI_API_KEY to .env
```

Open two terminals in the AI/ directory:

```bash
# Terminal 1 – AI FastAPI server
uvicorn app.main:app --reload --port 8080

# Terminal 2 – AI Celery worker
celery -A app.core.celery_app worker --loglevel=info
```

```bash
cd Backend

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy and configure env
cp .env.example .env
# Set DATABASE_URL, REDIS_URL, MINIO_*, AI_BASE_URL, etc.

# Run migrations and create superuser
python manage.py migrate
python manage.py createsuperuser
```

```bash
# Terminal 1 – Django development server
python manage.py runserver

# Terminal 2 – Default worker (CV processing, geocoding, emails, imports)
celery -A edukai worker \
  --queues=default \
  --concurrency=4 \
  --loglevel=info \
  --hostname=default@%h

# Terminal 3 – PDF worker (WeasyPrint, memory-intensive)
celery -A edukai worker \
  --queues=pdf \
  --concurrency=2 \
  --loglevel=info \
  --hostname=pdf@%h

# Terminal 4 – Polling worker (AI result polling)
celery -A edukai worker \
  --queues=polling \
  --concurrency=4 \
  --loglevel=info \
  --hostname=polling@%h

# Terminal 5 – Beat scheduler (periodic tasks every 5 min)
celery -A edukai beat --loglevel=info
```

Create the MinIO bucket:

```bash
python Create_the_MinIO_Bucket.py
```

Or open http://localhost:9001, login with `minioadmin` / `minioadmin123`, create a bucket named `edukai`, and set its access policy to Public.
| Variable | Required | Description |
|---|---|---|
| `SECRET_KEY` | ✅ | Django secret key (min 50 chars) |
| `DEBUG` | ✅ | `True` for dev, `False` for production |
| `DATABASE_URL` | ✅ | PostgreSQL connection string |
| `REDIS_URL` | ✅ | Redis URL for the Django cache (e.g. DB 1) |
| `CELERY_BROKER_URL` | ✅ | Redis URL for the Celery broker (e.g. DB 2) |
| `CELERY_RESULT_BACKEND` | ✅ | Redis URL for Celery results (e.g. DB 2) |
| `USE_S3` | ✅ | `True` to use MinIO/S3 storage |
| `MINIO_ACCESS_KEY` | ✅ | MinIO access key |
| `MINIO_SECRET_KEY` | ✅ | MinIO secret key |
| `MINIO_BUCKET_NAME` | ✅ | MinIO bucket name |
| `MINIO_ENDPOINT_URL` | ✅ | Internal MinIO URL (backend to MinIO) |
| `MINIO_PUBLIC_URL` | ✅ | Public MinIO URL (browser to MinIO) |
| `AI_BASE_URL` | ✅ | AI service base URL |
| `AI_POLL_INTERVAL_SECONDS` | ✅ | Polling interval in seconds (default: 30) |
| `AI_POLL_MAX_RETRIES` | ✅ | Max poll attempts (default: 60 = 30 min) |
| `SENDGRID_API_KEY` | ✅ | SendGrid API key |
| `SENDGRID_FROM_EMAIL` | ✅ | Verified sender email address |
| `SENDGRID_FROM_NAME` | ✅ | Display name for outgoing emails |
| `SENDGRID_REPLY_TO_EMAIL` | ✅ | Reply-to email address |
| `CV_LOGO_PATH` | ✅ | Path to the logo image used in the CV PDF |
| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | ✅ | OpenAI API key |
| `REDIS_URL` | ✅ | Redis URL (separate DB from the backend, e.g. DB 3) |
| `APP_BASE_URL` | ✅ | The AI service's own base URL |
Full interactive documentation at http://localhost:8000/api/docs/.
**Account & auth endpoints**

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | `/register/` | Public | Create account |
| POST | `/login/` | Public | Login; sets HttpOnly JWT cookies |
| POST | `/logout/` | Required | Logout; clears cookies |
| POST | `/token/refresh/` | Public | Refresh access token from cookie |
| GET | `/me/` | Required | Current user profile |
| PATCH | `/profile/update/` | Required | Update profile photo, name, etc. |
| POST | `/password/update/` | Required | Change password |
| POST | `/forgot-password/` | Public | Request password reset OTP via email |
| POST | `/verify-otp/` | Public | Verify OTP code |
| POST | `/reset-password/` | Public | Set new password |
| GET | `/dashboard/` | Superuser | System-wide statistics |
| GET | `/activity/` | Superuser | Activity log and notifications |
| POST | `/activity/mark-read/` | Superuser | Mark notifications as read |
**Candidate endpoints**

| Method | Endpoint | Description |
|---|---|---|
| POST | `/upload/` | Bulk CV upload (multipart, up to 1000 files) |
| GET | `/` | Paginated candidate list with filters |
| GET | `/<id>/` | Full candidate detail |
| PATCH | `/<id>/update/` | Edit candidate fields; triggers PDF regeneration if needed |
| DELETE | `/<id>/delete/` | Delete candidate and MinIO files (async) |
| POST | `/<id>/rewrite/` | Trigger AI CV rewrite |
| GET | `/<id>/rewrite/status/` | Poll rewrite completion |
| GET | `/<id>/nearby-organizations/` | Organizations within radius of candidate |
| GET | `/<id>/nearby-contacts/` | School contacts within radius (filterable) |
| POST | `/<id>/send-to-contacts/` | Email candidate profile to up to 1000 contacts |
| GET | `/send-status/<task_id>/` | Poll email send task result |
| GET | `/batches/` | Paginated list of upload batches |
| GET | `/batches/<id>/` | Batch progress and status |
| DELETE | `/batches/<id>/delete/` | Delete batch and all its candidates |
**Organization endpoints**

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Paginated list with filters (phase, town, postcode, geo radius) |
| POST | `/` | Create organization (auto-geocodes postcode) |
| GET | `/<id>/` | Organization detail with nested contacts |
| PATCH | `/<id>/` | Update organization (re-geocodes if address changes) |
| DELETE | `/<id>/` | Delete organization and all contacts |
| POST | `/import/` | Bulk import from Excel file (background task) |
| POST | `/import/contacts/` | Bulk contact import from Excel |
| GET | `/import/status/<task_id>/` | Poll import task result |
| GET | `/contacts/` | All contacts across all organizations |
| GET | `/<id>/contacts/` | Contacts for a specific organization |
| POST | `/<id>/contacts/` | Add contact to organization |
| GET | `/contacts/<id>/` | Contact detail |
| PATCH | `/contacts/<id>/` | Update contact |
| DELETE | `/contacts/<id>/` | Delete contact |
Four dedicated Celery queues prevent task interference under heavy load:
| Queue | Container | Tasks | Concurrency |
|---|---|---|---|
| `default` | `celery_default` | CV processing, geocoding, emails, Excel import | 4 |
| `polling` | `celery_polling` | AI result polling, rewrite polling | 4 |
| `pdf` | `celery_pdf` | PDF generation (memory-intensive) | 2 |
| `beat` | `celery_beat` | Periodic tasks (batch sync every 5 min) | – |
```
[Upload] BulkCVUploadView
  └─ process_cv_task                   (queue: default)
       POST cv_url to AI → get task_id
     └─ poll_ai_result_task            (queue: polling)
          polls every 30s → extracts data → downloads profile photo
        └─ generate_enhanced_cv_pdf_task  (queue: pdf)
             WeasyPrint → PDF → MinIO
           └─ send_availability_email_task  (queue: default)
                SendGrid → candidate inbox
```
`sync_batch_counts` runs every 5 minutes via Celery Beat. It recalculates each batch's `processed_count` and `failed_count` from actual candidate statuses, fixing batches stuck at 0% when workers crash mid-task.
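The recount itself is simple; here is a pure-Python sketch of what the task recomputes, using plain dicts in place of the Candidate queryset. The status strings `"completed"`/`"failed"` are assumptions about the model's choices, and the real task reads from PostgreSQL via the ORM.

```python
def recount_batch(candidates):
    """Derive batch counters from individual candidate statuses.

    Counting from the source of truth (per-candidate status) rather than
    incrementing counters in-flight makes the numbers self-healing after
    a worker crash.
    """
    processed = sum(1 for c in candidates if c["status"] == "completed")
    failed = sum(1 for c in candidates if c["status"] == "failed")
    return {"processed_count": processed, "failed_count": failed}
```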
These are real-world observations from production testing.
**CV processing time** – A batch of 600 CVs took approximately 1.5 hours end-to-end, including AI extraction (the bottleneck), PDF generation, and email sending. Processing time scales linearly with batch size.
**Geocoding time** – Nominatim (the free OpenStreetMap geocoder) enforces a 1 request/second rate limit. For safety, a 2-second stagger is used between geocoding tasks, so the calculation is number_of_organizations × 2 seconds: roughly 17 minutes for 500 organizations, while 24,000 organizations geocode as a background process over many hours. The application remains fully functional during geocoding; only the geo-radius filter is unavailable for organizations not yet geocoded.
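The estimate above follows directly from the stagger; a quick illustrative helper (not part of the codebase):

```python
def geocoding_eta_minutes(n_orgs: int, stagger_seconds: int = 2) -> float:
    """Worst-case wall-clock estimate when geocoding tasks fire one
    per stagger interval (Nominatim allows at most 1 request/second)."""
    return n_orgs * stagger_seconds / 60
```

For 500 organizations this gives about 17 minutes; for 24,000 it gives 800 minutes, i.e. roughly 13 hours of background work.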
**SendGrid email deliverability** – New SendGrid accounts have low sender reputation, so emails may initially land in spam; this improves as the domain builds reputation. To improve deliverability: verify your sending domain's DNS records in SendGrid, enable IP warm-up for new accounts, use a personal sender name rather than a brand name, and avoid emojis in subject lines.
**Bulk email sending** – Sending to 1000 contacts is handled by a single Celery task that iterates sequentially; SendGrid processes each email individually. Expect roughly 2–5 minutes for 1000 emails depending on network latency.
**Separate Celery queues** – PDF generation is slow and memory-heavy. Mixing it with polling tasks in one queue would cause AI polling to time out. Each queue has a concurrency level suited to its workload.
**Two MinIO clients** – Pre-signed URLs must be signed with the public URL (what the browser sees), while file operations use the internal container URL for speed. `minio_utils.py` maintains two separate boto3 clients to handle this correctly and avoid `SignatureDoesNotMatch` errors.
**On-demand geocoding for candidates** – Geocoding 1000+ candidates at upload time would take 20+ minutes and block the upload response. Coordinates are populated only when a geo filter is first requested, then cached permanently on the candidate record for instant subsequent lookups.
**`is_regeneration` flag on PDF generation** – When a user edits `job_titles`, name, or location, the PDF is regenerated automatically. The flag skips incrementing `batch.processed_count` and re-sending the availability email on regeneration, preventing double-counting and duplicate emails.
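The flag's effect can be summarised as a tiny decision table. This is a sketch of the behaviour described above, not the actual task code:

```python
def pdf_side_effects(is_regeneration: bool) -> dict:
    """Which follow-up actions run after a PDF is produced.

    A first-time generation counts toward batch progress and emails the
    candidate; a regeneration (triggered by an edit) does neither.
    """
    return {
        "increment_processed_count": not is_regeneration,
        "send_availability_email": not is_regeneration,
    }
```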
**Short PostgreSQL `conn_max_age`** – Under heavy concurrent load (300+ CVs), long-lived DB connections are killed by PostgreSQL, causing "SSL connection closed unexpectedly" errors in workers. It is set to 60 seconds to force fresh reconnections instead of reusing stale ones.
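In Django settings this is the standard `CONN_MAX_AGE` key in `DATABASES`; a sketch of the relevant fragment, with placeholder database name and no credentials shown:

```python
# settings.py (fragment) -- NAME is a placeholder
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "edukai",     # placeholder database name
        "CONN_MAX_AGE": 60,   # seconds; connections older than this are closed
    }
}
```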
**ActivityLog 1000-entry limit** – The system is operated by a single user. Rather than a complex WebSocket or pub/sub notification infrastructure, a simple DB-backed activity log with automatic pruning at 1000 entries covers all needs efficiently.
**JWT in HttpOnly cookies** – Access and refresh tokens are stored in HttpOnly cookies, not localStorage, preventing XSS attacks from stealing tokens. The custom `CookieJWTAuthentication` class falls back to the Authorization header for Swagger UI compatibility.
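The fallback order can be sketched as a small helper. The cookie name `access_token` and the exact internals of `CookieJWTAuthentication` are assumptions for illustration; the real class plugs into DRF's authentication pipeline.

```python
def extract_access_token(cookies: dict, headers: dict):
    """Prefer the HttpOnly cookie; fall back to the Authorization header.

    The cookie path serves browsers (tokens never touch JavaScript);
    the header path keeps Swagger UI and API clients working.
    """
    token = cookies.get("access_token")  # assumed cookie name
    if token:
        return token
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    return None
```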
**Batch task staggering** – When uploading 250 CVs, tasks fire with a 2-second countdown per CV (`countdown=index * 2`). This prevents the AI service from being overwhelmed with simultaneous requests and reduces the chance of 429 rate limit errors.
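The staggering reduces to computing a per-CV delay; a sketch of the schedule (in the real upload view this would feed Celery's `apply_async(countdown=...)`, and `stagger_schedule` is a hypothetical helper, not a project function):

```python
def stagger_schedule(cv_ids, step_seconds=2):
    """Pair each CV with its dispatch delay: index * step_seconds.

    In the actual view this would drive something like
    process_cv_task.apply_async(args=[cv_id], countdown=delay).
    """
    return [(cv_id, index * step_seconds) for index, cv_id in enumerate(cv_ids)]
```

For 250 CVs the last task is delayed by 498 seconds, spreading the AI load over roughly 8 minutes.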
`Backend/Backend.postman_collection.json` contains all endpoints pre-configured for local testing. Import it into Postman and set `base_url` to http://localhost:8000.
MIT License – see LICENSE for details.