Plagiarism and AI Content Detector (Nokia_Hackathon)

1. Project Overview

This web application detects plagiarism by comparing submitted text against web sources and also estimates the probability that the content is AI generated. Users can register, sign in, and review past reports, which can be downloaded as PDFs.

Directory structure

```
backend/   # Flask application
  app.py   # entry point
  venv/    # contains application modules and requirements.txt
frontend/  # React/Vite user interface
```

2. Features

User Management:
- User registration and login.
- Secure authentication using session cookies (via Flask-Login).
Core Analysis:
- Plagiarism detection for uploaded files (.txt, .pdf, .docx) or direct text input.
- Comparison against web sources using Google Search.
- AI-generated content probability estimation using Google Gemini.
Reporting:
- Detailed analysis reports including:
  - Originality score.
  - AI probability score with reasoning.
  - Top keywords extracted from the document.
  - Search queries generated and used for web search.
  - List of detected overlaps with source URLs and similarity scores.
- Downloadable PDF reports of the analysis.
User Dashboard & History:
- Dashboard displaying user statistics (e.g., total reports generated, last checked document).
- History page listing all past analysis reports for the logged-in user.
User Profile:
- Basic user profile display page.
User Interface:
- Responsive UI built with React.
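Uploaded files must be turned into plain text before any analysis can run. The sketch below shows one way to dispatch on the file extension using the extraction libraries listed in the Tech Stack (pdfminer.six for PDF, python-docx for DOCX). It is a hypothetical helper, not the backend's actual code: the name `extract_upload_text` and the choice to decode .txt uploads as UTF-8 are assumptions.

```python
import io
from pathlib import PurePath


def extract_upload_text(filename: str, data: bytes) -> str:
    """Hypothetical helper: return the plain text of an uploaded .txt, .pdf or .docx file."""
    suffix = PurePath(filename).suffix.lower()
    if suffix == ".txt":
        # Assumption: text uploads are UTF-8; undecodable bytes are replaced, not rejected.
        return data.decode("utf-8", errors="replace")
    if suffix == ".pdf":
        # pdfminer.six's extract_text accepts a file-like object as well as a path.
        from pdfminer.high_level import extract_text
        return extract_text(io.BytesIO(data))
    if suffix == ".docx":
        # python-docx is imported as `docx`; paragraphs are joined with newlines.
        import docx
        document = docx.Document(io.BytesIO(data))
        return "\n".join(paragraph.text for paragraph in document.paragraphs)
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

The PDF and DOCX libraries are imported lazily, so plain-text input works even where they are not installed. In a Flask view, `filename` and `data` would typically come from an uploaded `FileStorage` object (`.filename` and `.read()`).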

🧩 3. Architecture Diagram

The architecture diagram (architecture.png in the repository root) outlines the core workflow of the system:

🔄 Workflow Summary:

- User Input: text or a file is submitted.
- Text Extraction: plain text is extracted from the input.
- Query Generation: search queries are created from the text.
- Web Search & Scraping: relevant web sources are fetched and scraped.
- Plagiarism Detection: the input text is compared against the scraped content.
- AI Detection: the likelihood that the text is AI generated is assessed.
- PDF Report: a report containing the full analysis is generated.
- UI Output: the report and metrics are presented to the user.
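The plagiarism-detection step can be illustrated with a small, self-contained sketch. The backend uses RapidFuzz for fuzzy matching; this version swaps in the standard library's `difflib.SequenceMatcher` so it runs without extra packages. The naive sentence splitter, the 0.8 similarity threshold, and the definition of the originality score (percentage of sentences with no matching source sentence) are all assumptions for illustration, not the project's actual scoring. `sources` stands for what the scraping step would produce: a mapping from URL to page text.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_overlaps(document: str, sources: Dict[str, str], threshold: float = 0.8) -> List[dict]:
    """Return one overlap per document sentence whose best source match reaches the threshold."""
    overlaps = []
    for sentence in split_sentences(document):
        best_url: Optional[str] = None
        best_score = 0.0
        for url, source_text in sources.items():
            for candidate in split_sentences(source_text):
                # ratio() is in [0, 1]; 1.0 means the two strings are identical.
                score = SequenceMatcher(None, sentence.lower(), candidate.lower()).ratio()
                if score > best_score:
                    best_url, best_score = url, score
        if best_score >= threshold:
            overlaps.append({"sentence": sentence, "url": best_url,
                             "similarity": round(best_score * 100, 1)})
    return overlaps


def originality_score(document: str, overlaps: List[dict]) -> float:
    """Assumed definition: percentage of sentences that matched no source."""
    total = len(split_sentences(document))
    if total == 0:
        return 100.0
    return round(100.0 * (1 - len(overlaps) / total), 1)
```

Comparing sentence by sentence keeps each overlap small enough to report with its source URL and similarity score, which matches the per-overlap details listed under Reporting; a production version would also need to handle paraphrases and very long pages more efficiently.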

4. Tech Stack

Backend:

Language: Python
Framework: Flask
Database: MongoDB (using PyMongo)
Authentication: Flask-Login, Flask-Bcrypt
APIs & Services:
- Google Generative AI (Gemini for embeddings, AI detection, query generation)
- Serper API (for Google search results)
Core Logic Libraries:
- NLTK (text processing, keyword extraction)
- pdfminer.six (PDF text extraction)
- python-docx (DOCX text extraction)
- ReportLab (PDF report generation)
- BeautifulSoup4 (web scraping for source content)
- RapidFuzz (fuzzy string matching for plagiarism)
Other: python-dotenv (environment variable management)
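The "Top keywords" shown in reports come from NLTK-based keyword extraction. As a rough, dependency-free illustration of the idea, the sketch below counts word frequencies after removing stopwords. The small hard-coded stopword set stands in for NLTK's 'stopwords' corpus, and the function name `top_keywords` and its `min_length` parameter are hypothetical; the backend's actual tokenization and ranking may differ.

```python
import re
from collections import Counter
from typing import List

# Illustrative stopword set; the backend uses NLTK's 'stopwords' corpus instead.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def top_keywords(text: str, k: int = 10, min_length: int = 3) -> List[str]:
    """Hypothetical helper: the k most frequent non-stopword words in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) >= min_length)
    # most_common keeps first-seen order for words with equal counts.
    return [word for word, _ in counts.most_common(k)]
```

Keywords like these are a natural input for the Gemini-based query generation step, although how the backend combines them into search queries is not described here.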

Frontend:

Language: JavaScript
Library/Framework: React
Build Tool: Vite
Routing: React Router
State Management: React Context (used for useAuth and usePlagiarismCheck hooks)
Styling: CSS (custom styles, no specific framework noted)
Other: js-cookie (possibly not needed for authentication: Flask-Login relies on an HTTP-only session cookie that JavaScript cannot read, so the package may be used elsewhere or may be left over from an earlier plan).

General:

Version Control: Git

5. Prerequisites

Node.js and npm (or yarn) for the frontend.
Python 3 (e.g., 3.8+) and pip for the backend.
Access to a MongoDB instance (local or cloud-hosted).
API Keys for:
- Google Generative AI (from Google AI Studio)
- Serper API (from serper.dev)

6. Environment Variables

An example environment file is included at backend/.env.example. Copy this file to backend/.env and replace the placeholder values with your own configuration:

```
FLASK_SECRET_KEY=your_flask_secret_key
MONGO_URI=mongodb://localhost:27017/plagiarism_detector
GOOGLE_API_KEY=your_google_api_key
SERPER_API_KEY=your_serper_api_key
DEBUG=True
```

FLASK_SECRET_KEY should be a random string, MONGO_URI points to your MongoDB instance and the API keys are obtained from Google AI Studio and serper.dev. Set DEBUG=False in production.
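A suitable random FLASK_SECRET_KEY can be generated with Python's `secrets` module. The sketch below also shows one way to read these variables and interpret DEBUG safely, assuming python-dotenv has already loaded backend/.env into `os.environ`. The helper names `generate_secret_key`, `env_flag` and `load_config` are hypothetical and are not taken from backend/app.py.

```python
import os
import secrets
from typing import Dict

REQUIRED = ["FLASK_SECRET_KEY", "MONGO_URI", "GOOGLE_API_KEY", "SERPER_API_KEY"]


def generate_secret_key() -> str:
    # 32 random bytes, hex-encoded into a 64-character string.
    return secrets.token_hex(32)


def env_flag(name: str, default: bool = False) -> bool:
    # bool("False") is True, so compare the string explicitly instead.
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config() -> Dict[str, object]:
    """Hypothetical loader: fail fast if a required variable is missing or empty."""
    missing = [name for name in REQUIRED if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    config: Dict[str, object] = {name: os.environ[name] for name in REQUIRED}
    config["DEBUG"] = env_flag("DEBUG")
    return config
```

For a one-off key, running `python -c "import secrets; print(secrets.token_hex(32))"` and pasting the output into backend/.env is enough.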

Note for Frontend: The frontend sends its API requests to the backend at http://127.0.0.1:5000. This URL is hardcoded in the React hooks (frontend/src/hooks/), so if your backend runs on a different URL, you'll need to update it in these hooks.

7. Backend Setup

Clone the repository:

```
git clone <repository_url>
cd <repository_name>
```

Navigate to the backend directory:
```
cd backend
```
Create a Python virtual environment:
```
python -m venv venv
```
Activate the virtual environment:
- Windows:
```
venv\Scripts\activate
```
- macOS/Linux:
```
source venv/bin/activate
```
Install dependencies:
```
pip install -r venv/requirements.txt
```
(The requirements.txt file lists all backend dependencies.)
NLTK Data Download (if needed): The application uses NLTK for text processing. The core_detector.py script tries to download the 'stopwords' corpus if it is missing, but the tokenizer models ('punkt', or 'punkt_tab' in newer NLTK releases) may also be required wherever word or sentence tokenization is used. If you encounter NLTK-related errors (typically a LookupError naming a missing resource), download the data manually from a Python interpreter:
```
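import nltk

nltk.download('stopwords')  # stopword lists
nltk.download('punkt')      # sentence/word tokenizer models (older NLTK releases)
nltk.download('punkt_tab')  # tokenizer data required by newer NLTK releases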
```
Create and populate the .env file: As described in the "Environment Variables" section, create a .env file in the backend directory with your API keys and configurations.
Run the Flask development server:
```
python app.py
```
The backend should now be running on http://127.0.0.1:5000.

8. Frontend Setup

Navigate to the frontend directory (from the project root):
```
cd frontend
```
(If you are in the backend directory, use cd ../frontend)
Install dependencies:
```
npm install
```
(or yarn install if you prefer yarn)
Run the Vite development server:
```
npm run dev
```
(or yarn dev)
The frontend application should now be running, typically on http://127.0.0.1:5173, and will attempt to connect to the backend API at http://127.0.0.1:5000.

9. API Endpoints (Key Examples)

The backend exposes several API endpoints, including:

- POST /api/auth/register: User registration.
- POST /api/auth/login: User login.
- POST /api/auth/logout: User logout.
- GET /@me: Get details of the currently authenticated user.
- POST /analyse: Submit text or a file for plagiarism and AI content analysis. Requires authentication.
- GET /api/history: Retrieve the analysis history for the authenticated user. Requires authentication.
- GET /download-report/<filename>: Download a specific PDF report. Requires authentication.
- GET /api/dashboard_stats: Get user statistics for the dashboard. Requires authentication.

(Refer to backend/app.py for the complete list of routes and their functionalities.)
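Because authentication is cookie based (Flask-Login), a script calling the protected endpoints must keep the session cookie returned by the login call and send it back on later requests. The standard-library sketch below does this with a cookie jar. It is hypothetical: the JSON field names `username` and `password` for POST /api/auth/login, and the assumption that the login route accepts a JSON body, should be checked against backend/app.py before use.

```python
import json
import urllib.request
from http.cookiejar import CookieJar
from typing import Any, Optional

BASE_URL = "http://127.0.0.1:5000"  # default backend address used in this README


def make_session() -> urllib.request.OpenerDirector:
    """Opener that stores cookies from responses and resends them on later requests."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))


def json_request(path: str, payload: Optional[dict] = None, method: str = "GET") -> urllib.request.Request:
    """Build a request to the backend, JSON-encoding the body when one is given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def fetch_history(username: str, password: str) -> Any:
    """Log in, then call GET /api/history using the same cookie jar."""
    session = make_session()
    # ASSUMPTION: login accepts JSON with these field names; verify in backend/app.py.
    login = json_request("/api/auth/login", {"username": username, "password": password}, "POST")
    with session.open(login) as response:
        response.read()  # a non-2xx status raises urllib.error.HTTPError
    with session.open(json_request("/api/history")) as response:
        return json.loads(response.read().decode("utf-8"))
```

Reusing one opener for both calls is what carries the login session across requests; a fresh opener per call would hit /api/history unauthenticated.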

10. Contributing

Contributions are welcome! Please fork the repository, create a new branch for your feature or fix, and submit a pull request with your changes.

piyush2229/Nokia_Hackathon_Team_6
