This web application detects plagiarism by comparing submitted text against web sources, and it estimates the probability that the content was AI-generated. Users can register, sign in, and review past reports, which can be downloaded as PDFs.
backend/          # Flask application
    app.py        # entry point
    venv/         # contains application modules and requirements.txt
frontend/         # React/Vite user interface
- User Management:
  - User registration and login.
  - Secure authentication using session cookies (via Flask-Login).
- Core Analysis:
  - Plagiarism detection for uploaded files (.txt, .pdf, .docx) or direct text input.
  - Comparison against web sources using Google Search results (retrieved via the Serper API).
  - AI-generated content probability estimation using Google Gemini.
- Reporting:
  - Detailed analysis reports including:
    - Originality score.
    - AI probability score with reasoning.
    - Top keywords extracted from the document.
    - Search queries generated and used for web search.
    - List of detected overlaps with source URLs and similarity scores.
  - Downloadable PDF reports of the analysis.
- User Dashboard & History:
  - Dashboard displaying user statistics (e.g., total reports generated, last checked document).
  - History page listing all past analysis reports for the logged-in user.
- User Profile:
  - Basic user profile display page.
- User Interface:
  - Responsive UI built with React.
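The report contents listed above map naturally onto a small data structure. The following is a hypothetical model, not the backend's actual schema: the class names (AnalysisReport, Overlap), the field names, and the score scales are assumptions, and the real application may store reports in MongoDB under different keys.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Overlap:
    """One passage of the submission that matched a web source (hypothetical)."""
    source_url: str
    matched_text: str
    similarity: float  # assumed 0-100 scale, as returned by fuzzy matchers


@dataclass
class AnalysisReport:
    """Hypothetical shape of one analysis report; real keys may differ."""
    originality_score: float  # assumed percentage, 0-100
    ai_probability: float     # assumed 0.0-1.0
    ai_reasoning: str
    keywords: List[str] = field(default_factory=list)
    search_queries: List[str] = field(default_factory=list)
    overlaps: List[Overlap] = field(default_factory=list)

    def to_document(self) -> dict:
        """Convert to a plain dict (nested Overlap objects included),
        e.g. for storing with PyMongo or rendering into a PDF."""
        return asdict(self)


report = AnalysisReport(
    originality_score=82.5,
    ai_probability=0.12,
    ai_reasoning="Varied sentence length and informal phrasing.",
    keywords=["plagiarism", "detection"],
    overlaps=[Overlap("https://example.com/article", "a matched sentence", 91.0)],
)
print(report.to_document()["overlaps"][0]["source_url"])  # prints https://example.com/article
```

Using dataclasses keeps the fields explicit while asdict still yields the nested plain dictionaries a database driver or PDF template expects.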
The core workflow of the system is as follows:
- User Input (text or file) is submitted.
- Text Extraction parses the input.
- Query Generation creates search terms.
- Web Search & Scraping fetches relevant sources.
- Plagiarism Detection compares the input text against the scraped content.
- AI Detection assesses AI generation likelihood.
- PDF Report is generated with full analysis.
- UI Output presents the report and metrics to the user.
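The plagiarism-detection step can be illustrated with a minimal sketch. It assumes the scraped sources arrive as a mapping from URL to page text, compares the submission sentence by sentence, and records the best-matching source for each sentence that clears a similarity cut-off. Python's standard-library difflib stands in for RapidFuzz here, and the function names, the 80-point threshold, and the originality formula (share of sentences with no match) are illustrative assumptions rather than the application's actual logic.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def similarity(a: str, b: str) -> float:
    """Case-insensitive 0-100 similarity (difflib standing in for RapidFuzz)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def detect_overlaps(
    submission: str,
    sources: Dict[str, str],   # source URL -> scraped page text
    threshold: float = 80.0,   # hypothetical cut-off for counting a match
) -> Tuple[float, List[dict]]:
    """Return (originality_score, overlaps) for one submission."""
    sentences = split_sentences(submission)
    if not sentences:
        return 100.0, []
    # Split each source once, not once per submission sentence.
    source_sentences = {url: split_sentences(text) for url, text in sources.items()}
    overlaps = []
    for sentence in sentences:
        best_url: Optional[str] = None
        best_score = 0.0
        for url, candidates in source_sentences.items():
            for candidate in candidates:
                score = similarity(sentence, candidate)
                if score > best_score:
                    best_url, best_score = url, score
        if best_score >= threshold:
            overlaps.append({"text": sentence, "source_url": best_url,
                             "similarity": round(best_score, 1)})
    # Hypothetical originality metric: share of sentences with no match.
    originality = 100.0 * (1 - len(overlaps) / len(sentences))
    return round(originality, 1), overlaps


score, found = detect_overlaps(
    "The cat sat on the mat. Quantum flux is entirely unrelated.",
    {"https://example.com": "The cat sat on the mat. Dogs bark loudly."},
)
print(score, found[0]["source_url"])  # prints: 50.0 https://example.com
```

In the application, the sources mapping would presumably be filled from Serper search results scraped with BeautifulSoup4. With RapidFuzz installed, rapidfuzz.fuzz.ratio could replace the difflib-based similarity function, since it also returns a 0-100 score.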
Backend:
- Language: Python
- Framework: Flask
- Database: MongoDB (using PyMongo)
- Authentication: Flask-Login, Flask-Bcrypt
- APIs & Services:
  - Google Generative AI (Gemini for embeddings, AI detection, query generation)
  - Serper API (for Google search results)
- Core Logic Libraries:
  - NLTK (text processing, keyword extraction)
  - pdfminer.six (PDF text extraction)
  - python-docx (DOCX text extraction)
  - ReportLab (PDF report generation)
  - BeautifulSoup4 (web scraping for source content)
  - RapidFuzz (fuzzy string matching for plagiarism)
- Other: python-dotenv (environment variable management)
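For the text-extraction step, dispatching on the file extension is one straightforward way to combine pdfminer.six and python-docx. This sketch assumes the upload has already been saved to a path on disk (a Flask upload would otherwise arrive as a file object); extract_text is a hypothetical helper name, and the real backend may handle encodings and errors differently.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return plain text from a .txt, .pdf or .docx file (hypothetical helper).

    pdfminer.six and python-docx are imported lazily, so an environment
    that only handles .txt files does not need them installed.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        # Replace undecodable bytes instead of failing on odd encodings.
        return Path(path).read_text(encoding="utf-8", errors="replace")
    if suffix == ".pdf":
        from pdfminer.high_level import extract_text as pdf_extract_text
        return pdf_extract_text(path)
    if suffix == ".docx":
        import docx  # installed by the python-docx package
        document = docx.Document(path)
        return "\n".join(paragraph.text for paragraph in document.paragraphs)
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Both pdfminer.high_level.extract_text and docx.Document accept a file-like object as well as a path, so the same dispatch could work on an uploaded stream with small changes.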
Frontend:
- Language: JavaScript
- Library/Framework: React
- Build Tool: Vite
- Routing: React Router
- State Management: React Context (used for the useAuth and usePlagiarismCheck hooks)
- Styling: CSS (custom styles, no CSS framework)
- Other: js-cookie (possibly not needed for authentication, since the HTTP-only cookies used by Flask-Login cannot be read from JavaScript; it may be used elsewhere or may have been planned).
General:
- Version Control: Git
Prerequisites:
- Node.js and npm (or yarn) for the frontend.
- Python 3 (3.8 or later) and pip for the backend.
- Access to a MongoDB instance (local or cloud-hosted).
- API keys for:
  - Google Generative AI (from Google AI Studio)
  - Serper API (from serper.dev)
An example environment file is included at backend/.env.example. Copy this file to backend/.env and replace the placeholder values with your own configuration:
FLASK_SECRET_KEY=your_flask_secret_key
MONGO_URI=mongodb://localhost:27017/plagiarism_detector
GOOGLE_API_KEY=your_google_api_key
SERPER_API_KEY=your_serper_api_key
DEBUG=True
FLASK_SECRET_KEY should be a random string, MONGO_URI points to your MongoDB instance, and the API keys are obtained from Google AI Studio and serper.dev. Set DEBUG=False in production.
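The backend presumably reads these variables at startup. A minimal sketch of that step, assuming python-dotenv's load_dotenv and a .env file in the directory the server is started from: load_config is a hypothetical helper, and both the mapping of FLASK_SECRET_KEY onto Flask's SECRET_KEY setting and the parsing of DEBUG are assumptions about how app.py uses these values. The __main__ block shows one way to generate a random FLASK_SECRET_KEY with the standard secrets module.

```python
import os
import secrets


def load_config(env_file: str = ".env") -> dict:
    """Collect the backend settings described above (hypothetical helper).

    If python-dotenv is installed, values from env_file are loaded first;
    variables already present in the environment are not overridden.
    """
    try:
        from dotenv import load_dotenv
        load_dotenv(env_file)  # a missing file is silently ignored
    except ImportError:
        pass

    required = ("FLASK_SECRET_KEY", "MONGO_URI", "GOOGLE_API_KEY", "SERPER_API_KEY")
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))

    return {
        "SECRET_KEY": os.environ["FLASK_SECRET_KEY"],  # Flask's config key
        "MONGO_URI": os.environ["MONGO_URI"],
        "GOOGLE_API_KEY": os.environ["GOOGLE_API_KEY"],
        "SERPER_API_KEY": os.environ["SERPER_API_KEY"],
        # "True"/"true"/"1" enable debug mode; anything else, or unset, disables it.
        "DEBUG": os.environ.get("DEBUG", "False").strip().lower() in ("true", "1"),
    }


if __name__ == "__main__":
    # One way to produce a random value for FLASK_SECRET_KEY.
    print(secrets.token_hex(32))
```

Failing fast on missing keys surfaces a misconfigured .env at startup rather than on the first search or Gemini call.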
Note for Frontend: The frontend makes API calls to the backend, typically at http://127.0.0.1:5000. This URL is hardcoded in the React hooks (frontend/src/hooks/). If your backend runs on a different URL, you'll need to update these hooks.
Backend setup:
- Clone the repository:
    git clone <repository_url>
    cd <repository_name>
- Navigate to the backend directory:
    cd backend
- Create a Python virtual environment:
    python -m venv venv
- Activate the virtual environment:
  - Windows:
      venv\Scripts\activate
  - macOS/Linux:
      source venv/bin/activate
- Install dependencies:
    pip install -r venv/requirements.txt
  (The requirements.txt file lists all backend dependencies.)
- NLTK data download (if needed): The application uses NLTK for text processing. The core_detector.py script attempts to download the 'stopwords' corpus if it is not found. You may also need the 'punkt' tokenizer data if another dependency does not already provide it or if explicit sentence tokenization is used elsewhere. If you encounter NLTK-related errors, download both manually from a Python interpreter:
    import nltk
    nltk.download('stopwords')
    nltk.download('punkt')
- Create and populate the .env file: As described above, create a .env file in the backend directory (for example, by copying backend/.env.example) containing your API keys and configuration.
- Run the Flask development server:
    python app.py
- The backend should now be running on http://127.0.0.1:5000.
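The NLTK step in the setup above can be made idempotent by checking for each resource before downloading it. This is a hypothetical helper, loosely modelled on what core_detector.py is described as doing for 'stopwords'; the resource paths are NLTK's standard data locations, and the find/download parameters exist only so the helper can be exercised without NLTK or network access.

```python
from typing import Callable, Dict, List, Optional

# Standard NLTK data locations for the two resources mentioned above.
DEFAULT_RESOURCES: Dict[str, str] = {
    "stopwords": "corpora/stopwords",
    "punkt": "tokenizers/punkt",
}


def ensure_nltk_data(
    resources: Optional[Dict[str, str]] = None,
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[..., object]] = None,
) -> List[str]:
    """Download each resource only if NLTK cannot already find it.

    Returns the names that had to be downloaded.
    """
    if find is None or download is None:
        import nltk  # imported lazily so callers can pass stand-ins instead
        find = find or nltk.data.find
        download = download or nltk.download
    fetched = []
    for name, path in (resources or DEFAULT_RESOURCES).items():
        try:
            find(path)  # nltk.data.find raises LookupError if missing
        except LookupError:
            download(name, quiet=True)
            fetched.append(name)
    return fetched
```

Calling ensure_nltk_data() once at startup, before any tokenization, would download only what is missing and do nothing on later runs.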
Frontend setup:
- Navigate to the frontend directory (from the project root):
    cd frontend
  (If you are in the backend directory, use cd ../frontend.)
- Install dependencies:
    npm install
  (or yarn install if you prefer yarn)
- Run the Vite development server:
    npm run dev
  (or yarn dev)
- The frontend application should now be running, typically on http://127.0.0.1:5173, and will attempt to connect to the backend API at http://127.0.0.1:5000.
The backend exposes several API endpoints, including:
- POST /api/auth/register: User registration.
- POST /api/auth/login: User login.
- POST /api/auth/logout: User logout.
- GET /@me: Get details of the currently authenticated user.
- POST /analyse: Submit text or a file for plagiarism and AI content analysis. Requires authentication.
- GET /api/history: Retrieve the analysis history for the authenticated user. Requires authentication.
- GET /download-report/<filename>: Download a specific PDF report. Requires authentication.
- GET /api/dashboard_stats: Get user statistics for the dashboard. Requires authentication.

(Refer to backend/app.py for the complete list of routes and their functionality.)
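To call these endpoints from a script instead of the React UI, a client has to keep the session cookie that Flask-Login sets at login. This sketch uses only the standard library; the base URL is the default from this README, while the helper names and the login payload keys (email, password) are assumptions, so check backend/app.py for the fields the routes actually expect. It also assumes the endpoints accept JSON bodies; POST /analyse with a file upload would need a multipart request instead.

```python
import json
from http.cookiejar import CookieJar
from urllib.parse import quote
from urllib.request import HTTPCookieProcessor, Request, build_opener

BASE_URL = "http://127.0.0.1:5000"  # default backend address in this README


def json_request(method: str, path: str, payload=None) -> Request:
    """Build a request for a backend endpoint, JSON-encoding any payload."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def report_url(filename: str) -> str:
    """URL for GET /download-report/<filename>, percent-encoding the name."""
    return f"{BASE_URL}/download-report/{quote(filename)}"


# The cookie jar stores the session cookie set by POST /api/auth/login and
# sends it on later requests, which is what the authenticated routes need.
opener = build_opener(HTTPCookieProcessor(CookieJar()))

# Example session against a running backend (payload keys are assumptions):
#   opener.open(json_request("POST", "/api/auth/login",
#                            {"email": "you@example.com", "password": "..."}))
#   with opener.open(json_request("GET", "/api/history")) as resp:
#       history = json.load(resp)
```

Sharing one opener across calls is what makes the sequence work: a fresh urllib.request.urlopen call per request would drop the session cookie and every authenticated route would reject the request.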
Contributions are welcome! Please fork the repository, create a new branch for your feature or fix, and submit a pull request with your changes.