
Fine-tuned roberta-base classifier on the LIAR dataset. It accepts multiple input types (text, URLs, and PDFs) and outputs a prediction with a confidence score. It also leverages google/flan-t5-base to generate explanations and uses agentic AI with LangGraph to orchestrate agents for planning, retrieval, execution, fallback, and reasoning.

InformaTruth: AI-Driven News Authenticity Analyzer

CI/CD Python PyTorch Hugging Face LangChain scikit-learn Pandas FastAPI Docker React Tailwind CSS Vite

InformaTruth is an end-to-end AI-powered multi-agent fact-checking system that automatically verifies news articles, PDFs, and web content. It leverages RoBERTa fine-tuning, LangGraph orchestration, RAG pipelines, and fallback retrieval agents to deliver reliable, context-aware verification. The system features a modular multi-agent architecture including Planner, Retriever, Generator, Memory, and Fallback Agents, integrating diverse tools for comprehensive reasoning.

It achieves ~70% accuracy and ~69% F1 on the LIAR dataset, with 95% query coverage and roughly 60% improved reliability through intelligent tool routing and memory integration. Designed for real-world deployment, InformaTruth includes a responsive React UI, FastAPI endpoints, Dockerized containers, and a CI/CD pipeline, enabling enterprise-grade automated fact verification at scale.



Live Demo

Try it now: InformaTruth — Fake News Detection AI App


Tech Stack

| Category | Technology/Resource |
| --- | --- |
| Core Framework | PyTorch, Transformers, HuggingFace |
| Frontend Framework | React.js (Vite), Tailwind CSS, DaisyUI |
| Backend Framework | FastAPI (Async, Pydantic) |
| Classification Model | Fine-tuned RoBERTa-base on LIAR Dataset |
| Explanation Model | FLAN-T5-base (Zero-shot Prompting) |
| Training Data | LIAR Dataset (Political Fact-Checking) |
| Evaluation Metrics | Accuracy, Precision, Recall, F1-score |
| Training Framework | HuggingFace Trainer |
| Orchestration | LangGraph (Multi-Agent Directed Acyclic Execution Graph) |
| Agents Used | PlannerAgent, InputHandlerAgent, ToolRouterAgent, ExecutorAgent, ExplanationAgent, FallbackSearchAgent |
| Input Modalities | Raw Text, Website URLs (via Newspaper3k), PDF Documents (via PyMuPDF) |
| Tool Augmentation | DuckDuckGo Search API (Fallback), Wikipedia (Planned), ToolRouter Logic |
| Web Scraping | Newspaper3k (HTML → Clean Article) |
| PDF Parsing | PyMuPDF (Backend) / PDF.js (Frontend) |
| Explainability | Natural language justification generated using FLAN-T5 |
| State Management | Shared State Object (LangGraph-compatible) |
| Hosting Platform | Render (Docker) |
| Version Control | Git, GitHub |
| Logging & Debugging | Centralized logs in logs/ directory |
| Database | SQLite + SQLModel (Auto-persistence of analysis results) |
| Input Support | Text, URLs, PDF documents |
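The shared state object listed above is the contract every agent reads from and writes to as the graph executes. A minimal sketch of what such a LangGraph-compatible state could look like (the field names here are assumptions for illustration; see backend/app/graph/state.py for the real definition):

```python
from typing import Optional, TypedDict

class PipelineState(TypedDict, total=False):
    # Hypothetical field names -- the actual state.py may differ.
    input_type: str               # "text" | "url" | "pdf"
    raw_input: str                # what the user submitted
    article_text: str             # cleaned text after parsing
    label: Optional[str]          # classifier verdict, e.g. "real" | "fake"
    confidence: Optional[float]   # classifier score in [0, 1]
    explanation: Optional[str]    # FLAN-T5 rationale

# Each agent receives the state, fills in its slice, and passes it on.
state: PipelineState = {"input_type": "text", "raw_input": "Some claim."}
state["article_text"] = state["raw_input"].strip()
print(state["article_text"])  # → Some claim.
```

Because every agent shares one typed dictionary, new agents can be added without changing existing ones, as long as they only read keys that earlier nodes populate.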

Key Features

  • Modular & Agentic Architecture: Strictly organized codebase following agentic principles with modular separation of concerns.

  • Modern React Frontend: A responsive, pixel-perfect UI built with React, Vite, and Tailwind CSS, featuring dark mode and a glassmorphism design.

  • FastAPI Backend: High-performance asynchronous API with automatic documentation and efficient model serving.

  • Multi-Format Input Support: Accepts raw text, web URLs, and PDF documents (with client-side text extraction).

  • Full NLP Pipeline: Integrates fake news classification (RoBERTa) and natural language explanation (FLAN-T5).

  • Modular Agent-Based Architecture: Built using LangGraph with modular agents: Planner, Router, Executor, and Fallback.

  • Explanation Generation: Uses FLAN-T5 to generate human-readable rationales for model predictions.

  • Comprehensive Testing: Targets 100% test coverage with automated unit and integration tests using pytest (backend) and Vitest (frontend).

  • Structured Logging: All logs are automatically saved to the logs/ directory for easier debugging and monitoring.
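The multi-format input support implies a dispatch step that decides how each submission should be parsed before it reaches the classifier. A routing helper along these lines could sit in the input-handler agent (this is an illustrative sketch, not the project's actual input_handler.py):

```python
def detect_input_type(value: str) -> str:
    """Classify a raw submission as 'url', 'pdf', or plain 'text'.

    Hypothetical helper: URLs go to Newspaper3k, PDFs to PyMuPDF,
    and everything else is treated as raw text.
    """
    lowered = value.strip().lower()
    if lowered.startswith(("http://", "https://")):
        return "url"
    if lowered.endswith(".pdf"):
        return "pdf"
    return "text"

print(detect_input_type("https://example.com/story"))   # → url
print(detect_input_type("report.pdf"))                  # → pdf
print(detect_input_type("The moon landing happened."))  # → text
```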


Project File Structure

InformaTruth/
│
├── .github/
│   └── workflows/
│       └── main.yml                  # CI/CD Configuration
│
├── backend/                          # FastAPI Backend
│   ├── app/                          # Application Package
│   │   ├── agents/                   # Modular Pipeline Agents
│   │   │   ├── executor.py
│   │   │   ├── fallback_search.py
│   │   │   ├── input_handler.py
│   │   │   ├── planner.py
│   │   │   └── router.py
│   │   ├── graph/                    # LangGraph Orchestration
│   │   │   ├── builder.py
│   │   │   └── state.py
│   │   ├── models/                   # AI Model Wrappers
│   │   │   ├── classifier.py
│   │   │   ├── db.py                 # Database Models
│   │   │   └── loader.py
│   │   ├── utils/                    # Shared Utilities
│   │   │   ├── logger.py
│   │   │   └── results.py
│   │   ├── db.py                     # Database Connection & Setup
│   │   ├── main.py                   # FastAPI entry point
│   │   └── valid.tsv
│   ├── logs/                         # Application Logs
│   │   └── fake_news_pipeline.log
│   ├── news/                         # Sample Data
│   │   └── news.pdf
│   ├── tests/                        # Backend Tests
│   │   ├── conftest.py
│   │   ├── test_agents.py
│   │   ├── test_api.py
│   │   ├── test_db.py                # Database Integration Tests
│   │   ├── test_edge_cases.py
│   │   ├── test_lifespan.py
│   │   └── test_models.py
│   ├── train/                        # Training module
│   │   ├── config.py
│   │   ├── data_loader.py
│   │   ├── predictor.py
│   │   ├── run.py
│   │   ├── trainer.py
│   │   └── utils.py
│   ├── config.py                     # Global Configuration
│   ├── Dockerfile                    # Backend Dockerfile
│   ├── pyproject.toml                # Project Configuration
│   ├── requirements.txt              # Python dependencies
│   └── setup.py                      # Package Setup
│
├── frontend/                         # React Frontend
│   ├── public/                       # Static Assets
│   ├── src/                          # Source Code
│   │   ├── assets/                   # Images/Vectors
│   │   │   └── react.svg
│   │   ├── components/               # React Components
│   │   │   ├── AnalysisForm.jsx
│   │   │   ├── Footer.jsx
│   │   │   ├── Hero.jsx
│   │   │   ├── Navbar.jsx
│   │   │   └── ResultsParams.jsx
│   │   ├── App.jsx                   # Main App Component
│   │   ├── App.test.jsx              # Frontend Unit Tests
│   │   ├── index.css                 # Tailwind
│   │   ├── main.jsx                 
│   │   └── setupTests.js             
│   ├── Dockerfile                    # Frontend Dockerfile
│   ├── eslint.config.js            
│   ├── index.html                    # HTML Entry Point
│   ├── package-lock.json           
│   ├── package.json                 
│   └── vite.config.js                # Vite Configuration
│
├── docker-compose.yml                # Docker Orchestration
├── demo.mp4                          # Demo Video
├── demo-1.png                        # Demo Image
├── demo-2.png                        # Demo Image
├── LICENSE                           # Project License
├── README.md                         # Documentation
├── render.yml                        # Render Deployment Config
└── run.py                            # Root launch script

Getting Started

1. Running the Application (Local Development)

To launch both the FastAPI Backend and React Frontend locally in parallel:

python run.py
  • Backend: http://localhost:8000
  • Frontend: http://localhost:5173
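A launcher like run.py typically starts both dev servers as child processes and waits on them. A minimal sketch of that pattern (the exact commands and flags are assumptions based on the defaults above, not the repository's actual run.py):

```python
import subprocess

# Assumed dev-server commands; adjust to match the actual project scripts.
BACKEND_CMD = ["uvicorn", "app.main:app", "--reload", "--port", "8000"]
FRONTEND_CMD = ["npm", "run", "dev"]

def launch(commands):
    """Start each (cmd, workdir) pair as a child process and wait on all."""
    procs = [subprocess.Popen(cmd, cwd=cwd) for cmd, cwd in commands]
    try:
        for proc in procs:
            proc.wait()
    except KeyboardInterrupt:
        for proc in procs:
            proc.terminate()

# Usage: launch([(BACKEND_CMD, "backend"), (FRONTEND_CMD, "frontend")])
```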

2. Running with Docker (Production/Containerized)

To build and run the entire stack using Docker Compose:

docker-compose up --build
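A compose file for a two-service stack like this usually looks like the following sketch (service names and port mappings are assumptions; the repository's docker-compose.yml is authoritative):

```yaml
services:
  backend:
    build: ./backend          # uses backend/Dockerfile
    ports:
      - "8000:8000"
  frontend:
    build: ./frontend         # uses frontend/Dockerfile
    ports:
      - "5173:5173"
    depends_on:
      - backend
```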

3. Running Training

To trigger the model training process (ensure you are in backend/):

cd backend
python train/run.py

4. Running Tests

Backend (Pytest):

cd backend
python -m pytest tests/ --cov=app --cov-report=term-missing

Frontend (Vitest):

cd frontend
npm test

System Architecture

```mermaid
graph TD
    A[User Input] --> B{Input Type}
    B -->|Text| C[Direct Text Processing]
    B -->|URL| D[Newspaper3k Parser]
    B -->|PDF| E[PDF.js Extraction]

    C --> F[Text Cleaner]
    D --> F
    E --> F

    F --> G[Context Validator]
    G -->|Sufficient Context| H[RoBERTa Classifier]
    G -->|Insufficient Context| I[Web Search Agent]

    I --> J[Context Aggregator]
    J --> H

    H --> K[FLAN-T5 Explanation Generator]
    K --> L[Output Formatter]

    L --> M[React Frontend]

    style M fill:#e3f2fd,stroke:#90caf9
    style G fill:#fff9c4,stroke:#fbc02d
    style I fill:#fbe9e7,stroke:#ff8a65
    style H fill:#f1f8e9,stroke:#aed581
```
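The context-validation branch in the diagram is the key reliability mechanism: when the input alone is too thin to classify, the fallback search agent enriches it before classification. In plain Python the control flow reduces to something like this sketch (the threshold, function names, and stub models are illustrative, not the project's code):

```python
MIN_CONTEXT_CHARS = 200  # hypothetical threshold for "sufficient context"

def has_sufficient_context(text: str) -> bool:
    return len(text) >= MIN_CONTEXT_CHARS

def run_pipeline(text, classify, web_search):
    """Mirror of the diagram: validate -> (optional) search -> classify."""
    if not has_sufficient_context(text):
        # Web Search Agent + Context Aggregator: append retrieved context
        text = text + " " + web_search(text)
    label, score = classify(text)  # RoBERTa classifier stand-in
    return {"label": label, "confidence": score}

# Stub models keep the sketch self-contained.
result = run_pipeline(
    "Short claim.",
    classify=lambda t: ("fake", 0.91),
    web_search=lambda t: "retrieved supporting context " * 10,
)
print(result)  # → {'label': 'fake', 'confidence': 0.91}
```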

Model Performance

| Epoch | Train Loss | Val Loss | Accuracy | F1 | Precision | Recall |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.6353 | 0.6205 | 0.6557 | 0.6601 | 0.6663 | 0.6557 |
| 2 | 0.6132 | 0.5765 | 0.7032 | 0.6720 | 0.6817 | 0.7032 |
| 3 | 0.5957 | 0.5779 | 0.6970 | 0.6927 | 0.6899 | 0.6970 |
| 4 | 0.5781 | 0.5778 | 0.6978 | 0.6899 | 0.6864 | 0.6978 |
| 5 | 0.5599 | 0.5810 | 0.6954 | 0.6882 | 0.6846 | 0.6954 |

Emphasis on Recall ensures the model catches most fake news cases.


Professional Testing & Quality

1. Linting

Backend (Ruff):

cd backend
ruff check app/ tests/

Frontend (ESLint):

cd frontend
npm run lint

2. Code Formatting

Backend (Black):

cd backend
black app/ tests/

CI/CD Pipeline (GitHub Actions)

The project utilizes a comprehensive GitHub Actions workflow for automated testing and validation.

Workflow Features:

  • Backend:
    • Sets up Python 3.11
    • Installs dependencies from requirements.txt
    • Runs Ruff for linting
    • Runs Black for formatting checks
    • Executes Pytest with coverage reporting
  • Frontend:
    • Sets up Node.js 18
    • Installs dependencies (npm ci)
    • Runs ESLint
    • Executes Vitest unit tests
  • Docker:
    • Builds backend and frontend Docker images upon successful tests

Trigger

The pipeline runs automatically on every push and pull_request to the main branch.
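A trimmed sketch of what that trigger and the backend job might look like in .github/workflows/main.yml (step details are assumptions; the real workflow file is authoritative):

```yaml
name: CI/CD

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  backend:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r backend/requirements.txt
      - run: ruff check backend/app backend/tests
      - run: python -m pytest backend/tests --cov=app
```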


Developed By

Md Emon Hasan
Email: emon.mlengineer@gmail.com WhatsApp: +8801834363533
GitHub: Md-Emon-Hasan
LinkedIn: Md Emon Hasan
Facebook: Md Emon Hasan


License

MIT License. Free to use with credit.
