Deoansh Deo edited this page Apr 13, 2025 · 6 revisions

OCR Reader & Translator

Welcome to the OCR Reader & Translator Wiki

Thank you for visiting the Wiki for the OCR Reader & Translator, a full-stack application that extracts text from images and PDFs using Optical Character Recognition (OCR), detects code snippets, and provides multilingual translation. The project pairs a Flask-based backend with a React-based frontend. As a subcomponent of the broader initiative at [PROJECT](https://github.com/deoanshdeo/Project-starts), it combines full-stack development with machine learning to produce a working application. This Wiki is a resource for users, contributors, and potential collaborators, offering documentation and usage guides.

  1. Project Overview and Technical Architecture

    The OCR Reader & Translator addresses practical problems in document digitization and multilingual processing. The backend uses pytesseract, EasyOCR, and two Transformers models (TrOCR for OCR and M2M100 for translation) to extract and translate text. The frontend, built with React and Tailwind CSS, provides a responsive interface with theme switching and drag-and-drop uploads.
  • Backend Components:

    • app/__init__.py: Initializes the Flask application with CORS support.
    • app/ocr.py: Implements multi-engine OCR (Tesseract, EasyOCR, TrOCR) with preprocessing for code detection.
    • app/routes.py: Defines the /process API endpoint for OCR and translation requests.
    • app/translate.py: Utilizes the M2M100 model for multilingual translation.
    • app/main.py: Serves as the entry point to launch the Flask server.
    • Dependencies managed via requirements.txt.
  • Frontend Components:

    • public/index.html: Main HTML structure with JavaScript integration.
    • src/components/Form.js: Handles text input, file uploads, and image pasting.
    • src/components/Popup.js: Displays results with copy-to-clipboard functionality.
    • src/components/ThemeSwitch.js: Manages light/dark theme toggling.
    • src/App.js: Orchestrates the application layout with particle effects.
    • Custom styling in index.css with Tailwind configuration in tailwind.config.js.
  • Key Technical Components:

    • Tesseract OCR: An open-source OCR engine integrated for initial text extraction, enhanced with custom preprocessing to detect code snippets. Supports multiple languages, making it a cornerstone for multilingual functionality.
    • M2M100: A many-to-many Transformer translation model developed by Facebook AI that translates directly between 100 languages; it handles all translation in the backend.
    • TrOCR: A Transformer-based OCR model that improves accuracy on complex layouts, complementing Tesseract and EasyOCR for robust text extraction.
    • Tailwind CSS: Powers the frontend’s responsive design, including the glassmorphism and neon glow effects.
    • Status: Actively developed as of April 10, 2025.
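
To illustrate how app/ocr.py might combine several engines and flag code snippets, here is a minimal sketch. It is not the project's implementation: the engine callables, the names `best_ocr_result` and `looks_like_code`, the 0-to-1 confidence scale, and the regex heuristic are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical engine signature: takes image bytes and returns
# (extracted_text, confidence between 0 and 1).
Engine = Callable[[bytes], Tuple[str, float]]

# Assumed heuristic: tokens that are common in source code and rare in prose.
CODE_HINTS = re.compile(
    r"[{};]|\b(def|class|import|return|function|var|const)\b|=>|=="
)


def looks_like_code(text: str, threshold: float = 0.3) -> bool:
    """Return True if enough non-empty lines contain a code-like token."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    hits = sum(1 for line in lines if CODE_HINTS.search(line))
    return hits / len(lines) >= threshold


def best_ocr_result(image: bytes, engines: Dict[str, Engine]) -> dict:
    """Run every engine and keep the highest-confidence non-empty result."""
    best = {"engine": None, "text": "", "confidence": 0.0}
    for name, engine in engines.items():
        try:
            text, confidence = engine(image)
        except Exception:
            continue  # one failing engine should not abort the whole request
        if text.strip() and confidence > best["confidence"]:
            best = {"engine": name, "text": text, "confidence": confidence}
    best["is_code"] = looks_like_code(best["text"])
    return best
```

In a real backend each callable would wrap pytesseract, EasyOCR, or TrOCR; stub functions are enough to exercise the selection logic on its own.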
  2. Installation and Setup Guide

    To set up the OCR Reader & Translator locally, follow these detailed steps:
  • Prerequisites:

    • Node.js and npm: Required for the frontend (install via sudo apt install nodejs npm on Ubuntu).
    • Python 3.8+ and pip: Necessary for the backend (install via sudo apt install python3-pip).
    • Git: For cloning the repository (sudo apt install git).
    • Tesseract OCR:
      • Install on Ubuntu with sudo apt update followed by sudo apt install tesseract-ocr libtesseract-dev.
      • Verify with tesseract --version.
      • For additional languages (e.g., Hindi), use sudo apt install tesseract-ocr-hin (replace hin with the appropriate language code).
  • Installation Steps:

    1. Clone the repository: git clone https://github.com/your-username/ocr-reader.git and cd ocr-reader.
    2. Set up the backend: cd backend, pip install -r requirements.txt, and python main.py (the server binds to 0.0.0.0:5000, so it is reachable locally at http://localhost:5000).
    3. Set up the frontend: cd ../frontend, npm install, and npm start (runs at http://localhost:3000/).
    4. Access the application at http://localhost:3000/.
  • Troubleshooting:

    • If Tesseract fails, ensure the path is set in backend/app/ocr.py (e.g., pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract').
    • Check port availability (5000 for backend, 3000 for frontend).
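
The prerequisite and troubleshooting checks above can be scripted. The following is a minimal preflight sketch using only the Python standard library; the function names and the fallback Tesseract locations are assumptions for illustration, and the script is not part of the repository.

```python
import os
import shutil
import socket
import sys

REQUIRED_TOOLS = ("node", "npm", "git", "tesseract")
PORTS = {"backend": 5000, "frontend": 3000}


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def resolve_tesseract_cmd(
    candidates=("/usr/bin/tesseract", "/usr/local/bin/tesseract"),
):
    """Return a usable Tesseract path: PATH lookup first, then fallbacks."""
    found = shutil.which("tesseract")
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("Python 3.8+:", sys.version_info >= (3, 8))
    print("Missing tools:", missing_tools() or "none")
    # The resolved path is the value you would assign to
    # pytesseract.pytesseract.tesseract_cmd in backend/app/ocr.py.
    print("Tesseract path:", resolve_tesseract_cmd())
    for name, port in PORTS.items():
        state = "in use" if port_in_use(port) else "free"
        print(f"Port {port} ({name}): {state}")
```

Run it before starting the servers: both ports should be reported as free. Run it again afterwards and both should be in use.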
  3. Usage Instructions

    To maximize the utility of the OCR Reader & Translator:

    1. Launch the app at http://localhost:3000/.
    2. Enter text manually, upload an image or PDF, or add an image by pasting it (Ctrl+V) or dragging and dropping it onto the form.
    3. Select OCR Extract to extract text or Translate to convert to another language.
    4. Choose source and target languages (e.g., auto-detect to English) if translating.
    5. Click Process to view results in a popup; use the Copy button to save output.
    6. Clear the form with the Clear button to start anew.
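
The same workflow can be driven without the browser by calling the /process endpoint directly. The sketch below assumes the endpoint accepts a JSON body; the field names (`text`, `mode`, `source_lang`, `target_lang`) and the shape of the response are hypothetical, so check app/routes.py for the actual contract before relying on it.

```python
import json
import urllib.request

# Endpoint path from the backend description; port from the setup guide.
API_URL = "http://localhost:5000/process"


def build_process_request(text, mode="translate", source_lang="auto",
                          target_lang="en", url=API_URL):
    """Build a POST request for /process (payload field names are assumed)."""
    payload = {
        "text": text,
        "mode": mode,
        "source_lang": source_lang,
        "target_lang": target_lang,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process(text, **kwargs):
    """Send the request and decode the response, assuming it is JSON."""
    request = build_process_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `process("Bonjour", source_lang="fr")` would send one translation request. File uploads would need a multipart body instead, which this sketch does not cover.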
  4. Frequently Asked Questions

    • Q: Why is OCR not detecting text?
      • A: Ensure Tesseract is installed and the image is clear; adjust preprocessing in ocr.py if needed.
    • Q: How do I support a new language?
      • A: Install the corresponding Tesseract language pack (e.g., tesseract-ocr-fra) and update translate.py with the language code.
    • Q: Why do I get connection errors between the frontend and backend?
      • A: Verify both are running and ports (5000, 3000) are open.
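
Adding a language touches two naming schemes: Tesseract language packs use three-letter codes (fra, hin), while M2M100 uses two-letter codes (fr, hi). The sketch below shows one way translate.py could bridge them. The lookup table, the function names, and the facebook/m2m100_418M checkpoint are assumptions for illustration, not taken from the project.

```python
# Hypothetical lookup table; extend it when a new language pack is installed.
TESSERACT_TO_M2M100 = {
    "eng": "en",
    "fra": "fr",
    "hin": "hi",
    "deu": "de",
    "spa": "es",
}


def apt_package(tess_code):
    """Ubuntu package name for a Tesseract language pack ('fra' -> 'tesseract-ocr-fra')."""
    return "tesseract-ocr-" + tess_code.replace("_", "-")


def m2m100_code(tess_code):
    """Map a Tesseract language code to the matching M2M100 code."""
    try:
        return TESSERACT_TO_M2M100[tess_code]
    except KeyError:
        raise ValueError(
            f"No M2M100 mapping for Tesseract code {tess_code!r}"
        ) from None


def translate(text, src_tess, tgt_tess, model_name="facebook/m2m100_418M"):
    """Translate text with M2M100; the checkpoint name is an assumption."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    # Loading on every call keeps the sketch short; a server would load once.
    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)
    tokenizer.src_lang = m2m100_code(src_tess)
    encoded = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id(m2m100_code(tgt_tess)),
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

Keeping the mapping in one table means that supporting a new language is a package install (`apt_package` gives the name) plus a single new dictionary entry.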
  5. Future Roadmap

    • Expand language support with additional Tesseract packs and M2M100 models.
    • Enhance code detection accuracy with machine learning optimizations.
    • Introduce batch processing for multiple files.
    • Updates will be documented here as they progress.
    • Note: This Wiki will expand with API documentation, code examples, and deployment guides in future updates to provide an even more comprehensive resource.
  6. Contact Information

    For inquiries or collaboration:

  7. Acknowledgments

    This project benefits from the open-source community, particularly libraries like React, Flask, Transformers, and Tailwind CSS. It was inspired by the demand for efficient, multilingual document processing solutions.