Home
Thank you for visiting the Wiki for the OCR Reader & Translator, a full-stack application that extracts text from images and PDFs using Optical Character Recognition (OCR), detects code snippets, and provides multilingual translation. The project pairs a Flask-based backend with a React-based frontend. As a subcomponent of the broader initiative at [PROJECT](https://github.com/deoanshdeo/Project-starts), this tool combines full-stack development with machine learning to produce a working application. This Wiki serves as a resource for users, contributors, and potential collaborators, offering detailed documentation and usage guides.
-
The OCR Reader & Translator is designed to address real-world challenges in document digitization and multilingual processing. The backend leverages state-of-the-art libraries (pytesseract, EasyOCR, and Transformers with TrOCR and M2M100) to deliver high-accuracy text extraction and translation. The frontend provides a modern, responsive interface with features like theme switching and drag-and-drop functionality, built using Tailwind CSS and React.
-
- app/__init__.py: Initializes the Flask application with CORS support.
- app/ocr.py: Implements multi-engine OCR (Tesseract, EasyOCR, TrOCR) with preprocessing for code detection.
- app/routes.py: Defines the /process API endpoint for OCR and translation requests.
- app/translate.py: Utilizes the M2M100 model for multilingual translation.
- app/main.py: Serves as the entry point to launch the Flask server.
- Dependencies managed via requirements.txt.
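
To make the backend layout above more concrete, here is a minimal sketch of how a Flask application with CORS and a POST /process route could be wired together. It is not the project's actual code: the payload fields (`mode`, `text`, `target_lang`), the helper `handle_process`, and the factory `create_app` are hypothetical names chosen for illustration, and the real handler would call into the OCR and translation modules instead of echoing the text.

```python
from typing import Any, Dict, Tuple

# Hypothetical request modes; the real /process contract may differ.
VALID_MODES = {"ocr", "translate"}


def handle_process(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], int]:
    """Validate a /process payload and return (JSON body, HTTP status)."""
    mode = payload.get("mode")
    if mode not in VALID_MODES:
        return {"error": "mode must be 'ocr' or 'translate'"}, 400
    if mode == "translate":
        text = payload.get("text")
        if not isinstance(text, str) or not text.strip():
            return {"error": "text is required for translation"}, 400
        if not payload.get("target_lang"):
            return {"error": "target_lang is required for translation"}, 400
        # Placeholder: a real handler would call the translation module here.
        return {"mode": mode, "result": text.strip()}, 200
    # Placeholder: a real handler would run OCR on the uploaded file here.
    return {"mode": mode, "result": ""}, 200


def create_app():
    """Application factory: Flask app with CORS and a POST /process route."""
    # Imported lazily so the validation logic can be used without Flask installed.
    from flask import Flask, jsonify, request
    from flask_cors import CORS

    app = Flask(__name__)
    CORS(app)  # allow the React dev server on port 3000 to call the API

    @app.route("/process", methods=["POST"])
    def process():
        body, status = handle_process(request.get_json(silent=True) or {})
        return jsonify(body), status

    return app
```

Keeping validation in a plain function, separate from the Flask route, makes it possible to unit-test the request rules without starting a server.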
-
- public/index.html: Main HTML structure with JavaScript integration.
- src/components/Form.js: Handles text input, file uploads, and image pasting.
- src/components/Popup.js: Displays results with copy-to-clipboard functionality.
- src/components/ThemeSwitch.js: Manages light/dark theme toggling.
- src/App.js: Orchestrates the application layout with particle effects.
- Custom styling in index.css with Tailwind configuration in tailwind.config.js.
-
- Tesseract OCR: An open-source OCR engine integrated for initial text extraction, enhanced with custom preprocessing to detect code snippets. Supports multiple languages, making it a cornerstone for multilingual functionality.
- M2M100: A Facebook-developed Transformer model for translation across 100 languages, showcasing advanced NLP capabilities in the backend.
- TrOCR: A Transformer-based OCR model that improves accuracy on complex layouts, complementing Tesseract and EasyOCR for robust text extraction.
- Tailwind CSS: Powers the frontend's responsive design, including glassmorphism and neon glow effects, reflecting modern UI/UX expertise.
- Status: Actively developed as of April 10, 2025.
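
As an illustration of the M2M100 item above, the following sketch shows the usual way to call the model through the Hugging Face Transformers library. The checkpoint name `facebook/m2m100_418M` and the function name `translate` are assumptions; this Wiki does not state which checkpoint or wrapper the project uses. Running it requires the `transformers`, `sentencepiece`, and `torch` packages and downloads the model on first use.

```python
def translate(text: str, src_lang: str, tgt_lang: str,
              model_name: str = "facebook/m2m100_418M") -> str:
    """Translate text between two M2M100 language codes (e.g. 'fr' -> 'en')."""
    if not text.strip():
        return ""  # nothing to translate, so skip loading the model

    # Imported lazily because loading Transformers is slow.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)

    tokenizer.src_lang = src_lang  # tell the tokenizer the source language
    encoded = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **encoded,
        # Force the decoder to start in the target language.
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

A call such as `translate("Bonjour le monde", "fr", "en")` would return the English translation. In a long-running server the tokenizer and model would normally be loaded once at startup rather than on every call.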
- To set up the OCR Reader & Translator locally, follow these detailed steps:
-
- Node.js and npm: Required for the frontend (install via sudo apt install nodejs npm on Ubuntu).
- Python 3.8+ and pip: Necessary for the backend (install via sudo apt install python3-pip).
- Git: For cloning the repository (sudo apt install git).
- Tesseract OCR:
  - Install on Ubuntu with sudo apt update followed by sudo apt install tesseract-ocr libtesseract-dev.
  - Verify with tesseract --version.
  - For additional languages (e.g., Hindi), use sudo apt install tesseract-ocr-hin (replace with the appropriate language code).
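
Before moving on, it can help to confirm that the prerequisites listed above are actually on the PATH. The script below is a small sketch that uses only the Python standard library; the executable names in `REQUIRED_TOOLS` are assumptions based on a typical Ubuntu install and may need adjusting on other systems.

```python
import shutil
from typing import Dict, Iterable, Optional

# Assumed executable names for the prerequisites on Ubuntu.
REQUIRED_TOOLS = ("node", "npm", "python3", "pip3", "git", "tesseract")


def find_tools(names: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        status = f"OK ({path})" if path else "MISSING"
        print(f"{name:10} {status}")
```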
-
- Clone the repository: git clone https://github.com/your-username/ocr-reader.git and cd ocr-reader.
. - Set up the backend: cd backend, pip install -r requirements.txt, and python main.py (runs at http://0.0.0.0:5000).
- Set up the frontend: cd ../frontend, npm install, and npm start (runs at http://localhost:3000/).
- Access the application at http://localhost:3000/.
-
- If Tesseract fails, ensure the path is set in backend/app/ocr.py (e.g., pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract').
- Check port availability (5000 for backend, 3000 for frontend).
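
One way to check port availability is to attempt a TCP connection: if nothing accepts it, the port is free. The helper below is a standard-library sketch; the function name `port_is_free` is illustrative and not part of the project.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success, an error code otherwise.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    for name, port in (("backend", 5000), ("frontend", 3000)):
        state = "free" if port_is_free(port) else "in use"
        print(f"{name} port {port}: {state}")
```

Run it before starting the servers: a port reported as "in use" means another process must be stopped first, or the application must be configured to use a different port.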
-
To maximize the utility of the OCR Reader & Translator:
- Launch the app at http://localhost:3000/.
- Input text manually, upload an image or PDF, or paste an image (Ctrl+V or drag-and-drop).
- Select OCR Extract to extract text or Translate to convert to another language.
- Choose source and target languages (e.g., auto-detect to English) if translating.
- Click Process to view results in a popup; use the Copy button to copy the output.
- Clear the form with the Clear button to start anew.
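
The same workflow can in principle be driven without the browser by posting to the backend's /process endpoint. The sketch below shows what such a request might look like using only the standard library. The JSON field names (`mode`, `text`, `source_lang`, `target_lang`) are assumptions, since the API is not yet documented in this Wiki; check app/routes.py for the actual contract.

```python
import json
import urllib.request

API_URL = "http://localhost:5000/process"  # backend address from the setup steps


def build_request(text: str, target_lang: str,
                  source_lang: str = "auto") -> urllib.request.Request:
    """Build a JSON POST request for a translation job (field names assumed)."""
    payload = {
        "mode": "translate",
        "text": text,
        "source_lang": source_lang,
        "target_lang": target_lang,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (backend must be running)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `send(build_request("Bonjour", "en"))` would return whatever JSON the endpoint produces.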
-
- Q: Why is OCR not detecting text?
  - A: Ensure Tesseract is installed and the image is clear; adjust preprocessing in ocr.py if needed.
- Q: How do I support a new language?
  - A: Install the corresponding Tesseract language pack (e.g., tesseract-ocr-fra) and update translate.py with the language code.
- Q: Connection errors between frontend and backend?
  - A: Verify both are running and ports (5000, 3000) are open.
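
When adding a language, note that Tesseract and M2M100 name languages differently: Tesseract language packs use three-letter codes (fra, hin), while M2M100 mostly uses two-letter codes (fr, hi). The sketch below shows one possible way to keep the two in sync. The table and helper names are hypothetical and may not match how translate.py is organized.

```python
from typing import Dict

# Tesseract language pack code -> M2M100 language code (illustrative subset).
TESSERACT_TO_M2M100: Dict[str, str] = {
    "eng": "en",
    "fra": "fr",
    "hin": "hi",
    "deu": "de",
    "spa": "es",
}


def register_language(tesseract_code: str, m2m100_code: str,
                      table: Dict[str, str] = TESSERACT_TO_M2M100) -> None:
    """Add a new language pair to the lookup table."""
    table[tesseract_code.lower()] = m2m100_code.lower()


def to_m2m100(tesseract_code: str,
              table: Dict[str, str] = TESSERACT_TO_M2M100) -> str:
    """Return the M2M100 code for a Tesseract code, or raise ValueError."""
    try:
        return table[tesseract_code.lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {tesseract_code!r}") from None
```

For example, after installing tesseract-ocr-ita, calling `register_language("ita", "it")` makes `to_m2m100("ita")` return `"it"`.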
-
- Expand language support with additional Tesseract packs and M2M100 models.
- Enhance code detection accuracy with machine learning optimizations.
- Introduce batch processing for multiple files.
- Updates will be documented here as they progress.
- Note: This Wiki will expand with API documentation, code examples, and deployment guides in future updates to provide an even more comprehensive resource.
-
For inquiries or collaboration:
- Author: Deoansh Deo
- Email: deoanshdeo@gmail.com
- LinkedIn: https://www.linkedin.com/in/deoansh-deo-b04569224
-
This project benefits from the open-source community, particularly libraries like React, Flask, Transformers, and Tailwind CSS. It was inspired by the demand for efficient, multilingual document processing solutions.