Skip to content

Udayps2303/Line-Level-OCR-using-CNN-BiLSTM-CTC-loss

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

5 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๐Ÿ”ค Odia Line-Level OCR โ€” CNN-BiLSTM-CTC

A TensorFlow/Keras implementation of a deep CNN-BiLSTM-CTC model for recognising printed Odia script from line-level images. Trained on ~4 lakh line samples, the model achieves a Character Error Rate (CER) < 2% by epoch 30โ€“40.


๐Ÿ“ Architecture

Input (H=64, W=variable, C=1 grayscale)
        โ”‚
  โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚  CNN Block 0 : Conv2D(64)  โ†’ BN โ†’ ReLU โ†’ MaxPool(2,2)     โ”‚  Hโ†’H/2,  Wโ†’W/2
  โ”‚  CNN Block 1 : Conv2D(128) โ†’ BN โ†’ ReLU โ†’ MaxPool(2,2)     โ”‚  Hโ†’H/4,  Wโ†’W/4
  โ”‚  CNN Block 2 : Conv2D(256) โ†’ BN โ†’ ReLU โ†’ MaxPool(2,1)     โ”‚  Hโ†’H/8,  W unchanged
  โ”‚  CNN Block 3 : Conv2D(256) โ†’ BN โ†’ ReLU โ†’ MaxPool(2,1)     โ”‚  Hโ†’H/16, W unchanged
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
        โ”‚  shape: (B, 4, W/4, 256)
  Permute + Reshape โ†’ (B, T=W/4, 1024)   [feature per time-step]
        โ”‚
  โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚  BiLSTM 0 : 256 fwd + 256 bwd โ†’ 512                       โ”‚
  โ”‚  Dropout 0.3                                               โ”‚
  โ”‚  BiLSTM 1 : 256 fwd + 256 bwd โ†’ 512                       โ”‚
  โ”‚  Dropout 0.3                                               โ”‚
  โ”‚  BiLSTM 2 : 256 fwd + 256 bwd โ†’ 512                       โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
        โ”‚
  Dense(num_classes + 1)    [+1 for CTC blank at index 0]
        โ”‚
  CTC Loss / Greedy Decode

Parameter count (approx): ~11 M total (CNN ~1.2 M ยท BiLSTM stack ~9.5 M ยท Dense ~0.1 M)


๐Ÿ“ Project Structure

odia-line-ocr/
โ”œโ”€โ”€ config.yaml                   # All hyperparameters & paths
โ”œโ”€โ”€ train.py                      # Main training script
โ”œโ”€โ”€ evaluate.py                   # Test-set evaluation (CER / WER)
โ”œโ”€โ”€ infer.py                      # Single image / directory inference
โ”œโ”€โ”€ compute_cer_&_wer.py          # Standalone CER/WER computation
โ”œโ”€โ”€ image_preprocessing_VIT.py    # Image preprocessing utilities (ViT-style)
โ”œโ”€โ”€ image_size.py                 # Image size analysis
โ”œโ”€โ”€ map_stats.py                  # Mapping file statistics
โ”œโ”€โ”€ sorting_new_map.py            # Sort/clean mapping file
โ”œโ”€โ”€ create_final_mapping.py       # Build final mapping from raw data
โ”œโ”€โ”€ Line_extraction_using_teseract.py  # Tesseract-based line extraction
โ”œโ”€โ”€ requirements.txt              # pip dependencies
โ”œโ”€โ”€ environment.yml               # Conda environment (recommended)
โ”œโ”€โ”€ data/
โ”‚   โ”œโ”€โ”€ dataset.py                # tf.data pipeline, preprocessing & charset utils
โ”‚   โ””โ”€โ”€ charset.txt               # Auto-generated Odia Unicode charset
โ”œโ”€โ”€ models/
โ”‚   โ””โ”€โ”€ cnn_bilstm_ctc.py         # Model architecture + CTCModel class
โ””โ”€โ”€ utils/
    โ”œโ”€โ”€ build_charset.py          # One-time charset builder
    โ”œโ”€โ”€ lr_schedule.py            # Warmup + cosine annealing
    โ””โ”€โ”€ metrics.py                # CER / WER via edit distance

โš™๏ธ Environment Setup

Option A โ€” Conda (recommended)

conda env create -f environment.yml
conda activate odia_ocr

Option B โ€” pip + system CUDA

Requires CUDA 11.8 and cuDNN 8.6

python -m venv venv
source venv/bin/activate          # Windows: venv\Scripts\activate
pip install -r requirements.txt

Verify GPU:

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# Expected: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

๐Ÿš€ Usage

1. Configure paths in config.yaml

dataset:
  mapping_file: "output/map_sorted.txt"
  image_root:   "output"

2. Build charset (run once)

Scans all ground-truth lines and writes data/charset.txt:

python utils/build_charset.py --config config.yaml

Expected output:

Total lines   : ~400,000
Unique chars  : ~200โ€“250   (Odia Unicode + punctuation + digits)
Charset saved : data/charset.txt

3. Train

python train.py --config config.yaml

Resume from a checkpoint:

python train.py --config config.yaml --resume checkpoints/best_model.keras

Monitor with TensorBoard:

tensorboard --logdir logs/

4. Evaluate (CER / WER on test set)

python evaluate.py --config config.yaml --checkpoint checkpoints/best_model.keras

5. Inference

Single image:

python infer.py --image /path/to/line.png --checkpoint checkpoints/best_model.keras

Whole directory:

python infer.py --dir /path/to/lines/ --checkpoint checkpoints/best_model.keras > results.tsv

๐Ÿ—‚๏ธ Mapping File Format

Each line is tab-separated:

OdiaLineLevelDataSet/40_op/40_0001-003/40_0001-003_line_1.png	เฌธเฌฎเฌพเฌฆเฌ• :
  • Column 1: relative image path (joined with image_root)
  • Column 2: ground-truth Odia text (UTF-8)

๐ŸŽ›๏ธ Key Hyperparameters (config.yaml)

Parameter Default Notes
target_height 64 Fixed height in pixels
max_width 2048 Lines wider than this are centre-cropped
batch_size 16 Reduce to 8 if VRAM < 8 GB
epochs 100 Early stopping patience = 10
initial_lr 1e-3 Warmup + cosine annealing
lstm_units [256, 256, 256] Units per direction (output = 2ร—)
mixed_precision true fp16 โ€” requires CUDA GPU

๐Ÿ› ๏ธ Training Tips

Situation Recommendation
VRAM < 8 GB Set batch_size: 8 in config
Training loss NaN Lower initial_lr to 5e-4
CER plateaus early Enable dilation_erosion augmentation
Very long lines (>1800 px) Increase max_width to 2560
Want faster convergence Increase lstm_units to [512, 512, 256]

๐Ÿ“Š Expected Results

Metric Target
Character Error Rate (CER) < 2% by epoch 30โ€“40
Char Accuracy ~99%
Word Error Rate (WER) < 5%

Based on comparable Bengali architecture (47K samples). With 4-lakh Odia samples + augmentation, results can exceed these benchmarks.


๐Ÿ”‘ Key Design Decisions

  1. Fixed height (64px), variable width โ€” avoids distorting Odia conjuncts
  2. Grayscale conversion โ€” reduces memory by 3ร—; colour carries no extra signal for printed text
  3. CTC blank = index 0 โ€” consistent with tf.nn.ctc_loss(blank_index=0)
  4. W-halving only in first 2 pool layers โ€” preserves fine horizontal resolution needed for dense Odia conjunct sequences
  5. Warmup + cosine decay โ€” stable training from scratch on large datasets
  6. Mixed precision (fp16) โ€” ~1.8ร— throughput on Ampere/Turing GPUs

๐Ÿ“ฆ Dependencies

  • Python 3.10
  • TensorFlow 2.15 (with CUDA support)
  • OpenCV 4.8
  • Albumentations 1.3
  • pyctcdecode
  • editdistance

See requirements.txt or environment.yml for the full pinned list.


๐Ÿ“„ License

This project is released for research and educational purposes. Please cite appropriately if used in academic work.


๐Ÿ™ Acknowledgements

  • Architecture inspired by CRNN (Shi et al., 2016) adapted for Indic scripts
  • Dataset: Odia line-level images from CDAC
  • Odia Unicode support via standard UTF-8 encoding

Line-Level-OCR-using-CNN-BiLSTM-CTC-loss

About

CNN-BiLSTM-CTC model for printed Odia script recognition from line-level images, built with TensorFlow/Keras. Achieves CER < 2% on ~4 lakh line samples.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages