A TensorFlow/Keras implementation of a deep CNN-BiLSTM-CTC model for recognising printed Odia script from line-level images. Trained on ~4 lakh (about 400,000) line samples, the model targets a Character Error Rate (CER) below 2% by epoch 30–40.

    Input (H=64, W=variable, C=1 grayscale)
          │
    ┌─────▼──────────────────────────────────────────────────────┐
    │ CNN Block 0 : Conv2D(64)  → BN → ReLU → MaxPool(2,2)       │  H→H/2,  W→W/2
    │ CNN Block 1 : Conv2D(128) → BN → ReLU → MaxPool(2,2)       │  H→H/4,  W→W/4
    │ CNN Block 2 : Conv2D(256) → BN → ReLU → MaxPool(2,1)       │  H→H/8,  W unchanged
    │ CNN Block 3 : Conv2D(256) → BN → ReLU → MaxPool(2,1)       │  H→H/16, W unchanged
    └────────────────────────────────────────────────────────────┘
          │  shape: (B, 4, W/4, 256)
    Permute + Reshape → (B, T=W/4, 1024)   [feature per time-step]
          │
    ┌─────▼──────────────────────────────────────────────────────┐
    │ BiLSTM 0 : 256 fwd + 256 bwd → 512                         │
    │ Dropout 0.3                                                │
    │ BiLSTM 1 : 256 fwd + 256 bwd → 512                         │
    │ Dropout 0.3                                                │
    │ BiLSTM 2 : 256 fwd + 256 bwd → 512                         │
    └────────────────────────────────────────────────────────────┘
          │
    Dense(num_classes + 1)   [+1 for CTC blank at index 0]
          │
    CTC Loss / Greedy Decode

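The shape bookkeeping in the diagram can be checked with a few lines of plain Python. This is only a sketch: the function name `trace_shapes` is hypothetical, and it assumes 'same'-padded convolutions so that only the pooling layers change H and W.

```python
def trace_shapes(height: int = 64, width: int = 512) -> dict:
    """Follow (H, W, C) through the four CNN blocks listed in the diagram."""
    # (filters, pool_h, pool_w) for CNN blocks 0..3
    blocks = [(64, 2, 2), (128, 2, 2), (256, 2, 1), (256, 2, 1)]
    h, w, c = height, width, 1
    for filters, pool_h, pool_w in blocks:
        # Assumption: the convolution keeps H and W; pooling divides them.
        h, w, c = h // pool_h, w // pool_w, filters
    # Permute + Reshape: W becomes the time axis, H and C merge into features.
    return {"cnn_out": (h, w, c), "time_steps": w, "features": h * c}


print(trace_shapes(64, 512))
# {'cnn_out': (4, 128, 256), 'time_steps': 128, 'features': 1024}
```

A 512-pixel-wide line therefore yields 128 time steps. CTC needs at least as many time steps as labels, so the transcription of such a line must fit within 128 characters.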
Parameter count (approx): ~11 M total (CNN ~1.2 M · BiLSTM stack ~9.5 M · Dense ~0.1 M)
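These figures can be cross-checked by hand from the layer sizes in the diagram. The sketch below assumes 3×3 convolution kernels, ignores BatchNorm parameters, and assumes 250 output classes; none of these are stated in this README. Under those assumptions the BiLSTM stack comes to about 5.8 M parameters rather than ~9.5 M, so the quoted numbers are best confirmed with `model.summary()` on the actual model.

```python
def conv2d_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights plus biases of a k x k convolution (kernel size is an assumption)."""
    return k * k * c_in * c_out + c_out


def lstm_params(input_dim: int, units: int) -> int:
    """Four gates, each with an input kernel, a recurrent kernel and a bias."""
    return 4 * (units * (input_dim + units) + units)


def bilstm_params(input_dim: int, units: int) -> int:
    """Forward and backward LSTMs have identical sizes."""
    return 2 * lstm_params(input_dim, units)


cnn = sum(conv2d_params(i, o) for i, o in [(1, 64), (64, 128), (128, 256), (256, 256)])
rnn = bilstm_params(1024, 256) + bilstm_params(512, 256) + bilstm_params(512, 256)
num_classes = 250                       # assumed charset size
dense = 512 * (num_classes + 1) + (num_classes + 1)

print(f"CNN    : {cnn:,}")              # 959,744
print(f"BiLSTM : {rnn:,}")              # 5,773,312
print(f"Dense  : {dense:,}")            # 128,763
print(f"Total  : {cnn + rnn + dense:,}")  # 6,861,819
```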

    odia-line-ocr/
    ├── config.yaml                         # All hyperparameters & paths
    ├── train.py                            # Main training script
    ├── evaluate.py                         # Test-set evaluation (CER / WER)
    ├── infer.py                            # Single image / directory inference
    ├── compute_cer_&_wer.py                # Standalone CER/WER computation
    ├── image_preprocessing_VIT.py          # Image preprocessing utilities (ViT-style)
    ├── image_size.py                       # Image size analysis
    ├── map_stats.py                        # Mapping file statistics
    ├── sorting_new_map.py                  # Sort/clean mapping file
    ├── create_final_mapping.py             # Build final mapping from raw data
    ├── Line_extraction_using_teseract.py   # Tesseract-based line extraction
    ├── requirements.txt                    # pip dependencies
    ├── environment.yml                     # Conda environment (recommended)
    ├── data/
    │   ├── dataset.py                      # tf.data pipeline, preprocessing & charset utils
    │   └── charset.txt                     # Auto-generated Odia Unicode charset
    ├── models/
    │   └── cnn_bilstm_ctc.py               # Model architecture + CTCModel class
    └── utils/
        ├── build_charset.py                # One-time charset builder
        ├── lr_schedule.py                  # Warmup + cosine annealing
        └── metrics.py                      # CER / WER via edit distance


    conda env create -f environment.yml
    conda activate odia_ocr

Requires CUDA 11.8 and cuDNN 8.6.

    python -m venv venv
    source venv/bin/activate   # Windows: venv\Scripts\activate
    pip install -r requirements.txt

Verify GPU:

    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
    # Expected: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Set the dataset paths in config.yaml:

    dataset:
      mapping_file: "output/map_sorted.txt"
      image_root: "output"

Build the charset. This scans all ground-truth lines and writes data/charset.txt:

    python utils/build_charset.py --config config.yaml

Expected output:

    Total lines   : ~400,000
    Unique chars  : ~200–250  (Odia Unicode + punctuation + digits)
    Charset saved : data/charset.txt

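The charset step amounts to collecting every distinct character from the text column of the mapping file. The sketch below illustrates the idea only; `build_charset` is a hypothetical name, and the real `utils/build_charset.py` may filter or order characters differently. It counts Unicode code points, so a conjunct contributes its component consonants and the virama separately.

```python
import io


def build_charset(lines) -> list:
    """Return the sorted list of distinct characters in the text column."""
    chars = set()
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip rows that lack the path<TAB>text layout
        _, text = line.split("\t", 1)
        chars.update(text)
    return sorted(chars)


# Two made-up mapping rows (paths are placeholders, not real dataset files).
sample = io.StringIO("a/line_1.png\tଭାରତ\nb/line_2.png\tତାରା\n")
charset = build_charset(sample)
print(len(charset), charset)
```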
Start training:

    python train.py --config config.yaml

Resume from a checkpoint:

    python train.py --config config.yaml --resume checkpoints/best_model.keras

Monitor with TensorBoard:

    tensorboard --logdir logs/

Evaluate on the test set (CER / WER):

    python evaluate.py --config config.yaml --checkpoint checkpoints/best_model.keras

Single image:

    python infer.py --image /path/to/line.png --checkpoint checkpoints/best_model.keras

Whole directory:

    python infer.py --dir /path/to/lines/ --checkpoint checkpoints/best_model.keras > results.tsv

Each line of the mapping file is tab-separated:

    OdiaLineLevelDataSet/40_op/40_0001-003/40_0001-003_line_1.png	ସମାଦ :

- Column 1: relative image path (joined with `image_root`)
- Column 2: ground-truth Odia text (UTF-8)
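Reading one such row could look like the sketch below. `parse_mapping_line` is a hypothetical helper, not the function used in `data/dataset.py`, and the sample path is a placeholder.

```python
from pathlib import Path


def parse_mapping_line(line: str, image_root: str):
    """Split 'relative/path<TAB>text' and resolve the path against image_root."""
    rel_path, text = line.rstrip("\n").split("\t", 1)
    return Path(image_root) / rel_path, text


path, text = parse_mapping_line("lines/page_1_line_1.png\tଭାରତ\n", "output")
print(path.as_posix(), text)
```

Splitting on the first tab only keeps any later tab characters inside the ground-truth text.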
| Parameter | Default | Notes |
|---|---|---|
| `target_height` | 64 | Fixed height in pixels |
| `max_width` | 2048 | Lines wider than this are centre-cropped |
| `batch_size` | 16 | Reduce to 8 if VRAM < 8 GB |
| `epochs` | 100 | Early stopping patience = 10 |
| `initial_lr` | 1e-3 | Warmup + cosine annealing |
| `lstm_units` | [256, 256, 256] | Units per direction (output = 2×) |
| `mixed_precision` | true | fp16; requires CUDA GPU |
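The "warmup + cosine annealing" schedule attached to `initial_lr` can be sketched as a plain function of the step index. This is one common formulation, not necessarily what `utils/lr_schedule.py` implements; `warmup_steps`, `total_steps` and `min_lr` are assumed parameters that do not appear in the table.

```python
import math


def warmup_cosine_lr(step: int, total_steps: int, warmup_steps: int,
                     initial_lr: float = 1e-3, min_lr: float = 0.0) -> float:
    """Linear warmup to initial_lr, then cosine decay towards min_lr."""
    if step < warmup_steps:
        # Ramp linearly so the peak rate is reached on the last warmup step.
        return initial_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (initial_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for s in (0, 99, 100, 550, 1000):
    print(s, f"{warmup_cosine_lr(s, total_steps=1000, warmup_steps=100):.6f}")
```

The rate rises during the first 100 steps, peaks at `initial_lr`, passes through half of it midway through the decay phase, and reaches `min_lr` at the final step.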
| Situation | Recommendation |
|---|---|
| VRAM < 8 GB | Set batch_size: 8 in config |
| Training loss NaN | Lower initial_lr to 5e-4 |
| CER plateaus early | Enable dilation_erosion augmentation |
| Very long lines (>1800 px) | Increase max_width to 2560 |
| Want faster convergence | Increase lstm_units to [512, 512, 256] |
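The `max_width` advice follows from how a fixed-height resize interacts with centre-cropping. The sketch below shows the arithmetic only; `resize_plan` is a hypothetical helper, and the actual preprocessing in `data/dataset.py` may round or pad differently.

```python
def resize_plan(orig_h: int, orig_w: int,
                target_height: int = 64, max_width: int = 2048):
    """Return (scaled_width, crop_left, crop_right) for one line image."""
    # Scale the width by the same factor that brings the height to target_height.
    scaled_w = max(1, round(orig_w * target_height / orig_h))
    if scaled_w <= max_width:
        return scaled_w, 0, scaled_w
    # Centre-crop: drop an equal number of columns from each side.
    left = (scaled_w - max_width) // 2
    return scaled_w, left, left + max_width


print(resize_plan(64, 2300))                  # (2300, 126, 2174)
print(resize_plan(64, 2300, max_width=2560))  # (2300, 0, 2300)
```

With the default limit, a 2300-pixel line loses 126 columns on the left and the same on the right while its label still describes the full line. Raising `max_width` to 2560 keeps the whole image.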
| Metric | Target |
|---|---|
| Character Error Rate (CER) | < 2% by epoch 30–40 |
| Char Accuracy | ~99% |
| Word Error Rate (WER) | < 5% |
These targets are based on a comparable Bengali architecture trained on 47K samples. With ~4 lakh Odia samples plus augmentation, results may exceed them.
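Both metrics are edit distances normalised by reference length: CER counts character edits, WER counts word edits. The project lists the `editdistance` package for this; the sketch below uses a small pure-Python Levenshtein routine instead so that it runs without dependencies. It treats each Unicode code point as one character, which is an assumption about how `utils/metrics.py` counts.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps) -> float:
    """Total character edits divided by total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total if total else 0.0


def wer(refs, hyps) -> float:
    """Total word edits divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    return errors / total if total else 0.0


print(cer(["abcd"], ["abxd"]), wer(["a b c d"], ["a x c d"]))  # 0.25 0.25
```

Summing edits over the whole test set before dividing weights long lines more heavily than averaging per-line rates would.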
- Fixed height (64 px), variable width: avoids distorting Odia conjuncts
- Grayscale conversion: cuts input memory by 3× relative to RGB; colour carries no extra signal for printed text
- CTC blank = index 0: consistent with `tf.nn.ctc_loss(blank_index=0)`
- W-halving only in the first 2 pool layers: preserves the fine horizontal resolution needed for dense Odia conjunct sequences
- Warmup + cosine decay: stable training from scratch on large datasets
- Mixed precision (fp16): ~1.8× throughput on Ampere/Turing GPUs
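Placing the blank at index 0 shifts every real character up by one in the output layer, which matters when decoding. The following is a minimal sketch of greedy CTC decoding under that convention: take the arg-max class per time step, merge consecutive repeats, then drop blanks. The function name and the rule that class `k` maps to `charset[k - 1]` are assumptions, not taken from the project code.

```python
def ctc_greedy_decode(logits, charset, blank: int = 0) -> str:
    """logits: T rows of (num_classes + 1) scores; charset: list of characters."""
    # Best class per time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in logits]
    out = []
    prev = blank
    for idx in best:
        # Emit only when the class changes and is not the blank.
        if idx != blank and idx != prev:
            out.append(charset[idx - 1])  # class 0 is blank, so shift by one
        prev = idx
    return "".join(out)


charset = ["କ", "ମ", "ଳ"]
path = [1, 1, 0, 2, 2, 0, 3]  # arg-max classes we want the toy logits to produce
logits = [[1.0 if c == k else 0.0 for c in range(len(charset) + 1)] for k in path]
print(ctc_greedy_decode(logits, charset))  # କମଳ
```

A blank between two identical classes is what allows a doubled character to survive the merge step.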
- Python 3.10
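- TensorFlow 2.15 (with CUDA support; official 2.15 builds target CUDA 12.2 and cuDNN 8.9, so check this against the CUDA 11.8 / cuDNN 8.6 requirement given for the Conda environment)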
- OpenCV 4.8
- Albumentations 1.3
- pyctcdecode
- editdistance
See requirements.txt or environment.yml for the full pinned list.
This project is released for research and educational purposes. Please cite appropriately if used in academic work.
- Architecture inspired by CRNN (Shi et al., 2016) adapted for Indic scripts
- Dataset: Odia line-level images from CDAC
- Odia Unicode support via standard UTF-8 encoding