[Help] Trouble fine-tuning PaddleOCR for Vietnamese OCR #16624

tttiuem2k3 · 2025-10-09T07:24:34Z

I’m developing an OCR system for Vietnamese invoices and documents.
My goal is to fine-tune the recognition (rec) model in PaddleOCR so it can accurately read Vietnamese text (with tone marks, diacritics, and spaces).

Problem

After training for 15 epochs, the model accuracy stays almost zero (acc ≈ 0.00), and norm_edit_distance only improves slightly (~0.05–0.09).
Loss decreases normally, so the training loop works — but the recognizer doesn’t learn.
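
For anyone reading along, here is a minimal sketch of how these two numbers are commonly defined, which may help explain why acc can sit at zero while norm_edit_distance moves a little. This is an assumption about the usual definitions (exact string match, and one minus the length-normalized edit distance), not a copy of PaddleOCR's own metric code; the function names are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete from a
                            curr[j - 1] + 1,     # insert into a
                            prev[j - 1] + cost)) # substitute or match
        prev = curr
    return prev[-1]


def rec_metrics(pairs):
    """Return (exact-match accuracy, mean normalized edit similarity).

    `pairs` is a list of (prediction, ground_truth) strings.
    """
    exact = 0
    similarity = 0.0
    for pred, target in pairs:
        exact += int(pred == target)
        longest = max(len(pred), len(target), 1)
        similarity += 1.0 - levenshtein(pred, target) / longest
    count = max(len(pairs), 1)
    return exact / count, similarity / count


# Predictions that drop every diacritic never match exactly,
# yet still share most base letters with the ground truth.
acc, sim = rec_metrics([("hoa don", "hóa đơn"), ("tong", "tổng")])
print(f"acc={acc:.2f} similarity={sim:.2f}")
```

Under these definitions a single wrong tone mark makes the whole word count as a miss for acc, so a near-zero acc with a low but nonzero similarity means the predictions are mostly wrong, not just missing accents.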

I suspect the issue is a mismatch between the recognition model and my Vietnamese character set (diacritics are likely OOV).

What I’ve tried

Used latin_PP-OCRv3_mobile_rec.yml with:
- Global.character_type: latin
- Character.character_dict_path: ./data/vietnamese/vi_vietnam.txt

My vi_vietnam.txt dictionary includes all Vietnamese letters with diacritics, in both lowercase and uppercase:
  "àáạảãăằắặẳẵâầấậẩẫèéẹẻẽêềếệểễ"
  "ìíịỉĩòóọỏõôồốộổỗơờớợởỡùúụủũ"
  "ưừứựửữỳýỵỷỹđ"
  "ÀÁẠẢÃĂẰẮẶẲẴÂẦẤẬẨẪÈÉẸẺẼÊỀẾỆỂỄ"
  "ÌÍỊỈĨÒÓỌỎÕÔỒỐỘỔỖƠỜỚỢỞỠÙÚỤỦŨ"
  "ƯỪỨỰỬỮỲÝỴỶỸĐ"

Dataset: ~25k cropped word images (train_list.txt / val_list.txt format).
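
A sketch of how a dictionary file like this could be generated instead of typed by hand, assuming the one-character-per-line format that PaddleOCR dictionary files use. The helper names and the choice to include ASCII digits, letters and punctuation are my own assumptions; the space character is left out on the assumption that it is handled by the use_space_char option.

```python
import string
import unicodedata

# Lowercase Vietnamese letters with diacritics, as listed in the post.
VI_LOWER = (
    "àáạảãăằắặẳẵâầấậẩẫèéẹẻẽêềếệểễ"
    "ìíịỉĩòóọỏõôồốộổỗơờớợởỡùúụủũ"
    "ưừứựửữỳýỵỷỹđ"
)


def build_charset(extra_lower: str) -> list:
    """Return unique NFC characters: ASCII basics plus both cases of extra_lower."""
    candidates = (string.digits + string.ascii_letters + string.punctuation
                  + extra_lower + extra_lower.upper())
    charset, seen = [], set()
    # NFC guarantees each accented letter is one precomposed code point.
    for ch in unicodedata.normalize("NFC", candidates):
        if ch.isspace() or ch in seen:
            continue
        seen.add(ch)
        charset.append(ch)
    return charset


def write_dict(path: str, charset: list) -> None:
    """Write one character per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(charset) + "\n")


charset = build_charset(VI_LOWER)
# write_dict("./data/vietnamese/vi_vietnam.txt", charset)
```

Generating the file this way rules out duplicate lines and decomposed (NFD) entries, both of which are easy to introduce when pasting accented text.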

Training command:
python3 tools/train.py \
  -c configs/rec/PP-OCRv3/multi_language/latin_PP-OCRv3_mobile_rec.yml \
  -o Global.use_gpu=True \
     Global.epoch_num=15 \
     Global.save_model_dir=./model/ppocrv3_vi \
     Global.character_type=latin \
     Global.character_dict_path=./data/vietnamese/vi_vietnam.txt

Still, accuracy remains near zero and Vietnamese diacritics are ignored or misread.

Questions / Need Advice

Which recognition model is best for Vietnamese?
- Should I use PP-OCRv5 multilingual, PP-OCRv5 server, or SVTR instead of the v3 Latin model?
- Is there a pretrained checkpoint known to work better for tonal or Unicode languages?

YAML configuration for Vietnamese dictionary
- Should I set Character.character_type to "ch" to ensure PaddleOCR actually loads my vi_dict.txt?
- Which parameters are essential for Vietnamese invoices?
  - Character.character_dict_path: ./dict/vi_dict.txt
  - Character.use_space_char: True
  - Character.max_text_length: 60
- Is there an example YAML file for training a custom language recognizer?

Training settings

Recommended epoch count for convergence (40–500 epochs?)
Suggested data augmentation for printed invoice text (should I disable heavy distortions)?
How to check OOV or Unicode normalization (NFC) issues before training?
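
On the last question, a minimal sketch of the kind of pre-training check I have in mind, assuming each label line is `image_path<TAB>text` and the dictionary holds one character per line. The function names are hypothetical, and whether a space counts as in-vocabulary is exposed as a flag because it depends on the use_space_char setting.

```python
import unicodedata
from collections import Counter


def load_charset(dict_path: str) -> set:
    """Read a one-character-per-line dictionary file into a set."""
    with open(dict_path, encoding="utf-8") as f:
        return {line.rstrip("\r\n") for line in f if line.rstrip("\r\n")}


def audit_lines(lines, charset, allow_space=True):
    """Count OOV characters and labels that are not NFC-normalized.

    Returns (oov_counter, non_nfc_label_count, malformed_line_count).
    """
    oov = Counter()
    non_nfc = 0
    malformed = 0
    for line in lines:
        parts = line.rstrip("\r\n").split("\t", 1)
        if len(parts) != 2:
            malformed += 1
            continue
        text = parts[1]
        if text != unicodedata.normalize("NFC", text):
            non_nfc += 1
        for ch in text:
            if ch == " " and allow_space:
                continue
            if ch not in charset:
                oov[ch] += 1
    return oov, non_nfc, malformed


def audit_file(label_path: str, dict_path: str):
    """Convenience wrapper that reads both files from disk."""
    with open(label_path, encoding="utf-8") as f:
        return audit_lines(f, load_charset(dict_path))


# Example: a decomposed "é" (e + combining acute) is both non-NFC and OOV.
sample = ["a.jpg\the\u0301", "b.jpg\th\u00e9"]
oov, non_nfc, malformed = audit_lines(sample, {"h", "\u00e9"})
print(dict(oov), non_nfc, malformed)
```

If the non-NFC count is nonzero, normalizing the label files with `unicodedata.normalize("NFC", ...)` before training would be the obvious next step; combining marks such as U+0301 showing up in the OOV counter are a strong hint that this is the problem.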

Goal

I want to build a reliable Vietnamese OCR recognizer for invoices/forms using PaddleOCR —
able to recognize diacritics, spaces, and case-sensitive text correctly.

If anyone has a working example config or fine-tuned model for Vietnamese, I would really appreciate your guidance or YAML reference.

Thank you so much for your time and help 🙏

liuhongen1234567 (Collaborator) · 2025-10-10T02:57:15Z

Hello, you might want to try the PP-OCRv5 Latin model. The configuration file is as follows: https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/rec/PP-OCRv5/multi_language/latin_PP-OCRv5_mobile_rec.yml

You can download the training weights from the PaddleOCR multilingual recognition documentation: https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/OCR.html#1-ocr-pipeline-introduction

1 reply

tttiuem2k3 Oct 10, 2025
Author

Thanks so much for your feedback!
I’m currently trying to pretrain the latin_PP-OCRv5_mobile_rec model, but I’ve encountered an out-of-memory issue when running it on a 16 GB P100 GPU (Kaggle environment).
I’d like to ask for some advice regarding hardware requirements for pretraining this model from scratch, as well as possible alternatives if my hardware is limited.

Specifically:

I’m referring to this configuration file: 👉 latin_PP-OCRv5_mobile_rec.yml
As far as I know, PP-OCRv5 provides both mobile and server versions.

I’d like to know:

- For pretraining from scratch, what would be the minimum and recommended hardware setup (GPU, CPU, RAM, storage) needed to train this model stably with a large dataset (millions of images)?
- If the hardware is insufficient (e.g., only 1 GPU with 16 GB VRAM), are there ways to:
  - Fine-tune the existing pretrained model for a specific domain (e.g., invoices, packaging, noisy text)?
  - Reduce GPU memory usage by tuning .yml parameters such as batch size, gradient accumulation, AMP, etc.?
- What kind of dataset preparation (format, data volume) would be effective for fine-tuning in such cases?

I’d really appreciate any guidance or practical experience regarding:

Recommended hardware configurations for full pretraining
Optimization strategies for small GPU environments
Best practices for fine-tuning PP-OCRv5 effectively
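
To make the small-GPU question concrete, this is roughly the kind of override set I am thinking of trying, written as a sketch that only assembles the `tools/train.py` command. The option names (`Global.pretrained_model`, `Global.use_amp`, `Train.loader.batch_size_per_card`, `Train.sampler.first_bs`) are what I believe the configs use, but they are assumptions to verify against the actual YAML, and the checkpoint path and batch sizes are placeholders.

```python
import shlex


def build_train_command(config_path: str, overrides: dict) -> list:
    """Assemble a tools/train.py invocation with -o key=value overrides."""
    cmd = ["python3", "tools/train.py", "-c", config_path, "-o"]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


# Assumed option names; check each against the YAML before relying on it.
low_memory_overrides = {
    # Hypothetical path to downloaded pretrained weights (fine-tune, not scratch).
    "Global.pretrained_model": "./pretrained/latin_PP-OCRv5_mobile_rec",
    # Mixed precision, assumed to reduce activation memory.
    "Global.use_amp": True,
    # Smaller batches; placeholder values, not tuned recommendations.
    "Train.loader.batch_size_per_card": 32,
    # Only relevant if the config defines a multi-scale sampler; drop otherwise.
    "Train.sampler.first_bs": 32,
}

command = build_train_command(
    "configs/rec/PP-OCRv5/multi_language/latin_PP-OCRv5_mobile_rec.yml",
    low_memory_overrides,
)
print(shlex.join(command))
```

If the batch size is cut substantially, the learning rate probably needs to be reduced as well, but I do not know the right scaling for this model.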

Thank you very much! 🙏
