[Help] After training PP-OCRv5_server_rec with Vietnamese data, it no longer recognizes Japanese text #16817

tttiuem2k3 · 2025-10-24T06:26:42Z

Hello PaddleOCR team,

I need some help with an issue I encountered during training.
When I train the PP-OCRv5_server_rec model with Vietnamese data, it performs very well for Vietnamese recognition.
However, after training, the model forgets Japanese: it fails to recognize Japanese text correctly (characters are either misrecognized or displayed as squares).

Here’s what I did:

- Kept the original PP-OCRv5_server_rec architecture
- Used a custom Vietnamese dictionary (vi_dict.txt) during training
- Confirmed that the Japanese dictionary still exists and is complete

Despite that, Japanese OCR no longer works properly, while the original pretrained model could handle multiple languages (including Japanese) without any problem.

👉 My question is:
How can I train the model with Vietnamese data without losing the multilingual capability (especially for Japanese)?

Thank you very much for your help!

liuhongen1234567 · 2025-10-24T06:54:42Z
Collaborator

Hello, we recommend using the original PP-OCRv5_server_rec model to generate labels for some Japanese data, then adding that data to the training set for mixed training.
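
For readers who want to try the mixed-training suggestion, here is a minimal sketch of the data preparation, not an official PaddleOCR recipe. It assumes the common recognition data layout: a dictionary file with one character per line, and label files with one `image_path<TAB>text` sample per line. Apart from `vi_dict.txt`, every file name, function name, and the `secondary_ratio` parameter is hypothetical. The sketch merges two dictionaries (the first one listed keeps its character positions in the merged file), mixes a sample of Japanese lines into the Vietnamese label file, and reports label characters that the dictionary does not cover.

```python
import random
from pathlib import Path


def _read_lines(path):
    """Return the lines of a UTF-8 text file without trailing '\r'."""
    return [l.rstrip("\r") for l in Path(path).read_text(encoding="utf-8").split("\n")]


def merge_dicts(dict_paths, out_path):
    """Write the union of several one-character-per-line dictionaries.

    Characters keep first-seen order, so entries from the first dictionary
    stay at the same positions in the merged file.
    """
    seen = {}
    for path in dict_paths:
        for ch in _read_lines(path):
            if ch and ch not in seen:
                seen[ch] = None
    Path(out_path).write_text("\n".join(seen) + "\n", encoding="utf-8")
    return list(seen)


def mix_label_files(primary, secondary, out_path, secondary_ratio=0.3, seed=0):
    """Combine all primary samples with a random subset of secondary samples.

    secondary_ratio is relative to the number of primary samples
    (an illustrative default, not a tuned value).
    """
    rng = random.Random(seed)
    primary_lines = [l for l in _read_lines(primary) if l.strip()]
    secondary_lines = [l for l in _read_lines(secondary) if l.strip()]
    k = min(len(secondary_lines), int(len(primary_lines) * secondary_ratio))
    mixed = primary_lines + rng.sample(secondary_lines, k)
    rng.shuffle(mixed)
    Path(out_path).write_text("\n".join(mixed) + "\n", encoding="utf-8")
    return len(primary_lines), k


def uncovered_chars(label_path, dict_chars):
    """Return label characters (spaces ignored) that are not in the dictionary."""
    known = set(dict_chars)
    missing = set()
    for line in _read_lines(label_path):
        _, _, text = line.partition("\t")
        missing.update(ch for ch in text if ch != " " and ch not in known)
    return missing
```

A possible usage, with hypothetical paths: call `merge_dicts(["ja_dict.txt", "vi_dict.txt"], "merged_dict.txt")`, then `mix_label_files("vi_train.txt", "ja_train.txt", "mixed_train.txt")`, and point the training config at the merged dictionary and the mixed label file. Running `uncovered_chars` on the Japanese label file against the dictionary used for training is a quick way to check whether Japanese characters are missing from it, which would be one possible explanation for the symptoms described in the question.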
