-
Hello, there is a difference in image preprocessing between inference and evaluation in PP-OCRv5. During inference, the scaling method is 960 max (the optimal configuration for Chinese text recognition and normally sized images), while during evaluation it is 736 min. You can make the following change so that the preprocessing used during inference matches evaluation; example code is provided below.

```python
from paddleocr import TextDetection

# Resize so the shorter image side is at least 736 px, as in evaluation
model = TextDetection(model_name="PP-OCRv5_server_det", limit_type="min", limit_side_len=736)
output = model.predict("general_ocr_001.png", batch_size=1)
for res in output:
    res.print()
    res.save_to_img(save_path="./output/")
    res.save_to_json(save_path="./output/res.json")
```
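Here limit_type="min" with limit_side_len=736 scales each image so its shorter side is at least 736 px, matching the evaluation preprocessing, whereas the default inference setting (limit_type="max" with limit_side_len=960) instead caps the longer side at 960 px.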
-
Training Environment:
PaddleOCR 3.2.0
PaddlePaddle-gpu 3.2.0
Python 3.10.12
Model Used:
PP-OCRv5_mobile_det_pretrained.pdparams
Issue:
I performed detection fine-tuning with my own dataset. Since I used the same dataset for training and inference, I expected high accuracy during inference.
However, during training the best model achieved hmean = 0.9294 (about 93%), but when I run inference with the exported model, the accuracy is only about 0.8%, which is extremely low.
Dataset Characteristics:
The dataset contains very small images, ranging from 10×10 px to 50×50 px.
The label data is structured as follows. Since this is a digit recognition dataset, I stored the transcription in integer format:
```text
train_det/img_02308.jpg [{"transcription": 50, "points": [[7, 7], [14, 7], [14, 14], [7, 14]]}]
```
Config File:
I used the original PP-OCRv5_mobile_det config file and only modified the paths and the number of epochs, roughly as shown below.
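An illustrative excerpt of the changed fields (not my exact file; the paths and epoch count below are placeholders, and the field names follow the standard PaddleOCR detection configs):

```yaml
Global:
  epoch_num: 100                      # changed from the default
  pretrained_model: ./pretrained_models/PP-OCRv5_mobile_det_pretrained
Train:
  dataset:
    data_dir: ./train_data/
    label_file_list:
      - ./train_data/train_det_label.txt
Eval:
  dataset:
    data_dir: ./train_data/
    label_file_list:
      - ./train_data/val_det_label.txt
```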
Inference Code:
To evaluate detection performance only, I used the TextDetection class:
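Along these lines (a minimal sketch of my call; the model_dir path below is a placeholder for the exported fine-tuned model directory):

```python
from paddleocr import TextDetection

# Load the exported, fine-tuned detection model from its inference directory
model = TextDetection(model_dir="./inference/PP-OCRv5_mobile_det_finetuned")

# Run detection on one of the training images
output = model.predict("train_det/img_02308.jpg", batch_size=1)
for res in output:
    res.print()
    res.save_to_img(save_path="./output/")
```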
Could you please let me know what might be causing such a huge gap between the training evaluation results and the inference results? Any guidance would be greatly appreciated.