skip sfx when scanning images #16576

nhannguyen0411 · 2025-09-25T03:41:54Z

Hi everyone, I'm new to PP-OCRv5 and I'm having a problem using it to scan text in images. When I pass lang="en", kanji/kana are recognized as numbers. If I remove lang="en", the kanji/kana are kept, but other characters are recognized incorrectly. I would like to either skip SFX recognition entirely or keep the kanji/kana while still using lang="en". Does anyone know a way to do this? Thanks.

python: 3.11
paddleocr: latest

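from paddleocr import PaddleOCR

ocr = PaddleOCR(
    use_doc_orientation_classify=False,  # skip document orientation classification
    use_doc_unwarping=False,             # skip document unwarping
    text_det_limit_side_len=1280,        # with "max": cap the longest image side at 1280 px
    text_det_limit_type="max",
    use_textline_orientation=False,      # skip text line orientation classification
    lang="en",                           # English recognition model
)

res_list = ocr.predict(input="images/page_003.png")

for res in res_list:
    res.print()                        # print the result to the console
    res.save_to_img("output/step1/")   # save the visualization
    res.save_to_json("output/step1/")  # save the result as JSON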

[Screenshot: result with lang="en"]

[Screenshot: result without lang="en"]

liuhongen1234567 · 2025-09-27T13:39:52Z

Collaborator

Hello, in this case you might consider running a second recognition pass. For example, take every text region that was recognized as numbers and recognize it again with a text recognition model created without the lang="en" parameter. The documentation for the text recognition model is here: https://www.paddleocr.ai/main/en/version3.x/module_usage/text_recognition.html
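
A rough sketch of that second pass is below. It is only an illustration under several assumptions, not an official recipe: the helper names (`looks_numeric`, `merge_second_pass`, `second_pass_with_paddleocr`) and the "digits plus punctuation" heuristic are made up for this example, and the result keys (`rec_texts`, `rec_boxes`, `rec_text`) and the model name `PP-OCRv5_server_rec` should be checked against the documentation linked above before relying on them.

```python
import re
from typing import Callable, List, Sequence

# Text made only of digits and common punctuation is treated as "suspicious".
# This pattern is an illustrative heuristic, not something PaddleOCR provides.
_NUMERIC_LIKE = re.compile(r"[\d\s.,:;!?'\"\-+/()]+")

Box = Sequence[int]  # assumed layout: (x_min, y_min, x_max, y_max)


def looks_numeric(text: str) -> bool:
    """True if text contains a digit and nothing but digits/punctuation."""
    return bool(_NUMERIC_LIKE.fullmatch(text)) and any(c.isdigit() for c in text)


def merge_second_pass(
    texts: Sequence[str],
    boxes: Sequence[Box],
    recognize_crop: Callable[[Box], str],
) -> List[str]:
    """Re-recognize numeric-looking regions; keep the first-pass text otherwise."""
    merged = list(texts)
    for i, text in enumerate(texts):
        if looks_numeric(text):
            new_text = recognize_crop(boxes[i])
            if new_text:  # keep the original if the second pass returns nothing
                merged[i] = new_text
    return merged


def second_pass_with_paddleocr(image_path: str) -> List[str]:
    """Hypothetical wiring to PaddleOCR 3.x; result key names are assumptions."""
    # Imported lazily so the helpers above work without these packages.
    import cv2
    from paddleocr import PaddleOCR, TextRecognition

    image = cv2.imread(image_path)
    first = PaddleOCR(
        use_doc_orientation_classify=False,
        use_doc_unwarping=False,
        use_textline_orientation=False,
        lang="en",
    ).predict(input=image_path)[0]

    # Assumed: a recognition-only model that also covers Japanese text.
    recognizer = TextRecognition(model_name="PP-OCRv5_server_rec")

    def recognize_crop(box: Box) -> str:
        x1, y1, x2, y2 = (int(v) for v in box)
        crop = image[y1:y2, x1:x2]
        results = list(recognizer.predict(input=crop))
        return results[0]["rec_text"] if results else ""

    return merge_second_pass(first["rec_texts"], first["rec_boxes"], recognize_crop)
```

Calling `second_pass_with_paddleocr("images/page_003.png")` would then return the first-pass English texts with the numeric-looking regions swapped for the second model's output. A real number on the page is also sent through the second pass, so comparing the recognition scores of both passes before replacing may be worth adding.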
