How to avoid reloading PP-StructureV3 models on every request in Flask/Celery without std::exception crashes? #16589
Atulok0506 asked this question in Q&A (Unanswered). Replies: 1 comment, 1 reply.
Reply:
Hello, for deployment, you can refer to the documentation at https://www.paddleocr.ai/latest/en/version3.x/deployment/serving.html. If you want to avoid loading the model every time, you can initialize the service with […]
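For concreteness, here is a minimal sketch of that serving pattern, assuming the PaddleX serving CLI described in the linked documentation: the PP-StructureV3 models are loaded once inside a long-lived serving process, and the Flask/Celery code only sends HTTP requests to it. The endpoint path and payload field names below are assumptions and should be checked against the serving docs.

```python
# Start one long-lived serving process (loads the PP-StructureV3 models a single time), e.g.:
#   paddlex --serve --pipeline PP-StructureV3 --port 8080
# The endpoint path and payload fields below are assumptions; verify them against
# https://www.paddleocr.ai/latest/en/version3.x/deployment/serving.html
import base64

import requests

SERVER_URL = "http://127.0.0.1:8080/layout-parsing"  # assumed endpoint name for PP-StructureV3


def parse_document(image_path: str) -> dict:
    """Send one image to the long-lived serving process and return its JSON result."""
    with open(image_path, "rb") as f:
        payload = {
            "file": base64.b64encode(f.read()).decode("ascii"),  # base64-encoded file content
            "fileType": 1,  # assumed convention: 1 = image, 0 = PDF
        }
    resp = requests.post(SERVER_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["result"]


if __name__ == "__main__":
    print(parse_document("sample_page.png"))
```

With this split, the models live in exactly one process and nothing is reloaded per request; the Flask/Celery workers stay lightweight HTTP clients.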
Original question:
I'm using PP-StructureV3 in a Flask + Celery application.
If I reload the pipeline for each request, it works but is far too slow, since roughly six models have to be reloaded every time.
If I cache and reuse the pipeline across requests, the first request succeeds, but the second one crashes with a std::exception from Paddle's C++ backend.
This makes it look like Paddle's internal state gets corrupted after reuse.
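One pattern commonly used in this situation (a sketch under stated assumptions, not an official PaddleOCR recommendation): create the pipeline once per Celery worker process via the worker_process_init signal, and keep each process single-threaded with respect to Paddle, so a single predictor is never shared across concurrent threads; concurrent reuse of one predictor is a plausible source of the std::exception.

```python
# tasks.py - a minimal sketch assuming PaddleOCR 3.x (PPStructureV3) and a prefork Celery pool.
# The broker URL, task name, and output directory are placeholders.
from celery import Celery
from celery.signals import worker_process_init

app = Celery("structure_tasks", broker="redis://localhost:6379/0")

_pipeline = None  # one pipeline instance per worker process, created after the fork


@worker_process_init.connect
def _load_pipeline(**kwargs):
    """Load PP-StructureV3 once per worker process, not once per task."""
    global _pipeline
    # Import here so Paddle is only ever initialized inside the child worker process.
    from paddleocr import PPStructureV3
    _pipeline = PPStructureV3()


@app.task
def parse_page(image_path: str, output_dir: str = "output") -> str:
    """Run structure analysis with the process-local pipeline and save the results."""
    for res in _pipeline.predict(image_path):
        res.save_to_json(output_dir)
    return output_dir
```

Run the worker with a process pool rather than a thread or gevent pool, e.g. `celery -A tasks worker --pool=prefork --concurrency=2`, so each process owns its own pipeline. If memory grows over time, Celery's `worker_max_tasks_per_child` setting can recycle worker processes periodically without going back to per-request loading.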
Questions:
- What is the recommended way to run PP-StructureV3 in a long-lived server environment (Flask/Celery)?
- Is there an officially supported pattern for keeping the models loaded in memory across multiple requests? (See the per-worker sketch above.)
- Or is the only safe option to reload the pipeline each time, or to run it in short-lived subprocesses? (A subprocess-isolation pattern is sketched after this list.)
- Do the PaddleOCR/PaddleX developers recommend any workaround that avoids the std::exception while still giving reasonable performance?
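On the short-lived-subprocess option, below is a minimal isolation sketch (again an illustration, not an official recommendation; function and path names are placeholders). Each job runs PP-StructureV3 in a freshly spawned child process, so a crash in Paddle's C++ backend cannot corrupt the long-lived Flask/Celery worker; the trade-off is that the models are loaded once per job.

```python
# Run one PP-StructureV3 job in a short-lived, spawned subprocess.
import multiprocessing as mp


def _run_structure(image_path: str, output_dir: str) -> None:
    # Import inside the child so Paddle is only ever initialized in the short-lived process.
    from paddleocr import PPStructureV3
    pipeline = PPStructureV3()
    for res in pipeline.predict(image_path):
        res.save_to_json(output_dir)


def parse_in_subprocess(image_path: str, output_dir: str = "output", timeout: int = 600) -> bool:
    """Return True if the child process finished cleanly within the timeout."""
    ctx = mp.get_context("spawn")  # spawn avoids inheriting forked native-library state
    proc = ctx.Process(target=_run_structure, args=(image_path, output_dir))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()
        proc.join()
        return False
    return proc.exitcode == 0


if __name__ == "__main__":
    print(parse_in_subprocess("sample_page.png"))
```

This contains any crash to the child process but pays the model-loading cost on every job, so the per-worker-process pattern or the serving deployment above is usually preferable when throughput matters.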