[Executor] Avoid OOM when start the service while Enable Chunked Prefill + CudaGraph #2914

littledgg · 2025-07-18T08:40:58Z

When chunked prefill is enabled, max-model-len is usually much larger than max-num-batched-tokens. Capturing a CUDA graph sized for max-model-len then consumes a large amount of GPU memory, which may cause OOM when the service starts.
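The idea can be illustrated with a minimal sketch. It is not the code from this PR: the function name `max_capture_tokens` and its parameters are hypothetical, and it assumes that with chunked prefill a single forward step never processes more than max-num-batched-tokens tokens, so the captured graph does not need buffers sized for max-model-len.

```python
def max_capture_tokens(max_model_len, max_num_batched_tokens, enable_chunked_prefill):
    """Return the token count used to size buffers for CUDA graph capture.

    Hypothetical helper for illustration only; not a FastDeploy API.
    Assumption: with chunked prefill enabled, one step handles at most
    max_num_batched_tokens tokens, so capturing for max_model_len would
    only waste GPU memory.
    """
    if enable_chunked_prefill:
        # Cap the capture size at the per-step token budget.
        return min(max_model_len, max_num_batched_tokens)
    # Without chunked prefill, a full-length prompt may arrive in one step.
    return max_model_len


if __name__ == "__main__":
    # Example values chosen for illustration, not taken from the PR.
    print(max_capture_tokens(131072, 2048, enable_chunked_prefill=True))
    print(max_capture_tokens(131072, 2048, enable_chunked_prefill=False))
```

Under these assumptions, the capture buffers scale with the per-step token budget rather than with the maximum context length, which is where the memory saving would come from.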


paddle-bot · 2025-07-18T08:41:02Z

Thanks for your contribution!

gongshaotian

LGTM

[Executor] Avoid OOM when start the service while Enable Chunked Pref…

27449a0

…ill + CudaGraph

paddle-bot bot added the contributor label Jul 18, 2025

Merge branch 'develop' into oom_chunkedprefill_cudagraph

093aaab

gongshaotian approved these changes Jul 18, 2025
