
Conversation

sunbaosong (Contributor) commented Apr 29, 2025

What this PR does / why we need it?

Optimize NPU memory usage. #723

With vLLM v0.8.4.rc2, DeepSeek R1 supports a model length of only 16K; attempting to run with a model length of 32K results in an "Out of Memory" (OOM) error.
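For context, a minimal sketch of how the 32K context length would be exercised once this change lands, using vLLM's offline API. The checkpoint path, tensor-parallel size, and quantization setting below are illustrative assumptions for a DeepSeek R1 W8A8 deployment on NPU, not values taken from this PR:

```python
# Minimal sketch (assumptions, not from this PR): request a 32K context window
# from vLLM and run a single generation to confirm the engine starts without OOM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/DeepSeek-R1-W8A8",  # assumed local W8A8 checkpoint path
    max_model_len=32768,                # the 32K model length this PR targets
    tensor_parallel_size=16,            # assumed parallelism for the full model
    quantization="ascend",              # assumption: vllm-ascend W8A8 quant backend
)

outputs = llm.generate(
    ["Summarize the following document: ..."],
    SamplingParams(max_tokens=256),
)
print(outputs[0].outputs[0].text)
```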

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI passed

Signed-off-by: sunbaosong <13793883820@163.com>
sunbaosong marked this pull request as ready for review April 29, 2025 14:44
sunbaosong changed the title from "support 32K model len on deepseek r1 W8A8" to "[Performance] support 32K model len on deepseek r1 W8A8" Apr 29, 2025
sunbaosong changed the title from "[Performance] support 32K model len on deepseek r1 W8A8" to "support 32K model len on deepseek r1 W8A8" Apr 29, 2025
Yikun merged commit d6bfae8 into vllm-project:main May 6, 2025 (14 checks passed)
Yikun (Collaborator) commented May 6, 2025

Thanks, merged. Will be included in v0.8.5rc1.
