
Conversation


@zhangruoxu zhangruoxu commented Nov 27, 2025

Purpose

We observed that enabling both use_gzip and dump_self_cuda_time_total in the vLLM torch profiler introduces significant overhead during profiling.

For example, when profiling 10 randomly generated requests (1000 input tokens, 200 output tokens) on an A100 using the Qwen3-32B model, we found that gzip compression of the profiling trace and dumping the CUDA time table take ~68 seconds, dominating the overall profiling time.

The main sources of overhead appear to be:

  1. Gzip compression of the profiling trace file
  2. Generation and dumping of the CUDA time summary table

After disabling these two features, the total profiling dump time is reduced to ~18 seconds.

In many profiling scenarios (e.g., quick performance checks or small-scale experiments), users may not need gzip compression or the CUDA time table. It would therefore be helpful to make these two behaviors individually configurable via environment variables: enabled by default for completeness, but able to be turned off when faster profiling turnaround is preferred. Moreover, gzip compression could potentially be performed asynchronously after the trace is dumped, allowing lower-latency profiling in staging or pre-production environments.
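The asynchronous-compression idea could be sketched with the standard library as follows (a minimal sketch, not part of this patch; `compress_trace_async` is a hypothetical helper, and real integration with the torch profiler's trace handler would need more care):

```python
import gzip
import shutil
import threading

def compress_trace_async(trace_path: str) -> threading.Thread:
    """Gzip an already-dumped trace file on a background thread so the
    profiler's dump path can return without waiting on compression."""
    def _compress() -> None:
        with open(trace_path, "rb") as src, \
                gzip.open(trace_path + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)

    t = threading.Thread(target=_compress, daemon=True)
    t.start()
    return t
```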

This patch proposes adding such configurability so users can selectively disable gzip compression and/or CUDA time table generation when they want a faster and lighter profiling workflow.

Fixes #29564

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which exercises a small, essential subset of the CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀


mergify bot commented Nov 27, 2025

Documentation preview: https://vllm--29568.org.readthedocs.build/en/29568/

@mergify mergify bot added the documentation, nvidia, and v1 labels Nov 27, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces two new environment variables, VLLM_TORCH_PROFILER_USE_GZIP and VLLM_TORCH_PROFILER_DUMP_CUDA_TIME_TOTAL, to make parts of the PyTorch profiler functionality configurable. This allows users to disable gzip compression of profiling traces and the dumping of CUDA time tables, which can help reduce profiling overhead.

The changes are implemented correctly:

  • New environment variables are added in vllm/envs.py with appropriate defaults and parsing logic that is consistent with existing variables.
  • The use_gzip parameter for torch.profiler.tensorboard_trace_handler is now controlled by VLLM_TORCH_PROFILER_USE_GZIP in vllm/profiler/gpu_profiler.py, vllm/v1/engine/async_llm.py, and vllm/v1/worker/xpu_worker.py.
  • The logic for dumping the CUDA time total table in vllm/profiler/gpu_profiler.py is now conditional on the VLLM_TORCH_PROFILER_DUMP_CUDA_TIME_TOTAL flag.
  • Documentation in docs/contributing/profiling.md has been updated to reflect these new options.

The changes are well-contained and correctly implement the intended functionality. I have not found any high or critical issues. The code quality is good.
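To illustrate the gating the review describes, here is a minimal duck-typed sketch (not the actual vllm/profiler/gpu_profiler.py code; `finalize_profile` is a hypothetical name, and in vLLM the gzip flag is forwarded to torch.profiler.tensorboard_trace_handler rather than applied to the file name by hand):

```python
from typing import Optional

def finalize_profile(profiler, use_gzip: bool,
                     dump_cuda_time_total: bool) -> Optional[str]:
    """Conditionally dump the trace and the CUDA time table.

    `profiler` is duck-typed here: anything exposing `export_chrome_trace`
    and `key_averages` (in vLLM, a torch.profiler.profile object).
    Returns the CUDA time table string when requested, else None.
    """
    table = None
    if dump_cuda_time_total:
        # Building this table is one of the two overhead sources the
        # patch makes optional (VLLM_TORCH_PROFILER_DUMP_CUDA_TIME_TOTAL).
        table = profiler.key_averages().table(sort_by="self_cuda_time_total")
    # VLLM_TORCH_PROFILER_USE_GZIP controls whether the trace handler
    # writes a gzipped trace; here we only illustrate the branch.
    suffix = ".json.gz" if use_gzip else ".json"
    profiler.export_chrome_trace("trace" + suffix)
    return table
```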

@zhangruoxu zhangruoxu force-pushed the add_profiler_options branch 7 times, most recently from e2e5b68 to 2efb082 on November 27, 2025 03:57
Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>
@zhangruoxu zhangruoxu changed the title from "Make PyTorch profiler gzip and CUDA time dump configurable (#29564)" to "Make PyTorch profiler gzip and CUDA time dump configurable" Nov 27, 2025
Collaborator

@LucasWilkinson LucasWilkinson left a comment

Makes sense to me (I would definitely use this); thanks!

@github-project-automation github-project-automation bot moved this to In review in NVIDIA Nov 28, 2025
@LucasWilkinson LucasWilkinson enabled auto-merge (squash) November 28, 2025 19:11
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 28, 2025
@LucasWilkinson LucasWilkinson changed the title from "Make PyTorch profiler gzip and CUDA time dump configurable" to "[Misc][Profiling] Make PyTorch profiler gzip and CUDA time dump configurable" Nov 28, 2025

Labels

  • documentation (Improvements or additions to documentation)
  • nvidia
  • ready (ONLY add when PR is ready to merge/full CI is needed)
  • v1

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

[Doc]: Make PyTorch profiler gzip and CUDA time dump configurable

2 participants