maleksan85 (Contributor) commented Oct 16, 2025

GPT OSS, m and n to check: ROCm@bcc4e69

HIP_VISIBLE_DEVICES=7 \
HSA_NO_SCRATCH_RECLAIM=1 \
NCCL_MIN_NCHANNELS=112 \
USE_FASTSAFETENSOR=1 \
SAFETENSORS_FAST_GPU=1 \
VLLM_DISABLE_COMPILE_CACHE=1 \
VLLM_ROCM_USE_AITER=1 \
VLLM_USE_AITER_UNIFIED_ATTENTION=1 \
VLLM_ROCM_USE_AITER_MHA=0 \
VLLM_USE_AITER_TRITON_GEMM=0 \
TRITON_HIP_PRESHUFFLE_SCALES=1 \
vllm serve /data/models/openai/gpt-oss-120b \
    --host localhost \
    --port 30000 \
    --tensor-parallel-size 1 \
    --max-num-batched-tokens 8192 \
    --max-num-seqs 32 \
    --gpu-memory-utilization 0.9 \
    --max-model-len 2048 \
    --swap-space 16 \
    --block-size 64 \
    --async-scheduling \
    --no-enable-prefix-caching \
    --disable-log-requests \
    --compilation-config='{"pass_config":{"enable_attn_fusion":true,"enable_noop":true,"enable_fusion":true},"cudagraph_mode":"FULL","custom_ops":["+rms_norm","+silu_and_mul","+quant_fp8"],"splitting_ops":[]}'
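
The `--compilation-config` value above is a dense one-line JSON string, so a quoting slip is easy to miss. A minimal sketch for checking it before launching the server, assuming `python3` is available on the host (the variable name `COMPILATION_CONFIG` is introduced here for illustration only):

```shell
# Same JSON as in the serve command above, held in a variable for readability.
COMPILATION_CONFIG='{"pass_config":{"enable_attn_fusion":true,"enable_noop":true,"enable_fusion":true},"cudagraph_mode":"FULL","custom_ops":["+rms_norm","+silu_and_mul","+quant_fp8"],"splitting_ops":[]}'

# json.tool pretty-prints valid JSON and exits non-zero on malformed input,
# so this doubles as a syntax check. It does not validate the keys themselves.
printf '%s\n' "$COMPILATION_CONFIG" | python3 -m json.tool
```

The variable can then be passed as `--compilation-config="$COMPILATION_CONFIG"`. This only confirms the string parses as JSON; whether a given vLLM build accepts each key is a separate question.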

vllm bench serve \
  --host localhost \
  --port 30000 \
  --model /data/models/openai/gpt-oss-120b \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 1024 \
  --random-prefix-len 0 \
  --request-rate "inf" \
  --max-concurrency 32 \
  --num-prompts 640 \
  --ignore-eos \
  --percentile-metrics ttft,tpot,itl,e2el
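
The benchmark sizes line up with the server limits in a way that is easy to verify by hand. A rough sketch of that arithmetic, assuming every request uses exactly the requested input and output lengths (which `--ignore-eos` should ensure on the output side); the variable names are illustrative, and the values are copied from the two commands above:

```shell
# Values copied from the serve and bench commands above.
INPUT_LEN=1024
OUTPUT_LEN=1024
MAX_MODEL_LEN=2048
NUM_PROMPTS=640
MAX_CONCURRENCY=32

# Each request needs room for its prompt plus its generated tokens.
TOKENS_PER_REQUEST=$((INPUT_LEN + OUTPUT_LEN))
# Number of full batches of concurrent requests needed to drain all prompts.
WAVES=$((NUM_PROMPTS / MAX_CONCURRENCY))
# Total tokens generated over the whole run.
TOTAL_OUTPUT_TOKENS=$((NUM_PROMPTS * OUTPUT_LEN))

echo "tokens per request: $TOKENS_PER_REQUEST (limit $MAX_MODEL_LEN)"
echo "full-concurrency waves: $WAVES"
echo "total generated tokens: $TOTAL_OUTPUT_TOKENS"

if [ "$TOKENS_PER_REQUEST" -gt "$MAX_MODEL_LEN" ]; then
    echo "warning: requests exceed --max-model-len" >&2
fi
```

Under these assumptions each request fills the 2048-token context exactly, and `--max-concurrency 32` matches `--max-num-seqs 32`, so the server should run at its configured batch limit for the whole benchmark.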

Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
@mergify mergify bot added the rocm Related to AMD ROCm label Oct 16, 2025
Labels: gpt-oss (Related to GPT-OSS models), rocm (Related to AMD ROCm)

Projects

Status: To Triage
