Commit 72a0c3f

add eval config for Qwen3-235B-A22B-Thinking-2507-FP8

Signed-off-by: Huamin Li <3ericli@gmail.com>
1 parent 99722d5

File tree

4 files changed: +27 −2 lines
Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
+model_name: "Qwen/Qwen3-235B-A22B-Thinking-2507-FP8"
+backend: "vllm"
+tasks:
+- name: "mmlu_pro"
+  metrics:
+  - name: "exact_match,custom-extract"
+    value: 0.77
+num_fewshot: 5
+limit: 250 # will run on 250 * 14 subjects = 3500 samples
+max_model_len: 8096
+gen_kwargs: "top_p=1,top_k=0,max_gen_toks=1536"
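The config's comment notes that `limit: 250` expands to 3500 evaluated samples because MMLU-Pro spans 14 subject categories. A minimal sketch of that arithmetic, mirroring the new config as a plain dict (the dict and the `MMLU_PRO_SUBJECTS` constant are illustrative, not part of the harness):

```python
# Hypothetical in-memory mirror of the new eval config (keys and values
# copied from the diff above; this is not how the harness loads YAML).
eval_config = {
    "model_name": "Qwen/Qwen3-235B-A22B-Thinking-2507-FP8",
    "backend": "vllm",
    "tasks": [
        {
            "name": "mmlu_pro",
            "metrics": [{"name": "exact_match,custom-extract", "value": 0.77}],
        }
    ],
    "num_fewshot": 5,
    "limit": 250,
    "max_model_len": 8096,
    "gen_kwargs": "top_p=1,top_k=0,max_gen_toks=1536",
}

# MMLU-Pro has 14 subjects; a per-subject limit of 250 therefore
# evaluates 250 * 14 = 3500 samples in total, as the config comment says.
MMLU_PRO_SUBJECTS = 14
total_samples = eval_config["limit"] * MMLU_PRO_SUBJECTS
print(total_samples)  # prints 3500
```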
Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-Meta-Llama-4-Maverick-17B-128E-Instruct-FP8.yaml
+Qwen3-235B-A22B-Thinking-2507-FP8.yaml

.buildkite/lm-eval-harness/test_lm_eval_correctness.py

Lines changed: 2 additions & 0 deletions

@@ -40,7 +40,9 @@ def launch_lm_eval(eval_config, tp_size):
         # existing text models in CI, so only apply it for mm.
         apply_chat_template=backend == "vllm-vlm",
         batch_size=batch_size,
+        gen_kwargs=eval_config.get("gen_kwargs", None),
     )
+
     return results

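The `.get("gen_kwargs", None)` default keeps older configs without the key working, while new configs can pass a string like `"top_p=1,top_k=0,max_gen_toks=1536"` through to the harness. A rough, hedged sketch of how such a comma-separated `key=value` string could be turned into typed keyword arguments (this helper is illustrative; it is not lm-eval's own parser):

```python
def parse_gen_kwargs(spec: str) -> dict:
    """Parse a comma-separated "key=value" string (e.g. the config's
    "top_p=1,top_k=0,max_gen_toks=1536") into a dict with numeric values
    coerced to int/float where possible. Illustrative sketch only."""
    kwargs = {}
    for pair in spec.split(","):
        key, _, raw = pair.partition("=")
        try:
            value = int(raw)          # try integer first (top_k=0 -> 0)
        except ValueError:
            try:
                value = float(raw)    # then float (temperature=0.6 -> 0.6)
            except ValueError:
                value = raw           # fall back to the raw string
        kwargs[key.strip()] = value
    return kwargs

print(parse_gen_kwargs("top_p=1,top_k=0,max_gen_toks=1536"))
# prints {'top_p': 1, 'top_k': 0, 'max_gen_toks': 1536}
```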
.buildkite/test-pipeline.yaml

Lines changed: 13 additions & 1 deletion

@@ -1084,7 +1084,7 @@ steps:
   - tests/weight_loading
   commands:
   - bash weight_loading/run_model_weight_loading_test.sh -c weight_loading/models-large.txt
-
+
- label: NixlConnector PD accuracy tests (Distributed) # 30min
  timeout_in_minutes: 30
  working_dir: "/vllm-workspace/tests"

@@ -1126,6 +1126,18 @@ steps:
   - export VLLM_WORKER_MULTIPROC_METHOD=spawn
   - pytest -s -v test_lm_eval_correctness.py --config-list-file=configs/models-large.txt --tp-size=4

+##### H100 test #####
+- label: LM Eval Medium Models (H100) # optional
+  gpu: h100
+  optional: true
+  num_gpus: 4
+  working_dir: "/vllm-workspace/.buildkite/lm-eval-harness"
+  source_file_dependencies:
+  - csrc/
+  - vllm/model_executor/layers/quantization
+  commands:
+  - pytest -s -v test_lm_eval_correctness.py --config-list-file=configs/models-medium-h100.txt --tp-size=4
+
##### H200 test #####
- label: Distributed Tests (H200) # optional
  gpu: h200

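The new step's `source_file_dependencies` list (`csrc/`, `vllm/model_executor/layers/quantization`) gates when the optional H100 job runs: it fires only if a changed file falls under one of those paths. A simplified prefix-match sketch of that gating (the function name and matching rule are assumptions for illustration, not vLLM's actual CI code):

```python
def step_should_run(changed_files, source_file_dependencies):
    """Simplified sketch: a step runs when any changed file path starts
    with one of the step's declared dependency prefixes."""
    return any(
        path.startswith(dep)
        for path in changed_files
        for dep in source_file_dependencies
    )

deps = ["csrc/", "vllm/model_executor/layers/quantization"]
print(step_should_run(["vllm/model_executor/layers/quantization/fp8.py"], deps))  # prints True
print(step_should_run(["docs/index.md"], deps))  # prints False
```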