Skip to content

Getting stuck at tool call in evaluation of Qwen3 #7

@Aliasgarsaifee

Description

@Aliasgarsaifee

Can anyone help me with the benchmark.
I am running Qwen3-80b model benchmark and facing many issues.
One thing is it is getting stuck at this using the evaluation script -

2025-09-19 13:21:36,783 - evaluation_logger_Attraction-62 - INFO - Test Example Attraction-62
2025-09-19 13:21:36,783 - evaluation_logger_Attraction-62 - INFO - Query: My child is a huge fan of the Harry Potter series. Can you check out what Harry Potter-themed attractions or activities are available in London? If the first one costs more than 100, keep checking the next ones until you find something that's under 100.
2025-09-19 13:21:37,349 - evaluation_logger_Attraction-62 - INFO - Function Calls: 
[
    {
        "name": "Search_Attraction_Location",
        "arguments": {
            "query": "London"
        }
    }
]

2025-09-19 13:21:37,349 - evaluation_logger_Attraction-62 - INFO - Golden Function Call: 
[
    {
        "name": "Search_Attraction_Location",
        "arguments": {
            "query": "Harry Potter, London"
        }
    },
    {
        "name": "Get_Attraction_Details",
        "arguments": {
            "slug": "prdg4urreipy-harry-potters-london-experience-tour"
        }
    }
]

Also I have changed Qwen Model and Runner to take vllm_url as input as it was not there, nothing else is changed.

VLLM Server Command:

vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct --host "0.0.0.0" --port "8000" --uvicorn-log-level warning --served-model-name qwen3-next --trust-remote-code --gpu-memory-utilization "0.9" --enable-prefix-caching --max-model-len "131072" --enable-auto-tool-choice --tool-call-parser hermes --tensor-parallel-size "4" --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'

Evaluation Script Command:

python evaluation.py --model_name=qwen3-next --vllm_url=http://0.0.0.0:8000/v1 --proc_num=1

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions