[BUG]: Tool calling not detected in Phoenix evals when using Gemini or Ollama (works correctly with OpenAI) #2183

@vigneshrajaswarise

Description

Where do you use Phoenix

Self-hosted

What version of Phoenix are you using?

11.21.0

What happened?

When running Phoenix locally alongside a local Flowise setup and using Phoenix's pre-built tool-calling evaluation, everything works as expected with OpenAI models: tool calls are detected and scored correctly.

However, when switching to Gemini or Ollama models:

1. The models successfully produce tool calls (verified in logs).
2. The tools are executed correctly.
3. The Phoenix eval reports that no tool was called, leading to inaccurate evaluation results.

This issue seems specific to Phoenix’s tool-call detection logic for non-OpenAI models.
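To illustrate why detection written against one provider's format can miss the others, here is a minimal sketch that normalizes tool calls from three raw response shapes. The shapes follow the providers' public REST formats (OpenAI chat completions, Ollama `/api/chat`, Gemini `generateContent`). How Flowise or Phoenix actually records them is not shown in this report, so this is an assumption about where the mismatch could come from, not a description of Phoenix's logic. `extract_tool_calls` and `_normalize` are hypothetical helper names.

```python
import json
from typing import Any


def _normalize(fn: dict[str, Any]) -> dict[str, Any]:
    """Map one provider-specific function-call dict to {"name", "arguments"}."""
    # OpenAI and Ollama use "arguments"; Gemini uses "args".
    raw = fn.get("arguments", fn.get("args", {}))
    if isinstance(raw, str):
        # OpenAI encodes arguments as a JSON string.
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError:
            raw = {"_raw": raw}
    return {"name": fn.get("name"), "arguments": raw or {}}


def extract_tool_calls(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect tool calls from an OpenAI-, Ollama-, or Gemini-shaped response dict."""
    calls: list[dict[str, Any]] = []

    # OpenAI: choices[].message.tool_calls[].function
    for choice in response.get("choices") or []:
        for tc in (choice.get("message") or {}).get("tool_calls") or []:
            calls.append(_normalize(tc.get("function") or {}))

    # Ollama: message.tool_calls[].function (arguments already an object)
    for tc in (response.get("message") or {}).get("tool_calls") or []:
        calls.append(_normalize(tc.get("function") or {}))

    # Gemini: candidates[].content.parts[].functionCall
    for cand in response.get("candidates") or []:
        for part in (cand.get("content") or {}).get("parts") or []:
            fc = part.get("functionCall")
            if fc:
                calls.append(_normalize(fc))

    return calls
```

A check that only looks under `choices[].message.tool_calls` would find nothing in the Ollama or Gemini shapes, even though a tool call is present in both.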

Additional information

Steps to Reproduce

1. Run Phoenix locally with the Flowise setup.
2. Connect Phoenix to a Gemini or Ollama model.
3. Configure a tool (e.g., search, calculator).
4. Run the evaluation for tool calling.
5. Observe:
   - OpenAI → tool calls detected correctly.
   - Gemini/Ollama → tool calls executed, but the Phoenix eval shows the tool as not called.
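One way to narrow down the observation in step 5 is to check whether the recorded LLM span carries any tool-call attributes at all. If it does not, the gap is more likely in how the trace is instrumented than in the eval itself. The sketch below assumes the span attributes are available as a flat dictionary using OpenInference-style keys. That key pattern is an assumption about how this setup records traces, and `recorded_tool_names` is a hypothetical helper.

```python
import re

# Assumed OpenInference-style flattened key, for example:
# "llm.output_messages.0.message.tool_calls.0.tool_call.function.name"
_TOOL_NAME_KEY = re.compile(
    r"^llm\.output_messages\.\d+\.message\.tool_calls\.\d+"
    r"\.tool_call\.function\.name$"
)


def recorded_tool_names(attributes: dict) -> list[str]:
    """Return the tool names found in one span's flattened attributes."""
    return [
        str(value)
        for key, value in sorted(attributes.items())
        if _TOOL_NAME_KEY.match(key)
    ]
```

Comparing the result for an OpenAI span with the result for a Gemini or Ollama span from the same flow would show whether the tool call reaches Phoenix in the expected place.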

Metadata

Labels: bug (Something isn't working)

Status: 📘 Todo; In Progress