When running Phoenix locally alongside a local Flowise setup and connecting Phoenix to evaluate pre-built tool-calling behavior, the evaluation works as expected with OpenAI models: tool calls are detected and scored correctly.
However, when switching to Gemini or Ollama models:
1. The models successfully produce tool calls (verified in logs).
2. The tools are executed correctly.
3. The Phoenix eval reports that no tool was called, leading to inaccurate evaluation results.
This issue seems specific to Phoenix’s tool-call detection logic for non-OpenAI models.
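For context, the eval dataframe is being built from the Phoenix traces roughly like this (a minimal sketch; the span attribute names such as `llm.function_call` and `llm.tools`, the project name, and the column names are assumptions and may not match exactly what the Gemini/Ollama spans carry):

```python
import phoenix as px
from phoenix.trace.dsl import SpanQuery

# Pull LLM spans and project the fields the tool-calling eval needs.
# Attribute names and project name are assumptions; non-OpenAI
# instrumentations may record tool calls under different attributes,
# which could leave the tool_call column empty.
query = (
    SpanQuery()
    .where("span_kind == 'LLM'")
    .select(
        question="input.value",
        tool_call="llm.function_call",
        tool_definitions="llm.tools",
    )
)
tool_calls_df = px.Client().query_spans(query, project_name="flowise-local")

# With OpenAI models the tool_call column is populated; with Gemini/Ollama
# it comes back empty here, even though the tool was actually executed.
print(tool_calls_df[["question", "tool_call"]].head())
```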
Additional information
Steps to Reproduce
1. Run Phoenix locally with the Flowise setup.
2. Connect Phoenix to a local LLM (Gemini or Ollama).
3. Configure a tool (e.g., search, calculator).
4. Run the evaluation for tool calling (a rough sketch of the eval call follows this list).
5. Observe:
   - OpenAI → tool calls detected correctly.
   - Gemini/Ollama → tool calls executed, but the Phoenix eval shows the tool was not called.
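The scoring step looks roughly like the following (a sketch assuming Phoenix's built-in tool-calling eval template; the judge model, the `tool_calls_df` dataframe from the query above, and the column names are illustrative, not the exact config used):

```python
from phoenix.evals import (
    TOOL_CALLING_PROMPT_RAILS_MAP,
    TOOL_CALLING_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

rails = list(TOOL_CALLING_PROMPT_RAILS_MAP.values())

# The judge model only grades the traces; the model being swapped between
# OpenAI, Gemini, and Ollama is the one behind the traced Flowise flow.
eval_results = llm_classify(
    dataframe=tool_calls_df,  # dataframe built from the span query above
    template=TOOL_CALLING_PROMPT_TEMPLATE,
    model=OpenAIModel(model="gpt-4o"),
    rails=rails,
    provide_explanation=True,
)

# For Gemini/Ollama runs the labels say the tool was not called, even though
# the application logs show the tool call was produced and executed.
print(eval_results["label"].value_counts())
```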