Python framework for evaluating LLM tool-calling behavior with comprehensive metrics on accuracy, efficiency, and correctness
metrics openai testing-framework ai-agents tool-use llm langchain anthropic function-calling llm-evaluation ai-agents-and-tools llm-metrics
Updated Oct 28, 2025 - Python