Labels: bug (Something isn't working)
Description
With the following metacoder configuration, the `models` parameters are completely ignored. This happens because the `server_combinations` parameter is omitted, which is unexpected behavior.
Secondly, as with Issue #18, if the OpenAI calls fail for any reason, the code gets stuck in a try/fail/repeat loop. This should be documented as a separate bug; it likely has a solution similar to the one in the cited issue.
Expected behavior could be one (or more) of the following:
- Produce an error message like:
  "No server_combinations configuration specified in metacoder input file 'tests/input/literature_mcp_eval_config.yaml'. This is a required setting. Please read the documentation for how to create this file."
- Produce the above error message and also include a script that can independently validate the YAML file and report any errors.
- Use all provided information (e.g., the `models` configuration) to establish reasonable default parameters instead of assuming `openai` and `gpt-4o`.
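 
A standalone validator along the lines of the second option could look roughly like the sketch below. It is hypothetical: `validate_config`, `main` and `REQUIRED_KEYS` are illustrative names, and the set of required keys and model fields is inferred from the example configuration in this issue rather than from metacoder's actual schema. Only the command-line part needs PyYAML; `validate_config` works on an already-parsed mapping.

```python
"""Hypothetical standalone validator for a metacoder eval config.

Assumption: the required keys and model fields below are inferred from the
example in this issue, not taken from metacoder's real schema.
Usage: python validate_config.py CONFIG.yaml [CONFIG.yaml ...]
"""
import sys
from typing import Any

# Assumed required top-level keys; server_combinations is the one this issue is about.
REQUIRED_KEYS = ("name", "coders", "models", "cases", "server_combinations")


def validate_config(config: Any, path: str = "<config>") -> list[str]:
    """Return human-readable errors; an empty list means the config passed."""
    if not isinstance(config, dict):
        return [f"Metacoder input file '{path}' must contain a mapping at the top level."]
    errors: list[str] = []
    for key in REQUIRED_KEYS:
        if key not in config:
            errors.append(
                f"No {key} configuration specified in metacoder input file '{path}'. "
                "This is a required setting."
            )
    # Every model alias needs an explicit provider and model name.
    models = config.get("models") or {}
    if not isinstance(models, dict):
        errors.append(f"'models' in '{path}' must be a mapping of model aliases.")
        models = {}
    for alias, spec in models.items():
        if not isinstance(spec, dict):
            errors.append(f"Model '{alias}' must be a mapping with 'provider' and 'name'.")
            continue
        for field in ("provider", "name"):
            if field not in spec:
                errors.append(f"Model '{alias}' is missing required field '{field}'.")
    # Each server combination must be a list of names defined under `servers`.
    servers = config.get("servers") or {}
    combos = config.get("server_combinations") or []
    if not isinstance(combos, list):
        errors.append("server_combinations must be a list of lists of server names.")
        combos = []
    for combo in combos:
        if not isinstance(combo, list):
            errors.append(f"Each server_combinations entry must be a list, got {combo!r}.")
            continue
        for server in combo:
            if server not in servers:
                errors.append(f"server_combinations references unknown server '{server}'.")
    return errors


def main(paths: list[str]) -> int:
    """Validate each YAML file given on the command line; return a shell exit code."""
    import yaml  # PyYAML, needed only for the CLI; validate_config has no dependency

    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            config = yaml.safe_load(handle)
        for error in validate_config(config, path):
            print(f"error: {error}", file=sys.stderr)
            status = 1
    return status


# Only run the CLI when at least one config path is supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Run against the configuration below, this sketch would report exactly one error, for the missing `server_combinations` key. Returning a list instead of raising on the first problem lets the script report every issue in a single pass.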
```yaml
name: pubmed tools evals
description: |
  Evaluations for multiple pubmed MCPs
coders:
  goose: {}
models:
  claude-sonnet:
    provider: anthropic
    name: claude-sonnet-4-20250514
servers:
  mcp-simple-pubmed:
    name: pubmed
    command: uvx
    args: [mcp-simple-pubmed]
    env:
      PUBMED_EMAIL: ctparker@lbl.gov
# When server_combinations are missing, the models parameter is ignored, defaulting to openai + gpt-4o
#server_combinations:
#  - [mcp-simple-pubmed]
cases:
  - name: PMID_28027860_Full_Text
    metrics: [CorrectnessMetric]
    input: "What is the first sentence of section 2 in PMID: 28027860?"
    expected_output: |
      Even though many of NFLE's core features have been clarified in the last two decades, some critical issues remain controversial.
    threshold: 0.9
```
The above metacoder configuration produces the following `config.yaml` in `eval_workdir/claude-sonnet_goose_PMID_28027860_Full_Text_no_servers/claude-sonnet_goose_PMID_28027860_Full_Text/.config/goose/`:
```yaml
GOOSE_MODEL: gpt-4o
GOOSE_PROVIDER: openai
extensions:
  developer:
    bundled: true
    display_name: Developer
    enabled: true
    name: developer
    timeout: 300
    type: builtin
```
If the third option above were implemented, the generated file would instead look like this:
```yaml
GOOSE_MODEL: claude-sonnet-4-20250514
GOOSE_PROVIDER: anthropic
extensions:
  developer:
    bundled: true
    display_name: Developer
    enabled: true
    name: developer
    timeout: 300
    type: builtin
```
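The third option amounts to resolving goose's model settings from the `models` entry and using `openai`/`gpt-4o` only when no model is given. The sketch below is hypothetical: `goose_config_for` and the fallback policy are illustrative and are not metacoder's code; only the `GOOSE_MODEL`/`GOOSE_PROVIDER` keys and the developer extension block are copied from the `config.yaml` files shown above.

```python
"""Hypothetical sketch of the third option: derive goose config from `models`.

Assumption: function name and fallback policy are illustrative; the keys and the
developer extension block mirror the config.yaml files shown in this issue.
"""
from typing import Any, Optional

# Defaults observed in this issue; used only when no model entry is available.
FALLBACK_PROVIDER = "openai"
FALLBACK_MODEL = "gpt-4o"

DEVELOPER_EXTENSION = {
    "bundled": True,
    "display_name": "Developer",
    "enabled": True,
    "name": "developer",
    "timeout": 300,
    "type": "builtin",
}


def goose_config_for(model_spec: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Build goose config contents for one model entry, falling back only if none is given."""
    if model_spec:
        try:
            provider, model = model_spec["provider"], model_spec["name"]
        except KeyError as missing:
            # A partial entry is an error rather than a silent switch to openai.
            raise ValueError(f"model entry is missing {missing}") from None
    else:
        provider, model = FALLBACK_PROVIDER, FALLBACK_MODEL
    return {
        "GOOSE_MODEL": model,
        "GOOSE_PROVIDER": provider,
        "extensions": {"developer": dict(DEVELOPER_EXTENSION)},
    }


models = {"claude-sonnet": {"provider": "anthropic", "name": "claude-sonnet-4-20250514"}}
# Resolve each model from `models`, regardless of whether server_combinations is set.
for alias, spec in models.items():
    config = goose_config_for(spec)
    print(alias, config["GOOSE_PROVIDER"], config["GOOSE_MODEL"])
    # prints "claude-sonnet anthropic claude-sonnet-4-20250514"
```

Serializing the returned dict with PyYAML's `yaml.safe_dump`, which sorts keys by default, should yield the same keys and values as the expected `config.yaml` above. Raising on a partial model entry avoids pairing an Anthropic model name with the OpenAI provider.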