Commit 356d6bf

feat: ragas evals CLI (#2086)
```bash
❯ ragas evals test_app/evals/app_eval.py --dataset rag_dataset --metrics accuracy,fail_or_pass
Running evaluation: test_app/evals/app_eval.py
Dataset: rag_dataset
Getting dataset: rag_dataset
✓ Loaded dataset with 30 rows
✓ Completed experiments successfully
╭────────────────────────── Ragas Evaluation Results ──────────────────────────╮
│ Experiment: vibrant_naur                                                     │
│ Dataset: rag_dataset (30 rows)                                               │
╰──────────────────────────────────────────────────────────────────────────────╯
Numerical Metrics
┏━━━━━━━━━━┳━━━━━━━━━┓
┃ Metric   ┃ Current ┃
┡━━━━━━━━━━╇━━━━━━━━━┩
│ accuracy │ 0.933   │
└──────────┴─────────┘
Categorical Metrics
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┓
┃ Metric       ┃ Category ┃ Current ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━┩
│ fail or pass │ fail     │ 26      │
│              │ pass     │ 4       │
└──────────────┴──────────┴─────────┘
✓ Experiment results displayed
✓ Evaluation completed successfully
```

```bash
❯ ragas evals test_app/evals/app_eval.py --dataset rag_dataset --metrics accuracy,fail_or_pass --baseline suspicious_babbage
Running evaluation: test_app/evals/app_eval.py
Dataset: rag_dataset
Baseline: suspicious_babbage
Getting dataset: rag_dataset
✓ Loaded dataset with 30 rows
✓ Completed experiments successfully
Comparing against baseline: suspicious_babbage
╭────────────────────────── Ragas Evaluation Results ──────────────────────────╮
│ Experiment: pedantic_mccarthy                                                │
│ Dataset: rag_dataset (30 rows)                                               │
│ Baseline: suspicious_babbage                                                 │
╰──────────────────────────────────────────────────────────────────────────────╯
Numerical Metrics
┏━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ Metric   ┃ Current ┃ Baseline ┃ Delta  ┃ Gate ┃
┡━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│ accuracy │ 0.900   │ 1.000    │ ▼0.100 │ fail │
└──────────┴─────────┴──────────┴────────┴──────┘
Categorical Metrics
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ Metric       ┃ Category ┃ Current ┃ Baseline ┃ Delta ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ fail or pass │ fail     │ 26      │ 25       │ ▲1    │
│              │ pass     │ 4       │ 5        │ ▼1    │
└──────────────┴──────────┴─────────┴──────────┴───────┘
✓ Comparison completed
✓ Evaluation completed successfully
```

---------

Co-authored-by: jjmachan <jamesjithin97@gmail.com>
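The `--baseline` run in the commit message reports a signed delta per metric and a gate for the numerical one. The arithmetic behind those columns can be sketched roughly as below. This is only an illustration, not the code added by this commit: the function names `numeric_delta` and `categorical_delta`, the "fail on any regression" gate rule, and the `=` marker for an unchanged value are all assumptions made for the example.

```python
from __future__ import annotations

from collections import Counter


def numeric_delta(current: float, baseline: float) -> tuple[str, str]:
    """Return (delta label, gate) for one numerical metric.

    Hypothetical helper: the gate rule here (fail on any drop below the
    baseline) is an assumption, not taken from the commit.
    """
    diff = current - baseline
    if diff > 0:
        arrow = "▲"
    elif diff < 0:
        arrow = "▼"
    else:
        arrow = "="  # assumed marker for "no change"
    gate = "fail" if diff < 0 else "pass"
    return f"{arrow}{abs(diff):.3f}", gate


def categorical_delta(current: list[str], baseline: list[str]) -> dict[str, int]:
    """Return per-category count differences (current minus baseline)."""
    cur, base = Counter(current), Counter(baseline)
    return {cat: cur[cat] - base[cat] for cat in sorted(set(cur) | set(base))}


# Values taken from the baseline comparison shown in the commit message.
print(numeric_delta(0.900, 1.000))
print(categorical_delta(["fail"] * 26 + ["pass"] * 4,
                        ["fail"] * 25 + ["pass"] * 5))
```

With the figures from the commit message, the numerical metric comes out as a `▼0.100` regression, and the categorical counts shift by one row from `pass` to `fail`.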
1 parent 8445350 commit 356d6bf

7 files changed: +492 -12 lines changed

experimental/pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -47,6 +47,9 @@ platform = "ragas_experimental.project.backends.platform:PlatformProjectBackend"
 include = ["ragas_experimental*"]
 exclude = ["site*", "old_nbs*", "experiments*", "_proc*", "build*", "dist*"]
 
+[project.scripts]
+ragas = "ragas_experimental.cli:app"
+
 [tool.setuptools_scm]
 root = ".." # Points to monorepo root, one directory up
 version_file = "ragas_experimental/_version.py" # Creates a version file
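
The `[project.scripts]` entry maps the command name `ragas` to an object reference of the form `module:attribute`. When the package is installed, a small launcher is generated that imports the module and calls that object. The sketch below approximates the lookup step using only the standard library. `resolve_entry_point` is a hypothetical helper written for this example, and `json:dumps` is a stand-in so the snippet runs without `ragas_experimental` installed.

```python
from importlib import import_module


def resolve_entry_point(spec: str):
    """Resolve a 'module:attr' reference to the object it names.

    Simplified approximation of what an installer-generated launcher
    does before calling the object; not the installer's actual code.
    """
    module_name, _, attr_path = spec.partition(":")
    obj = import_module(module_name)
    # The attribute part may be dotted, e.g. "pkg.mod:Class.method".
    for attr in attr_path.split(".") if attr_path else []:
        obj = getattr(obj, attr)
    return obj


# Stand-in reference; the entry added in this diff would be
# "ragas_experimental.cli:app".
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))  # prints {"ok": true}
```

With the entry from this diff, the same lookup would target `ragas_experimental.cli:app`, which is why `ragas evals ...` becomes available on the command line after installation.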
