diff --git a/docs/getstarted/evals.md b/docs/getstarted/evals.md
index 249f03928..a04851dd0 100644
--- a/docs/getstarted/evals.md
+++ b/docs/getstarted/evals.md
@@ -62,7 +62,7 @@ choose_evaluator_llm.md
 
 **Evaluation**
 
-Here we will use [AspectCritic](../concepts/metrics/available_metrics/aspect_critic.md), which an LLM based metric that outputs pass/fail given the evaluation criteria.
+Here we will use [AspectCritic](../concepts/metrics/available_metrics/aspect_critic.md), which is an LLM-based metric that outputs pass/fail given the evaluation criteria.
 
 ```python
 
@@ -148,8 +148,8 @@ Output
 {'summary_accuracy': 0.84}
 ```
 
-This score shows that out of all the samples in our test data, only 84% of summaries passes the given evaluation criteria. Now, **It
-s important to see why is this the case**.
+This score shows that out of all the samples in our test data, only 84% of summaries pass the given evaluation criteria. Now, **it's
+important to understand why this is the case**.
 
 Export the sample level scores to pandas dataframe
 
@@ -187,4 +187,4 @@ If you want help with improving and scaling up your AI application using evals.
 
 ## Up Next
 
-- [Evaluate a simple RAG application](rag_eval.md)
\ No newline at end of file
+- [Evaluate a simple RAG application](rag_eval.md)