Commit f14c926

Flow Judge Integration (#275)
* haystack integration docs
* renamed integration file
* updated logo
* fix yaml error
* modified doc
* Flow Judge integration
* flow-judge integration
* udpate installation
* flow-judge integration
* flow-judge integration
* flow-judge integration
* flow-judge integration
1 parent 5d48c36 commit f14c926

2 files changed: +144 -0 lines changed

integrations/flow-judge.md

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
---
layout: integration
name: Flow Judge
description: Evaluate Haystack pipelines using Flow Judge
authors:
  - name: Flow AI
    socials:
      github: flowaicom
      twitter: flowaicom
      linkedin: https://www.linkedin.com/company/flowaicom/
pypi: https://pypi.org/project/flow-judge/
repo: https://github.com/flowaicom/flow-judge
type: Evaluation Framework
report_issue: https://github.com/flowaicom/flow-judge/issues
logo: /logos/flow-ai.png
version: Haystack 2.0
toc: true
---
### **Table of Contents**
- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
- [License](#license)

## Overview
This integration allows you to evaluate Haystack pipelines using Flow Judge.

Flow Judge is an open-source, lightweight (3.8B-parameter) language model optimized for LLM system evaluations. It is built for accuracy, speed, and customization.

Read the technical report [here](https://www.flow-ai.com/blog/flow-judge).

## Installation

To run Flow Judge with the vLLM engine:
```bash
pip install 'flow-judge[vllm]'
pip install 'flash_attn>=2.6.3' --no-build-isolation
```
To run Flow Judge with Hugging Face Transformers:
```bash
pip install 'flow-judge[hf]'
```
To use Flash Attention with Transformers, also install:
```bash
pip install 'flash_attn>=2.6.3' --no-build-isolation
```
To run Flow Judge with Llamafile on macOS:
```bash
pip install 'flow-judge[llamafile]'
pip install 'flash_attn>=2.6.3' --no-build-isolation
```
To learn more about installation, visit the [Flow Judge page on PyPI](https://pypi.org/project/flow-judge/).

Finally, install Haystack:
```bash
pip install haystack-ai
```

## Usage
The Flow Judge integration lets you evaluate Haystack pipelines by running Flow Judge as an evaluator component inside your Haystack workflows.

Flow Judge offers a set of built-in metrics and supports easy-to-create custom metrics.

### Available Built-in Metrics

Each built-in metric comes in three scoring scales (binary, 3-point Likert, and 5-point Likert):
- Response Correctness
- Response Faithfulness
- Response Relevance
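
Because the three scales have different ranges, scores produced with different presets are not directly comparable. The sketch below shows one way to map them onto a common 0 to 1 range with min-max scaling. The score ranges used here (0 to 1 for binary, 1 to 3 and 1 to 5 for the Likert scales) and the `normalize_score` helper are assumptions made for illustration and are not part of Flow Judge, so check the rubric of the metric you actually use.

```python
# Hypothetical helper, not part of Flow Judge.
# Assumed score ranges per scale; verify them against the metric's rubric.
SCALE_RANGES = {
    "binary": (0, 1),
    "3point": (1, 3),
    "5point": (1, 5),
}


def normalize_score(score: int, scale: str) -> float:
    """Map a raw score onto [0.0, 1.0] using min-max scaling."""
    low, high = SCALE_RANGES[scale]
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {scale} range {low}-{high}")
    return (score - low) / (high - low)


print(normalize_score(1, "binary"))
print(normalize_score(2, "3point"))
print(normalize_score(4, "5point"))
```
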
70+
71+
To check the available metrics you can run:
72+
```python
73+
from flow_judge.metrics import list_all_metrics
74+
list_all_metrics()
75+
```
76+
77+
While these preset metrics provide a solid foundation for evaluation, the true power of Flow Judge lies in its ability to create custom metrics tailored to your specific requirements. This flexibility allows for a more nuanced and comprehensive assessment of your LLM systems. Please refer to our [tutorial](https://github.com/flowaicom/flow-judge/blob/main/examples/2_custom_evaluation_criteria.ipynb) for creating custom metrics for more details.
78+
### Components
This integration introduces the `HaystackFlowJudge` component, which is used like other evaluator components in Haystack.

For details about the use and parameters of this component, refer to the [HaystackFlowJudge class](https://github.com/flowaicom/flow-judge/blob/main/flow_judge/integrations/haystack.py) and Haystack's [LLMEvaluator component](https://docs.haystack.deepset.ai/reference/evaluators-api#module-llm_evaluator).

### Use Flow Judge with Haystack
We have written a [guide](https://github.com/flowaicom/flow-judge/blob/main/examples/5_evaluate_haystack_rag_pipeline.ipynb) to using Flow Judge with Haystack. The tutorial demonstrates how to evaluate a RAG pipeline built with Haystack using Flow Judge.
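
The evaluator in the quick example on this page takes parallel lists under the keys `query`, `context`, and `response`. If your evaluation samples are stored as one record per sample, a small helper can reshape them. This is only a sketch: the `to_evaluator_inputs` name and the record layout are assumptions made for illustration.

```python
# Hypothetical helper, not part of Flow Judge or Haystack.
def to_evaluator_inputs(samples: list[dict[str, str]]) -> dict[str, list[str]]:
    """Turn per-sample records into parallel lists keyed by input name."""
    keys = ("query", "context", "response")
    for i, sample in enumerate(samples):
        missing = [key for key in keys if key not in sample]
        if missing:
            raise KeyError(f"sample {i} is missing {missing}")
    return {key: [sample[key] for sample in samples] for key in keys}


# Made-up records for illustration
samples = [
    {"query": "q1", "context": "c1", "response": "r1"},
    {"query": "q2", "context": "c2", "response": "r2"},
]
inputs = to_evaluator_inputs(samples)
print(inputs["query"])
```

The resulting dictionary has the same shape as the inner dictionary passed to `eval_pipeline.run` in the quick example.
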

### Quick Example
The code snippet below is a minimal example of using Flow Judge with Haystack. For a deeper understanding of the concepts and implementation, we recommend following the full tutorial.

```python
from flow_judge.integrations.haystack import HaystackFlowJudge
from flow_judge.metrics.presets import RESPONSE_FAITHFULNESS_5POINT
from flow_judge import Hf

from haystack import Pipeline

# Create a model using Hugging Face Transformers with Flash Attention
model = Hf()  # Vllm and Llamafile are also supported

# Evaluation sample
questions = ["What is the termination clause in the contract?"]
contexts = ["This contract may be terminated by either party upon providing thirty (30) days written notice to the other party. In the event of a breach of contract, the non-breaching party may terminate the contract immediately."]
answers = ["The contract can be terminated by either party with thirty days written notice."]

# Define the HaystackFlowJudge evaluator using the built-in faithfulness metric.
# For parameters, refer to Haystack's LLMEvaluator
# (https://docs.haystack.deepset.ai/reference/evaluators-api#module-llm_evaluator)
# and the HaystackFlowJudge class.
ff_evaluator = HaystackFlowJudge(
    metric=RESPONSE_FAITHFULNESS_5POINT,
    model=model,
    progress_bar=True,
    raise_on_failure=True,
    save_results=True,
    fail_on_parse_error=False
)

# Set up the pipeline
eval_pipeline = Pipeline()

# Add components to the pipeline
eval_pipeline.add_component("ff_evaluator", ff_evaluator)

# Run the eval pipeline
results = eval_pipeline.run(
    {
        "ff_evaluator": {
            'query': questions,
            'context': contexts,
            'response': answers,
        }
    }
)

# Print eval results
for result in results['ff_evaluator']['results']:
    score = result['score']
    feedback = result['feedback']
    print(f"Score: {score}")
    print(f"Feedback: {feedback}\n")
```
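
When evaluating many samples, aggregating the scores is often more useful than printing each one. The sketch below assumes each result is a dict with a numeric `score` and a `feedback` string, as in the loop above. The `summarize` helper, the threshold, and the sample data are made up for illustration.

```python
from statistics import mean


# Hypothetical helper, not part of Flow Judge.
def summarize(results: list[dict], threshold: int = 3) -> dict:
    """Return the mean score and the indices of samples scoring below threshold."""
    scores = [result["score"] for result in results]
    return {
        "mean_score": mean(scores),
        "low_scoring": [i for i, score in enumerate(scores) if score < threshold],
    }


# Made-up results in the same shape as results['ff_evaluator']['results']
example_results = [
    {"score": 5, "feedback": "Fully supported by the context."},
    {"score": 2, "feedback": "Contains claims not found in the context."},
    {"score": 4, "feedback": "Mostly supported by the context."},
]
summary = summarize(example_results)
print(summary)
```

The `low_scoring` indices point back into the input lists, so you can look up the corresponding question, context, and answer for manual review.
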

## License
The code is licensed under the [Apache 2.0 license](https://github.com/flowaicom/flow-judge/blob/main/LICENSE).

logos/flow-ai.png

23.1 KB