RAG systems (classical - graph - agentic) evaluation

Systems being compared

Results

On the "huggingface 10 docs" evaluation dataset:

Framework	Config	LLM	Version description	Normalized accuracy
llama-index	experiments/llama_index/out-of-the-box	Llama-3-70B-Instruct	Does not use embeddings (default behaviour without using a graphDB)	74.7%
llama-index	experiments/llama_index/out-of-the-box-neo4j	Mistral 7B Instruct, 4-bit quantized (AWQ)	Out-of-the-box config, plus Neo4j to enable embeddings, with the LLM switched to a 7B model	81%
lightrag	experiments/lightrag/out-of-the-box	Llama-3-70B-Instruct	Default configuration, with embeddings	81%

The evaluation is based on the following Hugging Face cookbook: https://huggingface.co/learn/cookbook/en/rag_evaluation
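
For readers wondering what "normalized accuracy" might mean here, the following is a minimal sketch. It assumes this repository follows the cookbook's approach, where a judge LLM rates each answer on a 1 to 5 scale and the ratings are rescaled to the 0 to 1 range before averaging. The function name `normalized_accuracy` and the sample scores are hypothetical and are not taken from this repository.

```python
from statistics import mean


def normalized_accuracy(scores, lo=1, hi=5):
    """Rescale judge scores from [lo, hi] to [0, 1] and return their mean.

    Assumption: scores are integer ratings on a 1-5 scale, as in the
    Hugging Face RAG evaluation cookbook. Adjust lo/hi for other scales.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return mean((s - lo) / (hi - lo) for s in scores)


# Made-up judge ratings, for illustration only (not real results).
judge_scores = [5, 4, 5, 3, 5]
print(f"{normalized_accuracy(judge_scores):.1%}")  # prints 85.0%
```

With this rescaling, a rating of 1 contributes 0%, a rating of 5 contributes 100%, and the reported figure is the average over all evaluated questions.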
