This repository supports research and experimentation on understanding and mitigating hallucinations in embeddings, specifically the ways embeddings can fail to capture human-like understanding of language.
- Compare Embeddings: Measure similarity between sentences using cosine similarity (or other metrics such as dot product and Euclidean distance).
- Fine-Tune Embeddings: Fine-tune SentenceTransformer models to reduce hallucinations and improve semantic understanding.
 
```
.
├── data/                       # Training, validation, and test data
├── fine-tuning/
│   ├── embedding-fine-tune.py  # Fine-tunes embedding models
│   └── eval.py                 # Evaluates and compares model embeddings
├── outputs/                    # Outputs of similarity scoring between sentence pairs
├── results/                    # Results of evaluation and comparisons
├── requirements.txt            # Python dependencies
└── README.md                   # Project documentation
```
You can create an environment using any of the following:
Using venv:

```bash
python -m venv halluc-env
source halluc-env/bin/activate  # On Windows: halluc-env\Scripts\activate
pip install -r requirements.txt
```

Using conda:

```bash
conda create -n halluc-env python=3.10
conda activate halluc-env
pip install -r requirements.txt
```

Using uv:

```bash
uv venv halluc-env
source halluc-env/bin/activate
uv pip install -r requirements.txt
```

Create a `.env` file in the root directory to set environment variables for your project. This is useful for managing sensitive information such as API keys and Azure OpenAI-related details.
```
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_API_KEY=
API_VERSION=2024-10-21
AZURE_DEPLOYMENT=
MODEL_NAME=
TEMPERATURE=0.0
```
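To read these values in Python at runtime, a minimal sketch could look like the following, assuming the python-dotenv package is available (check requirements.txt for the packages this project actually pins):

```python
# Sketch of loading .env configuration; assumes python-dotenv is installed.
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory

endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
api_key = os.getenv("AZURE_OPENAI_API_KEY")
api_version = os.getenv("API_VERSION", "2024-10-21")
temperature = float(os.getenv("TEMPERATURE", "0.0"))
```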
Fine-tune a SentenceTransformer model using the provided training data to reduce hallucinations:

```bash
python ./fine-tuning/embedding-fine-tune.py
```
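For orientation, fine-tuning a SentenceTransformer on labeled sentence pairs typically follows the pattern sketched below. The base model, toy data, save path, and hyperparameters here are illustrative assumptions, not the actual settings used by embedding-fine-tune.py:

```python
# Illustrative SentenceTransformer fine-tuning on (sentence_a, sentence_b,
# similarity) pairs; placeholders only, see embedding-fine-tune.py for the
# project's real configuration.
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed base model

# Toy training pairs with similarity labels in [0, 1].
train_examples = [
    InputExample(texts=["A man is eating food.", "A man is eating a meal."], label=0.9),
    InputExample(texts=["A man is eating food.", "The sky is blue."], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

# One pass over the toy data; real training would draw from the data/ splits.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
model.save("models/fine-tuned-minilm")  # hypothetical save path
```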
Compare the fine-tuned model against a foundational model using the evaluation datasets:

```bash
python ./fine-tuning/eval.py
```

This writes the evaluation results to the results/ directory.
Use the sentence similarity comparison utility to find the semantic similarity between any two sentences. Cosine similarity is used by default.
Results are stored in the outputs/ directory.
Currently implemented:
- ✅ Cosine Similarity (default)
 
You can easily switch to other metrics such as:
- Dot Product
- Euclidean Distance
 
These metrics are applied to the sentence embeddings generated by SentenceTransformer models.
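For illustration, the snippet below scores one sentence pair with all three metrics. The model name all-MiniLM-L6-v2 is an assumption, not necessarily the model the comparison utility loads:

```python
# Sketch of scoring a sentence pair with cosine similarity (default),
# dot product, and Euclidean distance; the model is a placeholder.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(
    ["The cat sat on the mat.", "A feline rested on the rug."],
    convert_to_tensor=True,
)

cosine = util.cos_sim(emb[0], emb[1]).item()   # default metric
dot = util.dot_score(emb[0], emb[1]).item()    # dot product
euclidean = (emb[0] - emb[1]).norm().item()    # Euclidean distance

print(f"cosine={cosine:.4f}  dot={dot:.4f}  euclidean={euclidean:.4f}")
```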
- data/: Contains training, validation, and test sets used for fine-tuning.
- results/: Contains evaluation output comparing foundational and fine-tuned models.
- outputs/: Contains similarity scores between sentence pairs.
This project is part of research for the paper:
"Hallucination by Design: How Embeddings Fail Understanding Human Language"
It explores:
- Where and why embeddings hallucinate
- How fine-tuning helps mitigate such hallucinations
- Benchmarks for measuring improvements
 
This project is released under the MIT License.
Feel free to fork, experiment, and contribute via pull requests or discussions.
- Sentence-Transformers
- The open-source community supporting transparency in embedding evaluation and interpretability