Emotion Detection: A Comparative Analysis

A comprehensive analysis of Machine Learning and Transformer models for multi-label emotion detection on the GoEmotions dataset.

This repository contains the code and resources for the project "Emotion Detection: A Comparative Analysis," which evaluates the performance of classic machine learning models against modern Transformer architectures. The key achievement of this project is a fine-tuned RoBERTa model that demonstrates high sensitivity with a state-of-the-art recall score.


🚀 Features

  • Comprehensive Analysis: Compares 4 different models (Logistic Regression, Random Forest, DistilBERT, and RoBERTa).
  • Multi-Label Classification: Tackles the complex task of predicting multiple emotions for a single piece of text.
  • In-Depth EDA: Includes detailed Exploratory Data Analysis on the GoEmotions dataset, highlighting the severe class imbalance.
  • High-Recall Model: The fine-tuned RoBERTa model achieves a weighted average recall of 0.66, demonstrating high sensitivity.
  • Code: All code is provided in easy-to-follow Jupyter/Colab notebooks.

📖 Methodology

The project workflow is divided into two parallel approaches:

  1. Classic Machine Learning Baseline:

    • Text is vectorized using TF-IDF.
    • Logistic Regression and Random Forest models are trained with a MultiOutputClassifier (a minimal baseline sketch follows this list).
  2. Transformer Fine-Tuning:

    • Text is tokenized using specific tokenizers for DistilBERT and RoBERTa.
    • The pre-trained models are fine-tuned on the GoEmotions dataset using PyTorch and Hugging Face (see the fine-tuning sketch after this list).
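
As a minimal sketch of the classic baseline (the example texts and the three-emotion label set below are illustrative placeholders, not the notebooks' actual data or code):

```python
# Minimal sketch of the TF-IDF + MultiOutputClassifier baseline.
# The texts and the three-emotion label set are placeholders for illustration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

texts = [
    "I am so happy for you!",
    "This is terrifying and sad.",
    "What a wonderful surprise, thank you!",
    "I can't believe this happened, I'm furious.",
]
# Multi-hot labels; the columns here stand for [joy, sadness, anger].
labels = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
])

baseline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", MultiOutputClassifier(LogisticRegression(max_iter=1000))),
])

baseline.fit(texts, labels)                         # one binary classifier per emotion
print(baseline.predict(["so happy and grateful"]))  # multi-hot prediction, e.g. [[1 0 0]]
```

The Random Forest baseline follows the same pattern with `RandomForestClassifier` in place of `LogisticRegression`.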
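
And a condensed sketch of the fine-tuning setup, assuming the Hugging Face `transformers` and `datasets` libraries. The checkpoint, sequence length, and training arguments are illustrative assumptions rather than the notebooks' exact configuration, and the dataset is loaded from the Hugging Face Hub here for self-containment (the notebooks work from GoEmotions.csv):

```python
# Condensed sketch of multi-label fine-tuning with Hugging Face Transformers.
# Checkpoint, max_length, and the training arguments are illustrative choices.
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

NUM_LABELS = 28  # GoEmotions taxonomy: 27 emotions + neutral
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # switches the loss to BCEWithLogitsLoss
)

def preprocess(batch):
    enc = tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
    # GoEmotions stores lists of label ids; convert them to float multi-hot vectors.
    multi_hot = np.zeros((len(batch["labels"]), NUM_LABELS), dtype=np.float32)
    for i, ids in enumerate(batch["labels"]):
        multi_hot[i, ids] = 1.0
    enc["labels"] = multi_hot.tolist()
    return enc

train_ds = load_dataset("go_emotions", "simplified", split="train")
train_ds = train_ds.map(preprocess, batched=True, remove_columns=train_ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roberta-goemotions",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train_ds,
)
trainer.train()
```

The DistilBERT run follows the same pattern with a different checkpoint (e.g. `distilbert-base-uncased`).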

📊 Results

The results clearly show the superiority of Transformer models. Our fine-tuned RoBERTa model achieved the best performance, most notably a high recall score, indicating its effectiveness at identifying emotions.

A key finding was the precision-recall trade-off, a direct consequence of the dataset's class imbalance. While our model excels at finding emotions (high recall), it sometimes over-predicts, leading to lower precision.
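
To make the trade-off concrete, lowering the decision threshold applied to the per-label probabilities raises recall at the cost of precision. The sketch below uses synthetic scores, not the project's actual predictions:

```python
# Illustration of the precision-recall trade-off when thresholding per-label
# probabilities; the scores are synthetic, not the project's real outputs.
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(2000, 28))                          # multi-hot ground truth
logits = 2.0 * (2 * y_true - 1) + rng.normal(0.0, 1.5, y_true.shape)  # noisy model scores
y_prob = 1.0 / (1.0 + np.exp(-logits))                                # sigmoid probabilities

for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_prob >= threshold).astype(int)
    p = precision_score(y_true, y_pred, average="weighted", zero_division=0)
    r = recall_score(y_true, y_pred, average="weighted", zero_division=0)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```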


πŸ› οΈ How to Use

To run this project, follow these steps:

  1. Clone the repository:

    git clone https://github.com/Cyber-Mood/Emotion-Detection-Project.git

  2. Install the dependencies:

    pip install -r requirements.txt
  3. Download the dataset: get the GoEmotions.csv dataset from Kaggle or from Google's GoEmotions GitHub repository.

  4. Run the notebooks: Open the files in the notebooks/ directory using Jupyter Notebook, JupyterLab, or Google Colab.


👥 Authors
