Emotion architecture from Reddit comments: rater behavior, semantic clusters, and contradiction mapping in GoEmotions.

a-neti-neti/goemotions-eda-annotation-diagnostics

Emotion Architecture in Reddit Comments

A Diagnostic and Predictive Exploration of Crowdsourced Emotional Labels

Project Overview

This project explores emotional patterns, rater bias, and label reliability in the GoEmotions dataset using Python and data science tools.

The EDA investigates:

  • Annotation inconsistency and “neutral spamming”
  • Emotion co-occurrence and correlation
  • Contradictions between labels and textual signals
  • Predictive performance using TF-IDF + Logistic Regression
  • Semantic structure using Word2Vec + t-SNE
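To make the first two checks concrete, here is a minimal, dependency-free sketch on toy data. It is not the notebook's code: the row layout (`id`, `rater_id`, `labels`), the toy rows, and the helper names `neutral_rate_by_rater` and `cooccurrence_counts` are all assumptions made for illustration. "Neutral spamming" is read here as a rater assigning only `neutral` to most of the comments they rate.

```python
from collections import Counter
from itertools import combinations

# Toy annotations, one row per (comment, rater) pair.
# The field names are assumptions loosely modelled on a raw annotation table.
ROWS = [
    {"id": "c1", "rater_id": 1, "labels": {"joy", "admiration"}},
    {"id": "c1", "rater_id": 2, "labels": {"neutral"}},
    {"id": "c2", "rater_id": 1, "labels": {"anger", "annoyance"}},
    {"id": "c2", "rater_id": 2, "labels": {"neutral"}},
    {"id": "c3", "rater_id": 2, "labels": {"neutral"}},
    {"id": "c3", "rater_id": 3, "labels": {"joy", "admiration"}},
]


def neutral_rate_by_rater(rows):
    """Share of each rater's annotations that are 'neutral' and nothing else."""
    total = Counter()
    neutral_only = Counter()
    for row in rows:
        total[row["rater_id"]] += 1
        if row["labels"] == {"neutral"}:
            neutral_only[row["rater_id"]] += 1
    return {rater: neutral_only[rater] / total[rater] for rater in total}


def cooccurrence_counts(rows):
    """Count how often two emotions appear together in a single annotation."""
    pairs = Counter()
    for row in rows:
        # Sort so each pair is stored once, in alphabetical order.
        for first, second in combinations(sorted(row["labels"]), 2):
            pairs[(first, second)] += 1
    return pairs


if __name__ == "__main__":
    print(neutral_rate_by_rater(ROWS))
    print(cooccurrence_counts(ROWS).most_common())
```

A rater whose neutral-only rate sits far above the rest is a candidate for closer inspection. On the real data the same logic would more naturally be written as a grouped aggregation over a data frame.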

This project was presented as the final EDA project at TovTech.


Main Notebook


Key Visual

Average Emotion Scores by Number of Raters per Comment
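One plausible way to compute the quantity in this figure's title is sketched below. This is an assumption about what the plot shows, not the notebook's actual method: each comment's "score" for an emotion is taken as the share of its raters who chose that emotion, and scores are then averaged over all comments with the same number of raters. The toy rows and the name `mean_scores_by_rater_count` are hypothetical.

```python
from collections import defaultdict

# Toy annotations as (comment_id, rater_id, labels) tuples (illustrative only).
ROWS = [
    ("c1", 1, {"joy"}),
    ("c1", 2, {"joy"}),
    ("c1", 3, {"neutral"}),
    ("c2", 1, {"anger"}),
    ("c2", 2, {"neutral"}),
    ("c3", 4, {"joy"}),
    ("c3", 5, {"neutral"}),
]


def mean_scores_by_rater_count(rows):
    """Average per-comment emotion shares, grouped by raters per comment."""
    # Collect every rater's label set for each comment.
    labels_by_comment = defaultdict(list)
    for comment_id, _rater_id, labels in rows:
        labels_by_comment[comment_id].append(labels)

    share_sums = defaultdict(lambda: defaultdict(float))
    comments_per_group = defaultdict(int)
    for label_sets in labels_by_comment.values():
        n_raters = len(label_sets)
        comments_per_group[n_raters] += 1
        for labels in label_sets:
            for emotion in labels:
                # Each rater contributes 1/n_raters to the comment's share.
                share_sums[n_raters][emotion] += 1 / n_raters

    # Divide by the number of comments in each rater-count group.
    return {
        n_raters: {
            emotion: total / comments_per_group[n_raters]
            for emotion, total in emotions.items()
        }
        for n_raters, emotions in share_sums.items()
    }


if __name__ == "__main__":
    for n_raters, scores in sorted(mean_scores_by_rater_count(ROWS).items()):
        print(n_raters, scores)
```

Emotions that no rater chose in a group are simply absent from that group's dictionary, which is equivalent to a score of zero.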


DataCamp Learning Experience

As part of this EDA, I studied statistics, data science, and modeling using the DataCamp platform. I learned to:

  • Apply diagnostic thinking to real-world datasets
  • Identify annotation bias and contradictions
  • Use statistical tools to interpret NLP structures
  • Process findings clearly under pressure

Despite a nontraditional background, this project helped me realize the power of clean logic, ethical scrutiny, and practical data skills.


Dataset


Contact

Feel free to reach out or fork this repo if you're interested in:

  • Emotion AI quality control
  • Annotator profiling
  • Ethical NLP systems