Project conducted in December 2024 (not uploaded at the time).
A machine learning pipeline that classifies customer sentiment across multiple product aspects in the e-commerce domain.
This project aims to extract structured insights from unstructured customer reviews collected from Hasaki.vn. By leveraging NLP and machine learning, it automatically identifies six key product aspects:
- Store
- Service
- Packaging
- Price
- Quality
- Others
Each review is classified by both aspect and sentiment polarity (Positive/Negative/Neutral), supporting customer experience analysis at a granular level.
Data was crawled using Selenium from product pages and review sections on Hasaki.vn.
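The crawl itself lives in `crawl_comment.ipynb`; the snippet below is only a minimal sketch of the Selenium flow, assuming a hypothetical product URL and a hypothetical `.item-comment` CSS selector for review blocks.

```python
# Minimal Selenium sketch of the crawl; the URL and CSS selector are placeholders,
# the real product IDs and selectors are handled in crawl_comment.ipynb.
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://hasaki.vn/san-pham/example-product.html")  # hypothetical product page

reviews = []
for block in driver.find_elements(By.CSS_SELECTOR, ".item-comment"):  # assumed selector
    reviews.append({"review_text": block.text})

driver.quit()

# Persist the raw reviews, mirroring data/data_crawl.xlsx produced by the notebook.
pd.DataFrame(reviews).to_excel("data/data_crawl.xlsx", index=False)
```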
A hybrid labeling approach was used:
- Gemini API: for automated tagging of aspects and sentiments.
- Manual verification: to improve reliability.
Each review may be linked to multiple aspects, each with a corresponding sentiment label.
- Languages: Python
- Libraries: pandas, numpy, scikit-learn, gensim, tensorflow, keras
- Embedding: Word2Vec
- Model persistence: joblib, pickle
- Development: Google Colab (please update file paths if running locally)
- File: `crawl_comment.ipynb`
  - Crawls product IDs and the corresponding customer reviews
  - Saves raw review data to `data/data_crawl.xlsx`
- File: `code_label_gemini_api.ipynb`
  - Uses the Gemini API to assign:
    - Aspect tags: e.g., Service, Packaging, ...
    - Sentiment: Positive, Negative, Neutral
  - Outputs the structured file `data/data_label.xlsx` with one-hot encoded aspect columns (see the labeling sketch below)
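As a rough illustration of that notebook, the sketch below calls the Gemini API through the `google-generativeai` client; the prompt wording, model name, and response parsing are assumptions, not copied from `code_label_gemini_api.ipynb`.

```python
# Hedged sketch of the automated labeling step; prompt, model name, and output
# format are assumptions. Manual verification still happens afterwards.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")            # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

ASPECTS = ["Store", "Service", "Packaging", "Price", "Quality", "Others"]

def label_review(review_text: str) -> str:
    prompt = (
        "Classify this Vietnamese product review.\n"
        f"Aspects: {', '.join(ASPECTS)}.\n"
        "For each aspect that is mentioned, answer 'Aspect: Positive/Negative/Neutral', one per line.\n"
        f"Review: {review_text}"
    )
    response = model.generate_content(prompt)
    return response.text  # parsed into one-hot aspect columns downstream

# "Fast delivery, careful packaging, but the price is a bit high."
print(label_review("Giao hàng nhanh, đóng gói cẩn thận nhưng giá hơi cao."))
```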
- File: `data_preprocessing.ipynb`
  - Includes:
    - Lowercasing, punctuation/special character removal
    - Emoji replacement
    - Tokenization using Vietnamese rules
    - Stopword removal using a curated `.txt` list
  - Output: cleaned review sequences in `data/data_preprocess.xlsx` (see the preprocessing sketch below)
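A compressed view of those steps, assuming `pyvi` for Vietnamese word segmentation, a small illustrative emoji map, and a hypothetical stopword file name; the notebook's actual tools and lists may differ.

```python
# Preprocessing sketch; the tokenizer choice (pyvi), emoji map, and stopword file
# path are assumptions standing in for the notebook's real resources.
import re
import string
from pyvi import ViTokenizer

EMOJI_MAP = {"❤️": "tích_cực", "😡": "tiêu_cực"}  # illustrative emoji -> word map

with open("vietnamese_stopwords.txt", encoding="utf-8") as f:  # hypothetical file name
    STOPWORDS = {line.strip() for line in f}

def preprocess(text: str) -> str:
    text = text.lower()
    for emoji, word in EMOJI_MAP.items():                 # emoji replacement
        text = text.replace(emoji, f" {word} ")
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)  # strip punctuation
    tokens = ViTokenizer.tokenize(text).split()           # Vietnamese word segmentation
    return " ".join(t for t in tokens if t not in STOPWORDS)
```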
Example wordcloud:
- Trains a custom Word2Vec model
- Tokenizes input sequences using the Keras Tokenizer
- Produces (see the embedding sketch below):
  - `word2vec_sentiment.model` (embedding)
  - `tokenizer.pkl` (used for training the neural networks)
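The sketch below shows how these two artifacts could be produced with gensim and the Keras `Tokenizer`; the hyperparameters and the `review_text` column name are assumptions.

```python
# Embedding/tokenizer sketch; vector_size, window, min_count, and the column name
# "review_text" are assumptions, not the notebook's exact settings.
import pickle
import pandas as pd
from gensim.models import Word2Vec
from tensorflow.keras.preprocessing.text import Tokenizer

df = pd.read_excel("data/data_preprocess.xlsx")
sentences = [str(t).split() for t in df["review_text"]]

# Train a custom Word2Vec embedding on the cleaned reviews.
w2v = Word2Vec(sentences, vector_size=100, window=5, min_count=2, workers=4)
w2v.save("embedding_model_file/word2vec_sentiment.model")

# Fit the Keras tokenizer that maps words to integer ids for the neural networks.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(df["review_text"].astype(str))
with open("embedding_model_file/tokenizer.pkl", "wb") as f:
    pickle.dump(tokenizer, f)
```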
- Folder `model_code/` contains 6 training notebooks (1 per aspect)
- Models trained:
  - Logistic Regression
  - SVM
  - Random Forest
  - Neural Network
- Best-performing model selected per aspect
- Trained `.joblib` models stored in `model_file/` (see the training sketch below)
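A sketch of one such per-aspect training run, assuming averaged Word2Vec vectors as features and hypothetical column/file names (`review_text`, `aspect_quality`, `aspect_quality_rf.joblib`); the notebooks in `model_code/` may build features differently and also train the Logistic Regression, SVM, and neural-network variants.

```python
# Per-aspect training sketch (Random Forest shown); feature construction and names
# are assumptions, not the exact recipe used in model_code/.
import joblib
import numpy as np
import pandas as pd
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_excel("data/data_preprocess.xlsx")
w2v = Word2Vec.load("embedding_model_file/word2vec_sentiment.model")

def review_vector(text: str) -> np.ndarray:
    # Average the Word2Vec vectors of in-vocabulary tokens.
    vecs = [w2v.wv[t] for t in str(text).split() if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.vstack([review_vector(t) for t in df["review_text"]])  # hypothetical column
y = df["aspect_quality"]                                      # hypothetical label column

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

joblib.dump(clf, "model_file/aspect_quality_rf.joblib")       # hypothetical filename
```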
- Evaluation performed with the following metrics (a scikit-learn sketch follows the list):
  - Accuracy
  - Precision / Recall / F1-score
  - Confusion Matrix
  - Classification Report
  - ROC Curve
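Continuing the hypothetical split and classifier (`clf`, `X_test`, `y_test`) from the training sketch above, these metrics come straight from scikit-learn:

```python
# Evaluation sketch; reuses clf / X_test / y_test from the training sketch above.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))       # confusion matrix
print(classification_report(y_test, y_pred))  # precision / recall / F1-score per class
# ROC curves for the multi-class labels would additionally use label_binarize
# together with clf.predict_proba.
```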
Example for aspect_quality:
Summary of best models per aspect:

| Aspect    | Model          | Accuracy |
|-----------|----------------|----------|
| Service   | Neural Network | 92.3%    |
| Store     | Random Forest  | 90.1%    |
| Packaging | Random Forest  | 89.72%   |
| Others    | Random Forest  | 88.23%   |
| Price     | Neural Network | 85.4%    |
Result from running `main.ipynb` with a Vietnamese review input:
- Run the following in order (on Colab):
  1. `crawl_comment.ipynb`
  2. `code_label_gemini_api.ipynb`
  3. `data_preprocessing.ipynb`
  4. Any training notebook in `model_code/`
  5. `main.ipynb`
- Download `embedding_model_file/` and `model_file/`
- Update paths in `main.ipynb`
- Input a Vietnamese review → returns aspect-level sentiment (see the inference sketch below)
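For orientation, this is roughly what the inference step in `main.ipynb` amounts to; the per-aspect model filenames and the feature helper are assumptions carried over from the earlier sketches, and the review is expected to be preprocessed first.

```python
# Inference sketch mirroring main.ipynb; filenames and feature construction are
# assumptions, and the review is assumed to be already preprocessed.
import joblib
import numpy as np
from gensim.models import Word2Vec

ASPECTS = ["store", "service", "packaging", "price", "quality", "others"]

w2v = Word2Vec.load("embedding_model_file/word2vec_sentiment.model")

def review_vector(text: str) -> np.ndarray:
    vecs = [w2v.wv[t] for t in text.split() if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

# "Fast delivery but the product does not match the description."
review = "giao hàng nhanh nhưng sản phẩm không giống mô tả"
features = review_vector(review).reshape(1, -1)

for aspect in ASPECTS:
    # Hypothetical per-aspect filenames; the repo keeps one .joblib model per aspect.
    model = joblib.load(f"model_file/aspect_{aspect}_best.joblib")
    print(aspect, "→", model.predict(features)[0])
```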
- Applied aspect-based sentiment analysis to real-world e-commerce data
- Trained and compared multiple classification models
- Built a functional inference pipeline ready for deployment
- Practiced crawling, labeling, NLP preprocessing, model evaluation, and modular code organization