This repository contains the solution for Task 5, focused on multi-class text classification of US consumer finance complaints. The code builds an end-to-end machine learning pipeline for pre-processing, feature extraction, model training, evaluation, and prediction using the consumer_complaints.csv dataset from Kaggle.
Dataset Download
- Downloaded via Kaggle CLI: kaggle/us-consumer-finance-complaints
- Contains open US consumer complaint narrative texts and product category labels.
Dependencies
- Install via:
pip install pandas scikit-learn matplotlib nltk seaborn
Code
- Complete pipeline in consumer_complaint_classification.py
- Key steps: data cleaning, TF-IDF extraction, four ML models (LR, SVM, RF, NB), evaluation.
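The key steps can be illustrated with a minimal sketch. This is not the code from consumer_complaint_classification.py: the `clean_text` helper, the toy complaint texts, the model hyperparameters, and the choice of `LinearSVC` for the SVM are all assumptions made for illustration, and the script itself may differ.

```python
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def clean_text(text):
    """Hypothetical cleaning step: lowercase, keep letters, collapse spaces."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


# Made-up toy examples so the sketch runs without the CSV (not real data).
texts = [
    "My credit card was charged twice for one purchase!",
    "The credit card company added a late fee I never owed.",
    "Mortgage servicer lost my escrow payment.",
    "My mortgage payment was applied to the wrong loan.",
    "A debt collector keeps calling about a debt I paid.",
    "Debt collector threatened me over an old debt.",
]
labels = [
    "Credit card", "Credit card",
    "Mortgage", "Mortgage",
    "Debt collection", "Debt collection",
]

# The four model families named above; settings are illustrative assumptions.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "NB": MultinomialNB(),
}

cleaned = [clean_text(t) for t in texts]
fitted = {}
for name, model in models.items():
    # TF-IDF features feed directly into each classifier.
    pipeline = make_pipeline(TfidfVectorizer(), model)
    pipeline.fit(cleaned, labels)
    fitted[name] = pipeline

query = clean_text("Collector called about a debt again")
predictions = {name: p.predict([query])[0] for name, p in fitted.items()}
print(predictions)
```

Wrapping the vectorizer and classifier in one pipeline means the same TF-IDF vocabulary learned during training is reused at prediction time, which avoids mismatched feature columns.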
How to Run
- Place consumer_complaints.csv and the script in the project root.
- Run:
python consumer_complaint_classification.py
Below are the saved results and evaluation visualizations, each included as a PNG from the screenshots/ folder. Every screenshot shows the system date/time and my username in the window for verification.
- Model metrics (accuracy, precision, recall, F1) for all classifiers as output by the classification report.
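For readers unfamiliar with these metrics, the following dependency-free sketch shows how accuracy and per-class precision, recall, and F1 are derived from predictions. The function names and the tiny label lists are hypothetical and chosen only for illustration; the script presumably obtains the same quantities from scikit-learn's classification report.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed from raw counts."""
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Made-up labels for illustration only (not results from this project).
y_true = ["Mortgage", "Mortgage", "Credit card", "Debt collection"]
y_pred = ["Mortgage", "Credit card", "Credit card", "Debt collection"]

print("accuracy:", accuracy(y_true, y_pred))
for label, (p, r, f1) in per_class_metrics(y_true, y_pred).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```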
Confusion matrices for each classifier, allowing visual inspection of prediction breakdown for each product category:
- Logistic Regression:
- Naive Bayes:
- Random Forest:
- SVM:
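As a reading aid for the matrices above, here is a small sketch of how a confusion matrix is built, with rows as true categories and columns as predicted categories (the same orientation scikit-learn's `confusion_matrix` uses). The helper below and its example labels are hypothetical and are not taken from the project script.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels, columns predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Made-up labels for illustration only (not results from this project).
labels = ["Credit card", "Debt collection", "Mortgage"]
y_true = ["Mortgage", "Mortgage", "Credit card", "Debt collection"]
y_pred = ["Mortgage", "Credit card", "Credit card", "Debt collection"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>16}: {row}")
```

Off-diagonal entries are the misclassifications: in this toy case, the last row shows one Mortgage complaint predicted as Credit card.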
Sample output printout and screenshot of the best-performing model selection, along with its prediction evidence.
- consumer_complaint_classification.py — Complete ML pipeline with all steps.
- consumer_complaints.csv — Dataset from Kaggle.
Final Year B.Tech CCE
Shyam Anand
October 2025
This project was submitted for the Kaiburr Assessment 2025.
