
Machine learning pipeline for classifying consumer complaint texts into product categories, with code, results, and screenshots. Built for Kaiburr Assessment 2025.


ShyamAnand2/Kaiburr-Assessment-Task5


Kaiburr Assessment 2025 — Task 5: Consumer Complaint Classification

Overview

This repository contains the solution for Task 5: multi-class text classification of US consumer finance complaints. The code builds an end-to-end machine learning pipeline covering preprocessing, feature extraction, model training, evaluation, and prediction, using the consumer_complaints.csv dataset from Kaggle.


Steps

  1. Dataset Download

    • Downloaded via Kaggle CLI: kaggle/us-consumer-finance-complaints
    • Contains open US consumer complaint narratives and their product category labels.
  2. Dependencies

    • Install via:
      pip install pandas scikit-learn matplotlib nltk seaborn
      
  3. Code

    • Complete pipeline in consumer_complaint_classification.py
    • Key steps: data cleaning, TF-IDF feature extraction, training four ML models (Logistic Regression, SVM, Random Forest, Naive Bayes), and evaluation.
  4. How to Run

    • Place consumer_complaints.csv and the script in the project root.
    • Run:
      python consumer_complaint_classification.py
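
The steps above can be sketched roughly as follows. This is a minimal illustration and not the contents of consumer_complaint_classification.py: the `clean_text` and `build_models` helpers, the hyperparameters, and the four toy complaints are all assumptions made up for the example. It shows one TF-IDF plus classifier pipeline for each of the four model families, using scikit-learn.

```python
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def clean_text(text):
    """Lowercase, replace non-letters with spaces, collapse whitespace.

    Hypothetical cleaning step; the actual script may clean differently.
    """
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def build_models():
    """Return one TF-IDF + classifier pipeline per model family."""
    classifiers = {
        "LR": LogisticRegression(max_iter=1000),
        "SVM": LinearSVC(),
        "RF": RandomForestClassifier(n_estimators=50, random_state=0),
        "NB": MultinomialNB(),
    }
    return {
        name: make_pipeline(TfidfVectorizer(preprocessor=clean_text), clf)
        for name, clf in classifiers.items()
    }


# Toy, made-up complaints standing in for rows of consumer_complaints.csv.
texts = [
    "My credit card was charged twice for the same purchase",
    "The bank raised my credit card interest rate without notice",
    "My mortgage payment was applied to the wrong month",
    "The lender lost my mortgage escrow documents",
]
labels = ["Credit card", "Credit card", "Mortgage", "Mortgage"]

models = build_models()
for name, model in models.items():
    model.fit(texts, labels)
    prediction = model.predict(["Question about my mortgage escrow"])[0]
    print(f"{name}: {prediction}")
```

On the real dataset the texts and labels would come from the CSV, and a held-out test split would be needed before comparing models.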
      

Results

Below are the saved results and evaluation visualizations, each included as a PNG from the screenshots/ folder. Every screenshot shows the system date/time and my username in the window for verification.


1. Model Comparison Table

  • Model metrics (accuracy, precision, recall, F1) for all classifiers as output by the classification report.
Task-5-TABLE
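
To make the four metrics in the table concrete, here is a small pure-Python sketch of how accuracy and macro-averaged precision, recall, and F1 can be computed from true and predicted labels. The `macro_metrics` helper and the tiny label lists are illustrative assumptions. The repository itself reports these numbers through the classification report, and its averaging scheme may differ from the macro average used here.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Compute accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Made-up labels for illustration only, not results from this project.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
metrics = macro_metrics(y_true, y_pred)
print(metrics)
```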

2. Confusion Matrices by Model

Confusion matrices for each classifier, allowing visual inspection of prediction breakdown for each product category:

  • Logistic Regression:
Task-5-Logistic Regression Confusion Matrix
  • Naive Bayes:
Task-5-NaiveBayesConfusionMatrix
  • Random Forest:
Task-5-RandomForestConfusionMatrix
  • SVM:
Task-5-SVMConfusionMatrix
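
For readers unfamiliar with the plots above, a confusion matrix is a table of counts with true categories as rows and predicted categories as columns. The sketch below builds one in pure Python. The `count_confusions` helper and the sample labels are hypothetical and are not taken from the project's script or results.

```python
def count_confusions(y_true, y_pred, labels):
    """Return a matrix where entry [i][j] counts true labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


# Made-up example: one "Credit card" complaint is misread as "Mortgage".
labels = ["Credit card", "Mortgage"]
y_true = ["Credit card", "Credit card", "Mortgage", "Mortgage"]
y_pred = ["Credit card", "Mortgage", "Mortgage", "Mortgage"]

matrix = count_confusions(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Counts on the diagonal are correct predictions. Off-diagonal counts show which categories a model confuses with each other.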

3. Best Performing Model Proof

Screenshot of the sample console output showing the selection of the best-performing model, along with evidence of its predictions.

Task-5-BestPerformingModel
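
One simple way to select a best-performing model is to take the one with the highest score on a chosen metric. The sketch below assumes F1 as the default criterion; the README does not state which metric the script uses. The `pick_best` helper, the model names, and the scores are placeholders, not results from this repository.

```python
def pick_best(scores, metric="f1"):
    """Return the name of the model with the highest value for `metric`."""
    return max(scores, key=lambda name: scores[name][metric])


# Placeholder numbers for illustration only; NOT this project's results.
placeholder_scores = {
    "model_a": {"accuracy": 0.50, "f1": 0.40},
    "model_b": {"accuracy": 0.60, "f1": 0.70},
    "model_c": {"accuracy": 0.65, "f1": 0.55},
}

best_by_f1 = pick_best(placeholder_scores)
best_by_accuracy = pick_best(placeholder_scores, metric="accuracy")
print(best_by_f1, best_by_accuracy)
```

The two criteria can disagree, as they do here, so the selection metric should be stated alongside the chosen model.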


Key Files

  • consumer_complaint_classification.py — Complete ML pipeline with all steps.
  • consumer_complaints.csv — Dataset from Kaggle.

Author

Final Year B.Tech CCE
Shyam Anand
October 2025


License

This project was submitted for Kaiburr Assessment 2025.
