📊 Telco Customer Churn Prediction

This project analyzes customer churn for a telecommunications company using the IBM Telco Customer Churn dataset. The goal is to develop and evaluate machine learning models that predict which customers are likely to leave the service, and to identify the most influential factors contributing to churn.


πŸ“ Dataset

  • Source: Kaggle – Telco Customer Churn Dataset
  • Context: This dataset represents customer data from a fictional telecom company, including demographics, service usage, contract information, and churn status.

🎯 Objectives

  • Clean and preprocess the dataset
  • Train and evaluate two machine learning models: Logistic Regression and Random Forest
  • Compare model performance using standard metrics
  • Identify key features influencing churn
  • Visualize model outputs and insights for stakeholders

🛠️ Tools & Libraries

  • Python
  • pandas, numpy – Data manipulation
  • scikit-learn – Modeling and evaluation
  • matplotlib, seaborn – Visualization
  • StandardScaler (from scikit-learn) – Feature standardization

🧩 Workflow Summary

  1. Data Cleaning & Preprocessing (see the end-to-end code sketch after this list)

    • Converted TotalCharges to numeric
    • Removed customerID column and rows with missing values
    • Encoded categorical variables using one-hot encoding
    • Scaled numeric features for logistic regression
  2. Modeling

    • Split data into training and test sets (80/20 split)
    • Trained both Logistic Regression and Random Forest classifiers
    • Predicted churn outcomes and probability scores
  3. Evaluation

    • Confusion Matrix
    • Classification Report: Precision, Recall, F1-Score
    • ROC AUC Score
    • ROC Curve Visualization
    • Feature Importance Analysis (Random Forest)
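
The three steps above can be reproduced with a short script along these lines. This is a minimal sketch rather than the repository's exact notebook code: the Kaggle CSV filename (WA_Fn-UseC_-Telco-Customer-Churn.csv), the column names (customerID, tenure, MonthlyCharges, TotalCharges, Churn), and settings such as random_state=42 and n_estimators=200 are assumptions for illustration.

```python
# Minimal end-to-end sketch of the workflow above (not the exact notebook code).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# 1. Data cleaning & preprocessing
df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")                 # assumed filename
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")  # blanks -> NaN
df = df.drop(columns=["customerID"]).dropna()

y = (df["Churn"] == "Yes").astype(int)
X = pd.get_dummies(df.drop(columns=["Churn"]), drop_first=True)          # one-hot encoding

# 2. Modeling: 80/20 split, then fit both classifiers
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale the numeric columns for Logistic Regression only
num_cols = ["tenure", "MonthlyCharges", "TotalCharges"]
scaler = StandardScaler()
X_train_sc, X_test_sc = X_train.copy(), X_test.copy()
X_train_sc[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test_sc[num_cols] = scaler.transform(X_test[num_cols])

log_reg = LogisticRegression(max_iter=1000).fit(X_train_sc, y_train)
rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

# 3. Evaluation: confusion matrix, classification report, ROC AUC
for name, model, X_te in [("Logistic Regression", log_reg, X_test_sc),
                          ("Random Forest", rf, X_test)]:
    preds = model.predict(X_te)
    probs = model.predict_proba(X_te)[:, 1]
    print(name)
    print(confusion_matrix(y_test, preds))
    print(classification_report(y_test, preds))
    print("ROC AUC:", roc_auc_score(y_test, probs))

# Feature importances (Random Forest)
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```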

🔍 Key Results

| Metric | Logistic Regression | Random Forest |
| --- | --- | --- |
| Accuracy | ~80% | ~79% |
| ROC AUC Score | ~0.84 | ~0.82 |
| Precision (Churn = True) | ~0.65 | ~0.63 |
| Recall (Churn = True) | ~0.57 | ~0.52 |

  • Logistic Regression slightly outperformed Random Forest in classification metrics and ROC AUC.
  • Random Forest offered clearer interpretability via feature importances.
  • Most influential features: Contract type, Tenure, Monthly Charges, and Online Security.

📌 Conclusions

  • Contract-related features were the strongest indicators of churn, especially for customers on month-to-month contracts.
  • Short tenure and high monthly charges also correlated strongly with churn risk.
  • Both models are useful, but Logistic Regression may generalize better for this dataset due to cleaner class separation.

🚀 Future Improvements

  • Perform hyperparameter tuning (e.g., GridSearchCV; see the sketch after this list)
  • Explore XGBoost or Gradient Boosting
  • Use SHAP values for model interpretability
  • Build an interactive dashboard using Streamlit
  • Simulate an A/B test for customer retention offers
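
For the hyperparameter-tuning item, a minimal GridSearchCV sketch is shown below. The parameter grid, the roc_auc scoring choice, and the X_train / y_train names (carried over from the workflow sketch above) are illustrative assumptions rather than settings used in this project.

```python
# Illustrative hyperparameter search for the Random Forest; grid values are
# examples only, not tuned settings from this repository.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",   # match the ROC AUC metric reported above
    cv=5,
    n_jobs=-1,
)
search.fit(X_train, y_train)   # X_train / y_train from the workflow sketch
print(search.best_params_, search.best_score_)
```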

👩‍💻 Author

Krina Patel
B.S. Data Science & Business Administration, Northeastern University
📧 patelkrina100@gmail.com
🔗 LinkedIn | GitHub | Tableau Portfolio
