Telecom Churn Prediction

Overview

This project predicts customer churn for a telecom company by analyzing customer behavior over a four-month period. The goal is to identify high-value customers who are at risk of churning and to recommend actionable retention strategies. The project was originally built as part of a graduate data science course.

Problem Statement

In the highly competitive telecom industry, customer churn poses a significant challenge: acquiring a new customer is commonly estimated to cost 5-10 times more than retaining an existing one. Predicting churn is especially important for prepaid customers, who can stop using the service without any formal notice. This project focuses on the Indian and Southeast Asian markets, where the prepaid model dominates.

Definitions of Churn:

Revenue-based churn: Customers generating minimal or no revenue.
Usage-based churn: Customers with no incoming/outgoing calls or internet usage over a defined period.

Objective:

Predict churn in the 9th month based on data from months 6, 7, and 8.
Identify key indicators of churn.

Dataset

The dataset contains customer-level information over four months, and is stored in the dataset folder of the repository:

Columns: 226 features, including usage patterns, recharge amounts, and customer demographics.
Rows: ~100,000 customer records.

Key Details:

Monthly Encodings: June (6), July (7), August (8), and September (9).
High-Value Customers: Defined as those whose average recharge amount in the first two months (June and July) is above the 70th percentile.
Target Variable: Churn (1 if the customer stopped using services in month 9, 0 otherwise).
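
The high-value rule above can be sketched in a few lines of pandas. This is a minimal illustration, not the notebook's actual code: the column names `total_rech_amt_6` and `total_rech_amt_7` and the helper `tag_high_value` are assumptions made for the example, so check the Data Dictionary for the real recharge columns.

```python
import pandas as pd

def tag_high_value(df, rech_cols=("total_rech_amt_6", "total_rech_amt_7"), pct=0.70):
    """Flag customers whose average recharge over the given months exceeds the pct quantile.

    The column names are assumed for illustration; they are not confirmed by the README.
    """
    out = df.copy()
    # Average recharge amount over the first two months (June and July).
    out["avg_rech_6_7"] = out[list(rech_cols)].mean(axis=1)
    # 70th percentile of that average across all customers.
    cutoff = out["avg_rech_6_7"].quantile(pct)
    # "Above the 70th percentile" is read here as a strict inequality.
    out["high_value"] = (out["avg_rech_6_7"] > cutoff).astype(int)
    return out

# Toy data: ten customers with recharge amounts from 100 to 1000.
toy = pd.DataFrame({
    "total_rech_amt_6": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    "total_rech_amt_7": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
})
tagged = tag_high_value(toy)
print(tagged[["avg_rech_6_7", "high_value"]])
```

Whether the cutoff is inclusive or strict is a modeling choice; the strict version is used here only because the text says "above".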

Refer to the included Data Dictionary for details on column abbreviations and meanings.

Methodology

Data Preparation

Initial Cleanup:
- Removed non-predictive columns (e.g., mobile_number, circle_id).
- Addressed missing values through imputation or column removal.
Feature Engineering:
- Derived metrics such as average revenue, total usage, and recharge patterns.
- Tagged high-value customers.
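
As a rough sketch of the preparation steps listed above, the snippet below drops identifier columns, removes columns that are mostly empty, imputes the remaining numeric gaps, and derives one averaged metric. The 50% missing-value threshold, the median imputation, the `arpu_6`/`arpu_7`/`arpu_8` column names, and both helper functions are assumptions for illustration; the README does not state which choices the notebook makes.

```python
import numpy as np
import pandas as pd

def prepare(df, id_cols=("mobile_number", "circle_id"), max_missing=0.5):
    """Drop identifier columns, drop sparse columns, and median-impute numeric gaps."""
    out = df.drop(columns=[c for c in id_cols if c in df.columns])
    # Keep only columns whose share of missing values is at most max_missing.
    keep = out.columns[out.isna().mean() <= max_missing]
    out = out[keep].copy()
    # Fill remaining numeric gaps with each column's median (an assumed strategy).
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    return out

def add_features(df):
    """Derive an average-revenue feature over months 6 to 8 (column names assumed)."""
    out = df.copy()
    out["avg_arpu_6_8"] = out[["arpu_6", "arpu_7", "arpu_8"]].mean(axis=1)
    return out

toy = pd.DataFrame({
    "mobile_number": [1, 2, 3, 4],
    "circle_id": [109, 109, 109, 109],
    "arpu_6": [100.0, 200.0, np.nan, 400.0],
    "arpu_7": [110.0, 210.0, 310.0, 410.0],
    "arpu_8": [120.0, 220.0, 320.0, 420.0],
    "mostly_empty": [np.nan, np.nan, np.nan, 1.0],
})
clean = add_features(prepare(toy))
print(clean)
```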

Exploratory Data Analysis (EDA)

Analyzed data distributions and correlations.
Segmented customers by usage and revenue metrics.
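
One possible way to segment customers and compare churn across segments is shown below. The three-tier split with `pd.qcut`, the `avg_arpu` column, and the toy values are all illustrative assumptions rather than the segmentation used in the notebook.

```python
import pandas as pd

# Toy data: average revenue per customer and a churn flag (values are made up).
toy = pd.DataFrame({
    "avg_arpu": [50, 80, 120, 150, 300, 350, 600, 650, 900],
    "churn":    [1, 1, 0, 1, 0, 0, 0, 0, 0],
})

# Split customers into three equally sized revenue tiers.
toy["revenue_segment"] = pd.qcut(toy["avg_arpu"], q=3, labels=["low", "mid", "high"])

# Churn rate per tier.
churn_by_segment = toy.groupby("revenue_segment", observed=True)["churn"].mean()
print(churn_by_segment)

# Pairwise correlation between revenue and churn.
corr = toy[["avg_arpu", "churn"]].corr()
print(corr)
```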

Target Definition

Churn follows the usage-based definition above: a customer is labeled as churned if they have no calls (incoming or outgoing) and no internet usage in the churn phase (month 9).
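
A minimal sketch of this labeling rule follows. The month-9 column names (`total_ic_mou_9`, `total_og_mou_9`, `vol_2g_mb_9`, `vol_3g_mb_9`), the `_9` suffix convention, and the decision to treat missing usage as zero are assumptions for the example. Dropping the month-9 columns after labeling keeps the features limited to months 6, 7, and 8, in line with the stated objective.

```python
import pandas as pd

# Assumed month-9 usage columns: incoming minutes, outgoing minutes, 2G and 3G data volume.
USAGE_COLS_9 = ["total_ic_mou_9", "total_og_mou_9", "vol_2g_mb_9", "vol_3g_mb_9"]

def label_churn(df, usage_cols=USAGE_COLS_9):
    """Add a churn column (1 = no calls and no data in month 9), then drop month-9 columns."""
    out = df.copy()
    # Missing usage is treated as zero usage here; this is an assumption.
    no_usage = (out[usage_cols].fillna(0) == 0).all(axis=1)
    out["churn"] = no_usage.astype(int)
    # Remove every month-9 column so the model cannot see the churn phase.
    month9_cols = [c for c in out.columns if c.endswith("_9")]
    return out.drop(columns=month9_cols)

toy = pd.DataFrame({
    "total_og_mou_8": [120.0, 35.5, 10.0],
    "total_ic_mou_9": [0.0, 12.3, None],
    "total_og_mou_9": [0.0, 0.0, 4.0],
    "vol_2g_mb_9": [0.0, 0.0, 0.0],
    "vol_3g_mb_9": [0.0, 50.0, None],
})
labeled = label_churn(toy)
print(labeled)
```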

Dimensionality Reduction

Applied Principal Component Analysis (PCA) to reduce the feature set (226 columns in the raw data) to a smaller number of components while retaining most of the variance.
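
The snippet below sketches this step with scikit-learn on synthetic data. The 95% variance target is an assumption, since the README does not say how much variance was retained; standardizing before PCA is included because PCA is sensitive to feature scale.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix: 40 correlated columns driven by 5 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 40))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 40))

# A float n_components keeps the smallest number of components whose
# cumulative explained variance reaches that fraction (0.95 is an assumed target).
reducer = make_pipeline(StandardScaler(), PCA(n_components=0.95, svd_solver="full"))
X_reduced = reducer.fit_transform(X)

pca = reducer.named_steps["pca"]
explained = pca.explained_variance_ratio_.sum()
print(X.shape, "->", X_reduced.shape, "explained variance:", round(explained, 3))
```

Note that principal components are linear mixtures of the original columns, so they are harder to interpret than the raw features.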

Modeling

Algorithms:
- Logistic Regression (baseline).
- Random Forest, Gradient Boosting (e.g., XGBoost).
Class Imbalance Handling:
- Used SMOTE and class weighting techniques.
Evaluation Metrics:
- Precision, Recall, F1-Score, and AUC-ROC.
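
The following sketch shows a class-weighted logistic regression baseline scored with the four metrics listed above. It runs on synthetic imbalanced data from `make_classification`, so the printed numbers say nothing about the project's results; the 8% positive rate and all hyperparameters are assumptions. SMOTE from imbalanced-learn is the alternative named above and would typically be applied to the training split only; it is left out here to keep the example dependent on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 8% positives stand in for churners.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5,
    weights=[0.92, 0.08], random_state=42,
)
# Stratify so both splits keep the same churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42,
)

# class_weight="balanced" reweights samples inversely to class frequency.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

metrics = {
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc_roc": roc_auc_score(y_test, y_prob),  # uses probabilities, not hard labels
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```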

Model Evaluation and Insights

Selected the best-performing model based on evaluation metrics.
Identified significant predictors of churn using feature importances from tree-based models and logistic regression coefficients.
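
Both approaches can be illustrated on synthetic data where the true drivers are known. The feature names (`rech_amt_change`, `usage_change`, `noise`) and the rule that generates churn are invented for this example and are not taken from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 1000
X = pd.DataFrame({
    "rech_amt_change": rng.normal(size=n),
    "usage_change": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
# Synthetic rule: falling recharge and falling usage both raise the churn probability.
logit = -1.5 - 2.0 * X["rech_amt_change"] - 1.0 * X["usage_change"]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Tree-based view: impurity-based importances, which sum to 1.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = pd.Series(forest.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)

# Linear view: coefficients on standardized features, so magnitudes are comparable.
logreg = LogisticRegression(max_iter=1000).fit(StandardScaler().fit_transform(X), y)
coefs = pd.Series(logreg.coef_[0], index=X.columns)
print(coefs)
```

A negative coefficient here means that a larger drop in the feature goes with a higher churn probability. Importances show only the strength of a feature, not its direction.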

Results

Best Model: Gradient Boosting with an AUC-ROC of 0.85.
Key Indicators of Churn:
- Decline in recharge amounts.
- Reduced call and internet usage in the action phase (month 8, the last month before the churn phase).
Recommendations:
- Offer retention incentives to high-risk customers.
- Monitor declining usage patterns for early intervention.
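
Monitoring for decline could look like the sketch below, which compares month 8 against the average of months 6 and 7 and flags large drops. The 50% threshold, the `total_rech_amt_*` column names, and the helper `flag_declining` are hypothetical; a real threshold would need to be tuned against observed churn.

```python
import pandas as pd

def flag_declining(df, metric="total_rech_amt", drop_threshold=0.5):
    """Flag customers whose month-8 value fell by at least drop_threshold versus months 6-7."""
    out = df.copy()
    baseline = out[[f"{metric}_6", f"{metric}_7"]].mean(axis=1)
    # Relative change versus baseline; a zero baseline gives NaN and is left unflagged.
    change = (out[f"{metric}_8"] - baseline) / baseline.where(baseline > 0)
    out[f"{metric}_change"] = change
    out[f"{metric}_declining"] = change <= -drop_threshold
    return out

toy = pd.DataFrame({
    "total_rech_amt_6": [400, 300, 0],
    "total_rech_amt_7": [600, 300, 0],
    "total_rech_amt_8": [200, 310, 0],
})
flagged = flag_declining(toy)
print(flagged[["total_rech_amt_change", "total_rech_amt_declining"]])
```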

Project Pipeline

Data Collection: Load and clean telecom customer data.
EDA and Feature Engineering: Derive meaningful metrics and segment customers.
Model Building:
- Apply dimensionality reduction.
- Train and tune predictive models.
Evaluation: Assess models with the metrics listed under Modeling (precision, recall, F1-score, and AUC-ROC).
Insights and Recommendations: Use findings to propose retention strategies.
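
The model-building and evaluation stages can be chained in a single scikit-learn `Pipeline`, so that scaling and PCA are refit inside each cross-validation fold and do not leak information from the held-out data. This is a sketch on synthetic data: the step names, the 95% variance target, and the gradient boosting settings are assumptions, and scikit-learn's `GradientBoostingClassifier` stands in for whichever boosting library the notebook uses.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the prepared customer features.
X, y = make_classification(
    n_samples=1500, n_features=30, n_informative=6,
    weights=[0.9, 0.1], random_state=0,
)

pipeline = Pipeline([
    ("scale", StandardScaler()),                         # put features on a common scale
    ("pca", PCA(n_components=0.95, svd_solver="full")),  # keep 95% of variance (assumed)
    ("model", GradientBoostingClassifier(n_estimators=50, random_state=0)),
])

# Stratified folds keep the churn rate similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
print("AUC-ROC per fold:", auc_scores.round(3))
print("Mean AUC-ROC:", auc_scores.mean().round(3))
```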

Requirements

Python Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn, imbalanced-learn

Repository: AbhishekSinghDhadwal/TelecomChurnPredictor
