Predicting Customer Churn for OTT and SaaS

An end-to-end churn analysis project built on the Telco Customer Churn dataset. It identifies churn drivers, predicts churn probability, and segments high-risk users for business action.


Customers analyzed: 7,032

Overall churn rate: 26.58%

Logistic Regression accuracy: 0.7946

Random Forest accuracy: 0.7584

High-risk users: 19.80%

Key Findings

High Early Churn

New users churn at higher rates than long-tenure users.
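A comparison like this can be reproduced by bucketing tenure and averaging the churn flag per bucket. The following is a minimal sketch on toy data, not the project's actual code: the column names `tenure` and `Churn` follow the Telco dataset, while the bucket edges and the `tenure_group` column are assumptions chosen for illustration.

```python
import pandas as pd

# Toy rows standing in for the Telco columns `tenure` (months) and `Churn`.
df = pd.DataFrame({
    "tenure": [1, 3, 5, 10, 14, 20, 30, 45, 60, 70],
    "Churn": ["Yes", "Yes", "No", "Yes", "No", "No", "Yes", "No", "No", "No"],
})

# Assumed bucket edges (months); the project's own buckets may differ.
bins = [0, 12, 24, 48, 72]
labels = ["0-12", "13-24", "25-48", "49-72"]
df["tenure_group"] = pd.cut(
    df["tenure"], bins=bins, labels=labels, include_lowest=True
)

# Churn rate per bucket = share of rows with Churn == "Yes".
churn_by_tenure = (
    df.assign(churned=df["Churn"].eq("Yes"))
      .groupby("tenure_group", observed=True)["churned"]
      .mean()
)
print(churn_by_tenure)
```

On the toy rows, the 0-12 month bucket has the highest rate (3 of 4 customers churned), which is the pattern this finding describes.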

Price Sensitivity

Users with higher monthly charges are more likely to churn.

Support Signal

Frequent support interactions are associated with elevated churn risk.

Project Objectives

Calculate churn rate across customer groups
Identify the key factors associated with churn
Build a prediction model for churn probability
Segment users into Low, Medium, and High risk
Recommend practical retention strategies
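The Low, Medium, and High segmentation can be sketched as a simple binning of predicted churn probabilities. The function name `segment_risk` and the cutoffs 0.3 and 0.6 are hypothetical; the source does not state which thresholds produce the reported 19.80% high-risk share.

```python
import pandas as pd

def segment_risk(probabilities, low_cutoff=0.3, high_cutoff=0.6):
    """Map churn probabilities in [0, 1] to Low / Medium / High labels.

    The cutoffs are illustrative assumptions, not the project's values.
    Bins are right-inclusive: p <= low_cutoff is Low, p > high_cutoff is High.
    """
    return pd.cut(
        pd.Series(probabilities, dtype="float64"),
        bins=[0.0, low_cutoff, high_cutoff, 1.0],
        labels=["Low", "Medium", "High"],
        include_lowest=True,
    )

probs = [0.05, 0.30, 0.45, 0.61, 0.92]
segments = segment_risk(probs)
print(segments.tolist())

# Share of customers in each segment.
print(segments.value_counts(normalize=True))
```

Keeping the cutoffs as parameters makes it easy to tune the size of the High segment to what a retention team can realistically act on.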

Workflow

Data loading and validation from the source CSV
Data cleaning and preprocessing
Feature engineering for engagement and risk signals
Exploratory analysis and correlation study
Model training and evaluation
Risk segmentation and business recommendation generation
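The model training and evaluation step can be sketched with scikit-learn as below. This is an illustration under stated assumptions, not the contents of `subscription_churn.py`: it uses synthetic data from `make_classification` in place of the Telco CSV, and the split ratio, hyperparameters, and class weights are all assumed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned feature matrix; the class weights
# roughly mirror a ~27% churn rate (assumption for illustration only).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.73, 0.27], random_state=42,
)

# Stratify so train and test keep the same churn proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    # Scaling helps the logistic regression solver converge.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Column 1 of predict_proba is the probability of the positive (churn) class.
    churn_proba = model.predict_proba(X_test)[:, 1]
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy={results[name]:.4f}, "
          f"mean P(churn)={churn_proba.mean():.3f}")
```

With an overall churn rate of 26.58%, a model that always predicts "no churn" would already score about 73.4% accuracy, so it may be worth reporting recall, precision, or ROC AUC alongside accuracy when comparing the two models.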

Open Chart Analysis to view all generated charts with explanations.

Installation Steps

Clone the repository:
git clone https://github.com/bikram73/Subscription_Churn_Analysis.git
Move into the project folder:
cd Subscription_Churn_Analysis
Create a virtual environment:
python -m venv .venv
Activate the environment (Windows PowerShell):
.venv\Scripts\Activate.ps1
Install dependencies:
pip install -r requirements.txt
Run the pipeline:
python subscription_churn.py

Tech Stack

Data Processing

Pandas, NumPy

Visualization

Matplotlib, Seaborn

Machine Learning

Scikit-learn

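Note: The original Telco dataset does not include OTT telemetry fields such as usage frequency, days since last login, or support calls. The pipeline engineers these as deterministic proxy features, so findings that rely on them (for example, the support signal) reflect the proxies rather than directly observed behavior.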
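To illustrate what a deterministic proxy feature can look like, here is a minimal sketch that derives three such fields from existing Telco columns. Every formula below is hypothetical and chosen only for illustration; the source does not describe how the project actually computes its proxies. The input column names (`tenure`, `StreamingTV`, `StreamingMovies`, `TechSupport`, `Contract`) come from the Telco dataset, while `usage_frequency`, `last_login_days`, and `support_calls` are assumed output names.

```python
import pandas as pd

# Toy rows using real Telco column names and category values.
df = pd.DataFrame({
    "customerID": ["A1", "B2", "C3"],
    "tenure": [2, 30, 65],
    "StreamingTV": ["Yes", "No", "Yes"],
    "StreamingMovies": ["Yes", "No", "No"],
    "TechSupport": ["No", "Yes", "No internet service"],
    "Contract": ["Month-to-month", "One year", "Two year"],
})

# Hypothetical proxy 1: usage frequency = number of streaming add-ons.
streaming_cols = ["StreamingTV", "StreamingMovies"]
df["usage_frequency"] = df[streaming_cols].eq("Yes").sum(axis=1)

# Hypothetical proxy 2: days since last login, mapped from contract type.
login_gap_by_contract = {"Month-to-month": 30, "One year": 14, "Two year": 7}
df["last_login_days"] = df["Contract"].map(login_gap_by_contract)

# Hypothetical proxy 3: support calls = 2 if the customer has no tech
# support plan, plus 1 if they are in their first year.
df["support_calls"] = (
    df["TechSupport"].eq("No").astype(int) * 2
    + df["tenure"].lt(12).astype(int)
)

print(df[["customerID", "usage_frequency", "last_login_days", "support_calls"]])
```

Because no random numbers are involved, the same input row always yields the same proxy values, which is what makes the features deterministic and the pipeline reproducible.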