Churn Prediction
End-to-End ML System

Enterprise-grade telecom churn prediction pipeline — from raw CSV to production API. XGBoost + GPU acceleration, 250-trial Optuna tuning, Great Expectations validation, MLflow experiment tracking, FastAPI serving, and 13-job CI/CD to AWS ECR.

GitHub Repository
Endpoint Status

Try the Churn Prediction Model

⚠️ The ECS cloud endpoint is kept stopped to reduce cost.

The FastAPI inference service is fully configured and containerised, but the ECS task is stopped to avoid idle cloud charges. The screenshots below show the live API in action — you can run it locally with docker run using the image from the GitHub repo.

Churn Prediction API screenshot 1
Churn Prediction API screenshot 2
97.37%
Recall (holdout)
51,500
Holdout Samples
250
Optuna Trials
11
Engineered Features
140+
Automated Tests
13
CI/CD Jobs
Project Overview

From Raw Data to Production API

A full-stack ML engineering project solving telecom customer churn — built to enterprise production standards. The system ingests raw customer data, enforces data contracts with Great Expectations, engineers 11 domain-specific features, trains an XGBoost model on GPU, and serves predictions via a FastAPI microservice deployed to AWS.

Every layer is observable: MLflow tracks all experiments, GitHub Actions runs 13 CI/CD jobs on every push, and the prediction threshold is fully configurable at serving time — giving business teams control over precision vs. recall tradeoffs.

Python 3.11 XGBoost + GPU FastAPI MLflow Docker AWS S3 / ECR
Model Performance — Holdout
Recall 0.9737
AUC-ROC 0.8712
F1 Score 0.7143
Precision 0.5688
Confusion Matrix (threshold 0.35)
37,524
True Negative
5,386
False Positive
232
False Negative
8,358
True Positive
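The standard metrics follow directly from these four counts; a quick sketch (decimals recomputed this way can differ in the last digits from the headline figures, which come from the logged evaluation run):

```python
# Derive classification metrics from the confusion-matrix counts above
# (threshold 0.35 on the 51,500-sample holdout).
tn, fp, fn, tp = 37_524, 5_386, 232, 8_358

recall = tp / (tp + fn)                    # share of actual churners caught
precision = tp / (tp + fp)                 # share of flagged customers who truly churn
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"recall={recall:.4f} precision={precision:.4f} "
      f"f1={f1:.4f} accuracy={accuracy:.4f}")
```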
Business Impact

How This System Enables Stakeholder Decisions

Four organisational roles, one shared intelligence layer.

📊
CRM / Retention Team
Who do we contact first this week?
  • Daily batch scores from /predict/batch API
  • Risk-ranked customer list by churn probability
  • Configurable threshold tunes outreach volume
  • False-negative rate < 3% minimises missed churners
Prioritise daily call queue
🎯
Marketing / Growth
Which segment needs a retention offer?
  • CLV × Risk Score × Engagement Score segmentation
  • High-CLV + high-risk → premium win-back campaign
  • Contract-type features identify at-risk cohorts
  • Month-to-month subscribers flagged proactively
Target campaign spend
📈
Executive / Finance
What is the revenue risk this quarter?
  • Churn probability × monthly charge = projected ARR loss
  • Cohort-level risk dashboards from batch scoring
  • Retention spend ROI calculated from predicted saves
  • Model recall = 97.37% → near-complete risk visibility
Revenue forecasting
⚙️
Product / Ops
Is the model safe to deploy? What changed?
  • Great Expectations gates block bad data at ingestion
  • 140+ pytest tests + code-coverage thresholds in CI
  • MLflow tracks every run, param, metric, artefact
  • Docker health checks + /ready endpoint in prod
Safe, observable releases
97.37%
Recall on
51,500-sample holdout
51.5K
Holdout samples
never seen in training
140+
Automated tests
across 13 CI/CD jobs
0.35
Default threshold
configurable at serve time
Business Value

Why 97% Recall Changes the Business

Cost of a Missed Churner

Every false negative is a customer who churned without intervention. At an average telecom CLV of $300–$800, even a 3% miss rate across a 51,500-customer portfolio represents $460K–$1.2M in recoverable revenue annually.

Precision–Recall Tradeoff

The threshold (0.35 default) is tunable at inference time. Lower threshold → higher recall, more outreach cost. Higher threshold → higher precision, fewer false alarms. Business owns the tradeoff — not the data scientist.

Feature-Driven Segmentation

Engineered features like CLV, Risk Score, Engagement Score, and Contract Stability Index let marketing teams build retention cohorts directly from model outputs — without a separate analysis step.

Production-Grade Reliability

13 CI/CD jobs, 140+ tests, health checks, and data validation gates mean the system degrades gracefully — not silently. Ops teams get observable, auditable, reproducible behaviour at every deploy.

Configurable Threshold — Business Team Controls Risk

0.25
Low threshold
Maximum recall · high outreach cost
0.35 ✓
Default (production)
Balanced recall vs. precision
0.50
High threshold
High precision · fewer interventions
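The effect is easy to demonstrate on a handful of illustrative scores (not the model's real outputs): lowering the threshold can only add predicted positives, so recall rises while precision tends to fall.

```python
# Illustrative predicted churn probabilities with ground-truth labels (1 = churned).
scores = [(0.92, 1), (0.81, 1), (0.64, 0), (0.55, 1), (0.47, 0),
          (0.38, 1), (0.33, 0), (0.28, 1), (0.12, 0), (0.05, 0)]

def recall_precision(threshold):
    """Recall and precision when flagging every score >= threshold."""
    tp = sum(1 for p, y in scores if p >= threshold and y == 1)
    fp = sum(1 for p, y in scores if p >= threshold and y == 0)
    fn = sum(1 for p, y in scores if p < threshold and y == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

for t in (0.25, 0.35, 0.50):
    r, p = recall_precision(t)
    print(f"threshold={t:.2f}  recall={r:.2f}  precision={p:.2f}")
```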
ROI Analysis

ROI Simulation — What 1% Retention Improvement Means

Based on a 51,500-customer portfolio at industry-average telecom CLV

Scenario Assumptions

51,500 customers · avg monthly charge $65 · avg CLV $780 · retention offer cost $45/customer · baseline churn rate 17.6% (9,045 churners)

8,807
Predicted churners
identified (recall 97.37%)
out of 9,045 actual churners
$396K
Intervention cost
at $45 per outreach
targeted only, not bulk
$3.4M
Revenue preserved
at 50% win-back rate
$780 CLV × 4,400 saves
8.6×
Return on investment
($3.4M / $396K)
conservative estimate
Every 1% improvement in retention on this customer base preserves approximately $401K in annual revenue — making the model's 97.37% recall directly measurable in dollars, not just in metrics.
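The simulation's arithmetic, reproduced from the stated assumptions (the exact products round slightly differently from the $3.4M and 8.6× display figures):

```python
# ROI simulation inputs as stated above.
customers     = 51_500
avg_clv       = 780      # $ per customer
offer_cost    = 45       # $ per outreach
actual_churn  = 9_045    # assumed actual churners in the portfolio
recall        = 0.9737
win_back_rate = 0.50

predicted = round(actual_churn * recall)      # churners the model flags
cost      = predicted * offer_cost            # targeted intervention spend
saves     = round(predicted * win_back_rate)  # customers won back
preserved = saves * avg_clv                   # revenue preserved
roi       = preserved / cost

per_point = round(0.01 * customers) * avg_clv  # value of +1% retention

print(f"predicted={predicted:,} cost=${cost:,} preserved=${preserved:,} roi={roi:.2f}x")
print(f"each 1% retention improvement ≈ ${per_point:,}")
```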

ML Pipeline

8 sequential stages from raw data to production model

01

Data Ingestion

Load train/test/holdout CSVs. Validate file existence, column presence, and basic schema before any processing begins.

02

Data Validation

Great Expectations suite: schema checks, business rule validation (gender, contract type, charge ranges), and statistical drift detection.
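The repo's actual gate is a Great Expectations suite; the plain-Python sketch below only illustrates the fail-fast idea, with column names assumed from the standard Telco churn schema:

```python
# Hypothetical fail-fast checks mirroring the validation gate: schema presence,
# business rules, and value ranges are verified before any model code runs.
REQUIRED_COLUMNS = {"gender", "Contract", "tenure", "MonthlyCharges", "TotalCharges"}
VALID_GENDER   = {"Male", "Female"}
VALID_CONTRACT = {"Month-to-month", "One year", "Two year"}

def validate(rows: list[dict]) -> None:
    """Raise immediately on the first contract violation (fail fast)."""
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - row.keys()
        if missing:
            raise ValueError(f"row {i}: missing columns {sorted(missing)}")
        if row["gender"] not in VALID_GENDER:
            raise ValueError(f"row {i}: bad gender {row['gender']!r}")
        if row["Contract"] not in VALID_CONTRACT:
            raise ValueError(f"row {i}: bad contract {row['Contract']!r}")
        if not 0 <= row["MonthlyCharges"] <= 500:
            raise ValueError(f"row {i}: MonthlyCharges out of range")

validate([{"gender": "Female", "Contract": "Month-to-month",
           "tenure": 12, "MonthlyCharges": 65.0, "TotalCharges": 780.0}])
print("validation passed")
```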

03

Preprocessing

Column normalisation, null handling strategies per feature, categorical encoding, and TotalCharges numeric conversion.

04

Feature Engineering

11 domain features: CLV, Risk Score, Engagement Score, Contract Stability Index, Service Complexity, Tenure Groups, and more.

05

Hyperparameter Tuning

Optuna Bayesian optimisation over 250 trials. Recall-focused objective with XGBoost GPU backend and custom threshold search.

06

Model Training

XGBoost with tree_method=gpu_hist. Best params from Optuna applied. Early stopping on validation recall. Full MLflow logging.

07

Evaluation

Holdout evaluation on 51,500 samples. Confusion matrix, classification report, AUC-ROC. Recall target ≥ 0.95 enforced by CI.
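A sketch of how such a CI gate might look as a plain assertion (names are illustrative; the repo's actual test may differ):

```python
# Illustrative CI gate: fail the build if holdout recall drops below target.
RECALL_TARGET = 0.95

def check_recall_gate(metrics: dict) -> None:
    recall = metrics["recall"]
    assert recall >= RECALL_TARGET, (
        f"holdout recall {recall:.4f} below required {RECALL_TARGET}"
    )

check_recall_gate({"recall": 0.9737})  # current holdout recall clears the gate
print("recall gate passed")
```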

08

Model Registry

MLflow model registry + AWS S3 versioned artefact storage. Production version tagged, previous versions retained for rollback.

System Architecture Overview

Four layers designed for observability, reproducibility and zero-downtime deployments

🏗️

Data Layer

  • CSV ingestion (train / test / holdout)
  • Great Expectations validation suite
  • Pandas preprocessing pipeline
  • AWS S3 artefact storage
🧠

ML Layer

  • XGBoost + GPU training
  • Optuna 250-trial tuning
  • MLflow experiment tracking
  • Model registry + versioning
🚀

Serving Layer

  • FastAPI microservice
  • Docker multi-stage build
  • Health & readiness endpoints
  • Batch + single prediction APIs
☁️

Infrastructure Layer

  • GitHub Actions 13-job CI/CD
  • AWS ECR container registry
  • ECS-ready deployment config
  • Automated tests + coverage gates

Engineered Features

11 domain-specific features constructed from raw telecom signals

Feature | Formula / Logic | Business Meaning
CLV | MonthlyCharges × tenure | Estimated lifetime revenue per customer
risk_score | Weighted contract + monthly charge + tenure signal | Composite churn propensity index
engagement_score | service_count × tenure_norm | How embedded the customer is in the product
contract_stability | contract_type_encoded × tenure | Contractual lock-in strength
service_complexity | Count of active add-on services | Switching friction from multiple services
tenure_group | Binned tenure: new / mid / loyal | Customer lifecycle stage
charge_per_service | MonthlyCharges / (service_count + 1) | Perceived value-for-money signal
paperless_auto_pay | PaperlessBilling AND AutoPay flag | Digital engagement indicator
senior_no_support | Senior citizen AND no tech support | High-vulnerability segment flag
high_value_churn_risk | CLV > median AND contract = monthly | Priority intervention flag for CRM
charge_increase_risk | TotalCharges / (tenure + 1) deviation | Detects unexpectedly rising cost burden
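A few of the table rows translate directly into code. In the sketch below, the 12/48-month tenure cutoffs, the column names, and the clv_median input are assumptions for illustration, not the repo's actual values:

```python
# Engineer a handful of the table's features for one customer record.
def engineer(row: dict) -> dict:
    service_count = row["service_count"]
    out = dict(row)
    out["CLV"] = row["MonthlyCharges"] * row["tenure"]
    out["charge_per_service"] = row["MonthlyCharges"] / (service_count + 1)
    out["tenure_group"] = ("new" if row["tenure"] < 12
                           else "mid" if row["tenure"] < 48
                           else "loyal")
    out["high_value_churn_risk"] = int(
        out["CLV"] > row["clv_median"] and row["Contract"] == "Month-to-month"
    )
    return out

row = {"MonthlyCharges": 65.0, "tenure": 24, "service_count": 3,
       "Contract": "Month-to-month", "clv_median": 900.0}
feats = engineer(row)
print(feats["CLV"], feats["charge_per_service"], feats["tenure_group"])
```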
Cloud & MLOps

Infrastructure & Tooling

GitHub Actions CI/CD

13 jobs: black, isort, flake8, mypy, bandit, safety, pytest (unit/integration/e2e), coverage gate ≥ 80%, Docker build, ECR push.

AWS S3 + ECR

Model artefacts versioned in S3. Container images built and pushed to ECR in CI. ECS deployment manifests included.

MLflow Tracking

Every training run logs params, metrics, the confusion-matrix artefact, and the model binary. The best run is promoted to the registry automatically.

Docker Multi-stage Build

Builder stage installs deps, runtime stage copies only artefacts. Non-root user, health checks, and PYTHONDONTWRITEBYTECODE optimisations.

Design Patterns

Engineering Principles

Pipeline Factory Pattern

Each stage is an isolated, testable unit. Stages compose into a DAG — easy to swap, extend, or run in parallel.

Config-Driven Inference

Prediction threshold passed as request parameter — no redeploy needed to change business operating point.
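A pure-Python stand-in for that serving contract (the real service is a FastAPI endpoint; the function and field names here are illustrative):

```python
# The threshold arrives with the request, so the business operating point can
# change per call without redeploying the service.
DEFAULT_THRESHOLD = 0.35

def predict(probability: float, threshold: float = DEFAULT_THRESHOLD) -> dict:
    """Turn a model probability into a churn decision at the requested threshold."""
    return {
        "churn_probability": probability,
        "threshold": threshold,
        "churn": probability >= threshold,
    }

print(predict(0.42))                  # default operating point: flagged
print(predict(0.42, threshold=0.50))  # stricter operating point: not flagged
```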

Fail-Fast Validation

Great Expectations gates run before any model code. Bad data raises immediately — no silent model degradation.

Separation of Concerns

Feature engineering, model training, serving, and validation each live in isolated modules — enabling independent testing and deployment.

Tech Stack

ML & Data Science
Python 3.11 XGBoost (GPU) Optuna scikit-learn pandas numpy matplotlib seaborn
MLOps & Tracking
MLflow Great Expectations FastAPI Pydantic v2 Uvicorn
Infrastructure & DevOps
Docker GitHub Actions AWS S3 AWS ECR AWS ECS pytest black bandit mypy

Want to explore the full system?

The full infrastructure code, CI/CD pipeline, Docker configuration, and API implementation are available on GitHub.

View on GitHub