Churn Prediction
End-to-End ML System

Enterprise-grade telecom churn prediction pipeline — from raw CSV to production API. XGBoost + GPU acceleration, 250-trial Optuna tuning, Great Expectations validation, MLflow experiment tracking, FastAPI serving, and 13-job CI/CD to AWS ECR.

GitHub Repository
Endpoint Status

Try the Churn Prediction Model

⚠️ The ECS cloud endpoint is kept stopped to reduce cost.

The FastAPI inference service is fully configured and containerised, but the ECS task is stopped to avoid idle cloud charges. The screenshots below show the live API in action — you can run it locally with docker run using the image from the GitHub repo.

Churn Prediction API screenshot 1
Churn Prediction API screenshot 2
97.37%
Recall (holdout)
51,500
Holdout Samples
250
Optuna Trials
11
Engineered Features
140+
Automated Tests
13
CI/CD Jobs
Project Overview

From Raw Data to Production API

A full-stack ML engineering project solving telecom customer churn — built to enterprise production standards. The system ingests raw customer data, enforces data contracts with Great Expectations, engineers 11 domain-specific features, trains an XGBoost model on GPU, and serves predictions via a FastAPI microservice deployed to AWS.

Every layer is observable: MLflow tracks all experiments, GitHub Actions runs 13 CI/CD jobs on every push, and the prediction threshold is fully configurable at serving time — giving business teams control over precision vs. recall tradeoffs.

Python 3.11 XGBoost + GPU FastAPI MLflow Docker AWS S3 / ECR
Model Performance — Holdout
Recall 0.9737
AUC-ROC 0.8712
F1 Score 0.7143
Precision 0.5688
Confusion Matrix (threshold 0.35)
37,524
True Negative
5,386
False Positive
232
False Negative
8,358
True Positive
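The standard metrics follow directly from these four counts; a quick sketch (decimals recomputed this way can differ in the last digits from the headline figures, which come from the logged evaluation run):

```python
# Derive classification metrics from the confusion-matrix counts above
# (threshold 0.35 on the 51,500-sample holdout).
tn, fp, fn, tp = 37_524, 5_386, 232, 8_358

recall = tp / (tp + fn)                    # share of actual churners caught
precision = tp / (tp + fp)                 # share of flagged customers who truly churn
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"recall={recall:.4f} precision={precision:.4f} "
      f"f1={f1:.4f} accuracy={accuracy:.4f}")
```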
Business Impact

How This System Enables Stakeholder Decisions

Four organisational roles, one shared intelligence layer.

📊
CRM / Retention Team
Who do we contact first this week?
  • Daily batch scores from /predict/batch API
  • Risk-ranked customer list by churn probability
  • Configurable threshold tunes outreach volume
  • False-negative rate < 3% minimises missed churners
Prioritise daily call queue
🎯
Marketing / Growth
Which segment needs a retention offer?
  • CLV × Risk Score × Engagement Score segmentation
  • High-CLV + high-risk → premium win-back campaign
  • Contract-type features identify at-risk cohorts
  • Month-to-month subscribers flagged proactively
Target campaign spend
📈
Executive / Finance
What is the revenue risk this quarter?
  • Churn probability × monthly charge = projected ARR loss
  • Cohort-level risk dashboards from batch scoring
  • Retention spend ROI calculated from predicted saves
  • Model recall = 97.37% → near-complete risk visibility
Revenue forecasting
⚙️
Product / Ops
Is the model safe to deploy? What changed?
  • Great Expectations gates block bad data at ingestion
  • 140+ pytest tests + code-coverage thresholds in CI
  • MLflow tracks every run, param, metric, artefact
  • Docker health checks + /ready endpoint in prod
Safe, observable releases
97.37%
Recall on
51,500-sample holdout
51.5K
Holdout samples
never seen in training
140+
Automated tests
across 13 CI/CD jobs
0.35
Default threshold
configurable at serve time
Business Value

Why 97% Recall Changes the Business

Cost of a Missed Churner

Every false negative is a customer who churned without intervention. At an average telecom CLV of $300–$800, even a 3% miss rate across a 51,500-customer portfolio represents $460K–$1.2M in recoverable revenue annually.

Precision–Recall Tradeoff

The threshold (0.35 default) is tunable at inference time. Lower threshold → higher recall, more outreach cost. Higher threshold → higher precision, fewer false alarms. Business owns the tradeoff — not the data scientist.

Feature-Driven Segmentation

Engineered features like CLV, Risk Score, Engagement Score, and Contract Stability Index let marketing teams build retention cohorts directly from model outputs — without a separate analysis step.

Production-Grade Reliability

13 CI/CD jobs, 140+ tests, health checks, and data validation gates mean the system degrades gracefully — not silently. Ops teams get observable, auditable, reproducible behaviour at every deploy.

Configurable Threshold — Business Team Controls Risk

0.25
Low threshold
Maximum recall · high outreach cost
0.35 ✓
Default (production)
Balanced recall vs. precision
0.50
High threshold
High precision · fewer interventions
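The effect is easy to demonstrate on a handful of illustrative scores (not the model's real outputs): lowering the threshold can only add predicted positives, so recall rises while precision tends to fall.

```python
# Illustrative predicted churn probabilities with ground-truth labels (1 = churned).
scores = [(0.92, 1), (0.81, 1), (0.64, 0), (0.55, 1), (0.47, 0),
          (0.38, 1), (0.33, 0), (0.28, 1), (0.12, 0), (0.05, 0)]

def recall_precision(threshold):
    """Recall and precision when flagging every score >= threshold."""
    tp = sum(1 for p, y in scores if p >= threshold and y == 1)
    fp = sum(1 for p, y in scores if p >= threshold and y == 0)
    fn = sum(1 for p, y in scores if p < threshold and y == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

for t in (0.25, 0.35, 0.50):
    r, p = recall_precision(t)
    print(f"threshold={t:.2f}  recall={r:.2f}  precision={p:.2f}")
```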
ROI Analysis

ROI Simulation — What 1% Retention Improvement Means

Based on a 51,500-customer portfolio at industry-average telecom CLV

Scenario Assumptions

51,500 customers · avg monthly charge $65 · avg CLV $780 · retention offer cost $45/customer · baseline churn rate 17.6% (9,045 churners)

8,807
Predicted churners
identified (recall 97.37%)
out of 9,045 actual churners
$396K
Intervention cost
at $45 per outreach
targeted only, not bulk
$3.4M
Revenue preserved
at 50% win-back rate
$780 CLV × 4,400 saves
8.6×
Return on investment
($3.4M / $396K)
conservative estimate
Every 1% improvement in retention on this customer base preserves approximately $401K in annual revenue — making the model's 97.37% recall directly measurable in dollars, not just in metrics.
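The simulation's arithmetic, reproduced from the stated assumptions (the exact products round slightly differently from the $3.4M and 8.6× display figures):

```python
# ROI simulation inputs as stated above.
customers     = 51_500
avg_clv       = 780      # $ per customer
offer_cost    = 45       # $ per outreach
actual_churn  = 9_045    # assumed actual churners in the portfolio
recall        = 0.9737
win_back_rate = 0.50

predicted = round(actual_churn * recall)      # churners the model flags
cost      = predicted * offer_cost            # targeted intervention spend
saves     = round(predicted * win_back_rate)  # customers won back
preserved = saves * avg_clv                   # revenue preserved
roi       = preserved / cost

per_point = round(0.01 * customers) * avg_clv  # value of +1% retention

print(f"predicted={predicted:,} cost=${cost:,} preserved=${preserved:,} roi={roi:.2f}x")
print(f"each 1% retention improvement ≈ ${per_point:,}")
```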

ML Pipeline

8 sequential stages from raw data to production model

01

Data Ingestion

Load train/test/holdout CSVs. Validate file existence, column presence, and basic schema before any processing begins.

02

Data Validation

Great Expectations suite: schema checks, business rule validation (gender, contract type, charge ranges), and statistical drift detection.
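The repo's actual gate is a Great Expectations suite; the plain-Python sketch below only illustrates the fail-fast idea, with column names assumed from the standard Telco churn schema:

```python
# Hypothetical fail-fast checks mirroring the validation gate: schema presence,
# business rules, and value ranges are verified before any model code runs.
REQUIRED_COLUMNS = {"gender", "Contract", "tenure", "MonthlyCharges", "TotalCharges"}
VALID_GENDER   = {"Male", "Female"}
VALID_CONTRACT = {"Month-to-month", "One year", "Two year"}

def validate(rows: list[dict]) -> None:
    """Raise immediately on the first contract violation (fail fast)."""
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - row.keys()
        if missing:
            raise ValueError(f"row {i}: missing columns {sorted(missing)}")
        if row["gender"] not in VALID_GENDER:
            raise ValueError(f"row {i}: bad gender {row['gender']!r}")
        if row["Contract"] not in VALID_CONTRACT:
            raise ValueError(f"row {i}: bad contract {row['Contract']!r}")
        if not 0 <= row["MonthlyCharges"] <= 500:
            raise ValueError(f"row {i}: MonthlyCharges out of range")

validate([{"gender": "Female", "Contract": "Month-to-month",
           "tenure": 12, "MonthlyCharges": 65.0, "TotalCharges": 780.0}])
print("validation passed")
```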

03

Preprocessing

Column normalisation, null handling strategies per feature, categorical encoding, and TotalCharges numeric conversion.

04

Feature Engineering

11 domain features: CLV, Risk Score, Engagement Score, Contract Stability Index, Service Complexity, Tenure Groups, and more.

05

Hyperparameter Tuning

Optuna Bayesian optimisation over 250 trials. Recall-focused objective with XGBoost GPU backend and custom threshold search.

06

Model Training

XGBoost with tree_method=gpu_hist. Best params from Optuna applied. Early stopping on validation recall. Full MLflow logging.

07

Evaluation

Holdout evaluation on 51,500 samples. Confusion matrix, classification report, AUC-ROC. Recall target ≥ 0.95 enforced by CI.
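A sketch of how such a CI gate might look as a plain assertion (names are illustrative; the repo's actual test may differ):

```python
# Illustrative CI gate: fail the build if holdout recall drops below target.
RECALL_TARGET = 0.95

def check_recall_gate(metrics: dict) -> None:
    recall = metrics["recall"]
    assert recall >= RECALL_TARGET, (
        f"holdout recall {recall:.4f} below required {RECALL_TARGET}"
    )

check_recall_gate({"recall": 0.9737})  # current holdout recall clears the gate
print("recall gate passed")
```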

08

Model Registry

MLflow model registry + AWS S3 versioned artefact storage. Production version tagged, previous versions retained for rollback.

System Architecture Overview

Four layers designed for observability, reproducibility and zero-downtime deployments

🏗️

Data Layer

  • CSV ingestion (train / test / holdout)
  • Great Expectations validation suite
  • Pandas preprocessing pipeline
  • AWS S3 artefact storage
🧠

ML Layer

  • XGBoost + GPU training
  • Optuna 250-trial tuning
  • MLflow experiment tracking
  • Model registry + versioning
🚀

Serving Layer

  • FastAPI microservice
  • Docker multi-stage build
  • Health & readiness endpoints
  • Batch + single prediction APIs
☁️

Infrastructure Layer

  • GitHub Actions 13-job CI/CD
  • AWS ECR container registry
  • ECS-ready deployment config
  • Automated tests + coverage gates

Engineered Features

11 domain-specific features constructed from raw telecom signals

Feature | Formula / Logic | Business Meaning
CLV | MonthlyCharges × tenure | Estimated lifetime revenue per customer
risk_score | Weighted contract + monthly charge + tenure signal | Composite churn propensity index
engagement_score | service_count × tenure_norm | How embedded the customer is in the product
contract_stability | contract_type_encoded × tenure | Contractual lock-in strength
service_complexity | Count of active add-on services | Switching friction from multiple services
tenure_group | Binned tenure: new / mid / loyal | Customer lifecycle stage
charge_per_service | MonthlyCharges / (service_count + 1) | Perceived value-for-money signal
paperless_auto_pay | PaperlessBilling AND AutoPay flag | Digital engagement indicator
senior_no_support | Senior citizen AND no tech support | High-vulnerability segment flag
high_value_churn_risk | CLV > median AND contract = monthly | Priority intervention flag for CRM
charge_increase_risk | TotalCharges / (tenure + 1) deviation | Detects unexpectedly rising cost burden
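A few of the table rows translate directly into code. In the sketch below, the 12/48-month tenure cutoffs, the column names, and the clv_median input are assumptions for illustration, not the repo's actual values:

```python
# Engineer a handful of the table's features for one customer record.
def engineer(row: dict) -> dict:
    service_count = row["service_count"]
    out = dict(row)
    out["CLV"] = row["MonthlyCharges"] * row["tenure"]
    out["charge_per_service"] = row["MonthlyCharges"] / (service_count + 1)
    out["tenure_group"] = ("new" if row["tenure"] < 12
                           else "mid" if row["tenure"] < 48
                           else "loyal")
    out["high_value_churn_risk"] = int(
        out["CLV"] > row["clv_median"] and row["Contract"] == "Month-to-month"
    )
    return out

row = {"MonthlyCharges": 65.0, "tenure": 24, "service_count": 3,
       "Contract": "Month-to-month", "clv_median": 900.0}
feats = engineer(row)
print(feats["CLV"], feats["charge_per_service"], feats["tenure_group"])
```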
Cloud & MLOps

Infrastructure & Tooling

GitHub Actions CI/CD

13 jobs: black, isort, flake8, mypy, bandit, safety, pytest (unit/integration/e2e), coverage gate ≥ 80%, Docker build, ECR push.

AWS S3 + ECR

Model artefacts versioned in S3. Container images built and pushed to ECR in CI. ECS deployment manifests included.

MLflow Tracking

Every training run logs params, metrics, the confusion-matrix artefact, and the model binary. The best run is promoted to the registry automatically.

Docker Multi-stage Build

Builder stage installs deps, runtime stage copies only artefacts. Non-root user, health checks, and PYTHONDONTWRITEBYTECODE optimisations.

Design Patterns

Engineering Principles

Pipeline Factory Pattern

Each stage is an isolated, testable unit. Stages compose into a DAG — easy to swap, extend, or run in parallel.

Config-Driven Inference

Prediction threshold passed as request parameter — no redeploy needed to change business operating point.
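A pure-Python stand-in for that serving contract (the real service is a FastAPI endpoint; the function and field names here are illustrative):

```python
# The threshold arrives with the request, so the business operating point can
# change per call without redeploying the service.
DEFAULT_THRESHOLD = 0.35

def predict(probability: float, threshold: float = DEFAULT_THRESHOLD) -> dict:
    """Turn a model probability into a churn decision at the requested threshold."""
    return {
        "churn_probability": probability,
        "threshold": threshold,
        "churn": probability >= threshold,
    }

print(predict(0.42))                  # default operating point: flagged
print(predict(0.42, threshold=0.50))  # stricter operating point: not flagged
```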

Fail-Fast Validation

Great Expectations gates run before any model code. Bad data raises immediately — no silent model degradation.

Separation of Concerns

Feature engineering, model training, serving, and validation each live in isolated modules — enabling independent testing and deployment.

Tech Stack

ML & Data Science
Python 3.11 XGBoost (GPU) Optuna scikit-learn pandas numpy matplotlib seaborn
MLOps & Tracking
MLflow Great Expectations FastAPI Pydantic v2 Uvicorn
Infrastructure & DevOps
Docker GitHub Actions AWS S3 AWS ECR AWS ECS pytest black bandit mypy

Want to explore the full system?

The full infrastructure code, CI/CD pipeline, Docker configuration, and API implementation are available on GitHub.

View on GitHub