Portfolio · Case Study
A customer analytics platform for a bilingual DTC health-supplements brand running separate English and Spanish Shopify storefronts (~83K combined customers). Order, customer, and web-analytics data from both storefronts were ingested through PySpark pipelines on Databricks into a medallion (Bronze/Silver/Gold) Delta Lake architecture, then unified into a single customer table keyed on email. Customers were segmented with RFM — recency, frequency, and monetary value — using KMeans on each dimension, producing four behavioral segments: Window-Shoppers, One-Timers, Emerging Loyalists, and Loyalists. A cohort-based XGBoost model then predicted which lifetime-value tier a customer would reach from only their first four months of activity, and a migration analysis tracked how cohorts moved between segments over time. The findings drove a segment-level marketing strategy spanning discounting, channel investment, and reactivation.
Two separate Shopify storefronts — English and Spanish — plus Google Analytics and call-center data were ingested through PySpark pipelines on Databricks into a medallion (Bronze/Silver/Gold) Delta Lake architecture. Both customer bases were unified into a single table keyed on email, with storefront kept as a dimension, then enriched with RFM features and monthly cohort assignments.
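The unification step can be illustrated with a minimal plain-Python sketch. The project itself ran this in PySpark on Delta tables. Here the row layout, the `normalize_email` and `unify_customers` helpers, the email normalization rule, and the choice to keep storefronts as a set are all assumptions made for illustration, and the sample rows are invented.

```python
def normalize_email(email):
    """Lowercase and trim so the same address matches across storefronts (assumed rule)."""
    return email.strip().lower()


def unify_customers(storefront_rows):
    """Merge per-storefront rows into one record per email.

    storefront_rows: iterable of (storefront, email, order_count, revenue).
    Returns a dict keyed on the normalized email.
    """
    unified = {}
    for storefront, email, orders, revenue in storefront_rows:
        key = normalize_email(email)
        record = unified.setdefault(
            key,
            {"email": key, "storefronts": set(), "orders": 0, "revenue": 0.0},
        )
        # Storefront stays available as a dimension on the unified record.
        record["storefronts"].add(storefront)
        record["orders"] += orders
        record["revenue"] += revenue
    return unified


# Invented sample rows: one customer appears in both storefronts.
rows = [
    ("en", "Ana@example.com", 2, 80.0),
    ("es", "ana@example.com ", 1, 35.0),
    ("es", "luis@example.com", 3, 120.0),
]
customers = unify_customers(rows)
```

Keying on a normalized email means a customer who buys from both storefronts collapses into one record, while the storefront set still allows per-language breakdowns.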
Customers were segmented with classic RFM, but each dimension was clustered independently with KMeans rather than bucketed by fixed thresholds, so the cut points adapt to the actual distribution of the data. The segments then fed a cohort-based LTV model that predicted a customer's eventual value tier from only their first four months of behavior.
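Clustering each dimension on its own can be sketched as one-dimensional k-means run three times. This is a minimal pure-Python version of Lloyd's algorithm, not the project's code: the `kmeans_1d` helper, its deterministic initialization, the choice of two clusters, and the sample values are all assumptions. How the three per-dimension scores were combined into the four named segments is not covered here.

```python
def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm on a single feature; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    # Relabel so that cluster 0 always has the smallest center.
    order = sorted(range(k), key=lambda c: centers[c])
    rank = {c: r for r, c in enumerate(order)}
    return [centers[c] for c in order], [rank[lab] for lab in labels]


# Invented sample: six customers, one list per RFM dimension.
rfm = {
    "recency_days": [5, 12, 20, 180, 200, 240],
    "frequency": [6, 5, 7, 1, 1, 2],
    "monetary": [420.0, 380.0, 510.0, 40.0, 35.0, 60.0],
}
scores = {name: kmeans_1d(values, k=2)[1] for name, values in rfm.items()}
```

Because the centers are fitted per dimension, the boundary between a low and a high score falls wherever the data separates rather than at a preset threshold. Note that a low recency label here means a recent purchase, so the direction of that score has to be handled when interpreting it.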

The full customer base, scored on recency, frequency, and monetary value and split into four behavioral segments, each profiled by average order value (AOV), lifetime value (LTV), units per order, and subscription rate.

Where customers come from, paired with what each channel is actually worth: traffic mix alongside AOV, LTV, and units-per-order by source.

How the segments compare on RFM and revenue, and how a single monthly cohort migrates toward Loyalist status over a full year.

Transition probabilities between segments a year on, showing which customers advance, hold, or slip back, and where intervention pays off.
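A transition matrix of this kind can be computed by counting customers by starting and ending segment and normalizing each row. The sketch below is a minimal illustration, assuming each customer is represented as a (segment at start, segment a year later) pair. The `transition_probabilities` helper is hypothetical and the sample pairs are invented, not project results.

```python
from collections import Counter, defaultdict

SEGMENTS = ["Window-Shoppers", "One-Timers", "Emerging Loyalists", "Loyalists"]


def transition_probabilities(pairs):
    """Row-normalize counts of (start_segment, end_segment) pairs."""
    counts = defaultdict(Counter)
    for start, end in pairs:
        counts[start][end] += 1
    matrix = {}
    for start in SEGMENTS:
        total = sum(counts[start].values())
        # Leave rows with no observed customers empty instead of dividing by zero.
        matrix[start] = (
            {end: counts[start][end] / total for end in SEGMENTS} if total else {}
        )
    return matrix


# Invented sample: where six customers started and where they were a year on.
pairs = [
    ("One-Timers", "One-Timers"),
    ("One-Timers", "Emerging Loyalists"),
    ("One-Timers", "One-Timers"),
    ("One-Timers", "Window-Shoppers"),
    ("Emerging Loyalists", "Loyalists"),
    ("Emerging Loyalists", "Emerging Loyalists"),
]
matrix = transition_probabilities(pairs)
```

Each row reads as the share of a starting segment that advanced, held, or slipped back, so the off-diagonal cells are the ones that point at where intervention might pay off.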