RFM Customer Segmentation: Analyzing Purchase Behavior
Applying the Recency, Frequency, Monetary (RFM) model to transform raw transactional data into meaningful customer segments, driving personalized marketing and retention efforts.
Project Overview: Quantifying Customer Value
RFM analysis is a data-driven marketing technique used to quantitatively rank and group customers based on their purchasing habits. The resulting segments allow businesses to allocate marketing resources efficiently by targeting the most valuable or 'at-risk' customer groups.
- Dataset: E-commerce Transactional Data (Customer ID, Invoice No, Invoice Date, Sales Amount).
- Goal: Create and label distinct customer segments (e.g., Champions, At-Risk, New Customers).
- Technique: Quantile-based Ranking (pandas qcut, five bins) for R, F, and M scores.
- Deliverable: Actionable segmentation for personalized campaigns.
Step 1: Calculating R, F, and M Metrics
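The first step aggregates the transactional data per customer ID to derive the three key metrics relative to a defined 'snapshot date'. The snapshot is usually anchored to the most recent date in the dataset; the code below uses one day after the last transaction, so the most recent buyer has a Recency of 1 rather than 0.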
import pandas as pd
# Load and prepare data (simulated)
df = pd.read_csv('ecommerce_transactions.csv')
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
# Define a snapshot date (1 day after the last transaction date)
SNAPSHOT_DATE = df['InvoiceDate'].max() + pd.Timedelta(days=1)
# Calculate RFM metrics per customer
rfm_df = df.groupby('CustomerID').agg(
    Recency=('InvoiceDate', lambda x: (SNAPSHOT_DATE - x.max()).days),
    Frequency=('InvoiceNo', 'nunique'),
    Monetary=('SalesAmount', 'sum')
).reset_index()
print("RFM Metrics Calculated:")
print(rfm_df.head())
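Because the CSV above is only simulated, the same aggregation can be checked on a handful of rows. The sketch below uses a hypothetical toy table (the customer IDs, invoice numbers, dates, and amounts are made up for illustration) with the same column names as the snippet above.

```python
import pandas as pd

# Hypothetical toy transactions; only the column names come from the snippet above.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceNo": ["A1", "A2", "B1", "C1", "C2", "C2"],
    "InvoiceDate": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2024-01-20",
        "2024-02-10", "2024-03-09", "2024-03-09",
    ]),
    "SalesAmount": [20.0, 35.0, 15.0, 50.0, 10.0, 5.0],
})

# Snapshot date: one day after the last transaction (2024-03-10 here).
snapshot = toy["InvoiceDate"].max() + pd.Timedelta(days=1)

toy_rfm = toy.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda x: (snapshot - x.max()).days),
    Frequency=("InvoiceNo", "nunique"),   # distinct invoices, not line items
    Monetary=("SalesAmount", "sum"),
).reset_index()

print(toy_rfm)
```

Customer 3 has two rows on invoice C2, which count as one purchase because Frequency uses nunique on the invoice number rather than a row count.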
Step 2: Assigning RFM Scores (1-5)
To simplify segmentation, we rank each metric into a score from 1 to 5 using quintiles (five equal-sized bins from pandas qcut). Note that Recency is scored inversely: the lower the recency value (more recent), the higher the score (5).
# Assign scores using quantiles (5 is the best score)
# R_Score: Lower Recency is better, so we reverse the quantile order
rfm_df['R_Score'] = pd.qcut(rfm_df['Recency'], 5, labels=[5, 4, 3, 2, 1])
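# F_Score: Higher Frequency is better
# Frequency has many tied values, so rank(method='first') is applied first to keep the bin edges unique
rfm_df['F_Score'] = pd.qcut(rfm_df['Frequency'].rank(method='first'), 5, labels=[1, 2, 3, 4, 5])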
# M_Score: Higher Monetary is better
rfm_df['M_Score'] = pd.qcut(rfm_df['Monetary'], 5, labels=[1, 2, 3, 4, 5])
# Combine R, F, and M scores into a single RFM_Score string
rfm_df['RFM_Score'] = rfm_df['R_Score'].astype(str) + rfm_df['F_Score'].astype(str) + rfm_df['M_Score'].astype(str)
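The scoring code above ranks Frequency before binning but bins Recency directly. A minimal sketch of why, using hypothetical values: when one value dominates a column, several quintile edges coincide and qcut raises a ValueError, while ranking with method='first' gives every row a distinct position.

```python
import pandas as pd

# Hypothetical frequency counts: six of ten customers bought exactly once.
freq = pd.Series([1, 1, 1, 1, 1, 1, 2, 3, 4, 5])

try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
    direct_cut_works = True
except ValueError:
    # Several quintile edges collapse onto the value 1, so the bins are not unique.
    direct_cut_works = False

# Ranking first breaks ties by row order, so each quintile holds two customers.
f_score = pd.qcut(freq.rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)

# Hypothetical recency values (days); reversed labels give recent buyers the high score.
recency = pd.Series([3, 10, 25, 40, 60, 90, 120, 200, 300, 365])
r_score = pd.qcut(recency, 5, labels=[5, 4, 3, 2, 1]).astype(int)

print(direct_cut_works)
print(f_score.tolist())
print(r_score.tolist())
```

One trade-off of this approach is that customers with identical frequency can land in different score bins, depending only on row order. The same ranking step may also be needed for Recency or Monetary if those columns contain heavy ties.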
Step 3: Defining Customer Segments
Using the combined RFM score, we map customers into high-level, actionable segments. For example, "Champions" (R=5, F=5, M=5) are the most valuable customers, while "Hibernating" customers (R=1, F=1, M=1) score lowest on every metric and are treated as a last-resort group.
Core RFM Segments
Actionable Marketing Strategy
The real value of RFM lies in its direct applicability to marketing campaigns. Each segment requires a unique treatment.
- Champions (R:5, F:5, M:5): Reward them. Focus on high-value pre-sales or exclusive new product launches. Avoid excessive discounting.
- At-Risk Customers (R:1-2, F:3-5, M:3-5): Win-back campaigns. Offer personalized discounts or surveys to understand why they stopped purchasing.
- Hibernating (R:1, F:1, M:1): Last resort. Deep discounting or entirely new product line offers to wake them up.
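The score ranges in the list above can be turned into a labeling function. The sketch below is one possible mapping, not a complete segmentation: label_segment, the "Other" fallback, and the sample scores are hypothetical, and only the three segments listed above are covered.

```python
import pandas as pd

def label_segment(r: int, f: int, m: int) -> str:
    """Map integer R, F, M scores (1-5) to one of the segments listed above."""
    if r == 5 and f == 5 and m == 5:
        return "Champions"
    if r <= 2 and f >= 3 and m >= 3:
        return "At-Risk Customers"
    if r == 1 and f == 1 and m == 1:
        return "Hibernating"
    # Assumption: combinations not described in the list get a placeholder label.
    return "Other"

# Hypothetical scored customers, shaped like the R_Score/F_Score/M_Score columns.
scores = pd.DataFrame({
    "CustomerID": [101, 102, 103, 104],
    "R_Score": [5, 2, 1, 4],
    "F_Score": [5, 4, 1, 2],
    "M_Score": [5, 3, 1, 3],
})

scores["Segment"] = [
    label_segment(r, f, m)
    for r, f, m in zip(scores["R_Score"], scores["F_Score"], scores["M_Score"])
]

print(scores)
```

The score columns produced by qcut are categorical, so they would need to be converted with astype(int) before being passed to a function like this one.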