This project analyzes an A/B test dataset from Kaggle: AB Test Data
Goal: Determine whether the variant group generated significantly different conversion rates or revenue compared to the control group.
- USER_ID: Unique user identifier
- VARIANT_NAME: 'control' or 'variant'
- REVENUE: Revenue generated by the user (can be 0)
Key facts:
- 10,000 total records, ~50/50 split between groups.
- 98.5% of users have zero revenue.
- Highly skewed revenue distribution with extreme outliers.
- Checked data balance, missing values, and outliers.
- Visualized conversion and revenue distributions.
- Conversion Rate: Two-proportion Z-test (binary paid/not paid).
- Revenue (paying users): Mann–Whitney U test (non-parametric).
- Effect sizes for both tests: Cohen’s h (conversion rate) and rank-biserial correlation (revenue).
- Bootstrap confidence intervals for revenue differences.
- Minimum Detectable Effect (MDE) & power analysis.
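
The revenue steps above (rank-biserial effect size and bootstrap interval) can be sketched as follows. This is a minimal, standard-library illustration rather than the notebook's code: `rank_biserial` and `bootstrap_ci` are hypothetical helper names, the revenue values are made up, and the bootstrapped statistic (the mean) is an assumption, since the exact statistic is not stated here. In practice `scipy.stats.mannwhitneyu` would supply the U statistic and p-value.

```python
import random
from statistics import mean

def rank_biserial(x, y):
    """Rank-biserial correlation: share of pairs where x wins minus share where y wins."""
    wins = sum(1 for a in x for b in y if a > b)
    losses = sum(1 for a in x for b in y if a < b)
    return (wins - losses) / (len(x) * len(y))

def bootstrap_ci(x, y, stat=mean, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for stat(x) - stat(y), resampling each group separately."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        xs = rng.choices(x, k=len(x))  # resample with replacement
        ys = rng.choices(y, k=len(y))
        diffs.append(stat(xs) - stat(ys))
    diffs.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return diffs[lo_idx], diffs[hi_idx]

# Made-up payer revenues, for illustration only (not the Kaggle data).
control_rev = [1.2, 2.9, 3.0, 4.5, 25.0]
variant_rev = [0.8, 2.0, 2.2, 3.1, 6.0]

r = rank_biserial(control_rev, variant_rev)
ci_low, ci_high = bootstrap_ci(control_rev, variant_rev)
print(f"rank-biserial r = {r:.2f}")
print(f"95% bootstrap CI for mean difference: [{ci_low:.2f}, {ci_high:.2f}]")
```

Passing `stat=median` (from `statistics`) would bootstrap the median difference instead, which is less sensitive to the extreme outliers noted above.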
Metric | Control | Variant | p-value | Significant? |
---|---|---|---|---|
Conversion rate | 1.61% | 1.44% | 0.488 | ❌ No |
Median revenue (paying users) | 2.96 | 2.17 | 0.079 | ❌ No |
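
The conversion-rate comparison can be checked from the counts reported in this README (80 of 4,984 control users and 72 of 5,016 variant users paid). The sketch below is a pooled two-proportion z-test written with the standard library only; `two_proportion_ztest` is a hypothetical helper name, and the notebook may well use a library routine instead.

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p_value = two_proportion_ztest(80, 4984, 72, 5016)
print(f"z = {z:.3f}, p = {p_value:.3f}")
```

With these counts the p-value comes out at about 0.488, matching the table.
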
Group | Total Users | Paying Users | Conversion Rate | Mean Revenue (Paying) | Median Revenue (Paying) |
---|---|---|---|---|---|
control | 4984 | 80 | 1.61% | 8.04 | 2.96 |
variant | 5016 | 72 | 1.44% | 4.88 | 2.17 |
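
A per-group summary like the one above could be produced along these lines. This is a sketch assuming pandas and the three dataset columns listed earlier; the tiny frame is made up for illustration, and `PAID` is a hypothetical helper column, not part of the dataset.

```python
import pandas as pd

# Tiny made-up sample with the dataset's column names (not real data).
df = pd.DataFrame({
    "USER_ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "VARIANT_NAME": ["control"] * 4 + ["variant"] * 4,
    "REVENUE": [0.0, 0.0, 2.5, 0.0, 0.0, 1.0, 0.0, 3.0],
})

# Basic checks: missing values per column and group balance.
missing_per_column = df.isna().sum()
group_sizes = df["VARIANT_NAME"].value_counts()

# Binary conversion flag: did the record generate any revenue?
df["PAID"] = df["REVENUE"] > 0

summary = df.groupby("VARIANT_NAME").agg(
    total_users=("USER_ID", "count"),   # counts records per group
    paying_users=("PAID", "sum"),
    conversion_rate=("PAID", "mean"),
)

# Mean and median revenue among paying records only.
paying = df[df["PAID"]].groupby("VARIANT_NAME")["REVENUE"].agg(["mean", "median"])
summary = summary.join(paying.add_prefix("paying_"))
print(summary)
```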
- Effect sizes indicate negligible to small differences.
- Bootstrap CI includes zero → differences may be positive or negative.
- Power analysis shows an MDE of ≈ 0.78 pp; detecting an effect as small as the observed difference would require ~16× more data.
- Revenue data is heavily skewed, making non-parametric tests more reliable.
- No statistical evidence that the variant performs differently from control in conversion rate or payer revenue.
- The small observed differences could be due to random chance, and the dataset is underpowered for small effect sizes.
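
The effect-size and power figures can be approximated from the reported counts. The sketch below uses the normal approximation on Cohen's h and assumes a two-sided test with alpha = 0.05 and 80% power; those settings are conventional defaults, not values stated in this README, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def cohen_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

def detectable_h(n1, n2, alpha=0.05, power=0.80):
    """Smallest |h| a two-sided z-test can detect (alpha and power are assumed)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * math.sqrt(1 / n1 + 1 / n2)

def h_to_uplift(p_base, h):
    """Convert an effect size h into an absolute increase over p_base."""
    return math.sin(math.asin(math.sqrt(p_base)) + h / 2) ** 2 - p_base

p_control, p_variant = 80 / 4984, 72 / 5016
h_obs = cohen_h(p_control, p_variant)       # observed effect size
h_min = detectable_h(4984, 5016)            # smallest detectable effect size
mde = h_to_uplift(p_control, h_min)         # same, as an absolute rate change
data_multiplier = (h_min / h_obs) ** 2      # required n scales with 1 / h^2

print(f"observed h = {h_obs:.4f}")
print(f"MDE = {mde * 100:.2f} pp")
print(f"data needed to detect the observed effect = {data_multiplier:.1f}x current")
```

Under these assumptions the sketch gives an MDE of roughly 0.78 pp and a multiplier of about 16, in line with the figures quoted above.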
- AB_Test_Project.ipynb → Full analysis notebook
- AB_Test_Project.pdf → Clean PDF version
- AB_Test.pptx → Slide deck presentation
- Hypothesis testing: Z-test, Mann–Whitney
- Effect size interpretation
- Power analysis & MDE
- Bootstrap confidence intervals
- Data visualization & storytelling