Portfolio of reproducible data-science projects (forecasting + NLP) on synthetic retail datasets. Notebooks, figures, and READMEs included.

nbchambers95/data-science-projects

Data Science & Analytics Portfolio

Python 3.9+ | pandas | statsmodels | matplotlib | seaborn | License: MIT

This repo highlights applied projects in text analytics, forecasting, and regression.
All datasets here are synthetic or publicly available; no sensitive data is included.


📁 Projects

1) Milkshake Sales Forecasting — OLS (lags & weather) vs ARIMA/SARIMAX

Figures: OLS (lags) forecast; OLS (lags + weather) forecast
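As a rough illustration of the lag-based OLS idea named in this project, here is a minimal sketch. It is not the notebook's code: it uses plain NumPy least squares instead of statsmodels, a made-up synthetic series, and hypothetical names (`make_lag_matrix`, `sales`, the lags 1 and 7, the 28-day holdout). Weather regressors, if available, could be appended as extra columns in the same way.

```python
import numpy as np

def make_lag_matrix(y, lags):
    """Build (X, target): X has an intercept column plus one column per lag of y."""
    y = np.asarray(y, dtype=float)
    max_lag = max(lags)
    target = y[max_lag:]                       # rows that have every lag available
    cols = [np.ones(len(target))]              # intercept
    for k in lags:
        cols.append(y[max_lag - k : len(y) - k])   # value of y from k steps earlier
    return np.column_stack(cols), target

# Hypothetical synthetic daily series with a weekly cycle (assumption, not repo data)
rng = np.random.default_rng(0)
t = np.arange(200)
sales = 100 + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 2, size=t.size)

X, target = make_lag_matrix(sales, lags=(1, 7))

# Chronological split: fit on the early rows, evaluate on the last 28
split = len(target) - 28
coef, *_ = np.linalg.lstsq(X[:split], target[:split], rcond=None)

# One-step-ahead predictions (each uses the actual lagged values)
pred = X[split:] @ coef
rmse = float(np.sqrt(np.mean((target[split:] - pred) ** 2)))
print(f"test RMSE: {rmse:.2f}")
```

The split is chronological rather than random so that the model is never evaluated on dates earlier than its training data.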


2) Competitive Analysis — Text Analytics

Compare sentiment and themes between two brands using NLP (sentiment, n-grams, light topic modeling).

Figures: Sentiment by business; Top unigrams by business; Top bigrams by business
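The "top unigrams/bigrams by business" comparison can be sketched with the standard library alone. This is an assumed, simplified version: the tokenizer, the tiny stopword set, the helper name `top_ngrams`, and the example reviews are all hypothetical and not taken from the project's notebooks.

```python
import re
from collections import Counter

def top_ngrams(texts, n=1, k=3, stopwords=frozenset()):
    """Return the k most frequent n-grams across texts as (ngram, count) pairs."""
    counts = Counter()
    for text in texts:
        # Lowercase, keep alphabetic tokens, drop stopwords
        tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in stopwords]
        # Note: n-grams are formed after stopword removal, so they may span a removed word
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts.most_common(k)

# Made-up reviews for two hypothetical brands
reviews = {
    "Brand A": ["Great milkshake and friendly staff",
                "Friendly staff, great milkshake flavors"],
    "Brand B": ["Slow service but good fries",
                "Good fries, slow service again"],
}
STOP = frozenset({"and", "but", "the", "a"})

for brand, texts in reviews.items():
    print(brand, top_ngrams(texts, n=2, k=2, stopwords=STOP))
```

Running the same function per brand keeps the counts directly comparable, which is the point of a side-by-side chart.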


3) Storewide Sales Forecasting — Linear Models (Calendar vs Calendar+Weather)

Figures: OLS calendar-only forecast; OLS calendar + weather forecast

Headline (test set, dollars):

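| Model              | RMSE ($) | MAE ($) | % within $20 | PI coverage | Avg PI range |
|--------------------|----------|---------|--------------|-------------|--------------|
| Calendar only      | 89.98    | 59.49   | 50.70%       | 88.73%      | $790.84      |
| Calendar + weather | 91.46    | 59.96   | 53.52%       | 91.55%      | $734.92      |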
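For readers unfamiliar with the columns above, the following sketch shows one plausible way to compute them. The definitions are assumptions, since the README does not spell them out: "% within $20" is taken as the share of predictions with absolute error of at most $20, "PI coverage" as the share of actual values falling inside the prediction interval, and "Avg PI range" as the mean interval width. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def forecast_metrics(y_true, y_pred, pi_lower, pi_upper, tol=20.0):
    """Summarize point and interval forecast accuracy (assumed definitions)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    pi_lower = np.asarray(pi_lower, dtype=float)
    pi_upper = np.asarray(pi_upper, dtype=float)

    err = y_true - y_pred
    inside = (y_true >= pi_lower) & (y_true <= pi_upper)
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "pct_within_tol": float(np.mean(np.abs(err) <= tol) * 100),
        "pi_coverage_pct": float(np.mean(inside) * 100),
        "avg_pi_range": float(np.mean(pi_upper - pi_lower)),
    }

# Toy values for illustration only (not the project's data)
y_true = [100, 150, 200, 250]
y_pred = [110, 140, 230, 250]
lower = [60, 90, 205, 200]
upper = [160, 190, 255, 300]

metrics = forecast_metrics(y_true, y_pred, lower, upper)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting interval coverage alongside average interval width matters because a model can reach high coverage simply by producing very wide intervals.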

🛠️ Stack

Python (pandas, NumPy, scikit-learn, statsmodels, matplotlib, seaborn)

Provenance & acknowledgements

  • This repository contains my capstone project for the M.S. at Appalachian State University.
  • All code was written by me. I drew on class materials and public documentation for reference, and I generated/used synthetic datasets that mirror the original private data.
  • I received feedback on modeling choices and presentation from Prof. Jeff Kaleta.
  • I also used AI assistance (ChatGPT) for drafting/refactoring text, improving documentation, and suggesting code organization. I reviewed, tested, and am responsible for all final code and results.
