📄 Final Report (PDF): View Full Analysis
📊 Google Colab Notebook: View Cleaned Source Code and Full Version
📽️ Checkpoint Presentation Video: Watch on YouTube
This project explores how unsupervised machine learning can help uncover socially vulnerable communities across the state of Virginia using the CDC’s 2018 Social Vulnerability Index (SVI) dataset. Our workflow includes:
- Dimensionality Reduction: Principal Component Analysis (PCA) reduces 128 socio-demographic indicators to a compact vulnerability space.
- Clustering: K-Means provides full-state coverage, while DBSCAN identifies dense clusters of high-risk communities (hotspots).
- Scoring Metrics: Two proposed composite scores, PCA-Norm and Distance-Norm, are validated against the official SVI scores using Pearson correlation.
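As a rough illustration of how these three steps fit together, here is a minimal Python sketch using scikit-learn. It is not the project's notebook code. The number of components, the cluster count, the DBSCAN `eps`/`min_samples` values, the PC1 sign convention, and the formulas for PCA-Norm and Distance-Norm are all hypothetical stand-ins, because this README does not define them. The demo at the bottom uses synthetic data in place of the real 2018 SVI tract table.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def min_max(x: np.ndarray) -> np.ndarray:
    """Rescale a 1-D array to [0, 1]; a constant array maps to all zeros."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x, dtype=float)


def vulnerability_pipeline(X: np.ndarray, n_components: int = 5, n_clusters: int = 4,
                           eps: float = 3.0, min_samples: int = 5, seed: int = 0) -> dict:
    """Hypothetical sketch: PCA -> K-Means / DBSCAN -> two composite scores."""
    # Standardize each indicator so no single scale dominates the PCA.
    Z = StandardScaler().fit_transform(X)

    # Project the indicators into a low-dimensional "vulnerability space".
    pca = PCA(n_components=n_components, random_state=seed)
    P = pca.fit_transform(Z)

    # PCA signs are arbitrary. Assumption: indicators are coded so that larger
    # values mean more vulnerability, so orient PC1 toward larger values.
    if pca.components_[0].sum() < 0:
        P[:, 0] = -P[:, 0]

    # K-Means assigns every tract to a cluster (full-state coverage).
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(P)

    # DBSCAN keeps only dense groups; sparse tracts are labeled -1 (noise).
    dbscan_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(P)

    # Hypothetical PCA-Norm: the PC1 score rescaled to [0, 1].
    pca_norm = min_max(P[:, 0])

    # Hypothetical Distance-Norm: distance from the centroid of the K-Means
    # cluster with the lowest mean PCA-Norm, rescaled to [0, 1].
    cluster_means = [pca_norm[kmeans.labels_ == k].mean() for k in range(n_clusters)]
    reference = kmeans.cluster_centers_[int(np.argmin(cluster_means))]
    distance_norm = min_max(np.linalg.norm(P - reference, axis=1))

    return {"kmeans": kmeans.labels_, "dbscan": dbscan_labels,
            "pca_norm": pca_norm, "distance_norm": distance_norm}


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


if __name__ == "__main__":
    # Synthetic stand-in data: 300 "tracts" x 128 indicators driven by one latent factor.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=300)
    loadings = rng.uniform(0.5, 1.5, size=128)
    X = latent[:, None] * loadings + rng.normal(size=(300, 128))
    svi = min_max(latent)  # stand-in for the official SVI scores

    out = vulnerability_pipeline(X)
    print("PCA-Norm r:", pearson(out["pca_norm"], svi))
    print("Distance-Norm r:", pearson(out["distance_norm"], svi))
```

Standardizing before PCA matters here because SVI indicators mix counts, percentages, and dollar amounts; without it, the leading components would simply track whichever columns have the largest raw variance.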
This project was developed for CS 4774: Machine Learning at the University of Virginia (Spring 2025), and offers a data-driven framework for identifying and visualizing social vulnerability across Virginia's communities.