End-to-End ML Regression Project

This repository walks through a machine learning data pipeline end to end, from data acquisition through feature transformation. It is based on Chapter 2 of Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron.

The project uses the California housing dataset and is designed to serve as a robust template for structuring real-world regression workflows.

Project Highlights

Clean, modular pipeline using Scikit-Learn
Exploratory Data Analysis (EDA) with insightful visualizations
Feature engineering (ratios, log transforms, cluster similarities)
Custom transformers and reusable pipeline components
One-hot encoding of categorical features
Unified ColumnTransformer-based preprocessing pipeline
Output-ready dataset with 24 engineered features
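
As a rough illustration of how the highlights above fit together, the sketch below combines a ratio feature, log transforms, and one-hot encoding in a single `ColumnTransformer`. It is a minimal example and not the repository's actual pipeline: the helper names (`column_ratio`, `ratio_pipeline`), the choice of columns, and the four-row sample are assumptions made for illustration, and the output here has far fewer than the 24 features of the full pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler


def column_ratio(X):
    # X holds exactly two columns; return the first divided by the second.
    X = np.asarray(X, dtype=float)
    return X[:, [0]] / X[:, [1]]


def ratio_pipeline():
    # Impute missing values, compute the ratio, then standardize it.
    return make_pipeline(
        SimpleImputer(strategy="median"),
        FunctionTransformer(column_ratio),
        StandardScaler(),
    )


# Log-transform skewed, strictly positive columns before scaling.
log_pipeline = make_pipeline(
    SimpleImputer(strategy="median"),
    FunctionTransformer(np.log),
    StandardScaler(),
)

preprocessing = ColumnTransformer(
    [
        ("rooms_per_house", ratio_pipeline(), ["total_rooms", "households"]),
        ("log", log_pipeline, ["population", "median_income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["ocean_proximity"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Made-up rows, used only to show the shape of the transformed output.
sample = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "households": [126.0, 1138.0, 177.0, 219.0],
    "population": [322.0, 2401.0, 496.0, 558.0],
    "median_income": [8.3, 8.3, 7.3, 5.6],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "INLAND"],
})

prepared = preprocessing.fit_transform(sample)
print(prepared.shape)
```

Keeping every transformation inside one `ColumnTransformer` means the same fitted object can later be applied unchanged to a test set or to new data.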

Goal

Provide a clear, modular, and reproducible example of a real-world machine learning pipeline for regression tasks, structured to support scaling, experimentation, and future model training.

Dataset

California Housing Dataset from the 1990 U.S. Census

Target: median house value, predicted from 9+ input features
Common benchmark for regression modeling and pipeline design
Loaded via `fetch_california_housing` or from an external `.tgz` file
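
The `.tgz` loading path with caching could look roughly like the sketch below. This is an assumption-laden example rather than the repository's own `fetch_housing_data()`: the signature, the `datasets` default directory, the `housing.tgz` file name, and the assumption that the archive contains a file called `housing.csv` are all illustrative. The download URL is left as a parameter because the source does not state it.

```python
import tarfile
import urllib.request
from pathlib import Path

import pandas as pd


def fetch_housing_data(url, data_dir="datasets"):
    """Download the archive once, extract it, and return the CSV as a DataFrame.

    Assumes the archive contains a file named housing.csv (hypothetical layout).
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    tarball_path = data_dir / "housing.tgz"

    # Caching: skip the download when the archive is already on disk.
    if not tarball_path.is_file():
        urllib.request.urlretrieve(url, tarball_path)

    with tarfile.open(tarball_path) as tarball:
        tarball.extractall(path=data_dir)

    # Locate the CSV wherever the archive placed it under data_dir.
    csv_path = next(data_dir.rglob("housing.csv"))
    return pd.read_csv(csv_path)
```

The alternative, `sklearn.datasets.fetch_california_housing(as_frame=True)`, needs no manual extraction, but it returns a numeric-only variant of the dataset with 8 features and no `ocean_proximity` column, so the one-hot encoding step would have nothing to encode.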

Core Concepts Covered

| Concept | Implementation |
| --- | --- |
| Data Loading | `fetch_housing_data()` with caching |
| Visualization | Histograms, scatterplots, `scatter_matrix` |
| Stratified Sampling | Based on income categories |
| Correlation Analysis | Pearson coefficient, matrix, and plots |
| Feature Engineering | Rooms-per-household, income ratios, etc. |
| Pipelines | `Pipeline` + `ColumnTransformer` |
| Custom Transformers | `CombinedAttributesAdder`, cluster encoder |
| Categorical Encoding | `OneHotEncoder` |
| Scaling | `StandardScaler` |
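
Two of the concepts listed above, stratified sampling on income categories and a cluster-based custom transformer, can be sketched as follows. The income bin edges, the `stratified_split` helper, the `ClusterSimilarity` class name, and the synthetic data are assumptions chosen for illustration; the notebooks may implement these steps differently.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split


def stratified_split(df, test_size=0.2, random_state=42):
    """Split df so each income category keeps its share in both sets."""
    income_cat = pd.cut(
        df["median_income"],
        bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],  # assumed bin edges
        labels=[1, 2, 3, 4, 5],
    )
    return train_test_split(
        df, test_size=test_size, stratify=income_cat, random_state=random_state
    )


class ClusterSimilarity(BaseEstimator, TransformerMixin):
    """Map each row to its RBF similarity to every k-means cluster center."""

    def __init__(self, n_clusters=10, gamma=1.0, random_state=None):
        self.n_clusters = n_clusters
        self.gamma = gamma
        self.random_state = random_state

    def fit(self, X, y=None):
        self.kmeans_ = KMeans(
            n_clusters=self.n_clusters, n_init=10, random_state=self.random_state
        )
        self.kmeans_.fit(X)
        return self

    def transform(self, X):
        return rbf_kernel(X, self.kmeans_.cluster_centers_, gamma=self.gamma)


# Synthetic stand-in for the housing data, used only to exercise the code.
rng = np.random.default_rng(0)
housing = pd.DataFrame({
    "median_income": rng.uniform(0.5, 10.0, size=500),
    "latitude": rng.uniform(32.5, 42.0, size=500),
    "longitude": rng.uniform(-124.3, -114.3, size=500),
})

train_set, test_set = stratified_split(housing)
cluster_simil = ClusterSimilarity(n_clusters=5, gamma=1.0, random_state=42)
similarities = cluster_simil.fit_transform(train_set[["latitude", "longitude"]])
print(len(train_set), len(test_set), similarities.shape)
```

Because `ClusterSimilarity` follows the `fit`/`transform` convention, it can be dropped into a `Pipeline` or `ColumnTransformer` like any built-in transformer.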

Reference

Based on concepts from:

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
by Aurélien Géron

Notes

This repo focuses only on data handling and preprocessing. Model training, evaluation, and hyperparameter tuning
