This is a small data science project that applies statistics in Python. Visualizations were created and hypothesis tests were performed with Python libraries on a dataset of housing in Boston, MA.

benjaminfunk47/Boston_Housing_Hypothesis_Testing

Tools Used:

This small project creates visualizations and performs hypothesis tests on a dataset of houses in Boston, Massachusetts. All of the work was done in Python, using the following libraries:

  • SciPy
  • Seaborn
  • Matplotlib
  • Pandas

Project Scenario:

I'm a data scientist at a housing agency in Boston, MA, and have been given access to an existing dataset on housing prices, derived from the U.S. Census Service, to present insights to upper management. Drawing on my experience in statistics, what information can I provide to help them make an informed decision? Upper management would like some insight into the following questions:

  • Is there a significant difference in median value between houses that bound the Charles River and those that do not?

  • Is there a difference in median home values across groups defined by the proportion of owner-occupied units built before 1940?

  • Can we conclude that there is no relationship between nitric oxide concentrations and the proportion of non-retail business acres per town?

  • What is the impact of an additional unit of weighted distance to the five Boston employment centres on the median value of owner-occupied homes?
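One possible way to map the four questions onto SciPy tests is sketched below. It is not taken from the project's notebook: the choice of tests (Welch's t-test, one-way ANOVA, Pearson correlation, simple linear regression), the AGE cut points of 35 and 70, and the helper names `make_synthetic_housing` and `run_tests` are all assumptions made for illustration. The data is randomly generated so the block runs on its own; the numbers it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats


def make_synthetic_housing(n=300, seed=0):
    """Build fake data with the same column names (NOT the real dataset)."""
    rng = np.random.default_rng(seed)
    chas = rng.integers(0, 2, n)
    age = rng.uniform(0, 100, n)
    indus = rng.uniform(0, 28, n)
    dis = rng.uniform(1, 12, n)
    nox = 0.4 + 0.01 * indus + rng.normal(0, 0.03, n)
    medv = 20 + 5 * chas - 0.05 * age + 1.0 * dis + rng.normal(0, 3, n)
    return pd.DataFrame({"CHAS": chas, "AGE": age, "INDUS": indus,
                         "DIS": dis, "NOX": nox, "MEDV": medv})


def run_tests(df):
    """Run one candidate test per management question."""
    # Q1: Welch's t-test on MEDV, river-bounded tracts vs. the rest.
    river = df.loc[df["CHAS"] == 1, "MEDV"]
    no_river = df.loc[df["CHAS"] == 0, "MEDV"]
    t_res = stats.ttest_ind(river, no_river, equal_var=False)

    # Q2: one-way ANOVA on MEDV across AGE groups (cut points are assumed).
    age_group = pd.cut(df["AGE"], bins=[0, 35, 70, 100], include_lowest=True)
    groups = [df.loc[age_group == cat, "MEDV"]
              for cat in age_group.cat.categories]
    f_res = stats.f_oneway(*groups)

    # Q3: Pearson correlation between NOX and INDUS.
    r, r_p = stats.pearsonr(df["NOX"], df["INDUS"])

    # Q4: simple linear regression of MEDV on DIS; the slope estimates the
    # change in MEDV per additional unit of weighted distance.
    reg = stats.linregress(df["DIS"], df["MEDV"])

    return {
        "chas_ttest_p": float(t_res.pvalue),
        "age_anova_p": float(f_res.pvalue),
        "nox_indus_r": float(r),
        "nox_indus_p": float(r_p),
        "dis_slope": float(reg.slope),
        "dis_slope_p": float(reg.pvalue),
    }


if __name__ == "__main__":
    for name, value in run_tests(make_synthetic_housing()).items():
        print(f"{name}: {value:.4g}")
```

With the real data, each p-value would be compared against a chosen significance level (0.05 is a common choice) before rejecting or failing to reject the corresponding null hypothesis.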

Using appropriate graphs and charts, I'll generate statistics and visualizations that give upper management useful insight into the questions they are asking.
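As an illustration of the kind of charts that could accompany these questions, here is a minimal Seaborn and Matplotlib sketch. The choice of plots, the helper name `plot_overview`, and the randomly generated stand-in data are assumptions, not the project's actual figures.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_overview(df):
    """Draw one candidate chart for three of the management questions."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Distribution of MEDV for tracts that do (1) or do not (0) bound the river.
    sns.boxplot(x="CHAS", y="MEDV", data=df, ax=axes[0])
    axes[0].set_title("Median value by Charles River flag")

    # Relationship between non-retail business acres and nitric oxides.
    sns.scatterplot(x="INDUS", y="NOX", data=df, ax=axes[1])
    axes[1].set_title("NOX vs. INDUS")

    # MEDV against distance to employment centres, with a fitted line.
    sns.regplot(x="DIS", y="MEDV", data=df, ax=axes[2])
    axes[2].set_title("MEDV vs. DIS")

    fig.tight_layout()
    return fig


# Stand-in data with the dataset's column names (values are made up).
rng = np.random.default_rng(1)
n = 200
demo = pd.DataFrame({
    "CHAS": rng.integers(0, 2, n),
    "INDUS": rng.uniform(0, 28, n),
    "DIS": rng.uniform(1, 12, n),
})
demo["NOX"] = 0.4 + 0.01 * demo["INDUS"] + rng.normal(0, 0.03, n)
demo["MEDV"] = 20 + 5 * demo["CHAS"] + demo["DIS"] + rng.normal(0, 3, n)

fig = plot_overview(demo)
```

Calling `fig.savefig(...)` with a file name of your choice writes the figure to disk.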

The following describes the dataset variables:

  • CRIM - per capita crime rate by town

  • ZN - proportion of residential land zoned for lots over 25,000 sq.ft.

  • INDUS - proportion of non-retail business acres per town

  • CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)

  • NOX - nitric oxides concentration (parts per 10 million)

  • RM - average number of rooms per dwelling

  • AGE - proportion of owner-occupied units built prior to 1940

  • DIS - weighted distances to five Boston employment centres

  • RAD - index of accessibility to radial highways

  • TAX - full-value property-tax rate per $10,000

  • PTRATIO - pupil-teacher ratio by town

  • LSTAT - % lower status of the population

  • MEDV - median value of owner-occupied homes in $1000s
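Before running any tests, it can help to confirm that a loaded table actually has these columns and to derive a grouped version of AGE. The sketch below uses Pandas for this; the helper name `prepare`, the derived `AGE_GROUP` column, and the cut points of 35 and 70 are assumptions for illustration, and the toy rows are made up rather than taken from the dataset.

```python
import pandas as pd

# Column names as given in the variable list above.
EXPECTED_COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                    "DIS", "RAD", "TAX", "PTRATIO", "LSTAT", "MEDV"]


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Validate the columns and add an AGE_GROUP column (assumed bins)."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    out = df.copy()
    # CHAS is a 0/1 dummy variable, so store it as an integer.
    out["CHAS"] = out["CHAS"].astype(int)
    # AGE is a percentage (0 to 100); 35 and 70 are illustrative cut points.
    out["AGE_GROUP"] = pd.cut(
        out["AGE"],
        bins=[0, 35, 70, 100],
        labels=["35 or less", "over 35 to 70", "over 70"],
        include_lowest=True,
    )
    return out


# Toy rows (made up) just to exercise the helper.
toy = pd.DataFrame({c: [0.0] * 4 for c in EXPECTED_COLUMNS})
toy["AGE"] = [20.0, 35.0, 50.0, 95.0]
toy["CHAS"] = [0.0, 1.0, 0.0, 1.0]
prepared = prepare(toy)
print(prepared[["CHAS", "AGE", "AGE_GROUP"]])
```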
