Welcome to the Data Science and Big Data Analytics (DSBDA) Mini Project repository. This project is designed to assist third-year Computer Engineering students at Savitribai Phule Pune University (SPPU) in their DSBDA coursework. By providing comprehensive datasets and Python code, this repository aims to facilitate a deeper understanding of data analysis techniques and their practical applications.
This project focuses on analyzing crime-related data to extract meaningful insights. The datasets encompass various aspects of crime statistics, including property theft, violent crimes, and custodial deaths. Through this project, students will learn to preprocess data, perform exploratory data analysis (EDA), and visualize findings to draw informed conclusions.
The repository includes the following datasets in CSV format:
- 10_Property_stolen_and_recovered.csv: Details of stolen and recovered property cases.
- 20_Victims_of_rape.csv: Statistics on rape victims categorized by age and region.
- 25_Complaints_against_police.csv: Records of complaints filed against police personnel.
- 28_Trial_of_violent_crimes_by_courts.csv: Information on court trials related to violent crimes.
- 29_Period_of_trials_by_courts.csv: Duration statistics of various court trials.
- 30_Auto_theft.csv: Data on reported auto theft incidents.
- 31_Serious_fraud.csv: Records of serious fraud cases reported.
- 32_Murder_victim_age_sex.csv: Demographic details of murder victims.
- 33_CH_not_murder_victim_age_sex.csv: Data on culpable homicide cases not amounting to murder, with victim demographics.
- 35_Human_rights_violation_by_police.csv: Instances of human rights violations attributed to police actions.
- 36_Police_housing.csv: Information on housing facilities provided to police personnel.
- 39_Specific_purpose_of_kidnapping_and_abduction.csv: Categorization of kidnapping and abduction cases based on intent.
- 40_01_Custodial_death_person_remanded.csv: Details of custodial deaths of remanded individuals.
- 40_02_Custodial_death_person_not_remanded.csv: Records of custodial deaths of individuals not on remand.
- 40_03_Custodial_death_during_production.csv: Cases of custodial deaths occurring during court productions.
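Because the column names differ from file to file, it can help to list each CSV's header row before writing any analysis code. The following is a minimal sketch using only the standard library. The helper name `csv_headers`, the `datasets` folder path, and the `utf-8-sig` encoding are assumptions for illustration, not part of the repository's scripts.

```python
import csv
from pathlib import Path


def csv_headers(folder):
    """Map each CSV file name in `folder` to its list of column names.

    Hypothetical helper for illustration; not part of the repository.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return {}
    headers = {}
    for path in sorted(folder.glob("*.csv")):
        # Assumption: files are UTF-8, possibly with a byte-order mark.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            # Read only the first row; empty files map to an empty list.
            headers[path.name] = next(csv.reader(handle), [])
    return headers


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for name, columns in csv_headers("datasets").items():
        print(f"{name}: {columns}")
```

Reading only the first row keeps the check fast even for large files, and the printed column names can be copied straight into plotting or grouping calls.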
The repository is organized as follows:
- crime/: Directory containing Python scripts for data analysis.
- datasets/: Folder housing all the CSV files mentioned above.
- notebooks/: Jupyter notebooks demonstrating data analysis and visualization techniques.
To use this repository, follow the steps below to set up your environment, run the Streamlit application, and optionally deploy it online.
Begin by cloning this repository to your local machine:
git clone https://github.com/ironman2024/Mini-Project-DSBDA-SPPU.git
Ensure you have Python 3.x installed on your system. It's recommended to create a virtual environment to manage dependencies:
cd Mini-Project-DSBDA-SPPU
python -m venv venv
source venv/bin/activate # On Windows, use 'venv\Scripts\activate'
Install the required packages:
pip install -r requirements.txt
The requirements.txt file includes all necessary libraries, such as:
pandas
numpy
matplotlib
seaborn
streamlit
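To confirm that the installation succeeded inside the active virtual environment, a short check like the one below may be useful. It is a sketch only: the function name `missing_packages` is hypothetical, and the package list simply mirrors the libraries named above.

```python
from importlib.util import find_spec

# Libraries named in this README; adjust to match your requirements.txt.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "streamlit"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

`find_spec` only looks a package up without importing it, so the check stays quick even for heavy libraries.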
This project includes a Streamlit application for interactive data analysis. To launch the app:
streamlit run app.py
This command will start a local development server and open the application in your default web browser.
For detailed data analysis and visualization, navigate to the notebooks directory:
cd notebooks
Open the Jupyter notebooks using:
jupyter notebook
These notebooks provide step-by-step analyses and visualizations of the datasets.
To make the Streamlit application accessible online, consider deploying it using Streamlit Community Cloud:
- Prepare Your Repository: Ensure your project is pushed to a public GitHub repository.
- Sign Up on Streamlit Community Cloud: Create an account at Streamlit Community Cloud.
- Deploy the App:
  - Click on "New app" and connect your GitHub repository.
  - Select the repository and branch containing your app.py.
  - Click "Deploy."
Your application will be live and accessible via a unique URL.
By following these steps, you can set up, run, and deploy the DSBDA Mini Project, enhancing your data analysis skills and sharing your work with others.
Below is a sample Python script demonstrating data loading and basic analysis:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset
file_path = 'datasets/10_Property_stolen_and_recovered.csv'
data = pd.read_csv(file_path)
# Display the first few rows
print(data.head())
# Basic statistics
print(data.describe())
# Check for missing values
print(data.isnull().sum())
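# Visualization: Property stolen vs. recovered
# Note: the column names used below may differ in your copy of the CSV;
# check data.columns and adjust them if needed.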
plt.figure(figsize=(10, 6))
sns.barplot(x='State/UT', y='Number of cases property stolen', data=data, color='red', label='Stolen')
sns.barplot(x='State/UT', y='Number of cases property recovered', data=data, color='green', label='Recovered')
plt.xticks(rotation=90)
plt.title('Property Stolen vs. Recovered by State/UT')
plt.legend()
plt.show()
Explanation:
- Import Libraries: The script imports the necessary libraries: pandas for data manipulation, and matplotlib.pyplot and seaborn for data visualization.
- Load Dataset: The dataset 10_Property_stolen_and_recovered.csv is loaded into a DataFrame.
- Inspect Data: The first few rows and basic statistics of the dataset are displayed to understand its structure and contents.
- Check Missing Values: The script checks for any missing values in the dataset to ensure data quality.
- Visualization: A bar plot is created to compare the number of property theft cases versus recovered cases across different States/UTs. This helps in visualizing and comparing the effectiveness of property recovery efforts regionally.
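Raw counts can make large regions dominate the chart, so a recovery rate (recovered cases divided by stolen cases) is one way to compare regions on equal footing. The sketch below assumes the same column names as the sample script, which may not match the actual CSV. The function `recovery_rate` and the three-row sample frame are hypothetical and stand in for the real data.

```python
import pandas as pd

# Column names taken from the sample script; adjust if your CSV differs.
STOLEN = "Number of cases property stolen"
RECOVERED = "Number of cases property recovered"


def recovery_rate(df, region_col="State/UT"):
    """Return recovered/stolen per region, highest rate first."""
    # Sum over all rows of a region (e.g. several years or categories).
    totals = df.groupby(region_col)[[STOLEN, RECOVERED]].sum()
    # Regions with zero stolen cases get NaN instead of a division error.
    rate = totals[RECOVERED] / totals[STOLEN].where(totals[STOLEN] != 0)
    return rate.sort_values(ascending=False)


# Made-up illustrative data, not taken from the dataset.
sample = pd.DataFrame({
    "State/UT": ["A", "A", "B"],
    STOLEN: [100, 50, 200],
    RECOVERED: [40, 20, 50],
})
rates = recovery_rate(sample)
print(rates)
```

On the real data, the same function could be called as `recovery_rate(data)` after loading the CSV, provided the column names match.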
Contributions to enhance this project are welcome. Feel free to fork the repository, make modifications, and submit pull requests. Your contributions can help fellow students and practitioners in their data science journey.
This project is licensed under the MIT License. You are free to use, modify, and distribute this code as per the license terms.
By providing this repository, we aim to support students in their DSBDA coursework and foster a collaborative learning environment. If you find this project helpful, consider starring the repository to show your support.