
Diwali Sales Data Analysis is a Python project that analyzes Diwali season sales data to uncover insights on customer demographics, purchasing patterns, and regional performance using data cleaning, EDA, and visualization techniques.


MoreSangeet/Diwali-Sales-Data-Analysis


Diwali Sales Data Analysis

Welcome to the Diwali Sales Data Analysis project! This Python project focuses on analyzing sales data from Diwali, extracting valuable insights regarding customer demographics, purchasing patterns, and regional sales performance.


Dataset Overview
This project uses Diwali sales data stored in a CSV file. The dataset contains customer and sales details, which we use to analyze customer behavior and support business decisions.


📚 Libraries Used

We’ve used the following Python libraries for data manipulation, visualization, and analysis:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # For data visualization
import seaborn as sns           # For advanced data visualization
%matplotlib inline

📥 Data Import and Preparation

We start by importing the Diwali sales data from a CSV file and then perform various preprocessing tasks:

# Import CSV file
df = pd.read_csv(r"C:\Users\sangit\Downloads\Python_Diwali_Sales_Analysis\Diwali Sales Data.csv", encoding='unicode_escape')

# View dataset dimensions and first few records
df.shape  # Output: (11251, 15)
df.head()  # Preview the first 5 rows of the dataset

The dataset contains 15 columns and 11,251 rows with details like user ID, product information, gender, age group, marital status, and purchase details.
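The absolute Windows path above only works on the author's machine. As a minimal sketch, assuming the CSV is kept alongside the notebook, the load step could be wrapped in a small helper that takes any path. The helper name `load_sales` and the throwaway demo file are hypothetical and exist only for illustration; the `encoding='unicode_escape'` argument is carried over from the call above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_sales(csv_path):
    """Read a sales CSV (hypothetical helper) and report its dimensions."""
    df = pd.read_csv(csv_path, encoding="unicode_escape")
    print(f"{df.shape[0]} rows x {df.shape[1]} columns")
    return df


# Demo on a made-up two-row file written to a temporary directory.
# With the real data, pass the path to "Diwali Sales Data.csv" instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_sales.csv"
    demo_path.write_text(
        "User_ID,Gender,Amount\n1001,F,23952\n1002,M,9800\n",
        encoding="utf-8",
    )
    demo = load_sales(demo_path)
```

Passing the path as an argument keeps the notebook portable across machines and operating systems.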


🧹 Data Cleaning and Transformation

We clean the data by:

  • Dropping irrelevant or empty columns
  • Handling missing values
  • Converting data types for consistency

# Drop irrelevant/blank columns
df.drop(['Status', 'unnamed1'], axis=1, inplace=True)

# Check for missing values
df.isnull().sum()

# Drop rows with null values
df.dropna(inplace=True)

# Convert 'Amount' to integer type
df['Amount'] = df['Amount'].astype('int')
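The order of these steps matters: casting a column that still contains NaN to an integer type raises an error in pandas, so the rows with missing values have to be dropped first. The sketch below replays the same steps on a tiny made-up frame (the values are hypothetical; only the column names come from the code above) and avoids `inplace=True` so the raw data stays available for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset: real column names, made-up values.
raw = pd.DataFrame({
    "User_ID": [1001, 1002, 1003, 1004],
    "Gender": ["F", "M", "F", "M"],
    "Amount": [23952.0, np.nan, 15000.5, 9800.0],
    "Status": [np.nan] * 4,
    "unnamed1": [np.nan] * 4,
})

# Drop the empty columns, then any row that still has a missing value.
clean = raw.drop(columns=["Status", "unnamed1"]).dropna()

# astype(int) truncates the fractional part (15000.5 becomes 15000).
clean = clean.assign(Amount=clean["Amount"].astype(int))

print(raw.shape, "->", clean.shape)
print(clean["Amount"].tolist())
```

Comparing the shapes before and after shows how many rows and columns the cleaning removed.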

🔎 Data Exploration

1. General Statistics

We use the describe() method to understand key statistics like mean, standard deviation, and range for numerical columns:

df.describe()
  • The average age of customers is approximately 35 years.
  • The average purchase amount is ₹9453.
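By default `describe()` summarizes only the numeric columns, and individual figures such as the averages quoted above can be read from the resulting table by row label. A minimal sketch on made-up numbers (not the project's data):

```python
import pandas as pd

# Hypothetical sample; the real frame has many more rows and columns.
sample = pd.DataFrame({
    "Age": [28, 35, 42],
    "Amount": [5000, 9000, 13000],
})

stats = sample.describe()

# Pull out just the rows of interest instead of scanning the full table.
print(stats.loc[["mean", "min", "max"]])
```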

2. Data Visualizations

Gender Distribution

We plot a countplot to visualize the gender distribution of customers:

sns.countplot(x='Gender', data=df)

Insights:

  • Most customers are female.

  • Female customers also show higher purchasing power than male customers, measured by total purchase amount.
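A count plot shows only how many orders each gender placed; a statement about spending needs the `Amount` column as well. One way to check both at once, sketched here on made-up values, is to aggregate count, total, and mean per gender:

```python
import pandas as pd

# Hypothetical sample using the same column names as the dataset.
sample = pd.DataFrame({
    "Gender": ["F", "F", "M", "F", "M"],
    "Amount": [1200, 800, 1500, 1000, 500],
})

# count = number of orders, sum = total spend, mean = average order value.
by_gender = sample.groupby("Gender")["Amount"].agg(["count", "sum", "mean"])
print(by_gender)
```

In this toy sample the two groups differ in order count and total spend but have the same average order value, which is why it helps to look at all three figures before drawing a conclusion.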

Age Group vs Purchase Amount

We explore the relationship between age group and total purchase amounts using a barplot:

sns.barplot(x='Age Group', y='Amount', data=df.groupby(['Age Group'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False))

Insights:

  • The 26-35 years age group is the most active in making purchases, particularly females.

Top States by Sales

We visualize the total sales by state to understand regional performance:

sns.barplot(x='State', y='Amount', data=df.groupby(['State'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False).head(10))

Insights:

  • Uttar Pradesh, Maharashtra, and Karnataka have the highest sales.
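The same group, sum, sort, and head chain appears in several of the plots in this README. As a sketch, it could be factored into one helper; the name `top_n_by_amount` is hypothetical and the sample values are made up.

```python
import pandas as pd


def top_n_by_amount(df, column, n=10):
    """Sum 'Amount' per value of `column` and return the n largest groups."""
    totals = df.groupby(column, as_index=False)["Amount"].sum()
    return totals.sort_values(by="Amount", ascending=False).head(n)


# Hypothetical sample; the state names are real, the amounts are not.
sample = pd.DataFrame({
    "State": ["Uttar Pradesh", "Maharashtra", "Karnataka", "Uttar Pradesh", "Delhi"],
    "Amount": [500, 400, 300, 250, 100],
})

top_states = top_n_by_amount(sample, "State", n=3)
print(top_states)
```

The returned frame can be passed straight to `sns.barplot(data=top_states, x='State', y='Amount')`, and the same helper works for columns such as `Occupation` or `Product_Category`.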

Marital Status and Purchases

We compare purchase amounts based on marital status and gender:

sns.barplot(data=df.groupby(['Marital_Status', 'Gender'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False), x='Marital_Status', y='Amount', hue='Gender')

Insights:

  • Married women tend to spend more, especially on specific product categories.
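The grouped bar chart can be cross-checked against a plain table of the same totals. The sketch below uses `pivot_table` on made-up values; the codes 0 and 1 in `Marital_Status` are placeholders here, so check the real column to see how it encodes married and unmarried customers.

```python
import pandas as pd

# Hypothetical sample; 0 and 1 are placeholder marital-status codes.
sample = pd.DataFrame({
    "Marital_Status": [0, 0, 1, 1, 0],
    "Gender": ["F", "M", "F", "M", "F"],
    "Amount": [700, 300, 500, 200, 100],
})

# Rows: marital status, columns: gender, cells: total purchase amount.
table = sample.pivot_table(
    index="Marital_Status", columns="Gender", values="Amount", aggfunc="sum"
)
print(table)
```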

Occupation vs Sales

We analyze which occupations contribute most to sales:

sns.barplot(data=df.groupby(['Occupation'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False), x='Occupation', y='Amount')

Insights:

  • Occupations in IT, Healthcare, and Aviation lead in purchases.

Product Category Breakdown

We visualize the top-selling product categories:

sns.barplot(data=df.groupby(['Product_Category'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False).head(10), x='Product_Category', y='Amount')

Insights:

  • Food, Clothing, and Electronics are the most popular categories.

🎯 Key Insights & Conclusion

Project Learnings

  • Data Cleaning and Manipulation: We used various techniques to clean and prepare the data for analysis, ensuring consistency across all columns.
  • Exploratory Data Analysis (EDA): Leveraged libraries like pandas, matplotlib, and seaborn to gain deeper insights into the dataset through visualization and statistical analysis.
  • Improved Customer Understanding: By analyzing different customer demographics (state, occupation, gender, and age), we identified segments of potential customers that could be targeted for more personalized marketing.
  • Sales Insights: We identified the most popular products and categories, which can help in better inventory planning and sales forecasting.

Conclusion

From the analysis, we concluded that:

  • Married women in the age group 26-35 years from states like Uttar Pradesh, Maharashtra, and Karnataka working in sectors like IT, Healthcare, and Aviation are more likely to purchase products, particularly from Food, Clothing, and Electronics categories.
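The conclusion combines findings from separate charts, so it is worth confirming on the combined segment itself. A minimal sketch, assuming made-up data and hypothetical text labels for `Marital_Status`, computes the share of total sales that comes from rows matching several conditions at once:

```python
import pandas as pd

# Hypothetical sample; labels and amounts are for illustration only.
sample = pd.DataFrame({
    "Gender": ["F", "F", "M", "F"],
    "Marital_Status": ["Married", "Married", "Unmarried", "Unmarried"],
    "Age Group": ["26-35", "18-25", "26-35", "26-35"],
    "Amount": [400, 100, 300, 200],
})

# Boolean mask for the combined segment (all conditions must hold).
in_segment = (
    (sample["Gender"] == "F")
    & (sample["Marital_Status"] == "Married")
    & (sample["Age Group"] == "26-35")
)

# Fraction of the total purchase amount contributed by that segment.
share = sample.loc[in_segment, "Amount"].sum() / sample["Amount"].sum()
print(f"Segment share of sales: {share:.0%}")
```

Further conditions on `State`, `Occupation`, or `Product_Category` can be added to the mask in the same way.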

This analysis can help businesses optimize inventory, target the right customer segments, and improve marketing strategies to boost sales.


📌 Project Learnings

  • Data cleaning and manipulation
  • Exploratory data analysis (EDA) using pandas, matplotlib, and seaborn libraries
  • Improving customer experience by identifying potential customers across different states, occupations, genders, and age groups
  • Improving sales by identifying the best-selling product categories and products, which helps with inventory planning and meeting demand

Thank you for visiting my project! I hope the analysis gives you valuable insights into consumer behavior and sales trends during the Diwali season. If you found this project helpful or interesting, feel free to star it ⭐ and leave a comment. Happy learning! 🙌
