A comprehensive loan portfolio analysis project to identify profitable segments, high-risk borrowers, and strategic insights for a bank's lending business.
This project explores borrower behavior, financial performance, and risk exposure using Python, Jupyter Notebook, and data visualization, helping the bank make data-driven lending decisions and minimize portfolio losses.
This project aims to analyze a bank's loan data to:
- Understand loan performance and profitability trends
- Identify risk concentrations in customers, products, and regions
- Discover the most reliable borrower segments
- Provide strategic recommendations to optimize profit and reduce losses
- What percentage of loans are good (fully paid) vs bad (charged off)?
- Which loan purposes and terms are most popular and profitable?
- How does employment length affect funding and repayment behavior?
- Which home ownership groups are most financially stable?
- Which states or regions contribute most to profit and risk?
- What strategic actions can reduce overall loan losses?
| Tool | Purpose |
|---|---|
| Python (Pandas, Matplotlib, Seaborn) | Data cleaning, transformation, and visualization |
| Jupyter Notebook | Exploratory data analysis and reporting |
| Power BI | Interactive dashboard creation and storytelling |
| Excel / CSV | Raw data input and initial inspection |
| SQL | Data querying and aggregation for analysis |
- Good Loans: 86.18% of portfolio generating $65.57M profit
- Bad Loans: 13.82% resulting in $28.25M loss
- Net Profit: $37.31M, confirming overall profitability
- Goal: Reduce charged-off losses to improve total returns.
- Borrowers with 10+ years of employment are the most stable and profitable.
- Borrowers with less than 1 year of employment form a high-risk group, though a large one by volume.
- Conclusion: The bank's foundation lies in long-term, stable employees.
- Debt Consolidation dominates the portfolio, with the highest funding, revenue, and demand.
- However, this creates a single point of business risk.
- Strategy: Diversify product offerings to reduce dependency.
- Mortgage holders receive and return the most funds, confirming asset-backed stability.
- Renters form a major risk segment by volume and need stricter underwriting.
- Insight: The bank's financial stability relies heavily on mortgage customers.
- California (CA) is the top-performing but most risk-exposed state.
- Heavy regional dependence makes the portfolio vulnerable to local economic downturns.
- Strategy: Expand lending to states like TX, NY, FL for risk diversification.
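The segment findings above (employment length, loan purpose, home ownership, state) all follow the same pattern: group the loans by one attribute, then compare funding share and charge-off rate. A minimal pandas sketch of that pattern follows. The column names (`address_state`, `home_ownership`, `loan_status`, `loan_amount`) and the helper `segment_summary` are assumptions for illustration, and the rows are toy data, not values from `Bank_loan_data.csv`.

```python
import pandas as pd


def segment_summary(loans: pd.DataFrame, by: str) -> pd.DataFrame:
    """Funding share and charge-off rate per segment (hypothetical helper)."""
    # Flag charged-off loans so the group mean gives a charge-off rate.
    flagged = loans.assign(charged_off=loans["loan_status"].eq("Charged Off"))
    out = flagged.groupby(by).agg(
        funded=("loan_amount", "sum"),
        charge_off_rate=("charged_off", "mean"),
    )
    # Share of total funding shows how concentrated the portfolio is.
    out["funded_share_pct"] = (out["funded"] / out["funded"].sum() * 100).round(1)
    return out.sort_values("funded", ascending=False)


# Toy rows; column names are assumed, not taken from the real dataset.
loans = pd.DataFrame({
    "address_state": ["CA", "CA", "TX", "NY"],
    "home_ownership": ["RENT", "MORTGAGE", "MORTGAGE", "RENT"],
    "loan_status": ["Charged Off", "Fully Paid", "Fully Paid", "Fully Paid"],
    "loan_amount": [10_000, 20_000, 6_000, 4_000],
})

by_state = segment_summary(loans, "address_state")
by_home = segment_summary(loans, "home_ownership")
print(by_state)
print(by_home)
```

A segment that ranks high on both `funded_share_pct` and `charge_off_rate` is the kind of concentration the strategies above are meant to reduce.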
| Segment | Description | Impact |
|---|---|---|
| Good Loans | 86.18% of portfolio | +$65.57M profit |
| Bad Loans | 13.82% charged off | -$28.25M loss |
| Top Product | Debt Consolidation | High profit, high risk |
| Top Borrowers | 10+ years employed, mortgage holders | Most stable & profitable |
| High Risk Borrowers | Renters, <1 year employed | Require strict underwriting |
| Top Region | California | Profitable but risky |
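The good-loan and bad-loan rows in the table above could be derived along the following lines. This is a sketch under stated assumptions: the column names (`loan_status`, `loan_amount`, `total_payment`), the status labels, and the definition of profit as amount received minus amount funded are all illustrative choices, and the rows are toy data rather than the project's dataset.

```python
import pandas as pd

# Toy rows standing in for Bank_loan_data.csv; column names are assumptions.
loans = pd.DataFrame({
    "loan_status": ["Fully Paid", "Fully Paid", "Fully Paid", "Charged Off"],
    "loan_amount": [10_000, 5_000, 20_000, 8_000],
    "total_payment": [11_500, 5_600, 23_000, 3_000],
})

# Good = fully paid, bad = charged off. Any other status maps to NaN and is
# left out of the grouping, so it would need its own rule.
loans["segment"] = loans["loan_status"].map(
    {"Fully Paid": "good", "Charged Off": "bad"}
)
# Assumed profit definition: amount received minus amount funded.
loans["profit"] = loans["total_payment"] - loans["loan_amount"]

summary = loans.groupby("segment").agg(
    loan_count=("loan_status", "count"),
    funded=("loan_amount", "sum"),
    received=("total_payment", "sum"),
    profit=("profit", "sum"),
)
summary["share_pct"] = (
    summary["loan_count"] / summary["loan_count"].sum() * 100
).round(2)

net_profit = summary["profit"].sum()
print(summary)
print("Net profit:", net_profit)
```

Here `share_pct` is computed by loan count; a share by funded amount would use the `funded` column instead.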
├── Bank Loan Analysis.ipynb # Main analysis notebook
├── Bank_loan_data.csv # Dataset file
├── images/ # Folder containing chart images
│   ├── Total_Amount_Received_by_Month.png
│   ├── Total_Amount_Received_by_Employee_Length.png
│   └── Total_Amount_Received_by_States.png
└── README.md # Project documentation
Here are some sample visualizations from the analysis:
This chart shows how loan repayments increased steadily each month, reaching their highest point in December 2021. It highlights the bank's strong cash flow growth and ability to scale operations with rising customer demand.
This chart shows that borrowers with 10+ years of employment contribute the highest total repayments, which suggests they are the most financially reliable group. It also shows that borrowers with less than 1 year of employment form a large but riskier segment.
This chart shows that California (CA) generates the highest total repayments, making it the bank's most profitable and dominant market. However, it also reveals regional concentration risk, suggesting the need to expand in New York (NY), Texas (TX), and Florida (FL) for better diversification.
The analysis confirms the bank's loan business is in a phase of rapid, profitable growth, but with significant risk concentration in a few key areas.
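The aggregation behind a chart like the monthly one can be sketched as below. The column names `issue_date` and `total_payment` are assumptions, and the rows are toy data chosen only to make the example run. The notebook's actual approach may differ.

```python
import pandas as pd

# Toy rows; the real column names in Bank_loan_data.csv may differ.
payments = pd.DataFrame({
    "issue_date": ["2021-01-15", "2021-01-28", "2021-02-03", "2021-12-20"],
    "total_payment": [1_000, 2_500, 4_000, 9_000],
})
payments["issue_date"] = pd.to_datetime(payments["issue_date"])

# Collapse each date to its calendar month, then total the amount received.
monthly = (
    payments.groupby(payments["issue_date"].dt.to_period("M"))["total_payment"]
    .sum()
)

# The month with the largest total amount received.
peak_month = monthly.idxmax()
print(monthly)
print("Peak month:", peak_month)
```

With Matplotlib installed, `monthly.plot(kind="bar")` would draw the resulting series as a bar chart. The same grouping applied to an employment-length or state column gives the other two charts.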
- Profitability: The bank earns $37.31M in net profit despite $28.25M in charged-off losses.
- Risk Concentration: Over-reliance on Debt Consolidation loans and the California market creates serious exposure.
- Customer Insights:
- Reliable: 10+ years employed, Mortgage holders
- Risky: Renters, <1 year employed
- Tighten Underwriting for Debt Consolidation loans.
- Implement Risk-Based Pricing for Renters and short-term employees.
- Diversify Markets beyond California to reduce regional dependency.
By improving risk control and portfolio balance, the bank can increase profits, reduce default losses, and build a sustainable, data-driven lending strategy.
- Data Cleaning & Preparation
- Exploratory Data Analysis (EDA)
- Business Intelligence & Visualization (Power BI)
- Statistical & Financial Analysis
- Risk Assessment & Strategic Reporting
- Data Storytelling & Insight Communication
✅ Identified $28.25M loss drivers
✅ Pinpointed top 3 reliable borrower segments
✅ Defined actionable strategies to boost ROI
✅ Built a storytelling dashboard for executive-level decisions
Harsh Belekar
Data Analyst | Python | SQL | Power BI | Excel | Data Visualization
LinkedIn | GitHub
If you found this project helpful, feel free to star the repo and connect with me for collaboration!


