A Python-based implementation of Decision Trees built from scratch, complete with Entropy (Information Gain) and Gini Index as splitting criteria. The project also includes a Graphviz-powered visualizer to generate crisp, interpretable tree diagrams.
- Custom Entropy & Gini Functions – Implemented from scratch, no external ML libraries.
- Dynamic Tree Builder – Recursively constructs decision trees using chosen impurity measures.
- Dual Criteria Support (formulas below)
  - Entropy (Information Gain) → ID3-style splitting
  - Gini Index → CART-style splitting
- Clear Visualizations – Trees are exported as PNG images with Graphviz.
- Human-Readable Structure – Leaf nodes represent final decisions, while internal nodes show feature splits.
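For reference, the two impurity measures are the standard ones, with $p_i$ the proportion of class $i$ among the labels of a set $S$:

$$H(S) = -\sum_{i} p_i \log_2 p_i \qquad\qquad \mathrm{Gini}(S) = 1 - \sum_{i} p_i^2$$

ID3-style splitting picks the feature $A$ with the largest information gain $IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)$; CART-style splitting minimizes the analogous weighted Gini impurity.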
```
DecisionTreeVisualizer/
├── Dataset.csv        # Input dataset
├── Tree.py            # Main script with tree logic + visualization
├── tree_entropy.png   # Tree built using entropy
├── tree_gini.png      # Tree built using the Gini index
└── README.md          # You are here
```
- Entropy & Gini Calculation (first sketch below)
  - Computes the uncertainty of the class labels.
  - Lower impurity ⇒ better split.
- Attribute Selection (second sketch below)
  - Recursively selects the best feature based on the chosen metric (Entropy or Gini).
- Tree Construction (third sketch below)
  - Splits the dataset into subsets by feature values.
  - Continues until the leaves are pure or no features remain.
- Visualization (sketched after the node legend at the end of this README)
  - Uses Graphviz (`Digraph`) to generate interpretable, flowchart-like trees.
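A minimal sketch of the two impurity functions. The function names here are illustrative, not necessarily the ones used in `Tree.py`:

```python
import numpy as np
import pandas as pd

def entropy(labels: pd.Series) -> float:
    """Shannon entropy of a label column, in bits."""
    probs = labels.value_counts(normalize=True).to_numpy()
    return float(-np.sum(probs * np.log2(probs)))

def gini(labels: pd.Series) -> float:
    """Gini impurity of a label column."""
    probs = labels.value_counts(normalize=True).to_numpy()
    return float(1.0 - np.sum(probs ** 2))
```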
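Attribute selection can then be sketched as a weighted-impurity comparison across features (again illustrative; `Tree.py` may structure this differently):

```python
def best_feature(df: pd.DataFrame, target: str, impurity=entropy) -> str:
    """Pick the feature whose split yields the lowest weighted impurity
    (equivalently, the highest information gain when impurity=entropy)."""
    features = [c for c in df.columns if c != target]

    def weighted_impurity(feature: str) -> float:
        # Weight each subset's impurity by its relative size.
        return sum(
            (len(subset) / len(df)) * impurity(subset[target])
            for _, subset in df.groupby(feature)
        )

    return min(features, key=weighted_impurity)
```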
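Finally, the recursive construction itself, stopping at pure leaves or when no features remain. This is a sketch of the general approach, not necessarily the exact representation used in `Tree.py`:

```python
def build_tree(df: pd.DataFrame, target: str, impurity=entropy):
    """Recursively build a tree as nested dicts: {feature: {value: subtree}}."""
    labels = df[target]
    # Pure node, or only the target column left -> leaf with the majority label.
    if labels.nunique() == 1 or df.shape[1] == 1:
        return labels.mode()[0]
    feature = best_feature(df, target, impurity)
    return {
        feature: {
            value: build_tree(subset.drop(columns=feature), target, impurity)
            for value, subset in df.groupby(feature)
        }
    }
```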
Install the required Python libraries:

```bash
pip install pandas numpy graphviz
```
Also, install the Graphviz system package (needed for rendering images).

On Debian/Ubuntu:

```bash
sudo apt-get update && sudo apt-get install -y graphviz
```

On macOS (Homebrew):

```bash
brew install graphviz
```

On Windows, download the installer from [Graphviz.org](https://graphviz.org) and add it to your PATH.
Run the script with your dataset (`Dataset.csv` by default):

```bash
python Tree.py
```
This will generate:

```
tree_entropy.png   # Decision tree using Information Gain
tree_gini.png      # Decision tree using the Gini Index
```
Both trees will be saved in the working directory and usually open automatically.
Here is an example of the input dataset used in the project:
| Weekend | Weather | Parents | Financial Condition | Decision |
|---|---|---|---|---|
| W1 | Sunny | Yes | Rich | Cinema |
| W2 | Sunny | No | Rich | Play Tennis |
| W3 | Windy | Yes | Rich | Cinema |
| W4 | Rainy | Yes | Poor | Cinema |
| W5 | Rainy | No | Rich | Stay in |
| W6 | Rainy | Yes | Poor | Cinema |
| W7 | Windy | No | Poor | Cinema |
| W8 | Windy | No | Rich | Shopping |
| W9 | Windy | Yes | Rich | Cinema |
| W10 | Sunny | No | Rich | Play Tennis |
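Loading this dataset and training on it might look like the following, using the sketch functions above (this assumes the `Weekend` column is a row identifier and should be dropped before training):

```python
import pandas as pd

df = pd.read_csv("Dataset.csv")
df = df.drop(columns="Weekend")  # W1..W10 are row IDs, not a predictive feature
tree = build_tree(df, target="Decision")
```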
- 🟦 Feature Nodes → Light-blue rounded boxes
- 🟩 Leaf Nodes → Green ellipses (final decision)
📌 Entropy-based trees choose the split that maximizes information gain
📌 Gini-based trees choose the split that minimizes weighted class impurity
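A sketch of how this styling can be produced with the `graphviz` Python package, walking the nested-dict tree from the construction sketch above (the actual rendering code in `Tree.py` may differ):

```python
from graphviz import Digraph

def render(tree, name: str = "tree") -> None:
    """Walk a nested-dict tree and emit a styled Graphviz diagram as <name>.png."""
    dot = Digraph(comment="Decision Tree")
    counter = [0]  # mutable counter so every node gets a unique id

    def add(node, parent=None, edge_label=""):
        node_id = str(counter[0])
        counter[0] += 1
        if isinstance(node, dict):
            feature = next(iter(node))
            # Feature node: light-blue rounded box
            dot.node(node_id, feature, shape="box",
                     style="rounded,filled", fillcolor="lightblue")
            for value, child in node[feature].items():
                add(child, parent=node_id, edge_label=str(value))
        else:
            # Leaf node: green ellipse with the final decision
            dot.node(node_id, str(node), shape="ellipse",
                     style="filled", fillcolor="lightgreen")
        if parent is not None:
            dot.edge(parent, node_id, label=edge_label)

    add(tree)
    dot.render(name, format="png", view=True)  # writes <name>.png and opens it

# Demo: render a small hand-written tree (illustrative only, not the learned tree)
render({"Weather": {"Sunny": "Play Tennis", "Rainy": "Stay in", "Windy": "Cinema"}},
       "tree_demo")
```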