This project implements a comprehensive reinforcement learning environment for the Monopoly board game, featuring a novel hybrid Deep Q-Network (DQN) architecture that combines algorithmic strategies with learned behaviors. The system was developed as part of a bachelor's thesis project to explore advanced AI decision-making in complex, multi-agent, stochastic environments.
- Novel Hybrid Architecture: The project introduces a groundbreaking approach where multiple specialized neural networks handle different decision types, while algorithmic agents provide expert knowledge for complex scenarios with large action spaces.
- Multi-Network DQN Design: Unlike traditional single-network approaches, this implementation uses separate networks for each major action type (property management, trading, financial decisions), enabling focused learning and improved decision accuracy.
- Expert Learning Integration: The system implements supervised pre-training using expert agents, followed by reinforcement learning through environmental interaction, significantly reducing training time and improving convergence.
- Optimized Game Simulation: Custom-built Monopoly environment with faithful rule implementation, optimized for fast simulation and RL training with proper state validation and error handling.
- 90% win rate against the parent algorithmic agent
- 5.3 percentage-point improvement in 12-agent round-robin tournaments compared to the baseline model
- Faithful reproduction of official Monopoly rules with RL-friendly adaptations
- Real-time game visualization with React frontend
- Comprehensive tournament management system for agent comparison
- Support for multiple agent types: random, algorithmic, strategic variants, and advanced DQN
- Ideal for research, benchmarking, and educational purposes in multi-agent reinforcement learning
- Python 3.10+ (developed and tested with Python 3.10.15)
- macOS with MPS support (for GPU acceleration) or Windows/Linux with CUDA
- Node.js 16+ and npm for the frontend interface
NOTE: For this you will need to have `conda` installed. If you do not have it, you can install it by following the instructions from the official documentation.
1. Create and activate conda environment:
conda create -n monopoly-rl python=3.10
conda activate monopoly-rl
2. Install dependencies:
# Install requirements
pip install -r requirements.txt
# For macOS with MPS (GPU acceleration)
conda install -c apple tensorflow-deps
pip install tensorflow-macos==2.10.0
pip install tensorflow-metal==0.6.0
# For Windows/Linux (CPU/CUDA)
pip install tensorflow
1. Install dependencies:
cd frontend
npm install
Play a game against the DQN Agent:
1. Run the script, which will take care of everything:
cd src
python main.py
2. Access the interface:
Open your browser and navigate to http://localhost:5173
Train a new DQN agent:
For training a new agent, please refer to the training scripts found in the dqn folder. There you will find a training script for each of the specialized networks, which you can configure as needed.
The system is built with a modular, object-oriented design following software engineering best practices:
- Game State Management: Centralized state class that preserves all game attributes (player positions, balances, properties). All state updates are validated to ensure legal moves and maintain game integrity.
- Game Manager: Coordinates game logic, player turns, and rule enforcement. Acts as the main controller, interfacing with specialized managers for different game aspects.
- Specialized Managers: Modular components handle specific game mechanics: dice rolling, chance/community chest cards, trading, and property management with built-in validation.
- Player Base Class: Abstract interface defining callback methods for agent decision-making. Supports multiple agent types with consistent API for easy extensibility.
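As an illustration of this callback-based design, here is a minimal sketch of what such an abstract player interface could look like (class and method names are illustrative, not the project's actual API):

```python
import random
from abc import ABC, abstractmethod

class Player(ABC):
    """Illustrative base class: the game manager calls these decision
    callbacks at the relevant points of a turn."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def decide_buy_property(self, game_state, property_id: int) -> bool:
        """Return True to buy the unowned property, False to pass."""

    @abstractmethod
    def decide_jail_strategy(self, game_state) -> str:
        """Return one of 'pay_bail', 'use_card', or 'roll'."""

class RandomPlayer(Player):
    """Baseline agent that answers every callback at random."""

    def decide_buy_property(self, game_state, property_id: int) -> bool:
        return random.random() < 0.5

    def decide_jail_strategy(self, game_state) -> str:
        return random.choice(["pay_bail", "use_card", "roll"])
```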
---
config:
theme: neutral
---
graph TD
GS[Game State] --> SE[State Encoder<br/>100 features]
SE --> MN[Multiple Specialized Networks]
subgraph "Specialized Q-Networks"
MN --> BP[Buy Property<br/>Network]
MN --> UP[Property Upgrade<br/>Network]
MN --> FN[Financial Management<br/>Network]
MN --> JL[Jail Decision<br/>Network]
end
BP --> HD{Hybrid Decision<br/>Layer}
UP --> HD
FN --> HD
JL --> HD
SA[Strategic Agent<br/>Fallback] --> HD
subgraph "Training Pipeline"
EL[Expert Learning<br/>Pre-training] --> RL[Reinforcement Learning<br/>Self-play]
RL --> ER[Experience Replay<br/>Separate buffers]
ER --> TN[Target Networks<br/>Stable learning]
end
HD --> AC[Action]
EL -.-> BP
EL -.-> UP
EL -.-> FN
EL -.-> JL
RL -.-> BP
RL -.-> UP
RL -.-> FN
RL -.-> JL
classDef coreNode fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000000
classDef networkNode fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000000
classDef trainingNode fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#000000
classDef hybridNode fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#000000
class SE,MN coreNode
class BP,UP,FN,JL networkNode
class EL,RL,ER,TN trainingNode
class HD hybridNode
The core innovation lies in the hybrid approach that combines the best of algorithmic and learning-based strategies:
- Property Purchase Network: Specialized for buy/pass decisions on unowned properties
- Property Management Network: Handles upgrade/downgrade decisions for owned properties
- Jail Network: Manages player interactions while in jail, including decisions to pay bail or use a "Get Out of Jail Free" card
- Financial Network: Handles mortgage/unmortgage and cash management decisions
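A minimal sketch of how such per-decision Q-networks could be built with Keras (layer sizes and action counts are assumptions, not the project's exact architecture):

```python
import tensorflow as tf

STATE_SIZE = 100  # size of the encoded state vector

def build_q_network(num_actions: int, name: str) -> tf.keras.Model:
    """Small fully connected Q-network; one instance per decision type."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(STATE_SIZE,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_actions),  # raw Q-values, no activation
    ], name=name)

# One specialized network per decision type (action counts are illustrative)
networks = {
    "buy_property": build_q_network(2, "buy_property"),  # buy / pass
    "upgrade":      build_q_network(2, "upgrade"),       # upgrade / hold
    "financial":    build_q_network(3, "financial"),     # mortgage / unmortgage / hold
    "jail":         build_q_network(3, "jail"),          # pay bail / use card / roll
}
```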
The system uses supervised learning to bootstrap the networks:
- Collect gameplay data from expert algorithmic agents
- Pre-train each network on relevant decision scenarios
- Initialize with expert knowledge to reduce exploration time
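Continuing the sketch above, expert pre-training could look roughly like this, treating the Q-value outputs as classification logits over the expert's chosen actions (hyperparameters are illustrative):

```python
import tensorflow as tf

def pretrain_on_expert_data(network, expert_states, expert_actions, epochs=5):
    """Supervised pre-training: fit a Q-network to imitate the expert agent's
    action choices before reinforcement learning. The Q-value outputs are
    treated here as classification logits over the expert's actions."""
    network.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
    network.fit(expert_states, expert_actions, batch_size=64, epochs=epochs)

# Usage with the networks dict from the earlier sketch; the state/action arrays
# are assumed to be collected from games played by the expert algorithmic agent:
# pretrain_on_expert_data(networks["buy_property"], expert_states, expert_actions)
```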
Networks continue learning through environmental interaction:
- Experience Replay: Store and sample past experiences for stable learning
- Target Networks: Separate target networks for stable Q-value updates
- Epsilon-Greedy Exploration: Balance exploration vs exploitation
- Reward Shaping: Custom reward functions for different game phases
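A condensed sketch of these mechanisms: one DQN update with a replay buffer, a periodically synced target network, and epsilon-greedy action selection (hyperparameters and buffer sizes are illustrative):

```python
import random
from collections import deque
import numpy as np
import tensorflow as tf

GAMMA = 0.99
replay_buffer = deque(maxlen=50_000)  # one buffer per specialized network

def make_target_network(network):
    """Copy of the online network whose weights are synced only periodically."""
    target = tf.keras.models.clone_model(network)
    target.set_weights(network.get_weights())
    return target

def epsilon_greedy(network, state, epsilon, num_actions):
    """Explore with probability epsilon, otherwise pick the argmax Q-value."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    q_values = network(state[None, :], training=False)
    return int(tf.argmax(q_values[0]))

def train_step(network, target_network, optimizer, batch_size=64):
    """One DQN update from a sampled mini-batch of stored transitions."""
    batch = random.sample(replay_buffer, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    states = np.array(states, dtype=np.float32)
    next_states = np.array(next_states, dtype=np.float32)
    rewards = np.array(rewards, dtype=np.float32)
    dones = np.array(dones, dtype=np.float32)
    actions = np.array(actions, dtype=np.int32)
    # Bootstrapped targets come from the (frozen) target network
    next_q = target_network(next_states, training=False).numpy()
    targets = rewards + GAMMA * (1.0 - dones) * next_q.max(axis=1)
    with tf.GradientTape() as tape:
        q = network(states, training=True)
        q_taken = tf.reduce_sum(q * tf.one_hot(actions, q.shape[-1]), axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_taken))
    grads = tape.gradient(loss, network.trainable_variables)
    optimizer.apply_gradients(zip(grads, network.trainable_variables))
    return float(loss)
```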
The environment encodes the complex Monopoly state into a format suitable for neural networks:
State Vector Components:
- Player positions, cash, and property ownership (40 properties × 4 players)
- Property development levels and mortgage status
- Game phase indicators (early, mid, late game)
- Recent action history and opponent behavior patterns
- Dice roll outcomes and card draw results
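A rough sketch of such an encoder; the attribute names (`players`, `properties`, `turn_count`, etc.) and the exact feature set are assumptions, while the project's real encoder produces a 100-feature vector:

```python
import numpy as np

NUM_PLAYERS = 4

def encode_state(game_state) -> np.ndarray:
    """Flatten the game state into a fixed-length feature vector."""
    features = []
    # Normalized player positions and cash balances
    for player in game_state.players:
        features.append(player.position / 39.0)
        features.append(min(player.balance / 5000.0, 1.0))
    # Per-property ownership, development level, and mortgage status
    for prop in game_state.properties:
        owner = -1.0 if prop.owner is None else prop.owner / (NUM_PLAYERS - 1)
        features.append(owner)
        features.append(prop.houses / 5.0)
        features.append(1.0 if prop.mortgaged else 0.0)
    # Coarse game-phase indicator based on elapsed turns
    features.append(min(game_state.turn_count / 100.0, 1.0))
    return np.asarray(features, dtype=np.float32)
```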
Action Space Discretization:
- Binary decisions: Buy/Pass, Upgrade/Hold, Accept/Reject trade
- Categorical choices: Which properties to develop, mortgage priorities
- Hybrid decisions: Algorithmic for complex trades, learned for simple choices
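A minimal sketch of the hybrid routing idea, with names chosen purely for illustration:

```python
import tensorflow as tf

def choose_action(decision_type, state_vector, game_state, q_networks, strategic_agent):
    """Hybrid routing: small, discrete decisions go to the learned Q-networks;
    large-action-space decisions such as trade construction fall back to the
    algorithmic strategic agent."""
    if decision_type == "trade":
        return strategic_agent.propose_trade(game_state)  # expert fallback
    q_values = q_networks[decision_type](state_vector[None, :], training=False)
    return int(tf.argmax(q_values[0]))
```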
Comprehensive evaluation framework to assess agent performance:
- Round-Robin Tournaments: All agents play against each other multiple times
- Statistical Analysis: Win rates, average game length, financial performance
- Strategy Analysis: Property acquisition patterns, trading behavior
- Performance Visualization: Real-time dashboard showing training progress
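A simplified sketch of a round-robin tournament loop over 4-player tables; `play_game` is a hypothetical helper that runs one full game and returns the winner's name:

```python
import itertools
from collections import defaultdict

def round_robin_tournament(agents, play_game, games_per_table=10):
    """Every 4-agent combination plays a fixed number of games; win rates
    are aggregated per agent."""
    wins, played = defaultdict(int), defaultdict(int)
    for table in itertools.combinations(agents, 4):
        for _ in range(games_per_table):
            winner_name = play_game(list(table))
            wins[winner_name] += 1
            for agent in table:
                played[agent.name] += 1
    return {name: wins[name] / played[name] for name in played}
```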
- Language: Python 3.10.15
- AI Framework: TensorFlow 2.10.0 with Keras
- GPU Acceleration: TensorFlow Metal 0.6.0 (macOS), CUDA (Windows/Linux)
- Frontend: React 18.3.1 with Vite 6.0.1
- API: FastAPI for backend services
- Styling: Tailwind CSS 3.4.16
- Algorithm: Deep Q-Networks (DQN) with experience replay
- Network Architecture: Multiple specialized networks for different action types
- Training Paradigm: Hybrid approach combining supervised learning and reinforcement learning
- Optimization: Adam optimizer with learning rate scheduling
- Memory Management: Experience replay buffer
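For example, Adam with a learning-rate schedule can be set up in Keras along these lines (the decay values are illustrative, not the project's settings):

```python
import tensorflow as tf

# Exponential learning-rate decay paired with Adam
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=10_000,
    decay_rate=0.95,
)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```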
| Component | Minimum |
|---|---|
| RAM | 16GB (shared memory) |
| CPU | 10-core |
| Storage | 10GB |
| GPU | M2 Pro GPU (16 cores) |
| OS | macOS Sequoia 15.4.1 |
- Vectorized Operations: NumPy and TensorFlow operations for fast computation
- Batch Processing: Efficient batch training with configurable batch sizes
- Memory Pooling: Reuse of game state objects to reduce garbage collection
- Parallel Simulation: Multiprocessing for tournament execution
- Model Optimization: TensorFlow model optimization for deployment
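As an illustration of parallel tournament execution, here is a sketch using Python's multiprocessing pool; `simulate_game` is a placeholder for the project's actual game runner:

```python
from multiprocessing import Pool

def simulate_game(seed: int) -> str:
    """Hypothetical worker: run one full game with the given seed and return
    the winner's name. The real project would plug in its game manager here."""
    import random
    random.seed(seed)
    return random.choice(["dqn", "strategic", "random"])  # placeholder result

if __name__ == "__main__":
    # Each worker process simulates games independently; results are gathered
    # in the parent process for aggregation.
    with Pool(processes=8) as pool:
        winners = pool.map(simulate_game, range(1_000))
    print({name: winners.count(name) for name in set(winners)})
```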