
Monopoly RL Agent

A Reinforcement Learning Environment and a DQN Agent for Monopoly Game Strategy Optimization




About

This project implements a comprehensive reinforcement learning environment for the Monopoly board game, featuring a novel hybrid Deep Q-Network (DQN) architecture that combines algorithmic strategies with learned behaviors. The system was developed as part of a bachelor's thesis project to explore advanced AI decision-making in complex, multi-agent, stochastic environments.

Key Innovations

  • Novel Hybrid Architecture: Multiple specialized neural networks handle different decision types, while algorithmic agents provide expert knowledge for complex scenarios with large action spaces (for example, multi-property trades).
  • Multi-Network DQN Design: Unlike traditional single-network approaches, this implementation uses separate networks for each major action type (property management, trading, financial decisions), enabling focused learning and improved decision accuracy.
  • Expert Learning Integration: The system implements supervised pre-training using expert agents, followed by reinforcement learning through environmental interaction, significantly reducing training time and improving convergence.
  • Optimized Game Simulation: Custom-built Monopoly environment with faithful rule implementation, optimized for fast simulation and RL training with proper state validation and error handling.

Achievement Highlights

  • 90% win rate against the parent algorithmic agent
  • 5.3 percentage-point improvement in 12-agent round-robin tournaments compared to the baseline model
  • Faithful reproduction of official Monopoly rules with RL-friendly adaptations
  • Real-time game visualization with React frontend
  • Comprehensive tournament management system for agent comparison
  • Support for multiple agent types: random, algorithmic, strategic variants, and advanced DQN
  • Ideal for research, benchmarking, and educational purposes in multi-agent reinforcement learning


How to use it

Prerequisites

  • Python 3.10+ (developed and tested with Python 3.10.15)
  • macOS with MPS support (for GPU acceleration) or Windows/Linux with CUDA
  • Node.js 16+ and npm for the frontend interface

NOTE: The backend setup assumes conda is installed. If you do not have it, follow the installation instructions in the official conda documentation.

Backend Setup (Python Environment)

1. Create and activate conda environment:

conda create -n monopoly-rl python=3.10
conda activate monopoly-rl

2. Install dependencies:

# Install requirements
pip install -r requirements.txt

# For macOS with MPS (GPU acceleration)
conda install -c apple tensorflow-deps
pip install tensorflow-macos==2.10.0
pip install tensorflow-metal==0.6.0

# For Windows/Linux (CPU/CUDA)
pip install tensorflow

Frontend Setup (React Interface)

1. Install dependencies:

cd frontend
npm install

Quick Start Guide

Play a game against the DQN Agent:

1. Run the main script, which launches everything needed to play:

cd src
python main.py

2. Access the interface:

Open your browser and navigate to http://localhost:5173

Train a new DQN agent:

To train a new agent, refer to the training scripts in the dqn folder. There you will find a training script for each of the specialized networks, which you can configure as needed.



How it works

Environment Architecture

The system is built with a modular, object-oriented design following software engineering best practices:

  • Game State Management: Centralized state class that preserves all game attributes (player positions, balances, properties). All state updates are validated to ensure legal moves and maintain game integrity.
  • Game Manager: Coordinates game logic, player turns, and rule enforcement. Acts as the main controller, interfacing with specialized managers for different game aspects.
  • Specialized Managers: Modular components handle specific game mechanics: dice rolling, chance/community chest cards, trading, and property management with built-in validation.
  • Player Base Class: Abstract interface defining callback methods for agent decision-making. Supports multiple agent types with a consistent API for easy extensibility (a minimal sketch follows this list).
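
A minimal sketch of what such a player interface could look like; the class and method names here are illustrative assumptions, not the repository's actual API:

from abc import ABC, abstractmethod
import random

class Player(ABC):
    # Base interface every agent implements; the game manager invokes these
    # callbacks whenever the player has a decision to make.
    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def should_buy_property(self, game_state, property_id) -> bool:
        """Buy or pass on the unowned property just landed on."""

    @abstractmethod
    def get_upgrade_decisions(self, game_state) -> list:
        """Return the owned properties to develop this turn."""

    @abstractmethod
    def respond_to_trade(self, game_state, trade_offer) -> bool:
        """Accept or reject an incoming trade offer."""

class RandomAgent(Player):
    # Trivial concrete agent, useful as a baseline opponent.
    def should_buy_property(self, game_state, property_id) -> bool:
        return random.random() < 0.5
    def get_upgrade_decisions(self, game_state) -> list:
        return []
    def respond_to_trade(self, game_state, trade_offer) -> bool:
        return False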

Hybrid DQN Architecture

---
config:
  theme: neutral
---
graph TD
    GS[Game State] --> SE[State Encoder<br/>100 features]
    SE --> MN[Multiple Specialized Networks]
    subgraph "Specialized Q-Networks"
        MN --> BP[Buy Property<br/>Network]
        MN --> UP[Property Upgrade<br/>Network]
        MN --> FN[Financial Management<br/>Network]
        MN --> JL[Jail Decision<br/>Network]
    end
    BP --> HD{Hybrid Decision<br/>Layer}
    UP --> HD
    FN --> HD
    JL --> HD
    SA[Strategic Agent<br/>Fallback] --> HD
    subgraph "Training Pipeline"
        EL[Expert Learning<br/>Pre-training] --> RL[Reinforcement Learning<br/>Self-play]
        RL --> ER[Experience Replay<br/>Separate buffers]
        ER --> TN[Target Networks<br/>Stable learning]
    end
    HD --> AC[Action]
    EL -.-> BP
    EL -.-> UP
    EL -.-> FN
    EL -.-> JL
    RL -.-> BP
    RL -.-> UP
    RL -.-> FN
    RL -.-> JL
    classDef coreNode fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000000
    classDef networkNode fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000000
    classDef trainingNode fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#000000
    classDef hybridNode fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#000000
    class SE,MN coreNode
    class BP,UP,FN,JL networkNode
    class EL,RL,ER,TN trainingNode
    class HD hybridNode

The core innovation lies in the hybrid approach that combines the best of algorithmic and learning-based strategies:

Multi-Network Design

  • Property Purchase Network: Specialized for buy/pass decisions on unowned properties
  • Property Management Network: Handles upgrade/downgrade decisions for owned properties
  • Jail Network: Manages player interactions while in jail, including decisions to pay bail or use a "Get Out of Jail Free" card
  • Financial Network: Handles mortgage/unmortgage and cash management decisions
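
As a rough illustration, the per-decision networks could be built as small independent Keras models; the layer sizes and action counts below are assumptions for the sketch, not the exact architecture used in the thesis:

import tensorflow as tf
from tensorflow import keras

def build_q_network(state_dim: int, n_actions: int) -> keras.Model:
    # One small fully connected Q-network; layer sizes are illustrative.
    return keras.Sequential([
        keras.layers.Input(shape=(state_dim,)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(n_actions),           # one Q-value per action
    ])

STATE_DIM = 100  # matches the encoder size in the diagram above

# One network per decision type instead of a single monolithic head.
networks = {
    "buy_property": build_q_network(STATE_DIM, 2),   # buy / pass
    "upgrade":      build_q_network(STATE_DIM, 2),   # upgrade / hold
    "jail":         build_q_network(STATE_DIM, 3),   # pay bail / use card / roll
    "financial":    build_q_network(STATE_DIM, 2),   # mortgage / keep
}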

Expert Learning Phase

The system uses supervised learning to bootstrap the networks:

  1. Collect gameplay data from expert algorithmic agents
  2. Pre-train each network on relevant decision scenarios
  3. Initialize with expert knowledge to reduce exploration time
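
A hedged sketch of this pre-training step, treating each Q-head as a classifier over the expert's chosen actions (behaviour cloning); the dataset file names are placeholders, and the networks dict comes from the sketch above:

import numpy as np
from tensorflow import keras

# states: encoded game states where the expert had to decide,
# actions: index of the action the expert actually took.
states = np.load("expert_buy_states.npy")    # shape (N, STATE_DIM), placeholder file
actions = np.load("expert_buy_actions.npy")  # shape (N,), placeholder file

buy_net = networks["buy_property"]
buy_net.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# Warm-start the Q-head by imitating the expert's decisions before RL begins.
buy_net.fit(states, actions, batch_size=64, epochs=10, validation_split=0.1)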

Reinforcement Learning Phase

Networks continue learning through environmental interaction:

  • Experience Replay: Store and sample past experiences for stable learning
  • Target Networks: Separate target networks for stable Q-value updates
  • Epsilon-Greedy Exploration: Balance exploration vs exploitation
  • Reward Shaping: Custom reward functions for different game phases
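
The list above corresponds to a fairly standard DQN update loop. A condensed sketch, reusing the buy-property network from the earlier example (hyperparameters are illustrative, not the thesis values):

import random
from collections import deque
import numpy as np
import tensorflow as tf

GAMMA, BATCH_SIZE, N_ACTIONS = 0.99, 64, 2
replay_buffer = deque(maxlen=50_000)               # one buffer per network in practice
online_net = networks["buy_property"]
target_net = tf.keras.models.clone_model(online_net)
target_net.set_weights(online_net.get_weights())   # re-synced periodically during training
optimizer = tf.keras.optimizers.Adam(1e-4)

def choose_action(state, epsilon):
    # Epsilon-greedy: explore with probability epsilon, otherwise act greedily.
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    q = online_net(state[None, :], training=False)
    return int(tf.argmax(q[0]))

def train_step():
    # One gradient step on a sampled minibatch, bootstrapping from the target network.
    batch = random.sample(replay_buffer, BATCH_SIZE)
    s, a, r, s2, done = map(np.array, zip(*batch))
    s, s2 = s.astype(np.float32), s2.astype(np.float32)
    r, done = r.astype(np.float32), done.astype(np.float32)
    next_q = tf.reduce_max(target_net(s2, training=False), axis=1)
    targets = r + GAMMA * (1.0 - done) * next_q
    with tf.GradientTape() as tape:
        q = online_net(s, training=True)
        q_taken = tf.reduce_sum(q * tf.one_hot(a, N_ACTIONS), axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_taken))
    grads = tape.gradient(loss, online_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, online_net.trainable_variables))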

State Representation and Action Spaces

The environment encodes the complex Monopoly state into a format suitable for neural networks:

State Vector Components:

  • Player positions, cash, and property ownership (40 properties × 4 players)
  • Property development levels and mortgage status
  • Game phase indicators (early, mid, late game)
  • Recent action history and opponent behavior patterns
  • Dice roll outcomes and card draw results
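
An illustrative encoder along these lines might look as follows; the attribute names and normalisation constants are assumptions for the sketch, while the real encoder produces the 100-feature vector shown in the diagram:

import numpy as np

def encode_state(game_state) -> np.ndarray:
    # Flatten the game state into a fixed-length, normalised feature vector.
    features = []
    for p in game_state.players:                       # positions and cash
        features.append(p.position / 39.0)             # 40 board squares
        features.append(min(p.balance / 5000.0, 1.0))  # cap-normalised cash
    for prop in game_state.properties:                 # ownership and development
        features.append(prop.owner_idx / 3.0 if prop.owner_idx is not None else -1.0)
        features.append(prop.houses / 5.0)             # 0-4 houses, 5 = hotel
        features.append(1.0 if prop.is_mortgaged else 0.0)
    features.append(game_state.turn_count / 200.0)     # rough game-phase signal
    return np.asarray(features, dtype=np.float32)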

Action Space Discretization:

  • Binary decisions: Buy/Pass, Upgrade/Hold, Accept/Reject trade
  • Categorical choices: Which properties to develop, mortgage priorities
  • Hybrid decisions: Algorithmic for complex trades, learned for simple choices
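
A sketch of how such a hybrid decision layer could route actions, assuming the networks dict from the earlier example and a hypothetical fallback_agent wrapping the algorithmic strategy:

import tensorflow as tf

def decide(decision_type, state_vec, game_state, fallback_agent):
    # Complex, combinatorial decisions (e.g. multi-property trades) are delegated
    # to the algorithmic expert; simple discrete choices go to the matching network.
    if decision_type == "trade_offer":
        return fallback_agent.propose_trade(game_state)
    q_values = networks[decision_type](state_vec[None, :], training=False)[0]
    return int(tf.argmax(q_values))   # e.g. 0 = pass/hold, 1 = buy/upgrade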

Tournament and Evaluation System

Comprehensive evaluation framework to assess agent performance:

  • Round-Robin Tournaments: All agents play against each other multiple times
  • Statistical Analysis: Win rates, average game length, financial performance
  • Strategy Analysis: Property acquisition patterns, trading behavior
  • Performance Visualization: Real-time dashboard showing training progress
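
A minimal round-robin driver could look like the following, where play_game stands in for the project's actual game loop and is assumed to return the winning agent:

from itertools import combinations

def round_robin(agents, play_game, games_per_pairing=100):
    # Every agent meets every other agent; win counts feed the statistical analysis.
    wins = {agent.name: 0 for agent in agents}
    for a, b in combinations(agents, 2):
        for _ in range(games_per_pairing):
            winner = play_game([a, b])
            wins[winner.name] += 1
    return wins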


Tech specs

Development Environment

  • Language: Python 3.10.15
  • AI Framework: TensorFlow 2.10.0 with Keras
  • GPU Acceleration: TensorFlow Metal 0.6.0 (macOS), CUDA (Windows/Linux)
  • Frontend: React 18.3.1 with Vite 6.0.1
  • API: FastAPI for backend services
  • Styling: Tailwind CSS 3.4.16

Machine Learning Architecture

  • Algorithm: Deep Q-Networks (DQN) with experience replay
  • Network Architecture: Multiple specialized networks for different action types
  • Training Paradigm: Hybrid approach combining supervised learning and reinforcement learning
  • Optimization: Adam optimizer with learning rate scheduling
  • Memory Management: Experience replay buffer

System specifications used during development

Component   Minimum
RAM         16GB (shared memory)
CPU         10-core
Storage     10GB
GPU         M2 Pro GPU (16 cores)
OS          macOS Sequoia 15.4.1

Performance Optimizations

  • Vectorized Operations: NumPy and TensorFlow operations for fast computation
  • Batch Processing: Efficient batch training with configurable batch sizes
  • Memory Pooling: Reuse of game state objects to reduce garbage collection
  • Parallel Simulation: Multiprocessing for tournament execution
  • Model Optimization: TensorFlow model optimization for deployment
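
For example, parallel tournament simulation maps naturally onto Python's multiprocessing pool; run_game and build_agents below are hypothetical stand-ins for the project's own game loop and agent factory:

from multiprocessing import Pool

def simulate_matchup(pairing):
    # Plays one full game between the two configured agents and
    # returns the winner's name (run_game / build_agents are placeholders).
    agent_a, agent_b = pairing
    return run_game(build_agents(agent_a, agent_b))

if __name__ == "__main__":
    pairings = [("dqn", "strategic")] * 500
    with Pool(processes=8) as pool:          # roughly one worker per physical core
        winners = pool.map(simulate_matchup, pairings)
    print("DQN win rate:", sum(w == "dqn" for w in winners) / len(winners))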
