Hands-on learning repository following the Anyscale "Introduction to Ray" course.
This repository contains practical Python implementations and examples designed to complement the Anyscale "Introduction to Ray" course. Each script demonstrates key concepts from distributed computing with Ray, progressing from basic tasks to advanced machine learning workflows.
- Course Overview
- Learning Path
- Project Structure
- Quick Start
- Key Ray Concepts
- Running Examples
- Documentation
- Utilities & Troubleshooting
- Additional Resources
This repository implements examples for the following Anyscale course modules:
| Course Module | Repository Files | Key Concepts |
|---|---|---|
| Ray Core Fundamentals | `ray_core.py`, `ray_actors.py` | Remote functions, ObjectRefs, Actors |
| Ray Core Advanced | `ray_advanced.py` | Object Store, Runtime Environments, Resource Management |
| Ray Data Processing | `ray_data.py` | Distributed data processing, ETL pipelines |
| Ray AI Libraries | `ray_ai.py` | XGBoost integration, Distributed ML workflows |
| Ray Tune Optimization | `ray_tune.py`, `ray_tune_torch.py` | Hyperparameter tuning, AutoML, Experiment tracking |
| Ray Train & PyTorch | `ray_torch.py`, `ray_torch_ddp.py` | Distributed training, Model parallelism, DDP |
| Ray Serve Deployment | `ray_serve.py` | ML model serving, API endpoints, Scalable inference |
| Testing & Utilities | `ray_minimal_test.py`, shell scripts | Memory management, Cleanup utilities |
After working through this repository, you'll understand:
- Ray's distributed computing model and core abstractions
- How to write scalable remote functions and stateful actors
- Object store patterns and memory management strategies
- Integration of Ray with popular ML libraries (XGBoost, PyTorch)
- Best practices for distributed training and model serving
- Troubleshooting and resource optimization techniques
Estimated time: 2-3 hours

1. **Prerequisites Check**

   ```bash
   python --version  # Should be 3.11+
   uv --version      # Package manager
   ```

2. **Ray Basics** (`ray_minimal_test.py`)
   - Verify Ray installation
   - Understand basic Ray initialization
   - Simple remote functions

3. **Core Concepts** (`ray_core.py`)
   - Remote functions (`@ray.remote`)
   - Object references (`ray.get`, `ray.put`)
   - Common patterns and anti-patterns
Estimated time: 3-4 hours

4. **Stateful Actors** (`ray_actors.py`)
   - Actor lifecycle and state management
   - Actor handles and communication
   - Use cases for actors vs. tasks

5. **Advanced Features** (`ray_advanced.py`)
   - Distributed object store
   - Runtime environments
   - Resource allocation and fractional resources
   - Nested tasks and patterns
Estimated time: 4-5 hours

6. **ML Workflows** (`ray_ai.py`)
   - Ray integration with XGBoost
   - Distributed data processing
   - Model training and evaluation

7. **Distributed Training** (`ray_torch.py`)
   - Ray Train with PyTorch
   - Distributed data parallel training
   - Checkpointing and metrics
- ✅ Can create and call remote functions
- ✅ Understand ObjectRefs and the object store
- ✅ Can implement and use Ray actors
- ✅ Familiar with runtime environments
- ✅ Can integrate Ray with ML libraries
- ✅ Understand distributed training patterns
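A quick way to self-check the first three items is to predict what the following snippet prints before running it. This is a minimal sketch using only Ray core APIs; the names (`square`, `Accumulator`) are illustrative and not part of the course files:

```python
import ray

ray.init()

# A stateless task: runs on any worker and returns an ObjectRef immediately.
@ray.remote
def square(x):
    return x * x

# A stateful actor: keeps a running total across method calls.
@ray.remote
class Accumulator:
    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value
        return self.total

# Launch tasks in parallel, then fetch all results at once.
refs = [square.remote(i) for i in range(4)]
print(ray.get(refs))  # [0, 1, 4, 9]

# Route the task results through the actor; ObjectRefs passed as arguments
# are resolved to their values before the method runs.
acc = Accumulator.remote()
for ref in refs:
    acc.add.remote(ref)
print(ray.get(acc.add.remote(0)))  # 0 + 1 + 4 + 9 + 0 = 14

ray.shutdown()
```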
```text
ray_fundamentals/
├── README.md                 # This comprehensive guide
├── pyproject.toml            # Project dependencies & config
├── Python Learning Modules:
│   ├── ray_minimal_test.py   # Installation verification
│   ├── ray_core.py           # Remote functions & ObjectRefs
│   ├── ray_actors.py         # Stateful actors & communication
│   ├── ray_advanced.py       # Object store & runtime environments
│   ├── ray_ai.py             # XGBoost ML workflow
│   └── ray_torch.py          # PyTorch distributed training
├── Utility Scripts:
│   ├── cleanup_ray.sh        # Clean Ray temp files
│   └── run_ray_safe.sh       # Run with memory limits
└── Documentation:
    └── docs/
        ├── ray_resources.md      # CPU/GPU resource allocation
        └── ray_runtime_notes.md  # Runtime environment deep-dive
```
| File | Purpose | Key Concepts | Prerequisites |
|---|---|---|---|
| `ray_minimal_test.py` | Verify setup | Ray initialization, basic remote functions | Python basics |
| `ray_core.py` | Foundation concepts | `@ray.remote`, `ray.get()`, `ray.put()`, anti-patterns | None |
| `ray_actors.py` | Stateful computing | Actor classes, state management, handles | `ray_core.py` |
| `ray_advanced.py` | Advanced patterns | Object store, runtime envs, resources, nested tasks | `ray_actors.py` |
| `ray_ai.py` | ML integration | XGBoost + Ray, distributed ML workflows | ML basics, pandas |
| `ray_torch.py` | Distributed training | Ray Train, PyTorch DDP, checkpointing | PyTorch knowledge |
- Python: 3.11 or higher
- Memory: 4GB+ RAM recommended
- OS: Linux, macOS, or Windows with WSL
```bash
# Clone or navigate to this repository
cd ray_fundamentals

# Install dependencies using uv (recommended)
uv sync

# Alternative: using pip
pip install -r requirements.txt
```

```bash
# Test Ray installation with minimal example
uv run ray_minimal_test.py
# or: python ray_minimal_test.py
```

Expected output:

```text
Ray initialized successfully!
Available resources: {'CPU': 1.0, 'memory': 256000000}
Simple task result: 84
Small matrix test passed: True
Ray shutdown successfully!
```
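In outline, the verification script does something like the following. This is a sketch matching the expected output above, not the exact file contents; the available resources will differ per machine:

```python
import numpy as np
import ray

ray.init()
print("Ray initialized successfully!")
print(f"Available resources: {ray.available_resources()}")

# A trivial remote task: double a number on a Ray worker.
@ray.remote
def double(x):
    return 2 * x

print(f"Simple task result: {ray.get(double.remote(42))}")  # 84

# A small matrix multiplication to exercise object transfer.
@ray.remote
def matmul(a, b):
    return a @ b

a = np.random.rand(10, 10)
result = ray.get(matmul.remote(a, np.eye(10)))
print(f"Small matrix test passed: {np.allclose(result, a)}")

ray.shutdown()
print("Ray shutdown successfully!")
```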
Begin with the Learning Path above, starting with `ray_core.py`:

```bash
uv run ray_core.py
```

Tasks (remote functions):

```python
@ray.remote
def compute_task(data):
    return process(data)

# Schedule task execution
future = compute_task.remote(my_data)
result = ray.get(future)  # Retrieve result
```

Files: `ray_core.py`, `ray_advanced.py`
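`ray_core.py` covers common patterns and anti-patterns; one classic Ray anti-pattern is calling `ray.get()` inside the loop that launches the tasks, which forces them to run one after another. A sketch of the contrast, reusing `compute_task` from the snippet above (`my_items` is a placeholder list):

```python
# Anti-pattern: each ray.get() blocks, so the tasks execute sequentially.
results = [ray.get(compute_task.remote(item)) for item in my_items]

# Preferred: launch everything first, then fetch all results at once.
futures = [compute_task.remote(item) for item in my_items]
results = ray.get(futures)
```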
Stateful actors:

```python
@ray.remote
class StatefulWorker:
    def __init__(self):
        self.state = {}

    def update(self, key, value):
        self.state[key] = value

# Create actor instance
worker = StatefulWorker.remote()
worker.update.remote("key", "value")
```

Files: `ray_actors.py`
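Actor method calls also return ObjectRefs, so reading state back looks the same as fetching a task result. A small self-contained extension of the example above; the `get` method is added here purely for illustration:

```python
import ray

@ray.remote
class StatefulWorker:
    def __init__(self):
        self.state = {}

    def update(self, key, value):
        self.state[key] = value

    def get(self, key):
        return self.state.get(key)

worker = StatefulWorker.remote()
worker.update.remote("key", "value")
# Method calls from the same caller execute in order, so the update
# has been applied by the time get() runs.
print(ray.get(worker.get.remote("key")))  # "value"
```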
Object store (`ray.put`):

```python
# Store large objects once, reference many times
large_data = ray.put(massive_dataset)
results = [process_data.remote(large_data) for _ in range(10)]
```

Files: `ray_advanced.py`
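The point of `ray.put` is to avoid copying the same large argument into the object store once per task. A sketch of the contrast (`process_data` and `massive_dataset` are placeholders from the snippet above):

```python
# Without ray.put: each remote call serializes the dataset again,
# creating one copy in the object store per task.
results = [process_data.remote(massive_dataset) for _ in range(10)]

# With ray.put: the dataset is stored once; every task receives the
# same ObjectRef and reads the shared copy.
large_data = ray.put(massive_dataset)
results = [process_data.remote(large_data) for _ in range(10)]
```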
Ray Train integration:

```python
# Distributed training with Ray Train
from ray.train.xgboost import XGBoostTrainer

trainer = XGBoostTrainer(
    datasets={"train": train_dataset},
    params={"objective": "reg:squarederror"},
)
result = trainer.fit()
```

Files: `ray_ai.py`, `ray_torch.py`
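Depending on your Ray version, `XGBoostTrainer` also expects a scaling configuration and a label column. A sketch of a more complete call; the synthetic dataset, the `"target"` column name, and the worker count are illustrative assumptions:

```python
import ray
from ray.train import ScalingConfig
from ray.train.xgboost import XGBoostTrainer

# Illustrative dataset: a single feature "x" and a label column "target".
train_dataset = ray.data.from_items(
    [{"x": float(i), "target": 2.0 * i} for i in range(100)]
)

trainer = XGBoostTrainer(
    scaling_config=ScalingConfig(num_workers=2),
    label_column="target",
    params={"objective": "reg:squarederror"},
    datasets={"train": train_dataset},
    num_boost_round=10,
)
result = trainer.fit()
print(result.metrics)
```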
```bash
# Method 1: Using uv (recommended)
uv run <script_name>.py

# Method 2: Direct python execution
python <script_name>.py

# Method 3: With custom memory limits
./run_ray_safe.sh  # Runs ray_advanced.py with memory constraints
```

For systems with limited RAM:
```bash
# Use the provided safe execution script
./run_ray_safe.sh

# Or set environment variables manually
export RAY_OBJECT_STORE_ALLOW_SLOW_STORAGE=1
export RAY_memory_usage_threshold=0.6
python ray_advanced.py
```

```bash
# 1. Verify installation
uv run ray_minimal_test.py

# 2. Learn core concepts
uv run ray_core.py

# 3. Explore actors
uv run ray_actors.py

# 4. Advanced patterns
./run_ray_safe.sh  # runs ray_advanced.py

# 5. ML workflows
uv run ray_ai.py

# 6. Distributed training
uv run ray_torch.py

# 7. Cleanup (if needed)
./cleanup_ray.sh
```

The `docs/` directory contains additional learning resources:
| Document | Description | Key Topics |
|---|---|---|
| `ray_resources.md` | CPU/GPU Resource Management | `num_cpus`, `num_gpus`, resource allocation |
| `ray_runtime_notes.md` | Runtime Environments Deep Dive | Environment isolation, pip vs uv, Docker containers |

- Start with the code examples
- Reference `ray_resources.md` when working with `ray_advanced.py`
- Review `ray_runtime_notes.md` for production deployment insights
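For orientation, the APIs these documents discuss look roughly like this. A minimal sketch; the pinned dependency and the resource values are illustrative:

```python
import ray

# Runtime environment: pin worker dependencies for this job (ray_runtime_notes.md).
ray.init(runtime_env={"pip": ["pandas==2.2.0"]})

# Explicit resource requests per task, including fractional GPUs (ray_resources.md).
@ray.remote(num_cpus=2, num_gpus=0.5)
def train_shard(shard):
    ...
```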
| Script | Purpose | Usage |
|---|---|---|
| `cleanup_ray.sh` | Remove Ray temporary files | `./cleanup_ray.sh` |
| `run_ray_safe.sh` | Execute with memory limits | `./run_ray_safe.sh` |
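`run_ray_safe.sh` is essentially a wrapper around the memory-related environment variables shown earlier. A rough sketch of that kind of wrapper, not necessarily the script's exact contents:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Allow the object store to spill to slower storage and trigger Ray's
# memory monitor earlier than the default threshold.
export RAY_OBJECT_STORE_ALLOW_SLOW_STORAGE=1
export RAY_memory_usage_threshold=0.6

python ray_advanced.py
```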
| Issue | Symptoms | Solution |
|---|---|---|
| Memory errors | Ray crashes, OOM kills | Use `run_ray_safe.sh` or reduce data sizes |
| Port conflicts | "Address already in use" | Run `ray stop` or `./cleanup_ray.sh` |
| Import errors | Module not found | Ensure `uv sync` completed successfully |
| Slow startup | Long initialization times | Clean temp files with `cleanup_ray.sh` |
```bash
# Check Ray status
ray status

# View Ray dashboard (if available)
# Open browser to: http://localhost:8265

# Monitor system resources
htop  # or top on macOS

# Check disk usage
df -h /tmp  # Ray uses /tmp by default
```

Environment Variables:
```bash
# Memory management
export RAY_OBJECT_STORE_ALLOW_SLOW_STORAGE=1
export RAY_memory_usage_threshold=0.6

# Disable warnings
export RAY_DISABLE_IMPORT_WARNING=1

# Custom temp directory
export RAY_TMPDIR=/path/to/custom/tmp
```

- Ray Documentation - Comprehensive official docs
- Ray GitHub Repository - Source code and issues
- Anyscale Platform - Managed Ray platform
- Ray Tutorial - Getting started guide
- Ray Design Patterns - Common usage patterns
- Ray Examples - Official examples repository
- Ray Architecture - System design overview
- Performance Tips - Optimization guidelines
- Memory Management - Memory optimization strategies
- Ray Clusters - Multi-node setup
- Kubernetes - K8s integration
- Docker - Containerization guide
- Coding Style: Follows PEP 8 with extensive inline documentation
- Version Control: Uses `jj` (Jujutsu) as an experimental alternative to Git
- Package Management: Primary dependency management via `uv` for faster installs
- Testing Strategy: Executable examples with assertions and print statements for validation
This is a personal learning repository, but suggestions and improvements are welcome! Feel free to:
- Report issues or errors in examples
- Suggest additional Ray concepts to explore
- Share alternative approaches or optimizations
Happy Learning! Start your Ray journey with the Learning Path and dive into distributed computing!