Commit f3cec0e

Merge pull request #4 from AIComputing101/coketaste/fix-build
Fix build of rocm7 and cuda
2 parents 360e2ba + e3b6f95 commit f3cec0e

93 files changed: +3798 -1585 lines changed
CONTRIBUTING.md

Lines changed: 3 additions & 3 deletions
@@ -41,7 +41,7 @@ docker-compose up -d cuda-dev # For NVIDIA GPUs
 docker-compose up -d rocm-dev # For AMD GPUs
 
 # Option 2: Native development
-# Install CUDA Toolkit 12.9.1+ or ROCm 6.4.3+
+# Install CUDA Toolkit 12.9.1+ or ROCm latest
 # See modules/module1/README.md for detailed setup instructions
 
 # Build all examples
@@ -241,8 +241,8 @@ When reporting bugs, please include:
 ### Environment Information
 - **Operating System**: (Ubuntu 22.04, Windows 11, etc.)
 - **GPU**: (RTX 4090, RX 7900 XTX, etc.)
-- **Driver Version**: (NVIDIA 535.x, ROCm 6.4.3, etc.)
-- **CUDA/HIP Version**: (12.9.1, 6.4.3, etc.)
+- **Driver Version**: (NVIDIA 535.x, ROCm latest, etc.)
+- **CUDA/HIP Version**: (12.9.1, 7.0, etc.)
 - **Docker**: (if using containerized development)
 
 ### Bug Description
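
Reviewer note: the environment fields listed in the CONTRIBUTING.md hunk above could be gathered with a small script. The following is a minimal sketch and is not part of this commit. The function names `probe` and `collect_env_info` are hypothetical, and the sketch assumes that `nvidia-smi`, `nvcc`, `hipcc`, and `docker` may or may not be on `PATH`; any tool that is missing is reported as "not found".

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): print the environment
# details that the bug-report template asks for.

# Run a command and print the first output line matching a pattern,
# or "not found" if the command is missing, fails, or nothing matches.
probe() {
    local pattern="$1"
    shift
    local line
    line="$("$@" 2>/dev/null | grep -m 1 -E "$pattern")" || line=""
    printf '%s\n' "${line:-not found}"
}

collect_env_info() {
    echo "Operating System: $(uname -sr)"
    echo "GPU / Driver (NVIDIA): $(probe '.' nvidia-smi --query-gpu=name,driver_version --format=csv,noheader)"
    echo "CUDA Version: $(probe 'release' nvcc --version)"
    echo "HIP Version: $(probe 'HIP version' hipcc --version)"
    echo "Docker: $(probe '.' docker --version)"
}

collect_env_info
```

The output can be pasted directly under "Environment Information" in a bug report. Only the first matching line per tool is kept, so a multi-GPU machine reports its first GPU.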

README.md

Lines changed: 30 additions & 13 deletions
@@ -2,9 +2,9 @@
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![CUDA](https://img.shields.io/badge/CUDA-12.9.1-76B900?logo=nvidia)](https://developer.nvidia.com/cuda-toolkit)
-[![ROCm](https://img.shields.io/badge/ROCm-6.4.3-red?logo=amd)](https://rocmdocs.amd.com/)
+[![ROCm](https://img.shields.io/badge/ROCm-7.0-red?logo=amd)](https://rocmdocs.amd.com/)
 [![Docker](https://img.shields.io/badge/Docker-Ready-2496ED?logo=docker)](https://www.docker.com/)
-[![Examples](https://img.shields.io/badge/Examples-70%2B-green)](modules/)
+[![Examples](https://img.shields.io/badge/Examples-71-green)](modules/)
 [![CI](https://img.shields.io/badge/CI-GitHub%20Actions-2088FF?logo=github-actions)](https://github.com/features/actions)
 
 **A comprehensive, hands-on educational project for mastering GPU programming with CUDA and HIP**
@@ -35,7 +35,7 @@
 **GPU Programming 101** is a complete educational resource for learning modern GPU programming. This project provides:
 
 - **9 comprehensive modules** covering beginner to expert topics
-- **70+ working code examples** in both CUDA and HIP
+- **71 working code examples** in both CUDA and HIP
 - **Cross-platform support** for NVIDIA and AMD GPUs
 - **Production-ready development environment** with Docker
 - **Professional tooling** including profilers, debuggers, and CI/CD
@@ -197,10 +197,11 @@ This architectural knowledge is essential for writing efficient GPU code and is
 |---------|-------------|
 | 🎯 **Complete Curriculum** | 9 progressive modules from basics to advanced topics |
 | 💻 **Cross-Platform** | Full CUDA and HIP support for NVIDIA and AMD GPUs |
-| 🐳 **Docker Ready** | Complete containerized development environment |
-| 🔧 **Production Quality** | Professional build systems, testing, and profiling |
+| 🐳 **Docker Ready** | Complete containerized development environment with CUDA 12.9.1 & ROCm 7.0 |
+| 🔧 **Production Quality** | Professional build systems, auto-detection, testing, and profiling |
 | 📊 **Performance Focus** | Optimization techniques and benchmarking throughout |
 | 🌐 **Community Driven** | Open source with comprehensive contribution guidelines |
+| 🧪 **Advanced Libraries** | Support for Thrust, MIOpen, and production ML frameworks |
 
 ## 🚀 Quick Start
 
@@ -217,14 +218,14 @@ cd gpu-programming-101
 
 # Inside container: verify GPU access and start learning
 /workspace/test-gpu.sh
-cd modules/module1 && make && ./01_vector_addition_cuda
+cd modules/module1 && make && ./build/01_vector_addition_cuda
 ```
 
 ### Option 2: Native Installation
 For direct system installation:
 
 ```bash
-# Prerequisites: CUDA 11.0+ or ROCm 5.0+, GCC 7+, Make
+# Prerequisites: CUDA 12.0+ or ROCm 7.0+, GCC 9+, Make
 
 # Clone and build
 git clone https://github.com/AIComputing101/gpu-programming-101.git
@@ -265,7 +266,7 @@ Our comprehensive curriculum progresses from fundamental concepts to production-
 | [**Module 8**](modules/module8/) | 🚀 Expert | 10-12h | **Domain Applications** | ML, Scientific Computing | 4 |
 | [**Module 9**](modules/module9/) | 🚀 Expert | 6-8h | **Production Deployment** | Libraries, Integration, Scaling | 4 |
 
-**📈 Progressive Learning Path: 70+ Examples • 50+ Hours • Beginner to Expert**
+**📈 Progressive Learning Path: 71 Examples • 50+ Hours • Beginner to Expert**
 
 ### Learning Progression
 
@@ -313,7 +314,7 @@ Module 5: Performance Tuning
 ### Software Requirements
 
 #### Operating System Support
-- **Linux** (Recommended): Ubuntu 22.04 LTS, RHEL 8/9, SLES 15 SP5
+- **Linux** (Recommended): Ubuntu 22.04/24.04 LTS, RHEL 8/9, SLES 15 SP5
 - **Windows**: Windows 10/11 with WSL2 recommended for optimal compatibility
 - **macOS**: macOS 12+ (Metal Performance Shaders for basic GPU compute)
 
@@ -322,7 +323,7 @@ Module 5: Performance Tuning
 - **Driver Requirements**:
 - Linux: 550.54.14+ for CUDA 12.4+
 - Windows: 551.61+ for CUDA 12.4+
-- **ROCm Platform**: 6.0+ (Docker uses ROCm 6.4.3)
+- **ROCm Platform**: 7.0+ (Docker uses ROCm 7.0)
 - **Driver Requirements**: Latest AMDGPU-PRO or open-source AMDGPU drivers
 - **Kernel Support**: Linux kernel 5.4+ recommended
 
@@ -338,6 +339,8 @@ Module 5: Performance Tuning
 - **Profiling**: Nsight Compute, Nsight Systems (NVIDIA), rocprof (AMD)
 - **Debugging**: cuda-gdb, rocgdb, compute-sanitizer
 - **Libraries**: cuBLAS, cuFFT, rocBLAS, rocFFT (for advanced modules)
+- **ML Libraries**: Thrust (NVIDIA), MIOpen (AMD) for deep learning applications
+- **System Management**: NVML (NVIDIA), ROCm SMI (AMD) for hardware monitoring
 
 ### Performance Expectations by Hardware Tier
 
@@ -381,28 +384,42 @@ Experience the full development environment with zero setup:
 - 📦 Isolated and reproducible builds
 - 🧹 Easy cleanup when done
 
+**Container Specifications:**
+- **CUDA**: NVIDIA CUDA 12.9.1 on Ubuntu 22.04
+- **ROCm**: AMD ROCm 7.0 on Ubuntu 24.04
+- **Libraries**: Production-ready toolchains with debugging support
+
 **[📖 Complete Docker Guide →](docker/README.md)**
 
 ## 🔧 Build System
 
+Our advanced build system features automatic GPU vendor detection and optimized configurations:
+
 ### Project-Wide Commands
 ```bash
-make all # Build all modules
+make all # Build all modules with auto-detection
 make test # Run comprehensive tests
 make clean # Clean all artifacts
-make check-system # Verify GPU setup
+make check-system # Verify GPU setup and dependencies
 make status # Show module completion status
 ```
 
 ### Module-Specific Commands
 ```bash
 cd modules/module1/examples
-make # Build all examples in module
+make # Build all examples with vendor auto-detection
 make test # Run module tests
 make profile # Performance profiling
 make debug # Debug builds with extra checks
 ```
 
+### Advanced Build Features
+- **Automatic GPU Detection**: Detects NVIDIA/AMD hardware and builds accordingly
+- **Production Optimization**: `-O3`, fast math, architecture-specific optimizations
+- **Debug Support**: Full debugging symbols and validation checks
+- **Library Management**: Automatic detection of optional dependencies (NVML, MIOpen)
+- **Cross-Platform**: Single Makefile supports both CUDA and HIP builds
+
 ## Performance Expectations
 
 | Module Level | Typical GPU Speedup | Memory Efficiency | Code Quality |
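
Reviewer note: the README text in this diff describes automatic GPU vendor detection, but the visible hunks do not show the detection logic itself. One plausible approach, sketched here under assumptions, is to check which compiler is on `PATH`. The function name `detect_gpu_vendor` and the `GPU_VENDOR` override variable are hypothetical and are not taken from the repository's Makefile.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of vendor auto-detection (not the repository's code).
# Prints "cuda", "hip", or "none"; an explicit GPU_VENDOR value wins.
detect_gpu_vendor() {
    if [ -n "${GPU_VENDOR:-}" ]; then
        echo "$GPU_VENDOR"          # caller forced a vendor
    elif command -v nvcc >/dev/null 2>&1; then
        echo "cuda"                 # NVIDIA CUDA compiler found
    elif command -v hipcc >/dev/null 2>&1; then
        echo "hip"                  # AMD HIP compiler found
    else
        echo "none"                 # no GPU toolchain on PATH
    fi
}

vendor="$(detect_gpu_vendor)"
case "$vendor" in
    cuda) echo "Would build CUDA examples with nvcc" ;;
    hip)  echo "Would build HIP examples with hipcc" ;;
    *)    echo "No supported GPU toolchain selected ($vendor)" ;;
esac
```

Checking for the compiler rather than the hardware keeps the check usable inside image builds, where no GPU is attached. Running with `GPU_VENDOR=hip` set forces the HIP path on a machine that has both toolchains.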

docker/README.md

Lines changed: 12 additions & 12 deletions
@@ -5,9 +5,9 @@ This directory contains Docker configurations for comprehensive GPU programming
 ## 🚀 Latest Versions (2025)
 
 - **CUDA**: 12.9.1 (Latest stable release)
-- **ROCm**: 6.4.3 (Latest stable release)
+- **ROCm**: 7.0 (Latest stable release)
 - **Ubuntu**: 22.04 LTS
-- **Nsight Tools**: 2025.1.1 (with fallback to 2024.6.1)
+- **Nsight Tools**: 2025.1.1
 
 ## 🚀 Quick Start
 
@@ -58,10 +58,10 @@ docker/
 
 ### CUDA Development Container
 **Image**: `gpu-programming-101:cuda`
-**Base**: `nvidia/cuda:12.4-devel-ubuntu22.04`
+**Base**: `nvidia/cuda:12.9.1-devel-ubuntu22.04`
 
 **Features**:
-- CUDA 12.4 with development tools
+- CUDA 12.9.1 with development tools
 - NVIDIA Nsight Systems & Compute profilers
 - Python 3 with scientific libraries
 - GPU monitoring and debugging tools
@@ -73,17 +73,17 @@ docker/
 
 ### ROCm Development Container
 **Image**: `gpu-programming-101:rocm`
-**Base**: `rocm/dev-ubuntu-22.04:6.0`
+**Base**: `rocm/dev-ubuntu-22.04:7.0-complete`
 
 **Features**:
-- ROCm 6.0 with HIP development environment
+- ROCm 7.0 with HIP development environment
 - Cross-platform GPU programming (AMD/NVIDIA)
 - ROCm profiling tools (rocprof, roctracer)
 - Python 3 with scientific libraries
 
 **GPU Requirements**:
 - AMD GPU with ROCm support (RX 580+, MI series)
-- AMD drivers with ROCm 6.0+
+- AMD drivers with ROCm 7.0+
 
 ## 🔧 Container Usage
 
@@ -251,7 +251,7 @@ NVIDIA_VISIBLE_DEVICES=all
 ROCM_PATH=/opt/rocm
 HIP_PATH=/opt/rocm/hip
 HIP_PLATFORM=amd
-HSA_OVERRIDE_GFX_VERSION=10.3.0
+HSA_OVERRIDE_GFX_VERSION=11.0.0
 ```
 
 ## 🛡️ Security Considerations
@@ -282,10 +282,10 @@ nvidia-smi # For NVIDIA
 rocm-smi # For AMD
 
 # Verify Docker GPU support
-docker run --rm --gpus all nvidia/cuda:12.4-base nvidia-smi
+docker run --rm --gpus all nvidia/cuda:12.9.1-base nvidia-smi
 
 # Check container runtime
-docker run --rm --device=/dev/kfd rocm/dev-ubuntu-22.04 rocminfo
+docker run --rm --device=/dev/kfd rocm/dev-ubuntu-22.04:7.0 rocminfo
 ```
 
 **"Container build fails"**
@@ -297,8 +297,8 @@ docker system prune -a
 sudo apt update && sudo apt upgrade docker-ce docker-compose
 
 # Check base image availability
-docker pull nvidia/cuda:12.4-devel-ubuntu22.04
-docker pull rocm/dev-ubuntu-22.04:6.0
+docker pull nvidia/cuda:12.9.1-devel-ubuntu22.04
+docker pull rocm/dev-ubuntu-22.04:7.0-complete
 ```
 
 **"Permission denied errors"**

docker/docker-compose.yml

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # GPU Programming 101 - Docker Compose Configuration
 # Supports both NVIDIA CUDA and AMD ROCm platforms
-# Updated for CUDA 12.9.1 and ROCm 6.4.3 (2025)
+# Updated for CUDA 12.9.1 and ROCm 7.0 (2025)
 
 services:
 # NVIDIA CUDA Development Environment
@@ -83,7 +83,7 @@ services:
 environment:
 - HIP_VISIBLE_DEVICES=0
 - HSA_OVERRIDE_GFX_VERSION=11.0.0
-- ROCM_VERSION=6.4.3
+- ROCM_VERSION=7.0
 
 # Development tools container (CPU-only for general development)
 dev-tools:

docker/rocm/Dockerfile

Lines changed: 6 additions & 77 deletions
@@ -1,73 +1,14 @@
 # GPU Programming 101 - ROCm Development Container
-# Based on AMD's official ROCm 6.4.3 development image (latest stable as of 2025)
+# Based on AMD's official ROCm development image - used as-is for maximum compatibility
 
-FROM rocm/dev-ubuntu-22.04:6.4.3
+FROM rocm/dev-ubuntu-24.04:7.0-complete
 
 # Metadata
 LABEL maintainer="GPU Programming 101"
 LABEL description="ROCm/HIP development environment for GPU programming course"
 LABEL version="2.0"
-LABEL rocm.version="6.4.3"
-LABEL ubuntu.version="22.04"
-
-# Avoid interactive prompts during package installation
-ARG DEBIAN_FRONTEND=noninteractive
-
-# Install essential development tools for GPU programming
-RUN apt-get update && apt-get install -y \
-# Core development tools
-build-essential \
-cmake \
-git \
-wget \
-curl \
-vim \
-nano \
-htop \
-tree \
-# Minimal Python for basic scripting (not data science)
-python3 \
-python3-pip \
-python3-dev \
-# Additional utilities
-pkg-config \
-software-properties-common \
-# Debugging and profiling tools
-gdb \
-valgrind \
-strace \
-# Network tools
-net-tools \
-iputils-ping \
-&& rm -rf /var/lib/apt/lists/*
-
-# Install core ROCm development packages (keep minimal)
-RUN apt-get update && apt-get install -y \
-# Core ROCm packages for GPU programming
-hip-dev \
-hip-samples \
-hipblas-dev \
-# ROCm profiling tools (essential for performance work)
-rocprofiler-dev \
-roctracer-dev \
-&& rm -rf /var/lib/apt/lists/*
-
-# Install minimal Python packages for basic development (no heavy data science libs)
-RUN pip3 install --no-cache-dir \
-numpy \
-matplotlib
-
-# Set up ROCm environment variables
-ENV ROCM_PATH=/opt/rocm
-ENV HIP_PATH=/opt/rocm/hip
-ENV PATH=${ROCM_PATH}/bin:${HIP_PATH}/bin:${PATH}
-ENV LD_LIBRARY_PATH=${ROCM_PATH}/lib:${HIP_PATH}/lib:${LD_LIBRARY_PATH}
-ENV HIP_PLATFORM=amd
-ENV HSA_OVERRIDE_GFX_VERSION=11.0.0
-ENV ROCM_VERSION=6.4.3
-
-# Verify HIP compiler installation (skip rocminfo as no GPU during build)
-RUN hipcc --version
+LABEL rocm.version="latest"
+LABEL ubuntu.version="24.04"
 
 # Create development workspace
 WORKDIR /workspace
@@ -76,7 +17,7 @@ RUN mkdir -p /workspace/{projects,samples,output}
 # Copy course materials (will be mounted as volume in practice)
 COPY . /workspace/gpu-programming-101/
 
-# Set up convenient aliases and environment
+# Set up convenient aliases and environment for the course
 RUN echo 'alias ll="ls -alF"' >> /root/.bashrc && \
 echo 'alias la="ls -A"' >> /root/.bashrc && \
 echo 'alias l="ls -CF"' >> /root/.bashrc && \
@@ -159,17 +100,5 @@ echo "=== All tests completed ==="\n' > /workspace/test-gpu.sh
 
 RUN chmod +x /workspace/test-gpu.sh
 
-# Install HIP samples for learning and reference
-RUN cd /workspace && \
-if [ -d "/opt/rocm/hip/samples" ]; then \
-cp -r /opt/rocm/hip/samples ./hip-samples; \
-else \
-git clone https://github.com/ROCm-Developer-Tools/HIP-Examples.git hip-examples; \
-fi
-
 # Default command
-CMD ["/bin/bash"]
-
-# Health check to verify HIP compiler access (will only work when GPU is available)
-HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
-CMD hipcc --version > /dev/null 2>&1 || exit 1
+CMD ["/bin/bash"]

docker/scripts/build.sh

Lines changed: 1 addition & 1 deletion
@@ -212,7 +212,7 @@ main() {
 if [ "$pull" = true ]; then
 log "Pulling base images..."
 docker pull nvidia/cuda:12.4-devel-ubuntu22.04 || warning "Failed to pull CUDA base image"
-docker pull rocm/dev-ubuntu-22.04:6.0 || warning "Failed to pull ROCm base image"
+docker pull rocm/dev-ubuntu-24.04:latest || warning "Failed to pull ROCm base image"
 fi
 
 local success_count=0

docker/scripts/run.sh

Lines changed: 1 addition & 3 deletions
@@ -221,7 +221,7 @@ run_rocm() {
 # Set up GPU access for AMD
 local detected_gpu=$(detect_gpu)
 if [ "$detected_gpu" = "amd" ] && [ "$no_gpu_requested" = false ]; then
-gpu_args="--device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined"
+gpu_args="--device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video"
 log "Enabling AMD GPU access"
 elif [ "$no_gpu_requested" = true ]; then
 log "GPU access explicitly disabled with --no-gpu"
@@ -247,8 +247,6 @@ run_rocm() {
 -v "$PROJECT_ROOT:/workspace/gpu-programming-101:rw"
 -v "gpu101-rocm-home:/root"
 -w "/workspace/gpu-programming-101"
--e HIP_VISIBLE_DEVICES=0
--e HSA_OVERRIDE_GFX_VERSION=10.3.0
 )
 
 # Add port mapping
