 
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![CUDA](https://img.shields.io/badge/CUDA-12.9.1-76B900?logo=nvidia)](https://developer.nvidia.com/cuda-toolkit)
-[![ROCm](https://img.shields.io/badge/ROCm-6.4.3-red?logo=amd)](https://rocmdocs.amd.com/)
+[![ROCm](https://img.shields.io/badge/ROCm-7.0-red?logo=amd)](https://rocmdocs.amd.com/)
 [![Docker](https://img.shields.io/badge/Docker-Ready-2496ED?logo=docker)](https://www.docker.com/)
-[![Examples](https://img.shields.io/badge/Examples-70%2B-green)](modules/)
+[![Examples](https://img.shields.io/badge/Examples-71-green)](modules/)
 [![CI](https://img.shields.io/badge/CI-GitHub%20Actions-2088FF?logo=github-actions)](https://github.com/features/actions)
 
 **A comprehensive, hands-on educational project for mastering GPU programming with CUDA and HIP**
 **GPU Programming 101** is a complete educational resource for learning modern GPU programming. This project provides:
 
 - **9 comprehensive modules** covering beginner to expert topics
-- **70+ working code examples** in both CUDA and HIP
+- **71 working code examples** in both CUDA and HIP
 - **Cross-platform support** for NVIDIA and AMD GPUs
 - **Production-ready development environment** with Docker
 - **Professional tooling** including profilers, debuggers, and CI/CD
@@ -197,10 +197,11 @@ This architectural knowledge is essential for writing efficient GPU code and is
 |---------|-------------|
 | 🎯 **Complete Curriculum** | 9 progressive modules from basics to advanced topics |
 | 💻 **Cross-Platform** | Full CUDA and HIP support for NVIDIA and AMD GPUs |
-| 🐳 **Docker Ready** | Complete containerized development environment |
-| 🔧 **Production Quality** | Professional build systems, testing, and profiling |
+| 🐳 **Docker Ready** | Complete containerized development environment with CUDA 12.9.1 & ROCm 7.0 |
+| 🔧 **Production Quality** | Professional build systems, auto-detection, testing, and profiling |
 | 📊 **Performance Focus** | Optimization techniques and benchmarking throughout |
 | 🌐 **Community Driven** | Open source with comprehensive contribution guidelines |
+| 🧪 **Advanced Libraries** | Support for Thrust, MIOpen, and production ML frameworks |
 
 ## 🚀 Quick Start
 
@@ -217,14 +218,14 @@ cd gpu-programming-101
 
 # Inside container: verify GPU access and start learning
 /workspace/test-gpu.sh
-cd modules/module1 && make && ./01_vector_addition_cuda
+cd modules/module1 && make && ./build/01_vector_addition_cuda
 ```
 
 ### Option 2: Native Installation
 For direct system installation:
 
 ```bash
-# Prerequisites: CUDA 11.0+ or ROCm 5.0+, GCC 7+, Make
+# Prerequisites: CUDA 12.0+ or ROCm 7.0+, GCC 9+, Make
 
 # Clone and build
 git clone https://github.com/AIComputing101/gpu-programming-101.git
@@ -265,7 +266,7 @@ Our comprehensive curriculum progresses from fundamental concepts to production-
 | [**Module 8**](modules/module8/) | 🚀 Expert | 10-12h | **Domain Applications** | ML, Scientific Computing | 4 |
 | [**Module 9**](modules/module9/) | 🚀 Expert | 6-8h | **Production Deployment** | Libraries, Integration, Scaling | 4 |
 
-**📈 Progressive Learning Path: 70+ Examples • 50+ Hours • Beginner to Expert**
+**📈 Progressive Learning Path: 71 Examples • 50+ Hours • Beginner to Expert**
 
 ### Learning Progression
 
@@ -313,7 +314,7 @@ Module 5: Performance Tuning
 ### Software Requirements
 
 #### Operating System Support
-- **Linux** (Recommended): Ubuntu 22.04 LTS, RHEL 8/9, SLES 15 SP5
+- **Linux** (Recommended): Ubuntu 22.04/24.04 LTS, RHEL 8/9, SLES 15 SP5
 - **Windows**: Windows 10/11 with WSL2 recommended for optimal compatibility
 - **macOS**: macOS 12+ (Metal Performance Shaders for basic GPU compute)
 
@@ -322,7 +323,7 @@ Module 5: Performance Tuning
   - **Driver Requirements**:
     - Linux: 550.54.14+ for CUDA 12.4+
     - Windows: 551.61+ for CUDA 12.4+
-- **ROCm Platform**: 6.0+ (Docker uses ROCm 6.4.3)
+- **ROCm Platform**: 7.0+ (Docker uses ROCm 7.0)
   - **Driver Requirements**: Latest AMDGPU-PRO or open-source AMDGPU drivers
   - **Kernel Support**: Linux kernel 5.4+ recommended
 
@@ -338,6 +339,8 @@ Module 5: Performance Tuning
 - **Profiling**: Nsight Compute, Nsight Systems (NVIDIA), rocprof (AMD)
 - **Debugging**: cuda-gdb, rocgdb, compute-sanitizer
 - **Libraries**: cuBLAS, cuFFT, rocBLAS, rocFFT (for advanced modules)
+- **ML Libraries**: Thrust (NVIDIA), MIOpen (AMD) for deep learning applications
+- **System Management**: NVML (NVIDIA), ROCm SMI (AMD) for hardware monitoring
 
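The hardware-monitoring entry above names NVML and ROCm SMI but does not show how they are used. As a rough sketch only: NVML is a C library, and the `nvidia-smi` CLI is built on top of it, so a shell script can reach the same data through that CLI, while `rocm-smi` is the command-line front end on the AMD side. The function name `gpu_monitor_command` is hypothetical and not part of this repository, and the chosen query fields are just one reasonable selection.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): print a monitoring command
# for whichever vendor CLI is found on PATH. Returns 1 if neither is present.
gpu_monitor_command() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # nvidia-smi is the CLI front end to NVML.
        echo "nvidia-smi --query-gpu=name,temperature.gpu,utilization.gpu --format=csv,noheader"
    elif command -v rocm-smi >/dev/null 2>&1; then
        # rocm-smi reports temperature and utilization for AMD GPUs.
        echo "rocm-smi --showtemp --showuse"
    else
        return 1
    fi
}

# Usage: run the selected command, or report that no tool is installed.
if cmd=$(gpu_monitor_command); then
    echo "Would run: $cmd"
else
    echo "No supported GPU monitoring CLI found" >&2
fi
```

The sketch only prints the command it would run, so it is safe to try on a machine without a GPU.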
 ### Performance Expectations by Hardware Tier
 
@@ -381,28 +384,42 @@ Experience the full development environment with zero setup:
 - 📦 Isolated and reproducible builds
 - 🧹 Easy cleanup when done
 
+**Container Specifications:**
+- **CUDA**: NVIDIA CUDA 12.9.1 on Ubuntu 22.04
+- **ROCm**: AMD ROCm 7.0 on Ubuntu 24.04
+- **Libraries**: Production-ready toolchains with debugging support
+
 **[📖 Complete Docker Guide →](docker/README.md)**
 
 ## 🔧 Build System
 
+Our advanced build system features automatic GPU vendor detection and optimized configurations:
+
 ### Project-Wide Commands
 ```bash
-make all           # Build all modules
+make all           # Build all modules with auto-detection
 make test          # Run comprehensive tests
 make clean         # Clean all artifacts
-make check-system  # Verify GPU setup
+make check-system  # Verify GPU setup and dependencies
 make status        # Show module completion status
 ```
 
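`make check-system` is described as verifying GPU setup and dependencies, but its implementation is not part of this diff. One piece such a check might plausibly include is a comparison against the stated minimums (CUDA 12.0+, ROCm 7.0+, GCC 9+). The sketch below assumes a `sort` that supports `-V` (GNU coreutils does); the helper name `version_ge` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): succeed when version $1 is
# greater than or equal to version $2. Relies on `sort -V` (version sort).
version_ge() {
    # After a version sort, the smaller version comes first; $1 >= $2 holds
    # exactly when that first line is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Usage: compare an installed GCC version against the GCC 9+ prerequisite.
if command -v gcc >/dev/null 2>&1; then
    gcc_version=$(gcc -dumpversion)
    if version_ge "$gcc_version" 9; then
        echo "GCC $gcc_version meets the GCC 9+ prerequisite"
    else
        echo "GCC $gcc_version is older than the required GCC 9" >&2
    fi
fi
```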
 ### Module-Specific Commands
 ```bash
 cd modules/module1/examples
-make          # Build all examples in module
+make          # Build all examples with vendor auto-detection
 make test     # Run module tests
 make profile  # Performance profiling
 make debug    # Debug builds with extra checks
 ```
 
+### Advanced Build Features
+- **Automatic GPU Detection**: Detects NVIDIA/AMD hardware and builds accordingly
+- **Production Optimization**: `-O3`, fast math, architecture-specific optimizations
+- **Debug Support**: Full debugging symbols and validation checks
+- **Library Management**: Automatic detection of optional dependencies (NVML, MIOpen)
+- **Cross-Platform**: Single Makefile supports both CUDA and HIP builds
+
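The diff does not show how the Makefile detects the GPU vendor. One common approach, sketched here purely as an assumption, is to check which compiler driver is on `PATH`: `nvcc` for CUDA and `hipcc` for HIP. The function name `detect_gpu_vendor` and the preference for NVIDIA when both toolchains are installed are illustrative choices, not the project's documented behavior.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the repository's Makefile logic): report which GPU
# toolchain is available by looking for its compiler driver on PATH.
detect_gpu_vendor() {
    if command -v nvcc >/dev/null 2>&1; then
        echo "nvidia"   # CUDA compiler driver found
    elif command -v hipcc >/dev/null 2>&1; then
        echo "amd"      # HIP compiler driver found
    else
        echo "none"     # neither toolchain is installed
    fi
}

# Usage: a build script could branch on the result to pick CUDA or HIP targets.
echo "Detected GPU toolchain: $(detect_gpu_vendor)"
```

Checking for the compiler rather than the hardware keeps the sketch usable inside containers where the toolchain is installed but device nodes may not be mapped.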
 ## Performance Expectations
 
 | Module Level | Typical GPU Speedup | Memory Efficiency | Code Quality |