Welcome to the repository for the graded projects of the Parallel and High Performance Computing course at EPFL, Spring 2025. This repository contains all code, reports, and results for the course assignments.
This repository showcases solutions to various parallel programming and high performance computing problems, including:
- Parallelization using MPI and CUDA
- Performance optimization and profiling
- Theoretical analysis of run time
- Scalability analysis
- Real-world scientific computing applications
This was the main project for the course. The full report is available here. All the MPI code is located here, while all the CUDA code is available here.
- Objective: Parallelize the solution of shallow water equations using MPI and CUDA to enhance computational performance.
- Theoretical Analysis: Computational cost analysis of grid initialization and time-step calculations.
- Domain Subdivision: Divided the 2D grid into subgrids for each processor to handle computations locally.
- Communication: Utilized MPI's Cartesian communicator for efficient neighbor communication and halo updates (see the sketch after this list).
- Performance Metrics:
  - Strong Scaling: Demonstrated significant speedup with increasing processor counts, initially following Amdahl's law but deviating due to communication overhead.
  - Weak Scaling: Showed efficiency and speedup trends with constant work per processor, highlighting growing communication costs as the processor count increased.
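For illustration, the snippet below is a minimal sketch, not the project's exact code, of how a 2D Cartesian communicator and a single halo exchange can be set up. `MPI_Dims_create`, `MPI_Cart_create`, `MPI_Cart_shift`, and `MPI_Sendrecv` are standard MPI calls; the local grid size and variable names are purely illustrative.

```cpp
// Minimal sketch of a 2D Cartesian communicator with one halo exchange.
// Not the project's actual code; sizes and names are illustrative.
#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Let MPI choose a balanced 2D process grid.
    int dims[2] = {0, 0};
    MPI_Dims_create(size, 2, dims);

    // Non-periodic Cartesian communicator over the process grid.
    int periods[2] = {0, 0};
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

    // Ranks of the four neighbors (MPI_PROC_NULL at domain boundaries).
    int north, south, west, east;
    MPI_Cart_shift(cart, 0, 1, &north, &south);
    MPI_Cart_shift(cart, 1, 1, &west, &east);

    // Example halo exchange for one contiguous boundary row:
    // send our top interior row north, receive our bottom halo row from south.
    const int nx = 128;  // illustrative local subgrid width
    std::vector<double> send_row(nx, rank), recv_row(nx, 0.0);
    MPI_Sendrecv(send_row.data(), nx, MPI_DOUBLE, north, 0,
                 recv_row.data(), nx, MPI_DOUBLE, south, 0,
                 cart, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
```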
- Conceptual Similarity: As with the MPI domain subdivision, the grid was divided into blocks, with computation parallelized within each block.
- Optimizations:
  - Parallelized the vast majority of the computational functions.
  - Implemented a two-kernel approach for computing the global maximum, inspired by NVIDIA's reduction techniques (see the sketch after this list).
- Performance Results: Illustrated significant time reduction with increasing threads per block, eventually plateauing due to serial bottlenecks.
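The sketch below shows the spirit of the two-kernel maximum reduction: a first kernel reduces each block to a partial maximum in shared memory, and a second launch reduces those partials to a single value. It is a simplified illustration, not the report's actual kernels; it assumes a power-of-two block size and that the number of blocks does not exceed the block size.

```cuda
// Simplified two-pass maximum reduction in the spirit of NVIDIA's
// shared-memory reduction examples; not the project's exact kernels.
#include <cfloat>

__global__ void block_max(const double *in, double *block_out, int n) {
    extern __shared__ double sdata[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Each thread loads one element (or -DBL_MAX past the end of the array).
    sdata[tid] = (i < n) ? in[i] : -DBL_MAX;
    __syncthreads();

    // Tree reduction in shared memory (blockDim.x assumed a power of two).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] = fmax(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }

    // Thread 0 writes this block's partial maximum.
    if (tid == 0) block_out[blockIdx.x] = sdata[0];
}

// Host-side sketch: the first pass produces one partial maximum per block,
// the second pass reduces those partials with a single block
// (assumes blocks <= threads; otherwise iterate).
void global_max(const double *d_in, double *d_partial, double *d_out,
                int n, int threads) {
    int blocks = (n + threads - 1) / threads;
    block_max<<<blocks, threads, threads * sizeof(double)>>>(d_in, d_partial, n);
    block_max<<<1, threads, threads * sizeof(double)>>>(d_partial, d_out, blocks);
}
```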
In addition to the final project, two smaller projects were completed for hands-on learning of MPI and CUDA.
This project focuses on parallelizing the conjugate gradient method, specifically targeting the matrix-vector multiplication function. Using MPI, the matrix indices were divided among processors to distribute the workload. Evaluation showed a notable speedup with up to 25 processors, beyond which a plateau was observed. Scaling analysis, applying Amdahl's and Gustafson's laws, revealed that memory and communication bottlenecks limit scalability.
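As an illustration of this kind of distribution, here is a minimal sketch of a row-partitioned dense matrix-vector product with MPI; the project's exact partitioning and data layout may differ, and the function and variable names are hypothetical.

```cpp
// Minimal sketch of a row-partitioned dense matrix-vector product with MPI.
// The project's actual partitioning and data layout may differ.
#include <mpi.h>
#include <vector>

// Each rank owns `local_rows` consecutive rows of A (row-major) and computes
// its slice of y = A * x; an Allgather then rebuilds the full result vector.
void parallel_matvec(const std::vector<double> &A_local,  // local_rows x n
                     const std::vector<double> &x,        // full vector, length n
                     std::vector<double> &y,              // full result, length n
                     int local_rows, int n, MPI_Comm comm) {
    std::vector<double> y_local(local_rows, 0.0);
    for (int i = 0; i < local_rows; ++i)
        for (int j = 0; j < n; ++j)
            y_local[i] += A_local[i * n + j] * x[j];

    // Assumes every rank owns the same number of rows; otherwise use
    // MPI_Allgatherv with per-rank counts and displacements.
    MPI_Allgather(y_local.data(), local_rows, MPI_DOUBLE,
                  y.data(), local_rows, MPI_DOUBLE, comm);
}
```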
The full report is available here and all the code is located here.
This project focuses on transitioning from CBLAS to CUDA to parallelize linear algebra functions on GPUs. The primary goal was to create CUDA kernels for these functions to enhance computational performance. Timing the conjugate gradient function with varying threads per block revealed that performance improved initially but plateaued after 8 threads per block and degraded significantly beyond 256 threads, likely due to memory bottlenecks.
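As a simple illustration of this CBLAS-to-CUDA transition, the sketch below shows how a CBLAS-style `daxpy` (y = alpha*x + y) could be replaced by a CUDA kernel with a configurable number of threads per block. It is not the project's actual kernel; the wrapper name and launch configuration are illustrative.

```cuda
// Illustrative CUDA replacement for a CBLAS-style daxpy (y = alpha*x + y);
// the project's actual kernels and launch configuration may differ.
__global__ void daxpy_kernel(int n, double alpha, const double *x, double *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = alpha * x[i] + y[i];   // one element per thread
}

// Host wrapper mirroring the cblas_daxpy call it replaces (unit strides,
// device pointers d_x and d_y assumed already allocated and filled).
void daxpy(int n, double alpha, const double *d_x, double *d_y,
           int threads_per_block) {
    int blocks = (n + threads_per_block - 1) / threads_per_block;
    daxpy_kernel<<<blocks, threads_per_block>>>(n, alpha, d_x, d_y);
    cudaDeviceSynchronize();                 // wait for the kernel to finish
}
```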
The full report is available here and all the code is located here.
Each project has a Makefile to compile the code. Running `make` in the corresponding directory will compile the code. Below are the links to the Makefiles.
To run the code and the Python visualizations, SLURM job files are available for each project. They were created so that the projects could be run on SLURM clusters; nevertheless, the same instructions can be run locally. Below are the links to the SLURM job files.
This project is licensed under the MIT License. See LICENSE for details.