Commit ae31357

Fixes to GPU routines, improved tests and results
1 parent ad873e6 commit ae31357

File tree

13 files changed

+193
-43
lines changed

README.md

Lines changed: 139 additions & 1 deletion
Large diffs are not rendered by default.

experiments/config.yaml

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 device: cpu
-size: 512
+size: 4096
 function:
-routine: matmul_numba_block_serial
-block_size: 24
+routine: matmul_numba_serial
+block_size: 32
 print: False
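
The `routine` names in this config suggest a switch from a blocked serial matrix product to a plain serial one, with `block_size` controlling the tile edge. The repository's Numba routines are not shown in this diff, so the following is only a minimal pure-Python sketch of the two loop structures. The function names `matmul_serial` and `matmul_block_serial` are hypothetical stand-ins, not the project's code.

```python
def matmul_serial(A, B):
    """Plain triple-loop product of two matrices given as lists of rows."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a_ik = A[i][k]
            for j in range(p):
                C[i][j] += a_ik * B[k][j]
    return C


def matmul_block_serial(A, B, block_size):
    """Same product, but iterating over block_size x block_size tiles.

    Hypothetical illustration of what a 'block_size' setting could control;
    tiles at the matrix edge are clipped with min().
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block_size):
        for kk in range(0, m, block_size):
            for jj in range(0, p, block_size):
                for i in range(ii, min(ii + block_size, n)):
                    for k in range(kk, min(kk + block_size, m)):
                        a_ik = A[i][k]
                        for j in range(jj, min(jj + block_size, p)):
                            C[i][j] += a_ik * B[k][j]
    return C
```

Both functions compute the same result; blocking only reorders the work so that each tile is reused while it is likely still in cache.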

figures/pie_1node_CPU.png (56.9 KB)
figures/pie_1node_GPU.png (65.2 KB)
figures/pie_4nodes_CPU.png (45.8 KB)
figures/pie_4nodes_GPU.png (70.7 KB)
figures/scaling_nodes.png (30.7 KB)
figures/scaling_size.png (44.8 KB)
figures/speedup.png (68 KB)

scripts/run.py

Lines changed: 4 additions & 1 deletion
@@ -135,8 +135,10 @@ def main_gpu(params: dict):
 a_d = cuda.to_device(A)
 c_d = cuda.to_device(C)
 
+# each process at each step computes a block of C of size n_loc x ncols
+# we set parameters for the kernel accordingly
 nthreads = bs
-blocks_per_grid = ((n_loc + nthreads-1)//nthreads,(SIZE + nthreads-1)//nthreads)
+blocks_per_grid = ((n_loc + nthreads-1)//nthreads,(ncols + nthreads-1)//nthreads)
 threads_per_block = (nthreads, nthreads)
 
 t_tot = 0
@@ -150,6 +152,7 @@ def main_gpu(params: dict):
 
 B_block = np.empty((n_loc,ncols), dtype=np.float64)
 B_col = np.empty((SIZE,ncols), dtype=np.float64)
+blocks_per_grid = ((n_loc + nthreads-1)//nthreads,(ncols + nthreads-1)//nthreads)
 
 # create a contiguous block from B to communicate
 create_block(B, B_block, start, ncols)
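
The change to `blocks_per_grid` sizes the launch grid for the `n_loc x ncols` tile of C that each process computes per step, rather than for the full matrix width `SIZE`. A minimal sketch of the ceiling-division arithmetic follows. The helper `launch_grid` and the numeric values are hypothetical, chosen only for illustration, and the kernel itself is not shown in this diff.

```python
from typing import Tuple


def launch_grid(rows: int, cols: int, nthreads: int) -> Tuple[int, int]:
    """Blocks per grid needed to cover a rows x cols output tile with
    square thread blocks of nthreads x nthreads (ceiling division)."""
    return ((rows + nthreads - 1) // nthreads,
            (cols + nthreads - 1) // nthreads)


# Hypothetical values, not taken from the repository
SIZE, n_loc, ncols, nthreads = 4096, 1024, 1000, 32

old_grid = launch_grid(n_loc, SIZE, nthreads)   # sized for the full matrix width
new_grid = launch_grid(n_loc, ncols, nthreads)  # sized for the tile actually computed

print(old_grid, new_grid)
```

With these values the old expression would launch 128 blocks along the second axis where 32 suffice, so most threads would map to columns outside the tile. The second hunk recomputes the grid next to the buffer allocations, presumably because `ncols` can differ between steps; that reading is an inference from the diff, not something it states.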
