
Commit deca7e1: "Update daint infos" (1 parent: 856960b)

1 file changed (+73, -57 lines)

website/software_install.md

Lines changed: 73 additions & 57 deletions
@@ -193,7 +193,7 @@ which will launch Julia with as many threads as there are cores on your machine
### Julia on GPUs
The [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) package allows launching compute kernels on Nvidia GPUs natively from within Julia. [JuliaGPU](https://juliagpu.org) provides further reading and [introductory material](https://juliagpu.gitlab.io/CUDA.jl/tutorials/introduction/) about GPU ecosystems within Julia.
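To illustrate what "launching compute kernels natively" means, here is a minimal, hypothetical sketch of writing and launching a custom kernel with CUDA.jl; it is not part of the course material, and the kernel name `axpy!` is made up for illustration:
```julia
# hypothetical sketch: a custom element-wise kernel launched with CUDA.jl
using CUDA

function axpy!(y, a, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x  # global thread index
    if i <= length(y)
        @inbounds y[i] += a * x[i]
    end
    return nothing
end

x = CUDA.rand(Float32, 2^10)
y = CUDA.zeros(Float32, 2^10)
@cuda threads=256 blocks=cld(length(y), 256) axpy!(y, 2.0f0, x)  # launch on the GPU
synchronize()
```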

-<!--
+
### Julia MPI
The following steps permit you to install [MPI.jl](https://github.com/JuliaParallel/MPI.jl) on your machine and test it:
1. If Julia MPI is a dependency of a Julia project, MPI.jl should have been added upon executing the `instantiate` command from within the package manager ([see here](#package_manager)). If not, MPI.jl can be added from within the package manager (typing `add MPI` in package mode).
@@ -229,10 +229,10 @@ and add `-host localhost` to the execution script:
```sh
$ mpiexecjl -n 4 -host localhost julia --project ./hello_mpi.jl
```
-} -->
+}
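The diff does not show `hello_mpi.jl` itself; a minimal sketch of what such a test script could contain, assuming only standard MPI.jl calls, is:
```julia
# sketch of a minimal MPI test script (the actual hello_mpi.jl is not shown in this diff)
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
println("Hello from rank $(MPI.Comm_rank(comm)) out of $(MPI.Comm_size(comm)) ranks")
MPI.Barrier(comm)
MPI.Finalize()
```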

-<!--
-For running Julia at scale on Piz Daint, refer to the [Julia MPI GPU on Piz Daint](#julia_mpi_gpu_on_piz_daint) section.
+
+<!-- For running Julia at scale on Piz Daint, refer to the [Julia MPI GPU on Piz Daint](#julia_mpi_gpu_on_piz_daint) section. -->

## GPU computing on Piz Daint

@@ -261,37 +261,23 @@ ssh <username>@ela.cscs.ch
3. Generate an `ed25519` keypair as described in the [CSCS user website](https://user.cscs.ch/access/auth/#generating-ssh-keys-if-not-required-to-provide-a-2nd-factor). On your local machine (not ela), do `ssh-keygen`, leaving the passphrase empty. Then copy your public key to the remote server (ela) using `ssh-copy-id`. Alternatively, you can copy the keys manually as described in the [CSCS user website](https://user.cscs.ch/access/auth/#generating-ssh-keys-if-not-required-to-provide-a-2nd-factor).
```sh
ssh-keygen -t ed25519
-ssh-copy-id <username>@ela.cscs.ch
ssh-copy-id -i ~/.ssh/id_ed25519.pub <username>@ela.cscs.ch
```

4. Edit your ssh config file located in `~/.ssh/config` and add the following entries to it, making sure to replace `<username>` and the key file with the correct names, if needed:
```sh
-Host ela
-HostName ela.cscs.ch
-User <username>
-IdentityFile ~/.ssh/id_ed25519
-
-Host daint
+Host daint-xc
HostName daint.cscs.ch
User <username>
IdentityFile ~/.ssh/id_ed25519
-ProxyJump ela
-ForwardAgent yes
-RequestTTY yes
-
-Host nid*
-HostName %h
-User <username>
-IdentityFile ~/.ssh/id_ed25519
-ProxyJump daint
+ProxyJump <username>@ela.cscs.ch
+AddKeysToAgent yes
ForwardAgent yes
-RequestTTY yes
```

5. Now you should be able to perform a password-less login to daint as follows:
```sh
-ssh daint
+ssh daint-xc
```
Moreover, you will get the Julia-related modules loaded as we add the `RemoteCommand`

@@ -305,21 +291,18 @@ ln -s $SCRATCH scratch
```
}

-\warn{There is interactive visualisation on daint. Make sure to produce `png` or `gifs`. Also to avoid plotting to fail, make sure to set the following `ENV["GKSwstype"]="nul"` in the code. Also, it may be good practice to define the animation directory to avoid filling a `tmp`, such as
-```julia
-ENV["GKSwstype"]="nul"
-if isdir("viz_out")==false mkdir("viz_out") end
-loadpath = "./viz_out/"; anim = Animation(loadpath,String[])
-println("Animation directory: $(anim.dir)")
-```
-}
+Make sure to remove any folders you may find in your scratch, as those are empty leftovers from last year's course.
+
+### Setting up Julia on Piz Daint
+
+The Julia setup on Piz Daint is handled by [JUHPC](https://github.com/JuliaParallel/JUHPC). Everything should be ready for use; the only step required is to activate the environment each time before launching Julia. In addition, **only the first time**, `juliaup` needs to be installed (these steps are explained hereafter).

### Running Julia interactively on Piz Daint
-So now, how do we actually run some GPU Julia code on Piz Daint?
+To access a GPU on Piz Daint, proceed as follows.

1. Open a terminal (other than from within VS code) and login to daint:
```sh
-ssh daint
+ssh daint-xc
```

2. The next step is to secure an allocation using `salloc`, a functionality provided by the SLURM scheduler. Use the `salloc` command to allocate one node (`-N1`) and one process (`-n1`) on the GPU partition (`-C'gpu'`) on the project `class04` for 1 hour:
@@ -331,45 +314,68 @@ salloc -C'gpu' -Aclass04 -N1 -n1 --time=01:00:00

👉 *Running **remote job** instead? [Jump right there](#running_a_remote_job_on_piz_daint)*

-3. Make sure to remember the **node number** returned upon successful allocation, e.g., `salloc: Nodes nid02145 are ready for job`
-
-4. Once you have your allocation (`salloc`) and the node (here `nid02145`), you can access the compute node by using the following `srun` command followed by loading the required modules:
+3. Once you have your allocation (`salloc`) and the node, you can access the compute node by using the following `srun` command:
```sh
srun -n1 --pty /bin/bash -l
-module load daint-gpu Julia/1.9.3-CrayGNU-21.09-cuda
```

-- In the command bar of VS code (`cmd + shift + P` on macOS, `ctrl + shift + P` on Windows), type `Remote-SSH: Connect to Host...`. Accept what should be accepted and continue. Then type in the node and id (node number) as from the previous step (here `nid02145`). Upon hitting enter, you should be on the node with the Julia environment loaded.
+4. Then, to "activate" the Julia configuration previously prepared, enter the following (do not miss the leading dot `.`):
+```sh
+. $SCRATCH/../julia/daint-gpu-nocudaaware/activate
+```
+This activates the artifact-based configuration for CUDA.jl, which works more smoothly on the rather old Nvidia P100 GPUs. The caveat is that it does not allow for CUDA-aware MPI (a quick check from Julia is sketched right after the step list below). A CUDA-aware `daint-gpu` configuration also exists and could be tried out at a later stage, but it may not be entirely stable.

-5. You should then be able to launch Julia
+5. Then, **only the first time**, you need to install Julia using the [`juliaup`](https://github.com/JuliaLang/juliaup) command:
```sh
-julia
+juliaup
```
+This will install the latest Julia release, with JUHPC calling into `juliaup` under the hood.

-#### :eyes: ONLY the first time
-1. Assuming you are on a node and launched Julia. To finalise your install, enter the package manager and query status `] st` and `add CUDA@v4`.
+6. Next, go to your scratch and create a temporary test directory:
+```sh
+cd $SCRATCH
+mkdir tmp-test
+cd tmp-test
+touch Project.toml
+```

-\warn{Because some driver discovery compatibility issues, you need to add specifically version 4 of CUDA.jl, upon typing `add CUDA@v4` in the package mode.}
+7. You should then be able to launch Julia in the `tmp-test` project environment:
+```sh
+julia --project=.
+```

+8. Within Julia, enter the package mode, check the status, and add any packages you'd like to be part of `tmp-test`. Let's add `CUDA` and `MPI` here, as these two packages will be used most throughout the course.
```julia-repl
-(@1.9-daint-gpu) pkg> st
-Installing known registries into `/scratch/snx3000/class230/../julia/class230/daint-gpu`
-Status `/scratch/snx3000/julia/class230/daint-gpu/environments/1.9-daint-gpu/Project.toml` (empty project)
+julia> ]
+
+(tmp-test) pkg> st
+Installing known registries into `/scratch/snx3000/class230/../julia/class230/daint-gpu-nocudaaware/juliaup/depot`
+Added `General` registry to /scratch/snx3000/class230/../julia/class230/daint-gpu-nocudaaware/juliaup/depot/registries
+Status `/scratch/snx3000/class230/tmp-test/Project.toml` (empty project)

-(@1.9-daint-gpu) pkg> add CUDA@v4
+(tmp-test) pkg> add CUDA, MPI
```

-2. Then load it and query version info
+9. Then load it and query version info:
```julia-repl
julia> using CUDA

julia> CUDA.versioninfo()
-CUDA runtime 11.0, local installation
-CUDA driver 12.1
-NVIDIA driver 470.57.2, originally for CUDA 11.4
+CUDA runtime 11.8, artifact installation
+CUDA driver 12.6
+NVIDIA driver 470.57.2
+
+#[skipped lines]
+
+Preferences:
+- CUDA_Runtime_jll.version: 11.8
+- CUDA_Runtime_jll.local: false
+
+1 device:
+  0: Tesla P100-PCIE-16GB (sm_60, 15.897 GiB / 15.899 GiB available)
```

-3. Try out your first calculation on the P100 GPU
+10. Try out your first calculation on the P100 GPU
```julia-repl
julia> a = CUDA.ones(3,4);
@@ -382,11 +388,20 @@ julia> c .= a .+ b

If you made it this far, you're all set 🚀
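As mentioned in step 4, the `daint-gpu-nocudaaware` configuration does not provide CUDA-aware MPI. A quick way to verify this from Julia, sketched here under the assumption that MPI.jl is installed as in step 8, is:
```julia
# sketch: check whether the MPI library in use reports CUDA (GPU-aware) support
using MPI
MPI.Init()
@show MPI.has_cuda()   # expected to report false with the no-CUDA-aware config
MPI.Finalize()
```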

+\warn{There is no interactive visualisation on daint. Make sure to produce `png`s or `gif`s. Also, to avoid plotting failures, make sure to set `ENV["GKSwstype"]="nul"` in the code. In addition, it is good practice to define the animation directory explicitly to avoid filling up a `tmp` location, e.g.:
+```julia
+ENV["GKSwstype"]="nul"
+if isdir("viz_out")==false mkdir("viz_out") end
+loadpath = "./viz_out/"; anim = Animation(loadpath,String[])
+println("Animation directory: $(anim.dir)")
+```
+}
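To complete the picture, here is a short sketch of how frames could then be appended and the animation written to disk; it assumes Plots.jl and the `anim`/`loadpath` variables from the snippet above, and the loop body is a placeholder only:
```julia
# sketch: append frames to `anim` and write a gif (assumes Plots.jl and the setup above)
using Plots
for it in 1:10
    heatmap(rand(32, 32); title="step $it")  # placeholder plot, stands in for your simulation output
    frame(anim)                              # capture the current plot as a frame
end
gif(anim, joinpath(loadpath, "result.gif"); fps=5)
```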
+
#### Monitoring GPU usage
You can use the `nvidia-smi` command to monitor GPU usage on a compute node on daint. Just type it in the terminal or in Julia's REPL (in shell mode):
```julia-repl
shell> nvidia-smi
-Tue Oct 24 18:42:45 2023
+Fri Oct 25 22:32:26 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
@@ -395,7 +410,7 @@ Tue Oct 24 18:42:45 2023
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:02:00.0 Off |                    0 |
-| N/A   21C    P0    25W / 250W |      2MiB / 16280MiB |      0%   E. Process |
+| N/A   24C    P0    25W / 250W |      0MiB / 16280MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
@@ -408,7 +423,7 @@ Tue Oct 24 18:42:45 2023
+-----------------------------------------------------------------------------+
```

-\note{You can also use VS code's integrated terminal to launch Julia on daint. However, you can't use the Julia extension nor the direct node login and would have to use `srun -n1 --pty /bin/bash -l` and load the needed modules, namely `module load daint-gpu Julia/1.9.3-CrayGNU-21.09-cuda`.}
+\note{You can also use VS code's integrated terminal to launch Julia on daint. However, you can't use the Julia extension and would have to use `srun -n1 --pty /bin/bash -l` and activate the environment.}

### Running a remote job on Piz Daint
If you do not want to use an interactive session, you can use the `sbatch` command to launch a job remotely on the machine. Here is an example of a `submit.sh` that you can launch (without the need of an allocation) as `sbatch submit.sh`:
@@ -424,10 +439,10 @@ If you do not want to use an interactive session you can use the `sbatch` comman
#SBATCH --constraint=gpu
#SBATCH --account class04

-module load daint-gpu
-module load Julia/1.9.3-CrayGNU-21.09-cuda
+# activate julia env
+. $SCRATCH/../julia/daint-gpu-nocudaaware/activate

-srun julia -O3 <my_julia_gpu_script.jl>
+srun julia <my_julia_gpu_script.jl>
```

### JupyterLab access on Piz Daint
@@ -463,6 +478,7 @@ fusermount -u -z /home/$USER/mnt_daint
```
For convenience, it is suggested to also symlink to the home directory: `ln -s ~/mnt/daint/users/<your username on daint> ~/mnt/daint_home`. (Note that we mount the root directory `/` with `sshfs` such that access to `/scratch` is possible.)

+<!--
### Julia MPI GPU on Piz Daint
The following steps should allow you to run a distributed-memory parallel application on multiple GPU nodes on Piz Daint.
1. Make sure to have the Julia GPU environment loaded
