Commit 0940fd9

Documentation for CANDIDE (#282)
* Updated documentation for CANDIDE
* fixed typos
* updated candide contact info
* fixed typos
* added bash formatting for script examples
* installation check fix
* test SPENV in bash scripts
* updated config file paths
* added status notes
* added LD_LIBRARY_PATH info
* fixed formatting
* fixed formatting
* added cnodes command info
1 parent 2108224 commit 0940fd9

10 files changed

+259
-8919
lines changed

docs/wiki/candide.md

Lines changed: 221 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,3 +1,223 @@
11
[Home](./shapepipe.md) | [Environments](./environment.md)
22

3-
# Candide Set Up
3+
# CANDIDE Set Up
4+
5+
> Environment Status Notes
6+
> - Website: https://candideusers.calet.org/
7+
> - No internet access on compute nodes, see [tutorial](https://github.com/CosmoStat/shapepipe/blob/master/docs/wiki/tutorial/pipeline_tutorial.md#mask-images) for how to manage `mask_runner`
8+
> - Current stable OpenMPI version: `4.0.2`
9+
10+
## Contents
11+
12+
1. [Introduction](#Introduction)
13+
1. [Installation](#Installation)
14+
1. [Execution](#Execution)
15+
1. [Troubleshooting](#Troubleshooting)
16+
17+
## Introduction
18+
19+
The [CANDIDE cluster](https://candideusers.calet.org/) is hosted and maintained at the Institut d’Astrophysique de Paris by Stephane Rouberol.
20+
21+
### CANDIDE Account
22+
23+
To request an account on CANDIDE, send an email to [Henry Joy McCracken](mailto:hjmcc@iap.fr) and [Stephane Rouberol](mailto:rouberol@iap.fr) at IAP with a short description of what you want to do and with whom you work.
24+
25+
### SSH
26+
27+
Once you have an account on CANDIDE you can connect via SSH as follows:
28+
29+
```bash
30+
$ ping -c 1 -s 999 candide.iap.fr; ssh <mylogin>@candide.iap.fr
31+
```
32+
33+
## Installation
34+
35+
The CANDIDE system uses [Environment Modules](https://modules.readthedocs.io/en/latest/) to manage various software packages. You can view the modules currently available on the system by running:
36+
37+
```bash
38+
$ module avail
39+
```
40+
41+
ShapePipe requires `conda`, which on CANDIDE is provided via `intelpython/3`. To load this package simply run:
42+
43+
```bash
44+
$ module load intelpython/3
45+
```
46+
47+
> You can add this command to your `.bash_profile` to ensure that this module is available when you log in.
48+
49+
You can list the modules already loaded by running:
50+
51+
```bash
52+
$ module list
53+
```
54+
55+
### With MPI
56+
57+
To install ShapePipe with MPI enabled on CANDIDE you also need to load the `openmpi` module. To do so run:
58+
59+
```bash
60+
$ module load openmpi
61+
```
62+
63+
You can also load a specific version of OpenMPI:
64+
65+
```bash
66+
$ module load openmpi/<VERSION>
67+
```
68+
69+
Then you need to identify the root directory of the OpenMPI installation. An easy way to get this information is by running:
70+
71+
```bash
72+
$ module show openmpi
73+
```
74+
75+
which should reveal something like `/softs/openmpi/<VERSION>-torque-CentOS7`. Provide this path to the `mpi-root` option of the installation script as follows:
76+
77+
```bash
78+
$ ./install_shapepipe --mpi-root=/softs/openmpi/<VERSION>-torque-CentOS7
79+
```
80+
81+
> Be sure to check the output of the **Installing MPI** section, as the final check only tests if the `mpiexec` command is available on the system.
82+
83+
You can rebuild the MPI component at any time by doing the following:
84+
85+
```bash
86+
$ pip uninstall mpi4py
87+
$ ./install_shapepipe --no-env --no-exe --mpi-root=/softs/openmpi/<VERSION>-torque-CentOS7
88+
```
89+
90+
### Without MPI
91+
92+
To install ShapePipe without MPI enabled simply pass the `no-mpi` option to the installation script as follows:
93+
94+
```bash
95+
$ ./install_shapepipe --no-mpi
96+
```
97+
98+
## Execution
99+
100+
CANDIDE uses [TORQUE](https://en.wikipedia.org/wiki/TORQUE) for handling distributed jobs.
101+
102+
TORQUE uses standard [Portable Batch System (PBS) commands](https://www.cqu.edu.au/eresearch/high-performance-computing/hpc-user-guides-and-faqs/pbs-commands) such as:
103+
104+
- `qsub` - To submit jobs to the queue.
105+
- `qstat` - To check on the status of jobs in the queue.
106+
- `qdel` - To kill jobs in the queue.
107+
108+
Additionally, the availability of compute nodes can be seen using the command:
109+
110+
```bash
111+
$ cnodes
112+
```
113+
114+
Jobs should be submitted as bash scripts, *e.g.*:
115+
116+
```bash
117+
$ qsub candide_smp.sh
118+
```
119+
120+
In this script you can specify:
121+
122+
- The number of nodes to use (*e.g.* `#PBS -l nodes=10`)
123+
- A specific machine to use with a given number of cores (*e.g.* `#PBS -l nodes=n04:ppn=10`)
124+
- The maximum computing time for your script (*e.g.* `#PBS -l walltime=10:00:00`)
125+
126+
### Example SMP Script
127+
128+
[`candide_smp.sh`](../../example/pbs/candide_smp.sh)
129+
130+
```bash
131+
#!/bin/bash
132+
133+
##########################
134+
# SMP Script for CANDIDE #
135+
##########################
136+
137+
# Receive email when job finishes or aborts
138+
#PBS -M <name>@cea.fr
139+
#PBS -m ea
140+
# Set a name for the job
141+
#PBS -N shapepipe_smp
142+
# Join output and errors in one file
143+
#PBS -j oe
144+
# Set maximum computing time (e.g. 5min)
145+
#PBS -l walltime=00:05:00
146+
# Request number of cores
147+
#PBS -l nodes=4
148+
149+
# Full path to environment
150+
export SPENV="$HOME/.conda/envs/shapepipe"
151+
export SPDIR="$HOME/shapepipe"
152+
153+
# Activate conda environment
154+
module load intelpython/3
155+
source activate $SPENV
156+
157+
# Run ShapePipe using full paths to executables
158+
$SPENV/bin/shapepipe_run -c $SPDIR/example/pbs/config_smp.ini
159+
160+
# Return exit code
161+
exit 0
162+
```
163+
164+
> Make sure the number of nodes requested matches the `SMP_BATCH_SIZE` in the config file.
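
One way to catch a mismatch before submitting is to compare the two values directly. The sketch below is not part of ShapePipe: the function name `check_batch_size` is hypothetical, and it assumes the job script uses the plain `#PBS -l nodes=<N>` form and the config file contains a `SMP_BATCH_SIZE = <N>` line.

```bash
#!/bin/bash

# Hypothetical helper (not part of ShapePipe): compare the node count
# requested in a PBS script with SMP_BATCH_SIZE in a config file.
# Assumes the forms "#PBS -l nodes=<N>" and "SMP_BATCH_SIZE = <N>".
check_batch_size() {
    local script="$1" config="$2"
    local nodes batch

    # Extract the integer after "nodes=" on the PBS resource line
    nodes=$(sed -n 's/^#PBS -l nodes=\([0-9][0-9]*\).*/\1/p' "$script" | head -n 1)
    # Extract the integer after "SMP_BATCH_SIZE ="
    batch=$(sed -n 's/^SMP_BATCH_SIZE[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$config" | head -n 1)

    if [ -z "$nodes" ] || [ -z "$batch" ]; then
        echo "could not read nodes or SMP_BATCH_SIZE" >&2
        return 2
    fi
    if [ "$nodes" -ne "$batch" ]; then
        echo "mismatch: nodes=$nodes, SMP_BATCH_SIZE=$batch" >&2
        return 1
    fi
    echo "ok: nodes=$nodes matches SMP_BATCH_SIZE=$batch"
}
```

With the function defined in the current shell, a submission could then be guarded with `check_batch_size candide_smp.sh config_smp.ini && qsub candide_smp.sh`.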
165+
166+
### Example MPI Script
167+
168+
[`candide_mpi.sh`](../../example/pbs/candide_mpi.sh)
169+
170+
```bash
171+
#!/bin/bash
172+
173+
##########################
174+
# MPI Script for CANDIDE #
175+
##########################
176+
177+
# Receive email when job finishes or aborts
178+
#PBS -M <name>@cea.fr
179+
#PBS -m ea
180+
# Set a name for the job
181+
#PBS -N shapepipe_mpi
182+
# Join output and errors in one file
183+
#PBS -j oe
184+
# Set maximum computing time (e.g. 5min)
185+
#PBS -l walltime=00:05:00
186+
# Request number of cores (e.g. 4 from 2 different machines)
187+
#PBS -l nodes=2:ppn=2
188+
# Allocate total number of cores to variable NSLOTS
189+
NSLOTS=`cat $PBS_NODEFILE | wc -l`
190+
191+
# Full path to environment
192+
export SPENV="$HOME/.conda/envs/shapepipe"
193+
export SPDIR="$HOME/shapepipe"
194+
195+
# Load modules and activate conda environment
196+
module load intelpython/3
197+
module load openmpi/4.0.2
198+
source activate $SPENV
199+
200+
# Run ShapePipe using full paths to executables
201+
$SPENV/bin/mpiexec -n $NSLOTS $SPENV/bin/shapepipe_run -c $SPDIR/example/pbs/config_mpi.ini
202+
203+
# Return exit code
204+
exit 0
205+
```
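
To see why `NSLOTS` ends up as the total core count: TORQUE lists each allocated host once per requested core in the file named by `$PBS_NODEFILE`, so `nodes=2:ppn=2` yields four lines. The sketch below simulates this outside a job using a temporary file. The host names `n04` and `n05` are placeholders, and `wc -l < file` is used here only as an equivalent way of counting the lines.

```bash
#!/bin/bash

# Simulate the node file TORQUE would create for "nodes=2:ppn=2":
# each host appears once per allocated core (host names are placeholders).
PBS_NODEFILE=$(mktemp)
printf '%s\n' n04 n04 n05 n05 > "$PBS_NODEFILE"

# Count the lines to obtain the total number of cores
NSLOTS=$(wc -l < "$PBS_NODEFILE")
# Normalise the value in case wc pads its output with spaces
NSLOTS=$((NSLOTS))

echo "NSLOTS=$NSLOTS"
rm -f "$PBS_NODEFILE"
```

Inside a real job the file is provided by the scheduler, so only the counting line is needed.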
206+
207+
## Troubleshooting
208+
209+
### OpenBLAS
210+
211+
If you get the following error
212+
213+
```
214+
error while loading shared libraries: libopenblas.so.0: cannot open shared object file: No such file or directory
215+
```
216+
217+
simply run
218+
219+
```bash
220+
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib
221+
```
222+
223+
> You can add the command to your `.bash_profile`.
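
Note that the `export` command above replaces any existing value of `LD_LIBRARY_PATH`. If other modules have already set the variable, a variant that prepends the conda library directory may be safer. This is a suggested alternative rather than the documented fix: the function name `prepend_conda_lib` is hypothetical, and it assumes `CONDA_PREFIX` has been set by an active conda environment.

```bash
# Hypothetical helper: prepend the active conda environment's lib
# directory to LD_LIBRARY_PATH, keeping any paths already set.
# The ${VAR:+...} expansion adds the ":" separator only when
# LD_LIBRARY_PATH is non-empty, which avoids a trailing colon.
prepend_conda_lib() {
    export LD_LIBRARY_PATH="${CONDA_PREFIX}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

Call `prepend_conda_lib` once, after the conda environment has been activated.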

docs/wiki/environment.md

Lines changed: 1 addition & 1 deletion
@@ -2,6 +2,6 @@
22

33
# Environment Set Up
44

5-
- [Candide](./candide.md)
5+
- [CANDIDE](./candide.md)
66
- [CANFAR](./canfar.md)
77
- [CCINP3](./ccinp3.md)

example/pbs/candide_mpi.sh

Lines changed: 14 additions & 9 deletions
@@ -1,7 +1,7 @@
11
#!/bin/bash
22

33
##########################
4-
# MPI Script for Candide #
4+
# MPI Script for CANDIDE #
55
##########################
66

77
# Receive email when job finishes or aborts
@@ -11,19 +11,24 @@
1111
#PBS -N shapepipe_mpi
1212
# Join output and errors in one file
1313
#PBS -j oe
14-
# Request number of cores
15-
#PBS -l nodes=n04:ppn=10+n05:ppn=10
14+
# Set maximum computing time (e.g. 5min)
1615
#PBS -l walltime=00:05:00
16+
# Request number of cores (e.g. 4 from 2 different machines)
17+
#PBS -l nodes=2:ppn=2
18+
# Allocate total number of cores to variable NSLOTS
1719
NSLOTS=`cat $PBS_NODEFILE | wc -l`
1820

19-
# Activate conda environment
21+
# Full path to environment
22+
export SPENV="$HOME/.conda/envs/shapepipe"
23+
export SPDIR="$HOME/shapepipe"
24+
25+
# Load modules and activate conda environment
2026
module load intelpython/3
21-
module load openmpi/4.0.0
22-
source activate $HOME/.conda/envs/shapepipe
27+
module load openmpi/4.0.2
28+
source activate $SPENV
2329

24-
# Run ShapePipe
25-
cd $HOME/ShapePipe
26-
/softs/openmpi/4.0.0-torque-CentOS7/bin/mpiexec -n $NSLOTS $HOME/.conda/envs/shapepipe/bin/python shapepipe_run.py -c example/pbs/config_mpi.ini
30+
# Run ShapePipe using full paths to executables
31+
$SPENV/bin/mpiexec -n $NSLOTS $SPENV/bin/shapepipe_run -c $SPDIR/example/pbs/config_mpi.ini
2732

2833
# Return exit code
2934
exit 0

example/pbs/candide_mpi2.sh

Lines changed: 0 additions & 31 deletions
This file was deleted.

example/pbs/candide_smp.sh

Lines changed: 10 additions & 5 deletions
@@ -1,7 +1,7 @@
11
#!/bin/bash
22

33
##########################
4-
# SMP Script for Candide #
4+
# SMP Script for CANDIDE #
55
##########################
66

77
# Receive email when job finishes or aborts
@@ -11,16 +11,21 @@
1111
#PBS -N shapepipe_smp
1212
# Join output and errors in one file
1313
#PBS -j oe
14+
# Set maximum computing time (e.g. 5min)
15+
#PBS -l walltime=00:05:00
1416
# Request number of cores
1517
#PBS -l nodes=4
1618

19+
# Full path to environment
20+
export SPENV="$HOME/.conda/envs/shapepipe"
21+
export SPDIR="$HOME/shapepipe"
22+
1723
# Activate conda environment
1824
module load intelpython/3
19-
source activate $HOME/.conda/envs/shapepipe
25+
source activate $SPENV
2026

21-
# Run ShapePipe
22-
cd $HOME/ShapePipe
23-
$HOME/.conda/envs/shapepipe/bin/python shapepipe_run.py -c example/pbs/config_smp.ini
27+
# Run ShapePipe using full paths to executables
28+
$SPENV/bin/shapepipe_run -c $SPDIR/example/pbs/config_smp.ini
2429

2530
# Return exit code
2631
exit 0

example/pbs/config_mpi.ini

Lines changed: 3 additions & 3 deletions
@@ -7,8 +7,8 @@ MODE = mpi
77

88
## ShapePipe file handling options
99
[FILE]
10-
INPUT_DIR = ./example/data
11-
OUTPUT_DIR = ./example/output
10+
INPUT_DIR = $SPDIR/example/data
11+
OUTPUT_DIR = $SPDIR/example/output
1212

1313
## ShapePipe job handling options
1414
[JOB]
@@ -19,4 +19,4 @@ TIMEOUT = 00:01:35
1919
MESSAGE = The obtained value is:
2020

2121
[SERIAL_EXAMPLE]
22-
ADD_INPUT_DIR = ./example/data/numbers, ./example/data/letters
22+
ADD_INPUT_DIR = $SPDIR/example/data/numbers, $SPDIR/example/data/letters

example/pbs/config_smp.ini

Lines changed: 7 additions & 4 deletions
@@ -2,19 +2,22 @@
22

33
## ShapePipe execution options
44
[EXECUTION]
5-
MODULE = python_example, execute_example
5+
MODULE = python_example, serial_example, execute_example
66
MODE = smp
77

88
## ShapePipe file handling options
99
[FILE]
10-
INPUT_DIR = ./example/data
11-
OUTPUT_DIR = ./example/output
10+
INPUT_DIR = $SPDIR/example/data
11+
OUTPUT_DIR = $SPDIR/example/output
1212

1313
## ShapePipe job handling options
1414
[JOB]
15-
SMP_BATCH_SIZE = 3
15+
SMP_BATCH_SIZE = 4
1616
TIMEOUT = 00:01:35
1717

1818
## Module options
1919
[PYTHON_EXAMPLE]
2020
MESSAGE = The obtained value is:
21+
22+
[SERIAL_EXAMPLE]
23+
ADD_INPUT_DIR = $SPDIR/example/data/numbers, $SPDIR/example/data/letters
