|
1 | 1 | [Home](./shapepipe.md) | [Environments](./environment.md)
|
2 | 2 |
|
3 |
| -# Candide Set Up |
| 3 | +# CANDIDE Set Up |
| 4 | + |
| 5 | +> Environment Status Notes |
| 6 | +> - Website: https://candideusers.calet.org/ |
| 7 | +> - No internet access on compute nodes, see [tutorial](https://github.com/CosmoStat/shapepipe/blob/master/docs/wiki/tutorial/pipeline_tutorial.md#mask-images) for how to manage `mask_runner` |
| 8 | +> - Current stable OpenMPI version: `4.0.2` |
| 9 | +
|
| 10 | +## Contents |
| 11 | + |
| 12 | +1. [Introduction](#Introduction) |
| 13 | +1. [Installation](#Installation) |
| 14 | +1. [Execution](#Execution) |
| 15 | +1. [Troubleshooting](#Troubleshooting) |
| 16 | + |
| 17 | +## Introduction |
| 18 | + |
| 19 | +The [CANDIDE cluster](https://candideusers.calet.org/) is hosted and maintained at the Institut d’Astrophysique de Paris by Stephane Rouberol. |
| 20 | + |
| 21 | +### CANDIDE Account |
| 22 | + |
| 23 | +To request an account on CANDIDE, send an email to [Henry Joy McCracken](mailto:hjmcc@iap.fr) and [Stephane Rouberol](mailto:rouberol@iap.fr) at IAP with a short description of what you want to do and with whom you work. |
| 24 | + |
| 25 | +### SSH |
| 26 | + |
| 27 | +Once you have an account on CANDIDE you can connect via SSH as follows: |
| 28 | + |
| 29 | +```bash |
| 30 | +$ ping -c 1 -s 999 candide.iap.fr; ssh <mylogin>@candide.iap.fr |
| 31 | +``` |
| 32 | + |
| 33 | +## Installation |
| 34 | + |
| 35 | +The CANDIDE system uses [Environment Modules](https://modules.readthedocs.io/en/latest/) to manage various software packages. You can view the modules currently available on the system by running: |
| 36 | + |
| 37 | +```bash |
| 38 | +$ module avail |
| 39 | +``` |
| 40 | + |
| 41 | +ShapePipe requires `conda`, which on CANDIDE is provided via `intelpython/3`. To load this package simply run: |
| 42 | + |
| 43 | +```bash |
| 44 | +$ module load intelpython/3 |
| 45 | +``` |
| 46 | + |
| 47 | +> You can add this command to your `.bash_profile` to ensure that this module is available when you log in. |
| 48 | +
|
| 49 | +You can list the modules already loaded by running: |
| 50 | + |
| 51 | +```bash |
| 52 | +$ module list |
| 53 | +``` |
| 54 | + |
| 55 | +### With MPI |
| 56 | + |
| 57 | +To install ShapePipe with MPI enabled on CANDIDE you also need to load the `openmpi` module. To do so run: |
| 58 | + |
| 59 | +```bash |
| 60 | +$ module load openmpi |
| 61 | +``` |
| 62 | + |
| 63 | +You can also specify a specific version of OpenMPI to use. |
| 64 | + |
| 65 | +```bash |
| 66 | +$ module load openmpi/<VERSION> |
| 67 | +``` |
| 68 | + |
| 69 | +Then you need to identify the root directory of the OpenMPI installation. An easy way to get this information is by running: |
| 70 | + |
| 71 | +```bash |
| 72 | +$ module show openmpi |
| 73 | +``` |
| 74 | + |
| 75 | +which should reveal something like `/softs/openmpi/<VERSION>-torque-CentOS7`. Provide this path to the `mpi-root` option of the installation script as follows: |
| 76 | + |
| 77 | +```bash |
| 78 | +$ ./install_shapepipe --mpi-root=/softs/openmpi/<VERSION>-torque-CentOS7 |
| 79 | +``` |
| 80 | + |
| 81 | +> Be sure to check the output of the **Installing MPI** section, as the final check only tests if the `mpiexec` command is available on the system. |
| 82 | +
|
| 83 | +You can rebuild the MPI component at any time by doing the following: |
| 84 | + |
| 85 | +```bash |
| 86 | +$ pip uninstall mpi4py |
| 87 | +$ ./install_shapepipe --no-env --no-exe --mpi-root=/softs/openmpi/<VERSION>-torque-CentOS7 |
| 88 | +``` |
| 89 | + |
| 90 | +### Without MPI |
| 91 | + |
| 92 | +To install ShapePipe without MPI enabled simply pass the `no-mpi` option to the installation script as follows: |
| 93 | + |
| 94 | +```bash |
| 95 | +$ ./install_shapepipe --no-mpi |
| 96 | +``` |
| 97 | + |
| 98 | +## Execution |
| 99 | + |
| 100 | +CANDIDE uses [TORQUE](https://en.wikipedia.org/wiki/TORQUE) for handling distributed jobs. |
| 101 | + |
| 102 | +TORQUE uses standard [Portable Batch System (PBS) commands](https://www.cqu.edu.au/eresearch/high-performance-computing/hpc-user-guides-and-faqs/pbs-commands) such as: |
| 103 | + |
| 104 | +- `qsub` - To submit jobs to the queue. |
| 105 | +- `qstat` - To check on the status of jobs in the queue. |
| 106 | +- `qdel` - To kill jobs in the queue. |
| 107 | + |
| 108 | +Additionally, the availability of compute nodes can be checked with the command: |
| 109 | + |
| 110 | +```bash |
| 111 | +$ cnodes |
| 112 | +``` |
| 113 | + |
| 114 | +Jobs should be submitted as bash scripts, *e.g.*: |
| 115 | + |
| 116 | +```bash |
| 117 | +$ qsub candide_smp.sh |
| 118 | +``` |
| 119 | + |
| 120 | +In this script you can specify: |
| 121 | + |
| 122 | +- The number of nodes to use (*e.g.* `#PBS -l nodes=10`) |
| 123 | +- A specific machine to use with a given number of cores (*e.g.* `#PBS -l nodes=n04:ppn=10`) |
| 124 | +- The maximum computing time for your script (*e.g.* `#PBS -l walltime=10:00:00`) |
| 125 | + |
| 126 | +### Example SMP Script |
| 127 | + |
| 128 | +[`candide_smp.sh`](../../example/pbs/candide_smp.sh) |
| 129 | + |
| 130 | +```bash |
| 131 | +#!/bin/bash |
| 132 | + |
| 133 | +########################## |
| 134 | +# SMP Script for CANDIDE # |
| 135 | +########################## |
| 136 | + |
| 137 | +# Receive email when job finishes or aborts |
| 138 | +#PBS -M <name>@cea.fr |
| 139 | +#PBS -m ea |
| 140 | +# Set a name for the job |
| 141 | +#PBS -N shapepipe_smp |
| 142 | +# Join output and errors in one file |
| 143 | +#PBS -j oe |
| 144 | +# Set maximum computing time (e.g. 5min) |
| 145 | +#PBS -l walltime=00:05:00 |
| 146 | +# Request number of cores |
| 147 | +#PBS -l nodes=4 |
| 148 | + |
| 149 | +# Full path to environment |
| 150 | +export SPENV="$HOME/.conda/envs/shapepipe" |
| 151 | +export SPDIR="$HOME/shapepipe" |
| 152 | + |
| 153 | +# Activate conda environment |
| 154 | +module load intelpython/3 |
| 155 | +source activate $SPENV |
| 156 | + |
| 157 | +# Run ShapePipe using full paths to executables |
| 158 | +$SPENV/bin/shapepipe_run -c $SPDIR/example/pbs/config_smp.ini |
| 159 | + |
| 160 | +# Return exit code |
| 161 | +exit 0 |
| 162 | +``` |
| 163 | + |
| 164 | +> Make sure the number of nodes requested matches the `SMP_BATCH_SIZE` in the config file. |
| 165 | +
|
| 166 | +### Example MPI Script |
| 167 | + |
| 168 | +[`candide_mpi.sh`](../../example/pbs/candide_mpi.sh) |
| 169 | + |
| 170 | +```bash |
| 171 | +#!/bin/bash |
| 172 | + |
| 173 | +########################## |
| 174 | +# MPI Script for CANDIDE # |
| 175 | +########################## |
| 176 | + |
| 177 | +# Receive email when job finishes or aborts |
| 178 | +#PBS -M <name>@cea.fr |
| 179 | +#PBS -m ea |
| 180 | +# Set a name for the job |
| 181 | +#PBS -N shapepipe_mpi |
| 182 | +# Join output and errors in one file |
| 183 | +#PBS -j oe |
| 184 | +# Set maximum computing time (e.g. 5min) |
| 185 | +#PBS -l walltime=00:05:00 |
| 186 | +# Request number of cores (e.g. 4 from 2 different machines) |
| 187 | +#PBS -l nodes=2:ppn=2 |
| 188 | +# Allocate total number of cores to variable NSLOTS |
| 189 | +NSLOTS=$(wc -l < "$PBS_NODEFILE") |
| 190 | + |
| 191 | +# Full path to environment |
| 192 | +export SPENV="$HOME/.conda/envs/shapepipe" |
| 193 | +export SPDIR="$HOME/shapepipe" |
| 194 | + |
| 195 | +# Load modules and activate conda environment |
| 196 | +module load intelpython/3 |
| 197 | +module load openmpi/4.0.2 |
| 198 | +source activate $SPENV |
| 199 | + |
| 200 | +# Run ShapePipe using full paths to executables |
| 201 | +$SPENV/bin/mpiexec -n $NSLOTS $SPENV/bin/shapepipe_run -c $SPDIR/example/pbs/config_mpi.ini |
| 202 | + |
| 203 | +# Return exit code |
| 204 | +exit 0 |
| 205 | +``` |
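
The `NSLOTS` line in the MPI script works because TORQUE writes one line to `$PBS_NODEFILE` for every allocated core, so the line count equals the total number of MPI slots. The sketch below reproduces that count outside the queue using a temporary file in place of the real node file; the node names `n01` and `n02` are hypothetical and only mimic what a `nodes=2:ppn=2` request might produce.

```bash
#!/bin/bash

# Simulate the node file for a `nodes=2:ppn=2` request
# (hypothetical node names, one line per allocated core)
nodefile=$(mktemp)
printf 'n01\nn01\nn02\nn02\n' > "$nodefile"

# Total number of slots: one per line in the node file
NSLOTS=$(( $(wc -l < "$nodefile") ))

# Number of distinct machines: unique host names in the node file
NNODES=$(( $(sort -u "$nodefile" | wc -l) ))

echo "slots=$NSLOTS nodes=$NNODES"

# Clean up the temporary file
rm -f "$nodefile"
```

Inside a real job, replacing `$nodefile` with `$PBS_NODEFILE` should give the value to pass to `mpiexec -n`.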
| 206 | + |
| 207 | +## Troubleshooting |
| 208 | + |
| 209 | +### OpenBLAS |
| 210 | + |
| 211 | +If you get the following error |
| 212 | + |
| 213 | +``` |
| 214 | +error while loading shared libraries: libopenblas.so.0: cannot open shared object file: No such file or directory |
| 215 | +``` |
| 216 | + |
| 217 | +simply run |
| 218 | + |
| 219 | +```bash |
| 220 | +export LD_LIBRARY_PATH=$CONDA_PREFIX/lib |
| 221 | +``` |
| 222 | + |
| 223 | +> You can add the command to your `.bash_profile`. |
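
Note that the `export` above replaces any existing value of `LD_LIBRARY_PATH`, which may matter if modules such as `openmpi` have already added their own library directories. A minimal sketch of an alternative that prepends instead of overwriting is shown below; the helper name `prepend_ld_path` and the fallback environment path are illustrative assumptions, not part of ShapePipe.

```bash
#!/bin/bash

# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH
# while keeping any entries that are already set.
prepend_ld_path() {
    local dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) export LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
}

# CONDA_PREFIX is set by conda when an environment is activated;
# the fallback path is an assumption matching the example scripts.
prepend_ld_path "${CONDA_PREFIX:-$HOME/.conda/envs/shapepipe}/lib"
```

Because the helper skips directories that are already listed, it can be sourced from `.bash_profile` repeatedly without growing the variable.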