Parallelisation in reflectivity analysis software

There are a few different opportunities for parallelisation, depending on the type of calculation being performed and the type of computer system it's being done on:

  1. a specular reflectivity kernel can be parallelised over the number of datapoints in a calculation. If the kernel is implemented in C or Cython, then a threading library such as OpenMP or pthreads gives good performance. If resolution smearing is being performed, which often requires an oversampling approach, it makes sense to send all the oversampled points to the kernel in a single call (rather than creating an extra loop for the resolution integral), as this maximises the usefulness of the thread library (see the first sketch after this list).

  2. the optimisation approach may itself be parallelisable. For example, Bayesian MCMC or differential evolution often needs to evaluate hundreds (or thousands) of independent cost functions at once. In Python one would typically use the multiprocessing module on a desktop machine, or the mpi4py package on a cluster (see the second sketch after this list). There is always overhead in distributing this kind of task, and the overhead becomes less costly as the problem becomes more computationally expensive (i.e. each processor is given more work to do). If each cost function is quick to calculate it may be better to stay on a desktop computer; if it takes a long time then a cluster may be better.
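
As an illustration of point 1, here is a minimal numpy sketch of Gaussian resolution smearing in which all the oversampled points are batched into a single kernel call. The function names, the 17-point quadrature over ±3.5σ, and the stand-in kernel (bare-substrate Fresnel reflectivity) are illustrative choices, not any particular package's API:

```python
import numpy as np

def abeles(q, sld=2.07e-6):
    # stand-in kernel: Fresnel reflectivity of a bare substrate, vectorised
    # over all Q points at once (q in 1/Angstrom, sld in 1/Angstrom^2)
    kz0 = q / 2.0
    kz1 = np.sqrt(kz0 ** 2 - 4.0 * np.pi * sld + 0j)
    r = (kz0 - kz1) / (kz0 + kz1)
    return np.abs(r) ** 2

def smeared_reflectivity(q, dq, kernel=abeles, n_gauss=17):
    # Gauss-Legendre abscissae/weights on [-1, 1], stretched to +/- 3.5 sigma
    x, w = np.polynomial.legendre.leggauss(n_gauss)
    sigma = dq / 2.3548            # convert FWHM -> standard deviation
    width = 3.5 * sigma            # integration half-width for each point
    # build every oversampled point, then evaluate the kernel in ONE call
    qvals = (q[:, np.newaxis] + width[:, np.newaxis] * x).ravel()
    refl = kernel(qvals).reshape(len(q), n_gauss)
    # Gaussian weights at the quadrature points; sigma cancels out of the
    # change of variables, so the weights are the same for every datapoint
    gauss = np.exp(-0.5 * (3.5 * x) ** 2) / np.sqrt(2.0 * np.pi)
    return 3.5 * np.sum(refl * gauss * w, axis=1)

q = np.linspace(0.02, 0.5, 1000)
R = smeared_reflectivity(q, dq=0.05 * q)   # 5% dq/q resolution
```

A threaded C/OpenMP kernel slots in for `abeles` unchanged; because it receives all `len(q) * n_gauss` points at once, the thread pool always has plenty of work.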
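And a minimal sketch of point 2, distributing independent cost-function evaluations over processes with the standard-library multiprocessing module. `log_likelihood` and the walker population are placeholders for whatever the sampler would supply:

```python
import numpy as np
from multiprocessing import Pool

def log_likelihood(theta):
    # placeholder cost function; in practice this would compute a (smeared)
    # reflectivity curve from the parameters and compare it with the data
    return -0.5 * float(np.sum(theta ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    walkers = rng.normal(size=(500, 8))   # e.g. 500 MCMC walkers, 8 parameters

    # map-style distribution over os.cpu_count() worker processes
    with Pool() as pool:
        logp = pool.map(log_likelihood, walkers)
```

On a cluster the same map-style pattern works with an MPI-backed executor, e.g. `mpi4py.futures.MPIPoolExecutor`, in place of `Pool`.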

Nesting of parallelisation can create issues and significantly degrade performance. For example, when doing MCMC with hundreds of cost functions to evaluate, it may make sense to distribute those over the entire computational capacity you have access to (e.g. an MPI setup on a cluster). However, if each cost function evaluation then uses a thread library to calculate a specular reflectivity kernel, requesting compute from more than one processor, the resources become oversubscribed and the whole setup slows down dramatically. In such a situation it's better to do the reflectivity calculation in a serial manner (single thread), with the cost functions being distributed over all the available processors; a sketch of pinning the kernel to one thread follows below.

These considerations mean that it's a good idea for analysis packages to retain fine-grained control over the parallelisation they employ, so it can be tuned at different levels depending on the problem. Note also that the Python multiprocessing module (which forks worker processes by default on Linux) does not work with many OpenMP runtimes.
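
A minimal sketch of two ways to keep the kernel single-threaded, assuming it is threaded with OpenMP; `threadpoolctl` is a third-party package, and both approaches are blunt instruments compared with an analysis package that exposes its own thread-count parameter:

```python
import os
# (1) cap the OpenMP pool BEFORE any compiled extension is imported; worker
# processes spawned by multiprocessing/MPI inherit this environment
os.environ["OMP_NUM_THREADS"] = "1"

# (2) or cap it at runtime around the expensive call, for the OpenMP/BLAS
# runtimes that threadpoolctl knows how to control
from threadpoolctl import threadpool_limits

with threadpool_limits(limits=1):
    ...  # evaluate the reflectivity/cost function here, single-threaded
```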

I have experimented with using a GPU (via pyopencl) to calculate reflectivity in a highly parallelised manner. If the problem is small it's better to calculate on a CPU, as there is overhead in offloading the calculation to a GPU. As the problem becomes larger (more datapoints, more layers in the model) there is a crossover point beyond which the GPU calculation becomes faster.
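
For a flavour of the GPU approach, here is a minimal pyopencl sketch in which each work-item computes one datapoint. The kernel is again just the bare-substrate Fresnel reflectivity, in single precision; a real multilayer kernel needs complex arithmetic and a loop over layers, but the one-work-item-per-datapoint pattern is the same:

```python
import numpy as np
import pyopencl as cl

KERNEL = r"""
__kernel void fresnel(__global const float *q,
                      __global float *refl,
                      const float sld)
{
    int i = get_global_id(0);
    float kz0 = q[i] / 2.0f;
    // substrate wavevector; real-valued above the critical edge
    float kz1 = sqrt(kz0 * kz0 - 4.0f * M_PI_F * sld);
    float r = (kz0 - kz1) / (kz0 + kz1);
    refl[i] = r * r;
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, KERNEL).build()

q = np.linspace(0.02, 0.5, 1_000_000).astype(np.float32)  # 1/Angstrom
refl = np.empty_like(q)

mf = cl.mem_flags
q_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=q)
r_g = cl.Buffer(ctx, mf.WRITE_ONLY, refl.nbytes)

# one work-item per datapoint
prog.fresnel(queue, q.shape, None, q_g, r_g, np.float32(2.07e-6))
cl.enqueue_copy(queue, refl, r_g)
```

The buffer transfers either side of the kernel launch are the overhead referred to above; for small arrays they dominate, which is why the crossover only appears for larger calculations.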
