199 changes: 199 additions & 0 deletions docs/source/contributing/architecture.rst
@@ -0,0 +1,199 @@
Architecture
============

The architecture of CSET.

Operators
---------

Operators provide the core functionality of CSET. Each operator is a function
that takes some input and returns some output, possibly also producing side
effects like saving plots.
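
As a rough illustration of that shape, here is a minimal sketch of an operator-like function. The name ``scale_values``, its parameters, and the use of plain lists are all hypothetical, chosen only to show "input in, output out"; real CSET operators may look quite different.

```python
def scale_values(values: list[float], factor: float = 1.0) -> list[float]:
    """Hypothetical operator: take some input and return some output."""
    return [value * factor for value in values]


# Operators compose by passing the output of one as the input of the next.
scaled = scale_values([1.0, 2.0, 3.0], factor=2.0)
print(scaled)
```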

All of the operators in CSET are contained in modules in
``src/CSET/operators/``, which group related operators. For example ``plot.py``
contains various plotting operators.

Recipes
-------

To produce a diagnostic, operators must be combined with recipes, which are YAML
files containing a graph of operators to execute, along with any needed
arguments and a bit of metadata.

The included recipes in CSET can be found in ``src/CSET/recipes/``.

A recipe may optionally contain variables, such as ``$VARIABLE``, which are
replaced by values provided on the command line or by a loader.
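
The substitution idea can be sketched with Python's standard ``string.Template``, which uses the same ``$NAME`` syntax. This is only an illustration under stated assumptions: the recipe text, the variable names ``VARNAME`` and ``MODEL_NAME``, and the ``fill_variables`` helper are hypothetical, and CSET's own substitution code may work differently.

```python
from string import Template

# Hypothetical one-line recipe fragment; not CSET's actual recipe schema.
RECIPE_TEXT = "title: Time series of $VARNAME for $MODEL_NAME\n"


def fill_variables(recipe_text: str, variables: dict[str, str]) -> str:
    """Replace each $NAME placeholder with its value from `variables`."""
    # substitute() raises KeyError if a placeholder has no value supplied.
    return Template(recipe_text).substitute(variables)


filled = fill_variables(
    RECIPE_TEXT, {"VARNAME": "air_temperature", "MODEL_NAME": "model_a"}
)
print(filled)
```

Using ``substitute`` rather than ``safe_substitute`` means a missing variable fails loudly instead of leaving a stray placeholder in the recipe.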

Loaders
-------

Loaders load recipes for use in the workflow, filling in any variables from the
configuration provided in the ``rose-suite.conf``.

The included loaders in CSET can be found in ``src/CSET/loaders/``.

Workflow
--------

To enable large-scale running of CSET we provide a cylc workflow. The workflow's
source can be found in ``src/CSET/cset_workflow/``, with its logic being defined
in the ``flow.cylc`` file.

This workflow has three main aims:

1. Fetching the model and observation data.
2. Running all of the enabled recipes on that data.
3. Building a visualisation website for the produced diagnostics.

These aims are accomplished by a series of rose-apps that are run as part of the
workflow. These apps live in ``src/CSET/cset_workflow/app/``.

validate_environment
~~~~~~~~~~~~~~~~~~~~

A small shell script that checks the conda environment has been loaded correctly
and that the ``cset`` command line is available at the start of the workflow.
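
The app itself is a shell script; the same kind of check can be sketched in Python as follows. The ``command_available`` helper is hypothetical and is a stand-in for illustration, not the script CSET ships.

```python
import shutil


def command_available(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if not command_available("cset"):
    print("cset command not found; has the conda environment been loaded?")
```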

assign_model_colours
~~~~~~~~~~~~~~~~~~~~

Runs at the start of the workflow to assign each model a colour, so that
different line plots and such can use consistent colours for each model. The
colours are assigned into a style file which is used when baking the recipes.
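
One simple way to get a stable model-to-colour mapping is sketched below. The palette, the model names, and the ``assign_colours`` function are assumptions for illustration; the real app's palette and style file format are not shown here.

```python
from itertools import cycle

# Hypothetical palette; the colours CSET actually uses may differ.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c"]


def assign_colours(model_names: list[str]) -> dict[str, str]:
    """Map each model to a colour, reusing the palette if it runs out."""
    # zip() stops at the end of model_names, so cycle() never runs forever.
    return dict(zip(model_names, cycle(PALETTE)))


colours = assign_colours(["model_a", "model_b", "model_c", "model_d"])
print(colours)
```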

install_website_skeleton
~~~~~~~~~~~~~~~~~~~~~~~~

Copies static files for the visualisation website to use and creates a symlink
from the configured WEB_DIR to ``$CYLC_WORKFLOW_SHARE_DIR/web``.

This app should probably be merged into finish_website.

fetch_fcst
~~~~~~~~~~

Runs for each model/obs source on each cycle to retrieve the required data. It
can switch between different implementations for different data sources, such as
the filesystem or HTTP.

parbake_recipes
~~~~~~~~~~~~~~~

Runs for each cycle. Reads the user configuration and then writes out all the
enabled recipes with their variables filled in. This allows them to be "baked"
(run) in parallel.

bake_recipes
~~~~~~~~~~~~

Runs for each cycle, and additionally as bake_aggregation_recipes in the final
cycle. This takes the parbaked recipes and runs them to produce the desired
diagnostics.

This is the task that takes the majority of the workflow's runtime, and is the
only one that needs significant compute resource.

Internally it runs the ``cset bake`` command line for each recipe in parallel
using `rose_bunch`_. However, to allow the recipes to bake to be decided at
runtime, the first thing that runs is the ``baker.sh`` script, which writes out
the list of recipes to bake as a rose optional configuration and then runs the
rose app using it.
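
The "decide the list at runtime, then run each item in parallel" pattern can be sketched in plain Python. This swaps in a standard library thread pool for rose_bunch and a stand-in ``bake`` function for the ``cset bake`` command, so it shows the pattern only, not how the app is implemented; all names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def bake(recipe_path: str) -> str:
    """Stand-in for baking one recipe; a real task would invoke the CLI."""
    return f"baked {recipe_path}"


def bake_all(recipe_paths: list[str], max_workers: int = 4) -> list[str]:
    """Bake every recipe in the list concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input paths.
        return list(pool.map(bake, recipe_paths))


# The list of recipes is only known at runtime, after parbaking.
results = bake_all(["recipe_a.yaml", "recipe_b.yaml"])
print(results)
```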

.. _rose_bunch: https://metomi.github.io/rose/doc/html/api/built-in/rose_bunch.html

finish_website
~~~~~~~~~~~~~~

Runs at the end of the workflow and constructs the index for the visualisation
website from all of the produced diagnostics.

housekeeping
~~~~~~~~~~~~

Deletes the retrieved data at the end of the workflow to free up disk space.

send_email
~~~~~~~~~~

Sends a notification email to the workflow owner letting them know the workflow
is complete.

demo_pointstat
~~~~~~~~~~~~~~

metplus_ascii2nc
~~~~~~~~~~~~~~~~

metplus_grid_stat
~~~~~~~~~~~~~~~~~

metplus_point_stat
~~~~~~~~~~~~~~~~~~

These apps are not currently used, but aim to integrate METplus in the workflow.
Comment on lines +124 to +136
Contributor


Suggested change
demo_pointstat
~~~~~~~~~~~~~~
metplus_ascii2nc
~~~~~~~~~~~~~~~~
metplus_grid_stat
~~~~~~~~~~~~~~~~~
metplus_point_stat
~~~~~~~~~~~~~~~~~~
These apps are not currently used, but aim to integrate METplus in the workflow.
The following apps are not currently used, but aim to integrate METplus in the workflow.
demo_pointstat
~~~~~~~~~~~~~~
metplus_ascii2nc
~~~~~~~~~~~~~~~~
metplus_grid_stat
~~~~~~~~~~~~~~~~~
metplus_point_stat
~~~~~~~~~~~~~~~~~~

Just wonder if it might be clearer to re-order this part.


Code outline
------------

The code of CSET lives in the ``src/CSET/`` directory, arranged as follows:

src/CSET
~~~~~~~~

.. code-block:: text

    src/CSET
    ├── cset_workflow # Detailed below for clarity.
    ├── loaders
    │   ├── __init__.py # Imports all loaders to make available to the rest of CSET.
    │   └── ... # Then lots of loaders, as described above.
    ├── operators
    │   ├── __init__.py # Code for executing ("baking") recipes.
    │   ├── _colorbar_definition.json # Default colourbar definitions.
    │   ├── _plot_page_template.html # Template for diagnostic output page.
    │   ├── _stash_to_lfric.py # Mapping between STASH codes and LFRic variable names.
    │   ├── _utils.py # Common utility code for operators.
    │   └── ... # Then lots of operators, as described above.
    ├── recipes
    │   ├── __init__.py # Code for parbaking recipes.
    │   └── ... # Then lots of recipes, as described above.
    ├── __init__.py # CLI entrypoint. Sets up logging, parses arguments, etc.
    ├── __main__.py # Allows running `python -m CSET`.
    ├── _common.py # Common utility code.
    ├── extract_workflow.py # Implementation of `cset extract-workflow`.
    └── graph.py # Implementation of `cset graph`.

Comment on lines +148 to +167
Contributor


This applies to this code block and the one below: it is a lot of text, with the comments being on the same line as the file names. Is there a better way to format this to make it less busy for people to read?

Member Author

@jfrost-mo jfrost-mo Sep 11, 2025


Definitely. It would immediately be better if the comments were aligned so they all started at the same indentation, but that would cause it to scroll in the output.

We could go for a table or a definitions list (as used here), but we would lose the hierarchy, which I see as quite important for understanding what is where.

While a little boring, we might be best served with a simple nested list, such as:

  • cset_workflow
    Detailed below for clarity.
  • loaders
    • __init__.py
      Imports all loaders to make available to the rest of CSET.
    • ...
      Then lots of loaders, as described above.
  • operators
    • __init__.py
      Code for executing ("baking") recipes.
    • ...
      Then lots of operators, as described above.

Contributor


Or maybe as an image, that way we can keep the hierarchy and not have scrolling problems and clear indentations?

src/CSET/cset_workflow
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

    src/CSET/cset_workflow
    ├── app # Contains the rose apps described above.
    │   ├── assign_model_colours
    │   │   ├── bin
    │   │   │   └── assign_model_colours.py # Executable for app.
    │   │   └── rose-app.conf # Rose app configuration. Mostly sets the executable.
    │   └── ... # Lots more rose apps in here.
    ├── bin # Files in bin are automatically on the workflow's PATH.
    │   └── app_env_wrapper # Wrapper script to run things in the conda environment.
    ├── includes # Deprecated; Use loaders instead now.
    ├── lib # Available for import into cylc's jinja2 templating.
    │   └── python
    │       └── jinja_utils.py # A couple helper functions used in flow.cylc.
    ├── meta # Validation and GUI layout for user configuration in rose-suite.conf.
    │   ├── diagnostics
    │   │   └── rose-meta.conf # Diagnostic configuration.
    │   ├── rose-meta.conf # Automatically generated file, don't edit.
    │   └── rose-meta.conf.jinja2 # Workflow configuration.
    ├── opt # Pre-made configurations for consistent evaluation.
    │   └── rose-suite-RAL3LFRIC.conf
    ├── site # Site-specific cylc configuration.
    │   └── localhost.cylc
    ├── flow.cylc # The main workflow definition detailing what and how tasks are run.
    ├── install_restricted_files.sh # Script for installing site-specific files.
    ├── README.md
    ├── rose-suite.conf # User configuration of workflow and diagnostics.
    └── rose-suite.conf.example # Blank user configuration to be copied.

10 changes: 10 additions & 0 deletions docs/source/contributing/dependencies.rst
@@ -1,6 +1,16 @@
Dependencies
============

Most of CSET's dependencies are managed through conda and are listed in
``requirements/environment.yaml``. The workflow has a number of additional
requirements that must be installed separately.

* bash - At least version 5, used for many workflow tasks, including the one that loads the environment.
* conda - Used for managing the conda environment. Alternatively mamba or micromamba can be used.
* cylc 8 - Used for running the workflow. As it typically requires significant site integration it is not installed with CSET.
* GNU find - We use the ``-printf`` option that is GNU specific.
* sqlite3 - Command line tool for sqlite. Used for doing bad things to rose_bunch.

Requirements of a new dependency
--------------------------------

1 change: 1 addition & 0 deletions docs/source/contributing/index.rst
@@ -15,6 +15,7 @@ need to get started, and the links below go into more detail on specific topics.
code-review
dependencies
releases
architecture

Contributing checklist
----------------------