From ec0445ec8b60677c18043b79f86f78eb384a76c5 Mon Sep 17 00:00:00 2001 From: James Frost Date: Fri, 5 Sep 2025 14:48:06 +0100 Subject: [PATCH 1/7] Add architecture documentation detailing the rose apps --- docs/source/contributing/architecture.rst | 113 ++++++++++++++++++++++ docs/source/contributing/index.rst | 1 + 2 files changed, 114 insertions(+) create mode 100644 docs/source/contributing/architecture.rst diff --git a/docs/source/contributing/architecture.rst b/docs/source/contributing/architecture.rst new file mode 100644 index 000000000..cf7531562 --- /dev/null +++ b/docs/source/contributing/architecture.rst @@ -0,0 +1,113 @@ +Architecture +============ + +The architecture of CSET. + +Operators +--------- + +Recipes +------- + +Loaders +------- + +Workflow +-------- + +To enable large scale running of CSET we provide a cylc workflow. The workflow's +source can be found in ``src/CSET/cset_workflow/``, with its logic being defined +in the ``flow.cylc`` file. + +This workflow has three main aims: + +1. Fetching the model and observation data. +2. Running all of the enabled recipes on that data. +3. Building a visualisation website for the produced diagnostics. + +These aims are accomplished by a series of rose-apps that are run as part of the +workflow. These apps live in ``src/CSET/cset_workflow/app/``. + +validate_environment +~~~~~~~~~~~~~~~~~~~~ + +A small shell script that checks the conda environment has been loaded correctly +and that the ``cset`` command line is available at the start of the workflow. + +assign_model_colours +~~~~~~~~~~~~~~~~~~~~ + +Runs at the start of the workflow to assign each model a colour, so that +different line plots and such can use consistent colours for each model. The +colours are assigned into a style file which is used when baking the recipes. 
+ +install_website_skeleton +~~~~~~~~~~~~~~~~~~~~~~~~ + +Copies static files for the visualisation website to use and creates a symlink +from the configured WEB_DIR to ``$CYLC_WORKFLOW_SHARE_DIR/web``. + +This app should probably be merged into finish_website. + +fetch_fcst +~~~~~~~~~~ + +Runs for each model/obs source on each cycle to retrieve the required data. It +can switch between different implementations for different data sources, such as +the filesystem or HTTP. + +parbake_recipes +~~~~~~~~~~~~~~~ + +Runs for each cycle. Reads the user configuration and then writes out all the +enabled recipes with their variables filled in. This allows them to be "baked" +(run) in parallel. + +bake_recipes +~~~~~~~~~~~~ + +Runs for each cycle, and additionally as bake_aggregation_recipes in the final +cycle. This takes the parbaked recipes and runs them to produce the desired +diagnostics. + +This will be the task that takes the majority of the workflow's runtime, and is +the only one that needs significant compute resource. + +Internally it runs the ``cset bake`` command line for each recipe in parallel +using `rose_bunch`_. However, to allow the baked recipes to be decided at runtime, +the first thing that runs is the ``baker.sh`` script, which writes out the list +of recipes to bake as a rose optional configuration, and then runs the rose app +using it. + +.. _rose_bunch: https://metomi.github.io/rose/doc/html/api/built-in/rose_bunch.html + +finish_website +~~~~~~~~~~~~~~ + +Runs at the end of the workflow and constructs the index for the visualisation +website from all of the produced diagnostics. + +housekeeping +~~~~~~~~~~~~ + +Deletes the retrieved data at the end of the workflow to free up disk space. + +send_email +~~~~~~~~~~ + +Sends a notification email to the workflow owner letting them know the workflow +is complete. 
+ +demo_pointstat +~~~~~~~~~~~~~~ + +metplus_ascii2nc +~~~~~~~~~~~~~~~~ + +metplus_grid_stat +~~~~~~~~~~~~~~~~~ + +metplus_point_stat +~~~~~~~~~~~~~~~~~~ + +These apps are not currently used, but aim to integrate METplus in the workflow. diff --git a/docs/source/contributing/index.rst b/docs/source/contributing/index.rst index e444961e3..bacfaec87 100644 --- a/docs/source/contributing/index.rst +++ b/docs/source/contributing/index.rst @@ -15,6 +15,7 @@ need to get started, and the links below go into more detail on specific topics. code-review dependencies releases + architecture Contributing checklist ---------------------- From 0c7784ec2b91448455cfc9f4b5521dc95020260e Mon Sep 17 00:00:00 2001 From: James Frost Date: Fri, 5 Sep 2025 15:05:05 +0100 Subject: [PATCH 2/7] Document non-conda workflow dependencies --- docs/source/contributing/dependencies.rst | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/docs/source/contributing/dependencies.rst b/docs/source/contributing/dependencies.rst index 825f45769..7b40dcd4a 100644 --- a/docs/source/contributing/dependencies.rst +++ b/docs/source/contributing/dependencies.rst @@ -1,6 +1,16 @@ Dependencies ============ +Most of CSET's dependencies are managed through conda and are listed in +``requirements/environment.yaml``. The workflow has a number of additional +requirements that must be installed separately. + +* bash - At least version 5, used for many workflow tasks, including the one that loads the environment. +* conda - Used for managing the conda environment. Alternatively mamba or micromamba can be used. +* cylc 8 - Used for running the workflow. As it typically requires significant site integration it is not installed with CSET. +* GNU find - We use the ``-printf`` option that is GNU specific. +* sqlite3 - Command line tool for sqlite. Used for doing bad things to rose_bunch. 
+ +Requirements of a new dependency -------------------------------- From 9fb26cedd1c9ea6aca2faa712012c7b43c41278b Mon Sep 17 00:00:00 2001 From: James Frost Date: Fri, 5 Sep 2025 18:44:29 +0100 Subject: [PATCH 3/7] Describe architecture of operators, recipes, and loaders --- docs/source/contributing/architecture.rst | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/docs/source/contributing/architecture.rst b/docs/source/contributing/architecture.rst index cf7531562..75a790e0e 100644 --- a/docs/source/contributing/architecture.rst +++ b/docs/source/contributing/architecture.rst @@ -6,12 +6,34 @@ The architecture of CSET. Operators --------- +Operators provide the core functionality of CSET. Each operator is a function +that takes some input and returns some output, possibly also producing side +effects like saving plots. + +All of the operators in CSET are contained in modules in +``src/CSET/operators/``, which group related operators. For example ``plot.py`` +contains various plotting operators. + Recipes ------- +To produce a diagnostic, operators must be combined with recipes, which are YAML +files containing a graph of operators to execute, along with any needed +arguments and a bit of metadata. + +The included recipes in CSET can be found in ``src/CSET/recipes/``. + +A recipe may optionally contain variables, such as ``$VARIABLE``, which are +replaced by a value provided on the command line or by a loader. + Loaders ------- +Loaders load recipes for use in the workflow, filling in any variables from the +configuration provided in the ``rose-suite.conf``. + +The included recipes in CSET can be found in ``src/CSET/recipes/loaders/``. 
+ Workflow -------- From 600c75cfd961e6ba730f0ab90eb6919ad3b0f92f Mon Sep 17 00:00:00 2001 From: James Frost Date: Fri, 5 Sep 2025 18:45:22 +0100 Subject: [PATCH 4/7] Add code outline to architecture docs --- docs/source/contributing/architecture.rst | 64 +++++++++++++++++++++++ 1 file changed, 64 insertions(+) diff --git a/docs/source/contributing/architecture.rst b/docs/source/contributing/architecture.rst index 75a790e0e..5a13d8ce7 100644 --- a/docs/source/contributing/architecture.rst +++ b/docs/source/contributing/architecture.rst @@ -133,3 +133,67 @@ metplus_point_stat ~~~~~~~~~~~~~~~~~~ These apps are not currently used, but aim to integrate METplus in the workflow. + +Code outline +------------ + +The code of CSET lives in the ``src/CSET/`` directory, arranged as follows: + +src/CSET +~~~~~~~~ + +.. code-block:: text + + src/CSET + ├── cset_workflow # Detailed below for clarity. + ├── loaders + │   ├── __init__.py # Imports all loaders to make available to the rest of CSET. + │   └── ... # Then lots of loaders, as described above. + ├── operators + │   ├── __init__.py # Code for executing ("baking") recipes. + │   ├── _colorbar_definition.json # Default colourbar definitions. + │   ├── _plot_page_template.html # Template for diagnostic output page. + │   ├── _stash_to_lfric.py # Mapping between STASH codes and LFRic variable names. + │   ├── _utils.py # Common utility code for operators. + │   └── ... # Then lots of operators, as described above. + ├── recipes + │   ├── __init__.py # Code for parbaking recipes. + │   └── ... # Then lots of recipes, as described above. + ├── __init__.py # CLI entrypoint. Sets up logging, parses arguments, etc. + ├── __main__.py # Allows running `python -m CSET`. + ├── _common.py # Common utility code. + ├── extract_workflow.py # Implementation of `cset extract-workflow`. + └── graph.py # Implementation of `cset graph`. + +src/CSET/cset_workflow +~~~~~~~~~~~~~~~~~~~~~~ + +.. 
code-block:: text + + src/CSET/cset_workflow + ├── app # Contains the rose apps described above. + │   ├── assign_model_colours + │   │   ├── bin + │   │   │   └── assign_model_colours.py # Executable for app. + │   │   └── rose-app.conf # Rose app configuration. Mostly sets the executable. + │   └── ... # Lots more rose apps in here. + ├── bin # Files in bin are automatically on the workflow's PATH. + │   └── app_env_wrapper # Wrapper script to run things in the conda environment. + ├── includes # Deprecated; Use loaders instead now. + ├── lib # Available for import into cylc's jinja2 templating. + │   └── python + │   └── jinja_utils.py # A couple helper functions used in flow.cylc. + ├── meta # Validation and GUI layout for user configuration in rose-suite.conf. + │   ├── diagnostics + │   │   └── rose-meta.conf # Diagnostic configuration. + │   ├── rose-meta.conf # Automatically generated file, don't edit. + │   └── rose-meta.conf.jinja2 # Workflow configuration. + ├── opt # Pre-made configurations for consistent evaluation. + │   └── rose-suite-RAL3LFRIC.conf + ├── site # Site-specific cylc configuration. + │   └── localhost.cylc + ├── flow.cylc # The main workflow definition detailing what and how tasks are run. + ├── install_restricted_files.sh # Script for installing site-specific files. + ├── README.md + ├── rose-suite.conf # User configuration of workflow and diagnostics. + └── rose-suite.conf.example # Blank user configuration to be copied. From 28785dad2de2ec962d2c53bb360f2a7f9441075f Mon Sep 17 00:00:00 2001 From: James Frost Date: Mon, 8 Sep 2025 14:43:14 +0100 Subject: [PATCH 5/7] fixup! 
Describe architecture of operators, recipes, and loaders --- docs/source/contributing/architecture.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/contributing/architecture.rst b/docs/source/contributing/architecture.rst index 5a13d8ce7..0560ff071 100644 --- a/docs/source/contributing/architecture.rst +++ b/docs/source/contributing/architecture.rst @@ -32,7 +32,7 @@ Loaders Loaders load recipes for use in the workflow, filling in any variables from the configuration provided in the ``rose-suite.conf``. -The included recipes in CSET can be found in ``src/CSET/recipes/loaders/``. +The included loaders in CSET can be found in ``src/CSET/loaders/``. Workflow -------- From b00754989f092e48445c6c03550d5aeda44290f8 Mon Sep 17 00:00:00 2001 From: James Frost Date: Thu, 11 Sep 2025 07:42:13 +0100 Subject: [PATCH 6/7] Clarify what metadata is included in a recipe --- docs/source/contributing/architecture.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/source/contributing/architecture.rst b/docs/source/contributing/architecture.rst index 0560ff071..66838070a 100644 --- a/docs/source/contributing/architecture.rst +++ b/docs/source/contributing/architecture.rst @@ -17,9 +17,10 @@ contains various plotting operators. Recipes ------- -To produce a diagnostic, operators must be combined with recipes, which are YAML -files containing a graph of operators to execute, along with any needed -arguments and a bit of metadata. +To produce a diagnostic, operators must be combined in recipes, which +are YAML files containing a graph of operators to execute, along with +any needed arguments and some metadata providing the diagnostic's title, +description and broad category. The included recipes in CSET can be found in ``src/CSET/recipes/``. 
From 8ebbe050ede0042842371636338355b4e641cd35 Mon Sep 17 00:00:00 2001 From: James Frost Date: Thu, 11 Sep 2025 07:46:47 +0100 Subject: [PATCH 7/7] Clarify use of sqlite3 CLI --- docs/source/contributing/dependencies.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/contributing/dependencies.rst b/docs/source/contributing/dependencies.rst index 7b40dcd4a..7fb7a679c 100644 --- a/docs/source/contributing/dependencies.rst +++ b/docs/source/contributing/dependencies.rst @@ -9,7 +9,7 @@ requirements that must be installed separately. * conda - Used for managing the conda environment. Alternatively mamba or micromamba can be used. * cylc 8 - Used for running the workflow. As it typically requires significant site integration it is not installed with CSET. * GNU find - We use the ``-printf`` option that is GNU specific. -* sqlite3 - Command line tool for sqlite. Used for doing bad things to rose_bunch. +* sqlite3 - Command line tool for sqlite. Used for modifying the rose_bunch database to allow retriggering with a different level of parallelism. Requirements of a new dependency --------------------------------