Commit 5526d30

authored
Addition of radio-astronomy use-case (#358)
* Add radio-astronomy use case

  In use-cases/radio-astronomy, an itwinai integration of the ML pulsar-detection workflow developed by HTW Berlin is presented. This use case features a sophisticated ML pipeline composed of three models. The workflow has been tested with the torch DDP, Microsoft DeepSpeed, and Horovod distributed-learning strategies. The batch-jsc.sh script provides the functionality needed for each strategy, so the user only has to modify the config.yaml file to change the strategy.

* Test suite

  The use case also has a test suite that supports automatic generation of synthetic data in a temporary folder and trains the models on this synthetic data.

* Pulsar plugin: https://github.com/interTwin-eu/pulsar-plugin

  Besides this use case, which is kept in the itwinai repository for demonstration purposes, a plug-in solution for itwinai was developed so that HTW Berlin developers can more easily modify the integration, if necessary. The data.py and trainer.py files, which provide the integration between the use-case source code and the itwinai code, are very similar between this use case and the plug-in as of May 2025.

* Running from the config.yaml file

  Five pipelines, one for data generation, three for individual model training, and one for full workflow evaluation, are set up in the config.yaml file.

- Alex Krochak, FZJ
1 parent 33f0385 commit 5526d30

File tree: 19 files changed, +4790 −5157 lines changed

.github/workflows/pytest.yml

Lines changed: 5 additions & 5 deletions
@@ -21,7 +21,7 @@ jobs:
           remove-android: true
           remove-haskell: true
           remove-codeql: true
-
+
       - uses: actions/checkout@v4

       - name: Move Docker directory
@@ -30,7 +30,7 @@ jobs:
           sudo mv /var/lib/docker /docker/ &&
           sudo ln -s /docker/docker /var/lib/docker &&
           sudo systemctl restart docker
-
+
       # Run tests with pytest in a container
       - name: Run Integration Test (development pipeline)
         uses: dagger/dagger-for-github@v7
@@ -42,7 +42,7 @@ jobs:
             --context ..
             --dockerfile ../env-files/torch/skinny.Dockerfile
             test-local
-            --cmd "pytest,-v,--disable-warnings,-n,logical,/app/tests/,-m,not hpc and not tensorflow"
+            --cmd "pytest,-v,--disable-warnings,-n,logical,/app/tests/,--dist,loadfile,-m,not hpc and not tensorflow"
             logs
         cloud-token: ${{ secrets.DAGGER_CLOUD_TOKEN }}
         version: "0.18.0"
@@ -61,14 +61,14 @@ jobs:
       # - name: Make PyTorch virtualenv
       #   shell: bash -l {0}
       #   run: make torch-env-cpu
-
+
       # # Comment this back in to also build tensorflow env
       # # - name: Make Tensorflow virtualenv
       # #   shell: bash -l {0}
       # #   run: make tensorflow-env-cpu

       # # NOTE, to change the name of the env in which tests are run, set custom TORCH_ENV
-      # # and TF_ENV env variables. Default environment names are ".venv-pytorch" and
+      # # and TF_ENV env variables. Default environment names are ".venv-pytorch" and
       # # ".venv-tf"

       # - name: Run pytest for workflows

.gitignore

Lines changed: 0 additions & 1 deletion
@@ -28,7 +28,6 @@ mnist-sample-data/
 exp_data/
 mnist_dataset/

-
 # Kubernetes
 secret*.yaml


docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -100,6 +100,7 @@ contains thoroughly tested features aligned with the toolkit's most recent relea
    use-cases/cyclones_doc
    use-cases/mnist_doc
    use-cases/xtclim_doc
+   use-cases/radio-astronomy
    use-cases/latticeqcd_doc

 .. toctree::

docs/use-cases/radio-astronomy.rst

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@ (new file)

Pulsar Segmentation and Analysis for Radio-Astronomy (HTW Berlin)
=================================================================

The code is adapted from
`this repository <https://gitlab.com/ml-ppa/pulsarrfi_nn/-/tree/version_0.2/unet_semantic_segmentation?ref_type=heads>`_.
Please visit the original repository for more technical information on the code.
This use case features a sophisticated pipeline composed of a few neural networks.

Integration Author: Oleksandr Krochak, FZJ
Environment Management
----------------------

It is recommended to use the uv environment for running this pipeline.
An overview of the itwinai-wide module dependencies can be found in `intertwin/pyproject.toml`.
Running `uv sync --extra devel --extra torch --extra radio-astronomy` generates or updates the
uv lockfile, which ensures that the correct dependencies are installed. If you want to change
use-case-specific dependencies, please do so in the radio-astronomy section of pyproject.toml,
then re-run `uv sync` with the same flags.

Alternatively, you can install the required dependencies from the use-case directory:
`pip install -r requirements.txt`
Running from a configuration file
---------------------------------

You can run the full pipeline sequence by executing the following commands locally.
itwinai will read these commands from the `config.yaml` file in the root of the repository.

1. Generate the synthetic data: `itwinai exec-pipeline +pipe_key=syndata_pipeline`
2. Initialize and train a UNet model: `itwinai exec-pipeline +pipe_key=unet_pipeline`
3. Initialize and train a FilterCNN model: `itwinai exec-pipeline +pipe_key=fcnn_pipeline`
4. Initialize and train a CNN1D model: `itwinai exec-pipeline +pipe_key=cnn1d_pipeline`
5. Compile the full pipeline and test it: `itwinai exec-pipeline +pipe_key=evaluate_pipeline`

When running on HPC, you can use the `batch.sh` SLURM script to run these commands.

Logging with MLflow
-------------------

By default, the `config.yaml` ensures that MLflow logging is enabled during training.
During or after the run, you can launch an MLflow server by executing
`mlflow server --backend-store-uri mllogs/mlflow` and connecting to `http://127.0.0.1:5000/`
in your browser.
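For convenience, the MLflow launch command above can be wrapped in a small helper. This is a minimal sketch: the defaults mirror the documented command, and `--host`/`--port` are standard `mlflow server` flags; the `mlflow_server_cmd` function itself is an illustrative addition:

```python
import subprocess


def mlflow_server_cmd(
    backend_store_uri: str = "mllogs/mlflow",
    host: str = "127.0.0.1",
    port: int = 5000,
) -> list[str]:
    """Build the CLI invocation for a local MLflow tracking server."""
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store_uri,
        "--host", host,
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Blocks until the server is stopped; then browse to http://127.0.0.1:5000/.
    subprocess.run(mlflow_server_cmd(), check=True)
```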

Test suite
----------

The test suite is located in the `tests/use-cases/radio-astronomy` folder. It contains
integration tests for each of the pipelines 1-5 listed above.

Before running the test suite, make sure that the `torch_env()` pytest fixture in
`tests/use-cases/radio-astronomy/test_radio-astronomy.py` is correctly defined and points
to the virtual environment where itwinai is installed on your system.

The configuration and execution of the test suite are defined in
`tests/use-cases/radio-astronomy/test_radio-astronomy.py` and in the configuration file in
the use-case repository: `use-cases/radio-astronomy/.config-test.yaml`.
If you are updating the test suite, make sure you update both of these files.

Feel free to change the pytest markers as needed, but be careful with pushing these changes.
Tests should be able to run in an isolated environment.
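The `torch_env()` fixture mentioned above resolves the environment path from the `TORCH_ENV` variable. Its lookup logic can be sketched as a plain function (the `default` parameter is an illustrative addition):

```python
import os
from pathlib import Path


def resolve_torch_env(default: str = "./.venv") -> str:
    """Return the absolute path of the torch virtual environment.

    Honours the TORCH_ENV environment variable, falling back to `default`.
    """
    env_path = Path(os.environ.get("TORCH_ENV", default))
    return str(env_path.resolve())
```

Exporting `TORCH_ENV=/path/to/venv` before invoking pytest therefore redirects all test subprocesses to that environment.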

pyproject.toml

Lines changed: 22 additions & 3 deletions
@@ -6,6 +6,7 @@
 requires = ["setuptools", "setuptools-scm", "wheel"]
 build-backend = "setuptools.build_meta"

+
 [project]
 name = "itwinai"
 version = "0.3.1"
@@ -54,6 +55,20 @@ dependencies = [
 # Has to be added as an optional dependency, when installed in non-AMD environments, the torch
 # import will fail as it tries to find the related AMD libs of amdsmi.
 amd = ["amdsmi>=6.4.0"]
+
+# dependencies that are not included by dev or torch
+# but needed for radio-astronomy.
+radio-astronomy = [
+    "pulsarrfi-nn @ git+https://gitlab.com/ml-ppa/pulsarrfi_nn.git@version_0.2#subdirectory=unet_semantic_segmentation",
+    "pulsardt @ git+https://gitlab.com/ml-ppa/pulsardt@main",
+    "ipywidgets",
+    "tqdm>=4.65.0",
+    "numpyencoder>=0.3.0",
+    "pyquaternion>=0.9.9",
+    "scikit-image>=0.22.0",
+    "pyqt6>=6.0",
+]
+
 torch = [
     "torch==2.4.*",
     "lightning>=2",
@@ -139,16 +154,20 @@ conflicts = [[{ extra = "tf-cuda" }, { extra = "torch" }]]
 # Use PyTorch with CUDA for anything that is not macos
 [tool.uv.sources]
 torch = [{ index = "pytorch-cu121", marker = "platform_system != 'Darwin'" }]
-torchvision = [
-    { index = "pytorch-cu121", marker = "platform_system != 'Darwin'" },
-]
+torchvision = [{ index = "pytorch-cu121", marker = "platform_system != 'Darwin'" }]
+pulsardt = [{ index = "pulsar-dt"}]

 # Specific index for pytorch
 [[tool.uv.index]]
 name = "pytorch-cu121"
 url = "https://download.pytorch.org/whl/cu121"
 explicit = true

+[[tool.uv.index]]
+name = "pulsar-dt"
+url = "https://gitlab.com/api/v4/projects/59840702/packages/pypi/simple"
+explicit = true
+
 # Ruff configuration: https://docs.astral.sh/ruff/configuration/
 [tool.ruff]
 line-length = 95

tests/loggers/test_prov4ml.py

Lines changed: 1 addition & 1 deletion
@@ -201,4 +201,4 @@ def test_log_prov_documents(logger_instance, mlflow_run):

     log_prov_documents.assert_called_once_with(create_graph=True, create_svg=True)
     mlflow_log_artifact.assert_any_call("doc1")
-    mlflow_log_artifact.assert_any_call("doc2")
\ No newline at end of file
+    mlflow_log_artifact.assert_any_call("doc2")
use-cases/radio-astronomy/tests/test_radio-astronomy.py

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@ (new file)

# --------------------------------------------------------------------------------------
# Part of the interTwin Project: https://www.intertwin.eu/
#
# Created by: Alex Krochak
#
# Credit:
# - Alex Krochak <o.krochak@fz-juelich.de> - FZJ
# --------------------------------------------------------------------------------------

"""Tests for the radio-astronomy use case.

Intended to be integration tests, to make sure that updates in the code base
do not break the use case's workflows.

This is meant to be run from the main itwinai directory, not the use-case folder:
"pytest use-cases/radio-astronomy/tests/test_radio-astronomy.py"

NOTE FOR DEVELOPERS: if you are editing this file, make sure that entries in
use-cases/radio-astronomy/.config-test.yaml are updated accordingly!
"""

import os
import shutil
import subprocess
from pathlib import Path

import pytest

USECASE_FOLDER = Path("use-cases", "radio-astronomy").resolve()


@pytest.fixture
def torch_env() -> str:
    """Return the absolute path to the torch virtual environment."""
    env_path = Path(os.environ.get("TORCH_ENV", "./.venv"))
    return str(env_path.resolve())


@pytest.fixture
def syndata(tmp_path, torch_env, install_requirements):
    """Implicitly test the synthetic data generation pipeline."""
    install_requirements(USECASE_FOLDER, torch_env)

    cmd_data = (
        f"{torch_env}/bin/itwinai exec-pipeline --config-name .config-test "
        f"+pipe_key=syndata_pipeline ++syndata_test_dir={tmp_path}/ "
    )
    if len(os.listdir(tmp_path)) == 0:  # only run if the directory is empty
        # Copy the necessary files to the temporary directory for testing
        shutil.copy(USECASE_FOLDER / ".config-test.yaml", tmp_path)
        shutil.copy(USECASE_FOLDER / "data.py", tmp_path)
        shutil.copy(USECASE_FOLDER / "trainer.py", tmp_path)

        subprocess.run(cmd_data.split(), check=True, cwd=tmp_path)

    return tmp_path


@pytest.fixture
def generate_unet(torch_env, syndata):
    """Generate the U-Net model for the Filter-CNN test."""
    cmd = (
        f"{torch_env}/bin/itwinai exec-pipeline --config-name .config-test "
        f"+pipe_key=unet_pipeline ++image_directory={syndata}/ ++mask_directory={syndata}/ "
    )
    subprocess.run(cmd.split(), check=True, cwd=syndata)


# @pytest.mark.skip(reason="dependent on .test_dataset, incorporated into integration test")
def test_radio_astronomy_unet(torch_env, syndata, install_requirements):
    """Test the U-Net Pulsar-DDT trainer by running it end-to-end
    via the .config-test.yaml configuration file."""
    install_requirements(USECASE_FOLDER, torch_env)

    cmd = (
        f"{torch_env}/bin/itwinai exec-pipeline --config-name .config-test "
        f"+pipe_key=unet_pipeline ++image_directory={syndata}/ ++mask_directory={syndata}/ "
    )
    subprocess.run(cmd.split(), check=True, cwd=syndata)


@pytest.mark.functional
def test_radio_astronomy_filtercnn(torch_env, syndata, generate_unet, install_requirements):
    """Test the Filter-CNN Pulsar-DDT trainer by running it end-to-end
    via the .config-test.yaml configuration file. Requires the U-Net model to be present."""
    install_requirements(USECASE_FOLDER, torch_env)

    cmd = (
        f"{torch_env}/bin/itwinai exec-pipeline --config-name .config-test "
        f"+pipe_key=fcnn_pipeline ++image_directory={syndata}/ ++mask_directory={syndata}/ "
    )
    subprocess.run(cmd.split(), check=True, cwd=syndata)


def test_radio_astronomy_cnn1d(torch_env, syndata, install_requirements):
    """Test the CNN-1D Pulsar-DDT trainer by running it end-to-end
    via the .config-test.yaml configuration file."""
    install_requirements(USECASE_FOLDER, torch_env)

    cmd = (
        f"{torch_env}/bin/itwinai exec-pipeline --config-name .config-test "
        f"+pipe_key=cnn1d_pipeline ++image_directory={syndata}/ ++mask_directory={syndata}/ "
    )
    subprocess.run(cmd.split(), check=True, cwd=syndata)


@pytest.mark.skip(reason="dependent on large real data set")
def test_radio_astronomy_evaluate(torch_env):
    """Test the evaluate pipeline by running it end-to-end
    via the .config-test.yaml configuration file."""
    cmd = (
        f"{torch_env}/bin/itwinai exec-pipeline "
        f"--config-name .config-test "
        f"+pipe_key=evaluate_pipeline "
    )
    # Run the pipeline and check file generation in the use-case folder
    subprocess.run(cmd.split(), check=True, cwd=USECASE_FOLDER)
    # Clean up the use-case folder
    subprocess.run("./.pytest-clean", check=True, cwd=USECASE_FOLDER)
