Commit 6344b88

Use pooch (#513)
### Pull Request Checklist:

- [x] This PR addresses an already opened issue (for bug fixes / features)
  - This PR fixes #230
- [x] (If applicable) Documentation has been added / updated (for bug fixes / features).
- [x] (If applicable) Tests have been added.
- [x] CHANGELOG.rst has been updated (with summary of main changes).
  - [x] Link to issue (:issue:`number`) and pull request (:pull:`number`) has been added.

### What kind of change does this PR introduce?

* Replaces the older, cobbled-together testing data management system with one based on `pooch`.
* Updates several environment variables to use a more consistent naming system, used by similar projects.
* The `pooch` helper function (called "yangtze"; not married to the name) can now be used to `fetch()` local or remote datasets.
* The `main.yml` workflow now caches testing data to reduce redundant download requests from the various builds.
* A new workflow (`testdata-version.yml`) has been adapted to ensure that the `raven-testdata` tag is kept up-to-date.

### Does this PR introduce a breaking change?

Absolutely. The testing data management system is completely changed, and environment variables have been modified. The new system uses a `pooch` registry that validates file checksums when `pytest` is called, which is much faster than checking remote servers. The downside is that changes need to be ported to RavenPy whenever the testing data changes (the `testdata-version.yml` workflow will help with this). `SKIP_TEST_DATA` is now unused and obsolete.

### Other information:

The changes here will break the current `raven-testdata` structure (for the better).

The migration will need to happen progressively, as multiple projects (and PAVICS) rely on the current `raven-testdata` structure, so changes will proceed as follows:

- `master` remains the default branch and is now fixed.
- `main` is currently a copy of `master`.
- `new-system` will be merged to `main` and tagged using a calendar-based versioning system.
- The new tagged version of `raven-testdata` will be set in `main.yml::env::RAVEN_TESTDATA_BRANCH` and in `raven.testing.utils.default_testdata_version`.

Then, after changes have been made to all repositories, `main` will become the new default branch.

I'm not certain if we need to inform users, but the default `RavenPy` behaviour will be to refer to `main` and `vYYYY.MM.DD`, set by us.

See also: Ouranosinc/raven-testdata#43
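
For readers unfamiliar with registry-based validation, here is a minimal sketch of the idea behind the checksum check mentioned above: each file name is paired with a recorded hash, and a cached copy is trusted only if its hash matches. It uses the standard library only and is not `pooch`'s actual implementation; the helper names (`sha256_of`, `verify_against_registry`) and the demo file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_registry(cache_dir: Path, registry: dict) -> dict:
    """Map each registered name to True if the cached copy matches its recorded hash."""
    results = {}
    for name, expected in registry.items():
        candidate = cache_dir / name
        results[name] = candidate.is_file() and sha256_of(candidate) == expected
    return results


# Hypothetical demo data: one intact file, one altered file, one never downloaded.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    (cache / "good.nc").write_bytes(b"streamflow")
    (cache / "stale.nc").write_bytes(b"old contents")
    demo_registry = {
        "good.nc": hashlib.sha256(b"streamflow").hexdigest(),
        "stale.nc": hashlib.sha256(b"new contents").hexdigest(),
        "missing.nc": hashlib.sha256(b"never downloaded").hexdigest(),
    }
    demo_results = verify_against_registry(cache, demo_registry)
```

Because the check is purely local, no request to a remote server is needed for files that are already cached and intact; only the mismatching or missing entries would have to be downloaded again.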
2 parents 1a119bf + dc35082 commit 6344b88

60 files changed: +1507 -1161 lines changed
.flake8

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ ignore =
     D,
     E,
     F,
+    RST210,
     W503
 per-file-ignores =
 rst-roles =

.github/workflows/main.yml

Lines changed: 45 additions & 38 deletions
@@ -7,7 +7,7 @@ on:
   pull_request:

 env:
-  RAVEN_TESTING_DATA_BRANCH: master
+  RAVEN_TESTDATA_BRANCH: v2025.6.12

 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
@@ -70,27 +70,12 @@ jobs:
       - name: Harden Runner
         uses: step-security/harden-runner@6c439dc8bdf85cadbbce9ed30d1c7b959517bc49 # v2.12.2
         with:
-          egress-policy: block
-          allowed-endpoints: >
-            api.github.com:443
-            azure.archive.ubuntu.com:80
-            coveralls.io:443
-            esm.ubuntu.com:443
-            files.pythonhosted.org:443
-            github.com:443
-            motd.ubuntu.com:443
-            objects.githubusercontent.com:443
-            packages.microsoft.com:443
-            pavics.ouranos.ca:443
-            pypi.org:443
-            raw.githubusercontent.com:443
-            test.opendap.org:80
-
+          disable-sudo: false
+          egress-policy: audit
       - name: Checkout Repository
         uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
         with:
           persist-credentials: false
-
       - name: Set up Python${{ matrix.python-version }}
         uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
         with:
@@ -117,14 +102,32 @@ jobs:
       - name: Install CI libraries
         run: |
           python3 -m pip install --require-hashes -r CI/requirements_ci.txt
+
+      - name: Environment caching (macOS)
+        if: matrix.os == 'macos-latest'
+        uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
+        with:
+          path: |
+            .tox
+            ~/Library/Caches/raven-testdata
+          key: ${{ hashFiles('src/ravenpy/testing/registry.txt') }}-${{ env.RAVEN_TESTDATA_BRANCH }}-${{ matrix.os }}
+      - name: Environment caching (Ubuntu)
+        if: matrix.os == 'ubuntu-latest'
+        uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
+        with:
+          path: |
+            .tox
+            ~/.cache/raven-testdata
+          key: ${{ hashFiles('src/ravenpy/testing/registry.txt') }}-${{ env.RAVEN_TESTDATA_BRANCH }}-${{ matrix.os }}
+
       - name: Test with tox and report coverage
         run: |
           if [ "${{ matrix.tox-env }}" != "false" ]; then
-            python3 -m tox -e ${{ matrix.tox-env }}
+            python3 -m tox -e ${{ matrix.tox-env }}-prefetch
           elif [ "${{ matrix.python-version }}" != "3.13" ]; then
-            python3 -m tox -e py${{ matrix.python-version }}-coverage
+            python3 -m tox -e py${{ matrix.python-version }}-prefetch-coverage
           else
-            python3 -m tox -e py${{ matrix.python-version }}
+            python3 -m tox -e py${{ matrix.python-version }}-prefetch
           fi
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -151,18 +154,7 @@ jobs:
         uses: step-security/harden-runner@6c439dc8bdf85cadbbce9ed30d1c7b959517bc49 # v2.12.2
         with:
           disable-sudo: true
-          egress-policy: block
-          allowed-endpoints: >
-            api.github.com:443
-            conda.anaconda.org:443
-            coveralls.io:443
-            files.pythonhosted.org:443
-            github.com:443
-            objects.githubusercontent.com:443
-            pavics.ouranos.ca:443
-            pypi.org:443
-            raw.githubusercontent.com:443
-            test.opendap.org:80
+          egress-policy: audit
       - name: Checkout Repository
         uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
         with:
@@ -187,6 +179,25 @@ jobs:
         run: |
           micromamba list
           python -m pip check || true
+      - name: Cache test data (macOS)
+        if: matrix.os == 'macos-latest'
+        uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
+        with:
+          path: |
+            ~/Library/Caches/raven-testdata
+          key: ${{ hashFiles('src/ravenpy/testing/registry.txt') }}-${{ env.RAVEN_TESTDATA_BRANCH }}-conda-${{ matrix.os }}
+      - name: Cache test data (Ubuntu)
+        if: matrix.os == 'ubuntu-latest'
+        uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
+        with:
+          path: |
+            ~/.cache/raven-testdata
+          key: ${{ hashFiles('src/ravenpy/testing/registry.txt') }}-${{ env.RAVEN_TESTDATA_BRANCH }}-conda-${{ matrix.os }}
+
+      - name: Prefetch RavenPy test data
+        run: |
+          python -c "import ravenpy.testing.utils as rtu; rtu.populate_testing_data()"
+
       - name: Test RavenPy
         run: |
           python -m pytest --numprocesses=logical --cov=src/ravenpy --cov-report=lcov
@@ -207,11 +218,7 @@ jobs:
         uses: step-security/harden-runner@6c439dc8bdf85cadbbce9ed30d1c7b959517bc49 # v2.12.2
         with:
           disable-sudo: true
-          egress-policy: block
-          allowed-endpoints: >
-            coveralls.io:443
-            github.com:443
-            objects.githubusercontent.com:443
+          egress-policy: audit
       - name: Coveralls Finished
         uses: coverallsapp/github-action@648a8eb78e6d50909eff900e4ec85cab4524a45b # v2.3.6
         with:
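
The cache keys added in `main.yml` combine three things: a hash of `registry.txt`, the pinned `raven-testdata` tag, and the runner OS (plus a `conda` marker in the conda job). The sketch below shows why such a key invalidates the cache whenever the registry or the tag changes. It is only an illustration: `cache_key` and its `flavour` parameter are hypothetical, and a plain SHA-256 of the registry text stands in for GitHub's `hashFiles()`, whose exact digest is not reproduced here.

```python
import hashlib


def cache_key(registry_text: str, testdata_tag: str, runner_os: str, flavour: str = "") -> str:
    """Build a '<registry hash>-<tag>[-<flavour>]-<os>' key, mirroring the workflow layout."""
    # Stand-in for hashFiles(): any change to the registry text changes this digest.
    registry_hash = hashlib.sha256(registry_text.encode("utf-8")).hexdigest()
    parts = [registry_hash, testdata_tag]
    if flavour:
        parts.append(flavour)
    parts.append(runner_os)
    return "-".join(parts)


# Hypothetical registry contents; only the tag value comes from the workflow above.
old_registry = "notebook_inputs/input.geojson sha256:aaaa\n"
new_registry = "notebook_inputs/input.geojson sha256:bbbb\n"

pip_key = cache_key(old_registry, "v2025.6.12", "ubuntu-latest")
conda_key = cache_key(old_registry, "v2025.6.12", "ubuntu-latest", flavour="conda")
changed_key = cache_key(new_registry, "v2025.6.12", "ubuntu-latest")
```

With this layout, the pip and conda jobs never share a cache entry, and editing a single checksum in the registry is enough to force a fresh download on the next run.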

.github/workflows/testdata-version.yml

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
+name: Verify Testing Data
+
+on:
+  pull_request:
+    types:
+      - opened
+      - reopened
+      - synchronize
+    paths:
+      - .github/workflows/main.yml
+
+permissions:
+  contents: read
+
+jobs:
+  use-latest-tag:
+    name: Check Latest raven-testdata Tag
+    runs-on: ubuntu-latest
+    if: |
+      (github.event.pull_request.head.repo.full_name == github.event.pull_request.base.repo.full_name)
+    permissions:
+      pull-requests: write
+    steps:
+      - name: Harden Runner
+        uses: step-security/harden-runner@0634a2670c59f64b4a01f0f96f84700a4088b9f0 # v2.12.0
+        with:
+          disable-sudo: true
+          egress-policy: block
+          allowed-endpoints: >
+            api.github.com:443
+            github.com:443
+      - name: Checkout Repository
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          persist-credentials: false
+      - name: Find raven-testdata Tag and CI Testing Branch
+        run: |
+          RAVEN_TESTDATA_TAG="$( \
+          git -c 'versionsort.suffix=-' \
+          ls-remote --exit-code --refs --sort='version:refname' --tags https://github.com/Ouranosinc/raven-testdata '*.*.*' \
+          | tail --lines=1 \
+          | cut --delimiter='/' --fields=3)"
+          echo "RAVEN_TESTDATA_TAG=${RAVEN_TESTDATA_TAG}" >> $GITHUB_ENV
+          RAVEN_TESTDATA_BRANCH="$(grep -E "RAVEN_TESTDATA_BRANCH" .github/workflows/main.yml | cut -d ' ' -f4)"
+          echo "RAVEN_TESTDATA_BRANCH=${RAVEN_TESTDATA_BRANCH}" >> $GITHUB_ENV
+      - name: Report Versions Found
+        run: |
+          echo "Latest raven-testdata tag: ${RAVEN_TESTDATA_TAG}"
+          echo "Tag for raven-testdata in CI: ${RAVEN_TESTDATA_BRANCH}"
+        env:
+          RAVEN_TESTDATA_TAG: ${{ env.RAVEN_TESTDATA_TAG }}
+          RAVEN_TESTDATA_BRANCH: ${{ env.RAVEN_TESTDATA_BRANCH }}
+      - name: Find Comment
+        uses: peter-evans/find-comment@3eae4d37986fb5a8592848f6a574fdf654e61f9e # v3.1.0
+        id: fc
+        with:
+          issue-number: ${{ github.event.pull_request.number }}
+          comment-author: 'github-actions[bot]'
+          body-includes: It appears that this Pull Request modifies the `main.yml` workflow.
+      - name: Compare Versions
+        if: ${{( env.RAVEN_TESTDATA_TAG != env.RAVEN_TESTDATA_BRANCH )}}
+        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.0.1
+        with:
+          script: |
+            core.setFailed('Configured `raven-testdata` tag is not `latest`.')
+      - name: Update Failure Comment
+        if: ${{ failure() }}
+        uses: peter-evans/create-or-update-comment@71345be0265236311c031f5c7866368bd1eff043 # v4.0.0
+        with:
+          comment-id: ${{ steps.fc.outputs.comment-id }}
+          issue-number: ${{ github.event.pull_request.number }}
+          body: |
+            > [!WARNING]
+            > It appears that this Pull Request modifies the `main.yml` workflow.
+
+            On inspection, it seems that the `RAVEN_TESTDATA_BRANCH` environment variable is set to a tag that is not the latest in the `Ouranosinc/raven-testdata` repository.
+
+            This value must match the most recent tag (`${{ env.RAVEN_TESTDATA_TAG }}`) in order to merge this Pull Request.
+
+            If this PR depends on changes in a new testing dataset branch, be sure to tag a new version of `Ouranosinc/raven-testdata` once your changes have been merged to its `main` branch.
+          edit-mode: replace
+      - name: Update Success Comment
+        if: ${{ success() }}
+        uses: peter-evans/create-or-update-comment@71345be0265236311c031f5c7866368bd1eff043 # v4.0.0
+        with:
+          comment-id: ${{ steps.fc.outputs.comment-id }}
+          issue-number: ${{ github.event.pull_request.number }}
+          body: |
+            > [!NOTE]
+            > It appears that this Pull Request modifies the `main.yml` workflow.
+
+            On inspection, the `RAVEN_TESTDATA_BRANCH` environment variable is set to the most recent tag (`${{ env.RAVEN_TESTDATA_TAG }}`).
+
+            No further action is required.
+          edit-mode: replace
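
The tag check in this workflow boils down to two steps: pick the highest `X.Y.Z` tag from the `git ls-remote` listing, then compare it with the value pinned in `main.yml`. A rough Python equivalent is sketched below, assuming purely numeric calendar-style tags with an optional `v` prefix. The function names and sample listing are hypothetical, and git's own `version:refname` ordering handles suffixes that this sketch ignores.

```python
import re

# Matches refs such as "refs/tags/v2025.6.12" and captures the three numeric parts.
TAG_PATTERN = re.compile(r"refs/tags/(v?(\d+)\.(\d+)\.(\d+))$")


def latest_tag(ls_remote_output: str):
    """Return the highest X.Y.Z tag found in `git ls-remote --tags --refs` output, or None."""
    candidates = []
    for line in ls_remote_output.splitlines():
        match = TAG_PATTERN.search(line.strip())
        if match:
            name, *parts = match.groups()
            # Compare numerically so that 6.12 sorts after 6.9.
            candidates.append((tuple(int(part) for part in parts), name))
    return max(candidates)[1] if candidates else None


def is_up_to_date(configured: str, ls_remote_output: str) -> bool:
    """True when the configured tag equals the most recent remote tag."""
    return configured == latest_tag(ls_remote_output)


# Hypothetical listing; the object hashes are placeholders.
sample_output = "\n".join(
    [
        "0" * 40 + "\trefs/tags/v2024.12.31",
        "1" * 40 + "\trefs/tags/v2025.6.9",
        "2" * 40 + "\trefs/tags/v2025.6.12",
    ]
)
```

The numeric comparison matters for calendar versions: a plain string sort would rank `v2025.6.9` above `v2025.6.12`.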

.pre-commit-config.yaml

Lines changed: 4 additions & 3 deletions
@@ -7,6 +7,7 @@ repos:
     hooks:
       - id: pyupgrade
         args: [ '--py39-plus' ]
+        exclude: ^tests/conftest\.py$
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v5.0.0
     hooks:
@@ -52,7 +53,7 @@ repos:
   - repo: https://github.com/astral-sh/ruff-pre-commit
     rev: v0.12.2
     hooks:
-      - id: ruff
+      - id: ruff-check
         args: [ '--fix' ]
       # - id: ruff-format
   - repo: https://github.com/pycqa/flake8
@@ -74,11 +75,11 @@ repos:
     hooks:
       - id: nbqa-pyupgrade
         args: [ '--py39-plus' ]
-        additional_dependencies: [ 'pyupgrade==3.19.1' ]
+        additional_dependencies: [ 'pyupgrade==3.20.0' ]
       - id: nbqa-black
         additional_dependencies: [ 'black==25.1.0' ]
       - id: nbqa-isort
-        additional_dependencies: [ 'isort==6.0.0' ]
+        additional_dependencies: [ 'isort==6.0.1' ]
   - repo: https://github.com/kynan/nbstripout
     rev: 0.8.1
     hooks:

CHANGELOG.rst

Lines changed: 10 additions & 1 deletion
@@ -7,7 +7,15 @@ v0.18.3 (unreleased)

 New features
 ^^^^^^^^^^^^
-* Added `parsers.parse_rv` to extract a Command value from an RV file.
+* Added `parsers.parse_rv` to extract a Command value from an RV file. (PR #503)
+* New module `ravenpy.testing` has been added to provide utility functions and support for testing and testing data management. (PR #513)
+
+Breaking changes
+^^^^^^^^^^^^^^^^
+* `ravenpy` now requires `pooch>=1.8.0` for downloading and caching remote testing data. (PR #513)
+* `ravenpy.utilities.testdata` has been refactored to new module `ravenpy.testing`. The `publish_release_notes` function is now located in `ravenpy.utilities.publishing`. (PR #513)
+* The `ravenpy.testing.utils` module now provides a `yangtze()` class for fetching and caching the `raven-testdata` testing data. A convenience function (`get_file`) replaces the previous `get_local_testdata`. (PR #513)
+* The `ravenpy.testing.utils.open_dataset` function no longer supports OPeNDAP URLs or local file paths. Instead, it uses the `yangtze()` class to fetch datasets from the testing data repository or the local cache. Users should now use `xarray.open_dataset()` directly for OPeNDAP URLs or local files. (PR #513)

 Bug fixes
 ^^^^^^^^^
@@ -18,6 +26,7 @@ Bug fixes
 Internal changes
 ^^^^^^^^^^^^^^^^
 * `ravenpy` now requires `xclim>=0.57.0` and `xsdba` (v0.4.0+). (PR #511)
+* The `tests` folder no longer contains an `__init__.py` file and is no longer treated as a package. `pytest` fixtures from `emulators.py` are now directly imported into `conftest.py` for use in tests, and existing `pytest` fixtures have been modified to use the new `yangtze()` class for fetching testing data. (PR #513)

 v0.18.2 (2025-05-05)
 --------------------

docs/notebooks/02_Extract_geographical_watershed_properties.ipynb

Lines changed: 3 additions & 4 deletions
@@ -37,10 +37,10 @@
     "import matplotlib.pyplot as plt\n",
     "import numpy as np\n",
     "import rasterio\n",
-    "import rioxarray as rio\n",
     "from birdy import WPSClient\n",
     "\n",
-    "from ravenpy.utilities.testdata import get_file\n",
+    "# Utility that simplifies working with test data hosted on GitHub\n",
+    "from ravenpy.testing.utils import get_file\n",
     "\n",
     "# This is the URL of the Geoserver that will perform the computations for us.\n",
     "url = os.environ.get(\n",
@@ -74,8 +74,7 @@
     "\"\"\"\n",
     "feature_url = \"input.geojson\"\n",
     "\"\"\"\n",
-    "# However, to keep things tidy, we have also prepared a version that can be accessed easily for\n",
-    "# demonstration purposes:\n",
+    "# However, to keep things tidy, we have also prepared a version that can be accessed easily for demonstration purposes:\n",
     "feature_url = get_file(\"notebook_inputs/input.geojson\")\n",
     "df = gpd.read_file(feature_url)\n",
     "display(df)\n",

docs/notebooks/03_Extracting_forcing_data.ipynb

Lines changed: 2 additions & 1 deletion
@@ -31,7 +31,8 @@
     "import xarray as xr\n",
     "from clisops.core import subset\n",
     "\n",
-    "from ravenpy.utilities.testdata import get_file"
+    "# Utility that simplifies working with test data hosted on GitHub\n",
+    "from ravenpy.testing.utils import get_file"
    ]
   },
   {

docs/notebooks/04_Emulating_hydrological_models.ipynb

Lines changed: 3 additions & 1 deletion
@@ -43,7 +43,9 @@
     "from pathlib import Path\n",
     "\n",
     "from ravenpy.config import commands as rc\n",
-    "from ravenpy.utilities.testdata import get_file"
+    "\n",
+    "# Utility that simplifies fetching and caching test data hosted on GitHub\n",
+    "from ravenpy.testing.utils import get_file"
    ]
   },
   {

docs/notebooks/05_Advanced_RavenPy_configuration.ipynb

Lines changed: 3 additions & 3 deletions
@@ -26,8 +26,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Utility that simplifies getting data hosted on the remote PAVICS-Hydro data server.\n",
-    "from ravenpy.utilities.testdata import get_file"
+    "# Utility that simplifies fetching and caching data hosted on GitHub\n",
+    "from ravenpy.testing.utils import get_file"
    ]
   },
   {
@@ -216,7 +216,7 @@
     "\n",
     "# Observed weather data for the Salmon river. We extracted this using Tutorial Notebook 03 and the\n",
     "# salmon_river.geojson file as the contour.\n",
-    "ts = get_file(\"notebook_inputs/ERA5_weather_data_Salmon.nc\")\n",
+    "ts = yangtze.fetch(\"notebook_inputs/ERA5_weather_data_Salmon.nc\")\n",
     "\n",
     "# Set alternate variable names in the timeseries data file\n",
     "alt_names = {\n",

docs/notebooks/06_Raven_calibration.ipynb

Lines changed: 4 additions & 3 deletions
@@ -33,6 +33,9 @@
     "\n",
     "from ravenpy.config import commands as rc\n",
     "from ravenpy.config import emulators\n",
+    "\n",
+    "# Utility that simplifies working with test data hosted on GitHub\n",
+    "from ravenpy.testing.utils import get_file\n",
     "from ravenpy.utilities.calibration import SpotSetup"
    ]
   },
@@ -52,9 +55,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from ravenpy.utilities.testdata import get_file\n",
-    "\n",
-    "# We get the netCDF for testing on a server. You can replace the getfile method by a string containing the path to your own netCDF\n",
+    "# We get the netCDF for testing on a server. You can replace the yangtze method with a string containing the absolute or relative path to your own netCDF\n",
     "nc_file = get_file(\n",
     "    \"raven-gr4j-cemaneige/Salmon-River-Near-Prince-George_meteo_daily.nc\"\n",
     ")\n",
