-
Notifications
You must be signed in to change notification settings - Fork 5
Use pooch #513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use pooch #513
Conversation
…reformatting, ignore RST210
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
One of the major issues I'm having is that the testing data comprises lots of large files. Around 45 MB are used in the regular testing of Once I've finalized a version of |
Note It appears that this Pull Request modifies the On inspection, the No further action is required. |
I was running into a race condition where multiple workers were attempting to download and write the remote registry file at the same time. I've adjusted the logic to make better use of lock files so that we don't run into these situations. These changes could/should be ported to other implementations this system ( |
### Description The modifications here completely change the way that testing data is managed in order to provide a versioning system as well as reduce the number of remote requests to GitHub. Projects that rely on `raven-testdata` will now use `pooch` alongside file registries to track the file and checksums, as well as manage the caches. This repo will also now be using tagged commits to ensure that testing data is kept up-to-date while at the same time facilitating easier backtracking if needed. ### Changes - All `md5` files have been removed. - The `README.md` has been updated to detail the new process for fetching data with `pooch` - The `report_check_sums.py` script now creates new registries as well as updates the registry table in the `README.md` - The return characters in the text files have been converted to `LF` to ensure that checksums remain the same when opened on Linux and macOS; lots of redundant whitespace has also been removed from tables. ### What's next Once this PR is approved, `main` will be tagged using calendar versioning as the first stable version. Once all projects reliant on `raven-testdata` have been updated to the new system, the default branch will change from `master` to `main`. The `master` branch will be kept for backwards-compatibility. See also: CSHS-CWRA/RavenPy#513
docs/notebooks/02_Extract_geographical_watershed_properties.ipynb
Outdated
Show resolved
Hide resolved
…es, remove explicit platformdirs installation
I thought we had agreed that the name |
## Overview Changes: * Replaced testdata fetching mechanism with `pooch`-based caching system * Updated tooling to replace a few checks with `ruff`, `Makefile` linting checks are now more precise * Synchronized dependencies and added a few base versions for consistency * Removed obsolete `setup.cfg` * `raven` now uses Trusted Publisher for TestPyPI and PyPI releases ## Related Issue / Discussion This PR is part of an effort to move the `raven-testdata` repo to a versioned scheme that will make it much easier to track changes between major releases of `RavenWPS` and `RavenPy`. It also provides much more thread-safety and prevents needless requests to GitHub when running `pytest`. I'm pinning things so that this should be the last release to support Python 3.9. We should drop that version in the next significant release (if we do another one). ## Additional Information CSHS-CWRA/RavenPy#513 Ouranosinc/raven-testdata#43
Pull Request Checklist:
number
) and pull request (:pull:number
) has been added.What kind of change does this PR introduce?
pooch
pooch
helper function (called "yangtze"; not married to the name) can now be used tofetch()
local or remote datasets.main.yml
workflow now uses caching for testing data to reduce the synonymous download requests from the various builds.testdata-version.yml
) has been adapted to ensure that theraven-testdata
tag is kept up-to-date.Does this PR introduce a breaking change?
Absolutely. The testing data management system is completely changed, and environment variables have been modified. The new system uses a
pooch
registry that will validate that files have the proper checksum onpytest
call, which is much faster than checking remote servers. The downside is that changes need to be ported to RavenPy when testing data is changed (thetestdata-version.yml
workflow will help with this).SKIP_TEST_DATA
is now unused and obsolete.Other information:
The changes here will break the current
raven-testdata
structure (for the better). The migration will need to happen progressively as multiple projects (and PAVICS) are reliant on the currentraven-testdata
structure, so changes will proceed as follows:master
remains the default branch and is now fixed.main
is currently a copy of master.new-system
will be merged tomain
and tagged using a calendar-based versioning system.raven-testdata
will be set inmain.yml::env::RAVEN_TESTDATA_BRANCH
and inraven.testing.utils.default_testdata_version
Then, after changes have been made to all repositories,
main
will become the new default branch. I'm not certain if we need to inform users, but the defaultRavenPy
behaviour will be to refer tomain
andvYYYY.MM.DD
, set by us.See also: Ouranosinc/raven-testdata#43