MDTF diagnostics in the cloud #137
tsjackson-noaa started this conversation in Ideas
This discussion thread is to capture the work that would be needed to run a cloud-hosted version of the MDTF diagnostics, as suggested by @aradhakrishnanGFDL and @jkrasting. I think this project would yield many benefits and, more to the point, is very feasible given the current capabilities of the code.
The technical caveat is that the catalog, as currently implemented, is populated with objects rather than their serializations, e.g. DateRanges instead of the string '1990-2010'. Additional code will be needed to deserialize catalog entries when they are compared against variable data in the query.
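To make the deserialization step concrete, here is a minimal sketch of turning a serialized entry such as '1990-2010' back into an object before comparing it in a query. Everything named here is an assumption for illustration: `YearRange` is a hypothetical stand-in and not the framework's actual DateRange class, and the `date_range` key and the year-only string format are guesses at what a serialized catalog column might look like.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class YearRange:
    """Hypothetical stand-in for the framework's DateRange (not its real API)."""
    start: int
    end: int

    def contains(self, other: "YearRange") -> bool:
        # True if `other` lies entirely within this range (inclusive bounds).
        return self.start <= other.start and other.end <= self.end


# Assumed serialization: two four-digit years separated by a hyphen.
_RANGE_RE = re.compile(r"^\s*(\d{4})\s*-\s*(\d{4})\s*$")


def parse_year_range(text: str) -> YearRange:
    """Deserialize a string like '1990-2010' into a YearRange."""
    match = _RANGE_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognized date range: {text!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError(f"range start is after range end: {text!r}")
    return YearRange(start, end)


def matching_entries(entries, requested: YearRange):
    """Keep the catalog rows whose deserialized range covers the request.

    `entries` is assumed to be an iterable of dicts with a 'date_range'
    string field, i.e. what a serialized catalog row might look like.
    """
    return [
        entry for entry in entries
        if parse_year_range(entry["date_range"]).contains(requested)
    ]
```

With this in place, a query for 1995-2005 against rows serialized as '1990-2010' and '2000-2010' would keep only the first row. The real implementation would need to handle whatever date precision and formats the catalog actually stores.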
Note that, regardless of input format, the output of the preprocessor would still be a set of netCDF files, as this is what all PODs are currently written to accept. These can be written anywhere on local storage, e.g. to a temp directory, but we may run into storage limitations on the instance. Using Dask to hand off in-memory Datasets from the framework to PODs is definitely a feature we want in the future, but it is beyond the scope of this exercise.
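As a rough illustration of guarding against the storage limitation mentioned above, the sketch below checks free space before creating a scratch directory for preprocessor output. It uses only the Python standard library; the function names, the `mdtf_preproc_` prefix, and the 1.5x safety factor are assumptions made up for this example, not part of the framework.

```python
import shutil
import tempfile
from pathlib import Path


def has_room_for(directory, required_bytes, safety_factor=1.5):
    """Return True if `directory`'s filesystem has enough free space.

    `safety_factor` (an arbitrary illustrative default) pads the estimate,
    since the size of the preprocessed output is only known approximately.
    """
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes >= required_bytes * safety_factor


def make_scratch_dir(required_bytes, parent=None):
    """Create a temp directory for preprocessed files, or fail early.

    `parent` defaults to the system temp location; on a cloud instance it
    could instead point at an attached volume with more capacity.
    """
    parent = Path(parent) if parent is not None else Path(tempfile.gettempdir())
    if not has_room_for(parent, required_bytes):
        raise OSError(f"not enough free space under {parent}")
    return Path(tempfile.mkdtemp(prefix="mdtf_preproc_", dir=parent))
```

Failing before the preprocessor starts writing is cheaper than discovering a full disk partway through a run; the caller is responsible for removing the directory once the PODs have finished.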
For concreteness, and by analogy to what I've seen with Pangeo, my initial proposal would be doing this via a