Conversation

@IvanaEscobar (Contributor) commented Jul 14, 2025

This feature allows the user to read standard pickup files with read_mds.

A standard pickup file contains a mix of 3D and 2D fields (here, three of the eleven fields are 2D); its .meta file is expected to look like, e.g.:

 nDims = [   2 ];
 dimList = [
   192,    1,   192,
    32,    1,   32
 ];
 dataprec = [ 'float64' ];
 nrecords = [   123 ];
 timeStepNumber = [      72000 ];
 nFlds = [   11 ];
 fldList = {
 'Uvel    ' 'GuNm1   ' 'Vvel    ' 'GvNm1   ' 'Theta   ' 'GtNm1   ' 'Salt    ' 'GsNm1   ' 'EtaN    ' 'dEtaHdt ' 'EtaH    '
 };
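
With the matching pickup.0000072000.data file alongside the .meta file, the fields can then be read with read_mds. A minimal sketch, assuming the global_ocean.cs32x15 setup (15 vertical levels, 6*32 x 32 horizontal points):

import numpy as np
from xmitgcm.utils import read_mds

# '>f8' (big-endian float64) matches dataprec = 'float64' above
pk = read_mds('pickup.0000072000', shape=(15, 6*32, 32), dtype=np.dtype('>f8'))
pk['Theta']  # one dask array per entry in fldList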

@mjlosch (Member) commented Aug 7, 2025

@IvanaEscobar Do you think it's possible to add a test for this new functionality?

@IvanaEscobar (Contributor, Author)

Great idea, I'll add it by tomorrow

@IvanaEscobar (Contributor, Author)

tomorrow means today! 😥

Added contents from MITgcm/verification/global_ocean.cs.32x.15/input/pickup.0000072000.meta to a temporary file. This is how the pytest test_utils.py::test_parse_meta works too.

This test avoids having to add a static file to the repository. The CI passes with the new test, too.
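
Roughly, the pattern is the following (a sketch only; the test name and the zero-filled data file are illustrative, assuming pytest's tmp_path fixture and the read_mds interface this PR adds):

import numpy as np
from xmitgcm.utils import read_mds

_META = """ nDims = [   2 ];
 dimList = [
   192,    1,   192,
    32,    1,   32
 ];
 dataprec = [ 'float64' ];
 nrecords = [   123 ];
 timeStepNumber = [      72000 ];
 nFlds = [   11 ];
 fldList = {
 'Uvel    ' 'GuNm1   ' 'Vvel    ' 'GvNm1   ' 'Theta   ' 'GtNm1   ' 'Salt    ' 'GsNm1   ' 'EtaN    ' 'dEtaHdt ' 'EtaH    '
 };
"""

def test_read_mds_pickup(tmp_path):
    # write the .meta contents and a zero-filled .data file of matching size
    # (123 records of 192x32 big-endian float64 values) to a temporary directory
    prefix = str(tmp_path / 'pickup.0000072000')
    with open(prefix + '.meta', 'w') as f:
        f.write(_META)
    np.zeros((123, 192, 32), dtype='>f8').tofile(prefix + '.data')

    pk = read_mds(prefix, shape=(15, 6*32, 32), dtype=np.dtype('>f8'))
    assert set(pk) == {'Uvel', 'GuNm1', 'Vvel', 'GvNm1', 'Theta', 'GtNm1',
                       'Salt', 'GsNm1', 'EtaN', 'dEtaHdt', 'EtaH'}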

@mjlosch (Member) commented Sep 15, 2025

@IvanaEscobar I tried to read a simple pair pickup.0000072000.data/meta but only got empty Data variables. How can I test the functionality?

@IvanaEscobar (Contributor, Author) commented Sep 15, 2025

What function call are you using to check?

Something like the following should work:

import numpy as np
from xmitgcm.utils import read_mds

# nz, ny, nx are the model grid dimensions (vertical levels, then horizontal)
pk = read_mds('pickup.0000072000', shape=(nz, ny, nx), dtype=np.dtype('>f8'))

I updated the PR description, which misled a user into thinking open_mdsdataset should now work with a pickup file. That should be a separate PR. A pickup file contains a mix of 2D and 3D fields, which I don't think xmitgcm.open_mdsdataset can handle (?).

@IvanaEscobar (Contributor, Author)

Let me know if you think open_mdsdataset should read pickup files. It looks as if attempts were made to at least recognize a pickup file; see mds_store.py::_is_pickup_prefix.

To really get the blended data sizes read in would take heavier modifications to how MDS chunks are read, and it would possibly require introducing variable names in variables.py, since pickup files use a unique nomenclature, e.g. Uvel instead of U or UVEL.
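
For illustration, the kind of translation table that would be needed might look like the following (purely hypothetical; nothing like this exists in variables.py yet, and only fields with obvious counterparts are shown):

# hypothetical mapping from pickup-file field names to the standard names
# xmitgcm already knows about; the Gu/Gv/Gt/Gs tendency fields have no counterpart
_PICKUP_TO_STANDARD = {
    'Uvel': 'U',
    'Vvel': 'V',
    'Theta': 'T',
    'Salt': 'S',
    'EtaN': 'Eta',
}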

@mjlosch (Member) commented Sep 16, 2025

Thanks for the clarification, I was indeed trying to use open_mdsdataset.

I am still struggling with read_mds:

from xmitgcm import read_mds
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[1], line 1
----> 1 from xmitgcm import read_mds

ImportError: cannot import name 'read_mds' from 'xmitgcm' (/Users/mlosch/MITgcm/xmitgcm/xmitgcm/__init__.py)

but that's probably a problem at my end. When I import xmitgcm I can use xmitgcm.mds_store.read_mds, but then I get, e.g. for the global_ocean.cs32x15 pickup (which has 15 layers):

In [29]: p=xmitgcm.mds_store.read_mds('pickup.0000072000',shape=(15,6*32,32))
In [30]: p
Out[30]: 
{'Uvel': dask.array<getitem, shape=(32, 192), dtype=>f8, chunksize=(32, 192), chunktype=numpy.ndarray>,
 'GuNm1': dask.array<getitem, shape=(32, 192), dtype=>f8, chunksize=(32, 192), chunktype=numpy.ndarray>,
 'Vvel': dask.array<getitem, shape=(32, 192), dtype=>f8, chunksize=(32, 192), chunktype=numpy.ndarray>,
 'GvNm1': dask.array<getitem, shape=(32, 192), dtype=>f8, chunksize=(32, 192), chunktype=numpy.ndarray>,
 'Theta': dask.array<getitem, shape=(32, 192), dtype=>f8, chunksize=(32, 192), chunktype=numpy.ndarray>,
 'GtNm1': dask.array<getitem, shape=(32, 192), dtype=>f8, chunksize=(32, 192), chunktype=numpy.ndarray>,
 'Salt': dask.array<getitem, shape=(32, 192), dtype=>f8, chunksize=(32, 192), chunktype=numpy.ndarray>,
 'GsNm1': dask.array<getitem, shape=(32, 192), dtype=>f8, chunksize=(32, 192), chunktype=numpy.ndarray>,
 'EtaN': dask.array<getitem, shape=(32, 192), dtype=>f8, chunksize=(32, 192), chunktype=numpy.ndarray>,
 'dEtaHdt': dask.array<getitem, shape=(32, 192), dtype=>f8, chunksize=(32, 192), chunktype=numpy.ndarray>,
 'EtaH': dask.array<getitem, shape=(32, 192), dtype=>f8, chunksize=(32, 192), chunktype=numpy.ndarray>}

i.e. only 2D arrays. That's probably not what you intended, is it? Where am I going wrong?

@IvanaEscobar (Contributor, Author) commented Sep 16, 2025

Sorry, read_mds is only supposed to be available from utils.py, so the import looks like:

from xmitgcm.utils import read_mds

I updated the comment above to show the correct way to load read_mds.


I'm seeing the same 2D array limitation. It's an issue with the blended 2D and 3D fields of pickup files. I made more updates to accommodate the dask pickup read. Can you try loading on your end again?

Here is what should work:

from xmitgcm.utils import read_mds
p=read_mds('pickup.0000072000',shape=[15,6*32,32])
...
p['EtaH'].compute()
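
To check that the blended read behaves as intended, you can inspect the shapes (a quick sketch; with shape=[15, 6*32, 32] the 3D fields should come back with 15 levels while the Eta-related fields stay 2D):

for name, arr in p.items():
    print(name, arr.shape)
# expected: e.g. Theta -> (15, 192, 32), EtaH -> (192, 32)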

Note: I'm hesitant to modify utils.py too much since I don't have the best understanding of the intention of each routine.

@mjlosch (Member) commented Sep 16, 2025

Works for regular pickups (where the "Eta"-related fields are the only exception to otherwise 3D fields), but what happens when TICES in a pickup_seaice is >1 (the default is actually 7)?

Also, the CI fails ...

@IvanaEscobar (Contributor, Author)

I wasn't aware that pickups change format. I've only used pickups without the optional pkg/seaice active.

We can cross the bridge of optional-package pickups when a user requests that feature. Do you think that's okay, @mjlosch?

The CI-related issue may have belonged in a separate PR, but I went ahead and fixed it.

@mjlosch (Member) commented Sep 22, 2025

There are pickup files for many different packages. Here's a full(?) list (some write pickups only for certain configurations, e.g. diagnostics, obcs, generic_advdiff):

atm_compon_interf
atm_phys
atm2d
bbl
bling
cd_code
cheapaml
diagnostics
dic
ecco
fizhi
generic_advdiff
ggl90
gmredi
land
obcs
ocn_compon_interf
ptracers
seaice
shelfice
streamice
thsice

Some of them will be relatively straightforward, others less so, e.g. seaice, obcs. Maybe there needs to be a warning that your code will not work in all cases?
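
Something along these lines would already help (just a sketch of the idea, using the standard field list from the PR description; the function name and where it would be called from are hypothetical):

import warnings

# fields of a standard (core-model) pickup; anything beyond these
# (seaice, ptracers, obcs, ...) is currently not handled
_STANDARD_PICKUP_FLDS = {'Uvel', 'GuNm1', 'Vvel', 'GvNm1', 'Theta', 'GtNm1',
                         'Salt', 'GsNm1', 'EtaN', 'dEtaHdt', 'EtaH'}

def _warn_if_nonstandard_pickup(fldlist):
    unknown = sorted(set(fldlist) - _STANDARD_PICKUP_FLDS)
    if unknown:
        warnings.warn('pickup contains fields this reader does not handle: '
                      + ', '.join(unknown))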

CI: Can you explain why your fix works now? I don't understand why ds.update causes problems.

@IvanaEscobar (Contributor, Author) commented Sep 22, 2025

I added a note in the PR description to clarify that this PR is for the standard pickup files, and not for the optional variations of the pickup file.


The CI stopped working in the test cases that use the latest version of xarray. It's likely that a recent xarray release introduced a breaking change that affects the test_llcreader tests. I don't track xarray, so I'm not positive this is the case. My fix keeps up with whatever changed in the libraries used by this repo's testing suite.

IvanaEscobar and others added 2 commits September 22, 2025 15:27
Co-authored-by: Martin Losch <30285667+mjlosch@users.noreply.github.com>