
Commit abfbccf

Read the clean currents in
1 parent 4dc7277 commit abfbccf

File tree

6 files changed (+453, -5 lines)


docs/current_denoising/README.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
Additional documentation for the library in `src/current_denoising/`
docs/current_denoising/generation/ioutils.md

Lines changed: 60 additions & 0 deletions

@@ -0,0 +1,60 @@
ioutils
====

Additional documentation for the io utils in `current_denoising/`.

read_clean_currents
----

This function is more complicated than `read_currents` (which just reads in a whole dat file
and is for reading in noisy currents). The dat files containing the (clean) CMIP simulations
hold the data for many different runs, models, and start years, so the correct one needs to be
specified. This is done with the year/model/name parameters; the name is a special string (e.g.
r1i1p1f1_gn).

It is assumed that the currents don't change much within 5 years, so we have model outputs at
a granularity of every 5 years. The provided year in the metadata is the start year of the run.

#### r (realization_index)
An integer (≥1) distinguishing among members of an ensemble of simulations that
differ only in their initial conditions (e.g., initialized from different points
in a control run). Note that if two different simulations were started from the
same initial conditions, the same realization number should be used for both simulations.
For example, if a historical run with "natural forcing" only and another historical
run that includes anthropogenic forcing were both spawned at the same point in a
control run, both should be assigned the same realization. Also, each so-called
RCP (future scenario) simulation should normally be assigned the same realization
integer as the historical run from which it was initiated.
This allows users to easily splice together the appropriate historical and future runs.

#### i (initialization_index)
An integer (≥1), which should be assigned a value of 1 except to distinguish
simulations performed under the same conditions but with different initialization
procedures. In CMIP6 this index should invariably be assigned the value 1
except for some hindcast and forecast experiments called for by the DCPP activity.
The initialization_index can be used either to distinguish between different
algorithms used to impose initial conditions on a forecast or to distinguish
between different observational datasets used to initialize a forecast.

#### p (physics_index)
An integer (≥1) identifying the physics version used by the model. In the
usual case of a single physics version of a model, this argument should normally
be assigned the value 1, but it is essential that a consistent assignment of
physics_index be used across all simulations performed by a particular model.
Use of "physics_index" is reserved for closely-related model versions (e.g., as
in a "perturbed physics" ensemble) or for the same model run with slightly
different parameterizations (e.g., of cloud physics). Model versions that are
substantially different from one another should be given a different "source_id"
(rather than simply assigning a different value of the physics_index).

#### f (forcing_index)
An integer (≥1) used to distinguish runs conforming to the protocol of a single
CMIP6 experiment, but with different variants of forcing applied. One can, for
example, distinguish between two historical simulations, one forced with the
CMIP6-recommended forcing data sets and another forced by a different dataset,
which might yield information about how forcing uncertainty affects the simulation.

#### Gridding
A grid-label suffix is used to distinguish between the gridding conventions used:
- grid_label = "gn" (output is reported on the native grid, usually but not invariably at grid cell centers)
- grid_label = "gr" (output is not reported on the native grid, but instead is regridded by the modeling group to a "primary grid" of its choosing)
- grid_label = "gm" (global mean output is reported, so data are not gridded)
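Putting the four indices and the grid label together gives variant names like `r1i1p1f1_gn`. A minimal sketch of splitting such a name back into its parts (the `parse_variant_label` helper and its regex are illustrative, not part of the library):

```python
import re


def parse_variant_label(name: str) -> dict:
    """Split a CMIP6-style variant label like 'r1i1p1f1_gn' into its
    realization/initialization/physics/forcing indices and grid label."""
    match = re.fullmatch(r"r(\d+)i(\d+)p(\d+)f(\d+)_(g[nrm])", name)
    if match is None:
        raise ValueError(f"Unrecognised variant label: {name}")
    r, i, p, f, grid = match.groups()
    return {"r": int(r), "i": int(i), "p": int(p), "f": int(f), "grid": grid}


print(parse_variant_label("r1i1p1f1_gn"))
# {'r': 1, 'i': 1, 'p': 1, 'f': 1, 'grid': 'gn'}
```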

pyproject.toml

Lines changed: 6 additions & 1 deletion
@@ -9,7 +9,6 @@ authors = [
 requires-python = ">=3.10"
 dependencies = [
-    "nbdime>=4.0.2",
 ]

 [tool.uv]
@@ -27,22 +26,28 @@ core = [
     "ipykernel>=6.29.5",
     "jupyter>=1.1.1",
     "numpy>=2.2.6",
+    "pandas>=2.3.2",
 ]

 # Test dependencies - a minimal set of dependencies that let us run the tests
 # This enables us to run tests without installing all the dependencies, which is much quicker
+# Basically this just gets us to avoid installing torch
+# Ideally we wouldn't repeat these things in the core dependencies,
+# but I need to figure out how to do it better
 test = [
     "numpy>=2.2.6",
     "pytest>=8.4.1",
     "scikit-image>=0.25.2",
     "matplotlib>=3.10.3",
+    "pandas>=2.3.2",
 ]

 # Formatting, linting, etc.
 # Useful for development, but not required to run the code
 dev = [
     "black>=25.1.0",
     "pylint>=3.3.7",
+    "nbdime>=4.0.2",
 ]

 [build-system]

src/current_denoising/generation/ioutils.py

Lines changed: 164 additions & 0 deletions
@@ -3,8 +3,12 @@
 """

 import pathlib
+from functools import cache

 import numpy as np
+import pandas as pd
+
+from ..plotting.maps import lat_long_grid


 class IOError(Exception):
@@ -43,6 +47,166 @@ def read_currents(path: pathlib.Path) -> np.ndarray:
     return np.flipud(data.reshape(shape))


+@cache
+def _read_clean_current_metadata(metadata_path: pathlib.Path) -> pd.DataFrame:
+    """
+    Read the metadata file for the clean currents .dat file
+
+    :returns: metadata dataframe; model/name/year
+    """
+    # First line is the number of models/runs
+    with open(metadata_path, "r") as f:
+        num_runs = int(f.readline().strip())
+        df = pd.read_csv(f, names=["model", "name", "year"], sep=r"\s+")
+
+    if len(df) != num_runs:
+        raise IOError(
+            f"Metadata file {metadata_path} has {len(df)} rows, but first line says {num_runs}"
+        )
+
+    return df
+
+
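The metadata parser above expects a count on the first line followed by whitespace-separated model/name/year rows. A sketch of that layout with made-up contents (the real file lives alongside the .dat file on the RDSF):

```python
import io

import pandas as pd

# Made-up metadata contents: first line is the number of runs, then
# whitespace-separated model / name / year columns
text = """2
ACCESS-CM2 r1i1p1f1_gn 1950
ACCESS-CM2 r1i1p1f1_gn 1955
"""

f = io.StringIO(text)
num_runs = int(f.readline().strip())
df = pd.read_csv(f, names=["model", "name", "year"], sep=r"\s+")

assert len(df) == num_runs
print(df.iloc[0]["model"], df.iloc[0]["year"])  # ACCESS-CM2 1950
```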
+def _coriolis_parameter(latitudes: np.ndarray) -> np.ndarray:
+    """
+    Calculate the coriolis parameter at each latitude
+    """
+    omega = 7.2921e-5
+    torad = np.pi / 180.0
+
+    return 2 * omega * np.sin(latitudes * torad)
+
+
+def current_speed_from_mdt(mdt: np.ndarray) -> np.ndarray:
+    """
+    Convert geodetic MDT to currents.
+
+    By assuming geostrophic balance, we can take the gradient of the MDT to get the steady-state
+    currents.
+    This requires us to work out the coriolis parameter at each latitude, and to take the gradient
+    of the MDT.
+
+    :param mdt: the mean dynamic topography, in metres, covering the globe.
+
+    :returns: the current speed in m/s
+
+    """
+    g = 9.80665
+    torad = np.pi / 180.0
+    R = 6_371_229.0
+
+    # Find the grid spacing (in m)
+    lats, longs = lat_long_grid(mdt.shape)
+    dlat = np.abs(lats[1] - lats[0]) * torad * R
+    dlong = (
+        np.abs(longs[1] - longs[0]) * torad * R * np.cos(torad * lats)[:, np.newaxis]
+    )
+
+    # Find the coriolis parameter at each latitude
+    f = _coriolis_parameter(lats)
+
+    # Velocities are gradients * coriolis param for geostrophic balance
+    dmdt_dlat = np.gradient(mdt, axis=0) / dlat
+    dmdt_dlon = np.gradient(mdt, axis=1) / dlong
+
+    # u should be negative, but it doesn't matter for speed
+    u = g / f[:, np.newaxis] * dmdt_dlat
+    v = g / f[:, np.newaxis] * dmdt_dlon
+
+    return np.sqrt(u**2 + v**2)
+
+
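For a purely meridional MDT slope, the geostrophic relation used above reduces to speed = (g / |f|) · |d(MDT)/dy|. A quick self-contained check, with an illustrative latitude and slope (the values are made up, not from the data):

```python
import numpy as np

# Illustrative check of the geostrophic speed formula:
# for an MDT sloping only north-south, speed = (g / |f|) * |d(MDT)/dy|
g = 9.80665        # gravitational acceleration, m/s^2
omega = 7.2921e-5  # Earth's rotation rate, rad/s

lat = 45.0                               # degrees; illustrative
f = 2 * omega * np.sin(np.radians(lat))  # coriolis parameter, ~1.03e-4 s^-1
slope = 1e-6                             # MDT gradient, metres per metre; illustrative

speed = g / f * slope
print(round(speed, 4))  # ~0.0951 m/s
```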
+def read_clean_currents(
+    path: pathlib.Path,
+    metadata_path: pathlib.Path,
+    *,
+    year: int,
+    model: str = "ACCESS-CM2",
+    name: str = "r1i1p1f1_gn",
+) -> np.ndarray:
+    """
+    Read clean current data from a .dat file.
+
+    Read a .dat file containing clean current data,
+    given the model/name/year, returning a 720x1440 numpy array giving the current
+    in m/s.
+    Sets land grid points to np.nan.
+    Since the clean current data is stored in a large file containing multiple years and models,
+    we need to choose the correct one.
+
+    Notes on the name convention from the CMIP6 documentation can be found in
+    docs/current_denoising/generation/ioutils.md, or in the original at
+    https://docs.google.com/document/d/1h0r8RZr_f3-8egBMMh7aqLwy3snpD6_MrDz1q8n5XUk.
+
+    :param path: location of the .dat file; clean current data is located in
+        data/projects/SING/richard_stuff/Table2/clean_currents/ on the RDSF
+    :param metadata_path: location of the metadata .csv file describing the contents of the .dat file
+    :param year: start of the 5-year period for which to extract data
+    :param model: the climate model to use
+    :param name: the model variant to use. Name follows the convention
+        {realisation/initialisation/physics/forcing}_grid
+
+    :returns: a numpy array holding current speeds
+    :raises ValueError: if the requested year/model/name is not found in the metadata
+    :raises IOError: if the file is malformed, or has a different length to that expected
+        from the metadata
+
+    """
+    metadata = _read_clean_current_metadata(metadata_path)
+
+    # The dat file contains a header (record length), then the record, then a footer (record length)
+    # We want to find the number of bytes to skip to get to the correct record, which
+    # corresponds to the row number in the metadata file
+
+    # Find the row in the metadata file
+    row = metadata[
+        (metadata["year"] == year)
+        & (metadata["model"] == model)
+        & (metadata["name"] == name)
+    ]
+    if len(row) == 0:
+        raise ValueError(
+            f"Could not find entry for {model=}, {name=}, {year=} in metadata"
+        )
+    if len(row) > 1:
+        raise ValueError(
+            f"Found multiple entries for {model=}, {name=}, {year=} in metadata"
+        )
+
+    # This tells us how many records to skip
+    row_index = row.index[0]
+
+    with open(path, "rb") as f:
+        n_bytes_per_record = np.fromfile(f, dtype=np.int32, count=1)[0]
+
+        # Add the header + footer
+        n_bytes_per_record += 8
+
+        # Check the file is the right size, based on the metadata
+        expected_size = int(n_bytes_per_record) * len(metadata)
+        f.seek(0, 2)  # Seek to end of file
+        actual_size = f.tell()
+        if actual_size != expected_size:
+            raise IOError(
+                f"File size {actual_size} does not match expected {expected_size} from metadata"
+            )
+
+        offset = row_index * n_bytes_per_record
+
+        f.seek(offset)
+        header = np.fromfile(f, dtype=np.int32, count=1)[0]
+        if header + 8 != n_bytes_per_record:
+            raise IOError(
+                f"Record length marker {header} does not match expected {n_bytes_per_record - 8}"
+            )
+
+        retval = np.fromfile(f, dtype="<f4", count=header // 4)
+
+    retval[retval == -1.9e19] = np.nan
+
+    # Make it look right
+    retval = np.flipud(retval.reshape((720, 1440)))
+
+    return current_speed_from_mdt(retval)
+
+
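The header/payload/footer layout the function walks through matches Fortran's sequential unformatted records: a 4-byte record length, the data, then the length repeated. A minimal round-trip sketch with made-up data (the real records hold 720x1440 float32 grids):

```python
import io

import numpy as np

# Write one Fortran-style sequential record: int32 length, payload, int32 length
payload = np.arange(6, dtype="<f4")               # made-up record contents
length = np.array([payload.nbytes], dtype="<i4")  # record length in bytes

buf = io.BytesIO()
buf.write(length.tobytes())   # header
buf.write(payload.tobytes())  # the record itself
buf.write(length.tobytes())   # footer

# Read it back the same way read_clean_currents does
buf.seek(0)
header = np.frombuffer(buf.read(4), dtype="<i4")[0]
data = np.frombuffer(buf.read(header), dtype="<f4")
footer = np.frombuffer(buf.read(4), dtype="<i4")[0]

assert header == footer == payload.nbytes
print(data)  # [0. 1. 2. 3. 4. 5.]
```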
 def _included_indices(
     n_rows: int, tile_size: int, max_latitude: float
 ) -> tuple[int, int]:
