Home


Hyo-Kyung Lee edited this page Jan 11, 2023 · 21 revisions

Kerchunk Study

Table of Contents

Kerchunk to DMR++

DMR++ to Kerchunk

Introduction

This wiki documents the Kerchunk Study. We studied Kerchunk using a few sample NASA Earthdata HDF5 files. We also studied the feasibility of converting Kerchunk to and from OPeNDAP Hyrax DMR++.

Kerchunk

Kerchunk is a derived work based on RFC 7233 (HTTP/1.1 Range Requests, 2014), the HDF4 File Content Map Writer (2016), and Cloudydap (2017).

Thus, it has many similarities with OPeNDAP Hyrax DMR++, which the Cloudydap project produced. Both rely on HDF5 API calls to get the byte offset and length of each stored data chunk. Kerchunk obtains this information through high-level h5py Python calls.
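Both tools ultimately reduce each chunk to a byte offset and a length, which a client can turn into an HTTP range request as defined in RFC 7233. The Python sketch below, using only the standard library, shows that step. The helper names, the URL, and the offset/length values are hypothetical placeholders, not values from a real NASA granule.

```python
import urllib.request


def range_header(offset: int, length: int) -> str:
    """Build an RFC 7233 byte-range header value for one chunk.

    HTTP byte ranges are inclusive, so the last byte is offset + length - 1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def chunk_request(url: str, offset: int, length: int) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for a single chunk."""
    return urllib.request.Request(url, headers={"Range": range_header(offset, length)})


# Hypothetical URL and chunk location, for illustration only.
req = chunk_request("https://example.com/data/sample.h5", 40762, 4083)
print(req.get_header("Range"))
```

Passing the request to `urllib.request.urlopen` would fetch only those bytes, provided the server honors range requests (it should answer with `206 Partial Content`). The returned bytes may still need decompression (for example, deflate) before use.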

Although the basic idea is the same, there are a few differences between them. The following table summarizes the key differences.

Feature	Kerchunk	OPeNDAP
Source format	HDF5/netCDF/GRIB/FITS	HDF5
Language	Python	C/C++
Conversion target	Zarr	DMR++
Output	JSON	XML
Aggregation	fsspec + MultiZarrToZarr API	NcML
Subchunking	Yes	n/a
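To make the JSON output in the table concrete, the sketch below assembles a minimal Kerchunk reference set (version 1 format) by hand for one variable. Each chunk key maps to `[url, offset, length]`, and the Zarr metadata (`.zgroup`, `.zarray`) is stored as inline JSON strings. The helper `kerchunk_refs`, the variable name, and all offsets are hypothetical; the chunks are assumed to be uncompressed, whereas Kerchunk itself would record whatever HDF5 filters it finds.

```python
import json


def kerchunk_refs(url, var, shape, chunks, dtype, chunk_locations):
    """Build a minimal Kerchunk (version 1) reference set for one variable.

    chunk_locations maps a chunk index tuple, e.g. (1, 0), to (offset, length).
    The data are assumed uncompressed, so no codec is recorded in .zarray.
    """
    zarray = {
        "zarr_format": 2,
        "shape": list(shape),
        "chunks": list(chunks),
        "dtype": dtype,
        "compressor": None,
        "filters": None,
        "fill_value": None,
        "order": "C",
    }
    refs = {
        # Zarr metadata is stored inline as JSON strings.
        ".zgroup": json.dumps({"zarr_format": 2}),
        f"{var}/.zarray": json.dumps(zarray),
    }
    for index, (offset, length) in sorted(chunk_locations.items()):
        # Zarr v2 chunk keys join the chunk indices with ".", e.g. "temperature/1.0".
        refs[f"{var}/" + ".".join(str(i) for i in index)] = [url, offset, length]
    return {"version": 1, "refs": refs}


# Hypothetical 200x100 float32 variable stored as two 100x100 chunks.
refs = kerchunk_refs(
    "s3://example-bucket/sample.h5", "temperature", (200, 100), (100, 100), "<f4",
    {(0, 0): (4016, 40000), (1, 0): (44016, 40000)},
)
print(json.dumps(refs, indent=2))
```

A DMR++ file records the same offsets and lengths, but as attributes of XML elements instead of JSON arrays.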

Kerchunk is still under active development. It has some issues with NASA HDF5/netCDF-4 data products. See Kerchunk for the details.

Reading NASA data through (nc)Zarr/xarray also has some interoperability issues. For example, an xarray-based DataTree reports an error when reading a NASA HDF5 data product. See DataTree for the details. In addition, Unidata's NCZarr cannot read Kerchunk files.

DMR++

The dmrpp_module can serve NASA HDF5 data products robustly. However, the pydap client has some issues. See DMR++ for the details.

DMR++ to Kerchunk

We studied the feasibility of converting DMR++ to Kerchunk. See DMR++ to Kerchunk for the details.
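A DMR++ file stores the same chunk information as Kerchunk, but in XML: each `<dmrpp:chunk>` element carries `offset`, `nBytes`, and `chunkPositionInArray`, and the data file URL is given by `dmrpp:href`. The Python sketch below, which uses only the standard library, converts those chunk entries into Kerchunk-style references. The sample document is a simplified, hypothetical DMR++, the namespace URIs are assumed from published DMR++ examples, and `dmrpp_to_refs` is an illustrative helper. The sketch assumes `chunkPositionInArray` holds element coordinates, and it skips nested groups, contiguous (unchunked) variables, and the `.zarray` metadata that a complete conversion would also need.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed from published DMR++ examples.
DAP_NS = "http://xml.opendap.org/ns/DAP/4.0#"
DMRPP_NS = "http://xml.opendap.org/dap/dmrpp/1.0.0#"
NS = {"dap": DAP_NS, "dmrpp": DMRPP_NS}

# Simplified, hypothetical DMR++ for one chunked Float32 variable.
SAMPLE_DMRPP = f"""
<Dataset xmlns="{DAP_NS}" xmlns:dmrpp="{DMRPP_NS}"
         name="sample.h5" dmrpp:href="https://example.com/sample.h5">
  <Float32 name="temperature">
    <Dim size="200"/>
    <Dim size="100"/>
    <dmrpp:chunks compressionType="deflate">
      <dmrpp:chunkDimensionSizes>100 100</dmrpp:chunkDimensionSizes>
      <dmrpp:chunk offset="4016" nBytes="1234" chunkPositionInArray="[0,0]"/>
      <dmrpp:chunk offset="5250" nBytes="1200" chunkPositionInArray="[100,0]"/>
    </dmrpp:chunks>
  </Float32>
</Dataset>
"""


def dmrpp_to_refs(dmrpp_xml: str) -> dict:
    """Convert chunk entries of top-level DMR++ variables to Kerchunk refs."""
    root = ET.fromstring(dmrpp_xml.strip())
    url = root.get(f"{{{DMRPP_NS}}}href")  # namespaced attribute dmrpp:href
    refs = {}
    for var in root:  # top-level variables only; nested <Group>s are skipped
        chunks = var.find("dmrpp:chunks", NS)
        if chunks is None:
            continue
        sizes_el = chunks.find("dmrpp:chunkDimensionSizes", NS)
        if sizes_el is None:
            continue  # contiguous storage is not handled in this sketch
        sizes = [int(s) for s in sizes_el.text.split()]
        for chunk in chunks.findall("dmrpp:chunk", NS):
            pos = chunk.get("chunkPositionInArray").strip("[]").split(",")
            # Element position // chunk size = chunk index, e.g. [100,0] -> "1.0".
            index = ".".join(str(int(p) // s) for p, s in zip(pos, sizes))
            key = f"{var.get('name')}/{index}"
            refs[key] = [url, int(chunk.get("offset")), int(chunk.get("nBytes"))]
    return {"version": 1, "refs": refs}


refs = dmrpp_to_refs(SAMPLE_DMRPP)
print(refs["refs"]["temperature/1.0"])
```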

Kerchunk to DMR++

We also studied the feasibility of converting Kerchunk to DMR++. See Kerchunk to DMR++ for the details.
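Going the other way, each Kerchunk chunk key (for example `temperature/1.0`) gives a chunk index, which is multiplied by the chunk shape from `.zarray` to obtain the array position that DMR++ expects. The sketch below builds a `<dmrpp:chunks>` element for one variable. `refs_to_dmrpp_chunks` and the sample references are hypothetical, the DMR++ namespace URI is an assumption, only the zlib codec is mapped (to `compressionType="deflate"`), and the dataset-level `dmrpp:href` and variable declarations are left out.

```python
import json
import xml.etree.ElementTree as ET

DMRPP_NS = "http://xml.opendap.org/dap/dmrpp/1.0.0#"  # assumed DMR++ namespace URI
ET.register_namespace("dmrpp", DMRPP_NS)

# Hypothetical Kerchunk references for one zlib-compressed 200x100 variable.
SAMPLE_REFS = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "temperature/.zarray": json.dumps({
            "zarr_format": 2, "shape": [200, 100], "chunks": [100, 100],
            "dtype": "<f4", "compressor": {"id": "zlib", "level": 4},
            "filters": None, "fill_value": None, "order": "C",
        }),
        "temperature/0.0": ["https://example.com/sample.h5", 4016, 1234],
        "temperature/1.0": ["https://example.com/sample.h5", 5250, 1200],
    },
}


def refs_to_dmrpp_chunks(refs: dict, var: str) -> ET.Element:
    """Build a <dmrpp:chunks> element for one variable from Kerchunk refs."""
    zarray = json.loads(refs["refs"][f"{var}/.zarray"])
    sizes = zarray["chunks"]
    chunks = ET.Element(f"{{{DMRPP_NS}}}chunks")
    # Only zlib is mapped here; shuffle and other codecs are ignored.
    codecs = [zarray.get("compressor")] + list(zarray.get("filters") or [])
    if any(c and c.get("id") == "zlib" for c in codecs):
        chunks.set("compressionType", "deflate")
    dim_sizes = ET.SubElement(chunks, f"{{{DMRPP_NS}}}chunkDimensionSizes")
    dim_sizes.text = " ".join(str(s) for s in sizes)

    prefix = f"{var}/"
    entries = []
    for key, value in refs["refs"].items():
        name = key[len(prefix):]
        if not key.startswith(prefix) or name.startswith("."):
            continue  # other variables, or metadata such as .zarray/.zattrs
        if not (isinstance(value, list) and len(value) == 3):
            continue  # inline data and whole-file references are not handled
        entries.append(([int(i) for i in name.split(".")], value[1], value[2]))

    for index, offset, length in sorted(entries):
        # Chunk index * chunk size = element position, e.g. [1, 0] -> [100,0].
        pos = ",".join(str(i * s) for i, s in zip(index, sizes))
        ET.SubElement(chunks, f"{{{DMRPP_NS}}}chunk", {
            "offset": str(offset),
            "nBytes": str(length),
            "chunkPositionInArray": f"[{pos}]",
        })
    return chunks


print(ET.tostring(refs_to_dmrpp_chunks(SAMPLE_REFS, "temperature"), encoding="unicode"))
```

Sorting by the integer index, rather than by the key string, keeps chunk `10.0` after chunk `2.0` in the generated XML.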