HDF5 2.0 Release Planning

nbagha1 edited this page May 24, 2025 · 4 revisions

These are the key topics and issues we plan to focus on for the next version of HDF5. For full transparency, we’re making an effort to track all publicly visible features (i.e., not internal refactoring) in GitHub, so that the community can follow the progress of the release.

The issues targeted for HDF5 2.0 are tracked under the 2.0 milestone, which defines both the timeline and scope of the release.

Issues are organized by software components and grouped under high-level parent issues, each representing a broader area of work. These parent issues contain child issues that break down the specific tasks involved, though not all child issues are targeted for inclusion in the 2.0 release.

NOTE: We’re using release gating to manage scope. Essential dependencies are tracked via the release-blocker gate, and our goal is to resolve nearly all of them before the release. Nice-to-have issues are tracked via the release-future gate and may be moved to release-deferred if de-scoping becomes necessary.

Major 2.0.0 changes

The biggest changes and new features planned for the library in HDF5 2.0 are:

Move to semantic versioning (https://semver.org/)

People have been asking for this for a long time, and we get many complaints about our existing scheme. All future HDF5 versions will use the major.minor.patch format.
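As a rough illustration of what major.minor.patch ordering means for downstream tooling, here is a minimal Python sketch. The helper names (`Version`, `parse_version`, `is_breaking_upgrade`) are hypothetical and not part of HDF5, and the sketch ignores the pre-release and build-metadata suffixes that the semver.org specification also allows.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A plain major.minor.patch triple; tuples compare field by field."""
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a string such as '2.0.0' (hypothetical helper, not an HDF5 API)."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not a major.minor.patch version: {text!r}")
    return Version(*(int(p) for p in parts))


def is_breaking_upgrade(old: str, new: str) -> bool:
    """Under semantic versioning, only a major bump signals incompatible changes."""
    return parse_version(new).major > parse_version(old).major
```

Comparing parsed tuples rather than raw strings avoids the usual pitfall where "2.10.0" sorts before "2.9.1" lexicographically.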

Cloud-optimized HDF5 is very important to us, and we try to get cloud-related tweaks and bugfixes out as soon as we can. HDF5 2.0.0 should support s3:// URLs and AWS environment variables in the read-only S3 VFD. We are targeting these specific issues for the 2.0 release.
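To make the S3 URL and environment variable handling concrete, the following Python sketch shows the general idea of splitting an s3 URL into bucket and object key and collecting credentials from the environment. This is not the VFD's implementation. The function names are hypothetical, the bucket and file names in any usage are made up, and the variable names are the standard ones used by AWS tooling; which of them the S3 VFD will actually honor is an assumption here.

```python
import os
from urllib.parse import urlparse


def split_s3_url(url: str) -> tuple[str, str]:
    """Split 's3://bucket/path/to/file.h5' into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    # The bucket is the URL authority; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


def aws_settings_from_env(env=None) -> dict:
    """Collect common AWS settings from the environment, skipping unset ones.

    The variable list is an assumption based on standard AWS tooling,
    not a statement of what HDF5 reads.
    """
    env = os.environ if env is None else env
    names = (
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_SESSION_TOKEN",
        "AWS_REGION",
    )
    return {name: env[name] for name in names if name in env}
```

Passing an explicit mapping as `env` keeps the lookup easy to test without touching the real process environment.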

We plan to revisit library defaults (e.g., cache sizes) to improve performance with cloud-optimized HDF5 and modern I/O hardware. For the 2.0 release, this includes adjusting the default lower bound for the file format version in 2025.

We aim to address several floating-point data issues in the 2.0 release. Most critically, Nvidia’s NVHPC fails some of our long double tests, and we’ve had persistent problems with IBM POWER long double conversions. We’re also looking to expand predefined datatypes to include FP8 and both FP4 variants, which are increasingly important in machine learning. Building on our recent addition of IEEE float16, we also plan to introduce support for complex numbers. These specific issues have been tagged for the 2.0 release.
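For readers unfamiliar with the small floating-point formats mentioned above, here is a minimal Python sketch of one common FP8 layout, E5M2 (1 sign bit, 5 exponent bits, 2 mantissa bits). Because E5M2 shares its exponent width and bias with IEEE float16, an E5M2 byte can be treated as the high byte of a float16 value. The source does not say which FP8 layouts HDF5 will predefine, so the choice of E5M2 is an assumption, and the function names are hypothetical.

```python
import struct


def decode_fp8_e5m2(byte: int) -> float:
    """Decode an E5M2 byte by widening it to IEEE float16 (low mantissa bits zero)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    # '<e' is little-endian float16: low byte first, then the high byte.
    return struct.unpack("<e", bytes([0x00, byte]))[0]


def encode_fp8_e5m2_truncate(value: float) -> int:
    """Encode by rounding to float16, then dropping the low 8 mantissa bits.

    Truncation rounds toward zero; a production conversion would more likely
    round to nearest. Values outside float16 range raise OverflowError.
    """
    return struct.pack("<e", value)[1]
```

With only two mantissa bits, the representable values between 1 and 2 are 1.0, 1.25, 1.5, and 1.75, which is why such types need explicit conversion rules when stored alongside wider floats.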

Maintaining two build systems is unnecessary overhead. Keeping the two systems in sync and reinventing the wheel whenever we need more complicated testing takes non-trivial engineering resources. Starting with the 2.0.0 release, all Autotools files will be removed and we will no longer support Autotools builds.

Since we're dropping Autotools support, it's imperative that CMake works well. We'll make a pass over both build systems to ensure that CMake does everything the Autotools do, simplify the build system code, and revamp the install docs. We're also hoping to make the CMake-built compiler wrappers (e.g., h5cc) behave more like their Autotools-built counterparts. These are the specific issues targeted for the 2.0 release.

Over the past few years, we've dramatically expanded our CI and we'll continue to do that for HDF5 2.0.0. We now report to my.cdash.org and you can see the output of our GitHub CI under the GHDaily heading, as well as test results from many HPC systems. Improvements over the next few months will include testing our develop branch with the development trunk of both OpenMPI and MPICH, testing HDF5 with the HighFive C++ wrapper, and adding missing configurations.

HDF5 is widely used on HPC systems, so we will continue to address bugs as they arise. These parallel HDF5 issues are specifically targeted for the 2.0 release and tagged accordingly.

We'd like to fix several bugs in the CMake code that builds the compression filters. We'll also be improving the hdf5_plugins repo.

CVE and oss-fuzz issues are potential security problems, so we prioritize fixing them. We are currently CVE-free and hope to have all oss-fuzz issues closed by the release date. We're also hoping to have the library be sanitizer-clean by the time we release, and then add CI checks to ensure it stays that way. We have tagged these specific issues for the 2.0 release.

These are also high-priority issues, and we hope to have all of them addressed by the release date. The highest priority among them is the memory-backed files copy issue. In addition, these specific issues are targeted for the 2.0 release and tagged accordingly.

Windows has long been a second-class citizen when it comes to HDF5, and it'd be nice to make that less so in HDF5 2.0. The most important issue to fix is our spotty ability to handle Unicode file names on Windows. Setting up CI to ensure that any fixes don't break in the future will be challenging, but we're hoping to do that in this release. We're also working on better support for MinGW, especially with MSYS2. In addition, code for a VFD based on Win32 API calls was donated to us years ago, and it'd be nice to modernize that code, get it to pass our full CI, and get it into the library. These specific issues are targeted for the 2.0 release and tagged accordingly.

[Documentation]

Recognizing the importance of up-to-date documentation for our users, we have determined that documentation updates should not depend on formal software releases. Moving forward, users will have access to living documentation through dedicated links, with the expectation that content will be regularly updated, expanded, and refined independent of release cycles. We are working on adding a section to the User Guide (UG) that covers cloud-optimized HDF5, as well as an upgrading guide for people looking to move to a newer version of the library. As mentioned earlier, we also plan to rework our install docs.

[Misc]

There are several issues that don't seem to fit into the above categories:

  • We'll look into moving to Fortran 2008, as that might help us cross-compile when also building the Fortran wrappers.
  • For Java, we'll add wrapper functions for the direct chunk I/O functions.