2025
💻 Zoom link: https://us06web.zoom.us/j/89601195963
📆 Meeting calendar invite.
📦 Timebox stick to it!
Note
Please provide time estimates for each agenda item.
Agenda items must be submitted at least 48 hours prior to the meeting.
- Facilitator/time-keeper: Scot
- Note-taker/Editor: AI/Scot
- A Manifesto for the Future of HDF document will be presented in a follow-up meeting (Gerd).
- Aleksandar will set up a filter working subgroup to discuss next steps; Quincey was interested in contributing.
- Quincey will initiate discussions about the accelerator native storage and sharded storage proposals in the forum within the next month, in light of a more formal HEP framework. (Quincey)
- Neil to present an RFC covering fine-grained capability reporting by HDF5 VOL connectors (15 min)
- []
• —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– •
- Facilitator/time-keeper: Gerd Heber
- Note-taker/Editor: AI/Scot
- Scot will check whether collaborators can create branches within the HDF Group organization on GitHub.
  ✅ Collaborators can create branches.
- The HDF Group will present its vision for community collaboration at the next meeting.
  ❎ A draft document is under review and will be presented in a follow-up meeting.
- The HDF Group will provide an update on the HEP (HDF5 Enhancement Proposal) infrastructure at the next meeting.
  ✅ Added to agenda
- Quincey will initiate discussions about the accelerator native storage and sharded storage proposals in the forum within the next month, in light of a more formal HEP framework.
- Review Meeting Etiquette (Gerd, 4 min)
- Meeting etiquette: key points
  - Many of us attend plenty of ineffective meetings. Let this not be one of them!
  - Read & improve the etiquette! Don't like something? => Come back next time & discuss!
  - The default facilitator is The HDF Group's Sustaining Engineer of the Week, but volunteers are always welcome. Just pencil in your name!
  - Be present and respectful: (ChatGPT's impression)
    - ✋ Raise your hand to speak. The facilitator will call on people in the order they raised their hands but may change that order to favor those who have not spoken recently or to follow a thread.
  - Practice makes perfect. Let's try this!
- Discussion/Resolution on non-standard naming conventions for floating-point data types. See PR for the motivation behind the discussion. (Jordan, 14 min)
- Future role for HDFGroup/hdf5_plugins filter plugin repository (@ajelenak, 19 min)
- Quick preview of initial HDF5 Extensions Proposal (HEP) framework (@ajelenak, 9 min)
- Review Monte Carlo testing of H5FL package, in [PR #5195](https://github.com/HDFGroup/hdf5/pull/5195). (Quincey, 14 min)
The meeting covered meeting etiquette and naming schemes for HDF5 data types. The group also discussed how to manage HDF5 filter plugins and their repository, with the aim of improving accessibility and maintenance across platforms. Finally, it discussed a new approach for publishing enhancement proposals and the implementation of a thread-safe mechanism for memory allocation.
Gerd was the facilitator of the meeting. He emphasized the importance of meeting etiquette and encouraged everyone to review and improve it. Gerd also encouraged volunteers to take on the role of facilitator to gain experience.
Jordan proposed a new naming scheme for the library's predefined data types, focusing on the floating-point formats used in machine learning. The proposal adds a leading type-class specifier so that a type's class can be identified immediately. Quincey suggested refining the scheme to include the type class, qualifier, endianness, and size. The group discussed the challenges of naming non-standard types and the possible need to deprecate them in the future. They considered seeking broader input on the forum but also expressed concern about diluting decision-making.
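The refined scheme can be made concrete with a small sketch that composes a predefined-type name from the components Quincey listed. The `H5T_` prefix matches existing HDF5 names, but the component spellings (`FLOAT`, `BFLOAT`, `F8E4M3`), their ordering, and the helpers `TypeNameParts` and `compose_type_name` are hypothetical, not the scheme the group adopted. The sketch also shows one of the naming challenges raised: byte order has no meaning for single-byte formats such as FP8, so it is dropped from those names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypeNameParts:
    """Components of a predefined datatype name (hypothetical scheme)."""
    type_class: str            # e.g. "FLOAT" or "INT"
    qualifier: str             # format family, e.g. "IEEE_F", "BFLOAT", "F8E4M3"
    size_bits: int             # storage size in bits
    endianness: Optional[str]  # "LE", "BE", or None for single-byte types


def compose_type_name(parts: TypeNameParts) -> str:
    """Build a name with the type class first so the class is visible immediately."""
    if parts.size_bits <= 8 and parts.endianness is not None:
        raise ValueError("byte order is meaningless for types of 8 bits or fewer")
    if parts.size_bits > 8 and parts.endianness not in ("LE", "BE"):
        raise ValueError("multi-byte types need an explicit byte order")
    name = f"H5T_{parts.type_class}_{parts.qualifier}"
    # Single-byte formats such as FP8 already encode their layout in the qualifier,
    # so only multi-byte types get a size and byte-order suffix.
    if parts.size_bits > 8:
        name += f"{parts.size_bits}{parts.endianness}"
    return name
```

For example, `compose_type_name(TypeNameParts("FLOAT", "BFLOAT", 16, "LE"))` yields `H5T_FLOAT_BFLOAT16LE`, while an 8-bit E4M3 format yields `H5T_FLOAT_F8E4M3`.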
Aleksandar then spoke about the future role of the HDF Group's HDF5 filter plugins repository and the need to manage it better. He expressed concern about how hard HDF5 filters are to obtain and called for a more proactive approach to providing them. He also noted that the ZFP filter developers had raised similar issues.
Aleksandar discussed the role of the HDF Group within the ecosystem of filters. Scot suggested that the GitHub repository should include plugins as submodules rather than copies. Elena reminded the group of the community's desire to have the most useful filters built into the library. Allen clarified that some filters do not have a separate repository for their plugin. Aleksandar raised the question of how a filter relates to its filter plugin, and Allen confirmed that the filter identifiers refer to the filter plugin, not to the compression filter itself.
Aleksandar discussed the challenges of maintaining and building libraries for many platforms, pointing to volunteer burnout among package repository maintainers. He suggested that the repository include a plugin only if its maintainers are willing to fix any issue that the HDF Group's testing uncovers across compilers and platforms, so that the repository managers are not solely responsible for fixing plugin issues. Elena agreed and suggested that simplifying the process could help. Quincey offered to help and suggested setting up a subgroup to address the issue.
The group discussed which plugins should be included as submodules and who is responsible for fixing issues on particular compilers and platforms. They also discussed making the repository more accessible to its user base, particularly the Python data science ecosystem. Elena suggested defining the repository's purpose before making decisions. Allen proposed including the repository in the HDF5 build process rather than building from it separately. The team also discussed creating a CMake preset for conda-forge to simplify the process.
Aleksandar presented a new approach for publishing HEPs that targets the web rather than PDF or Word files. It uses MyST, a Markdown-based markup format that is easy for people to adopt, with the goal of producing high-quality web-published proposals. He noted that MyST comes from the Jupyter Book publishing community, which aims to make Jupyter notebooks a first-class format for scientific publishing.
Aleksandar presented a framework for managing proposals that he believes is user-friendly and does not require advanced technical skills. He suggested hosting the proposals on GitHub Pages for easy access. Elena cautioned against creating barriers for users and said that proposals should be public and easy to comment on.
Quincey discussed implementing a thread-safe mechanism for memory allocation and deallocation that uses a free list to quickly reuse freed blocks of similar size. He asked for feedback on his testing approach, which generates test vectors of operations and executes them a million times to expose failures. Jordan noted that exhaustive testing is becoming more feasible with modern computing power. Neil and Gerd offered additional insights and suggestions, and the team agreed to continue refining the testing approach.
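The testing approach described above can be sketched in miniature. This is a toy, lock-protected free list in Python, not the H5FL C code: freed blocks are cached by size and handed back on the next request of the same size. A randomized driver generates vectors of allocate/free operations, replays them, and checks an invariant after every step, and a second helper exercises the shared list from several threads. The class `ToyFreeList`, the operation format, and the invariant are assumptions chosen for illustration.

```python
import random
import threading
from collections import defaultdict


class ToyFreeList:
    """Size-segregated free list guarded by one lock (illustrative, not H5FL)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._free = defaultdict(list)  # block size -> released blocks of that size
        self.fresh_allocations = 0      # blocks created because no cached block fit

    def alloc(self, size):
        with self._lock:
            if self._free[size]:
                return self._free[size].pop()  # reuse a cached block of this size
            self.fresh_allocations += 1
            return bytearray(size)

    def free(self, block):
        with self._lock:
            self._free[len(block)].append(block)

    def cached(self):
        with self._lock:
            return sum(len(blocks) for blocks in self._free.values())


def random_vector(rng, length, sizes=(16, 32, 64)):
    """Generate a random test vector of ('alloc', size) and ('free', slot) operations."""
    ops, live = [], 0
    for _ in range(length):
        if live and rng.random() < 0.5:
            ops.append(("free", rng.randrange(live)))
            live -= 1
        else:
            ops.append(("alloc", rng.choice(sizes)))
            live += 1
    return ops


def run_vector(ops):
    """Replay one vector, checking an invariant after every operation."""
    fl, live = ToyFreeList(), []
    for op, arg in ops:
        if op == "alloc":
            block = fl.alloc(arg)
            assert len(block) == arg, "free list returned a block of the wrong size"
            live.append(block)
        else:
            fl.free(live.pop(arg))
        # Every block ever created must be either in use or cached, never lost or doubled.
        assert len(live) + fl.cached() == fl.fresh_allocations


def monte_carlo(trials, seed=0):
    """Run many random vectors; a fixed seed makes any failure reproducible."""
    rng = random.Random(seed)
    for _ in range(trials):
        run_vector(random_vector(rng, rng.randint(1, 50)))
    return trials


def threaded_stress(fl, workers=4, rounds=1000):
    """Exercise one shared free list from several threads at once."""
    def worker(seed):
        rng = random.Random(seed)
        for _ in range(rounds):
            fl.free(fl.alloc(rng.choice((16, 32, 64))))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Once all threads finish, every block created must be back in the cache.
    assert fl.cached() == fl.fresh_allocations
```

Because the seed is fixed, a failing vector can be regenerated and replayed exactly, which keeps large randomized runs debuggable. The single-threaded replay checks the free-list bookkeeping; `threaded_stress` adds concurrent access, which is where the lock matters.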
- A Manifesto for the Future of HDF document will be presented in a follow-up meeting (Gerd).
- Aleksandar will set up a filter working subgroup to discuss next steps; Quincey was interested in contributing.
• —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– •
- Facilitator/time-keeper: Neil Fortner
- Note-taker/recorder: Scot Breitenfeld/AI
(Presumably, none.)
- Changes to the agenda?
- Ideas for how to enable community collaboration (Quincey, 20 min)
- Branch management, technical discussions, etc.
- Possibly create a GitHub org to host HEPs and other collaborations
- NVIDIA roadmap collaboration opportunities (Quincey, 40 min)
- Accelerator-enabled I/O operations
- Sharded storage
Quick Recap
The HDF5 Working Group meeting focused on the possibility of establishing a separate organization for community discussions and collaborations, and on the need for more community involvement in two upcoming NVIDIA projects. The meeting also examined the current CPU-GPU node architecture, emphasizing the importance of concentrating on the accelerator components and the potential of using an actual metadata database for faster operations. Additionally, the group discussed the design of the HDF5 architecture, the advantages of the Zarr format over HDF5, and two technical proposals: accelerator-native storage and sharded storage.
Summary
HDF5 Working Group Meeting Agenda
Neil, the facilitator, shared the agenda and invited any additions. Scot informed the group about two methods for receiving meeting cancellation notices: via the mailing list and calendar invites. He recommended using these automated methods instead of relying on forum posts for updates.
Neutral Platform for Community Discussions
Quincey proposed creating a separate, broader, and independent organization for community discussions and collaborations, one that could serve as a neutral space not explicitly organized by the HDF Group. He said it could support collaborations with NVIDIA and AMD, branch management, and document review. Steve questioned whether such an organization was necessary apart from the HDF Group, while Gerd raised concerns about its scope and neutrality. Quincey emphasized the need for a discussion platform that could operate independently of HDF Group meetings. Gerd suggested that the HDF Group itself could serve as a neutral platform for such discussions.
Revenue Generation for HDF Group
Scot and Steve discussed the importance of generating revenue for the HDF Group, with Steve expressing concern about the time and resources spent on initiatives that do not produce revenue. Scot emphasized the need for a stable outlook for HDF5 and the HDF Group's services, and the potential to increase revenue through outreach. Quincey suggested that the HDF Group allow collaborators to create branches, which Scot agreed to investigate. The conversation ended with Quincey asking Steve to clarify his comments about the HDF Group's responsibilities.
Increased Community Involvement in Projects
Quincey discussed the need for more community involvement in two projects: one on accelerator-native storage and the other on sharded storage. He hoped for more participation from management and other community members. Quincey said he would begin inviting people to participate if necessary, and that he planned to wrap up the multi-threading discussions in the next couple of weeks.
New GPU Architecture for Data Transfer
Quincey spoke about the current CPU-GPU node architecture, where data is cached in CPU memory before being transferred to the GPU. He proposed a new architecture where the GPU would handle most operations, with data transferred in and out more efficiently. Quincey emphasized the importance of establishing a vendor-neutral mechanism for GPU-related tasks and encouraged participation from others. He also suggested integrating this architecture with MPI for collective I/O operations, proposing that type conversion could be incorporated into the new I/O pipeline.
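The suggestion to fold type conversion into the I/O pipeline can be pictured as a chain of buffer-to-buffer stages, so a chunk is decompressed and converted in one pass on its way to the destination buffer. This host-side Python sketch is only an analogy under stated assumptions: the stage functions, the zlib codec, and the big-endian int32 to little-endian float64 conversion are hypothetical choices, and in the proposed architecture such stages would run on the accelerator rather than on the CPU.

```python
import struct
import zlib
from typing import Callable, List

# A stage transforms one buffer into another; stages compose into a pipeline.
Stage = Callable[[bytes], bytes]


def decompress_stage(buf: bytes) -> bytes:
    """Undo the storage filter (zlib here, standing in for any codec)."""
    return zlib.decompress(buf)


def convert_stage(buf: bytes) -> bytes:
    """Convert big-endian 32-bit integers (file type) to little-endian doubles (memory type)."""
    if len(buf) % 4:
        raise ValueError("buffer is not a whole number of 32-bit elements")
    count = len(buf) // 4
    values = struct.unpack(f">{count}i", buf)
    return struct.pack(f"<{count}d", *(float(v) for v in values))


def run_pipeline(buf: bytes, stages: List[Stage]) -> bytes:
    """Apply each stage in order, handing the output of one to the next."""
    for stage in stages:
        buf = stage(buf)
    return buf
```

For example, a zlib-compressed buffer holding the big-endian integers 1, -2, and 3 comes out of `run_pipeline(stored, [decompress_stage, convert_stage])` as the little-endian doubles 1.0, -2.0, and 3.0. Keeping each stage a plain buffer-to-buffer function is meant to let a codec or a conversion be swapped for an accelerator implementation without changing the driver.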
Improving HDF5 Compatibility and Design
Quincey stressed the need to focus on the accelerator components and to improve on the techniques introduced by Zarr so that HDF5 is compatible with both POSIX file systems and object stores. He proposed a design in which a directory acts as a container, with dataset storage sharded across it and databases used for metadata management. He invited feedback from a variety of stakeholders to improve the design. Aleksandar asked how this approach differs from Zarr and the existing HDF5 schema, and Quincey clarified that there are indeed distinctions.
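A minimal sketch of the directory-as-container idea maps each chunk's grid coordinates to a shard plus a position inside that shard. The key layout (`<container>/<dataset>/shards/<i.j>`), the row-major ordering inside a shard, and the function `locate_chunk` are hypothetical and closer in spirit to Zarr's sharding than to any agreed HDF5 design; in the proposal, this mapping would be recorded in the metadata database. Because the keys are plain slash-separated strings, they can serve as either POSIX paths or object-store keys.

```python
from pathlib import PurePosixPath
from typing import Sequence, Tuple


def locate_chunk(container: str, dataset: str,
                 chunk_coords: Sequence[int],
                 chunks_per_shard: Sequence[int]) -> Tuple[str, int]:
    """Map a chunk's grid coordinates to (shard key, chunk index within that shard)."""
    if len(chunk_coords) != len(chunks_per_shard):
        raise ValueError("chunk coordinates and shard shape must have the same rank")
    # Which shard holds the chunk, and where the chunk sits inside that shard.
    shard_coords = [c // n for c, n in zip(chunk_coords, chunks_per_shard)]
    local_coords = [c % n for c, n in zip(chunk_coords, chunks_per_shard)]
    # Row-major (C-order) linear index of the chunk within its shard.
    index = 0
    for pos, extent in zip(local_coords, chunks_per_shard):
        index = index * extent + pos
    shard_name = ".".join(str(s) for s in shard_coords)
    key = PurePosixPath(container) / dataset / "shards" / shard_name
    return str(key), index
```

For example, with 4 × 4 chunks per shard, `locate_chunk("run42", "temperature", (5, 3), (4, 4))` returns `("run42/temperature/shards/1.0", 7)`: chunk (5, 3) lives in shard (1, 0) at local position (1, 3).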
Metadata Database for Faster Operations
Quincey discussed using an actual metadata database for faster operations, which could provide an advantage over Zarr. Aleksandar agreed with Quincey’s points but emphasized the importance of understanding and accessibility for scientists. He expressed concern about HDF5's complexity and the lack of alternatives, stating that these factors should be considered in their decision-making processes.
GPU Memory Storage and Compatibility
During the meeting, Joe Lee and Quincey discussed the potential for storing key-value pairs in GPU memory. Quincey confirmed that this is feasible but stressed the importance of an abstract, pluggable interface. They also discussed using NVIDIA's nvCOMP compression library. Joe Lee asked whether the instruction set for the H200 chip is open, and Quincey said he did not know. They also discussed the need for a vendor-neutral interface between HDF5 and NVIDIA GPUs, as well as the use of the NVIDIA GPUDirect® Storage (GDS) APIs to communicate with them.
HDF5 Architecture and Acceleration Discussion
Quincey presented the design of the HDF5 architecture, emphasizing the significance of source and destination data buffers residing on accelerators. He proposed a vendor-neutral approach to improve performance and suggested collaborating on the HDF5 GPU VFD. Joe Lee asked about benchmarking the HDF5 GPU VFD against other I/O libraries using an AI application that NVIDIA could showcase at GTC 2026; Quincey responded that the key metric is the I/O speedup for HDF5-based applications. He noted that his components show improvements, although the final product is not yet built. Gerd asked Aleksandar to clarify his remark about scientists understanding storage concepts, and Aleksandar explained that these scientists are early adopters of storage software.
Zarr’s Advantages Over HDF5
Aleksandar outlined the advantages of the Zarr format compared to HDF5, highlighting its simplicity and ease of implementation. He pointed out that scientists have adopted native Zarr implementations in various programming languages. He also mentioned that Zarr is now adding features it previously lacked, such as storing multiple chunks in a single file. Quincey proposed that the interface to the metadata database should also work with a plain-text JSON metadata file, which could serve as another metadata plugin.
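Quincey's suggestion that a plain-text JSON metadata file could be just another plugin behind the same interface as a metadata database can be sketched as an abstract key-value store with two interchangeable backends. The class names, the `put`/`get` methods, and the use of SQLite as the stand-in database are assumptions for illustration; the notes do not say which database or interface was meant.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class MetadataStore(ABC):
    """Abstract key-value interface for object metadata (hypothetical)."""

    @abstractmethod
    def put(self, key: str, value: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...


class SQLiteStore(MetadataStore):
    """Database-backed plugin: indexed lookups without parsing a whole file."""

    def __init__(self, path: str = ":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)")

    def put(self, key, value):
        with self._db:  # commits the transaction on success, rolls back on error
            self._db.execute("INSERT OR REPLACE INTO meta VALUES (?, ?)",
                             (key, json.dumps(value)))

    def get(self, key):
        row = self._db.execute(
            "SELECT value FROM meta WHERE key = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else None


class JSONFileStore(MetadataStore):
    """Plain-text plugin: all metadata lives in one human-readable JSON file."""

    def __init__(self, path: str):
        self._path = path
        try:
            with open(path) as f:
                self._data = json.load(f)
        except FileNotFoundError:
            self._data = {}

    def put(self, key, value):
        self._data[key] = value
        # Rewrite the whole file so it stays a valid, readable JSON document.
        with open(self._path, "w") as f:
            json.dump(self._data, f, indent=2, sort_keys=True)

    def get(self, key):
        return self._data.get(key)
```

Swapping `SQLiteStore` for `JSONFileStore` changes nothing for the caller, which is the point of keeping the interface abstract: a database backend can offer faster indexed lookups, while the JSON backend keeps the metadata human-readable.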
On-Node Storage and Sharded Proposals
Quincey introduced two technical proposals, one on GPU storage and one on sharded storage, and sought interest from participating organizations. Aleksandar expressed interest in the sharded storage proposal, while Neil was interested in both but pointed to funding constraints. Steve from Lifeboat raised concerns about aligning community and commercial interests. Quincey plans to begin sketching designs for the proposals but said that thread-safety work is currently taking up his time. The group agreed to reconvene in two weeks to continue the discussion.
- [ ] Scot will check whether collaborators can create branches within the HDF Group organization on GitHub.
- [ ] The HDF Group will present its vision for community collaboration at the next meeting.
- [ ] The HDF Group will provide an update on the HEP (HDF5 Enhancement Proposal) process at the next meeting.
- [ ] Quincey will initiate discussions about the accelerator native storage and sharded storage proposals in the forum within the next month, in light of a more formal HEP framework.