[hdf5] Improve h5_extended_fuzzer coverage #15309

Open
mch-byte wants to merge 2 commits into google:master from mch-byte:improve-hdf5-fuzzer


mch-byte commented Apr 6, 2026

Contributor

This PR improves the h5_extended_fuzzer for the HDF5 project. The existing fuzzer uses hardcoded dataset/attribute names that never match real file contents, has no input size limit (causing OOM-related coverage degradation on ClusterFuzz), and does not suppress log output.

Changes

  • Replace hardcoded "dsetname"/"theattr" with actual object iteration
  • Add bounded H5Dread to exercise chunk I/O, compact I/O, decompression, and type conversion
  • Add H5Dget_create_plist for filter pipeline and fill value parsing
  • Add H5Topen2 for committed datatype coverage
  • Add attribute iteration on both groups and datasets
  • Add H5Gget_info for group metadata paths
  • Add LLVMFuzzerInitialize with H5Eset_auto2 log suppression
  • Add input size bounds (8 bytes to 64 KB) and recursion depth limit (8)
  • Add h5_extended_fuzzer.dict with 38 HDF5 format signatures
  • Add h5_extended_fuzzer.options with max_len=65536
  • Update build.sh to copy dict and options to $OUT
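
The size bounds and depth limit listed above can be sketched in plain C. This is a minimal illustration, not the code from this PR: the constants come from the description, while `input_size_ok`, `node_t`, and `walk` are hypothetical names, and the node tree only stands in for the HDF5 group hierarchy that the real harness traverses.

```c
#include <stddef.h>
#include <stdint.h>

/* Limits taken from the PR description: 8 bytes to 64 KB, depth limit 8. */
#define MIN_INPUT_SIZE ((size_t)8)
#define MAX_INPUT_SIZE ((size_t)65536)
#define MAX_DEPTH 8

/* Returns 1 if the fuzz input should be processed, 0 if it should be skipped.
 * (Hypothetical helper; the harness may inline this check instead.) */
static int input_size_ok(size_t size) {
    return size >= MIN_INPUT_SIZE && size <= MAX_INPUT_SIZE;
}

/* Hypothetical stand-in for an HDF5 group: a node with child nodes. */
typedef struct node {
    const struct node *children;
    size_t num_children;
} node_t;

/* Visits a node and its descendants, refusing to descend once depth
 * reaches MAX_DEPTH. Returns the number of nodes visited. */
static size_t walk(const node_t *n, int depth) {
    size_t visited = 1;
    if (depth >= MAX_DEPTH)
        return visited; /* stop here so deeply nested input cannot exhaust the stack */
    for (size_t i = 0; i < n->num_children; i++)
        visited += walk(&n->children[i], depth + 1);
    return visited;
}
```

In a libFuzzer entry point, a check like `input_size_ok` would typically run first and return 0 early for rejected inputs, so that no temp file is written and no HDF5 call is made for them.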

Results

The improved fuzzer found a heap-buffer-overflow in H5D__compact_readvv (H5Dcompact.c:409), which has been reported to the HDF5 maintainers via a GitHub Security Advisory.

Current OSS-Fuzz coverage for h5_extended_fuzzer is 9.36% and degrading due to OOM. The improved fuzzer stabilizes coverage and reaches substantially more code paths, including chunk read, compact read, type conversion, filter pipeline, and committed datatype handling.
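
One way to keep a dataset read bounded is to validate the buffer size before allocating. The sketch below is an assumption about how such a check could look, not the logic in this PR: `bounded_read_size` and the 1 MiB `MAX_READ_BYTES` cap are hypothetical. In a real harness the element count and element size would presumably come from HDF5 calls such as `H5Sget_simple_extent_npoints` and `H5Tget_size`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap on the read buffer; the PR does not state its actual bound. */
#define MAX_READ_BYTES ((uint64_t)(1u << 20)) /* 1 MiB */

/* Computes npoints * elem_size into *out. Returns 1 only if the product
 * is nonzero and does not exceed MAX_READ_BYTES; returns 0 otherwise and
 * leaves *out untouched. Dividing the cap by elem_size first means the
 * multiplication can never overflow. */
static int bounded_read_size(uint64_t npoints, uint64_t elem_size, uint64_t *out) {
    if (npoints == 0 || elem_size == 0)
        return 0;
    if (npoints > MAX_READ_BYTES / elem_size)
        return 0;
    *out = npoints * elem_size;
    return 1;
}
```

A harness using this pattern would call the read only when the function returns 1, and skip the dataset otherwise. Crafted headers that declare enormous dimensions then cost nothing beyond the metadata parse.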

- Replace hardcoded dataset/attribute names with actual object
  iteration via H5Gget_num_objs/H5Gget_objname_by_idx
- Add bounded H5Dread to exercise chunk I/O, compact I/O,
  decompression, and type conversion paths
- Add H5Dget_create_plist for filter pipeline and fill value parsing
- Add H5Topen2 for committed datatype coverage
- Add attribute iteration on both groups and datasets
- Add H5Gget_info for group metadata paths
- Add LLVMFuzzerInitialize with H5Eset_auto2 log suppression
- Add input size bounds (8 bytes to 64 KB) to prevent OOM on crafted headers
- Add recursion depth limit (8) to prevent stack overflow
- Add unlink() cleanup of temp file
- Add h5_extended_fuzzer.dict with 38 HDF5 format signatures
- Add h5_extended_fuzzer.options with max_len=65536
- Update build.sh to copy dict and options to $OUT

This improved fuzzer found a heap-buffer-overflow in
H5D__compact_readvv (H5Dcompact.c:409) which has been reported
to the HDF5 maintainers via GitHub Security Advisory.

google-cla Bot commented Apr 6, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

github-actions Bot commented Apr 6, 2026

mch-byte is a new contributor to projects/hdf5. The PR must be approved by known contributors before it can be merged. The past contributors are: arthurscchan, hunsche, hyoklee, maflcko, derobins, DonggeLiu
