
Conversation

@fortnern (Member) commented Nov 2, 2025:

Various related changes, including refactoring part of dataset creation and reworking how layout versions are calculated.

This still needs more testing of filters that create very large chunks, but that will require code to disable those tests in cases where we can't allocate >4 GiB buffers. I tested this manually by increasing the expansion ratio in the expand2 test in dsets; it passed everything until it tried to expand the datasets with H5Dset_extent() and my laptop ran out of disk space.

We should also add code and testing to handle the case where the "size of size" is set to less than 8 bytes. This is not a new issue, since it can already be set to 2.

I will file issues for these, but I don't think they need to block the release.
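As a rough illustration of the "size of size" concern, a check of this shape would reject values that can't be encoded in the file's configured width (a minimal sketch with a hypothetical helper name, not an HDF5 API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of HDF5): can `value` be encoded in a
 * field that is `size_of_size` bytes wide in the file? */
static bool
fits_in_encoded_size(uint64_t value, unsigned size_of_size)
{
    if (size_of_size >= 8)
        return true; /* full 64-bit width, everything fits */

    return value < ((uint64_t)1 << (8 * size_of_size));
}
```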


Important

Add support for 64-bit chunk dimensions, refactor dataset creation, and rework layout version calculations.

  • Behavior:
    • Support for 64-bit chunk dimensions added, removing 32-bit limitations in H5D.c, H5Dbtree.c, and H5Dbtree2.c.
    • New macro H5D_CHUNK_ENCODE_SIZE_CHECK added to H5Dchunk.c to check chunk size encoding.
    • Refactored dataset creation and layout version calculation in H5Dchunk.c and H5Dint.c.
  • Data Types:
    • Changed uint32_t to hsize_t for chunk sizes in H5Dchunk.c, H5Dbtree.c, and H5Dbtree2.c.
    • Updated function signatures to use hsize_t for chunk sizes in H5D.c and H5Dpkg.h.
  • Misc:
    • Updated warning flags in config/gnu-warnings/4.8 and config/gnu-warnings/cxx-4.8.
    • Removed H5D__layout_set_version and H5D__layout_set_latest_indexing from H5Dlayout.c.

This description was created by Ellipsis for 9a04ce5.

assert(dset->shared->layout.u.chunk.ndims > 0);

/* In this function, some of these sizes may have already been set since they are sometimes stored in the
* file. If this is the case, verify the calculated sizes match the stored sizes. */
@fortnern (Member Author), Nov 2, 2025:

This is a strange thing I found: we are storing these values in the file but never using the stored copies. I added these file integrity checks here.
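The kind of integrity check described here can be sketched as follows (a hypothetical helper with illustrative names, not the actual H5D code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of a stored-vs-computed consistency check: recompute
 * the chunk byte size from the chunk dimensions and element size, and
 * compare it against the value stored in the file. */
static bool
chunk_size_matches_stored(const uint64_t *dim, unsigned ndims,
                          uint64_t elmt_size, uint64_t stored_size)
{
    uint64_t computed = elmt_size;

    for (unsigned u = 0; u < ndims; u++)
        computed *= dim[u];

    return computed == stored_size;
}
```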


/* Calculate layout version */
/* Do not downgrade version */
version_req = layout->version;
Member Author:

This function took over the role of the deleted H5D__layout_set_latest_indexing() and now handles the main layout version calculation. We must upgrade to at least version_req; we upgrade to version_perf if the version bounds allow it, and otherwise only up to the low bound.
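Under those rules, the selection logic can be sketched roughly like this (a simplified model, not the real H5Dlayout code; the bound parameters stand in for the versions implied by the file's low/high bounds):

```c
/* Simplified model of the version-selection rule described above: never
 * drop below version_req, prefer version_perf when the high bound allows
 * it, and otherwise raise the version only as far as the low bound
 * implies. */
static unsigned
pick_layout_version(unsigned version_req, unsigned version_perf,
                    unsigned low_bound_ver, unsigned high_bound_ver)
{
    unsigned version = version_req; /* never downgrade */

    if (version_perf <= high_bound_ver) {
        if (version_perf > version)
            version = version_perf; /* performance version fits the bounds */
    }
    else if (low_bound_ver > version)
        version = low_bound_ver; /* bump only up to the low bound */

    return version;
}
```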

*-------------------------------------------------------------------------
*/
herr_t
H5D__layout_set_version(H5F_t *f, H5O_layout_t *layout)
Member Author:

Functionality moved to the construct callbacks and H5D__create (for version bounds check)

*-------------------------------------------------------------------------
*/
herr_t
H5D__layout_set_latest_indexing(H5D_t *dset)
Member Author:

Functionality moved to H5D__chunk_construct()

HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, FAIL, "can't get low/high bounds from API context");

/* Determine version - only upgrade for chunked datasets that have a dimension size >32 bits */
if (H5D_CHUNKED == layout->type)
Member Author:

The new layout property encoding also needs a test - I will file an issue

}

static int
run_tests(char *filename, hid_t fapl, fsizes_t testsize, int wrt_n)
Member Author:

Added many test cases to this test

dcpl = H5Pcreate(H5P_DATASET_CREATE);
CHECK(dcpl, FAIL, "H5Pcreate");

/* Try to use chunked storage for this dataset */
Member Author:

This is now allowed. We could cap the high bound to test that it's rejected in that case but that's already tested elsewhere.

return FAIL;
} /* end test_simple_io() */

/*-------------------------------------------------------------------------
Member Author:

I wrote this to test the code I had written for scalar chunked datasets, before I realized that such datasets are impossible. Some existing code seemed to imply they would work.

mattjala previously approved these changes Nov 3, 2025

@mattjala (Contributor) left a comment:
Besides clarifying questions, LGTM

src/H5Dpkg.h Outdated
tconv_buf_hsize += (PIECE_INFO)->piece_points * \
MAX((DINFO)->type_info.src_type_size, (DINFO)->type_info.dst_type_size); \
(IO_INFO)->tconv_buf_size = (size_t)tconv_buf_hsize; \
if (H5_UNLIKELY((hsize_t)(IO_INFO)->tconv_buf_size != tconv_buf_hsize)) \
Collaborator:

Does this detect when hsize_t itself overflows during the addition at line 113?

Collaborator:

On 64-bit systems where sizeof(hsize_t) == sizeof(size_t) == 8:

  • The addition overflows first, wrapping tconv_buf_hsize
  • The wrapped value gets cast to size_t and back
  • The comparison is wrapped_value != wrapped_value → FALSE
  • Overflow goes undetected
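The standard way to catch that wraparound is to compare the sum against one of the operands, since unsigned overflow in C is well-defined modulo 2^64 (a minimal sketch with a hypothetical helper name, not an HDF5 API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: add two unsigned 64-bit values, storing the sum in
 * *out and returning false when the addition wrapped.  Because unsigned
 * overflow wraps, the sum is smaller than either operand exactly when
 * overflow occurred. */
static bool
checked_add_u64(uint64_t a, uint64_t b, uint64_t *out)
{
    *out = a + b;
    return *out >= a;
}
```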

Collaborator:

Is there some constraint that prevents the multiplication from overflowing?

Collaborator:

if (!(PIECE_INFO)->in_place_tconv) {                                                            \
    hsize_t tconv_buf_hsize;                                                                    \
    hsize_t old_size;                                                                           \
    H5_CHECKED_ASSIGN(tconv_buf_hsize, hsize_t, (IO_INFO)->tconv_buf_size, size_t);             \
    old_size = tconv_buf_hsize;                                                                 \
    tconv_buf_hsize += (PIECE_INFO)->piece_points *                                             \
                       MAX((DINFO)->type_info.src_type_size, (DINFO)->type_info.dst_type_size); \
    if (H5_UNLIKELY(tconv_buf_hsize < old_size || tconv_buf_hsize > SIZE_MAX))                  \
        (IO_INFO)->tconv_buf_overflow = true;                                                   \
    else                                                                                        \
        (IO_INFO)->tconv_buf_size = (size_t)tconv_buf_hsize;                                    \
}

Member Author:

It probably wouldn't handle a 64-bit value overflowing correctly, but I'm sure this is far from the only place in the library with that issue.

Member Author:

By the time we need to start worrying about that, we'll be doing calculations in 128 bits and checking for overflow that way.

@brtnfld (Collaborator), Nov 3, 2025:

Is that true now that we have complex support? So you don't expect multi-exabyte buffer accumulations until 128-bit systems? The check is two lines; it does not seem like a big deal to add it.

@fortnern (Member Author), Nov 4, 2025:

I'm not sure I follow re: complex support. I suppose checks like this could potentially guard against wildly invalid input data. I'll add it if you want me to; just be aware that there are many other places that don't guard against uint64 overflow (I'm not sure I've seen anywhere that does), so the check will likely have no effect until we add checks in most of those other places as well.

Member Author:

Check added

mattjala previously approved these changes Nov 3, 2025
HGOTO_ERROR(H5E_DATASET, H5E_CANTFILTER, FAIL, "output pipeline failed");
#if H5_SIZEOF_SIZE_T > 4
/* Check for the chunk expanding too much to encode in a 32-bit value */
if (nbytes > ((size_t)0xffffffff))
Collaborator:

Similar to the comment above: just highlighting a few places for review in case they also need an H5D_CHUNK_ENCODE_SIZE_CHECK check. Here:

Member Author:

This is not necessary here because the check will be performed when the chunk is allocated on disk in H5D__chunk_file_alloc


#if H5_SIZEOF_SIZE_T > 4
/* Check for the chunk expanding too much to encode in a 32-bit value */
if (nbytes > ((size_t)0xffffffff))
Collaborator:

Here as well

Member Author:

Same

HGOTO_ERROR(H5E_PLINE, H5E_CANTFILTER, H5_ITER_ERROR, "output pipeline failed");
#if H5_SIZEOF_SIZE_T > 4
/* Check for the chunk expanding too much to encode in a 32-bit value */
if (nbytes > ((size_t)0xffffffff))
Collaborator:

And here

Member Author:

Same

@jhendersonHDF (Collaborator) left a comment:

Generally looks good, just a few comments around potential checks needed

jhendersonHDF previously approved these changes Nov 4, 2025
mattjala previously approved these changes Nov 4, 2025
@fortnern dismissed stale reviews from mattjala and jhendersonHDF via 204b0f9 on November 4, 2025
@fortnern (Member Author) commented Nov 4, 2025:

I realized the H5D_CHUNK_ENCODE_SIZE_CHECK in H5D__chunk_direct_write is also redundant, so I removed that too. I'll leave the macro in for now even though it's only used in H5D__chunk_file_alloc.

@lrknox merged commit 1093a18 into HDFGroup:develop on Nov 4, 2025 (95 checks passed)
github-project-automation bot moved this from To be triaged to Done in HDF5 - TRIAGE & TRACK on Nov 4, 2025
@fortnern deleted the 64bit_chunks branch on November 14, 2025