Add support for 64 bit chunks, including 64 bit chunk dimensions #5965
Conversation
```c
    assert(dset->shared->layout.u.chunk.ndims > 0);

    /* In this function, some of these sizes may have already been set since they are sometimes stored in the
     * file. If this is the case, verify the calculated sizes match the stored sizes. */
```
This is something strange I found: we were storing these sizes in the file but never using the stored values. I added these file integrity checks here.
```c
    /* Calculate layout version */
    /* Do not downgrade version */
    version_req = layout->version;
```
This function took over the role of the deleted H5D__layout_set_latest_indexing() and now handles the main layout version calculation. We must upgrade to at least version_req; we upgrade to version_perf if the version bounds allow it, and otherwise up to the low bound.
```c
 *-------------------------------------------------------------------------
 */
herr_t
H5D__layout_set_version(H5F_t *f, H5O_layout_t *layout)
```
Functionality moved to the construct callbacks and H5D__create (for version bounds check)
```c
 *-------------------------------------------------------------------------
 */
herr_t
H5D__layout_set_latest_indexing(H5D_t *dset)
```
Functionality moved to H5D__chunk_construct()
```c
        HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, FAIL, "can't get low/high bounds from API context");

    /* Determine version - only upgrade for chunked datasets that have a dimension size >32 bits */
    if (H5D_CHUNKED == layout->type)
```
The new layout property encoding also needs a test; I will file an issue.
```c
}

static int
run_tests(char *filename, hid_t fapl, fsizes_t testsize, int wrt_n)
```
Added many test cases to this test
```c
    dcpl = H5Pcreate(H5P_DATASET_CREATE);
    CHECK(dcpl, FAIL, "H5Pcreate");

    /* Try to use chunked storage for this dataset */
```
This is now allowed. We could cap the high bound to test that it's rejected in that case, but that's already tested elsewhere.
```c
    return FAIL;
} /* end test_simple_io() */

/*-------------------------------------------------------------------------
```
I wrote this to test the code I had written for scalar chunked datasets, before realizing that such datasets are impossible. Some existing code seemed to imply they would work.
mattjala left a comment:
Besides clarifying questions, LGTM
src/H5Dpkg.h (outdated)
```c
        tconv_buf_hsize += (PIECE_INFO)->piece_points *                                              \
                           MAX((DINFO)->type_info.src_type_size, (DINFO)->type_info.dst_type_size);  \
        (IO_INFO)->tconv_buf_size = (size_t)tconv_buf_hsize;                                         \
        if (H5_UNLIKELY((hsize_t)(IO_INFO)->tconv_buf_size != tconv_buf_hsize))                      \
```
Does this detect when hsize_t itself overflows during the addition at line 113?
On 64-bit systems where `sizeof(hsize_t) == sizeof(size_t) == 8`:
- The addition overflows first, wrapping `tconv_buf_hsize`
- The wrapped value gets cast to `size_t` and back
- The comparison becomes `wrapped_value != wrapped_value`, which is false
- The overflow goes undetected
Is there some constraint that prevents the multiplication from overflowing?
```c
if (!(PIECE_INFO)->in_place_tconv) {                                                         \
    hsize_t tconv_buf_hsize;                                                                 \
    hsize_t old_size;                                                                        \
    H5_CHECKED_ASSIGN(tconv_buf_hsize, hsize_t, (IO_INFO)->tconv_buf_size, size_t);          \
    old_size = tconv_buf_hsize;                                                              \
    tconv_buf_hsize += (PIECE_INFO)->piece_points *                                          \
                       MAX((DINFO)->type_info.src_type_size, (DINFO)->type_info.dst_type_size); \
    if (H5_UNLIKELY(tconv_buf_hsize < old_size || tconv_buf_hsize > SIZE_MAX))               \
        (IO_INFO)->tconv_buf_overflow = true;                                                \
    else                                                                                     \
        (IO_INFO)->tconv_buf_size = (size_t)tconv_buf_hsize;                                 \
}
```
It probably wouldn't handle a 64 bit value overflowing correctly, but I'm sure this would be far from the only place in the library that does so.
By the time we need to start worrying about that, we'll be doing the calculations in 128 bits and checking for overflow that way.
Is that still true now that we have complex support? Do you really not expect multi-exabyte buffer accumulations until 128-bit systems arrive? The check is two lines; it doesn't seem like a big deal to add it.
I'm not sure I follow re: complex support. I suppose checks like this could potentially guard against wildly invalid input data. I'll add it if you want me to; just be aware that there are many other places that don't guard against uint64 overflow (I'm not sure I've seen anywhere that does), so the check will likely have no effect until we add checks in most of those other places as well.
Check added
```c
        HGOTO_ERROR(H5E_DATASET, H5E_CANTFILTER, FAIL, "output pipeline failed");
#if H5_SIZEOF_SIZE_T > 4
    /* Check for the chunk expanding too much to encode in a 32-bit value */
    if (nbytes > ((size_t)0xffffffff))
```
Similar to the comment above, just highlighting a few places for review in case they also need an H5D_CHUNK_ENCODE_SIZE_CHECK check. Here
This is not necessary here because the check will be performed when the chunk is allocated on disk in H5D__chunk_file_alloc
```c
#if H5_SIZEOF_SIZE_T > 4
    /* Check for the chunk expanding too much to encode in a 32-bit value */
    if (nbytes > ((size_t)0xffffffff))
```
Here as well
Same
```c
        HGOTO_ERROR(H5E_PLINE, H5E_CANTFILTER, H5_ITER_ERROR, "output pipeline failed");
#if H5_SIZEOF_SIZE_T > 4
    /* Check for the chunk expanding too much to encode in a 32-bit value */
    if (nbytes > ((size_t)0xffffffff))
```
And here
Same
jhendersonHDF left a comment:
Generally looks good, just a few comments around potential checks needed
H5D_INIT_PIECE_TCONV macro

I realized the H5D_CHUNK_ENCODE_SIZE_CHECK in H5D__chunk_direct_write is also redundant, so I removed that too. I'll leave the macro for now even though it's only used in H5D__chunk_file_alloc.
Various related changes, including refactoring part of dataset creation, and reworking how layout versions are calculated.
Needs more testing of filters that create very large chunks, but that will need code to turn it off in cases where we can't allocate >4GiB buffers. I tested this manually by increasing the expansion ratio in the expand2 test in dsets, and it passed everything up until it tried to expand the datasets with H5Dset_extent() and my laptop ran out of disk space.
We should add code and testing to handle the case where the "size of size" is set less than 8 bytes. This is not a new issue, since it can be set to 2.
I will file issues for these, but I don't think they are necessary for the release.
Important
Add support for 64-bit chunk dimensions, refactor dataset creation, and rework layout version calculations.
- H5D.c, H5Dbtree.c, and H5Dbtree2.c
- H5D_CHUNK_ENCODE_SIZE_CHECK added to H5Dchunk.c to check chunk size encoding
- H5Dchunk.c and H5Dint.c
- uint32_t to hsize_t for chunk sizes in H5Dchunk.c, H5Dbtree.c, and H5Dbtree2.c
- hsize_t for chunk sizes in H5D.c and H5Dpkg.h
- config/gnu-warnings/4.8 and config/gnu-warnings/cxx-4.8
- H5D__layout_set_version and H5D__layout_set_latest_indexing removed from H5Dlayout.c