Skip to content

Conversation

@teo-tsirpanis
Copy link
Member

This PR builds on top of #5646, and updates the fragment metadata format specification to an extensible design that will not require another versin bump for future additions.


TYPE: NO_HISTORY

@teo-tsirpanis teo-tsirpanis force-pushed the teo/core-415-implement-extensible-fragment-metadata-footer branch from 9d3c0ec to 17a1a24 Compare November 13, 2025 20:09
@ypatia ypatia requested a review from rroelke November 17, 2025 13:44
case DataDirectoryIdentifier::tile_global_order_offsets:
if (data_size != 2 * array_schema_->dim_num() * sizeof(uint64_t)) {
throw FragmentMetadataStatusException(
"Invalid size of tile global order bound offsets data directory");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add expected/found here so we can potentially cut a round trip out of the bug report.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error would happen only if the fragment metadata file was corrupted. If somebody got it, just the expected/found sizes would not help us with diagnosing it, and we would need the whole array and/or a repro either way.

serializer.write(DataDirectoryIdentifier::tile_global_order_offsets);
const auto num_dims = array_schema_->dim_num();
serializer.write<uint32_t>(2 * num_dims * sizeof(uint64_t));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
iassert(gt_offsets_.tile_global_order_min_offsets_.size() == 2 * num_dims);
iassert(gt_offsets_.tile_global_order_max_offsets_.size() == 2 * num_dims);

and then pass the size() value to write instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (modified).

}

/** has_tile_global_order_bounds accessor */
bool& has_tile_global_order_bounds() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks so odd... I see it's because the serializer requires friend access without being a friend. I've seen similarly odd non-const accessors in other places. I feel like that was the wrong paradigm to use.

I'm not suggesting anything actionable here, just musing

Copy link
Member

@rroelke rroelke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good but I am still having a hard time with the terminology "data directory" and I do think we should pick something else. And yet this is why naming things is the second hard thing in computer science.

What comes to mind is "semi-structured data". This work turns the fragment metadata footer from a highly structured binary file to a semi-structured one.

Fragment metadata can contain semi-structured attributes.

Or "self-describing data" where the entries have a key that tells you what they are. This feels more apt.

Fragment metadata footers can contain self-describing attributes.

Or even just optional, since all fields are optional going forward with this design:

Fragment metadata footers can contain any number of optional attributes which provide additional metadata.

@teo-tsirpanis teo-tsirpanis force-pushed the teo/core-415-implement-extensible-fragment-metadata-footer branch from 63e346f to d26a8af Compare December 8, 2025 12:51
Copy link
Member Author

@teo-tsirpanis teo-tsirpanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed "data directory" to "optional section".

serializer.write(DataDirectoryIdentifier::tile_global_order_offsets);
const auto num_dims = array_schema_->dim_num();
serializer.write<uint32_t>(2 * num_dims * sizeof(uint64_t));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (modified).

@teo-tsirpanis teo-tsirpanis requested a review from rroelke December 8, 2025 12:51
@teo-tsirpanis teo-tsirpanis marked this pull request as ready for review December 8, 2025 12:52
@teo-tsirpanis
Copy link
Member Author

Merging per offline discussion…

@teo-tsirpanis teo-tsirpanis merged commit eb415af into rr/core-321-fragment-metdata-global-order-bounds Dec 12, 2025
2 of 3 checks passed
@teo-tsirpanis teo-tsirpanis deleted the teo/core-415-implement-extensible-fragment-metadata-footer branch December 12, 2025 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants