Skip decompression of pruned parquet pages #20192
base: branch-25.12
return 0;
}

// If this page is pruned and has a list parent, set the batch size for this depth to 0 to
Removing what we added in #20086, since page sizes (and batch sizes) are now handled in compute_page_sizes_kernel instead.
thrust::make_counting_iterator<size_t>(key_start),
thrust::make_counting_iterator<size_t>(key_start + num_keys_this_iter),
size_input.begin(),
get_page_nesting_size{
page_mask.data() is not needed anymore.
Only two if blocks need special attention while reviewing. The rest is trivial.
return;
}

if (page_mask.size() and not page_mask[page_idx]) {
@nvdbaranec @pmattione-nvidia please review this if block
Just one minor comment.
Description
Follow-up to #20086 and #19986.
This PR enables skipping decompression of Parquet data pages marked as pruned in the new experimental Parquet reader. It also zeros out the nesting size information (used to allocate output buffers) for pruned pages at the point where it is computed, instead of resetting it later, just before buffer allocation, as was done in #20086.
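For reviewers who want a mental model of the two behaviours described above, here is a minimal host-side C++ sketch. It is not the code in this PR: `page_desc`, `is_pruned`, `pages_to_decompress`, and `compute_nesting_sizes` are hypothetical names, and the real logic runs on the GPU over the reader's own page structures. The only thing taken from the diff is the mask convention, where an empty mask means no pruning and a `false` entry marks a pruned page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for per-page metadata (not the reader's real struct).
struct page_desc {
  std::size_t compressed_size;    // bytes as stored in the file
  std::size_t uncompressed_size;  // bytes after decompression
  std::size_t nesting_size;       // contribution to the output buffer size
};

// Same convention as the check in the diff:
// an empty mask disables pruning; otherwise `false` means the page is pruned.
inline bool is_pruned(std::vector<bool> const& page_mask, std::size_t page_idx)
{
  return !page_mask.empty() && !page_mask[page_idx];
}

// Collects the indices of pages that still need decompression.
// Pruned pages are skipped, so no decompression work is scheduled for them.
std::vector<std::size_t> pages_to_decompress(std::vector<page_desc> const& pages,
                                             std::vector<bool> const& page_mask)
{
  assert(page_mask.empty() || page_mask.size() == pages.size());
  std::vector<std::size_t> indices;
  for (std::size_t i = 0; i < pages.size(); ++i) {
    if (!is_pruned(page_mask, i)) { indices.push_back(i); }
  }
  return indices;
}

// Computes per-page nesting sizes, writing 0 for pruned pages at computation time
// so that no separate reset pass is needed before allocating output buffers.
std::vector<std::size_t> compute_nesting_sizes(std::vector<page_desc> const& pages,
                                               std::vector<bool> const& page_mask)
{
  assert(page_mask.empty() || page_mask.size() == pages.size());
  std::vector<std::size_t> sizes(pages.size(), 0);
  for (std::size_t i = 0; i < pages.size(); ++i) {
    sizes[i] = is_pruned(page_mask, i) ? 0 : pages[i].nesting_size;
  }
  return sizes;
}
```

Because pruned pages already report a size of zero when sizes are computed, any later sum or scan over the sizes yields the correct buffer size without a second pass over the mask.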
Checklist