@mhaseeb123 mhaseeb123 commented Sep 23, 2025

Description

Follow-up to #19986.

This PR reduces the output column buffer sizes needed to materialize columns with a list parent, such as list<list<...>>, list<str>, and list<list<..<str>..>>, when reading pruned Parquet pages in the next-gen reader. Doing so also eliminates non-empty nulls across list hierarchies, which speeds up their materialization.
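
To illustrate the idea, here is a minimal host-side sketch and not the libcudf implementation. It assumes that rows in pruned pages are materialized as nulls, and it takes a "non-empty null" to mean a null list row whose offsets still span child elements. `Page`, `ListLayout`, `plan_layout`, and `shrink_pruned` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one data page of a list<int> column.
struct Page {
  std::vector<int32_t> list_lengths;  // number of child values in each row of the page
};

// Hypothetical description of the output buffers for the list column.
struct ListLayout {
  std::vector<int32_t> offsets;  // num_rows + 1 entries
  std::vector<bool> valid;       // per-row validity
  std::size_t child_size;        // number of child elements to allocate
};

// Plans offsets, validity, and child buffer size for the given pages.
// page_mask[p] == false marks page p as pruned (assumed to produce null rows).
// With shrink_pruned == false, pruned rows still reserve child space, which
// yields non-empty nulls. With shrink_pruned == true, pruned rows get
// zero-length lists, so the child buffer only covers decoded pages.
ListLayout plan_layout(std::vector<Page> const& pages,
                       std::vector<bool> const& page_mask,
                       bool shrink_pruned)
{
  assert(pages.size() == page_mask.size());
  ListLayout out;
  out.offsets.push_back(0);
  for (std::size_t p = 0; p < pages.size(); ++p) {
    bool const decoded = page_mask[p];
    for (int32_t const len : pages[p].list_lengths) {
      int32_t const reserved = (decoded || !shrink_pruned) ? len : 0;
      out.offsets.push_back(out.offsets.back() + reserved);
      out.valid.push_back(decoded);
    }
  }
  out.child_size = static_cast<std::size_t>(out.offsets.back());
  return out;
}
```

For three pages with list lengths {2, 3}, {4, 1}, and {5}, where the middle page is pruned, the sketch reserves 15 child elements without shrinking and 10 with it, and the two null rows end up with equal start and end offsets.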

copy-pr-bot bot commented Sep 23, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@github-actions github-actions bot added the libcudf label Sep 23, 2025
@mhaseeb123 mhaseeb123 added the 2 - In Progress, tests, cuIO, strings, improvement, and non-breaking labels Sep 23, 2025
return std::pair{std::move(table), std::move(buffer)};
}

/**

@mhaseeb123 mhaseeb123 Sep 23, 2025

All of this was simply moved as-is from hybrid_scan_test.cpp. No need to review.

return cudf::test::strings_column_wrapper(elements, elements + num_ordered_rows);
}

std::unique_ptr<cudf::table> concatenate_tables(std::vector<std::unique_ptr<cudf::table>> tables,

@mhaseeb123 mhaseeb123 Sep 23, 2025

All of this was simply moved as-is from hybrid_scan_test.cpp. No need to review.

@mhaseeb123 mhaseeb123 added the 3 - Ready for Review label Sep 29, 2025
@mhaseeb123 mhaseeb123 marked this pull request as ready for review September 29, 2025 22:04
@mhaseeb123 mhaseeb123 requested a review from a team as a code owner September 29, 2025 22:04
* @param page_mask Page mask indicating if this column needs to be decoded
* @param min_rows crop all rows below min_row
* @param num_rows Maximum number of rows to read
* other settings and records the result in the PageInfo::str_bytes_all field

Member Author

stale comments

@mhaseeb123 mhaseeb123 changed the title from "Reduce output buffer sizes for pruned pages of compound columns with a list parent" to "Reduce output buffer sizes for pruned pages of columns with a list parent" Sep 29, 2025
@mhaseeb123 mhaseeb123 added the 4 - Needs Review label and removed the 3 - Ready for Review label Sep 30, 2025
mhaseeb123 and others added 2 commits October 7, 2025 18:25
Co-authored-by: Vukasin Milovanovic <vmilovanovic@nvidia.com>
@mhaseeb123

pre-commit.ci autofix

@mhaseeb123 mhaseeb123 added the 5 - Ready to Merge label and removed the 4 - Needs Review label Oct 8, 2025
@mhaseeb123

/ok to test a471b6e

@mhaseeb123

/merge

@rapids-bot rapids-bot bot merged commit 805043e into rapidsai:branch-25.12 Oct 8, 2025
132 checks passed
@mhaseeb123 mhaseeb123 deleted the fea/reduce-output-buffer-sizes-for-pruned-pages branch October 8, 2025 20:06