
@rahil-c (Contributor) commented Oct 22, 2025

Describe the issue this Pull Request addresses

Feature: #14127
Discussion: #14128

Note: this PR builds on the writer-side PR #14131; I recommend reviewing that PR first.

Goal: We should be able to construct a reader that takes in a Lance file and returns an iterator of InternalRows. The reader should also be able to read only a required subset of columns, so that the row keys can be parsed from the file efficiently.
Interface:
 
Exit criteria: We should be able to read back the files written by the HoodieFileWriter implementation, and to efficiently read back the set of keys in each of those files.
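To make the column-pruning goal concrete, here is a minimal, self-contained Java sketch. It does not use the real Lance, Arrow, or Spark APIs: ColumnarBatch, ProjectedRowIterator, and KeyReader are hypothetical names, per-column Object arrays stand in for Arrow vectors, and Object[] rows stand in for InternalRow. It only illustrates why reading the keys is cheap in a columnar layout: the iterator touches nothing but the requested columns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a columnar file: each column is stored as its own array.
final class ColumnarBatch {
  final List<String> columnNames;
  final List<Object[]> columns; // columns.get(i)[row] is the value of column i at that row

  ColumnarBatch(List<String> columnNames, List<Object[]> columns) {
    this.columnNames = columnNames;
    this.columns = columns;
  }

  int numRows() {
    return columns.isEmpty() ? 0 : columns.get(0).length;
  }
}

// Iterates rows but only materializes the requested columns (projection).
final class ProjectedRowIterator implements Iterator<Object[]> {
  private final List<Object[]> projected = new ArrayList<>();
  private final int numRows;
  private int next = 0;

  ProjectedRowIterator(ColumnarBatch batch, List<String> requiredColumns) {
    for (String name : requiredColumns) {
      int idx = batch.columnNames.indexOf(name);
      if (idx < 0) {
        throw new IllegalArgumentException("Unknown column: " + name);
      }
      projected.add(batch.columns.get(idx)); // unrequested columns are never touched
    }
    this.numRows = batch.numRows();
  }

  @Override
  public boolean hasNext() {
    return next < numRows;
  }

  @Override
  public Object[] next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    Object[] row = new Object[projected.size()];
    for (int c = 0; c < projected.size(); c++) {
      row[c] = projected.get(c)[next];
    }
    next++;
    return row;
  }
}

final class KeyReader {
  // Reads only the key column, e.g. to collect the set of keys stored in a file.
  static List<String> readKeys(ColumnarBatch batch, String keyColumn) {
    List<String> keys = new ArrayList<>();
    Iterator<Object[]> it = new ProjectedRowIterator(batch, Collections.singletonList(keyColumn));
    while (it.hasNext()) {
      keys.add((String) it.next()[0]);
    }
    return keys;
  }
}
```

For example, projecting only the record-key column (in Hudi, the _hoodie_record_key meta column) returns the keys without reading any data columns. The actual reader would presumably get the same effect by asking Lance for a column subset rather than by filtering in memory.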

Summary and Changelog

  • Add HoodieSparkLanceReader, which implements the HoodieSparkFileReader interface
  • Use Lance and Arrow APIs for schema conversion and for reading data from a Lance file back into an iterator of InternalRow
  • Add round-trip tests that write data with HoodieSparkLanceWriter and read it back with HoodieSparkLanceReader
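The round-trip tests follow a simple pattern: write rows, read them back, and check that nothing changed. A minimal sketch of that pattern is below. ToyColumnarCodec is a hypothetical in-memory stand-in for HoodieSparkLanceWriter and HoodieSparkLanceReader; it uses no Lance or Spark APIs and simply transposes rows into per-column arrays and back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy codec: rows are transposed into per-column arrays on write
// (loosely mimicking a columnar layout) and transposed back into rows on read.
final class ToyColumnarCodec {
  static List<Object[]> write(List<Object[]> rows, int numColumns) {
    List<Object[]> columns = new ArrayList<>();
    for (int c = 0; c < numColumns; c++) {
      Object[] column = new Object[rows.size()];
      for (int r = 0; r < rows.size(); r++) {
        column[r] = rows.get(r)[c];
      }
      columns.add(column);
    }
    return columns;
  }

  static List<Object[]> read(List<Object[]> columns) {
    int numRows = columns.isEmpty() ? 0 : columns.get(0).length;
    List<Object[]> rows = new ArrayList<>();
    for (int r = 0; r < numRows; r++) {
      Object[] row = new Object[columns.size()];
      for (int c = 0; c < columns.size(); c++) {
        row[c] = columns.get(c)[r];
      }
      rows.add(row);
    }
    return rows;
  }

  // Round-trip check: every row read back must equal the row that was written, in order.
  static boolean roundTrips(List<Object[]> rows, int numColumns) {
    List<Object[]> readBack = read(write(rows, numColumns));
    if (readBack.size() != rows.size()) {
      return false;
    }
    for (int r = 0; r < rows.size(); r++) {
      if (!Arrays.equals(rows.get(r), readBack.get(r))) { // null-safe element comparison
        return false;
      }
    }
    return true;
  }
}
```

A test against the actual classes would presumably write to a temporary file and compare InternalRow values field by field, including nulls, but the assertion has the same shape.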

Impact

None

Risk Level

Low

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

github-actions bot added the size:XL label (PR with lines of changes > 1000) Oct 22, 2025

@rahil-c (Contributor Author) commented Oct 22, 2025

@rahil-c force-pushed the rahil/rfc100-hudi-lance-writer-reader branch 3 times, most recently from bb42add to 07690b6, Oct 28, 2025 21:39
@rahil-c force-pushed the rahil/rfc100-hudi-lance-writer-reader branch from 07690b6 to 8f230ca, Oct 28, 2025 23:44
github-actions bot added the size:L label (PR with lines of changes in (300, 1000]) and removed the size:XL label (PR with lines of changes > 1000) Oct 28, 2025

@rahil-c (Contributor Author) commented Oct 28, 2025

cc @the-other-tim-brown @yihua, could you take a look when you get a chance?

@hudi-bot commented:

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@yihua self-assigned this Oct 29, 2025

Labels

size:L PR with lines of changes in (300, 1000]


3 participants