perf: Improve metadata record merging and compaction with lazy deserialization #14103

Draft
yihua wants to merge 8 commits into apache:master from yihua:perf-mdt-record-lazy-deserialization

Contributor

@yihua yihua commented Oct 16, 2025

Describe the issue this Pull Request addresses

Fixes #14370

Summary and Changelog

Impact

Risk Level

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Oct 16, 2025
@yihua yihua force-pushed the perf-mdt-record-lazy-deserialization branch from dd4ba58 to ab45fdb Compare November 13, 2025 21:22
import static org.apache.hudi.common.util.StringUtils.fromUTF8Bytes;
import static org.apache.hudi.metadata.HoodieMetadataPayload.KEY_FIELD_NAME;

public class SerializableMetadataIndexedRecord implements GenericRecord, KryoSerializable, Serializable {
Contributor

Instead of introducing another class, can we just update the existing metadata record so we have less code to migrate?

Contributor Author

I'm still exploring the right approach. I think if we'd like to avoid the Avro deserialization and keep the byte array read from the HFile until the deserialization is necessary, we still need a separate data structure besides the existing metadata record or payload class.
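
To illustrate the idea of holding the byte array until deserialization is necessary, here is a minimal sketch of a lazy wrapper. It is not the class in this PR: `LazyRecord`, its method names, and the `Function<byte[], T>` decoder are hypothetical, and a plain UTF-8 string decoder stands in for the Avro decoding so the example needs no dependencies. It also assumes single-threaded access.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch, not the PR's SerializableMetadataIndexedRecord:
// keeps the raw bytes and decodes them only on first access.
final class LazyRecord<T> {
  private final String key;
  private final byte[] rawBytes;
  private final Function<byte[], T> decoder; // stand-in for an Avro decoder
  private T decoded;
  private boolean isDecoded;

  LazyRecord(String key, byte[] rawBytes, Function<byte[], T> decoder) {
    this.key = Objects.requireNonNull(key);
    this.rawBytes = Objects.requireNonNull(rawBytes);
    this.decoder = Objects.requireNonNull(decoder);
  }

  // The key is available without touching the payload bytes.
  String getKey() {
    return key;
  }

  // Raw bytes can be passed through (for example, copied to a new file) without decoding.
  byte[] getRawBytes() {
    return rawBytes;
  }

  boolean isDeserialized() {
    return isDecoded;
  }

  // Decodes on the first call and caches the result for later calls.
  // Not thread-safe; this sketch assumes single-threaded use.
  T get() {
    if (!isDecoded) {
      decoded = decoder.apply(rawBytes);
      isDecoded = true;
    }
    return decoded;
  }

  public static void main(String[] args) {
    LazyRecord<String> record = new LazyRecord<>(
        "file-1",
        "payload".getBytes(StandardCharsets.UTF_8),
        bytes -> new String(bytes, StandardCharsets.UTF_8));
    System.out.println(record.getKey() + " deserialized=" + record.isDeserialized());
    System.out.println(record.get() + " deserialized=" + record.isDeserialized());
  }
}
```

In a sketch like this, code paths that only need the key or only forward the bytes never pay the decoding cost, which is the reason a separate structure from the eagerly deserialized record or payload may still be needed.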

@yihua yihua force-pushed the perf-mdt-record-lazy-deserialization branch from 967b2f9 to b6192e2 Compare November 26, 2025 22:21
@hudi-bot
Collaborator

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

Development

Successfully merging this pull request may close these issues.

Improve metadata record merging and compaction with lazy Avro serde