@shayanh shayanh commented Aug 26, 2025

This PR introduces a new MPT implementation to optimize node reference calculation and data copying. The implementation is in a new crate named mptnew, though it is mostly based on the existing MPT implementation from #427.

The main contribution of this PR is a new serialization technique. MPT nodes are RLP-encoded and serialized recursively. In the guest program, deserialization is performed by traversing the serialized byte array. Since the nodes are already RLP-encoded, we avoid re-encoding them and re-storing their references. As a result, MPT deserialization and state root calculation become purely zero-copy, with node data and references maintained as pointers to the original input data. Additionally, this implementation reduces data copies by writing data directly into the bump area from the start whenever possible.
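To illustrate the zero-copy idea, here is a minimal sketch of splitting RLP items off a byte buffer so that every payload is a borrowed sub-slice of the original input. It is not the code in this PR: `RlpItem`, `read_be_len`, and `split_item` are hypothetical names, and the sketch handles only generic RLP framing, not the serialization format of the crate.

```rust
/// A borrowed view of one RLP item. No payload bytes are copied.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RlpItem<'a> {
    /// Payload of a byte string.
    Bytes(&'a [u8]),
    /// Payload of a list: the concatenated encodings of its children.
    List(&'a [u8]),
}

/// Read an `n`-byte big-endian length from the front of `buf`.
fn read_be_len(buf: &[u8], n: usize) -> Option<usize> {
    if n == 0 || n > core::mem::size_of::<usize>() || buf.len() < n {
        return None;
    }
    let mut len = 0usize;
    for &b in &buf[..n] {
        len = (len << 8) | b as usize;
    }
    Some(len)
}

/// Split the first RLP item off `input`.
/// Returns the item and the remaining bytes, or `None` if the input is truncated.
pub fn split_item<'a>(input: &'a [u8]) -> Option<(RlpItem<'a>, &'a [u8])> {
    let (&prefix, rest) = input.split_first()?;
    // (is_list, number of extra length bytes in the header, payload length)
    let (is_list, len_bytes, payload_len) = match prefix {
        // A single byte below 0x80 is its own encoding.
        0x00..=0x7f => return Some((RlpItem::Bytes(&input[..1]), rest)),
        0x80..=0xb7 => (false, 0, (prefix - 0x80) as usize),
        0xb8..=0xbf => {
            let n = (prefix - 0xb7) as usize;
            (false, n, read_be_len(rest, n)?)
        }
        0xc0..=0xf7 => (true, 0, (prefix - 0xc0) as usize),
        0xf8..=0xff => {
            let n = (prefix - 0xf7) as usize;
            (true, n, read_be_len(rest, n)?)
        }
    };
    let body = rest.get(len_bytes..)?;
    if body.len() < payload_len {
        return None;
    }
    let (payload, tail) = body.split_at(payload_len);
    let item = if is_list {
        RlpItem::List(payload)
    } else {
        RlpItem::Bytes(payload)
    };
    Some((item, tail))
}
```

Calling `split_item` repeatedly on a list payload walks its children, and each returned slice points into the original buffer, which is the property that lets node data and references stay as pointers to the input.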

Important note: I have not yet implemented the build_mpt feature that builds an MPT from MPT proofs. This feature is only used on the host to generate witness data. For now, I generate witness data for the new MPT implementation using the old MPT: on the host, I first build an old MPT and then serialize it in a format compatible with the new MPT. I will implement the build_mpt feature next and then remove the existing MPT implementation.

Results on block number 23100006:

  • Proof Time (s): 211.40 -- before: 226.85
  • Parallel Proof Time (s): 14.75 -- before: 15.53
  • Instruction count: 132,792,482 -- before: 146,988,654
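For reference, the relative reductions implied by these numbers can be computed as below. This is only a small arithmetic sketch; the function and constant names are made up for illustration. The results are roughly 6.8% for proof time, 5.0% for parallel proof time, and 9.7% for instruction count.

```rust
/// Relative reduction from `before` to `after`, in percent.
pub fn reduction_percent(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

// (before, after) pairs as reported for block 23100006.
pub const PROOF_TIME_S: (f64, f64) = (226.85, 211.40);
pub const PARALLEL_PROOF_TIME_S: (f64, f64) = (15.53, 14.75);
pub const INSTRUCTION_COUNT: (f64, f64) = (146_988_654.0, 132_792_482.0);
```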

@shayanh shayanh marked this pull request as ready for review August 27, 2025 00:28
//
// More advanced improvement: either pre-execute block at guest to know exact allocations in
// advance, or allocate a separate arena specifically for updates.
let capacity = num_nodes + num_nodes / 10;

we should follow up about this
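As a rough sketch of what the headroom in the snippet above buys: pre-sizing the node arena to the serialized node count plus about 10% means that a bounded number of inserts during block execution will not trigger a reallocation. The `Arena` type and its methods below are hypothetical and only illustrate the sizing rule, not the arena in the crate.

```rust
/// Capacity rule from the snippet: node count plus one tenth (integer division).
pub fn arena_capacity(num_nodes: usize) -> usize {
    num_nodes + num_nodes / 10
}

/// Hypothetical node arena backed by a single pre-sized `Vec`.
pub struct Arena<T> {
    nodes: Vec<T>,
}

impl<T> Arena<T> {
    /// Allocate once, with headroom for nodes created by later updates.
    pub fn with_node_count(num_nodes: usize) -> Self {
        Self {
            nodes: Vec::with_capacity(arena_capacity(num_nodes)),
        }
    }

    /// Append a node and return its index in the arena.
    pub fn push(&mut self, node: T) -> usize {
        let id = self.nodes.len();
        self.nodes.push(node);
        id
    }

    /// Number of nodes currently stored.
    pub fn len(&self) -> usize {
        self.nodes.len()
    }

    /// Current capacity of the backing `Vec` (at least the requested amount).
    pub fn capacity(&self) -> usize {
        self.nodes.capacity()
    }
}
```

Note that with integer division, tries with fewer than 10 nodes get no headroom at all, and a block that inserts more than 10% new nodes would still force the `Vec` to grow, which is the case the comment's "more advanced improvement" is about.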

nodes: Vec<NodeData<'a>>,

/// Cache. Hashing/encoding often needs "what would this node look like in its parent"
cached_references: Vec<RefCell<Option<NodeRef<'a>>>>,

btw, even in Valery's PR I wondered if we could get away with using raw pointers for a small gain

IIRC I tried using Cell instead and it was very slow for some unknown reason. Raw pointers might work.
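For context on the pattern being discussed, here is a minimal sketch of a per-node reference cache filled lazily through `&self` using `RefCell<Option<..>>`. `DemoRef` and `DemoTrie` are hypothetical stand-ins, not the types in this PR, and the sketch omits hashing: a real MPT reference is the Keccak-256 hash of the encoding when it is 32 bytes or longer.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical stand-in for a node reference: a borrowed view of the encoding.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DemoRef<'a> {
    pub encoded: &'a [u8],
}

/// Hypothetical trie holding borrowed node encodings plus one cache slot per node.
pub struct DemoTrie<'a> {
    encoded_nodes: Vec<&'a [u8]>,
    cached_references: Vec<RefCell<Option<DemoRef<'a>>>>,
    /// Counts cache misses, only to make the caching observable.
    computations: Cell<usize>,
}

impl<'a> DemoTrie<'a> {
    pub fn new(encoded_nodes: Vec<&'a [u8]>) -> Self {
        let cached_references = encoded_nodes.iter().map(|_| RefCell::new(None)).collect();
        Self {
            encoded_nodes,
            cached_references,
            computations: Cell::new(0),
        }
    }

    /// Return the cached reference for node `id`, computing it on first use.
    pub fn reference(&self, id: usize) -> DemoRef<'a> {
        // Copy the cached value out so the shared borrow ends before any write.
        let cached = *self.cached_references[id].borrow();
        if let Some(r) = cached {
            return r;
        }
        self.computations.set(self.computations.get() + 1);
        let r = DemoRef {
            encoded: self.encoded_nodes[id],
        };
        *self.cached_references[id].borrow_mut() = Some(r);
        r
    }

    /// Clear the cache slot, as would be needed after node `id` is modified.
    pub fn invalidate(&self, id: usize) {
        *self.cached_references[id].borrow_mut() = None;
    }

    /// Number of times a reference had to be computed.
    pub fn computations(&self) -> usize {
        self.computations.get()
    }
}
```

`RefCell` pays for a runtime borrow-flag check on every access; replacing it with raw pointers or `UnsafeCell` would remove that check at the cost of upholding the aliasing rules manually.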

@jonathanpwang jonathanpwang left a comment

I followed the overall logic since you had explained it to me verbally.

After you implement the new serialization, we should delete the old crate to clean up and reduce the amount of code.

@jonathanpwang jonathanpwang merged commit 5830cca into main Aug 27, 2025
5 checks passed
@jonathanpwang jonathanpwang deleted the shayanh/mptnew branch August 27, 2025 05:27
@jonathanpwang jonathanpwang restored the shayanh/mptnew branch August 27, 2025 05:27
@jonathanpwang jonathanpwang added the input-format The input format of the host binary changed label Sep 16, 2025
3 participants