feat: new mpt implementation #482
7d37496 to 1a4bae4
//
// More advanced improvement: either pre-execute block at guest to know exact allocations in
// advance, or allocate a separate arena specifically for updates.
let capacity = num_nodes + num_nodes / 10;
we should follow up about this
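For reference while following up, here is a minimal sketch of the headroom rule in the snippet above: preallocate node storage once with roughly 10% extra room so that nodes created by updates do not force a reallocation. `NodeArena`, `arena_capacity`, and `with_node_count` are hypothetical names for illustration only, not the crate's actual types.

```rust
/// Hypothetical helper mirroring `num_nodes + num_nodes / 10` from the diff.
/// Integer division means tries with fewer than 10 nodes get no headroom.
fn arena_capacity(num_nodes: usize) -> usize {
    num_nodes + num_nodes / 10
}

/// Hypothetical arena: a `Vec` that is sized once up front.
struct NodeArena<T> {
    nodes: Vec<T>,
}

impl<T> NodeArena<T> {
    /// Reserves room for `num_nodes` deserialized nodes plus ~10% for updates.
    fn with_node_count(num_nodes: usize) -> Self {
        Self {
            nodes: Vec::with_capacity(arena_capacity(num_nodes)),
        }
    }

    /// Appends a node and returns its index in the arena.
    fn push(&mut self, node: T) -> usize {
        let id = self.nodes.len();
        self.nodes.push(node);
        id
    }

    /// Current allocated capacity, useful for checking that no growth happened.
    fn capacity(&self) -> usize {
        self.nodes.capacity()
    }
}
```

If a block's updates create more nodes than the headroom allows, the `Vec` still grows correctly, only at the cost of a reallocation and copy. That is the case the "more advanced improvement" in the code comment would avoid.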
nodes: Vec<NodeData<'a>>,

/// Cache. Hashing/encoding often needs "what would this node look like in its parent"
cached_references: Vec<RefCell<Option<NodeRef<'a>>>>,
btw even in Valery's PR, I wondered if we could get away using raw pointers for a little gain
IIRC I tried using Cell instead and it was very slow for some unknown reason. Raw pointers might work.
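To make the options in this thread concrete, the following is a rough sketch of the `RefCell<Option<..>>` pattern: a per-node slot that is filled lazily through a shared reference and cleared when the node changes. The `NodeRef` variants, `RefCache`, `get_or_compute`, and `invalidate` are assumptions made up for this example; the real `NodeRef<'a>` in the PR may look different, and the sketch makes no claim about the relative speed of `RefCell`, `Cell`, or raw pointers.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for the PR's `NodeRef<'a>`: either short encoded
/// bytes borrowed from the input, or a 32-byte digest.
#[derive(Clone, Copy, Debug, PartialEq)]
enum NodeRef<'a> {
    Inline(&'a [u8]),
    Digest([u8; 32]),
}

/// One cache slot per node, indexed by node id.
struct RefCache<'a> {
    slots: Vec<RefCell<Option<NodeRef<'a>>>>,
}

impl<'a> RefCache<'a> {
    fn new(len: usize) -> Self {
        Self {
            slots: (0..len).map(|_| RefCell::new(None)).collect(),
        }
    }

    /// Returns the cached reference, computing and storing it on first use.
    /// Takes `&self`: interior mutability lets read-only traversals fill the cache.
    fn get_or_compute(&self, id: usize, compute: impl FnOnce() -> NodeRef<'a>) -> NodeRef<'a> {
        if let Some(cached) = *self.slots[id].borrow() {
            return cached;
        }
        // The shared borrow above has ended, so `compute` may consult other slots.
        let value = compute();
        *self.slots[id].borrow_mut() = Some(value);
        value
    }

    /// Clears a slot after the node is modified so the next read recomputes it.
    fn invalidate(&self, id: usize) {
        *self.slots[id].borrow_mut() = None;
    }
}
```

Because this hypothetical `NodeRef` is `Copy`, the same interface could be backed by `Cell<Option<NodeRef<'a>>>` using `get` and `set`, which would make it straightforward to benchmark the alternatives behind one API.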
I followed the overall logic since you had explained it to me verbally.
After you implement the new serialization, we should delete the old crate to clean up and reduce the amount of code.
This PR introduces a new MPT implementation to optimize node reference calculation and data copying. The implementation is in a new crate named `mptnew`, though it is mostly based on the existing MPT implementation from #427.

The main contribution of this PR is a new serialization technique. MPT nodes are RLP-encoded and serialized recursively. In the guest program, deserialization is performed by traversing the serialized byte array. Since the nodes are already RLP-encoded, we avoid re-encoding them and re-storing their references. As a result, MPT deserialization and state root calculation become purely zero-copy, with node data and references maintained as pointers to the original input data. Additionally, this implementation reduces data copies by writing data directly into the bump area from the start whenever possible.
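As an illustration of what zero-copy traversal of RLP-encoded input can look like, here is a minimal sketch that splits one RLP item off the front of a byte slice and returns its payload as a sub-slice of the original input, without allocating or copying. This is not the code from `mptnew`; `Item`, `split_item`, and `be_len` are hypothetical names, and the sketch skips the canonical-encoding checks a production decoder would need.

```rust
/// A decoded RLP item whose payload borrows directly from the input buffer.
#[derive(Debug, PartialEq)]
enum Item<'a> {
    /// Byte-string payload.
    Str(&'a [u8]),
    /// List payload: the concatenated encodings of the list's items.
    List(&'a [u8]),
}

/// Parses a big-endian length field; rejects fields wider than `usize`.
fn be_len(bytes: &[u8]) -> Option<usize> {
    if bytes.len() > std::mem::size_of::<usize>() {
        return None;
    }
    Some(bytes.iter().fold(0usize, |acc, &b| (acc << 8) | b as usize))
}

/// Splits the first RLP item off `input`, returning the item and the
/// remaining bytes. Returns `None` on truncated input.
fn split_item<'a>(input: &'a [u8]) -> Option<(Item<'a>, &'a [u8])> {
    let (&prefix, tail) = input.split_first()?;
    let (is_list, header_len, payload_len) = match prefix {
        // A single byte below 0x80 is its own payload.
        0x00..=0x7f => return Some((Item::Str(&input[..1]), tail)),
        // Short string: length is stored in the prefix byte.
        0x80..=0xb7 => (false, 1, (prefix - 0x80) as usize),
        // Long string: prefix gives the size of the length field.
        0xb8..=0xbf => {
            let n = (prefix - 0xb7) as usize;
            (false, 1 + n, be_len(tail.get(..n)?)?)
        }
        // Short list.
        0xc0..=0xf7 => (true, 1, (prefix - 0xc0) as usize),
        // Long list.
        0xf8..=0xff => {
            let n = (prefix - 0xf7) as usize;
            (true, 1 + n, be_len(tail.get(..n)?)?)
        }
    };
    let end = header_len.checked_add(payload_len)?;
    let payload = input.get(header_len..end)?;
    let item = if is_list { Item::List(payload) } else { Item::Str(payload) };
    Some((item, &input[end..]))
}
```

Calling `split_item` repeatedly on a list's payload walks its children in place. Every slice handed back points into the original buffer, which is the property that lets node data and references remain pointers to the input.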
Important note: I have not yet implemented the `build_mpt` feature that builds an MPT from MPT proofs. This feature is only used on the host to generate witness data. For now, I generate witness data for the new MPT implementation using the old MPT: on the host, I first build an old MPT and then serialize it in a format compatible with the new MPT. I will implement the `build_mpt` feature next and then remove the existing MPT implementation.

Results on block number 23100006: