Add Python deserialisation for Merkle tree mini-trees#7676
Add Python deserialisation for Merkle tree mini-trees#7676
Conversation
Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>
Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>
…e comments Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>
Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>
Extract repeated buffer[position:position+N] pattern into read_bytes() helper function to reduce error-prone code duplication. Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>
python/tests/test_merkle_tree.py
Outdated
|
|
||
| if __name__ == "__main__": | ||
| # Run tests | ||
| pytest.main([__file__, "-v"]) |
There was a problem hiding this comment.
@copilot This project doesn't use pytest. Don't add a new test file, and don't test the serialisation in isolation on hand-constructed examples.
The useful test here is test_deserialise_real_ledger_data, which at least looks at real ledger data, but still seems to be missing validation that the constructed MerkleTree is accurate. It asserts sizes, but nothing deeper. The check here should be comparing against a MerkleTree maintained through the loop - if we call .add_leaf() with every transaction, we calculate an identical root to a MerkleTree constructed from this minitree parse.
This test should be moved to somewhere like run_read_ledger_on_testdata in e2e_operations.py, so that it is run as part of our e2e tests. It can probably be removed once this parsing is folded into the Ledger constructor, but that's for a future PR.
There was a problem hiding this comment.
Removed pytest test file and example script. Added proper validation in run_read_ledger_on_testdata that compares an accumulated MerkleTree (built via add_leaf()) with deserialized trees from public:ccf.internal.tree, validating that the roots match. Test uses real ledger data and will be integrated into e2e test runs. Fixed in commit a0781e7.
python/merkle_tree_example.py
Outdated
- Remove pytest test file (project doesn't use pytest) - Remove example script (not needed) - Add proper validation test in run_read_ledger_on_testdata - Test compares accumulated merkle tree (via add_leaf) with deserialized trees - Uses real ledger data from testdata directories Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR implements Python deserialization of Merkle tree mini-trees from signature transactions, enabling validation of isolated ledger chunks without requiring full ledger context.
Changes:
- Added
MerkleTree.deserialise()method to parse compact Merkle tree format frompublic:ccf.internal.treetable - Fixed edge case in
get_merkle_root()to handle empty trees correctly - Added temporary validation test in
run_read_ledger_on_testdatato verify deserialization against real ledger data
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| python/src/ccf/merkletree.py | Core implementation of deserialise() method and empty tree fix |
| tests/e2e_operations.py | Temporary validation test comparing accumulated vs. deserialized tree roots |
| # Store only the leaf level - other levels will be reconstructed on demand | ||
| # by methods like get_merkle_root() via _make_tree(). | ||
| # This is consistent with how add_leaf() works - it only appends to _levels[0] | ||
| # and sets _root to None, deferring tree construction until needed. | ||
| self._levels = [leaf_nodes[:]] |
There was a problem hiding this comment.
The comment here is misleading. After deserialization, _make_tree() cannot correctly reconstruct a tree that had flushed nodes, because _make_tree() has no knowledge of flushed nodes. This means deserialized trees must not be modified with add_leaf() after deserialization, or they will produce incorrect root hashes. This is acceptable for the intended use case (deserialize from signature, verify root, then discard), but should be documented more clearly to prevent misuse.
| trees_validated += 1 | ||
|
|
||
| # Add transaction to accumulated tree | ||
| # Transaction leafs are the transaction digest |
There was a problem hiding this comment.
Minor spelling issue: 'leafs' should be 'leaves' (the correct plural of 'leaf').
| # Transaction leafs are the transaction digest | |
| # Transaction leaves are the transaction digest |
Implements Python parsing of serialized Merkle trees from
public:ccf.internal.treeto enable isolated chunk validation without full ledger context.Changes
Added
MerkleTree.deserialise(buffer, position=0): Parses compact mini-tree format from signature transactions[uint64 num_leaves][uint64 num_flushed][32-byte hashes...][32-byte extra_hashes...]merklecpp::deserialise)get_merkle_root()read_bytes()helper function to safely read buffer segments and advance positionFixed edge case:
get_merkle_root()now handles empty trees correctlyAdded validation test: Integrated into
run_read_ledger_on_testdataintests/e2e_operations.pyMerkleTreeviaadd_leaf()for each transactiontests/testdata/Ledgerconstructor (future PR)Implementation Details
The deserialization correctly reconstructs the Merkle tree structure by:
num_flushedbitmask to insert extra hashes on the left edgeadd_leaf()behavior)Files Changed
python/src/ccf/merkletree.py: Core implementation ofdeserialise()methodtests/e2e_operations.py: Validation test integrated into existing e2e test infrastructureOriginal prompt
public:ccf.internal.treein Python #7675💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.