Add Python deserialisation for Merkle tree mini-trees by Copilot · Pull Request #7676 · microsoft/CCF

Copilot · 2026-02-17T15:03:03Z

Implements Python parsing of serialized Merkle trees from public:ccf.internal.tree to enable isolated chunk validation without full ledger context.

Changes

Added MerkleTree.deserialise(buffer, position=0): Parses compact mini-tree format from signature transactions
- Big-endian: [uint64 num_leaves][uint64 num_flushed][32-byte hashes...][32-byte extra_hashes...]
- Handles flushed nodes via bitmask iteration (matching C++ merklecpp::deserialise)
- Lazy reconstruction: stores leaves only, builds upper levels on demand via get_merkle_root()
- Uses read_bytes() helper function to safely read buffer segments and advance position
Fixed edge case: get_merkle_root() now handles empty trees correctly
Added validation test: Integrated into run_read_ledger_on_testdata in tests/e2e_operations.py
- Maintains an accumulated MerkleTree via add_leaf() for each transaction
- Compares accumulated tree root with deserialized tree root at signature transactions
- Validates against real ledger data from tests/testdata/
- Will be removed once deserialization is integrated into the Ledger constructor (future PR)
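Below is a minimal sketch of reading this layout. It assumes the field order given in the first bullet. It also assumes that, as in merklecpp's deserialise loop, exactly one extra hash is read per set bit of num_flushed, so their count is its popcount, and that they are read lowest bit first. read_bytes here is a stand-in for the PR's helper of the same name. parse_minitree and ParsedMiniTree are hypothetical names, not the ccf.MerkleTree API.

```python
import struct
from typing import List, NamedTuple, Tuple

HASH_SIZE = 32  # 32-byte hashes, per the layout above


def read_bytes(buffer: bytes, position: int, size: int) -> Tuple[bytes, int]:
    """Return `size` bytes starting at `position`, and the advanced position."""
    end = position + size
    if end > len(buffer):
        # Fail loudly instead of returning a short slice.
        raise ValueError(
            f"truncated minitree: need {size} bytes at offset {position}, "
            f"buffer has {len(buffer)}"
        )
    return bytes(buffer[position:end]), end


class ParsedMiniTree(NamedTuple):
    num_flushed: int
    leaves: List[bytes]  # unflushed leaf hashes, left to right
    extras: List[bytes]  # left-edge hashes, lowest bit of num_flushed first (assumed)
    end: int  # offset just past this minitree in the buffer


def parse_minitree(buffer: bytes, position: int = 0) -> ParsedMiniTree:
    raw, position = read_bytes(buffer, position, 8)
    (num_leaves,) = struct.unpack(">Q", raw)  # big-endian uint64: leaf hashes that follow
    raw, position = read_bytes(buffer, position, 8)
    (num_flushed,) = struct.unpack(">Q", raw)

    leaves = []
    for _ in range(num_leaves):
        leaf, position = read_bytes(buffer, position, HASH_SIZE)
        leaves.append(leaf)

    # Assumption: one extra hash per set bit of num_flushed.
    extras = []
    for _ in range(bin(num_flushed).count("1")):
        extra, position = read_bytes(buffer, position, HASH_SIZE)
        extras.append(extra)

    return ParsedMiniTree(num_flushed, leaves, extras, position)
```

If a read would run past the end of the buffer, the helper raises an error rather than returning a short slice. A truncated ledger entry therefore surfaces as a failure instead of as a silently short hash.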

Implementation Details

The deserialization correctly reconstructs the Merkle tree structure by:

- Reading leaf hashes from the serialized buffer
- Processing the num_flushed bitmask to insert extra hashes on the left edge
- Building tree levels bottom-up by pairing nodes
- Storing only the leaf level (consistent with add_leaf() behavior)
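The steps above can be sketched as a standalone function that computes the root directly. The sketch assumes three things: interior nodes are SHA-256(left || right), an unpaired last node is promoted unchanged to the next level, and extras arrive lowest bit first. hash_pair, next_level, plain_root and minitree_root are illustrative names, not methods of the PR's MerkleTree.

```python
import hashlib
from typing import List, Optional


def hash_pair(left: bytes, right: bytes) -> bytes:
    # Assumption: interior nodes are SHA-256(left || right).
    return hashlib.sha256(left + right).digest()


def next_level(level: List[bytes]) -> List[bytes]:
    # Pair adjacent nodes; an unpaired last node is promoted unchanged.
    paired = [hash_pair(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
    if len(level) % 2 == 1:
        paired.append(level[-1])
    return paired


def plain_root(leaves: List[bytes]) -> Optional[bytes]:
    """Root of a tree built from every leaf, with nothing flushed."""
    level = list(leaves)
    while len(level) > 1:
        level = next_level(level)
    return level[0] if level else None


def minitree_root(num_flushed: int, leaves: List[bytes], extras: List[bytes]) -> Optional[bytes]:
    """Root of a minitree, following the bitmask loop described above.

    `leaves` are the unflushed leaf hashes; `extras` holds one hash per set
    bit of `num_flushed`, lowest bit first (assumed serialisation order).
    """
    if len(extras) != bin(num_flushed).count("1"):
        raise ValueError("expected one extra hash per set bit of num_flushed")
    pending = iter(extras)
    level = list(leaves)
    bits = num_flushed
    while bits != 0 or len(level) > 1:
        if bits & 1:
            # This bit stands for a flushed, perfect subtree on the left edge;
            # its stored root becomes the first node of the current level.
            level.insert(0, next(pending))
        level = next_level(level)
        bits >>= 1
    return level[0] if level else None
```

Under these assumptions, minitree_root(num_flushed, leaves[num_flushed:], extras) should equal plain_root over all leaves for any split point. The e2e comparison relies on that property.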

Files Changed

python/src/ccf/merkletree.py: Core implementation of deserialise() method
tests/e2e_operations.py: Validation test integrated into existing e2e test infrastructure

Original prompt

This section details the original issue you should resolve

<issue_title>Parse public:ccf.internal.tree in Python</issue_title>
<issue_description>To support the isolated Merkle-root-only validation level in #7502, we need to be able to initialise a ccf.MerkleTree from an intermediate state, not just from a blank tree by appending leaves. This state is what we call a "minitree", written to the public:ccf.internal.tree table in every signature transaction. It is not necessarily a full tree, as nodes from the left may have been flushed.

The source implementation is in C++ under 3rdparty/internal/merklecpp, from the full implementation (including tests) at https://github.com/microsoft/merklecpp. The only bit we need in Python is an equivalent of deserialise. We should test this against sample data from tests/testdata/.</issue_description>
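Here is what "nodes from the left may have been flushed" means concretely, assuming merklecpp's left-aligned binary tree. The first num_flushed leaves are covered by aligned perfect subtrees, one per set bit of num_flushed, largest first. Each such subtree is what a single extra hash in the minitree would stand for. flushed_subtrees is a hypothetical helper for illustration only.

```python
from typing import List, Tuple


def flushed_subtrees(num_flushed: int) -> List[Tuple[int, int]]:
    """(start, size) of each perfect subtree covering leaves [0, num_flushed).

    Ordered largest first, i.e. left to right; one entry per set bit.
    """
    spans = []
    start = 0
    for k in reversed(range(num_flushed.bit_length())):
        if (num_flushed >> k) & 1:
            spans.append((start, 1 << k))
            start += 1 << k
    return spans
```

For example, with num_flushed = 5 the flushed prefix is leaves 0 to 3 plus leaf 4, so two extra hashes are needed to stand in for it.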

Comments on the Issue (you are @copilot in this section)

Fixes Parse public:ccf.internal.tree in Python #7675

python/src/ccf/merkletree.py

Extract repeated buffer[position:position+N] pattern into read_bytes() helper function to reduce error-prone code duplication. Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>

eddyashton · 2026-02-17T16:03:58Z

python/tests/test_merkle_tree.py

+
+if __name__ == "__main__":
+    # Run tests
+    pytest.main([__file__, "-v"])


@copilot This project doesn't use pytest. Don't add a new test file, and don't test the serialisation in isolation on hand-constructed examples.

The useful test here is test_deserialise_real_ledger_data, which at least looks at real ledger data, but it still seems to be missing validation that the constructed MerkleTree is accurate. It asserts sizes, but nothing deeper. The check here should compare against a MerkleTree maintained through the loop: if we call .add_leaf() with every transaction, we should calculate a root identical to that of a MerkleTree constructed from this minitree parse.

This test should be moved to somewhere like run_read_ledger_on_testdata in e2e_operations.py, so that it is run as part of our e2e tests. It can probably be removed once this parsing is folded into the Ledger constructor, but that's for a future PR.

Removed pytest test file and example script. Added proper validation in run_read_ledger_on_testdata that compares an accumulated MerkleTree (built via add_leaf()) with deserialized trees from public:ccf.internal.tree, validating that the roots match. Test uses real ledger data and will be integrated into e2e test runs. Fixed in commit a0781e7.
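The check described above can be sketched independently of CCF's ledger classes. Tx, validate_chunk and plain_root are hypothetical stand-ins. plain_root plays the role of the accumulated MerkleTree, assuming SHA-256 pairing with the odd node promoted. The deserialised_root callable stands for parsing a public:ccf.internal.tree value and calling get_merkle_root(). The sketch compares before appending the signature transaction's own digest, following the ordering visible in the tests/e2e_operations.py diff reviewed below.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


def plain_root(leaves: List[bytes]) -> Optional[bytes]:
    # Stand-in for the accumulated MerkleTree (assumed hashing scheme).
    level = list(leaves)
    while len(level) > 1:
        paired = [hashlib.sha256(level[i] + level[i + 1]).digest() for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])
        level = paired
    return level[0] if level else None


@dataclass
class Tx:
    digest: bytes  # leaf appended for this transaction
    tree_blob: Optional[bytes] = None  # hypothetical: serialised minitree, signature txs only


def validate_chunk(txs: Iterable[Tx], deserialised_root: Callable[[bytes], Optional[bytes]]) -> int:
    """Compare the accumulated root with each signature's minitree root.

    Returns the number of minitrees validated.
    """
    accumulated: List[bytes] = []
    validated = 0
    for tx in txs:
        if tx.tree_blob is not None:
            # Check before adding this transaction to the accumulated tree.
            expected = plain_root(accumulated)
            actual = deserialised_root(tx.tree_blob)
            if actual != expected:
                raise AssertionError(f"root mismatch after {len(accumulated)} leaves")
            validated += 1
        accumulated.append(tx.digest)
    return validated
```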

eddyashton · 2026-02-17T16:04:30Z

python/merkle_tree_example.py

@copilot Remove this example tool.

Already removed in commit a0781e7 along with the pytest test file.

- Remove pytest test file (project doesn't use pytest)
- Remove example script (not needed)
- Add proper validation test in run_read_ledger_on_testdata
- Test compares accumulated merkle tree (via add_leaf) with deserialized trees
- Uses real ledger data from testdata directories

Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>

Copilot

Pull request overview

This PR implements Python deserialization of Merkle tree mini-trees from signature transactions, enabling validation of isolated ledger chunks without requiring full ledger context.

Changes:

Added MerkleTree.deserialise() method to parse compact Merkle tree format from public:ccf.internal.tree table
Fixed edge case in get_merkle_root() to handle empty trees correctly
Added temporary validation test in run_read_ledger_on_testdata to verify deserialization against real ledger data

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
python/src/ccf/merkletree.py	Core implementation of `deserialise()` method and empty tree fix
tests/e2e_operations.py	Temporary validation test comparing accumulated vs. deserialized tree roots

Copilot · 2026-02-17T16:23:00Z

python/src/ccf/merkletree.py

+        # Store only the leaf level - other levels will be reconstructed on demand
+        # by methods like get_merkle_root() via _make_tree().
+        # This is consistent with how add_leaf() works - it only appends to _levels[0]
+        # and sets _root to None, deferring tree construction until needed.
+        self._levels = [leaf_nodes[:]]


The comment here is misleading. After deserialization, _make_tree() cannot correctly reconstruct a tree that had flushed nodes, because _make_tree() has no knowledge of flushed nodes. This means deserialized trees must not be modified with add_leaf() after deserialization, or they will produce incorrect root hashes. This is acceptable for the intended use case (deserialize from signature, verify root, then discard), but should be documented more clearly to prevent misuse.
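The reviewer's concern can be shown directly: a _make_tree()-style rebuild over only the unflushed leaves yields a different root from the full tree. Beyond documenting the restriction, one hypothetical option is a read-only wrapper that refuses add_leaf() when nodes were flushed. DeserialisedTree and rebuild_from_leaves are illustrative names, not part of ccf.MerkleTree, and the hashing scheme is assumed.

```python
import hashlib
from typing import List, Optional


def rebuild_from_leaves(leaves: List[bytes]) -> Optional[bytes]:
    # What a _make_tree()-style rebuild does: pair nodes level by level,
    # with no knowledge of flushed nodes.
    level = list(leaves)
    while len(level) > 1:
        paired = [hashlib.sha256(level[i] + level[i + 1]).digest() for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])
        level = paired
    return level[0] if level else None


class DeserialisedTree:
    """Hypothetical view of a deserialised minitree that guards add_leaf()."""

    def __init__(self, root: Optional[bytes], num_flushed: int, leaves: List[bytes]):
        self._root = root  # root computed during deserialisation
        self.num_flushed = num_flushed
        self.leaves = list(leaves)

    def get_merkle_root(self) -> Optional[bytes]:
        return self._root

    def add_leaf(self, digest: bytes) -> None:
        if self.num_flushed:
            raise RuntimeError(
                "cannot append to a deserialised tree with flushed nodes: "
                "rebuilding from the stored leaves would give the wrong root"
            )
        self.leaves.append(digest)
        self._root = rebuild_from_leaves(self.leaves)
```

When num_flushed is 0, the stored leaves are the whole tree, so appending and rebuilding remain safe in that case.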

Copilot · 2026-02-17T16:23:01Z

tests/e2e_operations.py

+                        trees_validated += 1
+
+                # Add transaction to accumulated tree
+                # Transaction leafs are the transaction digest


Minor spelling issue: 'leafs' should be 'leaves' (the correct plural of 'leaf').

Suggested change

- # Transaction leafs are the transaction digest
+ # Transaction leaves are the transaction digest

Initial plan

60f82d1

Copilot AI assigned Copilot and eddyashton Feb 17, 2026

Copilot started work on behalf of eddyashton February 17, 2026 15:03

Copilot AI and others added 4 commits February 17, 2026 15:08

Add deserialise() method to MerkleTree class with tests

1e213df

Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>

Add example script for Merkle tree deserialization

4fcd3c4

Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>

Address code review feedback: remove unused imports/variables, improve comments

473dd7f

Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>

Final code review fixes: move import to top, improve comments

7785c71

Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add parsing for public:ccf.internal.tree in Python~~ Add Python deserialisation for Merkle tree mini-trees Feb 17, 2026

Copilot AI requested a review from eddyashton February 17, 2026 15:14

Copilot finished work on behalf of eddyashton February 17, 2026 15:14

eddyashton reviewed Feb 17, 2026

python/src/ccf/merkletree.py (outdated, resolved)

Copilot started work on behalf of eddyashton February 17, 2026 15:59

Refactor deserialise to use helper function for buffer reads

3a0ae01

Extract repeated buffer[position:position+N] pattern into read_bytes() helper function to reduce error-prone code duplication. Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>

Copilot AI requested a review from eddyashton February 17, 2026 16:00

Copilot finished work on behalf of eddyashton February 17, 2026 16:01

eddyashton reviewed Feb 17, 2026

Copilot started work on behalf of eddyashton February 17, 2026 16:04

Copilot AI requested a review from eddyashton February 17, 2026 16:07

Copilot finished work on behalf of eddyashton February 17, 2026 16:08

Copilot started work on behalf of eddyashton February 17, 2026 16:08

Copilot finished work on behalf of eddyashton February 17, 2026 16:09

eddyashton marked this pull request as ready for review February 17, 2026 16:14

eddyashton requested a review from a team as a code owner February 17, 2026 16:14

Copilot AI review requested due to automatic review settings February 17, 2026 16:14

Copilot started reviewing on behalf of eddyashton February 17, 2026 16:15

Copilot AI reviewed Feb 17, 2026

Copilot wants to merge 7 commits into main from copilot/parse-public-ccf-internal-tree
