
Add Python deserialisation for Merkle tree mini-trees #7676

Open
Copilot wants to merge 7 commits into main from copilot/parse-public-ccf-internal-tree


Contributor

Copilot AI commented Feb 17, 2026

Implements Python parsing of serialized Merkle trees from public:ccf.internal.tree to enable isolated chunk validation without full ledger context.

Changes

  • Added MerkleTree.deserialise(buffer, position=0): Parses compact mini-tree format from signature transactions

    • Big-endian: [uint64 num_leaves][uint64 num_flushed][32-byte hashes...][32-byte extra_hashes...]
    • Handles flushed nodes via bitmask iteration (matching C++ merklecpp::deserialise)
    • Lazy reconstruction: stores leaves only, builds upper levels on-demand via get_merkle_root()
    • Uses read_bytes() helper function to safely read buffer segments and advance position
  • Fixed edge case: get_merkle_root() now handles empty trees correctly

  • Added validation test: Integrated into run_read_ledger_on_testdata in tests/e2e_operations.py

    • Maintains an accumulated MerkleTree via add_leaf() for each transaction
    • Compares accumulated tree root with deserialized tree root at signature transactions
    • Validates against real ledger data from tests/testdata/
    • Will be removed once deserialization is integrated into the Ledger constructor (in a future PR)
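As a rough sketch of the byte layout listed above, the header and leaf hashes could be read as follows. This assumes 32-byte hashes and big-endian uint64 counts, as stated in the layout. `read_bytes` and `parse_minitree_header` are hypothetical stand-ins; the actual read_bytes() helper in python/src/ccf/merkletree.py may have a different signature.

```python
import struct
from typing import List, Tuple

HASH_SIZE = 32  # assumption: SHA-256 digests, per the layout above


def read_bytes(buffer: bytes, position: int, n: int) -> Tuple[bytes, int]:
    """Hypothetical helper: return buffer[position:position+n] and the advanced position.

    Raises ValueError instead of silently returning a short slice.
    """
    end = position + n
    if end > len(buffer):
        raise ValueError(
            f"Need {n} bytes at offset {position}, only {len(buffer) - position} remain"
        )
    return buffer[position:end], end


def parse_minitree_header(
    buffer: bytes, position: int = 0
) -> Tuple[int, int, List[bytes], int]:
    """Read [uint64 num_leaves][uint64 num_flushed][num_leaves x 32-byte hashes].

    Returns (num_leaves, num_flushed, leaf_hashes, position_after_leaves).
    Any extra left-edge hashes start at the returned position.
    """
    raw, position = read_bytes(buffer, position, 16)
    num_leaves, num_flushed = struct.unpack(">QQ", raw)  # two big-endian uint64s
    leaves = []
    for _ in range(num_leaves):
        h, position = read_bytes(buffer, position, HASH_SIZE)
        leaves.append(h)
    return num_leaves, num_flushed, leaves, position
```

The bytes from the returned position onward are the extra hashes. As we understand merklecpp's deserialise, there is one per set bit of num_flushed, and they are consumed while walking up the left edge.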

Implementation Details

The deserialization correctly reconstructs the Merkle tree structure by:

  1. Reading leaf hashes from the serialized buffer
  2. Processing the num_flushed bitmask to insert extra hashes on the left edge
  3. Building tree levels bottom-up by pairing nodes
  4. Storing only the leaf level (consistent with add_leaf() behavior)
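The steps above can be sketched as a standalone function that computes only the root, following the loop structure described for merklecpp's deserialise. It makes two assumptions. First, each interior node is SHA-256 over the 64-byte concatenation of its children. Second, an unpaired node at the end of a level is promoted unchanged. `minitree_root` is a hypothetical name, not the PR's API. Unlike the PR, it also does not store the leaf level on a MerkleTree.

```python
from hashlib import sha256
from typing import Iterator, List, Optional


def minitree_root(
    leaves: List[bytes], num_flushed: int, extras: Iterator[bytes]
) -> Optional[bytes]:
    """Recompute the root of a mini-tree (hypothetical sketch).

    leaves: the retained leaf hashes, left to right.
    num_flushed: bitmask; bit k set means one extra hash is prepended at level k.
    extras: the extra left-edge hashes, consumed bottom-up (level 0 first).
    """
    level = list(leaves)
    bits = num_flushed
    while bits != 0 or len(level) > 1:
        if bits & 1:
            # Restore the flushed left sibling on the left edge of this level
            level.insert(0, next(extras))
        # Pair nodes bottom-up; promote a trailing unpaired node unchanged
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(sha256(level[i] + level[i + 1]).digest())
            else:
                next_level.append(level[i])
        level = next_level
        bits >>= 1
    # An empty tree (no leaves, nothing flushed) has no root
    return level[0] if level else None
```

For example, with three leaves and num_flushed = 2, only the third leaf is retained. The flushed pair is then represented by a single extra hash, prepended at level 1.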

Files Changed

  • python/src/ccf/merkletree.py: Core implementation of deserialise() method
  • tests/e2e_operations.py: Validation test integrated into existing e2e test infrastructure
Original prompt

This section details the original issue you should resolve

<issue_title>Parse public:ccf.internal.tree in Python</issue_title>
<issue_description>To support the isolated Merkle-root-only validation level in #7502, we need to be able to initialise a ccf.MerkleTree from an intermediate state, not just from a blank tree by appending leaves. This state is what we call a "minitree", written to the public:ccf.internal.tree table in every signature transaction. It is not necessarily a full tree, as nodes from the left may have been flushed.

The source implementation is in C++ under 3rdparty/internal/merklecpp, from the full implementation (including tests) at https://github.com/microsoft/merklecpp. The only bit we need in Python is an equivalent of deserialise. We should test this against sample data from tests/testdata/.</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 4 commits February 17, 2026 15:08
Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>
Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>
…e comments

Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>
Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add parsing for public:ccf.internal.tree in Python" to "Add Python deserialisation for Merkle tree mini-trees" Feb 17, 2026
Copilot AI requested a review from eddyashton February 17, 2026 15:14
Extract repeated buffer[position:position+N] pattern into read_bytes()
helper function to reduce error-prone code duplication.

Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>

if __name__ == "__main__":
    # Run tests
    pytest.main([__file__, "-v"])
Member

@copilot This project doesn't use pytest. Don't add a new test file, and don't test the serialisation in isolation on hand-constructed examples.

The useful test here is test_deserialise_real_ledger_data, which at least looks at real ledger data but still seems to be missing validation that the constructed MerkleTree is accurate. It asserts sizes, but nothing deeper. The check here should compare against a MerkleTree maintained through the loop: if we call .add_leaf() with every transaction, we should calculate a root identical to that of a MerkleTree constructed from this minitree parse.

This test should be moved to somewhere like run_read_ledger_on_testdata in e2e_operations.py, so that it is run as part of our e2e tests. It can probably be removed once this parsing is folded into the Ledger constructor, but that's for a future PR.
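To illustrate the invariant this check relies on, here is a self-contained sketch. It accumulates every leaf (the role add_leaf() plays), then builds the mini-tree inputs for each possible flush point: the retained leaves plus one left-edge hash per set bit of num_flushed. It then confirms that the reconstructed root equals the accumulated root. The hashing rule (SHA-256 of the concatenated children, with unpaired nodes promoted) and all helper names here are assumptions for illustration only. Per the review, the project's actual check runs over real ledger data in run_read_ledger_on_testdata, not over synthetic leaves like these.

```python
from hashlib import sha256
from typing import List, Optional


def next_level(level: List[bytes]) -> List[bytes]:
    """Pair adjacent nodes with SHA-256; promote a trailing unpaired node."""
    out = []
    for i in range(0, len(level), 2):
        if i + 1 < len(level):
            out.append(sha256(level[i] + level[i + 1]).digest())
        else:
            out.append(level[i])
    return out


def all_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Every level of the full tree, leaves first, root last."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        levels.append(next_level(levels[-1]))
    return levels


def left_edge_extras(leaves: List[bytes], num_flushed: int) -> List[bytes]:
    """Hashes a mini-tree keeps for the flushed prefix, bottom-up (hypothetical).

    At level k, if bit k of num_flushed is set, keep the node at index
    (num_flushed >> k) - 1: the flushed left sibling on the left edge.
    """
    levels = all_levels(leaves)
    extras = []
    k = 0
    while (num_flushed >> k) != 0:
        if (num_flushed >> k) & 1:
            extras.append(levels[k][(num_flushed >> k) - 1])
        k += 1
    return extras


def rebuild_root(
    retained: List[bytes], num_flushed: int, extras: List[bytes]
) -> Optional[bytes]:
    """Recompute the root from a mini-tree, consuming extras bottom-up."""
    it = iter(extras)
    level = list(retained)
    bits = num_flushed
    while bits != 0 or len(level) > 1:
        if bits & 1:
            level.insert(0, next(it))
        level = next_level(level)
        bits >>= 1
    return level[0] if level else None


# Accumulate leaves one at a time, as the e2e loop does with add_leaf()
leaves: List[bytes] = []
for seqno in range(1, 20):
    leaves.append(sha256(f"tx {seqno}".encode()).digest())
    expected = all_levels(leaves)[-1][0]
    # Every flush point must reproduce the accumulated root
    for num_flushed in range(len(leaves)):
        extras = left_edge_extras(leaves, num_flushed)
        got = rebuild_root(leaves[num_flushed:], num_flushed, extras)
        assert got == expected, (seqno, num_flushed)
print("all roots match")
```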

Contributor Author

Removed pytest test file and example script. Added proper validation in run_read_ledger_on_testdata that compares an accumulated MerkleTree (built via add_leaf()) with deserialized trees from public:ccf.internal.tree, validating that the roots match. Test uses real ledger data and will be integrated into e2e test runs. Fixed in commit a0781e7.

Member

@copilot Remove this example tool.

Contributor Author

Already removed in commit a0781e7 along with the pytest test file.

- Remove pytest test file (project doesn't use pytest)
- Remove example script (not needed)
- Add proper validation test in run_read_ledger_on_testdata
- Test compares accumulated merkle tree (via add_leaf) with deserialized trees
- Uses real ledger data from testdata directories

Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>
Copilot AI requested a review from eddyashton February 17, 2026 16:07
@eddyashton eddyashton marked this pull request as ready for review February 17, 2026 16:14
@eddyashton eddyashton requested a review from a team as a code owner February 17, 2026 16:14
Copilot AI review requested due to automatic review settings February 17, 2026 16:14
Contributor

Copilot AI left a comment

Pull request overview

This PR implements Python deserialization of Merkle tree mini-trees from signature transactions, enabling validation of isolated ledger chunks without requiring full ledger context.

Changes:

  • Added MerkleTree.deserialise() method to parse compact Merkle tree format from public:ccf.internal.tree table
  • Fixed edge case in get_merkle_root() to handle empty trees correctly
  • Added temporary validation test in run_read_ledger_on_testdata to verify deserialization against real ledger data

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • python/src/ccf/merkletree.py: Core implementation of the deserialise() method and the empty-tree fix
  • tests/e2e_operations.py: Temporary validation test comparing accumulated vs. deserialized tree roots

Comment on lines +173 to +177
# Store only the leaf level - other levels will be reconstructed on demand
# by methods like get_merkle_root() via _make_tree().
# This is consistent with how add_leaf() works - it only appends to _levels[0]
# and sets _root to None, deferring tree construction until needed.
self._levels = [leaf_nodes[:]]

Copilot AI Feb 17, 2026

The comment here is misleading. _make_tree() has no knowledge of flushed nodes, so it cannot correctly reconstruct a deserialized tree that had any. Calling add_leaf() on a deserialized tree will therefore produce incorrect root hashes. This is acceptable for the intended use case (deserialize from a signature, verify the root, then discard), but it should be documented more clearly to prevent misuse.
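A minimal illustration of this point follows. It assumes interior nodes are SHA-256 over the concatenated children and that an unpaired node is promoted unchanged. Rebuilding a root from the retained leaves alone, which is all _make_tree() sees, loses the contribution of the flushed prefix. `naive_root` is a hypothetical stand-in for _make_tree(), not its actual code.

```python
from hashlib import sha256
from typing import List


def naive_root(leaves: List[bytes]) -> bytes:
    """Hypothetical stand-in for _make_tree(): pair upward from the given leaves only."""
    level = list(leaves)
    while len(level) > 1:
        level = [
            sha256(level[i] + level[i + 1]).digest() if i + 1 < len(level) else level[i]
            for i in range(0, len(level), 2)
        ]
    return level[0]


leaves = [sha256(bytes([i])).digest() for i in range(4)]
true_root = naive_root(leaves)

# Assumed mini-tree with num_flushed = 2: only leaves[2:] are kept as leaf hashes,
# and the flushed pair survives as a single extra hash, h(leaves[0] + leaves[1]).
retained = leaves[2:]
extra = sha256(leaves[0] + leaves[1]).digest()

# Pairing the retained leaves alone gives the wrong root...
assert naive_root(retained) != true_root
# ...while combining with the extra left-edge hash recovers it.
assert sha256(extra + naive_root(retained)).digest() == true_root
```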

Copilot uses AI. Check for mistakes.
trees_validated += 1

# Add transaction to accumulated tree
# Transaction leafs are the transaction digest

Copilot AI Feb 17, 2026

Minor spelling issue: 'leafs' should be 'leaves' (the correct plural of 'leaf').

Suggested change
- # Transaction leafs are the transaction digest
+ # Transaction leaves are the transaction digest


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parse public:ccf.internal.tree in Python

2 participants