Conversation

@yongkangc (Member) commented Oct 8, 2025

This PR builds on top of #18887 by adding worker pooling for account proofs and blinded account node requests, which reduces the overhead of spawning a worker per request.

Changes:

  • Move account proofs to the pooling logic
  • Move blinded account nodes to the pooling logic; removed the underlying structure and functions (pending tasks)

Core logic:

  1. Determine the number of workers based on available parallelism
  2. Spawn that many workers, each with its own long-lived database transaction
  3. Queue storage proofs to the workers via crossbeam channels
  4. Await the storage proofs to build the account proofs

Flow (a minimal code sketch follows the diagram):

Caller
    ↓ (requests account multiproof)
ProofTaskManager
    ↓ (routes to account worker pool)
Account Worker
    ↓ (calls collect_storage_proofs)
    ↓ (queues all storage proofs to storage worker pool)
Storage Workers (parallel computation)
    ↓ (return storage proofs)
Account Worker
    ↓ (calls build_account_multiproof_with_storage_roots)
    ↓ (walks account trie, looks up pre-computed storage roots)
Return to Caller (final account multiproof)
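
A minimal sketch of steps 1-4, assuming the crossbeam-channel crate for the shared work queue; the job and transaction types are hypothetical stand-ins rather than reth's actual types:

use crossbeam_channel as channel; // multi-consumer queue, as used by the worker pools
use std::{sync::mpsc, thread};

/// Hypothetical stand-ins for the real job and transaction types.
struct StorageJob {
    hashed_address: u8,
    result_tx: mpsc::Sender<u64>,
}
struct WorkerTx(usize);

fn main() {
    // 1. Determine the number of workers from available parallelism.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);

    // 2. Spawn workers; each owns its own long-lived transaction and a clone of the queue receiver.
    let (work_tx, work_rx) = channel::unbounded::<StorageJob>();
    for id in 0..workers {
        let rx = work_rx.clone();
        thread::spawn(move || {
            let _tx = WorkerTx(id); // reused for every job this worker handles
            while let Ok(job) = rx.recv() {
                // A real worker computes a storage proof against `_tx` here.
                let _ = job.result_tx.send(u64::from(job.hashed_address));
            }
        });
    }

    // 3. Queue storage proofs, keeping one result receiver per hashed address.
    let receivers: Vec<_> = (0..3u8)
        .map(|addr| {
            let (result_tx, result_rx) = mpsc::channel();
            work_tx.send(StorageJob { hashed_address: addr, result_tx }).unwrap();
            result_rx
        })
        .collect();

    // 4. Await the storage proofs while assembling the account-level result.
    for rx in receivers {
        let _storage_root = rx.recv().unwrap();
    }
}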

Issues closed:

References:

Next Steps:

  • Removing ProofTaskManager + other cleanups

@github-project-automation bot moved this to Backlog in Reth Tracker (Oct 8, 2025)
@yongkangc self-assigned this (Oct 8, 2025)
@yongkangc moved this from Backlog to In Progress in Reth Tracker (Oct 8, 2025)
@yongkangc marked this pull request as ready for review (October 8, 2025 12:03)
@yongkangc requested a review from Copilot (October 8, 2025 12:03)
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds worker pooling support for account proof operations to complement the existing storage proof worker pool. The main goal is to reduce overhead from spawning workers by reusing dedicated workers with long-lived database transactions for account multiproof computations.

Key changes:

  • Added account worker pool alongside existing storage worker pool
  • Integrated account multiproof requests into the worker pooling system
  • Modified configuration to support separate account worker count parameter

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:

  • crates/trie/parallel/src/proof_task.rs: core implementation of the account worker pool, job structures, and worker loop functions
  • crates/trie/parallel/src/proof.rs: refactored multiproof generation to use the account worker pool and extracted helper functions
  • crates/engine/tree/src/tree/payload_processor/multiproof.rs: updated the multiproof manager to use the account worker pool for proof calculations
  • crates/engine/tree/src/tree/payload_processor/mod.rs: added the account worker count configuration parameter
  • crates/engine/primitives/src/config.rs: added the account_worker_count field and related configuration methods
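
As a rough illustration of the config.rs change, a hedged sketch (the real `TreeConfig` has many more fields, and the setter name here is a guess):

/// Sketch of the additions to `TreeConfig` described above.
#[derive(Debug, Clone)]
pub struct TreeConfig {
    /// Number of dedicated workers for account multiproof computation.
    account_worker_count: usize,
}

impl TreeConfig {
    /// Returns the configured number of account proof workers.
    pub const fn account_worker_count(&self) -> usize {
        self.account_worker_count
    }

    /// Sets the number of account proof workers (hypothetical builder-style setter).
    pub fn with_account_worker_count(mut self, account_worker_count: usize) -> Self {
        self.account_worker_count = account_worker_count;
        self
    }
}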


@yongkangc (Member, Author) commented:

@mediocregopher

re #18887 (comment)

I'm thinking about it again: do we want to `expect`, or might it be better to log an error? I'm thinking there might not be a cost to erroring instead of using `expect`.

@yongkangc requested a review from Copilot (October 8, 2025 12:17)
Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.



@mediocregopher (Collaborator) commented:

> @mediocregopher
>
> re #18887 (comment)
>
> I'm thinking about it again: do we want to `expect`, or might it be better to log an error? I'm thinking there might not be a cost to erroring instead of using `expect`.

My opinion on these situations is that if something isn't supposed to be possible we should acknowledge it in the code; adding unnecessary error handling/logging creates mental overhead when reading the code because the reader might now think through this extra code-path, not realizing it's not really possible.

In this case, if it were possible for sending to the worker channel to fail, I don't think just logging an error would be sufficient; the proof task is getting silently dropped, meaning the sparse trie won't be revealed, meaning possibly a failure to calculate the state root. I would rather, as a reader, not have to think through all of that, and just assert that this can't even really happen.

Put another way: less code == less problems
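
To make the trade-off concrete, a toy sketch (std mpsc instead of the crossbeam channels used in the PR; the channel and message types are made up):

use std::sync::mpsc;

fn main() {
    let (work_tx, work_rx) = mpsc::channel::<u32>();

    // Logging-and-continuing silently drops the job: the caller waiting on its
    // result never gets woken, and the sparse trie is never revealed.
    if work_tx.send(1).is_err() {
        eprintln!("worker pool unavailable, dropping proof task");
    }

    // Asserting the invariant instead: workers outlive the sender, so a failed
    // send means a bug, and panicking surfaces it at the point of failure.
    work_tx.send(2).expect("proof worker channel closed unexpectedly");

    // Consume the queued jobs so the example is self-contained.
    drop(work_tx);
    while let Ok(job) = work_rx.recv() {
        println!("processed job {job}");
    }
}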

- Added `AccountMultiproofInput` struct for input parameters related to account multiproof computation.
- Introduced `AccountMultiproofJob` struct for managing account multiproof tasks.
- Updated `ProofTaskKind` enum to include an account multiproof variant.
- Modified task routing logic to ensure account multiproof operations are directed to the appropriate worker pools.
- Introduced `account_worker_count` field in `TreeConfig` struct.
- Updated default implementation and constructor to include `account_worker_count`.
- Modified `ProofTaskManager` instantiation to utilize `account_worker_count`.
- Added getter and setter methods for `account_worker_count` in `TreeConfig`.
- Added support for account multiproof operations with dedicated account workers.
- Introduced `account_work_tx` and `account_worker_count` fields in `ProofTaskManager`.
- Updated worker loop to handle account multiproof jobs and collect storage proofs.
- Enhanced task routing to ensure proper handling of account multiproof requests.
- Improved documentation for clarity on worker lifecycle and transaction reuse.
- Implemented account worker functionality to handle account multiproof operations.
- Introduced `account_work_tx` and `account_worker_count` fields for managing account jobs.
- Updated worker loop to process account multiproof tasks and collect associated storage proofs.
- Improved task routing to ensure efficient handling of account multiproof requests.
- Enhanced documentation for better understanding of worker lifecycle and transaction reuse.
- Introduced `WorkerType` enum for better logging of worker types.
- Updated `ProofTaskManager` to spawn account workers alongside storage workers.
- Implemented `account_worker_loop` to handle account multiproof operations and collect storage proofs.
- Improved documentation on worker lifecycle and transaction reuse.
- Refactored task management to streamline the spawning of worker pools.
- Added `account_proof_task_handle` to manage account multiproof tasks.
- Updated `MultiproofManager` constructor to accept the new account proof task handle.
- Enhanced task queuing for account multiproof operations, including error handling for task submission and channel communication.
- Improved prefix set handling for account multiproof inputs.
- Added `account_worker_loop` function to handle account multiproof tasks.
- Introduced `collect_storage_proofs` function for queuing and collecting storage proofs.
- Enhanced error handling for task submission and result collection.
- Updated documentation to clarify the workflow of account multiproof operations and storage proof handling.
- Introduced `extend_prefix_sets_with_targets` method to streamline prefix set preparation for multiproof generation.
- Updated `decoded_multiproof` to utilize the new method, improving clarity and reducing code duplication.
- Refactored account multiproof task queuing to enhance error handling and communication with worker pools.
- Improved documentation for methods related to account multiproof operations and storage proof handling.
- Updated the field name from `storage_proof_task_handle` to `proof_task_handle` in the `ParallelProof` struct to better reflect its purpose.
- Adjusted related method signatures and internal references to maintain consistency throughout the codebase.
- Removed the generic type parameter `Factory` from the `ProofTaskManager` struct and its implementation, streamlining the code.
- Updated the `new` and `spawn_worker_pool` methods to maintain functionality while simplifying type handling.
- Enhanced clarity and reduced complexity in the proof task management logic.
…Manager`

- Updated comments for `storage_proof_task_handle` and `account_proof_task_handle` to clarify their roles in managing proof tasks.
- Adjusted method calls to use the updated handle references, enhancing code readability and maintainability.
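
Tying those commit messages together, a hypothetical sketch of the job and routing shapes (field contents and the exact variant name are guesses, not reth's definitions):

use std::sync::mpsc::Sender;

/// Hypothetical stand-ins for the real input/job types in `proof_task.rs`.
struct AccountMultiproofInput {
    targets: Vec<[u8; 32]>,
}
struct AccountMultiproofJob {
    input: AccountMultiproofInput,
    result_tx: Sender<Result<u64, String>>,
}

enum ProofTaskKind {
    StorageProof(u8),
    BlindedAccountNode(u8),
    BlindedStorageNode(u8),
    AccountMultiproof(AccountMultiproofJob),
}

/// Routing sketch: account-level work goes to the account worker pool,
/// storage-level work to the storage worker pool.
fn route(task: ProofTaskKind) {
    match task {
        ProofTaskKind::AccountMultiproof(job) => {
            let _ = job; // account_work_tx.send(job) in the real code
        }
        ProofTaskKind::BlindedAccountNode(_) => {
            // also moved to the account pooling logic per the PR description
        }
        ProofTaskKind::StorageProof(_) | ProofTaskKind::BlindedStorageNode(_) => {
            // storage_work_tx.send(..) in the real code
        }
    }
}

fn main() {
    route(ProofTaskKind::StorageProof(0));
}
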
Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.



…n in parallel

- Replace standard mpsc channels with crossbeam channels in `ProofTaskManager`

- Updated `proof_task_rx` and `proof_task_tx` to use `CrossbeamReceiver` and `CrossbeamSender` for improved performance and non-blocking behavior.
- Renamed `collect_storage_proofs` to `queue_storage_proofs` to better reflect its functionality of returning receivers for storage proofs.
- Enhanced documentation to clarify the interleaved parallelism between account trie traversal and storage proof computation.
- Before this change the code did:

  StorageRootTargets::new(
      prefix_sets.account_prefix_set.iter().map(|nibbles| B256::from_slice(&nibbles.pack())),
      prefix_sets.storage_prefix_sets.clone(),
  ).len()

  StorageRootTargets::new(..) builds a brand-new B256Map<PrefixSet> by:

  - cloning every entry in storage_prefix_sets (each PrefixSet can hold dozens of nibble prefixes), and
  - allocating a fresh B256Map to hold them.

- We only needed the resulting length, so those clones/allocations were immediately thrown away; see the sketch below.
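
A hedged sketch of the cheaper computation implied above, reusing the names from the snippet (the actual replacement in the PR may differ):

// Count unique hashed addresses across both inputs without cloning any PrefixSet;
// accounts that also appear in storage_prefix_sets are only counted once.
let storage_root_targets_len = prefix_sets
    .account_prefix_set
    .iter()
    .map(|nibbles| B256::from_slice(&nibbles.pack()))
    .filter(|hashed_address| !prefix_sets.storage_prefix_sets.contains_key(hashed_address))
    .count() +
    prefix_sets.storage_prefix_sets.len();
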
…age jobs

- Replaced the `storage_proof_handle` parameter with `storage_work_tx` in the `account_worker_loop` function to facilitate direct communication with the storage worker queue.
- Modified the `queue_storage_proofs` function to accept a mutable reference to `storage_prefix_sets`, improving efficiency by avoiding unnecessary clones.
- Enhanced the initialization of `collected_decoded_storages` to use pre-allocated capacity, optimizing memory usage during proof collection.
…nation

- Introduced a new function `default_account_worker_count` to calculate the default number of account worker threads, set to 1.5 times the storage worker count.
- Updated the `TreeConfig` implementation to use the new account worker count function, enhancing performance for I/O-bound tasks related to storage proof collection and account trie traversal.
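
A sketch of that heuristic, assuming the storage worker count is passed in; the real helper's signature and rounding may differ:

/// Account workers coordinate storage-proof collection and spend much of their time
/// blocked on receivers, so give them roughly 1.5x the storage worker count.
fn default_account_worker_count(storage_worker_count: usize) -> usize {
    (storage_worker_count.saturating_mul(3) / 2).max(1)
}
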
@yongkangc (Member, Author) commented Oct 9, 2025

Mainnet reth4, on commit 9e177f4:


@yongkangc (Member, Author) commented:

Based on commit 9e177f4:


@yongkangc requested a review from Copilot (October 9, 2025 07:29)
Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.




impl<Factory> ProofTaskManager<Factory>
// TODO: Refactor this with storage_worker_loop. ProofTaskManager should be removed in the following
// pr and `MultiproofManager` should be used instead to dispatch jobs directly.
Copilot AI commented (Oct 9, 2025):

This TODO comment indicates technical debt. Consider creating a tracking issue or estimate for this refactoring to avoid it being forgotten.

Suggested change (adding a tracking reference below the existing comment):

// pr and `MultiproofManager` should be used instead to dispatch jobs directly.
// Tracking issue: https://github.com/your-org/your-repo/issues/1234


- Updated comments for `proof_task_handle` parameters in the `MultiproofManager` initialization to specify their roles for storage and account proof workers, improving code clarity and maintainability.
@yongkangc requested a review from Copilot (October 9, 2025 07:41)
Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.



@yongkangc requested a review from Copilot (October 9, 2025 08:07)
Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.



Comment on lines +485 to +494
#[allow(clippy::too_many_arguments)]
fn build_account_multiproof_with_storage_roots<C, H>(
    trie_cursor_factory: C,
    hashed_cursor_factory: H,
    targets: &MultiProofTargets,
    prefix_set: PrefixSet,
    collect_branch_node_masks: bool,
    multi_added_removed_keys: Option<&Arc<MultiAddedRemovedKeys>>,
    mut storage_proof_receivers: B256Map<Receiver<StorageProofResult>>,
    missed_leaves_storage_roots: &DashMap<B256, B256>,
Copilot AI commented (Oct 9, 2025):

Function has too many arguments (8). Consider grouping related parameters into a struct to improve readability and maintainability. For example, the cursor factories and configuration parameters could be grouped.

Suggested change:

/// Struct grouping the cursor factories for account multiproof.
struct CursorFactories<C, H> {
    trie_cursor_factory: C,
    hashed_cursor_factory: H,
}

/// Struct grouping configuration parameters for account multiproof.
struct AccountMultiproofConfig<'a> {
    collect_branch_node_masks: bool,
    multi_added_removed_keys: Option<&'a Arc<MultiAddedRemovedKeys>>,
    storage_proof_receivers: B256Map<Receiver<StorageProofResult>>,
    missed_leaves_storage_roots: &'a DashMap<B256, B256>,
}

#[allow(clippy::too_many_arguments)]
fn build_account_multiproof_with_storage_roots<C, H>(
    cursor_factories: CursorFactories<C, H>,
    targets: &MultiProofTargets,
    prefix_set: PrefixSet,
    config: AccountMultiproofConfig,



@yongkangc (Member, Author) commented:

@mediocregopher addressed your changes :)

@shekhirin @Rjected would appreciate a review here

@mediocregopher (Collaborator) left a comment

One final question

@Rjected (Member) left a comment

The structure for workers makes sense to me, I have some comments on docs and style

/// Account workers primarily coordinate storage proof collection and account trie traversal.
/// They spend significant time blocked on `receiver.recv()` calls waiting for storage proofs,
/// so we use higher concurrency (1.5x storage workers) to maximize throughput and overlap.
/// While storage workers are CPU-bound, account workers are I/O-bound coordinators.
@Rjected (Member):

is this comment true? I would expect the storage workers to be I/O bound as well

}

impl<Factory> ProofTaskManager<Factory>
// TODO: Refactor this with storage_worker_loop. ProofTaskManager should be removed in the following
@Rjected (Member) commented (Oct 9, 2025):

I do think it is helpful to have the state required for each looping task be encapsulated in a struct rather than in a free function

Comment on lines +485 to +494
#[allow(clippy::too_many_arguments)]
fn build_account_multiproof_with_storage_roots<C, H>(
    trie_cursor_factory: C,
    hashed_cursor_factory: H,
    targets: &MultiProofTargets,
    prefix_set: PrefixSet,
    collect_branch_node_masks: bool,
    multi_added_removed_keys: Option<&Arc<MultiAddedRemovedKeys>>,
    mut storage_proof_receivers: B256Map<Receiver<StorageProofResult>>,
    missed_leaves_storage_roots: &DashMap<B256, B256>,
@Rjected (Member):

agree with the comment that this is a lot of arguments, I think it would be nice to make this simpler by introducing some sort of input or context type

Comment on lines +793 to +813
// spawns workers that will execute worker_fn
for worker_id in 0..worker_count {
    let provider_ro = view.provider_ro()?;
    let tx = provider_ro.into_tx();
    let proof_task_tx = ProofTaskTx::new(tx, task_ctx.clone(), worker_id);
    let work_rx_clone = work_rx.clone();
    let worker_fn_clone = worker_fn.clone();

    executor
        .spawn_blocking(move || worker_fn_clone(proof_task_tx, work_rx_clone, worker_id));

    spawned_workers += 1;

    tracing::debug!(
        target: "trie::proof_task",
        worker_id,
        spawned_workers,
        worker_type = %worker_type,
        "{} worker spawned successfully", worker_type
    );
}

(The same diff removes the old on-demand transaction path shown below.)

    self.total_transactions += 1;
    return Ok(Some(ProofTaskTx::new(tx, self.task_ctx.clone(), self.total_transactions)));
}

Ok(None)
}

/// Spawns the next queued proof task on the executor with the given input, if there are any
/// transactions available.
///
/// This will return an error if a transaction must be created on-demand and the consistent view
/// provider fails.
pub fn try_spawn_next(&mut self) -> ProviderResult<()> {
    let Some(task) = self.pending_tasks.pop_front() else { return Ok(()) };

    let Some(proof_task_tx) = self.get_or_create_tx()? else {
        // if there are no txs available, requeue the proof task
        self.pending_tasks.push_front(task);
        return Ok(())
    };

    let tx_sender = self.tx_sender.clone();
    self.executor.spawn_blocking(move || match task {
        ProofTaskKind::BlindedAccountNode(path, sender) => {
            proof_task_tx.blinded_account_node(path, sender, tx_sender);
        }
        // Storage trie operations should never reach here as they're routed to worker pool
        ProofTaskKind::BlindedStorageNode(_, _, _) | ProofTaskKind::StorageProof(_, _) => {
            unreachable!("Storage trie operations should be routed to worker pool")
        }
    });
@Rjected (Member):

I understand that the Fn here is used to be generic over storage and account workers, and to have only one function for spawning worker pools. But IMO this makes it harder to follow what's going on. What would be clearer to me would be structs that encapsulate:

  1. an account worker / loop
  2. a storage worker / loop

And either duplicating the spawning code or finding another way to abstract the tasks
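
A rough sketch of the struct-per-worker shape being suggested, with simplified stand-in types rather than the PR's actual ones:

use std::sync::mpsc::{Receiver, Sender};

/// Hypothetical job and transaction stand-ins.
struct StorageProofJob {
    hashed_address: [u8; 32],
    result_tx: Sender<u64>,
}
struct WorkerTx;

/// One storage worker: owns its receiver and (in the real code) a long-lived db transaction.
struct StorageWorker {
    id: usize,
    tx: WorkerTx,
    work_rx: Receiver<StorageProofJob>,
}

impl StorageWorker {
    /// Runs until every sender is dropped, i.e. until the pool shuts down.
    fn run(self) {
        while let Ok(job) = self.work_rx.recv() {
            // The real loop computes a storage multiproof against `self.tx` here.
            let _ = (&self.tx, self.id, job.hashed_address);
            let _ = job.result_tx.send(0);
        }
    }
}

// An `AccountWorker` would mirror this shape, additionally holding a sender into the
// storage pool so it can queue storage proofs while walking the account trie.

fn main() {
    let (work_tx, work_rx) = std::sync::mpsc::channel();
    let worker = StorageWorker { id: 0, tx: WorkerTx, work_rx };
    let handle = std::thread::spawn(move || worker.run());

    let (result_tx, result_rx) = std::sync::mpsc::channel();
    work_tx.send(StorageProofJob { hashed_address: [0; 32], result_tx }).unwrap();
    println!("storage root placeholder: {}", result_rx.recv().unwrap());

    drop(work_tx); // shut the worker down
    handle.join().unwrap();
}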
