Skip to content

Conversation

yongkangc
Copy link
Member

@yongkangc yongkangc commented Oct 10, 2025

Context:

  • As part of our performance work to reduce overhead and improve scheduling, we added worker pooling for multiproof generation.
  • This PR aims to perform cleanup and remove ProofTaskManager as an abstraction as we can now directly dispatch the proofs jobs to workers.

closes: #18801

impact:

Change Impact
Remove run() loop thread -1 thread, -1 channel hop
Direct channel sends ~some time saved per task
Eliminate enum wrapping ~2 allocations saved per task

reference PRs:

- Replaced the ProofTaskManager with a new spawn_proof_workers function for better clarity and maintainability.
- Updated related code to utilize the new function, simplifying the worker spawning process.
- Enhanced metrics tracking for storage and account proof requests, ensuring thread-safe operations.
- Improved error handling and code structure across proof task implementations.
@github-project-automation github-project-automation bot moved this to Backlog in Reth Tracker Oct 10, 2025
@yongkangc yongkangc self-assigned this Oct 10, 2025
@yongkangc yongkangc moved this from Backlog to In Progress in Reth Tracker Oct 10, 2025
- Added a constant `MIN_WORKER_COUNT` to enforce a minimum number of workers for storage and account proof tasks.
- Updated `default_storage_worker_count` and `default_account_worker_count` functions to utilize the new minimum constraint.
- Enhanced setter methods in `TreeConfig` to ensure worker counts do not fall below the minimum.
- Modified command-line argument parsing to validate worker counts against the minimum requirement.
- Added a debug assertion to ensure active_handles does not underflow when dropping a ProofTaskManagerHandle.
- Implemented metrics recording to flush before exit when the last handle is dropped, enhancing monitoring capabilities.
@yongkangc yongkangc requested a review from Copilot October 10, 2025 10:36
@yongkangc yongkangc changed the title refactor: remove proof task manager refactor(trie): remove proof task manager Oct 10, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the proof task management by removing the ProofTaskManager abstraction and replacing it with direct worker pool spawning. The change eliminates the routing thread overhead by providing direct channel access to storage and account worker pools, simplifying the architecture while maintaining the same worker pool functionality.

Key changes:

  • Replaced ProofTaskManager with spawn_proof_workers function for direct worker spawning
  • Converted ProofTaskManagerHandle to provide type-safe queue methods with direct channel access
  • Updated metrics to use lock-free atomic counters for thread-safe operations

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
crates/trie/parallel/src/proof_task_metrics.rs Converts metrics fields to atomic counters for lock-free thread safety
crates/trie/parallel/src/proof_task.rs Replaces ProofTaskManager with spawn_proof_workers function and updates handle interface
crates/trie/parallel/src/proof.rs Updates proof generation to use new direct queue methods
crates/node/core/src/args/engine.rs Adds minimum worker count validation to CLI arguments
crates/engine/tree/src/tree/payload_validator.rs Updates error message to reflect new spawning approach
crates/engine/tree/src/tree/payload_processor/multiproof.rs Updates multiproof manager to use new queue methods
crates/engine/tree/src/tree/payload_processor/mod.rs Replaces ProofTaskManager instantiation with spawn_proof_workers
crates/engine/primitives/src/config.rs Adds MIN_WORKER_COUNT constant and enforces minimum worker limits

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@yongkangc yongkangc requested a review from Copilot October 10, 2025 10:38
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

- Introduced helper functions to streamline error conversion from ProviderError and channel receive errors to SparseTrieError.
- Enhanced readability and maintainability of the trie_node method by reducing repetitive error handling code.
@yongkangc yongkangc requested a review from Copilot October 10, 2025 10:51
@yongkangc yongkangc marked this pull request as ready for review October 10, 2025 10:51
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@yongkangc yongkangc requested a review from Copilot October 10, 2025 10:58
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +1037 to +1041
debug_assert_ne!(
previous_handles, 0,
"active_handles underflow in ProofTaskManagerHandle::drop (previous={})",
previous_handles
);
Copy link

Copilot AI Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The debug assertion checks for underflow after the fetch_sub operation, but this creates a race condition. If multiple threads drop handles simultaneously, one could observe 0 while another decrements below 0. Move the check before fetch_sub or use compare_and_swap to prevent underflow.

Copilot uses AI. Check for mistakes.

- Introduced a `clamp_worker_count` function to centralize the logic for enforcing the minimum worker count.
- Updated setter methods in `TreeConfig` to utilize the new clamping function, improving code readability and maintainability.
@yongkangc yongkangc requested a review from Copilot October 10, 2025 11:05
@paradigmxyz paradigmxyz deleted a comment from Copilot AI Oct 10, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +105 to +109
executor: Handle,
view: ConsistentDbView<Factory>,
task_ctx: ProofTaskCtx,
storage_worker_count: usize,
account_worker_count: usize,
Copy link

Copilot AI Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spawn_proof_workers function lacks documentation for its parameters. Consider adding parameter documentation to explain what each argument does, especially the worker count parameters and their impact on performance.

Copilot uses AI. Check for mistakes.

Comment on lines 1081 to 1092
impl TrieNodeProvider for ProofTaskTrieNodeProvider {
fn trie_node(&self, path: &Nibbles) -> Result<Option<RevealedNode>, SparseTrieError> {
let (tx, rx) = channel();
/// Helper to convert `ProviderError` to `SparseTrieError`
fn provider_err_to_trie_err(e: ProviderError) -> SparseTrieError {
SparseTrieErrorKind::Other(Box::new(std::io::Error::other(e.to_string()))).into()
}

/// Helper to convert channel recv error to `SparseTrieError`
fn recv_err_to_trie_err(_: std::sync::mpsc::RecvError) -> SparseTrieError {
SparseTrieErrorKind::Other(Box::new(std::io::Error::other("channel closed"))).into()
}

Copy link

Copilot AI Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These helper functions are defined inside the trie_node method, which creates unnecessary code nesting. Consider moving these helper functions outside the method or to a module level for better maintainability and potential reuse.

Copilot uses AI. Check for mistakes.

Comment on lines 349 to 352
/// Handle to the proof worker pool for storage proofs.
storage_proof_task_handle: ProofTaskManagerHandle,
/// Handle to the proof task manager used for account multiproofs.
/// Handle to the proof worker pool for account multiproofs.
account_proof_task_handle: ProofTaskManagerHandle,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason why do we need to separate them here, if it's always the same proof_task_handle passed to both fields?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the note

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the note, addresssed them

let (tx, rx) = channel();
/// Helper to convert `ProviderError` to `SparseTrieError`
fn provider_err_to_trie_err(e: ProviderError) -> SparseTrieError {
SparseTrieErrorKind::Other(Box::new(std::io::Error::other(e.to_string()))).into()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we do SparseTrieErrorKind::Other(Box::new(e)) here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, good note - addressed that as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed this in commit

Copy link
Collaborator

@shekhirin shekhirin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just have some nits regarding the usage of std::io::Error and separate account/storage proof handles.

- Merged separate storage and account proof task handles into a single proof task handle for improved code clarity and maintainability.
- Updated related methods to utilize the consolidated handle, streamlining the management of proof tasks.
- Updated the ProofTaskMetrics struct to derive Default, removing the manual implementation of the default method.
- This change enhances code clarity and reduces boilerplate, while maintaining the same functionality.
- Updated the error conversion helper function in ProofTaskTrieNodeProvider to directly wrap the ProviderError, enhancing clarity and maintainability.
- This change simplifies the error handling logic within the trie_node method.
@yongkangc
Copy link
Member Author

@shekhirin thanks for the review, just addressed all your comments in the commits

Comment on lines +1083 to +1091
/// Helper to convert `ProviderError` to `SparseTrieError`
fn provider_err_to_trie_err(e: ProviderError) -> SparseTrieError {
SparseTrieErrorKind::Other(Box::new(e)).into()
}

/// Helper to convert channel recv error to `SparseTrieError`
fn recv_err_to_trie_err(_: std::sync::mpsc::RecvError) -> SparseTrieError {
SparseTrieErrorKind::Other(Box::new(std::io::Error::other("channel closed"))).into()
}
Copy link
Collaborator

@shekhirin shekhirin Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just do this without helper functions?

.map_err(|error| SparseTrieErrorKind::Other(Box::new(error)))?;

RecvError already implements Error, so should work?

/// Clamps the worker count to the minimum allowed value.
///
/// Ensures that the worker count is at least [`MIN_WORKER_COUNT`].
const fn clamp_worker_count(count: usize) -> usize {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this just .max(MIN_WORKER_COUNT)? Let's move this to with_*_worker_count functions, no need to have a separate helper fn for this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

2 participants