wetkeyboard
Contributor

Fixes issue #18408.

Thanks @acerone85 for the detailed issue report. I think I partly understand the issue.

During a reorganization, when find_disk_reorg() detects that blocks need to be removed and starts a RemoveBlocks operation, incoming blocks are incorrectly classified as PersistingNotDescendant.

This disables parallel state root computation and forces sequential computation, which then tries to access headers already removed by the ongoing removal job and fails with a fatal error.

The race condition sequence:

  1. Reorganization detected → remove_blocks() called → persistence state set to RemovingBlocks
  2. New block arrives at height H+2 while removal is in progress
  3. persisting_kind_for() sees RemovingBlocks → returns PersistingNotDescendant
  4. Parallel state root disabled → sequential computation fails with "no header found"
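
To make the sequence above easier to follow, here is a minimal, self-contained model of it. This is only a sketch and not the actual engine code: the `PersistenceAction` enum, the `descends_from_persisting` flag, the free-function signatures, and the height-indexed header set are all assumptions made for illustration. Only the names `persisting_kind_for`, `RemovingBlocks`, and `PersistingNotDescendant` come from the description above.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified view of what the persistence task is doing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PersistenceAction {
    Idle,
    SavingBlocks,
    RemovingBlocks,
}

/// How an incoming block relates to the in-flight persistence job.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PersistingKind {
    NotPersisting,
    PersistingDescendant,
    PersistingNotDescendant,
}

/// Step 3: while a removal is in progress, every incoming block is
/// classified as PersistingNotDescendant, regardless of its ancestry.
/// (Signature is an assumption; the real function takes different inputs.)
fn persisting_kind_for(
    action: PersistenceAction,
    descends_from_persisting: bool,
) -> PersistingKind {
    match action {
        PersistenceAction::Idle => PersistingKind::NotPersisting,
        PersistenceAction::SavingBlocks if descends_from_persisting => {
            PersistingKind::PersistingDescendant
        }
        PersistenceAction::SavingBlocks => PersistingKind::PersistingNotDescendant,
        PersistenceAction::RemovingBlocks => PersistingKind::PersistingNotDescendant,
    }
}

/// Step 4 (first half): parallel state root is disabled for
/// PersistingNotDescendant blocks.
fn can_run_parallel_state_root(kind: PersistingKind) -> bool {
    !matches!(kind, PersistingKind::PersistingNotDescendant)
}

/// Step 4 (second half): the sequential path looks up the parent header on
/// disk, modeled here as a set of heights, and fails if it was removed.
fn sequential_state_root(
    headers_on_disk: &BTreeSet<u64>,
    parent_height: u64,
) -> Result<(), String> {
    if headers_on_disk.contains(&parent_height) {
        Ok(())
    } else {
        Err(format!("no header found for height {}", parent_height))
    }
}
```

In this model, a block whose parent header has already been deleted by the removal job gets `PersistingNotDescendant`, cannot use the parallel path, and then fails in `sequential_state_root`.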

@mediocregopher
Collaborator

Hi @wetkeyboard, thanks for the PR. I'm not 100% sure this is correct though. At the moment the normal state root task (as opposed to the parallel state root fallback) can only be run on blocks which are descendants of the current persisted tip. This PR would allow the state root task to run while the DB tip may not yet have been unwound to an ancestor of the block being executed. At best the ConsistentDbView would catch this and fall back to parallel state root anyway, but at worst we'd end up with invalid trie updates.

Two notes:

  • This tracking issue covers changes which are intended to remove the need for this parallel state root fallback in a more fundamental way.
  • I think the bug here might actually be with database consistency anyway; I'm going to leave a note on the original issue.
