Race condition in reprocessing queue

## Description

We suspect that a race condition exists in the reprocessing queue. Consider the following events:

1. An attestation A arrives for a block that is being processed (block B). The attestation is processed with an `UnknownHeadBlock` error and will be added to the reprocess queue.
2. Block B finishes being imported on another thread, and the `BlockImported` event is sent to the reprocess queue, which dequeues all existing attestations for B. Our attestation from A is not dequeued because it hasn't reached the reprocess queue yet.
3. Attestation A is added to the reprocess queue.
4. Attestation A times out after 12s and is retried. It succeeds upon retry, because B was already processed.

This is suboptimal, as A is processed ~12s later than it could have been. This is also a counterexample to the simplification proposed here:

- https://github.com/sigp/lighthouse/issues/7710

## Version

Lighthouse v8.0.0-rc.0 or thereabouts

## Steps to resolve

One way to fix this might be to make the reprocess queue aware of recently processed blocks (yuck?). It could immediately reprocess any messages it receives for recently processed blocks in order to guard against this race.

Another option is to just do nothing. We could try to observe how often this race happens in the wild before deciding whether an invasive code fix is worth it. Or we could wait until we are improving/rewriting the beacon processor.

Perhaps another option would be to use the new block status tracker to link the reprocessing of the attestation to the completion of the block import. We could attach a receiver for the status channel to the attestation when it initially fails processing, and use this to prompt reprocessing. However I think a similar race could still exist even if this case, if the block is completely unknown when attestation processing begins, but it starts and finishes importing between when the attestation is found to be invalid and when it is added to the reprocess queue (this is quite unlikely because this window of time is small, but not impossible from a concurrency correctness PoV).


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Race condition in reprocessing queue #8073

Description

Version

Steps to resolve

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Race condition in reprocessing queue #8073

Description

Description

Version

Steps to resolve

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions