Skip to content

False positives in validator monitor missed block logging #8080

@michaelsproul

Description

@michaelsproul

Description

The validator monitor's monitoring of missed blocks uses arbitrary BeaconStates to look for skipped slots, with a tolerance of MISSED_BLOCK_LAG_SLOTS = 4 to try to avoid false positives at very recent slots.

The problem is, this methodology is still vulnerable to false positives when processing random side chains. E.g. on Hoodi:

Sep 18 02:36:22.561 ERROR Validator missed a block index: 327331, slot: 1329123, prev_block_root: 0x2c30f23f26c1924029f514073355889aace7de4ff9f44b901748e8dca3266340

This log is a false positive, because that slot was indeed proposed.

The slot 1329123 lies on a skipped slot for a side chain imported around the same time:

Sep 18 02:36:17.709 DEBUG Proposer shuffling cache miss parent_root: 0x2c30f23f26c1924029f514073355889aace7de4ff9f44b901748e8dca3266340, parent_slot: 1329104, block_root: 0x3f2d4cbe58575624b8a6bb3efbbe5b518a766eae5b2dde88c9db06cc96bc0288, block_slot: 1329129

An out of sync proposer building a block from the parent at slot 1329104 (25 slots prior), is enough to confuse the validator monitor. We process the head state of this side chain here:

self.import_block_update_validator_monitor(
block,
&state,
&mut consensus_context,
current_slot,
parent_block.slot(),
);

From that head state, the slot at 1329123 appears missed, so we get the warning.

Steps to resolve

I think we have two ways to fix this: a strict approach, and a pragmatic one.

The strict approach would only check for missed blocks once the slot of the proposal is finalized. This would have 0 false positives, but would be slower to detect missed blocks (we have to wait for finalization!)

The pragmatic approach might be to only process BeaconStates from the canonical chain. This information is already available in import_block, because we use it to determine whether to prime the early attester cache:

// This block became the head, add it to the early attester cache.
Ok(new_head_root) if new_head_root == block_root => {

I don't have a strong preference between the pragmatic and strict approaches. Perhaps the pragmatic one is better as it has the advantage of more quickly identifying faults. Although it is still vulnerable to false positives if the chain is reorging a lot over distances > MISSED_BLOCK_LAG_SLOTS.

Additional Info

Thanks @chong-he for finding this bug in the logs!

While implementing this change it might be nice to also extend (and clean up) the tests:

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions