-
Notifications
You must be signed in to change notification settings - Fork 925
Description
Description
The validator monitor's monitoring of missed blocks uses arbitrary BeaconState
s to look for skipped slots, with a tolerance of MISSED_BLOCK_LAG_SLOTS = 4
to try to avoid false positives at very recent slots.
The problem is, this methodology is still vulnerable to false positives when processing random side chains. E.g. on Hoodi:
Sep 18 02:36:22.561 ERROR Validator missed a block index: 327331, slot: 1329123, prev_block_root: 0x2c30f23f26c1924029f514073355889aace7de4ff9f44b901748e8dca3266340
This log is a false positive, because that slot was indeed proposed.
The slot 1329123 lies on a skipped slot for a side chain imported around the same time:
Sep 18 02:36:17.709 DEBUG Proposer shuffling cache miss parent_root: 0x2c30f23f26c1924029f514073355889aace7de4ff9f44b901748e8dca3266340, parent_slot: 1329104, block_root: 0x3f2d4cbe58575624b8a6bb3efbbe5b518a766eae5b2dde88c9db06cc96bc0288, block_slot: 1329129
An out of sync proposer building a block from the parent at slot 1329104 (25 slots prior), is enough to confuse the validator monitor. We process the head state of this side chain here:
lighthouse/beacon_node/beacon_chain/src/beacon_chain.rs
Lines 3980 to 3986 in 3cb7e59
self.import_block_update_validator_monitor( | |
block, | |
&state, | |
&mut consensus_context, | |
current_slot, | |
parent_block.slot(), | |
); |
From that head state, the slot at 1329123 appears missed, so we get the warning.
Steps to resolve
I think we have two ways to fix this: a strict approach, and a pragmatic one.
The strict approach would only check for missed blocks once the slot of the proposal is finalized. This would have 0 false positives, but would be slower to detect missed blocks (we have to wait for finalization!)
The pragmatic approach might be to only process BeaconState
s from the canonical chain. This information is already available in import_block
, because we use it to determine whether to prime the early attester cache:
lighthouse/beacon_node/beacon_chain/src/beacon_chain.rs
Lines 3926 to 3927 in 3cb7e59
// This block became the head, add it to the early attester cache. | |
Ok(new_head_root) if new_head_root == block_root => { |
I don't have a strong preference between the pragmatic and strict approaches. Perhaps the pragmatic one is better as it has the advantage of more quickly identifying faults. Although it is still vulnerable to false positives if the chain is reorging a lot over distances > MISSED_BLOCK_LAG_SLOTS
.
Additional Info
Thanks @chong-he for finding this bug in the logs!
While implementing this change it might be nice to also extend (and clean up) the tests: