Skip to content

Conversation

nalepae
Copy link
Contributor

@nalepae nalepae commented Oct 8, 2025

What type of PR is this?
Bug fix

What does this PR do? Why is it needed?
Before this PR, Prysm used to return ValidationReject when hitting an internal error in the validator of a message received via gossip. In this case, the corresponding peer is unfairly punished, and possibly disconnected.

This PR moves internal errors from ValidationReject to ValidationIgnore. As a consequence, we ignore the incoming message without punishing the peer.

Other notes for review
Please read commit by commit, with commit messages.

Choosing if the peer has a minimum of responsibility in an "internal error" is not always so straightforward.
Sometimes, there is no doubt. Sometimes, things are more complicated.

Ideally, the reviewer should check all the ValidationReject in the codebase and confirm that Prysm reject/ignore messages when it should.

The following part explains some questionable cases.

decodePubsubMessage

The only case where I found it unfair to penalize the peer is when

	m, ok := base.(ssz.Unmarshaler)
	if !ok {
		return nil, errors.Errorf("message of %T does not support marshaller interface", base)
	}

fails.

However, I think reaching this case shows an important bug in our node and the node needs a fix. So I did not write an exception for it.

kzg.VerifyCellKZGProofBatch

I initially thought that ckzg4844.VerifyCellKZGProofBatch would return false and no error if at least one proof in the batch is invalid, and an error if the function itself has an issue.
However, giving to this function a sidecar modified by createInvalidTestDataColumns will error the function as well.
(Maybe it's a bug of ckzg4844.VerifyCellKZGProofBatch). As a consequence, we want to reject all messages resulting of an error of ckzg4844.VerifyCellKZGProofBatch.

Seen with @jtraglia : The createInvalidTestDataColumns flips the first byte of the field element in the cell. This will make the field element invalid (not just incorrect) & that’s why it errors.

==> We need to REJECT if ckzg4844.VerifyCellKZGProofBatch errors.

Acknowledgements

Rationale:
The only way `set.Verify` returns an error is if there is a mismatch in `VerifyMultipleSignatures` arguments sizes, which is not the reponsability of the peer.
==> It is unfair to penalize the peer in such a case.
@jtraglia
Copy link
Contributor

jtraglia commented Oct 8, 2025

I initially thought that ckzg4844.VerifyCellKZGProofBatch would return false and no error if at least one proof in the batch is invalid, and an error if the function itself has an issue. However, giving to this function a sidecar modified by createInvalidTestDataColumns will error the function as well. (Maybe it's a bug of ckzg4844.VerifyCellKZGProofBatch). As a consequence, we want to reject all messages resulting of an error of ckzg4844.VerifyCellKZGProofBatch.

VerifyCellKZGProofBatch will return false and no error if an input (eg proof) is incorrect, but it will return false with an error if anything is invalid (malformed). There is no bug here. This is how it was designed.

@nalepae nalepae force-pushed the stop-penalize-peers-for-nothing branch 3 times, most recently from a52e57e to c449ca0 Compare October 10, 2025 13:00
@nalepae nalepae changed the title Stop penalize peers for nothing Stop unfairly penalize peers Oct 10, 2025
@nalepae nalepae force-pushed the stop-penalize-peers-for-nothing branch from f921d23 to 90ef02c Compare October 10, 2025 14:27
if err != nil {
log.WithFields(logging.BlobFields(bv.blob)).WithError(err).Debug("Could not replay parent state for blob signature verification")
return ErrInvalidProposerSignature
return errors.Wrap(err, "parent state")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of this change, TestValidProposerSignature_CacheMiss now fails.
Is it fair to penalize the peer in such a case?

parentSlot, err := bv.fc.Slot(bv.blob.ParentRoot())
if err != nil {
return errors.Wrap(errSlotNotAfterParent, "Parent root not in forkchoice")
return errors.Wrap(err, "slot")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of this change, TestSidecarParentSlotLower starts to fail.
Is it fair to penalize the peer in such a case?

if err != nil {
log.WithError(err).WithFields(logging.BlobFields(bv.blob)).Debug("State replay to parent_root failed")
return errSidecarUnexpectedProposer
return errors.Wrap(err, "parent state")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of this change, TestSidecarProposerExpected - "not cached, state lookup failure" starts to fail.
Is it fair to penalize the peer in such a case?

if err != nil {
log.WithError(err).WithFields(logging.BlobFields(bv.blob)).Debug("Error computing proposer index from parent state")
return errSidecarUnexpectedProposer
return errors.Wrap(err, "compute proposer")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of this change, TestSidecarProposerExpected - "not cached, ComputeProposer fails start to fail.
Is it fair to penalize the peer in such a case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants