Skip to content

Conversation

ziggie1984
Copy link
Collaborator

@ziggie1984 ziggie1984 commented Sep 25, 2025

While this PR started to enforce the tlv onion, we decided against the enforcement of assumed feature bits. During the implementation however I encountered some flaky tests where we would not add the node currently in our graph because we would receive the node announcement before the channel announcement which would lead us to drop the node announcement message. Now we precache them similar to channel update messages and replay them as soon as a new channel announcement is received. The cache is a LRU storage which makes sure we limit its capacity to prevent OOM attacks.

@ziggie1984 ziggie1984 marked this pull request as ready for review September 25, 2025 12:45
@ziggie1984 ziggie1984 force-pushed the enforce-tlv-during-pathfinding branch from 942019a to 8237e54 Compare September 25, 2025 12:47
@ziggie1984 ziggie1984 added the payments Related to invoices/payments label Sep 25, 2025
@ziggie1984 ziggie1984 self-assigned this Sep 25, 2025
@ziggie1984 ziggie1984 added this to the v0.21.0 milestone Sep 25, 2025
@ziggie1984 ziggie1984 force-pushed the enforce-tlv-during-pathfinding branch from 8237e54 to ef97de6 Compare September 25, 2025 13:00
@saubyk saubyk added this to v0.21 Sep 25, 2025
@saubyk saubyk moved this to In review in v0.21 Sep 25, 2025
@ziggie1984 ziggie1984 force-pushed the enforce-tlv-during-pathfinding branch 9 times, most recently from 20e1b0c to 632f18d Compare September 26, 2025 11:10
This makes sure also our source node is part of the cache. Currently
there is use of the source node but also adding it to the cache seems
the right way here.
@ziggie1984 ziggie1984 force-pushed the enforce-tlv-during-pathfinding branch 2 times, most recently from 86f2e8e to b5c963d Compare September 26, 2025 21:17
@ziggie1984
Copy link
Collaborator Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request primarily enforces that all hops in a route must support TLV-encoded onion payloads, which improves routing reliability by failing early for incompatible paths. The core logic is implemented in routing/pathfind.go. Additionally, there are significant improvements to the gossip handling mechanism in discovery/gossiper.go, which now caches and replays premature node announcements, enhancing graph sync robustness. The test suite has been updated accordingly to reflect these changes. My review includes a suggestion to fix a potential logic issue in blinded path feature handling and a minor style improvement.

Comment on lines 396 to 397
features := lnwire.EmptyFeatureVector()
features.Set(lnwire.TLVOnionPayloadOptional)
if b.Features != nil {
features = b.Features.Clone()

// In case the feature set is empty, we set the TLV onion
// payload optional feature.
if features.IsEmpty() {
features.Set(lnwire.TLVOnionPayloadOptional)
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The logic for ensuring the TLVOnionPayloadOptional feature is set for blinded paths is flawed. If b.Features is non-nil and non-empty but lacks the TLVOnionPayloadOptional feature, this required bit won't be added. This can lead to creating invalid blinded routes.

The logic can be simplified to ensure the feature is always set:

	features := lnwire.EmptyFeatureVector()
	if b.Features != nil {
		features = b.Features.Clone()
	}
	features.Set(lnwire.TLVOnionPayloadOptional)

graph.ErrIgnored,
) {
switch {
case graph.IsError(err, graph.ErrIgnored):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

According to the style guide, each case in a switch statement should be preceded by a brief comment explaining the condition. This case is missing that comment.1

                // If the node is ignored because we don't have any of its
                // channels, we'll cache it for later processing.
		case graph.IsError(err, graph.ErrIgnored):

Style Guide References

Footnotes

  1. The style guide recommends adding a brief comment to each case in a switch statement to explain the condition. (link)

@ziggie1984 ziggie1984 force-pushed the enforce-tlv-during-pathfinding branch from b5c963d to 1a99c62 Compare September 27, 2025 07:06
@ziggie1984
Copy link
Collaborator Author

Short summary:

  • LND does only encode onion_packets using the tlv payload encoding for quite some time now, but still supports decoding also the legacy format.

  • Now when enforcing this feature on the pathfinding side we need to make sure the the private nodes and also the blinded nodes for blinded payments always set this additional feature otherwise the pathfinding will fail.

  • This enforcement revealed some flakes in our code where previously we wouldn't care if we did not have the nodes node_announcement now the pathfinding would fail. I had to introduce the caching of node_announcments which was especially a problem for the itests because when using private channels we send the channel_announcment and the node_announcment almost at the same time and there might be some race conditions that we do not receive them in order and would not persist the node_announcment which for private channels wouldn't be sent again. I think this fix has a positive side effect because I saw cases in the past where we did miss the node_announcment and this caching might be a solution, wdyt ?

Overall I tend to merge this PR because it favours more strict pathfinding and also graph consistency and moreover fix some flakes as well we saw previously not having the node_annoucment. However I just want to point out that this PR is not strictly needed for the payment series so happy to hear the thoughts of the reviews.

Copy link
Collaborator

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure this is the correct direction for deprecating a legacy feature bit. THe spec labels this bit as "Assumed" in bolt 9. So I think instead we can just stop checking for the bit completely and just assume that all nodes on any route support the new format?

Rather than now going and adding the bit everywhere where an empty feature vector would have sufficed (as this also makes testing trickier)

// currently don't use the feature set of the blinded path currently so
// we always set it to allow the path finder to succeed because LND
// only creates paths with tlv payload encoding..
features := lnwire.EmptyFeatureVector()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might make sense to have an lnwire.AssumedFeatures() which would include all the feature bits that the spec has marked as assumed? non-blocking

// updates that we'll hold onto.
maxPrematureUpdates = 100

// maxPrematureNodeAnnouncements tracks the max amount of premature node
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like this should be in its own PR? or at least have a commit message body? doesnt seem related to the TLV enforcement

req := &routerrpc.SendPaymentRequest{
Dest: carol.PubKey[:],
Amt: int64(payAmt),
PaymentHash: ht.Random32Bytes(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we need the node announcment to create a route because
we always verify if the node supports tlv payloads

i dont know if we should do this... i feel like we should lean more towards just assuming that all nodes support this format? Ie, i think when we deprecate a legacy encoding via a feature bit, we should in general be removing code & not adding extra checks etc?

I think for route hints, we should just assume that all nodes on that route support the TLV format

@ziggie1984
Copy link
Collaborator Author

Im not sure this is the correct direction for deprecating a legacy feature bit. THe spec labels this bit as "Assumed" in bolt 9. So I think instead we can just stop checking for the bit completely and just assume that all nodes on any route support the new format?

agree, it adds too much complexity without much benefit, I think we can still use the prematureNodeAnnouncment cache from this PR tho, which reveiled some flakes where we would not have a proper node_announcment set. So will drop all the commits except the one which introduced the cache.

@ziggie1984 ziggie1984 changed the title Enforce TLV when creating a Route Add NodeAnnouncment Cache to the Gossiper Sep 29, 2025
@ziggie1984 ziggie1984 force-pushed the enforce-tlv-during-pathfinding branch from badc471 to 3f6730f Compare September 29, 2025 13:43
During some tests it was reveiled that there might be a race
condition where we would receive a node annoucment without having
received the Channel Announcment especially for private channels
this might be the case because sending the channel announcment
and sending the node annoucnment happens by default at the same
time (3 blocks by default). Now we cache node annoucments when
received and no channel has been found yet and later replay them
as soon as a new channel announcment for this channel is received.
@ziggie1984 ziggie1984 force-pushed the enforce-tlv-during-pathfinding branch from 3f6730f to 58e6b6b Compare September 29, 2025 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
payments Related to invoices/payments
Projects
Status: In review
Development

Successfully merging this pull request may close these issues.

3 participants