multi: prevent goroutine leak in brontide #10012
Conversation
Force-pushed from 65fb58f to 82a2d85.
Force-pushed from 82a2d85 to 34d53d3.
Force-pushed from abced28 to b870fb7.
discovery/gossiper.go (outdated diff context):
    maxPrematureUpdates,
    lru.WithDeleteCallback(
        func(k uint64, cmsg *cachedNetworkMsg) {
            // for every network message which is
Hmm, I see we've changed approaches. Don't we still want to ensure that the goroutine created to wait for the response is cleaned up as soon as possible?
A premature channel update is an update for a channel we don't know about. It's common in the itests due to the lack of instant block propagation (one node gets the block first and sends the announcement before the other has seen the block). It can even be a zombie edge.
In the common case, we hear of the block, then we can return an error and the goroutine exits. However, if it's a zombie, and stays that way for weeks, then these goroutines will still pile up. Or, if it's a nearly valid but fake channel update, we'll never actually process it.
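For readers following along, here is a minimal, self-contained sketch of the leak pattern being described. The function name and types are placeholders, not lnd's actual API: one goroutine blocks per message on a reply channel that, for a premature update, may never be written to.

```go
package main

import (
	"fmt"
	"time"
)

// processRemoteAnnouncement is a stand-in for the gossiper call: it returns
// an error channel that is only written to once the message is fully
// processed. Names here are hypothetical, not lnd's real API.
func processRemoteAnnouncement(premature bool) chan error {
	errChan := make(chan error, 1)
	if !premature {
		errChan <- nil // Processed right away.
	}
	// A premature update is cached and the reply is deferred; it may
	// never arrive if the channel stays unknown (e.g. a zombie edge).
	return errChan
}

func main() {
	for i := 0; i < 5; i++ {
		errChan := processRemoteAnnouncement(true /* premature */)

		// The leak: one goroutine per premature message, blocked on
		// a receive that may never happen.
		go func(i int) {
			if err := <-errChan; err != nil {
				fmt.Printf("msg %d: %v\n", i, err)
			}
		}(i)
	}

	time.Sleep(100 * time.Millisecond)
	// At this point the five reader goroutines above are still parked.
}
```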
Thinking about it a bit more, this does at least restrict the number of these goroutines waiting for a premature update to be processed to maxPrematureUpdates, which is currently 100.
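To illustrate that bound, here is a toy sketch of a size-limited cache whose eviction callback replies on the evicted waiter's buffered error channel. The real PR uses the neutrino lru package's delete callback (see the diff context above), so every type and function below is hypothetical, and the toy limit of 2 stands in for maxPrematureUpdates = 100.

```go
package main

import (
	"errors"
	"fmt"
)

// cachedMsg pairs a premature update with the channel its sender may be
// waiting on. Hypothetical type, for illustration only.
type cachedMsg struct {
	err chan error
}

// boundedCache is a toy FIFO-bounded cache approximating the lru behavior.
type boundedCache struct {
	max     int
	order   []uint64
	items   map[uint64]*cachedMsg
	onEvict func(k uint64, m *cachedMsg)
}

func newBoundedCache(max int, onEvict func(uint64, *cachedMsg)) *boundedCache {
	return &boundedCache{
		max:     max,
		items:   make(map[uint64]*cachedMsg),
		onEvict: onEvict,
	}
}

func (c *boundedCache) put(k uint64, m *cachedMsg) {
	if len(c.order) >= c.max {
		oldest := c.order[0]
		c.order = c.order[1:]
		if victim, ok := c.items[oldest]; ok {
			delete(c.items, oldest)
			// The eviction callback replies on the waiter's
			// buffered channel so its goroutine can exit.
			c.onEvict(oldest, victim)
		}
	}
	c.order = append(c.order, k)
	c.items[k] = m
}

func main() {
	cache := newBoundedCache(2, func(k uint64, m *cachedMsg) {
		m.err <- errors.New("evicted before the channel was announced")
	})

	first := &cachedMsg{err: make(chan error, 1)}
	cache.put(1, first)
	cache.put(2, &cachedMsg{err: make(chan error, 1)})
	cache.put(3, &cachedMsg{err: make(chan error, 1)}) // evicts key 1

	fmt.Println("waiter for update 1 sees:", <-first.err)
}
```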
Exactly, I think that's the cleanest solution.
Also added a safeguard to exit the goroutine after a timeout. I think this combination should be a good temporary fix until we have the actor model in place.
Why do we need this if we already have remoteGossipMsgTimeout?
The remoteGossipMsgTimeout is just a safeguard to prevent leaks in case developers forget to write into the error channel in the future. So it is an extra safety net against goroutine leaks, because we cannot guarantee the error chan is always used going forward. That's why I added the comment that this is just temporary, until we can replace this code design with the actor model proposed by roasbeef.
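A small runnable sketch of that safety-net shape: the waiting goroutine exits on a reply, on shutdown, or after a timeout, so a forgotten send on the error channel can no longer leak it. The channel names and the two-second timeout are placeholders, not the actual remoteGossipMsgTimeout value or lnd's identifiers.

```go
package main

import (
	"log"
	"time"
)

func main() {
	// Placeholders for what the peer would have in scope.
	errChan := make(chan error, 1)
	quit := make(chan struct{})
	const gossipReplyTimeout = 2 * time.Second

	done := make(chan struct{})
	go func() {
		defer close(done)
		select {
		case err := <-errChan:
			if err != nil {
				log.Printf("unable to process msg: %v", err)
			}
		case <-time.After(gossipReplyTimeout):
			// Safety net: even if nothing ever writes to errChan,
			// the goroutine exits instead of leaking.
			log.Printf("timed out waiting for gossiper reply")
		case <-quit:
			// Peer shutdown.
		}
	}()

	<-done // Here the timeout branch fires after two seconds.
}
```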
Removed the goroutine.
Running this on my node, memory usage and goroutine counts are stable now, so I think that approach is good to go.
Force-pushed from 6692785 to de88a18.
Note that this bug was introduced in #9875, which added a goroutine to log the error. I think we should just remove the goroutine instead, as it only logs the error but does not process it. Making a dramatic change to the lru tool we use, plus another timeout mechanism, is overkill IMO.
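A compact sketch of the suggested simplification, with a stand-in for the gossiper call (the real lnd signature differs): the "before" shape spawns a reader goroutine that only logs, the "after" shape simply does not wait.

```go
package main

import "log"

// processRemoteAnnouncement stands in for the gossiper API; the real
// signature in lnd differs, this is only the shape of the fix.
func processRemoteAnnouncement(msg any) chan error {
	errChan := make(chan error, 1)
	// ... message is queued for processing; errChan may never be
	// written to if the update stays premature.
	return errChan
}

func handleMsg(msg any) {
	// Before (roughly the shape added in #9875): a reader goroutine
	// that only logs, and leaks when no reply ever arrives.
	//
	//   errChan := processRemoteAnnouncement(msg)
	//   go func() {
	//       if err := <-errChan; err != nil {
	//           log.Printf("unable to process msg %T: %v", msg, err)
	//       }
	//   }()

	// After (the suggestion above): drop the reader entirely. The
	// gossiper already logs processing failures on its side.
	_ = processRemoteAnnouncement(msg)
}

func main() {
	handleMsg(struct{}{})
	log.Println("no reader goroutine spawned")
}
```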
peer/brontide.go (outdated diff context):
        "msg %T: %v", msg,
        err)
    }
Think we might as well remove this goroutine, as it does nothing but log the error?
As described below, yes, we can do that, but it would not solve the problem we have here; it would just hide it, and down the road, when we actually use the response, we would run into this issue again.
Force-pushed from de88a18 to 32af9d5.
The idea behind this change is to fix the issue that we never write into the error channel, which we assume should always happen; see also the comment here: Lines 3137 to 3139 in 1d2e547.
Moreover, it is just logging for now, but should be enhanced in the future; see also the TODO about potentially punishing the peer. So the goroutine did not really introduce the issue by itself but rather revealed a deeper problem, which we are trying to fix with this PR. I leave it to the majority, but I would like to see this change merged into LND.
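As an illustration of the contract referenced here (every message's error channel should eventually receive exactly one send), a hypothetical handler that uses a deferred send so no return path can forget to reply. None of these names come from lnd.

```go
package main

import (
	"errors"
	"fmt"
)

// handleAnnouncement sketches the contract: every message handed in with a
// reply channel gets exactly one send on it, on every code path.
func handleAnnouncement(known bool, errChan chan<- error) {
	// A buffered reply channel plus a single deferred send makes it
	// hard to forget a code path.
	var result error
	defer func() { errChan <- result }()

	if !known {
		// This is the kind of path that previously dropped the
		// reply, leaving any reader blocked forever.
		result = errors.New("premature update: channel unknown")
		return
	}

	// ... normal processing ...
}

func main() {
	errChan := make(chan error, 1)
	handleAnnouncement(false, errChan)
	fmt.Println(<-errChan)
}
```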
Force-pushed from 32af9d5 to ce879b7.
Posted a proposal and a question to verify this code doesn't introduce a deadlock.
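The deadlock question boils down to whether the producer side can block once nobody reads the reply channel anymore. A tiny sketch, assuming the gossiper hands back a channel with capacity 1 (an assumption for illustration, not verified here):

```go
package main

import "fmt"

func main() {
	// With a buffered channel of capacity 1 the send below completes
	// even though the value is never consumed.
	buffered := make(chan error, 1)
	buffered <- nil // does not block
	fmt.Println("buffered send completed without a reader")

	// With an unbuffered channel the same send would block forever, so
	// dropping the reader is only safe if the producer side uses a
	// buffered channel (or a non-blocking send).
	//
	//   unbuffered := make(chan error)
	//   unbuffered <- nil // would deadlock here
}
```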
Force-pushed from ce879b7 to b4fb92a.
Def - but I think the root problem is that we are firing goroutines without controlling their lifecycle, which is anti-Go: we should never start a goroutine without knowing how it will stop. In addition, if we are just looking for a temporary solution here, I don't see why we can't just remove it, since we want a more comprehensive fix via the actor model anyway.
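For context on the "never start a goroutine without knowing how it will stop" point, a generic Go pattern (not lnd code) that ties every goroutine to a quit signal and a WaitGroup so shutdown can prove nothing is left running:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// worker runs until the shared quit channel is closed; the WaitGroup lets
// the caller wait for all workers to have actually returned.
func worker(id int, quit <-chan struct{}, wg *sync.WaitGroup) {
	defer wg.Done()
	for {
		select {
		case <-time.After(50 * time.Millisecond):
			fmt.Printf("worker %d tick\n", id)
		case <-quit:
			fmt.Printf("worker %d stopped\n", id)
			return
		}
	}
}

func main() {
	var wg sync.WaitGroup
	quit := make(chan struct{})

	for i := 0; i < 3; i++ {
		wg.Add(1)
		go worker(i, quit, &wg)
	}

	time.Sleep(120 * time.Millisecond)
	close(quit)
	wg.Wait() // Guaranteed: no worker goroutine outlives this point.
}
```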
Force-pushed from b7c81d7 to fc0dcfb.
OK, I think you have a valid point. Removed the goroutine when processing network responses and also fixed a potential race condition.
discovery/gossiper.go (outdated diff context):
    // premature ChannelUpdates. These are pointers, and we might in the
    // meantime receive a new premature ChannelUpdate for this exact channel
    // which will also read the same premature ChannelUpdates currently in
    // the LRU cache.
Hmmm... I think the cache uses a sync.Map under the hood?
OK, I think you are right: because we only read the *cachedNetworkMsg value, we are concurrency safe.
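A small sketch of why read-only access to the shared pointer is safe, assuming the cache behaves like a sync.Map (as suggested above). The type and field are illustrative; mutating the pointed-to struct from these goroutines would be a data race, reading it is not.

```go
package main

import (
	"fmt"
	"sync"
)

type cachedNetworkMsg struct {
	// Illustrative field; the real struct in lnd holds the queued
	// premature messages.
	numMsgs int
}

func main() {
	var cache sync.Map
	cache.Store(uint64(42), &cachedNetworkMsg{numMsgs: 3})

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			v, ok := cache.Load(uint64(42))
			if !ok {
				return
			}
			// Concurrent reads of the shared pointer are fine as
			// long as nobody mutates the struct it points to;
			// writing cmsg.numMsgs here would be a data race.
			cmsg := v.(*cachedNetworkMsg)
			fmt.Println("cached msgs:", cmsg.numMsgs)
		}()
	}
	wg.Wait()
}
```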
Force-pushed from 7959ccf to 024ae3a.
/gemini review
Code Review
The code changes prevent a goroutine leak in the brontide package by removing the waiting goroutine and adding comments to explain the change. The review suggests improvements to the comments for better context.
LGTM 🧆
Force-pushed from 024ae3a to 699c097.
We currently cannot rely on a response, so we avoid spawning goroutines. This is just a temporary fix to avoid the goroutine leak.
Force-pushed from 699c097 to e6aff21.
LGTM 🚢
This makes sure that goroutines do not pile up when premature channel updates are received that are never processed: they are deleted from the premature-update LRU cache once the maximum limit of 100 premature updates is reached, at which point old channel update messages get evicted.
Depends on: lightninglabs/neutrino#322