
Conversation

Roasbeef (Member)

I noticed in #10128 that at times, we'll hold the cache mutex the entire time a DB transaction is active. This effectively makes all access to the graph DB fully serial. As seen in #10245, if we hold this mutex for too long while a transaction is active and network activity is ongoing, then we'll end up blocking other goroutines unnecessarily.

In this PR, to resolve that, we switch to a fine-grained locking methodology. Before this commit, we'd grab the lock up front and release it with a defer, while typically only accessing the cache at the very end of the function. Now we'll grab the mutex only when we need it.
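
A minimal sketch of the new shape (chanCache and dbUpdate here are hypothetical stand-ins, not the actual types touched by this PR):

import "sync"

type chanCache struct {
    mu    sync.RWMutex
    chans map[uint64]struct{}
}

// The heavyweight DB transaction runs without the cache mutex held; the
// mutex is taken only for the brief purge at the end, and only when there
// is actual work to do.
func (c *chanCache) pruneFine(dbUpdate func() ([]uint64, error)) error {
    closed, err := dbUpdate() // no lock held during DB work
    if err != nil {
        return err
    }
    if len(closed) == 0 {
        return nil
    }

    c.mu.Lock()
    for _, id := range closed {
        delete(c.chans, id)
    }
    c.mu.Unlock()

    return nil
}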

We also add a randomized concurrent test using rapid to attempt to catch any rare deadlocks that this set of changes may introduce.
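
A rough sketch of what such a test can look like with rapid, using a trivial stand-in cache rather than the real ChannelGraph API (stubCache and its methods are hypothetical):

import (
    "fmt"
    "sync"
    "testing"

    "pgregory.net/rapid"
)

type stubCache struct {
    mu sync.RWMutex
    m  map[uint64]struct{}
}

func (c *stubCache) add(id uint64)    { c.mu.Lock(); c.m[id] = struct{}{}; c.mu.Unlock() }
func (c *stubCache) remove(id uint64) { c.mu.Lock(); delete(c.m, id); c.mu.Unlock() }

func (c *stubCache) has(id uint64) bool {
    c.mu.RLock()
    defer c.mu.RUnlock()
    _, ok := c.m[id]
    return ok
}

func TestCacheConcurrency(t *testing.T) {
    if testing.Short() {
        t.Skip("skipping long-running property test in -short mode")
    }

    rapid.Check(t, func(rt *rapid.T) {
        c := &stubCache{m: make(map[uint64]struct{})}

        const workers = 4
        var wg sync.WaitGroup
        for w := 0; w < workers; w++ {
            // Draw all random data on the main goroutine: rapid's
            // Draw must not be called from spawned goroutines.
            ids := rapid.SliceOfN(rapid.Uint64Range(0, 64), 8, 32).
                Draw(rt, fmt.Sprintf("ids-%d", w))

            wg.Add(1)
            go func(ids []uint64) {
                defer wg.Done()
                for _, id := range ids {
                    c.add(id)
                    _ = c.has(id)
                    c.remove(id)
                }
            }(ids)
        }

        // Run with -race; a deadlock surfaces as a test timeout.
        wg.Wait()
    })
}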

One open question is: are there any cache consistency violations that may arise as a result of this PR that we need to worry about?

Summary of Changes

Hello @Roasbeef, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the locking mechanism within the graph database cache to enhance concurrency and overall system responsiveness. Previously, a broad mutex lock could serialize access to the graph DB, leading to performance bottlenecks. The updated approach employs more granular locking, ensuring the mutex is held only for the brief periods required for cache operations. This change is validated by a new property-based test designed to expose concurrency issues.

Highlights

  • Fine-grained Locking: The graph DB cache now uses fine-grained locking, acquiring the cache mutex only when directly interacting with the cache, rather than holding it for the entire duration of a database transaction. This change aims to improve concurrency and prevent serialization of graph DB access.
  • Improved Concurrency: By reducing the scope of the cache mutex, the system can avoid unnecessary blocking of goroutines, especially during network activity, addressing performance issues observed in related pull requests.
  • New Concurrency Test: A new randomized concurrent test has been added using the rapid framework. This test rigorously verifies the cache mutex behavior under contention by simulating various concurrent operations on the ChannelGraph's public API, helping to catch potential deadlocks or race conditions introduced by the locking changes.

gemini-code-assist bot left a comment

Code Review

This pull request refactors the locking strategy for the graph database cache, moving from a coarse-grained mutex held for the duration of database transactions to a more fine-grained approach where locks are acquired only when accessing the cache. This is a great improvement for performance, as it significantly reduces lock contention and prevents blocking other goroutines. The addition of a rapid-based concurrent property test is also excellent for increasing confidence in the correctness of these concurrent changes.

My main concern, which you've also raised, is the potential for cache consistency violations. By releasing the lock between the database transaction and the cache update, a window is created where the cache can be out of sync with the database. I've added a detailed comment on this. Overall, this is a solid PR that addresses an important performance issue.

Comment on lines +1537 to 1546 of graph/db/kv_store.go
if len(chansClosed) > 0 {
    c.cacheMu.Lock()
    for _, channel := range chansClosed {
        c.rejectCache.remove(channel.ChannelID)
        c.chanCache.remove(channel.ChannelID)
    }
    c.cacheMu.Unlock()
}

Severity: high

While this fine-grained locking approach improves performance by reducing lock contention, it introduces a time window where the cache can be inconsistent with the database.

Specifically, between the kvdb.Update call completing and the cache being updated here, other goroutines can read stale data from the cache. For example, another thread could read a channel from chanCache that has just been pruned from the database, leading pathfinding algorithms to construct routes through channels that are already closed.

While routing failures are generally recoverable, this inconsistency might be undesirable. If strong consistency is a requirement, an alternative would be to hold the write lock across both the database transaction and the cache update, which is what the previous implementation did. Given the performance goals of this PR, this trade-off might be acceptable, but it's a critical point to consider and document.
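
For illustration, the strong-consistency alternative (roughly the previous behavior) would look like the following sketch; cache and updateDB are hypothetical stand-ins, not the actual lnd code:

import "sync"

type cache struct {
    mu sync.Mutex
    m  map[uint64]struct{}
}

// Holding the mutex across both steps makes the DB commit and the cache
// purge atomic from a reader's point of view, at the cost of serializing
// all cache access for the full duration of the transaction.
func pruneStrict(c *cache, updateDB func() ([]uint64, error)) error {
    c.mu.Lock()
    defer c.mu.Unlock()

    closed, err := updateDB()
    if err != nil {
        return err
    }
    for _, id := range closed {
        delete(c.m, id)
    }

    return nil
}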

Comment on lines +2417 to 2429 (SQL store)
if len(closedChans) > 0 {
    s.cacheMu.Lock()
    for _, channel := range closedChans {
        s.rejectCache.remove(channel.ChannelID)
        s.chanCache.remove(channel.ChannelID)
    }
    s.cacheMu.Unlock()
}

Severity: high

Similar to the kv_store.go implementation, this change introduces a potential cache consistency issue. There's a window between the database transaction committing and the cache being updated where other goroutines could read stale data. Please see my more detailed comment on graph/db/kv_store.go.

In this commit, we switch the kv and SQL stores to only hold cacheMu while
touching the caches. This avoids long-held locks during heavyweight DB
work and lets readers use RLock when hitting the chan cache. The cache
purges and inserts now take the mutex only when there is actual work to
do.
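
A sketch of the reader path this enables (chanCache and entry are hypothetical stand-ins for lnd's actual types):

import "sync"

type entry struct{}

type chanCache struct {
    mu sync.RWMutex
    m  map[uint64]entry
}

// Readers take only the read lock, so concurrent lookups no longer
// serialize behind one another, and writers block readers only for the
// short cache mutation itself rather than a whole DB transaction.
func (c *chanCache) get(id uint64) (entry, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()

    e, ok := c.m[id]
    return e, ok
}
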
In this commit, we add a Rapid-powered property test that drives
ChannelGraph through its public API from multiple goroutines. The
workload mixes adds, pruning, deletes, horizon scans, and zombie
transitions to shake out deadlocks introduced by the cache mutex
changes.

In this commit, we make the cache concurrency property test track
inserted channel metadata so subsequent operations reuse real channel
IDs, outpoints, and block hashes. This increases the odds of hitting
cache mutation paths instead of falling back to no-op code paths. Also
skip the long-running property when tests are executed with -short.