lightning-liquidity: Introduce MessageQueueNotifierGuard type #3981

Open

tnull wants to merge 2 commits into main from 2025-08-message-queue-notifier

Conversation

tnull
Contributor

@tnull tnull commented Aug 1, 2025

Previously, when enqueuing new messages to the `MessageQueue`, we'd
directly notify the BP to handle the messages, potentially causing
multiple wake-ups in short succession and risking that we'd reenter
with crucial locks still held.

Here, we instead introduce a `MessageQueueNotifierGuard` type that
parallels our recently-introduced `EventQueueNotifierGuard`, buffers the
messages, and will only append them to the message queue and notify the
BP when dropped. This will allow us to remove a lot of error-prone
boilerplate in the next step.

Now that we have the `MessageQueueNotifierGuard`, we can be sure that we
always drop locks before notifying. Hence, we can drop *a lot* of the
error-prone boilerplate we previously used to ensure we'd only enqueue
after releasing locks.
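
For illustration, a minimal sketch of the pattern under discussion (the message type, method names, and `Notifier` are simplified stand-ins, not the actual `lightning-liquidity` API): callers enqueue into a guard-local buffer, and only when the guard is dropped are the messages published to the queue and the BP notified once.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Illustrative stand-in for the BP wake-up mechanism.
struct Notifier;

impl Notifier {
    fn notify(&self) {
        // Wake the background processor here.
    }
}

pub struct MessageQueue {
    queue: Mutex<VecDeque<String>>,
    pending_msgs_notifier: Notifier,
}

impl MessageQueue {
    // Hand out a guard that buffers messages instead of notifying immediately.
    pub fn notifier(&self) -> MessageQueueNotifierGuard<'_> {
        MessageQueueNotifierGuard { msg_queue: self, buffer: VecDeque::new() }
    }
}

pub struct MessageQueueNotifierGuard<'a> {
    msg_queue: &'a MessageQueue,
    buffer: VecDeque<String>,
}

impl<'a> MessageQueueNotifierGuard<'a> {
    // Enqueue locally; nothing is published and nobody is woken yet.
    pub fn enqueue(&mut self, msg: String) {
        self.buffer.push_back(msg);
    }
}

impl<'a> Drop for MessageQueueNotifierGuard<'a> {
    fn drop(&mut self) {
        if !self.buffer.is_empty() {
            // Append under the queue lock (a temporary guard, released at the
            // end of the statement), then notify exactly once.
            self.msg_queue.queue.lock().unwrap().append(&mut self.buffer);
            self.msg_queue.pending_msgs_notifier.notify();
        }
    }
}
```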

Had the start of this still lying around as a follow-up to #3509 / as part of the liquidity persistence branch. Think it's a nice clean-up to have both queues work the same way, if nothing else.

(cc @martinsaposnic)

@tnull tnull requested a review from TheBlueMatt August 1, 2025 09:39
@ldk-reviews-bot

ldk-reviews-bot commented Aug 1, 2025

🎉 This PR is now ready for review!
Please choose at least one reviewer by assigning them on the right bar.
If no reviewers are assigned within 10 minutes, I'll automatically assign one.
Once the first reviewer has submitted a review, a second will be assigned if required.


codecov bot commented Aug 1, 2025

Codecov Report

❌ Patch coverage is 77.37557% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.96%. Comparing base (61e5819) to head (309591a).

Files with missing lines                   Patch %   Lines
lightning-liquidity/src/lsps1/client.rs     0.00%    25 Missing ⚠️
lightning-liquidity/src/lsps2/service.rs    85.09%   24 Missing ⚠️
lightning-liquidity/src/lsps5/service.rs    88.88%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3981      +/-   ##
==========================================
+ Coverage   88.94%   88.96%   +0.01%     
==========================================
  Files         174      174              
  Lines      124201   124161      -40     
  Branches   124201   124161      -40     
==========================================
- Hits       110472   110460      -12     
+ Misses      11251    11226      -25     
+ Partials     2478     2475       -3     
Flag Coverage Δ
fuzzing 22.64% <12.12%> (+<0.01%) ⬆️
tests 88.79% <76.47%> (+0.01%) ⬆️


@tnull tnull marked this pull request as draft August 1, 2025 10:17
@tnull
Contributor Author

tnull commented Aug 1, 2025

Drafting this, as I'll add more changes here.

@tnull tnull force-pushed the 2025-08-message-queue-notifier branch from 399cd56 to fd5f90a Compare August 1, 2025 11:11
@tnull tnull marked this pull request as ready for review August 1, 2025 11:11
@tnull
Contributor Author

tnull commented Aug 1, 2025

> Drafting this, as I'll add more changes here.

Undrafting; added more cleanups and took a slightly different approach for the `MessageQueueNotifier` than originally planned.

@tnull tnull force-pushed the 2025-08-message-queue-notifier branch 2 times, most recently from 9efe41e to 686c921 Compare August 1, 2025 11:19
Contributor

@martinsaposnic martinsaposnic left a comment


a few small comments. otherwise LGTM

@tnull tnull force-pushed the 2025-08-message-queue-notifier branch from 686c921 to 3c13e5e Compare August 1, 2025 13:56
@martinsaposnic
Contributor

fixup 3cfffa8 looks good

nit: sometimes the notifier is instantiated before input validation, which can cause an unnecessary allocation. We could instantiate the guard only once it's certain a message will be queued (this also happens with `event_queue_notifier`)

feel free to ignore this comment since this has such a tiny impact, but still worth mentioning.

@tnull
Contributor Author

tnull commented Aug 1, 2025

> fixup 3cfffa8 looks good
>
> nit: sometimes the notifier is instantiated before input validation, which can cause an unnecessary allocation. We could instantiate the guard only once it's certain a message will be queued (this also happens with `event_queue_notifier`)
>
> feel free to ignore this comment since this has such a tiny impact, but still worth mentioning.

Actually, no, we need to make sure that the notifier is always created before the respective locks, as they'll be dropped in reverse order, which is why we acquire the guards first thing in the methods. I even considered requiring the lock to be handed in, but that is probably not easily done across implementations. I do however consider introducing a common PersistenceNotifierGuard that would hold both the event guard and the message queue guard for the upcoming persistence PR. We'll see. Apart from that, which allocation are you referring to exactly?
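
A minimal sketch of the drop-order argument (the types and names are illustrative, not the LDK API): because Rust drops locals in reverse declaration order, acquiring the notifier guard before any locks guarantees the locks are released before the guard's `Drop` impl notifies.

```rust
use std::sync::Mutex;

// Illustrative stand-in for MessageQueueNotifierGuard.
struct NotifierGuard;

impl Drop for NotifierGuard {
    fn drop(&mut self) {
        // Flush buffered messages and wake the background processor here.
        println!("notify (runs last)");
    }
}

fn handler(state: &Mutex<Vec<u8>>) {
    // Acquire the notifier guard first...
    let _notifier = NotifierGuard;
    // ...then take whatever locks the handler needs.
    let _lock = state.lock().unwrap();

    // Locals drop in reverse declaration order: `_lock` is released first and
    // `_notifier` drops last, so the notification never happens while the lock
    // is still held.
}
```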

@martinsaposnic
Contributor

> which allocation are you referring to exactly?

the notifier allocation itself. but what you are saying makes sense, we need to create it at the top. thanks for the clarification

@martinsaposnic
Contributor

> we need to make sure that the notifier is always created before the respective locks, as they'll be dropped in reverse order

this makes me nervous, as it's super easy to get the order wrong, and such a case would be difficult to review and test for.

not sure if there is a standard way of doing something like a debug_assert for lock ordering

@TheBlueMatt
Collaborator

The LDK mutexes have a bunch of logic to ensure lock order inversions are caught in testing. The lightning-liquidity crate uses those via the files symlinked in the sync directory. If any tests exercise two paths which have conflicting lockorders (even across different tests), they'll fail.
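
As a rough intuition for how such tracking can work (a toy sketch, not LDK's actual `debug_sync` code, which is considerably more involved): record every ordered pair of locks observed, and panic as soon as the reverse pair shows up, even if the two acquisitions happened in different tests of the same test binary.

```rust
use std::sync::Mutex;

// Global record of observed (outer, inner) lock acquisition orders.
static OBSERVED_ORDERS: Mutex<Vec<(&'static str, &'static str)>> = Mutex::new(Vec::new());

/// Call whenever `inner` is acquired while `outer` is already held.
fn record_lock_pair(outer: &'static str, inner: &'static str) {
    let mut orders = OBSERVED_ORDERS.lock().unwrap();
    if orders.contains(&(inner, outer)) {
        panic!("lock order inversion: saw both {outer} -> {inner} and {inner} -> {outer}");
    }
    if !orders.contains(&(outer, inner)) {
        orders.push((outer, inner));
    }
}
```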

Collaborator

@TheBlueMatt TheBlueMatt left a comment


please squash :)

@tnull tnull force-pushed the 2025-08-message-queue-notifier branch from 3c13e5e to edad14b Compare August 1, 2025 16:06
@tnull
Contributor Author

tnull commented Aug 1, 2025

> please squash :)

Squashed without further changes.

tnull added 2 commits August 1, 2025 18:16
@tnull tnull force-pushed the 2025-08-message-queue-notifier branch from edad14b to 309591a Compare August 1, 2025 16:16
@tnull
Contributor Author

tnull commented Aug 1, 2025

> please squash :)

Squashed without further changes.

Actually, force-pushed yet another minor fixup, ensuring we drop the queue lock before waking the BP:

```diff
diff --git a/lightning-liquidity/src/message_queue.rs b/lightning-liquidity/src/message_queue.rs
index fc6639b53..45b3c7f48 100644
--- a/lightning-liquidity/src/message_queue.rs
+++ b/lightning-liquidity/src/message_queue.rs
@@ -55,6 +55,5 @@ impl<'a> Drop for MessageQueueNotifierGuard<'a> {
 	fn drop(&mut self) {
 		if !self.buffer.is_empty() {
-			let mut queue_lock = self.msg_queue.queue.lock().unwrap();
-			queue_lock.append(&mut self.buffer);
+			self.msg_queue.queue.lock().unwrap().append(&mut self.buffer);
 			self.msg_queue.pending_msgs_notifier.notify();
 		}
```
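
The reason the one-liner matters: without a named binding, the temporary `MutexGuard` returned by `lock()` only lives until the end of that statement, so the queue lock is already released by the time `notify()` runs. A simplified illustration with std types (not the actual LDK code):

```rust
use std::sync::{Condvar, Mutex};

fn flush(queue: &Mutex<Vec<u8>>, buffer: &mut Vec<u8>, wake: &Condvar) {
    // The temporary `MutexGuard` returned by `lock()` is dropped at the end of
    // this statement, so the queue lock is released before the next line runs.
    queue.lock().unwrap().append(buffer);

    // By contrast, `let guard = queue.lock().unwrap();` would keep the lock
    // held until the end of the function, i.e. across the wake-up below.
    wake.notify_all();
}
```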

@martinsaposnic
Contributor

> The LDK mutexes have a bunch of logic to ensure lock order inversions are caught in testing. The lightning-liquidity crate uses those via the files symlinked in the sync directory. If any tests exercise two paths which have conflicting lockorders (even across different tests), they'll fail.

thanks for the explanation. I had no idea this existed.

I wanted to check how this works, so I changed the lock order in two LSPS5 methods and left the other one unchanged:

  • handle_set_webhook: webhooks.lock() -> pending_messages.notifier()
  • handle_list_webhook: webhooks.lock() -> pending_messages.notifier()
  • handle_remove_webhook: pending_messages.notifier() -> webhooks.lock()

these 3 methods have plenty of coverage

I ran the lightning-liquidity tests with no failures, then ran ./ci/ci-tests.sh (since I don't know exactly how these checks are run), also with no failures. Not sure if I'm missing something.

I can open an issue since this is maybe unrelated to this PR? @TheBlueMatt

@tnull
Contributor Author

tnull commented Aug 1, 2025


I think there is a bit of confusion here: we have checks for actual double-locks etc. in debug_sync. However, the worst that (by now / since #3509) could happen in lightning-liquidity is that you wake the background processor and it tries to acquire locks that are still held. That could end up being pretty inefficient in the worst case, but given that the background processor is driven by another thread/task anyways, I don't think we could hit any actual deadlocks (anymore; pre-#3509 there was indeed some potential, and we had to be super careful to drop all locks before making the callback to PeerManager::process_pending_msgs).

That said, there will be another layer of complexity added with persistence, which will likely be mostly driven by the background processor as well.
