Skip to content

Conversation

skyzh
Copy link
Member

@skyzh skyzh commented Jul 30, 2025

Problem

Part of LKB-2634

The main motivation of this patch is to make the lease LSN requests in parallel. On that, it seems like a refactor to make it running in async is a good choice.

Summary of changes

  • Issue lease_lsn requests in parallel.
  • ..which is done by futures::stream::buffered_unordered.
  • revert the test case test_readonly_node_gc to issue lease requests only from the compute

Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh skyzh force-pushed the skyzh/lease-lsn-tokio branch from 15732e8 to 81d0f5d Compare July 30, 2025 15:48
@skyzh skyzh marked this pull request as ready for review July 30, 2025 16:19
@skyzh skyzh requested a review from a team as a code owner July 30, 2025 16:19
@skyzh skyzh requested review from ololobus, myrrc and erikgrinaker July 30, 2025 16:19
Copy link

github-actions bot commented Jul 30, 2025

9075 tests run: 8402 passed, 0 failed, 673 skipped (full report)


Flaky tests (1)

Postgres 17

Code coverage* (full report)

  • functions: 34.7% (8840 of 25457 functions)
  • lines: 45.8% (71633 of 156461 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
5622bf0 at 2025-07-30T20:02:06.957Z :recycle:

@skyzh skyzh marked this pull request as draft July 30, 2025 18:35
@skyzh
Copy link
Member Author

skyzh commented Jul 30, 2025

need to asyncify wait_timeout_while_pageserver_connstr_unchanged, converting to draft

@skyzh skyzh force-pushed the skyzh/lease-lsn-tokio branch from ffd12aa to 2a4feb6 Compare July 30, 2025 18:51
Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh skyzh force-pushed the skyzh/lease-lsn-tokio branch from 2a4feb6 to 5622bf0 Compare July 30, 2025 18:58
@skyzh skyzh marked this pull request as ready for review July 30, 2025 20:40
@skyzh
Copy link
Member Author

skyzh commented Jul 30, 2025

cc @alexanderlaw where can I trigger 100 runs of a specific test? I'd like to see if the previous flakyness can be fixed without changing the test case. Thanks!

@alexanderlaw
Copy link
Contributor

cc @alexanderlaw where can I trigger 100 runs of a specific test? I'd like to see if the previous flakyness can be fixed without changing the test case. Thanks!

Please try something like this:

gh workflow run 'Build and Run Selected Test' --ref skyzh/stable-test-readonly-node-gc -f test-selection='test_readonly_node_gc' -f run-count='30' -f  archs='["x64", "arm64"]' -f build-types='["release"]' -f pg-versions='[{"pg_version":"v14"}, {"pg_version":"v15"}, {"pg_version":"v17"}]'

}
}

/// Acquires lsn lease in a retry loop. Returns the expiration time if a lease is granted.
/// Returns an error if a lease is explicitly not granted. Otherwise, we keep sending requests.
fn acquire_lsn_lease_with_retry(
async fn acquire_lsn_lease_with_retry(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs spawn_blocking for wait_timeout_while_pageserver_connstr_unchanged too.

}
}

Ok(leases.into_iter().min().flatten())
let mut stream = futures::stream::iter(jobs).buffer_unordered(MAX_CONCURRENT_LEASE_CONNECTIONS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to just use an unbounded FuturedUnordered here. If the tenant has many shards, they are mostly likely running on many Pageservers, so the concurrency limiting doesn't really do anything for the server. For the client side, we don't care about opening O(16) concurrent connections.

@skyzh skyzh closed this Aug 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants