Skip to content

Conversation

erikgrinaker
Copy link
Contributor

@erikgrinaker erikgrinaker commented May 23, 2025

Problem

The Pageserver gRPC server should support TLS.

Requires #11972.
Touches #11728.

Summary of changes

Add a listen_grpc_tls setting to enable gRPC TLS support, using the Pageserver's standard SSL certificate and key with automatic reloading.

When enabled, inbound TLS connections are transparently decrypted before they're passed on to the Tonic server.

@erikgrinaker erikgrinaker requested a review from a team as a code owner May 23, 2025 13:39
@erikgrinaker erikgrinaker mentioned this pull request May 23, 2025
29 tasks
@erikgrinaker erikgrinaker changed the title pageserver: enable TLS for gRPC server pageserver: add TLS support for gRPC server May 23, 2025
Copy link

github-actions bot commented May 23, 2025

8492 tests run: 7944 passed, 0 failed, 548 skipped (full report)


Flaky tests (1)

Postgres 17

Code coverage* (full report)

  • functions: 32.4% (9050 of 27907 functions)
  • lines: 48.7% (79445 of 163243 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
472031e at 2025-05-26T10:40:34.324Z :recycle:

match result {
Ok(tcp_conn) => yield tls_acceptor
.accept(tcp_conn)
.await
Copy link
Member

@DimasKovas DimasKovas May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not recommended to implement TLS like that

The current flow of the execution will be:

TCP accept (1) -> TLS handshake (1) -> TCP accept (2) -> TLS handshake (2) -> ...

The problem is that it takes some time to perform TLS handshake (RTT), and during TLS handshake you can't accept other connection. Even worse, if the client hangs during TLS handshake, you won't be able to accept other connections till the TLS handshake is timed out.

The recommended way is to run a loop which only accepts TCP connections and then spawn a new tokio task to perform TLS handshake and execute the request.

That's what the example of tonic_rustls does:
https://github.com/hyperium/tonic/blob/689a86dbad107fa721714bb02e7f8e39264a377d/examples/src/tls_rustls/server.rs#L68

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof, great catch -- but really annoying. I'd like to avoid having to re-wire the entire server.

It would be possible to spawn handshake tasks and still plumb them into serve_with_incoming as a stream. But spawning two successive tasks might be too expensive.

Let me do a third take on Monday.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But spawning two successive tasks might be too expensive.

My wording was not 100% correct. The idea was to execute handshakes/TCP accepts concurrently, and spawning a tokio task is one possible option. I guess just handling it in the current tokio task via tokio::select! should be fine too, I don't see major pitfalls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that'd also work.

I'm going to focus on implementing the actual page service first, and revisit this in a bit, as it's not on the critical path.

Base automatically changed from erik/pageserver-grpc to main May 26, 2025 08:36
@erikgrinaker erikgrinaker force-pushed the erik/pageserver-grpc-tls branch from 09c79fc to 472031e Compare May 26, 2025 09:38
@erikgrinaker erikgrinaker marked this pull request as draft May 26, 2025 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants