Conversation

@avikivity (Member) commented Oct 29, 2025

Proxy protocol v2 [1] is a protocol used by haproxy to forward the original
client information to a server behind a reverse proxy. It is also employed
by AWS Network Load Balancers for the same purpose [2].

The "port" connection load balancing algorithm, which uses the client's source
port number, cannot be used behind a reverse proxy, because the proxy connects
with its own source port, hiding the port number the client generated.

We introduce a new listen_options member that declares that connections to
the server_socket use proxy protocol v2. The remote_address() and local_address()
members of connected_socket will then return the original connection's addresses,
and the "port" load balancing algorithm will use the original connection's
source port, restoring correct shard routing.

This server socket will only accept connections that have the protocol header
prepended to the connection data.

The httpd server is extended with a /shard endpoint (so we can observe which
shard a connection landed on) and a --proxy-protocol switch, so the new
behavior can be tested. It is easy to test with the /shard endpoint and curl's
--local-port option, with and without haproxy.

[1] https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
[2] https://aws.amazon.com/blogs/networking-and-content-delivery/preserving-client-ip-address-with-proxy-protocol-v2-and-network-load-balancer/

@avikivity (Member Author)

TODO: propagate the addresses to connected_socket's accessors, unit test

Copilot AI left a comment

Pull Request Overview

This PR adds support for parsing the PROXY protocol v2 header in Seastar's POSIX networking stack, introducing a new load balancing algorithm that uses connection information from the proxy protocol header.

Key changes:

  • Implements PROXY protocol v2 header parsing for IPv4 and IPv6 connections
  • Adds proxy_protocol_v2_and_port load balancing algorithm that extracts remote address from the header
  • Updates the httpd demo application to support configurable load balancing algorithms

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
src/net/posix-stack.cc Implements proxy protocol v2 header parsing and integrates it into the accept loop with new load balancing logic
include/seastar/net/posix-stack.hh Adds move assignment operators to conntrack::handle class
include/seastar/net/api.hh Adds new proxy_protocol_v2_and_port load balancing algorithm enum value
apps/httpd/main.cc Adds command-line option for selecting load balancing algorithm and a new /shard endpoint


}
} ();
auto& proxy_protocol_header = *proxy_protocol_header_opt;
auto& psa = proxy_protocol_header.remote_address;
Copilot AI commented Oct 29, 2025

The code unconditionally calls as_posix_sockaddr_in() on the remote address, which assumes IPv4. This will fail when the proxy protocol header contains an IPv6 address (fam == 0x2). The code should check the address family and use as_posix_sockaddr_in6().sin6_port for IPv6 addresses, similar to how it's done on line 614.

Suggested change:

    auto& psa = proxy_protocol_header.remote_address;
    uint16_t port = 0;
    if (psa.family() == AF_INET) {
        port = ntoh(psa.as_posix_sockaddr_in().sin_port);
    } else if (psa.family() == AF_INET6) {
        port = ntoh(psa.as_posix_sockaddr_in6().sin6_port);
    }
    cth = _conntrack.get_handle(port % smp::count);

case server_socket::load_balancing_algorithm::connection_distribution:
cth = _conntrack.get_handle();
break;
case server_socket::load_balancing_algorithm::port:
Copilot AI commented Oct 29, 2025

The code unconditionally calls as_posix_sockaddr_in() which assumes IPv4. This will fail for IPv6 connections where sa.family() is AF_INET6. The code should check the address family and use as_posix_sockaddr_in6().sin6_port for IPv6 addresses.

Suggested change (braces and break added so the declaration inside the case label is well-formed):

    case server_socket::load_balancing_algorithm::port: {
        uint16_t port = 0;
        if (sa.family() == AF_INET) {
            port = ntohs(sa.as_posix_sockaddr_in().sin_port);
        } else if (sa.family() == AF_INET6) {
            port = ntohs(sa.as_posix_sockaddr_in6().sin6_port);
        }
        cth = _conntrack.get_handle(port % smp::count);
        break;
    }

@avikivity (Member Author)

v2: fix build in module mode by adding missing include

if (fam_proto != 0x00) { // UNSPEC
co_return std::nullopt;
}
co_return local_proxy_protocol_v2_header(fd);
Contributor

> The 15th and 16th bytes is the address length in bytes in network endian order.
> It is used so that the receiver knows how many address bytes to skip even when
> it does not implement the presented protocol. Thus the length of the protocol
> header in bytes is always exactly 16 + this value. When a sender presents a
> LOCAL connection, it should not present any address so it sets this field to
> zero. Receivers MUST always consider this field to skip the appropriate number
> of bytes and must not assume zero is presented for LOCAL connections. When a
> receiver accepts an incoming connection showing an UNSPEC address family or
> protocol, it may or may not decide to log the address information if present.

Looks like we need to read data below anyway.

Member Author

Right, I read this part but managed to misunderstand it.

@avikivity (Member Author)

v3: really fix the module build. I believe our module build is completely broken.

@avikivity avikivity marked this pull request as draft October 30, 2025 21:00
@avikivity (Member Author)

v4:

  • read second part of header even for LOCAL command
  • return addresses carried by header from connected_socket::local_address() and connected_socket::remote_address()

@avikivity (Member Author)

After implementing connected_socket::local_address and connected_socket::remote_address, I think the feature should be enabled using a new bool listen_options::proxy_protocol_v2. This is because the feature doesn't just affect load balancing, it also affects local/remote address reporting. It's reasonable to use other load balancing options with proxy protocol v2.

So we need a new bool in listen_options, and change the meaning of load_balancing_algorithm::port to use the "real" remote port when proxy protocol is enabled.

@gleb-cloudius (Contributor)

> After implementing connected_socket::local_address and connected_socket::remote_address, I think the feature should be enabled using a new bool listen_options::proxy_protocol_v2. This is because the feature doesn't just affect load balancing, it also affects local/remote address reporting. It's reasonable to use other load balancing options with proxy protocol v2.
>
> So we need a new bool in listen_options, and change the meaning of load_balancing_algorithm::port to use the "real" remote port when proxy protocol is enabled.

Well, I proposed to implement it as new type of socket similar to tls.

@avikivity
Copy link
Member Author

> After implementing connected_socket::local_address and connected_socket::remote_address, I think the feature should be enabled using a new bool listen_options::proxy_protocol_v2. This is because the feature doesn't just affect load balancing, it also affects local/remote address reporting. It's reasonable to use other load balancing options with proxy protocol v2.
> So we need a new bool in listen_options, and change the meaning of load_balancing_algorithm::port to use the "real" remote port when proxy protocol is enabled.
>
> Well, I proposed to implement it as new type of socket similar to tls.

Yes, it's very similar.

@avikivity (Member Author) commented Nov 8, 2025

v5:

  • rebase past 0353ec3
  • engaged via listen_options::proxy_protocol_v2 rather than load_balancing_algorithm::proxy_protocol_v2
    (because it also affects local_address() and remote_address())

Still todo:

  • more tests

@pgellert (Contributor) left a comment

We're looking to add proxy protocol support (v1 and v2) in Redpanda, so I wanted to chime in with a few comments from that perspective. Hope that's okay.

default:
co_return std::nullopt;
}
co_return header;
Contributor

The RFC talks about chaining multiple layers in section "4.1. Multiple layers". However it is not clear to me from the spec if multiple layers of proxies would lead to the server receiving a response with: (a) a single header with the IPs populated with the original client's IPs, or (b) multiple headers with each proxy adding its own proxying information.

This blog post makes me think that the latter is possible: Two hops TCP proxies resulting in two PROXY protocol header.

Do you know what the expected behaviour is here or have you had a chance to test with multiple layers of proxy protocols?

Member Author

To me it's more sensible for intermediate proxies to act as proxy protocol consumers (strip incoming headers) and producers (generate outgoing headers), populating the outgoing headers with addresses from the incoming headers.

It's also possible to chain the headers, but then the end consumer would need to know the number of layers to unstack.

I propose deferring this question until a real-life configuration is found.

Contributor

> I propose deferring this question until a real-life configuration is found.

That makes sense to me.

Comment on lines 422 to 427
// The connection is encapsulated with proxy protocol v2 (which is just
// a header prepended to the data stream). connected_socket::remote_address()
// and connected_socket::local_address() will return the addresses
// as specified in the proxy protocol header. load_balancing_algorithm::port
// will use the port from the proxy protocol header.
bool proxy_protocol_v2 = false;
Contributor

GCP only supports proxy protocol v1 at this point (ref) as far as I am aware. Even if seastar doesn't support proxy protocol v1 from the outset, it would be great if the implementation could be made generic to be easily extendable to proxy protocol v1.

Based on that I think it would make sense to make this config a more generic std::optional<proxy_protocol_config> proxy_protocol_config; that has a bool supports_v2 field or perhaps a vector of supported versions.


Member Author

I can rename it to proxy_protocol and comment that only v2 is supported. If someone adds v1 support, that is a compatible change. If someone uses it with v1 despite the comment that it's not supported, they'll learn to read comments (I know I won't).

Contributor

> GCP Private Service Connect supports v2 (ref - https://docs.cloud.google.com/vpc/docs/about-vpc-hosted-services#proxy-protocol )

Thank you, I hadn't come across this earlier. However, this is only for Private Service Connect, and GCP NLBs unfortunately still use v1.

Contributor

> I can rename it to proxy_protocol and comment that only v2 is supported. If someone adds v1 support, that is a compatible change. If someone uses it with v1 despite the comment that it's not supported, they'll learn to read comments (I know I won't).

Not making specific versions of proxy protocol configurable, but having a single on/off config sounds good to me.

Another reason why I was thinking about a more generic config struct, however, is because I was also considering whether we are going to need an allowlist of IPs to accept proxy protocol headers from (i.e. a configurable IP list, that is used to validate the TCP connection's IP before accepting a connection with the proxy protocol header).

This is something that the spec calls for to prevent IP spoofing:

> The receiver SHOULD ensure proper access filtering so that only trusted proxies are allowed to use this protocol.

And other implementations I could find seem to support such IP allow lists as a configuration of the proxy protocol consumer layer:

What do you think about supporting IP whitelisting in the proxy protocol layer? I guess people could (and probably should) just use iptables to restrict traffic to the proxy protocol port, but I am wondering if there is a reason why other implementations support it as an explicit proxy protocol-layer config as well. I think it would certainly make configuring proxy protocol simpler in Kubernetes deployments.

Member Author

I think in modern cloud environments it is easy to whitelist addresses using the software defined networking layer. Since there's a dedicated port for proxy protocol connections, it's easy to dedicate a firewall rule for it.

To support in-app whitelisting, we have two options:

  1. add a vector of IP addresses to whitelist against
  2. expose the real remote_address (and local_address) in addition to the proxied ones, and let the application conduct the whitelisting

I think 2 is better. First, we expose information that is generically needed as a side effect. Second, whitelisting can easily become complicated (subnets, dynamic changes, blah blah). Third, the Seastar socket layer is supposed to mimic the posix API and not add random stuff. This series is an exception, but it has an excuse - we need to process the proxy packet before fully establishing the connected_socket, since we may need to migrate it based on the contents of the packet.

app.add_options()("port", bpo::value<uint16_t>()->default_value(10000), "HTTP Server port");
app.add_options()("load-balancing-algorithm",
bpo::value<std::string>()->default_value("connection_distribution"),
"Load balancing algorithm: connection_distribution, port, proxy_protocol_v2_and_port, fixed");
Contributor

I think you backed out of the separate load balancing algo earlier:

Suggested change
"Load balancing algorithm: connection_distribution, port, proxy_protocol_v2_and_port, fixed");
"Load balancing algorithm: connection_distribution, port, fixed");

Member Author

Yes, bad rebase, thanks.

@avikivity (Member Author)

> We're looking to add proxy protocol support (v1 and v2) in Redpanda,

What's your use case for v1?

> so I wanted to chime in with a few comments from that perspective. Hope that's okay.

That's why there is a comment box here.

@avikivity (Member Author)

> We're looking to add proxy protocol support (v1 and v2) in Redpanda,
>
> What's your use case for v1?

Ah, I guess for GCP, but it does support v2.

v1 is outdated and I don't see a modern use case for it.

@pgellert (Contributor)

> We're looking to add proxy protocol support (v1 and v2) in Redpanda,
>
> What's your use case for v1?
>
> Ah, I guess for GCP, but it does support v2.
>
> v1 is outdated and I don't see a modern use case for it.

Yes, the use case would be GCP NLBs. In GCP, only Private Service Connect supports v2 as per the docs, whereas NLBs still only support v1.

@avikivity (Member Author)

> We're looking to add proxy protocol support (v1 and v2) in Redpanda,
>
> What's your use case for v1?
>
> Ah, I guess for GCP, but it does support v2.
> v1 is outdated and I don't see a modern use case for it.
>
> Yes, the use case would be GCP NLBs. In GCP, only Private Service Connect supports v2 as per the docs, whereas NLBs still only support v1.

I see. Pity. Do send a patch once this is merged (or even before, and I can incorporate it into the series).

@avikivity avikivity changed the title net: support proxy protocol v2 for load balancing net: support proxy protocol v2 Nov 12, 2025
@avikivity (Member Author)

v6:

  • rename configuration to proxy_protocol to leave open the implementation of v1
  • fix list of load balancing algorithms in httpd command line help
  • positive and negative tests for the protocol itself
  • rearranged the patches a little

@avikivity avikivity marked this pull request as ready for review November 12, 2025 20:30
Comment on lines +327 to +339
virtual socket_address local_address() const noexcept override {
return _proxy_local_addr;
}
virtual socket_address remote_address() const noexcept override {
return _proxy_remote_addr;
}
Contributor

Overwriting the remote_address makes sense to me, as that allows applications to treat the connection as if it originated from the remote client.

However, I would expect the local_address() to continue to refer to the server's socket address and not the proxy's socket address even if the proxy protocol is enabled. I think it would be more in-line with the meaning of local_address() if it was left as _ops->local_address(_fd.get_file_desc()) and to report the _proxy_remote_addr through a new proxy_address() method instead.

For example, in redpanda we use the local address for audit logging, reporting which interface the connection came in through, and we would want to continue to report the server's socket address there, regardless of whether the proxy protocol is enabled or not.

Alternatively, we could expose the server's socket address through a new method, but I think it would be better to leave the local_address() unchanged and expose the proxy address as a new method instead.

What do you think?

Member Author

Arguments can be made both ways. The proxy's address is how the server is advertised to the user, so if you log it (for example, in scylladb we have a system.clients[1] table that is visible to the user), then the user can match the address to something known to them.

Eventually, we'll need to expose both address types. But note the real local address is already known to the server, who bound to it in the first place.

[1] (but system.clients only logs the remote address, not the local address)

Member Author

For example, if you have a cluster of haproxies, and you suspect one of malfunctioning, having the proxy bind address tells you which one to look at.

Contributor

> Eventually, we'll need to expose both address types. But note the real local address is already known to the server, who bound to it in the first place.

This makes sense, and agreed that this is sufficient for my side.

Would the preferences for the behaviour of local_address() change if we supported decoding more than one layer of proxy protocol headers? For example consider a complex network layout such as:

customer network
      |
  VPC peering proxy
     / \
    /   \
   |   NLB [1]
    \   /
    server

I guess even in this case, exposing the local address as the vpc peering proxy's socket is still useful for matching the client-side view. But then the "local address" is really the first proxy's address, which may be outside of the control of the server operator.

I am happy either way because, as you said, both socket addresses are available to the server, so the server can choose which one to expose based on the use case. But I just wanted to think through and understand the trade offs with you.

[1] we may point the customer-facing domain name to an NLB to load balance initial connections (bootstrapping, in Kafka terms). After that, clients connect directly to the individual brokers by IP or broker-specific domain name.

Member Author

Perhaps the API for exposing the "real" local/remote addresses can return a vector describing the path. The no-proxy case would return a vector of length 1.

I really hope we don't have to come to this. This kind of network engineering belongs in the network, not the application, though I understand why it came to be. For ScyllaDB, we'd be much better off with routeable IPv6 subnets, since the service structure is exposed to the application, unlike web servers.

Contributor

I think we should also have an API for the "real" (not known to the client) local address, but we don't need to do this now. What we should probably do now is to add a comment explaining that local_address() may be different from what you expect if the proxy protocol is used.

Comment on lines 802 to 816
auto client = seastar::async([&listen_addr, header = std::move(header)] {
auto s = connect(listen_addr).get();
auto out = s.output();
out.write(header.data(), header.size()).get();
out.flush().get();
out.close().get();
});

when_all(std::move(server), std::move(client)).discard_result().get();
ss.abort_accept();

// With LOCAL mode, addresses should be real connection addresses, not from header
// We can't check exact match since the client port is ephemeral, but we can verify
// that the addresses are not the fake proxy addresses and the IPs match
BOOST_REQUIRE_EQUAL(actual_local, socket_address(listen_addr));
Contributor

Is it worth strengthening this check to also assert here that BOOST_REQUIRE_EQUAL(actual_remote, actual_client_local) where actual_client_local = s.local_address() of the client's connected socket?

Member Author

Good idea, done.

@avikivity (Member Author)

v7:

  • strengthened LOCAL_MODE test to check the client's local address too (server's remote)
  • not done: advertise real server's local address in connected_socket::local_address(), kept the proxy's bind address instead

@avikivity (Member Author)

v8:

  • rebased past conflicts

@pgellert (Contributor) left a comment

Looks good from my side

@avikivity (Member Author)

@nyh please review

@avikivity avikivity requested a review from nyh November 20, 2025 13:41
@avikivity (Member Author)

@nyh review ping

@nyh (Contributor) commented Nov 26, 2025

@avikivity I'm reviewing now, which means I needed to start by reading the protocol definition (https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt) which I was not aware of until today.

Can you please clarify, when you're saying that you want to support "proxy protocol v2" do you mean just the binary serialization added in v2? Or do you want to do the full v2, including both human-readable and binary versions?

@avikivity (Member Author)

> @avikivity I'm reviewing now, which means I needed to start by reading the protocol definition (https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt) which I was not aware of until today.
>
> Can you please clarify, when you're saying that you want to support "proxy protocol v2" do you mean just the binary serialization added in v2? Or do you want to do the full v2, including both human-readable and binary versions?

Only the binary protocol.

Since this is a reverse proxy integration protocol, we don't expect to randomly receive v1.

@nyh (Contributor) commented Nov 26, 2025

> @avikivity I'm reviewing now, which means I needed to start by reading the protocol definition (https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt) which I was not aware of until today.
> Can you please clarify, when you're saying that you want to support "proxy protocol v2" do you mean just the binary serialization added in v2? Or do you want to do the full v2, including both human-readable and binary versions?
>
> Only the binary protocol.
>
> Since this is a reverse proxy integration protocol, we don't expect to randomly receive v1.

Yes, that's what I guessed. But it's misleading to call this "v2 protocol" because according to the documentation I am reading, the v2 protocol needs to support both the textual and binary formats. I think we should explicitly state that we only support the v2 binary format.

@nyh (Contributor) left a comment

Looks very good. I had some minor comments and suggestions you may want to look at, but none of it is important enough to "request changes".

However, the thing that most bothered me is the name of the "proxy_protocol_v2_header" class, which I think has the wrong name - it is neither the header, nor specific to the v2 protocol. It is just the "original addresses" we took out of the proxy header, and if we supported the v1 protocol, we would have taken exactly the same information out of the textual header as well.

// as specified in the proxy protocol header. load_balancing_algorithm::port
// will use the port from the proxy protocol header.
//
// Currently only proxy protocol v2 is supported.
Contributor

I recommend using here and elsewhere a phrase like "proxy protocol v2 binary format" instead of just "proxy protocol v2" because according to the documentation, while v2 added the binary format, it didn't remove the textual format, and we are not implementing that.

Contributor

I also suggest writing at least in one comment a link to https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt. The "proxy protocol" is a very generic sounding, readers might not find it obvious to which protocol this refers to, or if it even refers to a specific protocol (its name isn't capitalized).

Member Author

Clarified; added link.

void on_batch_flush_error() noexcept override;
};

struct proxy_protocol_v2_header {
Contributor

I found the name of this structure a bit confusing. It is not structured like the proxy protocol header, and doesn't even have the full capabilities of a proxy protocol header. Rather it just contains the pieces important to us today.

Also, the word "v2" in this structure's name is irrelevant - if we ever decide to support the textual format (from v1) as well, anyway we'll parse it into this same structure.

Member Author

Renamed to proxy_data.

buffer = heap_buffer.get();
}

auto xlen = co_await fd.read_some(buffer, len);
Contributor

Nitpick, you don't need to create a named variable "xlen" - while it's not inefficient (the compiler will not keep it), a human reader needs to parse it. You can just do:

if (co_await fd.read_some(buffer, len) < len) {
    co_return std::nullopt;
}

Same for the variable n_read above, by the way.

Member Author

I think two lines are more readable, especially when people aren't used to co_await operator precedence.

Contributor

Touché.

auto proto = fam_proto & 0x0F;

if (proto != 0x1) { // STREAM
co_return std::nullopt;
Contributor

This return nullopt is different from the others - while the previous ones were cases where the connection was genuinely broken and should be dropped, the case here includes cases (like UDP) which are supported by the protocol, but you chose not to support right now. This is fine, but we should document which pieces we don't support.

Member Author

Added comment.

co_return std::nullopt;
}

proxy_protocol_v2_header header;
Contributor

Again, I think both the struct and the variable name (header) are misnamed. It's not v2-specific, and it's not a "header", it's just the addresses we found in the header. Maybe we should call it something like original_addresses or client_addresses or something?

Member Author

renamed


BOOST_CHECK(ss.remote_address().is_unspecified());
}

// Comprehensive tests for proxy protocol v2 implementation
Contributor

When you add 16 tests for one feature (the proxy protocol), around 600 lines of code, I think it makes a lot of sense to put it in a separate source file.

Member Author

ok


auto server = seastar::async([&ss, &got_connection] {
    using namespace std::chrono_literals;
    // Set a timer to abort accept after 100ms
Contributor

I don't understand this. I think we expect the connection to be rejected in a clean reliable way - never to need this timeout. Will this cause the test to fail - or incorrectly succeed - if the timeout occurs? Are we sure 100ms is always enough even on extremely overloaded test machines?

Member Author

The server never sees a bad connection. It's rejected by the networking layer. It's like a malformed SYN packet, an accept will never see it. So we need to escape.

Seastar tests aren't overcommitted.

I added checks that we get the timeout when we expect it.

    out.flush().get();
    out.close().get();
} catch (...) {
    // Expected - server may close connection during write/flush/close
Contributor

So why did you write the client to send data while the server doesn't read data? Shouldn't you fix one of them - e.g., fix the client to read until eof?

Member Author

I didn't write anything here.


// With LOCAL mode, addresses should be real connection addresses, not from header
// We can't check exact match since the client port is ephemeral, but we can verify
// that the addresses are not the fake proxy addresses and the IPs match
Contributor

Does the BOOST_REQUIRE_EQUAL below really implement what this comment says ("we can verify that the addresses are NOT the fake proxy addresses")? I don't see how.

Member Author

I believe they do. We check the client address against the real one.

@avikivity
Member Author

avikivity commented Nov 26, 2025

> I think we should also have an API for the "real" (not known to the client) local address, but we don't need to do this now. What we should probably do now is to add a comment explaining that local_address() may be different from what you expect if the proxy protocol is used.

It's documented in listen_options::proxy_protocol.

The recursion is replaced by a while loop.

This simplifies later patches which add more optional awaiting
steps.

The coroutinization does not add an allocation, as typically accept()
will return a non-ready future and future::then_unpack() will allocate
a continuation. In any case accept() is not very performance sensitive.
Implement assignment operators for conntrack::handle, needed for the
next patch. There is nothing special about them here.
The immediately invoked lambda avoids some repetition, but if we want
to add awaiting steps to it it will turn into a coroutine and add an
allocation. Turn it into regular code instead, using the conntrack::handle
assignment operator we added in the previous patch.
…kets

Implement the proxy protocol v2 as defined by haproxy. This is a header
prepended to the input stream containing the "real" local and remote
addresses, hidden by the reverse proxy or load balancer.

This cannot be implemented by the application, because we want
port load balancing (load_balancing_algorithm::port) to continue
to work and use the client's source port. Connection migration can
only be done by Seastar at the moment.

We can't use input_stream/data_source here since the connection will
be transferred to another shard, and these objects are deeply intertwined
with the shard, but we can use pollable_fd which is just a wrapper around
a file descriptor. We don't need to use the equivalent of read_exactly()
because the protocol guarantees the header will be sent in one packet
and not be fragmented.

In this patch, we only parse the protocol header and extract the remote
address. Changing connected_socket::local_address() and
connected_socket::remote_address() to reflect the proxy data is done later.
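To make the parsing step above concrete, here is a hypothetical standalone sketch of decoding the proxy protocol v2 binary header for the TCP-over-IPv4 case only. The names (`proxy_addrs`, `parse_ppv2_inet`) are invented for illustration and are not Seastar's; the real implementation reads from a pollable_fd rather than a flat buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Addresses recovered from the header, in host byte order.
struct proxy_addrs {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

std::optional<proxy_addrs> parse_ppv2_inet(const uint8_t* buf, size_t len) {
    // 12-byte fixed signature that prefixes every v2 header.
    static const uint8_t sig[12] = {
        0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
    };
    if (len < 16 || std::memcmp(buf, sig, 12) != 0) {
        return std::nullopt;             // truncated or bad signature
    }
    uint8_t ver_cmd = buf[12];
    if ((ver_cmd >> 4) != 0x2) {         // high nibble: version, must be 2
        return std::nullopt;
    }
    if ((ver_cmd & 0x0F) != 0x1) {       // low nibble: 0x1 = PROXY command
        return std::nullopt;             // (LOCAL mode not handled here)
    }
    uint8_t fam_proto = buf[13];
    if ((fam_proto >> 4) != 0x1 || (fam_proto & 0x0F) != 0x1) {
        return std::nullopt;             // only INET (0x1) / STREAM (0x1)
    }
    // Remaining length of the address block plus any TLVs, network order.
    uint16_t addr_len = uint16_t(buf[14]) << 8 | buf[15];
    if (addr_len < 12 || len < 16u + addr_len) {
        return std::nullopt;             // INET address block is 12 bytes
    }
    const uint8_t* p = buf + 16;
    proxy_addrs a;
    a.src_ip   = uint32_t(p[0]) << 24 | uint32_t(p[1]) << 16 | uint32_t(p[2]) << 8 | p[3];
    a.dst_ip   = uint32_t(p[4]) << 24 | uint32_t(p[5]) << 16 | uint32_t(p[6]) << 8 | p[7];
    a.src_port = uint16_t(p[8]) << 8 | p[9];
    a.dst_port = uint16_t(p[10]) << 8 | p[11];
    return a;
}
```

Anything that fails these checks is treated like the malformed connections discussed above and rejected before the application ever sees an accept.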
Proxied connections derive the local and remote addresses from
the proxy protocol, not getsockname(2)/getpeername(2). Add a new
derived class to hold the storage for those addresses and supply
new implementations for local_address() and remote_address().

A constructor in posix_connected_socket_impl is made public for this.
Since the class itself is part of the implementation, this doesn't
expose anything.
Expand the existing tests to send a fake proxy protocol packet
instead of binding to the source port.
16 tests covering:

Negative cases: bad signature, incomplete packets, invalid
versions/commands/families/protocols, wrong lengths.

Positive cases: IPv4/IPv6 addresses, LOCAL mode, TLVs, extreme port
values.

The tests were written by copilot under my guidance.
Useful for testing connection load-balancing algorithms.
Useful for, well, testing load balancing algorithms. Can be used
with curl's --local-port option, and a local haproxy to test
the proxy protocol.
@avikivity
Member Author

v9:

  • clarified that only the binary format of proxy protocol is supported
  • added link to the protocol documentation
  • renamed proxy_protocol_v2_header to proxy_data
  • renamed variable of that type
  • commented non-support of proxied UDP over TCP
  • moved proxy protocol tests to new file
  • refactored tests to reduce code duplication
  • verify timeout happened when we expect it
  • rebased

@nyh nyh closed this in 5b75164 Nov 27, 2025
@nyh nyh merged commit 5b75164 into scylladb:master Nov 27, 2025
16 checks passed
