Conversation

@avikivity (Member) commented Oct 29, 2025

Proxy protocol v2 [1] is a protocol used by haproxy to forward the original
client information to a server behind a reverse proxy. It is also employed
by AWS Network Load Balancers for the same purpose [2].

The "port" connection load balancing algorithm, which uses the client's source
port number, cannot be used behind a reverse proxy, because the proxy connects
with its own source port, hiding the port number the client generated.

We introduce a new listen_options member that declares that connections to
the server_socket use proxy protocol v2. The remote_address() and local_address()
members of connected_socket will then return the original connection's addresses,
and the "port" load balancing algorithm will use the original connection's
source port, restoring correct shard routing.

This server socket will only accept connections that have the protocol header
prepended to the connection data.

The httpd server is extended with a /shard endpoint (so we can observe which
shard a connection landed on) and a --proxy-protocol switch, so the new
behavior can be tested. It is easy to test with the /shard endpoint and curl's
--local-port option, with and without haproxy.

[1] https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
[2] https://aws.amazon.com/blogs/networking-and-content-delivery/preserving-client-ip-address-with-proxy-protocol-v2-and-network-load-balancer/

@avikivity (Member Author)

TODO: propagate the addresses to connected_socket's accessors, unit test

Copilot AI left a comment

Pull Request Overview

This PR adds support for parsing the PROXY protocol v2 header in Seastar's POSIX networking stack, introducing a new load balancing algorithm that uses connection information from the proxy protocol header.

Key changes:

  • Implements PROXY protocol v2 header parsing for IPv4 and IPv6 connections
  • Adds proxy_protocol_v2_and_port load balancing algorithm that extracts remote address from the header
  • Updates the httpd demo application to support configurable load balancing algorithms

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
src/net/posix-stack.cc Implements proxy protocol v2 header parsing and integrates it into the accept loop with new load balancing logic
include/seastar/net/posix-stack.hh Adds move assignment operators to conntrack::handle class
include/seastar/net/api.hh Adds new proxy_protocol_v2_and_port load balancing algorithm enum value
apps/httpd/main.cc Adds command-line option for selecting load balancing algorithm and a new /shard endpoint


}
} ();
auto& proxy_protocol_header = *proxy_protocol_header_opt;
auto& psa = proxy_protocol_header.remote_address;
Copilot AI commented Oct 29, 2025

The code unconditionally calls as_posix_sockaddr_in() on the remote address, which assumes IPv4. This will fail when the proxy protocol header contains an IPv6 address (fam == 0x2). The code should check the address family and use as_posix_sockaddr_in6().sin6_port for IPv6 addresses, similar to how it's done on line 614.

Suggested change:

    auto& psa = proxy_protocol_header.remote_address;
    uint16_t port = 0;
    if (psa.family() == AF_INET) {
        port = ntoh(psa.as_posix_sockaddr_in().sin_port);
    } else if (psa.family() == AF_INET6) {
        port = ntoh(psa.as_posix_sockaddr_in6().sin6_port);
    }
    cth = _conntrack.get_handle(port % smp::count);

case server_socket::load_balancing_algorithm::connection_distribution:
cth = _conntrack.get_handle();
break;
case server_socket::load_balancing_algorithm::port:
Copilot AI commented Oct 29, 2025

The code unconditionally calls as_posix_sockaddr_in() which assumes IPv4. This will fail for IPv6 connections where sa.family() is AF_INET6. The code should check the address family and use as_posix_sockaddr_in6().sin6_port for IPv6 addresses.

Suggested change (braces and break added so the declaration inside the case label is well-formed):

    case server_socket::load_balancing_algorithm::port: {
        uint16_t port = 0;
        if (sa.family() == AF_INET) {
            port = ntohs(sa.as_posix_sockaddr_in().sin_port);
        } else if (sa.family() == AF_INET6) {
            port = ntohs(sa.as_posix_sockaddr_in6().sin6_port);
        }
        cth = _conntrack.get_handle(port % smp::count);
        break;
    }

@avikivity (Member Author)

v2: fix build in module mode by adding missing include

if (fam_proto != 0x00) { // UNSPEC
co_return std::nullopt;
}
co_return local_proxy_protocol_v2_header(fd);
Contributor

> The 15th and 16th bytes is the address length in bytes in network endian order.
> It is used so that the receiver knows how many address bytes to skip even when
> it does not implement the presented protocol. Thus the length of the protocol
> header in bytes is always exactly 16 + this value. When a sender presents a
> LOCAL connection, it should not present any address so it sets this field to
> zero. Receivers MUST always consider this field to skip the appropriate number
> of bytes and must not assume zero is presented for LOCAL connections. When a
> receiver accepts an incoming connection showing an UNSPEC address family or
> protocol, it may or may not decide to log the address information if present.

Looks like we need to read data below anyway.

Member Author

Right, I read this part but managed to misunderstand it.

@avikivity (Member Author)

v3: really fix the module build. I believe our module build is completely broken.

@avikivity avikivity marked this pull request as draft October 30, 2025 21:00
@avikivity (Member Author)

v4:

  • read second part of header even for LOCAL command
  • return addresses carried by header from connected_socket::local_address() and connected_socket::remote_address()

@avikivity (Member Author)

After implementing connected_socket::local_address and connected_socket::remote_address, I think the feature should be enabled using a new bool listen_options::proxy_protocol_v2. This is because the feature doesn't just affect load balancing, it also affects local/remote address reporting. It's reasonable to use other load balancing options with proxy protocol v2.

So we need a new bool in listen_options, and change the meaning of load_balancing_algorithm::port to use the "real" remote port when proxy protocol is enabled.

@gleb-cloudius (Contributor)

> After implementing connected_socket::local_address and connected_socket::remote_address, I think the feature should be enabled using a new bool listen_options::proxy_protocol_v2. This is because the feature doesn't just affect load balancing, it also affects local/remote address reporting. It's reasonable to use other load balancing options with proxy protocol v2.
>
> So we need a new bool in listen_options, and change the meaning of load_balancing_algorithm::port to use the "real" remote port when proxy protocol is enabled.

Well, I proposed to implement it as new type of socket similar to tls.

@avikivity
Copy link
Member Author

> After implementing connected_socket::local_address and connected_socket::remote_address, I think the feature should be enabled using a new bool listen_options::proxy_protocol_v2. This is because the feature doesn't just affect load balancing, it also affects local/remote address reporting. It's reasonable to use other load balancing options with proxy protocol v2.
> So we need a new bool in listen_options, and change the meaning of load_balancing_algorithm::port to use the "real" remote port when proxy protocol is enabled.
>
> Well, I proposed to implement it as new type of socket similar to tls.

Yes, it's very similar.

@avikivity (Member Author) commented Nov 8, 2025

v5:

  • rebase past 0353ec3
  • engaged via listen_options::proxy_protocol_v2 rather than load_balancing_algorithm::proxy_protocol_v2
    (because it also affects local_address() and remote_address())

Still todo:

  • more tests

@pgellert (Contributor) left a comment

We're looking to add proxy protocol support (v1 and v2) in Redpanda, so I wanted to chime in with a few comments from that perspective. Hope that's okay.

default:
co_return std::nullopt;
}
co_return header;
Contributor

The RFC talks about chaining multiple layers in section "4.1. Multiple layers". However it is not clear to me from the spec if multiple layers of proxies would lead to the server receiving a response with: (a) a single header with the IPs populated with the original client's IPs, or (b) multiple headers with each proxy adding its own proxying information.

This blog post makes me think that the latter is possible: Two hops TCP proxies resulting in two PROXY protocol header.

Do you know what the expected behaviour is here or have you had a chance to test with multiple layers of proxy protocols?

Member Author

To me it's more sensible for intermediate proxies to act as proxy protocol consumers (strip incoming headers) and producers (generate outgoing headers), populating the outgoing headers with addresses from the incoming headers.

It's also possible to chain the headers, but then the end consumer would need to know the number of layers to unstack.

I propose deferring this question until a real-life configuration is found.

Contributor

> I propose deferring this question until a real-life configuration is found.

That makes sense to me.

Comment on lines 422 to 427
// The connection is encapsulated with proxy protocol v2 (which is just
// a header prepended to the data stream). connected_socket::remote_address()
// and connected_socket::local_address() will return the addresses
// as specified in the proxy protocol header. load_balancing_algorithm::port
// will use the port from the proxy protocol header.
bool proxy_protocol_v2 = false;
Contributor

GCP only supports proxy protocol v1 at this point (ref) as far as I am aware. Even if seastar doesn't support proxy protocol v1 from the outset, it would be great if the implementation could be made generic to be easily extendable to proxy protocol v1.

Based on that I think it would make sense to make this config a more generic std::optional<proxy_protocol_config> proxy_protocol_config; that has a bool supports_v2 field or perhaps a vector of supported versions.


Member Author

I can rename it to proxy_protocol and comment that only v2 is supported. If someone adds v1 support, that is a compatible change. If someone uses it with v1 despite the comment that it's not supported, they'll learn to read comments (I know I won't).

Contributor

> GCP Private Service Connect supports v2 (ref - https://docs.cloud.google.com/vpc/docs/about-vpc-hosted-services#proxy-protocol )

Thank you, I hadn't come across this earlier. However, this is only for Private Service Connect, and GCP NLBs unfortunately still use v1.

Contributor

> I can rename it to proxy_protocol and comment that only v2 is supported. If someone adds v1 support, that is a compatible change. If someone uses it with v1 despite the comment that it's not supported, they'll learn to read comments (I know I won't).

Not making specific versions of proxy protocol configurable, but having a single on/off config sounds good to me.

Another reason why I was thinking about a more generic config struct, however, is because I was also considering whether we are going to need an allowlist of IPs to accept proxy protocol headers from (i.e. a configurable IP list, that is used to validate the TCP connection's IP before accepting a connection with the proxy protocol header).

This is something that the spec calls for to prevent IP spoofing:

> The receiver SHOULD ensure proper access filtering so that only trusted proxies are allowed to use this protocol.

And other implementations I could find seem to support such IP allow lists as a configuration of the proxy protocol consumer layer:

What do you think about supporting IP whitelisting in the proxy protocol layer? I guess people could (and probably should) just use iptables to restrict traffic to the proxy protocol port, but I am wondering if there is a reason why other implementations support it as an explicit proxy protocol-layer config as well. I think it would certainly make configuring proxy protocol simpler in Kubernetes deployments.

Member Author

I think in modern cloud environments it is easy to whitelist addresses using the software defined networking layer. Since there's a dedicated port for proxy protocol connections, it's easy to dedicate a firewall rule for it.

To support in-app whitelisting, we have two options:

  1. add a vector of IP addresses to whitelist against
  2. expose the real remote_address (and local_address) in addition to the proxied ones, and let the application conduct the whitelisting

I think 2 is better. First, we expose information that is generically needed as a side effect. Second, whitelisting can easily become complicated (subnets, dynamic changes, blah blah). Third, the Seastar socket layer is supposed to mimic the posix API and not add random stuff. This series is an exception, but it has an excuse - we need to process the proxy packet before fully establishing the connected_socket, since we may need to migrate it based on the contents of the packet.

app.add_options()("port", bpo::value<uint16_t>()->default_value(10000), "HTTP Server port");
app.add_options()("load-balancing-algorithm",
bpo::value<std::string>()->default_value("connection_distribution"),
"Load balancing algorithm: connection_distribution, port, proxy_protocol_v2_and_port, fixed");
Contributor

I think you backed out of the separate load balancing algo earlier:

Suggested change
"Load balancing algorithm: connection_distribution, port, proxy_protocol_v2_and_port, fixed");
"Load balancing algorithm: connection_distribution, port, fixed");

Member Author

Yes, bad rebase, thanks.

@avikivity (Member Author)

> We're looking to add proxy protocol support (v1 and v2) in Redpanda,

What's your use case for v1?

> so I wanted to chime in with a few comments from that perspective. Hope that's okay.

That's why there is a comment box here.

@avikivity (Member Author)

> We're looking to add proxy protocol support (v1 and v2) in Redpanda,
>
> What's your use case for v1?

Ah, I guess for GCP, but it does support v2.

v1 is outdated and I don't see a modern use case for it.

@pgellert (Contributor)

> We're looking to add proxy protocol support (v1 and v2) in Redpanda,
>
> What's your use case for v1?
>
> Ah, I guess for GCP, but it does support v2.
>
> v1 is outdated and I don't see a modern use case for it.

Yes, the use case would be GCP NLBs. In GCP, only Private Service Connect supports v2 as per the docs, whereas NLBs still only support v1.

@avikivity (Member Author)

> We're looking to add proxy protocol support (v1 and v2) in Redpanda,
>
> What's your use case for v1?
>
> Ah, I guess for GCP, but it does support v2.
> v1 is outdated and I don't see a modern use case for it.
>
> Yes, the use case would be GCP NLBs. In GCP, only Private Service Connect supports v2 as per the docs, whereas NLBs still only support v1.

I see. Pity. Do send a patch once this is merged (or even before, and I can incorporate it into the series).

@avikivity avikivity changed the title net: support proxy protocol v2 for load balancing net: support proxy protocol v2 Nov 12, 2025
@avikivity (Member Author)

v6:

  • rename configuration to proxy_protocol to leave open the implementation of v1
  • fix list of load balancing algorithms in httpd command line help
  • positive and negative tests for the protocol itself
  • rearranged the patches a little

@avikivity avikivity marked this pull request as ready for review November 12, 2025 20:30
Comment on lines +327 to +339
virtual socket_address local_address() const noexcept override {
return _proxy_local_addr;
}
virtual socket_address remote_address() const noexcept override {
return _proxy_remote_addr;
}
Contributor

Overwriting the remote_address makes sense to me, as that allows applications to treat the connection as if it originated from the remote client.

However, I would expect the local_address() to continue to refer to the server's socket address and not the proxy's socket address even if the proxy protocol is enabled. I think it would be more in-line with the meaning of local_address() if it was left as _ops->local_address(_fd.get_file_desc()) and to report the _proxy_remote_addr through a new proxy_address() method instead.

For example, in redpanda we use the local address for audit logging, reporting which interface the connection came in through, and we would want to continue to report the server's socket address there, regardless of whether the proxy protocol is enabled or not.

Alternatively, we could expose the server's socket address through a new method, but I think it would be better to leave the local_address() unchanged and expose the proxy address as a new method instead.

What do you think?

Member Author

Arguments can be made both ways. The proxy's address is how the server is advertised to the user, so if you log it (for example, in scylladb we have a system.clients[1] table that is visible to the user), then the user can match the address to something known to them.

Eventually, we'll need to expose both address types. But note the real local address is already known to the server, who bound to it in the first place.

[1] (but system.clients only logs the remote address, not the local address)

Member Author

For example, if you have a cluster of haproxies, and you suspect one of malfunctioning, having the proxy bind address tells you which one to look at.

Contributor

> Eventually, we'll need to expose both address types. But note the real local address is already known to the server, who bound to it in the first place.

This makes sense, and agreed that this is sufficient for my side.

Would the preferences for the behaviour of local_address() change if we supported decoding more than one layer of proxy protocol headers? For example consider a complex network layout such as:

customer network
      |
  VPC peering proxy
     / \
    /   \
   |   NLB [1]
    \   /
    server

I guess even in this case, exposing the local address as the vpc peering proxy's socket is still useful for matching the client-side view. But then the "local address" is really the first proxy's address, which may be outside of the control of the server operator.

I am happy either way because, as you said, both socket addresses are available to the server, so the server can choose which one to expose based on the use case. But I just wanted to think through and understand the trade offs with you.

[1] we may point the customer-facing domain name to an NLB to load balance initial connections (bootstrapping, in Kafka terms). After that, clients connect directly to the individual brokers by IP or broker-specific domain name.

Member Author

Perhaps the API for exposing the "real" local/remote addresses can return a vector describing the path. The no-proxy case would return a vector of length 1.

I really hope we don't have to come to this. This kind of network engineering belongs in the network, not the application, though I understand why it came to be. For ScyllaDB, we'd be much better off with routeable IPv6 subnets, since the service structure is exposed to the application, unlike web servers.

Contributor

I think we should also have an API for the "real" (not known to the client) local address, but we don't need to do this now. What we should probably do now is to add a comment explaining that local_address() may be different from what you expect if the proxy protocol is used.

Comment on lines 802 to 816
auto client = seastar::async([&listen_addr, header = std::move(header)] {
auto s = connect(listen_addr).get();
auto out = s.output();
out.write(header.data(), header.size()).get();
out.flush().get();
out.close().get();
});

when_all(std::move(server), std::move(client)).discard_result().get();
ss.abort_accept();

// With LOCAL mode, addresses should be real connection addresses, not from header
// We can't check exact match since the client port is ephemeral, but we can verify
// that the addresses are not the fake proxy addresses and the IPs match
BOOST_REQUIRE_EQUAL(actual_local, socket_address(listen_addr));
Contributor

Is it worth strengthening this check to also assert here that BOOST_REQUIRE_EQUAL(actual_remote, actual_client_local) where actual_client_local = s.local_address() of the client's connected socket?

Member Author

Good idea, done.

@avikivity (Member Author)

v7:

  • strengthened LOCAL_MODE test to check the client's local address too (server's remote)
  • not done: advertise real server's local address in connected_socket::local_address(), kept the proxy's bind address instead

@avikivity (Member Author)

v8:

  • rebased past conflicts

@pgellert (Contributor) left a comment

Looks good from my side

@avikivity (Member Author)

@nyh please review

@avikivity avikivity requested a review from nyh November 20, 2025 13:41
@avikivity (Member Author)

@nyh review ping

@nyh (Contributor) commented Nov 26, 2025

@avikivity I'm reviewing now, which means I needed to start by reading the protocol definition (https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt) which I was not aware of until today.

Can you please clarify, when you're saying that you want to support "proxy protocol v2" do you mean just the binary serialization added in v2? Or do you want to do the full v2, including both human-readable and binary versions?

@avikivity (Member Author)

> @avikivity I'm reviewing now, which means I needed to start by reading the protocol definition (https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt) which I was not aware of until today.
>
> Can you please clarify, when you're saying that you want to support "proxy protocol v2" do you mean just the binary serialization added in v2? Or do you want to do the full v2, including both human-readable and binary versions?

Only the binary protocol.

Since this is a reverse proxy integration protocol, we don't expect to randomly receive v1.

@nyh (Contributor) commented Nov 26, 2025

> @avikivity I'm reviewing now, which means I needed to start by reading the protocol definition (https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt) which I was not aware of until today.
> Can you please clarify, when you're saying that you want to support "proxy protocol v2" do you mean just the binary serialization added in v2? Or do you want to do the full v2, including both human-readable and binary versions?
>
> Only the binary protocol.
>
> Since this is a reverse proxy integration protocol, we don't expect to randomly receive v1.

Yes, that's what I guessed. But it's misleading to call this "v2 protocol" because according to the documentation I am reading, the v2 protocol needs to support both the textual and binary formats. I think we should explicitly state that we only support the v2 binary format.

@nyh (Contributor) left a comment

Looks very good. I had some minor comments and suggestions you may want to look at, but none of it is important enough to "request changes".

However, the thing that most bothered me is the name of the "proxy_protocol_v2_header" class, which I think has the wrong name - it is neither the header, nor specific to the v2 protocol. It is just the "original addresses" we took out of the proxy header, and if we supported the v1 protocol, we would have taken exactly the same information out of the textual header as well.

// as specified in the proxy protocol header. load_balancing_algorithm::port
// will use the port from the proxy protocol header.
//
// Currently only proxy protocol v2 is supported.
Contributor

I recommend using here and elsewhere a phrase like "proxy protocol v2 binary format" instead of just "proxy protocol v2" because according to the documentation, while v2 added the binary format, it didn't remove the textual format, and we are not implementing that.

Contributor

I also suggest writing at least in one comment a link to https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt. The "proxy protocol" is a very generic sounding, readers might not find it obvious to which protocol this refers to, or if it even refers to a specific protocol (its name isn't capitalized).

Member Author

Clarified; added link.

void on_batch_flush_error() noexcept override;
};

struct proxy_protocol_v2_header {
Contributor

I found the name of this structure a bit confusing. It is not structured like the proxy protocol header, and doesn't even have the full capabilities of a proxy protocol header. Rather it just contains the pieces important to us today.

Also, the word "v2" in this structure's name is irrelevant - if we ever decide to support the textual format (from v1) as well, anyway we'll parse it into this same structure.

Member Author

Renamed to proxy_data.

buffer = heap_buffer.get();
}

auto xlen = co_await fd.read_some(buffer, len);
Contributor

Nitpick, you don't need to create a named variable "xlen" - while it's not inefficient (the compiler will not keep it), a human reader needs to parse it. You can just do:

if (co_await fd.read_some(buffer, len) < len) {
    co_return std::nullopt;
}

Same for the variable n_read above, by the way.

Member Author

I think two lines are more readable, especially when people aren't used to co_await operator precedence.

Contributor

Touché.

auto proto = fam_proto & 0x0F;

if (proto != 0x1) { // STREAM
co_return std::nullopt;
Contributor

This return nullopt is different from the others - while the previous ones were cases where the connection was genuinely broken and should be dropped, the case here includes cases (like UDP) which are supported by the protocol, but you chose not to support right now. This is fine, but we should document which pieces we don't support.

Member Author

Added comment.

co_return std::nullopt;
}

proxy_protocol_v2_header header;
Contributor

Again, I think both the struct and the variable name (header) are misnamed. It's not v2-specific, and it's not a "header", it's just the addresses we found in the header. Maybe we should call it something like original_addresses or client_addresses or something?

Member Author

renamed


BOOST_CHECK(ss.remote_address().is_unspecified());
}

// Comprehensive tests for proxy protocol v2 implementation
Contributor

When you add 16 tests for one feature (the proxy protocol), around 600 lines of code, I think it makes a lot of sense to put it in a separate source file.

Member Author

ok


auto server = seastar::async([&ss, &got_connection] {
    using namespace std::chrono_literals;
    // Set a timer to abort accept after 100ms
Contributor

I don't understand this. I think we expect the connection to be rejected in a clean reliable way - never to need this timeout. Will this cause the test to fail - or incorrectly succeed - if the timeout occurs? Are we sure 100ms is always enough even on extremely overloaded test machines?

Member Author

The server never sees a bad connection. It's rejected by the networking layer. It's like a malformed SYN packet, an accept will never see it. So we need to escape.

Seastar tests aren't overcommitted.

I added checks that we get the timeout when we expect it.

    out.flush().get();
    out.close().get();
} catch (...) {
    // Expected - server may close connection during write/flush/close
Contributor

So why did you write the client to send data while the server doesn't read data? Shouldn't you fix one of them - e.g., fix the client to read until eof?

Member Author

I didn't write anything here.


// With LOCAL mode, addresses should be real connection addresses, not from header
// We can't check exact match since the client port is ephemeral, but we can verify
// that the addresses are not the fake proxy addresses and the IPs match
Contributor

Does the BOOST_REQUIRE_EQUAL below really implement what this comment says ("we can verify that the addresses are NOT the fake proxy addresses")? I don't see how.

Member Author

I believe they do. We check the client address against the real one.

@avikivity
Member Author

avikivity commented Nov 26, 2025

> I think we should also have an API for the "real" (not known to the client) local address, but we don't need to do this now. What we should probably do now is to add a comment explaining that local_address() may be different from what you expect if the proxy protocol is used.

It's documented in listen_options::proxy_protocol.

The recursion is replaced by a while loop.

This simplifies later patches which add more optional awaiting
steps.

The coroutinization does not add an allocation, as typically accept()
will return a non-ready future and future::then_unpack() will allocate
a continuation. In any case accept() is not very performance sensitive.
Implement assignment operators for conntrack::handle, needed for the
next patch. There is nothing special about them here.
The immediately invoked lambda avoids some repetition, but if we want
to add awaiting steps to it it will turn into a coroutine and add an
allocation. Turn it into regular code instead, using the conntrack::handle
assignment operator we added in the previous patch.
…kets

Implement the proxy protocol v2 as defined by haproxy. This is a header
prepended to the input stream containing the "real" local and remote
addresses, hidden by the reverse proxy or load balancer.

This cannot be implemented by the application, because we want
port load balancing (load_balancing_algorithm::port) to continue
to work and use the client's source port. Connection migration can
only be done by Seastar at the moment.

We can't use input_stream/data_source here since the connection will
be transferred to another shard, and these objects are deeply intertwined
with the shard, but we can use pollable_fd which is just a wrapper around
a file descriptor. We don't need to use the equivalent of read_exactly()
because the protocol guarantees the header will be sent in one packet
and not be fragmented.

In this patch, we only parse the protocol header and extract the remote
address. Changing connected_socket::local_address() and
connected_socket::remote_address() to reflect the proxy data is done later.
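To make the parsing step above concrete, here is a hypothetical standalone sketch of decoding the proxy protocol v2 binary header for the TCP-over-IPv4 case only. The names (`proxy_addrs`, `parse_ppv2_inet`) are invented for illustration and are not Seastar's; the real implementation reads from a pollable_fd rather than a flat buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Addresses recovered from the header, in host byte order.
struct proxy_addrs {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

std::optional<proxy_addrs> parse_ppv2_inet(const uint8_t* buf, size_t len) {
    // 12-byte fixed signature that prefixes every v2 header.
    static const uint8_t sig[12] = {
        0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
    };
    if (len < 16 || std::memcmp(buf, sig, 12) != 0) {
        return std::nullopt;             // truncated or bad signature
    }
    uint8_t ver_cmd = buf[12];
    if ((ver_cmd >> 4) != 0x2) {         // high nibble: version, must be 2
        return std::nullopt;
    }
    if ((ver_cmd & 0x0F) != 0x1) {       // low nibble: 0x1 = PROXY command
        return std::nullopt;             // (LOCAL mode not handled here)
    }
    uint8_t fam_proto = buf[13];
    if ((fam_proto >> 4) != 0x1 || (fam_proto & 0x0F) != 0x1) {
        return std::nullopt;             // only INET (0x1) / STREAM (0x1)
    }
    // Remaining length of the address block plus any TLVs, network order.
    uint16_t addr_len = uint16_t(buf[14]) << 8 | buf[15];
    if (addr_len < 12 || len < 16u + addr_len) {
        return std::nullopt;             // INET address block is 12 bytes
    }
    const uint8_t* p = buf + 16;
    proxy_addrs a;
    a.src_ip   = uint32_t(p[0]) << 24 | uint32_t(p[1]) << 16 | uint32_t(p[2]) << 8 | p[3];
    a.dst_ip   = uint32_t(p[4]) << 24 | uint32_t(p[5]) << 16 | uint32_t(p[6]) << 8 | p[7];
    a.src_port = uint16_t(p[8]) << 8 | p[9];
    a.dst_port = uint16_t(p[10]) << 8 | p[11];
    return a;
}
```

Anything that fails these checks is treated like the malformed connections discussed above and rejected before the application ever sees an accept.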
Proxied connections derive the local and remote addresses from
the proxy protocol, not getsockname(2)/getpeername(2). Add a new
derived class to hold the storage for those addresses and supply
new implementations for local_address() and remote_address().

A constructor in posix_connected_socket_impl is made public for this.
Since the class itself is part of the implementation, this doesn't
expose anything.
Expand the existing tests to send a fake proxy protocol packet
instead of binding to the source port.
16 tests covering:

Negative cases: bad signature, incomplete packets, invalid
versions/commands/families/protocols, wrong lengths.

Positive cases: IPv4/IPv6 addresses, LOCAL mode, TLVs, extreme port
values.

The tests were written by copilot under my guidance.
Useful for testing connection load-balancing algorithms.
Useful for, well, testing load balancing algorithms. Can be used
with curl's --local-port option, and a local haproxy to test
the proxy protocol.
@avikivity
Member Author

v9:

  • clarified that only the binary format of proxy protocol is supported
  • added link to the protocol documentation
  • renamed proxy_protocol_v2_header to proxy_data
  • renamed variable of that type
  • commented non-support of proxied UDP over TCP
  • moved proxy protocol tests to new file
  • refactored tests to reduce code duplication
  • verify timeout happened when we expect it
  • rebased

@nyh nyh closed this in 5b75164 Nov 27, 2025
@nyh nyh merged commit 5b75164 into scylladb:master Nov 27, 2025
16 checks passed
