net: support proxy protocol v2 #3075
Conversation
TODO: propagate the addresses to connected_socket's accessors, unit test
Force-pushed 69ad576 to d5664a1
Pull Request Overview
This PR adds support for the PROXY protocol v2 header parsing in Seastar's POSIX networking stack, introducing a new load balancing algorithm that uses connection information from the proxy protocol header.
Key changes:
- Implements PROXY protocol v2 header parsing for IPv4 and IPv6 connections
- Adds a proxy_protocol_v2_and_port load balancing algorithm that extracts the remote address from the header
- Updates the httpd demo application to support configurable load balancing algorithms
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/net/posix-stack.cc | Implements proxy protocol v2 header parsing and integrates it into the accept loop with new load balancing logic |
| include/seastar/net/posix-stack.hh | Adds move assignment operators to conntrack::handle class |
| include/seastar/net/api.hh | Adds new proxy_protocol_v2_and_port load balancing algorithm enum value |
| apps/httpd/main.cc | Adds command-line option for selecting load balancing algorithm and a new /shard endpoint |
src/net/posix-stack.cc
Outdated
    }
} ();
auto& proxy_protocol_header = *proxy_protocol_header_opt;
auto& psa = proxy_protocol_header.remote_address;
Copilot AI, Oct 29, 2025
The code unconditionally calls as_posix_sockaddr_in() on the remote address, which assumes IPv4. This will fail when the proxy protocol header contains an IPv6 address (fam == 0x2). The code should check the address family and use as_posix_sockaddr_in6().sin6_port for IPv6 addresses, similar to how it's done on line 614.
Suggested change:

auto& psa = proxy_protocol_header.remote_address;
uint16_t port = 0;
if (psa.family() == AF_INET) {
    port = ntoh(psa.as_posix_sockaddr_in().sin_port);
} else if (psa.family() == AF_INET6) {
    port = ntoh(psa.as_posix_sockaddr_in6().sin6_port);
}
cth = _conntrack.get_handle(port % smp::count);
case server_socket::load_balancing_algorithm::connection_distribution:
    cth = _conntrack.get_handle();
    break;
case server_socket::load_balancing_algorithm::port:
Copilot AI, Oct 29, 2025
The code unconditionally calls as_posix_sockaddr_in() which assumes IPv4. This will fail for IPv6 connections where sa.family() is AF_INET6. The code should check the address family and use as_posix_sockaddr_in6().sin6_port for IPv6 addresses.
Suggested change:

case server_socket::load_balancing_algorithm::port:
    uint16_t port = 0;
    if (sa.family() == AF_INET) {
        port = ntohs(sa.as_posix_sockaddr_in().sin_port);
    } else if (sa.family() == AF_INET6) {
        port = ntohs(sa.as_posix_sockaddr_in6().sin6_port);
    }
    cth = _conntrack.get_handle(port % smp::count);
v2: fix build in module mode by adding missing include
Force-pushed d5664a1 to 12c0cfe
src/net/posix-stack.cc
Outdated
if (fam_proto != 0x00) { // UNSPEC
    co_return std::nullopt;
}
co_return local_proxy_protocol_v2_header(fd);
The 15th and 16th bytes is the address length in bytes in network endian order.
It is used so that the receiver knows how many address bytes to skip even when
it does not implement the presented protocol. Thus the length of the protocol
header in bytes is always exactly 16 + this value. When a sender presents a
LOCAL connection, it should not present any address so it sets this field to
zero. Receivers MUST always consider this field to skip the appropriate number
of bytes and must not assume zero is presented for LOCAL connections. When a
receiver accepts an incoming connection showing an UNSPEC address family or
protocol, it may or may not decide to log the address information if present.
Looks like we need to read data below anyway.
Right, I read this part but managed to misunderstand it.
Force-pushed 12c0cfe to c2c8500
v3: really fix the module build. I believe our module build is completely broken.
Force-pushed c2c8500 to 5153478
v4:
After implementing connected_socket::local_address and connected_socket::remote_address, I think the feature should be enabled using a new bool listen_options::proxy_protocol_v2. This is because the feature doesn't just affect load balancing, it also affects local/remote address reporting. It's reasonable to use other load balancing options with proxy protocol v2. So we need a new bool in listen_options, and change the meaning of load_balancing_algorithm::port to use the "real" remote port when proxy protocol is enabled.
Well, I proposed to implement it as a new type of socket, similar to tls.
Yes, it's very similar.
Force-pushed 5153478 to 4e7cace
v5:
Still todo:
pgellert
left a comment
We're looking to add proxy protocol support (v1 and v2) in Redpanda, so I wanted to chime in with a few comments from that perspective. Hope that's okay.
src/net/posix-stack.cc
Outdated
default:
    co_return std::nullopt;
}
co_return header;
The RFC talks about chaining multiple layers in section "4.1. Multiple layers". However it is not clear to me from the spec if multiple layers of proxies would lead to the server receiving a response with: (a) a single header with the IPs populated with the original client's IPs, or (b) multiple headers with each proxy adding its own proxying information.
This blog post makes me think that the latter is possible: "Two hops TCP proxies resulting in two PROXY protocol headers".
Do you know what the expected behaviour is here or have you had a chance to test with multiple layers of proxy protocols?
To me it's more sensible for intermediate proxies to act as proxy protocol consumers (strip incoming headers) and producers (generate outgoing headers), populating the outgoing headers with addresses from the incoming headers.
It's also possible to chain the headers, but then the end consumer would need to know the number of layers to unstack.
I propose deferring this question until a real-life configuration is found.
I propose deferring this question until a real-life configuration is found.
That makes sense to me.
include/seastar/net/api.hh
Outdated
// The connection is encapsulated with proxy protocol v2 (which is just
// a header prepended to the data stream). connected_socket::remote_address()
// and connected_socket::local_address() will return the addresses
// as specified in the proxy protocol header. load_balancing_algorithm::port
// will use the port from the proxy protocol header.
bool proxy_protocol_v2 = false;
GCP only supports proxy protocol v1 at this point (ref) as far as I am aware. Even if seastar doesn't support proxy protocol v1 from the outset, it would be great if the implementation could be made generic to be easily extendable to proxy protocol v1.
Based on that I think it would make sense to make this config a more generic std::optional<proxy_protocol_config> proxy_protocol_config; that has a bool supports_v2 field or perhaps a vector of supported versions.
GCP Private Service Connect supports v2 (ref - https://docs.cloud.google.com/vpc/docs/about-vpc-hosted-services#proxy-protocol )
I can rename it to proxy_protocol and comment that only v2 is supported. If someone adds v1 support, that is a compatible change. If someone uses it with v1 despite the comment that it's not supported, they'll learn to read comments (I know I won't).
GCP Private Service Connect supports v2 (ref - https://docs.cloud.google.com/vpc/docs/about-vpc-hosted-services#proxy-protocol )
Thank you, I haven't come across this earlier. However, this is only for Private Service Connect, and GCP NLB's still use v1 unfortunately.
I can rename it to proxy_protocol and comment that only v2 is supported. If someone adds v1 support, that is a compatible change. If someone uses it with v1 despite the comment that it's not supported, they'll learn to read comments (I know I won't).
Not making specific versions of proxy protocol configurable, but having a single on/off config sounds good to me.
Another reason why I was thinking about a more generic config struct, however, is because I was also considering whether we are going to need an allowlist of IPs to accept proxy protocol headers from (i.e. a configurable IP list, that is used to validate the TCP connection's IP before accepting a connection with the proxy protocol header).
This is something that the spec calls for to prevent IP spoofing:
The receiver SHOULD ensure proper access filtering so that only trusted proxies are allowed to use this protocol.
And other implementations I could find seem to support such IP allow lists as a configuration of the proxy protocol consumer layer:
- haproxy: tcp-request connection expect-proxy layer4 if { src -f proxies.lst } (https://www.haproxy.com/documentation/haproxy-configuration-manual/latest/#4.2-tcp-request%20connection)
- https://nginx.org/en/docs/stream/ngx_stream_realip_module.html#set_real_ip_from
- https://mariadb.com/docs/server/clients-and-utilities/server-client-software/client-libraries/proxy-protocol-support#enabling-proxy-protocol-in-mariadb-server
What do you think about supporting IP whitelisting in the proxy protocol layer? I guess people could (and probably should) just use iptables to restrict traffic to the proxy protocol port, but I am wondering if there is a reason why other implementations support it as an explicit proxy protocol-layer config as well. I think it would certainly make configuring proxy protocol simpler in Kubernetes deployments.
I think in modern cloud environments it is easy to whitelist addresses using the software defined networking layer. Since there's a dedicated port for proxy protocol connections, it's easy to dedicate a firewall rule for it.
To support in-app whitelisting, we have two options:
- add a vector of IP addresses to whitelist against
- expose the real remote_address (and local_address) in addition to the proxied ones, and let the application conduct the whitelisting
I think 2 is better. First, we expose information that is generically needed as a side effect. Second, whitelisting can easily become complicated (subnets, dynamic changes, blah blah). Third, the Seastar socket layer is supposed to mimic the posix API and not add random stuff. This series is an exception, but it has an excuse - we need to process the proxy packet before fully establishing the connected_socket, since we may need to migrate it based on the contents of the packet.
apps/httpd/main.cc
Outdated
app.add_options()("port", bpo::value<uint16_t>()->default_value(10000), "HTTP Server port");
app.add_options()("load-balancing-algorithm",
    bpo::value<std::string>()->default_value("connection_distribution"),
    "Load balancing algorithm: connection_distribution, port, proxy_protocol_v2_and_port, fixed");
I think you backed out of the separate load balancing algo earlier:
Suggested change:

- "Load balancing algorithm: connection_distribution, port, proxy_protocol_v2_and_port, fixed");
+ "Load balancing algorithm: connection_distribution, port, fixed");
Yes, bad rebase, thanks.
What's your use case for v1?
That's why there is a comment box here.
Ah, I guess for GCP, but it does support v2. v1 is outdated and I don't see a modern use case for it.
Yes, the use case would be GCP NLB's. In GCP, only Private Service Connect supports v2 as per the docs, whereas NLB's only support v1 still.
I see. Pity. Do send a patch once this is merged (or even before, and I can incorporate it into the series).
Force-pushed 90a7664 to c24513d
v6:
virtual socket_address local_address() const noexcept override {
    return _proxy_local_addr;
}
virtual socket_address remote_address() const noexcept override {
    return _proxy_remote_addr;
}
Overwriting the remote_address makes sense to me, as that allows applications to treat the connection as if it originated from the remote client.
However, I would expect the local_address() to continue to refer to the server's socket address and not the proxy's socket address even if the proxy protocol is enabled. I think it would be more in-line with the meaning of local_address() if it was left as _ops->local_address(_fd.get_file_desc()) and to report the _proxy_remote_addr through a new proxy_address() method instead.
For example, in redpanda we use the local address for audit logging, reporting which interface the connection came in through, and we would want to continue to report the server's socket address there, regardless of whether the proxy protocol is enabled or not.
Alternatively, we could expose the server's socket address through a new method, but I think it would be better to leave the local_address() unchanged and expose the proxy address as a new method instead.
What do you think?
Arguments can be made both ways. The proxy's address is how the server is advertised to the user, so if you log it (for example, in scylladb we have a system.clients[1] table that is visible to the user), then the user can match the address to something known to them.
Eventually, we'll need to expose both address types. But note the real local address is already known to the server, who bound to it in the first place.
[1] (but system.clients only logs the remote address, not the local address)
For example, if you have a cluster of haproxies, and you suspect one of them of malfunctioning, having the proxy bind address tells you which one to look at.
Eventually, we'll need to expose both address types. But note the real local address is already known to the server, who bound to it in the first place.
This makes sense, and agreed that this is sufficient for my side.
Would the preferences for the behaviour of local_address() change if we supported decoding more than one layer of proxy protocol headers? For example consider a complex network layout such as:
    customer network
           |
    VPC peering proxy
         /    \
        |    NLB [1]
         \    /
         server
I guess even in this case, exposing the local address as the vpc peering proxy's socket is still useful for matching the client-side view. But then the "local address" is really the first proxy's address, which may be outside of the control of the server operator.
I am happy either way because, as you said, both socket addresses are available to the server, so the server can choose which one to expose based on the use case. But I just wanted to think through and understand the trade offs with you.
[1] we may point the customer-facing domain name to an NLB to load balance initial connections (bootstrapping, in Kafka terms). After that, clients connect directly to the individual brokers by IP or broker-specific domain name.
Perhaps the API for exposing the "real" local/remote addresses can return a vector describing the path. The no-proxy case would return a vector of length 1.
I really hope we don't have to come to this. This kind of network engineering belongs in the network, not the application, though I understand why it came to be. For ScyllaDB, we'd be much better off with routeable IPv6 subnets, since the service structure is exposed to the application, unlike web servers.
I think we should also have an API for the "real" (not known to the client) local address, but we don't need to do this now. What we should probably do now is to add a comment explaining that local_address() may be different from what you expect if the proxy protocol is used.
tests/unit/socket_test.cc
Outdated
auto client = seastar::async([&listen_addr, header = std::move(header)] {
    auto s = connect(listen_addr).get();
    auto out = s.output();
    out.write(header.data(), header.size()).get();
    out.flush().get();
    out.close().get();
});

when_all(std::move(server), std::move(client)).discard_result().get();
ss.abort_accept();

// With LOCAL mode, addresses should be real connection addresses, not from header
// We can't check exact match since the client port is ephemeral, but we can verify
// that the addresses are not the fake proxy addresses and the IPs match
BOOST_REQUIRE_EQUAL(actual_local, socket_address(listen_addr));
Is it worth strengthening this check to also assert here that BOOST_REQUIRE_EQUAL(actual_remote, actual_client_local) where actual_client_local = s.local_address() of the client's connected socket?
Good idea, done.
Force-pushed c24513d to d9a6500
v7:
Force-pushed d9a6500 to 7109cfe
v8:
pgellert
left a comment
Looks good from my side
@nyh please review
@nyh review ping
@avikivity I'm reviewing now, which means I needed to start by reading the protocol definition (https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt) which I was not aware of until today. Can you please clarify, when you're saying that you want to support "proxy protocol v2" do you mean just the binary serialization added in v2? Or do you want to do the full v2, including both human-readable and binary versions?
Only the binary protocol. Since this is a reverse proxy integration protocol, we don't expect to randomly receive v1.
Yes, that's what I guessed. But it's misleading to call this "v2 protocol" because according to the documentation I am reading, the v2 protocol needs to support both textual and binary protocol. I think we should explicitly state that we only support the v2 binary protocol.
nyh
left a comment
Looks very good. I had some minor comments and suggestions you may want to look at, but none of it is important enough to "request changes".
However, the thing that most bothered me is the name of the "proxy_protocol_v2_header" class, which I think has the wrong name - it is neither the header, nor specific to the v2 protocol. It is just the "original addresses" we took out of the proxy header, and if we supported the v1 protocol, we would have taken exactly the same information out of the textual header as well.
include/seastar/net/api.hh
Outdated
// as specified in the proxy protocol header. load_balancing_algorithm::port
// will use the port from the proxy protocol header.
//
// Currently only proxy protocol v2 is supported.
I recommend using here and elsewhere a phrase like "proxy protocol v2 binary format" instead of just "proxy protocol v2" because according to the documentation, while v2 added the binary format, it didn't remove the textual format, and we are not implementing that.
I also suggest writing, at least in one comment, a link to https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt. "Proxy protocol" is a very generic-sounding name; readers might not find it obvious which protocol this refers to, or whether it even refers to a specific protocol (its name isn't capitalized).
Clarified; added link.
include/seastar/net/posix-stack.hh
Outdated
    void on_batch_flush_error() noexcept override;
};

struct proxy_protocol_v2_header {
I found the name of this structure a bit confusing. It is not structured like the proxy protocol header, and doesn't even have the full capabilities of a proxy protocol header. Rather it just contains the pieces important to us today.
Also, the word "v2" in this structure's name is irrelevant - if we ever decide to support the textual format (from v1) as well, anyway we'll parse it into this same structure.
Renamed to proxy_data.
    buffer = heap_buffer.get();
}

auto xlen = co_await fd.read_some(buffer, len);
Nitpick, you don't need to create a named variable "xlen" - while it's not inefficient (the compiler will not keep it), a human reader needs to parse it. You can just do:
if (co_await fd.read_some(buffer, len) < len) {
co_return std::nullopt;
}
Same for the variable n_read above, by the way.
I think two lines are more readable, especially when people aren't used to co_await operator precedence.
Touché.
auto proto = fam_proto & 0x0F;

if (proto != 0x1) { // STREAM
    co_return std::nullopt;
This return nullopt is different from the others - while the previous ones were cases where the connection was genuinely broken and should be dropped, the case here includes cases (like UDP) which are supported by the protocol, but you chose not to support right now. This is fine, but we should document which pieces we don't support.
Added comment.
src/net/posix-stack.cc
Outdated
    co_return std::nullopt;
}

proxy_protocol_v2_header header;
Again, I think both the struct and the variable name (header) are misnamed. It's not v2-specific, and it's not a "header", it's just the addresses we found in the header. Maybe we should call it something like original_addresses or client_addresses or something?
renamed
tests/unit/socket_test.cc
Outdated
BOOST_CHECK(ss.remote_address().is_unspecified());
}

// Comprehensive tests for proxy protocol v2 implementation
When you add 16 tests for one feature (the proxy protocol), around 600 lines of code, I think it makes a lot of sense to put it in a separate source file.
ok
tests/unit/socket_test.cc
Outdated
auto server = seastar::async([&ss, &got_connection] {
    using namespace std::chrono_literals;
    // Set a timer to abort accept after 100ms
I don't understand this. I think we expect the connection to be rejected in a clean reliable way - never to need this timeout. Will this cause the test to fail - or incorrectly succeed - if the timeout occurs? Are we sure 100ms is always enough even on extremely overloaded test machines?
The server never sees a bad connection. It's rejected by the networking layer. It's like a malformed SYN packet, an accept will never see it. So we need to escape.
Seastar tests aren't overcommitted.
I added checks that we get the timeout when we expect it.
tests/unit/socket_test.cc
Outdated
    out.flush().get();
    out.close().get();
} catch (...) {
    // Expected - server may close connection during write/flush/close
So why did you write the client to send data while the server doesn't read data? Shouldn't you fix one of them - e.g., fix the client to read until eof?
I didn't write anything here.
tests/unit/socket_test.cc
Outdated
// With LOCAL mode, addresses should be real connection addresses, not from header
// We can't check exact match since the client port is ephemeral, but we can verify
// that the addresses are not the fake proxy addresses and the IPs match
Does the BOOST_REQUIRE_EQUAL below really implement what this comment says ("we can verify that the addresses are NOT the fake proxy addresses")? I don't see how.
I believe they do. We check the client address against the real one.
It's documented in listen_options::proxy_protocol.
The recursion is replaced by a while loop. This simplifies later patches, which add more optional awaiting steps. The coroutinization does not add an allocation, as typically accept() will return a non-ready future and future::then_unpack() will allocate a continuation. In any case, accept() is not very performance sensitive.
Implement assignment operators for conntrack::handle, needed for the next patch. There is nothing special about them here.
The immediately invoked lambda avoids some repetition, but if we want to add awaiting steps to it it will turn into a coroutine and add an allocation. Turn it into regular code instead, using the conntrack::handle assignment operator we added in the previous patch.
…kets

Implement the proxy protocol v2 as defined by haproxy. This is a header prepended to the input stream containing the "real" local and remote addresses, hidden by the reverse proxy or load balancer.

This cannot be implemented by the application, because we want port load balancing (load_balancing_algorithm::port) to continue to work and use the client's source port. Connection migration can only be done by Seastar at the moment.

We can't use input_stream/data_source here since the connection will be transferred to another shard, and these objects are deeply intertwined with the shard, but we can use pollable_fd which is just a wrapper around a file descriptor.

We don't need to use the equivalent of read_exactly() because the protocol guarantees the header will be sent in one packet and not be fragmented.

In this patch, we only parse the protocol header and extract the remote address. Changing connected_socket::local_address() and connected_socket::remote_address() to reflect the proxy data is done later.
Proxied connections derive the local and remote addresses from the proxy protocol, not getsockname(2)/getpeername(2). Add a new derived class to hold the storage for those addresses and supply new implementations for local_address() and remote_address(). A constructor in posix_connected_socket_impl is made public for this. Since the class itself is part of the implementation, this doesn't expose anything.
Expand the existing tests to send a fake proxy protocol packet instead of binding to the source port.
16 tests covering:
- Negative cases: bad signature, incomplete packets, invalid versions/commands/families/protocols, wrong lengths.
- Positive cases: IPv4/IPv6 addresses, LOCAL mode, TLVs, extreme port values.
The tests were written by copilot under my guidance.
Useful for testing connection load-balancing algorithms.
Useful for, well, testing load balancing algorithms. Can be used with curl's --local-port option, and a local haproxy to test the proxy protocol.
Force-pushed 7109cfe to 91c0655
v9:
Proxy protocol v2 [1] is a protocol used by haproxy to forward the original
client information to a server behind a reverse proxy. It is also employed
by AWS Network Load Balancers for the same purpose [2].
The connection load balancing algorithm "port" which uses the source port
number cannot be used behind a reverse proxy, because the port number
generated by the client is obfuscated by the reverse proxy.
We introduce a new listen_options member that declares connections to
the server_socket use proxy protocol v2. The remote_address() and local_address()
connected_socket members will now return the original connection's addresses.
The "port" load balancing algorithm will also use the original connection's
source port, restoring correct shard routing.
This server socket will only accept connections that have the protocol header
prepended to the connection data.
The httpd server is extended with a /shard endpoint (so we can observe the
shard we landed on) and a --proxy-protocol switch so we can test
the new algorithm. It is easy to test with the /shard endpoint and the curl
--local-port option, with and without haproxy.
[1] https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
[2] https://aws.amazon.com/blogs/networking-and-content-delivery/preserving-client-ip-address-with-proxy-protocol-v2-and-network-load-balancer/