p2p: Improved peer selection with /24 subnet deduplication to disadvantage 'spy nodes' #9939
base: master
Conversation
First pass: some high-level commentary, and also some reminders to use `const` in variable declarations when possible.
// build a set of all the /16 we're connected to, and prefer a peer that's not in that set
std::set<uint32_t> classB;
if (&zone == &m_network_zones.at(epee::net_utils::zone::public_)) // at returns reference, not copy
// Build a list of all distinct /24 subnets we are connected to right now; to catch
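For illustration only (not the PR's actual code, and assuming the address is already in host byte order): a minimal sketch of how a /16 versus /24 key can be derived from an IPv4 address by masking, and how such keys could be collected into a set like the `classB` set above.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Keep only the network bits of a host-order IPv4 address.
inline uint32_t subnet16_key(uint32_t ip) { return ip & 0xFFFF0000u; } // /16 ("class B")
inline uint32_t subnet24_key(uint32_t ip) { return ip & 0xFFFFFF00u; } // /24

// Collect the distinct /24 subnets of the peers we are currently connected to.
std::set<uint32_t> connected_subnets24(const std::vector<uint32_t>& peer_ips)
{
  std::set<uint32_t> subnets;
  for (const uint32_t ip : peer_ips)
    subnets.insert(subnet24_key(ip));
  return subnets;
}
```

Narrowing from /16 to /24 only changes which low bits are zeroed out, so the exclusion set becomes finer-grained.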
Why narrow the subnets of connections we avoid duplicating from /16 to /24? This would allow the code to make connections to IPs more similar to the set of connections that we already have, not less.
The background here is @Rucknium's paper, to be found here; see especially Figure 2 on page 5.
Rucknium and I discussed /16 versus /24 back and forth over several weeks. If I read the backlog correctly, I asked whether /24, instead of the /16 proposed in the paper, would be alright as well, and Rucknium agreed to attempt that. They also did a large number of test runs with statistics to check whether my code actually worked, and to what degree, and /24 seemed to work out.
When proposing "going finer", I imagined that with /24 you exclude fewer honest nodes that just have the bad luck to be in the same subnet as a spy node. With /16 this danger seems much bigger to me. And, as of now, the spy nodes do us the favor of mostly crowding nicely into /24 subnets that they fill completely, using all 256 IPs there.
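For scale, a bit of arithmetic not in the original comment: a /16 block is 256 times larger than a /24 block, so excluding at the /16 level sweeps in far more bystander addresses around any spy-hosting /24.

$$
2^{32-16} = 65{,}536 \ \text{addresses per /16}, \qquad
2^{32-24} = 256 \ \text{addresses per /24}, \qquad
\frac{65{,}536}{256} = 256 \ \text{distinct /24 blocks per /16}.
$$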
> This would allow the code to make connections to IPs more similar to the set of connections that we already have, not less.
Not sure how to weigh that. In any case, the whole peer selection process is essentially random on several levels. I don't think there is a real danger that we accidentally seek out /24 subnets that are immediate neighbours of spy-node-infested subnets and are thus probably infested themselves. I would also speculate that if this were a real danger, Rucknium's test runs should have unearthed it.
I think we're misunderstanding each other here. I mean that the old code was already de-duplicating against subnets it was already connected to; that is different from the new type of de-duplication where we treat many IPs under the same /24 subnet as having one total vote. Did Rucknium discuss this type of de-duplication at all?
I don't think we have a misunderstanding. Yes, it's about this very first phase of making a list of subnets we are already connected to. Originally built as a list of /16 subnets, now narrowed down to a list of /24 subnets.
And yes, Rucknium and I understood it as such. As I said, we discussed back and forth whether to use different subnet sizes for this initial "connected to" list and for the later candidate filtering, or to treat everything as /24. The latter is where we landed, and Rucknium gave their OK to it after some simulations and/or test runs with their actual daemon, as far as I remember.
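To make the two phases being discussed concrete, here is a hedged sketch (illustrative only, with invented names; not the PR's actual implementation) of how they differ: phase one excludes candidates whose /24 we are already connected to, and phase two collapses the remaining candidates so that each /24 contributes at most one entry before the uniform random pick.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <random>
#include <set>
#include <vector>

// Illustrative /24 key for a host-order IPv4 address.
static uint32_t subnet24(uint32_t ip) { return ip & 0xFFFFFF00u; }

uint32_t pick_peer(const std::vector<uint32_t>& candidates,
                   const std::vector<uint32_t>& connected,
                   std::mt19937& rng)
{
  // Phase 1: the "already connected to" list, now built at /24 granularity.
  std::set<uint32_t> connected_subnets;
  for (const uint32_t ip : connected)
    connected_subnets.insert(subnet24(ip));

  // Phase 2: one entry ("one vote") per remaining /24. The real code may pick a
  // random representative within a subnet; "first seen wins" keeps the sketch short.
  std::map<uint32_t, uint32_t> one_per_subnet;
  for (const uint32_t ip : candidates)
  {
    if (connected_subnets.count(subnet24(ip)))
      continue; // skip: we already have a connection into this /24
    one_per_subnet.emplace(subnet24(ip), ip);
  }
  if (one_per_subnet.empty())
    return 0; // sentinel: caller would fall back to another strategy

  // Uniform random choice among the surviving subnets.
  std::uniform_int_distribution<std::size_t> dist(0, one_per_subnet.size() - 1);
  return std::next(one_per_subnet.begin(), static_cast<std::ptrdiff_t>(dist(rng)))->second;
}
```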
I ran some simulations that produced realistic network connections when nodes adopt different subnet connection schemes.
The simulation works like this:
1. A set of IP addresses of reachable nodes is collected, based on a Monero network scan from May 2025. Only reachable nodes are used in the simulation, but the network simulation function has the option to set up unreachable nodes if desired. Since the IP addresses of unreachable nodes are unknown, they cannot be used for the network latency calculations later.
2. Each node follows a procedure to establish "connections" to other nodes:
3. The node considers the whole list of IP addresses. Any node it is already connected to is excluded. Any node that is in a subnet of a node that it is already connected to is also excluded (this subnet level is a configurable parameter).
4. The remaining candidate list is de-duplicated based on a specified subnet level, which can be different from the already-connected subnet level.
5. A peer is chosen from this list with uniform random selection.
6. Steps 3-5 are performed 12 times to fill up the default outbound connection limit.
7. A peer churning process is simulated. This involves randomly dropping a peer from the outbound peer list and re-running steps 3-5 to fill the dropped peer's slot. 100 peers are dropped and their slots refilled. This churn step is needed because the "already-connected" criterion statistically depends on the number of peer slots filled. (A simplified code sketch of steps 3, 5, and 6 and of this churn step follows after this list.)
8. Nodes that are on the MRL ban list accept inbound connections but do not establish their own outbound connections. In the simulation, they are skipped when it is their turn to create connections.
Note that any "limits" on the number of inbound peer connections to nodes are ignored.
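As referenced in the list above, a rough C++-style sketch of the per-node procedure (an abstraction of the steps described in the list, not the actual R simulation source; all names are invented, and step 4, the candidate-level subnet de-duplication, is left out to keep the sketch short):

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <vector>

static uint32_t subnet24(uint32_t ip) { return ip & 0xFFFFFF00u; }

// Steps 3 and 5, compressed: exclude peers we are connected to (and peers whose
// /24 we are connected to), then pick uniformly at random among the rest.
static uint32_t select_peer(const std::vector<uint32_t>& all_ips,
                            const std::vector<uint32_t>& connected,
                            std::mt19937& rng)
{
  std::set<uint32_t> used_ips(connected.begin(), connected.end());
  std::set<uint32_t> used_nets;
  for (const uint32_t ip : connected)
    used_nets.insert(subnet24(ip));

  std::vector<uint32_t> eligible;
  for (const uint32_t ip : all_ips)
    if (!used_ips.count(ip) && !used_nets.count(subnet24(ip)))
      eligible.push_back(ip);
  if (eligible.empty())
    return 0; // sentinel: nothing eligible

  std::uniform_int_distribution<std::size_t> pick(0, eligible.size() - 1);
  return eligible[pick(rng)];
}

// Step 6 plus the churn step: fill the 12 default outbound slots, then drop a
// random peer and refill its slot, 100 times.
std::vector<uint32_t> simulate_node(const std::vector<uint32_t>& all_ips, std::mt19937& rng)
{
  std::vector<uint32_t> connected;
  while (connected.size() < 12)
  {
    const uint32_t peer = select_peer(all_ips, connected, rng);
    if (peer == 0) break;
    connected.push_back(peer);
  }
  for (int churn = 0; churn < 100 && !connected.empty(); ++churn)
  {
    std::uniform_int_distribution<std::size_t> drop(0, connected.size() - 1);
    connected.erase(connected.begin() + static_cast<std::ptrdiff_t>(drop(rng)));
    const uint32_t peer = select_peer(all_ips, connected, rng);
    if (peer != 0) connected.push_back(peer);
  }
  return connected;
}
```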
I performed simulations of four scenarios using the `gen.network()` function of version 0.2.4 of my `xmrpeers` R package. The scenarios are the combinations of the "already-connected" subnet level being set to /16 or /24, and subnet deduplication either being enabled at the /24 level or not being enabled at all.
- The status quo scenario is the "already-connected" subnet level set to /16 and subnet deduplication not enabled.
- The scenario of this PR is the "already-connected" subnet level set to /24 and subnet deduplication enabled.
The simulation creates a network/graph edgelist. The edgelist can be used to compute the number of outbound connections of honest nodes to suspected spy nodes. It can also be used to roughly measure network latency. Below are some tables.
Average number of connections to spy nodes
already.connected.subnet.level | do.deduplication | mean.malicious.connections |
---|---|---|
16 | TRUE | 1.047757 |
24 | TRUE | 1.064399 |
16 | FALSE | 3.582851 |
24 | FALSE | 3.626628 |
When honest nodes have the default 12 outbound connections and subnet deduplication is enabled, setting the "already-connected" subnet qualification to the /16 subnet level gets you an average of 1.048 connections to the spy nodes. When it is /24, the average is 1.064.
@jeffro256, you asked:
> But there is also geographical distribution concerns also, not related to spy nodes. How does going from /16 to /24 affect that? A less diverse set of already-connected peers might slow down tx&block propagation for those farthest away from the "center" of the node graph
According to Wikipedia:
> The eccentricity $\epsilon(v)$ of a vertex $v$ is the greatest distance between $v$ and any other vertex. … It can be thought of as how far a node is from the node most distant from it in the graph.
The eccentricity of every node can be measured. For eccentricity computations, the spy nodes were removed from the network because they do not establish outbound connections.
The edges of the network graph were weighted by the geographic distance between the IP addresses. The unit of each edge weight is the number of milliseconds it would take light to travel along the Earth's surface on the shortest path between the two points. Actual internet latencies will be slower, because internet cables do not follow the shortest path between each IP address's geographic location and because of processing at internet exchange points.
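As a worked example of the unit (my arithmetic, mirroring the `1000 * weighted.eccentricity / speed.of.light` conversion in the reproduction code below): an edge whose endpoints are 20,000 km apart along the Earth's surface would get a weight of

$$
t \;=\; \frac{1000 \cdot d}{c} \;=\; \frac{1000 \cdot 2\times 10^{7}\ \text{m}}{299{,}792{,}458\ \text{m/s}} \;\approx\; 66.7\ \text{ms}.
$$

The eccentricity values in the tables are then the weighted shortest-path distances from each node to its most distant node, per the definition quoted above.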
Mean eccentricity, in milliseconds
already.connected.subnet.level | do.deduplication | mean.weighted.eccentricity |
---|---|---|
16 | TRUE | 69.41301 |
24 | TRUE | 68.58541 |
16 | FALSE | 70.05076 |
24 | FALSE | 70.27645 |
When subnet deduplication is enabled and the "already-connected" subnet qualification is /16, the time that a photon could travel from an average node to the "most distant" node in the network is 69.41 milliseconds. The next table computes the maximum, i.e. the maximum distance between any two nodes on the network.
Maximum eccentricity, in milliseconds
already.connected.subnet.level | do.deduplication | max.weighted.eccentricity |
---|---|---|
16 | TRUE | 106.5484 |
24 | TRUE | 108.3377 |
16 | FALSE | 110.4373 |
24 | FALSE | 110.8845 |
We also get unweighted eccentricity, which just counts each hop as one unit. The unweighted eccentricity of every honest node in the network in every scenario was 4, i.e. the minimum number of hops needed to reach the node most distant from a given starting node was always 4, regardless of which node we start at. In a few of the simulation runs with a different random seed, a few of the nodes had an unweighted eccentricity of 5 when `do.deduplication` was `FALSE`. But with this random seed, every value is 4:
Table of unweighted eccentricity
already.connected.subnet.level | do.deduplication | table.unweighted.eccentricity |
---|---|---|
16 | TRUE | 4: 2764 |
24 | TRUE | 4: 2764 |
16 | FALSE | 4: 2764 |
24 | FALSE | 4: 2764 |
Code to reproduce the analysis is below. Note that this code makes network calls to a geo-IP service.
Reproduction code
# WARNING: This script makes network calls to ip-api.com
install.packages(c("remotes", "geosphere", "knitr"))
remotes::install_github("Rucknium/xmrpeers", upgrade = FALSE,
ref = "35408988f80a306627f5c10abb10ffb89f008bf4")
# ref is to get specific version so this is reproducible
library(data.table)
data("good_peers", "ban_list", package = "xmrpeers")
good_peers.cleaned <- good_peers
good_peers.cleaned <- stringr::str_extract(good_peers.cleaned, "[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}")
# Remove ports from IP addresses
good_peers.cleaned <- good_peers.cleaned[good_peers.cleaned != ""]
# Removes IPv6 addresses
good_peers.cleaned <- unique(na.omit(good_peers.cleaned))
# https://ip-api.com/docs/api:batch
# Docs say:
# This endpoint is limited to 15 requests per minute from an IP address.
# A request needs to formatted as a JSON array, containing up to 100 IP addresses or objects.
# Only POST requests are accepted.
# The returned HTTP header X-Rl contains the number of requests remaining in the
# current rate limit window. X-Ttl contains the seconds until the limit is reset.
# Your implementation should always check the value of the X-Rl header, and if
# its is 0 you must not send any more requests for the duration of X-Ttl in seconds.
good_peers.cleaned.split <- split(good_peers.cleaned,
f = rep(seq_len(ceiling(length(good_peers.cleaned) / 100)), each = 100)[seq_along(good_peers.cleaned)])
# Split into chunks of 100, with remainder
handle <- RCurl::getCurlHandle()
h = RCurl::basicHeaderGatherer()
w = RCurl::basicTextGatherer()
ip.geo <- list()
for (i in seq_along(good_peers.cleaned.split)) {
json.post <- RJSONIO::toJSON(
good_peers.cleaned.split[[i]]
)
# returns NULL
RCurl::postForm("http://ip-api.com/batch",
.opts = RCurl::curlOptions(
postfields = json.post,
httpheader = c('Content-Type' = 'application/json', Accept = 'application/json'),
headerfunction = h$update,
writefunc = w$update
# Updates the value of "h" and "w" as a side effect.
# (header = TRUE would print the header to the console)
),
curl = handle
)
# "postForm does not return response with headerfunction"
# https://github.com/omegahat/RCurl/issues/5
# "You should also pass the writefunc (e.g. basicTextGatherer) along with the headerfunction."
ip.geo[[i]] <- data.table::rbindlist(RJSONIO::fromJSON(w$value()), fill = TRUE,
ignore.attr = TRUE)
w$reset()
# Reset to make room for the new data
# "This can be used to reuse the same object across requests but to avoid
# cumulating new input with the material from previous requests."
throttle.limit <- as.numeric(h$value()[["X-Rl"]])
if (throttle.limit == 0 ) {
Sys.sleep(75)
# Sleep for 75 seconds if (unexpectedly) we have no more requests available
# during this minute.
}
message(i, " of ", length(good_peers.cleaned.split), " API requests completed. ",
"Throttle limit in current minute: ", throttle.limit)
Sys.sleep(5)
# Sleep 5 seconds. Gives one second tolerance to the limit of 15 requests per minute.
}
ip.geo <- data.table::rbindlist(ip.geo, fill = TRUE)
ip.geo <- ip.geo[, .(ip = query, lat, lon)]
stopifnot( all(complete.cases(ip.geo)))
# Make sure no missing values
future::plan(future::multisession,
workers = max(c(1, floor(parallelly::availableCores()/5))))
# Each process uses 400-500% CPU, so set to available threads divided by 5
peer.selection.scenarios <- expand.grid(
already.connected.subnet.level = c(16, 24),
do.deduplication = c(TRUE, FALSE))
results <- vector("list", nrow(peer.selection.scenarios))
set.seed(314)
for (scenario in seq_len(nrow(peer.selection.scenarios))) {
message("\nStarting simulation for scenario ", scenario, " of ",
nrow(peer.selection.scenarios))
already.connected.subnet.level <-
peer.selection.scenarios$already.connected.subnet.level[scenario]
do.deduplication <- peer.selection.scenarios$do.deduplication[scenario]
# Need these to be set as variables due to scoping issues with parallelized sessions
generated.network <- xmrpeers::gen.network(
outbound.ips = good_peers.cleaned,
malicious.ips = ban_list,
n.unreachable = 0,
already.connected.subnet.level = already.connected.subnet.level,
deduplication.subnet.level = 24,
do.deduplication = do.deduplication,
default.outbound.connections = 12,
dropped.connection.churns = 100,
compute.network.stats = FALSE
)
malicious.indices <- generated.network$nodes[ xmrpeers::in.ip.set(ip, ban_list), index]
nodes.honest <- generated.network$nodes[ ! index %in% malicious.indices, ]
edgelist.honest <- generated.network$edgelist[ ! origin %in% malicious.indices, ]
edgelist.honest.completely <- generated.network$edgelist[ ! destination %in% malicious.indices, ]
edgelist.honest[, malicious.connection := destination %in% malicious.indices]
malicious.connection.count <- edgelist.honest[, .(count = sum(malicious.connection)), by = "origin"]
results[[scenario]]$nrow <- nrow(malicious.connection.count)
results[[scenario]]$summary <- summary(malicious.connection.count$count)
nodes.honest <- merge(nodes.honest, ip.geo, by = "ip", all.x = TRUE)
p1 <- nodes.honest[match(edgelist.honest.completely$origin, index), .(lon, lat)]
p2 <- nodes.honest[match(edgelist.honest.completely$destination, index), .(lon, lat)]
edge.weights <- geosphere::distGeo(as.matrix(p1), as.matrix(p2))
edgelist.honest.completely[, weight := edge.weights]
# igraph::eccentricity() will automatically use weights
edgelist.graph <- igraph::graph_from_data_frame(edgelist.honest.completely, directed = FALSE)
# NOTE: graph_from_data_frame() is correct. graph_from_edgelist() gives incorrect
# results, probably from gaps in the index numbers.
weighted.eccentricity <- igraph::eccentricity(edgelist.graph)
speed.of.light <- 299792458 # meters per second
results[[scenario]]$weighted.eccentricity.summary <-
summary( 1000 * weighted.eccentricity / speed.of.light)
# Multiply by 1000 to get milliseconds
unweighted.eccentricity <- igraph::eccentricity(edgelist.graph, weights = NA)
results[[scenario]]$unweighted.eccentricity.table <-
table(unweighted.eccentricity)
}
scenarios.results <- data.table::as.data.table(peer.selection.scenarios)
scenarios.results[, mean.malicious.connections := sapply(results, function(x) x$summary[["Mean"]]) ]
scenarios.results[, mean.weighted.eccentricity :=
sapply(results, function(x) x$weighted.eccentricity.summary[["Mean"]]) ]
scenarios.results[, max.weighted.eccentricity :=
sapply(results, function(x) x$weighted.eccentricity.summary[["Max."]]) ]
scenarios.results[, table.unweighted.eccentricity :=
sapply(results, function(x) {
y <- x$unweighted.eccentricity.table
paste0(names(y), ": ", unclass(y), collapse = "; ")
}) ]
knitr::kable(scenarios.results[,
.(already.connected.subnet.level, do.deduplication, mean.malicious.connections)],
format = "pipe")
knitr::kable(scenarios.results[,
.(already.connected.subnet.level, do.deduplication, mean.weighted.eccentricity)],
format = "pipe")
knitr::kable(scenarios.results[,
.(already.connected.subnet.level, do.deduplication, max.weighted.eccentricity)],
format = "pipe")
knitr::kable(scenarios.results[,
.(already.connected.subnet.level, do.deduplication, table.unweighted.eccentricity)],
format = "pipe")
Anecdotally, this does a great job at reducing the number of connections to spy nodes. My running spy node connection count ratio went from ~22% to ~1.5%. You can run this patch to test it out: log_outgoing_spies.patch. Run your
2nd pass
This logic is looking pretty good, and does well in light stress testing. It does a good job of clearing out the spies, too. I'll be ready to approve after some small cleanups.
p2p: Improved peer selection with /24 subnet deduplication to disadvantage 'spy nodes'
For background info about those "spy nodes", check this GitHub issue, this Reddit post, and this wonderful work from Rucknium.
This PR was developed in close cooperation with @Rucknium, who gave much valuable input and also verified that the new method improves things: the daemon connects to considerably fewer "spy nodes" with it.
As I rewrote large parts of the `make_new_connection_from_peerlist` method, the diff against the old version of the method comes out quite confusing. For review, it may be easier to hold the two full method sources side by side.