refactor: bevy #882


Merged: 125 commits merged into main, Jul 3, 2025

Conversation

@andrewgazelka (Member) commented Apr 22, 2025

This is an experiment

Related: valence-rs/valence#685

Why switch from flecs to Bevy

Why stay in flecs

  • More performant
  • Gives us the ability to add a scripting language on top in a non-Rust language
  • Overall better design (at least for non-Rust use). I expect flecs to always be ahead of Bevy in terms of its progress, and there is less of a chance we get roadblocked by some Bevy design decision.

Issues that need to be resolved:

  • In Bevy, not everything is an entity. This makes some things annoying; for instance, I am unsure how system-order (which is core to hyperion) will work

use std::collections::{BTreeSet, HashMap, HashSet};

use derive_more::Constructor;
use flecs_ecs::{
    core::{
        Builder, ComponentOrPairId, Entity, EntityView, EntityViewGet, IdOperations, QueryAPI,
        QueryBuilderImpl, SystemAPI, flecs, flecs::DependsOn,
    },
    macros::Component,
    prelude::{Module, World},
};

/// Sort by depth and then by id.
#[derive(PartialOrd, Ord, PartialEq, Eq, Debug)]
struct OrderKey {
    depth: usize,
    id: Entity,
}

#[derive(Default)]
struct DepthCalculator {
    depths: HashMap<Entity, usize, rustc_hash::FxBuildHasher>,
}

impl DepthCalculator {
    fn calculate_depth(&mut self, view: EntityView<'_>) -> usize {
        if let Some(depth) = self.depths.get(&view.id()) {
            return *depth;
        }
        // todo: add stack overflow check
        let mut entity_depth = 0;
        view.each_target::<DependsOn>(|depends_on| {
            let tentative_depth = self.calculate_depth(depends_on) + 1;
            entity_depth = entity_depth.max(tentative_depth);
        });
        self.depths.insert(view.id(), entity_depth);
        entity_depth
    }

    fn on_update_depth(&mut self, world: &World) -> usize {
        let view = world
            .component_id::<flecs::pipeline::PostUpdate>()
            .entity_view(world);
        self.calculate_depth(view)
    }
}

#[derive(Component, Constructor, Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
#[must_use]
#[meta]
pub struct SystemOrder {
    value: u16,
}

impl SystemOrder {
    #[must_use]
    pub const fn value(&self) -> u16 {
        self.value
    }

    pub fn of(entity: EntityView<'_>) -> Self {
        entity.get::<&Self>(|order| *order)
    }
}

fn calculate(world: &World) {
    let mut depth_calculator = DepthCalculator::default();
    let mut map = BTreeSet::new();

    // get all depths for systems
    world
        .query::<()>()
        .with::<flecs::system::System>()
        .build()
        .each_entity(|entity, ()| {
            let depth = depth_calculator.calculate_depth(entity);
            map.insert(OrderKey {
                depth,
                id: entity.id(),
            });
        });

    // handle all observers
    world
        .query::<()>()
        .with::<flecs::Observer>()
        .build()
        .each_entity(|entity, ()| {
            let depth = depth_calculator.on_update_depth(world);
            map.insert(OrderKey {
                depth,
                id: entity.id(),
            });
        });

    // assert all entities are unique
    assert_eq!(
        map.len(),
        map.iter().map(|x| x.id).collect::<HashSet<_>>().len()
    );

    for (idx, value) in map.into_iter().enumerate() {
        let idx = u16::try_from(idx).expect("number of systems exceeds u16 (65536)");
        let entity = value.id.entity_view(world);
        entity.set(SystemOrder::new(idx));
    }
}

#[derive(Component)]
pub struct SystemOrderModule;

impl Module for SystemOrderModule {
    fn module(world: &World) {
        world.component::<SystemOrder>().meta();

        world
            .observer::<flecs::OnAdd, ()>()
            .with::<flecs::system::System>()
            .run(|it| calculate(&it.world()));

        world
            .observer::<flecs::OnAdd, ()>()
            .with::<flecs::Observer>()
            .run(|it| calculate(&it.world()));
    }
}
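For reference, the depth-then-id ordering that this module computes can be sketched with plain std collections. This is a simplified, std-only sketch, not the flecs-backed code above: `deps` is a hypothetical adjacency map standing in for the flecs DependsOn relationships.

```rust
use std::collections::{BTreeSet, HashMap};

type Id = u64;

// Memoized DFS: a node's depth is 1 + the max depth of everything it
// depends on (0 for roots), mirroring DepthCalculator::calculate_depth.
fn depth(id: Id, deps: &HashMap<Id, Vec<Id>>, memo: &mut HashMap<Id, usize>) -> usize {
    if let Some(&d) = memo.get(&id) {
        return d;
    }
    let mut d = 0;
    if let Some(children) = deps.get(&id) {
        for &dep in children {
            d = d.max(depth(dep, deps, memo) + 1);
        }
    }
    memo.insert(id, d);
    d
}

// Assign dense u16 orders by sorting on (depth, id), like OrderKey above.
fn assign_orders(ids: &[Id], deps: &HashMap<Id, Vec<Id>>) -> HashMap<Id, u16> {
    let mut memo = HashMap::new();
    let keys: BTreeSet<(usize, Id)> = ids
        .iter()
        .map(|&id| (depth(id, deps, &mut memo), id))
        .collect();
    keys.into_iter()
        .enumerate()
        .map(|(i, (_, id))| (id, u16::try_from(i).expect("too many systems")))
        .collect()
}

fn main() {
    // 3 depends on 2, 2 depends on 1; 1 and 4 are roots.
    let deps = HashMap::from([(2, vec![1]), (3, vec![2])]);
    let orders = assign_orders(&[1, 2, 3, 4], &deps);
    assert_eq!(orders[&1], 0); // depth 0, smaller id first
    assert_eq!(orders[&4], 1); // depth 0
    assert_eq!(orders[&2], 2); // depth 1
    assert_eq!(orders[&3], 3); // depth 2
}
```

Because the keys land in a BTreeSet, ties at the same depth are broken deterministically by id, which is what makes the resulting u16 ordering stable across runs.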

system-order is used for re-ordering packets once they reach the proxy. I definitely feel like flecs has fewer things that would hard-roadblock us; however, flecs-rust sadly has some quality-of-life issues that can be annoying.


Benchmark Results for general

ray_intersection/aabb_size_0.1                     [  33.7 ns ...  33.6 ns ]      -0.29%
ray_intersection/aabb_size_1                       [  33.4 ns ...  33.3 ns ]      -0.23%
ray_intersection/aabb_size_10                      [  24.8 ns ...  24.7 ns ]      -0.26%
ray_intersection/ray_distance_1                    [  13.5 ns ...  13.5 ns ]      -0.06%
ray_intersection/ray_distance_5                    [  13.5 ns ...  13.5 ns ]      -0.02%
ray_intersection/ray_distance_20                   [  13.6 ns ...  13.5 ns ]      -0.12%
overlap/no_overlap                                 [  24.5 ns ...  24.8 ns ]      +1.25%
overlap/partial_overlap                            [  24.7 ns ...  24.6 ns ]      -0.15%
overlap/full_containment                           [  22.5 ns ...  22.5 ns ]      -0.11%
point_containment/inside                           [   4.9 ns ...   4.9 ns ]      +0.09%
point_containment/outside                          [   4.4 ns ...   4.4 ns ]      +0.06%
point_containment/boundary                         [   4.9 ns ...   4.9 ns ]      -0.06%

Comparing to d817614


Benchmark Results for general

ray_intersection/aabb_size_0.1                     [  38.4 ns ...  36.2 ns ]      -5.72%
ray_intersection/aabb_size_1                       [  34.9 ns ...  36.5 ns ]      +4.74%
ray_intersection/aabb_size_10                      [  24.8 ns ...  25.8 ns ]      +4.04%
ray_intersection/ray_distance_1                    [  14.0 ns ...  13.8 ns ]      -1.75%
ray_intersection/ray_distance_5                    [  15.5 ns ...  14.3 ns ]      -7.87%
ray_intersection/ray_distance_20                   [  14.1 ns ...  14.1 ns ]      +0.20%
overlap/no_overlap                                 [  24.2 ns ...  24.4 ns ]      +0.86%
overlap/partial_overlap                            [  24.6 ns ...  24.6 ns ]      +0.02%
overlap/full_containment                           [  22.5 ns ...  22.5 ns ]      -0.17%
point_containment/inside                           [   4.9 ns ...   4.9 ns ]      +0.22%
point_containment/outside                          [   4.4 ns ...   4.4 ns ]      -0.18%
point_containment/boundary                         [   5.0 ns ...   5.0 ns ]      +0.11%

Comparing to 4c402d0


Benchmark Results for general

ray_intersection/aabb_size_0.1                     [  33.7 ns ...  33.8 ns ]      +0.22%
ray_intersection/aabb_size_1                       [  33.5 ns ...  33.6 ns ]      +0.19%
ray_intersection/aabb_size_10                      [  24.6 ns ...  24.6 ns ]      -0.27%
ray_intersection/ray_distance_1                    [  13.4 ns ...  13.5 ns ]      +0.53%
ray_intersection/ray_distance_5                    [  13.6 ns ...  13.6 ns ]      -0.23%
ray_intersection/ray_distance_20                   [  13.5 ns ...  13.5 ns ]      +0.16%
overlap/no_overlap                                 [  24.5 ns ...  24.7 ns ]      +0.86%
overlap/partial_overlap                            [  24.7 ns ...  24.6 ns ]      -0.24%
overlap/full_containment                           [  22.6 ns ...  22.5 ns ]      -0.20%
point_containment/inside                           [   4.9 ns ...   4.9 ns ]      -0.11%
point_containment/outside                          [   4.4 ns ...   4.4 ns ]      +0.16%
point_containment/boundary                         [   4.9 ns ...   4.9 ns ]      +0.02%

Comparing to 4c402d0

@andrewgazelka andrewgazelka mentioned this pull request Apr 28, 2025
@TestingPlant TestingPlant reopened this May 22, 2025
@TestingPlant TestingPlant self-assigned this May 22, 2025
Commit messages from pushes to this branch:

  • Entity indices are unique for all living entities. The generation does not need to be included in the id.
  • Thread locals can be used from the thread_local crate. Bevy does not appear to expose a thread-specific identifier for the original optimized ThreadLocal to work.
  • This reduces the code complexity and also allows the code to check whether another player with the same name exists when a player joins, without race conditions.
  • Systems are expected to establish a happens-before relationship through explicit system ordering, sending multiple packets in the same system, etc. to maintain the order of a sequence of packets relative to each other.
  • Adds all components to the spawned entity in the spawn command that are present in the original version of Hyperion. Note that the original version still has some components commented out.
  • When the Play state is added, players receive chunk packets. The client seems to ignore chunk packets sent before GameJoinS2c; to fix this, the Play state is only added after GameJoinS2c.
  • Without this, some threads would finish their work and be idle while other threads are working. This change ensures that all threads are able to work, finishing a system faster.
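The idle-thread fix described above is in the spirit of dynamic work distribution. Here is a minimal std-only sketch (the function names and chunking strategy are illustrative, not Hyperion's actual scheduler): threads claim fixed-size chunks from a shared atomic cursor, so a thread that finishes its chunk early immediately claims another instead of sitting idle while slower threads are still working.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Sum a slice in parallel. Each worker repeatedly claims the next chunk
// via fetch_add on a shared cursor until the data is exhausted, so no
// worker idles while work remains.
fn parallel_sum(data: &[u64], workers: usize, chunk: usize) -> u64 {
    let cursor = AtomicUsize::new(0);
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|_| {
                s.spawn(|| {
                    let mut local = 0u64;
                    loop {
                        let start = cursor.fetch_add(chunk, Ordering::Relaxed);
                        if start >= data.len() {
                            break;
                        }
                        let end = (start + chunk).min(data.len());
                        local += data[start..end].iter().sum::<u64>();
                    }
                    local
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1000).collect();
    assert_eq!(parallel_sum(&data, 4, 64), 500_500);
}
```

The chunk size trades scheduling overhead against balance: smaller chunks even out uneven per-item costs at the price of more atomic operations.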

github-actions bot commented Jul 1, 2025

Benchmark Results for general

ray_intersection/aabb_size_0.1                     [  29.4 ns ...  24.1 ns ]     -18.09%*
ray_intersection/aabb_size_1                       [  27.4 ns ...  21.7 ns ]     -20.81%*
ray_intersection/aabb_size_10                      [  24.0 ns ...  19.6 ns ]     -18.52%*
ray_intersection/ray_distance_1                    [  13.8 ns ...   3.8 ns ]     -72.41%*
ray_intersection/ray_distance_5                    [  13.8 ns ...   3.8 ns ]     -72.62%*
ray_intersection/ray_distance_20                   [  13.8 ns ...   3.8 ns ]     -72.59%*
overlap/no_overlap                                 [  25.0 ns ...  28.9 ns ]     +15.45%*
overlap/partial_overlap                            [  25.1 ns ...  28.8 ns ]     +14.87%*
overlap/full_containment                           [  22.5 ns ...  21.6 ns ]      -3.97%
point_containment/inside                           [   4.9 ns ...   4.9 ns ]      +0.01%
point_containment/outside                          [   4.4 ns ...   4.4 ns ]      -0.04%
point_containment/boundary                         [   4.9 ns ...   4.9 ns ]      +0.39%

Comparing to ddb1090

@TestingPlant TestingPlant marked this pull request as ready for review July 1, 2025 05:10
@TestingPlant (Collaborator)

This PR is now ready for review. Here's a general overview of the most important changes:

  • We are now using a custom lock-free channel from packet_channel.rs to pass c2s packets from the proxy code to the ingress code. I have checked its safety with Miri. Originally, Hyperion used a dashmap mapping connection ids to BytesMut. Dashmap locks internally, so it is not ideal for performance. The ingress system would copy these bytes to a Vec containing raw packet bytes to be decoded. The new packet channel is a linked list of "fragments", where each Fragment is the equivalent of a Box<[u8]> containing zero or more full contiguous packets with a length prefix. Packets are decoded directly from the packet channel reader.
  • I have changed my Valence fork to use bytes::Bytes for protocol packets so that packets can be 'static, which allows them to be sent through Bevy events. Making this change without copying the bytes to an owned container (e.g. Vec<u8>) during decoding required implementing the packet channel.
  • C2s packets are now sent through Bevy events. HandlerRegistry has been removed.
  • S2c packets are now ordered based on the order of the unicast/broadcast call instead of using system-order. I believe this is more intuitive, and systems can explicitly specify when they run in relation to other systems if needed.
  • command_channel.rs is based on Bevy's Commands implementation, adapted into an mpsc channel.

Most of the other changes are relatively trivial.
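The fragment framing described in the first bullet can be illustrated with a simplified, single-threaded sketch. This is not Hyperion's packet_channel.rs (the real channel is lock-free, and the u32 little-endian length prefix used here is an assumption for illustration): each fragment is a boxed byte slice holding zero or more length-prefixed packets, and the reader borrows packet slices directly out of the fragment instead of copying them into an owned buffer.

```rust
// A fragment: zero or more contiguous packets, each prefixed with a
// u32 little-endian length (prefix format assumed for this sketch).
struct Fragment {
    bytes: Box<[u8]>,
}

fn encode_fragment(packets: &[&[u8]]) -> Fragment {
    let mut bytes = Vec::new();
    for p in packets {
        bytes.extend_from_slice(&(p.len() as u32).to_le_bytes());
        bytes.extend_from_slice(p);
    }
    Fragment { bytes: bytes.into_boxed_slice() }
}

// The reader decodes packets directly out of the fragment, yielding
// borrowed slices instead of copying into an owned Vec<u8>.
fn packets(frag: &Fragment) -> Vec<&[u8]> {
    let mut out = Vec::new();
    let mut cursor = 0;
    while cursor + 4 <= frag.bytes.len() {
        let len_bytes: [u8; 4] = frag.bytes[cursor..cursor + 4].try_into().unwrap();
        let len = u32::from_le_bytes(len_bytes) as usize;
        cursor += 4;
        out.push(&frag.bytes[cursor..cursor + len]);
        cursor += len;
    }
    out
}

fn main() {
    let frag = encode_fragment(&[b"hello".as_slice(), b"world!".as_slice()]);
    let decoded = packets(&frag);
    assert_eq!(decoded.len(), 2);
    assert_eq!(decoded[0], b"hello".as_slice());
    assert_eq!(decoded[1], b"world!".as_slice());
}
```

Because the decoded packets borrow from the fragment's allocation, no per-packet copy is needed, which is the property that made bytes::Bytes-based 'static packets practical.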

@dyc3 (Contributor) left a comment

Haven't had the chance to actually run it yet.

Regardless, there's nothing that really sticks out to me that I would block this PR on. Is there any particular feedback you're looking for?

@SanderMertens

I still believe that the performance of this PR is acceptable

If the goal of Hyperion is to set a world record for highest CCU but you can achieve 2x that number of players by swapping out the ECS, what's stopping someone from doing that and breaking the record a month afterwards? :p

It sounds like y'all have committed to this path which is fine, but with a performance difference that big it seems to conflict with the stated goal of the project 🤔

I don't think any of the errors in #761 (comment) have been reported as issues on the Flecs discord. It looks like a few issues are caused by being on an older version, and one or two are likely application errors. Either way, it looks like they would have been easy to address.


github-actions bot commented Jul 1, 2025

Benchmark Results for general

ray_intersection/aabb_size_0.1                     [  26.2 ns ...  20.7 ns ]     -21.11%*
ray_intersection/aabb_size_1                       [  24.7 ns ...  19.1 ns ]     -22.33%*
ray_intersection/aabb_size_10                      [  23.6 ns ...  19.3 ns ]     -18.41%*
ray_intersection/ray_distance_1                    [  13.8 ns ...   4.0 ns ]     -71.03%*
ray_intersection/ray_distance_5                    [  14.0 ns ...   4.2 ns ]     -69.54%*
ray_intersection/ray_distance_20                   [  14.0 ns ...   4.1 ns ]     -70.66%*
overlap/no_overlap                                 [  25.5 ns ...  29.0 ns ]     +13.79%*
overlap/partial_overlap                            [  25.3 ns ...  28.9 ns ]     +14.19%*
overlap/full_containment                           [  22.5 ns ...  21.2 ns ]      -5.48%*
point_containment/inside                           [   4.9 ns ...   4.9 ns ]      -0.11%
point_containment/outside                          [   4.4 ns ...   4.4 ns ]      -0.11%
point_containment/boundary                         [   5.0 ns ...   5.0 ns ]      +0.15%

Comparing to ddb1090

@TestingPlant (Collaborator)

If the goal of Hyperion is to set a world record for highest CCU but you can achieve 2x that number of players by swapping out the ECS, what's stopping someone from doing that and breaking the record a month afterwards?

It'd be more accurate to say that the Flecs version supports ~1.25-1.5x more players, although this is under the assumption that the relationship between ms/tick and player count is linear (which I believe it mostly is, although there are things like collision checks which likely have a greater than linear time complexity).

The main limitation will be finding enough players to join the event. It's always a possibility that someone else will run their own event with more players without any changes to Hyperion. It's also quite possible for someone else to just rent a more powerful VPS which would be relatively cheap considering that the event would only last a couple of hours. I don't think our choice of ECS will stop anyone else from claiming the record later.

Also, stability is very important because crashes in the event are not acceptable. I'd love to see the crashes in Flecs get fixed, but I know that debugging them without an MCVE is difficult and will take a long time.

It sounds like y'all have committed to this path which is fine, but with a performance difference that big it seems to conflict with the stated goal of the project 🤔

It's important to keep the specs of the VPS I've been testing on in mind. Using a CPU with better or worse performance will decrease or increase the performance difference respectively. The CPU benchmark of my VPS (ARM Neoverse-N1 4 Core) shows that it is quite slow compared to more common server CPUs, so the performance difference will be a lot larger on my VPS.
[image: CPU benchmark of the ARM Neoverse-N1 4-core VPS]

Here is the comparison between the 4-core Neoverse and the M3 Max used in the benchmarks in the README (for reference, the 4-core Neoverse gets 13.5ms/tick in the Flecs version while the M3 Max gets 1.42ms/tick in the Flecs version):
[image: CPU benchmark comparison between the 4-core Neoverse-N1 and the M3 Max]

It's somewhat misleading to take the ms/tick from my VPS benchmarks at face value, considering that the VPS we use for the final event will have a substantially more powerful CPU.

I don't think any of the errors in #761 (comment) have been reported as issues on the Flecs discord.

I reported Indra-db/Flecs-Rust#253 to the Flecs issue tracker. I haven't reported the other issues because I don't think posting the error messages without an MCVE would be helpful, but I can still post them to the Discord or to the issue tracker if you'd like.

@dyc3 (Contributor) left a comment

Great work!

@SanderMertens

It'd be more accurate to say that the Flecs version supports ~1.25-1.5x more players

1.5x faster with an older version of Flecs. The reason I said 2x is because the latest Flecs release comes with a lot of performance improvements: https://ajmmertens.medium.com/flecs-4-1-is-out-fab4f32e36f6

[screenshot: Flecs 4.1 performance improvements]

When this version is measured against the latest Bevy, Flecs consistently shows better performance, which is why I expect the difference to be closer to 2x than to 1.5x (1.5x by itself is already a big difference though: 10,000 players vs. 6,666 players!).

this is under the assumption that the relationship between ms/tick and player count is linear

It should be: for a large enough number of players, all other factors should become insignificant. Unless you're doing collision checks by comparing each entity against every other entity, player count should be the dominating factor.

I'd love to see the crashes in Flecs get fixed, but I know that debugging them without an MCVE is difficult and will take a long time

I don't think this assumption is backed up by data. I've worked with many large and small game studios, and once a bug is reported it usually takes 1-2 weeks to get fixed, sometimes less. None of the issues listed in your message look particularly difficult to fix. If there's an easy way for me to reproduce them by running Hyperion, I'd also be happy to take a look.

It's somewhat misleading to take the ms/tick from my VPS benchmarks at face value, considering that the VPS we use for the final event will have a substantially more powerful CPU.

I don't think there is reason to assume that either Flecs or Bevy would perform better/worse on different hardware, so the performance difference (and therefore difference in player count) should be roughly the same between platforms.

I haven't reported the other issues because I don't think posting the error messages without an MCVE would be helpful, but I can still post them to the Discord or to the issue tracker if you'd like.

Before you do that we should make sure you're on the latest Flecs version. Some of the issues might just disappear once you're on the latest version. I'll leave a comment on the issue in the Rust repo, since I think I know what's going on there.

@TestingPlant (Collaborator)

If there's an easy way for me to reproduce them by running Hyperion I'd also be happy to take a look.

Download the update-flecs-3 branch from https://github.com/TestingPlant/hyperion. I have updated it to the latest Flecs-Rust commit on the main branch. Then run cargo run --bin tag.

On another terminal, download the optimize branch from https://github.com/andrewgazelka/proptest-mc-bot and run cargo run -- 127.0.0.1:25565 5000

Note that on Linux/macOS, you might need to run ulimit -Sn 5000 in the same terminal beforehand to increase the file descriptor limit.

once a bug is reported it usually takes between 1-2 weeks to get fixed, sometimes faster.

I believe some of these issues might require time to design breaking API changes to prevent triggering internal asserts or undefined behavior through the safe Rust API in Flecs. Ideally, the safe API would make it impossible to trigger internal asserts/UB at all, rather than Hyperion merely needing to avoid using the Flecs API in unsound ways, since the latter allows us to accidentally make changes that reintroduce the errors. However, I am not familiar with the Flecs internals, so I do not know for certain whether breaking API changes would be needed.

I don't think there is reason to assume that either Flecs or Bevy would perform better/worse on different hardware, so the performance difference (and therefore difference in player count) should be roughly the same between platforms.

To clarify, I'm referring to the difference in ms/tick (ms/tick of the new version minus ms/tick of the old version). Based on my assumptions earlier1, this PR would increase the ms/tick from 1.42 to ~2 on the M3 Max. It still increased by the same factor (1.25-1.5x, or 2x estimated from the new Flecs version), but it is still more than performant enough for our 10k player event, since we only need to run 20 ticks/second.

Footnotes

  1. Unfortunately we likely won't be able to get real benchmarks from the MacBook since Andrew is currently busy.

@SanderMertens

I believe some of these issues might require time to design breaking API changes to prevent triggering internal asserts or undefined behavior using the safe Rust API in Flecs.

None of the error messages so far suggest there's a fundamental issue in the design of the flecs rust API, so I think this is a bit of a premature assumption. I'll try to reproduce the issues.

@Kumpelinus (Collaborator) left a comment

Looks good to me

@TestingPlant TestingPlant added this pull request to the merge queue Jul 3, 2025
Merged via the queue into main with commit 4f2bafe Jul 3, 2025
11 of 12 checks passed
@TestingPlant TestingPlant deleted the andrew/bevy branch July 3, 2025 01:25
github-merge-queue bot pushed a commit that referenced this pull request Jul 4, 2025
This reflects the changes made in #882 and #914.

I have also adjusted the claim that Hyperion supports 170k+ players to
10k+ players. We do not currently have any benchmarks supporting the
170k+ player claim before or after the Bevy PR. I believe this number
was extrapolated from the 5k player ms/tick, where (1.42 ms/tick)/(5k
players) = (50 ms/tick)/(176k players [extrapolated]), but we do not
have a real benchmark running 170k+ players.
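The extrapolation described in that commit message can be reproduced in a few lines, assuming (as the message notes) that ms/tick scales linearly with player count. This only restates the arithmetic behind the old claim; it is not a benchmark.

```rust
fn main() {
    let ms_per_tick = 1.42_f64; // measured at 5k players on the M3 Max
    let players = 5_000.0;
    let tick_budget_ms = 50.0; // 20 ticks/second

    // Linear scaling: players supportable within one 50 ms tick budget.
    let extrapolated = tick_budget_ms * players / ms_per_tick;
    assert!((extrapolated - 176_056.0).abs() < 1.0); // ~176k players
    println!("{extrapolated:.0} players (extrapolated, assuming linear scaling)");
}
```

The 10k+ figure in the README is the conservative replacement: it is backed by an actual 5k-player benchmark plus headroom, rather than a 35x linear extrapolation.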
Successfully merging this pull request may close these issues:

  • RFC: Bevy vs flecs
  • Possible undefined behavior with bots connected

6 participants