-
-
Notifications
You must be signed in to change notification settings - Fork 29
refactor: bevy #882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: bevy #882
Conversation
Benchmark Results for general
Comparing to d817614 |
Benchmark Results for general
Comparing to 4c402d0 |
Benchmark Results for general
Comparing to 4c402d0 |
Entity indices are unique for all living entities. The generation does not need to be included in the id.
Thread locals can be used from the thread_local crate. Bevy does not appear to expose a thread-specific identifier for the original optimized ThreadLocal to work.
This reduces the code complexity and also allows the code to check if another player with the same name exists when a player joins without race conditions.
Systems are expected to establish a happens-before relationship through explicit system ordering, sending multiple packets in the same system, etc to maintain the order of a sequence of packets relative to each other.
Adds all components to the spawned entity in the spawn command that are present in the original version of Hyperion. Note that the original version still has some components commented out.
When the Play state is added, players receive chunk packets. It seems like the client ignores chunk packets sent before GameJoinS2c. To fix this, the Play state is only added after GameJoinS2c
Without this, some threads would finish their work and be idle while other threads are working. This change ensures that all threads are able to work to finish a system faster.
Benchmark Results for general
Comparing to ddb1090 |
This PR is now ready for review. Here's a general overview of the most important changes:
Most of the other changes are relatively trivial. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't had the chance to actually run it yet.
Regardless, there's nothing that really sticks out to me that I would block this PR on. Is there any particular feedback you're looking for?
If the goal of Hyperion is to set a world record for highest CCU but you can achieve 2x that number of players by swapping out the ECS, what's stopping someone from doing that and breaking the record a month afterwards? :p It sounds like y'all have committed to this path which is fine, but with a performance difference that big it seems to conflict with the stated goal of the project 🤔 I don't think any of the errors in #761 (comment) have been reported as issues on the Flecs discord. It looks like a few issues are caused by being on an older version, and one or two are likely application errors. Either way looks like it would've been easy to address them. |
Benchmark Results for general
Comparing to ddb1090 |
It'd be more accurate to say that the Flecs version supports ~1.25-1.5x more players, although this is under the assumption that the relationship between ms/tick and player count is linear (which I believe it mostly is, although there are things like collision checks which likely have a greater than linear time complexity). The main limitation will be finding enough players to join the event. It's always a possibility that someone else will run their own event with more players without any changes to Hyperion. It's also quite possible for someone else to just rent a more powerful VPS which would be relatively cheap considering that the event would only last a couple of hours. I don't think our choice of ECS will stop anyone else from claiming the record later. Also, stability is very important because crashes in the event are not acceptable. I'd love to see the crashes in Flecs get fixed, but I know that debugging them without an MCVE is difficult and will take a long time.
It's important to keep the specs of the VPS I've been testing on in mind. Using a CPU with better or worse performance will decrease or increase the performance difference respectively. The CPU benchmark of my VPS (ARM Neoverse-N1 4 Core) shows that it is quite slow compared to more common server CPUs, so the performance difference will be a lot larger on my VPS. Here is the comparison between the 4-core Neoverse and the M3 Max used in the benchmarks in the README (for reference, the 4-core Neoverse gets 13.5ms/tick in the Flecs version while the M3 Max gets 1.42ms/tick in the Flecs version): It's somewhat misleading to take the ms/tick from my VPS benchmarks at face value, considering that the VPS we use for the final event will have a substantially more powerful CPU.
I reported Indra-db/Flecs-Rust#253 to the Flecs issue tracker. I haven't reported the other issues because I don't think posting the error messages without an MCVE would be helpful, but I can still post them to the Discord or to the issue tracker if you'd like. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
1.5x faster with an older version of Flecs. The reason I said 2x is because the latest Flecs release comes with a lot of performance improvements: https://ajmmertens.medium.com/flecs-4-1-is-out-fab4f32e36f6 ![]() When this version is measured against latest bevy Flecs consistently shows better performance which is why I expect the difference to be closer than 2x than 1.5x (1.5x by itself already is a big difference though: 10.000 players vs. 6666 players!).
It should be, for a large enough number of players all other factors should become insignificant. Unless you're doing collision checks by comparing each entity against every other entity, player count should be the dominating factor.
I don't think this assumption is backed up by data. I've worked with many large and small game studios, and once a bug is reported it usually takes between 1-2 weeks to get fixed, sometimes faster. None of the issues listed in your message look particularly difficult to fix. If there's an easy way for me to reproduce them by running Hyperion I'd also be happy to take a look.
I don't think there is reason to assume that either Flecs or Bevy would perform better/worse on different hardware, so the performance difference (and therefore difference in player count) should be roughly the same between platforms.
Before you do that we should make sure you're on the latest Flecs version. Some of the issues might just disappear once you're on the latest version. I'll leave a comment on the issue in the Rust repo, since I think I know what's going on there. |
Download the On another terminal, download the Note that on Linux/macOS, you might need to run
I believe some of these issues might require time to design breaking API changes to prevent triggering internal asserts or undefined behavior using the safe Rust API in Flecs. Ideally we would not be able to trigger internal asserts/UB through the safe API at all instead of just needing to avoid using the Flecs API in unsound ways within Hyperion since the latter allows us to accidentally make changes that reintroduce the errors. However I am not familiar with the Flecs internals so I do not know for certain whether breaking API changes would be needed.
To clarify, I'm referring to the difference between ms/tick (ms/tick of the new version minus ms/tick of the old version). Based on my assumptions earlier1, this PR would increase the ms/tick from 1.42 ms/tick to ~2ms/tick on the M3 Max. It still increased by the same amount of times (1.25-1.5x, or 2x estimated from the new Flecs version), but it is still more than performant enough for our 10k player event since we only need to run 20 ticks/second. Footnotes
|
None of the error messages so far suggest there's a fundamental issue in the design of the flecs rust API, so I think this is a bit of a premature assumption. I'll try to reproduce the issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
This reflects the changes made in #882 and #914. I have also adjusted the claim that Hyperion supports 170k+ players to 10k+ players. We do not currently have any benchmarks supporting the 170k+ player claim before or after the Bevy PR. I believe this number was extrapolated from the 5k player ms/tick, where (1.42 ms/tick)/(5k players) = (50 ms/tick)/(176k players [extrapolated]), but we do not have a real benchmark running 170k+ players.
This is an experiment
Related: valence-rs/valence#685
Why switch from flecs to Bevy
Why stay in flecs
Issues that need to be resolved:
system-order
(which is core to hyperion) will workhyperion/crates/system-order/src/lib.rs
Lines 1 to 146 in 9b8c3d0
system-order
is used for re-ordering packets once they reach the proxy. I definitely feel like flecs has fewer things that would hard roadblock us... however sadly flecs rust has some quality of life issues that can be annoying