Skip to content

Conversation

JustCallMeRay
Copy link
Contributor

@JustCallMeRay JustCallMeRay commented Jan 6, 2025

I have added constexpr specifiers to Node and Edge classes. These allow for the specified functions and objects to be ran at compile time instead of runtime, resulting in a speed increase.

A possible use case that might result in a speed up:

// load a pylogenic tree from 3 databases
// Base nodes can be constructed at compile time
Node<Enum, Organism> bacteriaBase {  Enum.BacteriaBase, Organism{} };
Node<Enum, Organism> archea {  Enum.Archea, Organism{} };
Node<Enum, Organism> eukaryota {  Enum.Eukaryota, Organism{} };
// Edges can also be constructed at compile time
Edge ...
// Code cannot be run at compile time so everything after this line cannot be sped up
std::vector<Node<Enum, Organism>> nodes = LoadFromDatabase();

I have changed the UserId to be able to take anything as a template specialisation which may have a possible pain point with const char* being picked over strings and therefor doing a shallow copy over a deep string copy, so I have a static assert to make sure that never happens.

@ZigRazor
Copy link
Owner

@JustCallMeRay this pull request is interesting, Have you time to complete it?

@JustCallMeRay
Copy link
Contributor Author

JustCallMeRay commented Jun 29, 2025

Hiya, sorry work happens I have less time to contribute to open source stuff.

So the constexpr ends at the graph class because it uses std::unordered_map which is not constexpr. There are third party alternatives to the standard which are better for certain things than the standard, for example frozen I think would do it.

However if the user does not want to use frozen (say it's slower or doesn't work on embedded systems for example), so I had thought about having cmake configure a simple header which would be something like #include <@CustomMap@> and template<typename Key, typename Vlaue> using CXXGraphMap = @map_namespace@::@map_name@<Key, Value>;

But didn't get around to it.

@JustCallMeRay
Copy link
Contributor Author

JustCallMeRay commented Jun 30, 2025

I had a quick look at the frozen benchmarks and they look much quicker than std::unordered_map so maybe worth doing a

#if __has_include(<frozen/unordered_map.h>)
# include <frozen/unordered_map.h>
# define CXXGraph_ConstexprMap constexpr
   namespace CxxGraph {
     template<typename Key, typename Value>
     using MapType = frozen::unordered_map<Key, Value>;
  }
#else
# include <unordered_map>
# define CXXGraph_ConstexprMap
   namespace CxxGraph {
      template<typename Key, typename Value>
      using MapType = std::unordered_map<Key, Value>;
   }
#endif
Benchmarks

JustCallMeRay ➜ /workspaces/frozen (06bee53) $ $compare benchmarksfiltered $fz_bench BM_.*Std.* $fz_bench BM_.*Fz.*
RUNNING: /workspaces/frozen/build/benchmarks/frozen.benchmark --benchmark_filter=BM_.*Std.* --benchmark_out=/tmp/tmp17l9bmas
2025-06-30T08:58:11+00:00
Running /workspaces/frozen/build/benchmarks/frozen.benchmark
Run on (2 X 3491.59 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x1)
  L1 Instruction 32 KiB (x1)
  L2 Unified 1280 KiB (x1)
  L3 Unified 49152 KiB (x1)
Load Average: 0.58, 1.00, 0.84
---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
BM_IntInStdSet                    149 ns          148 ns      4914607
BM_IntInStdArray                  123 ns          122 ns      5272976
BM_IntNotInStdSet                 179 ns          175 ns      4121661
BM_IntNotInStdArray               196 ns          196 ns      3802926
BM_StrInStdSet                    561 ns          561 ns      1363840
BM_StrInStdArray                  511 ns          510 ns      1378160
BM_StrNotInStdSet                 562 ns          560 ns      1364904
BM_StrNotInStdArray               647 ns          633 ns      1140156
BM_StrInStdUnorderedMap           308 ns          308 ns      2193992
BM_StrNotInStdUnorderedMap        255 ns          255 ns      2431215
BM_IntInStdUnorderedSet          93.6 ns         93.5 ns      7483176
BM_IntInStdArray                  124 ns          124 ns      5743759
BM_IntNotInStdUnorderedSet        107 ns          107 ns      6562972
BM_IntNotInStdArray               195 ns          193 ns      3523203
BM_StrInStdUnorderedSet           380 ns          369 ns      2076260
BM_StrInStdArray                  531 ns          531 ns      1270593
BM_StrNotInStdUnorderedSet        253 ns          253 ns      2624767
BM_StrNotInStdArray               642 ns          641 ns      1107392
RUNNING: /workspaces/frozen/build/benchmarks/frozen.benchmark --benchmark_filter=BM_.*Fz.* --benchmark_out=/tmp/tmpfntue7lb
2025-06-30T08:58:27+00:00
Running /workspaces/frozen/build/benchmarks/frozen.benchmark
Run on (2 X 3452.75 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x1)
  L1 Instruction 32 KiB (x1)
  L2 Unified 1280 KiB (x1)
  L3 Unified 49152 KiB (x1)
Load Average: 0.89, 1.05, 0.86
--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_IntInFzSet                   64.6 ns         64.5 ns     11606088
BM_IntNotInFzSet                61.4 ns         61.2 ns     12023716
BM_StrInFzSet                    445 ns          442 ns      1578099
BM_StrNotInFzSet                 314 ns          313 ns      2227839
BM_StrInFzUnorderedMap           348 ns          347 ns      2054688
BM_StrNotInFzUnorderedMap        196 ns          196 ns      3654870
BM_IntInFzUnorderedSet           153 ns          152 ns      4648482
BM_IntNotInFzUnorderedSet        135 ns          131 ns      6374913
BM_StrInFzUnorderedSet           350 ns          349 ns      2036144
BM_StrNotInFzUnorderedSet        219 ns          219 ns      3149037
Comparing BM_.*Std.* (from /workspaces/frozen/build/benchmarks/frozen.benchmark) to BM_.*Fz.* (from /workspaces/frozen/build/benchmarks/frozen.benchmark)
Benchmark                                    Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------------
[BM_.*Std.* vs. BM_.*Fz.*]                -0.5653         -0.5647           149            65           148            64
[BM_.*Std.* vs. BM_.*Fz.*]                -0.5004         -0.4980           123            61           122            61
[BM_.*Std.* vs. BM_.*Fz.*]                +1.4825         +1.5204           179           445           175           442
[BM_.*Std.* vs. BM_.*Fz.*]                +0.6056         +0.6008           196           314           196           313
[BM_.*Std.* vs. BM_.*Fz.*]                -0.3792         -0.3816           561           348           561           347
[BM_.*Std.* vs. BM_.*Fz.*]                -0.6162         -0.6161           511           196           510           196
[BM_.*Std.* vs. BM_.*Fz.*]                -0.7282         -0.7284           562           153           560           152
[BM_.*Std.* vs. BM_.*Fz.*]                -0.7916         -0.7933           647           135           633           131
[BM_.*Std.* vs. BM_.*Fz.*]                +0.1354         +0.1326           308           350           308           349
[BM_.*Std.* vs. BM_.*Fz.*]                -0.1418         -0.1410           255           219           255           219
[BM_.*Std.* vs. BM_.*Fz.*]_pvalue          0.3498          0.3498      U Test, Repetitions: 18 vs 10
OVERALL_GEOMEAN                           -0.2850         -0.2849             0             0             0             0

@ZigRazor
Copy link
Owner

I had a quick look at the frozen benchmarks and they look much quicker than std::unordered_map so maybe worth doing a

#if __has_include(<frozen/unordered_map.h>)
# include <frozen/unordered_map.h>
# define CXXGraph_ConstexprMap constexpr
   namespace CxxGraph {
     template<typename Key, typename Value>
     using MapType = frozen::unordered_map<Key, Value>;
  }
#else
# include <unordered_map>
# define CXXGraph_ConstexprMap
   namespace CxxGraph {
      template<typename Key, typename Value>
      using MapType = std::unordered_map<Key, Value>;
   }
#endif

Benchmarks


JustCallMeRay ➜ /workspaces/frozen (06bee53) $ $compare benchmarksfiltered $fz_bench BM_.*Std.* $fz_bench BM_.*Fz.*
RUNNING: /workspaces/frozen/build/benchmarks/frozen.benchmark --benchmark_filter=BM_.*Std.* --benchmark_out=/tmp/tmp17l9bmas
2025-06-30T08:58:11+00:00
Running /workspaces/frozen/build/benchmarks/frozen.benchmark
Run on (2 X 3491.59 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x1)
  L1 Instruction 32 KiB (x1)
  L2 Unified 1280 KiB (x1)
  L3 Unified 49152 KiB (x1)
Load Average: 0.58, 1.00, 0.84
---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
BM_IntInStdSet                    149 ns          148 ns      4914607
BM_IntInStdArray                  123 ns          122 ns      5272976
BM_IntNotInStdSet                 179 ns          175 ns      4121661
BM_IntNotInStdArray               196 ns          196 ns      3802926
BM_StrInStdSet                    561 ns          561 ns      1363840
BM_StrInStdArray                  511 ns          510 ns      1378160
BM_StrNotInStdSet                 562 ns          560 ns      1364904
BM_StrNotInStdArray               647 ns          633 ns      1140156
BM_StrInStdUnorderedMap           308 ns          308 ns      2193992
BM_StrNotInStdUnorderedMap        255 ns          255 ns      2431215
BM_IntInStdUnorderedSet          93.6 ns         93.5 ns      7483176
BM_IntInStdArray                  124 ns          124 ns      5743759
BM_IntNotInStdUnorderedSet        107 ns          107 ns      6562972
BM_IntNotInStdArray               195 ns          193 ns      3523203
BM_StrInStdUnorderedSet           380 ns          369 ns      2076260
BM_StrInStdArray                  531 ns          531 ns      1270593
BM_StrNotInStdUnorderedSet        253 ns          253 ns      2624767
BM_StrNotInStdArray               642 ns          641 ns      1107392
RUNNING: /workspaces/frozen/build/benchmarks/frozen.benchmark --benchmark_filter=BM_.*Fz.* --benchmark_out=/tmp/tmpfntue7lb
2025-06-30T08:58:27+00:00
Running /workspaces/frozen/build/benchmarks/frozen.benchmark
Run on (2 X 3452.75 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x1)
  L1 Instruction 32 KiB (x1)
  L2 Unified 1280 KiB (x1)
  L3 Unified 49152 KiB (x1)
Load Average: 0.89, 1.05, 0.86
--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_IntInFzSet                   64.6 ns         64.5 ns     11606088
BM_IntNotInFzSet                61.4 ns         61.2 ns     12023716
BM_StrInFzSet                    445 ns          442 ns      1578099
BM_StrNotInFzSet                 314 ns          313 ns      2227839
BM_StrInFzUnorderedMap           348 ns          347 ns      2054688
BM_StrNotInFzUnorderedMap        196 ns          196 ns      3654870
BM_IntInFzUnorderedSet           153 ns          152 ns      4648482
BM_IntNotInFzUnorderedSet        135 ns          131 ns      6374913
BM_StrInFzUnorderedSet           350 ns          349 ns      2036144
BM_StrNotInFzUnorderedSet        219 ns          219 ns      3149037
Comparing BM_.*Std.* (from /workspaces/frozen/build/benchmarks/frozen.benchmark) to BM_.*Fz.* (from /workspaces/frozen/build/benchmarks/frozen.benchmark)
Benchmark                                    Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------------
[BM_.*Std.* vs. BM_.*Fz.*]                -0.5653         -0.5647           149            65           148            64
[BM_.*Std.* vs. BM_.*Fz.*]                -0.5004         -0.4980           123            61           122            61
[BM_.*Std.* vs. BM_.*Fz.*]                +1.4825         +1.5204           179           445           175           442
[BM_.*Std.* vs. BM_.*Fz.*]                +0.6056         +0.6008           196           314           196           313
[BM_.*Std.* vs. BM_.*Fz.*]                -0.3792         -0.3816           561           348           561           347
[BM_.*Std.* vs. BM_.*Fz.*]                -0.6162         -0.6161           511           196           510           196
[BM_.*Std.* vs. BM_.*Fz.*]                -0.7282         -0.7284           562           153           560           152
[BM_.*Std.* vs. BM_.*Fz.*]                -0.7916         -0.7933           647           135           633           131
[BM_.*Std.* vs. BM_.*Fz.*]                +0.1354         +0.1326           308           350           308           349
[BM_.*Std.* vs. BM_.*Fz.*]                -0.1418         -0.1410           255           219           255           219
[BM_.*Std.* vs. BM_.*Fz.*]_pvalue          0.3498          0.3498      U Test, Repetitions: 18 vs 10
OVERALL_GEOMEAN                           -0.2850         -0.2849             0             0             0             0

I think the advantage in this moment does not justify the introduction of an external library.
Maybe we can think to define somenthing to let the users the possibility to use a different implementation of stdlib as in this case.
A compiler switch for example.
What do you think?
P.S. : Also the memory consumption is an important parameters for benchmark, we don't know at this moment if the frozen library has more overhead

@JustCallMeRay
Copy link
Contributor Author

JustCallMeRay commented Jun 30, 2025

I think the advantage in this moment does not justify the introduction of an external library.

I agree, all the things I have been thinking about would be an opt in dependency rather than a hard requirement.

let the users the possibility to use a different implementation

Yes, the user has to tell us what headers to include, what namespace the class is under and what the name of the class is. There are several ways I can think of to do the include:

  1. #if __has_include(<frozen/unordered_map.h>), i.e. if frozen headers are available use those instead of the std lib.
    • This means that which map types are supported are dicated under this repo and users would not be able to do something that is not yet supported
  2. Import from a pre-processor definition so the user would do #define CXXGraph_MapHeader <frozen/unordered_map.h>
    • This means a user can attempt to use any class and would be expected to find any unexpected behaviour themselves.
    • This would be closest to the suggestion: "A compiler switch for example."
  3. Handling it in a CMake config file, this could allow either of the above.
  4. A mix of the first two,
    + I don't know why someone would have frozen and not be interested in using it
    + If they don't have something we choose to explicitly support we can revert to the second option.

For the namespace and class:

  • If 1 it's really simple
  • if 2 the user would need to #define more things
  • If 3 the user would define it in CMake but no where else
  • If 4, see 2

memory consumption

I doubt frozen will have a much higher overhead. To find out for sure I would have to write memory benchmarks for frozen to find out (the ones above are from their repo), as they use google benchmark this doesn't seem difficult, someone here has written something that could easily be added but I am not going to follow through with that as I think it's somewhat irrelevant for an optional dependency.

If I get some time I will update the branch and implement 1 then mark the PR as ready

@ZigRazor
Copy link
Owner

ZigRazor commented Jul 1, 2025

@JustCallMeRay thank you so much! I Agree with you!

@JustCallMeRay JustCallMeRay force-pushed the constexpr-additions branch from c0cf7ad to ee6214c Compare July 1, 2025 18:54
@JustCallMeRay JustCallMeRay force-pushed the constexpr-additions branch from ee6214c to 4faf229 Compare July 1, 2025 21:15
@ZigRazor ZigRazor marked this pull request as ready for review July 2, 2025 08:04
@ZigRazor ZigRazor marked this pull request as draft July 2, 2025 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants