Skip to content

Add Quicksort benchmark and export benchmark results as an artifact #477

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 25, 2024

Conversation

tbagrel1
Copy link
Member

@tbagrel1 tbagrel1 commented Sep 10, 2024

This PR adds a proper benchmark for quicksort, across 3 methods and 3 list sizes, as encouraged by this thread. It also runs and exports benchmark results to an artifact for each GHC version in the CI matrix so it will be easier to check for potential performance regression in the future. The PR builds upon #479 and #480

Here are the results I got :

GHC 9.4.8

/!\ Done locally on my computer, so ordering between methods is valid, but absolute times should not be compared to other results below /!\

  quicksort
    size 1000
      linArrayQuicksort:                                                         OK
        773  μs ±  65 μs
      lazyListQuicksort:                                                         OK
        276  μs ±  16 μs
      stdLibSort:                                                                OK
        219  μs ±  12 μs
    size 50000
      linArrayQuicksort:                                                         OK
        63.9 ms ± 3.2 ms
      lazyListQuicksort:                                                         OK
        32.4 ms ± 2.3 ms
      stdLibSort:                                                                OK
        38.1 ms ± 1.9 ms
    size 1000000
      linArrayQuicksort:                                                         OK
        3.557 s ± 322 ms
      lazyListQuicksort:                                                         OK
        2.135 s ±  72 ms
      stdLibSort:                                                                OK
        4.309 s ± 212 ms

Results below are extracted from artifacts of https://github.com/tweag/linear-base/actions/runs/10851885368

GHC 9.6.6

  quicksort
    size 1000
      linArrayQuicksort:                                                         OK
        1.37 ms ±  91 μs, 6.5 MB allocated,  15 KB copied,  89 MB peak memory
      lazyListQuicksort:                                                         OK
        320  μs ±  22 μs, 1.0 MB allocated,  13 KB copied,  89 MB peak memory
      stdLibSort:                                                                OK
        246  μs ±  24 μs, 665 KB allocated, 3.4 KB copied,  89 MB peak memory
    size 50000
      linArrayQuicksort:                                                         OK
        115  ms ±  11 ms, 520 MB allocated, 1.1 MB copied,  89 MB peak memory
      lazyListQuicksort:                                                         OK
        44.3 ms ± 2.0 ms,  85 MB allocated,  25 MB copied,  89 MB peak memory
      stdLibSort:                                                                OK
        42.5 ms ± 2.3 ms,  52 MB allocated,  16 MB copied,  89 MB peak memory
    size 1000000
      linArrayQuicksort:                                                         OK
        3.105 s ± 239 ms,  13 GB allocated,  47 MB copied, 227 MB peak memory
      lazyListQuicksort:                                                         OK
        1.730 s ± 140 ms, 2.1 GB allocated, 910 MB copied, 484 MB peak memory
      stdLibSort:                                                                OK
        2.299 s ±  70 ms, 1.3 GB allocated, 996 MB copied, 484 MB peak memory

GHC 9.8.2

  quicksort
    size 1000
      linArrayQuicksort:                                                         OK
        145  μs ±  12 μs, 247 KB allocated,  42 B  copied,  82 MB peak memory
      lazyListQuicksort:                                                         OK
        303  μs ±  30 μs, 1.0 MB allocated,  13 KB copied,  82 MB peak memory
      stdLibSort:                                                                OK
        241  μs ±  11 μs, 665 KB allocated, 3.3 KB copied,  82 MB peak memory
    size 50000
      linArrayQuicksort:                                                         OK
        14.3 ms ± 1.2 ms,  15 MB allocated, 1.0 MB copied,  82 MB peak memory
      lazyListQuicksort:                                                         OK
        43.5 ms ± 2.9 ms,  85 MB allocated,  24 MB copied,  82 MB peak memory
      stdLibSort:                                                                OK
        42.2 ms ± 1.4 ms,  52 MB allocated,  16 MB copied,  82 MB peak memory
    size 1000000
      linArrayQuicksort:                                                         OK
        471  ms ±  33 ms, 369 MB allocated,  43 MB copied, 318 MB peak memory
      lazyListQuicksort:                                                         OK
        1.655 s ± 111 ms, 2.1 GB allocated, 928 MB copied, 492 MB peak memory
      stdLibSort:                                                                OK
        2.308 s ± 9.4 ms, 1.3 GB allocated, 996 MB copied, 492 MB peak memory

GHC 9.10.1

  quicksort
    size 1000
      linArrayQuicksort:                                                         OK
        152  μs ±  11 μs, 247 KB allocated,  42 B  copied,  88 MB peak memory
      lazyListQuicksort:                                                         OK
        318  μs ±  13 μs, 1.0 MB allocated,  13 KB copied,  88 MB peak memory
      stdLibSort:                                                                OK
        242  μs ±  24 μs, 665 KB allocated, 3.4 KB copied,  88 MB peak memory
    size 50000
      linArrayQuicksort:                                                         OK
        15.3 ms ± 1.4 ms,  15 MB allocated, 947 KB copied,  88 MB peak memory
      lazyListQuicksort:                                                         OK
        47.6 ms ± 1.8 ms,  85 MB allocated,  24 MB copied,  88 MB peak memory
      stdLibSort:                                                                OK
        40.6 ms ± 3.5 ms,  51 MB allocated, 9.9 MB copied,  88 MB peak memory
    size 1000000
      linArrayQuicksort:                                                         OK
        602  ms ± 6.6 ms, 369 MB allocated,  46 MB copied, 318 MB peak memory
      lazyListQuicksort:                                                         OK
        1.808 s ±  11 ms, 2.1 GB allocated, 900 MB copied, 498 MB peak memory
      stdLibSort:                                                                OK
        2.827 s ± 238 ms, 1.3 GB allocated, 996 MB copied, 498 MB peak memory

N.B as far as I know, peak memory usage reporting is quite unreliable when several benchmarks are run in a sequence. In particular, I don't think peak memory can go down, only up if the next bench uses more memory. On the other hand, the "allocated memory" and "copied memory" reporting seems decently reliable.

Conclusion

It appears that starting from GHC 9.8, the performance of quicksort using the linear arrays provided by Linear base gets much more efficient than naive quicksort. That may be related to Ur unboxing optimization that was initially disallowed on earlier GHC versions.

@tbagrel1 tbagrel1 changed the title LinArray vs naive Quicksort performance benchmark [WIP] Add LinArray vs naive Quicksort performance benchmark Sep 11, 2024
@tbagrel1 tbagrel1 marked this pull request as draft September 11, 2024 07:07
@tbagrel1 tbagrel1 force-pushed the tbagrel1/quicksort-perf-bench branch 9 times, most recently from f0ac6b9 to 4da51b6 Compare September 11, 2024 14:49
@tbagrel1 tbagrel1 changed the title [WIP] Add LinArray vs naive Quicksort performance benchmark Add Quicksort benchmark and export benchmark results as an artifact Sep 11, 2024
@tbagrel1 tbagrel1 marked this pull request as ready for review September 11, 2024 15:27
@tbagrel1 tbagrel1 force-pushed the tbagrel1/quicksort-perf-bench branch 8 times, most recently from a993847 to 8d1533e Compare September 13, 2024 14:58
@tbagrel1
Copy link
Member Author

tbagrel1 commented Sep 13, 2024

With this PR, I enable benchmarks in the cabal matrix of the CI, so that we can check performance across versions of linear-base and versions of GHC easily.

The thing is, stack is our preferred platform, and we use stackage for package version choice and build plan. We partially support cabal on the project, but we have little constraint bounds/overrides to help cabal find a proper build plan. But for this PR, we need to use cabal bench to bench with different GHC versions.

Without any options, cabal build linear-base:bench:bench doesn't succeed with GHC 9.8/GHC 9.10 as of now. One thing we used in the past is --allow-newer, and it is enough to help the constraint solver to find a build plan for the library with all versions of GHC, but not for the benchmarks, mainly because of a constraint on filepath in transitive dependencies.

In 8d1533e, I pinned the version of filepath and the package state index. It is enough for the constraint solver to succeed, and shouldn't break unexpectedly as long as this version of filepath is compatible with other constraints.

By some very (un)fortunate coincidence, @amesgen & I found that if we enable tests (--enable-tests or tests: True in cabal.project) at the moment of building the benchmark, it helps the constraint solver to succeed, with no need to pin filepath. It might be more brittle or less brittle than the other solution, I don't know. This option is presented in 919fc6c

@tbagrel1 tbagrel1 requested a review from aspiwack September 13, 2024 15:45
Comment on lines +3 to +4
tests: True
benchmarks: True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't you just remove test:True in a recent commit? What's the story here?

Copy link
Member Author

@tbagrel1 tbagrel1 Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we used cabal mainly for the CI, and we explicitly passed --disable-tests --disable-benchmarks to make the build go faster (as we only wanted to check if the library would build). So I thought it wasn't worth it to enable tests in cabal.project to then override it to False on every command of the CI. It would result in a different behavior in dependency resolution compared to using cabal locally without options.

That's also why I cleaned up cabal.project to a bare minimum in my previous PR, so that compile-time options would be in the config file shared between both stack and cabal, with little/no overrides in cabal.project.

But now, because dependency resolution doesn't succeed on its own without options, we need to chose:

  • use a specific version of filepath to obtain a valid build plan; that's commit 3a8e25a
  • OR enable tests and benchmarks as it coincidentally makes the dependency resolution succeed with a valid build plan; that's commit b53b28a
  • OR use --allow-newer on a per-package basis, but that would require a lot of trial and error, and we might not find a valid build plan that works with all versions of GHC at once

@tbagrel1
Copy link
Member Author

tbagrel1 commented Sep 18, 2024

For reference, here is the comparison of generated Core between GHC 9.6 and GHC 9.8 for quicksort. It seems that Ur are indeed unboxed in GHC 9.8 while they aren't in GHC 9.6:

GHC 9.6

partition_$s$wpartition2
  = \ sc sc1 ww ww1 ->
      case $w$c<2 ww1 ww of {
        False ->
          case sc1 of { Array ww2 ->
          case ($wassertIndexInRange (lvl1 `cast` <Co:4> :: ...) ww ww2)
               `cast` <Co:2> :: ...
          of nt
          { __DEFAULT ->
          case $wget ww (nt `cast` <Co:3> :: ...) of { (# ww3, ww4 #) ->
          case ww3 of { Ur lVal ->
          case ($wassertIndexInRange (lvl3 `cast` <Co:4> :: ...) ww1 ww4)
               `cast` <Co:2> :: ...
          of nt1
          { __DEFAULT ->
          case $wget ww1 (nt1 `cast` <Co:3> :: ...) of { (# ww5, ww6 #) ->
          case ww5 of { Ur rVal ->
          case lVal of { I# ww7 ->
[...]

GHC 9.8

partition_$s$wpartition2
  = \ sc sc1 ww ww1 ->
      case <# ww1 ww of {
        __DEFAULT ->
          case <=# 0# ww of {
            __DEFAULT -> case partition17 of wild { };
            1# ->
              case <# ww (sizeofMutableArray# (sc1 `cast` <Co:2> :: ...)) of {
                __DEFAULT -> case partition17 of wild { };
                1# ->
                  case $wget ww sc1 of { (# ww2, ww3 #) ->
                  case <=# 0# ww1 of {
                    __DEFAULT -> case partition16 of wild { };
                    1# ->
                      case <# ww1 (sizeofMutableArray# (ww3 `cast` <Co:2> :: ...)) of {
                        __DEFAULT -> case partition16 of wild { };
                        1# ->
                          case $wget ww1 ww3 of { (# ww5, ww6 #) ->
                          case ww2 of { I# f1 ->
                          join {
                            $j
                              = case ww5 of { I# f2 ->

Quicksort-GHC-9.6-full.dump-simpl.txt
Quicksort-GHC-9.6-suppr.dump-simpl.txt
Quicksort-GHC-9.8-full.dump-simpl.txt
Quicksort-GHC-9.8-suppr.dump-simpl.txt

@aspiwack
Copy link
Member

I'll be frank: I'm not sure why unboxing of Ur starts appearing specifically in GHC 9.8. Maybe I did something. But I don't remember. (I remember there was a refactor of the worker/wrapper split code at some point which I believed to accidentally enable unboxing of Ur, but I seem to remember that it was in effect in 9.6 already. It doesn't really matter, as long as this is good now)

@tbagrel1 tbagrel1 requested a review from aspiwack September 20, 2024 11:13
@tbagrel1
Copy link
Member Author

I discovered that the test, bench, etc. subdirectories didn't match the src structure.

I added a commit to address this, but because the change would affect a dozen of files, I'm requesting your approval again :) Let me know if you think a separate PR would be better.

@aspiwack
Copy link
Member

Yes, please merge the easy things, and split the last commit in a new PR so that you can explain what you did and why.

@tbagrel1 tbagrel1 force-pushed the tbagrel1/quicksort-perf-bench branch from a78d659 to b53b28a Compare September 25, 2024 07:42
@tbagrel1 tbagrel1 merged commit 9a7dc1b into master Sep 25, 2024
11 checks passed
@tbagrel1 tbagrel1 deleted the tbagrel1/quicksort-perf-bench branch September 25, 2024 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants