Add Quicksort benchmark and export benchmark results as an artifact #477

tbagrel1 · 2024-09-10T15:54:14Z

This PR adds a proper benchmark for quicksort, across 3 methods and 3 list sizes, as encouraged by this thread. It also runs and exports benchmark results to an artifact for each GHC version in the CI matrix so it will be easier to check for potential performance regression in the future. The PR builds upon #479 and #480

Here are the results I got :

GHC 9.4.8

/!\ Done locally on my computer, so ordering between methods is valid, but absolute times should not be compared to other results below /!\

  quicksort
    size 1000
      linArrayQuicksort:                                                         OK
        773  μs ±  65 μs
      lazyListQuicksort:                                                         OK
        276  μs ±  16 μs
      stdLibSort:                                                                OK
        219  μs ±  12 μs
    size 50000
      linArrayQuicksort:                                                         OK
        63.9 ms ± 3.2 ms
      lazyListQuicksort:                                                         OK
        32.4 ms ± 2.3 ms
      stdLibSort:                                                                OK
        38.1 ms ± 1.9 ms
    size 1000000
      linArrayQuicksort:                                                         OK
        3.557 s ± 322 ms
      lazyListQuicksort:                                                         OK
        2.135 s ±  72 ms
      stdLibSort:                                                                OK
        4.309 s ± 212 ms

Results below are extracted from artifacts of https://github.com/tweag/linear-base/actions/runs/10851885368

GHC 9.6.6

  quicksort
    size 1000
      linArrayQuicksort:                                                         OK
        1.37 ms ±  91 μs, 6.5 MB allocated,  15 KB copied,  89 MB peak memory
      lazyListQuicksort:                                                         OK
        320  μs ±  22 μs, 1.0 MB allocated,  13 KB copied,  89 MB peak memory
      stdLibSort:                                                                OK
        246  μs ±  24 μs, 665 KB allocated, 3.4 KB copied,  89 MB peak memory
    size 50000
      linArrayQuicksort:                                                         OK
        115  ms ±  11 ms, 520 MB allocated, 1.1 MB copied,  89 MB peak memory
      lazyListQuicksort:                                                         OK
        44.3 ms ± 2.0 ms,  85 MB allocated,  25 MB copied,  89 MB peak memory
      stdLibSort:                                                                OK
        42.5 ms ± 2.3 ms,  52 MB allocated,  16 MB copied,  89 MB peak memory
    size 1000000
      linArrayQuicksort:                                                         OK
        3.105 s ± 239 ms,  13 GB allocated,  47 MB copied, 227 MB peak memory
      lazyListQuicksort:                                                         OK
        1.730 s ± 140 ms, 2.1 GB allocated, 910 MB copied, 484 MB peak memory
      stdLibSort:                                                                OK
        2.299 s ±  70 ms, 1.3 GB allocated, 996 MB copied, 484 MB peak memory

GHC 9.8.2

  quicksort
    size 1000
      linArrayQuicksort:                                                         OK
        145  μs ±  12 μs, 247 KB allocated,  42 B  copied,  82 MB peak memory
      lazyListQuicksort:                                                         OK
        303  μs ±  30 μs, 1.0 MB allocated,  13 KB copied,  82 MB peak memory
      stdLibSort:                                                                OK
        241  μs ±  11 μs, 665 KB allocated, 3.3 KB copied,  82 MB peak memory
    size 50000
      linArrayQuicksort:                                                         OK
        14.3 ms ± 1.2 ms,  15 MB allocated, 1.0 MB copied,  82 MB peak memory
      lazyListQuicksort:                                                         OK
        43.5 ms ± 2.9 ms,  85 MB allocated,  24 MB copied,  82 MB peak memory
      stdLibSort:                                                                OK
        42.2 ms ± 1.4 ms,  52 MB allocated,  16 MB copied,  82 MB peak memory
    size 1000000
      linArrayQuicksort:                                                         OK
        471  ms ±  33 ms, 369 MB allocated,  43 MB copied, 318 MB peak memory
      lazyListQuicksort:                                                         OK
        1.655 s ± 111 ms, 2.1 GB allocated, 928 MB copied, 492 MB peak memory
      stdLibSort:                                                                OK
        2.308 s ± 9.4 ms, 1.3 GB allocated, 996 MB copied, 492 MB peak memory

GHC 9.10.1

  quicksort
    size 1000
      linArrayQuicksort:                                                         OK
        152  μs ±  11 μs, 247 KB allocated,  42 B  copied,  88 MB peak memory
      lazyListQuicksort:                                                         OK
        318  μs ±  13 μs, 1.0 MB allocated,  13 KB copied,  88 MB peak memory
      stdLibSort:                                                                OK
        242  μs ±  24 μs, 665 KB allocated, 3.4 KB copied,  88 MB peak memory
    size 50000
      linArrayQuicksort:                                                         OK
        15.3 ms ± 1.4 ms,  15 MB allocated, 947 KB copied,  88 MB peak memory
      lazyListQuicksort:                                                         OK
        47.6 ms ± 1.8 ms,  85 MB allocated,  24 MB copied,  88 MB peak memory
      stdLibSort:                                                                OK
        40.6 ms ± 3.5 ms,  51 MB allocated, 9.9 MB copied,  88 MB peak memory
    size 1000000
      linArrayQuicksort:                                                         OK
        602  ms ± 6.6 ms, 369 MB allocated,  46 MB copied, 318 MB peak memory
      lazyListQuicksort:                                                         OK
        1.808 s ±  11 ms, 2.1 GB allocated, 900 MB copied, 498 MB peak memory
      stdLibSort:                                                                OK
        2.827 s ± 238 ms, 1.3 GB allocated, 996 MB copied, 498 MB peak memory

N.B as far as I know, peak memory usage reporting is quite unreliable when several benchmarks are run in a sequence. In particular, I don't think peak memory can go down, only up if the next bench uses more memory. On the other hand, the "allocated memory" and "copied memory" reporting seems decently reliable.

Conclusion

It appears that starting from GHC 9.8, the performance of quicksort using the linear arrays provided by Linear base gets much more efficient than naive quicksort. That may be related to Ur unboxing optimization that was initially disallowed on earlier GHC versions.

tbagrel1 · 2024-09-13T15:32:10Z

With this PR, I enable benchmarks in the cabal matrix of the CI, so that we can check performance across versions of linear-base and versions of GHC easily.

The thing is, stack is our preferred platform, and we use stackage for package version choice and build plan. We partially support cabal on the project, but we have little constraint bounds/overrides to help cabal find a proper build plan. But for this PR, we need to use cabal bench to bench with different GHC versions.

Without any options, cabal build linear-base:bench:bench doesn't succeed with GHC 9.8/GHC 9.10 as of now. One thing we used in the past is --allow-newer, and it is enough to help the constraint solver to find a build plan for the library with all versions of GHC, but not for the benchmarks, mainly because of a constraint on filepath in transitive dependencies.

In 8d1533e, I pinned the version of filepath and the package state index. It is enough for the constraint solver to succeed, and shouldn't break unexpectedly as long as this version of filepath is compatible with other constraints.

By some very (un)fortunate coincidence, @amesgen & I found that if we enable tests (--enable-tests or tests: True in cabal.project) at the moment of building the benchmark, it helps the constraint solver to succeed, with no need to pin filepath. It might be more brittle or less brittle than the other solution, I don't know. This option is presented in 919fc6c

aspiwack · 2024-09-18T01:00:07Z

cabal.project

+tests: True
+benchmarks: True


Didn't you just remove test:True in a recent commit? What's the story here?

Previously we used cabal mainly for the CI, and we explicitly passed --disable-tests --disable-benchmarks to make the build go faster (as we only wanted to check if the library would build). So I thought it wasn't worth it to enable tests in cabal.project to then override it to False on every command of the CI. It would result in a different behavior in dependency resolution compared to using cabal locally without options.

That's also why I cleaned up cabal.project to a bare minimum in my previous PR, so that compile-time options would be in the config file shared between both stack and cabal, with little/no overrides in cabal.project.

But now, because dependency resolution doesn't succeed on its own without options, we need to chose:

use a specific version of filepath to obtain a valid build plan; that's commit 3a8e25a

OR enable tests and benchmarks as it coincidentally makes the dependency resolution succeed with a valid build plan; that's commit b53b28a

OR use --allow-newer on a per-package basis, but that would require a lot of trial and error, and we might not find a valid build plan that works with all versions of GHC at once

tbagrel1 · 2024-09-18T08:11:33Z

For reference, here is the comparison of generated Core between GHC 9.6 and GHC 9.8 for quicksort. It seems that Ur are indeed unboxed in GHC 9.8 while they aren't in GHC 9.6:

GHC 9.6

partition_$s$wpartition2
  = \ sc sc1 ww ww1 ->
      case $w$c<2 ww1 ww of {
        False ->
          case sc1 of { Array ww2 ->
          case ($wassertIndexInRange (lvl1 `cast` <Co:4> :: ...) ww ww2)
               `cast` <Co:2> :: ...
          of nt
          { __DEFAULT ->
          case $wget ww (nt `cast` <Co:3> :: ...) of { (# ww3, ww4 #) ->
          case ww3 of { Ur lVal ->
          case ($wassertIndexInRange (lvl3 `cast` <Co:4> :: ...) ww1 ww4)
               `cast` <Co:2> :: ...
          of nt1
          { __DEFAULT ->
          case $wget ww1 (nt1 `cast` <Co:3> :: ...) of { (# ww5, ww6 #) ->
          case ww5 of { Ur rVal ->
          case lVal of { I# ww7 ->
[...]

GHC 9.8

partition_$s$wpartition2
  = \ sc sc1 ww ww1 ->
      case <# ww1 ww of {
        __DEFAULT ->
          case <=# 0# ww of {
            __DEFAULT -> case partition17 of wild { };
            1# ->
              case <# ww (sizeofMutableArray# (sc1 `cast` <Co:2> :: ...)) of {
                __DEFAULT -> case partition17 of wild { };
                1# ->
                  case $wget ww sc1 of { (# ww2, ww3 #) ->
                  case <=# 0# ww1 of {
                    __DEFAULT -> case partition16 of wild { };
                    1# ->
                      case <# ww1 (sizeofMutableArray# (ww3 `cast` <Co:2> :: ...)) of {
                        __DEFAULT -> case partition16 of wild { };
                        1# ->
                          case $wget ww1 ww3 of { (# ww5, ww6 #) ->
                          case ww2 of { I# f1 ->
                          join {
                            $j
                              = case ww5 of { I# f2 ->

Quicksort-GHC-9.6-full.dump-simpl.txt
Quicksort-GHC-9.6-suppr.dump-simpl.txt
Quicksort-GHC-9.8-full.dump-simpl.txt
Quicksort-GHC-9.8-suppr.dump-simpl.txt

aspiwack · 2024-09-19T00:27:56Z

I'll be frank: I'm not sure why unboxing of Ur starts appearing specifically in GHC 9.8. Maybe I did something. But I don't remember. (I remember there was a refactor of the worker/wrapper split code at some point which I believed to accidentally enable unboxing of Ur, but I seem to remember that it was in effect in 9.6 already. It doesn't really matter, as long as this is good now)

tbagrel1 · 2024-09-20T11:16:12Z

I discovered that the test, bench, etc. subdirectories didn't match the src structure.

I added a commit to address this, but because the change would affect a dozen of files, I'm requesting your approval again :) Let me know if you think a separate PR would be better.

aspiwack · 2024-09-22T23:56:49Z

Yes, please merge the easy things, and split the last commit in a new PR so that you can explain what you did and why.

…obust solution

tbagrel1 changed the title ~~LinArray vs naive Quicksort performance benchmark~~ [WIP] Add LinArray vs naive Quicksort performance benchmark Sep 11, 2024

tbagrel1 marked this pull request as draft September 11, 2024 07:07

tbagrel1 force-pushed the tbagrel1/quicksort-perf-bench branch 9 times, most recently from f0ac6b9 to 4da51b6 Compare September 11, 2024 14:49

tbagrel1 changed the title ~~[WIP] Add LinArray vs naive Quicksort performance benchmark~~ Add Quicksort benchmark and export benchmark results as an artifact Sep 11, 2024

tbagrel1 marked this pull request as ready for review September 11, 2024 15:27

tbagrel1 force-pushed the tbagrel1/quicksort-perf-bench branch 8 times, most recently from a993847 to 8d1533e Compare September 13, 2024 14:58

tbagrel1 requested a review from aspiwack September 13, 2024 15:45

aspiwack approved these changes Sep 18, 2024

View reviewed changes

tbagrel1 requested a review from aspiwack September 20, 2024 11:13

Add benchmark for Quicksort

a1404cf

tbagrel1 added 2 commits September 25, 2024 09:40

Update CI to run and export benchmarks across several GHC versions

3a8e25a

Remove brittle version bound on filepath; replace with another more r…

b53b28a

…obust solution

tbagrel1 force-pushed the tbagrel1/quicksort-perf-bench branch from a78d659 to b53b28a Compare September 25, 2024 07:42

tbagrel1 merged commit 9a7dc1b into master Sep 25, 2024
11 checks passed

tbagrel1 deleted the tbagrel1/quicksort-perf-bench branch September 25, 2024 07:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add Quicksort benchmark and export benchmark results as an artifact #477

Add Quicksort benchmark and export benchmark results as an artifact #477

Uh oh!

tbagrel1 commented Sep 10, 2024 •

edited

Loading

Uh oh!

tbagrel1 commented Sep 13, 2024 •

edited

Loading

Uh oh!

aspiwack Sep 18, 2024

Uh oh!

tbagrel1 Sep 18, 2024 •

edited

Loading

Uh oh!

tbagrel1 commented Sep 18, 2024 •

edited

Loading

Uh oh!

aspiwack commented Sep 19, 2024

Uh oh!

tbagrel1 commented Sep 20, 2024

Uh oh!

aspiwack commented Sep 22, 2024

Uh oh!

Uh oh!

Uh oh!

		tests: True
		benchmarks: True

Add Quicksort benchmark and export benchmark results as an artifact #477

Add Quicksort benchmark and export benchmark results as an artifact #477

Uh oh!

Conversation

tbagrel1 commented Sep 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tbagrel1 commented Sep 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aspiwack Sep 18, 2024

Choose a reason for hiding this comment

Uh oh!

tbagrel1 Sep 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tbagrel1 commented Sep 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aspiwack commented Sep 19, 2024

Uh oh!

tbagrel1 commented Sep 20, 2024

Uh oh!

aspiwack commented Sep 22, 2024

Uh oh!

Uh oh!

Uh oh!

tbagrel1 commented Sep 10, 2024 •

edited

Loading

tbagrel1 commented Sep 13, 2024 •

edited

Loading

tbagrel1 Sep 18, 2024 •

edited

Loading

tbagrel1 commented Sep 18, 2024 •

edited

Loading