Add fromArrayMonolithic #115

Open: treeowl wants to merge 2 commits into master from fromArrayMonolithic

Conversation

treeowl (Contributor) commented Dec 25, 2014

For GHC, this should let us create a sequence out of an array
without having any stray pointers to the array in the sequence.

treeowl force-pushed the fromArrayMonolithic branch 3 times, most recently from 12a2eeb to 68d58e2 on December 25, 2014 03:45
treeowl force-pushed the fromArrayMonolithic branch from 68d58e2 to 6ce774c on December 25, 2014 03:57

treeowl (Contributor, Author) commented Dec 25, 2014

Note: if you're concerned about depending on primitive, we could reimplement what we need; it's just really ugly.
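
For context, "reimplement what we need" would mean writing directly against GHC's primops. A rough sketch of the central trick, written against GHC.Exts rather than primitive (illustrative only, not the code in this branch):

{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Array#, Int (I#), indexArray#)

-- Read the element pointer at position i without evaluating it: matching
-- on the unary unboxed tuple (# x #) unpacks the read without entering x,
-- so thunks are copied around untouched.
indexLazily :: Array# a -> Int -> a
indexLazily arr (I# i) =
  case indexArray# arr i of
    (# x #) -> x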

treeowl (Contributor, Author) commented Dec 25, 2014

After a lot of trouble, I finally figured out how to test this properly, and it works: holding on to a split-off piece of a sequence produced by fromArrayMonolithic requires much less memory than holding on to one produced by fromArray, although actually performing the conversion produces a lot more garbage. I doubt that many people will need fromArrayMonolithic, but if they do, they probably won't want to write it themselves.

The testing code (compile it with -with-rtsopts=-T and give it a number like 100000):

module Main where

import Prelude hiding (splitAt)

import Data.Array hiding (index)
import Data.Sequence

import Control.Exception
import Control.DeepSeq

import GHC.Stats
import System.Mem (performMajorGC)

import qualified Data.Foldable as F

sillyArray :: Int -> Array Int Int
sillyArray n = listArray (0,n-1) [0 .. n-1]

testFromArray :: Int -> String -> (Array Int Int -> Seq Int) -> IO ()
testFromArray size name fa = do
  putStrLn $ "Testing " ++ name
  -- Convert the array, split off a small prefix, force the large suffix,
  -- and retain only the small piece. If the conversion left pointers to
  -- the array in the tree, the whole array stays reachable from sm.
  let sm = case splitAt 10 . fa $ sillyArray size of
             (small, large) -> large `deepseq` small
  evaluate sm
  stats1 <- getGCStats
  putStrLn $ "Bytes used before GC: " ++ show (currentBytesUsed stats1)
  performMajorGC
  stats2 <- getGCStats
  putStrLn $ "Bytes used after GC: " ++ show (currentBytesUsed stats2)
  putStrLn $ "Result: " ++ show (F.sum sm)

main = do
  putStrLn "How large should the array be?"
  size <- readLn
  testFromArray size "fromArray" fromArray
  testFromArray size "fromArrayMonolithic" fromArrayMonolithic

treeowl (Contributor, Author) commented Dec 25, 2014

We should also test how this ST implementation compares to one using State. State leads to a bunch of pairs showing up in the Core, which suggests extra allocation, but that may not be the case.
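
(For context on the pairs: this is just the standard State representation, not code from this branch.)

-- Every bind/ap step produces a (result, newState) tuple, which is what
-- surfaces as extra allocation in the Core output.
newtype State s a = State { runState :: s -> (a, s) }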

Commit: Allocation actually goes down quite a bit when using State instead of ST.

treeowl (Contributor, Author) commented Dec 25, 2014

State is better, by quite a bit, and avoids silly GHC version issues. Perfect.

foxik (Contributor) commented Dec 30, 2014

Personally, I am against this. Why have two 'fromArray' functions? Also, what is the difference between fromArrayMonolithic and rnf . fromArray? The second one is a trivial way of achieving the same effect, isn't it?

In any case, this is an API change, so please propose it to libraries@haskell if you want to pursue this.

treeowl (Contributor, Author) commented Dec 30, 2014

> Personally, I am against this. Why have two 'fromArray' functions?

Because they have very different performance tradeoffs.

> Also, what is the difference between fromArrayMonolithic and rnf . fromArray? The second one is a trivial way of achieving the same effect, isn't it?

No. fromArrayMonolithic does not force any of the array elements. It just copies them out of the array (so pointers to thunks will just get copied). What I'd really like is to add support for a sort of copy-on-GC array slice to GHC's RTS, which would let us have it all, but it sounds like that would be a pretty large project.
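
To make the distinction concrete, a small sketch (the error thunk is purely for illustration, and fromArrayMonolithic refers to the function proposed in this PR):

import Control.DeepSeq (rnf)
import Data.Array (Array, listArray)
import Data.Sequence (fromArray)

-- An array whose second element is an unevaluated error thunk.
arr :: Array Int Int
arr = listArray (0, 1) [1, error "boom"]

-- rnf (fromArray arr) forces every element and therefore throws "boom".
-- fromArrayMonolithic arr succeeds: it copies the element pointers out of
-- the array without forcing them, so the thunk survives, yet the resulting
-- Seq no longer references the array.
forcesEverything :: ()
forcesEverything = rnf (fromArray arr)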

> In any case, this is an API change, so please propose it to libraries@haskell if you want to pursue this.

Will do.



tibbe (Member) commented Dec 30, 2014

I find the name confusing. Does this just create an element-strict Seq (which would imply that there aren't any references to the array) or is it still value-lazy, but careful to evaluate the array reads just enough to make sure there are no "read array" thunks left? If it's the latter, perhaps we should just change fromArray to do that? Having a fromArray that leaks memory by holding references to the array sounds bad.

treeowl (Contributor, Author) commented Dec 30, 2014

It does the latter (the careful thing). I guess changing fromArray is an
option, but fromArray has much better performance in other contexts, so I'm
not sure.

treeowl (Contributor, Author) commented Dec 30, 2014

The monolithic version can also be expanded to grab a limited slice, which
I don't think fromArray can do (because of Ix nonsense). But the
monolithic way feels a lot like a workaround for the absence of RTS support
for slicing.

treeowl (Contributor, Author) commented Dec 30, 2014

Er .... actually, fromArray can probably do that too. Maybe we should
yank fromArray pending further thought?

foxik (Contributor) commented Dec 30, 2014

Personally I am not happy about exporting two fromArray functions, so I would agree with modifying fromArray.

Nevertheless, we are not going to use primitive for this -- every dependency costs something. I understand that vector uses primitive, but I am unwilling to depend on primitive for one (probably very little used) conversion function in Data.Sequence.

Personally, I would even consider getting rid of the GHC-specific stuff in fromArray and using elems. Yes, it is slow, but the alternative is using a semi-official API and a bunch of extensions (probably MagicHash and UnboxedTuples) for one conversion function I am not sure anyone really wants. Any thoughts, @tibbe?
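
Concretely, the portable version would be something along these lines (the name is made up for illustration):

import Data.Array (Array, Ix, elems)
import Data.Sequence (Seq, fromList)

-- No GHC-specific primitives or extensions, but it goes through an
-- intermediate list and has neither cheap immediate access nor the
-- space-safety guarantee discussed in this thread.
fromArrayViaElems :: Ix i => Array i a -> Seq a
fromArrayViaElems = fromList . elems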

treeowl (Contributor, Author) commented Dec 30, 2014

I'm not happy with the elems approach at all. I'd rather just kill the
function altogether for now and consider the design space more carefully
for the next version.

foxik (Contributor) commented Dec 30, 2014

Well, it is my mistake that I did not insist on a proposal to libraries@haskell before merging the first fromArray. That is not going to happen again in the future :-)

Nevertheless, we are probably stuck with fromArray now. If the numbers show there is a reasonably large difference, I can probably live with MagicHash, UnboxedTuples and a dependency on ghc-prim, but there should be a nontrivial improvement to justify all this.

treeowl (Contributor, Author) commented Dec 30, 2014

Insisting on bureaucratic process doesn't always make things work better, but perhaps this time it would have. There is a huge time-vs-space-safety tradeoff between fromArray and fromArrayMonolithic. fromArray uses unsafeAt to give immediate O(log(min(i, n-i))) access to the resulting sequence. fromArrayMonolithic uses the primitive machinery to guarantee space safety when an array is converted and the resulting sequence split up. An approach using elems offers neither of these benefits: no immediate access and no space safety.
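
For reference, the fromArray side of that tradeoff looks roughly like this (a sketch, not necessarily the exact containers source):

import qualified Data.Array.Base as Base
import Data.Array (Array)
import Data.Ix (Ix)
import Data.Sequence (Seq, fromFunction)

-- Each element is a thunk that reads from the array only when that position
-- of the Seq is demanded: access is immediate and cheap, but every unforced
-- element keeps the entire array alive.
fromArrayLazy :: Ix i => Array i a -> Seq a
fromArrayLazy a = fromFunction (Base.numElements a) (Base.unsafeAt a)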

tibbe (Member) commented Dec 30, 2014

I generally agree with @foxik here. It would help to clearly show the benefits of having both (i.e. with benchmarks). We also need to be careful that we measure the right thing. Deferring lots of work lazily can often give performance that we don't see in real programs (where we typically force the elements eventually and suffer higher GC costs down the line due to holding on to thunks).

treeowl (Contributor, Author) commented Dec 30, 2014

I can put benchmarks together for sure. In this case, deferring things lazily is a pretty clear performance win for some reasonably likely situations. In particular, it's extremely common to build a structure (e.g., a Seq), and then only use it as-is a single time. By deferring the construction of each subtree until that subtree is used, we can avoid the very high cost of letting all the pieces fall out of cache as we construct the sequence only to drag them back in again to use it. Something like f <$> fromArray arry <*> s2, for example, will get a big speed boost from this.

tibbe (Member) commented Dec 30, 2014

It sounds like you have a good benchmark then. :) Please add it to the benchmark suite and post the results here (including the naive toList comparison).
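
A rough shape for such a benchmark entry, assuming a criterion-based harness (the commented-out lines are the variants to fill in, including the naive list-based comparison):

import Criterion.Main (bench, bgroup, defaultMain, env, nf)
import Data.Array (Array, listArray)
import Data.Sequence (fromArray)
import qualified Data.Foldable as F

main :: IO ()
main = defaultMain
  [ env (return arr100k) $ \arr ->
      bgroup "array conversions"
        [ bench "fromArray/sum" $ nf (F.sum . fromArray) arr
        -- , bench "fromArrayMonolithic/sum" $ nf (F.sum . fromArrayMonolithic) arr
        -- , bench "fromList . elems/sum"    $ nf (F.sum . fromList . elems) arr
        ]
  ]
  where
    arr100k :: Array Int Int
    arr100k = listArray (0, 99999) [0 .. 99999]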

foxik (Contributor) commented Dec 30, 2014

The bureaucratic process you are referring to serves a reasonably good purpose -- on the one hand, it allows the community to express its opinion, and on the other, it makes the proposal more explicit and usually ensures that the alternatives are spelled out and considered. In this situation, we probably would have discussed the fast_but_keeping_reference vs slow_but_no_references approaches and settled on one of them, or decided that we need both.

treeowl (Contributor, Author) commented Dec 30, 2014

@foxik Yes, you are right. It can definitely be helpful sometimes. Sometimes it just bogs down in silly bickering, which does not negate the fact that sometimes it is helpful. I made a mistake here, and I'm sorry.

foxik (Contributor) commented Dec 30, 2014

There is no need to feel sorry -- the mistake of adding fromArray without discussing the semantics is mine as a maintainer, not yours :-)

I understand you may feel that the library submission process is blocking and slowing down your work. But there is a good reason for it: widely used packages like containers need a lot of stability. Also, multiple heads are better than one.

sjakobi (Member) left a comment


I somewhat agree that this should be discussed on the libraries list.

I'll "request changes" to clarify that this discussion should happen before the PR could be merged.

sjakobi added the Seq, feature-request, and needs-libraries-proposal (a proposal needs to be discussed on the libraries mailing list) labels on Jul 15, 2020