Concurrent normalization #2074
Conversation
@vmchale I recall Alex mentioning he couldn't reliably measure speedups; is that now fixed by using a work queue? Have you got any numbers for us? :)
Part of that was a lack of programs to try it on where there is a significant branching factor when normalizing top-level binders. For the benchmarks / examples we have, I was finding that not much was actually running at the same time (and since so many entities are also so small, this doesn't help). For many of our examples the number of children for a binder is typically 1, which means in the worst case there is no concurrency at all. When there are multiple children, it typically doesn't go higher than 2 or 3 in a design, so the actual concurrency you observe is somewhat limited.

The point is I don't think this can really be answered without a benchmark program where normalizing a binder can result in many child binders needing to be normalized as a result (i.e. an entity to normalize where it's pretty much guaranteed it will be able to find enough work for all cores on a machine). I think ideally this would be two benchmarks:
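Setting those benchmarks aside, here is a minimal sketch of why the branching factor caps the available parallelism. The names `Binder`, `children`, and `normalizeOne` are hypothetical stand-ins, not Clash's actual normalization code, and the sketch ignores sharing/deduplication, which the work-queue version handles:

```haskell
import Control.Concurrent.Async (mapConcurrently_)

-- Hypothetical stand-ins: in Clash these would be top-level binders, the
-- free top-level variables of a normalized body, and real normalization.
type Binder = String

normalizeFrom :: (Binder -> [Binder]) -> (Binder -> IO ()) -> Binder -> IO ()
normalizeFrom children normalizeOne b = do
  normalizeOne b
  -- The parallelism available at this point is exactly (length (children b)):
  -- with one child per binder nothing ever runs concurrently, and with 2-3
  -- children most cores on a typical machine still sit idle.
  mapConcurrently_ (normalizeFrom children normalizeOne) (children b)
```

A work queue changes how this work is scheduled, but it cannot create more work items than the design's call graph exposes.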
@martijnbastiaan sort of? It runs faster instead of slower, but not always drastically better. Let me gather more data.
Is there a way to easily disable concurrent normalization? I'd think that disabling it might be useful when trying to debug the compiler, because with it enabled the various debug messages that Clash can generate might be interleaved from different things being normalized in parallel.

I tried running with `+RTS -N1` or `+RTS -maxN1`, but that doesn't seem to work.
For easy debugging you would likely have to compile without
One thing we should probably do is make sure we record transformation history sensibly, though. Actually debugging the normalization process becomes somewhat more hectic if rewrites from different entities being normalized in parallel end up interleaved. Perhaps that means waiting until an entity is normalized, and only then printing out the rewrite steps it took (with locked access to the output).
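As a rough sketch of that idea (hypothetical `RewriteStep` type, lock, and history values; not Clash's actual history machinery), the steps could be accumulated per entity and only flushed under an output lock once the entity is fully normalized:

```haskell
import Control.Concurrent.MVar (MVar, withMVar)
import Data.IORef (IORef, modifyIORef', readIORef)

type RewriteStep = String  -- placeholder for a real rewrite-history record

-- Record steps locally while normalizing a single entity.
recordStep :: IORef [RewriteStep] -> RewriteStep -> IO ()
recordStep history step = modifyIORef' history (step :)

-- Once the entity is done, print its whole history while holding the shared
-- output lock, so histories of concurrently normalized entities don't mix.
dumpHistory :: MVar () -> String -> IORef [RewriteStep] -> IO ()
dumpHistory ioLock entity history = do
  steps <- reverse <$> readIORef history
  withMVar ioLock $ \() ->
    mapM_ putStrLn (("Rewrites for " ++ entity ++ ":") : steps)
```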
Okay, that's good - hopefully it means our examples are just not benefitting from concurrent normalization. An example of a more realistic codebase would be Christiaan's Contranomy; perhaps try there? (GitHub, please copy threaded conversations from GitLab next, pls.)
Other "larger" public code bases I can think of are:
|
cc @christiaanb Unfortunately it seems to be a little slower on Contranomy too!

Not concurrent:

Concurrent:

So you can see it's slower, but nothing crazy (with work queues).
Yeah, this is fairly similar to my experiences before Vanessa's improvements, i.e. the work queue and non-parallel GC (which I guess doesn't matter here anyway). At least Contranomy is large enough that the std. dev. doesn't go ridiculous.
It may be that what we want to do is make a new flag, `-fclash-concurrent-normalization`.
Unfortunately cfm seems to require work, and in any case we see slowdowns in the majority of our little examples, so I think @alex-mckenna's suggestion is the way to go.
To save you looking around for everything, adding a flag is basically:
Then you can use the new flag by getting the
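The concrete steps were not captured above; purely as a generic illustration (hypothetical `Opts` record and field names, not Clash's real options type or flag machinery), the pattern amounts to a Bool on the options record that the normalization driver consults:

```haskell
-- Hypothetical options record; Clash's actual one differs.
data Opts = Opts { optConcurrentNormalization :: Bool }

defaultOpts :: Opts
defaultOpts =
  Opts { optConcurrentNormalization = False }  -- assumed default, not confirmed in this thread

-- The driver picks a strategy based on the flag.
normalizeAll :: Opts -> [bndr] -> IO ()
normalizeAll opts bs
  | optConcurrentNormalization opts = mapM_ spawnWorker  bs  -- concurrent path (placeholder)
  | otherwise                       = mapM_ normalizeOne bs  -- sequential path (placeholder)
  where
    spawnWorker  _ = pure ()
    normalizeOne _ = pure ()
```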
Oh wow, thank you!!
@@ -28,25 +29,25 @@ import Clash.Rewrite.Types (Rewrite, RewriteMonad)
 -- | State of the 'NormalizeMonad'
 data NormalizeState
   = NormalizeState
-  { _normalized :: BindingMap
+  { _normalized :: MVar BindingMap
This is insufficient, because `inlineWorkFree` calls `normalizeTopLvlBndr`. So why is this bad? Take the following example:

top x = (g x, f x)

f x = h + x
{-# NOINLINE f #-}

g x = h * x
{-# NOINLINE g #-}

h = expensive_workfree_expression

After normalizing `top`, we normalize `f` and `g` concurrently, and both will do an `inlineWorkFree` of `h`. They could then both start normalizing `h` before committing it to the cache. Thus I think `normalizeTopLvlBndr` should do something akin to:
normalizeTopLvlBndr isTop nm (Binding nm' sp inl pr tm _) = do
  normalizedV <- Lens.use (extra.normalized)
  cache <- takeMVar normalizedV
  case lookupVarEnv nm cache of
    Just vMVar -> do
      -- someone already claimed nm: release the cache and wait for their result
      putMVar normalizedV cache
      readMVar vMVar
    Nothing -> do
      -- claim nm by publishing an empty MVar before we start normalizing
      tmp <- newEmptyMVar
      putMVar normalizedV (extendVarEnv nm tmp cache)
      -- ... continue normalization, producing value ...
      putMVar tmp value
      return value
It doesn't save us much time, because the thread arriving second still has to wait for the first. But the further along the first thread is, the more time we save.
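For reference, here is the same claim-then-fill pattern in a self-contained form, outside Clash's monad stack (Data.Map and String keys instead of VarEnv and Ids; all names are illustrative only):

```haskell
import Control.Concurrent.MVar
  (MVar, newEmptyMVar, putMVar, readMVar, takeMVar)
import qualified Data.Map.Strict as Map

type Cache v = MVar (Map.Map String (MVar v))

-- Either wait on the slot a concurrent thread has already published for this
-- key, or claim the key ourselves by publishing an empty slot before doing
-- the expensive computation.
cached :: Cache v -> String -> IO v -> IO v
cached cacheV key compute = do
  cache <- takeMVar cacheV
  case Map.lookup key cache of
    Just slot -> do
      putMVar cacheV cache                        -- release the cache untouched
      readMVar slot                               -- block until the claimer fills it
    Nothing -> do
      slot <- newEmptyMVar
      putMVar cacheV (Map.insert key slot cache)  -- claim before computing
      value <- compute
      putMVar slot value                          -- unblock any waiters
      pure value
```

(If `compute` throws, the empty slot is never filled and waiters block forever; a real implementation would need to account for that.)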
Done! That is clever. It slows things down a little bit, however, perhaps due to more `MVar`s, or perhaps because I convert to lists in order to `mapM` over a `VarEnv`.
Ah... slowing down... not the effect I was hoping for :)

So a couple of things:

No use in doing (2) and (3) before (1), though.
@@ -0,0 +1 @@
CHANGED: Add concurrent normalization flag [#2074](https://github.com/clash-lang/clash-compiler/pull/2074) |
Drive-by review: please explicitly name the flag (`-fclash-concurrent-normalization`) in the Changelog, otherwise people still have to hunt for it.
As well as a brief description of what it does. The Changelog should be an understandable stand-alone unit. It doesn't have to be extensive, but it should be informative.
Also, it should be `ADDED`, not `CHANGED`. It would probably be good to document somewhere the keywords we expect here.
I started documenting changelog best practices in #2169.
@@ -271,6 +271,11 @@ Clash Compiler Flags

.. _`Edalize`: https://github.com/olofk/edalize

-fclash-concurrent-normaliztation
Suggested change: `-fclash-concurrent-normaliztation` → `-fclash-concurrent-normalization`
if not (id' `elemVarSet` bound)
  then do
    -- mark that we are attempting to normalize id'
    MVar.putMVar binds (bound `extendVarSet` id', pairs)
    pair <- normalize' id' q
    MVar.modifyMVar_ binds (pure . second (pair:))
  else
    MVar.putMVar binds (bound, pairs)
I think it could do with some explanation.
Is this accurate?
Suggested change:

if not (id' `elemVarSet` bound)
  then do
    -- mark that we are attempting to normalize id'
    MVar.putMVar binds (bound `extendVarSet` id', pairs)
    pair <- normalize' id' q
    -- record the normalized id'
    MVar.modifyMVar_ binds (pure . second (pair:))
  else
    -- id' is already (being) normalized
    MVar.putMVar binds (bound, pairs)
nextS <- Lens.use uniqSupply
normalizeStep q binds nextS
Couldn't this just be:
normalizeStep q binds s
Since the `Supply` is local to our monad and it couldn't have changed, right?
MVar.withMVar ioLockV $ \() ->
  traceWhen (hasTransformationInfo AppliedTerm opts)
    ("Dropping type application on TopEntity: " ++ showPpr (varName f) ++ "\ntype:\n" ++ showPpr tyArg)
Instead of having to do

ioLockV <- Lens.use ioLock
MVar.withMVar ioLockV $ \() ->
  traceWhen cond "..."

all over the place, maybe we should have a helper, something like:

traceWithIoLockWhen cond msg = when cond $ do
  ioLockV <- Lens.use ioLock
  MVar.withMVar ioLockV $ \() ->
    traceM msg

which has the added benefit of not needing to take the lock when the condition is False. (The name could do with some bikeshedding, probably.)
Or just:

Monad.when (hasTransformationInfo AppliedTerm opts) $
  MVar.withMVar ioLockV $ \() -> traceM ...

I would rather just use the "normal" `when` / `unless` from `Control.Monad` in situations like these instead of making new, strangely-specific functions.
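A stand-alone illustration of that plain `when` + `withMVar` combination (the lock, condition, and message here are stand-ins for the real `ioLock`, `hasTransformationInfo ...` check, and trace text):

```haskell
import Control.Concurrent.MVar (newMVar, withMVar)
import Control.Monad (when)
import Debug.Trace (traceM)

main :: IO ()
main = do
  ioLockV <- newMVar ()   -- shared lock guarding trace output
  let verbose = True      -- stand-in for hasTransformationInfo AppliedTerm opts
  -- Only take the lock when there is something to print, and hold it for the
  -- whole message so output from concurrent workers is not interleaved.
  when verbose $
    withMVar ioLockV $ \() ->
      traceM "Dropping type application on TopEntity: ..."
```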
We've only been able to measure slight slowdowns, and never speedups. Let's keep the branch around, but I doubt we're going to merge this as is.
Still TODO: