Conversation


@penelopeysm penelopeysm commented Nov 13, 2025

This PR replaces LogDensityFunction with FastLDF.

Initially, this PR also removed SimpleVarInfo. That change has since been split out into its own PR, but the rationale for removing SVI is detailed here.

SimpleVarInfo

I would argue that the main purpose of SVI was for performance when evaluating a model with NamedTuple or Dict parameters:

retval, svi = DynamicPPL.evaluate!!(model, SimpleVarInfo(params)) # using DefaultContext

However, exactly the same thing can be accomplished now with the combination of InitFromParams + OnlyAccsVarInfo:

# it's a one-liner now:
retval, oavi = DynamicPPL.fast_evaluate!!(model, InitFromParams(params, nothing), DynamicPPL.default_accumulators())

# Which expands to this:
ctx = InitContext(InitFromParams(params, nothing))
oavi = OnlyAccsVarInfo(DynamicPPL.default_accumulators())
model = setleafcontext(model, ctx)
retval, oavi = DynamicPPL.evaluate!!(model, oavi)

Indeed, this strategy is on par with SVI for the NamedTuple case and faster than SVI for the Dict case (see benchmarks on #1125; performance characteristics on this PR are the same).

The other context in which SVI was useful was evaluation with vector parameters, but that's handled by FastLDF. Thus, my conclusion is that SVI is no longer needed.
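As a hedged illustration of that vector-parameter path (the model and parameter values here are made up, and the exact call shape should be treated as an assumption based on this PR, not released documentation):

```julia
using DynamicPPL, Distributions, LogDensityProblems

@model function demo()
    x ~ Normal()
    y ~ Normal(x)
end

# FastLDF replaces LogDensityFunction as the LogDensityProblems wrapper
ldf = DynamicPPL.FastLDF(demo())
# evaluate the log density at a flat parameter vector
LogDensityProblems.logdensity(ldf, [0.5, 1.0])
```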

Furthermore, because SVI is gone, I believe DynamicTransformation, StaticTransformation, and maybe_invlink_before_eval!! can also go. However, we'll leave that for another time.

I looked through GitHub for usage of SimpleVarInfo, and it's only in DPPL, Turing, Mooncake benchmarks (trivially fixed once FastLDF is released), and https://github.com/epimap/Epimap.jl-public and https://github.com/marius311/MuseInference.jl, which haven't been updated in a while.

@penelopeysm penelopeysm marked this pull request as draft November 13, 2025 22:17

github-actions bot commented Nov 13, 2025

Benchmark Report

  • this PR's head: 0c10eaa8f09d75c5f4b85e0bcd545ff9a1bf49a3
  • base branch: 4a1156038eb673dc9567d8a3a4d008455ec83908

Computer Information

Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, icelake-server)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬─────────────────────────────────┬───────────────────────────┬─────────────────────────────────┐
│                       │       │             │                   │        │        t(eval) / t(ref)         │     t(grad) / t(eval)     │        t(grad) / t(ref)         │
│                       │       │             │                   │        │ ──────────┬───────────┬──────── │ ──────┬─────────┬──────── │ ──────────┬───────────┬──────── │
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │      base │   this PR │ speedup │  base │ this PR │ speedup │      base │   this PR │ speedup │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼───────────┼───────────┼─────────┼───────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│               Dynamic │    10 │    mooncake │             typed │   true │    544.51 │    358.71 │    1.52 │  7.88 │   15.69 │    0.50 │   4289.81 │   5627.12 │    0.76 │
│                   LDA │    12 │ reversediff │             typed │   true │   3027.37 │   3144.99 │    0.96 │  2.11 │    4.51 │    0.47 │   6388.96 │  14171.30 │    0.45 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │ 167236.65 │ 106043.14 │    1.58 │  5.84 │    4.39 │    1.33 │ 976250.66 │ 465391.65 │    2.10 │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼───────────┼───────────┼─────────┼───────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │  15652.16 │   9007.78 │    1.74 │  5.79 │    4.54 │    1.27 │  90648.57 │  40933.46 │    2.21 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │  51647.85 │  33398.58 │    1.55 │ 11.20 │    9.90 │    1.13 │ 578285.39 │ 330691.30 │    1.75 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │   5566.15 │   3999.24 │    1.39 │  6.33 │    8.38 │    0.76 │  35260.10 │  33531.38 │    1.05 │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼───────────┼───────────┼─────────┼───────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │     18.95 │      3.80 │    4.98 │  1.92 │    2.72 │    0.71 │     36.45 │     10.34 │    3.52 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │   3018.27 │   1304.63 │    2.31 │ 45.01 │  118.98 │    0.38 │ 135838.19 │ 155220.13 │    0.88 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │  25515.71 │       err │     err │ 28.67 │     err │     err │ 731549.53 │       err │     err │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼───────────┼───────────┼─────────┼───────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │   1303.53 │       err │     err │ 81.24 │     err │     err │ 105901.97 │       err │     err │
│           Smorgasbord │   201 │      enzyme │             typed │   true │   3016.50 │       err │     err │  4.42 │     err │     err │  13345.88 │       err │     err │
│           Smorgasbord │   201 │    mooncake │             typed │   true │   2998.64 │   1786.02 │    1.68 │  4.96 │    6.18 │    0.80 │  14859.53 │  11034.99 │    1.35 │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼───────────┼───────────┼─────────┼───────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│           Smorgasbord │   201 │ reversediff │             typed │   true │   2989.03 │   1850.05 │    1.62 │ 56.19 │   81.65 │    0.69 │ 167962.04 │ 151058.10 │    1.11 │
│           Smorgasbord │   201 │ forwarddiff │      typed_vector │   true │   3140.29 │   1775.75 │    1.77 │ 46.31 │   60.42 │    0.77 │ 145416.15 │ 107294.50 │    1.36 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │   2657.01 │   1766.20 │    1.50 │ 75.79 │   58.37 │    1.30 │ 201377.14 │ 103096.86 │    1.95 │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼───────────┼───────────┼─────────┼───────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│           Smorgasbord │   201 │ forwarddiff │    untyped_vector │   true │   2704.79 │   1782.86 │    1.52 │ 47.24 │   59.19 │    0.80 │ 127776.41 │ 105529.80 │    1.21 │
│              Submodel │     1 │    mooncake │             typed │   true │     29.51 │      8.53 │    3.46 │  5.20 │    5.36 │    0.97 │    153.30 │     45.72 │    3.35 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴───────────┴───────────┴─────────┴───────┴─────────┴─────────┴───────────┴───────────┴─────────┘

@github-actions

DynamicPPL.jl documentation for PR #1139 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1139/


codecov bot commented Nov 13, 2025

Codecov Report

❌ Patch coverage is 95.29412% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.07%. Comparing base (4a11560) to head (0c10eaa).
⚠️ Report is 1 commit behind head on breaking.

Files with missing lines Patch % Lines
src/logdensityfunction.jl 94.20% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           breaking    #1139      +/-   ##
============================================
- Coverage     81.59%   81.07%   -0.52%     
============================================
  Files            42       41       -1     
  Lines          3955     3894      -61     
============================================
- Hits           3227     3157      -70     
- Misses          728      737       +9     


@penelopeysm penelopeysm force-pushed the py/not-experimental branch 2 times, most recently from c4f93fe to c1ee6cc Compare November 14, 2025 00:37
@penelopeysm penelopeysm force-pushed the py/not-experimental branch 2 times, most recently from d86ff22 to be27184 Compare November 14, 2025 01:11
@penelopeysm penelopeysm force-pushed the py/not-experimental branch 2 times, most recently from 0cc9278 to 6f5df1a Compare November 14, 2025 02:07
@penelopeysm penelopeysm force-pushed the py/not-experimental branch 2 times, most recently from 177656b to 9310ec0 Compare November 15, 2025 20:44
@penelopeysm penelopeysm requested a review from mhauru November 17, 2025 16:08
@penelopeysm

@mhauru Feel free to not do a full review at this stage, right now I'm more curious as to your thoughts about SVI. Happy to either keep it in for now, or remove it outright. Regardless of which way we go, I can rebase this against breaking so that this can actually be reviewed.


mhauru commented Nov 18, 2025

Trying to think of reasons to keep SVI, and the only one I can think of would be if we, or someone else, used it with one of the methods that don't use LDF. It hasn't traditionally worked with particle samplers because it didn't have num_produce. It should now work, since num_produce is an accumulator. I doubt anyone has started using it. I guess we could try to start using it, see what happens. I can't recall how it does with Gibbs, but I doubt it'll work in cases where you mix samplers that want or do not want linking, since it can't link individual variables.

Its days are definitely numbered. The only reason I can think of for not doing it now would be to investigate using it with particle samplers, but I doubt we'll get to that. So I'd lean towards getting the code simplification straight away, making it easier to work with VarInfo as we keep refactoring.

@penelopeysm penelopeysm changed the base branch from py/params-from-ldf to breaking November 18, 2025 11:07
@penelopeysm penelopeysm force-pushed the py/not-experimental branch 3 times, most recently from 7fa0986 to 6849bc2 Compare November 18, 2025 15:48

penelopeysm commented Nov 18, 2025

I think Cloudflare is responsible for the remaining ordinary test suite failures. The perf benchmarks are pretty much where I expected them to be.

The gradient evaluation benchmarks are not amazing, but I think it's mainly because any perf gains for these models are in primal evaluation, and so if the gradient doesn't also get sped up, it looks bad.

Arguably there should be a third column which is t(grad) / t(ref).

@penelopeysm penelopeysm marked this pull request as ready for review November 18, 2025 16:12
HISTORY.md Outdated

#### SimpleVarInfo

`SimpleVarInfo` has been removed.
Member

SimpleVarInfo has cleaner code than VarInfo. Doesn't it make more sense to remove VarInfo and replace it with SimpleVarInfo?

Member Author

@penelopeysm penelopeysm Nov 19, 2025

Markus can probably say more, but I don't think the choice is between SVI and VI. The structs are actually the same except that SVI carries an extra field to store a transformation (which would typically be part of VI's metadata field).

struct SimpleVarInfo{NT,Accs<:AccumulatorTuple,C<:AbstractTransformation} <: AbstractVarInfo
"underlying representation of the realization represented"
values::NT
"tuple of accumulators for things like log prior and log likelihood"
accs::Accs
"represents whether it assumes variables to be transformed"
transformation::C
end

struct VarInfo{Tmeta,Accs<:AccumulatorTuple} <: AbstractVarInfo
metadata::Tmeta
accs::Accs
end

I think the complexity lies in what kinds of metadata they carry. SVI code looks simpler because src/simple_varinfo.jl only handles NamedTuple or Dict metadata. VarInfo code looks terrible because src/varinfo.jl mostly deals with Metadata metadata (hence all the generated functions).

So instead of framing this as removing SVI, maybe it would have been more accurate to say that this removes NamedTuple and Dict as supported forms of metadata. As far as I can tell it's mostly a coincidence that this code is associated with SVI.

If the aim is to clean up src/varinfo.jl, then what really needs to happen is to replace Metadata with VNV (although VNV has 1.7k lines of code to itself, so IMO that's not exactly clean; but it probably feels better because it's in its own file, more documented, etc.)

Member Author

(#1105)

Member

IIRC, the plan is to use SimpleVarInfo with VNV and VNT as the unifying tracing data type, largely because SimpleVarInfo has a cleaner API and better documentation. I agree that the differences between SVI and VI aren’t all that significant.

Member

I think the reason we can currently dispense with SVI but not VI is that Gibbs requires the ability to link some of the variables, and only some. Whereas SVI assumes that either all or none are linked. If not for needing VI for Gibbs, I think we could be pretty close to in fact removing both SVI and VI, if we can just get ValuesAsInModelAccumulator to be as fast as they are (something I hope #1150 will eventually lead to).

Member

@mhauru mhauru left a comment

After reading the PR, I'm inclined to request splitting the deletion of SVI into a separate PR, if it's not too much work. Reasons:

  • I think it would be good to remove DynamicTransform and maybe some other stuff that is no longer needed in the same go, to not leave dead code behind to confuse us later.
  • I foresee wanting to revert the removal of SVI to benchmark VNT against it at some point, and thus would be good to have it all in a single commit, or a range of back-to-back commits.

Otherwise broadly happy with this, some naming questions.

julia> model = demo();
julia> vi = SimpleVarInfo(x=1.0)
Member

Does the StaticTransform stuff not work with VarInfo?

Member Author

I don't think so. SimpleVarInfo has a field for the transformation

"represents whether it assumes variables to be transformed"
transformation::C

but VarInfo doesn't (transformation is done on a per-variable basis). It will always hit this method:

# NOTE: This is kind of weird, but it effectively preserves the "old"
# behavior where we're allowed to call `link!` on the same `VarInfo`
# multiple times.
transformation(::VarInfo) = DynamicTransformation()

Member Author

(Not that it matters now, ofc, but I guess this will come up when we discuss SVI :))

src/fasteval.jl Outdated

"""
FastLDF(
DynamicPPL.fast_evaluate!!(
Member

Above I raised a comment about whether this needs to be exported. I also wonder even if it's not exported whether it could have a more descriptive name. If I would come across this, I would think "if fast_evaluate!! is fast, why does evaluate!! exist then". "Fast" feels like an implementation thing and maybe the more interface-oriented question is "what does evaluate do that this doesn't", or the other way around.

Member Author

@penelopeysm penelopeysm Nov 24, 2025

Technically, it doesn't need to be exported. I would say the main reason why this exists is to save on a lot of code repetition. It's needed upstream in Turing as well (e.g. in Prior() sampling), so I think it is a good idea to have a method. I see the point about the name though; it's not very meaningful except to you and me.

In many ways, I think the point of this method is really to generate a set of accumulators given an initialisation strategy. So one could say init_accs!! maybe?

Or, even more Julian (although antithetical to my own sensibilities :) - but it's conceptually similar enough that I actually quite like it), is to overload DynamicPPL.init!!(...) (see current definition here) for AccumulatorTuple? It might be a bit weird in that it would take an AccumulatorTuple (or NTuple{N,<:AbstractAccumulator}) and return an OnlyAccsVarInfo, but I reckon that's fine.

If we wanted to make it more consistent, we could make AccumulatorTuple subtype AbstractVarInfo directly, e.g.

DynamicPPL.getaccs(at::AccumulatorTuple) = at
DynamicPPL.setaccs!!(::AccumulatorTuple, new_at::AccumulatorTuple) = new_at

I would say that technically it should not subtype AbstractVarInfo but maybe there should be a broader supertype, AbstractContainsAccs, and AbstractVarInfo should be a subtype of that. I can't say I'm bothered enough to change that.

Member Author

@penelopeysm penelopeysm Nov 24, 2025

we could make AccumulatorTuple subtype AbstractVarInfo directly

I've just gone ahead and done this now. What used to be

fast_evaluate!!(model, strategy, accs)

is now simply the existing function

init!!(model, accs, strategy)

There is actually a slight difference in code, which is that init!! will call resetaccs!!. I thought that this might cause some slowdown, but I benchmarked it and it's nonexistent / swallowed by benchmark noise anyway. So I'm quite pleased with this new setup.

The only issue seems to be Enzyme is a bit unhappy, but I think that's something that can be resolved separately.

I'm aware I did this quite impulsively without giving time for discussion, so if you don't like it / have another idea, please tell me, I won't be annoyed.

Member Author

@penelopeysm penelopeysm Nov 25, 2025

OK, very sadly, I ran into method ambiguities. In particular, at[:] was not well-defined because getindex(::AbstractVarInfo, ::Colon) says "get all the values", but getindex(::AccumulatorTuple, idx) says "get the accumulators". Obviously the ambiguity itself could be resolved manually, but there's a deeper semantic difference, so I didn't feel comfortable with doing that.
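For the record, the clash is easy to reproduce with stand-in types (a minimal sketch with hypothetical definitions, not the actual DynamicPPL code):

```julia
abstract type AbstractVarInfo end

struct AccumulatorTuple{N,T<:NamedTuple} <: AbstractVarInfo
    nt::T
end

# "get all the values": more specific in the second argument
Base.getindex(::AbstractVarInfo, ::Colon) = "all values"
# "get the accumulators": more specific in the first argument
Base.getindex(at::AccumulatorTuple, idx) = at.nt[idx]

at = AccumulatorTuple{1,typeof((a = 1,))}((a = 1,))
# at[:] throws a MethodError: neither method is strictly more specific
```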

So we're back to using OnlyAccsVarInfo, but we can keep using init!! instead of fast_evaluate!!. Sorry about the rollercoaster.

Member Author

@penelopeysm penelopeysm Nov 25, 2025

Okay, even that failed. The problem is that init!! doesn't contain the TSVI-acc-eltype code (which is why CI is failing now). I copied that code into init!!, and that absolutely killed performance: a trivial model took 100 µs. No amount of @inline could get it to work. Frustrating.

Member

Damn, I liked this solution.

@@ -1,49 +0,0 @@
using Test, DynamicPPL, ADTypes, LogDensityProblems, ForwardDiff
Member

I take it that tests/fasteval.jl covers everything previously done here and in ad.jl?

Member Author

Yup, that's the idea. Actually looking at this now, I realise I forgot to test some aspects of the LogDensityProblems interface. I'll add those tests in.

Member Author

Added.

@penelopeysm penelopeysm changed the title Make FastLDF the default + remove SimpleVarInfo Make FastLDF the default Nov 24, 2025

"""
AccumulatorTuple{N,T<:NamedTuple}
AccumulatorTuple{N,T<:NamedTuple} <: AbstractVarInfo
Member

@yebai yebai Nov 24, 2025

It’s interesting that the accumulator mechanism evolved into a generalisation of VarInfos.

@penelopeysm penelopeysm force-pushed the py/not-experimental branch 4 times, most recently from 6fd177a to cf33cff Compare November 25, 2025 02:26

penelopeysm commented Nov 25, 2025

Alright, after some experimentation I think I managed to move the definition of fast_evaluate!! into init!! without causing any performance regressions or test failures. So there's no need to define or export fast_evaluate!!. But we do need to export init!! (I reckon it should probably have been exported in the last release anyway, so this is a nice course correction).
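To sketch the resulting usage (the call shape here is an assumption pieced together from this thread, with `model` and `params` as in the PR description's earlier example, not something from released documentation):

```julia
accs = DynamicPPL.default_accumulators()
oavi = DynamicPPL.OnlyAccsVarInfo(accs)
# init!! now subsumes what fast_evaluate!! used to do
retval, oavi = DynamicPPL.init!!(model, oavi, InitFromParams(params, nothing))
```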

I think any subsequent changes to this need to be done in quite careful, small steps, since the performance characteristics seem genuinely quite fragile towards refactoring. That makes me quite displeased, but I guess that's the cost of caring about performance.

The SVI removal is rolled back now.

Finally, regarding file names: I would be very much on board with renaming both src/fasteval.jl and test/fasteval.jl to their respective logdensityfunction.jl. I just wasn't keen on doing it now because that would give incredibly ugly diffs for those files. Once this review is done, I can do that.

@penelopeysm penelopeysm requested a review from mhauru November 25, 2025 02:49
@penelopeysm

(BTW, the first commit is the big one, but that's the same commit as before — I had to rebase on breaking, hence the force push includes that commit. Only the two smaller later ones are new since the last review.)

Member

@mhauru mhauru left a comment

Very pleased with this, thanks. Happy to merge after the renaming of files (or rename them in another PR, whichever you prefer).

Will the fragility of performance get better once TSVI is opt-in, and the num_threads > 1 check is replaced with something that can be evaluated at compile time? Also, even if it doesn't, I presume performance won't be worse than it was with the old LDF, just maybe roughly the same?

@penelopeysm

Will the fragility of performance get better once TSVI is opt-in

Possibly. That's the next thing to check... it might get better or worse! I tried dabbling with it but didn't manage to get anywhere. I think it'll be easier when I'm not juggling two changes at once though.

@penelopeysm penelopeysm merged commit 766f663 into breaking Nov 25, 2025
19 of 21 checks passed
@penelopeysm penelopeysm deleted the py/not-experimental branch November 25, 2025 11:41