@nMaax commented Jul 16, 2025

Simformer

Important

This PR is part of Google Summer of Code 2025

Note

Before opening this PR, I experimented with the Simformer and auxiliary components in a separate branch of my sbi fork (simformer-dev). You can find it here. That branch served as my working environment for the first month and a half of GSoC, giving me the freedom to try out solutions; I opened this PR once I had a minimum viable product. All the code I finalized there has been fully incorporated into the PR you are reading.

More specifically, in that branch I mainly worked on a first version of the Simformer neural network architecture and the "masked" interface. I also attempted to introduce a Joint distribution interface, i.e., a parallel interface to the current "Posterior" approach in sbi that could generalize better to the Simformer case, since the Simformer does not operate in terms of "posterior" or "likelihood" but, more generally, in terms of arbitrary conditionals. That idea was eventually dropped in favor of a Wrapper class that adapts the more general Simformer approach to the existing sbi posterior interface (see below for more information).

Implemented the Simformer, Gloeckler et al. 2024 ICML. The Simformer aims to unify the various simulation-based inference paradigms (posterior, likelihood, or arbitrary conditional sampling) within a single framework, allowing users to sample from any conditional distribution of interest. It can even act as a data generator if one samples the unconditioned joint distribution of all variables.

*(Figure 1b from Gloeckler et al. 2024)*

The Simformer diverges from the standard sbi paradigm of data provided as theta and x; instead, it operates on a full input tensor of all variables together with two masks:

  • A condition_mask to identify which variables are latent (to be inferred by the Simformer) and which are observed (ground-truth data)
  • An edge_mask to encode relationships between variables, equivalent to an adjacency matrix of a DAG. This mask is used directly by the transformer attention blocks to mask out certain attention scores.
*(Figure: the condition and edge masks over the full input tensor)*
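
To make the two masks concrete, here is a hypothetical example for a joint over five scalar variables; the mask conventions (True = observed, attention allowed where the entry is one) are assumptions for illustration and may differ from the actual implementation:

```python
import torch

# Hypothetical joint over 5 scalar variables: (theta_1, theta_2, x_1, x_2, x_3).
full_inputs = torch.randn(5)

# condition_mask: True = observed (conditioned on), False = latent (inferred).
# Conditioning on the data and inferring the parameters yields the posterior.
condition_mask = torch.tensor([False, False, True, True, True])

# edge_mask: adjacency matrix of a DAG over the 5 variables; a zero at (i, j)
# masks out the corresponding attention score in the transformer blocks.
# An all-ones mask (or None) leaves attention unrestricted.
edge_mask = torch.ones(5, 5, dtype=torch.bool)
```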

Design of the Masked Classes

To accomplish this, it was necessary to create "parallel" versions of the current ScoreEstimator, VectorFieldEstimator, etc. that work with this "masked" paradigm.

In general, each "Masked" version sits directly below its counterpart in the same Python file, e.g. MaskedConditionalVectorFieldEstimator is placed right below ConditionalVectorFieldEstimator. The masked classes are essentially refactors of their originals in which every use of "theta and x" or "inputs and condition" is replaced with the more general "inputs, condition_mask, and edge_mask".
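
Schematically, the signature change looks as follows (method names and arguments are simplified stand-ins, not the exact sbi definitions):

```python
# Simplified sketch of the signature change, not the exact sbi definitions.
class ConditionalVectorFieldEstimator:
    def forward(self, inputs, condition, time):
        """Vector field for fixed variable roles (e.g., theta given x)."""
        ...

class MaskedConditionalVectorFieldEstimator:
    def forward(self, inputs, condition_mask, edge_mask, time):
        """Vector field over the full input tensor; condition_mask marks which
        entries are observed, edge_mask restricts attention to a DAG."""
        ...
```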

A Wrapper class has also been introduced to adapt the original Posterior API to the Simformer one. Thanks to this class, one can simply call the build_conditional() method directly on the Simformer inference object and obtain a standard Posterior object that works as usual, given some fixed condition and edge masks. The Wrapper handles all the shapes automatically and performs the auxiliary operations needed to pass data to a Simformer network and the underlying masked estimator. This is done mainly through two helper functions, assemble_full_inputs() and disassemble_full_inputs(), which convert between the $(\theta, x)$ setting and the full input tensor, and back.
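
Conceptually, the two helpers behave like the following sketch (argument names and shapes are assumptions; the actual implementation handles the batching and shape bookkeeping):

```python
import torch

def assemble_full_inputs(theta: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Concatenate parameters and data into one full input tensor of shape
    (batch, dim_theta + dim_x)."""
    return torch.cat([theta, x], dim=-1)

def disassemble_full_inputs(full_inputs: torch.Tensor, dim_theta: int):
    """Split a full input tensor back into (theta, x)."""
    return full_inputs[..., :dim_theta], full_inputs[..., dim_theta:]
```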

At inference time, an edge_mask can be specified; otherwise it defaults to None (equivalent to an all-ones tensor, but more memory-efficient). A condition_mask, by contrast, must be passed explicitly at build_conditional() time. Alternatively, one can directly use the build_posterior() and build_likelihood() methods, which automatically generate an appropriate condition_mask based on the posterior_latent_idx and posterior_observed_idx parameters specified at init() of the Simformer.
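
A hypothetical end-to-end usage sketch follows; the import path and call signatures are assumptions modeled on the existing sbi trainers, while the parameter names are taken from this description and may not match the final API exactly:

```python
import torch
from sbi.inference import Simformer  # import path is an assumption

# Toy simulations: 2 parameters and 3 observed variables per sample.
theta, x = torch.randn(1000, 2), torch.randn(1000, 3)
x_o = torch.zeros(1, 3)

trainer = Simformer(
    posterior_latent_idx=[0, 1],       # positions of theta in the full input
    posterior_observed_idx=[2, 3, 4],  # positions of x in the full input
)
trainer.append_simulations(theta, x).train()

# build_posterior() derives the condition_mask from the indices given above.
posterior = trainer.build_posterior()
samples = posterior.sample((1000,), x=x_o)

# Any other conditional can be built by passing an explicit condition_mask,
# e.g. condition on theta_1 and x_1 and infer the rest.
conditional = trainer.build_conditional(
    condition_mask=torch.tensor([True, False, True, False, False])
)
```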

An edge_mask can also be specified at training time; if not, the default is again None. More generally, the user can pass a Callable that generates condition or edge masks, so they can choose whatever mask distribution they prefer; sets of tensors/lists or even a single fixed tensor can be passed as well. Masks are also generated just in time (JIT) for training: they are not provided at append_simulations(), but during train(), in order to save memory. Unlike at inference time, if no condition mask is specified here, a default generator is used that samples masks from a $\text{Bernoulli}(p=0.5)$.
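
For instance, a condition-mask generator matching the described default behavior could look like this sketch (the exact callable signature expected by train() is an assumption):

```python
import torch

def bernoulli_condition_mask(batch_size: int, num_variables: int) -> torch.Tensor:
    """Default-style generator: each variable is independently marked observed
    with probability 0.5, latent otherwise, freshly sampled per batch."""
    return torch.bernoulli(torch.full((batch_size, num_variables), 0.5)).bool()
```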

Note that the Simformer lets the user set any mask of their choice at both training and inference time; it is then the user's responsibility to provide coherent definitions (callables, sets, or fixed tensors) that make sense. For example, if the user passes a specific edge mask at training time, the Simformer will learn that specific DAG structure, and the user must then pass a coherent edge_mask when calling build_conditional(), build_posterior(), or build_likelihood().

Furthermore, the Simformer natively handles invalid inputs (NaNs and infs): if handle_invalid_x=True, it automatically spots invalid inputs at training time (again just in time) and flips their state in the condition mask to latent (to be inferred), besides replacing such values with small Gaussian noise for numerical stability.
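
Conceptually, the invalid-input handling boils down to something like this sketch (not the actual implementation; the noise scale and the "True = observed" mask convention are assumptions):

```python
import torch

def handle_invalid(full_inputs, condition_mask, noise_std=1e-2):
    """Flip non-finite entries to latent in the condition mask and replace
    their values with small Gaussian noise for numerical stability."""
    invalid = ~torch.isfinite(full_inputs)
    condition_mask = condition_mask & ~invalid
    full_inputs = torch.where(
        invalid, noise_std * torch.randn_like(full_inputs), full_inputs
    )
    return full_inputs, condition_mask
```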

A flow-matching equivalent of the Simformer is also provided (the description above assumed the score-based variant).

This PR also includes integration with the mini-sbibm benchmark suite, and a notebook tutorial for the Simformer (under docs/advanced_tutorials), where I showcase its use. I also tried to make the API Reference documentation as clear as possible.


Refactor of existing code

Parts of the existing code have been refactored, mainly to avoid code repetition and keep everything DRY. The most important modifications are:

  1. The SDE estimators: the definitions of mean_t, std_t, etc. were moved into standard Mixins. For example, instead of VEScoreEstimator(ConditionalScoreEstimator), there is now a VarianceExplodingSDE mixin defining mean_t, std_t, etc., and VEScoreEstimator becomes VEScoreEstimator(ConditionalScoreEstimator, VarianceExplodingSDE). This makes it easy to also define MaskedVEScoreEstimator(MaskedConditionalScoreEstimator, VarianceExplodingSDE) without repeating the VE SDE pieces (see the sketch after this list).
  2. The NeuralInference interface, which has been split using a Mixin as well (BaseNeuralInference) defining the properties shared by NeuralInference and MaskedNeuralInference. This also required some minor adjustments, mainly to methods such as _resolve_prior() and _resolve_estimator(); most importantly, a new NoPrior object has been created as a temporary solution for "Keep prior optional and remove unnecessary copies of thetas from ImproperPrior" #1635.
  3. ConditionalVectorFieldEstimator and MaskedConditionalVectorFieldEstimator were simplified by moving shared code into a Mixin called BaseConditionalVectorFieldEstimator, mainly the mean_base and std_base properties and methods such as diffusion_fn().
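
As a sketch of the mixin pattern from item 1 (class bodies are simplified stand-ins; the schedule formulas are the standard VE SDE, and the sigma defaults are assumptions):

```python
import torch

class ConditionalScoreEstimator: ...        # stand-in for the sbi base class
class MaskedConditionalScoreEstimator: ...  # stand-in for the masked base class

class VarianceExplodingSDE:
    """Mixin holding the VE SDE schedule, shared by both estimator families."""
    sigma_min: float = 0.01  # default values are assumptions
    sigma_max: float = 10.0

    def mean_t(self, t: torch.Tensor, x0: torch.Tensor) -> torch.Tensor:
        return x0  # the VE SDE leaves the mean untouched

    def std_t(self, t: torch.Tensor) -> torch.Tensor:
        # Geometric interpolation between sigma_min and sigma_max.
        return self.sigma_min * (self.sigma_max / self.sigma_min) ** t

# Both variants reuse the schedule without duplicating it:
class VEScoreEstimator(ConditionalScoreEstimator, VarianceExplodingSDE): ...
class MaskedVEScoreEstimator(MaskedConditionalScoreEstimator, VarianceExplodingSDE): ...
```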

Summary of modified files

The modified files are the following:

sbi/inference

  • sbi/inference/trainers/base.py: Added MaskedNeuralInference.
  • sbi/inference/trainers/vfpe/base_vf_inference.py: Added MaskedVectorFieldEstimatorBuilder and MaskedVectorFieldInference (subclass of MaskedNeuralInference).
  • sbi/inference/trainers/vfpe/simformer.py: New file introducing the Simformer inference class.

sbi/neural_nets

  • sbi/neural_nets/factory.py: Added support for building Simformer networks (simformer_nn).

  • sbi/neural_nets/estimators/base.py: Added MaskedConditionalEstimator and MaskedConditionalVectorFieldEstimator (subclass of MaskedConditionalEstimator).

  • sbi/neural_nets/estimators/score_estimator.py:

    • Added MaskedConditionalScoreEstimator (subclass of MaskedConditionalVectorFieldEstimator), placed directly above ConditionalScoreEstimator.
    • Added MaskedVEScoreEstimator (subclass of MaskedConditionalScoreEstimator).
  • sbi/neural_nets/net_builders/vector_field_nets.py:

    • build_vector_field_estimator updated to support simformer and masked-score.
    • Introduced MaskedSimformerBlock, MaskedDiTBlock, SimformerNet (subclass of MaskedVectorFieldNet), and build_simformer_network (defines default architecture parameters).

sbi/utils

  • sbi/utils/vector_field_utils.py: Added MaskedVectorFieldNet.

sbi/analysis

  • sbi/analysis/plots.py: Minor fix to ensure CPU conversion in ensure_numpy() (added .cpu() before .numpy()).

Unit Test

Introduced benchmarks (mini_sbibm) and tests for the Simformer and related masked objects in

  • tests/linearGaussian_vector_field_test.py
  • tests/posterior_nn_test.py
  • tests/vector_field_nets_test.py
  • tests/vf_estimator_test.py (which also includes shape tests on the Wrapper)
  • tests/bm_test.py

Regarding the linear Gaussian tests, I tried to fold the Simformer tests into existing test methods as much as possible; nonetheless, the iid tests and the SDE/ODE sampling-equivalence tests are still provided as separate dedicated tests and fixtures.

New files

  • docs/advanced_tutorials/22_simformer.ipynb
  • sbi/inference/trainers/vfpe/simformer.py: including both Score-based and Flow-matching Simformer interfaces

Thank you

Thank you sbi and Google for this opportunity. Implementing the Simformer has been very rewarding: not only did I learn something completely new in itself, but most importantly I learned how to go about it. Familiarizing myself with new concepts, writing code within code made by others, and following the guidance of mentors are the real value of this experience. Special thanks to my mentors Manuel (@manuelgloeckler) and Jan (@janfb) for accepting my proposal, and to @manuelgloeckler in particular for helping me throughout the whole journey!

@nMaax commented Aug 30, 2025

After rebasing this PR to merge into main, a bunch of collaborators' commits from the original parent branch appeared here. I tried to clean up a little by squashing commits, but that would have required resolving 285 different conflicts 😅 so I aborted the operation and had to keep everything as it is.

@nMaax commented Sep 1, 2025

Alright, as requested by Google:

If the pull request is going to have more work done after GSoC is over, make sure the last GSoC commit is noted.

I mark the commit below as the last commit of my GSoC. Nonetheless, I can still work on this further to address advice and fixes after review 👍
