|
8 | 8 | "\n",
|
9 | 9 | "`sbi` also incorporates the Simformer, a novel framework that provides an \"all-in-one\" solution for Simulation-Based Inference.\n",
|
10 | 10 | "\n",
|
11 |
| - "Similarly to the NPSE and the FMPE algorithms, the Simformer leverage score-based generative models to sample from the posteriors. More specifically, the underlying architecture is a Transformer.\n", |
| 11 | + "Similarly to the NPSE and FMPE algorithms, the Simformer uses score-based and flow-matching generative models to sample from conditional distributions, leveraging a Transformer-based architecture.\n", |
12 | 12 | "\n",
|
13 |
| - "While NPSE and FMPE replace Normalizing Flows as the core conditional density estimator for the posterior $p(\\theta|x)$, **Simformer** takes a more fundamental step. It aims to learn a single, unified score model that can implicitly represent **any joint distribution** of $\\theta$ and $x$, thereby unlocking a versatile range of inference capabilities from a single trained model.\n", |
| 13 | + "While NPSE and FMPE are limited to predicting the posterior distribution $p(\\theta|x)$, the **Simformer** takes a more fundamental step. It aims to learn a single, unified model that can implicitly represent **any joint distribution** of $\\theta$ and $x$, thereby unlocking a range of inference capabilities from a single trained model.\n", |
14 | 14 | "\n",
|
15 |
| - "At its heart, Simformer is a **Masked Conditional Score Model**. That is, instead of treating parameters ($\\theta$) and data ($x$) separately, Simformer rather expects them as a single input vector, such a concatenation $\\hat{\\mathbf{x}} = [\\theta, x]$. Which is then accompanied with two other tensors that represents dependencies between inputs:\n", |
| 15 | + "At its heart, the Simformer is a **Masked Conditional Model**. That is, instead of treating parameters ($\\theta$) and data ($x$) separately, the Simformer expects them as a single input vector, namely the concatenation $\\hat{\\mathbf{x}} = [\\theta, x]$, which is then accompanied by two other tensors that represent dependencies between inputs:\n", |
16 | 16 | "\n",
|
17 |
| - "- The **Condition Mask ($M_C$)**, which explicitly designates which variables in $\\hat{\\mathbf{x}}$ are **observed** and which are **latent (to be inferred)**.\n", |
| 17 | + "- The **Condition Mask ($M_C$)**, which explicitly designates which variables in $\\hat{\\mathbf{x}}$ are **observed (to be conditioned on)** and which are **latent (to be inferred)**.\n", |
18 | 18 | "- The **Edge Mask ($M_E$)**, which injects prior knowledge about the **causal or statistical dependencies** between variables in $\\hat{\\mathbf{x}}$ by specifying which variables can \"attend\" to others in the Transformer attention mechanism.\n",
|
19 | 19 | "\n",
|
20 | 20 | "Most importantly, the Simformer can be trained using different condition and edge masks at different training steps, yielding a single neural network that is not limited to a specific inference setting but can be flexibly re-used with arbitrary condition and edge masks. A minimal sketch of what such masks look like follows below.\n",
|
|
25 | 25 | "- Gloeckler M. et al. \"All-in-one simulation-based inference.\" ICML 2024."
|
26 | 26 | ]
|
27 | 27 | },
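| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "To make the two masks concrete, below is a minimal, illustrative sketch using plain `torch` tensors. The toy setting (two parameters and one observation) and the shapes are our assumptions for illustration, not requirements of the `sbi` API." |
| + ] |
| + }, |
| + { |
| + "cell_type": "code", |
| + "execution_count": null, |
| + "metadata": {}, |
| + "outputs": [], |
| + "source": [ |
| + "import torch\n", |
| + "\n", |
| + "# Toy setting (assumed for illustration): two parameters and one observation,\n", |
| + "# concatenated into a single input vector x_hat = [theta_1, theta_2, x].\n", |
| + "theta = torch.tensor([0.5, -1.2])\n", |
| + "x = torch.tensor([0.9])\n", |
| + "x_hat = torch.cat([theta, x])  # shape (3,)\n", |
| + "\n", |
| + "# Condition mask M_C: 1 = observed (conditioned on), 0 = latent (inferred).\n", |
| + "# This particular mask encodes the classic posterior task p(theta | x).\n", |
| + "condition_mask = torch.tensor([0, 0, 1])\n", |
| + "\n", |
| + "# Edge mask M_E: entry (i, j) = 1 lets variable i attend to variable j in the\n", |
| + "# Transformer attention; all ones means full attention (no structure imposed).\n", |
| + "edge_mask = torch.ones(3, 3)" |
| + ] |
| + }, |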
|
| 28 | + { |
| 29 | + "cell_type": "markdown", |
| 30 | + "metadata": {}, |
| 31 | + "source": [ |
| 32 | + "We start by importing the dependencies and preparing a simulator." |
| 33 | + ] |
| 34 | + }, |
28 | 35 | {
|
29 | 36 | "cell_type": "code",
|
30 | 37 | "execution_count": null,
|
|
61 | 68 | "cell_type": "markdown",
|
62 | 69 | "metadata": {},
|
63 | 70 | "source": [
|
64 |
| - "Differently from other sbi methods, the Simformer works by means unflattened variable features, i.e., we must train the Simformer on simulations that provide features of each variable in separate, trailing dimension." |
| 71 | + "Differently from other sbi methods, the Simformer works on unflattened variable features, i.e., we must train the Simformer on simulations that provide the features of each variable in a separate, trailing dimension. For our walkthrough we will use the score-based Simformer; the flow-matching equivalent exposes the exact same API." |
65 | 72 | ]
|
66 | 73 | },
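| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "As a minimal sketch of what this means (the shapes below are illustrative assumptions, not the output of the simulator above): a batch of flattened simulations of shape `(num_sims, num_variables)` gains a trailing feature dimension of size one, becoming `(num_sims, num_variables, 1)`." |
| + ] |
| + }, |
| + { |
| + "cell_type": "code", |
| + "execution_count": null, |
| + "metadata": {}, |
| + "outputs": [], |
| + "source": [ |
| + "import torch\n", |
| + "\n", |
| + "# Illustrative shapes: 1000 simulations, 2 parameters and 1 observation each.\n", |
| + "theta = torch.randn(1000, 2)\n", |
| + "x = torch.randn(1000, 1)\n", |
| + "\n", |
| + "# Concatenate parameters and data into a single tensor of variables ...\n", |
| + "flat = torch.cat([theta, x], dim=1)  # (1000, 3)\n", |
| + "# ... and add a trailing feature dimension, one feature per variable.\n", |
| + "inputs = flat.unsqueeze(-1)  # (1000, 3, 1)" |
| + ] |
| + }, |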
|
67 | 74 | {
|
|
85 | 92 | "source": [
|
86 | 93 | "As for NPSE, the Simformer approximates the posterior distribution by learning its score function, i.e., the gradient of the log-density, using the denoising score matching loss. Refer to [19_flowmatching_and_scorematching](#) and [20_score_based_methods_new_features.ipynb](#) for an introduction to denoising score matching.\n",
|
87 | 94 | "\n",
|
88 |
| - "Note also that only the single-round version of Simformer is implemented currently." |
| 95 | + ">Note that currently only the single-round version of the Simformer is implemented." |
89 | 96 | ]
|
90 | 97 | },
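| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "Schematically, in our notation (a sketch following the standard denoising score matching objective, with the masks as additional inputs to the score network $s_\\phi$):\n", |
| + "\n", |
| + "$$\\mathcal{L}(\\phi) = \\mathbb{E}_{t,\\, \\hat{\\mathbf{x}}_0,\\, \\hat{\\mathbf{x}}_t}\\Big[\\lambda(t)\\, \\big\\| s_\\phi(\\hat{\\mathbf{x}}_t, t, M_C, M_E) - \\nabla_{\\hat{\\mathbf{x}}_t} \\log p_t(\\hat{\\mathbf{x}}_t \\mid \\hat{\\mathbf{x}}_0) \\big\\|^2\\Big],$$\n", |
| + "\n", |
| + "where $\\hat{\\mathbf{x}}_t$ is the noised input at diffusion time $t$, $\\lambda(t)$ is a weighting function, and entries marked as observed by $M_C$ are kept at their clean values so that only the latent entries are noised." |
| + ] |
| + }, |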
|
91 | 98 | {
|
|
95 | 102 | "outputs": [],
|
96 | 103 | "source": [
|
97 | 104 | "# Instantiate Simformer and append simulations\n",
|
98 |
| - "inference = Simformer(device='gpu')\n", |
| 105 | + "inference = Simformer()\n", |
99 | 106 | "inference.append_simulations(inputs)"
|
100 | 107 | ]
|
101 | 108 | },
|
|
105 | 112 | "source": [
|
106 | 113 | "Notice how we only appended the `inputs` tensor, without providing any knowledge of the dependencies between variables, or of which are observed or latent (i.e., to be inferred) in our problem. In other words, we did not pass any condition or edge mask.\n",
|
107 | 114 | "\n",
|
108 |
| - "By now, the Simformer will automatically generate condition and edge masks, more specifically:\n", |
109 |
| - "\n", |
110 |
| - "- Condition masks will be generated according to a $\\text{Bernoulli}(p=0.5)$, for degenerate cases in which the entire mask of a given sample collapse to a full latent set, or full observed—i.e. full ones or full zeroes, a mechanism that randomly flips one variable take place\n", |
| 115 | + "Condition and edge masks will rather be generated at training time. Their generation can be controlled by the user by passing either a `Callable`, a collection of masks (`list`, `set`, `Tensor`), or a fixed `Tensor`. If we do not specify anything, `sbi` will automatically generate them:\n", |
| 116 | + "- Condition masks will be generated according to a $\\text{Bernoulli}(p=0.5)$; in the degenerate case in which the entire mask of a given sample collapses to fully observed (i.e., all ones), a mechanism flips one variable back to latent at random\n", |
111 | 117 | "- Edge masks will be `None` (equivalent to a full tensor of ones, i.e., full attention)"
|
112 | 118 | ]
|
113 | 119 | },
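| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "As an example of the `Callable` form, a user-defined generation scheme could look like the following minimal sketch. The mask length matches our three-variable toy setup, and the helper name and sampling probability are our own choices; the exact argument through which such objects are handed to `sbi` is not shown here." |
| + ] |
| + }, |
| + { |
| + "cell_type": "code", |
| + "execution_count": null, |
| + "metadata": {}, |
| + "outputs": [], |
| + "source": [ |
| + "import torch\n", |
| + "\n", |
| + "NUM_VARIABLES = 3  # two parameters + one observation, as in our toy setup\n", |
| + "\n", |
| + "def sample_condition_mask() -> torch.Tensor:\n", |
| + "    # Draw a random Bernoulli(0.3) condition mask, re-sampling whenever it\n", |
| + "    # degenerates to all-observed or all-latent.\n", |
| + "    while True:\n", |
| + "        mask = torch.bernoulli(torch.full((NUM_VARIABLES,), 0.3)).bool()\n", |
| + "        if mask.any() and not mask.all():\n", |
| + "            return mask\n", |
| + "\n", |
| + "# A fixed alternative: always condition on the last variable (posterior task).\n", |
| + "fixed_condition_mask = torch.tensor([0, 0, 1], dtype=torch.bool)" |
| + ] |
| + }, |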
|
|
118 | 124 | "outputs": [],
|
119 | 125 | "source": [
|
120 | 126 | "# Train the score estimator\n",
|
121 |
| - "score_estimator = inference.train()" |
| 127 | + "score_estimator = inference.train(max_num_epochs=100)" |
122 | 128 | ]
|
123 | 129 | },
|
124 | 130 | {
|
|
200 | 206 | "\n",
|
201 | 207 | "While the example provided is trivial, the Simformer becomes much more interesting for complex models, where more than two variables may be involved and where the conditional distribution that one needs to infer is not assumed to be chosen in advance.\n",
|
202 | 208 | "\n",
|
203 |
| - "Nonetheless, the Simformer still allows to call the two common `build_posterior()` and `build_likelihood()` methods. As long as we specifiy which variables are intended as latent or observed from the posterior perspective." |
| 209 | + "Nonetheless, the Simformer still allows calling the two common `build_posterior()` and `build_likelihood()` methods, as long as we first inform the Simformer which variables are intended as latent or observed **from the posterior perspective**." |
204 | 210 | ]
|
205 | 211 | },
|
206 | 212 | {
|
|
216 | 222 | "cell_type": "markdown",
|
217 | 223 | "metadata": {},
|
218 | 224 | "source": [
|
219 |
| - "This setting of latent and posterior indexes *do not affect* the behaviour of the `build_conditional()` method, as it only relies on the masks passed when it is called. It simply informs the Simformer on how to interpret variables when `build_posterior()` and `build_likelihood()` are called. Crucially, note that the indexes passed must be interpret as latent and observed from the posterior perspective. These can also be set at init time through `posterior_latent_idx` and `posterior_observed_idx` parameters. " |
| 225 | + "This setting of latent and observed indexes *does not affect* the behaviour of the `build_conditional()` method, as that only relies on the masks passed when it is called. It simply informs the Simformer how to interpret variables when `build_posterior()` and `build_likelihood()` are called. Again, note that the indexes passed must be interpreted as latent and observed **from the posterior perspective**. These can also be set at Simformer init time through the `posterior_latent_idx` and `posterior_observed_idx` parameters." |
220 | 226 | ]
|
221 | 227 | },
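| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "For instance, a minimal sketch of the init-time variant (the index values assume our toy setup, with variables 0 and 1 being the parameters and variable 2 the observation):" |
| + ] |
| + }, |
| + { |
| + "cell_type": "code", |
| + "execution_count": null, |
| + "metadata": {}, |
| + "outputs": [], |
| + "source": [ |
| + "# Variables 0 and 1 are latent (the parameters), variable 2 is observed\n", |
| + "# (the data) -- both interpreted from the posterior perspective.\n", |
| + "inference = Simformer(\n", |
| + "    posterior_latent_idx=[0, 1],\n", |
| + "    posterior_observed_idx=[2],\n", |
| + ")" |
| + ] |
| + }, |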
|
222 | 228 | {
|
|
243 | 249 | ")"
|
244 | 250 | ]
|
245 | 251 | },
|
| 252 | + { |
| 253 | + "cell_type": "markdown", |
| 254 | + "metadata": {}, |
| 255 | + "source": [ |
| 256 | + "Indeed, the posterior just found is similar to the conditional inferred in the previous steps. We can build the likelihood in the same way." |
| 257 | + ] |
| 258 | + }, |
246 | 259 | {
|
247 | 260 | "cell_type": "code",
|
248 | 261 | "execution_count": null,
|
|