docs/source/contributing/implementing_distribution.md
25 additions & 21 deletions
@@ -1,12 +1,12 @@
1
1
(implementing-a-distribution)=
2
-
# Implementing a Distribution
2
+
# Implementing a RandomVariable Distribution
3
3
4
4
This guide provides an overview of how to implement a distribution for PyMC version `>=4.0.0`.
5
5
It is designed for developers who wish to add a new distribution to the library.
6
6
Users will not be aware of all this complexity and should instead make use of helper methods such as `~pymc.DensityDist`.
7
7
8
8
PyMC {class}`~pymc.Distribution` builds on top of Aesara's {class}`~aesara.tensor.random.op.RandomVariable`, and implements `logp`, `logcdf` and `moment` methods as well as other initialization and validation helpers.
9
-
Most notably `shape/dims` kwargs, alternative parametrizations, and default `transforms`.
9
+
Most notably `shape/dims/observed` kwargs, alternative parametrizations, and default `transform`.
10
10
11
11
Here is a summary check-list of the steps needed to implement a new distribution.
12
12
Each section will be expanded below:
@@ -88,11 +88,11 @@ blah = BlahRV()
88
88
Some important things to keep in mind:
89
89
90
90
1. Everything inside the `rng_fn` method is pure Python code (as are the inputs) and should not make use of other `Aesara` symbolic ops. The random method should make use of the `rng` which is a NumPy {class}`~numpy.random.RandomState`, so that samples are reproducible.
91
-
1. Non-default `RandomVariable` dimensions will end up in the `rng_fn` via the `size` kwarg. The `rng_fn` will have to take this into consideration for correct output. `size` is the specification used by NumPy and SciPy and works like PyMC `shape` for univariate distributions, but is different for multivariate distributions. For multivariate distributions the __`size` excludes the `ndim_supp` support dimensions__, whereas the __`shape` of the resulting `TensorVariabe` or `ndarray` includes the support dimensions__. This [discussion](https://github.com/numpy/numpy/issues/17669) may be helpful to get more context.
91
+
1. Non-default `RandomVariable` dimensions will end up in the `rng_fn` via the `size` kwarg. The `rng_fn` will have to take this into consideration for correct output. `size` is the specification used by NumPy and SciPy and works like PyMC `shape` for univariate distributions, but is different for multivariate distributions. For multivariate distributions the __`size` excludes the `ndim_supp` support dimensions__, whereas the __`shape` of the resulting `TensorVariable` or `ndarray` includes the support dimensions__. For more context check {doc}`The dimensionality notebook <pymc:dimensionality=>`.
92
92
1. `Aesara` tries to infer the output shape of the `RandomVariable` (given a user-specified size) by introspection of the `ndim_supp` and `ndim_params` attributes. However, the default method may not work for more complex distributions. In that case, custom `_supp_shape_from_params` (and less probably, `_infer_shape`) should also be implemented in the new `RandomVariable` class. One simple example is seen in the {class}`~pymc.DirichletMultinomialRV` where it was necessary to specify the `rep_param_idx` so that the `default_supp_shape_from_params` helper method can do its job. In more complex cases, it may not suffice to use this default helper. This could happen for instance if the argument values determined the support shape of the distribution, as happens in the `~pymc.distributions.multivariate._LKJCholeskyCovRV`.
93
93
1. It's okay to use the `rng_fn` `classmethods` of other Aesara and PyMC `RandomVariables` inside the new `rng_fn`. For example if you are implementing a negative HalfNormal `RandomVariable`, your `rng_fn` can simply return `- halfnormal.rng_fn(rng, scale, size)`.
94
94
95
-
*Note: In addition to `size`, the PyMC API also provides `shape` and `dims` as alternatives to define a distribution dimensionality, but this is taken care of by {class}`~pymc.Distribution`, and should not require any extra changes.*
95
+
*Note: In addition to `size`, the PyMC API also provides `shape`, `dims` and `observed` as alternatives to define a distribution dimensionality, but this is taken care of by {class}`~pymc.Distribution`, and should not require any extra changes.*
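
The difference between `size` and the resulting shape described in the list above can be illustrated with plain NumPy. This is only a minimal sketch: it uses NumPy's own `RandomState` methods rather than any PyMC or Aesara `RandomVariable`, and the 3-dimensional support is an arbitrary choice made for illustration.

```python
import numpy as np

rng = np.random.RandomState(42)
size = (5, 2)

# Univariate: the output shape is exactly `size`.
univariate = rng.normal(loc=0.0, scale=1.0, size=size)

# Multivariate: `size` excludes the support dimension, so the
# output shape is `size` plus the length of the mean vector.
multivariate = rng.multivariate_normal(
    mean=np.zeros(3), cov=np.eye(3), size=size
)

print(univariate.shape)    # (5, 2)
print(multivariate.shape)  # (5, 2, 3)
```

An `rng_fn` for a multivariate distribution would need to return arrays shaped like the second case when it receives a non-default `size`.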
96
96
97
97
For a quick test that your new `RandomVariable` `Op` is working, you can call the `Op` with the necessary parameters and then call `eval()` on the returned object:
98
98
@@ -129,9 +129,11 @@ Here is how the example continues:
129
129
130
130
```python
131
131
132
+
import aesara.tensor as at
132
133
from pymc.aesaraf import floatX, intX
133
134
from pymc.distributions.continuous import PositiveContinuous
134
135
from pymc.distributions.dist_math import check_parameters
136
+
from pymc.distributions.shape_utils import rv_size_is_none
135
137
136
138
137
139
# Subclassing `PositiveContinuous` will dispatch a default `log` transformation
This is usually needed for Multivariate distributions.
292
-
You can see an example in {class}`~pymc.test.test_random.TestDirichlet`.
295
+
You can see an example in {class}`~pymc.tests.distributions.test_multivariate.TestDirichlet`.
293
296
294
297
### Notes on `check_pymcs_draws_match_reference` test
295
298
296
299
The `check_pymcs_draws_match_reference` is a very simple test for the equality of draws from the `RandomVariable` and the exact same Python function, given the same inputs and random seed.
297
300
A small number (`size=15`) is checked. This is not supposed to be a test for the correctness of the random number generator.
298
-
The latter kind of test (if warranted) can be performed with the aid of `pymc_random` and `pymc_random_discrete` methods in the same test file, which will perform an expensive statistical comparison between the `RandomVariable.rng_fn` and a reference Python function.
301
+
The latter kind of test (if warranted) can be performed with the aid of `pymc_random` and `pymc_random_discrete` methods, which will perform an expensive statistical comparison between the `RandomVariable.rng_fn` and a reference Python function.
299
302
This kind of test only makes sense if there is a good independent generator reference (i.e., not just the same composition of NumPy / SciPy calls that is done inside `rng_fn`).
300
303
301
304
Finally, when your `rng_fn` is doing something more than just calling a NumPy or SciPy method, you will need to set up an equivalent seeded function with which to compare for the exact draws (instead of relying on `seeded_[scipy|numpy]_distribution_builder`).
302
-
You can find an example in {class}`~pymc.tests.test_distributions_random.TestWeibull`, whose `rng_fn` returns `beta * np.random.weibull(alpha, size=size)`.
305
+
You can find an example in {class}`~pymc.tests.distributions.test_continuous.TestWeibull`, whose `rng_fn` returns `beta * np.random.weibull(alpha, size=size)`.
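
The idea of an equivalent seeded function can be sketched with NumPy alone. The names `weibull_rng_fn` and `seeded_weibull_reference` below are hypothetical and are not the helpers used in the PyMC test suite; the sketch only shows that two functions built from the same composition of calls give identical draws when they start from the same seed.

```python
import numpy as np


def weibull_rng_fn(rng, alpha, beta, size):
    # Hypothetical stand-in for a `RandomVariable.rng_fn` that does
    # more than call a single NumPy method.
    return beta * rng.weibull(alpha, size=size)


def seeded_weibull_reference(seed, alpha, beta, size):
    # Hypothetical reference: builds its own seeded generator and
    # repeats the same composition of NumPy calls.
    rng = np.random.RandomState(seed)
    return beta * rng.weibull(alpha, size=size)


seed, alpha, beta, size = 123, 2.0, 5.0, 15
draws = weibull_rng_fn(np.random.RandomState(seed), alpha, beta, size)
reference = seeded_weibull_reference(seed, alpha, beta, size)

# Raises an AssertionError if any draw differs.
np.testing.assert_array_equal(draws, reference)
```

As the text notes, this kind of check confirms reproducibility of the draws, not the statistical correctness of the generator.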
303
306
304
307
305
308
## 4. Adding tests for the `logp` / `logcdf` methods
306
309
307
-
Tests for the `logp` and `logcdf` methods are contained in `pymc/tests/test_distributions.py`, and most make use of the `TestMatchesScipy` class, which provides `check_logp`, `check_logcdf`, and
308
-
`check_selfconsistency_discrete_logcdf` standard methods.
309
-
These will suffice for most distributions.
310
+
Tests for the `logp` and `logcdf` methods mostly make use of the helpers `check_logp`, `check_logcdf`, and
311
+
`check_selfconsistency_discrete_logcdf` implemented in `~pymc.tests.distributions.util`.
310
312
311
313
```python
312
-
314
+
from pymc.tests.distributions.util import check_logp, check_logcdf, Domain
313
315
from pymc.tests.helpers import select_by_precision