Commit 8879954

ricardoV94 authored and twiecki committed
Update implementing_distribution.md
1 parent a085ab9 commit 8879954

1 file changed: +25 -21 lines changed

docs/source/contributing/implementing_distribution.md

Lines changed: 25 additions & 21 deletions
@@ -1,12 +1,12 @@
11
(implementing-a-distribution)=
2-
# Implementing a Distribution
2+
# Implementing a RandomVariable Distribution
33

44
This guide provides an overview on how to implement a distribution for PyMC version `>=4.0.0`.
55
It is designed for developers who wish to add a new distribution to the library.
66
Users will not be aware of all this complexity and should instead make use of helper methods such as `~pymc.DensityDist`.
77

88
PyMC {class}`~pymc.Distribution` builds on top of Aesara's {class}`~aesara.tensor.random.op.RandomVariable`, and implements `logp`, `logcdf` and `moment` methods as well as other initialization and validation helpers.
9-
Most notably `shape/dims` kwargs, alternative parametrizations, and default `transforms`.
9+
Most notably `shape/dims/observed` kwargs, alternative parametrizations, and default `transform`.
1010

1111
Here is a summary check-list of the steps needed to implement a new distribution.
1212
Each section will be expanded below:
@@ -88,11 +88,11 @@ blah = BlahRV()
8888
Some important things to keep in mind:
8989

9090
1. Everything inside the `rng_fn` method is pure Python code (as are the inputs) and should not make use of other `Aesara` symbolic ops. The random method should make use of the `rng` which is a NumPy {class}`~numpy.random.RandomState`, so that samples are reproducible.
91-
1. Non-default `RandomVariable` dimensions will end up in the `rng_fn` via the `size` kwarg. The `rng_fn` will have to take this into consideration for correct output. `size` is the specification used by NumPy and SciPy and works like PyMC `shape` for univariate distributions, but is different for multivariate distributions. For multivariate distributions the __`size` excludes the `ndim_supp` support dimensions__, whereas the __`shape` of the resulting `TensorVariable` or `ndarray` includes the support dimensions__. This [discussion](https://github.com/numpy/numpy/issues/17669) may be helpful to get more context.
91+
1. Non-default `RandomVariable` dimensions will end up in the `rng_fn` via the `size` kwarg. The `rng_fn` will have to take this into consideration for correct output. `size` is the specification used by NumPy and SciPy and works like PyMC `shape` for univariate distributions, but is different for multivariate distributions. For multivariate distributions the __`size` excludes the `ndim_supp` support dimensions__, whereas the __`shape` of the resulting `TensorVariable` or `ndarray` includes the support dimensions__. For more context check {doc}`The dimensionality notebook <pymc:dimensionality=>`.
9292
1. `Aesara` tries to infer the output shape of the `RandomVariable` (given a user-specified size) by introspection of the `ndim_supp` and `ndim_params` attributes. However, the default method may not work for more complex distributions. In that case, custom `_supp_shape_from_params` (and less probably, `_infer_shape`) should also be implemented in the new `RandomVariable` class. One simple example is seen in the {class}`~pymc.DirichletMultinomialRV` where it was necessary to specify the `rep_param_idx` so that the `default_supp_shape_from_params` helper method can do its job. In more complex cases, it may not suffice to use this default helper. This could happen for instance if the argument values determined the support shape of the distribution, as happens in the `~pymc.distributions.multivariate._LKJCholeskyCovRV`.
9393
1. It's okay to use the `rng_fn` `classmethods` of other Aesara and PyMC `RandomVariables` inside the new `rng_fn`. For example if you are implementing a negative HalfNormal `RandomVariable`, your `rng_fn` can simply return `- halfnormal.rng_fn(rng, scale, size)`.
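
The delegation pattern from the last item can be sketched without Aesara or PyMC. In this minimal illustration, `HalfNormalSketch` and `NegativeHalfNormalSketch` are hypothetical stand-ins, and Python's `random.Random` plays the role of the NumPy generator that a real `rng_fn` receives. Only the shape of the pattern is meant to carry over.

```python
import random


class HalfNormalSketch:
    """Hypothetical stand-in for an existing RandomVariable."""

    @classmethod
    def rng_fn(cls, rng, scale, size):
        # |N(0, scale)| draws, using only the generator that was passed in
        return [abs(rng.gauss(0.0, scale)) for _ in range(size)]


class NegativeHalfNormalSketch:
    """Hypothetical new RandomVariable that reuses another rng_fn."""

    @classmethod
    def rng_fn(cls, rng, scale, size):
        # Delegate the sampling, then flip the sign of every draw
        return [-x for x in HalfNormalSketch.rng_fn(rng, scale, size)]


# The same seed gives the same draws, because all randomness comes from `rng`
draws = NegativeHalfNormalSketch.rng_fn(random.Random(123), scale=2.0, size=5)
same = NegativeHalfNormalSketch.rng_fn(random.Random(123), scale=2.0, size=5)
```

Because the generator is threaded through rather than created inside `rng_fn`, the delegated draws stay reproducible.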
9494

95-
*Note: In addition to `size`, the PyMC API also provides `shape` and `dims` as alternatives to define a distribution dimensionality, but this is taken care of by {class}`~pymc.Distribution`, and should not require any extra changes.*
95+
*Note: In addition to `size`, the PyMC API also provides `shape`, `dims` and `observed` as alternatives to define a distribution dimensionality, but this is taken care of by {class}`~pymc.Distribution`, and should not require any extra changes.*
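
The relation between `size` and the shape of the resulting draws can be made concrete with a small pure-Python sketch. `expected_draw_shape` is a hypothetical helper written for this illustration, not part of PyMC or Aesara, and it assumes the distribution parameters carry no batch dimensions of their own.

```python
def expected_draw_shape(size, support_shape):
    """Shape of the draws: the batch `size` followed by the support dimensions."""
    if size is None:
        # No size requested: a single draw with only the support dimensions
        return tuple(support_shape)
    if isinstance(size, int):
        size = (size,)
    return tuple(size) + tuple(support_shape)


# Univariate (ndim_supp == 0): the shape equals the size
univariate = expected_draw_shape(size=(5, 2), support_shape=())

# Multivariate with a length-3 support (ndim_supp == 1): the shape has one
# extra trailing dimension that `size` does not mention
multivariate = expected_draw_shape(size=(5, 2), support_shape=(3,))
```

Under these assumptions `univariate` is `(5, 2)` and `multivariate` is `(5, 2, 3)`, which mirrors how NumPy treats `size` for multivariate samplers.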
9696

9797
For a quick test that your new `RandomVariable` `Op` is working, you can call the `Op` with the necessary parameters and then call `eval()` on the returned object:
9898

@@ -129,9 +129,11 @@ Here is how the example continues:
129129

130130
```python
131131

132+
import aesara.tensor as at
132133
from pymc.aesaraf import floatX, intX
133134
from pymc.distributions.continuous import PositiveContinuous
134135
from pymc.distributions.dist_math import check_parameters
136+
from pymc.distributions.shape_utils import rv_size_is_none
135137

136138

137139
# Subclassing `PositiveContinuous` will dispatch a default `log` transformation
@@ -231,15 +233,16 @@ pm.logcdf(blah, [-0.5, 1.5]).eval()
231233

232234
## 3. Adding tests for the new `RandomVariable`
233235

234-
Tests for new `RandomVariables` are mostly located in `pymc/tests/test_distributions_random.py`.
235-
Most tests can be accommodated by the default `BaseTestDistribution` class, which provides default tests for checking:
236+
Tests for new `RandomVariables` are mostly located in `pymc/tests/distributions/test_*.py`.
237+
Most tests can be accommodated by the default `BaseTestDistributionRandom` class, which provides default tests for checking:
236238
1. Expected inputs are passed to the `rv_op` by the `dist` `classmethod`, via `check_pymc_params_match_rv_op`
237239
1. Expected (exact) draws are being returned, via `check_pymc_draws_match_reference`
238240
1. Shape variable inference is correct, via `check_rv_size`
239241

240242
```python
243+
from pymc.tests.distributions.util import BaseTestDistributionRandom, seeded_scipy_distribution_builder
241244

242-
class TestBlah(BaseTestDistribution):
245+
class TestBlah(BaseTestDistributionRandom):
243246

244247
    pymc_dist = pm.Blah
245248
    # Parameters with which to test the blah pymc Distribution
@@ -266,7 +269,7 @@ For instance, if it's just the inverse, testing with `1.0` is not very informati
266269

267270
```python
268271

269-
class TestBlahAltParam2(BaseTestDistribution):
272+
class TestBlahAltParam2(BaseTestDistributionRandom):
270273

271274
    pymc_dist = pm.Blah
272275
    # param2 is equivalent to 1 / alt_param2
@@ -276,7 +279,7 @@ class TestBlahAltParam2(BaseTestDistribution):
276279

277280
```
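
Why the test value matters for an alternative parametrization can be shown with a short sketch. The converter functions below are hypothetical and only assume the relation `param2 = 1 / alt_param2` stated in the example. One converter is deliberately wrong, to show which test values expose the bug.

```python
def to_param2(alt_param2):
    """Correct conversion: param2 is the inverse of alt_param2."""
    return 1.0 / alt_param2


def buggy_to_param2(alt_param2):
    """Deliberately wrong conversion that forgets to invert."""
    return alt_param2


# With 1.0 the bug is invisible, because 1 / 1.0 == 1.0
hidden = buggy_to_param2(1.0) == to_param2(1.0)

# With 4.0 the two conversions disagree, so a test would catch the bug
caught = buggy_to_param2(4.0) != to_param2(4.0)
```

Any value other than a fixed point of the conversion works equally well for this purpose.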
278281

279-
Custom tests can also be added to the class as is done for the {class}`~pymc.tests.test_random.TestFlat`.
282+
Custom tests can also be added to the class as is done for the {class}`~pymc.tests.distributions.test_continuous.TestFlat`.
280283

281284
### Note on `check_rv_size` test:
282285

@@ -289,37 +292,36 @@ tests_to_run = ["check_rv_size"]
289292
```
290293

291294
This is usually needed for Multivariate distributions.
292-
You can see an example in {class}`~pymc.test.test_random.TestDirichlet`.
295+
You can see an example in {class}`~pymc.tests.distributions.test_multivariate.TestDirichlet`.
293296

294297
### Notes on `check_pymc_draws_match_reference` test
295298

296299
The `check_pymc_draws_match_reference` is a very simple test for the equality of draws from the `RandomVariable` and the exact same Python function, given the same inputs and random seed.
297300
A small number of draws (`size=15`) is checked. This is not supposed to be a test for the correctness of the random number generator.
298-
The latter kind of test (if warranted) can be performed with the aid of `pymc_random` and `pymc_random_discrete` methods in the same test file, which will perform an expensive statistical comparison between the `RandomVariable.rng_fn` and a reference Python function.
301+
The latter kind of test (if warranted) can be performed with the aid of `pymc_random` and `pymc_random_discrete` methods, which will perform an expensive statistical comparison between the `RandomVariable.rng_fn` and a reference Python function.
299302
This kind of test only makes sense if there is a good independent generator reference (i.e., not just the same composition of NumPy / SciPy calls that is done inside `rng_fn`).
300303

301304
Finally, when your `rng_fn` is doing something more than just calling a NumPy or SciPy method, you will need to set up an equivalent seeded function with which to compare for the exact draws (instead of relying on `seeded_[scipy|numpy]_distribution_builder`).
302-
You can find an example in {class}`~pymc.tests.test_distributions_random.TestWeibull`, whose `rng_fn` returns `beta * np.random.weibull(alpha, size=size)`.
305+
You can find an example in {class}`~pymc.tests.distributions.test_continuous.TestWeibull`, whose `rng_fn` returns `beta * np.random.weibull(alpha, size=size)`.
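
The idea of a custom seeded reference function can be sketched with the standard library alone. Everything here is an assumption made for illustration: `scaled_weibull_rng_fn` and `seeded_reference` are hypothetical names, and `random.Random` stands in for the NumPy generator, so this is not how the PyMC test helpers are implemented.

```python
import random


def scaled_weibull_rng_fn(rng, alpha, beta, size):
    """Hypothetical rng_fn: unit-scale Weibull draws with shape alpha, scaled by beta."""
    # random.weibullvariate takes (scale, shape), so the scale is fixed at 1.0
    return [beta * rng.weibullvariate(1.0, alpha) for _ in range(size)]


def seeded_reference(seed, alpha, beta, size):
    """Reference that builds its own seeded generator and applies the same transform."""
    rng = random.Random(seed)
    return [beta * rng.weibullvariate(1.0, alpha) for _ in range(size)]


seed = 20160911
draws = scaled_weibull_rng_fn(random.Random(seed), alpha=2.0, beta=5.0, size=15)
reference = seeded_reference(seed, alpha=2.0, beta=5.0, size=15)
```

With identical seeds and inputs the two lists match exactly, which is the property the exact-draws test relies on. It says nothing about whether the sampler is statistically correct.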
303306

304307

305308
## 4. Adding tests for the `logp` / `logcdf` methods
306309

307-
Tests for the `logp` and `logcdf` methods are contained in `pymc/tests/test_distributions.py`, and most make use of the `TestMatchesScipy` class, which provides `check_logp`, `check_logcdf`, and
308-
`check_selfconsistency_discrete_logcdf` standard methods.
309-
These will suffice for most distributions.
310+
Tests for the `logp` and `logcdf` mostly make use of the helpers `check_logp`, `check_logcdf`, and
311+
`check_selfconsistency_discrete_logcdf` implemented in `~pymc.tests.distributions.util`
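
Conceptually, such a check evaluates the new `logp` and an independent reference over a grid of values and parameters. The sketch below is a standard-library analogy under stated assumptions: `check_logp_sketch` and `normal_logp` are hypothetical names, `statistics.NormalDist` plays the role of the reference, and infinite edge values are left out. It does not reproduce the PyMC helpers.

```python
import itertools
import math
from statistics import NormalDist


def normal_logp(value, mu, sigma):
    """Hand-written log-density of a Normal, standing in for a new logp."""
    z = (value - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z


def reference_logp(value, mu, sigma):
    """Independent reference built on the standard library."""
    return math.log(NormalDist(mu, sigma).pdf(value))


def check_logp_sketch(logp, reference, domain, paramdomains, tol=1e-9):
    """Compare logp with the reference on every value/parameter combination."""
    names = list(paramdomains)
    n_checked = 0
    for value in domain:
        for combo in itertools.product(*(paramdomains[n] for n in names)):
            params = dict(zip(names, combo))
            expected = reference(value, **params)
            assert math.isclose(logp(value, **params), expected, rel_tol=tol, abs_tol=tol)
            n_checked += 1
    return n_checked


n_checked = check_logp_sketch(
    normal_logp,
    reference_logp,
    domain=[-2.1, -1, -0.01, 0.0, 0.01, 1, 2.1],
    paramdomains={"mu": [-2.1, 0.0, 1], "sigma": [0.5, 0.9, 1.5, 2]},
)
```

The grid deliberately avoids parameter combinations whose density underflows to zero, since the reference takes a logarithm of the density.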
310312

311313
```python
312-
314+
from pymc.tests.distributions.util import check_logp, check_logcdf, Domain
313315
from pymc.tests.helpers import select_by_precision
314316

315317
R = Domain([-np.inf, -2.1, -1, -0.01, 0.0, 0.01, 1, 2.1, np.inf])
316318
Rplus = Domain([0, 0.01, 0.1, 0.9, 0.99, 1, 1.5, 2, 100, np.inf])
317319

318-
...
319320

320-
def test_blah(self):
321321

322-
    self.check_logp(
322+
def test_blah():
323+
324+
    check_logp(
323325
        pymc_dist=pm.Blah,
324326
        # Domain of the distribution values
325327
        domain=R,
@@ -333,7 +335,7 @@ def test_blah(self):
333335
        n_samples=100,
334336
    )
335337

336-
    self.check_logcdf(
338+
    check_logcdf(
337339
        pymc_dist=pm.Blah,
338340
        domain=R,
339341
        paramdomains={"mu": R, "sigma": Rplus},
@@ -370,15 +372,17 @@ def test_blah_logcdf(self):
370372

371373
## 5. Adding tests for the `moment` method
372374

373-
Tests for the `moment` method are contained in `pymc/tests/test_distributions_moments.py`, and make use of the function `assert_moment_is_expected`
375+
Tests for the `moment` make use of the function `assert_moment_is_expected`
374376
which checks if:
375377
1. Moments return the `expected` values
376378
1. Moments have the expected size and shape
379+
1. Moments have a finite logp
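
The three checks listed above can be sketched in plain Python. All names here are hypothetical and chosen for illustration: `normal_moment` returns the mean repeated to the requested size, and `assert_moment_sketch` only mimics the spirit of the checks, not the PyMC helper itself.

```python
import math


def normal_moment(mu, size=None):
    """Hypothetical moment of a Normal: its mean, repeated to the requested size."""
    return mu if size is None else [mu] * size


def normal_logp(value, mu, sigma):
    z = (value - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z


def assert_moment_sketch(moment, expected, logp):
    # 1. and 2. Values and shape: list equality also compares the lengths
    assert moment == expected
    # 3. The logp evaluated at the moment must be finite
    values = moment if isinstance(moment, list) else [moment]
    assert all(math.isfinite(logp(v)) for v in values)
    return True


ok_scalar = assert_moment_sketch(
    normal_moment(1.5), 1.5, lambda v: normal_logp(v, mu=1.5, sigma=2.0)
)
ok_sized = assert_moment_sketch(
    normal_moment(1.5, size=3), [1.5, 1.5, 1.5], lambda v: normal_logp(v, mu=1.5, sigma=2.0)
)
```

The finite-logp check is what guards against a moment that falls outside the support of the distribution.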
377380

378381
```python
379382

380383
import pytest
381384
from pymc.distributions import Blah
385+
from pymc.tests.distributions.util import assert_moment_is_expected
382386

383387
@pytest.mark.parametrize(
384388
"param1, param2, size, expected",
