Commit 0416868

doc clarifications
1 parent ac0adec commit 0416868

2 files changed: +35 -21 lines changed


doc/arkode/guide/source/Mathematics.rst

Lines changed: 28 additions & 21 deletions
@@ -2134,9 +2134,13 @@ Adjoint Sensitivity Analysis
 ============================
 
 Consider :eq:`ARKODE_IVP_simple_explicit`, but where the ODE also depends on some parameters
-:math:`p` (that is, we have :math:`f(t,y,p)`). Now, suppose we have a functional :math:`g(y(t_f),p)`
-for which we would like to compute the gradients :math:`\partial g/\partial y(t_0)`
-and/or :math:`\partial g/\partial p`. The adjoint method is one approach to obtaining the
+:math:`p` (that is, we have :math:`f(t,y,p)`). Now, suppose we have a functional,
+
+.. math::
+   g(y(t_f),p),
+
+for which we would like to compute the gradients :math:`\partial g(y(t_f),p)/\partial y(t_0)`
+and/or :math:`\partial g(y(t_f),p)/\partial p`. The adjoint method is one approach to obtaining the
 gradients that is particularly efficient when there are relatively few functionals and a
 large number of parameters. With the adjoint method we solve the adjoint ODEs for :math:`\lambda(t)
 \in \mathbb{R}^N` and :math:`\mu(t) \in \mathbb{R}^{N_s}`:
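For context, the adjoint ODE system that this last sentence introduces falls just outside this hunk. A sketch of the standard continuous-adjoint formulation for this setup (using the :math:`f_y` and :math:`f_p` notation defined in the next hunk; the guide's actual display may differ in notation) is

.. math::

   \lambda'(t) = -f_y^T(t, y(t), p)\, \lambda(t), \qquad \lambda(t_f) = \left(\frac{\partial g}{\partial y(t_f)}\right)^T,

   \mu'(t) = -f_p^T(t, y(t), p)\, \lambda(t), \qquad \mu(t_f) = \left(\frac{\partial g}{\partial p}\right)^T,

both integrated backwards from :math:`t_f` to :math:`t_0`, at which point :math:`\lambda(t_0)` and :math:`\mu(t_0)` carry the two gradients named above.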
@@ -2150,7 +2154,7 @@ large number of parameters. With the adjoint method we solve the adjoint ODEs fo
 \partial f/\partial y` is the Jacobian with respect to the dependent variable and :math:`f_p \equiv
 \partial f/\partial p` is the Jacobian with respect to the parameters. The ARKStep module in ARKODE
 provides adjoint sensitivity analysis based on the *discrete* formulation, i.e., given an s-stage explicit
-Runge--Kutta method (as in :eq:`ARKODE_ERK`, but without the embedding), the discrete adjoint
+Runge--Kutta method (as in :eq:`ARKODE_ERK`, but without the embedding), the discrete adjoint
 to compute :math:`\lambda_n` and :math:`\mu_n` starting from :math:`\lambda_{n+1}` and
 :math:`\mu_{n+1}` is given by
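The step-wise recursion referred to here likewise lies outside the hunk. One standard form of the discrete adjoint of a single explicit Runge--Kutta step (cf. the ``sanduDiscrete2006`` entry in ``sundials.bib`` below; here :math:`z_i` are the forward stage values and :math:`\Lambda_i` are stage adjoints introduced only for this sketch, so the guide's display may differ in notation and indexing) is

.. math::

   \Lambda_i = h_n f_y^T(t_{n,i}, z_i)\left(b_i \lambda_{n+1} + \sum_{j=i+1}^{s} a_{ji} \Lambda_j\right), \qquad i = s, \ldots, 1,

   \lambda_n = \lambda_{n+1} + \sum_{i=1}^{s} \Lambda_i, \qquad
   \mu_n = \mu_{n+1} + h_n \sum_{i=1}^{s} f_p^T(t_{n,i}, z_i)\left(b_i \lambda_{n+1} + \sum_{j=i+1}^{s} a_{ji} \Lambda_j\right).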

@@ -2175,20 +2179,23 @@ For more information on performing discrete adjoint sensitivity analysis see, :n
 Discrete vs. Continuous Adjoint Method
 --------------------------------------
 
-We note that in addition to the discrete adjoint approach, there is a second adjoint method that is used in the literature. In the *continuous*
-approach, we derive the sensitivity equations directly from the model and then we integrate them
-with a time integration method. This is the approach implemented in the SUNDIALS :ref:`CVODES
-<CVODES.Mathematics.ASA>` and :ref:`IDAS <IDAS.Mathematics.ASA>` packages. In the *discrete*
-approach, the model equations are discretized with the time integration method first, and then we
-derive the adjoints of the discretized equations. It is understood that the continuous adjoint
-method can be problematic in the context of optimization problems because the continuous adjoint
-method provides an approximation to the gradient of a continuous cost function while the optimizer
-is expecting the gradient of the discrete cost function. The discrepancy means that the optimizer
-can fail to converge further once it is near a local minimum :cite:p:`giles2000introduction`. On
-the other hand, the discrete adjoint method provides the exact gradient of the discrete cost
-function allowing the optimizer to fully converge. Consequently, the discrete adjoint method is
-often preferable in optimization despite its own drawbacks -- such as its (relatively) increased
-memory usage and the possible introduction of unphysical computational modes
-:cite:p:`sirkes1997finite`. This is not to say that the discrete adjoint method is always the better
-choice over the continuous adjoint method in optimization. Practical considerations may lead one to
-choose the continuous approach.
+We note that in addition to the discrete adjoint approach, there is a second adjoint method that is
+sometimes used -- the *continuous* adjoint method. In the continuous approach, we derive the
+sensitivity equations directly from the model and then we integrate them with a time integration
+method. This is the approach implemented in the SUNDIALS :ref:`CVODES <CVODES.Mathematics.ASA>` and
+:ref:`IDAS <IDAS.Mathematics.ASA>` packages. In the *discrete* approach, the model equations are
+discretized with the time integration method first, and then we derive the adjoints of the
+discretized equations. It is understood that the continuous adjoint method can be problematic in the
+context of optimization problems because the continuous adjoint method provides an approximation to
+the gradient of a continuous cost function while the optimizer is expecting the gradient of the
+discrete cost function. The discrepancy means that the optimizer can fail to converge further once
+it is near a local minimum :cite:p:`giles2000introduction`. On the other hand, the discrete adjoint
+method provides the exact gradient of the discrete cost function allowing the optimizer to fully
+converge. Consequently, the discrete adjoint method is often preferable in optimization despite its
+own drawbacks -- such as its (relatively) increased memory usage and the possible introduction of
+unphysical computational modes :cite:p:`sirkes1997finite`. This is not to say that the discrete
+adjoint approach is always the better choice over the continuous adjoint approach in optimization.
+Computational efficiency and stability of one approach over the other can be both problem and method
+dependent. Section 8 in the paper :cite:p:`rackauckas2020universal` discusses the tradeoffs further
+and provides numerous references that may help inform users in choosing between the discrete and
+continuous adjoint approaches.
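The mismatch the revised paragraph describes is easy to reproduce on a toy problem. The sketch below is illustration only: plain NumPy, no SUNDIALS/ARKODE API, and every name in it (``forward_euler``, ``grad_discrete``, ``grad_continuous``) is local to the example. It differentiates the cost :math:`g = y(T)` for :math:`y' = p\,y` through a forward Euler discretization both ways.

.. code-block:: python

   import numpy as np

   # Toy problem: y' = p*y, y(0) = y0, cost g = y(T), discretized with forward Euler.
   # The "discrete cost" is the final Euler iterate y_N; its exact gradient w.r.t. p
   # is what an optimizer wrapped around this discretization would want.
   y0, p, T, N = 1.0, 0.7, 1.0, 50
   h = T / N

   def forward_euler(p):
       """Forward Euler trajectory for y' = p*y."""
       y = np.empty(N + 1)
       y[0] = y0
       for n in range(N):
           y[n + 1] = (1.0 + h * p) * y[n]
       return y

   y = forward_euler(p)

   # Discrete adjoint: differentiate the Euler update y_{n+1} = (1 + h*p)*y_n itself,
   # stepping backwards through the recorded trajectory.
   lam, grad_discrete = 1.0, 0.0         # lam = d(y_N)/d(y_{n+1}), initially d(y_N)/d(y_N)
   for n in reversed(range(N)):
       grad_discrete += h * y[n] * lam   # explicit p-dependence of step n: d(y_{n+1})/dp = h*y_n
       lam *= 1.0 + h * p                # chain through d(y_{n+1})/d(y_n) = 1 + h*p

   # Continuous adjoint: integrate lambda' = -p*lambda backwards (same Euler scheme),
   # lambda(T) = 1, then approximate dg/dp = int_0^T lambda(t)*y(t) dt by a rectangle rule.
   lam_c = np.empty(N + 1)
   lam_c[N] = 1.0
   for n in reversed(range(N)):
       lam_c[n] = (1.0 + h * p) * lam_c[n + 1]
   grad_continuous = h * np.sum(lam_c[:-1] * y[:-1])

   # Reference values.
   eps = 1e-6
   fd = (forward_euler(p + eps)[-1] - forward_euler(p - eps)[-1]) / (2.0 * eps)
   exact_continuous = T * y0 * np.exp(p * T)   # gradient of the *continuous* cost y(T) = y0*exp(p*T)

   print(f"finite difference of discrete cost : {fd:.10f}")
   print(f"discrete adjoint gradient          : {grad_discrete:.10f}")   # matches fd
   print(f"continuous adjoint gradient        : {grad_continuous:.10f}") # off from fd by O(h)
   print(f"gradient of continuous cost        : {exact_continuous:.10f}")

Running it, the discrete-adjoint value matches the finite difference of the discrete cost (differences at the level of the finite-difference error), while the continuous-adjoint value lands near the gradient of the continuous cost instead -- the discrepancy that can stall an optimizer near a minimum.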

doc/shared/sundials.bib

Lines changed: 7 additions & 0 deletions
@@ -2413,4 +2413,11 @@ @article{sanduDiscrete2006
   issn = {0302-9743},
   doi = {10.1007/11758549_76},
   pages = {550--557}
+}
+
+@article{rackauckas2020universal,
+  title={Universal differential equations for scientific machine learning},
+  author={Rackauckas, Christopher and Ma, Yingbo and Martensen, Julius and Warner, Collin and Zubov, Kirill and Supekar, Rohit and Skinner, Dominic and Ramadhan, Ali and Edelman, Alan},
+  journal={arXiv preprint arXiv:2001.04385},
+  year={2020}
 }
