@@ -2134,9 +2134,13 @@ Adjoint Sensitivity Analysis
============================

Consider :eq:`ARKODE_IVP_simple_explicit`, but where the ODE also depends on some parameters
- :math:`p` (that is, we have :math:`f(t,y,p)`). Now, suppose we have a functional :math:`g(y(t_f),p)`
- for which we would like to compute the gradients :math:`\partial g/\partial y(t_0)`
- and/or :math:`\partial g/\partial p`. The adjoint method is one approach to obtaining the
+ :math:`p` (that is, we have :math:`f(t,y,p)`). Now, suppose we have a functional,
+
+ .. math::
+    g(y(t_f),p),
+
+ for which we would like to compute the gradients :math:`\partial g(y(t_f),p)/\partial y(t_0)`
+ and/or :math:`\partial g(y(t_f),p)/\partial p`. The adjoint method is one approach to obtaining the
gradients that is particularly efficient when there are relatively few functionals and a
large number of parameters. With the adjoint method we solve the adjoint ODEs for :math:`\lambda(t)
\in \mathbb{R}^N` and :math:`\mu(t) \in \mathbb{R}^{N_s}`:
@@ -2150,7 +2154,7 @@ large number of parameters. With the adjoint method we solve the adjoint ODEs fo
\partial f/\partial y` is the Jacobian with respect to the dependent variable and :math:`f_p \equiv
\partial f/\partial p` is the Jacobian with respect to the parameters. The ARKStep module in ARKODE
provides adjoint sensitivity analysis based on the *discrete* formulation, i.e., given an s-stage explicit
- Runge--Kutta method (as in :eq:`ARKODE_ERK`, but without the embedding), the discrete adjoint
+ Runge--Kutta method (as in :eq:`ARKODE_ERK`, but without the embedding), the discrete adjoint
to compute :math:`\lambda_n` and :math:`\mu_n` starting from :math:`\lambda_{n+1}` and
:math:`\mu_{n+1}` is given by

@@ -2175,20 +2179,23 @@ For more information on performing discrete adjoint sensitivity analysis see, :n
Discrete vs. Continuous Adjoint Method
--------------------------------------

- We note that in addition to the discrete adjoint approach, there is a second adjoint method that is used in the literature. In the *continuous*
- approach, we derive the sensitivity equations directly from the model and then we integrate them
- with a time integration method. This is the approach implemented in the SUNDIALS :ref:`CVODES
- <CVODES.Mathematics.ASA>` and :ref:`IDAS <IDAS.Mathematics.ASA>` packages. In the *discrete*
- approach, the model equations are discretized with the time integration method first, and then we
- derive the adjoints of the discretized equations. It is understood that the continuous adjoint
- method can be problematic in the context of optimization problems because the continuous adjoint
- method provides an approximation to the gradient of a continuous cost function while the optimizer
- is expecting the gradient of the discrete cost function. The discrepancy means that the optimizer
- can fail to converge further once it is near a local minimum :cite:p:`giles2000introduction`. On
- the other hand, the discrete adjoint method provides the exact gradient of the discrete cost
- function allowing the optimizer to fully converge. Consequently, the discrete adjoint method is
- often preferable in optimization despite its own drawbacks -- such as its (relatively) increased
- memory usage and the possible introduction of unphysical computational modes
- :cite:p:`sirkes1997finite`. This is not to say that the discrete adjoint method is always the better
- choice over the continuous adjoint method in optimization. Practical considerations may lead one to
- choose the continuous approach.
+ We note that in addition to the discrete adjoint approach, there is a second adjoint method that is
+ sometimes used -- the *continuous* adjoint method. In the continuous approach, we derive the
+ sensitivity equations directly from the model and then we integrate them with a time integration
+ method. This is the approach implemented in the SUNDIALS :ref:`CVODES <CVODES.Mathematics.ASA>` and
+ :ref:`IDAS <IDAS.Mathematics.ASA>` packages. In the *discrete* approach, the model equations are
+ discretized with the time integration method first, and then we derive the adjoints of the
+ discretized equations. It is understood that the continuous adjoint method can be problematic in the
+ context of optimization problems because the continuous adjoint method provides an approximation to
+ the gradient of a continuous cost function while the optimizer is expecting the gradient of the
+ discrete cost function. The discrepancy means that the optimizer can fail to converge further once
+ it is near a local minimum :cite:p:`giles2000introduction`. On the other hand, the discrete adjoint
+ method provides the exact gradient of the discrete cost function, allowing the optimizer to fully
+ converge. Consequently, the discrete adjoint method is often preferable in optimization despite its
+ own drawbacks -- such as its (relatively) increased memory usage and the possible introduction of
+ unphysical computational modes :cite:p:`sirkes1997finite`. This is not to say that the discrete
+ adjoint approach is always the better choice over the continuous adjoint approach in optimization.
+ Computational efficiency and stability of one approach over the other can be both problem- and
+ method-dependent. Section 8 of :cite:p:`rackauckas2020universal` discusses the tradeoffs further
+ and provides numerous references that may help inform users in choosing between the discrete and
+ continuous adjoint approaches.
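
As a sketch of that distinction (using the section's notation :math:`f_y \equiv \partial f/\partial y` and a terminal functional :math:`g(y(t_f),p)`; the terminal conditions and the :math:`\mu` equation are omitted, and the continuous adjoint ODE shown is the standard one for this setting rather than a formula quoted from the documentation above), the two approaches differ only in the order in which differentiation and time discretization are applied:

.. math::
   \text{continuous adjoint (differentiate, then discretize):}\quad
   y' = f(t,y,p)
   \;\longrightarrow\;
   \lambda' = -f_y^T \lambda \text{ (solved backward from } t_f\text{)}
   \;\longrightarrow\;
   \text{apply a time integrator}

   \text{discrete adjoint (discretize, then differentiate):}\quad
   y' = f(t,y,p)
   \;\longrightarrow\;
   \text{apply a Runge--Kutta method}
   \;\longrightarrow\;
   \text{differentiate the scheme to obtain } \lambda_n,\ \mu_n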