@@ -2134,9 +2134,13 @@ Adjoint Sensitivity Analysis
============================

Consider :eq:`ARKODE_IVP_simple_explicit`, but where the ODE also depends on some parameters
- :math:`p` (that is, we have :math:`f(t,y,p)`). Now, suppose we have a functional :math:`g(y(t_f),p)`
- for which we would like to compute the gradients :math:`\partial g/\partial y(t_0)`
- and/or :math:`\partial g/\partial p`. The adjoint method is one approach to obtaining the
+ :math:`p` (that is, we have :math:`f(t,y,p)`). Now, suppose we have a functional,
+
+ .. math::
+    g(y(t_f),p),
+
+ for which we would like to compute the gradients :math:`\partial g(y(t_f),p)/\partial y(t_0)`
+ and/or :math:`\partial g(y(t_f),p)/\partial p`. The adjoint method is one approach to obtaining the
gradients that is particularly efficient when there are relatively few functionals and a
large number of parameters. With the adjoint method we solve the adjoint ODEs for :math:`\lambda(t)
\in \mathbb{R}^N` and :math:`\mu(t) \in \mathbb{R}^{N_s}`:
@@ -2150,7 +2154,7 @@ large number of parameters. With the adjoint method we solve the adjoint ODEs fo
\partial f/\partial y` is the Jacobian with respect to the dependent variable and :math:`f_p \equiv
\partial f/\partial p` is the Jacobian with respect to the parameters. The ARKStep module in ARKODE
provides adjoint sensitivity analysis based on the *discrete* formulation, i.e., given an s-stage explicit
- Runge--Kutta method (as in :eq:`ARKODE_ERK`, but without the embedding), the discrete adjoint
+ Runge--Kutta method (as in :eq:`ARKODE_ERK`, but without the embedding), the discrete adjoint
to compute :math:`\lambda_n` and :math:`\mu_n` starting from :math:`\lambda_{n+1}` and
:math:`\mu_{n+1}` is given by

@@ -2175,20 +2179,23 @@ For more information on performing discrete adjoint sensitivity analysis see, :n
Discrete vs. Continuous Adjoint Method
--------------------------------------

- We note that in addition to the discrete adjoint approach, there is a second adjoint method that is used in the literature. In the *continuous*
- approach, we derive the sensitivity equations directly from the model and then we integrate them
- with a time integration method. This is the approach implemented in the SUNDIALS :ref:`CVODES
- <CVODES.Mathematics.ASA>` and :ref:`IDAS <IDAS.Mathematics.ASA>` packages. In the *discrete*
- approach, the model equations are discretized with the time integration method first, and then we
- derive the adjoints of the discretized equations. It is understood that the continuous adjoint
- method can be problematic in the context of optimization problems because the continuous adjoint
- method provides an approximation to the gradient of a continuous cost function while the optimizer
- is expecting the gradient of the discrete cost function. The discrepancy means that the optimizer
- can fail to converge further once it is near a local minimum :cite:p:`giles2000introduction`. On
- the other hand, the discrete adjoint method provides the exact gradient of the discrete cost
- function allowing the optimizer to fully converge. Consequently, the discrete adjoint method is
- often preferable in optimization despite its own drawbacks -- such as its (relatively) increased
- memory usage and the possible introduction of unphysical computational modes
- :cite:p:`sirkes1997finite`. This is not to say that the discrete adjoint method is always the better
- choice over the continuous adjoint method in optimization. Practical considerations may lead one to
- choose the continuous approach.
+ We note that in addition to the discrete adjoint approach, there is a second adjoint method that is
+ sometimes used -- the *continuous* adjoint method. In the continuous approach, we derive the
+ sensitivity equations directly from the model and then we integrate them with a time integration
+ method. This is the approach implemented in the SUNDIALS :ref:`CVODES <CVODES.Mathematics.ASA>` and
+ :ref:`IDAS <IDAS.Mathematics.ASA>` packages. In the *discrete* approach, the model equations are
+ discretized with the time integration method first, and then we derive the adjoints of the
+ discretized equations. It is understood that the continuous adjoint method can be problematic in the
+ context of optimization problems because the continuous adjoint method provides an approximation to
+ the gradient of a continuous cost function while the optimizer is expecting the gradient of the
+ discrete cost function. The discrepancy means that the optimizer can fail to converge further once
+ it is near a local minimum :cite:p:`giles2000introduction`. On the other hand, the discrete adjoint
+ method provides the exact gradient of the discrete cost function, allowing the optimizer to fully
+ converge. Consequently, the discrete adjoint method is often preferable in optimization despite its
+ own drawbacks -- such as its (relatively) increased memory usage and the possible introduction of
+ unphysical computational modes :cite:p:`sirkes1997finite`. This is not to say that the discrete
+ adjoint approach is always the better choice over the continuous adjoint approach in optimization.
+ Computational efficiency and stability of one approach over the other can be both problem- and
+ method-dependent. Section 8 of :cite:p:`rackauckas2020universal` discusses the tradeoffs further
+ and provides numerous references that may help inform users in choosing between the discrete and
+ continuous adjoint approaches.
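
As a sketch of that distinction (using the section's notation :math:`f_y \equiv \partial f/\partial y` and a terminal functional :math:`g(y(t_f),p)`; the terminal conditions and the :math:`\mu` equation are omitted, and the continuous adjoint ODE shown is the standard one for this setting rather than a formula quoted from the documentation above), the two approaches differ only in the order in which differentiation and time discretization are applied:

.. math::
   \text{continuous adjoint (differentiate, then discretize):}\quad
   y' = f(t,y,p)
   \;\longrightarrow\;
   \lambda' = -f_y^T \lambda \text{ (solved backward from } t_f\text{)}
   \;\longrightarrow\;
   \text{apply a time integrator}

   \text{discrete adjoint (discretize, then differentiate):}\quad
   y' = f(t,y,p)
   \;\longrightarrow\;
   \text{apply a Runge--Kutta method}
   \;\longrightarrow\;
   \text{differentiate the scheme to obtain } \lambda_n,\ \mu_n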