Commit c213c98

add backward error calculation documentation

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>

1 parent 64a57df

docs/advanced_documentation/algorithms/lu-solver.md: 1 file changed, 211 additions, 23 deletions

#### Pivot perturbation algorithm

The following pivot perturbation algorithm works for both real and complex matrix equations. Let $M$
be the matrix, $\left\|M\right\|_{\infty ,\text{bwod}}$ the
[block-wise off-diagonal infinite norm](#block-wise-off-diagonal-infinite-matrix-norm) of the matrix.

1. Set $\epsilon \gets \text{perturbation_threshold} * \left\|M\right\|_{\infty ,\text{bwod}}$.
2. If $|\text{pivot_element}| \lt \epsilon$, then:
   1. If $|\text{pivot_element}| = 0$, then:
      1. Set $\text{direction} \gets 1$.
      2. Proceed.
   2. Else:
      1. Set $\text{direction} \gets \text{pivot_element} / |\text{pivot_element}|$.
      2. Proceed.
   3. Set $\text{pivot_element} \gets \epsilon * \text{direction}$.

$\text{direction}$ ensures that the complex phase of the pivot element is preserved, with a fallback
to the positive real axis when the pivot element is identically zero.
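
As a minimal illustration, the perturbation step could be sketched in Python as follows; the
function `perturb_pivot` is hypothetical, and `bwod_norm` and `perturbation_threshold` correspond
to the quantities described above.

```python
def perturb_pivot(pivot: complex, bwod_norm: float, perturbation_threshold: float) -> complex:
    """Sketch of the pivot perturbation step described above (illustrative only)."""
    epsilon = perturbation_threshold * bwod_norm
    if abs(pivot) < epsilon:
        # preserve the complex phase; fall back to the positive real axis for an exact zero
        direction = 1.0 if pivot == 0 else pivot / abs(pivot)
        pivot = epsilon * direction
    return pivot
```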

#### Iterative refinement

This algorithm is heavily inspired by the GESP algorithm described in
[Li99](https://www.semanticscholar.org/paper/A-Scalable-Sparse-Direct-Solver-Using-Static-Li-Demmel/7ea1c3360826ad3996f387eeb6d70815e1eb3761).

The refinement process improves the solution to the matrix equation $A \cdot x = b$ as follows.
In iteration step $i$, it assumes an existing approximation $x_i$ for $x$. It then defines
the difference between the current best and the actual solution $\Delta x = x - x_i$.
Substituting into the original equation yields $A \cdot (x_i + \Delta x) = b$, so that
$A \cdot \Delta x = b - A \cdot x_i =: r$, where the residual $r$ can be calculated.
An estimate for the left-hand side can be obtained by using the pivot-perturbed matrix
$\tilde{A}$ instead of the original matrix $A$. Convergence can be reached if $r \to 0$, since then
also $\left\|\Delta x\right\| \to 0$. Solving for $\Delta x$ and substituting back into
$x_{i+1} = x_i + \Delta x$ provides the next best approximation $x_{i+1}$ for $x$.

A measure for the quality of the approximation is given by the $\text{backward_error}$ (see also the
[backward error formula](#improved-backward-error-calculation)).

Since the matrix $A$ does not change during this process, the LU decomposition remains valid
throughout, so that this iterative refinement can be done at a reasonably low cost.

Given the original matrix equation $A \cdot x = b$ to solve, the pivot-perturbed matrix
$\tilde{A}$ with a pre-calculated LU decomposition, and the convergence threshold $\epsilon$,
the algorithm is as follows:

1. Initialize:
   1. Set the initial estimate: $x_{\text{est}} \gets 0$.
   2. Set the initial residual: $r \gets b$.
   3. Set the initial backward error: $\text{backward_error} \gets \infty$.
   4. Set the number of iterations to 0.
2. Iteratively refine; loop:
   1. Check the stop criteria:
      1. If $\text{backward_error} \leq \epsilon$, then:
         1. Convergence reached: stop the refinement process.
      2. Else, if the number of iterations exceeds the maximum allowed number of iterations, then:
         1. Convergence not reached; iterative refinement not possible: raise a sparse matrix
            error.
      3. Else:
         1. Increase the number of iterations.
         2. Proceed.
   2. Solve $\tilde{A} \cdot \Delta x = r$ for $\Delta x$.
   3. Calculate the backward error from the current (not yet updated) $x_{\text{est}}$ and $r$ using
      the [backward error formula](#improved-backward-error-calculation).
   4. Set the next estimate of $x$: $x_{\text{est}} \gets x_{\text{est}} + \Delta x$.
   5. Set the residual: $r \gets b - A \cdot x_{\text{est}}$.

Because the backward error is calculated from the $x_{\text{est}}$ and $r$ of the previous
iteration, the iterative refinement loop is always executed at least twice.
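
The loop can be sketched in Python as shown below. This is only an illustration of the steps above,
not the actual implementation: `lu_solve` stands for a solve using the pre-computed LU decomposition
of $\tilde{A}$, and `backward_error_fn` for an implementation of the
[backward error formula](#improved-backward-error-calculation); both names are hypothetical.

```python
import numpy as np


def iterative_refinement(A, b, lu_solve, backward_error_fn, epsilon, max_iterations):
    """Sketch of the refinement loop described above (illustrative only).

    lu_solve(r) is assumed to solve A_tilde @ dx = r with the pre-computed LU decomposition;
    backward_error_fn(A, x_est, b, r) implements the backward error formula.
    """
    x_est = np.zeros_like(b)
    r = b.copy()
    backward_error = np.inf
    num_iterations = 0
    while True:
        # stop criteria
        if backward_error <= epsilon:
            break  # converged
        if num_iterations > max_iterations:
            raise RuntimeError("sparse matrix error: iterative refinement did not converge")
        num_iterations += 1
        dx = lu_solve(r)
        # backward error uses x_est and r from before this iteration's update
        backward_error = backward_error_fn(A, x_est, b, r)
        x_est = x_est + dx
        r = b - A @ x_est
    return x_est
```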

The reason a sparse matrix error is raised, rather than an iteration-divergence error, is that it is
the iterative refinement of the matrix equation solution that cannot be completed within the set
number of iterations, not the solution of the power system equations themselves. This only happens
when the matrix equation requires iterative refinement in the first place, which is the case only
when pivot perturbation is needed, i.e., for an ill-conditioned matrix equation.

#### Differences with literature

The algorithm described in
[Li99](https://www.semanticscholar.org/paper/A-Scalable-Sparse-Direct-Solver-Using-Static-Li-Demmel/7ea1c3360826ad3996f387eeb6d70815e1eb3761)
contains an early-out criterion for the iterative refinement that checks for diminishing returns in
consecutive iterations. It amounts to (in reverse order):

1. If $\text{backward_error} \gt \frac{1}{2}\text{last_backward_error}$, then:
   1. Stop iterative refinement.
2. Else:
   1. Go to the next refinement iteration.

In power systems, however, the fact that the matrix may contain elements
[spanning several orders of magnitude](#element-size-properties-of-power-system-equations) may cause
slow convergence far away from the optimum. The diminishing-returns criterion would cause the
algorithm to exit before the actual solution is found. Multiple refinement iterations may still
yield better results. The power grid model therefore does not stop on diminishing returns. Instead,
a maximum number of iterations is used in combination with the error tolerance.
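
For illustration only, the two stopping strategies could be expressed as follows; all names are
hypothetical and merely mirror the descriptions above.

```python
def li99_early_out(backward_error: float, last_backward_error: float) -> bool:
    """Li99's diminishing-returns criterion: stop once the error no longer halves."""
    return backward_error > 0.5 * last_backward_error


def power_grid_model_should_stop(backward_error: float, epsilon: float,
                                 num_iterations: int, max_iterations: int) -> bool:
    """Stop only on convergence; raise when the iteration budget is exhausted."""
    if backward_error <= epsilon:
        return True
    if num_iterations > max_iterations:
        raise RuntimeError("sparse matrix error: iterative refinement did not converge")
    return False
```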

##### Improved backward error calculation

In power system equations, the matrix equation $A x = b$ can be very unbalanced: some entries
in the matrix $A$ may be very large while others are zero or very small. The same may be true for
the right-hand side of the equation $b$, as well as for its solution $x$. In fact, there may be
certain rows $i$ for which both $\left|b[i]\right|$ and
$\sum_j \left|A[i,j]\right| \left|x[j]\right|$ are small and, therefore, their sum is prone to
rounding errors, which may be several orders of magnitude larger than machine precision.

[Li99](https://www.semanticscholar.org/paper/A-Scalable-Sparse-Direct-Solver-Using-Static-Li-Demmel/7ea1c3360826ad3996f387eeb6d70815e1eb3761)
uses the following backward error in the [iterative refinement algorithm](#iterative-refinement):

$$
\begin{align*}
\text{backward_error}_{\text{Li}} &= \max_i \frac{\left|r[i]\right|}{\sum_j \left|A[i,j]\right| \left|x[j]\right| + \left|b[i]\right|} \\
&= \max_i \frac{\left|b[i] - \sum_j A[i,j] x[j]\right|}{\sum_j \left|A[i,j]\right| \left|x[j]\right| + \left|b[i]\right|} \\
&= \max_i \frac{\left|r[i]\right|}{\left(\left|A\right| \cdot \left|x\right| + \left|b\right|\right)\left[i\right]}
\end{align*}
$$

In this equation, the symbolic notation
$\left(\left|A\right|\cdot \left|x\right|\right)\left[i\right] := \sum_j |A[i,j]| |x[j]|$ is used, as
defined in [Arioli89](https://epubs.siam.org/doi/10.1137/0610013).

As mentioned above, this is prone to rounding errors, and a single row with rounding errors may
cause the entire iterative refinement to fail. The power grid model therefore uses a modified
version, in which the denominator is capped to a minimum value, determined by the maximum across all
denominators:

$$
\begin{align*}
D_{\max} &= \max_i \left(\left|A\right| \cdot \left|x\right| + \left|b\right|\right)\left[i\right] \\
\text{backward_error} &= \max_i \frac{\left|r[i]\right|}{\max\left\{\left(\left|A\right| \cdot \left|x\right| + \left|b\right|\right)\left[i\right], \epsilon_{\text{berr}} D_{\max}\right\}}
\end{align*}
$$

$\epsilon_{\text{berr}}$ is a tunable parameter: $\epsilon_{\text{berr}} = 0$ means no cut-off, while
$\epsilon_{\text{berr}} = 1$ means maximum cut-off. The former is prone to rounding errors, while the
latter may hide issues in rows with small coefficients by suppressing them in the backward error,
even if that row's residual is relatively large, in favor of other rows with larger absolute, but
smaller relative, residuals. In conclusion, $\epsilon_{\text{berr}}$ should be chosen neither too
large nor too small.

```{note}
$\epsilon_{\text{berr}} = 10^{-4}$ was experimentally determined to be a reasonably good value on a
number of real-world MV grids.
```
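
A minimal NumPy sketch of this capped backward error is shown below; the function name and signature
are illustrative rather than the actual implementation.

```python
import numpy as np


def capped_backward_error(A, x, b, r, epsilon_berr=1e-4):
    """Backward error with the capped denominator described above (illustrative only)."""
    denominator = np.abs(A) @ np.abs(x) + np.abs(b)  # (|A| |x| + |b|)[i]
    d_max = denominator.max()
    capped = np.maximum(denominator, epsilon_berr * d_max)
    return (np.abs(r) / capped).max()
```

In the iterative refinement sketch above, such a function could serve as `backward_error_fn`.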

##### Block-wise off-diagonal infinite matrix norm

For the [pivot perturbation algorithm](#pivot-perturbation-algorithm), a matrix norm is used to
determine the relative size of the current pivot element compared to the rest of the matrix as a
measure for the degree of ill-conditioning. The norm is a variant of the $L_{\infty}$ norm of a
matrix, which we call the block-wise off-diagonal infinite matrix norm
($L_{\infty ,\text{bwod}}$).

Since the power grid model solves the matrix equations using a multi-scale matrix solver (dense
intra-block, block-sparse for the full topological structure of the grid), the norm is also taken on
those same levels, so the calculation of the norm is _block-wise_.

In addition, the diagonal blocks may have much larger elements than the off-diagonal elements, while
the relevant information is contained mostly in the off-diagonal blocks. As a result, the
block-diagonal elements would undesirably dominate the norm. The power grid model therefore
restricts the calculation of the norm to _off-diagonal_ blocks.

In short, the $L_{\infty ,\text{bwod}}$-norm is the $L_{\infty}$ norm of the block-sparse matrix
with the $L_{\infty}$ norm of the individual blocks as elements, where the block-diagonal elements
are skipped at the block level.

###### Block-wise off-diagonal infinite matrix norm algorithm

The algorithm is as follows:

Let $M\equiv M\left[0:N, 0:N\right]$ be the $N\times N$-matrix with a block-sparse structure and
$M\left[i,j\right]$ its block element at (0-based) indices $(i,j)$, where $i,j = 0..(N-1)$. In turn,
let $M[i,j] \equiv M_{i,j}\left[0:N_{i,j},0:N_{i,j}\right]$ be the dense block with size
$N_{i,j} \times N_{i,j}$.

            1. Set $\text{block_row_norm} \gets \text{block_row_norm} + \left\|M_{i,j}\left[k,l\right]\right\|$.
         3. Calculate the new block norm: set
            $\text{block_norm} \gets \max\left\{\text{block_norm}, \text{block_row_norm}\right\}$.
         4. Continue with the next row of the current block.
      4. Set $\text{row_norm} \gets \text{row_norm} + \text{block_norm}$.
      5. Continue with the next block-column.
   3. Calculate the new norm: set
      $\text{norm} \gets \max\left\{\text{norm}, \text{row_norm}\right\}$.
   4. Continue with the next block-row.
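
The norm calculation can be sketched in Python as follows. The sketch assumes a dense matrix
partitioned into equally sized blocks purely for illustration, whereas the real solver operates on a
block-sparse structure; the function name is hypothetical.

```python
import numpy as np


def bwod_norm(matrix: np.ndarray, block_size: int) -> float:
    """Block-wise off-diagonal infinite norm of a matrix of equally sized blocks (illustrative)."""
    n_blocks = matrix.shape[0] // block_size
    norm = 0.0
    for i in range(n_blocks):            # block-rows
        row_norm = 0.0
        for j in range(n_blocks):        # block-columns
            if i == j:
                continue                 # skip the diagonal block
            block = matrix[i * block_size:(i + 1) * block_size,
                           j * block_size:(j + 1) * block_size]
            # L_inf norm of the block: maximum absolute row sum
            block_norm = np.abs(block).sum(axis=1).max()
            row_norm += block_norm
        norm = max(norm, row_norm)
    return norm
```

The examples in the next subsection can be used as test cases for this sketch.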

###### Illustration of the block-wise off-diagonal infinite matrix norm calculation

This section aims to illustrate how the $L_{\infty ,\text{bwod}}$-norm differs from a regular
$L_{\infty}$-norm using the following examples.

The first example shows how taking the block-wise norm affects the calculation of the norm.

$$
\begin{pmatrix}
\begin{pmatrix}
0 && 0 \\
0 && 0
\end{pmatrix} && \begin{pmatrix}
1 && 0 \\
0 && 3
\end{pmatrix} && \begin{pmatrix}
3 && 0 \\
0 && 0
\end{pmatrix} \\
\begin{pmatrix}
5 && 0 \\
0 && 0
\end{pmatrix} &&
\begin{pmatrix}
0 && 0 \\
0 && 0
\end{pmatrix} && \begin{pmatrix}
0 && 0 \\
0 && \frac{1}{2}
\end{pmatrix} \\
\begin{pmatrix}
0 && 0 \\
0 && 0
\end{pmatrix} &&
\begin{pmatrix}
0 && 0 \\
0 && 0
\end{pmatrix} &&
\begin{pmatrix}
1 && 0 \\
0 && 1
\end{pmatrix}
\end{pmatrix}
$$

* The regular $L_{\infty}$-norm is $\max\left\{1+3, 3, 5, \frac{1}{2}, 1, 1\right\} = 5$.
* The block-wise off-diagonal $L_{\infty ,\text{bwod}}$-norm is
  $\max\left\{\max\left\{1, 3\right\}+\max\left\{3, 0\right\},\max\left\{5, 0\right\} + \max\left\{0, \frac{1}{2}\right\}, 0\right\} = \max\left\{3+3, 5+\frac{1}{2}, 0\right\} = 6$.

The two norms clearly differ and even the elements that contribute most to the norm are different.

The next example shows how keeping only the off-diagonal blocks affects the norm.

$$
\begin{pmatrix}
\begin{pmatrix}
20 && 20 \\
30 && 0
\end{pmatrix} && \begin{pmatrix}
2 && 2 \\
3 && 0
\end{pmatrix} \\
\begin{pmatrix}
0 && 0 \\
0 && 3
\end{pmatrix} && \begin{pmatrix}
100 && 0 \\
0 && 1
\end{pmatrix}
\end{pmatrix}
$$

* The regular $L_{\infty}$-norm is $\max\left\{20+20+2+2,30+3,100,3+1\right\} = \max\left\{44,33,100,4\right\} = 100$.
* The block-wise infinity norm with diagonals would be
  $\max\left\{\max\left\{20+20, 30\right\}+\max\left\{2+2, 3\right\},\max\left\{0,3\right\} + \max\left\{100, 1\right\}\right\} = \max\left\{40+4, 3+100\right\} = \max\left\{44, 103\right\} = 103$.
* The $L_{\infty ,\text{bwod}}$-norm is
  $\max\left\{\max\left\{2+2, 3\right\},\max\left\{0,3\right\}\right\} = \max\left\{4, 3\right\} = 4$.
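
The numbers in the second example can be reproduced with a few lines of NumPy; this is only a sanity
check of the worked example above.

```python
import numpy as np

# second example: a 4x4 matrix consisting of four 2x2 blocks
M = np.array([[20, 20,   2, 2],
              [30,  0,   3, 0],
              [ 0,  0, 100, 0],
              [ 0,  3,   0, 1]])

inf_norm = np.abs(M).sum(axis=1).max()  # regular L_inf norm: 100

# L_inf norm of each 2x2 block (maximum absolute row sum per block)
block_norms = np.array([[np.abs(M[2 * i:2 * i + 2, 2 * j:2 * j + 2]).sum(axis=1).max()
                         for j in range(2)] for i in range(2)])
with_diagonals = block_norms.sum(axis=1).max()  # block-wise norm with diagonals: 103

np.fill_diagonal(block_norms, 0)  # drop the diagonal blocks
bwod = block_norms.sum(axis=1).max()  # L_inf,bwod norm: 4
```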
