|
28 | 28 | "source": [
|
29 | 29 | "# Gaussian processes for classification\n",
|
30 | 30 | "\n",
|
31 |
| - "This article gives an introduction to Gaussian processes for classification and provides a minimal implementation with NumPy. Gaussian processes for regression are covered in a [previous article](https://github.com/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb) and a brief recap is given in the next section.\n", |
| 31 | + "This article gives an introduction to Gaussian processes for classification and provides a minimal implementation with NumPy. Gaussian processes for regression are covered in a [previous article](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb?flush_cache=true) and a brief recap is given in the next section.\n", |
32 | 32 | "\n",
|
33 | 33 | "## Regression recap\n",
|
34 | 34 | "\n",
|
|
39 | 39 | "\\tag{1}\n",
|
40 | 40 | "$$\n",
|
41 | 41 | "\n",
|
42 |
| - "A GP is a prior over functions whose shape (smoothness, ...) is defined by $\\mathbf{K} = \\kappa(\\mathbf{X}, \\mathbf{X})$ where $\\kappa$ is a parameteric kernel function. It is common to set $\\boldsymbol\\mu = \\mathbf{0}$. Given observed function values $\\mathbf{y}$ at points $\\mathbf{X}$ we want to predict a new function value $f_*$ at point $\\mathbf{x}_*$. By definition of a GP, the joint distribution of observed values $\\mathbf{y}$ and prediction $f_*$ is also a Gaussian:\n", |
| 42 | + "A GP is a prior over functions whose shape (smoothness, ...) is defined by $\\mathbf{K} = \\kappa(\\mathbf{X}, \\mathbf{X})$ where $\\kappa$ is a parameteric kernel function. It is common to set $\\boldsymbol\\mu = \\mathbf{0}$. Given observed noisy function values $\\mathbf{y}$ at points $\\mathbf{X}$ we want to predict a noise-free function value $f_*$ at point $\\mathbf{x}_*$. The joint distribution of observed values $\\mathbf{y}$ and prediction $f_*$ is also a Gaussian:\n", |
43 | 43 | "\n",
|
44 | 44 | "$$\n",
|
45 | 45 | "p(\\mathbf{y}, f_* \\mid \\mathbf{X},\\mathbf{x}_*) = \n",
|
|
50 | 50 | "\\tag{2}\n",
|
51 | 51 | "$$\n",
|
52 | 52 | "\n",
|
53 |
| - "where $\\mathbf{K}_y = \\mathbf{K} + \\sigma_y^2\\mathbf{I}$, $\\mathbf{k}_* = \\kappa(\\mathbf{X},\\mathbf{x}_*)$ and $k_{**} = \\kappa(\\mathbf{x}_*,\\mathbf{x}_*)$. $\\sigma_y^2$ models noise in the observed function values $\\mathbf{y}$. Turning the joint distribution $(2)$ into a conditional distribution, using standard rules for conditioning Gaussians, we obtain a predictive distribution\n", |
| 53 | + "where $\\mathbf{K}_y = \\mathbf{K} + \\sigma_y^2\\mathbf{I}$, $\\mathbf{k}_* = \\kappa(\\mathbf{X},\\mathbf{x}_*)$ and $k_{**} = \\kappa(\\mathbf{x}_*,\\mathbf{x}_*)$. $\\sigma_y^2$ models noise in the observed function values $\\mathbf{y}$. Turning the joint distribution $(2)$ into a conditional distribution we obtain a predictive distribution\n", |
54 | 54 | "\n",
|
55 | 55 | "$$\n",
|
56 | 56 | "p(f_* \\mid \\mathbf{x}_*, \\mathbf{X}, \\mathbf{y}) = \\mathcal{N}(f_* \\mid \\boldsymbol\\mu_*, \\boldsymbol\\Sigma_*)\n",
|
|
66 | 66 | "\\end{align*}\n",
|
67 | 67 | "$$\n",
|
68 | 68 | "\n",
|
69 |
| - "In contrast to the notation in the [previous article](https://github.com/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb), I'm using here a single test input $\\mathbf{x}_*$ to be consistent with the notation in the following sections. However, the implementation further below is vectorized so that predictions can be made for multiple test inputs $\\mathbf{X}_*$ in a single operation.\n", |
| 69 | + "In contrast to the notation in the [previous article](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb?flush_cache=true), I'm using here a single test input $\\mathbf{x}_*$ to be consistent with the notation in the following sections. However, the implementation further below is vectorized so that predictions can be made for multiple test inputs $\\mathbf{X}_*$ in a single operation.\n", |
70 | 70 | "\n",
|
71 | 71 | "\n",
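To make the recap concrete, here is a minimal NumPy sketch of the conditioning step above. The explicit formulas for $\boldsymbol\mu_*$ and $\boldsymbol\Sigma_*$ are elided from this excerpt, so the code follows the standard Gaussian conditioning result $\boldsymbol\mu_* = \mathbf{k}_*^T \mathbf{K}_y^{-1} \mathbf{y}$ and $\boldsymbol\Sigma_* = k_{**} - \mathbf{k}_*^T \mathbf{K}_y^{-1} \mathbf{k}_*$; the names `posterior`, `K_fn` and `sigma_y` are illustrative assumptions, not the notebook's actual implementation. It accepts an arbitrary kernel function and is vectorized over multiple test inputs `X_s`:

```python
import numpy as np

def posterior(X_s, X, y, K_fn, sigma_y=1e-2):
    """Predictive mean and covariance at test inputs X_s given noisy
    observations (X, y). K_fn is any kernel function kappa(A, B)."""
    K_y  = K_fn(X, X) + sigma_y ** 2 * np.eye(len(X))  # K + sigma_y^2 I
    k_s  = K_fn(X, X_s)                                # kappa(X, x_*)
    k_ss = K_fn(X_s, X_s)                              # kappa(x_*, x_*)
    # Solve linear systems instead of forming K_y^-1 explicitly.
    mu_s  = k_s.T @ np.linalg.solve(K_y, y)            # posterior mean mu_*
    cov_s = k_ss - k_s.T @ np.linalg.solve(K_y, k_s)   # posterior cov. Sigma_*
    return mu_s, cov_s
```

Using `np.linalg.solve` rather than an explicit matrix inverse is a common choice for numerical stability; the math is identical.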
|
72 | 72 | "## Binary classification\n",
|
|
232 | 232 | "source": [
|
233 | 233 | "### Training\n",
|
234 | 234 | "\n",
|
235 |
| - "As in [Gaussian processes](https://github.com/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb) for regression, we again use a squared exponential kernel with length parameter `theta[0]` and multiplicative constant `theta[1]`." |
| 235 | + "As in [Gaussian processes](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb?flush_cache=true) for regression, we again use a squared exponential kernel with length parameter `theta[0]` and multiplicative constant `theta[1]`." |
236 | 236 | ]
|
237 | 237 | },
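A minimal sketch of such a kernel in NumPy follows; the exact parameterization (for example, whether `theta[1]` enters squared) is an assumption here, since this excerpt doesn't show the notebook's own definition:

```python
import numpy as np

def kernel(X1, X2, theta):
    """Squared exponential kernel with length parameter theta[0] and
    multiplicative constant theta[1] (assumed parameterization)."""
    # Pairwise squared Euclidean distances between rows of X1 and X2:
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sqdist = np.sum(X1 ** 2, axis=1).reshape(-1, 1) + \
             np.sum(X2 ** 2, axis=1) - 2 * X1 @ X2.T
    return theta[1] ** 2 * np.exp(-0.5 / theta[0] ** 2 * sqdist)
```

A kernel of this shape could be plugged into the `posterior` sketch from the regression recap as `K_fn`.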
|
238 | 238 | {
|
|