|
28 | 28 | "source": [
|
29 | 29 | "# Gaussian processes for classification\n",
|
30 | 30 | "\n",
|
31 |
| - "This article gives an introduction to Gaussian processes for classification and provides a minimal implementation with NumPy. Gaussian processes for regression are covered in a [previous article](https://github.com/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb) and a brief recap is given in the next section.\n", |
| 31 | + "This article gives an introduction to Gaussian processes for classification and provides a minimal implementation with NumPy. Gaussian processes for regression are covered in a [previous article](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb?flush_cache=true) and a brief recap is given in the next section.\n", |
32 | 32 | "\n",
|
33 | 33 | "## Regression recap\n",
|
34 | 34 | "\n",
|
|
39 | 39 | "\\tag{1}\n",
|
40 | 40 | "$$\n",
|
41 | 41 | "\n",
|
42 |
| - "A GP is a prior over functions whose shape (smoothness, ...) is defined by $\\mathbf{K} = \\kappa(\\mathbf{X}, \\mathbf{X})$ where $\\kappa$ is a parameteric kernel function. It is common to set $\\boldsymbol\\mu = \\mathbf{0}$. Given observed function values $\\mathbf{y}$ at points $\\mathbf{X}$ we want to predict a new function value $f_*$ at point $\\mathbf{x}_*$. By definition of a GP, the joint distribution of observed values $\\mathbf{y}$ and prediction $f_*$ is also a Gaussian:\n", |
| 42 | + "A GP is a prior over functions whose shape (smoothness, ...) is defined by $\\mathbf{K} = \\kappa(\\mathbf{X}, \\mathbf{X})$ where $\\kappa$ is a parameteric kernel function. It is common to set $\\boldsymbol\\mu = \\mathbf{0}$. Given observed noisy function values $\\mathbf{y}$ at points $\\mathbf{X}$ we want to predict a noise-free function value $f_*$ at point $\\mathbf{x}_*$. The joint distribution of observed values $\\mathbf{y}$ and prediction $f_*$ is also a Gaussian:\n", |
43 | 43 | "\n",
|
44 | 44 | "$$\n",
|
45 | 45 | "p(\\mathbf{y}, f_* \\mid \\mathbf{X},\\mathbf{x}_*) = \n",
|
|
50 | 50 | "\\tag{2}\n",
|
51 | 51 | "$$\n",
|
52 | 52 | "\n",
|
53 |
| - "where $\\mathbf{K}_y = \\mathbf{K} + \\sigma_y^2\\mathbf{I}$, $\\mathbf{k}_* = \\kappa(\\mathbf{X},\\mathbf{x}_*)$ and $k_{**} = \\kappa(\\mathbf{x}_*,\\mathbf{x}_*)$. $\\sigma_y^2$ models noise in the observed function values $\\mathbf{y}$. Turning the joint distribution $(2)$ into a conditional distribution, using standard rules for conditioning Gaussians, we obtain a predictive distribution\n", |
| 53 | + "where $\\mathbf{K}_y = \\mathbf{K} + \\sigma_y^2\\mathbf{I}$, $\\mathbf{k}_* = \\kappa(\\mathbf{X},\\mathbf{x}_*)$ and $k_{**} = \\kappa(\\mathbf{x}_*,\\mathbf{x}_*)$. $\\sigma_y^2$ models noise in the observed function values $\\mathbf{y}$. Turning the joint distribution $(2)$ into a conditional distribution we obtain a predictive distribution\n", |
54 | 54 | "\n",
|
55 | 55 | "$$\n",
|
56 | 56 | "p(f_* \\mid \\mathbf{x}_*, \\mathbf{X}, \\mathbf{y}) = \\mathcal{N}(f_* \\mid \\boldsymbol\\mu_*, \\boldsymbol\\Sigma_*)\n",
|
|
66 | 66 | "\\end{align*}\n",
|
67 | 67 | "$$\n",
|
68 | 68 | "\n",
|
69 |
| - "In contrast to the notation in the [previous article](https://github.com/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb), I'm using here a single test input $\\mathbf{x}_*$ to be consistent with the notation in the following sections. However, the implementation further below is vectorized so that predictions can be made for multiple test inputs $\\mathbf{X}_*$ in a single operation.\n", |
| 69 | + "In contrast to the notation in the [previous article](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb?flush_cache=true), I'm using here a single test input $\\mathbf{x}_*$ to be consistent with the notation in the following sections. However, the implementation further below is vectorized so that predictions can be made for multiple test inputs $\\mathbf{X}_*$ in a single operation.\n", |
70 | 70 | "\n",
|
71 | 71 | "\n",
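To make the recap concrete, here is a minimal NumPy sketch of the conditioning step above. The explicit formulas for $\boldsymbol\mu_*$ and $\boldsymbol\Sigma_*$ are elided from this excerpt, so the code follows the standard Gaussian conditioning result $\boldsymbol\mu_* = \mathbf{k}_*^T \mathbf{K}_y^{-1} \mathbf{y}$ and $\boldsymbol\Sigma_* = k_{**} - \mathbf{k}_*^T \mathbf{K}_y^{-1} \mathbf{k}_*$; the names `posterior`, `K_fn` and `sigma_y` are illustrative assumptions, not the notebook's actual implementation. It accepts an arbitrary kernel function and is vectorized over multiple test inputs `X_s`:

```python
import numpy as np

def posterior(X_s, X, y, K_fn, sigma_y=1e-2):
    """Predictive mean and covariance at test inputs X_s given noisy
    observations (X, y). K_fn is any kernel function kappa(A, B)."""
    K_y  = K_fn(X, X) + sigma_y ** 2 * np.eye(len(X))  # K + sigma_y^2 I
    k_s  = K_fn(X, X_s)                                # kappa(X, x_*)
    k_ss = K_fn(X_s, X_s)                              # kappa(x_*, x_*)
    # Solve linear systems instead of forming K_y^-1 explicitly.
    mu_s  = k_s.T @ np.linalg.solve(K_y, y)            # posterior mean mu_*
    cov_s = k_ss - k_s.T @ np.linalg.solve(K_y, k_s)   # posterior cov. Sigma_*
    return mu_s, cov_s
```

Using `np.linalg.solve` rather than an explicit matrix inverse is a common choice for numerical stability; the math is identical.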
|
72 | 72 | "## Binary classification\n",
|
|
232 | 232 | "source": [
|
233 | 233 | "### Training\n",
|
234 | 234 | "\n",
|
235 |
| - "As in [Gaussian processes](https://github.com/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb) for regression, we again use a squared exponential kernel with length parameter `theta[0]` and multiplicative constant `theta[1]`." |
| 235 | + "As in [Gaussian processes](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb?flush_cache=true) for regression, we again use a squared exponential kernel with length parameter `theta[0]` and multiplicative constant `theta[1]`." |
236 | 236 | ]
|
237 | 237 | },
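A minimal sketch of such a kernel in NumPy follows; the exact parameterization (for example, whether `theta[1]` enters squared) is an assumption here, since this excerpt doesn't show the notebook's own definition:

```python
import numpy as np

def kernel(X1, X2, theta):
    """Squared exponential kernel with length parameter theta[0] and
    multiplicative constant theta[1] (assumed parameterization)."""
    # Pairwise squared Euclidean distances between rows of X1 and X2:
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sqdist = np.sum(X1 ** 2, axis=1).reshape(-1, 1) + \
             np.sum(X2 ** 2, axis=1) - 2 * X1 @ X2.T
    return theta[1] ** 2 * np.exp(-0.5 / theta[0] ** 2 * sqdist)
```

A kernel of this shape could be plugged into the `posterior` sketch from the regression recap as `K_fn`.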
|
238 | 238 | {
|
|