Alina Bazarova / Bayesian Statistical Learning 2 · Commit
Commit 933f4591, authored 1 month ago by Steve Schmerler

Add more text in notebooks

Parent: f4542a46
Merge request: !2 "Update GP slides and notebooks"
Showing 3 changed files with 97 additions and 28 deletions:

BLcourse2.3/01_one_dim.py      +8  −0
BLcourse2.3/02_two_dim.py      +21 −4
BLcourse2.3/03_one_dim_SVI.py  +68 −24
BLcourse2.3/01_one_dim.py  +8 −0

@@ -32,6 +32,14 @@
  # we still denote the collection of all $\ve x_i = x_i\in\mathbb R$ points with
  # $\ma X$ to keep the notation consistent with the slides.
+ # # About
+ #
+ # In this notebook, we explore the material presented in [the
+ # slides](https://doi.org/10.6084/m9.figshare.25988176) with code, using the
+ # [gpytorch](https://gpytorch.ai) library. We cover exact GP inference and
+ # hyper parameter optimization using the marginal likelihood.
  # # Imports, helpers, setup
  # ##%matplotlib notebook
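As a compact reference for what 01_one_dim.py walks through (exact GP inference plus hyper parameter optimization via the log marginal likelihood), a minimal gpytorch sketch follows; the toy data, model name and settings here are illustrative and not taken from the notebook.

import math
import torch
import gpytorch

# Illustrative 1D toy data (not the notebook's)
torch.manual_seed(0)
X_train = torch.linspace(0, 1, 50).unsqueeze(-1)
y_train = torch.sin(2 * math.pi * X_train).squeeze() + 0.1 * torch.randn(50)


class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, X, y, likelihood):
        super().__init__(X, y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        # The GP prior, defined by the mean and covariance function
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(X_train, y_train, likelihood)

# Hyper parameter optimization: maximize the log marginal likelihood
model.train()
likelihood.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
for _ in range(100):
    optimizer.zero_grad()
    loss = -mll(model(X_train), y_train)
    loss.backward()
    optimizer.step()

# Posterior predictive at new points (includes the noise variance)
model.eval()
likelihood.eval()
with torch.no_grad():
    X_pred = torch.linspace(0, 1, 200).unsqueeze(-1)
    post_pred_y = likelihood(model(X_pred))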
BLcourse2.3/02_two_dim.py  +21 −4
@@ -13,7 +13,13 @@
  # name: python3
  # ---
- # In this notebook, we use a GP to fit a 2D data set.
+ # # About
+ #
+ # In this notebook, we use a GP to fit a 2D data set. We use the same ExactGP
+ # machinery as in the 1D case and show how GPs can be used for 2D interpolation
+ # (when data is free of noise) or regression (noisy data). Think of this as a
+ # toy geospatial data setting. Actually, in geostatistics, Gaussian process
+ # regression is known as [Kriging](https://en.wikipedia.org/wiki/Kriging).
  # $\newcommand{\ve}[1]{\mathit{\boldsymbol{#1}}}$
  # $\newcommand{\ma}[1]{\mathbf{#1}}$
  # $\newcommand{\pred}[1]{\rm{#1}}$
@@ -50,6 +56,13 @@ torch.set_default_dtype(torch.float64)
  torch.manual_seed(123)
  # # Generate toy 2D data
+ #
+ # Our ground truth function is $f(\ve x) = \sin(r) / r$ with $\ve x =
+ # [x_0,x_1] \in\mathbb R^2$ and the radial distance $r=\sqrt{\ve x^\top\,\ve x}$.
+ # This is also known as the "Mexican hat" function, a radial wave-like pattern
+ # that decays with distance from the center $\ve x=\ve 0$. We generate data by
+ # randomly sampling 2D points $\ve x_i$ and calculating $y_i = f(\ve x_i)$,
+ # optionally adding Gaussian noise further down.
  class MexicanHat:
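The added text above fully specifies the toy data. A rough, self-contained sketch of that generation (not the notebook's MexicanHat class; the sampling range and noise level are assumptions):

import torch

torch.manual_seed(123)


def mexican_hat(X):
    # f(x) = sin(r) / r with r = sqrt(x^T x); rows of X are 2D points
    r = torch.sqrt((X**2).sum(dim=1))
    return torch.sin(r) / r


# Randomly sample 2D points, here in [-10, 10]^2, and evaluate f
X_train = (torch.rand(500, 2) - 0.5) * 20.0
y_train = mexican_hat(X_train)

# Optionally add Gaussian noise (regression instead of interpolation)
use_noise = True
if use_noise:
    y_train = y_train + 0.2 * torch.randn_like(y_train)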
@@ -304,11 +317,15 @@ with torch.no_grad():
  assert (post_pred_f.mean == post_pred_y.mean).all()
  # -
- # When `use_noise=False`, then the GP's prediction is an almost pefect
+ # When `use_noise=False`, then the GP's prediction is an almost perfect
  # reconstruction of the ground truth function (in-distribution, so where we
- # have data).
+ # have data). While 3D plots are fun, they are not optimal for judging how well
+ # the GP model represents the ground truth function.
  # # Plot difference to ground truth and uncertainty
+ #
+ # Let's use contour plots to visualize the difference between GP prediction and
+ # ground truth, as well as epistemic, total and aleatoric uncertainty.
  # +
  ncols = 4
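The contour plots described in the added text could look roughly like this; a sketch assuming a trained model and likelihood as in the notebook, reusing the mexican_hat helper from the sketch further up, and an assumed prediction grid:

import torch
import matplotlib.pyplot as plt

# Regular 2D grid of prediction points
nx = ny = 50
xg, yg = torch.meshgrid(
    torch.linspace(-10, 10, nx), torch.linspace(-10, 10, ny), indexing="ij"
)
X_pred = torch.stack([xg.reshape(-1), yg.reshape(-1)], dim=1)

model.eval()
likelihood.eval()
with torch.no_grad():
    post_pred_f = model(X_pred)              # epistemic uncertainty only
    post_pred_y = likelihood(model(X_pred))  # plus noise (aleatoric part)
    diff = (post_pred_f.mean - mexican_hat(X_pred)).abs().reshape(nx, ny)
    f_std = post_pred_f.stddev.reshape(nx, ny)
    y_std = post_pred_y.stddev.reshape(nx, ny)

fig, axs = plt.subplots(1, 3, figsize=(12, 4))
titles = ["|mean - truth|", "f_std (epistemic)", "y_std (total)"]
for ax, data, title in zip(axs, [diff, f_std, y_std], titles):
    c = ax.contourf(xg.numpy(), yg.numpy(), data.numpy())
    ax.set_title(title)
    fig.colorbar(c, ax=ax)
plt.show()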
@@ -432,7 +449,7 @@ print(
  # 0.2).
  # * out-of-distribution: `f_std` (epistemic) dominates
- # Exercises
+ # # Exercises
  #
  # Go back up, switch on the settings for Exercise 2 and re-run the notebook.
  # Same with Exercise 3.
BLcourse2.3/03_one_dim_SVI.py  +68 −24
@@ -13,9 +13,12 @@
  # name: bayes-ml-course
  # ---
- # In this notebook, we replace the ExactGP inference by using SVI (stochastic
- # variational inference).
- #
+ # # About
+ #
+ # In this notebook, we replace the ExactGP inference and log marginal
+ # likelihood optimization by using sparse stochastic variational inference.
+ # This serves as an example of the many methods `gpytorch` offers to make GPs
+ # scale to large data sets.
  # $\newcommand{\ve}[1]{\mathit{\boldsymbol{#1}}}$
  # $\newcommand{\ma}[1]{\mathbf{#1}}$
  # $\newcommand{\pred}[1]{\rm{#1}}$
@@ -43,16 +46,20 @@ from matplotlib import is_interactive
  import numpy as np
  from torch.utils.data import TensorDataset, DataLoader
  from utils import extract_model_params, plot_samples
  torch.set_default_dtype(torch.float64)
  torch.manual_seed(123)
  # -
  # # Generate toy 1D data
+ #
+ # Now we generate 10x more points than in the ExactGP case; still, the inference
+ # won't be much slower (exact GPs scale roughly as $N^3$). Note that the data we
+ # use here is still tiny (1000 points is easy even for exact GPs), so the
+ # method's usefulness cannot be fully exploited in our example
+ # -- also we don't even use a GPU yet :).
  # +
@@ -94,25 +101,42 @@ print(f"{X_pred.shape=}")
  # -
  # # Define GP model
+ #
+ # The model follows [this
+ # example](https://docs.gpytorch.ai/en/stable/examples/04_Variational_and_Approximate_GPs/SVGP_Regression_CUDA.html)
+ # based on [Hensman et al., "Scalable Variational Gaussian Process Classification",
+ # 2015](https://proceedings.mlr.press/v38/hensman15.html). The model is
+ # "sparse" since it works with a set of *inducing* points $(\ma Z, \ve u)$,
+ # $\ve u=f(\ma Z)$, which is much smaller than the train data $(\ma X, \ve y)$.
+ #
+ # We have the same hyper parameters as before
+ #
+ # * $\ell$ = `model.covar_module.base_kernel.lengthscale`
+ # * $\sigma_n^2$ = `likelihood.noise_covar.noise`
+ # * $s$ = `model.covar_module.outputscale`
+ # * $m(\ve x) = c$ = `model.mean_module.constant`
+ #
+ # plus additional ones, introduced by the approximations used:
+ #
+ # * the learnable inducing points $\ma Z$ for the variational distribution
+ #   $q_{\ve\psi}(\ve u)$
+ # * learnable parameters of the variational distribution $q_{\ve\psi}(\ve u)=\mathcal N(\ve m_u, \ma S)$: the
+ #   variational mean $\ve m_u$ and covariance $\ma S$ in form of a lower triangular
+ #   matrix $\ma L$ such that $\ma S=\ma L\,\ma L^\top$
  # +
  class ApproxGPModel(gpytorch.models.ApproximateGP):
+     """
+     API:
+
+     model.forward()             prior                 f_pred
+     model()                     posterior             f_pred
+
+     likelihood(model.forward()) prior with noise      y_pred
+     likelihood(model())         posterior with noise  y_pred
+     """
+
-     def __init__(self, X_ind):
+     def __init__(self, Z):
          # Approximate inducing value posterior q(u), u = f(Z), Z = inducing
          # points (subset of X_train)
          variational_distribution = (
-             gpytorch.variational.CholeskyVariationalDistribution(X_ind.size(0))
+             gpytorch.variational.CholeskyVariationalDistribution(Z.size(0))
          )
+         # Compute q(f(X)) from q(u)
          variational_strategy = gpytorch.variational.VariationalStrategy(
              self,
-             X_ind,
+             Z,
              variational_distribution,
              learn_inducing_locations=False,
          )
@@ -123,7 +147,6 @@ class ApproxGPModel(gpytorch.models.ApproximateGP):
          )
      def forward(self, x):
+         """The prior, defined in terms of the mean and covariance function."""
          mean_x = self.mean_module(x)
          covar_x = self.covar_module(x)
          return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
@@ -132,9 +155,10 @@ class ApproxGPModel(gpytorch.models.ApproximateGP):
  likelihood = gpytorch.likelihoods.GaussianLikelihood()
  n_train = len(X_train)
+ # Start values for inducing points Z, use 10% random sub-sample of X_train.
  ind_points_fraction = 0.1
  ind_idxs = torch.randperm(n_train)[:int(n_train * ind_points_fraction)]
- model = ApproxGPModel(X_ind=X_train[ind_idxs])
+ model = ApproxGPModel(Z=X_train[ind_idxs])
  # -
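Once the model is instantiated, the hyper parameters and variational parameters listed in the added text can be inspected directly. A sketch only; the variational attribute paths below are assumptions that may differ between gpytorch versions:

print(model.covar_module.base_kernel.lengthscale)  # ell
print(model.covar_module.outputscale)              # s
print(model.mean_module.constant)                  # c = m(x)
print(likelihood.noise_covar.noise)                # sigma_n^2

# Variational parameters of q(u) = N(m_u, L L^T) and the inducing points Z
# (attribute names assumed, check your gpytorch version)
print(model.variational_strategy.inducing_points.shape)  # Z
vdist = model.variational_strategy._variational_distribution
print(vdist.variational_mean.shape)                      # m_u
print(vdist.chol_variational_covar.shape)                # L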
@@ -157,14 +181,34 @@ model.mean_module.constant = 3.0
  model.covar_module.base_kernel.lengthscale = 1.0
  model.covar_module.outputscale = 1.0
  likelihood.noise_covar.noise = 0.3
- ##print("model params:")
- ##pprint(extract_model_params(model, raw=False, try_item=False))
- ##print("likelihood params:")
- ##pprint(extract_model_params(likelihood, raw=False, try_item=False))
  # -
  # # Fit GP to data: optimize hyper params
+ #
+ # Now we optimize the GP hyper parameters by doing a GP-specific variational inference (VI),
+ # where we optimize not the log marginal likelihood (ExactGP case),
+ # but an ELBO (evidence lower bound) objective
+ #
+ # $$
+ # \max_\ve\zeta\left(\mathbb E_{q_{\ve\psi}(\ve u)}\left[\ln p(\ve y|\ve u) \right] -
+ # D_{\text{KL}}(q_{\ve\psi}(\ve u)\Vert p(\ve u))\right)
+ # $$
+ #
+ # with respect to
+ #
+ # $$\ve\zeta = [\ell, \sigma_n^2, s, c, \ve\psi] $$
+ #
+ # with
+ #
+ # $$\ve\psi = [\ve m_u, \ma Z, \ma L]$$
+ #
+ # the parameters of the variational distribution $q_{\ve\psi}(\ve u)$.
+ #
+ # In addition, we perform a stochastic
+ # optimization by using a deep learning type mini-batch loop, hence
+ # "stochastic" variational inference (SVI). The latter speeds up the
+ # optimization since we only look at a fraction of data per optimizer step to
+ # calculate an approximate loss gradient (`loss.backward()`).
  # +
  # Train mode
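To make the added explanation concrete, here is a hedged sketch of such an SVI mini-batch loop using gpytorch's VariationalELBO; it assumes the model, likelihood, X_train and y_train from this notebook, while batch size, learning rate and epoch count are illustrative only:

import torch
import gpytorch
from torch.utils.data import TensorDataset, DataLoader

loader = DataLoader(TensorDataset(X_train, y_train), batch_size=128, shuffle=True)

model.train()
likelihood.train()
# For ApproximateGP the likelihood params are separate from the model params
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(likelihood.parameters()), lr=0.1
)
# ELBO objective; num_data rescales the mini-batch likelihood term
loss_func = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=len(X_train))

for epoch in range(50):
    for X_batch, y_batch in loader:
        optimizer.zero_grad()
        loss = -loss_func(model(X_batch), y_batch)  # negative ELBO
        loss.backward()                             # approximate gradient from the batch
        optimizer.step()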