Generalized Additive Models with R and mgcvGavin SimpsonJanuary 3, 20221 / 49

Today's topics

What are GAMs?
How to fit GAMs in R with mgcv
Examples

2 / 49

Logistics

HTML Slide deck bit.ly/gam-efi-22 © Simpson (2022)
RMarkdown bit.ly/gam-efi-git

3 / 49

Motivating example4 / 49

HadCRUT4 time series

5 / 49

Hadley Centre NH temperature record ensemble

How would you model the trend in these data?

Linear Models

$y_{i} \sim N (μ_{i}, σ^{2})$

$μ_{i} = β_{0} + β_{1} x_{1 i} + β_{2} x_{2 i} + \dots + β_{j} x_{j i}$

Assumptions

linear effects of covariates are good approximation of the true effects
conditional on the values of covariates, $y_{i} | X \sim N (0, σ^{2})$
this implies all observations have the same variance
$y_{i} | X$ are independent

An additive model address the first of these

6 / 49

Why bother with anything more complex?7 / 49

Is this linear?

8 / 49

Polynomials perhaps…

9 / 49

Polynomials perhaps…

We can keep on adding ever more powers of $x$ to the model — model selection problem

Runge phenomenon — oscillations at the edges of an interval — means simply moving to higher-order polynomials doesn't always improve accuracy

10 / 49

GAMs offer a solution11 / 49

HadCRUT data set

## Load Data
tmpf <- tempfile()
curl_download("https://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/time_series/HadCRUT.4.6.0.0.annual_nh.txt", tmpf)
gtemp <- read.table(tmpf, colClasses = rep("numeric", 12))[, 1:2] # only want some of the variables
names(gtemp) <- c("Year", "Temperature")
gtemp <- as_tibble(gtemp)

File format

12 / 49

HadCRUT data set

gtemp

## # A tibble: 172 × 2
##     Year Temperature
##    <dbl>       <dbl>
##  1  1850          NA
##  2  1851          NA
##  3  1852          NA
##  4  1853          NA
##  5  1854          NA
##  6  1855          NA
##  7  1856          NA
##  8  1857          NA
##  9  1858          NA
## 10  1859          NA
## # … with 162 more rows

13 / 49

Fitting a GAM

library('mgcv')
m <- gam(Temperature ~ s(Year), data = gtemp, method = 'REML')
summary(m)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Temperature ~ s(Year)
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.020523   0.009733  -2.109   0.0365 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df   F p-value    
## s(Year) 7.838   8.65 145  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.88   Deviance explained = 88.5%
## -REML = -91.196  Scale est. = 0.016294  n = 172

14 / 49

Fitted GAM

15 / 49

GAMs16 / 49

Generalized Additive Models

Source: GAMs in R by Noam Ross

17 / 49

GAMs are an intermediate-complexity model

can learn from data without needing to be informed by the user
remain interpretable because we can visualize the fitted features

How is a GAM different?

In LM we model the mean of data as a sum of linear terms:

$y_{i} = β_{0} + \sum_{j} β_{j} x_{j i} + ϵ_{i}$

A GAM is a sum of smooth functions or smooths

$y_{i} = β_{0} + \sum_{j} s_{j} (x_{j i}) + ϵ_{i}$

where $ϵ_{i} \sim N (0, σ^{2})$ , $y_{i} \sim Normal$ (for now)

Call the above equation the linear predictor in both cases

18 / 49

Fitting a GAM in R

model <- gam(y ~ s(x1) + s(x2) + te(x3, x4), # formuala describing model
             data = my_data_frame,           # your data
             method = 'REML',                # or 'ML'
             family = gaussian)              # or something more exotic

s() terms are smooths of one or more variables

te() terms are the smooth equivalent of main effects + interactions

19 / 49

How did `gam()` know?

20 / 49

What magic is this?

Rob Potter

21 / 49

22 / 49

Wiggly things

23 / 49

GAMs use splines to represent the non-linear relationships between covariates, here x, and the response variable on the y axis.

Basis expansions

In the polynomial models we used a polynomial basis expansion of $x$

$x^{0} = 1$ — the model constant term
$x^{1} = x$ — linear term
$x^{2}$
$x^{3}$
…

24 / 49

Splines

Splines are functions composed of simpler functions

Simpler functions are basis functions & the set of basis functions is a basis

When we model using splines, each basis function $b_{k}$ has a coefficient $β_{k}$

Resultant spline is a the sum of these weighted basis functions, evaluated at the values of $x$

$s (x) = \sum_{k = 1}^{K} β_{k} b_{k} (x)$

25 / 49

Splines formed from basis functions

26 / 49

Splines are built up from basis functions

Here I'm showing a cubic regression spline basis with 10 knots/functions

We weight each basis function to get a spline. Here all the basisi functions have the same weight so they would fit a horizontal line

Weight basis functions ⇨ spline

27 / 49

But if we choose different weights we get more wiggly spline

Each of the splines I showed you earlier are all generated from the same basis functions but using different weights

How do GAMs learn from data?

28 / 49

How does this help us learn from data?

Here I'm showing a simulated data set, where the data are drawn from the orange functions, with noise. We want to learn the orange function from the data

Maximise penalised log-likelihood ⇨ β

29 / 49

Fitting a GAM involves finding the weights for the basis functions that produce a spline that fits the data best, subject to some constraints

Avoid overfitting our sample30 / 49

How wiggly?

$\int_{R} [f^{''}]^{2} d x = β^{T} S β$

31 / 49

Penalised fit

$L_{p} (β) = L (β) - \frac{1}{2} λ β^{T} S β$

32 / 49

Wiggliness

$\int_{R} [f^{''}]^{2} d x = β^{T} S β = W$

(Wiggliness is 100% the right mathy word)

We penalize wiggliness to avoid overfitting

33 / 49

Making wiggliness matter

$W$ measures wiggliness

(log) likelihood measures closeness to the data

We use a smoothing parameter $λ$ to define the trade-off, to find the spline coefficients $B_{k}$ that maximize the penalized log-likelihood

$L_{p} = \log (Likelihood) - λ W$

34 / 49

HadCRUT4 time series

35 / 49

Picking the right wiggliness

Two ways to think about how to optimize $λ$ :

Predictive: Minimize out-of-sample error
Bayesian: Put priors on our basis coefficients

Many methods: AIC, Mallow's $C_{p}$ , GCV, ML, REML

Practically, use REML, because of numerical stability
Hence gam(..., method = "REML")

Animation of derivatives

36 / 49

Maximum allowed wiggliness

We set basis complexity or "size" $k$

This is maximum wigglyness, can be thought of as number of small functions that make up a curve

Once smoothing is applied, curves have fewer effective degrees of freedom (EDF)

EDF < $k$

37 / 49

Maximum allowed wiggliness

$k$ must be large enough, the $λ$ penalty does the rest

Large enough — space of functions representable by the basis includes the true function or a close approximation to the tru function

Bigger $k$ increases computational cost

In mgcv, default $k$ values are arbitrary — after choosing the model terms, this is the key user choice

Must be checked! — gam.check()

38 / 49

GAM summary so far

GAMs give us a framework to model flexible nonlinear relationships
Use little functions (basis functions) to make big functions (smooths)
Use a penalty to trade off wiggliness/generality
Need to make sure your smooths are wiggly enough

39 / 49

A cornucopia of smooths40 / 49

A cornucopia of smooths

The type of smoother is controlled by the bs argument (think basis)

The default is a low-rank thin plate spline bs = 'tp'

Many others available

Cubic splines bs = 'cr'
P splines bs = 'ps'
Cyclic splines bs = 'cc' or bs = 'cp'
Adaptive splines bs = 'ad'
Random effect bs = 're'
Factor smooths bs = 'fs'

Duchon splines bs = 'ds'
Spline on the sphere bs = 'sos'
MRFs bs = 'mrf'
Soap-film smooth bs = 'so'
Gaussian process bs = 'gp'

41 / 49

A bestiary of conditional distributions

A GAM is just a fancy GLM

Simon Wood & colleagues (2016) have extended the mgcv methods to some non-exponential family distributions

binomial()
poisson()
Gamma()
inverse.gaussian()
nb()
tw()
mvn()
multinom()

betar()
scat()
gaulss()
ziplss()
twlss()
cox.ph()
gamals()
ocat()

42 / 49

Smooth interactions

Two ways to fit smooth interactions

Bivariate (or higher order) thin plate splines
- s(x, z, bs = 'tp')
- Isotropic; single smoothness parameter for the smooth
- Sensitive to scales of x and z
Tensor product smooths
- Separate marginal basis for each smooth, separate smoothness parameters
- Invariant to scales of x and z
- Use for interactions when variables are in different units
- te(x, z)

43 / 49

Tensor product smooths

There are multiple ways to build tensor products in mgcv

te(x, z)
t2(x, z)
s(x) + s(z) + ti(x, z)

te() is the most general form but not usable in gamm4::gamm4() or brms

t2() is an alternative implementation that does work in gamm4::gamm4() or brms

ti() fits pure smooth interactions; where the main effects of x and z have been removed from the basis

44 / 49

Factor smooth interactions

Two ways for factor smooth interactions

by variable smooths
- entirely separate smooth function for each level of the factor
- each has it's own smoothness parameter
- centred (no group means) so include factor as a fixed effect
- y ~ f + s(x, by = f)
bs = 'fs' basis
- smooth function for each level of the function
- share a common smoothness parameter
- fully penalized; include group means
- closer to random effects
- y ~ s(x, f, bs = 'fs')

45 / 49

Random effects

When fitted with REML or ML, smooths can be viewed as just fancy random effects

Inverse is true too; random effects can be viewed as smooths

If you have simple random effects you can fit those in gam() and bam() without needing the more complex GAMM functions gamm() or gamm4::gamm4()

These two models are equivalent

m_nlme <- lme(travel ~ 1, data = Rail, ~ 1 | Rail, method = "REML") 
m_gam  <- gam(travel ~ s(Rail, bs = "re"), data = Rail, method = "REML")

46 / 49

Random effects

The random effect basis bs = 're' is not as computationally efficient as nlme or lme4 for fitting

complex random effects terms, or
random effects with many levels

Instead see gamm() and gamm4::gamm4()

gamm() fits using lme()
gamm4::gamm4() fits using lmer() or glmer()

For non Gaussian models use gamm4::gamm4()

47 / 49

Next steps

Read Simon Wood's book!

Lots more material on our ESA GAM Workshop site

https://noamross.github.io/mgcv-esa-workshop/

Noam Ross' free GAM Course

https://noamross.github.io/gams-in-r-course/

A couple of papers:

Simpson, G.L., 2018. Modelling Palaeoecological Time Series Using Generalised Additive Models. Frontiers in Ecology and Evolution 6, 149. https://doi.org/10.3389/fevo.2018.00149
Pedersen, E.J., Miller, D.L., Simpson, G.L., Ross, N., 2019. Hierarchical generalized additive models in ecology: an introduction with mgcv. PeerJ 7, e6876. https://doi.org/10.7717/peerj.6876

Also see my blog: www.fromthebottomoftheheap.net

48 / 49

Acknowledgments

Funding

Fellow GAM colleagues

David Miller

Eric Pedersen

Noam Ross

Slides

RMarkdown bit.ly/gam-efi-git

49 / 49

Help

Keyboard shortcuts

↑, ←, Pg Up, k

Go to previous slide

↓, →, Pg Dn, Space, j

Go to next slide

Home

Go to first slide

End

Go to last slide

Number + Return

Go to specific slide

b / m / f

Toggle blackout / mirrored / fullscreen mode

Clone slideshow

Toggle presenter mode

Restart the presentation timer

?, h

Toggle this help

Generalized Additive Models with R and mgcv

Gavin Simpson

January 3, 2022

Today's topics

Logistics

Motivating example

HadCRUT4 time series

Linear Models

Why bother with anything more complex?

Is this linear?

Polynomials perhaps…

Polynomials perhaps…

GAMs offer a solution

HadCRUT data set

HadCRUT data set

Fitting a GAM

Fitted GAM

GAMs

Generalized Additive Models

How is a GAM different?

Fitting a GAM in R

How did gam() know?

What magic is this?

Wiggly things

Basis expansions

Splines

Splines formed from basis functions

Weight basis functions ⇨ spline

How do GAMs learn from data?

Maximise penalised log-likelihood ⇨ β

Avoid overfitting our sample

How wiggly?

Penalised fit

Wiggliness

Making wiggliness matter

HadCRUT4 time series

Picking the right wiggliness

Maximum allowed wiggliness

Maximum allowed wiggliness

GAM summary so far

A cornucopia of smooths

A cornucopia of smooths

A bestiary of conditional distributions

Smooth interactions

Tensor product smooths

Factor smooth interactions

Random effects

Random effects

Next steps

Acknowledgments

Funding

Fellow GAM colleagues

Slides

Today's topics

Help

How did `gam()` know?