+ - 0:00:00
Notes for current slide
Notes for next slide

Generalized Additive Models with R and mgcv

Gavin Simpson

January 3, 2022

1 / 49

Today's topics

  • What are GAMs?

  • How to fit GAMs in R with mgcv

  • Examples

2 / 49

Logistics

3 / 49

Motivating example

4 / 49

HadCRUT4 time series

5 / 49

Hadley Centre NH temperature record ensemble

How would you model the trend in these data?

Linear Models

yiN(μi,σ2)

μi=β0+β1x1i+β2x2i++βjxji

Assumptions

  1. linear effects of covariates are good approximation of the true effects
  2. conditional on the values of covariates, yi|XN(0,σ2)
  3. this implies all observations have the same variance
  4. yi|X are independent

An additive model address the first of these

6 / 49

Why bother with anything more complex?

7 / 49

Is this linear?

8 / 49

Polynomials perhaps…

9 / 49

Polynomials perhaps…

We can keep on adding ever more powers of x to the model — model selection problem

Runge phenomenon — oscillations at the edges of an interval — means simply moving to higher-order polynomials doesn't always improve accuracy

10 / 49

GAMs offer a solution

11 / 49

HadCRUT data set

## Load Data
tmpf <- tempfile()
curl_download("https://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/time_series/HadCRUT.4.6.0.0.annual_nh.txt", tmpf)
gtemp <- read.table(tmpf, colClasses = rep("numeric", 12))[, 1:2] # only want some of the variables
names(gtemp) <- c("Year", "Temperature")
gtemp <- as_tibble(gtemp)

File format

12 / 49

HadCRUT data set

gtemp
## # A tibble: 172 × 2
## Year Temperature
## <dbl> <dbl>
## 1 1850 NA
## 2 1851 NA
## 3 1852 NA
## 4 1853 NA
## 5 1854 NA
## 6 1855 NA
## 7 1856 NA
## 8 1857 NA
## 9 1858 NA
## 10 1859 NA
## # … with 162 more rows
13 / 49

Fitting a GAM

library('mgcv')
m <- gam(Temperature ~ s(Year), data = gtemp, method = 'REML')
summary(m)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## Temperature ~ s(Year)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.020523 0.009733 -2.109 0.0365 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Year) 7.838 8.65 145 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.88 Deviance explained = 88.5%
## -REML = -91.196 Scale est. = 0.016294 n = 172
14 / 49

Fitted GAM

15 / 49

GAMs

16 / 49

Generalized Additive Models


Source: GAMs in R by Noam Ross

17 / 49

GAMs are an intermediate-complexity model

  • can learn from data without needing to be informed by the user
  • remain interpretable because we can visualize the fitted features

How is a GAM different?

In LM we model the mean of data as a sum of linear terms:

yi=β0+jβjxji+ϵi

A GAM is a sum of smooth functions or smooths

yi=β0+jsj(xji)+ϵi

where ϵiN(0,σ2), yiNormal (for now)

Call the above equation the linear predictor in both cases

18 / 49

Fitting a GAM in R

model <- gam(y ~ s(x1) + s(x2) + te(x3, x4), # formuala describing model
data = my_data_frame, # your data
method = 'REML', # or 'ML'
family = gaussian) # or something more exotic

s() terms are smooths of one or more variables

te() terms are the smooth equivalent of main effects + interactions

19 / 49

How did gam() know?

20 / 49

What magic is this?

21 / 49
22 / 49

Wiggly things

23 / 49

GAMs use splines to represent the non-linear relationships between covariates, here x, and the response variable on the y axis.

Basis expansions

In the polynomial models we used a polynomial basis expansion of x

  • x0=1 — the model constant term
  • x1=x — linear term
  • x2
  • x3
24 / 49

Splines

Splines are functions composed of simpler functions

Simpler functions are basis functions & the set of basis functions is a basis

When we model using splines, each basis function bk has a coefficient βk

Resultant spline is a the sum of these weighted basis functions, evaluated at the values of x

s(x)=k=1Kβkbk(x)

25 / 49

Splines formed from basis functions

26 / 49

Splines are built up from basis functions

Here I'm showing a cubic regression spline basis with 10 knots/functions

We weight each basis function to get a spline. Here all the basisi functions have the same weight so they would fit a horizontal line

Weight basis functions ⇨ spline

27 / 49

But if we choose different weights we get more wiggly spline

Each of the splines I showed you earlier are all generated from the same basis functions but using different weights

How do GAMs learn from data?

28 / 49

How does this help us learn from data?

Here I'm showing a simulated data set, where the data are drawn from the orange functions, with noise. We want to learn the orange function from the data

Maximise penalised log-likelihood ⇨ β

29 / 49

Fitting a GAM involves finding the weights for the basis functions that produce a spline that fits the data best, subject to some constraints

Avoid overfitting our sample

30 / 49

How wiggly?

R[f]2dx=βTSβ

31 / 49

Penalised fit

Lp(β)=L(β)12λβTSβ

32 / 49

Wiggliness

R[f]2dx=βTSβ=W

(Wiggliness is 100% the right mathy word)

We penalize wiggliness to avoid overfitting

33 / 49

Making wiggliness matter

W measures wiggliness

(log) likelihood measures closeness to the data

We use a smoothing parameter λ to define the trade-off, to find the spline coefficients Bk that maximize the penalized log-likelihood

Lp=log(Likelihood)λW

34 / 49

HadCRUT4 time series

35 / 49

Picking the right wiggliness

Two ways to think about how to optimize λ:

  • Predictive: Minimize out-of-sample error
  • Bayesian: Put priors on our basis coefficients

Many methods: AIC, Mallow's Cp, GCV, ML, REML

  • Practically, use REML, because of numerical stability
  • Hence gam(..., method = "REML")

Animation of derivatives

36 / 49

Maximum allowed wiggliness

We set basis complexity or "size" k

This is maximum wigglyness, can be thought of as number of small functions that make up a curve

Once smoothing is applied, curves have fewer effective degrees of freedom (EDF)

EDF < k

37 / 49

Maximum allowed wiggliness

k must be large enough, the λ penalty does the rest

Large enough — space of functions representable by the basis includes the true function or a close approximation to the tru function

Bigger k increases computational cost

In mgcv, default k values are arbitrary — after choosing the model terms, this is the key user choice

Must be checked!gam.check()

38 / 49

GAM summary so far

  1. GAMs give us a framework to model flexible nonlinear relationships

  2. Use little functions (basis functions) to make big functions (smooths)

  3. Use a penalty to trade off wiggliness/generality

  4. Need to make sure your smooths are wiggly enough

39 / 49

A cornucopia of smooths

40 / 49

A cornucopia of smooths

The type of smoother is controlled by the bs argument (think basis)

The default is a low-rank thin plate spline bs = 'tp'

Many others available

  • Cubic splines bs = 'cr'
  • P splines bs = 'ps'
  • Cyclic splines bs = 'cc' or bs = 'cp'
  • Adaptive splines bs = 'ad'
  • Random effect bs = 're'
  • Factor smooths bs = 'fs'
  • Duchon splines bs = 'ds'
  • Spline on the sphere bs = 'sos'
  • MRFs bs = 'mrf'
  • Soap-film smooth bs = 'so'
  • Gaussian process bs = 'gp'
41 / 49

A bestiary of conditional distributions

A GAM is just a fancy GLM

Simon Wood & colleagues (2016) have extended the mgcv methods to some non-exponential family distributions

  • binomial()
  • poisson()
  • Gamma()
  • inverse.gaussian()
  • nb()
  • tw()
  • mvn()
  • multinom()
  • betar()
  • scat()
  • gaulss()
  • ziplss()
  • twlss()
  • cox.ph()
  • gamals()
  • ocat()
42 / 49

Smooth interactions

Two ways to fit smooth interactions

  1. Bivariate (or higher order) thin plate splines
    • s(x, z, bs = 'tp')
    • Isotropic; single smoothness parameter for the smooth
    • Sensitive to scales of x and z
  2. Tensor product smooths
    • Separate marginal basis for each smooth, separate smoothness parameters
    • Invariant to scales of x and z
    • Use for interactions when variables are in different units
    • te(x, z)
43 / 49

Tensor product smooths

There are multiple ways to build tensor products in mgcv

  1. te(x, z)
  2. t2(x, z)
  3. s(x) + s(z) + ti(x, z)

te() is the most general form but not usable in gamm4::gamm4() or brms

t2() is an alternative implementation that does work in gamm4::gamm4() or brms

ti() fits pure smooth interactions; where the main effects of x and z have been removed from the basis

44 / 49

Factor smooth interactions

Two ways for factor smooth interactions

  1. by variable smooths
    • entirely separate smooth function for each level of the factor
    • each has it's own smoothness parameter
    • centred (no group means) so include factor as a fixed effect
    • y ~ f + s(x, by = f)
  2. bs = 'fs' basis
    • smooth function for each level of the function
    • share a common smoothness parameter
    • fully penalized; include group means
    • closer to random effects
    • y ~ s(x, f, bs = 'fs')
45 / 49

Random effects

When fitted with REML or ML, smooths can be viewed as just fancy random effects

Inverse is true too; random effects can be viewed as smooths

If you have simple random effects you can fit those in gam() and bam() without needing the more complex GAMM functions gamm() or gamm4::gamm4()

These two models are equivalent

m_nlme <- lme(travel ~ 1, data = Rail, ~ 1 | Rail, method = "REML")
m_gam <- gam(travel ~ s(Rail, bs = "re"), data = Rail, method = "REML")
46 / 49

Random effects

The random effect basis bs = 're' is not as computationally efficient as nlme or lme4 for fitting

  • complex random effects terms, or
  • random effects with many levels

Instead see gamm() and gamm4::gamm4()

  • gamm() fits using lme()
  • gamm4::gamm4() fits using lmer() or glmer()

For non Gaussian models use gamm4::gamm4()

47 / 49

Next steps

Read Simon Wood's book!

Lots more material on our ESA GAM Workshop site

https://noamross.github.io/mgcv-esa-workshop/

Noam Ross' free GAM Course

https://noamross.github.io/gams-in-r-course/

A couple of papers:

  1. Simpson, G.L., 2018. Modelling Palaeoecological Time Series Using Generalised Additive Models. Frontiers in Ecology and Evolution 6, 149. https://doi.org/10.3389/fevo.2018.00149
  2. Pedersen, E.J., Miller, D.L., Simpson, G.L., Ross, N., 2019. Hierarchical generalized additive models in ecology: an introduction with mgcv. PeerJ 7, e6876. https://doi.org/10.7717/peerj.6876

Also see my blog: www.fromthebottomoftheheap.net

48 / 49

Acknowledgments

Funding

Fellow GAM colleagues

David Miller

Eric Pedersen

Noam Ross

Slides

49 / 49

Today's topics

  • What are GAMs?

  • How to fit GAMs in R with mgcv

  • Examples

2 / 49
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow