What are GAMs?
How to fit GAMs in R with mgcv
Examples
Hadley Centre NH temperature record ensemble
How would you model the trend in these data?
yi∼N(μi,σ2)
μi=β0+β1x1i+β2x2i+⋯+βjxji
Assumptions
An additive model address the first of these
We can keep on adding ever more powers of x to the model — model selection problem
Runge phenomenon — oscillations at the edges of an interval — means simply moving to higher-order polynomials doesn't always improve accuracy
## Load Datatmpf <- tempfile()curl_download("https://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/time_series/HadCRUT.4.6.0.0.annual_nh.txt", tmpf)gtemp <- read.table(tmpf, colClasses = rep("numeric", 12))[, 1:2] # only want some of the variablesnames(gtemp) <- c("Year", "Temperature")gtemp <- as_tibble(gtemp)
gtemp
## # A tibble: 172 × 2## Year Temperature## <dbl> <dbl>## 1 1850 NA## 2 1851 NA## 3 1852 NA## 4 1853 NA## 5 1854 NA## 6 1855 NA## 7 1856 NA## 8 1857 NA## 9 1858 NA## 10 1859 NA## # … with 162 more rows
library('mgcv')m <- gam(Temperature ~ s(Year), data = gtemp, method = 'REML')summary(m)
## ## Family: gaussian ## Link function: identity ## ## Formula:## Temperature ~ s(Year)## ## Parametric coefficients:## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.020523 0.009733 -2.109 0.0365 *## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## Approximate significance of smooth terms:## edf Ref.df F p-value ## s(Year) 7.838 8.65 145 <2e-16 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## R-sq.(adj) = 0.88 Deviance explained = 88.5%## -REML = -91.196 Scale est. = 0.016294 n = 172
GAMs are an intermediate-complexity model
In LM we model the mean of data as a sum of linear terms:
yi=β0+∑jβjxji+ϵi
A GAM is a sum of smooth functions or smooths
yi=β0+∑jsj(xji)+ϵi
where ϵi∼N(0,σ2), yi∼Normal (for now)
Call the above equation the linear predictor in both cases
model <- gam(y ~ s(x1) + s(x2) + te(x3, x4), # formuala describing model data = my_data_frame, # your data method = 'REML', # or 'ML' family = gaussian) # or something more exotic
s()
terms are smooths of one or more variables
te()
terms are the smooth equivalent of main effects + interactions
gam()
know?GAMs use splines to represent the non-linear relationships between covariates, here x
, and the response variable on the y
axis.
In the polynomial models we used a polynomial basis expansion of x
Splines are functions composed of simpler functions
Simpler functions are basis functions & the set of basis functions is a basis
When we model using splines, each basis function bk has a coefficient βk
Resultant spline is a the sum of these weighted basis functions, evaluated at the values of x
s(x)=K∑k=1βkbk(x)
Splines are built up from basis functions
Here I'm showing a cubic regression spline basis with 10 knots/functions
We weight each basis function to get a spline. Here all the basisi functions have the same weight so they would fit a horizontal line
But if we choose different weights we get more wiggly spline
Each of the splines I showed you earlier are all generated from the same basis functions but using different weights
How does this help us learn from data?
Here I'm showing a simulated data set, where the data are drawn from the orange functions, with noise. We want to learn the orange function from the data
Fitting a GAM involves finding the weights for the basis functions that produce a spline that fits the data best, subject to some constraints
∫R[f′′]2dx=βTSβ
Lp(β)=L(β)−12λβTSβ
∫R[f′′]2dx=βTSβ=W
(Wiggliness is 100% the right mathy word)
We penalize wiggliness to avoid overfitting
W measures wiggliness
(log) likelihood measures closeness to the data
We use a smoothing parameter λ to define the trade-off, to find the spline coefficients Bk that maximize the penalized log-likelihood
Lp=log(Likelihood)−λW
Two ways to think about how to optimize λ:
Many methods: AIC, Mallow's Cp, GCV, ML, REML
gam(..., method = "REML")
We set basis complexity or "size" k
This is maximum wigglyness, can be thought of as number of small functions that make up a curve
Once smoothing is applied, curves have fewer effective degrees of freedom (EDF)
EDF < k
k must be large enough, the λ penalty does the rest
Large enough — space of functions representable by the basis includes the true function or a close approximation to the tru function
Bigger k increases computational cost
In mgcv, default k values are arbitrary — after choosing the model terms, this is the key user choice
Must be checked! — gam.check()
GAMs give us a framework to model flexible nonlinear relationships
Use little functions (basis functions) to make big functions (smooths)
Use a penalty to trade off wiggliness/generality
Need to make sure your smooths are wiggly enough
The type of smoother is controlled by the bs
argument (think basis)
The default is a low-rank thin plate spline bs = 'tp'
Many others available
bs = 'cr'
bs = 'ps'
bs = 'cc'
or bs = 'cp'
bs = 'ad'
bs = 're'
bs = 'fs'
bs = 'ds'
bs = 'sos'
bs = 'mrf'
bs = 'so'
bs = 'gp'
A GAM is just a fancy GLM
Simon Wood & colleagues (2016) have extended the mgcv methods to some non-exponential family distributions
binomial()
poisson()
Gamma()
inverse.gaussian()
nb()
tw()
mvn()
multinom()
betar()
scat()
gaulss()
ziplss()
twlss()
cox.ph()
gamals()
ocat()
Two ways to fit smooth interactions
s(x, z, bs = 'tp')
x
and z
x
and z
te(x, z)
There are multiple ways to build tensor products in mgcv
te(x, z)
t2(x, z)
s(x) + s(z) + ti(x, z)
te()
is the most general form but not usable in gamm4::gamm4()
or brms
t2()
is an alternative implementation that does work in gamm4::gamm4()
or brms
ti()
fits pure smooth interactions; where the main effects of x
and z
have been removed from the basis
Two ways for factor smooth interactions
by
variable smoothsy ~ f + s(x, by = f)
bs = 'fs'
basisy ~ s(x, f, bs = 'fs')
When fitted with REML or ML, smooths can be viewed as just fancy random effects
Inverse is true too; random effects can be viewed as smooths
If you have simple random effects you can fit those in gam()
and bam()
without needing the more complex GAMM functions gamm()
or gamm4::gamm4()
These two models are equivalent
m_nlme <- lme(travel ~ 1, data = Rail, ~ 1 | Rail, method = "REML") m_gam <- gam(travel ~ s(Rail, bs = "re"), data = Rail, method = "REML")
The random effect basis bs = 're'
is not as computationally efficient as nlme or lme4 for fitting
Instead see gamm()
and gamm4::gamm4()
gamm()
fits using lme()
gamm4::gamm4()
fits using lmer()
or glmer()
For non Gaussian models use gamm4::gamm4()
Read Simon Wood's book!
Lots more material on our ESA GAM Workshop site
https://noamross.github.io/mgcv-esa-workshop/
Noam Ross' free GAM Course
https://noamross.github.io/gams-in-r-course/
A couple of papers:
Also see my blog: www.fromthebottomoftheheap.net
David Miller
Eric Pedersen
Noam Ross
What are GAMs?
How to fit GAMs in R with mgcv
Examples
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |