We meet today on Treaty 4 lands, the territories of the Cree, Saulteaux (SOH-toh), Dakota, Lakota, Nakoda, and the homeland of the Métis Nation.
Today, these lands continue to be the shared territory of many diverse peoples.
One way to describe statistics is the principled process by which we learn from data in the presence of noise and uncertainty
Why would we want to learn from data?
Lotka-Volterra models of competition between species
What do the data tell us?
If we want to know how theory matches with observation then we might want to see what the data can tell us without imposing too many restrictions or constraints on our statistical model
We may have little or no theory to work with, so we take an empirical approach which may lead to the development of new theory
We learn from data because it can highlight our preconceptions and biases
Learning from data could be as simple as fitting a linear regression model...
Or as complex as fitting a sophisticated multi-layered neural network trained on huge datasets or corpora
Learning from data involves trade offs
We can have models that fit our data well — low bias — but which are highly variable, or
We can fit models that have lower variance, but these tend to have higher bias, i.e. fit the data less well
A linear regression model is very interpretable but unless the underlying relationship is linear it will have poor fit
Deep learning may fit data incredibly well but the model is very difficult to interpret and understand
GAMs are an intermediate-complexity model
Heresy, I know, but I prefer my sci-fi more Duffer Brothers than Time Lord
GAMs use splines to represent the non-linear relationships between covariates, here x, and the response variable, on the y axis.
Splines are built up from basis functions
Here I'm showing a cubic regression spline basis with 10 knots/functions
We weight each basis function to get a spline. Here all the basis functions have the same weight, so they would fit a horizontal line
But if we choose different weights we get a more wiggly spline
The splines I showed you earlier were all generated from the same basis functions, just with different weights
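The weighting idea can be sketched in a few lines of Python. This is only an illustration: it swaps in a clamped cubic B-spline basis from SciPy for the cubic regression spline basis shown on the slide, and the knot placement, the variable names, and the random weights are all assumptions made for the example.

```python
import numpy as np
from scipy.interpolate import BSpline

# Assumed setup: 10 cubic B-spline basis functions on [0, 1]
# (a stand-in for the cubic regression spline basis on the slide).
n_basis, degree = 10, 3
interior = np.linspace(0.0, 1.0, n_basis - degree + 1)[1:-1]
knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])

x = np.linspace(0.0, 1.0, 200)
# An identity matrix of coefficients evaluates every basis function at once:
# column j of `basis` is the j-th basis function evaluated at x.
basis = BSpline(knots, np.eye(n_basis), degree)(x)

# Equal weights: the weighted sum of basis functions is a horizontal line.
equal_weights = np.ones(n_basis)
flat_spline = basis @ equal_weights

# Different weights (random here, purely illustrative): a wiggly spline
# built from exactly the same basis functions.
rng = np.random.default_rng(1)
random_weights = rng.normal(size=n_basis)
wiggly_spline = basis @ random_weights
```

Plotting `flat_spline` and `wiggly_spline` against `x` shows the same picture as the slides: one basis, different weights, different curves.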
How does this help us learn from data?
Here I'm showing a simulated data set, where the data are drawn from the orange function, with noise. We want to learn the orange function from the data
Fitting a GAM involves finding the weights for the basis functions that produce a spline that fits the data best, subject to some constraints
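A minimal sketch of that fitting step, assuming the "constraints" take the form of a wiggliness penalty. It uses a P-spline style second-difference penalty on B-spline weights, which is a simplification and not necessarily the penalty used in the talk. The simulated function, the noise level, the smoothing parameter `lam`, and the helper names are all hypothetical.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, n_basis=20, degree=3):
    """Clamped cubic B-spline basis on [0, 1], one column per basis function."""
    interior = np.linspace(0.0, 1.0, n_basis - degree + 1)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    return BSpline(knots, np.eye(n_basis), degree)(x)

def fit_penalised_spline(x, y, lam, n_basis=20):
    """Minimise ||y - B w||^2 + lam * ||D w||^2 over the weights w."""
    B = bspline_basis(x, n_basis)
    D = np.diff(np.eye(n_basis), n=2, axis=0)  # second differences of the weights
    weights = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ weights, weights

# Simulated data: a known function plus Gaussian noise (made up for illustration).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 200))
truth = np.sin(2 * np.pi * x)
y = truth + rng.normal(scale=0.3, size=x.size)

fitted, weights = fit_penalised_spline(x, y, lam=1.0)
```

Larger values of `lam` pull the weights towards a straight line; smaller values let the spline chase the noise. Choosing `lam` automatically is a separate problem that this sketch does not attempt.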
Developing methodological approaches
Developing packages to enable model-fitting by other scientists
Training
Simpson (2018) Frontiers in Ecology & Evolution
Pedersen et al (2019) PeerJ
Memes are cool
Estimate effect size
Estimate variance
Make predictions from a model
Stats are cool™
Stats profs be like
Students be like
GAMs
Conditional distribution
$$y_i \sim \mathrm{EF}(\mu_i, \Theta)$$
Link function
$$g(\mathbb{E}(y_i)) = g(\mu_i) = \eta_i$$
Linear predictor
$$\eta = \beta_0 + \sum_{j=1}^{p} f_j(x_j)$$
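To make the three pieces concrete, here is a small sketch of how a linear predictor built from smooth functions maps to the expected response through an inverse link. The two smooths, the intercept, and the choice of a log link are hypothetical and chosen only for illustration.

```python
import numpy as np

def linear_predictor(beta0, smooths, covariates):
    """eta = beta0 + sum_j f_j(x_j), one smooth function per covariate."""
    return beta0 + sum(f(x) for f, x in zip(smooths, covariates))

# Hypothetical smooth functions standing in for fitted splines.
f1 = np.sin
f2 = lambda x: 0.5 * x ** 2

x1 = np.linspace(0.0, 3.0, 5)
x2 = np.linspace(-1.0, 1.0, 5)

eta = linear_predictor(0.2, [f1, f2], [x1, x2])

# With a log link, g(mu) = log(mu) = eta, so the expected value is mu = exp(eta).
mu = np.exp(eta)
```

The inverse link keeps `mu` positive whatever value `eta` takes, which is why a log link suits responses such as counts or concentrations.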
Homework: read chapter on GAMs and do ex. 1–20
12 rivers in AB, SK, MB
Apportionment of inter-provincial river water
Water quality objectives
Sulphate ([SO4]) in Assiniboine River near Shellmouth
No significant increases over the years
Keep [SO4] < 299 mg L⁻¹
Test statistic used for the detection of a trend in a time series
Commonly used for water quality assessment (Hirsch et al. 1982)
Used by the Prairie Provinces Water Board, which administers the Master Agreement on Apportionment (MAA)
Assumes that the trend is monotonic
Hirsch, R.M., Slack, J.R. and Smith, R.A. (1982), Techniques for trend assessment for monthly water quality data, Water Resources Research 18, 107–121.
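The monotonic-trend assumption is easy to see in a small sketch. This assumes the statistic on this slide belongs to the Mann-Kendall family (the slide itself does not name it), and both series below are invented for illustration. The helper `mann_kendall_s` is hypothetical; `scipy.stats.kendalltau` is the standard SciPy function.

```python
import numpy as np
from scipy.stats import kendalltau

def mann_kendall_s(values):
    """Mann-Kendall S: number of increasing pairs minus number of decreasing pairs."""
    values = np.asarray(values, dtype=float)
    diffs = values[None, :] - values[:, None]      # diffs[i, j] = values[j] - values[i]
    later_minus_earlier = diffs[np.triu_indices(len(values), k=1)]
    return int(np.sign(later_minus_earlier).sum())

years = np.arange(11)
rising = 2.0 * years            # steadily increasing series
hump = -(years - 5.0) ** 2      # rises for five years, then falls symmetrically

s_rising = mann_kendall_s(rising)
s_hump = mann_kendall_s(hump)

tau_rising, p_rising = kendalltau(years, rising)
tau_hump, p_hump = kendalltau(years, hump)
```

The hump series changes a great deal over time, yet its increases and decreases cancel in the statistic, so a test of this kind reports no trend at all.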
Create a model with year, seasonal, and flow effects
Add interactions between terms
How seasonal patterns change over the years
How the effect of flow changes over the years (e.g. effects of dilution in seasons of high/low pollution)
How the effect of flow changes over the seasons (e.g. effects of dilution in years of high/low pollution)
Estimate E([SO4]) over time
Estimate probability [SO4] exceeds the guideline (299 mg L⁻¹) over time
Identify years of significant increase in [SO4]
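One way to turn a fitted model into an exceedance probability is to simulate from the model and count how often the simulated concentration is above the guideline. The sketch below assumes a log-normal conditional distribution and uses invented yearly means and spread; the talk does not state which distribution or values were used, so treat every number here as a placeholder.

```python
import numpy as np
from scipy.stats import norm

GUIDELINE = 299.0  # mg/L, the water quality objective quoted on the slides

# Made-up model output: expected log-concentration for four time points
# and a common standard deviation on the log scale.
log_mean = np.log(np.array([200.0, 250.0, 300.0, 350.0]))
log_sd = 0.25

# Simulate concentrations and count the share above the guideline.
rng = np.random.default_rng(0)
draws = rng.lognormal(mean=log_mean, sigma=log_sd, size=(20_000, log_mean.size))
p_exceed_sim = (draws > GUIDELINE).mean(axis=0)

# For this simple distribution the same probability has a closed form,
# which is a useful check on the simulation.
p_exceed_exact = norm.sf((np.log(GUIDELINE) - log_mean) / log_sd)
```

Simulation is the more general route: it still works when the quantity of interest has no closed form, for example when uncertainty in the fitted smooths is propagated as well.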
Using the model we can show that:
The expected value of [SO4] was often > 299 mg L⁻¹, especially after 2010
The probability that [SO4] exceeded 299 mg L⁻¹ was often high, especially after 2010
[SO4] increased markedly after 2008: a failure of the water quality objectives
[SO4] in the Assiniboine River should be monitored more closely
Adjusted and Homogenized Canadian Climate Data (AHCCD)
Monthly mean temperature
36 climate stations
Variables
36693 observations
~ 36 stations ranging from Uranium City in the Taiga Shield to Poplar River near the Canadian-US border
Random effect allows the model to account for the variance between locations due to any systematic/random error
There are 36,693 observations
Daily temperature data are available, and models have been run for these, but we decided to use monthly temperature because there were fewer than 37 thousand observations versus 1.1 million for the daily data
Mention map
To show changing trends, I used HGAM…
After Gavin’s and Stefano’s presentations, expect one of the following thoughts on GAMs.
At the very simplest, GAMs better model wiggly data and show trends more accurately than linear models would
Don’t freak out before I get the chance to present on why I’m really here
HGAMs are a lot simpler than you may expect
Only difference is data can be grouped, and trends can vary between the groups
HGAMs are similar to GAMs
Instead of one model per time series, model all time series at once
Smooths can vary between time series
Can determine
common trend over all stations
unique trend for each station
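The "common trend plus station-specific trends" structure can be sketched as one penalised regression. This is a deliberately simplified stand-in for an HGAM: the B-spline basis, the fixed penalty weights, the three simulated stations, and all names are assumptions made for illustration, and the smoothing parameters are set by hand instead of being estimated.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.linalg import block_diag

def bspline_basis(x, n_basis=10, degree=3):
    """Clamped cubic B-spline basis on [0, 1], one column per basis function."""
    interior = np.linspace(0.0, 1.0, n_basis - degree + 1)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    return BSpline(knots, np.eye(n_basis), degree)(x)

# Simulated data: three stations share a common trend, and each station
# departs from it by its own offset and its own smooth deviation.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 150)
common_truth = np.sin(2 * np.pi * x)
offsets = np.array([-1.0, 0.0, 1.0])
amplitudes = np.array([-0.5, 0.0, 0.5])
station_truth = np.array([common_truth + o + a * np.cos(2 * np.pi * x)
                          for o, a in zip(offsets, amplitudes)])
y = (station_truth + rng.normal(scale=0.3, size=station_truth.shape)).ravel()

n_stations = len(offsets)
B = bspline_basis(x)
K = B.shape[1]

# Design matrix: one shared block of columns, then one block per station.
X = np.hstack([np.tile(B, (n_stations, 1)), np.kron(np.eye(n_stations), B)])

# Penalties: wiggliness only for the common smooth; wiggliness plus a ridge
# term for station deviations, which shrinks them towards the common trend.
D = np.diff(np.eye(K), n=2, axis=0)
P_common = 0.02 * D.T @ D
P_station = 0.2 * (D.T @ D + np.eye(K))
P = block_diag(P_common, *[P_station] * n_stations)

# All time series are fitted at once.
coef = np.linalg.solve(X.T @ X + P, X.T @ y)
common_hat = B @ coef[:K]
station_hat = np.array([B @ (coef[:K] + coef[K * (s + 1):K * (s + 2)])
                        for s in range(n_stations)])
```

The ridge term on the station blocks is what separates the two levels: without it, any curve could be moved freely between the common smooth and the station deviations.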
Quick heads up
The Stavrinauts made me add this…
Not to bore with stats jargon
HGAMs allow us to fit wiggly curves to the data and account for location differences, all in one model
Break down components of the model
Left: average seasonal trend across all stations, summers much warmer than winters
Difference is 30-40 degrees
Middle: temperature change throughout the years
Fewer temperature stations in the beginning, hence the large CIs
Temperature has become more variable between years
Right: while left shows average trends, the right plot shows how seasonal trends have changed by year
Three main cities in Saskatchewan for last 118 years
Temperatures are increasing in all seasons, but the change is most drastic in winter
First thing to notice is that north SK is colder than south SK
Northern SK is colder than southern SK
Difference is about 9℃
Fewer stations in northern SK
Need to use the model to extrapolate between stations
The difference is about 9 ℃
Fewer stations in the north, so we use the model to extrapolate between stations
2018 modelled temperature
Spatial pattern changes throughout the year
Greater temperature variability in northern SK
How spatial trends change throughout the year
North SK has these two vertical bands that are warmer in the summer and colder in the winter than the adjacent area
South SK is not as variable and remains even throughout the year
Refresher, these plots show the global/average trend, but we also want to know how individual stations vary around these average trends
First plot shows how locations vary within years
Some have warmer winters and cooler summers than average
Some are the inverse. Most stay around the average
Second plot: how locations vary throughout the years
Again, most stay around the average. But some have increased
Two weird ones that have cooled and are now closer to the average
HGAMs are useful for modelling wiggly data from many climate stations
Temperatures have significantly increased across Saskatchewan since the 1880s
This change is more clearly seen in the winter months
Seasonal trends vary spatially and temporally
Significant variation in the trends at each climate station
Annual minimum temperature is a strong control on many in-lake processes (eg Hampton et al 2017)
Extreme events can have long-lasting effects on lake ecology — mild winter in Europe 2006–7 (eg Straile et al 2010)
Reduction in habitat or refugia for cold-adapted species
Hampton et al (2017). Ecology under lake ice. Ecology Letters 20, 98–111. doi: 10/f3tpzh
Straile et al (2010). Effects of a half a millennium winter on a deep lake — a shape of things to come? Global Change Biology 16, 2844–2856. doi: 10/bx6t4d
The central limit theorem shows us that the Gaussian (normal) distribution is the limiting sampling distribution for many sample statistics, including sample means, as sample sizes become large
The central limit theorem underlies much of the theory behind the methods you learn about in your statistics courses, and it supports the widespread use of the Gaussian distribution
The maximum of a sample of iid random variables, after proper renormalization, can only converge in distribution to one of three possible distributions: the Gumbel distribution, the Fréchet distribution, or the Weibull distribution.
The common functional form for all three distributions, the generalized extreme value distribution (GEVD), dates back to at least Jenkinson (1955)
$$G(y) = \exp\left\{-\left[1 + \xi\left(\frac{y - \mu}{\sigma}\right)\right]_{+}^{-1/\xi}\right\}$$
Three parameters to estimate
Three distributions
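The CDF above can be coded directly and checked against SciPy. This is a sketch: `gev_cdf` is a hypothetical helper and the parameter values are arbitrary. One detail worth flagging is that `scipy.stats.genextreme` uses a shape parameter `c` with the opposite sign, so `c = -xi`.

```python
import numpy as np
from scipy.stats import genextreme

def gev_cdf(y, mu, sigma, xi):
    """GEV CDF with location mu, scale sigma, and shape xi.

    xi = 0 gives the Gumbel case, xi > 0 the Frechet type,
    and xi < 0 the (reversed) Weibull type.
    """
    z = (np.asarray(y, dtype=float) - mu) / sigma
    if xi == 0:
        return np.exp(-np.exp(-z))
    t = np.maximum(1.0 + xi * z, 0.0)        # the [.]_+ in the formula
    with np.errstate(divide="ignore"):       # t = 0 at the edge of the support
        return np.exp(-t ** (-1.0 / xi))

# Arbitrary illustrative values, all inside the support for each shape.
y = np.linspace(-3.0, 4.0, 50)
mu, sigma = 0.0, 1.0

frechet_type = gev_cdf(y, mu, sigma, xi=0.2)
gumbel_type = gev_cdf(y, mu, sigma, xi=0.0)
weibull_type = gev_cdf(y, mu, sigma, xi=-0.2)
```

For example, `genextreme.cdf(y, c=-0.2, loc=mu, scale=sigma)` corresponds to `gev_cdf(y, mu, sigma, xi=0.2)`.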
Lake minimum surface water temperatures have increased by roughly 1–3 degrees over the last 60 years
Evidence that the distribution of annual minima has changed in many lakes — implications for future extreme events which have long-term knock-on effects
HGAMLSS with the GEV distribution are a good way of modelling common trends in environmental extremes