Using random effects in GAMs with mgcv
02 February 2021 /posted in: R
There are lots of choices for fitting generalized linear mixed effects models within R, but if you want to include smooth functions of covariates, the choices are limited. One option is to fit the model using
gamm() from the mgcv 📦 or
gamm4() from the gamm4 📦, which use
lme() (nlme 📦) or one of
glmer() (lme4 📦) under the hood respectively. The problem with doing things that way is that you get PQL fitting for non-Gaussian models (😱) and the range of families for handling non-Gaussian responses is quite limited, especially compared with the extended families now available with
gam(). brms 📦 is a good option if you don’t want to do everything by hand, but the MCMC can be slow. Instead, we could use the equivalence between smooths and random effects and use
bam() from mgcv. In this post I’ll show you how to do just that.
Getting data from the Canada Covid-19 Tracker using R
31 January 2021 /posted in: R
Last semester (Fall 2020) I taught a new course in healthcare data science for the Johnson Shoyama Graduate School in Public Policy. One of the final topics of the course was querying application programming interfaces (APIs) from within R. The example we used was querying data on the Covid 19 pandemic from the Covid-19 Tracker Canada, which has a simple API that’s easy to work with. In this post I’ll show how we accessed the API from within R and converted the query responses into something we can work with easily.
Two new versions of gratia released
30 January 2021 /posted in: R
While the Covid-19 pandemic and teaching a new course in the fall put paid to most of my development time last year, some time off work this January allowed me time to work on gratia 📦 again. I released 0.5.0 to CRAN in part to fix an issue with tests not running on the new M1 chips from Apple because I wasn’t using vdiffr 📦 conditionally. Version 0.5.1 followed shortly thereafter as I’d messed up an argument name in
smooth_estimates(), a new function that I hope will allow development to proceed more quickly and mke it easier to maintain code and extend functionality to cover a greater range of model types. Read on to fin out more about
smooth_estimates() and what else was in these two releases.
Extrapolating with B splines and GAMs
03 June 2020 /posted in: R
An issue that often crops up when modelling with generlaized additive models (GAMs), especially with time series or spatial data, is how to extrapolate beyond the range of the data used to train the model? The issue arises because GAMs use splines to learn from the data using basis functions. The splines themselves are built from basis functions that are typically setup in terms of the data used to fit the model. If there are no basis functions beyond the range of the input data, what exactly is being used if we want to extrapolate? A related issue is that of the wiggliness penalty; depending on the type of basis used, the penalty could extend over the entire real line (-∞–∞), or only over the range of the input data. In this post I want to take a practical look the extrapolation behaviour of splines in GAMs fitted with the mgcv package for R. In particular I want to illustrate how flexible the B spline basis is.
gratia 0.4.1 released
31 May 2020 /posted in: R
After a slight snafu related to the 1.0.0 release of dplyr, a new version of gratia is out and available on CRAN. This release brings a number of new features, including differences of smooths, partial residuals on partial plots of univariate smooths, and a number of utility functions, while under the hood gratia works for a wider range of models that can be fitted by mgcv.
Rendering your README with GitHub Actions
30 April 2020 /posted in: R
There’s one thing that has bugged me for a while about developing R packages. We have all these nice, modern tools we have for tracking our code, producing web sites from the roxygen documentation, an so on. Yet for every code commit I make to the master branch of a package repo, there’s often two or more additional steps I need to take to keep the package
README.md and pkgdown site in sync with the code. Don’t get me wrong; it’s amazing that we have these tools available to help users get to grips with our R packages. It’s just that there’s a lot of extra things to remember to do to keep everything up to date. The development of free-to-use services such as Travis CI or Appveyor have been very useful as they can automate many of these repetitive tasks. A more recent newcomer to the field is GitHub Actions. The other day I was grappling with getting a GitHub Actions workflow to render a
README.Rmd file to
README.md on GitHub, so that I didn’t have to do it locally all the time. After a lot of trial and error, this is how I got it working.
What evaluating Discovery Grants for the last three years has taught me
26 February 2020 /posted in: Science
For the last three years I have been a member of NSERC’s Discovery Grant Evaluation Group for Ecology and Evolution (that’s 1503 in NSERC-speak). In that time I’ve evaluated over 130 Discovery Grant submissions, read the same number of Canadian CCVs, and even chaired a few evaluations. This is what I learned, through this process, about writing a successful Discovery Grant.
25 October 2019 /posted in: R
One of the fun bits of my job is that I have actual time dedicated to helping colleagues and grad students with statistical or computational problems. Recently I’ve been helping one of our Lab Instructors with some data that from their Plant Physiology Lab course. Whilst I was writing some R code to import the raw data for the lab from an Excel sheet, it occurred to me that this would be a good excuse to look at the new
pivot_wider() functions from the tidyr package. In this post I show how these new functions facilitate common data processing steps; I was personally surprised how little data wrangling was actually needed in the end to read in the data from the lab.
radian: a modern console for R
18 June 2019 /posted in: R
Whenever I’m developing R code or writing data wrangling or analysis scripts for research projects that I work on I use Emacs and its add-on package Emacs Speaks Statistics (ESS). I’ve done so for nigh on a couple of decades now, ever since I switched full time to running Linux as my daily OS. For years this has served me well, though I wouldn’t call myself an Emacs expert; not even close! With a bit of help from some R Core coding standards document I got indentation working how I like it, I learned to contort my fingers in weird and wonderful ways to execute a small set of useful shortcuts, and I even committed some of those shortcuts to memory. More recently, however, my go-to methods for configuring Emacs+ESS were failing; indentation was all over the shop, the smart
_ stopped working or didn’t work as it had for over a decade, syntax highlighting of R-related files, like
.Rmd was hit and miss, and polymode was just a mystery to me. Configuring Emacs+ESS was becoming much more of a chore, and rather unhelpfully, my problems coincided with my having less and less time to devote to tinkering with my computer setups. Also, fiddling with this stuff just wasn’t fun any more. So, in a fit of pique following one to many reconfiguration sessions of Emacs+ESS, I went in search of some greener grass. During that search I came across radian, a neat, attractive, simple console for working with R.
Tibbles, checking examples, & character encodings
22 January 2019 /posted in: R
Recently I’ve been preparing my gratia package for submission to CRAN. During my pre-flight testing I noticed an issue under Windows checking the examples in the package against the reference output I generated on linux. In the latest release of the tibble package, the way tibbles are printed has changed subtly and in a way that leads to cross-platform differences. As I write this, tibbles with more than a set number of rows are printed in a truncated form, showing only the first 10 rows of data. In such cases, a final line is printed with an ellipsis and a note as to how many more rows are in the tibble. It was this ellipsis that was causing the cross-platform issue where differences between the output generated on windows and the reference output were being identified during
R CMD check on Windows. If this is causing you an issue, here’s one way to solve the problem.