A better way of saving and loading objects in R
Hadley Wickham ((???)) this week mentioned on Twitter his preference for
saveRDS() over the more familiar
save(). Being a new function to me, I thought I’d take a look…
load() will be familiar to many R users. They allow you to save a named R object to a file or other connection and restore that object again. When loaded the named object is restored to the current environment (in general use this is the global environment — the workspace) with the same name it had when saved. This is annoying for example when you have a saved model object resulting from a previous fit and you want to compare it with the model object returned when the R code is rerun. Unless you change the name of the model fit object in your script you can’t have both the saved object and the newly created one available in the same environment at the same time.
Here’s an example of what I mean.
saveRDS() provides a far better solution to this problem and to the general one of saving and loading objects created with R.
saveRDS() serializes an R object into a format that can be saved. Wikipedia describes this thus
…serialization is the process of converting a data structure or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and “resurrected” later in the same or another computer environment.
save() does the same thing, but with one important difference;
saveRDS() doesn’t save the both the object and its name it just saves a representation of the object. As a result, the saved object can be loaded into a named object within R that is different from the name it had when originally serialized.
We can illustrate this using the model fitted earlier
(Note that the two model objects have different environments within their representations so we have to ignore this when testing their identity.)
You’ll notice that in the call to
saveRDS() I named the file with the extension
.rds. This appears to be the convention used for serialized object of this sort; R uses this representation often, for example package meta-data and the databases used by
help.search(). In contrast the extension
.rda is often used for objects serialized via
So there you have it;
readRDS() are the newest additions to my day-to-day workflow.
Update: As Gabor points out in the comments
saveRDS() isn’t a drop-in replacement for
save(). The main difference is that
save() can save many objects to a file in a single call, whilst
saveRDS(), being a lower-level function, works with a single object at a time. This is a feature for me given the above use-case, but if you find yourself saving any more than a couple of objects at a time
saveRDS() may not be ideal for you. The second significant difference is that
saveRDS() forgets the original name of the object; in the use-case above this is also seen as an advantage. If maintaining the original name is important to you then there is no reason to switch from using
Commentscomments powered by Disqus
By Gavin Simpson
01 April 2012