Using packrat with git for (better) version control
This is a post in the spirit of “100 days of R packages” documenting my experience using a new (to me) R package. My posts won’t be nearly once per day – more like once per whenever I happen to need to learn a new package. There are of course much more thorough and formal guides out there, but I also find it very informative and entertaining to learn from others as they go.
packrat is (or should be) an important part of anybody’s reproducibility toolkit in R. Its sole purpose is to keep track of what versions of packages (and R itself) you are running for any given project. This way, if you need to share code, it completely takes the guesswork out of debugging due to differences in package versions. And that includes sharing with yourself in the future! I can easily envision a scenario in which I work on project that uses various packages, have to set it aside for a while… maybe a long while… then come back to it many package updates later.
packrat ensures that I can get back to the original state of my project even if those updates break my original code.
Here is what I did in one particular project to get
1. Install packrat the usual way
2. Go to my current project and run
This scans the project, scooping up any package names called by
require(), etc., and sets up a “private library” of packages to be used just for this project. All
packrat files will be saved in the appropriately-named
packrat folder within the project.
3. Optionally, add
.gitignore for this project.
packrat/src/ contains the source tarballs, which can be quite large (ca. 70 mb for this project), so you may want to not have them in your repo. However, this means that anybody who attempts to reproduce (i.e., clone) the project will have to download those from external repositories (i.e., CRAN). And packrat is
limited in the types of external repositories it can use.
packrat/lib*/, which contains the compiled packages, is automatically added to
packrat::snapshot() to save a snapshot of the current state of packages used for this project.
Within my project, if I now go to
packrat/lib-R/x86_64-apple-darwin15.6.0, I will see it contains one folder,
3.4.4., for the current version of R. This is the “private library” containing all of the packages used by this project.
packrat, each version of R on my machine would contain a single library with all my packages, and any R project would have to use that one shared library. In other words, each package could have only one version, and all projects have to use that same version. With
packrat, since each project has its own private library, each project can use whatever version of any package you want.
Those four steps are basically all you need to do to get
packrat set up for an existing project. I just happen to be needing to update R from 3.4 to 3.5., so I am going to do that and see how it works with
packrat in the next series of steps.
5. I update R by closing RStudio, going to CRAN, downloading R 3.5, and installing it.
After that’s done, I can check the installed versions by running
ls -l /Library/Frameworks/R.framework/Versions/
total 8 drwxrwxr-x 3 root admin 102 Sep 28 2012 2.14 drwxrwxr-x 6 root admin 204 Sep 28 2012 2.15 drwxrwxr-x 6 root admin 204 Apr 13 2013 3.0 drwxrwxr-x 3 root admin 102 Jul 25 2015 3.1 drwxrwxr-x 3 root admin 102 Dec 15 2016 3.2 drwxrwxr-x 6 root admin 204 Dec 15 2016 3.3 drwxrwxr-x 3 root admin 102 May 26 14:40 3.4 drwxrwxr-x 6 root admin 204 May 26 14:40 3.5 lrwxr-xr-x 1 root admin 3 May 26 14:40 Current -> 3.5
All my old versions are still there if I need to go back, but the current one is now set to 3.5.
6. Open my project again in RStudio. RStudio is now running R 3.5, which doesn’t have any packages installed yet, including
packrat. But since my project is now a “packrat project,” this gets detected, and
packrat installs itself. How nice of it ☺️
Packrat is not installed in the local library -- attempting to bootstrap an installation... > Installing packrat into project private library: - 'packrat/lib/x86_64-apple-darwin15.6.0/3.5.0' * installing *source* package ‘packrat’ ... ** package ‘packrat’ successfully unpacked and MD5 sums checked ** R ** inst ** byte-compile and prepare package for lazy loading ** help *** installing help indices ** building package indices ** testing if installed package can be loaded * DONE (packrat) > Attaching packrat > Restoring library > Packrat bootstrap successfully completed. Restarting R and entering packrat mode...
Now when I look in
packrat/lib-R/x86_64-apple-darwin15.6.0/ (relative to my current project directory), I see two folders,
3.5.0, for the two versions of R that my project has used.
3.5.0 is empty except for
packrat, which just installed itself.
This re-installs all of my packages from the previous snapshot, saving them into
packrat/lib-R/x86_64-apple-darwin15.6.0/3.5.0. Sweet – no need for tedious
install.packages() by hand!
But I notice a warning message at the end:
Warning message: In packrat::restore() : The most recent snapshot was generated using R version 3.4.4
Yes that’s true. How to tell it I want to use R 3.5? Since that’s what I’m doing now, maybe just another
packrat::snapshot()? Let’s try that.
Unable to tangle file '/Users/joelnitta/R/baitfindR_drake/README.Rmd'; cannot parse dependencies Error in file(con, "r") : cannot open the connection In addition: Warning messages: 1: In loadNamespace(name) : there is no package called ‘yaml’ 2: In file(con, "r") : cannot open file '/var/folders/p1/r4gbnm710n55gt5h_1c9sz4w0000gn/T//RtmpZToRhp/fileb85a39e6c455': No such file or directory
Uh oh, getting some weird errors 😟
yaml got left out somehow, and I have no idea what’s up with the second error. From reading the documentation, I know that as long as packrat is in “on” mode, running
install.packages() will install to my private library for this project.
install.packages("yaml") and try again:
Installing package into ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library’ (as ‘lib’ is unspecified) trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/yaml_2.1.19.tgz' Content type 'application/x-gzip' length 185161 bytes (180 KB) ================================================== downloaded 180 KB
??? Wait it didn’t install into my private library? Let’s look there to be sure… OK it’s there, and it looks like it’s been there since before I ran the last command. This is getting weird… So much for fancy bug-checking, let’s just reboot R! 😫
Close down RStudio, open the project again, run
Up to date.
OK, works for me! Now try
Snapshot written to '/Users/joelnitta/R/baitfindR_drake/packrat/packrat.lock'
And if we look at
packrat.lock, the top four lines say
PackratFormat: 1.4 PackratVersion: 0.4.9.2 RVersion: 3.5.0 Repos: CRAN=https://cran.rstudio.com/
Alright, now my current snapshot-ed version is at 3.5! So I should be (theoretically) able to back down to R 3.4, run
packrat::restore() again, and get all of my v 3.4 packages… I think. That’s one thing I’m not too clear about
packrat just now – it seems really easy to
snapshot() your current state, and
restore() the state before that, but what about older versions? Do you only get one snapshot-ed version to keep? It seems that way, except maybe in this “special case” since each version of R gets its own private library of packages within each packrat project. I guess that works because you only care if your code is functioning properly with the current versioning of your packages.
8. Anyways, the next thing to do is to run my project code again and make sure everything is intact after updating R…
Yes, looks good! 👌
9. Now to test how this works when I update my packages. The easiest way to update all packages is through the RStudio GUI Tools -> Check for Package Updates -> Select All -> Install Updates, so I do that.
The following packages have been updated in your library, but have not been recorded in packrat: library packrat processx 3.1.0 NA Use packrat::snapshot() to record these packages in packrat. The following packages are out of sync between packrat and your current library: packrat library DBI 0.8 1.0.0 MASS 7.3-49 7.3-50 Rcpp 0.12.16 0.12.17 callr 2.0.3 2.0.4 dplyr 0.7.4 0.7.5 foreign 0.8-69 0.8-70 future 1.8.0 1.8.1 future.apply 0.1.0 0.2.0 modelr 0.1.1 0.1.2 psych 188.8.131.52 1.8.4 stringi 1.1.7 1.2.2 tidyr 0.8.0 0.8.1 utf8 1.1.3 1.1.4 yaml 2.1.18 2.1.19 Use packrat::snapshot() to set packrat to use the current library, or use packrat::restore() to reset the library to the last snapshot.
OK, making more sense now. It does indeed seem that we only get to have one “stable” version to fall back on. So the next step is to try running project code again now that I’ve updated packages to make sure all is A-OK.
Yes it is, so we can do
packrat::snapshot(). Now this is the “stable” version until I update more packages.
Now when I share my project with collaborators via github or bitbucket it will be easy to ensure that we are all using the same versions of packages. All they have to do is clone the project, then run
packrat::restore(). No more “what version are you running?” confusion 👍
Also, I finally figured out how to maintain multiple “snapshots” – use git, of course!😅 As long as you commmit immediately after a snapshot, you should be able to checkout that particular version of
packrat::restore(), and be back to whatever package version state you want. The same applies even if you want to use different versions of R. Just make sure to checkout
packrat.lock matching whatever R version you are currently running.
packrat + git = 😃