wikipedia

Using packrat with git for (better) version control

This is a post in the spirit of “100 days of R packages” documenting my experience using a new (to me) R package. My posts won’t be nearly once per day – more like once per whenever I happen to need to learn a new package. There are of course much more thorough and formal guides out there, but I also find it very informative and entertaining to learn from others as they go.

Today’s package: packrat 📦🐀

packrat is (or should be) an important part of anybody’s reproducibility toolkit in R. Its sole purpose is to keep track of what versions of packages (and R itself) you are running for any given project. This way, if you need to share code, it completely takes the guesswork out of debugging due to differences in package versions. And that includes sharing with yourself in the future! I can easily envision a scenario in which I work on project that uses various packages, have to set it aside for a while… maybe a long while… then come back to it many package updates later. packrat ensures that I can get back to the original state of my project even if those updates break my original code.


Here is what I did in one particular project to get packrat running.

1. Install packrat the usual way

install.packages("packrat")`

2. Go to my current project and run packrat::init().

This scans the project, scooping up any package names called by library(), require(), etc., and sets up a “private library” of packages to be used just for this project. All packrat files will be saved in the appropriately-named packrat folder within the project.

3. Optionally, add packrat/src/ to .gitignore for this project.

packrat/src/ contains the source tarballs, which can be quite large (ca. 70 mb for this project), so you may want to not have them in your repo. However, this means that anybody who attempts to reproduce (i.e., clone) the project will have to download those from external repositories (i.e., CRAN). And packrat is limited in the types of external repositories it can use.

Note that packrat/lib*/, which contains the compiled packages, is automatically added to .gitignore.

4. Run packrat::snapshot() to save a snapshot of the current state of packages used for this project.

Within my project, if I now go to packrat/lib-R/x86_64-apple-darwin15.6.0, I will see it contains one folder, 3.4.4., for the current version of R. This is the “private library” containing all of the packages used by this project.

Without packrat, each version of R on my machine would contain a single library with all my packages, and any R project would have to use that one shared library. In other words, each package could have only one version, and all projects have to use that same version. With packrat, since each project has its own private library, each project can use whatever version of any package you want.

Those four steps are basically all you need to do to get packrat set up for an existing project. I just happen to be needing to update R from 3.4 to 3.5., so I am going to do that and see how it works with packrat in the next series of steps.


5. I update R by closing RStudio, going to CRAN, downloading R 3.5, and installing it.

After that’s done, I can check the installed versions by running ls -l /Library/Frameworks/R.framework/Versions/

total 8
drwxrwxr-x  3 root  admin  102 Sep 28  2012 2.14
drwxrwxr-x  6 root  admin  204 Sep 28  2012 2.15
drwxrwxr-x  6 root  admin  204 Apr 13  2013 3.0
drwxrwxr-x  3 root  admin  102 Jul 25  2015 3.1
drwxrwxr-x  3 root  admin  102 Dec 15  2016 3.2
drwxrwxr-x  6 root  admin  204 Dec 15  2016 3.3
drwxrwxr-x  3 root  admin  102 May 26 14:40 3.4
drwxrwxr-x  6 root  admin  204 May 26 14:40 3.5
lrwxr-xr-x  1 root  admin    3 May 26 14:40 Current -> 3.5

All my old versions are still there if I need to go back, but the current one is now set to 3.5.

6. Open my project again in RStudio. RStudio is now running R 3.5, which doesn’t have any packages installed yet, including packrat. But since my project is now a “packrat project,” this gets detected, and packrat installs itself. How nice of it ☺️

Packrat is not installed in the local library -- attempting to bootstrap an installation...
> Installing packrat into project private library:
- 'packrat/lib/x86_64-apple-darwin15.6.0/3.5.0'
* installing *source* package ‘packrat’ ...
** package ‘packrat’ successfully unpacked and MD5 sums checked
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (packrat)
> Attaching packrat
> Restoring library
> Packrat bootstrap successfully completed. Restarting R and entering packrat mode...

Now when I look in packrat/lib-R/x86_64-apple-darwin15.6.0/ (relative to my current project directory), I see two folders, 3.4.4 and 3.5.0, for the two versions of R that my project has used. 3.5.0 is empty except for packrat, which just installed itself.

7. Run packrat::restore().

This re-installs all of my packages from the previous snapshot, saving them into packrat/lib-R/x86_64-apple-darwin15.6.0/3.5.0. Sweet – no need for tedious install.packages() by hand!

But I notice a warning message at the end:

Warning message:
In packrat::restore() :
  The most recent snapshot was generated using R version 3.4.4

Yes that’s true. How to tell it I want to use R 3.5? Since that’s what I’m doing now, maybe just another packrat::snapshot()? Let’s try that.

Unable to tangle file '/Users/joelnitta/R/baitfindR_drake/README.Rmd'; cannot parse dependencies
Error in file(con, "r") : cannot open the connection
In addition: Warning messages:
1: In loadNamespace(name) : there is no package called ‘yaml’
2: In file(con, "r") :
  cannot open file '/var/folders/p1/r4gbnm710n55gt5h_1c9sz4w0000gn/T//RtmpZToRhp/fileb85a39e6c455': No such file or directory

Uh oh, getting some weird errors 😟

Apparently yaml got left out somehow, and I have no idea what’s up with the second error. From reading the documentation, I know that as long as packrat is in “on” mode, running install.packages() will install to my private library for this project.

So let’s install.packages("yaml") and try again:

Installing package into ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/yaml_2.1.19.tgz'
Content type 'application/x-gzip' length 185161 bytes (180 KB)
==================================================
downloaded 180 KB

??? Wait it didn’t install into my private library? Let’s look there to be sure… OK it’s there, and it looks like it’s been there since before I ran the last command. This is getting weird… So much for fancy bug-checking, let’s just reboot R! 😫

Close down RStudio, open the project again, run packrat::status()

Up to date.

OK, works for me! Now try packrat::snapshot() again

Snapshot written to '/Users/joelnitta/R/baitfindR_drake/packrat/packrat.lock'

And if we look at packrat.lock, the top four lines say

PackratFormat: 1.4
PackratVersion: 0.4.9.2
RVersion: 3.5.0
Repos: CRAN=https://cran.rstudio.com/

Alright, now my current snapshot-ed version is at 3.5! So I should be (theoretically) able to back down to R 3.4, run packrat::restore() again, and get all of my v 3.4 packages… I think. That’s one thing I’m not too clear about packrat just now – it seems really easy to snapshot() your current state, and restore() the state before that, but what about older versions? Do you only get one snapshot-ed version to keep? It seems that way, except maybe in this “special case” since each version of R gets its own private library of packages within each packrat project. I guess that works because you only care if your code is functioning properly with the current versioning of your packages.

8. Anyways, the next thing to do is to run my project code again and make sure everything is intact after updating R…

Yes, looks good! 👌

9. Now to test how this works when I update my packages. The easiest way to update all packages is through the RStudio GUI Tools -> Check for Package Updates -> Select All -> Install Updates, so I do that.

Then check packrat::status()

The following packages have been updated in your library, but have not been recorded in packrat:
               library   packrat
    processx     3.1.0        NA

Use packrat::snapshot() to record these packages in packrat.

The following packages are out of sync between packrat and your current library:
                   packrat   library
    DBI                0.8     1.0.0
    MASS            7.3-49    7.3-50
    Rcpp           0.12.16   0.12.17
    callr            2.0.3     2.0.4
    dplyr            0.7.4     0.7.5
    foreign         0.8-69    0.8-70
    future           1.8.0     1.8.1
    future.apply     0.1.0     0.2.0
    modelr           0.1.1     0.1.2
    psych          1.8.3.3     1.8.4
    stringi          1.1.7     1.2.2
    tidyr            0.8.0     0.8.1
    utf8             1.1.3     1.1.4
    yaml            2.1.18    2.1.19

Use packrat::snapshot() to set packrat to use the current library, or use
packrat::restore() to reset the library to the last snapshot.

OK, making more sense now. It does indeed seem that we only get to have one “stable” version to fall back on. So the next step is to try running project code again now that I’ve updated packages to make sure all is A-OK.

Yes it is, so we can do packrat::snapshot(). Now this is the “stable” version until I update more packages.

Conclusion

Now when I share my project with collaborators via github or bitbucket it will be easy to ensure that we are all using the same versions of packages. All they have to do is clone the project, then run packrat::restore(). No more “what version are you running?” confusion 👍

Also, I finally figured out how to maintain multiple “snapshots” – use git, of course!😅 As long as you commmit immediately after a snapshot, you should be able to checkout that particular version of packrat.lock, run packrat::restore(), and be back to whatever package version state you want. The same applies even if you want to use different versions of R. Just make sure to checkout packrat.lock matching whatever R version you are currently running.

packrat + git = 😃

Resources

Official packrat site

A much more thorough tutorial

Related

comments powered by Disqus