Making R packages for me

A simple guide to helpful resources for making R packages, mostly for personal use, written by a mere mortal. It is not intended as a how-to, but as a collection of how-to links (mainly for my own benefit) that I found useful when developing my first R package. (Well, to be honest, my first package is still being developed and is a complete mess; InteractiveWordcloud is my first functioning, openly shared package, which I developed for the LivingNorway colloquium after being frustrated with the commercial offerings.)

Why?

“But I already write R functions and source them and document my code and have good workflows…”

Great. More power to you. I, unfortunately, do not have workflows and scripts good enough that I can always remember which files are where and what each function does. So, writing packages for myself means that I can keep better documentation of what my functions do. It also means that I can easily share my work with others, including my other selves (on other computers, on other platforms, across institutions). If I share it, I might even receive (hopefully constructive) criticism from others. Which is good, because even if it damages my pride, I want to improve (as a coder, a researcher, and as a human being). Karl Broman expands on why you might want to write your own packages.

So do whatever works for you. If you choose to try writing your own R packages, read on for my links and tips. If not, whatever.

How?

There is an increasing number of how-to guides. They range from Hilary Parker’s super basic tutorial to Hadley Wickham’s comprehensive tome and the even denser official CRAN manual.

I found Karl Broman’s primer one of the better resources – relatively up to date, and comprehensive enough, but not too overwhelming. He also has nice links that are highly relevant on “git/github, GNU make, knitr, making a web site with GitHub Pages, data organization, and reproducible research”.

There are many, many other resources out there, and more every day. This tutorial from Matti Vuorre is embedded in an open science framework and seems relatively comprehensive and user friendly; I might look to it if I were starting again. And if you use RStudio, they provide some RStudio-specific tutorials.

How do I…?

Here are some further tips or more specific links that I found useful.

Initiate, build, and manage a package

Initiate the package? Easiest is probably through RStudio (File > New Project > New Directory > R Package) or with usethis::create_package().

Get a licence file? This is needed to build. See Broman’s licences page. This is also automated via usethis (e.g. usethis::use_mit_license()).
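
A minimal sketch of these first two steps with usethis, assuming the package is installed. The path ~/myRpackages/mypkg and the name "Your Name" are placeholders I made up, and MIT is only one possible licence choice.

```r
# install.packages("usethis")  # if not already installed

# Create the package skeleton (DESCRIPTION, NAMESPACE, R/) at a placeholder path
usethis::create_package("~/myRpackages/mypkg")

# From inside the new package project: add a LICENSE file and the
# matching License field in DESCRIPTION (MIT chosen only as an example)
usethis::use_mit_license("Your Name")
```

Run the second call from within the new package project, since usethis writes into whichever project is currently active.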

Build and install my package? If you just want it locally, you can click ‘Install and Restart’ in the Build pane of RStudio. To distribute it, you can create a tar.gz file using devtools::build(), which can then be installed via the command:

install.packages("packageName_0.1.0.tar.gz", repos = NULL, type = "source")

However, why build a package if not storing it online? Just host it on GitHub (see the last item in this post). Then it can easily be installed by anyone via the command:

devtools::install_github("developer/package")

Indeed, usethis and the older devtools provide many functions for common package-building tasks. I recommend taking note of these (former self – take note!!)

Change the package name post-creation? Oops. Luckily there is a package for that: changer.

Add functions and document them

Use roxygen. I wouldn’t go so far as to say it makes life easy, but it does make life easier.

First, allow roxygen to build the documentation: Tools > Project Options > Build Tools, then check ‘Generate documentation with Roxygen’.

Then write a function as a .R file in the R/ folder, place the cursor inside the function, and go to Code > Insert Roxygen Skeleton. Write the documentation in the function file using roxygen comments (#'). Run devtools::document() to push changes in the roxygen comments within the .R files through to the help files in man/ and to NAMESPACE.
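
For illustration, here is roughly what a filled-in roxygen skeleton can look like. The function celsius_to_fahrenheit() is a made-up example, not something from any of the linked guides.

```r
#' Convert temperatures from Celsius to Fahrenheit
#'
#' @param celsius A numeric vector of temperatures in degrees Celsius.
#'
#' @return A numeric vector of the same length, in degrees Fahrenheit.
#' @export
#'
#' @examples
#' celsius_to_fahrenheit(c(0, 100))
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}
```

Saved as a file in R/, this block is what devtools::document() reads to write the matching help page.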

Add data and document it

Best done through usethis::use_data(x), which creates the data/ folder and saves x there. Instead of documenting the data directly, you document the name of the dataset (as a quoted string) in a .R file saved in R/. See the relevant section of Hadley’s tome. Scripts for creating the data from raw sources can be set up with usethis::use_data_raw().
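
A small sketch of how the two pieces fit together. The dataset plant_heights and its columns are invented for this example, and the file name R/data.R is only a common convention.

```r
# Step 1 (run once, e.g. from a script in data-raw/): create and save the data.
plant_heights <- data.frame(
  species   = c("a", "b"),
  height_cm = c(12.5, 30.1)
)
# usethis::use_data(plant_heights)  # writes data/plant_heights.rda

# Step 2 (lives in a file such as R/data.R): document the dataset by
# putting the roxygen block above its name as a quoted string.

#' Heights of example plants
#'
#' A made-up dataset used only to illustrate the documentation layout.
#'
#' @format A data frame with 2 rows and 2 variables:
#' \describe{
#'   \item{species}{Species label (character).}
#'   \item{height_cm}{Plant height in centimetres (numeric).}
#' }
"plant_heights"
```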

Using other packages within your function, depends and imports

Note that all packages, including base, tend to change over time, which can cause your functions to break. Using packrat, renv, conda, or docker can keep track of the versions and help prevent things breaking. I am not this advanced yet.

Refer to all functions from other packages explicitly in code, as pkg::function.

In the DESCRIPTION file, list all the packages you use under ‘Depends:’ (must have; attached with your package, so functions can be called without pkg::function), ‘Imports:’ (must have; called through pkg::function) or ‘Suggests:’ (optional; your code provides workarounds when not available) (see discussion on differences and when to use here).
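
To make the Imports versus Suggests difference concrete, here is a hedged sketch. The function names are made up; stats stands in for any package under Imports, and jsonlite is just an arbitrary example of a package under Suggests.

```r
# The DESCRIPTION entries can be added by hand or with usethis, e.g.:
# usethis::use_package("stats")                       # goes under Imports
# usethis::use_package("jsonlite", type = "Suggests") # goes under Suggests

# Imports: the package is guaranteed to be installed, so call pkg::function.
centre_values <- function(x) {
  x - stats::median(x, na.rm = TRUE)
}

# Suggests: the package may be missing, so check first and offer a workaround.
summarise_values <- function(x) {
  s <- list(n = length(x), median = stats::median(x, na.rm = TRUE))
  if (requireNamespace("jsonlite", quietly = TRUE)) {
    # Suggested package is available: return the summary as JSON text
    as.character(jsonlite::toJSON(s, auto_unbox = TRUE))
  } else {
    # Workaround when it is not installed: return plain text instead
    paste0("n=", s$n, "; median=", s$median)
  }
}
```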

For documenting multiple functions in the same help file, and for linking help files to aid navigation, search for ‘family’ in Hadley’s tome.

For illustrating usage, put example code under @examples (note the plural! The singular @example expects a path to a file containing the example). Wrap any code that might cause an error in \dontrun{}.
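
As a sketch, an @examples section with one part that runs during checks and one part wrapped in \dontrun{} could look like this. read_head() is a made-up function for illustration.

```r
#' Read the first lines of a file
#'
#' @param path Path to a text file.
#' @param n Number of lines to read.
#'
#' @return A character vector with up to \code{n} lines.
#' @export
#'
#' @examples
#' # Runs during checks: writes and then reads a temporary file
#' tmp <- tempfile(fileext = ".txt")
#' writeLines(c("a", "b", "c"), tmp)
#' read_head(tmp, n = 2)
#'
#' \dontrun{
#' # Would error during checks because this file does not exist
#' read_head("a/file/that/does/not/exist.txt")
#' }
read_head <- function(path, n = 6) {
  readLines(path, n = n)
}
```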

Check if everything is OK

Run devtools::check(), or use Build > Check in RStudio. This writes out a log, followed by a summary of errors, warnings and notes.

More information on check and the multitude of confusing errors here and here.

Some common error/warning messages I’ve encountered:

  • ‘no visible binding for global variable’ – this appears when a function or variable cannot be found (i.e. it is neither attributed to a package nor defined), or it can be a dplyr issue (see below). This useful R-bloggers post describes a number of possible causes and solutions.
  • warning test_functions…@example… doesn’t exist – the simple fix should be to use @examples.
  • %>% pipe issues – you need to import the pipe from magrittr, see below.
  • Sometimes, if packages are out of sync with your R version, you might need to install an older version of the package; otherwise check throws a warning that acts like an error. Find the required version number in the CRAN archive and install it.
  • Sometimes I’ve had a pandoc error 11 – it is reported as being in the vignette, but the causes varied: errors in the documentation of one of the functions (which didn’t have an @return slot), other problems with a specific function, or packages loaded in the vignettes that had no entry in the Imports section of the DESCRIPTION file. One way to work towards resolving this is to remove vignette building from the package build and see whether that turns up any errors. Sometimes these errors just happen, and re-running or restarting can help.
  • Other errors arise if you have special characters like {} & % etc. in your roxygen documentation comments (e.g. a function description). These throw big scary-looking errors, but are simple to solve (just replace the symbol or put an escape \ before it).

It is possibly also useful to run a spellcheck or use other helpful usethis functions.

Use pipes and dplyr

This will cause warnings as explained more comprehensively here.

Pipes are from magrittr. magrittr should not be listed in Depends; rather, leave it in Imports. If using roxygen, add #' @importFrom magrittr "%>%" to the function documentation (roxygen then writes the NAMESPACE entry automatically), or add importFrom(magrittr,"%>%") to the NAMESPACE file directly.

If using dplyr, you can resolve the “no visible binding for global variable” note by explicitly using .data in your expression (e.g., df %>% filter(.data$a > 5)) AND by importing rlang::.data in your namespace (e.g., @importFrom rlang .data).
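
Putting the two imports together, a function file in R/ might look like the sketch below. filter_above() is a made-up example; it assumes dplyr, magrittr and rlang are all listed under Imports in DESCRIPTION.

```r
#' Keep rows where column a exceeds a threshold
#'
#' @param df A data frame with a numeric column named a.
#' @param threshold Rows with a above this value are kept.
#'
#' @return The filtered data frame.
#' @export
#'
#' @importFrom magrittr "%>%"
#' @importFrom rlang .data
filter_above <- function(df, threshold = 5) {
  # .data$a tells both dplyr and R CMD check that a is a column, not a global
  df %>% dplyr::filter(.data$a > threshold)
}
```

Inside a package the two @importFrom tags make %>% and .data available. To try the function in a plain script instead, attach magrittr (or dplyr) first so that the pipe is defined.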

Input checks and tests of functions

For input checks, several options exist:

  • if(cond) stop("custom message") – throws a custom message
  • stopifnot(multiple conditions) – not always a very informative message
  • assertthat::assert_that(condition) – throws a more informative message
    • also has a number of helpful conditions
    • and on_failure() for custom error messages
  • try, tryCatch – more complicated to use, but good when catching errors from other functions.
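
The base R options above can be sketched as follows. safe_log() and try_log() are made-up names, and the named condition in stopifnot() assumes R 4.0.0 or later.

```r
safe_log <- function(x) {
  # if + stop: full control over the error message
  if (!is.numeric(x)) {
    stop("x must be numeric, not ", class(x)[1], ".", call. = FALSE)
  }
  # stopifnot: compact; naming the condition gives a friendlier message
  stopifnot("x must be positive" = all(x > 0))
  log(x)
}

# tryCatch: recover from an error raised by another function
try_log <- function(x) {
  tryCatch(
    safe_log(x),
    error = function(e) {
      message("Returning NA because: ", conditionMessage(e))
      NA_real_
    }
  )
}

safe_log(1)   # passes both checks
try_log("a")  # the error is caught, a message is shown, and NA is returned
```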

For tests, you can set up with usethis::use_testthat(); there is more detail in Hadley’s tome.
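
A minimal sketch of what a test can look like, assuming testthat is installed. add_one() is a hypothetical function that is defined here only so the example is self-contained; in a real package it would live in R/ and the test would sit in a file such as tests/testthat/test-add_one.R.

```r
# Hypothetical function under test (would normally live in R/)
add_one <- function(x) {
  x + 1
}

# The test itself: one expectation for the normal case, one for bad input
testthat::test_that("add_one() adds one and rejects text", {
  testthat::expect_equal(add_one(1), 2)
  testthat::expect_error(add_one("a"))
})
```

Within a package set up by usethis::use_testthat(), the testthat:: prefixes are not needed inside the test files, and devtools::test() runs all of them.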

Make a Vignette

Set a vignette up with usethis::use_vignette('my_vignette'); note that this needs knitr and rmarkdown installed first. For updating, use devtools::build(), as the RStudio shortcut doesn’t build vignettes.

Karl Broman and Hadley give good documentation for this.

Vignettes are built through R Markdown. Here is the markdown bible. You possibly also want to include functions in the vignette. Also page 11 here and here. And you probably want some LaTeX math cheat sheets.

Push it to github

Already initiated a package in RStudio? Push it this way.

Before you do, however, consider file sizes and types. GitHub is a great place for code, but not for data. Because of the way Git stores and distributes versions, large files (individual files >10MB; there is a hard limit for files >100MB), or even smaller non-text files that are repeatedly updated, can rapidly multiply the repository’s size. GitHub recommends that repositories stay small (<1GB, <<5GB), and provides guidelines for file management, including the recommendation to use Git Large File Storage (Git LFS). Note, they highlight that GitHub is developed to facilitate working with and sharing code, not as a backup.
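
One way to check for oversized files before pushing is a small base R helper like the sketch below. find_large_files() is a name I made up, and the 10 MB default simply mirrors the rule of thumb above.

```r
# List files under `path` that are larger than `threshold_mb` megabytes.
# Hidden files and folders (such as .git) are skipped by list.files() defaults.
find_large_files <- function(path = ".", threshold_mb = 10) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE)
  size_mb <- file.size(files) / 1024^2
  big <- !is.na(size_mb) & size_mb > threshold_mb
  out <- data.frame(file = files[big], size_mb = round(size_mb[big], 2))
  # Largest files first
  out[order(out$size_mb, decreasing = TRUE), ]
}

# Run from the package root before committing
find_large_files(".")
```

Anything it lists is a candidate for .gitignore, Git LFS, or storage somewhere other than the repository.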

Good job, you.

