16  Compound arguments

16.1 What’s the pattern?

Sometimes it’s useful to take the same data as either one complex argument (e.g. a matrix or data frame) or as multiple simple arguments. I think the most compelling reason to use this pattern is when another function might be called directly by a user (who will supply individual arguments) or with the output from another function (which needs to pour into a single argument). For example, it seems reasonable that you should be able to feed the output of str_locate() directly into str_sub():


x <- c("aaaaab", "aaab", "ccccb")
loc <- str_locate(x, "a+b")

str_sub(x, loc)
#> [1] "aaaaab" "aaab"   NA

But equally, it’s nice to be able to supply individual start and end values if known:

str_sub("Hadley", start = 2, end = 4)
#> [1] "adl"

So str_sub() allows either individual vectors supplied to start and end, or a two-column matrix supplied to start.

The main challenge with using this pattern is that there’s no way to make the pattern clear from the function signature alone, so you need to carefully document and generate error messages. Because this is a very rare pattern, there are no helpers available so you’ll have to think about much of it yourself.

I’ve included this pattern mostly as an example of a something that we have only partly thought through, so that you can get some sense of the questions that we ask when developing something new.

16.2 What are some examples?

  • rgb(cbind(r, g, b)) is equivalent to rgb(r, g, b). This is useful because what rgb() really operates on is colours, as specified by their red, green, and blue components. When working interactively it’s useful to set the components individually, but when operating on colour with functions, it’s nice to have a single object.

  • options(list(a = 1, b = 2)) is equivalent to options(a = 1, b = 2). This is half of very useful pattern. The other half of that pattern is that options() returns the previous value of any options that you set. That means you can do old <- options(…); options(old) to temporarily set then reset options.

  • withr::local_options() and withr::local_envvar() work similarly: you can either supply a single list of values, or individually named values.

  • stringr::str_sub(x, cbind(start, end)) is equivalent to str_sub(x, start, end). I’m not sure if it would be better to accept a data frame or a matrix.

16.3 Open questions

This pattern is only used in a couple of places, so I have a lot of questions still:

  • Is it better to use a matrix (simpler) or a data frame (more common)?
  • Should you respect the names of the matrix/data.frame? Should you require that the names match the argument names? For matrices, you could do this only if the names are present, but data frames require names.
  • Is it better to branch on the first argument being complex (like str_sub()) or the additional arguments being missing (like rgb())?
  • Is this pattern even worth it? Would it be clearer to just have a pair of functions? Or always require that the user use the complex form? Is this just a trick that I think is cool but very few other people would every discover?

16.4 dplyr::bind_rows() and !!!

Another place that this pattern crops up is in dplyr::bind_rows(). When binding rows together, it’s equally useful to bind a few named data frames as it is to bind a list of data frames that come from map or similar. In base R you need to know about do.call(rbind, mylist) which is a relatively sophisticated pattern. So in dplyr we tried to make bind_rows() automatically figure out if you were in situation one or two.

Unfortunately, it turns out to be really hard to tell which of the situations you are in, so dplyr implemented heuristics that work most of the time, but occasionally it fails in surprising ways.

Now we have generally steered away from interfaces that try to automatically “unsplice” their inputs and instead require that you use !!! to explicitly unsplice. This is has some advantages and disadvantages: it’s an interface that’s becoming increasingly common in the tidyverse (and we have a good convention for documenting it with the <dynamic-dots> tag), but it’s still relatively rare and is an advanced technique that we don’t expect everyone to learn. That’s why for this important case, we also have purrr::list_cbind().

But it means that functions like purrr::hoist(), forcats::fct_cross(), and rvest::html_form() which are less commonly given lists have a clearly documented escape hatch that doesn’t require another different function. (And of course if you understand the do.call pattern you can still use that too).

16.5 How do I use the pattern?

To implement in your own functions, you should branch on the type of the first argument and then check that the others aren’t supplied.

str_sub <- function(string, start, end) {
  if (is.matrix(start)) {
    if (!missing(end)) {
      stop("`end` must be missing when `start` is a matrix", call. = FALSE)
    if (ncol(start) != 2) {
      stop("Matrix `start` must have exactly two columns", call. = FALSE)
    stri_sub(string, from = start[, 1], to = start[, 2])
  } else {
    stri_sub(string, from = start, to = end)

And make it clear in the documentation:

#' @param start,end Integer vectors giving the `start` (default: first)
#'   and `end` (default: last) positions, inclusively. 
#'   Alternatively, you pass a two-column matrix to `start`, i.e. 
#'   `str_sub(x, start, end)` is equivalent to 
#'   `str_sub(x, cbind(start, end))`

(If you look at string::str_sub() you’ll notice that start and end do have defaults; I think this is a mistake because start and end are important enough that the user should always be forced to supply them.)