Keep defaults short and sweet

What’s the pattern?

Default values should be short and sweet. Avoid large or complex calculations in the default values, instead using NULL or a helper function when the default requires complex calculation. This keeps the function specification focussed on the big picture (i.e. what are the arguments and are they required or not) rather than the details of the defaults.

What are some examples?

It’s common for functions to use NULL to mean that the argument is optional, but the computation of the default is non-trivial:

  • The default labels in cut() yields labels in the form (a, b].
  • The default pattern in dir() means match all files.
  • The default by in dplyr::left_join() means join using the common variables between the two data frames (the so-called natural join).
  • The default mapping in ggplot2::geom_point() (and friends) means use the mapping from the overall plot.
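
As a rough illustration of the idea behind the join example, here is a minimal sketch of a NULL default whose real value is computed in the body. The function join_keys() and the two small data frames are hypothetical and are not how dplyr implements it:

```r
# Hypothetical helper, not dplyr's implementation: when `by` is NULL,
# fall back to the column names that the two data frames share.
join_keys <- function(x, y, by = NULL) {
  if (is.null(by)) {
    by <- intersect(names(x), names(y))
  }
  if (length(by) == 0) {
    stop("`x` and `y` have no variables in common; supply `by`.")
  }
  by
}

band <- data.frame(name = c("Mick", "John"), band = c("Stones", "Beatles"))
plays <- data.frame(name = c("John", "Paul"), plays = c("guitar", "bass"))

join_keys(band, plays)               # default: computed from the inputs
join_keys(band, plays, by = "name")  # user-supplied value is used as-is
```

The signature only says that by is optional; the details of how the default is found live in the body.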

In other cases, we encapsulate default values into a function:

  • readr functions use a family of functions including readr::show_progress(), readr::should_show_col_types() and readr::should_show_lazy() that make it easier for users to override various defaults.

It’s also worth looking at a couple of counter examples that come from base R:

  • The default value for by in seq() is ((to - from)/(length.out - 1)).

  • reshape() has a very long default argument: the split argument is one of two possible lists depending on the value of the sep argument:

    reshape <- function(
        ...,
        split = if (sep == "") {
          list(regexp = "[A-Za-z][0-9]", include = TRUE)
        } else {
          list(regexp = sep, include = FALSE, fixed = TRUE)
        }
    ) {}

  • sample.int() uses a complicated rule to determine whether or not to use a faster hash-based method that’s only applicable in some circumstances: useHash = (!replace && is.null(prob) && size <= n/2 && n > 1e+07).
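
One way to see what a reader of the signature is confronted with is to pull the default out of the function's formals. This is only a sketch: it assumes an R version in which sample.int() has a useHash argument, and the exact expression may differ between R versions:

```r
# Extract the unevaluated default expression for `useHash`.
default_useHash <- formals(sample.int)$useHash
print(default_useHash)

# Number of characters the default takes up in the signature.
nchar(paste(deparse(default_useHash), collapse = ""))
```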

How do I use it?

So what should you do if a default requires some complex calculation? We have two recommended approaches: using NULL or creating a helper function. I’ll also show you two other alternatives which we don’t generally recommend but you’ll see in a handful of places in the tidyverse, and can be useful in limited circumstances.

NULL default

The simplest, and most common, way to indicate that an argument is optional but has a complex default is to use NULL as the default. Then, in the body of the function, you perform the actual calculation only if the argument is NULL. For example, if we were to use this approach in sample.int(), it might look something like this:

sample.int <- function(n, size = n, replace = FALSE, prob = NULL, useHash = NULL) {
  if (is.null(useHash)) {
    useHash <- n > 1e+07 && !replace && is.null(prob) && size <= n/2
  }
}

This pattern is made more elegant with the infix %||% operator, which is built into R as of version 4.4. If you need it in an older version of R you can import it from rlang or copy and paste it into your utils.R:

`%||%` <- function(x, y) if (is.null(x)) y else x

sample.int <- function(n, size = n, replace = FALSE, prob = NULL, useHash = NULL) {
  # The parentheses matter: %||% binds more tightly than > and &&
  useHash <- useHash %||% (n > 1e+07 && !replace && is.null(prob) && size <= n/2)
}
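
A note on operator precedence when the fallback is a comparison or logical expression: custom infix operators such as %||% bind more tightly than >, <= and &&, so the fallback needs its own parentheses. The values of n and useHash below are made up for illustration:

```r
# Defined here so the sketch also runs on R versions before 4.4.
`%||%` <- function(x, y) if (is.null(x)) y else x

n <- 1e8
useHash <- TRUE  # the caller explicitly asked for hashing

# Without parentheses this parses as (useHash %||% n) > 1e7,
# so the caller's TRUE is compared with 1e7 and the result is FALSE.
unparenthesised <- useHash %||% n > 1e7

# With parentheses the supplied value is returned untouched.
parenthesised <- useHash %||% (n > 1e7)

c(unparenthesised = unparenthesised, parenthesised = parenthesised)
```

The unparenthesised form happens to give the right answer when useHash is NULL, which makes the mistake easy to miss.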

%||% is particularly well suited to arguments where the default value is found through a cascading system of fallbacks. For example, this code from ggplot2::geom_bar() finds the width by first looking at the data, then in the parameters, finally falling back to computing it from the resolution of the x variable:

width <- data$width %||% params$width %||% (resolution(data$x, FALSE) * 0.9)
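
To see the cascade in isolation, here is a small self-contained sketch. The function find_width() and its fixed fallback are hypothetical stand-ins for the ggplot2 code, which computes the last value from the data instead:

```r
# Defined here so the sketch also runs on R versions before 4.4.
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical helper: look in `data`, then in `params`,
# and only then use the fallback value.
find_width <- function(data, params, fallback = 0.9) {
  data$width %||% params$width %||% fallback
}

find_width(list(width = 0.5), list(width = 0.7))  # found in data
find_width(list(), list(width = 0.7))             # found in params
find_width(list(), list())                        # fallback
```

Each clause is tried in turn from left to right, and the first non-NULL value wins.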

Don’t use %||% for more complex examples where the individual clauses can’t fit on their own line. For example in reshape(), I wouldn’t write:

reshape <- function(..., sep = ".", split = NULL) {
  split <- split %||% if (sep == "") {
    list(regexp = "[A-Za-z][0-9]", include = TRUE)
  } else {
    list(regexp = sep, include = FALSE, fixed = TRUE)
  }  
  ...
}

I would instead use is.null() and assign split inside each branch:

reshape <- function(..., sep = ".", split = NULL) {
  if (is.null(split)) {
    if (sep == "") {
      split <- list(regexp = "[A-Za-z][0-9]", include = TRUE)
    } else {
      split <- list(regexp = sep, include = FALSE, fixed = TRUE)
    }
  }
  ...
}

Or alternatively you might pull the code out into a helper function:

split_default <- function(sep = ".") {
  if (sep == "") {
    list(regexp = "[A-Za-z][0-9]", include = TRUE)
  } else {
    list(regexp = sep, include = FALSE, fixed = TRUE)
  }
}

reshape <- function(..., sep = ".", split = NULL) {
  split <- split %||% split_default(sep)
  ...
}

That makes it very clear exactly which other arguments the default for split depends on.

Exported helper function

If you have created a helper function for your own use, you might consider using it as the default:

reshape <- function(..., sep = ".", split = split_default(sep)) {
  ...
}

The problem with using an internal function as the default is that the user can’t easily run this function to see what it does, making the default a bit magical (Chapter 20). So we recommend that if you want to do this you export and document that function. This is the main downside of this approach: you have to think carefully about the name of the function because it’s user facing.

A good example of this pattern is readr::show_progress(): it’s used in every read_ function in readr to determine whether or not a progress bar should be shown. Because it has a relatively complex explanation, it’s nice to be able to document it in its own file, rather than cluttering up file reading functions with incidental details.
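
A minimal sketch of what such an exported helper could look like in your own package. Everything here is hypothetical: the package name mypkg, the option mypkg.show_progress, the functions, and the rule itself are illustrative and are not readr's implementation:

```r
#' Should progress bars be shown?
#'
#' Hypothetical rule: show progress unless the user has switched it
#' off with `options(mypkg.show_progress = FALSE)`, and only in
#' interactive sessions.
#'
#' @export
mypkg_show_progress <- function() {
  isTRUE(getOption("mypkg.show_progress", default = TRUE)) && interactive()
}

# The default in the signature stays short, and users can call
# mypkg_show_progress() themselves to find out what they will get.
read_thing <- function(path, progress = mypkg_show_progress()) {
  if (progress) message("Reading ", path)
  invisible(path)
}
```

Because the helper has its own documentation page, the explanation of the rule is written once rather than repeated for every function that uses it.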

Alternatives

If the above techniques don’t work for your case there are two other alternatives that we don’t generally recommend but can be useful in limited situations.

Sometimes you’d like to use the NULL approach defined above, but NULL already has a specific meaning that you want to preserve. For example, this comes up in ggplot2 scale functions, which allow you to set the name of the scale that is displayed on the axis or legend. The default value should just preserve whatever existing label is present so that if you’re providing a scale to customise (e.g.) the breaks or labels, you don’t need to re-type the scale name. However, NULL is also a meaningful value because it means eliminate the scale label altogether [1]. For that reason the default value for name is ggplot2::waiver(), a ggplot2-specific convention that means “inherit from the existing value”.

If you look at ggplot2::waiver() you’ll see it’s just a very lightweight S3 class [2]:

ggplot2::waiver
#> function () 
#> structure(list(), class = "waiver")
#> <bytecode: 0x557b1355dba8>
#> <environment: namespace:ggplot2>

ggplot2 also provides the internal is.waive() [3] function, which lets us work with it in the same way we might work with NULL:

is.waive <- function(x) {
  inherits(x, "waiver")
}

The primary downside of this technique is that it requires substantial infrastructure to set up, so it’s only really worth it for very important functions or if you’re going to use it in multiple places.
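
To make the three-way distinction concrete, here is a minimal sketch of a sentinel in a function of your own. The names keep_existing(), is_keep_existing() and update_label() are hypothetical and are not part of ggplot2:

```r
# Hypothetical sentinel: an empty list with a package-specific class.
keep_existing <- function() structure(list(), class = "mypkg_keep_existing")
is_keep_existing <- function(x) inherits(x, "mypkg_keep_existing")

# `name` can now carry three distinct meanings:
#   sentinel -> keep the current label
#   NULL     -> remove the label
#   string   -> replace the label
update_label <- function(current, name = keep_existing()) {
  if (is_keep_existing(name)) {
    current
  } else {
    name
  }
}

update_label("Height (cm)")                   # keeps the current label
update_label("Height (cm)", name = NULL)      # removes it
update_label("Height (cm)", name = "Height")  # replaces it
```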

The final alternative is to condition on the absence of an argument using missing(). It works something like this:

reshape <- function(..., sep = ".", split) {
  if (missing(split)) {
    split <- split_default(sep)
  }
  ...
}

I mention this technique because we used it in purrr::reduce() for the .init argument. This argument is mostly optional:

library(purrr)
#> 
#> Attaching package: 'purrr'
#> The following object is masked _by_ '.GlobalEnv':
#> 
#>     %||%
reduce(letters[1:3], paste)
#> [1] "a b c"
reduce(letters[1:2], paste)
#> [1] "a b"
reduce(letters[1], paste)
#> [1] "a"

But it is required when .x (the first argument) is empty, and it’s good practice to supply it when wrapping reduce() inside another function because it ensures that you get the right type of output for all inputs:

reduce(letters[0], paste)
#> Error in `reduce()`:
#> ! Must supply `.init` when `.x` is empty.
reduce(letters[0], paste, .init = "")
#> [1] ""

Why use this approach? NULL is a potentially valid option for .init, so we can’t use that approach. And we only need it for a single function that’s not terribly important, so creating a sentinel didn’t seem worth it. .init is “semi” required, so this seemed to be the least worst solution to the problem.

The major drawback to this technique is that it makes it look like an argument is required (in direct conflict with Chapter 7).
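
The following sketch shows what missing() buys you: it can tell "not supplied" apart from an explicit NULL, which a NULL default cannot. The function describe_init() is hypothetical and exists only to show the three cases:

```r
# `init` deliberately has no default, so missing() can detect
# whether the caller supplied it at all.
describe_init <- function(x, init) {
  if (missing(init)) {
    "init not supplied"
  } else if (is.null(init)) {
    "init supplied as NULL"
  } else {
    "init supplied"
  }
}

describe_init(1:3)
describe_init(1:3, init = NULL)
describe_init(1:3, init = 0)
```

The cost is visible in the signature: function(x, init) gives no hint that init can be omitted.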

How do I remediate existing problems?

If you have a function with a long default, you can remediate it with any of the approaches above. It won’t be a breaking change unless you accidentally change the computation of the default, so make sure you have a test for that before you begin.
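
As a sketch of the kind of check that guards such a refactoring, the code below compares a version with the long default in the signature against a version with a NULL default. Both functions are hypothetical, stripped-down stand-ins that just return split, and base stopifnot() is used to keep the example self-contained; in a package you would more likely express this with your usual testing framework:

```r
# Before: the long default lives in the signature.
split_old <- function(sep = ".",
                      split = if (sep == "") {
                        list(regexp = "[A-Za-z][0-9]", include = TRUE)
                      } else {
                        list(regexp = sep, include = FALSE, fixed = TRUE)
                      }) {
  split
}

# After: NULL default, with the value computed in the body.
split_new <- function(sep = ".", split = NULL) {
  if (is.null(split)) {
    if (sep == "") {
      split <- list(regexp = "[A-Za-z][0-9]", include = TRUE)
    } else {
      split <- list(regexp = sep, include = FALSE, fixed = TRUE)
    }
  }
  split
}

# The computed default must agree for every branch of the rule ...
for (sep in c("", ".", "_")) {
  stopifnot(identical(split_old(sep), split_new(sep)))
}

# ... and a user-supplied value must pass through unchanged.
custom <- list(regexp = "x", include = TRUE)
stopifnot(identical(split_old(split = custom), split_new(split = custom)))
```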

See also

  • See Chapter 11 for a technique to simplify your function spec if it’s long because it has many less important optional arguments.

  1. Unlike name = "" which doesn’t show the label, but preserves the space where it would appear (sometimes useful for aligning multiple plots), name = NULL also eliminates the space normally allocated for the label.

  2. If I was to write this code today I’d use ggplot2_waiver as the class name.

  3. If I wrote this code today, I’d call it is_waiver().