Keep defaults short and sweet
What’s the pattern?
Default values should be short and sweet. Avoid large or complex calculations in the default values, instead using NULL or a helper function when the default requires complex calculation. This keeps the function specification focussed on the big picture (i.e. what are the arguments and are they required or not) rather than the details of the defaults.
What are some examples?
It’s common for functions to use NULL to mean that the argument is optional, but the computation of the default is non-trivial:
- The default label in cut() yields labels in the form [a, b).
- The default pattern in dir() means match all files.
- The default by in dplyr::left_join() means join using the common variables between the two data frames (the so-called natural join).
- The default mapping in ggplot2::geom_point() (and friends) means use the mapping from the overall plot.
In other cases, we encapsulate default values into a function:
- readr functions use a family of functions including readr::show_progress(), readr::should_show_types() and readr::should_read_lazy() that make it easier for users to override various defaults.
It’s also worth looking at a couple of counter examples that come from base R:
- The default value for by in seq is ((to - from)/(length.out - 1)).
- reshape() has a very long default argument: the split argument is one of two possible lists depending on the value of the sep argument.
- sample.int() uses a complicated rule to determine whether or not to use a faster hash based method that’s only applicable in some circumstances: useHash = (!replace && is.null(prob) && size <= n/2 && n > 1e+07).
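One way to look at these defaults yourself is to pull them out of the function signatures with formals(). This is only an exploratory sketch: default_of() is a hypothetical helper, and the exact expressions printed depend on your version of R, so they may differ slightly from the ones quoted above.

```r
# Hypothetical helper: return the unevaluated default expression
# for one argument of a function.
default_of <- function(fn, arg) {
  formals(fn)[[arg]]
}

default_of(seq.default, "by")
default_of(stats::reshape, "split")
default_of(sample.int, "useHash")
```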
How do I use it?
So what should you do if a default requires some complex calculation? We have two recommended approaches: using NULL or creating a helper function. I’ll also show you two other alternatives which we don’t generally recommend but you’ll see in a handful of places in the tidyverse, and can be useful in limited circumstances.
NULL default
The simplest, and most common, way to indicate that an argument is optional but has a complex default is to use NULL as the default. Then in the body of the function you perform the actual calculation only if the argument is NULL. For example, if we were to use this approach in sample.int(), it might look something like this:
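A minimal sketch of that idea follows. Here sample_int() is a hypothetical wrapper around base sample.int(), not how base R is actually written, and the rule inside it is the one quoted earlier in this chapter.

```r
# Hypothetical wrapper: useHash defaults to NULL, and the real default
# is computed in the body only when the caller did not supply a value.
sample_int <- function(n, size = n, replace = FALSE, prob = NULL, useHash = NULL) {
  if (is.null(useHash)) {
    useHash <- !replace && is.null(prob) && size <= n / 2 && n > 1e7
  }
  sample.int(n, size, replace = replace, prob = prob, useHash = useHash)
}

sample_int(10, 3)
```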
This pattern is made more elegant with the infix %||% operator, which is built in to R 4.4. If you need it in an older version of R you can import it from rlang or copy and paste it into your utils.R:
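The definition is short enough to copy. A minimal sketch, assuming you are on a version of R older than 4.4 and don’t want to depend on rlang:

```r
# Return `x` unless it is NULL, in which case fall back to `y`.
`%||%` <- function(x, y) {
  if (is.null(x)) y else x
}

NULL %||% "fallback"    # "fallback"
"value" %||% "fallback" # "value"
```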
%||% is particularly well suited to arguments where the default value is found through a cascading system of fallbacks. For example, this code from ggplot2::geom_bar() finds the width by first looking at the data, then in the parameters, finally falling back to computing it from the resolution of the x variable:
width <- data$width %||% params$width %||% (resolution(data$x, FALSE) * 0.9)
Don’t use %||% for more complex examples where the individual clauses can’t fit on their own line. For example, in reshape(), I wouldn’t write:
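A sketch of the form to avoid. reshape_split() is a hypothetical stand-in that only resolves split, and the two lists are meant to mirror the default in base reshape(); treat them as illustrative.

```r
# Fallback definition of %||% for R < 4.4.
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical stand-in for reshape() that only resolves `split`.
reshape_split <- function(sep = ".", split = NULL) {
  # Hard to read: the fallback clause sprawls over several lines.
  split <- split %||% if (sep == "") {
    list(regexp = "[A-Za-z][0-9]", include = TRUE)
  } else {
    list(regexp = sep, include = FALSE, fixed = TRUE)
  }
  split
}
```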
I would instead use is.null() and assign split inside each branch:
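A sketch of that version, again using a hypothetical reshape_split() that only resolves split, with lists assumed to match the base reshape() default:

```r
# Hypothetical stand-in for reshape() that only resolves `split`.
reshape_split <- function(sep = ".", split = NULL) {
  if (is.null(split)) {
    # Each branch assigns `split` explicitly.
    if (sep == "") {
      split <- list(regexp = "[A-Za-z][0-9]", include = TRUE)
    } else {
      split <- list(regexp = sep, include = FALSE, fixed = TRUE)
    }
  }
  split
}
```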
Or alternatively you might pull the code out into a helper function:
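A sketch of the helper version. The name split_default() is the one used later in this chapter; reshape_split() remains a hypothetical stand-in, and the list contents are assumed to match the base reshape() default.

```r
# Fallback definition of %||% for R < 4.4.
`%||%` <- function(x, y) if (is.null(x)) y else x

# The helper takes `sep` as its only input.
split_default <- function(sep = ".") {
  if (sep == "") {
    list(regexp = "[A-Za-z][0-9]", include = TRUE)
  } else {
    list(regexp = sep, include = FALSE, fixed = TRUE)
  }
}

# Hypothetical stand-in for reshape() that only resolves `split`.
reshape_split <- function(sep = ".", split = NULL) {
  split <- split %||% split_default(sep)
  split
}
```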
That makes it very clear exactly which other arguments the default for split depends on.
Exported helper function
If you have created a helper function for your own use, you might consider using it as the default:
reshape <- function(..., sep = ".", split = split_default(sep)) {
  ...
}
The problem with using an internal function as the default is that the user can’t easily run this function to see what it does, making the default a bit magical (Chapter 20). So we recommend that if you want to do this you export and document that function. This is the main downside of this approach: you have to think carefully about the name of the function because it’s user facing.
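Exporting and documenting the helper might look something like the sketch below. It assumes the package uses roxygen2; the wording of the documentation is invented for illustration, and the list contents are assumed to match the base reshape() default.

```r
#' Default value for the `split` argument
#'
#' Hypothetical documentation: describes how wide-format column names
#' are split into a variable name and a time value.
#'
#' @param sep Separator between the variable name and the time value.
#' @return A list describing how to split column names.
#' @export
split_default <- function(sep = ".") {
  if (sep == "") {
    list(regexp = "[A-Za-z][0-9]", include = TRUE)
  } else {
    list(regexp = sep, include = FALSE, fixed = TRUE)
  }
}
```

Because the helper is exported, a user can call split_default("_") at the console to see exactly what the default would be.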
A good example of this pattern is readr::show_progress(): it’s used in every read_ function in readr to determine whether or not a progress bar should be shown. Because it has a relatively complex explanation, it’s nice to be able to document it in its own file, rather than cluttering up file reading functions with incidental details.
Alternatives
If the above techniques don’t work for your case there are two other alternatives that we don’t generally recommend but can be useful in limited situations.
Sometimes you’d like to use the NULL approach defined above, but NULL already has a specific meaning that you want to preserve. For example, this comes up in ggplot2 scales functions which allow you to set the name of the scale which is displayed on the axis or legend. The default value should just preserve whatever existing label is present so that if you’re providing a scale to customise (e.g.) the breaks or labels, you don’t need to re-type the scale name. However, NULL is also a meaningful value because it means eliminate the scale label altogether [1]. For that reason the default value for name is ggplot2::waiver(), a ggplot2-specific convention that means “inherit from the existing value”.
If you look at ggplot2::waiver() you’ll see it’s just a very lightweight S3 class [2]:
ggplot2::waiver
#> function ()
#> structure(list(), class = "waiver")
#> <bytecode: 0x557b1355dba8>
#> <environment: namespace:ggplot2>
And then ggplot2 also provides the internal is.waive() [3] function, which allows us to work with it in the same way we might work with a NULL:
is.waive <- function(x) {
  inherits(x, "waiver")
}
The primary downside of this technique is that it requires substantial infrastructure to set up, so it’s only really worth it for very important functions or if you’re going to use it in multiple places.
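To make the sentinel idea concrete, here is a minimal sketch for a package of your own. Every name in it (my_waiver(), is_my_waiver(), set_name() and the class "mypkg_waiver") is hypothetical; it illustrates the technique rather than how ggplot2 implements it.

```r
# Hypothetical sentinel: an empty list with a package-specific class.
my_waiver <- function() {
  structure(list(), class = "mypkg_waiver")
}

is_my_waiver <- function(x) {
  inherits(x, "mypkg_waiver")
}

# Hypothetical setter: the sentinel means "keep the current name",
# while NULL keeps its own meaning of "remove the name".
set_name <- function(current, name = my_waiver()) {
  if (is_my_waiver(name)) current else name
}

set_name("height")            # "height"
set_name("height", NULL)      # NULL
set_name("height", "Height")  # "Height"
```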
The final alternative is to condition on the absence of an argument using missing(). It works something like this:
reshape <- function(..., sep = ".", split) {
  if (missing(split)) {
    split <- split_default(sep)
  }
  ...
}
I mention this technique because we used it in purrr::reduce() for the .init argument. This argument is mostly optional:
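A small illustration of the common case, assuming the purrr package is installed:

```r
# Assumes the purrr package is installed.
library(purrr)

x <- c(1, 2, 3)

# No .init needed when the input is non-empty.
reduce(x, `+`) # 6
```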
But it is required when .x (the first argument) is empty, and it’s good practice to supply it when wrapping reduce() inside another function because it ensures that you get the right type of output for all inputs:
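A sketch of both situations, assuming the purrr package is installed. The wrapper sum_all() is hypothetical.

```r
# Assumes the purrr package is installed.
library(purrr)

# With an empty input and no .init, reduce() has nothing to return,
# so it signals an error; we capture that here instead of stopping.
empty_result <- tryCatch(
  reduce(numeric(), `+`),
  error = function(e) "error"
)

# Supplying .init gives a well-defined answer of the right type.
reduce(numeric(), `+`, .init = 0) # 0

# Hypothetical wrapper: .init keeps the output numeric for every input.
sum_all <- function(x) reduce(x, `+`, .init = 0)
sum_all(numeric())  # 0
sum_all(c(1, 2, 3)) # 6
```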
Why use this approach? NULL is a potentially valid option for .init, so we can’t use that approach. And we only need it for a single function, that’s not terribly important, so creating a sentinel didn’t seem worth it. .init is “semi” required so this seemed to be the least worst solution to the problem.
The major drawback to this technique is that it makes it look like an argument is required (in direct conflict with Chapter 7).
How do I remediate existing problems?
If you have a function with a long default, you can remediate it with any of the approaches. It won’t be a breaking change unless you accidentally change the computation of the default, so make sure you have a test for that before you begin.
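One way to guard against accidentally changing the default is to compare the old and new versions before you swap them. The sketch below uses two hypothetical functions, split_old() and split_new(), that only resolve a split argument modelled on the reshape() example, and checks them with base stopifnot(); in a package you would more likely express the same check with your usual testing framework.

```r
# Before: the long default lives in the function signature.
split_old <- function(sep = ".",
                      split = if (sep == "") {
                        list(regexp = "[A-Za-z][0-9]", include = TRUE)
                      } else {
                        list(regexp = sep, include = FALSE, fixed = TRUE)
                      }) {
  split
}

# After: a NULL default, with the computation moved into the body.
split_new <- function(sep = ".", split = NULL) {
  if (is.null(split)) {
    if (sep == "") {
      split <- list(regexp = "[A-Za-z][0-9]", include = TRUE)
    } else {
      split <- list(regexp = sep, include = FALSE, fixed = TRUE)
    }
  }
  split
}

# The computed default must be identical for every input we care about.
for (sep in c("", ".", "_")) {
  stopifnot(identical(split_old(sep), split_new(sep)))
}
```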
See also
- See Chapter 11 for a technique to simplify your function spec if it’s long because it has many less important optional arguments.
[1] Unlike name = "", which doesn’t show the label but preserves the space where it would appear (sometimes useful for aligning multiple plots), name = NULL also eliminates the space normally allocated for the label.
[2] If I was to write this code today I’d use ggplot2_waiver as the class name.
[3] If I wrote this code today, I’d call it is_waiver().