Required args shouldn’t have defaults
What’s the pattern?
Required arguments shouldn’t have defaults; optional arguments should have defaults. In other words, an argument should have a default if and only if it’s optional.
This simple convention ensures that you can tell which arguments are optional and which arguments are required from a glance at the function signature. Otherwise you need to rely on a careful reading of documentation. Additionally, if you don’t follow this convention and want to provide helpful error messages, you’ll need to implement them yourself rather than relying on R’s defaults.
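The difference in error handling can be sketched with two hypothetical helpers (scale_by() and scale_by_null() are made up for illustration and do not come from any package). The first follows the convention and gets R's built-in error for free; the second gives a required argument a NULL default and so has to write the check by hand.

```r
# Follows the convention: `x` is required, so it has no default;
# `factor` is optional, so it has one.
scale_by <- function(x, factor = 2) {
  x * factor
}

# Breaks the convention: `x` is required but has a default of NULL,
# so the check and the error message must be written by hand.
scale_by_null <- function(x = NULL, factor = 2) {
  if (is.null(x)) {
    stop("`x` must be supplied.", call. = FALSE)
  }
  x * factor
}

# Capture the error messages instead of stopping the script.
msg_builtin <- tryCatch(scale_by(), error = function(e) conditionMessage(e))
msg_manual <- tryCatch(scale_by_null(), error = function(e) conditionMessage(e))

print(msg_builtin) # R's own message about the missing argument
print(msg_manual)  # the hand-written message
```

Reading only the signatures, scale_by(x, factor = 2) tells you what you must supply, while scale_by_null(x = NULL, factor = 2) wrongly suggests that everything is optional.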
This pattern raises the question of when an argument should be required, and when you should provide a default. I think this usually seems “obvious” but I wanted to discuss a few functions that might get it wrong:
- rnorm() and runif() are interesting cases as they set default values for mean/sd and min/max. Giving these arguments defaults makes them feel less important, and it is inconsistent with the other RNGs, which generally require that you specify the parameters of the distribution. But both the normal and uniform distributions have very high-profile “standard” versions that make sense as defaults.
- You can use predict() directly on a model and it gives predictions for the data used to fit the model. In my opinion, predict() should always require a dataset because prediction is primarily about applying the model to new situations.
- stringr::str_sub() has default values for start and end. This allows you to do clever things like str_sub(x, end = 3) or str_sub(x, -3) to select the first or last three characters, but I now believe that leads to code that is harder to read, and it would have been better to make start and end required arguments.
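The three cases above can be tried out directly. This is a small sketch that uses the built-in mtcars dataset and an arbitrary model (mpg ~ wt) purely for illustration; the stringr part only runs if that package is installed.

```r
# rnorm() and runif(): the defaults are the "standard" distributions.
set.seed(42)
a <- rnorm(3)
set.seed(42)
b <- rnorm(3, mean = 0, sd = 1)
identical(a, b)

set.seed(42)
u1 <- runif(3)
set.seed(42)
u2 <- runif(3, min = 0, max = 1)
identical(u1, u2)

# predict(): without new data it returns predictions for the fitting data.
mod <- lm(mpg ~ wt, data = mtcars)
p_default <- predict(mod)
p_explicit <- predict(mod, newdata = mtcars)
all.equal(unname(p_default), unname(p_explicit))

# str_sub(): relying on the defaults versus spelling out start and end.
if (requireNamespace("stringr", quietly = TRUE)) {
  x <- "abcdefgh"
  first_default <- stringr::str_sub(x, end = 3)
  first_explicit <- stringr::str_sub(x, start = 1, end = 3)
  last_default <- stringr::str_sub(x, -3)
  last_explicit <- stringr::str_sub(x, start = -3, end = -1)
  print(c(first_default, first_explicit, last_default, last_explicit))
}
```

In each pair the two calls give the same result; the explicit form just makes the values visible at the call site.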
What are some examples?
This is a straightforward convention that the vast majority of functions follow. There are a few exceptions that exist in base R, mostly for historical reasons. Here are a couple of examples:
- In sample() neither x nor size has a default value:

    args(sample)
    #> function (x, size, replace = FALSE, prob = NULL)
    #> NULL

  This suggests that size is required, but it’s actually optional. When size is missing, sample() uses the length of x (or x itself, when x is a single number).
- lm() does not have defaults for formula, data, subset, weights, na.action, or offset.

    args(lm)
    #> function (formula, data, subset, weights, na.action, method = "qr",
    #>     model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
    #>     contrasts = NULL, offset, ...)
    #> NULL

  But only formula is actually required.
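Both base R exceptions are easy to confirm by calling the functions with only their truly required arguments. The data below is made up for illustration.

```r
# sample(): `size` has no default in the signature, yet it can be omitted.
set.seed(1)
s <- sample(1:4)
length(s) # same length as the input

# lm(): only `formula` is supplied; the variables are found in the
# calling environment because `data` is omitted.
x <- 1:5
y <- 2 * x + 1 + rnorm(length(x))
mod <- lm(y ~ x)
names(coef(mod))
```

Neither call produces an error, even though the signatures suggest that size, data, and the other default-less arguments must be supplied.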
In the tidyverse, one function that fails to follow this pattern is ggplot2::geom_abline(): slope and intercept don’t have defaults but are not required. If you don’t supply them they default to slope = 1 and intercept = 0, or are taken from aes() if they’re provided there. This is a mistake caused by trying to have geom_abline() do too much: it can be used either as an annotation (i.e. with a single slope and intercept) or to draw multiple lines from data (i.e. with one line for each row).
How do I use the pattern?
This pattern is generally easy to follow: if you don’t use missing(), it’s very hard to break it by mistake.
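To see why missing() is the usual culprit, compare two hypothetical functions (both names are invented for this sketch). They behave identically, but only the second one reveals in its signature that times is optional.

```r
# Anti-pattern: `times` has no default in the signature, yet it is
# optional because the body checks missing() and fills in a value.
repeat_hidden <- function(x, times) {
  if (missing(times)) {
    times <- 2
  }
  paste(rep(x, times), collapse = "")
}

# Following the convention: the default is visible in the signature.
repeat_visible <- function(x, times = 2) {
  paste(rep(x, times), collapse = "")
}

hidden_result <- repeat_hidden("ab")
visible_result <- repeat_visible("ab")
identical(hidden_result, visible_result)

# The signatures tell different stories about `times`.
args(repeat_hidden)
args(repeat_visible)
```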
How do I remediate past mistakes?
If an argument is required, remove its default value. If an argument is optional, either set it to the default value, or, if the computation is complicated, set it to NULL and then compute the value inside the body of the function.
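As a sketch of the NULL approach, here is a hypothetical shuffle() function (not part of base R or any package) that mimics the optional size of sample() but makes the optionality visible in the signature.

```r
# `x` is required (no default); `size` is optional and defaults to NULL
# because its real default depends on `x`.
shuffle <- function(x, size = NULL) {
  if (is.null(size)) {
    size <- length(x)
  }
  x[sample.int(length(x), size)]
}

set.seed(123)
all_items <- shuffle(letters[1:5])       # size computed from x
two_items <- shuffle(letters[1:5], size = 2)
length(all_items)
length(two_items)
```

The signature shuffle(x, size = NULL) now shows at a glance that only x must be supplied, and the documentation can explain how size is computed when it is left as NULL.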