Required args shouldn’t have defaults
What’s the pattern?
Required arguments shouldn’t have defaults; optional arguments should have defaults. In other words, an argument should have a default if and only if it’s optional.
This simple convention ensures that you can tell which arguments are optional and which arguments are required from a glance at the function signature. Otherwise you need to rely on a careful reading of documentation. Additionally, if you don’t follow this convention and want to provide helpful error messages, you’ll need to implement them yourself rather than relying on R’s defaults.
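As a minimal sketch of what this buys you, consider a hypothetical function, round_mean() (not from any package), where x is required and digits is optional. The signature alone tells you which is which, and R produces the error for a missing required argument without any checking code in the body.

```r
# Hypothetical example: `x` is required (no default), `digits` is optional.
round_mean <- function(x, digits = 2) {
  round(mean(x), digits)
}

round_mean(c(1.234, 2.345))
round_mean(c(1.234, 2.345), digits = 1)

# Omitting the required argument triggers R's built-in error; the message
# names the missing argument, with no extra code in the function body.
msg <- tryCatch(round_mean(), error = function(e) conditionMessage(e))
msg
```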
This pattern raises the question of when an argument should be required, and when you should provide a default. I think this usually seems “obvious” but I wanted to discuss a few functions that might get it wrong:
- rnorm() and runif() are interesting cases as they set default values for mean/sd and min/max. Giving them defaults makes them feel less important, and is inconsistent with the other RNGs, which generally require that you specify the parameters of the distribution. But both the normal and uniform distributions have very high-profile “standard” versions that make sense as defaults.
- You can use predict() directly on a model and it gives predictions for the data used to fit the model. In my opinion, predict() should always require a dataset because prediction is primarily about applying the model to new situations.
- stringr::str_sub() has default values for start and end. This allows you to do clever things like str_sub(x, end = 3) or str_sub(x, -3) to select the first or last three characters, but I now believe that leads to code that is harder to read, and it would have been better to make start and end required arguments.
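The behaviours described in these cases can be checked with a short sketch that uses only base R. The model on mtcars is an arbitrary choice for illustration. The final lines use base R's substr(), whose start and stop arguments have no defaults, as a point of comparison for what required positions look like at the call site.

```r
# rnorm() falls back to the standard normal when mean and sd are omitted.
set.seed(1)
a <- rnorm(3)
set.seed(1)
b <- rnorm(3, mean = 0, sd = 1)
identical(a, b)
#> [1] TRUE

# runif() falls back to the unit interval when min and max are omitted.
u <- runif(5)
all(u >= 0 & u <= 1)

# predict() without `newdata` returns predictions for the training data.
mod <- lm(mpg ~ wt, data = mtcars)
in_sample <- predict(mod)
explicit <- predict(mod, newdata = mtcars)
all.equal(in_sample, explicit)

# substr() requires both positions, so the call spells out what is selected.
x <- "abcdef"
first_three <- substr(x, 1, 3)
last_three <- substr(x, nchar(x) - 2, nchar(x))
```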
What are some examples?
This is a straightforward convention that the vast majority of functions follow. There are a few exceptions that exist in base R, mostly for historical reasons. Here are a couple of examples:
- In sample(), neither x nor size has a default value:

      args(sample)
      #> function (x, size, replace = FALSE, prob = NULL)
      #> NULL

  This suggests that size is required, but it’s actually optional.
- lm() does not have defaults for formula, data, subset, weights, na.action, or offset.

      args(lm)
      #> function (formula, data, subset, weights, na.action, method = "qr",
      #>     model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
      #>     contrasts = NULL, offset, ...)
      #> NULL

  But only formula is actually required.
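A quick way to confirm both claims is to call each function with the seemingly required arguments left out. This is only a sketch; the data are made up for illustration.

```r
# sample(): `size` has no default in the signature, yet it can be omitted.
# Here the result is a random permutation of the input.
set.seed(42)
perm <- sample(1:4)
length(perm)
#> [1] 4

# lm(): only `formula` is supplied; the variables are looked up in the
# calling environment because `data` is not given.
x <- 1:5
y <- 2 * x + 1 + rnorm(length(x))
fit <- lm(y ~ x)
names(coef(fit))
#> [1] "(Intercept)" "x"
```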
In the tidyverse, one function that fails to follow this pattern is ggplot2::geom_abline(): slope and intercept don’t have defaults but are not required. If you don’t supply them, they default to slope = 1 and intercept = 0, or are taken from aes() if they’re provided there. This is a mistake caused by trying to have geom_abline() do too much: it can be used either as an annotation (i.e. with a single slope and intercept) or to draw multiple lines from data (i.e. with one line for each row).
How do I use the pattern?
This pattern is generally easy to follow: if you don’t use missing(), it’s very hard to violate it by mistake.
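To see how missing() makes the mistake possible, here is a small sketch with two hypothetical functions, first_n_hidden() and first_n(). They behave identically, but only the second shows in its signature that n is optional.

```r
# Anti-pattern (hypothetical): `n` looks required in the signature,
# but missing() quietly makes it optional.
first_n_hidden <- function(x, n) {
  if (missing(n)) {
    n <- 3
  }
  head(x, n)
}

# Preferred: the default lives in the signature, so `n` is visibly optional.
first_n <- function(x, n = 3) {
  head(x, n)
}

identical(first_n_hidden(letters), first_n(letters))
#> [1] TRUE

# Only the second signature reveals the default.
args(first_n_hidden)
args(first_n)
```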
How do I remediate past mistakes?
If an argument is required, remove its default value. If an argument is optional, either set it to the default value or, if the computation is complicated, set it to NULL and then compute the value inside the body of the function.
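As a rough sketch of the NULL approach, the hypothetical function bin_counts() below keeps x required and makes bins optional. The square-root rule for the default is just an assumption chosen for illustration.

```r
# Hypothetical example: `x` is required, so it has no default.
# `bins` is optional; its default depends on `x`, so it is set to NULL
# in the signature and computed inside the body.
bin_counts <- function(x, bins = NULL) {
  if (is.null(bins)) {
    # Assumed rule of thumb for illustration: square root of the sample size.
    bins <- ceiling(sqrt(length(x)))
  }
  table(cut(x, breaks = bins))
}

counts_default <- bin_counts(1:100)           # bins computed from the data
counts_four <- bin_counts(1:100, bins = 4)    # bins supplied by the caller
```

The NULL default keeps the signature honest: a reader can see that bins is optional, even though its actual value is only known once x is available.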