Prefer a enum, even if only two choices
What’s the pattern?
If your function implements two strategies, it’s tempting to distinguish between them using an argument that takes either TRUE
or FALSE
. However, I recommend that you use an enumeration unless:
- You’re really sure there won’t ever be another strategy. If you do discover a third (or fourth, or fifth, or …) strategy, you’ll need to change the interface of your function.
- It’s very clear what both
TRUE
andFALSE
options mean just from the name of the argument. Generally theTRUE
value tends to be easier to understand becausesomething = TRUE
tells you what will happen, butsomething = FALSE
only tells you what won’t happen.
What are some examples?
There are quite a few examples of the problem in tidyverse, because this is a pattern that we only discovered relatively recently:
- By default,
stringr::str_subset(string, pattern)
returns the elements ofstring
that match thepattern
. You can usenegate = TRUE
to instead return the elements that don’t match the pattern, but I now wonder if would be more clear asreturn = c("matches", "non-matches")
. -
httr2::multi_req_perform()
allows you to perform a bunch of HTTP requests in parallel. It has an argument calledcancel_on_error
that can takeTRUE
orFALSE
. It’s fairly clear whatcancel_on_error = TRUE
means; but it’s not so obvious whatcancel_on_error = FALSE
does. Additionally, it seems likely that I’ll come up with other possible error handling strategies in the future, and even though I don’t know what they are now, it would be better to plan for the future with an argument specification likeerror = c("cancel", "continue")
.
cut()
has an argument called right
which is used to pick between right-closed left-open intervals (TRUE)
and right-open left-closed arguments. I think it’s hard to remember which is which and a clearer specification might be open_side = c("right", "left")
or maybe bounds = c("[)", "(]")
. Another interesting case in base R is sort()
which has two arguments that take a single logical value: decreasing
and na.last
:
-
The
decreasing
argument is used to pick between sorting in ascending or descending order. It’s easy to understand whatdecreasing = TRUE
does, but slightly less clear whatdecreasing = FALSE
, the default, means because it feels like a double negative:Compare this with
vctrs::vec_sort()
, which uses an enum:I think this is a mild improvement because the two options are spelled out explicitly.
The
na.last
argument is used to control the location of missing values in the result. It takes three possible values:TRUE
(putNA
s at the end),FALSE
(putNA
s at the beginning), orNA
(dropNA
s from the result). This is an interesting way to support three strategies, but as we’ll see later I think this would be more clear the argument specification wasna = c("drop", "first", "last")
.
How do you remediate past mistakes?
There are two possible ways to switch to using a strategy instead of TRUE
/FALSE
depending on whether the old argument name makes sense with the new argument values. The sections below show what you’ll need to do if you need a new argument (most cases) or if you’re lucky enough to be able to reuse the existing argument.
Create a new argument
Imagine we wanted to remediate the na.last
argument to sort()
. Currently:
-
na.last = TRUE
means putNA
s last. -
na.last = FALSE
means putNA
s first. -
na.list = NA
means to drop them.
I think we could make this function more clear by changing the argument name to na
and accepting one of three values: last
, first
, or drop
.
Changing an argument name is equivalent removing the old name and adding the new name. This way of thinking about the change makes it easier to see how you do it in a backward compatible way: you need to deprecate the old argument in favour of the new one.
sort <- function(x,
na.last = lifecycle::deprecated(),
na = c("drop", "first", "last")) {
if (lifecycle::is_present(na.last)) {
lifecycle::deprecate_warn("1.0.0", "sort(na.last)", "sort(na)")
if (!is.logical(na.last) || length(na.last) != 1) {
cli::cli_abort("{.arg na.last} must be a single TRUE, FALSE, or NA.")
}
if (isTRUE(na.last)) {
na <- "last"
} else if (isFALSE(na.last)) {
na <- "first"
} else {
na <- "drop"
}
} else {
na <- arg_match(na)
}
...
}
Note that because na
is a prefix of na.last
and sort()
puts na.last
before …,
not after it (see Chapter 8), this introduces a very subtle behaviour change. Previously, sort(x, n = TRUE)
would have worked and been equivalent to sort(x, na.last = TRUE)
. But it will now fail because n
is a prefix of two arguments (na
and na.last)
. This is unlikely to affect much code, but is worth being aware of.
It would also be nice to make the default value "last"
to match order()
, especially since it’s very unusual for a function to silently remove missing values. However, that’s likely to affect a lot of existing code, making it unlikely to be worthwhile.
Re-use an existing name
Originally haven::write_sav(compress)
could either be TRUE
(compress the file) or FALSE
(don’t compress it). But then SPSS version 21.0 introduced a new way of compressing files leading to three possible options: compress with the new way (zsav), compress with the old way (bytes), or don’t compress. In this case we got lucky because we can continue to use the same argument name: compress = c("byte", "zsav", "none")
. We allowed existing code by special casing the behaviour of TRUE
and FALSE
:
You could choose to deprecate TRUE
and FALSE
, but here we chose to the keep them since it’s only a small amount of extra code in haven, and it means that existing users don’t need to think about it. See ?haven::read_sav
for how we communicated the change in the docs.
In a future version of haven we might change the order of the enum so that the zsav
compression method becomes the default. This generally yields smaller files but can’t be read by older versions of SPSS. Now that v21 is over 5 years old1, it’s reasonable to make the smaller format the default.
Five years is the general threshold for support across the tidyverse.↩︎