Prefer a enum, even if only two choices

What’s the pattern?

If your function implements two strategies, it’s tempting to distinguish between them using an argument that takes either TRUE or FALSE. However, I recommend that you use an enumeration unless:

  • You’re really sure there won’t ever be another strategy. If you do discover a third (or fourth, or fifth, or …) strategy, you’ll need to change the interface of your function.
  • It’s very clear what both TRUE and FALSE options mean just from the name of the argument. Generally the TRUE value tends to be easier to understand because something = TRUE tells you what will happen, but something = FALSE only tells you what won’t happen.

What are some examples?

There are quite a few examples of the problem in tidyverse, because this is a pattern that we only discovered relatively recently:

  • By default, stringr::str_subset(string, pattern) returns the elements of string that match the pattern. You can use negate = TRUE to instead return the elements that don’t match the pattern, but I now wonder if would be more clear as return = c("matches", "non-matches").
  • httr2::multi_req_perform() allows you to perform a bunch of HTTP requests in parallel. It has an argument called cancel_on_error that can take TRUE or FALSE. It’s fairly clear what cancel_on_error = TRUE means; but it’s not so obvious what cancel_on_error = FALSE does. Additionally, it seems likely that I’ll come up with other possible error handling strategies in the future, and even though I don’t know what they are now, it would be better to plan for the future with an argument specification like error = c("cancel", "continue").

cut() has an argument called right which is used to pick between right-closed left-open intervals (TRUE) and right-open left-closed arguments. I think it’s hard to remember which is which and a clearer specification might be open_side = c("right", "left") or maybe bounds = c("[)", "(]"). Another interesting case in base R is sort() which has two arguments that take a single logical value: decreasing and na.last:

  • The decreasing argument is used to pick between sorting in ascending or descending order. It’s easy to understand what decreasing = TRUE does, but slightly less clear what decreasing = FALSE, the default, means because it feels like a double negative:

    x <- sample(10)
    sort(x, decreasing = TRUE)
    sort(x, decreasing = FALSE)

    Compare this with vctrs::vec_sort(), which uses an enum:

    vctrs::vec_sort(x, direction = "desc")
    vctrs::vec_sort(x, direction = "asc")

    I think this is a mild improvement because the two options are spelled out explicitly.

  • The na.last argument is used to control the location of missing values in the result. It takes three possible values: TRUE (put NAs at the end), FALSE (put NAs at the beginning), or NA (drop NAs from the result). This is an interesting way to support three strategies, but as we’ll see later I think this would be more clear the argument specification was na = c("drop", "first", "last").

How do you remediate past mistakes?

There are two possible ways to switch to using a strategy instead of TRUE/FALSE depending on whether the old argument name makes sense with the new argument values. The sections below show what you’ll need to do if you need a new argument (most cases) or if you’re lucky enough to be able to reuse the existing argument.

Create a new argument

Imagine we wanted to remediate the na.last argument to sort(). Currently:

  • na.last = TRUE means put NAs last.
  • na.last = FALSE means put NAs first.
  • na.list = NA means to drop them.

I think we could make this function more clear by changing the argument name to na and accepting one of three values: last, first, or drop.

Changing an argument name is equivalent removing the old name and adding the new name. This way of thinking about the change makes it easier to see how you do it in a backward compatible way: you need to deprecate the old argument in favour of the new one.

sort <- function(x,
                 na.last = lifecycle::deprecated(),
                 na = c("drop", "first", "last")) {
  if (lifecycle::is_present(na.last)) {
    lifecycle::deprecate_warn("1.0.0", "sort(na.last)", "sort(na)")

    if (!is.logical(na.last) || length(na.last) != 1) {
      cli::cli_abort("{.arg na.last} must be a single TRUE, FALSE, or NA.")
    }
    
    if (isTRUE(na.last)) {
      na <- "last"
    } else if (isFALSE(na.last)) {
      na <- "first"
    } else {
      na <- "drop"
    }
  } else {
    na <- arg_match(na)
  }
  
  ...
}

Note that because na is a prefix of na.last and sort() puts na.last before …,not after it (see Chapter 8), this introduces a very subtle behaviour change. Previously, sort(x, n = TRUE) would have worked and been equivalent to sort(x, na.last = TRUE). But it will now fail because n is a prefix of two arguments (na and na.last). This is unlikely to affect much code, but is worth being aware of.

It would also be nice to make the default value "last" to match order(), especially since it’s very unusual for a function to silently remove missing values. However, that’s likely to affect a lot of existing code, making it unlikely to be worthwhile.

Re-use an existing name

Originally haven::write_sav(compress) could either be TRUE (compress the file) or FALSE (don’t compress it). But then SPSS version 21.0 introduced a new way of compressing files leading to three possible options: compress with the new way (zsav), compress with the old way (bytes), or don’t compress. In this case we got lucky because we can continue to use the same argument name: compress = c("byte", "zsav", "none"). We allowed existing code by special casing the behaviour of TRUE and FALSE:

write_sav <- function(data, path, compress = c("byte", "zsav", "none"), adjust_tz = TRUE) {
  if (isTRUE(compress)) {
    compress <- "zsav"
  } else if (isFALSE(compress)) {
    compress <- "none"
  } else {
    compress <- arg_match(compress)
  }

  ...
}

You could choose to deprecate TRUE and FALSE, but here we chose to the keep them since it’s only a small amount of extra code in haven, and it means that existing users don’t need to think about it. See ?haven::read_sav for how we communicated the change in the docs.

In a future version of haven we might change the order of the enum so that the zsav compression method becomes the default. This generally yields smaller files but can’t be read by older versions of SPSS. Now that v21 is over 5 years old1, it’s reasonable to make the smaller format the default.


  1. Five years is the general threshold for support across the tidyverse.↩︎