Reduce argument clutter with an options object

What’s the problem?

If you have a large number of optional arguments that control the fine details of the operation of a function, it might be worth lumping them all together into a separate “options” object created by a helper function.

Having a large number of less important arguments makes it harder to see the most important. By moving rarely used and less important arguments to a secondary function, you can more easily draw attention to what is most important.

What are some examples?

  • Many base R modelling functions like loess(), glm(), and nls() have a control argument that are paired with a function like loess.control(), glm.control(), and nls.control(). These allow you to modify rarely used defaults, including the number of iterations, the stopping criteria, and some debugging options.

    optim() uses a less formal version of this structure — while it has a control argument, it doesn’t have a matching optim.control() helper. Instead, you supply a named list with components described in ?optim. A helper function is more convenient than a named list because it checks the argument names for free and gives nicer autocomplete to the user.

  • This pattern is common in other modelling packages, e.g. tune::fit_resamples() + tune::control_resamples(), tune::control_bayes(), tune::control_grid(), and caret::train() + caret::trainControl()

  • readr::read_delim() and friends take a locale argument which is paired with the readr::locale() helper. This object bundles together a bunch of options related to parsing numbers, dates, and times that vary from country to country.

  • readr::locale() itself has a date_names argument that’s paired with readr::date_names() and readr::date_names_lang() helpers. You typically use the argument by supplying a two letter locale (which date_names_lang() uses to look up common languages), but if your language isn’t supported you can use readr::date_names() to individually supply full and abbreviated month and day of week names.

On the other hand, some functions with many arguments that would benefit from this technique include:

  • readr::read_delim() has a lot of options that control rarely needed details of file parsing (e.g. escape_backslash, escape_double, quoted_na, comment, trim_ws). These make the function specification very long and might well be better in a details object.

  • ggplot2::geom_smooth() fits a smooth line to your data. Most of the time you only want to pick the model and formula used, but geom_smooth() (via ggplot2::stat_smooth()) also provides n, fullrange, span, level, and method.args to control details of the fit. I think these would be better in their own details object.

How do I use this pattern?

The simplest implementation is just to write a helper function that returns a list:

my_fun_opts <- function(opt1 = 1, opt2 = 2) {
  list(
    opt1 = opt1,
    opt2 = opt2
  )
}

This alone is nice because you can document the individual arguments, you get name checking for free, and auto-complete will remind the user what these less important options include.

Better error messages

An optional extra is to add a unique class to the list:

my_fun_opts <- function(opt1 = 1, opt2 = 2) {
  structure(
    list(
      opt1 = opt1,
      opt2 = opt2
    ),
    class = "mypackage_my_fun_opts"
  )
}

This then allows you to create more informative error messages:

my_fun_opts <- function(..., opts = my_fun_opts()) {
  if (!inherits(opts, "mypackage_my_fun_opts")) {
    cli::cli_abort("{.arg opts} must be created by {.fun my_fun_opts}.")
  }
}

my_fun_opts(opts = 1)
#> Error in `my_fun_opts()`:
#> ! `opts` must be created by `my_fun_opts()`.

If you use this option in many places, you should consider pulling out the repeated code into a check_my_fun_opts() function.

How do I remediate past mistakes?

Typically you notice this problem only after you have created too many options so you’ll need to carefully remediate by introducing a new options argument and paired helper function. For example, if your existing function looks like this:

my_fun <- function(x, y, opt1 = 1, opt2 = 2) {
  
}

If you want to keep the existing function specification you could add a new opts argument that uses the values of opt1 and opt2:

my_fun <- function(x, y, opts = NULL, opt1 = 1, opt2 = 2) {
  
  opts <- opts %||% my_fun_opts(opt1 = opt1, opt2 = opt2)
}

However, that introduces a dependency between the arguments: if you specify both opts and opt1/opt2, opts will win. You could certainly add extra code to pick up on this problem and warn the user, but I think it’s just cleaner to deprecate the old arguments so that you can eventually remove them:

my_fun <- function(x, y, opts = my_fun_opts(), opt1 = deprecated(), opt2 = deprecated()) {
  
  if (lifecycle::is_present(opt1)) {
    lifecycle::deprecate_warn("1.0.0", "my_fun(opt1)", "my_fun_opts(opt1)")
    opts$opt1 <- opt1
  }
  if (lifecycle::is_present(opt2)) {
    lifecycle::deprecate_warn("1.0.0", "my_fun(opt2)", "my_fun_opts(opt2)")
    opts$opt2 <- opt2
  }
}

Then you can remove the old arguments in a future release.

See also

  • Chapter 14 is a similar pattern when you have multiple options function that each encapsulate a different strategy.