# 24 Type-stability

The less you need to know about a function’s inputs to predict the type of its output, the better. Ideally, a function should either always return the same type of thing, or return something that can be trivially computed from its inputs.

If a function is type-stable, it satisfies two conditions:

• You can predict the output type based only on the input types (not their values).

• If the function uses `...`, the order of the arguments in it does not affect the output type.

``library(vctrs)``
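
The second condition can be checked empirically. The sketch below uses a hypothetical helper, `same_class_either_order()` (not part of base R or vctrs), to compare the class of `f(a, b)` with the class of `f(b, a)`. It only inspects classes for one pair of inputs, so it can show that a function is unstable but cannot prove that it is stable.

```r
# Hypothetical helper: does swapping two arguments change the class of f(a, b)?
same_class_either_order <- function(f, a, b) {
  identical(class(f(a, b)), class(f(b, a)))
}

# Integer and double combine to the same class in either order.
same_class_either_order(c, 1L, 2.5)

# Dates and date-times do not: c() dispatches on its first argument.
same_class_either_order(c, Sys.Date(), Sys.time())
class(c(Sys.Date(), Sys.time()))
class(c(Sys.time(), Sys.Date()))
```

With base `c()`, the class of the first argument determines the result when dates and date-times are mixed, so this use of `...` fails the second condition.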

## 24.1 Simple examples

• `purrr::map()` and `base::lapply()` are trivially type-stable because they always return lists.

• `paste()` is type-stable because it always returns a character vector.

``````vec_ptype(paste(1))
#> character(0)
vec_ptype(paste("x"))
#> character(0)``````
• `base::mean(x)` almost always returns the same type of output as `x`. For example, the mean of a numeric vector is a numeric vector, and the mean of a date-time is a date-time.

``````vec_ptype(mean(1))
#> numeric(0)
vec_ptype(mean(Sys.time()))
#> POSIXct of length 0``````
• `ifelse()` is not type-stable because the output type depends on the value:

``````vec_ptype(ifelse(NA, 1L, 2))
#> <unspecified> [0]
vec_ptype(ifelse(FALSE, 1L, 2))
#> numeric(0)
vec_ptype(ifelse(TRUE, 1L, 2))
#> integer(0)``````

## 24.2 More complicated examples

Some functions are more complex because they take multiple input types and have to return a single output type. This includes functions like `c()` and `ifelse()`. The rules governing base R functions are idiosyncratic, and each function tends to apply its own slightly different set of rules. Tidy functions should use the consistent set of rules provided by the vctrs package.
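
As a rough illustration of what using the vctrs rules might look like, here is a minimal sketch of a type-stable alternative to `ifelse()`. The name `if_else_stable()` is hypothetical. The sketch assumes that `test` is a logical vector and that `yes` and `no` have a common type according to vctrs. It is not how any existing tidyverse function is implemented.

```r
library(vctrs)

# Hypothetical helper: the output type is the common type of `yes` and `no`,
# so it never depends on the values in `test`.
if_else_stable <- function(test, yes, no) {
  stopifnot(is.logical(test))

  # Determine the output type from the input types alone.
  ptype <- vec_ptype_common(yes, no)

  # Recycle both branches to the size of `test`, then cast to the common type.
  args <- vec_recycle_common(yes = yes, no = no, .size = length(test))
  yes <- vec_cast(args$yes, ptype)
  no <- vec_cast(args$no, ptype)

  # Start from an all-missing vector of the right type and fill in each branch.
  out <- vec_init(ptype, length(test))
  pick_yes <- test & !is.na(test)
  pick_no <- !test & !is.na(test)
  vec_slice(out, pick_yes) <- vec_slice(yes, pick_yes)
  vec_slice(out, pick_no) <- vec_slice(no, pick_no)
  out
}

typeof(if_else_stable(NA, 1L, 2))
typeof(if_else_stable(FALSE, 1L, 2))
typeof(if_else_stable(TRUE, 1L, 2))
```

Because the common type of an integer and a double is double, all three calls return a double, whatever the value of `test`.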

## 24.3 Challenge: the median

A more challenging example is `median()`. The median of a vector is a value that (as evenly as possible) splits the vector into a lower half and an upper half. In the absence of ties, `mean(x > median(x)) == mean(x <= median(x)) == 0.5`. The median is straightforward to compute for odd lengths: you simply order the vector and pick the value in the middle, i.e. `sort(x)[(length(x) + 1) / 2]`. It’s clear that the type of the output should be the same type as `x`, and this algorithm can be applied to any vector that can be ordered.
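
The odd-length case can be sketched in a few lines. The helper name `median_odd()` is hypothetical, and the sketch assumes that `x` has no missing values (`sort()` drops them by default, which would shift the middle position). For a vector of odd length `n`, the middle element of the sorted vector sits at position `(n + 1) / 2`.

```r
# Hypothetical helper: median for odd-length vectors only.
# Assumes `x` can be sorted and contains no missing values.
median_odd <- function(x) {
  n <- length(x)
  stopifnot(n %% 2 == 1, !anyNA(x))
  sort(x)[(n + 1) / 2]
}

# The output has the same type as the input.
median_odd(c(3L, 1L, 2L))
median_odd(c("b", "c", "a"))
median_odd(as.Date("2020-01-01") + c(2, 0, 1))
```

Since the result is always an element of `x`, no arithmetic is needed and the input type is preserved for integers, strings, and dates alike.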

But what if the vector has an even length? In this case, there’s no longer a unique median, and by convention we usually take the mean of the middle two numbers.

In R, this means that `median()` is not type-stable:

``````typeof(median(1:3))
#> [1] "integer"
typeof(median(1:4))
#> [1] "double"``````

Base R doesn’t appear to follow a consistent principle when computing the median of a vector of length 2. Factors throw an error, but dates do not (even though there’s no date halfway between two days that differ by an odd number of days).

``````median(factor(1:2))
#> Error in median.default(factor(1:2)): need numeric data
median(Sys.Date() + 0:1)
#> [1] "2019-11-20"``````

To be clear, the problems that this causes in practice are quite small, but it makes the analysis of `median()` more complex, and it makes it hard to know what principle you should adhere to when creating `median()` methods for new vector classes.

``````median("foo")
#> [1] "foo"
median(c("foo", "bar"))
#> Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]): argument
#> is not numeric or logical: returning NA
#> [1] NA``````
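
One possible principle, shown here only as a sketch, is to always return an element of `x`: for even lengths, pick the lower of the two middle values instead of averaging them. The helper name `median_low()` is hypothetical, and this is not what base R or vctrs does. It trades the conventional numeric answer for type-stability.

```r
# Hypothetical helper: a "low median" that always returns an element of `x`.
# Assumes `x` is non-empty, sortable, and has no missing values.
median_low <- function(x) {
  stopifnot(length(x) > 0, !anyNA(x))
  # For odd lengths this is the middle position; for even lengths,
  # it is the lower of the two middle positions.
  sort(x)[(length(x) + 1) %/% 2]
}

# The output type no longer depends on the length of the input.
typeof(median_low(1:3))
typeof(median_low(1:4))

# It also works for any vector that can be ordered.
median_low(c("foo", "bar"))
median_low(factor(c("a", "b")))
```

The cost is that `median_low(1:4)` is 2, not the conventional 2.5, so whether this convention is acceptable depends on how the result will be used.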

## 24.4 Exercises

1. How is a date like an integer? Why is this inconsistent?

``````vec_ptype(mean(Sys.Date()))
#> Date of length 0
vec_ptype(mean(1L))
#> numeric(0)``````