The normal tidystats workflow consists of running statistics functions such as lm(), saving the output into variables, and then using the add_stats() function to add the statistics to a list. This works as long as tidystats has built-in support for the statistics functions you ran. So what should you do when this is not the case?
The first thing to do is let me know that there's a function you would like tidystats to support. There are various ways to contact me. The preferred method is to go to the GitHub page and create an issue, because that makes it easy to paste code snippets and to ask follow-up questions. For alternative ways to contact me, see the tidystats website.
Of course, it's not always possible to wait for me to add support for the function to tidystats. Nor is it always possible for me to add support for it. This can happen when the statistic you want to report cannot be identified as belonging to a particular type of analysis. For example, confint() returns a plain matrix, and nothing in that matrix indicates that it contains confidence intervals.
In these situations, there are two possible solutions.
The first solution applies when I have added support for a function whose output does not carry enough information to be identified on its own. You can tell add_stats() what kind of object it is using the class argument. This only works if I explicitly coded support for that specific class. You can see which classes are supported in the help document of add_stats() (?add_stats).
The second solution is to extract the statistics manually and create an object that you pass to add_stats(). I've created two helper functions for this: custom_stats() and custom_stat(). These functions work together to store the statistics in the format tidystats requires.
custom_stats() has two arguments: method and statistics. The method argument should contain a description of the type of method you used. The statistics argument requires a vector of statistics created with the custom_stat() function.
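Here is a minimal sketch of the two arguments. It wraps a hand-computed Cohen's d, comparing fuel economy between manual and automatic cars in mtcars, in custom_stats(). The effect size, object names, and method description are illustrative choices, not part of tidystats. Only name and value are passed to custom_stat(), which is the minimum it requires.

```r
library(tidystats)

# Split fuel economy by transmission type (am: 0 = automatic, 1 = manual)
auto <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Cohen's d computed by hand, using a pooled standard deviation
n1 <- length(auto)
n2 <- length(manual)
pooled_sd <- sqrt(((n1 - 1) * var(auto) + (n2 - 1) * var(manual)) / (n1 + n2 - 2))
cohens_d <- (mean(manual) - mean(auto)) / pooled_sd

# method describes the analysis; statistics holds the value from custom_stat()
d_stats <- custom_stats(
  method = "Cohen's d (pooled SD)",
  statistics = custom_stat(name = "d", value = cohens_d)
)

# The resulting object can then be added like the output of a supported function
statistics <- list()
statistics <- add_stats(statistics, d_stats)
```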
The custom_stat() function creates a statistic, along with the information needed to report it. At a minimum, it requires the name and value of the statistic. Optionally, you can also specify a symbol and a subscript so that the text editor add-ins can correctly report the statistic. Finally, a statistic can be a ranged statistic, i.e., one with a lower and an upper bound. Such statistics require that you specify the type of interval (e.g., "CI", "HDI"), the level (e.g., .95), and the lower and upper bounds.
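A ranged statistic can be built the same way. The sketch below computes a mean and its 95% t-based confidence interval by hand and passes the bounds to custom_stat(). It assumes the argument names interval, level, lower, and upper match the descriptions above; check ?custom_stat to confirm. The symbol "M" and the method text are illustrative choices.

```r
library(tidystats)

# Mean fuel economy with a 95% t-based confidence interval, computed by hand
x <- mtcars$mpg
m <- mean(x)
se <- sd(x) / sqrt(length(x))
crit <- qt(0.975, df = length(x) - 1)
lower <- m - crit * se
upper <- m + crit * se

# A ranged statistic: interval type, level, and both bounds are supplied
# (argument names assumed from the description of custom_stat())
mpg_mean <- custom_stats(
  method = "Mean with 95% confidence interval",
  statistics = custom_stat(
    name = "mean",
    value = m,
    symbol = "M",
    interval = "CI",
    level = .95,
    lower = lower,
    upper = upper
  )
)
```

These bounds are the same ones t.test(mtcars$mpg) reports, so this is only a demonstration. For a real one-sample t test, you would add the t.test() output directly.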
Below I show a few examples of adding custom statistics.
Example 1: Using the class argument
Say we want to calculate the confidence intervals for several parameters in a linear model, using the confint() function.
# Load tidystats, plus magrittr for the %>% pipe
library(tidystats)
library(magrittr)
# Run a linear model
fit <- lm(100 / mpg ~ disp, data = mtcars)
# Compute the confidence intervals
fit_confint <- confint(fit)
# Create an empty list
statistics <- list()
# Add linear model and confidence intervals to the list
statistics <- statistics %>%
  add_stats(fit) %>%
  add_stats(fit_confint)
Unfortunately, we get an error:
Error in UseMethod("tidy_stats") : no applicable method for 'tidy_stats' applied to an object of class "c('matrix', 'array', 'double', 'numeric')"
That's because confint() returns a standard matrix rather than an object specific to the confint() function. You can check this yourself by running the class() function on the output of confint() (e.g., class(fit_confint)). You'll see that it simply says "matrix" "array". That's not enough for tidystats to work with. Ideally it would say something like "confint", so that tidystats knows what it is working with and can extract the statistics.
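You can see the difference by comparing the classes of the two outputs. This snippet uses only base R. The printed classes in the comments assume R 4.0 or later, where matrices also carry the "array" class.

```r
# Fit the model from Example 1 and compute its confidence intervals
fit <- lm(100 / mpg ~ disp, data = mtcars)
fit_confint <- confint(fit)

# The model output has a descriptive class that tidystats can recognize
class(fit)
#> [1] "lm"

# The confint() output is a plain numeric matrix: one row per parameter,
# one column per bound, and nothing marking it as confidence intervals
class(fit_confint)
#> [1] "matrix" "array"
dim(fit_confint)
#> [1] 2 2
```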
Thankfully, we can still add the statistics from confint() to a list via add_stats(), using the class argument. We specify that the statistics come from the confint() function by setting class = "confint".
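The call might look like the sketch below. It assumes your installed version of tidystats lists "confint" among the supported classes in ?add_stats. Plain function calls are used instead of a pipe so the snippet runs on its own.

```r
library(tidystats)

# Same model and confidence intervals as in Example 1
fit <- lm(100 / mpg ~ disp, data = mtcars)
fit_confint <- confint(fit)

statistics <- list()
statistics <- add_stats(statistics, fit)

# class tells add_stats() how to interpret the otherwise anonymous matrix
statistics <- add_stats(statistics, fit_confint, class = "confint")
```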
With class = "confint" specified, add_stats() no longer throws an error, indicating that it worked.
Example 2: Using custom_stats()
Say you want to calculate a Bayes Factor using the BIC approach (Wagenmakers, 2007). An example of this approach can be found here, which I'll repeat below.
# Set seed for reproducibility
set.seed(14)
# Simulate some data
intercept_data <- data.frame(score = scale(rnorm(40), center = 0.72))
# Run two models and calculate the BIC
full_lm <- lm(score ~ 1, intercept_data)
null_lm <- lm(score ~ 0, intercept_data)
BF_BIC <- exp((BIC(null_lm) - BIC(full_lm)) / 2)
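The last line applies the BIC approximation to the Bayes Factor described by Wagenmakers (2007). Let H1 be the model with an intercept (full_lm) and H0 the model without one (null_lm). The code then computes:

```latex
\mathrm{BF}_{10} \approx \exp\!\left(\frac{\mathrm{BIC}(H_0) - \mathrm{BIC}(H_1)}{2}\right)
```

Because BIC(null_lm) comes first in the subtraction, values above 1 favor the model with an intercept. This matches the subscript "10" used for the symbol further down.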
The Bayes Factor is 5.05. Now, how do you add this value to a tidystats list? If we try it the standard way, we’ll see that it fails.
# Load the tidystats package
library(tidystats)
# Create an empty list
statistics <- list()
# Try to add the Bayes Factor to the list using add_stats()
statistics <- add_stats(statistics, BF_BIC)
This produces an error message that says:
Error in UseMethod("tidy_stats") : no applicable method for 'tidy_stats' applied to an object of class "c('double', 'numeric')"
It's because BF_BIC is simply a number and not the output of a statistics function, so tidystats doesn't know how to store this number. Let's fix this using custom_stats() and custom_stat().
# Create a list of custom statistics
BIC <- custom_stats(
  method = "BIC",
  statistics = custom_stat(
    name = "BIC Bayes Factor",
    value = BF_BIC,
    symbol = "BF",
    subscript = "10"
  )
)
# Add the statistics to the list
statistics <- add_stats(statistics, BIC)
Now we don't get an error. Thanks to custom_stats() and custom_stat(), the statistic is correctly structured, so it can be added to the list via add_stats().
Summary
tidystats works by taking the output of statistical tests, extracting the statistics, and reorganizing them into a particular structure. This works only if (1) tidystats has built-in support for the function and (2) the function used to run the statistical test returns an object that tidystats can use to identify the test.
If you want to use tidystats on a function that is not supported yet, please contact me so that I can add support for it.
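If tidystats supports the function but cannot identify its output, you can specify the class argument of add_stats(). Alternatively, you can manually create a list of statistics and supply it to the add_stats() function using custom_stats() and custom_stat().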
The goal is to have tidystats support as many tests as possible, so that you rarely have to resort to this solution.