describe_data() returns a set of common descriptive statistics (e.g., number of observations, mean, standard deviation) for one or more numeric variables.
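To make these quantities concrete, here is a minimal base-R sketch of the statistics that appear in the example output for a single numeric vector. It does not call describe_data() and is not its implementation. The helper name describe_vector() is hypothetical. Two details are assumptions: that SE is SD / sqrt(N), which is consistent with the first example's output (2.19 / sqrt(6325) ≈ 0.0275), and that the mode is the most frequent value, with ties going to the smallest value.

```r
# Hypothetical helper (not part of tidystats): descriptives for one vector
describe_vector <- function(x, na.rm = TRUE) {
  # is.na() is TRUE for both NA and NaN, so both count as missing
  missing <- sum(is.na(x))
  # N counts the non-missing observations
  n <- sum(!is.na(x))
  if (na.rm) x <- x[!is.na(x)]
  m <- mean(x)
  s <- sd(x)
  # Assumed mode definition: most frequent value. table() drops NA, and
  # which.max() returns the first (smallest) value on ties
  counts <- table(x)
  mode_x <- as.numeric(names(counts)[which.max(counts)])
  data.frame(
    missing = missing, N = n, M = m, SD = s,
    SE = s / sqrt(n),  # assumed: standard error of the mean
    min = min(x), max = max(x), range = max(x) - min(x),
    median = median(x), mode = mode_x
  )
}

describe_vector(c(2, 4, 4, 5, NA, 7, 9))
```

With na.rm = FALSE, the same call returns NA for M, SD, SE, min, max, range, and median, because a single missing value propagates through those base-R functions.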

Usage

describe_data(data, ..., na.rm = TRUE, short = FALSE)

Arguments

data

A data frame.

...

One or more unquoted column names from the data frame.

na.rm

A boolean indicating whether missing values (including NaN) should be excluded when calculating the descriptives. The default is TRUE.
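The base-R convention that na.rm follows can be seen with mean(). This short illustration does not call describe_data(). It shows that NaN is treated as missing and that, without removal, a missing value propagates into the result.

```r
x <- c(4, NA, 6, NaN)

sum(is.na(x))          # 2: NA and NaN both count as missing
is.na(mean(x))         # TRUE: a missing value propagates to the mean
mean(x, na.rm = TRUE)  # 5: only 4 and 6 are used
```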

short

A boolean indicating whether only a subset of descriptives should be reported. If set to TRUE, only the N, M, and SD are returned. The default is FALSE.

Details

The data can be grouped using dplyr::group_by() so that descriptives will be calculated for each group level.
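The idea behind grouped descriptives can be sketched in base R without dplyr. The code splits a column by group and computes N, M, and SD for each level, similar in spirit to the short = TRUE example. It illustrates the concept only and is not how describe_data() works internally. The data frame and its column names are hypothetical.

```r
# Hypothetical toy data standing in for a grouped data frame
df <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  score = c(1, 2, 3, 10, NA)
)

# Split the column by group, drop missing values, then summarise each level
by_group <- lapply(split(df$score, df$group), function(x) {
  x <- x[!is.na(x)]
  c(N = length(x), M = mean(x), SD = sd(x))
})

# One row per group level; SD is NA for group "b" because it has one value
result <- do.call(rbind, by_group)
result
```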

Skew and kurtosis are based on the datawizard::skewness() and datawizard::kurtosis() functions (Komsta & Novomestky, 2015).
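For intuition, here is a sketch of the classic moment-based estimators: skewness g1 = m3 / m2^(3/2) and excess kurtosis g2 = m4 / m2^2 - 3, where mk is the k-th central moment. This is a swapped-in textbook technique, not necessarily the estimator used here. datawizard::skewness() and datawizard::kurtosis() support several estimator types, and their defaults may apply a small-sample adjustment, so these values need not match describe_data() output. The function names are hypothetical.

```r
# Hypothetical helpers: moment-based (uncorrected) estimators
moment_skew <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)  # m3 / m2^(3/2)
}

moment_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3  # minus 3 so a normal distribution is near 0
}

y <- c(1, 2, 3, 4, 10)
moment_skew(y)      # positive: the long tail is on the right
moment_kurtosis(y)
```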

Examples

describe_data(quote_source, response)
#> # A tibble: 1 × 13
#>   var     missing     N     M    SD     SE   min   max range median  mode   skew
#>   <chr>     <int> <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl>
#> 1 respon…      18  6325  5.59  2.19 0.0275     1     9     8      5     5 -0.137
#> # ℹ 1 more variable: kurtosis <dbl>

describe_data(quote_source, response, na.rm = FALSE)
#> # A tibble: 1 × 13
#>   var      missing     N     M    SD    SE   min   max range median  mode  skew
#>   <chr>      <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
#> 1 response      18  6325    NA    NA    NA    NA    NA    NA     NA     5    NA
#> # ℹ 1 more variable: kurtosis <dbl>

quote_source |>
  dplyr::group_by(source) |>
  describe_data(response)
#> # A tibble: 2 × 14
#> # Groups:   source [2]
#>   var     source missing     N     M    SD     SE   min   max range median  mode
#>   <chr>   <chr>    <int> <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
#> 1 respon… Bin L…      18  3083  5.23  2.11 0.0380     1     9     8      5     5
#> 2 respon… Washi…       0  3242  5.93  2.21 0.0388     1     9     8      6     5
#> # ℹ 2 more variables: skew <dbl>, kurtosis <dbl>

quote_source |>
  dplyr::group_by(source) |>
  describe_data(response, short = TRUE)
#> # A tibble: 2 × 5
#> # Groups:   source [2]
#>   var      source         N     M    SD
#>   <chr>    <chr>      <int> <dbl> <dbl>
#> 1 response Bin Laden   3083  5.23  2.11
#> 2 response Washington  3242  5.93  2.21