count_data returns the number and percentage of observations for categorical variables.

Usage

count_data(data, ..., na.rm = FALSE)

Arguments

data

A data frame.

...

One or more unquoted (categorical) column names from the data frame, separated by commas.

na.rm

Logical. Should missing values (including NaN) be removed?

Details

The data frame can be grouped using dplyr's group_by(), in which case the number and percentage of observations are calculated within each group level rather than across the whole data frame.
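
To illustrate the difference between overall and within-group percentages, here is a minimal sketch that uses plain dplyr verbs rather than count_data itself. The toy data frame `df` and its columns `group` and `sex` are hypothetical, and the formula `n / sum(n) * 100` is an assumption that matches the numbers shown in the examples; it is not necessarily how count_data is implemented.

```r
library(dplyr)

# Hypothetical toy data (not part of the package)
df <- data.frame(
  group = c("a", "a", "a", "b"),
  sex   = c("female", "male", "female", "male")
)

# Ungrouped: each percentage is relative to all rows
overall <- df %>%
  count(group, sex) %>%
  mutate(pct = n / sum(n) * 100)
# pct: 50, 25, 25

# Grouped: count() keeps the grouping, so sum(n) is taken
# within each level of `group`
within <- df %>%
  group_by(group) %>%
  count(sex) %>%
  mutate(pct = n / sum(n) * 100)
# pct: 66.7, 33.3, 100
```

In the grouped version the percentages sum to 100 within each group, which is the pattern in the final group_by() example on this page.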

Examples

# Load dplyr for access to the %>% operator and group_by()
library(dplyr)

# 1 variable
count_data(quote_source, source)
#> # A tibble: 2 × 3
#>   source         n   pct
#>   <chr>      <int> <dbl>
#> 1 Bin Laden   3101  48.9
#> 2 Washington  3242  51.1

# 2 variables
count_data(quote_source, source, sex)
#> # A tibble: 6 × 4
#>   source     sex        n     pct
#>   <chr>      <chr>  <int>   <dbl>
#> 1 Bin Laden  female  2067 32.6   
#> 2 Bin Laden  male    1029 16.2   
#> 3 Bin Laden  NA         5  0.0788
#> 4 Washington female  2206 34.8   
#> 5 Washington male    1031 16.3   
#> 6 Washington NA         5  0.0788

# Ignore missing values
count_data(quote_source, source, sex, na.rm = TRUE)
#> # A tibble: 4 × 4
#>   source     sex        n   pct
#>   <chr>      <chr>  <int> <dbl>
#> 1 Bin Laden  female  2067  32.6
#> 2 Bin Laden  male    1029  16.2
#> 3 Washington female  2206  34.8
#> 4 Washington male    1031  16.3

# Use group_by() to get percentages within each group
quote_source %>%
  group_by(source) %>%
  count_data(sex)
#> # A tibble: 6 × 4
#> # Groups:   source [2]
#>   source     sex        n    pct
#>   <chr>      <chr>  <int>  <dbl>
#> 1 Bin Laden  female  2067 66.7  
#> 2 Bin Laden  male    1029 33.2  
#> 3 Bin Laden  NA         5  0.161
#> 4 Washington female  2206 68.0  
#> 5 Washington male    1031 31.8  
#> 6 Washington NA         5  0.154