count_data
Returns the number and percentage of observations for each level (or
combination of levels) of one or more categorical variables.
Arguments
- data
A data frame.
- ...
One or more unquoted (categorical) column names from the data frame, separated by commas.
- na.rm
Logical. Should missing values (including NaN) be removed?
Details
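The data frame can be grouped using dplyr's group_by() so that counts and
percentages are calculated within each level of the grouping variable. Each
percentage is then relative to its group's total rather than to the whole
data frame.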
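The effect of grouping on the percentages can be illustrated with plain dplyr verbs. This is only a rough sketch of the arithmetic, not necessarily how count_data is implemented; the `toy` data frame and the names `overall` and `within_source` are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical toy data (not the quote_source data set)
toy <- data.frame(
  source = c("A", "A", "A", "B"),
  sex    = c("female", "male", "female", "female")
)

# Ungrouped: each percentage is relative to all rows in the data frame
overall <- toy %>%
  count(source, sex) %>%
  mutate(pct = n / sum(n) * 100)

# Grouped: count() keeps the existing grouping by source,
# so sum(n) is the total within each source
within_source <- toy %>%
  group_by(source) %>%
  count(sex) %>%
  mutate(pct = n / sum(n) * 100)
```

In `overall` the pct column sums to 100 across the whole table, whereas in `within_source` it sums to 100 within each level of source.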
Examples
# Load dplyr for access to the %>% operator and group_by()
library(dplyr)
# 1 variable
count_data(quote_source, source)
#> # A tibble: 2 × 3
#> source n pct
#> <chr> <int> <dbl>
#> 1 Bin Laden 3101 48.9
#> 2 Washington 3242 51.1
# 2 variables
count_data(quote_source, source, sex)
#> # A tibble: 6 × 4
#> source sex n pct
#> <chr> <chr> <int> <dbl>
#> 1 Bin Laden female 2067 32.6
#> 2 Bin Laden male 1029 16.2
#> 3 Bin Laden NA 5 0.0788
#> 4 Washington female 2206 34.8
#> 5 Washington male 1031 16.3
#> 6 Washington NA 5 0.0788
# Ignore missing values
count_data(quote_source, source, sex, na.rm = TRUE)
#> # A tibble: 4 × 4
#> source sex n pct
#> <chr> <chr> <int> <dbl>
#> 1 Bin Laden female 2067 32.6
#> 2 Bin Laden male 1029 16.2
#> 3 Washington female 2206 34.8
#> 4 Washington male 1031 16.3
# Use group_by() to get percentages within each group
quote_source %>%
  group_by(source) %>%
  count_data(sex)
#> # A tibble: 6 × 4
#> # Groups: source [2]
#> source sex n pct
#> <chr> <chr> <int> <dbl>
#> 1 Bin Laden female 2067 66.7
#> 2 Bin Laden male 1029 33.2
#> 3 Bin Laden NA 5 0.161
#> 4 Washington female 2206 68.0
#> 5 Washington male 1031 31.8
#> 6 Washington NA 5 0.154