count_data returns the number and percentage of observations for each level, or combination of levels, of one or more categorical variables.

count_data(data, ..., na.rm = FALSE)

Arguments

data

A data frame.

...

One or more unquoted (categorical) column names from the data frame, separated by commas.

na.rm

Logical. Should missing values (including NaN) be removed? Defaults to FALSE.

Details

The data frame can be grouped using dplyr's group_by() so that the counts and percentages are calculated within each group rather than across the whole data frame.
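To illustrate the difference between overall and within-group percentages, here is a minimal sketch using plain dplyr verbs. It is not count_data's actual implementation, only an approximation of the behaviour described above. The toy data frame and the object names overall and within are hypothetical.

```r
library(dplyr)

# Hypothetical toy data (not the quote_source data used in the examples)
toy <- tibble(
  source = c("A", "A", "A", "B", "B"),
  sex    = c("female", "male", "female", "female", NA)
)

# Ungrouped: each percentage is relative to all rows in the data frame
overall <- toy %>%
  count(source, sex) %>%
  mutate(pct = n / sum(n) * 100)

# Grouped: count() keeps the grouping by source, so sum(n) is computed
# within each source and the percentages add up to 100 per group
within <- toy %>%
  group_by(source) %>%
  count(sex) %>%
  mutate(pct = n / sum(n) * 100)

print(overall)
print(within)
```

In the ungrouped result the pct column sums to 100 across all rows; in the grouped result it sums to 100 within each level of source.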

Examples

# Load dplyr for access to the %>% operator and group_by()
library(dplyr)

# 1 variable
count_data(quote_source, source)
#> # A tibble: 2 x 3
#>   source         n   pct
#>   <chr>      <int> <dbl>
#> 1 Bin Laden   3101  48.9
#> 2 Washington  3242  51.1

# 2 variables
count_data(quote_source, source, sex)
#> # A tibble: 6 x 4
#>   source     sex        n     pct
#>   <chr>      <chr>  <int>   <dbl>
#> 1 Bin Laden  female  2067 32.6
#> 2 Bin Laden  male    1029 16.2
#> 3 Bin Laden  NA         5  0.0788
#> 4 Washington female  2206 34.8
#> 5 Washington male    1031 16.3
#> 6 Washington NA         5  0.0788

# Ignore missing values
count_data(quote_source, source, sex, na.rm = TRUE)
#> # A tibble: 4 x 4
#>   source     sex        n   pct
#>   <chr>      <chr>  <int> <dbl>
#> 1 Bin Laden  female  2067  32.6
#> 2 Bin Laden  male    1029  16.2
#> 3 Washington female  2206  34.8
#> 4 Washington male    1031  16.3

# Use group_by() to get percentages within each group
quote_source %>%
  group_by(source) %>%
  count_data(sex)
#> # A tibble: 6 x 4
#> # Groups:   source [2]
#>   source     sex        n    pct
#>   <chr>      <chr>  <int>  <dbl>
#> 1 Bin Laden  female  2067 66.7
#> 2 Bin Laden  male    1029 33.2
#> 3 Bin Laden  NA         5  0.161
#> 4 Washington female  2206 68.0
#> 5 Washington male    1031 31.8
#> 6 Washington NA         5  0.154