count_data() returns the number and proportion of observations for categorical variables.
Arguments
- data
A data frame.
- ...
One or more unquoted (categorical) column names from the data frame, separated by commas.
- na.rm
A boolean specifying whether missing values (including NaN) should be removed.
- pct
A boolean indicating whether to calculate percentages instead of proportions. The default is FALSE.
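As a rough illustration of what na.rm and pct ask for, the following sketch reproduces the idea with plain dplyr. It is not the implementation of count_data(): the toy data frame df is hypothetical, and the sketch assumes that missing values are dropped before the denominator is computed and that a percentage is the proportion multiplied by 100, which the example output is consistent with but this page does not state.

```r
library(dplyr)

# Hypothetical toy data with one missing value (not the quote_source data)
df <- tibble(sex = c("female", "female", "male", NA))

# Assumption: na.rm = TRUE corresponds to dropping missing values first,
# and pct = TRUE corresponds to reporting proportion * 100
result <- df |>
  filter(!is.na(sex)) |>
  count(sex) |>
  mutate(pct = n / sum(n) * 100)

result
```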
Details
The data frame can be grouped using dplyr::group_by() so that the counts and proportions are calculated within each level of the grouping variable rather than across the whole data frame.
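The difference between ungrouped and grouped proportions can be sketched with plain dplyr verbs. This is only an approximation of the behaviour described above, not the code behind count_data(); the data frame df and the object names overall and within are hypothetical.

```r
library(dplyr)

# Hypothetical toy data (not the quote_source data used in the examples)
df <- tibble(
  source = c("A", "A", "A", "B", "B"),
  sex    = c("female", "male", "female", "male", NA)
)

# Ungrouped: each proportion is relative to all rows
overall <- df |>
  count(source, sex) |>
  mutate(prop = n / sum(n))

# Grouped: count() keeps the existing grouping, so sum(n) is taken
# within each level of `source`
within <- df |>
  group_by(source) |>
  count(sex) |>
  mutate(prop = n / sum(n))

overall
within
```

In the ungrouped result the proportions sum to 1 over the whole table; in the grouped result they sum to 1 within each level of source.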
Examples
count_data(quote_source, source)
#> # A tibble: 2 × 3
#>   source         n  prop
#>   <chr>      <int> <dbl>
#> 1 Bin Laden   3101 0.489
#> 2 Washington  3242 0.511
count_data(quote_source, source, sex)
#> # A tibble: 6 × 4
#>   source     sex        n     prop
#>   <chr>      <chr>  <int>    <dbl>
#> 1 Bin Laden  female  2067 0.326
#> 2 Bin Laden  male    1029 0.162
#> 3 Bin Laden  NA         5 0.000788
#> 4 Washington female  2206 0.348
#> 5 Washington male    1031 0.163
#> 6 Washington NA         5 0.000788
count_data(quote_source, source, sex, na.rm = TRUE)
#> # A tibble: 4 × 4
#>   source     sex        n  prop
#>   <chr>      <chr>  <int> <dbl>
#> 1 Bin Laden  female  2067 0.326
#> 2 Bin Laden  male    1029 0.162
#> 3 Washington female  2206 0.348
#> 4 Washington male    1031 0.163
count_data(quote_source, source, sex, na.rm = TRUE, pct = TRUE)
#> # A tibble: 4 × 4
#>   source     sex        n   pct
#>   <chr>      <chr>  <int> <dbl>
#> 1 Bin Laden  female  2067  32.6
#> 2 Bin Laden  male    1029  16.2
#> 3 Washington female  2206  34.8
#> 4 Washington male    1031  16.3
# Use dplyr::group_by() to calculate proportions within a group
quote_source |>
  dplyr::group_by(source) |>
  count_data(sex)
#> # A tibble: 6 × 4
#> # Groups:   source [2]
#>   source     sex        n    prop
#>   <chr>      <chr>  <int>   <dbl>
#> 1 Bin Laden  female  2067 0.667
#> 2 Bin Laden  male    1029 0.332
#> 3 Bin Laden  NA         5 0.00161
#> 4 Washington female  2206 0.680
#> 5 Washington male    1031 0.318
#> 6 Washington NA         5 0.00154