Response Options
This chapter is still a work in progress.
There are several decisions to make that involve response options. How many response options should you use? Should you use an even or odd number of response options? Should you label them? This section contains a summary of best practices that one can use to address these questions.
Number of Response Options
The question of how many response options to use centers on two competing concerns. The first is that more options allow a more fine-grained, and therefore more precise, assessment of the characteristic being evaluated (e.g., an attitude). The second is how the number of options affects the reliability and validity of the measurement: with more options, it becomes more difficult for people to distinguish between them (e.g., is “Strongly agree” reliably different from “Very strongly agree”?).
Table 1 shows an overview of various studies in which the topic of response options was addressed. The studies vary in many ways, so the final conclusion should be a holistic interpretation of the results, rather than a simple tallying of the results. Note also that only empirical studies are included and not simulation studies. Simulation studies seem limited because they cannot address the plausible psychological limitation of people being unable to distinguish between many options.
Source | Comparisons | Topic | Outcome | Conclusion |
---|---|---|---|---|
Donnellan & Rakhshani (2020) | 2- to 7-, and 11-point Likert | Self-esteem | Reliability; distribution; validity; quality | 5-point Likert or higher |
Simms et al. (2019) | 2- to 11-point Likert + VAS | Personality | Reliability; validity | 6-point Likert |
Sung & Wu (2018) | 5-point Likert and VAS-RRP | Career interest | Reliability | VAS-RRP |
Cox et al. (2017) | 2- and 4-point Likert | Personality | Reliability; validity; duration | Mixed |
Lewis (2017) | 7-, 11-point Likert and VAS | Software usability | Reliability; distribution; validity | No difference |
Kuhlmann et al. (2017) | 5-point Likert and VAS | Personality | Reliability; distribution; validity | No difference |
Hilbert (2016) | 2- and 5-point and VAS | Personality | Reliability; validity; quality | It depends |
Capik & Gozum (2015) | 2- and 5-point Likert | Health | Reliability; validity | No difference |
Eutsler & Lang (2015) | 5-, 7-, 9-, and 11-point Likert | Judgment | Distribution; power | 7-point Likert |
Finn et al. (2015) | 2- and 4-point Likert | | | 4-point Likert |
Revilla et al. (2014) | 5-, 7-, 11-point Likert | | | 5-point Likert |
Cox et al. (2012) | 2- and 4-point Likert | | | 4-point Likert |
Janhunen (2012) | 7-point Likert and 30-point VAR | | | VAR |
Dawes (2008) | 5-, 7-, and 10-point Likert | | | No difference |
Weng (2004) | 3- to 9-point Likert | | | 5-point or higher |
Preston & Colman (2000) | 2- to 11-point Likert and VAS | | | 7-, 9-, or 10-point Likert |
Alwin (1997) | 7- and 11-point Likert | | | 11-point Likert |
Jaeschke et al. (1990) | 7-point Likert and VAS | | | No difference (slightly favor 7-point Likert) |
Flamer (1983) | 2- and 9-point Likert | | | 9-point Likert |
Matell & Jacoby (1971) | 2-point to 18-point Likert | | | No difference |
Bendig (1954) | 2-, 3-, 5-, 7-, and 9-point Likert | | | No difference (maybe 3-point or higher) |
Rhemtulla et al. (2012) | 2- to 7-point Likert | | | 5-point Likert maybe good, 6- or 7-point best |
There are also several review papers on the topic. Krosnick & Presser (2010) suggest that 7-point Likert scales are probably optimal. Lietz (2010) concludes that a desirable Likert scale consists of 5 to 8 response options. Similarly, Cox III (1980) recommends using between 5 and 9 response options. Symonds (1924) claimed that the optimum number is 7. Gehlbach & Brinkworth (2011) recommend using 5 points for unipolar items and 7 points for bipolar items.
There are also statistical arguments for why a particular number of response options is preferred. With more response options, the assumption of normality is more likely to be tenable. Some of the papers included in Table 1 (e.g., Rhemtulla et al. (2012)) are about this concern.
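As a rough illustration of this statistical argument, the sketch below cuts a simulated, normally distributed latent trait into k equally wide response categories and reports how strongly the coarsened scores correlate with the underlying trait. The cut-point range (-2.5 to 2.5), the sample size, and all function names are assumptions made for the example. Like any simulation, it says nothing about whether respondents can actually tell the options apart.

```python
import random
from bisect import bisect_right
from statistics import mean


def thresholds(k, low=-2.5, high=2.5):
    """Return k-1 equally spaced cut points between low and high (assumed range)."""
    width = (high - low) / k
    return [low + width * i for i in range(1, k)]


def pearson(xs, ys):
    """Plain Pearson correlation between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def correlation_with_latent(k, n=20000, seed=1):
    """Correlate a simulated latent trait with its k-point coarsened version."""
    rng = random.Random(seed)
    cuts = thresholds(k)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    # A response is 1 plus the number of cut points the latent value exceeds.
    observed = [1 + bisect_right(cuts, value) for value in latent]
    return pearson(latent, observed)


for k in (2, 3, 5, 7, 11):
    print(f"{k}-point scale: r = {correlation_with_latent(k):.3f}")
```

Under these assumptions, the correlation rises steeply from 2 to 5 points and gains little beyond 7, which is consistent with the diminishing returns discussed in this section.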
Besides psychometric properties, it may also be worth taking respondent preference into account. This involves ease of use of the scale and whether the response options allow for sufficient variation for respondents to express their view. Preston & Colman (2000) reported that respondents found scales with 5, 7, and 10 points easy to use (compared to fewer options and a VAS) and that they preferred scales with more response options (7 or more) because these allowed them to express themselves. Other studies also show that respondents favor more options (Cox et al., 2017).
Note that if time is of the essence, fewer response options are preferred.
Another relevant factor is whether the scale is bipolar or unipolar. Bipolar scales are symmetrical, which means the number of options naturally increases because both sides of the spectrum need matching options. Unipolar items are only about one side, usually ranging from the absence of something to the presence of something (to a certain degree). Since it is harder to label a larger number of options for a unipolar scale, the number of options is likely to be smaller.
Conclusion: It appears that few response options (2 or 3) should definitely be avoided. More response options therefore seem better, but the benefits seem to level off quickly. Given other concerns, such as ease of use and interpretability, a 7-point Likert scale seems to be preferred for bipolar scales and a 5-point Likert scale for unipolar scales.
Odd vs. Even Response Options
The middle option of a scale can have an ambiguous meaning. Participants may use it to indicate a moderate standing on the issue (Rugg and Cantril, 1944), a lack of an opinion (Nadler, Weston, and Voyles, 2014), ambivalence (Klopfer and Madden, 1980; Schaeffer and Presser, 2003; Nadler, Weston, and Voyles, 2014), indifference (Schaeffer and Presser, 2003; Nadler, Weston, and Voyles, 2014), uncertainty (Baka, Figgou, and Triga, 2012; Nadler, Weston, and Voyles, 2014), confusion, or to signal context dependence (e.g., “it depends” or disputing the question, see Baka, Figgou, and Triga, 2012).
The middle option may also be used for certain response styles, such as socially desirable responding (Sturgis, Roberts, and Smith, 2012) or satisficing (Krosnick, 1991), although there is not much research showing that it actually leads to satisficing (Wang & Krosnick, 2020).
If a middle alternative is explicitly offered, the proportion endorsing it increases dramatically (e.g., Ayidiya & McClendon, 1990; Bishop, 1987; Bishop, Hippler, Schwarz, & Strack, 1988; Kalton, Collins, & Brook, 1978; Kalton, Roberts, & Holt, 1980; Rugg & Cantril, 1944).
Some studies show that not including a middle option decreases validity and increases measurement error (O’Muircheartaigh, Krosnick, and Helic, 1999; Nowlis, Kahn, and Dhar, 2002).
For a recent study on this question, see Wang & Krosnick (2020).
An alternative approach to this issue is to use branching. Respondents could first be asked whether they fall at the midpoint or on one side, followed by a question about their extremity on a side. This approach was found to be more reliable and valid than using a 7-point scale (Krosnick and Berent, 1993; Malhotra, Krosnick, and Thomas, 2009).
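The branching approach can be scored onto the same 7-point range as a conventional item. The sketch below is only one possible coding: the direction and extremity labels, the function name, and the choice of 4 as the midpoint are hypothetical and are not taken from the cited studies.

```python
def branched_to_seven_point(direction, extremity=None):
    """Combine a two-step branching response into a single 1-7 score.

    direction: "negative", "midpoint", or "positive" (first question)
    extremity: "slightly", "moderately", or "strongly" (follow-up question,
               not asked when the respondent chose the midpoint)
    """
    steps = {"slightly": 1, "moderately": 2, "strongly": 3}  # assumed labels

    if direction == "midpoint":
        return 4  # no follow-up question is needed
    if direction not in ("negative", "positive"):
        raise ValueError(f"unknown direction: {direction!r}")
    if extremity not in steps:
        raise ValueError(f"unknown extremity: {extremity!r}")

    offset = steps[extremity]
    return 4 + offset if direction == "positive" else 4 - offset


# Example: a respondent who leans negative, and only slightly so.
print(branched_to_seven_point("negative", "slightly"))
```

Each respondent answers at most two short questions, yet the resulting score spans the same seven values as a fully labeled 7-point item.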
Conclusion: If it is possible that respondents may have a moderate view, it seems crucial for it to be possible to capture this view. Limitations of a middle option could then be addressed in other ways (e.g., clear questions).
Response Option Labeling
Several studies show that all response options should be labeled, rather than only the end points (Krosnick & Berent, 1993; Weng, 2004).
For an example of bipolar labels for a 2- to 11-point Likert scale, see Table 2.
Label | 2-point | 3-point | 4-point | 5-point | 6-point | 7-point | 8-point | 9-point | 10-point | 11-point |
---|---|---|---|---|---|---|---|---|---|---|
Very strongly disagree | | | | | | | x | x | x | x |
Strongly disagree | | | x | x | x | x | x | x | x | x |
Disagree | x | x | x | x | x | x | x | x | x | x |
Mostly disagree | | | | | | | | | x | x |
Slightly disagree | | | | | x | x | x | x | x | x |
Neither agree nor disagree | | x | | x | | x | | x | | x |
Slightly agree | | | | | x | x | x | x | x | x |
Mostly agree | | | | | | | | | x | x |
Agree | x | x | x | x | x | x | x | x | x | x |
Strongly agree | | | x | x | x | x | x | x | x | x |
Very strongly agree | | | | | | | x | x | x | x |
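The label table can also be expressed as a small lookup. The sketch below assumes that each label enters at the smallest scale size consistent with a k-point scale having exactly k labels, and that the neutral label appears only on odd-numbered scales. The names `LABEL_RULES` and `labels_for` are hypothetical.

```python
# (label, smallest scale size that includes it), in the table's row order.
# None marks the neutral midpoint, which is used on odd-numbered scales only.
LABEL_RULES = [
    ("Very strongly disagree", 8),
    ("Strongly disagree", 4),
    ("Disagree", 2),
    ("Mostly disagree", 10),
    ("Slightly disagree", 6),
    ("Neither agree nor disagree", None),
    ("Slightly agree", 6),
    ("Mostly agree", 10),
    ("Agree", 2),
    ("Strongly agree", 4),
    ("Very strongly agree", 8),
]


def labels_for(points):
    """Return the labels used on a bipolar agreement scale with `points` options."""
    if not 2 <= points <= 11:
        raise ValueError("only 2- to 11-point scales are covered")
    selected = []
    for label, minimum in LABEL_RULES:
        if minimum is None:
            if points % 2 == 1:  # the midpoint exists only on odd scales
                selected.append(label)
        elif points >= minimum:
            selected.append(label)
    return selected


print(labels_for(5))
```

A quick sanity check on any such lookup is that a k-point scale yields exactly k labels for every k from 2 to 11.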
It is also recommended to avoid agree-disagree response labels because asking respondents to rate their level of agreement is a cognitively demanding task that increases respondent error and reduces responding effort (Gehlbach & Brinkworth, 2011).
Possible labels, from CampusLabs:
Agreement: Strongly agree, Moderately agree, Neither agree nor disagree, Moderately disagree, Strongly disagree (another version removes the “moderately” qualifier and/or uses “neutral”)
Comparison: Much X, Slightly X, About the same, Slightly (opposite of X), Much (opposite of X)
Ease: Very easy, Moderately easy, Neither easy nor difficult, Moderately difficult, Very difficult
Expectations: Exceeds expectations, Fully meets expectations, Does not fully meet expectations, Does not meet expectations at all
Extent (5 pt): A great deal (Completely, if appropriate), Considerably, Moderately, Slightly, Not at all
Extent (4 pt): Significantly, Moderately, Slightly, Not at all
Frequency (no set time): Always, Often, Occasionally, Rarely, Never
Frequency (general): Daily, Weekly, Monthly, Once a semester, Once a year, Never
Frequency (based on time frame): More than 5 times, 4 - 5 times, 2 - 3 times, 1 time, Less than 1 time, Never
Frequency (extended): More than once a week, Once a week, Once a month, Once a semester, Once a year, Less than once a year, Never
Helpfulness: Extremely helpful, Very helpful, Moderately helpful, Slightly helpful, Not at all helpful
Importance: Extremely important, Very important, Moderately important, Slightly important, Not at all important
Interest: Extremely interested, Very interested, Moderately interested, Slightly interested, Not at all interested
Likelihood: Very likely, Moderately likely, Neither likely nor unlikely, Moderately unlikely, Very unlikely
Numeric Scales: Less than #, About the same, More than #
Probability: Definitely would, Probably would, Probably wouldn’t, Definitely wouldn’t
Proficiency: Beginner, Developing, Competent, Advanced, Expert (typical for Rubrics)
Quality: Excellent, Good, Average, Below average, Poor
Satisfaction: Very satisfied, Moderately satisfied, Neither satisfied nor dissatisfied, Moderately dissatisfied, Very dissatisfied (another version removes the “moderately” qualifier and/or uses “neutral”)
Taken from https://baselinesupport.campuslabs.com/hc/en-us/articles/204305485-Recommended-Scales