Item Development

Item development is a critical step in scale construction, because serious problems with the item pool will reverberate through all subsequent data analyses and scale construction efforts.

Items should be (i) relevant to the construct to be measured and (ii) representative of all potentially important aspects of that construct. Formal construct definitions are particularly important here, because they should guide the item-writing process.

Besides covering all the different facets of a particular construct, the item pool should also include items reflecting all levels of the construct, from low to high.

Item writing guidelines (Simms, 2008):

Write items using simple and straightforward language that is appropriate for the reading level of the measure’s target population.
Avoid writing complex or convoluted items that are difficult to read and understand (e.g., double-barreled items such as ‘My outgoing nature would make me a good salesperson’). Such items confound different characteristics (in this case, being outgoing and being a good salesperson) that may not covary in some individuals.
Avoid using slang and colloquial expressions that may quickly become obsolete. Be careful that phrasing does not affect responses in unexpected ways (e.g., including ‘worry’ in an item nearly guarantees that it will have a neuroticism component).
To the extent possible, write a mix of positively and negatively worded items to guard against response sets.
Phrase items generally enough that most or all targeted respondents can provide a reasonably appropriate response (e.g., write ‘I get tired after I exercise’ rather than ‘I get tired after playing soccer’).
To increase the likelihood of truthful responding, phrase items asking about sensitive issues using straightforward, matter-of-fact, and nonpejorative language.
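
When a pool mixes positively and negatively worded items, as the guidelines above recommend, the negatively worded items are usually reverse-scored before responses are combined, so that a high score always points in the same direction. The following is a minimal Python sketch of that step. It assumes a 1-to-5 Likert-type response format, and the item names and function names are hypothetical, chosen only for illustration; they do not come from Simms (2008).

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a response on a bounded rating scale (e.g., 1 <-> 5, 2 <-> 4)."""
    if not scale_min <= response <= scale_max:
        raise ValueError(f"response {response} is outside {scale_min}..{scale_max}")
    return scale_min + scale_max - response


def score_scale(responses, reverse_keyed, scale_min=1, scale_max=5):
    """Sum item responses after reverse-scoring the negatively worded items.

    responses     -- dict mapping item name to the raw response
    reverse_keyed -- set of item names that are negatively worded
    """
    total = 0
    for item, value in responses.items():
        if item in reverse_keyed:
            value = reverse_score(value, scale_min, scale_max)
        total += value
    return total


# Hypothetical extraversion items; "quiet" is the negatively worded one.
raw = {"talkative": 5, "quiet": 1, "outgoing": 4}
total = score_scale(raw, reverse_keyed={"quiet"})
print(total)  # 5 + (1 + 5 - 1) + 4 = 14
```

A respondent who strongly agrees with ‘talkative’ and strongly disagrees with ‘quiet’ is answering consistently, and after reverse-scoring both responses contribute the maximum value to the total.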

Pilot testing

After the initial item pool is complete, it makes sense to pilot test the items before running a large-scale exploratory study.

Factor loadings can be improved by using multiple-response (Likert-type) items, which generally yield higher loadings than two-choice items (Comrey & Montag, 1982; Oswald & Velicer, 1980; Velicer, DiClemente, & Corriveau, 1984; Velicer, Govia, Cherico, & Corriveau, 1985; Velicer & Stevenson, 1978). Likewise, the quality of item writing can affect the size of the loadings: expressing an item in simple language, restricting it to a single idea, and using content that is appropriate to a majority of respondents are all ways of improving items.
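
One way to see why response format can matter is a small simulation. The sketch below is an illustration under stated assumptions, not a reproduction of the cited studies: it generates a continuous latent response with a true loading of 0.7 on a single factor, then records that response either on a five-point scale or as a two-choice item, and compares how strongly each version correlates with the factor. The thresholds, sample size, loading value, and function names are all assumptions made for the example.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n=20000, loading=0.7, seed=0):
    """Return (r_likert, r_binary): item-factor correlations per format."""
    rng = random.Random(seed)
    factor = [rng.gauss(0, 1) for _ in range(n)]
    noise_sd = sqrt(1 - loading ** 2)
    # Continuous latent response with the chosen loading on the factor.
    latent = [loading * f + noise_sd * rng.gauss(0, 1) for f in factor]
    # Assumed thresholds: four cut points give a five-point scale (1..5).
    cuts = (-1.5, -0.5, 0.5, 1.5)
    likert = [1 + sum(v > c for c in cuts) for v in latent]
    # Two-choice version: a single cut at the midpoint.
    binary = [int(v > 0) for v in latent]
    return pearson(factor, likert), pearson(factor, binary)


r_likert, r_binary = simulate()
print(f"five-point item: r = {r_likert:.2f}")
print(f"two-choice item: r = {r_binary:.2f}")
```

Under these assumptions, both recorded versions correlate with the factor less strongly than the underlying continuous response does, and the two-choice version loses more, because collapsing responses into two categories discards more information.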

References

Simms, L. J. (2008). Classical and modern methods of psychological scale construction: Scale construction. Social and Personality Psychology Compass, 2(1), 414–433. https://doi.org/10.1111/j.1751-9004.2007.00044.x