A Question of Asking Questions
OVERVIEW
Research Background
When we design a survey to measure user feedback, a common approach is to ask users to respond to questions on a numerical scale such as 1-5 or 1-7. However, there seems to be no universally accepted standard for selecting the number of points on a scale. Whether an arbitrary selection of scale points will influence the findings of a study remains unknown.
Research Question
Given the same survey questions, will scales with different numbers of points, such as 1-5, 1-7, 1-9, 0-10, and 0-100, produce different study results?
Target User
Survey takers.
METHOD
Research Method
A longitudinal study (a Time 1 study and a Time 2 study) with thousands of participants was conducted. Two types of stimulus messages were created for this research: four high quality ads and four low quality ads. The four high quality ads were created by a professional graphic designer and featured four different products: jeans, stereo speakers, toothpaste, and paper tissues. The four low quality ads were identical to the high quality ads, except that each of them contained three typos.
In the study, participants were randomly assigned to evaluate either high quality ads or low quality ads. Their feedback on the ads was measured using one of the five scales (1-5, 1-7, 1-9, 0-10, or 0-100).
Reason for Selecting the Longitudinal Study Method
This longitudinal method was chosen because a user’s feedback could be measured twice in the research (Time 1 vs. Time 2). If a scale used to measure user feedback were reliable, it should show a significant difference when a user saw different ads in Time 1 and Time 2. On the other hand, it should show no significant difference when the user saw the same ads in both Time 1 and Time 2.
Research Participants
Participants in this research were recruited from Amazon Mechanical Turk. There were 2,610 participants in the Time 1 study. Four weeks later, all of them were invited to participate in the Time 2 study, and 1,356 of them accepted the invitation and completed it.
FINDINGS
Research Findings
A “top 2 box” analysis was performed to see if users rated high quality ads more favorably than low quality ads. The results showed an expected pattern in general (i.e., users favored high quality ads), but the pattern was not clear for the 0-100 scale.
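As an illustration of what a "top 2 box" score usually means, the sketch below computes the share of respondents who chose one of the two highest points on a scale. It is only a minimal sketch: the function name and the sample ratings are hypothetical, it assumes an integer scale with unit steps, and it makes no claim about how the study defined the top two boxes for the 0-100 scale.

```python
def top_two_box_share(ratings, scale_max):
    """Return the share of ratings on the two highest scale points.

    Assumes an integer scale with unit steps, so the two highest
    points are scale_max and scale_max - 1 (an assumption of this sketch).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    threshold = scale_max - 1
    hits = sum(1 for rating in ratings if rating >= threshold)
    return hits / len(ratings)


# Hypothetical ratings on a 1-5 scale (not data from the study).
sample_ratings = [5, 4, 3, 2, 4]
share = top_two_box_share(sample_ratings, scale_max=5)
print(f"Top 2 box share: {share:.0%}")
```

Comparing this share between the high quality and low quality ad groups is one way to read the pattern described above.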
To compare scores on different scales, all scores were re-scaled to 0-100 by using the formula [(rating - 1)/(number of response categories - 1)] * 100. For example, 2 on a 1-5 scale will be 25 on a 0-100 scale because [(2 - 1)/(5 - 1)] * 100 = 25. When high quality ads were compared to low quality ads, no matter which scale was used, the scores for high quality ads were higher than those for low quality ads (although the p-values in these comparisons were not all < .05).
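The re-scaling step can be sketched in a few lines of Python. For scales that start at 1, the function below is the same as the formula above, because the scale maximum minus the scale minimum equals the number of response categories minus 1. Subtracting the scale minimum instead of 1 is an assumption of this sketch so that zero-based scales such as 0-10 also map onto 0-100; the source states the formula only in its 1-based form. The function name is hypothetical.

```python
def rescale_to_100(rating, scale_min, scale_max):
    """Linearly map a rating from [scale_min, scale_max] onto 0-100."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    if not scale_min <= rating <= scale_max:
        raise ValueError("rating is outside the scale")
    # For a 1-based scale this equals
    # (rating - 1) / (number of response categories - 1) * 100.
    return (rating - scale_min) / (scale_max - scale_min) * 100


# The worked example from the text: 2 on a 1-5 scale.
print(rescale_to_100(2, 1, 5))   # 25.0
# Midpoint of a 1-7 scale.
print(rescale_to_100(4, 1, 7))   # 50.0
# Assumed handling of a zero-based scale: 5 on a 0-10 scale.
print(rescale_to_100(5, 0, 10))  # 50.0
```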
Moreover, the difference in user feedback was calculated for those who completed both the Time 1 study and the Time 2 study. The repeated measures ANOVA results showed that participants provided more favorable feedback for high quality ads than for low quality ads, regardless of whether the same or different scales were used in Time 1 and Time 2.
DELIVERABLES
Actionable Implications
Overall, all five scales tested in this research (1-5, 1-7, 1-9, 0-10, and 0-100) measured user feedback in a consistent way. For research practice, this suggests that the number of scale points is unlikely to influence the findings of a study, provided that the p-value is not used as the only criterion for judging whether a difference is significant. Although 5-point and 7-point scales seem popular among researchers, other scales may be acceptable because they are likely to produce the same study results.
Research Publication
Cong Li and Khudejah Ali (2021), “Measuring attitude toward the ad: A test of using arbitrary scales and ‘p < .05’ criterion?” International Journal of Market Research, 63(5), 620-634.