Part 1 of this article summarized the results of a controlled listening experiment, conducted by the author, in which more than 300 listeners, both trained and untrained, rated four different loudspeakers according to preference. The results revealed that the trained and untrained listeners had essentially the same loudspeaker preferences (see Reference 1). This provides scientific validation for using trained listeners in loudspeaker tests, since their preferences can be safely extrapolated to the general population of untrained listeners.
In Part 2, we examine why trained listeners are preferred over untrained listeners in listening experiments by looking at differences in performance between the two groups. A common performance metric is the F-statistic, calculated by performing an analysis of variance (ANOVA) on each individual listener's loudspeaker ratings. The F-statistic increases as the listener's discrimination and reliability increase. This facet of listener performance is highly desirable for scientists (and bean counters), since fewer listeners and trials are required to achieve an equivalent level of statistical confidence. Some researchers have reported that one trained listener is the statistical equivalent of 8 or more untrained listeners, which translates into considerable cost savings when trained listeners are used for audio product testing and research.
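To make the metric concrete, here is a minimal sketch of how an F-statistic could be computed for a single listener. It assumes a simple one-way ANOVA with loudspeaker as the only factor; the analysis in the actual study may have used a more elaborate design. The function name `f_statistic` and both sets of ratings are hypothetical and purely illustrative, not data from the experiment.

```python
from statistics import mean

def f_statistic(ratings_by_speaker):
    """One-way ANOVA F-statistic for one listener.

    ratings_by_speaker maps a loudspeaker label to the list of ratings
    that listener gave it over repeated trials (hypothetical layout).
    """
    groups = list(ratings_by_speaker.values())
    k = len(groups)                                # number of loudspeakers
    n_total = sum(len(g) for g in groups)          # total number of ratings
    grand_mean = mean(x for g in groups for x in g)

    # Variation between loudspeakers: reflects discrimination.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation within each loudspeaker's repeats: reflects (un)reliability.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical listener who separates the loudspeakers clearly and consistently.
discriminating = {"A": [8, 8, 7], "B": [6, 5, 6], "C": [4, 4, 5], "D": [2, 3, 2]}
# Hypothetical listener who rates every loudspeaker about the same.
easily_satisfied = {"A": [8, 7, 8], "B": [7, 8, 8], "C": [8, 8, 7], "D": [7, 8, 7]}

f_disc = f_statistic(discriminating)
f_easy = f_statistic(easily_satisfied)
print(f_disc > f_easy)
```

Both hypothetical listeners are equally consistent from trial to trial, so the difference in F comes entirely from how far apart they place the loudspeakers. A listener who was discriminating but erratic would see the within-loudspeaker term grow and the F-statistic shrink.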
The above graph plots the mean loudspeaker F-statistics for four groups of untrained listeners, categorized according to their occupations. The performance scores of the untrained groups are scaled relative to the mean score of the trained listeners in order to facilitate comparisons between trained and untrained listeners. The trained listeners clearly performed better than any of the untrained groups, by quite a large margin. Ranked from best to worst by relative performance, the untrained groups were the audio retailers (35%), the audio reviewers (20%), the audio marketing-sales group (10%), and the college students (4%).
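The scaling described above amounts to expressing each group's mean F-statistic as a percentage of the trained listeners' mean. A minimal sketch follows; the function name `relative_performance` and the two raw F values are hypothetical, chosen only to show the arithmetic, and are not values from the study.

```python
def relative_performance(group_mean_f, trained_mean_f):
    """Return a group's mean F-statistic as a percentage of the trained mean."""
    return 100.0 * group_mean_f / trained_mean_f

# Hypothetical raw mean F-statistics, for illustration only.
trained_mean_f = 40.0
untrained_group_mean_f = 8.0

score = relative_performance(untrained_group_mean_f, trained_mean_f)
print(f"{score:.0f}% of trained-listener performance")
```

On this scale the trained listeners sit at 100% by definition, so each untrained group's percentage can be read directly as a fraction of trained-listener performance.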
The better performance of the audio retailers relative to the other untrained groups may be related to psychological factors such as motivation, expectations, and relevant critical listening experience. The college students, the poorest performing group, were also the youngest and least experienced test subjects. They tended to give all four loudspeakers very similar and very high ratings, indicating they were easily satisfied. While this is pure speculation, the students may have had lower sound quality expectations developed through hours of listening to low-quality MP3 files reproduced through band-limited earbuds. Most surprising was the relatively poor performance of the audio reviewers who, despite their credentials and years of professional experience, performed one-fifth as well as the trained listeners and a full 15 percentage points lower than the audio retailers. These differences in trained and untrained listener performance underscore the benefits of carefully selecting and training the listeners used for audio product testing and research.
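The comparisons above follow directly from the relative scores reported for the four untrained groups. The short sketch below simply restates that arithmetic; the percentages are the ones given in this article, while the variable names are my own.

```python
from fractions import Fraction

# Relative performance (percent of trained-listener mean) as reported above.
relative_scores = {
    "audio retailers": 35,
    "audio reviewers": 20,
    "audio marketing-sales": 10,
    "college students": 4,
}

# Reviewers' performance as a fraction of the trained listeners' (100%).
reviewer_fraction = Fraction(relative_scores["audio reviewers"], 100)

# Gap between retailers and reviewers, in percentage points.
retailer_gap = relative_scores["audio retailers"] - relative_scores["audio reviewers"]

# Groups ordered from best to worst relative performance.
ranking = sorted(relative_scores, key=relative_scores.get, reverse=True)

print(reviewer_fraction, retailer_gap, ranking)
```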
In the next installment of this article, technical measurements of the loudspeakers used in these experiments will be presented. From these, we will explore which aspects of their performance lead to higher preference ratings in controlled listening tests.
Reference 1: Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. AES, Vol. 51, Issue 9, pp. 806-825, September 2003. (This paper can be purchased from the Audio Engineering Society or downloaded for free courtesy of Harman International.)