An ongoing controversy within the high-end audio community is the efficacy of blind versus sighted audio product listening tests. In a blind listening test, the listener has no specific knowledge of what products are being tested, thereby removing the psychological influence that the product’s brand, design, price and reputation have on the listeners’ impression of its sound quality. While double-blind protocols are standard practice in all fields of science - including consumer testing of food and wine - the audio industry remains stuck in the dark ages in this regard. The vast majority of audio equipment manufacturers and reviewers continue to rely on sighted listening to make important decisions about the products’ sound quality.
An important question is whether sighted audio product evaluations produce honest and reliable judgments of how the product truly sounds.
A Blind Versus Sighted Loudspeaker Experiment
This question was tested in 1994, shortly after I joined Harman International as Manager of Subjective Evaluation . My mission was to introduce formalized, double-blind product testing at Harman. To my surprise, this mandate met rather strong opposition from some of the more entrenched marketing, sales and engineering staff who felt that, as trained audio professionals, they were immune from the influence of sighted biases. Unfortunately, at that time there were no published scientific studies in the audio literature to either support or refute their claims, so a listening experiment was designed to directly test this hypothesis. The details of this test are described in references 1 and 2.
A total of 40 Harman employees participated in these tests, giving preference ratings to four loudspeakers that covered a wide range of size and price. The test was conducted under both sighted and blind conditions using four different music selections.
The mean loudspeaker ratings and 95% confidence intervals are plotted in Figure 1 for both sighted and blind tests. The sighted tests produced a significant increase in preference ratings for the larger, more expensive loudspeakers G and D. (note: G and D were identical loudspeakers except with different cross-overs, voiced ostensibly for differences in German and Northern European tastes, respectively. The negligible perceptual differences between loudspeakers G and D found in this test resulted in the creation of a single loudspeaker SKU for all of Europe, and the demise of an engineer who specialized in the lost art of German speaker voicing).
Brand biases and employee loyalty to Harman products were also a factor in the sighted tests, since three of the four products (G,D, and S) were Harman branded. Loudspeaker T was a large, expensive ($3.6k) competitor's speaker that had received critical acclaim in the audiophile press for its sound quality. However, not even Harman brand loyalty could overpower listeners' prejudices associated with the relatively small size, low price, and plastic materials of loudspeaker S; in the sighted test, it was less preferred to Loudspeaker T, in contrast to the blind test where it was slightly preferred over loudspeaker T.
Loudspeaker positional effects were also a factor since these tests were conducted prior to the construction of the Multichannel Listening Lab with its automated speaker shuffler. The positional effects on loudspeaker preference rating are plotted in Figure 2 for both blind and sighted tests. The positional effects on preference are clearly visible in the blind tests, yet, the effects are almost completely absent in the sighted tests where the visual biases and cognitive factors dominated listeners' judgment of the auditory stimuli. Listeners were also less responsive to loudspeaker-program effects in the sighted tests as compared to the blind test conditions. Finally, the tests found that experienced and inexperienced listeners (both male and female) tended to prefer the same loudspeakers, which has been confirmed in a more recent, larger study. The experienced listeners were simply more consistent in their responses. As it turned out, the experienced listeners were no more or no less immune to the effects of visual biases than inexperienced listeners.
In summary, the sighted and blind loudspeaker listening tests in this study produced significantly different sound quality ratings. The psychological biases in the sighted tests were sufficiently strong that listeners were largely unresponsive to real changes in sound quality caused by acoustical interactions between the loudspeaker, its position in the room, and the program material. In other words, if you want to obtain an accurate and reliable measure of how the audio product truly sounds, the listening test must be done blind. It’s time the audio industry grow up and acknowledge this fact, if it wants to retain the trust and respect of consumers. It may already be too late according to Stereophile magazine founder, Gordon Holt, who lamented in a recent interview:
“Audio as a hobby is dying, largely by its own hand. As far as the real world is concerned, high-end audio lost its credibility during the 1980s, when it flatly refused to submit to the kind of basic honesty controls (double-blind testing, for example) that had legitimized every other serious scientific endeavor since Pascal. [This refusal] is a source of endless derisive amusement among rational people and of perpetual embarrassment for me..”
 Floyd Toole and Sean Olive,”Hearing is Believing vs. Believing is Hearing: Blind vs. Sighted Listening Tests, and Other Interesting Things,” presented at the 97th AES Convention, preprint 3894 (1994). Download here.
 Floyd Toole, Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, Focal Press, 2008.