Audio Musings by Sean Olive: Floyd Toole

Thursday, April 9, 2009

The Dishonesty of Sighted Listening Tests

An ongoing controversy within the high-end audio community is the efficacy of blind versus sighted audio product listening tests. In a blind listening test, the listener has no specific knowledge of what products are being tested, thereby removing the psychological influence that the product’s brand, design, price and reputation have on the listener’s impression of its sound quality. While double-blind protocols are standard practice in all fields of science - including consumer testing of food and wine - the audio industry remains stuck in the dark ages in this regard. The vast majority of audio equipment manufacturers and reviewers continue to rely on sighted listening to make important decisions about a product’s sound quality.

An important question is whether sighted audio product evaluations produce honest and reliable judgments of how the product truly sounds.

A Blind Versus Sighted Loudspeaker Experiment

This question was tested in 1994, shortly after I joined Harman International as Manager of Subjective Evaluation [1]. My mission was to introduce formalized, double-blind product testing at Harman. To my surprise, this mandate met rather strong opposition from some of the more entrenched marketing, sales and engineering staff who felt that, as trained audio professionals, they were immune to the influence of sighted biases. Unfortunately, at that time there were no published scientific studies in the audio literature to either support or refute their claims, so a listening experiment was designed to directly test this hypothesis. The details of this test are described in references 1 and 2.

A total of 40 Harman employees participated in these tests, giving preference ratings to four loudspeakers that covered a wide range of size and price. The test was conducted under both sighted and blind conditions using four different music selections.

The mean loudspeaker ratings and 95% confidence intervals are plotted in Figure 1 for both sighted and blind tests. The sighted tests produced a significant increase in preference ratings for the larger, more expensive loudspeakers G and D. (Note: G and D were identical loudspeakers except for their crossovers, which were voiced ostensibly for differences in German and Northern European tastes, respectively. The negligible perceptual differences between loudspeakers G and D found in this test resulted in the creation of a single loudspeaker SKU for all of Europe, and the demise of an engineer who specialized in the lost art of German speaker voicing.)
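For readers unfamiliar with how a mean rating and its 95% confidence interval are obtained, here is a minimal Python sketch. The ratings below are made-up illustrative numbers, not data from the study, and the function name `mean_ci95` is hypothetical. The sketch uses a normal approximation (z of about 1.96), which is a simplification; the published analysis may have used a different method, and with small samples a t-distribution would be more appropriate.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci95(ratings):
    """Return (mean, lower, upper) using a normal-approximation 95% CI."""
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings")
    m = mean(ratings)
    sem = stdev(ratings) / sqrt(n)      # standard error of the mean
    z = NormalDist().inv_cdf(0.975)     # two-sided 95% critical value, about 1.96
    half_width = z * sem
    return m, m - half_width, m + half_width


# Hypothetical preference ratings on a 0-10 scale (NOT data from the study).
example_ratings = [6.0, 7.0, 5.5, 6.5, 7.5, 6.0, 5.0, 7.0]
m, lower, upper = mean_ci95(example_ratings)
print(f"mean={m:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

When two loudspeakers' intervals do not overlap on a plot of this kind, that is a rough visual cue that their mean ratings differ by more than chance alone would suggest.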

Brand biases and employee loyalty to Harman products were also a factor in the sighted tests, since three of the four products (G, D, and S) were Harman branded. Loudspeaker T was a large, expensive ($3.6k) competitor's speaker that had received critical acclaim in the audiophile press for its sound quality. However, not even Harman brand loyalty could overpower listeners' prejudices associated with the relatively small size, low price, and plastic materials of loudspeaker S; in the sighted test, it was less preferred than loudspeaker T, in contrast to the blind test, where it was slightly preferred over loudspeaker T.

Loudspeaker positional effects were also a factor, since these tests were conducted prior to the construction of the Multichannel Listening Lab with its automated speaker shuffler. The positional effects on loudspeaker preference rating are plotted in Figure 2 for both blind and sighted tests. The positional effects on preference are clearly visible in the blind tests, yet they are almost completely absent in the sighted tests, where visual biases and cognitive factors dominated listeners' judgment of the auditory stimuli. Listeners were also less responsive to loudspeaker-program effects in the sighted tests than in the blind tests. Finally, the tests found that experienced and inexperienced listeners (both male and female) tended to prefer the same loudspeakers, which has been confirmed in a more recent, larger study. The experienced listeners were simply more consistent in their responses. As it turned out, the experienced listeners were no more and no less immune to the effects of visual biases than inexperienced listeners.

In summary, the sighted and blind loudspeaker listening tests in this study produced significantly different sound quality ratings. The psychological biases in the sighted tests were sufficiently strong that listeners were largely unresponsive to real changes in sound quality caused by acoustical interactions between the loudspeaker, its position in the room, and the program material. In other words, if you want to obtain an accurate and reliable measure of how the audio product truly sounds, the listening test must be done blind. It’s time the audio industry grew up and acknowledged this fact if it wants to retain the trust and respect of consumers. It may already be too late, according to Stereophile magazine founder Gordon Holt, who lamented in a recent interview:

“Audio as a hobby is dying, largely by its own hand. As far as the real world is concerned, high-end audio lost its credibility during the 1980s, when it flatly refused to submit to the kind of basic honesty controls (double-blind testing, for example) that had legitimized every other serious scientific endeavor since Pascal. [This refusal] is a source of endless derisive amusement among rational people and of perpetual embarrassment for me...”

References

[1] Floyd Toole and Sean Olive, “Hearing is Believing vs. Believing is Hearing: Blind vs. Sighted Listening Tests, and Other Interesting Things,” presented at the 97th AES Convention, preprint 3894 (1994).

[2] Floyd Toole, Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, Focal Press, 2008.

Sunday, January 11, 2009

What Loudspeaker Specifications Are Relevant to Sound Quality?

This past week I attended the International Loudspeaker Manufacturer’s Association (ALMA) Winter Symposium in Las Vegas, where the theme was “Sound Quality in Loudspeaker Design and Manufacturing.” Over the course of three days there were presentations, roundtable discussions, and workshops from the industry’s leading experts focused on improving the sound quality of the loudspeaker. Ironically, the important question of whether these improvements matter to consumers wasn’t raised until the final hours of the symposium, in a panel discussion called “What loudspeaker specifications are relevant to perception?”

The panelists included myself, Steve Temme (Listen Inc.), Dr. Earl Geddes (GedLee), Laurie Fincham (THX), Mike Klasco (Menlo Scientific), and Dr. Floyd Toole (former VP Acoustic Engineering at Harman), who served as the panel moderator. After about 30 minutes, a consensus was quickly reached on the following points:

The perception of loudspeaker sound quality is dominated by linear distortions, which can be accurately quantified and predicted using a set of comprehensive anechoic frequency response measurements (see my previous posting here).
Both trained and untrained listeners tend to prefer the most accurate loudspeakers when measured under controlled double-blind listening conditions (see this article here).
The relationship between perception and measurement of nonlinear distortions is less well understood and needs further research. Popular specifications like Total Harmonic Distortion (THD) and Intermodulation Distortion (IM) do not accurately reflect the distortion’s audibility and effect on the perceived sound quality of the loudspeaker.
Current industry loudspeaker specifications are woefully inadequate in characterizing the sound quality of the loudspeaker. The commonly quoted “20 Hz - 20 kHz, ±3 dB” single-curve specification is a good example. Floyd Toole made the observation that there is more useful performance information on the side of a tire (see tire below) than is currently found on most loudspeaker spec sheets (see Floyd's new book "Sound Reproduction").
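To illustrate why a single "±3 dB" figure says so little, the following Python sketch builds two made-up frequency responses that both fit inside a ±3 dB window, even though one is nearly flat and the other swings between the limits. All numbers, and the helper name `within_tolerance`, are hypothetical and chosen only for illustration; they do not describe any real loudspeaker, and the spread statistic used here is just a convenient stand-in, not a perceptual metric.

```python
from statistics import pstdev


def within_tolerance(response_db, tol_db=3.0):
    """True if every level lies within +/- tol_db of the 0 dB reference."""
    return all(abs(level) <= tol_db for level in response_db)


# Hypothetical levels in dB relative to the reference, sampled at a handful
# of frequencies across the band (NOT measurements of real products).
smooth = [-0.5, 0.2, 0.0, 0.3, -0.2, 0.1, -0.4, 0.0]
ragged = [-3.0, 2.8, -2.5, 3.0, -2.9, 2.7, -3.0, 2.6]

for name, response in (("smooth", smooth), ("ragged", ragged)):
    meets_spec = within_tolerance(response)
    spread = pstdev(response)  # how much the curve deviates from its own mean
    print(f"{name}: meets +/-3 dB spec = {meets_spec}, spread = {spread:.2f} dB")
```

Both curves pass the same tolerance check, so the headline specification cannot tell them apart, although the spread of the second is many times larger than that of the first.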

For the remaining hour, the discussion turned towards identifying the root cause of why loudspeaker performance specifications seem stuck in the Pleistocene Age, despite scientific advancements in loudspeaker psychoacoustics. Do consumers really care about loudspeaker sound quality? Or are they mostly satisfied with the status quo? Why do loudspeaker manufacturers continue to hide behind loudspeaker performance numbers that are mostly meaningless, and often misleading?

The evidence that consumers no longer care about sound quality is anecdotal, largely based on the recent down-market trend in consumer audio. Competition from digital cameras, flat panel video displays, MP3 players, computers, and GPS navigation devices has decimated consumers' audio budgets. This doesn't prove consumers care less about loudspeaker sound quality, only that there is less money available to purchase it. Marketing research studies indicate that sound quality remains an important factor in consumers' audio purchase decisions. Given the opportunity to hear different loudspeakers under controlled unbiased listening conditions, consumers will tend to prefer the most accurate ones. Unfortunately, with the demise of the specialty audio dealer and the growth of internet-based sales, consumers rarely have the opportunity to audition different loudspeakers - even under the most biased and uncontrolled listening conditions. This is precisely why the industry needs to provide new loudspeaker specifications that accurately portray the perceived sound quality of the loudspeaker.

So why is the loudspeaker industry not moving more quickly towards this goal? In my view, complacency and fear are the major obstacles. The loudspeaker industry is very conservative and largely self-regulated. There are no regulatory agencies to force improvement, or even check whether a product's quoted specifications match reality. Change will only occur as the result of competition, or pressure exerted by consumers, industry trade organizations (e.g. CEDIA, CEA) or consumer product testing organizations like Consumer Reports. The fear of adopting a new specification stems from the realization that a company can no longer hide beneath the Emperor's new clothes (i.e. the current specifications). A perceptually relevant specification would clearly distinguish the good-sounding loudspeakers from the truly mediocre ones. In the future, a perceptually based specification like the one illustrated to the right could provide ratings on overall sound quality, and various timbral, spatial and dynamic attributes. The consumer could then choose a loudspeaker based on these measured attributes.

In conclusion, all evidence suggests that consumers highly value sound quality when purchasing a loudspeaker, yet current loudspeaker specifications provide little guidance in this matter. It is time the loudspeaker industry grew up and realized this. Adopting a more perceptually meaningful loudspeaker specification would permit consumers to make smarter loudspeaker choices based on how the loudspeakers sound. This would better serve the interests of consumers and of loudspeaker manufacturers who view the sound quality of a loudspeaker as its most important selling feature.