Audio Musings by Sean Olive: December 2008

Tuesday, December 30, 2008

Sound Science - Loudspeaker R&D at Harman

The American artist Andy Warhol once said that everyone will eventually have their 15 minutes of fame. The closest I came was being on the cover of Test & Measurement magazine in November 2004. OK, admittedly T&M is not exactly People Magazine, but 1 or 2 pocket protector-wearing test engineers may have noticed the cover while shopping for a new digital oscilloscope or multimeter.

The title of the article is "Sound Science: Musical tastes differ, but tests show that listeners respond with the consistency of spectrum analyzers to loudspeaker performance."

The article explains the science behind loudspeaker R&D at Harman International and is written in a very approachable style for the audio layperson. You can read it here.

Sunday, December 28, 2008

Part 3 - Relationship between Loudspeaker Measurements and Listener Preferences

Part 1 of this article presented experimental evidence from a study conducted by the author demonstrating that trained and untrained listeners prefer the same loudspeakers (see reference 1). Part 2 showed that the trained listeners performed 3 to 20 times better than untrained listeners based on their ability to give discriminating and reliable loudspeaker ratings. In part 3, we examine the relationship between the listeners' loudspeaker preferences and a set of anechoic measurements performed on the loudspeakers used in that study.

The mean loudspeaker preference ratings and 95% confidence intervals, averaged across all listeners, are plotted for each of the four loudspeakers (see the graph to the right). According to the definition of the preference scale, listeners liked loudspeakers P and I, were relatively neutral towards loudspeaker B, and they disliked loudspeaker M.
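For readers who want to see how a mean rating and its 95% confidence interval are computed, here is a minimal sketch. The ratings are hypothetical, not data from the study, and the 11-point scale is assumed to run from 0 to 10. The interval uses a normal approximation, which is reasonable for large listener panels; the published analysis may have used a different method.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(ratings, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation confidence interval."""
    n = len(ratings)
    # Two-sided critical value; about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(ratings) / sqrt(n)
    return mean(ratings), half_width


# Hypothetical preference ratings for one loudspeaker (not study data).
ratings_p = [7, 8, 6, 7, 9, 7, 8, 6, 7, 8]

m, hw = mean_ci(ratings_p)
print(f"mean = {m:.2f}, 95% CI = [{m - hw:.2f}, {m + hw:.2f}]")
```

When the intervals of two loudspeakers do not overlap, the difference in their mean ratings is unlikely to be due to chance alone.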

The next graph on the right shows a set of anechoic measurements for each of the four loudspeakers P, I, B, and M, shown in descending order based on their subjective preference rating. Each loudspeaker was measured at 70 different angles around its horizontal and vertical orbits in order to fully characterize the quality of its on- and off-axis sound, and to allow acoustical interference effects to be separated from resonances, which can cause harmful colorations in the reproduced sound. These resonances appear as peaks and dips in the frequency response. In each graph, the frequency curves represent, from top to bottom, the quality of the direct sound, the average listening window, the first reflections, the sound power, and the directivity indices for the first reflections and the sound power. The reader is referred to references 2-4 for more background on how these measurements were derived and experimentally validated through controlled listening tests.

There are clear visual correlations between listeners' loudspeaker preferences and the set of frequency graphs. Both trained and untrained listeners clearly preferred the loudspeakers with the flattest, smoothest and most extended frequency response curves, as exhibited in the measurements of loudspeakers P and I. Loudspeaker B was rated lower due to its less extended, bumpy bass, and a large hole centered at 3 kHz in its sound power curve. The measurements of Loudspeaker M indicate it has a lack of low bass, and has a non-smooth frequency response in all of its measured curves. Both the direct and reflected sounds produced by this loudspeaker will contribute serious colorations to the timbre of reproduced sounds.

It is both satisfying and reassuring to know that both trained and untrained listeners recognize and prefer accurate loudspeakers, and that the accuracy can be characterized with a set of comprehensive anechoic measurements. The next logical step is to use these technical measurements as the basis for modeling and predicting listeners' preference ratings. This will be the topic of a future post in this blog.

References

[1] Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. AES, Vol. 51, issue 9, pp. 806-825, September 2003. (download for free courtesy of Harman International)

[2] Floyd E. Toole, "Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 1," J. AES, Vol. 34, issue 4, pp. 227-235, April 1986. (download for free courtesy of Harman International)

[3] Floyd E. Toole, "Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 2," J. AES, Vol. 34, issue 5, pp. 323-348, May 1986. (download for free courtesy of Harman International)

[4] Allan Devantier, "Characterizing the Amplitude Response of Loudspeaker Systems," presented at the 113th AES Convention, October 2002.

Saturday, December 27, 2008

Part 2 - Differences in Performances of Trained Versus Untrained Listeners

Part 1 of this article summarized the results of a controlled listening experiment conducted by the author where 300+ listeners, both trained and untrained, rated 4 different loudspeakers based on their preference. The results revealed that the trained and untrained listeners had essentially the same loudspeaker preferences (see reference 1). This provides scientific validation for using trained listeners in loudspeaker tests since their preferences can be safely extrapolated to the preferences of the general population of untrained listeners.

In Part 2, we examine why trained listeners are preferred over untrained listeners for use in listening experiments, by examining differences in performance between the two groups. A common performance metric is the F-statistic, calculated by performing an analysis of variance (ANOVA) on the individual listener's loudspeaker ratings. The F-statistic increases in size as the listener's discrimination and reliability increase. This facet of listener performance is highly desirable for scientists (and bean counters) since fewer listeners and trials are required to achieve an equivalent level of statistical confidence. Some researchers have reported that one trained listener is the statistical equivalent of using 8+ untrained listeners, which translates into considerable cost savings from using trained listeners for audio product testing and research.
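To make the F-statistic concrete, here is a minimal sketch of a one-way ANOVA on a single listener's ratings, with one group of repeated ratings per loudspeaker. This is a simplification: the analysis in the paper may have used a more elaborate model (for example, one that also includes the music program as a factor). The function name and both sets of ratings are hypothetical.

```python
from statistics import mean


def one_way_f(groups):
    """F-statistic of a one-way ANOVA; each group holds one loudspeaker's ratings."""
    all_ratings = [r for g in groups for r in g]
    grand_mean = mean(all_ratings)
    k, n = len(groups), len(all_ratings)
    # Variation between loudspeakers: how far apart the listener rates them.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation within a loudspeaker: how inconsistent the repeated ratings are.
    ss_within = sum((r - mean(g)) ** 2 for g in groups for r in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical listener who separates the loudspeakers and repeats ratings closely.
discriminating = [[8, 8, 7, 8], [7, 7, 8, 7], [5, 5, 4, 5], [2, 3, 2, 2]]
# Hypothetical listener who rates every loudspeaker about the same.
undiscriminating = [[8, 7, 9, 8], [8, 9, 7, 8], [7, 8, 8, 9], [7, 9, 8, 7]]

f_disc = one_way_f(discriminating)
f_undisc = one_way_f(undiscriminating)
print(f"discriminating listener:   F = {f_disc:.2f}")
print(f"undiscriminating listener: F = {f_undisc:.2f}")
```

The first listener gets a much larger F because the spread between loudspeakers is large relative to the scatter in repeated ratings of the same loudspeaker, which is exactly the combination of discrimination and reliability described above.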

The above graph plots the mean loudspeaker F-statistics for 4 groups of untrained listeners categorized according to their occupations. The performance scores of the untrained groups are scaled relative to the mean scores of the trained listeners in order to facilitate comparisons between trained and untrained listeners. The trained listeners clearly performed better than any of the untrained groups, by quite a large margin. The relative performance of the untrained groups, from best to worst, was: the audio retailers (35%), the audio reviewers (20%), the audio marketing-sales group (10%), and the college students (4%).

The better performance of the audio retailers relative to the other untrained groups may be related to psychological factors such as motivation, expectations, and relevant critical listening experience. The college students - the poorest performing group - were also the youngest and least experienced test subjects. They tended to give all four loudspeakers very similar and very high ratings indicating they were easily satisfied. While this is pure speculation, the students may have had lower sound quality expectations developed through hours of listening to low quality MP3 files reproduced through band-limited earbuds. Most surprising was the relatively poor performance of the audio reviewers, who despite their credentials and years of professional experience, performed 1/5 as well as the trained listeners, and 15 full percentage points lower than the audio retailers. These differences in trained and untrained listener performance underscore the benefits of carefully selecting and training the listeners used for audio product testing and research.
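The relative scores quoted above can be turned into "times better" factors with simple arithmetic. The sketch below uses only the percentages reported in this post; the variable names are illustrative, and the factors are rough because the percentages themselves are rounded.

```python
# Performance of each untrained group as a percentage of the trained listeners' score.
relative_scores = {
    "audio retailers": 35,
    "audio reviewers": 20,
    "audio marketing-sales": 10,
    "college students": 4,
}

# A group scoring p% of the trained listeners is outperformed by a factor of 100 / p.
factors = {group: 100 / pct for group, pct in relative_scores.items()}
for group, factor in factors.items():
    print(f"{group}: trained listeners performed about {factor:.1f}x better")

# Gap between retailers and reviewers, in percentage points.
gap = relative_scores["audio retailers"] - relative_scores["audio reviewers"]
print(f"retailers vs. reviewers: {gap} percentage points")
```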

In the next installment of this article, technical measurements of the loudspeakers used in these experiments will be presented. From this, we will explore what aspects of their performance lead to higher preference ratings in controlled listening tests.

Reference 1: Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. AES, Vol. 51, issue 9, pp. 806-825, September 2003. (This paper can be purchased from the Audio Engineering Society here, or downloaded for free courtesy of Harman International.)

Friday, December 26, 2008

Part 1 - Do Untrained Listeners Prefer the Same Loudspeakers as Trained Listeners?

One of the more controversial topics among audio researchers is whether or not trained listeners should be used for audio product testing and research. The argument against using trained listeners is based on a belief that their tastes and preferences in sound quality are fundamentally different from those of the general untrained listener population for whom the product is intended.

There are few published studies to support the notion that trained listeners have different loudspeaker preferences than untrained listeners. To study this question, the author conducted a large study (see reference 1) that compared the loudspeaker preferences of 300+ untrained and trained listeners. Over the course of 18 months, an identical controlled, double-blind listening test was repeated with different groups of trained and untrained listeners who rated 4 different loudspeakers on an 11-point preference scale using 4 different music programs. Loudspeaker positional effects were controlled via an automated speaker shuffler that moves each loudspeaker into the exact same position.

The mean loudspeaker preference ratings for the different groups of listeners are summarized in the above graph. In terms of rank order, the loudspeaker preferences of the untrained listeners (highlighted in red) are essentially the same as those of the trained listeners (highlighted in blue). As a group, the trained listeners tended to give lower ratings, suggesting they may be more difficult to please. An important conclusion from this study is that the loudspeaker preferences of trained listeners can be safely extrapolated to the tastes of consumers having little or no formal listener training. The study did find significant differences between the trained and untrained listeners in terms of how well they performed their listening task. This will be discussed in Part 2, which will appear in the next posting of this blog.

Reference 1: Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. AES, Vol. 51, issue 9, pp. 806-825, September 2003.

This paper can be purchased from the Audio Engineering Society here, or downloaded for free courtesy of Harman International.

Tuesday, December 23, 2008

Welcome to My Blog on The Science of Sound Recording and Reproduction

This blog is concerned with all matters related to the quality of recorded and reproduced sound. Some of the topics I hope to cover in upcoming posts include recording technology, listening tests, loudspeakers, headphones, automotive audio, and acoustical interactions between loudspeakers and listening rooms.

I am an audio scientist by profession, and in matters related to sound quality, I prefer to draw conclusions based on hard scientific evidence gathered through properly controlled listening tests and meaningful objective measurements. Unfortunately, most of the audio industry doesn't operate this way. Why not? Quality subjective and objective measurements require significant investments in time, facilities, and expertise, whereas opinions on sound quality cost almost nothing. Sometimes you get what you pay for.

I'm particularly interested in the psychoacoustics of audio (i.e. the relationship between human perception and the measurement of sound). Here, controlled listening tests play an important role since they permit scientists to make accurate, reliable and valid correlations between listeners' preferences and the variables being tested (e.g. different loudspeakers, room treatments, etc.). From these listening tests will hopefully emerge a set of measurement and design rules from which the audio chain can be consistently optimized to produce a quality listening experience.

I hope the reader will find this blog educational and entertaining.

Note: The above photograph shows a listener auditioning different loudspeakers in Harman International's Multichannel Listening Lab. Loudspeaker positional effects are controlled by an automated speaker mover that shuffles each loudspeaker into the exact same position within 3 seconds. During the test, an acoustically transparent but visually opaque curtain (shown in the up position here) is dropped in front of the loudspeakers so that the listener is not biased by visual factors such as loudspeaker size, brand, price, etc.