Thursday, December 9, 2010

How to Listen: A Course on How to Critically Evaluate the Quality of Recorded and Reproduced Sound

Next month, I will be giving a one-day course on How to Listen at the 2011 ALMA Winter Symposium in Las Vegas, held Jan. 4th and 5th, just prior to the CES show. The symposium will also feature other courses, workshops and paper sessions on loudspeaker and headphone design, testing and evaluation. You can register for my course and other events at the ALMA Symposium here. Below is a brief preface for my course How to Listen, which I encourage you to attend.

Figure 1: A listener training in the Harman International Reference Listening Room using the original version of the How to Listen training software.

Whether you are involved in the mixing of live and recorded sound, the design and calibration of sound systems, or just shopping for a new audio system, the question “Does it sound good?” is usually foremost on your mind. With sufficient prodding, most people can offer an opinion on the overall sound quality of a recording and its reproduction. Beyond that, listeners generally lack the necessary experience, training and vocabulary to describe which specific aspects of the sound they like and dislike. Sadly, the audio industry has no standardized terminology that allows musicians, audio engineers and audiophiles to communicate with each other about sound quality in a concise and meaningful way. Courses in critical listening are not commonly available in audio programs at universities, and are even less available to the general public. In summary, there is a real need for a comprehensive course that teaches audio enthusiasts how to critically evaluate sound quality.

A Scientific Approach Towards Training Listeners
To address this need, the author has developed a critical listening course called How to Listen. The course aims to teach students how to evaluate sound quality using percepts well established in the field of auditory perception. These sound quality percepts are taught and demonstrated in a controlled way using real-time processing of recorded sounds. This has two benefits. First, the intensity of each attribute can be adjusted according to the aptitude and performance of the listener. Second, by closely tying the physical properties of the stimulus to its perception and evaluation (the science known as psychoacoustics), the training approach rests on a theoretical basis. For example, the listener training data can be used to better understand how we perceive sound quality, which physical aspects of sound matter most to its perceived quality, and possibly to identify the important underlying sound quality attributes that influence our preferences. Critical listening is treated as a science, rather than the black art it currently is.
How to Listen also includes classroom topics in the fundamentals of human auditory perception, sound quality research in variables that significantly influence the quality of recorded and reproduced sound (e.g. loudspeakers, rooms, recordings, microphones) and a brief tutorial in how to conduct sound quality listening tests that produce accurate, reliable and valid results.
But before we get too far ahead of ourselves, there must be good reasons for training listeners since it requires an investment in time and resources. There is also the question of external validity: Can the sound quality preferences of trained listeners be extrapolated to the preferences of untrained listeners, and does this hold true across different cultures? These questions will be answered in the following sections.
Why Train Listeners?
There are several compelling reasons for training listeners. First, trained listeners have been shown to produce more discriminating and reliable judgments of sound quality than untrained listeners [1]. Fewer listeners are needed to achieve a similar level of statistical confidence, which can result in savings in time and money. For example, a panel of 15 trained listeners can provide sound quality ratings with reliable statistical confidence in less than 8 hours. To achieve a similar level of confidence using untrained listeners would require about 10 times more listeners, 10 times more days to complete the testing, and 10 times more money to pay the listeners and the staff conducting the tests. If the study is conducted by an independent research firm using 200-300 untrained listeners, the cost can easily exceed $100k.
A second reason for training listeners is that they are able to report precisely what they like and dislike about the sound quality using well-defined, meaningful terms. This feedback can provide important guidance for reengineering the product for optimal sound quality.
Besides training listeners for product research, there are benefits in training audio marketing and sales people to become better critical listeners. Training makes them better equipped to communicate sound quality issues to audio engineers and customers. As audio companies expand sales and operations in China, India, and other developing countries, there is a growing need to develop a common cross-cultural understanding as to what constitutes good sound and unacceptable sound.
Does Training Bias Listeners?
An important question is whether the training process itself biases the sound quality preferences of listeners. If the trained listeners' preferences differ from those of the targeted demographic, there is a danger the product may not be well received in the marketplace. This raises the age-old question, “Is preference in sound quality a matter of personal taste - much like food, wine and music - or is it universal?”
To study this question, the author compared the performances and loudspeaker preferences of trained listeners versus untrained listeners [1]. Over 300 untrained listeners were tested over a period of 18 months where they compared four different loudspeakers under controlled, double-blind listening conditions. Their preferences were then compared to the preferences of the trained Harman listening panel.
The results, plotted in Figure 2, show that the rank ordering of the loudspeaker preferences were the same for both the trained and untrained listeners. There were two main differences in how the two groups of listeners responded. First, the trained listeners tended to give lower loudspeaker ratings overall. Second, the trained listeners distinguished themselves from the untrained listeners by generally giving more discriminating and consistent loudspeaker preference ratings.

Figure 2: The mean loudspeaker preference ratings and 95% confidence intervals are shown for four loudspeakers evaluated in a controlled, double-blind listening test. The results of different groups of untrained listeners are compared to those of the 12 Harman listeners.

Relative Performances of Trained Versus Untrained Listeners
A common performance metric used to quantify the listener’s discrimination and consistency in rating sound quality is the F-statistic. This calculation is done by performing an analysis of variance (ANOVA) on the main variable being tested. In the above study [1], the performances of trained versus untrained listeners were compared by calculating the loudspeaker F-statistic for each individual listener.
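As a sketch of how such a per-listener F-statistic can be computed, a one-way ANOVA on each listener's repeated loudspeaker ratings does the job. The ratings below are made-up for illustration, not data from the study:

```python
# Hypothetical sketch: a per-listener loudspeaker F-statistic via one-way
# ANOVA. The ratings are invented for illustration only.
from scipy.stats import f_oneway

# One listener's repeated preference ratings (0-10 scale) of four loudspeakers.
ratings = {
    "A": [7.5, 7.0, 7.8, 7.2],
    "B": [6.0, 5.5, 6.2, 5.8],
    "C": [8.5, 8.8, 8.2, 8.6],
    "D": [4.0, 4.5, 3.8, 4.2],
}

# A large F (between-loudspeaker variance relative to trial-to-trial
# variance) indicates a discriminating and consistent listener.
f_stat, p_value = f_oneway(*ratings.values())
print(f"F = {f_stat:.1f}, p = {p_value:.4f}")
```

A listener who rated every loudspeaker about the same (as the college students in the study tended to do) would produce a small F-statistic, regardless of how high their ratings were.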

Figure 3 shows the relative performance of different groups of untrained listeners based on their mean F-statistics compared to the F-statistics of the trained listeners. The relative performances of the untrained groups were: audio retailers (35%), audio reviewers (20%), audio marketing/sales staff (10%), and college students (4%). The poor performance of the students was explained by their tendency to give all four loudspeakers very similar, high ratings. A likely explanation is that they were experiencing a level of sound quality much higher than their everyday listening experience: compressed MP3 music reproduced through headphones. The good news is that, judging by their high ratings, the students seemed to appreciate the higher-fidelity sound. In time, they will hopefully seek out better quality audio systems.

Figure 3: The relative performance of different groups of untrained listeners compared to the trained Harman listeners. Performance is based on the group’s average loudspeaker F-statistic which represents their ability to give discriminating and consistent preference ratings.

Are There Cross-Cultural Preferences in Sound Quality?
One of the oldest controversies in audio is the notion that different cultures or geographical regions of the world have different sound quality preferences [see reference 2]. For example, it is often claimed that Japanese listeners have different loudspeaker preferences than Americans due to differences in language, music, cultural practices and norms, and the acoustics of their homes. So far, very little formal research has been done on this subject. In some preliminary studies, the author has found no significant differences in sound quality preferences for loudspeakers and automotive audio systems among Chinese, Japanese and American listeners.
How to Listen: A New Listener Training Software Application
Research has found that most sound quality percepts fall under the attribute categories of timbre, spatial, dynamics, or nonlinear distortion. Within these four attributes there are sub-attributes that describe more specific sonic characteristics. For example, Bright-Dull and Full-Thin are timbre sub-attributes related to the relative emphasis and de-emphasis of high and low frequencies, respectively. Sub-attributes for spatial quality deal with the location and width of the auditory image(s), and the perceived sense of spaciousness or envelopment. Distortion sub-attributes include the presence of noise, hum, audible clipping and distortions specific to the audio device(s) under test.
How to Listen focuses on teaching listeners to evaluate sound quality differences based on these four attributes and their sub-attributes (see Figure 4). While listening to music recordings, one or more attributes are manipulated in a controlled way so that listeners learn to recognize and report the magnitude of these changes using the appropriate terms and scales. An analogy to this would be the Wine Aroma Wheel, where expert wine tasters are trained to identify the intensities of different aroma-flavors perceived in the wine.

Figure 4: A list of the 17 different training tasks that focus on one or more of the four sound quality attributes: spectral (timbral), spatial, distortion and dynamics.

To facilitate the training process, a proprietary computer-based software program called “How to Listen” was developed by Harman software engineers Sean Hess and Eric Hu. The software runs on both Mac and PC computers, and can play both stereo and multichannel music files. A DSP engine built into the application allows real-time manipulation of sound quality attributes in response to the listener's responses and performance.
There are currently five different types of training tasks that focus on one or more sound quality attributes (see Figure 4):
  1. Band Identification
  2. Spectral Plot
  3. Spatial Mapping
  4. Attribute Test
  5. Preference Test
Band Identification (see Figure 5) teaches listeners to identify spectral distortions based on their frequency, level and Q-factor using combinations of peak/dip and highpass/lowpass filters. In each trial, the listener compares the unequalized version of the music track (FLAT) to a version that has been equalized (EQ) using one of the filters drawn on the screen. The listener must select the filter (Filter 1 or 2) they believe has been applied to the equalized version.

Figure 5: A screen capture of the listener training task “Band Identification” in Harman’s “How to Listen” training software. The listener compares the unequalized music “Flat” to an equalized version (EQ) and must select the EQ filter that is associated with its sound.
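Peak/dip filters of this kind are conventionally parametric (biquad) equalizers defined by a center frequency, gain and Q. As an illustrative sketch, and not the actual DSP inside the How to Listen software, the standard Audio EQ Cookbook peaking filter can be implemented and verified like this:

```python
# Illustrative sketch of a peak/dip biquad (RBJ Audio EQ Cookbook formulas);
# this is NOT the actual DSP code used in the "How to Listen" software.
import numpy as np
from scipy.signal import lfilter

def peaking_eq(fc, gain_db, q, fs=48000.0):
    """Return (b, a) coefficients for a peak (gain_db > 0) or dip (< 0) at fc Hz."""
    A = 10.0 ** (gain_db / 40.0)           # sqrt of the linear gain
    w0 = 2.0 * np.pi * fc / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

# Apply a +6 dB, Q = 2 peak at 1 kHz to a test tone (the "EQ" version).
b, a = peaking_eq(fc=1000.0, gain_db=6.0, q=2.0)
t = np.arange(48000) / 48000.0
flat = np.sin(2 * np.pi * 1000.0 * t)      # 1 kHz tone: the "FLAT" version
eq = lfilter(b, a, flat)

# At the filter's center frequency the tone should emerge ~6 dB louder
# (measured over the second half of the signal to skip the filter transient).
boost_db = 20 * np.log10(np.std(eq[24000:]) / np.std(flat[24000:]))
print(f"boost at 1 kHz: {boost_db:.2f} dB")
```

Varying `fc`, `gain_db` and `q` in such a sketch mirrors how the training software can scale the audibility of a spectral distortion up or down to match the listener's performance.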

The difficulty of each training task automatically increases or decreases based on the listener's performance. The listener is given immediate feedback on their responses, and they can audition all possible response choices when they enter an incorrect response.

The training task Spectral Plot requires the listener to compare different music programs that have been equalized in a number of different ways. The listener must select the equalization curve that best matches the sound quality they hear. This task teaches listeners to behave like human spectrum analyzers. Once fully trained, the listener can draw a graph of the audio system's frequency response based on how it sounds.
The Spatial Mapping task requires the listener to graphically indicate on a two-dimensional map where a sound appears in the listening space. The Attribute training task requires the listener to correctly rank order two or more sounds on a given attribute scale based on the intensity of the attribute (e.g. bright-dull). For the Preference task, the listener must give preference ratings to music whose sound quality has been modified along one or more sound quality attributes. The performance of the listener is calculated using a statistical post-hoc test that determines the discrimination and reliability of the listener's preference ratings. Together, these different training tasks teach listeners to critically evaluate any type of sound quality variation they are likely to encounter when listening to recorded and reproduced sound.
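One simple way to score a rank-ordering task of this kind is a rank correlation between the listener's ordering and the true intensities that were applied. This is a hypothetical sketch with invented data, not the scoring method used in the software:

```python
# Hypothetical sketch of scoring an attribute rank-ordering task with a
# Spearman rank correlation. The stimuli and responses are invented.
from scipy.stats import spearmanr

# True high-frequency tilt (dB) applied to four versions of a program,
# from dull (negative) to bright (positive).
true_tilt_db = [-6.0, -2.0, 2.0, 6.0]

# Listener's ranking from dullest (1) to brightest (4): one pair swapped.
listener_rank = [1, 3, 2, 4]

# rho = 1.0 means a perfect ordering; lower values mean more confusions.
rho, _ = spearmanr(true_tilt_db, listener_rank)
print(f"rank correlation: {rho:.2f}")
```

With one adjacent pair swapped out of four stimuli, the score here comes out at 0.80 rather than a perfect 1.0.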

The evaluation of sound quality remains an elusive art in the audio industry. Better awareness, understanding, and appreciation of sound quality may be possible if there existed a method to teach listeners how to evaluate the quality of reproduced sound and report what they hear using well-defined and meaningful terminology. How to Listen is a listener training course that aims to achieve those goals. Listeners are taught to identify and rate audible changes to different sound quality percepts related to the spectral, spatial, dynamic and distortion qualities of recorded music. Performance metrics based on the discrimination, accuracy and reliability of the listeners' responses are factored into whether the listener meets the criterion of being a “trained” listener. The question of whether a listener is truly golden-eared is no longer a matter of conjecture and debate, since How to Listen will ultimately reveal the true answer.
[1] Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. Audio Eng. Soc., Vol. 51, No. 9, pp. 806-825, September 2003. Download for free here, courtesy of Harman International.
[2] Sean E. Olive, “Are There Cross-Cultural Preferences in the Quality of Reproduced Sound?” Audio Musings, July 2,

Saturday, September 18, 2010

Harman Debunks Youthful Music Myths

Robert Archer, a writer at CEPro magazine, has written a nice article called "Harman Debunks Youthful Music Myths." The article is based on an interview he did with me a couple of weeks ago, and summarizes some recent Harman research on Generation Y's sound quality preferences for different digital music file formats (MP3 versus CD) and loudspeakers. The details of the preliminary research were first reported back in June in a blog posting, "Some New Evidence that Generation Y May Prefer Accurate Sound Reproduction."

The early results of that research suggest that today's youth prefer higher quality music formats and accurate loudspeakers when given the opportunity to A/B them under controlled, double-blind listening conditions. While it is refreshing news that good sound is not lost on today's youth, the challenge is to figure out how to market and sell it to them.

Unfortunately, good A/B audio demonstrations are becoming nearly extinct. Internet and Big Box store sales of audio equipment and music generally don't provide such listening opportunities. In the end, consumer education, meaningful audio specifications and measurements that are indicative of a product's true sound quality, and accurate, unbiased product reviews, will help consumers make more informed audio and music purchase decisions as they relate to sound quality. Until then, most consumers will never know for sure whether or not they've purchased something that is truly "good enough."

Tuesday, July 13, 2010

The Danger From Headphones

Below is an English translation of a recent article "Gefahr aus dem Kopfhörer" (The Danger From Headphones) written by Matthias Hohensee over at Valley Talk. His article refers to my recent investigations into whether younger generations prefer lossy MP3 over higher quality music file formats. The preliminary results of that study were reported in the article I recently posted called, “Some New Evidence that Generation Y May Prefer Accurate Sound Reproduction”.
Matthias makes a good point about listener preference for MP3 becoming a moot issue with higher quality file formats becoming the standard, as bandwidth and music storage costs drop. I only briefly mentioned this in my slide presentation (see slide 7), but it deserves repeating. The days of low quality music downloads are numbered, I hope. Then, the main sound quality issue will become the recordings themselves, and the quality of the headphones and loudspeakers through which the recordings are heard. What are your thoughts on this matter?

The Danger from Headphones
by Matthias Hohensee
from Valley Talk 6.30.2010
Can the Germans really be proud of MP3, or has the digital stroke of genius desensitized the hearing of an entire generation?

When Angela Merkel recently visited the prestigious Stanford University in Silicon Valley and listed German technological achievements, she also mentioned the MP3 data compression method. The technology, largely developed by scientists at the Fraunhofer Institute, has changed the music industry, even though it is mainly U.S. companies that profit from the MP3 player market.
But can the Germans really be proud of MP3? Or has the digital stroke of genius desensitized the hearing of an entire generation? At least the observations of Jonathan Berger suggest this. Over the years, the Stanford professor of music has been asking his students if they are satisfied with compressed music files, or if they prefer the full hi-fi sound.
He came to a surprising result: year after year, the number of those who preferred the sound of ‘packed’ music to the uncompressed audio spectrum seemed to grow steadily. Berger concludes that the taste in sound has changed.
Good sound is measurable
Sean Olive, on the other hand, considers Berger's insight nonsense: "Good, accurately reproduced sound is not a question of taste, but scientifically measurable." And that is how he has to see it. After all, Olive is the head of acoustic research at Harman International, the U.S. manufacturer considered to be THE name in sophisticated sound systems.
Alarmed by Berger's observations, Olive recently invited Los Angeles high school students to the Harman studios for extensive tests. "Everyone could hear the difference between differently compressed sound files - and preferred the less compressed songs," says the scientist, relieved.

Danger from headphones
Now, Olive is not really unbiased; after all, Harman sells nearly three billion dollars worth of high-end audio technology per year. But in fact, technical progress is already making Olive's worries obsolete. In times of high-speed Internet, data compression does not play the same role as it did in the nineties, when the music piracy service Napster made MP3 popular.
The songs that were exchanged back then were extremely compressed in order to distribute them over the still-slow Internet connections - but also to spare the limited memory of computers and MP3 players. Today, vendors such as Apple and Amazon are selling songs formatted in such a way that only real audiophiles can hear the difference from music CDs.
And so the real dangers for the hearing of ‘generation iPod’ aren’t the highly compressed music files, but simply the volume adjustment of their headphones.

Acknowledgements: Thank you to the author Matthias Hohensee for permission to repost his article here, and to Alena Winterhoff for the English translation.

Friday, July 9, 2010

Why Live-versus-Recorded Listening Tests Don't Work

Figure 1: Singer Frieda Hempel conducting a Tone Test at Edison Studios, NYC in 1918. Note that many of the listeners' ears are covered by the blindfolds, making it a double-blind and double-deaf listening test, since the experimenter Edison was deaf himself.

Recently I was asked how I could possibly prove or assert that listeners prefer accurate loudspeakers without having performed a live-versus-recorded listening test. This is a test where the listener compares a live musical performance to a recording of the performance reproduced through loudspeakers. The closer the sound quality of the reproduction is to that of the live performance, the more accurate the loudspeaker is deemed to be - at least in theory. In practice, these tests are usually riddled with so many uncontrolled listening test nuisance variables that the results are essentially meaningless. This article examines why live-versus-recorded listening tests are not suitable for serious scientific investigations of the perceived sound quality of recorded and reproduced sound.

Edison’s Tone Tests: “People will hear what you tell them to hear”
Thomas Edison was among the first audio engineers to embrace live-versus-recorded demonstrations. In 1910, he invented the Edison Diamond Disc Phonograph, which he claimed had “no tone” of its own. To prove it, a series of road shows involving 4,000 live-versus-recorded demonstrations of his phonograph were conducted in auditoriums across the United States. At some point during the live music performance there would be a switchover to the recorded performance, and apparently audience members could not tell the difference between the live and recorded performances.

After a 1916 live-versus-recorded demonstration in Carnegie Hall, the New York Evening Mail stated “the ear could not tell when it was listening to the phonograph alone, and when to actual voice and reproduction together. Only the eye could discover the truth by noting when the singer’s mouth was open or closed” [1].

By today’s standards, the fidelity of Edison’s disc phonograph was egregiously poor in terms of its noise, distortion, limited dynamic range, bandwidth and frequency response (you can hear some of Edison’s recordings online here). It’s hard to imagine that listeners were fooled into thinking his Diamond Disc recording was indistinguishable from the live performance. In fact, we now know that Edison manipulated the tests to produce the results he wanted. First, he carefully chose the music and musicians to work within the technical limitations of his technology. Edison detested music with extreme dynamics, high tones, vibrato and complex textures because they were a challenge to his deafness and his Tone Tests. He selected and coached musicians to mimic the sound of their recordings to minimize the audible differences between live and recorded performances [1],[2].

Second, Edison was the consummate audio salesman and was known to say, “People will hear what you tell them to hear” [2]. The expectations and perceptions of his listeners were manipulated before the test to produce a more predictable outcome. Audience members were given a concert program before his Tone Tests that told them exactly what they would hear, how amazing it would sound, and what an appropriate response would be:

“Those who hear this test will realize fully for the first time how literally true it is that Mr. Edison has made possible the re-creation of the artist’s voice. No more exacting test could be made to demonstrate that the New Edison actually does re-create the voice of the artist than to play it side by side with the artist who made the records. This is the final proof. Close your eyes. See if you can distinguish the voice of the New Edison from that of the artist. Did you ever believe it possible to re-create a voice? Note that the voice of the artist and the voice of the Edison are indistinguishable” [emphasis is mine] [3].

Figure 2: Another Edison Tone Test where extraneous biases related to sight and smell may have compromised the results based on the large number of listeners covering their noses. Perhaps a bad case of singer's halitosis made it possible to identify the live performance from the recorded one based on smell alone?

Other Live-Versus-Recorded Demonstrations

Following Edison’s live-versus-recorded demonstrations, other tests were conducted by Harry Olson at RCA, and by G.A. Briggs (Wharfedale) and Peter Walker at Quad in the 1950s [4]. A common problem with these demonstrations was double reverberation: the reverberation of the room was heard both in the recording, and again when it was reproduced through loudspeakers in the same room. This made it easier for listeners to tell the difference between the recorded and live performances.

Acoustic Research's Live-Versus-Recorded Demonstrations

During the 1960’s, Acoustic Research (AR), an American loudspeaker company, performed over 75 live-versus-recorded concerts in cities around the USA featuring The Fine Arts String Quartet, and the AR-3 loudspeaker [5],[6]. To solve the double reverberation problem, the recordings of the quartet were made in an anechoic chamber, or outdoors. Outdoor live-versus-recorded demonstrations had the added benefit that there were no room reflections in either the recording or the live performance. This made the demonstrations less sensitive to off-axis problems in the microphones and loudspeakers. It also relaxed the demands on the recording-reproduction to accurately capture and reproduce the complex spatial properties of a reverberant performing space.

The AR demonstrations apparently generated an enormous amount of free publicity in newspapers and audio magazines, where it was reported that the reproduction of the recordings was virtually indistinguishable from the live performance. AR sales increased dramatically, to the point where in 1966 AR apparently held a 32% share of the loudspeaker market in the United States.

A Live-Versus-Recorded Method For Testing Loudspeaker Accuracy

Edgar Villchur, head of Acoustic Research, to his credit, was a firm believer that loudspeakers should accurately reproduce the art (the recorded music) and not editorialize or enhance it. In a 1962 paper, he described a live-versus-recorded method for evaluating the accuracy of loudspeakers [7]. The method used a reference loudspeaker (the live performance) that was placed in the listening room with the loudspeaker-under-test. The goal of the loudspeaker-under-test was to accurately reproduce a previous recording of the reference loudspeaker playing white noise in an anechoic chamber. The original white noise signal was also fed to the reference loudspeaker during the listening test. The more similar the loudspeaker-under-test sounded to the reference speaker, the more accurate it was deemed to be, at least in theory.

Villchur acknowledged that the sensitivity and validity of the method depended on the quality of the reference loudspeaker, its directivity, and the choice of program material. White noise was more revealing of loudspeaker inaccuracies than music. His reference loudspeaker consisted of a single 2-inch midrange driver from an AR-3 loudspeaker, selected because he found that using multiple drivers caused acoustical interference that was audible in the anechoic chamber but not so audible in a reverberant listening room; these differences would produce errors in the listening test. One wonders how a tiny 2-inch driver could have produced adequate high treble and low bass without distortion. These limitations would significantly limit the accuracy and usefulness of this listening test method.

Another problem with this method was that the anechoic loudspeaker recordings were made at a single point in space, and did not capture the directivity and off-axis characteristics of the reference loudspeaker. Unless the speaker-under-test had the same directivity and off-axis characteristics as the reference loudspeaker, it could never sound exactly the same in a reflective listening room. To compensate for these errors, Villchur used a trial-and-error process to find the microphone position relative to the reference loudspeaker where the timbre of the anechoic recording best matched the timbre of the reference loudspeaker when placed in a room. Adjusting the recording to mimic the sound of the live performance was the reverse of what Edison’s musicians did, but it produced essentially the same bias. (Edison would have been proud!)

Finally, it is not clear how Villchur controlled loudspeaker positional biases when comparing the reference loudspeaker to the loudspeaker-under-test. Loudspeaker positional biases have been shown to produce audible effects that are sometimes larger than the audible differences between the loudspeakers under test [9]. At Harman, these positional biases are eliminated via an automated speaker shuffler that places each loudspeaker in the same position in the room.

Summary of Problems with Live-versus-Recorded Tests

By today’s standards, the live-versus-recorded tests performed to date lack the necessary scientific controls and rigor to consider their results or conclusions accurate, repeatable and valid. Below are a few of the most significant psychological, physical, methodological or experimental listening variables that plague these types of tests. While it is possible to control some of these variables, others are either impossible, impractical or too expensive to control.

Sighted and Cross-Modality Biases

To date, most live-versus-recorded tests have been performed sighted, where non-auditory cues were available that allowed the listener to identify whether they were hearing the live or the reproduced sound source. These tests could easily have been made blind with an acoustically transparent curtain; however, scientific validity was apparently not the primary purpose of the tests. The visual cues from the musicians (bowing, lip syncing) would also enhance the realism and presence of the reproduction, a well-known cognitive effect observed in research on binaural and virtual reality displays.

Listener Expectation, Authority Bias, Group Interaction Bias

In many of the public live-versus-recorded demonstrations, listeners' expectations were manipulated by knowledge given to them by the organizers of the demonstrations. In some cases, listeners were told what the expected response should be before the test began (see Edison's concert programs above). In large group settings, listeners' responses can be easily swayed by the opinions and reactions of other members of the group (a herd mentality), especially when an authority figure is present. These biases are easily removed from live-versus-recorded tests by repeating the test for each individual listener; however, the live and recorded performances would then have to be replicated for every listener, which makes the tests too difficult, expensive, time consuming, and impractical to use.

Qualifications of Listeners

None of the live-versus-recorded tests I've read about have reported the hearing and critical listening qualifications of the listeners who participated in them. These are important variables in the sensitivity and reliability of the test results, and can be easily quantified.

Live and Recorded Performances Must Be Identical

For live-versus-recorded tests to be valid, the live and recorded performances should be identical, having the same notes, intonation, tempo, dynamics, loudness, balance between instruments, and the same location and sense of space of the instruments. Otherwise, there are extraneous cues that allow listeners to readily identify the live and recorded performances. MIDI-controlled instruments (e.g. player pianos) are but one example of how this problem could be solved.

Positional Biases from Live and Reproduced Sound Sources

Unless the live and reproduced (e.g. loudspeakers) sound sources occupy the same physical locations, the listener can always identify the live versus recorded versions based on the localized positions of the sound sources.

Errors in the Recording

The usefulness of live-versus-recorded methods for perceptual measurements of sound quality in the playback chain is severely limited by errors in the recording, which are not easily separated from errors in the playback chain (see circle-of-confusion). Microphones and microphone techniques both introduce errors that limit the timbral, spatial and dynamic accuracy of the recordings through which we judge loudspeakers. Apparently, the most effective live-versus-recorded demonstrations were conducted outdoors - effectively an anechoic environment - where the off-axis performance of the microphones and loudspeakers, and the complex spatial cues of a reflective room, were largely removed as factors. However, results from outdoor live-versus-recorded tests cannot be generalized to how loudspeakers perform in real rooms, where the off-axis sound makes a significant contribution to the listener's impression of the loudspeaker.

Lack of Proper Scientific Protocols, Listener Response Data, Statistical Analysis, Results

A telling characteristic of live-versus-recorded tests is that they never seem to produce listener response data, statistical analysis or published results. Eyewitness reports written in newspapers or magazines do not constitute scientific evidence.

Accuracy is Not Applicable to Most Recordings Made Today

Most recordings made today are not intended to sound like the live performance. Anyone who heard Taylor Swift's live performance with Stevie Nicks at the 2010 Grammy Awards understands why. (Note: you can relive the magical moment on YouTube. Warning: it may be offensive to the musically inclined.) About 90% of commercial recordings are studio creations built from a series of overdubs and processed with auto-tuning, equalization, dynamic compression, and reverb sampled from an alien nation. For these recordings there is no equivalent live performance against which the reproduction can be judged for accuracy. The only reference is what the artist heard over the loudspeakers in the recording control room. If the important performance aspects of the playback system through which the art (the music and recording) was created can be reproduced in the home, the consumer will hear an accurate reproduction of the music as the artist intended. This is achievable if we adopt a "science in the service of art" philosophy towards audio recording and reproduction.


In reviewing the history of live-versus-recorded tests, most were elaborate sales and marketing demonstrations designed to convince listeners that a product sounded much better and more accurate than it actually was. While live-versus-recorded tests have proven their merit as an effective marketing and sales tool, they have not proven themselves as a serious method for scientific experiments intended to advance our psychoacoustic understanding of music recording and reproduction.

The reason, I believe, is that live-versus-recorded tests do not adequately control the important listening test nuisance variables, a prerequisite for accurate, reliable and scientifically valid results. It is no coincidence that (to my knowledge) none of the live-versus-recorded tests to date has produced a single scientific publication or any new psychoacoustic knowledge.

Hopefully, you now understand why I don’t conduct live-versus-recorded loudspeaker listening tests.


[1] Harvith, J., and Harvith, S., Edison, Musicians and the Phonograph: A Century in Retrospect, Greenwood Press, N.Y. (1987).

[2] Andre Milliard, "Edison's Tone Tests and the Ideal of Perfect Sound Reproduction," from Lost and Found Sound, NPR.

[3] Program for Edison Demonstration

[4] Wharfedale History:

[5] Acoustic Research

[6] Edgar Villchur,

[7] Villchur, Edgar, "A Method of Testing Loudspeakers with Random Noise," J. Audio Eng. Society, Vol. 10, Issue 4, pp. 306-309 (October 1962).

[8] Kissinger, John R., "The Development of the Simulated Live-vs-Recorded Test into a Design Tool," presented at the 35th AES Convention, preprint 609 (October 1968).

[9] Olive, Sean E.; Schuck, Peter L.; Sally, Sharon L.; Bonneville, Marc E., "The Effects of Loudspeaker Placement on Listeners' Preference Ratings," JAES, Vol. 42, Issue 9, pp. 651-669 (September 1994).

Wednesday, July 7, 2010

Harman Kardon's Quest to Standardize Sound

Above: Trained listener Alex Miller evaluates the sound quality of three loudspeakers in Harman's Multichannel Listening Lab. The automated speaker shuffler ensures that each speaker is heard in exactly the same position. The acoustically transparent but visually opaque scrim makes the tests double-blind, removing the influence of brand, price and other sighted biases. The computer randomly selects the presentation order of the speakers in each trial, and the listener controls the switching, so experimenter bias is removed from the test.
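The randomization step described above can be sketched in a few lines. This is a hypothetical illustration of the principle (the function name and parameters are my own invention, not Harman's actual test software): each trial gets an independent random ordering of the products so that position and order effects average out across the session.

```python
import random

def presentation_orders(products, n_trials, seed=None):
    """Generate an independent random presentation order of the
    products under test for each trial, so that order and position
    effects average out over the session."""
    rng = random.Random(seed)  # seeded for a reproducible session log
    return [rng.sample(products, len(products)) for _ in range(n_trials)]

# Example session: three hidden loudspeakers ("A", "B", "C"), four trials
orders = presentation_orders(["A", "B", "C"], n_trials=4, seed=7)
```

Each trial's order is a full permutation of the products; the listener hears and switches among them without knowing which physical product maps to which label.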

There is a nice story by Stuart Miles over at Pocket-lint called "Harman Kardon's Quest to Standardize Sound". Check it out and leave a comment on whether you think standards defining the sound quality of audio products would benefit consumers.

This story came out of a recent Harman Kardon press event held in Northridge, where the company kicked off its Science of Sound campaign, which aims to promote the science and philosophy behind how we develop and test our products.

A group of 18 European journalists attended the press event and received presentations from Dr. Floyd Toole and me about how good sound is not a matter of personal taste, but rather something that can be quantified through science-based listening tests, measurements, and psychoacoustics.

The journalists experienced how we train listeners and then participated in brief double-blind listening tests of the Harman Kardon MS100 and MAS100 in our Reference Listening Room and the Multichannel Listening Lab (shown in the picture above).

Saturday, July 3, 2010

Are There Cross-Cultural Preferences in The Quality of Reproduced Sound?

Do we need a new user menu where you dial in your nationality to match your taste in sound quality?
[click on image to see a larger version].

The field of audio is rife with myths and unsubstantiated opinions. One of the most enduring is the claim that there are cross-cultural preferences in the sound quality of reproduced sound. Some of the more common assertions I hear repeated among audiophiles, audio reviewers and audio marketing executives include these:
  1. Americans prefer more bass than Europeans and Japanese
  2. Japanese prefer less bass and more midrange (and listen at lower volumes)
  3. Germans prefer brighter sound
  4. The British prefer “tighter” or more over-damped bass
To my knowledge, these statements are anecdotal and have not been tested in any rigorous scientific way. Marketing has already given us misguided menus in media players and automotive head units that adjust the equalization based on music genre (e.g. jazz, classical, hip hop, rock, country, Christian, heavy metal). Do we really need another one based on where we were born? What would the "Canadian" sound have in common with a predisposition towards long cold winters, hockey, Molson beer, maple syrup, beaver tails, national health care, and the music of k.d. lang and Celine Dion?

While it is easy to dismiss the importance of cross-cultural preferences, the subject is gaining serious attention from audio manufacturers expanding into new markets such as China, India, Russia and South America. The same age-old questions are being asked: are there cross-cultural preferences in the quality of reproduced sound, or does good sound transcend cultural differences?

Possible Reasons Why Cross-Cultural Preferences in Sound Quality May Exist
Very little research in cross-cultural sound quality preferences exists. Nonetheless, here are some proposed reasons why they may exist according to various sources.
Language, Dialect, Music
Certain spectral balances may complement and enhance the timbre and intelligibility of different languages and dialects. Similarly, a culture's ethnic music and its instrumentation may be enhanced by certain loudspeakers or EQ. But wouldn't this enhancement be added to the recording by the artist or producer when it was mixed? If so, why duplicate it in the playback chain? And is there such a thing as too much enhancement (think Dolly Parton)?
Influence of Regional Building Construction and Room Acoustics
One explanation for regional tastes for certain types of loudspeakers is related to the design and construction of the region's homes and apartments. This would affect the noise isolation and acoustical properties of the room, and its interaction with the loudspeaker. Massive, rigid plaster walls commonly found in older construction in Europe would provide more noise isolation and less absorption of bass than less massive and rigid walls used in typical American construction today. It is argued that a loudspeaker with less bass might sound better in the European room. It should be pointed out that if the different rooms and loudspeakers combine in ways that in the final analysis produce the same sound, this doesn't really constitute a difference in preferred sound quality. Different means are being used to achieve the same end goal. Fortunately, there are technological solutions for dealing with loudspeaker-room interactions at low frequencies so that decent bass performance can be achieved regardless of the room’s size, dimensions and stiffness of its walls.
Influence of Social Norms and Practices
Cultural practices and norms may influence how much bass people like and how loud they listen to their music. For example, Japanese apartment dwellers may prefer to listen to reproduced sound at lower volumes to avoid disturbing their neighbors, which is a serious social infraction. American urban apartment dwellers, on the other hand, may be more tolerant of bass and higher playback levels thanks to better noise isolation from the wall construction. Tolerance of your neighbor's subwoofers and loud music also comes more easily if you know they own a handgun. The right to listen to loud music and bass in America is sort of protected under the Second Amendment (i.e. the right to bear arms). :)

Possible Reasons Why Cross-Cultural Preferences May Not Exist or Matter
The following arguments do not directly prove that cross-cultural sound quality preferences do not exist. They do, however, show that the entertainment, broadcast, recording and audio industries have largely decided to ignore them: either they don't believe such preferences exist, or catering to them doesn't make sense from a business or philosophical standpoint.
Audio Manufacturers: One Product, One Sound
Most audio companies sell the same model of product in every country, changing only the language of the packaging/owner's manual and the power supply voltage to meet local requirements. Measurements of loudspeakers from different countries of origin tend to aim at the same performance target. Nothing in the objective measurements or listening test results indicates a unique sound, voicing or preference attributable to the country of origin, whether the loudspeaker is British, German, Canadian, American, French, Italian, Danish or Japanese [1]-[3]. Accurate sound seems to be the universal attribute that matters most. That said, these studies did not formally or systematically study the culture or race of the listener as a factor in loudspeaker preference, so the definitive study remains to be done.
Recording/Film Industries: One Product, One Sound
To my knowledge, record companies do not release different mixes of their recordings to satisfy different cultural tastes in sound quality. Fans of Lady Gaga apparently like (or dislike) her sound equally whether they are in America, Europe or Asia. Similarly, there is no option in the iTunes store to indicate your nationality or culture before downloading your music.
Universal Loudspeaker / Audio Standards in Broadcast
If you look at international audio standards for broadcasting (AES, IEC, ITU, EBU), and read the loudspeaker papers written by researchers at the BBC (British), CBC (Canadian) and NHK (Japanese), you will find a common set of performance criteria: flat on-axis response, extended bandwidth in the bass and treble, smooth off-axis response and low distortion. At the broadcast level, the playback chain is not being tailored to the cross-cultural preferences of the audiences where the content will be heard.
Concert Halls and Live Music Performance
The acoustical design of concert halls has generally followed well-established standards and practices based on research using international listening panels. Qualities such as spatial envelopment, reverberation, clarity and richness of timbre are universally accepted as desirable. The classical and romantic composers wrote their music specifically for these acoustics, and radically altering the acoustics would not serve the art well.
The Global Economy
In the new global economy, political, cultural, socioeconomic and technological barriers have been largely removed. As communication between cultures improves, it will likely influence their attitudes, tastes and perceptions of culture, music and sound reproduction. If cross-cultural differences in sound quality preferences exist, they will likely converge over time, and taste in sound quality will become more homogeneous (hopefully in a positive way).
Audio is science in the service of art
This philosophy assumes that the music, its performance and its recording are the art, and that the goal of sound reproduction is to reproduce that art accurately. To serve the art, there is no room for cultural preferences or individual tastes in the design of the equipment used to reproduce it. Any cultural sound quality preferences are presumed to be encoded into the music when it is performed and recorded, and need not be added again in the playback chain.

Here is a parallel in painting: when a Monet exhibit travels to different countries, the art is not altered, transformed or "improved" to suit local tastes. Art lovers want to see the original Monet, not a new and improved version with edge enhancement, higher contrast and 3D effects. The same is true of the sound of the Vienna Philharmonic on a world tour. When they tour Japan, they don't leave half the bass section at home because the Japanese supposedly don't like bass. So why would we want to tamper with the original sound of the Vienna Philharmonic when playing their recordings through our audio systems?

Research in Cross-Cultural Preference in Sound Quality of Recorded and Reproduced Sound
In the realm of perception there is an essential pan-human unity, and most differences among cultures are only a "fine tuning" [4].
To date, very little cross-cultural research has been done on the perception of sound quality. One of the challenges in cross-cultural research is ensuring that the listener instructions, sound quality descriptors and semantic definitions of the scales have the same meaning across cultures. Fortunately, there are methods for removing language from the perceptual task. Multidimensional scaling allows listeners to judge pairs of sounds based only on their similarity; the perceptual attributes of the sounds (e.g. timbre- or spatial-related) can then be identified through multivariate statistical methods such as principal component analysis. In a study of different guitar timbres, Martens et al. found that native speakers of English, Japanese, Bengali and Sinhala perceived the same underlying dimensions but used different adjectives to describe them [5].
In another study that compared Japanese and English speaking listeners’ perception of music recordings made with four different 5-channel microphone techniques, the authors found a common understanding of three critical dimensions in which the quality of the recordings differed [6].
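The multidimensional scaling approach described above can be sketched compactly. The snippet below is a minimal illustration of classical (Torgerson) MDS in NumPy: given a matrix of pairwise dissimilarity judgments, it recovers a low-dimensional map of the stimuli without any language-dependent labels. The dissimilarity matrix here is hypothetical example data, not results from the cited studies:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) multidimensional scaling: recover a
    k-dimensional configuration from an n x n matrix of pairwise
    dissimilarities D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # double-centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # centered inner-product matrix
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]      # keep the k largest
    scale = np.sqrt(np.maximum(vals[idx], 0.0))
    return vecs[:, idx] * scale           # n x k stimulus coordinates

# Hypothetical averaged dissimilarity judgments for four stimuli
# (symmetric, zero diagonal; larger = more dissimilar)
D = np.array([[0.0, 2.0, 6.0, 6.3],
              [2.0, 0.0, 5.5, 6.0],
              [6.0, 5.5, 0.0, 1.5],
              [6.3, 6.0, 1.5, 0.0]])

X = classical_mds(D, k=2)  # 2-D perceptual map of the four stimuli
```

The recovered axes carry no names; interpreting them as timbral or spatial attributes is a separate step, done after the fact by correlating the dimensions with physical measurements or semantic ratings.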
Recently, we have begun testing cross-cultural sound quality preferences of music reproduced through different loudspeakers, equalizations, and automotive audio systems using American, Japanese and Chinese speaking listeners. While this work is still ongoing, the preliminary results do not show any evidence of cross-cultural preferences among the different groups. Accurate sound reproduction seems to be the common link across the preferences of the different cultures.

Very little research has been done on cross-cultural preferences in the sound quality of reproduced sound. Preliminary investigations by the author into the preferred spectral balance of music reproduced through loudspeakers have not revealed any significant cross-cultural differences to date. If cross-cultural preferences exist, the music and audio industries have largely ignored them, instead distributing products optimized for a single universal audience.
Finally, an important question is whether audio companies should even cater to cross-cultural preferences if research eventually finds that they exist. If the audio industry adopts an "audio science in the service of art" philosophy, where the goal is to faithfully and accurately reproduce the art as the artist intended, the question becomes moot. If certain cultures don't like the sound of the art, that is an issue between the artist and the recording producer/record executive, not the audio manufacturer.

For more discussion on this topic, please head over to WhatsBestForum.

[1] Floyd E. Toole, "Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 1," J. AES, Vol. 34, Issue 4, pp. 227-235, April 1986. (download for free courtesy of Harman International).
[2] Floyd E. Toole, "Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 2," J. AES, Vol. 34, Issue 5, pp. 323-348, May 1986. (download for free courtesy of Harman International).
[3] Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. AES, Vol. 51, Issue 9, pp. 806-825, September 2003. (download for free courtesy of Harman International).
[4] John W. Berry, Ype H. Poortinga, Janak Pandey, Handbook of Cross-Cultural Psychology, Volume 1: Theory and Method, 2nd edition, Aug. 21, 1996.
[5] Martens, William L.; Giragama, Charith N. W.; Herath, Susantha; Wanasinghe, Dishna R.; Sabbir, Alam M., "Relating Multilingual Semantic Scales to a Common Timbre Space - Part II," presented at the 115th Audio Engineering Society Convention, preprint 5895 (October 2003).