Monday, May 28, 2018
Hooked on the Science of Sound
Friday, February 17, 2017
TWiRT 337 – Predicting Headphone Sound Quality with Sean Olive
Figure: The predicted sound quality of 61 different models of in-ear headphones (blue curve) versus their retail price (green bars).
So, if you think spending a lot of money on in-ear headphones guarantees excellent sound, you may be sadly disappointed. One of the most expensive IE models in the above graph ($3000) had an underwhelming predicted score of 20-25%, depending on which EQ setting was chosen. The highest-scoring headphone was a $100 model that we equalized to hit the Harman target response, which our research has shown to be preferred by the majority of listeners.
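The arithmetic behind equalizing a headphone to a target is simple: at each frequency, apply the gain needed to close the gap between the measured response and the target. The following is a minimal Python sketch of that idea; the function name, the three-point responses, and the ±12 dB clip are hypothetical illustrations, not part of the study.

```python
import numpy as np

def correction_filter_db(freqs_hz, measured_db, target_db):
    """Return the EQ gain (dB) at each frequency that moves a measured
    headphone response onto a target response."""
    gain = np.asarray(target_db) - np.asarray(measured_db)
    # Limit boost/cut to a practical range so the EQ stays realizable
    return np.clip(gain, -12.0, 12.0)

# Hypothetical three-point example: a headphone 4 dB shy of the target
# in the bass and 2 dB hot in the treble.
freqs = np.array([100.0, 1000.0, 10000.0])
measured = np.array([-4.0, 0.0, 2.0])
target = np.array([0.0, 0.0, 0.0])
print(correction_filter_db(freqs, measured, target))  # [ 4.  0. -2.]
```

In practice the gain curve would be smoothed and realized as a bank of filters, but the target-minus-measured subtraction is the core of any target-curve equalization.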
Friday, April 22, 2016
A Virtual Headphone Listening Test Method
Wednesday, October 22, 2014
The Influence of Listeners' Experience, Age and Culture on Headphone Sound Quality Preferences
The paper describes double-blind headphone listening tests conducted in four different countries (Canada, USA, China and Germany) involving 238 listeners of different ages, genders and levels of listening experience. Listeners gave comparative preference ratings for three popular headphones and a new reference headphone, all virtually presented through a common replicator headphone equalized to match their measured frequency responses. In this way, biases related to headphone brand, price, visual appearance and comfort were removed from listeners’ judgment of sound quality. On average, listeners preferred the reference headphone, which was based on the in-room frequency response of an accurate loudspeaker calibrated in a reference listening room. This was generally true regardless of the listener’s experience, age, gender and culture. This new evidence suggests that a headphone standard based on this target response would satisfy the tastes of most listeners.
The paper is available for download from the AES e-library. You can also find a PDF of our presentation here or view the presentation on YouTube.
Wednesday, June 11, 2014
My Article on Headphone Sound Quality in 2014 LIS
The 2014 Loudspeaker Industry Sourcebook came out this week. In it, you can find an article I wrote called "Perceiving and Measuring Headphone Sound Quality: Do Listeners Agree on What Makes a Headphone Sound Good?" The article is a summary of some recently published research we've conducted at Harman on the perception and measurement of headphone sound quality.
Together, these studies provide scientific evidence that when headphone brand, price, fashion, and celebrity endorsement are removed from subjective evaluations, listeners generally agree on what makes a headphone sound good.
So far, this has been true regardless of users' listening training, age, or culture. The more preferred headphones tend to have a smooth, extended frequency response that approximates an accurate loudspeaker's in-room response, and that response could provide the basis for a new and improved headphone target curve. You can find more details on the research here.
Monday, April 22, 2013
The Relationship between Perception and Measurement of Headphone Sound Quality
Above: The brands and models of six popular headphones used in this study.
Todd Welti and I recently conducted a study to explore the relationship between the perception and measurement of headphone sound quality. The results were presented at the 133rd AES Convention in San Francisco, in October 2012. A PDF of the slide presentation referred to below can be found here. The AES preprint can be found in the AES E-library. The results of this study are summarized below.
Measuring The Perceived Sound Quality of Headphones
Double-blind comparative listening tests were performed on six popular circumaural headphones ranging in price from $200 to $1000 (see above slide). The listening tests were carefully designed to minimize biases from known listening test nuisance variables (slides 7-13). A panel of 10 trained listeners rated each headphone based on overall preferred sound quality, perceived spectral balance, and comfort. The listeners also gave comments on the perceived timbral, spatial, and dynamic attributes of the headphones to help explain their underlying sound quality preferences.

The headphones were compared four at a time over three listening sessions (slide 12). Assessments were made using three music programs with one repeat to establish the reliability of the listeners' ratings. The order of headphone presentations, programs and listening sessions was randomized to minimize learning and order-related biases. The test administrator manually substituted the different headphones on the listener from behind so they were not aware of the headphone brand, model or appearance during the test (slide 8). However, tactile/comfort differences were part of the test. Listeners could adjust the position of the headphones on their heads via lightweight plastic handles attached to the headphones.
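To make the randomization concrete, here is a minimal Python sketch of how trial orders like these might be generated. The labels, the single-session structure, and the seed are illustrative assumptions, not the study's actual test software.

```python
import random

# Placeholder labels; four headphones compared at a time, three programs,
# each program presented twice (one repeat to check rating reliability).
headphones = ["HP1", "HP2", "HP3", "HP4"]
programs = ["program_A", "program_B", "program_C"]

def randomized_session(seed=None):
    """Build a session as a list of (program, repeat, headphone_order)
    trials with a fresh random headphone order in every trial."""
    rng = random.Random(seed)
    trials = []
    for program in programs:
        for repeat in range(2):
            order = headphones[:]
            rng.shuffle(order)      # fresh random headphone order per trial
            trials.append((program, repeat, order))
    rng.shuffle(trials)             # randomize program/repeat order too
    return trials

for trial in randomized_session(seed=1)[:3]:
    print(trial)
```

Randomizing both the within-trial presentation order and the trial order is what breaks the link between position in the session and any particular headphone or program.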
Listeners Prefer Headphones With An Accurate, Neutral Spectral Balance
When the listening test results were statistically analyzed, the main effect on the preference rating was due to the different headphones (slide 15). The preferred headphone models were perceived as having the most neutral, even spectral balance (slide 19), while the less preferred models had too much or too little energy in the bass, midrange or treble regions. Frequency analysis of listeners' comments confirmed listeners' spectral balance ratings of the headphones, and proved to be a good predictor of overall preference (slide 20). The most preferred headphones were frequently described as "good spectral balance, neutral with low coloration, and good bass extension," whereas the less preferred models were frequently described as "dull, colored, boomy, and lacking midrange."

Looking at the individual listener preferences, we found good agreement among listeners in terms of which models they liked and disliked (slides 16 and 18). Some of the most commercially successful models were among the least preferred headphones in terms of sound quality. In cases where an individual listener had poor agreement with the overall listening panel's headphone preferences, we found either that the listener didn't understand the task (they were less trained), or that the headphone didn't properly fit the listener, causing air leaks and poor bass response; this was later confirmed by doing in-ear measurements of the headphone(s) on individual listeners (slides 26-39).
Measuring the Acoustical Performance of Headphones
Acoustical measurements were made on each headphone using a GRAS 43AG Ear and Cheek Simulator equipped with an IEC 711 coupler (slide 24). The measurement device is intended to simulate the acoustical effects of an average human ear, including the acoustical interactions between the headphone and the acoustical impedance of the ear. The headphone measurements shown below include these interactions as well as the transfer function of the ear, mostly visible in the graphs as a ~10 dB peak at around 3 kHz. It is important to note that since we are born with these ear canal resonances, we have adapted to them and don't "hear" them as colorations.
Relationship between Subjective and Objective Measurements
Comparing the acoustical measurements of the headphones to their perceived spectral balance confirms that the more preferred headphones generally have a smooth and extended response below 1 kHz that is perceived as an ideal spectral balance (slide 25). The least preferred headphones (HP5 and HP6) have the most uneven measured and perceived frequency responses below 1 kHz, which generated listener comments such as "colored, boomy and muffled." The measured frequency response of HP4 shows a slight bass boost below 200 Hz, yet on average it was perceived as sounding thin; this headphone was one of the models that had bass leakage problems for some listeners due to a poor seal on their ears.
Conclusions
In conclusion, this headphone study is one of the first of its kind to report results based on controlled, double-blind listening tests [2]. The results provide evidence that trained listeners preferred the headphones perceived to have the most neutral spectral balance. The acoustical measurements of the headphones generally confirmed and predicted which headphones listeners preferred. We also found that bass leakage related to the quality of fit and seal of the headphone on the listener's head/ears can be a significant nuisance variable in subjective and objective measurements of headphone sound quality.

It is important for the reader not to draw generalizations from these results beyond the conditions we tested. One audio writer has already questioned whether the headphone sound quality preferences of trained listeners can be extrapolated to the tastes of untrained younger demographics, whose apparent appetite for bass-heavy headphones might indicate otherwise. We don't know the answer to this question. For younger consumers, headphone purchases may be driven more by fashion trends and marketing B.S. (Before Science) than sound quality. While this question is the focus of future research, the preliminary data suggest that in blind A/B comparisons kids prefer headphones with accurate reproduction over colored, bass-heavy alternatives. This would tend to confirm findings from previous investigations into the loudspeaker preferences of high school and college students (both Japanese and American), which so far indicate that most listeners prefer accurate sound reproduction regardless of age, listener training or culture.
Future headphone research may tell us (or not) that most people prefer accurate sound reproduction regardless of whether the loudspeakers are installed in the living room, the automobile, or strapped onto the sides of their head. It makes perfect sense, at least to me. Only then will listeners hear the truth -- music reproduced as the artist intended.
________________________________
Footnotes
[1] Despite the paucity of good subjective measurements on headphones, there do exist some online resources where you can find objective measurements of headphones. You will be hard pressed to find a manufacturer who will supply these measurements of their products. The resources include Headroom.com, Sound & Vision Magazine, and InnerFidelity.com. Tyll Hertsens at InnerFidelity has a large database of frequency response measurements of headphones that clearly illustrates the lack of consensus among manufacturers on how a headphone should sound and measure. There is even a lack of consistency among different models made by the same brand.
[2] Sadly, studies like this one are so uncommon in our industry that Sound and Vision Magazine recently declared this paper the biggest audio story of 2012. Hopefully that will change sooner rather than later.
Thursday, October 6, 2011
Harman Science of Sound Demonstrations at Rocky Mountain Audio Fest 2011
Drop by and find out more about the science behind Harman audio product development and testing including JBL and Revel loudspeakers. I will be demonstrating our latest release of the "How to Listen" software used for training and selecting listeners for product research and testing. Find out how discriminating and reliable you are as a critical listener.
Attendees will be given 30% discount coupons towards a copy of Floyd Toole's book "Sound Reproduction" (Focal Press), which describes much of the current scientific knowledge about the perception of the sound quality of loudspeakers, listening rooms, and their acoustical interaction with each other. I will be raffling off a few copies to the best performing listeners.
I hope to see you there!
Tuesday, March 15, 2011
Harman's "How to Listen" Listener Training Software Now Available as Beta
Friday, July 9, 2010
Why Live-versus-Recorded Listening Tests Don't Work

Figure 1: Singer Frieda Hempel conducting a Tone Test at Edison Studios, NYC in 1918. Note that many of the listeners' ears are covered by their blindfolds, making it a double-blind and double-deaf listening test, since the experimenter, Edison, was deaf himself.
Recently I was asked how I could possibly prove or assert that listeners prefer accurate loudspeakers without having performed a live-versus-recorded listening test. This is a test where the listener compares a live musical performance to a recording of the performance reproduced through loudspeakers. The closer the sound quality of the reproduction is to that of the live performance, the more accurate the loudspeaker is deemed to be - at least in theory. In practice, these tests are usually riddled with so many uncontrolled listening test nuisance variables that the results are essentially meaningless. This article examines why live-versus-recorded listening tests are not suitable for serious scientific investigations of the perceived sound quality of recorded and reproduced sound.
Edison’s Tone Tests: “People will hear what you tell them to hear”
Thomas Edison was among the first audio engineers to embrace live-versus-recorded demonstrations. In 1910, he invented the Edison Diamond Disc Phonograph, which he claimed had “no tone” of its own. To prove it, a series of road shows involving 4,000 live-versus-recorded demonstrations of his phonograph were conducted in auditoriums across the United States. At some point during the live music performance there would be a switchover to the recorded performance, and apparently audience members could not tell the difference between the live and recorded performances.
After a 1916 live-versus-recorded demonstration in Carnegie Hall, the New York Evening Mail stated “the ear could not tell when it was listening to the phonograph alone, and when to actual voice and reproduction together. Only the eye could discover the truth by noting when the singer’s mouth was open or closed” [1].
By today’s standards, the fidelity of Edison’s disc phonograph was egregiously poor in terms of its noise, distortion, limited dynamic range, bandwidth and frequency response (you can hear some of Edison’s recordings online here). It’s hard to imagine that listeners were fooled into thinking his Diamond Disc recordings were indistinguishable from the live performance. In fact, we now know that Edison manipulated the tests to produce the results he wanted. First, he carefully chose the music and musicians to work within the technical limitations of his technology. Edison detested music with extreme dynamics, high tones, vibrato and complex textures because they were a challenge to his deafness and his Tone Tests. He selected and coached musicians to mimic the sound of their recordings to minimize the audible differences between live and recorded performances [1],[2].
Second, Edison was the consummate audio salesman and was known to say, “People will hear what you tell them to hear” [2]. The expectations and perceptions of his listeners were manipulated before the test to produce a more predictable outcome. Audience members were given a concert program before his Tone Tests that clearly told them exactly what they would hear, how amazing it would sound, and what an appropriate response would be:
“Those who hear this test will realize fully for the first time how literally true it is that Mr. Edison has made possible the re-creation of the artist’s voice. No more exacting test could be made to demonstrate that the New Edison actually does re-create the voice of the artist than to play it side by side with the artist who made the records. This is the final proof. Close your eyes. See if you can distinguish the voice of the New Edison from that of the artist. Did you ever believe it possible to re-create a voice? Note that the voice of the artist and the voice of the Edison are indistinguishable” [emphasis is mine] [3].

Figure 2: Another Edison Tone Test where extraneous biases related to sight and smell may have compromised the results based on the large number of listeners covering their noses. Perhaps a bad case of singer's halitosis made it possible to identify the live performance from the recorded one based on smell alone?
Other Live-Versus-Recorded Demonstrations
Following Edison’s live-versus-recorded demonstrations, other tests were conducted by Harry Olson at RCA, and by G.A. Briggs (Wharfedale) and Peter Walker at Quad in the 1950s [4]. A common problem with these demonstrations was double reverberation: the reverberation of the room was heard both in the recording, and again when it was reproduced through loudspeakers in the same room. This made it easier for listeners to tell the difference between the recorded and live performances.
Acoustic Research's Live-Versus-Recorded Demonstrations
During the 1960’s, Acoustic Research (AR), an American loudspeaker company, performed over 75 live-versus-recorded concerts in cities around the USA featuring The Fine Arts String Quartet, and the AR-3 loudspeaker [5],[6]. To solve the double reverberation problem, the recordings of the quartet were made in an anechoic chamber, or outdoors. Outdoor live-versus-recorded demonstrations had the added benefit that there were no room reflections in either the recording or the live performance. This made the demonstrations less sensitive to off-axis problems in the microphones and loudspeakers. It also relaxed the demands on the recording-reproduction to accurately capture and reproduce the complex spatial properties of a reverberant performing space.
The AR demonstrations apparently generated an enormous amount of free publicity in newspapers and audio magazines where it was reported that the reproduction of the recordings was virtually indistinguishable from the live performance. AR sales increased dramatically, to the point where in 1966 AR apparently owned 32% market share of loudspeakers sold in the United States.
A Live-Versus-Recorded Method For Testing Loudspeaker Accuracy
Edgar Villchur, head of Acoustic Research, to his credit, was a firm believer that loudspeakers should accurately reproduce the art (the recorded music) and not editorialize or enhance it. In a 1962 paper, he described a live-versus-recorded method for evaluating the accuracy of loudspeakers [7]. The method used a reference loudspeaker (the live performance) that was placed in the listening room with the loudspeaker-under-test. The goal of the loudspeaker-under-test was to accurately reproduce a previous recording of the reference loudspeaker playing white noise in an anechoic chamber. The original white noise signal was also fed to the reference loudspeaker during the listening test. The more similar the loudspeaker-under-test sounded to the reference speaker, the more accurate it was deemed to be, at least in theory.
Villchur acknowledged that the sensitivity and validity of the method depended on the quality of the reference loudspeaker, its directivity, and the choice of program material. White noise was more revealing of loudspeaker inaccuracies than music. His reference loudspeaker consisted of a single 2-inch midrange driver from an AR-3 loudspeaker, selected because he found that using multiple drivers caused acoustical interference that was audible in the anechoic chamber but not so audible in a reverberant listening room; these differences would produce errors in the listening test. One wonders how a tiny 2-inch driver could have produced adequate high treble and low bass without distortion. These limitations would significantly limit the accuracy and usefulness of this listening test method.
Another problem with this method was that the anechoic loudspeaker recordings were made at a single point in space, and did not capture the directivity and off-axis characteristics of the reference loudspeaker. Unless the speaker-under-test had the same directivity and off-axis characteristics as the reference loudspeaker, it could never sound exactly the same in a reflective listening room. To compensate for these errors, Villchur used a trial-and-error process to find the microphone position relative to the reference loudspeaker where the timbre of the anechoic recording best matched the timbre of the reference loudspeaker when placed in a room. Adjusting the recording to mimic the sound of the live performance was the reverse of what Edison’s musicians did, but it produced essentially the same bias. (Edison would have been proud!)
Finally, it is not clear how Villchur controlled loudspeaker positional biases when comparing the reference loudspeaker to the loudspeaker-under-test. Loudspeaker positional biases have been shown to produce audible effects that are sometimes larger than the audible differences between different loudspeakers under test [9]. At Harman, these positional biases are eliminated via an automated speaker shuffler that places each loudspeaker in the same position of the room.
Summary of Problems with Live-versus-Recorded Tests
By today’s standards, the live-versus-recorded tests performed to date lack the necessary scientific controls and rigor to consider their results or conclusions accurate, repeatable and valid. Below are a few of the most significant psychological, physical, methodological or experimental listening variables that plague these types of tests. While it is possible to control some of these variables, others are either impossible, impractical or too expensive to control.
Sighted and Cross-Modality Biases
To date, most live-versus-recorded tests have been performed sighted, where non-auditory cues were available to allow the listener to identify whether they were hearing the live or reproduced sound source. These tests could have easily been made blind via an acoustically transparent curtain; however, scientific validity was apparently not the primary purpose of the tests. The visual cues from the musicians (bowing, lip syncing) would also enhance the realism and presence of the reproduction, a well-known cognitive effect observed in research on binaural and virtual reality displays.
Listener Expectation, Authority Bias, Group Interaction Bias
In many of the public live-versus-recorded demonstrations, listeners' expectations were manipulated by knowledge given to them by the organizers of the demonstrations. In some cases, listeners were told what the expected response should be before the test began (see Edison's concert programs above). In large group settings, listeners' responses can be easily swayed by the opinions and reactions of other members of the group (a herd mentality), especially when an authority figure is present. These biases could be removed from live-versus-recorded tests by repeating the test for each individual listener, but the live and recorded performances would then have to be replicated for every listener, making the tests too difficult, expensive, and time-consuming to be practical.
Qualifications of Listeners
None of the live-versus-recorded tests I've read about have reported the hearing and critical listening qualifications of the listeners who participated in them. These are important variables in the sensitivity and reliability of the test results, and can be easily quantified.
Live and Recorded Performances Must Be Identical
For live-versus-recorded tests to be valid, the live and recorded performances should be identical, having the same notes, intonation, tempo, dynamics, loudness, balance between instruments, and the same location and sense of space of the instruments. Otherwise, there are extraneous cues that allow listeners to readily identify the live and recorded performances. MIDI-controlled instruments (e.g. player pianos) are but one example of how this problem could be resolved.
Positional Biases from Live and Reproduced Sound Sources
Unless the live and reproduced (e.g. loudspeakers) sound sources occupy the same physical locations, the listener can always identify the live versus recorded versions based on the localized positions of the sound sources.
Errors in the Recording
The usefulness of live-versus-recorded methods for perceptual measurements of sound quality in the playback chain is severely limited by errors in the recording. The recording errors are not easily separated from the errors in the playback chain (see circle-of-confusion). Microphones and microphone techniques both contain errors that limit the timbral, spatial and dynamic accuracy of the recordings through which we judge loudspeakers. Apparently the most effective live-versus-recorded demonstrations were conducted outdoors - effectively an anechoic environment - where the off-axis performances of the microphones and loudspeakers, and the complex spatial cues of a reflective room were largely removed as factors from the experiment. However, results from outdoor live-versus-recorded tests cannot be generalized to how the loudspeakers would perform in real rooms, where the off-axis sounds provide a significant contribution towards the listener's impression of the loudspeaker.
Lack of Proper Scientific Protocols, Listener Response Data, Statistical Analysis, Results
The most interesting characteristic of live-versus-recorded tests is that they never seem to provide listener response data, statistical analysis or published results. Eyewitness reports written in newspapers or magazines do not constitute scientific evidence.
Accuracy is Not Applicable to Most Recordings Made Today
Most recordings made today are not intended to sound like the live performance. Anyone who heard Taylor Swift's live performance with Stevie Nicks at the 2010 Grammy Awards understands why. (Note: you can relive the magical moment on YouTube. Warning: this may be offensive to the musically inclined.) About 90% of commercial recordings are studio creations consisting of a series of overdubs, processed with auto-tuning, equalization, dynamic compression, and reverb sampled from an alien nation. For these recordings, there is no equivalent live performance to which the recording/reproduction can be compared for accuracy. The only reference is what the artist heard over the loudspeakers in the recording control room. If the important performance aspects of the playback system through which the art (the music and recording) was created can be reproduced in the home, then the consumer will hear an accurate reproduction of the music, as the artist intended. It is possible to achieve this if we adopt a science-in-the-service-of-art philosophy towards audio recording and reproduction.
Conclusions
In reviewing the history of live-versus-reproduced tests, most have been performed as elaborate sales and marketing demonstrations designed to fool listeners into believing that a product sounded much better and more accurate than it actually was. While live-versus-recorded tests have proven their merit as an effective marketing and sales tool, they have not yet proven themselves as a serious method for scientific experiments intended to advance our psychoacoustic understanding of music recording and reproduction.
The reason for this, I believe, is that live-versus-recorded tests do not adequately control important listening test nuisance variables, a prerequisite for accurate, reliable and scientifically valid results. It is not entirely coincidental that (to my knowledge) none of the live-versus-recorded tests to date have produced a single scientific publication or any new psychoacoustic knowledge.
Hopefully, you now understand why I don’t conduct live-versus-recorded loudspeaker listening tests.
References
[1] Harvith, J., and Harvith, S., Edison, Musicians and the Phonograph: A Century in Retrospect, Greenwood Press, New York (1987).
[2] Millard, André, “Edison’s Tone Tests and the Ideal of Perfect Sound Reproduction,” from “Lost and Found Sound,” NPR.
[3] Program for an Edison demonstration: http://www.nipperhead.com/old/tonetest04.htm
[4] Wharfedale history: http://www.wharfedale.co.uk/About/History/tabid/66/Default.aspx
[5] Acoustic Research: http://en.wikipedia.org/wiki/Acoustic_Research
[6] Edgar Villchur: http://edgarvillchur.com/
[7] Villchur, Edgar, “A Method of Testing Loudspeakers with Random Noise,” J. Audio Eng. Soc., Vol. 10, No. 4, pp. 306-309 (October 1962).
[8] Kissinger, John R., “The Development of the Simulated Live-vs-Recorded Test into a Design Tool,” presented at the 35th AES Convention, preprint 609 (October 1968).
[9] Olive, Sean E.; Schuck, Peter L.; Sally, Sharon L.; Bonneville, Marc E., “The Effects of Loudspeaker Placement on Listeners' Preference Ratings,” J. Audio Eng. Soc., Vol. 42, No. 9, pp. 651-669 (September 1994).
Sunday, November 1, 2009
The Subjective and Objective Evaluation of Room Correction Products

In a recent article, I discussed audio’s circle of confusion that exists within the audio industry due to the lack of performance standards for the loudspeakers and rooms through which recordings are monitored. As a result, the quality and consistency of recordings remain highly variable. A significant source of variation in the playback chain comes from acoustical interactions between the loudspeaker and room, which can produce variations of more than 18 dB in the in-room response below 300-500 Hz.
In recent years, audio manufacturers have begun to offer so-called “room correction” products that measure the in-room response of the loudspeakers at different seating locations, and then automatically equalize them to a target curve defined by the manufacturer. The sonic benefits of these room correction products are generally not well known since, to my knowledge, no one has yet published the results of a well-controlled, double-blind listening test on room correction products. To what degree do room correction products improve or possibly degrade the sound quality of the loudspeaker/room compared to the uncorrected version of the loudspeaker/room? Can the sound quality ratings of the different room correction products be explained by acoustical measurements performed at the listening location?
A Listening Experiment on Commercial Room Correction Products
To answer these questions, we conducted some double-blind listening tests on several commercial room correction products [1]. I recently presented the results of those tests at the 127th Audio Engineering Society Convention in New York. A copy of my AES Keynote presentation can be found here.
A total of three different commercial products were compared to two versions of a Harman prototype room correction that will find its way into future Harman consumer and professional audio products. The products included the Anthem Statement D1, the Audyssey Room Equalizer, the Lyngdorf DPA1, and two versions of the Harman prototype product (see slide 7). Included in the test was a hidden anchor: the same loudspeaker and subwoofer without room correction. In this way, we could directly compare how much each room correction improved or degraded the quality of sound reproduction.
Each room correction device was tested in the Harman International Reference Room using a high quality loudspeaker (B&W 802N) and subwoofer (JBL HB5000) (slides 8 and 9). A calibration was performed for each room correction over the six listening seats according to the manufacturer’s instructions. Two different calibrations were performed with the Harman prototype: one based on a multipoint six-seat average, while the second calibration used a six-microphone spatial average focused on the primary listening seat. The different room corrections were level matched for equal loudness at the listening seat.
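Several of the calibrations above rest on spatial averaging: combining responses measured at multiple microphone or seat positions into one curve to equalize against. Below is a minimal Python sketch of a power-based spatial average; the two-seat example numbers are hypothetical, and real products may weight or smooth the measurements differently.

```python
import numpy as np

def spatial_average_db(responses_db):
    """Power-average magnitude responses (in dB) measured at several
    positions. responses_db is a 2-D array with one row per position."""
    power = 10.0 ** (np.asarray(responses_db) / 10.0)  # dB -> linear power
    return 10.0 * np.log10(power.mean(axis=0))         # average, back to dB

# Hypothetical responses at two seats: identical at the first frequency,
# a 12 dB seat-to-seat spread (room-mode-like) at the second.
seats = np.array([[0.0, -6.0],
                  [0.0,  6.0]])
print(spatial_average_db(seats))
```

Averaging in the power domain (rather than averaging the dB values directly) lets the louder seat dominate, which is one common way to keep a correction from over-boosting a dip that exists at only one position.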
The Listener's Task
A total of eight trained listeners with normal hearing participated in the tests. Using a multiple comparison method, the listeners could switch at will between the six different room corrections and rate them for overall preference and spectral balance, as well as give comments (see slide 14). The administration of the test, including the design, switching, and the collection and storage of listener responses, was computer automated via Harman’s proprietary Listening Test Software. A total of nine trials were completed using three different programs repeated three times. The presentation order of the programs and room corrections was randomized.
Results: Significant Preferences For Different Room Corrections
The mean preference ratings and 95% confidence intervals are shown above in Figure 1 (or slide 17). The room correction products are coded RC1 through RC6 in descending order of preference; the identities of the products are not relevant for the purposes of this article. Three of the five room corrections (RC1-RC3) were strongly preferred over no room correction (RC4). However, one room correction (RC5) was rated equal to the no-correction treatment (RC4), and one (RC6) was rated much worse. Overall, the sound quality of RC6 was rated "very poor" based on the semantic definitions of the preference scale.
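The summary statistics behind a plot like Figure 1 are straightforward to compute. A minimal sketch, using a normal approximation for the 95% confidence interval (the ratings below are made-up numbers, not data from the study):

```python
import math
import statistics

def mean_and_ci95(ratings):
    """Mean preference rating and an approximate 95% confidence
    interval (normal approximation) for one room correction."""
    n = len(ratings)
    mean = statistics.mean(ratings)
    sem = statistics.stdev(ratings) / math.sqrt(n)  # standard error of the mean
    half = 1.96 * sem
    return mean, (mean - half, mean + half)

# e.g. eight listeners' ratings for one correction (invented values)
m, (lo, hi) = mean_and_ci95([6, 7, 8, 7, 6, 8, 7, 7])
```

Non-overlapping intervals between two corrections are the usual quick visual check that the preference difference is statistically meaningful.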
Perceived Spectral Balance of Room Corrections
Listeners rated the perceived spectral balance of each room correction across seven equal logarithmically spaced frequency bands. The mean spectral balance ratings averaged across all listeners and programs are shown in slide 18. The more preferred room corrections were perceived to have a flatter, smoother spectral balance with extended bass. The less preferred room correction products (RC5 and RC6) were perceived to have too little bass, which made them sound thin and bright.
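Seven equal logarithmically spaced bands spanning the audible range can be generated in one line; the 20 Hz-20 kHz span below is my assumption, as the exact band limits used in the test are not stated here:

```python
import numpy as np

def log_band_edges(f_lo=20.0, f_hi=20000.0, n_bands=7):
    """Edges (Hz) of n_bands equal logarithmically spaced bands
    covering f_lo..f_hi; returns n_bands + 1 edge frequencies."""
    return np.geomspace(f_lo, f_hi, n_bands + 1)

edges = log_band_edges()  # each band spans the same ratio in frequency
```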
Listener Comments on Room Corrections
Listeners also gave comments related to the spectral balance of the different room correction products. Slide 19 shows the number of times a particular comment was used to describe each room correction. The bottom row indicates the correlation between preference rating and the frequency of the comment. The most preferred room corrections were described as "neutral" and "full," which corresponded to flatter, smoother and more bass extended spectral balance ratings. The least preferred room corrections (RC4-RC6) were described as colored, harsh, thin, and muffled, which corresponded to less flat, less smooth, and less bass extended spectral balance ratings. Slide 20 graphically illustrates the same information as slide 19.
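The correlation between preference ratings and comment counts in slide 19 is an ordinary Pearson correlation, computable as follows (the data in the example are invented for illustration):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# e.g. preference ratings vs. how often "thin" was used (made-up numbers):
# a strongly negative r means the comment tracks lower preference
r = pearson_r([8.0, 6.5, 5.0, 3.0], [0, 1, 3, 5])
```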
Correlation Between Subjective and Objective Measurements
In-room acoustical measurements were made at the six listening seats using a proprietary 12-channel audio measurement system developed by the Harman R&D Group. Slides 23 and 24 show the amplitude response of the different room corrections spatially averaged for the six seats (slide 23), and at the primary listening seat (slide 24). The measurements are plotted from top to bottom in descending order of preference, each vertically offset to more clearly delineate the differences. A few observations can be made:
- The six-seat spatially averaged curves (slide 23) of the room corrections do not explain listeners' room correction preferences as well as the spatially averaged curves taken at the primary seat (slide 24). This makes perfect sense since all of the listening was done in the primary listening seat.
- Looking at slide 24, the most preferred room corrections produced the smoothest, most extended amplitude responses measured at the primary listening seat. The largest measured differences among the different room corrections occur below 100 Hz and around 2 kHz where the loudspeaker had a significant hole in its sound power response. The room corrections that were able to fill in this sound power dip received higher preference and spectral balance ratings.
- A flat in-room target response is clearly not the optimal target curve for room equalization. The preferred room corrections have a target response with a smooth downward slope with increasing frequency. This tells us that listeners prefer a certain amount of natural room gain. Removing the room gain makes the reproduced music sound unnatural and too thin, according to these listeners. This also makes perfect sense: the recording was likely mixed in a room where the room gain was also not removed, so removing it from the consumer's listening room would destroy the spectral balance of the music as intended by the artist.
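To make the idea of a downward-sloping target concrete, here is a hypothetical target-curve generator. The slope and reference frequency are placeholder values chosen for illustration, not the curves evaluated in this study:

```python
import numpy as np

def sloped_target_db(freqs_hz, slope_db_per_decade=-2.0, ref_hz=200.0):
    """An illustrative in-room target: 0 dB at ref_hz with a gentle
    downward tilt toward high frequencies, so some natural room gain
    is preserved at low frequencies rather than equalized flat."""
    return slope_db_per_decade * np.log10(np.asarray(freqs_hz) / ref_hz)

target = sloped_target_db([20.0, 200.0, 2000.0, 20000.0])
```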
Conclusions
As these listening test results illustrate, there are significant differences in the subjective and objective performance of current commercial room correction products. When done properly, room correction can lead to significant improvements in the overall quality of sound reproduction. However, not all room correction products are equal: two of the tested products produced results that were no better, or much worse, than the unequalized loudspeaker. Room correction preferences are strongly correlated with perceived spectral balance and related attributes (coloration, full/thin, bright/dull). The most preferred room corrections produced the smoothest, most extended in-room responses measured around the primary listening seat.
More tests are underway to better understand and, if necessary, optimize the performance of Harman's room correction algorithms for different acoustical aspects of the room and loudspeaker.
References
[1] Sean E. Olive, John Jackson, Allan Devantier, David Hunt, and Sean Hess, “The Subjective and Objective Evaluation of Room Correction Products,” presented at the 127th AES Convention, New York, preprint 7960 (October 2009).
Tuesday, March 24, 2009
Binaural Room Scanning - A Powerful Tool For Audio Research & Testing
Unlike auralization methods, BRS provides an auditory display based on actual acoustical measurements of the loudspeakers and listening environment - not simulations based on a model of the loudspeakers and room. For this reason, BRS reproductions are significantly more accurate and realistic than model-based auralizations.
BRS measurements of the loudspeakers and listening space are made with an anthropomorphically accurate binaural mannequin equipped with microphones in each ear (see top photo above). Measurements are made at every 1-2 degrees over a range of ±60 degrees by precisely rotating the mannequin's head via a stepper motor controlled by the BRS measurement computer. Each measurement is stored as a set of binaural room impulse responses (BRIR) that provide the filters through which music is convolved and sent to a calibrated pair of high quality headphones (see bottom photo above). A key feature of the BRS playback system is its ultrasonic head-tracker: it constantly monitors the position of the listener's head, sending the angular coordinates to the playback engine, which in turn switches to the corresponding set of measured BRIRs. In this way, the BRS playback preserves the natural dynamic interaural cues, used by humans to localize sound in rooms. Without these dynamic cues, headphones tend to produce sound images localized inside or near the head with front-to-back reversals being quite common. Head-tracking is therefore necessary for accurate assessment of the true spatial qualities of the audio reproduction.
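The playback engine's inner loop can be caricatured in a few lines, assuming BRIR pairs stored in a table keyed by measurement angle; all names, and the mono-source simplification, are mine rather than the actual BRS implementation:

```python
import numpy as np

def select_brir(brir_table, head_angle_deg, step_deg=2):
    """Return the stored (left, right) BRIR pair whose measurement
    angle is nearest the head angle reported by the tracker."""
    nearest = int(round(head_angle_deg / step_deg)) * step_deg
    return brir_table[nearest]

def render_binaural(source, brir_pair):
    """Convolve a mono source with the left/right BRIRs to produce
    the two-channel headphone feed for the current head position."""
    left = np.convolve(source, brir_pair[0])
    right = np.convolve(source, brir_pair[1])
    return np.stack([left, right])
```

A real implementation would additionally need click-free (e.g. crossfaded) switching between BRIR sets as the head moves, and block-based real-time convolution; this sketch only shows the selection logic.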
Current and Future Applications For BRS
As a research tool, BRS offers greater efficiencies and opportunities in how audio scientists research, develop and test audio products within home, professional and automotive listening spaces. BRS allows an unlimited number of acoustical variables to be manipulated, sequentially captured, and later evaluated in a highly repeatable and controlled manner. Using BRS, Harman researchers can do perceptual experiments and product evaluations that would otherwise be impractical or impossible using conventional in situ listening tests. This includes double-blind, controlled comparisons of different audio systems in different automobiles, concert halls or arenas, and home theaters.
BRS has already been used at Harman to study how the acoustical properties of the loudspeaker and listening room interact with each other, how these interactions affect the sound quality of the music reproduction, and the extent to which listeners adapt to the room acoustics when listening to multichannel audio systems [2],[3]. Over the next few years, BRS will help expand our current scientific understanding of how listeners perceive sound in rooms, so that we can optimize the sound quality of loudspeakers, acoustic spaces, and room-correction devices used to tame loudspeaker-room interactions. A BRS auditory display connected over the internet to a BRS database could even allow consumers to compare and select their most preferred loudspeaker model, concert hall seat, or automotive audio system configuration, without ever leaving the privacy of their home.
Finally, BRS brings enormous efficiencies, flexibility, and cost savings to psychoacoustic research and testing. The acoustical complexity of an automotive audio system can be captured and stored as a 10 MB file, which can then be emailed and evaluated anywhere in the world using a relatively inexpensive auditory display. The high costs associated with building ITU-R listening rooms and transporting listeners, automobiles, and loudspeakers around the world for evaluation may soon be a thing of the past.
In the next installment, I will discuss some of the inherent errors found in all BRS systems, and how they can be removed through proper calibration. Some recent listening experiments will be described that validate the perceptual accuracy and performance of our BRS system.
References
[1] Horbach, Ulrich, Karamustafaoglu, Attila, Pellegrini, Renato, Mackensen, Philip, Theile, Günther, “Design and Applications of a Data-Based Auralization System for Surround Sound,” presented at the 106th Audio Eng. Soc. Convention, preprint 4976, (May 1999). Download here.
[2] Olive, Sean and Martens, William L., “Interaction between Loudspeakers and Room Acoustics Influences Loudspeaker Preferences in Multichannel Audio,” presented at the 123rd Audio Eng. Soc. Convention, preprint 7196 (October 2007). Download here.
[3] Olive, Sean and Welti, Todd, “Validation of a Binaural Car Scanning Measurement System for Subjective Evaluation of Automotive Audio Systems,” to be presented at the 36th International Audio Eng. Conference, Dearborn, Michigan, USA (June 2-4, 2009).