Friday, July 28, 2023

The Influence of Program Material on Sound Quality Ratings of In-Ear Headphones

Choosing program material for subjective evaluation of audio components is challenging because the acoustic characteristics and qualities of the recordings themselves can bias and influence the results [1], [2]. The programs must be sensitive to and reveal artefacts present in the devices under test, otherwise an invalid null result may occur (type II error). Ideally, the programs should be well recorded and not contain artefacts that may be inadvertently attributed to the headphone or loudspeaker. For example, an accurate headphone may be misperceived as sounding too bright or too full if the recorded program contains an excessive amount of boosted high or low frequency information. These so-called “circle of confusion” errors [3] caused by a lack of meaningful loudspeaker-headphone standards make it difficult to choose neutral programs that don’t bias listening tests.

It would be ideal if there existed a list of recommended programs that meet all the above criteria, or an objective method for selecting the best programs. Unfortunately, the current listening test standards provide neither solution:

“...There is no universally suitable programme material that can be used to assess all systems under all conditions. Accordingly, critical programme material must be sought explicitly for each system to be tested in each experiment. The search for suitable material is usually time-consuming; however, unless truly critical material is found for each system, experiments will fail to reveal differences among systems and will be inconclusive. A small group of expert listeners should select test items out of a larger selection of possible candidates.” [2].

Some insight into selecting effective programs for headphone evaluation may be gained from previous loudspeaker research, where spectral attributes are best evaluated using programs containing wideband, continuous, spectrally dense signals. Low and medium Q resonances in loudspeakers are most easily detected using wideband continuous signals, whereas higher Q resonances are detected most easily using impulsive, discontinuous signals [4], [5]. Listeners' performance in categorizing spectral distortions added to headphones improves as the power spectral density of the program increases [6]. While we don't recommend evaluating headphones using pink noise, there may be benefits in using music tracks that mix broadband continuous signals with some impulsive transient sounds. For judging the spatial and distortion attributes of headphones, a different type of program may be required.
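To make the idea of spectral density concrete, here is a small sketch (the signals and the "coverage" metric are invented for illustration; they are not from the cited studies) that scores how much of the spectrum a program occupies. A broadband, continuous signal covers nearly every frequency bin, while a narrowband one does not:

```python
import numpy as np

# Hypothetical sketch: a crude "spectral coverage" score for candidate
# program material. Broadband, continuous programs should cover far more
# of the spectrum than narrowband ones.
fs = 48000
rng = np.random.default_rng(0)
broadband = rng.standard_normal(fs * 5)                          # 5 s of noise-like program
narrowband = np.sin(2 * np.pi * 1000 * np.arange(fs * 5) / fs)   # a 1 kHz tone

def avg_psd(x, nperseg=4096):
    """Averaged periodogram (a bare-bones Welch-style estimate, no window/overlap)."""
    segs = x[:len(x) // nperseg * nperseg].reshape(-1, nperseg)
    return np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0)

def coverage(psd, floor_db=20.0):
    """Fraction of frequency bins within `floor_db` of the strongest bin."""
    db = 10 * np.log10(psd + 1e-20)
    return float(np.mean(db > db.max() - floor_db))

cov_broad = coverage(avg_psd(broadband))
cov_narrow = coverage(avg_psd(narrowband))
print(f"broadband coverage: {cov_broad:.3f}, narrowband coverage: {cov_narrow:.3f}")
```

In practice one would run such an analysis on candidate music tracks rather than synthetic signals, and combine it with the other selection criteria discussed above.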

A listener’s familiarity with the program and their affection for it from a musical or emotional perspective may also influence their sound quality judgements. Naïve listeners and audiophiles often criticize formal listening tests because they’re unfamiliar with the programs, or they dislike them. Whether this affects their performance as listeners is not well understood. ITU-R BS 1116 recommends: “the artistic or intellectual content of a programme sequence should be neither so attractive nor so disagreeable or wearisome that the subject is distracted from focusing on the detection of impairments.” [1].

A listening experiment was designed to address some of these questions: 1) which programs are most effective at producing sensitive and reliable sound quality ratings of headphones, 2) to what extent does familiarity with the program play a role, and 3) are there physical properties of the programs that can help predict their effectiveness in evaluating headphones? A post-test survey was also administered to determine whether listeners' musical preferences for certain programs and other factors influenced their performance and headphone ratings.

We published an AES preprint in 2017 that describes and summarizes the results of the experiments; it can be found here.

I've also created a PPT presentation that summarizes the experiments and results below.


[1] International Telecommunications Union, ITU-R BS 1116-3, “Methods for the subjective assessment of small impairments in audio systems,” REC-BS.1116-3-201502-I/en, February 2015.

[2] International Telecommunications Union, ITU-R BS 1534-3, “Methods for the subjective assessment of intermediate impairments in audio systems,” (October 2015).

[3] Toole, Floyd, The Acoustics and Psychoacoustics of Loudspeakers and Rooms, Focal Press, first edition 2008.

[4] Toole, Floyd E., and Olive, Sean E. “The Modification of Resonances: Perception and Measurement,” AES Volume 36 Issue 3 pp. 122-142, (March 1988).

[5] Olive, Sean E., Schuck, Peter L., Ryan, James G., Sally, Sharon L., Bonneville, Marc. “The Detection Thresholds of Resonances at Low Frequencies,” J. AES Volume 45 Issue 3 pp. 116-128, (March 1997).

[6] Olive, Sean E., “A Method for Training Listeners and Selecting Program Material for Listening Tests,” presented at the 97th Audio Eng. Soc. Convention, preprint 3893, November 1994.

Friday, March 4, 2022

The Perception and Measurement of Headphone Sound Quality - What Do Listeners Prefer?


In the spring 2022 edition of Acoustics Today, published today, you can find an article I wrote called "The Perception and Measurement of Headphone Sound Quality - What Do Listeners Prefer?"

For headphone enthusiasts who find the HARMAN headphone research difficult to access (it's behind an Audio Engineering Society paywall) or to comprehend (the research is reported in a series of 19+ technical papers), this article will hopefully provide some relief from your pain. The article is free and can be downloaded as a PDF. It summarizes the most relevant findings of our research in just 6,000 words (the maximum allowed by AT).

The current international headphone standards recommend a diffuse-field (DF) calibration for optimal sound quality. The article argues the standard is outdated and has been largely rejected by the industry in favor of alternative targets deemed more neutral and preferred. Evidence of this rejection can be found in headphone surveys, where the average measured frequency response deviates significantly from the DF target curve. Instead, the average response tends to approximate the in-room steady-state frequency response of a flat loudspeaker in a semi-reflective field (SRF) produced in a typical listening room. We've known what makes loudspeakers sound good for almost 40 years, since Dr. Floyd Toole published his seminal loudspeaker papers in 1985-86. It turns out that what makes a loudspeaker sound good also applies to headphones. Who would have guessed?

The in-room target response of an accurate loudspeaker became the starting point of the HARMAN target curve for headphones. Over several years, we conducted many controlled scientific listening tests to compare the target curve against many headphones, refine it, and ensure it had wide acceptance among different groups of listeners based on age, gender, listening experience and geographical location.

The results found that there exist three segments of listeners based on the target response they preferred. The largest segment (64%) preferred the HARMAN target curve, with two smaller segments preferring something close to the HARMAN target curve but with less bass (21%) or more bass (15%). We also found that membership in these segments tended to be related to factors such as age, listening experience and gender. Much like loudspeakers, we found that the preferred sound quality rating of a headphone can be modeled and predicted based on how far its measured frequency response deviates from the HARMAN target curve.
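As a rough sketch of how such a prediction can work (a toy model with made-up weights, not the published HARMAN model), one can summarize the error between a measured response and the target by its standard deviation and its spectral tilt, and let the predicted rating fall as either grows:

```python
import numpy as np

# Toy illustration only -- not the published HARMAN model. It mimics the idea
# that preference falls as the measured response deviates from a target: the
# "error curve" is summarized by its standard deviation and slope (tilt).
freqs = np.logspace(np.log10(20), np.log10(20000), 200)   # Hz
log_f = np.log10(freqs)

def predicted_preference(measured_db, target_db):
    error = measured_db - target_db                        # error curve in dB
    slope = np.polyfit(log_f, error, 1)[0]                 # dB-per-decade tilt
    score = 100.0 - 10.0 * error.std() - 5.0 * abs(slope)  # made-up weights
    return max(score, 0.0)

target = np.zeros_like(freqs)                 # stand-in target curve (flat)
good = target + 0.5 * np.sin(3 * log_f)       # small wiggles around target
bad = target + 6.0 * np.sin(3 * log_f) - 3.0 * (log_f - log_f.mean())

print(predicted_preference(good, target), predicted_preference(bad, target))
```

A headphone with small, gentle deviations from the target scores much higher than one with large ripples and a pronounced tilt, which is the qualitative behavior described above.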

Hopefully, this article will explain the motivation,  major findings and conclusions of the research behind the HARMAN headphone target curve. Enjoy!

Monday, May 28, 2018

Hooked on the Science of Sound

This past month, I was interviewed by Bruel & Kjaer's "Waves Magazine" in their Expert Profile feature. For those not familiar with Bruel & Kjaer: located in Denmark, they are one of the oldest (in operation since 1942) and best-known manufacturers of acoustic and vibration measurement equipment.

The interviewer was interested in how my career transitioned from musician to recording engineer to acoustics/psychoacoustics. Essentially, my career has been a whirlwind trip through the Circle of Confusion, guided by my interests, my curiosity about the perception and measurement of sound, and the opportunities I was presented at the time. There was no master plan. Hopefully, we've helped remove some of the confusion in the circle by providing a better understanding of what influences the quality of recorded and reproduced sound, and how to make it better and more consistent.

You can read the entire interview here:

Friday, February 17, 2017

TWiRT 337 – Predicting Headphone Sound Quality with Sean Olive

The predicted sound quality of 61 different models of in-ear headphones (blue curve) versus their retail price (green bars).
On February 16, 2017 I was interviewed by host Kirk Harnack on This Week in Radio Tech. The topic was "Predicting Headphone Sound Quality". You can find the interview here.

During the interview, Kirk asked if it's possible to design a good-sounding headphone for a reasonable cost, or does one need to spend a considerable amount of cash to obtain good sound? Fortunately for consumers, my answer was that you can get decent sound without having to spend thousands or even hundreds of dollars. In fact, based on our research there is almost no correlation between price and sound quality.

I referred to the slide above, which shows the predicted sound quality for 61 different models of in-ear headphones based on their measured frequency responses. The correlation between price and sound quality is close to zero and slightly negative: r = -0.16 (i.e. spending more money gets you slightly worse sound on average).
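For readers curious how such a figure is computed, it is simply the Pearson correlation coefficient between the two variables. The sketch below uses made-up prices and scores (not our data) purely to show the calculation:

```python
import numpy as np

# Illustrative only: compute a Pearson correlation between price and a
# predicted sound quality score, using invented numbers (not the paper's data).
rng = np.random.default_rng(1)
price = rng.uniform(20, 3000, size=61)                  # 61 hypothetical models
score = rng.uniform(20, 90, size=61) - 0.015 * price    # built-in negative trend

# Pearson r is the off-diagonal entry of the 2x2 correlation matrix.
r = np.corrcoef(price, score)[0, 1]
print(f"r = {r:.2f}")
```

With real data, an r near zero (as reported above) means price tells you essentially nothing about how a headphone will sound.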

So, if you think spending a lot of money on in-ear headphones guarantees you will get excellent sound, you may be sadly disappointed. One of the most expensive IE models ($3000) in the above graph had an underwhelming predicted score of 20-25%, depending on which EQ setting was chosen. The highest scoring headphone was a $100 model that we equalized to hit the Harman target response, which our research has shown to be preferred by the majority of listeners.

The sound quality scores in the graph are predicted using a model based on a small sample of headphones that were evaluated by trained listeners in double-blind tests. The accuracy of the model is better than 96%, but limited to the small sample we tested. We just completed a large listening test study involving over 30 models and 75 listeners that will allow us to build more accurate and robust predictive models.

The ultimate goal of this research is to accurately predict the sound quality of headphones based on acoustic measurements, without having to conduct expensive and time-consuming listening tests. The current engineering approach to tuning headphones is clearly not optimal, based on the above slide. Will headphone industry standards bodies, headphone manufacturers and audio review magazines use similar predictive models to reveal to consumers how good headphones sound? What do you think?

Tuesday, August 16, 2016

15 Minutes with Harman’s Audio Guru Sean Olive: Sound & Vision Magazine Interview

"The problem is that the current standard audio specifications for headphones and loudspeakers are almost useless in terms of indicating how good or bad they sound." —Sean Olive
Read more at S&V Magazine 

In May 2016, I was interviewed by editor Bob Ankosko of Sound&Vision Magazine about my views on where audio currently is, and where it is going. You can read the interview here. One of the recurring questions I get asked is whether people really care about sound quality anymore. The fact that a recent study found 55% of Americans typically listen to music through their laptop speakers doesn't bode well for the immediate future. While the recent focus has been on the poor quality of the source material (e.g. compressed MP3), a typical laptop speaker system won't reproduce the bottom 3-4 octaves of music, whether the music is compressed or recorded in high resolution (e.g. 24-bit, 96 kHz).

In terms of home loudspeakers, the trend is toward smaller size, fewer loudspeakers, and wireless connectivity. Sound bars and small, powered wireless speakers are what consumers currently want in their homes. The current engineering challenge is to build high quality systems with these features that still deliver good sound at prices consumers will pay. The fact that more consumers expect a high quality (and branded) audio system in their automobiles suggests that the desire for good audio is not dead.

What do you think the future holds for audio and sound quality?

Friday, April 22, 2016

A Virtual Headphone Listening Test Method

Fig. 1 The Harman Headphone Virtualizer App allows listeners to make double-blind comparisons of different headphones through a high-quality replicator headphone. The app has two listening modes: a sighted mode (shown) and a blind mode (not shown) where listeners are not biased by non-auditory factors (brand, price, celebrity endorsement, etc.). Clicking on the picture will show a larger version.

Early on in our headphone research we realized there was a need for a listening test method that allowed us to conduct more controlled double-blind listening tests on different headphones. This was necessary in order to remove tactile cues (headphone weight and clamping force) and visual and psychological biases (e.g. headphone brand, price, celebrity endorsement, etc.) from listeners' sound quality judgements of headphones. While these factors (apart from clamping force) don't physically affect the sound of headphones, our previous research into blind vs. sighted listening tests revealed that their cognitive influence affects listeners' loudspeaker preferences [1], often in adverse ways. In sighted tests, listeners were also less sensitive and discriminating than under blind conditions when judging different loudspeakers, including their interaction with different music selections and loudspeaker positions in the room. For that reason, consumers should be dubious of loudspeaker and headphone reviews based solely on sighted listening.

While blind loudspeaker listening tests are possible through the addition of an acoustically transparent, visually opaque curtain, there is no simple way to hide the identity of a headphone while the listener is wearing it. In our first headphone listening tests, the experimenter substituted the different headphones onto the listener's head from behind so that the headphone could not be visually identified. However, after a couple of trials, listeners began to identify certain headphones simply by their weight and clamping force. One of the easiest headphones for listeners to identify was the Audeze LCD-2, which was considerably heavier (522 grams) and less comfortable than the other headphones. The test was essentially no longer blind.

To that end, a virtual headphone method was developed whereby listeners could A/B different models of headphones that were virtualized through a single pair of headphones (the replicator headphone). Details on the method and its validation were presented at the 51st Audio Engineering Society International Conference on Loudspeakers and Headphones [2] in Helsinki, Finland in 2013.  A PDF of the slide presentation can be found  here.

Headphone virtualization is done by measuring the frequency response of the different headphones at the DRP (eardrum reference point) using a G.R.A.S. 45 AG, and then equalizing the replicator headphone to match the measured responses of the real headphones. In this way, listeners can make instantaneous A/B comparisons between any number of virtualized headphones through the same headphone, without visual and tactile cues biasing their judgment. More details about the method are in the slides and AES preprint.
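Conceptually, the equalization reduces to a magnitude correction of target over replicator. The sketch below shows a minimal zero-phase version with invented measurement curves (these details are assumptions for illustration, not the actual HARMAN implementation):

```python
import numpy as np

# Minimal sketch of the virtualization idea: equalize the replicator
# headphone so its magnitude response matches the measured response of the
# headphone being virtualized. Phase is left untouched (zero-phase EQ).
# The two measurement curves below are made up, standing in for real
# measurements at the eardrum reference point.
fs = 48000
n = fs                                   # one-second processing block
freqs = np.fft.rfftfreq(n, 1 / fs)

target_db = 4.0 * np.exp(-((freqs - 100) / 120.0) ** 2)         # headphone to virtualize
replicator_db = -1.5 * np.exp(-((freqs - 3000) / 2500.0) ** 2)  # replicator headphone

def virtualize(x):
    """EQ one block of audio so the replicator mimics the target response."""
    gain = 10 ** ((target_db - replicator_db) / 20.0)   # required correction
    return np.fft.irfft(np.fft.rfft(x) * gain, n=len(x))

# Quick check with a 3 kHz tone: the replicator dips -1.5 dB there while the
# target is flat, so the EQ should apply about +1.5 dB of gain.
tone = np.sin(2 * np.pi * 3000 * np.arange(n) / fs)
boost_db = 20 * np.log10(np.std(virtualize(tone)) / np.std(tone))
print(f"gain applied at 3 kHz: {boost_db:.2f} dB")
```

A production implementation would also handle block overlap, smoothing of the measured curves, and limits on the correction at high frequencies, as discussed below.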

An important question is: "How accurate are the virtual headphones compared to the actual headphones?" In terms of their linear acoustic performance, they are quite similar. Fig. 2 compares the measured frequency responses of the actual versus virtualized headphones. The agreement is quite good up to 8-10 kHz, above which we didn't aggressively equalize the headphones because of measurement errors and large variations related to headphone positioning, both on the coupler and on the listeners' heads.

Fig. 2 Frequency response measurements of the 6 actual versus virtualized headphones made on a GRAS 45 AG coupler with pinna. The dotted curves are based on the physical headphones and the solid curves are from the virtual (replicator) headphone. The measurements of the right channel of each headphone (red curves) have been offset by 10 dB from the left channel (blue curves) for visual clarity. Clicking on the picture will show a larger version.

More importantly, "Do the actual and virtual headphones sound similar?" To answer this question, we performed a validation experiment where listeners evaluated 6 different headphones using both standard and virtual listening methods. Listeners gave both preference and spectral balance ratings in both tests. For headphone preference ratings, the correlation between standard and virtual test results was r = 0.85. A correlation of 1 would be perfect, but 85% agreement is not bad, and hopefully more accurate than headphone ratings based on sighted evaluations.

We believe the differences between virtual and standard test results are in part due to nuisance variables that were not perfectly controlled across the two methods. A significant nuisance variable is likely headphone leakage, which affects the amount of bass heard depending on the fit of the headphone on the individual listener. This would have affected the results in the standard test but not the virtual one, where we used an open-back replicator headphone that largely eliminates leakage variations across listeners. Headphone weight and tactile cues were present in the standard test but not the virtual test, which could also partly explain the differences in results. If these two variables could be better controlled, even higher accuracy could be achieved in virtual headphone listening.

Fig. 3 The mean listener preference ratings and 95% confidence intervals for the headphones rated using the Standard and Virtual listening test methods. In the Standard method, listeners evaluated the actual headphones with tactile/weight biases and any leakage effects. In the Virtual tests, there were no visual or tactile cues about the headphones. Note: clicking on the picture will show a larger version.

Some additional benefits of virtual headphone testing were discovered besides the elimination of sighted and psychological biases: the listening tests are faster, more efficient and more sensitive. When listeners can quickly switch between and compare all of the headphones in a single trial, auditory memory is less of a factor, and they are better able to discriminate among the choices. Since this paper was written in 2013, we've improved the accuracy of the virtualization, in part by developing custom pinnae for our GRAS 45 CA that better simulate the leakage effects of headphones measured on real human subjects [3].

Finally, it's important to acknowledge what the virtual headphone method doesn't capture: 1) non-minimum phase effects (mostly occurring at higher frequencies) and 2) non-linear distortions that are level-dependent. The effect of these two variables on the virtual headphone test method has recently been tested experimentally and will be the topic of a future blog posting. Stay tuned.


[1] Floyd Toole and Sean Olive,”Hearing is Believing vs. Believing is Hearing: Blind vs. Sighted Listening Tests, and Other Interesting Things,” presented at the 97th AES Convention, preprint 3894 (1994). Download here.

[2] Sean E. Olive et al., "A Virtual Headphone Listening Test Method," presented at the 51st AES International Conference on Loudspeakers and Headphones, Helsinki, Finland (2013).

[3] Todd Welti, "Improved Measurement of Leakage Effects for Circum-Aural and Supra-Aural Headphones," presented at the 136th AES Convention, (May 2014). Download here.

Thursday, March 31, 2016

Harman Gives Loudspeaker Course To U of Rochester Engineering Students

Recently Mark Glazer, Principal Engineer at Harman Luxury Audio and  Revel Loudspeakers gave an invited lecture to University of Rochester Audio/Acoustic Engineering Students. The students are part of the graduate acoustic and music engineering program that is overseen by Professor Mark Bocko, Distinguished Professor, Electrical and Computer Engineering. By exposing the students to the fascinating engineering and science of loudspeakers, it is hoped the students will consider a future career in loudspeaker or audio engineering.

The 1-hour lecture gave an overview of current best practices in designing a modern-day loudspeaker.

The proof of a good loudspeaker design is ultimately how good it sounds. Dr. Sean Olive (me), Acoustic Research Fellow at Harman International, presented an overview of the science of evaluating loudspeakers, which included test results from a competitive benchmarking of the new Revel Concerta 2 M16 (designed by Mark Glazer) against three competitors. The listening test results were generally predictable from the set of anechoic measurements made of the different loudspeakers.

Following the lecture, we got a tour of the University's engineering facilities, which include some impressive 3D laser scanning tools for studying the vibrational modes of loudspeakers. We heard some very novel flat-panel loudspeakers with vibrational mode control developed by the Ph.D students and Professor Bocko, followed by  presentations of research projects undertaken by the Masters and Ph.D. engineering students who are working in acoustics and audio-related research. Overall, the quality of acoustic and music research being done there is impressive. As always, Professor Bocko was a gracious host, and we look forward to a return visit (hopefully in the summer or fall months).

Mark Glazer's speaker design course slides are available here: