Monday, April 22, 2013

The Relationship between Perception and Measurement of Headphone Sound Quality

Above: The brands and models of six popular headphones used in this study.

In many ways, our scientific understanding of the perception and measurement of headphone sound quality is 30 years behind our knowledge of loudspeakers. Over the past three decades, loudspeaker scientists have developed controlled listening test methods that provide accurate and reliable measures of listeners' loudspeaker preferences and their underlying sound quality attributes. From the perceptual data, a set of acoustical loudspeaker measurements has been identified from which we can model and predict listeners' loudspeaker preference ratings with about 86% accuracy.

In contrast to loudspeakers, headphone research is still in its infancy. Looking at published acoustical measurements of headphones, you will discover there is little consensus among brands (or even within the same brand) on how a headphone should sound and measure [1]. There exist too few published studies based on controlled headphone listening tests to identify which objective measurements and target response curves produce an optimal sound quality. Controlled, double-blind comparative subjective evaluations of different headphones present significant logistical challenges to the researcher, including the control of headphone tactile and visual biases. Sighted biases related to price, brand, and cosmetics have been shown to significantly bias listeners' judgements of loudspeaker sound quality. Therefore, these nuisance variables must be controlled in order to obtain accurate assessments of headphone sound quality.

Todd Welti and I recently conducted a study to explore the relationship between the perception and measurement of headphone sound quality. The results were presented at the 133rd AES Convention in San Francisco in October 2012. A PDF of the slide presentation referred to below can be found here. The AES preprint can be found in the AES E-library. The results of this study are summarized below.

Measuring The Perceived Sound Quality of Headphones

Double-blind comparative listening tests were performed on six popular circumaural headphones ranging in price from $200 to $1000 (see above slide). The listening tests were carefully designed to minimize biases from known listening test nuisance variables (slides 7-13). A panel of 10 trained listeners rated each headphone based on overall preferred sound quality, perceived spectral balance, and comfort. The listeners also gave comments on the perceived timbral, spatial, and dynamic attributes of the headphones to help explain their underlying sound quality preferences.

The headphones were compared four at a time over three listening sessions (slide 12). Assessments were made using three music programs with one repeat to establish the reliability of the listeners' ratings. The order of headphone presentations, programs and listening sessions was randomized to minimize learning and order-related biases. The test administrator manually substituted the different headphones on the listener from behind so that the listener was not aware of the headphone brand, model or appearance during the test (slide 8). However, tactile/comfort differences were part of the test. Listeners could adjust the position of the headphones on their heads via lightweight plastic handles attached to the headphones.
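
The randomization described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: the assignment of headphones to sessions, the program labels, and the choice of which program is repeated are all assumptions made for the example, since the post does not specify them.

```python
import random

# Hypothetical grouping: six headphones, four per session, three sessions,
# so each headphone appears in exactly two sessions. The actual grouping
# used in the study is not given in this post.
SESSION_GROUPS = [
    ["HP1", "HP2", "HP3", "HP4"],
    ["HP3", "HP4", "HP5", "HP6"],
    ["HP1", "HP2", "HP5", "HP6"],
]
PROGRAMS = ["program_A", "program_B", "program_C"]  # placeholder labels


def make_schedule(session_groups, programs, seed=None):
    """Return a randomized schedule for one listener.

    The result is a list of sessions; each session is a list of
    (program, headphone_order) trials.
    """
    rng = random.Random(seed)

    # Randomize the order of the listening sessions.
    sessions = [list(group) for group in session_groups]
    rng.shuffle(sessions)

    schedule = []
    for group in sessions:
        # Three programs plus one repeated program (assumed: chosen at random),
        # presented in random order.
        trials = list(programs) + [rng.choice(programs)]
        rng.shuffle(trials)

        session = []
        for program in trials:
            # Randomize the order in which the headphones are presented.
            order = list(group)
            rng.shuffle(order)
            session.append((program, order))
        schedule.append(session)
    return schedule


if __name__ == "__main__":
    for i, session in enumerate(make_schedule(SESSION_GROUPS, PROGRAMS, seed=2013), 1):
        for program, order in session:
            print(f"session {i}: {program:<10} {' -> '.join(order)}")
```

Giving each listener a different seed yields a different presentation order per listener, which is one simple way to keep order effects from lining up across the panel.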

Listeners Prefer Headphones With An Accurate, Neutral Spectral Balance

When the listening test results were statistically analyzed, the main effect on the preference rating was due to the different headphones (slide 15). The preferred headphone models were perceived as having the most neutral, even spectral balance (slide 19), with the less preferred models having too much or too little energy in the bass, midrange or treble regions. Frequency analysis of listeners' comments confirmed listeners' spectral balance ratings of the headphones, and proved to be a good predictor of overall preference (slide 20). The most preferred headphones were frequently described as "good spectral balance, neutral with low coloration, and good bass extension," whereas the less preferred models were frequently described as "dull, colored, boomy, and lacking midrange".
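
As a rough illustration of what a frequency analysis of comments involves, the sketch below counts how many comments about a headphone mention each descriptor. The descriptor list and the example comments are made up for the illustration; they are not data from the study, and the study's actual coding of comments may well have been more elaborate.

```python
import re
from collections import Counter

# Hypothetical single-word descriptors, loosely based on the quoted comments.
DESCRIPTORS = ["neutral", "dull", "colored", "boomy"]


def descriptor_counts(comments, descriptors):
    """Count the number of comments in which each descriptor appears.

    Matching is on whole lowercase words, so "uncolored" does not count
    as "colored". Each descriptor is counted at most once per comment.
    """
    counts = Counter()
    for text in comments:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for descriptor in descriptors:
            if descriptor in words:
                counts[descriptor] += 1
    return counts


if __name__ == "__main__":
    # Made-up comments for two hypothetical headphones.
    comments = {
        "HP_A": ["Neutral, good bass extension", "neutral with low coloration"],
        "HP_B": ["Dull and boomy", "Boomy, colored", "colored"],
    }
    for headphone, texts in comments.items():
        print(headphone, dict(descriptor_counts(texts, DESCRIPTORS)))
```

The resulting counts per headphone can then be compared with the preference ratings to see which descriptors go with high or low preference.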

Looking at the individual listener preferences, we found good agreement among listeners in terms of which models they liked and disliked (slides 16 and 18). Some of the most commercially successful models were among the least preferred headphones in terms of sound quality. In cases where an individual listener had poor agreement with the overall listening panel's headphone preferences, we found either the listener didn't understand the task (they were less trained), or the headphone didn't properly fit the listener, thus causing air leaks and poor bass response; this was later confirmed by doing in-ear measurements of the headphone(s) on individual listeners (slides 26-39).

Measuring the Acoustical Performance of Headphones

Acoustical measurements were made on each headphone using a GRAS 43AG Ear and Cheek simulator equipped with an IEC 711 coupler (slide 24). The measurement device is intended to simulate the acoustical effects of an average human ear, including the acoustical interactions between the headphone and the acoustical impedance of the ear. The headphone measurements shown below include these interactions as well as the transfer function of the ear, mostly visible in the graphs as a ~10 dB peak at around 3 kHz. It is important to note that since we are born with these ear canal resonances, we have adapted to them and don't "hear" them as colorations.

Relationship between Subjective and Objective Measurements 

Comparing the acoustical measurements of the headphones to their perceived spectral balance confirms that the more preferred headphones generally have a smooth and extended response below 1 kHz that is perceived as an ideal spectral balance (slide 25). The least preferred headphones (HP5 and HP6) have the most uneven measured and perceived frequency responses below 1 kHz, which generated listener comments such as "colored, boomy and muffled." The measured frequency response of HP4 shows a slight bass boost below 200 Hz, yet on average it was perceived as sounding thin; this headphone was one of the models that had bass leakage problems for some listeners due to a poor seal on their ears.

Above: The left and right channel frequency response measurements of each headphone are shown above the mean preference rating and 95% confidence interval it received in blind listening tests. The dotted green response on each graph shows the "perceived spectral balance" based on the listeners' responses.
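
For readers unfamiliar with how a mean rating and its 95% confidence interval are obtained, here is a minimal sketch using a Student's t interval for a panel of 10 listeners. The ratings are invented for the example, and the intervals in the study itself may have been derived differently (for instance from the statistical model used to analyze the full data set), so treat this only as an illustration of the concept.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 9 degrees of freedom
# (10 listeners minus 1).
T_CRIT_95_DF9 = 2.262


def mean_with_ci(ratings, t_crit):
    """Return (mean, lower, upper) for a t-based confidence interval."""
    n = len(ratings)
    mean = statistics.mean(ratings)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(ratings) / math.sqrt(n)
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Invented preference ratings from 10 listeners for one headphone.
    ratings = [6.0, 7.0, 5.5, 6.5, 7.5, 6.0, 6.5, 7.0, 5.0, 6.0]
    mean, lower, upper = mean_with_ci(ratings, T_CRIT_95_DF9)
    print(f"mean = {mean:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

When the intervals of two headphones do not overlap, that is a quick visual hint that the difference in their mean ratings is unlikely to be due to chance alone.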


In conclusion, this headphone study is one of the first of its kind to report results based on controlled, double-blind listening tests [2]. The results provide evidence that trained listeners preferred the headphones perceived to have the most neutral spectral balance. The acoustical measurements of the headphones generally confirmed and predicted which headphones listeners preferred. We also found that bass leakage related to the quality of fit and seal of the headphone to the listeners' head/ears can be a significant nuisance variable in subjective and objective measurements of headphone sound quality.

It is important for the reader not to draw generalizations from these results beyond the conditions we tested. One audio writer has already questioned whether headphone sound quality preferences of trained listeners can be extrapolated to tastes of untrained younger demographics whose apparent appetite for bass-heavy headphones might indicate otherwise. We don't know the answer to this question. For younger consumers, headphone purchases may be driven more by fashion trends and marketing B.S. (Before Science) than sound quality. While this question is the focus of future research, the preliminary data suggest that in blind A/B comparisons kids prefer headphones with accurate reproduction to colored, bass-heavy alternatives. This would tend to confirm findings from previous investigations into loudspeaker preferences of high school and college students (both Japanese and American) that so far indicate most listeners prefer accurate sound reproduction regardless of age, listener training or culture.

Future headphone research may tell us (or not) that most people prefer accurate sound reproduction regardless of whether the loudspeakers are installed in the living room, the automobile, or strapped onto the sides of their heads. It makes perfect sense, at least to me. Only then will listeners hear the truth -- music reproduced as the artist intended.

[1] Despite the paucity of good subjective measurements on headphones, there do exist some online resources where you can find objective measurements on headphones. You will be hard pressed to find a manufacturer who will supply these measurements of their products. The resources include Sound & Vision Magazine and Tyll Hertsens at InnerFidelity, who has a large database of frequency response measurements of headphones that clearly illustrates the lack of consensus among manufacturers on how a headphone should sound and measure. There is even a lack of consistency among different models made by the same brand.

[2] Sadly, studies like the present one are so uncommon in our industry that Sound and Vision Magazine recently declared this paper the biggest audio story of 2012. Hopefully that will change sooner rather than later.


  1. Dr. Olive,

    as usual another great and insightful article. As a dedicated reader I appreciate the scientific side of the audio business.

    As someone who has owned the AKG K701 headphones I would have classified them as dynamic/open not closed.

    Cheers and keep up the great audio science.

  2. Tony,

    Thanks and you are correct: the AKG K701 should read: Dynamic/Open since they are both.

  3. Oh, we are finally seeing the light in this snake-oil-god-forbidden industry of headphones..! :D

  4. Very interesting, thank you. Am I right in my assumption that Audeze LCD2 is HP1, AKG K550 HP2, AKG K701 HP3 and so on?

  5. Julie
    No that's not quite the correct order :)

  6. Rin Choi,
    Yes, there is a fair amount of snake-oil in the headphone industry just like there is/was in the loudspeaker industry. All you need is a good industrial designer, a marketing budget, a celebrity to endorse your product, and an ODM supplier in China and you're in the headphone business.

    Hopefully, science will help sort out the serious players from the wannabes.

  7. Thanks for the article. Let me take a shot:

    HP1 - LCD-2
    HP2 - K701
    HP4 - K550

    I too, had issues when I tried the K550 at a store, however if I pressed the earcups against my ears, the bass magically appeared.

    Further guessing:

    HP3 - Crossfade
    HP5 - Bose
    HP6 - Beats

    1. Anonymous April 23, 2013 at 8:39 AM

      You have 3 of 6 correct.

    2. Sean,

      Did you find differences between listeners, i.e. did you find people with rather different ear canal sizes react differently?

      I'm curious, I have seen what appeared to be some connections to that vs. midrange response in headphones, but I haven't the resources to confirm it.

      Bearing in mind I am talking purely about listener preference here, not about accuracy.

      As to accuracy, did you use a standard coupler?

  8. Not one mention of HeadRoom (who have been measuring headphones since the late 90's) or Tyll at Innerfidelity, who has continued and expanded the headphone measurements? Seems like you should have done a bit more research on the subject of headphone measurements. There is also the difficulty of obtaining accurate measurements due to fit issues not allowing for a proper seal on the measurement head, just like it would on different sized human heads as you mentioned.

    Otherwise a well written article, I just wish you would have given props to those who've been doing this measurement thing for many years.

    1. See footnote 1 which refers to

  9. Ah, this is really great to see. As a professional audio engineer, I've found the process of choosing headphones maddening. I've also been perplexed by the variance between what others describe to me when listening to a particular set of headphones, and what I hear. Thanks so much for this research and for posting your results! It would be great to see more of this, in particular surveying the current professional studio standards; ATH-M50s, Sennheiser HD650s, Sony MD7905s, etc... Don't suppose we'll ever get standardized measurements published by the manufacturers though.


  10. Anonymous April 23, 2013 at 11:07 AM
    Thanks for your note and for reminding me to mention the people like Tyll who have been doing excellent work measuring headphones. I commend what they are doing and it only helps the cause to bring more science to the design and testing of headphones. They help illustrate the fact that headphone manufacturers seem to have no consensus in how a headphone should measure.

    What we need more of is controlled listening tests to help us interpret the perceptual meaning of the objective measurements.

  11. @ Rich Breen
    You may find some of these headphone measurements posted at InnerFidelity.

    I think standardized headphone measurements may happen once we have some rules to help us interpret their meaning in terms of how they sound. Whether manufacturers choose to show them is a different matter. It hasn't happened very much with loudspeakers, so I wouldn't hold my breath.

  12. This article is about a scientific study that was conducted yet does not discuss the specific findings. With overall agreeing results from the listeners, it maddens me that the specific results were not mentioned. Unreal.

  13. Dr. Olive, Thanks for including the Audeze LCD2 in your paper. This research reinforces our development philosophy. Sankar/Audeze

  14. @ Anonymous April 23, 2013 at 12:51 PM
    I'm not sure what you mean. We don't identify the brands/models of the headphones in the results because we don't wish to confuse scientific findings with commercial conflicts of interest.
    Not mentioning the brands in the results doesn't change the findings, which show a relationship between headphone preference, perceived spectral balance and the measurements.

    Clearly, this is just the beginning of our research, and more headphones and more listening tests with different listeners are needed to better define the relationship between perception and measurement of headphone sound quality. It's a work in progress.

  15. LCD2, K701, K550, Crossfade, Confort15, Beats ?

  16. I agree with Dr. Olive that the headphones didn't need to be specified, as the results of the test were the real output.

    Anyway, HP1 looks like a very neutral headphone, while HP2 is a headphone that features a diffuse field response. It's pretty interesting how HP2 was perceived: if we use 1 kHz as a reference point on the green curve, it appears to lack bass, yet the frequency response curve shows that this is not the case. So perhaps the diffuse field equalization is at fault?

  17. Very interesting research. I am a blogger as well.
    Would you please follow my site and like me on Facebook?
    I have already become your fan.

  18. What sort of people were the 'panel of 10 trained listeners'?

    1. The listeners were all Harman employees who had normal audiometric hearing and had passed at least level 8 of Harman How to Listen.

  19. Dr. Olive,

    in order to better understand the science I was curious if you could explain the transfer function of the ear when the ear canal is blocked by the in-ear monitor type headphones. I am trying to understand some of the graphs when they don't exhibit that normal drop around 3kHz that we see in your graphs of the frequency measurements.

    Thanks again.


  20. Every diffuse field EQ I've heard sounds bright.... Almost looks like a target curve could be simply flat.

  21. DanTheMan April 28, 2013 at 9:50 PM

    Yes, I think the DF EQ is generally too bright when listening to stereo recordings intended for loudspeakers. Headphones need to simulate what is heard through a good loudspeaker in a good room -- and that doesn't quite resemble what the DF target curve looks like.

    1. Dr. Olive, could you comment on which of the following statements might be true:
      a) The response curve obtained with a good loudspeaker in a good room measured with a standard ear simulator (IEC 711) with a standard pinna (Type 3.3) on a standard manikin (Kemar; P.58 HATS) is substantially different than the published DF "target" curve.
      b) The two curves are substantially the same but headphones with that response curve are not preferred when listening to stereo recordings intended for loudspeakers.
      Do your findings suggest that headphones with the response curve defined in hypothesis a) will not be preferred?

    2. Osman Isvan May 7, 2013 at 10:20 PM

      To answer your questions specifically:

      a) Yes that is true. What we measure in our listening room is something in between DF and FF with added room gain at LF. That is generally what listeners prefer when listening to stereo recordings intended for loudspeakers.

  22. Dr. Olive,

    Thank you for publishing this. It confirms many thoughts I've had about headphones and measurements. Do you reveal which model is which in your AES paper?

    Here's my guess after comparing your measurements to ours. Since you don't reveal which model of V-MODA, I'm guessing it might be the LP version. However, the bass response curve is similar to the M100 so I'm pretty sure HP-6 is at least made by V-MODA, just not sure about the model.

    HP1 - LCD-2
    HP2 - K701
    HP3 - K501
    HP4 - Beats by Dre
    HP5 - Bose QC15
    HP6 - V-moda crossfade

  23. Dr. Olive, I wish you'd still post as regularly as you used to, but again it has been worth the wait ;) .

    I've had a couple of headphones through the years, until I purchased a Sennheiser HD600. In my opinion it is a good sounding device, but since I don't really know how to interpret the measurements, it's still just that - my opinion. It was however kind of a shock to find out that to my ears the in-ear headphones supplied with my Samsung Galaxy phone sound quite similar. Interesting stuff.

    I am by now convinced that what is a good loudspeaker to me, should be a good loudspeaker to anyone who has more or less normal hearing. Do you think in time we may be able to say the same with respect to headphones, or do you think that our individual hearing may be too different, such that it will to a much greater extent remain a matter of personal preference?

    Kind of off-topic, you said above "From the perceptual data, a set of acoustical loudspeaker measurements has been identified from which we can model and predict listeners' loudspeaker preference ratings with about 86% accuracy". 86% is quite reasonable, but do you have any idea why your predictions aren't even more accurate? Are the different measurements that are used in the model perhaps not weighted optimally, or do you suppose there might be some factor that is not included in your performance estimates? If I am not mistaken, in AB comparisons loudspeakers are generally positioned at the exact same locations - is it perhaps possible that that same location is not optimal for each loudspeaker? Or is it perhaps that some loudspeakers favor certain program material more than others? I'd love to hear your opinion.

    Kind regards,


    1. Martijn M May 7, 2013 at 5:44 AM

      I wish I had more free time to devote to this blog but I get paid the same whether I work on it or not (and working on it arguably detracts from the things I get paid for).

      I agree with your assessment of loudspeakers: What sounds good to you should generally sound good to most listeners assuming you both have normal hearing and make judgements under the same listening conditions.

      With headphones the same principles would generally apply as long as the same signals are being delivered to your ears. The main issue is the headphones are strongly coupled to the person's head and this can have a strong influence on the signals being delivered to the ear drum.

      Bass leakage is a huge issue with headphones, particularly for closed and in-ear designs. With loudspeakers, bass quality accounts for 30% of listeners' overall loudspeaker preference, so I suspect this is true with headphones too.

      Why can't we predict loudspeaker preference better than 86% accuracy? There are a number of reasons but here are two obvious ones that I pointed out in the original AES preprint:
      1) The prediction model only includes linear distortions in the loudspeaker measurements -- not nonlinear distortion
      2) The accuracy of prediction is limited by limitations in the listening tests that created variance from context effects and an elastic rating scale. The 86% accuracy was for the model based on 70 loudspeakers evaluated over 19-20 different listening tests. For the Consumer Reports tests, where we controlled context effects by comparing all combinations of loudspeakers, the accuracy was almost 100%.

    2. Dr Olive, your response above leads nicely into a question I have been meaning to ask. Are you any closer to understanding the effects of nonlinear distortion on listener preference? Also, do you have any upcoming experiments designed to investigate this issue?
      Many thanks

    3. We keep planning some experiments on effects of nonlinear distortion in loudspeakers, but then the price of neodymium falls again and there is less reason to study it :)

      If you design good transducers and don't drive them beyond their excursion limits, it's not much of an issue. But with marketing pressure on small footprint, lower power, and lower cost, nonlinear distortion is an issue.

    4. Dr Olive, you make a very good point re the marketing pressure. You only need to look at the Harman Kardon website to realise there must be a large market for small loudspeakers with circa 3 inch drivers. It must be very technically challenging to produce high quality sound that can play loud enough at low enough frequency to make bass non directional. It would be interesting to hear how Harman approach this challenge?

      In the future I can see maybe DSP coming into play with powered loudspeakers similar to the drivers used in the On beat extreme, I saw a great YouTube clip on these drivers and their excursion capability. However with this approach you run into the issue of needing 2 cables to each speaker, something I'm sure the marketing department also don't like! Do you think this is where the industry is headed, maybe with wireless technology, WiSA perhaps?

      Finally, sorry for the essay!! And more related to the original article. I have recently been reading about DTS headphone x, seems interesting and would be useful for late night listening! I'm guessing they use some kind of BRS recording with some algorithms laid on top. Not sure if there is any head tracking though. Any idea if Harman will incorporate this technology in future products?

      Again apologies for the lengthy response.
      Kind regards

    5. I think the trend to make thinner, more compact audio systems will continue as HDTV flat panels get smaller and thinner.

      That probably means that audio systems will be challenged to play loud with low bass. The application of DSP to cancel nonlinear distortion may become more commonplace. Wireless systems may help encourage the use of smaller more compact subwoofers which will help.

      I heard DTS Headphone X at CES. My understanding is they binaurally process the 5-7 channels into 5-7 virtualized channels that are stored on the Blu-ray as 2 channels. There is no accommodation for head-tracking, and how well it works depends on how good the headphones are. Hence, they are apparently going to recommend or approve certain models of headphones that meet their performance standards.

  24. Dr Olive,

    I'd like to have a go at identifying the headphones, please.

    HP5-Beats by Dr Dre



    1. Anonymous May 12, 2013 at 8:13 AM

      You have 4 out of 6 correct :)

  25. Brilliant article! Just one question. Is HP1 the LCD2?

  26. Articles like this should get way more exposure. You're doing valuable work!

  27. Dr. Olive,

    Firstly, let me commend you on this paper and, hopefully, following ones.

    I've done some A/B blind testing in the past but with a different aim: I was looking to find what people preferred, accuracy-wise that is, among circumaural, supra-aural and in-ear designs. Unfortunately, my testing equipment (i.e. the G.R.A.S system in this article) wasn't that good and my panel was comprised of friends and colleagues, which makes my results less reliable.

    I would like to point out though, that a vital piece of information is missing here, I'm talking about the impulse response, or preferably, a decay plot, of said headphones, which is, in my opinion, a most important factor in an accurate sound reproduction.

    Keep up this sacred work.


  28. Really appreciate the study, Dr. Olive. It's my first read on your blog and I'm looking forward to your future articles.

    Will you be conducting blind tests between different headphone cables in the future? I'm sure you're aware of aftermarket headphone cable manufacturers who claim their cables provide an improvement in sound quality. This subject has been the center of a longstanding debate between subjective and objective listeners. I believe a proper study like the above is sorely needed to dispel the myths behind "high end" audio cables.

  29. This is killing me... It seems like there should be enough data to get an indication of which headphone is which, based on the FR graphs and hints of Sean Olive... The PDF just takes forever to browse through as I'm trying to figure it out.

    From personal experience, I found the LCD2 to be the most accurate thing I've heard. There are 3 times I've been shocked by headphones: With my first PortaPro which gave me many things earbuds couldn't, when I got used to Etymotics out of a decent source, and when I heard the LCD2... I've not heard anything better since.

  30. Dr Olive, an experiment that is of dire need for headphone consumers. Thank you so much for the wealth of knowledge you bring to us mere mortals!

    PS Will you ever reveal what HP1 is?

    1. No one has offered me enough money yet to divulge the identities :)

      However, some people have figured it out.

  31. Sorry for the double post but, contrary to what other people think here, I don't think HP1 is the LCD-2. Looking at the data, it would seem that in fact HP2 is the LCD-2 considering that it has an exaggerated bass. Am I correct in thinking this is the true nature of things?

    HP5-Beats by Dr Dre

  32. Could you just use an EQ to level out a headphone frequency response?

    1. Yes, to a large extent you can improve the sound quality of a headphone through equalization. However, you need to be able to accurately measure the headphone and know what the target curve should be. We actually equalized several headphones to different target curves, as well as simulated several different headphones on the same headphone by equalizing it to the measured responses of the different phones. The results are summarized in the Innerfidelity story found in this link:

  33. I’m sure you will provide more awesome blog posts like these, which I’ve enjoyed a lot.

  34. Could you review Superlux headphones?

  35. Very interesting article. But it made me too curious which HP has which number. There are 2 statements where Dr. Olive says 3/6 right and 4/6, but that's just impossible because there are not 3 HP with the same number in the two lists. So I guess it looks like this:
    HP1 - LCD
    HP2 - K701
    HP3 - Bose
    HP4 - k550
    HP5 - V-Moda
    HP6 - well, the teenager's toy

    I'm surprised that K550 didn't get better ratings. It is evident that not all listeners did get a proper seal.

  36. It's:
    1. LCD
    2. K701
    3. K550
    6. Beats
    That's by looking at InnerFidelity graphs.

  37. Am trying to support an audio club interested in staging some loudspeaker listening/testing... we are in search of any listening test "forms" that might exist such that the listeners' commentary/responses can be better elicited/gathered... including some single blind abx evaluation against our reference pair... and a bit of basic bench testing... not looking for any ground breaking research, just a basic protocol that can generate an objective, useful quick look compiled from 6-10 listeners.

    Mike Miles