Monday, April 22, 2013

The Relationship between Perception and Measurement of Headphone Sound Quality

Above: The brands and models of six popular headphones used in this study.

In many ways, our scientific understanding of the perception and measurement of headphone sound quality is 30 years behind our knowledge of loudspeakers. Over the past three decades, loudspeaker scientists have developed controlled listening test methods that provide accurate and reliable measures of listeners' loudspeaker preferences and their underlying sound quality attributes. From the perceptual data, a set of acoustical loudspeaker measurements has been identified from which we can model and predict listeners' loudspeaker preference ratings with about 86% accuracy.

In contrast to loudspeakers, headphone research is still in its infancy. Looking at published acoustical measurements of headphones, you will discover there is little consensus among brands (or even within the same brand) on how a headphone should sound and measure [1]. There exist too few published studies based on controlled headphone listening tests to identify which objective measurements and target response curves produce optimal sound quality. Controlled, double-blind comparative subjective evaluations of different headphones present significant logistical challenges to the researcher, including the control of tactile and visual biases. Sighted factors related to price, brand, and cosmetics have been shown to significantly bias listeners' judgments of loudspeaker sound quality. Therefore, these nuisance variables must be controlled in order to obtain accurate assessments of headphone sound quality.

Todd Welti and I recently conducted a study to explore the relationship between the perception and measurement of headphone sound quality. The results were presented at the 133rd AES Convention in San Francisco in October 2012. A PDF of the slide presentation referred to below can be found here. The AES preprint can be found in the AES E-library. The results of this study are summarized below.

Measuring The Perceived Sound Quality of Headphones

Double-blind comparative listening tests were performed on six popular circumaural headphones ranging in price from $200 to $1000 (see above slide). The listening tests were carefully designed to minimize biases from known listening test nuisance variables (slides 7-13). A panel of 10 trained listeners rated each headphone based on overall preferred sound quality, perceived spectral balance, and comfort. The listeners also gave comments on the perceived timbral, spatial, and dynamic attributes of the headphones to help explain their underlying sound quality preferences.

The headphones were compared four at a time over three listening sessions (slide 12). Assessments were made using three music programs with one repeat to establish the reliability of the listeners' ratings. The order of headphone presentations, programs and listening sessions was randomized to minimize learning and order-related biases. The test administrator manually substituted the different headphones on the listener from behind so that the listener was not aware of the headphone brand, model or appearance during the test (slide 8). However, tactile/comfort differences were part of the test. Listeners could adjust the position of the headphones on their heads via lightweight plastic handles attached to the headphones.
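The randomization described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the procedure actually used in the study: the function name `make_session_schedule`, the headphone and program labels, and the rule for picking which program is repeated are all hypothetical.

```python
import random

def make_session_schedule(headphones, programs, seed=None):
    """Build a randomized trial list for one listening session.

    Each trial pairs one music program with a freshly shuffled
    headphone presentation order. One program is repeated so that
    the reliability of a listener's ratings can be checked.
    """
    rng = random.Random(seed)
    # All programs plus one repeat. Picking the repeat at random is an
    # assumption made for this sketch only.
    trial_programs = list(programs) + [rng.choice(list(programs))]
    rng.shuffle(trial_programs)  # randomize program order

    schedule = []
    for program in trial_programs:
        order = list(headphones)
        rng.shuffle(order)  # randomize headphone order within the trial
        schedule.append({"program": program, "order": order})
    return schedule

# Hypothetical labels: four headphones compared in one session.
session = make_session_schedule(
    ["HP1", "HP2", "HP3", "HP4"],
    ["program_A", "program_B", "program_C"],
    seed=1,
)
for trial in session:
    print(trial["program"], trial["order"])
```

Passing a seed makes a schedule reproducible for record keeping, while a different seed per listener gives each listener a different order.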

Listeners Prefer Headphones With An Accurate, Neutral Spectral Balance

When the listening test results were statistically analyzed, the main effect on the preference rating was due to the different headphones (slide 15). The preferred headphone models were perceived as having the most neutral, even spectral balance (slide 19) with the less preferred models having too much or too little energy in the bass, midrange or treble regions. Frequency analysis of listeners' comments confirmed listeners' spectral balance ratings of the headphones, and proved to be a good predictor of overall preference (slide 20). The most preferred headphones were frequently described as "good spectral balance, neutral with low coloration, and good bass extension," whereas the less preferred models were frequently described as "dull, colored, boomy, and lacking midrange".

Looking at the individual listener preferences, we found good agreement among listeners in terms of which models they liked and disliked (slides 16 and 18). Some of the most commercially successful models were among the least preferred headphones in terms of sound quality. In cases where an individual listener had poor agreement with the overall listening panel's headphone preferences, we found either the listener didn't understand the task (they were less trained), or the headphone didn't properly fit the listener, thus causing air leaks and poor bass response; this was later confirmed by doing in-ear measurements of the headphone(s) on individual listeners (slides 26-39).

Measuring the Acoustical Performance of Headphones

Acoustical measurements were made on each headphone using a GRAS 43AG Ear and Cheek simulator equipped with an IEC 711 coupler (slide 24). The measurement device is intended to simulate the acoustical effects of an average human ear, including the acoustical interactions between the headphone and the acoustical impedance of the ear. The headphone measurements shown below include these interactions as well as the transfer function of the ear, mostly visible in the graphs as a ~10 dB peak at around 3 kHz. It is important to note that since we are born with these ear canal resonances, we have adapted to them and don't "hear" them as colorations.

Relationship between Subjective and Objective Measurements 

Comparing the acoustical measurements of the headphones to their perceived spectral balance confirms that the more preferred headphones generally have a smooth and extended response below 1 kHz that is perceived as an ideal spectral balance (slide 25). The least preferred headphones (HP5 and HP6) have the most uneven measured and perceived frequency responses below 1 kHz, which generated listener comments such as "colored, boomy and muffled." The measured frequency response of HP4 shows a slight bass boost below 200 Hz, yet on average it was perceived as sounding thin; this headphone was one of the models that had bass leakage problems for some listeners due to a poor seal on their ears.

Above: The left and right channel frequency response measurements of each headphone are shown above the mean preference rating and 95% confidence interval it received in blind listening tests. The dotted green response on each graph shows the "perceived spectral balance" based on the listeners' responses.
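For readers unfamiliar with the statistic in the caption, here is a minimal Python sketch of one common way to compute a mean rating and its 95% confidence interval across a panel of 10 listeners. It assumes a plain t-based interval on one rating per listener; the study's own analysis may have used a different procedure. The function name `mean_and_ci95` and the ratings are hypothetical and are not data from the study.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 9 degrees of freedom
# (a panel of 10 listeners). Other panel sizes need a different value.
T_CRIT_DF9 = 2.262

def mean_and_ci95(ratings, t_crit=T_CRIT_DF9):
    """Return (mean, lower bound, upper bound) of a t-based 95% interval."""
    n = len(ratings)
    mean = statistics.mean(ratings)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(ratings) / math.sqrt(n)
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width

# Hypothetical ratings from 10 listeners for a single headphone.
hypothetical_ratings = [6.5, 7.0, 5.5, 6.0, 7.5, 6.5, 6.0, 7.0, 6.5, 5.5]
mean, lower, upper = mean_and_ci95(hypothetical_ratings)
print(f"mean = {mean:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

When the intervals of two headphones do not overlap, that is a rough visual cue that listeners rated them differently, although a formal comparison would rely on the full statistical analysis.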


In conclusion, this headphone study is one of the first of its kind to report results based on controlled, double-blind listening tests [2]. The results provide evidence that trained listeners preferred the headphones perceived to have the most neutral spectral balance. The acoustical measurements of the headphones generally confirmed and predicted which headphones listeners preferred. We also found that bass leakage related to the quality of fit and seal of the headphone to the listeners' head/ears can be a significant nuisance variable in subjective and objective measurements of headphone sound quality.

It is important for the reader not to draw generalizations from these results beyond the conditions we tested. One audio writer has already questioned whether headphone sound quality preferences of trained listeners can be extrapolated to tastes of untrained younger demographics whose apparent appetite for bass-heavy headphones might indicate otherwise. We don't know the answer to this question. For younger consumers, headphone purchases may be driven more by fashion trends and marketing B.S. (Before Science) than sound quality. While this question is the focus of future research, the preliminary data suggest that in blind A/B comparisons kids prefer headphones with accurate reproduction to colored, bass-heavy alternatives. This would tend to confirm findings from previous investigations into loudspeaker preferences of high school and college students (both Japanese and American) that so far indicate most listeners prefer accurate sound reproduction regardless of age, listener training or culture.

Future headphone research may tell us (or not) that most people prefer accurate sound reproduction regardless of whether the loudspeakers are installed in the living room, the automobile, or strapped onto the sides of their heads. It makes perfect sense, at least to me. Only then will listeners hear the truth: music reproduced as the artist intended.

[1] Despite the paucity of good subjective measurements on headphones, there do exist some online resources where you can find objective measurements on headphones. You will be hard pressed to find a manufacturer who will supply these measurements of their products. The resources include Sound & Vision Magazine and InnerFidelity, where Tyll Hertsens has a large database of frequency response measurements of headphones that clearly illustrates the lack of consensus among manufacturers on how a headphone should sound and measure. There is even a lack of consistency among different models made by the same brand.

[2] Sadly, studies like the present one are so uncommon in our industry that Sound & Vision Magazine recently declared this paper the biggest audio story of 2012. Hopefully that will change sooner rather than later.