
Friday, February 17, 2017

TWiRT 337 – Predicting Headphone Sound Quality with Sean Olive

The predicted sound quality of 61 different models of in-ear headphones (blue curve) versus their retail price (green bars).
On February 16, 2017 I was interviewed by host Kirk Harnack on This Week in Radio Tech. The topic was "Predicting Headphone Sound Quality". You can find the interview here.

During the interview, Kirk asked if it's possible to design a good-sounding headphone for a reasonable cost, or whether one needs to spend a considerable amount of cash to obtain good sound. Fortunately for consumers, my answer was that you can get decent sound without having to spend thousands or even hundreds of dollars. In fact, based on our research there is almost no correlation between price and sound quality.

I referred to the slide above, which shows the predicted sound quality for 61 different models of in-ear headphones based on their measured frequency response. The correlation between price and sound quality is close to zero and slightly negative: r = -0.16 (i.e. spending more money gets you slightly worse sound on average).
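For readers who want to check a claim like this against their own data, here is a minimal sketch of the computation, assuming NumPy and SciPy are available; the prices and scores below are invented placeholders, not our measurement data.

# Minimal sketch: correlating price with predicted sound quality.
# The prices and scores are invented placeholders, not our actual data.
import numpy as np
from scipy import stats

prices = np.array([29, 99, 149, 299, 499, 999, 3000])  # USD (hypothetical)
scores = np.array([55, 71, 48, 62, 40, 58, 22])        # predicted quality (hypothetical)

r, p = stats.pearsonr(prices, scores)
print(f"Pearson r = {r:.2f} (p = {p:.2f})")
# A small negative r (we found r = -0.16 across 61 models) means spending
# more money buys you, if anything, slightly worse predicted sound.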

So, if you think spending a lot of money on in-ear headphones guarantees excellent sound, you may be sadly disappointed. One of the most expensive IE models ($3,000) in the above graph had an underwhelming predicted score of 20-25%, depending on which EQ setting was chosen. The highest-scoring headphone was a $100 model that we equalized to hit the Harman target response, which our research has shown to be preferred by the majority of listeners.

The sound quality scores in the graph are predicted using a model based on a small sample of headphones that were evaluated by trained listeners in double-blind tests. The accuracy of the model is better than 96%, but it is limited to the small sample we tested. We just completed a large listening test study involving over 30 models and 75 listeners that will allow us to build more accurate and robust predictive models.
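The general recipe is simple to sketch: summarize each headphone's deviation from a target response with a few error statistics, then regress the blind-test preference ratings onto those statistics. The sketch below (assuming scikit-learn) uses synthetic data and arbitrary features purely for illustration; it is not our published model or its coefficients.

# Illustrative sketch of building a preference model from frequency-response
# deviations. All data, features and coefficients here are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_headphones, n_freqs = 30, 200

# Stand-ins for real data: each row is a headphone's dB deviation from the
# target response; ratings mimic "flatter tends to score higher" plus noise.
deviation_db = rng.normal(0.0, 3.0, size=(n_headphones, n_freqs))
ratings = 80 - 5 * np.abs(deviation_db).mean(axis=1) + rng.normal(0, 2, n_headphones)

# Reduce each deviation curve to simple error statistics.
features = np.column_stack([
    np.abs(deviation_db).mean(axis=1),  # average magnitude of the error
    deviation_db.std(axis=1),           # unevenness of the error
])

model = LinearRegression().fit(features, ratings)
print(f"R^2 on training data: {model.score(features, ratings):.2f}")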

The ultimate goal of this research is to accurately predict the sound quality of headphones based on acoustic measurements, without having to conduct expensive and time-consuming listening tests. The current engineering approach to tuning headphones is clearly not optimal based on the above slide. Will headphone industry standards, headphone manufacturers and audio review magazines use similar predictive models to reveal to consumers how good headphones sound? What do you think?

Friday, April 22, 2016

A Virtual Headphone Listening Test Method

Fig. 1 The Harman Headphone Virtualizer App allows listeners to make double-blind comparisons of different headphones through a high-quality replicator headphone. The app has two listening modes: a sighted mode (shown) and a blind mode (not shown) in which listeners are not biased by non-auditory factors (brand, price, celebrity endorsement, etc.).

Early in our headphone research we realized we needed a method for conducting more controlled, double-blind listening tests on different headphones. This was necessary to remove tactile cues (headphone weight and clamping force) and visual and psychological biases (e.g. headphone brand, price, celebrity endorsement, etc.) from listeners' sound quality judgments of headphones. While these factors (apart from clamping force) don't physically affect the sound of headphones, our previous research into blind vs. sighted listening tests [1] revealed that their cognitive influence affects listeners' loudspeaker preferences, often in adverse ways. In sighted tests, listeners were also less sensitive and discriminating than in blind conditions when judging different loudspeakers, including their interaction with different music selections and loudspeaker positions in the room. For that reason, consumers should be dubious of loudspeaker and headphone reviews based solely on sighted listening.

While blind loudspeaker listening tests are possible through the addition of an acoustically transparent, visually opaque curtain, there is no simple way to hide the identity of a headphone while the listener is wearing it. In our first headphone listening tests, the experimenter substituted the different headphones onto the listener's head from behind so that each headphone could not be visually identified. However, after a couple of trials, listeners began to identify certain headphones simply by their weight and clamping force. One of the easiest headphones for listeners to identify was the Audeze LCD-2, which was considerably heavier (522 grams) and less comfortable than the other headphones. The test was essentially no longer blind.

To that end, a virtual headphone method was developed whereby listeners could A/B different models of headphones that were virtualized through a single pair of headphones (the replicator headphone). Details on the method and its validation were presented at the 51st Audio Engineering Society International Conference on Loudspeakers and Headphones [2] in Helsinki, Finland in 2013.  A PDF of the slide presentation can be found  here.

Headphone virtualization is done by measuring the frequency response of the different headphones at the DRP (eardrum reference point) using a G.R.A.S. 45 AG, and then equalizing the replicator headphone to match the measured responses of the real headphones. In this way, listeners can make instantaneous A/B comparisons between any number of virtualized headphones through the same headphone, without visual and tactile cues biasing their judgment. More details about the method are in the slides and AES preprint.
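To make the equalization step concrete, here is a rough sketch assuming you have magnitude responses for the target and replicator headphones measured on the same coupler. The response curves below are synthetic stand-ins for real DRP measurements, and a real implementation would also handle smoothing, level matching and regularization.

# Rough sketch of virtualization EQ: shape the replicator headphone's
# response into a target headphone's response. Synthetic curves stand in
# for real DRP measurements.
import numpy as np
from scipy.signal import firwin2, fftconvolve

fs = 48_000
freqs = np.linspace(0, fs / 2, 256)            # frequency grid in Hz

target_db = 3.0 * np.sin(freqs / 2000)         # "headphone under test" (fake)
replicator_db = 1.5 * np.cos(freqs / 3000)     # "replicator" (fake)

# The required EQ is the dB difference, converted to linear gain.
eq_gain = 10 ** ((target_db - replicator_db) / 20)

# Don't equalize aggressively above ~10 kHz, where positioning variation
# makes the measurements unreliable; fall back to unity gain there.
eq_gain[freqs > 10_000] = 1.0

taps = firwin2(1023, freqs, eq_gain, fs=fs)    # linear-phase FIR matching the gain

music = np.random.randn(fs)                    # stand-in for a music signal
virtualized = fftconvolve(music, taps)         # signal sent to the replicator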

An important question is: "How accurate are the virtual headphones compared to the actual headphones?" In terms of their linear acoustic performance, they are quite similar. Fig. 2 compares the measured frequency responses of the actual versus virtualized headphones. The agreement is quite good up to 8-10 kHz, above which we didn't aggressively equalize the headphones because of measurement errors and large variations related to headphone positioning, both on the coupler and on the listener's head.


Fig. 2 Frequency response measurements of the 6 actual versus virtualized headphones made on a GRAS 45 AG coupler with pinna. The dotted curves are based on the physical headphones and the solid curves are from the virtual (replicator) headphone. The measurements of the right channel of each headphone (red curves) have been offset by 10 dB from the left channel (blue curves) for visual clarity.

More importantly, "Do the actual and virtual headphones sound similar?" To answer this question we performed a validation experiment in which listeners evaluated 6 different headphones using both standard and virtual listening methods, giving preference and spectral balance ratings in both tests. For headphone preference ratings, the correlation between standard and virtual test results was r = 0.85. A correlation of 1 would be perfect, but this level of agreement is not bad, and hopefully more accurate than headphone ratings based on sighted evaluations.

We believe the differences between virtual and standard test results are in part due to nuisance variables that were not perfectly controlled across the two test methods. A significant nuisance variable was likely headphone leakage, which affects the amount of bass heard depending on the fit of the headphone on the individual listener. This would have affected the results in the standard test but not the virtual one, where we used an open-back replicator headphone that largely eliminates leakage variations across listeners. Headphone weight and tactile cues were present in the standard test but not the virtual test, and this could also partly explain the differences in results. If these two variables could be better controlled, even higher accuracy could be achieved in virtual headphone listening.

Fig. 3 The mean listener preference ratings and 95% confidence intervals for the headphones rated using the standard and virtual listening test methods. In the standard method, listeners evaluated the actual headphones with tactile/weight biases and any leakage effects. In the virtual tests, there were no visual or tactile cues about the headphones.


Some additional benefits of virtual headphone testing were discovered besides eliminating sighted and psychological biases: the listening tests are faster, more efficient and more sensitive. When listeners can quickly switch and compare all of the headphones in a single trial, auditory memory is less of a factor, and they are better able to discriminate among the choices. Since this paper was written in 2013, we've improved the accuracy of the virtualization, in part by developing a custom pinna for our GRAS 45 CA that better simulates the leakage effects of headphones measured on real human subjects [3].

Finally, it's important to acknowledge what the virtual headphone method doesn't capture: 1) non-minimum-phase effects (mostly occurring at higher frequencies) and 2) non-linear distortions that are level-dependent. The effect of these two variables on the virtual headphone test method has recently been tested experimentally and will be the topic of a future blog posting. Stay tuned.

References

[1] Floyd Toole and Sean Olive, "Hearing is Believing vs. Believing is Hearing: Blind vs. Sighted Listening Tests, and Other Interesting Things," presented at the 97th AES Convention, preprint 3894 (1994). Download here.

[2] Sean E. Olive, "A Virtual Headphone Listening Test Method," presented at the 51st AES International Conference on Loudspeakers and Headphones, Helsinki, Finland (2013).

[3] Todd Welti, "Improved Measurement of Leakage Effects for Circum-Aural and Supra-Aural Headphones," presented at the 136th AES Convention (April 2014). Download here.




Wednesday, October 22, 2014

The Influence of Listeners' Experience, Age and Culture on Headphone Sound Quality Preferences

At the recent 137th convention of the Audio Engineering Society we presented our latest research paper entitled, "The Influence of Listeners' Experience, Age and Culture on Headphone Sound Quality Preferences."

The paper describes double-blind headphone listening tests conducted in four different countries (Canada, USA, China and Germany) involving 238 listeners of different ages, genders and levels of listening experience. Listeners gave comparative preference ratings for three popular headphones and a new reference headphone, all virtually presented through a common replicator headphone equalized to match their measured frequency responses. In this way, biases related to headphone brand, price, visual appearance and comfort were removed from listeners' judgments of sound quality. On average, listeners preferred the reference headphone, which was based on the in-room frequency response of an accurate loudspeaker calibrated in a reference listening room. This was generally true regardless of the listener's experience, age, gender and culture. This new evidence suggests a headphone standard based on this new target response would satisfy the tastes of most listeners.

The paper is available for download from the AES e-library. You can also find a PDF of our presentation here or view the presentation on YouTube.



Wednesday, June 11, 2014

My Article on Headphone Sound Quality in the 2014 Loudspeaker Industry Sourcebook

The 2014 Loudspeaker Industry Sourcebook came out this week. In it, you can find an article I wrote called "Perceiving and Measuring Headphone Sound Quality: Do Listeners Agree on What Makes a Headphone Sound Good?"

The article is a summary of some recent published research we've conducted at Harman on the perception and measurement of headphone sound quality.

Together, these studies provide scientific evidence that when headphone brand, price, fashion, and celebrity endorsement are removed from subjective evaluations, listeners generally agree on what makes a headphone sound good.

So far, this has been true regardless of listeners' training, age, or culture. The more preferred headphones tend to have a smooth, extended frequency response that approximates an accurate loudspeaker's in-room response. This could provide the basis for a new and improved headphone target response. You can find more details on the research here.

Tuesday, January 28, 2014

Interview in Professional Sound: The Lack of Meaningful Loudspeaker & Headphone Specs



Last October,  I was in Toronto giving a presentation to the local AES section on the perception and measurement of headphones. After the talk, I sat down with Mike Raine from  Professional Sound for an interview. Some of what we discussed is summarized in this article called Sound Advice.

The theme of the article is a recurring one I've discussed before in this blog (see "The Science and Marketing of Sound Quality" and "What Loudspeaker Specifications are Relevant to Sound Quality?"). The bottom line is that the loudspeaker and headphone industry has utterly failed to provide consumers with meaningful product specifications that indicate how good (or bad) the products truly sound. Read on to find out why.

Wednesday, July 3, 2013

The Science and Marketing of Sound Quality

To my surprise, this morning an audio friend tweeted a link to an article I recently wrote for our company's internal newsletter entitled, "The Science and Marketing of Sound Quality." My article can be found on the new Harman Innovation website, launched today, which features articles on current and future disruptive technologies that will impact consumers' infotainment experiences. Check it out.

My article focuses on a longstanding pet peeve of mine (first mentioned in this blog posting): the lack of perceptually meaningful loudspeaker and headphone specifications in our industry. While consumer surveys repeatedly report sound quality to be a driving factor in audio equipment purchases, consumers lack the necessary tools and information to distinguish the good-sounding products from the duds.

This is particularly true for loudspeakers and headphones, where the typical throw-away "10 Hz to 40 kHz" specification provided by the manufacturer is utterly useless. This specification only guarantees that the product makes sound, with no guarantee that the sound is good. While the science exists today to accurately quantify and predict the perceived sound quality of loudspeakers (and hopefully, soon, headphones), the audio industry continues to drag its heels into the 21st century and does not routinely provide this information to consumers.

A rare exception is JBL Professional, which provides comprehensive, detailed measurements on studio/broadcast monitors like the new JBL M2 Master Reference shown below. Inspecting the measured frequency response curves, you can easily recognize from their shape (flat, smooth, and extended) that the loudspeaker sounds exceptionally neutral and accurate. Based on this set of measurements, we can predict how a listener would rate the sound quality of the loudspeaker in a controlled listening test with 86% accuracy. The only pertinent information not shown in this graph is how loud the loudspeaker will play before producing audible distortion (trust me, this loudspeaker will play very loud!).
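To give a flavor of how "flat, smooth, and extended" can be quantified, here is an illustrative sketch of simple statistics one can pull from a measured curve. These are in the spirit of the published preference models, but they are not the actual model or its coefficients, and the response data below are synthetic.

# Illustrative "flat, smooth, extended" statistics from a measured response.
# Synthetic data; not the actual published preference model.
import numpy as np

freq = np.geomspace(20, 20_000, 400)          # Hz, log-spaced
level_db = 0.5 * np.sin(np.log(freq))         # stand-in for a measured curve

band = (freq >= 100) & (freq <= 10_000)       # typical evaluation band

ripple = level_db[band].std()                 # flatness: smaller = flatter
tilt = np.polyfit(np.log10(freq[band]), level_db[band], 1)[0]  # dB/decade slope

# Crude low-frequency extension: lowest frequency still within 6 dB of the
# mid-band average level.
ref = level_db[band].mean()
lfx = freq[level_db >= ref - 6].min()

print(f"ripple = {ripple:.2f} dB, tilt = {tilt:.2f} dB/decade, LFX ~ {lfx:.0f} Hz")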

Perceptually meaningful loudspeaker specifications like these have been available for almost 30 years! Yet,  these specifications are currently not part of any professional and consumer loudspeaker standard. Such a standard would go a long way towards improving the quality and consistency of recorded and reproduced sound. Audio consumers want to hear the truth. We need to provide better information and audio specifications so they can find it.

The JBL M2 Master Reference Monitor provides true reference sound quality, clearly indicated by the technical measurements shown below.
The spatially-averaged frequency response curves of the JBL M2 (from top to bottom): the listening window (green), the first reflections (red), and the total radiated sound power. At the bottom are the directivity indices of the sound power (dotted blue) and first reflections (dotted red). These measurements tell us that the quality of the direct and reflected sounds produced by the loudspeaker will be very accurate and neutral over a relatively wide listening area.





Monday, July 1, 2013

Harman Researchers Make Important Headway in Understanding Headphone Response

Todd Welti, Sean Olive and Elisabeth McMullin are shown above with their custom binaural mannequin, "Sidney," wearing a pair of AKG K1000s. No fit or leakage issues with these headphones.
Tyll Hertsens, Editor-in-Chief at InnerFidelity, recently visited our research labs in Northridge and wrote a nice story in his blog about our headphone research and his visit to Harman. You can read the entire story here.

In his story, Tyll summarizes three of our recent AES papers on headphones, the first one of which I already wrote about in this blog. I hope to write summaries of the other two papers in the upcoming weeks when I can find some free time.

Monday, April 22, 2013

The Relationship between Perception and Measurement of Headphone Sound Quality

Above: The brands and models of six popular headphones used in this study.

In many ways, our scientific understanding of the perception and measurement of headphone sound quality is 30 years behind our knowledge of loudspeakers. Over the past three decades, loudspeaker scientists have developed controlled listening test methods that provide accurate and reliable measures of listeners' loudspeaker preferences and their underlying sound quality attributes. From the perceptual data, a set of acoustical loudspeaker measurements has been identified from which we can model and predict listeners' loudspeaker preference ratings with about 86% accuracy.

In contrast to loudspeakers, headphone research is still in its infancy. Looking at published acoustical measurements of headphones, you will discover there is little consensus among brands (or even within the same brand) on how a headphone should sound and measure [1]. Too few published studies based on controlled headphone listening tests exist to identify which objective measurements and target response curves produce optimal sound quality. Controlled, double-blind comparative subjective evaluations of different headphones present significant logistical challenges to the researcher, including controlling headphone tactile and visual biases. Sighted biases related to price, brand, and cosmetics have been shown to significantly bias listeners' judgments of loudspeaker sound quality. Therefore, these nuisance variables must be controlled in order to obtain accurate assessments of headphone sound quality.

Todd Welti and I recently conducted a study to explore the relationship between the perception and measurement of headphone sound quality. The results were presented at the 133rd AES Convention in San Francisco in October 2012. A PDF of the slide presentation referred to below can be found here. The AES preprint can be found in the AES E-library. The results of this study are summarized below.

Measuring The Perceived Sound Quality of Headphones

Double-blind comparative listening tests were performed on six popular circumaural headphones ranging in price from $200 to $1,000 (see above slide). The listening tests were carefully designed to minimize biases from known listening test nuisance variables (slides 7-13). A panel of 10 trained listeners rated each headphone on overall preferred sound quality, perceived spectral balance, and comfort. The listeners also commented on the perceived timbral, spatial, and dynamic attributes of the headphones to help explain their underlying sound quality preferences.

The headphones were compared four at a time over three listening sessions (slide 12). Assessments were made using three music programs, with one repeat to establish the reliability of the listeners' ratings. The order of headphone presentations, programs and listening sessions was randomized to minimize learning and order-related biases. The test administrator manually substituted the different headphones on the listener from behind so that listeners were not aware of the headphone brand, model or appearance during the test (slide 8). However, tactile/comfort differences were part of the test. Listeners could adjust the position of the headphones on their heads via lightweight plastic handles attached to the headphones.
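The counterbalancing idea here is easy to illustrate: each listener gets an independently shuffled order of programs and headphones so that learning and order effects average out across the panel. The sketch below shows the principle only; it is not the software we actually used, and the session structure is simplified.

# Illustration of randomized presentation order (the principle, not our
# actual test software). Each listener gets an independent shuffle.
import random

HEADPHONES = ["HP1", "HP2", "HP3", "HP4"]   # four models per session
PROGRAMS = ["program A", "program B", "program C"]

def session_plan(seed: int) -> list[tuple[str, str]]:
    """Return one listener's randomized (program, headphone) order."""
    rng = random.Random(seed)                # reproducible per listener
    plan = []
    for program in rng.sample(PROGRAMS, k=len(PROGRAMS)):
        for headphone in rng.sample(HEADPHONES, k=len(HEADPHONES)):
            plan.append((program, headphone))
    return plan

for listener_id in range(3):                 # print three listeners' plans
    print(listener_id, session_plan(listener_id))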

Listeners Prefer Headphones With An Accurate, Neutral Spectral Balance

When the listening test results were statistically analyzed, the main effect on the preference rating was due to the different headphones (slide 15). The preferred headphone models were perceived as having the most neutral, even spectral balance (slide 19), with the less preferred models having too much or too little energy in the bass, midrange or treble regions. Frequency analysis of listeners' comments confirmed listeners' spectral balance ratings of the headphones and proved to be a good predictor of overall preference (slide 20). The most preferred headphones were frequently described as having "good spectral balance, neutral with low coloration, and good bass extension," whereas the less preferred models were frequently described as "dull, colored, boomy, and lacking midrange."

Looking at individual listener preferences, we found good agreement among listeners in terms of which models they liked and disliked (slides 16 and 18). Some of the most commercially successful models were among the least preferred headphones in terms of sound quality. In cases where an individual listener agreed poorly with the overall listening panel's headphone preferences, we found that either the listener didn't understand the task (they were less trained) or the headphone didn't properly fit the listener, causing air leaks and poor bass response; the latter was later confirmed by in-ear measurements of the headphone(s) on individual listeners (slides 26-39).

Measuring the Acoustical Performance of Headphones

Acoustical measurements were made on each headphone using a GRAS 43AG Ear and Cheek Simulator equipped with an IEC 711 coupler (slide 24). The measurement device is intended to simulate the acoustical effects of an average human ear, including the acoustical interactions between the headphone and the acoustical impedance of the ear. The headphone measurements shown below include these interactions as well as the transfer function of the ear, mostly visible in the graphs as a ~10 dB peak at around 3 kHz. It is important to note that since we are born with these ear canal resonances, we have adapted to them and don't "hear" them as colorations.
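For intuition, that ~10 dB resonance near 3 kHz can be crudely modeled as a standard second-order peaking filter. The sketch below is only a stand-in for illustration (the resonance frequency, gain and Q are assumed round numbers), not the 43AG's actual transfer function.

# Crude model of the ear-canal resonance as a +10 dB peaking biquad near
# 3 kHz (RBJ cookbook form). Parameters are assumed, for illustration only.
import numpy as np
from scipy.signal import freqz

fs = 48_000
f0, gain_db, Q = 3000.0, 10.0, 1.4       # assumed resonance parameters

A = 10 ** (gain_db / 40)                 # RBJ peaking-EQ amplitude term
w0 = 2 * np.pi * f0 / fs
alpha = np.sin(w0) / (2 * Q)
b = [1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A]
a = [1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A]

w, h = freqz(b, a, worN=4096, fs=fs)     # frequency response of the model
resp_db = 20 * np.log10(np.abs(h))
print(f"peak of {resp_db.max():.1f} dB near {w[resp_db.argmax()]:.0f} Hz")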

Relationship between Subjective and Objective Measurements 

Comparing the acoustical measurements of the headphones to their perceived spectral balance confirms that the more preferred headphones generally have a smooth and extended response below 1 kHz that is perceived as an ideal spectral balance (slide 25). The least preferred headphones (HP5 and HP6) have the most uneven measured and perceived frequency responses below 1 kHz, which generated listener comments such as "colored, boomy and muffled." The measured frequency response of HP4 shows a slight bass boost below 200 Hz, yet on average it was perceived as sounding thin; this headphone was one of the models that had bass leakage problems for some listeners due to a poor seal on their ears.

Above: The left and right channel frequency response measurements of each headphone are shown above the  mean preference rating and 95% confidence interval it received in blind listening tests. The dotted green response on each graph shows the "perceived spectral balance" based on the listeners' responses.


Conclusions

In conclusion, this headphone study is one of the first of its kind to report results based on controlled, double-blind listening tests [2]. The results provide evidence that trained listeners preferred the headphones perceived to have the most neutral spectral balance. The acoustical measurements of the headphones generally confirmed and predicted which headphones listeners preferred. We also found that bass leakage related to the quality of fit and seal of the headphone on the listener's head/ears can be a significant nuisance variable in subjective and objective measurements of headphone sound quality.

It is important for the reader not to draw generalizations from these results beyond the conditions we tested. One audio writer has already questioned whether the headphone sound quality preferences of trained listeners can be extrapolated to the tastes of untrained, younger demographics, whose apparent appetite for bass-heavy headphones might indicate otherwise. We don't know the answer to this question. For younger consumers, headphone purchases may be driven more by fashion trends and marketing B.S. (Before Science) than by sound quality. While this question is the focus of future research, the preliminary data suggest that in blind A/B comparisons kids prefer headphones with accurate reproduction over colored, bass-heavy alternatives. This would tend to confirm findings from previous investigations into the loudspeaker preferences of high school and college students (both Japanese and American), which so far indicate that most listeners prefer accurate sound reproduction regardless of age, listener training or culture.

Future headphone research may tell us (or not) that most people prefer accurate sound reproduction regardless of whether the loudspeakers are installed in the living room, the automobile, or strapped onto the sides of their heads. It makes perfect sense, at least to me. Only then will listeners hear the truth: music reproduced as the artist intended.
________________________________

Footnotes
[1] Despite the paucity of good subjective measurements of headphones, there are some online resources where you can find objective measurements: Headroom.com, Sound & Vision Magazine, and InnerFidelity.com. You will be hard-pressed to find a manufacturer who will supply these measurements for their products. Tyll Hertsens at InnerFidelity has a large database of headphone frequency response measurements that clearly illustrates the lack of consensus among manufacturers on how a headphone should sound and measure. There is even a lack of consistency among different models made by the same brand.

[2] Sadly, studies like this one are so uncommon in our industry that Sound and Vision Magazine recently declared this paper the biggest audio story of 2012. Hopefully that will change sooner rather than later.

Friday, November 30, 2012

Behind Harman's Testing Lab

This past week I had an enjoyable time meeting well-known technology writer Robert Scoble who was visiting our Harman facilities in Northridge, CA along with his geek-in-command Sam Levine. As part of the tour, I showed  them our Reference Listening Room and Multichannel Listening Lab where we do product research and double-blind evaluations of loudspeakers. We discussed the science and philosophy behind how we design and measure the sound quality of our products.



One of the topics of discussion was my recent research exploring whether high school and college students from the USA and Japan have different tastes and preferences in the quality of reproduced sound compared to older, trained listeners. We talked about differences in the tastes and performances of trained versus untrained listeners, and how Harman is able to accurately predict subjective preference ratings of loudspeakers using a predictive model that analyzes a set of comprehensive anechoic measurements.

After running Robert and Sam through a few trials of listener training using  our software "How to Listen", I decided to put them through a couple of double-blind listening test trials to see if they had the right stuff. They compared four different brands of floor-standing loudspeakers located  behind an acoustically transparent, visually opaque curtain where each loudspeaker is shuffled into the same position via an automated speaker shuffler. All of our tests are conducted double-blind because we have found that even trained listeners are influenced by nuisance variables such as brand, price, size, etc.

In these tests, Robert and Sam heard the same four loudspeakers that had previously been evaluated by hundreds of untrained listeners, young and old, American, Asian and European, whose preferences and performances were compared to those of our panel of trained listeners. From these tests, we have found evidence that most listeners prefer the most accurate, neutral loudspeaker regardless of age, culture or listening experience.

When the listening trials were done, the curtain went up, and Robert and Sam were surprised to discover that their favorite choice was the most accurate loudspeaker, which was also the least expensive. The science works. One of the speakers Robert didn't like was a model he actually owned: it had excessive amounts of treble and upper bass, which I'm told is mandated by the manufacturer's marketing department, who believe that "boom and tizz" are what their customers want. Luckily, I haven't met many of their customers, yet. Robert then surprised me by turning on his camera and doing an impromptu interview, which hopefully you'll enjoy. If you want to learn more about the engineering process and tools behind designing a speaker, check out the interview with one of our speaker engineering stars, Charles Sprinkle.

In my next blog posting I hope to discuss some of the exciting research we've been doing on the relationship between the perception and measurement of headphone sound quality. The goal is to develop the same science for measuring and predicting the sound quality of headphones that we've found useful for designing good sounding loudspeakers.  Stay tuned!


Thursday, May 10, 2012

More Evidence that Kids (American and Japanese) Prefer Accurate Sound Reproduction



Geoffrey Morrison, an audio writer at CNET and Sound & Vision, has posted a nice summary of my latest AES paper, "Some New Evidence that Teenagers and College Students May Prefer Accurate Sound Reproduction," presented at the recent 132nd AES Convention in Budapest, Hungary.


The paper is available for download here at the  AES E-library, and I have provided a YouTube video and a PDF of my presentation slides that summarize the main points of the research.


 The abstract of the paper reads as follows:


A group of 58 high school and college students with different expertise in sound evaluation participated in two separate controlled listening tests that measured their preference choices between music reproduced in (1) MP3 (128 kbps) and lossless CD-quality file formats, and (2) music reproduced through four different consumer loudspeakers. As a group, the students preferred the CD-quality reproduction in 70% of the trials and preferred music reproduced through the most accurate, neutral loudspeaker. Critical listening experience was a significant factor in the listeners' performance and preferences. Together, these tests provide some new evidence that both teenagers and college students can discern and appreciate a better quality of reproduced sound when given the opportunity to directly compare it against lower quality options.
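As a back-of-the-envelope check on the 70% figure, one can ask how unlikely such a preference rate would be if the students were merely guessing. The trial count below is an assumption for illustration (the paper reports the actual design); the test itself is a standard one-sided binomial test.

# Back-of-the-envelope check: is a 70% preference rate distinguishable from
# coin-flip guessing? n_trials is a hypothetical number, for illustration.
from scipy.stats import binomtest

n_trials = 200
n_cd_preferred = int(0.70 * n_trials)        # 70% of trials favored CD quality

result = binomtest(n_cd_preferred, n_trials, p=0.5, alternative="greater")
print(f"{n_cd_preferred}/{n_trials} trials, one-sided p = {result.pvalue:.2g}")
# A very small p-value means guessing alone is a poor explanation.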


The effects of culture and trained versus untrained listeners on loudspeaker preference are topics that have been discussed in previous postings on Audio Musings. To shed further light on this topic, I also ran 149 native-speaking Japanese college students through the same loudspeaker preference test, along with 12 Harman-trained listeners. The graph below shows the mean loudspeaker preference ratings for these two groups of listeners along with the four different groups of high school and college students from Los Angeles.




Not surprisingly (at least to me), I found that the Japanese college students on average preferred the same accurate loudspeaker (A) as did the 58 Los Angeles students and the trained Harman listening panel. The main differences among the listening groups were related to the effect of prior critical listening experience: the more trained listeners simply rated the loudspeakers lower on the preference scale, and were more discriminating and consistent in their responses. This result is consistent with previous studies. The least preferred and least accurate loudspeaker (Loudspeaker D) generated the most variance in ratings among the different listening groups. This was explained by its highly directional behavior combined with its inconsistent frequency response as you move from on-axis to off-axis seating positions. This meant that listeners sitting off-axis heard a much different (and apparently better quality) sound than listeners sitting on-axis.


While the small sample size of listeners doesn't allow us to make generalizations to larger populations, it is nonetheless reassuring to find that both the American and Japanese students, regardless of their critical listening experience, recognized good sound when they heard it and preferred it to the lower quality options.


It would appear that the reason kids don't own better sounding audio solutions has nothing to do with their supposed "deviant" tastes in sound quality, but more to do with other factors (e.g. price, convenience, portability, marketing, fashion) that have nothing to do with sound quality. Music and audio companies should take notice that kids can indeed discriminate between good and bad sound, and prefer accurate sound, despite what the media has been falsely reporting for the last few years. With that out of the way, we should focus on figuring out how to sell sound quality to kids at affordable prices and in form factors they desire to own.


The research suggests that if we cannot figure out how to sell better sound to kids, we have no one to blame but ourselves. 

Thursday, April 21, 2011

Topics Related to Perception and Measurement of Reproduced Sound


On Tuesday, April 26th 2011, I will be giving a presentation at the meeting of the Los Angeles AES Chapter on several topics related to recent audio research at Harman International. The topics include:

I've briefly discussed these topics in Audio Musings over the past few months, and you can find summaries of them by clicking on the links above. I'll give an update on new findings and briefly touch on topics not mentioned above. As a door prize, Harman will donate a free copy of Dr. Floyd Toole's book Sound Reproduction, autographed by the author.

AES members and nonmember guests are welcome to attend. The meeting will be held at the Sportsmen's Lodge in Studio City. More details can be found at the Los Angeles AES website.

Sunday, April 3, 2011

Version 2.04 of Harman How to Listen Now Available For Download!

Version 2.04 of Harman How to Listen is now available for download here.

This update fixes the problem with the noise and hum attribute tests. We've also updated the user's manual to help navigate around some installation issues some users have reported.

Friday, March 25, 2011

Version 2.03 of Harman How to Listen Now Available For Download!



You can download the latest update of Harman How to Listen  (version 2.03) here. This update fixes a bug in the Windows version that prompted listeners to locate program material that was not packaged with the installer. There is no significant change to the Mac version. Enjoy!

Tuesday, July 13, 2010

The Danger From Headphones

Below is an English translation of a recent article, "Gefahr aus dem Kopfhörer" (The Danger From Headphones), written by Matthias Hohensee over at Valley Talk. His article refers to my recent investigations into whether younger generations prefer lossy MP3 over higher quality music file formats. The preliminary results of that study were reported in the article I recently posted called "Some New Evidence that Generation Y May Prefer Accurate Sound Reproduction".
Matthias makes a good point about listener preference for MP3 becoming a moot issue with higher quality file formats becoming the standard, as bandwidth and music storage costs drop. I only briefly mentioned this in my slide presentation (see slide 7), but it deserves repeating. The days of low quality music downloads are numbered, I hope. Then, the main sound quality issue will become the recordings themselves, and the quality of the headphones and loudspeakers through which the recordings are heard. What are your thoughts on this matter?
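To put rough numbers on the bandwidth and storage point (my own arithmetic, not from the article), here is the size difference between a heavily compressed download and the same song at CD quality:

# Quick arithmetic behind the "storage is cheap now" point: file sizes for
# a 4-minute track at MP3 128 kbps versus uncompressed CD quality.
def megabytes(bitrate_kbps: float, seconds: float) -> float:
    return bitrate_kbps * 1000 * seconds / 8 / 1e6

song_seconds = 4 * 60
cd_kbps = 44_100 * 16 * 2 / 1000      # sample rate x bit depth x channels = 1411.2
print(f"MP3 128 kbps: {megabytes(128, song_seconds):.1f} MB")    # ~3.8 MB
print(f"CD quality:   {megabytes(cd_kbps, song_seconds):.1f} MB")  # ~42 MB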

The Danger from Headphones
by Matthias Hohensee
from Valley Talk 6.30.2010
Can the Germans really be proud of MP3, or has this digital stroke of genius desensitized the hearing of an entire generation?

When Angela Merkel recently visited the prestigious Stanford University in Silicon Valley and enumerated German technological achievements, she also mentioned the MP3 data compression method. The technology, largely developed by scientists at the Fraunhofer Institute, has changed the music industry, even though it's mainly U.S. companies that profit from the MP3 player market.
But can the Germans really be proud of MP3? Or has this digital stroke of genius desensitized the hearing of an entire generation? At least the observations of Jonathan Berger suggest so. Over the years, the Stanford professor of music has been asking his students whether they are satisfied with compressed music files, or whether they prefer full hi-fi sound.
He came to a surprising result: for years, the number of those who prefer the sound of "packed" music to the uncompressed audio spectrum has seemed to grow steadily. Berger concludes that tastes in sound have changed.
Good sound is measurable
Sean Olive, on the other hand, considers Berger's insight nonsense: "Good, accurately reproduced sound is not a question of taste, but scientifically measurable." And that is how he would see it; after all, Olive is the head of acoustic research at Harman International. The U.S. manufacturer is considered THE address for sophisticated sound systems.
Alarmed by Berger's observations, Olive recently invited Los Angeles high school students to the Harman studios for extensive tests. "Everyone could hear the difference between differently compressed sound files, and preferred the less compressed songs," says the scientist, relieved.

Danger from headphones
Now, Olive is not entirely unbiased; after all, Harman sells nearly three billion dollars worth of high-end audio technology per year. But in fact, technical progress is already making Olive's worries obsolete. In times of high-speed Internet, data compression does not play the same role as it did in the nineties, when the music piracy supplier Napster made MP3 popular.
The songs that were exchanged back then were extremely compressed in order to distribute them over the still-slow Internet connections, but also to spare the limited memory of computers and MP3 players. Today, vendors such as Apple and Amazon sell songs formatted in such a way that only real audiophiles can hear the difference from music CDs.
And so the real danger for the hearing of "generation iPod" isn't the highly compressed music files, but simply the volume setting of their headphones.


Acknowledgements: Thank you to the author Matthias Hohensee for permission to repost his article here, and to Alena Winterhoff for the English translation.

Friday, July 9, 2010

Why Live-versus-Recorded Listening Tests Don't Work


Figure 1: Singer Frieda Hempel conducting a Tone Test at Edison Studios, NYC in 1918. Note that many of the listeners' ears are covered by their blindfolds, making it a double-blind and double-deaf listening test, since the experimenter, Edison, was deaf himself.


Recently I was asked how I could possibly prove or assert that listeners prefer accurate loudspeakers without having performed a live-versus-recorded listening test. This is a test where the listener compares a live musical performance to a recording of the performance reproduced through loudspeakers. The closer the sound quality of the reproduction is to that of the live performance, the more accurate the loudspeaker is deemed to be, at least in theory. In practice, these tests are usually riddled with so many uncontrolled listening test nuisance variables that the results are essentially meaningless. This article examines why live-versus-recorded listening tests are not suitable for serious scientific investigations of the perceived sound quality of recorded and reproduced sound.


Edison’s Tone Tests: “People will hear what you tell them to hear”
Thomas Edison was among the first audio engineers to embrace live-versus-recorded demonstrations. In 1910, he invented the Edison Diamond Disc Phonograph, which he claimed had "no tone" of its own. To prove it, a series of road shows involving 4,000 live-versus-recorded demonstrations of his phonograph was conducted in auditoriums across the United States. At some point during the live music performance there would be a switch-over to the recorded performance, and apparently audience members could not tell the difference between the live and recorded performances.

After a 1916 live-versus-recorded demonstration in Carnegie Hall, the New York Evening Mail stated “the ear could not tell when it was listening to the phonograph alone, and when to actual voice and reproduction together. Only the eye could discover the truth by noting when the singer’s mouth was open or closed” [1].


By today's standards, the fidelity of Edison's disc phonograph was egregious in terms of its noise, distortion, limited dynamic range, bandwidth and frequency response (you can hear some of Edison's recordings online here). It's hard to imagine that listeners were fooled into thinking his Diamond Disc recordings were indistinguishable from the live performance. In fact, we now know that Edison manipulated the tests to produce the results he wanted. First, he carefully chose the music and musicians to work within the technical limitations of his technology. Edison detested music with extreme dynamics, high tones, vibrato and complex textures because they were a challenge to his deafness and his Tone Tests. He selected and coached musicians to mimic the sound of their recordings to minimize the audible differences between live and recorded performances [1],[2].


Second, Edison was the consummate audio salesman and was known to say, "People will hear what you tell them to hear" [2]. The expectations and perceptions of his listeners were manipulated before the test to produce a more predictable outcome. Audience members were given a concert program before his Tone Tests that told them exactly what they would hear, how amazing it would sound, and what an appropriate response would be:


"Those who hear this test will realize fully for the first time how literally true it is that Mr. Edison has made possible the re-creation of the artist's voice. No more exacting test could be made to demonstrate that the New Edison actually does re-create the voice of the artist than to play it side by side with the artist who made the records. This is the final proof. Close your eyes. See if you can distinguish the voice of the New Edison from that of the artist. Did you ever believe it possible to re-create a voice? Note that the voice of the artist and the voice of the Edison are indistinguishable" [emphasis is mine] [3].


Figure 2: Another Edison Tone Test where extraneous biases related to sight and smell may have compromised the results based on the large number of listeners covering their noses. Perhaps a bad case of singer's halitosis made it possible to identify the live performance from the recorded one based on smell alone?


Other Live-Versus-Recorded Demonstrations

Following Edison's live-versus-recorded demonstrations, other tests were conducted by Harry Olson at RCA, G.A. Briggs (Wharfedale), and Peter Walker at Quad in the 1950s [4]. A common problem with these demonstrations was double reverberation: the reverberation of the room was heard both in the recording and again when the recording was reproduced through loudspeakers in the same room. This made it easier for listeners to tell the difference between the recorded and live performances.


Acoustic Research's Live-Versus-Recorded Demonstrations

During the 1960’s, Acoustic Research (AR), an American loudspeaker company, performed over 75 live-versus-recorded concerts in cities around the USA featuring The Fine Arts String Quartet, and the AR-3 loudspeaker [5],[6]. To solve the double reverberation problem, the recordings of the quartet were made in an anechoic chamber, or outdoors. Outdoor live-versus-recorded demonstrations had the added benefit that there were no room reflections in either the recording or the live performance. This made the demonstrations less sensitive to off-axis problems in the microphones and loudspeakers. It also relaxed the demands on the recording-reproduction to accurately capture and reproduce the complex spatial properties of a reverberant performing space.


The AR demonstrations apparently generated an enormous amount of free publicity in newspapers and audio magazines, where it was reported that the reproduction of the recordings was virtually indistinguishable from the live performance. AR sales increased dramatically, to the point where, in 1966, AR apparently held a 32% share of the loudspeaker market in the United States.



A Live-Versus-Recorded Method For Testing Loudspeaker Accuracy

Edgar Villchur, head of Acoustic Research, to his credit, was a firm believer that loudspeakers should accurately reproduce the art (the recorded music) and not editorialize or enhance it. In a 1962 paper, he described a live-versus-recorded method for evaluating the accuracy of loudspeakers [7]. The method used a reference loudspeaker (the live performance) that was placed in the listening room with the loudspeaker-under-test. The goal of the loudspeaker-under-test was to accurately reproduce a previous recording of the reference loudspeaker playing white noise in an anechoic chamber. The original white noise signal was also fed to the reference loudspeaker during the listening test. The more similar the loudspeaker-under-test sounded to the reference speaker, the more accurate it was deemed to be, at least in theory.


Villchur acknowledged that the sensitivity and validity of the method depended on the quality of the reference loudspeaker, its directivity, and the choice of program material. White noise was more revealing of loudspeaker inaccuracies than music. His reference loudspeaker consisted of a single 2-inch midrange driver from an AR-3 loudspeaker, selected because he found that using multiple drivers caused acoustical interference that was audible in the anechoic chamber but not so audible in a reverberant listening room; these differences would produce errors in the listening test. One wonders how a tiny 2-inch driver could have produced adequate high treble and low bass without distortion. These limitations would significantly limit the accuracy and usefulness of this listening test method.


Another problem with this method was that the anechoic loudspeaker recordings were made at a single point in space, and did not capture the directivity and off-axis characteristics of the reference loudspeaker. Unless the speaker-under-test had the same directivity and off-axis characteristics as the reference loudspeaker, it could never sound exactly the same in a reflective listening room. To compensate for these errors, Villchur used a trial-and-error process to find the best microphone position relative to the reference loudspeaker, where the timbre of the anechoic recording best matched the timbre of the reference loudspeaker when placed in a room. Adjusting the recording to mimic the sound of the live performance was the reverse of what Edison's musicians did, but it produced essentially the same bias. (Edison would have been proud!)


Finally, it is not clear how Villchur controlled loudspeaker positional biases when comparing the reference loudspeaker to the loudspeaker-under-test. Loudspeaker positional biases have been shown to produce audible effects that are sometimes larger than the audible differences between different loudspeakers under test [9]. At Harman, these positional biases are eliminated via an automated speaker shuffler that places each loudspeaker in the same position of the room.


Summary of Problems with Live-versus-Recorded Tests

By today’s standards, the live-versus-recorded tests performed to date lack the necessary scientific controls and rigor to consider their results or conclusions accurate, repeatable and valid. Below are a few of the most significant psychological, physical, methodological or experimental listening variables that plague these types of tests. While it is possible to control some of these variables, others are either impossible, impractical or too expensive to control.



Sighted and Cross-Modality Biases

To date, most live-versus-recorded tests have been performed sighted, where non-auditory cues allowed the listener to identify whether they were hearing the live or reproduced sound source. These tests could easily have been made blind via an acoustically transparent curtain; however, scientific validity was apparently not the primary purpose of the tests. The visual cues from the musicians (bowing, lip syncing) would also enhance the realism and presence of the reproduction, a well-known cognitive effect observed in research on binaural and virtual reality displays.


Listener Expectation, Authority Bias, Group Interaction Bias

In many of the public live-versus-recorded demonstrations, listeners' expectations were manipulated by knowledge given to them by the organizers of the demonstrations. In some cases, listeners were told what the expected response should be before the test began (see Edison's concert programs above). In large group settings, listeners' responses can be easily swayed by the opinions and reactions of other members of the group (a herd mentality), especially when an authority figure is present. These biases are easily removed from live-versus-recorded tests by repeating the test for each individual listener, but then the live and recorded performances would have to be replicated for every listener, which makes the tests too difficult, expensive, time consuming and impractical to use.


Qualifications of Listeners

None of the live-versus-recorded tests I've read about have reported the hearing and critical listening qualifications of the listeners who participated in them. These are important variables in the sensitivity and reliability of the test results, and can be easily quantified.


Live and Recorded Performances Must Be Identical

For live-versus-recorded tests to be valid, the live and recorded performances should be identical, having the same notes, intonation, tempo, dynamics, loudness, balance between instruments, and the same location and sense of space of the instruments. Otherwise, there are extraneous cues that allow listeners to readily identify the live and recorded performances. MIDI-controlled instruments (e.g. player pianos) are but one example of how this problem could be resolved.


Positional Biases from Live and Reproduced Sound Sources

Unless the live and reproduced (e.g. loudspeakers) sound sources occupy the same physical locations, the listener can always identify the live versus recorded versions based on the localized positions of the sound sources.


Errors in the Recording

The usefulness of live-versus-recorded methods for perceptual measurements of sound quality in the playback chain is severely limited by errors in the recording. The recording errors are not easily separated from the errors in the playback chain (see circle-of-confusion). Microphones and microphone techniques both contain errors that limit the timbral, spatial and dynamic accuracy of the recordings through which we judge loudspeakers. Apparently the most effective live-versus-recorded demonstrations were conducted outdoors - effectively an anechoic environment - where the off-axis performances of the microphones and loudspeakers, and the complex spatial cues of a reflective room were largely removed as factors from the experiment. However, results from outdoor live-versus-recorded tests cannot be generalized to how the loudspeakers would perform in real rooms, where the off-axis sounds provide a significant contribution towards the listener's impression of the loudspeaker.


Lack of Proper Scientific Protocols, Listener Response Data, Statistical Analysis, Results

The most interesting characteristic of live-versus-recorded tests is that they never seem to provide listener response data, statistical analysis or published results. Eyewitness reports written in newspapers or magazines do not constitute scientific evidence.


Accuracy is Not Applicable to Most Recordings Made Today

Most recordings made today are not intended to sound like a live performance. Anyone who heard Taylor Swift's live performance with Stevie Nicks at the 2010 Grammy Awards understands why. (Note: you can relive the magical moment on YouTube. Warning: it may be offensive to the musically inclined.) About 90% of commercial recordings are studio creations consisting of a series of overdubs, processed with auto-tuning, equalization, dynamic compression, and reverb sampled from an alien nation. For these recordings, there is no equivalent live performance against which the recording/reproduction can be compared for accuracy. The only reference is what the artist heard over the loudspeakers in the recording control room. If the important performance aspects of the playback system through which the art (the music and recording) was created can be reproduced in the home, then the consumer will hear an accurate reproduction of the music, as the artist intended. This is achievable if we adopt a "science in the service of art" philosophy towards audio recording and reproduction.


Conclusions

In reviewing the history of live-versus-reproduced tests, most have been performed as elaborate sales and marketing demonstrations designed to fool listeners into believing that a product sounded much better and more accurate than it actually was. While live-versus-recorded tests have proven their merit as an effective marketing and sales tool, they have not yet proven themselves as a serious method for scientific experiments intended to advance our psychoacoustic understanding of music recording and reproduction.


The reason for this, I believe, is that live-versus-recorded tests do not adequately control important listening test nuisance variables, a prerequisite for accurate, reliable and scientifically valid results. It is not entirely coincidental that (to my knowledge) none of the live-versus-recorded tests to date has produced a single scientific publication or new psychoacoustic knowledge.


Hopefully, you now understand why I don’t conduct live-versus-recorded loudspeaker listening tests.


References

[1] Harvith, J., and Harvith, S. Edison, Musicians and the Phonograph: A Century in Retrospect, Greenwood Press, N.Y (1987).

[2] Andre Milliard, "Edison's Tone Tests and the Ideal of Perfect Sound Reproduction," from Lost and Found Sounds, NPR.

[3] Program for Edison Demonstration http://www.nipperhead.com/old/tonetest04.htm

[4] Wharfedale History: http://www.wharfedale.co.uk/About/History/tabid/66/Default.aspx

[5] Acoustic Research http://en.wikipedia.org/wiki/Acoustic_Research

[6] Edgar Villchur, http://edgarvillchur.com/

[7] Villchur, Edgar, "A Method of Testing Loudspeakers with Random Noise," J. Audio Eng. Soc., vol. 10, no. 4, pp. 306-309 (October 1962).

[8] Kissinger, John R., "The Development of the Simulated Live-vs-Recorded Test into a Design Tool," presented at the 35th AES Convention, preprint 609 (October 1968).

[9] Olive, Sean E.; Schuck, Peter L.; Sally, Sharon L.; Bonneville, Marc E., "The Effects of Loudspeaker Placement on Listeners' Preference Ratings," J. Audio Eng. Soc., vol. 42, no. 9, pp. 651-669 (September 1994).