
Friday, December 20, 2013

Harman Kardon factory tour: Pure to the art of sound

Some of the Harman and competitor headphones that we've recently tested.
Giving Harman R&D lab tours and presentations to customers and audio journalists is part of my job. Recently, we played host to some automotive journalists in town attending the Los Angeles Auto Show.

Automotive audio journalist Shawn Molnar wrote this great article about his visit to our R&D labs that you can read in his popular BMWBlog.

Shawn gives an overview of his visit to our labs, where we showed the journalists the R&D facilities we use for testing and evaluating Harman loudspeakers, headphones, and automotive audio systems. Most people I meet know Harman for its JBL and Harman Kardon consumer and professional products. They are surprised to learn that 75% of our sales come from Harman-branded (JBL, Harman Kardon, Infinity, Lexicon, Mark Levinson) automotive audio and infotainment systems.

Research by CEA has found that consumers now spend almost as much time listening to music in their cars as they do in their homes. Moreover, the audio experiences in the car are increasingly more sonically satisfying than those experienced in the home. Branded audio systems in premium cars typically provide 7-channel surround sound through 16+ loudspeakers that deliver a full-range, balanced, enveloping sound stage that can reach concert sound pressure levels. Compare this to the tinny, spatially bereft stereo speakers in your MacBook Pro or flat panel TV, and you begin to understand why people are listening to music through headphones when they're not listening in their cars.

So, why don't automotive journalists spend more time writing about audio / infotainment systems in cars given that consumers tell us it's an important factor in their overall satisfaction rating of the car?  When reading reviews of new cars, wouldn't it be nice to hear more about the quality of the audio system than how many heated cup holders it has?

Wednesday, July 3, 2013

The Science and Marketing of Sound Quality

To my surprise, this morning an audio friend tweeted a link to an article I recently wrote for our company's internal newsletter entitled "The Science and Marketing of Sound Quality." My article can be found on the new Harman Innovation website launched today, which features articles on current and future disruptive technologies that will impact consumers' infotainment experiences. Check it out.

My article focuses on a longstanding pet peeve of mine (first mentioned in this blog posting): The lack of  perceptually meaningful loudspeaker and headphone specifications in our industry.  While consumer surveys repeatedly report sound quality to be a driving factor in their audio equipment purchases, consumers lack the necessary tools and information to identify the good sounding products from the duds.

This is particularly true for loudspeakers and headphones, where the typical throw-away "10 Hz to 40 kHz" specification provided by the manufacturer is utterly useless. This specification only guarantees that the product makes sound, with no guarantee that the sound is good. While the science exists today to accurately quantify and predict the perceived sound quality of loudspeakers (and hopefully, soon, headphones), the audio industry continues to drag its heels into the 21st century by not routinely providing this information to consumers.

A rare exception is JBL Professional, which provides comprehensive, detailed measurements on studio/broadcast monitors like the new JBL M2 Master Reference shown below. Inspecting the measured frequency response curves shown below, you can easily recognize from their shape (flat, smooth, and extended) that the loudspeaker sounds exceptionally neutral and accurate. Based on this set of measurements, we can predict with 86% accuracy how a listener would rate the sound quality of the loudspeaker in a controlled listening test. The only pertinent information not shown in this graph is how loud the loudspeaker will play before producing audible distortion (trust me, this loudspeaker will play very loud!)
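To give a flavor of how such a prediction works, here is a toy sketch: reduce a measured response curve to a few summary statistics, then feed them into a fitted linear model. To be clear, this is an illustration only, not Harman's actual model; the metric definitions and coefficients below are invented for the example.

```python
import numpy as np

def response_metrics(freqs, db):
    """Toy statistics from a frequency response curve (hypothetical metrics)."""
    flatness = np.std(db)             # deviation from flat, in dB
    smoothness = np.std(np.diff(db))  # band-to-band roughness
    return flatness, smoothness

def predict_preference(flatness, smoothness, coefs=(8.0, -0.8, -1.5)):
    """Linear model: preference = b0 + b1*flatness + b2*smoothness (made-up coefficients)."""
    b0, b1, b2 = coefs
    return b0 + b1 * flatness + b2 * smoothness

freqs = np.geomspace(20, 20000, 64)
flat_speaker = np.zeros(64)                          # perfectly flat response
peaky_speaker = 3 * np.sin(np.linspace(0, 12, 64))   # ragged, resonant response

for db in (flat_speaker, peaky_speaker):
    f, s = response_metrics(freqs, db)
    print(round(predict_preference(f, s), 2))        # flat speaker scores highest
```

A real model of this kind is trained by regressing controlled listening test ratings against objective metrics derived from comprehensive anechoic measurements; see reference [1] in the Music Stations article below for the published version.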

Perceptually meaningful loudspeaker specifications like these have been available for almost 30 years! Yet,  these specifications are currently not part of any professional and consumer loudspeaker standard. Such a standard would go a long way towards improving the quality and consistency of recorded and reproduced sound. Audio consumers want to hear the truth. We need to provide better information and audio specifications so they can find it.

JBL M2 Master Reference Monitor provides true reference sound quality that is clearly indicated by its technical measurements shown below. 
The spatially-averaged frequency response curves of the JBL M2  (from top to bottom) for the listening window (green), the first reflections (red), and the total radiated sound power.  At the bottom are shown directivity indices of the sound power (dotted blue) and first reflections (dotted red). These measurements tell us that the quality of the direct and reflected sounds produced by the loudspeaker will be very accurate and neutral over a relatively wide listening area.





Monday, July 1, 2013

Harman Researchers Make Important Headway in Understanding Headphone Response

Todd Welti, Sean Olive, and Elisabeth McMullin are shown above with their custom binaural mannequin, "Sidney," wearing a pair of AKG K1000s. No fit or leakage issues with these headphones.
Tyll Hertsens, Chief Editor at InnerFidelity, recently visited our research labs in Northridge and wrote a nice story in his blog about our headphone research and his visit to Harman. You can read the entire story here.

In his story, Tyll summarizes three of our recent AES papers on headphones, the first one of which I already wrote about in this blog. I hope to write summaries of the other two papers in the upcoming weeks when I can find some free time.

Friday, November 30, 2012

Behind Harman's Testing Lab

This past week I had an enjoyable time meeting well-known technology writer Robert Scoble who was visiting our Harman facilities in Northridge, CA along with his geek-in-command Sam Levine. As part of the tour, I showed  them our Reference Listening Room and Multichannel Listening Lab where we do product research and double-blind evaluations of loudspeakers. We discussed the science and philosophy behind how we design and measure the sound quality of our products.



One of the topics of discussion was my recent research exploring whether high school and college students from the USA and Japan have different tastes and preferences in the quality of reproduced sound compared to older, trained listeners. We talked about differences in the tastes and performance of trained versus untrained listeners, and how Harman is able to accurately predict subjective preference ratings of loudspeakers using a predictive model that analyzes a set of comprehensive anechoic measurements.

After running Robert and Sam through a few trials of listener training using our software "How to Listen", I decided to put them through a couple of double-blind listening test trials to see if they had the right stuff. They compared four different brands of floor-standing loudspeakers located behind an acoustically transparent, visually opaque curtain, where each loudspeaker is shuffled into the same position via an automated speaker shuffler. All of our tests are conducted double-blind because we have found that even trained listeners are influenced by nuisance variables such as brand, price, and size.

In these tests Robert and Sam heard the same four loudspeakers that had been evaluated previously by hundreds of untrained listeners (young, old, American, Asian, and European), whose preferences and performances were compared to those of our panel of trained listeners. From these tests, we have found evidence that most listeners prefer the most accurate, neutral loudspeaker regardless of age, culture, or listening experience.

When the listening trials were done, the curtain went up, and Robert and Sam were surprised to discover that their favorite choice was the most accurate loudspeaker, which was also the least expensive. The science works. One of the speakers Robert didn't like was a model he actually owned: it had excessive amounts of treble and upper bass, which I'm told is mandated by the manufacturer's marketing department, who believe that "boom and tizz" are what their customers want. Luckily, I haven't met many of their customers, yet. Robert then surprised me by turning on his camera and doing an impromptu interview, which hopefully you'll enjoy. If you want to learn more about the engineering process and tools behind designing a speaker, check out the interview with one of our speaker engineering stars, Charles Sprinkle.

In my next blog posting I hope to discuss some of the exciting research we've been doing on the relationship between the perception and measurement of headphone sound quality. The goal is to develop the same science for measuring and predicting the sound quality of headphones that we've found useful for designing good sounding loudspeakers.  Stay tuned!


Thursday, May 10, 2012

More Evidence that Kids (American and Japanese) Prefer Accurate Sound Reproduction



Geoffrey Morrison, an audio writer at CNET and Sound & Vision, has posted a nice summary of my latest AES paper, "Some New Evidence that Teenagers and College Students May Prefer Accurate Sound Reproduction," presented at the recent 132nd AES Convention in Budapest, Hungary.


The paper is available for download here at the  AES E-library, and I have provided a YouTube video and a PDF of my presentation slides that summarize the main points of the research.


 The abstract of the paper reads as follows:


A group of 58 high school and college students with different expertise in sound evaluation participated in two separate controlled listening tests that measured their preference choices between music reproduced in (1) MP3 (128 kbps) and lossless CD-quality file formats, and (2) music reproduced through four different consumer loudspeakers. As a group, the students preferred the CD-quality reproduction in 70% of the trials and preferred music reproduced through the most accurate, neutral loudspeaker. Critical listening experience was a significant factor in the listeners' performance and preferences. Together, these tests provide some new evidence that both teenagers and college students can discern and appreciate a better quality of reproduced sound when given the opportunity to directly compare it against lower quality options.


The effects of culture and trained versus untrained listeners on loudspeaker preference are topics that have been discussed in previous postings on Audio Musings. To shed further light on this topic, I also ran 149 native-speaking Japanese college students through the same loudspeaker preference test, along with 12 Harman-trained listeners. The graph below shows the mean loudspeaker preference ratings for these two groups of listeners, along with the four different groups of high school and college students from Los Angeles.




Not surprisingly (at least to me), I found that the Japanese college students on average preferred the same accurate loudspeaker (A) as the 58 Los Angeles students and the trained Harman listening panel. The main differences among the listening groups were related to the effect of prior critical listening experience: the more trained listeners simply rated the loudspeakers lower on the preference scale, and were more discriminating and consistent in their responses. This result is consistent with previous studies. The least preferred and least accurate loudspeaker (Loudspeaker D) generated the most variance in ratings among the different listening groups. This was explained by its highly directional behavior combined with its inconsistent frequency response as you move from on-axis to off-axis seating positions. This meant that listeners sitting off-axis heard a much different (and apparently better quality) sound than listeners sitting on-axis.


While the small sample size of listeners doesn't allow us to generalize to larger populations, it is nonetheless reassuring to find that both the American and Japanese students, regardless of their critical listening experience, recognized good sound when they heard it, and preferred it to the lower quality options.


It would appear that the reason kids don't own better sounding audio solutions has nothing to do with their supposed "deviant" tastes in sound quality, and more to do with other factors (e.g. price, convenience, portability, marketing, fashion) unrelated to sound quality. Music and audio companies should take notice that kids can indeed discriminate between good and bad sound, and prefer accurate sound, despite what the media has been falsely reporting for the last few years. With that out of the way, we should focus on figuring out how to sell sound quality to kids at affordable prices and in form factors they desire to own.


The research suggests that if we cannot figure out how to sell better sound to kids, we have no one to blame but ourselves. 

Thursday, April 21, 2011

Topics Related to Perception and Measurement of Reproduced Sound


On Tuesday, April 26th 2011, I will be giving a presentation at the meeting of the Los Angeles AES Chapter on several topics related to recent audio research at Harman International. The topics include:

I've briefly discussed these topics in Audio Musings over the past few months, and you can find summaries of them by clicking on the links above. I'll be giving an update on new findings, and briefly touch on topics not mentioned above. As a door prize, Harman will donate a copy of Dr. Floyd Toole's book Sound Reproduction, autographed by the author.

AES members and nonmember guests are welcome to attend. The meeting will be held at the Sportsmen's Lodge in Studio City. More details can be found at the Los Angeles AES website.

Sunday, April 3, 2011

Version 2.04 of Harman How to Listen Now Available For Download!

Version 2.04 of Harman How to Listen is now available for download here.

This update fixes the problem with the noise and hum attribute tests. We've also updated the user's manual to help navigate around some installation issues some users have reported.

Tuesday, July 13, 2010

The Danger From Headphones

Below is an English translation of a recent article "Gefahr aus dem Kopfhörer" (The Danger From Headphones) written by Matthias Hohensee over at Valley Talk. His article refers to my recent investigations into whether younger generations prefer lossy MP3 over higher quality music file formats. The preliminary results of that study were reported in the article I recently posted called, "Some New Evidence that Generation Y May Prefer Accurate Sound Reproduction".
Matthias makes a good point about listener preference for MP3 becoming a moot issue with higher quality file formats becoming the standard, as bandwidth and music storage costs drop. I only briefly mentioned this in my slide presentation (see slide 7), but it deserves repeating. The days of low quality music downloads are numbered, I hope. Then, the main sound quality issue will become the recordings themselves, and the quality of the headphones and loudspeakers through which the recordings are heard. What are your thoughts on this matter?

The Danger from Headphones
by Matthias Hohensee
from Valley Talk 6.30.2010
Can the Germans really be proud of MP3, or has the digital stroke of genius desensitized the hearing of an entire generation?

When Angela Merkel recently visited the prestigious Stanford University in Silicon Valley and enumerated German technology services, she also mentioned the data compression method MP3. The technology that was largely developed by scientists at the Fraunhofer Institute has changed the music industry, even though it’s mainly U.S. companies that profit from the MP3 player market.
But can the Germans really be proud of MP3? Or has the digital stroke of genius desensitized the hearing of an entire generation? At least the observations of Jonathan Berger suggest so. Over the years the Stanford professor of music has been asking his students whether they are satisfied with compressed music files, or whether they prefer the full hi-fi sound.
He came to a surprising result: for years, the number of those who prefer the sound of 'packed' music to the uncompressed audio spectrum seems to have grown steadily. Berger concludes that tastes in sound have changed.
Good sound is measurable
Sean Olive, on the other hand, considers Berger's conclusion nonsense: "Good, accurately reproduced sound is not a question of taste, but scientifically measurable." And that is how he would have to see it. After all, Olive is the head of acoustic research at Harman International. The U.S. manufacturer is considered THE address for sophisticated sound systems.
Alarmed by Berger's observations, Olive recently invited Los Angeles high school students to the Harman studios for extensive tests. "Everyone could hear the difference between differently compressed sound files - and preferred the less compressed songs," says the scientist, relieved.

Danger from headphones
Now, Olive is not really unbiased; after all, Harman sells nearly three billion dollars' worth of high-end audio technology per year. But in fact, technical progress is already making Olive's worries obsolete. In the age of high-speed Internet, data compression does not play the same role it did in the nineties, when the music piracy service Napster made MP3 popular.
The songs that were exchanged back then were extremely compressed in order to distribute them over the still-slow Internet connections - but also to spare the limited memory of computers and MP3 players. Today, vendors such as Apple and Amazon sell songs which are formatted in such a way that only real audiophiles can hear the difference from music CDs.
And so the real dangers for the hearing of ‘generation iPod’ aren’t the highly compressed music files, but simply the volume adjustment of their headphones.


Acknowledgements: Thank you to the author Matthias Hohensee for permission to repost his article here, and to Alena Winterhoff for the English translation.

Saturday, April 10, 2010

Evaluating the Sound Quality of iPod Music Stations: Part 2: Listening Tests


Part 1 of this article described a listening test method used at Harman International for evaluating the sound quality of iPod music docking stations. In Part 2, I present the results of a recent competitive benchmarking listening test in which three popular Music Stations of comparable price were evaluated by a panel of trained listeners. Were listeners able to reliably formulate a preference among the different iPod Music Stations using this test method? And what were the underlying sound quality attributes that explain these preferences? Read on to find out.

Throughout this article, I will refer to slides in an accompanying PDF presentation, or you can watch a YouTube video of the presentation.


The Products Tested
A listening test was performed on three iPod Music Stations that retail for approximately the same price of $599: the Harman Kardon MS 100, the Bose SoundDock 10, and the Bowers & Wilkins Zeppelin (see slide 2). All three products provide iPod docking playback and an auxiliary input for external sources such as a CD player. The latter was used in these tests to reproduce a CD-quality stereo test signal fed from a digital sound source.


Listening Test Method
The Music Stations were evaluated in the Harman International Reference Listening Room (slide 4), described in detail in a previous blog posting. Each Music Station was positioned on a shelf attached to the Harman automated in-wall speaker mover, which provides the means for rapid multiple comparisons among three products designed to be used in, on, or near a wall boundary. The Music Stations were level-matched within 0.1 dB at the listening position by playing pink noise through each unit and adjusting the acoustic output level to produce the same loudness measured via the CRC stereo loudness meter.
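For readers curious about the mechanics, level matching of this kind boils down to measuring each unit's acoustic output with the same noise signal and computing a correction gain. Here is a minimal sketch; note that it uses plain (unweighted) RMS level on white noise purely as a stand-in for the CRC loudness measurement on pink noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def rms_db(x):
    """RMS level of a signal in dB (relative; a stand-in for a loudness meter)."""
    return 20 * np.log10(np.sqrt(np.mean(x**2)))

def matching_gain_db(reference, unit):
    """Gain in dB to apply to `unit` so its measured level matches `reference`."""
    return rms_db(reference) - rms_db(unit)

noise = rng.standard_normal(48000)   # white noise stands in for pink noise here
ref_playback = 0.5 * noise           # measured output of the reference unit
test_playback = 0.31 * noise         # the unit under test plays quieter

gain = matching_gain_db(ref_playback, test_playback)
matched = test_playback * 10 ** (gain / 20)
# After applying the gain, the level difference is effectively 0 dB,
# well inside the 0.1 dB matching tolerance used in the test.
```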

All tests were performed double-blind with the identities of the products hidden via an acoustically transparent, but visually opaque screen. The listening panel consisted of 7 trained listeners with normal audiometric hearing. Each listener sat in the same seat situated on-axis to the Music Stations positioned at seated ear height, approximately 11 feet away (slide 5).

The Music Stations were evaluated using a multiple comparison (A/B/C) protocol whereby listeners could switch at will between the three products before entering their final comments and ratings based on overall preference, distortion, and spectral balance. This was repeated using four different stereo music programs with one repeat (4 programs x 2 observations = 8 trials). In total, each listener provided 216 ratings, in addition to their comments. The typical length of the test was between 30-40 minutes. The presentation order of the music programs and Music Stations were randomized by the Harman Listening Test software to minimize any order-related biases in the results.
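The randomization step described above can be sketched in a few lines. This is only an illustration of the idea, not the actual Harman Listening Test software, and the program names are placeholders:

```python
import random

random.seed(42)  # fixed seed only so this example is reproducible

programs = ["prog1", "prog2", "prog3", "prog4"]
stations = ["A", "B", "C"]

# 4 programs x 2 observations = 8 trials, each comparing all three stations
trials = [p for p in programs for _ in range(2)]
random.shuffle(trials)  # randomize program order across the session

schedule = []
for prog in trials:
    order = stations[:]
    random.shuffle(order)  # randomize which unit is presented as A/B/C each trial
    schedule.append((prog, order))

print(len(schedule))  # 8 randomized trials
```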


Results: Overall Preference Ratings For the Music Stations
A repeated measures analysis of variance was used to statistically establish the effects and interactions between the independent variables and the different sound quality ratings. The main effect was related to the Music Stations, with no significant effects or interactions observed between the program material and Music Stations. Note that in the following discussion, the brands/models of the Music Stations have been removed from the results since this information is not relevant to the primary purpose of the research and this article. Instead, the Music Stations have been assigned the letters A, B, and C in descending order of their mean overall preference rating.

The mean preference ratings and upper 95% confidence intervals based on the 7 listeners are plotted in slide 7. Music Station A received a preference rating of 6.8, and was strongly preferred over the Music Stations B (4.58) and C (4.08).
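As a quick illustration of how such a mean and 95% confidence interval are computed from seven listeners' ratings (the individual ratings below are hypothetical, chosen only so the mean comes out at the reported 6.8):

```python
import numpy as np

# Hypothetical per-listener preference ratings for one product (7 trained listeners)
ratings = np.array([6.5, 7.0, 6.9, 6.6, 7.2, 6.8, 6.6])

mean = ratings.mean()
sem = ratings.std(ddof=1) / np.sqrt(len(ratings))  # standard error of the mean
t_crit = 2.447  # two-tailed 95% critical t value for df = 6
ci = t_crit * sem

print(f"{mean:.2f} +/- {ci:.2f}")  # -> 6.80 +/- 0.23
```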


Individual Listener Preference
The individual listener preference ratings and upper 95% confidence intervals are plotted in slide 8. Intra- and inter-listener reliability in the ratings was generally quite high. All seven listeners rated Music Station A higher than the other two products, although some listeners, notably 55 and 64, were less discriminating and reliable than the other listeners. Both of these listeners had significantly less training and experience than the others, which previous studies have shown to be an important factor in listener performance.


Distortion Ratings
Nonlinear distortion includes audible buzzes, rattles, noise and other level-dependent distortions related to the performance of the electronics, transducers, and mechanical integrity of the product’s enclosure. In these tests, the average playback level was held constant (78 dB(B) slow), and listeners could not adjust it up or down. Under these test conditions, some listeners still felt there were audible differences in distortion (slide 9) with Music Station A (distortion rating = 7.19) having less distortion than Music Stations B (5.5) and C (4.94).

Some of these differences in subjective distortion ratings could be related to a “Halo Effect," a scaling bias wherein listeners tend to rate the distortion of loudspeakers according to their overall preference ratings - even when the distortion is not audible. An example of “Halo Effect” bias has been noted in a previous loudspeaker study by the author [1]. Reliable and accurate quantification of nonlinear distortion in perceptually meaningful terms remains problematic until better subjective and objective measurements are developed.


Spectral Balance Ratings
Listeners rated the spectral balance of each Music Station across seven equally log-spaced frequency bands using a ±5-point scale. A rating of 0 indicates an ideal spectral balance, positive numbers indicate too much emphasis within the frequency band, and negative numbers indicate a deemphasis within the frequency band. Rating the spectral balance of an audio component is a highly specialized task that requires skill and practice acquired through using Harman's "How to Listen" listener training software application. A previous study [1] showed that spectral balance ratings are closely related to the measured anechoic listening window of the loudspeaker, although they may vary with changes in directivity and the ratio of direct-to-reflected sound at the listening location.
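For concreteness, a set of seven equally log-spaced band centers across the audio range can be generated as follows. The 20 Hz to 20 kHz endpoints are my assumption for illustration; the band centers actually used in the test may differ:

```python
import numpy as np

# Seven band centers equally spaced on a log scale across the audio band.
# "Equally log-spaced" means the ratio between adjacent centers is constant.
centers = np.geomspace(20, 20000, 7)
print(np.round(centers).astype(int))  # -> [   20    63   200   632  2000  6325 20000]
```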

The mean spectral balance ratings averaged across all programs and listeners are plotted in slide 10. Listeners felt Music Station A had the flattest or most ideal spectral balance, with the exception of a need for more upper/lower bass, and less emphasis in the upper treble. Music Station B was judged to have too much emphasis in the upper bass (88 Hz), and too little emphasis in the upper midrange/treble. Music Station C was rated to have a slight overemphasis in the upper bass, and a very uneven balance throughout the midrange with a peak centered around 1700 Hz.


Listener Comments
Listeners provided comments that described the audible differences among the three Music Stations. The number of times a specific comment was used to describe each product is summarized in slide 11. The correlation between the products' preference ratings and each descriptor is indicated by the correlation coefficient (r) shown in the bottom row of the table. The same data are plotted in graphical form in slide 12.
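The correlation coefficient in the bottom row of that table is an ordinary Pearson r between the descriptor counts and the products' preference ratings. A small sketch of the computation (the "colored" count of 2 for product A is invented for the example; the counts for B and C and the preference ratings are taken from the results reported in this article):

```python
import numpy as np

# Counts of the descriptor "colored" for Music Stations A, B, C
# (A's count is hypothetical), and the mean preference ratings.
colored_counts = np.array([2, 13, 19])
preference = np.array([6.80, 4.58, 4.08])

# Pearson correlation: a strong negative r means the descriptor
# is associated with lower-rated products.
r = np.corrcoef(colored_counts, preference)[0, 1]
print(round(r, 2))
```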

The three most common descriptors applied to Music Station A were neutral (16), bright (9), and thin (9). These descriptors generally confirm the mean spectral balance ratings summarized in slide 10.

The three most frequent descriptors applied to Music Station B were colored (13), boomy bass (10), and uneven mids (6). The "boomy bass" is clearly suggested in the spectral balance ratings (see the large 88 Hz peak) in slide 10.

The three most frequent descriptors used to describe the sound quality of Music Station C were colored (19), uneven mids (9), and harsh (6). All three descriptors have a high negative correlation with the overall preference rating, and may explain the low preference rating this product received. The coloration and unevenness of the midrange are confirmed in the spectral balance ratings in slide 10. The harshness is most likely related to the spectral peak perceived around 1700 Hz.


Conclusions
This article summarized the results of a controlled, double-blind listening test performed on three comparably priced iPod Music Stations using seven trained listeners with normal hearing. The results provide evidence that the sound quality of Music Station A was strongly preferred over Music Stations B and C. There was strong consensus among all seven listeners, who rated Music Station A highest overall. The Music Station preference ratings can be largely explained by examining the perceived spectral balance ratings of the products, which are in turn closely related to listener comments on the sound quality of the products.

The most preferred product, Music Station A, was perceived to have the flattest, most ideal spectral balance, and elicited frequent comments about its neutral sound quality. As the spectral balance ratings deviated from flat or ideal, the products received frequent comments related to coloration, boomy bass, and uneven midrange. While the distortion ratings were highly correlated with preference, more investigation is needed to determine the extent to which they reflect a possible scaling bias known as the "halo effect."

In part 3 of this article, I will present the objective measurements of these products - both anechoic and in-room acoustical measurements - to see if they can reliably predict the subjective ratings of the products reported here.


References
[1] Sean E. Olive, “ A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part I - Listening Test Results,” presented at the 116th AES Convention, preprint 6113 (May 2004).

Thursday, March 11, 2010

A Method For Training Listeners and Selecting Program Material For Listening Tests

The benefits of training listeners for subjective evaluation of reproduced sound are well documented [1]-[3]. Not only do trained listeners produce more discriminating and reliable sound quality ratings than untrained listeners, but they can report what they perceive in very precise, quantitative and meaningful terms.


One of the unexpected byproducts of listener training is that it identifies which music selections are most sensitive to distortions commonly found within the audio chain [4]. This is exactly what was found in a series of listener training experiments the author reported in a 1994 paper entitled, “A method for training listeners and selecting program material for listening tests.” The following sections summarize the findings of those early experiments, which helped establish an objective method for training and selecting listeners and program material used for listening tests at Harman International over the past 16 years. A slide presentation summarizing the paper can be downloaded here, and will be referred to throughout the following sections.


Matching the Sound of Spectral Distortions to Their Frequency Response Curve


A computer-based training task was designed where listeners were required to compare different spectral distortions added to programs and then match the frequency response curve of the filter that generated the distortion (see slides 4-5). This was repeated using eight different equalizations and twenty different music selections digitally edited into short 10-20 s loops.


The equalizations included ±3 dB shelving filters at low (100 Hz) and high (5 kHz) frequencies, and ±3 dB resonances (Q = 0.66) centered at 500 Hz and 2 kHz (slide 6). An unequalized version of the program (Flat) was always provided as a reference. The twenty music selections included classical, jazz, and pop/rock genres with instrumentations that varied from solo instruments, speech, and small combos to rock combos and orchestras (slide 7). Pink noise was also included since this continuous broadband signal has been found to produce the lowest detection thresholds of resonances in loudspeakers [5],[6].
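For readers who want to experiment with similar spectral distortions, a ±3 dB resonance like the ones described can be realized as a standard RBJ audio-EQ-cookbook peaking biquad. This is a generic sketch, not the actual filters used in the study:

```python
import numpy as np

def peaking_eq(f0, gain_db, q, fs=48000):
    """RBJ audio-EQ-cookbook peaking biquad; returns (b, a) normalized so a[0] = 1."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

def magnitude_db(b, a, f, fs=48000):
    """Magnitude response in dB at frequency f (evaluate H(z) on the unit circle)."""
    zinv = np.exp(-2j * np.pi * f / fs)  # z^-1 at this frequency
    num = b[0] + b[1] * zinv + b[2] * zinv**2
    den = a[0] + a[1] * zinv + a[2] * zinv**2
    return 20 * np.log10(abs(num / den))

# A +3 dB resonance (Q = 0.66) centered at 500 Hz, as in the training filters
b, a = peaking_eq(500, 3.0, 0.66)
print(round(magnitude_db(b, a, 500), 2))  # -> 3.0 (peak gain at the center frequency)
```

The ±3 dB shelving filters can be built the same way from the cookbook's low-shelf and high-shelf formulas.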


Eight untrained listeners with normal hearing participated in the training exercises, which were conducted over five separate listening sessions consisting of 32 trials each (slides 8 and 9). The presentation order of the equalizations, trials, and programs were randomized to prevent any order related biases. The listener’s performance was based on the percentage of correct responses given over the course of the five training sessions.


The Results


The training results were statistically analyzed using a repeated measures analysis of variance (ANOVA) to determine the effect the different music programs, equalizations, and trials had on the listeners’ performance in correctly identifying the different equalizations (slide 11).


Listener Performance Is Strongly Influenced by Program Selection


The single largest effect on the listeners' performance was the program selection. Slide 13 plots the mean listener performance scores for each of the twenty programs averaged across all eight equalizations. The percentage of correct responses ranged from a high of 88% (pink noise) to a low of 54% (jazz piano trio). Listeners performed the task best when using broadband, spectrally dense, continuous signals like pink noise or pop/rock selections like Tracy Chapman, Little Feat, and Jennifer Warnes. Listeners performed worst on programs featuring solo instruments, small combos, and speech, which produce more discontinuous, narrow-band signals. More about this later.



Equalization Context Influences Listener Performance


The effect of equalization on listener performance was surprisingly small (slide 14). Listeners tended to identify the spectral distortions at low and high frequencies more accurately than the midband equalizations. The explanation can be found in the significant equalization * trial interaction, which indicates that listener performance depended on which combinations of equalizations were presented within a trial. In other words, the context in which an equalization was presented influenced listener performance (slide 15). These contextual effects can be summarized as follows:


  1. Listeners gave more correct responses when the presented equalizations were more widely separated in frequency.
  2. Listeners gave more correct responses for spectral boosts than for notches; spectral notches were often confused with spectral peaks located at slightly higher frequencies.
  3. Low frequency boosts were often confused with high frequency cuts (and vice versa).
  4. Low frequency cuts were often confused with high frequency boosts (and vice versa).



Greater frequency separation between equalizations produces more distinctive tonal or timbral differences, which helps identification. The second observation confirms previous research finding that spectral notches are more difficult to detect than spectral peaks of similar bandwidth [5]. The one exception is broadband dips, which have detection thresholds similar to those of resonance peaks of equivalent bandwidth [6]. Observations 3 and 4 are related to each other, and are more difficult to explain. At first glance, it seems implausible that boosts and cuts separated by more than five octaves could be confused with one another. A possible explanation is that listeners use information across the entire bandwidth to judge the perceived balance of bass and treble. In that case, the slope or shape of the spectrum must be an important factor (slide 16). Since a boost at one end of the audio bandwidth and a cut of similar magnitude at the other end produce similar broadband slopes, listeners might confuse the two with each other.


Program and EQ Interact to Influence Listener Performance


There was also a significant interaction between program and equalization. This effect was most apparent for the programs presented in training session 3, where listener performance varied significantly depending on the combination of program and equalization presented (slide 18). It seemed plausible that these differences were related to differences in the spectra of the programs, which was confirmed by plotting the average 1/3-octave spectra of the four programs (slide 19). The largest listener response errors tended to occur when the equalization fell in a frequency range where the program had little spectral energy (e.g. programs P10 (Stan Getz) and P19 (Canadian Brass)). This makes sense: listeners cannot analyze a spectral distortion if the program material contains no signal that makes it audible.



Not All Listeners Are Equal to the Task


No amount of training will make me eligible for the Canadian Olympic hockey team - even if I were 25 years younger. Some people simply lack the innate mental and physical raw material to perform a highly specialized task. This is also true for critical listening, as illustrated by the average performance scores of the eight listeners after five listening sessions (slide 20). Individual listener performance ranged from 82% (listener 4) down to 31% (listener 3). All listeners had normal hearing, so this large inter-listener variance must be related to other factors such as motivation, attentiveness, and listening (and general) intelligence. Training data such as these can provide an objective, quantifiable metric for selecting the best listeners for audio product evaluations.



Practice Makes Perfect


The measure of success of any listener training task is that it leads to measurable improvement in performance with repetition. Slide 21 shows listener performance measured over the five training sessions for the eight listeners tested. The graph shows a monotonic improvement from 65% correct responses in the first session to 80% in the fifth. Additional sessions would most likely yield further gains for some subjects. In other words, the training works!



Programs With Wider and Flatter Spectra Improve Listener Performance (Why Tracy Chapman Is as Good as Pink Noise)


Spectrum analysis was performed on the different program selections to see if it could explain the strong effect of program on listener performance. The 1/3-octave spectrum of each program was computed as a long-term average taken over the entire length of the loop. The spectra of the programs turned out to be a significant predictor of how well listeners performed their task.
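
A long-term 1/3-octave analysis of this kind can be approximated by summing FFT power over each band, as sketched below. The band edges and averaging method are reasonable assumptions; the post does not specify the exact analyzer settings used.

```python
import numpy as np

def third_octave_spectrum(x, fs):
    """Long-term 1/3-octave band levels (dB, unreferenced) of signal x.

    Sums FFT power between the band edges of each 1/3-octave band;
    adequate for long program loops."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    centers = 1000.0 * 2.0 ** (np.arange(-17, 14) / 3.0)   # ~20 Hz to 20 kHz
    levels = []
    for fc in centers:
        lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)      # band edges
        band = power[(freqs >= lo) & (freqs < hi)]
        levels.append(10 * np.log10(band.sum() + 1e-20))   # avoid log(0)
    return centers, np.array(levels)
```

Plotting such band levels for each program loop makes it easy to see at a glance which selections have wide, flat spectra and which leave large frequency regions unexercised.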


Slide 22 plots the average spectra of four groups of programs (five programs in each group), rank ordered from highest to lowest according to the listener performance scores they produced. It clearly shows that the programs with the flattest and most extended spectra (e.g. pink noise, pop/rock, full orchestra) were better suited for identifying spectral distortions. After pink noise (program 1), Tracy Chapman (program 2 in the above graph) had among the widest and flattest spectra measured, and the two registered the highest listener performance scores. Programs with narrow-band spectra and limited energy above and below 500 Hz (speech, solo instruments, small jazz and classical ensembles), concentrated in group 4, were less suited for identifying spectral distortions. While group 4 contained some of the most musically entertaining selections, in the end they were not good signals for detecting and characterizing spectral distortions in audio components.



Conclusions


A listener training method has been described that teaches listeners to identify spectral distortions according to their frequency response curves. Experimental evidence showed that listeners improved their performance in this task after five training sessions, although not all listeners performed equally well.


Statistical analysis of the training data revealed that program selection is the largest factor influencing listener performance in this task: programs with continuous broadband spectra (e.g. pink noise, Tracy Chapman, etc.) provide the best signals for characterizing spectral distortions, whereas programs with narrow-band spectra (e.g. speech, solo instruments) provide poor signals for this task. Furthermore, listeners tend to confuse spectral distortions that share similarities in frequency, bandwidth, and broadband spectral slope or shape.

Finally, it is important to remember that the training methods and programs discussed in this study focused on the perception and analysis of spectral distortions. While these are the dominant distortions found in loudspeakers, microphones and listening rooms, there are other types of distortions for which a different set of programs is likely better suited for revealing their audibility. The current Harman listener training software “How to Listen” includes training tasks on spectral distortion as well as spatial, dynamic and various types of nonlinear distortions, for which we hope to discover the optimal programs for detecting and analyzing their audibility. Stay tuned.



References


  1. Olive, Sean E., “Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study,” J. Audio Eng. Soc., vol. 51, no. 9, pp. 806-825 (September 2003). Download for free here, courtesy of Harman International.
  2. Bech, Søren, “Selection and Training of Subjects for Listening Tests on Sound-Reproducing Equipment,” J. Audio Eng. Soc., vol. 40, no. 7/8, pp. 590-610 (July 1992).
  3. Toole, Floyd E., “Subjective Measurements of Loudspeaker Sound Quality and Listener Performance,” J. Audio Eng. Soc., vol. 33, pp. 2-32 (January/February 1985).
  4. Olive, Sean E., “A Method for Training Listeners and Selecting Program Material for Listening Tests,” presented at the 97th AES Convention, preprint 3893 (November 1994).
  5. Toole, Floyd E. and Sean E. Olive, “The Modification of Timbre by Resonances: Perception and Measurement,” J. Audio Eng. Soc., vol. 36, pp. 122-142 (March 1988).
  6. Olive, Sean E.; Schuck, Peter L.; Ryan, James G.; Sally, Sharon L.; Bonneville, Marc E., “The Detection Thresholds of Resonances at Low Frequencies,” J. Audio Eng. Soc., vol. 45, no. 3, pp. 116-128 (March 1997).
  7. Olive, Sean E., “Harman’s How to Listen - A New Computer-based Listener Training Program,” May 30, 2009.

Friday, February 5, 2010

Evaluating the Sound Quality of iPod Music Stations: Part 1


For many consumers, an iPod music docking station may be the primary audio device through which they experience most of their recorded music and infotainment. These ubiquitous devices offer a convenient, low-cost, portable and easy-to-use way to enjoy an iPod through loudspeakers -- but what about their sound quality? What sonic compromises are made to achieve this level of convenience and portability? Do certain models or brands of iPod Music Stations offer better sound than others, and if so, how can consumers identify them? These are legitimate questions that consumers should ask when purchasing an iPod Music Station. Unfortunately, the answers are not readily found.


Choosing an iPod Music Station based on sonic performance is a daunting task for consumers. There are dozens of models to choose from, varying in price from $80 to as much as $3000 for a model designed by Ferrari. Competent in-store demonstrations and reviews of these products are difficult to find, and the technical specifications on the packaging give no clear indication of how good they sound. For traditional loudspeakers it is already possible to quantify sound quality, but the audio industry continues to withhold this information from consumers. Without meaningful performance specifications in place, consumers cannot make sound purchase decisions, nor can manufacturers easily be held accountable for delivering products that sound “not good enough.”


This article describes a listening test method used at Harman International for evaluating the sound quality of Harman and competitors’ iPod Music Stations. The goal is to provide subjective ratings of iPod Music Stations that are accurate, reliable and scientifically valid. From these data, a set of technical performance specifications can be developed that quantify how good the products sound.


Designing Listening Tests For iPod Music Stations


Fortunately, there already exists a large body of scientific knowledge on how to design accurate, reliable and valid listening tests on loudspeakers. A key ingredient is careful control of listening test nuisance variables: psychological, electro-acoustical and experimental factors that are not directly related to the product(s) under test but nonetheless influence and bias the results (click on the figure below). Some of the more significant nuisance variable controls that should be in place, but are often ignored by audio manufacturers and reviewers, are:

  • Double-blind conditions (this removes sighted biases related to brand, price, etc.)
  • Trained listeners with normal hearing (trained listeners are up to 20 times more discriminating and reliable than untrained listeners, yet their overall sound quality preferences are similar to those of untrained listeners)
  • Quiet listening room with acoustics that are representative of average homes (important for hearing low level sounds and the quality of the loudspeaker's off-axis radiated sounds)
  • Loudness matching between products (the perception of timbre, spatial and dynamic attributes is level dependent)
  • Selection of well-recorded music selections that are revealing of sound quality differences
  • Multiple comparisons among products, which are more discriminating and reliable than single stimulus presentations
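
Of the controls above, loudness matching is the easiest to sketch in code. The snippet below uses RMS level as a crude proxy for loudness; a production test would use a perceptual weighting such as ITU-R BS.1770, and the function name and target level here are illustrative assumptions.

```python
import numpy as np

def match_rms(signals, target_dbfs=-20.0):
    """Scale each signal to a common RMS level (a crude loudness proxy).

    signals: list of 1-D numpy arrays; returns gain-adjusted copies
    so that no product is favored simply for playing louder."""
    target = 10.0 ** (target_dbfs / 20.0)
    return [x * (target / np.sqrt(np.mean(x ** 2))) for x in signals]
```

Because listeners reliably prefer the louder of two otherwise identical presentations, even fractions of a decibel of mismatch can bias a product comparison, which is why this control is non-negotiable.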



These nuisance variable controls are essential for obtaining accurate, reliable and valid sound quality ratings of iPod Music Stations.



Including the Acoustical Effects of the Wall and Desktop in the Listening Test


If audio products are not tested under similar conditions for which they were designed and intended to be used, the ecological validity (as well as the external validity) of the test may be compromised: in other words, the test results will be of little value or relevance to how the product is typically used in the real world.


Most iPod Music Stations are intended to be placed on a desktop or bookshelf located near a wall, which causes acoustical reinforcement and cancellation at certain frequencies. Below 500 Hz there is a gradual increase in sound pressure level that, unless compensated for in the design of the product, can make vocals and bass instruments sound tubby and boomy. Diffraction effects or reflections from the desktop/bookshelf may also produce audible effects that should be included in the listening test. For these reasons, listening tests on iPod Music Stations are best done on a desktop against a wall boundary.
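
The wall's low-frequency reinforcement and higher-frequency cancellation can be approximated with a simple image-source model: the direct sound sums with a reflection delayed by the extra round-trip to the wall. This is a first-order sketch only; the distance and reflection coefficient below are illustrative values, not measurements.

```python
import numpy as np

def boundary_gain_db(f, d, r=0.9, c=343.0):
    """Approximate on-axis gain (dB) from one wall reflection.

    f: frequency in Hz; d: source-to-wall distance in m;
    r: wall reflection coefficient; c: speed of sound in m/s.
    The reflected path is 2*d longer than the direct path."""
    delay = 2.0 * d / c
    return 20 * np.log10(np.abs(1 + r * np.exp(-2j * np.pi * f * delay)))

# A dock 0.25 m from the wall: roughly +5.5 dB of boost at low
# frequencies, first cancellation notch near c / (4 * d) ≈ 343 Hz
low_boost = boundary_gain_db(20.0, 0.25)
first_notch = boundary_gain_db(343.0, 0.25)
```

The model makes the design trade-off concrete: a product voiced to sound flat in free space will sound bass-heavy against a wall, which is exactly why the listening test includes the wall and desktop.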



A Video on How We Evaluate the Sound Quality of iPod Docking Stations


The video at the top of the page illustrates how iPod Music Stations are currently evaluated in the Harman International Reference Listening Room. The acoustical properties and features of the room have been described in detail in a previous posting.


In the video you see a trained listener comparing three different iPod Music Stations situated on our automated in-wall speaker mover, configured with a removable shelf and desktop. An acoustically transparent, visually opaque screen is placed between the listener and the products under test so that the test is double-blind (note: double-blind means that neither the listener nor the experimenter knows the identities of the products currently selected, since the computer controls the test and randomly assigns the letters A/B/C to the products in each trial).


The listener can switch among the different products at will and enter responses via a wireless PDA equipped with a custom listening test software (LTS) client application. Sound quality ratings are given on a number of pre-defined scales, including preference, spectral balance, distortion, and auditory image size. This is repeated twice using four different programs.


The PDA client communicates with the LTS server application that performs the following functions:


  • A test wizard that defines all experimental design and setup parameters (perceptual scales, presentation of stimuli, program, randomization of test objects, playback level, etc.), which are then stored in a database
  • Automation and administration of the listening test and its hardware (e.g. speaker mover, media player, DSP, audio switcher)
  • Collection, storage and statistical analysis of listening test data
  • Real-time monitoring of listeners’ performance and ratings during the test


LTS makes conducting listening tests an efficient and repeatable process by minimizing human interaction and errors in the listening test setup, storage, and analysis of the results.


Conclusions


This article has described a listening test method for evaluating iPod Music Stations, with the goal of providing accurate, reliable and valid sound quality ratings. In Part 2, I will show results from a recent listening test of different iPod Music Stations, followed by acoustical measurements of the products in Part 3. By studying the relationship between well-controlled scientific listening tests and comprehensive acoustical measurements of iPod Music Stations, a meaningful technical specification based on sound quality can be developed.