
Sunday, June 14, 2009

Validation of a Binaural Room Scanning Measurement System for Subjective Evaluation of Automotive Audio Systems


In a previous posting on Audio Musings, I described Harman's binaural room scanning (BRS) measurement and playback system. BRS is a powerful audio research and testing tool that allows Harman scientists to capture and store the acoustical signature of one or more audio systems situated in the same or different listening spaces, and later reproduce it through a head-tracking, headphone-based auditory display. BRS makes it practical to conduct double-blind listening evaluations of different loudspeakers, listening rooms, and automotive audio systems in a very controlled and efficient way.


I also pointed out that all binaural recording/playback systems contain errors that must be removed through proper calibration. However, removing all BRS errors can become expensive and impractical, so some compromise is necessary. Hence the need to experimentally validate the BRS system, to ensure that the errors remaining after calibration do not significantly change listeners' perceptual ratings of audio systems evaluated through BRS compared with in situ evaluations.

To this end, Todd Welti, Research Acoustician at Harman International, and I recently presented the results of a series of BRS validation tests performed using different equalizations of a high-quality automotive audio system [1]. You can view the PowerPoint presentation of the conference paper here. For more detailed information on this experiment, you can view the proceedings of the recent 36th AES Automotive Conference in Dearborn, Michigan, when they become available in the AES E-Library.


To assess the accuracy of the BRS system, a group of trained listeners gave double-blind preference ratings for different equalizations of the audio system evaluated under both in situ (in the car) and BRS playback conditions. For the BRS playback condition, the listener sat in the same car listening to a virtual headphone-based reproduction of the car's audio system. The purpose of the experiment was to determine whether the BRS and in situ methods produced the same preference ratings for different equalizations of the car's audio system.


Listeners gave preference ratings for five different equalizations using four different music programs reproduced in mono (left front speaker), stereo (left and right front channels) and surround sound (7.1 channels). The three playback modes were tested separately to isolate potential issues related to differences in how the BRS system reproduced front versus rear, and hard versus phantom, auditory images.


The listening test results showed no statistically significant differences in equalization preferences between the in situ and BRS playback methods. This was true for mono, stereo and multichannel playback modes (see slides 21-23). An interesting finding was that these results were achieved using a BRS calibration based on a single listener, which tended to work well for the other listeners on the panel. This suggests that individualized listener calibrations for BRS-based listening tests may not be necessary, so long as the calibration and listeners are carefully selected.
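
For readers curious what such a significance test looks like in practice, here is a minimal Python sketch. The ratings table (listeners x equalizations) is entirely invented, and the paired t-test on per-listener means is a simplification; the study itself used ANOVA. This illustrates the logic only, not the paper's actual analysis.

```python
# Minimal sketch: testing whether BRS and in situ playback shift
# preference ratings. All numbers are invented for illustration;
# the actual study used ANOVA rather than this paired t-test.
import numpy as np
from scipy.stats import ttest_rel

# rows = listeners, columns = the five equalizations
in_situ = np.array([[7.2, 5.1, 6.0, 4.3, 5.5],
                    [6.8, 4.9, 6.2, 4.0, 5.8],
                    [7.0, 5.4, 5.9, 4.5, 5.2]])
brs     = np.array([[7.0, 5.3, 6.1, 4.1, 5.6],
                    [6.9, 5.0, 6.0, 4.2, 5.7],
                    [7.1, 5.2, 6.2, 4.4, 5.3]])

# Compare each listener's mean rating under the two playback methods.
t, p = ttest_rel(in_situ.mean(axis=1), brs.mean(axis=1))
print(f"t = {t:.2f}, p = {p:.3f}")  # p > 0.05 -> no significant method effect
```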


In conclusion, this validation experiment provides evidence that a properly calibrated BRS measurement and playback system can produce automotive audio equalization preferences similar to those measured in situ.



Reference

[1] Sean E. Olive and Todd Welti, "Validation of a Binaural Car Scanning Measurement System for Subjective Evaluation of Automotive Audio Systems," presented at the 36th International AES Conference: Automotive Audio, Dearborn, MI, June 2-4, 2009.

Sunday, January 11, 2009

What Loudspeaker Specifications Are Relevant to Sound Quality?



This past week I attended the International Loudspeaker Manufacturers' Association (ALMA) Winter Symposium in Las Vegas, where the theme was "Sound Quality in Loudspeaker Design and Manufacturing." Over the course of three days, the industry's leading experts gave presentations, round-table discussions, and workshops focused on improving loudspeaker sound quality. Ironically, the important question of whether these improvements matter to consumers wasn't raised until the final hours of the symposium, in a panel discussion called "What loudspeaker specifications are relevant to perception?"

The panelists included myself, Steve Temme (Listen Inc.), Dr. Earl Geddes (GedLee), Laurie Fincham (THX), Mike Klasco (Menlo Scientific), and Dr. Floyd Toole (former VP Acoustic Engineering at Harman), who served as the panel moderator. Within about 30 minutes, the panel reached a consensus on the following points:

  1. The perception of loudspeaker sound quality is dominated by linear distortions, which can be accurately quantified and predicted using a set of comprehensive anechoic frequency response measurements (see my previous posting here).
  2. Both trained and untrained listeners tend to prefer the most accurate loudspeakers when measured under controlled double-blind listening conditions (see this article here).
  3. The relationship between the perception and measurement of nonlinear distortions is less well understood and needs further research. Popular specifications like Total Harmonic Distortion (THD) and Intermodulation Distortion (IM) do not accurately reflect a distortion's audibility or its effect on the perceived sound quality of the loudspeaker.
  4. Current industry loudspeaker specifications are woefully inadequate for characterizing the sound quality of a loudspeaker. The commonly quoted "20 Hz - 20 kHz, ±3 dB" single-curve specification is a good example. Floyd Toole observed that there is more useful performance information on the side of a tire (see tire below) than is currently found on most loudspeaker spec sheets (see Floyd's new book "Sound Reproduction").

For the remaining hour, the discussion turned to the root causes of why loudspeaker performance specifications seem stuck in the Pleistocene Age, despite scientific advances in loudspeaker psychoacoustics. Do consumers really care about loudspeaker sound quality? Or are they mostly satisfied with the status quo? Why do loudspeaker manufacturers continue to hide behind performance numbers that are mostly meaningless, and often misleading?

The evidence that consumers no longer care about sound quality is anecdotal, based largely on the recent down-market trend in consumer audio. Competition from digital cameras, flat-panel video displays, MP3 players, computers, and GPS navigation devices has decimated the consumer's audio budget. This doesn't prove consumers care less about loudspeaker sound quality, only that there is less money available to purchase it. Marketing research studies indicate that sound quality remains an important factor in consumers' audio purchase decisions. Given the opportunity to hear different loudspeakers under controlled, unbiased listening conditions, consumers tend to prefer the most accurate ones. Unfortunately, with the demise of the specialty audio dealer and the growth of internet-based sales, consumers rarely have the opportunity to audition different loudspeakers - even under the most biased and uncontrolled listening conditions. This is precisely why the industry needs new loudspeaker specifications that accurately portray the perceived sound quality of the loudspeaker.

So why is the loudspeaker industry not moving more quickly towards this goal? In my view, complacency and fear are the major obstacles. The loudspeaker industry is very conservative and largely self-regulated. There are no regulatory agencies to force improvement, or even to check whether a product's quoted specifications are compliant with reality. Change will only occur as the result of competition, or of pressure exerted by consumers, industry trade organizations (e.g., CEDIA, CEA) or consumer product testing organizations like Consumer Reports. The fear of adopting a new specification stems from the realization that a company could no longer hide beneath the Emperor's new clothes (i.e., the current specifications). A perceptually relevant specification would clearly separate the good-sounding loudspeakers from the truly mediocre ones. In the future, a perceptually based specification like the one illustrated to the right could provide ratings of overall sound quality and of various timbral, spatial and dynamic attributes. The consumer could then choose a loudspeaker based on these measured attributes.

In conclusion, all evidence suggests that consumers highly value sound quality when purchasing a loudspeaker, yet current loudspeaker specifications provide little guidance in this matter. It is time the loudspeaker industry grew up and realized this. Adopting more perceptually meaningful loudspeaker specifications would permit consumers to make smarter choices based on how a loudspeaker actually sounds. This would better serve the interests of consumers and of loudspeaker manufacturers who view sound quality as a loudspeaker's most important selling feature.

Saturday, January 3, 2009

Why Consumer Reports' Loudspeaker Accuracy Scores Are Not Accurate


For over 35 years, Consumer Reports magazine recommended loudspeakers to consumers based on what many audio scientists believe to be a flawed test methodology. Each loudspeaker was assigned an accuracy score related to the "flatness" of its sound power response measured in 1/3-octave bands. Consumers Union (CU) - the organization behind Consumer Reports - asserted that sound power best predicts how good a loudspeaker sounds in a typical listening room. Until recently, this assertion had never been formally tested or validated in a published scientific study.

In 2004, the author conducted a study designed to address the following question: "Does the CU loudspeaker model accurately predict listeners' loudspeaker preference ratings?" (see Reference 1). A sample of 13 different loudspeaker models reviewed in the August 2001 edition of Consumer Reports was selected for the study. Over the course of several months, the 13 loudspeakers were subjectively evaluated by a panel of trained listeners in a series of controlled, double-blind listening tests. Comparative judgments were made among different groups of four speakers at a time using four different music programs. Loudspeaker positional biases were eliminated via an automated speaker shuffler. To control loudspeaker context effects, a balanced test design was used so that each loudspeaker was compared against the other 12 models an equal number of times, as sketched below. This produced a total of 2,912 preference, distortion and spectral balance ratings, in addition to 2,138 comments.
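
To make the balance requirement concrete, here is a short Python sketch. It builds a schedule of 13 groups of four from a classic combinatorial design (a planar difference set mod 13, chosen here purely for illustration; the study's actual schedule is not reproduced in this post) and verifies that every pair of loudspeakers appears together equally often.

```python
# Hypothetical sketch of a balanced schedule: 13 loudspeakers in groups
# of four, with every pair appearing together exactly once. Built from
# the planar difference set {0, 1, 3, 9} mod 13 purely for illustration;
# the study's actual schedule is not shown in the post.
from itertools import combinations
from collections import Counter

schedule = [tuple(sorted((d + s) % 13 for d in (0, 1, 3, 9)))
            for s in range(13)]

# Count how often each pair of speakers shares a group.
pair_counts = Counter(p for group in schedule
                      for p in combinations(group, 2))

assert len(pair_counts) == 13 * 12 // 2       # all 78 pairs occur...
assert set(pair_counts.values()) == {1}       # ...exactly once each
print("schedule is balanced:", len(schedule), "groups")
```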

The above graph plots the mean listener preference rating and 95% confidence interval (blue circles), and the corresponding CU predicted accuracy score (red squares), for each of the 13 loudspeakers. The agreement between the listener preferences and the CU accuracy scores is very poor indeed; in fact, the correlation between the two sets of ratings is negative (r = -0.22) and not statistically significant (p = 0.46). The most preferred loudspeaker in the test group (loudspeaker 1) actually received the lowest CU accuracy score (76). Conversely, some of the least preferred loudspeakers (e.g., loudspeakers 9 and 10) received the highest CU accuracy scores. In conclusion, the CU accuracy scores do not accurately predict listeners' loudspeaker preference ratings. Since this study was published, CU has begun to reevaluate its loudspeaker testing methods. Hopefully, its new rating system will more accurately predict the perceived sound quality of loudspeakers in a typical listening room.
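
For readers who want to reproduce this kind of analysis, the correlation test is a one-liner in Python. The preference ratings and accuracy scores below are placeholders rather than the study's data (which appears in the paper), so the printed values will not match the reported figures; only the method is the same.

```python
# Sketch of the correlation analysis above. The preference ratings and
# CU accuracy scores below are placeholders, not the study's data, so
# the printed r and p will differ from the reported r = -0.22, p = 0.46.
from scipy.stats import pearsonr

mean_preference = [6.8, 6.1, 5.9, 5.5, 5.4, 5.0, 4.8,
                   4.6, 4.2, 4.0, 3.9, 3.6, 3.1]   # 13 loudspeakers
cu_accuracy     = [76, 88, 84, 90, 81, 86, 92,
                   79, 94, 93, 83, 85, 89]

r, p = pearsonr(mean_preference, cu_accuracy)
print(f"r = {r:.2f}, p = {p:.2f}")
# A small r with p >> 0.05 means the accuracy scores carry essentially
# no information about listener preference.
```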

In the next installment of this article, I will explain why the CU loudspeaker model failed to accurately predict listeners' loudspeaker preferences, and show some new models that work much better in this regard.

Updated 1/5/2009: Today I was contacted by Consumer Reports, who informed me that since 2006 they no longer publish loudspeaker reviews based on the sound power model I tested in 2004. I was told their new model for predicting loudspeaker sound quality uses a combination of sound power and other analytics to better characterize what the listener hears in a room. In this regard, it is similar to the predictive model I developed, which I will discuss in an upcoming blog posting.

References

[1] Sean E. Olive, "A Multiple Regression Model for Predicting Loudspeaker Preferences Using Objective Measurements: Part 1 - Listening Test Results," presented at the 116th AES Convention, May 2004.

Thursday, January 1, 2009

A Video on How We Measure Loudspeaker Sound Quality at Harman International


Part of my job at Harman International involves participating in audio dealer training and press events. These typically include a one- to two-day field trip to Harman's R&D labs in Northridge, where the visitors experience first-hand the listener training process and participate in a double-blind loudspeaker listening test. Visitors usually leave our labs with a heightened appreciation and respect for the scientific effort behind the development and testing of new models of Revel, JBL, and Infinity loudspeakers.

A few years ago, Infinity commissioned a video known as "Infinity Academy," aimed at encapsulating the 1-2 day training event on a DVD. Chapter 6, the "Final Test," covers listener training and the double-blind listening test, in which trained listeners evaluate the Harman prototype loudspeaker against its best competitors. The goal is "best-in-class" performance, attained only when the prototype receives a preference rating higher than that of its best competitor. If the loudspeaker fails on its first attempt, the listeners' feedback is used to re-engineer it, after which it is re-submitted for another listening test.

The picture to the right shows three loudspeakers on the automated speaker shuffler in the Multichannel Listening Lab. The shuffler brings each loudspeaker into the exact same position within 3 seconds, so that any loudspeaker positional biases are removed from the listening test.

Chapter 6 can be downloaded in MPEG-4 (H.264, 41 MB) or MPEG (84 MB) formats. All six chapters of the DVD are available here.

Tuesday, December 30, 2008

Sound Science - Loudspeaker R&D at Harman

The American artist Andy Warhol once said that everyone will eventually have their 15 minutes of fame. The closest I came was appearing on the cover of Test & Measurement magazine in November 2004. OK, admittedly T&M is not exactly People magazine, but one or two pocket-protector-wearing test engineers may have noticed the cover while shopping for a new digital oscilloscope or multimeter.

The title of the article is "Sound Science: Musical tastes differ, but tests show that listeners respond with the consistency of spectrum analyzers to loudspeaker performance."

The article explains the science behind loudspeaker R&D at Harman International and is written in a very approachable style for the audio layperson. You can read it here. 

Sunday, December 28, 2008

Part 3 - Relationship between Loudspeaker Measurements and Listener Preferences


Part 1 of this article presented experimental evidence from a study conducted by the author demonstrating that trained and untrained listeners prefer the same loudspeakers (see reference 1). Part 2 showed that trained listeners performed 3 to 20 times better than untrained listeners, based on their ability to give discriminating and reliable loudspeaker ratings. In Part 3, we examine the relationship between the listeners' loudspeaker preferences and a set of anechoic measurements performed on the loudspeakers used in that study.

The mean loudspeaker preference ratings and 95% confidence intervals, averaged across all listeners, are plotted for each of the four loudspeakers (see the graph to the right). According to the definition of the preference scale, listeners liked loudspeakers P and I, were relatively neutral towards loudspeaker B, and disliked loudspeaker M.
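
As an aside, the error bars in these plots are straightforward to compute. Here is a minimal sketch, using invented ratings for a single loudspeaker, of the mean and its 95% confidence interval based on the t-distribution.

```python
# Minimal sketch: mean preference rating and 95% confidence interval
# for one loudspeaker. The ratings are invented for illustration.
import numpy as np
from scipy import stats

ratings = np.array([7.1, 6.4, 6.9, 7.3, 6.0, 6.8, 7.0, 6.5])

mean = ratings.mean()
sem = stats.sem(ratings)  # standard error of the mean
lo, hi = stats.t.interval(0.95, df=len(ratings) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```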


The next graph on the right shows a set of anechoic measurements for each of the four loudspeakers P, I, B, and M, presented in descending order of subjective preference rating. Each loudspeaker was measured at 70 different angles around its horizontal and vertical orbits in order to fully characterize its on- and off-axis sound, and to separate acoustical interference effects from resonances, which can add harmful colorations to the reproduced sound. These resonances show up as peaks and dips in the frequency response. In each graph, the curves represent, from top to bottom, the direct sound, the average listening window, the first reflections, the sound power, and the directivity indices for the first reflections and the sound power. The reader is referred to references 2-4 for more background on how these measurements were derived and experimentally validated through controlled listening tests.
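
To give a flavor of how such curves are produced, the sketch below power-averages a matrix of simulated magnitude responses into a single sound power curve and forms a directivity index. It is deliberately simplified: the real curves described in references 2-4 apply specific solid-angle weightings to the 70 measurements, whereas this sketch weights all angles equally and uses synthetic data.

```python
# Simplified sketch of deriving a sound power curve and a directivity
# index from multi-angle anechoic measurements. Real curves of this
# kind (references 2-4) use specific solid-angle weightings; uniform
# weights and synthetic data are used here for brevity.
import numpy as np

n_freqs, n_angles = 200, 70
freqs = np.logspace(np.log10(20), np.log10(20e3), n_freqs)

# Synthetic magnitude responses (dB SPL) at 70 measurement angles,
# standing in for a real measurement set: flat on axis, with a gentle
# high-frequency rolloff that grows toward the most off-axis angles.
rng = np.random.default_rng(0)
rolloff = -3 * np.log10(freqs / 20)[:, None]
spl_db = (85 + rolloff * np.linspace(0, 1, n_angles)
          + rng.normal(0, 0.25, (n_freqs, n_angles)))

on_axis = spl_db[:, 0]                             # direct sound

# Sound power: average in the power domain, not in dB.
sound_power_db = 10 * np.log10((10 ** (spl_db / 10)).mean(axis=1))

directivity_index = on_axis - sound_power_db       # DI, in dB
print(f"DI at 1 kHz ~ {directivity_index[np.argmin(abs(freqs - 1e3))]:.1f} dB")
```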

There are clear visual correlations between listeners' loudspeaker preferences and the sets of frequency curves. Both trained and untrained listeners clearly preferred the loudspeakers with the flattest, smoothest and most extended frequency responses, as exhibited in the measurements of loudspeakers P and I. Loudspeaker B was rated lower due to its less extended, bumpy bass and a large hole centered at 3 kHz in its sound power curve. The measurements of loudspeaker M indicate a lack of low bass and a non-smooth frequency response in all of its measured curves. Both the direct and the reflected sounds produced by this loudspeaker will contribute serious colorations to the timbre of reproduced sounds.

It is both satisfying and reassuring to know that both trained and untrained listeners recognize and prefer accurate loudspeakers, and that the accuracy can be characterized with a set of comprehensive anechoic measurements. The next logical step is to use these technical measurements as the basis for modeling and predicting listeners' preference ratings. This will be the topic of a future post in this blog.

References

[1] Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. AES, Vol. 51, Issue 9, pp. 806-825, September 2003. (download for free courtesy of Harman International)
[2] Floyd E. Toole, "Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 1," J. AES, Vol. 34, Issue 4, pp. 227-235, April 1986. (download for free courtesy of Harman International)
[3] Floyd E. Toole, "Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 2," J. AES, Vol. 34, Issue 5, pp. 323-348, May 1986. (download for free courtesy of Harman International)
[4] Allan Devantier, "Characterizing the Amplitude Response of Loudspeaker Systems," presented at the 113th AES Convention, October 2002.

Saturday, December 27, 2008

Part 2 - Differences in Performance of Trained Versus Untrained Listeners

Part 1 of this article summarized the results of a controlled listening experiment conducted by the author in which 300+ listeners, both trained and untrained, rated four different loudspeakers on preference. The results revealed that the trained and untrained listeners had essentially the same loudspeaker preferences (see reference 1). This provides scientific validation for using trained listeners in loudspeaker tests, since their preferences can be safely extrapolated to those of the general population of untrained listeners.

In Part 2, we look at why trained listeners are preferred for listening experiments by examining the differences in performance between the two groups. A common performance metric is the F-statistic, calculated by performing an analysis of variance (ANOVA) on the individual listener's loudspeaker ratings, as sketched below. The F-statistic increases as the listener's discrimination and reliability increase. This facet of listener performance is highly desirable for scientists (and bean counters), since fewer listeners and trials are required to achieve an equivalent level of statistical confidence. Some researchers have reported that one trained listener is the statistical equivalent of 8+ untrained listeners, which translates into considerable cost savings when using trained listeners for audio product testing and research.
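
As a concrete illustration (with invented ratings), the F-statistic for a single listener can be computed with a one-way ANOVA across the loudspeakers. A discriminating, consistent listener yields a much larger F than a noisy, compressed one.

```python
# Sketch of the per-listener metric: one-way ANOVA over one listener's
# repeated ratings of four loudspeakers. All ratings are invented.
from scipy.stats import f_oneway

# A discriminating, consistent listener: clear, repeatable separation.
good = [[7.5, 7.2, 7.8, 7.4],   # speaker 1
        [6.9, 7.1, 6.8, 7.0],   # speaker 2
        [5.0, 5.3, 4.9, 5.1],   # speaker 3
        [3.2, 3.0, 3.5, 3.1]]   # speaker 4

# An unreliable listener: compressed, noisy ratings.
noisy = [[6.5, 5.0, 7.0, 5.5],
         [6.0, 6.8, 5.2, 6.5],
         [5.8, 6.2, 6.9, 5.1],
         [6.1, 5.5, 6.4, 6.6]]

for name, ratings in [("consistent", good), ("unreliable", noisy)]:
    F, p = f_oneway(*ratings)
    print(f"{name}: F = {F:.1f}, p = {p:.4f}")
```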

The above graph plots the mean loudspeaker F-statistics for four groups of untrained listeners categorized according to their occupations. The performance scores of the untrained groups are scaled relative to the mean score of the trained listeners to facilitate comparisons between the two. The trained listeners clearly performed better than any of the untrained groups, by quite a large margin. The relative performance of the untrained groups, from best to worst, was: audio retailers (35%), audio reviewers (20%), the audio marketing-sales group (10%), and college students (4%).

The better performance of the audio retailers relative to the other untrained groups may be related to psychological factors such as motivation, expectations, and relevant critical listening experience. The college students - the poorest-performing group - were also the youngest and least experienced test subjects. They tended to give all four loudspeakers very similar and very high ratings, indicating they were easily satisfied. While this is pure speculation, the students may have developed lower sound quality expectations through hours of listening to low-quality MP3 files reproduced through band-limited earbuds. Most surprising was the relatively poor performance of the audio reviewers who, despite their credentials and years of professional experience, performed one-fifth as well as the trained listeners and 15 full percentage points below the audio retailers. These differences in trained and untrained listener performance underscore the benefits of carefully selecting and training the listeners used for audio product testing and research.

In the next installment of this article, technical measurements of the loudspeakers used in these experiments will be presented. From this, we will explore what aspects of their performance lead to higher preference ratings in controlled listening tests.

Reference 1: Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. AES, Vol. 51, issue 9, pp. 806-825, September 2003. (This paper can be purchased from the Audio Engineering Society here, or downloaded for free courtesy of Harman International.)

Friday, December 26, 2008

Part 1 - Do Untrained Listeners Prefer the Same Loudspeakers as Trained Listeners?


One of the more controversial topics among audio researchers is whether or not trained listeners should be used for audio product testing and research. The argument against using trained listeners is based on a belief that their tastes and preferences in sound quality are fundamentally different from those of the general untrained listener population for whom the product is intended.

There are few published studies to support the notion that trained listeners have different loudspeaker preferences than untrained listeners. To test this question, the author conducted a large experiment (see reference 1) comparing the loudspeaker preferences of 300+ untrained and trained listeners. Over the course of 18 months, an identical controlled, double-blind listening test was repeated with different groups of trained and untrained listeners, who rated four different loudspeakers on an 11-point preference scale using four different music programs. Loudspeaker positional effects were controlled via an automated speaker shuffler that moves each loudspeaker into the exact same position.

The mean loudspeaker preference ratings for the different groups of listeners are summarized in the above graph. In terms of rank order, the loudspeaker preferences of the untrained listeners (highlighted in red) are essentially the same as those of the trained listeners (highlighted in blue); a simple way to quantify this agreement is sketched below. As a group, the trained listeners tended to give lower ratings, suggesting they may be more difficult to please. An important conclusion from this study is that the loudspeaker preferences of trained listeners can be safely extrapolated to the tastes of consumers with little or no formal listener training. The study did find significant differences between the trained and untrained listeners in terms of how well they performed their listening task. This will be discussed in Part 2, which will appear in the next posting of this blog.
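
One simple way to quantify the "same rank order" claim is a Spearman rank correlation over the group mean ratings. The numbers below are invented to mirror the described pattern (identical ordering, with the trained listeners rating lower overall); only the method is offered as an illustration.

```python
# Sketch: rank-order agreement between trained and untrained listeners.
# The mean ratings are invented to mirror the described pattern.
from scipy.stats import spearmanr

trained_means   = [6.2, 5.8, 4.5, 2.9]   # four loudspeakers
untrained_means = [7.0, 6.7, 5.6, 4.1]   # same order, higher overall

rho, _ = spearmanr(trained_means, untrained_means)
print(f"Spearman rho = {rho:.2f}")  # 1.00 -> identical rank order
```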

Reference 1: Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. AES, Vol. 51, Issue 9, pp. 806-825, September 2003.

This paper can be purchased from the Audio Engineering Society here, or downloaded for free courtesy of Harman International.