Saturday, January 3, 2009

Why Consumer Reports' Loudspeaker Accuracy Scores Are Not Accurate

For over 35 years, Consumer Reports magazine recommended loudspeakers to consumers based on what many audio scientists believe to be a flawed loudspeaker test methodology. Each loudspeaker was assigned an accuracy score related to the "flatness" of its sound power response measured in 1/3-octave bands. Consumers Union (CU) - the organization behind Consumer Reports - asserted that sound power best predicts how good a loudspeaker sounds in a typical listening room. Until recently, this assertion had never been formally tested or validated in a published scientific study.

In 2004, the author conducted a study designed to address the following question: "Does the CU loudspeaker model accurately predict listeners' loudspeaker preference ratings?" (see Reference 1). A sample of 13 different loudspeaker models reviewed in the August 2001 edition of Consumer Reports was selected for the study. Over the course of several months, the 13 loudspeakers were subjectively evaluated by a panel of trained listeners in a series of controlled, double-blind listening tests. Comparative judgments were made among different groups of 4 speakers at a time using four different music programs. Loudspeaker positional biases were eliminated via an automated speaker shuffler. To control loudspeaker context effects, a balanced test design was used so that each loudspeaker was compared against each of the other 12 loudspeaker models an equal number of times. This produced a total of 2,912 preference, distortion and spectral balance ratings, in addition to 2,138 comments.
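The balance property described above - every speaker paired with every other speaker equally often across the 4-speaker groups - can be sketched in a few lines. This is an illustration of the property only, not the actual scheduling used in the study; the simplest balanced design just enumerates every possible group of 4 from the 13 models, in which each pair of speakers co-occurs exactly C(11, 2) = 55 times.

```python
from itertools import combinations
from collections import Counter

speakers = list(range(1, 14))  # 13 hypothetical speaker IDs

# Enumerate every possible group of 4 speakers: a trivially balanced
# design, since each pair of speakers lands in the same number of groups.
groups = list(combinations(speakers, 4))

pair_counts = Counter()
for group in groups:
    for pair in combinations(group, 2):
        pair_counts[pair] += 1

print(len(groups))                # 715 groups of 4
print(set(pair_counts.values()))  # {55}: every pairing occurs equally often
```

In practice a listening test uses far fewer sessions than the full enumeration, so a real schedule would subsample these groups while preserving the equal-pairing property.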

The above graph plots the mean listener loudspeaker preference rating and 95% confidence intervals (blue circles), and the corresponding CU predicted accuracy score (red squares) for each of the 13 loudspeakers. The agreement between the listener preference ratings and the CU accuracy scores is very poor: the correlation between the two sets of ratings is actually negative (r = -.22) and not statistically significant (p = 0.46). The most preferred loudspeaker in the test group (loudspeaker 1) actually received the lowest CU accuracy score (76). Conversely, some of the least preferred loudspeakers (e.g. loudspeakers 9 and 10) received the highest CU accuracy scores. In conclusion, the CU accuracy scores do not accurately predict listeners' loudspeaker preference ratings. Since this study was published, CU has begun to reevaluate their loudspeaker testing methods. Hopefully, their new rating system will more accurately predict the perceived sound quality of loudspeakers in a typical listening room.
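The r value reported above is a Pearson correlation coefficient between the mean preference ratings and the CU accuracy scores. As a sketch of how such a coefficient is computed, here is a plain-Python implementation applied to hypothetical numbers (invented for illustration only - these are not the study's data):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical ratings for 13 speakers, invented for illustration:
# preference trends down while the accuracy score trends up, so the
# two disagree and r comes out negative.
preference = [6.1, 5.8, 5.5, 5.2, 4.9, 4.7, 4.5, 4.2, 4.0, 3.8, 3.6, 3.4, 3.1]
cu_score   = [76, 80, 84, 78, 88, 82, 90, 85, 92, 91, 87, 89, 86]

print(round(pearson_r(preference, cu_score), 2))  # negative: the scores disagree
```

A value near zero with a p-value of 0.46, as in the study, means the null hypothesis of no linear relationship cannot be rejected - the CU score carries essentially no information about listener preference.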

In the next installment of this article, I will explain why the CU loudspeaker model failed to accurately predict listeners' loudspeaker preferences, and show some new models that work much better in this regard.

Updated 1/5/2009: Today, I was contacted by Consumer Reports, who informed me that, since 2006, they no longer publish loudspeaker reviews based on the sound power model I tested in 2004. I was told their new model for predicting loudspeaker sound quality uses a combination of sound power and other analytics to better characterize what the listener hears in a room. In this regard, it is similar to the predictive model I developed, which I will discuss in an upcoming blog posting.


[1] Sean E. Olive, "A Multiple Regression Model for Predicting Loudspeaker Preferences using Objective Measurements: Part 1 - Listening Test Results," presented at the 116th AES Convention, May 2004.


  1. Interesting.

    What would be nice is to first show that these trained listeners are good enough to replicate the CU measurements, THEN to establish that is not the same as "preference" as demonstrated above.

    Frankly, as I renew my acquaintance with this literature... I am wondering if the training "taught" the panel to like a certain kind of sound?


  2. Hi Ben,

    I don't quite understand your first statement. The CU accuracy ratings are based on acoustical measurements made in an anechoic chamber; they are not perceptual measurements based on listening tests. So we cannot test whether our listeners can replicate the CU measurements since one is a perceptual measurement and the other is an acoustical measurement.

    We test our trained listeners against untrained listeners from time to time, and find they are not "biased" towards a certain sound. The trained listeners have similar loudspeaker preferences as a group of 300+ untrained listeners as documented in this study.


  3. I just saw your 'show' up around Boston and was searching for the original Consumer Reports ratings from back then. Is this something you'd care to help a brother out with? :)

    signed: -5

  4. Hi Anonymous
    The original source of CR ratings for the speakers in my paper came from Consumer Reports Magazine:

    "Small Boxes, Big Sound," Consumer Reports, pp. 33-37, Aug. 2001.

    You may be able to find it at the CR website.

  5. Correct me if I am wrong. Harman tests a single speaker in mono and on-axis to the listener. How is this a real world representation of a consumer listening experience?