For over 35 years, Consumer Reports magazine recommended loudspeakers to consumers based on what many audio scientists believe to be a flawed loudspeaker test methodology. Each loudspeaker was assigned an accuracy score related to the "flatness" of its sound power response measured in 1/3-octave bands. Consumers Union (CU) - the organization behind Consumer Reports - asserted that the sound power best predicts how good the loudspeaker sounds in a typical listening room. Until recently, this assertion had never been formally tested or validated in a published scientific study.
In 2004, the author conducted a study designed to address following question: "Does the CU loudspeaker model accurately predict listeners' loudspeaker preference ratings?" (see Reference 1). A sample of 13 different loudspeaker models reviewed in the 2001 August edition of Consumer Reports was selected for the study. Over the course of several months, the 13 loudspeakers were subjectively evaluated by a panel of trained listeners in a series of controlled, double-blind listening tests. Comparative judgments were made among different groups of 4 speakers at a time using four different music programs. Loudspeaker positional biases were eliminated via an automated speaker shuffler. To control loudspeaker context effects, a balanced test design was used so that each loudspeaker was compared against the other 12 loudspeaker models, an equal number of times. This produced a total of 2,912 preference, distortion and spectral balance ratings, in addition to 2,138 comments.
The above graph plots the mean listener loudspeaker preference rating and 95% confidence intervals (blue circles), and the corresponding CU predicted accuracy score (red squares) for each of the 13 loudspeakers. The agreement between the listener preference and CU accuracy scores is very poor, indeed; in fact, the correlation between the two sets of ratings is actually negative (r = -.22) and statistically insignificant (p = 0.46). The most preferred loudspeaker in the test group (loudspeaker 1) actually received the lowest CU accuracy score (76). Conversely, some of the least preferred loudspeakers (e.g. loudspeakers 9 and 10) received the highest CU accuracy scores. In conclusion, the CU accuracy scores do not accurately predict listeners' loudspeaker preference ratings. Since this study was published, CU has begun to reevaluate their loudspeaker testing methods. Hopefully, their new rating system will more accurately predict the perceived sound quality of loudspeakers in a typical listening room.
In the next installment of this article, I will explain why the CU loudspeaker model failed to accurately predict listeners' loudspeaker preferences, and show some new models that work much better in this regard.
Updated 1/5/2009: Today, I was contacted by Consumer Reports who informed me that since 2006 they no longer publish loudspeaker reviews based on their sound power model that I tested in 2004. I was told their new model for predicting loudspeaker sound quality uses a combination of sound power and other analytics to better characterize what the listener hears in a room. In this regard, it is similar to the predictive model I developed, which I will discuss in an upcoming blog posting.