Part 1 of this article described a listening test method used at Harman International for evaluating the sound quality of iPod Music Docking Stations. In part 2, I present the results of a recent competitive benchmarking listening test in which a panel of trained listeners evaluated three popular Music Stations of comparable price. Were listeners able to reliably formulate a preference among the different iPod Music Stations using this test method? And what were the underlying sound quality attributes that explain these preferences? Read on to find out.
Throughout this article, I refer to slides in an accompanying PDF presentation; alternatively, you can watch a YouTube video of the presentation.
The Products Tested
A listening test was performed on three iPod Music Stations that retail for approximately the same price, $599: the Harman Kardon MS 100, the Bose SoundDock 10, and the Bowers & Wilkins Zeppelin (see slide 2). All three products provide iPod docking playback and an auxiliary input for external sources such as a CD player. The auxiliary input was used in these tests to reproduce a CD-quality stereo test signal fed from a digital sound source.
Listening Test Method
The Music Stations were evaluated in the Harman International Reference Listening Room (slide 4), described in detail in a previous blog posting. Each Music Station was positioned on a shelf attached to the Harman automated in-wall speaker mover, which allows rapid multiple comparisons among three products designed to be used in, on, or near a wall boundary. The Music Stations were level-matched to within 0.1 dB at the listening position by playing pink noise through each unit and adjusting its acoustic output to produce the same loudness, as measured with the CRC stereo loudness meter.
All tests were performed double-blind, with the identities of the products hidden behind an acoustically transparent but visually opaque screen. The listening panel consisted of 7 trained listeners with normal audiometric hearing. Each listener sat in the same seat, on-axis to the Music Stations and approximately 11 feet away, with the Music Stations positioned at seated ear height (slide 5).
The Music Stations were evaluated using a multiple comparison (A/B/C) protocol in which listeners could switch at will among the three products before entering their final comments and their ratings of overall preference, distortion, and spectral balance. This was repeated using four different stereo music programs, each presented twice (4 programs × 2 observations = 8 trials). In total, each listener provided 216 ratings, in addition to their comments. The test typically lasted between 30 and 40 minutes. The presentation order of the music programs and Music Stations was randomized by the Harman Listening Test software to minimize order-related biases in the results.
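As a rough illustration of the level-matching arithmetic, the sketch below computes the gain change each unit would need to reach a common target level and checks the resulting spread against a 0.1 dB tolerance. It assumes the loudness readings are already available in dB; the CRC stereo loudness meter itself is not modeled, and the unit names and readings are hypothetical, not measurements from this test.

```python
def gain_offsets_db(measured_db, target_db=None):
    """Gain change (dB) each unit needs so that its level equals target_db.

    measured_db maps a unit name to its measured loudness level in dB.
    If no target is given, the quietest unit is used, so every offset
    is zero or negative (no unit is turned up).
    """
    if target_db is None:
        target_db = min(measured_db.values())
    return {name: target_db - level for name, level in measured_db.items()}


def db_to_amplitude(gain_db):
    """Convert a gain in dB to a linear amplitude (voltage) factor."""
    return 10 ** (gain_db / 20)


def is_level_matched(levels_db, tolerance_db=0.1):
    """True if all levels lie within tolerance_db of each other."""
    # A tiny epsilon absorbs floating-point rounding at the boundary.
    return max(levels_db) - min(levels_db) <= tolerance_db + 1e-9


# Hypothetical pink-noise readings at the listening position (dB).
readings = {"unit_1": 79.2, "unit_2": 78.0, "unit_3": 80.5}
offsets = gain_offsets_db(readings)
adjusted = [readings[name] + offsets[name] for name in readings]
print(offsets)
print(is_level_matched(adjusted))
```

Matching down to the quietest unit is one design choice; matching to a fixed target level works the same way by passing `target_db`.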
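The rating count follows from the design: 4 programs × 2 observations gives 8 trials, and in each trial a listener rates 3 products on overall preference, distortion, and 7 spectral balance bands (9 ratings per product), so 8 × 3 × 9 = 216. The sketch below builds one randomized trial schedule and checks that count. It is not the Harman Listening Test software; the program and product labels and the shuffling scheme are assumptions for illustration.

```python
import random

PROGRAMS = ["prog_1", "prog_2", "prog_3", "prog_4"]  # hypothetical labels
PRODUCTS = ["unit_1", "unit_2", "unit_3"]            # hypothetical labels
REPEATS = 2                 # each program is presented twice
SPECTRAL_BANDS = 7          # spectral balance bands rated per product
RATINGS_PER_PRODUCT = 1 + 1 + SPECTRAL_BANDS  # preference + distortion + bands


def make_schedule(seed=None):
    """Return a list of (program, product_order) trials in random order."""
    rng = random.Random(seed)
    trials = [prog for prog in PROGRAMS for _ in range(REPEATS)]
    rng.shuffle(trials)  # randomize the program presentation order
    schedule = []
    for prog in trials:
        order = PRODUCTS[:]
        rng.shuffle(order)  # randomize which product sits behind each A/B/C switch
        schedule.append((prog, order))
    return schedule


def total_ratings(schedule):
    """Number of ratings one listener enters over the whole schedule."""
    return sum(len(order) * RATINGS_PER_PRODUCT for _, order in schedule)


schedule = make_schedule(seed=1)
print(len(schedule), total_ratings(schedule))  # 8 216
```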
Results: Overall Preference Ratings For the Music Stations
A repeated-measures analysis of variance was used to establish the statistical effects and interactions between the independent variables and the different sound quality ratings. The main effect was related to the Music Stations; no significant effect of program material, and no significant interaction between program material and Music Station, were observed. Note that in the following discussion, the brands and models of the Music Stations have been removed from the results, since this information is not relevant to the primary purpose of the research or this article. Instead, the Music Stations have been assigned the letters A, B, and C in descending order of their mean overall preference rating.
The mean preference ratings and upper 95% confidence intervals based on the 7 listeners are plotted in slide 7. Music Station A received a mean preference rating of 6.8 and was strongly preferred over Music Stations B (4.58) and C (4.08).
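For readers unfamiliar with repeated-measures ANOVA, the sketch below computes the F statistic for a single within-subject factor (the product) from a listeners × products table of scores. This is a simplification: the analysis described here also included program material and repeats, and the sketch stops at F rather than computing a p-value (which would need the F distribution, for example `scipy.stats.f.sf`). The example scores are made up.

```python
def rm_anova_oneway(scores):
    """One-way repeated-measures ANOVA.

    scores: one row per subject (listener), each row holding that subject's
    score for every condition (product) in the same column order.
    Returns (F, df_conditions, df_error).
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of conditions
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand_mean) ** 2 for m in cond_means)
    # Between-subject variation is removed from the error term; this is
    # what distinguishes the repeated-measures design from a plain one-way ANOVA.
    ss_subj = k * sum((m - grand_mean) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


# Made-up preference scores: 3 listeners x 3 products.
example = [[6, 4, 4], [7, 5, 4], [8, 4, 5]]
print(rm_anova_oneway(example))
```

For this made-up table, F is approximately 16 with 2 and 4 degrees of freedom.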
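The article does not spell out how the confidence intervals were computed. One common approach, assumed here, is a t-based interval on the listeners' mean ratings: mean + t·s/√n, where s is the sample standard deviation across the 7 listeners and t ≈ 2.447 is the two-sided 95% critical value of Student's t for 6 degrees of freedom. The ratings in the example are invented.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 6 degrees of freedom
# (7 listeners - 1), from standard t tables.
T_CRIT_95_DF6 = 2.447


def mean_with_upper_ci(values, t_crit=T_CRIT_95_DF6):
    """Return (mean, upper 95% confidence limit) for a sample of ratings."""
    mean = statistics.mean(values)
    half_width = t_crit * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean + half_width


# Invented per-listener mean ratings for one product (7 listeners).
ratings = [5.0, 6.0, 7.0, 6.0, 5.0, 7.0, 6.0]
print(mean_with_upper_ci(ratings))
```

The hard-coded t value only suits 7 listeners; a different panel size needs the matching critical value.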
Individual Listener Preference
The individual listener preference ratings and upper 95% confidence intervals are plotted in slide 8. Intra- and inter-listener reliability of the ratings was generally quite high. All seven listeners rated Music Station A higher than the other two products, although some listeners, notably listeners 55 and 64, were less discriminating and less reliable than the others. Both of these listeners had significantly less training and experience than the other listeners, a factor that previous studies have shown to be important for listener performance.
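The article does not define the reliability measure behind these statements. As a hypothetical illustration only, the sketch below summarizes one listener with two simple numbers: the mean absolute difference between their first and second ratings of the same product and program (lower means more consistent), and the spread between their highest- and lowest-rated products (larger means more discriminating). Harman's own listener performance metrics may differ.

```python
import statistics


def listener_summary(ratings):
    """Summarize one listener's repeat consistency and product discrimination.

    ratings maps a product label to a list of (first, second) rating pairs,
    one pair per music program.
    Returns (mean_abs_repeat_difference, product_mean_spread).
    """
    repeat_diffs = []
    product_means = {}
    for product, pairs in ratings.items():
        # How far the repeat rating strayed from the first rating.
        repeat_diffs.extend(abs(first - second) for first, second in pairs)
        # Mean of all ratings this listener gave the product.
        product_means[product] = statistics.mean(r for pair in pairs for r in pair)
    consistency = statistics.mean(repeat_diffs)
    spread = max(product_means.values()) - min(product_means.values())
    return consistency, spread


# Hypothetical ratings for one listener: two programs, two repeats each.
example = {"A": [(7, 7), (8, 6)], "B": [(5, 4), (4, 5)]}
print(listener_summary(example))
```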
Distortion Ratings
Nonlinear distortion includes audible buzzes, rattles, noise, and other level-dependent distortions related to the performance of the electronics and transducers and to the mechanical integrity of the product’s enclosure. In these tests, the average playback level was held constant (78 dB(B), slow), and listeners could not adjust it. Even under these conditions, some listeners perceived audible differences in distortion (slide 9), with Music Station A (distortion rating = 7.19) rated as having less distortion than Music Stations B (5.5) and C (4.94).
Some of these differences in subjective distortion ratings could be related to a “halo effect,” a scaling bias wherein listeners tend to rate the distortion of loudspeakers according to their overall preference, even when the distortion is not audible. An example of halo-effect bias was noted in a previous loudspeaker study by the author. Reliable and accurate quantification of nonlinear distortion in perceptually meaningful terms will remain problematic until better subjective and objective measurements are developed.
Spectral Balance Ratings
Listeners rated the spectral balance of each Music Station in seven equally log-spaced frequency bands using a ±5-point scale. A rating of 0 indicates an ideal spectral balance, positive numbers indicate too much emphasis within the frequency band, and negative numbers indicate too little emphasis within the band. Rating the spectral balance of an audio component is a highly specialized task that requires skill and practice, acquired through Harman’s “How to Listen” listener training software application. A previous study has shown that spectral balance ratings are closely related to the measured anechoic listening window of the loudspeaker, although they may vary with changes in directivity and in the ratio of direct to reflected sound at the listening location.
The mean spectral balance ratings, averaged across all programs and listeners, are plotted in slide 10. Listeners felt Music Station A had the flattest, most ideal spectral balance, apart from a need for more upper and lower bass and less emphasis in the upper treble. Music Station B was judged to have too much emphasis in the upper bass (88 Hz) and too little in the upper midrange/treble. Music Station C was rated as having a slight overemphasis in the upper bass and a very uneven balance throughout the midrange, with a peak centered around 1700 Hz.
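The article does not list the band edges. Assuming, hypothetically, that the seven bands split 20 Hz to 20 kHz into equal widths on a logarithmic frequency axis, each band spans a frequency ratio of 1000^(1/7) ≈ 2.68, and the geometric band centers work out to roughly 33 Hz, 88 Hz, 236 Hz, 632 Hz, 1.7 kHz, 4.6 kHz, and 12.2 kHz. The second and fifth centers are consistent with the 88 Hz and 1700 Hz features cited in the spectral balance results, but the 20 Hz to 20 kHz range remains an assumption.

```python
def log_band_centers(f_low=20.0, f_high=20000.0, n_bands=7):
    """Geometric center frequencies (Hz) of n_bands bands of equal log width.

    The default 20 Hz to 20 kHz range is an assumption, not taken from the test.
    """
    ratio = (f_high / f_low) ** (1.0 / n_bands)  # width of each band as a frequency ratio
    # Band k spans f_low * ratio**k to f_low * ratio**(k + 1); its geometric
    # center therefore sits half a band above the lower edge.
    return [f_low * ratio ** (k + 0.5) for k in range(n_bands)]


print([round(f) for f in log_band_centers()])
```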
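To compare products on how far their spectral balance strays from ideal, one simple summary, assumed here rather than taken from the study, is the root-mean-square deviation of the seven band ratings from 0. The sketch below computes it and ranks products from flattest to least flat; the product labels and band ratings are toy values, not the data plotted in slide 10.

```python
import math


def spectral_deviation(band_ratings):
    """RMS deviation of spectral balance ratings from the ideal rating of 0."""
    return math.sqrt(sum(r * r for r in band_ratings) / len(band_ratings))


def rank_by_flatness(products):
    """Product labels sorted from flattest (smallest deviation) to least flat."""
    return sorted(products, key=lambda name: spectral_deviation(products[name]))


# Toy mean ratings for seven bands, lowest band first (scale -5 to +5).
example = {
    "peaky": [0, 0, 0, 0, 3, 0, 0],
    "flat": [0, 0, 0, 0, 0, 0, 0],
    "tilted": [1, 1, 1, 0, -1, -1, -1],
}
print(rank_by_flatness(example))
```

Squaring the ratings weights a single large peak more heavily than several small deviations, and it treats emphasis and de-emphasis alike; other summaries are equally defensible.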
Listener Comments
Listeners provided comments describing the audible differences among the three Music Stations. The frequency (number of times) with which each descriptor was used for each product is summarized in slide 11. The correlation between each descriptor and the product’s preference rating is given by the correlation coefficient (r) in the bottom row of the table. The same data are plotted in graphical form in slide 12.
The three most common descriptors applied to Music Station A were neutral (16), bright (9), and thin (9). These descriptors are generally consistent with the mean spectral balance ratings summarized in slide 10.
The three most frequent descriptors applied to Music Station B were colored (13), boomy bass (10), and uneven mids (6). The “boomy bass” is clearly reflected in the spectral balance ratings (see the large 88 Hz peak) in slide 10.
The three most frequent descriptors used to describe the sound quality of Music Station C were colored (19), uneven mids (9), and harsh (6). All three descriptors have a high negative correlation with the overall preference rating and may explain the low preference rating this product received. The coloration and unevenness of the midrange are confirmed by the spectral balance ratings in slide 10. The harshness is most likely related to the spectral peak perceived around 1700 Hz.
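Assuming the coefficient in the bottom row is a Pearson r computed across the three products (how often a descriptor was used for each product versus that product's mean preference rating), a minimal sketch follows. With three products, each r rests on only three data points. Only the three mean preference ratings come from this article; the descriptor counts are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Mean preference ratings for Music Stations A, B, C (from this article).
preference = [6.8, 4.58, 4.08]
# Hypothetical counts of one descriptor for A, B, C.
descriptor_counts = [2, 9, 14]
print(pearson_r(preference, descriptor_counts))
```

A descriptor used mostly for the less preferred products, as in this hypothetical case, yields a negative r.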
Conclusions
This article summarized the results of a controlled, double-blind listening test performed on three comparably priced iPod Music Stations using seven trained listeners with normal hearing. The results provide evidence that the sound quality of Music Station A was strongly preferred over that of Music Stations B and C. There was strong consensus among the listeners: all seven rated Music Station A highest overall. The preference ratings can be largely explained by the perceived spectral balance ratings of the products, which are in turn closely related to the listener comments on their sound quality.
The most preferred product, Music Station A, was perceived to have the flattest, most ideal spectral balance and elicited frequent comments about its neutral sound quality. As the spectral balance ratings deviated from flat, the products received frequent comments related to coloration, boomy bass, and uneven midrange. While the distortion ratings were highly correlated with preference, more investigation is needed to determine the extent to which they reflect a possible scaling bias known as the “halo effect.”
In part 3 of this article, I will present objective measurements of these products, both anechoic and in-room acoustical measurements, to see whether they can reliably predict the subjective ratings reported here.
References
Sean E. Olive, “A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part I - Listening Test Results,” presented at the 116th AES Convention, preprint 6113 (May 2004).