Saturday, May 1, 2010

Evaluating the Sound Quality of iPod Music Stations: Part 3 Measurements



In Part 3 of this article, the acoustical measurements of three popular iPod Music Stations (Harman Kardon MS100, Bose SoundDock 10 and Bowers & Wilkins Zeppelin) are examined to see if they corroborate listeners’ sound quality ratings of the products based on controlled double-blind listening tests. Part 2 summarized the results of those listening tests, and Part 1 described the listening test methodology used for this research.
Throughout this article, I will refer to some slides of a presentation that can be downloaded as a PDF or viewed as a YouTube video.
Mono or Stereo Acoustical Measurements?
There is a substantial body of scientific research on the subjective and objective testing of conventional stereo loudspeakers [1]-[5]. Unfortunately, the same is not true for iPod Music Stations, which raises several research questions about how they should be evaluated and measured.
The first important question is whether the acoustical measurements should be done in mono or stereo. Due to the proximity of the left and right channel transducer arrays in Music Stations, there is the potential for constructive and destructive interference when both channels are active, which will vary according to frequency and the relative inter-channel levels and phases of the music signals. To study this phenomenon, the left and right channels were measured and analyzed both individually and combined. Generally, we found very little difference in the frequency responses (magnitude and phase) of the left and right channels. Combining the two channels only led to the expected 6 dB increase in sound pressure level (SPL).
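The 6 dB figure follows from summing two equal, in-phase pressures: the amplitude doubles, and 20·log10(2) is about 6.02 dB. The sketch below is only an illustration of that arithmetic (the function name is my own, not part of the measurement system); it also shows how the gain shrinks as the relative phase between channels grows.

```python
import math


def combined_gain_db(phase_deg: float) -> float:
    """Level change in dB when two equal-amplitude tones are summed with the
    given relative phase, compared with one channel playing alone."""
    phi = math.radians(phase_deg)
    # |1 + e^(j*phi)| = sqrt(2 + 2*cos(phi)); clamp guards against rounding.
    magnitude = math.sqrt(max(0.0, 2.0 + 2.0 * math.cos(phi)))
    if magnitude == 0.0:
        return float("-inf")  # complete cancellation
    return 20.0 * math.log10(magnitude)


# In phase, in quadrature, and at 120 degrees (where the sum equals one channel).
for phase in (0, 90, 120):
    print(phase, round(combined_gain_db(phase), 2))
```

With identical left and right responses and an in-phase (mono) signal, only the first case applies, which is consistent with the measured result.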
Anechoic Measurements of the Music Stations
Each Music Station was measured at a distance of 2 meters in the large anechoic chamber at Harman International. The chamber is anechoic down to 60 Hz and this is extended to 20 Hz through a calibration procedure. Each Music Station was subjected to the same battery of measurements used for designing and testing Revel, Infinity and JBL home loudspeakers. A total of 70 frequency response measurements were taken at 10 degree increments in both horizontal and vertical orbits (slide 4). These measurements were then spatially averaged and weighted to characterize the direct, early and late reflected sounds in a typical listening room, in addition to the calculated directivity indices (slides 5-8).
The family of measurement curves (slide 9) reveals significant differences among the three Music Stations in terms of their smoothness and low frequency extension below 70 Hz.
Music Station A has the smoothest frequency response across the family of curves, which corroborates listeners’ comments about its neutral sound and absence of colorations (see slide 11 of Part 2). There is also physical evidence in the measurements that explains listener comments about Music Station A sounding a bit bright and thin, due to a combination of the upward spectral tilt in its listening window curve, and its higher low frequency cutoff.
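To make the bookkeeping concrete, here is a rough sketch of how 70 measurement points and the spatial averages could be organized. Several things in it are assumptions rather than the actual Harman processing: the two orbits are assumed to share the 0 and 180 degree points (one way to arrive at 70), the listening window is taken as the horizontal points within ±30 degrees, the sound power is an unweighted energy average of all points, and the directivity index is taken as listening window minus sound power. All names and the example data are hypothetical.

```python
import math

STEP_DEG = 10
ORBIT_ANGLES = range(-170, 181, STEP_DEG)  # 36 angles per full orbit

# Assumption: the orbits share the points at 0 and 180 degrees,
# so those are counted once: 36 + 34 = 70 measurements.
horizontal = [("H", a) for a in ORBIT_ANGLES]
vertical = [("V", a) for a in ORBIT_ANGLES if a not in (0, 180)]
ALL_POINTS = horizontal + vertical


def power_average_db(levels_db):
    """Average dB levels on an energy (pressure-squared) basis."""
    mean_power = sum(10 ** (lvl / 10) for lvl in levels_db) / len(levels_db)
    return 10 * math.log10(mean_power)


def listening_window_db(response_db, half_angle=30):
    """response_db maps (orbit, angle) -> level in dB at one frequency."""
    picked = [lvl for (orbit, angle), lvl in response_db.items()
              if orbit == "H" and abs(angle) <= half_angle]
    return power_average_db(picked)


def directivity_index_db(response_db):
    """Simplified DI: listening window minus unweighted sound power."""
    sound_power = power_average_db(list(response_db.values()))
    return listening_window_db(response_db) - sound_power


# Hypothetical data at one frequency: 80 dB inside +/-30 degrees, 74 dB elsewhere.
example = {(o, a): (80.0 if o == "H" and abs(a) <= 30 else 74.0)
           for o, a in ALL_POINTS}
print(len(ALL_POINTS), round(directivity_index_db(example), 2))
```

A source that radiates equally in all directions gives a directivity index of 0 dB with this definition; the index rises as the product becomes more directional.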
Music Station B has even more peaks and dips in the curves that contribute to the higher frequency of listener comments regarding audible coloration. Particularly problematic is the large broad resonance at 500 Hz that is visible in both the direct and reflected sounds produced by the product. However, there is nothing in the measurements to explain listeners’ complaints about its boomy bass.
Music Station C clearly has the least tidy set of measurement curves with a significant hole centered at 2 kHz in the on-axis curve. There are visible resonances in the measurements that elicited frequent listener comments about “midrange unevenness” and “coloration.” Finally, the sound power response and directivity indices reveal that this Music Station becomes increasingly directional at higher frequencies compared to its competitors. This could contribute to coloration and dullness at off-axis listening positions and at further listening distances.
Relationship between Anechoic Measurements and Listener Preference
The anechoic measurements of the Music Stations are shown again in Slide 10 along with the listener preference ratings. From this, we see that the overall smoothness of the family of curves appeared to be an important underlying factor that influenced listeners’ Music Station preference ratings.
Correlations Between Anechoic Measurements and Perceived Spectral Balance: The Direct Sound Influences the Perceived Spectral Balance Above 300 Hz
There has been a 30+ year debate in the audio community regarding which set of acoustical measurements best predicts a loudspeaker’s perceived sound quality in a typical listening room. There are several different camps that include the direct sound response advocates, the sound power response advocates, the in-room measurement advocates, and others, like myself, who argue that you need a combination of all of the above measurements to accurately predict how the loudspeakers will sound in a room.
One way to tackle this debate is to study the correlation between different loudspeaker measurements and listeners’ perceived spectral balance of the loudspeakers in a room. Slide 11 shows the perceived spectral balance ratings of the Music Stations versus the family of anechoic curves that include the listening window (direct sound), first reflections and sound power response.
For Music Station A, there is good agreement between the perceived spectral balance and the listening window curve, which represents the direct sound over a ± 30 degree horizontal angle. For Music Station B, there is generally poor agreement: listeners complained about boomy bass, yet there is nothing in these measurements to suggest why. There is clearly some information missing in the anechoic measurements and/or perhaps the subjective ratings are faulty. We will come back to this topic later.
For Music Station C, there is good agreement between the perceived spectral balance and the listening window curve (direct sound), with indications that the resonances centered at 1.5 and 3.5 kHz were heard and registered by the listeners.
In summary, it seems that for at least two of the Music Stations, the perceived spectral balance can be approximated by looking at the listening window curves that represent the direct sound. However, the anechoic measurements are missing information needed to explain perceptual effects below 300 Hz.
In-Room Measurements of the Music Stations
Below about 300 Hz, the room acoustics and the Music Station/listener positions can have a significant influence on the perceived quality of reproduced sound. Yet, these physical effects are not captured in the anechoic measurements described in the previous section. To further examine these effects, steady-state frequency response measurements of the Music Stations were taken at the primary listening seat at 6 different microphone positions, and then spatially averaged to remove highly localized acoustical interference effects (slide 12). The 1/6-octave smoothed curves for each Music Station are shown in slide 13. Below 200 Hz, there is evidence of room resonances (high Q peaks and dips) and boundary effects that were absent in the previous anechoic measurements (slide 9). Music Station A had less apparent boundary gain than the other two products, probably because the boundary effect was accounted for in its design.
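The two processing steps described above, spatial averaging and fractional-octave smoothing, can be sketched roughly as follows. This is a minimal illustration, not the actual measurement software: it assumes an energy (pressure-squared) average across microphone positions and a simple rectangular smoothing window, and the function names are hypothetical.

```python
import math


def spatial_average_db(curves_db):
    """Energy-average several magnitude curves (one list of dB values per
    microphone position, all sampled at the same frequencies)."""
    n_positions = len(curves_db)
    averaged = []
    for levels in zip(*curves_db):
        mean_power = sum(10 ** (lvl / 10) for lvl in levels) / n_positions
        averaged.append(10 * math.log10(mean_power))
    return averaged


def octave_smooth_db(freqs_hz, levels_db, fraction=6):
    """Smooth a curve with a rectangular window 1/fraction octave wide,
    centered (on a log-frequency scale) on each frequency point."""
    half_width = 2 ** (1 / (2 * fraction))
    smoothed = []
    for fc in freqs_hz:
        lo, hi = fc / half_width, fc * half_width
        in_band = [10 ** (lvl / 10)
                   for f, lvl in zip(freqs_hz, levels_db) if lo <= f <= hi]
        smoothed.append(10 * math.log10(sum(in_band) / len(in_band)))
    return smoothed
```

Calling `octave_smooth_db(freqs, levels, fraction=1)` gives the heavier 1-octave smoothing, which trades frequency resolution for a curve that is easier to compare with coarse subjective ratings.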

Correlation Between In-Room Measurements and Perceived Spectral Balance: The Influence of Room and Boundary Effects Below 300 Hz
The in-room measurements are plotted in slide 13 along with listeners’ perceived spectral balance ratings. Here, the in-room measurements have been super-smoothed (1-octave) to better correspond to the frequency resolution of the subjective ratings.
Below 300 Hz, there is better agreement between the in-room measurements and listeners’ spectral ratings than observed using the anechoic measurements (slide 11). However, above 300 Hz, there is generally better agreement between the anechoic measurement and spectral ratings, particularly using the listening window curve that represents the direct sound. This confirms the important role that the direct sound plays in our perception of reproduced sound. Below 300 Hz, the room’s standing waves and boundary effects play a dominant role in the quality and quantity of bass we hear. Previous studies [5] have shown bass quality accounts for 30% of listener preference, and cannot be ignored.
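One way to put a number on "agreement" is to compute a correlation coefficient between a measured curve and the spectral balance ratings separately below and above 300 Hz. The comparisons in this article were made by inspecting the slides, so the sketch below is only a hypothetical illustration: the function names are my own and the data are made-up placeholders, not results from these tests.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (undefined if either input is flat)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def band_correlation(freqs_hz, measured_db, rated_db, f_lo, f_hi):
    """Correlate measurement and rating using only bands in [f_lo, f_hi)."""
    pairs = [(m, r) for f, m, r in zip(freqs_hz, measured_db, rated_db)
             if f_lo <= f < f_hi]
    xs, ys = zip(*pairs)
    return pearson_r(xs, ys)


# PLACEHOLDER DATA ONLY (octave bands, relative dB); not from the article.
freqs = [63, 125, 250, 500, 1000, 2000, 4000, 8000]
measured = [2.0, 1.0, 0.5, 0.0, -1.0, 0.5, -0.5, -2.0]
rated = [-1.0, 0.0, 0.5, 0.2, -0.8, 0.7, -0.3, -1.8]

print(band_correlation(freqs, measured, rated, 20, 300))
print(band_correlation(freqs, measured, rated, 300, 20000))
```

With only a handful of bands per range, such a coefficient is a rough indicator at best and should be read alongside the curves themselves.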
Dynamic Compression Measurements
Our scientific understanding of the perception and measurement of nonlinear distortions in loudspeakers is still quite poor. There are currently no standard loudspeaker measurements that adequately capture the perceptual significance of dynamic compression and the associated distortions it produces. This is an area of audio that is in need of more research.
Listeners reported that Music Station A had fewer audible nonlinear distortions than the other two Music Stations. However, it was not clear if the distortions were real or due to a cognitive bias known as the “Halo effect.” Examining the objective distortion measurements will hopefully clarify what is real and not real.
The dynamic linearity of the Music Stations was tested by measuring their anechoic frequency response at different playback levels from 76 to 100 dB SPL (@ 1 meter distance) in 6 dB increments. A relatively short 4 s log sweep was used as a test signal to minimize the thermal effects on the transducers. Consequently, the measured dynamic compressions shown below were largely related to the behavior of the electronic limiters in the Music Stations, which are designed to prevent amplifier clipping that could otherwise damage the transducers.
Slide 16 shows the dynamic compression for each Music Station. The frequency responses measured at 82, 88, 94 and 100 dB SPL have been normalized to the 76 dB measurement. Any dynamic compression effects would be exhibited as a deviation from 0 dB. In examining these graphs, Music Station A produced 6 dB more output (100 dB @ 1 meter) than the other Music Stations without significant compression effects.
On the surface, the relationship between these measurements and listeners’ distortion ratings seems to be straightforward: the Music Stations with the higher amounts of compression received lower distortion ratings (slide 17). However, the SPLs at which the compression effects occurred (> 94 dB) were higher than those used in the listening test.
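For readers unfamiliar with log sweeps, here is a minimal sketch of how a 4 s exponential (logarithmic) sine sweep can be generated. The 20 Hz to 20 kHz range and the 48 kHz sample rate are assumptions for illustration; the article does not state the sweep limits or sample rate used.

```python
import math


def instantaneous_frequency(f_start, f_stop, duration_s, t):
    """Frequency of the sweep at time t: rises exponentially with time."""
    return f_start * (f_stop / f_start) ** (t / duration_s)


def log_sweep(f_start, f_stop, duration_s, sample_rate):
    """Return the samples of an exponential sine sweep as a list of floats."""
    log_ratio = math.log(f_stop / f_start)
    scale = 2 * math.pi * f_start * duration_s / log_ratio
    n_samples = int(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        # Phase is the time integral of 2*pi*instantaneous_frequency.
        phase = scale * (math.exp(t / duration_s * log_ratio) - 1.0)
        samples.append(math.sin(phase))
    return samples


# Assumed parameters: 20 Hz to 20 kHz, 4 s, 48 kHz sample rate.
sweep = log_sweep(20.0, 20000.0, 4.0, 48000)
print(len(sweep), instantaneous_frequency(20.0, 20000.0, 4.0, 2.0))
```

Because the sweep spends equal time in each octave, a short sweep still delivers useful energy at low frequencies while keeping the total heating of the voice coils small.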
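The normalization can be sketched as follows: subtract the 76 dB reference curve and the nominal level increase, so that a perfectly linear system reads 0 dB at every frequency. This is my reading of the procedure expressed as a small illustration; the function name and the three-point example curves are hypothetical, not measured data.

```python
REFERENCE_LEVEL_DB = 76
TEST_LEVELS_DB = list(range(76, 101, 6))  # 76, 82, 88, 94, 100 dB SPL


def compression_db(response_db, reference_db, drive_level_db,
                   reference_level_db=REFERENCE_LEVEL_DB):
    """Deviation from linear behavior at each frequency point.

    response_db:  measured curve at the higher playback level
    reference_db: measured curve at the reference (76 dB) level
    Returns 0 dB where the output rose by exactly the nominal step and
    negative values where the output was compressed.
    """
    expected_gain = drive_level_db - reference_level_db
    return [(resp - ref) - expected_gain
            for resp, ref in zip(response_db, reference_db)]


# Hypothetical three-point curves (bass, mid, treble), not measured data.
reference = [76.0, 76.0, 76.0]
at_100_db = [97.0, 100.0, 99.0]
print(TEST_LEVELS_DB, compression_db(at_100_db, reference, 100))
```

In this made-up example the bass is compressed by 3 dB and the treble by 1 dB at the highest level, the kind of level-dependent change in frequency response a limiter can produce.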

Harmonic Distortion Measurements
Harmonic distortion (second and third harmonic only) measurements were made in the anechoic chamber at an SPL of 95 dB. The distortion levels of the harmonics are plotted along with the fundamental for each of the Music Stations in slide 18. Note that the levels of the harmonics have been raised 20 dB for the sake of clarity.
All of the Music Stations exhibited relatively high distortion at low frequencies below 100 Hz, with generally less harmonic distortion at higher frequencies. Music Station B differentiated itself by having higher levels of second and third harmonic distortion between 100 Hz and 1 kHz. Music Station C had the lowest distortion even though it received the lowest preference and distortion ratings from the listeners.
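When reading such plots, it helps to convert the level difference between a harmonic and the fundamental into a percentage, remembering to undo the 20 dB plotting offset first. The helper functions below are a hypothetical illustration of that arithmetic and are not taken from the measurement software; the example levels are made up.

```python
import math

PLOT_OFFSET_DB = 20  # harmonics are drawn 20 dB higher than measured


def true_harmonic_level_db(plotted_db, offset_db=PLOT_OFFSET_DB):
    """Undo the plotting offset applied to the harmonic curves."""
    return plotted_db - offset_db


def distortion_percent(fundamental_db, harmonic_db):
    """Harmonic amplitude as a percentage of the fundamental."""
    return 100 * 10 ** ((harmonic_db - fundamental_db) / 20)


def combined_distortion_percent(fundamental_db, harmonics_db):
    """Root-sum-square of several harmonics, relative to the fundamental."""
    ratios = [10 ** ((h - fundamental_db) / 20) for h in harmonics_db]
    return 100 * math.sqrt(sum(r * r for r in ratios))


# Made-up example: a harmonic plotted at 75 dB against a 95 dB fundamental
# is really at 55 dB, i.e. 40 dB below the fundamental.
harmonic = true_harmonic_level_db(75.0)
print(distortion_percent(95.0, harmonic))
```

A harmonic 40 dB below the fundamental corresponds to about 1% distortion, and one 20 dB below corresponds to about 10%.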
In conclusion, the harmonic distortion measurements of the Music Stations are not particularly good at predicting listeners’ distortion ratings, or overall preference in sound quality. This confirms many previous loudspeaker studies that have reported that harmonic distortion measurements are poor predictors of listeners’ overall impression of the loudspeaker. This can be explained by the fact that the distortions are often below the threshold of audibility, and the measurements themselves do not account for the masking properties of human hearing.

Conclusions
This article has shown evidence that a combination of comprehensive anechoic and in-room measurements can help explain listeners’ preferences and spectral balance ratings of the Music Stations evaluated in controlled listening tests.
Above 300 Hz, the anechoic-derived listening window curve correlated well with listeners’ spectral balance ratings, whereas the in-room measurements better explained the Music Stations’ acoustical interactions with the room below 300 Hz. In these particular tests, the overall smoothness of the on- and off-axis frequency response curves provided the best overall indicator of listeners’ preferences and their comments.
Dynamic compression measurements revealed significant differences among the Music Stations in terms of their linear SPL output capability. The most preferred Music Station could play 6 dB louder (100 dB SPL @ 1 meter) than the other units without exhibiting significant dynamic compression. It is unlikely that this was a factor in the listening tests since the units were evaluated at a comfortable average level of 78 dB (B-weighted, slow). Finally, distortion measurements revealed some differences among the products but had no clear correlation with listeners’ sound quality ratings. This highlights the need for further research into the perception and measurement of nonlinear distortion in loudspeakers so that loudspeaker engineers can optimize their designs using psychoacoustic criteria.
References
[1] Floyd E. Toole, "Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 1," J. AES, Vol. 34, Issue 4, pp. 227-235, April 1986. (download for free courtesy of Harman International).
[2] Floyd E. Toole, "Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 2," J. AES, Vol. 34, Issue 5, pp. 323-348, May 1986. (download for free courtesy of Harman International).
[3] W. Klippel, "Multidimensional Relationship between Subjective Listening Impression and Objective Loudspeaker Parameters", Acustica 70, Heft 1, S. 45 - 54, (1990).
[4] Sean E. Olive, “A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part I - Listening Test Results,” presented at the 116th AES Convention, preprint 6113 (May 2004).
[5] Sean E. Olive, “A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part 2 - Development of the Model,” presented at the 117th AES Convention, preprint 6190 (October 2004).

20 comments:

  1. As far as I can tell, that Harman product is only available in Britain.

  2. Hi Sean,

    were you able to predict listeners' perceived balance based on measurements? It seems to me that Station B more or less blows apart all those "prediction" theories. It's like a horoscope - you can always tell that it WAS correct, but when it comes to predicting the future it's always wrong ;)

  3. Anonymous,

    I believe the Harman Kardon MS100 is currently only available in Europe and possibly parts of Asia. It may be available in the USA soon.

  4. Hi Vuki,

    As I mentioned in the article, Music Station B had the worst agreement between the measurements and perceived spectral balance. There was better agreement for the other two products. I'm still reviewing the subjective ratings and other objective measurements to explain the discrepancy. Music Station B had high distortion and the limiter seriously rolled off the HF, although at levels higher than what we listened at.

    If you look at a few of my previous papers where spectral balance ratings were compared to objective measurements there is a correlation between the two, as you would expect. This prediction will get better with more research.

  5. I bet that the Station B manufacturer aimed at the tonal balance as it was perceived by listeners in the test. I also bet that the average consumer in the shop would rate Music Station A as "shrill" and Station B as the one with the "mighty" sound and would choose it without hesitation.
    IMO somewhat mysterious is also midrange peak perceived with station C.
    Anyway, very nice article. Thank you very much for sharing. Your texts are a real treat for anyone interested in audio.

  6. Hi Vuki,

    Yes, I've heard audio marketing people before make the unsubstantiated claim that neutral accurate sound is boring and doesn't sell. Some say that bright and in your face is what grabs people's taste -- not boomy bass. The problem is that there are many flavors of bad taste and bad sound.

    It would be an interesting research project to test this. Perhaps, like TVs, there should be an in-store "BUY ME" calibration and an "OK I already bought it, now make it accurate" calibration for home.

    I was quite put off by the boomy sound of Station B like most of the listeners. The bass is almost "one-note-like" and quite separate from the rest of the spectrum.

    The large mid peak in the subjective spectral balance for Music Station C I think is from the one or two resonances in the midrange visible in the anechoic measurements. The fact that listeners slightly exaggerated the scaling of these peaks suggests that the resonances are perceptually more obnoxious than they appear in the physical measurements.

  7. Additional variable for research - spectral decay!

  8. I was interested to see that huge suckout in C did not have a bigger impact on user preferences.

    Maybe because it was a notch and not a peak?

    Your point about the room being important at lower frequencies is very interesting. Do you know if there are common room modes that create suckouts? Maybe it just varies room to room.

  9. Anonymous:

    The large suck-out at 2 kHz was limited to the on-axis measurement that has no spatial averaging. That tells you that something is out-of-phase at that position but goes away at different microphone locations. Given that the listening distance was about 9-10 feet away from the Music Stations, this suck-out was likely not a factor. However, sitting close to the product at a desktop would likely make it more noticeable, with large spectral changes as you move your head left to right.

  10. Anonymous said:

    "Your point about the room being important at lower frequencies is very interesting. Do you know if there are common room modes that create suckouts? Maybe it just varies room to room."

    Yes, all rooms have natural modes whose frequency, Q and gain depend on several factors: the dimensions of the room, the absorption of the walls, and the placement of the loudspeakers/listeners. Fortunately, you can get decent bass in a room at one or more seats through judicious placement of the loudspeakers, listeners and applying room correction (not all products are equal).

    We have some patented technology known as Sound Field Management that uses multiple subwoofers, gain and delay to greatly reduce the seat-to-seat variance in bass performance in a room. For more information look at the papers by Toole and Welti/Devantier here: http://www.harman.com/EN-US/OurCompany/Technologyleadership/Pages/ScientificPublications.aspx?CategoryID=Scientific%20Publications

  11. Hi,

    Have you tried AQuA Wideband to evaluate audio quality? What were the results?

  12. No, we have not tried AQuA Wideband for evaluating loudspeakers. This is a tool for evaluating the perceptual quality of audio codecs much like PEAQ or PESQ and wasn't designed for evaluating loudspeakers in rooms.

    Cheers
    Sean

  13. I assume the Harman product is "A"?

    (Thanks Balaton Mkt for pointing out this blog)

  14. Hi Julie,

    Yes, that is a correct assumption: A = HK MS100

  15. And which one is C ?

  16. "And which one is C?"
    Think Hindenburg.

  17. Hi, I don't know exactly what "dynamic compression" of loudspeakers is.

    What is dynamic compression?

  18. I have a question..

    So should the acoustical measurements be done in stereo, or is it okay to measure in mono because there is little difference between the left and right channels?

  19. Anonymous,
    If there is little acoustical/electrical interaction between left and right channels and the left/right channels are symmetrical you can measure them separately.

  20. Clara,
    "Dynamic compression" can be caused by electronic limiters in powered speakers that limit the amplifier output to avoid clipping the amps and causing distortion. These limiters can also minimize distortion and clipping of the speakers due to over-excursion of the drivers.

    Another form of "dynamic compression" occurs in the speakers themselves, from nonlinearities and limits in the motor and in the mechanical suspension, which consists of the spider and surround. Another factor is "power compression", which changes the frequency response of the speaker when the voice coil heats up to a temperature where the DC resistance increases.
