Showing posts with label Sean Olive. Show all posts

Monday, June 28, 2010

Science in the Service of Art

Last week, I was given my own front-page forum over at WhatsbestForum called "Science in the Service of Art", where I can write about any topic I wish. My first posting is called "Audio Science in the Service of Art".

I will probably post the same articles I write over there, on this blog as well. But for now, I recommend you go over there, read my article, and then leave your comments about what we need to do in order to improve the quality and consistency of recorded and reproduced music.

Harman is committed to a scientific approach towards the design and testing of audio products in the consumer, professional, and automotive audio spaces. Last week, Harman Kardon began a PR campaign called the "Science of Sound" where "Science in the Service of Art" is a major theme. You can read about this on the Harman Kardon web sites (click on the "about" link at the top of the page). Enjoy!

Saturday, May 1, 2010

Evaluating the Sound Quality of Ipod Music Stations: Part 3 Measurements



In Part 3 of this article, the acoustical measurements of three popular Ipod Music Stations (Harman Kardon MS100, Bose SoundDock 10 and Bowers & Wilkins Zeppelin) are examined to see if they corroborate listeners’ sound quality ratings of the products based on controlled double-blind listening tests. Part 2 summarized the results of those listening tests, and Part 1 described the listening test methodology used for this research.
Throughout this article, I will refer to some slides of a presentation that can be downloaded as a PDF or viewed as a YouTube video.
Mono or Stereo Acoustical Measurements?
There is a substantial body of scientific research on the subjective and objective testing of conventional stereo loudspeakers [1]-[5]. Unfortunately, the same is not true for Ipod Music Stations: this raises several research questions about how they should be evaluated and measured.
The first important question is whether the acoustical measurements should be done in mono or stereo. Due to the proximity of the left and right channel transducer arrays in Music Stations, there is the potential for constructive and destructive interference when both channels are active, which will vary according to frequency and the relative inter-channel levels and phases of the music signals. To study this phenomenon, the left and right channels were measured and analyzed as both single and combined channels. Generally, we found very little difference in the frequency responses (magnitude and phase) of the left and right channels. Combining the two channels only led to the expected 6 dB increase in sound pressure level (SPL).
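As a quick check on the "expected 6 dB" figure, the sketch below sums two sources of equal level. It assumes the two channels reproduce identical, in-phase signals (so their pressures add); the uncorrelated case is included only for contrast. The function name and the 80 dB example level are illustrative and do not come from the measurements.

```python
import math

def combined_level_db(levels_db, coherent=True):
    """Level (dB) of several sources summed at one point.

    coherent=True  -> identical in-phase signals: pressures add.
    coherent=False -> uncorrelated signals: powers add.
    """
    if coherent:
        pressure = sum(10 ** (level / 20) for level in levels_db)
        return 20 * math.log10(pressure)
    power = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(power)

# Hypothetical example: each channel alone produces 80 dB SPL.
left = right = 80.0
gain_coherent = combined_level_db([left, right]) - left            # about 6.02 dB
gain_incoherent = combined_level_db([left, right], False) - left   # about 3.01 dB
```

A 6 dB increase therefore suggests the two channels summed essentially in phase at the measurement position.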
Anechoic Measurements of the Music Stations
Each Music Station was measured at a distance of 2 meters in the large anechoic chamber at Harman International. The chamber is anechoic down to 60 Hz and this is extended to 20 Hz through a calibration procedure. Each Music Station was subjected to the same battery of measurements used for designing and testing Revel, Infinity and JBL home loudspeakers. A total of 70 frequency response measurements were taken at 10 degree increments in both horizontal and vertical orbits (slide 4). These measurements were then spatially averaged and weighted to characterize the direct, early and late reflected sounds in a typical listening room, in addition to the calculated directivity indices (slides 5-8).
The family of measurement curves (slide 9) reveals significant differences among the three Music Stations in terms of their smoothness and low frequency extension below 70 Hz.
Music Station A has the smoothest frequency response across the family of curves, which corroborates listeners’ comments about its neutral sound and absence of colorations (see slide 11 of Part 2). There is also physical evidence in the measurements that explains listener comments about Music Station A sounding a bit bright and thin: a combination of the upward spectral tilt in its listening window curve and its higher low frequency cutoff.
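One way to see where a figure of 70 measurements can come from is to enumerate the two orbits. The sketch below assumes that the horizontal and vertical orbits share the on-axis (0 degree) and directly-behind (180 degree) directions, so those two points are counted once; that bookkeeping is my assumption, not something stated above.

```python
def orbit_directions(step_deg=10):
    """List the unique measurement directions on two full orbits."""
    horizontal = {("H", angle) for angle in range(0, 360, step_deg)}
    vertical = {("V", angle) for angle in range(0, 360, step_deg)}
    # Assumption: 0 and 180 degrees are the same physical direction in
    # both orbits, so they are dropped from the vertical set.
    shared = {("V", 0), ("V", 180)}
    return sorted(horizontal | (vertical - shared))

directions = orbit_directions()
print(len(directions))  # 36 + 34 = 70
```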
Music Station B has even more peaks and dips in the curves that contribute to the higher frequency of listener comments regarding audible coloration. Particularly problematic is the large broad resonance at 500 Hz that is visible in both the direct and reflected sounds produced by the product. However, there is nothing in the measurements to explain listeners’ complaints about its boomy bass.
Music Station C clearly has the least tidy set of measurement curves with a significant hole centered at 2 kHz in the on-axis curve. There are visible resonances in the measurements that elicited frequent listener comments about “midrange unevenness” and “coloration.” Finally, the sound power response and directivity indices reveal that this Music Station becomes increasingly directional at higher frequencies compared to its competitors. This could contribute to coloration and dullness at off-axis listening positions and at further listening distances.
Relationship between Anechoic Measurements and Listener Preference
The anechoic measurements of the Music Stations are shown again in Slide 10 along with the listener preference ratings. From this, we see that the overall smoothness of the family of curves appears to be an important underlying factor that influenced listeners’ Music Station preference ratings.
Correlations Between Anechoic Measurements and Perceived Spectral Balance: The Direct Sound Influences the Perceived Spectral Balance Above 300 Hz
There has been a 30+ year debate in the audio community regarding which set of acoustical measurements best predicts a loudspeaker’s perceived sound quality in a typical listening room. There are several different camps, including the direct sound response advocates, the sound power response advocates, the in-room measurement advocates, and others, like myself, who argue that you need a combination of all of the above measurements to accurately predict how a loudspeaker will sound in a room.
One way to tackle this debate is to study the correlation between different loudspeaker measurements and listeners’ perceived spectral balance of the loudspeakers in a room. Slide 11 shows the perceived spectral balance ratings of the Music Stations versus the family of anechoic curves that include the listening window (direct sound), first reflections and sound power response.
For Music Station A, there is good agreement between the perceived spectral balance and the listening window curve, which represents the direct sound over a ± 30 degree horizontal angle. For Music Station B, there is generally poor agreement: listeners complained about boomy bass, yet there is nothing in these measurements to suggest why. There is clearly some information missing in the anechoic measurements and/or perhaps the subjective ratings are faulty. We will come back to this topic later.
For Music Station C, there is good agreement between the perceived spectral balance and the listening window curve (direct sound), with indications that the resonances centered at 1.5 and 3.5 kHz were heard and registered by the listeners.
In summary, it seems that for at least two of the Music Stations, the perceived spectral balance can be approximated by looking at the listening window curves that represent the direct sound. However, the anechoic measurements lack the information needed to explain perceptual effects below 300 Hz.
In-Room Measurements of the Music Stations
Below about 300 Hz, the room acoustics and the Music Station/listener positions can have a significant influence on the perceived quality of reproduced sound. Yet, these physical effects are not captured in the anechoic measurements described in the previous section. To further examine these effects, steady-state frequency response measurements of the Music Stations were taken at the primary listening seat at 6 different microphone positions, and then spatially averaged to remove highly localized acoustical interference effects (slide 12). The 1/6-octave smoothed curves for each Music Station are shown in slide 13. Below 200 Hz, there is evidence of room resonances (high Q peaks and dips) and boundary effects that were absent in the previous anechoic measurements (slide 9). Music Station A had less apparent boundary gain than the other two products, probably because the boundary effect was accounted for in its design.
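The two processing steps described above, spatial averaging across microphone positions and fractional-octave smoothing, can be sketched as follows. This is a minimal illustration under stated assumptions: the curves are averaged in the power domain, the smoothing window is rectangular, and the six "microphone position" curves are made-up data with one narrow peak each. The actual averaging and smoothing used for the slides may differ.

```python
import math

def spatial_average_db(curves_db):
    """Power-average several magnitude responses (dB), bin by bin."""
    n = len(curves_db)
    return [
        10 * math.log10(sum(10 ** (curve[i] / 10) for curve in curves_db) / n)
        for i in range(len(curves_db[0]))
    ]

def octave_smooth_db(freqs_hz, mags_db, fraction=6):
    """Smooth with a rectangular window 1/fraction octave wide (power domain)."""
    half_width = 2 ** (1 / (2 * fraction))
    smoothed = []
    for fc in freqs_hz:
        band = [
            10 ** (m / 10)
            for f, m in zip(freqs_hz, mags_db)
            if fc / half_width <= f <= fc * half_width
        ]
        smoothed.append(10 * math.log10(sum(band) / len(band)))
    return smoothed

# Hypothetical data: 1/24-octave grid from 20 Hz to 320 Hz, six positions,
# each flat at 70 dB with a 10 dB narrow peak at a different frequency.
freqs = [20 * 2 ** (k / 24) for k in range(97)]
positions = []
for p in range(6):
    curve = [70.0] * len(freqs)
    curve[10 + 8 * p] = 80.0
    positions.append(curve)

averaged = spatial_average_db(positions)
smoothed = octave_smooth_db(freqs, averaged, fraction=6)
```

In this toy data, a peak that appears at only one position shrinks from 10 dB to about 4 dB after averaging, which is the sense in which spatial averaging removes highly localized interference effects.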

Correlation Between In-Room Measurements and Perceived Spectral Balance: The Influence of Room and Boundary Effects Below 300 Hz
The in-room measurements are plotted in slide 13 along with listeners’ perceived spectral balance ratings. Here, the in-room measurements have been super-smoothed (1-octave) to better correspond to the frequency resolution of the subjective ratings.
Below 300 Hz, there is better agreement between the in-room measurements and listeners’ spectral ratings than observed using the anechoic measurements (slide 11). However, above 300 Hz, there is generally better agreement between the anechoic measurement and spectral ratings, particularly using the listening window curve that represents the direct sound. This confirms the important role that the direct sound plays in our perception of reproduced sound. Below 300 Hz, the room’s standing waves and boundary effects play a dominant role in the quality and quantity of bass we hear. Previous studies [5] have shown bass quality accounts for 30% of listener preference, and cannot be ignored.
Dynamic Compression Measurements
Our scientific understanding of the perception and measurement of nonlinear distortions in loudspeakers is still quite poor. There are currently no standard loudspeaker measurements that adequately capture the perceptual significance of dynamic compression and the associated distortions it produces. This is an area of audio that is in need of more research.
Listeners reported that Music Station A had fewer audible nonlinear distortions than the other two Music Stations. However, it was not clear if the distortions were real or due to a cognitive bias known as the “Halo effect.” Examining the objective distortion measurements will hopefully clarify what is real and not real.
The dynamic linearity of the Music Stations was tested by measuring their anechoic frequency response at playback levels from 76 to 100 dB SPL (@ 1 meter distance) in 6 dB increments. A relatively short 4 s log sweep was used as a test signal to minimize the thermal effects on the transducers. Consequently, the measured dynamic compression shown below was largely related to the behavior of the electronic limiters in the Music Stations, which are designed to prevent amplifier clipping that could otherwise damage the transducers.
Slide 16 shows the dynamic compression for each Music Station. The frequency responses measured at 82, 88, 94 and 100 dB SPL have been normalized to the 76 dB measurement. Any dynamic compression effects would be exhibited as a deviation from 0 dB. In examining these graphs, Music Station A produced 6 dB more output (100 dB @ 1 meter) than the other Music Stations without significant compression effects.
On the surface, the relationship between these measurements and listeners’ distortion ratings seems to be straightforward: the Music Stations with the higher amounts of compression received lower distortion ratings (slide 17). However, the SPLs at which the compression effects occurred (> 94 dB) were higher than those used in the listening test.
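The normalization described above can be sketched as follows. I am assuming that "normalized to the 76 dB measurement" means subtracting both the 76 dB curve and the intended level increase, so that a perfectly linear device plots at 0 dB; the three-point curves are invented for illustration.

```python
def compression_db(ref_curve_db, test_curve_db, ref_level_db, test_level_db):
    """Deviation from linear scaling at each frequency (0 dB = no compression)."""
    expected_gain = test_level_db - ref_level_db
    return [t - r - expected_gain for r, t in zip(ref_curve_db, test_curve_db)]

# Drive levels used in the test: 76 to 100 dB SPL in 6 dB steps.
drive_levels = list(range(76, 101, 6))  # [76, 82, 88, 94, 100]

# Hypothetical three-frequency example: at the 100 dB setting the lowest
# frequency only reaches 97 dB, i.e. 3 dB of compression in the bass.
reference = [76.0, 76.0, 76.0]
loud = [97.0, 100.0, 100.0]
result = compression_db(reference, loud, 76, 100)  # [-3.0, 0.0, 0.0]
```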

Harmonic Distortion Measurements
Harmonic distortion (second and third harmonic only) measurements were made in the anechoic chamber at an SPL of 95 dB. The distortion levels of the harmonics are plotted along with the fundamental for each of the Music Stations in slide 18. Note that the levels of the harmonics have been raised 20 dB for the sake of clarity.
All of the Music Stations exhibited relatively high distortion at low frequencies below 100 Hz, with generally less harmonic distortion at higher frequencies. Music Station B differentiated itself by having higher levels of second and third harmonic distortion between 100 Hz and 1 kHz. Music Station C had the lowest distortion even though it received the lowest preference and distortion ratings from the listeners.
In conclusion, the harmonic distortion measurements of the Music Stations are not particularly good at predicting listeners’ distortion ratings, or overall preference in sound quality. This confirms many previous loudspeaker studies that have reported that harmonic distortion measurements are poor predictors of listeners’ overall impression of the loudspeaker. This can be explained by the fact that the distortions are often below the threshold of audibility, and the measurements themselves do not account for the masking properties of human hearing.

Conclusions
This article has shown evidence that a combination of comprehensive anechoic and in-room measurements can help explain listeners’ preferences and spectral balance ratings of the Music Stations evaluated in controlled listening tests.
Above 300 Hz, the anechoic derived listening window curve correlated well with listeners’ spectral balance ratings, whereas the in-room measurements better explained the Music Station’s acoustical interactions with the room below 300 Hz. In these particular tests, the overall smoothness of the on and off-axis frequency response curves provided the best overall indicator of listeners’ preferences and their comments.
Dynamic compression measurements revealed significant differences among the Music Stations in terms of their linear SPL output capability. The most preferred Music Station could play 6 dB louder (100 dB SPL @ 1 meter) than the other units without exhibiting significant dynamic compression. It is unlikely that this was a factor in the listening tests since the units were evaluated at a comfortable average level of 78 dB (B-weighted, slow). Finally, distortion measurements revealed some differences among the products but had no clear correlation with listeners’ sound quality ratings. This highlights the need for further research into the perception and measurement of nonlinear distortion in loudspeakers so that loudspeaker engineers can optimize their designs using psychoacoustic criteria.
References
[1] Floyd E. Toole, "Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 1," J. AES, Vol. 34, Issue 4, pp. 227-235, April 1986. (download for free courtesy of Harman International).
[2] Floyd E. Toole, "Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 2," J. AES, Vol. 34, Issue 5, pp. 323-348, May 1986. (download for free courtesy of Harman International).
[3] W. Klippel, "Multidimensional Relationship between Subjective Listening Impression and Objective Loudspeaker Parameters", Acustica 70, Heft 1, S. 45 - 54, (1990).
[4] Sean E. Olive, “A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part I - Listening Test Results,” presented at the 116th AES Convention, preprint 6113 (May 2004).
[5] Sean E. Olive, “A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part 2 - Development of the Model,” presented at the 117th AES Convention, preprint 6190 (October 2004).

Thursday, March 11, 2010

A Method For Training Listeners and Selecting Program For Listening Tests

The benefits of training listeners for subjective evaluation of reproduced sound are well documented [1]-[3]. Not only do trained listeners produce more discriminating and reliable sound quality ratings than untrained listeners, but they can report what they perceive in very precise, quantitative and meaningful terms.


One of the unexpected byproducts of listener training is that it identifies which music selections are most sensitive to distortions commonly found within the audio chain [4]. This is exactly what was found in a series of listener training experiments the author reported in a 1994 paper entitled, “A method for training listeners and selecting program material for listening tests.” The following sections summarize the findings of those early experiments, which helped establish an objective method for training and selecting listeners and program material used for listening tests at Harman International over the past 16 years. A slide presentation summarizing the paper can be downloaded here, and will be referred to throughout the following sections.


Matching the Sound of Spectral Distortions to Their Frequency Response Curve


A computer-based training task was designed where listeners were required to compare different spectral distortions added to programs and then match the frequency response curve of the filter that generated the distortion (see slides 4-5). This was repeated using eight different equalizations and twenty different music selections digitally edited into short 10-20 s loops.


The equalizations included ±3 dB shelving filters at low (100 Hz) and high (5 kHz) frequencies, and ±3 dB resonances (Q = 0.66) centered at 500 Hz and 2 kHz (slide 6). An unequalized version of the program (Flat) was always provided as a reference. The twenty music selections included classical, jazz and pop/rock genres with instrumentations that varied from solo instruments, speech and small combos to rock combos and orchestras (slide 7). Pink noise was also included since this continuous broadband signal has been found to produce the lowest detection thresholds of resonances in loudspeakers [5],[6].
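For readers who want to experiment with resonances like the ones described above, here is a minimal sketch of a ±3 dB peaking filter with Q = 0.66. It uses the widely circulated "Audio EQ Cookbook" biquad formulas as a stand-in; the filters actually used in the experiments are not specified here, and the 48 kHz sample rate is an assumption.

```python
import cmath
import math

def peaking_biquad(f0_hz, gain_db, q, fs_hz=48000.0):
    """Biquad coefficients (b, a) for a peaking EQ (Audio EQ Cookbook form)."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin]
    return b, a

def magnitude_db(b, a, f_hz, fs_hz=48000.0):
    """Magnitude response of the biquad at one frequency, in dB."""
    z1 = cmath.exp(-1j * 2 * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))

boost_500 = peaking_biquad(500.0, +3.0, 0.66)
cut_500 = peaking_biquad(500.0, -3.0, 0.66)
cut_2k = peaking_biquad(2000.0, -3.0, 0.66)

print(round(magnitude_db(*boost_500, 500.0), 3))  # 3.0 at the center frequency
print(round(magnitude_db(*cut_2k, 2000.0), 3))    # -3.0 at the center frequency
```

With this filter form, the boost and the cut at the same frequency are exact mirror images of each other in dB, which makes them convenient for building matched boost/cut training pairs.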


Eight untrained listeners with normal hearing participated in the training exercises, which were conducted over five separate listening sessions consisting of 32 trials each (slides 8 and 9). The presentation order of the equalizations, trials, and programs was randomized to prevent any order related biases. The listener’s performance was based on the percentage of correct responses given over the course of the five training sessions.


The Results


The training results were statistically analyzed using a repeated measures analysis of variance (ANOVA) to determine the effect the different music programs, equalizations, and trials had on the listeners’ performance in correctly identifying the different equalizations (slide 11).


Listener Performance Is Strongly Influenced by Program Selection


The single largest effect on the listener’s performance was the program selection. Slide 13 plots the mean listener performance scores for each of the twenty programs averaged across all eight equalizations. The percentage of correct responses ranged from a high of 88% (pink noise) to a low of 54% (jazz piano trio). Listeners performed the task best when using broadband, spectrally dense continuous signals like pink noise or pop/rock selections like Tracy Chapman, Little Feat, and Jennifer Warnes. Listeners performed worse on programs featuring solo instruments, small combos and speech that produced more discontinuous and narrow band signals. More about this later.



Equalization Context Influences Listener Performance


The effect of equalization on listener performance was surprisingly small (slide 14). There was a tendency for listeners to correctly identify the spectral distortions in the low and high frequency regions more often than the midband equalizations. The explanation for this can be found by examining the interaction effect between equalization * trial, which indicates that listener performance depended on which combinations of equalizations were presented within a trial. In other words, the context in which an equalization was presented influenced listener performance (slide 15). These contextual effects can be summarized as follows:


  1. Listeners gave more correct responses when the presented equalizations were more separated in frequency.
  2. Listeners gave more correct responses when presented spectral boosts versus notches; spectral notches were often confused with spectral peaks located at slightly higher frequencies.
  3. Low frequency boosts were often confused with high frequency cuts (and vice versa).
  4. Low frequency cuts were often confused with high frequency boosts (and vice versa).



Greater frequency separation between different equalizations would produce more distinctive tonal or timbral differences that would help improve identification. The second observation confirms previous research that has found spectral notches are more difficult to detect than spectral peaks of similar bandwidth [5]. The one exception is broadband dips, which have similar detection thresholds to resonance peaks of equivalent bandwidth [6]. Observations 3 and 4 are related to each other, and are more difficult to explain. At first glance, it seems implausible that boosts and cuts five octaves apart can be confused with one another. A possible explanation is that listeners are using information across the entire bandwidth to judge the perceived balance of the bass and treble. In this case, the slope or shape of the spectra must be an important factor (slide 16). Since a boost at one end of the audio bandwidth and a cut of similar magnitude at the opposite end produce similar broadband shapes or slopes, this might explain why listeners confuse the two with each other.
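The spectral slope explanation can be illustrated with a small sketch. It models the shelving equalizations as idealized 3 dB steps (an assumption; real shelving filters have gradual transitions) and fits a straight line to each response on a log-frequency axis. The fitting helper and the 1/3-octave frequency grid are mine, chosen only for illustration.

```python
import math

def regression_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# 1/3-octave grid from 20 Hz to 20.48 kHz, expressed in octaves for the fit.
freqs = [20 * 2 ** (k / 3) for k in range(31)]
octaves = [math.log2(f) for f in freqs]

# Idealized step-shaped shelves (hypothetical stand-ins for the real filters).
low_boost = [3.0 if f <= 100 else 0.0 for f in freqs]
high_cut = [-3.0 if f >= 5000 else 0.0 for f in freqs]
low_cut = [-3.0 if f <= 100 else 0.0 for f in freqs]
high_boost = [3.0 if f >= 5000 else 0.0 for f in freqs]

slopes = {
    "low_boost": regression_slope(octaves, low_boost),
    "high_cut": regression_slope(octaves, high_cut),
    "low_cut": regression_slope(octaves, low_cut),
    "high_boost": regression_slope(octaves, high_boost),
}
```

In this toy model the low frequency boost and the high frequency cut both tilt the spectrum downward (negative slope in dB per octave), while the low frequency cut and the high frequency boost both tilt it upward, matching the pairs that listeners confused.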


Program and EQ Interact to Influence Listener Performance


There was also a significant interaction between program and equalization that affected listener performance. This interaction effect was most apparent for the programs presented in training session 3 where listener performance varied significantly depending on the combination of programs and equalization presented to the listener (slide 18). It seems plausible that these differences were related to differences in the spectra of the programs, which was confirmed by plotting the average 1/3-octave spectra of the four programs (slide 19). The largest listener response errors tended to occur when the equalization fell in a frequency range where there was little spectral energy in the programs (e.g. Programs P10 (Stan Getz) and P19 (Canadian Brass)). It makes sense that listeners cannot easily analyze the spectral distortions if the program material does not contain signals that make them audible.



Not All Listeners Are Equal to the Task


No amount of training will make me eligible for the Canadian Olympic hockey team - even if I were 25 years younger. Some people simply lack the innate mental and physical raw material to perform a highly specialized task. This is also true for critical listening, as illustrated by the average performance scores of eight listeners after 5 listening sessions (slide 20). Individual listener performance ranged from 82% (listener 4) to 31% (listener 3). All listeners had normal hearing. Therefore, this large inter-listener variance in performance is presumably related to other factors such as listener motivation, attentiveness, and their listening (and general) intelligence. Training data such as this can provide an objective, quantifiable metric for selecting the best listeners for audio product evaluations.



Practice Makes Perfect


The success of any listener training task depends on whether it leads to measurable improvement in performance with repetition. Slide 21 shows listener performance measured over five training sessions based on the eight listeners tested. The graph shows monotonic improvement in performance from 65% correct responses to 80% after five training sessions. Additional training sessions would most likely realize further gains in performance for some subjects. In other words, the training works!



Programs With Wider and Flatter Spectrums Improve Listener Performance (Why Tracy Chapman is as Good as Pink Noise)


Spectrum analysis was performed on the different program selections to see if this could explain the strong effect of program on listener performance. The 1/3-octave spectrum of each program was plotted based on a long-term average taken over the entire length of the loop. When we looked at the spectrums of the programs it became clear that this was a significant predictor of how well listeners would perform their task.


Slide 22 plots the average spectrum of four groups of programs (5 programs in each group) rank ordered (from highest to lowest) according to the listener performance scores they produced. It clearly shows that the programs with the flattest and most extended spectrums (e.g. pink noise, pop/rock, full orchestra) were better suited for identifying spectral distortions. After pink noise, Tracy Chapman (program 2 in the above graph) had among the widest and flattest spectrums measured, and along with pink noise (program 1) registered the highest listener performance scores. Programs that had narrow band spectra with limited energy above and below 500 Hz (speech, solo instruments, small jazz and classical ensembles), concentrated in group 4, were less suited for identifying spectral distortions. While these groupings had some of the most musically entertaining selections, in the end, they were not good signals for detecting and characterizing spectral distortions in audio components.



Conclusions


A listener training method has been described that teaches listeners how to identify spectral distortions according to their frequency response curve. Experimental evidence was shown indicating listeners improved their performance in this task after 5 training sessions, although not all listeners are equal in their performance.


Statistical analysis of the training data revealed that the program selections are the largest factor influencing listener performance in this task: programs with continuous broadband spectra (e.g. pink noise, Tracy Chapman, etc.) provide the best signals for characterizing spectral distortions whereas programs with narrow band spectra (e.g. speech, solo instruments) provide poor signals for performing this task. Furthermore, listeners seem to confuse certain types of spectral distortions with others when the distortions presented share similarities in their frequency, bandwidth, and broadband spectral slope or shape.

Finally, it is important to remember that the training methods and programs discussed in this study focussed on perception and analysis of spectral distortions. While these types of distortions are the most dominant ones found in loudspeakers, microphones and listening rooms, there are other types of distortions for which a different set of programs are likely better suited for revealing their audibility and subjective analysis. The current Harman listener training software “How to Listen” includes training tasks on spectral distortion as well as spatial, dynamic and various types of nonlinear distortions for which we hope to discover the optimal programs for detecting and analyzing their audibility. Stay tuned.



References


  1. Olive, Sean E., “Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study,” J. Audio Eng. Soc., Vol. 51, Issue 9, pp. 806-825 (September 2003). Download for free here, courtesy of Harman International.
  2. Bech, Soren, “Selection and Training of Subjects for Listening Tests on Sound-Reproducing Equipment,” J. Audio Eng. Soc., Vol. 40, No. 7/8, pp. 590-610 (July 1992).
  3. Toole, Floyd E., “Subjective Measurements of Loudspeaker Sound Quality and Listener Performance,” J. Audio Eng. Soc., Vol. 33, pp. 2-32 (January/February 1985).
  4. Olive, Sean E., “A Method for Training Listeners and Selecting Program Material for Listening Tests,” presented at the 97th AES Convention, preprint 3893 (November 1994).
  5. Toole, Floyd E. and Sean E. Olive, “The Modification of Timbre by Resonances: Perception and Measurement,” J. Audio Eng. Soc., Vol. 36, pp. 122-142 (March 1988).
  6. Olive, Sean E.; Schuck, Peter L.; Ryan, James G.; Sally, Sharon L.; Bonneville, Marc E., “The Detection Thresholds of Resonances at Low Frequencies,” J. Audio Eng. Soc., Vol. 45, Issue 3, pp. 116-128 (March 1997).
  7. Olive, Sean E., “Harman’s How to Listen - A New Computer-based Listener Training Program,” May 30, 2009.

Friday, February 5, 2010

Evaluating the Sound Quality of Ipod Music Stations: Part 1


For many consumers, an iPod Music Docking Station may be the primary audio device through which they experience most of their recorded music and infotainment. These ubiquitous devices offer a convenient, low cost, portable and easy-to-use solution for enjoying an Ipod through loudspeakers -- but what about their sound quality? What sonic compromises are made in order to achieve this level of convenience and portability? Do certain models or brands of Ipod Music Stations offer better sound than others, and if so, how can consumers identify which ones they are? These are legitimate questions that consumers should be asking when purchasing an Ipod Music Station. Unfortunately, the answers are not readily found.


Choosing an Ipod Music Station based on sonic performance is a daunting task for consumers. There are dozens of models to choose from that vary in price from $80 to as high as $3000 for a model designed by Ferrari. Competent in-store demonstrations and reviews of these products are difficult to find, and the technical specifications on the packaging provide no clear indication of how good they sound. For traditional loudspeakers, it is already possible to quantify their sound quality, but the audio industry continues to withhold this information from consumers. Without meaningful performance specifications in place, consumers cannot make sound purchase decisions, nor can manufacturers be easily held accountable for delivering products that sound “not good enough.”


This article describes a listening test method used at Harman International for evaluating the sound quality of Harman and competitors’ Ipod Music Stations. The goal is to provide subjective ratings of Ipod Music Stations that are accurate, reliable and scientifically valid. From this data, a set of technical performance specifications can be developed that quantify how good the products sound.


Designing Listening Tests For Ipod Music Stations


Fortunately, there already exists a large body of scientific knowledge on how to design accurate, reliable and valid listening tests on loudspeakers. A key ingredient is careful control of listening test nuisance variables: these are psychological, electro-acoustical and experimental factors that are not directly related to the product(s) under test but nonetheless influence and bias the results (click on the figure below). Some of the more significant nuisance variable controls that should be in place but are often ignored by audio manufacturers and reviewers are:

  • Double-blind conditions (this removes the effects of sighted biases related to brand, price, etc.)
  • Trained listeners with normal hearing (trained listeners are up to 20 times more discriminating and reliable than untrained listeners, yet their overall sound quality preferences are similar to those of untrained listeners)
  • Quiet listening room with acoustics that are representative of average homes (important for hearing low level sounds and the quality of the loudspeaker's off-axis radiated sounds)
  • Loudness matching between products (the perception of timbre, spatial and dynamic attributes is level dependent)
  • Selection of well-recorded music selections that are revealing of sound quality differences
  • Multiple comparisons among products which are more discriminating and reliable compared to single stimulus presentations



These important nuisance variable controls are essential for obtaining accurate, reliable and valid sound quality ratings of Ipod Music Stations.



Including the Acoustical Effects of the Wall and Desktop in the Listening Test


If audio products are not tested under similar conditions for which they were designed and intended to be used, the ecological validity (as well as the external validity) of the test may be compromised: in other words, the test results will be of little value or relevance to how the product is typically used in the real world.


Most Ipod Music Stations are intended to be placed on a desktop surface or bookshelf located near a wall, which will cause acoustical reinforcement and cancellation at certain audio frequencies. Below 500 Hz, there will be a gradual increase in sound pressure level that, unless compensated for in the design of the product, can make vocals and bass instruments sound tubby and boomy. Diffraction effects or reflections from the desktop/bookshelf may also produce audible effects that should be included in the listening test. For these reasons, listening tests on Ipod Music Stations are best done on a desktop/wall boundary.



A Video On How We Evaluate the Sound Quality of Ipod Docking Stations


The video shown at the top of the page illustrates how Ipod Music Stations are currently evaluated in the Harman International Reference Listening Room. The acoustical properties and features of the room have been described in detail in a previous posting.


In the video you see a trained listener comparing three different Ipod Music Stations situated on our automated in-wall speaker mover configured with a removable shelf and desktop. An acoustically transparent, visually opaque screen is placed between the listener and the products under test, so that the test is double-blind (note: the term double-blind implies that neither the listener nor the experimenter knows the identities of the products currently selected, since the computer controls and randomly assigns the letters A/B/C to the products in each trial.)
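The random letter assignment mentioned above can be sketched in a few lines. This is only an illustration of the idea, not the LTS software: the function name, the product identifiers, and the seed parameter are all hypothetical.

```python
import random

def assign_labels(products, n_trials, seed=None):
    """Return one {letter: product} mapping per trial, freshly shuffled."""
    rng = random.Random(seed)
    labels = [chr(ord("A") + i) for i in range(len(products))]
    trials = []
    for _ in range(n_trials):
        shuffled = rng.sample(products, k=len(products))
        trials.append(dict(zip(labels, shuffled)))
    return trials

# Hypothetical identifiers; only the software would see this mapping.
products = ["station_1", "station_2", "station_3"]
schedule = assign_labels(products, n_trials=8, seed=2010)
```

Because the mapping changes from trial to trial and is held only by the computer, neither the listener nor the experimenter can associate a letter with a particular product.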


The listener can switch between the different products at will and enter their responses via a wireless PDA equipped with a custom listening test software (LTS) client application. Sound quality ratings are given on a number of different pre-defined scales that include preference, spectral balance, distortion, and auditory image size. This is repeated twice using four different programs.


The PDA client communicates with the LTS server application that performs the following functions:


  • A test wizard that defines all experimental design and setup parameters (perceptual scales, presentation of stimuli, program, randomization of test objects, playback level, etc.), which are then stored in a database
  • automation and administration of the listening test and its hardware (e.g. speaker mover, media player, DSP, audio switcher)
  • collection, storage and statistical analysis of listening test data
  • real-time monitoring of listener’s performance and ratings during the test


LTS makes conducting listening tests an efficient and repeatable process by minimizing human interaction and errors in the listening test setup, storage, and analysis of the results.


Conclusions


This article has described a listening test method used for evaluating Ipod Music Stations with the goal to provide accurate, reliable and valid sound quality ratings. In Part 2, I will show some results from a recent listening test conducted on different Ipod Music Stations, followed by some different acoustical measurements of the products in Part 3. By studying the relationship between well-controlled scientific listening tests and comprehensive acoustical measurements of Ipod Music Stations, a meaningful technical specification based on sound quality can be found.