
Thursday, April 21, 2011

Topics Related to Perception and Measurement of Reproduced Sound


On Tuesday, April 26th, 2011, I will be giving a presentation at the meeting of the Los Angeles AES Chapter on several topics related to recent audio research at Harman International. The topics include:

I've briefly discussed these topics in Audio Musings over the past few months, and you can find summaries of them by clicking on the links above. I'll give an update on new findings and briefly touch on topics not mentioned above. As a door prize, Harman will donate a free copy of Dr. Floyd Toole's book Sound Reproduction (shown on the right sidebar), autographed by the author.

AES members and nonmember guests are welcome to attend. The meeting will be held at the Sportsmen's Lodge in Studio City. More details can be found at the Los Angeles AES website.

Sunday, April 3, 2011

Version 2.04 of Harman How to Listen Now Available For Download!

Version 2.04 of Harman How to Listen is now available for download here.

This update fixes the problem with the noise and hum attribute tests. We've also updated the user's manual to help users navigate around some installation issues that have been reported.

Friday, March 25, 2011

Version 2.03 of Harman How to Listen Now Available For Download!



You can download the latest update of Harman How to Listen (version 2.03) here. This update fixes a bug in the Windows version that prompted listeners to locate program material that was not packaged with the installer. There is no significant change to the Mac version. Enjoy!

Tuesday, March 15, 2011

Harman's "How to Listen" Listener Training Software Now Available as Beta



Well, it's been some time coming, but the listener training software Harman How to Listen is finally available for free download here. This beta software is available in both Mac OSX and Windows versions.

We are pleased to offer the software packaged with four high quality music samples, courtesy of Bravura Records. The 24-bit music tracks are provided in both 96 kHz and 48 kHz formats in order to be compatible with older PC sound cards. We hope you try the software, and find that it improves your critical listening skills. This is a work in progress, and we expect to add more features and training tasks to this public version of the software over time. Enjoy!

Thursday, December 9, 2010

How to Listen: A Course on How to Critically Evaluate the Quality of Recorded and Reproduced Sound

Next month, I will be giving a one-day course on How to Listen at the 2011 ALMA Winter Symposium in Las Vegas, held Jan. 4th and 5th, just prior to the CES show. The symposium will also feature other courses, workshops and paper sessions on loudspeaker and headphone design, testing and evaluation. You can register for my course and other events at the ALMA Symposium here. Below is a brief preface for my course How to Listen, which I encourage you to attend.


Figure 1: A listener training in the Harman International Reference Listening Room using the original version of the How to Listen training software.


Whether you are involved in the mixing of live and recorded sound, the design and calibration of sound systems, or just shopping for a new audio system, the question “Does it sound good?” is usually foremost on your mind. With sufficient prodding, most people can offer an opinion on the overall sound quality of a recording and its reproduction. Beyond that, listeners generally lack the necessary experience, training and vocabulary to describe which specific aspects of the sound they like and dislike. Sadly, the audio industry has no standardized terminology that allows musicians, audio engineers and audiophiles to communicate with each other about sound quality in a concise and meaningful way. Courses in critical listening are not commonly available in audio programs at universities, and are even less available to the general public. In summary, there is a real need for a comprehensive course that teaches audio enthusiasts how to critically evaluate sound quality.

A Scientific Approach Towards Training Listeners
To address this need, the author has developed a critical listening course called How to Listen. The course aims to teach students how to evaluate sound quality using percepts well established in the auditory perception field. These sound quality percepts are taught and demonstrated in a controlled way using real-time processing of recorded sounds. This has two benefits. First, the intensity of each attribute can be adjusted according to the aptitude and performance of the listener. Second, because the training closely ties the physical properties of the stimulus to its perception and evaluation (a science known as psychoacoustics), there is a theoretical basis behind the training approach. For example, the listener training data can be used to better understand how we perceive sound quality, which physical aspects of sound matter most to its perceived quality, and possibly identify the important underlying sound quality attributes that influence our preferences. Critical listening is treated as a science, rather than the black art it currently is.
How to Listen also includes classroom topics in the fundamentals of human auditory perception, sound quality research in variables that significantly influence the quality of recorded and reproduced sound (e.g. loudspeakers, rooms, recordings, microphones) and a brief tutorial in how to conduct sound quality listening tests that produce accurate, reliable and valid results.
But before we get too far ahead of ourselves, there must be good reasons for training listeners since it requires an investment in time and resources. There is also the question of external validity: Can the sound quality preferences of trained listeners be extrapolated to the preferences of untrained listeners, and does this hold true across different cultures? These questions will be answered in the following sections.
Why Train Listeners?
There are several compelling reasons for training listeners. First, trained listeners have been shown to produce more discriminating and reliable judgments of sound quality than untrained listeners [1]. Fewer listeners are needed to achieve a similar level of statistical confidence, which can result in savings in time and money. For example, a panel of 15 trained listeners can provide sound quality ratings with reliable statistical confidence in less than 8 hours. To achieve a similar level of confidence using untrained listeners would require about 10 times more listeners, 10 times more days to complete the testing, and cost 10 times more money to pay the listeners and staff conducting the tests. If the study is conducted by an independent research firm using 200-300 untrained listeners, the cost can easily exceed $100k.
A second reason for training listeners is that they are able to report precisely what they like and dislike about the sound quality using well-defined, meaningful terms. This feedback can provide important guidance for reengineering the product for optimal sound quality.
Besides training listeners for product research, there are benefits in training audio marketing and sales people to become better critical listeners. Training makes them better equipped to communicate sound quality issues to audio engineers and customers. As audio companies expand sales and operations in China, India, and other developing countries, there is a growing need to develop a common cross-cultural understanding as to what constitutes good sound and unacceptable sound.
Does Training Bias Listeners?
An important question is whether the training process itself biases the sound quality preferences of listeners. If the trained listener preferences are different from those of the targeted demographic, there is a danger the product may not be well received in the marketplace. This raises the age-old question, "Is preference in sound quality a matter of personal taste - much like food, wine and music - or is it universal?"
To study this question, the author compared the performances and loudspeaker preferences of trained listeners versus untrained listeners [1]. Over 300 untrained listeners were tested over a period of 18 months where they compared four different loudspeakers under controlled, double-blind listening conditions. Their preferences were then compared to the preferences of the trained Harman listening panel.
The results, plotted in Figure 2, show that the rank ordering of the loudspeaker preferences were the same for both the trained and untrained listeners. There were two main differences in how the two groups of listeners responded. First, the trained listeners tended to give lower loudspeaker ratings overall. Second, the trained listeners distinguished themselves from the untrained listeners by generally giving more discriminating and consistent loudspeaker preference ratings.



Figure 2: The mean loudspeaker preference ratings and 95% confidence intervals are shown for four loudspeakers evaluated in a controlled, double-blind listening test. The results of different groups of untrained listeners are compared to those of the 12 Harman listeners.

Relative Performances of Trained Versus Untrained Listeners
A common performance metric used to quantify the listener’s discrimination and consistency in rating sound quality is the F-statistic. This calculation is done by performing an analysis of variance (ANOVA) on the main variable being tested. In the above study [1], the performances of trained versus untrained listeners were compared by calculating the loudspeaker F-statistic for each individual listener.
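For readers who want to experiment, here is a minimal sketch of the per-listener F-statistic using a one-way ANOVA in Python; the ratings are hypothetical:

```python
from scipy import stats

# Hypothetical preference ratings (0-10 scale) from one listener for four
# loudspeakers, four repeated double-blind trials each.
ratings = {
    "A": [7.5, 8.0, 7.0, 7.8],
    "B": [6.0, 5.5, 6.2, 5.8],
    "C": [4.0, 4.5, 3.8, 4.2],
    "D": [3.0, 2.5, 3.2, 2.8],
}

# One-way ANOVA with loudspeaker as the factor: a large F means the listener
# separates the products well (between-speaker variance) relative to his or
# her trial-to-trial inconsistency (within-speaker variance).
f_stat, p_value = stats.f_oneway(*ratings.values())
print(f"loudspeaker F-statistic: {f_stat:.1f}, p = {p_value:.2g}")
```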

Figure 3 shows the relative performance of different groups of untrained listeners based on their mean F-statistics compared to the F-statistics of the trained listeners. The relative performances of the untrained groups were: audio retailers (35%), audio reviewers (20%), audio marketing/sales staff (10%), and college students (4%). The poor performance of the students was explained by their tendency to give all four loudspeakers very similar and high ratings. A likely explanation is that they experienced a level of sound quality much higher than their everyday experience: compressed MP3 music reproduced through headphones. The good news is that, judging by the high ratings, the students seemed to appreciate the higher fidelity sound. In time, they will hopefully seek out better quality audio systems.



Figure 3: The relative performance of different groups of untrained listeners compared to the trained Harman listeners. Performance is based on the group’s average loudspeaker F-statistic which represents their ability to give discriminating and consistent preference ratings.

Are There Cross-Cultural Preferences in Sound Quality?
One of the oldest controversies in audio is the notion that different cultures or geographical regions of the world have different sound quality preferences [see reference 2]. For example, it is often claimed that Japanese listeners have different loudspeaker preferences than Americans due to differences in language, music, cultural practices and norms, and the acoustics of their homes. So far, very little formal research has been done on this subject. In some preliminary studies, the author has found no significant differences in sound quality preferences for loudspeakers and automotive audio systems among Chinese, Japanese and American listeners.
How to Listen: A New Listener Training Software Application
Research has found that most sound quality percepts fall under four attribute categories: timbre, spatial, dynamics, and nonlinear distortion. Within these four attributes there are sub-attributes that describe more specific sonic characteristics. For example, Bright-Dull and Full-Thin are timbre sub-attributes related to the relative emphasis and de-emphasis of high and low frequencies, respectively. Sub-attributes for spatial quality deal with the location and width of the auditory image(s), and the perceived sense of spaciousness or envelopment. Distortion sub-attributes include the presence of noise, hum, audible clipping and distortions specific to the audio device(s) under test.
How to Listen focuses on teaching listeners to evaluate sound quality differences based on these four attributes and their sub-attributes (see Figure 4). While listening to music recordings, one or more attributes are manipulated in a controlled way so that listeners learn to recognize and report the magnitude of these changes using the appropriate terms and scales. An analogy to this would be the Wine Aroma Wheel, where expert wine tasters are trained to identify the intensities of different aroma-flavors perceived in the wine.



Figure 4: A list of the 17 different training tasks that focus on one or more of the four sound quality attributes: spectral (timbral), spatial, distortion and dynamics.


To facilitate the training process, a proprietary computer-based software program called "How to Listen" was developed by Harman software engineers Sean Hess and Eric Hu. The software runs on both Mac and PC computers, and can play both stereo and multichannel music files. A real-time DSP engine built into the application allows real-time manipulation of sound quality attributes in response to the listeners' responses and performance.
There are currently five different types of training tasks that focus on one or more sound quality attributes (see Figure 4):
  1. Band Identification
  2. Spectral Plot
  3. Spatial Mapping
  4. Attribute Test
  5. Preference Test
Band Identification (see Figure 5) teaches listeners to identify spectral distortions based on their frequency, level and Q-factor using combinations of peak/dip and highpass/lowpass filters. In each trial, the listener compares the unequalized version of the music track (FLAT) to a version that has been equalized (EQ) using one of the filters drawn on the screen. The listener must select the correct filter (Filter 1 or 2) they believe has been applied to the equalized version.




Figure 5: A screen capture of the listener training task “Band Identification” in Harman’s “How to Listen” training software. The listener compares the unequalized music “Flat” to an equalized version (EQ) and must select the EQ filter that is associated with its sound.
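The DSP inside How to Listen is proprietary, but the kind of parametric peak filter the Band Identification task applies can be sketched with a standard audio-EQ-cookbook (RBJ) biquad. The sketch below is illustrative only; the parameter values are assumptions, not the ones used in the software.

```python
import numpy as np
from scipy.signal import lfilter

def peaking_biquad(fc, gain_db, q, fs):
    """Audio-EQ-cookbook (RBJ) peaking filter coefficients."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * fc / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin])
    a = np.array([1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin])
    return b / a[0], a / a[0]

fs = 48_000
x = np.random.randn(fs)                       # stand-in for a music loop
b, a = peaking_biquad(fc=2000, gain_db=6.0, q=2.0, fs=fs)
y = lfilter(b, a, x)                          # the "EQ" version vs. the "Flat" x
```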


The difficulty of each training task automatically increases or decreases based on the listener's performance (a sketch of one such adaptation rule appears below). The listener is given immediate feedback on their responses, and they can audition all possible response choices when they enter an incorrect response.
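The adaptation rule itself is not published; as a minimal sketch, a simple one-up/one-down staircase on the EQ gain captures the idea of making trials harder after correct responses and easier after misses:

```python
def next_gain_db(gain_db, correct, step_db=1.0, min_db=1.0, max_db=12.0):
    """One-up/one-down staircase: shrink the EQ gain (harder) after a correct
    response, grow it (easier) after a miss; clamp to a sensible range."""
    gain_db += -step_db if correct else step_db
    return min(max_db, max(min_db, gain_db))
```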

The training task Spectral Plot requires the listener to compare different music programs that have been equalized in a number of different ways. The listener must select the equalization curve that best matches its sound quality. This task teaches listeners to behave like human spectrum analyzers. Once fully trained, the listener can draw a graph of the audio system's frequency response based on how it sounds.
The Spatial Mapping task requires the listener to graphically indicate on a two-dimensional map where a sound appears in the listening space. The Attribute training task requires the listener to correctly rank order two or more sounds on a given attribute scale based on the intensity of the attribute (e.g. bright-dull). For the Preference task, the listener must give preference ratings for versions of the music whose sound quality has been modified along one or more attributes. The performance of the listener is calculated based on a statistical post-hoc test that determines the discrimination and reliability of the listener's preference ratings. Together, these different training tasks teach listeners to critically evaluate any type of sound quality variation they are likely to encounter when listening to recorded and reproduced sound.

Conclusions
The evaluation of sound quality remains an elusive art in the audio industry. Better awareness, understanding, and appreciation of sound quality may be possible if there existed a method to teach listeners how to evaluate the quality of reproduced sound and report what they hear using well-defined and meaningful terminology. How to Listen is a listener training course that aims to achieve those goals. Listeners are taught to identify and rate audible changes to different sound quality percepts related to the spectral, spatial, dynamic and distortion qualities of recorded music. Performance metrics based on the discrimination, accuracy and reliability of the listeners' responses are factored into whether the listener meets the criterion of being a "trained" listener. The question of whether a listener is truly golden-eared is no longer a matter of conjecture and debate, since How to Listen will ultimately reveal the true answer.
References
[1] Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. Audio Eng. Soc., vol. 51, no. 9, pp. 806-825 (September 2003). Download for free here, courtesy of Harman International.
[2] Sean E. Olive, "Are There Cross-Cultural Preferences in the Quality of Reproduced Sound?" Audio Musings, July 2, 2010.

Saturday, April 10, 2010

Evaluating the Sound Quality of iPod Music Stations: Part 2 - Listening Tests


Part 1 of this article described a listening test method used at Harman International for evaluating the sound quality of iPod Music Docking Stations. In part 2, I present the results of a recent competitive benchmarking listening test where three popular Music Stations of comparable price were evaluated by a panel of trained listeners. Were listeners able to reliably formulate a preference among the different iPod Music Stations using this test method? And what were the underlying sound quality attributes that explain these preferences? Read on to find out.

Throughout this article, I will refer to slides in an accompanying PDF presentation, or you can watch a YouTube video of the presentation.


The Products Tested
A listening test was performed on three iPod Music Stations that retail for approximately the same price of $599: the Harman Kardon MS 100, the Bose SoundDock 10, and the Bowers & Wilkins Zeppelin (see slide 2). All three products provide iPod docking playback capability and an auxiliary input for external sources such as a CD player. The auxiliary input was used in these tests to reproduce a CD-quality stereo test signal fed from a digital sound source.


Listening Test Method
The Music Stations were evaluated in the Harman International Reference Listening Room (slide 4), described in detail in a previous blog posting. Each Music Station was positioned on a shelf attached to the Harman automated in-wall speaker mover, which provides the means for rapid multiple comparisons among three products designed to be used in, on, or near a wall boundary. The Music Stations were level-matched within 0.1 dB at the listening position by playing pink noise through each unit and adjusting the acoustic output level to produce the same loudness measured via the CRC stereo loudness meter.
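For illustration only (the actual calibration used a CRC stereo loudness meter, which applies a loudness weighting to the measured signal), here is a minimal sketch of computing the gain that matches a captured pink-noise level to a reference level:

```python
import numpy as np

def rms_db(x):
    """RMS level of a captured signal, in dB (relative to full scale)."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))))

def matching_gain(capture, reference_db):
    """Linear gain that brings this unit's pink-noise level to the reference."""
    return 10 ** ((reference_db - rms_db(capture)) / 20)
```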

All tests were performed double-blind with the identities of the products hidden via an acoustically transparent, but visually opaque screen. The listening panel consisted of 7 trained listeners with normal audiometric hearing. Each listener sat in the same seat situated on-axis to the Music Stations positioned at seated ear height, approximately 11 feet away (slide 5).

The Music Stations were evaluated using a multiple comparison (A/B/C) protocol whereby listeners could switch at will between the three products before entering their final comments and ratings based on overall preference, distortion, and spectral balance. This was repeated using four different stereo music programs with one repeat (4 programs x 2 observations = 8 trials). In total, each listener provided 216 ratings, in addition to their comments. The typical length of the test was between 30-40 minutes. The presentation order of the music programs and Music Stations was randomized by the Harman Listening Test software to minimize any order-related biases in the results.
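A minimal sketch of this kind of randomization, with hypothetical program labels (the Harman Listening Test software itself is proprietary):

```python
import random

products = ["A", "B", "C"]            # hidden behind the screen
programs = ["P1", "P2", "P3", "P4"]   # hypothetical program labels

# 4 programs x 2 observations = 8 trials, in random order; within each trial,
# the assignment of products to the A/B/C switcher buttons is also shuffled.
trials = [(prog, rep) for prog in programs for rep in (1, 2)]
random.shuffle(trials)
playlist = [(prog, rep, random.sample(products, k=3)) for prog, rep in trials]
```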


Results: Overall Preference Ratings For the Music Stations
A repeated measures analysis of variance was used to statistically establish the effects and interactions between the independent variables and the different sound quality ratings. The main effect was related to the Music Stations, with no significant effects or interactions observed between the program material and Music Stations. Note that in the following discussion, the brands/models of the Music Stations have been removed from the results since this information is not relevant to the primary purpose of the research and this article. Instead, the Music Station products have been assigned the letters A, B, and C in descending order according to their mean overall preference rating.
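For readers who want to run a similar analysis, here is a minimal sketch using the repeated measures ANOVA in statsmodels; the file and column names are hypothetical:

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format table: one row per rating, with columns
# listener, station, program, preference (repeats averaged so the
# within-subject design stays balanced).
df = pd.read_csv("ratings_long.csv")
df = df.groupby(["listener", "station", "program"],
                as_index=False)["preference"].mean()

res = AnovaRM(df, depvar="preference", subject="listener",
              within=["station", "program"]).fit()
print(res)  # F and p for station, program, and the station x program interaction
```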

The mean preference ratings and upper 95% confidence intervals based on the 7 listeners are plotted in slide 7. Music Station A received a preference rating of 6.8, and was strongly preferred over Music Stations B (4.58) and C (4.08).


Individual Listener Preference
The individual listener preference ratings and upper 95% confidence intervals are plotted in slide 8. The intra- and inter-listener reliability in ratings was generally quite high. All seven listeners rated Music Station A higher than the other two products, although some listeners, notably 55 and 64, were less discriminating and reliable than the other listeners. Both of these listeners had significantly less training and experience than the others, which previous studies have shown to be an important factor in listener performance.


Distortion Ratings
Nonlinear distortion includes audible buzzes, rattles, noise and other level-dependent distortions related to the performance of the electronics, transducers, and mechanical integrity of the product’s enclosure. In these tests, the average playback level was held constant (78 dB(B) slow), and listeners could not adjust it up or down. Under these test conditions, some listeners still felt there were audible differences in distortion (slide 9) with Music Station A (distortion rating = 7.19) having less distortion than Music Stations B (5.5) and C (4.94).

Some of these differences in subjective distortion ratings could be related to a “Halo Effect," a scaling bias wherein listeners tend to rate the distortion of loudspeakers according to their overall preference ratings - even when the distortion is not audible. An example of “Halo Effect” bias has been noted in a previous loudspeaker study by the author [1]. Reliable and accurate quantification of nonlinear distortion in perceptually meaningful terms remains problematic until better subjective and objective measurements are developed.


Spectral Balance Ratings
Listeners rated the spectral balance of each Music Station across seven equally log-spaced frequency bands using a ±5-point scale. A rating of 0 indicates an ideal spectral balance, positive numbers indicate too much emphasis within the frequency band, and negative numbers indicate a de-emphasis within the frequency band. Rating the spectral balance of an audio component is a highly specialized task that requires skill and practice acquired through using Harman's "How to Listen" listener training software application. A previous study [1] has shown that spectral balance ratings are closely related to the measured anechoic listening window of the loudspeaker, although they may vary with changes in the directivity and the ratio of direct/reflected sound at the listening location.
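The frequency range of the seven bands isn't stated here, but assuming they span 20 Hz to 20 kHz, the geometric band centers land very close to the 88 Hz and 1700 Hz values quoted below:

```python
import numpy as np

# Assumed range: 20 Hz - 20 kHz split into 7 equal log-spaced bands.
edges = np.geomspace(20, 20_000, num=8)
centers = np.sqrt(edges[:-1] * edges[1:])   # geometric band centers
print(np.round(centers))
# approximately: 33, 88, 236, 632, 1696, 4550, 12209 Hz
```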

The mean spectral balance ratings averaged across all programs and listeners are plotted in slide 10. Listeners felt Music Station A had the flattest or most ideal spectral balance, with the exception of a need for more upper/lower bass, and less emphasis in the upper treble. Music Station B was judged to have too much emphasis in the upper bass (88 Hz), and too little emphasis in the upper midrange/treble. Music Station C was rated to have a slight overemphasis in the upper bass, and a very uneven balance throughout the midrange with a peak centered around 1700 Hz.


Listener Comments
Listeners provided comments that described the audible differences among the three Music Stations. The frequency or number of times a specific comment was used to describe each product is summarized in slide 11. The correlation between the product's preference rating and each descriptor is indicated by the correlation coefficient (r) shown in the bottom row of the table. The same table data shown in slide 11 are plotted in graphical form in slide 12.
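As a sketch of how each r value is computed, here is the calculation for the descriptor "colored" (Music Station A's count is not given above, so zero is an assumption):

```python
import numpy as np

preference = np.array([6.8, 4.58, 4.08])   # Music Stations A, B, C (slide 7)
colored    = np.array([0, 13, 19])         # "colored" counts; A's is assumed

r = np.corrcoef(colored, preference)[0, 1]
print(f"r = {r:.2f}")                      # strongly negative, as expected
```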

The three most common descriptors applied to Music Station A were neutral (16), bright (9), and thin (9). These descriptors generally confirm the perceived mean spectral balance ratings summarized in slide 10.

The three most frequent descriptors applied to Music Station B were colored (13), boomy bass (10), and uneven mids (6). The "boomy bass" is clearly suggested in the spectral balance ratings (see the large 88 Hz peak) in slide 10.

The three most frequent descriptors used to describe the sound quality of Music Station C were colored (19), uneven mids (9), and harsh (6). All three descriptors have a high negative correlation with the overall preference rating, and may explain the low preference rating this product received. The coloration and unevenness of the midrange are confirmed in the spectral balance ratings in slide 10. The harshness is most likely related to the spectral peak perceived around 1700 Hz.


Conclusions
This article summarized the results of a controlled, double-blind listening test performed on three comparably priced iPod Music Stations using seven trained listeners with normal hearing. The results provide evidence that the sound quality of Music Station A was strongly preferred over Music Stations B and C. There was strong consensus among all seven listeners, who rated Music Station A highest overall. The Music Station preference ratings can be largely explained by examining the perceived spectral balance ratings of the products, which are in turn closely related to listener comments on the sound quality of the products.

The most preferred product, Music Station A, was perceived to have the flattest, most ideal spectral balance, and elicited frequent comments about its neutral sound quality. As the spectral balance ratings deviated from flat or ideal, the products received frequent comments related to coloration, boomy bass, and uneven midrange. While the distortion ratings were highly correlated with preference, more investigation is needed to determine the extent to which the distortion ratings are related to a possible scaling bias known as the "halo effect."

In part 3 of this article, I will present the objective measurements of these products - both anechoic and in-room acoustical measurements - to see if they can reliably predict the subjective ratings of the products reported here.


References
[1] Sean E. Olive, "A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part I - Listening Test Results," presented at the 116th AES Convention, preprint 6113 (May 2004).

Thursday, March 11, 2010

A Method For Training Listeners and Selecting Program Material For Listening Tests

The benefits of training listeners for subjective evaluation of reproduced sound are well documented [1]-[3]. Not only do trained listeners produce more discriminating and reliable sound quality ratings than untrained listeners, but they can report what they perceive in very precise, quantitative and meaningful terms.


One of the unexpected byproducts of listener training is that it identifies which music selections are most sensitive to distortions commonly found within the audio chain [4]. This is exactly what was found in a series of listener training experiments the author reported in a 1994 paper entitled, “A method for training listeners and selecting program material for listening tests.” The following sections summarize the findings of those early experiments, which helped establish an objective method for training and selecting listeners and program material used for listening tests at Harman International over the past 16 years. A slide presentation summarizing the paper can be downloaded here, and will be referred to throughout the following sections.


Matching the Sound of Spectral Distortions to Their Frequency Response Curve


A computer-based training task was designed where listeners were required to compare different spectral distortions added to programs and then match the frequency response curve of the filter that generated the distortion (see slides 4-5). This was repeated using eight different equalizations and twenty different music selections digitally edited into short 10-20 s loops.


The equalizations included ±3 dB shelving filters at low (100 Hz) and high (5 kHz) frequencies, and ±3 dB resonances (Q = 0.66) centered at 500 Hz and 2 kHz (slide 6). An unequalized version of the program (Flat) was always provided as a reference. The twenty music selections included classical, jazz and pop/rock genres, with instrumentations that varied from solo instruments, speech and small combos to rock combos and orchestras (slide 7). Pink noise was also included since this continuous broadband signal has been found to produce the lowest detection thresholds of resonances in loudspeakers [5],[6].
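A sketch of how the eight equalizations could be generated with standard audio-EQ-cookbook (RBJ) biquads follows; the sampling rate and shelf slope are assumptions, since the paper specifies only the gains, frequencies, and the Q of the resonances:

```python
import numpy as np

def rbj_biquad(kind, fc, gain_db, fs=48_000, q=0.66, slope=1.0):
    """Audio-EQ-cookbook biquads for the filter types used in the experiment."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * fc / fs
    cw, sw = np.cos(w0), np.sin(w0)
    if kind == "peak":
        alpha = sw / (2 * q)
        b = [1 + alpha * A, -2 * cw, 1 - alpha * A]
        a = [1 + alpha / A, -2 * cw, 1 - alpha / A]
    else:  # "lowshelf" or "highshelf"
        alpha = sw / 2 * np.sqrt((A + 1 / A) * (1 / slope - 1) + 2)
        p = 2 * np.sqrt(A) * alpha
        if kind == "lowshelf":
            b = [A * ((A + 1) - (A - 1) * cw + p),
                 2 * A * ((A - 1) - (A + 1) * cw),
                 A * ((A + 1) - (A - 1) * cw - p)]
            a = [(A + 1) + (A - 1) * cw + p,
                 -2 * ((A - 1) + (A + 1) * cw),
                 (A + 1) + (A - 1) * cw - p]
        else:
            b = [A * ((A + 1) + (A - 1) * cw + p),
                 -2 * A * ((A - 1) + (A + 1) * cw),
                 A * ((A + 1) + (A - 1) * cw - p)]
            a = [(A + 1) - (A - 1) * cw + p,
                 2 * ((A - 1) - (A + 1) * cw),
                 (A + 1) - (A - 1) * cw - p]
    b, a = np.asarray(b), np.asarray(a)
    return b / a[0], a / a[0]

# The eight equalizations: +/-3 dB shelves at 100 Hz and 5 kHz, and
# +/-3 dB resonances (Q = 0.66) at 500 Hz and 2 kHz.
specs = ([("lowshelf", 100, g) for g in (3, -3)] +
         [("highshelf", 5000, g) for g in (3, -3)] +
         [("peak", 500, g) for g in (3, -3)] +
         [("peak", 2000, g) for g in (3, -3)])
filters = {(k, fc, g): rbj_biquad(k, fc, g) for k, fc, g in specs}
```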


Eight untrained listeners with normal hearing participated in the training exercises, which were conducted over five separate listening sessions consisting of 32 trials each (slides 8 and 9). The presentation order of the equalizations, trials, and programs were randomized to prevent any order related biases. The listener’s performance was based on the percentage of correct responses given over the course of the five training sessions.


The Results


The training results were statistically analyzed using a repeated measures analysis of variance (ANOVA) to determine the effect the different music programs, equalizations, and trials had on the listeners’ performance in correctly identifying the different equalizations (slide 11).


Listener Performance Is Strongly Influenced by Program Selection


The single largest effect on the listeners' performance was the program selection. Slide 13 plots the mean listener performance scores for each of the twenty programs averaged across all eight equalizations. The percentage of correct responses ranged from a high of 88% (pink noise) to a low of 54% (jazz piano trio). Listeners performed the task best when using broadband, spectrally dense continuous signals like pink noise or pop/rock selections like Tracy Chapman, Little Feat, and Jennifer Warnes. Listeners performed worst on programs featuring solo instruments, small combos and speech, which produced more discontinuous and narrowband signals. More about this later.



Equalization Context Influences Listener Performance


The effect of equalization on listener performance was surprisingly small (slide 14). Listeners tended to identify spectral distortions at low and high frequencies more correctly than those in the midband. The explanation for this can be found by examining the equalization × trial interaction, which indicates that listener performance depended on which combinations of equalizations were presented within a trial. In other words, the context in which an equalization was presented influenced listener performance (slide 15). These contextual effects can be summarized as follows:


  1. Listeners gave more correct responses when the presented equalizations were more separated in frequency.
  2. Listeners gave more correct responses when presented with spectral boosts rather than notches; spectral notches were often confused with spectral peaks located at slightly higher frequencies.
  3. Low frequency boosts were often confused with high frequency cuts (and vice versa).
  4. Low frequency cuts were often confused with high frequency boosts (and vice versa).



Greater frequency separation between different equalizations would produce more distinctive tonal or timbral differences that would help improve identification. The second observation confirms previous research that has found spectral notches are more difficult to detect than spectral peaks of similar bandwidth [5]. The one exception is broadband dips, which have detection thresholds similar to resonance peaks of equivalent bandwidth [6]. Observations 3 and 4 are related to each other, and are more difficult to explain. At first glance, it seems implausible that boosts and cuts separated five octaves apart can be confused with one another. A possible explanation is that listeners use information across the entire bandwidth to judge the perceived balance of the bass and treble. In this case, the slope or shape of the spectrum must be an important factor (slide 16). Since a boost or cut of similar magnitude at opposite ends of the audio bandwidth produces a similar broadband shape or slope, this might explain why listeners confuse the two with each other.
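A toy calculation makes the tilt explanation plausible: with idealized shelving responses, a low-frequency boost and a high-frequency cut of equal magnitude differ only by a constant level offset, so once the overall level is removed their shapes are identical:

```python
import numpy as np

f = np.geomspace(20, 20_000, 200)            # log-frequency grid
x2 = (f / 500) ** 2                          # corner at 500 Hz (arbitrary)
low_boost = 3.0 / (1 + x2)                   # +3 dB low-frequency shelf
high_cut = -3.0 * x2 / (1 + x2)              # -3 dB high-frequency shelf

shape = lambda r: r - r.mean()               # remove the overall level
r = np.corrcoef(shape(low_boost), shape(high_cut))[0, 1]
print(f"shape correlation: {r:.2f}")         # 1.00: identical spectral tilt
```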


Program and EQ Interact to Influence Listener Performance


There was also a significant interaction between program and equalization that affected listener performance. This interaction effect was most apparent for the programs presented in training session 3 where listener performance varied significantly depending on the combination of programs and equalization presented to the listener (slide 18). It seems plausible that these differences were related to differences in the spectra of the programs, which was confirmed by plotting the average 1/3-octave spectra of the four programs (slide 19). The largest listener response errors tended to occur when the equalization fell in a frequency range where there was little spectral energy in the programs (e.g. Programs P10 (Stan Getz) and P19 (Canadian Brass)). It makes sense that listeners cannot easily analyze the spectral distortions if the program material does not contain signals that make them audible.



Not All Listeners Are Equal to the Task


No amount of training will make me eligible for the Canadian Olympic hockey team - even if I were 25 years younger. Some people simply lack the innate mental and physical raw material to perform a highly specialized task. This is also true for critical listening, as illustrated by the average performance scores of eight listeners after 5 listening sessions (slide 20). Individual listener performance ranged from 82% (listener 4) to 31% (listener 3). All listeners had normal hearing, so this large inter-listener variance in performance must be related to other factors such as listener motivation, attentiveness, and their listening (and general) intelligence. Training data such as this can provide an objective, quantifiable metric for selecting the best listeners for audio product evaluations.



Practice Makes Perfect


The mark of success for any listener training task is that it leads to measurable improvement in performance with repetition. Slide 21 shows listener performance measured over five training sessions based on the eight listeners tested. The graph shows monotonic improvement in performance from 65% correct responses to 80% after five training sessions. Additional training sessions would most likely realize further gains in performance for some subjects. In other words, the training works!



Programs With Wider and Flatter Spectrums Improve Listener Performance (Why Tracy Chapman is as Good as Pink Noise)


Spectrum analysis was performed on the different program selections to see if this could explain the strong effect of program on listener performance. The 1/3-octave spectrum of each program was plotted based on a long-term average taken over the entire length of the loop. When we examined the spectrums of the programs, it became clear that spectral content was a significant predictor of how well listeners would perform their task.
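A minimal sketch of this kind of long-term 1/3-octave analysis (the band edges and FFT resolution are illustrative choices):

```python
import numpy as np
from scipy.signal import welch

def third_octave_spectrum(x, fs, f_lo=25.0, n_bands=30):
    """Long-term average 1/3-octave band levels (dB) of a program loop."""
    f, pxx = welch(x, fs, nperseg=8192)              # power spectral density
    df = f[1] - f[0]
    centers = f_lo * 2 ** (np.arange(n_bands) / 3)   # 1/3-octave band centers
    levels = []
    for fc in centers:
        lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)   # band edges
        band = (f >= lo) & (f < hi)
        power = pxx[band].sum() * df if band.any() else 0.0
        levels.append(10 * np.log10(power) if power > 0 else -np.inf)
    return centers, np.array(levels)
```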


Slide 22 plots the average spectrum of four groups of programs (five programs in each group) rank ordered (from highest to lowest) according to the listener performance scores they produced. It clearly shows that the programs with the flattest and most extended spectrums (e.g. pink noise, pop/rock, full orchestra) were better suited for identifying spectral distortions. After pink noise, Tracy Chapman (program 2 in the above graph) had among the widest and flattest spectrums measured, and along with pink noise (program 1) registered the highest listener performance scores. Programs with narrowband spectra and limited energy above and below 500 Hz (speech, solo instruments, small jazz and classical ensembles), concentrated in group 4, were less suited for identifying spectral distortions. While these groups contained some of the most musically entertaining selections, in the end they were not good signals for detecting and characterizing spectral distortions in audio components.



Conclusions


A listener training method has been described that teaches listeners how to identify spectral distortions according to their frequency response curve. Experimental evidence was shown indicating listeners improved their performance in this task after 5 training sessions, although not all listeners are equal in their performance.


Statistical analysis of the training data revealed that the program selection is the largest factor influencing listener performance in this task: programs with continuous broadband spectra (e.g. pink noise, Tracy Chapman) provide the best signals for characterizing spectral distortions, whereas programs with narrowband spectra (e.g. speech, solo instruments) provide poor signals for performing this task. Furthermore, listeners seem to confuse certain types of spectral distortions with others when the distortions presented share similarities in their frequency, bandwidth, and broadband spectral slope or shape.

Finally, it is important to remember that the training methods and programs discussed in this study focused on the perception and analysis of spectral distortions. While these types of distortions are the most dominant ones found in loudspeakers, microphones and listening rooms, there are other types of distortions for which a different set of programs is likely better suited to revealing their audibility and supporting subjective analysis. The current Harman listener training software "How to Listen" includes training tasks on spectral distortion as well as spatial, dynamic and various types of nonlinear distortions, for which we hope to discover the optimal programs for detecting and analyzing their audibility. Stay tuned.



References


  1. Olive, Sean E., "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. Audio Eng. Soc., vol. 51, no. 9, pp. 806-825 (September 2003). Download for free here, courtesy of Harman International.
  2. Bech, Søren, "Selection and Training of Subjects for Listening Tests on Sound-Reproducing Equipment," J. Audio Eng. Soc., vol. 40, no. 7/8, pp. 590-610 (July 1992).
  3. Toole, Floyd E., "Subjective Measurements of Loudspeaker Sound Quality and Listener Performance," J. Audio Eng. Soc., vol. 33, pp. 2-32 (January/February 1985).
  4. Olive, Sean E., "A Method for Training Listeners and Selecting Program Material for Listening Tests," presented at the 97th AES Convention, preprint 3893 (November 1994).
  5. Toole, Floyd E. and Sean E. Olive, "The Modification of Timbre by Resonances: Perception and Measurement," J. Audio Eng. Soc., vol. 36, pp. 122-142 (March 1988).
  6. Olive, Sean E.; Schuck, Peter L.; Ryan, James G.; Sally, Sharon L.; Bonneville, Marc E., "The Detection Thresholds of Resonances at Low Frequencies," J. Audio Eng. Soc., vol. 45, no. 3, pp. 116-128 (March 1997).
  7. Olive, Sean E., "Harman's How to Listen - A New Computer-based Listener Training Program," Audio Musings, May 30, 2009.