Showing posts with label listener training. Show all posts

Friday, February 5, 2010

Evaluating the Sound Quality of iPod Music Stations: Part 1


For many consumers, an iPod Music Docking Station may be the primary audio device through which they experience most of their recorded music and infotainment. These ubiquitous devices offer a convenient, low-cost, portable and easy-to-use way of enjoying an iPod through loudspeakers -- but what about their sound quality? What sonic compromises are made in order to achieve this level of convenience and portability? Do certain models or brands of iPod Music Stations offer better sound than others, and if so, how can consumers identify them? These are legitimate questions that consumers should be asking when purchasing an iPod Music Station. Unfortunately, the answers are not readily found.


Choosing an iPod Music Station based on sonic performance is a daunting task for consumers. There are dozens of models to choose from, varying in price from $80 to as high as $3,000 for a model designed by Ferrari. Competent in-store demonstrations and reviews of these products are difficult to find, and the technical specifications on the packaging provide no clear indication of how good they sound. For traditional loudspeakers it is already possible to quantify sound quality, yet the audio industry continues to withhold this information from consumers. Without meaningful performance specifications in place, consumers cannot make sound purchase decisions, nor can manufacturers be easily held accountable for delivering products that sound “not good enough.”


This article describes a listening test method used at Harman International for evaluating the sound quality of Harman and competitors’ iPod Music Stations. The goal is to provide subjective ratings of iPod Music Stations that are accurate, reliable and scientifically valid. From these data, a set of technical performance specifications can be developed that quantifies how good the products sound.


Designing Listening Tests for iPod Music Stations


Fortunately, a large body of scientific knowledge already exists on how to design accurate, reliable and valid listening tests on loudspeakers. A key ingredient is careful control of listening test nuisance variables: psychological, electro-acoustical and experimental factors that are not directly related to the product(s) under test but nonetheless influence and bias the results (click on the figure below). Some of the more significant nuisance variable controls that should be in place, but are often ignored by audio manufacturers and reviewers, are:

  • Double-blind conditions (this removes the effects of sighted biases related to brand, price, etc.)
  • Trained listeners with normal hearing (trained listeners are up to 20 times more discriminating and reliable than untrained listeners, yet their overall sound quality preferences are similar to those of untrained listeners)
  • A quiet listening room with acoustics representative of average homes (important for hearing low-level sounds and the quality of the loudspeaker's off-axis radiated sound)
  • Loudness matching between products (the perception of timbral, spatial and dynamic attributes is level-dependent)
  • Selection of well-recorded music programs that are revealing of sound quality differences
  • Multiple comparisons among products, which are more discriminating and reliable than single-stimulus presentations
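Of these controls, loudness matching is the most straightforward to illustrate in software. The sketch below (my illustration, not Harman's actual procedure) matches two signals to a common RMS level; a real test would match perceived loudness, which also depends on spectrum, more carefully.

```python
import numpy as np

def rms_db(x):
    """RMS level of a signal in dB relative to full scale."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))))

def match_level(x, target_db):
    """Scale signal x so that its RMS level equals target_db (dBFS)."""
    gain_db = target_db - rms_db(x)
    return x * 10 ** (gain_db / 20)

# Example: bring a louder and a quieter "product" signal to -20 dBFS.
rng = np.random.default_rng(0)
a = 0.5 * rng.standard_normal(48000)   # one second of noise, louder
b = 0.05 * rng.standard_normal(48000)  # one second of noise, quieter
a_m, b_m = match_level(a, -20.0), match_level(b, -20.0)
```

Small level differences reliably bias listeners toward the louder product, which is why levels are matched before, not during, the test.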



These nuisance variable controls are essential for obtaining accurate, reliable and valid sound quality ratings of iPod Music Stations.



Including the Acoustical Effects of the Wall and Desktop in the Listening Test


If audio products are not tested under conditions similar to those in which they are designed and intended to be used, the ecological validity (as well as the external validity) of the test may be compromised: in other words, the test results will have little value or relevance to how the product is typically used in the real world.


Most iPod Music Stations are intended to be placed on a desktop or bookshelf located near a wall, which causes acoustical reinforcement and cancellation at certain frequencies. Below 500 Hz there is a gradual increase in sound pressure level that, unless compensated for in the design of the product, can make vocals and bass instruments sound tubby and boomy. Diffraction effects and reflections from the desktop/bookshelf may also produce audible effects that should be included in the listening test. For these reasons, listening tests on iPod Music Stations are best done on a desktop/wall boundary.
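A simple idealized calculation (assuming a point source and a perfectly reflecting wall, which is a deliberate oversimplification) shows where the first cancellation from the wall reflection lands: the reflected path is longer by twice the distance to the wall, and cancellation occurs where that path difference equals half a wavelength.

```python
# Idealized wall-reflection notch: the reflected path is longer by 2*d, and
# the first cancellation occurs where 2*d is half a wavelength: f = c / (4*d).
C = 343.0  # speed of sound in air, m/s

def first_notch_hz(d_m):
    """Frequency (Hz) of the first cancellation for a source d_m from a wall."""
    return C / (4.0 * d_m)

print(first_notch_hz(0.25))  # 25 cm from the wall -> notch at 343.0 Hz
```

Real products, walls and desktops behave far less tidily than this, but the calculation shows why placement relative to the wall must be part of the test conditions.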



A Video on How We Evaluate the Sound Quality of iPod Docking Stations


The video shown at the top of the page illustrates how iPod Music Stations are currently evaluated in the Harman International Reference Listening Room. The acoustical properties and features of the room were described in detail in a previous posting.


In the video you see a trained listener comparing three different iPod Music Stations situated on our automated in-wall speaker mover, configured with a removable shelf and desktop. An acoustically transparent, visually opaque screen is placed between the listener and the products under test so that the test is double-blind (note: the term double-blind implies that neither the listener nor the experimenter knows the identities of the products currently selected, since the computer controls and randomly assigns the letters A/B/C to the products in each trial).


The listener can switch among the different products at will and enter their responses via a wireless PDA equipped with a custom listening test software (LTS) client application. Sound quality ratings are given on a number of pre-defined scales that include preference, spectral balance, distortion and auditory image size. This is repeated twice using four different programs.


The PDA client communicates with the LTS server application that performs the following functions:


  • A test wizard that defines all experimental design and setup parameters (perceptual scales, presentation of stimuli, program, randomization of test objects, playback level, etc.), which are then stored in a database
  • Automation and administration of the listening test and its hardware (e.g. speaker mover, media player, DSP, audio switcher)
  • Collection, storage and statistical analysis of listening test data
  • Real-time monitoring of the listener’s performance and ratings during the test


LTS makes conducting listening tests an efficient and repeatable process by minimizing human interaction and errors in the listening test setup, storage, and analysis of the results.


Conclusions


This article has described a listening test method used for evaluating iPod Music Stations, with the goal of providing accurate, reliable and valid sound quality ratings. In Part 2, I will show results from a recent listening test conducted on different iPod Music Stations, followed by acoustical measurements of the products in Part 3. By studying the relationship between well-controlled scientific listening tests and comprehensive acoustical measurements of iPod Music Stations, a meaningful technical specification based on sound quality can be found.



Saturday, May 30, 2009

Harman's "How to Listen" - A New Computer-based Listener Training Program


Trained listeners with normal hearing are used at Harman International for all standard listening tests related to research and competitive benchmarking of consumer, professional and automotive audio products. This article explains why we use trained listeners, and describes a new computer-based software program developed for training and selecting Harman listeners.


Why Train Listeners?

There are many compelling reasons for training listeners. First, trained listeners produce more discriminating and reliable judgments of sound quality than untrained listeners [1]. This means that fewer listeners are needed to achieve the same statistical confidence, resulting in considerable cost savings. Second, trained listeners are taught to identify, classify and rate important sound quality attributes using precise, well-defined terms to explain their preferences for certain audio systems and products. Vague audiophile terms such as “chocolaty”, “silky” or “the bass lacks pace, rhythm or musicality” are NOT part of the trained listener's vocabulary since these descriptors are not easily interpreted by audio engineers who must use the feedback from the listening tests to improve the product. Third, the Harman training itself, so far, has produced no apparent bias when comparing the loudspeaker preferences of trained versus untrained listeners [1]. This allows us to safely extrapolate the preferences of trained listeners to those of the general untrained population of listeners (e.g. most consumers).



Harman's “How to Listen” Listener Training Program

Harman’s “How to Listen” is a new computer-based software application that helps Harman scientists efficiently train and select listeners for psychoacoustic research and product evaluation. The self-administered program has 17 different training tasks that focus on four different attributes of sound quality: timbre (spectral effects), spatial attributes (localization and auditory imagery characteristics), dynamics, and nonlinear distortion artifacts. Each training task starts at a novice level and gradually advances in difficulty based on the listener’s performance. Constant feedback on the listener’s responses is provided to improve their learning and performance. A presentation of the training software can be viewed in parts 1 and 2.


Spectral Training Tasks

There are two different spectral training tasks. In the Band Identification training task, the listener compares a reference (Flat) and an equalized version of the music program (EQ), and must determine which frequency band is affected by the equalization (see slide 5 of part 2). The types of filters include peaks, dips, combinations of peaks and dips, high/low shelving filters, and low-pass/high-pass/bandpass filters. The task is aimed at teaching listeners to identify spectral distortions in precise, quantitative terms (filter type, frequency, Q and gain) that directly correspond to a frequency response measurement.
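The terms the listener learns (filter type, frequency, Q and gain) map directly onto a parametric equalizer. As a sketch of the kind of peaking filter such a task might apply (this uses the standard audio-EQ-cookbook biquad; it is my illustration, not Harman's actual training DSP):

```python
import numpy as np

def peaking_biquad(f0, gain_db, q, fs):
    """Peaking-EQ biquad coefficients (audio-EQ-cookbook form)."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

def gain_at(b, a, f, fs):
    """Magnitude response of the biquad, in dB, at frequency f."""
    z = np.exp(-2j * np.pi * f / fs)  # z here stands for z^-1
    h = (b[0] + b[1] * z + b[2] * z ** 2) / (a[0] + a[1] * z + a[2] * z ** 2)
    return 20 * np.log10(abs(h))

# A +6 dB peak at 1 kHz with Q = 2, sampled at 48 kHz:
b, a = peaking_biquad(1000, 6.0, 2.0, 48000)
print(round(gain_at(b, a, 1000, 48000), 2))  # 6.0 dB at the center frequency
print(round(gain_at(b, a, 100, 48000), 2))   # near 0 dB far below the peak
```

Identifying the affected band in a test like this amounts to hearing where on the frequency axis such a boost or cut has been applied.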


At the easiest skill level there are only two frequency band choices, which are easily detected and classified. However, as the listener advances, the audio bandwidth is subdivided into multiple frequency bands, making the audibility and identification of the affected frequency band more challenging.


The Spectral Plot training exercise takes this one step further. The listener compares different music selections equalized to simulate more complex frequency response shapes commonly found in measurements of loudspeakers in rooms and automotive spaces. The listener is given a choice of frequency curves which they must correctly match to the perceived spectral balances of the stimuli. This teaches listeners to graphically draw the perceived timbre of an audio component as a frequency response curve. Once trained, listeners become quite adept at drawing the perceived spectral balance of different loudspeakers, and these graphs closely correspond to their acoustical measurements [2], [3].


Sound Quality Attribute Tasks

The purpose of this task is to familiarize the listener with each of the four sound quality attributes (timbre, spatial, dynamics and nonlinear distortion) and their sub-attributes, and to measure the listener’s ability to reliably rate differences in an attribute’s intensity. For example, in one task the listener must rank-order two or more stimuli according to the intensity of the brightness/dullness applied to the processed music selection. As the difficulty of the task increases, the listener must rate more stimuli that have incrementally smaller differences in intensity of the tested attribute. Listener performance is calculated using Spearman’s rank correlation coefficient, which expresses the degree to which the stimuli have been correctly rank-ordered on the attribute scale.
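Spearman’s coefficient is simple enough to compute by hand. The following sketch (my illustration; not the actual “How to Listen” scoring code) scores a hypothetical listener’s ranking of four stimuli against the true order, assuming no tied ranks:

```python
def spearman_rho(true_rank, listener_rank):
    """Spearman's rank correlation for untied ranks:
    rho = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(true_rank)
    d2 = sum((t - l) ** 2 for t, l in zip(true_rank, listener_rank))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Four stimuli whose true brightness order is 1 < 2 < 3 < 4.
print(spearman_rho([1, 2, 3, 4], [1, 3, 2, 4]))  # middle pair swapped: 0.8
print(spearman_rho([1, 2, 3, 4], [4, 3, 2, 1]))  # fully reversed: -1.0
```

A perfect ranking scores 1.0, so a threshold on the coefficient gives a natural pass/fail criterion as the task difficulty increases.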


Preference Training

In this task, the listener enters preference ratings for different music selections that have had one or more attributes (timbre, spatial, dynamics and nonlinear distortion) modified by incremental amounts.


By studying the interrelationship between the modification of these attributes and the preference ratings, Harman scientists can uncover how listeners weight different attributes when formulating their preferences. From this, the preference profile of a listener can be mapped based on the importance they place on certain sound quality attributes. The performance metric in the preference task is based on the F-statistic calculated from an ANOVA performed on the individual listener’s data. The higher the F-statistic, the more discriminating and/or consistent the listener’s ratings are: a highly desirable trait in the selection of a listener.
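As a rough illustration of why the F-statistic rewards discrimination and consistency, consider a one-way ANOVA over a listener’s repeated ratings of each product. The ratings below are invented for illustration only:

```python
from scipy.stats import f_oneway  # one-way ANOVA

# Hypothetical preference ratings (four trials each) for three products.
discriminating = ([8, 8, 9, 8], [5, 5, 4, 5], [2, 3, 2, 2])
inconsistent   = ([8, 2, 9, 4], [5, 8, 1, 6], [2, 7, 5, 9])

f_hi, _ = f_oneway(*discriminating)  # distinct, repeatable ratings
f_lo, _ = f_oneway(*inconsistent)    # noisy, overlapping ratings
print(f_hi > f_lo)  # True: the discriminating listener earns the larger F
```

The F-statistic is the ratio of between-product variance to within-product (trial-to-trial) variance, so it is large exactly when a listener separates the products clearly and repeats those judgments reliably.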


Other Key Features

Harman’s “How to Listen” training software runs on both Windows and Mac OS X platforms, and includes a real-time DSP engine for manipulating the various sound quality attributes. Most common stereo and multichannel sound formats are supported. In “Practice Mode”, the user can easily add their own music selections.


All of the training results from the 100+ listeners located at Harman locations world-wide are stored in a centralized database. A web-based front end will allow listeners to log in to monitor and compare their performance to that of other listeners currently in training. Of course, the identities of the other listeners always remain confidential.


Conclusion

In summary, Harman’s “How to Listen” is a new computer-based, self-guided software program that teaches listeners how to identify, classify and rate the quality of recorded and reproduced sounds according to their timbral, spatial, dynamic and nonlinear distortion attributes. The training program gives constant performance feedback and analytics that allow the software to adapt to the ability of the listener. These performance metrics are used for selecting the most discriminating and reliable listeners used for research and subjective testing of Harman audio products.


References

[1] Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. AES, Vol. 51, issue 9, pp. 806-825, September 2003. Download for free here, courtesy of Harman International.


[2] Sean E. Olive, “A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part I - Listening Test Results,” presented at the 116th AES Convention (May 2004).


[3] Floyd E. Toole, Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, Focal Press (July 2008). Available from Amazon here.


Thursday, January 1, 2009

A Video on How We Measure Loudspeaker Sound Quality at Harman International


Part of my job at Harman International involves participating in audio dealer training and press events. This involves a 1-2 day field trip to Harman's R&D labs in Northridge where the visitors experience first-hand the listener training process, and participate in a double-blind loudspeaker listening test. Visitors usually leave our labs with a heightened appreciation and respect for the scientific efforts behind the development and testing of new models of Revel, JBL, and Infinity loudspeakers.

A few years ago, Infinity commissioned a video known as "Infinity Academy", aimed at encapsulating the 1-2 day training event on a DVD. Chapter 6, the "Final Test," discusses listener training and the double-blind listening test, in which trained listeners evaluate the Harman prototype loudspeaker against its best competitors. The goal is to achieve "best-in-class" performance, attained only when the prototype receives a preference rating higher than its best competitor. If the loudspeaker fails on its first attempt, the listeners' feedback is used to re-engineer the loudspeaker, after which it is resubmitted for another listening test.

The picture to the right shows three loudspeakers on the automated speaker shuffler in the Multichannel Listening Lab. The shuffler brings each loudspeaker into the exact same position within 3 seconds, so that any loudspeaker positional biases are removed from the listening test.

Chapter 6 can be downloaded in MPEG-4 (H.264, 41 MB) or MPEG (84 MB) formats. All six chapters of the DVD are available here.

Saturday, December 27, 2008

Part 2 - Differences in Performances of Trained Versus Untrained Listeners

Part 1 of this article summarized the results of a controlled listening experiment conducted by the author where 300+ listeners, both trained and untrained, rated 4 different loudspeakers based on their preference. The results revealed that the trained and untrained listeners had essentially the same loudspeaker preferences (see reference 1). This provides scientific validation for using trained listeners in loudspeaker tests since their preferences can be safely extrapolated to the preferences of the general population of untrained listeners.

In Part 2, we examine why trained listeners are preferred over untrained listeners for use in listening experiments, by examining differences in performance between the two groups. A common performance metric is the F-statistic, calculated by performing an analysis of variance (ANOVA) on the individual listener's loudspeaker ratings. The F-statistic increases as the listener's discrimination and reliability increase. This facet of listener performance is highly desirable for scientists (and bean counters), since fewer listeners and trials are required to achieve an equivalent level of statistical confidence. Some researchers have reported that one trained listener is the statistical equivalent of 8+ untrained listeners, which translates into considerable cost savings when trained listeners are used for audio product testing and research.

The above graph plots the mean loudspeaker F-statistics for four groups of untrained listeners categorized according to their occupations. The performance scores of the untrained groups are scaled relative to the mean score of the trained listeners in order to facilitate comparisons between trained and untrained listeners. The trained listeners clearly performed better than any of the untrained groups, by quite a large margin. The relative performances of the untrained groups, from best to worst, were: audio retailers (35%), audio reviewers (20%), the audio marketing-sales group (10%), and college students (4%).

The better performance of the audio retailers relative to the other untrained groups may be related to psychological factors such as motivation, expectations, and relevant critical listening experience. The college students - the poorest performing group - were also the youngest and least experienced test subjects. They tended to give all four loudspeakers very similar and very high ratings indicating they were easily satisfied. While this is pure speculation, the students may have had lower sound quality expectations developed through hours of listening to low quality MP3 files reproduced through band-limited earbuds. Most surprising was the relatively poor performance of the audio reviewers, who despite their credentials and years of professional experience, performed 1/5 as well as the trained listeners, and 15 full percentage points lower than the audio retailers. These differences in trained and untrained listener performance underscore the benefits of carefully selecting and training the listeners used for audio product testing and research.

In the next installment of this article, technical measurements of the loudspeakers used in these experiments will be presented. From this, we will explore what aspects of their performance lead to higher preference ratings in controlled listening tests.

Reference 1: Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. AES, Vol. 51, issue 9, pp. 806-825, September 2003. (This paper can be purchased from the Audio Engineering Society here, or downloaded for free courtesy of Harman International.)

Friday, December 26, 2008

Part 1- Do Untrained Listeners Prefer the Same Loudspeakers as Trained Listeners?


One of the more controversial topics among audio researchers is whether or not trained listeners should be used for audio product testing and research. The argument against using trained listeners is based on a belief that their tastes and preferences in sound quality are fundamentally different from those of the general untrained listener population for whom the product is intended.

There are few published studies to support the notion that trained listeners have different loudspeaker preferences than untrained listeners. To study this question, the author conducted a large study (see reference 1) that compared the loudspeaker preferences of 300+ untrained and trained listeners. Over the course of 18 months, an identical controlled, double-blind listening test was repeated with different groups of trained and untrained listeners who rated 4 different loudspeakers on an 11-point preference scale using 4 different music programs. Loudspeaker positional effects were controlled via an automated speaker shuffler that moves each loudspeaker into the exact same position.

The mean loudspeaker preference ratings for the different groups of listeners are summarized in the above graph. In terms of rank order, the loudspeaker preferences of the untrained listeners (highlighted in red) are essentially the same as those of the trained listeners (highlighted in blue). As a group, the trained listeners tended to give lower ratings, suggesting they may be more difficult to please. An important conclusion from this study is that the loudspeaker preferences of trained listeners can be safely extrapolated to the tastes of consumers having little or no formal listener training. The study did find significant differences between the trained and untrained listeners in terms of how well they performed their listening task. This will be discussed in Part 2, which will appear in the next posting of this blog.

Reference 1: Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. AES, Vol. 51, issue 9, pp. 806-825, September 2003.

This paper can be purchased from the Audio Engineering Society here, or downloaded for free courtesy of Harman International.