Saturday, October 31, 2009

Audio's Circle of Confusion

Audio’s “Circle of Confusion” is a term coined by Floyd Toole [1] that describes the confusion that exists within the audio recording and reproduction chain due to the lack of a standardized, calibrated monitoring environment. Today, the circle of confusion remains the single largest obstacle in advancing the quality of audio recording and reproduction.

The circle of confusion is graphically illustrated in Figure 1. Music recordings are made with (1) microphones that are selected, processed, and mixed by (2) listening through professional loudspeakers, which are designed by (3) listening to recordings, which are (1) made with microphones that are selected, processed, and mixed by (2) listening through professional monitors... you get the idea. Both the creation of the art (the recording) and its reproduction (the loudspeakers and room) are trapped in an interdependent circular relationship where the quality of one is dependent on the quality of the other. Since the playback chain and room through which recordings are monitored are not standardized, the quality of recordings remains highly variable.


Creating Music Recordings Through An Uncalibrated Instrument


A random sampling of one's own music library will quickly confirm the variation in sound quality that exists among different music recordings. Apart from audible differences in dynamic range, spatial imagery, and noise and distortion, the spectral balance of recordings can vary dramatically in terms of their brightness and, particularly, the quality and quantity of bass. The magnitude of these differences suggests that something other than variations in artistic judgment and good taste is at the root of this problem.


The most likely culprits are the loudspeakers and rooms through which the recordings were made. While there are many excellent professional near-field monitors in the marketplace today, there are no industry guidelines or standards to ensure that they are used. The lack of meaningful, perceptually relevant loudspeaker specifications makes the excellent loudspeakers difficult to identify and separate from the truly mediocre ones. To make matters worse, some misguided recording engineers monitor and tweak their recordings through low-fidelity loudspeakers thinking that this represents what the average consumer will hear. Since loudspeakers can be mediocre in an infinite number of ways, this practice only guarantees that the quality of the recording will be compromised when heard through good loudspeakers [1]. This is very counterproductive if we want to improve the quality and consistency of audio recording and reproduction.


Another significant source of variation in the recording process stems from acoustical interactions between the loudspeaker and the listening room [1]-[3]. Below 300-500 Hz, the placement of the loudspeaker and listener can cause >18 dB variations in the in-room response due to room resonances and the proximity of the loudspeaker to a room boundary.
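To give a feel for where these room resonances fall, here is a minimal sketch that lists the axial standing-wave frequencies along one dimension of an idealized rigid-walled rectangular room, using the textbook relation f = n·c/(2L). The function name, the speed-of-sound value, and the example room length are illustrative assumptions, not measurements from the studies cited above, and real rooms also have tangential and oblique modes that this ignores.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def axial_mode_frequencies(length_m, max_hz=300.0, c=SPEED_OF_SOUND_M_S):
    """Return axial mode frequencies (Hz) along one room dimension.

    Idealized model: parallel rigid walls, f_n = n * c / (2 * L).
    """
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    fundamental = c / (2.0 * length_m)
    modes = []
    n = 1
    while n * fundamental <= max_hz:
        modes.append(n * fundamental)
        n += 1
    return modes


if __name__ == "__main__":
    # Hypothetical 5 m room length, chosen only for illustration.
    for freq in axial_mode_frequencies(5.0, max_hz=150.0):
        print(f"{freq:.1f} Hz")
```

Even this simplified model shows that a room of ordinary size has only a handful of widely spaced resonances in the bass region, which is one reason the response there depends so strongly on where the loudspeaker and listener sit.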


Evidence of these acoustical interactions is well documented in a survey of 164 professional recording studios where the same high-quality, factory-calibrated monitor was installed [4]. Figure 2 shows the distribution of in-room responses measured at the primary listening location where the recordings are monitored and mixed. The 1/3-octave smoothed curves show a reasonably tight ± 2.5 dB variation above 1 kHz. However, below 1 kHz, the variation in the in-room response gets progressively worse at lower frequencies. Below 100 Hz, the in-room bass response can vary by as much as 25 dB among the different control rooms! You needn’t look any further than here to understand why the quality and quantity of bass is so variable among the recordings in your music library.


Evaluating Loudspeakers When the Recording is a Nuisance Variable


Loudspeaker manufacturers are also trapped in the circle of confusion since music recordings are used by listening panels, audio reviewers, and consumers to ultimately judge the sound quality of the loudspeaker. The problem is that distortions in the recording cannot be easily separated from those produced by the loudspeaker. For example, a recording that is too bright can make a dull loudspeaker sound good, and an accurate loudspeaker sound too bright [5]. A review of the scientific literature on loudspeaker listening tests indicates that recordings are a serious nuisance variable that needs to be carefully selected and controlled in the experimental design and analysis of test results.


At Harman International, we try to minimize loudspeaker-program interactions in our loudspeaker listening tests by using well-recorded programs that are equally sensitive to distortions found in loudspeakers. Listeners become intimately familiar with the sonic idiosyncrasies of the different programs through extensive listener training and participation in formal tests. In each trial of a loudspeaker test, the listener can switch between different loudspeakers using the same program, which allows them to better separate the distortions in the program (which are constant), from the distortions in the loudspeaker.


Through 25+ years of well-controlled loudspeaker listening tests, scientists have identified the important loudspeaker parameters related to good sound, which can be quantified in a set of acoustical measurements [6],[7]. By applying some statistics to these measurements, listeners’ loudspeaker preferences can be predicted [8]. The bass performance of the loudspeaker alone accounts for 30% of the listener’s overall preference rating. Good bass is essential to our enjoyment of music, yet unfortunately the bass range is where loudspeakers and rooms are most variable (see Figure 2). Controlling the behavior of loudspeakers and rooms at low frequencies is essential to achieving a more consistent quality of audio recording and reproduction. Fortunately, there are technology solutions today that provide effective control of acoustical interactions between loudspeakers and rooms.


Breaking the Circle of Confusion


As Toole points out in [1], the key in breaking the circle of confusion lies in the hands of the professional audio industry where the art is created. A meaningful standard that defined the quality and calibration of the loudspeaker and room would improve the quality and consistency of recordings. The same standard could then be applied to the playback of the recording in the consumer’s home or automobile. Finally, consumers would be able to hear the music as the artist intended.


References


[1] Floyd E. Toole, Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, Focal Press (July 2008).


[2] Floyd Toole, “Loudspeakers and Rooms: A Scientific Review,” J. Audio Eng. Soc., Vol. 54, No. 6, (2006 June). A free copy of this paper can be downloaded here


[3] Sean E. Olive and William Martens “Interaction Between Loudspeakers and Room Acoustics Influences Loudspeaker Preferences in Multichannel Audio Reproduction,” presented at the 123rd Convention of the AES, preprint 7196 (October 2007).


[4] Aki V. Mäkivirta and Christophe Anet, “The Quality of Professional Surround Audio Reproduction, A Survey Study,” 19th International AES Conference: Surround Sound - Techniques, Technology, and Perception (June 2001).


[3] Todd Welti and Allan Devantier, “Low-frequency Optimization Using Multiple Subwoofers,” J. Audio Eng. Soc., Vol. 54, No. 5, (May 2006). A free copy of this paper can be downloaded here


[4] Sean E. Olive, John Jackson, Allan Devantier, David Hunt, and Sean Hess, “The Subjective and Objective Evaluation of Room Correction Products,” presented at the 127th AES Convention, New York, preprint 7960 (October 2009).


[5] Sean E. Olive, “The Preservation of Timbre: Microphones, Loudspeakers, Sound Sources and Acoustical Spaces,” 8th International AES Conference: The Sound of Audio (May 1990).


[6] Floyd E. Toole, “Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 1,” J. Audio Eng. Soc., Vol. 34, No. 4, pp. 227-235, (April 1986). A free copy of this paper can be downloaded here


[7] Floyd E. Toole, “Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 2,” J. Audio Eng. Soc., Vol. 34, No. 5, pp. 323-348, (May 1986). A free copy of this paper can be downloaded here


[8] Sean E. Olive, “A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part II - Development of the Model,” presented at the 117th Convention of the AES, preprint 6190 (October 2004).


Sunday, June 14, 2009

Validation of a Binaural Room Scanning Measurement System for Subjective Evaluation of Automotive Audio Systems


In a previous posting on Audio Musings, I described Harman’s binaural room scanning (BRS) measurement and playback system. BRS is a powerful audio research and testing tool that allows Harman scientists to capture, store and later reproduce through a head-tracking headphone-based auditory display the acoustical signature of one or more audio systems situated in the same or different listening spaces. BRS makes it practical to conduct double-blind listening evaluations of different loudspeakers, listening rooms, and automotive audio systems in a very controlled and efficient way.


I also pointed out that all binaural recording/playback systems contain errors that require proper calibration for their removal. However, removing all BRS errors can become very expensive and impractical, so some compromise is necessary. This precipitates the need to experimentally validate the performance of the BRS system to ensure that the remaining errors after calibration do not significantly change listeners’ perceptual ratings of audio systems evaluated through the BRS system as compared to in situ evaluations.

To this end, Todd Welti, Research Acoustician at Harman International, and I recently presented the results of a series of BRS validation tests performed using different equalizations of a high quality automotive audio system [1]. You can view the PowerPoint presentation of the conference paper here. For more detailed information on this experiment, you can view the proceedings from the recent 36th AES Automotive Conference in Dearborn, Michigan, when they become available in the AES e-library.


To assess the accuracy of the BRS system, a group of trained listeners gave double-blind preference ratings for different equalizations of the audio system evaluated under both in situ (in the car) and BRS playback conditions. For the BRS playback condition, the listener sat in the same car listening to a virtual headphone-based reproduction of the car's audio system. The purpose of the experiment was to determine whether the BRS and in situ methods produced the same preference ratings for different equalizations of the car's audio system.


Listeners gave preference ratings for five different equalizations using 4 different music programs reproduced in mono (left front speaker), stereo (left and right front channels) and surround sound (7.1 channels). The three playback modes were tested separately to isolate potential issues related to differences in how the BRS system reproduced front versus rear, and hard versus phantom-based, auditory images.


The listening test results showed there were no statistically significant differences in equalization preferences between the in situ and BRS playback methods. This was true for mono, stereo and multichannel playback modes (see slides 21-23). An interesting finding was that these results were achieved using a BRS calibration based on a single listener whose calibration tended to work well for the other listeners on the panel. This suggests that individualized listener calibrations for BRS-based listening tests may not be necessary, so long as the calibration and listeners are carefully selected.


In conclusion, this validation experiment provides experimental evidence that a properly calibrated BRS measurement and playback system can produce similar preferences in automotive audio equalization as measured using in situ listening tests.



Reference

[1] Sean E. Olive, Todd Welti, “Validation of a Binaural Car Scanning Measurement System for Subjective Evaluation of Automotive Audio Systems,” presented at the 36th International AES Automotive Audio Conference, (June 2-4, 2009).

Thursday, June 11, 2009

Whole-Body Vibration Associated with Low-Frequency Audio Reproduction Influences Preferred Equalization

Last week I attended the AES Automotive Audio Conference in Dearborn, Michigan where about 70-odd (pun intended) audio scientists and engineers gathered to discuss the latest scientific and technological developments in automotive audio. A detailed description of the program can be found here.

This article focuses on a paper I co-authored and presented called “Whole-body Vibration Associated with Low-Frequency Audio Reproduction Influences Preferred Equalization” [1]. The work was a joint effort between three researchers, Drs. William Martens, Wieslaw Woszczyk, and Hideki Sakanashi, from the CIRMMT at McGill University in Montreal, and myself, at Harman International. A copy of our PowerPoint presentation given at the conference can be viewed here.

It is well established that human perception is a multimodal sensory experience [2]. For example, both auditory and visual cues associated with a sound source and its acoustic space are integrated and interrogated by high level cognitive processes that determine our spatial perception of the source based on the plausibility, strength and agreement between the visual and auditory cues. Bimodal sensory interactions have been reported in studies where the video quality of the picture influences listeners’ judgment of the audio system’s sound quality and vice versa (although the audio quality has much less influence on the perceived quality of video than vice versa) [3].

However, little is known regarding how low frequency (below 100 Hz) whole-body vibration produced by the audio system influences our perception of the quality and quantity of bass. Perhaps the most related study is Rudmose’s “case of the missing 6 dB” where the perceived loudness of low frequency signals reproduced through headphones was reported to be approximately 6 dB lower than that of loudspeakers producing the equivalent sound pressure levels at the ears [4]. Rudmose showed that the absence of tactile stimulus in headphone reproduction could, in part, account for why headphones sound less loud than loudspeakers when producing equivalent sound pressure level at the ears (the rest of the missing 6 dB was due to experimental factors, and the increased physiological noise in the ear canal introduced by the coupling of the headphone to the ear).

A Tactile-Auditory Bimodal Sensory Experiment
To shed more light on this mystery, an experiment was conducted at McGill University. A total of 6 trained tonmeisters listened through calibrated headphones to binaural recordings of a virtual high-quality automotive audio system. Each listener adjusted the low frequency boost applied to different multichannel music reproductions according to their taste while experiencing a high and low level whole-body vibration. This was generated by a programmable motion platform driven by the low frequency portion (below 80 Hz) of the music signal. In this way, vibration was delivered to both the feet and body of the listener through the chair (see slide 5). The virtual automotive audio system was based on a binaural room scan (BRS) of the audio system installed in our research vehicle located at the Harman International Automotive Audio Research Lab in Northridge, California (see slide 3). For more information on how BRS works, please refer to my previous BRS blog postings, Part 1 and Part 2.

Whole-Body Vibration Influences Preferred Equalization of the Audio System
The researchers found that the preferred bass equalization of music reproduced through the virtual automotive audio system was significantly influenced by the level of whole-body vibration experienced. While the amount of preferred bass boost varied with music program and listener, the listeners always preferred less bass for the high vibration condition than for the low vibration condition, in which the vibration level was 12 dB lower. On average, listeners preferred 6 dB less bass boost in their headphones when moving from the low to the high vibration condition (see slide 10). In other words, there was a bimodal sensory interaction effect between the auditory and tactile senses that influenced listeners' preferred bass equalization of music reproduced through the headphones.
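For readers who do not think in decibels every day, the short sketch below converts the 12 dB and 6 dB figures into linear ratios. It assumes the usual conventions (20·log10 for amplitude-like quantities such as sound pressure or vibration amplitude, 10·log10 for power); the function names are mine, and the post does not say exactly which physical quantity the vibration level was measured in.

```python
def db_to_amplitude_ratio(level_db):
    """Convert a level difference in dB to an amplitude ratio (20*log10 convention)."""
    return 10.0 ** (level_db / 20.0)


def db_to_power_ratio(level_db):
    """Convert a level difference in dB to a power ratio (10*log10 convention)."""
    return 10.0 ** (level_db / 10.0)


if __name__ == "__main__":
    # A level 12 dB lower corresponds to about one quarter of the amplitude.
    print(f"-12 dB amplitude ratio: {db_to_amplitude_ratio(-12.0):.3f}")
    # 6 dB less boost corresponds to about half the amplitude,
    # or about one quarter of the power.
    print(f" -6 dB amplitude ratio: {db_to_amplitude_ratio(-6.0):.3f}")
    print(f" -6 dB power ratio:     {db_to_power_ratio(-6.0):.3f}")
```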

It is important to note that the 6 dB effect reported here may not be the same as observed in an automobile where the level and other physical characteristics of the vibration observed may be different from what was tested here. Under driving conditions, listeners experience additional sources of vibration (and acoustic noise) from the road and engine of the vehicle that may partially mask the whole-body vibration effects produced by the audio system. More research is currently underway to study how real and simulated whole-body vibration in vehicles influences listeners' perception of the audio system and its sound quality.

References
[1] William Martens, Wieslaw Woszczyk, Hideki Sakanashi, and Sean E. Olive, “Whole-Body Vibration Associated with Low-Frequency Audio Reproduction Influences Preferred Equalization,” presented at the AES 36th International Conference, Dearborn, Michigan (June 2-4, 2009).

Saturday, May 30, 2009

Harman's "How to Listen" - A New Computer-based Listener Training Program


Trained listeners with normal hearing are used at Harman International for all standard listening tests related to research and competitive benchmarking of consumer, professional and automotive audio products. This article explains why we use trained listeners, and describes a new computer-based software program developed for training and selecting Harman listeners.


Why Train Listeners?

There are many compelling reasons for training listeners. First, trained listeners produce more discriminating and reliable judgments of sound quality than untrained listeners [1]. This means that fewer listeners are needed to achieve the same statistical confidence, resulting in considerable cost savings. Second, trained listeners are taught to identify, classify and rate important sound quality attributes using precise, well-defined terms to explain their preferences for certain audio systems and products. Vague audiophile terms such as “chocolaty”, “silky” or “the bass lacks pace, rhythm or musicality” are NOT part of the trained listener's vocabulary since these descriptors are not easily interpreted by audio engineers who must use the feedback from the listening tests to improve the product. Third, the Harman training itself, so far, has produced no apparent bias when comparing the loudspeaker preferences of trained versus untrained listeners [1]. This allows us to safely extrapolate the preferences of trained listeners to those of the general untrained population of listeners (e.g. most consumers).



Harman's “How to Listen” Listener Training Program

Harman’s “How to Listen” is a new computer-based software application that helps Harman scientists efficiently train and select listeners used for psychoacoustic research and product evaluation. The self-administered program has 17 different training tasks that focus on four different attributes of sound quality: timbre (spectral effects), spatial attributes (localization and auditory imagery characteristics), dynamics, and nonlinear distortion artifacts. Each training task starts at a novice level, and gradually advances in difficulty based on the listeners’ performance. Constant feedback on the listener's responses is provided to improve their learning and performance. A presentation of the training software can be viewed in parts 1 and 2.


Spectral Training Tasks

There are two different spectral training tasks. In the Band Identification training task, the listener compares a reference (Flat) and an equalized version of the music program (EQ), and must determine which frequency band is affected by the equalization (see slide 5 of part 2). The types of filters include peaks, dips, peaks and dips, high/low shelving and low/high/bandpass filters. The task is aimed at teaching listeners to identify spectral distortions in precise, quantitative terms (filter type, frequency, Q and gain) that directly correspond to a frequency response measurement.
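To make "frequency, Q and gain" concrete, here is a minimal sketch of a single peak/dip filter described by exactly those three parameters. It uses the widely published "Audio EQ Cookbook" peaking-filter formulas; this is an illustration of the filter type only, and I am not claiming it is how the "How to Listen" DSP engine is implemented. The function names and the example settings are assumptions.

```python
import cmath
import math


def peaking_biquad(center_hz, q, gain_db, sample_rate_hz):
    """Return normalized (b, a) coefficients for a peak (gain_db > 0) or dip (gain_db < 0)."""
    if not 0.0 < center_hz < sample_rate_hz / 2.0:
        raise ValueError("center_hz must lie between 0 and the Nyquist frequency")
    if q <= 0.0:
        raise ValueError("q must be positive")
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    # Normalize so that a[0] == 1, the usual form for a difference equation.
    return [c / a[0] for c in b], [c / a[0] for c in a]


def magnitude_db(b, a, freq_hz, sample_rate_hz):
    """Evaluate the filter's magnitude response (dB) at one frequency."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(num / den))


if __name__ == "__main__":
    # Hypothetical training stimulus: a +6 dB peak at 1 kHz with Q = 2.
    b, a = peaking_biquad(1000.0, 2.0, 6.0, 48000.0)
    for f in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
        print(f"{f:7.0f} Hz: {magnitude_db(b, a, f, 48000.0):+.2f} dB")
```

Printing the response at a few frequencies shows the link the training is meant to build: the same three numbers a listener learns to name by ear are the ones that define the bump in a measured frequency response.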


At the easiest skill level, there are only 2 frequency band choices, which are easily detected and classified. However, as the listener advances, the audio bandwidth is subdivided into multiple frequency bands making the audibility and identification of the affected frequency band more challenging.


The Spectral Plot training exercise takes this one step further. The listener compares different music selections equalized to simulate more complex frequency response shapes commonly found in measurements of loudspeakers in rooms and automotive spaces. The listener is given a choice of frequency curves which they must correctly match to the perceived spectral balances of the stimuli. This teaches listeners to graphically draw the perceived timbre of an audio component as a frequency response curve. Once trained, listeners become quite adept at drawing the perceived spectral balance of different loudspeakers, and these graphs closely correspond to their acoustical measurements [2], [3].


Sound Quality Attribute Tasks

The purpose of this task is to familiarize the listener with each of the four sound quality attributes (timbre, spatial, dynamics and nonlinear distortion) and their sub-attributes, and measure the listener's ability to reliably rate differences in the attribute's intensity. For example, in one task the listener must rank order the relative brightness/dullness of two or more stimuli based on the intensities of the brightness/dullness of the processed music selection. As the difficulty of the task increases, the listener must rate more stimuli that have incrementally smaller differences in intensity of the tested attribute. Listener performance is calculated using Spearman’s rank correlation coefficient which expresses the degree to which stimuli have been correctly rank ordered on the attribute scale.
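As a rough illustration of the scoring idea, the sketch below computes Spearman's rank correlation between the true ordering of the stimuli and a listener's ordering. It is a generic textbook implementation (Pearson correlation of average ranks, so ties are handled); the function names and the example data are hypothetical and are not taken from the Harman software.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(rx, ry))
    var_x = sum((p - mean_x) ** 2 for p in rx)
    var_y = sum((q - mean_y) ** 2 for q in ry)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical trial: five stimuli ordered by true brightness (1 = dullest),
    # and a listener who swaps the two brightest ones.
    true_order = [1, 2, 3, 4, 5]
    listener_order = [1, 2, 3, 5, 4]
    print(spearman_rho(true_order, listener_order))
```

A perfect ordering scores 1, a completely reversed ordering scores -1, and a single swap of neighboring stimuli, as in the example, lands a little below 1.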


Preference Training

In this task, the listener enters preference ratings for different music selections that have had one or more attributes (timbre, spatial, dynamics and nonlinear distortion) modified by incremental amounts.


By studying the interrelationship between the modification of these attributes and the preference ratings, Harman scientists can uncover how listeners weight different attributes when formulating their preferences. From this, the preference profile of a listener can be mapped based on the importance they place on certain sound quality attributes. The performance metric in the preference task is based on the F-statistic calculated from an ANOVA performed on the individual listeners’ data. The higher the F-statistic, the more discriminating and/or consistent the listeners’ ratings are --- a highly desirable trait in the selection of a listener.
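The intuition behind using the F-statistic can be sketched with a simple one-way ANOVA: treat each stimulus as a group and each repeated rating of it as an observation. The post does not spell out the exact ANOVA design used in the software, so the one-way layout, the function name, and the example ratings below are all assumptions made for illustration.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of rating groups.

    F = (between-group mean square) / (within-group mean square).
    """
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0.0:
        raise ValueError("F is undefined when there is no within-group variation")
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


if __name__ == "__main__":
    # Hypothetical repeated ratings of three stimuli by two listeners.
    # Both have the same mean rating per stimulus; the second is less repeatable.
    consistent_listener = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    erratic_listener = [[0, 2, 4], [1, 3, 5], [4, 6, 8]]
    print(one_way_anova_f(consistent_listener))
    print(one_way_anova_f(erratic_listener))
```

The two hypothetical listeners separate the stimuli by the same amount on average, but the one whose repeated ratings scatter less gets the larger F, which is the sense in which a higher F indicates a more discriminating and consistent listener.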


Other Key Features

Harman’s “How to Listen” training software runs on both Windows and Mac OS X platforms, and includes a real-time DSP engine for manipulating the various sound quality attributes. Most common stereo and multichannel sound formats are supported. In “Practice Mode”, the user can easily add their own music selections.


All of the training results from the 100+ listeners located at Harman locations world-wide are stored on a centralized database server. A web-based front end will allow listeners to log in to monitor and compare their performances to those of other listeners currently in training. Of course, the identities of the other listeners always remain confidential.


Conclusion

In summary, Harman’s “How to Listen” is a new computer-based, self-guided software program that teaches listeners how to identify, classify and rate the quality of recorded and reproduced sounds according to their timbral, spatial, dynamic and nonlinear distortion attributes. The training program gives constant performance feedback and analytics that allow the software to adapt to the ability of the listener. These performance metrics are used for selecting the most discriminating and reliable listeners used for research and subjective testing of Harman audio products.


References

[1] Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. AES, Vol. 51, issue 9, pp. 806-825, September 2003. Download for free here, courtesy of Harman International.


[2] Sean E. Olive, “A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part I - Listening Test Results,” presented at the 116th AES Convention (May 2004).


[3] Floyd E. Toole, Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, Focal Press (July 2008). Available from Amazon here


Saturday, May 23, 2009

The Harman International Reference Listening Room

Last week I returned from the AES Munich Convention where I gave a paper entitled “A New Reference Listening Room for Consumer, Professional, and Automotive Audio Research.” It describes the features, scientific rationale, and acoustical performance of a new reference listening room designed and built for the purposes of conducting controlled listening tests and psychoacoustic research for consumer, professional, and automotive audio products. The main features of the room include quiet and adjustable room acoustics, a high-quality calibrated playback system, an in-wall loudspeaker mover, and complete automated control of listening tests performed in the room. A copy of my Munich AES presentation is available here.


The first prototype reference room was built at the Harman Northridge campus in 2007. Additional reference listening rooms have since been built at Harman locations in the UK and Germany, with a fourth one being constructed in Farmington Hills, Michigan. We are in the process of measuring and calibrating the performances of the different rooms using acoustical measurements and binaural room scans, which will be evaluated for their perceptual similarity in sound quality.


With a standardized listening room and playback system, Harman scientists can conduct listener training, psychoacoustic research and product testing at different Harman locations throughout the world. The results from these different locations can be compared or pooled together since the room, playback system, and trained listeners are held constant. This brings greater testing efficiency, flexibility, and new opportunities in the kinds of product research and listening tests Harman will be able to do in the future. Already, we are using the unique features of these rooms to conduct very controlled listening tests on consumer in-wall speakers, and to research and benchmark the performance of various commercial and prototype loudspeaker-room correction devices.


You will hear a lot more about the Harman International reference listening rooms in the near future because of the pivotal role they will play in the research, testing and subjective benchmarking of new Harman consumer, professional and automotive audio products. Just thinking about these research possibilities makes me truly excited!

Thursday, April 9, 2009

The Dishonesty of Sighted Listening Tests



An ongoing controversy within the high-end audio community is the efficacy of blind versus sighted audio product listening tests. In a blind listening test, the listener has no specific knowledge of what products are being tested, thereby removing the psychological influence that the product’s brand, design, price and reputation have on the listeners’ impression of its sound quality. While double-blind protocols are standard practice in all fields of science - including consumer testing of food and wine - the audio industry remains stuck in the dark ages in this regard. The vast majority of audio equipment manufacturers and reviewers continue to rely on sighted listening to make important decisions about the products’ sound quality.

An important question is whether sighted audio product evaluations produce honest and reliable judgments of how the product truly sounds.


A Blind Versus Sighted Loudspeaker Experiment

This question was tested in 1994, shortly after I joined Harman International as Manager of Subjective Evaluation [1]. My mission was to introduce formalized, double-blind product testing at Harman. To my surprise, this mandate met rather strong opposition from some of the more entrenched marketing, sales and engineering staff who felt that, as trained audio professionals, they were immune from the influence of sighted biases. Unfortunately, at that time there were no published scientific studies in the audio literature to either support or refute their claims, so a listening experiment was designed to directly test this hypothesis. The details of this test are described in references 1 and 2.


A total of 40 Harman employees participated in these tests, giving preference ratings to four loudspeakers that covered a wide range of size and price. The test was conducted under both sighted and blind conditions using four different music selections.


The mean loudspeaker ratings and 95% confidence intervals are plotted in Figure 1 for both sighted and blind tests. The sighted tests produced a significant increase in preference ratings for the larger, more expensive loudspeakers G and D. (Note: G and D were identical loudspeakers except for their crossovers, which were voiced ostensibly for differences in German and Northern European tastes, respectively. The negligible perceptual differences between loudspeakers G and D found in this test resulted in the creation of a single loudspeaker SKU for all of Europe, and the demise of an engineer who specialized in the lost art of German speaker voicing.)
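For anyone wondering what goes into a plotted mean with a 95% confidence interval, here is a minimal sketch. It uses a normal approximation to the sampling distribution of the mean, which is a simplification I have chosen for brevity; it is not the analysis from the paper, and the function name and the ratings in the example are made up for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_with_confidence_interval(ratings, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation for the interval."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    center = mean(ratings)
    standard_error = stdev(ratings) / math.sqrt(len(ratings))
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * standard_error
    return center, center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical preference ratings for one loudspeaker in one condition.
    ratings = [4, 5, 6, 5, 4, 6, 5, 5]
    m, lo, hi = mean_with_confidence_interval(ratings)
    print(f"mean = {m:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

When the intervals of two conditions do not overlap, as for some of the sighted versus blind ratings in Figure 1, the difference is unlikely to be due to chance alone. With small panels a t-distribution would give slightly wider intervals than this approximation.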


Brand biases and employee loyalty to Harman products were also a factor in the sighted tests, since three of the four products (G, D, and S) were Harman branded. Loudspeaker T was a large, expensive ($3.6k) competitor's speaker that had received critical acclaim in the audiophile press for its sound quality. However, not even Harman brand loyalty could overpower listeners' prejudices associated with the relatively small size, low price, and plastic materials of loudspeaker S; in the sighted test, it was less preferred to Loudspeaker T, in contrast to the blind test where it was slightly preferred over loudspeaker T.


Loudspeaker positional effects were also a factor since these tests were conducted prior to the construction of the Multichannel Listening Lab with its automated speaker shuffler. The positional effects on loudspeaker preference rating are plotted in Figure 2 for both blind and sighted tests. The positional effects on preference are clearly visible in the blind tests, yet, the effects are almost completely absent in the sighted tests where the visual biases and cognitive factors dominated listeners' judgment of the auditory stimuli. Listeners were also less responsive to loudspeaker-program effects in the sighted tests as compared to the blind test conditions. Finally, the tests found that experienced and inexperienced listeners (both male and female) tended to prefer the same loudspeakers, which has been confirmed in a more recent, larger study. The experienced listeners were simply more consistent in their responses. As it turned out, the experienced listeners were no more or no less immune to the effects of visual biases than inexperienced listeners.


In summary, the sighted and blind loudspeaker listening tests in this study produced significantly different sound quality ratings. The psychological biases in the sighted tests were sufficiently strong that listeners were largely unresponsive to real changes in sound quality caused by acoustical interactions between the loudspeaker, its position in the room, and the program material. In other words, if you want to obtain an accurate and reliable measure of how the audio product truly sounds, the listening test must be done blind. It’s time the audio industry grow up and acknowledge this fact, if it wants to retain the trust and respect of consumers. It may already be too late according to Stereophile magazine founder, Gordon Holt, who lamented in a recent interview:


“Audio as a hobby is dying, largely by its own hand. As far as the real world is concerned, high-end audio lost its credibility during the 1980s, when it flatly refused to submit to the kind of basic honesty controls (double-blind testing, for example) that had legitimized every other serious scientific endeavor since Pascal. [This refusal] is a source of endless derisive amusement among rational people and of perpetual embarrassment for me.”



References


[1] Floyd Toole and Sean Olive, “Hearing is Believing vs. Believing is Hearing: Blind vs. Sighted Listening Tests, and Other Interesting Things,” presented at the 97th AES Convention, preprint 3894 (1994). Download here.


[2] Floyd Toole, Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, Focal Press, 2008.

Saturday, April 4, 2009

Binaural Room Scanning Part 2: Calibration, Testing, and Validation



In part 1 of this article, I described how binaural room scanning works and why it has great potential as a tool for psychoacoustic research and product testing. In part 2, I will describe some errors inherent to all BRS systems, which require proper calibration to remove. Finally, I will summarize some research that has focused on testing and validating the performance of BRS systems.


BRS Errors

Unfortunately, all binaural record/reproduction systems inherently produce errors in the signals captured by the mannequin and later reproduced through the headphones. The categories of BRS errors are summarized in Figure 2 [1]. Certain types of BRS errors (error 9) are easily removed with a correction filter. Individualized errors, which arise from physical differences between the shapes and sizes of listeners’ ears, heads, and torsos and those of the mannequin, are more challenging.


While it is possible to calibrate and remove individualized errors, doing so can be expensive and time-consuming, making BRS a less practical tool for psychoacoustic research and testing. Therefore, an important question is whether their correction leads to a significant perceptual improvement or difference in the listening test results. For example, if the error has no significant impact on the listening test results and conclusions, then the error is less of a concern. It is well known that humans can re-learn and adapt to errors in their vision or hearing introduced through injury or artificial means, suggesting that listeners may possibly do the same when listening through a BRS system.


BRS Calibration Testing and Validation

To answer some of the above questions, we have been conducting listening tests in parallel using both BRS and conventional in situ methods to determine whether they produce similar results. These tests have been conducted by having listeners audition different loudspeakers in a reflective listening room [1],[2], and by having them evaluate the sound quality of different automotive audio systems [3]. So far, we have found no statistically significant differences in the results between the two methods. Listeners’ loudspeaker and automotive audio system preference ratings are the same whether measured in situ or through the BRS system. It is important to note that the BRS calibration used for these tests was based on a single listener, suggesting that individualized calibrations may not be necessary. Listeners are apparently adapting to and ignoring many of the residual errors that remain after calibration. We suspect adaptation is enhanced in multiple comparison listening tasks where the BRS errors are constant and common among the different loudspeakers or car audio systems being evaluated. Using a different BRS system, other researchers have reported similarly good agreement between BRS and in situ tests conducted on different audio codecs [4], and on an automobile audio system manipulated to produce different spectral and spatial attributes [5].
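As a rough illustration of what checking for a difference between the two methods can look like, the sketch below runs an exact paired sign-flip permutation test on ratings of the same products obtained in situ and through BRS. This is not the analysis used in the cited studies; the test, the function name `paired_permutation_p`, and the ratings are all assumptions chosen for illustration.

```python
from itertools import product
from statistics import mean


def paired_permutation_p(a, b):
    """Two-sided exact sign-flip permutation test on paired differences."""
    if len(a) != len(b):
        raise ValueError("both methods must rate the same set of products")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    # Under the null hypothesis the sign of each paired difference is arbitrary,
    # so enumerate every sign assignment (2**n cases; fine for small n).
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = [s * d for s, d in zip(signs, diffs)]
        if abs(mean(flipped)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical mean preference ratings for the same products (illustration only).
in_situ = [6.1, 4.8, 5.5, 3.9, 7.0, 5.2]
brs = [6.0, 5.0, 5.4, 4.1, 6.8, 5.3]

p = paired_permutation_p(in_situ, brs)
print(f"p-value for a difference between methods: {p:.3f}")
```

A large p-value here means only that no difference was detected with the data at hand; with few products or listeners, a real but small difference could go unnoticed.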


Future BRS Research

There remain many unanswered questions about the performance, calibration and testing of BRS systems. Is it necessary to capture and simulate the whole-body vibration that listeners feel when listening in a car or other listening space where the low frequency tactile element is significant? What is the best method for capturing and reproducing the non-linear distortion of the audio system, which is normally not included in the binaural room impulse response? Given that auditory perception is part of a multi-modal sensory experience, how important is it to include the visual cues (e.g. car and room interiors) that reinforce the auditory cues heard by the listener, and prevent cognitive dissonance? These are questions that we are currently investigating so that we can improve the overall accuracy and perceptual realism of BRS systems used in psychoacoustic research and product evaluations.


References


[1] Sean E. Olive, “Interaction Between Loudspeakers and Room Acoustics Influences Loudspeaker Preferences in Multichannel Audio Reproduction,” PhD Thesis, Schulich School of Music, McGill University, Montreal, Quebec, Canada, (February 2008).


[2] Olive, Sean, Welti, Todd, and Martens, William L., “Listener Loudspeaker Preference Ratings Obtained In Situ Match those Obtained Via a Binaural Room Scanning Measurement and Playback System,” presented at the 122nd Audio Eng. Soc. Convention, preprint 7034, (May 2007). Download here.

[3] Olive, Sean and Welti, Todd, “Validation of a Binaural Car Scanning System for Subjective Evaluation of Automotive Audio Systems,” to be presented at the 36th International Audio Eng. Conference, Dearborn, Michigan, USA (June 2-4, 2009).

[4] S. Bech, M-A Gulbol, G. Martin, J. Ghani, and W. Ellermeir, “A listening test system for automotive audio - part 2: Initial verification,” [Preprint 6359], Proceedings of the 118th International Convention of the Audio Eng. Soc., Barcelona, Spain, (May, 2005). Download here.


[5] Søren Bech, Sylvain Choisel and Patrick Hegarty, “A Listening Test System for Automotive Audio – Part 3: Comparison of Attribute Ratings Made in a Vehicle with Those Made Using an Auralisation System,” [Preprint 7224], Proceedings of the 123rd International Convention of the Audio Eng. Soc., Vienna, Austria, (October 2007). Download here.