Audio Musings by Sean Olive: 2009

Sunday, November 22, 2009

The Effect of Whole-body Vibrations on Preferred Bass Equalizations of Automotive Audio Systems

Binaural Room Scanning (BRS) is a technology that allows Harman scientists to binaurally capture, store, and later reproduce sound fields through a headphone-based auditory display that includes head tracking for accurate localization of sound sources [1]. BRS enables us to do controlled, double blind comparative evaluations of different automotive audio systems, home theatre systems or sound reinforcement systems that would otherwise not be practical or possible to do. However, current BRS systems do not typically capture and reproduce the whole-body vibrations that are associated with low frequencies reproduced by the audio system. Therefore, an important question related to the accuracy and ecological validity of BRS-based evaluations is whether whole-body vibrations play an important role in our perception of the quality and realism of the automotive audio system.

We recently presented a paper at the 127th Audio Engineering Society Convention that addressed this research question [2]. Three experiments were reported that measured the effects of both real and simulated whole-body vibration associated with the low frequency sounds reproduced by an automotive audio system on listeners’ preferred bass equalizations. A PDF of the slide presentation can be found here.

In all three experiments, the same automotive audio system was used: a high-quality 17-channel audio system installed in a 2004 Toyota Avalon. The car was parked in our automotive audio research lab with the engine turned off so that the effects of the road and engine noise and vibration were not part of the experiment. The BRS system was calibrated for each individual listener to minimize errors related to headphone fit, etc. The differences in magnitude response measured at the listeners’ ear in situ versus through the BRS system were very small indeed (see slide 9).

The Effect of Real Vibration on Preferred Bass Levels

In the first experiment, listeners adjusted the level of bass equalization (see slide 10) while experiencing real whole-body vibration produced by the car audio system. The same task was also repeated via the BRS headphone-based system without the vibration present. This was repeated three times using three different music programs (see slide 7). On average, listeners adjusted the bass equalization 1.5 dB higher for the BRS playback condition where the low frequency vibration was not present (see slide 11). The preferred bass level was found to be program dependent due to the amount of bass present in the signal and the resulting vibration it produced (see slides 12 and 13).

The Effect of Simulated Vibration on Preferred Bass Levels

In the second experiment, we simulated the whole-body vibration produced by the audio system by attaching an actuator to the driver's seat of the car. The actuator was driven by the low frequency portion of the audio signal below 100 Hz. Comprehensive whole-body vibration measurements performed prior to this experiment found that most of the whole-body vibration produced by the audio system occurs below 100 Hz (see slides 15 and 16) at the seat and floor. The level and frequency of the whole-body vibration vary with the music program and the weight of the listener.
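As a rough illustration of how the low frequency portion of a signal might be isolated to drive a seat actuator, here is a minimal sketch of a second-order low-pass filter with a 100 Hz cutoff. The filter order, the Butterworth-style Q, and the function names are assumptions made for illustration; the actual crossover used in the experiment is not described here.

```python
import math

def lowpass_biquad(samples, sample_rate, cutoff_hz=100.0, q=math.sqrt(0.5)):
    """Second-order low-pass filter (standard biquad form).

    The filter order and Q are assumed values, not taken from the experiment.
    """
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    # Normalized feed-forward (b) and feedback (a) coefficients
    b0 = (1.0 - cos_w0) / 2.0 / a0
    b1 = (1.0 - cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def sine(freq_hz, sample_rate, seconds):
    """Unit-amplitude test tone."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

if __name__ == "__main__":
    fs = 8000
    for f in (30.0, 1000.0):
        tone = sine(f, fs, 1.0)
        gain = rms(lowpass_biquad(tone, fs)) / rms(tone)
        print(f"{f:7.1f} Hz  gain = {gain:.3f}")
```

A 30 Hz tone passes almost unchanged while a 1 kHz tone is strongly attenuated, so only the bass content would reach the actuator.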

Each listener sat in the driver’s seat of the car listening to a virtual BRS rendering of the automotive audio system reproduced through headphones. Listeners adjusted the bass level in their headphones while experiencing four different levels of simulated whole-body vibration: none, low (0 dB), medium (+4 dB), and high (+8 dB). The medium level corresponded to the measured vibration in experiment one, when the bass equalization of the automotive audio system was adjusted to its preferred level.

The experimental results indicated that the preferred level of bass equalization decreased 3 dB when the level of whole-body vibration increased 8 dB (see slide 21), and that the effect varied with program. At the low vibration level, there was no effect on the preferred bass level, since the vibration level was near or below detection thresholds reported in the literature. At the highest level (+8 dB), the vibration tended to be annoying, and listeners tended to turn the bass level down in the hope that the vibration would also be reduced. The effect of vibration on preferred bass level was somewhat dependent on the listener, which could be related to their weight (see slide 24).

The experimental results confirm those of a previous experiment conducted by Martens et al., in which the vibration was simulated via a platform and neither head-tracking nor individualized BRS calibrations were employed [3]. The results from the Martens et al. experiment and this one are plotted above in Figure 1. In spite of the methodological differences, there is good agreement between the two studies. This suggests that the effect of whole-body vibration on the preferred level of bass equalization is quite robust.

The Effect of Whole-body Vibration on the Similarity in Sound Quality between BRS and In Situ Reproductions

In the third experiment, listeners sat in the car and rated the overall similarity in sound quality between the BRS headphone-based reproduction, with and without the simulated vibration, and the same audio system experienced in situ. Listeners could switch at will between the in situ and two BRS reproductions (with and without shaker). The two BRS treatments were presented double-blind, and repeated two times with four music programs (see slide 27).

The results (see slide 29) show that the sound quality of the BRS reproduction system was significantly improved by the presence of whole-body vibration (shaker on).

Conclusions

From these experiments, it is clear that the whole-body vibration associated with the low frequency sounds of an audio system influences listeners’ perception of the quality and quantity of bass. When the vibration is absent from a stereo or binaural recording of music reproduced through headphones there may be a perceived lack of bass. A 4 dB increase in whole-body vibration produces about a 1.5 dB decrease in preferred level of bass equalization. However, there appears to be upper and lower threshold limits beyond which a change in vibration level will have no effect. Moreover, the amount of vibration and its effect on preferred levels of bass equalization will depend on the low frequency characteristics of the music and the individual listener (and possibly their weight).
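The numbers above can be tied together with a small sketch. It assumes, purely for illustration, that the change in preferred bass level is a straight line between the low (0 dB) and high (+8 dB) vibration levels and flat outside that range; the experiments only report the end points and suggest that thresholds exist, so the clamping limits and the function name are hypothetical.

```python
def preferred_bass_change_db(vibration_db, lower_db=0.0, upper_db=8.0):
    """Estimated change in preferred bass level (dB) for a given vibration level (dB).

    Assumptions (not from the paper): a linear trend between the two
    threshold limits, and no further change beyond either limit.
    """
    slope = -3.0 / 8.0  # reported: 3 dB less bass for 8 dB more vibration
    clamped = min(max(vibration_db, lower_db), upper_db)
    return slope * (clamped - lower_db)

if __name__ == "__main__":
    for level in (-4.0, 0.0, 4.0, 8.0, 12.0):
        print(f"vibration {level:+5.1f} dB -> bass {preferred_bass_change_db(level):+.2f} dB")
```

Under these assumptions the +4 dB (medium) level gives the 1.5 dB decrease quoted above, and levels beyond the limits produce no further change.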

Finally, adding simulated whole-body vibration to BRS reproductions can greatly enhance their perceived realism and fidelity when compared to the in situ experience, as long as the vibration levels are above the listener's detection threshold.

References

[1] Sean E. Olive and Todd Welti, “Validation of a Binaural Car Scanning Measurement System for Subjective Evaluation of Automotive Audio Systems,” presented at the 36th International AES Automotive Audio Conference, (June 2-4, 2009).

[2] Germain Simon, Sean E. Olive, and Todd Welti, “The Effect of Whole-body Vibration on Preferred Bass Equalization in Automotive Audio Systems,” presented at the 127th Audio Eng. Soc. Convention, preprint 7956, (October 2009).

[3] William Martens, Wieslaw Woszczyk, Hideki Sakanashi, and Sean E. Olive, “Whole-Body Vibration Associated with Low-Frequency Audio Reproduction Influences Preferred Equalization,” presented at the AES 36th International Conference, Dearborn, Michigan (June 2-4, 2009).

Sunday, November 1, 2009

The Subjective and Objective Evaluation of Room Correction Products

In a recent article, I discussed audio’s circle of confusion that exists within the audio industry due to the lack of performance standards in the loudspeakers and rooms through which recordings are monitored. As a result, the quality and consistency of recordings remain highly variable. A significant source of variation in the playback chain occurs from acoustical interactions between the loudspeaker and room, which can produce >18 dB variations in the in-room response below 300-500 Hz.

In recent years, audio manufacturers have begun to offer so-called “room correction” products that measure the in-room response of the loudspeakers at different seating locations, and then automatically equalize them to a target curve defined by the manufacturer. The sonic benefits of these room correction products are generally not well known since, to my knowledge, no one has yet published the results of a well-controlled, double-blind listening test on room correction products. To what degree do room correction products improve or possibly degrade the sound quality of the loudspeaker/room compared to the uncorrected version of the loudspeaker/room? Can the sound quality ratings of the different room correction products be explained by acoustical measurements performed at the listening location?

A Listening Experiment on Commercial Room Correction Products

To answer these questions, we conducted some double-blind listening tests on several commercial room correction products [1]. I recently presented the results of those tests at the 127th Audio Engineering Society Convention in New York. A copy of my AES Keynote presentation can be found here.

A total of three different commercial products were compared to two versions of a Harman prototype room correction that will find its way into future Harman consumer and professional audio products. The products included the Anthem Statement D1, the Audyssey Room Equalizer, the Lyngdorf DPA1, and two versions of the Harman prototype product (see slide 7). Included in the test was a hidden anchor: the same loudspeaker and subwoofer without room correction. In this way, we could directly compare how much each room correction improved or degraded the quality of sound reproduction.

Each room correction device was tested in the Harman International Reference Room using a high quality loudspeaker (B&W 802N) and subwoofer (JBL HB5000) (slides 8 and 9). A calibration was performed for each room correction over the six listening seats according to the manufacturer’s instructions. Two different calibrations were performed with the Harman prototype: one based on a multipoint six-seat average, while the second calibration used a six-microphone spatial average focused on the primary listening seat. The different room corrections were level matched for equal loudness at the listening seat.
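To make the idea of a multi-seat spatial average concrete, here is a minimal sketch that averages several in-room magnitude responses frequency by frequency. It assumes a power (energy) average of responses given in dB; the products in the test may average differently (for example in dB or with seat weighting), and the function name is hypothetical.

```python
import math

def spatial_average_db(seat_responses_db):
    """Power-average magnitude responses (in dB) measured at several seats.

    seat_responses_db: one list of dB values per seat, all on the same
    frequency grid. Power averaging is an assumed convention.
    """
    n_seats = len(seat_responses_db)
    n_bins = len(seat_responses_db[0])
    averaged = []
    for k in range(n_bins):
        mean_power = sum(10.0 ** (seat[k] / 10.0) for seat in seat_responses_db) / n_seats
        averaged.append(10.0 * math.log10(mean_power))
    return averaged

if __name__ == "__main__":
    # Made-up responses at three frequencies for two seats
    seat_a = [0.0, 6.0, -3.0]
    seat_b = [0.0, -6.0, -3.0]
    print([round(v, 2) for v in spatial_average_db([seat_a, seat_b])])
```

A deep null at one seat barely lowers a power average, which is one reason a multi-seat average can look smoother than the response at any single seat.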

The Listener's Task

A total of eight trained listeners with normal hearing participated in the tests. Using a multiple comparison method, the listener could switch at will between the six different room corrections, and rate them according to overall preference, spectral balance, as well as give comments (see slide 14). The administration of the test, including the design, switching, collection and storage of listener responses, was computer automated via Harman’s proprietary Listening Test Software. A total of nine trials were completed using three different programs repeated three times. The presentation order of the program and room corrections was randomized.
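A randomized, blind trial schedule of this kind can be sketched in a few lines. This is not Harman's Listening Test Software; the function, the program names, and the button labels are hypothetical, and the sketch only shows the idea of shuffling both the program order and the treatment-to-button assignment on every trial.

```python
import random

def build_schedule(programs, treatments, repeats=3, seed=None):
    """Return a randomized list of trials for a multiple-comparison test.

    Each trial plays one program and maps blind button labels (A, B, C, ...)
    to treatments in a fresh random order.
    """
    rng = random.Random(seed)
    trials = [p for p in programs for _ in range(repeats)]
    rng.shuffle(trials)  # randomize program order across all trials
    schedule = []
    for program in trials:
        order = list(treatments)
        rng.shuffle(order)  # randomize which treatment sits behind each button
        buttons = {chr(ord("A") + i): t for i, t in enumerate(order)}
        schedule.append({"program": program, "buttons": buttons})
    return schedule

if __name__ == "__main__":
    plan = build_schedule(["P1", "P2", "P3"],
                          ["R1", "R2", "R3", "R4", "R5", "R6"], seed=2009)
    for i, trial in enumerate(plan, start=1):
        print(i, trial["program"], trial["buttons"])
```

With three programs and three repeats this yields nine trials, matching the test described above.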

Results: Significant Preferences For Different Room Corrections

The mean preference ratings and 95% confidence intervals are shown above in Figure 1 (or slide 17). The room correction products are coded from R1 through R6 in descending order of preference. The identities of the products associated with the results are not relevant for the purpose of this article. Three of the five room corrections (R1-R3) were strongly preferred over no room correction (R4). However, one of the room corrections (R5) was rated equal to the no-correction treatment (R4), and one (R6) was rated much worse. Overall, the sound quality of R6 was rated "very poor" based on the semantic definitions of the preference scale.
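For readers unfamiliar with how a mean rating and its 95% confidence interval are obtained, here is a minimal sketch. It uses a normal approximation and made-up ratings; the analysis behind Figure 1 may well have used a different method (for example intervals derived from an analysis of variance), so treat this only as an illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_and_ci(ratings, confidence=0.95):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    m = mean(ratings)
    sem = stdev(ratings) / math.sqrt(len(ratings))  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return m, m - z * sem, m + z * sem

if __name__ == "__main__":
    # Hypothetical preference ratings for one room correction
    ratings = [6.0, 7.0, 8.0, 7.0, 6.0, 8.0]
    m, lo, hi = mean_and_ci(ratings)
    print(f"mean = {m:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Two treatments whose intervals do not overlap, such as the most and least preferred corrections here, can be regarded as reliably different in preference.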

Perceived Spectral Balance of Room Corrections

Listeners rated the perceived spectral balance of each room correction across seven equal logarithmically spaced frequency bands. The mean spectral balance ratings averaged across all listeners and programs are shown in slide 18. The more preferred room corrections were perceived to have a flatter, smoother spectral balance with extended bass. The less preferred room correction products (R5 and R6) were perceived to have too little bass, which made them sound thin and bright.

Listener Comments on Room Corrections

Listeners also gave comments related to the spectral balance of the different room correction products. Slide 19 shows the number of times a particular comment was used to describe each room correction. The bottom row indicates the correlation between preference rating and the frequency of the comment. The most preferred room corrections were described as "neutral" and "full," which corresponded to flatter, smoother and more bass extended spectral balance ratings. The least preferred room corrections (R4-R6) were described as colored, harsh, thin, and muffled, which corresponded to less flat, less smooth, and less bass extended spectral balance ratings. Slide 20 graphically illustrates the same information as slide 19.
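The correlation between preference and how often a comment was used can be computed as sketched below. The sketch assumes a Pearson correlation coefficient, which the slides may or may not have used, and the ratings and comment counts are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

if __name__ == "__main__":
    # Hypothetical data for six treatments, R1 through R6
    preference = [7.5, 7.0, 6.8, 5.0, 4.9, 2.5]
    thin_count = [0, 1, 1, 5, 6, 12]  # times "thin" was used
    print(f"r = {pearson_r(preference, thin_count):.2f}")
```

A strongly negative value indicates that a comment such as "thin" is used mostly for the less preferred treatments.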

Correlation Between Subjective and Objective Measurements

In-room acoustical measurements were made at the six listening seats using a proprietary 12-channel audio measurement system developed by the Harman R&D Group. Slides 23 and 24 show the amplitude response of the different room corrections spatially averaged for the six seats (slide 23), and at the primary listening seat (slide 24). The measurements are plotted from top to bottom in descending order of preference, each vertically offset to more clearly delineate the differences. A few observations can be made:

The six-seat spatially averaged curves (slide 23) of the room corrections do not explain listeners' room correction preferences as well as the spatially averaged curves taken at the primary seat (slide 24). This makes perfect sense since all of the listening was done in the primary listening seat.

Looking at slide 24, the most preferred room corrections produced the smoothest, most extended amplitude responses measured at the primary listening seat. The largest measured differences among the different room corrections occur below 100 Hz and around 2 kHz where the loudspeaker had a significant hole in its sound power response. The room corrections that were able to fill in this sound power dip received higher preference and spectral balance ratings.

A flat in-room target response is clearly not the optimal target curve for room equalization. The preferred room corrections have a target response that has a smooth downward slope with increasing frequency. This tells us that listeners prefer a certain amount of natural room gain. Removing the room gain makes the reproduced music sound unnatural and too thin, according to these listeners. This also makes perfect sense since the recording was likely mixed in a room where the room gain was also not removed; therefore, removing it from the consumer's listening room would destroy the spectral balance of the music as intended by the artist.

Conclusions

There are significant differences in the subjective and objective performance of current commercial room correction products as illustrated in these listening test results. When done properly, room correction can lead to significant improvements in the overall quality of sound reproduction. However, not all room correction products are equal, and two of the tested products produced results that were no better, or much worse, than the unequalized loudspeaker. Room correction preferences are strongly correlated to their perceived spectral balance and related attributes (coloration, full/thin, bright/dull). The most preferred room corrections produced the smoothest, most extended in-room responses measured around the primary listening seat.

More tests are underway to better understand and, if necessary, optimize the performance of Harman's room correction algorithms for different acoustical aspects of the room and loudspeaker.

References

[1] Sean E. Olive, John Jackson, Allan Devantier, David Hunt, and Sean Hess, “The Subjective and Objective Evaluation of Room Correction Products,” presented at the 127th AES Convention, New York, preprint 7960 (October 2009).

Saturday, October 31, 2009

Audio's Circle of Confusion

Audio’s “Circle of Confusion” is a term coined by Floyd Toole [1] that describes the confusion that exists within the audio recording and reproduction chain due to the lack of a standardized, calibrated monitoring environment. Today, the circle of confusion remains the single largest obstacle in advancing the quality of audio recording and reproduction.

The circle of confusion is graphically illustrated in Figure 1. Music recordings are made with (1) microphones that are selected, processed, and mixed by (2) listening through professional loudspeakers, which are designed by (3) listening to recordings, which are (1) made with microphones that are selected, processed, and mixed by (2) listening through professional monitors...... you get the idea. Both the creation of the art (the recording) and its reproduction (the loudspeakers and room) are trapped in an interdependent circular relationship where the quality of one is dependent on the quality of the other. Since the playback chain and room through which recordings are monitored are not standardized, the quality of recordings remains highly variable.

Creating Music Recordings Through An Uncalibrated Instrument

A random sampling of one's own music library will quickly confirm the variation in sound quality that exists among different music recordings. Apart from audible differences in dynamic range, spatial imagery, and noise and distortion, the spectral balance of recordings can vary dramatically in terms of their brightness and, particularly, the quality and quantity of bass. The magnitude of these differences suggests that something other than variations in artistic judgment and good taste is at the root of this problem.

The most likely culprits are the loudspeakers and rooms through which the recordings were made. While there are many excellent professional near-field monitors in the marketplace today, there are no industry guidelines or standards to ensure that they are used. The lack of meaningful, perceptually relevant loudspeaker specifications makes the excellent loudspeakers difficult to identify and separate from the truly mediocre ones. To make matters worse, some misguided recording engineers monitor and tweak their recordings through low-fidelity loudspeakers thinking that this represents what the average consumer will hear. Since loudspeakers can be mediocre in an infinite number of ways, this practice only guarantees that the quality of the recording will be compromised when heard through good loudspeakers [1]. This is very counterproductive if we want to improve the quality and consistency of audio recording and reproduction.

Another significant source of variation in the recording process stems from acoustical interactions between the loudspeaker and the listening room [1]-[3]. Below 300-500 Hz, the placement of the loudspeaker and listener can cause >18 dB variations in the in-room response due to room resonances and the proximity of the loudspeaker to a room boundary.

Evidence of acoustical interactions has been well documented in a survey of 164 professional recording studios where the same high-quality, factory-calibrated monitor was installed [4]. Figure 2 shows the distribution of in-room responses measured at the primary listening location where the recordings are monitored and mixed. The 1/3-octave smoothed curves show a reasonably tight ± 2.5 dB variation above 1 kHz. However, below 1 kHz, the variation in the in-room response gets progressively worse at lower frequencies. Below 100 Hz, the in-room bass response can vary as much as 25 dB among the different control rooms! You needn’t look any further than here to understand why the quality and quantity of bass is so variable among the recordings in your music library.
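For readers curious about what 1/3-octave smoothing does to a measured response, here is a minimal sketch. It assumes a simple rectangular window one third of an octave wide and averages power within it; the survey's actual smoothing method is not described here, and the function name is hypothetical.

```python
import math

def third_octave_smooth(freqs_hz, mags_db, fraction=3):
    """Smooth a magnitude response (dB) with a 1/fraction-octave window.

    For each frequency, average the power of all points within
    +/- 1/(2*fraction) octave. Rectangular windowing is an assumption.
    """
    half_width = 2.0 ** (1.0 / (2.0 * fraction))
    smoothed = []
    for fc in freqs_hz:
        lo, hi = fc / half_width, fc * half_width
        window = [10.0 ** (m / 10.0)
                  for f, m in zip(freqs_hz, mags_db) if lo <= f <= hi]
        smoothed.append(10.0 * math.log10(sum(window) / len(window)))
    return smoothed

if __name__ == "__main__":
    # Hypothetical response on a 1/12-octave grid with one narrow 10 dB peak
    freqs = [20.0 * 2.0 ** (i / 12.0) for i in range(60)]
    mags = [0.0] * 60
    mags[30] = 10.0
    out = third_octave_smooth(freqs, mags)
    print(f"peak before: {mags[30]:.1f} dB, after: {out[30]:.1f} dB")
```

Smoothing reduces narrow peaks and dips, so the ± 2.5 dB and 25 dB figures above describe broad trends in the response instead of fine detail.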

Evaluating Loudspeakers When the Recording is a Nuisance Variable

Loudspeaker manufacturers are also trapped in the circle of confusion since music recordings are used by listening panels, audio reviewers, and consumers to ultimately judge the sound quality of the loudspeaker. The problem is that distortions in the recording cannot be easily separated from those produced by the loudspeaker. For example, a recording that is too bright can make a dull loudspeaker sound good, and an accurate loudspeaker sound too bright [5]. A review of the scientific literature on loudspeaker listening tests indicates that recordings are a serious nuisance variable that needs to be carefully selected and controlled in the experimental design and analysis of test results.

At Harman International, we try to minimize loudspeaker-program interactions in our loudspeaker listening tests by using well-recorded programs that are equally sensitive to distortions found in loudspeakers. Listeners become intimately familiar with the sonic idiosyncrasies of the different programs through extensive listener training and participation in formal tests. In each trial of a loudspeaker test, the listener can switch between different loudspeakers using the same program, which allows them to better separate the distortions in the program (which are constant), from the distortions in the loudspeaker.

Through 25+ years of well-controlled loudspeaker listening tests, scientists have identified the important loudspeaker parameters related to good sound, which can be quantified in a set of acoustical measurements [6],[7]. By applying some statistics to these measurements, listeners’ loudspeaker preferences can be predicted [8]. The bass performance of the loudspeaker alone accounts for 30% of the listener’s overall preference rating. Good bass is essential to our enjoyment of music, yet unfortunately it falls in the frequency range where loudspeakers and rooms are most variable (see Figure 2). Controlling the behavior of loudspeakers and rooms at low frequencies is essential to achieving a more consistent quality of audio recording and reproduction. Fortunately, there are technology solutions today that provide effective control of acoustical interactions between the loudspeaker and rooms.

Breaking the Circle of Confusion

As Toole points out in [1], the key in breaking the circle of confusion lies in the hands of the professional audio industry where the art is created. A meaningful standard that defined the quality and calibration of the loudspeaker and room would improve the quality and consistency of recordings. The same standard could then be applied to the playback of the recording in the consumer’s home or automobile. Finally, consumers would be able to hear the music as the artist intended.

References

[1] Floyd E. Toole, Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, Focal press (July 2008).

[2] Floyd Toole, “Loudspeakers and Rooms: A Scientific Review,” J. Audio Eng. Soc., Vol. 54, No. 6, (2006 June). A free copy of this paper can be downloaded here

[3] Sean E. Olive and William Martens “Interaction Between Loudspeakers and Room Acoustics Influences Loudspeaker Preferences in Multichannel Audio Reproduction,” presented at the 123rd Convention of the AES, preprint 7196 (October 2007).

[4] Aki V. Mäkivirta and Christophe Anet, “The Quality of Professional Surround Audio Reproduction, A Survey Study,” 19th International AES Conference: Surround Sound - Techniques, Technology, and Perception (June 2001).

[3] Todd Welti and Allan Devantier, “Low-frequency Optimization Using Multiple Subwoofers,” J. Audio Eng. Soc., Vol. 54, No. 5, (May 2006). A free copy of this paper can be downloaded here

[4] Sean E. Olive, John Jackson, Allan Devantier, David Hunt, and Sean Hess, “The Subjective and Objective Evaluation of Room Correction Products,” presented at the 127th AES Convention, New York, preprint 7960 (October 2009).

[5] Sean E. Olive, “The Preservation of Timbre: Microphones, Loudspeakers, Sound Sources and Acoustical Spaces,” 8th International AES Conference: The Sound of Audio (May 1990).

[6] Floyd E. Toole, “Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 1,” J. Audio Eng. Soc., Vol. 34, No. 4, pp. 227-235, (April 1986). A free copy of this paper can be downloaded here

[7] Floyd E. Toole, “Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 2,” J. Audio Eng. Soc., Vol. 34, No. 5, pp. 323-348, (May 1986). A free copy of this paper can be downloaded here

[8] Sean E. Olive, “A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part II - Development of the Model,” presented at the 117th Convention of the AES, preprint 6190 (October 2004).

Sunday, June 14, 2009

Validation of a Binaural Room Scanning Measurement System for Subjective Evaluation of Automotive Audio Systems

In a previous posting on Audio Musings, I described Harman’s binaural room scanning (BRS) measurement and playback system. BRS is a powerful audio research and testing tool that allows Harman scientists to capture, store and later reproduce through a head-tracking headphone-based auditory display the acoustical signature of one or more audio systems situated in the same or different listening spaces. BRS makes it practical to conduct double-blind listening evaluations of different loudspeakers, listening rooms, and automotive audio systems in a very controlled and efficient way.

I also pointed out that all binaural recording/playback systems contain errors that require proper calibration for their removal. However, removing all BRS errors can become very expensive and impractical, so some compromise is necessary. This precipitates the need to experimentally validate the performance of the BRS system to ensure that the remaining errors after calibration do not significantly change listeners’ perceptual ratings of audio systems evaluated through the BRS system as compared to in situ evaluations.

To this end, Todd Welti, Research Acoustician at Harman International, and I recently presented the results of a series of BRS validation tests performed using different equalizations of a high quality automotive audio system [1]. You can view the Powerpoint presentation of the conference paper here. For more detailed information on this experiment, you can view the proceedings from the recent 36th AES Automotive Conference in Dearborn, Michigan, when they become available in the AES e-library.

To assess the accuracy of the BRS system, a group of trained listeners gave double-blind preference ratings for different equalizations of the audio system evaluated under both in situ (in the car) and BRS playback conditions. For the BRS playback condition, the listener sat in the same car listening to a virtual headphone-based reproduction of the car's audio system. The purpose of the experiment was to determine whether the BRS and in situ methods produced the same preference ratings for different equalizations of the car's audio system.

Listeners gave preference ratings for five different equalizations using 4 different music programs reproduced in mono (left front speaker), stereo (left and right front channels) and surround sound (7.1 channels). The three playback modes were tested separately to isolate potential issues related to differences in how the BRS system reproduced front versus rear, and hard versus phantom-based, auditory images.

The listening test results showed there were no statistically significant differences in equalization preferences between the in situ and BRS playback methods. This was true for mono, stereo and multichannel playback modes (see slides 21-23). An interesting finding was that these results were achieved using a BRS calibration based on a single listener whose calibration tended to work well for the other listeners on the panel. This suggests that individualized listener calibrations for BRS-based listening tests may not be necessary, so long as the calibration and listeners are carefully selected.

In conclusion, this validation experiment provides experimental evidence that a properly calibrated BRS measurement and playback system can produce similar preferences in automotive audio equalization as measured using in situ listening tests.

Reference

[1] Sean E. Olive, Todd Welti, “Validation of a Binaural Car Scanning Measurement System for Subjective Evaluation of Automotive Audio Systems,” presented at the 36th International AES Automotive Audio Conference, (June 2-4, 2009).

Thursday, June 11, 2009

Whole-Body Vibration Associated with Low-Frequency Audio Reproduction Influences Preferred Equalization

Last week I attended the AES Automotive Audio Conference in Dearborn, Michigan where about 70-odd (pun intended) audio scientists and engineers gathered to discuss the latest scientific and technological developments in automotive audio. A detailed description of the program can be found here.

This article focuses on a paper I co-authored and presented called “Whole-body Vibration Associated with Low-Frequency Audio Reproduction Influences Preferred Equalization” [1]. The work was a joint effort between three researchers, Drs. William Martens, Wieslaw Woszczyk, and Hideki Sakanashi, from the CIRMMT at McGill University in Montreal, and myself, at Harman International. A copy of our Powerpoint presentation given at the conference can be viewed here.

It is well established that human perception is a multimodal sensory experience [2]. For example, both auditory and visual cues associated with a sound source and its acoustic space are integrated and interrogated by high level cognitive processes that determine our spatial perception of the source based on the plausibility, strength and agreement between the visual and auditory cues. Bimodal sensory interactions have been reported in studies where the video quality of the picture influences listeners’ judgment of the audio system’s sound quality and vice versa (although the audio quality has much less influence on the perceived quality of video than vice versa) [3].

However, little is known regarding how low frequency (below 100 Hz) whole-body vibration produced by the audio system influences our perception of the quality and quantity of bass. Perhaps the most related study is Rudmose’s “case of the missing 6 dB” where the perceived loudness of low frequency signals reproduced through headphones was reported to be approximately 6 dB lower than that of loudspeakers producing the equivalent sound pressure levels at the ears [4]. Rudmose showed that the absence of tactile stimulus in headphone reproduction could, in part, account for why headphones sound less loud than loudspeakers when producing equivalent sound pressure level at the ears (the rest of the missing 6 dB was due to experimental factors, and the increased physiological noise in the ear canal introduced by the coupling of the headphone to the ear).
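To put the "missing 6 dB" in perspective, a 6 dB difference in sound pressure level corresponds to roughly a factor of two in sound pressure. The short sketch below shows the conversion; the function names are hypothetical.

```python
import math

def db_to_pressure_ratio(level_db):
    """Convert a level difference in dB to a sound pressure ratio."""
    return 10.0 ** (level_db / 20.0)

def pressure_ratio_to_db(ratio):
    """Convert a sound pressure ratio to a level difference in dB."""
    return 20.0 * math.log10(ratio)

if __name__ == "__main__":
    print(f"-6 dB -> pressure ratio {db_to_pressure_ratio(-6.0):.3f}")
    print(f"ratio 2 -> {pressure_ratio_to_db(2.0):.2f} dB")
```

In other words, the headphones in Rudmose's case behaved as if they were delivering only about half the sound pressure of the loudspeakers, even though the measured levels at the ears were equivalent.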

A Tactile-Auditory Bimodal Sensory Experiment

To shed more light on this mystery, an experiment was conducted at McGill University. A total of 6 trained tonmeisters listened through calibrated headphones to binaural recordings of a virtual high-quality automotive audio system. Each listener adjusted the low frequency boost applied to different multichannel music reproductions according to their taste while experiencing a high and low level whole-body vibration. This was generated by a programmable motion platform driven by the low frequency portion (below 80 Hz) of the music signal. In this way, vibration was delivered to both the feet and body of the listener through the chair (see slide 5). The virtual automotive audio system was based on a binaural room scan (BRS) of the audio system installed in our research vehicle located at the Harman International Automotive Audio Research Lab in Northridge, California (see slide 3). For more information on how BRS works, please refer to my previous BRS blog postings, Part 1 and Part 2.

Whole-Body Vibration Influences Preferred Equalization of the Audio System

The researchers found that the preferred bass equalization of music reproduced through the virtual automotive audio system was significantly influenced by the level of whole-body vibration experienced. While the amount of preferred bass boost varied with music program and listener, the listeners always preferred less bass in the high vibration condition than in the low vibration condition, in which the vibration level was 12 dB lower. On average, listeners preferred 6 dB less bass boost in their headphones when moving from the low to the high vibration condition (see slide 10). In other words, there was a bimodal sensory interaction effect between the auditory and tactile senses that influenced listeners' preferred bass equalization of music reproduced through the headphones.
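For readers less used to thinking in decibels, the short sketch below converts a level difference in dB into linear ratios. It is only an illustration of the standard dB definitions, and the function names are my own for this example rather than anything used in the experiment.

```python
def db_to_amplitude_ratio(delta_db):
    """Convert a level difference in dB to a linear amplitude ratio.

    Amplitude quantities (sound pressure, acceleration) use 20*log10.
    """
    return 10 ** (delta_db / 20)


def db_to_power_ratio(delta_db):
    """Convert a level difference in dB to a linear power ratio (10*log10)."""
    return 10 ** (delta_db / 10)


if __name__ == "__main__":
    for delta_db in (6, 12):
        amp = db_to_amplitude_ratio(delta_db)
        power = db_to_power_ratio(delta_db)
        print(f"{delta_db} dB: amplitude ratio {amp:.2f}, power ratio {power:.2f}")
```

A 6 dB change corresponds to roughly a doubling (or halving) of amplitude, and a 12 dB change to roughly a factor of four.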

It is important to note that the 6 dB effect reported here may not be the same as observed in an automobile where the level and other physical characteristics of the vibration observed may be different from what was tested here. Under driving conditions, listeners experience additional sources of vibration (and acoustic noise) from the road and engine of the vehicle that may partially mask the whole-body vibration effects produced by the audio system. More research is currently underway to study how real and simulated whole-body vibration in vehicles influences listeners' perception of the audio system and its sound quality.

References
[1] William Martens, Wieslaw Woszczyk, Hideki Sakanashi, and Sean E. Olive, “Whole-Body Vibration Associated with Low-Frequency Audio Reproduction Influences Preferred Equalization,” presented at the AES 36th International Conference, Dearborn, Michigan (June 2-4, 2009).

[2] Multimodal Integration, Wikipedia.
[3] Beerends, John G.; De Caluwe, Frank E. “The Influence of Video Quality on Perceived Audio Quality and Vice Versa,” JAES, Vol. 47, (5), pp. 355-362 (May 1999).
[4] Rudmose, Wayne, “The case of the missing 6 dB,” J. Acoust. Soc. Am. Volume 71, Issue 3, pp. 650-659 (March 1982).

Saturday, May 30, 2009

Harman's "How to Listen" - A New Computer-based Listener Training Program

Trained listeners with normal hearing are used at Harman International for all standard listening tests related to research and competitive benchmarking of consumer, professional and automotive audio products. This article explains why we use trained listeners, and describes a new computer-based software program developed for training and selecting Harman listeners.

Why Train Listeners?

There are many compelling reasons for training listeners. First, trained listeners produce more discriminating and reliable judgments of sound quality than untrained listeners [1]. This means that fewer listeners are needed to achieve the same statistical confidence, resulting in considerable cost savings. Second, trained listeners are taught to identify, classify and rate important sound quality attributes using precise, well-defined terms to explain their preferences for certain audio systems and products. Vague audiophile terms such as “chocolaty”, “silky” or “the bass lacks pace, rhythm or musicality” are NOT part of the trained listener's vocabulary since these descriptors are not easily interpreted by audio engineers who must use the feedback from the listening tests to improve the product. Third, the Harman training itself, so far, has produced no apparent bias when comparing the loudspeaker preferences of trained versus untrained listeners [1]. This allows us to safely extrapolate the preferences of trained listeners to those of the general untrained population of listeners (e.g. most consumers).

Harman's “How to Listen” Listener Training Program

Harman’s “How to Listen” is a new computer-based software application that helps Harman scientists efficiently train and select listeners used for psychoacoustic research and product evaluation. The self-administered program has 17 different training tasks that focus on four different attributes of sound quality: timbre (spectral effects), spatial attributes (localization and auditory imagery characteristics), dynamics, and nonlinear distortion artifacts. Each training task starts at a novice level, and gradually advances in difficulty based on the listener's performance. Constant feedback on the listener's responses is provided to improve their learning and performance. A presentation of the training software can be viewed in parts 1 and 2.

Spectral Training Tasks

There are two different spectral training tasks. In the Band Identification training task, the listener compares a reference (Flat) and an equalized version of the music program (EQ), and must determine which frequency band is affected by the equalization (see slide 5 of part 2). The types of filters include peaks, dips, peaks and dips, high/low shelving and low/high/bandpass filters. The task is aimed at teaching listeners to identify spectral distortions in precise, quantitative terms (filter type, frequency, Q and gain) that directly correspond to a frequency response measurement.

At the easiest skill level, there are only 2 frequency band choices, which are easily detected and classified. However, as the listener advances, the audio bandwidth is subdivided into multiple frequency bands making the audibility and identification of the affected frequency band more challenging.
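To make the idea of progressively finer band choices concrete, here is a minimal sketch of one way an audio bandwidth could be subdivided. It assumes the bands are equally wide on a logarithmic frequency axis between 20 Hz and 20 kHz. That spacing, the band limits and the function name log_spaced_bands are assumptions made for illustration, not a description of how the Harman software actually defines its bands.

```python
import math


def log_spaced_bands(n_bands, f_low=20.0, f_high=20000.0):
    """Split [f_low, f_high] into n_bands bands of equal width on a log axis.

    Returns a list of (lower_edge_hz, center_hz, upper_edge_hz) tuples,
    where the center is the geometric mean of the two edges.
    The 20 Hz to 20 kHz range is an assumed default.
    """
    if n_bands < 1:
        raise ValueError("n_bands must be at least 1")
    if not 0 < f_low < f_high:
        raise ValueError("require 0 < f_low < f_high")
    # Constant frequency ratio between the edges of adjacent bands.
    ratio = (f_high / f_low) ** (1.0 / n_bands)
    bands = []
    for i in range(n_bands):
        lower = f_low * ratio ** i
        upper = f_low * ratio ** (i + 1)
        bands.append((lower, math.sqrt(lower * upper), upper))
    return bands


if __name__ == "__main__":
    # Easiest level: two choices. Harder levels: more, narrower bands.
    for n in (2, 4, 8):
        centers = [round(center) for _, center, _ in log_spaced_bands(n)]
        print(n, "bands, center frequencies (Hz):", centers)
```

With two bands, each choice spans about five octaves. Every doubling of the number of bands halves that width, which is one reason identification gets harder as the level increases.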

The Spectral Plot training exercise takes this one step further. The listener compares different music selections equalized to simulate more complex frequency response shapes commonly found in measurements of loudspeakers in rooms and automotive spaces. The listener is given a choice of frequency curves which they must correctly match to the perceived spectral balances of the stimuli. This teaches listeners to graphically draw the perceived timbre of an audio component as a frequency response curve. Once trained, listeners become quite adept at drawing the perceived spectral balance of different loudspeakers, and these graphs closely correspond to their acoustical measurements [2], [3].

Sound Quality Attribute Tasks

The purpose of these tasks is to familiarize the listener with each of the four sound quality attributes (timbre, spatial, dynamics and nonlinear distortion) and their sub-attributes, and to measure the listener's ability to reliably rate differences in an attribute's intensity. For example, in one task the listener must rank order two or more processed versions of a music selection according to their relative brightness or dullness. As the difficulty of the task increases, the listener must rate more stimuli that have incrementally smaller differences in the intensity of the tested attribute. Listener performance is calculated using Spearman’s rank correlation coefficient, which expresses the degree to which the stimuli have been correctly rank ordered on the attribute scale.
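As an illustration of the performance metric, the sketch below computes Spearman's rank correlation between the known ordering of a set of stimuli and a listener's ratings. It uses the textbook definition (the Pearson correlation of the ranks, with tied values given their average rank). The function names and the example data are hypothetical and are not taken from the training software.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the two rank lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical example: five stimuli with increasing treble boost,
    # and one listener's brightness ratings of those stimuli.
    true_order = [1, 2, 3, 4, 5]
    listener_ratings = [2.0, 1.0, 3.5, 4.0, 6.0]  # first two stimuli swapped
    print(spearman_rho(true_order, listener_ratings))
```

A value of 1 means the listener ordered every stimulus correctly, 0 means no relationship, and -1 means the order was exactly reversed. In the example, swapping one adjacent pair out of five gives a coefficient of about 0.9.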

Preference Training

In this task, the listener enters preference ratings for different music selections that have had one or more attributes (timbre, spatial, dynamics and nonlinear distortion) modified by incremental amounts.

By studying the interrelationship between the modification of these attributes and the preference ratings, Harman scientists can uncover how listeners weight different attributes when formulating their preferences. From this, the preference profile of a listener can be mapped based on the importance they place on certain sound quality attributes. The performance metric in the preference task is based on the F-statistic calculated from an ANOVA performed on the individual listener's data. The higher the F-statistic, the more discriminating and/or consistent the listener's ratings are, which is a highly desirable trait in the selection of a listener.
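The following sketch shows why the F-statistic rewards listeners who are both discriminating and consistent. It assumes the simplest possible design: a one-way ANOVA in which the stimulus is the only factor and each listener rates every stimulus more than once. The actual ANOVA design used in the software is not described here, so the design, the function name and the ratings below are all hypothetical.

```python
def one_way_anova_f(groups):
    """F-statistic for a one-way ANOVA.

    groups: a list of lists; each inner list holds the repeated ratings
    that one listener gave to one stimulus.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least 2 non-empty groups and some repeated ratings")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-stimulus variation: how far apart the mean ratings are.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-stimulus variation: how much repeated ratings disagree.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0:
        return float("inf")  # perfectly repeatable ratings
    return ms_between / ms_within


if __name__ == "__main__":
    # Hypothetical preference ratings for three stimuli, each rated twice.
    consistent_listener = [[2, 3], [5, 6], [8, 9]]
    inconsistent_listener = [[2, 8], [5, 3], [9, 4]]
    print("consistent:  ", one_way_anova_f(consistent_listener))
    print("inconsistent:", one_way_anova_f(inconsistent_listener))
```

The first listener separates the stimuli clearly and repeats their ratings closely, so the variation between stimuli is large relative to the variation within them and F is high. The second listener's repeated ratings scatter widely, which drives F toward zero.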

Other Key Features

Harman’s “How to Listen” training software runs on both Windows and Mac OS X platforms, and includes a real-time DSP engine for manipulating the various sound quality attributes. Most common stereo and multichannel sound formats are supported. In “Practice Mode”, the user can easily add their own music selections.

All of the training results from the 100+ listeners located at Harman locations world-wide are stored on a centralized database server. A web-based front end will allow listeners to log in to monitor and compare their performances to those of other listeners currently in training. Of course, the identities of the other listeners always remain confidential.

Conclusion

In summary, Harman’s “How to Listen” is a new computer-based, self-guided software program that teaches listeners how to identify, classify and rate the quality of recorded and reproduced sounds according to their timbral, spatial, dynamic and nonlinear distortion attributes. The training program gives constant performance feedback and analytics that allow the software to adapt to the ability of the listener. These performance metrics are used for selecting the most discriminating and reliable listeners used for research and subjective testing of Harman audio products.

References

[1] Sean E. Olive, "Differences in Performance and Preference of Trained Versus Untrained Listeners in Loudspeaker Tests: A Case Study," J. AES, Vol. 51, issue 9, pp. 806-825, September 2003. Download for free here, courtesy of Harman International.

[2] Sean E. Olive, “A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part I - Listening Test Results,” presented at the 116th AES Convention (May 2004).

[3] Floyd E. Toole, Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, Focal Press (July 2008). Available from Amazon here.