Basic Considerations for Loudness-based Analysis of Room Impulse Responses


BUILDING ACOUSTICS, Volume 16, Number 1, 2009, Pages 31-46

Basic Considerations for Loudness-based Analysis of Room Impulse Responses

Doheon Lee and Densil Cabrera
Faculty of Architecture, Design and Planning, University of Sydney, NSW 2006, Australia
dlee7117@mail.usyd.edu.au, densil@usyd.edu.au
(Received 1 February 2009 and accepted 17 February 2009)

ABSTRACT
Room impulse responses (RIRs) are used to characterise the acoustical conditions inside sound-critical rooms such as auditoria. The analysis of RIRs typically involves octave-band filtering, with parameters such as reverberation time, early decay time, temporal energy ratios and spatial parameters derived from this. This paper explores the potential for applying auditory models to the analysis of RIRs, incorporating auditory temporal integration (and masking), auditory filterbank analysis, and loudness calculation. The purpose of this is to produce analysis results that are closely related to the sound experienced by listeners. A preliminary step for such analysis is to filter RIRs so that their power spectrum is similar to that of typical material that would be listened to in the rooms (e.g. music or speech), and this paper proposes a music filter suitable for orchestral music, derived from long term power spectra of anechoic music recordings. Dynamic loudness analysis of RIRs yields loudness decay functions that are approximately exponential, which should provide a useful analogy with conventional analysis methods applied to RIRs.

1. INTRODUCTION
Room impulse responses (RIRs) are widely used to evaluate acoustical conditions of enclosed spaces [1-4]. From the measured RIRs, a number of acoustical parameters are extracted, such as reverberation time, early decay time, strength factor and clarity index, to predict various aspects of the acoustical quality of rooms. 
Although each parameter is used alone or combined with the other parameters to assess the acoustical qualities of auditoria, these parameters do not perfectly correlate with actual human perception (for example, of reverberance, loudness or clarity) [5]. Furthermore, the details of the perceived reverberation are likely to differ from physical analysis: for example, the roughly exponential decay curves obtained from RIRs may not correspond to the perceived decay pattern of the sound. One issue in the discrepancy between the conventional acoustical parameters and human perception is that the former do not sufficiently take into account characteristics of the auditory system, such as temporal integration and spectral masking. While the

human auditory system emphasises and de-emphasises spectral or temporal components of sounds, compared to those measured by a microphone in the free-field [6], the conventional acoustical parameters do little to reflect these characteristics. Psychoacoustical approaches to sound analysis have been developed to make more accurate predictions of human perception, although these are rarely applied to auditorium acoustics. In the Munich school of psychoacoustics, critical band rate based on the vibrating area of the basilar membrane is more often used than frequency in modelling perception [7]. Loudness models reflect the complex dependence of loudness on sound pressure level, frequency, bandwidth and time, and when the natural loudness unit is used (sone) they provide a ratio scale such that doubling or halving in loudness corresponds to a doubling or halving in units. A simple example of the deviation between loudness and sound pressure level is that a 1 kHz tone at 60 dB is perceived as equally loud as a 50 Hz tone at 85 dB, and those two tones have the same calculated loudness value of 4 sones. These aspects of human perception are incorporated into the time-varying or dynamic loudness models suggested by Zwicker [8] and by Chalupper & Fastl [9]. However, analysing RIRs with such models raises some issues. Loudness models are designed for signal analysis, whereas an RIR characterises a system. This distinction may seem subtle, since RIRs can be listened to like any audio signal, but the purpose of auditorium acoustics analysis is to assess how music or speech is affected by the room, not how a Dirac delta function sounds in the room. One clear difference between music and an impulse is the spectral distribution of the signal, and this is a theme explored in the present paper. 
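The ratio property of the sone scale mentioned above can be illustrated with the conventional relation between loudness level (phon) and loudness (sone), in which 40 phon corresponds to 1 sone and every 10 phon doubles the loudness. The following is a minimal sketch of that steady-state relation only, not of the dynamic loudness models discussed in this paper; the function names are illustrative, and the relation is assumed to hold only for loudness levels of roughly 40 phon and above.

```python
import math


def phon_to_sone(loudness_level_phon: float) -> float:
    """Conventional sone definition: 40 phon = 1 sone, +10 phon doubles loudness.

    Assumed valid for loudness levels of about 40 phon and above.
    """
    return 2.0 ** ((loudness_level_phon - 40.0) / 10.0)


def sone_to_phon(loudness_sone: float) -> float:
    """Inverse of phon_to_sone, under the same assumption."""
    return 40.0 + 10.0 * math.log2(loudness_sone)


# A 1 kHz tone at 60 dB has a loudness level of 60 phon by definition of the phon.
print(phon_to_sone(60.0))  # 4.0
```

The equal loudness of the 50 Hz tone at 85 dB is taken from the text (it follows from equal-loudness contours) and is not computed by this sketch.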
The purpose of this paper is to present an examination of some of the basic issues that must be considered if a loudness-based analysis method for RIRs is to be developed. The concept is that it should be possible to develop an analysis method for RIRs using principles developed in psychoacoustics (especially dynamic loudness modelling) that provides a closer match to perception than the simple RIR analysis methods currently in use. This paper does not set out to prove this point, but merely to examine two key issues: the importance of spectral weighting; and the characteristics of RIR decay when analysed with a dynamic loudness model. The refinement of such an approach to RIR analysis (such as the derivation of decay parameters) is a matter for further research. An alternative approach to analysing an RIR in assessing room acoustics is to use anechoic music or speech convolved with the RIR. While this has several advantages, especially in auralisation, the results are biased by the particular selection of anechoic recording, making it more difficult to generalise beyond anechoic samples similar to that selected. The concept of the present paper is to retain the RIR in the analysis, but to adapt it so that it is more suitable for listener-oriented analysis; and also to use analysis methods based on an auditory model, in this case the dynamic loudness model of Chalupper and Fastl [9]. A preliminary step in this process is to filter the RIR so that its power spectrum is similar to that which would be heard in the room acoustical context: for example, in a concert hall we are concerned with orchestral music; in a speech auditorium, with speech. The following section examines how a filter might be developed based on orchestral music.

2. POWER SPECTRAL CHARACTERISTICS OF ORCHESTRAL MUSIC
2.1. Previous studies
An impulse excites a room with a white power spectral distribution (equal power per linear spectral component), which is very different from the spectral distribution of music or speech, and also is very different from the distribution of filters in auditory spectral analysis (especially above 500 Hz). In conventional RIR analysis, this issue is ameliorated through extracting room acoustical parameters from octave band analysis (although each octave band retains a +3 dB spectral slope bias between its low and high cut-off frequencies, relative to the logarithmic frequency scale). If psychoacoustical models are to be used for RIR analysis, the spectral distribution of the signal should be that of a typical music or speech signal, rather than being dominated by the high frequency content of the white spectrum. Hence, this section of the paper provides a survey of possible music filters based on measurements of the long term spectral qualities of music. Because the spectral distribution of music varies greatly depending on musical style, only orchestral performances are considered here, the focus of this study being on the analysis of impulse responses from concert halls. With regard to the long term spectrum of music, Sivian et al. [10] conducted a pioneering study of the spectral distribution of live music, and McKnight [11], Bauer [12] and Greiner and Eggers [13] carried out the major studies of the spectral distribution of recorded music. McKnight [11] investigated the highest peak amplitudes of music using VU meter readings. A number of music samples used in McKnight's study had been recorded with a single condenser microphone, excluding other studio equipment, in order to record sounds close to the actual instruments. 
Bauer [12] investigated the lowest peak amplitudes of music not exceeded for more than 0.1% and 1% of the total length of the music. In Bauer's study, amplitudes of music were represented relative to a 1 kHz level-setting tone for master tapes (corresponding to 5 cm/s rms lateral velocity on a vinyl record). The study by Greiner and Eggers [13] is similar to that of Bauer, except that the researchers employed a larger number of percentile divisions: 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90% of the time. In their study, previously recorded music on compact disc was used for samples. With respect to bandwidth, McKnight used one-third octave bands while Bauer and Greiner & Eggers worked with one-octave bands. According to Bauer [12], a bandwidth of one octave is the best compromise between a good time-varying amplitude response and bandwidth certainty, although it introduces 3 dB errors in peak output amplitudes. Figure 1 shows the averages of the spectral amplitudes of orchestra performances from the three studies. For Bauer's study, all eight samples are averaged, because the author does not provide details of the performance styles of the samples. An average of the peak amplitudes of orchestra performances from McKnight's study [11] shows a dramatic increase at frequencies from 40 Hz to 100 Hz. Above that, they are around -1 dB to -4 dB in VU meter readings before irregular peaks and dips appear in the high frequency range. The anomalous peak found at 12.5 kHz is not explained. For Bauer's study [12], an average of all the results yields a steep increase at frequencies from 30 Hz to 250 Hz, and then the averaged value stays around the highest value until 4 kHz before it dramatically decreases. The difference between an average for 0.1% of the time

Figure 1. The averages of musical spectrum from the previous studies of: (A) McKnight (VU meter reading in dB versus frequency in Hz); (B) Bauer (level relative to the 1 kHz level-setting tone in dB, for 1% and 0.1% of the time, versus octave band from 32 Hz to 16 kHz); and (C) Greiner and Eggers (peak output level in dB, for 1%, 50% and 90% of the time, versus frequency in Hz).

and 1% of the time is almost constant, at about 3 dB, over all the frequencies of interest. As would be expected, the average for 1% of the time has lower values. Greiner and Eggers [13] worked with a large number of time divisions, and only the amplitudes of orchestra performances for 1%, 50% and 90% of the time are averaged for the review of their work in this paper. The averaged amplitude for 1% of the time mostly stays around -10 dB relative to 2 volts per octave band, and that for 50% of the time stays around -20 dB at frequencies from 63 Hz to 2 kHz before an obvious decrease from 2 kHz to 16 kHz. For 90% of the time, a steep increase and decrease is found at frequencies from 32.5 Hz to 250 Hz and from 2 kHz to 16 kHz, respectively. Similar surveys of other types of music and speech have been made. Farina [14] measured the long term average spectrum of music from the personal music players of high school students (15-18 years old) with over 13 hours of music. Long term average spectra of speech have also been studied extensively, and consensus data are given in standards for speech intelligibility measurement (such as ANSI S3.5-1997) [15]. The International Electrotechnical Commission has developed a spectrum to represent the long term distribution of general program content for equipment testing purposes (IEC 60268-1) [16].

2.2. Analysis of a selection of anechoic recordings
The power spectral characteristics of orchestral recordings in auditoria confound the spectral characteristics of the signal (orchestra) and system (room). Hence the previous studies cited are analyses of both the musical sources and the acoustic environments in which the recordings were made. In developing a weighting filter relevant to the orchestral signal alone it would be better to use data from anechoic recordings. In this section of the paper we examine seven anechoic recordings of music from Denon Test CD No. 2 [17]. 
When these recordings were made, sound was conveyed to the conductor's headphones with two seconds of reverberation time in order to eliminate factors which might influence the performance due to playing in the anechoic condition [17]. Details of the seven performances are given below. The first four performances last for around 30 seconds, while the rest last for around 90 seconds.

1. Bruckner, Symphony No. 4 in E-flat major, Romantic (excerpt from first movement)
2. Handel, Water Music (Harty edition, excerpt from sixth piece, Allegro Deciso)
3. Mozart, The Marriage of Figaro, KV492 (excerpt from Overture)
4. Shostakovich, Symphony No. 5 in D minor, Op 47 (excerpt from first movement)
5. Johann and Josef Strauss, Pizzicato Polka
6. Bizet, L'Arlésienne Suite No. 2, Minuet
7. Glinka, Ruslan and Ludmilla (excerpt from Overture)

Samples 1 and 4 contain brass and string sound playing at forte and fortissimo. Sample 7 also has a loud brass and string part with an addition of timpani. Samples 5 and 6 are softer than the other samples. For sample 5, a small number of strings are played mostly at pianissimo and piano. As in sample 5, one flute and one piano are played at mezzo piano and piano for most of sample 6. For sample 2, a large number

of strings are played at around mezzo forte and sample 3 has the widest dynamic range (from pianissimo to forte of strings and brasses) of the seven samples.

Figure 2. The spectral amplitudes of: (A) samples 1, 2 & 3; (B) samples 4 & 5; (C) samples 6 & 7; and (D) power averages of all samples (Music Spectrum Loud) and of samples 5 & 6 (Music Spectrum Quiet), shown along with the music spectrum from personal music players found by Farina [14] and the IEC program curve, as a function of the one-third octave bands (level in dB versus frequency in Hz, 31.5 Hz to 16 kHz).

According to the manufacturer of Denon Test CDs [17, 18], all the performances were sampled in anechoic conditions, which met recommendations specified in ISO 3745 [19] for anechoic chambers. With respect to microphone positions, two omnidirectional microphones at positions above the head of the conductor were mainly used. To sample instruments sounding weak, a number of spot microphones were used. Time differences, which occurred due to different microphone positions, were compensated for in the recording process [18]. All seven samples used in this study are in a two-channel stereophonic format. Therefore, squared amplitudes of left and right channels were added to provide single values comparable with the previous studies. The obtained spectrum values represent Leq in each one-third octave band relative to full scale. Figure 2 (A, B and C) shows amplitudes of all the seven samples. As seen in the figure, amplitudes of samples 1, 2, 3, 4 and 7 are between -20 dB and -40 dB for most one-third octave bands, and samples 5 and 6 are below -40 dB for most one-third octave bands. Differences between the two groups

become greater as frequencies increase. Samples 1, 2, 3, 4 and 7 yield a similar pattern while samples 5 and 6 are distinctive. Figure 2 (D) shows the average of all the seven samples and of samples 5 & 6. Music Spectrum Loud refers to the former (since power averaging means that the spectra of samples 5 and 6 have little influence on the result) and Music Spectrum Quiet to the latter. The two graphs shown in Figure 2 (D) would be appropriate spectra to suggest as representing playing at two different dynamics: quiet and loud. In the mid and high frequency range, the Music Spectrum Loud is similar in profile both to Farina's [14] music spectrum and the IEC program curve, but there is substantially less low frequency energy in our music spectrum than in the Farina and IEC spectra. Compared to the three previous studies, the Music Spectrum Loud profile is somewhat similar to that of McKnight (Figure 1 A) except in the very high frequency range (correlation coefficient of r = 0.88 for frequencies below 8 kHz). However, it differs more from Bauer's (especially above 1 kHz) and Greiner and Eggers' percentile spectra (Figure 1 B and C).

Figure 3. Music Spectrum Loud (with best fit line) and music filter derived from Music Spectrum Loud (level relative to 1 kHz in dB versus frequency in Hz, 31.5 Hz to 16 kHz).

Figure 3 shows how a music filter may be derived from the power spectrum of music. As mentioned previously, RIRs are measured using an initial stimulus possessing a white spectrum, which has a spectral slope of +3 dB per octave band or +1 dB per one-third octave band. The task of a music filter is to convert a white spectrum to a music spectrum, and so it is the product of the desired music spectrum and a pink filter. Hence, in Figure 3, the Music Spectrum Loud is used to derive a music filter by multiplying with a -3 dB/octave function (pink), and the resulting filter function is smoothed. 
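The derivation described above can be sketched in decibel terms: the band level of a white spectrum in constant-percentage bands rises by 10 log10 of the frequency ratio (about 3 dB per octave, 1 dB per one-third octave), so the filter gain is the target music band level minus that rise. The sketch below is an illustration under stated assumptions rather than the procedure actually used for Figure 3: the function names are hypothetical, exact base-2 band centres are used instead of nominal ones, the target spectrum is a made-up flat placeholder rather than Music Spectrum Loud, and the smoothing step is omitted.

```python
import math


def third_octave_centres_hz(lowest_index=-15, highest_index=12):
    """Exact base-2 one-third-octave centres around 1 kHz (31.25 Hz to 16 kHz)."""
    return [1000.0 * 2.0 ** (k / 3.0) for k in range(lowest_index, highest_index + 1)]


def white_band_level_db(f_hz, f_ref_hz=1000.0):
    """Band level of a white spectrum relative to the band at f_ref_hz.

    Bandwidth is proportional to centre frequency, hence +3.01 dB per octave.
    """
    return 10.0 * math.log10(f_hz / f_ref_hz)


def music_filter_db(music_levels_db, centres_hz, f_ref_hz=1000.0):
    """Gain per band (dB) that maps white band levels onto the target band levels.

    Subtracting the white band level is the dB equivalent of multiplying the
    target spectrum by a pink (-3 dB/octave) function. Normalised to 0 dB at f_ref_hz.
    """
    gains = [level - white_band_level_db(f, f_ref_hz)
             for level, f in zip(music_levels_db, centres_hz)]
    ref_index = min(range(len(centres_hz)), key=lambda i: abs(centres_hz[i] - f_ref_hz))
    return [g - gains[ref_index] for g in gains]


centres = third_octave_centres_hz()
flat_target = [0.0] * len(centres)  # hypothetical placeholder, not measured data
pink_only = music_filter_db(flat_target, centres)
print(round(pink_only[3] - pink_only[0], 2))  # -3.01 (change over one octave)
```

With a flat target the result reduces to the pink filter alone; substituting measured one-third octave band levels for `flat_target` would give an unsmoothed music filter.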
Note that, as a simple alternative to applying a music filter, a pink filter could be used to bring an impulse response somewhat closer to typical listening conditions, although it does deviate substantially from the music filter at the extremes of the frequency range. Similarly, the IEC program curve, multiplied by the pink filter function,

could be used, but it also deviates considerably from the anechoic orchestral music in the low frequency range.

Figure 4. Octave band reverberation time of the two halls (2800 seats and 700 seats; time in s versus frequency in Hz, 31.5 Hz to 16 kHz).

3. APPLICATION OF A MUSIC FILTER AND PINK FILTER TO MEASURED RIRS
The music filter (best-fit line) and pink filter shown in Figure 3 were applied to measured RIRs from two concert halls. The RIRs were measured by Farina and colleagues in two halls of Rome's Parco della Musica, seating 700 and 2800 [20]. The three RIRs used are named Small-Close, Large-Close and Large-Distant. Small-Close refers to the RIR measured in the 700-seat hall at a receiver position 12 m from the on-stage source. Large-Close represents the RIR measured in the 2800-seat hall at a receiver position 20.5 m from the on-stage source, and Large-Distant is the RIR measured 48 m from the source in the same hall as Large-Close. The RIRs were measured with fixed system gain, and so vary in level according to the acoustic conditions. In our analysis, we gave the Small-Close RIR an instantaneous peak sound pressure level of about 85 dB, as seen in Figure 6; the cumulative power sum of the RIR at this gain would be substantially higher. Hence, we use the filter derived from loud (rather than quiet) music in this analysis. To provide a rough idea of the acoustical conditions of those halls, reverberation times for the two halls are presented in Figure 4. Reverberation times from two receiver positions in the small hall and three receiver positions in the large hall were averaged for Figure 4. Those positions include the receiver positions for the three RIRs. Figure 5 shows the sound pressure level of the three RIRs with the application of the two filters (music filter and pink filter) and without the filters, as a function of time. 
These sound pressure levels use exponential temporal integration with a 125 ms time constant (equivalent to the fast setting of a sound level meter). The remaining curve refers to the unfiltered RIR. As seen in the figure, Small-Close has the greatest sound pressure

level, and Large-Distant the least sound pressure level, as would be expected. For all RIRs, the application of filters produces an overall gain. The gains produced by the pink filter are generally between 5 dB and 8 dB at the start of the decay curves, but increase towards the noise floor at the tail of the impulse response recordings. The gains produced by the music filter are within 5 dB at the start of the decay curves and decrease towards the noise floor.

Figure 5. The sound pressure level (unweighted) of the three RIRs (Small-Close, Large-Close and Large-Distant) with an application of the two filters, as a function of time (sound pressure level in dB versus time in s).

Although the decay curves of Figure 5 are generated using the commonly used fast integration time of 125 ms, it is unusual to analyse RIRs with this type of integration. Hence, by way of comparison, Figure 6 shows the decay curve of the Small-Close RIR compared to its instantaneous sound pressure level (derived from the Hilbert transform). This reveals the extent to which the decay curve has been smoothed by fast integration, as well as the contrast between instantaneous and integrated sound pressure level at the start of the RIR. Fast integration is intended to emulate auditory temporal integration (for some signals, 125 ms is the duration beyond which an increase in duration does not yield increased loudness [7]). Therefore it makes an interesting comparison with the results of dynamic loudness modelling. The reason why the vertical axes of Figures 5 and 6 are in sound pressure level units, rather than level with respect to some arbitrary reference (such as full scale amplitude

of the medium) is that loudness modelling requires an assumption to be made about the sound pressure level received by the listener. Loudness models are non-linear, and will only yield useful results for reasonable listening levels. The gain that yielded the sound pressure levels selected for this analysis was chosen because these are of a similar order to levels that might be experienced in an auditorium.

Figure 6. The sound pressure level of the Small-Close RIR, comparing instantaneous level (grey) with fast integration (black) (sound pressure level in dB versus time in s).

Figure 7 shows the modelled loudness of the three RIRs with an application of the two filters and without the filters. The model used is Chalupper and Fastl's [9] dynamic loudness model, which is implemented in the computer program PsySound3 [21]. As seen in the figure, the unfiltered and filtered RIRs for Small-Close all show greater loudness than those for Large-Close and Large-Distant. In contrast to the sound pressure level comparisons shown in Figure 5, the original and pink filtered signals yield similar loudness, while the music filtered signals are slightly but noticeably quieter. The fine temporal structure of the decay curves is similar, regardless of the application of a filter. Greater detail in the fine temporal structure is evident in the loudness decays than in the sound pressure level decays of Figure 5. A striking feature of the loudness decay curves in Figure 7 is that they appear to exhibit approximately exponential decay, like the signal's decay curve prior to transformation to decibels. However, closer examination shows that while the first part of the loudness decay curves is approximately exponential, this is followed by faster loudness decay. 
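The fast integration used for the sound pressure level curves of Figures 5 and 6 can be sketched as a one-pole low-pass filter applied to the squared signal. The following is a minimal illustration, not the processing actually used in this paper: the function names are hypothetical, the input is a synthetic stand-in for an RIR (Gaussian noise with an exponentially decaying envelope and an assumed reverberation time of 2 s), and levels are in dB relative to an arbitrary unit rather than calibrated sound pressure level.

```python
import math
import random


def fast_integrated_level_db(samples, fs, time_constant_s=0.125):
    """Exponentially time-weighted level (dB re 1) of a sampled signal.

    A one-pole low-pass filter is applied to the squared samples; 0.125 s is
    the 'fast' time constant referred to in the text.
    """
    alpha = 1.0 - math.exp(-1.0 / (fs * time_constant_s))
    mean_square = 0.0
    levels = []
    for x in samples:
        mean_square += alpha * (x * x - mean_square)
        levels.append(10.0 * math.log10(max(mean_square, 1e-30)))
    return levels


def synthetic_rir(fs, duration_s, rt60_s, seed=0):
    """Hypothetical stand-in for a measured RIR: noise whose amplitude
    envelope falls by 60 dB over rt60_s."""
    rng = random.Random(seed)
    n = int(duration_s * fs)
    return [rng.gauss(0.0, 1.0) * 10.0 ** (-3.0 * (i / fs) / rt60_s) for i in range(n)]


fs = 8000  # assumed sampling rate for the illustration
rir = synthetic_rir(fs, duration_s=2.0, rt60_s=2.0)
levels = fast_integrated_level_db(rir, fs)
drop_db = levels[int(0.5 * fs)] - levels[int(1.5 * fs)]
print(f"Level drop from 0.5 s to 1.5 s: {drop_db:.1f} dB")
```

An integrator with a 125 ms time constant cannot itself decay faster than about 34.7 dB/s (10 log10(e) / 0.125), so for this synthetic input the printed drop is expected to be somewhat less than the 30 dB lost by the underlying envelope over the same second. This is one way of seeing the smoothing effect that Figure 6 illustrates.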
Figure 8 compares the exponential decay rates (by using a logarithmic value scale) for the Small-Close RIR. In addition to showing the pressure and pressure-squared decays, it shows Stevens' power law [22] for loudness (where loudness is proportional to pressure raised to the power of 0.6). The comparison shows that the modelled loudness decay rate

BUILDING ACOUSTICS Volume 16 Number 1 2009 41 Loudness (sones) 25 20 15 10 5 Small-Close Large-Close 0 0 0.5 1 1.5 2 2.5 Time (s) 0 0.5 1 1.5 2 2.5 Time (s) Loudness (sones) 25 20 15 10 5 Large-Distant Figure 7. 0 0 0.5 1 1.5 2 2.5 Time (s) Loudness of the three RIRs with an application of the two filters, as a function of time. is similar to that expected from Stevens power law, but with a faster decay rate once low sound pressure levels are encountered. This faster decay rate would be expected from steady state loudness theory from the fact that the fixed loudness exponent of 0.6 only applies to sounds of moderate loudness (for sound pressure levels roughly between 40 and 80 db). The consistency of the modelled decay with steady state loudness theory suggests that temporal integration (and temporal masking) is having little effect on the coarse structure of the loudness decay. Figure 9 shows the averaged specific loudness (sones/bark) as a function of critical band rate (Bark). The specific loudness pattern can be thought of as a psychoacoustical spectrum, where values are the loudness attributable to the critical band rate units. As seen in the figure, yields the greatest specific loudness at critical band rates from approximately 11 Bark to 24 Bark and the music filtered RIRs attain the highest specific loudness at critical band rates from 3 Bark to 11 Bark. For the pink filter, a substantial increase in specific loudness below 3 Bark appears, which is probably due to the greater loudness growth function in the low frequency range (where the loudness exponent becomes greater than 0.6). The charts show the importance of the peak in the outer ear transfer function above 15 Bark.
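The Stevens' power law comparison in Figure 8 rests on a simple property: raising an exponentially decaying pressure envelope to a fixed power gives another exponential, with its decay rate scaled by that power. The following is a minimal Python sketch of that property only. It assumes an idealised envelope p(t) = exp(-t / TAU) with a hypothetical time constant TAU, and it models neither the faster decay at low sound pressure levels nor the Chalupper and Fastl model.

```python
import math

STEVENS_EXPONENT = 0.6  # loudness proportional to pressure ** 0.6


def log_decay_rate(values, dt):
    """Average slope of ln(value) per second between the first and last samples."""
    duration = dt * (len(values) - 1)
    return (math.log(values[-1]) - math.log(values[0])) / duration


# Assumption: an idealised envelope p(t) = exp(-t / TAU), relative to its maximum.
TAU = 0.25  # hypothetical time constant in seconds
DT = 0.01   # time step in seconds
times = [i * DT for i in range(101)]  # 0 s to 1 s

pressure = [math.exp(-t / TAU) for t in times]
squared_pressure = [p ** 2 for p in pressure]
stevens_loudness = [p ** STEVENS_EXPONENT for p in pressure]

rates = {
    "pressure": log_decay_rate(pressure, DT),
    "squared pressure": log_decay_rate(squared_pressure, DT),
    "Stevens' power law": log_decay_rate(stevens_loudness, DT),
}

# Express each decay rate as a multiple of the pressure decay rate.
for name, rate in rates.items():
    print(f"{name}: {rate / rates['pressure']:.2f} x pressure decay rate")
```

On a logarithmic value scale the three curves are straight lines whose slopes are in the ratio 1 : 2 : 0.6, which is why a modelled loudness decay that follows Stevens' power law appears shallower than the pressure decay.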

Figure 8. Comparison of decay rates on a logarithmic scale (value relative to maximum, against time in seconds), for the Small-Close RIR (without music filtering). Normalised A-weighted squared pressure and pressure are shown, together with loudness, and the application of Stevens' power law to the pressure decay curve.

4. DISCUSSION

This paper examines some issues that need to be addressed in applying loudness models to RIR analysis. It has examined two aspects of this: (i) the application of spectral weighting to bring a RIR closer to the long term spectrum of orchestral music; and (ii) the application of a dynamic loudness model to filtered (and unfiltered) RIRs from concert auditoria. The results indicate the type and extent of differences that might occur when applying these approaches to the analysis of RIRs from concert auditoria.

Perhaps there is no correct solution to the design of music filters because the spectral characteristics of music vary so much. The filters explored in this paper are taken as possible solutions, and are used by way of example. The similarity between the music spectrum used and the IEC program spectrum (except at low frequencies) provides some confidence in the representativeness of the music filter. If more defensible music filters were to be derived for orchestral music, much more extensive anechoic recordings would be needed. Nevertheless, even if an ideal representative spectrum were derived, other factors such as the directivity of sound radiated from the source come into play (presumably the direct sound is heard from the front of the orchestra, while the reverberation is heard from sound averaged over all radiation directions). On the other hand, auditory analysis of RIRs without applying a filter makes little sense because of the white spectral bias of the excitation signal.
The pink filter has some appeal because of its simplicity, although it results in excessive energy in the very low and very high frequency ranges.
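As a rough illustration of why a pink filter boosts the very low frequency range relative to a white spectrum, the sketch below tabulates the gain of an idealised pink slope. It assumes the usual definition of a pink spectrum (power spectral density falling at 3 dB per octave) and a hypothetical 1 kHz reference frequency. The filter actually used for the analysis is not restated in this section, so the function name and values are illustrative only.

```python
import math


def pink_gain_db(freq_hz, ref_hz=1000.0):
    """Gain in dB of an idealised -3 dB/octave power slope, 0 dB at ref_hz."""
    return -10.0 * math.log10(freq_hz / ref_hz)


# Nominal octave-band centre frequencies (Hz), used only to tabulate the slope.
octave_centres_hz = [31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000, 16000]
pink_gains_db = {f: pink_gain_db(f) for f in octave_centres_hz}

for f, gain in pink_gains_db.items():
    print(f"{f:>7} Hz: {gain:+6.1f} dB relative to 1 kHz")
```

Under these assumptions the gain rises by about 3 dB for every octave below the reference frequency, so the lowest octave bands are boosted by roughly 12 to 15 dB relative to 1 kHz.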

Figure 9. Average specific loudness (sones/Bark) of the three RIRs (panels: Small-Close, Large-Close, Large-Distant) with the two filters, as a function of critical band rate (Bark).

To use loudness models well, the signal should be calibrated to a realistic listening level. This could be done quite precisely if the sound power level of relevant music and the strength factor associated with each RIR were known. For the present analysis, neither of these pieces of information is available, but an approximate assumption can be made about listening level. Nevertheless, the problem remains that loudness models are non-linear with respect to sound pressure level (loudness growth and upward masking patterns change substantially with sound pressure level). An alternative solution might be to simplify the loudness model, to remove the non-linear gain dependence. The similarity between the Stevens' power law slope and the dynamic loudness model slope in Figure 8 suggests a starting point for such a simplification (but the loudness model's temporal resolution is finer than that produced by fast temporal integration). The loudness decay function is exponential at first, and is consistent with the loudness that might be calculated from a steady state loudness model (although this might not be so for very short reverberation times).

A more subtle aspect of this problem is that the dynamic characteristics of RIRs are very different to those of music, meaning that a dynamic loudness model will respond differently to RIRs than to music in auditoria. This, at least, will impinge on the process of applying realistic gain, and is likely also to be important in interpreting analysis results.
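The calibration step discussed above, scaling a signal to an assumed listening level before loudness analysis, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used for the present analysis: the target of 70 dB, the use of RMS level over the whole signal, and the toy decaying signal are all hypothetical choices.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (20 micropascals)


def rms_level_db(samples_pa):
    """RMS sound pressure level in dB re 20 micropascals."""
    mean_square = sum(x * x for x in samples_pa) / len(samples_pa)
    return 10.0 * math.log10(mean_square / P_REF ** 2)


def gain_for_target_level(samples, target_db):
    """Linear gain that brings the RMS level of `samples` to `target_db`."""
    return 10.0 ** ((target_db - rms_level_db(samples)) / 20.0)


def apply_gain(samples, gain):
    """Scale every sample by the same linear gain."""
    return [gain * x for x in samples]


# Toy decaying signal standing in for an RIR (arbitrary units before calibration).
signal = [math.exp(-n / 200.0) * math.sin(0.3 * n) for n in range(2000)]

target_db = 70.0  # hypothetical listening level
calibrated = apply_gain(signal, gain_for_target_level(signal, target_db))
print(f"Calibrated RMS level: {rms_level_db(calibrated):.1f} dB")
```

Because loudness models are non-linear with respect to level, running the same analysis at several plausible values of `target_db` is one way to see how strongly a result depends on the gain assumption.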

Numerous parameters could be derived from this type of RIR analysis. Most obviously, the calculated loudness of the RIR might be used analogously to strength factor in estimating the perceived loudness of the acoustical system. Rather than weighting the spectrum indirectly by selecting particular octave band values derived from an un-weighted RIR, music-filtering followed by loudness analysis provides a single value that is inherently weighted by the spectral characteristics of music and the sensitivity of the auditory system. However, the most suitable period over which this single-number loudness calculation is made, along with the most suitable gain assumption in the loudness analysis, needs to be explored using results from yet-to-be-performed subjective tests.

The fact that loudness decay is roughly exponential makes for a straightforward analogy with conventional acoustical parameters such as early decay time and reverberation time. Based on Stevens' power law, the time taken for the loudness decay function to halve is analogous to the early decay time evaluation interval of 10 dB. Similar analogies could be constructed with the reverberation time evaluation intervals of T20 and T30 (for example, the time interval between 0.708 and 0.178 of the peak loudness is analogous to the T20 evaluation period of -5 dB to -25 dB). Again, for such analogous parameters to be meaningful, a subjective study is needed to investigate how perception relates to potential parameters. Further analysis also shows that the loudness decay time is not independent of gain, and increases by a factor of about 0.15 per 10 dB gain using Chalupper and Fastl's loudness model. These complications may be seen as disadvantages of loudness-based RIR analysis, and so might be removed by simplifying the loudness modelling.
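The loudness ratios quoted for the early decay time and T20 analogies follow directly from Stevens' power law: a level drop of D dB is a pressure ratio of 10^(-D/20), and raising that to the power 0.6 gives the loudness ratio. A minimal Python sketch of this arithmetic, with a hypothetical function name and the fixed exponent of 0.6 assumed throughout:

```python
def loudness_ratio_for_level_drop(drop_db, exponent=0.6):
    """Relative loudness after the level falls by `drop_db`, under a fixed power law."""
    # Pressure ratio is 10 ** (-drop_db / 20); loudness is that ratio ** exponent.
    return 10.0 ** (-exponent * drop_db / 20.0)


# Early decay time analogue: a 10 dB drop roughly halves the loudness.
edt_end = loudness_ratio_for_level_drop(10)

# T20 analogue: evaluation range from 5 dB to 25 dB below the peak.
t20_start = loudness_ratio_for_level_drop(5)
t20_end = loudness_ratio_for_level_drop(25)

# T30 analogue: evaluation range from 5 dB to 35 dB below the peak.
t30_end = loudness_ratio_for_level_drop(35)

print(f"EDT analogue ends at {edt_end:.3f} of peak loudness")
print(f"T20 analogue spans {t20_start:.3f} to {t20_end:.3f} of peak loudness")
print(f"T30 analogue ends at {t30_end:.3f} of peak loudness")
```

These ratios hold only where the fixed exponent applies, so the lower limits (especially for a T30 analogue) may fall in the region where the modelled loudness decays faster than the power law predicts.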
Alternatively, these complications may be taken as potential tools in evaluating how a room, as represented by its RIRs, responds to different types of signals.

The RIRs analysed here were measured using an omnidirectional microphone. However, a more detailed approach could be taken using a binaural RIR, and possibly a binaural loudness model. The binaural summation procedure proposed by Sivonen and Ellermeier [23] has some potential for this if a single time-varying specific loudness pattern is desired. That model performs binaural signal summation prior to input into an arbitrary loudness model (and so could be applied to Chalupper and Fastl's dynamic model). Another approach could be to use the binaural summation procedure proposed by Moore and Glasberg [24], which may be applied to the output of Glasberg and Moore's [25] time-varying loudness model applied to each ear. That would allow an assessment of the loudness attributable to each ear, although since the analysis does not include phase information, it would not provide sufficient data for detailed auditory spatial modelling. Conventionally, binaural RIRs are analysed using the interaural cross correlation (and not the interaural level differences), and perhaps there is some prospect for integrating these approaches.

One question that arises from this approach is whether a loudness-based analysis of RIRs is in fact a good representation of auditory perception and, in a broader sense, low level cognition. Partly, this is to do with the question of whether loudness models are accurate. Beyond this, it might be that the important attributes of RIRs are not just related to loudness, but more to some measure of salience. For example, although there
is a very dramatic decline in loudness as a RIR decays, a listener's attention may be drawn into listening to the details in the quieter parts of the reverberant tail.

5. CONCLUSION

A psychoacoustical approach to RIR analysis has some possibilities, but there are considerable challenges to overcome in developing a practically useful analysis method. While such methods may draw on pre-existing psychoacoustical models, ultimately they should be validated and refined using subjective responses to appropriate stimuli, as pre-existing loudness models are derived from subjective data that are very different from RIRs. The derivation of single number parameters from loudness decay functions has not been explored here, but the fact that the decay functions are relatively simple suggests that this should be feasible. Again, the parameters would need to be based on subjective data (for example, assessments of reverberation period, overall loudness, clarity, and even spatial attributes for music sources convolved with RIRs). The work in this paper is the first step of a larger research project.

REFERENCES

[1] Schroeder, M. R., New Method of Measuring Reverberation Time, Journal of the Acoustical Society of America, 1965, 37, 409-412.
[2] Morgan, D. R., A Parametric Error Analysis of the Backward Integration Method for Reverberation Time Estimation, Journal of the Acoustical Society of America, 1997, 101(5), 2686-2689.
[3] Stan, G. B., Embrechts, J. J. and Archambeau, D., Comparison of Different Impulse Response Measurement Techniques, Journal of the Audio Engineering Society, 2002, 50(4), 249-262.
[4] Faiget, L., Legros, C. and Ruiz, R., Optimization of the Impulse Response Length: Application to Noisy and Highly Reverberant Rooms, Journal of the Audio Engineering Society, 1998, 46(9), 741-750.
[5] Soulodre, G. A. and Bradley, J. S., Subjective Evaluation of New Room Acoustic Measures, Journal of the Acoustical Society of America, 1995, 98(1), 294-301.
[6] Moore, B. C. J., Glasberg, B. R. and Baer, T., A Model for the Prediction of Thresholds, Loudness, and Partial Loudness, Journal of the Audio Engineering Society, 1997, 45(4), 224-240.
[7] Zwicker, E. and Fastl, H., Psychoacoustics: Facts and Models, Springer, Berlin; New York, 1999.
[8] Zwicker, E., Procedure for Calculating Loudness of Temporally Variable Sounds, Journal of the Acoustical Society of America, 1977, 62, 675-682.
[9] Chalupper, J. and Fastl, H., Dynamic Loudness Model (DLM) for Normal and Hearing-impaired Listeners, Acustica, 2002, 88, 378-386.
[10] Sivian, L. J., Dunn, H. K. and White, S. D., Absolute Amplitudes and Spectra of Certain Musical Instruments and Orchestras, Journal of the Acoustical Society of America, 1931, 2, 330-371.
[11] McKnight, J. G., The Distribution of Peak Energy in Recorded Music, and Its Relation to Magnetic Recording Systems, Journal of the Audio Engineering Society, 1959, 7, 65-80.
[12] Bauer, B. B., Octave-band Spectral Distribution of Recorded Music, Journal of the Audio Engineering Society, 1970, 18, 165-172.
[13] Greiner, R. A. and Eggers, J., The Spectral Amplitude Distribution of Selected Compact Discs, Journal of the Audio Engineering Society, 1989, 37, 246-275.
[14] Farina, A., A Study of Hearing Damage by Personal MP3 Players, 123rd Audio Engineering Society Convention, New York, NY, USA, 2007.
[15] ANSI S3.5-1997, Methods for Calculation of the Speech Intelligibility Index, American National Standards Institute.
[16] IEC 60268-1:Bilingual 1988, Amendment 1 - Sound system equipment - Part 1: General, International Electrotechnical Commission.
[17] Denon Professional Test CDs [CD-ROM]. Japan: Nippon Columbia.
[18] Anechoic Orchestral Music Recordings 1995. [CD-ROM]. Japan: Nippon Columbia.
[19] ISO 3745:2003, Acoustics - Determination of sound power levels of noise sources using sound pressure - Precision method for anechoic and hemi-anechoic rooms, International Organization for Standardization.
[20] Farina, A. and Ayalon, R., Recording Concert Hall Acoustics for Posterity, 24th Audio Engineering Society Conference, Banff, Canada, 2003.
[21] Cabrera, D., Ferguson, S., Rizwi, F. and Schubert, E., PsySound3: A Program for the Analysis of Sound Recordings, Acoustics 2008, Paris, France, 2008a.
[22] Stevens, S. S., The Measurement of Loudness, Journal of the Acoustical Society of America, 1955, 27(5), 815-829.
[23] Sivonen, V. P. and Ellermeier, W., Binaural Loudness for Artificial-head Measurements in Directional Sound Fields, Journal of the Audio Engineering Society, 2008, 56(6), 452-461.
[24] Moore, B. C. J. and Glasberg, B. R., Modeling Binaural Loudness, Journal of the Acoustical Society of America, 2007, 121(3), 1604-1612.
[25] Glasberg, B. R. and Moore, B. C. J., A Model of Loudness Applicable to Time-varying Sounds, Journal of the Audio Engineering Society, 2002, 50, 331-342.