Proceedings of Meetings on Acoustics, Volume 19, 2013
ICA 2013 Montreal, Montreal, Canada, 2-7 June 2013
Architectural Acoustics
Session 1aAAa: Advanced Analysis of Room Acoustics: Looking Beyond ISO 3382 I

1aAAa3. What is "clarity", and how can it be measured?

David H. Griesinger*

*Corresponding author's address: David Griesinger Acoustics, 221 Mt Auburn St #504, Cambridge, MA 02138, dgriesinger@verizon.net

There is a distinct difference between speech that is crystal clear and speech that is moderately intelligible but muddy. Muddy speech can often be understood, but it is difficult to pay attention to and remember. Unfortunately the current ISO 3382 measures are blind to this difference. We will demonstrate the vital importance of the difference between "clear" and "muddy" through examples of speech and music. We will then present three physiologically based methods that can measure the degree of crystal clarity in a particular acoustic environment. Two of these measures utilize measured binaural impulse responses. The first, LOC, uses a simple nerve-firing model to analyze an impulse response for the buildup of reflected energy at the onsets of sounds. The second measures the degree of phase randomization above 1000 Hz caused by a particular impulse response. The third measure, based on a computer model of human hearing, measures clarity directly from binaurally recorded speech. All three measures predict perceived clarity with useful accuracy.

Published by the Acoustical Society of America through the American Institute of Physics. © 2013 Acoustical Society of America. Received 21 Jan 2013; published 2 Jun 2013.

INTRODUCTION

My first brush with some of the acoustic measurement methods later enshrined in the ISO 3382 standards came when John Bradley agreed to measure the first large-scale example of the LARES acoustic enhancement system, installed by the late Neil Muncy, Steve Barbar, and myself in the Wintergarden of the Elgin theater in Toronto. The Wintergarden, a small jewel-box of a Vaudeville theater, had just been immaculately restored. As is typical of such venues it had perfect acoustics for speech and light music. The acoustics were dry, but with high intelligibility and a kind of precise clarity throughout the audience. I was to realize only much later that these are also the acoustics of the countless jewel-box theaters all around Europe where Italian opera was conceived and can still be heard. But Neil was convinced some electronics could add a bit of reverberation and improve the sound. The Wintergarden was an ideal place to try it, as electronically adding sound to a room can reduce clarity, but if clarity is already high, adding just a bit of extra late energy can make the sound more beautiful. And it did. The three of us were proud of the result, as were representatives of the Toronto Symphony, who agreed to hold a series of Mozart performances in the restored hall.

But when Bradley measured the space there appeared to be almost no difference between the system off and the system on. Where there was a difference, the measurement calculations indicated that the sound should be worse. This experience started a thirty-year quest to find out why the most common ISO 3382 measures fail to predict actual sound quality in particular seats, and what might be done to fix them. Neil Muncy was a constant supporter. "Acoustics is like an onion," he said. "You think you understand one layer, and just below it is another layer you don't understand." I asked John Bradley why he did all his calculations in octave bands rather than third-octave bands. He said, "Third-octave bands just give you three times as much information that you don't understand."

But even onions have hearts, and the heart of the acoustic onion is CLARITY. Not the clarity that is badly defined by C80 and C50. I am referring to the kind of clarity that makes a listener snap to full attention and say "What was that!" This is the kind of clarity our brains evolved to instantly detect, and the ear and brain hardware that detects it holds the key to understanding why the common ISO measures perform poorly, and how to replace them with measures that work.

The author recognizes the considerable hubris in this statement. While preparing this preprint I attended a chamber music concert in a large church in Cambridge. Many years ago I performed in this church, and I continue to make well-received recordings there. But about eight years ago some well-meaning people decided to cover all the acoustic plaster with dusky gold paint. The organ, a beautiful tracker instrument by Frobenius, is located flat against one of the walls of the crossing, and projects into the space with the strong advantage of the focused case and the wall behind it. It sounds fabulous in the new acoustic. But the RT at 500 Hz is now almost 2.5 seconds, and chamber music, usually performed from the center of the crossing, has become problematic. The concert I heard presented the Serenade for Wind Instruments by, and the Schubert Octet. Knowing the problems in the hall, I bought a ticket in the third row.
Clarity, as measured by the methods proposed in this preprint, should have been high, as there are no strong early reflections from the musicians to this seat. But the Serenade was a mess. Reverberation is not predicted by an energy sum of an impulse response. What you hear is the result of the interaction between held notes and the space. The loudness of reflections builds up with time. Figure 3 below shows the effect for early reflections, but in this space it is multiple late reflections that dominate. Short notes simply do not excite reverberation in this space. Long held notes do, and the Serenade is full of them. Whenever the three French horns played, everyone else except the oboes became inaudible. I never heard the second clarinet, regardless of how hard I tried. This is the result of masking, not of the phase modulation I define as loss of clarity. The Schubert was much better. The notes were quicker, the strings more directional. I almost heard everything. (People sitting further back in the nave were probably not so lucky.) Given the far too short rehearsal times for such a concert, the musicians have no chance to hear how their playing sounds in the hall, and might be unable to do anything about it if they could. Searching for a single number, or numbers, that would predict the quality in such a space is likely impossible, although definitely worth trying.

The concept of clarity presented here was developed to explain the quality of sound in relatively good halls. The measures work well in Boston Symphony Hall, with its 2700 seats and a reverberation time of 1.9 seconds. They also work well in a 300-seat chamber music venue with a reverberation time of about 1 second. To work, the measures assume that there is no significant upward masking from excessive low frequencies, and that there is sufficient space between speech syllables that reverberation from one word does not mask the direct sound from the next. The measures must be used with common sense. But if the goal of building a concert hall is to motivate people to occupy seats, then the ideas about clarity presented here, along with common knowledge of the directionality of musical instruments, might at least lead one to question the wisdom of putting forty percent or more of the audience behind the performers.

I learned about the importance of this kind of clarity by working with conductors such as Barenboim, Fish, Haenchen, Schonwandt, and Lockwood, and with five of the major drama directors in Copenhagen. In Berlin, Amsterdam, and Copenhagen we installed reverberation enhancement systems and had the opportunity to adjust them during rehearsals and live performances while sitting in different seats in the halls. All of these halls have excellent clarity, but lack the cushion of reverberation. But you can't add too much! All of these conductors could hear the point where half a dB less reverberant level made a dramatic difference in clarity, and when they heard the A/B test they all chose clarity, not reverberation, in their halls. Otherwise they wanted more reverberation.

In Copenhagen we attempted to increase intelligibility and loudness for actors in a shoebox drama theater (not a good shape for drama) they called the New Stage. We installed beam-forming microphones that covered the stage area, and designed a special electronic gate that turned the mikes on only for about 100 ms at the beginning of each syllable. The sound was distributed through 64 carefully delayed high-quality speakers in the house. We tested it in a live performance of Uncle Vanya in Danish with a full audience, turning the system on and off every 10 minutes as the five directors listened from different seats. At intermission they told me: "The system works, the actors are louder and more intelligible, but we hate it. Turn it off." After 20 minutes we had a consensus as to why. The system made the actors seem further away. "We would rather the audience not hear the words than have the actors seem further away," they said. "Then the audience will listen more carefully. This is what we want." Their recommendation was to train the actors to project better. Schonwandt, after conducting Tristan in the old opera next door, wanted to know all about it, and laughed out loud at their recommendation. "These young actors don't know how to speak Danish," he said.

This experience points out problems with our current concept of intelligibility. In the author's view, intelligibility is only good if a listener can not only identify words and syllables, but also remember the meaning of entire sentences 20 minutes later. This task requires much more brain power, and the process must be completed before the next sentence is received. When the listener is distracted by other thoughts the task is even harder, so one of the goals of good acoustics is to cause listeners to pay close attention. When clarity is poor attention is lost, and too much of the brain is occupied with simply hearing the sound and parsing its meaning. There is no working memory left for the process of storage. Current definitions of "very good" and "good" intelligibility are far too lax. Cinema and drama directors know this. Cinemas are acoustically dry spaces with sound from linear-phase and highly directional horn-loaded speakers. Boston drama theaters are apron-stage or old vaudeville palaces. Chamber music was written for similar spaces. You leave these forms at your peril.

In this paper we will concentrate on the perception of clarity, which is essentially the ability of a space to transfer information from a source to a listener with minimal loss of data. The perception of reverberance will come along for the ride, as reverberation is beautiful only when it enhances sound without degrading this information transfer.
We must have measurement techniques that quantify these properties, not only for concert halls and opera houses, but also for classrooms.

NEAR AND FAR IN ACOUSTIC ENVIRONMENTS

When any creature perceives a sound, the two most important immediate perceptions for survival are: "Where did that sound come from?" and "How close to me is the source?" Humans can almost instantly perceive by ear whether many types of sound are close to them or farther away, even with one ear and without loudness cues. Finding a source visually takes too long. Determining the meaning of the sound (comfort, threat, or the exchange of information) can come later. Our brains are hard-wired to pay immediate attention to sounds perceived as close, a phenomenon I call "Attention!", or simply A. So "near" and its opposite "far", as well as sound localization, are the most immediately necessary acoustic perceptions. How do we do it? What are the frequencies and the physics of distance and direction detection in reverberant environments?

Let's start with frequencies. It is well known that the information content of speech is (almost) entirely in frequencies above 500 Hz. For all people, even children, this means that information (at least the identity of vowels) is encoded in amplitude modulations of harmonics of a lower-frequency tone. This method of encoding is used by almost any creature that wishes to communicate, from insects to elephants. Why? Because harmonics have a unique property: they combine to make sharp peaks in the acoustic pressure at the frequency of the fundamental that created them. It is the presence or absence of these peaks that enables the perception of near and far. The sharp peaks also facilitate separating the signals from noise, and with the appropriate neural network the peaks from one sound source can be separated from those of another. The two traces shown in Figure 1 can be heard in the following link:
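As a simple illustration of this peak-forming property, the short Python sketch below (not from the paper; the fundamental, harmonic range, and equal amplitudes are arbitrary choices) synthesizes the same set of harmonics twice, once with aligned phases and once with randomized phases, and compares peak level with RMS level:

    import numpy as np

    fs = 48000                          # sample rate, Hz
    t = np.arange(int(0.1 * fs)) / fs   # 100 ms of signal
    f0 = 125.0                          # assumed fundamental of a male voice
    harmonics = np.arange(8, 25)        # harmonics falling roughly in the 1-3 kHz range
    rng = np.random.default_rng(0)

    def harmonic_complex(phases):
        """Sum of equal-amplitude harmonics of f0 with the given starting phases."""
        return sum(np.cos(2 * np.pi * n * f0 * t + p) for n, p in zip(harmonics, phases))

    coherent = harmonic_complex(np.zeros(len(harmonics)))                    # phases aligned
    scrambled = harmonic_complex(rng.uniform(0, 2 * np.pi, len(harmonics)))  # phases randomized

    for name, x in [("coherent", coherent), ("scrambled", scrambled)]:
        rms = np.sqrt(np.mean(x ** 2))
        peak = np.max(np.abs(x))
        print(f"{name:9s}  rms {rms:5.2f}  peak {peak:5.2f}  "
              f"crest factor {20 * np.log10(peak / rms):5.1f} dB")

Both versions have exactly the same RMS level and power spectrum, but the phase-aligned version shows a crest factor several dB higher. In what follows it is this peak structure, rather than the sound power, that is proposed to carry the near/far cue.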

FIGURE 1. The broadband A-weighted pressure waveform of the syllable "one". The top graph shows the sharp peaks created by the phase coherence of the upper harmonics. The lower trace shows the same signal in the presence of reverberation. Both traces have identical frequency response and sound power, but in the lower trace reflections have randomized the harmonic phases, and the regularly spaced peaks are gone. The upper trace is perceived as close to the listener, and the lower trace is perceived as far away. The impulse response for the lower trace has C80 and C50 equal to infinity, STI 0.96, and RASTI.

The current ISO measures are based on analyzing the sound power in impulse responses. But Figure 1 suggests some interesting ideas. What if the ear were not sensitive to sound power, but to the peaks shown in the top trace? Then the unmodified signal should sound louder than the modified one. It does. What if the absence of the peaks causes the signal to be perceived as far away? It does. What if the ear mechanism included some kind of coincidence circuit that fires when the spacing between the peaks corresponds to a certain pitch? Such a circuit would explain our uncanny ability to separate concurrent sounds from each other when they have different pitches. Such a coincidence circuit needs to be precisely tuned if it is to work, which explains our extraordinary acuity for pitch. Humans hear pitch as circular in octaves: a C has the pitch of C regardless of the octave in which it is played. Coincidence circuits have the same property regardless of how they are constructed. Also, no matter how they are constructed, they all require five to ten coincidences to achieve the pitch acuity of human hearing. For a fundamental of ~120 Hz this means a structure about 80 ms long (ten periods of a 120 Hz fundamental span 10/120 s, or roughly 83 ms). This time period is very familiar to us from the study of both loudness and intelligibility.

Clearly some such structure must exist. The author believes the circuit lies in the spiral ganglia, located just behind the inner hair cells in the inner ear. Each hair cell is innervated by at least 20 spiral ganglion cells. Although we do not know exactly how they are wired or how they work, there is speculation in the hearing and speech literature that the spiral ganglia are involved in perceiving sound in noisy environments. They are also damaged by continuous loud noise, reducing our ability to hear speech in noisy environments. This is an exciting field. We will soon know more, but we already know enough to make some powerful predictions about sound in rooms.

First, the peaks we see above only exist when the incoming signal consists of a tone with a definite pitch and many upper harmonics. Furthermore, the peaks only exist when two or more harmonics are present at the same time within one critical band. Critical bands are approximately one-third octave wide, which implies that for my voice the perception of near and far will only be detectable above about 1500 Hz, and that it will be impossible to detect from a whisper or from the sound of a group of voices speaking or singing the same pitch. In the latter case the upper harmonics are randomized by the chorus effect. The near/far perception from a female voice should only be detectable at about 2500 Hz and higher. All these predictions turn out to be true. We conclude that frequencies above 1000 Hz play a vital role in our abilities to perceive sound.

AZIMUTH DETECTION IN REVERBERANT ENVIRONMENTS

What about localization?
We know well that in anechoic environments binaural localization at frequencies below 1000 Hz is achieved through the interaural time difference (ITD), and at higher frequencies by the interaural level difference (ILD). Is this true for music or speech in a reverberant space? It turns out that it is not. If we sharply low-pass speech it can still be localized in an anechoic space, but it becomes increasingly difficult to do so when reverberation is present. If we study the threshold of localization of male speech as a function of the direct-to-reverberant ratio (D/R) in octave bands, we find that the threshold drops 6 dB per octave as frequency rises, up to about 1000 Hz, and then holds constant [1]. Localization and the perception of clarity are high-frequency phenomena in reverberant spaces. This does not mean that localization is entirely dependent on the ILD, although above 1000 Hz head shadowing can produce at least a 1 dB difference in level between the two ears for a 2 degree shift in frontal azimuth. (Two degrees is the accepted JND for binaural localization of speech.) But the peaks shown in Figure 1 exhibit a time delay between the two ears, and speech can easily be localized in azimuth by ITD alone, although the acuity is less than when ITD and ILD are combined.

It is exciting, and a bit disturbing, to realize that the information content of speech and much music, the sense of sonic distance, the perception of clarity, the ability to sharply localize instruments, and the ability to separate multiple sound streams all depend critically on the harmonics above 1000 Hz from sounds with a definite pitch.

CLARITY, REVERBERANCE, AND ENVELOPMENT

Bradley and Soulodre [2] have suggested that the perception of envelopment depends on the strength of reverberant energy arriving 80 ms or more after the direct sound. (For orchestral music the author would prefer 100 to 120 ms.) They proposed a measure called late lateral energy level, or GLL, as a measure for envelopment. This is a useful proposal, as the strength of late reverberation is vital for both reverberance and envelopment. But built into their measure is the assumption that the ear/brain system knows when a sound begins. When you examine an impulse response the direct sound is always prominent. But speech and musical sounds do not always have a rapid rise time, particularly at low frequencies, and when you convolve them with an impulse response the onset of the direct sound is often obscure. The sound pressure rises gradually as reflections build up, and forward masking can make the direct sound inaudible. When does the 80 ms start?

Stream formation [3] presents a more serious problem. In the author's experience, reverberance and envelopment are distinct perceptions that do not exist unless the brain can separate the direct sound, which is perceived as a clear note or syllable, from the reverberation that follows it. When separation is possible, a sequence of notes or syllables is assembled by the brain into a foreground sound stream composed of distinct elements. The reverberation from these elements is perceived as a continuous background stream which surrounds the listener, even if the bulk of the reverberation comes from the front. But when the direct sound is not separable, the reverberation fuses with the notes to form a single stream. Such sounds are not sharply localizable. The resulting blur is frontal, and is sometimes called "Auditory Source Width". The sound is reverberant but not enveloping or particularly beautiful. In experiments where the level of the direct sound is gradually raised from inaudible to audible, the reverberation abruptly becomes louder and changes from frontal to enveloping as soon as the direct sound becomes audible. In a great many concert venues, both for symphony orchestra and for chamber music, more than half-way back into the hall the reflections are sufficiently strong that the direct sound is completely masked. The sound is all reverberation, and regardless of the value of GLL the sound is not enveloping.
GLL may be a promising way to quantify envelopment, but we must first be sure that there is sufficient clarity that envelopment can be perceived. It is the clarity perception from the upper harmonics that triggers the onset of the >80 ms delay before reverberance and envelopment can be detected. The author experimented by filtering the direct sound to include only frequencies above 1500 Hz, and then performing a threshold experiment on the perception of envelopment as a function of D/R. The resulting envelopment sounded the same as when full-bandwidth direct sound was used, and the threshold was also the same. Clarity in the high frequencies triggers the sense of envelopment for all frequencies in a reverberant environment. But, as Bradley and Soulodre proposed, the late reverberation must be strong enough to be heard.

HOW CAN CLARITY BE MEASURED?

If clarity in a hall or a classroom depends on pitched sounds, either speech or music, and on harmonics in the formant range of speech, it is obvious that we need to put particular emphasis (as human hearing does) on these frequencies. Furthermore, we should do it in a way that emulates the type of sound pressure profiles actually present in rooms. In short, we need to turn data collected through impulse responses into something resembling the sound pressure produced in rooms by natural signals before we attempt to analyze it. The most obvious way to do this is to use recorded speech or music as the input to our analysis system.

Figure 1 suggests a method. It was pointed out that the top trace sounds louder than the bottom trace, even though they have the same sound power. What if we measure the peak level of the waveform and compare it to the RMS level of the waveform?

This is promising, but it does not work very well, as it makes no accounting for the pitch of the signal, which we know is essential for human hearing. It was suggested above that the ear contains some kind of coincidence detector that resembles a comb filter of approximately 80 ms length. If we add such a structure to the model, the peaks become MUCH larger than the sound power when the harmonic phases are unaltered by reverberation. We can make a measure out of this observation. If we pass the sound example above through the hearing model described in [1], which includes comb filters, and then plot the ratio between the RMS sound power at the input of the comb filter and the peak level at the output of the comb filter that most nearly matches the pitch of the signal, the result is shown in Figure 2.

FIGURE 2. The ratio of the peak output of the best pitch-matching 40 ms comb filter following a critical band filter at 1600 Hz to the sound power at the input of the same comb filter. The input signal is four sequences of the syllables "one" to "ten". The first sequence is highly reverberant; the last sequence is maximally clear. The middle two are in between in the amount of phase randomization applied through allpass reverberant impulse responses. All four sequences have identical sound power and frequency response, but the peak output after comb filtering is lower when excess reverberation is present. The yellow bars represent an approximate average of the ratio during each syllable. The zero in the graph was set in a calibration procedure to fall between the maximum and minimum of the averages.

Figure 2 shows that by this measurement technique the first two sequences would be judged as "far" and the second two as "near". This difference is audible. We tried the same experiment with a female voice. To get the same result we had to use a higher-frequency critical band. A similar difference between peak level and sound power was observed, although the zero point was different.

The author can supply the code that produced the above graph to anyone interested, but it is a work in progress. It is a Windows executable written in C which takes input from a .wav file of binaurally recorded speech. It is calibrated for sound sources in front of the dummy head. Only one channel is analyzed. The sense of clarity the circuit detects is audible with just one ear, but the calibration assumes that both ears are active and that the reverberant field is diffuse.

It is easy to see that different syllables return different results. This is to be expected. Reverberation is a random process, which means that some pitches will sail through a particular impulse response with very little alteration of phase, while other syllables will fare quite differently. The well-known variation of the ISO measures with seat position is one aspect of this variability. A future version of the code will examine many critical bands. In the author's opinion this measurement method is crude. Looking at the waveforms in the ear model it uses convinces me that other measures are possible, and perhaps closer to the mechanism used in human hearing. These might include the ease of determining pitch, which varies from cycle to cycle and from critical band to critical band when excess reverberation is present.
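To make the procedure concrete, here is a rough sketch in Python (not the author's C code) of the peak-to-RMS comb-filter idea: band-limit a frame of recorded speech near 1600 Hz, run a bank of roughly 40 ms feed-forward comb filters tuned to candidate pitch periods, and compare the largest comb-filter peak with the RMS level at the comb input. The band edges, pitch range, filter shapes, and frame-by-frame use are assumptions for illustration, and the calibration that sets the zero line in Figure 2 is not reproduced:

    import numpy as np
    from scipy.signal import butter, sosfilt, lfilter

    def clarity_ratio(frame, fs, f_lo=1200.0, f_hi=2000.0,
                      pitch_min=80.0, pitch_max=250.0, comb_ms=40.0):
        """Peak of the best pitch-matched comb-filter output over the RMS at its input, in dB."""
        # 1. A critical-band-like filter centered near 1600 Hz.
        sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, frame)
        rms_in = np.sqrt(np.mean(band ** 2)) + 1e-12

        # 2. Feed-forward comb filters: equally weighted taps spaced by one pitch
        #    period, spanning roughly comb_ms.  When the harmonic phases are intact
        #    the peaks in successive periods add coherently and the output peak grows.
        best_peak = 0.0
        for f0 in np.arange(pitch_min, pitch_max, 2.0):      # candidate pitches
            period = max(1, int(round(fs / f0)))              # delay in samples
            taps = max(2, int(round(comb_ms * 1e-3 * fs / period)))
            h = np.zeros(taps * period)
            h[::period] = 1.0 / taps
            best_peak = max(best_peak, np.max(np.abs(lfilter(h, [1.0], band))))

        return 20.0 * np.log10(best_peak / rms_in)

Applied one frame per syllable, higher values of this ratio correspond to the sequences judged "near" in Figure 2 and lower values to those judged "far"; the absolute numbers depend on the assumed filters and would need the calibration step described above.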
MEASURING CLARITY FROM A BINAURAL IMPULSE RESPONSE

In previous papers [1,4] the author has proposed a method for measuring clarity that was derived from data predicting the threshold of localization. The measure was called LOC. The measure first breaks a binaural impulse response into two parts, the direct sound component and the reflected component. It then estimates the rate of nerve firings from the two components as a function of time when the impulse response is excited by a syllable or a note.
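A simplified numerical sketch of such a calculation is given below, following the verbal description in this section and with Figures 3 and 4, not the exact formula published in [1,4]. The 5 ms direct-sound window, the saturating log firing-rate model, and the dB conventions are illustrative assumptions:

    import numpy as np
    from scipy.signal import butter, sosfilt

    def loc_estimate(ir, fs, direct_ms=5.0, window_ms=100.0, dyn_range_db=20.0):
        """Crude LOC-style ratio (dB) from one channel of a binaural impulse response."""
        # Keep only the frequency range where the clarity cue lives.
        sos = butter(4, 1000.0, btype="highpass", fs=fs, output="sos")
        h = sosfilt(sos, ir)
        h = h / (np.max(np.abs(h)) + 1e-12)

        n_dir = int(direct_ms * 1e-3 * fs)      # assumed extent of the direct sound
        n_win = int(window_ms * 1e-3 * fs)      # integration window for nerve firings

        # During a held note the direct-sound pressure is constant, while the
        # reflected pressure builds up as the square root of cumulative reflected energy.
        p_direct = np.sqrt(np.sum(h[:n_dir] ** 2))
        p_refl = np.sqrt(np.cumsum(h[n_dir:n_dir + n_win] ** 2))

        # Firing rate modelled as the log of pressure over a limited dynamic range:
        # contributions more than dyn_range_db below the direct sound are ignored.
        floor_db = 20.0 * np.log10(p_direct + 1e-12) - dyn_range_db
        rate_direct = dyn_range_db                                       # constant during the note
        rate_refl = np.maximum(20.0 * np.log10(p_refl + 1e-12) - floor_db, 0.0)

        # The two "areas" of Figure 3: integrated firings over the first 100 ms.
        area_direct = rate_direct * n_win
        area_refl = np.sum(rate_refl)
        return 10.0 * np.log10(area_direct / (area_refl + 1e-12))

For an exponentially decaying impulse response, lengthening RT while holding D/R constant slows the build-up of the reflected pressure and raises the returned value, which is the behavior illustrated in Figures 3 and 4 below.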

Note that this procedure differs from the ISO measures very significantly. First, it is interested in the rate of nerve firings, which is not proportional to the sound energy but to the logarithm of sound energy over a finite dynamic range. Second, the measure is not interested in the impulse response per se, but in the ratio of the sound pressure of the direct sound during a note to the sound pressure of the reflected energy, which builds up with time as the note is held. Third, since we are trying to measure a phenomenon that only exists above 1000 Hz, we high-pass the IR at this frequency before analyzing it. This method is much closer to how our hearing really works than any ISO measurement.

FIGURE 3. A diagram showing the method for determining the value of LOC. An impulse response (here with an RT of one second and a D/R of -10 dB) is high-pass filtered and divided into a direct sound component and a reflected component. When convolved with a note, the direct sound component is constant with time for the duration of the note. The reverberant component builds up slowly with time, eventually reaching a much larger firing rate than the rate from the direct sound. A sound is considered localizable if the integrated number of nerve firings from the direct sound is larger than the integrated number of firings from the reflections in the first 100 ms. LOC is the ratio of the area in violet to the area in light blue, expressed in dB. In practice clarity is reliably heard in music when LOC is greater than 3 dB. In Figure 3 LOC is less than zero.

FIGURE 4. A diagram similar to Figure 3, but for an impulse response with an RT of 2 seconds and a D/R of -10 dB. Note that while the impulse response in Figure 3 did not predict the ability to localize, the value of LOC in this picture is higher than in Figure 3.

The impulse response in Figure 3 was constructed from exponentially decaying white noise with an initial time gap of 10 ms. The diagram immediately explains why the same direct-to-reverberant ratio can have very different sonic properties in two different spaces. In small spaces the reflections come much sooner than they do in larger spaces, which reduces the value of LOC. Figure 4 shows a similar diagram where the value of RT was increased to 2 seconds. Because the D/R was held constant, it takes longer for the full value of reflected sound pressure to build up, and thus there are fewer nerve firings inside the 100 ms window. LOC is higher than in Figure 3. Note particularly the effect of integrating the log of sound pressure rather than sound power. The ISO measures all integrate sound power, not from actual sound but from impulse responses. This emphasizes the strongest reflections, as their level is squared. Integrating the log of the level of a proposed sound, not an impulse response, emphasizes the reflections that come earliest and de-emphasizes their level. This is what ears do.

LOC has been tested in a number of real spaces with good results. It is the first measure the author has found that can predict the seats in a hall where clarity disappears, as well as the seats where clarity is good. It is also simple to compute, requiring nothing more than a few integrals and logarithms. (Although the problems of instrument directivity and hall occupancy are ever-present.) There are only two parameters in the calculation, and they are plausible. The first is that the dynamic range over which nerves can fire is about 20 dB, or a factor of ten in pressure. The second is that there is an approximately 80 ms to 100 ms window over which important aspects of sound are integrated. Such a window is well known from studies of loudness and other studies.

FIGURE 5. The average phase randomization induced in the upper harmonics of male speech by white-noise impulse responses with an RT of 1 second and various values of pre-delay and D/R. The vertical axis shows that when the D/R falls below -10 dB the average phase randomization reaches an asymptote just below about 1.6 radians (90 degrees).

The author has proposed that clarity decreases because reflections have the power to randomize the phases of the upper harmonics of tones with a definite pitch. If the proposal is true, it must be possible to measure the phenomenon directly, by examining the ability of a particular impulse response to randomize phase. We find that such a measure is possible using Matlab functions. Once again we start with a binaural impulse response (IR), including the direct sound. We window the IR at 80 ms, transform to the frequency domain with an FFT, and unwrap the phase. The result is a downward-sloping line, with the slope determined by the group delay of the IR. We then fit a third-order polynomial to the phase, and subtract the fit from the original phase function. The result is a roughly straight line with seemingly random wobbles. The question is: are the wobbles sufficient to modulate the phases of adjacent harmonics so that the peaks seen in Figure 1 no longer exist?

The process was tested with an array of white-noise impulse responses with different values of pre-delay and D/R. The results of these tests are plotted in Figure 5. (The colors in Figure 5 correspond to pre-delay values of black = 40 ms, green = 30 ms, cyan = 20 ms, red = 10 ms, and blue = 0 ms.) Whether or not the method used to make Figure 5 is a useful measure, it is highly interesting to the author. It shows not only that the degree of randomization rises linearly as the D/R goes down, but also that if we propose that the threshold for localization might correspond to about 1.1 radians on the graph (about +-64 degrees), the graph predicts my localization data [1] surprisingly well. The graph tells us something else important: the randomization of phase is in fact random.
To make the graph in Figure 5 it was necessary to average eight different random impulse responses for each pair of pre-delay and D/R values; otherwise the separation between the different pre-delays was not uniform. This tells us that, at a fundamental level, we should not expect a single impulse response to tell us everything we might like to know about a single seat in a hall. Another very interesting observation one can make from Figure 5 is that the human ability to detect clarity in the presence of reverberation is within 2 dB of a theoretical limit. That is amazing acuity.
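A minimal sketch of this phase analysis, in Python/NumPy rather than Matlab and applied to one channel of the binaural IR, is given below. The 80 ms window, the third-order polynomial fit, and the averaging over eight random impulse responses follow the description above; restricting the statistic to roughly 1-4 kHz, reporting the RMS residual in radians, and the simple white-noise IR generator are illustrative choices:

    import numpy as np

    def phase_randomization(ir, fs, window_ms=80.0, f_lo=1000.0, f_hi=4000.0):
        """RMS deviation (radians) of the unwrapped IR phase from a third-order fit."""
        n = int(window_ms * 1e-3 * fs)
        spectrum = np.fft.rfft(ir[:n])                 # window the IR at 80 ms
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        phase = np.unwrap(np.angle(spectrum))          # downward-sloping line (group delay)

        x = freqs / freqs[-1]                          # normalized axis for a stable fit
        residual = phase - np.polyval(np.polyfit(x, phase, 3), x)   # the "wobbles"

        band = (freqs >= f_lo) & (freqs <= f_hi)       # the harmonics that carry the cue
        return np.sqrt(np.mean(residual[band] ** 2))

    def random_ir(fs=48000, rt=1.0, dr_db=-10.0, predelay_ms=20.0, length_s=0.5, rng=None):
        """Exponentially decaying white-noise IR with a unit direct impulse and a given D/R."""
        rng = rng or np.random.default_rng()
        t = np.arange(int(length_s * fs)) / fs
        tail = rng.standard_normal(t.size) * np.exp(-6.91 * t / rt)    # about -60 dB at t = RT
        tail[: int(predelay_ms * 1e-3 * fs)] = 0.0                     # pre-delay gap
        tail *= 10.0 ** (-dr_db / 20.0) / np.sqrt(np.sum(tail ** 2))   # set the reflected energy
        ir = tail.copy()
        ir[0] = 1.0                                                    # unit direct sound
        return ir

    # Average over eight random impulse responses per condition, as in the text.
    fs = 48000
    vals = [phase_randomization(random_ir(fs=fs, dr_db=-10.0, predelay_ms=20.0), fs)
            for _ in range(8)]
    print(f"mean phase deviation: {np.mean(vals):.2f} radians")

Sweeping dr_db and predelay_ms and plotting the averaged result should produce curves of the same general character as Figure 5, though the absolute values depend on the band limits and statistics assumed here.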

WORKING MEMORY IS LIMITED

There is space in this preprint for only a short verbal sketch of how the ear and brain process speech. In brief: sound events are filtered into continuous third-octave bands by the basilar membrane, the amplitudes of the formants are detected by the hair cells, separated into streams by the spiral ganglia, and localized in the brain stem. These processes are sub-conscious and relatively immediate (minus the ~80 ms involved in phoneme separation). Higher levels of the brain (all of which consume working memory) then use localization and timbre data to assemble the separated phonemes into unique streams; use pattern recognition to identify phonemes; use more extensive pattern recognition to identify senomes (triples of phonemes); use patterns and context to identify words, then sentences, then meaning; and finally commit meaning to long-term memory. In some languages, such as German, the meaning of a sentence is not clear until the end, and the whole thing must be temporarily stored until the sentence is complete.

Every step in this process takes time and working memory. When clarity is poor, phoneme recognition becomes difficult, and more time must be spent inferring from context and grammar what word was meant. When intelligibility is poor, senomes run together, word identification becomes more difficult, and even more demands are made on context and grammar, all using more working memory. The result can be a plausible identification of meaning, but no remaining working memory for the process of long-term storage. Current measures would assess the intelligibility in such a situation as very good.

And then there is the issue of Attention, or A. Like the audience in Copenhagen, if you are fully engaged in listening there is more motivation to put your working memory toward listening rather than thinking up clever comments. Clarity is what drives this kind of attention, and the results on classroom performance are significant. For the last few years the first-year required physics course at MIT has randomly assigned the ~500 students to fixed seats in the large lecture hall. The students at the front of the hall reliably do at least a half-grade better than students further back. Shouldn't we give all students the same advantage? Hint: sound reinforcement almost always reduces A, particularly when more than one loudspeaker is used. But a single directional speaker below a lectern can help.

CONCLUSIONS

Music, drama, and speech all depend on the clear transfer of information, and for humans auditory information is largely encoded in harmonics above 1000 Hz from low-frequency tones. When the phases of these harmonics are sufficiently randomized by reflections we perceive sound as distant or muddy, and information is more difficult to extract and to remember. In halls it is the onsets of sounds that determine clarity; during a held note the direct component is nearly always inaudible. But if the sound decay masks sound onsets, both clarity and intelligibility suffer. If we wish to accurately assess sound quality we need to manipulate impulse response data to emulate the actual sound pressures received by the ears from music and speech, and then analyze these pressures, visually and mathematically, by examining the relationship between the direct sound component and the build-up of reverberation. When clarity is good and there is sufficient late reverberation, the beautiful perception of envelopment can arise.

REFERENCES

[1] Griesinger, D. H., "The relationship between audience engagement and the ability to perceive pitch, timbre, azimuth and envelopment of multiple sources," preprint presented at the 126th Convention of the Audio Engineering Society, May. Available from the Audio Engineering Society.
[2] Bradley, J. S., "Review of objective room acoustics measures and future needs," Proceedings of the International Symposium on Room Acoustics, ISRA, August 2010, Melbourne, Australia.
[3] Griesinger, D. H., "The psychoacoustics of apparent source width, spaciousness & envelopment in performance spaces," Acta Acustica 83 (1997).
[4] Griesinger, D. H., "Pitch, Timbre, Source Separation," PowerPoint slides with clickable sound files, presented in Paris, September.


More information

Quarterly Progress and Status Report. An attempt to predict the masking effect of vowel spectra

Quarterly Progress and Status Report. An attempt to predict the masking effect of vowel spectra Dept. for Speech, Music and Hearing Quarterly Progress and Status Report An attempt to predict the masking effect of vowel spectra Gauffin, J. and Sundberg, J. journal: STL-QPSR volume: 15 number: 4 year:

More information

2. AN INTROSPECTION OF THE MORPHING PROCESS

2. AN INTROSPECTION OF THE MORPHING PROCESS 1. INTRODUCTION Voice morphing means the transition of one speech signal into another. Like image morphing, speech morphing aims to preserve the shared characteristics of the starting and final signals,

More information

Calibration of auralisation presentations through loudspeakers

Calibration of auralisation presentations through loudspeakers Calibration of auralisation presentations through loudspeakers Jens Holger Rindel, Claus Lynge Christensen Odeon A/S, Scion-DTU, DK-2800 Kgs. Lyngby, Denmark. jhr@odeon.dk Abstract The correct level of

More information

The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng

The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng S. Zhu, P. Ji, W. Kuang and J. Yang Institute of Acoustics, CAS, O.21, Bei-Si-huan-Xi Road, 100190 Beijing,

More information

ECE 5765 Modern Communication Fall 2005, UMD Experiment 10: PRBS Messages, Eye Patterns & Noise Simulation using PRBS

ECE 5765 Modern Communication Fall 2005, UMD Experiment 10: PRBS Messages, Eye Patterns & Noise Simulation using PRBS ECE 5765 Modern Communication Fall 2005, UMD Experiment 10: PRBS Messages, Eye Patterns & Noise Simulation using PRBS modules basic: SEQUENCE GENERATOR, TUNEABLE LPF, ADDER, BUFFER AMPLIFIER extra basic:

More information

Sound Quality PSY 310 Greg Francis. Lecture 32. Sound perception

Sound Quality PSY 310 Greg Francis. Lecture 32. Sound perception Prof. Greg Francis Sound Quality PSY 310 Greg Francis Lecture 32 Name that tune! Sound perception An integral part of our modern world Billions are spent annually on Creation of new sounds or sound sequences

More information

Using the BHM binaural head microphone

Using the BHM binaural head microphone 11/17 Using the binaural head microphone Introduction 1 Recording with a binaural head microphone 2 Equalization of a recording 2 Individual equalization curves 5 Using the equalization curves 5 Post-processing

More information

Pritzker Pavilion Design

Pritzker Pavilion Design Pritzker Pavilion Design Lecture for: The Concert Hall Research Group Chicago, Illinois - August 2014 Presented by: with Ed Uhlir and Jonathan Laney Presentation Structure Acoustic Goals Behind the Pritzker

More information

Time smear at unexpected places in the audio chain and the relation to the audibility of high-resolution recording improvements

Time smear at unexpected places in the audio chain and the relation to the audibility of high-resolution recording improvements Time smear at unexpected places in the audio chain and the relation to the audibility of high-resolution recording improvements Dr. Hans R.E. van Maanen Temporal Coherence Date of issue: 22 March 2009

More information

9.35 Sensation And Perception Spring 2009

9.35 Sensation And Perception Spring 2009 MIT OpenCourseWare http://ocw.mit.edu 9.35 Sensation And Perception Spring 29 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. Hearing Kimo Johnson April

More information

2018 Fall CTP431: Music and Audio Computing Fundamentals of Musical Acoustics

2018 Fall CTP431: Music and Audio Computing Fundamentals of Musical Acoustics 2018 Fall CTP431: Music and Audio Computing Fundamentals of Musical Acoustics Graduate School of Culture Technology, KAIST Juhan Nam Outlines Introduction to musical tones Musical tone generation - String

More information

White Paper. Uniform Luminance Technology. What s inside? What is non-uniformity and noise in LCDs? Why is it a problem? How is it solved?

White Paper. Uniform Luminance Technology. What s inside? What is non-uniformity and noise in LCDs? Why is it a problem? How is it solved? White Paper Uniform Luminance Technology What s inside? What is non-uniformity and noise in LCDs? Why is it a problem? How is it solved? Tom Kimpe Manager Technology & Innovation Group Barco Medical Imaging

More information

The Tone Height of Multiharmonic Sounds. Introduction

The Tone Height of Multiharmonic Sounds. Introduction Music-Perception Winter 1990, Vol. 8, No. 2, 203-214 I990 BY THE REGENTS OF THE UNIVERSITY OF CALIFORNIA The Tone Height of Multiharmonic Sounds ROY D. PATTERSON MRC Applied Psychology Unit, Cambridge,

More information

PHYSICS OF MUSIC. 1.) Charles Taylor, Exploring Music (Music Library ML3805 T )

PHYSICS OF MUSIC. 1.) Charles Taylor, Exploring Music (Music Library ML3805 T ) REFERENCES: 1.) Charles Taylor, Exploring Music (Music Library ML3805 T225 1992) 2.) Juan Roederer, Physics and Psychophysics of Music (Music Library ML3805 R74 1995) 3.) Physics of Sound, writeup in this

More information

Experiment 7: Bit Error Rate (BER) Measurement in the Noisy Channel

Experiment 7: Bit Error Rate (BER) Measurement in the Noisy Channel Experiment 7: Bit Error Rate (BER) Measurement in the Noisy Channel Modified Dr Peter Vial March 2011 from Emona TIMS experiment ACHIEVEMENTS: ability to set up a digital communications system over a noisy,

More information

Investigation into Background Noise Conditions During Music Performance

Investigation into Background Noise Conditions During Music Performance Toronto, Canada International Symposium on Room Acoustics 2013 June 9-11 ISRA 2013 Investigation into Background Noise Conditions During Music Performance Jonah Sacks (jsacks@acentech.com) Robert William

More information

IP Telephony and Some Factors that Influence Speech Quality

IP Telephony and Some Factors that Influence Speech Quality IP Telephony and Some Factors that Influence Speech Quality Hans W. Gierlich Vice President HEAD acoustics GmbH Introduction This paper examines speech quality and Internet protocol (IP) telephony. Voice

More information

Determination of Sound Quality of Refrigerant Compressors

Determination of Sound Quality of Refrigerant Compressors Purdue University Purdue e-pubs International Compressor Engineering Conference School of Mechanical Engineering 1994 Determination of Sound Quality of Refrigerant Compressors S. Y. Wang Copeland Corporation

More information

inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering August 2000, Nice, FRANCE

inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering August 2000, Nice, FRANCE Copyright SFA - InterNoise 2000 1 inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering 27-30 August 2000, Nice, FRANCE I-INCE Classification: 7.9 THE FUTURE OF SOUND

More information

Experiment 13 Sampling and reconstruction

Experiment 13 Sampling and reconstruction Experiment 13 Sampling and reconstruction Preliminary discussion So far, the experiments in this manual have concentrated on communications systems that transmit analog signals. However, digital transmission

More information

The Measurement Tools and What They Do

The Measurement Tools and What They Do 2 The Measurement Tools The Measurement Tools and What They Do JITTERWIZARD The JitterWizard is a unique capability of the JitterPro package that performs the requisite scope setup chores while simplifying

More information

Experiments on tone adjustments

Experiments on tone adjustments Experiments on tone adjustments Jesko L. VERHEY 1 ; Jan HOTS 2 1 University of Magdeburg, Germany ABSTRACT Many technical sounds contain tonal components originating from rotating parts, such as electric

More information

A consideration on acoustic properties on concert-hall stages

A consideration on acoustic properties on concert-hall stages Proceedings of the International Symposium on Room Acoustics, ISRA 2010 29-31 August 2010, Melbourne, Australia A consideration on acoustic properties on concert-hall stages Kanako Ueno (1), Hideki Tachibana

More information

THE ACOUSTICS OF THE MUNICIPAL THEATRE IN MODENA

THE ACOUSTICS OF THE MUNICIPAL THEATRE IN MODENA THE ACOUSTICS OF THE MUNICIPAL THEATRE IN MODENA Pacs:43.55Gx Prodi Nicola; Pompoli Roberto; Parati Linda Dipartimento di Ingegneria, Università di Ferrara Via Saragat 1 44100 Ferrara Italy Tel: +390532293862

More information

Understanding PQR, DMOS, and PSNR Measurements

Understanding PQR, DMOS, and PSNR Measurements Understanding PQR, DMOS, and PSNR Measurements Introduction Compression systems and other video processing devices impact picture quality in various ways. Consumers quality expectations continue to rise

More information

Proceedings of Meetings on Acoustics

Proceedings of Meetings on Acoustics Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Psychological and Physiological Acoustics Session 1pPPb: Psychoacoustics

More information

CONCERT HALL STAGE ACOUSTICS FROM THE PERSP- ECTIVE OF THE PERFORMERS AND PHYSICAL REALITY

CONCERT HALL STAGE ACOUSTICS FROM THE PERSP- ECTIVE OF THE PERFORMERS AND PHYSICAL REALITY CONCERT HALL STAGE ACOUSTICS FROM THE PERSP- ECTIVE OF THE PERFORMERS AND PHYSICAL REALITY J J Dammerud University of Bath, England M Barron University of Bath, England INTRODUCTION A three-year study

More information

Acoustic concert halls (Statistical calculation, wave acoustic theory with reference to reconstruction of Saint- Petersburg Kapelle and philharmonic)

Acoustic concert halls (Statistical calculation, wave acoustic theory with reference to reconstruction of Saint- Petersburg Kapelle and philharmonic) Acoustic concert halls (Statistical calculation, wave acoustic theory with reference to reconstruction of Saint- Petersburg Kapelle and philharmonic) Borodulin Valentin, Kharlamov Maxim, Flegontov Alexander

More information

LOUDNESS EFFECT OF THE DIFFERENT TONES ON THE TIMBRE SUBJECTIVE PERCEPTION EXPERIMENT OF ERHU

LOUDNESS EFFECT OF THE DIFFERENT TONES ON THE TIMBRE SUBJECTIVE PERCEPTION EXPERIMENT OF ERHU The 21 st International Congress on Sound and Vibration 13-17 July, 2014, Beijing/China LOUDNESS EFFECT OF THE DIFFERENT TONES ON THE TIMBRE SUBJECTIVE PERCEPTION EXPERIMENT OF ERHU Siyu Zhu, Peifeng Ji,

More information

Experiments on musical instrument separation using multiplecause

Experiments on musical instrument separation using multiplecause Experiments on musical instrument separation using multiplecause models J Klingseisen and M D Plumbley* Department of Electronic Engineering King's College London * - Corresponding Author - mark.plumbley@kcl.ac.uk

More information

CTP431- Music and Audio Computing Musical Acoustics. Graduate School of Culture Technology KAIST Juhan Nam

CTP431- Music and Audio Computing Musical Acoustics. Graduate School of Culture Technology KAIST Juhan Nam CTP431- Music and Audio Computing Musical Acoustics Graduate School of Culture Technology KAIST Juhan Nam 1 Outlines What is sound? Physical view Psychoacoustic view Sound generation Wave equation Wave

More information

Investigation of Digital Signal Processing of High-speed DACs Signals for Settling Time Testing

Investigation of Digital Signal Processing of High-speed DACs Signals for Settling Time Testing Universal Journal of Electrical and Electronic Engineering 4(2): 67-72, 2016 DOI: 10.13189/ujeee.2016.040204 http://www.hrpub.org Investigation of Digital Signal Processing of High-speed DACs Signals for

More information

Binaural sound exposure by the direct sound of the own musical instrument Wenmaekers, R.H.C.; Hak, C.C.J.M.; de Vos, H.P.J.C.

Binaural sound exposure by the direct sound of the own musical instrument Wenmaekers, R.H.C.; Hak, C.C.J.M.; de Vos, H.P.J.C. Binaural sound exposure by the direct sound of the own musical instrument Wenmaekers, R.H.C.; Hak, C.C.J.M.; de Vos, H.P.J.C. Published in: Proceedings of the International Symposium on Room Acoustics

More information

Chapter 2 Auditorium Acoustics: Terms, Language, and Concepts

Chapter 2 Auditorium Acoustics: Terms, Language, and Concepts Chapter 2 Auditorium Acoustics: Terms, Language, and Concepts There have been primarily three methods for performing subjective studies of the acoustics in concert halls for classical music, each of which

More information

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY Eugene Mikyung Kim Department of Music Technology, Korea National University of Arts eugene@u.northwestern.edu ABSTRACT

More information

Music Representations

Music Representations Advanced Course Computer Science Music Processing Summer Term 00 Music Representations Meinard Müller Saarland University and MPI Informatik meinard@mpi-inf.mpg.de Music Representations Music Representations

More information

MODIFICATIONS TO THE POWER FUNCTION FOR LOUDNESS

MODIFICATIONS TO THE POWER FUNCTION FOR LOUDNESS MODIFICATIONS TO THE POWER FUNCTION FOR LOUDNESS Søren uus 1,2 and Mary Florentine 1,3 1 Institute for Hearing, Speech, and Language 2 Communications and Digital Signal Processing Center, ECE Dept. (440

More information