Voice segregation by difference in fundamental frequency: Effect of masker type


Mickael L. D. Deroche a)
Department of Otolaryngology, Johns Hopkins University School of Medicine, 818 Ross Research Building, 720 Rutland Avenue, Baltimore, Maryland 21205
mderoch2@jhmi.edu

John F. Culling
School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff, CF10 3AT, United Kingdom
cullingj@cardiff.ac.uk

a) Author to whom correspondence should be addressed.

J. Acoust. Soc. Am. 134 (5), November 2013, pp. EL465-EL470. © 2013 Acoustical Society of America
PACS numbers: 43.66.Dc, 43.71.Gv [SGS]
Date Received: July 13, 2013. Date Accepted: October 7, 2013

Abstract: Speech reception thresholds were measured for a voice against two different maskers: either two concurrent voices with the same fundamental frequency (F0) or a harmonic complex with the same long-term excitation pattern and broadband temporal envelope as the masking sentences (speech-modulated buzz). All sources had steady F0s. A difference in F0 (ΔF0) of 2 or 8 semitones provided a 5-dB benefit for buzz maskers, whereas it provided a 3- and 8-dB benefit, respectively, for masking sentences. Whether intelligibility of a voice increases abruptly with small ΔF0s or gradually toward larger ΔF0s seems to depend on the nature of the masker.

1. Introduction

Effects of a difference in fundamental frequency (ΔF0) between competing sources were first reported by Brokx and Nooteboom (1982). They used monotonously spoken voices as well as voices that had been resynthesized to have a steady fundamental frequency (F0) and showed that speech recognition improved gradually with increasing ΔF0. Interestingly, with a ΔF0 of one octave, the performance dropped, suggesting that the mechanisms that exploit ΔF0s must be related to the harmonic structure of the competing sources. Later experiments used synthesized vowels rather than resynthesized sentences (Scheffers, 1983; Culling and Darwin, 1993) and found that vowel identification improved sharply for very small ΔF0s and saturated above one semitone. The mechanisms underlying these effects are still under investigation, but one important piece of the puzzle, which is the focus of the present study, is to understand why the pattern of improvement that results from ΔF0s is very sharp for vowels but gradual for sentences.

1.1 Energetic masking

de Cheveigné et al. (1995, 1997a) showed that identification of simultaneous synthetic vowels depended on their harmonicity; identification of a given vowel was unaffected by its own harmonicity, but was reduced if the competing vowel was inharmonic. Deroche and Culling (2011a) demonstrated a similar effect for the recognition of a voice separated by two semitones from a harmonic complex with a speech-shaped spectral profile. Inharmonicity was generated by F0 modulation and reverberation applied independently to the competing sources. These manipulations resulted in substantial elevation of speech reception threshold (SRT) when they were applied to the masker. As with synthetic vowels, the effect of a ΔF0 on intelligibility of the target voice did not depend on its harmonicity, but instead depended strongly on the masker's harmonicity.

There are at least three main reasons why harmonic complexes should allow these large releases from energetic masking. First, when harmonic partials are in specific phase relationships, such as cosine, sine, and positive Schroeder-phase, they produce highly modulated waveforms after cochlear filtering. There are short temporal dips, as long as the fundamental period, in within-channel temporal envelopes that may allow listeners a better target-to-masker ratio (TMR) at these specific times, which is facilitated by cochlear compression (Kohlrausch and Sander, 1995; Carlyon and Datta, 1997). However, various forms of this phase hypothesis have been examined and seem implausible (de Cheveigné et al., 1997b; de Cheveigné, 1999). Second, harmonic complexes have spectral dips that allow a better TMR at center frequencies located in between resolved partials. Third, even when a role for temporal and spectral dips is excluded, detection of a narrow band of noise is easier against harmonic than inharmonic complexes (Deroche and Culling, 2011b). Periodicity in the masker waveforms may thus also contribute to the reduced masking of harmonic complexes. It remains, however, unclear how much, and in which conditions (i.e., F0 range, phase settings, spectral profile, masker level), each of these accounts contributes to the release from masking observed experimentally.

1.2 Informational masking

In the presence of several voices, there might be an ambiguity as to which voice one should attend to, resulting in informational masking (Kidd et al., 2005). For unprocessed sentences, energetic and informational masking occur together.
However, Kidd et al. processed the target and masking sentences such that they occupied different frequency bands, ensuring that energetic masking would be largely absent. They observed large amounts of masking for speech-on-speech configurations, but not for speech-on-noise configurations. In order to release this informational masking, listeners can use a variety of cues to group sounds into sequential streams: F0 (Darwin and Hukin, 2000; Darwin et al., 2003; Drullman and Bronkhorst, 2004), signal-to-noise ratio (Brungart, 2001; Brungart et al., 2001), spatial separation (Darwin and Hukin, 2000; Freyman et al., 2001; Hawley et al., 2004; Kidd et al., 2005; Lee and Shinn-Cunningham, 2008), priming by the target talker or onset cues (Freyman et al., 2004), vocal-tract length, sex difference, and prosody (Culling and Porter, 2004; Darwin et al., 2003; Brungart et al., 2001), and even tactile cues (Drullman and Bronkhorst, 2004). In the case of overlapping dialogue, it is unclear what proportion of the observed ΔF0 effects can be attributed to a release of energetic masking, and what proportion to a release of informational masking.

1.3 Aim of the present experiment

Identification of competing vowels improves with very small ΔF0s and is not improved further by increasing the ΔF0 beyond one or two semitones. One would hope that the mechanisms underlying these improvements in vowels would also be involved in the segregation of voices. However, recognition of a voice in the presence of competing voices improves more gradually as ΔF0 increases (Bird and Darwin, 1998). If the same mechanisms are involved, why is the pattern of improvement so different? The present study attempts to answer this question by (a) testing whether the sharp improvement with small ΔF0s can occur with full sentences (or whether it is restricted to vowels), and (b) testing whether the gradual improvement with ΔF0 depends on the masker type.
To this aim, the benefit of a ΔF0 of 2 or 8 semitones was measured, using target sentences masked by two masking sentences, spoken by the same talker as the target, or masked by harmonic complexes matched in both temporal and spectral envelope, hereafter termed speech-modulated buzz. Little or no informational masking was expected in the presence of buzz maskers, whereas using the same materials as in the present study, Hawley et al. (2004) found evidence of informational masking when two simultaneous masking voices were used. Note that superimposing a large number of masking voices would reduce or eliminate informational masking, but a single voice would also result in little informational masking, especially when the same sentence is used throughout a given SRT measurement and displayed on the screen in front of the listener.

2. Method

2.1 Listeners

Two different groups of listeners took part in the experiment: the first group had 12 listeners; the second had 8 listeners. They were all undergraduate students, aged between 20 and 30 years, who were paid for their participation. All listeners reported normal hearing and English as their first language. Each listener attended a single 45-min experimental session.

2.2 Stimuli and conditions

Two types of maskers were used: speech-modulated buzz and two concurrent masking sentences. For the first group of listeners, target and masker had either the same F0 or F0s that were two semitones apart. For the second group, target and masker had either the same F0 or F0s that were eight semitones apart. The ΔF0 benefit could then be compared across the two masker types (within-subjects) and the two ΔF0 sizes (between-subjects).

Speech materials were 80 sentences recorded from a male speaker of American English in the original IEEE list format (IEEE, 1969). The Praat PSOLA package resynthesized each sentence with a specified F0 throughout. Target sentences were monotonized at 110, 123.5 (2 semitones higher), and 174.6 Hz (8 semitones higher). Eight masking sentences, different from any of the targets, were monotonized at 110 Hz, and then added in pairs to create four 2-voice maskers. The ΔF0 was thus always generated by varying the target's F0. Note that changing the masker's F0 over a large range results in very different opportunities for listeners to glimpse spectrally in between masker partials.
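As a quick check on the target F0s quoted above, a semitone offset can be converted to hertz under the usual equal-tempered assumption of 12 semitones per octave. The sketch below is illustrative only; the function name `shift_semitones` and the variable names are hypothetical and do not come from the paper.

```python
def shift_semitones(f0_hz: float, semitones: float) -> float:
    """Return the frequency lying `semitones` equal-tempered semitones above f0_hz."""
    # Assumption: 12 equal semitones per octave, so each semitone is a factor of 2**(1/12).
    return f0_hz * 2.0 ** (semitones / 12.0)


BASE_F0_HZ = 110.0  # masker F0 stated in the text

# Target F0s for the 0-, 2-, and 8-semitone conditions, rounded to 0.1 Hz.
target_f0s = {n: round(shift_semitones(BASE_F0_HZ, n), 1) for n in (0, 2, 8)}
print(target_f0s)  # {0: 110.0, 2: 123.5, 8: 174.6}
```

The rounded values reproduce the 123.5 and 174.6 Hz figures given for the 2- and 8-semitone targets.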
The buzz maskers were created from a broadband sine-phase harmonic complex, based on an F0 fixed at 110 Hz. A sine-phase complex was chosen because it was a closer approximation to glottal pulse excitation than a random-phase complex. This complex was filtered with a Hamming-window based linear-phase finite impulse response filter with 5000 coefficients designed to match the average long-term excitation pattern of the monotonized masking sentences (as in Deroche and Culling, 2011a). In order to further increase their similarity with the masking sentences, the temporal envelopes of the four 2-voice maskers were extracted by half-wave rectification and low-pass filtering with a cutoff at 20 Hz and applied to the complex with a speech-like spectral profile. This manipulation resulted in four speech-modulated buzz maskers. All maskers and target stimuli were equalized to the same root-mean-square power, prior to changes in TMR during the adaptive track. Maskers were presented at 69 dB sound pressure level and the relative target level was adjusted.

2.3 Procedure and equipment

The procedure and equipment were similar to those used by Deroche and Culling (2011a). For each sentence in a list of ten, listeners attempted to type a transcript. They were then presented with the actual transcript, containing five highlighted key words, and scored themselves, disregarding spelling errors. TMR decreased when three or more words were identified and increased when two or fewer words were identified. Following this 1-up/1-down adaptive rule, each SRT was taken as the mean TMR over the last eight sentences, tracking 50% intelligibility. For the 2-voice maskers, the two masking sentences were displayed on a computer screen in front of the listener, who was instructed to disregard them and to listen to the target sentence (not displayed on the screen). Note that for this masker type an SRT of 0 dB occurred when the target level was 3 dB higher than each of the two masking sentences.
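The 1-up/1-down rule described above can be sketched as follows. This is a minimal illustration, not the authors' software. The 2-dB step size, the 0-dB starting TMR, and the per-sentence scores are assumptions made for the example, since the text does not state them, and the function name `run_adaptive_track` is hypothetical. The sketch also assumes that the SRT averages the TMRs at which the last eight sentences were presented.

```python
from statistics import mean


def run_adaptive_track(words_correct, start_tmr_db=0.0, step_db=2.0, n_average=8):
    """Simulate one 1-up/1-down track and return (SRT estimate, presented TMRs).

    words_correct: number of key words (0-5) identified for each sentence in turn.
    start_tmr_db and step_db are illustrative assumptions, not values from the paper.
    """
    tmr_db = start_tmr_db
    presented = []
    for n_correct in words_correct:
        presented.append(tmr_db)
        if n_correct >= 3:
            tmr_db -= step_db  # three or more key words correct: make the task harder
        else:
            tmr_db += step_db  # two or fewer correct: make the task easier
    # SRT estimate: mean TMR over the last n_average presented sentences.
    return mean(presented[-n_average:]), presented


# Hypothetical scores for one list of ten sentences.
example_scores = [5, 4, 3, 1, 4, 2, 3, 0, 5, 2]
srt_db, track = run_adaptive_track(example_scores)
print(track)   # [0.0, -2.0, -4.0, -6.0, -4.0, -6.0, -4.0, -6.0, -4.0, -6.0]
print(srt_db)  # -5.0
```

Because the track steps down after three or more correct key words out of five and up otherwise, it hovers around the TMR at which roughly half of the key words are identified, which is why the rule is said to track 50% intelligibility.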
Stimuli were presented diotically over Sennheiser HD650 headphones in a single-walled IAC sound-attenuating booth within a sound-treated room. Prior to the experimental session, listeners were familiarized with the task with 3 practice runs using 30 unprocessed sentences not used in the experiment, masked by speech-modulated buzz (1 run) or by 2-voice speech maskers (2 runs), also not used in the rest of the experiment. The following eight runs measured two SRTs for each of the four experimental conditions, which were averaged to give one mean SRT per condition. While each of the 80 target sentences was presented to every listener in the same order, the order of the conditions was rotated for successive listeners, to counterbalance order effects. No sentence was presented twice to a listener within the experiment, and each listener could only sign up once.

3. Results

The left panel of Fig. 1 shows that mean SRTs were overall lower for buzz than for 2-voice speech maskers, and lower when target and maskers differed in F0. The statistical analysis focused on the ΔF0 benefits, extracted in each group from the difference in SRTs between the no-ΔF0 and the ΔF0 conditions and plotted in the right panel. Analysis of variance with one within-subjects factor (masker type) and one between-subjects factor (ΔF0 size) revealed no main effect of masker type [F(1,18) = 1.9, p > 0.05], but a main effect of ΔF0 size [F(1,18) = 14.2, p < 0.001] and a strong interaction [F(1,18) = 17.5, p < 0.001]. The ΔF0 benefit was overall larger at 8 than at 2 semitones, but this depended largely on the masker type. Post hoc pairwise comparisons revealed that for the speech-modulated buzz masker, the ΔF0 benefit was similar at 2 and 8 semitones [F(1,18) = 0.2, p > 0.05], whereas for the speech maskers, it was greater by 5 dB for 8 than 2 semitones [F(1,18) = 22.7, p < 0.001].
At 2-semitones ΔF0, the benefit was smaller for speech maskers than for speech-modulated buzz [F(1,18) = 4.9, p < 0.05], and vice versa at 8-semitones ΔF0 [F(1,18) = 12.9, p < 0.01].

Fig. 1. (Left panel) Mean speech reception thresholds for two types of masker: speech-modulated buzz or two concurrent masking sentences, with the same F0 or a different F0 than that of the target. One group of listeners was tested for a ΔF0 of two semitones; the other group for a ΔF0 of eight semitones. Error bars are ±1 standard error of the mean. (Right panel) Mean benefits of 2- and 8-semitones ΔF0 for the two masker types. Error bars are ±1 standard error of the mean.

4. Discussion

A ΔF0 of 2 semitones provided a similar benefit to a ΔF0 of 8 semitones with speech-modulated buzz maskers (Fig. 1, left side of the right panel). When the harmonic structure of a voice is shifted very little from that of a speech-modulated speech-shaped harmonic complex, intelligibility of this voice improves quite dramatically, by 5 dB, but does not seem to improve further by increasing the ΔF0 size. This sharp pattern of improvement with ΔF0 (i.e., improvements for very small ΔF0s which saturate above 2 semitones) has only been observed previously with vowel stimuli (Culling and Darwin, 1993; de Cheveigné et al., 1997a,b). In contrast, the use of two concurrent sentences as maskers produced a more gradual pattern of improvement with increments in ΔF0: a 3-dB benefit at 2 semitones and an 8-dB benefit at 8 semitones (Fig. 1, right side of the right panel). So the reason why intelligibility of a target voice keeps improving as its F0 is set further and further apart from a competing voice (e.g., Bird and Darwin, 1998) may not be due to the use of sentences per se, but the fact that maskers were sentences. Masking sentences may automatically engage phonetic or lexical processing whereas buzz maskers may not. In other words, masking sentences may involve informational masking, whereas buzz maskers may not.

Despite our efforts to match the temporal and spectral envelopes of the two masker types, there were breaks in the voicing of speech maskers in which glimpses could have been potentially available. So there was, in principle, more energetic masking with buzz than with speech maskers and yet intelligibility of the target voice was better against the buzz than against the two other voices. So it is likely that informational masking is responsible for this large increase in threshold with speech maskers. Listeners may then use ΔF0s to release both from energetic and informational masking invoked by masking voices, but a relatively large ΔF0 seems necessary. It may be that a large ΔF0 is required for competing voices to be perceptually segregated (Darwin et al., 2003), so the release from informational masking may not be accessible until the competing F0s are quite far apart.

The benefit of a 2-semitone ΔF0 was significantly smaller for masking sentences than for buzz. The release from energetic masking may be reduced with sentences due to breaks in voicing which, as a result of unvoiced consonants, occurred more frequently than in the buzz maskers. The more interrupted the masker's F0, the less beneficial a ΔF0 may be.
With masking voices, therefore, the release from energetic masking may be somewhat reduced and the release from informational masking may require large ΔF0s to be effective, resulting in more gradual improvements with ΔF0s.

Finally, it is important to bear in mind that there are a variety of instantaneous ΔF0s in realistic situations of conversation. Their overall benefit may depend on the average ΔF0 over the long term of sentences (which may be as large as one octave for gender differences) and how robust the mechanisms underlying ΔF0 effects are to different degrees of F0 modulation.

5. Conclusion

The present experiment investigated listeners' ability to recognize a male talker in the presence of speech-modulated buzz or two same-male-talker maskers when they differed in F0 from the target by 0, 2, or 8 semitones. With speech-modulated buzz maskers, the ΔF0 benefit was about 5 dB at 2 and 8 semitones, reminiscent of the ΔF0 effects observed for double-vowels. With speech maskers, the ΔF0 benefit increased from 3 to 8 dB at 2 and 8 semitones, respectively. Therefore the pattern of improvement with ΔF0 seems to depend on the masker type, particularly whether it is linguistic or not.

Acknowledgment

This research was supported by the UK EPSRC while both authors were at Cardiff University.

References and links

Bird, J., and Darwin, C. J. (1998). "Effects of a difference in fundamental frequency in separating two sentences," in Psychophysical and Physiological Advances in Hearing, edited by A. R. Palmer, A. Rees, A. Q. Summerfield, and R. Meddis (Whurr, London), pp. 263-269.

Brokx, J., and Nooteboom, S. (1982). "Intonation and the perceptual separation of simultaneous voices," J. Phonetics 10, 23-36.

Brungart, D. (2001). "Informational and energetic masking effects in the perception of two simultaneous talkers," J. Acoust. Soc. Am. 109, 1101-1109.

Brungart, D., Simpson, B., Ericson, M., and Scott, K. (2001). "Informational and energetic masking effects in the perception of multiple simultaneous talkers," J. Acoust. Soc. Am. 110, 2527-2538.

Carlyon, R. P., and Datta, A. J. (1997). "Excitation produced by Schroeder-phase complexes: Evidence for fast-acting compression in the auditory system," J. Acoust. Soc. Am. 101, 3636-3647.

Culling, J. F., and Darwin, C. J. (1993). "Perceptual separation of simultaneous vowels: Within and across-formant grouping by F0," J. Acoust. Soc. Am. 93, 3454-3467.

Culling, J. F., and Porter, J. S. (2004). "Effects of differences in the accent and gender of competing voices on speech segregation," in Auditory Signal Processing: Physiology, Psychoacoustics and Models, edited by D. Pressnitzer, A. de Cheveigné, S. McAdams, and L. Collet (Springer Verlag, New York), pp. 307-313.

Darwin, C. J., Brungart, D. S., and Simpson, B. D. (2003). "Effects of fundamental frequency and vocal-tract length changes on attention to one of two simultaneous talkers," J. Acoust. Soc. Am. 114, 2913-2922.

Darwin, C. J., and Hukin, R. W. (2000). "Effectiveness of spatial cues, prosody and talker characteristics in selective attention," J. Acoust. Soc. Am. 107, 970-977.

de Cheveigné, A. (1999). "Waveform interactions and the segregation of concurrent vowels," J. Acoust. Soc. Am. 106, 2959-2972.

de Cheveigné, A., Kawahara, H., Tsuzaki, M., and Aikawa, K. (1997a). "Concurrent vowel segregation. I. Effects of relative amplitude and F0 difference," J. Acoust. Soc. Am. 101, 2839-2847.

de Cheveigné, A., McAdams, S., Laroche, J., and Rosenberg, M. (1995). "Identification of concurrent harmonic and inharmonic vowels: A test of the theory of harmonic cancellation and enhancement," J. Acoust. Soc. Am. 97, 3736-3748.

de Cheveigné, A., McAdams, S., and Marin, C. (1997b). "Concurrent vowel segregation. II. Effects of phase, harmonicity and task," J. Acoust. Soc. Am. 101, 2848-2856.

Deroche, M. L. D., and Culling, J. F. (2011a). "Voice segregation by difference in fundamental frequency: Evidence for harmonic cancellation," J. Acoust. Soc. Am. 130, 2855-2865.

Deroche, M. L. D., and Culling, J. F. (2011b). "Narrow noise band detection in a complex masker. Masking level difference due to harmonicity," Hear. Res. 282, 225-235.

Drullman, R., and Bronkhorst, A. (2004). "Speech perception and talker segregation: Effects of level, pitch, and tactile support with multiple simultaneous talkers," J. Acoust. Soc. Am. 116, 3090-3098.

Freyman, R., Balakrishnan, U., and Helfer, K. (2001). "Spatial release from informational masking in speech recognition," J. Acoust. Soc. Am. 109, 2112-2122.

Freyman, R., Balakrishnan, U., and Helfer, K. (2004). "Effect of number of masking talkers and auditory priming on informational masking in speech recognition," J. Acoust. Soc. Am. 115, 2246-2256.

Hawley, M., Litovsky, R., and Culling, J. (2004). "The benefit of binaural hearing in a cocktail party: Effect of location and type of masker," J. Acoust. Soc. Am. 115, 833-843.

IEEE (1969). "IEEE recommended practice for speech quality measurements," IEEE Trans. Audio Electroacoust. 17, 227-246.

Kidd, G., Mason, C., Brughera, A., and Hartmann, W. M. (2005). "The role of reverberation in release from masking due to spatial separation of sources for speech identification," Acta Acust. Acust. 91, 526-535.

Kohlrausch, A., and Sander, A. (1995). "Phase effects in masking related to dispersion in the inner ear. II. Masking period patterns of short targets," J. Acoust. Soc. Am. 97, 1817-1829.

Lee, A. K. C., and Shinn-Cunningham, B. G. (2008). "Effects of reverberant spatial cues on attention-dependent object formation," J. Assoc. Res. Otolaryngol. 9, 150-160.

Scheffers, M. T. M. (1983). "Sifting vowels: Auditory pitch analysis and sound segregation," Ph.D. thesis, Rijksuniversiteit Groningen, The Netherlands.