AUD 6306 Speech Science

Dr. Peter Assmann, Spring semester 2

Role of Pitch Information
Pitch contour is the primary cue for tone recognition.
Tonal languages rely on pitch level and pitch differences to convey lexical meanings within syllables.
Pitch helps to segregate auditory components from different sound sources.

Pitch and consonant voicing
Voice pitch is higher following a voiceless consonant than following a voiced consonant. Listeners perceive these small changes; voicing judgments are influenced by the F0 of the following vowel.
Examples: a d a (lower F0), a t a (higher F0).

Size variation in natural speech
Adults' voices and children's voices differ in fundamental frequency and in formant frequencies.
[Figure: F0 as a function of age and sex. x-axis: Age (years); y-axis: Fundamental frequency (Hz).]
[Figure: FFs as a function of age and sex. x-axis: Age (years); y-axis: Geometric mean formant frequency (Hz).]

Pitch and vowel identification
There is a systematic relationship between F0 and formant frequencies (FFs) across voices: low-pitched voices tend to have lower formants than high-pitched voices, and vice versa.
[Figure: Relationship between F0 and FFs. x-axis: log mean F0 (Hz); y-axis: log mean FF (Hz).]

Pitch and vowel identification
Frequency-shifted speech is more intelligible and is perceived as more natural when the normal co-variation of F0 and formant frequency is preserved, even when the frequency shifts approach or exceed the range found in human speech.
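The relationship between F0 and formant frequencies described above can be sketched numerically. The following is a minimal illustration, not the analysis used in the lecture: it computes the geometric mean of F1-F2-F3 for each talker and fits a straight line to log mean FF against log F0. The function names and the three talkers are hypothetical, and the talker values are made-up numbers chosen only to show the direction of the trend.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of a sequence of positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty sequence of positive values")
    # Averaging logarithms avoids multiplying many large frequencies together.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def fit_line(xs, ys):
    """Ordinary least-squares fit: ys ~ slope * xs + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# HYPOTHETICAL talkers: (F0 in Hz, [F1, F2, F3] in Hz).
# Made-up numbers for illustration only, not measurements from the lecture.
talkers = [
    (110.0, [480.0, 1450.0, 2450.0]),
    (210.0, [560.0, 1700.0, 2850.0]),
    (260.0, [640.0, 1950.0, 3250.0]),
]

log_f0 = [math.log10(f0) for f0, _ in talkers]
log_ff = [math.log10(geometric_mean(formants)) for _, formants in talkers]
slope, intercept = fit_line(log_f0, log_ff)
print(f"log10(mean FF) = {slope:.3f} * log10(F0) + {intercept:.3f}")
```

A positive slope corresponds to the pattern stated on the slide: voices with higher F0 tend to have higher formants.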

[Figure: Comparison of vowel identification accuracy (red circles) and sentence recognition (blue circles). x-axis: Geometric mean F1-F2-F3 (Hz); y-axis: Identification accuracy (%).]
[Figure: Scatterplot of vowel identification accuracy and HINT sentence recognition.]
[Figure: Geometric mean formant frequency (Hz) against geometric mean F0 (Hz), medians of vowels per talker, with a linear fit. Panels: Original (unscaled) voices; Age-transformed (gender preserved); Age-transformed (gender swapped).]

Cocktail Party Effect
Colin Cherry (1953) coined the term cocktail party effect to describe the ability of listeners to attend to a single talker in a mixture of conversations and other background noises.

Cocktail Party Phenomenon
Cherry's experiments involved listening to two different messages presented to one or both ears, on the same pitch or on different pitches, spoken by two talkers of the same gender or differing in gender.

Cocktail Party Phenomenon
Cherry concluded that listeners rely on several cues to follow a conversation in the presence of competing voices, including spatial separation, pitch differences, and gender differences.

Auditory scene analysis
The sound that reaches the eardrum of the listener is often a mixture of different sources.
Acoustic signals originating from different sound sources combine additively.
Unlike in vision, the concept of occlusion is hard to define in audition: sounds overlap but also combine in complex ways.
Auditory scene analysis (ASA) is the process by which the auditory system organizes complex mixtures of sound.
ASA involves grouping processes, in which sound components that are likely to have come from the same environmental source are linked to form a single perceptual unit.

Computational auditory scene analysis
Reviewed by Cooke and Ellis (2001).
Human listeners are good at separating mixtures of sounds, as reflected in speech communication and listening to music in complex listening environments (cocktail parties).
Attempts to reproduce this separation process using computational models have had limited success (a hard problem!).

Auditory scene analysis
Demonstration: when a sequence of tones made up of alternating low and high notes is played to listeners at a slow rate, they hear low and high tones alternating. At high rates they hear two separate streams, one high and one low.
Bregman used the term stream segregation to describe the auditory processes that group together sounds that share common features and segregate them from sounds that differ.
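The alternating-tone demonstration can be approximated with a short synthesis script. This is a minimal sketch using only the Python standard library; the tone frequencies (400 Hz and 1000 Hz), the tone durations, and all function names are assumptions chosen for illustration, not the values used in Bregman's demonstrations. Whether the fast version is actually heard as two streams depends on the frequency separation as well as the rate.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # samples per second (assumed value)


def tone(freq_hz, dur_s, sample_rate=SAMPLE_RATE):
    """Return a sine tone as a list of floats in the range [-1, 1]."""
    n = int(round(dur_s * sample_rate))
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def alternating_sequence(low_hz, high_hz, tone_dur_s, n_pairs,
                         sample_rate=SAMPLE_RATE):
    """Concatenate n_pairs of (low tone, high tone) with no gaps."""
    samples = []
    for _ in range(n_pairs):
        samples.extend(tone(low_hz, tone_dur_s, sample_rate))
        samples.extend(tone(high_hz, tone_dur_s, sample_rate))
    return samples


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit PCM so the sequence can be auditioned."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)


# Same total duration (4 s), different presentation rates.
slow = alternating_sequence(400.0, 1000.0, tone_dur_s=0.40, n_pairs=5)
fast = alternating_sequence(400.0, 1000.0, tone_dur_s=0.08, n_pairs=25)
```

Calling `write_wav("slow.wav", slow)` and `write_wav("fast.wav", fast)` produces two files for comparison by ear; abrupt tone onsets may produce clicks, which a short amplitude ramp would reduce.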

Stream segregation
When the sequence splits into two separate streams, it is difficult to attend to the low and high streams at the same time. It is much easier to listen for a subset of 3 tones from the sequence when they belong to the same stream.

Stream segregation
At slow rates we hear alternating low and high notes. At faster rates we hear two separate streams of low and high notes.

Stream segregation
[Figure: sets A and B. Standard = repeating 3-tone cycle. Left panels: within-stream; right panels: across-stream.]
Which set (A or B) preserves the standard more effectively?

Auditory grouping principles
Figure-ground phenomenon; proximity; good continuation; closure; common fate; old-plus-new heuristic.
Grouping and segregation are complementary processes.
Sequential grouping; simultaneous grouping.

Grouping by timbre
Tones that deviate from the rising/falling pattern pop out of the sequence.

Grouping by onset
Harmonics of speech sounds or music that start and stop at the same time are grouped together (Gestalt law of common fate).

Grouping by onset
Rasch (1978) showed that it is easier to distinguish two tones from one another when the onset of the first precedes the onset of the second by a short time interval.
[Figure: onsets of Tone 1 and Tone 2 along a time axis.]

Grouping by onset
Darwin (11) showed that a harmonic which starts or stops before the remaining harmonics of a vowel is (partially) excluded from the vowel percept.

Grouping by onset: Darwin (11)
The onset of one harmonic is earlier than that of any of the remaining harmonics; vowel quality shifts in the direction of lower F1 (as if the harmonic had been removed).
[Figure: synchronous harmonics versus asynchronous harmonics.]
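The onset-asynchrony manipulation can be illustrated with a simple synthetic stimulus. The sketch below builds an equal-amplitude harmonic complex in which one harmonic may start earlier than the rest. It is not a vowel synthesizer and does not reproduce the stimuli from the studies cited above; the F0 (125 Hz), the leading harmonic (the 4th), the 50 ms lead time, and the function name are all assumptions for illustration.

```python
import math


def harmonic_complex(f0_hz, n_harmonics, dur_s, lead_harmonic=None,
                     lead_s=0.0, sample_rate=16000):
    """Sum of equal-amplitude harmonics of f0_hz.

    If lead_harmonic is given, that harmonic starts lead_s seconds before
    the others; all harmonics stop at the same time.
    """
    lead_n = int(round(lead_s * sample_rate)) if lead_harmonic else 0
    body_n = int(round(dur_s * sample_rate))
    out = [0.0] * (lead_n + body_n)
    for k in range(1, n_harmonics + 1):
        # The leading harmonic starts at sample 0, the rest after the lead.
        start = 0 if k == lead_harmonic else lead_n
        for i in range(start, lead_n + body_n):
            t = (i - start) / sample_rate
            out[i] += math.sin(2 * math.pi * k * f0_hz * t) / n_harmonics
    return out


# Synchronous: all 10 harmonics start together.
synchronous = harmonic_complex(125.0, 10, dur_s=0.2)

# Asynchronous: the 4th harmonic (500 Hz) leads the others by 50 ms.
asynchronous = harmonic_complex(125.0, 10, dur_s=0.2,
                                lead_harmonic=4, lead_s=0.05)
```

During the first 50 ms of `asynchronous`, only the 500 Hz component is present, which is the condition under which listeners tend to hear that harmonic as a separate tone.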

Schema-based grouping
When one harmonic is gradually ramped up in level, we hear a slight, gradual change in vowel quality.

Schema-based grouping
When the harmonic is augmented by more than a criterion amount (in dB) relative to its level in the original vowel, we hear a slight change in vowel quality and we hear an extra superimposed tone.
[Figure: uniform-amplitude harmonics; ramped harmonic; normal-amplitude harmonics; harmonic augmented at two levels (dB).]

Principle of good continuation
When sounds are interrupted by silence (e.g., signal dropouts or faulty communication lines) or by interfering sounds, the sound is heard as continuous (the auditory system fills in the missing pieces).

Periodicity and noise
Hypothesis: periodicity of speech contributes to robustness.
Harmonicity in the frequency domain.
Across-frequency grouping of spectral features.
Unvoiced sounds (e.g., whispered speech) are more susceptible to masking and interference by competing sounds.

F0 and voice separation
When two people speak at the same time, it is often easier to understand what they say if the pitches of their voices differ, for example if one voice is male and the other is female.

F0 and voice separation
Hypothesis 1: voice separation becomes easier and intelligibility improves as the pitch (F0) difference between the voices is increased.

F0 and voice separation
Voice pitch is rarely constant in natural speech, but changes over time (melody of speech, or prosody). Time variation in voice pitch may help listeners to track a target voice in a mixture of voices.

F0 and voice separation
Hypothesis 2: voice separation is easier when the natural variation in pitch (F0) is present, and becomes more difficult when the pitch is held constant (monotone).

F0 and voice separation
A high-quality speech vocoder was used to construct pairs of sentences on different F0s.
The F0 difference between the two voices was manipulated in semitone steps (values include 1, 2, and 4 semitones).
F0 was either constant or variable (natural pitch contour).
[Example stimuli]

Results
Significant improvement with F0 difference.
No effect of F0 modulation.
No interaction of F0 difference x F0 modulation.
Marginally higher scores for intoned sentences at the smallest F0 differences (including 1 semitone) may stem from momentary differences in F0 between the sentences.
[Figure: results for the 1st F0 and the 2nd F0.]
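F0 differences in this experiment are expressed in semitones. One semitone is a frequency ratio of 2^(1/12), so n semitones correspond to a ratio of 2^(n/12). The sketch below converts between semitones and Hz; the 100 Hz baseline and the function names are assumptions for illustration, since the F0 values of the lecture's stimuli are not given here.

```python
import math


def shift_semitones(f0_hz, semitones):
    """Return the frequency that lies `semitones` above f0_hz."""
    return f0_hz * 2.0 ** (semitones / 12.0)


def semitone_difference(f_a_hz, f_b_hz):
    """Return the interval from f_a_hz up to f_b_hz in semitones."""
    return 12.0 * math.log2(f_b_hz / f_a_hz)


# Assumed baseline F0 for the lower voice (illustrative only).
baseline_hz = 100.0
for n in (1, 2, 4):
    upper_hz = shift_semitones(baseline_hz, n)
    print(f"{n} semitone(s) above {baseline_hz:.0f} Hz: {upper_hz:.2f} Hz")
```

Because the scale is logarithmic, the same semitone difference corresponds to a larger difference in Hz for a higher-pitched pair of voices than for a lower-pitched pair.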

Brungart et al. (2001)
[Figure: 2-talker correct responses (%) as a function of Target-to-Masker Ratio (dB). Masker conditions: same talker; different talker, same sex; different sex; modulated noise.]

Midterm take-home exam
In a recent review of the literature on clear speech, Smiljanić and Bradlow (2009) describe a speaking style called clear speech and its effect on intelligibility in different populations of listeners. Your assignment is to review the literature on clear speech, using both primary and secondary sources to summarize what is known about clear speech and what these findings imply for speech production and perception.

Midterm take-home exam
Identify a set of organizing principles or themes that emerge from current research and use these as section headers for your paper. There is no set length for the paper, but - (double-spaced) pages is a reasonable target.

Midterm take-home exam
Two review papers to get started:
1. Smiljanić, R., Bradlow, A. R. (2009). Speaking and Hearing Clearly: Talker and Listener Factors in Speaking Style Changes. Lang Linguist Compass. 3(1): 236-264.
2. Uchanski, R. M. (2005). Clear speech. In: Pisoni, D. B.; Remez, R., editors. The handbook of speech perception. Malden, MA/Oxford, UK: Blackwell. p. 207-35.