A real-time study of plosives in Glaswegian using an automatic measurement algorithm


A real-time study of plosives in Glaswegian using an automatic measurement algorithm
Jane Stuart-Smith, Tamara Rathcke, Morgan Sonderegger
University of Glasgow; University of Kent; McGill University
NWAV42, Pittsburgh, 17-20 October 2013

Background
  the voicing contrast in Scottish English
Methodology
  Glasgow real-time corpus
  automatic phonetic measurement
  improving the algorithm
  algorithm performance
Preliminary results

Background
  Scottish English is typically observed to show voiceless plosives with shorter aspiration than Southern varieties of English (e.g. Wells 1982)

Background
  Docherty et al. (2011): Scottish-English border
  159 speakers
  4 locations: Scottish/English; East/West
  4662 tokens of voiced and voiceless plosives
  read wordlists

Background
  Docherty et al. (2011):
  Younger speakers showed longer aspiration, measured as Voice Onset Time (VOT), and less prevoicing than older speakers
  apparent-time change?
  physiological constraints?

Background
  Docherty et al. (2011):
  Scottish speakers at the Eastern end (Eyemouth) showed shorter aspiration/VOT than speakers at the Western end of the Border (Gretna)
  Eyemouth speakers also show more Scottish features (rhoticity; SVLR)
  fine-grained aspect of plosive production subject to subtle sociolinguistic control (cf. phonetic imitation studies, e.g. Nielsen 2011)

Research Question
  Is the voicing contrast in plosives changing in real time in Scottish English?
  sample of different ages recorded at different points in time
  sufficient number of tokens
  hand labelling VOT in spontaneous speech is very time consuming!

Fine phonetic variation and sound change: A real-time study of Glaswegian
http://soundsofthecity.arts.gla.ac.uk/
Oct 2011 - Sept 2014

A real-time corpus of Glaswegian vernacular: ideal structure

Decade of recording   Old (67-90)   Middle-aged (40-55)   Young (10-15)
1970s                 6 m, 6 f      6 m, 6 f              6 m, 6 f
1980s                 6 m, 6 f      6 m, 6 f              6 m, 6 f
1990s                 6 m, 6 f      6 m, 6 f              6 m, 6 f
2000s                 6 m, 6 f      6 m, 6 f              6 m, 6 f

Sample for this paper (age groups: Old 67-90; Middle-aged 40-55; Young 10-15)
  1970s
    Old: 2 f (sociolinguistic interview; oral history interview)
    Middle-aged: 2 f (sociolinguistic interview)
    Young: 2 f (sociolinguistic interview)
  1980s: (none)
  1990s: (none)
  2000s
    Old: 2 f (oral history)
    Middle-aged: 2 f (conversation)
    Young: 2 f (conversation)
  Sources (with thanks): Labov; Macaulay; M74 Project; Glasgow Media Project

Corpus for this study
  LaBB-CAT (Fromont and Hay; previously ONZEMiner) http://labbcat.sourceforge.net/
  Storage of time-aligned transcripts
  Detailed contextualized searches
  Preliminary segmentation by forced alignment using HTK in LaBB-CAT

Methodology
  Plosives: voiceless /p t k/; voiced /b d g/; stressed syllable-initial
  Automatic measurement algorithm
    Positive VOT
      voiceless plosives
      voiced plosives (partial): release = burst + frication
    Negative VOT
    Closure duration
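The measures named above can be made concrete with a small sketch. It assumes the usual definition of VOT as voicing onset minus burst onset, so voicing that starts before the burst gives a negative value. The `Stop` class, its field names and the time values are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Stop:
    """Landmarks for one plosive, in seconds (hypothetical layout)."""
    closure_onset: float
    burst_onset: float
    voicing_onset: float


def vot_ms(stop: Stop) -> float:
    """VOT in ms: positive if voicing starts after the burst, negative if before."""
    return (stop.voicing_onset - stop.burst_onset) * 1000.0


def closure_duration_ms(stop: Stop) -> float:
    """Time from closure onset to burst onset, in ms."""
    return (stop.burst_onset - stop.closure_onset) * 1000.0


# Invented landmark times for an aspirated and a prevoiced token.
aspirated = Stop(closure_onset=1.000, burst_onset=1.060, voicing_onset=1.095)
prevoiced = Stop(closure_onset=2.000, burst_onset=2.070, voicing_onset=2.010)

print(round(vot_ms(aspirated), 1))               # positive VOT
print(round(vot_ms(prevoiced), 1))               # negative VOT
print(round(closure_duration_ms(aspirated), 1))  # closure duration
```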

Automatic VOT measurement
  Manually labeled VOTs -> Training -> Classifier
  Goal: minimize VOT prediction error on unseen data
  Classifier input, for a new stop:
    where to start looking for VOT (search boundary)
    62 acoustic feature functions
  Output: predicted VOT boundaries
  Sonderegger & Keshet (2012), JASA
  Henry, Sonderegger, Keshet (2012), Interspeech
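One way to picture the prediction step is as a search over candidate (burst onset, voicing onset) pairs, starting from the search boundary, with each pair scored by a weighted sum of feature functions and the best-scoring pair returned. The sketch below assumes a linear score and uses two toy feature functions in place of the 62 acoustic ones. `predict_vot`, `toy_features`, the weights and the energy values are all illustrative and are not the published implementation.

```python
import numpy as np


def toy_features(energy: np.ndarray, burst: int, voicing: int) -> np.ndarray:
    """Two stand-in feature functions (the real system uses 62 acoustic ones)."""
    # Energy jump at the candidate burst frame relative to the previous frame.
    burst_jump = energy[burst] - energy[burst - 1]
    # Energy jump at the candidate voicing frame relative to the previous frame.
    voicing_jump = energy[voicing] - energy[voicing - 1]
    return np.array([burst_jump, voicing_jump])


def predict_vot(energy, weights, search_start, min_vot=1, max_vot=20):
    """Return the (burst, voicing) frame pair with the highest linear score."""
    best_pair, best_score = None, -np.inf
    for burst in range(max(search_start, 1), len(energy) - min_vot):
        last = min(burst + max_vot, len(energy) - 1)
        for voicing in range(burst + min_vot, last + 1):
            score = float(weights @ toy_features(energy, burst, voicing))
            if score > best_score:
                best_pair, best_score = (burst, voicing), score
    return best_pair


# Synthetic per-frame energy: silence, a burst at frame 5, voicing from frame 9.
energy = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 3.2, 3.1, 3.0, 8.0, 8.1, 8.0])
weights = np.array([1.0, 1.0])  # hypothetical; in practice learned from labelled VOTs
burst, voicing = predict_vot(energy, weights, search_start=1)
print(burst, voicing, voicing - burst)  # burst frame, voicing frame, VOT in frames
```

In the approach described on the slide, the weights would come from training on manually labeled VOTs. Here they are fixed by hand only to keep the example short.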

Feature functions
  Based on cues used by human annotators
  Example: mean of high-frequency energy between burst and voicing onsets, minus its mean before the burst onset
  Algorithm learns: high for a good burst/voicing onset pair, low otherwise
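The example feature can be written out as follows. This is a sketch under assumptions: the signal is represented as a per-frame high-frequency energy track, and the function name, the length of the pre-burst window and the toy values are all hypothetical.

```python
import numpy as np


def hf_energy_contrast(hf_energy, burst, voicing, pre_window=5):
    """Mean high-frequency energy between burst and voicing onsets
    minus its mean over up to `pre_window` frames before the burst onset."""
    hf_energy = np.asarray(hf_energy, dtype=float)
    if not 0 < burst < voicing <= len(hf_energy):
        raise ValueError("need 0 < burst < voicing <= number of frames")
    during = hf_energy[burst:voicing].mean()
    before = hf_energy[max(0, burst - pre_window):burst].mean()
    return during - before


# Toy track: quiet closure (frames 0-4), aspiration noise (5-8), then voicing.
track = [0.1, 0.1, 0.1, 0.1, 0.1, 2.0, 2.0, 2.0, 2.0, 0.5, 0.5]

good = hf_energy_contrast(track, burst=5, voicing=9)  # plausible onset pair
bad = hf_energy_contrast(track, burst=2, voicing=4)   # both inside the closure
print(good > bad)
```

The feature is large when the candidate burst really is followed by noisier signal than the closure before it, which is the pattern the slide describes as "high for a good burst/voicing onset pair".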

Previous results: Positive VOT
  On 4 datasets:
    Trainable: optimal performance with 50-250 examples
    Accurate: performance near intertranscriber agreement
  [Figure: intertranscriber vs. auto/manual comparison for Switchboard and Big Brother at 2 ms, 5 ms and 10 ms; vertical axis 0-100]
  Sonderegger & Keshet (2012), JASA
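Agreement at a tolerance can be computed as the share of tokens whose two VOT measurements differ by no more than that tolerance. The sketch below assumes this is what the 2, 5 and 10 ms comparisons report. The function name and the VOT values are invented for illustration.

```python
import numpy as np


def agreement_within(vot_a_ms, vot_b_ms, tolerances_ms=(2, 5, 10)):
    """Percentage of tokens whose two VOT measurements differ by <= each tolerance."""
    a = np.asarray(vot_a_ms, dtype=float)
    b = np.asarray(vot_b_ms, dtype=float)
    diff = np.abs(a - b)
    return {tol: 100.0 * float(np.mean(diff <= tol)) for tol in tolerances_ms}


manual = [35, 52, 18, 70, 44]     # invented manual VOTs (ms)
automatic = [36, 49, 25, 71, 60]  # invented automatic VOTs (ms)
print(agreement_within(manual, automatic))
```

The same function applies to two human transcribers, which is what makes the intertranscriber and auto/manual figures directly comparable.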

Procedure
  Training data: 100 tokens for 5 speakers
  First round of manual correction
    Code 1: correct
    Code 2: close, worth manually correcting
    Codes 3-8: completely wrong
  Algorithm altered
  Another round of manual correction

Manual correction (all plosives, n = 4491)
  [Figure: stacked bars (0-100%) of correction Codes 1-8 per speaker: 70 O f01, 70 O f02, 70 M f01, 70 M f02, 00 O f01, 00 O f02, 70 Y f01, 70 Y f02, 00 M f01, 00 M f02, 00 Y f01, 00 Y f02; horizontal axis: decade of birth, 1910s to 1990s]
  Code 1: correct
  Code 2: close and easily corrected
  Codes 3-8, annotated reasons: background noise; overlapping speakers; wrong forced alignment; strongly reduced; fricative or approximant; wrong but unclear why

Prediction results (N = 4491; 12 speakers)
  Code 1 (correct): 52%
  Code 2 (close): 15%
  Codes 3-8 (wrong): 33%

Prediction results by voicing
  Voiced
    Code 1 (correct): 45%
    Code 2 (close): 18%
    Codes 3-8 (wrong): 37%
  Voiceless
    Code 1 (correct): 61%
    Code 2 (close): 12%
    Codes 3-8 (wrong): 25%
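The three-way summaries above follow from collapsing the eight correction codes into correct (1), close (2) and wrong (3-8). A sketch of that tally, with a hypothetical function name and invented codes rather than the study's data:

```python
from collections import Counter


def summarise_codes(codes):
    """Collapse correction codes 1-8 into rounded percentages of correct/close/wrong."""
    if not codes:
        raise ValueError("no codes to summarise")

    def label(code):
        if code == 1:
            return "correct"
        if code == 2:
            return "close"
        if 3 <= code <= 8:
            return "wrong"
        raise ValueError(f"unexpected code: {code}")

    counts = Counter(label(c) for c in codes)
    total = len(codes)
    return {k: round(100 * counts[k] / total) for k in ("correct", "close", "wrong")}


example_codes = [1, 1, 1, 1, 1, 2, 2, 3, 5, 8]  # invented, not the study's data
print(summarise_codes(example_codes))
```

Because each percentage is rounded separately, the three values need not sum to exactly 100.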

Preliminary results
  [Figure: voiced vs. voiceless plosives; n = 3012]
  Voicing: p < 0.0001

Voiced plosives
  [Figure: panels /b/, /d/, /g/; 1970s vs. 2000s; n = 1669]
  release phase may be getting longer
  very short = burst; longer = VOT

Voiceless plosives: /p/
  [Figure: 1970s vs. 2000s by age group (old, middle-aged, young); n = 360]
  p < 0.0053

Voiceless plosives: /t/
  [Figure: 1970s vs. 2000s by age group (old, middle-aged, young); n = 422]
  p < 0.0053

Voiceless plosives: /k/
  [Figure: 1970s vs. 2000s by age group (old, middle-aged, young); n = 558]
  p < 0.0053

Discussion: Methodology
  large number of tokens (6125, of which 3012 usable) processed in a short time
  52% correct: close to previous results in Sonderegger and Keshet (2012) for Switchboard/Big Brother
  voiced plosives need more parameters
  promising for sociolinguistic analysis

Discussion: Preliminary results
  real-time change?
    Voicing contrast is robust
    shift in phonetic realization from voicing to VOT/aspiration?
  age grading?
    No consistency in VOT duration according to age group
    Some younger speakers show much shorter VOTs than much older speakers (and vice versa)

Next steps
  Improve algorithm for voiced plosives:
    Positive VOT
    Negative VOT
    Closure duration
    % voicing during closure
  More speakers

GULP: GLASGOW UNIVERSITY LABORATORY OF PHONETICS
Feedback gratefully received
Jane.Stuart-Smith@glasgow.ac.uk