A real-time study of plosives in Glaswegian using an automatic measurement algorithm

Jane Stuart-Smith, Tamara Rathcke, Morgan Sonderegger
University of Glasgow; University of Kent; McGill University
NWAV42, Pittsburgh, 17-20 October 2013

Outline
- Background: the voicing contrast in Scottish English
- Methodology: Glasgow real-time corpus; automatic phonetic measurement; improving the algorithm; algorithm performance
- Preliminary results

Background
- Scottish English is typically observed to show voiceless plosives with shorter aspiration than Southern varieties of English (e.g. Wells 1982)

Background: Docherty et al. (2011), the Scottish-English border
- 159 speakers
- 4 locations: Scottish/English; East/West
- 4662 tokens of voiced and voiceless plosives
- read wordlists

Background: Docherty et al. (2011)
- Younger speakers showed longer aspiration, measured as Voice Onset Time (VOT), and less prevoicing than older speakers
- Apparent-time change? Physiological constraints?

Background: Docherty et al. (2011)
- Scottish speakers at the Eastern end of the Border (Eyemouth) showed shorter aspiration/VOT than speakers at the Western end (Gretna)
- Eyemouth speakers also show more Scottish features (rhoticity; SVLR)
- A fine-grained aspect of plosive production is subject to subtle sociolinguistic control (cf. phonetic imitation studies, e.g. Nielsen 2011)

Research Question
- Is the voicing contrast in plosives changing in real time in Scottish English?
- Needed: a sample of speakers of different ages recorded at different points in time
- Needed: a sufficient number of tokens
- Hand-labelling VOT in spontaneous speech is very time-consuming!

Fine phonetic variation and sound change: A real-time study of Glaswegian
http://soundsofthecity.arts.gla.ac.uk/
Oct 2011 to Sept 2014

A real-time corpus of Glaswegian vernacular: ideal structure

Decade of recording | Old (67-90) | Middle-aged (40-55) | Young (10-15)
1970s               | 6 m, 6 f    | 6 m, 6 f            | 6 m, 6 f
1980s               | 6 m, 6 f    | 6 m, 6 f            | 6 m, 6 f
1990s               | 6 m, 6 f    | 6 m, 6 f            | 6 m, 6 f
2000s               | 6 m, 6 f    | 6 m, 6 f            | 6 m, 6 f

Sample for this paper

Decade of recording | Old (67-90) | Middle-aged (40-55) | Young (10-15)
1970s               | 2 f         | 2 f                 | 2 f
1980s               |             |                     |
1990s               |             |                     |
2000s               | 2 f         | 2 f                 | 2 f

Sample for this paper: recording types

Decade of recording | Old (67-90)                                             | Middle-aged (40-55)             | Young (10-15)
1970s               | 2 f (sociolinguistic interview; oral history interview) | 2 f (sociolinguistic interview) | 2 f (sociolinguistic interview)
1980s               |                                                         |                                 |
1990s               |                                                         |                                 |
2000s               | 2 f (oral history)                                      | 2 f (conversation)              | 2 f (conversation)

Sources (with thanks): Labov; Macaulay; M74 Project; Glasgow Media Project

Corpus for this study
- LaBB-CAT (Fromont and Hay; previously ONZEMiner), http://labbcat.sourceforge.net/
- Storage of time-aligned transcripts
- Detailed contextualized searches
- Preliminary segmentation by forced alignment using HTK in LaBB-CAT

Methodology
- Plosives: voiceless /p t k/; voiced /b d g/
- In stressed syllable-initial position
- Automatic measurement algorithm
  - Positive VOT: voiceless plosives; voiced plosives ((partial) release = burst + frication)
  - Negative VOT
  - Closure duration
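The three measures listed above can be made concrete with a small sketch. It assumes each token has already been reduced to three landmark times in seconds (closure onset, burst onset, voicing onset). The `StopToken` class and its field names are hypothetical and are not part of the authors' tools.

```python
from dataclasses import dataclass


@dataclass
class StopToken:
    """Hypothetical container for one plosive token (times in seconds)."""
    closure_onset: float
    burst_onset: float
    voicing_onset: float


def vot_ms(token: StopToken) -> float:
    """VOT in ms: positive if voicing starts after the burst, negative if before."""
    return (token.voicing_onset - token.burst_onset) * 1000.0


def closure_duration_ms(token: StopToken) -> float:
    """Closure duration in ms, from closure onset to burst onset."""
    return (token.burst_onset - token.closure_onset) * 1000.0


# Invented landmark times, for illustration only.
aspirated = StopToken(closure_onset=0.100, burst_onset=0.160, voicing_onset=0.195)
prevoiced = StopToken(closure_onset=0.100, burst_onset=0.160, voicing_onset=0.110)
print(round(vot_ms(aspirated), 1), round(vot_ms(prevoiced), 1))
```

The aspirated token yields a positive VOT and the prevoiced token a negative one; closure duration is the same for both because they share closure and burst landmarks.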

Automatic VOT measurement
- Training: manually labelled VOTs are used to train a classifier
- Goal: minimize VOT prediction error on unseen data
- Classifier input, for a new stop: where to start looking for VOT (search boundary); 62 acoustic feature functions
- Output: predicted VOT boundaries
Sonderegger & Keshet (2012), JASA; Henry, Sonderegger, Keshet (2012), Interspeech

Feature functions
- Based on cues used by human annotators
- Example: mean of high-frequency energy between the burst and voicing onsets, minus its mean before the burst onset
- The algorithm learns: high for a good burst/voicing onset pair, low otherwise
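The example feature function can be written out as a short sketch. It assumes the signal has already been converted to one high-frequency energy value per frame. The function name, the frame-index arguments and the `context` window before the burst are illustrative assumptions, since the slide does not say how far back "before the burst onset" extends.

```python
from statistics import fmean


def hf_energy_contrast(hf_energy, burst, voicing, context=5):
    """Mean high-frequency energy over frames [burst, voicing) minus the mean
    over up to `context` frames immediately before the burst."""
    if not 0 < burst < voicing <= len(hf_energy):
        raise ValueError("need 0 < burst < voicing <= len(hf_energy)")
    during = fmean(hf_energy[burst:voicing])
    before = fmean(hf_energy[max(0, burst - context):burst])
    return during - before


# Invented per-frame values: low energy, a noisy release in frames 3-5, then voicing.
hf = [1.0, 1.0, 1.0, 6.0, 6.0, 6.0, 2.0, 2.0]
print(hf_energy_contrast(hf, 3, 6))  # good burst/voicing pair
print(hf_energy_contrast(hf, 1, 3))  # poor pair placed inside the closure
```

The well-placed pair gets a high value and the misplaced pair a value near zero, which is the behaviour the slide describes.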

Previous results: positive VOT
- On 4 datasets
- Trainable: optimal performance with 50-250 examples
- Accurate: performance near intertranscriber agreement
[Figure: intertranscriber vs auto/manual agreement (scale 0-100) for Switchboard and Big Brother, at 2 ms, 5 ms and 10 ms]
Sonderegger & Keshet (2012), JASA
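A comparison of this kind can be computed as below. The sketch assumes that the 2 ms, 5 ms and 10 ms labels denote tolerance thresholds, so that agreement is the percentage of tokens whose two VOT measurements differ by no more than the threshold. The function name and the data are invented for illustration.

```python
def agreement_within(auto_ms, manual_ms, tolerances_ms=(2, 5, 10)):
    """Percentage of tokens whose automatic and manual VOT differ by at most
    each tolerance (in ms)."""
    if len(auto_ms) != len(manual_ms) or not auto_ms:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [abs(a - m) for a, m in zip(auto_ms, manual_ms)]
    return {tol: 100.0 * sum(d <= tol for d in diffs) / len(diffs)
            for tol in tolerances_ms}


# Invented VOT values in ms, for illustration only.
manual = [20.0, 35.0, 50.0, 15.0]
auto = [21.0, 39.0, 58.0, 30.0]
print(agreement_within(auto, manual))  # {2: 25.0, 5: 50.0, 10: 75.0}
```

Running the same function on two human transcribers' measurements gives the intertranscriber baseline against which the automatic measurements are judged.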

Procedure
- Training data: 100 tokens for 5 speakers
- First round of manual correction
  - Code 1: correct
  - Code 2: close, worth manually correcting
  - Codes 3-8: completely wrong
- Algorithm altered
- Another round of manual correction

Manual correction (all plosives, n = 4491)
[Figure: stacked bars (0-100%) of correction codes 1-8 for each speaker: 70 O f01, 70 O f02, 70 M f01, 70 M f02, 00 O f01, 00 O f02, 70 Y f01, 70 Y f02, 00 M f01, 00 M f02, 00 Y f01, 00 Y f02; x-axis: decade of birth (1910s, 1930s, 1940s, 1960s, 1960s, 1990s)]
- Code 1: correct
- Code 2: close and easily corrected
- Reasons annotated for the wrong codes: background noise; overlapping speakers; wrong forced alignment; strongly reduced, fricative or approximant; wrong but unclear why

Prediction results (N = 4491; 12 speakers)
- Code 1 (correct): 52%
- Code 2 (close): 15%
- Codes 3-8 (wrong): 33%

Prediction results by voicing
- Voiced: Code 1 (correct) 45%; Code 2 (close) 18%; Codes 3-8 (wrong) 37%
- Voiceless: Code 1 (correct) 61%; Code 2 (close) 12%; Codes 3-8 (wrong) 25%
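Percentages like these follow from collapsing the eight correction codes into three categories. The sketch below shows one way to do that tally; the function name and the example codes are invented, and only the code-to-category mapping (1 = correct, 2 = close, 3-8 = wrong) comes from the slides.

```python
from collections import Counter


def summarise_codes(codes):
    """Collapse correction codes 1-8 into correct / close / wrong percentages."""
    if not codes or any(c not in range(1, 9) for c in codes):
        raise ValueError("codes must be a non-empty list of integers 1-8")
    labels = Counter("correct" if c == 1 else "close" if c == 2 else "wrong"
                     for c in codes)
    return {k: round(100.0 * labels[k] / len(codes))
            for k in ("correct", "close", "wrong")}


# Invented codes for 20 tokens, for illustration only.
codes = [1] * 10 + [2] * 4 + [3, 3, 5, 6, 8, 8]
print(summarise_codes(codes))  # {'correct': 50, 'close': 20, 'wrong': 30}
```

Applying the function separately to the voiced and the voiceless tokens gives the by-voicing breakdown.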

Preliminary results
[Figure: voiced vs voiceless plosives; n = 3012; voicing p < 0.0001]

Voiced plosives (n = 1669)
[Figure: panels for /b/, /d/ and /g/, 1970s vs 2000s]
- The release phase may be getting longer
- Very short = burst; longer = VOT

Voiceless plosives: /p/ (n = 360)
[Figure: old, middle-aged and young speakers, 1970s vs 2000s; p < 0.0053]

Voiceless plosives: /t/ (n = 422)
[Figure: old, middle-aged and young speakers, 1970s vs 2000s; p < 0.0053]

Voiceless plosives: /k/ (n = 558)
[Figure: old, middle-aged and young speakers, 1970s vs 2000s; p < 0.0053]

Discussion: methodology
- Large number of tokens (6125, of which 3012 usable) processed in a short time
- 52% correct: close to previous results in Sonderegger and Keshet (2012) for Switchboard/Big Brother
- Voiced plosives need more parameters
- Promising for sociolinguistic analysis

Discussion: preliminary results
- Real-time change?
  - Voicing contrast is robust
  - Shift in phonetic realization from voicing to VOT/aspiration?
- Age grading?
  - No consistency in VOT duration according to age group
  - Some younger speakers show much shorter VOTs than much older speakers (and vice versa)

Next steps
- Improve algorithm for voiced plosives: positive VOT; negative VOT; closure duration; % voicing during closure
- More speakers
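Two of the planned measures, closure duration and % voicing during closure, can be sketched as follows. This assumes a per-frame voiced/unvoiced decision is available from some voicing detector and that the closure onset and burst frame are known. The function name, arguments and frame step are hypothetical and do not describe the planned implementation.

```python
def closure_measures(voiced_frames, closure_start, burst, frame_ms=1.0):
    """Return (closure duration in ms, % of closure frames that are voiced).

    `voiced_frames` is a per-frame list of booleans (hypothetical detector
    output); the closure spans frames [closure_start, burst).
    """
    if not 0 <= closure_start < burst <= len(voiced_frames):
        raise ValueError("need 0 <= closure_start < burst <= len(voiced_frames)")
    closure = voiced_frames[closure_start:burst]
    duration_ms = len(closure) * frame_ms
    percent_voiced = 100.0 * sum(closure) / len(closure)
    return duration_ms, percent_voiced


# Invented example: voicing continues for 4 frames into a 10-frame closure.
voiced = [True] * 7 + [False] * 8
print(closure_measures(voiced, closure_start=3, burst=13))  # (10.0, 40.0)
```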

GULP: Glasgow University Laboratory of Phonetics
Feedback gratefully received: Jane.Stuart-Smith@glasgow.ac.uk