A real-time study of plosives in Glaswegian using an automatic measurement algorithm

1 A real-time study of plosives in Glaswegian using an automatic measurement algorithm. Jane Stuart-Smith, Tamara Rathcke, Morgan Sonderegger. University of Glasgow; University of Kent; McGill University. NWAV42, Pittsburgh, October 2013.

2 A real-time study of plosives in Glaswegian using an automatic measurement algorithm. Background: the voicing contrast in Scottish English. Methodology: Glasgow real-time corpus; automatic phonetic measurement; improving the algorithm; algorithm performance. Preliminary results.

3 Background: Scottish English is typically observed to show voiceless plosives with shorter aspiration than Southern varieties of English (e.g. Wells 1982).

4 Background: Docherty et al. (2011), Scottish-English border: 159 speakers; 4 locations (Scottish/English; East/West); 4662 tokens of voiced and voiceless plosives; read wordlists.

5 Background: Docherty et al. (2011): Younger speakers showed longer aspiration, measured as Voice Onset Time (VOT), and less prevoicing than older speakers. Apparent-time change? Physiological constraints?

6 Background: Docherty et al. (2011): Scottish speakers at the Eastern end of the Border (Eyemouth) showed shorter aspiration/VOT than speakers at the Western end (Gretna). Eyemouth speakers also show more Scottish features (rhoticity; SVLR). A fine-grained aspect of plosive production is subject to subtle sociolinguistic control (cf. phonetic imitation studies, e.g. Nielsen 2011).

7 Research Question: Is the voicing contrast in plosives changing in real time in Scottish English?

8 Research Question: Is the voicing contrast in plosives changing in real time in Scottish English? Requires: a sample of different ages recorded at different points in time; a sufficient number of tokens. Hand-labelling VOT in spontaneous speech is very time-consuming!

9 Fine phonetic variation and sound change: A real-time study of Glaswegian (Oct 2011 to Sept 2014)

10 A real-time corpus of Glaswegian vernacular: ideal structure

Decade of recording   Old        Middle-aged   Young
1970s                 6 m, 6 f   6 m, 6 f      6 m, 6 f
1980s                 6 m, 6 f   6 m, 6 f      6 m, 6 f
1990s                 6 m, 6 f   6 m, 6 f      6 m, 6 f
2000s                 6 m, 6 f   6 m, 6 f      6 m, 6 f

11 Sample for this paper

Decade of recording   Old   Middle-aged   Young (10 15)
1970s                 2 f   2 f           2 f
1980s
1990s
2000s                 2 f   2 f           2 f

12 Sample for this paper, by decade of recording
1970s: Old, f (sociolinguistic interview; oral history interview); Middle-aged, f (sociolinguistic interview); Young, f (sociolinguistic interview)
1980s:
1990s:
2000s: Old, 2 f (oral history); Middle-aged, 2 f (conversation); Young, 2 f (conversation)
Sources (with thanks): Labov; Macaulay; M74 Project; Glasgow Media Project

13 Corpus for this study: LaBB-CAT (Fromont and Hay; previously ONZEMiner; http://labbcat.sourceforge.): storage of time-aligned transcripts; detailed contextualized searches. Preliminary segmentation by forced alignment using HTK in LaBB-CAT.

14 Methodology: Plosives: voiceless /p t k/; voiced /b d g/; stressed, syllable-initial. Automatic measurement algorithm: Positive VOT: voiceless plosives.

15 Methodology: Plosives: voiceless /p t k/; voiced /b d g/; stressed, syllable-initial. Automatic measurement algorithm: Positive VOT: voiceless plosives; voiced plosives (partial), release = burst + frication. Negative VOT. Closure duration.
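As a minimal sketch of the measure itself: VOT is conventionally the time of voicing onset minus the time of the burst (release), so voicing that starts after the burst gives a positive value and prevoicing gives a negative one. The function names and the use of seconds for the input times are assumptions made for illustration, not part of the talk.

```python
def voice_onset_time_ms(burst_onset_s, voicing_onset_s):
    """Return VOT in milliseconds: voicing onset minus burst onset.

    Both arguments are times in seconds (an assumption for this sketch).
    """
    return (voicing_onset_s - burst_onset_s) * 1000.0


def vot_category(vot_ms):
    """Label a VOT value as positive (lag), negative (prevoicing) or zero."""
    if vot_ms > 0:
        return "positive"
    if vot_ms < 0:
        return "negative"
    return "zero"


if __name__ == "__main__":
    # Hypothetical boundary times, not data from the study.
    vot = voice_onset_time_ms(burst_onset_s=1.200, voicing_onset_s=1.235)
    print(round(vot, 3), vot_category(vot))
```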

16 Automatic VOT measurement: Training: manually labeled VOTs are used to train a classifier. Goal: minimize VOT prediction error on unseen data. Classifier input, for a new stop: where to start looking for VOT (search boundary); 62 acoustic feature functions. Output: predicted VOT boundaries. Sonderegger & Keshet (2012), JASA; Henry, Sonderegger, Keshet (2012), Interspeech.
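The prediction step can be pictured roughly as follows. This is only a sketch under a stated assumption: that each candidate (burst onset, voicing onset) pair is scored by a weighted sum of feature functions and the best-scoring pair is returned. The slide does not spell out the model form, and the name `predict_vot_boundaries`, the frame-index representation and the toy features are all hypothetical.

```python
from itertools import combinations


def predict_vot_boundaries(n_frames, search_start, feature_functions, weights):
    """Return the (burst, voicing) frame pair with the highest score.

    Assumed scoring model: score = sum of weight * feature(burst, voicing).
    Candidates are all pairs with search_start <= burst < voicing < n_frames,
    so only positive VOT is considered in this sketch.
    """
    best_pair, best_score = None, float("-inf")
    for burst, voicing in combinations(range(search_start, n_frames), 2):
        score = sum(w * f(burst, voicing)
                    for w, f in zip(weights, feature_functions))
        if score > best_score:
            best_pair, best_score = (burst, voicing), score
    return best_pair


if __name__ == "__main__":
    # Two toy feature functions (hypothetical): they peak when the burst
    # is at frame 5 and the voicing onset is at frame 9.
    features = [
        lambda burst, voicing: -abs(burst - 5),
        lambda burst, voicing: -abs(voicing - 9),
    ]
    print(predict_vot_boundaries(12, 2, features, [1.0, 1.0]))
```

The exhaustive search is quadratic in the window length, which is why a search boundary that narrows the window matters in practice.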


17 Feature functions: Based on cues used by human annotators. Example: mean of high-frequency energy between burst and voicing onsets, minus its mean before the burst onset. Algorithm learns: high for a good burst/voicing onset pair, low otherwise.
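The example feature on this slide can be sketched in a few lines. The representation is an assumption: `hf_energy` is taken to be a per-frame list of high-frequency energy values, and `burst` and `voicing` are frame indices; the function name is hypothetical.

```python
from statistics import mean


def hf_energy_contrast(hf_energy, burst, voicing):
    """Mean high-frequency energy between burst and voicing onsets,
    minus its mean before the burst onset.

    hf_energy: per-frame energy values (assumed representation).
    burst, voicing: frame indices with 0 < burst < voicing <= len(hf_energy).
    """
    if not 0 < burst < voicing <= len(hf_energy):
        raise ValueError("need 0 < burst < voicing <= len(hf_energy)")
    return mean(hf_energy[burst:voicing]) - mean(hf_energy[:burst])


if __name__ == "__main__":
    # Toy signal: quiet closure, noisy release from frame 4, voicing at frame 7.
    energy = [1, 1, 1, 1, 8, 9, 7, 2, 2]
    print(hf_energy_contrast(energy, burst=4, voicing=7))  # well-placed pair
    print(hf_energy_contrast(energy, burst=2, voicing=5))  # misplaced pair
```

On the toy signal the well-placed pair scores higher than the misplaced one, which is the behaviour the slide describes.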

18 Previous results: Positive VOT. On 4 datasets. Trainable: optimal performance with examples. Accurate: performance near intertranscriber agreement. [Figure: intertranscriber vs. auto/manual agreement for Switchboard and Big Brother, at 2 ms, 5 ms and 10 ms.] Sonderegger & Keshet (2012), JASA.

19 Procedure: Training data: 100 tokens for 5 speakers. First round of manual correction: Code 1: correct; Code 2: close, worth manually correcting; Codes 3-8: completely wrong. Algorithm altered. Another round of manual correction.

20 Manual correction (all plosives, n = 4491). [Stacked bar chart: percentage of tokens (0% to 100%) assigned to each of Codes 1 to 8, per speaker (70 O f01, 70 O f02, 70 M f01, 70 M f02, 00 O f01, 00 O f02, 70 Y f01, 70 Y f02, 00 M f01, 00 M f02, 00 Y f01, 00 Y f02), grouped by decade of birth.] Highlighted: Code 1, correct; Code 2, close and easily corrected.

21 Manual correction (all plosives, n = 4491). [Same stacked bar chart as slide 20: percentage of tokens per code (Codes 1 to 8) for each speaker, grouped by decade of birth.] Highlighted reasons for wrong predictions: background noise; overlapping speakers; wrong forced alignment.

22 Manual correction (all plosives, n = 4491). [Same stacked bar chart as slide 20: percentage of tokens per code (Codes 1 to 8) for each speaker, grouped by decade of birth.] Highlighted reasons for wrong predictions: strongly reduced; fricative or approximant; wrong but unclear why.

23 Prediction results: N = 4491; 12 speakers. Code 1 (correct): 52%. Code 2 (close): 15%. Codes 3-8 (wrong): 33%.
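Percentages of this kind follow directly from the per-token correction codes. The sketch below shows one way to tally them, assuming each token carries an integer code from 1 to 8 as in the procedure slide; the function name and the four-token toy list are hypothetical and are not the study's data.

```python
from collections import Counter


def summarize_codes(codes):
    """Collapse correction codes into correct / close / wrong percentages.

    Code 1 = correct, Code 2 = close, Codes 3-8 = wrong.
    """
    if not codes:
        raise ValueError("no tokens to summarize")
    if any(code not in range(1, 9) for code in codes):
        raise ValueError("codes must be integers from 1 to 8")
    labels = Counter(
        "correct" if code == 1 else "close" if code == 2 else "wrong"
        for code in codes
    )
    return {label: 100.0 * labels[label] / len(codes)
            for label in ("correct", "close", "wrong")}


if __name__ == "__main__":
    # Toy example with four tokens (illustrative only).
    print(summarize_codes([1, 1, 2, 5]))
```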

24 Prediction results by voicing. Voiced: Code 1 (correct): 45%; Code 2 (close): 18%; Codes 3-8 (wrong): 37%. Voiceless: Code 1 (correct): 61%; Code 2 (close): 12%; Codes 3-8 (wrong): 25%.

25 Preliminary results: [Figure: voiced vs. voiceless plosives, n = 3012.] Voicing: p <

26 Voiced plosives (n = 1669): [Figure: panels for /b/, /d/, /g/; 1970s vs. 2000s.] Release phase may be getting longer. Very short = burst; longer = VOT.

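27 Voiceless plosives: /p/ [Figure: 1970s vs. 2000s; n = ]

28 Voiceless plosives: /p/ [Figure: 1970s vs. 2000s, by age group (Old, Middle-aged, Young); n = ]

29 Voiceless plosives: /p/ [Figure: 1970s vs. 2000s, by age group (Old, Middle-aged, Young); n = ; p < ]

30 Voiceless plosives: /t/ [Figure: 1970s vs. 2000s, by age group (Old, Middle-aged, Young); n = ; p < ]

31 Voiceless plosives: /k/ [Figure: 1970s vs. 2000s, by age group (Old, Middle-aged, Young); n = ; p < ]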

32 Discussion: Methodology: large number of tokens (6125, of which 3012 usable) processed in a short time; 52% correct, close to previous results in Sonderegger and Keshet (2012) for Switchboard/Big Brother; voiced plosives need more parameters; promising for sociolinguistic analysis.

33 Discussion: Preliminary results: Real-time change? Voicing contrast is robust; shift in phonetic realization from voicing to VOT/aspiration? Age grading? No consistency in VOT duration according to age group: some younger speakers show much shorter VOTs than much older speakers (and vice versa).

34 Next steps: Improve algorithm for voiced plosives: positive VOT; negative VOT; closure duration; % voicing during closure. More speakers.

35 GULP GLASGOW UNIVERSITY LABORATORY OF PHONETICS Feedback gratefully received Jane.Stuart
