Speaking loud, speaking high: non-linearities in voice strength and vocal register variations
Christophe d'Alessandro, LIMSI-CNRS, Orsay, France

Content of the talk
Introduction: voice quality
1. Voice quality dimensions
2. Source/filter model in time and frequency
3. Non-linearities: voice quality dimensions vs. voice acoustic parameters, using synthesis
Application: performative synthesis

Voice, Speech, Singing, Meaning and Expression
Functions of voice in communication:
1. Linguistic and pragmatic functions: to convey linguistic meaning (ideas, concepts, facts) and to perform speech acts (command, promise). Mainly associated with phonemes and words (double articulation). Can be noted in writing.
2. Expressive function: to make audible attitudes, feelings, emotions, personality, mood. Speech beyond (or below) linguistic meaning. Mainly associated with prosody and voice quality. Difficult to note in writing. The music of speech.
3. Musical function: singing, a non-linguistic but highly structured form of communication.

Voice Quality: a prosodic feature?
- Prosodic parameters are usually restricted to pitch, duration, pauses and some sort of intensity parameter.
- But intonation and voice quality are linked (e.g. voice registers).
- In some languages, voice quality has a phonological status (e.g. the strangled tones of Vietnamese).
- In all languages, voice quality has a pragmatic function.
- Synthesis of expressive speech has demonstrated that convincing, natural-sounding results are impossible to obtain without dealing with voice quality parameters.

Voice Quality: expression of emotions?
- Vocal expression of emotions and attitudes is one of the main domains of application for voice quality studies.
- Although it has long been studied in psychology, it can be considered an emerging research domain in many areas of speech communication: speech recognition and synthesis, but also speech coding.
- Voice quality is crucial for singing, theatre and other aesthetic vocalizations.

Questions related to Voice Quality
- Voice quality is still a rather fuzzy concept: what is the timbre of a voice?
- What are the domains of variation of everyday speech?
- How to measure and quantify voice quality dimensions like vocal effort, vocal tension or noise in the voice?
- What are the physical and perceptual correlates of voice quality?
- What are the relationships between voice quality and other aspects of prosody?

Voice quality dimensions
- A promenade in the landscape of voice quality, speech, singing
- Phonation dimensions
- Vocal tract dimensions

Voice quality dimensions
- Syllabic or sentence-level voice quality
- Dimensions are often defined according to production (instead of perception)
- Based on settings of respiration, articulation and phonation

Speech production model
Four main parts:
1. Respiration
2. Phonation
3. Articulation
4. Radiation

Speech production model
Voice quality is in:
1. Respiration: laughter, subglottal pressure
2. Phonation: phonation types, voice registers, effort, tension, voicing, noise
3. Articulation: smile, rounding, rate, strength, front/back, vocal tract length

Voice quality dimensions: examples (1)
Sound examples: 1 female speaker, 1 sentence, "Il est sorti avant le jour." ("He went out before daybreak."), with various vocal qualities.
- Breathiness: Whisper, Semivoiced, Modal voiced, Pressed 1, Pressed 2, Pressed 3
- Nasalisation: Nasal 2, Nasal 1, Denasalized 1, Denasalized 2
- Roughness: Modal, Rough 1, Rough 2, Rough 3
- Creakiness: Modal, Creaky 1, Creaky 2, Creaky 3
- Vocal tract length: Long 2, Long 1, Modal, Short 1, Short 2
- Tension: Lax, Modal, Tense 1, Tense 2, Tense 3
- Lips: Rounded 2, Rounded 1, Modal, Retracted 1, Retracted 2
- Pitch: Low 2, Low 1, Modal, High 1 to High 6
- Loudness: Weak, Modal, Loud 1 to Loud 7 (including Loud 6b and Loud 7b)
- Laughs: Laugh 1 to Laugh 8
- Smiling: Smiling 1 to Smiling 3
- Others: 1 to 6

Voice quality dimensions: examples (2)
Sound examples: 1 male speaker, 1 sentence, "She (has) left for a great party today", with various vocal qualities.
- Modal voice: modal1 to modal6
- Nasalization: nasal1 to nasal4
- Roughness/Creakiness: rough1 to rough3, creaky1 to creaky3
- Vocal tract: short1, short2, long1, long2, opened, closed1, closed2
- Tension: relax1, relax2, tensed1 to tensed6
- Lip protrusion: round1, round2, smile1, smile2
- Pitch: low1 to low3, high1 to high4
- Loudness (1): whisper, soft1 to soft5
- Loudness (2): loud1 to loud6, strong, shout1 to shout3
- Laughs: laugh1 to laugh4
- Others: left, central, right, clear, yawn, theatrical, ominous1, ominous2, mysterious, dark

Phonation types
The three main sources of sound in the larynx are (Catford, 1977):
1. Vocal fold vibration (voiced speech)
2. Turbulent noise produced through open vocal folds (unvoiced speech)
3. Ventricular band vibrations (ventricular speech)
These sources can also combine into mixtures of voiced, noisy and ventricular phonation types.

Phonation types
Sound examples:
1. Vocal fold vibration (voiced speech)
2. Turbulent noise produced through open vocal folds (unvoiced speech)
3. Ventricular band vibrations (ventricular speech)
4. Mixtures of voiced, noisy and ventricular phonation types
5. Polyphonic voice (ventricular + vocal folds)

Main voice quality dimensions
Four main dimensions:
1. Voice registers (voice mechanisms): creak, modal, falsetto, whistle
2. Noise: breathiness, hoarseness
3. Pressure: pressed/lax voice, strangled tones
4. Effort: accentuation, force

Voice registers
Phonation type | Description | Production
Creak | Very low frequency, periodic air pulses | Mechanism 0 of vocal fold vibration. Thick and heavy vocal folds, low subglottal pressure, low mean flow
Modal | Usual voice for most males and low-pitched females, low to medium F0 register | Mechanism 1 of vocal fold vibration. Thick and heavy vocal folds vibrating along their whole length
Falsetto | Usual voice for high-pitched females, high F0 register | Mechanism 2 of vocal fold vibration. Thin and light vocal folds, vibrating along about 2/3 of their anterior length

Ventricular phonation
Phonation type | Description | Production
Ventricular | A harsh quality, with a lot of aperiodicities, low F0 | Produced between the ventricular bands, or false vocal folds
Ventricular creak | Very low frequency, periodic air pulses | Ventricular band vibration, low subglottal pressure, low mean flow

Aperiodicities
Phonation type | Description | Production
Breath phonation | Unvoiced speech | Glottis wide open, high mean flow
Breathy voice | A mixture of breath and voice | Incomplete fold closure. High mean flow. Glottal chink
Whisper | Unvoiced speech | Narrowed opening compared to breath phonation, low mean flow
Whispery voice | A mixture of whisper and voice | Incomplete fold closure. Low mean flow. Narrow glottal chink
Hoarse voice | Irregular, rough quality | A voice with structural aperiodicities, jitter or shimmer
Multiphony | A voice with multiple F0 and/or sub-harmonics | Dissymmetric vibration of the vocal folds, or combination of ventricular and voiced vibrations

Lax-tense voice
Phonation type | Description | Production
Tense | A hard or sharp quality, audible glottal formant | Adduction of the posterior part of the vocal folds
Lax | A relaxed, soft voice quality | Abduction of the posterior part of the vocal folds

Vocal effort
Phonation type | Description | Production
Loud | A strong voice, with much vocal force | High sub-glottal pressure, high tension of the vocal folds, moderate flow, high voicing amplitude
Flow voice | A strong voice, with high amplitude of voicing and flow | Normal sub-glottal pressure and tension of the vocal folds, high flow, high voicing amplitude
Weak | A weak voice, without vocal force | Low sub-glottal pressure, low tension of the vocal folds, low flow, low voicing amplitude

The voice registers dimension
Voice registers depend on the underlying voice mechanism:
- Mechanism 0: vocal fry (creaky voice), very low F0. Thick and heavy vocal folds, low sub-glottal pressure, low mean flow.
- Mechanism I: modal voice, usual voice for males and low-pitched females, low to medium F0 register. Thick and heavy vocal folds vibrating along their whole length.
- Mechanism II: falsetto voice, usual voice for high-pitched females, high F0. Thin and light vocal folds, vibrating along about 2/3 of their anterior length.
- Mechanism III: whistle. Very high pitch, mostly children and possibly females.

The voice registers dimension (Henrich and Castellengo, 2001)
[Figure: modal voice and falsetto voice (after Vennard, 1967)]

The voice registers dimension
[Figure: glissando, baritone (Henrich, 2001)]

The voice registers dimension
[Figure: glissando, countertenor (Henrich, 2001)]

The noise dimension
Represents the relative amount of noise in the speech signal.
1. Additive noises: whispery voice, breathy voice. Turbulent flow at the glottal constriction.
2. Structural noises: hoarseness, roughness.
   - Jitter: a random fluctuation of the duration of fundamental periods
   - Shimmer: a random fluctuation of amplitude for successive periods
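Jitter and shimmer can be quantified from a sequence of measured periods and per-cycle amplitudes. The sketch below uses one common "local" definition (mean absolute difference between consecutive cycles, relative to the mean); the function names and the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference between consecutive periods, divided by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """Mean absolute difference between consecutive cycle amplitudes, divided by the mean amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# Hypothetical measurements: fundamental periods (s) and peak amplitude of each cycle
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
amplitudes = [1.00, 0.95, 1.02, 0.97, 1.01]

print(f"jitter  = {100 * local_jitter(periods):.2f} %")
print(f"shimmer = {100 * local_shimmer(amplitudes):.2f} %")
```

A perfectly periodic signal gives zero for both measures; larger values correspond to the structural noise heard as hoarseness or roughness.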

The noise dimension
Sound examples:
1. Additive noises
   - Whispery voice: narrow glottis
   - Breathy voice: wide glottis, voicing constriction
2. Structural noises: hoarseness, roughness

The pressed/lax dimension
The vocal folds can be pressed together more or less strongly at their posterior extremities (arytenoid cartilages):
1. Pressed voice: sometimes called tense or sharp voice quality
2. Lax voice: if the arytenoids are separated, a chink is created at the posterior part of the glottis
Note that this pressed quality may be relatively independent of vocal effort.

The pressed/lax dimension
Sound examples:
1. Pressed voice: sometimes called tense or sharp voice quality
2. Lax voice: if the arytenoids are separated, a chink is created at the posterior part of the glottis
Note that this pressed quality may be relatively independent of vocal effort.

The vocal effort dimension
- Important for stress and accentuation
- Important for emotion, affect, attitude, etc.
- Loudness = spectral balance and voice amplitude
- Results from tension and stiffness of the vocal folds and high sub-glottal pressure

The vocal effort dimension
Sound examples:
1. Speech: soft, loud, shouting
2. Singing
3. Emotions

Voice range profile (phonetogram) (Sulter, Wit, Schutte and Miller, 1994)
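A voice range profile plots the range of sound pressure levels a voice can produce at each pitch. A minimal sketch of how such a profile could be tabulated from frame-wise measurements is given below; the function name, the semitone reference of 55 Hz and the input values are assumptions made for illustration, not taken from the cited study.

```python
import numpy as np

def voice_range_profile(f0_hz, spl_db, ref_hz=55.0):
    """Return {semitone_bin: (min_spl, max_spl)} computed over voiced frames only."""
    f0 = np.asarray(f0_hz, dtype=float)
    spl = np.asarray(spl_db, dtype=float)
    voiced = f0 > 0  # frames with F0 == 0 are treated as unvoiced
    # Pitch in semitones above the (hypothetical) reference frequency
    semitones = np.round(12.0 * np.log2(f0[voiced] / ref_hz)).astype(int)
    profile = {}
    for st, level in zip(semitones, spl[voiced]):
        level = float(level)
        lo, hi = profile.get(int(st), (level, level))
        profile[int(st)] = (min(lo, level), max(hi, level))
    return profile

# Hypothetical frames: F0 in Hz (0 = unvoiced) and SPL in dB
profile = voice_range_profile([110, 110, 220, 0], [60, 80, 70, 50])
for st in sorted(profile):
    print(st, profile[st])
```

Plotting the minimum and maximum curves against the semitone axis gives the usual phonetogram contour.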

Vocal tract settings
- Important for emotion, affect, attitude, etc.
- Important for styles
- Co-variation with the source
- Very few systematic acoustic studies

The vocal tract dimension
Sound examples:
1. Smiling
2. Rounding
3. Bite block
4. Lengthening
5. Shortening
6. Yawning

Conclusions on voice quality dimensions
- About 4 main dimensions for phonation (+ pitch/F0)
- Vocal tract dimensions of voice quality mostly unknown
- Respiration dimension of voice quality mostly unknown

Modelling the voice source
- Voice source signal models
- Time-domain and spectral parameters
- Physical model and signal models

Glottal flow models: time domain
Examples:
- Rosenberg C (Rosenberg, 1971)
- LF (Liljencrants & Fant, 1985)
- Klatt (Klatt & Klatt, 1990)
- R++ (Veldhuis, 1998)

Glottal flow models
- KLGLOTT88 (Klatt & Klatt, JASA 1990)
- Rosenberg C (Rosenberg, JASA 1971)
- LF model (Liljencrants, Fant, Lin, KTH STL, 1985)

A unified set: 5 time-domain parameters (Doval, d'Alessandro & Henrich, Acta Acustica 2006)
- T0, fundamental period
- Av, amplitude of voicing
- Oq, open quotient
- αm, asymmetry coefficient (equivalent to speed quotient)
- Qa, return phase quotient
Other parameters of interest:
- J, total flow of a single pulse
- E, negative peak amplitude of the glottal flow derivative
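To make the role of these parameters concrete, the sketch below synthesizes one period of glottal flow with abrupt closure (Qa = 0) and measures E and J on it. The pulse shape (raised-cosine opening, quarter-cosine closing) is a simple trigonometric shape in the spirit of the Rosenberg models and is an assumption made here; the function name, default values and sampling rate are hypothetical.

```python
import numpy as np

def glottal_pulse(f0=100.0, av=1.0, oq=0.6, alpha_m=0.67, fs=16000):
    """One period of glottal flow with abrupt closure (Qa = 0).

    Assumed shape: raised-cosine opening, quarter-cosine closing.
    """
    t0 = 1.0 / f0                      # fundamental period T0
    n = int(round(fs * t0))            # samples per period
    t = np.arange(n) / fs
    tp = alpha_m * oq * t0             # opening phase duration
    tn = (1.0 - alpha_m) * oq * t0     # closing phase duration
    flow = np.zeros(n)                 # closed phase stays at zero
    opening = t < tp
    closing = (t >= tp) & (t < tp + tn)
    flow[opening] = 0.5 * (1.0 - np.cos(np.pi * t[opening] / tp))
    flow[closing] = np.cos(0.5 * np.pi * (t[closing] - tp) / tn)
    return t, av * flow

fs = 16000
t, flow = glottal_pulse(fs=fs)
derivative = np.diff(flow) * fs        # finite-difference glottal flow derivative
E = -derivative.min()                  # negative peak amplitude of the derivative
J = flow.sum() / fs                    # total flow of one pulse (area under the curve)
print(f"E = {E:.1f}, J = {J:.5f}")
```

Lowering `oq` or raising `alpha_m` shortens the closing phase, which increases E for the same Av.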

Time-domain equations
In the case of Qa = 0 (abrupt closure), the glottal flow models (GFM) can all be expressed through a normalized glottal flow model n_g(x, α_m), whose shape depends on the model.
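One way to write such a common form is sketched below. The notation $U_g$ for the glottal flow, and the symbols $e_n$ and $j_n$ for the derivative peak and total flow of the normalized pulse, are assumptions made here for illustration and are not a transcription of the slide.

$$U_g(t) = A_v\, n_g\!\left(\frac{t}{O_q T_0},\ \alpha_m\right), \qquad 0 \le t \le O_q T_0$$

$$E = \frac{A_v\, e_n(\alpha_m)}{O_q T_0}, \qquad J = A_v\, j_n(\alpha_m)\, O_q T_0$$

Under this scaling, stretching the open phase (a larger $O_q T_0$) lowers the derivative peak $E$ and raises the total flow $J$ in the same proportion, for a fixed $A_v$ and $\alpha_m$.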

Glottal flow models: frequency domain
The spectra of the glottal flow and of the glottal flow derivative are expressed with:
- N_g(x, α_m): Fourier transform of n_g(x, α_m)
- N'_g(x, α_m): Fourier transform of n'_g(x, α_m)
These two functions depend on the model.

Glottal flow models: spectral description (Doval, d'Alessandro, Henrich, 2006)
Glottal formant:
$$F_g = \frac{1}{2\pi}\sqrt{\frac{E}{J}} = \frac{1}{2\pi\, O_q T_0}\sqrt{\frac{e_n(\alpha_m)}{j_n(\alpha_m)}}, \qquad A_g = \sqrt{E\,J} = A_v \sqrt{e_n(\alpha_m)\, j_n(\alpha_m)}$$
Spectral slope:
$$F_a = \frac{1}{2\pi\, Q_a\, (1 - O_q)\, T_0}, \qquad A_a = \frac{E}{2\pi F_a} = \frac{A_g F_g}{F_a}$$
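The glottal formant and spectral slope relations can be evaluated numerically. The sketch below assumes the forms Fg = sqrt(E/J)/(2π), Ag = sqrt(E·J), Fa = 1/(2π·Qa·(1 − Oq)·T0) and Aa = E/(2π·Fa); the function names and the input values are hypothetical.

```python
import math

def glottal_formant(E, J):
    """Return (Fg, Ag) from the derivative peak E and the total flow J."""
    fg = math.sqrt(E / J) / (2.0 * math.pi)
    ag = math.sqrt(E * J)
    return fg, ag

def spectral_tilt(E, qa, oq, t0):
    """Return (Fa, Aa) for a return phase quotient Qa > 0."""
    fa = 1.0 / (2.0 * math.pi * qa * (1.0 - oq) * t0)
    aa = E / (2.0 * math.pi * fa)
    return fa, aa

# Hypothetical values: E and J of a 100 Hz pulse with Oq = 0.6
E, J, oq, t0 = 793.0, 0.00327, 0.6, 0.01
fg, ag = glottal_formant(E, J)
fa, aa = spectral_tilt(E, qa=0.1, oq=oq, t0=t0)
print(f"Fg = {fg:.1f} Hz, Ag = {ag:.3f}")
print(f"Fa = {fa:.1f} Hz, Aa = {aa:.4f}")
```

Because Ag·Fg = E/(2π), the two expressions for Aa agree, which is a quick consistency check on any implementation.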

Spectral / Time domain: open quotient, asymmetry
[Figure: effect of open quotient and asymmetry in the time and spectral domains]

Spectral / Time domain: spectral tilt
[Figure: effect of E and spectral tilt]

Causal-Anticausal Linear voice source Model (CALM) (Doval, d'Alessandro, Henrich, 2003)
[Figure panels: convergence region for a stable CALM (anticausal filter, causal filter); glottal pulse (CALM vs. R++); frequency response]
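The idea of combining an anticausal filter with a causal one can be illustrated with a very simplified sketch: a second-order resonance is time-reversed so that it builds up towards the closure instant, and a causal one-pole low-pass then adds spectral tilt and a short tail after closure. This is only an illustration under stated assumptions, not the authors' implementation; the function name, the parameter values and the 64-sample tail are hypothetical.

```python
import numpy as np

def calm_pulse(fg=120.0, bg=60.0, fa=2000.0, fs=16000, n=400, tail=64):
    """Simplified causal-anticausal source pulse (illustrative sketch only)."""
    k = np.arange(n)
    r = np.exp(-np.pi * bg / fs)             # pole radius from the bandwidth Bg
    omega = 2.0 * np.pi * fg / fs            # pole angle from the glottal formant Fg
    causal = (r ** k) * np.sin(omega * k)    # impulse response of a 2-pole resonator
    anticausal = causal[::-1]                # time reversal: grows towards closure

    a = np.exp(-2.0 * np.pi * fa / fs)       # one-pole low-pass coefficient (cutoff Fa)
    padded = np.concatenate([anticausal, np.zeros(tail)])
    out = np.zeros_like(padded)
    prev = 0.0
    for i, x in enumerate(padded):
        prev = (1.0 - a) * x + a * prev      # causal spectral-tilt filter
        out[i] = prev
    return anticausal, out

anticausal, pulse = calm_pulse()
# Time reversal changes the phase but not the magnitude spectrum
same_magnitude = np.allclose(np.abs(np.fft.fft(anticausal)),
                             np.abs(np.fft.fft(anticausal[::-1])))
print(pulse.shape, same_magnitude)
```

The magnitude check at the end reflects the point of the anticausal branch: it shapes the phase of the pulse while leaving the resonance magnitude unchanged.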

Voice quality dimensions and acoustic parameters
- Non-linear relationships between parameters of the acoustic model and voice quality dimensions
- General relationships
- Speaking low-high
- Speaking soft-loud

Voice quality dimensions and source parameters
Dimension | Time domain | Spectral domain
Registers | F0, open quotient | T0, glottal formant
Noise | Noise, jitter, shimmer | Noise, harmonic widths
Tension | Open quotient | Glottal formant
Force | Closure, peak flow | Spectral tilt, amplitude

Voice quality dimensions and source parameters
Parameter | Description | Duality | Main effect on phonation
Time domain parameters
Av | Amplitude of voicing | E, Ags | Flow
Oq | Open quotient | Fg | Tenseness
Am | Asymmetry | Bg, Sq | Tenseness, loudness
Qa | Return phase | Fa | Loudness
Alternative time domain parameters
E | Derivative peak | Av | Flow, loudness
SPL | Sound pressure level | Av, E | Flow, loudness
Sq | Speed quotient | Bg, Am | Tenseness, loudness
Rd | Amplitude quotient (Fant) | Av, E, F0 | Loudness, tenseness
Aq | Amplitude quotient (Alku) | Av, E | Loudness, tenseness

Voice quality dimensions and source parameters
Parameter | Description | Duality | Main effect on phonation
Spectral parameters
Fg | Glottal formant frequency | Oq | Tenseness
Bg | Glottal formant bandwidth | Sq, Am | Tenseness, loudness
Fa | Spectral tilt frequency | Qa, Tl | Loudness
Ags | Glottal formant amplitude | Av, SPL | Flow
Alternative spectral parameters
H1*-H2* | First and second harmonic amplitude difference | Oq, Am | Tenseness
H1*-F3* | First harmonic to third formant amplitude difference | Tl, Qa, Fa | Loudness
Tl | Spectral tilt | Qa | Loudness
HRF | Harmonic richness factor | Qa | Loudness
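As an illustration of one of the alternative spectral parameters, the sketch below measures the uncorrected H1-H2 difference from a magnitude spectrum, given a known F0. The asterisks in H1*-H2* usually denote a correction for formant influence, which is omitted here; the function name, the search band of ±20 % of F0 and the test signal are assumptions.

```python
import numpy as np

def h1_minus_h2(signal, fs, f0):
    """Difference in dB between the first and second harmonic amplitudes (no formant correction)."""
    signal = np.asarray(signal, dtype=float)
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

    def harmonic_peak(target_hz):
        # Strongest bin within +/- 20 % of F0 around the expected harmonic frequency
        band = (freqs > target_hz - 0.2 * f0) & (freqs < target_hz + 0.2 * f0)
        return spectrum[band].max()

    return 20.0 * np.log10(harmonic_peak(f0) / harmonic_peak(2.0 * f0))

# Hypothetical test signal: two harmonics at 100 and 200 Hz with amplitude ratio 2
fs, f0 = 16000, 100.0
t = np.arange(fs) / fs
x = 1.0 * np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
print(f"H1-H2 = {h1_minus_h2(x, fs, f0):.2f} dB")
```

On real speech, F0 would have to be estimated first and the analysis run on short frames rather than on a one-second stationary signal.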

Voice quality dimensions and source parameters
Parameter | Description | Duality | Main effect on phonation
Aperiodicities
Jitter | Period-to-period frequency variations | - | Roughness
Shimmer | Period-to-period amplitude variations | - | Roughness
PAPR | Periodic-aperiodic ratio | - | Breathiness, whisper
LoV | Limit of voicing | - | Breathiness, whisper
NTL | Noise spectral tilt | - | Breathiness, whisper
IHN | Inter-harmonic noise | - | Breathiness, whisper

Speaking high
Low-high dimension:
- Voice registers: signal changes with pitch height (open quotient, amplitude)
- Phonetogram: SPL dependence on pitch height
- Formant tuning: vocal tract changes with pitch height
- Fg tuning: open quotient

Speaking loud
Soft-loud dimension:
- Voice spectral tilt changes
- SPL changes
- Noise in the source
- Formant tuning: vowel opening
- F0 rise
- F0 contour
- F1 tuning (Liénard)
- Fg tuning
- Peakiness, impulsiveness

Application to performative synthesis
- Formant + CALM source
- Real-time control
- Including non-linear source-filter interactions
- Including a phonetogram
Demo: Cantor Digitalis
NOLISP 2013, 22/06/13

Acknowledgements
The contributions of Sylvain Le Beux, Nicolas d'Alessandro, Lionel Feugère, Boris Doval and Olivier Perrotin to the Cantor Digitalis are gratefully acknowledged.