Audiovisual analysis of relations between laughter types and laughter motions

Speech Prosody 2016, 31 May - 3 Jun 2016, Boston, USA

Audiovisual analysis of relations between laughter types and laughter motions

Carlos Ishi 1, Hiroaki Hata 1, Hiroshi Ishiguro 1
1 ATR Hiroshi Ishiguro Labs.
carlos@atr.jp, hata.hiroaki@atr.jp, ishiguro@sys.es.osaka-u.ac.jp

Abstract

Laughter commonly occurs in daily interactions, and is not only related to funny situations but is also used to express attitudes, having important social functions in communication. The background of the present work is the generation of natural motions in a humanoid robot, where miscommunication may be caused by a mismatch between the audio and visual modalities, especially in laughter intervals. In the present work, we analyzed a multimodal dialogue database and investigated the relations between different types of laughter (including production type, vowel quality, laughing style, intensity and laughter function) and different types of motion during laughter (including facial expressions, head and body motion).

Index Terms: laughter, facial expression, laughter motion, non-verbal information, natural conversation

1. Introduction

Laughter commonly occurs in daily interactions, and is not only related to funny situations but is also used to express attitudes, having important social functions in human-human communication. Therefore, it is important to account for laughter in robot-mediated communication as well. The authors have been working on improving human-robot communication by implementing humanlike motions in several types of humanoid robots. Natural (humanlike) behaviors are required of a robot as its appearance approaches that of a human, as in android robots. Several methods for automatically generating lip and head motions from the speech signal of a tele-operator have been proposed in the past [1-4]. Recently we also started to tackle the problem of generating natural motion during laughter [5]. However, we are still not able to generate motions according to different laughter types or different laughter functions.

Several works have investigated the functions of laughter and their relationship with acoustic features. For example, it is reported that duration, energy and voicing/unvoicing features differ between positive and negative laughter in French hospital call-center telephone speech [6]. In [7], it is reported that the first formant is raised and vowels are centralized (schwa-like), based on an analysis of acted English laughter data from several speakers. In [8-9], it is reported that mirthful laughter and polite laughter differ in terms of duration, number of calls (syllables), pitch and spectral shape in Japanese telephone conversational dialogue speech. In our previous work [10], we analyzed laughter events of students in a science classroom of a Japanese elementary school, and found relations between laughter types (production, vowel quality, and style), functions and situations.

Regarding the relationship between audio and visual features in laughter, several works have been conducted in the computer graphics animation field [11-13]. However, most of them dealt with symbolic facial expressions, so that dynamic features and differences in the smiling face due to different types of laughter are not expressed. As described above, different types of laughter may require different types of smiling faces. Thus, it is important to clarify how different motions are related to different types of laughter.
In the present work, we analyzed laughter events in face-to-face human interactions in a multimodal dialogue database, and investigated the relations between different types of laughter (such as production type, laughing style, and laughter function) and the visual features (facial expressions, head and body motions) during laughter.

2. Analysis data

2.1. Description of the data

For the analysis, we used the multimodal conversational speech database recorded at ATR/IRC labs [2]. The database contains face-to-face dialogues between several pairs of speakers, including audio, video and (head) motion capture data for each of the dialogue partners. Each dialogue is about 10 to 15 minutes of free-topic conversation. The database contains segmentation and text transcriptions, and also includes information about the presence of laughter. For the present analysis, data of 12 speakers (8 female and 4 male) were used, from which about 1,000 laughing speech segments were extracted.

2.2. Annotation data

The following label sets, based on past works, were used to annotate the laughter types and laughter functions. (The terms in parentheses are the original Japanese terms used in the annotation.)

Laughter production type: {breathiness over the whole laughter segment (kisoku), alternated pattern of breathy and non-breathy parts (iki ari to iki nashi kougo), relaxed (shikan: vocal folds relaxed, absence of breathiness), laughter during inhalation (hikiwarai)}

Laughter style: {secretly (hisohiso), giggle/chuckle (kusukusu), guffaw (geragera), sneer (hanawarai)}

Vowel quality of the laughter: {hahaha, hehehe, hihihi, hohoho, huhuhu, schwa (central vowel)}

Laughter intensity level: {1 (shouwarai), 2 (chuuwarai), 3 (oowarai), 4 (bakushou)}

Laughter function: {funny/amused/joy/mirthful laugh (omoshiroi, okashii, tanoshii), social/polite laugh (aisowarai), bitter/embarrassed laugh (nigawarai), self-conscious laugh (terewarai), inviting laugh (sasoiwarai), contagious laugh (tsurarewarai, moraiwarai), depreciatory/derision laugh (mikudashiwarai), dumbfounded laugh (akirewarai), untrue laugh (usowarai), softening laugh (kanwa / ba o yawarageru: soften/relax a strained situation)}

A research assistant (a native speaker of Japanese) annotated the labels above by listening to the segmented intervals (including five seconds before and after the laughter portions).
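To make the annotation scheme concrete, the following is a minimal sketch (in Python) of how one annotated laughter event could be stored; the class and field names are hypothetical and are not from the paper, which does not describe a storage format. Style and function are lists because the annotators could assign more than one label per event.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LaughterEvent:
    # Identification of the segment in the dialogue recording.
    speaker_id: str
    start_sec: float
    end_sec: float
    # Laughter-type labels (single choice per event in this sketch).
    production_type: Optional[str] = None   # e.g. "breathy", "alternated", "relaxed", "inhalation"
    vowel_quality: Optional[str] = None     # e.g. "ha", "he", "hi", "ho", "hu", "schwa"
    intensity: Optional[int] = None         # 1 (shouwarai) .. 4 (bakushou)
    num_calls: Optional[int] = None         # number of /h/-vowel syllables ("calls")
    # Multi-label fields: more than one item may be selected per event.
    styles: List[str] = field(default_factory=list)     # e.g. ["giggle"]
    functions: List[str] = field(default_factory=list)  # e.g. ["funny", "contagious"]

# Invented example values, for illustration only.
event = LaughterEvent("F01", 123.4, 125.1,
                      production_type="alternated", vowel_quality="ha",
                      intensity=3, num_calls=4,
                      styles=["guffaw"], functions=["funny"])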

For the label items in laughter style and laughter function (items 2 and 4 in Table 1), annotators were allowed to select more than one item per laughter event. No specific constraints were imposed on the number of times the annotators could listen, or on the order in which the items in Table 1 were annotated. The number of laughter calls (individual syllables in an /h/-vowel sequence) was also annotated for each laughter event, by looking at the spectrogram displays.

The following label sets were used to annotate the visual features related to motions and facial expressions during laughter.

eyelids: {no change, narrowed, closed}

cheeks: {raised, not raised}

lip corners: {raised, straightly stretched, lowered}

head: {no motion, up, down, left or right, tilted, nod, others (including motions synchronized with other motions, such as the upper body)}

upper body: {no motion, front, back, up, down, left or right, tilted, turn, others (including motions synchronized with other motions, such as the head and arms)}

For each laughing speech event, another research assistant annotated the labels related to motion and facial expressions by looking at the video and the motion data displays. For all annotations above, it was allowed to select multiple labels if multiple items were perceived.

3. Analysis of the laughter events

3.1. Analysis of laughter motions

The overall distributions of the motions during laughter were first analyzed. Fig. 1 shows the distributions for each motion type. Firstly, as the most representative feature of facial expression in laughter, it was observed that the lip corners are raised in more than 80% of the laughter events. Cheeks were raised in 79%, and the eyelids were narrowed or closed in 59% of the laughter events. More than 90% of the laughter events were accompanied by either a head or upper-body motion, of which the majority were in the vertical axis (up/down or front/back body motion, and nods for head motion).

Figure 1. Distributions of face (lip corners, cheek and eyelids), head and upper-body motions during laughter speech.

For investigating the timing of the motions during laughter speech, we conducted a detailed analysis for two of the speakers (female speakers in their 20s). The instants of eye blinking and the start and end points of eye narrowing and lip corner raising were segmented. As a result, it was observed that the start time of the smiling facial expression (eye narrowing and lip corner raising) usually matched the start time of the laughing speech, while the end time of the smiling face (i.e., the instant the face turns back to the normal face) was delayed relative to the end time of the laughing speech by 0.8 ± 0.5 seconds for one of the speakers, and 1.0 ± 0.7 seconds for the other speaker. Furthermore, it was observed that an eye blink usually accompanies the instant the face turns back from the smiling face to the normal face.
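As a rough sketch of how the timing figures above could be computed (illustrative only, not the authors' analysis code; the boundary values are invented), the offset delay of the smiling face relative to the end of the laughing speech is simply the per-event difference of two annotated time points, summarized by its mean and standard deviation:

import statistics

# Invented example boundaries in seconds; in the study these came from manual
# segmentation of the laughing speech and of the smiling-face interval.
laugh_end = [13.1, 45.8, 72.4, 90.0]   # end of laughing speech, per event
smile_end = [13.9, 46.5, 73.7, 90.9]   # instant the face returns to the normal face

delays = [s - l for s, l in zip(smile_end, laugh_end)]
print(f"offset delay: {statistics.mean(delays):.2f} +/- {statistics.stdev(delays):.2f} s")
# The paper reports delays of about 0.8 +/- 0.5 s and 1.0 +/- 0.7 s for the two speakers analyzed.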

3.2. Analysis of laughter motions and laughter types

Fig. 2 shows the distributions of the laughter motions according to different laughter types (production, vowel quality, and style). The number of occurrences for each item is shown within parentheses. Items with a low number of occurrences are omitted. The results for lip corner and cheek motions are also omitted, since most laughter events are accompanied by lip corner raising and cheek raising.

Figure 2. Distributions of eyelids, head motion and body motion, for different categories of production type (left), vowel quality (mid) and laughter style (right). The total number of utterances is shown within brackets.

From the results in Fig. 2, it can be observed that almost all laughter events are accompanied by eyelid narrowing or closing in the giggle and guffaw laughter styles. In guffaw laughter, all laughter events were accompanied by some body motion, among which the occurrence rate of backward motion was relatively high. Regarding vowel quality, by comparing the distributions of ha and hu, it can be observed that in hu the occurrence rates of head-down and body-frontward motions are relatively high, while in ha, head-up motion occurs at a relatively high rate. Regarding the production type, the breathy and lax production types show a higher occurrence of no motion for both head and body, compared to the alternated pattern.

3.3. Analysis of laughter motions and laughter functions

Fig. 3 shows the distributions of the laughter motions (eyelids, head motion, and body motion) according to different laughter functions. The number of occurrences for each item is shown within parentheses. Items with a low number of occurrences are omitted.

From Fig. 3, it can be observed that in funny laughter (funny/amused/joy/mirthful) and contagious laughter, the occurrence rates of cheek raising are higher (above 90%). This is because such types of laughter are thought to be spontaneous laughter, so that Duchenne smiles [14] occur and the cheeks are usually raised. Similar trends were observed for eyelid narrowing or closing.

Figure 3. Distributions of eyelids, cheeks, head motion and body motion categories, for different categories of laughter functions. The total number of utterances is shown within brackets.

Regarding head motion and body motion, a relatively high occurrence of no motion is observed in bitter, social, dumbfounded, and softening laughter. It can be interpreted that the occurrence of head and body motion decreases in these laughter types, since they are not spontaneous, but artificially produced.
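The distributions reported in this section and the next are, in essence, normalized cross-tabulations of motion labels against laughter categories. A minimal sketch of that computation is given below (pandas; the data frame and its column names are hypothetical and not the paper's actual files):

import pandas as pd

# Hypothetical flat table with one row per laughter event.
df = pd.DataFrame({
    "style":       ["giggle", "giggle", "guffaw", "guffaw", "giggle", "guffaw"],
    "head_motion": ["nod", "no motion", "up", "down", "nod", "down"],
})

# Occurrence rate (%) of each head-motion category within each laughter style;
# each row sums to 100%, as in the stacked-bar distributions of Figs. 2-4.
rates = pd.crosstab(df["style"], df["head_motion"], normalize="index") * 100
print(rates.round(1))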
3.4. Analysis of laughter motions and laughter intensity

Fig. 4 shows the distributions of the laughter motions (eyelids, cheeks, lip corners, head motion, and body motion) according to the different laughter intensity categories. The correlations between laughter intensity and the different types of motion are much clearer than those for the laughter styles and laughter functions shown in Sections 3.2 and 3.3.

From the results shown for eyelids, cheeks and lip corners, it can be said that the degree of the smiling face increased with the intensity of the laughter; that is, the eyelids are narrowed or closed, and both the cheeks and lip corners are raised (Duchenne smile faces).

Regarding the body motion categories, it can be observed that the occurrence rates of front, back and other body motions increase as the laughter intensity increases. The results for intensity level 4 are slightly different, but this is probably because of the small number of occurrences (around 20, over 8 categories). From the results for head motion, it can be observed that the occurrence rate of nods decreases as the laughter intensity increases. Since nods usually appear for expressing agreement, consent or sympathy, they are thought to be more likely to appear in low-intensity laughter.

Figure 4. Distributions of eyelid, cheek, lip corner, head and body motion categories, for different categories of laughter intensity (1 to 4). The total number of utterances is shown within brackets.

4. Conclusions

In the present work, we analyzed audiovisual properties of laughter events in face-to-face dialogue interactions. The analysis of laughter events revealed relationships between laughter motions (facial expressions, head and body motions) and laughter type, laughter function and laughter intensity. Firstly, it was found that the giggle and guffaw laughing styles are almost always accompanied by smiling facial expressions and head or body motion. Artificially produced laughter (such as social, bitter, dumbfounded and softening laughter) tends to be accompanied by less motion than spontaneous laughter (such as funny and contagious laughter). Finally, it was found that the occurrence of smiling faces (Duchenne smiles) and body motion increases, and the occurrence of nods decreases, as the laughter intensity increases. Future work includes the evaluation of acoustic features for automatic detection and classification of laughter events, and applications to laughter motion generation in humanoid robots.

5. Acknowledgements

This study was supported by JST/ERATO. We thank Mika Morita, Kyoko Nakanishi and Megumi Taniguchi for their contributions to the annotations and data analyses.

6. References

[1] Ishi, C., Liu, C., Ishiguro, H. and Hagita, N. (2012). Evaluation of a formant-based speech-driven lip motion generation. In Proc. 13th Annual Conference of the International Speech Communication Association (Interspeech 2012), Portland, Oregon, September 2012.
[2] Ishi, C. T., Liu, C., Ishiguro, H. and Hagita, N. Head motion during dialogue speech and nod timing control in humanoid robots. Proc. 5th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2010), pp. 293-300, 2010.
[3] Liu, C., Ishi, C., Ishiguro, H. and Hagita, N. Generation of nodding, head tilting and gazing for human-robot speech interaction. International Journal of Humanoid Robotics (IJHR), vol. 10, no. 1, January 2013.
[4] Kurima, S., Ishi, C., Minato, T. and Ishiguro, H. Online speech-driven head motion generating system and evaluation on a tele-operated robot. IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2015), pp. 529-534, 2015.
[5] Ishi, C., Minato, T. and Ishiguro, H. (2015). Investigation of motion generation in android robots during laughing speech. Intl. Workshop on Speech Robotics (IWSR 2015), Sep. 2015.
[6] Devillers, L. and Vidrascu, L. Positive and negative emotional states behind the laughs in spontaneous spoken dialogs. Proc. of the Interdisciplinary Workshop on The Phonetics of Laughter, 2007.
[7] Szameitat, D. P., Darwin, C. J., Szameitat, A. J., Wildgruber, D. and Alter, K. Formant characteristics of human laughter. Journal of Voice, 25(1), 32-37, 2011.
[8] Campbell, N. Whom we laugh with affects how we laugh. Proc. of the Interdisciplinary Workshop on The Phonetics of Laughter, pp. 61-65, 2007.
[9] Tanaka, H. and Campbell, N. Acoustic features of four types of laughter in natural conversational speech. Proc. of ICPhS XVII, pp. 1958-1961, 2011.
[10] Ishi, C., Hata, H. and Hagita, N. (2014). Analysis of laughter events in real science classes by using multiple environment sensor data. Proc. 15th Annual Conference of the International Speech Communication Association (Interspeech 2014), Sep. 2014.

[11] Yehia, H., Kuratate, T. and Vatikiotis-Bateson, E. Using speech acoustics to drive facial motion. Proc. of the 14th International Congress of Phonetic Sciences (ICPhS99), vol. 1, pp. 631-634, 1999.
[12] Niewiadomski, R., Mancini, M., Ding, Y., Pelachaud, C. and Volpe, G. (2014). Rhythmic body movements of laughter. In Proc. of the 16th International Conference on Multimodal Interaction (ICMI '14), ACM, New York, NY, USA, pp. 299-306.
[13] Niewiadomski, R., Ding, Y., Mancini, M., Pelachaud, C., Volpe, G. and Camurri, A. Perception of intensity incongruence in synthesized multimodal expressions of laughter. Sixth International Conference on Affective Computing and Intelligent Interaction (ACII 2015), 2015.
[14] Ekman, P., Davidson, R. J. and Friesen, W. V. The Duchenne smile: Emotional expression and brain physiology II. Journal of Personality and Social Psychology, 58(2), 342-353, 1990.