
Proceedings of ICAD 04 - Tenth Meeting of the International Conference on Auditory Display, Sydney, Australia, July 6-9, 2004

EMOTIONFACE: PROTOTYPE FACIAL EXPRESSION DISPLAY OF EMOTION IN MUSIC

Emery Schubert
School of Music and Music Education
University of New South Wales
Sydney NSW 2052 AUSTRALIA
E.Schubert@unsw.edu.au

ABSTRACT

EmotionFace is a software interface for visually displaying the self-reported emotion expressed by music. Taken in reverse, it can be viewed as a facial expression whose auditory connection or exemplar is the time-synchronized, associated music. The present instantiation of the software uses a simple schematic face with eyes and mouth moving according to a parabolic model: smiling and frowning of the mouth represents valence (happiness and sadness) and the amount of opening of the eyes represents arousal. Continuous emotional responses to music collected in previous research have been used to test and calibrate EmotionFace. The interface provides an alternative to the presentation of data on a two-dimensional emotion space, the same space used for the collection of emotional data in response to music. These synthesized facial expressions make the observation of the emotion data expressed by music easier for the human observer to process and may be a more natural interface between the human and computer. Future research will include optimization of EmotionFace, using more sophisticated algorithms and facial expression databases, and the examination of the lag structure between facial expression and musical structure. Eventually, with more elaborate systems, automation and greater knowledge of emotion and associated musical structure, it may be possible to compose music meaningfully from synthesized and real facial expressions.

1. INTRODUCTION

The ability of music to express emotion is one of its most fascinating and attractive characteristics. Measuring the emotion which music can express has, consequently, occupied thinkers and researchers for a long time. One of the problems requiring consideration is how to measure emotion. There have been three broad approaches: physiological measurement (such as heart rate and skin conductance), observational measures (documenting the listeners' physical postures and gestures made while listening) and cognitive self-reporting. Physiological measures tend to tap into changes that are reflective of the arousal dimension of emotion [1]. Few studies have shown that they can reliably differentiate, for example, between happy and sad emotional responses. Observational methods are rarely found because they are fairly complex and expensive to implement. One of the most important examples of such observational methodology is the coding of facial expressions [e.g. 2], though this approach is yet to be applied to the analysis of the music listener's face. In both of these methodologies the measurement is restricted to an emotion experienced by the listener. It seems unlikely that physiological and observational approaches could indicate the emotion the listener identifies as being in the music (for more information on the distinction between perceived and experienced emotion in music see [3]). The most common way of measuring emotional responses to music has been through cognitive self-report, where the listener verbally reports the emotion perceived in the music. The self-report approach has been subdivided into three types of response formats: open-ended, checklist and rating scale. Typically, with each approach participants are asked to listen to a piece of music and make a response at the end of the piece.
Since the 1980s researchers have had easier access to computer technology which allows emotional observations about unfolding music to be tracked continuously. For this process, Schubert has argued that the best approach is to use rating scales [4]. He proposed a method of collecting self-reported emotions by combining two rating scales on a visual display. The rating scales should be reasonably independent and explain a significant proportion of variation in emotional response. Several researchers have identified the dimensions which fulfil these criteria as being valence (happiness versus sadness) and arousal (activity versus sleepiness) (e.g. [10]). The dimensions have been combined at right angles on a computer screen, with a mouse tracking system which is synchronised with the unfolding music [5, 6]. One of the applications of tracking emotional response to music in this way is that pedagogues, researchers, musicians and listeners in general can examine the two-dimensional emotion space expressed by music according to the sampled population. In the past [6], the visual interface has been the same emotion space used for collecting data from individual participants. The present paper describes a method of displaying the emotion expressed by music using continuously synthesized facial expressions.

2. FACIAL EXPRESSION LITERATURE

Since Darwin's work on emotion [7] we have had a good understanding of how facial expressions communicate emotional information. Humans are highly sensitive to nuances in such facial expressions (e.g. [8, 9]) and there is strong evidence that the emotion communicated by facial expressions can be understood universally. This corpus of available emotional expressions in the human face has been documented and decoded largely through the work of Ekman and Friesen [2]. Their taxonomy allows the meaningful reduction of emotion into six prototypical, basic emotions. These basic emotions can be translated onto a continuum using a dimensional model of emotion [10].

The eyes, eyebrows and mouth are the main parts of the face which signal emotional messages, what Fasel and Luettin [11] refer to as intransient facial features. Further, eye shape is more important than mouth shape in activating the high-arousal emotion of fear [12], and therefore has an important connection with the arousal component of emotional expression. In simple animations the valence of emotion is easy to detect through the shape of the lips (concave up for a happy expression, and concave down for a sad expression). It should therefore be possible to synthesize a simple, schematic face with easily recognizable emotional expressions using appropriately shaped curves to represent eye size and mouth shape. Transforming two-dimensional emotion data (valence and arousal) into mouth shape and eye size respectively was viewed to be a logical starting point for providing a synthesized, visual display of emotion which a human can understand. The next section describes an algorithm used to draw such a face dynamically as music unfolds (using already gathered subjective arousal and valence data from a previous study, based on second-by-second median responses of 67 participants with a fairly high degree of musical training and experience [13]).

3. FACIAL EXPRESSION ALGORITHM

The aim of the prototype schematic EmotionFace interface was to produce a visually and algorithmically simple schematic face able to communicate a spectrum of facial expressions along the arousal and valence dimensions. While such a model is fairly simple and more sophisticated algorithms are available for manipulating facial expressions [14], the present realization extracts some of the basic principles which exist in the literature and applies them using only two parabolic functions.

One parabola represents the arousal as expressed by eye opening. First, the lower half of one eye is calculated according to the formula:

    f(x) = k_a (x - e/2)(x + e/2) / a        (1)

where a is the median perceived arousal value (gathered in [13]) with the addition of 100 (the addition of 100 ensures that the parabola is always concave up, because arousal can have negative values as large as -100). Arousal appears in the denominator because large values of a need to make the parabola narrower and, in effect, increase the eye opening size. The roots of the parabola are fixed at the horizontal eye lines and eye widths, as shown in Figure 1. The width of an eye is therefore set to e, with the roots of the conjugate pair lying half of e on either side of the centre. k_a is a calibration constant. In the present instantiation of the interface, the author estimated all calibration constants. k_a was set so that for small values of arousal the eyes would appear to be in a neutral (partially opened) position, but for large negative values the eyes would appear closed (or almost closed), as if sleeping.

Figure 1. General anatomical/algorithmic structure of EmotionFace, defined by the functions f(x) = k_a (x - e/2)(x + e/2)/a (lower eye), upper_eye(x) = -f(x) and mouth(x) = k_v x^2 / v. The code was implemented in HyperTalk (the scripting language used in HyperCard for Macintosh). Arousal and valence data are read from a file which is synchronized with an audio CD track.
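To make the model concrete, the three shape functions shown in Figure 1 are sketched below in Python (the original was implemented in HyperTalk). The eye width e and the calibration constants k_a and k_v used here are hypothetical placeholders, not the values estimated by the author.

```python
# Minimal sketch of the Figure 1 shape functions.
# Arousal and valence are medians on a -100 to +100 scale, as in [13].

E_WIDTH = 60.0   # eye width e, in pixels (hypothetical)
K_A = 40.0       # eye calibration constant k_a (hypothetical)
K_V = 0.02       # mouth calibration constant k_v (hypothetical)


def lower_eye(x: float, arousal: float) -> float:
    """Eq. (1): f(x) = k_a (x - e/2)(x + e/2) / a, with a = arousal + 100."""
    a = max(arousal + 100.0, 1e-6)   # +100 keeps the parabola concave up; guard a = 0
    return K_A * (x - E_WIDTH / 2.0) * (x + E_WIDTH / 2.0) / a


def upper_eye(x: float, arousal: float) -> float:
    """The lower-eye curve flipped about the eye line: upper_eye(x) = -f(x)."""
    return -lower_eye(x, arousal)


def mouth(x: float, valence: float) -> float:
    """Eq. (2): mouth(x) = k_v x^2 / v; a flat horizontal line at the v = 0 discontinuity."""
    if valence == 0:
        return 0.0
    return K_V * x * x / valence


if __name__ == "__main__":
    # Sample curve values for a mildly negative expression (arousal -7,
    # valence -32, roughly the Aranjuez example in Section 4).
    xs = range(-30, 31, 10)
    print([round(lower_eye(x, -7.0), 1) for x in xs])
    print([round(mouth(x, -32.0), 2) for x in xs])
```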

Once the lower eye is calculated within the boundary -e/2 < x < e/2, it is copied and placed in the appropriate locations based on the eye centre grids (shown in Figure 1 as a + over each eye). The parabolas are then flipped, as indicated in the upper_eye function in Figure 1. The mouth is represented by another parabola whose vertex is fixed at the origin according to the general form:

    mouth(x) = k_v x^2 / v        (2)

As positive valence, v, increases, the mouth function deepens in a concave-up position, giving the appearance of a growing smile. When the valence becomes negative, the function flips to concave down, giving the appearance of a frown. For the discontinuity at v = 0, the asymptotic limit is assumed, and a straight, horizontal line is displayed (i.e. neither concave up nor concave down). k_v is a constant used for calibrating the mouth shape. An additional calibration (not shown mathematically, but indicated visually in Figure 1) is the position of the x-axis, and therefore of the vertex. As the length of the parabola increases for increasing values of v, more space is required to draw the parabola and to make it look believable (the parabolas shown for the mouth in Figure 1 demonstrate the most extreme values of negative and positive valence, values which are rarely approached in medians of subjective responses to musical stimuli). Therefore, as the positive value of v rises, the x-axis is lowered by a gradual, though small, amount. Similarly, as the valence becomes more negative, the x-axis is shifted upwards in small, gradual increments.

The face and eyes are drawn within a circle representing the outline of the head. The circle was placed within a square boundary of 300 by 300 pixels. From this constraint the other constants (k_a and k_v) and axis positions were calculated. Valence and arousal values were synchronized with an audio CD playing the music corresponding to the gathered emotion data. The audio CD track time elapsed was read by an external function written by Sudderth [15].

4. SAMPLE OUTPUTS

The algorithm was applied to data from an earlier study in which arousal and valence data were already collected [13]. The samples shown here were selected to exemplify parts of the music where extreme emotional responses occurred. The first example (Figure 2) shows one of the lowest valence points occurring in the slow movement of the Concierto de Aranjuez by Rodrigo, which occurs around the 263rd second of the piece in the recording used. The mouth shape is a negative parabola because the valence is negative (-32 on a scale of -100 to +100), reflecting the frown, and the eyes are in a roughly neutral position, though slightly closing because of the small, negative arousal (-7, also on a scale of -100 to +100).

Figure 2. EmotionFace display at the 263rd second of the Aranjuez concerto, where arousal was -7 and valence was -32 (each on a -100 to +100 scale).

Figure 3 shows the dynamic progression of the face at the opening of Dvorak's Slavonic Dance No. 1, Op. 46, which commences with a loud, sustained chord. EmotionFace always commences a piece in the neutral position (approximately 0 valence and arousal; the data upon which the facial expressions were calculated for the Dvorak can be seen in Table 1).
While there is known to exist some time lag between musical activity and associated emotional response [4], the startle of the loud beginning of this piece (see score in Figure 3) promptly leads EmotionFace to a wide eye opening, before the valence of the music is noticeably altered. After a few seconds, when the furiant has commenced in the major key, the valence increases, as reflected in the growing, concave-up, parabolic smile, most noticeably at about the 6th second, at which point there is a clear visual indication of a positive valence expression.

Time (seconds)   Arousal (-100 to +100)   Valence (-100 to +100)
0                  1                        1
1                  8.5                      2
2                 50                        2
3                 68                        2
4                 73                        5
5                 76                       11
6                 82                       21
7                 81                       25
8                 85                       32
9                 85                       34
10                85                       37
11                86                       43

Table 1: Sample-by-sample median values of continuous ratings of subjectively determined arousal and valence expressed by Dvorak's Slavonic Dance, shown in Figure 3. Rated by 67 participants from an earlier study [13].

Figure 3. EmotionFace screen shots for the first 11 seconds of Slavonic Dance No. 1, Op. 46 by Dvorak, one face drawn for each second of music (time elapsed in seconds marked beneath each face). The second half-dozen screen shots are shown below the musical score for ease of viewing. Musical score source: Antonin Dvorak, Slavonic Dances No. 1, Op. 46, in full score, Dover Publications, New York (1987), pp. 1-2.
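The faces in Figure 3 are produced by looking up the current arousal and valence medians (rows like those in Table 1) once per second and redrawing the face in time with the music. A minimal sketch of such an update loop is given below; the original implementation read the elapsed audio CD time through an external HyperCard function [15], whereas this sketch (in Python, with a hypothetical draw_face placeholder) simply uses elapsed wall-clock time as a stand-in.

```python
import time

# (time_s, arousal, valence) rows, e.g. the Dvorak medians from Table 1.
TABLE_1 = [
    (0, 1, 1), (1, 8.5, 2), (2, 50, 2), (3, 68, 2), (4, 73, 5), (5, 76, 11),
    (6, 82, 21), (7, 81, 25), (8, 85, 32), (9, 85, 34), (10, 85, 37), (11, 86, 43),
]


def draw_face(arousal: float, valence: float) -> None:
    # Placeholder for redrawing the eye and mouth parabolas of Section 3.
    print(f"arousal={arousal:5.1f}  valence={valence:5.1f}")


def run(rows) -> None:
    """Redraw the face once per second, keyed to elapsed time.

    The original synchronized to the elapsed time of an audio CD track [15];
    wall-clock time is used here purely as a stand-in.
    """
    start = time.monotonic()
    for t, arousal, valence in rows:
        # Wait until this row's second has been reached.
        while time.monotonic() - start < t:
            time.sleep(0.01)
        draw_face(arousal, valence)


if __name__ == "__main__":
    run(TABLE_1)
```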

5. CONCLUSION

The EmotionFace interface provides an alternative, intuitive method of displaying emotion expressed by music. The approach provides another tool for examining dynamic, time-dependent emotional responses to music. In some respects it provides a more meaningful display than a two-dimensional plot of arousal and valence because of the human's strong affinity toward the interpretation of facial expressions. The method may have applications for pedagogues in teaching students about the kinds of emotion that music can express. On a more trivial level, it could be used to accompany music on people's audio reproduction systems. If this is to occur, a database of emotional responses to many pieces of music needs to be gathered.

More serious future work needs to address the lag structure between the emotion expressed by the music and when it is noticed by the listener. For example, in the Dvorak excerpt described, there is a fairly sudden increase in arousal response almost immediately (within about one or two seconds) after the piece commences. However, Schubert & Dunsmuir [16] demonstrated that the typical delay between music and emotion is around 3 seconds. Should the facial model reflect this dynamically varying delay between causal musical features and emotional response, or should it be tied directly (instantaneously) to the musical features?

Further work will also examine alternative algorithms for displaying facial expressions, or the use of a database of standardized emotional expressions. Eventually, it may be possible to extract emotional information directly from the musical signal. This is most likely to occur when subjective measurements can be modeled with musical features alone [17], and when these musical features can be automatically extracted in real time. Alternatively, it may become possible to compose pieces of music based on facial expressions. With our current knowledge of the relationship between arousal and valence in both facial expression and music, the results would most likely be quite primitive. However, in years to come, the prospect of facially produced music composition may become a viable proposition.
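One way to explore the second of these options is to shift the rated series earlier by a fixed amount before it drives the face, so that the displayed expression sits closer to the musical features assumed to have caused it. The sketch below (Python, not part of the original system) applies a fixed 3-second shift, a deliberate simplification of the dynamically varying delay discussed above.

```python
def apply_fixed_lag(rows, lag_s: float = 3.0):
    """Shift (time, arousal, valence) rows earlier by a fixed lag.

    The emotion rated at time t is then displayed near the musical features
    assumed to have caused it; rows shifted before t = 0 are dropped. A fixed
    3 s lag is a simplification of the varying delay reported in [16].
    """
    return [(t - lag_s, a, v) for (t, a, v) in rows if t - lag_s >= 0]
```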
6. ACKNOWLEDGEMENT

This research was supported by an Australian Research Council Grant ARC-DP0452290. I am grateful to Daniel Woo from the School of Computer Science and Engineering at the University of New South Wales for his assistance in the preparation of this paper.

7. REFERENCES

[1] Radocy, R. E. & Boyle, J. D., Psychological Foundations of Musical Behaviour (2nd ed.), Springfield, IL: Charles C. Thomas (1988).
[2] Ekman, P. & Friesen, W. V., Constants across cultures in the face and emotion, Journal of Personality and Social Psychology, 17(2) (1971), 124-129.
[3] Gabrielsson, A., Emotion perceived and emotion felt: Same or different?, Musicae Scientiae, Special Issue 2001-2002 (2002), 123-147.
[4] Schubert, E., Continuous measurement of self-report emotional response to music, in P. Juslin and J. Sloboda (Eds.), Music and Emotion: Theory and Research, Oxford University Press, Oxford (2001), pp. 393-414.
[5] Madsen, C. K., Emotional response to music as measured by the two-dimensional CRDI, Journal of Music Therapy, 34 (1997), 187-199.
[6] Schubert, E., Measuring temporal emotional response to music using the two-dimensional emotion space, Proceedings of the 4th International Conference for Music Perception and Cognition, Montreal, Canada (11-15 August 1996), 263-268.
[7] Darwin, C., The Expression of the Emotions in Man and Animals, University of Chicago Press, Chicago (1965/1872).
[8] Adolphs, R. et al., Cortical systems for the recognition of emotion in facial expressions, Journal of Neuroscience, 16 (1996), 7678-7687.
[9] Davidson, R. J. & Irwin, W., The functional neuroanatomy of emotion and affective style, Trends in Cognitive Sciences, 3(1) (1999), 11-21.
[10] Russell, J. A., Affective space is bipolar, Journal of Personality and Social Psychology, 37 (1979), 345-356.
[11] Fasel, B. & Luettin, J., Automatic facial expression analysis: a survey, Pattern Recognition, 36 (2003), 259-275.
[12] Morris, J. S., de Bonis, M. & Dolan, R. J., Human amygdala responses to fearful eyes, NeuroImage, 17(1) (September 2002), 214-222.
[13] Schubert, E., Measuring emotion continuously: Validity and reliability of the two-dimensional emotion space, Australian Journal of Psychology, 51 (1999), 154-165.
[14] Du, Y. & Lin, X., Emotional facial expression model building, Pattern Recognition Letters, 24(16) (2003), 2923-2934.
[15] Sudderth, J., CoreCD (Version 1.4) [computer software], Core Development Group, Inc. (1995).
[16] Schubert, E. & Dunsmuir, W., Regression modelling continuous data in music psychology, in Suk Won Yi (Ed.), Music, Mind, and Science, Seoul National University Press (1999), pp. 298-352.
[17] Schubert, E., Modelling emotional response with continuously varying musical features, Music Perception, 21(4) (2004), 561-585.