Music Emotion Recognition
Jaesung Lee
Chung-Ang University
Introduction
Searching for music in Music Information Retrieval
- Conventional search assumes that some information about the target music is available
  - Query by text: title, artist, or genre
  - Query by singing or humming
- These approaches cannot handle the following problems
  - The Long Tail [1]
  - These methods do not satisfy people in some situations or with specific requirements [2]
    - When people want to sleep, they need sleep-inducing music with a slow tempo to relax their bodies and help them fall asleep
    - At a party, exciting music with a fast tempo is required to help people release their passion
- In these cases, classifying music by emotion is reasonable
Introduction
A Music Emotion Recognition system tries to recognize the perceived emotion, because it is relatively invariant to the listening context (environment, mood) [4]
- Expressed emotion (too subjective): the composer's feeling
- Perceived emotion (relatively objective): the emotions the listener perceives
- Evoked emotion (too subjective): the effect on the listener's mind
[Figure: composer's feeling -> (encoding) -> music play -> (hearing) -> perceiving emotions by listener -> (inducing) -> effect on listener's mind]
- Music is created by a composer or songwriter
- They encode their feelings into music using chords or other composing techniques
- A listener recognizes some emotions
- The perceived emotion evokes a feeling in the listener
Introduction
In Music Emotion Recognition, two kinds of emotion are modeled
- Emotions recognized or perceived while a piece of music is playing (Perceived Emotion, P Model): objective modeling
- Emotions that are the listener's actual feelings after he or she hears a piece of music (Evoked Emotion, E Model): subjective modeling
- Objectively on a small dataset (1,000 songs) fully and carefully annotated by musicians, and subjectively on a big dataset (41,446 songs) by measuring users' satisfaction on a random sample annotated [5]
- Note that the subjects are asked to label the emotion based on their feelings of what the music sample is trying to evoke, rather than the emotion the subjects perceive at the test [6]
Introduction
In Music Emotion Recognition, the goal is a model that is trained to recognize the perceived emotion expressed by music
- But how? By modeling this procedure:
[Figure: music playing -> perceiving emotions by listener]
Introduction
Modeling Music Emotion
[Figure: modeling pipeline. Labels: acoustic input (music audio), acoustic feature representation, recognition, training, human response, emotion representation, speech recognition]
Acoustic feature representation
- Features: tempo, rhythm, articulation, ...
- Bag-of-Frame [1]
- Long-term modulation [7]
Emotion representation
- Thayer's Music Emotion [8]: arousal, valence
- Hevner's Music Emotion [9]
- Categorical [10]
Emotion models used in Music Emotion Recognition
- Thayer's Emotion Plane in [2,4,6,13,15-16,18,24-26]
- Hevner's Emotion Adjectives in [5,14,22-23]
Recognizer (classifier), constructed from the data set
      f1   f2   f3   f4
p1     2    3    4    2
p2     4    1    1    3
p3     1    2    1    4
:      :    :    :    :
- Neural Network ([4,10-12])
- Support Vector Machine ([13-14])
- K-Nearest Neighbor ([5,14-16])
- Decision Tree ([17])
- Gaussian Mixture Model ([2,14,18-19])
- Genetic Algorithm ([11,20-21])
- Regression ([6])
- Fuzzy Approach ([22-24])
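To make the "data set plus recognizer" idea concrete, here is a minimal sketch of a 1-nearest-neighbor recognizer over the small feature table from this slide. The feature values for p1 to p3 come from the slide; the emotion labels and the function names are hypothetical, since the slide gives no labels for these pieces.

```python
import math

# Feature table from the slide: rows are pieces p1..p3, columns are f1..f4.
FEATURES = {
    "p1": [2, 3, 4, 2],
    "p2": [4, 1, 1, 3],
    "p3": [1, 2, 1, 4],
}

# Hypothetical emotion labels (assumption: not given in the slide).
LABELS = {"p1": "exciting", "p2": "calm", "p3": "calm"}


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_label(query):
    """Return the label of the training piece closest to the query vector."""
    nearest = min(FEATURES, key=lambda piece: euclidean(FEATURES[piece], query))
    return LABELS[nearest]


if __name__ == "__main__":
    # A new piece whose features are close to p1.
    print(nearest_neighbor_label([2, 3, 3, 2]))
```

Any of the recognizers listed on the slide could take the place of the nearest-neighbor rule; it is used here only because it needs no training step.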
Related Works
Acoustic Feature Representation
- Feature categories: dynamics, rhythm (including tempo), timbre, pitch
- Representations: Bag-of-Frame model, long-term modulation
[Figure: acoustic input (WAV file) -> pre-processing (amplitude envelope) -> frame-level values f i,1 ... f i,17 -> averaging -> feature f i (alongside other features f j, f k) -> categorize as High or Low]
- MIR Toolbox: Lartillot O., Toiviainen P., MIR in Matlab (II): A Toolbox for Musical Feature Extraction from Audio, ISMIR, 2007 [27]
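The bag-of-frames idea on this slide (compute a value per frame, average over frames, then categorize) can be sketched as follows. This is an illustration under stated assumptions, not the MIR Toolbox implementation: the per-frame feature is a simple RMS energy, the frames do not overlap, and the threshold and function names are hypothetical.

```python
import math


def frame_rms(signal, frame_size):
    """Split a signal into non-overlapping frames and return each frame's RMS."""
    values = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        frame = signal[start:start + frame_size]
        values.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return values


def bag_of_frames(frame_values):
    """Collapse frame-level values into one song-level value by averaging."""
    if not frame_values:
        raise ValueError("at least one frame is required")
    return sum(frame_values) / len(frame_values)


def categorize(value, threshold):
    """Categorize a song-level value as High or Low (threshold is an assumption)."""
    return "High" if value >= threshold else "Low"


if __name__ == "__main__":
    # Toy signal: one loud frame followed by one silent frame.
    signal = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
    song_level = bag_of_frames(frame_rms(signal, frame_size=4))
    print(song_level, categorize(song_level, threshold=0.25))
```

Averaging discards the order of the frames, which is why the slide lists long-term modulation as a separate representation.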
Related Works
Music Emotion Representation (Labeling)
Thayer's Emotion Model
- Thayer's two-dimensional model of mood indicates the two underlying stimuli that influence mood response: arousal and valence [18].
- Thayer adapted Russell's mood model to music [16].
- Researchers who use Thayer's emotion model rather than Hevner's give these reasons:
  - The adjective checklist differs across research contexts and is immense [16].
  - The set of adjectives relating to emotion is immense, and the adjectives vary quite freely in different theories, research, and applications [18].
Categorically valued
- Four quadrants [2,18]
- Adjectives [13,24]
Real valued
- Yang et al. [4,6]
- They claim that, even with the emotion plane, the categorical taxonomy of emotion classes is still inherently ambiguous. Each emotion class represents an area in the emotion plane, and the emotion states within each area may vary a lot [6].
Response
Single labeled
- Most research falls into this category, including research based on Hevner's emotion model [2,4,13,15-16,24-25]
Real value
- Each point as an emotion state: an alternative is to view the emotion plane as a continuous space and recognize each point as a state [6]
- Confused emotion: when people perceive emotion in music, some emotions are reliably perceived and others are confused with different emotions. Generally, the emotions that are confused appear to have similar arousal and/or valence values [26]
Multi labeled
- Time-varying emotion: for a piece of classical music, the mood may well change one or more times within a single piece. Therefore, it is not appropriate to detect an exclusive mood for an entire piece of music [18]
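The relation between the real-valued and the four-quadrant representations can be illustrated with a short sketch. It assumes both axes are centered at 0 and uses neutral quadrant descriptions rather than emotion adjectives; neither the axis range nor the function name comes from the slide.

```python
def thayer_quadrant(arousal, valence):
    """Map a real-valued (arousal, valence) point to one of four quadrants.

    Assumption: both axes are centered at 0, so the sign picks the quadrant.
    """
    arousal_part = "high arousal" if arousal >= 0 else "low arousal"
    valence_part = "positive valence" if valence >= 0 else "negative valence"
    return f"{arousal_part}, {valence_part}"


if __name__ == "__main__":
    # Two quite different emotion states fall into the same quadrant.
    print(thayer_quadrant(0.1, 0.1))
    print(thayer_quadrant(0.9, 0.9))
    print(thayer_quadrant(0.7, -0.2))
```

The first two calls return the same label even though the points are far apart, which is the ambiguity of the categorical taxonomy that [6] describes.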
Related Works
Music Emotion Recognizer (Classifier)
- Neural Network [1,4,6]
  - It has been shown that the characteristics of music are often better modeled with a non-linear function such as a radial basis function [4]
- Gaussian Mixture Model [18]
  - Based on Thayer's model of mood, a hierarchical framework is proposed for mood detection [18]
  - It would be better to provide several candidate moods with confidences [18]
- Support Vector Machine [13]
  - The distribution of music features is so complicated that a kernel-based SVM is adopted to handle it [13]
- Fuzzy Logic Inference [24]
- Primitive classifiers such as k-nearest neighbor (k-NN) and Naïve Bayes (NB) are barely used in Music Emotion Recognition
  - In summary, k-NN is used for similar music retrieval [14-16] and the genetic algorithm is used for emotional music generation [20-21]
  - It is interesting to note that NB is not used in Music Emotion Recognition
- Most recognizers take the form of a hierarchical structure
  - Huron pointed out that, of the two factors in Thayer's model, energy is more computationally tractable and can be estimated using simple amplitude-based measures [18]
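The hierarchical structure mentioned above can be sketched as a two-stage decision: the first stage settles arousal from an amplitude-based energy value, and the second stage decides valence with a rule that depends on the first-stage result. This is a toy illustration, not the framework of [18]; the feature names, the thresholds, and the use of fixed thresholds instead of trained models are all assumptions.

```python
def classify_hierarchical(energy, valence_feature,
                          energy_threshold=0.5,
                          valence_thresholds=None):
    """Two-stage recognizer sketch: arousal first, then valence per branch.

    All thresholds are hypothetical placeholders for trained models.
    """
    if valence_thresholds is None:
        # One second-stage threshold per first-stage branch.
        valence_thresholds = {"high": 0.6, "low": 0.4}

    # Stage 1: energy (arousal) from a simple amplitude-based value.
    arousal = "high" if energy >= energy_threshold else "low"

    # Stage 2: valence, using the rule that belongs to the chosen branch.
    threshold = valence_thresholds[arousal]
    valence = "positive" if valence_feature >= threshold else "negative"
    return arousal, valence


if __name__ == "__main__":
    # The same second-stage feature value is read differently in each branch.
    print(classify_hierarchical(0.8, 0.5))
    print(classify_hierarchical(0.2, 0.5))
```

Splitting on energy first follows the observation that energy is the more tractable of the two factors, so the harder valence decision is made within a narrower group of pieces.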
Related Works (Summary)
Music Emotion Recognition Trends
[Figure: acoustic feature set composition -> training recognizer -> recognizer -> emotion recognition]
- Within the MER system, most researchers proposed novel acoustic feature extractors for their own music databases [6,12,18,26], and so on
- However, the categories of extracted features are almost the same, such as Tempo, Rhythm motion, Articulation, Intensity
- The combination of acoustic features is important for recognizing the perceived emotion [10,28]
- MIR Toolbox, PsySound, and Marsyas are the most widely used toolboxes
- To address the complex nature of perceived emotion, researchers rely on
  - Structural recognizers that rely on prior knowledge of music emotion [1,4,6,13,18]
  - Nonlinear recognizers such as neural networks and support vector machines [1,4,6,13]
  - Naïve Bayes and k-NN are barely used in music emotion recognition
- To handle the ambiguous nature of music emotion, they used
  - Fuzzy logic, or classification that provides a confidence value [22]
- There are two types of emotion representation model
  - Hevner's Emotional Adjective Checklists (1936)
  - Thayer's Emotion Model of Music Emotion (1989)
- Researchers used Thayer's model rather than Hevner's model
  - The adjective checklist differs across research contexts and is immense [16]
- Types of labeling technique
  - Continuous ([4,26]) vs. static (most)
  - Single (most) vs. multiple ([18])
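As a rough illustration of "classification that provides a confidence value", the sketch below ranks candidate moods for a point on the arousal-valence plane using normalized inverse-distance weights. This is a simple stand-in and not the fuzzy method of [22]; the prototype positions, the mood names, and the function name are hypothetical.

```python
import math

# Hypothetical mood prototypes as (arousal, valence) points, one per quadrant.
PROTOTYPES = {
    "Q1 (high arousal, positive valence)": (0.5, 0.5),
    "Q2 (high arousal, negative valence)": (0.5, -0.5),
    "Q3 (low arousal, negative valence)": (-0.5, -0.5),
    "Q4 (low arousal, positive valence)": (-0.5, 0.5),
}


def mood_confidences(point, prototypes=PROTOTYPES):
    """Return (mood, confidence) pairs, highest first; confidences sum to 1."""
    weights = {}
    for mood, (arousal, valence) in prototypes.items():
        distance = math.hypot(point[0] - arousal, point[1] - valence)
        # Small constant avoids division by zero when the point is a prototype.
        weights[mood] = 1.0 / (distance + 1e-9)
    total = sum(weights.values())
    ranked = [(mood, weight / total) for mood, weight in weights.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for mood, confidence in mood_confidences((0.5, 0.1)):
        print(f"{mood}: {confidence:.2f}")
```

A point near the border between two quadrants receives similar confidences for both, so the ambiguity is reported instead of being hidden by a single label.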