AUTOMATIC IDENTIFICATION FOR SINGING STYLE BASED ON SUNG MELODIC CONTOUR CHARACTERIZED IN PHASE PLANE


10th International Society for Music Information Retrieval Conference (ISMIR 2009)

Tatsuya Kako, Yasunori Ohishi, Hirokazu Kameoka, Kunio Kashino, Kazuya Takeda
Graduate School of Information Science, Nagoya University
NTT Communication Science Laboratories, NTT Corporation
kako@sp.m.is.nagoya-u.ac.jp, ohishi@cs.brl.ntt.co.jp, kameoka@eye.brl.ntt.co.jp, kunio@eye.brl.ntt.co.jp, kazuya.takeda@nagoya-u.jp

ABSTRACT

A stochastic representation of singing styles is proposed. The dynamic property of the melodic contour, i.e., the fundamental frequency (F0) sequence, is assumed to be the main cue for singing style because it can characterize such typical ornamentations as vibrato. F0 signal trajectories in the phase plane are used as the basic representation. By fitting Gaussian mixture models (GMMs) to the observed F0 trajectories in the phase plane, a parametric representation is obtained as a set of GMM parameters. The effectiveness of the proposed method is confirmed through an experimental evaluation in which 94.1% accuracy for singer-class discrimination was obtained.

1. INTRODUCTION

Although no firm definition of singing style has yet been established in music information processing research, several studies have reported relationships between singing styles and signal features such as the singing formant [1, 2] and singing ornamentations. Various research efforts have characterized ornamentations through the acoustical properties of the sung melody, i.e., vibrato [3-11], overshoot [12], and fine fluctuation [13]. The importance of such melodic features for perceiving singer individuality was also reported in [14] on the basis of psycho-acoustic experiments, which concluded that the average spectrum and the dynamic properties of the F0 sequence affect the perception of individuality. These studies suggest that singing style is related to local dynamics of a sung melody that carry no musical (note) information. Therefore, in this study, we focus on the local dynamics of the F0 sequence, i.e., the melodic contour, as a cue for singing style and propose a parametric representation as a model of singing styles.

On the other hand, very few application systems that use the local dynamics of a sung melody have been reported. [15] reported a singer recognition experiment using vibrato, and [16] reported a method for evaluating singing skill through spectrum analysis of the F0 contour. Although these studies use the local dynamics of the melodic contour as a cue for ornamentation, no systematic method has been proposed for characterizing singing styles. A lag-system model for typical ornamentations was reported in [14, 17-19]; however, variation across singing styles was not discussed.

In this paper, we propose the stochastic phase plane as a graphical representation of singing styles and show its effectiveness for singing-style discrimination. One merit of this representation is that, since it requires neither an explicit detector for ornamentations such as vibrato nor estimation of the target note, it is robust across sung melodies.
In a previous paper [20], we applied this graphical representation of the F0 contour in the phase plane to a query-by-humming system and neutralized the local dynamics of the F0 sequence so that only the musical information was used for the query. In contrast, in this study we use the local dynamics of the F0 sequence to model singing styles and disregard the musical information, because musical information and singing style are in a dual relation. We also evaluate the proposed representation through a singer-class discrimination experiment, showing that the proposed model can extract dynamic properties of sung melodies that are shared by a group of singers.

In the next section, we propose the stochastic phase plane (SPP) as a stochastic representation of the melodic contour and show how singing ornamentations are modeled by it. In Section 3, we experimentally show the effectiveness of the proposed method through singer-class discrimination experiments. Section 4 discusses the obtained results and concludes the paper.

2. STOCHASTIC REPRESENTATION OF THE DYNAMICAL PROPERTY OF MELODIC CONTOUR

2.1 F0 Signal in the Phase Plane

Ornamental expressions in singing, such as vibrato, are characterized by the dynamic properties of the F0 signal. Since the F0 signal is a controlled output of the human speech production system, its basic dynamic characteristics can be related to a differential equation. Therefore, we can use the phase plane, the joint plot of a variable and its time derivative, i.e., (x, ẋ), to depict these dynamic properties.

Figure 1. Melodic contour (top) and corresponding phase planes for F0-ΔF0 (middle) and ΔF0-ΔΔF0 (bottom), shown for classical and pop female singers.

Although the signal sequence is not given as an explicit function of time, F0(t), but as a sequence of numbers, {F0(n)}, n = 1, ..., N, we can estimate the time derivative using the delta coefficient given by

    \Delta F_0(n) = \frac{\sum_{k=-K}^{K} k \, F_0(n+k)}{\sum_{k=-K}^{K} k^2},    (1)

where 2K is the window length for calculating the dynamics. Changing the window length extracts different aspects of the signal property. An example of such a plot for a given melodic contour is shown in Fig. 1. Here, the F0 signal (top), the phase plane (middle), and the second-order phase plane, which is given by the joint plot of ΔF0 and ΔΔF0 (bottom), are plotted. The singing ornamentations are depicted as the local behavior of the trajectory around the centroids that commonly represent target musical notes. Vibrato in singing, for example, is shown as circular trajectories centered at target notes. In the second-order plane, the trajectories appear as lines with a slope of -45 degrees. This shows that the relationship between ΔF0 and ΔΔF0 is given as

    \Delta\Delta F_0 = -\Delta F_0.    (2)

Hence, a sinusoidal component is imposed on the given signal. Over- and undershoots to the target note are represented as spiral patterns around the note.
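As a concrete illustration of Eq. (1), the phase-plane coordinates can be computed directly from a discrete F0 sequence. The sketch below is only a minimal example of this delta-coefficient construction; the function names and the synthetic vibrato contour are illustrative assumptions, not part of the original system.

```python
import numpy as np

def delta(x, K=2):
    """Delta coefficient of Eq. (1): regression slope over a window of 2K+1 frames."""
    k = np.arange(-K, K + 1)
    padded = np.pad(x, K, mode="edge")            # hold boundary values at the edges
    # np.correlate(a, v, "valid")[n] = sum_j a[n+j] * v[j], which matches sum_k k * x(n+k)
    return np.correlate(padded, k, mode="valid") / np.sum(k ** 2)

def phase_plane_points(f0_cent, K=2):
    """Stack f(n) = [F0(n), dF0(n), ddF0(n)] row-wise, as in Eq. (4)."""
    d1 = delta(f0_cent, K)
    d2 = delta(d1, K)
    return np.column_stack([f0_cent, d1, d2])

if __name__ == "__main__":
    # synthetic 6-Hz vibrato with 100-cent depth on a 10-ms frame grid (illustrative values)
    t = np.arange(0.0, 3.0, 0.01)
    f0 = 5000 + 100 * np.sin(2 * np.pi * 6 * t)
    print(phase_plane_points(f0).shape)           # (300, 3)
```

Plotting the second against the first column of this array reproduces the kind of phase-plane trajectory shown in Fig. 1.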
2.2 Stochastic Representation of the Phase Plane

Once a singing style is represented as a phase-plane trajectory, parameterizing the representation becomes an issue for further engineering applications. Since the F0 signal is not deterministic, i.e., it varies across singing behaviors, a stochastic model must be defined for the parameterization. By fitting a parametric probability density function to the trajectories in the phase plane, we can build a stochastic phase plane (SPP) and use it for characterizing the melodic contour.

A common feature of the trajectories in the phase plane is that most of their segments are distributed around the target notes; the distribution's histogram is therefore multimodal, but each mode can be represented by a simple symmetric two- or three-dimensional pdf. Therefore, a Gaussian mixture model (GMM),

    \sum_{m=1}^{M} \lambda_m \, \mathcal{N}(f(n); \mu_m, \Sigma_m),    (3)

where

    f(n) = [F_0(n), \Delta F_0(n), \Delta\Delta F_0(n)]^T,    (4)

is adopted for the modeling. \mathcal{N}(\cdot) denotes a Gaussian distribution, and

    \Theta = \{\lambda_m, \mu_m, \Sigma_m\}_{m=1,\dots,M}    (5)

are the parameters of the model, which represent the relative frequency (mixture weight), the mean vector, and the covariance matrix of each Gaussian. A GMM trained for F0 contours in the phase plane is depicted in Fig. 2; a smooth surface is obtained through model fitting. The horizontal deviations of each Gaussian represent the stability of the melodic contour around the target note, whereas the vertical deviations represent the vibrato depth. In this manner, singing styles can be modeled by the set of parameters Θ of the stochastic phase plane.

Figure 2. Gaussian mixture model fitted to an F0 contour in the phase plane.
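The SPP parameters Θ of Eqs. (3)-(5) can be estimated with any standard GMM implementation. The following sketch assumes scikit-learn's GaussianMixture purely for illustration; the paper does not specify a particular toolkit, and the mixture size of 8 simply mirrors the best-performing configuration reported later in Section 3.3.

```python
from sklearn.mixture import GaussianMixture

def fit_spp(points, n_components=8, seed=0):
    """Fit the GMM of Eq. (3) to the (F0, dF0, ddF0) points of one singer class.

    points: (N, 3) array such as the output of phase_plane_points().
    The fitted model's weights_, means_ and covariances_ play the roles of
    lambda_m, mu_m and Sigma_m in Eq. (5).
    """
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full",
                          random_state=seed)
    return gmm.fit(points)
```

The horizontal and vertical spreads discussed above would then correspond to the diagonal entries of each fitted covariance matrix.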

2.3 Examples of Stochastic Phase Planes

In Fig. 3, the F0 signals of three female singers are plotted: a professional classical singer, a professional pop singer, and an amateur. Deep vibrato is observed as a large vertical deviation of the Gaussians in the professional classical singer's plot. The amateur's plot, on the other hand, is characterized by large horizontal deviations. Although deep vibrato is not observed in the plot for the professional pop singer, its smaller horizontal deviations show that she sang the melody accurately.

Figure 3. Stochastic phase plane models for a professional classical singer (top), a professional pop singer (middle), and an amateur (bottom).

3. EXPERIMENTAL EVALUATION

The effectiveness of using the SPP to discriminate different singing styles is evaluated experimentally.

3.1 Experimental Setup

The singing signals of six singers were used: one of each gender in the categories of professional classical, professional pop, and amateur. With and without musical accompaniment, each subject sang songs with Japanese lyrics and also hummed them. The songs were "Twinkle, Twinkle, Little Star", "Ode to Joy", and five etudes. A total of 12 song signals was recorded. The F0 contour was estimated using the method of [21]. The signal-processing conditions for calculating the F0, ΔF0, and ΔΔF0 contours are listed in Table 1.

Table 1. Signal analysis conditions for F0 estimation. Harmonic PSD pattern matching [21] is used with these parameters.

    Signal sampling frequency        16 kHz
    F0 estimation window length      64 ms
    Window function                  Hanning window
    Window shift                     1 ms
    F0 contour smoothing             5 ms MA filter
    Delta coefficient calculation    K = 2

Since the absolute pitch of the song signals differs across singers, we normalized the contours so that only the singing style of each singer is used in the experiment. Normalization was done by the following procedure. First, the F0 frequency in [Hz] is converted to [cent] by

    F_0[\mathrm{cent}] = 1200 \log_2 \frac{F_0[\mathrm{Hz}]}{440 \times 2^{3/12 - 5}}.    (6)

Then the local deviations from the tempered scale are calculated by the residue operation mod(·):

    \mathrm{mod}(F_0 + 50, 100).    (7)

Obviously, after this conversion, the F0 value is limited to (0, 100) in [cent].

The stochastic representations of the second-order phase plane are also shown in Fig. 4. A strong negative correlation between ΔF0 and ΔΔF0 is found only in the plot for the professional classical singer, which also indicates deep vibrato in her singing style.

Figure 4. Second-order stochastic phase plane models for a professional classical singer (top), a professional pop singer (middle), and an amateur (bottom).

3.2 Discrimination Experiment

The discrimination of the three singer classes, i.e., professional classical, professional pop, and amateur, was performed based on the maximum a posteriori (MAP) decision

    \hat{s} = \arg\max_s p(s \mid \{F_0, \Delta F_0, \Delta\Delta F_0\})
            = \arg\max_s \left[ \frac{1}{N} \sum_{n=1}^{N} \log p(f(n) \mid \Theta_s) + \log p(s) \right],    (8)

where s is the singer-class id and Θ_s is the parameter set of the s-th singer-class model. We used "Twinkle, Twinkle, Little Star" and the five etudes sung by singers from each singer class for training, and "Ode to Joy" sung by the same singers for testing. Therefore, the results are independent of the sung melodies but closed with respect to the singers. N is the length of the signal in samples. Since we assumed an equal a priori probability for the singer-class distribution p(s), the above MAP decision is equivalent to the maximum-likelihood decision.
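For reference, the pitch normalization of Eqs. (6) and (7) in Section 3.1 amounts to a cent conversion followed by folding into a single semitone. A minimal sketch, with hypothetical function names, could look as follows.

```python
import numpy as np

def hz_to_cent(f0_hz):
    """Eq. (6): F0 in cents relative to 440 * 2**(3/12 - 5) Hz (about 16.35 Hz)."""
    ref_hz = 440.0 * 2.0 ** (3.0 / 12.0 - 5.0)
    return 1200.0 * np.log2(f0_hz / ref_hz)

def fold_to_semitone(f0_cent):
    """Eq. (7): deviation from the tempered scale, limited to (0, 100) cents."""
    return np.mod(f0_cent + 50.0, 100.0)

# Example: A4 = 440 Hz lies exactly on a tempered note, so it folds to 50 cents.
print(fold_to_semitone(hz_to_cent(440.0)))        # approximately 50.0
```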

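Under the equal-prior assumption stated above, the MAP rule of Eq. (8) reduces to picking the class whose GMM yields the highest average frame log-likelihood. A possible sketch, assuming models fitted as in the earlier fit_spp example, is:

```python
import numpy as np

def classify_singer_class(points, class_models, log_priors=None):
    """Eq. (8): MAP decision over the frame-wise features of one test song.

    points       : (N, 3) array of [F0, dF0, ddF0] values
    class_models : dict mapping a singer-class id to a fitted GaussianMixture
    log_priors   : optional dict of log p(s); omitted -> equal priors (ML decision)
    """
    best_class, best_score = None, -np.inf
    for s, gmm in class_models.items():
        score = np.mean(gmm.score_samples(points))    # (1/N) * sum_n log p(f(n) | Theta_s)
        if log_priors is not None:
            score += log_priors[s]
        if score > best_score:
            best_class, best_score = s, score
    return best_class
```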
3.3 Results

Fig. 5 shows the accuracy of the singer-class discrimination. The accuracy increases with the length of the test signal, and the best result, 94.1%, is attained with an 8-mixture GMM for the singer-class models when a 13-second signal is available as the test input. No significant improvement in accuracy was found for longer test inputs because more song-dependent information contaminates the test signal.

Figure 5. Accuracy in discriminating the three singer classes as a function of test-signal length [sec], for GMMs with M = 8, 16, and 32 mixtures.

Fig. 6 compares the accuracy of singer-class discrimination using three feature sets: F0 only, (F0, ΔF0), and (F0, ΔF0, ΔΔF0). As shown in the figure, combining F0 and ΔF0 halves the discrimination error relative to using F0 alone. Adding the second-order derivative ΔΔF0 reduces the error further, but not as much as adding ΔF0. These results show that the proposed stochastic representation of the phase plane effectively characterizes the singing styles of the three singer classes.

Figure 6. Accuracy in discriminating singer classes with the feature sets F0; (F0, ΔF0); and (F0, ΔF0, ΔΔF0).

4. DISCUSSION

The proposed method for representing and parameterizing the F0 contour effectively discriminates the three typical singer classes, i.e., professional classical, professional pop, and amateur. To confirm that the method models singing styles (and not singer individuality), we compared the proposed representation with MFCC features under two conditions. In the closed condition, we trained three MFCC-GMMs using "Twinkle, Twinkle, Little Star" and five etudes sung by the six singers (male and female professional classical, professional pop, and amateur) and used "Ode to Joy" sung by the same singers for testing. In the open condition, we evaluated the MFCC-GMMs in a singer-independent manner, where the singer-class models (GMMs) were trained on the female singers' data and tested on the male singers' data. As shown in Fig. 7, the performances of the MFCC-GMM and the proposed method are almost identical (95.0%) in the closed condition. However, in the unseen-singer experiment, the accuracy of the MFCC-GMM system degraded significantly to 33.3%, whereas the proposed method attained 87.9%. These results suggest that the MFCC-GMM system does not model singing style but rather discriminates singer individuality. Since the SPP-GMM can correctly classify even an unseen singer's data, the proposed representation models the F0 dynamic characteristics common within a singer class rather than singer individuality.

Figure 7. Comparison of the proposed representation with MFCC features under the closed and open conditions.

5. SUMMARY

In this paper, we proposed a model of singing styles based on a stochastic graphical representation of the local dynamic properties of the F0 sequence. Since various singing ornamentations are related to signal production systems described by differential equations, the phase plane is a reasonable space for depicting singing styles. Furthermore, the Gaussian mixture model effectively parameterizes this graphical representation, so that more than 90% accuracy can be achieved in discriminating the three classes of singers. Since the scale of the experiments was small, increasing the number of singers and singer classes is critical future work. Evaluating the robustness of the proposed method to noisy F0 sequences estimated under realistic singing conditions such as karaoke is also an inevitable step toward building real-world application systems.

6. REFERENCES

[1] J. Sundberg, The Science of the Singing Voice. Northern Illinois University Press, 1987.

[2] J. Sundberg, "Singing and timbre," Music Room Acoustics, vol. 17, pp. 57-81, 1977.

[3] C. E. Seashore, "A musical ornament, the vibrato," in Psychology of Music. McGraw-Hill Book Company, 1938, pp. 33-52.

[4] J. Large and S. Iwata, "Aerodynamic study of vibrato and voluntary straight tone pairs in singing," J. Acoust. Soc. Am., vol. 49, no. 1A, p. 137, 1971.

[5] H. B. Rothman and A. A. Arroyo, "Acoustic variability in vibrato and its perceptual significance," J. Voice, vol. 1, no. 2, pp. 123-141, 1987.

[6] D. Myers and J. Michel, "Vibrato and pitch transitions," J. Voice, vol. 1, no. 2, pp. 157-161, 1987.

[7] J. Hakes, T. Shipp, and E. T. Doherty, "Acoustic characteristics of vocal oscillations: Vibrato, exaggerated vibrato, trill, and trillo," J. Voice, vol. 1, no. 4, pp. 326-331, 1988.

[8] C. d'Alessandro and M. Castellengo, "The pitch of short-duration vibrato tones," J. Acoust. Soc. Am., vol. 95, no. 3, pp. 1617-1630, 1994.

[9] D. Gerhard, "Pitch track target deviation in natural singing," in Proc. ISMIR, 2005, pp. 514-519.

[10] K. Kojima, M. Yanagida, and I. Nakayama, "Variability of vibrato: a comparative study between Japanese traditional singing and bel canto," in Proc. Speech Prosody, 2004, pp. 151-154.

[11] I. Nakayama, "Comparative studies on vocal expressions in Japanese traditional and Western classical-style singing, using a common verse," in Proc. ICA, 2004, pp. 1295-1296.

[12] G. de Krom and G. Bloothooft, "Timing and accuracy of fundamental frequency changes in singing," in Proc. ICPhS, 1995, pp. 26-29.

[13] M. Akagi and H. Kitakaze, "Perception of synthesized singing voices with fine fluctuations in their fundamental frequency contours," in Proc. ICSLP, 2000, pp. 458-461.

[14] T. Saitou, M. Goto, M. Unoki, and M. Akagi, "Speech-to-singing synthesis: Converting speaking voices to singing voices by controlling acoustic features unique to singing voices," in Proc. WASPAA, 2007, pp. 215-218.

[15] T. L. Nwe and H. Li, "Exploring vibrato-motivated acoustic features for singer identification," IEEE Transactions on Audio, Speech, and Language Processing, pp. 519-530, 2007.

[16] T. Nakano, M. Goto, and Y. Hiraga, "An automatic singing skill evaluation method for unknown melodies using pitch interval accuracy and vibrato features," in Proc. Interspeech, 2006, pp. 176-179.

[17] H. Mori, W. Odagiri, and H. Kasuya, "F0 dynamics in singing: Evidence from the data of a baritone singer," IEICE Trans. Inf. and Syst., vol. E87-D, no. 5, pp. 186-192, 2004.

[18] N. Minematsu, B. Matsuoka, and K. Hirose, "Prosodic modeling of nagauta singing and its evaluation," in Proc. Speech Prosody, 2004, pp. 487-490.

[19] L. Regnier and G. Peeters, "Singing voice detection in music tracks using direct voice vibrato," in Proc. ICASSP, 2009, pp. 1685-1688.

[20] Y. Ohishi, M. Goto, K. Itou, and K. Takeda, "A stochastic representation of the dynamics of sung melody," in Proc. ISMIR, 2007, pp. 371-372.

[21] M. Goto, K. Itou, and S. Hayamizu, "A real-time filled pause detection system for spontaneous speech recognition," in Proc. Eurospeech, 1999, pp. 227-230.