Subjective evaluation of common singing skills using the rank ordering method

Alma Mater Studiorum University of Bologna, August 22-26 2006

Subjective evaluation of common singing skills using the rank ordering method

Tomoyasu Nakano, Graduate School of Library, Information and Media Studies, University of Tsukuba, Tsukuba, Ibaraki 305-8550, Japan. nakano@slis.tsukuba.ac.jp

Masataka Goto, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki 305-8568, Japan. m.goto@aist.go.jp

Yuzuru Hiraga, Graduate School of Library, Information and Media Studies, University of Tsukuba, Tsukuba, Ibaraki 305-8550, Japan. hiraga@slis.tsukuba.ac.jp

ABSTRACT

This paper presents the results of two experiments on singing skill evaluation, in which human subjects judged the subjective quality of previously unheard melodies. The aim of this study is to explore the criteria that human subjects use in judging singing skill and the stability of their judgments, as a basis for developing an automatic singing skill evaluation scheme. The experiments use the rank ordering method, where the subjects order a group of given stimuli according to their preferred rankings. Experiment 1 uses real, a cappella singing as the stimuli, while experiment 2 uses the fundamental frequency (F0) sequences extracted from that singing. In experiment 1, 88.9% of the correlations between the subjects' evaluations were significant at the 5% level. The results of experiment 2 show that the F0 sequence is significant in only certain cases, so that the judgments and their stability in experiment 1 should be attributed to other factors of real singing as well.

In: M. Baroni, A. R. Addessi, R. Caterina, M. Costa (2006) Proceedings of the 9th International Conference on Music Perception & Cognition (ICMPC9), Bologna/Italy, August 22-26 2006. © 2006 The Society for Music Perception & Cognition (SMPC) and the European Society for the Cognitive Sciences of Music (ESCOM). Copyright of the content of an individual paper is held by the primary (first-named) author of that paper. All rights reserved. No paper from this proceedings may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information retrieval system, without permission in writing from the paper's primary author. No other part of this proceedings may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information retrieval system, without permission in writing from SMPC and ESCOM.

Keywords: singing skill, subjective evaluation, rank ordering method

BACKGROUND

Automatic evaluation of singing skill is a promising research topic with various applications in scope. Previous research on singing evaluation has focused on trained, professional singers (mostly in classical music), using approaches from physiology, anatomy, acoustics, and psychology, with the aim of presenting objective, quantitative measures of singing quality. Such work has reported that trained singing voices exhibit the singer's formant [1] and specific characteristics of the fundamental frequency (F0) [2]; in particular, the singer's formant characterizes singing quality as "ringing" [3]. Our interest is directed more towards the singing of ordinary people: understanding how they mutually evaluate its quality, and incorporating such findings into an automatic evaluation scheme.
AIMS

The aim of this study is to explore the criteria that human subjects use in judging singing skill, and to identify whether such judgments are stable and in mutual agreement among different subjects. This serves as a preliminary basis for our goal of developing an automatic singing skill evaluation scheme. Two experiments were carried out. Experiment 1 is intended to verify the stability of human judgment, using a cappella singing sequences (solo singing) as the stimuli. Experiment 2 uses the F0 sequences (F0 singing) extracted from the solo singing, and is intended to identify their contribution to the judgment. In both experiments, the melodies were previously unheard by the subjects.

METHOD AND EXPERIMENTS

The standard method of subjective evaluation, assigning grade scores to each tested stimulus [4], is inappropriate for our case of singing evaluation, where the subtleties of subjects' judgments may be obscured by differences in musical experience. Instead, we used a rank ordering method, in which the subjects were asked to order a group of stimuli according to their preferred rankings. The singing samples are digital recordings (16 bit / 16 kHz / monaural). In order to suppress variance between the samples, all samples were presented at the same volume through headphones.

Interface for Subjective Evaluation

Figure 1 shows the interface screen used in the experiments. The speaker icons represent 10 stimuli (A, B, ..., J); each can be double-clicked to play the sound and moved around by drag-and-drop with the mouse. The subjects are instructed to align the icons horizontally according to their judgment, ranging from poor (left-hand side) to good (right-hand side); the vertical position is not used. The left figure shows an initial setting (random order), and the right figure shows an example result, with H judged as the best. At the end of the experiment, the subjects are also instructed to insert two lines (1 and 2 in the right figure) classifying the samples into "good" (H and I in the example), "poor", and "intermediate".

Figure 1. Example subjective evaluation session using the interface screen.

The Measurement of Rank Correlation

The results are analyzed using Spearman's rank correlation ρ [5], defined as

\rho = 1 - \frac{6 \sum_{i=1}^{N} (a_i - b_i)^2}{N^3 - N}    (1)

where N is the number of stimuli (= 10 in the experiments), and a_i and b_i are the i-th components (rank values) of the rank vectors a and b. The value of ρ ranges from 1 (a = b) to -1 (a and b in reverse order). The correlation of a and b is significant at the 1% level for ρ ≥ 0.7333 and at the 5% level for ρ ≥ 0.5636 [5].
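As an illustration of equation (1), the following Python sketch (not part of the original study) computes ρ for two rank orderings of N = 10 stimuli and compares it against the critical values quoted above; the two rankings are hypothetical.

```python
# Minimal sketch of the rank-correlation analysis in equation (1).
# The two rank vectors below are hypothetical, not data from the study.

def spearman_rho(a, b):
    """Spearman's rank correlation for two rank vectors of equal length N."""
    n = len(a)
    d_squared = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return 1.0 - 6.0 * d_squared / (n ** 3 - n)

# Ranks assigned by two subjects to the same 10 stimuli (1 = poorest, 10 = best).
subject_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
subject_2 = [2, 1, 3, 5, 4, 6, 8, 7, 10, 9]

rho = spearman_rho(subject_1, subject_2)

# Critical values for N = 10 stimuli, as cited from [5].
if rho >= 0.7333:
    level = "significant at the 1% level"
elif rho >= 0.5636:
    level = "significant at the 5% level"
else:
    level = "not significant"
print(f"rho = {rho:.3f} ({level})")
```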
Experiment 1

This experiment uses solo singing as the stimuli. The subjects were presented with four groups of singing, each group consisting of the same melody sung by 10 singers. The task was to order each group using the interface described above. The subjects were free to listen to the melodies as many times as they wanted, and were also asked to give introspective descriptions of their judgments.

Subjects
22 subjects (university students, ages 19 to 29) participated in the experiment. 16 had experience with musical instruments, and 2 had experience with vocal music (popular or chorus). 4 stated that they possess absolute pitch. The subjects were divided into two groups (A and B, each with 11 subjects); within each group, all subjects were presented with the same stimuli.

Stimuli
The stimuli were taken from the RWC Music Database: Popular Music (RWC-MDB-P-2001) [7] and the AIST Humming Database (AIST-HD) [6]. The AIST-HD contains singing voices of 100 subjects, each singing the melodies of two excerpts (from the chorus and verse sections) of 50 songs (100 samples) in the RWC Music Database (Popular Music [7] and Music Genre [8]). Table 1 shows the two stimuli sets A and B. Each set has 4 different melodies, each sung by 10 individuals of the same gender (1 from RWC-MDB-P and 9 from AIST-HD) and presented as a group on the interface screen. The language of the lyrics is either Japanese or English.

Experiment 2

Experiment 2 follows the same procedure as experiment 1, except that the stimuli are replaced with F0 singing extracted from the solo singing used in experiment 1 (see below). The subjects were further instructed to ignore any noise caused by the F0 extraction process.

Subjects
20 subjects (university students, ages 19 to 35) participated in the experiment; none of them had participated in experiment 1. 17 had experience with musical instruments, and 6 had experience with vocal music (popular or chorus). 6 stated that they possess absolute pitch. The subjects were divided into two groups (A and B, each with 10 subjects); within each group, all subjects were presented with the same stimuli.

Stimuli
The stimuli used in this experiment are F0 sequences extracted from the samples used in experiment 1, removing all other vocal features. F0 is estimated every 10 ms using the method of Goto et al. [9] and is resynthesized as a sinusoidal wave whose amplitude preserves the power of the most predominant harmonic structure of the original. The resulting F0 sequence gives a natural impression comparable to the original.
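The estimation method of [9] is outside the scope of this paper, but the resynthesis step can be sketched as follows. This Python example is an illustration, not the authors' implementation: it assumes an F0 contour and a matching power contour (one value per 10 ms frame) have already been estimated, and renders them as a single sinusoid whose frequency follows F0 and whose amplitude follows the extracted power.

```python
# Illustrative sketch of resynthesizing an F0 contour as a sinusoidal wave.
# f0_hz and power are assumed to come from a predominant-F0 estimator;
# they are NOT computed here, and unvoiced frames are not handled.
import numpy as np

def resynthesize_f0(f0_hz, power, frame_period=0.010, fs=16000):
    """Render a sinusoid whose frequency tracks f0_hz (one value per frame)
    and whose amplitude tracks the square root of the frame power."""
    n_samples = int(len(f0_hz) * frame_period * fs)
    t = np.arange(n_samples) / fs
    frame_times = np.arange(len(f0_hz)) * frame_period

    # Interpolate the frame-rate F0 and amplitude contours to sample rate.
    f0_samples = np.interp(t, frame_times, f0_hz)
    amp_samples = np.interp(t, frame_times, np.sqrt(power))

    # Integrate the instantaneous frequency to obtain the sinusoid's phase.
    phase = 2.0 * np.pi * np.cumsum(f0_samples) / fs
    return amp_samples * np.sin(phase)

# Hypothetical 2-second contour: a steady 220 Hz tone with a slight 6 Hz vibrato.
frames = np.arange(200)
f0 = 220.0 + 3.0 * np.sin(2 * np.pi * 6.0 * frames * 0.010)
pw = np.ones_like(f0)
wave = resynthesize_f0(f0, pw)   # 16 kHz samples, matching the recordings used here
```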

Table 1. 80 stimuli (by 40 singers).

set | music No. | excerpted section | language | gender | number of singers
A   | 27        | verse             | Japanese | male   | 10
A   | 28        | verse             | Japanese | female | 10
A   | 90        | verse             | English  | male   | 10
A   | 97        | chorus            | English  | female | 10
B   | 27        | chorus            | Japanese | male   | 10
B   | 28        | chorus            | Japanese | female | 10
B   | 90        | chorus            | English  | male   | 10
B   | 97        | verse             | English  | female | 10

Note: music numbers refer to RWC-MDB-P-2001.

RESULTS

The rank correlation (1) was calculated for all pairs of subject rankings, giving the ρ-matrix shown schematically in Figure 2. In the top figure, region I corresponds to pairings of rankings within experiment 1 (55 = 11 × (11 - 1) / 2 pairs per melody), region II to pairings within experiment 2 (45 = 10 × (10 - 1) / 2 pairs per melody), and region III to cross-pairings between rankings from experiments 1 and 2 (110 = 11 × 10 pairs per melody). The bottom figure shows an example gradation display of the ρ-matrix for the English, female singer group in region I; the gradation is darker for higher ρ values.

Figure 2. Graphical scheme of the ρ-matrix (top) and an example gradation display for the English female case (bottom).

Tables 2 and 3 show the results for region I (ρ values for solo singing): Table 2 shows the percentage of significant pairs, and Table 3 the statistics of ρ for each group. The results show the stability of subject judgments for solo singing. Each singing sample was further labeled as good, poor, or otherwise (to be used for developing the automatic evaluation scheme), using the following criteria:

- good: many subjects evaluated the sample as good and no subject evaluated it as poor;
- poor: many subjects evaluated the sample as poor and no subject evaluated it as good;
- otherwise: neither of the above.

Table 4 shows the results of this labeling.
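The labeling rule can be written compactly as below. Since the paper does not quantify "many subjects", the majority threshold used here is a hypothetical choice for illustration only.

```python
# Sketch of the good/poor/otherwise labeling rule applied to one singing sample.
# votes holds each subject's classification of that sample: "good", "poor",
# or "intermediate". The majority threshold is an assumption (the paper only
# says "many subjects").

def label_sample(votes, many_fraction=0.5):
    n_good = votes.count("good")
    n_poor = votes.count("poor")
    many = many_fraction * len(votes)
    if n_good > many and n_poor == 0:
        return "good"
    if n_poor > many and n_good == 0:
        return "poor"
    return "otherwise"

# Example: 11 subjects in a group; 7 rate the sample good and none rate it poor.
print(label_sample(["good"] * 7 + ["intermediate"] * 4))   # -> "good"
```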

Table 2. Percentage of significant pairs (solo singing).

set | language, gender  | p < .01     | p < .05
A   | Japanese, male    | 96.4% (53)  | 100.0% (55)
A   | Japanese, female  | 74.6% (41)  | 90.9% (50)
A   | English, male     | 61.8% (34)  | 89.1% (49)
A   | English, female   | 41.8% (23)  | 80.0% (44)
A   | overall (220)     | 68.6% (151) | 90.0% (198)
B   | Japanese, male    | 45.5% (25)  | 72.7% (40)
B   | Japanese, female  | 72.7% (40)  | 98.2% (54)
B   | English, male     | 52.7% (29)  | 89.1% (49)
B   | English, female   | 74.6% (41)  | 90.9% (50)
B   | overall (220)     | 61.4% (135) | 87.8% (193)
    | overall (440)     | 65.0% (260) | 88.9% (391)

Table 3. Statistics of ρ (solo singing).

set | language, gender  | mean (SD)   | min / max
A   | Japanese, male    | 0.87 (0.07) | 0.71 / 0.99
A   | Japanese, female  | 0.77 (0.14) | 0.38 / 0.95
A   | English, male     | 0.75 (0.14) | 0.28 / 0.96
A   | English, female   | 0.69 (0.14) | 0.42 / 0.98
B   | Japanese, male    | 0.64 (0.22) | 0.03 / 0.98
B   | Japanese, female  | 0.81 (0.13) | 0.39 / 0.99
B   | English, male     | 0.73 (0.14) | 0.36 / 0.98
B   | English, female   | 0.76 (0.14) | 0.36 / 0.96

Table 4. Results of labeling (good/poor).

set | language, gender  | good | poor | otherwise
A   | Japanese, male    | 3/10 | 2/10 | 5/10
A   | Japanese, female  | 3/10 | 3/10 | 4/10
A   | English, male     | 4/10 | 2/10 | 4/10
A   | English, female   | 3/10 | 2/10 | 5/10
B   | Japanese, male    | 1/10 | 3/10 | 7/10
B   | Japanese, female  | 3/10 | 3/10 | 4/10
B   | English, male     | 2/10 | 2/10 | 6/10
B   | English, female   | 3/10 | 4/10 | 3/10

Table 5. Percentage of significant pairs (F0 singing).

set | language, gender  | p < .01    | p < .05
A   | Japanese, male    | 44.4% (20) | 77.8% (35)
A   | Japanese, female  | 55.6% (25) | 71.1% (32)
A   | English, male     | 15.6% (7)  | 37.8% (17)
A   | English, female   | 17.8% (8)  | 35.6% (16)
A   | overall (180)     | 33.3% (60) | 55.6% (100)
B   | Japanese, male    | 2.2% (1)   | 13.3% (6)
B   | Japanese, female  | 22.2% (10) | 44.4% (20)
B   | English, male     | 15.6% (7)  | 46.7% (21)
B   | English, female   | 40.0% (18) | 62.2% (28)
B   | overall (180)     | 20.0% (36) | 41.7% (75)
    | overall (360)     | 26.7% (96) | 48.6% (175)

Table 6. Statistics of ρ (F0 singing).

set | language, gender  | mean (SD)   | min / max
A   | Japanese, male    | 0.68 (0.17) | 0.22 / 0.94
A   | Japanese, female  | 0.69 (0.18) | 0.26 / 0.94
A   | English, male     | 0.44 (0.27) | -0.24 / 0.89
A   | English, female   | 0.32 (0.39) | -0.87 / 0.88
B   | Japanese, male    | 0.27 (0.27) | -0.33 / 0.79
B   | Japanese, female  | 0.52 (0.23) | -0.03 / 0.89
B   | English, male     | 0.45 (0.29) | -0.21 / 0.87
B   | English, female   | 0.64 (0.16) | 0.26 / 0.94

Table 7. Percentage of significant pairs (solo-F0 singing).

set | language, gender  | p < .01     | p < .05
A   | Japanese, male    | 54.5% (60)  | 82.7% (91)
A   | Japanese, female  | 40.0% (44)  | 78.2% (86)
A   | English, male     | 25.5% (28)  | 56.4% (62)
A   | English, female   | 20.9% (23)  | 55.5% (61)
A   | overall (440)     | 35.2% (155) | 68.2% (300)
B   | Japanese, male    | 12.7% (14)  | 27.3% (30)
B   | Japanese, female  | 29.1% (32)  | 60.0% (66)
B   | English, male     | 21.8% (24)  | 44.5% (49)
B   | English, female   | 41.8% (46)  | 78.2% (86)
B   | overall (440)     | 26.4% (116) | 52.5% (231)
    | overall (880)     | 30.1% (271) | 60.3% (531)

Table 8. Statistics of ρ (solo-F0 singing).

set | language, gender  | mean (SD)   | min / max
A   | Japanese, male    | 0.72 (0.17) | 0.24 / 0.98
A   | Japanese, female  | 0.66 (0.18) | 0.13 / 0.92
A   | English, male     | 0.56 (0.24) | -0.21 / 0.99
A   | English, female   | 0.48 (0.32) | -0.44 / 0.90
B   | Japanese, male    | 0.42 (0.26) | -0.27 / 0.90
B   | Japanese, female  | 0.61 (0.19) | 0.08 / 0.92
B   | English, male     | 0.50 (0.24) | -0.10 / 0.98
B   | English, female   | 0.66 (0.17) | 0.14 / 0.95
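To make the "percentage of significant pairs" statistic in Tables 2, 5 and 7 concrete, the following sketch computes ρ for every pair of subject rankings of one melody and reports the share of pairs that clear the critical values for N = 10. The rankings here are random placeholders, not the experimental data, and the ρ function from the earlier sketch is repeated to keep the example self-contained.

```python
# Sketch of the pairwise analysis behind Tables 2, 5 and 7.
import random
from itertools import combinations

def spearman_rho(a, b):
    n = len(a)
    d_squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 - 6.0 * d_squared / (n ** 3 - n)

random.seed(0)
n_subjects, n_stimuli = 11, 10
# Placeholder data: one random permutation of ranks 1..10 per subject.
rankings = [random.sample(range(1, n_stimuli + 1), n_stimuli) for _ in range(n_subjects)]

pairs = list(combinations(range(n_subjects), 2))          # 11 * 10 / 2 = 55 pairs
rhos = [spearman_rho(rankings[i], rankings[j]) for i, j in pairs]

share_01 = sum(r >= 0.7333 for r in rhos) / len(rhos)     # significant at the 1% level
share_05 = sum(r >= 0.5636 for r in rhos) / len(rhos)     # significant at the 5% level
print(f"{len(pairs)} pairs: {100 * share_01:.1f}% at p<.01, {100 * share_05:.1f}% at p<.05")
```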

Tables 5 and 6 correspond to Tables 2 and 3 for region II (F0 singing), and Tables 7 and 8 for region III (solo-F0 singing cross-correlation). The results for region II show the stability of subject judgments for F0 singing, while the results for region III show the correlation between judgments for solo and F0 singing, indicating the contribution of the F0 factor. Figure 3 shows a bar graph summarizing the results of Tables 2, 5, and 7.

Figure 3. Percentage of significant pairs.

The criteria that human subjects use in judging singing skill can also be examined through the introspective comments. Example features mentioned in the comments for experiment 1 include:

- tonal stability
- rhythmical stability
- pronunciation quality
- singing technique (e.g. vibrato, keeping a stable F0)
- vocal expression and quality
- good/poor can be judged from a short sequence (3-5 seconds)
- personal preference

Likewise for experiment 2:

- tonal stability
- rhythmical stability
- singing technique (e.g. vibrato, keeping a stable F0)
- vocal expression and quality

DISCUSSION

The results for region I show that 391 pairs (88.9%) of subject rankings were significant at the 5% level, and 260 pairs (65.0%) at the 1% level. This suggests that the rankings are generally stable and in mutual agreement, meaning that they are based more on common, objective features, contrary to the comments stating that evaluation is a matter of personal preference. The ρ values in Tables 3, 6, and 8 all have positive (and in many cases high) mean values, also indicating that the general tendency of the rankings is stable. Furthermore, in the good/poor classification, none of the samples was completely divided between good and poor ratings. As such, the results of the labeling (good/poor) can be taken as a sufficiently reliable basis for developing an automatic evaluation scheme. This is further supported by the fact that many comments refer to objective (or at least objectively framed) features such as tonal stability as judgment criteria, and that only a short sequence (3-5 seconds) is sufficient for judging good/poor. These points give practical support for the realizability of such a scheme.

The results for region II show that the subjects' rankings of F0 singing are stable in some cases (e.g. the Japanese male and Japanese female groups in set A, and the English female group in set B) but not in others. High correlation rates are obtained when the melodies consist of relatively long notes, which require higher singing skill. Together with the relatively low overall values for region III, this indicates that F0 alone is not decisive for judging singing skill, and that other acoustic and musical features contribute to the high correlation rates obtained for region I. One interesting point is that some comments for experiment 2 mentioned vocal expression or quality, indicating that such features can (at least in a subjective sense) be recognized even from the F0 information alone.

CONCLUSION

The results show that, under the control of language, singers' gender, and melody type (verse/chorus), the rankings given by the subjects are generally stable, indicating that they depend more on common, objective features than on subjective preference. This makes the results reliable enough to be used as a reference for developing automatic singing evaluation schemes. Further experiments will be conducted in various other settings to explore singing skills in more detail. Work on identifying the key acoustic properties that underlie human judgments is also in progress.

ACKNOWLEDGMENTS

This paper utilized the RWC Music Database (Popular Music) and the AIST Humming Database.

REFERENCES

[1] Sundberg, J. (1987). The Science of the Singing Voice. Illinois: Northern Illinois University Press.

[2] Saitou, T., Unoki, M. & Akagi, M. (2005). Development of an F0 Control Model Based on F0 Dynamic Characteristics for Singing-voice Synthesis. Speech Communication, 46, 405-417.

[3] Omori, K., Kacker, A., Carroll, L. M., Riley, W. D. & Blaugrund, S. M. (1996). Singing Power Ratio: Quantitative Evaluation of Singing Voice Quality. Journal of Voice, 10 (3), 228-235.

[4] Franco, H., Neumeyer, L., Digalakis, V. & Ronen, O. (2000). Combination of Machine Scores for Automatic Grading of Pronunciation Quality. Speech Communication, 30, 121-130.

[5] Kendall, M. & Gibbons, J. D. (1990). Rank Correlation Methods. New York: Oxford University Press.

[6] Goto, M. & Nishimura, T. (2005). AIST Humming Database: Music Database for Singing Research. The Special Interest Group Notes of IPSJ (MUS), 2005 (82), 7-12. (in Japanese)

[7] Goto, M., Hashiguchi, H., Nishimura, T. & Oka, R. (2002). RWC Music Database: Popular, Classical, and Jazz Music Databases. In Proceedings of the 3rd International Conference on Music Information Retrieval (ISMIR 2002), 287-288.

[8] Goto, M., Hashiguchi, H., Nishimura, T. & Oka, R. (2003). RWC Music Database: Music Genre Database and Musical Instrument Sound Database. In Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR 2003), 229-230.

[9] Goto, M., Itou, K. & Hayamizu, S. (1999). Real-time Filled Pause Detection System for Spontaneous Speech Recognition. In Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech '99), 227-230.