AUTOMATIC TIMBRE CLASSIFICATION OF ETHNOMUSICOLOGICAL AUDIO RECORDINGS


Dominique Fourer, Jean-Luc Rouas, Pierre Hanna, Matthias Robine
LaBRI - CNRS UMR 5800 - University of Bordeaux
{fourer, rouas, hanna, robine}@labri.fr

ABSTRACT

Automatic timbre characterization of audio signals can help to measure similarities between sounds and is of interest for automatic or semi-automatic database indexing. The most effective methods use machine learning approaches, which require qualitative and diversified training databases to obtain accurate results. In this paper, we introduce a diversified database composed of worldwide non-western instrument audio recordings, on which an effective timbre classification method is evaluated. A comparative evaluation based on the well-studied Iowa musical instrument database shows results comparable with those of state-of-the-art methods. Thus, the proposed method offers a practical solution for automatic ethnomusicological indexing of a database composed of diversified sounds of varying quality. The relevance of audio features for timbre characterization is also discussed in the context of non-western instrument analysis.

1. INTRODUCTION

Characterizing musical timbre perception remains a challenging task related to the human auditory mechanism and to the physics of musical instruments [4]. This task is of interest for many applications such as automatic database indexing, measuring similarities between sounds, or automatic sound recognition. Existing psychoacoustical studies model timbre as a multidimensional phenomenon independent of musical parameters (e.g. pitch, duration or loudness) [7, 8]. A quantitative interpretation of instrument timbre based on acoustic features computed from audio signals was first proposed in [9] and pursued in more recent studies [12] which aim at organizing audio timbre descriptors efficiently.
Nowadays, effective automatic timbre classification methods [13] use supervised statistical learning approaches based on audio signal features computed from the analyzed data. Thus, the performance obtained with such systems depends on the taxonomy, the size and the diversity of the training databases. However, most existing research databases (e.g. RWC [6], Iowa [5]) are only composed of common western instruments annotated with specific taxonomies. In this work, we revisit the automatic instrument classification problem from an ethnomusicological point of view by introducing a diversified and manually annotated research database provided by the Centre de Recherche en Ethno-Musicologie (CREM). This database is supplied daily by researchers and has the particularity of being composed of uncommon non-western musical instrument recordings from around the world. This work is motivated by practical applications to automatic indexing of online audio recording databases, which have to be computationally efficient while providing accurate results. Thus, we aim at validating the efficiency and the robustness of the statistical learning approach using a constrained standard taxonomy, applied to recordings of various quality. In this study, we expect to show the influence of the database, the relevance of timbre audio features and the choice of taxonomy for the automatic instrument classification process. A result comparison and a cross-database evaluation are performed using the well-studied University of Iowa musical instrument database. This paper is organized as follows. The CREM database is introduced in Section 2.

(c) Dominique Fourer, Jean-Luc Rouas, Pierre Hanna, Matthias Robine. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Dominique Fourer, Jean-Luc Rouas, Pierre Hanna, Matthias Robine. "Automatic timbre classification of ethnomusicological audio recordings", 15th International Society for Music Information Retrieval Conference, 2014.
The timbre quantization principle, based on mathematical functions describing audio features, is presented in Section 3. An efficient timbre classification method is described in Section 4. Experiments and results based on the proposed method are detailed in Section 5. Conclusions and future work are finally discussed in Section 6.

2. THE CREM ETHNOMUSICOLOGICAL DATABASE

The CREM research database 1 is composed of diversified sound samples directly recorded by ethnomusicologists in various conditions (i.e. no recording studio) and in diverse places all around the world. It contains more than 7,000 hours of audio data recorded from 1932 to the present using different media such as magnetic tapes or vinyl discs. The vintage audio recordings of the database were carefully digitized to preserve the authenticity of the originals and contain various environmental noises. The more recent audio recordings can be recorded directly in digital form with high quality. Most of the musical instruments which compose this database are non-western and can be uncommon, while covering a large range of musical instrument families (see Figure 1(a)). Among uncommon instruments, one can find the lute or the Ngbaka harp as cordophones. More uncommon instruments like the oscillating bamboo, the struck machete and the struck girder were classified by ethnomusicologists as idiophones. In this paper, we restricted our study to the solo excerpts (where only one monophonic or polyphonic instrument is active) to reduce the interference problems which may occur during audio analysis. A description of the selected CREM sub-database is presented in Table 1. According to this table, one can observe that this database is quite inhomogeneous: the aerophones are overrepresented while the membranophones are underrepresented. Due to its diversity and the varying quality of its sounds, the automatic ethnomusicological classification of this database may appear challenging.

1 CREM audio archives freely available online at: http://archives.crem-cnrs.fr/

Class name               Duration (s)      #
aerophones-blowed            1,383       146
cordophones-struck             357        37
cordophones-plucked            715        75
cordophones-bowed              157        16
  (all cordophones)          1,229       128
idiophones-struck              522        58
idiophones-plucked             137        14
idiophones-clinked              94        10
  (all idiophones)             753        82
membranophones-struck          170        19
Total                        3,535       375

Table 1. Content of the CREM sub-database with duration and number of 10-second segmented excerpts.

3. TIMBRE QUANTIZATION AND CLASSIFICATION

3.1 Timbre quantization

Since the preliminary work on the timbre description of perceived sounds, Peeters et al. have proposed in [12] a large set of audio feature descriptors which can be computed from audio signals. The audio descriptors define numerical functions which aim at providing cues about specific acoustic features (e.g. brightness is often associated with the spectral centroid according to [14]). Thus, the audio descriptors can be organized as follows:

Temporal descriptors convey information about the time evolution of a signal (e.g. log attack time, temporal increase, zero-crossing rate, etc.).
Harmonic descriptors are computed from the detected pitch events associated with a fundamental frequency (F0). One can use a prior waveform model of quasi-harmonic sounds, which have an equally spaced Dirac comb shape in the magnitude spectrum. The tonal part of sounds can thus be isolated from the signal mixture and described (e.g. noisiness, inharmonicity, etc.).

Spectral descriptors are computed from a signal time-frequency representation (e.g. Short-Term Fourier Transform) without a prior waveform model (e.g. spectral centroid, spectral decrease, etc.).

Perceptual descriptors are computed from auditory-filtered bandwidth versions of signals which aim at approximating the human perception of sounds. This can be efficiently computed using the Equivalent Rectangular Bandwidth (ERB) scale [10], which can be combined with a gammatone filter bank [3] (e.g. loudness, ERB spectral centroid, etc.).

In this study, we focus on the sound descriptors listed in Table 2, which can be estimated using the timbre toolbox 2 and are detailed in [12]. All descriptors are computed for each analyzed sound excerpt and may return null values. The harmonic descriptors of polyphonic sounds are computed using the most prominent detected F0 candidate (single-F0 estimation). To normalize the duration of the analyzed sounds, we separate each excerpt into 10-second segments without distinction of silence or pitch events. Each segment is then represented by a real vector in which the time series of each descriptor is summarized by a statistic. The median and the Inter-Quartile Range (IQR) statistics were chosen for their robustness to outliers.

Acronym               Descriptor name                                            #
Att                   Attack duration (see ADSR model [15])                      1
AttSlp                Attack slope (ADSR)                                        1
Dec                   Decay duration (ADSR)                                      1
DecSlp                Decay slope (ADSR)                                         1
Rel                   Release duration (ADSR)                                    1
LAT                   Log Attack Time                                            1
Tcent                 Temporal centroid                                          1
Edur                  Effective duration                                         1
FreqMod, AmpMod       Total energy modulation (frequency, amplitude)             2
RMSenv                RMS envelope                                               2
ACor                  Signal auto-correlation function (12 first coef.)         24
ZCR                   Zero-crossing rate                                         2
HCent                 Harmonic spectral centroid                                 2
HSpr                  Harmonic spectral spread                                   2
HSkew                 Harmonic skewness                                          2
HKurt                 Harmonic kurtosis                                          2
HSlp                  Harmonic slope                                             2
HDec                  Harmonic decrease                                          2
HRoff                 Harmonic rolloff                                           2
HVar                  Harmonic variation                                         2
HErg, HNErg, HFErg    Harmonic energy, noise energy and frame energy             6
HNois                 Noisiness                                                  2
HF0                   Fundamental frequency F0                                   2
HinH                  Inharmonicity                                              2
HTris                 Harmonic tristimulus                                       6
HodevR                Harmonic odd-to-even partials ratio                        2
Hdev                  Harmonic deviation                                         2
SCent, ECent          Spectral centroid of the magnitude and energy spectrum     4
SSpr, ESpr            Spectral spread of the magnitude and energy spectrum       4
SSkew, ESkew          Spectral skewness of the magnitude and energy spectrum     4
SKurt, EKurt          Spectral kurtosis of the magnitude and energy spectrum     4
SSlp, ESlp            Spectral slope of the magnitude and energy spectrum        4
SDec, EDec            Spectral decrease of the magnitude and energy spectrum     4
SRoff, ERoff          Spectral rolloff of the magnitude and energy spectrum      4
SVar, EVar            Spectral variation of the magnitude and energy spectrum    4
SFErg, EFErg          Spectral frame energy of the magnitude and energy spectrum 4
Sflat, ESflat         Spectral flatness of the magnitude and energy spectrum     4
Scre, EScre           Spectral crest of the magnitude and energy spectrum        4
ErbCent, ErbGCent     ERB-scale magnitude spectrogram / gammatone centroid       4
ErbSpr, ErbGSpr       ERB-scale magnitude spectrogram / gammatone spread         4
ErbSkew, ErbGSkew     ERB-scale magnitude spectrogram / gammatone skewness       4
ErbKurt, ErbGKurt     ERB-scale magnitude spectrogram / gammatone kurtosis       4
ErbSlp, ErbGSlp       ERB-scale magnitude spectrogram / gammatone slope          4
ErbDec, ErbGDec       ERB-scale magnitude spectrogram / gammatone decrease       4
ErbRoff, ErbGRoff     ERB-scale magnitude spectrogram / gammatone rolloff        4
ErbVar, ErbGVar       ERB-scale magnitude spectrogram / gammatone variation      4
ErbFErg, ErbGFErg     ERB-scale magnitude spectrogram / gammatone frame energy   4
ErbSflat, ErbGSflat   ERB-scale magnitude spectrogram / gammatone flatness       4
ErbScre, ErbGScre     ERB-scale magnitude spectrogram / gammatone crest          4
Total                                                                          164

Table 2. Acronym, name and number of the used timbre descriptors.

2 MATLAB code available at http://www.cirmmt.org/research/tools
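As an illustration of the summarization scheme of Section 3.1, the sketch below computes one descriptor (the spectral centroid) frame by frame over a 10-second segment and condenses its time series into the median and IQR statistics. This is a minimal NumPy sketch, not the timbre toolbox implementation; the frame size, hop size and the synthetic test tone are arbitrary assumptions.

```python
import numpy as np

def spectral_centroid_track(x, sr, frame=2048, hop=512):
    """Frame-wise spectral centroid (Hz) of a mono signal x."""
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    window = np.hanning(frame)
    centroids = []
    for start in range(0, len(x) - frame + 1, hop):
        mag = np.abs(np.fft.rfft(x[start:start + frame] * window))
        if mag.sum() > 0:
            centroids.append(float((freqs * mag).sum() / mag.sum()))
    return np.array(centroids)

def summarize(track):
    """Median and inter-quartile range, as used for each descriptor time series."""
    q1, med, q3 = np.percentile(track, [25, 50, 75])
    return med, q3 - q1

sr = 16000
t = np.arange(10 * sr) / sr                 # one 10-second segment
x = np.sin(2 * np.pi * 440 * t)             # toy "sound": a 440 Hz tone
med, iqr = summarize(spectral_centroid_track(x, sr))
```

For a stationary pure tone the median centroid sits near the tone frequency and the IQR is close to zero; real excerpts give informative spreads, which is why the pair (median, IQR) is kept per descriptor.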

[Figure 1: two taxonomy trees. (a) T1: instrument divided into aerophones (blowed), cordophones (plucked, struck, bowed), idiophones (plucked, struck, clinked) and membranophones (struck). (b) T2: instrument divided into sustained (flute/reeds: flute, clarinet, oboe, saxophone, bassoon; brass: trumpet, trombone, tuba; bowed strings: violin, viola, cello, doublebass) and struck/plucked (piano; pizzicato strings: violin, viola, cello, doublebass).]

Figure 1. Taxonomies used for the automatic classification of musical instruments, as proposed by Hornbostel and Sachs in [16] (a) and by Peeters in [13] (b).

3.2 Classification taxonomy

In this study, we use two databases which can be annotated using different taxonomies. Due to its diversity, the CREM database was only annotated using the Hornbostel and Sachs taxonomy [16] (T1), illustrated in Figure 1(a), which is widely used in ethnomusicology. This hierarchical taxonomy is general enough to classify uncommon instruments (e.g. struck bamboo) and conveys information about sound production materials and playing styles.

On the other hand, the Iowa musical instrument database [5] used in our experiments was initially annotated using a musician's instrument taxonomy (T2), as proposed in [13] and illustrated in Figure 1(b). This database is composed of common western pitched instruments which can easily be annotated using T1, as described in Table 3. One can notice that the Iowa database is only composed of aerophone and cordophone instruments. If we consider the playing style, only 4 classes are represented when the T1 taxonomy is applied to the Iowa database.

T1 class name    T2 equivalence            Duration (s)      #
aero-blowed      reed/flute and brass          5,951       668
cordo-struck     struck strings                5,564       646
cordo-plucked    plucked strings               5,229       583
cordo-bowed      bowed strings                 7,853       838
Total                                         24,597     2,735

Table 3. Content of the Iowa database using the musician's instrument taxonomy (T2) and its equivalence with the Hornbostel and Sachs taxonomy (T1).

4. AUTOMATIC INSTRUMENT TIMBRE CLASSIFICATION METHOD

The described method aims at estimating the corresponding taxonomy class name of a given input sound.

4.1 Method overview

Here, each sound segment (cf. Section 3.1) is represented by a vector of length p = 164 where each value corresponds to a descriptor (see Table 2). The training step of this method (illustrated in Figure 2) aims at modeling each timbre class using the best projection space for classification. A feature selection algorithm is first applied to efficiently reduce the number of descriptors and avoid statistical overlearning. The classification space is computed using discriminant analysis, which consists in estimating optimal weights over the descriptors allowing the best discrimination between timbre classes. The classification task then consists in projecting an input sound into the best classification space and selecting the most probable timbre class using the learned model.

[Figure 2: training pipeline. Annotated input sound, then features computation, then features selection (LDA, MI, IRMFSP), then classification space computation (LDA), then class modeling and class affectation.]

Figure 2. Training step of the proposed method.

4.2 Linear discriminant analysis

The goal of Linear Discriminant Analysis (LDA) [1] is to find the best projection, or linear combination of all descriptors, which maximizes the average distance between classes (inter-class distance) while minimizing the distance between individuals of the same class (intra-class distance). This method assumes that the class affectation of each individual is known a priori. Its principle can be described as follows. First, consider the n x p real matrix M where each row is a vector of descriptors associated with a sound (individual). We assume that each individual is a member of a unique class k \in [1, K].
Now we define W as the intra-class variance-covariance matrix, which can be estimated by:

W = \frac{1}{n} \sum_{k=1}^{K} n_k W_k ,    (1)

where W_k is the variance-covariance matrix computed from the n_k x p sub-matrix of M composed of the n_k individuals included in class k. We also define B, the inter-class variance-covariance matrix, expressed as follows:

B = \frac{1}{n} \sum_{k=1}^{K} n_k (\mu_k - \mu)(\mu_k - \mu)^T ,    (2)

where \mu_k corresponds to the mean vector of class k and \mu is the mean vector of the entire dataset. According to [1], it can be shown that the eigenvectors of the matrix D = (B + W)^{-1} B solve this optimization problem. When the matrix A = (B + W) is not invertible, a computational solution consists in using the pseudoinverse of A, which can be calculated as A^T (A A^T)^{-1}.

4.3 Feature selection algorithms

Feature selection aims at computing the optimal relevance of each descriptor, which can be measured with a weight or a rank. The resulting descriptor subset has to be as discriminant as possible with minimal redundancy. In this study, we investigate the three approaches described below.

4.3.1 LDA feature selection

The LDA method detailed in Section 4.2 can also be used for selecting the most relevant features. Indeed, the computed eigenvectors, which correspond to linear combinations of descriptors, convey a relative weight applied to each descriptor. Thus, the significance (or weight) S_d of a descriptor d can be computed using a summation over a defined range [1, R] of the eigenvectors of matrix D as follows:

S_d = \sum_{r=1}^{R} |v_{r,d}| ,    (3)

where v_{r,d} is the d-th coefficient of the r-th eigenvector, the eigenvectors being sorted by descending order of eigenvalue (i.e. r = 1 corresponds to the maximal eigenvalue of matrix D). In our implementation, we fixed R = 8.

4.3.2 Mutual information

Feature selection algorithms aim at computing a subset of descriptors that conveys the maximal amount of information to model the classes. From a statistical point of view, consider classes and feature descriptors as realizations of random variables C and F. The relevance can then be measured with the mutual information, defined by:

I(C, F) = \sum_c \sum_f P(c, f) \log \frac{P(c, f)}{P(c) P(f)} ,    (4)

where P(c) denotes the probability of C = c, which can be estimated from the approximated probability density functions (pdf) using a computed histogram.
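The histogram-based estimate of Eq. (4) can be sketched as follows, using the factorization P(c, f) = P(f|c) P(c). This is an illustrative NumPy sketch, not the authors' code; the function name, the bin count and the synthetic two-class data are assumptions made for the example.

```python
import numpy as np

def mutual_information(feature, labels, bins=8):
    """Histogram estimate of I(C, F) between one descriptor and class labels."""
    labels = np.asarray(labels)
    edges = np.histogram_bin_edges(feature, bins=bins)
    pf, _ = np.histogram(feature, bins=edges)
    pf = pf / pf.sum()                          # P(f)
    mi = 0.0
    for c in np.unique(labels):
        pc = np.mean(labels == c)               # P(c)
        pfc, _ = np.histogram(feature[labels == c], bins=edges)
        pfc = pfc / pfc.sum()                   # P(f | c)
        joint = pfc * pc                        # P(c, f) = P(f | c) P(c)
        nz = (joint > 0) & (pf > 0)
        mi += np.sum(joint[nz] * np.log2(joint[nz] / (pc * pf[nz])))
    return mi

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 200)
# A descriptor that separates the classes vs. one that is pure noise.
discriminative = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
noise = rng.normal(0, 1, 400)
```

A descriptor whose distribution differs strongly across classes scores close to the class entropy (here about 1 bit), while an uninformative one scores near zero, which is the ranking criterion exploited by this selector.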
According to Bayes' theorem, one can compute P(c, f) = P(f|c) P(c), where P(f|c) is the pdf of the feature descriptor value f within class c. This method can be improved following [2] by simultaneously reducing the redundancy, through the mutual information between previously selected descriptors.

4.3.3 Inertia Ratio Maximization using Feature Space Projection (IRMFSP)

This algorithm was first proposed in [11] to reduce the number of descriptors used by timbre classification methods. It consists in maximizing the relevance of the descriptor subset for the classification task while minimizing the redundancy between the selected descriptors. This iterative method (\iota \leq p) is composed of two steps. The first one selects, at iteration \iota, the not-previously-selected descriptor which maximizes the ratio between the inter-class inertia and the total inertia, expressed as follows:

\hat{d}^{(\iota)} = \arg\max_d \frac{\sum_{k=1}^{K} n_k (\mu_{d,k} - \mu_d)(\mu_{d,k} - \mu_d)^T}{\sum_{i=1}^{n} (f_{d,i}^{(\iota)} - \mu_d)(f_{d,i}^{(\iota)} - \mu_d)^T} ,    (5)

where f_{d,i}^{(\iota)} denotes the value of descriptor d \in [1, p] for individual i, and \mu_{d,k} and \mu_d respectively denote the average value of descriptor d within class k and over the total dataset. The second step of this algorithm orthogonalizes the remaining data for the next iteration as follows:

f_d^{(\iota+1)} = f_d^{(\iota)} - \left( f_d^{(\iota)} \cdot g_{\hat{d}} \right) g_{\hat{d}} ,    (6)

where f_d^{(\iota)} is the vector of descriptor d values over all the individuals of the entire dataset and g_{\hat{d}} = f_{\hat{d}}^{(\iota)} / \| f_{\hat{d}}^{(\iota)} \| is the normalized form of the previously selected descriptor \hat{d}^{(\iota)}.

4.4 Class modeling and automatic classification

Each instrument class is modeled in the projected classification space resulting from the application of LDA. Thus, each class can be represented by its gravity center \hat{\mu}_k, which corresponds to the vector of the average values of the projected individuals composing class k.
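The LDA projection of Section 4.2 and the class modeling above combine into a short pipeline: estimate W and B, take the leading eigenvectors of pinv(B + W) B as the projection, represent each class by its projected gravity center, and classify a new sound by the nearest center. The sketch below is a minimal NumPy rendering under these definitions, not the evaluated implementation; the helper names and the toy two-class data are assumptions.

```python
import numpy as np

def lda_projection(X, y):
    """Eigenvectors of pinv(B + W) @ B spanning the LDA classification space."""
    classes = np.unique(y)
    n, p = X.shape
    mu = X.mean(axis=0)
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for k in classes:
        Xk = X[y == k]
        nk = len(Xk)
        muk = Xk.mean(axis=0)
        W += nk * np.cov(Xk, rowvar=False, bias=True)   # n_k * W_k, Eq. (1)
        B += nk * np.outer(muk - mu, muk - mu)          # Eq. (2)
    W /= n
    B /= n
    eigval, eigvec = np.linalg.eig(np.linalg.pinv(B + W) @ B)
    order = np.argsort(eigval.real)[::-1]
    return eigvec.real[:, order[:len(classes) - 1]]

def nearest_centroid(Xtrain, ytrain, V, x):
    """Affect the class whose projected gravity center is closest to x @ V."""
    z = x @ V
    centers = {k: (Xtrain[ytrain == k] @ V).mean(axis=0) for k in np.unique(ytrain)}
    return min(centers, key=lambda k: np.linalg.norm(centers[k] - z))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])
y = np.repeat([0, 1], 50)
V = lda_projection(X, y)
```

The pseudoinverse handles the case where B + W is singular, as noted in the text; with K classes the projection keeps at most K - 1 discriminant directions.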
The classification decision, which affects a class \hat{k} to an input sound represented by a projected vector \hat{x}, is simply performed by minimizing the Euclidean distance to the gravity center of each class as follows:

\hat{k} = \arg\min_{k \in [1, K]} \| \hat{\mu}_k - \hat{x} \|_2 ,    (7)

where \| v \|_2 denotes the l2 norm of vector v. Despite its simplicity, this method obtains good results, comparable with those of the literature [12].

5. EXPERIMENTS AND RESULTS

In this section we present the classification results obtained using the proposed method described in Section 4.

5.1 Method evaluation based on self-database classification

In this experiment, we evaluate the classification of each distinct database using different taxonomies. We applied the 3-fold cross-validation methodology, which consists in partitioning the database into 3 distinct random subsets composed of 33% of each class (no collision between sets). Thus, the automatic classification applied on each subset is based on training applied on the remaining 66% of the

database. Figure 3 compares the classification accuracy obtained as a function of the number of used descriptors. The resulting confusion matrix of the CREM database using 20 audio descriptors is presented in Table 4 and shows an average classification accuracy of 80%, where each instrument is well classified with a minimal accuracy of 70% for the aerophones. These results are good and seem comparable with those described in the literature [11] using the same number of descriptors. The most relevant feature descriptors (selected among the top ten) estimated by IRMFSP and used for the classification task are detailed in Table 7. This result reveals significant differences between the two databases. As an example, harmonic descriptors are discriminative only for the CREM database, not for the Iowa database. This may be explained by the presence of membranophones in the CREM database, which are absent from the Iowa database. Conversely, spectral and perceptual descriptors seem more relevant for the Iowa database than for the CREM database. Some descriptors appear to be relevant for both databases, like the spectral flatness (Sflat) and the ERB-scale frame energy (ErbFErg), which describe the spectral envelope of the signal.

Per-class values (classes: aero, c-struc, c-pluc, c-bowe, i-pluc, i-struc, i-clink, membr; empty cells of the original layout omitted):
aero:    70  3  9  5  7  5
c-struc:  6 92  3
c-pluc:   5  8 73  4  8  1
c-bowe:  13 80  7
i-pluc:  79 14  7
i-struc:  9  2  5  2 79  4
i-clink: 100
membr:   11 17 72

Table 4. Confusion matrix (expressed in percent of the sounds of the original class listed on the left) of the CREM database using the 20 most relevant descriptors selected by IRMFSP.
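The 3-fold protocol used above (three disjoint random subsets, each holding about 33% of every class, with training on the remaining two thirds) can be sketched as follows. This is an illustrative stratified split in plain Python; the function name and the toy label list are assumptions for the example.

```python
import random
from collections import defaultdict

def stratified_3fold(labels, seed=0):
    """Split indices into 3 disjoint folds, each holding ~33% of every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, c in enumerate(labels):
        by_class[c].append(i)
    folds = [[], [], []]
    for c, idx in by_class.items():
        rng.shuffle(idx)                 # random assignment within each class
        for j, i in enumerate(idx):
            folds[j % 3].append(i)       # round-robin keeps class proportions
    return folds

labels = ["aero"] * 9 + ["membr"] * 6    # toy class labels
folds = stratified_3fold(labels)
# Each fold is then evaluated with a model trained on the other two folds.
```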
[Figure 3: classification accuracy (0 to 1) as a function of the number of selected descriptors (0 to 164) for the LDA, MI and IRMFSP selection algorithms: (a) Iowa database using T2 (17 classes); (b) Iowa database using T1 (4 classes); (c) CREM database using T1 (8 classes).]

Figure 3. Comparison of the 3-fold cross-validation classification accuracy as a function of the number of optimally selected descriptors.

5.2 Cross-database evaluation

In this experiment (see Table 5), we merged the two databases and applied the 3-fold cross-validation method based on the T1 taxonomy to evaluate the classification accuracy on both databases. The resulting average accuracy is about 68%, which is lower than the accuracy obtained on the distinct classification of each database. The results of the cross-database evaluation applied between databases using the T1 taxonomy are presented in Table 6 and show a poor average accuracy of 30%. This seems to confirm our intuition that the Iowa database conveys insufficient information to distinguish the different playing styles of the non-western cordophone instruments of the CREM database.

6. CONCLUSION AND FUTURE WORKS

We applied a computationally efficient automatic timbre classification method which was successfully evaluated on a newly introduced diversified database using an ethnomusicological taxonomy. This method obtains good classification results (> 80% accuracy) for both evaluated databases, comparable to those of the literature. However, the cross-database evaluation shows that neither database can be used to infer a classification for the other. This can be explained by significant differences between these databases. Interestingly, results on the merged database reach an acceptable accuracy of about 70%. As shown in previous work [11], our experiments confirm the efficiency of the IRMFSP algorithm for automatic feature selection applied to timbre classification. The interpretation of the

most relevant selected features shows a significant effect of the content of the database rather than of the taxonomy. However, the interpretation of timbre modeling applied to timbre classification remains difficult. Future work will consist in further investigating the role of descriptors by manually constraining the selection before the classification process.

Per-class values (classes: aero, c-struc, c-pluc, c-bowe, i-pluc, i-struc, i-clink, membr; empty cells of the original layout omitted):
aero:    74 14  5  3  2  1
c-struc: 12 69 10  5  1  2
c-pluc:   1  7 58 29  1  2  2
c-bowe:   3  6 33 52  1  3
i-pluc:   7 14 79
i-struc:  2  2  4 11  2 51 30
i-clink: 11 89
membr:    6 17 78

Table 5. Confusion matrix (expressed in percent of the sounds of the original class listed on the left) of the evaluated fusion of the CREM and Iowa databases, using the 20 most relevant descriptors selected by IRMFSP.

          aero  c-struc  c-pluc  c-bowe
aero        72       9      10       9
c-struc     12      12      34      42
c-pluc      23      47      28       3
c-bowe      28      34      24      14

Table 6. Confusion matrix (expressed in percent of the sounds of the original class listed on the left) of the CREM database classification based on Iowa database training.

CREM T1    Iowa T1     Iowa T2    CREM+Iowa T1
Edur       AttSlp      AttSlp     AmpMod
Acor       Dec         Acor       Acor
ZCR        RMSenv      Hdev       HNois
HTris3     Sflat       SFErg      Sflat
Sflat      ERoff       SRoff      SVar
SSkew      SKurt       Scre       ErbGKurt
ErbKurt    ErbSpr      ErbFErg    ErbFErg
ErbFErg    ErbRoff     ErbRoff    ErbSlp
ErbGSpr    ErbGCent

Table 7. Comparison of the most relevant descriptors estimated by IRMFSP.

7. ACKNOWLEDGMENTS

This research was partly supported by the French ANR (Agence Nationale de la Recherche) DIADEMS (Description, Indexation, Acces aux Documents Ethnomusicologiques et Sonores) project (ANR-12-CORD-0022).

8. REFERENCES

[1] T. W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley-Blackwell, New York, USA, 1958.

[2] R. Battiti. Using mutual information for selecting features in supervised neural net learning. IEEE Trans. on Neural Networks, 5(4):537-550, Jul. 1994.

[3] E. Ambikairajah, J. Epps, and L. Lin. Wideband speech and audio coding using gammatone filter banks. In Proc.
IEEE ICASSP '01, volume 2, pages 773-776, 2001.

[4] N. H. Fletcher and T. D. Rossing. The Physics of Musical Instruments. Springer-Verlag, 1998.

[5] L. Fritts. Musical instrument samples. Univ. Iowa Electronic Music Studios, 1997. [Online]. Available: http://theremin.music.uiowa.edu/MIS.html.

[6] M. Goto, H. Hashiguchi, T. Nishimura, and R. Oka. RWC music database: Music genre database and musical instrument sound database. In Proc. ISMIR, pages 229-230, Oct. 2003.

[7] J. M. Grey and J. W. Gordon. Perceptual effects of spectral modifications on musical timbre. Journal of the Acoustical Society of America (JASA), 5(63):1493-1500, 1978.

[8] S. McAdams, S. Winsberg, S. Donnadieu, G. De Soete, and J. Krimphoff. Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes. Psychological Research, 58(3):177-192, 1995.

[9] N. Misdariis, K. Bennett, D. Pressnitzer, P. Susini, and S. McAdams. Validation of a multidimensional distance model for perceptual dissimilarities among musical timbres. In Proc. ICA & ASA, volume 103, Seattle, USA, Jun. 1998.

[10] B. C. J. Moore and B. R. Glasberg. Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. Journal of the Acoustical Society of America, 74:750-753, 1983.

[11] G. Peeters. Automatic classification of large musical instrument databases using hierarchical classifiers with inertia ratio maximization. In 115th Convention of the AES, New York, USA, Oct. 2003.

[12] G. Peeters, B. Giordano, P. Susini, N. Misdariis, and S. McAdams. The timbre toolbox: Audio descriptors of musical signals. Journal of the Acoustical Society of America (JASA), 5(130):2902-2916, Nov. 2011.

[13] G. Peeters and X. Rodet. Automatically selecting signal descriptors for sound classification. In Proc. ICMC, Göteborg, Sweden, 2002.

[14] E. Schubert, J. Wolfe, and A. Tarnopolsky. Spectral centroid and timbre in complex, multiple instrumental textures. In Proc. 8th Int. Conf. on Music Perception & Cognition (ICMPC), Evanston, Aug. 2004.

[15] G.
Torelli and G. Caironi. New polyphonic sound generator chip with integrated microprocessor-programmable ADSR envelope shaper. IEEE Trans. on Consumer Electronics, CE-29(3):203-212, 1983.

[16] E. von Hornbostel and C. Sachs. The classification of musical instruments. Galpin Society Journal, 3(25):3-29, 1961.