AUTOMATIC TIMBRE CLASSIFICATION OF ETHNOMUSICOLOGICAL AUDIO RECORDINGS


Dominique Fourer, Jean-Luc Rouas, Pierre Hanna, Matthias Robine
LaBRI - CNRS UMR 5800 - University of Bordeaux
{fourer, rouas, hanna, robine}@labri.fr

ABSTRACT

Automatic timbre characterization of audio signals can help to measure similarities between sounds and is of interest for automatic or semi-automatic database indexing. The most effective methods use machine learning approaches, which require qualitative and diversified training databases to obtain accurate results. In this paper, we introduce a diversified database composed of worldwide non-western instrument audio recordings, on which an effective timbre classification method is evaluated. A comparative evaluation based on the well-studied Iowa musical instrument database shows results comparable with those of state-of-the-art methods. Thus, the proposed method offers a practical solution for automatic ethnomusicological indexing of a database composed of diversified sounds of varying quality. The relevance of audio features for timbre characterization is also discussed in the context of non-western instrument analysis.

1. INTRODUCTION

Characterizing musical timbre perception remains a challenging task related to the human auditory mechanism and to the physics of musical instruments [4]. This task is of interest for many applications such as automatic database indexing, measuring similarities between sounds, or automatic sound recognition. Existing psychoacoustical studies model timbre as a multidimensional phenomenon independent of musical parameters (e.g. pitch, duration or loudness) [7, 8]. A quantitative interpretation of instrument timbre based on acoustic features computed from audio signals was first proposed in [9] and pursued in more recent studies [12] which aim at organizing audio timbre descriptors efficiently.
Nowadays, effective automatic timbre classification methods [13] use supervised statistical learning approaches based on audio signal features computed from the analyzed data. Thus, the performance obtained with such systems depends on the taxonomy, the size and the diversity of the training databases. However, most existing research databases (e.g. RWC [6], Iowa [5]) are only composed of common western instruments annotated with specific taxonomies. In this work, we revisit the automatic instrument classification problem from an ethnomusicological point of view by introducing a diversified and manually annotated research database provided by the Centre de Recherche en Ethno-Musicologie (CREM). This database is supplied daily by researchers and has the particularity of being composed of uncommon non-western musical instrument recordings from around the world. This work is motivated by practical applications to automatic indexing of online audio recording databases, which have to be computationally efficient while providing accurate results. Thus, we aim at validating the efficiency and the robustness of the statistical learning approach using a constrained standard taxonomy, applied to recordings of various quality. In this study, we expect to show the influence of the database, the relevance of timbre audio features and the choice of taxonomy for the automatic instrument classification process. A result comparison and a cross-database evaluation are performed using the well-studied University of Iowa musical instrument database. This paper is organized as follows. The CREM database is introduced in Section 2.

(c) Dominique Fourer, Jean-Luc Rouas, Pierre Hanna, Matthias Robine. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Dominique Fourer, Jean-Luc Rouas, Pierre Hanna, Matthias Robine. "Automatic timbre classification of ethnomusicological audio recordings", 15th International Society for Music Information Retrieval Conference, 2014.
The timbre quantization principle, based on mathematical functions describing audio features, is presented in Section 3. An efficient timbre classification method is described in Section 4. Experiments and results based on the proposed method are detailed in Section 5. Conclusions and future work are finally discussed in Section 6.

2. THE CREM ETHNOMUSICOLOGICAL DATABASE

The CREM research database 1 is composed of diversified sound samples directly recorded by ethnomusicologists in various conditions (i.e. no recording studio) and in diverse places all around the world. It contains more than 7,000 hours of audio data recorded from 1932 to the present using different media such as magnetic tapes or vinyl discs. The vintage audio recordings of the database were carefully digitized to preserve the authenticity of the originals and contain various environmental noises. The more recent audio recordings can be recorded directly in digital form with high quality. Most of the musical instruments which compose this database are non-western and can be uncommon, while covering a large range of musical instrument families (see Figure 1(a)). Among uncommon instruments, one can find the lute or the Ngbaka harp as cordophones. More uncommon instruments like the oscillating bamboo, the struck machete and the struck girder were classified by ethnomusicologists as idiophones. In this paper, we restricted our study to the solo excerpts (where only one monophonic or polyphonic instrument is active) to reduce the interference problems which may occur during audio analysis. A description of the selected CREM sub-database is presented in Table 1. According to this table, one can observe that this database is quite inhomogeneous: the aerophones are overrepresented while the membranophones are underrepresented. Due to its diversity and the varying quality of its sounds, the automatic ethnomusicological classification of this database may appear challenging.

1 CREM audio archives freely available online at: http://archives.crem-cnrs.fr/

Class name               Duration (s)      #
aerophones-blowed            1,383       146
cordophones-struck             357        37
cordophones-plucked            715        75
cordophones-bowed              157        16
  (all cordophones)          1,229       128
idiophones-struck              522        58
idiophones-plucked             137        14
idiophones-clinked              94        10
  (all idiophones)             753        82
membranophones-struck          170        19
Total                        3,535       375

Table 1. Content of the CREM sub-database with duration and number of 10-second segmented excerpts.

3. TIMBRE QUANTIZATION AND CLASSIFICATION

3.1 Timbre quantization

Since the preliminary work on the timbre description of perceived sounds, Peeters et al. have proposed in [12] a large set of audio feature descriptors which can be computed from audio signals. The audio descriptors define numerical functions which aim at providing cues about specific acoustic features (e.g. brightness is often associated with the spectral centroid according to [14]). Thus, the audio descriptors can be organized as follows:

Temporal descriptors convey information about the time evolution of a signal (e.g. log attack time, temporal increase, zero-crossing rate, etc.).
Harmonic descriptors are computed from the detected pitch events associated with a fundamental frequency (F0). One can use a prior waveform model of quasi-harmonic sounds, which have an equally spaced Dirac comb shape in the magnitude spectrum. The tonal part of sounds can thus be isolated from the signal mixture and described (e.g. noisiness, inharmonicity, etc.).

Spectral descriptors are computed from a signal time-frequency representation (e.g. Short-Term Fourier Transform) without a prior waveform model (e.g. spectral centroid, spectral decrease, etc.).

Perceptual descriptors are computed from auditory-filtered bandwidth versions of signals which aim at approximating the human perception of sounds. This can be efficiently computed using the Equivalent Rectangular Bandwidth (ERB) scale [10], which can be combined with a gammatone filter bank [3] (e.g. loudness, ERB spectral centroid, etc.).

In this study, we focus on the sound descriptors listed in Table 2, which can be estimated using the timbre toolbox 2 and are detailed in [12]. All descriptors are computed for each analyzed sound excerpt and may return null values. The harmonic descriptors of polyphonic sounds are computed using the most prominent detected F0 candidate (single-F0 estimation). To normalize the duration of the analyzed sounds, we separate each excerpt into 10-second segments without distinction of silence or pitch events. Each segment is then represented by a real vector in which the time series of each descriptor is summarized by a statistic. The median and the Inter-Quartile Range (IQR) statistics were chosen for their robustness to outliers.

Acronym               Descriptor name                                            #
Att                   Attack duration (see ADSR model [15])                      1
AttSlp                Attack slope (ADSR)                                        1
Dec                   Decay duration (ADSR)                                      1
DecSlp                Decay slope (ADSR)                                         1
Rel                   Release duration (ADSR)                                    1
LAT                   Log Attack Time                                            1
Tcent                 Temporal centroid                                          1
Edur                  Effective duration                                         1
FreqMod, AmpMod       Total energy modulation (frequency, amplitude)             2
RMSenv                RMS envelope                                               2
ACor                  Signal auto-correlation function (12 first coef.)         24
ZCR                   Zero-crossing rate                                         2
HCent                 Harmonic spectral centroid                                 2
HSpr                  Harmonic spectral spread                                   2
HSkew                 Harmonic skewness                                          2
HKurt                 Harmonic kurtosis                                          2
HSlp                  Harmonic slope                                             2
HDec                  Harmonic decrease                                          2
HRoff                 Harmonic rolloff                                           2
HVar                  Harmonic variation                                         2
HErg, HNErg, HFErg    Harmonic energy, noise energy and frame energy             6
HNois                 Noisiness                                                  2
HF0                   Fundamental frequency F0                                   2
HinH                  Inharmonicity                                              2
HTris                 Harmonic tristimulus                                       6
HodevR                Harmonic odd-to-even partials ratio                        2
Hdev                  Harmonic deviation                                         2
SCent, ECent          Spectral centroid of the magnitude and energy spectrum     4
SSpr, ESpr            Spectral spread of the magnitude and energy spectrum       4
SSkew, ESkew          Spectral skewness of the magnitude and energy spectrum     4
SKurt, EKurt          Spectral kurtosis of the magnitude and energy spectrum     4
SSlp, ESlp            Spectral slope of the magnitude and energy spectrum        4
SDec, EDec            Spectral decrease of the magnitude and energy spectrum     4
SRoff, ERoff          Spectral rolloff of the magnitude and energy spectrum      4
SVar, EVar            Spectral variation of the magnitude and energy spectrum    4
SFErg, EFErg          Spectral frame energy of the magnitude and energy spectrum 4
Sflat, ESflat         Spectral flatness of the magnitude and energy spectrum     4
Scre, EScre           Spectral crest of the magnitude and energy spectrum        4
ErbCent, ErbGCent     ERB-scale magnitude spectrogram / gammatone centroid       4
ErbSpr, ErbGSpr       ERB-scale magnitude spectrogram / gammatone spread         4
ErbSkew, ErbGSkew     ERB-scale magnitude spectrogram / gammatone skewness       4
ErbKurt, ErbGKurt     ERB-scale magnitude spectrogram / gammatone kurtosis       4
ErbSlp, ErbGSlp       ERB-scale magnitude spectrogram / gammatone slope          4
ErbDec, ErbGDec       ERB-scale magnitude spectrogram / gammatone decrease       4
ErbRoff, ErbGRoff     ERB-scale magnitude spectrogram / gammatone rolloff        4
ErbVar, ErbGVar       ERB-scale magnitude spectrogram / gammatone variation      4
ErbFErg, ErbGFErg     ERB-scale magnitude spectrogram / gammatone frame energy   4
ErbSflat, ErbGSflat   ERB-scale magnitude spectrogram / gammatone flatness       4
ErbScre, ErbGScre     ERB-scale magnitude spectrogram / gammatone crest          4
Total                                                                          164

Table 2. Acronym, name and number of the used timbre descriptors.

2 MATLAB code available at http://www.cirmmt.org/research/tools
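As an illustration of the summarization scheme of Section 3.1, the sketch below computes one descriptor (the spectral centroid) frame by frame over a 10-second segment and condenses its time series into the median and IQR statistics. This is a minimal NumPy sketch, not the timbre toolbox implementation; the frame size, hop size and the synthetic test tone are arbitrary assumptions.

```python
import numpy as np

def spectral_centroid_track(x, sr, frame=2048, hop=512):
    """Frame-wise spectral centroid (Hz) of a mono signal x."""
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    window = np.hanning(frame)
    centroids = []
    for start in range(0, len(x) - frame + 1, hop):
        mag = np.abs(np.fft.rfft(x[start:start + frame] * window))
        if mag.sum() > 0:
            centroids.append(float((freqs * mag).sum() / mag.sum()))
    return np.array(centroids)

def summarize(track):
    """Median and inter-quartile range, as used for each descriptor time series."""
    q1, med, q3 = np.percentile(track, [25, 50, 75])
    return med, q3 - q1

sr = 16000
t = np.arange(10 * sr) / sr                 # one 10-second segment
x = np.sin(2 * np.pi * 440 * t)             # toy "sound": a 440 Hz tone
med, iqr = summarize(spectral_centroid_track(x, sr))
```

For a stationary pure tone the median centroid sits near the tone frequency and the IQR is close to zero; real excerpts give informative spreads, which is why the pair (median, IQR) is kept per descriptor.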

[Figure 1: two taxonomy trees. (a) T1: instrument divided into aerophones (blowed), cordophones (plucked, struck, bowed), idiophones (plucked, struck, clinked) and membranophones (struck). (b) T2: instrument divided into sustained (flute/reeds: flute, clarinet, oboe, saxophone, bassoon; brass: trumpet, trombone, tuba; bowed strings: violin, viola, cello, doublebass) and struck/plucked (piano; pizzicato strings: violin, viola, cello, doublebass).]

Figure 1. Taxonomies used for the automatic classification of musical instruments, as proposed by Hornbostel and Sachs in [16] (a) and by Peeters in [13] (b).

3.2 Classification taxonomy

In this study, we use two databases which can be annotated using different taxonomies. Due to its diversity, the CREM database was only annotated using the Hornbostel and Sachs taxonomy [16] (T1), illustrated in Figure 1(a), which is widely used in ethnomusicology. This hierarchical taxonomy is general enough to classify uncommon instruments (e.g. struck bamboo) and conveys information about sound production materials and playing styles.

On the other hand, the Iowa musical instrument database [5] used in our experiments was initially annotated using a musician's instrument taxonomy (T2), as proposed in [13] and illustrated in Figure 1(b). This database is composed of common western pitched instruments which can easily be annotated using T1, as described in Table 3. One can notice that the Iowa database is only composed of aerophone and cordophone instruments. If we consider the playing style, only 4 classes are represented when the T1 taxonomy is applied to the Iowa database.

T1 class name    T2 equivalence            Duration (s)      #
aero-blowed      reed/flute and brass          5,951       668
cordo-struck     struck strings                5,564       646
cordo-plucked    plucked strings               5,229       583
cordo-bowed      bowed strings                 7,853       838
Total                                         24,597     2,735

Table 3. Content of the Iowa database using the musician's instrument taxonomy (T2) and its equivalence with the Hornbostel and Sachs taxonomy (T1).

4. AUTOMATIC INSTRUMENT TIMBRE CLASSIFICATION METHOD

The described method aims at estimating the corresponding taxonomy class name of a given input sound.

4.1 Method overview

Here, each sound segment (cf. Section 3.1) is represented by a vector of length p = 164 where each value corresponds to a descriptor (see Table 2). The training step of this method (illustrated in Figure 2) aims at modeling each timbre class using the best projection space for classification. A feature selection algorithm is first applied to efficiently reduce the number of descriptors and avoid statistical overlearning. The classification space is computed using discriminant analysis, which consists in estimating optimal weights over the descriptors allowing the best discrimination between timbre classes. The classification task then consists in projecting an input sound into the best classification space and selecting the most probable timbre class using the learned model.

[Figure 2: training pipeline. Annotated input sound, then features computation, then features selection (LDA, MI, IRMFSP), then classification space computation (LDA), then class modeling and class affectation.]

Figure 2. Training step of the proposed method.

4.2 Linear discriminant analysis

The goal of Linear Discriminant Analysis (LDA) [1] is to find the best projection, or linear combination of all descriptors, which maximizes the average distance between classes (inter-class distance) while minimizing the distance between individuals of the same class (intra-class distance). This method assumes that the class affectation of each individual is known a priori. Its principle can be described as follows. First, consider the n x p real matrix M where each row is a vector of descriptors associated with a sound (individual). We assume that each individual is a member of a unique class k \in [1, K].
Now we define W as the intra-class variance-covariance matrix, which can be estimated by:

W = \frac{1}{n} \sum_{k=1}^{K} n_k W_k ,    (1)

where W_k is the variance-covariance matrix computed from the n_k x p sub-matrix of M composed of the n_k individuals included in class k. We also define B, the inter-class variance-covariance matrix, expressed as follows:

B = \frac{1}{n} \sum_{k=1}^{K} n_k (\mu_k - \mu)(\mu_k - \mu)^T ,    (2)

where \mu_k corresponds to the mean vector of class k and \mu is the mean vector of the entire dataset. According to [1], it can be shown that the eigenvectors of the matrix D = (B + W)^{-1} B solve this optimization problem. When the matrix A = (B + W) is not invertible, a computational solution consists in using the pseudoinverse of A, which can be calculated as A^T (A A^T)^{-1}.

4.3 Feature selection algorithms

Feature selection aims at computing the optimal relevance of each descriptor, which can be measured with a weight or a rank. The resulting descriptor subset has to be as discriminant as possible with minimal redundancy. In this study, we investigate the three approaches described below.

4.3.1 LDA feature selection

The LDA method detailed in Section 4.2 can also be used for selecting the most relevant features. Indeed, the computed eigenvectors, which correspond to linear combinations of descriptors, convey a relative weight applied to each descriptor. Thus, the significance (or weight) S_d of a descriptor d can be computed using a summation over a defined range [1, R] of the eigenvectors of matrix D as follows:

S_d = \sum_{r=1}^{R} |v_{r,d}| ,    (3)

where v_{r,d} is the d-th coefficient of the r-th eigenvector, the eigenvectors being sorted by descending order of eigenvalue (i.e. r = 1 corresponds to the maximal eigenvalue of matrix D). In our implementation, we fixed R = 8.

4.3.2 Mutual information

Feature selection algorithms aim at computing a subset of descriptors that conveys the maximal amount of information to model the classes. From a statistical point of view, consider classes and feature descriptors as realizations of random variables C and F. The relevance can then be measured with the mutual information, defined by:

I(C, F) = \sum_c \sum_f P(c, f) \log \frac{P(c, f)}{P(c) P(f)} ,    (4)

where P(c) denotes the probability of C = c, which can be estimated from the approximated probability density functions (pdf) using a computed histogram.
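The histogram-based estimate of Eq. (4) can be sketched as follows, using the factorization P(c, f) = P(f|c) P(c). This is an illustrative NumPy sketch, not the authors' code; the function name, the bin count and the synthetic two-class data are assumptions made for the example.

```python
import numpy as np

def mutual_information(feature, labels, bins=8):
    """Histogram estimate of I(C, F) between one descriptor and class labels."""
    labels = np.asarray(labels)
    edges = np.histogram_bin_edges(feature, bins=bins)
    pf, _ = np.histogram(feature, bins=edges)
    pf = pf / pf.sum()                          # P(f)
    mi = 0.0
    for c in np.unique(labels):
        pc = np.mean(labels == c)               # P(c)
        pfc, _ = np.histogram(feature[labels == c], bins=edges)
        pfc = pfc / pfc.sum()                   # P(f | c)
        joint = pfc * pc                        # P(c, f) = P(f | c) P(c)
        nz = (joint > 0) & (pf > 0)
        mi += np.sum(joint[nz] * np.log2(joint[nz] / (pc * pf[nz])))
    return mi

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 200)
# A descriptor that separates the classes vs. one that is pure noise.
discriminative = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
noise = rng.normal(0, 1, 400)
```

A descriptor whose distribution differs strongly across classes scores close to the class entropy (here about 1 bit), while an uninformative one scores near zero, which is the ranking criterion exploited by this selector.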
According to Bayes' theorem, one can compute P(c, f) = P(f|c) P(c), where P(f|c) is the pdf of the feature descriptor value f within class c. This method can be improved following [2] by simultaneously reducing the redundancy, through the mutual information between previously selected descriptors.

4.3.3 Inertia Ratio Maximization using Feature Space Projection (IRMFSP)

This algorithm was first proposed in [11] to reduce the number of descriptors used by timbre classification methods. It consists in maximizing the relevance of the descriptor subset for the classification task while minimizing the redundancy between the selected descriptors. This iterative method (\iota \leq p) is composed of two steps. The first one selects, at iteration \iota, the not-previously-selected descriptor which maximizes the ratio between the inter-class inertia and the total inertia, expressed as follows:

\hat{d}^{(\iota)} = \arg\max_d \frac{\sum_{k=1}^{K} n_k (\mu_{d,k} - \mu_d)(\mu_{d,k} - \mu_d)^T}{\sum_{i=1}^{n} (f_{d,i}^{(\iota)} - \mu_d)(f_{d,i}^{(\iota)} - \mu_d)^T} ,    (5)

where f_{d,i}^{(\iota)} denotes the value of descriptor d \in [1, p] for individual i, and \mu_{d,k} and \mu_d respectively denote the average value of descriptor d within class k and over the total dataset. The second step of this algorithm orthogonalizes the remaining data for the next iteration as follows:

f_d^{(\iota+1)} = f_d^{(\iota)} - \left( f_d^{(\iota)} \cdot g_{\hat{d}} \right) g_{\hat{d}} ,    (6)

where f_d^{(\iota)} is the vector of descriptor d values over all the individuals of the entire dataset and g_{\hat{d}} = f_{\hat{d}}^{(\iota)} / \| f_{\hat{d}}^{(\iota)} \| is the normalized form of the previously selected descriptor \hat{d}^{(\iota)}.

4.4 Class modeling and automatic classification

Each instrument class is modeled in the projected classification space resulting from the application of LDA. Thus, each class can be represented by its gravity center \hat{\mu}_k, which corresponds to the vector of the average values of the projected individuals composing class k.
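The LDA projection of Section 4.2 and the class modeling above combine into a short pipeline: estimate W and B, take the leading eigenvectors of pinv(B + W) B as the projection, represent each class by its projected gravity center, and classify a new sound by the nearest center. The sketch below is a minimal NumPy rendering under these definitions, not the evaluated implementation; the helper names and the toy two-class data are assumptions.

```python
import numpy as np

def lda_projection(X, y):
    """Eigenvectors of pinv(B + W) @ B spanning the LDA classification space."""
    classes = np.unique(y)
    n, p = X.shape
    mu = X.mean(axis=0)
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for k in classes:
        Xk = X[y == k]
        nk = len(Xk)
        muk = Xk.mean(axis=0)
        W += nk * np.cov(Xk, rowvar=False, bias=True)   # n_k * W_k, Eq. (1)
        B += nk * np.outer(muk - mu, muk - mu)          # Eq. (2)
    W /= n
    B /= n
    eigval, eigvec = np.linalg.eig(np.linalg.pinv(B + W) @ B)
    order = np.argsort(eigval.real)[::-1]
    return eigvec.real[:, order[:len(classes) - 1]]

def nearest_centroid(Xtrain, ytrain, V, x):
    """Affect the class whose projected gravity center is closest to x @ V."""
    z = x @ V
    centers = {k: (Xtrain[ytrain == k] @ V).mean(axis=0) for k in np.unique(ytrain)}
    return min(centers, key=lambda k: np.linalg.norm(centers[k] - z))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])
y = np.repeat([0, 1], 50)
V = lda_projection(X, y)
```

The pseudoinverse handles the case where B + W is singular, as noted in the text; with K classes the projection keeps at most K - 1 discriminant directions.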
The classification decision, which affects a class \hat{k} to an input sound represented by a projected vector \hat{x}, is simply performed by minimizing the Euclidean distance to the gravity center of each class as follows:

\hat{k} = \arg\min_{k \in [1, K]} \| \hat{\mu}_k - \hat{x} \|_2 ,    (7)

where \| v \|_2 denotes the l2 norm of vector v. Despite its simplicity, this method obtains good results, comparable with those of the literature [12].

5. EXPERIMENTS AND RESULTS

In this section we present the classification results obtained using the proposed method described in Section 4.

5.1 Method evaluation based on self-database classification

In this experiment, we evaluate the classification of each distinct database using different taxonomies. We applied the 3-fold cross-validation methodology, which consists in partitioning the database into 3 distinct random subsets composed of 33% of each class (no collision between sets). Thus, the automatic classification applied on each subset is based on training applied on the remaining 66% of the

database. Figure 3 compares the classification accuracy obtained as a function of the number of used descriptors. The resulting confusion matrix of the CREM database using 20 audio descriptors is presented in Table 4 and shows an average classification accuracy of 80%, where each instrument is well classified with a minimal accuracy of 70% for the aerophones. These results are good and seem comparable with those described in the literature [11] using the same number of descriptors. The most relevant feature descriptors (selected among the top ten) estimated by IRMFSP and used for the classification task are detailed in Table 7. This result reveals significant differences between the two databases. As an example, harmonic descriptors are discriminative only for the CREM database, not for the Iowa database. This may be explained by the presence of membranophones in the CREM database, which are absent from the Iowa database. Conversely, spectral and perceptual descriptors seem more relevant for the Iowa database than for the CREM database. Some descriptors appear to be relevant for both databases, like the spectral flatness (Sflat) and the ERB-scale frame energy (ErbFErg), which describe the spectral envelope of the signal.

Per-class values (classes: aero, c-struc, c-pluc, c-bowe, i-pluc, i-struc, i-clink, membr; empty cells of the original layout omitted):
aero:    70  3  9  5  7  5
c-struc:  6 92  3
c-pluc:   5  8 73  4  8  1
c-bowe:  13 80  7
i-pluc:  79 14  7
i-struc:  9  2  5  2 79  4
i-clink: 100
membr:   11 17 72

Table 4. Confusion matrix (expressed in percent of the sounds of the original class listed on the left) of the CREM database using the 20 most relevant descriptors selected by IRMFSP.
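The 3-fold protocol used above (three disjoint random subsets, each holding about 33% of every class, with training on the remaining two thirds) can be sketched as follows. This is an illustrative stratified split in plain Python; the function name and the toy label list are assumptions for the example.

```python
import random
from collections import defaultdict

def stratified_3fold(labels, seed=0):
    """Split indices into 3 disjoint folds, each holding ~33% of every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, c in enumerate(labels):
        by_class[c].append(i)
    folds = [[], [], []]
    for c, idx in by_class.items():
        rng.shuffle(idx)                 # random assignment within each class
        for j, i in enumerate(idx):
            folds[j % 3].append(i)       # round-robin keeps class proportions
    return folds

labels = ["aero"] * 9 + ["membr"] * 6    # toy class labels
folds = stratified_3fold(labels)
# Each fold is then evaluated with a model trained on the other two folds.
```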
[Figure 3: classification accuracy (0 to 1) as a function of the number of selected descriptors (0 to 164) for the LDA, MI and IRMFSP selection algorithms: (a) Iowa database using T2 (17 classes); (b) Iowa database using T1 (4 classes); (c) CREM database using T1 (8 classes).]

Figure 3. Comparison of the 3-fold cross-validation classification accuracy as a function of the number of optimally selected descriptors.

5.2 Cross-database evaluation

In this experiment (see Table 5), we merged the two databases and applied the 3-fold cross-validation method based on the T1 taxonomy to evaluate the classification accuracy on both databases. The resulting average accuracy is about 68%, which is lower than the accuracy obtained on the distinct classification of each database. The results of the cross-database evaluation applied between databases using the T1 taxonomy are presented in Table 6 and show a poor average accuracy of 30%. This seems to confirm our intuition that the Iowa database conveys insufficient information to distinguish the different playing styles of the non-western cordophone instruments of the CREM database.

6. CONCLUSION AND FUTURE WORKS

We applied a computationally efficient automatic timbre classification method which was successfully evaluated on a newly introduced diversified database using an ethnomusicological taxonomy. This method obtains good classification results (> 80% accuracy) for both evaluated databases, comparable to those of the literature. However, the cross-database evaluation shows that neither database can be used to infer a classification for the other. This can be explained by significant differences between these databases. Interestingly, results on the merged database reach an acceptable accuracy of about 70%. As shown in previous work [11], our experiments confirm the efficiency of the IRMFSP algorithm for automatic feature selection applied to timbre classification. The interpretation of the

most relevant selected features shows a significant effect of the content of the database rather than of the taxonomy. However, the interpretation of timbre modeling applied to timbre classification remains difficult. Future work will consist in further investigating the role of descriptors by manually constraining the selection before the classification process.

Per-class values (classes: aero, c-struc, c-pluc, c-bowe, i-pluc, i-struc, i-clink, membr; empty cells of the original layout omitted):
aero:    74 14  5  3  2  1
c-struc: 12 69 10  5  1  2
c-pluc:   1  7 58 29  1  2  2
c-bowe:   3  6 33 52  1  3
i-pluc:   7 14 79
i-struc:  2  2  4 11  2 51 30
i-clink: 11 89
membr:    6 17 78

Table 5. Confusion matrix (expressed in percent of the sounds of the original class listed on the left) of the evaluated fusion of the CREM and Iowa databases, using the 20 most relevant descriptors selected by IRMFSP.

          aero  c-struc  c-pluc  c-bowe
aero        72       9      10       9
c-struc     12      12      34      42
c-pluc      23      47      28       3
c-bowe      28      34      24      14

Table 6. Confusion matrix (expressed in percent of the sounds of the original class listed on the left) of the CREM database classification based on Iowa database training.

CREM T1    Iowa T1     Iowa T2    CREM+Iowa T1
Edur       AttSlp      AttSlp     AmpMod
Acor       Dec         Acor       Acor
ZCR        RMSenv      Hdev       HNois
HTris3     Sflat       SFErg      Sflat
Sflat      ERoff       SRoff      SVar
SSkew      SKurt       Scre       ErbGKurt
ErbKurt    ErbSpr      ErbFErg    ErbFErg
ErbFErg    ErbRoff     ErbRoff    ErbSlp
ErbGSpr    ErbGCent

Table 7. Comparison of the most relevant descriptors estimated by IRMFSP.

7. ACKNOWLEDGMENTS

This research was partly supported by the French ANR (Agence Nationale de la Recherche) DIADEMS (Description, Indexation, Acces aux Documents Ethnomusicologiques et Sonores) project (ANR-12-CORD-0022).

8. REFERENCES

[1] T. W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley-Blackwell, New York, USA, 1958.

[2] R. Battiti. Using mutual information for selecting features in supervised neural net learning. IEEE Trans. on Neural Networks, 5(4):537-550, Jul. 1994.

[3] E. Ambikairajah, J. Epps, and L. Lin. Wideband speech and audio coding using gammatone filter banks. In Proc.
IEEE ICASSP '01, volume 2, pages 773-776, 2001.

[4] N. H. Fletcher and T. D. Rossing. The Physics of Musical Instruments. Springer-Verlag, 1998.

[5] L. Fritts. Musical instrument samples. Univ. Iowa Electronic Music Studios, 1997. [Online]. Available: http://theremin.music.uiowa.edu/MIS.html.

[6] M. Goto, H. Hashiguchi, T. Nishimura, and R. Oka. RWC music database: Music genre database and musical instrument sound database. In Proc. ISMIR, pages 229-230, Oct. 2003.

[7] J. M. Grey and J. W. Gordon. Perceptual effects of spectral modifications on musical timbre. Journal of the Acoustical Society of America (JASA), 5(63):1493-1500, 1978.

[8] S. McAdams, S. Winsberg, S. Donnadieu, G. De Soete, and J. Krimphoff. Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes. Psychological Research, 58(3):177-192, 1995.

[9] N. Misdariis, K. Bennett, D. Pressnitzer, P. Susini, and S. McAdams. Validation of a multidimensional distance model for perceptual dissimilarities among musical timbres. In Proc. ICA & ASA, volume 103, Seattle, USA, Jun. 1998.

[10] B. C. J. Moore and B. R. Glasberg. Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. Journal of the Acoustical Society of America, 74:750-753, 1983.

[11] G. Peeters. Automatic classification of large musical instrument databases using hierarchical classifiers with inertia ratio maximization. In 115th Convention of the AES, New York, USA, Oct. 2003.

[12] G. Peeters, B. Giordano, P. Susini, N. Misdariis, and S. McAdams. The timbre toolbox: Audio descriptors of musical signals. Journal of the Acoustical Society of America (JASA), 5(130):2902-2916, Nov. 2011.

[13] G. Peeters and X. Rodet. Automatically selecting signal descriptors for sound classification. In Proc. ICMC, Göteborg, Sweden, 2002.

[14] E. Schubert, J. Wolfe, and A. Tarnopolsky. Spectral centroid and timbre in complex, multiple instrumental textures. In Proc. 8th Int. Conf. on Music Perception & Cognition (ICMPC), Evanston, Aug. 2004.

[15] G.
Torelli and G. Caironi. New polyphonic sound generator chip with integrated microprocessor-programmable ADSR envelope shaper. IEEE Trans. on Consumer Electronics, CE-29(3):203-212, 1983.

[16] E. von Hornbostel and C. Sachs. The classification of musical instruments. Galpin Society Journal, 3(25):3-29, 1961.