Music Performer Recognition Using an Ensemble of Simple Classifiers


Efstathios Stamatatos¹ and Gerhard Widmer²

Abstract. This paper addresses the problem of identifying the most likely music performer, given a set of performances of the same piece by a number of skilled candidate pianists. We propose a set of features for representing the stylistic characteristics of a music performer. A database of piano performances of 22 pianists playing two pieces by F. Chopin is used in the presented experiments. Due to the limitations of the training set size and the characteristics of the input features, we propose an ensemble of simple classifiers derived by both subsampling the training set and subsampling the input features. Preliminary experiments show that the resulting ensemble is able to cope efficiently with this difficult musical task, displaying a level of accuracy unlikely to be matched by human listeners (under similar conditions).

1 INTRODUCTION

The representation of music as given in the printed score is not able to capture every musical nuance. Hence, a piece played exactly as notated in the printed score would sound mechanical. Expressive music performance is the interpretation of a piece of music according to the artist's understanding of the structure (or "meaning") of the piece. Every skilled performer continuously modifies important parameters, such as tempo and loudness, in order to stress certain notes or shape certain passages. Expressive performance is what makes music come alive and what distinguishes one performer from another (and what makes some performers famous).

Because of its central role in our musical culture, expressive performance is a major research topic in contemporary musicology. One main direction in empirical performance research aims at the development of rules or principles of expressive performance, either with the help of human experts [6] or by processing large volumes of data using machine learning techniques [11]. By its nature, this direction explores the similarities between skilled performers in the same musical context. The differences between performers, on the other hand, have not been studied thoroughly.
Repp [10] presented an exhaustive statistical analysis of temporal commonalities and differences among distinguished pianists' interpretations of a well-known piece and demonstrated the individuality of some famous pianists. However, the differences in music performance are still generally expressed with aesthetic criteria rather than quantitatively.

¹ Austrian Research Institute for Artificial Intelligence, Schottengasse 3, A-1010 Vienna, Austria, email: stathis@ai.univie.ac.at
² Department of Medical Cybernetics and Artificial Intelligence, University of Vienna, and Austrian Research Institute for Artificial Intelligence, Schottengasse 3, A-1010 Vienna, Austria, email: gerhard@ai.univie.ac.at

In this paper, we use AI (specifically: machine learning) techniques in an attempt to express the individuality of music performers (pianists) in machine-interpretable terms by quantifying the main parameters of expressive performance. In order to avoid any subjective evaluation of our approach, we apply it to a well-defined problem: the automatic identification of music performers, given a set of piano performances of the same piece of music by a number of skilled candidate pianists. From this perspective, our task can be viewed as a typical classification problem, where the classes are the candidate pianists. We propose a set of features that represent the stylistic properties of a performer, introducing the norm performance as a reference point, and we apply ideas taken from machine learning research to the construction of the classifier. The dimensions of expressive variation taken into account are the three main expressive parameters available to a pianist: timing (variations in tempo), dynamics (variations in loudness), and articulation (the use of overlaps and pauses between successive notes).

First experimental results show that it is indeed possible for a machine to distinguish music performers (pianists) on the basis of their performance style. From the point of view of machine learning, this constitutes another supporting case for the utility of ensemble learning methods (specifically, the combination of a large number of independent simple experts [2]).
The contribution of this work to musicology is the identification (via machine learning methodology) of a set of global characteristics of performance style that seem to be relevant to distinguishing different artists. It must be stressed, however, that the current results are still very preliminary and limited because of the limited empirical data available for this investigation. Obtaining precise measurements, in terms of timing deviations, dynamics, and articulation, of performances by highly skilled artists is a difficult task. We are currently investing a large amount of effort in developing new methods for extracting expressive details from given recordings, and we hope to be able to report on much more extensive experiments in the near future.

2 DATA AND TERMINOLOGY

The data used in this study consist of performances played and recorded on a Bösendorfer SE290 computer-monitored concert grand piano, which is able to measure every key and pedal movement of the artist with very high precision. 22 skilled performers, including professional pianists, graduate students, and professors of the Vienna Music University, played two pieces by F. Chopin: the Etude op. 10/3 (first 21 bars) and the Ballade op. 38 (initial section, bars 1 to 45). The digital recordings were then transcribed into symbolic form and matched against the printed score [3]. Thus, for each note in a piece we have precise information about how it was notated in the score and how it was actually played in a performance. The parameters of interest are:

- the exact time when a note was played (vs. when it should have been played according to the score), which relates to tempo and timing;
- the dynamic level or loudness of a played note (dynamics);
- the exact duration of a played note, and how the note is connected to the following one (articulation).

All of this can be readily computed from our data.

[Figure 1. Smoothed timing deviation of the pianists #01-#05 from the printed score (above, "Score deviation") and from the norm of the pianists #06-#10 (below, "Norm deviation") for the soprano notes of Chopin's Etude op. 10/3. Horizontal axis: soprano note number.]

In the following, the term Inter-Onset Interval (IOI) denotes the time interval between the onsets of two successive notes of the same voice. We define Off-Time Duration (OTD) as the time interval between the offset time of one note and the onset time of the next note of the same voice. The 22 pianists are referred to by their code names (i.e., #01, #02, etc.).

3 FEATURES FOR CHARACTERIZING PERFORMANCE STYLE

If we define (somewhat simplistically) expressive performance as intended deviation from the score, then different performances differ in the way and extent to which the artist deviates from the score, i.e., from a purely mechanical ("flat") rendition of the piece, in terms of timing, dynamics, and articulation. In order to compare performances of pieces or sections of different length, we need to define features that characterize and quantify these deviations at a global level, i.e., without reference to individual notes and how these were played.

Figure 1 (top) shows the timing deviation of five pianists (#01-#05) from the printed score of Chopin's Etude op. 10/3. The deviation is measured as the difference between the performed IOIs and the IOIs that would result from a mechanical performance of the piece at a pre-specified fixed tempo.
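The two interval measures defined above, and the score-based timing deviation plotted in Figure 1 (top), can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes per-voice lists of performed onset and offset times in seconds and notated score positions in beats. The function names and the sign convention for OTD (positive for a pause, negative for an overlap) are our own choices.

```python
from typing import List, Sequence


def inter_onset_intervals(onsets: Sequence[float]) -> List[float]:
    """IOI: time between the onsets of successive notes of one voice."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def off_time_durations(onsets: Sequence[float], offsets: Sequence[float]) -> List[float]:
    """OTD: time from the offset of one note to the onset of the next note
    of the same voice. Assumed sign convention: positive = pause (detached),
    negative = overlap (legato)."""
    return [next_onset - offset for offset, next_onset in zip(offsets, onsets[1:])]


def timing_deviation_from_score(perf_onsets: Sequence[float],
                                score_beats: Sequence[float],
                                seconds_per_beat: float) -> List[float]:
    """Performed IOIs minus the IOIs of a mechanical rendition of the score
    at a fixed, pre-specified tempo."""
    performed = inter_onset_intervals(perf_onsets)
    mechanical = [beats * seconds_per_beat for beats in inter_onset_intervals(score_beats)]
    return [p - m for p, m in zip(performed, mechanical)]


# Toy soprano line: four notes notated one beat apart, nominal tempo 0.5 s/beat.
onsets = [0.0, 0.5, 1.1, 1.5]
offsets = [0.45, 1.2, 1.4, 2.0]
print([round(v, 3) for v in inter_onset_intervals(onsets)])                          # [0.5, 0.6, 0.4]
print([round(v, 3) for v in off_time_durations(onsets, offsets)])                    # [0.05, -0.1, 0.1]
print([round(v, 3) for v in timing_deviation_from_score(onsets, [0, 1, 2, 3], 0.5)]) # [0.0, 0.1, -0.1]
```

Rounding is applied only for display, because raw floating-point differences such as 1.1 - 0.5 are not exact.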
As Figure 1 (top) shows, all the pianists tend to deviate from the score in a similar way. That is not surprising. It is well known that, to a certain extent, expressive variation is correlated with the structure of the piece of music (e.g., phrase structure, harmonic structure, etc.). Indeed, expressive performance is a means for the performer to communicate structural information to the listener. The peaks and dips of the resulting performance curves tend to correlate (more or less strongly) with phrase boundaries and phrase centers. Thus, suppose we rely on very global summarizations of a pianist's tempo deviations and similar measures, and do not encode detailed aspects of the music played (such as its phrase structure or harmonic structure). Then these global features will strongly depend on, and vary with, the training set. Sampling the training set from slightly different segments of the same piece may change the output of the classifier substantially.

This problem can be avoided by using what we call norm deviation features. In addition to comparing the performance of a certain pianist with the printed score, we propose the average performance of a different set of performers as a reference point. Figure 1 (bottom) shows the timing deviation of pianists #01-#05 from the average performance (i.e., the norm) of pianists #06-#10 for the same piece as above. As can be seen, the timing deviations of the first set of pianists from the norm of the second set are more stable across the piece. This is a strong indication that the norm deviation features should not be affected by slight changes to the training set. Given a set of reference performances, the norm deviation can easily be calculated for timing, dynamics, and articulation.

Another valuable source of information is the so-called melody lead phenomenon [7]. Notes that should be played simultaneously according to the printed score (i.e., chords) are usually slightly spread out over time. A voice that is to be emphasized precedes the other voices and is played louder. Studies of this phenomenon [9] showed that melody lead increases with expressiveness and skill level.
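To make the melody lead phenomenon concrete, the sketch below measures, for each notated chord, how far the top voice (voice 1) precedes another voice and how much louder it is played. It is an illustration under assumptions, not the authors' code. Notes are assumed to be already matched to score chords and tagged with a voice number, and dynamics are taken as MIDI-style velocities. The `ChordNote` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ChordNote:
    voice: int      # 1 = top (melody) voice; higher numbers = lower voices
    onset: float    # performed onset time in seconds
    dynamic: int    # performed dynamic level (assumed MIDI-style velocity)


def melody_lead(chord: Sequence[ChordNote], other_voice: int) -> Tuple[float, int]:
    """(timing lead, dynamic lead) of voice 1 over `other_voice` in one chord.
    Positive timing lead: the melody note sounded earlier.
    Positive dynamic lead: the melody note was played louder."""
    by_voice = {note.voice: note for note in chord}
    melody, other = by_voice[1], by_voice[other_voice]
    return other.onset - melody.onset, melody.dynamic - other.dynamic


def lead_series(chords: Sequence[Sequence[ChordNote]],
                other_voice: int) -> Tuple[List[float], List[int]]:
    """Timing and dynamic leads over every chord containing both voices."""
    leads = [melody_lead(c, other_voice) for c in chords
             if {1, other_voice} <= {n.voice for n in c}]
    return [t for t, _ in leads], [d for _, d in leads]


chords = [
    [ChordNote(1, 10.000, 72), ChordNote(2, 10.020, 60), ChordNote(4, 10.035, 55)],
    [ChordNote(1, 11.000, 70), ChordNote(4, 11.030, 52)],
]
timing, dynamics = lead_series(chords, other_voice=4)
print([round(t, 3) for t in timing], dynamics)  # [0.035, 0.03] [17, 18]
```

Chords that lack one of the two voices are skipped rather than treated as zero lead. This is one possible design choice among several.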
Since melody lead varies in this way with expressiveness and skill, deviations between the notes of the same chord in terms of timing and dynamics can provide useful features that capture one aspect of the stylistic characteristics of a music performer.

Specifically, then, we propose the following global features for representing a music performance, given the printed score and a performance norm derived from a given set of different performers:

Score deviation features:
  timing: D(IOI_s, IOI_m)
  articulation: D(IOI_s, OTD_m)
  dynamics: D(DL_s, DL_m)
Norm deviation features:
  timing: D(IOI_n, IOI_m)
  articulation: D(OTD_n, OTD_m)
  dynamics: D(DL_n, DL_m)
Melody lead features:
  timing: D(ON_xy, ON_zy)
  dynamics: D(DL_xy, DL_zy)

The symbols are defined as follows:
- D(x, y) (a scalar) denotes the deviation of a vector of numeric values x from a reference vector y.
- IOI_s and DL_s are the nominal inter-onset interval and dynamic level, respectively, according to the printed score.
- IOI_n, OTD_n, and DL_n are the inter-onset interval, the off-time duration, and the dynamic level, respectively, of the performance norm.
- IOI_m, OTD_m, and DL_m are the inter-onset interval, the off-time duration, and the dynamic level, respectively, of the actual performance.
- ON_xy and DL_xy are the on-time and the dynamic level, respectively, of a note of the x-th voice within the chord y.

Different types of distance could be applied to measure the deviation in each of these features. We chose the appropriate type of distance for each feature category according to its statistical significance in the training set. In the following experiments, Chopin's Ballade op. 38 is used as the training material and the Etude op. 10/3 as the test piece. Pianists #01-#12 are used as the set of reference pianists to compute the norm performance, and the task is to learn to distinguish pianists #13-#22. To determine the best type of distance measure for each type of feature, the training piece (the Ballade) was divided into four non-overlapping segments, each including 40 soprano notes. For each segment of the performance of the piece by pianists #13-#22, the values of the proposed features were calculated for the following types of distance (n is the number of elements in x and y):

Simple:
$$D_s(x, y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - y_i)$$

Relative:
$$D_r(x, y) = \frac{1}{n}\sum_{i=1}^{n} \frac{x_i - y_i}{x_i}$$

Simple absolute:
$$D_{sa}(x, y) = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - y_i \rvert$$

Relative absolute:
$$D_{ra}(x, y) = \frac{1}{n}\sum_{i=1}^{n} \frac{\lvert x_i - y_i \rvert}{x_i}$$

Analysis of variance (ANOVA) was then applied to these values to assess the statistical significance of the different types of distance and features. The most significant features proved to be:
- the deviation from the norm in terms of timing and articulation;
- the timing deviation between the first and the third voice, and between the first and the fourth voice (the bass line);
- the deviation from the score in terms of timing and articulation.

As regards the different types of distance, D_r gave the best results for the score deviation features; this type of distance has been used previously for comparing different performances. D_s seems to be the appropriate choice for the norm deviation features.
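The four distance types and the norm construction can be sketched directly from the formulas above. This is a minimal illustration under stated assumptions. D(x, y) is computed over n aligned note values. The first argument is the one used as the denominator in the relative variants, as in the formulas. The norm is taken as the note-by-note mean of the reference performances. The dictionary keys `"IOI"`, `"OTD"`, `"DL"` are hypothetical names.

```python
from statistics import mean
from typing import Dict, Iterator, List, Sequence, Tuple


def _aligned(x: Sequence[float], y: Sequence[float]) -> Iterator[Tuple[float, float]]:
    """Pair up note values; both vectors must describe the same notes."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    return zip(x, y)


def d_s(x, y):   # simple: mean signed difference
    return mean(xi - yi for xi, yi in _aligned(x, y))


def d_r(x, y):   # relative: signed difference scaled by x_i
    return mean((xi - yi) / xi for xi, yi in _aligned(x, y))


def d_sa(x, y):  # simple absolute: mean absolute difference
    return mean(abs(xi - yi) for xi, yi in _aligned(x, y))


def d_ra(x, y):  # relative absolute: absolute difference scaled by x_i
    return mean(abs(xi - yi) / xi for xi, yi in _aligned(x, y))


def performance_norm(references: Sequence[Sequence[float]]) -> List[float]:
    """Note-by-note average of several reference performances."""
    return [mean(values) for values in zip(*references)]


def norm_deviation_features(perf: Dict[str, Sequence[float]],
                            refs: Sequence[Dict[str, Sequence[float]]]) -> Dict[str, float]:
    """D_s(norm, performance) for timing (IOI), articulation (OTD), dynamics (DL)."""
    return {key: d_s(performance_norm([r[key] for r in refs]), perf[key])
            for key in ("IOI", "OTD", "DL")}


# Two toy reference performances and one performance to characterize (two notes each).
refs = [{"IOI": [0.4, 0.6], "OTD": [0.0, 0.0], "DL": [64, 64]},
        {"IOI": [0.6, 0.6], "OTD": [0.2, 0.0], "DL": [66, 70]}]
perf = {"IOI": [0.5, 0.6], "OTD": [0.0, 0.1], "DL": [60, 70]}
print(norm_deviation_features(perf, refs))
```

Positive and negative deviations cancel in D_s and D_r but not in D_sa and D_ra. That difference is what the ANOVA comparison above is probing.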
Finally, D_sa fits the melody lead features best. This indicates that whether a voice precedes or follows the first voice in a chord matters less than the degree to which it deviates from it.

4 THE CLASSIFICATION MODEL

4.1 Problem characteristics

Since only two pieces were available (one of which had to serve as an independent test piece), the training examples of the music performer classifier had to consist of piece segments rather than entire musical pieces. A simple experiment was performed to determine the best mode of segmentation (equal-length segments, or segments based on the piece's phrase structure). A number of simple classifiers, based on different types of features and distance definitions, were trained (via discriminant analysis; see below). Training used the performances of Ballade op. 38 by pianists #13-#22, with two different methods of segmenting the piece into training examples. In one case, the piece was segmented into four parts of equal length (40 soprano notes each). In the other, it was cut into four parts according to phrase boundaries identified manually by a human expert. Table 1 shows the classification accuracy results (leave-one-out evaluation on the original data). As can be seen, in all cases the classifiers based on training examples of equal length gave better or equal accuracy compared with the phrase-based classifiers. The norm deviation features generally outperformed the score deviation features.

Table 1. Comparison of score and norm deviation measures for different types of distance and different methods of forming training examples. Accuracy in %.

| Features | Distance | Equal-length | Phrase-based |
|----------|----------|--------------|--------------|
| Score deviation | D_s | 52.5 | 50.0 |
| Score deviation | D_r | 60.0 | 52.5 |
| Score deviation | D_sa | 40.0 | 30.0 |
| Score deviation | D_ra | 52.5 | 42.5 |
| Norm deviation | D_s | 82.5 | 77.5 |
| Norm deviation | D_r | 57.5 | 45.0 |
| Norm deviation | D_sa | 45.0 | 45.0 |
| Norm deviation | D_ra | 20.0 | 20.0 |

Figure 2 shows how classification accuracy varies with the length of the training examples (number of soprano notes), using Ballade op. 38 as testing ground and the norm deviation features.

[Figure 2. Classification accuracy vs. training example length (in soprano notes). Vertical axis: accuracy (%), 30 to 100; horizontal axis: training example length, 5 to 60 soprano notes; curves labeled Score and Norm.]
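The evaluation protocol behind Table 1 and Figure 2 can be sketched as follows: cut the training piece into equal-length segments, compute one feature vector per segment, and score by leave-one-out. The paper's base learner is discriminant analysis. To keep the sketch short and dependency-free, a nearest-class-mean classifier is swapped in as a plainly labeled stand-in, and the toy feature vectors are invented for illustration.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, pianist code)


def equal_length_segments(n_notes: int, seg_len: int) -> List[Tuple[int, int]]:
    """(start, end) note indices of consecutive, non-overlapping segments;
    trailing notes that do not fill a whole segment are dropped."""
    return [(s, s + seg_len) for s in range(0, n_notes - seg_len + 1, seg_len)]


def nearest_mean_predict(train: Sequence[Example], query: Sequence[float]) -> str:
    """Stand-in for discriminant analysis: pick the class whose mean
    feature vector is closest to the query (squared Euclidean distance)."""
    by_class: Dict[str, List[Sequence[float]]] = {}
    for features, label in train:
        by_class.setdefault(label, []).append(features)
    centroids = {label: [mean(col) for col in zip(*rows)] for label, rows in by_class.items()}
    return min(centroids,
               key=lambda lbl: sum((q - c) ** 2 for q, c in zip(query, centroids[lbl])))


def leave_one_out_accuracy(examples: Sequence[Example]) -> float:
    """Percentage of examples classified correctly when each is held out in turn."""
    hits = 0
    for i, (features, label) in enumerate(examples):
        rest = list(examples[:i]) + list(examples[i + 1:])
        hits += nearest_mean_predict(rest, features) == label
    return 100.0 * hits / len(examples)


print(equal_length_segments(160, 40))  # [(0, 40), (40, 80), (80, 120), (120, 160)]
toy = [([0.0, 0.0], "#13"), ([0.1, 0.0], "#13"), ([1.0, 1.0], "#14"), ([1.1, 1.0], "#14")]
print(leave_one_out_accuracy(toy))     # 100.0
```

Shorter segments yield more examples but noisier per-segment features. This sketch makes that trade-off easy to explore by varying `seg_len`.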
As Figure 2 shows, the longer the segments that constitute the training examples, the more accurate the classifier. Reliable classifiers therefore require training examples that are as long as possible. This leaves a rather small number of examples, which in turn means that the number of input features per example (segment) should be kept small to avoid overfitting the training data.

4.2 The proposed ensemble

All the above characteristics of the problem suggest the use of an ensemble of classifiers rather than a single classifier. Recent research in machine learning [1, 4] has studied the construction of meta-classifiers thoroughly. In this study, we take advantage of these techniques, constructing an ensemble of classifiers derived from subsampling the input features and subsampling the training data set.

Table 2. Description of the proposed simple classifiers. The third column indicates the number of training examples (and their length in soprano notes) per class.

| Code | Input features | Tr. examples | Acc. (%) |
|------|----------------|--------------|----------|
| C11 | D_s(IOI_n, IOI_m), D_s(OTD_n, OTD_m), D_s(DL_n, DL_m) | 4x40 | 82.5 |
| C21 | D_r(IOI_s, IOI_m), D_r(IOI_s, OTD_m), D_r(DL_s, DL_m) | 12x10 | 50.8 |
| C22 | D_r(IOI_s, IOI_m), D_r(IOI_s, OTD_m), D_r(DL_s, DL_m) | 12x10 | 44.8 |
| C23 | D_r(IOI_s, IOI_m), D_r(IOI_s, OTD_m), D_r(DL_s, DL_m) | 12x10 | 46.7 |
| C24 | D_r(IOI_s, IOI_m), D_r(IOI_s, OTD_m), D_r(DL_s, DL_m) | 12x10 | 48.3 |
| C31 | D_sa(ON_1m, ON_2m), D_sa(ON_1m, ON_3m), D_sa(ON_1m, ON_4m) | 4x40 | 57.5 |
| C32 | D_sa(DL_1m, DL_2m), D_sa(DL_1m, DL_3m), D_sa(DL_1m, DL_4m) | 4x40 | 42.5 |
| C33 | D_sa(ON_1m, ON_2m), D_sa(DL_1m, DL_2m) | 4x40 | 25.0 |
| C34 | D_sa(ON_1m, ON_3m), D_sa(DL_1m, DL_3m) | 4x40 | 35.0 |
| C35 | D_sa(ON_1m, ON_4m), D_sa(DL_1m, DL_4m) | 4x40 | 47.5 |

Table 3. Predictions of the individual simple classifiers on performances of the unseen test set (Etude op. 10/3). The first column indicates the code of the actual performer. Correct predictions are marked with *. The last row summarizes correct guesses.

| Actual | C11 | C21 | C22 | C23 | C24 | C31 | C32 | C33 | C34 | C35 |
|--------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| #13 | #13* | #13* | #16 | #13* | #18 | #13* | #13* | #13* | #13* | #13* |
| #14 | #14* | #21 | #14* | #22 | #22 | #21 | #21 | #13 | #21 | #15 |
| #15 | #21 | #21 | #14 | #21 | #14 | #15* | #13 | #15* | #17 | #13 |
| #16 | #18 | #18 | #16* | #18 | #18 | #16* | #16* | #19 | #16* | #16* |
| #17 | #17* | #17* | #17* | #17* | #17* | #15 | #17* | #16 | #16 | #21 |
| #18 | #13 | #13 | #16 | #18* | #18* | #17 | #17 | #22 | #18* | #14 |
| #19 | #13 | #19* | #19* | #13 | #13 | #16 | #19* | #19* | #16 | #19* |
| #20 | #14 | #21 | #14 | #14 | #14 | #20* | #20* | #14 | #14 | #20* |
| #21 | #14 | #14 | #14 | #14 | #14 | #17 | #17 | #13 | #21* | #14 |
| #22 | #22* | #17 | #19 | #19 | #22* | #16 | #16 | #15 | #16 | #16 |
| Correct | 4 | 3 | 4 | 3 | 3 | 4 | 5 | 3 | 4 | 4 |

The former technique is usually applied when multiple redundant features are available. In our case, the input features cannot all be used together, because the training set is small (only a few training examples per class are available) and overfitting is therefore a real danger.
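The two subsampling techniques named above can be sketched as two small helpers. One restricts every expert to its own short feature list. The other builds overlapping training sets by leaving out one of k disjoint folds, the scheme used for C21-C24 in Table 2. This is an illustration under assumptions: the feature names and the `EXPERT_FEATURES` table are hypothetical labels for the Table 2 inputs, and folds are taken as contiguous blocks of segments. With the 160 soprano notes of the Ballade cut into 10-note segments, dropping one of four folds leaves 12 segments per pianist, which is consistent with the 12x10 entry in Table 2.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical names for a few of the Table 2 input-feature subsets.
EXPERT_FEATURES: Dict[str, List[str]] = {
    "C11": ["Ds_IOIn_IOIm", "Ds_OTDn_OTDm", "Ds_DLn_DLm"],
    "C21": ["Dr_IOIs_IOIm", "Dr_IOIs_OTDm", "Dr_DLs_DLm"],
    "C33": ["Dsa_ON1m_ON2m", "Dsa_DL1m_DL2m"],
}


def select_features(all_features: Dict[str, float], names: Sequence[str]) -> List[float]:
    """Feature subsampling: keep only the inputs assigned to one expert."""
    return [all_features[name] for name in names]


def drop_one_fold_training_sets(examples: Sequence[T], n_folds: int = 4) -> List[List[T]]:
    """Training-set subsampling: split into n_folds disjoint contiguous folds
    and return n_folds overlapping training sets, each omitting one fold."""
    n = len(examples)
    bounds = [(k * n // n_folds, (k + 1) * n // n_folds) for k in range(n_folds)]
    return [list(examples[:lo]) + list(examples[hi:]) for lo, hi in bounds]


segments = list(range(16))                   # e.g. 16 ten-note segments of one pianist
committee = drop_one_fold_training_sets(segments)
print([len(s) for s in committee])           # [12, 12, 12, 12]
print(select_features({"Dsa_ON1m_ON2m": 0.02, "Dsa_DL1m_DL2m": 9.5, "other": 1.0},
                      EXPERT_FEATURES["C33"]))  # [0.02, 9.5]
```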
The latter technique (subsampling the training data) is usually applied when unstable learning algorithms are used to construct the base classifiers. In our case, a subset of the input features (i.e., the score deviation measures) is unstable: their values can change drastically given a slight change in the selected training segments.

Given the scarcity of training data and the multitude of possible features, we propose the use of a relatively large number of rather simple individual base classifiers (or experts, in the terminology of [2]). Each expert is trained using a different set of features and/or parts of the training data. The features and sections of the training performances used for the individual experts are listed in Table 2:
- C11 is based on the deviation of the performer from the norm.
- C21, C22, C23, and C24 are based on the deviation of the performer from the score. They are trained using slightly changed training sets, because the score deviation features are known to be unstable relative to changes in the data. The training set was divided into four disjoint subsets, and four different overlapping training sets were then constructed by dropping one of these four subsets (i.e., cross-validated committees).
- C31, C32, C33, C34, and C35 are based on melody lead features.

The learning algorithm used to construct the individual experts is discriminant analysis, a standard technique of multivariate statistics. It constructs a set of linear functions of the input variables by maximizing the between-group variance while minimizing the within-group variance [5]. The last column of Table 2 shows the accuracy of each individual expert on the training data (estimated via leave-one-out cross-validation). As can be seen, the classifier based on norm deviation features is by far the most accurate.

The resulting simple classifiers, or experts, are combined via a weighted majority scheme. The prediction of each individual classifier is weighted according to its accuracy on the training set [8]. Both the first and the second choice of a classifier are taken into account. Specifically, the weight w_i of the classifier C_i is:

$$w_i = \frac{a_i}{\sum_j a_j}$$

where a_i is the accuracy of the classifier C_i on the training set (see Table 2).
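The sketch below covers the complete voting scheme, including the half weight for each classifier's second choice and the arg-max rule that the text goes on to state. It rests on a few assumptions. Each expert is taken to report a (first, second) choice. The weights are normalized by the sum of all training accuracies, as in the formula above; this is a common factor, so it does not change which class wins. The function names are hypothetical. The sketch is not calibrated to reproduce the exact scores reported in Table 4.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def weighted_majority(choices: Dict[str, Tuple[str, str]],
                      accuracies: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank classes by accumulated weighted votes.

    choices:    expert code -> (first choice, second choice)
    accuracies: expert code -> training accuracy a_i
    A first choice earns w_i = a_i / sum_j a_j; a second choice earns w_i / 2.
    """
    total = sum(accuracies.values())
    scores: Dict[str, float] = defaultdict(float)
    for code, (first, second) in choices.items():
        w = accuracies[code] / total
        scores[first] += w
        scores[second] += w / 2
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def certainty_margin(ranking: List[Tuple[str, float]]) -> float:
    """Gap between the best and second-best class: a rough certainty indicator."""
    return ranking[0][1] - (ranking[1][1] if len(ranking) > 1 else 0.0)


# Three toy experts with made-up accuracies (not the Table 2 values).
ranking = weighted_majority(
    {"C11": ("#13", "#18"), "C21": ("#18", "#13"), "C31": ("#13", "#16")},
    {"C11": 80.0, "C21": 40.0, "C31": 40.0},
)
print(ranking)                    # [('#13', 0.875), ('#18', 0.5), ('#16', 0.125)]
print(certainty_margin(ranking))  # 0.375
```

The first entry of the ranking is the arg-max prediction. The margin to the second entry corresponds to the first-versus-second-choice distance used in the results as an indication of certainty.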
The value a_i/2 is used to compute the weight for the second choice of a classifier. The class receiving the highest total vote is the final class prediction. Specifically, let c_i(x) be the prediction of the classifier C_i for the case x, and let P be the set of possible classes (i.e., pianists). The final prediction is then:

$$\hat{c}(x) = \arg\max_{p \in P} \sum_i w_i \, [c_i(x) = p]$$

where [a = b] is 1 if a is equal to b and 0 otherwise.

4.3 Experimental results

The individual base classifiers defined above were trained on the performances of the Ballade op. 38 by pianists #13-#22; pianists #01-#12 were used to define the performance norm. Both the individual base classifiers and the combined ensemble classifier were then tested on an independent test piece, the Etude op. 10/3. Table 3 shows the classification results for the individual base classifiers. The classification accuracy of each individual classifier ranges between 30% and 50%. The errors of the norm deviation and score deviation classifiers are partially correlated. Common misclassifications are #16 as #18, #19 as #13, #20 as #14, and #21 as #14. The errors of the melody lead classifiers, on the other hand, are largely uncorrelated with those of the others. Uncorrelated errors are crucial for constructing ensembles of classifiers [4].

Table 4. Predictions (first and second choice) of the ensemble of the simple classifiers on performances of the unseen test set (Etude op. 10/3). The first column indicates the code of the actual performer. Correct predictions are marked with *. The last row summarizes correct guesses.

| Actual | 1st choice | Score | 2nd choice | Score |
|--------|------------|-------|------------|-------|
| #13 | #13* | 0.56 | #18 | 0.23 |
| #14 | #14* | 0.31 | #21 | 0.29 |
| #15 | #21 | 0.34 | #14 | 0.25 |
| #16 | #16* | 0.46 | #18 | 0.34 |
| #17 | #17* | 0.47 | #15 | 0.16 |
| #18 | #18* | 0.30 | #13 | 0.26 |
| #19 | #19* | 0.40 | #13 | 0.27 |
| #20 | #14 | 0.42 | #20* | 0.22 |
| #21 | #14 | 0.51 | #22 | 0.15 |
| #22 | #22* | 0.29 | #16 | 0.25 |
| Correct | 7 | | 1 | |

Table 4 shows the classification results of the ensemble classifier. The ensemble correctly identified the pianist in 7 out of 10 cases, an accuracy of 70%. The ensemble thus performs substantially better than any of its constituent classifiers. The score assigned to each prediction can be used as an indication of the classifier's certainty. Thus, the performances by pianists #14, #18, and #22 are the most difficult cases, since the first-choice score exceeds the second-choice score by less than 0.05. Note that 70% is a high success rate in a 10-class task, where random guessing would achieve 10%. Note also that this would be a very difficult task for a human. Imagine that you first hear 10 different pianists performing one particular piece (and that is all you know about the pianists). You then have to identify each of the 10 pianists in a recording of another, quite different piece. We are planning a classification experiment with human listeners to measure human performance on this type of task; we expect it to be substantially lower.

5 CONCLUSIONS

We have presented a computational approach to the problem of discriminating between music performers playing the same piece of music, and introduced a set of features that capture some aspects of the individual style of each performer.
To cope efficiently with this problem, we proposed a classification model that takes advantage of various techniques for constructing meta-classifiers. The results show that the differences between music performers can be quantified. Human experts mostly use aesthetic criteria to recognize different performers; here, we have demonstrated that the individuality of each performer can be captured objectively using machine-interpretable features.

This research is performed in the context of a large research project whose goal is to study fundamental principles of expressive music performance with AI methods. The current study can be seen as another attempt at discovering and quantifying features that are crucial to understanding and modeling this complex phenomenon. The proposed features can be easily computed and do not make use of any piece-specific information (e.g., information extracted by structural or harmonic analysis). However, the results cannot easily be interpreted in terms of traditional music theory. The proposed features are therefore unlikely to help explain the differences between the performers. Such a task would require features associated with particular local musical contexts, together with piece-specific information.

The reliability of our current results is still severely compromised by the very small set of empirical data available. We plan to invest substantial effort in collecting and precisely measuring a larger and more diverse set of performances by a set of different pianists (on a computer-controlled piano). Studying famous concert pianists with this approach would require us to measure timing, dynamics, and articulation precisely from sound recordings, which unfortunately is still an unsolved signal-processing problem.

ACKNOWLEDGEMENTS

This work was supported by the EU project HPRN-CT-2000-00115 (MOSART) and the START program of the Austrian Federal Ministry for Education, Science, and Culture (Grant no. Y99-INF). The Austrian Research Institute for Artificial Intelligence acknowledges basic financial support from the Austrian Federal Ministry for Education, Science, and Culture.

REFERENCES

[1] E. Bauer and R. Kohavi, "An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants," Machine Learning, 39 (1/2), pp. 105-139, (1999).
[2] A. Blum, "Empirical Support for Winnow and Weighted-Majority Based Algorithms: Results on a Calendar Scheduling Domain," Machine Learning, 26 (1), pp. 5-23, (1997).
[3] E. Cambouropoulos, "From MIDI to Traditional Music Notation," in Proc. of the AAAI 2000 Workshop on Artificial Intelligence and Music, 17th National Conf. on Artificial Intelligence, pp. 19-23, (2000).
[4] T. Dietterich, "Ensemble Methods in Machine Learning," in First Int. Workshop on Multiple Classifier Systems, pp. 1-15, (2000).
[5] R. Eisenbeis and R. Avery, Discriminant Analysis and Classification Procedures: Theory and Applications, Lexington, Mass.: D.C. Heath and Co., (1972).
[6] A. Friberg, "Generative Rules for Music Performance: A Formal Description of a Rule System," Computer Music Journal, 15 (2), pp. 56-71, (1991).
[7] W. Goebl, "Skilled Piano Performance: Melody Lead Caused by Dynamic Differentiation," in Proc. of the 6th Int. Conf. on Music Perception and Cognition, (2000).
[8] D. Opitz and J. Shavlik, "Generating Accurate and Diverse Members of a Neural-Network Ensemble," in D. Touretzky, M. Mozer, and M. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8, pp. 535-541, (1996).
[9] C. Palmer, "On the Assignment of Structure in Music Performance," Music Perception, 14, pp. 23-56, (1996).
[10] B. Repp, "Diversity and Commonality in Music Performance: An Analysis of Timing Microstructure in Schumann's Träumerei," Journal of the Acoustical Society of America, 92 (5), pp. 2546-2568, (1992).
[11] G. Widmer, "Using AI and Machine Learning to Study Expressive Music Performance: Project Survey and First Report," AI Communications, 14, pp. 149-162, (2001).