CLASSIFICATION OF RECORDED CLASSICAL MUSIC USING NEURAL NETWORKS

R. Malheiro (a, b), R. P. Paiva (a), A. J. Mendes (a), T. Mendes (a), A. Cardoso (a)

(a) CISUC, Centro de Informática e Sistemas da Universidade de Coimbra, Departamento de Engenharia Informática, PÓLO II da Universidade de Coimbra, Pinhal de Marrocos, P 3030, Coimbra, Portugal. Tel +351-239-790000, Fax: +351-239-70266, e-mail: {ruipedro, toze, tmendes, amilcar}@dei.uc.pt

(b) ESCT, Escola Superior de Ciências e Tecnologia da Universidade Católica Portuguesa, Centro Regional das Beiras - Viseu, Estrada da Circunvalação, P 3504-505, Viseu, Portugal. Tel +351-232-49500, Fax: +351-232-428344, e-mail: rsmal@crb.ucp.pt

Abstract

As a result of recent technological innovations, there has been a tremendous growth in the Electronic Music Distribution industry. Consequently, tasks such as automatic music genre classification pose new and exciting research challenges. Automatic music genre recognition involves issues like feature extraction and the development of classifiers using the obtained features. As for feature extraction, we use the number of zero crossings, loudness, spectral centroid, bandwidth and uniformity. These features are statistically manipulated, making a total of 40 features. Regarding the task of genre modeling, we train a feedforward neural network (FFNN) with the Levenberg-Marquardt algorithm. A taxonomy of subgenres of classical music is used. We consider three classification problems: in the first one, we aim at discriminating between music for flute, piano and violin; in the second problem, we distinguish choral music from opera; finally, in the third one, we aim at discriminating between all the abovementioned five genres together. We obtained 85% classification accuracy in the three-class problem, 90% in the two-class problem and 76% in the five-class problem. These results are encouraging and show that the presented methodology may be a good starting point for addressing more challenging tasks.

Keywords: neural networks, music information retrieval, music classification, music signal analysis

1 Introduction

Presently, whether it is the case of a
digital music library, the Internet or any music database, search and retrieval is carried out mostly in a textual manner, based on categories such as author, title or genre. This approach leads to a certain number of difficulties for service providers, namely in what concerns music labeling. Real-world music databases from sites like AllMusicGuide or CDNOW grow larger and larger on a daily basis, which requires a tremendous amount of manual work for keeping them updated. Thus, simplifying the task of music database organization would be an important advance. This calls for automatic classification systems. Such systems should overcome the limitations resulting from manual song labeling, which may be a highly time-consuming and subjective task.

Some authors have addressed this problem recently. Tzanetakis and Cook [12] classify music into ten genres, namely, classical, country, disco, hip-hop, jazz, rock, blues, reggae, pop and metal. They further classify classical music into choir, orchestra, piano and string quartets. The features used encompass three classes: timbral, rhythmic and pitch-related features. The authors investigate the importance of the features in training statistical pattern recognition classifiers, particularly Gaussian Mixture Models and k-nearest neighbors. An accuracy of 61% was achieved for discriminating between the ten classes. As for classical music classification, an average accuracy of 82.25% was achieved.

Golub [2] uses seven classes of mixed similarity (a cappella, celtic, classical, electronic, jazz, latin and pop-rock). The features used are loudness, spectral centroid, bandwidth and uniformity, as well as statistical features obtained from them. A generalized linear model, a multi-layer perceptron and a k-nearest neighbors classifier were used. The best of them achieved 67% accuracy.

Kosina [5] classifies three highly dissimilar classes (metal, dance and classical) using k-nearest neighbors. The features used were mel-frequency cepstral coefficients, zero-crossing rate, energy and beat. An accuracy of 88% was achieved.

Martin [6] addresses the problem of instrument identification. He proposes a set of features related to the physical properties of the instruments, with the goal of identifying them in a complex auditory environment.

Copyright #### by ASME

In our work we aim at classifying five subgenres of classical music, namely, opera, choral music and music for flute, piano and violin. We chose this task because there are not many studies specifically regarding classical music. Also, digital music libraries have a great diversity of taxonomies of classical music, which demonstrates the practical usefulness of such a classification. Unlike other authors, who use a broad range of generic classes, we chose to focus on a specific set of related classes. Since our classes show a higher degree of similarity, this is, we think, a more difficult classification problem.

We chose a set of features based on those used in [13] and [2], encompassing especially timbre and pitch content, which seemed relevant for the task under analysis: the number of zero crossings, loudness, spectral centroid, bandwidth and uniformity. Rhythmic features were not used. An FFNN classifier is used, which is trained via the Levenberg-Marquardt algorithm. In validation, we obtained 76% accuracy in the five-class problem. Our results, though far from ideal, are satisfactory. Compared to [12], we got a similar accuracy using one more category and a reduced feature set.

This paper is organized as follows. Section 2 describes the process of feature extraction and the features used. In Section 3, a short overview of FFNNs and their application to our music genre recognition problem is presented. Experimental results are presented and analyzed in Section
4. Finally, in Section 5, some conclusions are drawn, as well as possible directions for future work.

2 Feature Extraction

Based on the classification objectives stated above, and taking into account the results obtained in similar works, we gave particular importance to features with some significance for timbral and pitch content analysis. We used no rhythmic features, since they did not seem very relevant for the type of music under analysis. However, we plan to use them in the future and evaluate their usefulness in this context.

We started by selecting 6-second segments from each musical piece (22 kHz sampling, 16-bit quantization, monaural). Since, for training purposes, the segments used should have little ambiguity regarding the category they belong to, we selected relevant segments from each piece. The purpose was not to use long training samples. Instead, short significant segments are used, mimicking the way humans classify music, i.e., from short segments [8], using only music surface features without any higher-level theoretical descriptions [7].

After collecting a relevant segment for each piece, the process of feature extraction starts by dividing each 6 s signal into frames of 23.22 ms with 50% overlap. This particular frame length was defined so that the number of samples in each frame is a power of 2, which is necessary for optimizing the efficiency of Fast Fourier Transform (FFT) calculations [11] (Section 2.2). This gives 512 samples per frame, in a total of 515 frames. Both temporal and spectral features are used, as described below.

2.1 Time-Domain Features

As for temporal features, we use loudness and the number of zero crossings.

Loudness is a perceptual feature that tries to capture the perception of sound intensity. Only the amplitude is directly calculated from the signal. Loudness, i.e., the perception of amplitude, can be approximated as follows [2] (1):

$$L(r) = \log\left(1 + \sum_{n=1}^{N} x(n)^2\right) \tag{1}$$

where L denotes loudness, r refers to the frame number, N is the number of samples in each frame, n stands for the sample number in each frame and x(n) stands for the amplitude of the n-th sample in the current frame.

The number of zero crossings simply counts the number of times the signal crosses the time axis, as follows [13] (2):

$$Z(r) = \frac{1}{2} \sum_{n=1}^{N} \left| \operatorname{sgn}(x(n)) - \operatorname{sgn}(x(n-1)) \right| \tag{2}$$

where Z represents the number of zero crossings. This is a measure of the signal frequency content, which is frequently used in music/speech discrimination and for capturing the amount of noise in a signal [2].
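The framing and the two time-domain features can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the loudness form log(1 + sum of squared amplitudes) is an assumption about the exact expression, and incomplete trailing frames are simply dropped.

```python
import math

FRAME_LEN = 512          # samples per frame (about 23.22 ms at 22050 Hz)
HOP = FRAME_LEN // 2     # 50% overlap between consecutive frames


def split_frames(signal, frame_len=FRAME_LEN, hop=HOP):
    """Split a sequence of samples into overlapping frames (partial tail dropped)."""
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [signal[i * hop:i * hop + frame_len] for i in range(n_frames)]


def loudness(frame):
    """Assumed loudness form: log(1 + sum of squared amplitudes)."""
    return math.log(1.0 + sum(s * s for s in frame))


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def zero_crossings(frame):
    """Half the summed absolute difference between consecutive sample signs."""
    return 0.5 * sum(abs(sign(b) - sign(a)) for a, b in zip(frame, frame[1:]))


if __name__ == "__main__":
    sample_rate = 22050
    # Hypothetical test signal: 6 seconds of a 440 Hz sine tone.
    tone = [math.sin(2 * math.pi * 440 * n / sample_rate)
            for n in range(6 * sample_rate)]
    frames = split_frames(tone)
    print(len(frames), loudness(frames[0]), zero_crossings(frames[0]))
```

With these settings a 6 s clip sampled at 22050 Hz (132300 samples) yields 515 frames. A sample that is exactly zero contributes half a crossing on each side in this sketch, which is a simplification.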

2.2 Frequency-Domain Features

The spectral features used, computed in the frequency domain, are spectral centroid, bandwidth and uniformity. Therefore, the process starts by converting the signal into the frequency domain using the Short-Time Fourier Transform (STFT) [9]. The signal is divided into frames, as stated above. The signal for each frame is then multiplied by a Hanning window, which is characterized by a good trade-off between spectral resolution and leakage [11].

Spectral centroid is the magnitude-weighted mean of the frequencies [2] (3):

$$C(r) = \frac{\sum_{k=1}^{N} M(k) \log_2 k}{\sum_{k=1}^{N} M(k)} \tag{3}$$

where C(r) represents the value of the spectral centroid at frame r and M(k) is the magnitude of the Fourier transform at frame r and frequency bin k. This is a measure of spectral brightness, important, for instance, in music/speech or musical instrument discrimination.

Bandwidth is the magnitude-weighted standard deviation of the frequencies [2], as follows (4):

$$B(r) = \sqrt{\frac{\sum_{k=1}^{N} M(k) \left( C(r) - \log_2 k \right)^2}{\sum_{k=1}^{N} M(k)}} \tag{4}$$

where B(r) represents the spectral bandwidth at frame r. This is a measure of spectral distribution: lower bandwidth values denote a concentration of frequencies close to the centroid (which is the magnitude-weighted mean of the frequencies), i.e., a narrower frequency range.

Uniformity gives a measure of spectral shape. It measures the similarity of the magnitude levels in the spectrum and it is useful for discriminating between highly pitched signals (most of the energy concentrated in a narrow frequency range) and highly unpitched signals (energy distributed across more frequencies) [2]. Uniformity is computed as follows (5):

$$U(r) = -\sum_{k=1}^{N} \frac{M(k)}{\sum_{j=1}^{N} M(j)} \log_N \frac{M(k)}{\sum_{j=1}^{N} M(j)} \tag{5}$$

For each frame, the five features described are extracted. Then, first differences are calculated, based on the feature values in consecutive frames, e.g., L(r) - L(r-1). These five new features plus the five features described before constitute our set of 10 basis features.

Classical music is usually characterized by accentuated variations in the basis features throughout time. Therefore, statistical manipulations of the basis features are calculated in order to cope with this aspect. The means and standard deviations of the ten basis features are calculated in 2-second chunks, leading to 20 features. The final features that compose the signature correspond to the means and standard deviations of the 20 intermediate features computed previously. We get a total of 40 features (2 × 2 × 10).

3 Genre Modelling with FFNNs

Artificial Neural Networks (ANN) [4] are computational models that try to emulate the behavior of the human brain. They are based on a set of simple processing elements, highly interconnected, and with a massively parallel structure. ANNs are characterized by their learning, adapting and generalization capabilities, which make them particularly suited for tasks such as function approximation.

Feedforward Neural Networks (FFNN) are a special class of ANNs, in which all the nodes in some layer l are connected to all the nodes in layer l-1. Each neuron receives information from all the nodes in the previous layer and sends information to all the nodes in the following layer. An FFNN is composed of the input layer, which receives data from the exterior environment, typically one hidden layer (though more layers may be used [10]) and the output layer, which sends data to the exterior environment (Figure 1).

The links connecting each pair of neurons are given some weight, w. This attribution of weights to links is the job of any training algorithm, as described below. Each neuron computes an output value based on the input values received, the weights of the links from the neurons in the previous layer and the neuron's activation function. Usually, sigmoid functions are used [4].

The capability of the FFNN for mapping input values into output values depends on the link weights. Their optimal determination is still an open problem. Therefore, iterative hill-climbing algorithms are used. Their main limitation is that they generally reach only local optima: the global optimum is found only occasionally. In the context of ANNs, these iterative optimization algorithms are called training algorithms.
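The forward computation of such a network can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the bias terms, the uniform random initialization and the layer sizes in the usage lines (40 inputs, 20 hidden neurons, 3 outputs) are assumptions, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Fully connected layer: each neuron applies a sigmoid to its weighted input sum."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_out, n_in, rng):
    """Assumed initialization: small uniform random weights and biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def forward(features, hidden, output):
    """Propagate one feature vector through the hidden and output layers."""
    return layer_forward(layer_forward(features, *hidden), *output)


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = init_layer(20, 40, rng)   # 40 features -> 20 hidden neurons
    output = init_layer(3, 20, rng)    # 20 hidden neurons -> 3 classes
    signature = [rng.random() for _ in range(40)]  # features already in [0, 1]
    print(forward(signature, hidden, output))
```

Each output lies strictly between 0 and 1, so it can be compared against a target vector that holds 1 for the correct class and 0 elsewhere. Training, i.e., choosing the weights, is not shown here.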

Figure 1. FFNN used on the classification of music in three musical genres (flute, piano and violin). The inputs i1 to i40 receive the 1st to 40th features, the hidden layer has neurons j1 to j20, and the outputs o1, o2 and o3 correspond to the flute, piano and violin classes. The input matrix is 40 × 120 and the target matrix is 3 × 120, with one column per music piece.

ANNs are usually trained in a supervised manner, i.e., the weights are adjusted based on training samples (input-output pairs) that guide the optimization procedure towards an optimum. For instance, in the case of our music genre classification, each network input is a vector with the 40 extracted features and each target value has a value of 1 for the correct class and a value of 0 otherwise (Figure 1). Our FFNN is trained in batch mode, i.e., all the training pairs are presented to the network, an error measure is computed and only then the weights are adjusted towards error reduction.

In Figure 1, we have a 40 × 120 input matrix where each row corresponds to a particular feature and each column corresponds to a music feature-vector used for training the network. In the same figure, a 3 × 120 target output matrix is presented, where each column has information regarding the target class for the corresponding music feature-vector: all the rows have zero value, except for the row corresponding to the correct class, which has a value of one. For example, if the T-th music signature denotes a piano piece, and the second output neuron was assigned to the piano category, then the T-th output column would have a value of 1 in the second row, and zero in all other rows.

The most widely used training algorithm for FFNNs is backpropagation [4]. Here, there is a forward pass where inputs are presented to the network and output values are computed. The error between each target value and the corresponding output value is then calculated. Then, a backward pass is performed, where the weights are adjusted towards error reduction, using the gradient descent method. This process is repeated iteratively until the error is below a given threshold.

The gradient descent method has some limitations regarding convergence properties: the algorithm can get stuck in a local minimum and the selection of the learning rate is usually not trivial (if its value is too low, learning is slow; if it is too high, the network may diverge). Therefore, some variants are used, e.g., learning with a momentum coefficient or defining an adaptive learning rate [4]. Here, we use the Levenberg-Marquardt algorithm, which has the advantage of being significantly faster (10 to 100 times faster [1]) at the cost of higher memory consumption, due to the computation of a Jacobian matrix in each iteration. Also, this algorithm converges in situations where others do not [3].

After training, the neural network must be validated, i.e., its response to unknown data must be analyzed in order to evaluate its generalization capabilities. Thus, a forward pass is performed with samples never presented before, and the same error measure used during training is computed. Typically, the available samples are divided into two sets, one for training and the other for validation: 2/3 for the former and 1/3 for the latter. In order to avoid numerical problems, all the features were previously normalized to the [0, 1] interval [1].

4 Experimental Results

As stated before, our goal is to classify classical music into five subgenres: flute, piano, violin, choral and opera. These can be organized in a hierarchical manner, as depicted in Figure 2. The presented taxonomy is defined only for the sake of clarity: the practical classification performed was not hierarchical.

We collected a database of 300 monaural musical pieces (60 for each genre), sampled at 22050 Hz, with 16-bit quantization. For each piece, 6-second segments were extracted, based on their relevance for the genre in question, as stated in Section 2.

Figure 2. Classical music genre classification. Classical music is divided into instrumental music (flute, piano, violin) and vocal music (choral, opera).

Our first goal was to discriminate between three genres of instrumental music: music for flute, piano and violin. The 6 s segments extracted were chosen so as to include solos from each instrument by single or several players in unison, in isolation (monophonic segment) or with an orchestra in the background (polyphonic segment). For example, in the case of violin, we extracted a segment from Spring in Vivaldi's Four Seasons.

In our second goal, we wanted to discriminate between genres of vocal music: chorals and opera. Typically, the musical pieces used for opera were vocal solos, essentially performed by tenors, sopranos and mezzo-sopranos (Callas, Pavarotti, etc.), whereas for choral music, segments of simultaneous distinct voices were used, without many of the stylistic effects used in opera (vibrato, tremolo). Many of the pieces used were also a cappella, i.e., only human voices, no instruments.

Finally, our third goal was to discriminate between all of the five genres referred to above.

For the three problems addressed we used three-layered FFNNs, trained in batch mode via the Levenberg-Marquardt algorithm. Each network consists of 40 input neurons (one for each extracted feature), a variable number of hidden neurons (described below) and 2, 3 or 5 output neurons, according to the problem under analysis. Both hidden and output neurons use sigmoid activation functions.

For training purposes, we used 40 pieces from each genre, whereas for validation the remaining 20 were used (a total of 200 pieces for training and 100 for validation). Special care was taken so that the training samples for each genre were diverse enough.

Validation, i.e., classification of unknown pieces, was carried out under two different perspectives that we designate as percentage calculus rule 1 (PCR1) and percentage calculus rule 2 (PCR2).

PCR1: Under this perspective, a musical piece from a particular genre is well classified when the highest
network output corresponds to that genre and its value is above or equal to 0.7 (recall that the network outputs values between 0 and 1). In this situation, the piece considered is correctly classified, without any ambiguities. When all output values are under 0.7, it is concluded that this particular musical piece does not belong to any of the defined categories: the highest value is not high enough to avoid possible ambiguities. In order to improve class distinguishability, we check, for each well-classified piece, if the second highest network output value is at least 0.2 below the highest one. For this purpose we define the gn2 < 0.2 measure, which represents the percentage of pieces where the second highest value was less than 0.2 below the highest one. For instance, if a musical piece has a value of 0.8 for the right genre (highest value) and 0.65 for the second highest value, this particular piece will count towards the gn2 < 0.2 measure. In this situation, it is concluded that this piece shows some ambiguity regarding those genres.

PCR2: In this case, a musical piece from a particular genre is well classified if the highest network output value corresponds to the right genre, regardless of its amplitude. We further define the gn2 > 0.7 measure, which represents the percentage of wrongly classified pieces where the correct genre corresponds to the second highest value, which is at least 0.7. The idea is to check if the piece is almost correctly classified.

Below we present the results for each of the classification problems addressed.

4.1 First Classification: Three Genres

In this case, musical pieces were classified into flute, piano and violin pieces. For the determination of the most adequate number of neurons in the hidden layer, we tested several values in the range [10, 30]. The best classification results were obtained for 20 neurons in the hidden layer: an average classification accuracy of 83.3% for PCR1 and 85% for PCR2, for the three genres.

Regarding PCR1 analysis (Table 1), we got 85% accuracy for flute, 80% for piano and 85% for violin. Analyzing the results for flute pieces, we also notice that 5% of them were wrongly classified as piano, 5% as violin and 5% did not belong to any of the classes. We also see that the distance between the correct value and the second highest value was always at least 0.2 (gn2 < 0.2 = 0%).
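The two validation rules can be expressed as small decision functions over the vector of network outputs. The sketch below is illustrative only: the function names are hypothetical, and the ambiguity flag is computed for any accepted prediction, whereas the gn2 < 0.2 measure in the text is counted over well-classified pieces.

```python
def rank_outputs(outputs):
    """Indices of the outputs sorted from highest to lowest value."""
    return sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)


def classify_pcr1(outputs, threshold=0.7, margin=0.2):
    """PCR1: accept the top class only if its output reaches the threshold.

    Returns (class index or None, ambiguous flag). The flag is True when the
    second highest output is less than `margin` below the highest one.
    """
    ranked = rank_outputs(outputs)
    best, second = ranked[0], ranked[1]
    if outputs[best] < threshold:
        return None, False  # piece assigned to none of the categories
    return best, (outputs[best] - outputs[second]) < margin


def classify_pcr2(outputs):
    """PCR2: the top class wins regardless of its output value."""
    return rank_outputs(outputs)[0]


def almost_correct_pcr2(outputs, true_index, threshold=0.7):
    """True when a misclassified piece has its correct class ranked second
    with an output of at least `threshold` (the gn2 > 0.7 condition)."""
    ranked = rank_outputs(outputs)
    return (ranked[0] != true_index and ranked[1] == true_index
            and outputs[true_index] >= threshold)


if __name__ == "__main__":
    example = [0.8, 0.65, 0.1]     # the 0.8 / 0.65 example from the text
    print(classify_pcr1(example))  # accepted, but flagged as ambiguous
    print(classify_pcr2(example))
```

Keeping the threshold and margin as parameters makes it easy to see how the stricter rule moves borderline pieces into the unclassified group.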

PCR1 (83.3%)    Flute   Piano   Violin
Flute             85       0       5
Piano              5      80      10
Violin             5       0      85
unclassified       5       0       0
gn2 < 0.2          0       0       0

Table 1. Instrumental music confusion matrix: PCR1 (values in %; columns give the actual genre, rows the network decision).

As for PCR2 analysis (Table 2), we got 90% accuracy for flute, 80% for piano and 85% for violin. For violin pieces, 10% of the pieces were wrongly classified, but the violin class was the second highest value, which was above 0.7 (gn2 > 0.7 = 10%).

PCR2 (85%)      Flute   Piano   Violin
Flute             90      10       5
Piano              5      80      10
Violin             5      10      85
gn2 > 0.7          0       0      10

Table 2. Instrumental music confusion matrix: PCR2 (values in %).

By inspection of the classification errors, we noticed that they occur when the instruments are played in an unusual manner, not included in the training samples. For instance, two violin pieces were classified as piano; they had in common the fact of being extremely slow and having small amplitude variations. However, the output values for the violin class were high (above 0.7), which comes from the fact that the timbral features correctly detected the presence of violins.

4.2 Second Classification: Two Genres

In this situation, musical pieces were classified into opera and choral pieces. We obtained the best classification results with 25 neurons in the hidden layer: an average classification accuracy of 90%, both for PCR1 and PCR2, for the two genres used.

Regarding PCR1 analysis (Table 3), we obtained 90% accuracy for choral pieces and also 90% for opera.

PCR1 (90%)      Choral   Opera
Choral            90        0
Opera              0       90
unclassified       0        0
gn2 < 0.2          0        0

Table 3. Vocal music confusion matrix: PCR1 (values in %).

As for PCR2 analysis (Table 4), we obtained the same results: 90% accuracy both for opera and choral pieces.

PCR2 (90%)      Choral   Opera
Choral            90       10
Opera             10       90
gn2 > 0.7          0        0

Table 4. Vocal music confusion matrix: PCR2 (values in %).

Only two choral pieces and two opera pieces were not correctly classified. One of those choral pieces has some instrumental parts, unlike most of the training samples, which are a cappella. Also, that particular piece has a female voice that clearly stands out, which the average human being could easily classify as opera. As for the two mistaken opera pieces, we could not find
any clear reasons for that behavior. The only conclusion we can draw is that the features used are good enough for the well-behaved cases. For more atypical situations, a more thorough feature analysis is required: elimination of redundant features and/or inclusion of necessary extra features.

4.3 Third Classification: Five Genres

Here, musical pieces were classified into the five categories listed before: flute, piano, violin, opera and choral music. The best classification results were obtained with 20 neurons in the hidden layer for PCR1, with 64% average classification accuracy, and 30 neurons for PCR2, with 76% average classification accuracy, for the five genres used.

Regarding PCR1 analysis (Table 5), we obtained 65% classification accuracy for flute pieces, 65% for piano, 70% for violin, 50% for chorals and 70% for opera.

PCR1 (64%)      Flute   Piano   Violin   Choral   Opera
Flute             65       5       5        0       0
Piano              0      65       0        0       0
Violin             0       0      70        0       0
Choral             5       0       5       50       0
Opera              0       0       5        5      70
unclassified       0       0       5        5      20
gn2 < 0.2          0       0      20        5       5

Table 5. Mixed classification confusion matrix: PCR1 (values in %).

As for PCR2 analysis (Table 6), the classification accuracy was 75% for flute pieces, 65% for piano, 85% for violin, 75% for chorals and 80% for opera.

Though interesting, the results obtained for this more complex classification problem are less satisfactory. It is clear that the features used could not separate the five classes in a totally unambiguous manner. Therefore, a deeper feature analysis seems fundamental in order to obtain better results.

PCR2 (76%)      Flute   Piano   Violin   Choral   Opera
Flute             75      20       0        0       0
Piano              5      65       0        5       5
Violin             0       5      85        0       0
Choral             0       5       0       75       5
Opera              0       5       5        0      80
gn2 > 0.7          0       5       5        0       0

Table 6. Mixed classification confusion matrix: PCR2 (values in %).

5 Conclusions

The main goal of this paper was to present a methodology for the classification of classical music. Although the results obtained are not sufficient for real-world applications, they are promising.

In the most complex case, where we defined five categories, the classification results were less accurate. However, in our opinion, a hierarchical classifier, following the structure in Figure 2, would lead to better results.

In the future, we will conduct a more thorough analysis of the feature space: detection and elimination of redundant features, as well as definition and utilization of other features, which may help to discriminate the more atypical cases. Additionally, we plan to use a broader and deeper set of categories, i.e., more basis classes and subclasses. In case we use categories like waltz, rhythmic features, not used in the present work, will certainly be important.

Acknowledgments

This work was partially supported by the Portuguese Ministry of Science and Technology (MCT), under the program PRAXIS XXI.

References

[1] Demuth, H., Beale, M., Neural Network Toolbox User's Guide, version 4, Mathworks, 2001.
[2] Golub, S., Classifying Recorded Music, MSc Thesis, University of Edinburgh, 2000.
[3] Hagan, M., Menhaj, M., Training Feedforward Networks with the Marquardt Algorithm, IEEE Transactions on Neural Networks, vol. 5, no. 6, November 1994.
[4] Haykin, S., Neural Networks: A Comprehensive Foundation, Macmillan College Publishing, 1994.
[5] Kosina, K., Music Genre Recognition, MSc Thesis, Hagenberg, 2002.
[6] Martin, K., Toward Automatic Sound Source Recognition: Identifying Musical Instruments, NATO Computational Hearing Advanced Study Institute, Il Ciocco, Italy, 1998.
[7] Martin, K. D., Scheirer, E. D., Vercoe, B. L., Musical content analysis through models of audition, ACM Multimedia Workshop on Content-Based Processing of Music, 1998.
[8] Perrot, D., and Gjerdingen, R. O., Scanning the dial: An exploration of factors in the identification of musical style, Proceedings of the 1999 Society for Music Perception and Cognition.
[9] Polikar, R., The Wavelet Tutorial, http://engineering.rowan.edu/~polikar/wavelets/WTtutorial.html, available by July 2003.
[10] Sarle, W. (maintainer), Neural Nets FAQ, ftp://ftp.sas.com/pub/neural/FAQ3.html, 2001.
[11] Smith, S., The Scientist and Engineer's Guide to Digital Signal Processing.
[12] Tzanetakis, G. and Cook, P., Musical Genre Classification of Audio Signals, IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, July 2002.
[13] Tzanetakis, G., Essl, G. and Cook, P., Automatic Musical Genre Classification of Audio Signals, ISMIR 2001.