Exploring Melodic Features for the Classification and Retrieval of Traditional Music in the Context of Cultural Source

Jan Miles Co
Ateneo de Manila University
Quezon City, Philippines

Andrei Coronel
Ateneo de Manila University
Quezon City, Philippines

ABSTRACT
The shift in consumer preference from CD sales to music downloading has created a major change in the distribution of music. Music industries are focusing on online products and services over the sales of physical media formats. As the number of available musical recordings rapidly increases, websites face the problem of music organization. Music Information Retrieval (MIR) is a relatively new field whose main goal is to develop systems that allow users to retrieve music by its content rather than by text-based searching. This study attempts to create an accurate MIR system that uses melodic features, with similarity in terms of cultural source, as a means of retrieving MIDI files. Experiments for the identification of feature sets for the MIR system were conducted, and the resulting feature sets were implemented in a Proposed MIR System. The various feature sets are presented as results, together with tests concerning the consistency and quality of the Proposed MIR System.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: Information Search and Retrieval - retrieval models, search process, selection process; H.5 [Information Interfaces and Presentation]: Sound and Music Computing - methodologies and techniques, modeling, systems.

General Terms
Management, Performance, Design, Experimentation.

Keywords
Music, MIDI, Retrieval, Similarity.

1. INTRODUCTION
Several studies have been conducted on the representation of culture in music due to its impact on creation, performance, and interpretation. Cultural background affects people's preference for music: a group of people with the same cultural background often prefers a similar set of music. Therefore, apart from genre, culture can be used for music organization, especially in an era where most people have access to international music [1].

Music Information Retrieval (MIR) is a relatively new field whose main goal is to develop systems that allow users to retrieve music by its content rather than by text-based search. A community-based framework called the Music Information Retrieval Evaluation eXchange (MIREX) was inaugurated in 2005 for the formal evaluation of MIR systems and algorithms. Activities in MIREX reflect the main concerns and specific interests of the community with regard to MIR research [3]. The task lists for MIREX 2005, 2006, and 2007 include: Audio Train/Test Tasks with the following subtasks: Audio Artist Identification, Audio Genre Classification, Audio Music Mood Classification, and Audio Classical Composer Identification; Symbolic Genre Classification; Audio Onset Detection; Audio Key Detection; Symbolic Key Detection; Audio Tag Classification; Audio Cover Song Identification; Real-time Audio to Score Alignment (a.k.a. Score Following); Query by Singing/Humming; Multiple Fundamental Frequency Estimation and Tracking; Audio Chord Estimation; Audio Melody Extraction; Query by Tapping; Audio Beat Tracking; Audio Music Similarity and Retrieval; Symbolic Melodic Similarity; Structural Segmentation; Audio Drum Detection; and Audio Tempo Extraction [3]. Of the 19 tasks listed, only 3 involve the processing of symbolic files,
mainly because many researchers in MIR have backgrounds in signal-processing fields such as electrical engineering, acoustics, and speech processing [2]. This study attempts to create a new method for the accurate retrieval of symbolic files, where cultural source is used as a similarity measure for the retrieval of songs. Section 2 gives a brief description of the techniques adapted from previous literature. Section 3 introduces the experimental flow. Section 4 presents the results of the various experiments. Finally, Sections 5 and 6 conclude the study.

2. REVIEW OF RELATED LITERATURE
[4] used audio signals to classify music according to cultural source. Timbre features, wavelet-based features, and musicology-based features were extracted from music recordings and used to classify music into six cultural styles: Western classical music, Chinese traditional music, Japanese traditional music, Indian classical music, Arabic folk music, and African folk music. A dataset of 1300 music recordings, with roughly the same number of recordings per category, was used: 900 pieces for training and 400 for testing. Three supervised classifiers were used: decision tree, KNN, and SVM. An overall accuracy of 86% was achieved by using a combination of the features with SVM. For Western and Oriental (Chinese and Japanese) music, an accuracy of 94% was achieved. It was determined that timbre features are the most effective for cultural classification and can be slightly improved with additional features. The research suggested the exploration of pair-wise style classification, which is adapted in this study.

[5] studied 21 musical features in an attempt to create a Genetic Algorithm (GA) fitness function. The 21 features reflect the pitch, tonality, melodic contour, rhythm, and repeated patterns or motifs of a melody. They were extracted from 36 melodies: 18 classical melodies, 10 pre-classical compositions, 6 traditional nursery rhymes, and 2 melodies from popular tunes. The features with high standard deviation were determined to be those that can be used to characterize melodic style: pitch range, non-scale notes, step movement, note density, repeated duration, repeated pitch patterns of 3 notes, repeated pitch patterns of 4 notes, repeated duration patterns of 3 notes, and repeated patterns of 4 notes. The 21 features identified by [5] will be used in this study as a feature set for cultural classification.

[6] mapped the 21 features used by [5] to 13 features available in jsymbolic, a Java-based tool that extracts 160 features from MIDI files [7]. The 13 features were used for genre classification. Among the 160 features that jsymbolic can extract, [6] also identified 54 features that pertain to melody alone. These were used to determine a subset of features useful for classifying music by genre with a C4.5 decision tree. In this study, the same technique is applied to the classification of traditional music according to cultural source: the 13 features of [5] as mapped by [6] and the 54 melodic features identified by [6] are used as feature subsets for cultural classification, and the C4.5 decision tree is again used as a feature selection tool to reduce the subsets.

[8] performed feature selection to improve the automatic genre classification of traditional Malay music, using 10 traditional Malay genres: Dikir Barat, Etnik Sabah, Gamelan, Ghazal, Inang, Joget, Keroncong, Tumbuk Kalang, Wayang Kulit, and Zapin. A dataset of 191 songs was converted into WAV files and loaded into MARSYAS-0.2.2, a tool that extracts timbral texture, rhythmic content, and pitch content features. 73 features were extracted, composed of STFT, MFCC, and beat features. 7 feature selection methods were used: Correlation-based Feature Selection (CFS) with Best First Search, CFS with Genetic Search Strategy, CFS with Greedy Stepwise Search Strategy, Principal Component Analysis, Chi-square Feature Evaluation, Gain Ratio Feature Evaluation, and SVM-based Feature Evaluation. To test the effectiveness of the resulting feature sets, 18 classification algorithms were used. The most effective feature selection algorithms for Naive Bayes were CFS with Best First Search and CFS with Greedy Stepwise Search, whereas Chi-square Feature Evaluation was most effective for SVM; however, no threshold was indicated for the Chi-square ranking. As a result, CFS with Best First Search and CFS with Greedy Stepwise Search are adapted in this study as feature selection techniques for cultural classification with Naive Bayes and SVM.

3. METHODOLOGY
3.1 Creation of Feature Sets
400 MIDI files were collected from Musica International [9]: 100 Scottish, 100 English, 100 American, and 100 French traditional songs. Four classification challenges were created: Challenge 1: UK vs. NonUK; Challenge 2: Scottish vs. English; Challenge 3: American vs. French; Challenge 4: Scottish, English, American, and French. The corresponding datasets were prepared as follows:
1. Dataset 1: 100 Scottish and 100 English songs labeled as UK, and 100 American and 100 French songs labeled as NonUK, for a total of 400 songs.
2. Dataset 2: 100 Scottish and 100 English songs, for a total of 200 songs.
3. Dataset 3: 100 American and 100 French songs, for a total of 200 songs.
4. Dataset 4: 100 Scottish, 100 English, 100 American, and 100 French songs, for a total of 400 songs.
Each of the four datasets was loaded in jsymbolic for feature extraction, and only the 54 melodic features identified by [6] were selected, as sketched below.
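This filtering step can be reproduced with Weka's attribute filters. The following is a minimal sketch, assuming the jsymbolic output has been converted to an ARFF file; the file name and the attribute indices are placeholders, since the actual positions of the 54 melodic features depend on the export:

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Remove;

    public class SelectMelodicFeatures {
        public static void main(String[] args) throws Exception {
            // Load one dataset's extracted features (hypothetical ARFF export).
            Instances data = new DataSource("dataset1_jsymbolic.arff").getDataSet();

            // Zero-based indices of the attributes to keep; placeholders for
            // the 54 melodic features plus the class attribute.
            int[] keep = {0, 3, 7, data.numAttributes() - 1};

            Remove remove = new Remove();
            remove.setAttributeIndicesArray(keep);
            remove.setInvertSelection(true);   // keep the listed attributes, drop the rest
            remove.setInputFormat(data);
            Instances melodicOnly = Filter.useFilter(data, remove);
            System.out.println("Retained attributes: " + melodicOnly.numAttributes());
        }
    }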
Each of the four datasets was loaded in WEKA 3, an open-source collection of machine learning algorithm implementations in Java [10]. The C4.5 decision tree, through its Java implementation J48, was used to identify the subset of features for classification. Each of the four datasets was then reloaded in jsymbolic for feature extraction, with only the subsets of features identified by C4.5 selected:
1. Dataset 1: 25 features.
2. Dataset 2: 21 features.
3. Dataset 3: 9 features.
4. Dataset 4: 32 features.
Each of the four datasets was loaded in WEKA 3 for classification with two algorithms, Naive Bayes and SVM, using 10-fold cross-validation. Each of the four datasets was also loaded in jsymbolic with only the 13 features mapped by [6] selected; after feature extraction, each dataset was again classified in WEKA 3 using Naive Bayes and SVM with 10-fold cross-validation.

Table 1. Result of Classification

                                       Naïve Bayes          SVM
                                       Towsey    C4.5       Towsey    C4.5
  UK-NonUK                             65%       69%        72%       76.75%
  Scottish-English                     64%       80%        71%       83.5%
  American-French                      84.5%     80%        86%       83.5%
  Scottish-English-American-French     54.75%    53.75%     60.50%    65.75%

According to the results, a higher accuracy is achieved by the C4.5 feature sets for challenges 1 and 2 and by the Towsey feature set for challenge 3, whether classifying with Naive Bayes or SVM. For challenge 4, the Towsey feature set performs better with Naive Bayes, while the C4.5 feature set performs better with SVM. The winning feature set for each classification challenge was selected for feature selection: the C4.5 feature set for challenge 1, the C4.5 feature set for challenge 2, the Towsey feature set for challenges 3 and 4, and the C4.5 feature set for challenge 4. The classification step that produced Table 1 is sketched below.
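A minimal sketch of this pipeline using the Weka Java API: J48 (the Java implementation of C4.5) is built and printed, and the attributes appearing in the printed tree form the reduced subset; Naive Bayes and SMO (Weka's SVM implementation) are then evaluated with 10-fold cross-validation. The ARFF file name is hypothetical, and default classifier parameters are assumed, since the paper does not state its exact configuration:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.functions.SMO;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ClassifyChallenge {
        public static void main(String[] args) throws Exception {
            // Hypothetical ARFF export of one dataset's features.
            Instances data = new DataSource("dataset1_melodic54.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);   // class label is last

            // Build a J48 (C4.5) tree; the attributes appearing in the
            // printed tree form the reduced feature subset.
            J48 tree = new J48();
            tree.buildClassifier(data);
            System.out.println(tree);

            // 10-fold cross-validation with Naive Bayes and SMO (Weka's SVM).
            Evaluation nb = new Evaluation(data);
            nb.crossValidateModel(new NaiveBayes(), data, 10, new Random(1));
            System.out.printf("Naive Bayes accuracy: %.2f%%%n", nb.pctCorrect());

            Evaluation svm = new Evaluation(data);
            svm.crossValidateModel(new SMO(), data, 10, new Random(1));
            System.out.printf("SVM (SMO) accuracy: %.2f%%%n", svm.pctCorrect());
        }
    }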

Four feature selection algorithms were used: Multiple Linear Regression (MLR), Correlation-based Feature Selector (CFS) with Best First Search (BFS), CFS with Greedy Stepwise, and CFS with Genetic Search. The datasets with each of the five winning feature sets were loaded in R for MLR. The formula used to create the linear models is:

model1 <- lm(category ~ f1 + f2 + f3 + f4 + ... + fn, dataset1)

where category corresponds to 0 or 1 (for challenges 1 to 3) or 0 to 3 (for challenge 4), and f1 to fn represent the features. The feature sets were reduced according to the coefficients of the features: if two or more coefficients are similar, only one of the corresponding features is retained and the others are disregarded.

The datasets with each of the five winning feature sets were also loaded in WEKA 3 for feature selection with CFS with BFS, CFS with Greedy Stepwise, and CFS with Genetic Search, using the datasets as full training sets. The datasets were then reloaded in jsymbolic to extract the reduced winning feature sets, and the datasets with the reduced feature sets were loaded into WEKA 3 for classification with Naive Bayes and SVM, using 10-fold cross-validation.

The four datasets with the 54 jsymbolic melodic features were likewise loaded in WEKA 3 for feature selection with CFS with BFS, CFS with Greedy Stepwise, and CFS with Genetic Search, using the datasets as full training sets. The four datasets were reloaded in jsymbolic to extract the reduced melodic feature sets, and the datasets with the reduced feature sets, as well as the datasets with the full melodic feature set, were loaded into WEKA 3 for classification with Naive Bayes and SVM, using 10-fold cross-validation.

3.2 Creation of Proposed MIR System
A typical MIR system works according to the following steps:
1. Input a query song (the actual song file).
2. Trigger the software to search for similar songs.
3. The software computes similarities.
4. The software returns the top 10 most similar songs.
In the typical setup above, feature extraction and distance computation are done in real time. Generally, this affects the turnaround time of the software, which depends heavily on the complexity of feature extraction as well as the number of songs in the database. Since this research is mainly focused on the implementation of the various feature sets and their effectiveness in the retrieval of similar music, feature extraction and distance computation were precomputed, using jsymbolic for feature extraction and R for distance computation. The resulting similarity matrices replace the computation that the MIR system would otherwise have to perform in order to retrieve the most similar songs. The Proposed MIR System and its User Interface (UI) were developed in the NetBeans IDE with Java version 1.6.0_45 on OS X.

Figure 1. UI of Proposed MIR System

The Proposed MIR System was designed to accept a song name as the query. The system then uses the various matrices to:
1. Identify the song's category. A distance matrix containing the distances of each song from four representative songs (one Scottish, one English, one American, and one French) was computed using feature set 4 (for classification between Scottish, English, American, and French songs). The category of the representative song at the smallest distance is taken as the song's category.
2. Reduce the candidate songs to half. The dataset contains 400 MIDI files, so the candidates are reduced to 200. A distance matrix containing the distances between all pairs of songs was computed using feature set 1 (for classification between UK and NonUK songs).
The idea is to retrieve similar songs from the same larger class, searching among only UK or only NonUK songs.
3. Reduce the candidate songs to 10. Two distance matrices were created for this step: one computed using feature set 2 (for classification between Scottish and English songs) and one using feature set 3 (for classification between American and French songs). If the identified category is Scottish or English, the first matrix is used; otherwise, the second. The distances of the 200 remaining candidate songs are looked up in the chosen matrix, and the songs with the 10 smallest values are returned as the top 10 most similar songs (a sketch of this selection appears at the end of this section).

The additional features of the system, represented by the buttons Create Baseline File, Compute Baseline, Clear output.csv, and Activate Genre Method, were developed for the automated computations involved in testing the system, which is discussed in the next section.
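A minimal sketch of the distance computation and the final selection step, assuming the precomputed matrices have been loaded into memory; the class, method names, and data layout are illustrative rather than the system's actual code:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    public class TopTenRetrieval {

        // Euclidean distance between two feature vectors, as used for the
        // precomputed distance matrices.
        static double euclidean(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            return Math.sqrt(sum);
        }

        // Given one row of a precomputed distance matrix (distances from the
        // query to every song) and the 200 candidate indices kept after the
        // UK/NonUK reduction, return the 10 candidates at smallest distance.
        static List<Integer> topTen(final double[] queryRow, List<Integer> candidates) {
            List<Integer> sorted = new ArrayList<Integer>(candidates);
            Collections.sort(sorted, new Comparator<Integer>() {
                public int compare(Integer i, Integer j) {
                    return Double.compare(queryRow[i], queryRow[j]);
                }
            });
            return sorted.subList(0, Math.min(10, sorted.size()));
        }
    }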

4. RESULTS AND ANALYSIS
4.1 Feature Sets
4.1.1 Results

Table 2. Feature Sets with the Highest Classification Accuracy

  Feature Set                                Classifier     Features   Accuracy
  UK and NonUK Classification
    C4.5                                     SVM            25         76.75%
    C4.5 reduced by MLR                      Naïve Bayes    -          70.25%
  Scottish and English Classification
    C4.5                                     SVM            21         83.5%
    C4.5                                     Naïve Bayes    21         80%
  American and French Classification
    54 jsymbolic Melodic                     SVM            54         86.5%
    Towsey                                   Naïve Bayes    13         84.5%
    Towsey reduced by CFS Greedy Stepwise    Naïve Bayes    2          84.5%
  Scottish, English, American, and French Classification
    54 jsymbolic Melodic                     SVM            54         68%
    Towsey reduced by CFS BFS, CFS Genetic
    Search, and CFS Greedy Stepwise          Naïve Bayes    -          57.5%

The columns give the feature set, the classifier used, the number of features, and the classification accuracy. For UK and NonUK classification, the highest accuracy is achieved when the C4.5 feature set reduced by MLR is used with Naive Bayes (70.25%) and when the C4.5 feature set is used with SVM (76.75%). For Scottish and English classification, the highest accuracy is achieved when the C4.5 feature set is used with Naive Bayes (80%) and SVM (83.5%). For American and French classification, the highest accuracy is achieved when the Towsey feature set is used with Naive Bayes (84.5%) or when the Towsey feature set reduced by CFS Greedy Stepwise is used with Naive Bayes (84.5%); for SVM, the highest accuracy is achieved with the 54 jsymbolic melodic features (86.5%). For Scottish, English, American, and French classification, the highest accuracy is achieved when the Towsey feature set reduced by CFS BFS, CFS Genetic Search, and CFS Greedy Stepwise is used with Naive Bayes (57.5%) and when the 54 jsymbolic melodic features are used with SVM (68%).

4.1.2 Analysis
For classification challenges 1 and 2 (UK and NonUK; Scottish and English), the C4.5 decision tree serves as a good feature selection technique that can improve the cultural classification of traditional music. For classification challenge 3 (American and French), the 13-feature Towsey set performs the same as the Towsey set reduced by CFS Greedy Stepwise, which has only 2 features, when used with Naive Bayes. It can be concluded that only two features, Pitch Variety and Range, are needed for this classification, which implies that the other features are irrelevant when Naive Bayes is used; they may either be strongly correlated with each other or uncorrelated with the classes.

The Towsey feature set only works well with classification challenges 3 and 4, where American and French classification is involved. The reason can be traced back to the study by Towsey, which used 36 melodies: 18 classical melodies by Bach, Mozart, Beethoven, and Tchaikovsky; 10 pre-classical compositions by Du Fay, Gesualdo, Gibbons, Monteverdi, and Palestrina; 6 traditional nursery rhymes; and 2 melodies from popular tunes [5]. The melodic style captured by the identified feature set might only apply to these types of songs.

An accuracy of at least 70% was achieved only in the first 3 classification challenges; it is more difficult to classify music into 4 classes, specifically Scottish, English, American, and French. In most cases, feature selection techniques are needed to achieve the highest accuracy. In classification challenge 1, the C4.5 feature set reduced from the 54 jsymbolic melodic features is needed to achieve the highest accuracy with SVM.
MLR is needed to further reduce the C4.5 feature set to achieve the highest accuracy with Naive Bayes. In classification challenge 2, the C4.5 feature set reduced from the 54 jsymbolic melodic features is also needed to achieve the highest accuracy with both Naive Bayes and SVM. In classification challenge 3, the Towsey feature set is enough to achieve the highest accuracy with Naive Bayes, while classification with SVM does not require any feature selection. In classification challenge 4, the Towsey feature set must be reduced by CFS with any of the three search strategies (BFS, Genetic Search, or Greedy Stepwise) for classification with Naive Bayes; no feature selection is needed to achieve the highest accuracy with SVM.

Among the eight cases, only three do not require feature selection: challenge 3 with the Towsey feature set and Naive Bayes, challenge 3 with the 54 jsymbolic melodic features and SVM, and challenge 4 with the 54 jsymbolic melodic features and SVM. For the classification challenges that require feature selection for higher accuracy, various feature selection techniques must be used; different challenges require different techniques. In terms of classification algorithm, SVM performs better in the cultural classification of traditional music.

4.2 Proposed MIR System
4.2.1 Dataset
Two datasets were used to test the Proposed MIR System. The first was the original dataset used to train the feature sets: 400 MIDI files (100 Scottish, 100 English, 100 American, and 100 French traditional songs) collected from [9]. The second dataset consists of the songs of the first dataset together with 20 new songs (5 Scottish, 5 English, 5 American, and 5 French traditional songs), also collected from [9], for a total of 420 traditional songs. Only 5 new songs were collected per category due to the lack of new traditional MIDIs in [9].

4.2.2 Consistency
Using the original dataset, the leave-one-out method was used to determine the behavior of the system in terms of retrieving the top 10 similar songs for a query song. The method can be described by the following steps:
1. Input a query song.
2. Record the top 10 retrieved songs.
3. Input each of the recorded songs as a query song and record the retrieved songs.
4. Compare the first recorded list of songs with each of the newly recorded lists.
5. Score each of the lists, with 100% corresponding to 9 shared songs (9 is the highest possible number, since one song of the list was used as the query). The consistency is given by:

Consistency = (Number of Shared Songs / 9) * 100

One song was chosen at random for each of the categories: 79thfare for Scottish, abidewit for English, afamousr for American, and adieusui for French. After the first experiment, it was observed that some of the songs had identical names. To remove this ambiguity, the songs were renamed, with the category appended to the song name (for example, 79thfare-scottish), and the experiment was repeated. For visualization, the values were plotted as a line graph: the x-axis corresponds to the rank of the query song, and the y-axis corresponds to the consistency (%) of the song used.

Figure 2. Consistency of Proposed MIR System Using 4 Songs

A decreasing trend was observed. Because of this observation, the experiment was redesigned: to obtain results more descriptive of the entire dataset, 20 songs (20% of the dataset) were selected per category, for a total of 80 songs used for initial querying. The mean of the consistencies (%) was then plotted on a graph with the same structure as the previous plot.

Figure 3. Consistency of Proposed MIR System Using 80 Songs

By taking the mean over 20 songs per category, the plot presents results that are more descriptive of the entire dataset. The lines are smoother than in the original plot, possibly because outliers were tamed by averaging. The same decreasing trend can be observed, and it holds for all categories. It can be inferred that the decreasing trend is caused by the level of similarity of the songs used as queries: when songs from the lower part of the recorded list are used, they return fewer songs in common with the recorded list, since they are less similar to the original query. Thus, the decreasing trend can be expected from a typical MIR system. Note that the overlapping of the lines is irrelevant in terms of displaying a relationship between the categories; the y-axis simply corresponds to the level of consistency, so the overlaps are insignificant. A sketch of the consistency score is given below.
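A minimal sketch of the consistency score, assuming the top-10 lists are available as lists of song names (the method and parameter names are hypothetical):

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class ConsistencyScore {
        // Consistency = (number of songs shared by the two lists / 9) * 100.
        // original: top-10 list of the initial query; requeried: top-10 list
        // obtained when one of those recorded songs is itself the query.
        static double consistency(List<String> original, List<String> requeried,
                                  String requerySong) {
            Set<String> base = new HashSet<String>(original);
            base.remove(requerySong);   // the re-query song cannot retrieve itself
            int shared = 0;
            for (String s : requeried) {
                if (base.contains(s)) shared++;
            }
            return shared / 9.0 * 100.0;
        }
    }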
4.2.3 Quality
4.2.3.1 Genre/Class Method
To gauge the ability of the Proposed MIR System to retrieve melodically similar songs, the Genre Method was used. The same method was used by [11] to test the quality of his proposed music similarity algorithm. The principle behind the method is that if the genre of the query matches the genre of the most similar retrieved song, the two are probably genuinely similar. For this experiment, cultural source was used as the category instead of genre, and the original dataset was used. The method can be described by the following steps:
1. Input a query song.
2. Retrieve the class of the top most similar song.
3. Record the class of the query song and the class of the top most similar song.
4. Repeat until all songs in the system have been used as a query.
5. Compute the accuracy of prediction.
In effect, the Proposed MIR System is used to predict the classification of the query by assigning to it the category of the top most similar song. A confusion matrix was used to compute the accuracy of prediction; an overall accuracy of 52.75% was achieved. A sketch of this procedure is given below.
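A minimal sketch of the Genre/Class computation, assuming the precomputed retrieval lists and category labels are held in maps (names are hypothetical):

    import java.util.List;
    import java.util.Map;

    public class GenreClassMethod {
        // For each song used as a query, predict its class as the class of
        // the top retrieved song, then report the fraction predicted correctly.
        // classOf maps each song to its cultural category; topTenOf holds the
        // system's precomputed top-10 retrieval lists.
        static double accuracy(List<String> allSongs,
                               Map<String, String> classOf,
                               Map<String, List<String>> topTenOf) {
            int correct = 0;
            for (String query : allSongs) {
                String nearest = topTenOf.get(query).get(0);   // top most similar song
                if (classOf.get(query).equals(classOf.get(nearest))) correct++;
            }
            return 100.0 * correct / allSongs.size();
        }
    }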

Figure 4. Confusion Matrix Produced by Genre/Class Method

In the first experiment, it was established that some songs are present in two or more categories. This implies that fuzzy boundaries exist between the categories, which can be used to interpret the values in the confusion matrix created in the Genre/Class experiment. The table below summarizes the songs that belong to two or more categories; the 1's indicate each song's categories.

Figure 5. Songs Belonging to Two or More Categories

It can be noticed that the categories Scottish, English, and American are closely related, which helps explain why a high number of Scottish, English, and American songs are labeled as one of the other two. Interestingly, the category French is more distinct from the other categories. This is consistent with the confusion matrix: only a few Scottish, English, and American songs are falsely labeled as French, and only a few French songs are falsely labeled as Scottish, English, or American.

Folk songs are commonly shared by oral tradition; as a result, geographic location might be expected to explain the fuzzy boundaries between the categories. The succeeding images present the locations of Scotland, England, America, and France.

Figure 6. World Map from [12]

Figure 7. UK Map from [13]

The fuzzy boundary between Scottish and English songs can be supported by geography: the countries are adjacent. On the contrary, the fuzzy boundary between Scottish, English, and American songs seems peculiar: while France is located much nearer to Scotland and England, the boundary between Scottish, English, and American songs is fuzzier than would be expected. Walter Wiora's work can explain this odd relationship. In his study of German folk songs, he identified seven types of variation a folk song can undergo: changes in contour, changes in tonality, changes in rhythm, insertion and deletion of parts, changes in form, changes in expression, and demolition of the melody. The difficulty of understanding melodic variation caused by oral transmission was also stated by Bertrand Bronson [14]. Taking this into consideration, geography does not guarantee that countries relatively close to each other will have more similar songs.

4.2.3.2 New Songs Method
The previous experiment excludes 9 of the 10 retrieved songs. To take all retrieved songs into account, another experiment was performed using the second dataset, with the new songs used for querying. The method can be described by the following steps:
1. Input a new song (a song that does not belong to the training set) as a query.
2. Record the categories of the retrieved songs.
3. Compute the accuracy of prediction for each possible category.
In this experiment, the classification of the query song relies not only on the first retrieved song but on all retrieved songs. The accuracy is computed as:

Accuracy = (Number of Correct Classifications / 10) * 100
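A minimal sketch of this per-category accuracy, assuming the categories of the 10 retrieved songs are given as a list (names are hypothetical):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class NewSongsMethod {
        // Accuracy per category = (retrieved songs of that category / 10) * 100.
        // retrievedClasses holds the categories of the 10 songs returned for
        // one new query song.
        static Map<String, Double> categoryAccuracy(List<String> retrievedClasses) {
            Map<String, Double> acc = new HashMap<String, Double>();
            for (String c : retrievedClasses) {
                Double cur = acc.get(c);
                acc.put(c, (cur == null ? 0 : cur) + 10.0);   // each song adds 10%
            }
            return acc;
        }
    }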

Figure 8. New Songs Method for Scottish Songs

The close relationship of Scottish, English, and American songs is evident when the new Scottish songs are used as queries. In the case of the 5th new Scottish song, all retrieved songs were Scottish.

Figure 9. New Songs Method for English Songs

The close relationship of Scottish, English, and American songs is also evident when the new English songs are used as queries. Mostly English songs were retrieved, except for the 2nd song, for which no English songs were retrieved.

Figure 10. New Songs Method for American Songs

For all of the new American songs except the 3rd, the majority of the retrieved songs are American. The close relationship of Scottish, English, and American songs is again evident.

Figure 11. New Songs Method for French Songs

The 1st and 2nd songs retrieved mostly French songs. However, for the 3rd and 4th songs, no French songs were retrieved; mostly English and Scottish songs were returned. The 5th song mostly retrieved American songs.

4.2.3.3 Subjective Evaluation
According to the previous experiment, the ideal case is for all retrieved songs to have the same category as the query; the worst case is for none of them to match. Two sets of MIDI files, one representing each case, were played and briefly listened to. Upon listening to the first list, one can easily conclude that the songs are alike in style, specifically in having a flute-like character. On the contrary, upon listening to the second list, one can conclude that the songs have nothing in common. The results are consistent with the Genre/Class Method: songs of the same class are indeed most likely similar. Consequently, it can be deduced that the Proposed MIR System is, to a certain extent, able to retrieve similar songs given a query.

5. CONCLUSION
In this study, various feature sets were created for the classification of traditional music in the context of cultural source. Existing feature subsets, such as the Towsey set and the 54 jsymbolic melodic features, were used, as well as new feature subsets created with the C4.5 decision tree, Multiple Linear Regression (MLR), and the Correlation-based Feature Selector (CFS) with three search strategies: Best First Search (BFS), Genetic Search, and Greedy Stepwise. Naive Bayes and SVM were used for classification. An accuracy of at least 70% was achieved when classifying between two classes: UK and NonUK, Scottish and English, and American and French. It was determined that existing feature sets such as the Towsey set and the 54 jsymbolic melodic features can achieve the highest accuracy in some of the classification challenges. Most cases, however, require feature selection to achieve the highest accuracy, and various feature selection techniques must be used to improve the feature sets. It was also found that SVM performs better than Naive Bayes in the cultural classification of traditional music.

The various feature sets identified were used in the Proposed MIR System, which uses them as similarity measures when computing the similarity of a query to the available songs. Simple Euclidean distance was used to compute the similarities.
While examining the behavior of the Proposed MIR System, it was determined that the training set contained songs that belong to two or more categories. As a result, it was established that a fuzzy boundary exists between Scottish, English, and American songs. This finding was supported by the results of the Genre/Class Method and the New Songs Method, which were used to test the ability of the system to retrieve similar songs. The results also reflect Walter Wiora's observation that folk songs transmitted orally undergo multiple transformations and may change form, given that countries relatively near each other do not necessarily have fuzzy boundaries between their songs. Lastly, subjective evaluation indicated that, to a certain extent, the Proposed MIR System is able to retrieve similar songs given a query.

6. FURTHER STUDY
Only subjective evaluation can truly identify whether the Proposed MIR System retrieves similar songs. Though the Genre/Class Method and the New Songs Method are conceptually accepted methods, they cannot replace evaluation by a human listener. Based on the results of these two methods, this study suggests that a human listener evaluate the dataset that was used, both to identify whether a fuzzy boundary truly exists between the classes and to verify whether the retrieved songs are in fact the most similar songs for a query.

7. ACKNOWLEDGMENTS
Our thanks to DOST-ERDT for the funding of this research.

8. REFERENCES
[1] Yuxiang Liu, Qiaoliang Xiang, Ye Wang, and Lianhong Cai. In Acoustics, Speech and Signal Processing (ICASSP), IEEE International Conference on, pages 57-60.
[2] J. Stephen Downie. The music information retrieval evaluation exchange (2005-2007): A window into music information retrieval research. Acoustical Science and Technology, 29(4), 2008.
[3] MIREX Home Page.
[4] Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Elsevier.
[5] Michael Towsey, Andrew Brown, Susan Wright, and Joachim Diederich. Towards melodic extension using genetic algorithms. Educational Technology and Society.
[6] Andrei Coronel, Ariel Maguyon, and Proceso Fernandez. Specifying features for classical and non-classical melody evaluation. PCSC.
[7] Cory McKay and Ichiro Fujinaga. jSymbolic: A feature extractor for MIDI files. In Proceedings of the International Computer Music Conference, 2006.
[8] Shyamala Doraisamy, Shahram Golzari, Noris Mohd Norowi, Md. Nasir B. Sulaiman, and Nur Izura Udzir. A study on feature selection and classification techniques for automatic genre classification of traditional Malay music. In ISMIR 2008, Session 3A: Content-Based Retrieval, Categorization and Similarity.
[9] Musica International.
[10] Weka 3: Data Mining Software in Java.
[11] D. Schnitzer. Mirage: High-Performance Music Similarity Computation and Automatic Playlist Generation. Master's thesis, Vienna University of Technology.
[12] K483295worldmap.
[13] The United Kingdom of Great Britain and Northern Ireland.
[14] Peter van Kranenburg, Jörg Garbers, Anja Volk, Frans Wiering, Louis P. Grijp, and Remco C. Veltkamp. Collaboration perspectives for folk song research and music information retrieval: The indispensable role of computational musicology.


APPLICATIONS OF A SEMI-AUTOMATIC MELODY EXTRACTION INTERFACE FOR INDIAN MUSIC APPLICATIONS OF A SEMI-AUTOMATIC MELODY EXTRACTION INTERFACE FOR INDIAN MUSIC Vishweshwara Rao, Sachin Pant, Madhumita Bhaskar and Preeti Rao Department of Electrical Engineering, IIT Bombay {vishu, sachinp,

More information

A Computational Model for Discriminating Music Performers

A Computational Model for Discriminating Music Performers A Computational Model for Discriminating Music Performers Efstathios Stamatatos Austrian Research Institute for Artificial Intelligence Schottengasse 3, A-1010 Vienna stathis@ai.univie.ac.at Abstract In

More information

Combination of Audio & Lyrics Features for Genre Classication in Digital Audio Collections

Combination of Audio & Lyrics Features for Genre Classication in Digital Audio Collections 1/23 Combination of Audio & Lyrics Features for Genre Classication in Digital Audio Collections Rudolf Mayer, Andreas Rauber Vienna University of Technology {mayer,rauber}@ifs.tuwien.ac.at Robert Neumayer

More information

A Basis for Characterizing Musical Genres

A Basis for Characterizing Musical Genres A Basis for Characterizing Musical Genres Roelof A. Ruis 6285287 Bachelor thesis Credits: 18 EC Bachelor Artificial Intelligence University of Amsterdam Faculty of Science Science Park 904 1098 XH Amsterdam

More information

MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES

MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES Jun Wu, Yu Kitano, Stanislaw Andrzej Raczynski, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono and Shigeki Sagayama The Graduate

More information

Multi-Modal Music Emotion Recognition: A New Dataset, Methodology and Comparative Analysis

Multi-Modal Music Emotion Recognition: A New Dataset, Methodology and Comparative Analysis Multi-Modal Music Emotion Recognition: A New Dataset, Methodology and Comparative Analysis R. Panda 1, R. Malheiro 1, B. Rocha 1, A. Oliveira 1 and R. P. Paiva 1, 1 CISUC Centre for Informatics and Systems

More information

Automatic Labelling of tabla signals

Automatic Labelling of tabla signals ISMIR 2003 Oct. 27th 30th 2003 Baltimore (USA) Automatic Labelling of tabla signals Olivier K. GILLET, Gaël RICHARD Introduction Exponential growth of available digital information need for Indexing and

More information

Analysis of local and global timing and pitch change in ordinary

Analysis of local and global timing and pitch change in ordinary Alma Mater Studiorum University of Bologna, August -6 6 Analysis of local and global timing and pitch change in ordinary melodies Roger Watt Dept. of Psychology, University of Stirling, Scotland r.j.watt@stirling.ac.uk

More information

Automated extraction of motivic patterns and application to the analysis of Debussy s Syrinx

Automated extraction of motivic patterns and application to the analysis of Debussy s Syrinx Automated extraction of motivic patterns and application to the analysis of Debussy s Syrinx Olivier Lartillot University of Jyväskylä, Finland lartillo@campus.jyu.fi 1. General Framework 1.1. Motivic

More information

Lyric-Based Music Mood Recognition

Lyric-Based Music Mood Recognition Lyric-Based Music Mood Recognition Emil Ian V. Ascalon, Rafael Cabredo De La Salle University Manila, Philippines emil.ascalon@yahoo.com, rafael.cabredo@dlsu.edu.ph Abstract: In psychology, emotion is

More information

Contextual music information retrieval and recommendation: State of the art and challenges

Contextual music information retrieval and recommendation: State of the art and challenges C O M P U T E R S C I E N C E R E V I E W ( ) Available online at www.sciencedirect.com journal homepage: www.elsevier.com/locate/cosrev Survey Contextual music information retrieval and recommendation:

More information

Automatic characterization of ornamentation from bassoon recordings for expressive synthesis

Automatic characterization of ornamentation from bassoon recordings for expressive synthesis Automatic characterization of ornamentation from bassoon recordings for expressive synthesis Montserrat Puiggròs, Emilia Gómez, Rafael Ramírez, Xavier Serra Music technology Group Universitat Pompeu Fabra

More information

Exploring Relationships between Audio Features and Emotion in Music

Exploring Relationships between Audio Features and Emotion in Music Exploring Relationships between Audio Features and Emotion in Music Cyril Laurier, *1 Olivier Lartillot, #2 Tuomas Eerola #3, Petri Toiviainen #4 * Music Technology Group, Universitat Pompeu Fabra, Barcelona,

More information

Efficient Vocal Melody Extraction from Polyphonic Music Signals

Efficient Vocal Melody Extraction from Polyphonic Music Signals http://dx.doi.org/1.5755/j1.eee.19.6.4575 ELEKTRONIKA IR ELEKTROTECHNIKA, ISSN 1392-1215, VOL. 19, NO. 6, 213 Efficient Vocal Melody Extraction from Polyphonic Music Signals G. Yao 1,2, Y. Zheng 1,2, L.

More information

Exploring the Design Space of Symbolic Music Genre Classification Using Data Mining Techniques Ortiz-Arroyo, Daniel; Kofod, Christian

Exploring the Design Space of Symbolic Music Genre Classification Using Data Mining Techniques Ortiz-Arroyo, Daniel; Kofod, Christian Aalborg Universitet Exploring the Design Space of Symbolic Music Genre Classification Using Data Mining Techniques Ortiz-Arroyo, Daniel; Kofod, Christian Published in: International Conference on Computational

More information

Music Genre Classification

Music Genre Classification Music Genre Classification chunya25 Fall 2017 1 Introduction A genre is defined as a category of artistic composition, characterized by similarities in form, style, or subject matter. [1] Some researchers

More information

Methods for the automatic structural analysis of music. Jordan B. L. Smith CIRMMT Workshop on Structural Analysis of Music 26 March 2010

Methods for the automatic structural analysis of music. Jordan B. L. Smith CIRMMT Workshop on Structural Analysis of Music 26 March 2010 1 Methods for the automatic structural analysis of music Jordan B. L. Smith CIRMMT Workshop on Structural Analysis of Music 26 March 2010 2 The problem Going from sound to structure 2 The problem Going

More information

A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES

A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES 12th International Society for Music Information Retrieval Conference (ISMIR 2011) A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES Erdem Unal 1 Elaine Chew 2 Panayiotis Georgiou

More information

Topic 10. Multi-pitch Analysis

Topic 10. Multi-pitch Analysis Topic 10 Multi-pitch Analysis What is pitch? Common elements of music are pitch, rhythm, dynamics, and the sonic qualities of timbre and texture. An auditory perceptual attribute in terms of which sounds

More information

AN EMOTION MODEL FOR MUSIC USING BRAIN WAVES

AN EMOTION MODEL FOR MUSIC USING BRAIN WAVES AN EMOTION MODEL FOR MUSIC USING BRAIN WAVES Rafael Cabredo 1,2, Roberto Legaspi 1, Paul Salvador Inventado 1,2, and Masayuki Numao 1 1 Institute of Scientific and Industrial Research, Osaka University,

More information

A CLASSIFICATION APPROACH TO MELODY TRANSCRIPTION

A CLASSIFICATION APPROACH TO MELODY TRANSCRIPTION A CLASSIFICATION APPROACH TO MELODY TRANSCRIPTION Graham E. Poliner and Daniel P.W. Ellis LabROSA, Dept. of Electrical Engineering Columbia University, New York NY 127 USA {graham,dpwe}@ee.columbia.edu

More information

Hidden Markov Model based dance recognition

Hidden Markov Model based dance recognition Hidden Markov Model based dance recognition Dragutin Hrenek, Nenad Mikša, Robert Perica, Pavle Prentašić and Boris Trubić University of Zagreb, Faculty of Electrical Engineering and Computing Unska 3,

More information

Multimodal Music Mood Classification Framework for Christian Kokborok Music

Multimodal Music Mood Classification Framework for Christian Kokborok Music Journal of Engineering Technology (ISSN. 0747-9964) Volume 8, Issue 1, Jan. 2019, PP.506-515 Multimodal Music Mood Classification Framework for Christian Kokborok Music Sanchali Das 1*, Sambit Satpathy

More information

Improving Frame Based Automatic Laughter Detection

Improving Frame Based Automatic Laughter Detection Improving Frame Based Automatic Laughter Detection Mary Knox EE225D Class Project knoxm@eecs.berkeley.edu December 13, 2007 Abstract Laughter recognition is an underexplored area of research. My goal for

More information

Music Similarity and Cover Song Identification: The Case of Jazz

Music Similarity and Cover Song Identification: The Case of Jazz Music Similarity and Cover Song Identification: The Case of Jazz Simon Dixon and Peter Foster s.e.dixon@qmul.ac.uk Centre for Digital Music School of Electronic Engineering and Computer Science Queen Mary

More information

GRADIENT-BASED MUSICAL FEATURE EXTRACTION BASED ON SCALE-INVARIANT FEATURE TRANSFORM

GRADIENT-BASED MUSICAL FEATURE EXTRACTION BASED ON SCALE-INVARIANT FEATURE TRANSFORM 19th European Signal Processing Conference (EUSIPCO 2011) Barcelona, Spain, August 29 - September 2, 2011 GRADIENT-BASED MUSICAL FEATURE EXTRACTION BASED ON SCALE-INVARIANT FEATURE TRANSFORM Tomoko Matsui

More information

Computational Models of Music Similarity. Elias Pampalk National Institute for Advanced Industrial Science and Technology (AIST)

Computational Models of Music Similarity. Elias Pampalk National Institute for Advanced Industrial Science and Technology (AIST) Computational Models of Music Similarity 1 Elias Pampalk National Institute for Advanced Industrial Science and Technology (AIST) Abstract The perceived similarity of two pieces of music is multi-dimensional,

More information

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring 2009 Week 6 Class Notes Pitch Perception Introduction Pitch may be described as that attribute of auditory sensation in terms

More information

The song remains the same: identifying versions of the same piece using tonal descriptors

The song remains the same: identifying versions of the same piece using tonal descriptors The song remains the same: identifying versions of the same piece using tonal descriptors Emilia Gómez Music Technology Group, Universitat Pompeu Fabra Ocata, 83, Barcelona emilia.gomez@iua.upf.edu Abstract

More information

2 2. Melody description The MPEG-7 standard distinguishes three types of attributes related to melody: the fundamental frequency LLD associated to a t

2 2. Melody description The MPEG-7 standard distinguishes three types of attributes related to melody: the fundamental frequency LLD associated to a t MPEG-7 FOR CONTENT-BASED MUSIC PROCESSING Λ Emilia GÓMEZ, Fabien GOUYON, Perfecto HERRERA and Xavier AMATRIAIN Music Technology Group, Universitat Pompeu Fabra, Barcelona, SPAIN http://www.iua.upf.es/mtg

More information

Characteristics of Polyphonic Music Style and Markov Model of Pitch-Class Intervals

Characteristics of Polyphonic Music Style and Markov Model of Pitch-Class Intervals Characteristics of Polyphonic Music Style and Markov Model of Pitch-Class Intervals Eita Nakamura and Shinji Takaki National Institute of Informatics, Tokyo 101-8430, Japan eita.nakamura@gmail.com, takaki@nii.ac.jp

More information