A Basis for Characterizing Musical Genres


A Basis for Characterizing Musical Genres

Roelof A. Ruis

Bachelor thesis
Credits: 18 EC
Bachelor Artificial Intelligence
University of Amsterdam
Faculty of Science
Science Park, Amsterdam

Supervisors:

dr. A.K. Honingh
Institute for Logic, Language and Computation
Faculty of Science, University of Amsterdam
Science Park, Amsterdam

dr. M.W. van Someren
Informatics Institute
Faculty of Science, University of Amsterdam
Science Park, Amsterdam

July 4th, 2014

Abstract

By exploratory means, a method is presented for providing musical genre characterizations using songs from two MIDI corpora. This characterization is first based on global features and later refined using interval sequences derived from melodies. In the first part of the research, a top 10 of characteristic global features per genre is established through correlation and clustering. In the second part, this correlation is again used to determine a top 10 of characteristic interval sequences. An analysis is then presented to show that the data can be interpreted meaningfully, resulting in concise genre characterizations.

Contents

1 Introduction
2 Related Work
3 The Corpora
4 Part 1 - Global Features
  4.1 Method
    4.1.1 Principal Component Analysis
    4.1.2 Unsupervised Clustering
    4.1.3 Point-Biserial Correlation
  4.2 Interpretation
    4.2.1 BOD main
    4.2.2 BOD popular
    4.2.3 BOD jazz
    4.2.4 BOD classical
    4.2.5 BAL
5 Part 2 - Interval Sequences
  5.1 Method
    5.1.1 Preprocessing
    5.1.2 Interval Sequences
    5.1.3 Sorting on Occurrence
  5.2 Interpretation
6 Conclusion
7 Discussion
  7.1 Part 1
  7.2 Part 2
  7.3 General
A JSymbolic Features
B Results Part 1
C Results Part 2

1 Introduction

When humans talk about music, they tend to group songs with comparable characteristics as belonging to the same genre. People can often recall some of the musical aspects on which they base this grouping, but tend to name only superficial features like instrumentation. It would thus be interesting to see if a more exact genre description could be constructed. Musicologists have already tried to describe genres using literature on music and a good ear, but nowadays it is possible to use computational methods to analyze larger corpora and find more refined musical features. In light of this, musical genre classifiers have been built which obtain high accuracy in mimicking genre grouping behavior. They do, however, not provide information on genre-specific features, which are exactly what musicologists are eager to discover. It is therefore desirable to aid the musicologist with a method to extract features related to genre that are clearly interpretable in musicological terms, or better still, a computationally extracted genre description.

Of course, the difficulty of extracting such a description should not be underestimated; it is difficult to define beforehand the features by which a genre should be described. Furthermore, some genres may be defined more by meta-aspects of songs that cannot be extracted from the MIDI data, such as the artist, the year of release and the geographical origin of a song. Because this research will work with MIDI files which contain no meta-data, such features will be left out and might be explored in future research.

It is sensible to look at both global and surface features. Global features can be seen as aspects of songs to which an overall value can be given that is applicable to the whole song, such as the electric guitar fraction or the average time between attacks. In contrast, surface features focus on the actual melodic content and try to describe certain patterns therein. Either of these global or surface aspects might contribute to more accurate genre descriptions, and because the analysis differs for both, the research is split into two parts. During the first part, global features provided by a feature extractor will be examined using principal component analysis and various other machine learning techniques, working towards a high level genre description. The second part of the research will focus on finding genre specific melodic structures, which can be used as detailed genre descriptors. It is to be expected that these melodic structures can then not only be used descriptively but also to improve the accuracy of existing classification methods. The interpretation of the results will incorporate results from part one to check their commonalities.

Because very little research has been conducted into finding explanatory features, the research presented here will be mainly exploratory, providing extensive documentation of the investigated paths and the decisions that have been made. Furthermore, no statistical tests will be used at this stage, so if interesting results surface, more meticulous further research will be required. This documentation therefore serves two goals. Firstly, it tries to provide features, expressed in terms of global features and later refined and completed with melodic structures, obtained from a corpus of genre-labeled MIDI songs, capable of meaningfully characterizing the different genres in the corpus.
Secondly, it hopes to provide material for supplementary studies, which might also be suitable for research outside musicology, for instance social network analysis, where meaningful features play a role in the characterization of groups of people.

2 Related Work

Much research has been done into the machine-based recognition of musical genres using only MIDI data [8] [3], a combination of MIDI and audio features [2], or features purely based on audio [10].

All of the systems presented in these papers first apply some form of feature extraction, after which they perform a classification based on those features. The classification accuracy is then evaluated and improvements or new ways of classifying music are claimed. These systems, however, do not provide any means of understanding what characterizes a musical genre, but only how well it can be classified based on arbitrary features. Sturm [11] argues that because many independent variables change between particular genres in a dataset, classification is unreliable and other types of experimental designs are required to understand how a genre can be characterized. He mentions, among others, inspecting features and answering the question "At what is the system looking to identify the genres used by music?" (p. 376). A further motivation for this research is the final remark of McKay and Fujinaga [8] in a small genre classification study, stating that "Further study of which features were selected by which specialist classifier ensembles could also be of great musicological interest." (p. 530). This further emphasizes the need for a thorough look into feature meaning and selection.

The high level features required in the first part of the research have to be extracted by a program capable of handling MIDI songs. The program used in this paper, created by McKay [7], is called JSymbolic and is capable of extracting up to 111 features, whose meanings are explained in that work. McKay also notes "The library of features used in this thesis should be seen as a work in progress that can continually be expanded and refined [...]" (p. 62), stressing that the features extracted by JSymbolic could use some refinement; they do not provide the musicologist with much deep insight into song structure and melodic and rhythmic sequences, but instead define a coarse value for, for instance, average note duration or average melodic interval.

The second part of the research, exploring the use of small melodic structures in genre characterization, will use an idea presented by Conklin [5]. The paper presents a method for combining a series of notes from a melody into a segment and assigning these segments to a class, based on certain rules about their content. These melodic segment classes are then used for style discrimination of different melodies, and results well above random chance are achieved. Honingh et al. [6] also use the aforementioned concept of melodic segment classes to distinguish between tonal and atonal music and to develop a model of pitch class progression. Both these studies thus show that inspecting musical features at this scale might not only lead to determining genre distinctions but also provide more insight into what information is hidden in the structure of the music on a very detailed level.

3 The Corpora

For this research, two genre annotated MIDI corpora were used. The first is the Ballroom Dance Corpus [4], from here on indicated with BAL, containing 6 types of ballroom dances, whose structure can be observed in table 1. The second is the Bodhidharma MIDI Corpus [7], indicated with BOD. The large version of the BOD corpus consists of 38 genres with a hierarchically defined structure with a maximum depth of four. There is, however, also a small subset of this large corpus, consisting of 9 sub-genres divided into three main genres, that will be used here; the method presented in this paper could be extended to cover the large version of the BOD corpus. The structure of the reduced BOD corpus as used in this research can be seen in table 2.

Genre         Nr. of songs
Bossa Nova    36
Mambo         15
Merengue      24
Rumba         20
Salsa          7
Tango         26

Table 1: Structure of the BAL corpus with song counts.

Because both corpora come prelabeled and are used in other research, it is assumed that this labeling is correct.

Main genre      Subgenre           Nr. of songs
Popular (75)    Hardcore Rap       25
                Punk               25
                Trad. Country      25
Jazz (75)       Bebop              25
                Jazz Soul          25
                Swing              25
Classical (70)  Baroque            25
                Modern Classical   25
                Romantic           20

Table 2: Structure of the BOD corpus with song counts.

4 Part 1 - Global Features

4.1 Method

4.1.1 Principal Component Analysis
First, features are generated for all songs in both the BAL and BOD corpus using the JSymbolic extractor, producing a csv file with data for each separate genre. A complete list of the feature names is included as listing 1 in appendix A. To get an intuition about the high dimensional structure of the data and the possible overlap of genres, principal component analysis (PCA) is used to map the high dimensional data to a 2D space. This also provides a means to visually detect outliers or anomalies, and a first possible step towards finding characteristic features.

Visualization
The BAL corpus is inspected first. Figure 1 shows all songs in this corpus mapped to a 2D principal component (PC) space using the two most dominant principal components as axes. All features were used in the calculation of these components. It can clearly be observed that most genres form quite compact clusters, but that there is still a decent amount of overlap. Because all features were used, it is to be expected that the clusters will become more distinct when a suitable subset of features is selected.

Figure 1: BAL corpus in 2D PC space based on all features.

A few other things can be observed from this visualization. First, the salsa songs do not form a coherent cluster, which is partly due to the fact that the BAL corpus provides very few salsa songs. For this reason the salsa genre is dropped for the rest of part 1 of the research. Secondly, the tango songs seem to form two distinct groups. Listening to the songs indicated that the group in the upper right corner consists of tango songs played on solo piano, while the other tango songs are performed with a multiple instrument setup. Because it is likely that this distinction would affect the precision of the genre characterization, the tangos performed on solo piano are also removed from the data. The new BAL corpus with salsa and piano tango songs removed is shown in figure 2.
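As an illustration of this projection step, the following minimal sketch (in Python, using scikit-learn) shows how per-genre feature tables could be standardized and mapped to a 2D PC space. The file names and CSV layout are assumptions made for the example; this is not the actual pipeline used in this research.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Hypothetical per-genre CSV files: one row per song, one column per JSymbolic feature.
    genre_files = {"bossanova": "bossanova.csv", "mambo": "mambo.csv", "tango": "tango.csv"}

    X, labels = [], []
    for genre, path in genre_files.items():
        feats = np.atleast_2d(np.loadtxt(path, delimiter=",", skiprows=1))
        X.append(feats)
        labels += [genre] * feats.shape[0]
    X = np.vstack(X)

    # Standardize so that features with large numeric ranges do not dominate the components.
    pca = PCA(n_components=2)
    coords = pca.fit_transform(StandardScaler().fit_transform(X))

    for genre in genre_files:
        idx = [i for i, g in enumerate(labels) if g == genre]
        plt.scatter(coords[idx, 0], coords[idx, 1], label=genre, s=12)
    plt.xlabel("Principal Component 1")
    plt.ylabel("Principal Component 2")
    plt.legend()
    plt.show()

    # The per-feature contribution of each axis (cf. figures 4 and 5) is the
    # corresponding loading vector, e.g. pca.components_[0] for the first axis.

Standardizing before the projection is one possible design choice; without it, features with large numeric ranges would dominate the first components.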

Figure 2: BAL corpus in PC space without salsa and piano tango, based on all features.

Figure 3: Top level of BOD corpus in PC space based on all features.

The BOD corpus shows various degrees of such cluster formation. Clear cluster separation with a bit of overlap can be observed for the main level (figure 3) and the popular sub-level (figure 8 in appendix B). Furthermore, a clear distinction can be seen between bebop and swing within the jazz sub-level (figure 9 in appendix B), while jazz soul has a lot of overlap with bebop. The classical sub-level (figure 10 in appendix B) shows the most overlap of all, indicating that the features used might not work well for separating classical music, or that a smaller selection of features might be required to get good separation. No other anomalies or strange outliers can be seen, so removing any songs from the BOD corpus beforehand is not necessary.

Figure 4: Contribution to dimensions of the first 3 PCA vectors for the BAL corpus.

Inspecting Dimensions
The PC axes are formed by a linear combination of the feature dimensions, and it might be that they can explain the forming and position of the clusters fairly well. Results indicate, however, that too many features contribute to one PC axis for it to be of any value. Figures 4 and 5 show the dimensions of the first 3 PCA vectors plotted against their individual contribution. If only a low number of features contributed to a PC axis, one would expect a few high contribution values for the very first dimensions and thereafter a steep decline, with only very small contributions for the other dimensions. As the figures show, this is not the case: there is no steep decline and no large contribution by the first couple of feature values.

Visual Aid
While the actual PC axes are thus difficult to interpret by looking at the feature decomposition, it might be possible for a trained musicologist to use a PC plot as an aid for deriving genre characteristics by ear. Looking at figure 2, one could select a range of songs by varying the value of principal component 1

while keeping the value of principal component 2 at a fixed value. Listening to this range of songs might then provide insights into aspects of the music that change throughout. Concluding, we have seen that while PCA is a method suited for detecting outliers and anomalies in the corpus and for visualizing the forming of clusters, it cannot be used for detecting characteristic features, although it could serve as a visual aid.

Figure 5: Contribution to dimensions of the first 3 PCA vectors for the top level of the BOD corpus.

4.1.2 Unsupervised Clustering
To discover if the JSymbolic features can really be used for recognizing (and therefore characterizing) genres, a solid measure is needed. Although clustering of genres can already be observed in the PCA plots, this provides no exact measure for the amount of separation between the different genres. It is, however, possible to use unsupervised k-means clustering to detect, for different feature sets, how well clusters form in the same way as the genre labels indicate. This way it is possible to compare the clustering of different feature sets and see if the clustering improves. For characteristic features (features that are particularly suited for separating one genre from the others) such an improvement is to be expected.

Evaluating clustering
Unsupervised clustering finds clusters of songs which are close together in feature space. It can, however, be that the found clusters do not represent the original genre groups. To find out how well the found clusters match the original genres, the following measure is devised: it is checked to which genre the majority of songs in a cluster belongs (scaled by the total number of songs in the dataset for that genre), and that genre is then assumed to be the correct class for the cluster. This is done for every cluster. Hereafter, precision and recall are measured and the F1-score is derived for each genre individually. Because the cluster centroids are randomly initialized on each run, 1000 cycles of the clustering algorithm are run to get a reliable mean. There are as many cluster centroids initialized as there are different genres in the measured corpus.

Clustering with all features
Clustering the BAL corpus yields the F1-scores displayed in table 3. A score of 1 means complete separation of a genre into its own cluster and a score of 0 means that all songs of a genre got mixed up in other genre clusters.

Table 3: F1-scores BAL using all features (Bossa Nova, Mambo, Merengue, Rumba, Tango).

The values of this clustering correspond to the visual results obtained from the PCA analysis as shown in figure 2. Looking at that figure, the low value for mambo can be explained by the overlap of almost all of the mambo songs with merengue or rumba. These genres take precedence and thus mambo songs are almost always classified under a different genre.
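The cluster evaluation measure described above can be sketched as follows. This is one possible reading of the procedure (majority vote scaled by genre size, averaged over many random initializations), assuming a feature matrix X and a list of genre labels y; it is not code from this research.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import f1_score

    def clustering_f1(X, y, n_runs=1000):
        """Mean per-genre F1 of k-means clusters matched to genres by scaled majority vote."""
        y = np.asarray(y)
        genres = sorted(set(y))
        genre_size = {g: np.sum(y == g) for g in genres}
        scores = np.zeros((n_runs, len(genres)))
        for run in range(n_runs):
            km = KMeans(n_clusters=len(genres), n_init=1, random_state=run).fit(X)
            cluster_to_genre = {}
            for c in range(len(genres)):
                members = y[km.labels_ == c]
                # Fraction of each genre's songs that ended up in this cluster.
                fractions = {g: np.sum(members == g) / genre_size[g] for g in genres}
                cluster_to_genre[c] = max(fractions, key=fractions.get)
            y_pred = np.array([cluster_to_genre[c] for c in km.labels_])
            scores[run] = f1_score(y, y_pred, labels=genres, average=None)
        return dict(zip(genres, scores.mean(axis=0)))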

Within the BOD corpus, the top level as well as the three sub-levels can be clustered and scored individually in the same way. Tables 4, 5, 6 and 7 show the F1-scores for clustering these genres. It can be observed that most of the genres already separate quite decently on their own, except for modern classical music. The fact that some genres are not well separable means that the features used here cannot fully explain the difference between them, or that too many features are used.

Table 4: F1-scores BOD main using all features (Popular, Jazz, Classical).
Table 5: F1-scores BOD popular using all features (Hardcore Rap, Punk, Trad. Country).
Table 6: F1-scores BOD jazz using all features (Bebop, Jazz Soul, Swing).
Table 7: F1-scores BOD classical using all features (Baroque, Modern Classical, Romantic).

4.1.3 Point-Biserial Correlation
To get a better understanding of which features predict well for which genre, and thus work towards finding actual characteristic features, the correlation between the features and the genres can be calculated. Because the features have continuous values and the songs can be regarded as dichotomous (a song either belongs or does not belong to the measured genre), point-biserial correlation can be used. For each individual genre this correlation is calculated, where all songs within the measured genre are said to be in class 1, and all other songs in class 0. Measuring the point-biserial correlation for all features and all genres in both corpora yields a top 10 of correlating features for each genre. Table 8 shows this top 10 for bossa nova together with the correlation scores.

65. Rel. Note Density of Highest Line*
Avg. Note Duration*
Avg. Time Between Attacks*
Rhythmic Variability*
Repeated Notes**
Importance of Bass Register
Melodic Thirds**
Avg. Nr. of Independent Voices
Str. of Strongest Rhythm. Pulse*
Comb. Str. of Two Str. Rhythmic Pulses*

Table 8: Top 10 highest point-biserial correlation scores for the bossa nova genre (in BAL).
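The ranking step itself can be sketched as below, assuming the same feature matrix and genre labels as before, together with a list of feature names. The use of scipy's point-biserial implementation and the ranking by absolute correlation (so that strong negative correlations also surface, as they do in the reported tables) are choices made for this illustration.

    import numpy as np
    from scipy.stats import pointbiserialr

    def top_correlating_features(X, y, genre, feature_names, k=10):
        """Return the k features whose values correlate most with membership of `genre`."""
        membership = (np.asarray(y) == genre).astype(int)  # 1 = in the genre, 0 = all other songs
        # Index [0] of the result is the point-biserial correlation coefficient.
        correlations = [pointbiserialr(membership, X[:, j])[0] for j in range(X.shape[1])]
        ranked = sorted(zip(feature_names, correlations), key=lambda t: abs(t[1]), reverse=True)
        return ranked[:k]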

The features in the table are highly correlated with the bossa nova genre when checked against all other genres in the BAL corpus, meaning that these features are likely to characterize bossa nova based on the differences it has with the other genres in the corpus. Using the information provided in the research by McKay, a detailed description of the individual features can be given. As an example, the first feature, Relative Note Density of Highest Line, will now be discussed. A musical piece often contains different lines or voices, each with an average pitch; the highest line refers to the voice with the highest average pitch. The relative note density of the highest line is then the number of notes in the voice with the highest average pitch divided by the average number of notes in all other voices. The high correlation indicates that, relative to the other genres, bossa nova has a large number of notes in its high lines compared to its lower lines. Section 4.2 will elaborate on the interpretation of the other features and the other genres.

Furthermore, some first remarks on melody and rhythm can already be made. The features marked with one asterisk are related to rhythm and the features marked with a double asterisk are related to melody. This shows that bossa nova has high correlations with features relating to both rhythm and melody, but these features could use much refinement. For instance, if not many melodic thirds are observed, one might want to know which intervals are then most likely to occur. This stresses the need for part 2 of the research.

Verifying correlation results
To show that the correlating features can indeed be regarded as characteristic for their corresponding genre, two methods can be used. First, the F1-score of the unsupervised clustering using only the selected features listed above can be computed. The results are shown in table 9.

Table 9: F1-scores BAL using Bossa Nova correlating features (Bossa Nova, Mambo, Merengue, Rumba, Tango).

When comparing these results to the values given in table 3, it can be observed that the difference between bossa nova and the second best scoring genre has more than quadrupled. This means that the split between the bossa nova genre and the other genres is now rather pronounced. The fact that the F1-score for bossa nova is somewhat lower than in table 3 is understandable, because bossa nova already got separated decently when all features were used.

Second, because PCA could be used as a method of visualizing the data, it can serve as a check on how the correlated features have split the data. Figure 6 shows the BAL corpus in PC space using only the features from table 8, and a strong separation between bossa nova and the other genres can clearly be observed. This and the foregoing check using F1 values are a strong indication that this particular feature subset characterizes just bossa nova. Because the goal is to find characteristics of each individual genre, this is a very appealing result.

Figure 6: BAL corpus in PC space with Bossa Nova correlating features.

The top 10 features for each genre together with their F1 scores are included in appendix B; the reference scores for clusters using all features are in tables 3, 4, 5, 6 and 7. The results shown here for bossa nova in the BAL corpus generalize to almost all other genres in both corpora: the F1-score of the selected genre increases relative to the other genres when its correlating features are used, compared with clusters in which all features were used. Some genres, such as tango (see appendix B), show

a strong relative increase, while with others, such as merengue (see appendix B), this is less obvious; in the case of merengue there is also a big increase for mambo. Another anomaly is observed in appendix B, where the features correlating with bebop actually provide a more distinct clustering of swing. In these cases, where correlating features fail to improve clustering results, the features might not provide a decent genre characterization and are therefore to be interpreted with care.

4.2 Interpretation
Although quite some definitions of genres are published in the Grove Music Online Dictionary [1], these texts contain only a very small amount of information about specific genre characteristics and instead focus more on social and cultural backgrounds. It is thus very valuable to work towards genre characterizations, so as to provide additional information for the understanding of genres. The following paragraphs provide a concise interpretation of the results for some genres; a full-fledged interpretation is left to musicologists. The discussed results can be found in table 8 for bossa nova and the leftmost tables of appendix B for the other genres.

4.2.1 BOD main
While the results hopefully present new knowledge to be used for more precise genre descriptions, findings that match existing beliefs strengthen the validity of the method presented. Especially in the BOD main corpus such familiar features are observed, likely because these broad genres capture global characteristics.

Popular
The corresponding table in appendix B shows the features correlating with popular music in relation to jazz and classical music. Almost all features match an intuitive description of popular music. Firstly, the use of electric instruments, and the electric guitar in particular, can be observed. Secondly, popular music has a strong tonal focus and is likely to use only tones present in a certain scale, which is indicated by a positive correlation with Nr. of common pitches and Most common pitch prevalence, and a negative correlation with Pitch variety. Furthermore, a negative correlation with the range of the highest line can be observed; in pop music this is often the melodic line, which tends to be restricted in range.

Jazz
In jazz music, a significant correlation with the saxophone and brass fractions can be observed. The presence of melodic tritones together with a high pitch variety is an indication of the more complex melodies often found in jazz.

Classical
Characteristics for the classical genre show the absence of electric instruments and percussion. Importance of high register, absence of importance of bass register and a high value for primary register all indicate that, relative to pop and jazz, much is going on in the high lines.

4.2.2 BOD popular
Hardcore Rap
The hardcore rap genre is characterized by relatively long songs, in which much movement in minor and major seconds is observed (chromatic motion and stepwise motion). Both the negative correlation with note density and the correlation with average time between attacks indicate that these songs are relatively slow paced.

Punk
Punk songs have a strong negative correlation with Duration, meaning that they are often short songs. Furthermore, they use a lot of repeated notes and arpeggiation. An interesting negative correlation that might not be so easily explained is that with Melodic thirds, which seems to indicate that punk music lacks melodic thirds as opposed to hardcore rap and traditional country.

Traditional Country
Traditional country has the most diverse melodic material of

all, indicated by pitch variety and pitch class variety. Furthermore, a strong correlation with the use of melodic thirds can be observed.

4.2.3 BOD jazz
The BOD jazz corpus provides the most difficult features to interpret and, as shown before, because the bebop features seemed to be unreliable, interpreting the results for the jazz corpus is left to others.

4.2.4 BOD classical
Baroque
Within the classical corpus, baroque stands out for its restricted range, restricted pitch variety and restricted melodic tritones. Together with most common melodic interval prevalence, indicating that there is a certain melodic interval that reoccurs often, one can conclude that this genre is focused around tonality. Unfortunately, no features concerning rhythm show up.

Modern Classical
Although change of meter has quite a strong correlation for modern classical, most other features show very low values. It is therefore likely that these features are not very useful.

Romantic
The romantic genre seems to be characterized by rhythmic looseness, indicated not only by the feature with the same name but also by the lack of a strong second strongest rhythmic pulse.

4.2.5 BAL
Bossa Nova
As discussed earlier, bossa nova has a relatively high amount of notes in the high lines. Contrasting with this, there is also a high correlation with Importance of Bass Register. A strong positive correlation with average note duration and average time between attacks might be indicative of relatively slow music compared to the other genres in the corpus. Furthermore, a lack of melodic thirds shows up, which is a feature that might be verified in the second part of the research.

Mambo
The F1 score for mambo, although increased compared to table 3, is an indication that features for mambo will not be very reliable. Still, the occurrence of melodic fifths is something that can again be examined in part 2.

Merengue
The first feature for merengue, voice equality - number of notes, is described as "Standard deviation of the total number of notes in each channel that contains at least one note" [7] (p. 66). A high positive correlation means that some instruments play many notes while others play very few. The overall note density is high, and the average note duration is negatively correlated, indicating many short notes.

Rumba
Rumba has a high correlation with both pitch variety and pitch class variety, indicating that it is likely either non-tonal or that it often uses modulation within a song. Furthermore, it has quite a strong correlation with melodic thirds, which will be inspected in part two.

Tango
A lack of unpitched instruments and of percussion in general is observed. The use of polyrhythms is notable, together with the high pitch variety and chromatic motion, creating the expectation of interesting, complex melodies. Both the use of chromatic motion and the lack of arpeggiation can be verified in part two.

5 Part 2 - Interval Sequences

5.1 Method
Now that it has been shown that characteristic features can be extracted on a global level,

the attention in this part shifts to characteristic melodies obtained on the local level. It is known that rhythm plays a significant role at least in the characterization of ballroom dances [4], and it would be interesting to discover if they can be characterized by melody as well. Most songs, however, consist of multiple voices of which only one plays the melody. The reason that only melody voices will be analyzed is that the melody plays a dominant role and is often recalled best, increasing the chance that genres can best be characterized by information extracted from the melody. Furthermore, multiple notes playing at the same time are rare in melodies, which makes it easier to determine by which notes the melody is formed.

A melody consists of consecutive notes with intervals in between them. Different melodies might map to the same intervals (both C to G and A to E have an interval of 7 semitones in between them), so using these interval sequences captures the shape of the melodies instead of the exact pitches and key. This is similar to the way a derivative captures the movement of the original function while disregarding its vertical position.

5.1.1 Preprocessing
Preprocessing the Corpus
Because there was no corpus with prelabeled melodic lines that would work well in MATLAB, some preprocessing of the existing corpora was required. This preprocessing was rather time consuming, so melody extraction was only done for the BAL corpus and the BOD corpus was dropped. Selecting melodic lines was done by listening to the MIDI files and extracting the correct tracks by hand. In the case of melodies with more than one note playing at the same time, the upper voice was chosen because it is in general perceived as the main melody. For each genre, this time including salsa, 10 to 20 melodic lines were selected, depending on the available files. It might be possible to automate this process of melodic track selection, for instance with a method presented by Rizo [9].

Interval extraction
From the previously extracted melodic lines, the intervals in semitones were calculated. Figure 7 shows such an example extraction for a random melody: the numbers indicate the intervals relative to the preceding note. Notice that an interval sequence of length n corresponds to a melody of length n+1. During this procedure, the rhythmic information about the melody is lost. This interval series is the data used for calculating the counts of the different micro-melodies. Interval series will be written between curly brackets, like {1, 2, -3, 1, 5, -1, -2} for the example sequence.

Figure 7: Melodic excerpt with relative pitch intervals.

The feature vector consists of interval sequences of length 1, 2 and 3. Taking longer sequences is likely to give very sparse results while increasing extraction time. In the feature vector all intervals from {-12} up to {12}, {-12,-12} up to {12,12} and {-7,-7,-7} up to {7,7,7} are included. The triple interval sequences only cover a maximum distance of seven semitones (a perfect fifth) because computation would otherwise take rather long. All of the sequences can now be scored by counting their occurrences in the series of pitch intervals for each melody.

5.1.2 Interval Sequences
As in the first part, it is possible to find interval sequences that explain a genre particularly well by calculating the point-biserial correlation. This is done in the same way as shown in part 1, grouping the melodies from one genre in one class.
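A minimal sketch of the interval extraction and sequence counting described above is given below; the example melody is invented, and the range restrictions of the feature vector are noted in a comment rather than enforced.

    from collections import Counter

    def to_intervals(pitches):
        """Semitone intervals between consecutive MIDI pitches: n+1 notes give n intervals."""
        return [b - a for a, b in zip(pitches, pitches[1:])]

    def count_sequences(intervals, max_len=3):
        """Count every interval subsequence of length 1..max_len.
        (The feature vector keeps singles and pairs in -12..12 and triples in -7..7;
        that restriction is not enforced in this sketch.)"""
        counts = Counter()
        for n in range(1, max_len + 1):
            for i in range(len(intervals) - n + 1):
                counts[tuple(intervals[i:i + n])] += 1
        return counts

    melody = [60, 61, 63, 60, 61, 66, 65, 63]   # invented example (MIDI note numbers)
    intervals = to_intervals(melody)            # -> [1, 2, -3, 1, 5, -1, -2], the example sequence above
    counts = count_sequences(intervals)
    print(counts[(1,)], counts[(-2, 2)])        # occurrences of the sequences {1} and {-2, 2}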

It is, however, possible that a particular sequence scores a high correlation because it only exists in one particular melody. Such a sequence should nevertheless not be seen as characteristic, because it does not occur in multiple melodies belonging to one genre. To account for this effect, besides a column indicating the sequence and the correlation scores, a third column is added to the results. This column shows the percentage of occurrence of the sequence among all melodies of that genre. If sequences can be found that score high on both measures, this is a strong indication that these sequences are characteristic and illustrative for the genre as a whole. It should be noted that the foregoing also holds for sequences with a negative correlation and a low occurrence: negative correlation implies that a genre stands out from the others by lacking a particular sequence.

Table 10 shows the 10 sequences with the highest correlation and occurrence percentages for bossa nova; the tables for the other genres are included as the odd numbered tables in appendix C. Section 5.2 provides an interpretation of these results.

Table 10: Top 10 characteristic melodic sequences for bossa nova in BAL, sorted on correlation value: {-2,2,-2}, {-2,2}, {2,-2,2}, {2,0,-2}, {-1,-1,-3}, {-7}, {0,-2,2}, {-2,2,0}, {1}, {7,-2,2}.

5.1.3 Sorting on Occurrence
Sorting the data on correlation stresses the melodic properties that are in particular not shared between genres. While this serves to find very characteristic sequences, the results showed that for many sequences the occurrence was low. That is why the same set of sequences is now also sorted on occurrence, with the condition that the corresponding correlation has to be significant. Table 11 shows the occurrence sorted results for bossa nova. In the case of a sequence with negative correlation, an occurrence of 0% is the optimal result, which is why in the table the first 9 negatively correlating sequences score higher than the tenth (positively correlating) sequence. In appendix C the even numbered tables provide all occurrence sorted results.

Table 11: Top 10 characteristic melodic sequences for bossa nova in BAL, sorted on highest occurrence: {-6}, {-3,-3}, {9,-1}, {-2,0,-3}, {8}, {-8}, {0,4}, {0,-2,0}, {-9}, {1}.
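The occurrence column used in tables 10 and 11 can be sketched as follows, assuming per-melody Counter objects as produced by the counting sketch in section 5.1; this is an illustration of the measure as described, not the code used in this research.

    def occurrence_percentage(melody_counts, sequence):
        """Percentage of a genre's melodies that contain `sequence` at least once.

        melody_counts: list of per-melody Counters of interval sequences (one per melody).
        sequence: tuple of intervals, e.g. (-2, 2).
        """
        n_containing = sum(1 for counts in melody_counts if counts[sequence] > 0)
        return 100.0 * n_containing / len(melody_counts)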

5.2 Interpretation
For each genre in the BAL corpus, the results as included in tables 10 and 11 and appendix C will be discussed. The correlation sorted results show rather low scores in the occurrence column for most of the sequences, so although these sequences have a strong correlation with a genre, they are not representative for most melodies of that genre. Definitive conclusions about melodies in genres based on this data should therefore be drawn with caution.

Bossa Nova
No other genre scores so many sequences with negative correlation in the occurrence sorted results as bossa nova. This is interesting because it indicates that melodic lines in bossa nova can be partly explained by what does not happen in the music. Looking at the correlation sorted results, a lot of movement in major seconds ({-2}, {2}) is observed, especially in the context of arpeggiation ({-2} followed by {2} or the other way around). Because these sequences of major seconds score rather high, the absence of the sequence {0, -2, 0} is remarkable. It indicates that a repeated note is rarely followed by a repeated note a major second lower, strengthening the assumption that the major second is used extensively in arpeggiation. Furthermore, there is a complete lack of descending motion in augmented fourths ({-6}, {-3,-3}) and only a very small amount of movement in minor sixths ({-8}, {8}). The only result that shows up in both tables is the rising minor second ({1}), indicating that this interval is used a lot more in this genre than in the other genres, except for tango. Tango then again stands out because it incorporates chromatic movement, which does not seem to be a bossa nova characteristic. The results obtained in part one indicated a lack of melodic thirds ({3} and {4}), which is in complete accordance with part two: only a single {-3} can be found in the correlation sorted results, and only a single {4} can be found in the occurrence sorted results.

Mambo
Both tables show very low occurrence for all sequences, and correlations are rather low compared to other genres as well. The sequences also lack interesting recurring interval patterns. Therefore no reliable conclusions can be drawn regarding characteristic melodic lines. It can, however, be argued that the mambo melodies might contain the most diverse range of pitch sequences of all the ballroom dance melodies. Part one presented a mambo feature indicating melodic fifths ({7}), while also noting that it would likely be unreliable. While the found interval sequences do show some ±{7} intervals, their occurrence is indeed too low to be reliable.

Merengue
This genre stands out by repeating notes more than the other genres: sequences of two and three repeating notes ({0}, {0, 0}) score high on both occurrence and correlation. These repeated notes are often accompanied by movement in major thirds ({4} or {-4}), and also on their own the major thirds seem to stand out. The perfect fifth ({7}) also shows up in both tables, which together with the major thirds might be indicative of movement over major chords. Finally, there is a complete lack of downward chromatic motion ({-1,-1}), indicating that the melodic lines in merengue are likely to move in a more tonal way: minor and major scales have no consecutive minor seconds.

Rumba
As with mambo, occurrence and correlation are rather low for rumba. Still, a large amount of movement in ascending minor thirds ({3}) can be observed, as well as patterns indicative of modulation in minor seconds ({-4, 3}). This is in accordance with the results found in part one, which indicated movement in melodic thirds ({3} and {4}). Because occurrence is so low for all sequences, an absence of other clear characteristics is likely.

Salsa
The correlation sorted results for salsa score too low to be really reliable. Some sequences do, however, show interesting movement: {-4, -3, -4} is a downward movement over a major seventh chord, and {-4, 2, 5} moves over a major add-two chord. Incorporating the occurrence sorted results, a strong indication of movement in minor thirds ({-3}) can be observed. Besides this movement, there is a sequence of length three with quite a high correlation ({-1, -4, 2}) which moves over a piece of the standard major or minor scale. Opposing these quite strong characteristics are other frequently occurring sequences which are more difficult to explain in terms of logical melodic movement ({-2, 5}, {1, -3}). This could again be an indication that salsa is not so much characterized by specific pitch sequences.

Tango
Of all ballroom dances, tango stands out for its extensive use of both ascending and descending minor seconds ({1}, {-1}), indicative of chromatic motion, a feature also found in part one of the research. Besides this, almost all sequences use only movement in (minor and major) seconds, showing that tango lacks large melodic jumps. When played, most of the sequences sound as if they are in a minor key and have a strong tendency to lead towards a certain note. Especially the sequence {-2, -2, -1} stands out as a characteristic tango melody, with a downward motion over a minor scale, over which the patterns {-1, -2, 7} and {-4, 0, -1} are also moving. This might be an indication that tango melodies, at least the ones presented in this corpus, stand out by being written in minor keys. Another result presented in part one was the absence of arpeggiation. As arpeggiation is movement back and forth between two notes, it can be represented by the two interval sequences {n, -n} and {-n, n}, with n any number between 0 and 12. Out of both tables, only three sequences incorporate {-1, 1}. It is thus plausible that very little arpeggiation occurs in tango songs.

6 Conclusion
In this research, both global features and a selected set of small melodic sequences have been examined in order to set up a basis for understanding musical genre through computational means. Using PCA proved useful for visualizing the data and detecting outliers, while it could not be used for actual feature selection. Hereafter, a measure for determining the amount of genre clustering and mix-up between genres was devised using unsupervised k-means clustering, which could then be used to verify the correlation between genres and features. Part two used the correlation between genres and interval sequences, together with the occurrence of these sequences among songs, to find genre specific melodic structures.

Both the global features and the small melodic sequences proved to be useful in the characterization of a diverse range of musical genres. Some genres, like merengue and mambo in part 1, showed much overlap with each other and seemed to be difficult to separate or characterize. This was, however, to be expected, as some genres are more likely to be characterized by other features (such as meta information). The extensive amounts of data provided through both parts of the research require careful further analysis by musicologists or other researchers with a decent musical background. The small amount of analysis performed in this report, however, served to show that the results indeed have an understandable and clearly interpretable meaning, which eventually might be used to expand the understanding of musical genres and to define them in a more formal way.

Furthermore, as shown in the interpretation of part two, many of the global features related to melody found in part one could be seen again in part two on a more detailed level. This serves to show that global features can be used in the exploration of interesting characteristics, while methods such as interval sequences can then be used to refine the found global features. The methods presented can thus be used in future work for exploring and refining the global features related to rhythm.

7 Discussion
Because the approach taken in this research was based on exploratory means of finding characteristic features, there is much room for discussing the used methods and alternative approaches. This section will provide an overview of the alternatives for each part of the research, as well as some general remarks, on which future work might be based.

7.1 Part 1
Inspecting PCA components
In the principal component analysis presented in the first part of the research, the principal component axes are made up of a combination of all features. It might be interesting to see which features are the most important within these axes and to see what happens to the position of the clusters when important features are removed. Though the contribution of the individual features to the PC axes was shown to be small, inspecting a combination of important features might help in getting a better understanding of characteristic features within one corpus.

7.2 Part 2
The use of counts of melodic interval sequences as features yields understandable information which, as was shown, can be used to describe a genre in more detail. There are, however, more possibilities for extracting features while making sure that these features remain clearly interpretable.

Conditional chance
Incorporating theory from the area of language processing, the individual pitch distances can be seen as an alphabet and sequences of these can be seen as n-grams. Because the sequences have been counted, it is now possible to create an n-gram melodic sequence model from the data and gather data about conditional interval chance. Combining this knowledge with the results obtained in this research, one can, within a genre, further inspect high scoring sequences to see in which way they are likely to continue. These chance features might also be used in classification, but presumably only 2-grams can be used in this way, because a single melodic line lacks enough information to give statistically accurate results for higher n-grams.

The second moment
In cryptography, the second moment of the frequency distribution of letters in a piece of enciphered information can be used to determine in which language it was likely written. As above, the individual pitch distances can be seen as the alphabet (in the case of this research the alphabet incorporates all pitch distances from {-12} up to {12}, but different values can be used). Now the second moment of the multinomially distributed intervals can be calculated using equation 1. Here f_i is the frequency of the pitch interval {i} and N is the sum of all interval frequencies.

S_2 = \sum_{i=-12}^{12} \frac{f_i (f_i - 1)}{N(N - 1)}    (1)

The outcome of this formula describes the way pitches are distributed within melodies and thus within genres. Whenever the pitches are distributed more equally, one expects to find a lower value for S_2, and whenever pitches are distributed unevenly, for instance in pentatonic scales where only 5 out of 12 pitches are frequently used, one expects to find a higher value for S_2. Such a measure might thus be indicative of the amount of tonality and the number of notes in a scale, and could also be used for classification purposes, where the initial second moments for the genres should be calculated using a large training corpus.
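Equation 1 translates directly into a small function; the input format (a mapping from interval to frequency) is an assumption made for this sketch.

    def second_moment(interval_counts):
        """S_2 of equation (1): interval_counts maps each interval i in -12..12 to its frequency f_i."""
        freqs = [interval_counts.get(i, 0) for i in range(-12, 13)]
        N = sum(freqs)  # total number of counted intervals
        return sum(f * (f - 1) for f in freqs) / (N * (N - 1))

For a roughly uniform distribution over the 25 possible intervals the value approaches 1/25, while a distribution concentrated on a few intervals pushes it towards 1.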
Multiple normal distributions
Though certain melodic sequences can be seen as genre characteristics because they occur more in that particular genre, it is not likely that the other genres completely lack them. It is more likely that each particular melodic sequence count (each feature) has a different mean and standard deviation for each genre. Though much data is needed for accurate estimation, these figures would then give a much clearer view of the numbers of melodic sequences in each genre. Furthermore, because these distributions also return a chance of a melody belonging to a genre given its melodic sequence counts, they might also be used for classification.

7.3 General
One Versus One Correlation
In both parts of the research, the classes used for calculating the correlation of the features with a particular genre were based on a one-versus-all genre grouping: the songs in the measured genre were said to be in class 1 and all the other songs belonged to class 0. While this serves to find strong characteristics distinguishing one genre from all other genres, other interesting genre differences might arise from calculating the correlation between just one genre and one other genre, or between one genre and some but not all of the other genres.

Exploring alternative musical aspects
The methods used in the second part of the research could easily be used to evaluate other feature sets derived from different musical aspects. One very obvious aspect is using counts of different sequences of note lengths, or even complete rhythmic patterns of melodic lines. Another possibility would be using chord sequences, which might be extracted by combining pitch information from multiple voices. On a somewhat larger scale, one could try to base counts or features on the overall structure of the song, a verse-chorus structure for example. This might also provide features describing the compressibility of a song: in popular music there is much repetition, so the repeating parts can be compressed, while a modern classical piece is often incompressible because it lacks repeating parts.

References
[1] Grove Music Online. oxfordmusiconline.com, June.
[2] Z. Cataltepe, Y. Yaslan, and A. Sonmez. Music genre classification using MIDI and audio features. EURASIP Journal on Applied Signal Processing, January.
[3] W. Chai and B. Vercoe. Folk music classification using hidden Markov models. In Proc. of the International Conference on Artificial Intelligence.
[4] E. Chew, A. Volk, and C.-Y. Lee. Dance music classification using inner metric analysis. In B. Golden, S. Raghavan, and E. Wasil, editors, The Next Wave in Computing, Optimization, and Decision Technologies, volume 29 of Operations Research/Computer Science Interfaces Series. Springer US.
[5] D. Conklin. Melodic analysis with segment classes. Machine Learning, 65.
[6] A. Honingh, T. Weyde, and D. Conklin. Sequential association rules in atonal music. In Proc. of Mathematics and Computation in Music (MCM2009).
[7] C. McKay. Automatic genre classification of MIDI recordings. Master's thesis, McGill University, Montreal.
[8] C. McKay and I. Fujinaga. Automatic genre classification using large high-level musical feature sets. In Int. Conf. on Music Information Retrieval, ISMIR.
[9] D. Rizo, P. J. Ponce de León, C. Pérez-Sancho, A. Pertusa, and J. M. Iñesta. A pattern recognition approach for melody track selection in MIDI files. In Int. Conf. on Music Information Retrieval, ISMIR, pages 61-66.
[10] X. Shao, C. Xu, and M. S. Kankanhalli. Unsupervised classification of music genre using hidden Markov model. In IEEE International Conference on Multimedia and Expo.
[11] B. L. Sturm. Classification accuracy is not enough: on the evaluation of music genre recognition systems.


More information

PITCH CLASS SET CATEGORIES AS ANALYSIS TOOLS FOR DEGREES OF TONALITY

PITCH CLASS SET CATEGORIES AS ANALYSIS TOOLS FOR DEGREES OF TONALITY PITCH CLASS SET CATEGORIES AS ANALYSIS TOOLS FOR DEGREES OF TONALITY Aline Honingh Rens Bod Institute for Logic, Language and Computation University of Amsterdam {A.K.Honingh,Rens.Bod}@uva.nl ABSTRACT

More information

Melody classification using patterns

Melody classification using patterns Melody classification using patterns Darrell Conklin Department of Computing City University London United Kingdom conklin@city.ac.uk Abstract. A new method for symbolic music classification is proposed,

More information

LESSON 1 PITCH NOTATION AND INTERVALS

LESSON 1 PITCH NOTATION AND INTERVALS FUNDAMENTALS I 1 Fundamentals I UNIT-I LESSON 1 PITCH NOTATION AND INTERVALS Sounds that we perceive as being musical have four basic elements; pitch, loudness, timbre, and duration. Pitch is the relative

More information

Automatic Piano Music Transcription

Automatic Piano Music Transcription Automatic Piano Music Transcription Jianyu Fan Qiuhan Wang Xin Li Jianyu.Fan.Gr@dartmouth.edu Qiuhan.Wang.Gr@dartmouth.edu Xi.Li.Gr@dartmouth.edu 1. Introduction Writing down the score while listening

More information

Automatic Music Clustering using Audio Attributes

Automatic Music Clustering using Audio Attributes Automatic Music Clustering using Audio Attributes Abhishek Sen BTech (Electronics) Veermata Jijabai Technological Institute (VJTI), Mumbai, India abhishekpsen@gmail.com Abstract Music brings people together,

More information

Exploring the Design Space of Symbolic Music Genre Classification Using Data Mining Techniques Ortiz-Arroyo, Daniel; Kofod, Christian

Exploring the Design Space of Symbolic Music Genre Classification Using Data Mining Techniques Ortiz-Arroyo, Daniel; Kofod, Christian Aalborg Universitet Exploring the Design Space of Symbolic Music Genre Classification Using Data Mining Techniques Ortiz-Arroyo, Daniel; Kofod, Christian Published in: International Conference on Computational

More information

METRICAL STRENGTH AND CONTRADICTION IN TURKISH MAKAM MUSIC

METRICAL STRENGTH AND CONTRADICTION IN TURKISH MAKAM MUSIC Proc. of the nd CompMusic Workshop (Istanbul, Turkey, July -, ) METRICAL STRENGTH AND CONTRADICTION IN TURKISH MAKAM MUSIC Andre Holzapfel Music Technology Group Universitat Pompeu Fabra Barcelona, Spain

More information

Chord Classification of an Audio Signal using Artificial Neural Network

Chord Classification of an Audio Signal using Artificial Neural Network Chord Classification of an Audio Signal using Artificial Neural Network Ronesh Shrestha Student, Department of Electrical and Electronic Engineering, Kathmandu University, Dhulikhel, Nepal ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

Melody Retrieval On The Web

Melody Retrieval On The Web Melody Retrieval On The Web Thesis proposal for the degree of Master of Science at the Massachusetts Institute of Technology M.I.T Media Laboratory Fall 2000 Thesis supervisor: Barry Vercoe Professor,

More information

A Pattern Recognition Approach for Melody Track Selection in MIDI Files

A Pattern Recognition Approach for Melody Track Selection in MIDI Files A Pattern Recognition Approach for Melody Track Selection in MIDI Files David Rizo, Pedro J. Ponce de León, Carlos Pérez-Sancho, Antonio Pertusa, José M. Iñesta Departamento de Lenguajes y Sistemas Informáticos

More information

Semi-supervised Musical Instrument Recognition

Semi-supervised Musical Instrument Recognition Semi-supervised Musical Instrument Recognition Master s Thesis Presentation Aleksandr Diment 1 1 Tampere niversity of Technology, Finland Supervisors: Adj.Prof. Tuomas Virtanen, MSc Toni Heittola 17 May

More information

GCSE MUSIC UNIT 3 APPRAISING. Mock Assessment Materials NOVEMBER hour approximately

GCSE MUSIC UNIT 3 APPRAISING. Mock Assessment Materials NOVEMBER hour approximately Candidate Name Centre Number Candidate Number GCSE MUSIC UNIT 3 APPRAISING Mock Assessment Materials NOVEMBER 2017 1 hour approximately Examiners Use Only Question Max Mark 1 9 2 9 3 9 4 9 5 9 6 9 7 9

More information

PLANE TESSELATION WITH MUSICAL-SCALE TILES AND BIDIMENSIONAL AUTOMATIC COMPOSITION

PLANE TESSELATION WITH MUSICAL-SCALE TILES AND BIDIMENSIONAL AUTOMATIC COMPOSITION PLANE TESSELATION WITH MUSICAL-SCALE TILES AND BIDIMENSIONAL AUTOMATIC COMPOSITION ABSTRACT We present a method for arranging the notes of certain musical scales (pentatonic, heptatonic, Blues Minor and

More information

CHAPTER 3. Melody Style Mining

CHAPTER 3. Melody Style Mining CHAPTER 3 Melody Style Mining 3.1 Rationale Three issues need to be considered for melody mining and classification. One is the feature extraction of melody. Another is the representation of the extracted

More information

Automatic Rhythmic Notation from Single Voice Audio Sources

Automatic Rhythmic Notation from Single Voice Audio Sources Automatic Rhythmic Notation from Single Voice Audio Sources Jack O Reilly, Shashwat Udit Introduction In this project we used machine learning technique to make estimations of rhythmic notation of a sung

More information

Analysis of local and global timing and pitch change in ordinary

Analysis of local and global timing and pitch change in ordinary Alma Mater Studiorum University of Bologna, August -6 6 Analysis of local and global timing and pitch change in ordinary melodies Roger Watt Dept. of Psychology, University of Stirling, Scotland r.j.watt@stirling.ac.uk

More information

MELONET I: Neural Nets for Inventing Baroque-Style Chorale Variations

MELONET I: Neural Nets for Inventing Baroque-Style Chorale Variations MELONET I: Neural Nets for Inventing Baroque-Style Chorale Variations Dominik Hornel dominik@ira.uka.de Institut fur Logik, Komplexitat und Deduktionssysteme Universitat Fridericiana Karlsruhe (TH) Am

More information

2010 Music Solo Performance GA 3: Aural and written examination

2010 Music Solo Performance GA 3: Aural and written examination 2010 Music Solo Performance GA 3: Aural and written examination GENERAL COMMENTS The 2010 Music Solo Performance aural and written examination consisted of three sections and was worth 105 marks. All sections

More information

Robert Alexandru Dobre, Cristian Negrescu

Robert Alexandru Dobre, Cristian Negrescu ECAI 2016 - International Conference 8th Edition Electronics, Computers and Artificial Intelligence 30 June -02 July, 2016, Ploiesti, ROMÂNIA Automatic Music Transcription Software Based on Constant Q

More information

Pitch Spelling Algorithms

Pitch Spelling Algorithms Pitch Spelling Algorithms David Meredith Centre for Computational Creativity Department of Computing City University, London dave@titanmusic.com www.titanmusic.com MaMuX Seminar IRCAM, Centre G. Pompidou,

More information

Supervised Learning in Genre Classification

Supervised Learning in Genre Classification Supervised Learning in Genre Classification Introduction & Motivation Mohit Rajani and Luke Ekkizogloy {i.mohit,luke.ekkizogloy}@gmail.com Stanford University, CS229: Machine Learning, 2009 Now that music

More information

An Integrated Music Chromaticism Model

An Integrated Music Chromaticism Model An Integrated Music Chromaticism Model DIONYSIOS POLITIS and DIMITRIOS MARGOUNAKIS Dept. of Informatics, School of Sciences Aristotle University of Thessaloniki University Campus, Thessaloniki, GR-541

More information

Automatic Laughter Detection

Automatic Laughter Detection Automatic Laughter Detection Mary Knox Final Project (EECS 94) knoxm@eecs.berkeley.edu December 1, 006 1 Introduction Laughter is a powerful cue in communication. It communicates to listeners the emotional

More information

Music Composition with RNN

Music Composition with RNN Music Composition with RNN Jason Wang Department of Statistics Stanford University zwang01@stanford.edu Abstract Music composition is an interesting problem that tests the creativity capacities of artificial

More information

Study Guide. Solutions to Selected Exercises. Foundations of Music and Musicianship with CD-ROM. 2nd Edition. David Damschroder

Study Guide. Solutions to Selected Exercises. Foundations of Music and Musicianship with CD-ROM. 2nd Edition. David Damschroder Study Guide Solutions to Selected Exercises Foundations of Music and Musicianship with CD-ROM 2nd Edition by David Damschroder Solutions to Selected Exercises 1 CHAPTER 1 P1-4 Do exercises a-c. Remember

More information

Singer Recognition and Modeling Singer Error

Singer Recognition and Modeling Singer Error Singer Recognition and Modeling Singer Error Johan Ismael Stanford University jismael@stanford.edu Nicholas McGee Stanford University ndmcgee@stanford.edu 1. Abstract We propose a system for recognizing

More information

Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons

Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons Róisín Loughran roisin.loughran@ul.ie Jacqueline Walker jacqueline.walker@ul.ie Michael O Neill University

More information

TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC

TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC G.TZANETAKIS, N.HU, AND R.B. DANNENBERG Computer Science Department, Carnegie Mellon University 5000 Forbes Avenue, Pittsburgh, PA 15213, USA E-mail: gtzan@cs.cmu.edu

More information

Improving Frame Based Automatic Laughter Detection

Improving Frame Based Automatic Laughter Detection Improving Frame Based Automatic Laughter Detection Mary Knox EE225D Class Project knoxm@eecs.berkeley.edu December 13, 2007 Abstract Laughter recognition is an underexplored area of research. My goal for

More information

Harmonic syntax and high-level statistics of the songs of three early Classical composers

Harmonic syntax and high-level statistics of the songs of three early Classical composers Harmonic syntax and high-level statistics of the songs of three early Classical composers Wendy de Heer Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report

More information

Music Segmentation Using Markov Chain Methods

Music Segmentation Using Markov Chain Methods Music Segmentation Using Markov Chain Methods Paul Finkelstein March 8, 2011 Abstract This paper will present just how far the use of Markov Chains has spread in the 21 st century. We will explain some

More information

Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University

Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University danny1@stanford.edu 1. Motivation and Goal Music has long been a way for people to express their emotions. And because we all have a

More information

Style-independent computer-assisted exploratory analysis of large music collections

Style-independent computer-assisted exploratory analysis of large music collections Style-independent computer-assisted exploratory analysis of large music collections Abstract Cory McKay Schulich School of Music McGill University Montreal, Quebec, Canada cory.mckay@mail.mcgill.ca The

More information

MUSIC CURRICULM MAP: KEY STAGE THREE:

MUSIC CURRICULM MAP: KEY STAGE THREE: YEAR SEVEN MUSIC CURRICULM MAP: KEY STAGE THREE: 2013-2015 ONE TWO THREE FOUR FIVE Understanding the elements of music Understanding rhythm and : Performing Understanding rhythm and : Composing Understanding

More information

A repetition-based framework for lyric alignment in popular songs

A repetition-based framework for lyric alignment in popular songs A repetition-based framework for lyric alignment in popular songs ABSTRACT LUONG Minh Thang and KAN Min Yen Department of Computer Science, School of Computing, National University of Singapore We examine

More information

Music Complexity Descriptors. Matt Stabile June 6 th, 2008

Music Complexity Descriptors. Matt Stabile June 6 th, 2008 Music Complexity Descriptors Matt Stabile June 6 th, 2008 Musical Complexity as a Semantic Descriptor Modern digital audio collections need new criteria for categorization and searching. Applicable to:

More information

MUSIC (MUS) Music (MUS) 1

MUSIC (MUS) Music (MUS) 1 Music (MUS) 1 MUSIC (MUS) MUS 2 Music Theory 3 Units (Degree Applicable, CSU, UC, C-ID #: MUS 120) Corequisite: MUS 5A Preparation for the study of harmony and form as it is practiced in Western tonal

More information

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring 2009 Week 6 Class Notes Pitch Perception Introduction Pitch may be described as that attribute of auditory sensation in terms

More information

INTER GENRE SIMILARITY MODELLING FOR AUTOMATIC MUSIC GENRE CLASSIFICATION

INTER GENRE SIMILARITY MODELLING FOR AUTOMATIC MUSIC GENRE CLASSIFICATION INTER GENRE SIMILARITY MODELLING FOR AUTOMATIC MUSIC GENRE CLASSIFICATION ULAŞ BAĞCI AND ENGIN ERZIN arxiv:0907.3220v1 [cs.sd] 18 Jul 2009 ABSTRACT. Music genre classification is an essential tool for

More information

Automatic Laughter Detection

Automatic Laughter Detection Automatic Laughter Detection Mary Knox 1803707 knoxm@eecs.berkeley.edu December 1, 006 Abstract We built a system to automatically detect laughter from acoustic features of audio. To implement the system,

More information

Machine Learning Term Project Write-up Creating Models of Performers of Chopin Mazurkas

Machine Learning Term Project Write-up Creating Models of Performers of Chopin Mazurkas Machine Learning Term Project Write-up Creating Models of Performers of Chopin Mazurkas Marcello Herreshoff In collaboration with Craig Sapp (craig@ccrma.stanford.edu) 1 Motivation We want to generative

More information

Predicting Variation of Folk Songs: A Corpus Analysis Study on the Memorability of Melodies Janssen, B.D.; Burgoyne, J.A.; Honing, H.J.

Predicting Variation of Folk Songs: A Corpus Analysis Study on the Memorability of Melodies Janssen, B.D.; Burgoyne, J.A.; Honing, H.J. UvA-DARE (Digital Academic Repository) Predicting Variation of Folk Songs: A Corpus Analysis Study on the Memorability of Melodies Janssen, B.D.; Burgoyne, J.A.; Honing, H.J. Published in: Frontiers in

More information

Music Genre Classification

Music Genre Classification Music Genre Classification chunya25 Fall 2017 1 Introduction A genre is defined as a category of artistic composition, characterized by similarities in form, style, or subject matter. [1] Some researchers

More information

2014 Music Style and Composition GA 3: Aural and written examination

2014 Music Style and Composition GA 3: Aural and written examination 2014 Music Style and Composition GA 3: Aural and written examination GENERAL COMMENTS The 2014 Music Style and Composition examination consisted of two sections, worth a total of 100 marks. Both sections

More information

Statistical Modeling and Retrieval of Polyphonic Music

Statistical Modeling and Retrieval of Polyphonic Music Statistical Modeling and Retrieval of Polyphonic Music Erdem Unal Panayiotis G. Georgiou and Shrikanth S. Narayanan Speech Analysis and Interpretation Laboratory University of Southern California Los Angeles,

More information

A FUNCTIONAL CLASSIFICATION OF ONE INSTRUMENT S TIMBRES

A FUNCTIONAL CLASSIFICATION OF ONE INSTRUMENT S TIMBRES A FUNCTIONAL CLASSIFICATION OF ONE INSTRUMENT S TIMBRES Panayiotis Kokoras School of Music Studies Aristotle University of Thessaloniki email@panayiotiskokoras.com Abstract. This article proposes a theoretical

More information

Computer Coordination With Popular Music: A New Research Agenda 1

Computer Coordination With Popular Music: A New Research Agenda 1 Computer Coordination With Popular Music: A New Research Agenda 1 Roger B. Dannenberg roger.dannenberg@cs.cmu.edu http://www.cs.cmu.edu/~rbd School of Computer Science Carnegie Mellon University Pittsburgh,

More information

2 2. Melody description The MPEG-7 standard distinguishes three types of attributes related to melody: the fundamental frequency LLD associated to a t

2 2. Melody description The MPEG-7 standard distinguishes three types of attributes related to melody: the fundamental frequency LLD associated to a t MPEG-7 FOR CONTENT-BASED MUSIC PROCESSING Λ Emilia GÓMEZ, Fabien GOUYON, Perfecto HERRERA and Xavier AMATRIAIN Music Technology Group, Universitat Pompeu Fabra, Barcelona, SPAIN http://www.iua.upf.es/mtg

More information

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 AN HMM BASED INVESTIGATION OF DIFFERENCES BETWEEN MUSICAL INSTRUMENTS OF THE SAME TYPE PACS: 43.75.-z Eichner, Matthias; Wolff, Matthias;

More information

arxiv: v1 [cs.sd] 8 Jun 2016

arxiv: v1 [cs.sd] 8 Jun 2016 Symbolic Music Data Version 1. arxiv:1.5v1 [cs.sd] 8 Jun 1 Christian Walder CSIRO Data1 7 London Circuit, Canberra,, Australia. christian.walder@data1.csiro.au June 9, 1 Abstract In this document, we introduce

More information

ILLINOIS LICENSURE TESTING SYSTEM

ILLINOIS LICENSURE TESTING SYSTEM ILLINOIS LICENSURE TESTING SYSTEM FIELD 212: MUSIC January 2017 Effective beginning September 3, 2018 ILLINOIS LICENSURE TESTING SYSTEM FIELD 212: MUSIC January 2017 Subarea Range of Objectives I. Responding:

More information

Subjective Similarity of Music: Data Collection for Individuality Analysis

Subjective Similarity of Music: Data Collection for Individuality Analysis Subjective Similarity of Music: Data Collection for Individuality Analysis Shota Kawabuchi and Chiyomi Miyajima and Norihide Kitaoka and Kazuya Takeda Nagoya University, Nagoya, Japan E-mail: shota.kawabuchi@g.sp.m.is.nagoya-u.ac.jp

More information

Automatic Polyphonic Music Composition Using the EMILE and ABL Grammar Inductors *

Automatic Polyphonic Music Composition Using the EMILE and ABL Grammar Inductors * Automatic Polyphonic Music Composition Using the EMILE and ABL Grammar Inductors * David Ortega-Pacheco and Hiram Calvo Centro de Investigación en Computación, Instituto Politécnico Nacional, Av. Juan

More information

Music Theory. Fine Arts Curriculum Framework. Revised 2008

Music Theory. Fine Arts Curriculum Framework. Revised 2008 Music Theory Fine Arts Curriculum Framework Revised 2008 Course Title: Music Theory Course/Unit Credit: 1 Course Number: Teacher Licensure: Grades: 9-12 Music Theory Music Theory is a two-semester course

More information

Automated extraction of motivic patterns and application to the analysis of Debussy s Syrinx

Automated extraction of motivic patterns and application to the analysis of Debussy s Syrinx Automated extraction of motivic patterns and application to the analysis of Debussy s Syrinx Olivier Lartillot University of Jyväskylä, Finland lartillo@campus.jyu.fi 1. General Framework 1.1. Motivic

More information

Evaluating Melodic Encodings for Use in Cover Song Identification

Evaluating Melodic Encodings for Use in Cover Song Identification Evaluating Melodic Encodings for Use in Cover Song Identification David D. Wickland wickland@uoguelph.ca David A. Calvert dcalvert@uoguelph.ca James Harley jharley@uoguelph.ca ABSTRACT Cover song identification

More information

Supporting Information

Supporting Information Supporting Information I. DATA Discogs.com is a comprehensive, user-built music database with the aim to provide crossreferenced discographies of all labels and artists. As of April 14, more than 189,000

More information

Introduction to Set Theory by Stephen Taylor

Introduction to Set Theory by Stephen Taylor Introduction to Set Theory by Stephen Taylor http://composertools.com/tools/pcsets/setfinder.html 1. Pitch Class The 12 notes of the chromatic scale, independent of octaves. C is the same pitch class,

More information

Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis

Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis Fengyan Wu fengyanyy@163.com Shutao Sun stsun@cuc.edu.cn Weiyao Xue Wyxue_std@163.com Abstract Automatic extraction of

More information

Building a Better Bach with Markov Chains

Building a Better Bach with Markov Chains Building a Better Bach with Markov Chains CS701 Implementation Project, Timothy Crocker December 18, 2015 1 Abstract For my implementation project, I explored the field of algorithmic music composition

More information

Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes

Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes hello Jay Biernat Third author University of Rochester University of Rochester Affiliation3 words jbiernat@ur.rochester.edu author3@ismir.edu

More information

Some properties of non-octave-repeating scales, and why composers might care

Some properties of non-octave-repeating scales, and why composers might care Some properties of non-octave-repeating scales, and why composers might care Craig Weston How to cite this presentation If you make reference to this version of the manuscript, use the following information:

More information

Partimenti Pedagogy at the European American Musical Alliance, Derek Remeš

Partimenti Pedagogy at the European American Musical Alliance, Derek Remeš Partimenti Pedagogy at the European American Musical Alliance, 2009-2010 Derek Remeš The following document summarizes the method of teaching partimenti (basses et chants donnés) at the European American

More information

The purpose of this essay is to impart a basic vocabulary that you and your fellow

The purpose of this essay is to impart a basic vocabulary that you and your fellow Music Fundamentals By Benjamin DuPriest The purpose of this essay is to impart a basic vocabulary that you and your fellow students can draw on when discussing the sonic qualities of music. Excursions

More information

Influence of timbre, presence/absence of tonal hierarchy and musical training on the perception of musical tension and relaxation schemas

Influence of timbre, presence/absence of tonal hierarchy and musical training on the perception of musical tension and relaxation schemas Influence of timbre, presence/absence of tonal hierarchy and musical training on the perception of musical and schemas Stella Paraskeva (,) Stephen McAdams (,) () Institut de Recherche et de Coordination

More information

EE391 Special Report (Spring 2005) Automatic Chord Recognition Using A Summary Autocorrelation Function

EE391 Special Report (Spring 2005) Automatic Chord Recognition Using A Summary Autocorrelation Function EE391 Special Report (Spring 25) Automatic Chord Recognition Using A Summary Autocorrelation Function Advisor: Professor Julius Smith Kyogu Lee Center for Computer Research in Music and Acoustics (CCRMA)

More information

Student Performance Q&A:

Student Performance Q&A: Student Performance Q&A: 2004 AP Music Theory Free-Response Questions The following comments on the 2004 free-response questions for AP Music Theory were written by the Chief Reader, Jo Anne F. Caputo

More information

10 Lessons In Jazz Improvisation By Mike Steinel University of North Texas

10 Lessons In Jazz Improvisation By Mike Steinel University of North Texas 10 Lessons In Jazz Improvisation By Mike Steinel University of North Texas Michael.steinel@unt.edu Sponsored by Hal Leonard Corporation And Yamaha Musical Instruments 10 Basic Lessons #1 - You Gotta Love

More information

6.UAP Project. FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System. Daryl Neubieser. May 12, 2016

6.UAP Project. FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System. Daryl Neubieser. May 12, 2016 6.UAP Project FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System Daryl Neubieser May 12, 2016 Abstract: This paper describes my implementation of a variable-speed accompaniment system that

More information