Analysis and Clustering of Musical Compositions using Melody-based Features


Isaac Caswell and Erika Ji
December 13, 2013

Abstract

This paper demonstrates that melodic structure fundamentally differentiates musical genres. We use two methods: the k-means algorithm, an unsupervised learning method for clustering unorganized music, and a Markov Chain Model, which assigns relative probabilities to possible values of the next note. We evaluate each algorithm's accuracy in predicting the correct genre. Our experiments indicate that the k-means approach is modestly successful at separating out most genres, whereas the Markov Chain Model tends to be very accurate for music classification.

1 Objective

This paper demonstrates that melodic structure, i.e. note subsequences and which notes are likely to follow other notes, can fundamentally differentiate musical genres, without additional information about instrumentation, chord structure, language, etc. This idea is inspired in part by the concept in Indian classical music that each raga, or scale, is distinguished by its own characteristic melodic phrases, or melodic idioms. The same occurs in Western classical music; consider, for instance, the typical third, trilled second, root, root figures that end phrases in Baroque pieces in the major scale.

Potential applications of our models include: using the k-means algorithm to define musical clusters for melodies without specified genres; using either k-means cluster centroids or the Markov Chain Model to find known melodies similar to a new melody; using either technique for automated genre prediction; and using the Markov Chain Model to generate new melodies within a musical genre or tradition.

2 Data

Songs were stored as arrays of integers, with each integer representing a musical note. Rhythm was ignored. The sources of data were the following:

- 2,000 Irish traditional songs scraped from thesession.org, in the Dorian, Mixolydian, Ionian (Major) and Aeolian (Minor) modes and in the time signatures 2/4, 3/4, 4/4, 6/8 and 9/8. The majority (70%) were major.
- 27 Carnatic (South Indian classical) compositions in scales equivalent to the Irish modes: Shankarabharanam (Major), Kharaharapriya (Dorian), Harikambhoji (Mixolydian), Bhairavi (Minor), and Malahari (incomplete Minor).
- Smaller data sets: a variety of children's songs and 13 Sarali Varasai (Carnatic vocal exercises) in ragam Mayamalavagowla.

All data are expressed relative to the root: the key is disregarded.
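For concreteness, the following is a minimal sketch of this representation; the helper function and example pitches are our own illustration, not code from this project.

```python
# Hypothetical illustration of the data representation described above: a song is
# an array of integers, one per note, expressed relative to the root (rhythm and
# key are discarded). Here 0 is the root and each unit is one half step above it.

def to_relative(pitches, root):
    """Express absolute (e.g. MIDI) pitches relative to a root pitch."""
    return [p - root for p in pitches]

# Opening notes of a D-major tune (D = 62 in MIDI numbering):
print(to_relative([62, 66, 69, 66, 64, 62], root=62))  # [0, 4, 7, 4, 2, 0]
```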
3 Methods

3.1 k-means Clustering

3.1.1 Rationale

Given a dataset of melodies of unknown genre, can we identify which melodies are similar? To answer this question, we looked for an unsupervised learning algorithm with a non-probabilistic model to identify song clusters. Although we believed that our data would fit subspaces better than clusters, we did not want to obscure the data's original features in our results; we therefore used the k-means algorithm rather than alternatives such as PCA. To test the success of our clustering, we ran k-means on two melody genres with only two clusters, used maximum recall probability to determine the correct cluster assignment, and calculated an F-score to account for both precision and recall error. A higher F-score indicates less error and better success. To account for variability in k-means, we averaged the F-scores over multiple iterations of k-means. Eventually, to determine the ideal cluster for a new song, one could compare the song's distance in the feature space to each cluster centroid; the cluster whose centroid lies the minimum distance away would be the cluster of best fit.
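As a concrete illustration of this evaluation procedure, here is a minimal sketch, assuming scikit-learn and a precomputed feature matrix X (one row per song) with binary genre labels; the function name and details are our reconstruction, not the authors' code.

```python
# A minimal sketch of the two-genre clustering evaluation: run 2-means several
# times, resolve the arbitrary cluster-to-genre assignment by maximum recall,
# score each run with an F-score, and average over runs.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import f1_score, recall_score

def clustering_f_score(X, labels, n_runs=10):
    """Average F-score of 2-means over n_runs; labels are 0/1 genre ids."""
    scores = []
    for seed in range(n_runs):
        pred = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X)
        # Cluster ids are arbitrary: keep whichever assignment has higher recall.
        best = pred if recall_score(labels, pred) >= recall_score(labels, 1 - pred) else 1 - pred
        scores.append(f1_score(labels, best))
    return float(np.mean(scores))
```

Averaging over several seeded restarts mirrors the averaging over multiple iterations of k-means described above.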

3.1.2 Feature Definition

We define our features to be sequences of absolute notes of a specified length, such as C-E-G#. We characterize each composition by the frequency of each feature in that composition, and cluster the compositions based on their locations in the resulting feature space.

3.1.3 Feature Subset Selection

Since using all possible note sequences as features would produce too sparse a feature space for k-means clustering, we selected a subset of features to serve as the axes of the feature space. We considered two methods for selecting features: 1) selecting the features that are most frequent overall across all songs in the two genres, and 2) selecting the features with the highest variance in relative frequency, i.e. features that are very frequent in some categories and very infrequent in others.

3.1.4 Number of Features

We varied the number of features selected for k-means clustering from 1 to 200. Using a single feature is equivalent to testing whether one note sequence is more prevalent in some genres than in others. The maximum possible number of features is (number of notes)^(feature length).

3.1.5 Feature Length

We examined features of lengths 1 through 5. For feature length 1, our algorithm is equivalent to analyzing differences in note frequency distributions.
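To make this feature pipeline concrete, here is a minimal sketch of n-gram extraction and both selection criteria from Section 3.1.3; the function names and the numpy usage are our own illustration, not code from the paper.

```python
# A minimal sketch of melodic n-gram features: count every note subsequence of a
# given length in each song, then keep either the overall most frequent n-grams
# or the n-grams whose relative frequency varies most across songs.
from collections import Counter
import numpy as np

def ngram_frequencies(song, length):
    """Relative frequency of each note subsequence of `length` in one song."""
    grams = Counter(tuple(song[i:i + length]) for i in range(len(song) - length + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def select_features(songs, length, n_features, by="frequency"):
    """Select n-gram features by total frequency or by variance of relative
    frequency across songs, and return them with the resulting feature matrix."""
    freqs = [ngram_frequencies(s, length) for s in songs]
    vocab = sorted({g for f in freqs for g in f})
    mat = np.array([[f.get(g, 0.0) for g in vocab] for f in freqs])
    score = mat.sum(axis=0) if by == "frequency" else mat.var(axis=0)
    keep = np.argsort(score)[::-1][:n_features]
    return [vocab[i] for i in keep], mat[:, keep]
```

The returned matrix can serve directly as the input X to the clustering evaluation sketched in Section 3.1.1.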
3.2 Markov Chain Model

For our second model, we modeled each genre with a Markov chain model for a variety of levels k. This model makes intuitive sense as a way to capture melodic idioms, because it explicitly models each note as being drawn from a probability distribution dependent on the k notes directly preceding it. For a level-k Markov chain model, we model the probability P(d^{(i)} | g) that a held-out document d^{(i)} of length n belongs to a genre g with the following formula:

P(d^{(i)} \mid g) = \prod_{j=k+1}^{n} p(d^{(i)}_j \mid d^{(i)}_{j-k \ldots j-1}, g)

where each term p(d^{(i)}_j | d^{(i)}_{j-k \ldots j-1}, g) is the smoothed probability, given by the Markov chain for genre g, that the k-note subsequence d^{(i)}_{j-k \ldots j-1} (henceforth also: feature) is followed by the note d^{(i)}_j. This equation results from a slightly stronger variant of the Naive Bayes assumption: it is derived by assuming that note d^{(i)}_j is independent of all notes more than k notes before it. Because of the varying sizes of the data sets, we used a standard 70/30% hold-out split for the large data sets and leave-one-out cross validation (LOOCV) for the smaller ones.
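The following is a minimal sketch of such a classifier; the add-one (Laplace) smoothing and log-space scoring are our assumptions, since the text specifies only that the probabilities are smoothed.

```python
# A minimal sketch of a level-k Markov chain genre model: train one model per
# genre by counting which note follows each k-note context, then classify a
# held-out song by the genre maximizing the (log) product in the formula above.
import math
from collections import Counter, defaultdict

class MarkovGenreModel:
    def __init__(self, k, alphabet_size):
        self.k = k
        self.V = alphabet_size                       # number of possible notes
        self.context_counts = defaultdict(Counter)   # k-note context -> next-note counts

    def train(self, songs):
        for song in songs:
            for j in range(self.k, len(song)):
                self.context_counts[tuple(song[j - self.k:j])][song[j]] += 1

    def log_prob(self, song):
        """log P(d | g) = sum_j log p(d_j | d_{j-k..j-1}, g), add-one smoothed."""
        lp = 0.0
        for j in range(self.k, len(song)):
            counts = self.context_counts.get(tuple(song[j - self.k:j]), Counter())
            lp += math.log((counts[song[j]] + 1) / (sum(counts.values()) + self.V))
        return lp

def classify(song, models):
    """Pick the genre (a key of `models`) whose model assigns the highest probability."""
    return max(models, key=lambda g: models[g].log_prob(song))
```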

4 Results and Discussion

The Markov Chain Model tended to perform quite well, with training error of around 1% for k = 3 on the entire data set. The following results use representations of the songs in terms of relative degree.

Observing the performance of the model as a function of k provides important insight into the data (Figure 2b). Over a variety of parameter values and data subsets, Markov models of level 3 and 4 showed the least average training error. The failure of k = 1 to predict genre well demonstrates that looking only at the single preceding note, i.e. at melodic idioms of length two (intervals), is too myopic. (Note that it still performs significantly better than chance, however.) Longer features also lead to poorer models. This too makes intuitive sense: longer subsequences begin to be characteristic of the overall melody of one specific song, and are long enough to be easily consciously recognized; the drive to be unique will therefore discourage songs from developing shared features of this length. Equally importantly, the data become sparser for these k-values, because the size of the feature space is exponential in the length of the feature.

We postulate, moreover, that levels three and four showed the best genre categorization because they match the most common lengths of measures, which are natural structural breaks in a melody. Irish songs in particular tend to be strongly rhythmic, and furthermore form a more robust dataset. We therefore separated the Irish tunes into categories based on their time signatures (2/4, 3/4, 4/4, 6/8 and 9/8), predicting that k = 3 would perform better on 3/4, 6/8 and 9/8, whereas k = 4 would perform better on 4/4 (Figure 3). This certainly turned out to be true for k = 4. For k = 3 the odd-numbered time signatures fared much better, but still had higher error than 4/4. This could indicate that songs in 4/4 time tend to be more self-similar and therefore more predictable, or it could be a result of the fact that this category contains more songs.

The effect of the chosen feature length can therefore be viewed as, to some extent, implicitly modeling low-level structural elements of the songs. This illuminates the inevitable bias of so simple a model in a domain, music, that is renowned for its complexity. This is supported by the fact that increasing the size of the dataset did not have a significant effect on the model's predictive capability. A model which explicitly takes structure into account, such as a hierarchical feature model, would certainly be excellent for this task, but implementing one was beyond the resources of the authors.

By considering the relative degree of notes in a melody instead of a note's absolute displacement in half steps from the root, as we do with the Markov Chain Model, we demonstrate something even more surprising: within the same musical tradition and the same genre, one can distinguish songs from different scales, even when they are projected onto the same relative scale. It is easy to suppose, for instance, that major and minor melodies are fundamentally the same, differing only in that the latter have flat thirds, sixths and sevenths. This paper demonstrates that, at least for Irish folk tunes, this is not the case: each mode instead appears to be characterized by particular relative melodic idioms that are independent of the absolute distance in half steps from the root.

This trend is especially visible in Irish folk music. One reason it might be particularly clear for this genre lies in the instruments the songs are played on. Many traditional instruments, such as the penny whistle (feadóg) and the harp, are tuned diatonically (i.e. to the white keys of the piano), so different modes are necessarily also in different keys. This can affect which melodies are easiest, or possible at all, to play. On the penny whistle, for instance, it is particularly easy to go to the seventh (all fingers down) before hitting the root (one finger up), whereas this pattern is impossible in the major scale except in the octave above.
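To make the relative-degree representation concrete, here is a small illustration; the mode tables and function are our own, not the paper's.

```python
# Hypothetical illustration of relative degree vs. half steps from the root: the
# degree sequence 1-2-3-5 is 0-2-4-7 half steps in major but 0-2-3-7 in dorian,
# so projecting onto degrees puts both modes on the same relative scale.
MAJOR  = [0, 2, 4, 5, 7, 9, 11]   # half-step offsets of degrees 1..7
DORIAN = [0, 2, 3, 5, 7, 9, 10]

def to_degrees(half_steps, mode):
    """Map half-step offsets from the root to scale degrees 1..7."""
    return [mode.index(h % 12) + 1 for h in half_steps]

print(to_degrees([0, 2, 4, 7], MAJOR))   # [1, 2, 3, 5]
print(to_degrees([0, 2, 3, 7], DORIAN))  # [1, 2, 3, 5]: same degrees, different half steps
```

The finding above is that even after this projection, the modes remain distinguishable by their melodic idioms.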

Table 1: F-scores for feature subset selection by highest frequency. Each entry is the F-score for two-cluster k-means separation of the row genre from the column genre (Section 3.1.1); columns are (1) malahari, (2) children's tunes, (3) harikambodhi, (4) shankarabharanam, (5) bhairavi, (6) kharaharapriya, (7) major, (8) minor, (9) dorian, (10) mixolydian.

GENRE             (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)
sarali varasai    0.8833  0.8198  0.8077  0.8539  0.9167  0.9042  0.2790  0.5747  0.5983  0.8228
malahari                  0.8542  0.8333  0.8452  0.8667  0.8542  0.2456  0.5531  0.4053  0.4494
children's tunes                  0.7471  0.6392  0.8875  0.8750  0.2482  0.4246  0.4405  0.5118
harikambodhi                              0.6198  0.8667  0.6889  0.2521  0.3906  0.4160  0.5420
shankarabharanam                                  0.8786  0.8661  0.2517  0.4428  0.3816  0.6126
bhairavi                                                  0.6467  0.2636  0.5680  0.5159  0.7406
kharaharapriya                                                    0.2509  0.4734  0.4875  0.7510
major                                                                     0.5043  0.4459  0.3519
minor                                                                             0.5983  0.6684
dorian                                                                                    0.5600

Mean F-score: 0.6032

Table 2: F-scores for feature subset selection by highest variance. Rows and columns as in Table 1.

GENRE             (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)
sarali varasai    0.7266  0.7566  0.6701  0.7855  0.8328  0.6466  0.3163  0.9582  0.9594  0.9314
malahari                  0.7689  0.8333  0.8452  0.8667  0.8542  0.2450  0.5170  0.4008  0.4052
children's tunes                  0.6410  0.6606  0.6990  0.6761  0.2463  0.4464  0.3854  0.5453
harikambodhi                              0.6750  0.8667  0.6889  0.2392  0.3850  0.3407  0.4590
shankarabharanam                                  0.8786  0.8661  0.2478  0.4250  0.3626  0.4593
bhairavi                                                  0.7075  0.2589  0.5344  0.4327  0.6249
kharaharapriya                                                    0.2520  0.4581  0.3857  0.5129
major                                                                     0.4172  0.4372  0.3386
minor                                                                             0.6197  0.6813
dorian                                                                                    0.5619

Mean F-score: 0.5770

Figure 1: F-score by Number of Features.

Figure 2: (a) F-score by Number of Features. (b) Average classification error over held-out documents in all genres, for the complete dataset (minus the smaller data sets), nine categories.

Figure 3: Training error versus time signature (beats per measure). When training with features of length 4 (right), songs in 4/4 are classified much more accurately. Features of length 3 (left) significantly improve the classification of the odd-valued time signatures, but these are still not classified as well as 4/4.