Analyzing the Relationship Among Audio Labels Using Hubert-Arabie adjusted Rand Index


Kwan Kim

Submitted in partial fulfillment of the requirements for the Master of Music in Music Technology in the Department of Music and Performing Arts Professions in The Steinhardt School, New York University

Advisor: Dr. Juan P. Bello
Reader: Dr. Kenneth Peacock
Date: 2012/12/11

Copyright © 2012 Kwan Kim

Abstract

With the advent of advanced technology and instant access to the Internet, music databases have grown rapidly, requiring more efficient ways of organizing and providing access to music. A number of automatic classification algorithms have been proposed in the field of music information retrieval (MIR) by means of supervised learning, in which ground-truth labels are imperative. The goal of this study is to analyze the statistical relationship among audio labels such as era, emotion, genre, instrument, and origin, using the Million Song Dataset and the Hubert-Arabie adjusted Rand index, in order to observe whether there is a significant correlation between these labels. It is found that cluster validation among the audio labels is low, which implies that there is no strong correlation, and not enough co-occurrence, between these labels when they are used to describe songs.

Acknowledgements

I would like to thank everyone involved in completing this thesis. I especially send my deepest gratitude to my advisor, Juan P. Bello, for keeping me motivated. His criticism and insights consistently pushed me to become a better student. I also thank Mary Farbood for being such a friendly mentor; it was a pleasure to work as her assistant for the past year and a half. I thank the rest of the NYU faculty for providing the opportunity and an excellent program of study. Lastly, I thank my family and my wife for their support and love.

Contents

List of Figures
List of Tables
1 Introduction
2 Literature Review
  2.1 Music Information Retrieval
  2.2 Automatic Classification
    2.2.1 Genre
    2.2.2 Emotion
3 Methodology
  3.1 Data Statistics
  3.2 Filtering
    3.2.1 1st Filtering
    3.2.2 2nd Filtering
    3.2.3 3rd Filtering: Co-occurrence
    3.2.4 Hierarchical Structure
    3.2.5 4th Filtering: Term Frequency
    3.2.6 5th Filtering
  3.3 Audio Labels
    3.3.1 Era
    3.3.2 Emotion
    3.3.3 Genre
    3.3.4 Instrument
    3.3.5 Origins
  3.4 Audio Features
    3.4.1 k-means Clustering Algorithm
    3.4.2 Feature Matrix
    3.4.3 Feature Scale
    3.4.4 Feature Clusters
  3.5 Hubert-Arabie adjusted Rand Index
4 Evaluation and Discussion
  4.1 K vs. ARI_HA
  4.2 Hubert-Arabie adjusted Rand Index (revisited)
  4.3 Cluster Structure Analysis
    4.3.1 Neighboring Clusters vs. Distant Clusters
    4.3.2 Correlated Terms vs. Uncorrelated Terms
5 Conclusion and Future Work
References

List of Figures

1.1 System Diagram of a Generic Automatic Classification Model
2.1 System Diagram of a Genre Classification Model
2.2 System Diagram of a Music Emotion Recognition Model
2.3 Thayer's 2-Dimensional Emotion Plane (19)
3.1 Clusters
3.2 Co-occurrence - Same Level
3.3 Co-occurrence - Different Level
3.4 Hierarchical Structure (Terms)
3.5 Intersection of Labels
3.6 Era Histogram
3.7 Emotion Histogram
3.8 Genre Histogram
3.9 Instrument Histogram
3.10 Origin Histogram
3.11 Elbow Method
3.12 Content-based Cluster Histogram
4.1 K vs. ARI_HA
4.2 Co-occurrence between feature clusters and era clusters
4.3 Co-occurrence between feature clusters and emotion clusters
4.4 Co-occurrence between feature clusters and genre clusters
4.5 Co-occurrence between feature clusters and instrument clusters
4.6 Co-occurrence between feature clusters and origin clusters
4.7 Co-occurrence between era clusters and feature clusters
4.8 Co-occurrence between emotion clusters and feature clusters
4.9 Co-occurrence between genre clusters and feature clusters
4.10 Co-occurrence between instrument clusters and feature clusters
4.11 Co-occurrence between origin clusters and feature clusters

List of Tables

3.1 Overall Data Statistics
3.2 Field List
3.3 Terms
3.4 Labels
3.5 Clusters
3.6 Hierarchical Structure (Clusters)
3.7 Hierarchical Structure (µ and σ)
3.8 Mutually Exclusive Clusters
3.9 Filtered Dataset
3.10 Era Statistics
3.11 Emotion Terms
3.12 Genre Terms
3.13 Instrument Statistics
3.14 Origin Terms
3.15 Audio Features
3.16 Cluster Statistics
3.17 2 x 2 Contingency Table
4.1 ARI_HA
4.2 Term Co-occurrence
4.3 Term Co-occurrence
4.4 Optimal Cluster Validation
4.5 Self-similarity Matrix
4.6 Neighboring Clusters
4.7 Distant Clusters
4.8 Term Correlation
4.9 Term Correlation
4.10 Term Vectors
4.11 Label Cluster Distance
4.12 Label Cluster Distance

Chapter 1 Introduction

In the 21st century, we live in a world where instant access to countless music databases is granted. Online music stores such as the iTunes Store and streaming services such as Pandora provide millions of songs from artists all over the world. As music databases have grown rapidly with the advent of advanced technology and the Internet, much more efficient ways of organizing and finding music are required. One of the main tasks in the field of music information retrieval (MIR) is to generate a computational model for the classification of audio data so that it is faster and easier to search for and listen to music. A number of researchers have proposed methods to categorize music into different classes such as genres, emotions, activities, or artists (1, 2, 3, 4). This automated classification would then let us search for audio data based on labels; e.g., when we search for sad music, an audio emotion classification model returns songs with the label sad. Regardless of the type of classification model, there is a generic approach to this problem, outlined in figure 1.1: extracting audio features, obtaining labels, and computing the parameters of a model by means of a supervised machine learning technique. When utilizing a supervised learning technique to construct a classification model, however, it is imperative that ground-truth labels are provided. Obtaining labels involves

human subjects, which makes the process expensive and inefficient. In certain cases, the number of labels is bound to be insufficient, making it even harder to collect data. As a result, researchers have used semi-supervised learning methods, in which unlabeled data is combined with labeled data during the training process in order to improve performance (5). However, this method is also limited to situations where the data has only one type of label; e.g., if a dataset is labeled by genre, it is possible to construct a genre classification model, but it is not possible to create a mood classification model without knowing a priori the correspondence between genre and mood labels. This causes a problem when a dataset has only one type of label and needs to be classified into a different label class. It would be much more efficient and less expensive if there existed a statistical correspondence among different audio labels, making it easy to predict a different label class from the same dataset. Therefore, the goal of this study is to define the statistical relationship among different audio labels such as genre, emotion, era, origin, and instrument, using The Million Song Dataset (6), applying an unsupervised learning technique, i.e. the k-means algorithm, and calculating the Hubert-Arabie adjusted Rand (ARI_HA) index (7). The rest of this thesis is organized as follows: Chapter 2 reviews previous MIR studies on automatic classification models. The detailed methodology and data analysis are given in Chapter 3. Based on the results obtained in Chapter 3, possible interpretations of the data are discussed in Chapter 4. Finally, concluding remarks and future work are laid out in Chapter 5.

Figure 1.1: System Diagram of a Generic Automatic Classification Model - Labels are used only in the supervised learning case

Chapter 2 Literature Review

2.1 Music Information Retrieval

There are many ways to categorize music. One of the traditional ways is by its metadata, such as the name of the song, artist, or album, which is known as tag-based or text-based categorization (8). As music databases have grown virtually countless, more efficient ways to query and retrieve music are required. As opposed to tag-based query and retrieval, which can only retrieve songs that we have a priori information about, content-based query and retrieval allows us to find songs in different ways - e.g., it allows us to find songs that are similar in musical context or structure, and it can also recommend songs based on musical labels such as emotion. Music information retrieval (MIR) is a wide and rapidly growing research topic in the multimedia processing industry, which aims at extending the understanding and usefulness of music data through the research, development, and application of computational approaches and tools. As a novel way of retrieving songs or creating a playlist, researchers have come up with a number of classification methods using different labels such as genre, emotion, or cover song (1, 2, 3, 4), so that each classification model can retrieve a song based on its label or musical similarities. These methods are different

than tag-based methods, since audio features are extracted and analyzed prior to constructing a computational model. Therefore, a retrieved song is based on the content of the audio, not on its metadata.

2.2 Automatic Classification

In previous studies, most audio classification models are based on supervised learning methods, in which musical labels such as genre or emotion are required (1, 2, 3, 4). Using labels along with well-defined, high-dimensional musical features, learning algorithms are trained on the data to find possible relationships between the features and a label, so that for any given unknown (test) data, the model can correctly recognize the label.

2.2.1 Genre

Tzanetakis et al. (1, 9) are among the earliest researchers who worked on automatic genre classification. Instead of manually assigning a musical genre to a song, an automatic genre classification model can generate a genre label for a given song after comparing its musical features with the model. In (1, 9) the authors used three feature sets, which describe timbral texture, rhythmic content, and pitch content, respectively. Features such as spectral centroid, rolloff, flux, zero-crossing rate, and MFCC (1) were extracted to construct a feature vector that describes the timbral texture of music. An automatic beat detection algorithm (4, 11) was used to calculate the rhythmic structure of music and to form a feature vector that describes rhythmic content. Lastly, pitch detection techniques (12, 13, 14) were used to construct a pitch content feature vector. Figure 2.1 shows the system overview of the automatic genre classification model described in (1, 9).

Figure 2.1: System Diagram of a Genre Classification Model - A Gaussian Mixture Model (GMM) is used as the classifier

2.2.2 Emotion

In 2006, the work of L. Lu et al. (15) was one of the few studies that provided an in-depth analysis of mood detection and tracking of music signals using acoustic features extracted directly from the audio waveform, instead of using MIDI or symbolic representations. Although it has been an active research topic, researchers have consistently faced the same problem with the quantification of music emotion, due to its inherently subjective nature. Recent studies have sought ways to minimize the inconsistency among labels. Skowronek et al. (16) paid close attention to the material collection process: they obtained a large number of labelled data from 12 subjects and kept only the labels in agreement with one another, in order to exclude ambiguous ones. In (17), the authors created a collaborative game that collects dynamic (time-varying) labels of music mood from two players and ensures that the players cross-check each other's labels in order to build a consensus. Defining mood classes is not an easy task. There have been mainly two approaches to defining mood: categorical and continuous. In (15), mood labels are classified into adjectives such as happy, angry, sad, or sleepy. However, the authors in (18) defined mood as a continuous regression problem, as described in figure 2.2, and mapped emotion onto the two-dimensional Thayer's plane (19) shown in figure 2.3. Recent studies focus on multi-modal classification using both lyrics and audio content to quantify music emotion (20, 21, 22), on dynamic music emotion modeling (23, 24), or on unsupervised learning approaches to mood recognition (25).

Figure 2.2: System Diagram of a Music Emotion Recognition Model - Arousal and Valence are two independent regressors

Figure 2.3: Thayer's 2-Dimensional Emotion Plane (19) - Each axis is used as an independent regressor

Chapter 3 Methodology

Previous studies have constructed automatic classification models using the relationship between audio features and one type of label (e.g. genre or mood). As stated in chapter 1, however, if a statistical relationship among several audio labels were defined, it could reduce the cost of constructing automatic classification models. In order to solve this problem, two things are needed:

1. Big music data with multiple labels: The Million Song Dataset (6)
2. A cluster validation method: the Hubert-Arabie adjusted Rand index

A large dataset is required to minimize the bias and noisiness of the labels. Since labels are acquired from users, a small amount of music data would lead to large variance among labels and thus large error. A cluster validation method is required to compare sets of clusters created from different labels, hence the Hubert-Arabie adjusted Rand index.

3.1 Data Statistics

The Million Song Dataset (6) consists of one million files in HDF5 format, from which various information can be retrieved, including metadata such as the name of the artist, the title of the song, or tags (terms), and musical features such as chroma, tempo, loudness, mode, or key. Table 3.1 shows the overall statistics of the dataset and table 3.2 lists the fields available in the files of the dataset.

No.  Type                            Total
1    Songs                           1,000,000
2    Data                            273 GB
3    Unique Artists                  44,745
4    Unique Terms                    7,643
5    Artists with at least one term  43,943
6    Identified Cover Songs          18,196

Table 3.1: Overall Data Statistics - Statistics of The Million Song Dataset

3.2 Filtering

LabROSA, the distributor of The Million Song Dataset, also provides all the necessary functions to access and manipulate the data from Matlab. The HDF5 Song File Reader function converts .h5 files into a Matlab object, which can be further used to extract labels via the get artist terms function and features via the get segments pitches, get tempo, and get loudness functions. The labels are therefore used to create several sets of clusters, while the audio features are used to form another set of clusters. Figure 3.1 illustrates these different sets of clusters.
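The thesis works through the Matlab wrappers named above; purely for illustration, here is a minimal Python sketch of the same access pattern using h5py, assuming the standard per-track HDF5 layout of the dataset (dataset paths such as metadata/artist_terms and analysis/segments_pitches). The helper name read_track is hypothetical, not part of the dataset's API.

```python
# Minimal sketch: pull the labels (terms) and a few audio features out of one
# MSD track file, assuming the standard per-track HDF5 layout.
import h5py
import numpy as np

def read_track(path):
    with h5py.File(path, "r") as h5:
        terms = [t.decode("utf-8") for t in h5["metadata/artist_terms"][:]]
        term_freq = np.asarray(h5["metadata/artist_terms_freq"][:])
        chroma = np.asarray(h5["analysis/segments_pitches"][:])   # (n_segments, 12)
        timbre = np.asarray(h5["analysis/segments_timbre"][:])    # (n_segments, 12)
        song = h5["analysis/songs"][0]                            # compound row of scalars
        scalars = {name: song[name] for name in
                   ("tempo", "loudness", "key", "key_confidence",
                    "mode", "mode_confidence")}
    return terms, term_freq, chroma, timbre, scalars

# terms, freq, chroma, timbre, scalars = read_track("TRAAAAW128F429D538.h5")
```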

Although it would be ideal to take all one million songs into account, due to the noisiness of the data the dataset must undergo the following filtering process to get rid of unnecessary songs:

1. All terms are categorized into one of 5 label classes.
2. Create a set of clusters based on each label class.
3. Find the hierarchical structure of each label class.
4. Make each set of clusters mutually exclusive.
5. Retrieve the songs that contain at least one term from every one of the five label classes.

Figure 3.1: Clusters - Several sets of clusters can be made using labels and audio features

3.2.1 1st Filtering

As shown in table 3.1, there are 7,643 unique terms that describe songs in the dataset. Some examples of these terms are shown in table 3.3. These unique terms have to be filtered so that meaningless terms are ignored. In other words, five labels are chosen so that each term can be categorized into one of the following five label classes: era, emotion, genre, instrument, and origin. In doing so, any term that cannot be described by one of those labels is dropped. Table 3.4 shows the total number of terms that belong to each label. Note the small number of terms in each label category compared to the original 7,643 unique terms. This is because many terms cross-reference each other. For example, 00s, 00s alternative, and 00s country all count as unique terms, but they are all represented as 00s under the era label class.

Field Name                  Type            Description
analysis sample rate        float           sample rate of the audio used
artist familiarity          float           algorithmic estimation
artist hotttnesss           float           algorithmic estimation
artist id                   string          Echo Nest ID
artist name                 string          artist name
artist terms                array string    Echo Nest tags
artist terms freq           array float     Echo Nest tag frequencies
audio md5                   string          audio hash code
bars confidence             array float     confidence measure
bars start                  array float     beginning of bars
beats confidence            array float     confidence measure
beats start                 array float     result of beat tracking
danceability                float           algorithmic estimation
duration                    float           in seconds
energy                      float           energy from listener perspective
key                         int             key the song is in
key confidence              float           confidence measure
loudness                    float           overall loudness in dB
mode                        int             major or minor
mode confidence             float           confidence measure
release                     string          album name
sections confidence         array float     confidence measure
sections start              array float     largest grouping in a song
segments confidence         array float     confidence measure
segments loudness max       array float     max dB value
segments loudness max time  array float     time of max dB value
segments loudness start     array float     dB value at onset
segments pitches            2D array float  chroma feature
segments start              array float     musical events
segments timbre             2D array float  texture features
similar artists             array string    Echo Nest artist IDs
song hotttnesss             float           algorithmic estimation
song id                     string          Echo Nest song ID
tempo                       float           estimated tempo in BPM
time signature              int             estimate of number of beats/bar
time signature confidence   float           confidence measure
title                       string          song title
track id                    string          Echo Nest track ID

Table 3.2: Field List - A list of fields available in the files of the dataset

Similarly, alternative jazz, alternative rock, alternative r & b, and alternative metal are simply alternative, jazz, rock, r & b, and metal under the genre label class.

No.   Term
1     00s
2     00s alternative
3     00s country
...
      gp worldwide
3113  grammy winner
3114  gramophone
...
      punky reggae
5788  pure black metal
5789  pure grunge
...

Table 3.3: Terms - Examples of terms (tags)

Label       Total
era         17
emotion     96
genre       436
instrument  78
origin      635

Table 3.4: Labels - Number of terms belonging to each label class

In this way, the total number of terms in each category is reduced, and it is still possible to search for songs without using repetitive terms. For example, a song that has the alternative jazz term can be found with both the alternative and jazz keywords, instead of alternative jazz. In addition, composite terms such as alternative jazz or ambient electronics are not included, since they are at the lowest hierarchical level and the number of elements that belong to such clusters is small.

3.2.2 2nd Filtering

After all unique terms are filtered into one of the five label classes, each term belonging to each label class is regarded as a cluster, as shown in table 3.5. Note that it is still not certain that all terms are truly representative as independent clusters, since it must be taken into account that there are a few hierarchical layers among terms; i.e., the piano and electric piano terms might not be at the same level of hierarchy in the instrument label class. In order to account for differences in layers, the co-occurrence between each pair of clusters is calculated, as explained in the next section.

Label       Clusters (examples)
era         00s, 1700s, 1910s, 19th century
emotion     angry, chill, energetic, horror, mellow
genre       ambient, blues, crossover, dark, electronic, opera
instrument  accordion, banjo, clarinet, horn, ukelele, laptop
origin      african, belgian, dallas, hongkong, moroccan

Table 3.5: Clusters - Each term forms a cluster within each label class

3.2.3 3rd Filtering: Co-occurrence

Within a single label class there are a number of different terms, each of which could possibly represent an individual cluster. However, while certain terms inherently possess a clear meaning, some do not; e.g., in the genre label class, the distinctions between dark metal and death metal, or acid metal and acid funk, might not be obvious. In order to avoid ambiguity among clusters, the co-occurrences of two clusters are measured. The co-occurrences of a pair of clusters can be calculated as follows:

cooc_{a,b} = \frac{\mathrm{intersect}(a, b)}{\mathrm{intersect}(a, a)}, \qquad cooc_{b,a} = \frac{\mathrm{intersect}(a, b)}{\mathrm{intersect}(b, b)}    (3.1)

where intersect(i, j) counts the number of elements that are in both cluster i and cluster j. Therefore, if both clusters have high (or both have low) co-occurrence values, there is a large (or small) overlap between the clusters, while if one of the two clusters has a high value and the other a low value, one cluster is a subset of the other, as illustrated in figures 3.2 and 3.3. Also note that if one cluster is a subset of the other, they are not at the same hierarchical level.

Figure 3.2: Co-occurrence - same level - (a) small overlap between two clusters; (b) large overlap between two clusters

Figure 3.3: Co-occurrence - different level - (a) Cluster B is a subset of Cluster A; (b) Cluster A has a relatively larger number of elements than Cluster B, most of which belong to the intersection

Therefore, a threshold is set such that if (cooc_{a,b} > 0.9 & cooc_{b,a} < 0.1) or (cooc_{a,b} < 0.1 & cooc_{b,a} > 0.9), then cluster A is a subset of cluster B, or vice versa. If neither condition is met, the two clusters are at the same hierarchical level. In doing so, the layers of the hierarchy can be retrieved.
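For illustration, a small sketch of eq. (3.1) and this subset test, with clusters represented as sets of track IDs and the 0.9/0.1 thresholds taken from the text (the function names are hypothetical):

```python
# Sketch of eq. (3.1) and the subset test: clusters are sets of track IDs.
def cooccurrence(a, b):
    """Return cooc_{a,b} and cooc_{b,a} for two clusters given as sets."""
    inter = len(a & b)
    return inter / len(a), inter / len(b)

def relation(a, b, hi=0.9, lo=0.1):
    """'subset', 'superset' or 'same level', using the thresholds from the text."""
    cab, cba = cooccurrence(a, b)
    if cab > hi and cba < lo:
        return "a is a subset of b"       # nearly all of a lies inside b
    if cab < lo and cba > hi:
        return "b is a subset of a"
    return "same hierarchical level"

# Toy example: an 'electric piano' cluster inside a larger 'piano' cluster
piano = {f"t{i}" for i in range(1, 41)}
electric_piano = {"t1", "t2", "t3"}
print(relation(electric_piano, piano))    # -> "a is a subset of b"
```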

3.2.4 Hierarchical Structure

After obtaining co-occurrence values for all pairs of clusters, the structure of the clusters in each label class can be known. Table 3.6 shows the hierarchical structure of each label class, and figure 3.4 shows some of the terms at different hierarchical levels.

Label       1st Layer  2nd Layer  3rd Layer  Total
era         3          14         empty      17
emotion     3          93         empty      96
genre       27         127        274        428
instrument  18         72         empty      90
origin      13         135        487        635

Table 3.6: Hierarchical Structure (Clusters) - Total number of clusters at different layers in each label class

Figure 3.4: Hierarchical Structure (Terms) - Examples of terms at different layers in each label class

The structure correlates well with intuition, with more general terms such as bass or guitar at a higher level, while terms such as acoustic bass or classical guitar are at a lower level. The number of songs in each cluster also matches intuition well: terms at a high level of the hierarchy have a large number of songs, while relatively few songs belong to terms at a low level. Now that the structure of the clusters for each label class is known, it must be carefully decided which layer should be used, as there is a tradeoff between the number of clusters and the number of songs belonging to each cluster: a higher layer has a small total number of clusters, but each cluster contains a sufficient number of songs, and vice versa. In order to make a logical decision, three quantities are considered: the number of clusters, N, and the mean, µ, and standard deviation, σ, of the cluster sizes are calculated for every layer and shown in table 3.7. The rationale is that each layer within a label class must have a sufficient number of clusters, each cluster must contain a sufficient number of songs, and the variance of the distribution should be as small as possible. The author defined the thresholds as follows: N > 5, µ > 5,000, and σ as small as possible. The 1st layer of the instrument class and the 2nd layer of the era, emotion, genre, and origin label classes are selected, as shown in table 3.7.

Label       1st Layer µ (N) / σ      2nd Layer µ (N) / σ     3rd Layer µ (N) / σ
era         59,524 (3) / 47,835      21,686 (14) / 43,263    empty
emotion     23,95 (3) / 15,677       5,736 (93) / 19,871     empty
genre       35,839 (27) / 79,37      5,744 (127) / 14,37     2,421 (274) / 6,816
instrument  39,744 (18) / 76,        (72) / 2,464            empty
origin      69,452 (13) / 141,44     8,84 (135) / 23,        (487) / 2,8

Table 3.7: Hierarchical Structure (µ and σ) - The mean and standard deviation of cluster sizes for each layer. The number in parentheses denotes the number of clusters. The selected layers are the 1st layer for instrument and the 2nd layer for era, emotion, genre, and origin.

3.2.5 4th Filtering: Term Frequency

After finding the structure of the clusters and selecting a layer in the previous section, all the clusters within the same layer must become mutually exclusive, leaving no overlapping elements among clusters. Therefore, after finding the intersections among clusters, it needs to be decided to which cluster each element should belong. In order to resolve conflicts between multiple clusters, the term frequency is retrieved for every single element via the provided function get artist terms freq. For every element within an intersection, the term frequency values are compared, and whichever term has the higher value takes the element, while the other loses it. In this way, the total number of clusters is reduced via merging and all the terms become mutually exclusive. Table 3.8 shows the total number of songs in each label class.

Label       # of Songs
era         387,977
emotion     394,86
genre       7,778
instrument  384,59
origin      871,631

Table 3.8: Mutually Exclusive Clusters - Total number of songs in mutually exclusive clusters
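A sketch of this conflict-resolution step, with clusters as sets of track IDs and the per-song term frequencies (as returned by get artist terms freq) represented as plain dictionaries; the helper name make_exclusive is hypothetical:

```python
# Sketch: make clusters within one label class mutually exclusive by assigning
# each song to the term with the highest Echo Nest term frequency.
from collections import defaultdict

def make_exclusive(clusters, term_freq):
    """clusters: {term: set(track_ids)}; term_freq: {track_id: {term: freq}}."""
    membership = defaultdict(list)               # track -> all terms that claim it
    for term, tracks in clusters.items():
        for t in tracks:
            membership[t].append(term)
    exclusive = defaultdict(set)
    for t, terms in membership.items():
        best = max(terms, key=lambda term: term_freq[t].get(term, 0.0))
        exclusive[best].add(t)                   # the highest-frequency term wins
    return dict(exclusive)
```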

3.2.6 5th Filtering

Since most songs are given multiple terms, they might belong to several label classes; e.g., a song with the 00s and alternative jazz terms belongs to both the era and genre label classes. Therefore, after obtaining the indexes of the songs that belong to each category, the intersections among these indexes are retrieved, so that only songs carrying a term from each of the five labels are considered. The aforementioned process is illustrated in figure 3.5. Finally, the total number of clusters in each label class and the total number of songs used in the study after all filtering processes are shown in table 3.9.

Figure 3.5: Intersection of Labels - Songs that belong to all five label classes are chosen

          Songs      Era  Emotion  Genre  Instrument  Origin
Original  1,000,000
Filtered  41,269     7    34       44     7           33

Table 3.9: Filtered Dataset - Total number of songs and clusters after filtering
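The final step then reduces to a plain five-way set intersection over the per-label song sets, e.g. (hypothetical data structure):

```python
# Sketch: keep only songs that carry at least one term from every label class.
def songs_with_all_labels(label_classes):
    """label_classes: {label_name: {term: set(track_ids)}} after the exclusivity step."""
    per_label = [set().union(*clusters.values()) for clusters in label_classes.values()]
    return set.intersection(*per_label)
```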

3.3 Audio Labels

3.3.1 Era

After all the filtering processes, 7 clusters are selected for the era label class. Terms such as 16th century or 21st century, as well as 30s and 40s, are successfully ignored via merging and hierarchy. Table 3.10 and figure 3.6 show the statistics of the remaining terms. Note that the distribution is negatively skewed, which is intuitive, because more songs exist in recorded format in later decades than in the early 20th century, owing to advances in recording technology. It also makes sense that the 80s cluster contains the most songs, because people use the term 80s to describe 80s rock or 80s music more often than, say, 00s music or 00s pop.

Era    50s  60s    70s    80s     90s    00s    20th century  Total
Songs  661  3,525  2,826  17,359  9,555  6,111  1,232         41,269

Table 3.10: Era Statistics - Number of songs belonging to each era cluster

Figure 3.6: Era Histogram - Distribution of songs based on the era label

3.3.2 Emotion

There are a total of 34 clusters in the emotion label class, which are listed in table 3.11. Note the uneven distribution of songs in the emotion label class, shown in figure 3.7. Clusters such as beautiful, chill, and romantic together account for about one third of the total songs, while relatively few songs belong to clusters such as evil, haunting, and uplifting.

Emotion: beautiful, brutal, calming, chill, energetic, evil, gore, grim, happy, harsh, haunting, horror, humorous, hypnotic, intense, inspirational, light, loud, melancholia, mellow, moody, obscure, patriotic, peace, relax, romantic, sad, sexy, strange, trippy, uplifting, wicked, wistful, witty

Table 3.11: Emotion Terms - All the emotion terms

Figure 3.7: Emotion Histogram - Distribution of songs based on the emotion label

3.3.3 Genre

A total of 44 genre clusters are created, shown in table 3.12, and their distribution is shown in figure 3.8. Note that certain genre terms such as hip hop, indie, and wave have more songs than others, like emo or melodic.

Genre: alternative, ambient, ballade, blues, british, christian, classic, country, dance, dub, electronic, eurodance, hard style, hip hop, instrumental, industrial, indie, lounge, modern, neo, new, noise, nu, old, orchestra, opera, post, power, progressive, r&b, rag, soundtrack, salsa, smooth, soft, swing, synth pop, techno, thrash, tribal, urban, waltz, wave, zouk

Table 3.12: Genre Terms - All the genre terms

Figure 3.8: Genre Histogram - Distribution of songs based on the genre label

3.3.4 Instrument

There are only 7 instrument clusters after the filtering processes. The name of each cluster and the number of songs belonging to it are given in table 3.13. The values make sense, as guitar, piano, and synth have many songs in their clusters, while relatively few songs belong to saxophone and violin. Figure 3.9 shows the histogram of the instrument clusters.

Instrument  # of Songs
bass        2444
drum        513
guitar      9731
piano       5667
saxophone   134
violin      322
synth

Table 3.13: Instrument Statistics - Number of songs belonging to each instrument cluster

Figure 3.9: Instrument Histogram - Distribution of songs based on the instrument label

3.3.5 Origins

There are 33 different origin clusters, as laid out in table 3.14. Note that clusters such as american, british, dc, and german have a large number of songs, while clusters such as new orleans, suomi, or texas consist of relatively few songs. Also note that the terms american and texas both appear as independent clusters, although it seems intuitive that texas should be a subset of american. This is because, when describing a song with an origin label, certain songs are more specifically described by texas than by american or united states, e.g. country music. Finally, the statistics of the origin label class are shown in figure 3.10.

Origin: african, american, belgium, british, canada, cuba, dc, east coast, england, german, ireland, israel, italian, japanese, los angeles, massachusetts, mexico, nederland, new york, norway, new orleans, poland, roma, russia, scotland, southern, spain, suomi, sweden, tennessee, texas, united states, west coast

Table 3.14: Origin Terms - All the origin terms

Figure 3.10: Origin Histogram - Distribution of songs based on the origin label

3.4 Audio Features

Audio features are extracted in order to construct feature clusters with a clustering algorithm, using the provided functions such as get segments timbre or get segments pitches. Table 3.15 shows the list of extracted features. It takes about 30 ms to extract the features from one song, which amounts to a total of about 8 hours for one million songs. However, since only 41,269 songs are used, the computation time is reduced to less than an hour.

No.  Feature          Function
1    Chroma           get segments pitches
2    Texture          get segments timbre
3    Tempo            get tempo
4    Key              get key
5    Key Confidence   get key confidence
6    Loudness         get loudness
7    Mode             get mode
8    Mode Confidence  get mode confidence

Table 3.15: Audio Features - Several audio features are extracted via the respective functions
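Continuing the earlier reading sketch, and anticipating the averaging described in section 3.4.2, each song can be reduced to a single vector by averaging chroma and timbre over time and appending the six scalar features; feature_vector is a hypothetical helper, not the thesis code.

```python
# Sketch: one 30-dimensional feature vector per song
# (12 mean chroma + 12 mean timbre + tempo, key, key_conf, loudness, mode, mode_conf).
import numpy as np

def feature_vector(chroma, timbre, scalars):
    return np.concatenate([
        chroma.mean(axis=0),                    # 12-dim time-averaged chroma
        timbre.mean(axis=0),                    # 12-dim time-averaged timbre
        np.array([scalars["tempo"], scalars["key"], scalars["key_confidence"],
                  scalars["loudness"], scalars["mode"], scalars["mode_confidence"]]),
    ])

# F = np.vstack([feature_vector(*read_track(p)[2:]) for p in track_paths])  # (41269, 30)
```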

3.4.1 k-means Clustering Algorithm

Content-based clusters can be constructed with a clustering algorithm, an unsupervised learning method that does not require pre-labeled data and uses only the features to construct clusters of similar data points. There are several variants of clustering algorithms, such as k-means, k-median, centroid-based, or single-linkage (26, 27, 28). In this study, the k-means clustering algorithm is used to form the automatic clusters. The basic structure of the algorithm is defined by the following steps (29, 30):

1. Define a similarity measurement metric, d (e.g. Euclidean, Manhattan, etc.).
2. Randomly initialize k centroids, µ_k.
3. For each data point x, find the µ_k that returns the minimum d.
4. Find C_k, the cluster containing the set of points assigned to µ_k.
5. Recalculate µ_k for every C_k.
6. Repeat steps 3 through 5 until convergence.
7. Repeat steps 2 through 6 multiple times to avoid local minima.

The author used the (squared) Euclidean distance as the similarity metric, d, and computed the centroid mean of each cluster as:

d^{(i)} := \| x^{(i)} - \mu_k \|^2    (3.2)

\mu_k := \frac{1}{|C_k|} \sum_{i \in C_k} x^{(i)}    (3.3)

where x^{(i)} is the position of the i-th point. C_k is constructed by finding c^{(i)} that minimizes (3.2), where c^{(i)} is the index of the centroid closest to x^{(i)}. In other words, a point belongs to the cluster whose centroid has the minimum Euclidean distance to the point.

3.4.2 Feature Matrix

Using the extracted audio features, i.e. chroma, timbre, key, key confidence, mode, mode confidence, tempo, and loudness, a feature matrix F of size I x J is constructed, where I is the total number of points (= 41,269) and J is the total number of features (= 30; both the chroma and timbre features are averaged across time, resulting in 12 x 1 dimensions each). Therefore, the cost function of the algorithm is

\frac{1}{I} \sum_{i=1}^{I} d^{(i)}    (3.4)

and the optimization objective is to minimize (3.4).
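A compact numpy sketch of the listed steps is given below; this is an illustration, not the thesis implementation, and scikit-learn's KMeans provides an equivalent, well-tested version.

```python
# Sketch of the k-means steps above: squared Euclidean distance, several random
# restarts to avoid local minima, mean cost as in eq. (3.4).
import numpy as np

def kmeans(F, k, n_iter=100, n_restarts=10, seed=0):
    rng = np.random.default_rng(seed)
    best = (np.inf, None, None)
    for _ in range(n_restarts):                                   # step 7: several restarts
        mu = F[rng.choice(len(F), size=k, replace=False)]         # step 2: random centroids
        for _ in range(n_iter):
            d = ((F[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # step 3: squared distances
            labels = d.argmin(axis=1)
            new_mu = np.array([F[labels == j].mean(axis=0) if np.any(labels == j) else mu[j]
                               for j in range(k)])                # steps 4-5: recompute centroids
            if np.allclose(new_mu, mu):                           # step 6: convergence
                break
            mu = new_mu
        d = ((F[:, None, :] - mu[None, :, :]) ** 2).sum(-1)       # final assignment
        labels = d.argmin(axis=1)
        cost = d[np.arange(len(F)), labels].mean()                # eq. (3.4)
        if cost < best[0]:
            best = (cost, labels, mu)
    return best                                                   # (cost, labels, centroids)
```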

3.4.3 Feature Scale

Feature scaling is necessary, as each feature is in a different range and therefore needs to be normalized for equal weighting. The author used mean/standard-deviation scaling for each feature f_j:

\hat{f}_j = \frac{f_j - \mu_{f_j}}{\sigma_{f_j}}    (3.5)

3.4.4 Feature Clusters

It is often arbitrary what the correct number of clusters K should be, and there is no algorithm that leads to a definitive answer. However, the elbow method is often used to determine the number of clusters, K. Figure 3.11 shows a plot of the cost function for different K. Either K = 8 or K = 10 marks the elbow of the plot and is a possible candidate for the number of clusters. In this study, K = 10 is chosen.

Figure 3.11: Elbow Method - K = 8 or K = 10 are the possible numbers of clusters
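A sketch combining the scaling of eq. (3.5) with the elbow sweep, here using scikit-learn's KMeans; the helper names are hypothetical.

```python
# Sketch: z-score each feature column (eq. 3.5) and sweep K for the elbow plot.
import numpy as np
from sklearn.cluster import KMeans

def zscore(F):
    return (F - F.mean(axis=0)) / F.std(axis=0)

def elbow_curve(F, k_values=range(2, 21)):
    Fz = zscore(F)
    # inertia_ is the summed squared distance to the nearest centroid,
    # i.e. I times the cost of eq. (3.4); the "elbow" of this curve suggests K.
    return {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(Fz).inertia_
            for k in k_values}
```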

After choosing the value of K, the structure of the clusters is found and shown in table 3.16 and figure 3.12.

Cluster     1      2      3      4      5      6      7      8      9      10
# of Songs  4,349  4,172  4,128  4,475  5,866  3,933  2,544  5,149  2,436  4,217

Table 3.16: Cluster Statistics - The number of songs within each cluster

Figure 3.12: Content-based Cluster Histogram - Distribution of songs based on audio features

3.5 Hubert-Arabie adjusted Rand Index

After obtaining six sets of clusters, i.e. five from the labels and one from the audio features, the relationship between a pair of cluster sets can be found by calculating the Hubert-Arabie adjusted Rand (ARI_HA) index (7, 31). The ARI_HA index quantifies cluster validation by comparing the generated clusters with the original structure of the data. Therefore, by comparing two different sets of clusters, the correlation between them can be drawn. The ARI_HA index is measured as:

ARI_{HA} = \frac{\binom{N}{2}(a + d) - [(a + b)(a + c) + (c + d)(b + d)]}{\binom{N}{2}^2 - [(a + b)(a + c) + (c + d)(b + d)]}    (3.6)

where N is the total number of data points and a, b, c, d represent four different types of pairs. Let A and B be two sets of clusters and P and Q the number of clusters in each set; then a, b, c, and d are defined as follows:

a : pairs of elements in the same group in both A and B
b : pairs of elements in the same group of A but in different groups of B
c : pairs of elements in different groups of A but in the same group of B
d : pairs of elements in different groups in both A and B

which can be summarized by the contingency table shown in table 3.17. This leads to the computation of a, b, c, and d as follows:

a = \frac{\sum_{p=1}^{P} \sum_{q=1}^{Q} t_{pq}^2 - N}{2}    (3.7)

b = \frac{\sum_{p=1}^{P} t_{p+}^2 - \sum_{p=1}^{P} \sum_{q=1}^{Q} t_{pq}^2}{2}    (3.8)

c = \frac{\sum_{q=1}^{Q} t_{+q}^2 - \sum_{p=1}^{P} \sum_{q=1}^{Q} t_{pq}^2}{2}    (3.9)

d = \frac{\sum_{p=1}^{P} \sum_{q=1}^{Q} t_{pq}^2 + N^2 - \sum_{p=1}^{P} t_{p+}^2 - \sum_{q=1}^{Q} t_{+q}^2}{2}    (3.10)

where t_{pq}, t_{p+}, and t_{+q} denote the number of elements belonging to both the p-th cluster of A and the q-th cluster of B, the total number of elements belonging to the p-th cluster of A, and the total number of elements belonging to the q-th cluster of B, respectively. ARI_HA = 1 means perfect cluster recovery, while values greater than 0.9, 0.8, and 0.65 mean excellent, good, and moderate recovery, respectively (7).

                             B: pair in same group   B: pair in different group
A: pair in same group        a                       b
A: pair in different group   c                       d

Table 3.17: 2 x 2 Contingency Table - The 2 x 2 contingency table that describes the four different types of pairs: a, b, c, d
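A direct transcription of eqs. (3.6)-(3.10) into numpy is sketched below, given two flat labelings; sklearn.metrics.adjusted_rand_score computes the same index and can be used as a cross-check. This is an illustrative sketch, not the thesis code.

```python
# Sketch of eqs. (3.6)-(3.10): ARI_HA from two flat cluster labelings.
import numpy as np
from scipy.special import comb

def ari_ha(labels_a, labels_b):
    a_vals, a_idx = np.unique(labels_a, return_inverse=True)
    b_vals, b_idx = np.unique(labels_b, return_inverse=True)
    t = np.zeros((len(a_vals), len(b_vals)))             # contingency table t_pq
    np.add.at(t, (a_idx, b_idx), 1)
    N = t.sum()
    sum_pq = (t ** 2).sum()
    sum_p = (t.sum(axis=1) ** 2).sum()                   # sum of t_{p+}^2
    sum_q = (t.sum(axis=0) ** 2).sum()                   # sum of t_{+q}^2
    a = (sum_pq - N) / 2                                 # eq. (3.7)
    b = (sum_p - sum_pq) / 2                             # eq. (3.8)
    c = (sum_q - sum_pq) / 2                             # eq. (3.9)
    d = (sum_pq + N ** 2 - sum_p - sum_q) / 2            # eq. (3.10)
    n2 = comb(N, 2)
    cross = (a + b) * (a + c) + (c + d) * (b + d)
    return (n2 * (a + d) - cross) / (n2 ** 2 - cross)    # eq. (3.6)
```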

Chapter 4 Evaluation and Discussion

ARI_HA is calculated for all pairs of cluster sets and shown in table 4.1.

Table 4.1: ARI_HA - Cluster validation calculated with the Hubert-Arabie adjusted Rand index for every pair of cluster sets (features, era, emotion, genre, instrument, origin)

It is observed from table 4.1 that the cluster validation between any pair of cluster sets is overall very low, with the highest value between emotion and genre and the lowest between origin and era, at 3.15 %. Although all the validation values are too low to draw a relationship between any pair of audio labels, it is still interesting to observe that emotion and genre are the most correlated among them, indicating that there are common emotion annotations for certain genres. In order to observe the relationship between emotion and genre more closely, the number of intersections between each term from both label classes is calculated, and the maximum intersection for each term is shown in tables 4.2 and 4.3.

Genre         Intersection  Emotion
alternative   159           beautiful
ambient       92            chill
ballade       161           beautiful
blues         24            energetic
british       119           beautiful
christian     214           inspirational
classic       392           romantic
country       9             romantic
dance         74            chill
dub           99            chill
electronic    59            chill
eurodance     52            uplifting
hard style    94            gore
hip hop       32            chill
instrumental  156           beautiful
industrial    44            romantic
indie         2167          chill
lounge        58            beautiful
modern        7             chill
neo           169           chill
new           2             chill
noise         121           beautiful
nu            44            chill
old           121           beautiful
orchestra     117           beautiful
opera         116           romantic
post          164           chill
power         16            melancholia
progressive   827           chill
r&b           12            chill
rag           182           chill
soundtrack    142           chill
salsa         173           chill
smooth        1584          chill
soft          536           mellow
swing         6             mellow
synth pop     132           melancholia
techno        96            happy
thrash        134           peace
tribal        99            brutal
urban         154           beautiful
waltz         49            romantic
wave          3448          romantic
zouk          1             beautiful

Table 4.2: Term Co-occurrence - The most common emotion term for each genre term

It is observed that, because of the disproportional distribution among emotion terms, most genre labels share the same few emotion terms, such as beautiful, chill, and romantic. On the other hand, as the distribution of genre terms is flatter, the emotion terms map onto many different genre terms. However, note that the co-occurrence between an emotion label and a genre label does not correlate well with intuition, as can be observed from table 4.3, e.g. beautiful & indie, happy & hip hop, uplifting & progressive, which is indicative of the low cluster validation rate. It also indicates that people use only a limited vocabulary to describe the emotional aspect of a song, regardless of its genre.
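A sketch of how such a table can be produced: count genre/emotion term pairs over the filtered songs and keep the argmax per genre term. The song_terms structure is hypothetical (one term per label class per song, as after the filtering of chapter 3).

```python
# Sketch: for each genre term, find the emotion term it co-occurs with most often.
from collections import Counter

def most_common_pairs(song_terms, from_class="genre", to_class="emotion"):
    """song_terms: {track_id: {label_class: term}} -> {genre: (emotion, count)}."""
    counts = {}
    for labels in song_terms.values():
        g, e = labels[from_class], labels[to_class]
        counts.setdefault(g, Counter())[e] += 1
    return {g: c.most_common(1)[0] for g, c in counts.items()}
```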

Emotion        Intersection  Genre
beautiful      883           indie
brutal         99            tribal
calming        118           synth pop
chill          32            hip hop
energetic      276           wave
evil           72            indie
gore           94            hard style
grim           659           hip hop
happy          1472          hip hop
harsh          28            noise
haunting       11            electronic
horror         37            wave
humorous       96            salsa
hypnotic       93            smooth
intense        7             rag
inspirational  214           christian
light          118           soft
loud           119           christian
melancholia    325           indie
mellow         536           soft
moody          88            alternative
obscure        5             new
patriotic      3             hip hop
peace          134           thrash
relax          23            smooth
romantic       3448          wave
sad            161           indie
sexy           752           hip hop
strange        76            progressive
trippy         67            progressive
uplifting      79            progressive
wicked         99            hip hop
wistful        121           classic
witty          14            progressive

Table 4.3: Term Co-occurrence - The most common genre term for each emotion term

Although it seems intuitive and expected that the correlations between audio labels turn out to be low, it is quite surprising that the cluster validations between the audio features and each label are also low. In order to understand why this is the case, a number of post-processing steps are proposed.

4.1 K vs. ARI_HA

In section 3.4.4, the number of clusters, K, was chosen based on the elbow method. This K does not necessarily generate optimal validation rates; therefore, a K vs. ARI_HA plot is drawn to find the K that maximizes the validation rate for each set of clusters. Figure 4.1 shows how ARI_HA behaves for each label class as K changes.

It turns out that the sum of ARI_HA across the label classes is maximum when K = 5, which is then used as the number of feature clusters.

Figure 4.1: K vs. ARI_HA - ARI_HA for each label class (era, emotion, genre, instrument, origin) as a function of the number of clusters K; ARI_HA is maximum when K = 5

4.2 Hubert-Arabie adjusted Rand Index (revisited)

Using the result from the previous section (K = 5), ARI_HA is re-calculated for each label class and shown in table 4.4.

Table 4.4: Optimal Cluster Validation - The optimal ARI_HA for each label class (era, emotion, genre, instrument, origin), for the original feature clusters and for K = 5
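A sketch of this sweep, assuming a z-scored feature matrix Fz and one cluster-id vector per label class (hypothetical variable names); sklearn's adjusted_rand_score implements the Hubert-Arabie adjusted Rand index used here.

```python
# Sketch: re-run k-means for several K and track ARI_HA against each label class.
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score   # Hubert-Arabie adjusted Rand index

def ari_vs_k(Fz, label_vectors, k_values=range(2, 21)):
    """label_vectors: {label_class: array of cluster ids per song}."""
    curves = {name: [] for name in label_vectors}
    for k in k_values:
        feat = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Fz)
        for name, y in label_vectors.items():
            curves[name].append(adjusted_rand_score(y, feat))
    return curves
```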

4.3 Cluster Structure Analysis

Now that the optimal K and the ARI_HA values are found, the reason for such low cluster validation rates needs to be discussed. In order to do so, the structure of the clusters needs to be known, by calculating the Euclidean distance between the cluster centroids. Table 4.5 shows the Euclidean distance between the centroids of the clusters. Note that the centroids of clusters 1 and 2 have the minimum distance while those of clusters 3 and 4 have the maximum distance, indicating the most similar and most dissimilar clusters, respectively.

Table 4.5: Self-similarity matrix - The distances between each pair of feature-cluster centroids

4.3.1 Neighboring Clusters vs. Distant Clusters

In order to observe the detailed structure of the clusters, the co-occurrence between feature clusters and label clusters is calculated and the four most co-occurring clusters are returned. In other words, for each feature cluster 1 through 5, the four most intersecting clusters from each label class are calculated and shown in figures 4.2-4.6. Note that, due to the uneven distribution of songs within each label class, the cluster that contains the largest number of songs, such as 80s in era, chill in emotion, hip hop in genre, synth in instrument, and dc in origin, appears frequently across all five feature clusters. In fact, the 80s and chill clusters appear as the most co-occurring clusters for all five feature clusters.
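A sketch of how such a self-similarity matrix can be obtained from the K = 5 centroids returned by k-means (hypothetical variable names):

```python
# Sketch: pairwise Euclidean distances between the feature-cluster centroids
# (the self-similarity matrix of table 4.5).
from scipy.spatial.distance import pdist, squareform

def centroid_distances(centroids):
    return squareform(pdist(centroids, metric="euclidean"))

# D = centroid_distances(kmeans_centroids)   # small D[i, j] -> neighboring clusters
```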

Figure 4.2: Co-occurrence between feature clusters and era clusters - The four most co-occurring era clusters are returned for each feature cluster

Figure 4.3: Co-occurrence between feature clusters and emotion clusters - The four most co-occurring emotion clusters are returned for each feature cluster

Figure 4.4: Co-occurrence between feature clusters and genre clusters - The four most co-occurring genre clusters are returned for each feature cluster

Figure 4.5: Co-occurrence between feature clusters and instrument clusters - The four most co-occurring instrument clusters are returned for each feature cluster

Figure 4.6: Co-occurrence between feature clusters and origin clusters - The four most co-occurring origin clusters are returned for each feature cluster

Knowing that the distance between clusters 1 and 2 is minimum and the distance between clusters 3 and 4 is maximum, it can also be observed from figures 4.2-4.6 that the co-occurring terms within clusters 1 and 2 are similar, while those within clusters 3 and 4 are quite dissimilar, as shown in tables 4.6 and 4.7, indicating that neighboring feature clusters share similar label clusters, while distant feature clusters do not.

Cluster 1 vs. Cluster 2
(80s, 90s, 00s, 60s) vs. (80s, 90s, 00s, 70s)
(chill, beautiful, romantic, happy) vs. (chill, romantic, beautiful, happy)
(hip hop, wave, smooth, indie) vs. (wave, indie, hip hop, soft)
(synth, guitar, drum, piano) vs. (synth, guitar, piano, drum)
(dc, american, german, british) vs. (dc, british, roma, german)

Table 4.6: Neighboring Clusters - Clusters with minimum Euclidean distance share similar label clusters

Cluster 3 vs. Cluster 4
(80s, 20th century, 60s, 90s) vs. (80s, 90s, 00s, 70s)
(chill, romantic, beautiful, mellow) vs. (chill, happy, romantic, sexy)
(soundtrack, smooth, classic, indie) vs. (hip hop, wave, techno, progressive)
(piano, guitar, synth, drum) vs. (synth, drum, guitar, bass)
(american, roma, german, los angeles) vs. (dc, british, german, roma)

Table 4.7: Distant Clusters - Clusters with maximum Euclidean distance have dissimilar label clusters

4.3.2 Correlated Terms vs. Uncorrelated Terms

Considering the opposite case, the author selected the four largest clusters from each label class and calculated their co-occurrence with every feature cluster, as shown in figures 4.7-4.11. In order to observe whether highly correlated label clusters can also be characterized by feature clusters, table 4.8 summarizes the most correlated terms for the four largest clusters of each label class, whereas table 4.9 shows the least correlated terms for the same clusters. Using the histograms from figures 4.7-4.11, a 5-dimensional vector can be created for each term by finding the ratio of each feature cluster (e.g., the vector for the 80s term is (Cluster 1, Cluster 2, Cluster 3, Cluster 4, Cluster 5) = (0.71, 1, 0.196, 0.727, 0.61)). Using the same method, a total of 41 vectors are retrieved, one for every term in tables 4.8 and 4.9, and shown in table 4.10. Using the relationships from tables 4.8 and 4.9 and the vectors in table 4.10, the Euclidean distance between each pair of vectors is calculated and shown in tables 4.11 and 4.12. As the average distances indicate, highly correlated terms share a similar combination of feature clusters, whereas lowly correlated terms do not.
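A sketch of this construction, assuming the per-term ratios are taken relative to the largest bin (which matches the example vector quoted above for the 80s term, whose largest component is 1); the helper names are hypothetical.

```python
# Sketch: describe each label term by the 5-dim ratio of its songs falling into each
# feature cluster, then compare terms by Euclidean distance (as in tables 4.10-4.12).
import numpy as np

def term_vector(term_tracks, feature_labels, k=5):
    """term_tracks: indices of songs carrying the term; feature_labels: cluster id per song."""
    counts = np.bincount(feature_labels[term_tracks], minlength=k).astype(float)
    return counts / counts.max()            # ratios relative to the largest bin (assumption)

def term_distance(v1, v2):
    return float(np.linalg.norm(v1 - v2))
```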


More information

Effects of acoustic degradations on cover song recognition

Effects of acoustic degradations on cover song recognition Signal Processing in Acoustics: Paper 68 Effects of acoustic degradations on cover song recognition Julien Osmalskyj (a), Jean-Jacques Embrechts (b) (a) University of Liège, Belgium, josmalsky@ulg.ac.be

More information

Singer Recognition and Modeling Singer Error

Singer Recognition and Modeling Singer Error Singer Recognition and Modeling Singer Error Johan Ismael Stanford University jismael@stanford.edu Nicholas McGee Stanford University ndmcgee@stanford.edu 1. Abstract We propose a system for recognizing

More information

Drum Sound Identification for Polyphonic Music Using Template Adaptation and Matching Methods

Drum Sound Identification for Polyphonic Music Using Template Adaptation and Matching Methods Drum Sound Identification for Polyphonic Music Using Template Adaptation and Matching Methods Kazuyoshi Yoshii, Masataka Goto and Hiroshi G. Okuno Department of Intelligence Science and Technology National

More information

AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION

AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION Halfdan Rump, Shigeki Miyabe, Emiru Tsunoo, Nobukata Ono, Shigeki Sagama The University of Tokyo, Graduate

More information

http://www.xkcd.com/655/ Audio Retrieval David Kauchak cs160 Fall 2009 Thanks to Doug Turnbull for some of the slides Administrative CS Colloquium vs. Wed. before Thanksgiving producers consumers 8M artists

More information

Analysis and Clustering of Musical Compositions using Melody-based Features

Analysis and Clustering of Musical Compositions using Melody-based Features Analysis and Clustering of Musical Compositions using Melody-based Features Isaac Caswell Erika Ji December 13, 2013 Abstract This paper demonstrates that melodic structure fundamentally differentiates

More information

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University Week 14 Query-by-Humming and Music Fingerprinting Roger B. Dannenberg Professor of Computer Science, Art and Music Overview n Melody-Based Retrieval n Audio-Score Alignment n Music Fingerprinting 2 Metadata-based

More information

Music Similarity and Cover Song Identification: The Case of Jazz

Music Similarity and Cover Song Identification: The Case of Jazz Music Similarity and Cover Song Identification: The Case of Jazz Simon Dixon and Peter Foster s.e.dixon@qmul.ac.uk Centre for Digital Music School of Electronic Engineering and Computer Science Queen Mary

More information

hit), and assume that longer incidental sounds (forest noise, water, wind noise) resemble a Gaussian noise distribution.

hit), and assume that longer incidental sounds (forest noise, water, wind noise) resemble a Gaussian noise distribution. CS 229 FINAL PROJECT A SOUNDHOUND FOR THE SOUNDS OF HOUNDS WEAKLY SUPERVISED MODELING OF ANIMAL SOUNDS ROBERT COLCORD, ETHAN GELLER, MATTHEW HORTON Abstract: We propose a hybrid approach to generating

More information

Neural Network for Music Instrument Identi cation

Neural Network for Music Instrument Identi cation Neural Network for Music Instrument Identi cation Zhiwen Zhang(MSE), Hanze Tu(CCRMA), Yuan Li(CCRMA) SUN ID: zhiwen, hanze, yuanli92 Abstract - In the context of music, instrument identi cation would contribute

More information

HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH

HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH Proc. of the th Int. Conference on Digital Audio Effects (DAFx-), Hamburg, Germany, September -8, HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH George Tzanetakis, Georg Essl Computer

More information

Automatic Music Similarity Assessment and Recommendation. A Thesis. Submitted to the Faculty. Drexel University. Donald Shaul Williamson

Automatic Music Similarity Assessment and Recommendation. A Thesis. Submitted to the Faculty. Drexel University. Donald Shaul Williamson Automatic Music Similarity Assessment and Recommendation A Thesis Submitted to the Faculty of Drexel University by Donald Shaul Williamson in partial fulfillment of the requirements for the degree of Master

More information

Release Year Prediction for Songs

Release Year Prediction for Songs Release Year Prediction for Songs [CSE 258 Assignment 2] Ruyu Tan University of California San Diego PID: A53099216 rut003@ucsd.edu Jiaying Liu University of California San Diego PID: A53107720 jil672@ucsd.edu

More information

MODELS of music begin with a representation of the

MODELS of music begin with a representation of the 602 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 18, NO. 3, MARCH 2010 Modeling Music as a Dynamic Texture Luke Barrington, Student Member, IEEE, Antoni B. Chan, Member, IEEE, and

More information

Dimensional Music Emotion Recognition: Combining Standard and Melodic Audio Features

Dimensional Music Emotion Recognition: Combining Standard and Melodic Audio Features Dimensional Music Emotion Recognition: Combining Standard and Melodic Audio Features R. Panda 1, B. Rocha 1 and R. P. Paiva 1, 1 CISUC Centre for Informatics and Systems of the University of Coimbra, Portugal

More information

POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS

POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS Andrew N. Robertson, Mark D. Plumbley Centre for Digital Music

More information

Music Radar: A Web-based Query by Humming System

Music Radar: A Web-based Query by Humming System Music Radar: A Web-based Query by Humming System Lianjie Cao, Peng Hao, Chunmeng Zhou Computer Science Department, Purdue University, 305 N. University Street West Lafayette, IN 47907-2107 {cao62, pengh,

More information

About Giovanni De Poli. What is Model. Introduction. di Poli: Methodologies for Expressive Modeling of/for Music Performance

About Giovanni De Poli. What is Model. Introduction. di Poli: Methodologies for Expressive Modeling of/for Music Performance Methodologies for Expressiveness Modeling of and for Music Performance by Giovanni De Poli Center of Computational Sonology, Department of Information Engineering, University of Padova, Padova, Italy About

More information

Singer Traits Identification using Deep Neural Network

Singer Traits Identification using Deep Neural Network Singer Traits Identification using Deep Neural Network Zhengshan Shi Center for Computer Research in Music and Acoustics Stanford University kittyshi@stanford.edu Abstract The author investigates automatic

More information

The Million Song Dataset

The Million Song Dataset The Million Song Dataset AUDIO FEATURES The Million Song Dataset There is no data like more data Bob Mercer of IBM (1985). T. Bertin-Mahieux, D.P.W. Ellis, B. Whitman, P. Lamere, The Million Song Dataset,

More information

TOWARDS IMPROVING ONSET DETECTION ACCURACY IN NON- PERCUSSIVE SOUNDS USING MULTIMODAL FUSION

TOWARDS IMPROVING ONSET DETECTION ACCURACY IN NON- PERCUSSIVE SOUNDS USING MULTIMODAL FUSION TOWARDS IMPROVING ONSET DETECTION ACCURACY IN NON- PERCUSSIVE SOUNDS USING MULTIMODAL FUSION Jordan Hochenbaum 1,2 New Zealand School of Music 1 PO Box 2332 Wellington 6140, New Zealand hochenjord@myvuw.ac.nz

More information

A QUERY BY EXAMPLE MUSIC RETRIEVAL ALGORITHM

A QUERY BY EXAMPLE MUSIC RETRIEVAL ALGORITHM A QUER B EAMPLE MUSIC RETRIEVAL ALGORITHM H. HARB AND L. CHEN Maths-Info department, Ecole Centrale de Lyon. 36, av. Guy de Collongue, 69134, Ecully, France, EUROPE E-mail: {hadi.harb, liming.chen}@ec-lyon.fr

More information

Mood Tracking of Radio Station Broadcasts

Mood Tracking of Radio Station Broadcasts Mood Tracking of Radio Station Broadcasts Jacek Grekow Faculty of Computer Science, Bialystok University of Technology, Wiejska 45A, Bialystok 15-351, Poland j.grekow@pb.edu.pl Abstract. This paper presents

More information

Automatic Rhythmic Notation from Single Voice Audio Sources

Automatic Rhythmic Notation from Single Voice Audio Sources Automatic Rhythmic Notation from Single Voice Audio Sources Jack O Reilly, Shashwat Udit Introduction In this project we used machine learning technique to make estimations of rhythmic notation of a sung

More information

Creating a Feature Vector to Identify Similarity between MIDI Files

Creating a Feature Vector to Identify Similarity between MIDI Files Creating a Feature Vector to Identify Similarity between MIDI Files Joseph Stroud 2017 Honors Thesis Advised by Sergio Alvarez Computer Science Department, Boston College 1 Abstract Today there are many

More information

ON FINDING MELODIC LINES IN AUDIO RECORDINGS. Matija Marolt

ON FINDING MELODIC LINES IN AUDIO RECORDINGS. Matija Marolt ON FINDING MELODIC LINES IN AUDIO RECORDINGS Matija Marolt Faculty of Computer and Information Science University of Ljubljana, Slovenia matija.marolt@fri.uni-lj.si ABSTRACT The paper presents our approach

More information

Reconstruction of Ca 2+ dynamics from low frame rate Ca 2+ imaging data CS229 final project. Submitted by: Limor Bursztyn

Reconstruction of Ca 2+ dynamics from low frame rate Ca 2+ imaging data CS229 final project. Submitted by: Limor Bursztyn Reconstruction of Ca 2+ dynamics from low frame rate Ca 2+ imaging data CS229 final project. Submitted by: Limor Bursztyn Introduction Active neurons communicate by action potential firing (spikes), accompanied

More information

Analysis of local and global timing and pitch change in ordinary

Analysis of local and global timing and pitch change in ordinary Alma Mater Studiorum University of Bologna, August -6 6 Analysis of local and global timing and pitch change in ordinary melodies Roger Watt Dept. of Psychology, University of Stirling, Scotland r.j.watt@stirling.ac.uk

More information

Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting

Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting Dalwon Jang 1, Seungjae Lee 2, Jun Seok Lee 2, Minho Jin 1, Jin S. Seo 2, Sunil Lee 1 and Chang D. Yoo 1 1 Korea Advanced

More information

Story Tracking in Video News Broadcasts. Ph.D. Dissertation Jedrzej Miadowicz June 4, 2004

Story Tracking in Video News Broadcasts. Ph.D. Dissertation Jedrzej Miadowicz June 4, 2004 Story Tracking in Video News Broadcasts Ph.D. Dissertation Jedrzej Miadowicz June 4, 2004 Acknowledgements Motivation Modern world is awash in information Coming from multiple sources Around the clock

More information

Perceptual Evaluation of Automatically Extracted Musical Motives

Perceptual Evaluation of Automatically Extracted Musical Motives Perceptual Evaluation of Automatically Extracted Musical Motives Oriol Nieto 1, Morwaread M. Farbood 2 Dept. of Music and Performing Arts Professions, New York University, USA 1 oriol@nyu.edu, 2 mfarbood@nyu.edu

More information

A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES

A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES 12th International Society for Music Information Retrieval Conference (ISMIR 2011) A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES Erdem Unal 1 Elaine Chew 2 Panayiotis Georgiou

More information

Automatic Laughter Detection

Automatic Laughter Detection Automatic Laughter Detection Mary Knox Final Project (EECS 94) knoxm@eecs.berkeley.edu December 1, 006 1 Introduction Laughter is a powerful cue in communication. It communicates to listeners the emotional

More information

Automatic Piano Music Transcription

Automatic Piano Music Transcription Automatic Piano Music Transcription Jianyu Fan Qiuhan Wang Xin Li Jianyu.Fan.Gr@dartmouth.edu Qiuhan.Wang.Gr@dartmouth.edu Xi.Li.Gr@dartmouth.edu 1. Introduction Writing down the score while listening

More information

Quality of Music Classification Systems: How to build the Reference?

Quality of Music Classification Systems: How to build the Reference? Quality of Music Classification Systems: How to build the Reference? Janto Skowronek, Martin F. McKinney Digital Signal Processing Philips Research Laboratories Eindhoven {janto.skowronek,martin.mckinney}@philips.com

More information

MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES

MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES Jun Wu, Yu Kitano, Stanislaw Andrzej Raczynski, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono and Shigeki Sagayama The Graduate

More information

Hidden Markov Model based dance recognition

Hidden Markov Model based dance recognition Hidden Markov Model based dance recognition Dragutin Hrenek, Nenad Mikša, Robert Perica, Pavle Prentašić and Boris Trubić University of Zagreb, Faculty of Electrical Engineering and Computing Unska 3,

More information

Music Source Separation

Music Source Separation Music Source Separation Hao-Wei Tseng Electrical and Engineering System University of Michigan Ann Arbor, Michigan Email: blakesen@umich.edu Abstract In popular music, a cover version or cover song, or

More information

Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University

Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University danny1@stanford.edu 1. Motivation and Goal Music has long been a way for people to express their emotions. And because we all have a

More information

Robert Alexandru Dobre, Cristian Negrescu

Robert Alexandru Dobre, Cristian Negrescu ECAI 2016 - International Conference 8th Edition Electronics, Computers and Artificial Intelligence 30 June -02 July, 2016, Ploiesti, ROMÂNIA Automatic Music Transcription Software Based on Constant Q

More information

Evaluating Melodic Encodings for Use in Cover Song Identification

Evaluating Melodic Encodings for Use in Cover Song Identification Evaluating Melodic Encodings for Use in Cover Song Identification David D. Wickland wickland@uoguelph.ca David A. Calvert dcalvert@uoguelph.ca James Harley jharley@uoguelph.ca ABSTRACT Cover song identification

More information

Transcription of the Singing Melody in Polyphonic Music

Transcription of the Singing Melody in Polyphonic Music Transcription of the Singing Melody in Polyphonic Music Matti Ryynänen and Anssi Klapuri Institute of Signal Processing, Tampere University Of Technology P.O.Box 553, FI-33101 Tampere, Finland {matti.ryynanen,

More information

MUSICAL MOODS: A MASS PARTICIPATION EXPERIMENT FOR AFFECTIVE CLASSIFICATION OF MUSIC

MUSICAL MOODS: A MASS PARTICIPATION EXPERIMENT FOR AFFECTIVE CLASSIFICATION OF MUSIC 12th International Society for Music Information Retrieval Conference (ISMIR 2011) MUSICAL MOODS: A MASS PARTICIPATION EXPERIMENT FOR AFFECTIVE CLASSIFICATION OF MUSIC Sam Davies, Penelope Allen, Mark

More information

Exploring Relationships between Audio Features and Emotion in Music

Exploring Relationships between Audio Features and Emotion in Music Exploring Relationships between Audio Features and Emotion in Music Cyril Laurier, *1 Olivier Lartillot, #2 Tuomas Eerola #3, Petri Toiviainen #4 * Music Technology Group, Universitat Pompeu Fabra, Barcelona,

More information

Music Information Retrieval

Music Information Retrieval Music Information Retrieval Automatic genre classification from acoustic features DANIEL RÖNNOW and THEODOR TWETMAN Bachelor of Science Thesis Stockholm, Sweden 2012 Music Information Retrieval Automatic

More information

A Survey of Audio-Based Music Classification and Annotation

A Survey of Audio-Based Music Classification and Annotation A Survey of Audio-Based Music Classification and Annotation Zhouyu Fu, Guojun Lu, Kai Ming Ting, and Dengsheng Zhang IEEE Trans. on Multimedia, vol. 13, no. 2, April 2011 presenter: Yin-Tzu Lin ( 阿孜孜 ^.^)

More information

Unifying Low-level and High-level Music. Similarity Measures

Unifying Low-level and High-level Music. Similarity Measures Unifying Low-level and High-level Music 1 Similarity Measures Dmitry Bogdanov, Joan Serrà, Nicolas Wack, Perfecto Herrera, and Xavier Serra Abstract Measuring music similarity is essential for multimedia

More information

13 Matching questions

13 Matching questions Musical Genres NAME 13 Matching questions 1. jazz A. F. 2. pop 3. country 4. blues 5. hip hop B. G. 6. rap 7. reggae 8. heavy metal C. H. 9. classical 10. electronic 11. folk 12. dance D. I. 13. rock and

More information

HIT SONG SCIENCE IS NOT YET A SCIENCE

HIT SONG SCIENCE IS NOT YET A SCIENCE HIT SONG SCIENCE IS NOT YET A SCIENCE François Pachet Sony CSL pachet@csl.sony.fr Pierre Roy Sony CSL roy@csl.sony.fr ABSTRACT We describe a large-scale experiment aiming at validating the hypothesis that

More information

An action based metaphor for description of expression in music performance

An action based metaphor for description of expression in music performance An action based metaphor for description of expression in music performance Luca Mion CSC-SMC, Centro di Sonologia Computazionale Department of Information Engineering University of Padova Workshop Toni

More information

Research Article. ISSN (Print) *Corresponding author Shireen Fathima

Research Article. ISSN (Print) *Corresponding author Shireen Fathima Scholars Journal of Engineering and Technology (SJET) Sch. J. Eng. Tech., 2014; 2(4C):613-620 Scholars Academic and Scientific Publisher (An International Publisher for Academic and Scientific Resources)

More information

Music Information Retrieval for Jazz

Music Information Retrieval for Jazz Music Information Retrieval for Jazz Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA {dpwe,thierry}@ee.columbia.edu http://labrosa.ee.columbia.edu/

More information

Deep Neural Networks Scanning for patterns (aka convolutional networks) Bhiksha Raj

Deep Neural Networks Scanning for patterns (aka convolutional networks) Bhiksha Raj Deep Neural Networks Scanning for patterns (aka convolutional networks) Bhiksha Raj 1 Story so far MLPs are universal function approximators Boolean functions, classifiers, and regressions MLPs can be

More information

Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video

Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video Mohamed Hassan, Taha Landolsi, Husameldin Mukhtar, and Tamer Shanableh College of Engineering American

More information

6.UAP Project. FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System. Daryl Neubieser. May 12, 2016

6.UAP Project. FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System. Daryl Neubieser. May 12, 2016 6.UAP Project FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System Daryl Neubieser May 12, 2016 Abstract: This paper describes my implementation of a variable-speed accompaniment system that

More information

Bi-Modal Music Emotion Recognition: Novel Lyrical Features and Dataset

Bi-Modal Music Emotion Recognition: Novel Lyrical Features and Dataset Bi-Modal Music Emotion Recognition: Novel Lyrical Features and Dataset Ricardo Malheiro, Renato Panda, Paulo Gomes, Rui Paiva CISUC Centre for Informatics and Systems of the University of Coimbra {rsmal,

More information

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring 2009 Week 6 Class Notes Pitch Perception Introduction Pitch may be described as that attribute of auditory sensation in terms

More information

MidiFind: Fast and Effec/ve Similarity Searching in Large MIDI Databases

MidiFind: Fast and Effec/ve Similarity Searching in Large MIDI Databases 1 MidiFind: Fast and Effec/ve Similarity Searching in Large MIDI Databases Gus Xia Tongbo Huang Yifei Ma Roger B. Dannenberg Christos Faloutsos Schools of Computer Science Carnegie Mellon University 2

More information

Melody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng

Melody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng Melody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng Introduction In this project we were interested in extracting the melody from generic audio files. Due to the

More information

Topic 10. Multi-pitch Analysis

Topic 10. Multi-pitch Analysis Topic 10 Multi-pitch Analysis What is pitch? Common elements of music are pitch, rhythm, dynamics, and the sonic qualities of timbre and texture. An auditory perceptual attribute in terms of which sounds

More information

Automatic Labelling of tabla signals

Automatic Labelling of tabla signals ISMIR 2003 Oct. 27th 30th 2003 Baltimore (USA) Automatic Labelling of tabla signals Olivier K. GILLET, Gaël RICHARD Introduction Exponential growth of available digital information need for Indexing and

More information

SIGNAL + CONTEXT = BETTER CLASSIFICATION

SIGNAL + CONTEXT = BETTER CLASSIFICATION SIGNAL + CONTEXT = BETTER CLASSIFICATION Jean-Julien Aucouturier Grad. School of Arts and Sciences The University of Tokyo, Japan François Pachet, Pierre Roy, Anthony Beurivé SONY CSL Paris 6 rue Amyot,

More information