Analyzing the Relationship Among Audio Labels Using Hubert-Arabie adjusted Rand Index
Kwan Kim

Submitted in partial fulfillment of the requirements for the Master of Music in Music Technology in the Department of Music and Performing Arts Professions in The Steinhardt School, New York University

Advisor: Dr. Juan P. Bello
Reader: Dr. Kenneth Peacock
Date: 2012/12/11
Copyright © 2012 Kwan Kim
Abstract

With the advent of advanced technology and instant access to the Internet, music databases have grown rapidly, requiring more efficient ways of organizing and providing access to music. A number of automatic classification algorithms have been proposed in the field of music information retrieval (MIR) by means of supervised learning, in which ground-truth labels are imperative. The goal of this study is to analyze the statistical relationship among audio labels such as era, emotion, genre, instrument, and origin, using the Million Song Dataset and the Hubert-Arabie adjusted Rand index, in order to observe whether there is a significant correlation between these labels. Cluster validation scores are found to be low among audio labels, which implies no strong correlation and insufficient co-occurrence between these labels when describing songs.
Acknowledgements

I would like to thank everyone involved in completing this thesis. I especially send my deepest gratitude to my advisor, Juan P. Bello, for keeping me motivated. His critiques and insights consistently pushed me to become a better student. I also thank Mary Farbood for being such a friendly mentor. It was a pleasure to work as her assistant for the past year and a half. I thank the rest of the NYU faculty for providing the opportunity and an excellent program of study. Lastly, I thank my family and wife for their support and love.
Contents

List of Figures
List of Tables

1 Introduction
2 Literature Review
   2.1 Music Information Retrieval
   2.2 Automatic Classification
      Genre
      Emotion
3 Methodology
   3.1 Data Statistics
   3.2 Filtering
      1st Filtering
      2nd Filtering
      3rd Filtering: Co-occurrence
      Hierarchical Structure
      4th Filtering: Term Frequency
      5th Filtering
   3.3 Audio Labels
      Era
      Emotion
      Genre
      Instrument
      Origins
   3.4 Audio Features
      k-means Clustering Algorithm
      Feature Matrix
      Feature Scale
      Feature Clusters
   Hubert-Arabie adjusted Rand Index
4 Evaluation and Discussion
   K vs. ARI_HA
   Hubert-Arabie adjusted Rand Index (revisited)
   Cluster Structure Analysis
   Neighboring Clusters vs. Distant Clusters
   Correlated Terms vs. Uncorrelated Terms
5 Conclusion and Future Work
References
List of Figures

1.1 System Diagram of a Generic Automatic Classification Model
2.1 System Diagram of a Genre Classification Model
2.2 System Diagram of a Music Emotion Recognition Model
2.3 Thayer's 2-Dimensional Emotion Plane (19)
3.1 Clusters
3.2 Co-occurrence - same level
3.3 Co-occurrence - different level
3.4 Hierarchical Structure (Terms)
3.5 Intersection of Labels
3.6 Era Histogram
3.7 Emotion Histogram
3.8 Genre Histogram
3.9 Instrument Histogram
3.10 Origin Histogram
3.11 Elbow Method
3.12 Content-based Cluster Histogram
4.1 K vs. ARI_HA
4.2 Co-occurrence between feature clusters and era clusters
4.3 Co-occurrence between feature clusters and emotion clusters
4.4 Co-occurrence between feature clusters and genre clusters
4.5 Co-occurrence between feature clusters and instrument clusters
4.6 Co-occurrence between feature clusters and origin clusters
4.7 Co-occurrence between era clusters and feature clusters
4.8 Co-occurrence between emotion clusters and feature clusters
4.9 Co-occurrence between genre clusters and feature clusters
4.10 Co-occurrence between instrument clusters and feature clusters
4.11 Co-occurrence between origin clusters and feature clusters
List of Tables

3.1 Overall Data Statistics
3.2 Field List
3.3 Terms
3.4 Labels
3.5 Clusters
3.6 Hierarchical Structure (Clusters)
3.7 Hierarchical Structure (µ and σ)
3.8 Mutually Exclusive Clusters
3.9 Filtered Dataset
3.10 Era Statistics
3.11 Emotion Terms
3.12 Genre Terms
3.13 Instrument Statistics
3.14 Origin Terms
3.15 Audio Features
Cluster Statistics
2 x 2 Contingency Table
ARI_HA
Term Cooccurrence
Term Cooccurrence
Optimal Cluster Validation
Self-similarity Matrix
Neighboring Clusters
Distant Clusters
Term Correlation
Term Correlation
Term Vectors
Label Cluster Distance
Label Cluster Distance
Chapter 1

Introduction

In the 21st century, we live in a world where instant access to countless music databases is granted. Online music stores such as the iTunes Store and online streaming services such as Pandora provide millions of songs from artists all over the world. As music databases have grown rapidly with the advent of advanced technology and the Internet, much more efficient ways of organizing and finding music are required. One of the main tasks in the field of music information retrieval (MIR) is to generate a computational model for the classification of audio data such that it is faster and easier to search for and listen to music. A number of researchers have proposed methods to categorize music into different classes such as genres, emotions, activities, or artists (1, 2, 3, 4). Such automated classification would then let us search for audio data based on labels; e.g., when we search for sad music, an audio emotion classification model returns songs with the label sad. Regardless of the type of classification model, there is a generic approach to this problem, as outlined in figure 1.1: extracting audio features, obtaining labels, and computing the parameters to generate a model by means of a supervised machine learning technique. When utilizing a supervised learning technique to construct a classification model, however, it is imperative that ground-truth labels are provided. Obtaining labels involves
human subjects, which makes the process expensive and inefficient. In certain cases, the number of labels is bound to be insufficient, making it even harder to collect data. As a result, researchers have used semi-supervised learning methods, in which unlabeled data is combined with labeled data during the training process in order to improve performance (5). However, this method is also limited to situations where data has only one type of label; e.g., if a dataset is labeled by genre, it is possible to construct a genre classification model, but it is not possible to create a mood classification model without knowing a priori the correspondence between genre and mood labels. This causes a problem when a dataset has only one type of label and needs to be classified into a different label class. It would be much more efficient and less expensive if there existed a statistical correspondence among different audio labels, since it would enable a different label class to be easily predicted from the same dataset. Therefore, the goal of this study is to define a statistical relationship among different audio labels such as genres, emotions, era, origin, and instruments, using the Million Song Dataset (6), applying an unsupervised learning technique, i.e., the k-means algorithm, and calculating the Hubert-Arabie adjusted Rand index (ARI_HA) (7). The outline of this thesis is organized as follows: a literature review of previous MIR studies on automatic classification models is provided in Chapter 2. The detailed methodology and data analysis are given in Chapter 3. Based on the results obtained from Chapter 3, possible interpretations of the data are discussed in Chapter 4. Finally, concluding remarks and future work are laid out in Chapter 5.
Figure 1.1: System Diagram of a Generic Automatic Classification Model - Labels are used only in the supervised learning case
Chapter 2

Literature Review

2.1 Music Information Retrieval

There are many ways to categorize music. One of the traditional ways is by metadata such as the song title, artist, or album, which is known as tag-based or text-based categorization (8). As music databases have grown virtually countless, more efficient ways to query and retrieve music are required. As opposed to tag-based query and retrieval, which only allows the retrieval of songs that we have a priori information about, content-based query and retrieval allows us to find songs in different ways; e.g., it allows finding songs similar in musical context or structure, and it can also recommend songs based on musical labels such as emotion. Music information retrieval (MIR) is a widely and rapidly growing research topic in the multimedia processing industry, which aims at extending the understanding and usefulness of music data through the research, development, and application of computational approaches and tools. As a novel way of retrieving songs or creating a playlist, researchers have come up with a number of classification methods using different labels such as genre, emotion, or cover song (1, 2, 3, 4), so that each classification model can retrieve a song based on its label or musical similarities. These methods are different
than tag-based methods, since audio features are extracted and analyzed prior to constructing a computational model. Therefore, a retrieved song is based on the content of the audio, not on its metadata.

2.2 Automatic Classification

In previous studies, most audio classification models are based on supervised learning, in which musical labels such as genre or emotion are required (1, 2, 3, 4). Using labels along with well-defined high-dimensional musical features, learning algorithms are trained on the data to find possible relationships between the features and a label, so that for any given unknown (test) data, the model can correctly recognize the label.

Genre

Tzanetakis et al. (1, 9) are among the earliest researchers who worked on automatic genre classification. Instead of manually assigning a musical genre to a song, an automatic genre classification model generates a genre label for a given song after comparing its musical features with the model. In (1, 9) the authors used three feature sets describing timbral texture, rhythmic content, and pitch content, respectively. Features such as spectral centroid, rolloff, flux, zero-crossing rate, and MFCC (10) were extracted to construct a feature vector that describes the timbral texture of music. An automatic beat detection algorithm (4, 11) was used to calculate the rhythmic structure of music and form a feature vector that describes rhythmic content. Lastly, pitch detection techniques (12, 13, 14) were used to construct a pitch content feature vector. Figure 2.1 represents the system overview of the automatic genre classification model described in (1, 9).
Figure 2.1: System Diagram of a Genre Classification Model - Gaussian Mixture Model (GMM) is used as a classifier
Emotion

In 2006, the work of L. Lu et al. (15) was one of the few studies that provided an in-depth analysis of mood detection and tracking of music signals using acoustic features extracted directly from the audio waveform, instead of using MIDI or symbolic representations. Although it has been an active research topic, researchers have consistently faced the same problem of quantifying music emotion due to its inherent subjectivity. Recent studies have sought ways to minimize the inconsistency among labels. Skowronek et al. (16) paid close attention to the material collection process. They obtained a large number of labelled data from 12 subjects and accounted for only those in agreement with one another in order to exclude the ambiguous ones. In (17), the authors created a collaborative game that collects dynamic (time-varying) labels of music mood from two players and ensures that the players cross-check each other's labels in order to build a consensus. Defining mood classes is not an easy task. There have been mainly two approaches to defining mood: categorical and continuous. In (15) mood labels are classified into adjectives such as happy, angry, sad, or sleepy. However, the authors in (18) defined mood as a continuous regression problem, as described in figure 2.2, and mapped emotion onto the two-dimensional Thayer's plane (19) shown in figure 2.3. Recent studies focus on multi-modal classification using both lyrics and audio content to quantify music emotion (20, 21, 22), on dynamic music emotion modeling (23, 24), or on unsupervised learning approaches for mood recognition (25).
Figure 2.2: System Diagram of a Music Emotion Recognition Model - Arousal and Valence are two independent regressors

Figure 2.3: Thayer's 2-Dimensional Emotion Plane (19) - Each axis is used as an independent regressor
Chapter 3

Methodology

Previous studies have constructed automatic classification models using a relationship between audio features and one type of label (e.g., genre or mood). As stated in Chapter 1, however, if a statistical relationship among several audio labels were defined, it could reduce the cost of constructing automatic classification models. In order to address the problem, two things are needed:

1. Big music data with multiple labels: The Million Song Dataset (6)
2. A cluster validation method: the Hubert-Arabie adjusted Rand index

A large dataset is required to minimize the bias and noisiness of labels. Since labels are acquired from users, a small amount of music data would lead to large variance among labels and thus large error. A cluster validation method is required to compare sets of clusters created by different labels, hence the Hubert-Arabie adjusted Rand index.
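Since the ARI_HA is the study's central measure, a minimal sketch of the standard Hubert-Arabie pair-counting computation may help fix ideas. This is an illustrative pure-Python re-implementation of the textbook formula, not the author's Matlab code:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions of the
    same items: 1 means identical clusterings, values near 0 mean the
    agreement is no better than chance."""
    n = len(labels_a)
    assert n == len(labels_b), "both partitions must cover the same items"
    # Contingency counts n_ij and marginals a_i, b_j
    contingency = Counter(zip(labels_a, labels_b))
    a = Counter(labels_a)
    b = Counter(labels_b)
    # Pair counts: comb(v, 2) pairs fall together in a cell/cluster of size v
    sum_ij = sum(comb(v, 2) for v in contingency.values())
    sum_a = sum(comb(v, 2) for v in a.values())
    sum_b = sum(comb(v, 2) for v in b.values())
    expected = sum_a * sum_b / comb(n, 2)   # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)
```

Note that the index is invariant to relabeling: two partitions that group the same songs together score 1.0 even if their cluster names differ.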
3.1 Data Statistics

The Million Song Dataset (6) consists of a million files in HDF5 format, from which various information can be retrieved, including metadata such as the name of the artist, title of the song, or tags (terms), and musical features such as chroma, tempo, loudness, mode, or key. Table 3.1 shows the overall statistics of the dataset and table 3.2 shows a list of fields available in the files of the dataset.

No.  Type                            Total
1    Songs                           1,000,000
2    Data                            273 GB
3    Unique Artists                  44,745
4    Unique Terms                    7,643
5    Artists with at least one term  43,943
6    Identified Cover Songs          18,196

Table 3.1: Overall Data Statistics - Statistics of The Million Song Dataset

3.2 Filtering

LabROSA, the distributor of The Million Song Dataset, also provides all the necessary functions to access and manipulate the data from Matlab. The HDF5 Song File Reader function converts .h5 files into a Matlab object, which can be further used to extract labels via the get artist terms function and features via the get segments pitches, get tempo, and get loudness functions. Labels are thus used to create several sets of clusters, while audio features are used to form another set of clusters. Figure 3.1 indicates the different sets of clusters. Although it would be ideal to take account of all million songs, due to the noisiness of the data, the dataset must undergo the following filtering process to get rid of unusable songs:

1. All terms are categorized into one of 5 label classes.
2. Create a set of clusters based on each label class.
3. Find the hierarchical structure of each label class.
4. Make each set of clusters mutually exclusive.
5. Songs that contain at least one term from all of the five label classes are retrieved.

Figure 3.1: Clusters - Several sets of clusters can be made using labels and audio features

1st Filtering

As shown in table 3.1, there are 7,643 unique terms that describe songs in the dataset. Some examples of these terms are shown in table 3.3. These unique terms have to be filtered so that meaningless terms are ignored. In other words, five labels are chosen so that each term can be categorized into one of the following five label classes: era, emotion, genre, instrument, and origin. In doing so, any term that cannot be described by one of those labels is dropped. Table 3.4 shows the total number of terms that belong to each label. Note the small number of terms in each label category compared to the original 7,643 unique terms. This is because many terms cross-reference each other. For example, 80s, 80s alternative, and 80s country all count as unique terms, but they are all represented as 80s under the era label class. Similarly, alternative jazz, alternative
Field Name                  Type            Description
analysis sample rate        float           sample rate of the audio used
artist familiarity          float           algorithmic estimation
artist hotttnesss           float           algorithmic estimation
artist id                   string          Echo Nest ID
artist name                 string          artist name
artist terms                array string    Echo Nest tags
artist terms freq           array float     Echo Nest tag freqs
audio md5                   string          audio hash code
bars confidence             array float     confidence measure
bars start                  array float     beginning of bars
beats confidence            array float     confidence measure
beats start                 array float     result of beat tracking
danceability                float           algorithmic estimation
duration                    float           in seconds
energy                      float           energy from listener perspective
key                         int             key the song is in
key confidence              float           confidence measure
loudness                    float           overall loudness in dB
mode                        int             major or minor
mode confidence             float           confidence measure
release                     string          album name
sections confidence         array float     confidence measure
sections start              array float     largest grouping in a song
segments confidence         array float     confidence measure
segments loudness max       array float     max dB value
segments loudness max time  array float     time of max dB value
segments loudness start     array float     dB value at onset
segments pitches            2D array float  chroma feature
segments start              array float     musical events
segments timbre             2D array float  texture features
similar artists             array string    Echo Nest artist IDs
song hotttnesss             float           algorithmic estimation
song id                     string          Echo Nest song ID
tempo                       float           estimated tempo in BPM
time signature              int             estimate of number of beats per bar
time signature confidence   float           confidence measure
title                       string          song title
track id                    string          Echo Nest track ID

Table 3.2: Field List - A list of fields available in the files of the dataset
rock, alternative r & b, and alternative metal are simply alternative, jazz, rock, r & b, and metal under the genre label category.

No.    Term
1      00s
2      00s alternative
3      00s country
...
3112   gp worldwide
3113   grammy winner
3114   gramophone
...
5787   punky reggae
5788   pure black metal
5789   pure grunge
...

Table 3.3: Terms - Examples of terms (tags)

Label       Total
era         17
emotion     96
genre       436
instrument  78
origin      635

Table 3.4: Labels - Terms belonging to each label class

In this way, the total number of terms in each category is reduced, and it is still possible to search for songs without using repetitive terms. For example, a song that has the alternative jazz term can be found by both the alternative and jazz keywords, instead of alternative jazz. In addition, composite terms such as alternative jazz or ambient electronics are not included, since they are at the lowest hierarchical level and the number of elements belonging to such clusters is small.
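The 1st filtering step above amounts to a lookup from raw terms to label classes. A hypothetical sketch, in Python rather than the author's Matlab, with an illustrative keyword vocabulary that is an assumption and far smaller than the thesis's actual term lists:

```python
# Illustrative keyword lists per label class; the real mapping in the
# thesis covers thousands of Echo Nest terms.
LABEL_KEYWORDS = {
    "era":        ["50s", "60s", "70s", "80s", "90s", "00s", "century"],
    "emotion":    ["happy", "sad", "mellow", "angry", "chill"],
    "genre":      ["jazz", "rock", "metal", "country", "alternative"],
    "instrument": ["piano", "guitar", "bass", "violin", "synth"],
    "origin":     ["african", "british", "german", "texas"],
}

def categorize(term):
    """Return the set of label classes a raw term falls under; a term
    matching no class is dropped by the filter (empty set)."""
    term = term.lower()
    return {label for label, keys in LABEL_KEYWORDS.items()
            if any(k in term for k in keys)}
```

For example, a composite term like "80s country" matches both the era and genre classes, while a non-descriptive term like "grammy winner" matches none and is discarded.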
2nd Filtering

After all unique terms are filtered into one of the five label classes, each term belonging to each label class is regarded as a cluster, as shown in table 3.5. Note that it is still not certain that all terms truly represent independent clusters, as it must be taken into account that there are a few hierarchical layers among terms; i.e., the piano and electric piano terms might not be at the same level of hierarchy in the instrument label class. In order to account for differences in layers, the co-occurrence between a pair of clusters is calculated, as explained in the next section.

Label       Clusters
era         00s, 1700s, 1910s, 19th century
emotion     angry, chill, energetic, horror, mellow
genre       ambient, blues, crossover, dark, electronic, opera
instrument  accordion, banjo, clarinet, horn, ukelele, laptop
origin      african, belgian, dallas, hongkong, moroccan

Table 3.5: Clusters - Each term forms a cluster within each label class

3rd Filtering: Co-occurrence

Within a single label class there are a number of different terms, each of which could possibly represent an individual cluster. However, while certain terms inherently possess a clear meaning, some do not; e.g., in the genre label class, the distinctions between dark metal and death metal, or acid metal and acid funk, might not be obvious. In order to avoid ambiguity among clusters, the co-occurrences of two clusters are measured. The co-occurrences of a pair of clusters can be calculated as follows:

cooc_{a,b} = intersect(a, b) / intersect(a, a),    cooc_{b,a} = intersect(a, b) / intersect(b, b)    (3.1)
where intersect(i, j) counts the number of elements in both clusters i and j. If both clusters have high or low co-occurrence values, there is a correspondingly large or small overlap between the clusters, while if one of the two clusters has a high value and the other a low value, one cluster is a subset of the other, as illustrated in figures 3.2 and 3.3. Also note that if one cluster is a subset of the other, the two are not at the same hierarchical level.

Figure 3.2: Co-occurrence - same level - (a) small overlap between two clusters; (b) large overlap between two clusters

Figure 3.3: Co-occurrence - different level - (a) Cluster B is a subset of Cluster A; (b) Cluster A has a relatively larger number of elements than Cluster B, most of which belong to the intersection

Therefore, a threshold is set such that if (cooc_{a,b} > 0.9 and cooc_{b,a} < 0.1) or (cooc_{a,b} < 0.1 and cooc_{b,a} > 0.9), then cluster A is a subset of cluster B or vice versa. If neither condition
is met, the two clusters are at the same hierarchical level. In this way, the layers of the hierarchy can be retrieved.

Hierarchical Structure

After obtaining co-occurrence values for all pairs of clusters, the structure of the clusters in each label class can be known. Table 3.6 shows the hierarchical structure of each label class and figure 3.4 shows some of the terms at different hierarchical levels.

Label       1st Layer  2nd Layer  3rd Layer  Total
era         3          14         empty      17
emotion     3          93         empty      96
genre       27         127        274        428
instrument  18         72         empty      90
origin      13         135        487        635

Table 3.6: Hierarchical Structure (Clusters) - Total number of clusters at different layers in each label class

Figure 3.4: Hierarchical Structure (Terms) - Examples of terms at different layers in each label class
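The co-occurrence measure of equation (3.1) and the subset thresholds above can be sketched directly on Python sets; this is an illustrative translation of the rule, not the author's Matlab implementation, and the 0.9/0.1 thresholds are those stated in the text:

```python
def cooccurrence(a, b):
    """Equation (3.1): the intersection size normalized by each
    cluster's own size, returned as (cooc_ab, cooc_ba)."""
    inter = len(a & b)
    return inter / len(a), inter / len(b)

def relation(a, b, hi=0.9, lo=0.1):
    """Classify a pair of clusters with the thesis thresholds:
    nearly all of one cluster inside the other means a subset,
    otherwise the two sit at the same hierarchical level."""
    ab, ba = cooccurrence(a, b)
    if ab > hi and ba < lo:
        return "A subset of B"   # most of A lies inside B
    if ab < lo and ba > hi:
        return "B subset of A"
    return "same level"
```

For instance, a small cluster wholly contained in a much larger one (think electric piano inside piano) yields cooc_ab = 1.0 and a tiny cooc_ba, which the rule reads as a subset relation.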
The structure correlates well with intuition, with more general terms such as bass or guitar at a higher level, while terms such as acoustic bass or classical guitar are at a lower level. The number of songs in each cluster also matches intuition well: terms at a high level of the hierarchy have a large number of songs, while relatively few songs belong to terms at a low level. Now that the structure of the clusters for each label class is known, it must be carefully decided which layer should be used, as there is a tradeoff between the number of clusters and the number of songs belonging to each cluster: a higher layer has a smaller total number of clusters, but each cluster contains a sufficient number of songs, and vice versa. In order to make a logical decision, three different measures are used: the number of clusters, N, the mean, µ, and the standard deviation, σ, of all levels are calculated and shown in table 3.7. The rationale is that each layer within a label class must have a sufficient number of clusters, each cluster must contain a sufficient number of songs, and the variance of the distribution should be as small as possible. The author defined the thresholds as follows: N > 5, µ > 5,000, and σ as small as possible. The 1st layer from the instrument class and the 2nd layer from the era, emotion, genre, and origin label classes are selected, as shown in table 3.7.
Label       1st Layer µ (N) σ    2nd Layer µ (N) σ    3rd Layer µ (N) σ
era         59,524 (3) 47,835    21,686 (14) 43,263   empty
emotion     23,95 (3) 15,677     5,736 (93) 19,871    empty
genre       35,839 (27) 79,37    5,744 (127) 14,37    2,421 (274) 6,816
instrument  39,744 (18) 76,      (72) 2,464           empty
origin      69,452 (13) 141,44   8,84 (135) 23,       (487) 2,8

Table 3.7: Hierarchical Structure (µ and σ) - The mean and the standard deviation for each layer. The number in parentheses denotes the number of clusters. Bold numbers denote the selected layer.

4th Filtering: Term Frequency

After finding the structure of the clusters and selecting the layer in the previous section, all the clusters within the same layer must become mutually exclusive, leaving no overlapping elements among clusters. Therefore, after finding the intersections among clusters, it must be decided to which cluster each element should belong. In order to resolve conflicts among multiple clusters, the frequency of terms is retrieved for every single element via the provided function get artist terms freq. For every element within an intersection, the term frequency value is taken into account, and whichever term has the higher value takes the element, while the other loses it. In this way, the total number of clusters is reduced via merging and all the clusters become mutually exclusive. Table 3.8 indicates the total number of songs in each label class.

Label       # of Songs
era         387,977
emotion     394,86
genre       7,778
instrument  384,59
origin      871,631

Table 3.8: Mutually Exclusive Clusters - Total number of songs in mutually exclusive clusters
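The 4th filtering tie-break reduces to an argmax over term frequencies. A minimal sketch, assuming the frequencies have already been read out of the dataset (the function and its dict input are illustrative, not the author's Matlab code):

```python
def assign_cluster(term_freqs):
    """Sketch of the 4th filtering step: a song that falls into several
    clusters of the same layer is kept only by the term with the highest
    frequency value (as reported by get_artist_terms_freq); the other
    clusters lose the song, making the layer mutually exclusive.
    term_freqs: dict mapping each candidate term to its frequency."""
    return max(term_freqs, key=term_freqs.get)
```

So a song tagged both jazz (frequency 0.8) and rock (frequency 0.3) ends up only in the jazz cluster.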
5th Filtering

Since most songs are given multiple terms, they might belong to several label classes; e.g., a song with the 80s and alternative jazz terms belongs to both the era and genre label classes. Therefore, after obtaining the indexes of the songs that belong to each category, the intersections among these indexes are retrieved so that only the songs carrying a term from each of all five label classes are considered. A description of the aforementioned process is shown in figure 3.5. Finally, the total number of clusters in each label class and the total number of songs used in the study after all filtering processes are shown in table 3.9.

Figure 3.5: Intersection of Labels - Songs that belong to all five label classes are chosen

Songs      Total      Era  Emotion  Genre  Instrument  Origin
Original   1,000,000  -    -        -      -           -
Filtered   41,269     7    34       44     7           33

Table 3.9: Filtered Dataset - Total number of songs and clusters after filtering
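The 5th filtering step is a plain set intersection over the per-class song indexes. A hedged sketch (Python for illustration; the index sets below are toy data, not the dataset's real indexes):

```python
def songs_with_all_labels(index_sets):
    """Sketch of the 5th filtering step: intersect the song-index sets
    of the label classes so only songs carrying at least one term from
    every class survive (the shaded region of figure 3.5).
    index_sets: dict mapping label class -> iterable of song indexes."""
    result = None
    for indexes in index_sets.values():
        result = set(indexes) if result is None else result & set(indexes)
    return result if result is not None else set()
```

With the thesis's five classes as keys, the returned set corresponds to the 41,269 surviving songs.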
3.3 Audio Labels

Era

After all the filtering processes, 7 clusters are selected for the era label class. Terms such as 16th century or 21st century as well as 30s and 40s are successfully removed via merging and hierarchy. Table 3.10 and figure 3.6 show the statistics of the remaining terms. Note that the distribution is negatively skewed, which is intuitive, because more songs exist in recorded format from later decades than from the early 20th century due to advances in recording technology. It also makes sense that the 80s cluster contains the most songs, because people use the term 80s to describe 80s rock or 80s music more often than they use, say, 00s music or 00s pop.

Era    50s  60s    70s    80s     90s    00s    20th century  Total
       661  3,525  2,826  17,359  9,555  6,111  1,232         41,269

Table 3.10: Era Statistics - # of songs belonging to each era cluster

Figure 3.6: Era Histogram - Distribution of songs based on era label
Emotion

There are a total of 34 clusters in the emotion label class, which are shown in table 3.11. Note the uneven distribution of songs in the emotion label class shown in figure 3.7. Clusters such as beautiful, chill, and romantic together make up about one third of the total songs, while relatively few songs belong to clusters such as evil, haunting, and uplifting.

Emotion: beautiful, brutal, calming, chill, energetic, evil, gore, grim, happy, harsh, haunting, horror, humorous, hypnotic, intense, inspirational, light, loud, melancholia, mellow, moody, obscure, patriotic, peace, relax, romantic, sad, sexy, strange, trippy, uplifting, wicked, wistful, witty

Table 3.11: Emotion Terms - all the emotion terms

Figure 3.7: Emotion Histogram - Distribution of songs based on emotion label
Genre

A total of 44 genre clusters are created, as shown in table 3.12, and their distribution is shown in figure 3.8. Also note that certain genre terms such as hip hop, indie, and wave have more songs than others like emo or melodic.

Genre: alternative, ambient, ballade, blues, british, christian, classic, country, dance, dub, electronic, eurodance, hard style, hip hop, instrumental, industrial, indie, lounge, modern, neo, new, noise, nu, old, orchestra, opera, post, power, progressive, r&b, rag, soundtrack, salsa, smooth, soft, swing, synth pop, techno, thrash, tribal, urban, waltz, wave, zouk

Table 3.12: Genre Terms - all the genre terms

Figure 3.8: Genre Histogram - Distribution of songs based on genre label
Instrument

There are only 7 instrument clusters after the filtering processes. The name of each cluster and the number of songs belonging to it are given in table 3.13. The values make sense, as guitar, piano, and synth have many songs in their clusters, while relatively few songs belong to saxophone and violin. Figure 3.9 shows the histogram of the instrument clusters.

Instrument  # of Songs
bass        2,444
drum        513
guitar      9,731
piano       5,667
saxophone   134
violin      322
synth

Table 3.13: Instrument Statistics - # of songs belonging to each instrument cluster

Figure 3.9: Instrument Histogram - Distribution of songs based on instrument label
Origins

There are 33 different origin clusters, as laid out in table 3.14. Note that clusters such as american, british, dc, and german have a large number of songs, while clusters such as new orleans, suomi, or texas consist of relatively few songs. Also note that the terms american and texas both appear as independent clusters, while it seems intuitive that texas should be a subset of american. This is because, when describing a song with an origin label, certain songs are more specifically described by texas than by american or united states, e.g., country music. Finally, the statistics of the origin label class are shown in figure 3.10.

Origin: african, american, belgium, british, canada, cuba, dc, east coast, england, german, ireland, israel, italian, japanese, los angeles, massachusetts, mexico, nederland, new york, norway, new orleans, poland, roma, russia, scotland, southern, spain, suomi, sweden, tennessee, texas, united states, west coast

Table 3.14: Origin Terms - all the origin terms

Figure 3.10: Origin Histogram - Distribution of songs based on origin label
3.4 Audio Features

Audio features are extracted in order to construct feature clusters with a clustering algorithm, using provided functions such as get segments timbre or get segments pitches. Table 3.15 shows a list of the extracted features. It takes about 30 ms to extract the features from one song, which makes a total of about 8 hours for a million songs. However, since only 41,269 songs are used, the computation time is reduced to less than an hour.

No.  Feature          Function
1    Chroma           get segments pitches
2    Texture          get segments timbre
3    Tempo            get tempo
4    Key              get key
5    Key Confidence   get key confidence
6    Loudness         get loudness
7    Mode             get mode
8    Mode Confidence  get mode confidence

Table 3.15: Audio Features - Several audio features are extracted via respective functions

k-means Clustering Algorithm

Content-based clusters can be constructed with a clustering algorithm, an unsupervised learning method that does not require pre-labeling of the data and uses only features to construct clusters of similar data points. There are several variants of clustering algorithms, such as k-means, k-median, centroid-based, or single-linkage (26, 27, 28). In this study, the k-means clustering algorithm is used for the automatic clusters. The basic structure of the algorithm is defined in the following steps (29, 30):

1. Define a similarity measurement metric, d (e.g., Euclidean, Manhattan, etc.).
2. Randomly initialize k centroids, µ_k.
3. For all data points x, find the µ_k that returns the minimum d.
4. Find C_k, the cluster comprising the set of points assigned to µ_k.
5. Recalculate µ_k for every C_k.
6. Repeat steps 3 through 5 until convergence.
7. Repeat steps 2 through 6 multiple times to avoid local minima.

The author used the (squared) Euclidean distance as the similarity measurement metric, d, and computed the centroid mean of each cluster as follows:

d^{(i)} := \| x^{(i)} - \mu_k \|^2    (3.2)

\mu_k := \frac{1}{|C_k|} \sum_{i \in C_k} x^{(i)}    (3.3)

where x^{(i)} is the position of the i-th point. C_k is constructed by finding c^{(i)}, the index of the centroid closest to x^{(i)}, i.e. the index that minimizes (3.2). In other words, a point belongs to the cluster whose centroid lies at the minimum Euclidean distance from it.

Feature Matrix

Using the extracted audio features, i.e. chroma, timbre, key, key confidence, mode, mode confidence, tempo, and loudness, the feature matrix F_{I x J} is constructed, where I is the total number of points (= 41,269) and J is the total number of features (= 30; both the chroma and timbre features are averaged across time, each resulting in a 12 x 1 vector per point, alongside the six scalar features). Therefore, the cost function of the algorithm is:

\frac{1}{I} \sum_{i=1}^{I} d^{(i)}    (3.4)
and the optimization objective is to minimize (3.4).

Feature Scale

Feature scaling is necessary because each feature vector spans a different range and must be normalized for equal weighting. The author used mean/standard-deviation scaling for each feature f_j:

\hat{f}_j = \frac{f_j - \mu_{f_j}}{\sigma_{f_j}}    (3.5)

Feature Clusters

What the correct number of clusters K should be is often arbitrary, and there is no algorithm that leads to a definitive answer. However, the elbow method is often used to determine K. Figure 3.11 shows a plot of the cost function for different values of K. Either K = 8 or K = 10 marks the elbow of the plot and is a possible candidate for the number of clusters. In this study, K = 10 is chosen.

[Figure 3.11: Elbow Method - K = 8 or K = 10 is the possible number of clusters]
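As an illustration only (a minimal NumPy sketch, not the author's implementation), the scaling of Eq. (3.5) and steps 1 through 7 of the algorithm might look as follows; the toy data stands in for the real feature matrix:

```python
import numpy as np

def zscore(F):
    # Eq. (3.5): centre each feature column and divide by its standard deviation
    return (F - F.mean(axis=0)) / F.std(axis=0)

def kmeans(X, k, n_iter=100, n_restarts=5, seed=0):
    """Steps 1-7: k-means with squared-Euclidean distance and random restarts."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):                            # step 7: avoid local minima
        mu = X[rng.choice(len(X), size=k, replace=False)]  # step 2: init centroids
        for _ in range(n_iter):
            d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # Eq. (3.2)
            c = d.argmin(axis=1)                           # steps 3-4: assign points
            new_mu = np.array([X[c == j].mean(axis=0) if (c == j).any() else mu[j]
                               for j in range(k)])         # step 5 / Eq. (3.3)
            if np.allclose(new_mu, mu):                    # step 6: converged
                break
            mu = new_mu
        # final assignments and mean cost, Eq. (3.4)
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        c = d.argmin(axis=1)
        cost = d.min(axis=1).mean()
        if best is None or cost < best[2]:
            best = (mu, c, cost)
    return best

# toy run on two well-separated blobs of 20 points each
X = zscore(np.vstack([np.random.default_rng(1).normal(0, 0.1, (20, 2)),
                      np.random.default_rng(2).normal(5, 0.1, (20, 2))]))
mu, c, cost = kmeans(X, k=2)
```

Sweeping k over a range and plotting the returned cost reproduces the elbow curve used to pick K above.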
After choosing the value of K, the structure of the clusters is found and shown in table 3.16.

Table 3.16: Cluster Statistics - the number of songs within each cluster:
Cluster 1: 4,349    Cluster 2: 4,172    Cluster 3: 4,128    Cluster 4: 4,475    Cluster 5: 5,866
Cluster 6: 3,933    Cluster 7: 2,544    Cluster 8: 5,149    Cluster 9: 2,436    Cluster 10: 4,217

[Figure 3.12: Content-based Cluster Histogram - Distribution of songs based on audio features]
3.5 Hubert-Arabie adjusted Rand Index

After obtaining six sets of clusters, i.e. five from the labels and one from the audio features, the relationship between a pair of cluster sets can be found by calculating the Hubert-Arabie adjusted Rand index (ARI_HA) (7, 31). The ARI_HA index makes it possible to quantify cluster validation by comparing the generated clusters with the original structure of the data. Therefore, by comparing two different sets of clusters, the correlation between them can be drawn. The ARI_HA index is measured as:

\mathrm{ARI}_{HA} = \frac{\binom{N}{2}(a + d) - [(a + b)(a + c) + (c + d)(b + d)]}{\binom{N}{2}^2 - [(a + b)(a + c) + (c + d)(b + d)]}    (3.6)

where N is the total number of data points and a, b, c, d represent four different types of pairs. Let A and B be two sets of clusters and P and Q be the number of clusters in each set; then a, b, c, and d are defined as follows:

a : pairs in the same group in both A and B
b : pairs in the same group in B but in different groups in A
c : pairs in the same group in A but in different groups in B
d : pairs in different groups in both A and B

which can be conveniently described by the contingency table shown in table 3.17. This leads to the computation of a, b, c, and d as follows:

a = \frac{\sum_{p=1}^{P} \sum_{q=1}^{Q} t_{pq}^2 - N}{2}    (3.7)

b = \frac{\sum_{p=1}^{P} t_{p+}^2 - \sum_{p=1}^{P} \sum_{q=1}^{Q} t_{pq}^2}{2}    (3.8)
c = \frac{\sum_{q=1}^{Q} t_{+q}^2 - \sum_{p=1}^{P} \sum_{q=1}^{Q} t_{pq}^2}{2}    (3.9)

d = \frac{\sum_{p=1}^{P} \sum_{q=1}^{Q} t_{pq}^2 + N^2 - \sum_{p=1}^{P} t_{p+}^2 - \sum_{q=1}^{Q} t_{+q}^2}{2}    (3.10)

where t_{pq}, t_{p+}, and t_{+q} denote the number of elements belonging to both the p-th and q-th clusters, the total number of elements belonging to the p-th cluster, and the total number of elements belonging to the q-th cluster, respectively. The index can be interpreted such that ARI_HA = 1 means perfect cluster recovery, while values greater than 0.90, 0.80, and 0.65 mean excellent, good, and moderate recovery, respectively (7).

Table 3.17: 2 x 2 Contingency Table - the four types of pairs a, b, c, d:

                              B: pair in same group    B: pair in different group
A: pair in same group                   a                          b
A: pair in different group              c                          d
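As a sketch (assuming nothing about the author's code), Eqs. (3.6)-(3.10) can be computed directly from the P x Q contingency table:

```python
import numpy as np
from math import comb

def ari_ha(u, v):
    """Hubert-Arabie adjusted Rand index between two clusterings u and v."""
    u, v = np.asarray(u), np.asarray(v)
    N = len(u)
    # build the P x Q contingency table t_pq
    _, ui = np.unique(u, return_inverse=True)
    _, vi = np.unique(v, return_inverse=True)
    t = np.zeros((ui.max() + 1, vi.max() + 1))
    np.add.at(t, (ui, vi), 1)
    sum_sq = (t ** 2).sum()
    a = (sum_sq - N) / 2                               # Eq. (3.7)
    b = ((t.sum(axis=1) ** 2).sum() - sum_sq) / 2      # Eq. (3.8)
    c = ((t.sum(axis=0) ** 2).sum() - sum_sq) / 2      # Eq. (3.9)
    d = comb(N, 2) - a - b - c                         # Eq. (3.10), via a+b+c+d = C(N,2)
    exp = (a + b) * (a + c) + (c + d) * (b + d)
    return (comb(N, 2) * (a + d) - exp) / (comb(N, 2) ** 2 - exp)  # Eq. (3.6)

print(ari_ha([0, 0, 1, 1], [1, 1, 0, 0]))  # identical partitions up to relabeling -> 1.0
```

Note that relabeling the clusters leaves the index unchanged, since only the pair counts enter the formula.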
Chapter 4

Evaluation and Discussion

ARI_HA is calculated for all pairs of cluster sets and shown in table 4.1.

[Table 4.1: ARI_HA - cluster validation between every pair of the Features, Era, Emotion, Genre, Instrument, and Origin cluster sets, calculated with the Hubert-Arabie adjusted Rand index]

It is observed from table 4.1 that the cluster validation between any pair of cluster sets is overall very low, with the highest correlation between emotion and genre and the lowest, between origin and era, at 3.15%. Although all the validation values are too low to draw a relationship between any pair of audio labels, it is still interesting to observe that emotion and genre are the most correlated among them, indicating that there are common emotion annotations for certain genres. In order to observe the relationship between emotion and genre more closely, the number of intersections between each term from both label classes is calculated, and the maximum intersection for each term is
shown in tables 4.2 and 4.3.

Table 4.2: Term Co-occurrence - the most common emotion term for each genre term:

Genre          Intersection  Emotion       |  Genre        Intersection  Emotion
alternative    159           beautiful     |  old          121           beautiful
ambient        92            chill         |  orchestra    117           beautiful
ballade        161           beautiful     |  opera        116           romantic
blues          24            energetic     |  post         164           chill
british        119           beautiful     |  power        16            melancholia
christian      214           inspirational |  progressive  827           chill
classic        392           romantic      |  r&b          12            chill
country        9             romantic      |  rag          182           chill
dance          74            chill         |  soundtrack   142           chill
dub            99            chill         |  chill        173           chill
electronic     59            chill         |  smooth       1584          chill
eurodance      52            uplifting     |  soft         536           mellow
hard style     94            gore          |  swing        6             mellow
hip hop        32            chill         |  synth pop    132           melancholia
instrumental   156           beautiful     |  techno       96            happy
industrial     44            romantic      |  thrash       134           peace
indie          2167          chill         |  tribal       99            brutal
lounge         58            beautiful     |  urban        154           beautiful
modern         7             chill         |  waltz        49            romantic
neo            169           chill         |  wave         3448          romantic
new            2             chill         |  zouk         1             beautiful
noise          121           beautiful     |
nu             44            chill         |

It is observed that, because of the disproportionate distribution of the emotion terms, most genre labels share the same few emotion terms, such as beautiful, chill, and romantic. On the other hand, since the distribution of the genre terms is flatter, the emotion terms are spread across many different genre terms. However, note that the co-occurrence between an emotion label and a genre label does not correlate well with intuition, as can be observed from table 4.3, e.g. beautiful & indie, happy & hip hop, uplifting & progressive,
which is indicative of the low cluster validation rate.

Table 4.3: Term Co-occurrence - the most common genre term for each emotion term:

Emotion        Intersection  Genre       |  Emotion      Intersection  Genre
beautiful      883           indie       |  loud         119           christian
brutal         99            tribal      |  melancholia  325           indie
calming        118           synthpop    |  mellow       536           soft
chill          32            hip hop     |  moody        88            alternative
energetic      276           wave        |  obscure      5             new
evil           72            indie       |  patriotic    3             hip hop
gore           94            hardstyle   |  peace        134           thrash
grim           659           hip hop     |  relax        23            smooth
happy          1472          hip hop     |  romantic     3448          wave
harsh          28            noise       |  sad          161           indie
haunting       11            electronic  |  sexy         752           hip hop
horror         37            wave        |  strange      76            progressive
humorous       96            salsa       |  trippy       67            progressive
hypnotic       93            smooth      |  uplifting    79            progressive
intense        7             rag         |  wicked       99            hip hop
inspirational  214           christian   |  wistful      121           classic
light          118           soft        |  witty        14            progressive

It also indicates that people use only a limited vocabulary to describe the emotional aspect of a song, regardless of the genre of the given song. Although it seems intuitive and expected that the correlations between audio labels turn out to be low, it is quite surprising that the cluster validations between the audio features and each label are also low. In order to understand why this is the case, a number of post-processing steps are proposed.

4.1 K vs. ARI_HA

In the Feature Clusters section, the number of clusters, K, was chosen based on the elbow method. This K does not necessarily generate optimal validation rates; therefore, a K vs. ARI_HA plot is drawn to find the K that maximizes the validation rates for each set of clusters. Figure 4.1 shows the pattern of ARI_HA for each label class as K changes. It
turns out that the sum of ARI_HA is maximum when K = 5, the maximum number of feature clusters.

[Figure 4.1: K vs. ARI_HA - ARI_HA for each label class (Era, Emotion, Genre, Instrument, Origin) as a function of the number of clusters K; the sum of ARI_HA is maximum when K = 5]

4.2 Hubert-Arabie adjusted Rand Index (revisited)

Using the result from the previous section (K = 5), ARI_HA is re-calculated for each label class and shown in table 4.4.

[Table 4.4: Optimal Cluster Validation - the optimal ARI_HA between the feature clusters (original, and with K = 5) and each label class: Era, Emotion, Genre, Instrument, Origin]
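The sweep over K described in section 4.1 can be sketched with scikit-learn, whose adjusted_rand_score implements the same Hubert-Arabie correction; the feature matrix and label clusterings below are random stand-ins, not the thesis data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                     # stand-in for the 41,269 x 30 matrix
labels = {name: rng.integers(0, 5, size=len(X))    # stand-in label clusterings
          for name in ["era", "emotion", "genre", "instrument", "origin"]}

# sweep K and keep the K whose summed ARI over all label classes is largest
scores = {}
for k in range(2, 11):
    c = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = sum(adjusted_rand_score(y, c) for y in labels.values())
best_k = max(scores, key=scores.get)
```

With the real data, plotting scores against K reproduces the curves of figure 4.1.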
4.3 Cluster Structure Analysis

Now that the optimal K and ARI_HA values are found, the reason for such low cluster validation rates needs to be discussed. To do so, the structure of the clusters must be examined by calculating the Euclidean distance between the centroids of the clusters, shown in table 4.5. Note that the centroids of clusters 1 and 2 have the minimum distance, while those of clusters 3 and 4 have the maximum distance, indicating the most similar and the most dissimilar clusters, respectively.

[Table 4.5: Self-similarity matrix - the Euclidean distances between each pair of cluster centroids]

Neighboring Clusters vs. Distant Clusters

In order to observe the detailed structure of the clusters, the co-occurrence between the feature clusters and the label clusters is calculated, and the four most co-occurring clusters are returned. In other words, for each feature cluster 1 through 5, the four most intersecting clusters from each label class are calculated and shown in figures 4.2 through 4.6. Note that, due to the uneven distribution of songs within each label class, the cluster that contains the largest number of songs, such as 80s in the era label, chill in emotion, hip hop in genre, synth in instrument, and dc in origin, appears frequently across all five feature clusters. In fact, the 80s and chill clusters appear as the most co-occurring clusters with all five feature clusters.
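The "four most co-occurring clusters" computation above reduces to counting intersections; a small sketch with made-up toy assignments (not the thesis data):

```python
from collections import Counter

# hypothetical assignments: feature cluster id and era term for each song
feature_cluster = [1, 1, 1, 2, 2, 3, 3, 3, 3]
era_term        = ["80s", "80s", "90s", "80s", "70s", "60s", "60s", "80s", "90s"]

def top_terms(clusters, terms, n=4):
    """For each feature cluster, return the n most co-occurring label terms."""
    counts = {}
    for c, t in zip(clusters, terms):
        counts.setdefault(c, Counter())[t] += 1
    return {c: [term for term, _ in cnt.most_common(n)] for c, cnt in counts.items()}

print(top_terms(feature_cluster, era_term))
```

Running the same counting once per label class yields the histograms summarized in figures 4.2 through 4.6.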
[Figure 4.2: Co-occurrence between feature clusters and era clusters - the four most co-occurring era clusters for each feature cluster. Cluster 1: 80s, 90s, 00s, 60s; Cluster 2: 80s, 90s, 00s, 70s; Cluster 3: 80s, 20th century, 60s, 90s; Cluster 4: 80s, 90s, 00s, 70s; Cluster 5: 80s, 90s, 00s, 60s]
[Figure 4.3: Co-occurrence between feature clusters and emotion clusters - the four most co-occurring emotion clusters for each feature cluster. Cluster 1: chill, beautiful, romantic, happy; Cluster 2: chill, romantic, beautiful, happy; Cluster 3: chill, romantic, beautiful, mellow; Cluster 4: chill, happy, romantic, sexy; Cluster 5: chill, beautiful, romantic, mellow]
[Figure 4.4: Co-occurrence between feature clusters and genre clusters - the four most co-occurring genre clusters for each feature cluster. Cluster 1: hip hop, wave, smooth, indie; Cluster 2: wave, indie, hip hop, soft; Cluster 3: soundtrack, smooth, classic, indie; Cluster 4: hip hop, wave, techno, progressive; Cluster 5: indie, wave, smooth, soft]
[Figure 4.5: Co-occurrence between feature clusters and instrument clusters - the four most co-occurring instrument clusters for each feature cluster. Cluster 1: synth, guitar, drum, piano; Cluster 2: synth, guitar, piano, drum; Cluster 3: piano, guitar, synth, drum; Cluster 4: synth, drum, guitar, bass; Cluster 5: synth, guitar, piano, drum]
[Figure 4.6: Co-occurrence between feature clusters and origin clusters - the four most co-occurring origin clusters for each feature cluster. Cluster 1: dc, american, german, british; Cluster 2: dc, british, roma, german; Cluster 3: american, roma, german, los angeles; Cluster 4: dc, british, german, roma; Cluster 5: american, british, dc, german]

Knowing that the distance between clusters 1 and 2 is minimum and the distance between clusters 3 and 4 is maximum, it can also be observed from figures 4.2 through 4.6 that the co-occurring terms within clusters 1 and 2 are similar, while those within clusters 3 and 4 are quite dissimilar, as shown in tables 4.6 and 4.7. This indicates that neighboring feature clusters share similar label clusters, while distant feature clusters do not.

Table 4.6: Neighboring Clusters - clusters with the minimum Euclidean distance share similar label clusters:

Cluster 1                              Cluster 2
(80s, 90s, 00s, 60s)                   (80s, 90s, 00s, 70s)
(chill, beautiful, romantic, happy)    (chill, romantic, beautiful, happy)
(hip hop, wave, smooth, indie)         (wave, indie, hip hop, soft)
(synth, guitar, drum, piano)           (synth, guitar, piano, drum)
(dc, american, german, british)        (dc, british, roma, german)
Table 4.7: Distant Clusters - clusters with the maximum Euclidean distance have dissimilar label clusters:

Cluster 3                               Cluster 4
(80s, 20th century, 60s, 90s)           (80s, 90s, 00s, 70s)
(chill, romantic, beautiful, mellow)    (chill, happy, romantic, sexy)
(soundtrack, smooth, classic, indie)    (hip hop, wave, techno, progressive)
(piano, guitar, synth, drum)            (synth, drum, guitar, bass)
(american, roma, german, los angeles)   (dc, british, german, roma)

Correlated Terms vs. Uncorrelated Terms

Considering the opposite case, the author selected the four largest clusters from each label class and calculated their co-occurrence with every feature cluster, as shown in the corresponding figures. In order to observe whether highly correlated label clusters can also be characterized by the feature clusters, table 4.8 summarizes the most correlated terms for the four largest clusters of each label class, whereas table 4.9 shows the least correlated terms for the same clusters. Using the histograms from those figures, a 5-dimensional vector can be created for each term by finding the ratio of each feature cluster (e.g. the vector for the 80s term is (Cluster 1, Cluster 2, Cluster 3, Cluster 4, Cluster 5) = (0.71, 1, 0.196, 0.727, 0.61)). Using the same method, a total of 41 vectors is retrieved, one for every term in tables 4.8 and 4.9, and shown in table 4.10. Using the relationships from tables 4.8 and 4.9 and the vectors in table 4.10, the Euclidean distance between each pair of vectors is calculated and shown in tables 4.11 and 4.12. As the average distances indicate, highly correlated terms share a similar combination of feature clusters, whereas lowly correlated terms do not.
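The comparison of term vectors above is a plain Euclidean distance in the 5-dimensional cluster-ratio space; a sketch using the 80s vector quoted in the text and a made-up second vector for illustration:

```python
import numpy as np

# cluster-ratio vector for the 80s term, as quoted in the text
v_80s = np.array([0.71, 1.0, 0.196, 0.727, 0.61])
# hypothetical vector for a second term (not from the thesis data)
v_other = np.array([0.65, 0.9, 0.25, 0.70, 0.55])

# similar ratio profiles across the five feature clusters give a small distance
dist = float(np.linalg.norm(v_80s - v_other))
print(round(dist, 3))
```

Averaging such distances over all pairs in tables 4.8 and 4.9 gives the summary statistic discussed above.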
More informationAutomatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting
Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting Dalwon Jang 1, Seungjae Lee 2, Jun Seok Lee 2, Minho Jin 1, Jin S. Seo 2, Sunil Lee 1 and Chang D. Yoo 1 1 Korea Advanced
More informationStory Tracking in Video News Broadcasts. Ph.D. Dissertation Jedrzej Miadowicz June 4, 2004
Story Tracking in Video News Broadcasts Ph.D. Dissertation Jedrzej Miadowicz June 4, 2004 Acknowledgements Motivation Modern world is awash in information Coming from multiple sources Around the clock
More informationPerceptual Evaluation of Automatically Extracted Musical Motives
Perceptual Evaluation of Automatically Extracted Musical Motives Oriol Nieto 1, Morwaread M. Farbood 2 Dept. of Music and Performing Arts Professions, New York University, USA 1 oriol@nyu.edu, 2 mfarbood@nyu.edu
More informationA PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES
12th International Society for Music Information Retrieval Conference (ISMIR 2011) A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES Erdem Unal 1 Elaine Chew 2 Panayiotis Georgiou
More informationAutomatic Laughter Detection
Automatic Laughter Detection Mary Knox Final Project (EECS 94) knoxm@eecs.berkeley.edu December 1, 006 1 Introduction Laughter is a powerful cue in communication. It communicates to listeners the emotional
More informationAutomatic Piano Music Transcription
Automatic Piano Music Transcription Jianyu Fan Qiuhan Wang Xin Li Jianyu.Fan.Gr@dartmouth.edu Qiuhan.Wang.Gr@dartmouth.edu Xi.Li.Gr@dartmouth.edu 1. Introduction Writing down the score while listening
More informationQuality of Music Classification Systems: How to build the Reference?
Quality of Music Classification Systems: How to build the Reference? Janto Skowronek, Martin F. McKinney Digital Signal Processing Philips Research Laboratories Eindhoven {janto.skowronek,martin.mckinney}@philips.com
More informationMUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES
MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES Jun Wu, Yu Kitano, Stanislaw Andrzej Raczynski, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono and Shigeki Sagayama The Graduate
More informationHidden Markov Model based dance recognition
Hidden Markov Model based dance recognition Dragutin Hrenek, Nenad Mikša, Robert Perica, Pavle Prentašić and Boris Trubić University of Zagreb, Faculty of Electrical Engineering and Computing Unska 3,
More informationMusic Source Separation
Music Source Separation Hao-Wei Tseng Electrical and Engineering System University of Michigan Ann Arbor, Michigan Email: blakesen@umich.edu Abstract In popular music, a cover version or cover song, or
More informationCan Song Lyrics Predict Genre? Danny Diekroeger Stanford University
Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University danny1@stanford.edu 1. Motivation and Goal Music has long been a way for people to express their emotions. And because we all have a
More informationRobert Alexandru Dobre, Cristian Negrescu
ECAI 2016 - International Conference 8th Edition Electronics, Computers and Artificial Intelligence 30 June -02 July, 2016, Ploiesti, ROMÂNIA Automatic Music Transcription Software Based on Constant Q
More informationEvaluating Melodic Encodings for Use in Cover Song Identification
Evaluating Melodic Encodings for Use in Cover Song Identification David D. Wickland wickland@uoguelph.ca David A. Calvert dcalvert@uoguelph.ca James Harley jharley@uoguelph.ca ABSTRACT Cover song identification
More informationTranscription of the Singing Melody in Polyphonic Music
Transcription of the Singing Melody in Polyphonic Music Matti Ryynänen and Anssi Klapuri Institute of Signal Processing, Tampere University Of Technology P.O.Box 553, FI-33101 Tampere, Finland {matti.ryynanen,
More informationMUSICAL MOODS: A MASS PARTICIPATION EXPERIMENT FOR AFFECTIVE CLASSIFICATION OF MUSIC
12th International Society for Music Information Retrieval Conference (ISMIR 2011) MUSICAL MOODS: A MASS PARTICIPATION EXPERIMENT FOR AFFECTIVE CLASSIFICATION OF MUSIC Sam Davies, Penelope Allen, Mark
More informationExploring Relationships between Audio Features and Emotion in Music
Exploring Relationships between Audio Features and Emotion in Music Cyril Laurier, *1 Olivier Lartillot, #2 Tuomas Eerola #3, Petri Toiviainen #4 * Music Technology Group, Universitat Pompeu Fabra, Barcelona,
More informationMusic Information Retrieval
Music Information Retrieval Automatic genre classification from acoustic features DANIEL RÖNNOW and THEODOR TWETMAN Bachelor of Science Thesis Stockholm, Sweden 2012 Music Information Retrieval Automatic
More informationA Survey of Audio-Based Music Classification and Annotation
A Survey of Audio-Based Music Classification and Annotation Zhouyu Fu, Guojun Lu, Kai Ming Ting, and Dengsheng Zhang IEEE Trans. on Multimedia, vol. 13, no. 2, April 2011 presenter: Yin-Tzu Lin ( 阿孜孜 ^.^)
More informationUnifying Low-level and High-level Music. Similarity Measures
Unifying Low-level and High-level Music 1 Similarity Measures Dmitry Bogdanov, Joan Serrà, Nicolas Wack, Perfecto Herrera, and Xavier Serra Abstract Measuring music similarity is essential for multimedia
More information13 Matching questions
Musical Genres NAME 13 Matching questions 1. jazz A. F. 2. pop 3. country 4. blues 5. hip hop B. G. 6. rap 7. reggae 8. heavy metal C. H. 9. classical 10. electronic 11. folk 12. dance D. I. 13. rock and
More informationHIT SONG SCIENCE IS NOT YET A SCIENCE
HIT SONG SCIENCE IS NOT YET A SCIENCE François Pachet Sony CSL pachet@csl.sony.fr Pierre Roy Sony CSL roy@csl.sony.fr ABSTRACT We describe a large-scale experiment aiming at validating the hypothesis that
More informationAn action based metaphor for description of expression in music performance
An action based metaphor for description of expression in music performance Luca Mion CSC-SMC, Centro di Sonologia Computazionale Department of Information Engineering University of Padova Workshop Toni
More informationResearch Article. ISSN (Print) *Corresponding author Shireen Fathima
Scholars Journal of Engineering and Technology (SJET) Sch. J. Eng. Tech., 2014; 2(4C):613-620 Scholars Academic and Scientific Publisher (An International Publisher for Academic and Scientific Resources)
More informationMusic Information Retrieval for Jazz
Music Information Retrieval for Jazz Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA {dpwe,thierry}@ee.columbia.edu http://labrosa.ee.columbia.edu/
More informationDeep Neural Networks Scanning for patterns (aka convolutional networks) Bhiksha Raj
Deep Neural Networks Scanning for patterns (aka convolutional networks) Bhiksha Raj 1 Story so far MLPs are universal function approximators Boolean functions, classifiers, and regressions MLPs can be
More informationSkip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video
Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video Mohamed Hassan, Taha Landolsi, Husameldin Mukhtar, and Tamer Shanableh College of Engineering American
More information6.UAP Project. FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System. Daryl Neubieser. May 12, 2016
6.UAP Project FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System Daryl Neubieser May 12, 2016 Abstract: This paper describes my implementation of a variable-speed accompaniment system that
More informationBi-Modal Music Emotion Recognition: Novel Lyrical Features and Dataset
Bi-Modal Music Emotion Recognition: Novel Lyrical Features and Dataset Ricardo Malheiro, Renato Panda, Paulo Gomes, Rui Paiva CISUC Centre for Informatics and Systems of the University of Coimbra {rsmal,
More informationDAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes
DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring 2009 Week 6 Class Notes Pitch Perception Introduction Pitch may be described as that attribute of auditory sensation in terms
More informationMidiFind: Fast and Effec/ve Similarity Searching in Large MIDI Databases
1 MidiFind: Fast and Effec/ve Similarity Searching in Large MIDI Databases Gus Xia Tongbo Huang Yifei Ma Roger B. Dannenberg Christos Faloutsos Schools of Computer Science Carnegie Mellon University 2
More informationMelody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng
Melody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng Introduction In this project we were interested in extracting the melody from generic audio files. Due to the
More informationTopic 10. Multi-pitch Analysis
Topic 10 Multi-pitch Analysis What is pitch? Common elements of music are pitch, rhythm, dynamics, and the sonic qualities of timbre and texture. An auditory perceptual attribute in terms of which sounds
More informationAutomatic Labelling of tabla signals
ISMIR 2003 Oct. 27th 30th 2003 Baltimore (USA) Automatic Labelling of tabla signals Olivier K. GILLET, Gaël RICHARD Introduction Exponential growth of available digital information need for Indexing and
More informationSIGNAL + CONTEXT = BETTER CLASSIFICATION
SIGNAL + CONTEXT = BETTER CLASSIFICATION Jean-Julien Aucouturier Grad. School of Arts and Sciences The University of Tokyo, Japan François Pachet, Pierre Roy, Anthony Beurivé SONY CSL Paris 6 rue Amyot,
More information