Release Year Prediction for Songs

[CSE 258 Assignment 2]

Ruyu Tan, University of California San Diego
Jiaying Liu, University of California San Diego

ABSTRACT

In this assignment, we study a subset of the Million Song Dataset from the UCI Machine Learning Repository to find a suitable model for predicting a song's release year from selected features: Timbre Average and Timbre Covariance. We perform exploratory analysis on the data set, its label and its features, and then apply Linear Regression, Ridge Regression, LASSO Regression and Random Forest models to the prediction task. Mean Absolute Error (MAE) is chosen to measure model accuracy. Our results indicate that the Linear Regression model achieves the lowest MAE.

Keywords

Release year; Songs; Linear regression; Ridge regression; Lasso regression; Random forest; Mean absolute error

1. INTRODUCTION

The Million Song Dataset is a well-known, freely available collection of audio features and metadata for a million contemporary popular music tracks. Its accompanying description also covers its creation process, its content and its possible uses. The Million Song Dataset offers many attractive features. Here we focus on the Timbre Average and Timbre Covariance features and use them to predict the release year, since such a study may have practical applications in music recommendation.

We define year prediction as estimating the year in which a song was released based on its audio features (although metadata features such as artist name or similar-artist tags would certainly be informative). Listeners often have particular affection for music from certain periods of their lives (such as high school or college), so predicting the release year could be useful. Moreover, a successful model of the variation in music audio characteristics through the years could throw light on the long-term evolution of popular music.
In practice, release year prediction is hard to address specifically, because it requires a large music collection spanning both a wide range of genres (at least within western pop) and a long period of time.

2. DATA SET DESCRIPTION AND ANALYSIS

The data set used for this project is YearPredictionMSD from the UCI Machine Learning Repository[2], which is a subset of the Million Song Dataset[1]. The songs are mostly western, commercial tracks ranging from 1922 to 2011, with a peak in the 2000s.

Before building our model to predict the release year of a song, we take to heart the wise maxim: "Essentially, all models are wrong, but some are useful." It reminds us that we need to learn more about the data set we are studying in order to develop a more useful, though not necessarily correct, release year prediction model.

2.1 Data Set Description

The data set consists of 515,345 data entries in total and is split into a train set and a test set (see footnote 1). The train data set contains 463,715 entries while the test data set contains 51,630 entries. The details are shown in Table 1.

Number of entries in total data set: 515,345
Number of entries in train data set: 463,715
Number of entries in test data set: 51,630
Table 1: data set basic information

Each data entry consists of 91 attributes: the release year followed by MFCC-like features (see footnote 2) represented as a numerical vector, as shown in Table 2.

Index    Description
0        Release Year
1-12     Timbre Average
13-90    Timbre Covariance
Table 2: data entry description

2.2 Exploratory Analysis

Having seen the content of the data set, we now compute basic statistics to understand it further. We then carry out an exploratory analysis of the label and the features.

2.2.1 Label: Release Year

Release Year is the predictive target variable.
The Release Year in the train data set ranges from 1922 to 2011. Its minimum, maximum, mean, median, mode (the most frequent release year) and standard deviation are summarized in Table 3.

Table 3: Statistics of Release Year (min, max, mean, median, mode, standard deviation)

The histogram of release year in Figure 1 shows that the distribution peaks in the 2000s, increasing gradually before the peak and falling rapidly afterwards.

Figure 1: Histogram of Release Year

Footnotes:
1. The split strategy avoids the producer effect by making sure no song from a given artist ends up in both the train and test set.
2. MFCC is the abbreviation for Mel Frequency Cepstral Coefficient.

2.2.2 Features: Timbre Average and Timbre Covariance

Each data entry contains 90 features, of which the first 12 stand for the timbre average and the remaining 78 represent the timbre covariance. Sample features are shown in Table 4.

We study the Timbre Average features with the violin plots shown in Figure 2. Furthermore, we calculate the covariance matrix of the standardized first 12 attributes, shown in Table 5, which indicates that the 12 Timbre Average features are not strongly correlated.

Figure 2: violin plot of timbre average

To visualize the Timbre Average features, we apply principal component analysis to the first 12 attributes in the train data set. The first and second principal components explain 50.22% and 23.38% of the variance, respectively. Figure 3 is the scatter plot of the first principal component versus the second principal component of the Timbre Average features.

Figure 3: 1st principal component versus 2nd principal component of timbre average

2.2.3 Label versus Features

To visualize the relationship between the release year and the 1st and 2nd principal components of the Timbre Average features, we draw the scatter plot shown in Figure 4.
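The covariance and principal component computations described in this exploratory analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the tables and figures: it uses NumPy, the function names `standardized_covariance` and `pca` are hypothetical, the input is synthetic random data standing in for the 12 Timbre Average columns, and it assumes the PCA is run on centered but unscaled features, which the text does not specify.

```python
import numpy as np


def standardized_covariance(X):
    """Covariance matrix of the column-standardized data (a correlation matrix)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return (Z.T @ Z) / len(Z)


def pca(X, k=2):
    """Return the explained-variance ratios and scores of the first k components."""
    Xc = X - X.mean(axis=0)  # center only; scaling is an assumption left out here
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)  # share of total variance per component
    scores = Xc @ Vt[:k].T  # projection onto the first k principal axes
    return ratio[:k], scores


# Synthetic stand-in for the 12 timbre-average columns (not the real data).
rng = np.random.default_rng(0)
timbre_avg = rng.normal(size=(1000, 12)) * np.linspace(5.0, 1.0, 12)

cov = standardized_covariance(timbre_avg)
ratio, scores = pca(timbre_avg, k=2)
print("explained variance ratios:", ratio)
```

Off-diagonal entries of `cov` close to zero correspond to weakly correlated features, and the two columns of `scores` are what a scatter plot of the first versus the second principal component would display.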

Table 4: sample features in data set (columns X1 ... X12: Timbre Average; columns X13 ... X90: Timbre Covariance)

Table 5: covariance of standardized timbre average (rows and columns X1 ... X12)

Figure 4: release year versus 1st and 2nd principal components of timbre average

3. PREDICTIVE TASK IDENTIFICATION

The predictive task is to predict the release year from the timbre average and timbre covariance features, applying several candidate models to this data set. The criterion we use to measure model accuracy is the Mean Absolute Error (MAE); we consider the model with the smallest MAE to be the best model.

4. MODEL SELECTION

Based on the description above, we aim to predict the release year for the songs in the test set. We use five models for this predictive task.

4.1 Baseline Model

In the baseline model, we simply use the average release year in the train data set as the prediction for every entry in the test data set:

$$\hat{y} = \bar{y}$$

4.2 Linear Regression Model

Linear regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory (or independent) variables denoted X. A simple linear model can be represented as

$$y = X\beta$$

where β is estimated by least squares:

$$\min_{\beta} \|X\beta - y\|_2^2$$

4.3 Ridge Regression Model

Ridge regression penalizes the size of the regression coefficients in the linear model. A ridge regression model is of the form:

$$\arg\min_{\beta} \|X\beta - y\|_2^2 + \lambda \|\beta\|_2^2$$

with the solution:

$$\hat{\beta}^{ridge} = (X^T X + \lambda I)^{-1} X^T y$$

4.4 LASSO

Lasso (least absolute shrinkage and selection operator) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces. It takes the form:

$$\min_{\beta} \frac{1}{2N} \|X\beta - y\|_2^2 + \lambda \|\beta\|_1$$

4.5 Random Forest Model

Random forest is a meta estimator that fits a number of decision trees on various sub-samples of the data set and averages their predictions, which helps to improve predictive accuracy and control over-fitting. Random decision forests correct for decision trees' habit of overfitting to their training set. Here, we use the RandomForestRegressor provided by the sklearn package in Python.

5. LITERATURE AND RESEARCH

The UCSD Data Science Student Society (github.io/msd-fp-p1/) undertook an exploratory analysis and built a year prediction model on the Million Song Dataset[1]. They explained the peak in song count at the year 2007 as follows: "We see from the distribution of the number of songs that over 50% of the songs in the dataset are from the year increment. From looking further into the advancements in technology over time, we observed that the increase in the development of technology used to play and share music such as mp3 players, iPods and iPhones during this time period can explain this trend."

In their feature analysis, they calculated the correlations between features from both the extended year data and the UCI subset. The values were visualized with a heatmap, from which they concluded that the most highly correlated features are: Hotness vs. familiarity, Loudness vs. familiarity, Artist tag length vs. Artist familiarity, Artist tag length vs. Artist hotness, Pitch averages for the 12 segments and Timbre averages for the 12 segments. Finally, Ridge and LASSO regression were used to predict the release year with the Mean Square Error criterion.

Matthew Moocarme also studied the subset of the Million Song Dataset from the UCI Machine Learning Repository. The innovation of his work was to use Spark to predict the year of a song's release. Using the Root Mean Square Error (RMSE) to measure prediction accuracy, he concluded that there is not much correlation between the features, and that the linear regression model performs reasonably well, since its RMSE beats the baseline by almost 7 years.

In addition, a similar study used Apache Spark for song year prediction[4], and the UCI subset has been discussed in books[3].

6. RESULT ANALYSIS

Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are two common measures of model accuracy. Mean Absolute Error is defined as follows:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$$

and Root Mean Square Error is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

The criterion for this predictive task is the Mean Absolute Error (MAE). We build linear regression, ridge regression, LASSO regression and random forest models on the train data set in turn, and apply these models to predict the release year on the test data set. For ridge and LASSO regression, we try a range of λ values starting from 0.01 and plot the MAE versus λ in Figure 5, from which we find that the MAE increases as λ increases. This result suggests that there is not much correlation among the 90 attributes of timbre average and timbre covariance.

Figure 5: mean absolute error versus lambda for ridge and lasso regression models

Next, we gather all the models and their MAEs in Table 6. In terms of MAE, the baseline model is the worst and the simple linear regression model is the best. The baseline model is the worst because it predicts the average release year of the train data set for every song in the test data set and ignores the features. The linear regression model might beat the others because it uses all the timbre average and timbre covariance information for release year prediction.

Table 6: mean absolute errors of models (rows: Baseline Model, Linear Regression, Ridge Regression and Lasso Regression for each value of λ, Random Forest)

Looking more closely at the linear regression model, we compute the absolute difference between the estimated and the actual release year, |ŷ - y|, to see how good our predictions are. The results are grouped into 5 parts: < 1 year, 1-3 years, 3-5 years, 5-10 years and > 10 years. The best predictions (< 1 year) account for 10.6% of all results, while the worst (> 10 years) account for 20.0%. The largest group is 5-10 years, with 29.6%. The details are shown in the pie chart in Figure 6. Compared to the baseline model, the simple linear regression model improves the MAE.

Figure 6: pie chart of prediction error for linear regression model

7. REFERENCES

[1] T. Bertin-Mahieux. Million song dataset.
[2] M. Lichman. UCI machine learning repository.
[3] W. W. Piegorsch. Statistical Data Analytics: Foundations for Data Mining, Informatics, and Knowledge Discovery, page 364. John Wiley & Sons, illustrated edition.
[4] A. K. Prakhar Mishra, Ratika Garg. Song year prediction using Apache Spark. IEEE, Sept.
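As a supplement to Sections 4 and 6, the baseline predictor, the closed-form ridge solution, the two error measures and the five error groups can be sketched as follows. This is a minimal sketch under stated assumptions rather than the code that produced the reported results: it uses NumPy on synthetic data, all function names (`fit_ridge`, `predict`, `mae`, `rmse`, `error_buckets`) are hypothetical, an unpenalized intercept column is added (the formula in Section 4.3 has no intercept), and the group boundaries at exactly 1, 3, 5 and 10 years are assigned to the upper group by assumption. LASSO and random forest are left out because they have no closed-form solution.

```python
import numpy as np


def fit_ridge(X, y, lam=0.0):
    """Closed-form ridge fit; lam=0 reduces to ordinary least squares."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # assumed intercept column
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # assumption: the intercept is not penalized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(beta, X):
    return beta[0] + X @ beta[1:]


def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def error_buckets(y, y_hat):
    """Shares of absolute errors in <1, 1-3, 3-5, 5-10 and >=10 years."""
    idx = np.digitize(np.abs(y - y_hat), [1.0, 3.0, 5.0, 10.0])
    return np.bincount(idx, minlength=5) / len(y)


# Synthetic stand-in data (not the YearPredictionMSD features).
rng = np.random.default_rng(0)
w = np.array([3.0, -2.0, 1.0, 0.5, 0.0])
X_train, X_test = rng.normal(size=(2000, 5)), rng.normal(size=(500, 5))
y_train = 1998.0 + X_train @ w + rng.normal(scale=2.0, size=2000)
y_test = 1998.0 + X_test @ w + rng.normal(scale=2.0, size=500)

baseline_pred = np.full_like(y_test, y_train.mean())  # y_hat = mean of train labels
beta_ols = fit_ridge(X_train, y_train, lam=0.0)
beta_ridge = fit_ridge(X_train, y_train, lam=10.0)

results = {
    "baseline": mae(y_test, baseline_pred),
    "linear": mae(y_test, predict(beta_ols, X_test)),
    "ridge": mae(y_test, predict(beta_ridge, X_test)),
}
shares = error_buckets(y_test, predict(beta_ols, X_test))
print(results)
print(shares)
```

On real data, the same functions would be called with the 90 feature columns as `X` and the release year as `y`, and `fit_ridge` would be repeated over a grid of `lam` values to trace a curve of MAE versus λ.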

Music Genre Classification and Variance Comparison on Number of Genres

Music Genre Classification and Variance Comparison on Number of Genres Music Genre Classification and Variance Comparison on Number of Genres Miguel Francisco, miguelf@stanford.edu Dong Myung Kim, dmk8265@stanford.edu 1 Abstract In this project we apply machine learning techniques

More information

Music Genre Classification

Music Genre Classification Music Genre Classification chunya25 Fall 2017 1 Introduction A genre is defined as a category of artistic composition, characterized by similarities in form, style, or subject matter. [1] Some researchers

More information

The Million Song Dataset

The Million Song Dataset The Million Song Dataset AUDIO FEATURES The Million Song Dataset There is no data like more data Bob Mercer of IBM (1985). T. Bertin-Mahieux, D.P.W. Ellis, B. Whitman, P. Lamere, The Million Song Dataset,

More information

WHAT MAKES FOR A HIT POP SONG? WHAT MAKES FOR A POP SONG?

WHAT MAKES FOR A HIT POP SONG? WHAT MAKES FOR A POP SONG? WHAT MAKES FOR A HIT POP SONG? WHAT MAKES FOR A POP SONG? NICHOLAS BORG AND GEORGE HOKKANEN Abstract. The possibility of a hit song prediction algorithm is both academically interesting and industry motivated.

More information

Singer Traits Identification using Deep Neural Network

Singer Traits Identification using Deep Neural Network Singer Traits Identification using Deep Neural Network Zhengshan Shi Center for Computer Research in Music and Acoustics Stanford University kittyshi@stanford.edu Abstract The author investigates automatic

More information

Automatic Music Genre Classification

Automatic Music Genre Classification Automatic Music Genre Classification Nathan YongHoon Kwon, SUNY Binghamton Ingrid Tchakoua, Jackson State University Matthew Pietrosanu, University of Alberta Freya Fu, Colorado State University Yue Wang,

More information

WHAT'S HOT: LINEAR POPULARITY PREDICTION FROM TV AND SOCIAL USAGE DATA Jan Neumann, Xiaodong Yu, and Mohamad Ali Torkamani Comcast Labs

WHAT'S HOT: LINEAR POPULARITY PREDICTION FROM TV AND SOCIAL USAGE DATA Jan Neumann, Xiaodong Yu, and Mohamad Ali Torkamani Comcast Labs WHAT'S HOT: LINEAR POPULARITY PREDICTION FROM TV AND SOCIAL USAGE DATA Jan Neumann, Xiaodong Yu, and Mohamad Ali Torkamani Comcast Labs Abstract Large numbers of TV channels are available to TV consumers

More information

Singer Recognition and Modeling Singer Error

Singer Recognition and Modeling Singer Error Singer Recognition and Modeling Singer Error Johan Ismael Stanford University jismael@stanford.edu Nicholas McGee Stanford University ndmcgee@stanford.edu 1. Abstract We propose a system for recognizing

More information

PICK THE RIGHT TEAM AND MAKE A BLOCKBUSTER A SOCIAL ANALYSIS THROUGH MOVIE HISTORY

PICK THE RIGHT TEAM AND MAKE A BLOCKBUSTER A SOCIAL ANALYSIS THROUGH MOVIE HISTORY PICK THE RIGHT TEAM AND MAKE A BLOCKBUSTER A SOCIAL ANALYSIS THROUGH MOVIE HISTORY THE CHALLENGE: TO UNDERSTAND HOW TEAMS CAN WORK BETTER SOCIAL NETWORK + MACHINE LEARNING TO THE RESCUE Previous research:

More information

A combination of approaches to solve Task How Many Ratings? of the KDD CUP 2007

A combination of approaches to solve Task How Many Ratings? of the KDD CUP 2007 A combination of approaches to solve Tas How Many Ratings? of the KDD CUP 2007 Jorge Sueiras C/ Arequipa +34 9 382 45 54 orge.sueiras@neo-metrics.com Daniel Vélez C/ Arequipa +34 9 382 45 54 José Luis

More information

NETFLIX MOVIE RATING ANALYSIS

NETFLIX MOVIE RATING ANALYSIS NETFLIX MOVIE RATING ANALYSIS Danny Dean EXECUTIVE SUMMARY Perhaps only a few us have wondered whether or not the number words in a movie s title could be linked to its success. You may question the relevance

More information

More About Regression

More About Regression Regression Line for the Sample Chapter 14 More About Regression is spoken as y-hat, and it is also referred to either as predicted y or estimated y. b 0 is the intercept of the straight line. The intercept

More information

Chapter 27. Inferences for Regression. Remembering Regression. An Example: Body Fat and Waist Size. Remembering Regression (cont.)

Chapter 27. Inferences for Regression. Remembering Regression. An Example: Body Fat and Waist Size. Remembering Regression (cont.) Chapter 27 Inferences for Regression Copyright 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide 27-1 Copyright 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley An

More information

Detecting Musical Key with Supervised Learning

Detecting Musical Key with Supervised Learning Detecting Musical Key with Supervised Learning Robert Mahieu Department of Electrical Engineering Stanford University rmahieu@stanford.edu Abstract This paper proposes and tests performance of two different

More information

Perceptual dimensions of short audio clips and corresponding timbre features

Perceptual dimensions of short audio clips and corresponding timbre features Perceptual dimensions of short audio clips and corresponding timbre features Jason Musil, Budr El-Nusairi, Daniel Müllensiefen Department of Psychology, Goldsmiths, University of London Question How do

More information

Lyrics Classification using Naive Bayes

Lyrics Classification using Naive Bayes Lyrics Classification using Naive Bayes Dalibor Bužić *, Jasminka Dobša ** * College for Information Technologies, Klaićeva 7, Zagreb, Croatia ** Faculty of Organization and Informatics, Pavlinska 2, Varaždin,

More information

Composer Identification of Digital Audio Modeling Content Specific Features Through Markov Models

Composer Identification of Digital Audio Modeling Content Specific Features Through Markov Models Composer Identification of Digital Audio Modeling Content Specific Features Through Markov Models Aric Bartle (abartle@stanford.edu) December 14, 2012 1 Background The field of composer recognition has

More information

Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes

Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes hello Jay Biernat Third author University of Rochester University of Rochester Affiliation3 words jbiernat@ur.rochester.edu author3@ismir.edu

More information

Outline. Why do we classify? Audio Classification

Outline. Why do we classify? Audio Classification Outline Introduction Music Information Retrieval Classification Process Steps Pitch Histograms Multiple Pitch Detection Algorithm Musical Genre Classification Implementation Future Work Why do we classify

More information

Automatic Music Clustering using Audio Attributes

Automatic Music Clustering using Audio Attributes Automatic Music Clustering using Audio Attributes Abhishek Sen BTech (Electronics) Veermata Jijabai Technological Institute (VJTI), Mumbai, India abhishekpsen@gmail.com Abstract Music brings people together,

More information

MUSI-6201 Computational Music Analysis

MUSI-6201 Computational Music Analysis MUSI-6201 Computational Music Analysis Part 9.1: Genre Classification alexander lerch November 4, 2015 temporal analysis overview text book Chapter 8: Musical Genre, Similarity, and Mood (pp. 151 155)

More information

What is Statistics? 13.1 What is Statistics? Statistics

What is Statistics? 13.1 What is Statistics? Statistics 13.1 What is Statistics? What is Statistics? The collection of all outcomes, responses, measurements, or counts that are of interest. A portion or subset of the population. Statistics Is the science of

More information

Music Recommendation from Song Sets

Music Recommendation from Song Sets Music Recommendation from Song Sets Beth Logan Cambridge Research Laboratory HP Laboratories Cambridge HPL-2004-148 August 30, 2004* E-mail: Beth.Logan@hp.com music analysis, information retrieval, multimedia

More information

Improving Frame Based Automatic Laughter Detection

Improving Frame Based Automatic Laughter Detection Improving Frame Based Automatic Laughter Detection Mary Knox EE225D Class Project knoxm@eecs.berkeley.edu December 13, 2007 Abstract Laughter recognition is an underexplored area of research. My goal for

More information

STAT 113: Statistics and Society Ellen Gundlach, Purdue University. (Chapters refer to Moore and Notz, Statistics: Concepts and Controversies, 8e)

STAT 113: Statistics and Society Ellen Gundlach, Purdue University. (Chapters refer to Moore and Notz, Statistics: Concepts and Controversies, 8e) STAT 113: Statistics and Society Ellen Gundlach, Purdue University (Chapters refer to Moore and Notz, Statistics: Concepts and Controversies, 8e) Learning Objectives for Exam 1: Unit 1, Part 1: Population

More information

Supervised Learning in Genre Classification

Supervised Learning in Genre Classification Supervised Learning in Genre Classification Introduction & Motivation Mohit Rajani and Luke Ekkizogloy {i.mohit,luke.ekkizogloy}@gmail.com Stanford University, CS229: Machine Learning, 2009 Now that music

More information

DOES MOVIE SOUNDTRACK MATTER? THE ROLE OF SOUNDTRACK IN PREDICTING MOVIE REVENUE

DOES MOVIE SOUNDTRACK MATTER? THE ROLE OF SOUNDTRACK IN PREDICTING MOVIE REVENUE DOES MOVIE SOUNDTRACK MATTER? THE ROLE OF SOUNDTRACK IN PREDICTING MOVIE REVENUE Haifeng Xu, Department of Information Systems, National University of Singapore, Singapore, xu-haif@comp.nus.edu.sg Nadee

More information

Recognising Cello Performers Using Timbre Models

Recognising Cello Performers Using Timbre Models Recognising Cello Performers Using Timbre Models Magdalena Chudy and Simon Dixon Abstract In this paper, we compare timbre features of various cello performers playing the same instrument in solo cello

More information

ABSOLUTE OR RELATIVE? A NEW APPROACH TO BUILDING FEATURE VECTORS FOR EMOTION TRACKING IN MUSIC

ABSOLUTE OR RELATIVE? A NEW APPROACH TO BUILDING FEATURE VECTORS FOR EMOTION TRACKING IN MUSIC ABSOLUTE OR RELATIVE? A NEW APPROACH TO BUILDING FEATURE VECTORS FOR EMOTION TRACKING IN MUSIC Vaiva Imbrasaitė, Peter Robinson Computer Laboratory, University of Cambridge, UK Vaiva.Imbrasaite@cl.cam.ac.uk

More information

Feature-Based Analysis of Haydn String Quartets

Feature-Based Analysis of Haydn String Quartets Feature-Based Analysis of Haydn String Quartets Lawson Wong 5/5/2 Introduction When listening to multi-movement works, amateur listeners have almost certainly asked the following situation : Am I still

More information

Using Genre Classification to Make Content-based Music Recommendations

Using Genre Classification to Make Content-based Music Recommendations Using Genre Classification to Make Content-based Music Recommendations Robbie Jones (rmjones@stanford.edu) and Karen Lu (karenlu@stanford.edu) CS 221, Autumn 2016 Stanford University I. Introduction Our

More information

Libraries as Repositories of Popular Culture: Is Popular Culture Still Forgotten?

Libraries as Repositories of Popular Culture: Is Popular Culture Still Forgotten? Wayne State University School of Library and Information Science Faculty Research Publications School of Library and Information Science 1-1-2007 Libraries as Repositories of Popular Culture: Is Popular

More information

Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video

Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video Mohamed Hassan, Taha Landolsi, Husameldin Mukhtar, and Tamer Shanableh College of Engineering American

More information

CS229 Project Report Polyphonic Piano Transcription

CS229 Project Report Polyphonic Piano Transcription CS229 Project Report Polyphonic Piano Transcription Mohammad Sadegh Ebrahimi Stanford University Jean-Baptiste Boin Stanford University sadegh@stanford.edu jbboin@stanford.edu 1. Introduction In this project

More information

Creating a Feature Vector to Identify Similarity between MIDI Files

Creating a Feature Vector to Identify Similarity between MIDI Files Creating a Feature Vector to Identify Similarity between MIDI Files Joseph Stroud 2017 Honors Thesis Advised by Sergio Alvarez Computer Science Department, Boston College 1 Abstract Today there are many

More information

The Great Beauty: Public Subsidies in the Italian Movie Industry

The Great Beauty: Public Subsidies in the Italian Movie Industry The Great Beauty: Public Subsidies in the Italian Movie Industry G. Meloni, D. Paolini,M.Pulina April 20, 2015 Abstract The aim of this paper to examine the impact of public subsidies on the Italian movie

More information

APPLICATION OF MULTI-GENERATIONAL MODELS IN LCD TV DIFFUSIONS

APPLICATION OF MULTI-GENERATIONAL MODELS IN LCD TV DIFFUSIONS APPLICATION OF MULTI-GENERATIONAL MODELS IN LCD TV DIFFUSIONS BI-HUEI TSAI Professor of Department of Management Science, National Chiao Tung University, Hsinchu 300, Taiwan Email: bhtsai@faculty.nctu.edu.tw

More information

NAA ENHANCING THE QUALITY OF MARKING PROJECT: THE EFFECT OF SAMPLE SIZE ON INCREASED PRECISION IN DETECTING ERRANT MARKING

NAA ENHANCING THE QUALITY OF MARKING PROJECT: THE EFFECT OF SAMPLE SIZE ON INCREASED PRECISION IN DETECTING ERRANT MARKING NAA ENHANCING THE QUALITY OF MARKING PROJECT: THE EFFECT OF SAMPLE SIZE ON INCREASED PRECISION IN DETECTING ERRANT MARKING Mudhaffar Al-Bayatti and Ben Jones February 00 This report was commissioned by

More information

Chapter 5. Describing Distributions Numerically. Finding the Center: The Median. Spread: Home on the Range. Finding the Center: The Median (cont.

Chapter 5. Describing Distributions Numerically. Finding the Center: The Median. Spread: Home on the Range. Finding the Center: The Median (cont. Chapter 5 Describing Distributions Numerically Copyright 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Copyright 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide

More information

Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University

Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University danny1@stanford.edu 1. Motivation and Goal Music has long been a way for people to express their emotions. And because we all have a

More information

Music Similarity and Cover Song Identification: The Case of Jazz

Music Similarity and Cover Song Identification: The Case of Jazz Music Similarity and Cover Song Identification: The Case of Jazz Simon Dixon and Peter Foster s.e.dixon@qmul.ac.uk Centre for Digital Music School of Electronic Engineering and Computer Science Queen Mary

More information

EVALUATING THE IMPACT FACTOR: A CITATION STUDY FOR INFORMATION TECHNOLOGY JOURNALS

EVALUATING THE IMPACT FACTOR: A CITATION STUDY FOR INFORMATION TECHNOLOGY JOURNALS EVALUATING THE IMPACT FACTOR: A CITATION STUDY FOR INFORMATION TECHNOLOGY JOURNALS Ms. Kara J. Gust, Michigan State University, gustk@msu.edu ABSTRACT Throughout the course of scholarly communication,

More information

Effects of acoustic degradations on cover song recognition

Effects of acoustic degradations on cover song recognition Signal Processing in Acoustics: Paper 68 Effects of acoustic degradations on cover song recognition Julien Osmalskyj (a), Jean-Jacques Embrechts (b) (a) University of Liège, Belgium, josmalsky@ulg.ac.be

More information

MATH 214 (NOTES) Math 214 Al Nosedal. Department of Mathematics Indiana University of Pennsylvania. MATH 214 (NOTES) p. 1/3

MATH 214 (NOTES) Math 214 Al Nosedal. Department of Mathematics Indiana University of Pennsylvania. MATH 214 (NOTES) p. 1/3 MATH 214 (NOTES) Math 214 Al Nosedal Department of Mathematics Indiana University of Pennsylvania MATH 214 (NOTES) p. 1/3 CHAPTER 1 DATA AND STATISTICS MATH 214 (NOTES) p. 2/3 Definitions. Statistics is

More information

Resampling Statistics. Conventional Statistics. Resampling Statistics

Resampling Statistics. Conventional Statistics. Resampling Statistics Resampling Statistics Introduction to Resampling Probability Modeling Resample add-in Bootstrapping values, vectors, matrices R boot package Conclusions Conventional Statistics Assumptions of conventional

More information

Neural Network for Music Instrument Identi cation

Neural Network for Music Instrument Identi cation Neural Network for Music Instrument Identi cation Zhiwen Zhang(MSE), Hanze Tu(CCRMA), Yuan Li(CCRMA) SUN ID: zhiwen, hanze, yuanli92 Abstract - In the context of music, instrument identi cation would contribute

More information

Recognising Cello Performers using Timbre Models

Recognising Cello Performers using Timbre Models Recognising Cello Performers using Timbre Models Chudy, Magdalena; Dixon, Simon For additional information about this publication click this link. http://qmro.qmul.ac.uk/jspui/handle/123456789/5013 Information

More information

UC San Diego UC San Diego Previously Published Works

UC San Diego UC San Diego Previously Published Works UC San Diego UC San Diego Previously Published Works Title Classification of MPEG-2 Transport Stream Packet Loss Visibility Permalink https://escholarship.org/uc/item/9wk791h Authors Shin, J Cosman, P

More information

Automatic Laughter Detection
