Automatic Music Genre Classification Nathan YongHoon Kwon, SUNY Binghamton Ingrid Tchakoua, Jackson State University Matthew Pietrosanu, University of Alberta Freya Fu, Colorado State University Yue Wang, University of Alberta Victoria Sabo, Georgetown University Group Mentor: Sercan Yildiz
Outline Introduction Background Previous work General approach Methods Results Conclusions Acknowledgments https://musicmachinery.com/2013/09/22/5025/
Context/motivation Automatic music genre classification Pandora's Music Genome Project and Spotify's Echo Nest API Music sorting/searching and recommendations Human labor is expensive Investigate algorithms for genre classification that are accurate and efficient https://thenextweb.com/insider/2013/05/29/spotify-finally-discovers-socialrecommends-new-music-and-more-to-users-but-only-for-the-webnow/#.tnw_wy2w27nc
Data Openly available through the Million Song Dataset (300 GB) Each song has song-wide information and segment-based information Segments: the song broken into roughly 0.25-second-long pieces, each screened for its own information, including features that cannot be quantified by the human ear Example features: segments_loudness, song_hotttnesss, tempo, loudness *https://labrosa.ee.columbia.edu/millionsong/
Early work Early attempt: Tzanetakis and Cook (2002) Further attempts since then: Features: timbre texture (MFCCs), rhythmic content, pitch... Methods: neural network, Gaussian classifier, SVM, k-nearest neighbor classifier... http://newatlas.com/automatic-music-genre-classificationsystem/38240/
Methods Supervised learning with genre-labeled training data Feature extraction/selection: tempo, loudness, key, pitch, etc. Dimension reduction to focus on specific features/genres Correlation matrix between characteristics
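A correlation matrix over song-level features can be computed with a few lines of NumPy. This is a minimal sketch on random stand-in data, not the project's actual feature table:

```python
import numpy as np

# Hypothetical song-level feature matrix: rows are songs,
# columns are features such as tempo, loudness, and key.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 3))  # 100 songs, 3 features

# Pearson correlation matrix between the feature columns.
corr = np.corrcoef(features, rowvar=False)
```

Highly correlated feature pairs are candidates for removal during feature selection, since they carry largely redundant information.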
General approach and machine learning techniques Logistic regression Dimension reduction Gaussian naive Bayes k-means clustering k-nn classifier Support vector machine Decision tree Neural network http://docs.opencv.org/2.4/doc/tutorials/ml/introduction_to _svm/introduction_to_svm.html
Best Subset Selection Gaussian Naive Bayes: Accuracy: training 33.78%, testing 31.75% Support Vector Machine (SVM): Accuracy: training 64.14%, testing 55.36% Logistic Regression: Accuracy: training 54.24%, testing 53.19% Decision Trees: Accuracy: training 100%, testing 35.70% (severe overfitting to the training data) https://www.analyticsvidhya.com/blog/2016/04/completetutorial-tree-based-modeling-scratch-in-python/ https://www.researchgate.net/figure/255695722_fig1_figure- 1-Illustration-of-how-a-Gaussian-Naive-Bayes-GNBclassifier-works-For-each
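The four classifiers above can be compared in a few lines of scikit-learn. This is an illustrative sketch on synthetic data standing in for the song-level features, not the project's actual code:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for song-level features (tempo, loudness, etc.).
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "Gaussian naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train),
                    model.score(X_test, y_test))
```

An unpruned decision tree typically reaches 100% training accuracy, reproducing the train/test gap reported above.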
Gaussian Classifier Used on MFC coefficients (timbre) Test accuracy: 61.75% http://modelai.gettysburg.edu/2012/music/index.html Confusion matrix: Column labels are actual genres.
Gaussian Classifier Used on MFC coefficients (timbre) Test accuracy: 61.75% Removing pop increased accuracy to 66.93% http://modelai.gettysburg.edu/2012/music/index.html Confusion matrix w/o pop: Column labels are actual genres
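A Gaussian classifier of this kind fits one full-covariance normal distribution per genre and assigns each song to the genre under which its feature vector is most likely. The sketch below uses SciPy on toy 2-D data standing in for the 12-dimensional MFCC features; it is an assumption about the approach, not the authors' code:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussians(X, y):
    """Fit one full-covariance Gaussian per genre label."""
    return {g: multivariate_normal(X[y == g].mean(axis=0),
                                   np.cov(X[y == g], rowvar=False))
            for g in np.unique(y)}

def classify(X, gaussians):
    """Assign each song to the genre with the highest log-likelihood."""
    labels = sorted(gaussians)
    ll = np.column_stack([gaussians[g].logpdf(X) for g in labels])
    return np.array(labels)[ll.argmax(axis=1)]

# Toy data standing in for per-song MFCC summaries (2-D here).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
pred = classify(X, fit_gaussians(X, y))
```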
Dimension Reduction Helpful in figuring out how to further classify the data Scatter plot showing results of t-SNE
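A t-SNE embedding like the one plotted is available in scikit-learn. This is a minimal sketch on toy data standing in for the song feature vectors:

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy stand-in for 12-dimensional song-level feature vectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 12)), rng.normal(5, 1, (40, 12))])

# Project the 12-dimensional features to 2-D for visual inspection.
embedding = TSNE(n_components=2, perplexity=15,
                 random_state=0).fit_transform(X)
```

Genres that form well-separated clusters in the 2-D embedding are natural candidates for pairwise classification experiments.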
k-means Clustering t-SNE suggested genres that could be easily separated Standard approach (Euclidean distance): each song represented as the sample mean of its 12-dimensional MFCCs (Symmetrized) KL divergence: each song represented as a multivariate normal (MVN) distribution KL divergence methods more successful: rap vs. blues: 98.8% accuracy; rap vs. blues vs. reggae: 93.5% accuracy Clustering results shown for blues, reggae, and rap using Euclidean distance and KL divergence
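The symmetrized KL divergence between two multivariate normals has a closed form and can serve as the song-to-song distance described above. A minimal NumPy sketch (illustration only, assuming each song is summarized by the mean and covariance of its MFCCs):

```python
import numpy as np

def kl_mvn(mu0, cov0, mu1, cov1):
    """KL divergence KL(N0 || N1) between two multivariate normals."""
    k = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def sym_kl(mu0, cov0, mu1, cov1):
    """Symmetrized KL divergence used as a song-to-song distance."""
    return kl_mvn(mu0, cov0, mu1, cov1) + kl_mvn(mu1, cov1, mu0, cov0)

# Distance between two toy "songs" with different MFCC distributions.
d = sym_kl(np.zeros(2), np.eye(2), np.ones(2), 2 * np.eye(2))
```

Unlike plain KL divergence, the symmetrized form is the same in both directions, which makes it usable as a distance in clustering.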
Segment-Level: k-nn Classifier Model the most prevalent pitch chroma and timbre waveform per segment as a Markov chain Expect different genres to differ in their transition matrices Example heatmap for rock transition matrix (pitch) Example heatmap for blues transition matrix (pitch)
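A per-song transition matrix like those in the heatmaps can be estimated by counting consecutive state pairs and normalizing each row. A minimal sketch on a toy state sequence (not the project's code):

```python
import numpy as np

def transition_matrix(states, n_states):
    """Estimate a row-normalized Markov transition matrix from a
    sequence of discrete states (e.g. the most prevalent pitch
    class per segment)."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions stay all-zero.
    return np.divide(counts, row_sums, where=row_sums > 0,
                     out=np.zeros_like(counts))

# Toy sequence over 3 pitch classes.
seq = [0, 1, 2, 1, 0, 1, 2, 2, 0]
T = transition_matrix(seq, 3)
```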
Segment-Level: k-nn Classifier k-nn classifier (k=50) Accuracy (pitch): training 48.5%, testing 52.7% Accuracy (timbre): training 51.24%, testing 54.75% Combined accuracy (pitch+timbre): training 60.2%, testing 59% Confusion matrix for combined test
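The k-nn step can be sketched with scikit-learn, treating each song's flattened 12x12 transition matrix as a 144-dimensional feature vector. The data below are synthetic stand-ins, and the feature encoding is an assumption, not the authors' exact pipeline:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 144-dimensional vectors standing in for flattened
# per-song transition matrices, one cluster per genre.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 144)),
               rng.normal(1, 1, (100, 144))])
y = np.repeat([0, 1], 100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# k = 50, matching the slide above.
knn = KNeighborsClassifier(n_neighbors=50)
knn.fit(X_train, y_train)
acc = knn.score(X_test, y_test)
```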
Neural Network Artificial neural network inspired by the human brain A single hidden layer used to classify all 14 genres (17 hidden units, 30,000 iterations) Test accuracy: 62.28% Confusion matrix for combined features: column labels are actual genres.
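A single-hidden-layer network of this shape can be sketched with scikit-learn's `MLPClassifier`. The data are a synthetic stand-in for the combined song features, and only the architecture (17 hidden units, iteration cap) mirrors the slide:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the combined song-level features.
X, y = make_classification(n_samples=400, n_features=20,
                           n_informative=10, n_classes=4,
                           n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# One hidden layer with 17 units, mirroring the architecture above.
net = MLPClassifier(hidden_layer_sizes=(17,), max_iter=30000,
                    random_state=0)
net.fit(X_train, y_train)
acc = net.score(X_test, y_test)
```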
Accuracy of Different Test- Feature Combinations
Limitations Considers a limited number of genres, not subgenres or new genres Training set comprises mostly rock Missing data on danceability and song_hotttnesss Time constraints https://sites.google.com/site/kfrawleymusicgenresincontext/
Future Work Combining both song-level and segment-level attributes into the same test for more methods Better methods for variable screening for this kind of data Pattern recognition and other machine learning techniques (e.g., convolutional networks, residual learning) Scaling methods to the full (300 GB) dataset https://clipartfest.com/categories/view/d0342e4e212661cf48756958b25b170b904a6a8b/pattern-recognition.html
And to finish... https://xkcd.com/1838/
Acknowledgments Sercan Yildiz Thomas Gehrmann
References Tzanetakis, G., and P. Cook. "Musical genre classification of audio signals." IEEE Transactions on Speech and Audio Processing 10.5 (2002): 293-302. Bertin-Mahieux, T., D. P. W. Ellis, B. Whitman, and P. Lamere. "The Million Song Dataset." Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), pages 591-596, Oct. 2011. Schindler, A., R. Mayer, and A. Rauber. "Facilitating comprehensive benchmarking experiments on the Million Song Dataset." Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR 2012), pages 469-474, Oct. 2012.