Predicting Similar Songs Using Musical Structure
Armin Namavari, Blake Howell, Gene Lewis

1 Introduction

In this work we propose a music genre classification method that directly analyzes the structure of a song. Information regarding the melody, rhythm, and dynamics of a song can be used to discern structural features that are apparent within the music, e.g. intervals and progressions, among other patterns. We are particularly interested in modeling a song as a structured sequence of symbols (e.g. pitches); we show that this paradigm allows for the use of standard NLP techniques by viewing a song as a language, which is also a structured sequence of symbols. Consequently, we can apply natural language processing techniques to bring out structural features within our data. We show how to map a given song to a finite character space and build NLP-inspired features from this low-level representation; from these high-level features, we train a classifier to predict the genre of the input song out of four options. Concretely, our algorithm takes as input post-processed sonic information about a song and outputs a single classification prediction out of rock, folk, hip hop, and jazz, as depicted in Figure 1.

Figure 1: Input-Output Specification

We evaluate our system on a held-out test set, which is a subset of our data. Our evaluation metric is simple classification accuracy: the proportion of predictions on the test set that match the ground-truth labels. We hope to do better than random guessing over our four genres, which yields roughly 25% accuracy; a simple baseline that predicts the label of the closest song in Euclidean space, where each song is represented by its first 10 notes, does only slightly better at 28% on a small subset of the data. Human-level performance on this task, on the other hand, is very nearly 100%, so there is substantial room for improvement.
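For concreteness, the nearest-neighbor baseline above could look roughly like the following sketch; the fixed-length first-10-notes representation and the helper names are our own illustration, not the exact baseline code.

```python
import numpy as np

def first_notes_vector(notes, n=10):
    """Represent a song by its first n note values, zero-padded if shorter."""
    v = np.zeros(n)
    v[:min(n, len(notes))] = notes[:n]
    return v

def nearest_neighbor_predict(train_X, train_y, test_X):
    """Label each test song with the genre of its Euclidean-nearest training song."""
    preds = []
    for x in test_X:
        dists = np.linalg.norm(train_X - x, axis=1)
        preds.append(train_y[np.argmin(dists)])
    return np.array(preds)
```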

2 Related Work

Much research has been done in the field of music classification. Dieleman et al. [1] investigated the use of convolutional neural networks for genre classification. They, like us, made use of the Million Song Dataset compiled by Columbia's LabROSA. Although this approach exhibits some success, we were more interested in seeing how well a simple NLP-inspired model could perform, though we also made use of a neural network. Instead of focusing on which neural network architecture yields the best results (although this would be an intriguing future direction), we wish to emphasize which combinations of features allow us to make reasonable predictions regarding song genres.

Coviello et al. [2] explored time series models, taking into account deeper structural features related to how a song develops over time. They use a dynamic texture mixture (DTM) as a linear dynamical model for their audio chunks, and they learn its parameters with EM. Our own model captures some time series information in how we analyze the volumes of chunks within a song; however, we opt to only look at the variability of volume over these chunks rather than plugging them into a linear dynamical model, and we train a neural network on our selected features rather than using EM.

Hoffman et al. [3] used a Codeword Bernoulli Average (CBA) probabilistic model, trained with EM, to assign tags to songs. Their method uses unsupervised learning to represent a song as counts of codewords (clusters found among their feature vectors), which are then used in a bag-of-codewords representation. Our approach is similar in that we also use a bag-of-x representation; however, we use a hand-designed feature extractor instead of unsupervised learning to generate this representation.

3 Models

3.1 Music as a Language

To apply NLP techniques to song audio, we first need an atomic-level representation of a song, in the spirit of characters or words for text. Our dataset of post-processed audio provides several suitable candidate representations: sections, which represent high-level semantic shifts in the course of a song; bars, which represent regular collections of notes known as measures; and segments, which represent specific discrete events such as when a note begins to be played. In the interest of choosing a representation that allows for as much flexibility in engineering high-level features as possible, we chose the fine-grained representation of segments. The segment representation gives us the ability to craft features at the per-note level, along with a rich collection of sonic information such as pitch, timbre, loudness, and duration; much of this information is lost at the coarser representation levels, even if those levels give more structure to the technical and semantic skeleton of the composition.

With this raw data-level representation in hand, we then focused on creating a computationally tractable representation of a song's pitch data. If we let the number of timesteps in a single given song be denoted by T, the pitch data for each song is provided as a matrix of dimensions T × 12, where each row represents a distribution over the 12 tonal pitches at that timestep. When considering how to use this data in conjunction with standard NLP techniques, we noted that NLP benefits from the assumption of a finite character space, such as the range of characters a-z. Since representing each timestep with a real-valued 12-vector admits an uncountably infinite range of possible characters, we reduce this character space to 12 symbols by selecting the tone with maximal weight at every timestep, reducing each real-valued 12-vector to an integer-valued scalar indexing one of the 12 tones. Thus, our final low-level representation of a given song is a vector of length T.

3.2 High-Level Language Features

3.2.1 Tonal Bigram Bag-of-Words

Figure 2: Tonal Bigram Feature Extractor

Leveraging the low-level song representation described above, we constructed a bag-of-words representation of the song by forming bigrams of these thresholded pitch values and recording the frequency with which each bigram occurs in the song, resulting in a histogram feature vector of size 144 for each song. A convenient advantage of this bag-of-words representation is that it accounts for the variable length of the songs in our dataset.
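A minimal NumPy sketch of this extraction pipeline is given below; the function names are ours, and the optional normalization is one way to account for variable song length, which the text does not pin down.

```python
import numpy as np

def pitch_sequence(pitch_matrix):
    """Collapse a T x 12 pitch-distribution matrix to a length-T tone sequence.

    Each row is a distribution over the 12 tonal pitches; we keep only the
    index of the maximal-weight tone, giving a finite "character" space.
    """
    return np.argmax(pitch_matrix, axis=1)           # shape (T,), values in 0..11

def tonal_bigram_histogram(tones, normalize=True):
    """Bag-of-words over tone bigrams: a 12 * 12 = 144-dimensional histogram."""
    hist = np.zeros(144)
    for a, b in zip(tones[:-1], tones[1:]):
        hist[12 * a + b] += 1
    if normalize:                                     # one way to handle variable song length
        hist /= max(hist.sum(), 1)
    return hist

# Example on a stand-in "song" with 500 timesteps of pitch distributions
pitches = np.random.dirichlet(np.ones(12), size=500)
features = tonal_bigram_histogram(pitch_sequence(pitches))
print(features.shape)  # (144,)
```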

After implementing a feature extractor to create this bag-of-words representation, we inspected the features extracted from several songs in our database and noted that each song demonstrated distinct spikes in the frequency data, with a small subset of bigrams appearing at frequencies an order of magnitude higher than the rest; we hypothesize that these frequency spikes correspond to common note intervals that make up the melody or chorus of each song. This provided evidence that a significant level of structure was being captured by this representation.

3.2.2 Time-Series Volume Inflections

Figure 3: Volume Inflection Feature Extractor

In addition to frequency patterns, loudness conveys useful information regarding the structure of a song. Certain genres have particular characteristics pertaining to how much loudness variation they exhibit. Jazz and folk music, for example, vary widely in loudness. Rock music, on the other hand, makes heavy use of compression (a technique by which quiet sounds are boosted and loud sounds are lowered), giving it a much more consistent loudness profile. Thus, it is useful to design a feature that captures how much a song's loudness varies. One way to capture this information is to take the standard deviation of loudness over several equal-length time chunks of the song. For our model, we found that the ideal number of chunks was four (as illustrated in Figure 3). After this process, we have a new feature vector that describes the volume profile of a song.
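A minimal sketch of this volume feature, assuming per-segment loudness values (in decibels) are available as a one-dimensional array; the function name is ours.

```python
import numpy as np

def volume_inflections(loudness, num_chunks=4):
    """Standard deviation of loudness within each of num_chunks equal-length chunks.

    Returns a length-num_chunks vector describing the song's volume profile.
    """
    chunks = np.array_split(np.asarray(loudness, dtype=float), num_chunks)
    return np.array([chunk.std() for chunk in chunks])
```

For the combined representation described next, this four-dimensional vector can simply be concatenated onto the 144-dimensional bigram histogram.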

3.2.3 Combined Representation

The loudness feature vector can simply be appended to the pitch features to obtain a new feature vector that captures both melodic and dynamic information. We expect this combined representation to help disambiguate classes that are hard to differentiate using only pitch features or only loudness features. One area where we might benefit from the combined representation is distinguishing rock music from folk music, which are quite similar in terms of melodic patterns. However, as mentioned before, the use of compression in rock gives it a more consistent loudness profile, whereas folk tends to vary more in volume over the course of a song. Furthermore, volume information gives more insight into how a song evolves over time (as the chunks are always sequential in time), whereas pitch information helps us better understand the fundamental structure of the song. Ideally, the combined information lets us draw on the best aspects of both the loudness and pitch features.

3.3 Neural Network Model

Figure 4: Neural Network Architecture

To evaluate the power of each of our feature representations, we train a learner that uses these features to predict the genre of the input song. Noting that we have a fairly sizeable dataset for each of our four genres and that there are potentially highly non-linear and even non-convex relationships between our input features, we model our learner as a multi-layer neural network [4], depicted in Figure 4. Our neural network uses 50 hidden units per layer and outputs scores for the four genres. The number of hidden layers was tuned as a hyperparameter, ranging from one to three. Each layer is fully connected to the next; concretely, each hidden layer consists of a matrix W_h ∈ R^{50×50}, followed by a ReLU non-linearity [5], followed by DropOut. Let our input vector be denoted φ(x) ∈ R^T and our output vector be denoted ŷ ∈ R^4; our network then also has a fully connected input layer W_in ∈ R^{T×50} and a fully connected output layer W_out ∈ R^{50×4}. At training time, we encode our target vectors as one-hot vectors of length four, with a one in the index that corresponds to the correct genre.
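To make the notation concrete, here is a minimal NumPy sketch of the forward computation; biases and DropOut are omitted, and applying a ReLU after the input projection is our assumption, since the text does not specify it.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

def forward(phi_x, W_in, hidden_Ws, W_out):
    """Forward pass of the network in Figure 4 (biases and DropOut omitted).

    phi_x:     feature vector, shape (T,)
    W_in:      input layer weights,  shape (T, 50)
    hidden_Ws: list of hidden-layer weights W_h, each of shape (50, 50)
    W_out:     output layer weights, shape (50, 4)
    Returns y_hat, a probability distribution over the four genres.
    """
    h = relu(phi_x @ W_in)
    for W_h in hidden_Ws:
        h = relu(h @ W_h)
    return softmax(h @ W_out)
```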

If the correct genre is given by index i and the vector of output scores is ŷ, our loss function is the categorical cross-entropy:

    Loss = -log( exp(ŷ_i) / Σ_j exp(ŷ_j) )

Thus, our model is incentivized not only to predict the correct label, but to predict it with the maximum possible confidence, i.e. to put as much probability mass as possible in the slot that corresponds to the correct genre.

4 Experiments

4.1 Computational Infrastructure

Our data was hosted on an Amazon AWS machine, where we also cleaned the data and ran our training algorithms. We used the Amazon C4 Compute Optimized machines, with Intel Xeon processors, 16 virtual cores, and 30 GB of RAM. Our data was cleaned using the Python NumPy library and stored as a serialized Python dictionary for ease of access. To build and train our neural network, we used the Keras library [6] with a TensorFlow backend [7] to enable rapid experimentation.

4.2 Dataset

We use the Million Song Dataset [8], which consists of approximately one million sound clips annotated with a large collection of structured metadata such as artist, release year, hotttnesss, latitude, and longitude. Due to copyright issues, it is very difficult to find a large collection of raw audio files for full-length songs for feature extraction and training. The Million Song Dataset mitigates this issue by providing a wealth of sonic information extracted from each song, such as a discretization of the continuous audio into note-sized chunks, a distribution over pitches for each chunk, volume information in decibels for each note, and duration information for each note.

For the purposes of our experiment, the largest drawback of the Million Song Dataset is the lack of a direct genre label for each song. However, each song comes labeled with a collection of tags: strings that describe assorted characteristics of and information related to the song. When counting the frequency of unique tags over all the data, we found that many of the most frequent tags commented on the genre of the song. We therefore chose four sufficiently frequent genre tags as supervision labels for their associated songs. The four genres we chose were rock, folk, hip hop, and jazz; these labels were selected as they seemed sufficiently different to allow for discernible patterns in their constituent songs. In our data cleaning, we retained all songs that belonged to exactly one of these four labels and removed all others; we then applied our feature extractor to each of the retained songs to create our (φ(x), y) pairs. This gave us 10,000 examples per label, for a total of 40,000 examples. We then separated these examples into a 1,000-example testing set and a 9,000-example training set.
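A rough sketch of this labeling step, assuming each song record carries a list of tag strings (the field name tags is our own placeholder, not the dataset's schema):

```python
import numpy as np

GENRES = ['rock', 'folk', 'hip hop', 'jazz']

def genre_label(tags):
    """Return the genre index if the tags match exactly one target genre, else None."""
    hits = [g for g in GENRES if g in tags]
    return GENRES.index(hits[0]) if len(hits) == 1 else None

def build_dataset(songs, feature_extractor):
    """Keep unambiguously labeled songs and map them to (phi(x), y) pairs."""
    X, y = [], []
    for song in songs:
        label = genre_label(song['tags'])   # 'tags' is a placeholder field name
        if label is not None:
            X.append(feature_extractor(song))
            y.append(label)
    return np.array(X), np.array(y)
```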

4.3 Experimental Setup

In our initial experimentation, we found that if we did not randomize the order of the training data, our model would very quickly learn to predict the label of the first few training examples for all subsequent examples, leading to a test accuracy of 25%, which is no better than random. We attribute this to the nature of incremental update algorithms like Stochastic Gradient Descent and AdaGrad, which take their largest steps in the earliest iterations. To mitigate this issue, we randomized our training and testing sets for all subsequent experiments, which yielded non-trivial accuracy.

We also found that using an improved update rule over vanilla Stochastic Gradient Descent [4] was instrumental to achieving our best results. In particular, we had the best success with AdaGrad [9], which adapts the learning rate for each parameter independently over the lifetime of training; for our model trained on tonal bigrams, using AdaGrad instead of SGD boosted our accuracy from 37% to 60%. We subsequently used AdaGrad for all experiments presented in this work.

We found that using models of increasing size and training for an increasing number of epochs often led to severe overfitting, where our training accuracies would far outpace our testing accuracies. To mitigate this problem, we introduced two forms of regularization into all of our experiments. First, we imposed an L2 penalty on all of our weights, which prevents the weights from growing unreasonably large and thus over-estimating the importance of features whose predictive capacity holds only on the training data. Second, we introduced DropOut [10] layers into our neural network model, which randomly eliminate a given neural activation with probability 0.5; though there is as yet no fully rigorous explanation for why DropOut has a regularizing effect on neural network models, empirical evidence has shown it to be a very effective tool for reducing overfitting.

Finally, we note that a neural network model leads to a highly non-convex objective function; this means there is little intuition to guide hand-tuning of the training hyperparameters, so after our initial experimentation we sought a principled way to explore the parameter space. We used grid search, which trains a model for each combination of hyperparameters in the Cartesian product of the candidate values and keeps the one that performs best on the test set. We used this setup to identify the optimal model for each of the feature extractors tested below; the hyperparameters we varied included the initial learning rate, the L2 regularization strength, and the number of layers in the model.
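Putting Sections 3.3 and 4.3 together, a minimal Keras 2-style sketch of the model and training configuration might look like the following; the specific hyperparameter values shown are placeholders drawn from the grid-searched ranges, not the reported optima.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l2
from keras.optimizers import Adagrad
from keras.utils import to_categorical

def build_model(input_dim, num_hidden_layers=2, l2_strength=1e-4, learning_rate=0.01):
    """Fully connected genre classifier: Dense(50)+ReLU+Dropout(0.5) blocks,
    a softmax over the four genres, L2 on all weights, and AdaGrad updates."""
    model = Sequential()
    model.add(Dense(50, activation='relu', input_dim=input_dim,
                    kernel_regularizer=l2(l2_strength)))
    model.add(Dropout(0.5))
    for _ in range(num_hidden_layers - 1):
        model.add(Dense(50, activation='relu', kernel_regularizer=l2(l2_strength)))
        model.add(Dropout(0.5))
    model.add(Dense(4, activation='softmax', kernel_regularizer=l2(l2_strength)))
    model.compile(optimizer=Adagrad(lr=learning_rate),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Usage (X_train has 144 + 4 = 148 columns for the combined bigram + volume features):
# model = build_model(input_dim=X_train.shape[1])
# model.fit(X_train, to_categorical(y_train, 4), epochs=20, batch_size=32, shuffle=True)
```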

5 Results

5.1 Tonal Bigram Bag-of-Words

Figure 5: Confusion Matrix for NN trained on Bigram Features

Our first experiments involved training and testing our neural network model on the features generated by our Tonal Bigram Bag-of-Words extractor. Our most performant model achieved an accuracy of 60% on the test set; the resulting confusion matrix is given in Figure 5. Straight away, we notice some interesting patterns. First, the classification accuracy is relatively high for jazz, folk, and hip hop. Furthermore, the primary sources of error in the confusion matrix are localized to the top-left and bottom-right corners, which indicates that it is relatively difficult to separate songs within the pairs jazz / hip hop and folk / rock; this result seems fairly intuitive to the authors, who would generally agree that jazz and hip hop are relatively similar, as are folk and rock. This suggests that tonal bigram frequencies are sufficient to differentiate broad musical categories but lack the representational power to make finer-grained decisions; we postulate that this is because tonal bigrams represent frequent note intervals, and folk and rock likely both use a large number of common musical intervals known to be pleasing to a broad audience, while jazz and hip hop likely use a wider range of intervals in a manner more indicative of their respective genres. We therefore seek to augment our Tonal Bigram Bag-of-Words features with additional features that increase the discriminative power of our model within these genre pairs.

5.2 Time-Series Volume Inflections

Figure 6: Confusion Matrix for NN trained on Volume Features

Motivated by the previous discussion, we turned our attention to identifying additional features that could more clearly discriminate between rock and folk, as well as between jazz and hip hop. We reasoned that an important tool in musical composition is volume dynamics, and that different genres likely have different volume patterns corresponding to standard compositional structure. To capture this intuition, we created the Time-Series Volume Inflections feature extractor. Our best model trained on volume inflection features achieved an accuracy of 43%. Though we were initially disheartened by this drop in performance relative to the model trained on tonal bigrams, we noticed an important pattern in the confusion matrix given in Figure 6: the confusion between folk and rock had been largely disentangled. Indeed, the rock label was highly distinguishable using the volume inflection features, while it had been difficult to distinguish with tonal bigrams. We postulate that this is due to modern compression techniques for loud music such as rock, which greatly reduce the variation in loudness of much rock music and so make it much more easily distinguishable from folk, hip hop, or jazz. We note that though folk and jazz are highly entangled under the volume inflection features, they were highly disentangled under the tonal bigram features. We therefore hypothesized that the combination of tonal bigram and volume inflection features would give a representation strong enough to distinguish between all pairs of genres, and in particular would help reduce the number of misclassifications in the top-left and bottom-right corners of our confusion matrix.

5.3 Combined Representation

Figure 7: Confusion Matrix for NN trained on Combined Features

For our final set of experiments, we combined our tonal bigram feature extractor and our volume inflection feature extractor into a single feature extractor that computes both sets of information and concatenates the resulting feature vectors. We reasoned that this Combined Feature Extractor would leverage the complementary strengths of our previous two song representations to allow for a more powerful classifier. Our best model trained on these combined features achieved an accuracy of 66%, a 10% relative improvement over the accuracy achieved with tonal bigrams alone. When we examine the corresponding confusion matrix shown in Figure 7, we notice a striking change from Figure 5: we now have a very large number of predictions clustered on the main diagonal (corresponding to correct predictions) and a very small number on the antidiagonal. The matrix overall exhibits a much smoother distribution of predictions in which the top-left and bottom-right corners are far less prominent, supporting our hypothesis that the combined features allow for stronger discrimination and a kind of interpolation between the classifiers derived from either feature extractor in isolation.

6 Conclusion

In this work, we have shown that, with an appropriate choice of character space, sonic attributes can be mapped to a notion of language, and therefore common natural language analysis techniques such as bigram frequencies and time-series sequence structure can be used to disambiguate between different musical patterns. This is an exciting development, since it suggests that natural language and musical sequences share much structure, and so further NLP techniques beyond those explored in this work may prove useful for song analysis. It would be interesting to consider different music-to-language mappings, perhaps even exploring musical embeddings (similar to systems such as word2vec). It would also be promising to consider neural network architectures that handle temporal data well, e.g. RNNs and their variants such as LSTMs. We hope to explore these directions in future work.

References

[1] Sander Dieleman, Philémon Brakel, and Benjamin Schrauwen. Audio-based music classification with a pretrained convolutional network. In 12th International Society for Music Information Retrieval Conference (ISMIR 2011), University of Miami, 2011.

[2] Emanuele Coviello. Automatic music tagging with time series models.

[3] Matthew D. Hoffman, David M. Blei, and Perry R. Cook. Easy as CBA: A simple probabilistic model for tagging music.

[4] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson Education, 2nd edition.

[5] Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.

[6] François Chollet. Keras.

[7] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.

[8] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[9] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul), 2011.

[10] Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 2014.
