Generating Music with Recurrent Neural Networks


27 October 2017

Ushini Attanayake
Supervised by Christian Walder
Co-supervised by Henry Gardner

COMP3740 Project Work in Computing
The Australian National University

Declaration

I declare that to the best of my knowledge, the following is entirely my own work, and does not contain material produced by another person, except where cited otherwise.

Ushini Attanayake
October 2017

ACKNOWLEDGEMENTS

I would like to thank the following individuals for their contributions towards this project. Thank you to Dr. Christian Walder for his teachings, advice and support throughout the project. Thank you to Dr. Henry Gardner for his advice regarding fulfilling course criteria and the approach to the project. Thank you to Dr. Peter Strazdins for running and curating the practice meetings.

ABSTRACT

Recurrent Neural Networks have been used to model various styles and representations of music. Currently, no sufficient model exists which has been trained on the irealb jazz corpus of textual **jazz files. When trained well, such a model can be used to generate novel jazz chord progressions. These progressions can be interpreted by the irealb player software to generate a variety of interesting progressions jazz soloists can improvise to. This report contributes towards achieving such a model by training a Recurrent Neural Network with LSTM cells on the irealb jazz corpus. Two main training techniques were used: separating the root and extension of each chord, and transposing the files in the corpus. The model trained on a dataset where each song was transposed through all keys and the root and extension of all chords were separated achieved a test perplexity significantly better than that of the model trained on the original data. No evidence of overfitting was found in any of the models.

Table of Contents

ACKNOWLEDGEMENTS
ABSTRACT
1. Introduction
1.1 Literature Review
2. Background
2.1 The Model
2.1.1 Cross Validation
2.2 The Data
2.3 Music Theory
3. Experiments
3.1 Overview
3.2 Splitting the Root and Extension of Chords
3.2.1 Method
3.3 Transposing
3.3.1 Transposing to a single key
3.3.2 Transposing through all keys
3.3.3 Hyperparameters Combinations
4. Results
4.1 Splitting the Root and Extension of Chords
4.2 Transposing to a Single Key
4.3 Transposing Through All Keys
4.4 Hyperparameter Combinations
4.5 Test Scores
5. Conclusions
5.1 Future Work
References
Appendix
Appendix 1: Study Contract

List of Figures
Figure 1: **jazz file example

List of Graphs
Graph 1: No Split
Graph 2: Split
Graph 3: Single key transpose, no split
Graph 4: Single key transpose, split
Graph 5: All key transpose, no split
Graph 6: Hyperparameter Combinations
Graph 7: Test Scores

List of Tables
Table 1: Test Scores

1. Introduction

The use of Recurrent Neural Networks to generate music has been extensively explored across various genres. This includes models which have been successfully trained on blues and jazz corpuses, modelling chords and melody sequences in various representations. However, a sufficient model which can generate chord progressions in the text-based **jazz representation does not exist. Such a model has useful applications in helping jazz soloists improve their improvisational skills when used in conjunction with the irealb player, software which interprets chord charts and generates band accompaniments. Chord progressions generated from such a model can be easily interpreted by the irealb player as they will be in the **jazz representation. Since the irealb player produces accompaniments for existing songs, a model capable of producing novel chord progressions becomes more useful to a musician, as they will have access to a variety of unique chord progressions to practice soloing over. This is particularly useful in an experimental style like jazz, where improvisation plays a large role in the genre.

This project aims to contribute towards creating such a model by training an existing Recurrent Neural Network with LSTM cells on the irealb jazz corpus. Given an input sequence of musical elements, the network makes predictions on the next element in the sequence and thus can be used to predict an entire chord progression for a song. The aim was to obtain a model which can generate novel and meaningful chord progressions in the **jazz representation. Training techniques such as transposing and separating chords into roots and extensions were explored. These techniques reveal musical structures to the network on two levels of granularity: within chords and within progressions. Since revealing these structures has the effect of generalising the information the Recurrent Neural Network interprets, it was hypothesised that the use of these techniques would bring us closer to achieving the aim. For pedagogical reasons, this project focusses on the training process and looks to extend upon this work in the future by sampling from the model and producing chord progressions which can be used by the irealb player to generate accompaniments.

The model's performance was evaluated by using the perplexity on the test dataset as the performance measure. The perplexity indicates the level of randomness in the predictions the model makes. Cross validation was used to observe whether the model was overfitting the data. The main results show that transposing the dataset through all keys and separating the root and the extension of each chord in pre-processing significantly improved the test perplexity of the model, and the results from cross validation for this model showed no signs of overfitting.

1.1 Literature Review

The idea for this project was inspired by the work of Sturm, Santos and Korshunova [1], who used textual data in the ABC representation to train a Deep Recurrent Neural Network to generate folk music. Their network was trained on a large amount of data: 23,958 files. The network was configured to have 3 layers and 512 hidden units. Though the ABC representation does account for chords, Sturm, Santos and Korshunova do not explicitly model chords. In fact, one of their models was trained on a dataset where multiple voices were removed. The ABC representation presents chords in a similar way to the **jazz representation, and the folk music corpus used has files which represent chord progressions. What separates the work in this project from the work of Sturm, Santos and Korshunova is that chord progressions are explicitly modelled here. The training techniques used in this project also explicitly harness underlying structures in chords and chord progressions. The work of Sturm, Santos and Korshunova was used as a general guideline for this project. Since the architecture of the neural network and the data representation used in this project were similar to theirs, the training process could be kept as the focal point of the project. This project was, after all, intended to be used as a tool for learning. This project also differs from the one mentioned above in terms of genre, as the network in this project was trained on jazz chord progressions.

There are a couple of significant works which have trained Recurrent Neural Networks on jazz and blues music. Eck and Schmidhuber [2] trained their network with LSTM cells on blues music. They had models which were capable of generating chords and melodies. However, like the work found in [1], their methods for training the network did not explicitly use the properties in the structures of chords. The data representation used by Eck and Schmidhuber is multi-voiced and had a unit per musical note. Each unit had a binary state: if the note was present, the state was ON; if it was absent, the state was OFF. Therefore, chords and melodies had the same representation. Franklin [3] uses a representation similar to PHCCCF. The pitches are represented by 7 bits. The first 4 bits identify which major Circle of Thirds the note belongs to and the last 3 bits identify which minor Circle of Thirds the note belongs to. Chords are represented by summing the 7-bit representations of the notes in a chord. Due to the complexity of this representation, Franklin omits complex chord tones from her experiments and focusses on two chord forms: the triad, and the triad plus the 7th tone. The simpler representation of chords in the **jazz files allows more complex tones to be modelled in this project.

2. Background

In order to explain the reasoning behind the techniques used for training the model and the methods used to evaluate the results, we will first take a look at how the network processes the data and how to interpret its output. Since the techniques also harness properties intrinsic to musical data, we will briefly explain the **jazz representation and some basic music theory.

2.1 The Model

The model used was an existing Recurrent Neural Network with LSTM cells written in the Tensorflow framework. The network was not written from scratch in order to keep the project focussed on the training process. The network had previously been used to model natural language. It takes in 3 plain text files as input, representing the train, validation and test datasets (more on this in the cross-validation section). Each LSTM cell takes as input a sequence of words and predicts the next word in the sequence. Each input sequence is n timesteps long and each LSTM cell in the network processes m such input sequences in parallel. The timestep associated with a word in a sequence indicates its position in the sequence. A cell processes its input sequences one word at a time. After a cell processes each word, it feeds the cell state back into itself. The cell state captures information the cell has learnt about a sequence up to a certain timestep. For example, when predicting the word at the (t+1)-th timestep, the cell takes as input the word at the t-th timestep and the cell state from the (t-1)-th timestep. By having this feedback mechanism, the cell is estimating the conditional decomposition below.

P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, X_2, \ldots, X_{i-1})

It states that the probability of a sequence (X_1, X_2, ..., X_n) occurring is equivalent to the product of the probabilities that each element in the sequence occurs next, given the elements preceding it. This feedback mechanism is what makes LSTM cells particularly useful in modelling sequential data such as language and music. The nature, and therefore sound, of a musical chord in a sequence is dependent on all the chords that came before it.

Before the network starts processing the data, the raw files must be separated into sequences of words. The algorithm used to do this identifies any group of adjacent characters surrounded by whitespace as a word, and it also considers the new-line character \n as whitespace. Though the original network was predicting the next word in a sequence, the output is represented as a probability vector. The dimension of an output vector is the size of the vocabulary of the dataset, where the vocabulary is the collection of all the unique words in the dataset.
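To make the word-splitting concrete, the following is a minimal sketch of the whitespace tokenisation described above, applied to a fragment of the file shown later in Figure 1 (the function name is illustrative, not taken from the actual code):

    def words_and_vocab(text):
        # Any maximal run of non-whitespace characters is one word, and the
        # new-line character counts as whitespace, so each **jazz token or
        # bar line (one per line in the file) becomes a single word.
        words = text.split()        # str.split() splits on spaces, tabs and newlines
        vocab = sorted(set(words))  # the vocabulary: all unique words
        return words, vocab

    body = "2.F:min7\n2.G:min7\n4.A-:maj7\n4.G:min7\n2.F:min7\n"
    words, vocab = words_and_vocab(body)
    print(words)  # ['2.F:min7', '2.G:min7', '4.A-:maj7', '4.G:min7', '2.F:min7']
    print(vocab)  # ['2.F:min7', '2.G:min7', '4.A-:maj7', '4.G:min7']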

After the cell generates an output sequence of predictions from the input sequence it received, the output sequence is compared to a target sequence. The targets are the elements from the input sequence shifted up by one timestep. An error is calculated from the comparison between the sequence of predictions and the target sequence. From this error, the perplexity of the output is calculated. Perplexity is given by the following formula, where H(p) is the entropy of the distribution p.

2^{H(p)} = 2^{-\sum_x p(x) \log_2 p(x)}

Entropy can be used to describe the randomness of a probability distribution. Since perplexity is a function of entropy, a low perplexity for a given iteration is equivalent to having a low level of randomness in the predictions for that iteration. The perplexity was used as the performance measure because it captures the level of randomness in the predictions. Since there are clear structures in music which we want the network to recognise, we want the predictions to be made in a systematic way rather than a random way.

2.1.1 Cross Validation

If we are to sample from the model, we want the network to generate sequences which are novel. Therefore, it is important to make sure the model is not overfitting. When the model overfits the data, it is more likely to generate sequences which are extremely similar to the songs it encountered in the training data. Cross validation gives us a way of gauging whether the model is overfitting. It is a technique where the dataset is split into 3 disjoint subsets: the train set, validation set and test set. The training set was chosen to be 60% of the entire dataset while the validation and test sets were 20% each. This is generally how the data is split for cross validation, which is why these specific percentages were chosen. The network processes the entire training set a certain number of times; this is known as the number of epochs. The configuration used for training the jazz dataset had 13 epochs. After each epoch, the network is introduced to a fraction of the validation set. The perplexity is calculated for the training epoch and for the validation data after that epoch. This allows us to gauge the network's performance while it is training, as we are introducing data it has not previously been trained on throughout the training process. If the perplexity value for the training data continues to improve (grow smaller) while the validation perplexity deteriorates (grows larger), we can assume the model is overfitting. Finally, the test dataset is introduced to the network and the perplexity for the test data is calculated. The test dataset is significantly larger than the fractions of validation data introduced during training. The test perplexity will be my main performance measure as it indicates how well the network is able to generate meaningful sequences when compared to a relatively large amount of data which it hasn't been trained on.
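As a concrete illustration of the perplexity measure described above, the sketch below computes perplexity from the probabilities a model assigned to the words that actually occurred next (a simplified view; the network derives it from its average cross-entropy loss):

    import numpy as np

    def perplexity(probs_of_targets):
        # Average per-word cross-entropy in bits, then exponentiate:
        # perplexity = 2 ** H, mirroring the formula 2^{H(p)} above.
        cross_entropy = -np.mean(np.log2(probs_of_targets))
        return 2.0 ** cross_entropy

    # A model that guesses uniformly at random over a vocabulary of size V
    # assigns probability 1/V to every target, giving perplexity exactly V.
    V = 84
    print(perplexity(np.full(1000, 1.0 / V)))  # 84.0 (up to floating point)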

2.2 The Data

The irealb jazz corpus consists of 1,186 files. These files are in the **jazz representation, which is a text-based representation. The first few lines in each file describe the song; for example, they list the composer and the time signature. The body of each tune is comprised of a sequence of tokens and symbols which represent bar lines. The tokens encapsulate the duration of a chord and the chord itself. The chord is comprised of the root and the extension. For example, for a token 1C:maj7, the duration is 1, the root is C and the extension is maj7. Here, the duration is in Humdrum reciprocal form, where the reciprocal of the duration is the fractional duration of a bar. Each token and bar line appears on a new line in the file (see Figure 1 below). Therefore, the network identifies each line as a word in the dataset. If we were to feed the files into the network without any pre-processing, the network would be predicting the likelihood of entire tokens appearing next in the sequence. This is undesirable as the tokens can be broken down further into fundamental components. If these components aren't considered as separate words in the dataset, the network is unable to explicitly learn about the relationships between the duration, the root and the extension. Identifying this was what led to one of the main experiments conducted in the project: investigating whether separating the duration, the root and the extension would improve the test perplexity.

!!!OTL: Afro Blue
!!!COM: Santamaria, Mongo
!!!ODT: 1959
**jazz
*M3/4
*f:
2.F:min7
2.G:min7
4.A-:maj7
4.G:min7
2.F:min7
2.F:min7
2.G:min7
4.A-:maj7
4.G:min7
2.F:min7
2.E-
2.E-
4.D-
4.E-
2.F:min7
2.E-
2.E-
4.D-
4.E-
2.F:min7
*-
!!!EEV: irb Corpus 1.0
!!!YER: 26 Dec 2012
!!!EED: Daniel Shanahan and Yuri Broze
!!!ENC: Yuri Broze
!!!AGN: Jazz Lead Sheets
!!!PPP:

Figure 1: **jazz file example
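To illustrate the token anatomy just described, here is a small, hypothetical parser for **jazz chord tokens (illustrative only, not part of the corpus tooling):

    def parse_token(token):
        # Split a **jazz chord token into (duration, root, extension), e.g.
        #   '1C:maj7'  -> ('1',  'C', 'maj7')
        #   '2.F:min7' -> ('2.', 'F', 'min7')
        # The duration is in Humdrum reciprocal form (digits, optional dot).
        i = 0
        while i < len(token) and (token[i].isdigit() or token[i] == '.'):
            i += 1                         # consume the duration
        duration = token[:i]
        root = token[i]
        i += 1
        if i < len(token) and token[i] in '#-':
            root += token[i]               # '#' sharpens, '-' flattens the root
            i += 1
        extension = token[i:].lstrip(':')  # drop the separating colon
        return duration, root, extension

    print(parse_token('1C:maj7'))  # ('1', 'C', 'maj7')
    print(parse_token('2.E-'))     # ('2.', 'E-', '')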

2.3 Music Theory

The root of a chord in the **jazz representation takes on the value of a note name. There are 7 unique note names in music, namely A, B, C, D, E, F, G. Most adjacent note names are separated in pitch by a whole step (B and C, and E and F, are only a half step apart). Each note can also be sharpened by moving up in pitch by a half step, or flattened by moving down in pitch by a half step. In the **jazz representation a sharpened note is suffixed with a # symbol and a flattened note is suffixed with a - symbol. There are some instances where a note at a certain pitch has two different note names. For example, A and B are separated by a whole step. A# is a half step up from A and B- is a half step down from B. Therefore A# and B- represent the same note in terms of pitch. The following is an exhaustive set of note names corresponding to the 12 distinct pitches.

{A, A# or B-, B, C, C# or D-, D, D# or E-, E, F, F# or G-, G, G# or A-}

Each of these note names has a corresponding scale. A scale is a sequence of notes ordered by pitch. The degree of a note in a scale identifies its position in this sequence. Since a root in the **jazz representation identifies a note name (and so a corresponding scale), any chord with that root can be expressed as a set of notes played simultaneously. What's interesting is that the notes played alongside the root can be identified purely by their degree in the scale corresponding to the root. This means we can represent a chord solely by the root and a set of degrees in the root's scale. For example, take the chord C major, which is rooted at C. The basic triad chord for C major consists of the note names C, E and G. In the scale of C, the degree of E is 3, the degree of G is 5 and the degree of C is 1. So, an equivalent representation of the chord is C:135. The extension of a chord in the **jazz representation describes a chord in a similar relativistic way. The lack of an extension implies the chord is a triad and so consists of the 1st, 3rd and 5th degrees. Other examples of extensions are min7, maj7 and 7#9. Though these extensions only implicitly identify the degrees in a chord, what is important is that they do not describe absolute note names. Having a relativistic description of the notes in a chord (over an absolute description) gives us a way of expressing the general, and therefore fundamental, way chords are structured. This observation is essential in order to understand the reasoning behind the two training techniques that were explored.
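The 12 pitches and their enharmonic spellings can be captured in a small lookup table; a sketch (the dictionary below is illustrative, not taken from the project's code):

    # Map every note name to a pitch class 0-11. Enharmonic spellings such
    # as A# and B- share a pitch, so they map to the same number. Note the
    # half steps between B and C and between E and F.
    PITCH_CLASS = {
        'A': 0, 'A#': 1, 'B-': 1, 'B': 2, 'C': 3, 'C#': 4, 'D-': 4,
        'D': 5, 'D#': 6, 'E-': 6, 'E': 7, 'F': 8, 'F#': 9, 'G-': 9,
        'G': 10, 'G#': 11, 'A-': 11,
    }

    assert PITCH_CLASS['A#'] == PITCH_CLASS['B-']  # same pitch, two names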

3. Experiments

3.1 Overview

The experiments conducted investigate the effect certain training techniques have on the network's performance. The main techniques involve changing the representation of the data and artificially increasing the size of the corpus. The first technique involves changing the representation of the data so that the root and extension are considered as separate elements. The second technique involves transposing each song in the dataset to one of two keys. We will refer to the key a song is transposed to as the destination key. If the original key of a song was a major key, the destination key would be C major. If the original key of the song was a minor key, the destination key would be A minor. C major was chosen arbitrarily as the major destination key; A minor was chosen as the minor destination key because it is the relative minor of C. The third technique artificially increases the dataset by representing each tune in all 12 keys by way of transposition.

The first technique aimed to teach the network the relationship between the root and the extension, since any extension can appear next to any root. Both transposition techniques aimed to teach the network that a chord progression is independent of the key of a song: the context the song is in, in terms of pitch. The 6 combinations of these techniques were explored by generating the 6 datasets listed below. The network was trained on each of these datasets, which resulted in 6 models.

a) Original data with the duration and chord separated
b) Original data with the duration, root and extension separated
c) Each tune transposed to C major or A minor with the duration and chord separated
d) Each tune transposed to C major or A minor with the duration, root and extension separated
e) Each tune transposed through all keys with the duration and chord separated
f) Each tune transposed through all keys with the duration, root and extension separated

Cross validation was conducted on the predictions made by each model and the test perplexities were compared across all 6 models. It was expected that the model with the lowest test perplexity would be the most capable of generating realistic chord progressions.
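The six datasets are simply the cross product of the three transposition schemes with the two splitting schemes, which can be enumerated as follows (a sketch; labels and names are illustrative):

    from itertools import product

    transpositions = ['original', 'single key (C major / A minor)', 'all 12 keys']
    splits = ['duration | chord', 'duration | root | extension']

    # Datasets a)-f): every pairing of a transposition scheme with a split.
    for label, (t, s) in zip('abcdef', product(transpositions, splits)):
        print(f"{label}) {t:32s} {s}")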

One of the original objectives was to experiment with the hyperparameter values which dictate the number of hidden layers and hidden units on the best model. A certain combination of values for these hyperparameters could have reduced the model's test perplexity even further. Unfortunately, I was unable to investigate the effect certain hyperparameter values have on the test perplexity of the best model. This is because the best model was trained on dataset f), which was 12 times larger than the original dataset; training the network became very slow and data collection wasn't completed in time. However, I was able to try these hyperparameter combinations on the original data. I restricted the experiments to only consider the hyperparameters which dictate the number of layers in the network and the number of hidden units, and the following combinations of hyperparameters were used:

1. 1 hidden layer, 300 hidden units
2. 2 hidden layers, 300 hidden units
3. 3 hidden layers, 300 hidden units
4. 1 hidden layer, 600 hidden units
5. 2 hidden layers, 600 hidden units
6. 3 hidden layers, 600 hidden units

3.2 Splitting the Root and Extension of Chords

By splitting the root and the extension of a chord in pre-processing, we are indicating that the network should consider the root and the extension as distinct elements in a sequence. To illustrate the intuition behind the assumption that splitting the root and extension will improve the perplexity, I will compare the probability distribution over the output vector for a model trained on O) the original dataset and S) the dataset with the chord split at the root. For simplicity, we will assume that the data only consists of tokens from the **jazz representation. There are 12 possible values for the root of a chord, namely

{"C", "C#" or "D-", "D", "D#" or "E-", "E", "F", "F#" or "G-", "G", "G#" or "A-", "A", "A#" or "B-", "B"}

For example's sake, let {"", "min7", "maj7", "dim7", "sus", "sus4", "7b9"} be the set of possible extensions for a dataset. The exact number of distinct extensions in the dataset is not important for the illustration; it will be a finite number nevertheless. The alphabet generated from dataset S) will have dimensionality 12 + 7 = 19. Compare this to the dimensionality of the alphabet for the model trained on dataset O). In that case, the alphabet will consist of all possible combinations of root and extension, so its dimensionality will be 12 x 7 = 84. Since the probability vector the model outputs has the same dimensionality as the alphabet, the distribution will be spread over fewer outcomes for dataset S), so the distribution will be less disperse compared to the distribution over the output vectors for dataset O). The more disperse the distribution is over the output vector, the closer the distribution is to a uniform distribution, which is what the distribution would look like if the model was making random predictions. Since perplexity can be thought of as a measure of randomness, it was hypothesised that splitting the root and the extension will lower the test perplexity.
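Under the illustrative assumptions above, the two alphabet sizes compare as follows:

    roots = 12        # the 12 root pitch classes listed above
    extensions = 7    # e.g. {'', 'min7', 'maj7', 'dim7', 'sus', 'sus4', '7b9'}

    split_alphabet = roots + extensions    # S): roots and extensions are separate words
    unsplit_alphabet = roots * extensions  # O): every (root, extension) pair is one word

    print(split_alphabet, unsplit_alphabet)  # 19 84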

3.2.1 Method

Since the raw data amalgamates the duration with the root and extension, we will consider the dataset with the duration separated from the root and the extension as the original dataset (dataset a)). This allows us to isolate the effects of separating the root from the extension. The root was separated from the extension by parsing the lines of each file in the dataset and replacing each token 1 C:maj7 with 1 C : maj7. Replacing the tokens in this way was effectively done by inserting whitespace before and after the colon symbol. Leaving the colon between the C and the maj7 ensures that the model will learn to only predict an extension after it has predicted the colon symbol. The colon symbol did not appear between every root and extension in the dataset; an example is 2C7. Because of this inconsistency in representation, it was important to identify the root of the token and check whether it was followed by a colon symbol. If the root was not followed by a colon symbol, one was added with surrounding whitespace. The root was taken to be the character at index 1 of the token string. If the character at index 2 of the token string was a sharp symbol # or a flat symbol -, it was concatenated to the root.

The splitting process was only applied to the lines in the file that represented the chord progression of the tune, meaning the first few header lines were not parsed during this process. In order to start the parsing from the beginning of the chord progression, we required the index of the line at which the progression starts. Generally, the chord progression would start two lines below the line identifying the time signature. The time signature string always began with an M, which made it easy to identify, and no other lines within a given file started with an M. So, if i denotes the index of the line starting with M, i+2 was marked as the starting index of the progression. Since the order in which the fields appeared in the header lines was not consistent throughout the corpus, not all files had progressions starting at i+2. Therefore, some manual changes had to be made to the heading lines in a small number of raw **jazz files.

3.3 Transposing

Since the chords associated with a key are defined by the degrees in the scale associated with the key, any chord progression can be represented in any key. All that is required are the relative separations between the chords in the progression. Therefore, a chord progression can essentially be defined by the separations of the chords' pitches. This means that each chord progression can have a meaningful representation in a different key. It was hypothesised that transposing the jazz files would allow the network to recognise that the structure of a progression is independent of the key the song is defined in. This should improve the cross-validation results for the following reason: if the model encounters a sequence of chords during training and the same sequence appears in the validation or test set but in a different key, the model will be able to predict the sequence in the validation or test set with high accuracy.
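A small sketch of this key-independence: the same progression written in two keys yields an identical sequence of root-to-root separations (the helper and dictionary are illustrative):

    PITCH = {'A': 0, 'A#': 1, 'B-': 1, 'B': 2, 'C': 3, 'C#': 4, 'D-': 4,
             'D': 5, 'D#': 6, 'E-': 6, 'E': 7, 'F': 8, 'F#': 9, 'G-': 9,
             'G': 10, 'G#': 11, 'A-': 11}

    def intervals(roots):
        # Separation in half steps (mod 12) between consecutive roots.
        return [(PITCH[b] - PITCH[a]) % 12 for a, b in zip(roots, roots[1:])]

    # The opening roots of Afro Blue (Figure 1), and the same progression
    # shifted up by a whole step: the interval sequences are identical.
    print(intervals(['F', 'G', 'A-', 'G', 'F']))  # [2, 1, 11, 10]
    print(intervals(['G', 'A', 'B-', 'A', 'G']))  # [2, 1, 11, 10]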

3.3.1 Transposing to a single key

The transpositions were done through a function in the pre-processing file which extracts the key from a given jazz file and determines whether the key is major or minor. In the **jazz representation, major keys are uppercase letters while minor keys are lowercase. Based on this, the function transposes all major tunes to C major and all minor tunes to A minor. The transpose function determines the offset of the original key of a song from the key it is being transposed to. The offset is in terms of the number of half steps between notes. Two dictionaries were used to hold the notes and their separations from the destination key: one dictionary holds the offsets of each note name from the note C, and the other from the note A. The offset is determined by retrieving the value corresponding to the note from one of the two dictionaries. An array could have been used instead of a dictionary, initialised in such a way that the indices of the array represented the separation of the given note from the destination key. But dictionaries were used because several notes can have the same pitch and therefore the same separation from the note we are transposing to. Using a dictionary allowed the same separation value to be assigned to all notes that share the same pitch, while simply using the indexing of an array does not allow for this.

3.3.2 Transposing through all keys

There are 12 musical keys in total. Each key is separated from its adjacent key by one half step. If a tune is in the key of G minor, then shifting each root in the tune down in pitch by an offset of 1 will have the effect of transposing the tune to the key of F# minor. For the same tune in the key of G minor, shifting each root in the tune down in pitch by an offset of 2 will have the effect of transposing the tune to the key of F minor. And so, for a given tune, we have a variable which represents the offset we use for transposition, which is initialised to 1. The offset was incremented until it reached a value of 12, and each time we incremented the variable, we created a copy of the original file and used the current value of the offset variable to transpose the copy. This resulted in 12 copies of the song, each copy being the tune's representation in one of the 12 keys in music. The only data structures required for this method of transposing are an array holding all the notes and a dictionary whose elements are key-value pairs holding all the notes and their corresponding indices in the array. Since several notes map to the same element in the array, it is necessary to use both the array and the dictionary, for the same reasons expressed in the previous section. The index of the original key is obtained by retrieving it from the dictionary, and the value of the offset is added to this value to give the index of the new key or new root. The new key or root is retrieved from the array at the new index and replaces the old root or old key in the copy of the song.
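A minimal sketch of this array-plus-dictionary arrangement (names are illustrative; the actual pre-processing code may differ):

    # One canonical spelling per pitch; flat spellings map into the same
    # indices via the dictionary, which a bare array could not express.
    NOTES = ['A', 'A#', 'B', 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#']
    NOTE_INDEX = {n: i for i, n in enumerate(NOTES)}
    NOTE_INDEX.update({'B-': 1, 'D-': 4, 'E-': 6, 'G-': 9, 'A-': 11})

    def transpose(root, offset):
        # Shift a root or key name by `offset` half steps (mod 12);
        # a negative offset shifts down in pitch.
        return NOTES[(NOTE_INDEX[root] + offset) % 12]

    print(transpose('G', -1))  # 'F#': G minor shifted down one half step
    # Transposing through all keys: one copy per offset value 1..12.
    print([transpose('G', off) for off in range(1, 13)])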

3.3.3 Hyperparameters Combinations

I aimed to re-train the best model from those mentioned above using several combinations of hyperparameters. The hyperparameters I wanted to experiment with were the number of layers in the network and the number of hidden units in the network. The number of layers dictates how many LSTM cells are stacked on top of each other. LSTMs are stacked by feeding the output of one cell in as the input of another cell, and they are stacked in order to model hierarchical structures in the data. The number of hidden units represents the network's learning capacity. Having too few hidden units may result in a poor test perplexity, while having too many hidden units may cause the model to overfit the data. The following 6 combinations of hyperparameters were tested to see which combination resulted in the lowest test score:

1. 1 hidden layer, 300 hidden units
2. 2 hidden layers, 300 hidden units
3. 3 hidden layers, 300 hidden units
4. 1 hidden layer, 600 hidden units
5. 2 hidden layers, 600 hidden units
6. 3 hidden layers, 600 hidden units

Adjusting the hyperparameters was as simple as changing the values of the corresponding variables in the network configuration. Unfortunately, due to time constraints, I was unable to try these combinations on the model with the best test score. But I did achieve some results with the combinations on the original dataset.
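In TensorFlow language-model code of this kind, such a configuration typically looks like the sketch below (the variable names follow the common TensorFlow PTB example and are an assumption, not necessarily the exact code used):

    class Config(object):
        # Hyperparameters for one experimental run (illustrative values).
        num_layers = 2      # stacked LSTM cells: 1, 2 or 3 in the experiments
        hidden_size = 300   # hidden units per cell: 300 or 600
        max_max_epoch = 13  # total training epochs, as in the other runs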

4. Results

It is important to note that in the graphs below, though the test perplexity appears to be plotted for each epoch, the test perplexity was calculated once, after all 13 epochs were processed. It was included in the graphs to make for easy comparisons.

4.1 Splitting the Root and Extension of Chords

Graph 1: No Split

Graph 2: Split

We can see for the dataset with no split between the root and the extension that the validation perplexity starts to rise after 3 epochs while the training perplexity continues to decrease. This could be an indication of overfitting, but this is unlikely as the validation perplexity soon plateaus. The validation perplexity for the dataset with the root and extension split is non-increasing, which could imply it is generalising better than the model trained on the dataset with no split. For both datasets, the test perplexity is very close to the final validation perplexity. This is likely to be a good indication that the model is capturing a large amount of information about the fundamental structures in the progressions in the dataset within 13 epochs.

4.2 Transposing to a Single Key

Graph 3: Single key transpose, no split

Graph 4: Single key transpose, split

For both of these datasets, the validation perplexity more or less plateaus while the training perplexity reduces. Therefore, it would seem that transposing to a single key does not improve or worsen any signs of overfitting that weren't already present in the results in section 4.1. The final validation and test perplexities are very similar, just like the results in section 4.1.

4.3 Transposing Through All Keys

Graph 5: All key transpose, no split

Given that the scale of the perplexity axis for these results is much smaller than the scales in the previous result sections, we can assume the increases in validation perplexity to be negligible. This clearly shows that there is no sign of overfitting and that this model is generalising very well to the new data it encounters. The test perplexities of both models in this section are almost identical to their corresponding final validation perplexities. This is a very strong indication that the model has successfully learned the underlying structures of the chord progressions in the dataset. This is especially true for the model trained on the dataset which transposed all tunes through all 12 keys and split the root and extension, because its final test and validation perplexities are the closest to its final train perplexity out of all the models.

4.4 Hyperparameter Combinations

Graph 6: Hyperparameter Combinations

These results show that changing the hyperparameter values for the number of hidden layers and hidden units doesn't make a significant improvement to the test perplexity, especially when compared to the results achieved with the other methods. However, no conclusions can be made on the exact effectiveness of the hyperparameter values on the test perplexity.

4.5 Test Scores

Graph 7: Test Scores

When comparing the test scores, it is clear that separating the root from the extension of each chord and transposing each song through all 12 keys were the two techniques that were most effective in training a model which recognises the relevant musical structures in chord progressions. Combining these two techniques resulted in the model with the lowest test score. We can see that transposing through all keys achieved better results than transposing to one key. This is likely due to the fact that, when transposing through all keys, the model is more likely to encounter chord structures it has already encountered during training. A small part of it may also be due to the fact that the header information is identical across all transposed copies when the songs are transposed through all keys.

                             No Split    Split
Original
Transposed to the same key
Transposed through all keys

Table 1: Test Scores

We can see quantitatively from this table the size of the improvements made by the training techniques used. The model trained on the original data with no split had the highest test perplexity, while the model trained on the dataset where the roots were split from the extensions and the songs were transposed through all keys had the lowest. Combining the best training techniques therefore resulted in a substantial reduction in test perplexity.

5. Conclusions

From the results obtained, the best model was the one trained on dataset f), where each tune was transposed through all keys and the root of each chord was separated from the extension. This model had the lowest test score, which implies that, out of the 6 models that resulted from the experiments, it can predict the next element in a sequence with the lowest level of randomness. This model, like all the other models, showed no sign of overfitting, which implies the model can generalise well to new data it encounters and is therefore capable of generating novel chord progressions. At the very least, the model is unlikely to replicate progressions which are identical to songs the network encountered during training. However, the best way to test whether the model has learned the structures of chord progressions, and the structures of chords themselves, is to sample from the model and listen to the progressions it generates. The hyperparameter combinations didn't drastically improve the test score when applied to a model trained on the original data. But they may be more effective on the data where the root and extension are separated and the tunes are transposed.

5.1 Future Work

To further validate the conclusions made, I aim to extend this work by sampling from the best model and listening to the chord progressions the model is capable of producing. I aim to try the combinations of hyperparameters on the best model from this project and see if any of them can improve the test score further. An interesting extension would be to introduce an element of human interaction with the model by allowing people to define a seed sequence. When the network is fed the seed sequence, it can generate a sequence of chords which completes the progression.

References

[1] B. L. Sturm, J. F. Santos and I. Korshunova. Folk music style modelling by recurrent neural networks with long short term memory units. In Proc. 16th International Society for Music Information Retrieval Conference, 2015.

[2] D. Eck and J. Schmidhuber. Learning the long-term structure of the blues. In Proc. Int. Conf. on Artificial Neural Networks, 2002.

[3] J. A. Franklin. Recurrent neural networks for music computation. INFORMS Journal on Computing, 18(3):321-338, 2006.

Appendix

Appendix 1: Study Contract



Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video

Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video Mohamed Hassan, Taha Landolsi, Husameldin Mukhtar, and Tamer Shanableh College of Engineering American

More information

A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES

A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES 12th International Society for Music Information Retrieval Conference (ISMIR 2011) A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES Erdem Unal 1 Elaine Chew 2 Panayiotis Georgiou

More information

A Case Based Approach to the Generation of Musical Expression

A Case Based Approach to the Generation of Musical Expression A Case Based Approach to the Generation of Musical Expression Taizan Suzuki Takenobu Tokunaga Hozumi Tanaka Department of Computer Science Tokyo Institute of Technology 2-12-1, Oookayama, Meguro, Tokyo

More information

Feature-Based Analysis of Haydn String Quartets

Feature-Based Analysis of Haydn String Quartets Feature-Based Analysis of Haydn String Quartets Lawson Wong 5/5/2 Introduction When listening to multi-movement works, amateur listeners have almost certainly asked the following situation : Am I still

More information

Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons

Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons Róisín Loughran roisin.loughran@ul.ie Jacqueline Walker jacqueline.walker@ul.ie Michael O Neill University

More information

JazzGAN: Improvising with Generative Adversarial Networks

JazzGAN: Improvising with Generative Adversarial Networks JazzGAN: Improvising with Generative Adversarial Networks Nicholas Trieu and Robert M. Keller Harvey Mudd College Claremont, California, USA ntrieu@hmc.edu, keller@cs.hmc.edu Abstract For the purpose of

More information

BayesianBand: Jam Session System based on Mutual Prediction by User and System

BayesianBand: Jam Session System based on Mutual Prediction by User and System BayesianBand: Jam Session System based on Mutual Prediction by User and System Tetsuro Kitahara 12, Naoyuki Totani 1, Ryosuke Tokuami 1, and Haruhiro Katayose 12 1 School of Science and Technology, Kwansei

More information

Music Solo Performance

Music Solo Performance Music Solo Performance Aural and written examination October/November Introduction The Music Solo performance Aural and written examination (GA 3) will present a series of questions based on Unit 3 Outcome

More information

A STUDY ON LSTM NETWORKS FOR POLYPHONIC MUSIC SEQUENCE MODELLING

A STUDY ON LSTM NETWORKS FOR POLYPHONIC MUSIC SEQUENCE MODELLING A STUDY ON LSTM NETWORKS FOR POLYPHONIC MUSIC SEQUENCE MODELLING Adrien Ycart and Emmanouil Benetos Centre for Digital Music, Queen Mary University of London, UK {a.ycart, emmanouil.benetos}@qmul.ac.uk

More information

hit), and assume that longer incidental sounds (forest noise, water, wind noise) resemble a Gaussian noise distribution.

hit), and assume that longer incidental sounds (forest noise, water, wind noise) resemble a Gaussian noise distribution. CS 229 FINAL PROJECT A SOUNDHOUND FOR THE SOUNDS OF HOUNDS WEAKLY SUPERVISED MODELING OF ANIMAL SOUNDS ROBERT COLCORD, ETHAN GELLER, MATTHEW HORTON Abstract: We propose a hybrid approach to generating

More information

COMPARING RNN PARAMETERS FOR MELODIC SIMILARITY

COMPARING RNN PARAMETERS FOR MELODIC SIMILARITY COMPARING RNN PARAMETERS FOR MELODIC SIMILARITY Tian Cheng, Satoru Fukayama, Masataka Goto National Institute of Advanced Industrial Science and Technology (AIST), Japan {tian.cheng, s.fukayama, m.goto}@aist.go.jp

More information

Data-Driven Solo Voice Enhancement for Jazz Music Retrieval

Data-Driven Solo Voice Enhancement for Jazz Music Retrieval Data-Driven Solo Voice Enhancement for Jazz Music Retrieval Stefan Balke1, Christian Dittmar1, Jakob Abeßer2, Meinard Müller1 1International Audio Laboratories Erlangen 2Fraunhofer Institute for Digital

More information

Bach in a Box - Real-Time Harmony

Bach in a Box - Real-Time Harmony Bach in a Box - Real-Time Harmony Randall R. Spangler and Rodney M. Goodman* Computation and Neural Systems California Institute of Technology, 136-93 Pasadena, CA 91125 Jim Hawkinst 88B Milton Grove Stoke

More information

Popular Music Theory Syllabus Guide

Popular Music Theory Syllabus Guide Popular Music Theory Syllabus Guide 2015-2018 www.rockschool.co.uk v1.0 Table of Contents 3 Introduction 6 Debut 9 Grade 1 12 Grade 2 15 Grade 3 18 Grade 4 21 Grade 5 24 Grade 6 27 Grade 7 30 Grade 8 33

More information

Melody classification using patterns

Melody classification using patterns Melody classification using patterns Darrell Conklin Department of Computing City University London United Kingdom conklin@city.ac.uk Abstract. A new method for symbolic music classification is proposed,

More information

Pattern Discovery and Matching in Polyphonic Music and Other Multidimensional Datasets

Pattern Discovery and Matching in Polyphonic Music and Other Multidimensional Datasets Pattern Discovery and Matching in Polyphonic Music and Other Multidimensional Datasets David Meredith Department of Computing, City University, London. dave@titanmusic.com Geraint A. Wiggins Department

More information

Laughbot: Detecting Humor in Spoken Language with Language and Audio Cues

Laughbot: Detecting Humor in Spoken Language with Language and Audio Cues Laughbot: Detecting Humor in Spoken Language with Language and Audio Cues Kate Park, Annie Hu, Natalie Muenster Email: katepark@stanford.edu, anniehu@stanford.edu, ncm000@stanford.edu Abstract We propose

More information

A Bayesian Network for Real-Time Musical Accompaniment

A Bayesian Network for Real-Time Musical Accompaniment A Bayesian Network for Real-Time Musical Accompaniment Christopher Raphael Department of Mathematics and Statistics, University of Massachusetts at Amherst, Amherst, MA 01003-4515, raphael~math.umass.edu

More information

Deep learning for music data processing

Deep learning for music data processing Deep learning for music data processing A personal (re)view of the state-of-the-art Jordi Pons www.jordipons.me Music Technology Group, DTIC, Universitat Pompeu Fabra, Barcelona. 31st January 2017 Jordi

More information

Retiming Sequential Circuits for Low Power

Retiming Sequential Circuits for Low Power Retiming Sequential Circuits for Low Power José Monteiro, Srinivas Devadas Department of EECS MIT, Cambridge, MA Abhijit Ghosh Mitsubishi Electric Research Laboratories Sunnyvale, CA Abstract Switching

More information

Finding Sarcasm in Reddit Postings: A Deep Learning Approach

Finding Sarcasm in Reddit Postings: A Deep Learning Approach Finding Sarcasm in Reddit Postings: A Deep Learning Approach Nick Guo, Ruchir Shah {nickguo, ruchirfs}@stanford.edu Abstract We use the recently published Self-Annotated Reddit Corpus (SARC) with a recurrent

More information

A repetition-based framework for lyric alignment in popular songs

A repetition-based framework for lyric alignment in popular songs A repetition-based framework for lyric alignment in popular songs ABSTRACT LUONG Minh Thang and KAN Min Yen Department of Computer Science, School of Computing, National University of Singapore We examine

More information

Estimation of inter-rater reliability

Estimation of inter-rater reliability Estimation of inter-rater reliability January 2013 Note: This report is best printed in colour so that the graphs are clear. Vikas Dhawan & Tom Bramley ARD Research Division Cambridge Assessment Ofqual/13/5260

More information

Finding Temporal Structure in Music: Blues Improvisation with LSTM Recurrent Networks

Finding Temporal Structure in Music: Blues Improvisation with LSTM Recurrent Networks Finding Temporal Structure in Music: Blues Improvisation with LSTM Recurrent Networks Douglas Eck and Jürgen Schmidhuber IDSIA Istituto Dalle Molle di Studi sull Intelligenza Artificiale Galleria 2, 6928

More information

Building a Better Bach with Markov Chains

Building a Better Bach with Markov Chains Building a Better Bach with Markov Chains CS701 Implementation Project, Timothy Crocker December 18, 2015 1 Abstract For my implementation project, I explored the field of algorithmic music composition

More information

Permutations of the Octagon: An Aesthetic-Mathematical Dialectic

Permutations of the Octagon: An Aesthetic-Mathematical Dialectic Proceedings of Bridges 2015: Mathematics, Music, Art, Architecture, Culture Permutations of the Octagon: An Aesthetic-Mathematical Dialectic James Mai School of Art / Campus Box 5620 Illinois State University

More information

Predicting the immediate future with Recurrent Neural Networks: Pre-training and Applications

Predicting the immediate future with Recurrent Neural Networks: Pre-training and Applications Predicting the immediate future with Recurrent Neural Networks: Pre-training and Applications Introduction Brandon Richardson December 16, 2011 Research preformed from the last 5 years has shown that the

More information

Automated Accompaniment

Automated Accompaniment Automated Tyler Seacrest University of Nebraska, Lincoln April 20, 2007 Artificial Intelligence Professor Surkan The problem as originally stated: The problem as originally stated: ˆ Proposed Input The

More information

RNN-Based Generation of Polyphonic Music and Jazz Improvisation

RNN-Based Generation of Polyphonic Music and Jazz Improvisation University of Denver Digital Commons @ DU Electronic Theses and Dissertations Graduate Studies 1-1-2018 RNN-Based Generation of Polyphonic Music and Jazz Improvisation Andrew Hannum University of Denver

More information

Primo Theory. Level 7 Revised Edition. by Robert Centeno

Primo Theory. Level 7 Revised Edition. by Robert Centeno Primo Theory Level 7 Revised Edition by Robert Centeno Primo Publishing Copyright 2016 by Robert Centeno All rights reserved. Printed in the U.S.A. www.primopublishing.com version: 2.0 How to Use This

More information

A combination of approaches to solve Task How Many Ratings? of the KDD CUP 2007

A combination of approaches to solve Task How Many Ratings? of the KDD CUP 2007 A combination of approaches to solve Tas How Many Ratings? of the KDD CUP 2007 Jorge Sueiras C/ Arequipa +34 9 382 45 54 orge.sueiras@neo-metrics.com Daniel Vélez C/ Arequipa +34 9 382 45 54 José Luis

More information

Chord Label Personalization through Deep Learning of Integrated Harmonic Interval-based Representations

Chord Label Personalization through Deep Learning of Integrated Harmonic Interval-based Representations Chord Label Personalization through Deep Learning of Integrated Harmonic Interval-based Representations Hendrik Vincent Koops 1, W. Bas de Haas 2, Jeroen Bransen 2, and Anja Volk 1 arxiv:1706.09552v1 [cs.sd]

More information

The Human Features of Music.

The Human Features of Music. The Human Features of Music. Bachelor Thesis Artificial Intelligence, Social Studies, Radboud University Nijmegen Chris Kemper, s4359410 Supervisor: Makiko Sadakata Artificial Intelligence, Social Studies,

More information

Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University

Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University danny1@stanford.edu 1. Motivation and Goal Music has long been a way for people to express their emotions. And because we all have a

More information

Centre for Economic Policy Research

Centre for Economic Policy Research The Australian National University Centre for Economic Policy Research DISCUSSION PAPER The Reliability of Matches in the 2002-2004 Vietnam Household Living Standards Survey Panel Brian McCaig DISCUSSION

More information

Characteristics of Polyphonic Music Style and Markov Model of Pitch-Class Intervals

Characteristics of Polyphonic Music Style and Markov Model of Pitch-Class Intervals Characteristics of Polyphonic Music Style and Markov Model of Pitch-Class Intervals Eita Nakamura and Shinji Takaki National Institute of Informatics, Tokyo 101-8430, Japan eita.nakamura@gmail.com, takaki@nii.ac.jp

More information

Methodologies for Creating Symbolic Early Music Corpora for Musicological Research

Methodologies for Creating Symbolic Early Music Corpora for Musicological Research Methodologies for Creating Symbolic Early Music Corpora for Musicological Research Cory McKay (Marianopolis College) Julie Cumming (McGill University) Jonathan Stuchbery (McGill University) Ichiro Fujinaga

More information

Composing with Pitch-Class Sets

Composing with Pitch-Class Sets Composing with Pitch-Class Sets Using Pitch-Class Sets as a Compositional Tool 0 1 2 3 4 5 6 7 8 9 10 11 Pitches are labeled with numbers, which are enharmonically equivalent (e.g., pc 6 = G flat, F sharp,

More information