Generating Music from Text: Mapping Embeddings to a VAE's Latent Space


MSc Artificial Intelligence
Master Thesis

Generating Music from Text: Mapping Embeddings to a VAE's Latent Space

by Roderick van der Weerdt

August 15, 2018
EC
January – August 2018

Supervisor: Dr. A. Meroño Peñuela
Examiner: Prof. Dr. F.A.H. van Harmelen
Assessor: Dr. K.S. Schlobach

VU Amsterdam

Abstract

Music has always been used to elevate the mood in movies and poetry, adding emotions which might not have been there without the music. Unfortunately only the most musical people are capable of creating music, let alone the appropriate music. This paper proposes a system that takes a piece of text as input; the representation of that text is then transformed into the latent space of a VAE capable of generating music. The latent space of the VAE contains representations of songs, and the transformed vector can be decoded from it as a song. An experiment was performed to test this system by presenting a text to seven experts, along with two pieces of music, one of which was created from the text. On average the music generated from the text was recognized in only half of the examples, but recognition of the individual poems was consistent across rounds, showing a relation between the poems and the generated music.

Contents

1 Introduction
2 Related Work
  2.1 Background
    2.1.1 Music Generating with RNN
    2.1.2 Music Generation with VAE
    2.1.3 The MIDI Representation
    2.1.4 Text Based Variational AutoEncoders
    2.1.5 Word2Vec
3 Method
  3.1 Encoding the Lyrics (with Word2Vec)
  3.2 Bringing the Latent Spaces Together
  3.3 Decoding the Music (with Magenta's MusicVAE)
4 Experiment
  4.1 Implementation
5 Results
  5.1 Discussion
6 Conclusion
References
A Poems used during the Experiment
B Table of the Results

1 Introduction

Music has always been used to elevate the mood in pictures and poetry, adding emotions which might not have been there without the music. When a different piece of music is played during a movie it can change a scene from horror to comedy, completely changing the impact of what is seen by something that is heard. The famous shower scene from the movie Psycho would not be famous if not for the music [2], bringing a sense of horror before any visual clues are given. Another example of this synergy, now between text and music, happens on the radio. When a presenter has prepared a piece of text, they often select a piece of music to play along with the message. Not only does this give some more depth to the text, it can also be used to embed an extra meaning in the text, just as is done in cinema. The problem lies in the time it takes to determine which piece of stock music fits the text, if one fits at all. It would be preferable to create a new piece of music every time one is required; unfortunately, creating music to specifically augment one scene or one piece of text requires a lot of time. A musician with enough skill would have to be available to create meaningful music, and even then it takes time for the musician to interpret the meaning of the text or scene in order to channel it into a piece of music. Generating a new piece of music specifically for the text would be ideal, but at the moment a way to do this does not exist. Automated creation and addition of music could help in this situation, and could help everyone else augment their creations.

Research has been exploring different ways to generate new music based on existing music for years [4], using statistical models to learn the transitions between notes from existing music in order to generate new compositions. More recently, advanced neural networks have been used to generate entirely new music¹ [16, 14]. These models are trained on a large set of music, resulting in models from which new music can be generated, influenced by the music used during training. But none of this work has sought to create music based on another medium, such as text.

This paper details the research of creating music based on a specific text. This requires a model that can generate specific music and also interpret text to base this music on. The newly generated music should fit the text similarly to how the music of a song fits the lyrics of that same song: when an artist creates a new song, the music and the lyrics work together, and are written together, in order to enhance the emotion the artist wants to express. In order to create a connection between the feelings addressed by the text and the feelings addressed by the music, a mapping will have to be trained that connects a text with a piece of music. This research works under the hypothesis that the text and music of every song express the same feelings, allowing songs to be used to train a model based on this similarity. The model is trained to transform the representation of a text into a representation of a piece of music. Training this model requires two further models that are able to create a latent space in which the text (in the first) and the music (in the second) can be represented. This can be summed up in the research question: "To what extent can the relationship between lyrics and melodies be learned from representations of text and music trained with large collections using embeddings and VAE?"
The contributions of this research are:

- A comparison between state-of-the-art music generation algorithms (Section 2).
- A novel approach to combine two latent spaces (Section 3).
- An evaluation of the proposed system (Sections 4 and 5).

Section 2 starts with a survey of research related to the goal of generating music, examining different ways of automatically generating music and how to represent the text input for the generation of the music.

¹ Whether it can be considered to be new music is discussed in Section 5.1.

Section 3 will detail the system used for the experiment in Section 4. The results of the experiment will be examined and discussed, along with the complications, in Section 5, and lastly an answer to the research question will be given in Section 6.

2 Related Work

The field of Machine Learning has been experimenting with different approaches to generate music; this section surveys some of these past approaches. The automated generation of music can be considered to progress concurrently with the generation of text [4]. Initially, statistical models were used to generate music [4]. Statistical models like Markov models [13] were used to generate music based on existing music, looking at the transitions in the existing music in order to re-create similar transitions in the new music. This allowed for pieces that sounded different, but the transitions were still limited to what already existed.

When RNN (Recurrent Neural Network) [21] models became more practically usable, Sutskever et al. created an RNN that was capable of generating the next word in a sentence [17]. This system was character based, meaning there was no representation of the words in the system, but it still generated existing words, learning enough from the characters. This even allowed the system to create words that were not in the training set. For words this might not be convenient, because non-existing words are not useful, but in music this would be an advantage. Further experiments with RNNs yielded entire pages of generated text [5], which at a glance were not recognizably different from a real text.

The effectiveness of RNNs in the text domain led Simon and Oore to investigate the possibilities of music generation with RNNs [16]. Their research mainly focused on adding expressive timing and dynamics to automatically generated music. This allowed the system to generate music, from which unlimited samples could be taken, but to make the music sound different the model would have to be retrained entirely.

In order to create a system that did not have to be retrained, and to allow the user to work with different possible outputs, Bowman et al. used a VAE (Variational AutoEncoder) [11] to generate texts [1]. The VAE allowed them to train a latent space and from there manipulate their texts, generating interpolations of different texts, showing the transition from one line into the other. Another study used the same system to generate new tweets based on a large collection of tweets [15]. As music generation tends to follow in the footsteps of text generation, a VAE was also used to generate music [14]. The VAE of Roberts et al. allowed the user to manipulate and, in their own words, "doodle" with the music, generating different pieces of music. But it requires music as input to start the manipulations.

All of the previous RNN and VAE implementations that generate music use MIDI [19] representations of music. Other research has been experimenting with the generation of music based on the signal, or raw audio, of the music [22]. Zukowski and Carr use a SampleRNN which is able to train on a small training set, in their experiments only one album, and produce similar music. The system would have to be retrained for every new training set, and the model would generate hours of music, from which only minutes were usable as new music.

2.1 Background

This section describes the methods from related research that are used in the system of this research.

2.1.1 Music Generating with RNN

Simon and Oore use an RNN to generate new music [16]. RNNs are models consisting of multiple layers of nodes where every node is connected to every other node within that same layer. The benefit of this is that each node can influence every other node, resulting in a more expressive network. Often the nodes used in RNNs are LSTM (Long Short-Term Memory) nodes (or units) [7], which differ from regular nodes in that they remember the previous inputs (with a decay over time), allowing previous input to influence the next input and to be influenced by the input before that. This feature makes RNNs very suitable for longer sentences², where each word can affect other words. A second effective usage of RNNs is with images as input, where multiple pixels together mean something different from those same pixels separately [20].

In their research, Simon and Oore intended to achieve more dynamic music [16] by using an RNN trained on approximately 1400 piano recordings performed by skilled pianists, and by using a different music representation from the one applied in the past. Instead of representing a longer note as a sequence of short notes being played, Simon and Oore chose a notation more similar to the MIDI encoding, in which the beginning (note-on event) and end (note-off event) encode the duration of a note (for more on MIDI see Section 2.1.3).

The resulting implementation is able to generate music based on its input, which is (initially) 1400 music pieces. When those 1400 pieces of music are replaced with different music, different music will be generated. To make this viable for the system of this research, the representation of the text would have to take the form of approximately 1400 pieces of music that are representative of the text. This would also require retraining the RNN for every new input text, which is a costly process. The research only used high-quality MIDI files with only one instrument playing, which does not scale to modern music, where many different kinds of instruments are played.

2.1.2 Music Generation with VAE

Another way to generate music is to use the MusicVAE proposed by Roberts et al. [14]. VAEs were first proposed by Kingma and Welling [11]. They combine two RNNs with a latent space in between, as depicted in Figure 1. The first RNN serves as an encoder of the input data (for example music), which encodes the data as a vector in the latent space. By training the VAE sufficiently, the latent space comes to represent all the input data. The second RNN is a decoder that allows a vector in the latent space to be decoded into a piece of music. When a piece of music is encoded to a vector in the latent space and that same vector is then decoded, the new piece of music will resemble the original piece, but it will not be identical: because the latent space is a relatively low-dimensional space in which to represent the pieces of music, information about the music will be lost. This is not a problem, since the purpose of the VAE is not to replicate music but to generate new music.

MusicVAE uses the same music representation as the RNN uses internally, taking MIDI files as input. Training the VAE requires a lot more music than training the RNN, because the latent space must be trained enough to be representative. Roberts et al. [14] used approximately 1.5 million unique MIDI files to train MusicVAE.
One large benefit of using MusicVAE over the RNN from Simon and Oore [16] is its reusability: MusicVAE only needs to be trained once to be able to generate new pieces of music. However, it does need some kind of input to base its new music on, so as not to simply generate random new music. Two different algorithms to replace the encoder will be examined in Sections 2.1.4 and 2.1.5, both able to encode text into an embedding.

Figure 1: Visualization of the VAE used by [14]. First a piece of music is encoded into latent space z, and subsequently this encoding is decoded from the latent space back into a (similar) piece of music.³

² As famously explained in the blog post written by Andrej Karpathy [10].
³ Original image taken from
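To make this encode/decode round trip concrete, the following is a minimal sketch using Magenta's published Python API for MusicVAE. The config name, checkpoint path, and file names are illustrative assumptions, not details taken from this thesis.

```python
import note_seq
from magenta.models.music_vae import configs
from magenta.models.music_vae.trained_model import TrainedModel

# Load a pre-trained 16-bar melody model (checkpoint path is hypothetical).
config = configs.CONFIG_MAP['hierdec-mel_16bar']
model = TrainedModel(config, batch_size=1,
                     checkpoint_dir_or_path='hierdec-mel_16bar.ckpt')

# Encode a melody into the latent space: z holds one vector per input sequence.
sequence = note_seq.midi_file_to_note_sequence('input_melody.mid')
z, mu, sigma = model.encode([sequence])

# Decode the same vector back; the result resembles, but does not exactly
# reproduce, the input, because the low-dimensional latent space is lossy.
decoded = model.decode(z, length=256)  # 16 bars at 16 steps per bar
note_seq.note_sequence_to_midi_file(decoded[0], 'roundtrip_melody.mid')
```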

2.1.3 The MIDI Representation

Both the MusicVAE [14] and the Performance RNN [16] use music files encoded in the MIDI format. MIDI (Musical Instrument Digital Interface) is an encoding for musical information originally intended for use between electronic instruments and computers [19]. What makes the MIDI format special compared to more regular music formats such as MP3 or WAV is that it does not contain a compressed recording of a song. Instead it contains separate tracks, each of which in turn contains the notes and musical events of a song in symbolic form, resulting in a machine-interpretable music score. MIDI events tell the instrument everything it needs to know in order to produce the correct sound. This includes events about what kind of instrument should be synthesized, but also when a note should start and when a note should stop. This nature of the MIDI file means that it cannot be played back as other music files can; every track needs to be synthesized in order to hear the song. The advantage this brings is that each individual track can be extracted, which is not possible with the more conventional music formats. This allows the training of models on only one musical part, instead of on all the instruments playing simultaneously (illustrated in the code sketch below).

2.1.4 Text Based Variational AutoEncoders

VAEs trained on large text corpora have had success in creating a latent space capable of generating new sentences. Bowman et al. [1] trained their VAE on a large collection of ebooks, and it was capable of generating natural sentences. Semeniuta et al. [15] trained a VAE on Twitter data, creating a VAE that was able to generate new tweets. The benefit of using a VAE to encode the lyrics of the songs would be that the latent spaces of the lyrics VAE and the music VAE are similar in nature. Following the hypothesis that the lyrics and music have the same meaning, the latent spaces might be assumed to also be shaped similarly, which would make a transformation between the latent spaces easier to create. But using a VAE would also create a problem: the latent space of the VAE does not create an exact representation of the lyrics, meaning that information about the lyrics will be lost, and a transformation from something merely similar to the text into the music latent space might create too much noise. On top of that, the training data used for the VAEs of Bowman et al. [1] and Semeniuta et al. [15] do not fit the purposes of this research, since they differ too much from lyrics.
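The track-extraction property of MIDI described in Section 2.1.3 is easy to demonstrate in code. Below is a small sketch using the pretty_midi library; the library choice and file name are assumptions, since the thesis does not name its tooling here.

```python
import pretty_midi

# Parse a MIDI file; each Instrument object corresponds to one track.
pm = pretty_midi.PrettyMIDI('song.mid')

# Keep only melody tracks: non-drum tracks whose instrument has a MIDI
# identifier between 1 and 32, the definition also used in Section 3.2
# (pretty_midi stores program numbers 0-based, hence the range 0-31).
melody_tracks = [inst for inst in pm.instruments
                 if not inst.is_drum and inst.program < 32]

# Every track holds purely symbolic events, so a single musical part can be
# inspected (or trained on) in isolation.
for inst in melody_tracks:
    if inst.notes:
        note = inst.notes[0]
        print(inst.name, note.pitch, note.velocity, note.start, note.end)
```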

2.1.5 Word2Vec

A simpler solution would be to use Word2Vec [12]. Word2Vec is a small feed-forward neural network often used in NLP tasks. It allows the user to train a model on a large set of words. The model creates a vector for every word and trains these vectors such that the vectors of words similar to each other end up closer together than those of words that are dissimilar. This vector space can be compared to the latent space of the VAEs, creating a large representation of the words used during training. The vector space of a Word2Vec model differs from the latent space of a VAE in that it keeps all the information about the words: when a word is encoded to a vector, the vector will be decoded back into the original word. This is more fitting for the system proposed in this paper, since it allows keeping as much information about the lyrics as possible.

3 Method

The proposed system uses a simple learning algorithm, an MLP [8], in order to infer a function that maps a latent space of lyrics, learned through Word2Vec [12], to a latent space of melodic musical sequences in MIDI, learned through MusicVAE [14]. Figure 2 displays the data flow of the system. First the input text is encoded and embedded into the first latent space (z₁), as explained in Section 3.1. Secondly, the text representation is used as input for the MLP, which transforms it into the latent space (z₂) of the VAE; the transformation (T) is described in Section 3.2. The resulting embedding subsequently gets decoded into a MIDI file (Section 3.3). All the code used during this research, and instructions on how to use it, can be found online. This section expands upon the implementation, focusing on these three components, starting with the lyrics representation.

Figure 2: Visualization of the entire system; the arrows show the data flow from the input text through w2v into the first latent space z₁, through the transformation T into the second latent space z₂, and through the VAE decoder out as music.

3.1 Encoding the Lyrics (with Word2Vec)

The representation of the lyrics was created using Word2Vec. Word2Vec, as described in Section 2.1.5, allows for the training of vectors, or embeddings, representing words. Embeddings of words more similar to each other will be closer to each other than those of words that are dissimilar. For this project, the embeddings were trained only on lyrics from songs, instead of the more usual collections of texts. This was done in order to train only on the musical usage of the words, and not on the more regular use of words as practised in, for example, news articles [12] or product reviews [6], both of which are more often used as training data for Word2Vec embeddings. The decision to use only lyrics as training data was based on research [3] stating that, when there is enough data available, it is best to train on the data you are trying to represent, instead of on a more general dataset.
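As a minimal sketch of this training step, the following uses the Gensim Word2Vec implementation also named in Section 4.1. The corpus file and its format are hypothetical; the hyperparameters are the ones reported in Section 4.1.

```python
from gensim.models import Word2Vec

# lyrics.txt is a hypothetical corpus file with one preprocessed lyric
# sentence per line; each training sample is a list of tokens.
with open('lyrics.txt', encoding='utf-8') as f:
    sentences = [line.split() for line in f if line.strip()]

# Hyperparameters reported in Section 4.1: 100-dimensional vectors, a
# sliding window of three, and a minimum word count of five.
model = Word2Vec(sentences, vector_size=100, window=3, min_count=5)
model.save('lyrics_w2v.model')

# Words used in similar (musical) contexts receive nearby vectors
# (assuming 'love' occurs often enough in the corpus).
print(model.wv.most_similar('love', topn=5))
```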

Because the training of the transformation (T) from the text latent space into the music latent space (described in Section 3.2) uses sentences of text instead of single words, the representation of the words alone is not enough. To create the representation of longer pieces of text, the mean of the vector representations of all the words in the text was taken, resulting in one vector representing the piece of text.

3.2 Bringing the Latent Spaces Together

In order to connect the latent space of the lyrics to the latent space of the music, an MLP (Multilayer Perceptron) [8] was trained to learn the transformation from pieces of lyrics into pieces of music. A visualization of this transformation (T) can be seen in Figure 3. The vector representations of the text are taken as input for the transformation (T), and the vector representations of the pieces of music are used as the target output during training. A sketch of this mapping step is given below.

Figure 3: Visualization of the transformation (T) performed by the MLP, with the dimensions of z₁ and z₂ reduced to two so they can be displayed.

The training set consisted of lyrics and music pairs, taken from 409 songs. All of those songs were MIDI files that contained a track in which the lyrics of the song are timed with the rest of the music.⁴ These timed lyrics allowed the MIDI to be broken up into smaller parts, creating a training set of lyric parts with a maximum of 140 characters, each paired with a piece of music from the original song, starting at the moment the lyrics start in the song and continuing for 16 bars (because MusicVAE works best with pieces of music 16 bars long). Keeping the parts at a maximum of 140 characters allowed the parts to be similar in length, even though sentences in songs often are not. The cut-off point of a part was never in the middle of a word, but always at the end of a sentence, resulting in some lyric parts being a little shorter than 140 characters, thereby maintaining the structure of the sentences.

Instead of selecting which melody line to use by examining the music or the track names [18], all the MIDI tracks using melody instruments were initially cut up into the 16-bar pieces paired with the lyrics. Melody tracks, as defined by [14], are the tracks using instruments with a MIDI identifier between 1 and 32. Using the MusicVAE model, all those music pieces were tested on whether they could be encoded into the latent space. For each song, the track that contained the most pieces of music that could be encoded was selected, and only the pieces of music from that track were used to train the MLP model.

⁴ These kinds of MIDI files are often used for karaoke, where the lyrics must play along with the song.
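The sketch below shows the text averaging of Section 3.1 together with the mapping step, using scikit-learn's MLPRegressor as one possible realization (the thesis does not name a library). The file names and stand-in training data are hypothetical, and the 256-node second hidden layer is an assumption, since the size of the second layer is not given here (see Section 4.1).

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.neural_network import MLPRegressor

# Load the lyric embeddings trained in the sketch of Section 3.1.
w2v = Word2Vec.load('lyrics_w2v.model')

def embed_text(text: str) -> np.ndarray:
    """Mean of the word embeddings of a lyric part (Section 3.1)."""
    vectors = [w2v.wv[w] for w in text.split() if w in w2v.wv]
    if not vectors:  # no known words: fall back to the zero vector
        return np.zeros(w2v.vector_size)
    return np.mean(vectors, axis=0)

# Hypothetical training pairs: lyric parts and, as targets, the 512-dimensional
# MusicVAE latent vectors of the paired 16-bar pieces (stand-in data here).
lyric_parts = ['hope is the thing with feathers', 'stop all the clocks']
X = np.stack([embed_text(part) for part in lyric_parts])   # (n_pairs, 100)
Z = np.random.randn(len(lyric_parts), 512)                 # (n_pairs, 512)

# The transformation T: an MLP trained with stochastic gradient descent and a
# squared-error loss (Section 4.1); the 256-node second layer is an assumption.
T = MLPRegressor(hidden_layer_sizes=(256, 256), solver='sgd', max_iter=2000)
T.fit(X, Z)

# Map a new text embedding from z1 into a point in the music latent space z2.
z2 = T.predict(embed_text('a thousand kisses deep').reshape(1, -1))
print(z2.shape)  # (1, 512)
```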

3.3 Decoding the Music (with Magenta's MusicVAE)

For the implementation, a pre-trained VAE model originally trained by Magenta was used [14]. The model was trained on music sequences taken from a large MIDI dataset, where for every MIDI file melody parts were retrieved with a length of 16 bars (which, for a song with a rhythm of 120 bpm, approximately comes down to chunks of 30 seconds). The latent space of the VAE has 512 dimensions.

4 Experiment

In order to test the effectiveness of the created system, an experiment was performed. The questions consisted of short pieces of text, mostly excerpts from poems. Along with each text, two pieces of music were played; one of the pieces was generated from the text used in the question, and the second piece of music was generated from a separate piece of text. After both pieces of music had finished, the participants were asked to mark down which of the two they found most similar to the text, based on the feelings expressed by the music. The goal of the experiment is to determine whether the music created by the system is similar enough to the text used to create that specific piece of music that a different piece of music (created from a different text) is recognizably different.

Juslin and Laukka state the importance of the explicit difference between emotion expressed by music and emotion induced by music [9]. Emotion expressed by music is the emotion corresponding to the song itself. The induced emotion is the emotion that a listener feels when listening to the song. These two emotions do not have to correspond; it is, for example, possible to feel happy from listening to a sad song. Even though this research, and therefore this experiment, does not focus specifically on the emotion expressed by the music, but on the bigger, less specific feeling expressed by the music, professionals were used for the experiment. These professionals were instructed to (and are expected to be able to) focus on the expressed feelings over the induced feelings.

Seven professionals participated in the experiment, answering questions about eight different poems, each time having to select the best-fitting piece of music out of two different pieces. The participants were all part of a training program learning how to work on the radio. Part of their education is learning what kind of music should be played while different texts are read on the radio. This makes them very capable participants, better suited to perform the experiment than an untrained person. Out of the eight poems that were used during the experiment, six were excerpts taken from pre-existing poems by famous poets and two were only one word. The poems can be found in Appendix A.

4.1 Implementation

Approximately 57,650 song lyrics, scraped from the LyricsFreak website, were used to train the Word2Vec embeddings. All of the songs are modern music⁵ from all different genres, ranging from pop to metal to hip hop, but all are written in English. All the lyrics were processed by the system, removing all punctuation marks and empty lines, but the sentences were kept intact, as this could influence the sliding window. A sketch of this preprocessing is given below.

⁵ As opposed to classical music.
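A sketch of this preprocessing step, assuming a simple regular-expression cleanup (the exact rules are not specified in the thesis beyond removing punctuation and empty lines):

```python
import re

def preprocess_lyrics(raw: str) -> list[str]:
    # Remove punctuation marks but keep word characters and whitespace,
    # so each sentence (and hence the sliding-window context) stays intact.
    cleaned = (re.sub(r'[^\w\s]', '', line) for line in raw.splitlines())
    # Drop empty lines; one list entry per surviving sentence.
    return [line.strip() for line in cleaned if line.strip()]

raw = "Stop all the clocks, cut off the telephone,\n\nSilence the pianos..."
for sentence in preprocess_lyrics(raw):
    print(sentence.split())  # token lists, ready for Word2Vec training
```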

Figure 4: Correctly answered questions for each participant (P1–P7) of the experiment, with confidence intervals.

The Word2Vec implementation of Gensim has been used, which is a Python port of the original implementation by Google [12]. Standard parameters with a vector size of 100, a sliding window size of three, and a minimal word count of five were used for the model. The MLP used for the transformation (T) had two hidden layers, of which the first had 256 nodes. Stochastic gradient descent was used to train the model, with the mean squared error as the loss function. It was trained until convergence.

5 Results

The raw results of the experiment can be found in Appendix B. Seven experts participated and answered fifteen questions; the experiment had to be ended prematurely because a radio broadcast had to start, resulting in only fifteen questions being answered.

Figure 5: Normalized number of correct answers for each question (Q1–Q15).

Figure 4 visualizes the accuracy of the participants based on their answers. Out of the fifteen questions, each expert on average answered 6.71 questions correctly, with at most eight and at least five correct answers for the individual participants. This means the experts chose the wrong piece of music for the text more frequently than the correct piece of music. On average, each question was answered correctly by 45% of the experts, with the best-answered question having six of the seven experts giving the correct answer. Figure 5 displays how often each question was answered correctly. The worst-answered questions were correctly answered by only one of the experts. An ANOVA test showed that there is no significant difference between the questions, with a p-value of 0.053. Figure 6 shows a comparison of the results of the first and second time a text was used. A paired t-test between the first and second time gives a p-value of 0.741, showing no significant difference between the two rounds: participants answered consistently for the same poem, indicating that the text of the poem was relevant to whether it would be recognized or not.

Figure 6: Normalized number of correct answers for each poem, for the first and second round in which it was used.

5.1 Discussion

Given the similarity between the results of the questions with the same poems, a relation between the poem and the piece of music with which it is associated is shown. This is evidence that the text did affect the music that was generated by the system. Even though the system might not have given the best-fitting music for each text, it did give similar results for each poem in both rounds, meaning that the transformation, although not correct, did map to specific points in the latent space.

The ANOVA test over the different questions had a p-value of 0.053, which is so close to being significant that a slightly longer experiment, especially since this experiment was cut short, could already be enough to make the outcome significant.

During the experiment, participants mentioned that all the music started to sound similar after the first few music clips, and considering they had to listen to 30 different MIDI clips, it could have been too much to hear the differences. MIDI music is always more static compared to real music, since it is synthesized instead of played. This suggests that MIDI might not be the format best suited for these applications, or that the MIDI should be synthesized with actual (digital) instruments, instead of on a laptop.

Something that should also be mentioned is whether this kind of generated music should be considered actual new music. All the information the model (be it an RNN or a VAE) uses comes from existing music, so any generated music can only be a product of the original training data. One might consider this to mean that any generated music is unoriginal. But a more fitting comparison might be with inspiration: just like real musicians are inspired, the model also requires inspiration to create something new.

After the experiment, one of the participants mentioned that the music commonly used to play under text on the radio has a lower temperature than the music created by the system. Temperature is also discussed by [14], but it is not something that can be controlled in the way the decoder is used here. Adding this functionality is something future research might explore as a way to create more usable music.

For future research it would also be interesting to examine whether the Word2Vec model could be replaced with a VAE trained on lyrics. In the first place it could be used for the same purposes as in this research, but the added benefit would be that the transformation might be turned around, resulting in a system that could generate text from music.

6 Conclusion

In order to answer the research question of this paper, "To what extent can the relationship between lyrics and melodies be learned from representations of text and music trained with large collections using embeddings and VAE?", a system was created. The system is able to take a piece of text and encode it into an embedding using Word2Vec. Transforming the embedding with the MLP into an embedding in the MusicVAE latent space allows the decoder of MusicVAE to decode this second embedding into a piece of music.

Experiments were performed with music generated by the system for specific poems, where every poem was presented to the participants of the experiment together with two pieces of music: one generated from this poem, and one from a different poem. The participants answered only 45% of the questions correctly, showing the system did not produce the correct music for the poems. All of the poems were used twice, both times with music created specifically for that poem, but the music itself was different. The answers show that the participants had very similar preferences when a poem was used the second time. This indicates that the system, although not producing the correct music, did generate music specific to the poems.

Based on this, the answer to the research question is that the relationship between lyrics and melodies can be learned from representations of text and music: even though the poems and music might not be recognizably the same, they were consistently (dis-)similar, showing the extent of the relationship to be consistent between the text and the music.

References

[1] Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. arXiv preprint arXiv:1511.06349, 2015.

[2] Royal S. Brown. Herrmann, Hitchcock, and the music of the irrational. Cinema Journal, pages 14–49, 1982.

[3] Erion Çano and Maurizio Morisio. Quality of word embeddings on sentiment analysis tasks. In International Conference on Applications of Natural Language to Information Systems. Springer, 2017.

[4] Darrell Conklin. Music generation from statistical models. In Proceedings of the AISB 2003 Symposium on Artificial Intelligence and Creativity in the Arts and Sciences, pages 30–35, 2003.

[5] Alex Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.

[6] Ruining He and Julian McAuley. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2016.

[7] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[8] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

[9] Patrik N. Juslin and Petri Laukka. Expression, perception, and induction of musical emotions: A review and a questionnaire study of everyday listening. Journal of New Music Research, 33(3):217–238, 2004.

[10] Andrej Karpathy. The unreasonable effectiveness of recurrent neural networks. karpathy.github.io/2015/05/21/rnn-effectiveness/, May 2015.

[11] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.

[12] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

[13] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

[14] Adam Roberts, Jesse Engel, Colin Raffel, Curtis Hawthorne, and Douglas Eck. A hierarchical latent vector model for learning long-term structure in music. arXiv preprint arXiv:1803.05428, 2018.

[15] Stanislau Semeniuta, Aliaksei Severyn, and Erhardt Barth. A hybrid convolutional variational autoencoder for text generation. arXiv preprint arXiv:1702.02390, 2017.

[16] Ian Simon and Sageev Oore. Performance RNN: Generating music with expressive timing and dynamics. Magenta Blog, 2017.

[17] Ilya Sutskever, James Martens, and Geoffrey E. Hinton. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1017–1024, 2011.

[18] Michael Tang, Yip Chi Lap, and Ben Kao. Selection of melody lines for music databases. In Computer Software and Applications Conference, COMPSAC 2000, The 24th Annual International. IEEE, 2000.

[19] The MIDI Manufacturers Association. The complete MIDI 1.0 detailed specification. Tech. rep., The MIDI Manufacturers Association, Los Angeles, CA. midi.org/specifications/item/the-midi-1-0-specification.

[20] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3156–3164, 2015.

[21] Ronald J. Williams and David Zipser. Gradient-based learning algorithms for recurrent networks and their computational complexity. In Backpropagation: Theory, Architectures, and Applications, 1995.

[22] Zack Zukowski and CJ Carr. Generating black metal and math rock: Beyond Bach, Beethoven, and Beatles. In 31st Conference on Neural Information Processing Systems (NIPS 2017), 2017.

A Poems used during the Experiment

Hope is the thing with feathers -
That perches in the soul -
And sings the tune without the words -
And never stops - at all -
And sweetest - in the Gale - is heard -
And sore must be the storm -
That could abash the little Bird
That kept so many warm -
I've heard it in the chillest land -
And on the strangest Sea -
Yet - never - in Extremity,
It asked a crumb - of me.

Poem 1: Emily Dickinson - Hope is the Thing with Feathers

You came to me this morning
And you handled me like meat
You'd have to be a man to know
How good that feels, how sweet
My mirrored twin, my next of kin
I'd know you in my sleep
And who but you would take me in
A thousand kisses deep

Poem 2: Leonard Cohen - A Thousand Kisses Deep

Stop all the clocks, cut off the telephone,
Prevent the dog from barking with a juicy bone,
Silence the pianos and with muffled drum
Bring out the coffin, let the mourners come.

Poem 3: W.H. Auden - Funeral Blues

I was a child and she was a child,
In this kingdom by the sea:
But we loved with a love that was more than love
I and my Annabel Lee;
With a love that the winged seraphs of heaven
Laughed loud at her and me.

Poem 4: E.A. Poe - Annabel Lee

Happy the man, and happy he alone,
He who can call today his own:
He who, secure within, can say,
Tomorrow do thy worst, for I have lived today.
Be fair or foul or rain or shine
The joys I have possessed, in spite of fate, are mine.
Not Heaven itself upon the past has power,
But what has been, has been, and I have had my hour.

Poem 5: John Dryden - Happy the Man

So I would have had him leave,
So I would have had her stand and grieve,
So he would have left
As the soul leaves the body torn and bruised,
As the mind deserts the body it has used.
I should find
Some way incomparably light and deft,
Some way we both should understand,
Simple and faithless as a smile and a shake of the hand.

Poem 6: T.S. Eliot - The Weeping Girl

LOVE

Poem 7

HATE

Poem 8

B Table of the Results

Question   Average correct
Q1         0.29
Q2         0.14
Q3         0.71
Q4         0.29
Q5         0.57
Q6         0.43
Q7         0.57
Q8         0.57
Q9         0.14
Q10        0.43
Q11        0.29
Q12        0.71
Q13        0.86
Q14        0.14
Q15        0.57

Average correct per question: 0.45
Average total correct per expert: 6.71

Table 1: Results of the experiment performed with seven participants (P1–P7), shown as the fraction of participants who answered each question correctly.


More information

SentiMozart: Music Generation based on Emotions

SentiMozart: Music Generation based on Emotions SentiMozart: Music Generation based on Emotions Rishi Madhok 1,, Shivali Goel 2, and Shweta Garg 1, 1 Department of Computer Science and Engineering, Delhi Technological University, New Delhi, India 2

More information

The Sparsity of Simple Recurrent Networks in Musical Structure Learning

The Sparsity of Simple Recurrent Networks in Musical Structure Learning The Sparsity of Simple Recurrent Networks in Musical Structure Learning Kat R. Agres (kra9@cornell.edu) Department of Psychology, Cornell University, 211 Uris Hall Ithaca, NY 14853 USA Jordan E. DeLong

More information

Various Artificial Intelligence Techniques For Automated Melody Generation

Various Artificial Intelligence Techniques For Automated Melody Generation Various Artificial Intelligence Techniques For Automated Melody Generation Nikahat Kazi Computer Engineering Department, Thadomal Shahani Engineering College, Mumbai, India Shalini Bhatia Assistant Professor,

More information

... A Pseudo-Statistical Approach to Commercial Boundary Detection. Prasanna V Rangarajan Dept of Electrical Engineering Columbia University

... A Pseudo-Statistical Approach to Commercial Boundary Detection. Prasanna V Rangarajan Dept of Electrical Engineering Columbia University A Pseudo-Statistical Approach to Commercial Boundary Detection........ Prasanna V Rangarajan Dept of Electrical Engineering Columbia University pvr2001@columbia.edu 1. Introduction Searching and browsing

More information

Finding Sarcasm in Reddit Postings: A Deep Learning Approach

Finding Sarcasm in Reddit Postings: A Deep Learning Approach Finding Sarcasm in Reddit Postings: A Deep Learning Approach Nick Guo, Ruchir Shah {nickguo, ruchirfs}@stanford.edu Abstract We use the recently published Self-Annotated Reddit Corpus (SARC) with a recurrent

More information

Deep Recurrent Music Writer: Memory-enhanced Variational Autoencoder-based Musical Score Composition and an Objective Measure

Deep Recurrent Music Writer: Memory-enhanced Variational Autoencoder-based Musical Score Composition and an Objective Measure Deep Recurrent Music Writer: Memory-enhanced Variational Autoencoder-based Musical Score Composition and an Objective Measure Romain Sabathé, Eduardo Coutinho, and Björn Schuller Department of Computing,

More information

A Fast Alignment Scheme for Automatic OCR Evaluation of Books

A Fast Alignment Scheme for Automatic OCR Evaluation of Books A Fast Alignment Scheme for Automatic OCR Evaluation of Books Ismet Zeki Yalniz, R. Manmatha Multimedia Indexing and Retrieval Group Dept. of Computer Science, University of Massachusetts Amherst, MA,

More information

CSC475 Music Information Retrieval

CSC475 Music Information Retrieval CSC475 Music Information Retrieval Symbolic Music Representations George Tzanetakis University of Victoria 2014 G. Tzanetakis 1 / 30 Table of Contents I 1 Western Common Music Notation 2 Digital Formats

More information

Music Radar: A Web-based Query by Humming System

Music Radar: A Web-based Query by Humming System Music Radar: A Web-based Query by Humming System Lianjie Cao, Peng Hao, Chunmeng Zhou Computer Science Department, Purdue University, 305 N. University Street West Lafayette, IN 47907-2107 {cao62, pengh,

More information

Melody classification using patterns

Melody classification using patterns Melody classification using patterns Darrell Conklin Department of Computing City University London United Kingdom conklin@city.ac.uk Abstract. A new method for symbolic music classification is proposed,

More information

Rewind: A Music Transcription Method

Rewind: A Music Transcription Method University of Nevada, Reno Rewind: A Music Transcription Method A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering by

More information

arxiv: v1 [cs.ir] 16 Jan 2019

arxiv: v1 [cs.ir] 16 Jan 2019 It s Only Words And Words Are All I Have Manash Pratim Barman 1, Kavish Dahekar 2, Abhinav Anshuman 3, and Amit Awekar 4 1 Indian Institute of Information Technology, Guwahati 2 SAP Labs, Bengaluru 3 Dell

More information

Automatic Piano Music Transcription

Automatic Piano Music Transcription Automatic Piano Music Transcription Jianyu Fan Qiuhan Wang Xin Li Jianyu.Fan.Gr@dartmouth.edu Qiuhan.Wang.Gr@dartmouth.edu Xi.Li.Gr@dartmouth.edu 1. Introduction Writing down the score while listening

More information

Automatic characterization of ornamentation from bassoon recordings for expressive synthesis

Automatic characterization of ornamentation from bassoon recordings for expressive synthesis Automatic characterization of ornamentation from bassoon recordings for expressive synthesis Montserrat Puiggròs, Emilia Gómez, Rafael Ramírez, Xavier Serra Music technology Group Universitat Pompeu Fabra

More information

An AI Approach to Automatic Natural Music Transcription

An AI Approach to Automatic Natural Music Transcription An AI Approach to Automatic Natural Music Transcription Michael Bereket Stanford University Stanford, CA mbereket@stanford.edu Karey Shi Stanford Univeristy Stanford, CA kareyshi@stanford.edu Abstract

More information

Computational Modelling of Harmony

Computational Modelling of Harmony Computational Modelling of Harmony Simon Dixon Centre for Digital Music, Queen Mary University of London, Mile End Rd, London E1 4NS, UK simon.dixon@elec.qmul.ac.uk http://www.elec.qmul.ac.uk/people/simond

More information

Using Deep Learning to Annotate Karaoke Songs

Using Deep Learning to Annotate Karaoke Songs Distributed Computing Using Deep Learning to Annotate Karaoke Songs Semester Thesis Juliette Faille faillej@student.ethz.ch Distributed Computing Group Computer Engineering and Networks Laboratory ETH

More information

Jazz Melody Generation from Recurrent Network Learning of Several Human Melodies

Jazz Melody Generation from Recurrent Network Learning of Several Human Melodies Jazz Melody Generation from Recurrent Network Learning of Several Human Melodies Judy Franklin Computer Science Department Smith College Northampton, MA 01063 Abstract Recurrent (neural) networks have

More information

Less is More: Picking Informative Frames for Video Captioning

Less is More: Picking Informative Frames for Video Captioning Less is More: Picking Informative Frames for Video Captioning ECCV 2018 Yangyu Chen 1, Shuhui Wang 2, Weigang Zhang 3 and Qingming Huang 1,2 1 University of Chinese Academy of Science, Beijing, 100049,

More information

arxiv: v1 [cs.ir] 20 Mar 2019

arxiv: v1 [cs.ir] 20 Mar 2019 Distributed Vector Representations of Folksong Motifs Aitor Arronte Alvarez 1 and Francisco Gómez-Martin 2 arxiv:1903.08756v1 [cs.ir] 20 Mar 2019 1 Center for Language and Technology, University of Hawaii

More information

arxiv: v1 [cs.sd] 21 May 2018

arxiv: v1 [cs.sd] 21 May 2018 A Universal Music Translation Network Noam Mor, Lior Wolf, Adam Polyak, Yaniv Taigman Facebook AI Research arxiv:1805.07848v1 [cs.sd] 21 May 2018 Abstract We present a method for translating music across

More information

Shimon the Robot Film Composer and DeepScore

Shimon the Robot Film Composer and DeepScore Shimon the Robot Film Composer and DeepScore Richard Savery and Gil Weinberg Georgia Institute of Technology {rsavery3, gilw} @gatech.edu Abstract. Composing for a film requires developing an understanding

More information

Chord Classification of an Audio Signal using Artificial Neural Network

Chord Classification of an Audio Signal using Artificial Neural Network Chord Classification of an Audio Signal using Artificial Neural Network Ronesh Shrestha Student, Department of Electrical and Electronic Engineering, Kathmandu University, Dhulikhel, Nepal ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

arxiv: v1 [cs.sd] 9 Dec 2017

arxiv: v1 [cs.sd] 9 Dec 2017 Music Generation by Deep Learning Challenges and Directions Jean-Pierre Briot François Pachet Sorbonne Universités, UPMC Univ Paris 06, CNRS, LIP6, Paris, France Jean-Pierre.Briot@lip6.fr Spotify Creator

More information

An Introduction to Deep Image Aesthetics

An Introduction to Deep Image Aesthetics Seminar in Laboratory of Visual Intelligence and Pattern Analysis (VIPA) An Introduction to Deep Image Aesthetics Yongcheng Jing College of Computer Science and Technology Zhejiang University Zhenchuan

More information

QUALITY OF COMPUTER MUSIC USING MIDI LANGUAGE FOR DIGITAL MUSIC ARRANGEMENT

QUALITY OF COMPUTER MUSIC USING MIDI LANGUAGE FOR DIGITAL MUSIC ARRANGEMENT QUALITY OF COMPUTER MUSIC USING MIDI LANGUAGE FOR DIGITAL MUSIC ARRANGEMENT Pandan Pareanom Purwacandra 1, Ferry Wahyu Wibowo 2 Informatics Engineering, STMIK AMIKOM Yogyakarta 1 pandanharmony@gmail.com,

More information

Sequence generation and classification with VAEs and RNNs

Sequence generation and classification with VAEs and RNNs Jay Hennig 1 * Akash Umakantha 1 * Ryan Williamson 1 * 1. Introduction Variational autoencoders (VAEs) (Kingma & Welling, 2013) are a popular approach for performing unsupervised learning that can also

More information

Repeating and mistranslating: the associations of GANs in an art context

Repeating and mistranslating: the associations of GANs in an art context Repeating and mistranslating: the associations of GANs in an art context Anna Ridler Artist London anna.ridler@network.rca.ac.uk Abstract Briefly considering the lack of language to talk about GAN generated

More information

Music Similarity and Cover Song Identification: The Case of Jazz

Music Similarity and Cover Song Identification: The Case of Jazz Music Similarity and Cover Song Identification: The Case of Jazz Simon Dixon and Peter Foster s.e.dixon@qmul.ac.uk Centre for Digital Music School of Electronic Engineering and Computer Science Queen Mary

More information

Experiments on musical instrument separation using multiplecause

Experiments on musical instrument separation using multiplecause Experiments on musical instrument separation using multiplecause models J Klingseisen and M D Plumbley* Department of Electronic Engineering King's College London * - Corresponding Author - mark.plumbley@kcl.ac.uk

More information

A STATISTICAL VIEW ON THE EXPRESSIVE TIMING OF PIANO ROLLED CHORDS

A STATISTICAL VIEW ON THE EXPRESSIVE TIMING OF PIANO ROLLED CHORDS A STATISTICAL VIEW ON THE EXPRESSIVE TIMING OF PIANO ROLLED CHORDS Mutian Fu 1 Guangyu Xia 2 Roger Dannenberg 2 Larry Wasserman 2 1 School of Music, Carnegie Mellon University, USA 2 School of Computer

More information

CREATING all forms of art [1], [2], [3], [4], including

CREATING all forms of art [1], [2], [3], [4], including Grammar Argumented LSTM Neural Networks with Note-Level Encoding for Music Composition Zheng Sun, Jiaqi Liu, Zewang Zhang, Jingwen Chen, Zhao Huo, Ching Hua Lee, and Xiao Zhang 1 arxiv:1611.05416v1 [cs.lg]

More information

Predicting Similar Songs Using Musical Structure Armin Namavari, Blake Howell, Gene Lewis

Predicting Similar Songs Using Musical Structure Armin Namavari, Blake Howell, Gene Lewis Predicting Similar Songs Using Musical Structure Armin Namavari, Blake Howell, Gene Lewis 1 Introduction In this work we propose a music genre classification method that directly analyzes the structure

More information