Talking Drums: Generating drum grooves with neural networks

P. Hutchings
Monash University, Melbourne, Australia
pehut2@student.monash.edu
arXiv:1706.09558v1 [cs.SD] 29 Jun 2017

Presented is a method of generating a full drum kit part for a provided kick-drum sequence. A sequence-to-sequence neural network model used in natural language translation was adopted to encode multiple musical styles. An online survey was developed to test different techniques for sampling the output of the softmax function. The strongest results were found using a sampling technique that drew from the three most probable outputs at each subdivision of the drum pattern, but the consistency of output was found to be heavily dependent on style.

Keywords: RNN, percussion, generative music, translation

1 Introduction

This research details the development of a percussion-role agent as part of a larger project. In that project, virtual, self-rating agents with different musical roles work in a process of co-agency to generate music compositions in real time [Hutchings and McCormack, 2017]. The percussion-role agent was developed to generate multiple possible multi-instrument percussion parts to accompany provided melodies and harmonies in real time.

A neural-network-based agent was developed to incorporate a range of different music styles from a large corpus of compositions and to utilise a softmax function as part of the self-rating process. A network architecture used in natural language translation was adopted. This was based on the idea that a percussion score can be considered as containing multiple drums speaking different languages but saying the same thing at the same time. The network was trained on a collection of drum kit scores from over 250 pop, rock, funk and Afro-Cuban style compositions, together with patterns from drum technique books. The output of the network was evaluated with an online survey, and a physical interface was developed for feeding kick-drum parts into the network.

1.1 Related work

Markov models [Hawryshkewich et al., 2010; Tidemann and Demiris, 2008], generative grammars [Bell and Kippen, 1992] and neural network models [Choi et al., 2016] have all been shown to be effective for drum score generation. The approach taken in this paper is driven by the requirements of an agent in a multi-agent composition system. Research in this area has demonstrated the need for agent models to match the needs of the overall system [Eigenfeldt and Pasquier, 2009].

The similarities and differences between music and natural language have been explored in detail [Patel, 2003; Mithen, 2011]. Distinct differences exist in terms of cognitive processing, semantics and cultural function. There are nevertheless similarities in the structure of phrases, and these have led to the use of natural language processing techniques in the analysis and generation of music.

1.2 Translation model

Generating a full drum kit score from the rhythm of one or more individual instruments in the kit poses different challenges from natural language translation. All translations are one-to-one in word count: every subdivision of the input part corresponds to exactly one subdivision of the output. Music is a non-semantic form of communication that allows for, and values, greater structural variation than spoken language, so imperfect translations can still be effective. Conversely, there is no single correct translation. The training data therefore contain many different outputs for a given input, which slows convergence during training. The problem can also be viewed as one of data expansion, as a single instrument part is expanded to fill a full drum kit with multiple concurrent instruments. A new syntax for expressing drum parts was developed to take advantage of these strengths and diminish the weaknesses of a translation-based neural network model.

2 Method

2.1 Data preprocessing

A collection of 250 drum kit scores in 4/4 was gathered from drum tablature websites and books and parsed into MusicXML format.
Tracks were selected from the most viewed webpages for rock, pop, funk and Afro-Cuban styles of music. Each was checked for accuracy by comparing it with the original recording by ear. Pop, rock and funk styles were selected due to their global popularity and typical use of a standard drum kit. The Afro-Cuban style was added to see whether some of its stricter idiomatic structures, such as the clave rhythmic pattern, could be preserved. Afro-Cuban and funk drum tablatures were more difficult to find, so they were augmented with patterns from drum technique instruction books. For each genre a total of 7000-7500 bars were parsed.

Each bar was divided into 48 subdivisions, allowing all triplet and tuplet divisions down to the resolution of semiquaver triplets to be represented. Each subdivision was given a word token representing the drums hit on that subdivision. Barlines were replaced with a word token describing the musical style, which allowed multiple styles to be encoded in a single RNN. The tokenised phrase in Equation 1 represents a kick drum struck on each beat of a single 4/4 bar, together with a pop style description.

pop K o o o o o o o o o o o K o o o o o o o o o o o K o o o o o o o o o o o K o o o o o o o o o o o    (1)

The full list of letter representations used to create word tokens is presented in Table 1. Composition segments of 4 bars were used as sentences for training the neural network. Kick-drum patterns were used as inputs to the encoder layer, and the rest of the drum parts were used in the decoder layer. Encoder input sequences were reversed and encoded using one-hot encoding. The kick drum was selected as the input language because it usually marks the beat of a composition, and small changes to it can dramatically affect the feeling of time.

Table 1: Letter representations of drums

Drum        Letter
Cymbal      C
Hi-hat      H
Snare       S
High Tom    T
Tom         t
Floor Tom   F
Kick        K
None        o

2.2 Network architecture

The neural network has an RNN sequence-to-sequence architecture [Sutskever et al., 2014] implemented with the TensorFlow deep-learning framework [Abadi et al., 2015]. A model with 3 layers of size 128 produced a perplexity of 1.15 when trained with a learning rate of 0.55 and a gradient descent optimiser. This was the lowest perplexity achieved in manual testing of variations to these hyper-parameters. Hidden states were initialised with all-zero values and updated at each step of training.

3 Evaluation

An online survey was created to find a sampling technique that human listeners found preferable. The survey was advertised on social media groups related to drumming and computer music, and it ran for two weeks.

3.1 Survey

Participants were presented with a style menu and a 48-step sequencer with an editable kick-drum line, which they could use to design a four-beat kick-drum pattern, as seen in Fig. 1. After they clicked a "Generate Groove" button on the interface, the other instrument parts were generated. A loop of the pattern then began playing with sounds sampled from drum kits.
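Both the training data of Section 2.1 and the kick-drum line entered in the step sequencer have to be turned into word tokens before they reach the network. The following is a minimal Python sketch of that token scheme under stated assumptions. It is not the author's code. The function names are hypothetical. The order of letters inside a combined token follows Table 1 and is an assumption. Placing the style token at the start of every bar in both source and target is also an assumption. The sketch builds a (source, target) sentence from a segment of bars, then reverses and one-hot encodes the source as described above.

```python
from typing import Dict, List, Set, Tuple

STEPS_PER_BAR = 48          # 12 subdivisions per crotchet beat in 4/4
KICK, NONE = "K", "o"        # Table 1 letters for kick and "no drum"
# Non-kick letters from Table 1; the concatenation order used for simultaneous
# hits is an assumption (the paper does not specify it).
KIT_LETTERS = "CHSTtF"       # cymbal, hi-hat, snare, high tom, tom, floor tom

Bar = Dict[str, Set[int]]    # drum letter -> steps (0..47) on which it is hit


def kick_tokens(bar: Bar, style: str) -> List[str]:
    """Encoder-side tokens for one bar: a style token, then K or o per step."""
    hits = bar.get(KICK, set())
    return [style] + [KICK if s in hits else NONE for s in range(STEPS_PER_BAR)]


def kit_tokens(bar: Bar, style: str) -> List[str]:
    """Decoder-side tokens for one bar: letters of every non-kick drum per step."""
    tokens = [style]
    for s in range(STEPS_PER_BAR):
        word = "".join(d for d in KIT_LETTERS if s in bar.get(d, set()))
        tokens.append(word or NONE)
    return tokens


def make_pair(bars: List[Bar], style: str) -> Tuple[List[str], List[str]]:
    """Build one (source, target) sentence from a segment of bars (4 in the paper)."""
    src = [t for bar in bars for t in kick_tokens(bar, style)]
    tgt = [t for bar in bars for t in kit_tokens(bar, style)]
    return src, tgt


def one_hot_reversed(tokens: List[str], vocab: List[str]) -> List[List[int]]:
    """Reverse the encoder input and one-hot encode each token against vocab."""
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for tok in reversed(tokens):
        row = [0] * len(vocab)
        row[index[tok]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Kick on every beat (as in Equation 1), snare on beats 2 and 4, quaver hi-hats.
    bar = {"K": {0, 12, 24, 36}, "S": {12, 36}, "H": set(range(0, 48, 6))}
    src, tgt = make_pair([bar] * 4, "pop")
    print(" ".join(src[:13]))   # pop K o o o o o o o o o o o
    print(tgt[13])              # HS  (hi-hat and snare together on beat 2)
    print(len(src), len(tgt))   # 196 196
    enc_input = one_hot_reversed(src, sorted(set(src)))
    print(len(enc_input), len(enc_input[0]))  # 196 3
```

In the example, the first bar of the source reproduces Equation 1. The source and target sentences also have the same length, which mirrors the one-to-one word count discussed in Section 1.2. The kick line from the sequencer would only need `kick_tokens` and `one_hot_reversed`.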
Participants were then asked to rate the groove as poor, average or good. The survey was designed to encourage a fast and playful experience, so demographic data were not collected. Each time a groove was generated, the web application ran the input through the neural network and randomly selected a sampling method. Three sampling methods were tested:

- a greedy decoder (Method 1);
- a roulette-wheel sampler across all probabilities (Method 2);
- a roulette-wheel sampler across the three most probable tokens at each subdivision (Method 3).
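The following short Python sketch illustrates the three sampling methods compared in the survey. It is not the author's implementation, and the function names are hypothetical. For simplicity, it takes a precomputed list of per-subdivision logit vectors. In an actual sequence-to-sequence decoder, each step's distribution depends on the token chosen at the previous step, so `pick` would be called inside the decoding loop. The sketch also returns the mean softmax probability of the chosen tokens. That is the kind of quantity binned in Table 3.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one subdivision's token logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick(probs: Sequence[float], method: int, rng: random.Random) -> int:
    """Choose a token index using one of the three survey methods."""
    if method == 1:  # greedy decoder: always the most probable token
        return max(range(len(probs)), key=probs.__getitem__)
    if method == 2:  # roulette wheel over all tokens
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    if method == 3:  # roulette wheel over the three most probable tokens
        top3 = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:3]
        # choices() renormalises the three remaining weights implicitly
        return rng.choices(top3, weights=[probs[i] for i in top3], k=1)[0]
    raise ValueError(f"unknown sampling method: {method}")


def decode(step_logits: Sequence[Sequence[float]], method: int,
           seed: int = 0) -> Tuple[List[int], float]:
    """Pick one token per subdivision; return (tokens, mean chosen probability)."""
    rng = random.Random(seed)
    chosen, chosen_probs = [], []
    for logits in step_logits:
        probs = softmax(logits)
        i = pick(probs, method, rng)
        chosen.append(i)
        chosen_probs.append(probs[i])  # probability before any top-3 renormalisation
    return chosen, sum(chosen_probs) / len(chosen_probs)


if __name__ == "__main__":
    logits = [[2.0, 1.0, 0.5, -1.0], [0.1, 3.0, 0.2, 0.0], [1.0, 1.0, 4.0, 0.0]]
    for method in (1, 2, 3):
        tokens, mean_p = decode(logits, method, seed=42)
        print(method, tokens, round(mean_p, 2))
```

Method 1 always takes the most probable token. For the same step distributions, its mean chosen probability is therefore an upper bound on that of Methods 2 and 3. Method 3 sits between deterministic greedy decoding and unrestricted roulette-wheel sampling, because it never chooses a token outside the top three.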

Figure 1: Interface for the online evaluation survey

3.2 Results

A total of 1278 groove evaluations were recorded in the survey.

Table 2: Survey results for different sampling methods

            Raw                      Normalised
            Good  Average  Poor      Good  Average  Poor
Method 1    91    276      30        0.23  0.70     0.08
Method 2    100   217      125       0.23  0.49     0.28
Method 3    172   183      84        0.39  0.42     0.19

As shown in Table 2, the model produced full drum-kit patterns that were rated average or good in a majority of ratings on the web survey. Of the three sampling methods, the greedy decoder tended towards results that participants deemed average. The roulette-wheel sampling used in Method 2 had the highest rate of poor ratings. Overall, the best performer was the sampler that drew from the three most probable tokens at each subdivision. Examples of 5 drum patterns for each sampling method are available to listen to at https://doi.org/10.6084/m9.figshare.4903181.v1.

Table 3: Mean rating by mean initial probability of selected notes (poor = 0, average = 1, good = 2)

Mean probability   Mean rating
0.2-0.3            0.25
0.3-0.4            0.27
0.4-0.5            0.58
0.5-0.6            1.14
0.6-0.7            1.32
0.7-0.8            1.54
0.8-0.9            1.22

4 Discussion and future work

The ratings in Table 3 peaked when the mean probability was between 0.7 and 0.8, below the maximum observed bracket of 0.8-0.9. This may be a result of participants valuing familiar but different drum patterns over patterns that they may have heard in songs they know. The significantly higher rating of one probability band supports the use of the model in the intended multi-agent system, as it provides a means of self-rating output.

Mean ratings of Afro-Cuban style patterns were significantly lower (24% poor) than for other styles (16-18% poor). This may be the result of stylistic bias among the participants, or it could suggest that important elements of the style are not represented in the model output. A syntax for expressing desired accents is being developed for the encoder to expand the palette, and it may improve results in the Afro-Cuban and other styles. A physical drum-pedal interface has been developed to test the system with drummers in a natural playing position.

References

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/. Software available from tensorflow.org.

Bernard Bell and Jim Kippen. Bol processor grammars. In Understanding Music with AI: Perspectives on Music Cognition, 1992.

Keunwoo Choi, George Fazekas, and Mark B. Sandler. Text-based LSTM networks for automatic music composition. CoRR, abs/1604.05358, 2016. URL http://arxiv.org/abs/1604.05358.

Arne Eigenfeldt and Philippe Pasquier. A realtime generative music system using autonomous melody, harmony, and rhythm agents. In XIII International Conference on Generative Arts, Milan, Italy, 2009.

Andrew Hawryshkewich, Philippe Pasquier, and Arne Eigenfeldt. Beatback: A real-time interactive percussion system for rhythmic practise and exploration. In NIME, pages 100-105, 2010.

Patrick Hutchings and Jon McCormack. Using autonomous agents to improvise music compositions in real-time. In International Conference on Evolutionary and Biologically Inspired Music and Art, pages 114-127. Springer, 2017.

S. Mithen. The Singing Neanderthals: The Origins of Music, Language, Mind and Body. Orion, 2011. ISBN 9781780222585. URL https://books.google.com.au/books?id=3ap0tkerd_wc.

Aniruddh D. Patel. Language, music, syntax and the brain. Nature Neuroscience, 6(7):674-681, 2003.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104-3112, 2014.

Axel Tidemann and Yiannis Demiris. A drum machine that learns to groove. In Annual Conference on Artificial Intelligence, pages 144-151. Springer, 2008.