AutoChorale: An Automatic Music Generator
Jack Mi, Zhengtao Jin

1 Introduction

Music is a fascinating form of human expression built on a complex system of rules. Being able to automatically compose music that both carries aesthetic value and complies with the formal constraints of composition sheds new light on the medium and presents an interesting challenge. In this project, we are interested in how a machine can mimic the process of a music student learning elementary composition, and generate music automatically.

2 Task Definition and Evaluation

Our system takes in a learning set consisting of music files and user inputs specifying aspects of the composition (tempo, approximate length, key, etc.), and it outputs a group of pieces based on those inputs. The learning set and all outputs are MIDI files. We made this decision because of the wide availability of MIDI files on the web, and because there are ready-made Python packages that handle the binary-level reading and writing of the format. For the sake of simplicity and clarity, both the learning set and the output are limited to the scope of chorale music. We chose Bach chorales [1] to learn from because they are widely regarded as the standard and the high point of the genre, they are accessible, there is a decent number of them, and their chord progressions are standard and relatively straightforward.

For evaluating our results, we decided to rely mainly on human judgment, for two reasons. First, there is currently no accessible software that can accurately judge how good a piece is, and writing one would be as difficult as writing our composition machine. Second, the audience of music, as an art, is human, so humans should be the ultimate judges. Our evaluation has two steps. First, we posted audio samples on the web and invited friends (most of whom are not music specialists) and music specialists (music majors and faculty) to fill out surveys about how much they liked the pieces.
Second, we invited music students to analyze some of our scores and grade the compositions by a music student's standard.

3 Infrastructure

The purpose of the whole project is to mimic the process of a music student learning music. The point of our infrastructure is to translate musical information into a form the machine can use, and to turn the machine's output back into audible music. We built the whole infrastructure almost from scratch. First, we need to read and parse useful information out of MIDI files; this is the job of the Analyzer class. Then, after the program finishes composing, we need to transform the notes into a MIDI file so that our composition can be readily heard or imported into score-writing software for analysis.

4 Approaches

We divide our task into four steps:

1. Read and parse useful information from MIDI files (chord analyzer).
2. Generate a relatively abstract music structure, i.e. a chord progression with a bass line (learner and chord generator).
3. Flesh out the abstract structure into detailed notes in each track (layout).
4. Write out the notes into a MIDI file (output).

4.1 Reading Music from MIDI

Before we can learn from the musical materials, we need to parse the important information out of MIDI files. This is done in the following steps.

i. Parsing Messages from MIDI Files with the Mido Package

We chose MIDI files because they are currently among the most accessible music file types online. For our task of learning to write chorales, we found a free collection of the complete Bach chorales in MIDI online, and we mainly use BWV 250-438 at first, since they are relatively straightforward (unlike the more complicated chorales from the cantatas and masses). To save time, we did not want to parse the MIDI files by hand at the binary level, so we use a Python package called Mido [2]. Mido reads the data bytes and parses out the messages in each track. It reads basic information such as the resolution of the music (ticks_per_beat), and it reads in all tracks and all notes, including meta messages such as the key signature and time signature. However, because a MIDI file is just a set of tracks, each a pile of messages with individual delta times, there is no concept of a chord or of simultaneous notes; in other words, we have no idea which notes are sounding at a given time unless we count from the start of each track. Therefore, we have to do some work to line these notes up, so that we can analyze the chord progressions and do something interesting.

ii. Reading Notes for Each Beat with the Analyzer Class

For this task we use the Analyzer class, which we implemented from scratch. It takes in a MidiFile instance as defined by the Mido package and investigates the note relations using a global clock across all tracks. To read the notes, we start from time = 0 and step one beat (ticks_per_beat ticks, as defined in the resolution) at a time. For each track, we keep a variable recording the accumulated time of all messages so far. On each step, we first record all notes that are on and sustained from the previous beat. Then, while the track's time is less than the global time, we record the next message and increment the track's time as well as its current message index. We keep doing this until every track has reached an end_of_track message or run out of notes. After walking through the MIDI file once, we have a score-like structure that records which notes sound on which beat. For each beat, we record the number of appearances of each note, and we also sort the notes to find the bass note.

iii. Identifying Chords from a Patch of Notes

In order to learn a general method of chord generation for all keys, we first convert note numbers to key-relative scale degrees (0 to 11, following Western convention). The task of identifying chords turns out to be more complicated than we originally thought; the difficulty comes from all the non-chord tones and deviations in Bach's music. We currently use the following algorithm. For each beat we build a dictionary whose keys are note names (e.g. C#, regardless of octave) and whose values are their weights, computed as the total number of times the note appears across all tracks. We then compute a total score for how well the beat fits each possible chord (i.e. each chord whose root appears in the dictionary). For example, when C, D, E, F, G, A, C, G appear in a beat, we first create the dictionary {C: 2, D: 1, E: 1, F: 1, G: 2, A: 1}.
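A minimal sketch of this chord-fitting score (triads only for brevity; the candidate set, helper names, and the missing-tone penalty of 1 are illustrative assumptions, not our exact code):

```python
# Sketch of the chord-fitting score: +count for notes in the chord,
# -count for notes outside it, minus a penalty per missing chord tone.
MISSING_TONE_PENALTY = 1  # assumed value; the real system may differ

# Pitch classes: C=0, C#=1, ..., B=11
NOTE = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}

def chord_tones(root, quality):
    """Pitch classes of a triad built on `root`."""
    third = 3 if quality in ('minor', 'diminished') else 4
    fifth = 6 if quality == 'diminished' else 7
    return {root % 12, (root + third) % 12, (root + fifth) % 12}

def fit_score(counts, root, quality):
    """Score how well a beat's note counts fit a candidate triad."""
    tones = chord_tones(root, quality)
    score = 0
    for pc, n in counts.items():
        score += n if pc in tones else -n  # step 1: gain or loss by count
    # step 2: penalize chord tones that never appear in the beat
    score -= MISSING_TONE_PENALTY * sum(1 for t in tones if t not in counts)
    return score

# The beat from the running example: C, D, E, F, G, A, C, G
counts = {NOTE['C']: 2, NOTE['D']: 1, NOTE['E']: 1,
          NOTE['F']: 1, NOTE['G']: 2, NOTE['A']: 1}

c_major = fit_score(counts, NOTE['C'], 'major')  # 2+1+2 - (1+1+1) = 2
d_minor = fit_score(counts, NOTE['D'], 'minor')  # 1+1+1 - (2+1+2) = -2
```

With these weights, the doubled C and G make C major outscore D minor, as the text describes.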
Then we compute a score for each possible chord: C major, C minor, C diminished, the various C seventh chords, D major, D minor, and so on. The score computation has two steps: first, for each note, add its number of appearances as a gain if the note is in the chord, and subtract it if the note is not; second, penalize every note that should be in the chord but does not appear in the patch. Taking the C major fit as an example: in step 1, C is in the chord, so we add 2, and D is not, so we subtract 1, and so on; in step 2, every chord tone appears in the patch, so there is no penalty. Having two Cs and two Gs thus indicates that the beat is more likely a C major chord than a D minor chord. Finally, we convert these chords into Roman numerals relative to the key of the piece. For example, in the key of G, the chord progression Am-DM-GM is converted to (2, minor) (7, major) (0, major). The first number is the scale degree of the root in semitones above the tonic (in the key of G: G=0, G#=1, A=2, etc.), and the string is the quality of the chord. We also append the bass note tracked for that beat to convey the inversion, so the final representation of Am6-DM-GM is ((2, minor), 5) ((7, major), 7) ((0, major), 0), with the first element giving the chord and its quality and the second giving the bass note's scale degree.

4.2 Learning and Producing Chord Progression
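The learner described in this section consumes the key-relative tuples produced at the end of 4.1. The conversion from absolute chords to that representation can be sketched as follows (function and variable names here are illustrative, not our actual API):

```python
# Convert an absolute chord to the key-relative representation
# ((degree, quality), bass_degree) described in 4.1.
PITCH_CLASS = {'C': 0, 'C#': 1, 'D': 2, 'D#': 3, 'E': 4, 'F': 5, 'F#': 6,
               'G': 7, 'G#': 8, 'A': 9, 'A#': 10, 'B': 11}

def to_key_relative(root, quality, bass, key):
    """Express a chord's root and bass as semitone degrees above the tonic."""
    tonic = PITCH_CLASS[key]
    degree = (PITCH_CLASS[root] - tonic) % 12
    bass_degree = (PITCH_CLASS[bass] - tonic) % 12
    return ((degree, quality), bass_degree)

# The example from the text: Am6-DM-GM in the key of G (Am6 has C in the bass)
prog = [to_key_relative('A', 'minor', 'C', 'G'),
        to_key_relative('D', 'major', 'D', 'G'),
        to_key_relative('G', 'major', 'G', 'G')]
# prog == [((2, 'minor'), 5), ((7, 'major'), 7), ((0, 'major'), 0)]
```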

This is the first key algorithm in our program. We drew on the knowledge of MDPs from class and came up with our own version of the algorithm suited to our project. In this section, we describe the model and algorithm in detail.

a. Learning

To learn chord progressions, we adapted the n-gram model from computational linguistics and use it to learn which chord is a good fit after a given sequence of chords. We store what we have learned in a nested dictionary. The outer dictionary's keys are sequences of chords; at the start of a piece, the missing history is filled with the marker Start. Each value is an inner dictionary whose keys are all the chords the machine has seen follow that sequence (plus End if a piece ends there), and whose values are the number of times that chord appeared after the sequence. For example, take the progression I-V-I-ii-V-I (represented in code as [(0, major), (7, major), (0, major), (2, minor), (7, major), (0, major)]; for simplicity we use ordinary Roman numerals in the explanation, and we omit the inversion information that makes each entry an actual tuple like ((0, major), 0)). With N=2 for the n-gram, we get the nested dictionary: {(Start, Start): {I: 1}, (Start, I): {V: 1}, (I, V): {I: 1}, (V, I): {ii: 1, End: 1}, (I, ii): {V: 1}, (ii, V): {I: 1}}. Since we have already converted the actual chords into key-relative scale degrees, learning from MIDI files in different keys does not compromise the result. We tested N = 2, 3, 4, 5 and found that N=4 gives the most controllable lengths, probably because the Bach chorales are so often in 4/4.
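A minimal sketch of this learning step, together with the weighted-random generation described in the next subsection (chords shown as plain strings for readability; the length-control bounds of the full system are omitted):

```python
import random
from collections import defaultdict

START, END = 'Start', 'End'

def learn_ngrams(progression, n=2):
    """Count which chord follows each length-n context, padding the
    start with Start markers and appending End at the finish."""
    model = defaultdict(lambda: defaultdict(int))
    context = (START,) * n
    for chord in progression + [END]:
        model[context][chord] += 1
        context = context[1:] + (chord,)
    return model

def generate(model, n=2):
    """Walk the model, choosing each next chord with probability
    proportional to its observed count, until End is drawn."""
    context, out = (START,) * n, []
    while True:
        options = model[context]
        chord = random.choices(list(options), weights=options.values())[0]
        if chord == END:
            return out
        out.append(chord)
        context = context[1:] + (chord,)

# The example progression from the text, with N=2
model = learn_ngrams(['I', 'V', 'I', 'ii', 'V', 'I'])
# model[('Start', 'Start')] == {'I': 1}; model[('V', 'I')] == {'ii': 1, 'End': 1}
piece = generate(model)  # always opens I-V-I, then zero or more ii-V-I cycles
```

On this tiny model the only random choice is at context (V, I), where the generator either closes the piece or appends another ii-V-I, mirroring how length varies in the real system.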

b. Using the Learned Result to Generate New Chord Progressions

We use a state-based model: the state is the last N chords generated so far. For the next chord, we look up the outer dictionary to find the possible chords and how many times each appeared in this context, then choose among them with a weighted random function. We repeat this process until we draw an End.

To produce compositions of reasonable length, we extend the basic algorithm with user-defined length control. It takes four parameters: left bound, right bound, cut bound and effort. When the number of beats is smaller than the left bound, the program retries up to effort times until it draws a non-End next chord or runs out of trials. When the number of beats is larger than the right bound, the program retries up to effort times until it draws an End or runs out of trials. When the number of beats is larger than the cut bound, the program checks whether it can end at the next measure, and ends as soon as it can. With this mechanism we gain some control over length while still letting the piece end properly.

4.3 Laying Out Notes According to Chord Progressions

This is the second key algorithm in our program, implemented mainly in the Layout class. We use rules to evaluate the transition from one note group to the next, and search for a local maximum. The algorithm is as follows.

a. State-Based Model Search

We model the chord-layout problem as a state-based model to enable search. The state contains two things: the note in each track for the previous chord, and the next chord we want to realize. We then search. Currently, for the sake of speed and variety, we use a greedy search with a random tie-breaker (choosing randomly among the layouts with the highest score). In each search step, we first propose all possible note layouts within the proper range of each voice. For example, the soprano range is C4 to A5, so we propose all chord tones within that range. The proposal is key-specific, since a ((0, M), 0) chord in one key contains different notes than in another. Once we have the proposals, we evaluate each one with hand-coded voice-leading rules: we detect and penalize every violation in the transition from the previous chord to the candidate. The penalty differs by rule, so that we avoid big problems at the cost of some minor mistakes; the rules, their detection, and the penalty amounts are all hand-coded. For example, the penalty for parallel fifths is currently 50, while the penalty for incorrect doubling is 10. After evaluating all the layouts, we sort them and choose randomly among those with the best score.

4.4 Writing the Notes into a MIDI File

For this step we mainly use functions from the Mido package. We first perform some cleanup on our groups of notes: we tie together notes that stay the same over multiple beats, so that breaks and rests between the voices become audible, which fakes some phrasing.
Then, we encode the notes and rests as note_on and note_off messages and write them into a MidiFile instance. Finally, we output the file to the desired folder. In this process we can specify the tempo, which is written to the meta track holding the high-level information.

4.5 User Interface

We encapsulate the individual processes above in a Composer class. The user only specifies high-level directions, and the Composer does the rest, from analysis through composition. Currently the interface is command-line based.

5 Literature Review

Similar systems have been built by David Cope (Emily Howell) [3] and by the DeepMind team (a variant of WaveNet). David Cope's Emily Howell takes in the output of its mother program Emmy and composes based on what it learns. Beyond its more advanced learning algorithms, Emily Howell can interact with humans and receive feedback on its compositions, gradually improving under this supervision; our program currently cannot obtain feedback or incorporate it into future compositions. Emily Howell composes for piano, a more complicated medium than the chorale, and its results have clear themes, consistent meter and proper arrangement, which our program cannot yet achieve. However, it cannot do orchestration yet, so there is still room for growth. WaveNet uses deep learning, drawing on natural language and speech processing, which had been the team's main focus. We share the idea that composing music is much like writing an essay, and we also believe speech generation has a lot in common with music generation; their deep-learning approach can be very successful.

6 Future Improvements

One problem in the learning process concerns meter and downbeats. By the usual rules, tension should resolve on a downbeat, and a single chord should not be held across a bar line. However, since the learning materials contain many different meters, the generated music does not keep a consistent meter. One possible solution is to add a strong/weak-beat feature, increasing the weight of appropriate beats and decreasing the weight of inappropriate ones in the final dictionary, giving the output a better sense of the measure. However, this would fragment the dictionary so severely that we would need far more learning material. Another problem with the learning process is the choice of chorale music itself, which is actually lax about metering: phrases can take as long as they like, and changing meter is common practice. For the sake of time, we decided not to implement strong/weak-beat control and to let the chord changes decide what to emphasize, which leaves room for future improvement.

For the layout process, one problem is that greedy search cannot guarantee an error-free result; sometimes errors are unavoidable. As an alternative, we could search for the global maximum, but this would compromise variety, especially when there are only a few error-free realizations.
It would also be very slow, since the number of states grows exponentially. Moreover, few or no errors does not mean musically interesting: since we currently only penalize errors and choose randomly among the layouts with the fewest, we cannot judge how good a move actually is. As another alternative for the layout process, we could use a CSP, with the voice-leading rules as constraints. Runtime would still be a serious issue, though somewhat better than other global-maximum searches if implemented with early stopping.

7 Error Analysis

We evaluated our program in two ways: first, we invited people to rate aspects of our compositions by listening, through an online survey; second, we invited music students to analyze our compositions from the score. The survey results are as follows:

- Mostly not as good as human work, but not terrible for most listeners (even music majors). As my music theory teacher commented, "it does remotely sound like a Renaissance chorale."
- Results vary a great deal; each composition has its own strengths and weaknesses. The first sample had better voice leading and chord progression, the second better phrasing and breaks.
- The lack of decoration makes the music boring: no theme, no repetition, no form.

From the formal analysis carried out by music students, we received the following feedback:

- There are still progression errors, such as retrogressions (e.g. V-IV). These stem from the chord-fitting process: a chord with too many non-chord tones can be analyzed incorrectly. We need a better analysis method, or a way to reduce errors in chord-progression generation.
- There are problems with strong and weak beats, such as resolving onto a weak beat or holding the same chord across a bar line. Since our output currently has no notion of bar lines, we cannot yet address this.
- The results contain some unresolved sevenths, mainly in the bass line. This happens because we record only one bass note per chord and omit passing tones, which may be the resolutions for chords in third inversion (4-2 position). We need to implement decoration to fix this.

8 References

[1] Bach chorale database: www.kunstderfuge.com
[2] Mido Python package: http://mido.readthedocs.io/en/latest/
[3] Emily Howell: https://slab.org/tmp/wiggins-cope.pdf