Evolutionary Computation Applied to Melody Generation

Matt D. Johnson

December 5, 2003

Abstract

In recent years, the personal computer has become an integral tool for typesetting and managing many kinds of music. However, the computer can serve as more than a typesetting and data management tool. This paper explores the ability of a computer to generate and arrange four-part vocal harmony in the style of church hymnody. An evolutionary algorithm generates a melody, which is then arranged into four parts using a decision tree for assigning chords. The result is an application that produces unique and pleasing music suitably arranged for Soprano, Alto, Tenor, and Bass.

Contents

1 Introduction
2 Related Work
2.1 Interactive Systems
2.2 Autonomous Systems
2.3 Rule Based Systems
3 Research Methodology
3.1 Problem Size
3.2 Problem Simplification
3.3 Problem Representation
3.4 Evolutionary Cycle
3.4.1 Initialize the Population
3.4.2 Terminating Condition
3.4.3 Selection of Parents
3.4.4 Reproduction
3.4.5 Mutation
3.4.6 Rhythm Correction
3.4.7 Competition
3.5 Fitness
3.6 Results
4 Conclusion

Keywords: Evolutionary Computation, Evolutionary Algorithm, Artificial Intelligence, Music Generation, Melody Generation, Computer Generated Music, Genetic Algorithm, Fitness Bottleneck

1 Introduction

In recent years, personal computers have become common tools for storing and typesetting sheet music. Research is currently underway that may lead to applications capable of generating and arranging music as well as a human being can. Section 2 gives a brief overview of a few such research projects. Section 3 presents an evolutionary algorithm that generates a melody in the traditional style of church hymnody. The resulting melody lies in the soprano range. Alto, Tenor, and Bass parts are then generated to accompany the melody using CAVM, a tool that automatically adds Alto, Tenor, and Bass parts to an existing melody [4].
2 Related Work

Evolutionary computation is a powerful tool that has been used by a number of researchers in the field of computer-generated music. Across the board, the greatest challenge for researchers in this community appears to be the design of the fitness function for their evolutionary systems. The authors of [8] categorize computer-generated music research according to the kind of fitness function each method uses; several of those categories are used here.

2.1 Interactive Systems

In an interactive system, the fitness function is a human being. Every generation created by the evolutionary program must be painstakingly evaluated by hand, which creates a fitness bottleneck [1]. However, this practically guarantees the patient user a computer-generated melody that is pleasing to that individual.

A system called Variations is presented by Bruce L. Jacob [3]. Jacob chose to conduct his experiments at the level of phrases and motives rather than individual notes, the level at which computer-generated music is typically implemented. The system uses three modules: the ear, the composer, and the arranger. Each of these modules either uses a genetic algorithm or was developed with one. Composing with Variations requires a human operator to define a number of motives that serve as the basis for the musical composition [3]. Phrases are developed by the composer module, which performs recombination and variation on the original motives and consults the ear module to determine whether a given phrase is acceptable. Once a number of accepted phrases have been created, the arranger module puts the phrases together and waits for feedback from a human evaluator. The arranger module continues to work with the human evaluator until the program terminates.

2.2 Autonomous Systems

In a typical EA, the fitness function is constant, while the population evolves over time to become more fit.
Autonomous systems differ in that both the population and the fitness function evolve [8].

One of the most interesting papers uncovered in this research is Frankensteinian Methods for Evolutionary Music Composition [2]. Gregory begins by presenting an extensive overview of a number of music composition projects, using references to Frankenstein throughout to illustrate various points. The paper climaxes in section four, where the author presents his evolutionary ideas for generating music. In the Frankensteinian approach, the individual and the environment coevolve. According to Gregory, this relationship resembles the one between Frankenstein and his monster: each contributed to the other's environment, so they evolved together, each shaped by the other.

In section 4.2 of his paper, Coevolving hopeful singers and music critics, Gregory presents two types of individuals. The female individual represents the evolving environment and chooses the males, which represent the singers. The female maintains a note transition table that indicates which transitions she expects and with what frequency. The table is initialized with note transitions collected from simple folk-tune melodies. Over time, the table can change in response to what the female observes in the male singers, which creates the changing environment. Males start out with randomly generated melodies and evolve based on that environment.

2.3 Rule Based Systems

A rule based system uses a fitness function that encodes a set of rules. The rules must be built into the system based on the author's musical knowledge [8].

George Papadopoulos and Geraint Wiggins [7] present a genetic algorithm for generating jazz melodies based on an input chord progression. Their algorithm is distinguished by the following characteristics:

1. The algorithmic fitness function described in [7] calculates a weighted sum of a number of distinct characteristics of the chromosome. This approach avoids the fitness bottleneck described by John A. Biles [1].
2. Problem-specific genetic operators allow the system to converge to a high fitness relatively quickly.
3. The melody is represented by the scale degree of each note rather than by the traditional binary encoding. This allows for greater readability and more problem-specific operators.

The paper concludes that the resulting system frequently generates interesting patterns, and it enumerates some extensions that could lead to more human-like jazz melodies.

A genetic algorithm for harmonising chorale melodies is presented in Evolutionary Methods for Musical Composition [9]. Note representation is based on standard Western music notation, and information such as the key signature and time signature is stored.
For every note, pitch is expressed as a scale degree and duration as an integer; a further integer indicates the octave in which the note occurs. The absolute pitch of the note is not stored.

The genetic algorithm presented in [9] makes use of several domain-specific operators. One such operator, Splice, is a traditional one-point crossover. A more unusual operator is PhraseEnd, which mutates the end of a phrase so that it ends with a chord in root position. Two types of fitness functions are used. The first evaluates individual voices; it tends to favor movement in a consonant direction and penalizes large jumps within a voice. The second considers the relationship between voices, and tends to avoid certain types of parallel motion and crossed voices [9].

The authors of [9] note in their review of this genetic algorithm that the results are decent, but certainly not optimal. The domain knowledge encoded in the algorithm allowed them to achieve their results rather quickly: within 300 generations.
They end this section of their paper by suggesting that a conventional rule-based system working in conjunction with one or more genetic algorithms would be a better approach to harmonisation.
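[9] describes Splice only as a traditional one-point crossover. As a point of comparison with the scale-degree crossover of section 3.4.4, a minimal Python sketch of a generic one-point crossover follows. The function name `one_point_crossover`, the equal-length list representation, and the way the cut point is drawn are assumptions for illustration, not details taken from [9].

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def one_point_crossover(
    parent_a: Sequence[T], parent_b: Sequence[T], rng: random.Random
) -> Tuple[List[T], List[T]]:
    """Generic one-point crossover (hypothetical helper): cut both parents
    at the same random index and swap the tails."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have the same length, at least 2")
    # Choose the cut strictly inside the sequence so each child mixes both parents.
    cut = rng.randint(1, len(parent_a) - 1)
    child_a = list(parent_a[:cut]) + list(parent_b[cut:])
    child_b = list(parent_b[:cut]) + list(parent_a[cut:])
    return child_a, child_b


a = [1, 3, 5, 6, 8]  # scale degrees, purely illustrative
b = [8, 7, 5, 3, 1]
print(one_point_crossover(a, b, random.Random(0)))
```

Unlike the crossover in section 3.4.4, the cut here is a position rather than a scale degree, so both children keep the parents' common length.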
3 Research Methodology

3.1 Problem Size

Figure 1: Voice Ranges (Bass, Tenor, Alto, and Soprano ranges shown against the Great, Small, One Line, and Two Line octaves).

The decision to use an evolutionary algorithm to generate a melody is driven by one main factor: the size of the search space. Consider the following. An average soprano can sing notes in the range from D1 to G2, or 18 different pitches (see Figure 1 for an illustration of voice ranges). There are 8 note durations typically found in church hymnody: sixteenth, eighth, quarter, half, whole, dotted eighth, dotted quarter, and dotted half. A typical hymn contains roughly 20 to 60 notes (40 on average). Given this information, the number of potential melodies can be calculated:

$$\text{pitches} \times \text{durations} = 18 \times 8 = 144 \text{ possible notes}$$

$$(\text{possible notes})^{\text{melody length}} = 144^{40} \approx 2.16 \times 10^{86} \text{ melodies}$$

A search space this large rules out exhaustive search, which makes an EA, one that samples only a tiny fraction of the space, well suited to this problem.

3.2 Problem Simplification

Since the number of possible melodies is so large, reducing the search space makes the problem more manageable. This is done by acting on two observations. First, most melodies stay within the key of the piece. Second, the range of most melodies does not exceed one octave. Constraining the melody to notes within one key (F) and one octave reduces the number of pitches from 18 to 8. For completeness, a rest is included as a pitch, making the number of pitches 9. The initial calculation becomes:

$$\text{pitches} \times \text{durations} = 9 \times 8 = 72 \text{ possible notes}$$

$$(\text{possible notes})^{\text{melody length}} = 72^{40} \approx 1.96 \times 10^{74} \text{ melodies}$$
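The two search-space estimates above can be checked with a few lines of Python. This sketch assumes, as the calculation does, that every note is an independent choice of one pitch and one duration and that the melody length is fixed at the 40-note average; `melody_count` and `sci` are hypothetical helper names.

```python
def melody_count(pitches: int, durations: int, length: int) -> int:
    """Number of distinct melodies of a fixed length when every note is an
    independent choice of one pitch and one duration (exact integer)."""
    possible_notes = pitches * durations
    return possible_notes ** length


def sci(n: int) -> str:
    """Format a large positive integer as 'm.mm x 10^e'."""
    exponent = len(str(n)) - 1
    mantissa = n / 10 ** exponent
    return f"{mantissa:.2f} x 10^{exponent}"


full = melody_count(18, 8, 40)     # section 3.1: 18 pitches, 8 durations
reduced = melody_count(9, 8, 40)   # section 3.2: key of F, one octave, plus a rest
print(sci(full))                   # 2.16 x 10^86
print(sci(reduced))                # 1.96 x 10^74
print(sci(full // reduced))        # 1.10 x 10^12
```

Because 144 = 2 × 72, the reduction factor is exactly 2^40, a little over 10^12.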
The reduced count of roughly $1.96 \times 10^{74}$ melodies is still daunting, but it is about twelve orders of magnitude smaller than the first calculation. Additionally, the restrictions placed on the melody tend to produce a more pleasing sound, because notes outside the key cannot occur.

3.3 Problem Representation

The note is the building block of music, so the cornerstone of the representation is a Note structure consisting of a scale degree and a duration. The duration of a note indicates how long the note sounds and is represented as an enumerated integer type; Table 1 shows the mapping from each note duration to the underlying integer used in the implementation. The scale degree of a note indicates its pitch within a given key. For simplicity, every melody generated by this algorithm is in the key of F. Table 2 shows the mapping between the scale degree, the letter name of the note in the key of F, and the underlying integer used in the implementation.

Table 1: Note Duration Mapping

Note Duration      Integer Used
whole note         0
dotted half        1
half               2
dotted quarter     3
quarter            4
dotted eighth      5
eighth             6
sixteenth          7

Table 2: Note Degree Mapping

Scale Degree   Note Name in Key of F   Integer Used
REST           -                       0
ONE            F                       1
TWO            G                       2
THREE          A                       3
FOUR           B flat                  4
FIVE           C                       5
SIX            D                       6
SEVEN          E                       7
EIGHT          F                       8

A complete melody consists of a vector of notes, stored by a class called Individual. In addition to the melody, an Individual provides the following functions: Initialize, Crossover, GetFitness, ForceBeats, ChangeOneNoteDegree, and ChangeOneNoteLength. These functions are used throughout the evolutionary process and are explained in later sections. The controlling class, Population, directs the evolutionary process and stores all the individuals in an AVL tree [5] keyed on fitness.

3.4 Evolutionary Cycle

The evolutionary cycle is as follows:

    Initialize the Population;
    while (the terminating condition has not been reached)
    {
        Select two parents;
        Reproduction;
        Mutate the children;
        Correct the rhythm of the children;
        Competition;
    }

Each of these steps is explained in detail below.

3.4.1 Initialize the Population

The size of the population is encoded in the Population class and is currently set at fifty individuals. For every member of the population, Population instantiates an Individual and calls Individual::Initialize, which decides the length of the melody (from 20 to 60 notes) and then generates that many notes, each with a randomly generated scale degree and duration. Individual::ForceBeats (see section 3.4.6) is called after Individual::Initialize.

3.4.2 Terminating Condition

A population evolves until 100,000 generations have been reached, or until the best individual in the population has a fitness of at least 30. See section 3.5 for a complete description of the fitness function.

3.4.3 Selection of Parents

Since the individuals are stored in an AVL tree ordered by fitness, rank-based selection is straightforward to implement. The tree is traversed starting with the most fit individual and proceeding towards the least fit. Every individual visited along the way has a twenty percent chance of being selected, and the traversal continues until one is selected. If the traversal fails
to select an individual, the most fit individual in the tree is used. Once an individual is selected, the traversal starts over and a second parent is selected using the same criteria. It is possible for the same individual to be selected both times.

3.4.4 Reproduction

Reproduction is essentially crossover between the two children, which at this point are just copies of their parents. One child calls its Individual::Crossover function, which takes the second child as its argument. The following terminology is used to describe reproduction:

this melody: The melody contained in the Individual whose Crossover function is currently executing.
in melody: The melody contained in the Individual that was passed into the Crossover function.
temp melody: The temporary melody created inside the Crossover function.

The Crossover function randomly selects a scale degree and uses it as the crossover point. The length of temp melody is set to the length of the shorter of the other two melodies. temp melody is built by copying notes from this melody until the crossover point is hit. Then in melody is scanned until the crossover point is found, and, starting at that point, notes are copied from in melody to temp melody until another crossover point is encountered. The algorithm then switches back to this melody for another chunk of notes. Thus, temp melody is built by adding runs of notes alternately from the other two melodies until it is full. (Refer to Figure 2 for an example.) At the end of Individual::Crossover, this melody is replaced by temp melody.

3.4.5 Mutation

The only two mutation operators, used on both children, are ChangeOneNoteLength and ChangeOneNoteDegree. ChangeOneNoteLength randomly selects a note in the melody.
Then, it either decreases or increases the integer that represents the note duration; Table 1 shows the duration-to-integer mapping. ChangeOneNoteDegree operates in exactly the same manner, except that it modifies the scale degree instead of the note duration; Table 2 shows the scale-degree-to-integer mapping.

3.4.6 Rhythm Correction

During development, it became apparent that the rhythm patterns in the melodies were exceptionally difficult and unusual. To correct this problem, a deterministic function called ForceBeats was introduced. ForceBeats works as follows:

    Loop through the whole melody
        Select the next note
        If the note duration is equal to one beat
            Go on to the next note
        If the note duration is more than a beat
            Ensure that the following note or notes do not extend beyond the end of the current count
        If the note duration is less than one beat
            Ensure that the following note or notes plus the current one have a total duration of one count
    End loop

Figure 2: Reproduction Crossover.

3.4.7 Competition

Every generation, two individuals are born and two individuals die. The two new individuals are stored in the tree, and then the two least fit individuals in the tree are removed.

3.5 Fitness

Without a doubt, the fitness function was the most challenging aspect of this project. The fitness function is a member of the Individual class. It uses a Fitness Loop, which cycles through every note in the melody, checking the relationship of the current note with the note that follows it. As the melody is evaluated, the fitness function keeps a running total of fitness points. The following list gives the name of each characteristic the function looks for, the fitness points awarded for it, and a brief description. The phrase next note refers to the note that follows the current note in the Fitness Loop.

1. SAME NOTE: Fitness Points: 17. The scale degree of the next note has not changed.
2. ONE STEP: Fitness Points: 17. The scale degree of the next note has gone up or down one step.
3. ONE THIRD: Fitness Points: 15. The scale degree of the next note has gone up or down two steps.
4. ONE FOURTH: Fitness Points: 12. The scale degree of the next note has gone up or down three steps.
5. ONE FIFTH: Fitness Points: 10. The scale degree of the next note has gone up or down four steps.
6. OVER FIFTH: Fitness Points: -25. The scale degree of the next note is more than four steps away.
7. FOUR SEVEN: Fitness Points: -25. The current note is scale degree four and the next note is scale degree seven.
8. SIXTEENTH NOTE: Fitness Points: -10. The current note is a sixteenth note.
9. DRASTIC DURATION CHANGE: Fitness Points: -20. The durations of the current note and the next note are more than four steps apart in Table 1.
10. BEGIN TONIC: Fitness Points: 50. The melody begins on the tonic (scale degree 1).
11. END TONIC: Fitness Points: 50. The melody ends on the tonic (scale degree 1).

Fitness points are accumulated in a local integer variable as the Fitness Loop executes. The function returns the total fitness points divided by the number of notes; if the total is negative, the function returns -100 instead.

When the population is first initialized, the fitness of the best individual is typically around zero. Within 1000 generations, the best individual usually reaches a fitness of at least 15.
Sometimes a fitness of 18 or better is achieved in that time frame; in other runs, a fitness of 18 is never achieved. Figure 3 shows the fitness of the best individual in the tree every 1000 generations for a particular run, and Figure 4 shows the average fitness of the population every 1000 generations.
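The scoring rules of section 3.5 can be sketched in Python as follows. This is a hedged illustration, not the Individual::GetFitness code of the implementation: the (degree, duration) tuple and the constant names are assumptions, and so are several readings of the rules, namely that the rules are additive (a FOUR SEVEN move also earns its ONE FOURTH points), that the SIXTEENTH NOTE penalty applies only to notes that have a successor in the Fitness Loop, that a rest (degree 0) takes part in the interval arithmetic like any other degree, and that the division by the number of notes is an integer division.

```python
from typing import List, Tuple

# Assumed representation: a note is (scale_degree, duration_code).
# Degrees follow Table 2 (0 = rest, 1-8 = F up to F);
# duration codes follow Table 1 (0 = whole note ... 7 = sixteenth).
Note = Tuple[int, int]

TONIC = 1
SIXTEENTH = 7

STEP_POINTS = {0: 17, 1: 17, 2: 15, 3: 12, 4: 10}  # items 1-5, by steps moved
OVER_FIFTH = -25               # item 6: more than four steps
FOUR_SEVEN = -25               # item 7
SIXTEENTH_NOTE = -10           # item 8
DRASTIC_DURATION_CHANGE = -20  # item 9
TONIC_BONUS = 50               # items 10 and 11


def fitness(melody: List[Note]) -> int:
    """One reading of the section 3.5 rules (hypothetical sketch)."""
    if not melody:
        raise ValueError("melody must contain at least one note")
    total = 0
    # Fitness Loop: compare each note with the note that follows it.
    for (degree, duration), (next_degree, next_duration) in zip(melody, melody[1:]):
        total += STEP_POINTS.get(abs(next_degree - degree), OVER_FIFTH)
        if degree == 4 and next_degree == 7:
            total += FOUR_SEVEN
        if duration == SIXTEENTH:
            total += SIXTEENTH_NOTE
        if abs(next_duration - duration) > 4:
            total += DRASTIC_DURATION_CHANGE
    if melody[0][0] == TONIC:
        total += TONIC_BONUS
    if melody[-1][0] == TONIC:
        total += TONIC_BONUS
    if total < 0:
        return -100
    # Assumed integer quotient, as C++ int / int gives for a non-negative total.
    return total // len(melody)


print(fitness([(1, 4)] * 20))  # 21: twenty quarter-note Fs
```

Under these assumptions no melody of n notes can score more than (17(n - 1) + 100) / n, because every transition earns at most 17 points and the only other bonuses are the two tonic awards; for the 20 to 60 note melodies generated here that is at most 21, and a single repeated tonic reaches it. If the real implementation behaves the same way, the threshold of 30 in section 3.4.2 is never reached and every run stops at 100,000 generations.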
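The rank-based selection of section 3.4.3 and the competition step of section 3.4.7 can be sketched as follows. In this hypothetical Python sketch, a list ordered from most fit to least fit stands in for the traversal of the AVL tree [5], and the function names and the `prob` parameter are assumptions; the twenty percent chance and the fall-back to the most fit individual come from the text.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_parent(ranked: Sequence[T], rng: random.Random, prob: float = 0.2) -> T:
    """Walk from most fit to least fit; each visited individual is chosen
    with probability `prob`. If nobody is chosen, return the most fit."""
    for individual in ranked:
        if rng.random() < prob:
            return individual
    return ranked[0]


def compete(ranked: List[T], children: List[T], fitness: Callable[[T], float]) -> List[T]:
    """Add the children, then drop as many least-fit individuals as were
    added, so the population size stays constant."""
    merged = sorted(ranked + children, key=fitness, reverse=True)
    return merged[: len(ranked)]


rng = random.Random(42)
population = [9, 7, 5, 3]  # toy individuals whose fitness is their own value
parents = select_parent(population, rng), select_parent(population, rng)
print(parents)  # the same individual may be drawn twice
print(compete(population, [8, 1], fitness=lambda x: x))  # [9, 8, 7, 5]
```

Individual k, counting from 0 at the top, is picked with probability 0.2 × 0.8^k; with fifty individuals the fall-back fires only with probability 0.8^50, roughly 1.4 × 10^-5, so selection strongly favors, but is not restricted to, the fittest melodies.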
Figure 3: Best Fitness (fitness of the best individual, sampled every 1000 generations).
Figure 4: Average Fitness (average fitness of the population, sampled every 1000 generations).

Figure 5: A melody with a fitness of 14.

Figure 6: A melody with a fitness of 15.
Figure 7: A melody with a fitness of 17.

Figure 8: A melody with a fitness of 17 arranged into four parts.

3.6 Results

In the early stages of development, the melodies generated were quite disappointing. However, after fine-tuning the fitness function, adjusting parameters, and deterministically correcting rhythm patterns, the resulting melodies are quite nice. The best melodies seem to fall in the fitness range of 14 to 20, depending on how much excitement is desired in the melody. Melodies at the low end of this range are unique and interesting, but they are also more difficult to sing and may not sound as nice; Figure 5 is a good example of this type of melody. (All music shown in this document was typeset with LilyPond [6].)

Melodies with a fitness greater than 19 start to resemble one another. The algorithm terminates before a perfect individual is found, but it appears that, given unlimited time, that individual would be a rather boring melody. Figure 9 is one of the more interesting very high fitness melodies; other individuals with a fitness of 20 have been less interesting.

Figure 9: A melody with a fitness of 20.
The best individuals are in the fitness range of 16 to 18. They exhibit uniqueness, are pleasant to listen to, and tend to be easy to sing. Figure 7 is an excellent representative of the top individuals produced by the EA: the part moves around and changes frequently, but has few erratic jumps. Figure 8 uses the same melody and adds alto, tenor, and bass parts to accompany it; these parts are arranged by CAVM [4].

4 Conclusion

Without a doubt, a simple evolutionary algorithm is capable of generating very nice melodies. Further development, including musical modeling, would lead to even better melodies, and more sophisticated, problem-specific genetic operators would also likely improve the results.

Computers are inherently good at any type of work that requires crunching numbers or performing logic. Their biggest weakness lies in areas that involve feelings and emotions, such as art and music. The research presented here, together with ongoing research in the computer-generated music community, leads this author to conclude that there may come a day in the near future when computers can do more than just crunch numbers. Over time, computer science methodologies will continue to develop, and eventually these methods will converge to mimic the creative nature of the human brain. Imagine a computer that can compose with the anger of Wagner, do so in a moment, and make no typos in the process. Artificial intelligence will grow until it encapsulates the nature and production of human feelings in ones and zeros. At that point, we will have computers that can not only crunch numbers, but also express emotions.

References

[1] BILES, J. GenJam: A genetic algorithm for generating jazz solos, 1994.
[2] GREGORY, P. T. Frankensteinian methods for evolutionary music composition.
[3] JACOB, B. Composing with genetic algorithms, 1995.
[4] JOHNSON, M. D., AND WILKERSON, R. W. Computerized arrangement of vocal music.
In Intelligent Engineering Systems Through Artificial Neural Networks, Volume II (2001).
[5] KARAS, W. Code: Abstract AVL tree template. Available in the public domain.
[6] LILYPOND. http://lilypond.org/web.
[7] PAPADOPOULOS, G., AND WIGGINS, G. A genetic algorithm for the generation of jazz melodies.
[8] SANTOS, A., ARCAY, B., DORADO, J., ROMERO, J., AND RODRIGUEZ, J. Evolutionary computation systems for musical composition. In Proceedings of Acoustics and Music: Theory and Applications (AMTA 2000), vol. 1, pp. 97-102, 2000. ISBN 960-8052-23-8.
[9] WIGGINS, G., PAPADOPOULOS, G., PHON-AMNUAISUK, S., AND TUSON, A. Evolutionary methods for musical composition, 1998.