CS 221 Final Report
The Sixteen Machine: Generating Intelligent Rap Lyrics
Aaron Bracket, Antonio Tan-Torres

Introduction: The use of language is a fascinating aspect of humanity. Specifically, the way language is used in rap lyrics is a uniquely human system, defined by blurry rules for what a rapper deems to be rap music. Rappers' lyrics vary immensely depending on how they employ these rules to define rhyme scheme, rhythm, content, and the many other characteristics that make up the rough structure of a rap verse. This rough structure of a uniquely human system makes it particularly interesting to try to decompose into an actual computational system. Essentially, if we can identify and represent these characteristics of rap lyrics, then we should be able to learn from a rapper's lyrics the unique way that rapper utilizes these characteristics, and then create a new rap verse in a similar style. One of the best rappers alive is Kendrick Lamar, so essentially we want to learn what a characteristic Kendrick Lamar rap verse consists of, and then output new Kendrick Lamar-esque rap verses (in an effort to satiate our K-Dot hunger as we wait for his next album, hopefully with J. Cole).

Task Definition: The task at hand is to build an AI agent that produces rap lyrics based on a specific rapper's lyrics. This tackles the real-world problem of writing rap lyrics using artificial intelligence techniques. Our task involves modeling the process of writing rap lyrics through certain characteristics of rap, learning that model from a rapper's lyrics as data, and then using that model to output different, but similar, rap lyrics. Our system takes in the lyrics of a specific rapper and outputs a new verse (16 lines) of rap lyrics inspired by that rapper.

Scope: The scope of the project is determined by what we specify as the characteristics of a rap verse, or what a rap verse should look like.
The goal of building a system that outputs a full rap song, with lyrics set to a musical beat, would be too broad and complex for the scope of this project. Therefore, we focus only on outputting rap lyrics (no music). However, the range of structures and characteristics found in rap lyrics (such as internal rhyme schemes, context, and wordplay) is also too broad to encompass entirely in this project. We have therefore narrowed the scope to the essential rap characteristics of rhyme and rhythm, and we output a 16-line verse using a couplet rhyme scheme, where the ending words of every 2 lines rhyme. This simplifies the problem of structuring the output verse, while still leaving enough flexibility for the characteristics of the structure to be learned from the dataset.
Evaluation: Our evaluation metric is human-based, since all rap lyrics are evaluated and appreciated by humans. Evaluating rap lyrics as humans do is complex: we can appreciate different aspects of the lyrics, such as content, word choice, and flow. For the given scope of the project, our output verse will be evaluated by humans specifically in terms of rhyme, rhythm/flow, and general likeness to the original rapper. There are no quantifiable metrics for these aspects, so the evaluation is a written human evaluation of the verse in terms of these characteristics, much like how we evaluate and talk about rap in general.

Dataset: Our dataset consists of many rap lyrics from a specific rapper (we used Kendrick Lamar). We scraped all of the artist's rap lyrics from Genius using the Genius API.

Literature Review: While no one had focused on doing exactly what we wanted for rap, others have explored text generation. One paper we read gave us the idea to use n-grams to help us emulate our favorite rappers [1]. Another group wrote a nice paper on how to increase variation in text, and we used their section on Naïve Bayes to inform how we chose our words [2]. These two techniques applied in concert are what led to the success of our project, so we owe a lot to them.

Infrastructure: To accomplish this task, we needed to collect a dataset of raw rap lyrics from a specific rapper. We used the Genius API to scrape all of the Kendrick Lamar rap lyrics on Genius. However, before we could use this dataset of raw lyrics, we had to do some pre-processing to remove unnecessary tokens (like "[Hook x2]" and "[Chorus]"), as well as any choruses or skits, so that we were left with just the actual verse lyrics. Another part of our infrastructure was the structure of a rap verse itself.
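As a rough illustration of the marker-stripping part of this pre-processing (the function name and regex below are our own sketch, not the exact code we ran; dropping whole chorus sections would additionally require tracking which section each line belongs to):

```python
import re

def clean_lyrics(raw):
    """Strip bracketed section markers like [Hook x2] or [Chorus],
    then drop the blank lines they leave behind."""
    # Remove any [...] annotation anywhere in the text
    text = re.sub(r"\[[^\]]*\]", "", raw)
    # Keep only non-empty, stripped lines
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)

print(clean_lyrics("[Verse 1]\nYa bish\n[Hook x2]\nAlright\n"))
```

This leaves only the lyric lines themselves, which is what the n-gram and rhyme extraction steps operate on.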
We determined that our system would focus on the essential characteristics of a rap verse, rhyme and rhythm, and that it would output a 16-line verse in a couplet rhyme scheme. To build this verse structure into the actual infrastructure of our system, we had to figure out how to computationally represent these characteristics through patterns in the lyrics. We decided to use Python's Natural Language Toolkit (NLTK) library to assist these efforts. For rhyme, we find which words the rapper rhymes together by assuming all of their lyrics follow the couplet rhyme scheme, so we take the last words of every 2 lines to be rhyming words. We scrape these rhyming pairs from the lyrics dataset and judge how good a rhyme is by whether it appears in the NLTK rhyming corpus (not all rhyming pairs appear in the corpus, but any pair that does is certain to rhyme).
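To illustrate the idea of checking rhymes against a pronunciation resource, here is a minimal sketch. The toy dictionary below is a stand-in for a pronouncing dictionary such as NLTK's cmudict corpus (our actual system consulted NLTK directly); two words are treated as rhyming if their phones match from the last stressed vowel onward:

```python
# Toy pronunciation dictionary standing in for a pronouncing-dictionary
# corpus; phones ending in "1" carry primary stress, "0" no stress.
PRONUNCIATIONS = {
    "pen":      ["P", "EH1", "N"],
    "innocent": ["IH1", "N", "AH0", "S", "AH0", "N", "T"],
    "clever":   ["K", "L", "EH1", "V", "ER0"],
    "never":    ["N", "EH1", "V", "ER0"],
}

def rhyme_part(word):
    """Phones from the last stressed vowel to the end of the word."""
    phones = PRONUNCIATIONS[word]
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith(("1", "2")):
            return phones[i:]
    return phones

def is_rhyme(a, b):
    return rhyme_part(a) == rhyme_part(b)

print(is_rhyme("clever", "never"))  # True
print(is_rhyme("clever", "pen"))    # False
```

Note that this strict check misses slant rhymes like "innocent"/"pen", which matches the caveat above: absence from the corpus does not prove two words fail to rhyme in a rap sense.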
For rhythm, we measure the number of syllables in a line. Using NLTK, we count the number of syllables in each line of the rapper's lyrics and take the average of these syllable counts as the rapper's average rhythm or flow. With the computational representations of these characteristics established, we can learn them from the dataset and use them to output a new verse.

Approach: The challenges of building the system lie in how we model the rhyme and rhythm of rap lyrics, and how we use those representations to construct a verse. Using the computational representations of rhyme and rhythm discussed above, the system has to combine them in a way that outputs a good verse. To do this, we decided to implement a state-based system in which the states, actions, and costs model the lyric-writing process. The states capture where we are in the lyrics so far, the actions are adding a word or ending a line, and the costs of these actions are determined by the rhyme/rhythm heuristics. Initially, we framed the problem as a state-search problem in which the minimum-cost path would be the best possible verse in terms of rhyme/rhythm. However, this proved problematic for our project because we want an AI agent that produces different lyrics each time. While the minimum-cost-path approach did capture the lyric-writing process in terms of rhyme/rhythm, it always output the same verse, which is not the kind of AI rapper we wanted. To address this problem, we modeled the problem as a Markov Decision Process (MDP). This introduces randomness into the state-search problem so that the agent can output different verses by solving the problem via different paths.
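The rhythm representation described above (average syllables per line) can be sketched as follows. The vowel-group counter is a rough stand-in for the syllable counts our system obtained via NLTK, so individual counts may differ slightly from dictionary-based ones:

```python
import re

def count_syllables(word):
    """Approximate syllables as runs of consecutive vowels: a crude
    stand-in for looking the word up in a pronouncing dictionary."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def line_syllables(line):
    """Total approximate syllables in one lyric line."""
    return sum(count_syllables(w) for w in re.findall(r"[a-zA-Z']+", line))

def average_rhythm(lines):
    """The rapper's 'flow': mean syllable count across lines."""
    return sum(line_syllables(l) for l in lines) / len(lines)

verse = ["Born sinner, but I still act like I be innocent",
         "Kill your favorite rapper, deadly with a pen"]
print(average_rhythm(verse))
```

This average is what the rhythm heuristic later compares each generated verse against.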
MDP Implementation:

States:
- Each state is represented by a tuple: (current line number, n-gram tuple, rhyming word, string of current lyrics).
- This captures all the information we need to know about where we are in the lyric-writing process.

Actions: In each state, the writer can take one of two actions:
- "Keep Going": continue adding words to the line by choosing a word from the n-gram set.
- "Finish the Line": choose the word to add from the intersection of the n-gram set and the set of rhyming words.

Transition Probabilities:
- The probability of choosing a word from its respective set is (word count / total number of word possibilities).
- The number of word possibilities varies depending on whether we pick a word from just the n-gram set or from its intersection with the rhyming words.

Rewards:
- Rewards for non-terminal states are zero.
- Rewards for terminal states are determined by a rhythm heuristic and a rhyme heuristic.
- Rhythm: the difference in syllable count between the output verse and the average syllable count from the data.
- Rhyme: the number of rhymes that are recognized by the NLTK library.

Discount:
- We used a discount factor of 1.

Concrete Example (of how the MDP decides on the next word): In this example, the current state is represented by the top circle. It is the first line, so the current line number is 0. The previous 2 words, which we use as our n-gram, are "you" and "feel". Because this is the first line, there is no previous rhyming word. The last component is the complete string of the lyrics so far: "Do you feel". Given the action "Keep Going", four things can happen. These possibilities are all of the words that have ever appeared after the sequence "you feel": "bad", "like", "it", and "that". Each word is associated with a possible next state, which is much like the previous state, but the n-gram has changed to "feel" plus the selected word, and the selected word has been appended to the lyrics. The sequence "you feel" appeared 120 times, and the probability that we move to each next state is the number of times its associated word appeared after "you feel", out of 120. The MDP then makes a weighted random decision.
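The concrete example above can be sketched in code. The successor counts below are the hypothetical ones from the example (the "you feel" bigram with 120 occurrences), not real counts from our dataset, and the function names are illustrative:

```python
import random
from collections import Counter

# Bigram successor counts from the worked example: "you feel"
# appeared 120 times, followed by these words with these counts.
SUCCESSORS = {("you", "feel"): Counter({"bad": 60, "like": 30, "it": 20, "that": 10})}

# State tuple: (line number, n-gram tuple, rhyming word for this couplet, lyrics so far)
state = (0, ("you", "feel"), None, "Do you feel")

def keep_going(state, rng=random):
    """'Keep Going' action: sample the next word in proportion to how
    often it followed the current n-gram in the training lyrics."""
    line_no, ngram, rhyme_word, lyrics = state
    counts = SUCCESSORS[ngram]
    words = list(counts)
    total = sum(counts.values())
    word = rng.choices(words, weights=[counts[w] / total for w in words])[0]
    # Slide the n-gram window and append the word to the lyrics
    return (line_no, (ngram[1], word), rhyme_word, lyrics + " " + word)

print(keep_going(state))
```

"Finish the Line" would work the same way, except the candidate words are first intersected with the set of words that rhyme with the couplet's rhyming word.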
Baseline Approach: For our baseline, we implemented a simple 2-gram model with Kendrick lyrics. This model does not take into account any rhyme/rhythm characteristics. As you can see, some of the individual phrases sound very Kendrick-like, but it falls woefully flat in terms of both rhythm and rhyme.

Sample baseline output:

We all my life on tour, ya bish
Crawled under stress days
And I'm behind bars
That considered the far you believe in time ago
Is your mind, you're very welcome
Ah yeah, fuck your ways deceitful
Church me tell you hit the fucking frightening
Hood nigga shopped in the grind for eternity, return of the police relaxed
So many artists gave her soon as fucked up, shut the car then looked at a fuck?
We made a meteor speed dial, I think she on fours
Enthused by a fight he get a house or die
Anybody can water
And I'm followed by the street from me with the lions start rhyming, ya bish
That's what when we was
Making sure your heartbeat, it either caught me
Rock on your game right
So I see the mirror and y'all fucks never should bite their sorrows

Oracle Approach: As an oracle, we tried writing as different rappers, emulating their rhythms, rhyme schemes, and subject matter. Each rapper tended to have a different style, so these variables gave us a whole spectrum by which to distinguish them. In this example, we wrote what we thought would be a Kendrick verse. We incorporated a lot of multisyllabic rhymes and made sure to have subject matter similar to Kendrick's, talking about police brutality and killing other rappers on the mic. We also tried to write with syllable counts that would have a rhythm close to Kendrick's most popular flow.
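For reference, the 2-gram baseline boils down to something like the following sketch (simplified and with illustrative names; our actual baseline was trained on the full cleaned Kendrick corpus):

```python
import random
from collections import defaultdict

def build_bigrams(text):
    """Map each word to the list of words that followed it in the corpus;
    repeated followers appear repeatedly, so random.choice is count-weighted."""
    words = text.split()
    model = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)
    return model

def generate(model, start, length=8, rng=random):
    """Walk the bigram chain from a start word, with no rhyme or rhythm constraints."""
    out = [start]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

model = build_bigrams("we all my life on tour we all on tour ya bish")
print(generate(model, "we"))
```

Because nothing beyond the previous word constrains the next choice, lines from this model wander freely, which is exactly the rhyme and rhythm failure visible in the sample output above.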
Sample oracle output:

I'm a product of the 80s, grew up on Slim Shady
Having dreams of a Mercedes, no wonder I turned out crazy
I used to be blind and underground, something like a mole
But now that my eyes are open I don't fuck with nobody but Cole
Born sinner, but I still act like I be innocent
Kill your favorite rapper, deadly with a pen
Pussies liking other pussies, group of lesbians, them
But should be fightin' popo till feds be in the pen
Cause they shooting first and asking questions never
Don't really care whether or not a nigga is clever
Where's the logic in that? I'm feeling under pressure
They must be jokin', but I'm no Heath Ledger
Sick of these clowns runnin' round, this ain't Gotham
If a nigga frown, gun 'em down, this is Compton
I grew up around Pirus, I was drowning in bloods
Now I drown in pussy and all my haters drownin' in mud

MDP Approach (sample output):

No discrimination, she got this is elementary, I'll take your mother
Vaca'd in the industry hard when I should cover
Yeah, life I put me of us starving, these niggas, tell my ear into beef
Feed the police
You can feel, like a little nappy-headed nigga don't want it, told you know
I'll take your troops at your bitch with gasoline she holding my head slugs go through your info
To go after that never respect, pussy frenchin
High by myself, my mission
Pack a guarantee that trolly, your parents house when
She gobble gobble for no whammy on you and your Friends
Determination ambition, plus no kills I no cries foreal
Punchlines mean I hit the way hell of this game
That's some competition on my hormones just forgot y'all on the game
Give me BET, I'm holding the channel
Man look, yo strap in your daddy on repeat, they got 200 in shambles

Error Analysis: While not quite as good as our oracle, the beginning elements of rap are definitely there. Since our milestone, we limited the maximum number of syllables per line to approximately double the average syllable count, which eliminated any extremely long run-on lines. And, as you can see, the rhyming is definitely pretty good. For the most part it flows, with a lot of intraline consistency. Lastly, the verse is undeniably a Kendrick verse, with several phrases sounding distinctly Kendrick-esque, using the same slant rhymes that he would use and sounding best when imagined in his voice. Unfortunately, we had to make some tradeoffs at the expense of quality. The first tradeoff was to limit the input lyrics to just one artist, so that the code could run without taking so long as to crash our computers.
This had the fortunate effect of making our verses sound distinguishably like one artist, but overall coherence generally improves with more data. The next tradeoff was using only 2-grams. This leads to a lot of weird constructions, because often the previous two words are not enough to predict what third word would actually make sense in the next spot. But we found that even when moving to just 3-grams, whole verses were being determined after just the first two words: most three-word sequences appeared only once, so our model was forced to choose the same next word every time, which led to chains of the original lyrics being reproduced verbatim. Another tradeoff we made was to
prioritize rhyming over rhythm, perhaps too much, so that while each line rhymes, this sometimes comes at the expense of choosing a rhyming word that fits the n-gram context. Some errors have no easy solution, such as interline consistency: making sure that each line relates to some theme or plays off of the preceding line. We hypothesized that we could use k-means to group the words into clusters and then prioritize words in the same cluster, but we did not have time to do this, and it would no doubt require better computers and more time than we had left to run properly.

Further Work: To better fix the rhythm problem, we tried a solution that gave the MDP more power to decide naturally when to end a line, using the calculated optimal policy of a given state to choose whether or not to keep going, but this approach sadly proved too resource-heavy. The beginnings of it are in the attached code.

Conclusion: While we may not have produced a rapping AI that can pass the Turing Test, we still created an AI that shows this field has promise. By improving upon the AI using the methods described in our Error Analysis and Further Work sections, a future party could definitely get the rapper closer to the goal than we were able to with our time and resources.

References:
1. He H., Jin J., Xiong Y., Chen B., Sun W., Zhao L. (2008). Language Feature Mining for Music Emotion Classification via Supervised Learning from Lyrics. In: Kang L., Cai Z., Yan X., Liu Y. (eds) Advances in Computation and Intelligence. ISICA 2008. Lecture Notes in Computer Science, vol 5370. Springer, Berlin, Heidelberg.
2. Ferreira T. C., Krahmer E., Wubben S. Towards More Variation in Text Generation: Developing and Evaluating Variation Models for Choice of Referential Form. Tilburg University.