Recommending Music for Language Learning: The Problem of Singing Voice Intelligibility


Recommending Music for Language Learning: The Problem of Singing Voice Intelligibility

Karim M. Ibrahim (M.Sc., Nile University, Cairo, 2016)

A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF COMPUTER SCIENCE
NATIONAL UNIVERSITY OF SINGAPORE
2018

Supervisor: Associate Professor Ye Wang
Examiners: Associate Professor Ng Teck Khim, Associate Professor Huang Zhiyong


DECLARATION

I hereby declare that this thesis is my original work and it has been written by me in its entirety. I have duly acknowledged all the sources of information which have been used in the thesis. This thesis has also not been submitted for any degree in any university previously.

Signature:    Date: 16 October 2018

Karim M. Ibrahim 2018


ACKNOWLEDGMENTS

Firstly, I would like to express my sincere gratitude to my advisor Prof. Wang Ye for his continuous support of my study and related research, and for his patience and motivation. His guidance and helpful advice shaped both my research and my life. My sincere thanks also go to Dr. Kat Agres, Dr. David Grunberg, and Dr. Douglas Turnbull; without their precious support it would not have been possible to conduct this research. I thank my fellow labmates for the stimulating discussions, for the sleepless nights we worked together before deadlines, and for all the fun we have had in the last two years. Last but not least, I would like to thank my family for supporting me throughout the writing of this thesis and my life.


Contents

Abstract
List of Publications
List of Tables
List of Figures
List of Abbreviations

1 Introduction
    1.1 Motivation
    1.2 Problem Statement
    1.3 Contributions
    1.4 Thesis Outline

2 Literature Survey and Problem Identification
    2.1 Factors affecting sung lyrics intelligibility
    2.2 Relevance to Speech Intelligibility
    2.3 Problem Statement

3 Proposed System
    3.1 Defining ground truth measure of intelligibility
    3.2 The behavioral experiment and collecting the dataset
        Lab Experiment
        Amazon Mechanical Turk Experiment
    3.3 Investigating relevant acoustic features
        Preprocessing
        Audio features
    3.4 Building an acoustic model

4 Future Work

5 Conclusions

References

APPENDICES
A Transcription Surveys
B List of songs

ABSTRACT

Learning a new language is a complex task that requires time, dedication, and continuous study. Language immersion is a recommended approach to maximize learning by using the foreign language in daily activities. Research has shown that listening to music in a foreign language can enhance the listener's language skills and enrich his/her vocabulary. In this study, we investigate how to recommend songs that maximize a language learner's benefit from listening to songs in the target language. Specifically, we propose a method for annotating songs according to their intelligibility to human listeners. We then propose a number of acoustic features that measure the different factors affecting the intelligibility of the singing voice and use them to build a system that automatically estimates the intelligibility of a given song. Finally, we study the usability of crowdsourcing platforms for collecting and annotating songs according to their intelligibility score on a large scale.


List of Publications

Ibrahim, K. M., Grunberg, D., Agres, K., Gupta, C., and Wang, Y. Intelligibility of sung lyrics: A pilot study. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), October 2017.


List of Tables

3.1 Comparison between lab-controlled and MTurk experiments in terms of cost, preparation time, and time to get results. The results show that MTurk is superior in all three categories.
3.2 Classification accuracy for different genres.
B.1 List of songs used in the first lab experiment and model training. The dataset included five genres: Pop/Rock, Jazz, RnB, Folk, and Classical, with 10 songs per genre.
B.2 List of songs used in the second MTurk experiment. The dataset focused on less intelligible genres and included five genres: Metal, Rap, Electro, Reggae, and Punk, with 10 songs per genre.


List of Figures

3.1 The process of labeling songs with intelligibility score.
3.2 The distribution of the transcription accuracies (Intelligibility score).
3.3 The webpage setup for the MTurk experiment.
3.4 Scores obtained from the lab-controlled experiment vs the MTurk experiment.
3.5 Comparison between intelligibility score distributions across batches.
3.6 Intelligibility scores across different genres.
3.7 Confusion matrix of the SVM output.
3.8 Confusion matrices of the different genres.
A.1 Survey filled by the participants in the lab experiment.
A.2 Survey filled by the participants in the lab experiment.


List of Abbreviations

VAR  Vocal to Accompaniment Ratio
HRR  Harmonics to Residual Ratio
HFE  High Frequency Energy
HFC  High Frequency Component
MTurk  Amazon Mechanical Turk
MFCC  Mel-frequency cepstral coefficients


Chapter 1

Introduction

It is common practice for many individuals to listen to music on a daily basis. It has been shown that music affects mood and mental clarity [28]. Based on this, music has been used in various contexts to solve different problems. For example, music is used in health-related applications, e.g. music therapy for patients diagnosed with Parkinson's disease or terminal cancer [11, 38], in improving the education process [10], and in improving the mood of customers while shopping [49]. In this work, we focus on using music to improve the process of learning a new language. We study the factors that would make a listener favor one song over another in terms of its suitability for understanding the sung lyrics. We then propose a computational model to automatically estimate these factors.

1.1 Motivation

Learning a foreign language is a complex task, and how to facilitate the process receives much attention from the research community. Language immersion is a common strategy for improving a foreign language by performing daily tasks in the target language, e.g. reading articles and watching movies. For a daily music listener who is learning a new language, selecting suitable songs in the target language can help enrich the student's vocabulary and familiarity with the language.

Research has shown that singing and language development are closely related at the neurological level [35, 45], and experimental results have demonstrated that singing along with music in the second language is an effective way of improving memorization and pronunciation [21, 31]. However, specific songs are only likely to help these students if they can understand the content of the lyrics [19]. As second language learners may have difficulty understanding certain songs in their second language due to their lack of fluency, they could be helped by a system capable of automatically determining which songs they are likely to find intelligible and which match their level of fluency. Although singing voice analysis has received much attention from the research community, the problem of the intelligibility of a given set of sung lyrics has not been studied as much.

1.2 Problem Statement

Intelligibility describes how easily a listener can comprehend the words that a performer sings. The lyrics of very intelligible songs can easily be understood, while the lyrics of less intelligible songs sound incomprehensible to the average listener. People's impressions of many songs are strongly influenced by how intelligible the lyrics are. One study even found that certain songs were perceived as happy when people could not understand their lyrics, but as sad when the lyrics were made comprehensible [32]. It would thus be useful to enable systems to automatically determine intelligibility, as it is a key factor in people's perception of a wide variety of songs. Besides the intelligibility of the singing voice, other factors that affect the progress of language learning with music are the complexity and sentence structure of the lyrics. For beginner levels, lyrics with correct grammar and simple language are recommended to enrich the learner's vocabulary. However, for an advanced level, listening to songs that contain more colloquial and less formal language is useful for reaching a higher level of familiarity and cultural integration.

Music is regarded as a gateway to understanding a society's culture, connecting with the current generation, and understanding more about the cultural history of the countries speaking the foreign language. It is common for music to express and discuss society's issues and conditions. This is an important part of learning a new language. People who learn a new language are often interested in integrating into these foreign societies and connecting with people of the same age, a challenge that traditional classrooms often fail to resolve. In this study, we focus mainly on estimating the intelligibility of a given song. We define the system structure as follows:

Inputs:
- A target song (singing voice mixed with background music).
- The corresponding lyric text (if available).

Output: A score between 0 and 1 reflecting the intelligibility of the sung lyrics.

1.3 Contributions

The focus of this thesis is investigating the problem of intelligibility as an essential first step in recommending music for language learning. After reviewing the factors that affect the intelligibility of the singing voice based on cognitive studies, we proceed to build a computational model for intelligibility estimation. Our main contributions can be summarized as:

1. We propose a reliable behavioral study to label songs according to their intelligibility.
2. We propose a set of acoustic and textual features that reflect the different factors affecting a song's intelligibility.
3. We train a prediction model using the proposed features to automatically estimate the intelligibility of a given song.

4. We study the efficiency and accuracy of using crowdsourcing for intelligibility score annotation compared to lab-controlled annotation.

We conclude by proposing directions for future work that include other factors affecting language learning with music, such as lyrics complexity and grammatical correctness, and their applications in building a music recommendation system for language learning.

1.4 Thesis Outline

The thesis is structured as follows: Chapter 2 surveys the existing literature and defines the problem to be investigated. In Chapter 3 we present our proposed system, which includes our labeling scheme, features, model training, and the crowdsourcing approach for data annotation. Future directions and plans are described in Chapter 4. Finally, we conclude the thesis with a summary in Chapter 5.

Chapter 2

Literature Survey and Problem Identification

The problem of recommending music for language learners has not been extensively studied in the literature. However, there have been some cognitive studies on intelligibility and lyrics complexity which are relevant to this problem. In the following, we cover some of the cognitive studies which we will use as a basis for building our acoustic model. One of the primary factors in selecting suitable songs for language learners is the intelligibility of the sung lyrics. The fact that sung lyrics can be more difficult to comprehend than spoken words has long been established in the scientific community. For example, singing at high pitch significantly impairs the intelligibility of the sung lyrics, and one study showed that even professional voice teachers and phoneticians had difficulty telling vowels apart when sung at high pitch [12]. Another study, by Collister and Huron, showed that sung lyrics cause hearing errors as much as seven times more frequently than spoken words [3]. Such studies also noted lyric features which could help differentiate intelligible from unintelligible songs; for instance, one study noted that songs comprised mostly of common words sounded more intelligible than songs with less frequent words [17]. However, lyric features alone are not sufficient to assess intelligibility; the same lyrics can be rendered more or less intelligible depending on, for instance, the speed at which they are sung.

These other factors must be taken into account to truly assess lyric intelligibility. One aspect of the singing voice that has been addressed by the research community is its overall quality. Some of the methods proposed in the literature for assessing singing voice quality have been shown to reliably distinguish between trained and untrained singers [2, 34, 47]. One acoustic feature which multiple studies have found useful for this purpose is the power ratio of the frequency bands containing energy from the singing voice to the other frequency bands. Additionally, pitch intervals and vibrato have also been shown to be useful for this purpose [33]. However, while the quality of the singing voice may be a factor in assessing intelligibility, it is not the only such factor. Aspects of the song that have nothing to do with the skill of the singer or the quality of their performance, such as the presence of loud background instruments, can also contribute, and additional features that take these factors into account are needed for a system that determines lyric intelligibility. Another related task is that of singing transcription, in which a computer must listen to and transcribe sung lyrics [29]. It may seem that one could assess intelligibility by comparing a computer's transcription of the lyrics to a ground truth set of lyrics and determining whether the transcription is accurate. But this too does not really determine intelligibility, at least as humans perceive it. A computer can use various filters and other signal processing or machine learning tools to process the audio and make it easier to understand, but a human listening to the music will not necessarily have access to such tools. Thus, even if a computer can understand or accurately transcribe the lyrics of a piece of music, this does not indicate whether those lyrics would be intelligible to a human as well.

2.1 Factors affecting sung lyrics intelligibility

Several cognitive studies have been conducted to identify the factors affecting the intelligibility of sung lyrics. In the following, we go through these studies and list their findings on which factors are important for the problem at hand.

These factors will be the basis on which we build our acoustic model to automatically assess intelligibility similarly to how listeners perceive it. One recent study investigated several factors assumed to affect intelligibility, which are detailed below, and conducted a behavioral experiment to test these hypotheses [17]. Another study investigated the factors affecting intelligibility categorically, as performer-related, environment-related, and listener-related factors [8]. Finally, in [5], the authors studied intelligibility with a focus on style- and genre-related factors and on how different genres are more or less intelligible than others on average. Since the purpose of this work is to build an acoustic system to assess intelligibility, we excluded the listener-related factors, such as hearing impairment. In the following, we list the different factors that have been shown to affect intelligibility, which will be the basis for building our model:

1. Balance between singer(s) and accompaniment [8].
2. Using common and familiar words in the language increases the intelligibility of the song [17].
3. Melismas, i.e., syllables that are sustained over several notes, reduce the intelligibility of a song [17].
4. When syllable stresses are aligned with musical stresses, intelligibility increases [17].
5. Repeating the same words across multiple phrases in the same song increases intelligibility [17].
6. Genre and compositional style [5, 8].

Based on the above factors defined by cognitive studies, it is possible to measure these individual factors from an audio wave and build a model capable of estimating a song's suitability for a language learner. However, building such a system requires a labeled dataset with ground truth reflecting the intelligibility and complexity levels.

2.2 Relevance to Speech Intelligibility

Speech and singing are naturally associated and share several similar problems. Hence, it is important to consider the speech intelligibility problem and whether the approaches proposed in the literature are relevant to singing intelligibility. It is clear that the factors that make speech unintelligible would also apply in the case of singing. However, in reviewing the literature, we observe that most approaches focus on estimating speech intelligibility after it has been distorted (modified) by a transmission system. The purpose of these studies is to evaluate the system by measuring the quality and intelligibility of the speech after transmission. These systems require access to the original speech before transmission. From the literature, we find that there are mainly two approaches to quantifying intelligibility, one subjective and one objective [44]. Subjective measures estimate intelligibility using a listening test [30]. Objective measures focus on how to computationally estimate the intelligibility of a given signal. Since many of these approaches were developed to measure the transmission quality of transmission systems, most depend on having the original signal as a reference. An example of an objective measure is the Articulation Index (AI) [20]. The calculation of the articulation index is based on measuring the signal-to-noise ratio of the transmitted signal over a number of specific frequency bands. Additional approaches were proposed that build on top of the articulation index, as in [13, 43]. These approaches are the basis of a similar, more recent measure called the Speech Transmission Index (STI) [14], which is standardized by an IEC standard [4]. However, the articulation index and speech transmission index focus on intelligibility for a transmission system and are based mainly on the errors due to transmission quality rather than on speech-related factors. Additionally, they require a clean version of the speech for comparison, which is not available in the case of singing.
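To make the reference-based nature of these measures concrete, the sketch below computes a crude articulation-index-style score. It is illustrative only: it assumes access to a time-aligned clean reference, and it uses four ad-hoc frequency bands with uniform weights rather than the standardized bands and weighting of the actual Articulation Index; all names and parameters are ours.

```python
import numpy as np
from scipy.signal import butter, sosfilt

# Ad-hoc octave-like bands (Hz); NOT the standardized AI bands.
BANDS = [(250, 500), (500, 1000), (1000, 2000), (2000, 4000)]

def band_snr_index(clean, received, sr):
    """Crude AI-style score: per-band SNR in dB, clipped to [0, 30],
    normalized to [0, 1], and averaged with uniform band weights."""
    noise = received - clean  # requires a time-aligned clean reference
    scores = []
    for lo, hi in BANDS:
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        s = sosfilt(sos, clean)
        n = sosfilt(sos, noise)
        snr_db = 10 * np.log10(np.sum(s**2) / (np.sum(n**2) + 1e-12))
        scores.append(np.clip(snr_db, 0.0, 30.0) / 30.0)
    return float(np.mean(scores))

# Toy example: a sinusoid "signal" degraded by additive noise.
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)
received = clean + 0.05 * np.random.default_rng(0).normal(size=sr)
print(band_snr_index(clean, received, sr))
```

The point of the sketch is the dependency in its first line: without the clean signal, the noise term cannot be isolated, which is exactly why such measures do not transfer directly to singing.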

While adapting these measures to the case of singing is still an open area of research, in this thesis we focus on studying and measuring intelligibility based on the factors that are specific to the singing voice.

2.3 Problem Statement

Our goal is to design a system to recommend songs for students learning a foreign language as part of their language immersion. The main research problem to be solved is to automatically estimate the intelligibility of the sung lyrics of a given song. Solving this problem will help in selecting suitable songs with comprehensible sung lyrics matching the level of fluency of the user. To solve this problem, the challenges are:

1. Defining a ground truth measure of intelligibility.
2. Collecting a dataset for this specific problem.
3. Investigating relevant acoustic features.
4. Building a predictive model to estimate the intelligibility using the selected features.

Solving the problem of intelligibility estimation is sufficient to integrate this criterion into current recommendation systems, so that they would recommend songs that match the user's taste and also score higher than a certain threshold depending on the user's fluency. However, this would only serve as an initial system to which additional factors can be added afterwards.

Chapter 3

Proposed System

In this chapter, we discuss the steps taken in solving the target problem. We specifically focus on estimating intelligibility as an initial step in selecting songs suitable for language learners. We state our approach to solving each of the four main challenges introduced in the problem statement in the previous chapter.

3.1 Defining ground truth measure of intelligibility

To build a system that can automatically process a song and evaluate the intelligibility of its lyrics, it is essential to gather ground truth data that reflects this intelligibility on average across different listeners. Hence, we conducted a study in which participants were tasked with listening to short excerpts of music and transcribing the lyrics, a common task for evaluating the intelligibility of lyrics [5]. The accuracy of their transcriptions can be used to assess the intelligibility of each excerpt. The experiment was initially conducted in a lab, which required the physical presence of the participants. In the next phase, we investigated the possibility of labeling our dataset using crowdsourcing platforms, specifically Amazon Mechanical Turk (MTurk), and verified its suitability and accuracy for this specific task, as described below in Section 3.2.

3.2 The behavioral experiment and collecting the dataset

In order to study the different acoustic and textual features useful in estimating intelligibility and to build an acoustic model to predict it, it is essential to have reliable, well-labeled data. Using the method defined in Section 3.1, we collected a total of 200 excerpts across two phases. The first phase used a setup that required the physical presence of the participants in the lab and was used to label 100 excerpts. The second phase verified the accuracy of using the MTurk platform for this labeling task and was then used to label another 100 excerpts.

Lab Experiment

Participants

Seventeen participants (seven females and ten males) volunteered to take part in the experiment. Participants were between 21 and 41 years old (mean = 27.4 years). All participants indicated no history of hearing impairment and that they spoke some English as a second language. Participants were rewarded with a $10 voucher for their time. Participants were recruited through university channels via posters and fliers. The majority of the participants were university students.

Materials

For the purpose of this study, we focused solely on English-language songs. Because one of the main applications for such a system is to recommend music for students who are learning foreign languages, we focused on genres that are popular with students. To identify these genres, we asked 48 university students to choose the 3 genres that they listen to the most out of the 12 genres introduced in [5], as these 12 genres cover a wide variety of singing styles. The twelve genres are: Avant-garde, Blues, Classical, Country, Folk, Jazz, Pop/Rock, Rhythm and Blues, Rap, Reggae, Religious, and Theater.

Because the transcription task is long and tiring for participants, we limited the number of genres tested to only five, from which we would draw approximately 45 minutes' worth of music for transcription. We selected the five most popular genres indicated by the 48 participants: Classical, Folk, Jazz, Pop/Rock, and Rhythm and Blues. After selecting the genres, we collected a dataset of 10 songs per genre. Because we were interested in evaluating participants' ability to transcribe an unfamiliar song, as opposed to transcribing a known song from memory, we focused on selecting songs that are not well known in each genre. We approached this by selecting songs that have fewer than 200 ratings on the website Rate Your Music (rateyourmusic.com). Rate Your Music is a database of popular music where users can rate and review different songs, albums, and artists. Popular songs have thousands of ratings, while less known songs have few ratings. We used this criterion to collect songs spanning the 5 genres to produce our dataset. The songs were randomly selected, with no control over the vocal range or the singer's accent, as long as they satisfied the condition of being in English and having few ratings. Because transcribing an entire song, let alone 50 songs, would be an overwhelming process for the participants, we selected short excerpts from each song to be transcribed. Two excerpts per song were selected randomly such that each excerpt would include a complete utterance (e.g., no excerpts were terminated mid-phrase). Excerpts varied between 3 and 16 seconds in length (average = 6.5 seconds) and contained 9.5 words on average. The ground-truth lyrics for these songs were collected from online sources and reviewed by the experimenters to ensure they matched the version of the song used in the experiment. It is important to note that selecting short excerpts might affect intelligibility, because the context of the song (which may help in understanding the lyrics) is lost. However, using these short excerpts is essential in making the experiment feasible for the participants, and they should still broadly reflect the intelligibility of the song. The complete dataset is composed of 100 excerpts from 50 songs, 2 excerpts per song, covering 5 genres with 10 songs per genre.

Figure 3.1: The process of labeling songs with an intelligibility score: human participants transcribe the lyrics of a song, and their transcriptions are compared with the ground-truth lyrics.

Procedure

We conducted the experiment in three group listening sessions. During each session, the participants were seated in a computer lab and recorded their transcriptions of the played excerpts on the computer in front of them. The excerpts were played in randomized order, and each excerpt was played twice consecutively. Between the two playbacks of each excerpt there was a pause of 5 seconds, and between different excerpts a pause of 10 seconds, to allow the participants sufficient time to write their transcriptions. The total duration of the listening session was 46:59 minutes. Two practice trials were presented before the experimental trials began, to familiarize participants with the experimental procedure. Figure 3.1 shows the complete procedure for labeling one given song.

Results and Discussion

To evaluate the accuracy of the participants' transcriptions, we counted the number of words correctly transcribed by the participant that match the ground-truth lyrics. For each transcription by each participant, the ratio of correctly transcribed words to the total number of words in the excerpt was calculated. We then calculated the average ratio for each excerpt across all 17 participants to yield an overall score for each excerpt between 0 and 1. This score was used to represent the ground-truth transcription accuracy, or Intelligibility score, for each excerpt. The distribution of Intelligibility scores in the dataset is shown in Figure 3.2. From the figure, we can observe that the intelligibility scores are biased towards higher values, i.e. there are relatively few excerpts with a low intelligibility score. This is caused by the restricted set of popular genres indicated by students, as certain excluded genres, such as Heavy Metal, would be expected to have low intelligibility.
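As a concrete illustration of this scoring scheme, the sketch below computes a per-transcription word-match ratio and averages it across participants. The tokenization and multiset word matching are our own assumptions, and the function names are ours; the study's exact matching rules may differ.

```python
import re
from collections import Counter

def word_match_ratio(transcription: str, ground_truth: str) -> float:
    """Fraction of ground-truth words found in the transcription
    (multiset matching, case-insensitive, punctuation stripped)."""
    tokenize = lambda s: re.findall(r"[a-z']+", s.lower())
    truth = Counter(tokenize(ground_truth))
    trans = Counter(tokenize(transcription))
    matched = sum((truth & trans).values())  # multiset intersection
    return matched / max(sum(truth.values()), 1)

def intelligibility_score(transcriptions: list[str], ground_truth: str) -> float:
    """Average word-match ratio across all participants: a value in [0, 1]."""
    ratios = [word_match_ratio(t, ground_truth) for t in transcriptions]
    return sum(ratios) / len(ratios)

# Example: two participants transcribing the same excerpt.
truth = "I walked along the river in the morning light"
transcripts = ["I walked along the river in morning light",
               "walked along a river in the light"]
print(intelligibility_score(transcripts, truth))  # ~0.78
```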

Figure 3.2: The distribution of the transcription accuracies (Intelligibility score).

Not having a wide variance of intelligibility scores will affect our system's ability to learn. Hence, in the second phase of collecting the dataset, we focused on including the less intelligible genres that were not included in this phase.

Amazon Mechanical Turk Experiment

One major drawback of the previous method is that it requires a long time, the physical presence of both participants and researchers, and a space reserved for the whole duration of the experiment. This makes performing the experiment on a large scale difficult. Hence, we investigated the possibility of using an online crowdsourcing platform such as MTurk. MTurk is an online platform that enables individuals and employers to coordinate and perform human intelligence tasks, known as HITs, in return for a payment. This resolves the problems of requiring the physical presence of both parties and of needing a reserved space for the experiment. However, we needed to investigate whether it would save time and whether it would produce results with high accuracy that correlate with the results from the controlled lab. Other studies have been conducted to validate MTurk for speech transcription [26]; however, to our knowledge, there are no such studies for lyrics transcription and its use in estimating an intelligibility score.

Figure 3.3: The webpage setup for the MTurk experiment.

MTurk setup

The webpage interface is the main setup for the MTurk experiment. We designed it in a way that delivers all the required information to the user while replicating the experiment conducted in the lab, for comparison and validation purposes. Figure 3.3 shows the webpage interface for the participants. The setup is composed of three main parts: instructions, playback and transcription, and a short survey of participants' age, gender, musical experience, and favorite genres. We priced a single HIT at 0.01 US dollars. We limited the time of the HIT to a maximum of 2 minutes and allowed the excerpt to be played a maximum of two times, the same as in the lab experiment.

Results and Discussion

The initial part of the experiment was conducted using the same 100 excerpts used in the previous lab experiment. We asked for 17 transcriptions per excerpt, the same as in the lab experiment. For a total number of 1700 HITs, the whole experiment was completed in one week.
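Checking the agreement between the two labeling pipelines reduces to a correlation test over the per-excerpt scores. A minimal sketch, assuming the lab and MTurk scores are stored as two aligned arrays (the values below are made up for illustration):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical aligned arrays: one intelligibility score per excerpt,
# as measured in the lab and on MTurk respectively.
lab_scores = np.array([0.92, 0.75, 0.40, 0.88, 0.15])
mturk_scores = np.array([0.90, 0.70, 0.45, 0.85, 0.20])

r, p_value = pearsonr(lab_scores, mturk_scores)
print(f"Pearson's r = {r:.2f} (p = {p_value:.3f})")
```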

Figure 3.4: Scores obtained from the lab-controlled experiment vs the MTurk experiment.

Table 3.1 shows a comparison between the lab and MTurk experiments. Figure 3.4 shows the scores obtained from both the lab and MTurk experiments. The figure shows that the scores from the two settings correlate strongly with each other (measured with Pearson's correlation). Hence, it is safe to assume that MTurk is a reliable platform for labeling such datasets according to our required criteria, while being more economical and requiring less time.

Method               Lab                                    MTurk
Cost                 $170                                   $41
Preparation time     2 weeks to find participants           1 week to get familiar with MTurk and set up the webpage (needed only once)
Time to get results  3 one-hour sessions across two weeks   1 week

Table 3.1: Comparison between lab-controlled and MTurk experiments in terms of cost, preparation time, and time to get results. The results show that MTurk is superior in all three categories.

Extending the dataset using MTurk

After validating the reliability of using MTurk for labeling songs with an intelligibility score, we extended our dataset with an additional 100 excerpts.

The motivation was to enlarge the dataset and to balance it between high- and low-intelligibility songs. As previously shown in Figure 3.2, the first batch of the dataset is skewed towards highly intelligible songs, due to the choice of certain intelligible genres. In the second batch of data collection, we focused on genres that are known to be less intelligible, to balance the distribution of intelligibility scores across the dataset. For the 2nd batch, we collected another 100 excerpts from these genres: Metal, Punk, Rap, Reggae, and Electro. As shown in Figure 3.5, the 2nd batch included more excerpts with low intelligibility, which balanced the skewed scores of the first batch. Having an evenly distributed dataset is important for model training and generalization. Figure 3.6 shows the distribution of intelligibility scores across the different genres. We can see how the five extra genres have lower intelligibility scores on average, balancing the dataset. The results also broadly agree with the results of [5], which was a lab-controlled experiment, additionally validating the scores collected from MTurk.

3.3 Investigating relevant acoustic features

The purpose of this study is to select audio features that can be used to build a system capable of 1) predicting the intelligibility of song lyrics, and 2) evaluating the accuracy of these predictions with respect to the ground truth gathered from human participants. In the following approach, we analyze the input signal and extract expressive features that reflect the different aspects of an intelligible singing voice. Several properties may contribute to making the singing voice less intelligible than normal speech. One such aspect is the presence of background music, as accompanying music can cover or obscure the voice. Therefore, highly intelligible songs would be expected to have a dominant singing voice compared with the accompanying music [5]. Unlike speech, the singing voice has a wider and more dynamic pitch range, often featuring higher pitches in the soprano vocal range. This has been shown to affect the intelligibility of songs, especially with respect to the perception of sung vowels [3, 1].

Figure 3.5: Comparison of the intelligibility score distributions across batches: (a) the 1st batch, (b) the 2nd batch, and (c) the full dataset.

Figure 3.6: Intelligibility scores across different genres.

An additional consideration is that in certain genres, such as Rap, singing is faster and has a higher rate of words per minute than speech, which can reduce intelligibility. Furthermore, as indicated in [18], the presence of common, frequently occurring words helps increase intelligibility, while uncommon words decrease the likelihood of understanding the lyrics. In our model, we aimed to include features that express these different aspects to determine the intelligibility of song lyrics across different genres. These features are then used to train the model to accurately predict the intelligibility of lyrics in the dataset, based on the ground truth collected in our behavioral experiment. This part of the study was conducted on the initial 100 excerpts, before the dataset extension using MTurk.

Preprocessing

To extract the proposed features from an input song, two initial steps are required: separating the singing voice from the accompaniment, and detecting the segments with vocals. To address these steps, we selected the following approaches based on current state-of-the-art methods:

Vocals Separation

Separating vocals from accompaniment music is a well-known problem that has received considerable attention in the research community. Our approach makes use of the popular Adaptive REPET algorithm [25]. This algorithm is based on detecting the repeating pattern in the song, which is meant to represent the background music. Removing the detected pattern leaves the non-repeating part of the song, meant to capture the vocals. Adaptive REPET also has the advantage over the original REPET algorithm [37] of discovering local repeating patterns in the song. Choosing Adaptive REPET was based on two main advantages: the algorithm is computationally attractive, and it shows competitive results compared to other separation algorithms, as shown in the evaluation of [23].

Detecting Vocal Segments

Detecting vocal and non-vocal segments in a song is an important step in extracting additional information about the intelligibility of the lyrics. Various approaches have been proposed to perform accurate vocal segmentation; however, it remains a challenging problem. For our approach, we implemented a method based on extracting the features proposed in [24] and training a Random Forest classifier on the Jamendo corpus [39]. The classifier was then used to classify each frame of the input file as either vocals or non-vocals.
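For illustration, here is a minimal frame-level vocal detector in the spirit of this step. It is a simplified stand-in rather than the method of [24]: it uses plain MFCC frames as features, and the training data below is a random placeholder standing in for labeled frames from an annotated corpus such as Jamendo.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def frame_features(y, sr):
    """Per-frame MFCC vectors (a simplified feature set, not that of [24])."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).T  # (n_frames, 20)

# Placeholder training data standing in for frames from an annotated
# corpus: y_train[i] = 1 if frame i contains vocals, else 0.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 20))
y_train = rng.integers(0, 2, size=1000)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

def vocal_mask(y, sr, clf):
    """Binary vocal/non-vocal decision for each frame of an input signal."""
    return clf.predict(frame_features(y, sr)).astype(bool)
```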

Audio features

In this section, we present the set of features used in training the model for estimating lyrics intelligibility. We use a mix of features reflecting specific aspects of intelligibility plus common standard acoustic features. The selected features are listed below; a consolidated computation sketch for several of them is given after the list.

1. Vocals to Accompaniment Music Ratio (VAR): Defined as the energy of the separated vocals divided by the energy of the accompaniment music. This ratio is computed only in segments where vocals are present. This feature reflects how strong the vocals are compared to the accompaniment. A high VAR suggests that the vocals are relatively loud and less likely to be obscured by the music; hence, a higher VAR counts towards higher intelligibility. This feature is particularly useful in identifying songs that are unintelligible due to loud background music obscuring the vocals.

2. Harmonics-to-Residual Ratio (HRR): Defined as the energy at a fundamental frequency (f0) detected with the YIN algorithm [6], plus the energy in its first 20 harmonics (a number chosen based on empirical trials), all divided by the energy of the residual. This ratio is also applied only to segments where vocals are present. Since the harmonics of the detected f0 in vocal segments are expected to be produced by the singing voice, this ratio, like VAR, helps to determine whether the vocals in a given piece of music are stronger or weaker than the background music that might obscure them.

3. High Frequency Energy (HFE): Defined as the sum of the spectral magnitudes above 4 kHz,

    \mathrm{HFE}_n = \sum_{k=f_{4k}}^{N_b/2} a_{n,k}    (3.1)

where a_{n,k} is the magnitude at block n and FFT index k of the short-time Fourier transform of the input signal, f_{4k} is the index corresponding to 4 kHz, and N_b is the FFT size [16]. We calculate the mean across all frames of the separated and segmented vocals signal, as we are interested in the high-frequency energy of the vocals and not of the accompanying instruments. This yields one scalar value per input file reflecting its high-frequency energy. Singing at higher frequencies has been shown to be less intelligible than singing at lower frequencies [3], so detecting high-frequency energy can be a useful clue that such vocals might be present and could reduce the intelligibility of the music, as frequently happens with opera music.

4. High Frequency Component (HFC): Defined as the sum of the magnitudes weighted by the squared frequency index,

    \mathrm{HFC}_n = \sum_{k=1}^{N_b/2} k^2 \, a_{n,k}    (3.2)

where a_{n,k} is the magnitude at block n and FFT index k of the short-time Fourier transform of the input signal and N_b is the FFT size [27]. This is another measure of high-frequency content.

5. Syllable Rate: Singing at a fast pace, pronouncing several syllables over a short period of time, can negatively affect intelligibility [7]. In the past, Rao et al. used the temporal dynamics of timbral features to separate the singing voice from background music [40]. These features showed more variance over time for the singing voice, while being relatively invariant for background instruments. We expect that these features will also be sensitive to the syllable rate in singing. We use the temporal standard deviation of two of their timbral features, each computed over a fixed frequency sub-band: the sub-band energy (SE) and the sub-band spectral centroid (SSC), defined as

    \mathrm{SSC} = \frac{\sum_{k=k_{low}}^{k_{high}} f(k) \, X(k)}{\sum_{k=k_{low}}^{k_{high}} X(k)}    (3.3)

    \mathrm{SE} = \sum_{k=k_{low}}^{k_{high}} X(k)^2    (3.4)

where f(k) and X(k) are the frequency and magnitude spectral value of the k-th frequency bin, and k_{low} and k_{high} are the nearest frequency bins to the lower and upper frequency limits of the sub-band, respectively. According to [40], SE enhances the fluctuations between voiced and unvoiced utterances, while SSC enhances the variations in the 2nd, 3rd, and 4th formants across phone transitions in the singing voice. Hence, it is reasonable to expect a high temporal variance of these features for songs with a high syllable rate, and vice versa. Thus, these features are able to differentiate songs with high and low syllable rates. We would expect very high and very low syllable rates to lead to a low intelligibility score, while rates in a range similar to that of speech should result in a high intelligibility score.

6. Word-Frequency Score: Songs which use common words have been shown to be more intelligible than those which use unusual or obscure words [18]. Hence, we calculate a word-frequency score for the lyrics of each song as an additional feature. This is a non-acoustic feature that is useful in cases where the lyrics of the song are available. We calculate the word-frequency score using the wordfreq open-source toolbox [42], which provides estimates of the frequencies of words in many languages.

7. Tempo and Event Density: These two rhythmic features reflect how fast the beat and rhythm of the song are. Event density is defined as the average frequency of events, i.e., the number of note onsets per second. Songs with very fast beats and high event density are likely to be less intelligible than slower songs, since the listener has less time to process each event before the next one begins. We used the MIRToolbox [22] to extract these rhythmic features.

8. Mel-frequency cepstral coefficients (MFCCs): MFCCs approximate the human auditory system's response more closely than linearly spaced frequency bands [36]. MFCCs have proven to be effective features in problems related to singing voice analysis [41], and so were considered as a potential feature here as well. For our system, we selected the first 17 coefficients (excluding the 0th) as well as their deltas, which proved empirically to be the best number of coefficients. The MFCCs are extracted from the original signal without separation, as they reflect how the whole song is perceived.

By extracting this set of features from an input file, we end up with a vector of 43 features to be used in estimating the intelligibility of the lyrics of the song.
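To make the feature definitions above concrete, the following sketch computes simplified versions of most of them with librosa and the wordfreq toolbox. It is a sketch under stated assumptions, not the thesis code: it presumes the preprocessing stage has produced separated `vocals` and `accomp` signals plus a boolean frame-level `vocal_mask`; the HRR feature is omitted (it additionally needs YIN f0 tracking); the SE/SSC band edges are placeholders, since the original limits are not reproduced here; and the per-file MFCC/delta means are our own aggregation choice.

```python
import numpy as np
import librosa
from wordfreq import zipf_frequency

EPS = 1e-12

def extract_features(y, vocals, accomp, vocal_mask, lyrics, sr=22050, n_fft=2048):
    """Simplified per-song feature vector. `vocal_mask` marks the STFT
    frames (hop = n_fft // 4) where the vocal detector fired."""
    S = np.abs(librosa.stft(vocals, n_fft=n_fft))        # vocal magnitudes
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    vm = np.asarray(vocal_mask, dtype=bool)[: S.shape[1]]

    # VAR: vocal vs. accompaniment energy, vocal frames only.
    rms_v = librosa.feature.rms(y=vocals, frame_length=n_fft)[0][: len(vm)]
    rms_a = librosa.feature.rms(y=accomp, frame_length=n_fft)[0][: len(vm)]
    var = np.sum(rms_v[vm] ** 2) / (np.sum(rms_a[vm] ** 2) + EPS)

    # HFE (Eq. 3.1): mean magnitude sum above 4 kHz over vocal frames.
    hfe = S[freqs >= 4000][:, vm].sum(axis=0).mean()

    # HFC (Eq. 3.2): frequency-index-squared weighted magnitude sum.
    k = np.arange(S.shape[0])[:, None]
    hfc = (k ** 2 * S).sum(axis=0).mean()

    # Syllable-rate proxies (Eqs. 3.3-3.4): temporal std of sub-band
    # energy (SE) and sub-band spectral centroid (SSC).
    band = (freqs >= 300) & (freqs <= 3400)   # placeholder band edges
    Sb = S[band]
    se_std = (Sb ** 2).sum(axis=0).std()
    ssc_std = ((freqs[band][:, None] * Sb).sum(axis=0)
               / (Sb.sum(axis=0) + EPS)).std()

    # Word-frequency score: mean Zipf frequency of the lyric words.
    words = lyrics.lower().split()
    wf = float(np.mean([zipf_frequency(w, "en") for w in words])) if words else 0.0

    # Tempo and event density (note onsets per second), on the full mix.
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    density = len(librosa.onset.onset_detect(y=y, sr=sr)) / (len(y) / sr)

    # MFCCs 1-17 plus their deltas on the unseparated mix, time-averaged.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=18)[1:]
    mfcc_stats = np.concatenate([mfcc.mean(axis=1),
                                 librosa.feature.delta(mfcc).mean(axis=1)])

    scalars = [var, hfe, hfc, se_std, ssc_std, wf,
               float(np.atleast_1d(tempo)[0]), density]
    return np.concatenate([scalars, mfcc_stats])  # 8 + 34 values (HRR omitted)
```

With HRR included, the vector reaches the 43 dimensions described above (9 scalar features plus 17 MFCCs and 17 deltas).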

3.4 Building an acoustic model

We used the dataset and ground truth collected in our behavioral experiment to train a Support Vector Machine model to estimate the intelligibility of the lyrics. To categorize intelligibility into different levels that would match a language student's fluency level, we divided our dataset into three classes:

High Intelligibility: excerpts with a transcription accuracy greater than 0.66.
Moderate Intelligibility: excerpts with a transcription accuracy between 0.33 and 0.66 inclusive.
Low Intelligibility: excerpts with a transcription accuracy of less than 0.33.

Out of the 100 samples in our dataset, 43 are in the High Intelligibility class, 42 are in the Moderate Intelligibility class, and the remaining 15 are in the Low Intelligibility class. For this pilot study, we tried a number of common classifiers, including the Support Vector Machine (SVM), random forest, and k-nearest neighbors. Our trials for finding a suitable model led to using an SVM with a linear kernel, as it is an efficient, fast, and simple model suitable for this problem. Finally, as a preprocessing step, we normalize all the input feature vectors before passing them to the model to be trained.

Model Evaluation

Because this problem has not been addressed before in the literature, and it is not possible to perform evaluation against other methods, we based our evaluation on classification accuracy on the dataset. Given the relatively small number of samples in the dataset, we used leave-one-out cross-validation for evaluation. To evaluate the performance of our model, we compute the overall accuracy as well as the Area Under the ROC Curve (AUC). We scored an AUC of 0.71 and an accuracy of 66% with the aforementioned set of features and model. The confusion matrix from validating our model using leave-one-out cross-validation on our collected dataset is shown in Figure 3.7. The figure shows that the classifier is relatively more accurate at predicting high and moderate intelligibility than low intelligibility, which is often confused with the moderate class. Given that our findings are based on a relatively small set of excerpts with low intelligibility, the classifier was trained to work better on the high and moderate excerpts.
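The described setup (feature normalization, a linear-kernel SVM, and leave-one-out cross-validation) can be reconstructed with scikit-learn as below. The thresholds follow the class definitions above; the feature matrix is random placeholder data, since the sketch is meant to show the evaluation loop rather than reproduce the reported numbers.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import accuracy_score, confusion_matrix

def to_class(score: float) -> str:
    """Map a transcription accuracy in [0, 1] to the three levels."""
    if score > 0.66:
        return "high"
    if score >= 0.33:
        return "moderate"
    return "low"

# Placeholder data standing in for the real 100 x 43 feature matrix
# and the per-excerpt intelligibility scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 43))
y = np.array([to_class(s) for s in rng.uniform(size=100)])

# Normalization + linear-kernel SVM, evaluated with leave-one-out CV:
# each excerpt is predicted by a model trained on the other 99.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
print("accuracy:", accuracy_score(y, y_pred))
print(confusion_matrix(y, y_pred, labels=["high", "moderate", "low"]))
```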

Figure 3.7: Confusion matrix of the SVM output.

Figure 3.8: Confusion matrices of the different genres (classes: High, Moderate, Low).

Following model evaluation on the complete dataset, we were interested in investigating how the model performs on different genres, specifically how it performs when tested on a genre that was not included in the training dataset. This indicates how well the model generalizes to genres that were not present during training, as well as how changing genres affects classification accuracy. We performed an evaluation where we trained our model using 4 out of the 5 genres in our dataset and tested it on the 5th genre. The classification accuracy across the different genres is shown in Table 3.2.

Genre       Classification Accuracy
Pop/Rock    60%
R&B         55%
Classical   70%
Folk        55%
Jazz        60%

Table 3.2: Classification accuracy for different genres.

The results show variance in classifying the different genres. For example, Classical music receives a higher accuracy, while genres such as Rhythm and Blues and Folk show lower accuracy. By analyzing the confusion matrices of each genre, shown in Figure 3.8, we found that the confusion is mainly between the high and moderate classes. To review the impact of the different features on classifier performance, we examined which features have the biggest impact using the attribute ranking feature in Weka [48]. We found that several MFCCs contribute most to differentiating between the three classes, which we interpret as being because analyzing the signal in different frequency sub-bands incorporates perceptual information about both the singing voice and the background music. These were followed by the features reflecting the syllable rate of the song, because the singing rate can radically affect intelligibility. The Vocals-to-Accompaniment Ratio and High Frequency Energy followed in their impact on differentiating between the three classes. The features that had the least impact were the tempo and event density, which do not necessarily reflect the rate of singing.
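This leave-one-genre-out evaluation maps directly onto scikit-learn's LeaveOneGroupOut splitter, with the genre of each excerpt as the group label. A sketch with placeholder data (array names and values are ours):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 43))                      # placeholder features
y = rng.choice(["high", "moderate", "low"], 100)    # placeholder labels
genres = np.repeat(["Pop/Rock", "R&B", "Classical", "Folk", "Jazz"], 20)

model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
# Each fold trains on 4 genres and tests on the held-out 5th genre.
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=genres):
    model.fit(X[train_idx], y[train_idx])
    acc = accuracy_score(y[test_idx], model.predict(X[test_idx]))
    print(f"{genres[test_idx][0]:>10}: {acc:.0%}")
```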

For further studies on the suitability of the features for classifying songs with very low intelligibility, the genre pool can be extended to include other genres with lower intelligibility, rather than being limited to the genres popular among students. Further studies can also address the feature selection and evaluation process: similar to the work in [46], deep learning methods may be explored to select the features which perform best, rather than hand-picking features, to find the most suitable set of features for this problem. It is also possible to extend the categorical approach of intelligibility levels to a regression problem, in which the system evaluates a song's intelligibility as a continuous score. Similarly, certain ranges of the intelligibility score can be used to recommend songs to students based on their fluency level.

Chapter 4

Future Work

In its current state, our work covers a pilot approach for estimating the intelligibility of the singing voice. However, the broader problem is to recommend songs for students who are learning a foreign language. Hence, future work would include extended studies on estimating the intelligibility of the singing voice, analyzing the lyrics in terms of complexity and grammatical structure, and using the proposed scores to recommend music to language students based on their fluency level. Possible directions of future work on the intelligibility score include:

1. Investigating the effectiveness of using baseline acoustic and textual features to expand the current feature set.
2. Expanding the dataset to allow using approaches that require large-scale datasets. Using the validated MTurk labeling scheme, the challenge is to find a way to collect the ground-truth lyrics and align them with the excerpts using methods from the literature, e.g. [15, 9].
3. With a large-scale dataset, we can investigate deep learning approaches, such as convolutional neural networks, that perform feature extraction instead of using hand-picked features.

Further work on lyrics complexity would investigate the grammatical structure of the lyrics and its correctness, to avoid recommending songs with grammatically incorrect lyrics.


More information

However, in studies of expressive timing, the aim is to investigate production rather than perception of timing, that is, independently of the listene

However, in studies of expressive timing, the aim is to investigate production rather than perception of timing, that is, independently of the listene Beat Extraction from Expressive Musical Performances Simon Dixon, Werner Goebl and Emilios Cambouropoulos Austrian Research Institute for Artificial Intelligence, Schottengasse 3, A-1010 Vienna, Austria.

More information

Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons

Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons Róisín Loughran roisin.loughran@ul.ie Jacqueline Walker jacqueline.walker@ul.ie Michael O Neill University

More information

Automatic Laughter Detection

Automatic Laughter Detection Automatic Laughter Detection Mary Knox Final Project (EECS 94) knoxm@eecs.berkeley.edu December 1, 006 1 Introduction Laughter is a powerful cue in communication. It communicates to listeners the emotional

More information

Modeling memory for melodies

Modeling memory for melodies Modeling memory for melodies Daniel Müllensiefen 1 and Christian Hennig 2 1 Musikwissenschaftliches Institut, Universität Hamburg, 20354 Hamburg, Germany 2 Department of Statistical Science, University

More information

Subjective Similarity of Music: Data Collection for Individuality Analysis

Subjective Similarity of Music: Data Collection for Individuality Analysis Subjective Similarity of Music: Data Collection for Individuality Analysis Shota Kawabuchi and Chiyomi Miyajima and Norihide Kitaoka and Kazuya Takeda Nagoya University, Nagoya, Japan E-mail: shota.kawabuchi@g.sp.m.is.nagoya-u.ac.jp

More information

Audio Feature Extraction for Corpus Analysis

Audio Feature Extraction for Corpus Analysis Audio Feature Extraction for Corpus Analysis Anja Volk Sound and Music Technology 5 Dec 2017 1 Corpus analysis What is corpus analysis study a large corpus of music for gaining insights on general trends

More information

Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video

Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video Mohamed Hassan, Taha Landolsi, Husameldin Mukhtar, and Tamer Shanableh College of Engineering American

More information

Robert Alexandru Dobre, Cristian Negrescu

Robert Alexandru Dobre, Cristian Negrescu ECAI 2016 - International Conference 8th Edition Electronics, Computers and Artificial Intelligence 30 June -02 July, 2016, Ploiesti, ROMÂNIA Automatic Music Transcription Software Based on Constant Q

More information

Perceptual dimensions of short audio clips and corresponding timbre features

Perceptual dimensions of short audio clips and corresponding timbre features Perceptual dimensions of short audio clips and corresponding timbre features Jason Musil, Budr El-Nusairi, Daniel Müllensiefen Department of Psychology, Goldsmiths, University of London Question How do

More information

AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION

AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION Halfdan Rump, Shigeki Miyabe, Emiru Tsunoo, Nobukata Ono, Shigeki Sagama The University of Tokyo, Graduate

More information

OBJECTIVE EVALUATION OF A MELODY EXTRACTOR FOR NORTH INDIAN CLASSICAL VOCAL PERFORMANCES

OBJECTIVE EVALUATION OF A MELODY EXTRACTOR FOR NORTH INDIAN CLASSICAL VOCAL PERFORMANCES OBJECTIVE EVALUATION OF A MELODY EXTRACTOR FOR NORTH INDIAN CLASSICAL VOCAL PERFORMANCES Vishweshwara Rao and Preeti Rao Digital Audio Processing Lab, Electrical Engineering Department, IIT-Bombay, Powai,

More information

Semi-supervised Musical Instrument Recognition

Semi-supervised Musical Instrument Recognition Semi-supervised Musical Instrument Recognition Master s Thesis Presentation Aleksandr Diment 1 1 Tampere niversity of Technology, Finland Supervisors: Adj.Prof. Tuomas Virtanen, MSc Toni Heittola 17 May

More information

hit), and assume that longer incidental sounds (forest noise, water, wind noise) resemble a Gaussian noise distribution.

hit), and assume that longer incidental sounds (forest noise, water, wind noise) resemble a Gaussian noise distribution. CS 229 FINAL PROJECT A SOUNDHOUND FOR THE SOUNDS OF HOUNDS WEAKLY SUPERVISED MODELING OF ANIMAL SOUNDS ROBERT COLCORD, ETHAN GELLER, MATTHEW HORTON Abstract: We propose a hybrid approach to generating

More information

Computational Models of Music Similarity. Elias Pampalk National Institute for Advanced Industrial Science and Technology (AIST)

Computational Models of Music Similarity. Elias Pampalk National Institute for Advanced Industrial Science and Technology (AIST) Computational Models of Music Similarity 1 Elias Pampalk National Institute for Advanced Industrial Science and Technology (AIST) Abstract The perceived similarity of two pieces of music is multi-dimensional,

More information

Features for Audio and Music Classification

Features for Audio and Music Classification Features for Audio and Music Classification Martin F. McKinney and Jeroen Breebaart Auditory and Multisensory Perception, Digital Signal Processing Group Philips Research Laboratories Eindhoven, The Netherlands

More information

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY Eugene Mikyung Kim Department of Music Technology, Korea National University of Arts eugene@u.northwestern.edu ABSTRACT

More information

Improving Frame Based Automatic Laughter Detection

Improving Frame Based Automatic Laughter Detection Improving Frame Based Automatic Laughter Detection Mary Knox EE225D Class Project knoxm@eecs.berkeley.edu December 13, 2007 Abstract Laughter recognition is an underexplored area of research. My goal for

More information

Effects of acoustic degradations on cover song recognition

Effects of acoustic degradations on cover song recognition Signal Processing in Acoustics: Paper 68 Effects of acoustic degradations on cover song recognition Julien Osmalskyj (a), Jean-Jacques Embrechts (b) (a) University of Liège, Belgium, josmalsky@ulg.ac.be

More information

Topics in Computer Music Instrument Identification. Ioanna Karydi

Topics in Computer Music Instrument Identification. Ioanna Karydi Topics in Computer Music Instrument Identification Ioanna Karydi Presentation overview What is instrument identification? Sound attributes & Timbre Human performance The ideal algorithm Selected approaches

More information

The Human Features of Music.

The Human Features of Music. The Human Features of Music. Bachelor Thesis Artificial Intelligence, Social Studies, Radboud University Nijmegen Chris Kemper, s4359410 Supervisor: Makiko Sadakata Artificial Intelligence, Social Studies,

More information

Musical Hit Detection

Musical Hit Detection Musical Hit Detection CS 229 Project Milestone Report Eleanor Crane Sarah Houts Kiran Murthy December 12, 2008 1 Problem Statement Musical visualizers are programs that process audio input in order to

More information

Automatic Music Genre Classification

Automatic Music Genre Classification Automatic Music Genre Classification Nathan YongHoon Kwon, SUNY Binghamton Ingrid Tchakoua, Jackson State University Matthew Pietrosanu, University of Alberta Freya Fu, Colorado State University Yue Wang,

More information

Chord Classification of an Audio Signal using Artificial Neural Network

Chord Classification of an Audio Signal using Artificial Neural Network Chord Classification of an Audio Signal using Artificial Neural Network Ronesh Shrestha Student, Department of Electrical and Electronic Engineering, Kathmandu University, Dhulikhel, Nepal ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

Voice & Music Pattern Extraction: A Review

Voice & Music Pattern Extraction: A Review Voice & Music Pattern Extraction: A Review 1 Pooja Gautam 1 and B S Kaushik 2 Electronics & Telecommunication Department RCET, Bhilai, Bhilai (C.G.) India pooja0309pari@gmail.com 2 Electrical & Instrumentation

More information

CSC475 Music Information Retrieval

CSC475 Music Information Retrieval CSC475 Music Information Retrieval Monophonic pitch extraction George Tzanetakis University of Victoria 2014 G. Tzanetakis 1 / 32 Table of Contents I 1 Motivation and Terminology 2 Psychacoustics 3 F0

More information

A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES

A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES 12th International Society for Music Information Retrieval Conference (ISMIR 2011) A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES Erdem Unal 1 Elaine Chew 2 Panayiotis Georgiou

More information

Transcription of the Singing Melody in Polyphonic Music

Transcription of the Singing Melody in Polyphonic Music Transcription of the Singing Melody in Polyphonic Music Matti Ryynänen and Anssi Klapuri Institute of Signal Processing, Tampere University Of Technology P.O.Box 553, FI-33101 Tampere, Finland {matti.ryynanen,

More information

Measuring a Measure: Absolute Time as a Factor in Meter Classification for Pop/Rock Music

Measuring a Measure: Absolute Time as a Factor in Meter Classification for Pop/Rock Music Introduction Measuring a Measure: Absolute Time as a Factor in Meter Classification for Pop/Rock Music Hello. If you would like to download the slides for my talk, you can do so at my web site, shown here

More information

CZT vs FFT: Flexibility vs Speed. Abstract

CZT vs FFT: Flexibility vs Speed. Abstract CZT vs FFT: Flexibility vs Speed Abstract Bluestein s Fast Fourier Transform (FFT), commonly called the Chirp-Z Transform (CZT), is a little-known algorithm that offers engineers a high-resolution FFT

More information

Automatic Laughter Detection

Automatic Laughter Detection Automatic Laughter Detection Mary Knox 1803707 knoxm@eecs.berkeley.edu December 1, 006 Abstract We built a system to automatically detect laughter from acoustic features of audio. To implement the system,

More information

Creating a Feature Vector to Identify Similarity between MIDI Files

Creating a Feature Vector to Identify Similarity between MIDI Files Creating a Feature Vector to Identify Similarity between MIDI Files Joseph Stroud 2017 Honors Thesis Advised by Sergio Alvarez Computer Science Department, Boston College 1 Abstract Today there are many

More information

Week 14 Music Understanding and Classification

Week 14 Music Understanding and Classification Week 14 Music Understanding and Classification Roger B. Dannenberg Professor of Computer Science, Music & Art Overview n Music Style Classification n What s a classifier? n Naïve Bayesian Classifiers n

More information

Acoustic Scene Classification

Acoustic Scene Classification Acoustic Scene Classification Marc-Christoph Gerasch Seminar Topics in Computer Music - Acoustic Scene Classification 6/24/2015 1 Outline Acoustic Scene Classification - definition History and state of

More information

Introductions to Music Information Retrieval

Introductions to Music Information Retrieval Introductions to Music Information Retrieval ECE 272/472 Audio Signal Processing Bochen Li University of Rochester Wish List For music learners/performers While I play the piano, turn the page for me Tell

More information

Feature-Based Analysis of Haydn String Quartets

Feature-Based Analysis of Haydn String Quartets Feature-Based Analysis of Haydn String Quartets Lawson Wong 5/5/2 Introduction When listening to multi-movement works, amateur listeners have almost certainly asked the following situation : Am I still

More information

Music Mood Classification - an SVM based approach. Sebastian Napiorkowski

Music Mood Classification - an SVM based approach. Sebastian Napiorkowski Music Mood Classification - an SVM based approach Sebastian Napiorkowski Topics on Computer Music (Seminar Report) HPAC - RWTH - SS2015 Contents 1. Motivation 2. Quantification and Definition of Mood 3.

More information

A CLASSIFICATION APPROACH TO MELODY TRANSCRIPTION

A CLASSIFICATION APPROACH TO MELODY TRANSCRIPTION A CLASSIFICATION APPROACH TO MELODY TRANSCRIPTION Graham E. Poliner and Daniel P.W. Ellis LabROSA, Dept. of Electrical Engineering Columbia University, New York NY 127 USA {graham,dpwe}@ee.columbia.edu

More information

Retrieval of textual song lyrics from sung inputs

Retrieval of textual song lyrics from sung inputs INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Retrieval of textual song lyrics from sung inputs Anna M. Kruspe Fraunhofer IDMT, Ilmenau, Germany kpe@idmt.fraunhofer.de Abstract Retrieving the

More information

Outline. Why do we classify? Audio Classification

Outline. Why do we classify? Audio Classification Outline Introduction Music Information Retrieval Classification Process Steps Pitch Histograms Multiple Pitch Detection Algorithm Musical Genre Classification Implementation Future Work Why do we classify

More information

Music Composition with RNN

Music Composition with RNN Music Composition with RNN Jason Wang Department of Statistics Stanford University zwang01@stanford.edu Abstract Music composition is an interesting problem that tests the creativity capacities of artificial

More information

TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC

TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC G.TZANETAKIS, N.HU, AND R.B. DANNENBERG Computer Science Department, Carnegie Mellon University 5000 Forbes Avenue, Pittsburgh, PA 15213, USA E-mail: gtzan@cs.cmu.edu

More information

Proceedings of Meetings on Acoustics

Proceedings of Meetings on Acoustics Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Musical Acoustics Session 3pMU: Perception and Orchestration Practice

More information

Hidden Markov Model based dance recognition

Hidden Markov Model based dance recognition Hidden Markov Model based dance recognition Dragutin Hrenek, Nenad Mikša, Robert Perica, Pavle Prentašić and Boris Trubić University of Zagreb, Faculty of Electrical Engineering and Computing Unska 3,

More information

arxiv: v1 [cs.ir] 16 Jan 2019

arxiv: v1 [cs.ir] 16 Jan 2019 It s Only Words And Words Are All I Have Manash Pratim Barman 1, Kavish Dahekar 2, Abhinav Anshuman 3, and Amit Awekar 4 1 Indian Institute of Information Technology, Guwahati 2 SAP Labs, Bengaluru 3 Dell

More information

Acoustic Prosodic Features In Sarcastic Utterances

Acoustic Prosodic Features In Sarcastic Utterances Acoustic Prosodic Features In Sarcastic Utterances Introduction: The main goal of this study is to determine if sarcasm can be detected through the analysis of prosodic cues or acoustic features automatically.

More information

Machine Learning Term Project Write-up Creating Models of Performers of Chopin Mazurkas

Machine Learning Term Project Write-up Creating Models of Performers of Chopin Mazurkas Machine Learning Term Project Write-up Creating Models of Performers of Chopin Mazurkas Marcello Herreshoff In collaboration with Craig Sapp (craig@ccrma.stanford.edu) 1 Motivation We want to generative

More information

Music Information Retrieval Community

Music Information Retrieval Community Music Information Retrieval Community What: Developing systems that retrieve music When: Late 1990 s to Present Where: ISMIR - conference started in 2000 Why: lots of digital music, lots of music lovers,

More information

Music Genre Classification

Music Genre Classification Music Genre Classification chunya25 Fall 2017 1 Introduction A genre is defined as a category of artistic composition, characterized by similarities in form, style, or subject matter. [1] Some researchers

More information

GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)

GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) (1) Stanford University (2) National Research and Simulation Center, Rafael Ltd. 0 MICROPHONE

More information

Automatic Labelling of tabla signals

Automatic Labelling of tabla signals ISMIR 2003 Oct. 27th 30th 2003 Baltimore (USA) Automatic Labelling of tabla signals Olivier K. GILLET, Gaël RICHARD Introduction Exponential growth of available digital information need for Indexing and

More information

Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University

Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University Can Song Lyrics Predict Genre? Danny Diekroeger Stanford University danny1@stanford.edu 1. Motivation and Goal Music has long been a way for people to express their emotions. And because we all have a

More information

POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS

POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS Andrew N. Robertson, Mark D. Plumbley Centre for Digital Music

More information

CTP431- Music and Audio Computing Music Information Retrieval. Graduate School of Culture Technology KAIST Juhan Nam

CTP431- Music and Audio Computing Music Information Retrieval. Graduate School of Culture Technology KAIST Juhan Nam CTP431- Music and Audio Computing Music Information Retrieval Graduate School of Culture Technology KAIST Juhan Nam 1 Introduction ü Instrument: Piano ü Genre: Classical ü Composer: Chopin ü Key: E-minor

More information

A repetition-based framework for lyric alignment in popular songs

A repetition-based framework for lyric alignment in popular songs A repetition-based framework for lyric alignment in popular songs ABSTRACT LUONG Minh Thang and KAN Min Yen Department of Computer Science, School of Computing, National University of Singapore We examine

More information

6.UAP Project. FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System. Daryl Neubieser. May 12, 2016

6.UAP Project. FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System. Daryl Neubieser. May 12, 2016 6.UAP Project FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System Daryl Neubieser May 12, 2016 Abstract: This paper describes my implementation of a variable-speed accompaniment system that

More information

Recognising Cello Performers using Timbre Models

Recognising Cello Performers using Timbre Models Recognising Cello Performers using Timbre Models Chudy, Magdalena; Dixon, Simon For additional information about this publication click this link. http://qmro.qmul.ac.uk/jspui/handle/123456789/5013 Information

More information

APPLICATIONS OF A SEMI-AUTOMATIC MELODY EXTRACTION INTERFACE FOR INDIAN MUSIC

APPLICATIONS OF A SEMI-AUTOMATIC MELODY EXTRACTION INTERFACE FOR INDIAN MUSIC APPLICATIONS OF A SEMI-AUTOMATIC MELODY EXTRACTION INTERFACE FOR INDIAN MUSIC Vishweshwara Rao, Sachin Pant, Madhumita Bhaskar and Preeti Rao Department of Electrical Engineering, IIT Bombay {vishu, sachinp,

More information

Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis

Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis Fengyan Wu fengyanyy@163.com Shutao Sun stsun@cuc.edu.cn Weiyao Xue Wyxue_std@163.com Abstract Automatic extraction of

More information

On Human Capability and Acoustic Cues for Discriminating Singing and Speaking Voices

On Human Capability and Acoustic Cues for Discriminating Singing and Speaking Voices On Human Capability and Acoustic Cues for Discriminating Singing and Speaking Voices Yasunori Ohishi 1 Masataka Goto 3 Katunobu Itou 2 Kazuya Takeda 1 1 Graduate School of Information Science, Nagoya University,

More information

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University Week 14 Query-by-Humming and Music Fingerprinting Roger B. Dannenberg Professor of Computer Science, Art and Music Overview n Melody-Based Retrieval n Audio-Score Alignment n Music Fingerprinting 2 Metadata-based

More information

Using Genre Classification to Make Content-based Music Recommendations

Using Genre Classification to Make Content-based Music Recommendations Using Genre Classification to Make Content-based Music Recommendations Robbie Jones (rmjones@stanford.edu) and Karen Lu (karenlu@stanford.edu) CS 221, Autumn 2016 Stanford University I. Introduction Our

More information

Reconstruction of Ca 2+ dynamics from low frame rate Ca 2+ imaging data CS229 final project. Submitted by: Limor Bursztyn

Reconstruction of Ca 2+ dynamics from low frame rate Ca 2+ imaging data CS229 final project. Submitted by: Limor Bursztyn Reconstruction of Ca 2+ dynamics from low frame rate Ca 2+ imaging data CS229 final project. Submitted by: Limor Bursztyn Introduction Active neurons communicate by action potential firing (spikes), accompanied

More information

Speech To Song Classification

Speech To Song Classification Speech To Song Classification Emily Graber Center for Computer Research in Music and Acoustics, Department of Music, Stanford University Abstract The speech to song illusion is a perceptual phenomenon

More information

AUDITION PROCEDURES:

AUDITION PROCEDURES: COLORADO ALL STATE CHOIR AUDITION PROCEDURES and REQUIREMENTS AUDITION PROCEDURES: Auditions: Auditions will be held in four regions of Colorado by the same group of judges to ensure consistency in evaluating.

More information

Analytic Comparison of Audio Feature Sets using Self-Organising Maps

Analytic Comparison of Audio Feature Sets using Self-Organising Maps Analytic Comparison of Audio Feature Sets using Self-Organising Maps Rudolf Mayer, Jakob Frank, Andreas Rauber Institute of Software Technology and Interactive Systems Vienna University of Technology,

More information

MUSICAL MOODS: A MASS PARTICIPATION EXPERIMENT FOR AFFECTIVE CLASSIFICATION OF MUSIC

MUSICAL MOODS: A MASS PARTICIPATION EXPERIMENT FOR AFFECTIVE CLASSIFICATION OF MUSIC 12th International Society for Music Information Retrieval Conference (ISMIR 2011) MUSICAL MOODS: A MASS PARTICIPATION EXPERIMENT FOR AFFECTIVE CLASSIFICATION OF MUSIC Sam Davies, Penelope Allen, Mark

More information

Automatic Music Similarity Assessment and Recommendation. A Thesis. Submitted to the Faculty. Drexel University. Donald Shaul Williamson

Automatic Music Similarity Assessment and Recommendation. A Thesis. Submitted to the Faculty. Drexel University. Donald Shaul Williamson Automatic Music Similarity Assessment and Recommendation A Thesis Submitted to the Faculty of Drexel University by Donald Shaul Williamson in partial fulfillment of the requirements for the degree of Master

More information

Improvised Duet Interaction: Learning Improvisation Techniques for Automatic Accompaniment

Improvised Duet Interaction: Learning Improvisation Techniques for Automatic Accompaniment Improvised Duet Interaction: Learning Improvisation Techniques for Automatic Accompaniment Gus G. Xia Dartmouth College Neukom Institute Hanover, NH, USA gxia@dartmouth.edu Roger B. Dannenberg Carnegie

More information

Does Music Directly Affect a Person s Heart Rate?

Does Music Directly Affect a Person s Heart Rate? Wright State University CORE Scholar Medical Education 2-4-2015 Does Music Directly Affect a Person s Heart Rate? David Sills Amber Todd Wright State University - Main Campus, amber.todd@wright.edu Follow

More information

WHAT'S HOT: LINEAR POPULARITY PREDICTION FROM TV AND SOCIAL USAGE DATA Jan Neumann, Xiaodong Yu, and Mohamad Ali Torkamani Comcast Labs

WHAT'S HOT: LINEAR POPULARITY PREDICTION FROM TV AND SOCIAL USAGE DATA Jan Neumann, Xiaodong Yu, and Mohamad Ali Torkamani Comcast Labs WHAT'S HOT: LINEAR POPULARITY PREDICTION FROM TV AND SOCIAL USAGE DATA Jan Neumann, Xiaodong Yu, and Mohamad Ali Torkamani Comcast Labs Abstract Large numbers of TV channels are available to TV consumers

More information