Laughbot: Detecting Humor in Spoken Language with Language and Audio Cues

Kate Park, Annie Hu, Natalie Muenster
Email: katepark@stanford.edu, anniehu@stanford.edu, ncm000@stanford.edu

Abstract

We propose detecting and responding to humor in spoken dialogue by extracting language and audio cues and feeding these features into a combined Recurrent Neural Network (RNN) and logistic regression model. We parse Switchboard phone conversations to build a corpus of punchlines and unfunny lines, where punchlines are the lines that precede laughter tokens in the Switchboard transcripts. We create a combined RNN and logistic regression model that uses both acoustic and language cues to predict whether a conversational agent should respond to an utterance with laughter. Our model achieves an F1-score of 63.2, outperforming our logistic language model (F1-score 56.6) and our RNN acoustic model (59.4), as well as the final RNN model of Bertero et al.'s (2016) paper (52.9). Using our final model, we create a laughbot that audibly responds with laughter when a user's utterance is classified as a punchline. A conversational agent outfitted with a humor-recognition system such as the one we present in this paper would be valuable as these agents gain utility in everyday life.

1 Introduction

Our project aims to detect humor, and thus to predict when someone will laugh, based on textual and acoustic cues. Humor depends on the words that are said, on how the speaker's voice changes, and on situational context. Predicting when someone will laugh is a significant part of contemporary media. For example, sitcoms are written and performed primarily to cause laughter, so the ability to detect precisely why something will draw laughter is of extreme importance to screenwriters and actors. Being able to detect humor could also improve embodied conversational agents, which are attributed more human qualities when they are able to display an appreciation of humor (Nijholt, 2003). Siri currently will only say "you are very funny," but could become more human-like if it had the ability to recognize social cues like humor and respond to them with laughter. Thus, our objective is to train and test a model that can detect whether or not a line is funny and should be responded to with laughter.

2 Background / Related Work

Existing work in predicting humor uses a combination of text and audio features and various machine learning methods to tackle the classification problem. The goal of Piot et al.'s (2014) research in "Predicting when to Laugh with Structured Classification" was to create an avatar that imitated a human expert's sense of when to laugh in a conversation. Piot et al. reduced the problem to a multi-class classification problem and found that a large-margin algorithm produced the most natural behavior (laughing in the same proportion as the expert). In a later paper, "Imitation Learning Applied to Embodied Conversational Agents," Piot et al. (2015) examined the problem of using a laughter-enabled Interaction Manager to make decisions about whether or not to laugh. They defined the Regularized Classification for Apprenticeship Learning (RCAL) algorithm using audio features. Final results showed that RCAL performed much better than multi-class classification on one class while performing slightly worse on the other.

The paper noted the problem of a large class imbalance between laughter and non-laughter in the dataset, and suggested potential future work on Inverse Reinforcement Learning (IRL) algorithms. From this, we propose weighting the datasets to reduce the imbalance, so that the non-laughter class does not overpower what the model learns from punchlines.

Bertero and Fung (2016) compared three supervised machine learning methods for predicting and detecting humor in dialogues from The Big Bang Theory sitcom. From a corpus in which punchlines were annotated by laughter, Bertero et al. extracted audio and language features and fed them into three models (a CNN, an RNN, and a CRF), which were compared against a logistic regression baseline (F1-score 29.2). The CNN using word vectors and overlapping 25ms time frames performed best (F1-score 68.6). The RNN model (F1-score 52.9) was expected to perform best but actually performed worse than the CNN, likely due to overfitting. The paper proposes future work on building a better dialog system that understands humor. In this paper, our objective is to improve on the RNN model. Rather than a sitcom dataset, we use everyday phone conversations, whose humor we hypothesize is more relevant for a conversational agent like Siri. Further, we build a simple dialog system, laughbot, that responds to humor.

3 Dataset

The focus of this project is detecting humor in everyday, regular speech, in contrast to the past work we examined, which analyzes humor in sitcoms. We use the Switchboard corpus available on AFS, which consists of around 3000 groups of audio files, transcripts, and word-level time intervals. We classified each line of a transcript as a punchline if it preceded any indication of laughter at the beginning of the next speaker's response:

B: That, that's the major reason I'm walking up the stairs.  (Punchline)
A: [Laughter] To go skiing?

Table 1: Punchline inducing laughter

A: Uh-huh. Well, you must have a relatively clean conscience then [laughter].  (Punchline)
B: [Laughter]

Table 2: Laughter that induces laughter is classified as a punchline

A: just for fun.  (Not Punchline)
B: Shaking the scorpions out of their shoes [laughter].

Table 3: A line preceding someone laughing at themselves does not count as a punchline

We split our data into 80% train / 10% validation / 10% test sets. Considering the imbalanced datasets that previous work ran into, we sample 5% of the non-punchlines, since our original dataset is heavily skewed toward non-punchlines. This yields a more balanced train set between the positive (punchline) and negative (unfunny line) classes. Our final datasets each contain about 35-40% punchlines.

3.1 Features

We extract a combination of language and audio features from the data. Language features, sketched in code after this list, include:

- Unigrams, bigrams, and trigrams: we prune the vocabulary and keep the n-grams that appear more than a certain number of times in the train set, tuning the threshold on our validation set. In this model, we keep n-grams that appear at least twice.
- Parts of speech: we use NLTK's POS tagger to count the nouns, verbs, adjectives, adverbs, and pronouns appearing in the example (Bird et al., 2009).
- Sentiment: we use NLTK's VADER toolkit to extract the sentiment of the punchline, on a scale from more negative to more positive (Bird et al., 2009).
- Length and average word length: from reading past work, we learned that sitcom punchlines are often short, so we use length features (Bertero, 2016).
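To make the feature set concrete, here is a minimal sketch of the language-feature extractor described above, using NLTK as we do; the function names and exact feature encodings are our own illustration, not the project's actual code, and it assumes the NLTK 'punkt', 'averaged_perceptron_tagger', and 'vader_lexicon' data packages are installed.

    # Sketch of the language features: pruned n-grams, POS counts,
    # VADER sentiment, and length features.
    from collections import Counter
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    MIN_COUNT = 2  # keep n-grams that appear at least twice in the train set

    def ngrams(tokens, n):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def build_vocab(train_lines):
        """Keep unigrams/bigrams/trigrams appearing >= MIN_COUNT times."""
        counts = Counter()
        for line in train_lines:
            tokens = nltk.word_tokenize(line.lower())
            for n in (1, 2, 3):
                counts.update(ngrams(tokens, n))
        return {g for g, c in counts.items() if c >= MIN_COUNT}

    POS_GROUPS = {'noun': 'NN', 'verb': 'VB', 'adj': 'JJ',
                  'adv': 'RB', 'pron': 'PR'}
    sia = SentimentIntensityAnalyzer()

    def language_features(line, vocab):
        tokens = nltk.word_tokenize(line.lower())
        feats = {}
        # indicator features for the pruned n-grams
        for n in (1, 2, 3):
            for gram in ngrams(tokens, n):
                if gram in vocab:
                    feats['ng=' + ' '.join(gram)] = 1.0
        # counts of nouns, verbs, adjectives, adverbs, and pronouns
        tags = [tag for _, tag in nltk.pos_tag(tokens)]
        for name, prefix in POS_GROUPS.items():
            feats['pos_' + name] = sum(t.startswith(prefix) for t in tags)
        # VADER compound score: most negative (-1) to most positive (+1)
        feats['sentiment'] = sia.polarity_scores(line)['compound']
        # punchlines tend to be short
        feats['n_tokens'] = len(tokens)
        feats['avg_word_len'] = sum(map(len, tokens)) / max(len(tokens), 1)
        return feats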
We also extract acoustic features from each audio file (converted to .wav format) with the openSMILE toolkit, and match the timestamped features to the timed transcripts in Switchboard to obtain corresponding acoustic and language features for each example (Eyben et al., 2013).
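A rough sketch of this extraction and alignment step follows; the openSMILE configuration file, the CSV output layout, and the align() helper are assumptions for illustration rather than our exact setup.

    # Sketch of frame-level feature extraction with openSMILE's SMILExtract
    # binary, then alignment against Switchboard's word-level timestamps.
    import csv
    import subprocess

    def extract_frames(wav_path, csv_path, config='mfcc_energy.conf'):
        # assumed config: writes one row of MFCC + energy values per 10ms frame
        subprocess.run(['SMILExtract', '-C', config,
                        '-I', wav_path, '-O', csv_path], check=True)
        with open(csv_path) as f:
            return list(csv.reader(f, delimiter=';'))

    def align(frames, start_sec, end_sec, frame_shift=0.010):
        """Slice out the frames that fall inside one utterance's
        time interval from the timed transcript."""
        lo = int(start_sec / frame_shift)
        hi = int(end_sec / frame_shift)
        return frames[lo:hi]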

Acoustic features include:

- MFCCs: we expect MFCC vectors to store the most information about an audio sample, so we sample 12 coefficients every 10ms, with a maximum of 50 time intervals per example, since certain lines may be too long to store fully.
- Energy level: we also expect the speaker's energy to be a strong indicator of humor, so we include it as an additional feature.

4 Approach / Models

4.1 Baseline

Our baseline was an all-positive classifier, predicting every example as a punchline. The precision of this classifier is the proportion of true punchlines in the dataset (around 35%), and its recall is 100%. We also used an all-negative classifier, predicting every line as unfunny; its precision with respect to the negative class is the proportion of unfunny lines (around 65%), and its recall on punchlines is 0%.

Classifier      Precision   Recall   F1-score
All Positive
All Negative

Table 4: Baseline metrics

4.2 Logistic Regression Language Model

We train a logistic regression model using only language features (n-grams, sentiment, line length) as a secondary baseline. Logistic regression is an intuitive starting model for binary classification, and it also allows us to observe and tune the performance of just our language features in predicting humor.

4.3 RNN Acoustic Model

We next train a Recurrent Neural Network (RNN) using only acoustic features, to observe the performance of our acoustic features in classifying lines of audio as punchlines or not. We choose an RNN to better capture the sequential nature, and thus the conversational context, of dialogue, and we use Gated Recurrent Unit (GRU) cells so our model can better remember earlier timesteps in a line instead of overemphasizing the latest timesteps. During training, we use standard softmax cross entropy to calculate cost. We initially used an Adam optimizer, as it handles less frequently seen training features better and converges more smoothly than stochastic gradient descent. Our final RNN uses an Adamax optimizer to further stabilize the model between epochs and to make it more robust to less frequently seen features and gradient noise.

4.4 Final Combined Model

After designing separate language and acoustic models, we combined the two as follows (sketched in code below):

1. Run our RNN on all acoustic features in the training set, and extract the final hidden state vector of the RNN for each training example.
2. Concatenate this vector with all language features for its corresponding training example.
3. Use the combined feature vectors to train a logistic regression model.

Figure 1: Diagram of final RNN + LogReg model

Figure 1 shows our combined model architecture. For testing, we follow a similar process of running the acoustic features through our pre-trained RNN, concatenating the final hidden state vector with the language features for an example, and running the combined feature vector through our pre-trained logistic regression model to obtain a prediction.
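The combined model is small enough to sketch in a few lines. Here, rnn_final_state() stands in for our pre-trained GRU acoustic model, and the logistic regression comes from scikit-learn; the helper and variable names are illustrative assumptions, not our actual code.

    # Sketch of the combined model; step numbers match the list above.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def combine(acoustic_seqs, language_feats, rnn_final_state):
        # 1. final hidden state of the pre-trained RNN for each example,
        # 2. concatenated with that example's language feature vector
        return np.array([np.concatenate([rnn_final_state(seq), lang])
                         for seq, lang in zip(acoustic_seqs, language_feats)])

    # 3. train logistic regression on the combined feature vectors
    clf = LogisticRegression()
    # clf.fit(combine(train_acoustic, train_lang, rnn_final_state), y_train)

    # Testing follows the same path with the pre-trained RNN and classifier:
    # y_pred = clf.predict(combine(test_acoustic, test_lang, rnn_final_state))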

5 Laughbot Architecture

The laughbot is a simple user-interface application that implements the model we built during our research and testing, predicting humor through a chatbot-style audio prompter. It is intended for demonstration purposes, both letting users experiment with custom input and showing the results of our project in an accessible and tangible form. The user speaks into the microphone, after which the laughbot classifies whether what was said was funny, and audibly laughs if so.

The laughbot is designed to take the user's input audio, transcribe it, and feed the audio file and transcription into our pre-trained RNN and logistic regression model. Multithreading allows the user to speak into the microphone for as much of the maximum 60-second time segment as he or she would like, before pressing Enter to indicate the end of speech. The audio is then saved as a .wav file and transcribed by calling the Google Cloud Speech API. Both the transcription and the original audio file are sent through the pre-trained model: acoustic features are extracted and run through the pre-trained RNN, and the last hidden states are combined with textual features extracted from the transcription as input to the logistic regression model. Once a classification is obtained, not funny or funny, the laughbot will either keep a straight face, staying silent and just prompting for more audio, or randomly play one of several laughtracks that we recorded during the late-night hours of project development. The classification is almost immediate; the brunt of the runtime depends on the speed of transcription from the Google Cloud Speech API, and thus on the strength of the wifi connection. To see a sample of our laughbot in action, see this video:

During development of the laughbot, we originally tried to design it to work in real time, so the user could continually speak and the laughbot could laugh at every point in the one-sided conversation where it recognized humor. We were able to transcribe audio in real time with the Google Speech API, and intended to multithread it to capture an audio file simultaneously, but we faced problems structuring the rest of the interface to continually run the input through the model until humor was detected or the speaker paused for more than a certain threshold. Real-time recognition and laughing is a next-step implementation that will involve sending partial transcripts and audio files through the model continuously, concatenating audio and transcript data with preceding chunks so that context and longer audio cues can contribute to the classification.
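A condensed sketch of this interaction loop is shown below; record_until_enter(), classify_punchline(), and play() are hypothetical stand-ins for the recording thread, the combined model, and audio playback, while the transcription call follows the Google Cloud Speech Python client.

    # Sketch of the laughbot loop: record, transcribe, classify, laugh.
    import random
    from google.cloud import speech

    def transcribe(wav_bytes):
        """Send the recorded .wav audio to Google Cloud Speech, return text."""
        client = speech.SpeechClient()
        config = speech.RecognitionConfig(language_code='en-US')
        audio = speech.RecognitionAudio(content=wav_bytes)
        response = client.recognize(config=config, audio=audio)
        return ' '.join(r.alternatives[0].transcript for r in response.results)

    def laughbot_loop(record_until_enter, classify_punchline, play, laughtracks):
        while True:
            wav_bytes = record_until_enter(max_seconds=60)  # microphone thread
            transcript = transcribe(wav_bytes)
            if classify_punchline(wav_bytes, transcript):   # RNN + LogReg model
                play(random.choice(laughtracks))            # respond with laughter
            # otherwise keep a straight face and prompt for more audio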
6 Results

6.1 Evaluation

We evaluate using accuracy, precision, recall, and F1 scores, with the greatest emphasis on F1. Accuracy calculates the proportion of correct predictions:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision calculates the proportion of predicted punchlines that were actual punchlines:

    Precision = TP / (TP + FP)

Recall calculates the proportion of true punchlines that were captured as punchlines:

    Recall = TP / (TP + FN)

F1 is the harmonic mean of precision and recall:

    F1 = 2 * Precision * Recall / (Precision + Recall)

Table 5 and Figure 2 show the final performance of our models on these metrics, evaluated on the test dataset. Notably, our final model not only beat our baseline, it also beat the final RNN model of Bertero (2016), the approach most similar to our own. Note that Bertero et al.'s paper used the Big Bang Theory sitcom dialogues, which we hypothesize are an easier dataset to classify than general phone conversations. Their dataset also had a higher proportion of punchlines, leading to a higher F1-score for their positive baseline; as a result, their final CNN model improved less over its baseline (14.36% improvement) than ours did over our baseline (16.35% improvement). Further, while Bertero's final CNN model had a higher F1-score than our final RNN-based model, our model had higher accuracy.
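For concreteness, the four metrics above can be computed from raw confusion counts with a trivial helper (a reference implementation of the formulas, not our evaluation code):

    def metrics(tp, tn, fp, fn):
        """Accuracy, precision, recall, and F1 from confusion counts."""
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return accuracy, precision, recall, f1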

6.2 Model Analysis

Our final combined RNN acoustic and logistic regression language model performed best of all our models. This fit our expectations, as humor should depend on both what was said and how it was said. The language-only and audio-only models had fairly similar accuracies to the final model, but much lower recall scores (especially the language model), suggesting that the combined model was better at correctly predicting punchlines, while the individual models perhaps tended to be too conservative in their predictions, predicting non-punchline for too many true punchlines.

Classifier                          Accuracy   Precision   Recall   F1-score
Logistic Regression (train)
RNN (train)
Combined (train)
Logistic Regression (validation)
RNN (validation)
Combined (validation)
Logistic Regression (test)
RNN (test)
Combined (test)

Table 5: Comparison of all models on all datasets

Classifier                              Accuracy   Precision   Recall   F1-score
Bertero's Positive Baseline
Our Positive Baseline
Logistic regression (language only)
RNN (audio only)
Bertero's Final RNN
Our Final Model (RNN + LogReg)
Bertero's Final CNN

Table 6: Comparison of our model against Bertero et al.

Figure 2: Comparison of models on test datasets

We tuned our RNN and regression models separately on the validation set. We found that the language model performed best when the frequent n-gram threshold was set at 2 (so we only included n-grams that occurred at least twice in the train set), and performance dropped as this threshold was increased. This makes sense: for bigrams and especially trigrams, the number that appear at least x times in a dataset drops drastically as x increases, so with too high a threshold we were excluding too many potentially useful features. We also found that sentence length was a particularly important feature, which confirmed our expectation that most punchlines would be relatively short. With the RNN, we found that increasing the number of hidden states greatly improved model performance up to a certain point, then began causing overfitting past that point. The same was true of the number of epochs we ran the model through during training. As we were using an Adamax optimizer, which already adapts the model's learning rate, we did not perform much tuning on our initial learning rate.

6.3 Error Analysis

Table 5 and Figure 3 show the performance of our language-only, audio-only, and combined models on our train, validation, and test datasets.

Figure 3: Comparison of F1-scores on train, val, and test

All models performed significantly better on the train set than on the validation or test set, especially the final combined model, suggesting that our model is strongly overfitting to the training data. This may be helped with hyperparameter tuning (e.g., decreasing the number of epochs or the number of hidden units in our RNN, or changing the regularization of our logistic regression model) or by simplifying the model (e.g., decreasing the maximum number of MFCC vectors to extract or decreasing the number of language features); one such adjustment is sketched below. As we explore in the next section, overfitting may also be helped by training on larger datasets.
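As one example of the epoch-count adjustment suggested above, training could stop once validation F1 stops improving. This early-stopping sketch is our own illustration, with train_epoch(), eval_f1(), and the get_weights()/set_weights() interface all assumed rather than taken from our code:

    def train_with_early_stopping(model, train_epoch, eval_f1,
                                  max_epochs=50, patience=3):
        """Stop when validation F1 has not improved for `patience` epochs."""
        best_f1, best_weights, bad_epochs = 0.0, None, 0
        for _ in range(max_epochs):
            train_epoch(model)              # one pass over the train set
            f1 = eval_f1(model)             # F1 on the validation set
            if f1 > best_f1:
                best_f1, best_weights, bad_epochs = f1, model.get_weights(), 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    break
        if best_weights is not None:
            model.set_weights(best_weights)  # roll back to the best epoch
        return best_f1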

6.4 Dataset Analysis

We initially ran our models on only 20% of the full Switchboard dataset to speed up training. Figure 4 shows the training accuracy and cost curves of our RNN model, which suggest strong overfitting to the train set.

Figure 4: RNN model train accuracy and cost on 20% of dataset

Once we finalized our combined RNN and logistic regression model, we continued to run on larger portions of the Switchboard dataset and achieved higher and higher F1 scores on the test set. Table 7 shows our model performance on different portions of the dataset, with noticeably less overfitting and better test-set performance as the dataset portion increased. This suggests that training on an even larger dataset could achieve better results.

Classifier         Dataset Size             Accuracy   Precision   Recall   F1-score
Combined (train)   20% (4659 examples)
Combined (test)    20% (618 examples)
Combined (train)   50% (12011 examples)
Combined (test)    50% (1527 examples)
Combined (train)   100% (23658 examples)
Combined (test)    100% (2893 examples)

Table 7: Model performance on varying dataset sizes

6.5 Laughbot Analysis

In testing our laughbot by speaking to it and waiting for its response, the laughbot responded appropriately to several types of inputs. We noticed that laughter was often triggered by shorter inputs (though not in all cases, as seen in Table 8), as well as by laughter within the punchline itself. On longer inputs, inputs with negative sentiment, or both, laughbot generally considered the line unfunny. Laughbot responded positively to jokes, in some cases waiting for the actual punchline, and it considered questions and statements unfunny, as shown in Table 8.

Transcript                                                      Response
you're cute ha ha ha
you're so fun to talk to haha
my grandma died
why did the chicken cross the road to get to the other side
do you like cheese
i finished my cs224s project

Table 8: Laughbot successes

There were cases that fooled the laughbot, where it inappropriately responded with laughter. For example, the laughbot laughed at "I love you," likely because the statement has positive sentiment and is short in length.

Sometimes, unfunny lines said in a funny manner (raising the pitch of the last word) can also induce laughter. For example, saying "my grandma died" with a high pitch at the end will cause laughbot to respond with laughter. Whether this should be considered a success on the part of the laughbot is up to the discretion of the user.

7 Conclusion and Future Work

Our combined RNN and logistic regression model performed best, with an F1-score of 63.2 on the test set. Future work will focus on reducing overfitting, as our final model, even when run on the entire dataset, still performs significantly better on the train set. Since logistic regression is a much more naive model than an RNN, we will work on improving this ensemble model to fully utilize the predictive power of both. We also wish to explore Convolutional Neural Networks (CNNs), both stand-alone and in ensemble models, as our research showed CNNs to achieve a higher F1-score than RNN models on this task (Bertero, 2016). We would also train on a larger dataset for a more generalizable model. To test laughbot further, we would like to make it real-time, and to run it on sitcoms with their laughtracks removed to see how closely laughbot's laughter matches the original laughtracks of the TV show. Additional implementations could include more complex classification to identify the level or type of humor; laughbot would then be able to respond with giggles or guffaws based on the user's input. Sarcasm in particular has always been difficult to detect in natural and spoken language processing, but our model for detecting humor is a step toward recognizing the common textual cues, along with the specific intonations, that normally accompany sarcastic speech. Since we trained and tested on real conversations, this model's humor detection is applicable to real, everyday speech more so than scripted jokes. In this paper we also identified areas in our implementation with room for improvement. With future work, our model could be a viable addition to conversational agents, making them embody more human-like attributes.

Acknowledgments

We would like to thank our professor Andrew Maas and the CS224S Spoken Natural Language Processing teaching team. Special thanks to Raghav and Jiwei for their direction on our combined RNN and regression model.

References

Dario Bertero and Pascale Fung. 2016. Deep learning of audio and language features for humor prediction.

Bilal Piot, Matthieu Geist, and Olivier Pietquin. 2014. Predicting when to laugh with structured classification. In Interspeech.

Bilal Piot, Matthieu Geist, and Olivier Pietquin. 2015. Imitation learning applied to embodied conversational agents. In MLIS.

Florian Eyben, Felix Weninger, Florian Gross, and Björn Schuller. 2013. Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In ACM Multimedia (MM).

Anton Nijholt. 2003. Humor and embodied conversational agents.

Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly Media Inc.
