Laughbot: Detecting Humor in Spoken Language with Language and Audio Cues


Kate Park, Annie Hu, Natalie Muenster

Abstract: We propose detecting and responding to humor in spoken dialogue by extracting language and audio cues and feeding these features into a combined recurrent neural network (RNN) and logistic regression model. We parse Switchboard phone conversations to build a corpus of punchlines and unfunny lines, where punchlines are the lines that precede laughter tokens in the Switchboard transcripts. We create a combined RNN and logistic regression model that uses both acoustic and language cues to predict whether a conversational agent should respond to an utterance with laughter. Our model achieves an F1-score of 63.2 and accuracy of 73.9. This model outperforms our logistic language model (F1-score 56.6) and RNN acoustic model (59.4), as well as the final RNN model of D. Bertero, 2016 (52.9). Using our final model, we create a laughbot that audibly responds with laughter when a user's utterance is classified as a punchline. A conversational agent outfitted with a humor-recognition system such as the one we present in this paper would be valuable as these agents gain utility in everyday life.

Keywords: Chatbots; spoken natural language processing; deep learning; machine learning

I. INTRODUCTION

Our project takes the unique approach of building a laughbot to detect humor and thus predict when someone will laugh based on their textual and acoustic cues. Humor depends on the words themselves, on how the speaker's voice changes, and on situational context. Predicting when someone will laugh is a significant part of contemporary media. For example, sitcoms are written and performed primarily to cause laughter, so the ability to detect precisely why something will draw laughter is of extreme importance to screenwriters and actors. Being able to detect humor could also improve embodied conversational agents, which are perceived as having more human attributes when they can display an appreciation of humor [1]. Siri currently will only say "you are very funny," but could become more human-like if she had the ability to recognize social cues like humor and respond to them with laughter. Thus, our objective is to train and test a model that can detect whether or not a line is funny and should appropriately be responded to with laughter.

II. LIMITATIONS OF PREVIOUS WORK

Existing work in predicting humor uses a combination of text and audio features and various machine learning methods to tackle the classification problem. The goal of the research in [2] was to predict laughter and create an avatar that imitated a human expert's sense of when to laugh in a conversation. They reduced the problem to multi-class classification and found that a large-margin algorithm resulted in the most natural behavior (laughing in the same proportion as the expert). A later paper [3] examined the problem of using a laughter-enabled interaction manager to decide whether or not to laugh. They defined the regularized classification for apprenticeship learning (RCAL) algorithm using audio features. Final results showed that RCAL performed much better than multi-class classification on laughter examples and slightly worse on non-laughter examples. The paper noted the problem of a large class imbalance between laughter and non-laughter in the dataset, and suggested potential future work on inverse reinforcement learning (IRL) algorithms. From this, we propose weighting the datasets to reduce the imbalance, so that the non-laughter examples do not overpower what the model learns from punchlines.
In [4], the authors compared three supervised machine learning methods for predicting and detecting humor in dialogue from The Big Bang Theory sitcom. From a corpus in which punchlines were annotated by laughter, they extracted audio and language features and fed them into three models: a convolutional neural network (CNN), an RNN, and a conditional random field (CRF), compared against a logistic regression baseline (F1-score 29.2). The CNN using word vectors and overlapping 25 ms time frames performed best (F1-score 68.6). The RNN model (F1-score 52.9) should have performed the best but actually performed worse than the CNN, likely due to overfitting. Their paper proposes future work on building a better dialog system that understands humor.

III. PRESENT WORK

Rather than a sitcom dataset, our paper uses everyday phone conversations, whose humor we hypothesize is more relevant for a conversational agent like Siri. We create an ensemble model combining a recurrent neural network with a logistic regression classifier that, when run on audio and language features from lines of conversation, can identify whether a line is humorous. We additionally implement a laughbot, a simple dialog system based on our ensemble model that converses with a user and responds to humorous input with laughter.

In Section IV, we introduce the Switchboard dataset we used for training, the preprocessing steps we performed, and the features we extracted to implement our models. Section V then gives implementation details for these models, including a baseline, a language-only model, an audio-only model, and a combined model, and Section VI gives details of our laughbot interface. In Section VII we define our evaluation metrics, present results, and analyze errors. Finally, in Section VIII we discuss future plans to improve our humor-detection model and our laughbot system.

IV. DATASET

Since the focus of this project is detecting humor in everyday, regular speech, our dataset must reflect everyday conversations. We use the Switchboard Corpora available on AFS, which consist of around 3000 groups of audio files, transcripts, and word-level time intervals from phone conversations between two speakers. We classified each line of a transcript as a punchline if it preceded any indication of laughter at the beginning of the next speaker's response (a code sketch of this heuristic appears at the end of this section). Tables I, II, and III show sample lines and their corresponding classifications.

TABLE I. LINE INDUCING LAUGHTER IS CLASSIFIED AS A PUNCHLINE
B: That, that's the major reason I'm walking up the stairs. (Punchline)
A: [Laughter] To go skiing?

TABLE II. LAUGHTER THAT INDUCES LAUGHTER IS CLASSIFIED AS A PUNCHLINE
A: Uh-huh. Well, you must have a relatively clean conscience then [laughter]. (Punchline)
B: [Laughter]

TABLE III. LINE PRECEDING SOMEONE LAUGHING AT THEMSELVES DOES NOT COUNT AS A PUNCHLINE
A: just for fun. (Not a punchline)
B: Shaking the scorpions out of their shoes [laughter].

We split our dataset into 80% train / 10% validation / 10% test sets. Mindful of the imbalanced datasets that previous work contended with, we sampled only 5% of the non-punchlines, because our original dataset is heavily skewed toward non-punchlines. This produced a more balanced training set of positive (punchline) and negative (unfunny) examples; our final datasets each contain about 35-40% punchlines.

A. Features

We extracted a combination of language and audio features from the data. Language features include:

Unigrams, bigrams, trigrams: we pruned the vocabulary and kept only the n-grams that appear more than a certain number of times in the training set, tuning the threshold on our validation set. In this model, we kept n-grams that appear at least twice.

Parts of speech: we used NLTK's POS tagger to count the nouns, verbs, adjectives, adverbs, and pronouns appearing in the example [5].

Sentiment: we used NLTK's VADER toolkit to extract sentiment from the punchline, on a scale from more negative to more positive [5].

Length, average word length: from past work we learned that sitcom punchlines are often short, so we included length features [4].

We also extracted acoustic features from each audio file (converted to .wav format) with the openSMILE toolkit, and matched the timestamped features to the timed transcripts in Switchboard to obtain corresponding acoustic and language features for each example [6]. Acoustic features include:

MFCC: We expected MFCC vectors to carry the most information about an audio sample, so we sampled 12 vectors every 10 ms, with a maximum of 50 time intervals per example, since certain lines may be too long to store in full.

Energy level: We also expected the speaker's energy to be a strong indicator of humor, so we included it as an additional feature.
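The paper does not include its preprocessing code, but the labeling heuristic above is simple to sketch. The following minimal Python example is ours, not the authors': it assumes each transcript is a list of (speaker, text) turns and that laughter is marked inline with a "[laughter]" token, as in Switchboard.

def label_punchlines(turns):
    # A turn is a (speaker, text) pair. A line is a punchline when the
    # NEXT speaker's turn opens with laughter (Tables I and II), but not
    # when a speaker merely laughs at their own line (Table III).
    labels = []
    for i, (speaker, text) in enumerate(turns):
        punchline = False
        if i + 1 < len(turns):
            next_speaker, next_text = turns[i + 1]
            punchline = (next_speaker != speaker and
                         next_text.strip().lower().startswith("[laughter]"))
        labels.append(1 if punchline else 0)
    return labels

turns = [("B", "That, that's the major reason I'm walking up the stairs."),
         ("A", "[Laughter] To go skiing?")]
print(label_punchlines(turns))  # [1, 0]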
V. IMPLEMENTED MODELS

A. Baseline

Our baseline is an all-positive classifier that predicts every example as a punchline; its precision is the proportion of true punchlines in the dataset (around 35%) and its recall is 100%. We also use an all-negative classifier that predicts every line as unfunny; its precision is the proportion of unfunny lines (around 65%) and its recall is 0%. Table IV displays these baseline metrics.

TABLE IV. BASELINE METRICS
Classifier | Precision | Recall | F1-score
All Positive | ~35% | 100% | ~52
All Negative | ~65% | 0% | 0

B. Logistic Regression Language Model

We trained a logistic regression model using only language features (n-grams, sentiment, line length) as a secondary baseline. Logistic regression is an intuitive starting model for binary classification, and it also allowed us to observe and tune the performance of our language features alone in predicting humor.

C. RNN Acoustic Model

We next trained an RNN using only acoustic features, to observe the performance of our acoustic features in classifying lines of audio as punchlines or not. We chose an RNN to better capture the sequential nature, and thus the conversational context, of dialogue, and we use gated recurrent unit (GRU) cells so our model can better remember earlier timesteps in a line instead of overemphasizing the latest timesteps. During training, we used standard softmax cross entropy to calculate cost. We initially used an Adam optimizer because it handles less frequently seen training features better and converges more smoothly than stochastic gradient descent. Our final RNN uses an Adamax optimizer to further stabilize the model between epochs and to make it more robust to less frequently seen features and gradient noise.
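The paper does not specify its deep learning framework or layer sizes; the sketch below shows one way to realize the GRU-plus-softmax-cross-entropy setup described above in PyTorch, using the 12 MFCC coefficients and 50-frame cap from Section IV. The hidden size, optimizer defaults, and random tensors are placeholder assumptions.

import torch
import torch.nn as nn

class AcousticRNN(nn.Module):
    # A GRU reads a sequence of MFCC frames; the final hidden state
    # feeds a 2-way (punchline / not punchline) classifier.
    def __init__(self, n_mfcc=12, hidden_size=128):
        super().__init__()
        self.gru = nn.GRU(input_size=n_mfcc, hidden_size=hidden_size,
                          batch_first=True)
        self.out = nn.Linear(hidden_size, 2)

    def forward(self, x):             # x: (batch, time, n_mfcc)
        _, h_n = self.gru(x)          # h_n: (1, batch, hidden_size)
        h_final = h_n.squeeze(0)      # final hidden state per example
        return self.out(h_final), h_final

model = AcousticRNN()
optimizer = torch.optim.Adamax(model.parameters())  # Adamax, as in the text
criterion = nn.CrossEntropyLoss()                   # softmax cross entropy

x = torch.randn(8, 50, 12)      # 8 dummy clips, 50 frames of 12 MFCCs each
y = torch.randint(0, 2, (8,))   # dummy labels
logits, _ = model(x)
loss = criterion(logits, y)
optimizer.zero_grad(); loss.backward(); optimizer.step()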

Fig. 1. Diagram of final RNN + Logistic Regression model.

D. Final Combined Model

After designing separate language and acoustic models, we combined the two by:

1) Running our RNN on all acoustic features in the training set, and extracting the final hidden state vector of the RNN for each training example.
2) Concatenating this vector with all language features for the corresponding training example.
3) Using the combined feature vectors to train a logistic regression model.

Fig. 1 shows our combined model architecture. For testing, we followed a similar process: running the acoustic features through our pre-trained RNN, concatenating the final hidden state vector with the language features, and running the combined feature vector through our pre-trained logistic regression model to obtain the prediction.
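Under the same assumptions as the AcousticRNN sketch in Section V (whose model object is reused here), the three steps above reduce to a few lines; scikit-learn's LogisticRegression stands in for the paper's unspecified implementation, and the feature arrays are random placeholders.

import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

def combined_features(rnn, mfcc, lang):
    # Steps 1 and 2: final RNN hidden state, concatenated with language features.
    with torch.no_grad():
        _, h_final = rnn(mfcc)
    return np.hstack([h_final.numpy(), lang])

mfcc_train = torch.randn(100, 50, 12)   # placeholder acoustic sequences
lang_train = np.random.rand(100, 20)    # placeholder language features
y_train = np.random.randint(0, 2, 100)  # placeholder labels

clf = LogisticRegression(max_iter=1000)  # step 3
clf.fit(combined_features(model, mfcc_train, lang_train), y_train)

# Testing follows the same path: RNN final state + language features -> clf.predict
pred = clf.predict(combined_features(model, torch.randn(1, 50, 12),
                                     np.random.rand(1, 20)))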
VI. LAUGHBOT APPLICATION ARCHITECTURE

A. Overview

The laughbot is a simple user-interface application that implements the model we built during our research and testing, predicting humor through a chatbot-style audio prompter. It is intended for demonstration purposes, both to let users experiment with custom input and to show the results of our project in an accessible and tangible form. The user speaks into the microphone, after which the laughbot classifies whether what was said was funny, and audibly laughs if so.

B. Transcription Architecture

The laughbot is designed to take user input audio, transcribe it, and feed the audio file and transcription into our pre-trained RNN and logistic regression model. Multithreading allows the user to speak into the microphone for as much of the maximum 60-second time segment as he or she would like, before pressing Enter to indicate the end of speech. The audio is then saved as a .wav file and transcribed by hitting the Google Cloud Speech API. Both the transcription and the original audio file are sent through the pre-trained model, in which acoustic features are extracted and run through the pre-trained RNN. The last hidden states are combined with textual features extracted from the transcription as features for the logistic regression model. Once a funny or not-funny classification is obtained, the laughbot will either keep a straight face by staying silent and simply prompting for more audio, or it will randomly play one of several laughtracks that we recorded during the late-night hours of project development. The classification is almost immediate; the brunt of the runtime of our implementation depends on the speed of transcription from the Google Cloud Speech API, and thus the strength of the wifi connection. To see a sample of our laughbot in action, see this video:

VII. RESULTS

A. Evaluation

We evaluated using accuracy, precision, recall, and F1 scores, with greatest emphasis on F1 scores. Accuracy calculates the proportion of correct predictions:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision calculates the proportion of predicted punchlines that were actual punchlines:

Precision = TP / (TP + FP)

Recall calculates the proportion of true punchlines that were captured as punchlines:

Recall = TP / (TP + FN)

F1 is the harmonic mean of precision and recall:

F1 = 2 * (Precision * Recall) / (Precision + Recall)
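These definitions translate directly into code; the counts below are made-up, but the all-negative case matches the baseline discussion in Section V (recall 0 forces F1 to 0).

def scores(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# All-negative classifier on a 35%-punchline set:
print(scores(tp=0, tn=65, fp=0, fn=35))  # (0.65, 0.0, 0.0, 0.0)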

Table V and Fig. 2 show the final performance of our models on these metrics, evaluated on the test dataset. Notably, our final model not only beat our baseline; it also beat the final RNN model of [4], the most similar approach to our own. Bertero et al. used The Big Bang Theory sitcom dialogues, which we hypothesize are an easier dataset to classify than general phone conversations. Their dataset also had a higher proportion of punchlines, leading to a higher F1-score for their positive baseline. Their final convolutional neural network (CNN) model, a deep feed-forward neural network, performed the best, but improved less over their baseline (14.36% improvement) than our final model did over ours (16.35% improvement). Further, while the final CNN model proposed by Bertero et al. had a higher F1-score than our final RNN model, our model had higher accuracy (see Table VI).

B. Model Analysis

Our final combined RNN acoustic and logistic regression language model performed the best of all our models. This fit our expectations, as humor should depend on both what was said and how it was said. The language-only and audio-only models had accuracies fairly similar to the final model's, but much lower recall (especially the language model), suggesting that the combined model was better at correctly predicting punchlines, while the individual models tended to be too conservative, predicting non-punchline for too many true punchlines.

TABLE V. COMPARISON OF ALL MODELS ON ALL DATASETS
Classifier | Accuracy | Precision | Recall | F1-score
Logistic Regression (train) | | | |
RNN (train) | | | |
Combined (train) | | | |
Logistic Regression (validation) | | | |
RNN (validation) | | | |
Combined (validation) | | | |
Logistic Regression (test) | | | | 56.6
RNN (test) | | | | 59.4
Combined (test) | 73.9 | | | 63.2

TABLE VI. COMPARISON OF OUR MODEL AGAINST MODELS IN BERTERO ET AL.
Classifier | Accuracy | Precision | Recall | F1-score
Bertero's Positive Baseline | | | |
Our Positive Baseline | | | |
Our Logistic Regression (language only) | | | | 56.6
Our RNN (audio only) | | | | 59.4
Bertero's Final RNN | | | | 52.9
Our Final Model (RNN + LogReg) | 73.9 | | | 63.2
Bertero's Final CNN | | | | 68.6

Fig. 3. Comparison of F1-scores on train, val, and test.

Fig. 2. Comparison of models on test datasets.

We tuned our RNN and regression models separately on the validation set. We found that the language model performed best when the frequent n-gram threshold was set at 2 (so we included only n-grams that occurred at least twice in the training set), and that performance dropped as this threshold was increased; a short vectorizer sketch illustrating this threshold follows this section. This makes sense: for bigrams and especially trigrams, the number that appear at least x times in a dataset drops drastically as x increases, so with too high a threshold we were excluding too many potentially useful features. We also found that sentence length was a particularly important feature, which confirmed our expectation that most punchlines would be relatively short. With the RNN, we found that increasing the number of hidden units greatly improved model performance up to a certain point, then began causing overfitting past that point. The same was true of the number of epochs we ran the model through during training. As we were using an Adamax optimizer, which already adapts the model learning rate, we did not perform much tuning on our initial learning rate.

C. Error Analysis

Table V and Fig. 3 show the performance of our language-only, audio-only, and combined models on our training, validation, and test datasets. All models performed significantly better on the training set than on the validation or test set, especially the final combined model, suggesting that our model strongly overfits the training data. This may be helped by hyperparameter tuning, such as decreasing the number of epochs or the number of hidden units in our RNN, or changing the regularization of our logistic regression model. Simplifying the model could also help, by decreasing the maximum number of MFCC vectors extracted or the number of language features. As we explore in the next section, overfitting may also be reduced by training on larger datasets.

D. Dataset Analysis

We initially ran our models on only 20% of the full Switchboard dataset to speed up development. Fig. 4 shows the training accuracy and cost curves of our RNN model, which suggest strong overfitting to the training set. Once we finalized our combined RNN and logistic regression model, we continued to run on larger portions until we used the full 100% of the Switchboard dataset and achieved the highest F1-score on the test set.
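The paper built its n-gram vocabulary with its own counting code; as an illustration only, scikit-learn's CountVectorizer expresses the same kind of frequency threshold, though its min_df=2 (appear in at least two lines) only approximates the paper's at-least-two-occurrences rule.

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(ngram_range=(1, 3), min_df=2)  # uni-, bi-, trigrams
lines = ["to go skiing", "to go running", "you must have a clean conscience"]
X = vectorizer.fit_transform(lines)
print(sorted(vectorizer.vocabulary_))  # ['go', 'to', 'to go']: rare n-grams pruned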

TABLE VII. MODEL PERFORMANCE ON VARYING DATASET SIZES
Classifier | Dataset Size | Accuracy | Precision | Recall | F1-score
Combined (train) | 20% (4659 examples) | | | |
Combined (test) | 20% (618 examples) | | | |
Combined (train) | 50% (12011 examples) | | | |
Combined (test) | 50% (1527 examples) | | | |
Combined (train) | 100% (23658 examples) | | | |
Combined (test) | 100% (2893 examples) | | | |

TABLE VIII. LAUGHBOT SUCCESSES
Transcript | Response
you're cute ha ha ha |
you're so fun to talk to haha |
my grandma died |
why did the chicken cross the road |
to get to the other side |
do you like cheese |
I finished my cs224s project |

Fig. 4. RNN model train accuracy and cost on 20% of dataset.

Table VII shows our model performance on different portions of the dataset, with noticeably less overfitting and better test set performance as the dataset portion increased. This suggests that an even larger dataset could achieve better results.

E. Laughbot Analysis

In testing our laughbot by speaking to it and waiting for laughter or no laughter, as shown in Figs. 5 and 6, the laughbot responded appropriately to several types of inputs.

Fig. 5. Example laughbot success: responding with laughter.

Fig. 6. Example laughbot success: responding with no laughter.

We noticed that laughter was often triggered by shorter inputs (though not in all cases, as seen in Table VIII), as well as by laughter within the punchline itself. On longer inputs, inputs with negative sentiment, or both, the laughbot generally and correctly considered the line not a punchline. The laughbot responded positively to jokes, in some cases waiting for the actual punchline, and it considered questions and statements unfunny, as shown in Table VIII. There were also cases that fooled the laughbot into responding inappropriately with laughter. For example, the laughbot laughed at "I love you," likely because the statement has positive sentiment and is short in length. Sometimes, unfunny lines said in a funny manner (raising the pitch of the last word) can induce laughter: saying "my grandma died" with a high pitch at the end will cause the laughbot to respond with laughter. Whether this should be considered a success for the laughbot is up to the discretion of the user.

VIII. LIMITATIONS AND FUTURE WORK

A. Improving the Model

Our combined RNN and logistic regression model performed best, with an F1-score of 63.2 and an accuracy of 73.9 on the test set. Future work will focus on reducing overfitting, as our final model, even when run on the entire dataset, still performs significantly better on the train set. Since logistic regression is a much more naive model than an RNN, we will work on improving this ensemble model to fully utilize the predictive power of both. We also wish to explore CNNs, both stand-alone and in ensemble models, as our research showed CNNs to have a higher F1-score than RNN models for this task [4]. We would also train on a larger dataset for a more generalizable model. Additional implementations could include finer-grained classification to identify the level or type of humor, so that the laughbot could respond with giggles or guffaws depending on the input. Sarcasm in particular has always been difficult to detect in natural and spoken language processing, but our model for detecting humor is a step toward recognizing the common textual cues, along with the specific intonations, that normally accompany sarcastic speech.

B. Making Laughbot Realtime

During development of the laughbot, we originally tried to design it to work in real time, so the user could speak continually and the laughbot could laugh at every point in the one-sided conversation where it recognized humor.
We were able to transcribe audio in real time with the Google Speech API, and intended to multithread it to capture an audio file simultaneously, but we faced problems structuring the rest of the interface to continually run the input through the model until humor was detected or the speaker paused for more than a certain threshold. Real-time recognition and laughing is a next-step implementation that will involve continuously sending partial transcripts and audio files through the model, concatenating audio and transcript data onto preceding chunks so that context and longer audio cues can contribute to the classification. Once real-time recognition and response is implemented, we could run laughbot on sitcoms stripped of their laughtracks and allow the laughbot to respond. Then, we could compare how closely laughbot's laughs match the original laughtracks of the TV show.

IX. CONCLUSION

Our project takes the unique approach of training on phone conversations and combining an RNN and a logistic regression model to classify speech as funny or not funny. Our final model achieves an F1-score of 63.2 and accuracy of 73.9 on the test set, outperforming RNN models in previous work. Since we trained and tested on real conversations, this model's humor detection is most applicable to the real, everyday speech that a conversational agent might face. We also outline the architecture of laughbot, a conversational agent that listens to users and responds to funny utterances with laughter. With future work, our model is a promising development for conversational agents that will detect and respond to humor in real time.

ACKNOWLEDGMENTS

We would like to thank our professors Andrew Maas and Dan Jurafsky and the Stanford University CS224S Spoken Natural Language Processing teaching team. Special thanks to Raghav and Jiwei for their direction on our combined RNN and regression model.

REFERENCES

[1] Anton Nijholt. Humor and embodied conversational agents.
[2] Bilal Piot, Olivier Pietquin, and Matthieu Geist. Predicting when to laugh with structured classification. Interspeech.
[3] Bilal Piot, Matthieu Geist, and Olivier Pietquin. Imitation learning applied to embodied conversational agents. MLIS.
[4] Dario Bertero and Pascale Fung. Deep learning of audio and language features for humor prediction. 2016.
[5] Steven Bird, Ewan Klein, and Edward Loper. Natural Language Processing with Python. O'Reilly Media Inc., 2009.
[6] Florian Eyben, Felix Weninger, Florian Gross, and Björn Schuller. Recent developments in openSMILE, the Munich open-source multimedia feature extractor. ACM Multimedia (MM).
