Temporal patterns of happiness and sarcasm detection in social media (Twitter)


Temporal patterns of happiness and sarcasm detection in social media (Twitter)
Pradeep Kumar
NPSO Innovation Day, November 22, 2017

Our Data Science Team
- Patricia Prüfer
- Pradeep Kumar
- Marcia den Uijl
- Peter Fontein
- Hendri Adriaens
- Next member?

Content

1. Average Happiness Measurement
   1.1 Introduction
   1.2 Data Collection from Twitter
   1.3 Data Cleaning
   1.4 Method
   1.5 Result
   1.6 Interpretation
2. Sarcasm Detection in Tweets
   2.1 Introduction
   2.2 Training data collection
   2.3 Method
   2.4 Training Result

1.1 Introduction

Why Twitter data?
- Popular microblogging site
- 500 million tweets a day, 200 billion a year
- 240+ million active users
- The Twitter audience ranges from ordinary users to celebrities
- Users often discuss current affairs and personal views on various subjects

Challenges
- Tweets are highly unstructured and often ungrammatical
- Non-standard vocabulary and abbreviations
- Lexical variations
- Cultural context of phrases, terms and symbols
- Hidden sarcasm

Population of Twitter Users in the Netherlands
- 2.6 million Dutch users, of which 0.9 million use Twitter daily
- As of 2016: overall usage stable, with a decrease in use among young people and an increase among the elderly

Usage by age category (chart): age categories 15-19, 20-39, 40-64, 65-79 and 80+ yrs; values as extracted from the chart: 10%, 19%, 8%, 23%, 25%.

Social media analytics and process

Capture
- Gather data from various sources
- Preprocess the data
- Extract pertinent information from the data

Understand
- Remove noisy data
- Perform advanced analytics: opinion mining, topic modelling, trend analysis, sentiment analysis
  - Temporal (time series) happiness/sentiment analysis
  - Sarcasm analysis

Present
- Summarize and evaluate the findings from the Understand stage

Twitter structure

Data hidden in plain sight
- Time
- Social network
- Author
- Tweet
- Description
- Location
- Popularity
- Sentiment
- Topic

Approach: An Overview
1. Tweet download using the Twitter API
2. Preprocessing and cleaning
3. Sanitization and emoticon replacement
4. Tokenizer
5. Happiness calculation, term frequency and topic modelling

1.2 Data Collection from Twitter

Tweet streaming criteria:
- Streaming of 1% of the data is possible
- The words in the top-10 lists are either articles (de, het, een), prepositions (in, van), conjunctions (en, dat), a personal pronoun (ik), a negation (niet) or a conjugation of "to be" (is) [1][2]
- The words in the top 10 are the same for men and women

[1] Eric Sanders, Collecting and Analysing Chats and Tweets in SoNaR, CLST, Radboud University Nijmegen
[2] https://dev.twitter.com/streaming/overview

1.3 Tweet Cleaning

Step 1: Removing the HTTP links (URLs)
Step 2: Removing the # tags
Step 3: Replacement of emoticons (faces, objects, nature, flags)

Emoji cheat sheet categories [3]: Smileys and People; Animals and Nature; Objects; Activity; Travel and Places; Symbols; Flags

[3] https://www.webpagefx.com/tools/emoji-cheat-sheet/

Step 4: Sanitization (treating abbreviations and repeated sounds)

Some Dutch abbreviations:
Hgh:  hoe gaat het (how are you)
Gmj:  goed met jou (fine, and you)
Idk:  I don't know
Gwn:  gewoon (just, simply)
Vgm:  volgens mij (in my opinion)
K:    ik (I)
Vaka: vakantie (holiday)
Das:  dat is (that is)
Sws:  sowieso (anyway)
Wnr:  wanneer (when)
T:    het (it, the)

Step 5: Monogram (unigram) model
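Steps 1 to 5 above can be sketched as a single cleaning function. This is a minimal illustration and not the pipeline used in the talk. The abbreviation table is copied from the list above. The emoji table, the choice to drop whole hashtags, and the rule of collapsing three or more repeated characters to two are assumptions made for the example.

```python
import re

# Abbreviations from the list above; the emoji names are illustrative placeholders.
ABBREVIATIONS = {
    "hgh": "hoe gaat het",
    "gmj": "goed met jou",
    "idk": "i don't know",
    "gwn": "gewoon",
    "vgm": "volgens mij",
    "k": "ik",
    "vaka": "vakantie",
    "das": "dat is",
    "sws": "sowieso",
    "wnr": "wanneer",
    "t": "het",
}
EMOJI_NAMES = {"😀": "grinning", "❤": "heart"}  # hypothetical mapping

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more of the same character


def clean_tweet(text):
    """Return the cleaned tweet as a list of lowercase unigram tokens."""
    text = URL_RE.sub(" ", text)                 # Step 1: remove links
    text = HASHTAG_RE.sub(" ", text)             # Step 2: remove hashtags (assumed: whole tag)
    for symbol, name in EMOJI_NAMES.items():     # Step 3: replace emoji by a word
        text = text.replace(symbol, f" {name} ")
    text = REPEAT_RE.sub(r"\1\1", text.lower())  # Step 4: collapse repeated sounds
    tokens = re.findall(r"[\w']+", text)         # Step 5: unigram tokens
    # Step 4 (continued): expand abbreviations, matching whole tokens only
    return [w for tok in tokens for w in ABBREVIATIONS.get(tok, tok).split()]


print(clean_tweet("Hgh? K ga op vaka!!! https://t.co/abc #zomer 😀"))
```

Expanding abbreviations on whole tokens rather than on substrings keeps single-letter entries such as K from rewriting letters inside ordinary words (for example the k in "ook").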

1.4 Method (Word collection and ranking)

Sourcing of words: Twitter, Google Books, New York Times, music library

Ranking methodology:
1. The top 5000 (most frequent) words from each corpus were merged, resulting in 10,222 words [4]
2. 50 evaluations per word
3. Words ranked on a scale of 1-9

Top words:
   Word        Average happiness
1. laughter    8.50
2. happiness   8.44
3. love        8.42
4. joy         8.16

Bottom words:
   Word        Average happiness
1. killer      1.56
2. cancer      1.54
3. death       1.44
4. terrorist   1.30

[4] Data set: data collected from labMT. The 10,222 unique words were labeled using Amazon's Mechanical Turk.

1.4 Method (Temporal Happiness Calculation)

Mathematical formula:

\[
\text{Average Happiness} = \frac{\sum_{i=1}^{n} h_{avg}(W_i)\, f_i}{\sum_{i=1}^{n} f_i}
\]

where $f_i$ is the frequency of the $i$-th word and $h_{avg}(W_i)$ is the estimate of the average happiness of the $i$-th word.
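The weighted average above can be sketched in a few lines. The scores in the sample lexicon are the eight top and bottom words quoted earlier; a real run would load the full 10,222-word list. Skipping words that are not in the lexicon, and returning None when no scored word occurs, are assumptions made for this example.

```python
from collections import Counter

# Sample of the word list; scores copied from the top/bottom words above.
HAPPINESS = {
    "laughter": 8.50, "happiness": 8.44, "love": 8.42, "joy": 8.16,
    "killer": 1.56, "cancer": 1.54, "death": 1.44, "terrorist": 1.30,
}


def average_happiness(tokens, lexicon=HAPPINESS):
    """Frequency-weighted mean happiness of the tokens found in the lexicon."""
    counts = Counter(t for t in tokens if t in lexicon)  # f_i per scored word
    total = sum(counts.values())                         # sum of f_i
    if total == 0:
        return None  # assumption: no score when no scored word occurs
    return sum(lexicon[w] * f for w, f in counts.items()) / total


# "love" occurs twice and "death" once: (2 * 8.42 + 1.44) / 3
print(average_happiness(["love", "love", "death", "the"]))
```

Computing this value over all tweets in a given hour gives one point of the temporal happiness series.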

1.5 Result

Average Happiness Index: an interactive dynamic graph is available at https://www.centerdata.nl/nl/projecten-van-centerdata/tijdelijke-blijheidsscore-van-nederlandse-tweets

1.6 Interpretation

Two instances:

Score: 3.92 (5 AM on 19th August 2017)
Most used terms: {hard, idiot, good, get, knows, like, police, Mexican, struggle, fuck, loves, blonde, fantastic, drug, government, dismissed, care}
Top hashtags: [('#nieuws', 248), ('#nieuwstwitter', 207), ('#vacature', 204), ('#NL', 191), ('#actueel', 120), ('#NieuwsTwitter', 120), ('#Krant', 102), ('#feywil', 73), ('#lab', 64), ('#kkl', 61), ('#brugopen', 60), ('#Nieuws', 55), ('#Nederland', 54), ('#voetbal', 53), ('#Politie', 46), ('#Amsterdam', 43), ('#LaraconEU', 37), ('#HLN', 36), ('#E313', 35), ('#tdd', 33)]

Score: 4.65 (4 PM on 20th August 2017, Sunday)
Most used terms: {good, request, bright, care, okay, victory, lucky, sunday, passion, well, cookies, happy, dismissed, theaters, like, mature, weekend, hard, thought, strange, main, car, personal, social, stole, love, helps, walking, negative, spa, laugh, ride, start, sea, sonic, needed, sitting}
Top hashtags: [('#ajagro', 787), ('#Ajax', 328), ('#nieuwstwitter', 275), ('#nieuws', 254), ('#NieuwsTwitter', 169), ('#actueel', 168), ('#ANDSTV', 163), ('#brugopen', 143), ('#AJAgro', 116), ('#utrwil', 116), ('#excfey', 106), ('#CNBLUE', 105), ('#andstv', 89), ('#PushAwardsKathNiels', 83), ('#FCGroningen', 82), ('#voetbal', 79), ('#PSV', 78), ('#NACpsv', 77), ('#NACpraat', 72)]
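Hashtag rankings in the (tag, count) form shown above can be produced with a frequency counter. The sketch below is one possible way to do it, not the code behind the slide; the function name and the sample tweets are made up for the example. Counting is case-sensitive, which matches the lists above, where '#nieuwstwitter' and '#NieuwsTwitter' appear as separate entries.

```python
import re
from collections import Counter


def top_hashtags(tweets, n=20):
    """Return the n most frequent hashtags as (tag, count) pairs."""
    counts = Counter(tag for t in tweets for tag in re.findall(r"#\w+", t))
    return counts.most_common(n)


sample = ["#Ajax wint #ajagro", "wat een goal #ajagro", "#ajax"]
print(top_hashtags(sample, n=1))
```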

Topic Modelling on Twitter Data

Score: 4.65 (4 PM on 20th August 2017, Sunday)
Topic 1: [goed, nook, ooik, we, juist, steed, zee, my, kom, wel, meisje, vrouwen, nederland]
Topic 2: [iik, wel, ooik, hebt, waar, echt, heel, denik, weer, erg, gaat, zit, mensen, zin, morgen]
Topic 3: [iik, leuik, vind, video, frans, gelezen, waal, waarom, geld, nl, stuik, wedstrijd, blijft, vragen]
Topic 4: [minder, bedankt, middle, smokkelmaffia, ngo's, knechten, afname, verdrinkingen, verdraait, club, bal, tijd, leuke, blonde]
Topic 5: [weer, nee, niet, ooik, gewoon, volgens, smile, we, gaat, uur, keer, gaan, zomer, wel, ij]

Score: 3.92 (5 AM on 19th August 2017)
Topic 1: [iik, zegtweer, waar, he, ooik, gaan, allemaal, wel, mooi, nieuwe, bal, barcelona, we, vakantie]
Topic 2: [man, weg, grote, tijdens, krijgt, rood, geik, twee, meisje, no, jullie, mensen, omg, speelt]
Topic 3: [iik, video, vind, leuik, via, wonder, live, wisdom, toegevoegd, ooik, we, afspeellijst, goal, onze, amp]
Topic 4: [nou, minder, juist, gaat, nederland, zee, middel, worden, tijd, ooik, smokkelmaffia, knechten, afname, ngo's]
Topic 5: [my, beste, gemaakt, school, gelijik, you, toe, geniet, pa, gewoon, extra]

2. Sarcasm Detection in Tweets
   2.1 Introduction
   2.2 Training data collection
   2.3 Methods
   2.4 Training Result

2.1 Introduction

Underlying hypothesis: contrast in sarcastic tweets. Sarcasm detection relies on the assumption that, in a sarcastic document, a negative situation often follows a positive sentiment [5]:

[positive verb phrase] + [negative verb phrase]

Examples:
1. "Honesty is the best policy - when there is money in it." - Mark Twain
2. (A parent to a child with a bad report card) "Once again, you are the best student in the class!"

The training dataset consists of:
- 20,000 clean sarcastic tweets
- 100,000 clean non-sarcastic tweets

[5] Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert and Ruihong Huang, Sarcasm as Contrast between a Positive Sentiment and Negative Situation
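The contrast hypothesis can be illustrated with a toy check for "positive word, later followed by a negative word". This is only a sketch of the idea, not the method of [5] or of the talk: the two word sets are tiny hypothetical lexicons, and single words stand in for the verb phrases of the pattern.

```python
# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"love", "great", "best", "enjoy"}
NEGATIVE = {"waiting", "ignored", "failing", "stuck"}


def has_contrast(tokens):
    """True if some positive word is later followed by a negative word."""
    seen_positive = False
    for tok in tokens:
        if tok in POSITIVE:
            seen_positive = True
        elif tok in NEGATIVE and seen_positive:
            return True
    return False


print(has_contrast("i love being ignored".split()))          # True
print(has_contrast("being ignored is what i love".split()))  # False: wrong order
```

The order matters: the same two words in reverse order do not match the pattern.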

2.2 Feature Engineering
- Sentiment analysis
- Topic modeling
- Part-of-speech tagging
- n-gram model, e.g.
  1. unigrams: single words (examples: really, great, super, awesome)
  2. bigrams: pairs of adjacent words (examples: really great, super awesome)
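As a small illustration of the n-gram features listed above, the helper below extracts unigrams and bigrams from a token list. The function name is made up for the example, and tokenization is assumed to have happened already.

```python
def ngrams(tokens, n):
    """Return all runs of n adjacent tokens, in order, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["really", "great", "day"]
print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams
```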

2.3 Method

Step 1: Split each tweet into one, two and three parts.
Step 2: Run sentiment analysis on all parts.

Splitting      Feature                  Value
No split       Blob sentiment            0.213793322025
               Blob subjectivity         0.0501915056376
Two splits     Blob sentiment 1/2       -0.0218950362976
               Blob sentiment 2/2       -0.0323621717231
               Blob subjectivity 1/2    -0.100033759951
               Blob subjectivity 2/2    -0.0904785266556
Three splits   Blob sentiment 1/3        0.0209378078692
               Blob sentiment 2/3       -0.0577754412137
               Blob sentiment 3/3        0.0344419200665
               Blob subjectivity 1/3     0.110001944706
               Blob subjectivity 2/3     0.0676131604139
               Blob subjectivity 3/3     0.0520556715094

Step 3: Topic modelling: decompose each tweet as a sum of topics, to be used as features.
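Steps 1 and 2 can be sketched as follows. The "Blob" feature names suggest a sentiment library such as TextBlob, but that is an assumption; to keep the example self-contained, a tiny hypothetical polarity lexicon stands in for the sentiment scorer. Only sentiment is computed here, and the subjectivity features would follow the same pattern.

```python
# Hypothetical word polarities in [-1, 1]; a stand-in for a sentiment library.
POLARITY = {"love": 0.5, "great": 0.8, "best": 1.0,
            "hate": -0.8, "late": -0.3, "broken": -0.4}


def split_parts(tokens, k):
    """Split tokens into k contiguous parts of near-equal length."""
    size, extra = divmod(len(tokens), k)
    parts, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        parts.append(tokens[start:end])
        start = end
    return parts


def polarity(tokens):
    """Mean polarity of the scored words, 0.0 if none are scored."""
    scores = [POLARITY[t] for t in tokens if t in POLARITY]
    return sum(scores) / len(scores) if scores else 0.0


def split_features(tokens):
    """Sentiment of the whole tweet, of its halves, and of its thirds."""
    features = {"sentiment": polarity(tokens)}
    for k in (2, 3):
        for i, part in enumerate(split_parts(tokens, k), start=1):
            features[f"sentiment {i}/{k}"] = polarity(part)
    return features


for name, value in split_features("i love it when my train is late".split()).items():
    print(name, round(value, 2))
```

In the example tweet the first half scores positive and the second half negative, which is the kind of within-tweet contrast the split features are meant to expose to the classifier.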

2.4 Training Results

Training results, support vector machine (linear kernel):

              Precision   Recall   F1-score
Sarcasm       0.91        0.93     0.92
Non-sarcasm   0.63        0.58     0.61
Avg           0.86        0.87     0.87
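For reference, the F1-score in the table is the harmonic mean of precision and recall. The helper below is a generic sketch of that definition, not the evaluation code used for the talk. Recomputing from the rounded table values can differ in the last digit from the reported score, as the non-sarcasm row shows.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.91, 0.93), 2))  # sarcasm row
print(round(f1_score(0.63, 0.58), 2))  # non-sarcasm row, from rounded inputs
```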

Conclusions and future work
- Enrich the library for acronym expansion and emoticon replacement
- Apply deep learning methods to sarcasm analysis
- Collect labelled Dutch tweets for training the sarcasm detection model
- Explore additional features to tune the algorithm for sarcasm detection in Dutch tweets
- Consider retweets as a factor

Thanks for your attention!!