TWITTER SARCASM DETECTOR (TSD) USING TOPIC MODELING ON USER DESCRIPTION

Supriya Jyoti, Hiwave Technologies, Toronto, Canada
Ritu Chaturvedi, MCS, University of Toronto, Canada

Abstract

Internet users go to different social media platforms to read reviews or comments about a product before they buy or invest in one. Thus, for many companies it is very important to keep track of the sentiments of their customers' reviews, respond to them in time, and keep the brand value high. There is no dearth of models for mining sentiments from Twitter data, but these models fail when sarcasm is involved in tweets. Thus, sarcasm detection (when sarcasm is expressed implicitly, as opposed to with explicit words) can give better insight into customer sentiments about a product or company. To fulfil this objective, this paper engineers features from tweets and their authors' descriptions in order to capture the sarcasm expressed in tweets. Each registered author on Twitter has the opportunity to describe themselves. We utilize this self-description to extract extra information about the personality traits of Twitter authors. This additional knowledge about the author of a tweet helps in classifying sarcastic tweets from non-sarcastic ones with a high degree of accuracy. We engineer and test a range of feature-sets from tweet text and its author's self-description using LinearSVM and Logistic Regression predictive models. We achieve reliable results (an F-score of 0.78 and AUC of 0.88), comparable to the other sarcasm detection models reported in the literature.

Keywords: Sarcasm Detection, Sentiment Analysis, Text Classification, Natural Language Processing, Text Mining, Machine Learning, Social Media

I. INTRODUCTION

Sarcasm has become a common form of communication on social media platforms as people find it entertaining.
Detecting sarcasm is considered a difficult task even for humans without knowledge of the author's personality and without hand or facial gestures. Identifying sarcasm automatically is an even more challenging task, especially when it is expressed implicitly. For example, a tweet written by a high-school boy says "I LOVE being kicked out of my BBall team based on my FIRST Match!! #Sarcasm". Automated methods that analyse this tweet may (incorrectly) label it as positive because of the presence of the word LOVE. Detecting sarcasm in unstructured text generated on social media platforms has many applications in both industry and government sectors, such as sentiment analysis on political issues, product ranking systems, review summarization and opinion mining. From the perspective of a business, it is very important to recognize the sentiments of its customers, but sarcastic comments make this a difficult task, since most internet users go to social media platforms (Twitter, Facebook, Reddit, etc.) to read

product reviews before buying or investing in any product or service. Thus, it is very important for businesses to know customers' true opinions expressed on these social media platforms in order to keep brand value high and make better business decisions. Building a machine learning model that can detect sarcasm in written text is not a trivial task. Twitter grew to 328 million active users per month in the first quarter of 2017. Twitter's growth and the ease of using the Twitter API for downloading tweets have opened a great opportunity for private businesses and government sectors to analyse Twitter data and make informed and timely decisions. Building an automatic sarcasm detection model on tweets comes with many challenges. Twitter has a limit of 140 characters per tweet, which compels users to use abbreviations, short sentences, an unconventional style of writing and incorrect sentence structure to express opinions. In addition, positive words are often used to express negative sentiments, and thus conventional text analysis techniques such as SentiWordNet (a lexical database for opinion mining), bag-of-words, TF-IDF (term frequency - inverse document frequency), tokenisation, part-of-speech tagging, etc. alone cannot be used to detect sarcasm. In our Twitter Sarcasm Detector (TSD), we engineer features from two main pieces of information: a tweet's text and the author's self-description. A tweet of at most 140 characters can contain #hashtags, images, URLs, @mentions of other users and emojis along with the plain text. Throughout this paper, we use TT to refer to the tweet's text. Twitter also gives its users an option to describe themselves within a limit of 160 characters; we define this as the Author's Self-Description, hereafter referred to as ASD. Fig 2 shows an example of a Twitter user's self-description.
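The failure mode above (a positive word dominating a naive lexicon score, as in the "I LOVE being kicked out..." example from Section I) can be illustrated with a minimal sketch. The lexicon below is a toy stand-in for a resource such as SentiWordNet, not the paper's actual scoring:

```python
# Toy sentiment lexicon: a small stand-in for SentiWordNet.
TOY_LEXICON = {"love": 1.0, "great": 1.0, "hate": -1.0, "kicked": -0.5}

def naive_polarity(text: str) -> float:
    """Sum per-word polarities; a positive total reads as a 'positive' tweet."""
    return sum(TOY_LEXICON.get(w.strip("!.,#").lower(), 0.0)
               for w in text.split())

tweet = "I LOVE being kicked out of my BBall team based on my FIRST Match!!"
print(naive_polarity(tweet))  # 0.5: net positive, despite the sarcasm
```

The sarcastic tweet scores as positive, which is exactly why TSD adds sentiment-contrast and author-level features on top of plain lexicon lookups.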
In this paper we categorize our features into three categories, as explained below:

- Lexical-based features L: features such as n-grams, intensifiers, capital words, double quotes, and counts of parts of speech. We generate these features from TT and refer to them as lexical-based features, related to the words or vocabulary of a language. These features help in capturing the nuances of a Twitter user's writing style.

Figure 1. Example of a Sarcastic Tweet

- Sentiment-based features S: sarcastic tweets have contrasts in sentiment, as identified by earlier research [14]. Thus, we calculate the positive score, the negative score and the difference of sentiment scores (using SentiWordNet and Python's TextBlob package) of different parts of TT as features.

- Topic-modeling-based features T: in machine learning and natural language processing, topic modeling is a type of statistical model for discovering hidden semantic structures in a collection of documents. A document typically consists of multiple topics, where a topic is a cluster of co-occurring words. We apply a topic modeling technique to ASD and use the resulting topics as features. These features capture our assumption that the ASD (author's self-description) can give a better idea of one's inclination towards sarcasm and of the author's Twitter behaviour. Details of all three categories and features can be found in Table II.

The motivation of this work is to improve the accuracy of automatic sarcasm detection and make it more generalizable, so that it can flag sarcastic tweets as they are generated on Twitter. This is based on the suggestions given in earlier research [3],

[11], [12] that including features from tweets as well as information available about their authors can help in detecting sarcasm. Thus, in our TSD model we propose to use Twitter's ASD (author's self-description) to generate features that capture a tweet author's Twitter behaviour and personality traits. For example, a high-school teacher who describes himself as "Ordinary Next door Guy Love football Live Simple" is less likely to make sarcastic comments than a high-school student whose self-description is "Funny Sarcastic Live Life to Fullest Social Media Enthusiast". More often than not, users on all social media platforms get to write about themselves in a user-description/profile-description section while creating an account. To capture the personality traits of a tweet's author, we apply Latent Dirichlet Allocation (LDA) [4] to create features to train our classifier. LDA is a popular topic modeling technique for finding topics, where a topic is defined as a cluster of co-occurring words with different probabilities of appearing in documents discussing these topics; LDA assumes that a document is a mixture of such topics. We first learn these topics from the users' self-description text; the classifier then learns about these topics and creates an association with authors. This technique adds extra information about the personality of an author. We propose the Twitter Sarcasm Detector (TSD) model, which captures knowledge about a tweet's author in addition to lexical features from TT to enhance the performance of sarcasm classifiers. Fig 3 is a pictorial representation of our model.

Figure 2. Example of a User's/Author's profile

Our results suggest that lexical features alone are not enough to capture sarcasm, so including more information about the author helps in improving the accuracy of the predictive model. The rest of the paper is organized as follows: Section II gives the problem definition.
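The topic-feature idea just described (fit LDA on author self-descriptions, then use each description's topic distribution as a feature vector) can be sketched as follows. This is a minimal illustration using scikit-learn's LDA on a toy ASD corpus; the paper does not specify which LDA implementation or hyperparameters it used:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-in for the corpus of author self-descriptions (ASDs).
descriptions = [
    "funny sarcastic social media enthusiast",
    "ordinary next door guy love football live simple",
    "sarcastic genius future billionaire",
    "researcher stem cell biologist explorer witty sarcasm",
]

vec = CountVectorizer()
counts = vec.fit_transform(descriptions)        # bag-of-words counts

lda = LatentDirichletAllocation(n_components=3, random_state=0)
topic_features = lda.fit_transform(counts)      # shape: (n_docs, n_topics)
# Each row is a probability distribution over topics (rows sum to 1);
# these per-author topic probabilities are the T-TOPIC_MODELING features.
```

At prediction time, a new author's description would be transformed with the same fitted `vec` and `lda`, and the resulting topic probabilities appended to the lexical and sentiment features.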
Section III discusses the related literature. Section IV describes our proposed methodology for sarcasm detection in detail through three subsections: data description, data cleaning and preprocessing, and feature engineering. A discussion of the experimental setup and performance evaluation is given in Section V. Conclusions and future directions are outlined in Section VI.

II. PROBLEM DEFINITION

Sarcasm is defined in the Cambridge Dictionary as "remarks that mean the opposite of what they say, made to criticize someone or something in a way that is amusing to others but annoying to the person criticized", or as a way of using words that are the opposite of what you mean in order to be unpleasant to someone or to make fun of them [12]. We formally define the sarcasm detection problem for Twitter as follows:

Definition (Sarcasm Detection on Twitter): given an unlabeled tweet T along with its author's profile description d, our aim is to automatically detect whether tweet T is sarcastic or not.
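Given a pair (T, d) as defined above, one simple tweet-side signal that TSD later encodes as an S-SENTIMENT feature (following the contrast idea of Riloff et al. [14]) is the difference in sentiment between parts of T. A minimal sketch, with a toy lexicon and a simple half/half split as assumptions (the paper computes scores with SentiWordNet and TextBlob):

```python
# Toy lexicon and split: simplifying assumptions for illustration only.
TOY_LEXICON = {"love": 1.0, "dream": 0.5, "leave": -0.5, "kicked": -0.5}

def polarity(words) -> float:
    """Sum toy-lexicon polarities over a list of words."""
    return sum(TOY_LEXICON.get(w.lower().strip("!#.,"), 0.0) for w in words)

def sentiment_contrast(tweet_text: str) -> float:
    """Absolute polarity difference between the two halves of a tweet."""
    words = tweet_text.split()
    half = len(words) // 2
    return abs(polarity(words[:half]) - polarity(words[half:]))
```

A sarcastic tweet that pairs a positive expression with a negative situation tends to produce a large contrast value, while a uniformly neutral tweet produces a value near zero.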

III. RELATED WORK

Sarcasm detection can be formulated as a classification problem [8]: given a piece of text, the goal is to classify it as either sarcastic or non-sarcastic. The literature on sarcasm detection is mostly centered around Twitter data. This is largely because Twitter has become the most popular platform for communication and sharing opinions, with 319 million active users per month. The ease of using the Twitter API to download tweets has opened the gates to many research topics and applications of text mining. Sarcasm has been a widely researched topic in Natural Language Processing. A solution to the problem of sarcasm detection can improve the accuracy of many other important research topics such as sentiment analysis, information retrieval, and all their applications (for example, customer service and making/maintaining brand value). A quick response to negative or sarcastic tweets can help save a company's reputation.

Figure 3. Pictorial representation of TSD

Davidov, Tsur and Rappoport [5] used a semi-supervised classification technique on Twitter data and Amazon product reviews. They used pattern-based and punctuation-based features to train their weighted kNN-like strategy for sarcasm detection. Liebrecht, Kunneman and Van den Bosch [10] experimented on Dutch tweets and used #sarcasm as their gold standard for collecting labeled data. They proposed additional linguistic features that characterize sarcasm, such as intensifiers, exclamations and explicit markers. Bamman and Smith [3] focused on four classes of features: author features, tweet features, response features and audience features. They used binary logistic regression to run experiments. Their analysis concluded that a significant improvement in accuracy is seen in models with features derived from the author's information, not just the tweet alone. Riloff et al. [14] used a lexicon-based approach to detect sarcasm based on the assumption that sarcastic tweets contrast a positive sentiment with a negative situation. They used an SVM classifier for experimentation. Rajadesingan, Zafarani and Liu [12] based their model of automatic sarcasm detection on theories from behavioural and psychological studies. They proposed a behavioural modeling framework for sarcasm detection, SCUBA, which captured sarcasm in the form of features such as contrast of sentiments, complex forms of expression, means of conveying emotion, and familiarity with the language and with written expression. SCUBA also draws on each author's past tweets as additional knowledge to detect sarcasm. Joshi, Jain, Bhattacharyya and Carman [9] designed an approach using topic modeling that detects sarcasm-prevalent topics from a dataset of tweets and estimates distributions corresponding to the prevalence of a topic and the prevalence of sentiment-bearing words. Mukherjee and Bala [11] proposed a feature-set that captured both the content and the writing style of the author. They experimented with fuzzy clustering and Naive Bayes methods.

Most of the work on automatic sarcasm detection has focused on linguistic and lexical features captured from tweet text. Recently, however, Mukherjee and Bala [11] suggested that capturing knowledge about the authors of tweets can improve automatic detection of sarcasm. This paper uses a variation of the approach of Joshi, Jain, Bhattacharyya and Carman [9]: we apply topic modeling to the author's self-description instead of the tweet's text. The rationale behind our approach is that when people describe themselves on social networking sites like Twitter, they are usually concise in expressing their personality traits. Thus, this feature helps the model capture the prevalence of sarcasm from self-reported personality traits in the profile descriptions of Twitter accounts.

IV. METHODOLOGY

The following subsections describe the data used in the proposed Twitter Sarcasm Detector (TSD) model and the steps required to prepare and preprocess this data so that it can be used to extract the features used in predicting sarcasm in tweets.

A. Data Description

In order to classify tweets into sarcastic and non-sarcastic categories using a machine learning model, we need labeled tweets. Labeling tweets manually is a time-consuming task. Self-reporting of moods or sentiments with hashtags is a common practice on social networking sites, and prior work [3], [9], [13] has utilized this common behaviour of Twitter users to generate labeled datasets. We follow the same standard approach: tweets downloaded from the Twitter API annotated with #sarcasm or #sarcastic are treated as sarcastic tweets, and all other tweets as non-sarcastic. In addition to the tweet's text (TT), we also download each tweet's author's self-description (ASD). The dataset contains 9,991 sarcastic (labeled as positive) and 10,000 non-sarcastic (labeled as negative) tweets, collected on 17 January 2017. Data taken directly from the Twitter API is always noisy, so we apply several preprocessing steps, as explained in the next section, before the feature engineering module.

B. Data Cleaning and Preprocessing

The two main pieces of information we use to build our sarcasm model are the tweet's text (TT) and the author's self-description (ASD). Both are strings, so we apply several preprocessing steps, separately for TT and ASD, before feeding them into our feature engineering module. A tweet of at most 140 characters can contain #hashtags, images, URLs, @mentions of other users and emojis along with the plain text. For example, the tweet in Fig 1 contains the hashtags #Sarcasm and #iphone. Similarly, when a user creates an account on Twitter, he or she gets to write a short bio for the profile; for example, the user in Fig 2 describes himself as sarcastic, a genius and a future billionaire. There is a restriction of 160 characters, so most people list different adjectives separated by punctuation marks. During preprocessing of tweets, we prune all non-English tweets, as our model focuses on English-language tweets only. We exclude hyperlinks and @mentions, as they do not contribute much to capturing sarcasm. We also remove images attached to tweets, as this model does not handle sarcasm expressed in images. We replace the most common contractions with their full forms (e.g. "don't" with "do not", "won't" with "will not", "lol" with "laughing out loud") and substitute emojis with their meanings, such as :-( with sad, :-) with happy and :-@ with angry. We remove the #Sarcasm and #Sarcastic hashtags from our positive dataset, whereas all other hashtag words are kept by simply removing the # character. Preprocessing of ASD includes replacing missing ASDs with a single placeholder letter and removing all punctuation marks; hashtag words are again kept by removing the # character.

C. Feature Engineering

The most important module of our Twitter Sarcasm Detector (TSD) model is feature engineering, where we extract features from the tweet text (TT) and the author's self-description (ASD) that can help in building an automatic sarcasm classifier. We divide these features into three categories: L-LEXICAL, S-SENTIMENT, and T-TOPIC_MODELING based features. Table I gives a brief overview of these features, their categories and whether they are applied to TT or ASD.
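The L-LEXICAL side of this module can be sketched as below. This is a minimal illustration of binary n-gram indicators, word count, fully capitalized words and an intensifier count; the intensifier list is a toy stand-in, and the part-of-speech counts from Table I are omitted since they need a POS tagger:

```python
INTENSIFIERS = {"so", "really", "very", "totally"}   # toy list, not the paper's

def ngrams(words, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def lexical_features(tweet_text: str) -> dict:
    words = tweet_text.split()
    lowered = [w.lower() for w in words]
    # Binary contains(...) indicators for uni-, bi- and trigrams.
    feats = {f"contains({g})": 1.0
             for n in (1, 2, 3)
             for g in ngrams(lowered, n)}
    feats["word_count"] = len(words)
    # Count fully capitalized words (length > 1, to skip "I").
    feats["capital_letter"] = sum(w.isupper() and len(w) > 1 for w in words)
    feats["intensifiers"] = sum(w in INTENSIFIERS for w in lowered)
    return feats
```

In TSD these sparse dictionaries would be vectorized (e.g. with a hashing or dict vectorizer) and concatenated with the sentiment and topic features before training.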
The features are defined as: L-LEXICAL = {n-grams, intensifiers, capital letters, word count, double quotes, part-of-speech tags}; S-SENTIMENT = {sentiment score, contrast in sentiments}; T-TOPIC_MODELING = {topic modeling using LDA}. L-LEXICAL features are based on the words and vocabulary of the English language; all of these features are designed to capture the nuances of a Twitter user's writing style. S-SENTIMENT features capture the positive score, the negative score and the difference in sentiment scores of a tweet. This category is based on the suggestion of Riloff et al. [14] that sarcastic tweets exhibit a contrast between a positive sentiment and a negative situation. The use of a topic modeling technique to create the T-TOPIC_MODELING category is inspired by Joshi, Jain, Bhattacharyya and Carman [9]. Table II describes each proposed feature with an example and the rationale behind using it.

Running Example of TSD: the following is a running example of the feature engineering module of TSD. It takes as input a tweet text (TT) and its author's self-description (ASD) and generates the features shown below, which are described in Table II.

Inputs:
- Tweet: "Dream Date: Love take me to Sephora Give me your credit card - Just Leave #Dream Date #Sarcasm #Sephora"
- User self-description: "Optimistic Researcher Stem cell biologist Explorer witty :) sarcasm my 1st language fun loving"

TSD-generated features:
- L-LEXICAL = {n-grams: {contains(dream, date): 1, contains(love): 1.0, contains(take me to sephora): 1.0, contains(credit card): 1.0, contains(just leave): 1.0, contains(give me your): 1.0, contains(credit): 1.0, etc.}, word_count: 13, Capital_Letter: 0, POSVerb: 3.0, POSNoun: 5.0, POSAdverb: 2.0, POSAdjective: 3.0, intensifiers: 0}
- S-SENTIMENT = {Sentiment_score: {Negative sentiment: 0.3278273, Positive sentiment: 1.52008928}, Sentiment_Contrast: {Sentiment contrast: 0.692261904}}
- T-TOPIC_MODELING = {Topic:184: 0.0709312, Topic:134: 0.071785, Topic:7: 0.2143336, Topic:8: 0.077306, Topic:5: 0.0717857}

The LDA algorithm [4] assigns five topics as features of the given ASD. Each feature consists of a topic identifier and the probability of its presence in the ASD.

V. EXPERIMENTAL EVALUATION

The goal of our experiment is to show that the feature engineering from tweet text (TT) and author's self-description (ASD) in our proposed Twitter Sarcasm Detector (TSD) model can improve the performance of automatic sarcasm detection on the Twitter network.

Table I. LIST OF FEATURES, WHAT THEY ARE APPLIED TO, AND THEIR CATEGORIES
Table II. DETAILS OF FEATURES ENGINEERED (*NEW FEATURES)

To classify a tweet into the SARCASTIC or NON-SARCASTIC category, we use SVM with a linear kernel (C=0.1) and Logistic Regression (l2 penalty, C=1.0) from Python's scikit-learn package. Performance of these models is evaluated using 10-fold cross-validation. Cross-validation generalizes the model to unseen data by performing a number of train-predict-evaluate operations on different splits of the input data. In 10-fold cross-validation, the input data is split into 10 parts, where 9 parts are used for training and 1 part for testing; this process is repeated 10 times and the evaluation metrics are averaged. For an empirical evaluation of the full feature-set and of different combinations of feature-sets, we use the standard Precision, Recall, F-score, Accuracy and AUC of ROC (Area Under the Curve of the Receiver Operating Characteristic) metrics. Precision is the fraction of retrieved instances that are relevant, and can be seen as a measure of exactness. Recall is the fraction of relevant instances that are retrieved, and can be seen as a measure of completeness. F-score is the harmonic mean of precision and recall. Accuracy of a binary classifier is the proportion of correctly classified data, and the area under the ROC curve of a classifier is equivalent to the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one, i.e. P(score(x+) > score(x-)). We experiment with different combinations of the categories defined in the TSD model to study the impact of the proposed additional feature category (T-TOPIC_MODELING) on the performance metrics. The combinations used are: (a) L-LEXICAL + S-SENTIMENT + T-TOPIC_MODELING; (b) L-LEXICAL + T-TOPIC_MODELING; (c) L-LEXICAL + S-SENTIMENT; (d) S-SENTIMENT + T-TOPIC_MODELING. Table III shows the experimental results of the Twitter Sarcasm Detector (TSD) model for all three categories (L-LEXICAL, S-SENTIMENT and T-TOPIC_MODELING) of features.
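The evaluation setup just described (10-fold cross-validation over LinearSVM and Logistic Regression, with the stated C values) can be sketched as follows. The feature matrix below is synthetic stand-in data, since the real TSD matrices are built from TT and ASD:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                   # stand-in feature matrix
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # stand-in binary labels

for model in (LinearSVC(C=0.1), LogisticRegression(penalty="l2", C=1.0)):
    # scoring="roc_auc" uses decision_function / predicted probabilities,
    # matching the AUC-of-ROC metric reported in Tables III-VI.
    auc = cross_val_score(model, X, y, cv=10, scoring="roc_auc").mean()
    print(type(model).__name__, round(auc, 3))
```

Swapping in the real TSD feature matrices for `X` and the #sarcasm-derived labels for `y` reproduces the evaluation protocol; precision, recall, F-score and accuracy are obtained the same way by changing the `scoring` argument.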
Table IV shows the results for two categories (L-LEXICAL & T-TOPIC_MODELING). Similarly, the results of the experiments for L-LEXICAL & S-SENTIMENT and for S-SENTIMENT & T-TOPIC_MODELING are shown in Table V and Table VI, respectively. Joachims [7] provides both theoretical and empirical evidence that SVM is well suited for text classification. The rationale behind using a linear kernel is its faster performance and its resistance to overfitting on sparse matrices, as our model generates a large number of nominal features under the n-gram feature.

A. Performance Evaluation

To perform an empirical evaluation of our proposed model, we download tweets using the Twitter API and create a labeled dataset using the technique explained in Section IV-A. We apply 10-fold cross-validation to both LinearSVM (C=0.1) and Logistic Regression (l2 penalty, C=1.0). Davis and Goadrich [6] regard AUC as a single-number summary of a classifier's performance; thus, we use AUC to compare the performance of individual feature-sets and their combinations. The AUC values of the n-gram and topic-modeling features, when tested individually with

LinearSVM, are 0.63479 and 0.62394, respectively. There is a significant improvement in AUC (TSD-LinearSVM: 0.8789 and TSD-LogisticRegression: 0.8801) when T-TOPIC_MODELING is combined with the L-LEXICAL feature category. Both LinearSVM and Logistic Regression perform well and give similar results on all metrics for the different combinations of features. The AUC score of both classifiers is around 88%, which is higher than the results of other automatic sarcasm detection models reported in the literature (see Fig. 4).

Figure 4. Comparative performance values of AUC

We compare the AUC values of our TSD with the reported AUC values of Liebrecht, Kunneman and Van den Bosch [10], SCUBA [12], and Abercrombie and Hovy [1], [8], and we observe an improvement in AUC in the range of 5 to 28% (Fig 4). Other performance measures are also comparable to existing methods, if not higher (e.g. Accuracy and F-score are both 79%). Experiments with the TSD data also show that excluding L-LEXICAL features (Tables III and VI) gives much lower Accuracy and AUC values, which supports the fact that lexical features extracted from tweets are essential for detecting sarcasm. An analysis of all the features and their combinations supports our hypothesis that providing additional information about the tweet author's personality helps improve the performance of automatic sarcasm detection on Twitter data.

VI. CONCLUSIONS AND FUTURE WORK

In this paper, we propose a model called the Twitter Sarcasm Detector (TSD) for automatically classifying sarcastic tweets using features generated from the tweet text along with its author's self-description. Our approach acknowledges that some people are more sarcastic than others. Experimental results support our hypothesis that features obtained from the author's information, in addition to the tweet itself, perform better at correctly classifying sarcastic tweets.
One application of our model is that classifying sarcasm can help businesses analyse the sentiments of customers towards their business. Knowing the true sentiments of customers can help businesses make informed and better decisions, by having the ability to answer questions like

"What are the true emotions of my customers towards my products/company?". In the future, this work can be extended to analyse non-English tweets, to detect sarcasm on other platforms (such as Facebook and Reddit), and to engineer more features from the information available about a tweet's author (e.g. number of followers, verified account status, frequency of tweets per day).

VII. ACKNOWLEDGEMENT

This research was sponsored by Hiwave Technologies (7204833 Canada Inc.), Toronto, Canada. The authors gratefully acknowledge the use of the services and facilities of Hiwave Technologies. The authors also thank Vibhu Bhan, who provided support towards this research.

REFERENCES

[1] Gavin Abercrombie and Dirk Hovy. Putting sarcasm detection into context: The effects of class imbalance and manual labelling on supervised machine classification of Twitter conversations. In Proceedings of the ACL 2016 Student Research Workshop, pages 107-113, Berlin, Germany, August 2016. Association for Computational Linguistics.
[2] Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. LREC, 10:2200-2204, 2010.
[3] David Bamman and Noah Smith. Contextualized sarcasm detection on Twitter. International AAAI Conference on Web and Social Media, Feb. 2015.
[4] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.

[5] Dmitry Davidov, Oren Tsur, and Ari Rappoport. Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, CoNLL '10, pages 107-116, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.
[6] Jesse Davis and Mark Goadrich. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pages 233-240, New York, NY, USA, 2006. ACM.
[7] Thorsten Joachims. Text categorization with support vector machines: Learning with many relevant features. In European Conference on Machine Learning, pages 137-142. Springer, 1998.
[8] Aditya Joshi, Pushpak Bhattacharyya, and Mark James Carman. Automatic sarcasm detection: A survey. Computing Research Repository (CoRR), abs/1602.03426, 2016.
[9] Aditya Joshi, Prayas Jain, Pushpak Bhattacharyya, and Mark Carman. "Who would have thought of that!": A hierarchical topic model for extraction of sarcasm-prevalent topics and sarcasm detection. arXiv preprint arXiv:1611.04326, Nov. 2016.
[10] Christine Liebrecht, Florian Kunneman, and Antal Van den Bosch. The perfect solution for detecting sarcasm in tweets #not. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 29-37, Atlanta, Georgia, June 2013. Association for Computational Linguistics.
[11] Shubhadeep Mukherjee and Pradip Kumar Bala. Sarcasm detection in microblogs using Naive Bayes and fuzzy clustering. Technology in Society, 48:19-27, 2017.
[12] Ashwin Rajadesingan, Reza Zafarani, and Huan Liu. Sarcasm detection on Twitter: A behavioral modeling approach. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM '15, pages 97-106, New York, NY, USA, 2015. ACM.
[13] Kumar Ravi and Vadlamani Ravi. A novel automatic satire and irony detection using ensembled feature selection and data mining.
Knowledge-Based Systems, 2016.
[14] Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, and Ruihong Huang. Sarcasm as contrast between a positive sentiment and negative situation. Empirical Methods in Natural Language Processing, 13:704-714, Oct 2013.
[15] Zelin Wang, Zhijian Wu, Ruimin Wang, and Yafeng Ren. Twitter sarcasm detection exploiting a context-based model. Web Information Systems Engineering (WISE 2015), Lecture Notes in Computer Science vol. 9418, pages 77-91, Dec 2015.