Do We Criticise (and Laugh) in the Same Way? Automatic Detection of Multi-Lingual Satirical News in Twitter


Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)

Francesco Barbieri, Francesco Ronzano, Horacio Saggion
Universitat Pompeu Fabra, Barcelona, Spain

Abstract

During the last few years, the investigation of methodologies to automatically detect and characterise the figurative traits of textual content has attracted growing interest. Indeed, the capability to deal correctly with figurative language, and more specifically with satire, is fundamental to building robust approaches in several sub-fields of Artificial Intelligence, including Sentiment Analysis and Affective Computing. In this paper we investigate the automatic detection of Tweets that advertise satirical news in English, Spanish and Italian. To this end we present a system that models Tweets from different languages by a set of language-independent features that describe lexical, semantic and usage-related properties of the words of each Tweet. We approach satire identification as a binary classification of Tweets into satirical and non-satirical messages. We test the performance of our system in both monolingual and cross-language classification experiments, evaluating the satire detection effectiveness of our features. Our system outperforms a word-based baseline and is able to recognise whether a news item on Twitter is satirical or not with good accuracy. Moreover, we analyse the behaviour of the system across the different languages, obtaining interesting results.

1 Introduction

Satire is a form of language where humour and irony are employed to criticise and ridicule someone or something. Even if often misunderstood, in itself satire is not a comic device: it is a critique, but it uses comedic devices such as parody, exaggeration, slapstick, etc. to get its laughs [Colletta, 2009].
Satire is distinguished by figurative language and creative analogies, where the fiction pretends to be real. Satire is also characterised by emotions (like anger and disappointment) that are hard to detect due to their ironic dimension.

The research described in this paper is partially funded by the Spanish fellowship RYC and the SKATER-TALN UPF project (TIN C06-03).

The ability to properly detect and deal with satire would be strongly beneficial to several fields where a deep understanding of the metaphorical traits of language is essential, including Affective Computing [Picard, 1997] and Sentiment Analysis [Turney, 2002; Pang and Lee, 2008]. Looking at the big picture, computational approaches to satire are fundamental to building smooth human-computer interaction, improving the way computers interpret and respond to particular human emotional states.

In this paper we study the characterisation of satire in social networks, experimenting with new approaches to detect satirical Tweets within and across languages. We retrieve satirical Tweets from popular satirical news Twitter accounts in English, Spanish and Italian. We rely on these accounts since their content is contributed by several people and their popularity reflects the interest in and appreciation for this type of language. We compare the Tweets of satirical news accounts with Tweets of popular newspapers, which advertise actual news. A few examples from our dataset are the following:

Satirical News
English: Police Creative Writing Awards praise most imaginative witness statements ever.
Spanish: Artur Mas sigue esperando el doble check de Mariano Rajoy tras la votación del 9-N. (Artur Mas is still waiting for Mariano Rajoy's double check after the 9-N consultation.)
Italian: "Potrei non opporre veti a un presidente del Pd", ha detto Berlusconi iscrivendosi al Pd. ("I might not limit powers of Democratic Party president", said Berlusconi enrolling in the Democratic Party.)
Non-Satirical News
English: Major honours for The Times in the 2014 British Journalism Awards at a ceremony in London last night.
Spanish: Rajoy admite que no ha hablado con Mas desde que se convocó el 9-N y que no sabe quién manda ahora en Cataluña. (Rajoy admits that he hasn't talked to Mas since the 9-N consultation was called and that he doesn't know who is governing in Catalonia now.)
Italian: Berlusconi e il Colle: "Non metterò veti a un candidato Pd". (Berlusconi states: "I will not limit powers of Democratic Party president".)

In these examples we can see that satire is used to criticise and to convey a particular hidden meaning to the reader. The satirical English example criticises the police and their dishonest way of solving issues by inventing witnesses. The satirical Spanish Tweet criticises Rajoy (Prime Minister of Spain at the time of writing), as he did not want to discuss with Mas (Prime Minister of Catalonia at the time of writing) the decision to hold a consultation on November 9th (on the independence of Catalonia). For this reason Mas is still waiting for him to consider it. The satirical Tweet in Italian criticises the power that Berlusconi had in Italy even though he was no longer the Italian prime minister.

Our system relies on language-independent intrinsic word features (word usage frequency in a reference corpus, number of associated meanings, etc.) and on language-dependent word-based features (lemmas, bigrams, skip-grams). As classifier we employ the supervised algorithm Support Vector Machine [Platt, 1999] (see footnote 1), because it has proven effective in text classification tasks.

The contributions of this paper are: (1) a novel language-independent framework to detect satire in Twitter, (2) a set of experiments to test our framework with English, Spanish and Italian Tweets, (3) a set of cross-language experiments to analyse similarities and differences in the use of satire in English, Spanish and Italian, and (4) a dataset composed of satirical news Tweets and non-satirical news Tweets in English, Spanish and Italian.

Our paper includes seven sections. In the second section we provide an overview of the state of the art on satire and related AI topics. In Section 3 we describe the tools we used to process Tweets in the three languages.
In Section 4 we introduce the features we exploit to detect satirical Tweets in different languages, and in Section 5 we describe and evaluate the experiments we carried out to test the performance of our satire-detection system. In Section 6 we discuss the cross-language abilities of our system, showing its behaviour across different languages, and we summarise our work in Section 7.

2 Literature Review

Satire is a form of communication where humour and irony are used to criticise someone's behaviour and ridicule it. Satirical authors may be aggressive and offensive, but they always have a deeper meaning and a social signification beyond that of the humour [Colletta, 2009]. Satire loses its significance when the audience does not understand the real intent hidden in the ironic dimension. Indeed, the key message of a satirical utterance lies in the figurative interpretation of the ironic sentence. Satire has often been studied in literature [Peter, 1956; Mann, 1973; Knight, 2004; LaMarre et al., 2009], but rarely with a computational approach. The work of Burfoot and Baldwin [2009] attempts to computationally model satire in English. They retrieved newswire documents and satirical news articles from the web, and built a model able to recognise satirical articles. Their approach included standard text classification, lexical features (including profanity and slang) and semantic validity, where they identify the named entities in a given document and query the web for the conjunction of those entities.

Footnote 1: LibLINEAR.

The use of irony in satire is fundamental. The traditional definition of irony is saying the opposite of what you mean [Quintilien and Butler, 1953]. Since 2010 researchers have designed models to detect irony automatically. Veale [2010] proposed an algorithm for separating ironic from non-ironic similes in English, detecting common terms used in this ironic comparison. Reyes et al. [2013] proposed a model to detect irony in English Tweets, pointing out that skip-grams, which capture word sequences that contain (or skip over) arbitrary gaps, are the most informative features. Barbieri and Saggion [2014a; 2014b] designed an irony detection system that avoided the use of word-based features. However, irony has not been studied intensively in languages other than English. A few studies have been carried out on irony detection in other languages, such as Portuguese [Carvalho et al., 2009; de Freitas et al., 2014], Dutch [Liebrecht et al., 2013] and Italian [Barbieri et al., 2014].

Affective computing (AC) is a well-known AI field dealing with Human-Computer Interaction. Affective computing studies intelligent systems that are able to recognise, process and generate human emotions. Emotions are relevant for AI as they play roles not only in human creativity but also in rational human thinking and decision making; computers that will interact naturally and intelligently with humans need the ability to at least recognise and express affect [Picard, 1997]. There are many AC applications [Tao and Tan, 2005], including computer vision (emotion of a face and body language), wearable computing, and the whole natural language processing area (from the content to the tone of voice).

3 Text Analysis and Tools

We associated with each Tweet a normalised version of its text by expanding abbreviations and slang expressions, converting hashtags into words when they have a syntactic role (i.e. they are part of the sentence), and removing links and mentions. We describe in this section the tools and dataset we used.

3.1 English Tools

We made use of the GATE application TwitIE [Bontcheva et al., 2013], in which we enriched the normaliser by adding new abbreviations and new slang words and by improving the normalisation rules. We also employed TwitIE for tokenisation, Part Of Speech (POS) tagging and lemmatisation.
We used WordNet [Miller, 1995] to extract synonyms and synsets of a word. We employed the sentiment lexicon SentiWordNet 3.0 [Baccianella et al., 2010]. Finally, the American National Corpus has been employed as frequency corpus to obtain the usage frequency of words in English.

Language   Non-Satirical         Satirical
English    The Daily Mail (UK)   NewsBiscuit
           The Times (UK)        The Daily Mash
Spanish    El Pais               El Mundo Today
           El Mundo              El Jueves
Italian    Repubblica            Spinoza
           Corriere della Sera   Lercio

Table 1: List of Twitter accounts of newspapers and satirical news in British English, Iberian Spanish, and Italian.

3.2 Spanish Tools

We relied on the tool Freeling [Carreras et al., 2004] to perform sentence splitting, tokenisation, stop word removal, POS tagging, and Word Sense Disambiguation (WSD). WSD in Freeling uses the Spanish WordNet of the TALP Research Centre, mapped by means of the Inter-Lingual-Index to the English WordNet 3.0, whose synset IDs are in turn characterised by sentiment scores by means of SentiWordNet. As frequency corpus we used the texts of a dump of the Spanish Wikipedia as of May

3.3 Italian Tools

We tokenised, POS tagged, applied Word Sense Disambiguation (UKB) and removed stop words from the normalised text of Tweets by exploiting Freeling. We also used the Italian WordNet to get synsets and synonyms of each word of a Tweet, as well as the sentiment lexicon Sentix [Basile and Nissim, 2013], derived from SentiWordNet, to get the polarity of synsets. We relied on the CoLFIS frequency corpus of written Italian.

3.4 Dataset

In order to train and test our system we retrieved Tweets from twelve Twitter accounts from June 2014 to January

We considered four Twitter accounts for each language (English, Spanish and Italian); within each language two are satirical and two are non-satirical newspapers. They are shown in Table 1. After downloading the Tweets we filtered them, removing those that were not relevant to our study (for instance "Buy our t-shirt" or "Watch the video"). We kept only Tweets that were actual news (satirical or non-satirical).
In order to have a balanced dataset, with the same contribution from each Twitter account, we selected 2,766 Tweets randomly from each account, obtaining a total of 33,192 Tweets, of which half (16,596) were satirical and half were non-satirical news (2,766 was the smallest number of Tweets available for a single account, the Italian satirical account Lercio). We shared this dataset as a list of Tweet IDs, since per Twitter policy it is not possible to share the text of the Tweets.

4 Our Model

We characterised each Tweet by seven classes of features: (1) Word-Based, (2) Frequency, (3) Synonyms, (4) Ambiguity, (5) Part of Speech, (6) Sentiments, and (7) Punctuation. Some of these features were aimed at capturing common word patterns (1) and others at describing intrinsic aspects of the words included in each Tweet (2-6). The interesting property of the intrinsic word features is that they do not rely on word patterns, hence they can be used across languages.

4.1 Word-Based

We designed this group of features to build our baseline, since word-based features are usually very competitive in text classification tasks. We computed five word-based features: lemma (lemmas of the Tweet), bigrams (combinations of two lemmas in a sequence) and skip 1/2/3 grams.

4.2 Frequency

We accessed the frequency corpora (one for each language, see Section 3) to retrieve the frequency of each word of a Tweet. From these we derive three types of Frequency features: rarest word frequency (frequency of the rarest word included in the Tweet), frequency mean (the arithmetic average of the frequencies of all the words in the Tweet) and frequency gap (the difference between the two previous features). These features are computed including all the words of each Tweet. We also computed these features by considering only Nouns, Verbs, Adjectives, and Adverbs. Moreover, we count the number of bad/slang words in the Tweet (using three lists, one compiled for each language).
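The three Frequency features can be illustrated with a minimal Python sketch. It is not the implementation used in our system: the function name frequency_features, the toy frequency table, the lower-casing of tokens, the zero frequency for unseen words and the choice of gap = mean - rarest (the text above only says "the difference between the two previous features") are assumptions made for illustration.

```python
from statistics import mean


def frequency_features(tokens, corpus_freq):
    """Rarest-word frequency, mean frequency and their gap for one Tweet.

    tokens      : list of word strings (already tokenised)
    corpus_freq : dict mapping lower-cased word -> frequency in a reference corpus
    Words missing from the corpus get frequency 0 (an assumption of this sketch).
    """
    freqs = [corpus_freq.get(tok.lower(), 0) for tok in tokens]
    if not freqs:
        return {"rarest": 0.0, "mean": 0.0, "gap": 0.0}
    rarest = min(freqs)   # frequency of the rarest word in the Tweet
    avg = mean(freqs)     # arithmetic average of all word frequencies
    # Assumed sign convention: gap = mean - rarest.
    return {"rarest": float(rarest), "mean": float(avg), "gap": float(avg - rarest)}


if __name__ == "__main__":
    # Hypothetical corpus counts, for illustration only.
    toy_corpus = {"police": 120, "praise": 40, "imaginative": 5}
    print(frequency_features(["Police", "praise", "imaginative"], toy_corpus))
```

The same function could be applied to the Nouns, Verbs, Adjectives or Adverbs of a Tweet only, by filtering the token list before the call.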
The final number of Frequency features is

4.3 Ambiguity

To model the ambiguity of the words in the Tweets we use the WordNet synsets associated with each word. Our hypothesis is that if a word has several meanings/synsets it is more likely to be used in an ambiguous way. For each Tweet we calculate the maximum number of synsets associated with a single word, the mean synset number of all the words, and the synset gap, that is, the difference between the two previous features. We determine the value of these features by including all the words of a Tweet as well as by considering only Nouns, Verbs, Adjectives or Adverbs separately. The Ambiguity features are

4.4 Part Of Speech

The features included in the Part Of Speech (POS) group are designed to capture the syntactic structure of the Tweets. This group comprises eight features, each of which counts the number of occurrences of words characterised by a certain POS. The eight POS considered are Verbs, Nouns, Adjectives, Adverbs, Interjections, Determiners, Pronouns, and Adpositions.

4.5 Synonyms

We consider the frequencies (for each language in its own frequency corpus, see Section 3) of the synonyms of each word in the Tweet, as retrieved from WordNet. Then we computed, across all the words of the Tweet: the greatest and the lowest number of synonyms with frequency higher than the one present in the Tweet, and the mean number of synonyms with frequency greater/lower than the frequency of the related word present in the Tweet. We also determine the greatest/lowest number of synonyms and the mean number of synonyms of the words with frequency greater/lower than the one present in the Tweet (gap feature). We computed the set of Synonyms features by considering both all the words of the Tweet together and only the words belonging to each one of the four Parts of Speech listed before.

4.6 Sentiments

The sentiments of the words in Tweets are important for two reasons: to detect the sentiment (e.g. whether Tweets contain mainly positive or negative terms) and to capture the unexpectedness created by a negative word in a positive context or vice versa. Relying on the three sentiment lexicons described in Section 3, we computed the number of positive / negative words, the sum of the intensities of the positive / negative scores of words, the mean of the positive / negative scores of words, the greatest positive / negative score, and the gap between the greatest positive / negative score and the positive / negative mean. Moreover, we simply count (and measure the ratio of) the words with polarity not equal to zero, to detect subjectivity in the Tweet. As previously done, we also computed these features by considering only Nouns, Verbs, Adjectives, and Adverbs.

4.7 Characters

These features were designed to capture the punctuation style of the satirical Tweets. Each feature in this set is the number of occurrences of a specific punctuation mark, including: ".", "!", "?", "$", "%", "&", "+", "-" and "=". We also compute the numbers of uppercase and lowercase characters, and the length of the Tweet.

5 Experiments and Results

In order to test the performance of our system we ran monolingual and cross-lingual experiments.

5.1 Monolingual Experiments

We ran two kinds of balanced binary classification experiments, where the two classes are satire and non-satire.
We gathered three datasets of English, Spanish and Italian Tweets; each dataset includes two newspaper accounts, N1 and N2, and two satirical news accounts, S1 and S2. In the first balanced binary classification experiment, we train the system on a dataset composed of 80% of the Tweets from one of the newspaper accounts and 80% of the Tweets from one of the satirical accounts (5,444 Tweets in total). Then we test the system on a dataset that includes 20% of the Tweets of a newspaper account different from the one used for training and 20% of the Tweets of a satirical account that has not been used for training. The final size of our testing set is 1,089 Tweets. We ran the following configurations:

Train: 80% N1 and 80% S1 / Test: 20% N2 and 20% S2
Train: 80% N1 and 80% S2 / Test: 20% N2 and 20% S1
Train: 80% N2 and 80% S1 / Test: 20% N1 and 20% S2
Train: 80% N2 and 80% S2 / Test: 20% N1 and 20% S1

It is worth noting that, thanks to these training and test set configurations, we never use Tweets from the same account in both the training and the testing datasets; thus we can evaluate the ability of our system to detect satire independently of the linguistic and stylistic features of a specific Twitter account. As a consequence we avoid the account modelling / recognition effect, as the system is never tested on the accounts it was trained on.

In the second balanced binary classification experiment, the training set is composed of all the Tweets of each account. The dataset includes 33,192 Tweets, and we evaluate the performance of our SVM classifier by 5-fold cross validation.

For each experiment we evaluate a word-based model (W-B, word-based features from Section 4.1) that we consider our baseline, a model that relies on intrinsic word features (described from Section 4.2 to Section 4.6), and a third model that includes all the features of Section 4.

English Experiments

Table 2 reports the results of the English classification.
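The four account-disjoint configurations of the first monolingual experiment (Section 5.1) can be enumerated with a short Python sketch. It only illustrates the bookkeeping, not the classifier; the function names and the decision to take the first 80% of a training account and the last 20% of a test account are assumptions, since the text does not state how the 80% and 20% portions were drawn.

```python
from itertools import product


def account_disjoint_configs(news=("N1", "N2"), satire=("S1", "S2")):
    """Pair every (newspaper, satirical) training couple with the two
    remaining accounts as test couple, so no account is shared."""
    configs = []
    for n_train, s_train in product(news, satire):
        n_test = next(a for a in news if a != n_train)
        s_test = next(a for a in satire if a != s_train)
        configs.append({"train": (n_train, s_train), "test": (n_test, s_test)})
    return configs


def build_split(tweets_by_account, config, train_share=0.8):
    """Return (train, test) lists for one configuration.

    Assumption of this sketch: the first 80% of each training account and
    the last 20% of each test account are used.
    """
    train, test = [], []
    for account in config["train"]:
        tweets = tweets_by_account[account]
        train.extend(tweets[: int(len(tweets) * train_share)])
    for account in config["test"]:
        tweets = tweets_by_account[account]
        test.extend(tweets[int(len(tweets) * train_share):])
    return train, test


if __name__ == "__main__":
    for cfg in account_disjoint_configs():
        print("Train:", cfg["train"], "/ Test:", cfg["test"])
```

Because training and test accounts never overlap, a classifier evaluated on such splits cannot score well merely by recognising the writing style of one account.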
When training on The Times and The Daily Mash and testing on the other accounts, the word-based features obtained the same results as our models (F1 of versus 0.635). In all the other cases, including the classification of all satirical Tweets versus all non-satirical ones (N1+N2 vs S1+S2), the intrinsic-word model outperforms the word-based one. We can note that the results of the second experiment are higher for every feature set, especially for the word-based features. We have to highlight that, unlike in the first experiment, in the second experiment Tweets from the same accounts are used both for training and for testing; thus the system is able to learn to recognise, besides satire, also the language and writing style of each account. When we extended the intrinsic word feature set by also adding word-based features, we observed that the performance of our classifiers improved (up to in one combination, and up to in the union of the accounts). According to the information gain scores, the best features in the N1+N2 vs S1+S2 dataset (see Table 6) belong to the groups Character (length of the Tweet, number of words with an initial uppercase letter), POS (number of nouns), Sentiment (ratio of words with polarity), Ambiguity (synset gap of nouns) and Frequency (rarest word frequency).

Table 2: English monolingual classification. The table shows the F1 of each model (W-B, Intrinsic, All) for each Train/Test configuration and for the 5-fold experiment on N1N2S1S2, where N1=The Daily Mail, N2=The Times, S1=NewsBiscuit and S2=The Daily Mash. In bold are the best results between the word-based and Intrinsic models (differences not due to chance, as confirmed by a two-matched-samples t-test with unknown variances).
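The information gain used to rank features can be sketched as follows. This is an illustrative computation, not the procedure used to produce Table 6: the text does not say how numeric features were discretised, so binarising a feature at a single threshold is an assumption, as are the function names.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Information gain of a numeric feature binarised at `threshold`.

    IG = H(class) - sum over the two partitions of p(partition) * H(class | partition)
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in (left, right) if part
    )
    return entropy(labels) - remainder


if __name__ == "__main__":
    # Toy Tweet lengths and labels, invented for illustration.
    lengths = [40, 45, 110, 120]
    classes = ["non-satire", "non-satire", "satire", "satire"]
    print(information_gain(lengths, classes, threshold=80))
```

A feature whose threshold separates the two classes perfectly reaches the maximum gain of one bit on a balanced dataset, while a feature unrelated to the class has a gain of zero.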

Spanish Experiments

The performance of the Spanish model is reported in Table 3. F-measures are promising, with the best score obtained when training on the accounts El Mundo and El Mundo Today (0.805 using only intrinsic word features). The intrinsic word features outperformed the word-based baseline in all the classifications. When adding the word-based features to the intrinsic features, the results decrease in three cases out of four. Moreover, the word-based model also obtained worse results in the N1+N2 vs S1+S2 classification, even with the chance of modelling specific accounts. We can see in Table 6 that the best features for Spanish belong to the groups Character (length, uppercase character ratio), POS (number of nouns and adpositions), Frequency (frequency gap of nouns, rarest noun and rarest adjective) and Ambiguity (mean of the number of synsets).

Table 3: Spanish monolingual classification. The table shows the F1 of each model (W-B, Intrinsic, All) for each Train/Test configuration and for the 5-fold experiment on N1N2S1S2, where N1=El Pais, N2=El Mundo, S1=El Mundo Today, and S2=El Jueves. In bold are the best results between the word-based and Intrinsic models (same statistical test as for English).

Italian Experiments

In the Italian experiments (Table 4) the intrinsic-word model outperformed the word-based model in all the combinations, obtaining the best result when training on Repubblica and Lercio and testing on the other accounts (F1 are respectively and 0.541). Adding word-based features to the intrinsic-word model increased the F1 in two cases and decreased it in the other two. However, in the second type of experiment adding word-based features helps. In Table 6 we can see that the best groups of features to detect satire were Characters (uppercase and lowercase ratio, length), POS (number of verbs), Ambiguity (verb synset mean, gap and maximum number of synsets) and Frequency (verb mean, gap, and rarest). In general, verbs seem to play an important role in satire detection in Italian.
Table 4: Italian monolingual classification. The table shows the F1 of each model (W-B, Intrinsic, All) for each Train/Test configuration and for the 5-fold experiment on N1N2S1S2, where N1=Repubblica, N2=Corriere della Sera, S1=Spinoza, and S2=Lercio. In bold are the best results between the word-based and Intrinsic models (same statistical test as for English).

5.2 Cross-Lingual Experiments

In addition to these experiments focused on a single language, we also analysed the performance of a system composed only of language-independent features (intrinsic features, features from 2 to 6 in Section 4) in a multi-lingual context, running two types of experiments.

In the first cross-language experiment we train our model on the Tweets in one language and test it on the Tweets of a different language; in this way we can see whether the satirical accounts of different languages cross-reinforce the ability to model satire. We did this for each language pair. We carry out these experiments to gain a deeper understanding of our intrinsic model, assessing whether a model induced from one language can be used to detect satire in a different language.

The second cross-language experiment was a 5-fold cross validation over the whole dataset, including the Tweets from all the accounts of all the languages (a total of 33,192 Tweets, of which 16,596 were satirical and 16,596 non-satirical news).

Table 5 shows the results of the cross-lingual experiments (F1 of the Non-Satirical and Satirical classes and the mean). A model trained in one language is not always capable of recognising satire in a different language. For example, a model trained in Italian is not able to recognise English and Spanish satire (F1 of 0.05 and 0.156). However, when testing in Italian and training in English and Spanish the system obtains the highest F1 scores of this type of experiment (respectively and 0.695).
When testing in English, the system recognises satire (0.669) but not newspapers (0.031) when trained in Spanish, and vice versa when trained in Italian (good F1 for non-satirical newspapers, but low for satire). When testing in Spanish (while training in another language) the system seems better at recognising newspapers than satire.

One of the most interesting results is the 5-fold cross validation over the whole dataset, including all the accounts of all the languages (last row of Table 5). The F1 score of this experiment is and it can be considered a high score, considering the noise that may arise when we generate the same features in different languages. Indeed, the word-based model scores 8 points lower.

Table 5: Cross-language experiments (F1 of the Non-satirical and Satire classes and their mean). Rows: training in one language and testing in a different one (English/Spanish, English/Italian, Spanish/English, Spanish/Italian, Italian/Spanish, Italian/English) and, in the last two rows, a 5-fold cross validation on the whole dataset (all accounts of all languages) using the word-based and the intrinsic-word models.
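The three scores reported for each cross-lingual run (F1 of the non-satirical class, F1 of the satirical class, and their mean) can be computed as in the following Python sketch. The label strings and function names are illustrative assumptions, and the mean is taken to be the arithmetic mean of the two per-class F1 values.

```python
def f1_for_class(gold, predicted, label):
    """F1 of one class, computed from true positives, false positives
    and false negatives; returns 0.0 when the class is never hit."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_lingual_scores(gold, predicted):
    """Per-class F1 and their arithmetic mean (assumed definition of 'Mean')."""
    non_sat = f1_for_class(gold, predicted, "non-satire")
    sat = f1_for_class(gold, predicted, "satire")
    return {"non-satire": non_sat, "satire": sat, "mean": (non_sat + sat) / 2}


if __name__ == "__main__":
    # Toy gold labels and predictions, invented for illustration.
    gold = ["satire", "satire", "non-satire", "non-satire"]
    pred = ["satire", "satire", "satire", "non-satire"]
    print(cross_lingual_scores(gold, pred))
```

Reporting the two classes separately makes visible the asymmetric behaviour discussed above, where a model transferred to another language may recognise one class well and the other hardly at all.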

n   English                     Spanish                     Italian                     Eng+Spa+Ita
1   [char]length                [char]length                [char]uppercase-ratio       [char]tot-char
2   [pos]num-noun               [char]uppercase-ratio       [char]lowercase-ratio       [char]uppercase-ratio
3   [char]first-uppercase       [char]lowercase-ratio       [pos]num-verbs              [char]lowercase-ratio
4   [senti]words-with-pol       [char]first-uppercase       [char]length                [pos]num-nouns
5   [char]lowercase-ratio       [pos]num-noun               [amb]verb-synset-mean       [char]first-uppercase
6   [senti]positive-ratio       [char]longest-word          [amb]verb-synset-gap        [pos]num-verbs
7   [freq]rarest-noun           [char]exclamation-mark      [amb]verb-max-synset        [amb]verb-max-synset
8   [char]uppercase-ratio       [char]longest-shortest-gap  [freq]verb-mean-freq        [amb]verb-synset-gap
9   [senti]noun-with-pol-ratio  [char]average-word-length   [freq]verb-gap-freq         [amb]verb-synset-mean
10  [amb]noun-synset-gap        [pos]num-adpositions        [freq]rarest-verb           [freq]rarest-noun
11  [char]shortest-word         [freq]noun-gap-freq         [pos]num-noun               [char]longest-shortest-gap
12  [freq]rarest-word           [freq]rarest-adjective      [char]longest-word          [char]longest-word
13  [amb]word-synset-gap        [freq]rarest-noun           [char]longest-shortest-gap  [char]average-word-length
14  [amb]max-noun-synset        [pos]num-numbers            [amb]max-num-synset         [senti]words-with-pol
15  [syno]lowest-gap            [amb]synset-mean            [pos]num-pronoun            [freq]verb-mean-freq

Table 6: Best 15 features of the language-independent model (all features without word-based) ranked by their information gain scores in the N1+N2 vs S1+S2 dataset. The last column reports the best features according to the arithmetic average of the information gain in each language. The group of each feature is given in square brackets.

6 Discussion

Across the three languages we considered, the different quality of the linguistic resources adopted, as well as the different accuracy of the NLP tools exploited to analyse Tweets, introduce some biases when we generate our set of cross-lingual features, referred to as intrinsic-word features.
These biases have to be considered in the interpretation of the results of our cross-lingual experiments.

Our intrinsic-word features model (features from 2 to 6 in Section 4) outperforms the word-based baseline in each single-language experiment, showing that the use of intrinsic-word features represents a good approach for satire detection across the three languages we considered. The best performance of the intrinsic-word features occurs in the Italian dataset, where they obtain an F-measure of in one combination, while the word-based model scores only

Adding word-based features to the intrinsic-word model seems to increase the performance only in the second type of experiment, where all accounts are included in the training. Yet, the word-based features are strictly related to the words used by specific accounts. The use of word-based features is not domain and language independent, because it is tied to specific words rather than to the inner cross-account and cross-language linguistic traits of satire.

The best features (see Table 6) across the languages were Characters, Part Of Speech and Ambiguity. In English we note that, besides the Characters features (relevant in all the languages), the number of words with polarity (positive or negative) is important (but less so for Spanish and Italian). Additionally, the use of rare (infrequent) nouns is a characteristic of English satire. What distinguishes Spanish satire is the number of nouns and adpositions, and the use of long words. In this language the presence of rare nouns and rare adjectives is also a distinctive feature of satire. In Italian, the Characters features are also important, especially the uppercase and lowercase ratio. Moreover, in Italian satire verbs play a key role. Indeed the number of verbs, the number of synsets associated with a verb and the usage frequency of a verb (whether it is rare or not) are strongly distinctive for Italian satirical news.
Furthermore, as in Spanish, using long words may be a sign of Italian satire. One last curious result is that the use of slang and bad words is not relevant when compared to the satire detection contributions of structural features (Characters and Frequency) and semantic features (like Ambiguity). This fact suggests that the satirical news of the accounts we selected appropriately mimics non-satirical news.

From the cross-lingual experiments we can deduce that it is not always possible to train in one language and test in another with the proposed model (Table 5). Yet, there are interesting results. For instance, when training in Italian the system is not able to detect English and Spanish satire, but when testing on Italian and training in the other languages the results are better. The interpretation may be that Italian satire is less intricate: it is easy to detect, but a model trained on it is not able to recognise other kinds of satire. Our intrinsic-word model, when trained in Spanish, is able to detect Italian satire with a precision of (with satire F1 of 0.733), which is a very interesting result considering the complexity of the task. We need to consider that the two datasets are written in different languages, and that the satirical topics are different (as they are related to politics and culture). On the other hand, English cannot be detected by the Spanish or the Italian systems, but both can recognise one aspect of the English dataset (the Spanish system recognises English satire, and the Italian system recognises English newspapers with good accuracy, F1 of 0.71).

Finally, the last result that deserves further analysis is the 5-fold cross validation over the whole dataset, where all the accounts of all the languages were included. The accuracy of our model is promising (F1 of 0.767), as in this dataset the noise is very high: 33,192 Tweets in three different languages and on different topics.

7 Conclusions

In this paper we proposed an approach to detect news satire in Twitter in different languages. Our approach avoids word-based features (Bag of Words) by relying only on language-independent features, which we refer to as intrinsic-word features since they aim to capture inner characteristics of the words. We tested our approach on English, Spanish and Italian Tweets and obtained significant results. Our system was able to recognise whether a Tweet advertises non-satirical or satirical news, outperforming a word-based baseline. Moreover, we tested the system in cross-language experiments, obtaining interesting results that deserve deeper investigation. We plan to explore our approach with new languages and to seek methods for combining languages to obtain better accuracy in cross-lingual satire detection.

Modelling Sarcasm in Twitter, a Novel Approach

Modelling Sarcasm in Twitter, a Novel Approach Modelling Sarcasm in Twitter, a Novel Approach Francesco Barbieri and Horacio Saggion and Francesco Ronzano Pompeu Fabra University, Barcelona, Spain .@upf.edu Abstract Automatic detection

More information

Modelling Irony in Twitter: Feature Analysis and Evaluation

Modelling Irony in Twitter: Feature Analysis and Evaluation Modelling Irony in Twitter: Feature Analysis and Evaluation Francesco Barbieri, Horacio Saggion Pompeu Fabra University Barcelona, Spain francesco.barbieri@upf.edu, horacio.saggion@upf.edu Abstract Irony,

More information

Your Sentiment Precedes You: Using an author s historical tweets to predict sarcasm

Your Sentiment Precedes You: Using an author s historical tweets to predict sarcasm Your Sentiment Precedes You: Using an author s historical tweets to predict sarcasm Anupam Khattri 1 Aditya Joshi 2,3,4 Pushpak Bhattacharyya 2 Mark James Carman 3 1 IIT Kharagpur, India, 2 IIT Bombay,

More information

LT3: Sentiment Analysis of Figurative Tweets: piece of cake #NotReally

LT3: Sentiment Analysis of Figurative Tweets: piece of cake #NotReally LT3: Sentiment Analysis of Figurative Tweets: piece of cake #NotReally Cynthia Van Hee, Els Lefever and Véronique hoste LT 3, Language and Translation Technology Team Department of Translation, Interpreting

More information

World Journal of Engineering Research and Technology WJERT

World Journal of Engineering Research and Technology WJERT wjert, 2018, Vol. 4, Issue 4, 218-224. Review Article ISSN 2454-695X Maheswari et al. WJERT www.wjert.org SJIF Impact Factor: 5.218 SARCASM DETECTION AND SURVEYING USER AFFECTATION S. Maheswari* 1 and

More information

This is a repository copy of Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis.

This is a repository copy of Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis. This is a repository copy of Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/130763/

More information

Sarcasm Detection in Text: Design Document

Sarcasm Detection in Text: Design Document CSC 59866 Senior Design Project Specification Professor Jie Wei Wednesday, November 23, 2016 Sarcasm Detection in Text: Design Document Jesse Feinman, James Kasakyan, Jeff Stolzenberg 1 Table of contents

More information

An Impact Analysis of Features in a Classification Approach to Irony Detection in Product Reviews

An Impact Analysis of Features in a Classification Approach to Irony Detection in Product Reviews Universität Bielefeld June 27, 2014 An Impact Analysis of Features in a Classification Approach to Irony Detection in Product Reviews Konstantin Buschmeier, Philipp Cimiano, Roman Klinger Semantic Computing

More information

Affect-based Features for Humour Recognition

Affect-based Features for Humour Recognition Affect-based Features for Humour Recognition Antonio Reyes, Paolo Rosso and Davide Buscaldi Departamento de Sistemas Informáticos y Computación Natural Language Engineering Lab - ELiRF Universidad Politécnica

More information

The final publication is available at

The final publication is available at Document downloaded from: http://hdl.handle.net/10251/64255 This paper must be cited as: Hernández Farías, I.; Benedí Ruiz, JM.; Rosso, P. (2015). Applying basic features from sentiment analysis on automatic

More information

Harnessing Context Incongruity for Sarcasm Detection

Harnessing Context Incongruity for Sarcasm Detection Harnessing Context Incongruity for Sarcasm Detection Aditya Joshi 1,2,3 Vinita Sharma 1 Pushpak Bhattacharyya 1 1 IIT Bombay, India, 2 Monash University, Australia 3 IITB-Monash Research Academy, India

More information

Sentiment Analysis. Andrea Esuli

Sentiment Analysis. Andrea Esuli Sentiment Analysis Andrea Esuli What is Sentiment Analysis? What is Sentiment Analysis? Sentiment analysis and opinion mining is the field of study that analyzes people s opinions, sentiments, evaluations,

More information

Introduction to Sentiment Analysis. Text Analytics - Andrea Esuli

Introduction to Sentiment Analysis. Text Analytics - Andrea Esuli Introduction to Sentiment Analysis Text Analytics - Andrea Esuli What is Sentiment Analysis? What is Sentiment Analysis? Sentiment analysis and opinion mining is the field of study that analyzes people

More information

Automatic Detection of Sarcasm in BBS Posts Based on Sarcasm Classification

Automatic Detection of Sarcasm in BBS Posts Based on Sarcasm Classification Web 1,a) 2,b) 2,c) Web Web 8 ( ) Support Vector Machine (SVM) F Web Automatic Detection of Sarcasm in BBS Posts Based on Sarcasm Classification Fumiya Isono 1,a) Suguru Matsuyoshi 2,b) Fumiyo Fukumoto

More information

KLUEnicorn at SemEval-2018 Task 3: A Naïve Approach to Irony Detection

KLUEnicorn at SemEval-2018 Task 3: A Naïve Approach to Irony Detection KLUEnicorn at SemEval-2018 Task 3: A Naïve Approach to Irony Detection Luise Dürlich Friedrich-Alexander Universität Erlangen-Nürnberg / Germany luise.duerlich@fau.de Abstract This paper describes the

More information

LLT-PolyU: Identifying Sentiment Intensity in Ironic Tweets

LLT-PolyU: Identifying Sentiment Intensity in Ironic Tweets LLT-PolyU: Identifying Sentiment Intensity in Ironic Tweets Hongzhi Xu, Enrico Santus, Anna Laszlo and Chu-Ren Huang The Department of Chinese and Bilingual Studies The Hong Kong Polytechnic University

More information

PREDICTING HUMOR RESPONSE IN DIALOGUES FROM TV SITCOMS. Dario Bertero, Pascale Fung

PREDICTING HUMOR RESPONSE IN DIALOGUES FROM TV SITCOMS. Dario Bertero, Pascale Fung PREDICTING HUMOR RESPONSE IN DIALOGUES FROM TV SITCOMS Dario Bertero, Pascale Fung Human Language Technology Center The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong dbertero@connect.ust.hk,

More information

Formalizing Irony with Doxastic Logic

Formalizing Irony with Doxastic Logic Formalizing Irony with Doxastic Logic WANG ZHONGQUAN National University of Singapore April 22, 2015 1 Introduction Verbal irony is a fundamental rhetoric device in human communication. It is often characterized

More information

Document downloaded from: This paper must be cited as:

Document downloaded from:  This paper must be cited as: Document downloaded from: http://hdl.handle.net/10251/35314 This paper must be cited as: Reyes Pérez, A.; Rosso, P.; Buscaldi, D. (2012). From humor recognition to Irony detection: The figurative language

More information

arxiv: v1 [cs.cl] 3 May 2018

arxiv: v1 [cs.cl] 3 May 2018 Binarizer at SemEval-2018 Task 3: Parsing dependency and deep learning for irony detection Nishant Nikhil IIT Kharagpur Kharagpur, India nishantnikhil@iitkgp.ac.in Muktabh Mayank Srivastava ParallelDots,

More information

Combination of Audio & Lyrics Features for Genre Classication in Digital Audio Collections

Combination of Audio & Lyrics Features for Genre Classication in Digital Audio Collections 1/23 Combination of Audio & Lyrics Features for Genre Classication in Digital Audio Collections Rudolf Mayer, Andreas Rauber Vienna University of Technology {mayer,rauber}@ifs.tuwien.ac.at Robert Neumayer

More information

Detecting Sarcasm in English Text. Andrew James Pielage. Artificial Intelligence MSc 2012/2013

Detecting Sarcasm in English Text. Andrew James Pielage. Artificial Intelligence MSc 2012/2013 Detecting Sarcasm in English Text Andrew James Pielage Artificial Intelligence MSc 0/0 The candidate confirms that the work submitted is their own and the appropriate credit has been given where reference

More information

UWaterloo at SemEval-2017 Task 7: Locating the Pun Using Syntactic Characteristics and Corpus-based Metrics

UWaterloo at SemEval-2017 Task 7: Locating the Pun Using Syntactic Characteristics and Corpus-based Metrics UWaterloo at SemEval-2017 Task 7: Locating the Pun Using Syntactic Characteristics and Corpus-based Metrics Olga Vechtomova University of Waterloo Waterloo, ON, Canada ovechtom@uwaterloo.ca Abstract The

More information

Implementation of Emotional Features on Satire Detection

Implementation of Emotional Features on Satire Detection Implementation of Emotional Features on Satire Detection Pyae Phyo Thu1, Than Nwe Aung2 1 University of Computer Studies, Mandalay, Patheingyi Mandalay 1001, Myanmar pyaephyothu149@gmail.com 2 University

More information

Large scale Visual Sentiment Ontology and Detectors Using Adjective Noun Pairs

Large scale Visual Sentiment Ontology and Detectors Using Adjective Noun Pairs Large scale Visual Sentiment Ontology and Detectors Using Adjective Noun Pairs Damian Borth 1,2, Rongrong Ji 1, Tao Chen 1, Thomas Breuel 2, Shih-Fu Chang 1 1 Columbia University, New York, USA 2 University

More information

Identifying functions of citations with CiTalO

Identifying functions of citations with CiTalO Identifying functions of citations with CiTalO Angelo Di Iorio 1, Andrea Giovanni Nuzzolese 1,2, and Silvio Peroni 1,2 1 Department of Computer Science and Engineering, University of Bologna (Italy) 2

More information

Do we really know what people mean when they tweet? Dr. Diana Maynard University of Sheffield, UK

Do we really know what people mean when they tweet? Dr. Diana Maynard University of Sheffield, UK Do we really know what people mean when they tweet? Dr. Diana Maynard University of Sheffield, UK We are all connected to each other... Information, thoughts and opinions are shared prolifically on the

More information

Acoustic Prosodic Features In Sarcastic Utterances

Acoustic Prosodic Features In Sarcastic Utterances Acoustic Prosodic Features In Sarcastic Utterances Introduction: The main goal of this study is to determine if sarcasm can be detected through the analysis of prosodic cues or acoustic features automatically.

More information

Lyrics Classification using Naive Bayes

Lyrics Classification using Naive Bayes Lyrics Classification using Naive Bayes Dalibor Bužić *, Jasminka Dobša ** * College for Information Technologies, Klaićeva 7, Zagreb, Croatia ** Faculty of Organization and Informatics, Pavlinska 2, Varaždin,

More information

Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers

Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers Amal Htait, Sebastien Fournier and Patrice Bellot Aix Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,13397,

More information

arxiv: v1 [cs.ir] 16 Jan 2019

arxiv: v1 [cs.ir] 16 Jan 2019 It s Only Words And Words Are All I Have Manash Pratim Barman 1, Kavish Dahekar 2, Abhinav Anshuman 3, and Amit Awekar 4 1 Indian Institute of Information Technology, Guwahati 2 SAP Labs, Bengaluru 3 Dell

More information

Mining Subjective Knowledge from Customer Reviews: A Specific Case of Irony Detection

Mining Subjective Knowledge from Customer Reviews: A Specific Case of Irony Detection Mining Subjective Knowledge from Customer Reviews: A Specific Case of Irony Detection Antonio Reyes and Paolo Rosso Natural Language Engineering Lab - ELiRF Departamento de Sistemas Informáticos y Computación

More information

Helping Metonymy Recognition and Treatment through Named Entity Recognition

Helping Metonymy Recognition and Treatment through Named Entity Recognition Helping Metonymy Recognition and Treatment through Named Entity Recognition H.BURCU KUPELIOGLU Graduate School of Science and Engineering Galatasaray University Ciragan Cad. No: 36 34349 Ortakoy/Istanbul

More information

#SarcasmDetection Is Soooo General! Towards a Domain-Independent Approach for Detecting Sarcasm

#SarcasmDetection Is Soooo General! Towards a Domain-Independent Approach for Detecting Sarcasm Proceedings of the Thirtieth International Florida Artificial Intelligence Research Society Conference #SarcasmDetection Is Soooo General! Towards a Domain-Independent Approach for Detecting Sarcasm Natalie

More information

First Stage of an Automated Content-Based Citation Analysis Study: Detection of Citation Sentences 1

First Stage of an Automated Content-Based Citation Analysis Study: Detection of Citation Sentences 1 First Stage of an Automated Content-Based Citation Analysis Study: Detection of Citation Sentences 1 Zehra Taşkın *, Umut Al * and Umut Sezen ** * {ztaskin; umutal}@hacettepe.edu.tr Department of Information

More information

National University of Singapore, Singapore,

National University of Singapore, Singapore, Editorial for the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL) at SIGIR 2017 Philipp Mayr 1, Muthu Kumar Chandrasekaran

More information

TWITTER SARCASM DETECTOR (TSD) USING TOPIC MODELING ON USER DESCRIPTION

TWITTER SARCASM DETECTOR (TSD) USING TOPIC MODELING ON USER DESCRIPTION TWITTER SARCASM DETECTOR (TSD) USING TOPIC MODELING ON USER DESCRIPTION Supriya Jyoti Hiwave Technologies, Toronto, Canada Ritu Chaturvedi MCS, University of Toronto, Canada Abstract Internet users go

More information

Automatic Sarcasm Detection: A Survey

Automatic Sarcasm Detection: A Survey Automatic Sarcasm Detection: A Survey Aditya Joshi 1,2,3 Pushpak Bhattacharyya 2 Mark James Carman 3 1 IITB-Monash Research Academy, India 2 IIT Bombay, India, 3 Monash University, Australia {adityaj,pb}@cse.iitb.ac.in,

More information

Some Experiments in Humour Recognition Using the Italian Wikiquote Collection

Some Experiments in Humour Recognition Using the Italian Wikiquote Collection Some Experiments in Humour Recognition Using the Italian Wikiquote Collection Davide Buscaldi and Paolo Rosso Dpto. de Sistemas Informáticos y Computación (DSIC), Universidad Politécnica de Valencia, Spain

More information

Bi-Modal Music Emotion Recognition: Novel Lyrical Features and Dataset

Bi-Modal Music Emotion Recognition: Novel Lyrical Features and Dataset Bi-Modal Music Emotion Recognition: Novel Lyrical Features and Dataset Ricardo Malheiro, Renato Panda, Paulo Gomes, Rui Paiva CISUC Centre for Informatics and Systems of the University of Coimbra {rsmal,

More information

Composer Identification of Digital Audio Modeling Content Specific Features Through Markov Models

Composer Identification of Digital Audio Modeling Content Specific Features Through Markov Models Composer Identification of Digital Audio Modeling Content Specific Features Through Markov Models Aric Bartle (abartle@stanford.edu) December 14, 2012 1 Background The field of composer recognition has

More information

Music Genre Classification and Variance Comparison on Number of Genres

Music Genre Classification and Variance Comparison on Number of Genres Music Genre Classification and Variance Comparison on Number of Genres Miguel Francisco, miguelf@stanford.edu Dong Myung Kim, dmk8265@stanford.edu 1 Abstract In this project we apply machine learning techniques

More information

SemEval-2015 Task 11: Sentiment Analysis of Figurative Language in Twitter

SemEval-2015 Task 11: Sentiment Analysis of Figurative Language in Twitter SemEval-2015 Task 11: Sentiment Analysis of Figurative Language in Twitter Aniruddha Ghosh University College Dublin, Ireland. arghyaonline@gmail.com Tony Veale University College Dublin, Ireland. Tony.Veale@UCD.ie

More information

Towards a Contextual Pragmatic Model to Detect Irony in Tweets

Towards a Contextual Pragmatic Model to Detect Irony in Tweets Towards a Contextual Pragmatic Model to Detect Irony in Tweets Jihen Karoui Farah Benamara Zitoune IRIT, MIRACL IRIT, CNRS Toulouse University, Sfax University Toulouse University karoui@irit.fr benamara@irit.fr

More information

Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest

Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest Dragomir Radev 1, Amanda Stent 2, Joel Tetreault 2, Aasish Pappu 2 Aikaterini Iliakopoulou 3, Agustin

More information

Melody classification using patterns

Melody classification using patterns Melody classification using patterns Darrell Conklin Department of Computing City University London United Kingdom conklin@city.ac.uk Abstract. A new method for symbolic music classification is proposed,

More information

arxiv: v1 [cs.cl] 8 Jun 2018

arxiv: v1 [cs.cl] 8 Jun 2018 #SarcasmDetection is soooo general! Towards a Domain-Independent Approach for Detecting Sarcasm Natalie Parde and Rodney D. Nielsen Department of Computer Science and Engineering University of North Texas

More information

A QUERY BY EXAMPLE MUSIC RETRIEVAL ALGORITHM

A QUERY BY EXAMPLE MUSIC RETRIEVAL ALGORITHM A QUER B EAMPLE MUSIC RETRIEVAL ALGORITHM H. HARB AND L. CHEN Maths-Info department, Ecole Centrale de Lyon. 36, av. Guy de Collongue, 69134, Ecully, France, EUROPE E-mail: {hadi.harb, liming.chen}@ec-lyon.fr

More information

A Kernel-based Approach for Irony and Sarcasm Detection in Italian

A Kernel-based Approach for Irony and Sarcasm Detection in Italian A Kernel-based Approach for Irony and Sarcasm Detection in Italian Andrea Santilli and Danilo Croce and Roberto Basili Universitá degli Studi di Roma Tor Vergata Via del Politecnico, Rome, 0033, Italy

More information

Clues for Detecting Irony in User-Generated Contents: Oh...!! It s so easy ;-)

Clues for Detecting Irony in User-Generated Contents: Oh...!! It s so easy ;-) Clues for Detecting Irony in User-Generated Contents: Oh...!! It s so easy ;-) Paula Cristina Carvalho, Luís Sarmento, Mário J. Silva, Eugénio De Oliveira To cite this version: Paula Cristina Carvalho,

More information

Figurative Language Processing: Mining Underlying Knowledge from Social Media

Figurative Language Processing: Mining Underlying Knowledge from Social Media Figurative Language Processing: Mining Underlying Knowledge from Social Media Antonio Reyes and Paolo Rosso Natural Language Engineering Lab EliRF Universidad Politécnica de Valencia {areyes,prosso}@dsic.upv.es

More information

LING/C SC 581: Advanced Computational Linguistics. Lecture Notes Feb 6th

LING/C SC 581: Advanced Computational Linguistics. Lecture Notes Feb 6th LING/C SC 581: Advanced Computational Linguistics Lecture Notes Feb 6th Adminstrivia The Homework Pipeline: Homework 2 graded Homework 4 not back yet soon Homework 5 due Weds by midnight No classes next

More information

Automatic Polyphonic Music Composition Using the EMILE and ABL Grammar Inductors *

Automatic Polyphonic Music Composition Using the EMILE and ABL Grammar Inductors * Automatic Polyphonic Music Composition Using the EMILE and ABL Grammar Inductors * David Ortega-Pacheco and Hiram Calvo Centro de Investigación en Computación, Instituto Politécnico Nacional, Av. Juan

More information

Chinese Word Sense Disambiguation with PageRank and HowNet

Chinese Word Sense Disambiguation with PageRank and HowNet Chinese Word Sense Disambiguation with PageRank and HowNet Jinghua Wang Beiing University of Posts and Telecommunications Beiing, China wh_smile@163.com Jianyi Liu Beiing University of Posts and Telecommunications

More information

Finding Sarcasm in Reddit Postings: A Deep Learning Approach

Finding Sarcasm in Reddit Postings: A Deep Learning Approach Finding Sarcasm in Reddit Postings: A Deep Learning Approach Nick Guo, Ruchir Shah {nickguo, ruchirfs}@stanford.edu Abstract We use the recently published Self-Annotated Reddit Corpus (SARC) with a recurrent

More information

Semantic Role Labeling of Emotions in Tweets. Saif Mohammad, Xiaodan Zhu, and Joel Martin! National Research Council Canada!

Semantic Role Labeling of Emotions in Tweets. Saif Mohammad, Xiaodan Zhu, and Joel Martin! National Research Council Canada! Semantic Role Labeling of Emotions in Tweets Saif Mohammad, Xiaodan Zhu, and Joel Martin! National Research Council Canada! 1 Early Project Specifications Emotion analysis of tweets! Who is feeling?! What

More information

Temporal patterns of happiness and sarcasm detection in social media (Twitter)

Temporal patterns of happiness and sarcasm detection in social media (Twitter) Temporal patterns of happiness and sarcasm detection in social media (Twitter) Pradeep Kumar NPSO Innovation Day November 22, 2017 Our Data Science Team Patricia Prüfer Pradeep Kumar Marcia den Uijl Next

More information

MUSI-6201 Computational Music Analysis

MUSI-6201 Computational Music Analysis MUSI-6201 Computational Music Analysis Part 9.1: Genre Classification alexander lerch November 4, 2015 temporal analysis overview text book Chapter 8: Musical Genre, Similarity, and Mood (pp. 151 155)

More information

PunFields at SemEval-2018 Task 3: Detecting Irony by Tools of Humor Analysis

PunFields at SemEval-2018 Task 3: Detecting Irony by Tools of Humor Analysis PunFields at SemEval-2018 Task 3: Detecting Irony by Tools of Humor Analysis Elena Mikhalkova, Yuri Karyakin, Dmitry Grigoriev, Alexander Voronov, and Artem Leoznov Tyumen State University, Tyumen, Russia

More information

Introduction to Natural Language Processing This week & next week: Classification Sentiment Lexicons

Introduction to Natural Language Processing This week & next week: Classification Sentiment Lexicons Introduction to Natural Language Processing This week & next week: Classification Sentiment Lexicons Center for Games and Playable Media http://games.soe.ucsc.edu Kendall review of HW 2 Next two weeks

More information

Evaluating Humorous Features: Towards a Humour Taxonomy

Evaluating Humorous Features: Towards a Humour Taxonomy Evaluating Humorous Features: Towards a Humour Taxonomy Antonio Reyes, Paolo Rosso, and Davide Buscaldi Natural Language Engineering Lab - ELiRF Departamento de Sistemas Informáticos y Computación Universidad

More information

Music Emotion Recognition. Jaesung Lee. Chung-Ang University

Music Emotion Recognition. Jaesung Lee. Chung-Ang University Music Emotion Recognition Jaesung Lee Chung-Ang University Introduction Searching Music in Music Information Retrieval Some information about target music is available Query by Text: Title, Artist, or

More information

Are Word Embedding-based Features Useful for Sarcasm Detection?

Are Word Embedding-based Features Useful for Sarcasm Detection? Are Word Embedding-based Features Useful for Sarcasm Detection? Aditya Joshi 1,2,3 Vaibhav Tripathi 1 Kevin Patel 1 Pushpak Bhattacharyya 1 Mark Carman 2 1 Indian Institute of Technology Bombay, India

More information

Detection of Panoramic Takes in Soccer Videos Using Phase Correlation and Boosting

Detection of Panoramic Takes in Soccer Videos Using Phase Correlation and Boosting Detection of Panoramic Takes in Soccer Videos Using Phase Correlation and Boosting Luiz G. L. B. M. de Vasconcelos Research & Development Department Globo TV Network Email: luiz.vasconcelos@tvglobo.com.br

More information

A combination of approaches to solve Task How Many Ratings? of the KDD CUP 2007

A combination of approaches to solve Task How Many Ratings? of the KDD CUP 2007 A combination of approaches to solve Tas How Many Ratings? of the KDD CUP 2007 Jorge Sueiras C/ Arequipa +34 9 382 45 54 orge.sueiras@neo-metrics.com Daniel Vélez C/ Arequipa +34 9 382 45 54 José Luis

More information

Sentiment Aggregation using ConceptNet Ontology

Sentiment Aggregation using ConceptNet Ontology Sentiment Aggregation using ConceptNet Ontology Subhabrata Mukherjee Sachindra Joshi IBM Research - India 7th International Joint Conference on Natural Language Processing (IJCNLP 2013), Nagoya, Japan

More information

arxiv: v1 [cs.cl] 1 Apr 2019

arxiv: v1 [cs.cl] 1 Apr 2019 Recognizing Musical Entities in User-generated Content Lorenzo Porcaro 1 and Horacio Saggion 2 1 Music Technology Group, Universitat Pompeu Fabra 2 TALN Natural Language Processing Group, Universitat Pompeu

More information

Inducing an Ironic Effect in Automated Tweets

Inducing an Ironic Effect in Automated Tweets Inducing an Ironic Effect in Automated Tweets Alessandro Valitutti, Tony Veale School of Computer Science and Informatics, University College Dublin, Belfield, Dublin D4, Ireland Email: {Tony.Veale, Alessandro.Valitutti}@UCD.ie

More information

Music Mood Classification - an SVM based approach. Sebastian Napiorkowski

Music Mood Classification - an SVM based approach. Sebastian Napiorkowski Music Mood Classification - an SVM based approach Sebastian Napiorkowski Topics on Computer Music (Seminar Report) HPAC - RWTH - SS2015 Contents 1. Motivation 2. Quantification and Definition of Mood 3.

More information

Regression Model for Politeness Estimation Trained on Examples

Regression Model for Politeness Estimation Trained on Examples Regression Model for Politeness Estimation Trained on Examples Mikhail Alexandrov 1, Natalia Ponomareva 2, Xavier Blanco 1 1 Universidad Autonoma de Barcelona, Spain 2 University of Wolverhampton, UK Email:

More information

Multi-modal Analysis of Music: A large-scale Evaluation

Multi-modal Analysis of Music: A large-scale Evaluation Multi-modal Analysis of Music: A large-scale Evaluation Rudolf Mayer Institute of Software Technology and Interactive Systems Vienna University of Technology Vienna, Austria mayer@ifs.tuwien.ac.at Robert

More information

ABSOLUTE OR RELATIVE? A NEW APPROACH TO BUILDING FEATURE VECTORS FOR EMOTION TRACKING IN MUSIC

ABSOLUTE OR RELATIVE? A NEW APPROACH TO BUILDING FEATURE VECTORS FOR EMOTION TRACKING IN MUSIC ABSOLUTE OR RELATIVE? A NEW APPROACH TO BUILDING FEATURE VECTORS FOR EMOTION TRACKING IN MUSIC Vaiva Imbrasaitė, Peter Robinson Computer Laboratory, University of Cambridge, UK Vaiva.Imbrasaite@cl.cam.ac.uk

More information

How Do Cultural Differences Impact the Quality of Sarcasm Annotation?: A Case Study of Indian Annotators and American Text

How Do Cultural Differences Impact the Quality of Sarcasm Annotation?: A Case Study of Indian Annotators and American Text How Do Cultural Differences Impact the Quality of Sarcasm Annotation?: A Case Study of Indian Annotators and American Text Aditya Joshi 1,2,3 Pushpak Bhattacharyya 1 Mark Carman 2 Jaya Saraswati 1 Rajita

More information

High accuracy citation extraction and named entity recognition for a heterogeneous corpus of academic papers

Brett Powley and Robert Dale, Centre for Language Technology, Macquarie University, Sydney, NSW

Hidden Markov Model based dance recognition

Dragutin Hrenek, Nenad Mikša, Robert Perica, Pavle Prentašić and Boris Trubić, University of Zagreb, Faculty of Electrical Engineering and Computing, Unska 3,

Figurative Language Processing in Social Media: Humor Recognition and Irony Detection

Paolo Rosso, prosso@dsic.upv.es, http://users.dsic.upv.es/grupos/nle. Joint work with Antonio Reyes Pérez. FIRE, India, December 17-19 2012. Contents: Develop a linguistic-based

Introduction to WordNet, HowNet, FrameNet and ConceptNet

Zi Lin (PKU), Department of Chinese Language and Literature, August 31, 2017. WordNet: Begun in

Improving MeSH Classification of Biomedical Articles using Citation Contexts

Bader Aljaber a, David Martinez a,b, Nicola Stokes c, James Bailey a,b. a Department of Computer Science and Software Engineering,

Automatic Sarcasm Detection: A Survey

Aditya Joshi, IITB-Monash Research Academy; Pushpak Bhattacharyya, Indian Institute of Technology Bombay; Mark J Carman, Monash University. arXiv:1602.03426v2 [cs.CL], 20 Sep 2016

Comparison of N-Gram Rank Frequency Data from the Written Texts of the British National Corpus World Edition (BNC) and the author's Web Corpus

Both sets of texts were preprocessed to provide comparable

Enhancing Music Maps

Jakob Frank, Vienna University of Technology, Vienna, Austria, http://www.ifs.tuwien.ac.at/mir, frank@ifs.tuwien.ac.at. Abstract. Private as well as commercial music collections keep growing

Laughbot: Detecting Humor in Spoken Language with Language and Audio Cues

Kate Park (katepark@stanford.edu), Annie Hu (anniehu@stanford.edu), Natalie Muenster (ncm000@stanford.edu). Abstract: We propose detecting

Music Composition with RNN

Jason Wang, Department of Statistics, Stanford University, zwang01@stanford.edu. Abstract: Music composition is an interesting problem that tests the creativity capacities of artificial

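The user's rights under copyright law are not affected by the above.

Attribution - NonCommercial - ShareAlike 2.0 Korea. Provided you follow the conditions below, you are free to copy, distribute, transmit, display, perform, and broadcast this work, and to create derivative works. The following conditions must be followed: Attribution. You must credit the original author. NonCommercial. You may not use this work for commercial purposes. ShareAlike. If you alter, transform, or build upon this work,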

Scalable Semantic Parsing with Partial Ontologies ACL 2015

Eunsol Choi, Tom Kwiatkowski, Luke Zettlemoyer. ACL 2015. Semantic Parsing: Long-term Goal. Build meaning representations for open-domain texts. How many people

Multi-modal Analysis for Person Type Classification in News Video

Jun Yang, Alexander G. Hauptmann, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, PA 15213, USA, {juny, alex}@cs.cmu.edu,

Detecting Hoaxes, Frauds and Deception in Writing Style Online

Sadia Afroz, Michael Brennan and Rachel Greenstadt, Privacy, Security and Automation Lab, Drexel University. What do we mean by deception? Let

Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest

arXiv:1506.08126v1 [cs.CL], 26 Jun 2015. Dragomir Radev 1, Amanda Stent 2, Joel Tetreault 2, Aasish

Kavita Ganesan, ChengXiang Zhai, Jiawei Han, University of Illinois @ Urbana Champaign

Opinion Summary for iPod. Existing methods: Generate structured ratings for an entity [Lu et al., 2009; Lerman et al.,

The Lowest Form of Wit: Identifying Sarcasm in Social Media

Saachi Jain, Vivian Hsu. Abstract: Sarcasm detection is an important problem in text classification and has many applications in areas such as

A combination of opinion mining and social network techniques for discussion analysis

Anna Stavrianou, Julien Velcin, Jean-Hugues Chauchat, ERIC Laboratoire - Université Lumière Lyon 2, Université de Lyon

Computational Laughing: Automatic Recognition of Humorous One-liners

Rada Mihalcea (rada@cs.unt.edu), Department of Computer Science, University of North Texas, Denton, Texas, USA; Carlo Strapparava (strappa@itc.it)

BIBLIOMETRIC REPORT. Bibliometric analysis of Mälardalen University. Final Report - updated. April 28th, 2014

Report for Mälardalen University. Per Nyström PhD,

Deriving the Impact of Scientific Publications by Mining Citation Opinion Terms

Sofia Stamou, Nikos Mpouloumpasis, Lefteris Kozanidis, Computer Engineering and Informatics Department, Patras University, 26500

Face-threatening Acts: A Dynamic Perspective

Ann Hui-Yen Wang, University of Texas at Arlington. In every talk-in-interaction, participants not only negotiate meanings but also establish, reinforce, or redefine

Lyric-Based Music Mood Recognition

Emil Ian V. Ascalon, Rafael Cabredo, De La Salle University, Manila, Philippines, emil.ascalon@yahoo.com, rafael.cabredo@dlsu.edu.ph. Abstract: In psychology, emotion is

15-415: Database Applications. Project 1: Querying the MovieLens Database

School of Computer Science, Carnegie Mellon University, Qatar. Spring 2015. Assigned date: February 03, 2015. Due date: February 17,

Basic Natural Language Processing

Why NLP? Understanding Intent; Search Engines; Question Answering (Azure QnA, Bots, Watson); Digital Assistants (Cortana, Siri, Alexa); Translation Systems (Azure Language Translation,

Laughbot: Detecting Humor in Spoken Language with Language and Audio Cues

Kate Park, Annie Hu, Natalie Muenster. Email: katepark@stanford.edu, anniehu@stanford.edu, ncm000@stanford.edu. Abstract: We propose

Composer Style Attribution

Jacqueline Speiser, Vishesh Gupta. Introduction: Josquin des Prez (1450-1521) is one of the most famous composers of the Renaissance. Despite his fame, there exists a significant