Sentiment and Sarcasm Classification with Multitask Learning


Navonil Majumder, Soujanya Poria, Haiyun Peng, Niyati Chhaya, Erik Cambria, and Alexander Gelbukh

arXiv:1901.08014v1 [cs.CL] 23 Jan 2019

Abstract: Sentiment classification and sarcasm detection are both important NLP tasks. We show that the two tasks are correlated, and present a multi-task learning-based framework using a deep neural network that models this correlation to improve the performance of both tasks. Our method outperforms the state of the art by 3-4%.

I. INTRODUCTION

The surge of the Internet has enabled large-scale text-based opinion sharing on a wide range of topics. This has created the opportunity to mine user sentiment on various topics from data publicly available over the Internet. The most important task in the analysis of users' opinions is sentiment classification: determining whether a given text, such as a user review, comment, or tweet, expresses positive or negative sentiment.

When expressing their opinions or sentiment, users often use sarcasm for emphasis. In a sarcastic text, the sentiment intended by the author is the opposite of its literal meaning. For example, the sentence "Thank you alarm for never going off" is literally positive ("Thank you"), but the intended sentiment is negative (the alarm never went off). Unless this sentiment shift is detected through semantics, a sarcasm classifier may fail to spot the sarcasm.

Currently, most researchers focus on either sentiment classification or sarcasm detection [19], [7], without considering possible mutual influence between the two tasks. However, one can observe that the two tasks are correlated: people usually (though not always) use sarcasm as a device for expressing emphatic negative sentiment. This observation suggests a simple way in which one of the two tasks can help improve the other, i.e.,
if an expression can be detected as sarcastic, its sentiment can be assumed negative; if an expression can be classified as positive, it can be assumed not sarcastic. We show here that while this logic does lead to a slight improvement, there is a better way of combining the two tasks. Namely, in this paper, we train a classifier for both sarcasm and sentiment in a single neural network using multi-task learning, a learning scheme that has gained recent popularity [1], [11]. We empirically show that this method outperforms the results obtained with two separate classifiers and, in particular, outperforms the current state of the art [14].

The rest of the paper is structured as follows: Section II outlines related work; Section III presents our approach; Section IV lists the baselines; Section V discusses the results; and Section VI concludes the paper.

(Author contact details: N. Majumder and A. Gelbukh are with the CIC, Instituto Politécnico Nacional, Mexico City, Mexico; see http://www.gelbukh.com. S. Poria is with the SCSE, Nanyang Technological University, Singapore; see http://www.ntu.edu.sg/home/sporia. H. Peng and E. Cambria are with the SCSE, Nanyang Technological University, Singapore; see http://sentic.net/erikcambria. N. Chhaya is with Adobe Research, India; e-mail: nchhaya@adobe.com.)

II. RELATED WORK

Machine learning methods, e.g., [20], [26], and deep neural networks, such as CNNs [8], [17], [13], recursive neural networks [5], [24], recurrent neural networks [25], and memory networks [10], have shown good performance on sentiment detection. Knowledge-based methods explore syntactic characteristics, patterns, and rules [18] and employ sentiment resources [6]. Work on sarcasm detection, by contrast, currently focuses on extracting features, such as syntactic [2], surface pattern-based [4], or personality-based features [19], as well as contextual incongruity [7]. Mishra et al. [14] extracted multimodal cognitive features for both sentiment classification and sarcasm detection, but without modelling the two tasks in a single system. Recently, however, multi-task learning has been successfully applied to many NLP tasks, such as implicit discourse relationship identification [11] and key-phrase boundary classification [1]. In this paper, we apply it to sentiment classification and sarcasm detection.

III. METHOD

As observed in [22], many sarcastic sentences carry negative sentiment. We leverage this to improve both sentiment classification and sarcasm detection. We use multi-task learning, where a single neural network is used to perform

more than one classification task, in our case, sentiment classification and sarcasm detection. This network facilitates synergy between the two tasks, resulting in improved performance on both tasks compared with their standalone counterparts.

[Fig. 1. Our multi-task architecture: the sentence X = [x_1, ..., x_L] is fed to a shared GRU; task-specific layers and attention produce s_sar and s_sen, which a neural tensor network with tensor T fuses into s+ for sarcasm detection and sentiment detection.]

a) Task Definition: We solve two tasks with a single network. Given a sentence [w_1, w_2, ..., w_L], where the w_i are words, we assign it both a sentiment tag (positive / negative) and a sarcasm tag (yes / no).

b) Input Representation: We use D_g-dimensional (D_g = 300) GloVe word embeddings [16], x_i ∈ R^{D_g}, to represent the words w_i, padding the variable-length input sentences to a fixed length with null vectors. Thus, the input is represented as a matrix X = [x_1, x_2, ..., x_L], where L is the length of the longest sentence.

c) Sentence Representation: In the next layers, we obtain the sentence representation from X using a Gated Recurrent Unit (GRU) [3] with an attention mechanism [12], as follows.

d) Sentence-level word representation: The sentence X is fed to a GRU of size D_gru = 500, with parameters W_{z,r,h} ∈ R^{D_g × D_gru} and U_{z,r,h} ∈ R^{D_gru × D_gru}, to obtain context-rich sentence-level word representations H = [h_1, h_2, ..., h_L], h_t ∈ R^{D_gru}, at the hidden output of the GRU. We use H for both sarcasm and sentiment. H is then transformed into H_sar and H_sen using two different fully-connected layers of size D_t = 300, in order to accommodate the two different tasks, sarcasm detection and sentiment classification:

    H_sar = ReLU(H W_sar + b_sar),
    H_sen = ReLU(H W_sen + b_sen),

where W_{sar,sen} ∈ R^{D_gru × D_t} and b_{sar,sen} ∈ R^{D_t}.

e) Attention network: The word representations in H_* are encoded with task-specific sentence-level context. To aggregate these context-rich representations into the sentence representation s_*, we use an attention mechanism, due to its ability to prioritize the words relevant for the classification:

    P = tanh(H_* W_ATT),     (1)
    α = softmax(P^T W_α),    (2)
    s_* = α H_*,             (3)

where W_ATT ∈ R^{D_t × 1}, W_α ∈ R^{L × L}, P ∈ R^{L × 1}, and s_* ∈ R^{D_t}. In Eq. (2), α ∈ [0, 1]^L gives the relevance of each word for the task, which is multiplied in Eq. (3) by the context-aware word representations in H_*.

f) Inter-Task Communication: We use a Neural Tensor Network (NTN) [23] of size D_ntn = 100 to fuse the sarcasm- and sentiment-specific sentence representations, s_sar and s_sen, into the fused representation s+:

    s+ = tanh(s_sar T^{[1:D_ntn]} s_sen^T + (s_sar ⊕ s_sen) W + b),

where T ∈ R^{D_ntn × D_t × D_t}, W ∈ R^{2 D_t × D_ntn}, b, s+ ∈ R^{D_ntn}, and ⊕ stands for concatenation. The vector s+ contains information relevant to both sentiment and sarcasm. Instead of the NTN, we also tried attention and concatenation for fusion, which resulted in inferior performance (Section V).

g) Classification: For the two tasks, we use two different softmax layers for classification.

h) Sentiment classification: We use only s_sen as the sentence representation for sentiment classification, since we observed the best performance without s+. We apply a softmax layer of size C (C = 2 for the binary task) on s_sen as follows:

    P_sen = softmax(s_sen W_sen^softmax + b_sen^softmax),
    ŷ_sen = argmax_j P_sen[j],

where W_sen^softmax ∈ R^{D_t × C}, b_sen^softmax ∈ R^C, P_sen ∈ R^C, j is the class value (0 for negative and 1 for positive), and ŷ_sen is the estimated class value.
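The task-specific attention pooling of Eqs. (1)-(3) can be sketched in NumPy as follows. This is a minimal illustration with random stand-in weights, not the authors' code; the helper names are ours.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_pool(H, W_att, W_alpha):
    """Aggregate L task-specific word vectors H (L x D_t) into one
    sentence vector s (D_t,), following Eqs. (1)-(3)."""
    P = np.tanh(H @ W_att)             # (L, 1), Eq. (1)
    alpha = softmax(P.T @ W_alpha)     # (1, L), word relevance, Eq. (2)
    s = (alpha @ H).ravel()            # (D_t,), weighted sum, Eq. (3)
    return s, alpha.ravel()

rng = np.random.default_rng(0)
L, D_t = 20, 300                       # padded length, task-layer size
H_task = rng.standard_normal((L, D_t))
W_att = rng.standard_normal((D_t, 1)) * 0.01
W_alpha = rng.standard_normal((L, L)) * 0.01

s, alpha = attention_pool(H_task, W_att, W_alpha)
print(s.shape, alpha.shape)            # (300,) (20,)
```

The attention weights alpha are non-negative and sum to 1, so s is a convex combination of the context-rich word vectors, with the task-relevant words weighted most heavily.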

i) Sarcasm classification: We use s_sar ⊕ s+ as the sentence representation for sarcasm classification, with a softmax layer of size C (C = 2), as follows:

    P_sar = softmax((s_sar ⊕ s+) W_sar^softmax + b_sar^softmax),
    ŷ_sar = argmax_j P_sar[j],

where W_sar^softmax ∈ R^{(D_t + D_ntn) × C}, b_sar^softmax ∈ R^C, P_sar ∈ R^C, j is the class value (0 for no and 1 for yes), and ŷ_sar is the estimated class value.

j) Training: We use categorical cross-entropy as the loss function J_* (where * is sar or sen) for training:

    J_* = -(1/N) Σ_{i=1}^{N} Σ_{j=0}^{C-1} y_ij^* log P_i^*[j],

where N is the number of samples, i is the index of a sample, j is the class value, and

    y_ij^* = 1 if the expected class value of sample i is j, and 0 otherwise.

As the training algorithm, we use the Stochastic Gradient Descent (SGD)-based ADAM algorithm [9], which optimizes each parameter individually with adaptive learning rates. We minimize both loss functions, J_sen and J_sar, with equal priority, by optimizing the parameter set θ = {U_{z,r,h}, W_{z,r,h}, W_{sar,sen}, b_{sar,sen}, W_ATT, W_α, T, W, b, W_{sar,sen}^softmax, b_{sar,sen}^softmax}.

IV. EXPERIMENTS

a) Dataset: The dataset [15] consists of 994 samples, each containing a text snippet labeled with a sarcasm tag, a sentiment tag, and the eye-movement data of 7 readers. We ignored the eye-movement data in our experiments. Of those samples, 383 are positive and 350 are sarcastic.

b) Baselines and Model Variants: We evaluated the following baselines and variants of our model.

c) Standalone classifiers: Here, we used

    h_* = FCLayer_*(GRU_*(X)),
    P_* = SoftmaxLayer_*(h_*),

where * stands for sar or sen and X is the input sentence as a list of word embeddings. We feed X to a GRU and pass its final output through a fully-connected layer (FCLayer) to obtain the sentence representation h_*, to which we apply the final softmax classification (SoftmaxLayer).

d) Sentiment coerced by sarcasm: In this classifier, the sentences classified as sarcastic are forced to be considered negative by the sentiment classifier.
e) Simple multi-task classifier: The following equations summarize this variant:

    h_* = FCLayer_*(GRU(X)),        (4)
    P_* = SoftmaxLayer_*(h_*),      (5)

where * stands for sar or sen. This setting shares the GRU between the two tasks. The final output of the GRU is taken as the sentence representation and fed to two task-specific fully-connected layers (FCLayer_*), giving h_*. Subsequently, the h_* are fed to two different softmax layers (SoftmaxLayer_*) for classification.

f) Simple multi-task classifier with fusion: In this variant, we changed Eq. (5) to:

    P_sar = SoftmaxLayer_sar(h_sar ⊕ F),   (6)
    P_sen = SoftmaxLayer_sen(h_sen),       (7)

where F = NTN(h_sar, h_sen). Here, h_sar and h_sen are fed to a Neural Tensor Network (NTN), whose output is concatenated with h_sar for classification; sentiment classification is done with h_sen only. We also tried variants with other methods of fusion (such as a fully-connected layer or the Hadamard product) instead of the NTN, as well as variants with h_sen ⊕ F instead of, or in addition to, h_sar ⊕ F, but they did not improve the results.

g) Task-specific GRU with fusion: Here, we used two separate GRUs for the two tasks in Eq. (4):

    h_* = FCLayer_*(GRU_*(X)).      (8)

We used Eq. (6) and Eq. (7) for P_*. Again, we tried concatenating F with h_sen, with both, or with neither, as in Eq. (5), but this did not improve the results.

h) Best model: shared attention: Here, we added the attention mechanism to the matrix H in Eq. (4) and used Eq. (6) and Eq. (7) for P_*. This model, described in detail in Section III, is the main model we present in this paper, since it gave the best results. We also tried separate GRUs as in Eq. (8), but this did not improve the results.

V. RESULTS AND DISCUSSION

The results of 10-fold cross-validation are shown in Table I. As baselines, we used the standalone sentiment and sarcasm classifiers, as well as the CNN-based state-of-the-art method [14] (SoA).
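The fusion F = NTN(h_sar, h_sen) used in these variants (and, on s_sar and s_sen, in the full model of Section III) can be sketched in NumPy. The weights are random stand-ins and the helper name is ours; this only illustrates the bilinear-plus-linear form of the NTN [23].

```python
import numpy as np

def ntn_fuse(a, b, T, W, bias):
    """Neural tensor network fusion:
    s_plus = tanh(a T^{[1:K]} b^T + (a concat b) W + bias),
    with T of shape (K, D, D), W of shape (2D, K), bias of shape (K,)."""
    # Bilinear term: one scalar a T^{[k]} b^T per tensor slice k.
    bilinear = np.einsum('i,kij,j->k', a, T, b)
    # Linear term over the concatenated representations.
    linear = np.concatenate([a, b]) @ W
    return np.tanh(bilinear + linear + bias)

rng = np.random.default_rng(1)
D, K = 300, 100                          # D_t and D_ntn from the paper
s_sar = rng.standard_normal(D)
s_sen = rng.standard_normal(D)
T = rng.standard_normal((K, D, D)) * 0.001
W = rng.standard_normal((2 * D, K)) * 0.01
bias = np.zeros(K)

s_plus = ntn_fuse(s_sar, s_sen, T, W, bias)
print(s_plus.shape)                      # (100,)
```

Each of the K tensor slices captures one multiplicative interaction between the two task-specific representations, which is why the NTN outperformed plain concatenation or a fully-connected layer as the fusion step.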
Our standalone GRU-based sentiment and sarcasm classifiers performed slightly better than the SoA, even though the SoA also uses the gaze data

present in the dataset but never available in any real-life setting. In contrast, our method, besides improving the results, applies to plain-text documents such as tweets, without any gaze data.

TABLE I
RESULTS FOR VARIOUS EXPERIMENTS

                                                     Sentiment                 Sarcasm            Average
Variant                                         Prec.  Rec.  F-Score    Prec.  Rec.  F-Score    F-Score
State of the art [14]                           79.89  74.86  77.30     87.42  87.03  86.97      82.13
Standalone classifiers                          79.02  78.03  78.13     89.96  89.25  89.37      83.75
Standalone coerced                              81.57  80.06  80.38       -      -      -          -
Multi-task, simple                              80.41  79.88  79.70     89.42  89.19  89.04      84.37
Multi-task with fusion                          82.32  81.71  81.53     90.94  90.74  90.67      86.10
Multi-task with fusion and separate GRUs        80.54  80.02  79.86     91.01  90.66  90.62      85.24
Multi-task with fusion and shared attention
  (Section III)                                 83.67  83.10  83.03     90.50  90.34  90.29      86.66

As expected, the sentiment classifier coerced by the sarcasm classifier performed better than the standalone sentiment classifier. This means that an efficient sarcasm detector can boost the performance of a sentiment classifier.

All our multi-task classifiers outperformed both standalone classifiers. However, the margin of improvement of the multi-task classifiers over the standalone classifiers is greater for sentiment than for sarcasm, probably because sarcasm detection is a subtask of sentiment analysis. Analyzing examples and the attention visualization of the multi-task network, we observed that the multi-task network mainly helps improve sarcasm classification when there is a strong sentiment shift, which indicates the possibility of sarcasm in the sentence. The example given in the introduction was classified incorrectly by the standalone sarcasm classifier but correctly by the standalone sentiment classifier; coercing one of the classifiers by the other would not change this result.
In the multi-task network, both sentiment and sarcasm are detected correctly, apparently because the network detected the sentiment shift in the sentence, which improved the sarcasm classification. Similarly, the sentence "Absolutely love when water is spilt on my phone, just love it" is classified as positive by the standalone sentiment classifier, with "Absolutely love" highlighted by the attention scores (not presented in this short paper). However, the standalone sarcasm classifier identified it as sarcastic due to "water is spilt on my phone" (as seen from the attention scores), and in the multi-task network this clue corrected the sentiment classifier's output.

Even our standalone GRU-based classifiers outperformed the CNN-based state-of-the-art method. The multi-task classifiers outperformed the standalone classifiers because of the shared representation, which serves as additional regularization for each task from the other task. Adding NTN fusion to the multi-task classifier further improved the results, giving the best performance for sarcasm detection. Adding the attention network shared between the tasks further improved the performance for sentiment classification. As the last column of Table I shows, on average the best results across the two tasks were obtained with the architecture described in Section III.

VI. CONCLUSIONS

We presented a classifier architecture that can be trained on sentiment or sarcasm data and outperforms the state of the art in both cases on the dataset used by [14]. Our architecture uses a GRU-based neural network, while the state-of-the-art method of [14] used a CNN. Furthermore, we showed that multi-task learning-based methods significantly outperform standalone sentiment and sarcasm classifiers. This indicates that sentiment classification and sarcasm detection are related tasks. Finally, we presented the multi-task learning architecture that gave the best results, out of a number of variants of the architecture that we tried.
In the future, we intend to incorporate multimodal information [21] into our network for improved performance.

REFERENCES

[1] I. Augenstein and A. Søgaard. Multi-task learning of keyphrase boundary classification. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 341-346, Vancouver, Canada, 2017.
[2] F. Barbieri, H. Saggion, and F. Ronzano. Modelling sarcasm in Twitter, a novel approach. In WASSA@ACL, pages 50-58, 2014.
[3] J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR, abs/1412.3555, 2014.
[4] D. Davidov, O. Tsur, and A. Rappoport. Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 107-116, 2010.
[5] L. Dong, F. Wei, C. Tan, D. Tang, M. Zhou, and K. Xu. Adaptive recursive neural network for target-dependent Twitter sentiment classification. In ACL (2), pages 49-54, 2014.
[6] A. Esuli and F. Sebastiani. SentiWordNet: A high-coverage lexical resource for opinion mining. Evaluation, pages 1-26, 2007.
[7] A. Joshi, V. Sharma, and P. Bhattacharyya. Harnessing context incongruity for sarcasm detection. In ACL (2), pages 757-762, 2015.
[8] Y. Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
[9] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
[10] A. Kumar, O. Irsoy, P. Ondruska, M. Iyyer, J. Bradbury, I. Gulrajani, V. Zhong, R. Paulus, and R. Socher. Ask me anything: Dynamic memory networks for natural language processing. In International Conference on Machine Learning, pages 1378-1387, 2016.
[11] M. Lan, J. Wang, Y. Wu, Z.-Y. Niu, and H. Wang. Multitask attention-based neural networks for implicit discourse relationship representation and identification. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1299-1308, Copenhagen, Denmark, 2017.
[12] M.-T. Luong, H. Pham, and C. D. Manning. Effective approaches to attention-based neural machine translation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1412-1421, Lisbon, Portugal, 2015.
[13] N. Majumder, S. Poria, D. Hazarika, R. Mihalcea, A. Gelbukh, and E. Cambria. DialogueRNN: An attentive RNN for emotion detection in conversations. arXiv preprint arXiv:1811.00405, 2018.
[14] A. Mishra, K. Dey, and P. Bhattacharyya. Learning cognitive features from gaze data for sentiment and sarcasm classification using convolutional neural network. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 377-387, Vancouver, Canada, 2017.
[15] A. Mishra, D. Kanojia, and P. Bhattacharyya. Predicting readers' sarcasm understandability by modeling gaze behavior. 2016.
[16] J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543, 2014.
[17] S. Poria, E. Cambria, and A. Gelbukh. Aspect extraction for opinion mining with a deep convolutional neural network. Knowledge-Based Systems, 108:42-49, 2016.
[18] S. Poria, E. Cambria, A. Gelbukh, F. Bisio, and A. Hussain. Sentiment data flow analysis by means of dynamic linguistic patterns. IEEE Computational Intelligence Magazine, 10(4):26-36, 2015.
[19] S. Poria, E. Cambria, D. Hazarika, and P. Vij. A deeper look into sarcastic tweets using deep convolutional neural networks. In COLING, pages 1601-1612, 2016.
[20] S. Poria, A. Gelbukh, A. Hussain, S. Bandyopadhyay, and N. Howard. Music genre classification: A semi-supervised approach. In Mexican Conference on Pattern Recognition, pages 254-263. Springer, 2013.
[21] S. Poria, D. Hazarika, N. Majumder, G. Naik, E. Cambria, and R. Mihalcea. MELD: A multimodal multi-party dataset for emotion recognition in conversations. arXiv preprint arXiv:1810.02508, 2018.
[22] E. Riloff, A. Qadir, P. Surve, L. D. Silva, N. Gilbert, and R. Huang. Sarcasm as contrast between a positive sentiment and negative situation. In EMNLP, 2013.
[23] R. Socher, D. Chen, C. D. Manning, and A. Ng. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems 26, pages 926-934. Curran Associates, Inc., 2013.
[24] R. Socher, C. C. Lin, C. Manning, and A. Y. Ng. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 129-136, 2011.
[25] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104-3112, 2014.
[26] A. Zadeh, P. P. Liang, S. Poria, P. Vij, E. Cambria, and L.-P. Morency. Multi-attention recurrent network for human communication comprehension. arXiv preprint arXiv:1802.00923, 2018.