Project seminar: Sentiment Analysis Lecturers: Michael Wiegand and Marc Schulder Presentation of the paper "ICWSM - A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews" by Oren Tsur, Dmitry Davidov, Ari Rappoport Maximilian Wolf Sentiment Analysis 24.06.2015
Table of contents 1. Introduction 2. Methods 3. Two Evaluation Experiments 4. Results 5. Conclusion 2
What is sarcasm? Definition: the activity of saying or writing the opposite of what you mean, or of speaking in a way intended to make someone else feel stupid or show them that you are angry (Macmillan English Dictionary 2007) 3
Examples 1. Love The Cover (book) 2. Where am I? (GPS device) 3. Trees died for this book? (book) 4
Examples 4. Be sure to save your purchase receipt (smart phone) 5. Are these iPods designed to die after two years? (music player) 6. Great for insomniacs (book) 5
Examples 7. All the features you want. Too bad they don't work! (smart phone) 8. Great idea, now try again with a real product development team (e-reader) 9. Defective by design (music player) 6
Why should we care? Commercial point of view: personalization of recommendation systems Improvement of review summarization and opinion mining systems 7
SASI Semi-supervised Algorithm for Sarcasm Identification 8
SASI Two modules: 1. a semi-supervised pattern acquisition module that identifies sarcastic patterns and provides features for a classifier 2. a classification algorithm that assigns each sentence to a sarcasm class 9
Classification Framework Training phase: manually labeled sentences (seeds), scored 1-5 (1 = not sarcastic, 5 = definitely sarcastic); the labeled sentences yield pattern-based features 10
Pattern extraction Definition: A pattern is an ordered sequence of high-frequency words (HFWs) and content words (CWs). HFW: corpus frequency of more than 1000 occurrences per million words. CW: corpus frequency of less than 100 occurrences per million words. 11
Pattern extraction Example: Sony does not care about customer opinions. Pattern: [company] does not CW about CW CW. Replacing company/author/product/book names with [company] [author] [product] [title] yields less specific patterns 12
Pattern extraction Patterns consist of: 2-6 HFWs, 1-6 slots for CWs; a pattern starts and ends with an HFW. Minimal pattern: [HFW] [CW] [HFW] 13
Pattern extraction Patterns may overlap. Garmin apparently does not care much about product quality or customer support. Overlapping patterns: [company] CW does not CW much / does not CW much about CW CW or / not CW much about CW CW or CW CW. 14
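The HFW/CW tagging and pattern constraints above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the corpus frequencies below are invented, and the thresholds follow the slides (HFW > 1000, CW < 100 occurrences per million words).

```python
# Invented per-million corpus frequencies for illustration only.
FREQ_PER_MILLION = {
    "does": 5000, "not": 8000, "about": 3000, "much": 2500,
    "care": 60, "customer": 90, "opinions": 40,
    "[company]": 2000,   # meta-tokens like [company] are treated as HFWs
}

def tag(word):
    freq = FREQ_PER_MILLION.get(word, 0)
    if freq > 1000:
        return word      # HFW: kept verbatim in the pattern
    if freq < 100:
        return "CW"      # content word: becomes a generic slot
    return None          # mid-frequency words invalidate the pattern

def to_pattern(words):
    """Turn a word sequence into a pattern, or None if constraints fail."""
    tags = [tag(w) for w in words]
    if None in tags:
        return None
    hfws = sum(1 for t in tags if t != "CW")
    cws = len(tags) - hfws
    # constraints from the slides: 2-6 HFWs, 1-6 CW slots,
    # and the pattern must start and end with an HFW
    if not (2 <= hfws <= 6 and 1 <= cws <= 6):
        return None
    if tags[0] == "CW" or tags[-1] == "CW":
        return None
    return " ".join(tags)
```

For example, `to_pattern("does not care much".split())` yields the pattern `does not CW much` from the overlap example above.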
Pattern selection Hundreds of patterns! 15
Pattern selection Are they all useful? No! 16
Pattern selection Two criteria to select useful patterns: First: remove all patterns that occur only in sentences about a single product/book. Second: remove all patterns that appear both in sentences labeled 5 (clearly sarcastic) and in sentences labeled 1 (not sarcastic) in the training set 17
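The two criteria can be sketched as a simple filter. The occurrence format (pattern mapped to a list of (product_id, sentence_label) pairs) is my own assumption for illustration:

```python
def select_patterns(occurrences):
    """Keep only useful patterns.

    occurrences: dict mapping pattern -> list of (product_id, label) pairs,
    where label is the 1-5 sarcasm score of the sentence it occurred in.
    """
    kept = []
    for pattern, occ in occurrences.items():
        products = {product for product, _ in occ}
        labels = {label for _, label in occ}
        if len(products) < 2:
            continue   # criterion 1: occurs for a single product only
        if 5 in labels and 1 in labels:
            continue   # criterion 2: seen in both clearly sarcastic (5)
                       # and clearly non-sarcastic (1) sentences
        kept.append(pattern)
    return kept
```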
Recap We have useful patterns, great! What now? 18
Pattern matching Patterns are relatively long, so exact matches are uncommon. Taking advantage of partial matches reduces the sparsity 19
Pattern matching Feature value for each pattern: 1 for an exact match (all components appear in the correct order); α for a sparse match (additional non-matching words appear between components); γ · n/N for an incomplete match (only n > 1 of the N components appear in the sentence); 0 for no match. 0 ≤ α ≤ 1 and 0 ≤ γ ≤ 1 are parameters that assign reduced scores to imperfect matches; here α = γ = 0.1 20
Pattern matching Garmin apparently does not care much about product quality or customer support. [company] CW does not → 1 (exact match); [company] CW not → 0.1 (sparse match); [company] CW CW does not → 0.1 · 4/5 = 0.08 (incomplete match). Legend: 1 = exact match, α = sparse match, γ · n/N = incomplete match, 0 = no match 21
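The scoring above can be sketched as follows. Sentences are assumed to be pre-converted into pattern vocabulary (HFWs kept, content words replaced by "CW"); detecting exact vs. sparse vs. incomplete matches via contiguous runs and longest common subsequences is my interpretation of the slide's definitions.

```python
ALPHA = GAMMA = 0.1   # values from the slides

def lcs_len(a, b):
    # longest common subsequence length: how many pattern components
    # appear in the sentence in the correct order
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if x == y
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[len(a)][len(b)]

def match_score(pattern, sentence):
    """pattern, sentence: lists of components (HFWs and 'CW' slots)."""
    n_total = len(pattern)
    # exact match: pattern occurs as a contiguous run in the sentence
    for i in range(len(sentence) - n_total + 1):
        if sentence[i:i + n_total] == pattern:
            return 1.0
    n = lcs_len(pattern, sentence)
    if n == n_total:
        return ALPHA                 # sparse: extra words in between
    if n > 1:
        return GAMMA * n / n_total   # incomplete: only n of N components
    return 0.0
```

This reproduces the slide's three scores for the Garmin sentence (1, 0.1, and 0.08).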
Punctuation-based features 1. sentence length in words 2. number of '!' characters in the sentence 3. number of '?' characters in the sentence 4. number of quotes in the sentence 5. number of capitalized words in the sentence. Normalized to the [0, 1] range. Each of these features carries the same weight as a single pattern feature 22
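A minimal sketch of these five features. The slides only say the values are normalized to [0, 1]; the normalization caps below are my own assumption, and "capitalized" is read here as all-caps words:

```python
MAX_WORDS, MAX_MARKS, MAX_CAPS = 50, 10, 10   # assumed normalization caps

def punctuation_features(sentence):
    words = sentence.split()
    caps = sum(1 for w in words if w.isupper())
    raw = [
        len(words) / MAX_WORDS,            # 1. sentence length in words
        sentence.count("!") / MAX_MARKS,   # 2. number of '!' characters
        sentence.count("?") / MAX_MARKS,   # 3. number of '?' characters
        sentence.count('"') / MAX_MARKS,   # 4. number of quote characters
        caps / MAX_CAPS,                   # 5. number of all-caps words
    ]
    return [min(v, 1.0) for v in raw]      # clamp into [0, 1]
```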
Seed training set SASI is semi-supervised, so a small seed of annotated data is required. Positive examples: 80 sentences labeled 3-5 (sarcastic to some degree). Negative examples: the full text of 80 negative reviews containing no sarcastic sentences 23
Results 5-fold cross-validation Precision Recall F-Score punctuation 25.6% 31.2% 28.1% patterns 74.3% 78.8% 76.5% pat+punct 86.8% 76.3% 81.2% Not a bad first result, but: the seed is very small (only 80 sarcastic sentences) 24
Data enrichment Only a small annotated seed available Number of sarcastic sentences in the seed modest Annotation is expensive How to find more training examples without additional annotation effort? 25
Data enrichment Sarcastic sentences often co-occur. Automated web search (using the Yahoo! BOSS API) Example: this book was really good until page 2! search: this book was really good until get: Gee, I thought this book was really good until I found out the Author didn't get into Bread Loaf! Score: weighted average of the closest training set vectors 26
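The scoring step above (weighted average of the closest training-set vectors) is essentially weighted k-NN. A minimal sketch of that idea, where inverse-distance weighting and k = 5 are my own assumptions rather than the paper's exact scheme:

```python
import math

def knn_score(vec, training, k=5):
    """Score a new feature vector by a weighted average of its k closest
    training vectors.

    training: list of (feature_vector, label) pairs, labels in 1-5.
    """
    nearest = sorted(
        (math.dist(vec, tvec), label) for tvec, label in training
    )[:k]
    # inverse-distance weighting; epsilon avoids division by zero
    weights = [1.0 / (d + 1e-6) for d, _ in nearest]
    return sum(w * label for w, (_, label) in zip(weights, nearest)) / sum(weights)
```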
Extended training set 471 positive examples (sarcastic sentences) 5020 negative examples (non-sarcastic) 27
Results 5-fold cross-validation Precision Recall F-Score punctuation 25.6% 31.2% 28.1% patterns 74.3% 78.8% 76.5% pat+punct 86.8% 76.3% 81.2% Enrich punct 40.0% 39.0% 39.5% Enrich pat 76.2% 77.7% 76.9% All: SASI 91.2% 75.6% 82.7% Slightly better score after data enrichment. Punctuation = weakest feature 28
Evaluation Two experiments 1.) Evaluate the pattern acquisition process: 5-fold cross-validation over the seed data 2.) Evaluate SASI: test on unseen sentences, compare to a human-annotated gold standard 29
Data Amazon.com reviews: 120 products, 66,271 reviews, avg. 4.19 stars, avg. length 953 chars. Different product domains: books, music players, cameras, GPS devices, consoles, ... 30
Evaluation SASI classified all sentences in the 66,000 reviews. No gold-standard classification is available, so a smaller test/evaluation set was built: 90 sarcastic sentences (labeled 3-5) and 90 non-sarcastic sentences (labeled 1-2), subject to two constraints 31
Evaluation Two constraints: 1.) all non-sarcastic sentences belong to negative reviews (1-3 stars) All sentences drawn from the same population Increasing chances for direct or indirect negative sentiment 2.) all sentences contain a named entity Keeps the evaluation set relevant More likely to contain sentiment 32
Evaluation 15 annotators of varying cultural backgrounds. Each annotator labeled 40 sentences (1-5). Each sentence was labeled by 3 annotators. Inter-annotator agreement: Fleiss κ = 0.34, fair agreement 33
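The reported agreement value comes from the standard Fleiss' kappa formula, which can be computed from the per-item rating counts. A sketch with invented data (the slides' setup has 3 raters per sentence):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a rating matrix.

    counts[i][j] = number of annotators assigning item i to category j;
    every item must be rated by the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    # per-item observed agreement P_i
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # chance agreement P_e from overall category proportions
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)
```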
Baseline Star-sentiment baseline Exploits the Amazon meta-data: star rating. Identify unhappy reviewers (1-3 stars). Classify a sentence as sarcastic if it exhibits strong positive sentiment (in a negative review). Predefined list of positive sentiment words (about 20): great, excellent, best, top, exciting, etc. 34
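The baseline fits in a few lines. This sketch abbreviates the word list to the five examples on the slide (the full list has about 20 words), and treats any positive word in a 1-3 star review as "strong positive sentiment", which is my simplification:

```python
POSITIVE = {"great", "excellent", "best", "top", "exciting"}  # abbreviated

def star_sentiment_baseline(sentence, stars):
    """Flag a review sentence as sarcastic if it contains positive
    sentiment words while the review's star rating is negative (1-3)."""
    if stars > 3:
        return False   # reviewer is not unhappy, baseline never fires
    tokens = [w.strip('.,!?"').lower() for w in sentence.split()]
    return any(t in POSITIVE for t in tokens)
```

As the results on the next slide show, this catches only one type of sarcasm: examples like "Trees died for this book?" contain no positive word and are missed.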
Results Evaluation on the evaluation set Precision Recall F-Score star-sentiment 50% 16% 24.2% SASI 76.6% 81.3% 78.8% SASI is much better than the baseline, which only captures a certain type of sarcasm. Why does SASI work so well? 35
Some examples A classic sarcastic comment: Silly me, the Kindle and the Sony ebook can't read these protected formats. Great! Patterns: me, the CW and [product] can't [product] can't CW these CW CW. Great! can't CW these CW CW. these CW CW. Great! 36
Some examples Punctuation marks = weakest predictors... and capital letters = strong predictors. Example: i guess I don't think very brilliantly... well... it was ok... but not good to read just for fun... cuz it's not fun... Well you know what happened. ALMOST NOTHING HAPPENED!!! 37
Some examples Where SASI fails: this book was really good until page 2! vs. this book was really good until page 430! Without context, both yield similar patterns 38
Some examples Context is captured indirectly: This book was great until page 2! What an achievement! Patterns can cross sentence boundaries, which produces more patterns. SASI uses context 39
Speculation Motivation for using sarcasm. Product: reviews / avg. stars / price / sarcastic sentences: Shure E2c: 782 / 3.8 / $99 / 51; Da Vinci Code: 3481 / 3.46 / $9.99 / 79; Sony MDR-NC6: 576 / 3.37 / $69.99 / 34; The God Delusion: 1022 / 3.91 / $27 / 19; Kindle e-reader: 2900 / 3.9 / $489 / 19. Three factors for using sarcasm: 1. popularity of a product 2. simplicity of a product 3. price of a product 40
Conclusion SASI found some strong features for sarcasm: CAPITAL LETTERS, positive sentiment in a negative review. A combination of subtle features works better than using just a few strong features. Sarcasm has various facets 41
Thank you for your attention! 42
Extra Difference between sarcasm and irony: Sarcasm = intention: sarcasm is a frontal, open or indirect, veiled attack. Irony = stylistic device: irony means that the opposite of what is said is meant. (http://www.sprachschach.de/sarkasmus-ironie-der-gar-nicht-mal-feine-unterschied/) 43
Extra Feature and feature vector: a feature is an individual measurable property of a phenomenon being observed. Features are usually numeric, but structural features such as strings and graphs are used in syntactic pattern recognition. A set of numeric features can be conveniently described by a feature vector. 44