Tutorial 1
Learning multi-grained aspect target sequence for Chinese sentiment analysis. H. Peng, Y. Ma, Y. Li, E. Cambria. Knowledge-Based Systems (2018)
Ideas. Task: aspect-term sentiment classification. Problems, e.g.: "The red apple released in California was not that interesting." "The room size is small, but the view is excellent." Opportunity in Chinese: compositionality, e.g. Train (火车) = Fire (火) + Vehicle (车); Wood (木) → Jungle (林) → Forest (森)
Solutions: Adaptive word embeddings. Aspect target sequence modelling. Attention mechanism. Sequence modelling (LSTM). Multi-grained learning. Fusion of granularities
https://github.com/senticnet
Q1. Consider a biword index over three documents (Doc1, Doc2, Doc3) whose dictionary contains the biwords: angels fools, angels rush, angels fear, fools rush, fear fools, fear to, where angels, to tread, rush in. (The Doc1–Doc3 incidence columns of the original table are not recoverable.)
a) Which are the biword Boolean queries generated by the following phrase queries? 1. fools rush in 2. where angels rush in 3. angels fear to tread
b) Which are, if any, the documents retrieved?
Q2. Consider the following positional index, in the format term: doc: ⟨position, position, ...⟩:
angels: 1: ⟨36, 174, 252, 651⟩; 3: ⟨15, 23, 42⟩;
fools: 1: ⟨1, 17, 74, 222⟩; 2: ⟨8, 78, 108, 458⟩;
fear: 2: ⟨13, 43, 113, 433⟩; 3: ⟨18, 328, 528⟩;
in: 1: ⟨3, 37, 76, 444, 851⟩; 2: ⟨10, 20, 110, 470, 500⟩; 3: ⟨5, 17, 25, 95⟩;
rush: 1: ⟨2, 66, 194, 321, 702⟩; 3: ⟨4, 16, 44⟩;
to: 1: ⟨47, 86, 234, 999⟩; 2: ⟨14, 24, 774, 944⟩; 3: ⟨19, 39, 599, 709⟩;
tread: 1: ⟨57, 94, 333⟩;
where: 1: ⟨67, 124, 393, 1001⟩; 2: ⟨11, 41, 101, 421, 431⟩; 3: ⟨14, 36, 736⟩;
Which document(s), if any, meet each of the following phrase queries, based on the above-mentioned positional index? (a) fools rush in (b) where angels rush in (c) angels fear to tread
Recall: Biword index. Index every consecutive pair of terms in the text as a phrase. E.g., "Friends, Romans, Countrymen" would generate the biwords: 1. friends romans 2. romans countrymen. Longer phrase queries can be broken into a Boolean query on biwords, e.g. stanford university palo alto becomes stanford university AND university palo AND palo alto
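The biword decomposition above can be written as a minimal Python sketch (the index contents here are illustrative, not from the exercise):

```python
def biword_query(phrase):
    """Decompose a phrase query into the list of its consecutive biwords."""
    terms = phrase.lower().split()
    return [f"{a} {b}" for a, b in zip(terms, terms[1:])]

def retrieve(biword_index, phrase):
    """Intersect the posting sets of every biword in the phrase (Boolean AND)."""
    postings = [biword_index.get(bw, set()) for bw in biword_query(phrase)]
    return set.intersection(*postings) if postings else set()

# Toy biword index: which documents contain each biword.
index = {
    "stanford university": {1, 2},
    "university palo": {1},
    "palo alto": {1, 3},
}
print(retrieve(index, "stanford university palo alto"))  # {1}
```

Note that this retrieval can return false positives: a document may contain every biword without containing the whole phrase.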
Recall: Positional index. Extract inverted-index entries for each distinct term: to, be, or, not. Merge their doc:⟨position⟩ lists to enumerate all positions of "to be or not to be":
to: 2: ⟨1, 17, 74, 222, 551⟩; 4: ⟨8, 16, 190, 429, 433⟩; 7: ⟨13, 23, 191⟩; ...
be: 1: ⟨17, 19⟩; 4: ⟨17, 191, 291, 430, 434⟩; 5: ⟨14, 19, 101⟩; ...
The same general method works for proximity searches
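The positional merge can be sketched as follows (a simple illustrative implementation, not the tutorial's code; the toy index below encodes "to be or not to be" as document 1):

```python
def phrase_match(pos_index, phrase):
    """Return doc IDs where the phrase's terms occur at consecutive positions.

    pos_index maps term -> {doc_id: sorted list of positions}.
    """
    terms = phrase.lower().split()
    # Candidate docs must contain every term.
    docs = set.intersection(*(set(pos_index[t]) for t in terms))
    hits = set()
    for d in docs:
        # A valid start position p needs term i at position p + i for all i.
        starts = set(pos_index[terms[0]][d])
        for offset, t in enumerate(terms[1:], start=1):
            starts &= {p - offset for p in pos_index[t][d]}
        if starts:
            hits.add(d)
    return hits

# Positions of each term in the toy document 1: "to be or not to be".
index = {
    "to":  {1: [1, 5]},
    "be":  {1: [2, 6]},
    "or":  {1: [3]},
    "not": {1: [4]},
}
print(phrase_match(index, "to be or not to be"))  # {1}
```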
Group discussion
A1.a
fools rush in => fools rush AND rush in
where angels rush in => where angels AND angels rush AND rush in
angels fear to tread => angels fear AND fear to AND to tread
A1.b
fools rush in = Doc1
where angels rush in = Doc1, Doc3
angels fear to tread = null
A2
fools rush in => Doc1: fools ⟨1, 17, 74, 222⟩, rush ⟨2, 66, 194, 321, 702⟩, in ⟨3, 37, 76, 444, 851⟩: positions 1, 2, 3 are consecutive.
where angels rush in => Doc3: where ⟨14, 36, 736⟩, angels ⟨15, 23, 42⟩, rush ⟨4, 16, 44⟩, in ⟨5, 17, 25, 95⟩: positions 14, 15, 16, 17 are consecutive. Doc1: where ⟨67, 124, 393, 1001⟩, angels ⟨36, 174, 252, 651⟩, rush ⟨2, 66, 194, 321, 702⟩, in ⟨3, 37, 76, 444, 851⟩: no positional merge available.
Q3. Consider the table of term frequencies for three documents, denoted Doc1, Doc2, Doc3, below. Compute the tf-idf weights for the terms car, auto, insurance, best for each document, using the idf values from the table and w(t,d) = (1 + log10 tf(t,d)) × log10(N / df(t)):

term      | Doc1 | Doc2 | Doc3 | idf
car       |  27  |  10  |  24  | 1.65
auto      |   3  |  33  |   0  | 2.08
insurance |   0  |  33  |  29  | 1.62
best      |  14  |   0  |  17  | 1.50
Sec. 6.2.2 tf-idf weighting. Recall: the tf-idf weight of a term is the product of its tf weight and its idf weight: w(t,d) = (1 + log10 tf(t,d)) × log10(N / df(t)). It is the best-known weighting scheme in information retrieval. Note: the "-" in tf-idf is a hyphen, not a minus sign! Alternative names: tf.idf, tf × idf. The weight increases with the number of occurrences within a document
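The weighting formula above is a one-liner; a minimal sketch (the sample values are the car/Doc1 entries of Q3):

```python
import math

def weight(tf, idf):
    """tf-idf weight: (1 + log10 tf) * idf, or 0 when the term is absent."""
    return (1 + math.log10(tf)) * idf if tf > 0 else 0.0

# e.g. car in Doc1 of Q3: tf = 27, idf = 1.65
print(round(weight(27, 1.65), 2))  # 4.01
```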
Group discussion
A3. w(t,d) = (1 + log10 tf(t,d)) × log10(N / df(t))

1 + log10 tf:
term      | Doc1 | Doc2 | Doc3
car       | 2.43 | 2.00 | 2.38
auto      | 1.48 | 2.52 |  0
insurance |  0   | 2.52 | 2.46
best      | 2.15 |  0   | 2.23

w = (1 + log10 tf) × idf:
term      | Doc1 | Doc2 | Doc3
car       | 4.01 | 3.30 | 3.93
auto      | 3.08 | 5.24 |  0
insurance |  0   | 4.08 | 3.99
best      | 3.23 |  0   | 3.35
Q4. Refer to the tf and idf values for the four terms and three documents from Q3. Compute the two top-scoring documents on the query best car insurance for each of the following weighting schemes (in ddd.qqq notation): (i) nnn.atc; (ii) ntc.atc.
Sec. 6.4 tf-idf example: lnc.ltc. Recall. Document: car insurance auto insurance. Query: best car insurance. (N = 1,000,000 documents.)

Query (ltc):
term      | tf-raw | tf-wt |  df   | idf | wt  | normalize
auto      |   0    |   0   |  5000 | 2.3 |  0  |  0
best      |   1    |   1   | 50000 | 1.3 | 1.3 | 0.34
car       |   1    |   1   | 10000 | 2.0 | 2.0 | 0.52
insurance |   1    |   1   |  1000 | 3.0 | 3.0 | 0.78

Document (lnc):
term      | tf-raw | tf-wt | wt  | normalize
auto      |   1    |   1   |  1  | 0.52
best      |   0    |   0   |  0  |  0
car       |   1    |   1   |  1  | 0.52
insurance |   2    |  1.3  | 1.3 | 0.68

Doc length = √(1² + 0² + 1² + 1.3²) ≈ 1.92
Score = 0 + 0 + 0.52×0.52 + 0.78×0.68 = 0.27 + 0.53 = 0.8
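The lnc.ltc score above can be reproduced with a short sketch (assuming, as in the example, a collection of N = 1,000,000 documents):

```python
import math

def log_tf(tf):
    return 1 + math.log10(tf) if tf > 0 else 0.0

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v.values()))
    return {t: x / norm for t, x in v.items()}

N = 1_000_000
df = {"auto": 5000, "best": 50_000, "car": 10_000, "insurance": 1000}

# Query "best car insurance": ltc = log tf, times idf, cosine-normalized.
q_tf = {"best": 1, "car": 1, "insurance": 1}
q = normalize({t: log_tf(tf) * math.log10(N / df[t]) for t, tf in q_tf.items()})

# Document "car insurance auto insurance": lnc = log tf, no idf, normalized.
d_tf = {"auto": 1, "car": 1, "insurance": 2}
d = normalize({t: log_tf(tf) for t, tf in d_tf.items()})

score = sum(q[t] * d.get(t, 0.0) for t in q)
print(round(score, 2))  # 0.8
```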
Group discussion
A4. Find the document vectors: (i) nnn, (ii) ntc.

nnn (raw tf, no idf, no normalization):
term      | Doc1 | Doc2 | Doc3
car       |  27  |  10  |  24
auto      |   3  |  33  |   0
insurance |   0  |  33  |  29
best      |  14  |   0  |  17

ntc (tf × idf, cosine-normalized; vector lengths: |Doc1| = 49.6, |Doc2| = 88.55, |Doc3| = 66.5):
term      | Doc1                        | Doc2                         | Doc3
car       | (27×1.65 = 44.55)/49.6 = .9 | (10×1.65 = 16.5)/88.55 = .19 | (24×1.65 = 39.6)/66.5 = .6
auto      | (3×2.08 = 6.24)/49.6 = .13  | (33×2.08 = 68.64)/88.55 = .78| 0
insurance | 0                           | (33×1.62 = 53.46)/88.55 = .6 | (29×1.62 = 46.98)/66.5 = .71
best      | (14×1.5 = 21)/49.6 = .42    | 0                            | (17×1.5 = 25.5)/66.5 = .38
A4. Find the vector for the query best car insurance: (i, ii) atc. Here max(tf) = 1 and the query length is 2.76 (auto does not occur in the query, so its weight is 0):

term      | tf | a = .5 + .5×tf/max(tf) | t (idf) | at   | atc
car       | 1  | .5 + .5×1/1 = 1        | 1.65    | 1.65 | .6
insurance | 1  | .5 + .5×1/1 = 1        | 1.62    | 1.62 | .59
best      | 1  | .5 + .5×1/1 = 1        | 1.5     | 1.5  | .54

nnn.atc scores:
term      | Doc1           | Doc2            | Doc3
car       | 27×.6 = 16.2   | 10×.6 = 6       | 24×.6 = 14.4
insurance | 0              | 33×.59 = 19.47  | 29×.59 = 17.11
best      | 14×.54 = 7.56  | 0               | 17×.54 = 9.18
SUM       | 23.76 (3rd)    | 25.47 (2nd)     | 40.69 (1st)
A4 (ii) ntc.atc:

ntc       | Doc1 | Doc2 | Doc3 | atc (query)
car       |  .9  | .19  |  .6  | .6
auto      | .13  | .78  |  0   | 0
insurance |  0   |  .6  | .71  | .59
best      | .42  |  0   | .38  | .54

ntc.atc products:
term      | Doc1          | Doc2          | Doc3
car       | .9×.6 = .54   | .19×.6 = .11  | .6×.6 = .36
insurance | 0             | .6×.59 = .35  | .71×.59 = .42
best      | .42×.54 = .23 | 0             | .38×.54 = .21
SUM       | .77 (2nd)     | .47 (3rd)     | .99 (1st)
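Both rankings can be checked end to end; a sketch assuming the term counts car = (27, 10, 24), auto = (3, 33, 0), insurance = (0, 33, 29), best = (14, 0, 17) and the idf values of Q3 (small differences from the slide's two-decimal intermediate rounding are expected):

```python
import math

tf = {"car": [27, 10, 24], "auto": [3, 33, 0],
      "insurance": [0, 33, 29], "best": [14, 0, 17]}
idf = {"car": 1.65, "auto": 2.08, "insurance": 1.62, "best": 1.5}
query = ["best", "car", "insurance"]  # each query term has tf = 1

def cos_norm(v):
    n = math.sqrt(sum(x * x for x in v.values()))
    return {t: x / n for t, x in v.items()}

# atc query: augmented tf is 1 for every query term (max tf = 1), times idf.
q = cos_norm({t: idf[t] for t in query})

results = {}
for scheme in ("nnn", "ntc"):
    scores = []
    for j in range(3):
        if scheme == "nnn":
            d = {t: float(tf[t][j]) for t in tf}              # raw tf only
        else:
            d = cos_norm({t: tf[t][j] * idf[t] for t in tf})  # tf*idf, normalized
        scores.append(sum(q.get(t, 0.0) * w for t, w in d.items()))
    results[scheme] = scores

print({s: [round(x, 2) for x in v] for s, v in results.items()})
```

Under nnn.atc the top two documents are Doc3 and Doc2; under ntc.atc they are Doc3 and Doc1.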
Q5. Term-document count matrix:

term      | Antony and Cleopatra | Julius Caesar | The Tempest
Antony    | 157                  | 73            | 0
Brutus    | 28                   | 157           | 0
Caesar    | 232                  | 227           | 0
Calpurnia | 0                    | 10            | 0
Cleopatra | 23                   | 0             | 37
Mercy     | 0                    | 10            | 15
Worser    | 2                    | 0             | 1

a) Compute the cosine similarity and the Euclidean distance between the documents and the query brutus caesar mercy, based on the term-document count matrix above.
b) How does the Euclidean distance change if we normalize the vectors?
NB: compute the vector space using the tf-idf formula of Q3: w(t,d) = (1 + log10 tf(t,d)) × log10(N / df(t))
Euclidean distance. Recall: the distance between points (x1, y1) and (x2, y2) is given by d = √((x1 - x2)² + (y1 - y2)²). Unfortunately, this distance is biased by the length of the vectors, so it is not able to detect the correct distribution of terms
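The length bias can be seen directly in a small sketch (illustrative vectors, not from the exercise): scaling a document up changes its Euclidean distance from the query but leaves the cosine similarity unchanged.

```python
import math

def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

q = [1.0, 1.0]
d = [2.0, 2.0]         # same direction (same term distribution) as q
d_long = [20.0, 20.0]  # the same document, 10 times longer

print(euclid(q, d), euclid(q, d_long))  # distance grows with length
print(cosine(q, d), cosine(q, d_long))  # cosine is 1.0 for both
```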
Recall: Cosine similarity illustrated. [figure]
Group discussion
A5. Compute the vector space:

term      | Antony and Cleopatra | Julius Caesar | The Tempest | Query
Antony    | .56                  | .5            | 0           | 0
Brutus    | .43                  | .56           | 0           | .18
Caesar    | .59                  | .59           | 0           | .18
Calpurnia | 0                    | .95           | 0           | 0
Cleopatra | .42                  | 0             | .45         | 0
Mercy     | 0                    | .35           | .38         | .18
Worser    | .23                  | 0             | .18         | 0
A5
                   | Antony and Cleopatra | Julius Caesar | The Tempest
Cosine similarity  | .57                  | .62           | .35
Euclidean distance | .9                   | 1.22          | .58
A5. Normalized values (each vector divided by the sum of its components):

term      | Antony and Cleopatra | Julius Caesar | The Tempest | Query
Antony    | .25                  | .169          | 0           | 0
Brutus    | .193                 | .19           | 0           | .33
Caesar    | .265                 | .2            | 0           | .33
Calpurnia | 0                    | .322          | 0           | 0
Cleopatra | .188                 | 0             | .446        | 0
Mercy     | 0                    | .119          | .376        | .33
Worser    | .103                 | 0             | .178        | 0

Euclidean distance (normalized): .49 | .47 | .67. After normalization, Julius Caesar becomes the closest document, consistent with the cosine ranking.
Tutorial 2
Context-Dependent Sentiment Analysis in User-Generated Videos. Poria, S., Cambria, E., Hazarika, D., Majumder, N., Zadeh, A., & Morency, L. P. (2017). In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 873-883)
Idea: the context of an utterance influences its sentiment. E.g., a movie review of The Green Hornet: "The Green Hornet did something similar. It engages the audience more, they took a new spin on it, and I just loved it."
Solutions Model the order of utterance appearance Contextual LSTM Fusion of modalities Hierarchical Framework
https://github.com/senticnet
Q1. Consider the following class-conditioned word probabilities (c0 = non-spam, c1 = spam). [word-probability table] For each of the 3 email snippets below, ignoring case, punctuation, and words beyond the known vocabulary, compute the class-conditioned document probabilities (6 in total: P(d1|c0), P(d2|c0), P(d3|c0), P(d1|c1), P(d2|c1), P(d3|c1)) using the Naïve Bayes model.
Sec. 13.2 Recall: Naive Bayes classifier. d = ⟨x1, x2, ..., xn⟩.
c_MAP = argmax_{cj ∈ C} P(cj | x1, x2, ..., xn)  (the probability of a document d being in class cj)
      = argmax_{cj ∈ C} P(x1, x2, ..., xn | cj) P(cj)  (Bayes rule)
      = argmax_{cj ∈ C} P(x1 | cj) P(x2 | cj) ... P(xn | cj) P(cj)  (conditional independence assumption)
P̂(cj) = N(C = cj) / N
P̂(xi | cj) = (N(Xi = xi, C = cj) + 1) / (N(C = cj) + k)
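The MAP decision rule above can be sketched in a few lines; the priors and class-conditioned word probabilities below are hypothetical stand-ins, not the tutorial's values (log space avoids underflow on long documents):

```python
import math

def nb_classify(doc_words, priors, cond_prob):
    """Return argmax_c P(c) * prod_i P(x_i | c), computed in log space."""
    best_class, best_lp = None, float("-inf")
    for c, prior in priors.items():
        lp = math.log(prior) + sum(math.log(cond_prob[c][w]) for w in doc_words)
        if lp > best_lp:
            best_class, best_lp = c, lp
    return best_class

# Hypothetical class-conditioned word probabilities (c0 = non-spam, c1 = spam).
priors = {"c0": 0.2, "c1": 0.8}
cond_prob = {
    "c0": {"discount": 0.02, "offer": 0.05},
    "c1": {"discount": 0.9, "offer": 0.8},
}
print(nb_classify(["discount", "offer"], priors, cond_prob))  # c1
```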
Q1: documents
d1: OEM software - throw packing case, leave CD, use electronic manuals. Pay for software only and save 75-90%! Find incredible discounts! See our special offers!
d2: Our Hottest pick this year! Brand new issue Cana Petroleum! VERY tightly held, in a booming business sector, with a huge publicity campaign starting up, Cana Petroleum (CNPM) is set to bring all our readers huge gains. We advise you to get in on this one and ride it to the top!
d3: Dear friend, How is your family? hope all of you are fine, if so splendid. Yaw Osafo-Maafo is my name and former Ghanaian minister of finance. Although I was sacked by President John Kufuor on 28 April 2006 for the fact I signed 29 million book publication contract with Macmillan Education without reference to the Public Procurement Board and without Parliamentary approval.
Q1: Naïve Bayes model. p(dj | ck) = ∏i p(wi | ck)^f(wi, dj), where f(wi, dj) = frequency of word wi in document dj
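The likelihood formula can be sketched as follows; the vocabulary and spam-conditioned probabilities here are hypothetical stand-ins for the (unreproduced) table of Q1:

```python
from collections import Counter

def doc_likelihood(doc_text, word_probs):
    """p(d|c) = prod_i p(w_i|c)^f(w_i,d), over known vocabulary words only."""
    counts = Counter(w.strip("!,.()").lower() for w in doc_text.split())
    p = 1.0
    for w, prob in word_probs.items():
        p *= prob ** counts[w]  # absent words contribute prob**0 = 1
    return p

# Hypothetical spam-conditioned probabilities for a small vocabulary.
probs_spam = {"hottest": 0.98, "brand": 0.92, "new": 0.91, "huge": 0.92}
d2 = "Our Hottest pick! Brand new issue, with huge publicity and huge gains."
print(round(doc_likelihood(d2, probs_spam), 3))  # huge counts twice
```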
Hint. d2: Our Hottest pick this year! Brand new issue Cana Petroleum! VERY tightly held, in a booming business sector, with a huge publicity campaign starting up, Cana Petroleum (CNPM) is set to bring all our readers huge gains. We advise you to get in on this one and ride it to the top!
p(dj | ck) = ∏i p(wi | ck)^f(wi, dj)
p(d2 | ck) = p(hottest | ck) × p(brand | ck) × p(new | ck) × p(huge | ck)²
Group discussion
A1
p(d1 | c0) = .05 × .02 × .02 × .3 = 6 × 10^-6
p(d1 | c1) = .99 × .93 × .99 × .99 ≈ .902
p(d2 | c0) = .1 × .2² × .3 × .1 = 1.2 × 10^-4
p(d2 | c1) = .98 × .92² × .91 × .99 ≈ .747
p(d3 | c0) = .5 × .4 = .2
p(d3 | c1) = .98 × .2 = .196
Q2. Compute the posterior probabilities of each document in Question 1, given c0 and c1 (6 in total: P(c0|d1), P(c1|d1), P(c0|d2), P(c1|d2), P(c0|d3), P(c1|d3)), assuming that 80% of all email received is spam, i.e., prior class probability P(c1) = .8 (from which you can derive P(c0) = 1 - P(c1)), and finally decide whether each document is spam. p(ck | dj) ∝ p(dj | ck) p(ck)
Group discussion
A2. P(c0) = 1 - P(c1) = .2
A2
P(c0 | d1) ∝ P(d1 | c0) × P(c0) = 6 × 10^-6 × .2 = 1.2 × 10^-6
P(c1 | d1) ∝ P(d1 | c1) × P(c1) = .902 × .8 = .72
P(c0 | d2) ∝ P(d2 | c0) × P(c0) = 1.2 × 10^-4 × .2 = 2.4 × 10^-5
P(c1 | d2) ∝ P(d2 | c1) × P(c1) = .747 × .8 = .6
P(c0 | d3) ∝ P(d3 | c0) × P(c0) = .2 × .2 = .04
P(c1 | d3) ∝ P(d3 | c1) × P(c1) = .196 × .8 = .16
In each case P(c1 | dj) > P(c0 | dj), so all three documents are classified as spam.
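The spam decisions reduce to comparing unnormalized posteriors; a quick check, using likelihood values as assumed in A1:

```python
# Unnormalized posteriors p(c|d) is proportional to p(d|c) p(c).
likelihood = {                 # (p(d | c0), p(d | c1)), assumed from A1
    "d1": (6e-6, 0.902),
    "d2": (1.2e-4, 0.747),
    "d3": (0.2, 0.196),
}
prior = (0.2, 0.8)             # P(c0) = 0.2, P(c1) = 0.8

for d, (l0, l1) in likelihood.items():
    p0, p1 = l0 * prior[0], l1 * prior[1]
    label = "spam" if p1 > p0 else "non-spam"
    print(d, p0, round(p1, 2), "->", label)
```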
Q3 Build a Naïve Bayes classifier using words as features for the training set in Table 2 and use the classifier to classify the test set in the table.
Bayes probability. Recall.
Prior probability P(ck): the probability of expecting class ck before taking into account any evidence.
Likelihood P(d | ck) = ∏i P(xi | ck): true only because we make the "naive" conditional independence assumption.
Posterior probability: P(ck | d) ∝ P(d | ck) P(ck).
Recall: Naive Bayes learning.
P̂(ck) = (number of documents belonging to class ck) / (total number of documents)
P̂(xi | ck) = (number of occurrences of term xi in docs of class ck + 1) / (number of terms appearing in docs of class ck + |V|)
MAP classifier. Recall. MAP stands for maximum a posteriori: select the class that maximizes the posterior probability. We just try all the classes ck.
Group discussion
A3 Prior probability: p(china)=2/4, p(~china)=2/4
A3 (learning)

Doc Id | Terms                 | Class
1      | Taipei Taiwan         | yes
2      | Macao Taiwan Shanghai | yes
3      | Japan Sapporo         | no
4      | Sapporo Osaka Taiwan  | no

Vocabulary = {Taipei, Taiwan, Macao, Shanghai, Japan, Sapporo, Osaka}, |Vocabulary| = 7.
Terms per class: yes: 5, no: 5.
A3 (learning)
P(Taipei | yes) = (1+1)/(5+7) = 2/12; P(Taipei | no) = (0+1)/(5+7) = 1/12
P(Taiwan | yes) = (2+1)/(5+7) = 3/12; P(Taiwan | no) = (1+1)/(5+7) = 2/12
P(Sapporo | yes) = (0+1)/(5+7) = 1/12; P(Sapporo | no) = (2+1)/(5+7) = 3/12
A3 (classifying)
Doc Id 5, terms: Taiwan Taiwan Sapporo
P(yes | d5) ∝ 2/4 × (3/12)² × 1/12 ≈ 2.6 × 10^-3
P(no | d5) ∝ 2/4 × (2/12)² × 3/12 ≈ 3.47 × 10^-3
Answer: d5 belongs to the class no
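The whole A3 computation can be checked end to end; a minimal sketch with add-one smoothing as in the slides:

```python
from collections import Counter

train = [
    (["Taipei", "Taiwan"], "yes"),
    (["Macao", "Taiwan", "Shanghai"], "yes"),
    (["Japan", "Sapporo"], "no"),
    (["Sapporo", "Osaka", "Taiwan"], "no"),
]
vocab = {w for doc, _ in train for w in doc}  # |V| = 7

def train_nb(data):
    prior, counts, totals = {}, {}, {}
    for doc, c in data:
        prior[c] = prior.get(c, 0) + 1 / len(data)
        counts.setdefault(c, Counter()).update(doc)
        totals[c] = totals.get(c, 0) + len(doc)
    return prior, counts, totals

def posterior(doc, c, prior, counts, totals):
    """Unnormalized P(c | doc) with add-one smoothing."""
    p = prior[c]
    for w in doc:
        p *= (counts[c][w] + 1) / (totals[c] + len(vocab))
    return p

prior, counts, totals = train_nb(train)
d5 = ["Taiwan", "Taiwan", "Sapporo"]
p_yes = posterior(d5, "yes", prior, counts, totals)
p_no = posterior(d5, "no", prior, counts, totals)
print(round(p_yes, 5), round(p_no, 5))  # "no" wins
```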
Q4. Each of two Web search engines, A and B, generates a large number of pages uniformly at random from its index. 30% of A's pages are present in B's index, while 50% of B's pages are present in A's index. What is the ratio between the number of pages in A's index and the number of pages in B's?
Group discussion
A4. Both percentages estimate the same intersection |A ∩ B|, so 30% × |A| = 50% × |B|, hence A/B = 5/3
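A quick numeric check of the ratio (the size chosen for B's index is an arbitrary assumption):

```python
# If x = |A intersect B|, then 0.30 * |A| = x = 0.50 * |B|, so |A| / |B| = 5/3.
size_b = 300_000                 # assumed size of B's index
size_a = 0.50 / 0.30 * size_b
assert abs(0.30 * size_a - 0.50 * size_b) < 1e-6
print(size_a / size_b)           # ~5/3
```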