Annotation Guidelines


Felicittà Annotation Guidelines
February 9, 2014

1 Introduction

The main purpose of this corpus is to provide a gold standard for evaluating the system designed within the Felicittà project. The ultimate goal, however, is to make it available as a useful resource for other, similar projects as well.

The corpus consists of a collection of tweets automatically retrieved from Twitter and selected on the basis of the following criteria:

- Each tweet must be self-contained: there should be no logical link between one tweet and the next.
- Tweets should not be confined to a given geographical area or time frame.
- Tweets should be collected randomly on different days, on both weekdays and holidays.

Both the corpus and these guidelines were developed in multiple stages. In the first stage, 100 tweets from the corpus were annotated independently by four annotators (A1, A2, A3, A4). The annotated sets were compared automatically to detect differences; these differences were discussed and, as a result, a first version of the guidelines was drafted. At this stage the HUM label (not included in the original tagset) was also added, so that ironic and sarcastic tweets could be annotated properly without forcing them into a positive or negative interpretation.

A1 and A2 then annotated 900 tweets each, following the guidelines. After a discussion of the results on the first overall set of 1000 tweets, the guidelines were revised accordingly and a new set of 500 tweets was annotated. A3 re-annotated the whole corpus, by then 1500 tweets, from scratch using the finalized guidelines, and the disagreement was then computed. A4 re-annotated the tweets on which disagreement had been detected, and an updated version of the guidelines was produced. The final version of the corpus therefore contains only the tweets on which all four annotators reached full agreement; the others were discarded.
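The final filtering step described above (keep a tweet only when all four annotators agree) can be illustrated with a minimal Python sketch. The data layout, the function names `full_agreement` and `disagreement_rate`, and the sample labels are assumptions made for illustration; the guidelines do not say how the disagreement was actually computed, so the simple proportion used here is only one possible reading.

```python
from typing import Dict, List

# Hypothetical layout: tweet ID -> labels assigned by A1, A2, A3, A4 (in that order).
Annotations = Dict[str, List[str]]


def full_agreement(annotations: Annotations, n_annotators: int = 4) -> Dict[str, str]:
    """Return the tweets that every annotator labeled identically, with their label."""
    gold = {}
    for tweet_id, labels in annotations.items():
        # A tweet enters the gold corpus only if all annotators chose the same tag.
        if len(labels) == n_annotators and len(set(labels)) == 1:
            gold[tweet_id] = labels[0]
    return gold


def disagreement_rate(annotations: Annotations) -> float:
    """Share of tweets with at least two different labels (an assumed measure)."""
    if not annotations:
        return 0.0
    disagreeing = sum(1 for labels in annotations.values() if len(set(labels)) > 1)
    return disagreeing / len(annotations)


# Invented sample data, not taken from the corpus.
sample = {
    "t1": ["POS", "POS", "POS", "POS"],
    "t2": ["NEG", "HUM", "NEG", "NEG"],
    "t3": ["NONE", "NONE", "NONE", "NONE"],
}
gold = full_agreement(sample)
rate = disagreement_rate(sample)
```

With this sample, `t2` is discarded because one annotator chose a different tag, which mirrors the rule that tweets without full agreement are left out of the final corpus.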

Section 2 of this report describes the tagset used in this corpus and the general use of its labels; Section 3 discusses some tricky cases.

2 Tagset

Except for RT, the tags used for the manual sentiment annotation are the same as those used in the Senti-TUT project for the development of a novel Italian corpus for sentiment analysis (see http://www.di.unito.it/~tutreeb/sentitut.html). In both the Senti-TUT and the Felicittà gold corpora, sentiment is annotated at tweet level with one of the following tags:

POS     Positive
NEG     Negative
MIXED   Both positive and negative
HUM     Ironic
NONE    Neutral
UN      Unknown
RT      Repeated

The meaning and use of each of these tags are described in the next sections.

NOTE: The annotation applies not only to tweets written by the user him/herself, but also to retweets and to tweets containing quotes or reported speech. Although these do not necessarily reflect the user's opinion, they can still convey a sentiment that is useful for detecting the mood of a certain community at a given time.

2.1 POS

A tweet can be labeled as POS if it:

- clearly expresses a positive opinion of the author about a person or group, an action, or an event
  [3,93387E+17] RT @stylinsleeds: @harrysgreatlove amo troppo la sua voce, è meravigliosa
  [3,91884E+17] Peperoni e patate al forno...,.che dire? Complimenti alla cuoca!!!
- clearly expresses a positive state of mind of the author
  [3,91946E+17] Finalmente! Felicità assoluta! :D http://t.co/ugboowzaz1
- reports a positive opinion or mood that may not be directly attributable to the author
  [3,91904E+17] RT @stylinsl0ve: "congratulazioni jay sei incint.." "sisi sono felicissima come eleanor del resto che sta con louis perché lo ama tanto e l...

2.2 NEG

A tweet can be labeled as NEG if it:

- clearly expresses a negative opinion of the author about a person or group, an action, or an event
  [3,93423E+17] ma perché la gente è così rincoglionita? sarà colpa dello smog?
- clearly expresses a negative state of mind of the author
  [3,92035E+17] @believemestyles nno ma ri giuro che sto da schifo ho la testa che mi scoppia e mi viene da vomitare fuck
- reports a negative opinion or mood that may not be directly attributable to the author
  [3,91825E+17] RT @AmorosoMyLife : significato dei vizi pt 7: mangiare in continuazione quello che capita: senso di vuoto, solitudine.
- uses a polemical tone that reveals a critical attitude towards a person, group or event
  [3,93308E+17] RT @Yi Benevolence: Nessun politico italiano è mai andato in Europa a fare i nostri interessi http://t.co/rgpj19bsbh @rinaldi euro m5s

2.3 MIXED

A tweet can be labeled as MIXED when it expresses several different sentiments. These sentiments can be expressed in relation to different targets:

  [3,93407E+17] I talent show sono dei karaoke, dice Miles Kane. Uno che va ascoltato, se amate gli Who http://t.co/6le1y2niqz http://t.co/FiyQZqhvkk
  [3,93376E+17] Thohir: Inter, Ventola il mio giocatore preferito...e noi che pensavamo che a non capire niente di calcio fosse solo Moratti.

or in relation to the same target:

  [3,93384E+17] @GialloParma, mai, perche dovrei? Sono tra i 20 fondatori del PD di Parma. Se vince le primarie e il mio segretario,anche se non condivido

2.4 HUM

Tweets labeled as HUM are those with a clearly ironic, sarcastic or otherwise humorous intent:

  [3,93303E+17] @SimoneLodi andrenfkdksjjsjfj non riesco nemmeno a scriverlo!
  [3,9188E+17] Oggi pranzo da nonna. Tornerò rotolando
  [3,93307E+17] Ho la risposta a tutte le vostre domande: Si
  [3,91965E+17] GIUSTO DOBBIAMO FARE LORO LA DANZA DELLA PIOGGIA COSI SONO CONTENTI...!!!!!!!!!!!! http://t.co/hqz4qwgbvu
  [3,91849E+17] RT @AmedeoTomanelli: @angelacasciaro @m giul Diritto? In Italia? Ah, ho capito... Sei maestro di tennis!

2.5 NONE

When a tweet contains a mere observation or mention of an objective fact and conveys no state of mind or opinion, it is considered neutral and is therefore tagged with the label NONE.

  [3,93326E+17] RT @PPolicy News: Assemblea @comuni anci, il discorso integrale di Fassino http://t.co/ujjqzf7edq... @ShareTheRoadFab
  [3,93173E+17] RT @ViolettaItalyIT: Novità in vista: @TiniStoessel canterà il brano Libresoy, tratto dal nuovo film Disney Frozen: il regno di ghiaccio...

However, when such an observation is accompanied by extratextual elements that give the text a different connotation, the tweet is no longer considered neutral. For such cases, see the discussion in Section 3.1.

2.6 UN

A tweet can be labeled as UN if it is difficult to classify for one of the following reasons:

- the tweet is unintelligible because it is incomplete (however, if the annotator is able to interpret the meaning of the tweet despite its incompleteness, he/she can assign the appropriate tag)
  [3,932E+17] Quando Dio manda una malattia o morte a bambino o donna o uomo o vecchio ha ragione e dovete stare muti o vi... http://t.co/ttoqf53aou
- it contains acronyms, abbreviations, loanwords, jargon or dialect terms that the annotator may not know
  [3,91742E+17] @twdehmaneiro é a musica zkskdkwk
  [3,92004E+17] Busca a campanita
- a certain mood is perceived, but knowledge of the context is required to interpret it
  [3,93367E+17] RT @xneedharry: Voglio un ragazzo che mi lasci rubare le sue felpe dall'armadio e che pur accorgendosene non dica nulla.
  [3,93387E+17] RT @brecordz: Congratulazioni! http://t.co/ZjvMRDGOdr
  [3,91943E+17] @giulioguazzugli @Stefano Pantano Impossibile. Sono della Roma.
  [3,9193E+17] @RosyHiddles @ScarlettCavendi Guielmo del Toro... direi ragazze ke questi sn i ruoli perfetti x tom Loki ne è un esempio...

2.7 RT

The RT label is used for tweets that occur more than once in the corpus.

  [3,9193E+17] RT @horanislife: i capolavori di madre natura NINA DOBREV http://t.co/7qlgtygug1
  [3,91929E+17] RT @horanislife: i capolavori di madre natura NAYA RIVERA http://t.co/resmn5ys9c

In this specific case, although the presumable target differs (Nina Dobrev and Naya Rivera, respectively), the content of the tweets is essentially unchanged. However, these are rare cases.

3 Discussion

The guidelines described above are general. However, some cases are ambiguous or complex enough to complicate the annotator's work. Some of these cases are discussed in the following sections.

3.1 Punctuation, icons, emoticons

When messages with apparently neutral content, such as:

  [3,91825E+17] @Smile Sel1D fatto

are accompanied by iconic elements such as emoticons, they can acquire a certain connotation, which gives the message an explicit polarity:

  [3,91825E+17] @Smile Sel1D fatto! C;

The use of C; in this tweet, for example, conveys a positive mood; as a result, the tweet can be labeled as POS. (For a detailed list of emoticons and their meanings, see the dedicated page on Wikipedia: http://en.wikipedia.org/wiki/emoticon)

Punctuation marks can also indicate sentiment polarity, as in:

  [3,93254E+17] Buongiorno!!! http://t.co/y0ckqhlx8o

where the !!! suggests a positive mood (otherwise, the tweet would be labeled as neutral, i.e. NONE). The general principle to follow in these cases is therefore that, when the text is neutral, the polarity is established on the basis of the non-textual element(s).

On the other hand, there are also cases where the meaning of the text and that of the iconic element conflict, as in:

  [3,93138E+17] @Valles Core tonto (cuore)

In these cases the text takes precedence over the iconic element, and the sentiment polarity is identified on the basis of the text.

3.2 Verbs expressing hope or desire

Verbs such as hope or desire express an aspiration to something positive, which inclines us to label tweets containing them as POS, as in the example below:

  [3,93423E+17] Spero tanto che ci sia ancora il mio prof preferito @flavnc :-**

However, this principle cannot be applied as a general rule.

  [3,92008E+17] @pierluigi ds thoir spero venda taider e guarin e prenda giocatori ce vogliono giocare!

The tweet above, for example, has a clearly polemical connotation, which leads to annotating it as NEG.

3.3 Others

The purpose of the sentiment annotation task is to identify the overall mood of a tweet. However, the line between one sentiment and another is fine, and within a space as small as a tweet this often makes labeling a non-trivial task. In the example below, the tweet could be considered either entirely negative or humorous.

  [3,9176E+17] RT @mondoditumblr: Mi sento come una virgola. A volte qualcuno si dimentica di me, a volte sono di troppo e a volte non servo a nulla.

In these cases we opted to assign the sentiment perceived as the most prevalent. The tweet above would therefore be labeled as NEG.

It may also happen that a tweet containing multiple different sentiments, and therefore a candidate for MIXED, actually contains expressions that tip the balance towards an overall positive or negative interpretation of the tweet.

  [3,92027E+17] E stata una settimana perfetta Ma questa domenica ha rovinato tutto Ma proprio tutto.
  [3,91819E+17] RT @valeriag97: "Ma è carino, si presenta bene.." "Si ma noi non siamo la Ventura" Ahahahahaha tanta stima per MorganX XF7
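As a compact recap, the tagset of Section 2 and the precedence principle of Section 3.1 can be written down as a small Python sketch. The names `Tag` and `resolve_polarity` are hypothetical and not part of the Felicittà tooling; the function does not detect sentiment itself, it only combines two judgments that a human annotator has already made (one for the words, one for the emoticons or punctuation).

```python
from enum import Enum
from typing import Optional


class Tag(Enum):
    """The seven tweet-level tags listed in Section 2."""
    POS = "Positive"
    NEG = "Negative"
    MIXED = "Both positive and negative"
    HUM = "Ironic"
    NONE = "Neutral"
    UN = "Unknown"
    RT = "Repeated"


def resolve_polarity(text_tag: Tag, icon_tag: Optional[Tag]) -> Tag:
    """Combine the annotator's judgment of the text with that of the iconic element.

    text_tag: tag suggested by the words alone.
    icon_tag: polarity suggested by emoticons or punctuation, or None if absent.
    """
    if icon_tag is None:
        # No non-textual element: the text decides.
        return text_tag
    if text_tag is Tag.NONE:
        # Neutral text: the non-textual element establishes the polarity.
        return icon_tag
    # Text and iconic element both carry a polarity: the text takes precedence.
    return text_tag


# "fatto! C;" style case: neutral text plus a positive emoticon.
neutral_with_emoticon = resolve_polarity(Tag.NONE, Tag.POS)
# Conflict case: negative text accompanied by a positive icon.
conflicting = resolve_polarity(Tag.NEG, Tag.POS)
```

The cases discussed in Sections 3.2 and 3.3 are deliberately left out: there the guidelines rely on the annotator's reading of the prevailing sentiment, which does not reduce to a fixed rule.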