Introduction to NLP. Ruihong Huang Texas A&M University. Some slides adapted from slides by Dan Jurafsky, Luke Zettlemoyer, Ellen Riloff


"An Aggie does not lie, cheat, or steal or tolerate those who do." For additional information, please visit: http://aggiehonor.tamu.edu. Upon accepting admission to Texas A&M University, a student immediately assumes a commitment to uphold the Honor Code, to accept responsibility for learning, and to follow the philosophy and rules of the Honor System. Students will be required to state their commitment on examinations, research papers, and other academic work. Ignorance of the rules does not exclude any member of the TAMU community from the requirements or the processes of the Honor System.

The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires that all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. If you believe you have a disability requiring an accommodation, please contact Disability Services, currently located in the Disability Services building at the Student Services at White Creek complex on west campus or call 979-845-1637. For additional information, visit http://disability.tamu.edu.

Piazza: CSCE 489-508, NLP, https://piazza.com/class#fall2017/csce489508
Course page: http://faculty.cse.tamu.edu/huangrh/fall17/fall17_nlp_foundation_technique.html

Class participation: 10%
Four programming assignments: 40%
The final project: 25% (abstract: 5%; presentation + report + code + data: 20%)
Annotation assignment: 5%
Final exam: 20%

Late Policy: 20% reduction per day. This applies to the programming assignments, the annotation assignment, and the final project.

Programming Assignments
Code: has to be runnable
Report: how to run, results and analysis, remaining issues, known bugs

The Final Project
Due by mid-semester (10/12, before the class starts): 1-page abstract
By the end of the semester: submit code, data, and a report, and give a class presentation
Report: 8 pages maximum; describe the problem, approaches, and evaluation results

The Final Project
Solving a mini core research problem you have identified by reading recent research papers from top NLP conferences
Developing a nice NLP application system

Basic Recipe of Forming a Project
Choose a topic and do a quick survey
Prepare data
Think about evaluation methods
Start to work on it

Core research problems
Semantics, word sense disambiguation
Coreference resolution, discourse, pragmatics
Consider participating in a SemEval task (http://alt.qcri.org/semeval2018/index.php?id=tasks)

Applications
Question Answering
Text Summarization
Dialogue systems
Sentiment Analysis
Machine Translation
Interdisciplinary applications

What is NLP?
Fundamental goal: deep understanding of broad language, not just string processing or keyword matching
End systems that we want to build:
  Simple: spelling correction, text categorization
  Complex: speech recognition, machine translation, information extraction, sentiment analysis, question answering
  Unknown: human-level comprehension (is this just NLP?)

Question Answering: Jeopardy! US Cities: Its largest airport is named for a World War II hero; its second largest, for a World War II battle.

Information Extraction
Email:
  Subject: curriculum meeting
  Date: January 15, 2012
  To: Dan Jurafsky
  Hi Dan, we've now scheduled the curriculum meeting. It will be in Gates 159 tomorrow from 10:00-11:30. -Chris
Create new Calendar entry:
  Event: Curriculum mtg
  Date: Jan-16-2012
  Start: 10:00am
  End: 11:30am
  Where: Gates 159
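The calendar example above can be approximated with a few hand-written patterns. The following is only a minimal sketch, not how a production extractor works: the function name `extract_event`, the field names, and the three regular expressions are assumptions made for illustration, and they cover just this one email.

```python
import re
from datetime import datetime, timedelta

def extract_event(subject, sent_date, body):
    """Fill a calendar-entry dict from an email using hand-written patterns."""
    event = {"event": subject}
    # Resolve the relative expression "tomorrow" against the sending date.
    sent = datetime.strptime(sent_date, "%B %d, %Y")
    if re.search(r"\btomorrow\b", body, re.IGNORECASE):
        event["date"] = (sent + timedelta(days=1)).strftime("%Y-%m-%d")
    # A time range such as "10:00-11:30".
    times = re.search(r"(\d{1,2}:\d{2})\s*-\s*(\d{1,2}:\d{2})", body)
    if times:
        event["start"], event["end"] = times.group(1), times.group(2)
    # A location introduced by "in", e.g. "in Gates 159".
    place = re.search(r"\bin ([A-Z]\w+ \d+)", body)
    if place:
        event["where"] = place.group(1)
    return event

entry = extract_event(
    "curriculum meeting",
    "January 15, 2012",
    "Hi Dan, we've now scheduled the curriculum meeting. "
    "It will be in Gates 159 tomorrow from 10:00-11:30. -Chris",
)
print(entry)
```

Patterns like these break as soon as the wording changes ("the day after tomorrow", "room 159 of Gates"), which is one reason the task needs more than string matching.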

Google Knowledge Graph
Knowledge Graph: "things, not strings"

Text Summarization
Condensing documents
Single or multiple docs
Extractive or synthetic
Aggregative or representative
Very context-dependent!
An example of analysis with generation

Human-machine Dialogs

Machine Translation
Fully automatic
Helping human translators
Enter Source Text: 这不过是一个时间的问题.
Translation from Stanford's Phrasal: This is only a matter of time.

Inter-Disciplinary
Computer Science: artificial intelligence, machine learning
Linguistics: computational linguistics
Psychology: cognitive psychology, psycholinguistics
Statistics: probabilistic methods, information theory

Interactions with Linguists (History)
70s and 80s: more linguistic focus (deeper models, toy domains, rule-based systems)
90s: empirical revolution (robust corpus-based methods, empirical evaluation)
2000s: richer linguistic representations used in statistical approaches

Outline
[...] of Words: text classification
[...] of Words: language modeling, part-of-speech tagging
[...] of Words: syntactic parsing, dependency parsing
[...]: thesaurus, distributional, distributed, coreference, pragmatics

Language Technology

mostly solved
  Spam detection: "Let's go to Agra!" vs. "Buy V1AGRA"
  Part-of-speech (POS) tagging: Colorless/ADJ green/ADJ ideas/NOUN sleep/VERB furiously/ADV
  Named entity recognition (NER): Einstein/PERSON met with UN/ORG officials in Princeton/LOC

making good progress
  Sentiment analysis: "Best roast chicken in San Francisco!" / "The waiter ignored us for 20 minutes."
  Coreference resolution: Carter told Mubarak he shouldn't run again.
  Word sense disambiguation (WSD): I need new batteries for my mouse.
  Parsing: I can see Alcatraz from the window!
  Machine translation (MT): 第13届上海国际电影节开幕 -> The 13th Shanghai International Film Festival
  Information extraction (IE): "You're invited to our dinner party, Friday May 27 at 8:30" -> Party, May 27, add

still really hard
  Question answering (QA): Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?
  Paraphrase: "XYZ acquired ABC yesterday" / "ABC has been taken over by XYZ"
  Summarization: "The Dow Jones is up", "The S&P500 jumped", "Housing prices rose" -> "Economy is good"
  Dialog: "Where is Citizen Kane playing in SF?" "Castro Theatre at 7:30. Do you want a ticket?"

Ambiguity!!

Ambiguities inherent in Language
Language is succinct and expressive.
Humans resolve ambiguities naturally.

Syntax: structural ambiguity
Time flies like an arrow.
Metaphor: Time/NOUN flies/VERB like/PREP an/ART arrow/NOUN
New Fly Species: Time/NOUN flies/NOUN like/VERB an/ART arrow/NOUN
Stopwatch Imperative: Time/VERB flies/NOUN like/PREP an/ART arrow/NOUN
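To see how quickly this kind of ambiguity multiplies, one can enumerate every tag sequence that a small lexicon permits. This is a sketch under stated assumptions: the `LEXICON` below is a hypothetical mini-dictionary written for this one sentence, and real taggers score candidate sequences rather than listing them all.

```python
from itertools import product

# Hypothetical mini-lexicon: the tags each word can take in isolation.
LEXICON = {
    "time": ["NOUN", "VERB"],
    "flies": ["NOUN", "VERB"],
    "like": ["PREP", "VERB"],
    "an": ["ART"],
    "arrow": ["NOUN"],
}

def candidate_taggings(words):
    """Enumerate every tag sequence the lexicon allows for the sentence."""
    options = [LEXICON[w.lower()] for w in words]
    return [list(zip(words, tags)) for tags in product(*options)]

sentence = "Time flies like an arrow".split()
taggings = candidate_taggings(sentence)
print(len(taggings), "candidate taggings")
for tagging in taggings:
    print(" ".join(f"{w}/{t}" for w, t in tagging))
```

The three readings on the slide are among the candidates; the rest are sequences a grammar or statistical model would have to rule out.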

Syntax: structural ambiguity (attachment)
I saw the Grand Canyon flying to New York.
I watered the plant with yellow leaves.
I saw the man on the hill with the telescope.

But syntax doesn't tell us much about meaning
Colorless green ideas sleep furiously. [Chomsky]
plastic cat food can cover

Semantics: Lexical Ambiguity
I walked to the bank...
  of the river.
  to get money.
The bug in the room...
  was planted by spies.
  flew out the window.
I work for John Hancock...
  and he is a good boss.
  which is a good company.
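One classic way to attack the "bank" example is gloss overlap in the style of the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the sentence. The slides do not prescribe this method; it is used here only as an illustration, and the sense labels, glosses, and stopword list are all made up for the sketch.

```python
# Hypothetical sense inventory: sense label -> short gloss.
SENSES = {
    "bank": {
        "river_bank": "sloping land beside a river or lake",
        "financial_bank": "institution where people deposit and withdraw money",
    }
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "to", "or", "and", "i", "where"}

def content_words(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    return {w.strip(".,!?").lower() for w in text.split()} - STOPWORDS

def simplified_lesk(word, sentence):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    scores = {
        sense: len(context & content_words(gloss))
        for sense, gloss in SENSES[word].items()
    }
    return max(scores, key=scores.get), scores

print(simplified_lesk("bank", "I walked to the bank of the river."))
print(simplified_lesk("bank", "I walked to the bank to get money."))
```

With so little context the method depends on a lucky word overlap ("river", "money"), which hints at why word sense disambiguation remains hard on open text.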

Discourse, Pragmatics

Discourse: coreference
A Short Story
President John F. Kennedy was assassinated. The president was shot yesterday. Relatives said that John was a good father. JFK was the youngest president in history. His family will bury him tomorrow. Friends of the Massachusetts native will hold a candlelight service in Mr. Kennedy's home town.

Pragmatics
Rules of Conversation
  Can you tell me what time it is?
  Could I please have the salt?
Speech Acts
  I bet you $50 that the Jazz will win tonight.
  Will you marry me?

NLP: a branch of AI
Lack of world knowledge, inferences

World Knowledge, Inferences
John went to the diner. He ordered a steak. He left a tip and went home.
John wanted to commit suicide. He got a rope.

Sparsity!!!

Zipf's Law
The frequency of any word is inversely proportional to its rank: f = K / r
Fat tail: most words occur only a couple of times
High lexical diversity -> data sparseness
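The rank-frequency relation can be checked on any tokenized text. Below is a minimal sketch: the toy sentence is invented for illustration, and K is simply taken to be the count of the top-ranked word, which is one convenient choice rather than a fitted constant. A corpus this small will not follow the law closely; the point is only to show how rank, observed frequency, and the prediction f = K / r line up.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, count) triples sorted by descending count."""
    counts = Counter(tokens)
    # Break ties alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count) for rank, (word, count) in enumerate(ranked, start=1)]

def zipf_prediction(k, rank):
    """Predicted frequency under f = K / r."""
    return k / rank

# Toy corpus (hypothetical, only for illustration).
text = "the cat sat on the mat and the dog sat on the log"
table = rank_frequency(text.split())
k = table[0][2]  # assume K equals the frequency of the top-ranked word

for rank, word, count in table:
    print(rank, word, count, round(zipf_prediction(k, rank), 2))

# Words seen exactly once make up the long tail behind data sparseness.
singletons = [word for _, word, count in table if count == 1]
print(len(singletons), "of", len(table), "word types occur only once")
```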

Goals of the class
Key tasks, algorithms
Essential skills to build your system
(Hopefully) see problems, holes, gaps, start research