Scalable Semantic Parsing with Partial Ontologies ACL 2015


Scalable Semantic Parsing with Partial Ontologies
Eunsol Choi, Tom Kwiatkowski, Luke Zettlemoyer
ACL 2015

Semantic Parsing: Long-term Goal
Build meaning representations for open-domain texts
How many people live in Seattle? -> Semantic Parser -> SELECT Population FROM CityData WHERE City == "Seattle"; -> Executor -> 620,778
(Kwiatkowski et al. 13; Liang et al. 11; Cai & Yates 13; Berant et al. 13, 14; Reddy et al. 14)
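The parser-then-executor pipeline on this slide can be sketched in a few lines. This is only an illustration, not the authors' system: the `parse` function is a hypothetical pattern matcher that handles a single question template, and the in-memory `CityData` table holds just the one row shown on the slide.

```python
import sqlite3

# Hypothetical one-table database mirroring the slide's CityData example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CityData (City TEXT PRIMARY KEY, Population INTEGER)")
conn.execute("INSERT INTO CityData VALUES ('Seattle', 620778)")


def parse(question: str) -> tuple[str, tuple[str, ...]]:
    """Toy stand-in for a semantic parser: supports one question pattern only."""
    prefix, suffix = "How many people live in ", "?"
    if not (question.startswith(prefix) and question.endswith(suffix)):
        raise ValueError("unsupported question")
    city = question[len(prefix):-len(suffix)]
    # The meaning representation is a parameterized SQL query.
    return "SELECT Population FROM CityData WHERE City = ?", (city,)


def execute(query: str, params: tuple[str, ...]) -> list[int]:
    """Executor: run the meaning representation against the database."""
    return [row[0] for row in conn.execute(query, params)]


seattle = execute(*parse("How many people live in Seattle?"))
anyang = execute(*parse("How many people live in Anyang?"))
print(seattle)  # [620778]
print(anyang)   # []
```

The second query parses without trouble but returns nothing, because the table has no row for that city.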

Semantic Parsing: Large Domain
Freebase is a large, community-authored knowledge base with:
40 million entities
2 billion facts
20,000 relations
10,000 types
100 domains

Current semantic parsers can parse:
How many people live in Seattle?
Which college did Obama go to?
What party did Clay establish?

Current semantic parsers
Can parse:
  How many people live in Seattle?
  Which college did Obama go to?
  What party did Clay establish?
Cannot parse:
  How many people live in Anyang?
  Which college did Eunsol go to?
  Who are Russian short story writers in 19th century?
  What is a popular seaside resort city in Italy?

Remaining Challenges
Fact Incompleteness:
  How many people live in Anyang?
  Which college did Eunsol go to?
Schema Incompleteness:
  Who are Russian short story writers in 19th century?
  What is a popular seaside resort city in Italy?


Remaining Challenges: Fact Incompleteness
How many people live in Anyang?
Which college did Eunsol go to?
Knowledge base: Seattle -Population-> 620,778; Seattle -State-> Washington; Anyang -Population-> ?
Unable to handle sentences with facts not in Freebase
70% of people in FB have no birth place (West 14)
66% of facts missing in pilot study for our dataset

Remaining Challenges
Fact Incompleteness:
  How many people live in Anyang?
  Which college did Eunsol go to?
Schema Incompleteness:
  Who are Russian short story writers in 19th century?
  What is a popular seaside resort city in Italy?

Remaining Challenges: Schema Incompleteness
Who are Russian short story writers in 19th century?
What is a popular seaside resort city in Italy?
Unable to handle concepts outside existing schema
In a pilot study on our dataset: 27.2% of sentences describe concepts not in Freebase

Previous Approach?
Existing data is filtered to ensure completeness:
FB917 dataset is created from Freebase (Cai and Yates 13)
93% of originally gathered questions cannot be answered with FB (WebQuestions, Berant 13)

Remaining Challenges
Fact Incompleteness:
  How many people live in Anyang?
  Which college did Eunsol go to?
  -> New learning approach with broad coverage lexical statistics
Schema Incompleteness:
  Who are Russian short story writers in 19th century?
  What is a popular seaside resort city in Italy?
  -> Semantic parser with partial groundings

This Work
Build meaning representations with both Freebase concepts and open concepts
British playwright, novelist and short story writer -> Semantic Parser

Parsing with Incompleteness
1. Open Information Extraction (Banko et al., 07; Fader et al., 11)
2. Matrix Factorization (Riedel, 13; Krishnamurthy, 15)
3. Web Search Queries (Joshi, 14)

Outline 1. Task and Applications 2. Data 3. Semantic Parser with Partial Ontology 4. Learning 5. Evaluation

Task
Build a meaning representation with Freebase concepts and concepts outside Freebase
British playwright, novelist and short story writer -> Semantic Parser

Focus: Noun Phrases
Interesting noun-noun modifiers and implicit relations.
A noun phrase is itself a referring expression, resembling a query.
Useful for information extraction when paired with an entity.

Applications
Referring Expression Resolution (QA)
  Input: Noun Phrase
    British playwright, novelist and short story writer
  Output: Somerset Maugham
Entity Attribute Extraction (IE)
  Input: (Entity, Noun Phrase)
    Somerset Maugham, British playwright, novelist and short story writer
  Output: (S. Maugham, Nationality, U.K), (S. Maugham, Profession, Novelist), (S. Maugham, Profession, Playwright)

Overview: Approach
Referring Expression Resolution (QA)
  British playwright, novelist and short story writer -> Semantic Parser -> Somerset Maugham
Entity Attribute Extraction (IE)
  Somerset Maugham, British playwright, novelist and short story writer -> Semantic Parser -> (S. Maugham, Nationality, U.K), (S. Maugham, Profession, Novelist), (S. Maugham, Profession, Playwright)

Outline 1. Task and Applications 2. Data 3. Semantic Parser with Partial Ontology 4. Learning 5. Evaluation

Wikipedia Category
Example category: Film directors from New York
On average, 15% entity overlap with Freebase
Exciting opportunity for information extraction
Challenge for existing learning techniques

Wikipedia Category: Data statistics (Entire Set)
Number of categories: 365 K
Number of words per category: 4.1
Number of entity-category pairs: 7 million

Appositives
Relation between a named entity and a nominal.
Laurie Hays, an executive editor at Bloomberg News, is leaving the company.
-> Laurie Hays, an executive editor at Bloomberg News

Appositives
Extracted from open texts such as news articles
Malta, an EU outpost in the Mediterranean, decided today
Richard Nixon, a former president of the United States
Maputo, the relaxed seaside capital of Mozambique,

Appositives: Data statistics (Entire Set)
Number of appositions: 67 K
Vocabulary: 25 K
Number of words per apposition: 5.73

Outline 1. Task and Applications 2. Data 3. Semantic Parser with Partial Ontology 4. Learning 5. Evaluation

Two Stage Semantic Parsing (EMNLP 13)
British playwright, novelist and short story writer -> Domain Independent Parse -> Ontology Match

Two Stage Semantic Parsing with Partial Grounding
British playwright, novelist and short story writer -> Domain Independent Parse -> Ontology Match

Partial Grounding: Open Schema
Explicitly model concepts not in the knowledge base as OpenRel and OpenType
Plants described in 1891:
  Lower_classification(plant, x)
  OpenRel_described_in(x, 1891)
Former municipalities in Brandenburg:
  OpenType_Former(x)
  OpenRel(x, Municipality)
  Located_In(x, Brandenburg)

Partial Grounding: Open Schema
Explicitly model concepts not in the knowledge base as OpenRel and OpenType
Plants described in 1891:
  (plant, lower_classification, x)
  (x, OpenRel(described_in), 1891)
Former municipalities in Brandenburg:
  (x, Type, OpenType_Former)
  (x, OpenRel, Municipality)
  (x, Located_In, Brandenburg)
Benefits of open schema:
  Help learn String-Freebase concepts
  Allow partial execution
  Capture useful information, although not grounded
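One way to picture "partial execution" is to run only the grounded triples of a logical form against the knowledge base and carry the open ones along unexecuted. The sketch below assumes that reading; it is not the authors' implementation. The toy `KB`, the town names, and the convention that the variable `x` always sits in subject position are all hypothetical simplifications.

```python
Triple = tuple[str, str, str]

# Hypothetical toy knowledge base; entity names are invented for illustration.
KB: set[Triple] = {
    ("TownA", "Located_In", "Brandenburg"),
    ("TownB", "Located_In", "Brandenburg"),
    ("TownC", "Located_In", "Bavaria"),
}

# "Former municipalities in Brandenburg" in the slide's triple notation.
form: list[Triple] = [
    ("x", "Type", "OpenType_Former"),
    ("x", "OpenRel", "Municipality"),
    ("x", "Located_In", "Brandenburg"),
]


def is_open(triple: Triple) -> bool:
    """A triple is open (ungrounded) if any part uses an Open* symbol."""
    return any(part.startswith("Open") for part in triple)


def partial_execute(logical_form: list[Triple], kb: set[Triple]) -> set[str]:
    """Return entities that satisfy every grounded triple (x in subject position)."""
    candidates = {subject for subject, _, _ in kb}
    for _, relation, obj in logical_form:
        if is_open((_, relation, obj)):
            continue  # open triples cannot be checked against the KB
        candidates &= {s for s, r, o in kb if r == relation and o == obj}
    return candidates


open_part = [t for t in form if is_open(t)]
result = partial_execute(form, KB)
print(sorted(result))  # ['TownA', 'TownB']
print(open_part)       # the two open triples, kept as unexecuted information
```

The grounded `Located_In` constraint narrows the candidates, while the open triples stay attached to the parse as information the knowledge base cannot verify.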

Two Stage Semantic Parsing with Partial Grounding
Domain Independent Parse -> Ontology Match
Ontology Match: Structure Match, Constant Matches, OPEN Constant Matches
[Diagram: constants matched to OpenType, OpenRel, Municipality, Location.ContainedBy, Brandenburg]

Outline 1. Task and Applications 2. Data 3. Semantic Parser with Partial Ontology 4. Learning 5. Evaluation

Previous Work: Direct Supervision
How many people live in Seattle? -> Semantic Parser -> SELECT Population FROM CityData WHERE City == "Seattle"; (latent) -> Executor -> 620,778

Supervision from Unfiltered Data
Social democratic parties in Greece -> Semantic Parser -> Executor -> {Agreement for the New Greece, Agricultural and Labour Party, Free Citizens, Democratic Social Movement} (4 entities)
Logical form: x -Political_Party.Ideology-> Social Democratic; x -Political_Party.Country-> Greece
Missing facts make direct supervision difficult.
Gold mappings are expensive to gather at large scale.

Learning with Fact Incompleteness
1. Two-Stage Learning
2. Two kinds of data
   i. Small Annotated Dataset
   ii. Broad Coverage Lexical Statistics

Two-Stage Learning
Social democratic parties in Greece -> Domain Independent Parse (CCG) -> Ontology Matcher (ONT)

Two-Stage Learning
Social democratic parties in Greece -> Domain Independent Parse (CCG)
Small training set with logical forms: (British playwright, novelist and short story writer, ( )) x 500

Two-Stage Learning
Social democratic parties in Greece -> Domain Independent Parse (CCG)
Derivations are scored using a linear model
The highest scoring logical form is passed to the second stage
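Scoring derivations with a linear model amounts to taking a dot product between a feature vector and a weight vector, then keeping the argmax. The sketch below shows that mechanism only; the feature names, weights, and candidate labels are invented for illustration and are not taken from the talk.

```python
def score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Linear model: sum of weight * feature value, missing weights count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical candidate logical forms with made-up feature values.
candidates = {
    "form_a": {"string_match": 1.0, "pmi": 0.8},
    "form_b": {"string_match": 0.0, "pmi": 1.5, "open_rel": 1.0},
}

# Hypothetical learned weights.
weights = {"string_match": 2.0, "pmi": 1.0, "open_rel": -0.5}

scores = {name: score(feats, weights) for name, feats in candidates.items()}
# The highest scoring form is the one handed to the next stage.
best = max(scores, key=scores.get)
print(best)
```

With these numbers, `form_a` scores 2.8 and `form_b` scores 1.0, so `form_a` would be passed on.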

Two-Stage Learning
Ontology Matcher (ONT)
Freebase is a large, community-authored knowledge base with:
20,000 relations
10,000 types
100 domains

Broad Coverage Lexical Statistics: Wikipedia Category dataset
85 K vocabulary
365 K categories
2.5 M entities
7 M category-entity pairs

Mapping words to Freebase Attribute
Categories: Pixar Feature Films; Animation Films from Pixar; Pixar songs
Entities: Ratatouille, Wall-E, Finding Nemo, Toy Story, Monster Inc., Just keep swimming
Freebase attributes: (film.production_companies, Pixar), (film.film.produced_by, John Lasseter), (film.film.directed_by, John Lasseter), (film.film.film_festivals, 2011 Anima Mundi), (film.film.starring.actor, Bob Peterson)
Large amount of information for String - Entity - Freebase attribute alignment

Mapping words to Freebase Attribute
Pointwise Mutual Information (PMI) as a feature:
$\mathrm{PMI}(\text{String}, \text{Freebase Attribute}) = \log \frac{P(\text{String}, \text{Freebase Attribute})}{P(\text{String})\, P(\text{Freebase Attribute})}$
Freebase attributes: (film.production_companies, /m/0kk9v), (film.film.produced_by, John Lasseter), (film.film.directed_by, John Lasseter), (film.film.film_festivals, m.0h15pp1)
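The PMI feature can be estimated from co-occurrence counts. The sketch below assumes the (string, attribute) pairs have already been extracted, for example by pairing each word in a category name with the Freebase attributes of that category's entities; the pair list is invented toy data and the extraction step itself is not shown.

```python
import math
from collections import Counter


def pmi_table(pairs: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """PMI(s, a) = log( P(s, a) / (P(s) * P(a)) ), estimated from pair counts."""
    n = len(pairs)
    joint = Counter(pairs)
    strings = Counter(s for s, _ in pairs)
    attrs = Counter(a for _, a in pairs)
    return {
        (s, a): math.log((count / n) / ((strings[s] / n) * (attrs[a] / n)))
        for (s, a), count in joint.items()
    }


# Hypothetical co-occurrence data (8 pairs in total).
PRODUCED = "film.production_companies=Pixar"
DIRECTED = "film.film.directed_by=John Lasseter"
pairs = (
    [("pixar", PRODUCED)] * 3
    + [("pixar", DIRECTED)]
    + [("films", PRODUCED)]
    + [("films", DIRECTED)] * 3
)

table = pmi_table(pairs)
print(round(table[("pixar", PRODUCED)], 3))  # 0.405, i.e. log(1.5)
print(round(table[("pixar", DIRECTED)], 3))  # -0.693, i.e. log(0.5)
```

In this toy data the word "pixar" co-occurs with the production-company attribute more often than chance predicts, so that pair gets a positive PMI, while the less associated pair gets a negative one.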

Features
Domain Independent Parse
  Parse Features: CCG Lexicon, Capitalization
  String -> Freebase features: Wikipedia Lexical Statistics
Ontology Match
  Surface Lexical Features: String Match, Stem Match
  KnowledgeBase Features

Outline 1. Task and Applications 2. Data 3. Semantic Parser with Partial Ontology 4. Learning 5. Evaluation

Experimental Setup
Training Set: 500 annotated Wikipedia categories
Test Set (Manual Evaluation): 500 unseen Wikipedia categories, 300 appositives
Baseline: SVM classifier trained with annotated logical forms

Applications
Referring Expression Resolution (QA)
  Input: Noun Phrase
    British playwright, novelist and short story writer
  Output: Somerset Maugham
Entity Attribute Extraction (IE)
  Input: (Entity, Noun Phrase)
    Somerset Maugham, British playwright, novelist and short story writer
  Output: Entity attributes for Freebase
    (S. Maugham, Nationality, U.K)
    (S. Maugham, Profession, Novelist)
    (S. Maugham, Profession, Playwright)

Evaluation Metric: Referring Expression Resolution
Alternative Rock Groups from Nevada
Gold: music group(x), music.artist.origin(x, NEVADA), music.genre.artist(alternative rock, x)
Output: music.artist.origin(x, NEVADA), music.genre.artist(hard rock, x)
Precision: 0.5  Recall: 0.3  F1: 0.375  Exact Match: False
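The metric on this slide can be reproduced by treating the gold and output logical forms as sets of predicates. This is a minimal sketch under that assumption; the string encoding of predicates is only for illustration. Computed without intermediate rounding, recall is 1/3 and F1 is 0.4; the slide's F1 of 0.375 is what results when recall is first rounded to 0.3.

```python
def prf(gold: set[str], predicted: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 over predicates."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = {
    "music group(x)",
    "music.artist.origin(x, NEVADA)",
    "music.genre.artist(alternative rock, x)",
}
output = {
    "music.artist.origin(x, NEVADA)",
    "music.genre.artist(hard rock, x)",
}

precision, recall, f1 = prf(gold, output)
exact_match = gold == output
print(precision, round(recall, 3), round(f1, 3), exact_match)  # 0.5 0.333 0.4 False

# F1 recomputed from the slide's rounded recall of 0.3.
f1_rounded = 2 * 0.5 * 0.3 / (0.5 + 0.3)
print(round(f1_rounded, 3))  # 0.375
```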

Referring Expression Resolution (5-fold cross validation on Training Set)
System                        Exact Match (%)   F1 (%)
Baseline                      6.8               28.6
Our System                    15.9              35.1
Without OpenSchema            13.7              31.1
Without Lexical Statistics    11                21.6
KCAZ13                        1.4               7.06

Applications
Referring Expression Resolution (QA)
  Input: Noun Phrase
    British playwright, novelist and short story writer
  Output: Somerset Maugham
Entity Attribute Extraction (IE)
  Input: (Entity, Noun Phrase)
    Somerset Maugham, British playwright, novelist and short story writer
  Output: Entity attributes for Freebase
    (S. Maugham, Nationality, U.K)
    (S. Maugham, Profession, Novelist)
    (S. Maugham, Profession, Playwright)

Entity Attribute Extraction (5-fold cross validation on Training Set)
[Bar chart: Precision, Recall, and F1 for Baseline vs. Our System. Bar labels as extracted: 44.2, 37.3, 37.7, 26.5, 32.8, 30.6]

Entity Attribute Extraction: Test Result
[Bar chart: Baseline vs. Our System on the Wikipedia and Appositive test sets. Bar labels as extracted: 56.7, 61.2, 33.2, 4.9]

Error analysis
10%: named entity retrieval failure
10%: spurious lexical match
10%: looking at a different domain, e.g. stage actor to film.actor
15%: wrong underspecified logical form
30%: mapping to a superset or subset, e.g. novel to book

Entity Attribute Extraction: Test Result
[Bar chart: Baseline vs. Our System on the Wikipedia and Appositive test sets. Bar labels as extracted: 72.6%, 58.7%, 61.9%, 13.9%]
7 million category-entity pairs; given 66% missing facts, 12 million new facts!

Contributions
Introduce large-scale semantic parsing datasets
Partial grounding to a large knowledge base
Learn from two kinds of supervision: large-scale co-occurrence statistics and a small labeled tuning set

Future work
More compositional and complicated structures: orders, comparison, min, max, range
Extending to declarative sentences, e.g.: Barack Hussein Obama is the 44th and current President of the United States, the first African American to hold the office.

Questions?

The Number of Extracted Facts
[Bar chart: Baseline vs. Our System on Wikipedia (Dev), Wikipedia (Test), and Appositive. Bar labels as extracted: 1.6, 1.9, 1.6, 2, 1.3, 0.9]

Referring Expression Resolution: Test Set
[Bar chart: Exact Match (%) for IE Baseline vs. Our System on Wikipedia and Appositive. Bar labels as extracted: 21.8, 28.4, 0, 4.7]