Word Sense Disambiguation in Queries. Shaung Liu, Clement Yu, Weiyi Meng

Objectives
(1) For each content word in a query, find its sense (meaning).
(2) Add terms (synonyms, hyponyms, etc. of the determined sense) to the query so as to improve retrieval effectiveness.

Example
Query: Recycling automobile tire
Recycling: sense 1: cause to repeat a cycle; sense 2: use again after processing. Disambiguated to sense 2. A synonym: reuse.
Automobile tire has a unique sense. A synonym: car tire.
Generated phrases: reuse automobile tire, reuse car tire, recycle car tire

Our approach to determining the sense of a content word t1
Find a phrase in the query containing t1. Let the phrase be (t1, t2).
Each ti, i = 1, 2, has synonym sets, their definitions, hyponym sets, and their definitions.
The sense of t1 is determined by comparing these 4 pieces of information against those of t2.

Comparison of information of t1 against that of t2
t1              t2
synonym         synonym
def(synonym)    def(synonym)
hyponym         hyponym
def(hyponym)    def(hyponym)

An example
Phrase in query: philosophy Stoicism
A synonym of one sense, S1, of "philosophy" is "philosophical system".
The definition of one sense, S2, of "Stoicism" contains "philosophical system".
Thus, the sense of "philosophy" is S1 and that of "Stoicism" is S2.

Another example
Query: induction, deduction
The definition of one sense, S1, of "induction" and that of one sense, S2, of "deduction" have the common words "reasoning" and "general".
Thus, the sense of "induction" is determined to be S1 and that of "deduction" is determined to be S2.
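The comparison used in the two examples above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the sense inventory is hand-made (it is not real WordNet output), each piece of information is simplified to a set of words or phrases, and the names `PIECES` and `matching_cases` are hypothetical.

```python
from itertools import product

# The four pieces of information kept for every sense of a term.
PIECES = ("syn", "def_syn", "hypo", "def_hypo")

def matching_cases(senses_t1, senses_t2):
    """Compare every sense of t1 with every sense of t2.

    For each sense pair, each piece of t1 is checked against each piece
    of t2 (4 x 4 combinations). A combination counts as a hit when the
    two sets share at least one word or phrase.
    """
    hits = []
    for (s1, info1), (s2, info2) in product(senses_t1.items(), senses_t2.items()):
        for p1, p2 in product(PIECES, PIECES):
            shared = info1.get(p1, set()) & info2.get(p2, set())
            if shared:
                hits.append((s1, s2, p1, p2, shared))
    return hits

# Hypothetical toy data modelled on the "philosophy Stoicism" example.
philosophy = {
    "S1": {"syn": {"philosophical system"}, "def_syn": {"belief", "doctrine"}},
    "S3": {"syn": {"attitude"}, "def_syn": {"personal", "outlook"}},
}
stoicism = {
    "S2": {"syn": {"stoicism"}, "def_syn": {"philosophical system", "zeno"}},
    "S4": {"syn": {"stolidity"}, "def_syn": {"indifference", "pain"}},
}

hits = matching_cases(philosophy, stoicism)
for s1, s2, p1, p2, shared in hits:
    print(f"philosophy={s1}, Stoicism={s2} via ({p1}, {p2}): {sorted(shared)}")
```

With this toy data the only hit pairs sense S1 of "philosophy" with sense S2 of "Stoicism", through a synonym of the former appearing in a definition of the latter.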

What happens if multiple senses of a content word are obtained?
t1           t2
Syn          Syn
Def(Syn)     Def(Syn)
Hypo         Hypo
Def(Hypo)    Def(Hypo)
Comparing each of the 4 items of t1 against each of the 4 items of t2 gives 16 cases.
Two or more cases may yield different senses.

Resolving multiple senses
2 key parameters:
(1) Historical accuracies of the cases: determined by experiments
(2) Likelihood that a word has a given sense: given by WordNet (frequency)
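The slides name the two parameters but do not say how they are combined. The sketch below shows one possible combination, purely as an assumption: each case that fires votes for its sense with a weight equal to the case's historical accuracy times the sense's frequency-based likelihood. The function `resolve` and all numbers are hypothetical.

```python
from collections import defaultdict

def resolve(candidates, case_accuracy, sense_frequency):
    """Pick one sense when different cases propose different senses.

    candidates      -- list of (case, sense) pairs; a case is (piece_t1, piece_t2)
    case_accuracy   -- historical accuracy of each case (assumed values)
    sense_frequency -- likelihood of each sense (assumed values)

    ASSUMPTION: the weight of a vote is accuracy * frequency. The source
    only says both parameters are used, not how.
    """
    scores = defaultdict(float)
    for case, sense in candidates:
        scores[sense] += case_accuracy[case] * sense_frequency[sense]
    return max(scores, key=scores.get)

# Hypothetical numbers for illustration only.
case_accuracy = {("syn", "def_syn"): 0.9, ("def_syn", "def_syn"): 0.6}
sense_frequency = {"S1": 0.7, "S2": 0.3}
candidates = [(("syn", "def_syn"), "S2"), (("def_syn", "def_syn"), "S1")]

best = resolve(candidates, case_accuracy, sense_frequency)
print(best)
```

Here the less reliable case still wins because its sense is much more frequent, which shows why both parameters matter under this assumed weighting.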

What happens if the technique yields no sense?
(1) Choose the most likely sense, if it has at least a 50% chance of being correct.
(2) Otherwise, use Web search to determine the sense.
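The fallback order described above can be written as a short decision function. This is a sketch under stated assumptions: `case_sense` stands for the result of the case analysis (or `None` when it yields nothing), `sense_probs` holds frequency-based likelihoods, and `web_lookup` is a hypothetical callable standing in for the Web-search step.

```python
def choose_sense(case_sense, sense_probs, web_lookup, threshold=0.5):
    """Return (sense, method) following the three-step fallback.

    1. Use the sense from case analysis if there is one.
    2. Otherwise take the most likely sense if its likelihood is at
       least `threshold` (50% in the slides).
    3. Otherwise defer to Web search.
    """
    if case_sense is not None:
        return case_sense, "case"
    likely = max(sense_probs, key=sense_probs.get)
    if sense_probs[likely] >= threshold:
        return likely, "frequency"
    return web_lookup(), "web"

# Hypothetical inputs for illustration only.
print(choose_sense("S2", {"S1": 0.8, "S2": 0.2}, lambda: "S3"))
print(choose_sense(None, {"S1": 0.8, "S2": 0.2}, lambda: "S3"))
print(choose_sense(None, {"S1": 0.4, "S2": 0.35, "S3": 0.25}, lambda: "S3"))
```

Passing the Web step as a callable keeps the network-dependent part out of the decision logic, so the first two branches can be tested offline.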

Web search to determine the sense of a term t
Suppose t has two senses.
From the definition of each sense of t, form a vector of content words, say V1, V2.
Submit the query containing t to Google.
From the top 20 documents, extract the content words around t to form a vector V.
Choose sense i if sim(V, Vi) is maximum.
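The final selection step can be sketched as follows. The slides do not define sim, so cosine similarity over bag-of-words counts is used here as an assumption. No search engine is called: `context_words` stands in for the content words extracted around t from the retrieved documents, and the term "bank" and its two definitions are hypothetical toy data.

```python
import math
from collections import Counter

def vector(words):
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(w.lower() for w in words)

def cosine(a, b):
    """Cosine similarity between two Counter vectors (assumed choice of sim)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pick_sense(context_words, sense_definitions):
    """Return the sense whose definition vector Vi is most similar to V."""
    v = vector(context_words)
    sims = {s: cosine(v, vector(words)) for s, words in sense_definitions.items()}
    return max(sims, key=sims.get), sims

# Hypothetical definitions for the two senses of a toy term "bank".
sense_definitions = {
    "sense1": ["financial", "institution", "deposits", "money"],
    "sense2": ["sloping", "land", "river", "water"],
}
# Stand-in for content words found around the term in the top documents.
context_words = ["river", "water", "fishing", "erosion"]

best, sims = pick_sense(context_words, sense_definitions)
print(best, sims)
```

Any other vector similarity could be substituted for `cosine` without changing the rest of the sketch.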

Experimental results
TREC 2004 queries, robust track
250 queries
258 unique-sense terms, 333 ambiguous terms

                 Case     Frequency   Web
Applicability    65%      30%         5%
Accuracy         89.4%    93%         81%
Overall accuracy: 90%
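As a check on the figures above, the overall accuracy is consistent with an applicability-weighted average of the three per-method accuracies, reading the three columns as case analysis, frequency-based guess, and Web search. The variable names are illustrative.

```python
# Share of terms handled by each method, and accuracy of each method,
# as reported in the table above.
applicability = {"case": 0.65, "frequency": 0.30, "web": 0.05}
accuracy = {"case": 0.894, "frequency": 0.93, "web": 0.81}

# Weighted average: 0.65*0.894 + 0.30*0.93 + 0.05*0.81
overall = sum(applicability[m] * accuracy[m] for m in applicability)
print(f"{overall:.1%}")  # prints 90.1%
```

The applicabilities sum to 100%, which matches the later claim of full coverage.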

Similarity function of our system
Similarity(Q, D) = (phrase similarity, term similarity)
phrase similarity = sum of idfs of phrases
term similarity = Okapi similarity
D1 is ranked ahead of D2 if phrase-sim1 > phrase-sim2, or if phrase-sim1 = phrase-sim2 and term-sim1 > term-sim2.
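The ranking rule above is a lexicographic ordering on the pair (phrase similarity, term similarity), which maps directly onto tuple comparison. The sketch below assumes both scores are already computed; the document IDs and score values are made up for illustration.

```python
def rank(docs):
    """Sort documents by phrase similarity first, term similarity second.

    Python compares tuples element by element, so term_sim only matters
    when two documents have equal phrase_sim.
    """
    return sorted(docs, key=lambda d: (d["phrase_sim"], d["term_sim"]), reverse=True)

# Hypothetical scores for illustration only.
docs = [
    {"id": "D1", "phrase_sim": 2.0, "term_sim": 5.1},
    {"id": "D2", "phrase_sim": 3.5, "term_sim": 1.2},
    {"id": "D3", "phrase_sim": 2.0, "term_sim": 7.4},
]

ranked_ids = [d["id"] for d in rank(docs)]
print(ranked_ids)  # prints ['D2', 'D3', 'D1']
```

D2 is ranked first despite the lowest term similarity because phrase similarity dominates; D3 precedes D1 only through the tie-break.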

Recognition of phrases in queries
A phrase, say p, is recognized in a query as
(a) named entity: e.g. name of a person, or
(b) dictionary phrase: in WordNet, or
(c) simple phrase: containing two words, or
(d) complex phrase: more than 2 words

Recognition of phrases in documents
A phrase p, say (term 1, term 2), appears in a document if the terms are within a certain distance.
named entity: terms need to be adjacent
dictionary phrase: terms within distance d1
simple phrase: terms within distance d2
complex phrase: terms within distance d3
d1 < d2 < d3; d1, d2, d3 determined by decision tree
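A distance-based phrase test of this kind can be sketched as below. Several points are assumptions: distance is measured as the difference in token positions (so adjacent terms have distance 1), term order is ignored, and the thresholds in `MAX_DISTANCE` are hypothetical placeholders, since the slides do not give the values of d1, d2, d3.

```python
# Hypothetical thresholds; the source only states d1 < d2 < d3.
MAX_DISTANCE = {"named_entity": 1, "dictionary": 3, "simple": 5, "complex": 8}

def phrase_in_document(tokens, term1, term2, max_distance):
    """True if term1 and term2 occur within max_distance token positions."""
    pos1 = [i for i, t in enumerate(tokens) if t == term1]
    pos2 = [i for i, t in enumerate(tokens) if t == term2]
    return any(abs(i - j) <= max_distance for i in pos1 for j in pos2)

tokens = "the plant began recycling every used automobile tire".split()

# "automobile tire" treated as adjacent terms.
print(phrase_in_document(tokens, "automobile", "tire", MAX_DISTANCE["named_entity"]))
# "recycling ... tire" are 4 positions apart: found with the simple-phrase
# window, not with the tighter dictionary-phrase window.
print(phrase_in_document(tokens, "recycling", "tire", MAX_DISTANCE["simple"]))
print(phrase_in_document(tokens, "recycling", "tire", MAX_DISTANCE["dictionary"]))
```

The example shows why the looser window for simple phrases recovers matches that a strict adjacency test would miss.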

Impact of WSD on effectiveness
Query set    No-WSD    WSD    Improvement
TREC6        .28       .32    17%
TREC7        .25       .31    22.6%
TREC8        .29       .32    11.4%
TREC12       .37       .41    10.5%
TREC13       .38       .42    10%
Hard 50      .18       .20    14.7%
Old 200      .30       .34    14.9%
Overall      .31       .35    13.7%
(previous best known result: .33)

Summary
Utilizes 3 methods for word sense disambiguation: case analysis, guessing based on frequency, and Web search.
Yields 100% coverage and 90% accuracy.
Improves retrieval effectiveness.

Comparison with other word sense disambiguation algorithms
Earlier work mostly disambiguates words in documents rather than in queries.
Previous best result is around 71% accuracy.

Conclusion
Accuracy of our current system is around 90%.
Yields improvement in retrieval effectiveness.
Will attempt to improve both accuracy in word sense disambiguation and retrieval effectiveness.