Detect Missing Attributes for Entities in Knowledge Bases via Hierarchical Clustering


Bingfeng Luo, Huanquan Lu, Yigang Diao, Yansong Feng and Dongyan Zhao
ICST, Peking University

Motivations
Entities often have missing attributes. In human-made KBs the cause is human negligence; in automatically constructed KBs it is incomplete source data and imperfect extraction algorithms.
Detecting missing attributes is useful: we can present possible missing attributes to the editors of open KBs such as Wikipedia, and we can rescore candidate triples proposed by relation extraction tools.
Example, Taylor Swift (Wikipedia): Occupation: singer; Instrument: vocal, guitar; Genre: pop, country, rock; Born: Pennsylvania. The attribute Record Label is missing, although she is signed to Big Machine.

Overview
Basic idea: entities in the same category tend to share some common attributes.
Algorithm framework: build a cluster system over the entities in the KB, then apply the basic idea within each cluster to find missing attributes.
Example: Taylor Swift (Occupation: singer; Instrument: vocal, guitar; Genre: pop, country, rock; Born: Pennsylvania) falls in the Pop singer cluster together with Michael Jackson, Justin Bieber, Lady Gaga and Robin Thicke. Most pop singers have the attribute Record Label, so Taylor Swift should have it too, and its absence marks Record Label as missing.
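A minimal sketch of the basic idea inside a single cluster, assuming each entity is reduced to the set of its attribute names and that "most members" means more than a hypothetical fraction `min_ratio` (the slides give no cut-off). The attribute sets are toy data, not DBpedia records.

```python
from collections import Counter
from typing import List, Set


def propose_missing(entity_attrs: Set[str], cluster: List[Set[str]],
                    min_ratio: float = 0.5) -> List[str]:
    """Return attributes held by more than `min_ratio` of the cluster
    members but absent from the target entity, sorted by name."""
    counts = Counter(attr for member in cluster for attr in member)
    n = len(cluster)
    return sorted(attr for attr, c in counts.items()
                  if attr not in entity_attrs and c / n > min_ratio)


# Toy attribute sets for four members of a "pop singer" cluster.
pop_singers = [
    {"Occupation", "Genre", "Record Label"},
    {"Occupation", "Genre", "Record Label"},
    {"Occupation", "Genre", "Record Label", "Instrument"},
    {"Occupation", "Genre"},
]
taylor_swift = {"Occupation", "Instrument", "Genre", "Born"}
print(propose_missing(taylor_swift, pop_singers))  # ['Record Label']
```

Three of the four toy members have Record Label (0.75 > 0.5), so it is proposed; Instrument is held by only one member and Taylor Swift already has it anyway.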

Building the Clustering System: Entity Representation
Each entity has several (attribute, value) pairs. The value of each attribute can be represented as a vector (explained on the next slide), so each entity becomes a set of vectors, one vector per attribute value.
Example: Taylor Swift (Occupation: singer; Instrument: vocal, guitar; Genre: pop, country, rock; Born: Pennsylvania) becomes
Occupation: [1.1, -0.3, 0.4]
Instrument: [0.1, 0.2, -0.4], [-0.4, 1.0, 0.9]
Genre: [0.2, 0.3, -0.5], [0.4, 0.8, 0.9], [0.3, 0.9, -0.8]
Born: [0.1, 0.3, 0.2]
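The representation above maps naturally onto a dictionary from attribute name to a list of value vectors. This is a sketch using the toy 3-dimensional vectors from the slide; the type names are illustrative, and the pairing of vectors with values assumes they are listed in the same order as the values.

```python
from typing import Dict, List

Vector = List[float]
# Each attribute maps to one vector per value, so an entity is a set of
# vectors grouped by attribute.
Entity = Dict[str, List[Vector]]

taylor_swift: Entity = {
    "Occupation": [[1.1, -0.3, 0.4]],                                # singer
    "Instrument": [[0.1, 0.2, -0.4], [-0.4, 1.0, 0.9]],              # vocal, guitar
    "Genre": [[0.2, 0.3, -0.5], [0.4, 0.8, 0.9], [0.3, 0.9, -0.8]],  # pop, country, rock
    "Born": [[0.1, 0.3, 0.2]],                                       # Pennsylvania
}


def all_vectors(entity: Entity) -> List[Vector]:
    """Flatten an entity into the plain collection of its value vectors."""
    return [vec for vectors in entity.values() for vec in vectors]


print(len(all_vectors(taylor_swift)))  # 7
```

Keeping the vectors grouped by attribute matters for the next steps, because clustering is done within each attribute rather than over the flattened set.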

Building the Clustering System: How to Acquire Attribute Value Vectors
Numeric and date values are not very useful for clustering: clustering heights such as 1.70, 1.71 and 1.80 directly gives no useful clusters, while useful clusters such as tall, medium and short would need human assistance to define. We therefore consider only string values when clustering.
We use word2vec to convert the words or phrases in an attribute value into vectors; if a value contains several words or phrases, we simply average their vectors.
Example: of Birth Date: 1982.07.24, Height: 1.75 and Birth Place: Beijing, only Beijing is passed to word2vec, giving a vector such as [1.32, 0.43, -0.83, ..., 0.55].
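The value-to-vector step can be sketched as: skip numeric and date values, look up each token, and average. This sketch assumes whitespace tokenization and a regular expression as the numeric/date test; `TOY_EMBEDDINGS` is a hypothetical stand-in for a trained word2vec model (a gensim `KeyedVectors` object, which supports `in` and `[]` lookups, should also work as the `embeddings` argument).

```python
import re
from typing import Dict, List, Optional

# Hypothetical toy embeddings standing in for a trained word2vec model.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "pop": [0.4, 0.8, 0.9],
    "rock": [0.2, 0.4, -0.1],
    "beijing": [1.32, 0.43, -0.83],
}

# Values made only of digits and separators (1.75, 1982.07.24) are treated
# as numeric or date values and skipped.
NUMERIC_OR_DATE = re.compile(r"[\d.\-/:]+")


def value_vector(value: str, embeddings: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the embeddings of the whitespace-separated tokens in a string value.

    Returns None for numeric/date values or when no token has an embedding.
    """
    if NUMERIC_OR_DATE.fullmatch(value.strip()):
        return None
    vectors = [embeddings[tok] for tok in value.lower().split() if tok in embeddings]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


print(value_vector("1982.07.24", TOY_EMBEDDINGS))  # None
print(value_vector("Beijing", TOY_EMBEDDINGS))     # [1.32, 0.43, -0.83]
```

A multi-word value such as "pop rock" would come out as the average of the two token vectors, roughly [0.3, 0.6, 0.4].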

Building the Clustering System: How to Cluster Entities?
Clustering entities directly is hard, since each entity contains many vectors. Instead, we cluster the attribute values within each attribute, using the SCAN clustering algorithm (which allows overlapping clusters). Each cluster is represented by the average of its members' vectors.
Example (Occupation):
artist = 1/2 (singer [1.0, 0.8, 0.9] + director [0.9, 0.6, 0.9]) = [0.95, 0.7, 0.9]
scientist = 1/3 (physicist [-1.0, -0.5, 0.1] + chemist [-1.0, -0.4, 0.1] + biologist [-1.0, -0.3, 0.0]) = [-1.0, -0.4, 0.07]
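The representative vectors in the example are plain component-wise averages. This sketch recomputes them; it covers only the averaging step, not the SCAN clustering step itself.

```python
from typing import List


def centroid(vectors: List[List[float]]) -> List[float]:
    """Component-wise mean: the representative vector of a cluster."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


artist = centroid([[1.0, 0.8, 0.9],       # singer
                   [0.9, 0.6, 0.9]])      # director
scientist = centroid([[-1.0, -0.5, 0.1],  # physicist
                      [-1.0, -0.4, 0.1],  # chemist
                      [-1.0, -0.3, 0.0]]) # biologist
print([round(x, 2) for x in artist])     # [0.95, 0.7, 0.9]
print([round(x, 2) for x in scientist])  # [-1.0, -0.4, 0.07]
```

The second component of scientist is (-0.5 - 0.4 - 0.3) / 3 = -0.4, which is why the example lists it as negative.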

Building the Clustering System: Clustering Within an Attribute
We keep clustering the cluster vectors to form a hierarchical structure; on average the clustering system is about 4 layers high.
Figure: Occupation hierarchy. singer [1.0, 0.8, 0.9] and director [0.9, 0.6, 0.9] merge into a cluster at [0.95, 0.7, 0.9]; physicist, chemist and biologist merge into a cluster at [-1.0, -0.4, 0.07]; these clusters are merged again at higher levels.
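The slides do not say how the higher levels are formed beyond "keep clustering". As a stand-in, this sketch repeatedly applies a simple greedy cosine-similarity grouping (not SCAN, and without overlap: each node joins exactly one group) and represents each new cluster by the average of its members. The thresholds are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

Node = Tuple[FrozenSet[str], List[float]]  # (leaf values covered, cluster vector)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def centroid(vectors: List[List[float]]) -> List[float]:
    return [sum(c) / len(vectors) for c in zip(*vectors)]


def cluster_level(nodes: List[Node], threshold: float) -> List[Node]:
    """One greedy pass: each node joins the first group whose seed vector
    has cosine similarity >= threshold, otherwise it starts a new group."""
    groups: List[Tuple[List[float], List[Node]]] = []
    for node in nodes:
        for seed, members in groups:
            if cosine(node[1], seed) >= threshold:
                members.append(node)
                break
        else:
            groups.append((node[1], [node]))
    return [(frozenset().union(*(m[0] for m in members)),
             centroid([m[1] for m in members]))
            for _, members in groups]


def build_hierarchy(values: Dict[str, List[float]], thresholds: List[float]) -> List[List[Node]]:
    """Cluster repeatedly with (typically decreasing) thresholds; levels[0] holds the leaves."""
    levels = [[(frozenset([name]), vec) for name, vec in values.items()]]
    for threshold in thresholds:
        if len(levels[-1]) == 1:
            break
        upper = cluster_level(levels[-1], threshold)
        if len(upper) < len(levels[-1]):  # keep a level only if something merged
            levels.append(upper)
    return levels


occupation = {
    "singer": [1.0, 0.8, 0.9], "director": [0.9, 0.6, 0.9],
    "physicist": [-1.0, -0.5, 0.1], "chemist": [-1.0, -0.4, 0.1],
    "biologist": [-1.0, -0.3, 0.0],
}
levels = build_hierarchy(occupation, thresholds=[0.95, 0.5])
for members, vec in levels[1]:
    print(sorted(members), [round(x, 2) for x in vec])
# ['director', 'singer'] [0.95, 0.7, 0.9]
# ['biologist', 'chemist', 'physicist'] [-1.0, -0.4, 0.07]
```

With these toy vectors the artist and scientist clusters point in opposite directions, so the second threshold merges nothing and the hierarchy stops at two levels; real data with more values would produce deeper trees.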

Building the Clustering System: Clustering Within Each Attribute
A separate clustering hierarchy is built within each attribute.
Figure: clustering hierarchies for Occupation and Instrument, with value vectors at the leaves and cluster vectors at the inner nodes.

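Building the Clustering System: Assigning Entities to Clusters
Each entity is assigned to clusters according to its attribute values.
Example: Taylor Swift (Occupation: singer; Instrument: guitar, vocal; Genre: pop, country, rock; Born: Pennsylvania). Her value singer places her in the Occupation hierarchy, and her value guitar places her in the Instrument hierarchy.
Figure: Occupation hierarchy (singer and director under artist; physicist and biologist under scientist) and Instrument hierarchy (violin and erhu under string instrument; piano and accordion under clavier).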
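One way to read "assign entities to clusters according to their attribute values" is: put the entity into the cluster of each of its values and into every ancestor of that cluster. This sketch assumes the hierarchy is available as a hypothetical child-to-parent map keyed by (attribute, name); the links below, including guitar under string instrument, are illustrative.

```python
from typing import Dict, List, Optional, Set, Tuple

ClusterId = Tuple[str, str]  # (attribute, value or cluster name)

# Hypothetical child -> parent links read off the clustering hierarchies.
PARENT: Dict[ClusterId, ClusterId] = {
    ("Occupation", "singer"): ("Occupation", "artist"),
    ("Occupation", "director"): ("Occupation", "artist"),
    ("Instrument", "violin"): ("Instrument", "string instrument"),
    ("Instrument", "guitar"): ("Instrument", "string instrument"),
    ("Instrument", "piano"): ("Instrument", "clavier"),
}


def clusters_for(entity: Dict[str, List[str]],
                 parent: Dict[ClusterId, ClusterId]) -> Set[ClusterId]:
    """All clusters reached from the entity's values by walking up the hierarchy."""
    found: Set[ClusterId] = set()
    for attr, values in entity.items():
        for value in values:
            node: Optional[ClusterId] = (attr, value)
            while node is not None and node not in found:
                found.add(node)
                node = parent.get(node)
    return found


taylor_swift = {"Occupation": ["singer"], "Instrument": ["guitar", "vocal"]}
for cluster in sorted(clusters_for(taylor_swift, PARENT)):
    print(cluster)
```

Keying clusters by attribute keeps, for example, an Occupation cluster and a Genre cluster with the same name apart.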

Building the Clustering System: Intersections of Clusters
The intersection of different clusters is also meaningful. We keep such a new cluster only when it is large enough, i.e., when it contains a fair number of entities.
Within an attribute: Genre: country and Genre: pop intersect to form the cluster country pop.
Between attributes: Genre: pop and Occupation: singer intersect to form the cluster pop singer.
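A sketch of the intersection step, assuming each cluster is stored as a set of entity ids and only pairwise intersections are formed (the slides do not say whether larger combinations are used). The entity ids and `min_size` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def intersect_clusters(clusters: Dict[str, Set[str]], min_size: int) -> Dict[str, Set[str]]:
    """Form pairwise intersections of clusters (within or between attributes)
    and keep only those with at least `min_size` entities."""
    new_clusters: Dict[str, Set[str]] = {}
    for (name_a, ents_a), (name_b, ents_b) in combinations(sorted(clusters.items()), 2):
        common = ents_a & ents_b
        if len(common) >= min_size:
            new_clusters[f"{name_a} & {name_b}"] = common
    return new_clusters


clusters = {
    "Genre: pop": {"e1", "e2", "e3", "e4"},
    "Genre: country": {"e3", "e4", "e5"},
    "Occupation: singer": {"e1", "e2", "e3", "e6"},
}
for name, members in intersect_clusters(clusters, min_size=2).items():
    print(name, sorted(members))
```

Here country pop (e3, e4) and pop singer (e1, e2, e3) survive, while country singer contains only e3 and is dropped.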

Detecting Missing Attributes: Old Entities
Old entities already existed in the KB when the cluster system was built, so they have already been assigned to clusters, and we simply apply the basic idea.
Example: Taylor Swift (Occupation: singer; Instrument: vocal, guitar; Genre: pop, country, rock; Born: Pennsylvania) belongs to the Pop singer cluster with Michael Jackson, Justin Bieber, Lady Gaga and Robin Thicke. Most pop singers have the attribute Record Label, so Taylor Swift should have it too, and Record Label is proposed as missing.
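An old entity usually belongs to several clusters, so the proposals need some score. The slides do not define it; this sketch assumes an attribute's score is the largest fraction of other members holding it in any of the entity's clusters, a hypothetical threshold of 0.5, and the cap of 10 proposals mentioned for the human evaluation. The KB contents are toy data.

```python
from collections import Counter
from typing import Dict, List, Set


def score_missing(entity: str, kb: Dict[str, Set[str]], entity_clusters: List[str],
                  cluster_members: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score each attribute the entity lacks by the largest fraction of
    other members holding it in any cluster the entity belongs to."""
    scores: Dict[str, float] = {}
    for cluster in entity_clusters:
        others = [e for e in cluster_members[cluster] if e != entity]
        if not others:
            continue
        counts = Counter(attr for e in others for attr in kb[e])
        for attr, c in counts.items():
            if attr not in kb[entity]:
                scores[attr] = max(scores.get(attr, 0.0), c / len(others))
    return scores


def propose(scores: Dict[str, float], threshold: float = 0.5, limit: int = 10) -> List[str]:
    """Attributes scoring above the threshold, best first, at most `limit`."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [attr for attr, score in ranked if score > threshold][:limit]


kb = {
    "taylor_swift": {"Occupation", "Instrument", "Genre", "Born"},
    "e1": {"Occupation", "Genre", "Record Label", "Born"},
    "e2": {"Occupation", "Genre", "Record Label"},
    "e3": {"Occupation", "Genre", "Record Label", "Spouse"},
}
members = {"pop singer": {"taylor_swift", "e1", "e2", "e3"}}
scores = score_missing("taylor_swift", kb, ["pop singer"], members)
print(propose(scores))  # ['Record Label']
```

Taking the maximum over clusters lets one tight, specific cluster (such as pop singer) drive a proposal even if a broad cluster dilutes it; averaging would be an equally plausible choice.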

Detecting Missing Attributes: New Entities
New entities did not exist in the KB when the cluster system was built, so we first find clusters for them by vector similarity matching.
Example: Justin Bieber (Occupation: singer; Instrument: vocal, guitar; Genre: pop, R&B; Born: London) is represented as
Occupation: [1.1, -0.3, 0.4]
Instrument: [0.1, 0.2, -0.4], [-0.4, 1.0, 0.9]
Genre: [0.2, 0.3, -0.5], [0.1, 0.8, 0.2]
Born: [-0.1, 0.8, 0.9]
Within Occupation, the vector [1.1, -0.3, 0.4] matches the cluster singer, so Justin Bieber should also belong to the cluster artist and to every other cluster that contains the cluster singer.
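The matching step can be sketched as a nearest-neighbour lookup within one attribute followed by a walk up the hierarchy. This assumes cosine similarity, a hypothetical minimum similarity `min_sim`, and a toy name-to-parent map; the known values mix vectors from different slides purely for illustration.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def match_value(vec: List[float], known: Dict[str, List[float]],
                min_sim: float = 0.9) -> Optional[str]:
    """Most similar known value of the same attribute, or None if none reaches min_sim."""
    best, best_sim = None, min_sim
    for name, known_vec in known.items():
        sim = cosine(vec, known_vec)
        if sim >= best_sim:
            best, best_sim = name, sim
    return best


def with_ancestors(cluster: str, parent: Dict[str, str]) -> List[str]:
    """The matched cluster followed by every cluster that contains it."""
    chain = [cluster]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


occupation_values = {"singer": [1.1, -0.3, 0.4], "physicist": [-1.0, -0.5, 0.1]}
occupation_parent = {"singer": "artist", "physicist": "scientist"}

match = match_value([1.1, -0.3, 0.4], occupation_values)  # Justin Bieber's Occupation vector
if match is not None:
    print(with_ancestors(match, occupation_parent))  # ['singer', 'artist']
```

Once the clusters are found this way, the new entity can be scored exactly like an old one.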

Summary
1. Represent each entity as a set of vectors (e.g., Taylor Swift: Occupation: [1.1, -0.3, 0.4]; Instrument: [0.1, 0.2, -0.4], [-0.4, 1.0, 0.9]; Genre: [0.2, 0.3, -0.5], [0.4, 0.8, 0.9], [0.3, 0.9, -0.8]; Born: [0.1, 0.3, 0.2]).
2. Build a clustering system within each attribute (e.g., Occupation, Instrument) and intersect these clusters.
3. Assign clusters to entities (e.g., the generated Pop singer cluster contains Michael Jackson, Justin Bieber, Lady Gaga and Robin Thicke).
4. Detect missing attributes based on the basic idea (e.g., the attribute Record Label is missing for Taylor Swift).

Experiment
Dataset: 20,000 randomly sampled person entities in DBpedia.
Evaluation, first method: randomly delete one attribute of an entity and check whether our method finds it again. For example, delete Record Label: Big Machine from Taylor Swift (Occupation: singer; Instrument: vocal, guitar; Genre: pop, country, rock; Born: Pennsylvania) and check whether the algorithm proposes Record Label.
Evaluation, second method: human evaluation, judging whether the proposed missing attributes are reasonable. For example, the attributes proposed for Taylor Swift (Record Label, Birth date, and Allegiance, an attribute of military people) are each judged as reasonable or not.

Experiment: Comparison with Z. Abedjan and F. Naumann (2013), First Evaluation Method
Top k means the algorithm proposes its k most probable missing attributes; a test case counts as a match if the deleted attribute is among them. The metric is precision, the fraction of test cases that are matches.

Method              Top 1    Top 5    Top 10
Our method          84.43%   95.36%   96.05%
Abedjan & Naumann   N/A      51.00%   71.40%
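The first evaluation method can be sketched as two functions: hide one random attribute per entity, then count how often the hidden attribute appears in the top-k proposals. The seed, the rule of skipping entities with a single attribute, and the toy proposals are assumptions for illustration; in a real run the proposals would come from applying the method to the reduced KB.

```python
import random
from typing import Dict, List, Set, Tuple


def delete_one(kb: Dict[str, Set[str]], seed: int = 0) -> Tuple[Dict[str, Set[str]], Dict[str, str]]:
    """Hide one randomly chosen attribute per entity; return (reduced KB, hidden attribute)."""
    rng = random.Random(seed)
    reduced: Dict[str, Set[str]] = {}
    hidden: Dict[str, str] = {}
    for entity, attrs in kb.items():
        if len(attrs) < 2:
            continue  # assumption: leave at least one attribute behind
        gold = rng.choice(sorted(attrs))
        reduced[entity] = attrs - {gold}
        hidden[entity] = gold
    return reduced, hidden


def top_k_precision(proposals: Dict[str, List[str]], hidden: Dict[str, str], k: int) -> float:
    """Fraction of test entities whose hidden attribute is among their top-k proposals."""
    hits = sum(1 for e, gold in hidden.items() if gold in proposals.get(e, [])[:k])
    return hits / len(hidden)


# Toy ranked proposals (in practice produced by the method on the reduced KB).
hidden = {"a": "Record Label", "b": "Birth date"}
proposals = {"a": ["Record Label", "Birth date"], "b": ["Genre", "Spouse", "Birth date"]}
print(top_k_precision(proposals, hidden, 1), top_k_precision(proposals, hidden, 5))  # 0.5 1.0
```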

Experiment: Old Entities vs. New Entities, First Evaluation Method

Entities       Top 1    Top 5    Top 10
Old entities   84.43%   95.36%   96.05%
New entities   33.86%   45.45%   46.07%

This method may be unfair when most of the proposed missing attributes are reasonable but the deleted attribute has a lower rank. Example of an unfair case, top 5 proposals: 1. good proposal, 2. good proposal, 3. good proposal, 4. not good, 5. good proposal; the deleted attribute appears only at rank 18.

Experiment: Human Evaluation
We randomly chose 1,000 old entities and 1,000 new entities and proposed all attributes whose score is higher than the threshold (at most 10 per entity).

             Old entities   New entities
Precision    96.72%         97.09%

Conclusion
Performance: our method has high precision; more than 95% of the suggested attributes are judged reasonable, and we consider the method good enough for real-world use.
Future work: try Chinese data; combine our method with relation extraction.

Q & A