Creating Mindmaps of Documents

Creating Mindmaps of Documents
Using an Example of a News Surveillance System
Oskar Gross, Hannu Toivonen, Teemu Hynonen, Esther Galbrun
February 6, 2011

Outline
- Motivation
- Bisociation Network
- Tpf-Idf-Tpu Measure
- News Surveillance System
- Bisociations for Computational Creativity

Motivation
- Epic information overload
- Finding connections between concepts
- Discovering novel (hopefully interesting) connections

Bisociation Networks
- Networks constructed from item (in our case, term) pairs
- For example, consider the following set of item pairs: P = {(A, B), (A, C), (C, D), (D, A)}
- Treating items as nodes and drawing an undirected edge between each pair gives us a graph
[Figure: undirected graph on nodes A, B, C, D with edges A-B, A-C, C-D, D-A]
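A minimal sketch of how the pair set P can be turned into an undirected graph. The adjacency-set representation and the function name `build_graph` are assumptions made for illustration; the slides do not say how the network is stored.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected graph (node -> set of neighbours) from item pairs."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        # An undirected edge is stored in both directions.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


# The pair set P from the slide.
P = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "A")]
graph = build_graph(P)
```

Here `graph["A"]` holds the three neighbours B, C and D, while B is connected only to A.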

Text to Bisociation Network: Step 1 - Preprocessing
- Our goal is to apply this method to everyday texts
- Reasonable preprocessing is needed
- Wonderful Python package: NLTK
- HTML to plain text
- Named Entity Recognition
- Removing stopwords
- Stemming
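The preprocessing steps could look roughly like the sketch below. It uses only the standard library so that it runs without extra data files: the stopword list is a tiny illustrative stand-in (not NLTK's list), named entity recognition is omitted, and stemming is a pluggable function that defaults to doing nothing. In a real pipeline one could pass `nltk.stem.PorterStemmer().stem` as the `stem` argument. All function names here are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the text chunks and normalise whitespace.
    return " ".join(" ".join(parser.chunks).split())


# Illustrative stopword list only; an assumption, not the list used by the authors.
STOPWORDS = frozenset({"a", "an", "and", "the", "for", "you", "is", "to", "on", "at"})


def preprocess(html, stem=lambda word: word):
    """HTML -> lowercase words -> stopwords removed -> stemmed."""
    text = html_to_text(html)
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOPWORDS]


tokens = preprocess("<p>Thank you for the <b>dinner</b> and a very pleasant evening.</p>")
```

With the identity stemmer the words stay unstemmed; a Porter stemmer would reduce them further, for example "evening" to "even".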

Text to Bisociation Network: Step 2 - Creating Pairs
- Tokenize the document into sentences
- Sort the words in each sentence
- Remove duplicates
- Create pairs
Example: consider the following text:
  Thank you for the dinner and a very pleasant evening. Have your car take me to the airport. Mr Corleone is a man who insists on hearing bad news at once.
After preprocessing, this becomes:
  dinner even pleasant thank veri.
  airport car take.
  bad hear insist man mr corleon new onc.
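The pair-creation step for one sentence can be sketched as follows, assuming that "create pairs" means every unordered pair of distinct terms in the sentence. The function name `sentence_pairs` is hypothetical, and the duplicate "dinner" in the input was added only to show the deduplication.

```python
from itertools import combinations


def sentence_pairs(tokens):
    """Sort the terms of one sentence, drop duplicates, and emit all term pairs."""
    terms = sorted(set(tokens))
    # combinations() keeps the sorted order, so each pair is emitted once as (t, u) with t < u.
    return list(combinations(terms, 2))


# Preprocessed first sentence of the example, unsorted and with one repeated term.
tokens = ["thank", "dinner", "veri", "pleasant", "even", "dinner"]
pairs = sentence_pairs(tokens)
```

Five distinct terms give ten pairs, starting with ("dinner", "even") and ending with ("thank", "veri").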

Step 3 - Calculate Measure (1)
- Term pair frequency (tpf):
  $\mathrm{tpf}_{sen}(\{t,u\}, d) = \frac{|\{s \in d \mid \{t,u\} \subseteq s\}|}{|\{s \in d\}|}$,
  where s is a sentence and d is a document.
- Inverse document frequency (idf):
  $\mathrm{idf}_{doc}(t,u) = \log \frac{|C|}{|\{d \in C \mid \{t,u\} \subseteq d\}|}$,
  where C is the document collection, d is a document, and (t, u) is a term pair.
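A small sketch of the two quantities on this slide. It assumes a document is represented as a list of sentences, each sentence being a set of terms, and that a pair occurs in a document when both terms appear anywhere in it. The natural logarithm is used because the slide does not state a base; the function names and the toy collection are made up for illustration.

```python
import math


def tpf_sen(pair, doc):
    """Fraction of the sentences of `doc` that contain both terms of `pair`."""
    pair = frozenset(pair)
    hits = sum(1 for sentence in doc if pair <= set(sentence))
    return hits / len(doc)


def idf_doc(pair, collection):
    """log(|C| / number of documents in which both terms of `pair` occur)."""
    pair = frozenset(pair)
    doc_freq = sum(1 for doc in collection if pair <= set().union(*doc))
    if doc_freq == 0:
        raise ValueError("pair does not occur in the collection")
    return math.log(len(collection) / doc_freq)


# Toy collection: four documents, each a list of sentences (sets of terms).
collection = [
    [{"a", "b"}, {"a", "c"}],
    [{"a"}, {"b"}],
    [{"c", "d"}],
    [{"a", "b", "d"}],
]
tpf = tpf_sen(("a", "b"), collection[0])
idf = idf_doc(("a", "b"), collection)
```

In the toy collection the pair (a, b) occurs in one of the two sentences of the first document, and three of the four documents contain both terms.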

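Step 3 - Calculate Measure (2)
- Term pair uncorrelation (tpu):
  $\mathrm{tpu}_{sen}(\{t,u\}, d) = \left( \min_{v \in \{t,u\}} \frac{|\{d \in C \mid \exists s \in d \text{ s.t. } \{t,u\} \subseteq s\}|}{|\{d \in C \mid v \in d\}|} \right)^{2}$
- Finally, we get the tpf-idf-tpu measure:
  $M = \mathrm{tpf}_{sen} \cdot \mathrm{idf}_{doc} \cdot \mathrm{tpu}_{sen}$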
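The sketch below shows one possible reading of the tpu factor and the final combination. The tpu formula is an assumption: it takes the number of documents in which the pair co-occurs within a sentence, divides it by the document frequency of each single term, takes the minimum of the two ratios, and squares it. The combined measure M is assumed to be the plain product of the three factors. Function names and the toy collection are hypothetical.

```python
def tpu_sen(pair, collection):
    """ASSUMED reading of tpu: (min over v of pair_docs / docs containing v) squared."""
    pair = frozenset(pair)
    # Documents in which some sentence contains both terms.
    pair_docs = sum(1 for doc in collection if any(pair <= set(s) for s in doc))
    ratios = []
    for v in pair:
        # Documents in which term v occurs at all (raises ZeroDivisionError if none do).
        term_docs = sum(1 for doc in collection if any(v in s for s in doc))
        ratios.append(pair_docs / term_docs)
    return min(ratios) ** 2


def tpf_idf_tpu(tpf, idf, tpu):
    """Combined measure M, assumed to be the product of the three factors."""
    return tpf * idf * tpu


# Toy collection: four documents, each a list of sentences (sets of terms).
collection = [
    [{"a", "b"}, {"a", "c"}],
    [{"a"}, {"b"}],
    [{"c", "d"}],
    [{"a", "b", "d"}],
]
tpu = tpu_sen(("a", "b"), collection)
```

In the toy collection the pair (a, b) shares a sentence in two documents, while each of a and b occurs in three documents, so both ratios are 2/3 under this reading.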

Applying to News Stories
- Currently crawling 7 news sources
- The corpus size is 65000, with $47 \times 10^{6}$ term pairs
- Incremental implementation

Goals for a News Surveillance System
- What is really new in a news story?
- Create a summary of a news story
- Decide at a glance whether the news story offers me anything
- Find related news stories

What is new?
Sample from a news story which was published yesterday

Summary Generation
- For the sake of clarity, the summary is copy-pasted
- Generated by taking the highest-scoring term pairs and extracting the sentences that contain them from the news story:
  Northamptonshire Police seized computer equipment, drugs paraphernalia and mobile phones during the arrest of the 17-year-old from Corby. A teenager has been released on bail after being questioned by police about the supply of illegal drugs via the Facebook social media website.
- Randomly generated summary:
  Police said a Facebook page, which had more than 200 friends, was shut down. Officers said they would be taking part in activities in schools to promote internet safety.
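A rough sketch of this kind of extractive summary. The selection rule is an assumption, since the slide only says that the highest-scoring term pairs are used: for each of the `top_k` best pairs, the first sentence containing both terms is taken. The scores, the stemmed term sets and the function name `summarize` are invented for illustration.

```python
def summarize(sentences, pair_scores, top_k=2):
    """Pick, for each of the top_k highest-scoring pairs, the first sentence containing it.

    sentences:   list of (original_text, set_of_terms)
    pair_scores: dict mapping frozenset({t, u}) -> score
    """
    ranked = sorted(pair_scores, key=pair_scores.get, reverse=True)[:top_k]
    chosen = []
    for pair in ranked:
        for index, (_, terms) in enumerate(sentences):
            if pair <= terms:
                if index not in chosen:
                    chosen.append(index)
                break  # only the first matching sentence is considered
    # Sentences are returned in order of the score of the pair that selected them.
    return [sentences[i][0] for i in chosen]


# Hypothetical preprocessed sentences and scores.
sentences = [
    ("Police seized computer equipment.", {"polic", "seiz", "comput", "equip"}),
    ("The page was shut down.", {"page", "shut"}),
    ("A teenager was released on bail.", {"teenag", "releas", "bail"}),
]
pair_scores = {
    frozenset({"seiz", "equip"}): 0.9,
    frozenset({"teenag", "bail"}): 0.7,
    frozenset({"page", "shut"}): 0.1,
}
summary = summarize(sentences, pair_scores, top_k=2)
```

With these made-up scores the summary consists of the first and third sentence, and the low-scoring "page was shut down" sentence is left out.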

A Glance at a News Story

Related news story published on February 6
Story headline: Shake-up in Egyptian ruling party

Future Work
- Create an intuitive and functional GUI
- Merging news stories
- We are still looking for a method for validating whether any of this makes any sense
- Something like on the next slide

Usable News Surveillance System

Computational Creativity & Novelty
- One way of creating background associations of a domain
- Considering two background graphs from different domains:
  - Find an interesting association
  - Translate through high abstraction to another
  - Propose a new creative connection in the other domain
- The background graph can also be used for novelty detection

Background Generation
- Extract keywords with the tf-idf algorithm
- Extract term pairs using the log-likelihood or tpf-idf measure
- Take the n top keywords and add them as nodes to graph G
- Take m term pairs and add them to the graph G
- If we have many components in G:
  - Connect components using WordNet synsets or extracted term pairs
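The graph-building steps can be sketched as below. Keyword and term-pair extraction are assumed to have happened already, so the inputs are plain ranked lists. Only the second option for joining components is shown: further extracted term pairs are added when they bridge two different components. The WordNet variant is not covered, and all function names and the example data are hypothetical.

```python
def build_background_graph(keywords, term_pairs, n, m):
    """Add the top n keywords as nodes and the top m term pairs as edges."""
    graph = {k: set() for k in keywords[:n]}
    for t, u in term_pairs[:m]:
        graph.setdefault(t, set()).add(u)
        graph.setdefault(u, set()).add(t)
    return graph


def connected_components(graph):
    """Return the connected components of the graph as a list of node sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def connect_components(graph, extra_pairs):
    """Add edges from extra_pairs that bridge two different components."""
    for t, u in extra_pairs:
        components = connected_components(graph)
        if len(components) <= 1:
            break
        component_of = {node: i for i, c in enumerate(components) for node in c}
        if t in component_of and u in component_of and component_of[t] != component_of[u]:
            graph[t].add(u)
            graph[u].add(t)
    return graph


# Hypothetical ranked keywords and term pairs.
keywords = ["police", "facebook", "egypt", "party"]
ranked_pairs = [("police", "facebook"), ("egypt", "party"), ("facebook", "egypt")]
G = build_background_graph(keywords, ranked_pairs, n=4, m=2)
components_before = len(connected_components(G))
connect_components(G, ranked_pairs[2:])
components_after = len(connected_components(G))
```

In this toy run the first two pairs leave two separate components, and the third pair (facebook, egypt) joins them into one.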

The end
Questions?
"It's amazing that the amount of news that happens in the world every day always just exactly fits the newspaper." (Jerry Seinfeld)