Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval


David Chen, Peter Vajda, Sam Tsai, Maryam Daneshi, Matt Yu, Huizhong Chen, Andre Araujo, Bernd Girod
Image, Video, and Multimedia Systems Group
Stanford University

Plays 30 second clip around query phrase match
- Would benefit from accurate segmentation of stories
- Would benefit from reliable generation of summary clips

Applications of Anchor Detection
1. Provide strong cues for story segmentation
2. Extract news story summaries/previews
   "TURNING TO TECH, SHARES OF RESEARCH IN MOTION REBOUNDED FROM A ONE MONTH LOW. THE COMPANY'S NEXT GENERATION BLACKBERRY-10 PRODUCT LINE IS EXPECTED TO BE UNVEILED IN JUST A FEW WEEKS. YOU MAY REMEMBER SHARES SOLD OFF LAST WEEK AFTER THE COMPANY ISSUED A CAUTIOUS OUTLOOK FOR ITS FOURTH QUARTER RESULTS. BUT TODAY SHARES BOUNCED BACK: UP 11.5% TO A UNDER $12."
3. Identify anchors for general person recognition
   Anchor Brian Williams
   Anchor Susie Gharib
   Don't confuse anchors with other people in the videos

Applications of Preview Matching
1. Provide strong cues for story segmentation
2. Extract news story summaries/previews
   "JUST A MESS. IN WASHINGTON, LAWMAKERS LEAVE TOWN FOR THE HOLIDAYS. THE CLOCK TICKS DOWN TO THE SO-CALLED FISCAL CLIFF. LATE TODAY, THE PRESIDENT HASTILY APPEARS TO ASK IF SOME OF THIS BUSINESS CAN BE FINISHED SOON."
3. Indicate the most important stories in a broadcast

Outline
- Related work in news video analysis
- Long-range visual similarity
- Anchor detection algorithm
- Preview matching algorithm
- Experimental results

Related Work in News Video Analysis
- Model-based anchor detection [Zhang et al., 1998] [Hanjalic et al., 1998] [Liu et al., 2000]
- Model-free anchor detection [Gao et al., 2002] [De Santo et al., 2006] [D'Anna et al., 2007] [Ma et al., 2008] [Broilo et al., 2011]
- Spatio-temporal slices for reporter detection [Liu et al., 2007] [Zheng et al., 2010]
- Classification of news video shots [Bertini et al., 2001] [Xiao et al., 2010] [Lee et al., 2011]

Long-Range Visual Similarity
[Figure: frame-by-frame similarity matrix. Both axes: Frame Number (1 to 3001). Color scale: 0 to 0.5.]

Long-Range Visual Similarity
What causes these long-range visual similarities?
[Figure: frame-by-frame similarity matrix. Both axes: Frame Number (1 to 3501). Color scale: 0 to 0.5.]
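The kind of matrix plotted on these slides can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each frame is already described by a numeric signature vector and uses cosine similarity, which the slides do not specify. The names `similarity_matrix` and `long_range_pairs` and the toy signatures are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (assumed measure, not from the slides)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def similarity_matrix(signatures):
    """Entry [i][j] is the similarity between frame i and frame j."""
    n = len(signatures)
    return [[cosine(signatures[i], signatures[j]) for j in range(n)] for i in range(n)]

def long_range_pairs(sim, min_gap, threshold):
    """Frame pairs (i, j) that are at least min_gap frames apart yet look alike."""
    n = len(sim)
    return [(i, j) for i in range(n) for j in range(i + min_gap, n)
            if sim[i][j] >= threshold]

# Toy signatures: frames 0 and 3 show the same scene, frames 1 and 2 show another.
signatures = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.1]]
sim = similarity_matrix(signatures)
pairs = long_range_pairs(sim, min_gap=2, threshold=0.9)
print(pairs)
```

Off-diagonal entries with high similarity, such as the pair found here, correspond to the long-range structure the slide asks about.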

Long-Range Visual Similarity
NBC Nightly News on Dec. 21, 2012

Long-Range Visual Similarity
- Anchor: Brian Williams
- Reporter: Kelly O'Donnell
- Analyst: David Gregory
- Reporter: Andrea Mitchell

Anchor Detection Pipeline
Keyframes
1. Exclude Frames Without Faces
2. Extract Image Signatures
3. Compare Image Signatures (produces the Similarity Matrix)
4. Form Initial Anchor Candidates: count number of long-range local peaks in the current row of the similarity matrix and pick initial candidates from high-count rows
5. Prune Away False Candidates: compare initial candidates to one another and prune out candidates which are not very similar to the other initial candidates
6. Include Temporally Nearby Candidates: from pruned set of candidates, expand to include temporally nearby candidates which are also very similar in appearance
Detections
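The three candidate steps described on this slide (form, prune, expand) can be sketched as below. This is an illustrative reading under stated assumptions, not the authors' implementation: the thresholds, the `min_gap` notion of "long-range", the use of mean similarity for pruning, and all function names are hypothetical choices.

```python
def count_long_range_peaks(row, i, min_gap, threshold):
    """Count local peaks in row i that are far from frame i and above threshold."""
    count = 0
    for j, value in enumerate(row):
        if abs(j - i) < min_gap or value < threshold:
            continue  # too close in time, or not similar enough
        left = row[j - 1] if j > 0 else float("-inf")
        right = row[j + 1] if j + 1 < len(row) else float("-inf")
        if value > left and value >= right:  # a flat plateau is counted once
            count += 1
    return count

def initial_candidates(sim, min_gap, threshold, min_count):
    """Pick frames whose rows contain many long-range peaks."""
    return [i for i, row in enumerate(sim)
            if count_long_range_peaks(row, i, min_gap, threshold) >= min_count]

def prune_candidates(sim, candidates, threshold):
    """Drop candidates that are not similar, on average, to the other candidates."""
    kept = []
    for c in candidates:
        others = [o for o in candidates if o != c]
        if not others or sum(sim[c][o] for o in others) / len(others) >= threshold:
            kept.append(c)
    return kept

def expand_candidates(sim, candidates, max_gap, threshold):
    """Add temporally nearby frames that look very similar to a kept candidate."""
    expanded = set(candidates)
    for c in candidates:
        for j in range(max(0, c - max_gap), min(len(sim), c + max_gap + 1)):
            if sim[c][j] >= threshold:
                expanded.add(j)
    return sorted(expanded)

# Toy 8-frame similarity matrix: frames 0, 3, 6 recur; frame 7 only resembles frame 6.
n = 8
sim = [[1.0 if i == j else 0.1 for j in range(n)] for i in range(n)]
for a, b, s in [(0, 3, 0.9), (0, 6, 0.9), (3, 6, 0.9), (6, 7, 0.9), (0, 7, 0.4), (3, 7, 0.4)]:
    sim[a][b] = sim[b][a] = s

initial = initial_candidates(sim, min_gap=2, threshold=0.5, min_count=2)
pruned = prune_candidates(sim, initial, threshold=0.5)
detections = expand_candidates(sim, pruned, max_gap=1, threshold=0.8)
print(initial, pruned, detections)
```

In this toy case frame 7 has no long-range peaks of its own, so it is only recovered by the final expansion step, which is the role the slide assigns to "Include Temporally Nearby Candidates".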

Intra-Episode vs. Inter-Episode
- Intra-episode: compare frames within a single episode of a news program
- Inter-episode: compare frames between different episodes of a news program

Preview Matching Pipeline
Frame (on-screen text in the example: "JUST A MESS", "COMING UP")
1. Detect and Recognize Text
2. Adaptively Crop to Preview Region
3. Extract Image Signature
4. Compare Image Signatures (against the Database of Image Signatures)
5. Verify Geometry in Shortlist
Matches
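The last two stages of this pipeline, shortlist retrieval followed by verification, can be sketched as below. This is only a structural illustration under assumptions: signatures are modeled as small integers compared by Hamming distance (a stand-in, not the comparison used in the talk), and the geometric check is passed in as a hypothetical `verify` callable rather than implemented.

```python
def hamming(a, b):
    """Number of differing bits between two integer-coded binary signatures."""
    return bin(a ^ b).count("1")

def shortlist(query_sig, database, k):
    """Return the ids of the k database frames closest to the query signature."""
    ranked = sorted(database.items(), key=lambda item: hamming(query_sig, item[1]))
    return [frame_id for frame_id, _ in ranked[:k]]

def match_preview(query_sig, database, k, verify):
    """Keep only shortlisted frames that pass the (externally supplied) geometry check."""
    return [frame_id for frame_id in shortlist(query_sig, database, k) if verify(frame_id)]

# Hypothetical database of frame signatures from the body of the broadcast.
database = {"f10": 0b11110000, "f20": 0b11110001, "f30": 0b00001111}
query = 0b11110011  # signature of the cropped preview region

top2 = shortlist(query, database, k=2)
# Placeholder check: pretend only frame "f20" is geometrically consistent.
matches = match_preview(query, database, k=2, verify=lambda frame_id: frame_id == "f20")
print(top2, matches)
```

Running the expensive geometric check only on a short ranked list keeps the signature comparison cheap over the whole database.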

REVV: Residual Enhanced Visual Vector
Query Image
1. Extract Local Features
2. Vector Quantize to Visual Words (using the Visual Codebook)
3. Perform Mean Aggregation of Residuals
4. Regularize with Power Law
5. Reduce Dimensions by LDA
6. Binarize Components from Sign
7. Compute Weighted Correlations (against the Database Signatures)
Ranked List
Values shown beside the ranked list: 1.74, 1.75, 1.79, 1.80, 1.83, 1.84
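A simplified sketch of the signature steps named on this slide is given below. It is not the published REVV implementation: the LDA projection is omitted because it requires trained matrices, the correlation is unweighted, and the codebook, features, and the exponent `alpha` are hypothetical toy values.

```python
import math

def nearest_word(feature, codebook):
    """Index of the codebook entry closest to the feature (vector quantization)."""
    return min(range(len(codebook)),
               key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, codebook[k])))

def mean_residuals(features, codebook):
    """Per visual word, the mean of (feature - word); None if the word was not visited."""
    dim = len(codebook[0])
    sums = [[0.0] * dim for _ in codebook]
    counts = [0] * len(codebook)
    for f in features:
        k = nearest_word(f, codebook)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += f[d] - codebook[k][d]
    return [[s / counts[k] for s in sums[k]] if counts[k] else None
            for k in range(len(codebook))]

def power_law(vec, alpha=0.5):
    """Signed power-law regularization; alpha is an assumed value."""
    return [math.copysign(abs(x) ** alpha, x) for x in vec]

def binarize(vec):
    """Keep only the sign of each component."""
    return [1 if x >= 0 else -1 for x in vec]

def signature(features, codebook, alpha=0.5):
    # The real pipeline applies LDA between the power law and binarization;
    # that step is skipped here, so the power law does not change the signs.
    return [binarize(power_law(r, alpha)) if r is not None else None
            for r in mean_residuals(features, codebook)]

def correlation(sig_a, sig_b):
    """Unweighted correlation over visual words visited by both images."""
    total, dims = 0, 0
    for a, b in zip(sig_a, sig_b):
        if a is None or b is None:
            continue
        total += sum(x * y for x, y in zip(a, b))
        dims += len(a)
    return total / dims if dims else 0.0

codebook = [[0.0, 0.0], [10.0, 10.0]]
sig_a = signature([[1.0, 2.0], [3.0, -1.0], [11.0, 9.0]], codebook)
sig_b = signature([[2.0, 1.0], [12.0, 8.0]], codebook)
sig_c = signature([[-1.0, -2.0]], codebook)
print(correlation(sig_a, sig_b), correlation(sig_a, sig_c))
```

Storing only signs per component is what makes a signature of this kind compact, at the cost of discarding residual magnitudes.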

Experimental Setup
Anchor detection
- Training on 12 episodes of NBC Nightly News (1 anchor/episode), ABC World News (1 anchor/episode), Nightly Business Report (2 anchors/episode)
- Testing on 21 episodes of same three programs
- Measure precision / recall / F-score
Preview matching
- Testing on 10 episodes of NBC Nightly News and ABC World News
- Measure precision / recall / F-score
Comparison of two memory-efficient signatures
- GIST: 66 MB/episode [Oliva et al., 2001] [Douze et al., 2009]
- REVV: 10 MB/episode [Chen et al., 2013]

Anchor Detection Results

Method       Recall   Precision   F-Score
GIST Intra   0.53     0.84        0.65
REVV Intra   0.87     0.90        0.88

Anchor Detection Results

Method               Recall   Precision   F-Score
REVV Intra           0.87     0.90        0.88
REVV Intra + Inter   0.90     0.91        0.90

Preview Matching Results
Type A: Preview occurs at beginning of broadcast

Method   Recall   Precision   F-Score
GIST     0.48     1.00        0.65
REVV     0.90     1.00        0.95

Preview Matching Results
Type B: Preview occurs prior to a commercial

Method   Recall   Precision   F-Score
GIST     0.62     1.00        0.77
REVV     0.93     1.00        0.96
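The slides do not define the F-score. Assuming it is the balanced F1 measure (the harmonic mean of precision and recall), the reported values can be reproduced from the recall and precision columns, as this short check shows. The function name `f_score` and the row labels are illustrative.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (label, recall, precision, reported F-score) copied from the results slides.
reported = [
    ("Anchor, GIST Intra", 0.53, 0.84, 0.65),
    ("Anchor, REVV Intra", 0.87, 0.90, 0.88),
    ("Anchor, REVV Intra + Inter", 0.90, 0.91, 0.90),
    ("Preview Type A, GIST", 0.48, 1.00, 0.65),
    ("Preview Type A, REVV", 0.90, 1.00, 0.95),
    ("Preview Type B, GIST", 0.62, 1.00, 0.77),
    ("Preview Type B, REVV", 0.93, 1.00, 0.96),
]

# True for a row when the recomputed F1, rounded to two decimals, equals the slide value.
consistent = [round(f_score(p, r), 2) == f for _, r, p, f in reported]
print(all(consistent))
```

Under the F1 assumption every row agrees with the slides to two decimal places.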

Conclusions
- Long-range visual similarity in news videos provides a general and effective method for anchor detection and preview matching
- A robust image signature is required to handle challenging appearance variations throughout a newscast
- The image signature should be memory-efficient to enable parallelized processing of large video archives

Thank You dmchen@stanford.edu