Generic object recognition

Transcription:

Generic object recognition. May 19th, 2015. Yong Jae Lee, UC Davis

Announcements: PS3 is out; due 6/3, 11:59 pm. Sign the attendance sheet (3rd one). 2

Indexing local features 3 Kristen Grauman

Visual words. Map high-dimensional descriptors to tokens/words by quantizing the feature space. Quantize via clustering, and let the cluster centers be the prototype "words". Determine which word to assign to each new image region by finding the closest cluster center. [Figure: descriptor's feature space, with one cluster labeled "Word #2"] 4 Kristen Grauman
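The assignment step can be sketched as a nearest-center lookup. This is a minimal illustration, assuming the cluster centers have already been computed (for example by k-means) and that descriptors are plain tuples of floats; the function name `assign_visual_word` and the toy 2-D data are hypothetical, not from the slides.

```python
import math

def assign_visual_word(descriptor, centers):
    """Return the index of the cluster center closest to the descriptor."""
    best_index, best_dist = -1, math.inf
    for index, center in enumerate(centers):
        dist = math.dist(descriptor, center)  # Euclidean distance
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

# Toy 2-D "descriptors"; real descriptors (e.g. SIFT) have many more dimensions.
centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
descriptors = [(1.0, 0.5), (9.0, 11.0), (0.5, 8.0)]
words = [assign_visual_word(d, centers) for d in descriptors]
print(words)  # [0, 1, 2]
```

A linear scan over the centers is only practical for small vocabularies; large vocabularies typically need an approximate nearest-neighbor structure instead.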

Visual words Example: each group of patches belongs to the same visual word Figure from Sivic & Zisserman, ICCV 2003 5 Kristen Grauman

Inverted file index Database images are loaded into the index mapping words to image numbers 6 Kristen Grauman

Inverted file index When will this give us a significant gain in efficiency? New query image is mapped to indices of database images that share a word. 7 Kristen Grauman
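A minimal sketch of an inverted file index, assuming each database image is already represented as a list of visual word ids. The names `build_inverted_index` and `candidate_images` and the toy data are hypothetical. It illustrates why the gain is largest when each word occurs in only a few images: only images sharing at least one word with the query are ever touched.

```python
from collections import defaultdict

def build_inverted_index(database):
    """Map each visual word id to the set of image ids that contain it."""
    index = defaultdict(set)
    for image_id, words in database.items():
        for word in words:
            index[word].add(image_id)
    return index

def candidate_images(index, query_words):
    """Return ids of database images that share at least one word with the query."""
    candidates = set()
    for word in set(query_words):
        candidates |= index.get(word, set())
    return candidates

# Toy database: image id -> visual word ids found in that image.
database = {1: [7, 7, 12], 2: [3, 12], 3: [5, 9]}
index = build_inverted_index(database)
candidates = candidate_images(index, [12, 40])
print(sorted(candidates))  # [1, 2]
```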

Bags of visual words Summarize entire image based on its distribution (histogram) of word occurrences. Analogous to bag of words representation commonly used for documents. 8

Comparing bags of words. Rank frames by the normalized scalar product between their (possibly weighted) occurrence counts, i.e., nearest neighbor search for similar images. Example histograms: $d_j = [1\ 8\ 1\ 4]$, $q = [5\ 1\ 1\ 0]$.
$$\mathrm{sim}(d_j, q) = \frac{\langle d_j, q \rangle}{\|d_j\|\,\|q\|} = \frac{\sum_{i=1}^{V} d_j(i)\, q(i)}{\sqrt{\sum_{i=1}^{V} d_j(i)^2}\ \sqrt{\sum_{i=1}^{V} q(i)^2}}$$
for a vocabulary of V words. 9 Kristen Grauman
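As a worked instance of the normalized scalar product, the sketch below scores the two example histograms from this slide. The function name `cosine_similarity` is an assumption, as is the reading that the first histogram is the database frame and the second the query (the score is symmetric, so the order does not change the result).

```python
import math

def cosine_similarity(d, q):
    """Normalized scalar product between two word-count histograms."""
    dot = sum(di * qi for di, qi in zip(d, q))
    norm_d = math.sqrt(sum(di * di for di in d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # an empty histogram matches nothing
    return dot / (norm_d * norm_q)

d_j = [1, 8, 1, 4]  # database frame histogram (from the slide)
q = [5, 1, 1, 0]    # query histogram (from the slide)
similarity = cosine_similarity(d_j, q)
print(round(similarity, 4))  # 0.2975, i.e. 14 / sqrt(82 * 27)
```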

Application: Large-Scale Retrieval. Query and results from 5k Flickr images (demo available for 100k set) [Philbin CVPR 07] 10

Spatial Verification: two basic strategies.
RANSAC: Typically sort by BoW similarity as an initial filter. Verify by checking support (inliers) for possible transformations, e.g., success if we find a transformation with > N inlier correspondences.
Generalized Hough Transform: Let each matched feature cast a vote on location, scale, and orientation of the model object. Verify parameters with enough votes.
11 Kristen Grauman
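The RANSAC strategy can be sketched under a strong simplifying assumption: the transformation is a pure 2-D translation, so a single match is enough to form a hypothesis. The slides do not fix a transformation family; the function name `ransac_translation`, the parameters `tolerance` and `min_inliers`, and the toy matches are all hypothetical.

```python
import math
import random

def ransac_translation(matches, num_iterations=100, tolerance=1.0, seed=0):
    """Find the translation supported by the most matches (inliers)."""
    rng = random.Random(seed)
    best_translation, best_inliers = None, []
    for _ in range(num_iterations):
        # Hypothesize a translation from one randomly chosen match.
        (mx, my), (ix, iy) = rng.choice(matches)
        tx, ty = ix - mx, iy - my
        # Support: matches whose model point lands near its image point.
        inliers = [
            ((ax, ay), (bx, by))
            for (ax, ay), (bx, by) in matches
            if math.hypot(ax + tx - bx, ay + ty - by) <= tolerance
        ]
        if len(inliers) > len(best_inliers):
            best_translation, best_inliers = (tx, ty), inliers
    return best_translation, best_inliers

# (model point, image point) pairs: four agree on a shift of (5, -3), two are outliers.
matches = [
    ((0, 0), (5, -3)), ((1, 2), (6, -1)), ((3, 1), (8, -2)), ((4, 4), (9, 1)),
    ((2, 2), (20, 20)), ((5, 0), (-7, 3)),
]
translation, inliers = ransac_translation(matches)
min_inliers = 3  # the "N" in "> N inlier correspondences"
verified = len(inliers) > min_inliers
```

Richer models (similarity, affine, homography) follow the same loop but sample more matches per hypothesis.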

RANSAC verification 12

Voting: Generalized Hough Transform. If we use scale-, rotation-, and translation-invariant local features, then each feature match gives an alignment hypothesis (for scale, translation, and orientation of the model in the image). [Figure: model and novel image] 13 Adapted from Lana Lazebnik

Voting: Generalized Hough Transform. A hypothesis generated by a single match may be unreliable, so let each match vote for a hypothesis in Hough space. [Figure: model and novel image] 14
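A minimal sketch of Hough voting over matches, assuming each keypoint is a tuple `(x, y, scale, angle_in_degrees)`. Each match proposes where the model origin would land in the image, plus a scale ratio and rotation, and votes in a coarsely binned accumulator. The function name `hough_votes`, the bin sizes, and the toy matches are assumptions chosen for illustration.

```python
import math
from collections import Counter

def hough_votes(matches, pos_bin=10.0, angle_bin=30.0):
    """Let each (model keypoint, image keypoint) match vote for a pose bin."""
    votes = Counter()
    num_angle_bins = round(360.0 / angle_bin)
    for (mx, my, m_scale, m_angle), (ix, iy, i_scale, i_angle) in matches:
        scale = i_scale / m_scale
        rotation = (i_angle - m_angle) % 360.0
        theta = math.radians(rotation)
        # Image position of the model origin implied by this single match.
        ox = ix - scale * (mx * math.cos(theta) - my * math.sin(theta))
        oy = iy - scale * (mx * math.sin(theta) + my * math.cos(theta))
        key = (
            round(ox / pos_bin),
            round(oy / pos_bin),
            round(math.log2(scale)),                  # scale binned by octave
            round(rotation / angle_bin) % num_angle_bins,
        )
        votes[key] += 1
    return votes

# Three matches agree on "scale 2, no rotation, origin at (100, 50)"; one is an outlier.
matches = [
    ((0, 0, 1, 0), (100, 50, 2, 0)),
    ((10, 0, 1, 0), (120, 50, 2, 0)),
    ((5, 20, 1, 0), (110, 90, 2, 0)),
    ((10, 10, 1, 0), (300, 20, 1, 90)),
]
votes = hough_votes(matches)
best_bin, best_count = votes.most_common(1)[0]
print(best_bin, best_count)  # (10, 5, 1, 0) 3
```

Hard binning can split votes that fall near a bin edge; practical systems often also vote into neighboring bins.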

What else can we borrow from text retrieval? [Example: a news article with its salient words highlighted]
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
Highlighted words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value.

tf-idf weighting: term frequency, inverse document frequency. Describe a frame by the frequency of each word within it, and downweight words that appear often in the database. (Standard weighting for text retrieval.)
$$t_i = \frac{n_{id}}{n_d} \log \frac{N}{n_i}$$
where $n_{id}$ is the number of occurrences of word i in document d, $n_d$ is the number of words in document d, $N$ is the total number of documents in the database, and $n_i$ is the number of documents word i occurs in, in the whole database. 16 Kristen Grauman
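A minimal sketch of tf-idf weighting, assuming each document (frame) is a list of visual word ids: the weight of a word is its frequency within the document times the log of the inverse fraction of documents containing it. The function name `tfidf_vectors` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one {word: tf-idf weight} dict per document."""
    num_documents = len(documents)
    # Number of documents each word occurs in (counted once per document).
    document_frequency = Counter()
    for words in documents:
        document_frequency.update(set(words))
    vectors = []
    for words in documents:
        counts = Counter(words)
        num_words = len(words)
        vectors.append({
            word: (count / num_words) * math.log(num_documents / document_frequency[word])
            for word, count in counts.items()
        })
    return vectors

# Word 1 occurs in every document, so it carries no weight.
documents = [[1, 1, 2], [1, 3], [1, 2, 2, 4]]
vectors = tfidf_vectors(documents)
print(vectors[0][1])  # 0.0
```

In this toy set, word 2 in the first document gets weight (1/3) * log(3/2), since it occurs once among three words and appears in two of the three documents.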

Query expansion. Query: golf green. Results:
- How can the grass on the greens at a golf course be so perfect?
- For example, a skilled golfer expects to reach the green on a par-four hole in...
- Manufactures and sells synthetic golf putting greens and mats.
An irrelevant result can cause a "topic drift":
- Volkswagen Golf, 1999, Green, 2000cc, petrol, manual, hatchback, 94000miles, 2.0 GTi, 2 Registered Keepers, HPI Checked, Air-Conditioning, Front and Rear Parking Sensors, ABS, Alarm, Alloy
17 Slide credit: Ondrej Chum

Query expansion. [Figure: query image, results, spatial verification, new query, new results] Chum, Philbin, Sivic, Isard, Zisserman: Total Recall, ICCV 2007. 18 Slide credit: Ondrej Chum

Recognition via alignment.
Pros: Effective when we are able to find reliable features within clutter. Great results for matching specific instances.
Cons: Scaling with the number of models. Spatial verification as post-processing: not seamless, expensive for large-scale problems. Not suited for generic category recognition.
19 Kristen Grauman

Summary. Matching local invariant features: useful to find objects and scenes. Bag of words representation: quantize the feature space to make a discrete set of visual words; summarize an image by its distribution of words; index individual words. Inverted index: pre-compute the index to enable faster search at query time. Recognition of instances via alignment: matching local features followed by spatial verification. Robust fitting: RANSAC, GHT. 20 Kristen Grauman

Making the Sky Searchable: Fast Geometric Hashing for Automated Astrometry. Sam Roweis, Dustin Lang & Keir Mierle, University of Toronto; David Hogg & Michael Blanton, New York University. 21

Example A shot of the Great Nebula, by Jerry Lodriguss (c.2006), from astropix.com http://astrometry.net/gallery.html 22

Example An amateur shot of M100, by Filippo Ciferri (c.2007) from flickr.com http://astrometry.net/gallery.html 23

Example A beautiful image of Bode's nebula (c.2007) by Peter Bresseler, from starlightfriend.de http://astrometry.net/gallery.html 24

Today: Generic object recognition 25

What does recognition involve? 26 Source: Fei-Fei Li, Rob Fergus, Antonio Torralba.

Verification: is that a lamp? 27 Source: Fei-Fei Li, Rob Fergus, Antonio Torralba.

Detection: are there people? 28 Source: Fei-Fei Li, Rob Fergus, Antonio Torralba.

Identification: is that Potala Palace? 29 Source: Fei-Fei Li, Rob Fergus, Antonio Torralba.

Object categorization: mountain, tree, banner, building, street lamp, people, vendor. 30 Source: Fei-Fei Li, Rob Fergus, Antonio Torralba.

Scene and context categorization: outdoor, city. 31 Source: Fei-Fei Li, Rob Fergus, Antonio Torralba.

Instance-level recognition problem: John's car 32

Generic categorization problem 33

Object Categorization. Task description: Given a small number of training images of a category, recognize a-priori unknown instances of that category and assign the correct category label. Which categories are feasible visually? [Figure: labels from specific to abstract: "Fido", German shepherd, dog, animal, living being] K. Grauman, B. Leibe 34

Visual Object Categories. Basic-level categories in human categorization [Rosch 76, Lakoff 87]:
- The highest level at which category members have similar perceived shape
- The highest level at which a single mental image reflects the entire category
- The level at which human subjects are usually fastest at identifying category members
- The first level named and understood by children
- The highest level at which a person uses similar motor actions for interaction with category members
K. Grauman, B. Leibe 35

Visual Object Categories. Basic-level categories in humans seem to be defined predominantly visually. There is evidence that humans (usually) start with basic-level categorization before doing identification. [Figure: category hierarchy. Abstract levels: animal, quadruped. Basic level: dog, cat, cow. Below the basic level: German shepherd, Doberman. Individual level: "Fido".] K. Grauman, B. Leibe 36

How many object categories are there? Source: Fei-Fei Li, Rob Fergus, Antonio Torralba. 37 Biederman 1987

Other Types of Categories. Functional categories, e.g., chairs = "something you can sit on". K. Grauman, B. Leibe 39

Other Types of Categories. Ad-hoc categories, e.g., "something you can find in an office environment". K. Grauman, B. Leibe 40

Why recognition? Recognition is a fundamental part of perception (e.g., robots, autonomous agents). Organize and give access to visual content. Connect to information. Detect trends and themes. 41

Posing visual queries. Yeh et al., MIT; Belhumeur et al.; Kooaba, Bay & Quack et al. 42

Autonomous agents able to detect objects 43

Finding visually similar objects 44

Discovering visual patterns. Objects: Sivic & Zisserman. Categories: Lee & Grauman. Actions: Wang et al. 45 Kristen Grauman

Auto-annotation. Gammeter et al.; T. Berg et al. 46 Kristen Grauman

Challenges: robustness. Illumination, object pose, clutter, occlusions, intra-class appearance, viewpoint. 47 Kristen Grauman

Challenges: robustness Realistic scenes are crowded, cluttered, have overlapping objects. 48

Challenges: importance of context 49 slide credit: Fei-Fei, Fergus & Torralba

Challenges: importance of context 50

Challenges: complexity. [Figure: statistics shown beside image and video site logos, which did not survive extraction: 6 billion images; 70 billion images; 1 billion images served daily; 10 billion images; 100 hours uploaded per minute.] Almost 90% of web traffic is visual! 51

Challenges: complexity. Thousands to millions of pixels in an image. 30+ degrees of freedom in the pose of articulated objects (humans). About half of the cerebral cortex in primates is devoted to processing visual information [Felleman and van Essen 1991]. 52 Kristen Grauman

Challenges: learning with minimal supervision. [Figure: examples ordered from more to less supervision] 53 Kristen Grauman

What works most reliably today:
- Reading license plates, zip codes, checks
- Fingerprint recognition
- Face detection
- Recognition of flat textured objects (CD covers, book covers, etc.)
- Recognition of generic categories: beginning to work!
54-58 Source: Lana Lazebnik (slides 54-57)

Generic category recognition: basic framework.
- Build/train object model: choose a representation; learn or fit parameters of the model / classifier
- Generate candidates in a new image
- Score the candidates
59 Kristen Grauman

Generic category recognition: representation choice. Window-based; part-based. 60 Kristen Grauman

Supervised classification. Given a collection of labeled examples, come up with a function that will predict the labels of new examples. [Figure: training examples labeled "four" and "nine", and a novel input marked "?"] How good is some function we come up with to do the classification? It depends on the mistakes made and the cost associated with the mistakes. 61 Kristen Grauman

Supervised classification. Given a collection of labeled examples, come up with a function that will predict the labels of new examples. Consider the two-class (binary) decision problem. $L(4 \to 9)$: loss of classifying a 4 as a 9. $L(9 \to 4)$: loss of classifying a 9 as a 4. The risk of a classifier $s$ is its expected loss:
$$R(s) = \Pr(4 \to 9 \mid \text{using } s)\, L(4 \to 9) + \Pr(9 \to 4 \mid \text{using } s)\, L(9 \to 4)$$
We want to choose a classifier so as to minimize this total risk. 62 Kristen Grauman

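Supervised classification. The optimal classifier will minimize total risk. At the decision boundary (a point along the feature value x), either choice of label yields the same expected loss. If we choose class "four" at the boundary, the expected loss is
$$P(\text{class is } 9 \mid x)\, L(9 \to 4) + P(\text{class is } 4 \mid x)\, L(4 \to 4) = P(\text{class is } 9 \mid x)\, L(9 \to 4)$$
because a correct classification incurs no loss, i.e., $L(4 \to 4) = 0$. If we choose class "nine" at the boundary, the expected loss is
$$P(\text{class is } 4 \mid x)\, L(4 \to 9)$$
63 Kristen Grauman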

Supervised classification. The optimal classifier will minimize total risk. At the decision boundary, either choice of label yields the same expected loss. So the best decision boundary is at the point x where
$$P(\text{class is } 9 \mid x)\, L(9 \to 4) = P(\text{class is } 4 \mid x)\, L(4 \to 9)$$
To classify a new point, choose the class with the lowest expected loss; i.e., choose "four" if
$$P(4 \mid x)\, L(4 \to 9) > P(9 \mid x)\, L(9 \to 4)$$
[Figure: posteriors $P(4 \mid x)$ and $P(9 \mid x)$ plotted against feature value x] How to evaluate these probabilities? 64-65 Kristen Grauman
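The minimum-risk rule can be sketched in a few lines, assuming the posterior P(4 | x) is already available, the two posteriors sum to one, and correct decisions cost nothing. The function names `choose_label` and `boundary_posterior` and the loss values are hypothetical.

```python
def choose_label(p4, loss_4_as_9, loss_9_as_4):
    """Pick the label with the lower expected loss, given P(4 | x)."""
    p9 = 1.0 - p4
    risk_if_say_four = p9 * loss_9_as_4  # wrong only when the truth is 9
    risk_if_say_nine = p4 * loss_4_as_9  # wrong only when the truth is 4
    return "four" if risk_if_say_four < risk_if_say_nine else "nine"

def boundary_posterior(loss_4_as_9, loss_9_as_4):
    """P(4 | x) at which both choices have the same expected loss."""
    return loss_9_as_4 / (loss_4_as_9 + loss_9_as_4)

# Equal losses: the more probable class wins.
print(choose_label(0.6, loss_4_as_9=1.0, loss_9_as_4=1.0))  # four
# Calling a 9 a "four" is 5x more costly: the same posterior now yields "nine".
print(choose_label(0.6, loss_4_as_9=1.0, loss_9_as_4=5.0))  # nine
```

With unequal losses the boundary moves away from P(4 | x) = 0.5; for losses of 1 and 5 it sits at P(4 | x) = 5/6.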

Probability. Basic probability: X is a random variable. P(X) is the probability that X achieves a certain value; it is called a PDF (probability distribution/density function), with $\int P(X)\, dX = 1$ for continuous X or $\sum P(X) = 1$ for discrete X. Conditional probability: $P(X \mid Y)$ is the probability of X given that we already know Y. 66 Source: Steve Seitz

Example: learning skin colors. We can represent a class-conditional density using a histogram (a "non-parametric" distribution). [Figure: two histograms over feature x = hue: $P(x \mid \text{skin})$, the percentage of skin pixels in each bin, and $P(x \mid \text{not skin})$] 67 Kristen Grauman

Example: learning skin colors. We can represent a class-conditional density using a histogram (a "non-parametric" distribution). Now we get a new image, and want to label each pixel as skin or non-skin. What's the probability we care about to do skin detection? [Figure: histograms of $P(x \mid \text{skin})$ and $P(x \mid \text{not skin})$ over feature x = hue] 68 Kristen Grauman

Bayes rule:
$$P(\text{skin} \mid x) = \frac{P(x \mid \text{skin})\, P(\text{skin})}{P(x)}$$
Here $P(\text{skin} \mid x)$ is the posterior, $P(x \mid \text{skin})$ is the likelihood, and $P(\text{skin})$ is the prior. Equivalently,
$$P(\text{skin} \mid x) \propto P(x \mid \text{skin})\, P(\text{skin})$$
Where does the prior come from? Why use a prior? 69

Example: classifying skin pixels. Now for every pixel in a new image, we can estimate the probability that it is generated by skin. Brighter pixels indicate a higher probability of being skin. Classify pixels based on these probabilities. 70
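A minimal sketch of histogram-based skin classification with Bayes rule, assuming hue is scaled to [0, 1), four equal-width bins, and a made-up prior P(skin) = 0.3. The function names, training hues, and prior are all hypothetical values chosen for illustration.

```python
def histogram_density(values, num_bins):
    """Fraction of training values falling in each equal-width bin over [0, 1)."""
    counts = [0] * num_bins
    for value in values:
        counts[min(int(value * num_bins), num_bins - 1)] += 1
    return [count / len(values) for count in counts]

def posterior_skin(hue, p_x_given_skin, p_x_given_not_skin, prior_skin):
    """P(skin | x) = P(x | skin) P(skin) / P(x), with x the hue bin."""
    num_bins = len(p_x_given_skin)
    bin_index = min(int(hue * num_bins), num_bins - 1)
    joint_skin = p_x_given_skin[bin_index] * prior_skin
    joint_not_skin = p_x_given_not_skin[bin_index] * (1.0 - prior_skin)
    evidence = joint_skin + joint_not_skin  # P(x)
    if evidence == 0.0:
        return prior_skin  # bin never seen in training: fall back to the prior
    return joint_skin / evidence

# Made-up training hues for the two classes.
skin_hues = [0.05, 0.08, 0.02, 0.07, 0.30]
not_skin_hues = [0.10, 0.40, 0.60, 0.90, 0.70]
p_x_given_skin = histogram_density(skin_hues, num_bins=4)          # [0.8, 0.2, 0.0, 0.0]
p_x_given_not_skin = histogram_density(not_skin_hues, num_bins=4)  # [0.2, 0.2, 0.4, 0.2]

p_low_hue = posterior_skin(0.06, p_x_given_skin, p_x_given_not_skin, prior_skin=0.3)
p_mid_hue = posterior_skin(0.60, p_x_given_skin, p_x_given_not_skin, prior_skin=0.3)
is_skin = p_low_hue > 0.5  # classify by thresholding the posterior
```

Here the low-hue pixel gets posterior 0.24 / 0.38 (about 0.63) and the mid-hue pixel gets 0, since no skin training pixel fell in that bin.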

Example: classifying skin pixels Using skin color-based face detection and pose estimation as a video-based interface Gary Bradski, 1998 72

Supervised classification. We want to minimize the expected misclassification. Two general strategies:
- Generative: use the training data to build a representative probability model; separately model class-conditional densities and priors.
- Discriminative: directly construct a good decision boundary; model the posterior.
73

Coming up: face detection; categorization with local features and part-based models; deep convolutional neural networks. 74

Questions? See you Thursday! 75