CS 1674: Intro to Computer Vision. Intro to Recognition. Prof. Adriana Kovashka University of Pittsburgh October 24, 2016

Plan for today
  Examples of visual recognition problems
  What should we recognize?
  Recognition pipeline
    Features
    Data
  Overview of some methods for classification
    K-Nearest Neighbors
    Linear classifiers

Some translations
  Feature vector = descriptor = representation
  Recognition often involves classification
  Classes = categories (hence classification = categorization)
  Training = learning a model (e.g. classifier), happens at training time from training data
  Classification = prediction, happens at test time

Slide credit: D. Hoiem

Classification
  Given a feature representation for images, how do we learn a model for distinguishing features from different classes?
  [Figure: decision boundary between zebra and non-zebra examples]
Slide credit: L. Lazebnik

Classification
  Assign input vector to one of two or more classes.
  Any decision rule divides the input space into decision regions separated by decision boundaries.
Slide credit: L. Lazebnik

Example: Spam filter Slide credit: L. Lazebnik

Examples of Categorization in Vision
  Part or object detection: e.g., for each window: face or non-face?
  Scene categorization: indoor vs. outdoor, urban, forest, kitchen, etc.
  Action recognition: picking up vs. sitting down vs. standing
  Emotion recognition: happy vs. scared vs. surprised
  Region classification: label pixels into different object/surface categories
  Boundary classification: boundary vs. non-boundary
  Etc., etc.
Adapted from D. Hoiem

What do you see in this image?
  [Image labels: trees, bear, camera, man, rabbit, grass, forest; "Can I put stuff in it?"]
Slide credit: D. Hoiem

Describe, predict, or interact with the object based on visual cues
  Is it dangerous? Is it alive? How fast does it run? Does it have a tail? Is it soft? Can I poke with it?
Slide credit: D. Hoiem

Image categorization. Two-class (binary): cat vs. dog. Adapted from D. Hoiem

Image categorization. Multi-class (often): object recognition. Caltech 101 Average Object Images. Adapted from D. Hoiem

Image categorization. Fine-grained recognition. Visipedia Project. Slide credit: D. Hoiem

Image categorization. Place recognition. Places Database [Zhou et al. NIPS 2014]. Slide credit: D. Hoiem

Image categorization. Dating historical photos: 1940, 1953, 1966, 1977 [Palermo et al. ECCV 2012]. Slide credit: D. Hoiem

Image categorization. Image style recognition [Karayev et al. BMVC 2014]. Slide credit: D. Hoiem

Region categorization. Layout prediction. Assign regions to orientation: Geometric context [Hoiem et al. IJCV 2007]. Assign regions to depth: Make3D [Saxena et al. PAMI 2008]. Slide credit: D. Hoiem

Region categorization. Material recognition [Bell et al. CVPR 2015]. Slide credit: D. Hoiem

Attribute-based recognition. A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. Describing Objects by their Attributes. CVPR 2009.

Attribute-based recognition. A. Kovashka, S. Vijayanarasimhan, and K. Grauman. Actively Selecting Annotations Among Objects and Attributes. ICCV 2011.

Attribute-based search. A. Kovashka, D. Parikh, and K. Grauman. WhittleSearch: Image Search with Relative Attribute Feedback. CVPR 2012.

Generic categorization problem Slide credit: K. Grauman

Instance-level recognition problem. [Image label: John's car] Which one do you think is harder: generic or instance-level recognition? Adapted from K. Grauman

Visual Object Recognition Tutorial (Perceptual and Sensory Augmented Computing)
Visual Object Categories: What stuff should we bother to recognize?
  Basic-level categories in human categorization [Rosch 76, Lakoff 87]:
    The highest level at which category members have similar perceived shape
    The highest level at which a single mental image reflects the entire category
    The level at which human subjects are usually fastest at identifying category members
    The first level named and understood by children
    The highest level at which a person uses similar motor actions for interaction with category members
K. Grauman, B. Leibe

Visual Object Categories
  Basic-level categories in humans seem to be defined predominantly visually.
  There is evidence that humans (usually) start with basic-level categorization before doing identification.
  Basic-level categorization is easier and faster for humans than object identification!
  [Figure: category hierarchy. Abstract levels: animal, quadruped. Basic level: dog, cat, cow. German shepherd, Doberman. Individual level: Fido]
K. Grauman, B. Leibe

Object Categorization: Task Description
  Given a small number of training images of a category, recognize a-priori unknown instances of that category and assign the correct category label.
  Which categories are feasible visually?
  [Figure: Fido -> German shepherd -> dog -> animal -> living being]
K. Grauman, B. Leibe

How many object categories are there? Source: Fei-Fei Li, Rob Fergus, Antonio Torralba. Biederman 1987

Other Types of Categories. Functional categories, e.g. chairs = something you can sit on. K. Grauman, B. Leibe

Other Types of Categories. Ad-hoc categories, e.g. something you can find in an office environment. K. Grauman, B. Leibe

Why recognition?
  Recognition a fundamental part of perception (e.g., robots, autonomous agents)
  Organize and give access to visual content
    Connect to information
    Detect trends and themes
Slide credit: K. Grauman

Why is recognition hard?

Recognition: A machine learning approach

The machine learning framework
  Apply a prediction function to a feature representation of the image to get the desired output:
  f([image]) = "apple", f([image]) = "tomato", f([image]) = "cow"
Slide credit: L. Lazebnik

The machine learning framework
  y = f(x), where y is the output, f is the prediction function, and x is the image feature
  Training: given a training set of labeled examples {(x_1, y_1), ..., (x_N, y_N)}, estimate the prediction function f by minimizing the prediction error on the training set
  Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
Slide credit: L. Lazebnik
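As a rough illustration of "estimate f by minimizing the prediction error on the training set", here is a minimal sketch. It assumes a one-dimensional feature, labels in {-1, +1}, and a deliberately tiny family of prediction functions (a single threshold); the names train_threshold and predict and the toy data are hypothetical and not from the slides.

```python
def predict(t, x):
    """Prediction function f(x): +1 if the feature exceeds threshold t, else -1."""
    return 1 if x > t else -1


def train_threshold(xs, ys):
    """Training: pick the threshold with the fewest errors on the training set."""
    values = sorted(set(xs))
    # Candidate thresholds: below all points, between neighbors, above all points.
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    candidates += [values[-1] + 1.0]

    def training_error(t):
        return sum(1 for x, y in zip(xs, ys) if predict(t, x) != y)

    return min(candidates, key=training_error)


# Toy training set of labeled examples (x_i, y_i).
train_x = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
train_y = [-1, -1, -1, 1, 1, 1]

t = train_threshold(train_x, train_y)   # training time
print(predict(t, 2.9))                  # test time: a never-before-seen example
```

Real recognition systems use richer features and function families, but the split between a training step that fits f and a testing step that only applies f is the same.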

Steps
  Training: training images -> image features; image features + training labels -> training -> learned model
  Testing: test image -> image features -> learned model -> prediction
Slide credit: D. Hoiem and L. Lazebnik

Q: What are good features for recognizing a beach? Slide credit: D. Hoiem

Q: What are good features for recognizing cloth fabric? Slide credit: D. Hoiem

Q: What are good features for recognizing a mug? Slide credit: D. Hoiem

Q: What are good features for fine-grained recognition? Cardigan Welsh Corgi? Pembroke Welsh Corgi What breed is this dog? Slide credit: J. Deng

What are the right features? Depend on what you want to know!
  Object: shape
    Local shape info, shading, shadows, texture
  Material properties: albedo, feel, hardness
    Color, texture
  Scene: geometric layout
    Linear perspective, gradients, line segments
  Action: motion
    Optical flow, tracked points
Slide credit: D. Hoiem

What kind of things do we compute histograms of?
  Color: L*a*b* color space, HSV color space
  Texture (filter banks or HOG over regions)
Slide credit: D. Hoiem

What kind of things do we compute histograms of?
  Histograms of descriptors: SIFT [Lowe IJCV 2004]
  Bag of visual words
Slide credit: D. Hoiem

Bag of visual words: image patches -> cluster patches -> BoW histogram. Slide credit: D. Hoiem
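A color histogram can be sketched in a few lines. The example below is an assumption-laden illustration rather than the lecture's method: it treats an image as a non-empty list of (r, g, b) pixels in 0..255, converts each pixel to HSV with the standard-library colorsys module, and histograms only the hue channel into 8 bins. The function name hue_histogram and the bin count are hypothetical choices.

```python
import colorsys


def hue_histogram(pixels, bins=8):
    """Normalized hue histogram over a non-empty list of (r, g, b) pixels in 0..255."""
    counts = [0] * bins
    for r, g, b in pixels:
        # colorsys expects floats in [0, 1] and returns hue in [0, 1].
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        counts[min(int(h * bins), bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


# Toy "image": two red pixels, one green, one blue.
toy_image = [(255, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
hist = hue_histogram(toy_image)
print(hist)
```

The resulting vector is the feature representation x that a classifier would consume. Note that gray pixels have an undefined hue and colorsys reports it as 0, so a practical version would also bin saturation and value.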
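The three stages named on this slide (patches, clustering, histogram) can be sketched as follows. This is a minimal illustration under stated assumptions: patch descriptors are short tuples of numbers, clustering is plain k-means with a deterministic initialization from the first k points, and the function names kmeans and bow_histogram are hypothetical. Real systems typically cluster SIFT-like descriptors with a library implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(centers, p):
    """Index of the cluster center (visual word) closest to descriptor p."""
    return min(range(len(centers)), key=lambda j: sq_dist(centers[j], p))


def kmeans(points, k, iters=10):
    """Plain k-means; centers start at the first k points (an assumption)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster is empty
                centers[j] = [sum(c) / len(group) for c in zip(*group)]
    return centers


def bow_histogram(descriptors, centers):
    """Normalized count of how many descriptors fall on each visual word."""
    counts = [0] * len(centers)
    for d in descriptors:
        counts[nearest(centers, d)] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Toy descriptors pooled from training images, forming two obvious clusters.
pooled = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 9), (9, 10)]
vocabulary = kmeans(pooled, k=2)

# Descriptors extracted from one new image.
image_descriptors = [(1, 1), (0, 0), (9, 9), (1, 0)]
print(bow_histogram(image_descriptors, vocabulary))
```

The vocabulary is built once from training data; each image is then summarized by a fixed-length histogram regardless of how many patches it contains.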

Steps
  Training: training images -> image features; image features + training labels -> training -> learned model
  Testing: test image -> image features -> learned model -> prediction
Slide credit: D. Hoiem and L. Lazebnik

Recognition training data
  Images in the training set must be annotated with the correct answer that the model is expected to produce.
  [Example image label: Motorbike]
Slide credit: L. Lazebnik

Datasets today
  ImageNet: 22k categories, 14mil images
  Microsoft COCO: 80 categories, 300k images
  PASCAL: 20 categories, 12k images
  SUN: 5k categories, 130k images

The PASCAL Visual Object Classes Challenge (2005-2012)
http://pascallin.ecs.soton.ac.uk/challenges/voc/
  Challenge classes:
    Person: person
    Animal: bird, cat, cow, dog, horse, sheep
    Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
    Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
  Dataset size (by 2012): 11.5K training/validation images, 27K bounding boxes, 7K segmentations
Slide credit: L. Lazebnik

PASCAL competitions
http://pascallin.ecs.soton.ac.uk/challenges/voc/
  Classification: for each of the twenty classes, predicting presence/absence of an example of that class in the test image
  Detection: predicting the bounding box (if any) and label of each object from the twenty target classes in the test image
Adapted from L. Lazebnik

Challenges: robustness. Illumination, object pose, clutter, occlusions, intra-class appearance, viewpoint. Slide credit: K. Grauman

Challenges: robustness Realistic scenes are crowded, cluttered, have overlapping objects. Slide credit: K. Grauman

Challenges: importance of context Slide credit: Fei-Fei, Fergus & Torralba

Painter identification
  How would you learn to identify the author of a painting?
  [Example painters: Goya, Kirchner, Klimt, Marc, Monet, Van Gogh]

One way to think about it
  Training labels dictate that two examples are the same or different, in some sense
  Features and distances define visual similarity
  Goal of training is to learn feature weights so that visual similarity predicts label similarity
  Linear classifier: confidence in positive label is a weighted sum of features
  What are the weights?
  We want the simplest function that is confidently correct
Adapted from D. Hoiem

Nearest neighbor classifier
  [Figure: training examples from class 1, training examples from class 2, and a test example]
  f(x) = label of the training example nearest to x
  All we need is a distance function for our inputs
  No training required!
Slide credit: L. Lazebnik
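The rule f(x) = label of the training example nearest to x can be sketched directly. This is a minimal illustration: the training set is assumed to be a list of (feature_vector, label) pairs, the distance function is passed in as a parameter because the slide notes it is the only ingredient needed, and the names nn_classify, euclidean, and manhattan are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def manhattan(a, b):
    """An alternative distance function, to show the rule does not depend on one choice."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def nn_classify(train, x, dist=euclidean):
    """Return the label of the training example nearest to x. No training step."""
    _features, label = min(train, key=lambda example: dist(example[0], x))
    return label


train = [
    ((1.0, 1.0), "class 1"),
    ((1.5, 2.0), "class 1"),
    ((5.0, 5.0), "class 2"),
    ((6.0, 5.5), "class 2"),
]

print(nn_classify(train, (2.0, 1.5)))
print(nn_classify(train, (5.5, 5.0), dist=manhattan))
```

All the work happens at test time: every query scans the full training set, which is the storage and speed cost listed later among the NN cons.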

K-Nearest Neighbors classification
  For a new point, find the k closest points from training data
  Labels of the k points vote to classify
  [Figure: black = negative, red = positive; k = 5] If the query lands here, the 5 NN consist of 3 negatives and 2 positives, so we classify it as negative.
Slide credit: D. Lowe

1-nearest neighbor. [Figure: 2-D scatter plot of two classes (x and o) with points marked +; axes x1 and x2] Slide credit: D. Hoiem

3-nearest neighbor. [Figure: 2-D scatter plot of two classes (x and o) with points marked +; axes x1 and x2] Slide credit: D. Hoiem

5-nearest neighbor. [Figure: 2-D scatter plot of two classes (x and o) with points marked +; axes x1 and x2] What are the tradeoffs of having too large a k? Too small a k? Slide credit: D. Hoiem
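The voting rule can be sketched as below. The toy coordinates are invented so that the query's 5 nearest neighbors are 3 negatives and 2 positives, mirroring the situation described on the slide; the function name knn_classify is hypothetical, and ties (possible with even k or more than two classes) are not handled specially in this sketch.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance; it gives the same neighbor ordering as Euclidean."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_classify(train, x, k):
    """Find the k closest training points and let their labels vote."""
    neighbors = sorted(train, key=lambda example: sq_dist(example[0], x))[:k]
    votes = Counter(label for _features, label in neighbors)
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 0.0), "negative"),
    ((0.0, 1.0), "negative"),
    ((-1.0, 0.0), "negative"),
    ((0.5, 0.5), "positive"),
    ((0.0, -0.5), "positive"),
    ((5.0, 5.0), "positive"),
    ((6.0, 5.0), "positive"),
    ((5.0, 6.0), "positive"),
]
query = (0.0, 0.0)

print(knn_classify(train, query, k=5))  # 3 negatives outvote 2 positives
print(knn_classify(train, query, k=1))  # the single nearest point is positive
```

The same query gets different labels for k = 1 and k = 5, which is one concrete view of the tradeoff: a small k follows individual (possibly noisy) points, while a large k smooths over local structure.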

A nearest neighbor recognition example: im2gps: Estimating Geographic Information from a Single Image. James Hays and Alexei Efros. CVPR 2008. http://graphics.cs.cmu.edu/projects/im2gps/

Where in the World? Slides: James Hays

How much can an image tell about its geographic location? Slide credit: James Hays

Nearest Neighbors according to gist + bag of SIFT + color histogram + a few others Slide credit: James Hays

6+ million geotagged photos by 109,788 photographers Slides: James Hays

[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays

The Importance of Data [Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays

Discriminative classifiers
  Learn a simple function of the input features that correctly predicts the true labels on the training set: y = f(x)
  Training goals:
    1. Accurate classification of training data
    2. Correct classifications are confident
    3. Classification function is simple
Slide credit: D. Hoiem

Linear classifier
  Find a linear function to separate the classes:
  f(x) = sgn(w_1 x_1 + w_2 x_2 + ... + w_D x_D) = sgn(w · x)
  [Figure annotation: "What about this line?"]
Slide credit: L. Lazebnik

NN vs. linear classifiers
  NN pros:
    + Simple to implement
    + Decision boundaries not necessarily linear
    + Works for any number of classes
    + Nonparametric method
  NN cons:
    - Need good distance function
    - Slow at test time (large search problem to find neighbors)
    - Storage of data
  Linear pros:
    + Low-dimensional parametric representation
    + Very fast at test time
  Linear cons:
    - Works for two classes
    - How to train the linear function?
    - What if data is not linearly separable?
Adapted from L. Lazebnik
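Applying the decision rule f(x) = sgn(w · x) takes only a dot product. The sketch below assumes the weights w are already given (how to train them is left open here, as on the slide), treats sgn(0) as +1 by convention, and uses hypothetical names and toy weights. A bias term, which the slide's formula omits, could be folded in by appending a constant 1 to every feature vector.

```python
def sgn(v):
    """Sign function; mapping 0 to +1 is a convention chosen for this sketch."""
    return 1 if v >= 0 else -1


def linear_classify(w, x):
    """f(x) = sgn(w_1 x_1 + ... + w_D x_D)."""
    return sgn(sum(wi * xi for wi, xi in zip(w, x)))


# Hypothetical weights: the decision boundary is the line x_1 = x_2.
w = [1.0, -1.0]

print(linear_classify(w, (3.0, 1.0)))  # weighted sum is  2.0 -> +1
print(linear_classify(w, (1.0, 3.0)))  # weighted sum is -2.0 -> -1
```

Only the D weights need to be stored, and classifying a test example costs D multiplications, which is why the linear classifier is listed as low-dimensional and very fast at test time.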

Evaluating Classifiers
  Accuracy = # correctly classified / # all test examples
  Precision/recall:
    Precision = # retrieved positives / # retrieved
    Recall = # retrieved positives / # positives
  F-measure = 2PR / (P + R)
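These four measures can be computed from paired lists of true and predicted labels. The sketch below follows the slide's definitions, reading "retrieved" as "predicted positive"; the function name evaluate, the choice to return 0.0 when a denominator is zero, and the toy labels are assumptions made for illustration.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f_measure) for binary predictions."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    retrieved = sum(1 for _t, p in pairs if p == positive)
    positives = sum(1 for t, _p in pairs if t == positive)
    retrieved_positives = sum(1 for t, p in pairs if t == positive and p == positive)

    accuracy = correct / len(pairs)
    # Returning 0.0 for an empty denominator is a convention assumed here.
    precision = retrieved_positives / retrieved if retrieved else 0.0
    recall = retrieved_positives / positives if positives else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_measure


# Toy test set: 3 positives, 5 negatives; the classifier retrieves 3 examples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

accuracy, precision, recall, f_measure = evaluate(y_true, y_pred)
print(accuracy, precision, recall, f_measure)
```

In this toy case accuracy is 6/8 while precision and recall are both 2/3, showing how accuracy can look better than the positive class is actually handled when negatives dominate the test set.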