CS 1674: Intro to Computer Vision Intro to Recognition Prof. Adriana Kovashka University of Pittsburgh October 24, 2016
Plan for today Examples of visual recognition problems What should we recognize? Recognition pipeline Features Data Overview of some methods for classification K-Nearest Neighbors Linear classifiers
Some translations Feature vector = descriptor = representation Recognition often involves classification Classes = categories (hence classification = categorization) Training = learning a model (e.g. classifier), happens at training time from training data Classification = prediction, happens at test time
Slide credit: D. Hoiem
Classification Given a feature representation for images, how do we learn a model for distinguishing features from different classes? Decision boundary Zebra Non-zebra Slide credit: L. Lazebnik
Classification Assign input vector to one of two or more classes Any decision rule divides the input space into decision regions separated by decision boundaries Slide credit: L. Lazebnik
Example: Spam filter Slide credit: L. Lazebnik
Examples of Categorization in Vision Part or object detection E.g., for each window: face or non-face? Scene categorization Indoor vs. outdoor, urban, forest, kitchen, etc. Action recognition Picking up vs. sitting down vs. standing Emotion recognition Happy vs. scared vs. surprised Region classification Label pixels into different object/surface categories Boundary classification Boundary vs. non-boundary Etc, etc. Adapted from D. Hoiem
What do you see in this image? Trees Bear Camera Man Can I put stuff in it? Rabbit Grass Forest Slide credit: D. Hoiem
Describe, predict, or interact with the object based on visual cues Is it dangerous? Is it alive? How fast does it run? Does it have a tail? Is it soft? Can I poke with it? Slide credit: D. Hoiem
Image categorization Two-class (binary): Cat vs Dog Adapted from D. Hoiem
Image categorization Multi-class (often): Object recognition Caltech 101 Average Object Images Adapted from D. Hoiem
Image categorization Fine-grained recognition Visipedia Project Slide credit: D. Hoiem
Image categorization Place recognition Places Database [Zhou et al. NIPS 2014] Slide credit: D. Hoiem
Image categorization Dating historical photos 1940 1953 1966 1977 [Palermo et al. ECCV 2012] Slide credit: D. Hoiem
Image categorization Image style recognition [Karayev et al. BMVC 2014] Slide credit: D. Hoiem
Region categorization Layout prediction Assign regions to orientation Geometric context [Hoiem et al. IJCV 2007] Assign regions to depth Make3D [Saxena et al. PAMI 2008] Slide credit: D. Hoiem
Region categorization Material recognition [Bell et al. CVPR 2015] Slide credit: D. Hoiem
Attribute-based recognition A. Farhadi, I. Endres, D. Hoiem, and D Forsyth. Describing Objects by their Attributes. CVPR 2009.
Attribute-based recognition A. Kovashka, S. Vijayanarasimhan, and K. Grauman. Actively Selecting Annotations Among Objects and Attributes. ICCV 2011.
Attribute-based search A. Kovashka, D. Parikh and K. Grauman. WhittleSearch: Image Search with Relative Attribute Feedback. CVPR 2012.
Generic categorization problem Slide credit: K. Grauman
Instance-level recognition problem John's car Which one do you think is harder: Generic or instance-level recognition? Adapted from K. Grauman
Visual Object Categories What stuff should we bother to recognize? Basic Level Categories in human categorization [Rosch 76, Lakoff 87] The highest level at which category members have similar perceived shape The highest level at which a single mental image reflects the entire category The level at which human subjects are usually fastest at identifying category members The first level named and understood by children The highest level at which a person uses similar motor actions for interaction with category members K. Grauman, B. Leibe
Visual Object Categories Basic-level categories in humans seem to be defined predominantly visually. There is evidence that humans (usually) start with basic-level categorization before doing identification. Basic-level categorization is easier and faster for humans than object identification! Abstract levels: animal, quadruped. Basic level: dog, cat, cow. Below basic: German shepherd, Doberman. Individual level: Fido. K. Grauman, B. Leibe
Object Categorization Task Description: Given a small number of training images of a category, recognize a priori unknown instances of that category and assign the correct category label. Which categories are feasible visually? Fido, German shepherd, dog, animal, living being K. Grauman, B. Leibe
How many object categories are there? Source: Fei-Fei Li, Rob Fergus, Antonio Torralba. Biederman 1987
Other Types of Categories Functional Categories e.g. chairs = something you can sit on K. Grauman, B. Leibe
Other Types of Categories Ad-hoc categories e.g. something you can find in an office environment K. Grauman, B. Leibe
Why recognition? Recognition is a fundamental part of perception, e.g. for robots and autonomous agents Organize and give access to visual content Connect to information Detect trends and themes Slide credit: K. Grauman
Why is recognition hard?
Recognition: A machine learning approach
The machine learning framework Apply a prediction function to a feature representation of the image to get the desired output: f( ) = apple f( ) = tomato f( ) = cow Slide credit: L. Lazebnik
The machine learning framework y = f(x) output prediction function image feature Training: given a training set of labeled examples {(x 1,y 1 ),, (x N,y N )}, estimate the prediction function f by minimizing the prediction error on the training set Testing: apply f to a never before seen test example x and output the predicted value y = f(x) Slide credit: L. Lazebnik
Steps Training: Training Images + Training Labels → Image Features → Training → Learned model Testing: Test Image → Image Features → Learned model → Prediction Slide credit: D. Hoiem and L. Lazebnik
Q: What are good features for recognizing a beach? Slide credit: D. Hoiem
Q: What are good features for recognizing cloth fabric? Slide credit: D. Hoiem
Q: What are good features for recognizing a mug? Slide credit: D. Hoiem
Q: What are good features for fine-grained recognition? Cardigan Welsh Corgi? Pembroke Welsh Corgi What breed is this dog? Slide credit: J. Deng
What are the right features? Depend on what you want to know! Object: shape Local shape info, shading, shadows, texture Material properties: albedo, feel, hardness Color, texture Scene: geometric layout linear perspective, gradients, line segments Action: motion Optical flow, tracked points Slide credit: D. Hoiem
What kind of things do we compute histograms of? Color L*a*b* color space HSV color space Texture (filter banks or HOG over regions) Slide credit: D. Hoiem
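As a concrete instance of a color histogram feature, the sketch below bins the pixels of an RGB image into a joint 3-D histogram (4 bins per channel, giving a 64-D descriptor) and normalizes it; the bin count and the use of RGB rather than L*a*b* or HSV are illustrative choices:

```python
import numpy as np

def color_histogram(img, bins=4):
    """Joint color histogram of an HxWx3 uint8 image -> bins**3 descriptor."""
    pixels = img.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    hist = hist.flatten()
    # Normalize so images of different sizes are comparable
    return hist / hist.sum()

img = np.zeros((8, 8, 3), dtype=np.uint8)  # toy all-black image
h = color_histogram(img)
print(h.shape)  # (64,); all mass falls in the first (darkest) bin
```

For a perceptually better-spaced histogram one would first convert to L*a*b* or HSV, as the slide suggests, before binning.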
What kind of things do we compute histograms of? Histograms of descriptors SIFT [Lowe IJCV 2004] Bag of visual words Slide credit: D. Hoiem
Bag of visual words Image patches Cluster patches BoW histogram Slide credit: D. Hoiem
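The bag-of-visual-words pipeline above can be sketched as: cluster local descriptors pooled from training images into a small vocabulary (a tiny Lloyd's-algorithm k-means here), then describe an image by the normalized histogram of which word each of its descriptors is assigned to. The random vectors stand in for real SIFT descriptors, and the vocabulary size of 10 is just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Minimal Lloyd's k-means: returns k cluster centers (the visual words)."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(0)
    return centers

def bow_histogram(descriptors, vocab):
    """Quantize each descriptor to its nearest word; return normalized counts."""
    words = np.argmin(((descriptors[:, None] - vocab[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()

all_desc = rng.normal(size=(200, 8))   # descriptors pooled over training images
vocab = kmeans(all_desc, k=10)         # the visual vocabulary
image_desc = rng.normal(size=(30, 8))  # descriptors from one image
h = bow_histogram(image_desc, vocab)
print(h.shape)  # (10,) histogram, summing to 1
```

In practice the vocabulary has hundreds to thousands of words and is built once from many training images, then reused to encode every image.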
Steps (revisited) Training: Training Images + Training Labels → Image Features → Training → Learned model Testing: Test Image → Image Features → Learned model → Prediction Slide credit: D. Hoiem and L. Lazebnik
Recognition training data Images in the training set must be annotated with the correct answer that the model is expected to produce Motorbike Slide credit: L. Lazebnik
Datasets today ImageNet: 22k categories, 14M images Microsoft COCO: 70 categories, 300k images PASCAL: 20 categories, 12k images SUN: 5k categories, 130k images
The PASCAL Visual Object Classes Challenge (2005-2012) http://pascallin.ecs.soton.ac.uk/challenges/voc/ Challenge classes: Person: person Animal: bird, cat, cow, dog, horse, sheep Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor Dataset size (by 2012): 11.5K training/validation images, 27K bounding boxes, 7K segmentations Slide credit: L. Lazebnik
PASCAL competitions http://pascallin.ecs.soton.ac.uk/challenges/voc/ Classification: For each of the twenty classes, predicting presence/absence of an example of that class in the test image Detection: Predicting the bounding box (if any) and label of each object from the twenty target classes in the test image Adapted from L. Lazebnik
Challenges: robustness Illumination Object pose Clutter Occlusions Intra-class appearance Viewpoint Slide credit: K. Grauman
Challenges: robustness Realistic scenes are crowded, cluttered, have overlapping objects. Slide credit: K. Grauman
Challenges: importance of context Slide credit: Fei-Fei, Fergus & Torralba
Painter identification How would you learn to identify the author of a painting? Goya Kirchner Klimt Marc Monet Van Gogh
One way to think about it Training labels dictate that two examples are the same or different, in some sense Features and distances define visual similarity Goal of training is to learn feature weights so that visual similarity predicts label similarity Linear classifier: confidence in positive label is a weighted sum of features What are the weights? We want the simplest function that is confidently correct Adapted from D. Hoiem
Nearest neighbor classifier Training examples from class 1 Test example Training examples from class 2 f(x) = label of the training example nearest to x All we need is a distance function for our inputs No training required! Slide credit: L. Lazebnik
K-Nearest Neighbors classification For a new point, find the k closest points from training data Labels of the k points vote to classify Black = negative Red = positive k = 5 If query lands here, the 5 NN consist of 3 negatives and 2 positives, so we classify it as negative. Slide credit: D. Lowe
1-nearest neighbor [figure: 2-D feature space (x1, x2) with training points from classes x and o and query points +] Slide credit: D. Hoiem
3-nearest neighbor [figure: same 2-D scatter of x and o training points with query points +] Slide credit: D. Hoiem
5-nearest neighbor [figure: same 2-D scatter of x and o training points with query points +] What are the tradeoffs of having too large a k? Too small a k? Slide credit: D. Hoiem
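The k-NN rule from the preceding slides is short enough to write out directly: compute a distance (Euclidean here, though the choice of distance function is up to us) from the query to every training point, take the k closest, and let their labels vote. The toy 2-D data below is made up for the example:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify query x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = ["o", "o", "o", "x", "x", "x"]
print(knn_predict(X, y, np.array([0.2, 0.3]), k=3))  # "o"
print(knn_predict(X, y, np.array([5.1, 5.2]), k=3))  # "x"
```

Note that there is no training step at all; the cost is paid at test time, when every training example must be searched.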
A nearest neighbor recognition example: im2gps: Estimating Geographic Information from a Single Image. James Hays and Alexei Efros. CVPR 2008. http://graphics.cs.cmu.edu/projects/im2gps/
Where in the World? Slides: James Hays
How much can an image tell about its geographic location? Slide credit: James Hays
Nearest Neighbors according to gist + bag of SIFT + color histogram + a few others Slide credit: James Hays
6+ million geotagged photos by 109,788 photographers Slides: James Hays
[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays
The Importance of Data [Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays
Discriminative classifiers Learn a simple function of the input features that correctly predicts the true labels on the training set y = f x Training Goals 1. Accurate classification of training data 2. Correct classifications are confident 3. Classification function is simple Slide credit: D. Hoiem
Linear classifier Find a linear function to separate the classes: f(x) = sgn(w1 x1 + w2 x2 + ... + wD xD) = sgn(w · x) What about this line? Slide credit: L. Lazebnik
NN vs. linear classifiers NN pros: + Simple to implement + Decision boundaries not necessarily linear + Works for any number of classes + Nonparametric method NN cons: Need good distance function Slow at test time (large search problem to find neighbors) Storage of data Linear pros: + Low-dimensional parametric representation + Very fast at test time Linear cons: Only works for two classes in its basic form How to train the linear function? What if data is not linearly separable? Adapted from L. Lazebnik
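One classical answer to "how to train the linear function?" (covered later in the course; this is just a preview sketch) is the perceptron: cycle through the training set and nudge w toward any example that f(x) = sgn(w · x) gets wrong. The bias term is folded in by appending a constant-1 feature; the data below is a made-up separable toy set:

```python
import numpy as np

def perceptron_train(X, y, epochs=100):
    """Perceptron for labels +1 / -1 on linearly separable data."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append 1 to absorb the bias
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:  # misclassified (or on the boundary)
                w += yi * xi        # move w toward the correct side
                errors += 1
        if errors == 0:             # converged: all training points correct
            break
    return w

def predict(w, x):
    return int(np.sign(w @ np.append(x, 1.0)))

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])
w = perceptron_train(X, y)
print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

If the data is not linearly separable this loop never converges, which motivates the more robust training objectives (e.g. max-margin) discussed in later lectures.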
Evaluating Classifiers Accuracy = # correctly classified / # all test examples Precision = # retrieved positives / # retrieved Recall = # retrieved positives / # positives F-measure = 2PR / (P + R)
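These four metrics follow directly from counting over the test set, as in this sketch (the "pos"/"neg" labels and predictions are made-up examples):

```python
def evaluate(y_true, y_pred, positive="pos"):
    """Accuracy, precision, recall, and F-measure from true and predicted labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    retrieved = sum(p == positive for p in y_pred)   # predicted positive
    positives = sum(t == positive for t in y_true)   # actually positive
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / positives if positives else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f

y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
print(evaluate(y_true, y_pred))  # accuracy 0.6; precision, recall, F all 2/3
```

Precision and recall matter when the classes are imbalanced: a classifier that predicts "negative" everywhere can have high accuracy yet zero recall.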