CS 1674: Intro to Computer Vision Intro to Recognition Prof. Adriana Kovashka University of Pittsburgh October 24, 2016
Plan for today Examples of visual recognition problems What should we recognize? Recognition pipeline Features Data Overview of some methods for classification K-Nearest Neighbors Linear classifiers
Some translations Feature vector = descriptor = representation Recognition often involves classification Classes = categories (hence classification = categorization) Training = learning a model (e.g. classifier), happens at training time from training data Classification = prediction, happens at test time
Slide credit: D. Hoiem
Classification Given a feature representation for images, how do we learn a model for distinguishing features from different classes? Decision boundary Zebra Non-zebra Slide credit: L. Lazebnik
Classification Assign input vector to one of two or more classes Any decision rule divides the input space into decision regions separated by decision boundaries Slide credit: L. Lazebnik
Example: Spam filter Slide credit: L. Lazebnik
Examples of Categorization in Vision Part or object detection E.g., for each window: face or non-face? Scene categorization Indoor vs. outdoor, urban, forest, kitchen, etc. Action recognition Picking up vs. sitting down vs. standing Emotion recognition Happy vs. scared vs. surprised Region classification Label pixels into different object/surface categories Boundary classification Boundary vs. non-boundary Etc, etc. Adapted from D. Hoiem
What do you see in this image? Trees Bear Camera Man Can I put stuff in it? Rabbit Grass Forest Slide credit: D. Hoiem
Describe, predict, or interact with the object based on visual cues Is it dangerous? Is it alive? How fast does it run? Does it have a tail? Is it soft? Can I poke with it? Slide credit: D. Hoiem
Image categorization Two-class (binary): Cat vs Dog Adapted from D. Hoiem
Image categorization Multi-class (often): Object recognition Caltech 101 Average Object Images Adapted from D. Hoiem
Image categorization Fine-grained recognition Visipedia Project Slide credit: D. Hoiem
Image categorization Place recognition Places Database [Zhou et al. NIPS 2014] Slide credit: D. Hoiem
Image categorization Dating historical photos 1940 1953 1966 1977 [Palermo et al. ECCV 2012] Slide credit: D. Hoiem
Image categorization Image style recognition [Karayev et al. BMVC 2014] Slide credit: D. Hoiem
Region categorization Layout prediction Assign regions to orientation Geometric context [Hoiem et al. IJCV 2007] Assign regions to depth Make3D [Saxena et al. PAMI 2008] Slide credit: D. Hoiem
Region categorization Material recognition [Bell et al. CVPR 2015] Slide credit: D. Hoiem
Attribute-based recognition A. Farhadi, I. Endres, D. Hoiem, and D Forsyth. Describing Objects by their Attributes. CVPR 2009.
Attribute-based recognition A. Kovashka, S. Vijayanarasimhan, and K. Grauman. Actively Selecting Annotations Among Objects and Attributes. ICCV 2011.
Attribute-based search A. Kovashka, D. Parikh and K. Grauman. WhittleSearch: Image Search with Relative Attribute Feedback. CVPR 2012.
Generic categorization problem Slide credit: K. Grauman
Instance-level recognition problem John's car Which one do you think is harder: Generic or instance-level recognition? Adapted from K. Grauman
Visual Object Categories What stuff should we bother to recognize? Basic Level Categories in human categorization [Rosch 76, Lakoff 87] The highest level at which category members have similar perceived shape The highest level at which a single mental image reflects the entire category The level at which human subjects are usually fastest at identifying category members The first level named and understood by children The highest level at which a person uses similar motor actions for interaction with category members K. Grauman, B. Leibe
Visual Object Categories Basic-level categories in humans seem to be defined predominantly visually. There is evidence that humans (usually) start with basic-level categorization before doing identification. Basic-level categorization is easier and faster for humans than object identification! Abstract levels: animal, quadruped. Basic level: dog, cat, cow. Below basic: German shepherd, Doberman. Individual level: Fido. K. Grauman, B. Leibe
Object Categorization Task Description: Given a small number of training images of a category, recognize a priori unknown instances of that category and assign the correct category label. Which categories are feasible visually? Fido, German shepherd, dog, animal, living being K. Grauman, B. Leibe
How many object categories are there? Source: Fei-Fei Li, Rob Fergus, Antonio Torralba. Biederman 1987
Other Types of Categories Functional Categories e.g. chairs = something you can sit on K. Grauman, B. Leibe
Other Types of Categories Ad-hoc categories e.g. something you can find in an office environment K. Grauman, B. Leibe
Why recognition? Recognition is a fundamental part of perception, e.g. for robots and autonomous agents Organize and give access to visual content Connect to information Detect trends and themes Slide credit: K. Grauman
Why is recognition hard?
Recognition: A machine learning approach
The machine learning framework Apply a prediction function to a feature representation of the image to get the desired output: f( ) = apple f( ) = tomato f( ) = cow Slide credit: L. Lazebnik
The machine learning framework y = f(x) output prediction function image feature Training: given a training set of labeled examples {(x 1,y 1 ),, (x N,y N )}, estimate the prediction function f by minimizing the prediction error on the training set Testing: apply f to a never before seen test example x and output the predicted value y = f(x) Slide credit: L. Lazebnik
Steps Training: Training Images + Training Labels → Image Features → Training → Learned model Testing: Test Image → Image Features → Learned model → Prediction Slide credit: D. Hoiem and L. Lazebnik
Q: What are good features for recognizing a beach? Slide credit: D. Hoiem
Q: What are good features for recognizing cloth fabric? Slide credit: D. Hoiem
Q: What are good features for recognizing a mug? Slide credit: D. Hoiem
Q: What are good features for fine-grained recognition? Cardigan Welsh Corgi? Pembroke Welsh Corgi What breed is this dog? Slide credit: J. Deng
What are the right features? Depend on what you want to know! Object: shape Local shape info, shading, shadows, texture Material properties: albedo, feel, hardness Color, texture Scene: geometric layout linear perspective, gradients, line segments Action: motion Optical flow, tracked points Slide credit: D. Hoiem
What kind of things do we compute histograms of? Color L*a*b* color space HSV color space Texture (filter banks or HOG over regions) Slide credit: D. Hoiem
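As a concrete instance of a color histogram feature, the sketch below bins the pixels of an RGB image into a joint 3-D histogram (4 bins per channel, giving a 64-D descriptor) and normalizes it; the bin count and the use of RGB rather than L*a*b* or HSV are illustrative choices:

```python
import numpy as np

def color_histogram(img, bins=4):
    """Joint color histogram of an HxWx3 uint8 image -> bins**3 descriptor."""
    pixels = img.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    hist = hist.flatten()
    # Normalize so images of different sizes are comparable
    return hist / hist.sum()

img = np.zeros((8, 8, 3), dtype=np.uint8)  # toy all-black image
h = color_histogram(img)
print(h.shape)  # (64,); all mass falls in the first (darkest) bin
```

For a perceptually better-spaced histogram one would first convert to L*a*b* or HSV, as the slide suggests, before binning.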
What kind of things do we compute histograms of? Histograms of descriptors SIFT [Lowe IJCV 2004] Bag of visual words Slide credit: D. Hoiem
Bag of visual words Image patches Cluster patches BoW histogram Slide credit: D. Hoiem
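The bag-of-visual-words pipeline above can be sketched as: cluster local descriptors pooled from training images into a small vocabulary (a tiny Lloyd's-algorithm k-means here), then describe an image by the normalized histogram of which word each of its descriptors is assigned to. The random vectors stand in for real SIFT descriptors, and the vocabulary size of 10 is just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Minimal Lloyd's k-means: returns k cluster centers (the visual words)."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(0)
    return centers

def bow_histogram(descriptors, vocab):
    """Quantize each descriptor to its nearest word; return normalized counts."""
    words = np.argmin(((descriptors[:, None] - vocab[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()

all_desc = rng.normal(size=(200, 8))   # descriptors pooled over training images
vocab = kmeans(all_desc, k=10)         # the visual vocabulary
image_desc = rng.normal(size=(30, 8))  # descriptors from one image
h = bow_histogram(image_desc, vocab)
print(h.shape)  # (10,) histogram, summing to 1
```

In practice the vocabulary has hundreds to thousands of words and is built once from many training images, then reused to encode every image.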
Steps (revisited) Training: Training Images + Training Labels → Image Features → Training → Learned model Testing: Test Image → Image Features → Learned model → Prediction Slide credit: D. Hoiem and L. Lazebnik
Recognition training data Images in the training set must be annotated with the correct answer that the model is expected to produce Motorbike Slide credit: L. Lazebnik
Datasets today ImageNet: 22k categories, 14M images Microsoft COCO: 70 categories, 300k images PASCAL: 20 categories, 12k images SUN: 5k categories, 130k images
The PASCAL Visual Object Classes Challenge (2005-2012) http://pascallin.ecs.soton.ac.uk/challenges/voc/ Challenge classes: Person: person Animal: bird, cat, cow, dog, horse, sheep Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor Dataset size (by 2012): 11.5K training/validation images, 27K bounding boxes, 7K segmentations Slide credit: L. Lazebnik
PASCAL competitions http://pascallin.ecs.soton.ac.uk/challenges/voc/ Classification: For each of the twenty classes, predicting presence/absence of an example of that class in the test image Detection: Predicting the bounding box (if any) and label of each object from the twenty target classes in the test image Adapted from L. Lazebnik
Challenges: robustness Illumination Object pose Clutter Occlusions Intra-class appearance Viewpoint Slide credit: K. Grauman
Challenges: robustness Realistic scenes are crowded, cluttered, have overlapping objects. Slide credit: K. Grauman
Challenges: importance of context Slide credit: Fei-Fei, Fergus & Torralba
Painter identification How would you learn to identify the author of a painting? Goya Kirchner Klimt Marc Monet Van Gogh
One way to think about it Training labels dictate that two examples are the same or different, in some sense Features and distances define visual similarity Goal of training is to learn feature weights so that visual similarity predicts label similarity Linear classifier: confidence in positive label is a weighted sum of features What are the weights? We want the simplest function that is confidently correct Adapted from D. Hoiem
Nearest neighbor classifier Training examples from class 1 Test example Training examples from class 2 f(x) = label of the training example nearest to x All we need is a distance function for our inputs No training required! Slide credit: L. Lazebnik
K-Nearest Neighbors classification For a new point, find the k closest points from training data Labels of the k points vote to classify Black = negative Red = positive k = 5 If query lands here, the 5 NN consist of 3 negatives and 2 positives, so we classify it as negative. Slide credit: D. Lowe
1-nearest neighbor [figure: 2-D feature space (x1, x2) with training points from classes x and o and query points +] Slide credit: D. Hoiem
3-nearest neighbor [figure: same 2-D scatter of x and o training points with query points +] Slide credit: D. Hoiem
5-nearest neighbor [figure: same 2-D scatter of x and o training points with query points +] What are the tradeoffs of having too large a k? Too small a k? Slide credit: D. Hoiem
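The k-NN rule from the preceding slides is short enough to write out directly: compute a distance (Euclidean here, though the choice of distance function is up to us) from the query to every training point, take the k closest, and let their labels vote. The toy 2-D data below is made up for the example:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify query x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = ["o", "o", "o", "x", "x", "x"]
print(knn_predict(X, y, np.array([0.2, 0.3]), k=3))  # "o"
print(knn_predict(X, y, np.array([5.1, 5.2]), k=3))  # "x"
```

Note that there is no training step at all; the cost is paid at test time, when every training example must be searched.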
A nearest neighbor recognition example: im2gps: Estimating Geographic Information from a Single Image. James Hays and Alexei Efros. CVPR 2008. http://graphics.cs.cmu.edu/projects/im2gps/
Where in the World? Slides: James Hays
How much can an image tell about its geographic location? Slide credit: James Hays
Nearest Neighbors according to gist + bag of SIFT + color histogram + a few others Slide credit: James Hays
6+ million geotagged photos by 109,788 photographers Slides: James Hays
[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays
The Importance of Data [Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays
Discriminative classifiers Learn a simple function of the input features that correctly predicts the true labels on the training set y = f x Training Goals 1. Accurate classification of training data 2. Correct classifications are confident 3. Classification function is simple Slide credit: D. Hoiem
Linear classifier Find a linear function to separate the classes: f(x) = sgn(w1 x1 + w2 x2 + ... + wD xD) = sgn(w · x) What about this line? Slide credit: L. Lazebnik
NN vs. linear classifiers NN pros: + Simple to implement + Decision boundaries not necessarily linear + Works for any number of classes + Nonparametric method NN cons: Need good distance function Slow at test time (large search problem to find neighbors) Storage of data Linear pros: + Low-dimensional parametric representation + Very fast at test time Linear cons: Only works for two classes in its basic form How to train the linear function? What if data is not linearly separable? Adapted from L. Lazebnik
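One classical answer to "how to train the linear function?" (covered later in the course; this is just a preview sketch) is the perceptron: cycle through the training set and nudge w toward any example that f(x) = sgn(w · x) gets wrong. The bias term is folded in by appending a constant-1 feature; the data below is a made-up separable toy set:

```python
import numpy as np

def perceptron_train(X, y, epochs=100):
    """Perceptron for labels +1 / -1 on linearly separable data."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append 1 to absorb the bias
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:  # misclassified (or on the boundary)
                w += yi * xi        # move w toward the correct side
                errors += 1
        if errors == 0:             # converged: all training points correct
            break
    return w

def predict(w, x):
    return int(np.sign(w @ np.append(x, 1.0)))

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])
w = perceptron_train(X, y)
print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

If the data is not linearly separable this loop never converges, which motivates the more robust training objectives (e.g. max-margin) discussed in later lectures.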
Evaluating Classifiers Accuracy = # correctly classified / # all test examples Precision = # retrieved positives / # retrieved Recall = # retrieved positives / # positives F-measure = 2PR / (P + R)
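These four metrics follow directly from counting over the test set, as in this sketch (the "pos"/"neg" labels and predictions are made-up examples):

```python
def evaluate(y_true, y_pred, positive="pos"):
    """Accuracy, precision, recall, and F-measure from true and predicted labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    retrieved = sum(p == positive for p in y_pred)   # predicted positive
    positives = sum(t == positive for t in y_true)   # actually positive
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / positives if positives else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f

y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
print(evaluate(y_true, y_pred))  # accuracy 0.6; precision, recall, F all 2/3
```

Precision and recall matter when the classes are imbalanced: a classifier that predicts "negative" everywhere can have high accuracy yet zero recall.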