Lecture 5: Clustering and Segmentation Part 1

Professor Fei-Fei Li, Stanford Vision Lab

What we will learn today
- Segmentation and grouping
  - Gestalt principles
- Segmentation as clustering
  - K-means
  - Feature space
- Probabilistic clustering (Problem Set 1 (Q3))
  - Mixture of Gaussians, EM

Image Segmentation
Goal: identify groups of pixels that go together.
Slide credit: Steve Seitz, Kristen Grauman

The Goals of Segmentation
- Separate image into coherent objects
[Figure: an image and its human segmentation]
Slide credit: Svetlana Lazebnik

The Goals of Segmentation
- Separate image into coherent objects
- Group together similar-looking pixels ("superpixels") for efficiency of further processing
X. Ren and J. Malik. Learning a classification model for segmentation. ICCV 2003.
Slide credit: Svetlana Lazebnik

Segmentation
- Compact representation for image data in terms of a set of components
- Components share common visual properties
- Properties can be defined at different levels of abstraction

General ideas
- Tokens: whatever we need to group (pixels, points, surface elements, etc.)
- Bottom-up segmentation: tokens belong together because they are locally coherent
- Top-down segmentation: tokens belong together because they lie on the same visual entity (object, scene)
- These two are not mutually exclusive
This lecture (#5)

What is Segmentation?
- Clustering image elements that belong together
  - Partitioning: divide into regions/sequences with coherent internal properties
  - Grouping: identify sets of coherent tokens in the image
Slide credit: Christopher Rasmussen

What is Segmentation?
Why do these tokens belong together?

Basic ideas of grouping in human vision
- Gestalt properties
- Figure-ground discrimination

Examples of Grouping in Vision
- What things should be grouped?
- What cues indicate groups?
[Figure panels: grouping video frames into shots; determining image regions; figure-ground; object-level grouping]
Slide credit: Kristen Grauman

Similarity
Slide credit: Kristen Grauman

Symmetry
Slide credit: Kristen Grauman

Common Fate
Image credit: Arthus Bertrand (via F. Durand)
Slide credit: Kristen Grauman

Proximity
Slide credit: Kristen Grauman

Müller-Lyer Illusion
Gestalt principle: grouping is key to visual perception.

The Gestalt School
- Grouping is key to visual perception
- Elements in a collection can have properties that result from relationships
  - "The whole is greater than the sum of its parts"
[Figure panels: illusory/subjective contours; occlusion; familiar configuration]
http://en.wikipedia.org/wiki/gestalt_psychology
Slide credit: Svetlana Lazebnik

Gestalt Theory
- Gestalt: whole or group
  - The whole is greater than the sum of its parts
  - Relationships among parts can yield new properties/features
- Psychologists identified a series of factors that predispose a set of elements to be grouped (by the human visual system)
"I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of colour. Do I have '327'? No. I have sky, house, and trees."
Max Wertheimer (1880-1943), Untersuchungen zur Lehre von der Gestalt, Psychologische Forschung, Vol. 4, pp. 301-350, 1923
http://psy.ed.asu.edu/~classics/wertheimer/forms/forms.htm

Gestalt Factors
These factors make intuitive sense, but are very difficult to translate into algorithms.
Image source: Forsyth & Ponce

Continuity through Occlusion Cues
Continuity, explanation by occlusion
Image source: Forsyth & Ponce

Figure-Ground Discrimination

The Ultimate Gestalt?

What we will learn today
- Segmentation and grouping
  - Gestalt principles
- Segmentation as clustering
  - K-means
  - Feature space
- Probabilistic clustering
  - Mixture of Gaussians, EM
- Model-free clustering
  - Mean shift

Image Segmentation: Toy Example
[Figure: input image and its intensity histogram; peaks 1, 2, 3 correspond to black, gray, and white pixels]
- These intensities define the three groups.
- We could label every pixel in the image according to which of these primary intensities it is, i.e., segment the image based on the intensity feature.
- What if the image isn't quite so simple?
Slide credit: Kristen Grauman

[Figure: two input images, each with its histogram of pixel count vs. intensity]
Slide credit: Kristen Grauman

[Figure: input image with its histogram of pixel count vs. intensity]
Now how to determine the three main intensities that define our groups? We need to cluster.
Slide credit: Kristen Grauman

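[Figure: intensity axis with ticks at 0, 190, and 255; three cluster centers labeled 1, 2, 3]
- Goal: choose three "centers" as the representative intensities, and label every pixel according to which of these centers it is nearest to.
- The best cluster centers are those that minimize the sum of squared distances (SSD) between all points and their nearest cluster center c_i:
  $\mathrm{SSD} = \sum_{\text{clusters } i} \; \sum_{p \,\in\, \text{cluster } i} \lVert p - c_i \rVert^2$
Slide credit: Kristen Grauman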
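As a small worked example of this objective, the sketch below labels a handful of pixels by their nearest center and evaluates the SSD for two candidate sets of centers. It is written in plain Python (standard library only); the pixel values and the function names `nearest_center` and `ssd` are illustrative assumptions, chosen to sit near the intensities 0, 190, and 255 from the figure.

```python
def nearest_center(p, centers):
    """Return the index of the center closest to intensity p."""
    return min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)


def ssd(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum((p - centers[nearest_center(p, centers)]) ** 2 for p in points)


# Hypothetical noisy intensities clustered around 0, 190 and 255.
pixels = [2, 0, 5, 188, 192, 190, 250, 255, 253]
good_centers = [0, 190, 255]
poor_centers = [50, 100, 150]

# Label every pixel with the index of its nearest center.
labels = [nearest_center(p, good_centers) for p in pixels]

# Centers that sit on the intensity peaks give a much smaller SSD.
print(labels)
print(ssd(pixels, good_centers), ssd(pixels, poor_centers))
```

Centers that match the peaks of the histogram leave only the small within-group noise in the SSD, which is why minimizing it recovers the representative intensities.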

Clustering
With this objective, it is a "chicken and egg" problem:
- If we knew the cluster centers, we could allocate points to groups by assigning each to its closest center.
- If we knew the group memberships, we could get the centers by computing the mean per group.
Slide credit: Kristen Grauman

K-Means Clustering
Basic idea: randomly initialize the k cluster centers, and iterate between the two steps we just saw.
1. Randomly initialize the cluster centers c_1, ..., c_K
2. Given cluster centers, determine the points in each cluster: for each point p, find the closest c_i and put p into cluster i
3. Given the points in each cluster, solve for c_i: set c_i to be the mean of the points in cluster i
4. If any c_i has changed, repeat from Step 2
Properties
- Will always converge to some solution
- Can be a "local minimum": does not always find the global minimum of the objective function
  $\sum_{\text{clusters } i} \; \sum_{p \,\in\, \text{cluster } i} \lVert p - c_i \rVert^2$
Slide credit: Steve Seitz
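The four steps above can be sketched for scalar intensities as follows. This is a minimal illustration in plain Python, not the lecture's own code: the function name `kmeans_1d`, the fixed seed, the iteration cap, and the handling of empty clusters are all assumptions made for the example.

```python
import random


def kmeans_1d(points, k, max_iters=100, seed=0):
    """K-means iterations on scalar data; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 1: randomly initialise the centers from k of the data points.
    centers = rng.sample(points, k)
    labels = []
    for _ in range(max_iters):
        # Step 2: assign every point to its closest center.
        labels = [min(range(k), key=lambda i: (p - centers[i]) ** 2) for p in points]
        # Step 3: move each center to the mean of its assigned points.
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # An empty cluster keeps its old center (one possible convention).
            new_centers.append(sum(members) / len(members) if members else centers[i])
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Two well-separated groups of hypothetical intensities.
centers, labels = kmeans_1d([0, 1, 2, 100, 101, 102], k=2)
print(sorted(centers), labels)
```

On data this well separated every initialization reaches the same two centers; on harder data, different seeds can end in different local minima, which is the property noted on the slide.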

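Segmentation as Clustering
    % Assumes im (a grayscale image) and K (the number of clusters) are already defined.
    % kmeans is provided by the Statistics and Machine Learning Toolbox.
    img_as_col = double(im(:));                % flatten the image into a column of intensities
    cluster_membs = kmeans(img_as_col, K);     % cluster index (1..K) for every pixel
    labelim = zeros(size(im));
    for i = 1:K
        inds = find(cluster_membs == i);       % pixels assigned to cluster i
        meanval = mean(img_as_col(inds));      % mean intensity of cluster i
        labelim(inds) = meanval;               % paint those pixels with the cluster mean
    end
[Figure: segmentation results for K=2 and K=3]
Slide credit: Kristen Grauman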

K-Means Clustering
Java demo: http://home.dei.polimi.it/matteucc/clustering/tutorial_html/appletkm.html

K-Means++
Can we prevent arbitrarily bad local minima?
1. Randomly choose the first center.
2. Pick a new center with probability proportional to the contribution of p to the total error, i.e., the squared distance from p to its nearest center chosen so far.
3. Repeat until there are k centers.
Expected error = O(log k) * optimal
Arthur & Vassilvitskii 2007
Slide credit: Steve Seitz
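The seeding procedure can be sketched as below for scalar data. This is an illustrative plain-Python version under assumptions: the name `kmeanspp_init`, the fixed seed, and the uniform fallback for degenerate data are not from the slides. The returned centers would then be used in place of the purely random initialization of ordinary k-means.

```python
import random


def kmeanspp_init(points, k, seed=0):
    """Choose k initial centers from scalar data by k-means++ seeding."""
    rng = random.Random(seed)
    # Step 1: the first center is a uniformly random data point.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Each point's contribution to the current error: squared distance
        # to its nearest already-chosen center.
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
            continue
        # Step 2: sample the next center with probability proportional to d2.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# With three distinct points and k = 3, every point ends up as a center,
# because an already-chosen point has zero weight.
print(sorted(kmeanspp_init([0, 50, 100], k=3)))
```

Points far from every existing center are the most likely to be picked next, which spreads the initial centers out and is what yields the O(log k) guarantee in expectation.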

Feature Space
Depending on what we choose as the feature space, we can group pixels in different ways.
- Grouping pixels based on intensity similarity
- Feature space: intensity value (1D)
Slide credit: Kristen Grauman

Feature Space
Depending on what we choose as the feature space, we can group pixels in different ways.
- Grouping pixels based on color similarity
- Feature space: color value (3D)
[Figure: pixels plotted on R, G, B axes; example points (R=255, G=200, B=250), (R=245, G=220, B=248), (R=15, G=189, B=2), (R=3, G=12, B=2)]
Slide credit: Kristen Grauman

Feature Space
Depending on what we choose as the feature space, we can group pixels in different ways.
- Grouping pixels based on texture similarity
- Feature space: filter bank responses (e.g., 24D)
[Figure: filter bank of 24 filters F_1, F_2, ..., F_24]
Slide credit: Kristen Grauman

Smoothing Out Cluster Assignments
Assigning a cluster label per pixel may yield outliers.
[Figure: original image and the image labeled by cluster center's intensity, with clusters 1, 2, 3 and an uncertain pixel marked "?"]
How can we ensure they are spatially smooth?
Slide credit: Kristen Grauman

Segmentation as Clustering
Depending on what we choose as the feature space, we can group pixels in different ways.
- Grouping pixels based on intensity + position similarity
[Figure: feature space with axes Intensity, X, Y]
This is a way to encode both similarity and proximity.
Slide credit: Kristen Grauman

K-Means Clustering Results
- K-means clustering based on intensity or color is essentially vector quantization of the image attributes
- Clusters don't have to be spatially coherent
[Figure panels: image; intensity-based clusters; color-based clusters]
Image source: Forsyth & Ponce

K-Means Clustering Results
- K-means clustering based on intensity or color is essentially vector quantization of the image attributes
- Clusters don't have to be spatially coherent
- Clustering based on (r,g,b,x,y) values enforces more spatial coherence
Image source: Forsyth & Ponce
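To make the (r,g,b,x,y) feature space concrete, the sketch below builds one five-dimensional vector per pixel and shows how position enters the distance that k-means compares. It is plain Python under assumptions: the image representation (a list of rows of (r, g, b) tuples) and the `position_weight` parameter are hypothetical and not taken from the slides.

```python
def pixel_features(image, position_weight=1.0):
    """Build one (r, g, b, x, y) feature vector per pixel.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    `position_weight` is a hypothetical knob: it scales x and y so that
    proximity counts more or less than color similarity.
    """
    features = []
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            features.append((r, g, b, position_weight * x, position_weight * y))
    return features


def squared_distance(f1, f2):
    """Squared Euclidean distance, the quantity k-means compares."""
    return sum((a - b) ** 2 for a, b in zip(f1, f2))


# A 1x3 image: two identical red pixels separated by a blue one.
image = [[(255, 0, 0), (0, 0, 255), (255, 0, 0)]]

color_only = pixel_features(image, position_weight=0.0)
with_position = pixel_features(image, position_weight=10.0)

# Ignoring position, the two red pixels are indistinguishable; with
# position included, their distance grows with their separation.
print(squared_distance(color_only[0], color_only[2]))
print(squared_distance(with_position[0], with_position[2]))
```

How strongly to weight position against color is a design choice: a larger weight gives more spatially compact clusters at the cost of mixing colors within a cluster.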

Summary: K-Means
Pros
- Simple, fast to compute
- Converges to a local minimum of the within-cluster squared error
Cons/issues
- Setting k?
- Sensitive to initial centers
- Sensitive to outliers
- Detects spherical clusters only
- Assumes means can be computed
Slide credit: Kristen Grauman

What we will learn today
- Segmentation and grouping
  - Gestalt principles
- Segmentation as clustering
  - K-means
  - Feature space
- Probabilistic clustering (Problem Set 1 (Q3))
  - Mixture of Gaussians, EM

Probabilistic Clustering
Basic questions
- What's the probability that a point x is in cluster m?
- What's the shape of each cluster?
K-means doesn't answer these questions.
Basic idea
- Instead of treating the data as a bunch of points, assume that they are all generated by sampling a continuous function.
- This function is called a generative model.
- It is defined by a vector of parameters θ.
Slide credit: Steve Seitz

Mixture of Gaussians
One generative model is a mixture of Gaussians (MoG):
- K Gaussian blobs with means μ_b, covariance matrices V_b, dimension d
- Blob b is defined by a Gaussian density with mean μ_b and covariance V_b
- Blob b is selected with a mixing probability
- The likelihood of observing x is a weighted mixture of Gaussians
Slide credit: Steve Seitz
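The formulas on this slide did not survive extraction. In the standard form of a Gaussian mixture, and writing the selection probability of blob b as α_b (a symbol assumed here, not recovered from the slide), the blob density and the mixture likelihood would read:

```latex
\begin{align*}
P(x \mid \mu_b, V_b) &= \frac{1}{\sqrt{(2\pi)^d \, |V_b|}}
  \exp\!\left( -\tfrac{1}{2} (x - \mu_b)^{\top} V_b^{-1} (x - \mu_b) \right) \\
P(x \mid \theta) &= \sum_{b=1}^{K} \alpha_b \, P(x \mid \mu_b, V_b),
  \qquad \sum_{b=1}^{K} \alpha_b = 1 \\
\theta &= (\mu_1, \dots, \mu_K,\; V_1, \dots, V_K,\; \alpha_1, \dots, \alpha_K)
\end{align*}
```

Here θ collects every parameter of the model, so fitting the mixture means choosing the means, covariances, and mixing probabilities together.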

Expectation Maximization (EM)
Goal: find blob parameters θ that maximize the likelihood function of the data, $P(\text{data} \mid \theta) = \prod_{x} P(x \mid \theta)$.
Approach:
1. E step: given the current guess of blobs, compute ownership of each point
2. M step: given ownership probabilities, update blobs to maximize the likelihood function
3. Repeat until convergence
Slide credit: Steve Seitz
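The E step / M step loop can be sketched for a one-dimensional mixture as follows. This is a minimal plain-Python illustration under assumptions: the names `gaussian_pdf` and `em_1d`, the fixed number of iterations (instead of a convergence test), and the variance floor are choices made for the example, not part of the lecture.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_1d(points, means, variances, weights, iters=50, min_var=1e-6):
    """Run EM for a 1D Gaussian mixture from the given initial guess.

    Returns (means, variances, weights, ownership), where ownership[n][b]
    is the probability that point n belongs to blob b.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(points)
    ownership = []
    for _ in range(iters):
        # E step: ownership of each point under the current blobs.
        ownership = []
        for x in points:
            joint = [weights[b] * gaussian_pdf(x, means[b], variances[b]) for b in range(k)]
            total = sum(joint)
            ownership.append([j / total for j in joint])
        # M step: re-estimate each blob from its softly assigned points.
        for b in range(k):
            mass = sum(ownership[i][b] for i in range(n))
            weights[b] = mass / n
            means[b] = sum(ownership[i][b] * points[i] for i in range(n)) / mass
            var = sum(ownership[i][b] * (points[i] - means[b]) ** 2 for i in range(n)) / mass
            # Floor the variance so a collapsing blob cannot divide by zero
            # (the floor value is an arbitrary choice).
            variances[b] = max(var, min_var)
    return means, variances, weights, ownership


# Two hypothetical groups of intensities and a rough initial guess.
data = [0, 1, 2, 100, 101, 102]
means, variances, weights, ownership = em_1d(
    data, means=[0.0, 100.0], variances=[100.0, 100.0], weights=[0.5, 0.5]
)
print(means, weights)
```

Unlike k-means, each point contributes to every blob in proportion to its ownership, so the assignments are soft. The sketch does not guard against a point whose density underflows to zero under every blob; a more careful version would work with log-densities.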

EM Details
E step
- Compute the probability that point x is in blob b, given the current guess of θ
M step (over the N data points)
- Compute the probability that blob b is selected
- Mean of blob b
- Covariance of blob b
Slide credit: Steve Seitz
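The update formulas on this slide were lost in extraction. For a Gaussian mixture they take the following standard form, where x_i are the N data points, P(x | μ_b, V_b) is the Gaussian density of blob b, and α_b denotes the probability that blob b is selected (the symbol α_b is an assumption, not recovered from the slide):

```latex
\begin{align*}
\text{E step:}\quad
P(b \mid x_i, \theta) &= \frac{\alpha_b \, P(x_i \mid \mu_b, V_b)}
  {\sum_{j=1}^{K} \alpha_j \, P(x_i \mid \mu_j, V_j)} \\[4pt]
\text{M step:}\quad
\alpha_b^{\text{new}} &= \frac{1}{N} \sum_{i=1}^{N} P(b \mid x_i, \theta) \\
\mu_b^{\text{new}} &= \frac{\sum_{i=1}^{N} x_i \, P(b \mid x_i, \theta)}
  {\sum_{i=1}^{N} P(b \mid x_i, \theta)} \\
V_b^{\text{new}} &= \frac{\sum_{i=1}^{N} (x_i - \mu_b^{\text{new}})(x_i - \mu_b^{\text{new}})^{\top} P(b \mid x_i, \theta)}
  {\sum_{i=1}^{N} P(b \mid x_i, \theta)}
\end{align*}
```

Each M-step quantity is the usual sample estimate (proportion, mean, covariance) with every point weighted by its ownership probability from the E step.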

Applications of EM
Turns out this is useful for all sorts of problems:
- Any clustering problem
- Any model estimation problem
- Missing data problems
- Finding outliers
- Segmentation problems
  - Segmentation based on color
  - Segmentation based on motion
  - Foreground/background separation
  - ...
EM demo: http://lcn.epfl.ch/tutorial/english/gaussian/html/index.html
Slide credit: Steve Seitz

Segmentation with EM
[Figure: original image and EM segmentation results for k=2, k=3, k=4, k=5]
Image source: Serge Belongie

Summary: Mixtures of Gaussians, EM
Pros
- Probabilistic interpretation
- Soft assignments between data points and clusters
- Generative model, can predict novel data points
- Relatively compact storage
Cons
- Local minima
- Initialization: often a good idea to start with some k-means iterations
- Need to know the number of components; solutions: model selection (AIC, BIC), Dirichlet process mixture
- Need to choose a generative model
- Numerical problems are often a nuisance
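As a rough illustration of model selection for the number of components, the sketch below computes AIC and BIC for a one-dimensional mixture from its log-likelihood. It is plain Python under assumptions: the function names and the 1D setting are illustrative, and in practice the scores would be compared across mixtures fitted with different k, keeping the lowest.

```python
import math


def gmm_1d_log_likelihood(points, means, variances, weights):
    """Log-likelihood of scalar data under a 1D Gaussian mixture."""
    total = 0.0
    for x in points:
        px = sum(
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for m, v, w in zip(means, variances, weights)
        )
        total += math.log(px)
    return total


def aic_bic(log_likelihood, n_points, k):
    """AIC and BIC for a 1D mixture with k components (lower is better)."""
    # Free parameters: k means, k variances, k - 1 independent weights.
    n_params = 3 * k - 1
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_points) - 2 * log_likelihood
    return aic, bic


# Score a hypothetical two-component fit on six data points.
data = [0, 1, 2, 100, 101, 102]
ll = gmm_1d_log_likelihood(data, [1.0, 101.0], [1.0, 1.0], [0.5, 0.5])
print(aic_bic(ll, len(data), k=2))
```

Both criteria trade goodness of fit against the number of parameters; BIC penalizes extra components more heavily as the number of data points grows.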

What we have learned today
- Segmentation and grouping
  - Gestalt principles
- Segmentation as clustering
  - K-means
  - Feature space
- Probabilistic clustering (Problem Set 1 (Q3))
  - Mixture of Gaussians, EM