CS 2770: Computer Vision. Introduction. Prof. Adriana Kovashka University of Pittsburgh January 5, 2017

About the Instructor
Born 1985 in Sofia, Bulgaria
Got BA in 2008 at Pomona College, CA (Computer Science & Media Studies)
Got PhD in 2014 at University of Texas at Austin (Computer Vision)

Course Info
Course website: http://people.cs.pitt.edu/~kovashka/cs2770
Instructor: Adriana Kovashka (kovashka@cs.pitt.edu); use "CS2770" at the beginning of your email subject
Office: Sennott Square 5325
Office hours: Tue/Thu, 3:30pm - 5:30pm

TA
Keren Ye (yekeren@cs.pitt.edu)
Office: Sennott Square 5501
Office hours: TBD. Do the Doodle by the end of Friday: http://doodle.com/poll/v3m8acmcdsiydqhq

Textbooks
Computer Vision: Algorithms and Applications by Richard Szeliski
Visual Object Recognition by Kristen Grauman and Bastian Leibe
More resources available on course webpage
Your notes from class are your best study material; the slides do not come complete with notes

Course Goals
To learn about the basic computer vision tasks and approaches
To get experience with some computer vision techniques
To learn/apply basic machine learning (a key component of modern computer vision)
To think critically about vision approaches, and to see connections between works and potential for improvement

Policies and Schedule http://people.cs.pitt.edu/~kovashka/cs2770/

Should I take this class?
It will be a lot of work! But you will learn a lot
Some parts will be hard and require that you pay close attention! But I will have periodic ungraded pop quizzes to see how you're doing
I will also pick on students randomly to answer questions
Use the instructor's and TA's office hours!!!

Questions?

Plan for Today
Introductions
What is computer vision?
Why do we care?
What are the challenges?
What is the current research like?
Overview of topics (if time)

Introductions
What is your name?
What one thing outside of school are you passionate about?
Do you have any prior experience with computer vision?
What do you hope to get out of this class?
Every time you speak, please remind me of your name

Computer Vision

What is computer vision? Done? "We see with our brains, not with our eyes" (Oliver Sacks and others). Kristen Grauman (adapted)

What is computer vision?
Automatic understanding of images and video
Computing properties of the 3D world from visual data (measurement)
Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities (perception and interpretation)
Algorithms to mine, search, and interact with visual data (search and organization)
Kristen Grauman

Vision for measurement
Real-time stereo (NASA Mars Rover)
Structure from motion (Pollefeys et al.)
Multi-view stereo for community photo collections (Goesele et al.)
Kristen Grauman. Slide credit: L. Lazebnik

Vision for perception, interpretation
Objects, activities, scenes, locations, text / writing, faces, gestures, motions, emotions
[Figure: amusement park photo (Cedar Point, Lake Erie) annotated with labels such as sky, water, tree, deck, bench, Ferris wheel, carousel, umbrellas, pedestrians, The Wicked Twister, maxair, ride, people waiting in line, people sitting on ride]
Kristen Grauman

Visual search, organization
Query → Image or video archives → Relevant content
Kristen Grauman

Related disciplines Graphics Image processing Artificial intelligence Computer vision Algorithms Machine learning Cognitive science Kristen Grauman

Vision and graphics
Images → Model (vision); Model → Images (graphics)
Inverse problems: analysis and synthesis.
Kristen Grauman

Why vision?
Images and video are everywhere!
144k hours uploaded to YouTube daily
4.5 mil photos uploaded to Flickr daily
10 bil images indexed by Google
Personal photo albums
Movies, news, sports
Surveillance and security
Medical and scientific images
Adapted from Lana Lazebnik

Why vision?
As image sources multiply, so do applications
Relieve humans of boring, easy tasks
Human-computer interaction
Perception for robotics / autonomous agents
Organize and give access to visual content
Description of image content for the visually impaired
Fun applications (e.g. transfer art styles to my photos)
Adapted from Kristen Grauman

Faces and digital cameras
Camera waits for everyone to smile to take a photo [Canon]
Setting camera focus via face detection
Kristen Grauman

Devi Parikh Face recognition

Linking to info with a mobile device Situated search Yeh et al., MIT kooaba MSR Lincoln Kristen Grauman

Exploring photo collections Snavely et al. Kristen Grauman

Special visual effects
The Matrix
What Dreams May Come
Mocap for Pirates of the Caribbean, Industrial Light and Magic
Source: S. Seitz. Kristen Grauman

Yong Jae Lee Interactive systems

Video-based interfaces
Human joystick: NewsBreaker Live (YouTube link)
Assistive technology systems: Camera Mouse, Boston College
Kristen Grauman

Vision for medical & neuroimages
fMRI data (Golland et al.)
Image guided surgery (MIT AI Vision Group)
Kristen Grauman

Safety & security
Navigation, driver safety
Monitoring pool (Poseidon)
Pedestrian detection (MERL, Viola et al.)
Surveillance
Kristen Grauman

Healthy eating
FarmBot.io (YouTube link)
Im2calories by Myers et al., ICCV 2015

Self-training for sports? Pirsiavash et al., Assessing the Quality of Actions, ECCV 2014

Image generation Reed et al., ICML 2016 Radford et al., ICLR 2016

Seeing AI (YouTube link)

Obstacles? Kristen Grauman Read more about the history: Szeliski Sec. 1.2

What the computer gets Why is this problematic? Adapted from Kristen Grauman and Lana Lazebnik

Why is vision difficult?
Ill-posed problem: real world much more complex than what we can measure in images (3D → 2D)
Impossible to literally invert image formation process with limited information
Need information outside of this particular image to generalize what image portrays (e.g. to resolve occlusion)
Adapted from Kristen Grauman
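To make the 3D to 2D loss concrete, here is a minimal sketch assuming an idealized pinhole camera with unit focal length (the function name `project` and the sample points are illustrative, not from the lecture). Two different 3D points on the same viewing ray land on the same image coordinates, so depth cannot be recovered from the single image alone.

```python
def project(point, focal_length=1.0):
    """Idealized pinhole projection of a 3D point (X, Y, Z), Z > 0, to 2D."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)

# Two distinct 3D points on the same ray through the camera center
# (illustrative values; the second is twice as far away as the first).
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)

near_2d = project(near)
far_2d = project(far)

# Depth is lost: both points map to the same image location.
same_pixel = near_2d == far_2d
print(near_2d, far_2d, same_pixel)
```

Because many 3D scenes produce the same 2D measurements, inverting the projection requires extra information such as a second view or prior knowledge about the scene.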

Challenges: many nuisance parameters
Illumination, object pose, clutter, occlusions, intra-class appearance, viewpoint
Think again about the pixels
Kristen Grauman

Challenges: intra-class variation CMOA Pittsburgh slide credit: Fei-Fei, Fergus & Torralba

Challenges: importance of context slide credit: Fei-Fei, Fergus & Torralba

Challenges: Complexity
Thousands to millions of pixels in an image
3,000-30,000 human recognizable object categories
30+ degrees of freedom in the pose of articulated objects (humans)
Billions of images indexed by Google Image Search
1.424 billion smart camera phones sold in 2015
About half of the cerebral cortex in primates is devoted to processing visual information [Felleman and van Essen 1991]
Kristen Grauman

Challenges: Limited supervision
[Figure: examples ordered from less to more supervision]
Kristen Grauman

Challenges: Vision requires reasoning Antol et al., VQA: Visual Question Answering, ICCV 2015

OK, clearly the vision problem is deep and challenging... time to give up?
Active research area with exciting progress!
How datasets changed:
Kristen Grauman

Datasets today
ImageNet: 22k categories, 14 mil images
Microsoft COCO: 80 categories, 300k images
PASCAL: 20 categories, 12k images
SUN: 5k categories, 130k images

Some Visual Recognition Problems

Recognition: What is this?

Recognition: What objects do you see?
building, street, balcony, truck, carriage, horse, table, person, person, car

Detection: Where are the cars?

Activity: What is this person doing?

Scene: Is this an indoor scene?

Instance: Which city? Which building?

Visual question answering: What are all these people participating in?

The Latest at CVPR 2016 * CVPR = IEEE Conference on Computer Vision and Pattern Recognition

Our ability to detect objects has gone from 34 mAP (mean average precision) in 2008 to 73 mAP at 7 FPS (frames per second) or 63 mAP at 45 FPS in 2016
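The detection scores above are mean average precision values. As a rough sketch of how such a number is formed, the code below computes a simplified, non-interpolated average precision per class and then averages over classes. Detection benchmarks typically use interpolated variants and an overlap threshold to decide what counts as a hit, so the function names and the toy rankings here are illustrative assumptions, not the benchmark protocol.

```python
def average_precision(ranked_hits, num_positives):
    """Simplified AP: mean of the precision at each rank where a hit occurs.

    ranked_hits: booleans for detections sorted by decreasing confidence,
                 True where the detection matches a ground-truth object.
    num_positives: number of ground-truth objects of this class.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_positives if num_positives else 0.0


def mean_average_precision(per_class):
    """per_class: list of (ranked_hits, num_positives) pairs, one per class."""
    scores = [average_precision(hits, n) for hits, n in per_class]
    return sum(scores) / len(scores)


# Toy example with two hypothetical classes.
toy = [
    ([True, False, True], 2),  # AP = (1/1 + 2/3) / 2
    ([False, True], 1),        # AP = (1/2) / 1
]
toy_map = mean_average_precision(toy)
print(round(100 * toy_map, 1))  # reported on a 0-100 scale, like the slide
```

The slide's numbers are on this 0-100 scale, and the FPS figures show the accompanying trade-off between accuracy and speed.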

Redmon et al., CVPR 2016 You Only Look Once: Unified, Real-Time Object Detection

Force from Motion: Decoding Physical Sensation from a First Person Video Park et al., CVPR 2016

MovieQA: Understanding Stories in Movies through Question-Answering Tapaswi et al., CVPR 2016

Owens et al., CVPR 2016 Visually Indicated Sounds

Anticipating Visual Representations from Unlabeled Video Vondrick et al., CVPR 2016

Gatys et al., CVPR 2016 Image Style Transfer Using Convolutional Neural Networks

DeepArt.io: try it for yourself! (Image Style Transfer Using Convolutional Neural Networks)
[Figures: input images, styles, and results]

Thomas and Kovashka, CVPR 2016 Seeing Behind the Camera: Identifying the Authorship of a Photograph

Is computer vision solved?
Given an image, we can guess with 81% accuracy what object categories are shown (ResNet)
...but we only answer "why" questions about images with 14% accuracy!

Why does it seem that it's solved?
Deep learning makes excellent use of massive data (labeled for the task of interest?)
But it's hard to understand how it does so
It doesn't work well when massive data is not available and your task is different than tasks for which data is available
Sometimes the manner in which deep methods work is not intellectually appealing, but our smarter / more complex methods perform worse

Overview of Topics

Overview of topics
Lower-level vision: analyzing textures, edges and gradients in images, without concern for the semantics (e.g. objects) of the image
Higher-level vision: making predictions about the semantics or higher-level functions of content in images (e.g. objects, attributes, styles, motion, etc.)
Higher-level vision involves machine learning; we'll cover some basics of this, then go back to low-level tasks

Features and filters Transforming and describing images; textures, colors, edges Kristen Grauman
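As a small illustration of filtering for edges, the sketch below slides the 1D filter [-1, 0, 1] along each row of a tiny grayscale image stored as nested lists. The filter choice, the function name `horizontal_gradient`, and the toy image are assumptions for illustration; the course itself may use different filters and tools.

```python
def horizontal_gradient(image):
    """Apply the filter [-1, 0, 1] along each row, keeping only positions
    where the filter fits entirely inside the image."""
    return [
        [row[c + 1] - row[c - 1] for c in range(1, len(row) - 1)]
        for row in image
    ]


# Toy image: dark region on the left, bright region on the right,
# so there is a vertical edge in the middle.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

edges = horizontal_gradient(image)
for row in edges:
    print(row)
```

The response is zero inside the flat regions and large only near the intensity jump, which is the sense in which a filter "describes" where edges are.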

Features and filters Detecting distinctive + repeatable features Describing images with local statistics

Indexing and search Matching features and regions across images Kristen Grauman
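One simple way to match a feature across images is to pick the candidate descriptor with the smallest distance to the query. The sketch below assumes descriptors are short tuples of numbers and uses squared Euclidean distance; the name `nearest_match` and the descriptor values are made up for illustration, and practical systems add indexing structures to avoid comparing against every candidate.

```python
def nearest_match(query, candidates):
    """Return (index, squared distance) of the candidate closest to query."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best = min(range(len(candidates)),
               key=lambda i: squared_distance(query, candidates[i]))
    return best, squared_distance(query, candidates[best])


# Hypothetical descriptors extracted from a second image.
descriptors_b = [
    (0.0, 1.0, 0.0),
    (0.9, 0.1, 0.0),
    (0.0, 0.0, 1.0),
]

# Hypothetical descriptor from the first image.
index, distance = nearest_match((1.0, 0.0, 0.0), descriptors_b)
print(index, distance)
```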

Image formation
How does light in the 3D world project to form 2D images?
Kristen Grauman

Multiple views
Multi-view geometry, matching, invariant features, stereo vision
Figures: Lowe; Hartley and Zisserman; Fei-Fei Li
Kristen Grauman

Grouping and fitting Clustering, segmentation, fitting; what parts belong together? Kristen Grauman [fig from Shi et al]
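To illustrate the "what parts belong together?" question, here is a minimal k-means sketch that groups pixel intensities into clusters. It is one common clustering method, chosen here as an assumption rather than the specific technique the slide's figure shows; the function name `kmeans_1d`, the intensities, and the initial centers are illustrative.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster 1D values by alternating assignment and center updates."""
    centers = list(centers)

    def nearest(v):
        return min(range(len(centers)), key=lambda k: abs(v - centers[k]))

    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            clusters[nearest(v)].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[k]
            for k, c in enumerate(clusters)
        ]

    labels = [nearest(v) for v in values]
    return centers, labels


# Toy grayscale intensities: three dark pixels and three bright pixels.
pixels = [12, 15, 10, 200, 210, 205]
centers, labels = kmeans_1d(pixels, centers=[0, 255])
print(centers, labels)
```

Pixels that share a label form one segment; real segmentation usually clusters richer features such as color, position, or texture rather than intensity alone.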

Visual recognition Recognizing objects and categories, learning techniques Kristen Grauman

Object detection Detecting novel instances of objects Classifying regions as one of several categories

Attribute-based description Describing the high-level properties of objects Allows recognition of unseen objects

Convolutional neural networks
State-of-the-art on many recognition tasks
Image → Prediction
Krizhevsky et al. Yosinski et al., ICML DL workshop 2015

Recurrent neural networks Sequence processing, e.g. question answering Wu et al., CVPR 2016

Motion and tracking Tracking objects, video analysis Tomas Izo Kristen Grauman

Pose and actions Automatically annotating human pose (joints) Recognizing actions in first-person video

Your Homework
Fill out Doodle
Read entire course website
Do first reading

Next Time
Linear algebra review
Matlab tutorial