The Bias-Variance Tradeoff



CS 2750: Machine Learning The Bias-Variance Tradeoff Prof. Adriana Kovashka University of Pittsburgh January 13, 2016

Plan for Today More Matlab Measuring performance The bias-variance trade-off

Matlab Tutorial http://cs.brown.edu/courses/cs143/2011/docs/matlab-tutorial/ https://people.cs.pitt.edu/~milos/courses/cs2750/Tutorial/ http://www.math.udel.edu/~braun/m349/matlab_probs2.pdf

Matlab Exercise http://www.facstaff.bucknell.edu/maneval/help211/basicexercises.html Do Problems 1-8, 12 Most also have solutions Ask the TA if you have any problems

Homework 1 http://people.cs.pitt.edu/~kovashka/cs2750/hw1.htm If I hear about issues, I will mark clarifications and adjustments in the assignment in red, so check periodically

ML in a Nutshell y = f(x) output prediction function features Training: given a training set of labeled examples {(x_1, y_1), ..., (x_N, y_N)}, estimate the prediction function f by minimizing the prediction error on the training set Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x) Slide credit: L. Lazebnik
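The train/test recipe above can be made concrete with a minimal learner. The sketch below (hypothetical, not from the slides) uses a 1-nearest-neighbor rule as the prediction function f: training just stores the labeled examples, and testing applies f to a never-before-seen x.

```python
# Minimal "y = f(x)" sketch: training returns a prediction function f,
# testing applies f to a new feature vector.
def train(examples):
    """examples: list of (x, y) pairs, where x is a numeric feature tuple."""
    def f(x):
        # 1-nearest-neighbor: predict the label of the closest training point
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        _, y = min(examples, key=lambda ex: sq_dist(ex[0], x))
        return y
    return f

f = train([((0.0, 0.0), "apple"), ((1.0, 1.0), "tomato"), ((5.0, 5.0), "cow")])
print(f((0.9, 1.2)))  # nearest training point is (1.0, 1.0) -> "tomato"
```

Here the "training error minimization" is trivial (1-NN has zero training error); the same interface holds for any learned f.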

ML in a Nutshell Apply a prediction function to a feature representation (in this example, of an image) to get the desired output: f( ) = apple f( ) = tomato f( ) = cow Slide credit: L. Lazebnik

Data Representation Let's brainstorm what our X should be for various Y prediction tasks

Measuring Performance If y is discrete: Accuracy: # correctly classified / # all test examples Loss: Weighted misclassification via a confusion matrix In case of only two classes: True Positive, False Positive, True Negative, False Negative Might want to penalize our system differently for FP and FN Can extend to k classes

Measuring Performance If y is discrete: Precision/recall Precision = # predicted true pos / # predicted pos Recall = # predicted true pos / # true pos F-measure = 2PR / (P + R)

Precision / Recall / F-measure True positives (images that contain people) True negatives (images that do not contain people) Predicted positives (images predicted to contain people) Predicted negatives (images predicted not to contain people) Precision = 2 / 5 = 0.4 Recall = 2 / 4 = 0.5 F-measure = 2*0.4*0.5 / (0.4+0.5) ≈ 0.44 Accuracy: 5 / 10 = 0.5
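The worked example can be checked with a few lines of Python. The counts below are inferred from the slide's numbers: 2 true positives, 3 false positives, 2 false negatives, and 3 true negatives over 10 images.

```python
# Metrics for the people-detection example: TP=2, FP=3, FN=2, TN=3.
tp, fp, fn, tn = 2, 3, 2, 3

precision = tp / (tp + fp)                                  # 2/5 = 0.4
recall = tp / (tp + fn)                                     # 2/4 = 0.5
f_measure = 2 * precision * recall / (precision + recall)   # ~0.44
accuracy = (tp + tn) / (tp + fp + fn + tn)                  # 5/10 = 0.5

print(precision, recall, round(f_measure, 2), accuracy)
```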

Measuring Performance If y is continuous: Euclidean distance between true y and predicted y
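In code, that continuous-case error is just the distance between the vector of true values and the vector of predictions (a minimal sketch):

```python
import math

# Euclidean distance between true and predicted values over a test set.
def euclidean_error(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)))

print(euclidean_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt((3-5)^2) = 2.0
```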

Generalization Training set (labels known) Test set (labels unknown) How well does a learned model generalize from the data it was trained on to a new test set? Slide credit: L. Lazebnik

Generalization Components of expected loss Noise in our observations: unavoidable Bias: how much the average model over all training sets differs from the true model Error due to inaccurate assumptions/simplifications made by the model Variance: how much models estimated from different training sets differ from each other Underfitting: model is too simple to represent all the relevant class characteristics High bias and low variance High training error and high test error Overfitting: model is too complex and fits irrelevant characteristics (noise) in the data Low bias and high variance Low training error and high test error Adapted from L. Lazebnik
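The variance component can be made concrete with a small simulation (a sketch assuming NumPy and a hypothetical true model, sin(2πx) with Gaussian noise): fit a simple and a complex polynomial to many independently drawn training sets and measure how much the fitted curves differ across sets.

```python
import numpy as np

# Draw many training sets from the same noisy true model, fit a 1st- and a
# 9th-order polynomial to each, and measure the variance of the fitted
# curves across training sets. More complexity -> more variance.
rng = np.random.default_rng(0)
x_eval = np.linspace(0.05, 0.95, 50)

preds = {1: [], 9: []}
for _ in range(200):
    x = rng.uniform(0, 1, 10)
    t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 10)
    for degree in (1, 9):
        preds[degree].append(np.polyval(np.polyfit(x, t, degree), x_eval))

variance = {d: float(np.array(p).var(axis=0).mean()) for d, p in preds.items()}
print(variance[1] < variance[9])  # the complex model varies far more
```

The line (degree 1) barely moves between training sets but misses the sine shape (high bias); the 9th-order fits swing wildly from one training set to the next (high variance).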

Bias-Variance Trade-off Models with too few parameters are inaccurate because of a large bias (not enough flexibility). Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample). Slide credit: D. Hoiem

Polynomial Curve Fitting Slide credit: Chris Bishop

Sum-of-Squares Error Function Slide credit: Chris Bishop

0th Order Polynomial Slide credit: Chris Bishop

1st Order Polynomial Slide credit: Chris Bishop

3rd Order Polynomial Slide credit: Chris Bishop

9th Order Polynomial Slide credit: Chris Bishop

Over-fitting Root-Mean-Square (RMS) Error: Slide credit: Chris Bishop
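In Bishop's formulation the RMS error is E_RMS = sqrt(2 E(w*) / N), making errors comparable across data-set sizes. The training-side effect can be sketched as follows (assuming NumPy; sin(2πx) plus noise is Bishop's running example): training RMS error can only decrease as the polynomial order grows, reaching roughly zero when the 9th-order model interpolates all 10 points.

```python
import numpy as np

# Fit polynomials of increasing order to N=10 noisy samples of sin(2*pi*x)
# and record the training RMS error for each order.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 10)

train_rms = {}
for degree in (0, 1, 3, 9):
    resid = np.polyval(np.polyfit(x, t, degree), x) - t
    train_rms[degree] = float(np.sqrt(np.mean(resid ** 2)))

print({d: round(e, 3) for d, e in train_rms.items()})  # 9th order: ~0
```

The test RMS error would instead blow up at order 9, which is the over-fitting the slide illustrates.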

Data Set Size: 9th Order Polynomial Slide credit: Chris Bishop

Data Set Size: 9th Order Polynomial Slide credit: Chris Bishop

Question Who can give me an example of overfitting involving the Steelers and what will happen on Sunday?

How to reduce over-fitting? Get more training data Slide credit: D. Hoiem

Regularization Penalize large coefficient values (Remember: We want to minimize this expression.) Adapted from Chris Bishop
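The penalized objective here is Bishop's regularized sum-of-squares error, Ẽ(w) = ½ Σ_n (y(x_n, w) − t_n)² + (λ/2)‖w‖². A closed-form sketch (assuming NumPy; ridge regression on 9th-order polynomial features) shows the coefficient norm shrinking as λ grows:

```python
import numpy as np

# Ridge regression: w = (Phi^T Phi + lambda*I)^(-1) Phi^T t.
# Larger lambda -> smaller coefficients (less sensitivity to the sample).
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 10)
Phi = np.vander(x, 10)  # 9th-order polynomial features

def ridge(lam):
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ t)

norms = {lam: float(np.linalg.norm(ridge(lam))) for lam in (1e-8, 1e-3, 1.0)}
print(norms[1.0] < norms[1e-3] < norms[1e-8])  # coefficients shrink
```

This mirrors the coefficient table on the later slide: with no regularization the 9th-order coefficients are huge; with heavy regularization they are driven toward zero.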

Polynomial Coefficients Slide credit: Chris Bishop

Regularization: Slide credit: Chris Bishop

Regularization: Slide credit: Chris Bishop

Regularization: vs. Slide credit: Chris Bishop

Polynomial Coefficients No regularization Huge regularization Adapted from Chris Bishop

How to reduce over-fitting? Get more training data Regularize the parameters Slide credit: D. Hoiem

Bias-variance Figure from Chris Bishop

Bias-variance tradeoff [Figure: training and test error vs. model complexity. Test error is U-shaped: high on the left from underfitting (high bias, low variance) and high on the right from overfitting (low bias, high variance); training error decreases with complexity.] Slide credit: D. Hoiem

Bias-variance tradeoff [Figure: test error vs. model complexity, one curve for few training examples and one for many; more training data lowers the test-error curve, from the high-bias/low-variance end to the low-bias/high-variance end.] Slide credit: D. Hoiem

Choosing the trade-off Need validation set (separate from test set) [Figure: training and test error vs. model complexity, from high bias / low variance on the left to low bias / high variance on the right.] Slide credit: D. Hoiem

Effect of Training Size Fixed prediction model [Figure: error vs. number of training examples; testing error falls and training error rises, both converging toward the generalization error.] Adapted from D. Hoiem
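That learning-curve behavior can be simulated (a sketch assuming NumPy, with a fixed 3rd-order polynomial model on a hypothetical sin(2πx)-plus-noise task): averaged over many resampled training sets, test error shrinks as the training set grows.

```python
import numpy as np

# Fixed model (3rd-order polynomial); average test RMS error over many
# resampled training sets shrinks as the training set grows.
rng = np.random.default_rng(3)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

def avg_test_rms(n_train, trials=200):
    errs = []
    for _ in range(trials):
        x = rng.uniform(0, 1, n_train)
        t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n_train)
        resid = np.polyval(np.polyfit(x, t, 3), x_test) - y_test
        errs.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errs))

few, many = avg_test_rms(10), avg_test_rms(100)
print(many < few)  # more training data -> lower test error
```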

How to reduce over-fitting? Get more training data Regularize the parameters Use fewer features Choose a simpler classifier Slide credit: D. Hoiem

Remember Three kinds of error Inherent: unavoidable Bias: due to over-simplifications Variance: due to inability to perfectly estimate parameters from limited data Try simple classifiers first Use increasingly powerful classifiers with more training data (bias-variance trade-off) Adapted from D. Hoiem