Introductory Statistics. Lecture 1 Sinan Hanay


There are three kinds of lies: lies, damned lies, and statistics. (Mark Twain)

Statistics Does Not Lie

But what is Statistics?

Before that, what can Statistics do?

Statistics
The heart of data science and machine learning
Machine learning: the process by which a computer learns patterns
Some examples: Google Translate, Google's driverless car

Google driverless car completed 500,000 km accident-free

Watch movies online (or rent DVDs, like Tsutaya, Hulu)

Why Netflix Is #1?
Alternatives: Amazon Prime, iTunes, Hulu, Vudu, PSN, and others
There may be many answers: price, number of movies
One cool feature of Netflix: its movie recommendation system

Netflix Competition
Input: the movies a user has watched. Output: the movies suggested to that user.
Goal: develop a suggestion system that improves on the existing one by 10%
Prize: 1 million dollars

How Does It Work?
Machine learning, based on statistical inference
Other applications: credit scores, Amazon recommendations, price predictors on travel sites

Flight from Brazil to France on 31 May 2009
Lost contact after a few hours
Five days later, the first wreckage was discovered
What was the cause of the accident?

The Cause
To determine it, they had to find the voice recorder (i.e., the black box)
Search area: 6,300 square km

Search for the Black Box
By April 2011 (22 months after the crash), it had still not been found
Metron started to search using a statistical method
Within one week, a huge part of the wreckage was found
In May 2011, the black box was found

Probability
What is the probability that at least two people in this class have the same birthday (i.e., the same month and day)? Guess?

For 23 people, it is about 50%
For 30 people, it is about 70%
For 66 people, it is over 99%
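The birthday figures can be checked with a short calculation. The following is a minimal sketch, assuming 365 equally likely birthdays (no leap years) and independent people; the function name is only illustrative. It computes the chance that all birthdays differ and subtracts that from 1.

```python
def shared_birthday_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday.

    Assumes every day is equally likely and people are independent.
    """
    if n > days:
        return 1.0  # more people than days: a shared birthday is certain
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for n in (23, 30, 66):
    print(n, round(shared_birthday_probability(n), 3))
```

Under these assumptions the results are close to the rounded percentages on the slide.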

Statistics for Experiments
Uncertainties in experiments and populations

Expressing Values
Is this 6.80 or 6.89 grams?
The display shows only one decimal digit
Furthermore, does the device round up or round down? Both are possible
It could even be 6.71 grams
Express as 6.8 ± 0.1 g
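To see why a reading of 6.8 is reported as 6.8 ± 0.1 g, the sketch below lists every true mass (to the hundredth of a gram) that would show 6.8 on a one-decimal display. It assumes the device either always rounds down or always rounds up, since we do not know which. The helper name and the candidate range are only illustrative.

```python
def displayed_tenths(true_hundredths: int, mode: str) -> int:
    """Value a one-decimal display would show, in tenths of a gram."""
    if mode == "down":
        return true_hundredths // 10
    if mode == "up":
        return -(-true_hundredths // 10)  # ceiling division on integers
    raise ValueError(f"unknown rounding mode: {mode}")


# Candidate true masses from 6.50 g to 7.19 g, stored as integer hundredths
# of a gram so that no floating-point rounding interferes.
candidates = range(650, 720)

# Keep every mass that shows "6.8" under at least one rounding behaviour.
consistent = sorted(
    h for h in candidates
    if any(displayed_tenths(h, mode) == 68 for mode in ("down", "up"))
)

print(consistent[0] / 100, consistent[-1] / 100)  # prints 6.71 6.89
```

The consistent values run from 6.71 g to 6.89 g, which is the range that 6.8 ± 0.1 g summarizes.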

Measure Length

Systematic Error: Thermal Expansion

Systematic Error
Statistics cannot eliminate systematic errors
You need to calibrate the measurement devices to get accurate measurements

Are we done after fixing the devices?

Random Errors
A device could be perfectly accurate but not precise enough
Maybe you have the perfect device, but you are not punctual enough

Random Errors
They depend on the measurement
Fortunately, Statistics can reduce the uncertainty
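One common way to reduce the uncertainty from random errors is to repeat the measurement and average the readings. This is not necessarily the method the course goes on to use. The sketch below simulates it. The true value, the size of the random error, and the number of repeats are all made-up numbers, and the spread is measured with the standard deviation introduced at the end of this lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 6.80  # hypothetical true mass in grams
NOISE_SD = 0.05    # hypothetical size of the random error in grams


def measure() -> float:
    """One reading: the true value plus a random error."""
    return random.gauss(TRUE_VALUE, NOISE_SD)


# 2000 single readings versus 2000 averages of 25 readings each.
single_readings = [measure() for _ in range(2000)]
averaged_readings = [
    statistics.fmean([measure() for _ in range(25)]) for _ in range(2000)
]

spread_single = statistics.pstdev(single_readings)
spread_averaged = statistics.pstdev(averaged_readings)
print(spread_single, spread_averaged)
```

In this simulation the averaged readings scatter much less around the true value than the single readings do.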

What Is Statistics?
Statistics emerged as a communication tool
Censuses were taken as early as 3000 BC in Egypt

Data
436 NBA players. How do we summarize?

Name             Height (m)
Acker Alex       1.96
Adams Hassan     1.93
Afflalo Arron    1.96
...
Young Nick       1.98
Young Thaddeus   2.03

Centrality - Mean
mean = (sum of heights) / (number of players)
For the 436 NBA players: (1.96 + 1.93 + 1.96 + ... + 2.03) / 436 = 2.01 meters

Centrality - Median
Sort the heights
Median: the value in the middle
For the 436 NBA players: median = 2.03 meters

Centrality - Mode
Mode: the most frequent value
For the 436 NBA players: mode = 2.06 meters
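The three measures of centrality can be computed with Python's standard `statistics` module. The sketch below uses only the five heights shown on the slides, not the full list of 436 players, so its results differ from the values quoted above.

```python
import statistics

# Only the five heights visible on the slide (in meters); the full data set
# of 436 players is not reproduced here.
heights_m = [1.96, 1.93, 1.96, 1.98, 2.03]

mean_height = statistics.mean(heights_m)      # sum of values / number of values
median_height = statistics.median(heights_m)  # middle value after sorting
mode_height = statistics.mode(heights_m)      # most frequent value

print(mean_height, median_height, mode_height)
```

For these five values the median and the mode are both 1.96 m, and the mean is about 1.97 m.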

Example 1
You own a shoe store
You can only manufacture one size
The fitting should be exact
Shoe size: mean 27, median 28, mode 29
Which size would you set?
You should choose the mode, 29, because it is the size that the largest number of customers wear exactly.

Example 2
You are the governor of 1,000 people
You need to collect 1 billion yen in tax
The tax can only be a fixed percentage of salary
It should be neither too high nor too low
Salary (million yen): mean 5, median 4.5, mode 3.5
You should consider the mean: total income is 1,000 × 5 million = 5 billion yen, so set the tax at 1 billion / 5 billion = 20%.

Is Centrality Enough?
Yearly salaries (million yen)

Country A   Country B
4           10
4           1
3           2
4           4
6           4

The mean, median and mode are the same for both countries: mean 4.20, median 4, mode 4
Are they equal? No, we need another measure.

Measure of Dispersion
Mean: 4.20

Salary A   Difference from mean   Salary B   Difference from mean
4          -0.20                  10         +5.80
4          -0.20                  1          -3.20
3          -1.20                  2          -2.20
4          -0.20                  4          -0.20
6          +1.80                  4          -0.20

Sum of absolute differences, A: 0.20 + 0.20 + 1.20 + 0.20 + 1.80 = 3.60
Sum of absolute differences, B: 5.80 + 3.20 + 2.20 + 0.20 + 0.20 = 11.60

Variance
The sum of differences is rather subjective
Statisticians use something different: instead of the differences, take the squares of the differences
Sum of absolute differences, A: 0.20 + 0.20 + 1.20 + 0.20 + 1.80 = 3.60
Using squares, A: 0.20² + 0.20² + 1.20² + 0.20² + 1.80² = 4.80
Finally, divide by the number of elements: 4.80 / 5 = 0.96
Var(A) = 0.96, Var(B) = 9.76, or σ²(A) = 0.96, σ²(B) = 9.76
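The variance calculation can be reproduced step by step. The sketch below follows the slide and divides by the number of elements (the population variance), using the Country A and Country B salaries from the earlier table. The function name is only illustrative.

```python
import statistics

# Yearly salaries in million yen, taken from the Country A / Country B table.
salaries_a = [4, 4, 3, 4, 6]
salaries_b = [10, 1, 2, 4, 4]


def population_variance(values):
    """Mean of the squared differences from the mean (divide by n)."""
    mean = sum(values) / len(values)
    squared_differences = [(v - mean) ** 2 for v in values]
    return sum(squared_differences) / len(values)


var_a = population_variance(salaries_a)
var_b = population_variance(salaries_b)
print(round(var_a, 2), round(var_b, 2))

# The standard library computes the same quantity as statistics.pvariance.
print(statistics.pvariance(salaries_a), statistics.pvariance(salaries_b))
```

Note that `statistics.variance` (without the `p`) divides by n - 1 instead of n and would give different numbers.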

Yearly salaries (million yen)
Country A: 4, 4, 3, 4, 6
Country B: 10, 1, 2, 4, 4
Mean: 4.20, Median: 4, Mode: 4
σ²(A) = 0.96, σ²(B) = 9.76
What is σ?

Standard Deviation
Denoted by σ
The square root of the variance (σ²)
A measure of dispersion
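As a small worked example, the standard deviations for the two countries follow directly from the variances quoted earlier (0.96 and 9.76). This is only a sketch of the square-root step.

```python
import math

# Variances from the lecture, in (million yen) squared.
variance_a = 0.96
variance_b = 9.76

# The standard deviation is the square root of the variance,
# so it is back in the original unit (million yen).
sigma_a = math.sqrt(variance_a)
sigma_b = math.sqrt(variance_b)

print(round(sigma_a, 2), round(sigma_b, 2))  # prints 0.98 3.12
```

Country B's salaries are spread out roughly three times as widely as Country A's, even though the mean, median and mode are the same.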

Overview
Why do we need Statistics?
What is Statistics?
Reading assignment: Sections 1.1-1.5 from the book
Download and install R and RStudio

The End