Lessons from the Netflix Prize: Going beyond the algorithms

Yehuda Koren, Haifa

We Know What You Ought To Be Watching This Summer

We're quite curious, really. To the tune of one million dollars.

Netflix Prize rules
- Goal: improve on Netflix's existing movie recommendation technology, Cinematch
- Criterion: reduction in root mean squared error (RMSE)
- Oct 2006: Contest began
- Oct 2007: $50K progress prize for 8.43% improvement
- Oct 2008: $50K progress prize for 9.44% improvement
- Sept 2009: $1 million grand prize for 10.06% improvement
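The contest criterion can be sketched in a few lines. This is a minimal illustration of RMSE and of the relative improvement over a reference system; the function names are hypothetical and not taken from the talk.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))


def relative_improvement(reference_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a reference system."""
    return 100.0 * (reference_rmse - new_rmse) / reference_rmse
```

A lower RMSE is better, so a positive value from `relative_improvement` means the new predictor beats the reference.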

Movie rating data
Training data (user, movie, score)
- 100 million ratings
- 480,000 users
- 17,770 movies
- 6 years of data: 2000-2005
Test data (user, movie)
- Last few ratings of each user (2.8 million)
Dates of ratings are given

Data >> Models
- Very limited feature set: user, movie, date
- Places focus on models/algorithms
- Major steps forward associated with incorporating new data features:
  - Temporal effects
  - Selection bias: what movies a user rated
  - Daily rating counts

Multiple sources of temporal dynamics
Item-side effects:
- Product perception and popularity are constantly changing
- Seasonal patterns influence items' popularity
User-side effects:
- Customers ever redefine their taste
- Transient, short-term bias; anchoring
- Drifting rating scale
- Change of rater within household
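One simple way to look for such drift is to average ratings inside consecutive time bins and compare the bins. The sketch below assumes each rating carries an integer day index; the function name and the 30-day default are illustrative assumptions, not part of the talk.

```python
from collections import defaultdict


def mean_rating_by_time_bin(ratings, bin_days=30):
    """Average score per time bin.

    ratings: iterable of (day_index, score) pairs, where day_index is the
    number of days since an arbitrary start date (an assumption of this sketch).
    Returns a dict mapping bin number to the mean score in that bin.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day_index, score in ratings:
        time_bin = day_index // bin_days
        totals[time_bin] += score
        counts[time_bin] += 1
    return {b: totals[b] / counts[b] for b in sorted(counts)}
```

Running it on the ratings of a single movie exposes item-side effects; running it on the ratings of a single user exposes user-side effects such as a drifting rating scale.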

Something Happened in Early 2004

Are movies getting better with time?

Temporal dynamics - challenges
- Multiple effects: both items and users are changing over time
- Scarce data per target
- Inter-related targets: signal needs to be shared among users (the foundation of collaborative filtering), so we cannot isolate multiple problems
- Common concept drift methodologies won't hold. E.g., underweighting older instances is unappealing

Effect of daily rating counts
- Number of ratings a user gave on the same day is an important indicator
- It affects different movies differently
Credit to: Martin Piotte and Martin Chabbert

Memento vs Patch Adams
[Figure: two panels, one for Memento and one for Patch Adams]
Credit to: Martin Piotte and Martin Chabbert

Why daily rating counts
- Number of user ratings on a date is a proxy for how long ago the movie was seen
- Some movies age better than others
- Also, two rating tasks:
  - Seed Netflix recommendations
  - Rate movies as you see them
- Related to selection bias
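The feature itself is cheap to derive from the (user, movie, date) triples. A minimal sketch, assuming ratings arrive as tuples with a date field that compares equal for the same day; the function name is hypothetical.

```python
from collections import Counter


def daily_rating_counts(ratings):
    """For each rating, count how many ratings the same user gave that day.

    ratings: iterable of (user, movie, date, score) tuples.
    Returns a list of counts aligned with the input order.
    """
    ratings = list(ratings)
    per_user_day = Counter((user, date) for user, _movie, date, _score in ratings)
    return [per_user_day[(user, date)] for user, _movie, date, _score in ratings]
```

A large count suggests a bulk rating session, where movies are likely being rated long after they were seen; a count of one or two suggests rating soon after viewing.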

Biases matter!
Components of a rating predictor: user bias, movie bias, user-movie interaction
Baseline predictor
- Separates users and movies
- Often overlooked
- Benefits from insights into users' behavior
- Among the main practical contributions of the competition
User-movie interaction
- Characterizes the matching between users and movies
- Attracts most research in the field
- Benefits from algorithmic and mathematical innovations

A baseline predictor
We have expectations on the rating by user u to movie i, even without estimating u's attitude towards movies like i:
- Rating scale of user u
- Values of other ratings the user gave recently (day-specific mood, anchoring, multi-user accounts)
- (Recent) popularity of movie i
- Selection bias; related to number of ratings the user gave on the same day
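The simplest version of such a predictor uses only a user bias and a movie bias. The sketch below assumes an additive form, global mean plus movie bias plus user bias, fitted by shrunken averages; that form, the `shrinkage` parameter, and the function names are assumptions for illustration, and the temporal and daily-count terms listed above are left out.

```python
from collections import defaultdict


def fit_baseline(ratings, shrinkage=0.0):
    """Fit global mean, movie biases, then user biases on the residuals.

    ratings: iterable of (user, movie, score) tuples.
    shrinkage: hypothetical regularization constant that pulls biases
    estimated from few ratings towards zero.
    """
    ratings = list(ratings)
    mu = sum(score for _u, _m, score in ratings) / len(ratings)

    movie_sum, movie_n = defaultdict(float), defaultdict(int)
    for _user, movie, score in ratings:
        movie_sum[movie] += score - mu
        movie_n[movie] += 1
    b_movie = {m: movie_sum[m] / (movie_n[m] + shrinkage) for m in movie_sum}

    user_sum, user_n = defaultdict(float), defaultdict(int)
    for user, movie, score in ratings:
        user_sum[user] += score - mu - b_movie[movie]
        user_n[user] += 1
    b_user = {u: user_sum[u] / (user_n[u] + shrinkage) for u in user_sum}
    return mu, b_user, b_movie


def predict_baseline(model, user, movie):
    """Predict a rating; unseen users or movies fall back to a zero bias."""
    mu, b_user, b_movie = model
    return mu + b_user.get(user, 0.0) + b_movie.get(movie, 0.0)
```

Whatever the baseline leaves unexplained is what the user-movie interaction model has to capture.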

Sources of Variance in Netflix data
- Biases: 33%
- Unexplained: 57%
- Personalization: 10%
0.732 (unexplained) + 0.415 (biases) + 0.129 (personalization) = 1.276 (total variance)

What drives user preferences?
- Do they like certain genres, actors, directors, keywords, etc.?
- Well, some do, but this is far from a complete characterization!
- E.g., a recent paper is titled: "Recommending new movies: even a few ratings are more valuable than metadata" [Pilászy and Tikk, 2009]
- User motives are latent, barely interpretable in human language
- Can be captured when data is abundant

Wishful perception
[Figure: movies placed on two axes, serious to escapist and geared towards females to geared towards males. Movies shown: Braveheart, The Color Purple, Amadeus, Lethal Weapon, Sense and Sensibility, Ocean's 11, The Princess Diaries, The Lion King, Independence Day, Dumb and Dumber]

Complex reality

Ratings are not given at random!
[Figure: distribution of ratings for Netflix ratings, Yahoo! music ratings, and Yahoo! survey answers]
Marlin, Zemel, Roweis, Slaney, "Collaborative Filtering and the Missing at Random Assumption", UAI 2007

A powerful source of information: characterize users by which movies they rated, rather than how they rated
A dense binary representation of the data: the users x movies rating matrix $R = [r_{ui}]$ is replaced by a users x movies binary matrix $B = [b_{ui}]$ that records which movies users rate
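Building the binary view only requires knowing which (user, movie) pairs occur. A minimal sketch, assuming ratings come as (user, movie, score) tuples and that the user and movie orderings are supplied by the caller; the function name is hypothetical.

```python
def build_binary_matrix(ratings, users, movies):
    """Return a dense 0/1 matrix: 1 where the user rated the movie.

    ratings: iterable of (user, movie, score) tuples; the score is ignored.
    users, movies: lists fixing the row and column order.
    """
    rated = {(user, movie) for user, movie, _score in ratings}
    return [[1 if (user, movie) in rated else 0 for movie in movies]
            for user in users]
```

Unlike the rating matrix, this one has no missing entries, so every cell carries some signal about what the user chose to rate. At Netflix scale a dense list of lists would be impractical; the sketch is meant for toy inputs only.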

Ensembles are Valuable for Prediction
- Our final solution was a linear blend of over 700 prediction sets
- Some of the 700 were blends
- Difficult, or impossible, to build a grand unified model
- Blending techniques: linear regression, neural network, gradient boosted decision trees, and more
- Mega blends are not needed in practice
- A handful of simple models achieves 90% of the improvement of the full blend
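Of the blending techniques listed, linear regression is the easiest to sketch. The code below fits least-squares weights over a few prediction sets via the normal equations, in pure Python and without an intercept term; the function names and this particular formulation are assumptions for illustration, not the competition code.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; prediction sets may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_blend(prediction_sets, targets):
    """Least-squares weights, one per prediction set (no intercept).

    prediction_sets: list of lists, one list of predictions per model,
    each aligned with targets (held-out true ratings).
    """
    k = len(prediction_sets)
    gram = [[sum(p * q for p, q in zip(prediction_sets[i], prediction_sets[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(p * y for p, y in zip(prediction_sets[i], targets))
           for i in range(k)]
    return solve_linear_system(gram, rhs)


def blend(prediction_sets, weights):
    """Weighted sum of the prediction sets, example by example."""
    size = len(prediction_sets[0])
    return [sum(w * p[n] for w, p in zip(weights, prediction_sets))
            for n in range(size)]
```

The weights should be fitted on ratings that none of the individual models were trained on; otherwise the blend rewards the models that overfit the most.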

Yehuda Koren Yahoo! yehuda@yahoo-inc.com