PICK THE RIGHT TEAM AND MAKE A BLOCKBUSTER: A SOCIAL ANALYSIS THROUGH MOVIE HISTORY


THE CHALLENGE: TO UNDERSTAND HOW TEAMS CAN WORK BETTER
SOCIAL NETWORK + MACHINE LEARNING TO THE RESCUE

Previous research: team success
- Teamwork selection as an optimisation problem: Anagnostopoulos et al. [2012], Tseng et al. [2004]
- Studied team success without social parameters: Kim et al. [2013], Elberse [2007]
- Did not study the team as a whole: Nemoto et al. [2011], Singh et al. [2011]

Previous research: social features
- Studied social parameters of individuals: Papagelis et al. [2011], Li et al. [2013]
- Studied single social features: Chen and Guan [2010], Schilling and Phelps [2007]
- Performed on small datasets: Ghiassi et al. [2015], Oghina et al. [2012]
- No predictive analysis: Uzzi and Spiro [2005], Burt [2009]

RESEARCH QUESTION: IN PREDICTIVE ANALYSIS OF TEAM SUCCESS, DOES USING MANY TOPOLOGICAL FEATURES FROM TEAMS HELP?

Methodology
1. Start with a large set of collaboration data (IMDB)
2. Form a social network
3. Filter irrelevant data
4. Extract social features from each team
5. Characterize this never-before-seen data
6. Apply Machine Learning techniques
7. Assess how social features help predict team success

DATASET: IMDB [INTERNET MOVIE DATABASE], WORLD'S LARGEST MOVIE DATASET
DATE: 1808-2014
SIZE: 12,250 MOVIES, 31,698 PRODUCERS

MOVIE'S TYPICAL PRODUCING TEAM: Associate Producer, Co-Producer, Executive Producer, Line Producer, Producer
PRODUCERS THAT WORK TOGETHER ARE LINKED IN A SOCIAL NETWORK

Forming a Social Network
[Figure: movies, their producers, and the resulting producers' social network]
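One way to picture the network-forming step is a minimal sketch in plain Python, assuming the input is a mapping from each movie to its list of producers. The function name, the toy data, and the choice of an undirected, unweighted graph are illustrative assumptions, not details taken from the slides.

```python
from collections import defaultdict
from itertools import combinations

def build_producer_network(movie_producers):
    """Return {producer: set of co-producers} from {movie: [producers]}."""
    network = defaultdict(set)
    for producers in movie_producers.values():
        unique = set(producers)
        for p in unique:
            network[p]  # register producers even if they worked alone
        # Link every pair of producers credited on the same movie.
        for a, b in combinations(sorted(unique), 2):
            network[a].add(b)
            network[b].add(a)
    return dict(network)

# Hypothetical toy data, not taken from IMDB.
movies = {
    "Movie A": ["ana", "bob", "cy"],
    "Movie B": ["bob", "dee"],
    "Movie C": ["eve"],
}
network = build_producer_network(movies)
```

Here "bob" ends up linked to "ana", "cy" and "dee", while "eve" stays isolated because she never shared a credit.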

Removing inactive nodes

Filtering: 238K → 32K Movies
Filtering out movies that are:
- Not connected to the giant component
- Not from cinema
- Produced by just one producer
- Released before 1930 (used for bootstrapping)
- Not feature length (< 30 min.)
- Not relevant (< 1,000 votes)
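The filters above can be sketched as a breadth-first search for the giant component followed by a per-movie predicate. This is only an illustration: the record fields (`is_cinema`, `producers`, `year`, `runtime_min`, `votes`) are hypothetical names, and treating a movie as connected when any of its producers lies in the giant component is an assumption.

```python
from collections import deque

def giant_component(network):
    """Return the largest connected set of nodes in {node: neighbours}."""
    seen, best = set(), set()
    for start in network:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in network[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

def keep_movie(movie, giant):
    """Apply the slide's filters to one movie record (field names are hypothetical)."""
    return (
        movie["is_cinema"]
        and len(movie["producers"]) > 1
        and movie["year"] >= 1930
        and movie["runtime_min"] >= 30
        and movie["votes"] >= 1000
        and any(p in giant for p in movie["producers"])
    )

# Hypothetical toy network and records.
network = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "x": {"y"}, "y": {"x"}}
giant = giant_component(network)
kept = {"is_cinema": True, "producers": ["a", "b"], "year": 1995,
        "runtime_min": 110, "votes": 25000}
dropped = {"is_cinema": True, "producers": ["x", "y"], "year": 1995,
           "runtime_min": 110, "votes": 25000}
```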

MOVIE'S SUCCESS PARAMETERS: NUMBER OF RATINGS (POPULARITY), AVERAGE RATING (ACCEPTANCE), GROSS (FINANCIAL SUCCESS)

Characterization: Movie Success
- Distribution of movie success
- Historical evolution of success distribution
- Correlation between different success metrics

[Figure: histograms of the number of movies by (a) Gross (Million USD), (b) Votes, (c) Rating, split by success group G1, G2, G3]
HISTOGRAM OF MOVIE'S SUCCESS PARAMETERS
G1: TOP 10% MOVIES, G2: TOP 10-50% MOVIES, G3: ALL OTHER MOVIES
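The three success groups can be assigned by rank. The sketch below is one possible reading of "top 10%" and "top 10-50%"; the function name, the handling of ties, and the sample gross values are assumptions.

```python
def success_groups(values):
    """Label each value G1 (top 10%), G2 (next 40%) or G3 (the rest) by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    labels = [None] * n
    for rank, i in enumerate(order):
        fraction = rank / n  # share of movies ranked strictly above this one
        if fraction < 0.10:
            labels[i] = "G1"
        elif fraction < 0.50:
            labels[i] = "G2"
        else:
            labels[i] = "G3"
    return labels

gross = [5, 300, 40, 1, 90, 12, 7, 60, 3, 25]  # hypothetical values, in million USD
groups = success_groups(gross)
```

With ten movies, one lands in G1, four in G2 and five in G3.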

[Figure: per-decade distributions, 1930 to 2010, of Gross, Votes and Rating; x-axis: Decade]
EVOLUTION OF MOVIE'S SUCCESS PARAMETERS DISTRIBUTION
ON TOP, HISTOGRAM OF MOVIE PRODUCTIONS COLOR CODED BY SUCCESS GROUP
- EXPLOSION IN MOVIE PRODUCTION
- MORE MOVIES WITH LOWER GROSS NOW
- OLD MOVIES RECEIVE FEWER VOTES
- BIASED HIGHER RATINGS FOR OLD MOVIES

[Figure: hexagonal scatter plots of (a) Gross (Million USD) vs. Votes, (b) Rating (normalized) vs. Votes, (c) Gross (Million USD) vs. Rating (normalized)]
HEXAGONAL SCATTER PLOT BETWEEN SUCCESS PARAMETERS
DARKER BLUE SHADES REPRESENT HIGHER CONCENTRATION OF MOVIES
- POPULAR MOVIES HAVE HIGHER GROSS
- POPULAR MOVIES HAVE HIGHER RATINGS
- MOVIES WITH HIGHER GROSS DEVIATE MORE FROM THE AVERAGE RATING

RESEARCH QUESTION: (IN THE CONTEXT OF MOVIE PRODUCING TEAMS) GIVEN DIFFERENT TEAMS THAT COULD PRODUCE A MOVIE, WHICH IS MORE LIKELY TO ACHIEVE SUCCESS?

Movie Characteristics
- GENRES (21)
- RUNTIME
- PRODUCTION BUDGET (NORM.)
- CONTINENTS (6)

Movie team Parameters: Ego
- # OF PAST EXPERIENCES
- LEVEL OF PREVIOUS SUCCESS
- IN-DEGREE
- CLOSENESS
- CLUSTERING COEFFICIENT
- BETWEENNESS
- NETWORK CONSTRAINT
- SQUARE CLUSTERING COEFFICIENT
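Three of the ego parameters can be illustrated on a small undirected graph using textbook definitions. This is a sketch, not the deck's implementation: the adjacency-dict representation, the use of plain degree in place of in-degree, and the toy network are assumptions.

```python
from collections import deque
from itertools import combinations

def degree(network, node):
    """Number of distinct collaborators of a node."""
    return len(network[node])

def clustering(network, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = network[node]
    if len(neighbours) < 2:
        return 0.0
    pairs = list(combinations(neighbours, 2))
    linked = sum(1 for a, b in pairs if b in network[a])
    return linked / len(pairs)

def closeness(network, node):
    """Reachable nodes divided by the sum of shortest-path distances to them."""
    dist, queue = {node: 0}, deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in network[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical toy network.
network = {
    "ana": {"bob", "cy"},
    "bob": {"ana", "cy", "dee"},
    "cy": {"ana", "bob"},
    "dee": {"bob"},
}
```

In this toy graph "bob" has degree 3 and closeness 1.0, but only one of his three neighbour pairs is linked, so his clustering coefficient is 1/3.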

Movie team Parameters: Pairwise
- SHARED FRIENDS
- NEIGHBOUR OVERLAP
- SHARED EXPERIENCE

Movie team Parameters: Global
- GLOBAL CLUSTERING COEFFICIENT
- AVERAGE SHORTEST PATH
- SMALL-WORLD COEFFICIENT
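Two of the global metrics have standard definitions that fit in a few lines. The sketch below computes them on whatever graph is passed in; which graph the deck actually used is not stated, so that choice is an assumption. The small-world coefficient is left out because it needs a comparison against random graphs.

```python
from collections import deque
from itertools import combinations

def global_clustering(network):
    """Closed neighbour pairs over all neighbour pairs, summed over every node."""
    closed = total = 0
    for neighbours in network.values():
        for a, b in combinations(neighbours, 2):
            total += 1
            closed += b in network[a]
    return closed / total if total else 0.0

def average_shortest_path(network):
    """Mean BFS distance over all ordered pairs of distinct, connected nodes."""
    total = pairs = 0
    for source in network:
        dist, queue = {source: 0}, deque([source])
        while queue:
            current = queue.popleft()
            for neighbour in network[current]:
                if neighbour not in dist:
                    dist[neighbour] = dist[current] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Hypothetical toy network: one triangle plus a pendant node.
network = {
    "ana": {"bob", "cy"},
    "bob": {"ana", "cy", "dee"},
    "cy": {"ana", "bob"},
    "dee": {"bob"},
}
```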

Problem: many numbers from a single parameter
Each parameter yields one value per producer, aggregated per team by:
- ARITHMETIC MEAN
- HARMONIC MEAN
- MEDIAN
- STANDARD DEVIATION
- MINIMUM VALUE
- MAXIMUM VALUE
- NODE CONTRACTION
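Six of the seven aggregations map directly onto Python's standard library. The sketch below collapses one parameter's per-producer values into team-level features; the function name, the sample degrees, and the choice of population standard deviation are assumptions. Node contraction is omitted because it operates on the graph rather than on a list of values.

```python
import statistics

def aggregate(values):
    """Collapse one parameter's per-producer values into team-level features."""
    return {
        "mean": statistics.mean(values),
        "harmonic_mean": statistics.harmonic_mean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),  # population std; sample std is equally plausible
        "min": min(values),
        "max": max(values),
    }

team_degrees = [2, 3, 2, 1]  # hypothetical per-producer degrees
features = aggregate(team_degrees)
```

The harmonic mean is pulled toward the smallest values, so a single poorly connected producer lowers it more than it lowers the arithmetic mean.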

NUMBER OF FEATURES FOR EACH MOVIE
- 70 EGO FEATURES: 10 PARAMS. X 7 AGG. WAYS
- 27 MOVIE FEATURES: 21 GENRES + 6 CONTINENTS
- 3 MOVIE PARAMS.: RUNTIME, TEAM SIZE, BUDGET
- 3 GLOBAL METRICS: Q, CLUSTERING, AVG. PATH LENGTH
121 TOTAL DISTINCT FEATURES

Characterization: Movie Teams Parameters
- Distribution of parameters
- Historical evolution of parameters
- Relation between success metrics and parameters
- Distribution of movies in pairs of characteristics

[Figure: per-decade distributions, 1930 to 2010, of Mean: Previous votes, Team metrics: Previous ratings, and Runtime (minutes); x-axis: Decade]
EVOLUTION OF MOVIE'S FEATURES DISTRIBUTION: PREV. VOTES, PREV. RATINGS, MOVIE RUNTIME
- 40'S-70'S: MOVIES WERE LONGER
- EFFECT OF CHRONOLOGICALLY BIASED RATINGS
- RECENT MOVIES: +TEAMS THAT PREVIOUSLY PRODUCED POPULAR MOVIES

[Figure: per-decade distributions, 1930 to 2010, of Mean: Gross (Billion USD) and Team metrics: Previous experience; x-axis: Decade]
EVOLUTION OF MOVIE'S FEATURES DISTRIBUTION: MEAN OF PREVIOUS GROSS, TEAM'S PREVIOUS EXPERIENCE

[Figure: per-decade distributions, 1930 to 2010, of Team metrics: Degree, Team metrics: Team size, Median: Closeness, and Harmonic mean: Clustering; x-axis: Decade]
DISTRIBUTION OF MOVIE'S FEATURES
- 2000'S: EXPLOSION OF TEAM'S DEGREE
- 2000'S: MUCH HIGHER # OF PRODUCERS PER TEAM
- CLOSENESS: AN EVOLVING CHARACTERISTIC
- CLUSTERING: FAIRLY STABLE DISTRIBUTION

[Figure: scatter plots of feature pairs drawn from Closeness, Runtime, Net. constraint, Budget, Team Size, Degree, Prev. votes and Prev. gross, colored by success group G1, G2, G3]
INTERACTION BETWEEN PAIRS OF FEATURES
DARK CLUSTERS SHOW CONCENTRATION OF BLOCKBUSTERS

[Figure: histograms per success group of (a) Runtime (minutes), (b) Team metrics: Previous ratings, (c) Team metrics: Previous votes]
HISTOGRAM OF MOVIE'S PARAMS, PER SUCCESS GROUP
- A SUCCESSFUL, FEATURE LENGTH MOVIE CAN'T BE TOO SHORT
- TEAMS WITH MODERATE PREVIOUS RATINGS PERFORM BETTER (!)
- TEAMS THAT HAVE PRODUCED MORE POPULAR MOVIES BEFORE PERFORM BETTER

[Figure: histograms per success group (G1, G2, G3) of (c) Team metrics: Previous votes, (d) Mean: Gross (Billion USD), (e) Team metrics: Previous experience]
HISTOGRAM OF MOVIE'S PARAMS, PER SUCCESS GROUP
- TEAMS THAT HAVE PRODUCED MORE MONEY BEFORE PERFORM BETTER
- TEAMS WITH SUMMED LOW EXPERIENCE PERFORM BADLY

[Figure: histograms per success group of (a) Team metrics: Degree, (b) Team metrics: Network Constraint]
HISTOGRAM OF MOVIE'S PARAMS, PER SUCCESS GROUP
- TEAMS WITH LOW DEGREE PERFORM BADLY
- SOCIALLY UNCONSTRAINED TEAMS PERFORM BETTER

[Figure: histograms per success group (G1, G2, G3) of (c) Team metrics: Team size, (d) Median: Closeness, (e) Harmonic mean: Clustering]
HISTOGRAM OF MOVIE'S PARAMS, PER SUCCESS GROUP
- BEST PERFORMING TEAMS ARE NEITHER SMALL NOR BIG
- TEAMS WITH LOWER CLOSENESS PERFORM WORSE
- TEAMS WITH LOWER CLUSTERING (BY HARMONIC MEAN) PERFORM BETTER

Movie Success Forecast
- Movie producing teams' characteristics as features
- Movie success parameters as target variables
- Regressor: Bayesian Ridge (better at handling noise)
- Feature selection: eliminate the least significant features until the model starts losing accuracy
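The feature-selection loop can be sketched as greedy backward elimination. This is one variant under stated assumptions: it drops whichever feature hurts a caller-supplied score the least, whereas the deck ranks features by significance. The `score` callback stands in for something like the cross-validated R² of a Bayesian Ridge model (for instance scikit-learn's `BayesianRidge`); the toy scorer and feature names are made up so the example runs without external libraries.

```python
def backward_elimination(features, score, tolerance=0.0):
    """Drop features one at a time while the score does not fall.

    `score` maps a list of feature names to a model quality figure such as
    cross-validated R^2; both it and `tolerance` are placeholders.
    """
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        # Try removing each remaining feature and keep the least harmful removal.
        candidates = [
            (score([f for f in selected if f != drop]), drop) for drop in selected
        ]
        new_score, drop = max(candidates)
        if new_score < best - tolerance:
            break  # every removal now costs accuracy
        selected.remove(drop)
        best = max(best, new_score)
    return selected, best

# Toy scorer: only "budget" and "degree" carry signal; extra features add noise.
def toy_score(names):
    useful = {"budget": 0.4, "degree": 0.2}
    noise = 0.01 * sum(1 for n in names if n not in useful)
    return sum(useful.get(n, 0.0) for n in names) - noise

selected, best = backward_elimination(
    ["budget", "degree", "genre_x", "genre_y"], toy_score
)
```

On the toy scorer the loop discards both noisy genre features and stops once removing either remaining feature would lower the score.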

Feature selection
Out of 121 features, 23 features were selected:
- 19 Non-topological: Genres (9), Continent (3), Runtime (1), Budget (1), Previous success (4), Previous experience (1)
- 4 Topological: Degree (1), Team size (1), Closeness (1), Clustering (1)

[Figure: Votes, true value compared with baseline and test predictions. Test R²: 0.694, Baseline R²: 0.399]
IMPROVEMENTS IN PREDICTION ACCURACY WITH SOCIAL FEATURES
RED BARS REPRESENT ACCURACY GAINS IN THIS SAMPLE, RED BARS, LOSSES
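For reference, R² here is taken to be the usual coefficient of determination, one minus the ratio of residual to total sum of squares. The sketch below uses that textbook definition; the sample targets and predictions are hypothetical and unrelated to the figures reported above.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]  # hypothetical targets
y_pred = [1.0, 2.0, 3.0, 2.0]  # hypothetical predictions
score = r_squared(y_true, y_pred)
```

A perfect prediction gives 1.0, and always predicting the mean of the targets gives 0.0.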

OVERALL GAIN IN PREDICTIVE ACCURACY (R²), 95% C.I.

Target   Years       Non Topol.     Topologic      All            Gain
Votes    2008-2013   .529 ±.0008    .310 ±.0006    .556 ±.0008    5.10%
Votes    2000-2013   .484 ±.0004    .294 ±.0005    .517 ±.0004    6.82%
Votes    1990-2013   .437 ±.0003    .246 ±.0004    .464 ±.0003    6.18%
Gross    2008-2013   .431 ±.0008    .170 ±.0013    .448 ±.0009    3.94%
Gross    2000-2013   .419 ±.0004    .175 ±.0005    .447 ±.0004    6.68%
Gross    1990-2013   .392 ±.0004    .174 ±.0004    .435 ±.0003    10.97%
Rating   2008-2013   .271 ±.0011    .033 ±.0009    .281 ±.0012    3.69%
Rating   2000-2013   .267 ±.0006    .038 ±.0003    .273 ±.0006    3.37%
Rating   1990-2013   .258 ±.0004    .031 ±.0003    .262 ±.0005    1.55%
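The Gain column appears to be the relative improvement of the "All" model over the "Non Topol." model. That reading is an inference rather than something the slides state, though it reproduces the Votes 2008-2013 row. A short sketch with a hypothetical function name:

```python
def relative_gain(r2_non_topological, r2_all):
    """Percentage improvement of the full model over the non-topological one."""
    return 100.0 * (r2_all - r2_non_topological) / r2_non_topological

# Votes, 2008-2013: R^2 of .529 without and .556 with topological features.
gain = relative_gain(0.529, 0.556)
print(f"{gain:.2f}%")  # prints "5.10%"
```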

Contributions
- Improvement to the state of the art in movie success forecasting
- In-depth characterization of social aspects of a large collaborative network
- A new approach for extensive aggregation of social metrics from agents in teams

THIS IS ONLY A FIRST LOOK AT HOW NETWORK TOPOLOGY ANALYSIS CAN HELP EXPLAIN COMPLEX HUMAN BEHAVIOR.