Neural Network Predicting Movie Box Office Performance

Alex Larson
ECE 539, Fall 2013

Abstract

The movie industry is a large part of modern-day culture. The rise of websites like Netflix, where people can watch hundreds of movies at any time, makes this evident. Movie studios are always trying to come up with the next big thing to make the largest profit. Studios have been adapting books, plays, and comic books to cash in on already popular intellectual property. Studios have also been remaking older films in the hope that the remakes will match the success of their predecessors. Making a movie is an expensive endeavor, and people want to know whether a remake, an adaptation, or an entirely new idea will be successful.

Some current prediction methods use data from sites like Google and Wikipedia. Studies have used the number of searches a movie gets on Google, or the number of hits its Wikipedia page gets, to predict its box office success. These methods have been shown to work well, but I also believe the success of a movie can be predicted from its own features. These features may include genre, budget, release date, the studio making the movie, whether or not the movie is a new intellectual property, the actors involved, MPAA rating (PG, PG-13, etc.), and many more. Using these features, one should be able to make a prediction of a movie's potential box office success.

I propose to use artificial neural network methods to classify and predict a movie's potential box office success. Using some of the features described above, I would like to create a data set based on movies from the past few years. After a good set of features and classes has been established, I will experiment with various pattern recognition classifiers, such as the multi-layer perceptron (MLP) and the k-nearest neighbor classifier, to predict the potential box office success of a movie.
Introduction and Motivation

The movie industry is a large part of modern-day culture. Many companies look to profit from the success of a movie. The distributor of the movie gains the profit from ticket sales, while many other companies advertise and promote their products by featuring them in movies or by associating the movie with their own products to boost revenue. One major motivation behind this project is to help investors choose which movies could have the highest possible return. Movies are very expensive to make, and investors wish to know whether the payoff will be worth their investment.

Movies are also something I enjoy very much. Like many people, I think they are a wonderful form of entertainment. It was my hope that this project would be a fun and interesting way to look deeper into movies and the box office performance behind them.

Related Work

There have been a few recent projects that have dealt with predicting movie box office performance. One study was based on the hits of a movie's Wikipedia page. The researchers analyzed the activity of editors on the online encyclopedia Wikipedia. Based on this data they built a minimalistic predictive model for a movie's box office success [1]. Google also performed research on box office success. Google used trailer-related searches for a particular movie, along with the franchise status of the movie and the season, to predict the opening weekend of a movie with 94% accuracy [2].

Problem Statement

The goal of this project is to predict the potential box office success of a given movie based only on its characteristics at its release.

Data

The data for the project was acquired from the-numbers.com. This website tabulates many movie characteristics and statistics. Movie data from the years 2008, 2009, 2010, 2011, 2012, and an incomplete version of 2013 were obtained. Because this project was performed late in 2013, the 2013 data was incomplete, but it was still a good representation of movies released earlier that year.

The features extracted from the data were the movie's release month, distributor, genre, MPAA rating, and whether or not the movie was a sequel. Numeric values were assigned for distributor, genre, and MPAA rating. For each year a subset of movies was selected at random from the top performing movies for that year. Based on each movie's yearly gross, I chose to divide the data into 3 classes: movies grossing less than 49 million, between 49 million and 91 million, and more than 91 million. This data was then translated into machine-readable text files that were used by the various MATLAB programs that ran the experiments for this project.

Experiments

Using the MATLAB programs from the ECE 539 website, various experiments were done with the k-nearest neighbor classifier, the maximum likelihood classifier, and the multi-layer perceptron. The initial results of experimentation were not promising. Each classifier was achieving a classification rate of around 30% on average. This value is unacceptable because it is essentially the same as random guessing when there are 3 possible class labels.

From here the data was reevaluated. I plotted histograms of each feature for each class label. I found that there were many outliers in the distributor and genre features. Some smaller distribution studios would have a successful movie in one of the years where data was collected but not in others. Similarly, some genres, such as western and musical, are simply not represented enough in the data. These outliers were then removed from the data. The values assigned to the features were also reorganized. The distributor with the most successful movies was given a higher value, and the same was done for genre and MPAA rating.

Results

Cross validation was used for all classifiers. I would leave one year out of the training data and train the classifier with the data from the remaining years. After training had completed, I would test the trained classifier using the data from the held-out year that was not included in the training data.

The k-nearest neighbor (KNN) classifier was the fastest of the 3 classifiers used. For the KNN classifier I tested many different values of K. The best results were achieved with 14 nearest neighbors. This resulted in an average classification rate of around 48%, an improvement over the first implementation.

KNN Classifier
Testing Data   2008   2009   2010   2011   2012   2013   Average
C Rate (%)     48     64     52     56     32     36     48
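The class labelling and the leave-one-year-out evaluation described above can be illustrated with a minimal Python sketch. It is not the project's code, which consisted of MATLAB programs from the ECE 539 website. The function names, the toy feature vectors, the Euclidean distance metric, and the handling of grosses that fall exactly on a class boundary are all assumptions made for illustration.

```python
import math
from collections import Counter

# Class boundaries quoted in the report (yearly gross).
LOW_CUTOFF = 49_000_000
HIGH_CUTOFF = 91_000_000


def gross_to_class(gross):
    """Map a yearly gross to class 0 (low), 1 (middle) or 2 (high).

    Assumption: a gross exactly on a boundary goes to the middle class.
    """
    if gross < LOW_CUTOFF:
        return 0
    if gross <= HIGH_CUTOFF:
        return 1
    return 2


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    Euclidean distance is an assumption; the report does not name a metric.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_year_out(movies, k):
    """Hold out each year in turn, train on the rest, test on the held-out year.

    `movies` is a list of (year, feature_vector, gross) tuples.
    Returns a dict mapping each held-out year to its classification rate.
    """
    rates = {}
    for held_out in sorted({year for year, _, _ in movies}):
        train = [(x, gross_to_class(g)) for y, x, g in movies if y != held_out]
        test = [(x, gross_to_class(g)) for y, x, g in movies if y == held_out]
        hits = sum(knn_predict(train, x, k) == label for x, label in test)
        rates[held_out] = hits / len(test)
    return rates


# Hypothetical toy data. Features follow the report's list:
# (release month, distributor value, genre value, MPAA value, sequel flag).
toy_movies = [
    (2011, (5, 3, 3, 2, 1), 150_000_000),
    (2011, (1, 1, 1, 1, 0), 20_000_000),
    (2012, (6, 3, 3, 2, 1), 200_000_000),
    (2012, (2, 1, 1, 1, 0), 30_000_000),
    (2013, (5, 3, 2, 2, 1), 120_000_000),
    (2013, (1, 1, 2, 1, 0), 10_000_000),
]

rates = leave_one_year_out(toy_movies, k=1)
print(rates)
```

The toy set is far too small for the 14 neighbors reported above, so the usage line passes `k=1`. With the real data set, `k` would be the value being tuned.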

Confusion Matrix (KNN Classifier)
31   12    7
24   14   12
15    8   27

I then classified the data using the maximum likelihood classifier. This classifier also computes its results very quickly. Its results do not change between runs, so it only had to be run once. On average it performed about as well as the KNN model.

Maximum Likelihood Classifier
Testing Data   2008   2009   2010   2011   2012   2013   Average
C Rate (%)     48     56     52     56     48     24     47.3

Confusion Matrix (Maximum Likelihood Classifier)
34   10    6
25   10   15
11   12   27

Finally, classification was done using the multi-layer perceptron (MLP). Many different perceptron networks were experimented with. This program took the longest of the three classifiers to run. It was also run over multiple trials because the results change from trial to trial. The MLP showed promise during training, where it classified around 60% of movies correctly. On the actual testing data, however, it performed similarly to the KNN and maximum likelihood classifiers, with an average classification rate of around 47.3%.
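The average classification rates reported above for the KNN and maximum likelihood classifiers can be cross-checked against their confusion matrices, since the overall rate is the sum of the diagonal divided by the total number of classified movies. The short Python sketch below does that arithmetic. Reading the nine flattened values as a 3 x 3 matrix in row-major order is an assumption, although the diagonal (and therefore the rate) is the same under either reading. The function name is hypothetical.

```python
def classification_rate(confusion):
    """Overall rate: correctly classified movies (diagonal) over all movies."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Confusion matrices as reported for the two classifiers.
knn_matrix = [[31, 12, 7], [24, 14, 12], [15, 8, 27]]
ml_matrix = [[34, 10, 6], [25, 10, 15], [11, 12, 27]]

knn_rate = classification_rate(knn_matrix)  # 72 correct out of 150
ml_rate = classification_rate(ml_matrix)    # 71 correct out of 150

print(f"KNN: {knn_rate:.1%}, maximum likelihood: {ml_rate:.1%}")
```

Both values agree with the averages in the tables (48% and 47.3%). Each row of either matrix sums to 50, which suggests, without confirming, that the three classes each contain 50 movies.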

MLP Back Propagation
Testing Data   2008   2009   2010   2011   2012   2013   Average
C Rate (%)     52     48     48     52     40     44     47.3

Confusion Matrix (MLP)
23   14   13
15   18   17
13    8   29

Discussion

The results of these experiments were not superb, but they were an improvement over my preliminary classification runs. Some interesting 2013 movies that the MLP model correctly placed in the most successful class were Iron Man 3, Hunger Games: Catching Fire, and Oblivion. One interesting misclassification was Gravity, which was in the most successful category but was classified in the worst. Other interesting misclassifications were After Earth and The Internship, both of which did poorly but were predicted to do well. All three classifiers tended to do better on movies in either the low class or the high class; for the middle class they seldom chose correctly.

There may not be enough of a correlation between this set of feature vectors and the chosen class labels. Movie performance can be erratic, as shown in the preliminary testing. Every so often an outlier comes out of nowhere from a lesser-known studio and does extremely well, and on the other hand a huge flop sometimes comes from a studio that normally puts out great movies. In the end, this classifier did not perform as well as the Google or Wikipedia classifiers.

One improvement to this data set would be to increase the sample size of the movies, which may lessen the effect that outliers may have had on classification. Adding more features to the feature vectors could also improve performance. Other characteristics, such as a movie's budget, leading actor, and director, could also have an effect on the classification.

References

[1] Mestyán M, Yasseri T, Kertész J (2013) Early Prediction of Movie Box Office Success Based on Wikipedia Activity Big Data. PLoS ONE 8(8): e71226. doi:10.1371/journal.pone.0071226
[2] Chen A, Panaligan R (2013) Quantifying Movie Magic with Google Search.
[3] http://boxofficemojo.com
[4] http://www.the-numbers.com