Quantifying Domestic Movie Revenues Using Online Resources in China

Jia Xiao 1*, Sog-a Yag 2, Xi Li 1, He Ji 1 and Shanzhi Chen 3
1. State Key Laboratory of Networking and Switching, Beijing University of Posts and Telecommunications, No.10 Xitucheng Road, Haidian District, Beijing 100876, China.
2. Beijing Guodiantong Network Technology Co. Ltd., Beijing 100070, China.
3. State Key Laboratory of Wireless Mobile Communications, China Academy of Telecommunications Technology, Beijing 100083, China.

Abstract
In this paper, we apply a machine learning method to predict a movie's box-office performance using data collected from Chinese mainstream search engines, video sites and WeChat. Domestic films are first classified into different types according to movie industry experts. We then fit multiple linear regression models on one year's samples of five main genres. These models are applied to predict the performance of the last film of the same type in 2014. The results show that four types reach over 80 percent prediction accuracy, the exception being action movies. We then show that this type is more prone to inaccuracy than the others.

Keywords - multiple linear regression, box office revenue, Pearson correlation coefficient, R-square.

I. INTRODUCTION
With the explosive growth of the Chinese movie industry, both the quantity and the quality of domestic films have improved drastically. According to data published by the Chinese Film & Television Industry Research Center (CFIRC) [1], the box office income of China increased from more than 13 billion in 2011 to nearly 30 billion in 2014, an average growth rate of about 30% annually. More and more investors are willing to put their capital into this industry. Alibaba Group even created an online financial product, promoted through the Alipay platform, to attract retail funds. However, a single movie can make the difference between a box office smash and a loss for investors in a given month. Box office income is not only the most important indicator of the success of a film, but also determines the profits of a group of investors.
To mitigate risks and boost returns, the prediction of movie sales revenue in China is well worth studying. In Section 2, we review previous work on box office prediction. Section 3 depicts the overall distribution of box office revenue, supported by data. Section 4 addresses the question of useful features and online data sources for domestic movies in China. We describe the method of building models with multiple linear regression based on film industry expertise, and we predict the performance of the last movie of each specific type in 2014. Given the prediction results, Section 5 analyzes the prediction deviation and discusses the main reasons contributing to it.

II. RELATED WORKS
Research in this area has never stopped. There already exist a great number of works that use various methods for box office prediction. Generally, these works can be categorized into two types on the basis of the data source: one type quantifies elements extracted from the movie itself, the other uses data that people make public online or offline. Litman, who was the first in this direction, considered critics' ratings in 1983 [2]. Some papers make use of word-of-mouth (WOM), which is mainly collected from social media, for two main reasons. Firstly, there are large volumes of data about movies on social media, which are easy to access and collect. Secondly, WOM exerts a positive or negative influence on revenue depending on whether the film resonates with audiences, so there could be a correlation between social media content and movie box office. Xiwang Yang, Yang Guo and Yong Liu (2011) [3] proposed a Bayesian-inference based movie recommendation system which leverages the embedded social structure inside a social network to provide accurate and personalized recommendations.
HAO Yuan-yuan, LI Yi-jun, YE Qiang and ZOU Peng (2013) [4] revealed the differing impacts of three information sources on movie box office revenues: the volume and valence of online reviews, critic reviews and movie production budgets. Rui Yao and Jianhua Chen (2013) [5] applied sentiment analysis and machine learning methods to study the relationship between the online reviews for a movie and its box office revenue performance. They indicate that online reviews can be a good indicator for predicting a movie's box office revenue. Krushikanth R. Apala (2013) [6] identified several patterns in predictive models for the box office performance of movies, on the basis of factors derived from Twitter, YouTube and the IMDb movie database. Shyam Gopinath, Pradeep K. Chintagunta and Sriram Venkataraman (2013) [7] measured the effects of pre- and post-release blog volume, blog valence and advertising on the performance of 75 movies in 208 geographic markets of the U.S. They found that release-day performance is impacted most by pre-release blog volume and advertising, whereas post-release performance is influenced by blog valence, user rating valence and advertising. Jaehoon Lee, Giseop Noh and Chong-kwon Kim (2014) [8] provided a visualization approach to find hidden relations between movies and their evaluation. The word-of-mouth effects are demonstrated by analyzing the patterns in reviews.

DOI 10.5013/IJSSST.a.17.02.04, ISSN: 1473-804x online, 1473-8031 print

Apart from WOM, a film is a work of art composed of a variety of innate elements, such as performer, director, genre, producer and script. There is a partial correlation between some of these elements and box office revenue, and different researchers hold different points of view. Timothy King (2007) [9] pointed out that there is no correlation between critical ratings for movies and box office revenue. Zhang Yusong (2009) [10] revealed that box office revenue is positively correlated with the capital invested in the film, negatively correlated with piracy, and has almost nothing to do with the valence of the film, which means that large investment translates into high income. He Ping (2011) [11] insisted that creation, running time and advertising promotion are the key elements of a successful film with high income. She also deemed that the investment determines the revenue.
HU Xiao-li, LI Bo and WU Zhengpeng (2013) [12] studied the factors by multiple linear regression and analyzed the extent of the impact of several of them. They indicated that the combination of actor and director is the most influential factor among all the elements, and that the director contributes more to revenue than the actor. Jehoshua Eliashberg, Sam K. Hui and Z. John Zhang (2014) [13] developed a methodology that makes use of textual features from scripts, based on screenwriting domain knowledge, human input and natural language processing techniques, to predict the box office performance of a movie at the point of green-lighting.

From the papers listed above, we conclude that most researchers could not achieve the objective of prediction, for the following reasons. First, some of these papers only summarize rules based on historical data without taking the laws of the industry into account. Second, some could not publish results in advance because they use lagging elements such as WOM. Third, some papers provide one solution for all movies, which inevitably leads to a high error rate. Last but not least, none of the papers listed above demonstrated validity in revenue prediction for Chinese domestic movies. We take a different approach. The objective of this paper is, firstly, to find and collect available online data on domestic movies; secondly, to construct reasonable models by analyzing the laws of the film industry; and lastly, to publish prediction results with acceptable accuracy before release and discuss possible reasons for deviation.

III. SURVEY OF THE OVERALL DISTRIBUTION
Google released a white paper in 2013 which revealed its box office prediction model, based on the following parameters: search query volume, search ad click volume, theater count, franchise status and audience score. Google announced that the degree of correspondence between the revenue predicted by this model and the actual value is 94%.

Fig. (1). Comparison of 2012 Box Office Index and Film-Related Search Index [14].

Nevertheless, the conclusions mentioned above are all drawn from historical data.
Google has never demonstrated the model on new films. Whether this method can be applied to the Chinese market is also questionable. First of all, although it dominates the American search-engine market, Google may not have much of a future in China. As shown in Fig.2, according to data published by CNZZ in 2014, Baidu, 360 and Sogou are the top 3 search engines in the Chinese market. Secondly, the online behavior of Chinese movie audiences is not the same as that of American film viewers. As the Chinese population of mobile internet users grows, the query volume and paid click volume provided by search engines may not represent the network patterns of potential moviegoers. Last but not least, recommendations from video websites remain one of the most influential sources throughout the decision process of choosing a film to watch. Although it ranks second only to Google in terms of trailer search, YouTube cannot represent the search patterns for Chinese film previews. Instead, the major domestic video sites are Youku, Douban and so on.

Besides computer technicians, we carry out the research in this project together with some senior film industry practitioners. We believe that the movie business has its own inherent characteristics and laws. It is imperative that this research abide by industry rules. These seasoned experts enlightened us by sharing their insights during the cooperation.

We gathered box office revenue from the website http://www.m1905.com with a web crawler based on jsoup. The data is published on a weekly basis, supported by the State Administration of Radio, Film and Television. All the movies on m1905 have corresponding movie IDs. We extracted the weekly box office by passing the movie ID to the code and added the weekly figures up to obtain the total box office of the movie. The box office revenue released by this site contains data from four areas: Mainland, Japan, Hong Kong and North America. We collected data over the past four years, from May 2010 to May 2014. The profit distribution curve is then drawn to display the overall trend of box office in the different regions.

Fig. (2). Chinese Search Engine Market in 2014.
Fig. (3). Box Office Revenue Distribution Curve in Four Regions from 2010 to 201

As shown in the curve graph, more than five sixths of all movies have extremely low revenue. In contrast, less than one sixth take a large cut of the market. That means that if we predict that the profit of any movie not yet released is poor, the accuracy rate is higher than 80%. Conversely, the probability of a box-office hit is less than 20%. This phenomenon follows Zipf's law. The curve of box-office receipts in Mainland shows the characteristics of a power-law distribution, as depicted in Fig.4. Only 153 out of 1163 films grossed over 100 million yuan in the past four years. From the perspective of probability, we are more concerned with predicting high box office receipts than low ones.

IV. BOX OFFICE PREDICTION
Although Google Flu Trends (GFT) has received a good deal of criticism since its publication in Nature in 2009 [15], it is advisable to absorb the essence of its modeling process.
To improve the precision of mapping search queries to influenza-like illness, the researchers dissected the data source along two dimensions, time and region.

A. First Dimension
Time is an extremely important dimension. The results of prediction undergo radical changes over time, just as box office revenue does. The number of cinema screens has surged each year. Data collected more than five years ago is more likely to be noise than signal for our model. According to industry insiders, China has kept explicit, clear and accurate box office records only since 2012. Consequently, data prior to that year is not used in this project.

B. Second Dimension
Influenza data varies greatly across different states in America, and revenue likewise varies widely across regions when the idea is mapped to the film business. However, since our goal is national box office prediction, the second dimension of our model has to be changed: according to industry experts, it is not region but genre. Genre summarizes the overall theme of a movie. It also identifies the potential target viewers. More importantly, the input and output of a movie vary drastically with genre, which means that genre determines investment and revenue to some extent. In addition, the appetite of the market, especially the audience, for a genre changes with time and circumstance. Therefore, it is necessary to classify movies by genre. There exist dozens of genre labels, and the classification of the same film differs among video websites. For instance, Coming Home, directed by the famous director Zhang Yimou in 2014, is labeled as drama by Youku, while it is marked as romance, drama, biography and history by m1905. Through discussion with the movie industry experts in the project team, we consider eighteen genres here: romance, comedy, action, literary, horror, suspense, science-fiction, animation, adventure, child, war, family, biography, musical, fantasy, crime, history and erotic.
Since a film may have more than one genre label, we attach at most two labels to a single film to facilitate classification. The eighteen genres are divided into two groups, listed in Table I. A film is given one main label and either one auxiliary label or none. Taking the example mentioned before, Coming Home is marked as literary and romance.

TABLE I. GENRE CLASSIFICATION
Classification | Genre
Main | Comedy, Action, Literary, Horror, Suspense, Adventure, War, Child, Erotic
Auxiliary | Romance, Sci-fi, Animation, Family, Biography, Musical, Fantasy, Crime, History

C. Model
We originally planned to exploit two years' data to construct the prediction model. Three data sources are included: search engines, video websites and film industry expertise.

First, we make use of the search engines shown in Fig.2 to acquire the popularity of the director, the two leading actors and the film itself. The indices of the three search engines are collected separately for each element listed in Table II. However, these indices are released with a one-day delay, so a result that uses the indices of the screening day can only be calculated on the second day of release.

Second, the data of three video websites is collected: m1905, Youku and Douban. Firstly, the movies and box office data are provided by m1905. Secondly, Youku offers rich index data and has the largest market share in China. Thirdly, the conversion ratio from Douban score to box office revenue is considered relatively high [16].

TABLE II. ELEMENTS OF MODEL
Series | Element | Data Source | Appendix
A | Movie Baidu Index | Baidu | Screening Date
B | Movie Average Baidu Index | Baidu | One Week before Showing
C | Movie 360 Index | 360 | Screening Date
D | Movie Average 360 Index | 360 | One Week before Showing
E | Movie Sogou Index | Sogou | Screening Date
F | Movie Average Sogou Index | Sogou | One Week before Showing
G | Director Baidu Index | Baidu | Screening Date
H | Director Average Baidu Index | Baidu | One Week before Showing
I | Director 360 Index | 360 | Screening Date
J | Director Average 360 Index | 360 | One Week before Showing
K | Director Sogou Index | Sogou | Screening Date
L | Director Average Sogou Index | Sogou | One Week before Showing
M | Actors Baidu Index | Baidu | Screening Date
N | Actors Average Baidu Index | Baidu | One Week before Showing
O | Actors 360 Index | 360 | Screening Date
P | Actors Average 360 Index | 360 | One Week before Showing
Q | Actors Sogou Index | Sogou | Screening Date
R | Actors Average Sogou Index | Sogou | One Week before Showing
S | Douban Score | Douban | Querying Date
T | Youku Score | Youku | Querying Date
U | M1905 Score | M1905 | Querying Date
Y | Box Office Revenue | M1905 | Querying Date

Fig. (4). Box Office Revenue Distribution Curve in Mainland.
Fig. (5). Box Office Scatter Diagram of Action Films in 2013.
Fig. (6). Box Office Scatter Diagram of Action Films in 2014.
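The two kinds of search-index feature used in the model, the index on the screening date and the average index over the week before showing, can be derived from a daily index series. The sketch below assumes a hypothetical `daily_index` mapping from calendar date to index value; the function name, the choice of the seven days immediately preceding the screening date, and the sample numbers are illustrative assumptions rather than the authors' code.

```python
from datetime import date, timedelta
from statistics import mean


def index_features(daily_index, screening_date):
    """Return (index on the screening date, mean index over the 7 days before it).

    daily_index is a hypothetical dict mapping datetime.date -> index value.
    """
    on_day = daily_index[screening_date]
    week_before = [
        daily_index[screening_date - timedelta(days=d)] for d in range(1, 8)
    ]
    return on_day, mean(week_before)


# Hypothetical daily series: eight consecutive days ending on the screening date.
start = date(2014, 12, 16)
series = {start + timedelta(days=i): 1000 + 100 * i for i in range(8)}
screening = date(2014, 12, 23)

on_day, week_avg = index_features(series, screening)
print(on_day, week_avg)
```

Because the engines publish their indices with a one-day delay, the first value only becomes available on the second day of release, while the second is known before the film opens.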

Furthermore, Youku provides the following eight sets of data: number of collections, number of comments, number of supports, number of tramples, movie whole-network search index, movie whole-network play index, movie Youku play index and movie Tudou play index. However, these data carry no time dimension and keep changing over time. In view of this feedback interference, they are abandoned in the modeling process.

1. Action Movie Model
As of 31st December 2014, a total of 25 action films had been released that year, compared with 35 in 2013. Excluding the last one (The Taking of Tiger Mountain), the revenue scatter diagram of the remaining 24 films is drawn in Fig.6. There are 15 films whose box office is over 100 million yuan. This number is the same as that of 2013, which is shown in Fig.5. The revenue of the 25th film, which was still showing, had already exceeded 0.4 billion by the end of 2014. Comparing Fig.5 and Fig.6, we can conclude that the number of action films is decreasing while the proportion of high box office films is increasing. In order to make the model reflect changes in the market trend, we only adopt the data of 2014 to build the prediction model.

The independent variables are the elements A to U listed in Table II. We consider box office revenue as the dependent variable and build a multiple linear regression model. The regression equation obtained with the enter method is as follows:

$$Y = -15.686 + 0.001A - 318\times10^{-6}B + 0.002C + 0.001D + 0.003E - 0.002F + 0.145G - 0.255H - 0.147I + 0.644J - 0.838K - 0.001M - 0.004N - 0.017O + 0.020P + 0.067Q - 0.053R + 20.793S + 20.616T - 34.749U$$ (1)

The coefficients and their significance (Sig.) are listed in Table IV, and the analysis of variance is shown in Table III.

TABLE III. ANALYSIS OF VARIANCE OF (1)
Model | Sum of Squares | df | Mean Square | F | Sig.
Regression | 1320872 | 20 | 66044 | 399 | .000b
Residual | 496.174 | 3 | 165 | |
Total | 1321368 | 23 | | |

TABLE IV. REGRESSION EQUATION COEFFICIENTS OF (1)
Model | Unstandardized B | Std. Error | Standardized Coefficient | t | Sig.
(Constant) | -15.686 | 87.613 | | -.179 | .869
VAR00001 (A) | .001 | .000 | .134 | 2.121 | .124
VAR00002 (B) | .000 | .002 | -.017 | -.169 | .876
VAR00003 (C) | .002 | .001 | .625 | 3.981 | .028
VAR00004 (D) | .001 | .002 | .040 | .405 | .713
VAR00005 (E) | .003 | .004 | .189 | .765 | .500
VAR00006 (F) | -.002 | .011 | -.037 | -.204 | .851
VAR00007 (G) | .145 | .034 | .982 | 4.263 | .024
VAR00008 (H) | -.255 | .065 | -1.041 | -3.909 | .030
VAR00009 (I) | -.147 | .064 | -.803 | -2.306 | .104
VAR00010 (J) | .644 | .084 | 2.176 | 7.626 | .005
VAR00011 (K) | -.838 | .128 | -1.380 | -6.524 | .007
VAR00013 (M) | -.001 | .001 | -.097 | -.686 | .542
VAR00014 (N) | -.004 | .002 | -.463 | -2.664 | .076
VAR00015 (O) | -.017 | .003 | -1.042 | -6.125 | .009
VAR00016 (P) | .020 | .003 | 1.099 | 5.921 | .010
VAR00017 (Q) | .067 | .006 | 1.993 | 11.536 | .001
VAR00018 (R) | -.053 | .008 | -1.415 | -6.674 | .007
VAR00019 (S) | 20.793 | 6.906 | .116 | 3.011 | .057
VAR00020 (T) | 20.616 | 9.844 | .055 | 2.094 | .127
VAR00021 (U) | -34.749 | 9.337 | -.127 | -3.722 | .034

TABLE V. MODEL SUMMARY
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .931a | 0.866 | .860 | 8.95749
2 | .958b | 0.918 | .910 | 7.200916

Fig. (7). Comparison of the Three Values Concerning Action Movies.
Fig. (8). Comparison of the Three Values Concerning Horror Movies.
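The entries of an analysis-of-variance table are tied together by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, and R-square is the regression share of the total sum of squares. The sketch below recomputes these quantities from the sums of squares and degrees of freedom reported in the analysis of variance of (1); the function name is our own and the code is only a consistency check, not part of the original study.

```python
def anova_summary(ss_regression, df_regression, ss_residual, df_residual):
    """Return (MS regression, MS residual, F statistic, R-square)."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    f_statistic = ms_regression / ms_residual
    r_square = ss_regression / (ss_regression + ss_residual)
    return ms_regression, ms_residual, f_statistic, r_square


# Sums of squares and degrees of freedom as reported for equation (1).
ms_reg, ms_res, f_stat, r2 = anova_summary(1320872, 20, 496.174, 3)
print(round(ms_reg), round(ms_res), round(f_stat), round(r2, 4))
```

The recomputed mean squares and F value agree with the reported 66044, 165 and 399. With 24 films and 20 retained predictors only 3 residual degrees of freedom remain, so an R-square this close to 1 should be read with that in mind.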

The model built on D, the average 360 index of the movie over the week before showing, is:

$$Y = 2.0074 \pm 0.001918\,D$$ (2)

In (2) and in the models below, the operators between terms did not survive in the source; ± marks each such position, and a leading sign on the constant may likewise be missing.

The Pearson correlation coefficient, defined in (3), between the model results and the actual values is 0.958 with indices on the screening day and 0.897 without them.

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$ (3)

The comparison of the actual values and the results of the two models is depicted in Fig.7. According to the two models, the prediction results for the last action film of 2014 are 0.21 billion and 0.11 billion respectively, which are significantly lower than the actual value (0.8866 billion). We will analyze the prediction deviation in the next section.

2. Horror Movie Model
There were twenty horror movies screened in 2014 according to our classification method. We gathered the elements listed in Table II. The prediction model based on elements with indices on screening day is as follows:

$$Y = \begin{cases} 121.043 \pm 0.176\,C \\ 65.292 \pm 0.337\,C \pm 0.273\,B \end{cases}$$ (4)

The R-square values of the two rows of (4) are 0.994 and 0.996, so the lower row is chosen:

$$Y = 65.292 \pm 0.337\,C \pm 0.273\,B$$ (5)

The prediction model based on elements without indices on screening day is as follows:

$$Y = 192.889 \pm 0.297\,B$$ (6)

The comparison chart of the actual values and the results of (5) and (6) is shown in Fig.8.

Fig. (9). Comparison of the Three Values Concerning Literary and Romance Movies.
Fig. (10). Comparison of the Three Values Concerning Child and Animation Movies.

We apply (5) and (6) to predict the box office revenue of the last horror movie of 2014, Bloody Doll. The prediction results are 19 million and 13.40 million. The actual value is 23.5 million. Therefore, the deviation rates are 19.14% and 42.8% respectively. This means that a prediction released on the second day, rather than on the screening day, is more accurate.

3. Literary Romance Movie Model
Following the theory and modeling method discussed above, the prediction model based on elements with indices on screening day, fitted on 19 films, is as follows:

$$Y = \begin{cases} 966.716 \pm 1.693\,E \\ 1593.634 \pm 1.563\,E \pm 1.761\,G \\ 983.812 \pm 1.038\,E \pm 1.469\,G \pm 1.531\,F \\ 705.896 \pm 0.615\,E \pm 1.946\,G \pm 5.344\,F \pm 0.311\,B \\ 17676.827 \pm 0.795\,E \pm 2.357\,G \pm 5.792\,F \pm 0.369\,B \pm 2954.315\,U \end{cases}$$ (7)

The R-square values of the five rows of (7) are 0.814, 0.884, 0.925, 0.965 and 0.978, so the last row is chosen. The prediction model based on elements without indices on screening day is as follows:

$$Y = \begin{cases} 4023.463 \pm 3.785\,F \\ 3809.249 \pm 7.12\,F \pm 0.331\,B \\ 1465.213 \pm 7.524\,F \pm 0.391\,B \pm 7.176\,J \end{cases}$$ (8)
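As a sketch of how the two accuracy measures used so far can be computed, the code below implements the Pearson correlation coefficient and a deviation rate of the form |predicted - actual| / actual. The deviation-rate formula is an assumption: the text does not define it, but this form gives about 19% and 43% for the two Bloody Doll predictions, close to the quoted rates. The sample series passed to `pearson` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


def deviation_rate(predicted, actual):
    """Relative deviation of a prediction from the actual value (assumed form)."""
    return abs(predicted - actual) / actual


# Hypothetical model outputs and actual revenues (million yuan).
predicted = [12.0, 30.0, 55.0, 80.0]
actual = [10.0, 33.0, 50.0, 90.0]
print(pearson(predicted, actual))

# Bloody Doll: predictions of 19 and 13.40 million against an actual 23.5 million.
print(deviation_rate(19.0, 23.5))
print(deviation_rate(13.40, 23.5))
```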

R square values of (10) are 0.78, 0.85 ad 0.937, also the last part is chose. Therefore, the fial model is: 17676. 827 0. 795 E2. 357G5. 792 F0. 369 B2954. 315 U Y 1465. 213 7. 524 F 0. 391 B 7. 176 J (9) We apply this model to predict the profit of the last literary ad romace movie i 2014,whose ame is Fleet of Time. The result are 591 ad 612 millio. The actual value is 584 millio. The deviatio ratio are oly 1.2% ad 4.8%.The result based o elemets with idices o the screeig day are more precise, which is the same with the previous model. The compariso chart is show below. Certaily, if predictio result is less tha zero, it meas that the box office would probably be very terrible. 4. Child Aimatio Movie Model There are 31 films of this type released i 2014.To avoid the statistical oise, five films whose box office reveue are ot provided by m1905 were ot take ito accout durig modelig(their performace is also very poor).same as above, the predictio model based o elemets with idices o screeig day of 25 films is as follow: Y 9150. 984 0. 131 N 0. 230 B 2265. 678 T (12) We use (15) to predict the reveue of the last comedy ad romace movie i 2014, amed Love O The Cloud. The result is 0.3315 billio. The actual value is 0.2869 billio. The deviatio ratio is 15.55%. The compariso chart is show as follow. V. PREDICTION ERROR ANALYSIS We apply R-square ad correlatio coefficiet(cc) value to test the fit of the model. R-square ca be calculated as follows: 1898. 837 0. 284 B Y 1470. 665 0. 213 B 0. 062 D 2026. 165 0. 163 B 0. 175 D 0. 12 E (10) R square values of (12) are 0.961,0.98 ad 0.986,so the last part is chose. The predictio model based o elemets without idices o screeig day is the first ad secod part of (12), the fial model is: 1470. 665 0. 213 B 0. 062 D Y 2026. 165 0. 163 B 0. 175 D 0. 12 E (11) We use this model to predict the reveue of the last child ad aimatio movie i 2014, amed Kuiba III. The result are 26.55 ad 20.10 millio. The actual value is 23.50 millio. 
The deviatio rate are 11.5% ad 16.9%.Ditto the latter is better tha the former. The compariso chart is show as follow. 5. Comedy Romace Movie Model I 2014, 38 movies fall ito this category ad eleve of them are discarded because their box office profits are dropped (also too poor) by M1905.We pla to build the model with data with 26 films. The predictio model based o elemets with idices o screeig day is the same as the oe without those. 1995. 262 0. 175 N Y 865. 785 0. 135 N 0. 233 B 9150. 984 0. 131 N 0. 230 B 2265. 678 T (14) Fig. (11). Compariso of the Three Values Cocerig Comedy ad Romace Movies. Model TABLE VI. R-SQUARE VALUES OF THE FIVE TYPES OF FILMS R-square With Idices O the Screeig Day Without Idices O the Screeig Day Actio 0.918 0.804 Horrible 0.996 0.989 Literary ad Romace 0.978 0.937 Child ad Aimatio 0.986 0.980 Comedy ad Romace 0.939 0.939 TABLE VII. CORRELATION COEFFICIENT VALUES OF THE FIVE TYPES OF FILMS Model CC With Idices O the Screeig Day Without Idices O the Screeig Day Actio 0.958 0.897 Horrible 0.998 0.997 Literary ad Romace 0.989 0.967 Child ad Aimatio 0.987 0.983 Comedy ad Romace 0.969 0.969 TABLE VIII. PEARSON CORRELATION COEFFICIENTS AMONG SCHEDULE, REVENUE AND BSQV. Movie Title Correlatio Coefficiet Betwee Schedule Ad Daily Reveue Correlatio Coefficiet Betwee Schedule Ad Daily BSQV The square values of R are 0.859, 0.917 ad 0.940.Therefore, the last part is chose. The Takig of Tiger Moutai Goe With The Bullets 81.33% 91.10% 90.47% 91.93% DOI 10.5013/IJSSST.a.17.02.04 4.7 ISSN: 1473-804x olie, 1473-8031 prit

We can see from (15) that the box office revenue of this type is related to the film's attention, the stars' popularity and the Douban score. This is a clear sign that Douban and the performers can serve as indicators of success for this type. It is worth mentioning that we achieved over 90 percent accuracy with this method when predicting the revenue of The Breakup Guru in June 2014, which was nearly 0.7 billion. At that time, its rival was the box office champion (Transformers IV), and its result exceeded the expectations of most people, though not those of our team.

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (13)$$

where $y_i$ is the actual value, $\hat{y}_i$ is the fitted value and $\bar{y}$ is the mean of the actual values.

The R-square values of the five models are shown in Table VI. The experiments suggest that predictions calculated from indices on the day of release are more precise than those calculated from the average value over the week before screening. Meanwhile, the prediction accuracy of the action film model is the lowest among the five types. The correlation coefficient values of the five models are shown in Table VII. The result is consistent with Table VI. We can infer that the lower the R-square and correlation coefficient values, the higher the deviation rate. Based on the above analysis, we conclude that the action model is more likely to be inaccurate than the other four models. Indeed, it failed in our prediction test, falling far short of an 80 percent accuracy rate. There are many factors contributing to prediction deviation, some obvious and some subtle.
Among them, the most powerful elements are WOM and schedule, which are governed by film viewers and cinema managers respectively. WOM is believed to have an increasing impact on consumer purchase decisions. The key elements of WOM are volume and valence. Volume reflects the number of people who come to know of or pay attention to the film, while valence represents consumers' experience of it. These data can be collected from social networks, fan sites and video websites, but they contain much noise because of the "Internet water army" (paid posters). They also ought to be normalized before processing. However, the persuasive effect of WOM can be captured by search query volume, because a search is a sign of interest no matter where the information comes from (online or offline), whether it is real or not (true or fake), and what its nature is (good or bad).

Fig. (12). Comparison of Box Office and the Three Search Query Volumes during Screening.

The cinema manager is certainly powerful enough to influence box office revenue through the screening schedule. The manager reduces a film's share of the schedule to stop losses when the audience is small, and increases it to raise income when the audience is large. Furthermore, the number of screening days of each film is uncertain and varies across theatre chains for different reasons. Moreover, SARFT (State Administration of Radio, Film and Television) holds supreme authority in China. It forced the box office champion Transformers 4 to be taken out of theatres on the 32nd day after release, keeping its revenue below 2 billion. Take two extremes in 2014 as an example: one is Gone With The Bullets, with a bad reputation; the other is The Taking of Tiger Mountain, with a good reputation. We crawled the daily box office revenue from a WeChat official account. For the first film (Gone With The Bullets), the Pearson correlation coefficients between daily box office revenue and the three search query volumes are 0.958 (Baidu), 0.824 (360) and 0.827 (Sogou) respectively.
The figures for the second film (The Taking of Tiger Mountain) are 0.911 (Baidu), 0.836 (360) and 0.499 (Sogou) respectively. According to these figures, Baidu is the closest to WOM after release. Table VIII suggests that:
1) Schedule and revenue show a strong positive correlation during the screening period.
2) Schedule and BSQV (Baidu Search Query Volume) show an even stronger positive correlation, while BSQV reflects WOM according to the analysis above.

3) It can also be deduced from 1) and 2) that BSQV and revenue have a strong positive correlation, which we had already inferred from Fig. 12.

VI. CONCLUSION

In this paper, we propose a multiple linear regression prediction model that leverages online resources including search engines and video websites. We gathered information from these resources before and on the day of release to avoid echo interference. Three mainstream search engines, three authoritative media sites and one WeChat Official Account were selected owing to data dimension. To reflect changes in the market, movie samples from one year were collected. More importantly, we worked with experts in the TV and film business to classify domestic films, and five main kinds were carefully chosen. The stepwise method of multiple linear regression is used to build the model. The experiments show that predictions calculated from indices on the day of release are more precise than those calculated from the average value over the week before screening. Our model predicts with over 80 percent accuracy for the four film types other than action movies, and the best one even achieved over 98 percent. We calculate the correlation coefficient and R-square value to evaluate the goodness of fit of the five models. Then we focus on the analysis of WOM and schedule, which are the two strongest factors contributing to daily box office revenue. The Baidu search index proves to be an indicator of WOM, and there is a positive correlation between it and schedule. Nowadays, the essence of commercial films is to attract audiences. Moreover, it is better to win popularity before screening than to gather fans after release. Taking the famous reality show Where Are We Going Dad? as an example, the film of the same name gained 700 million in revenue in 2014. It is easier for a film to gain both fame and wealth if it already has a large fan base before production. Future work can proceed in several directions.
Firstly, because the lead time of the prediction is limited, we plan to analyze the relationship between schedule and competition in order to shift the prediction time to an earlier date. Secondly, we aim to study those factors contributing to box office which cannot be captured by linear regression.

ACKNOWLEDGMENT

This work is supported by the Fundamental Research Funds for the Central Universities (Grant No. 2014PTB-00-02), and the National Science Fund for Distinguished Young Scholars (Grant No. 61425012) in China.

REFERENCES

[1] Chinese Film Industry Research Report in 2013. Chinese Film Press. April, 2013.
[2] Litman, Barry R. (1983). Predicting Success of Theatrical Movies: An Empirical Study. Journal of Popular Culture, 16 (spring), 159-175.
[3] Xiwang Yang, Yang Guo, Yong Liu. Bayesian-inference Based Recommendation in Online Social Networks.
[4] Hao Yuan-yuan, Li Yi-jun, Ye Qiang, Zou Peng. Dynamic Impacts of Online Reviews and Other Information Sources on Sales in Panel Data Environment: Evidence from Movie Industry. International Conference on Management Science & Engineering, September 10-12, 2008.
[5] Rui Yao, Jianhua Chen. Predicting Movie Sales Revenue using Online Reviews. IEEE International Conference on Granular Computing (GrC) in 2013.
[6] Krushikanth R. Apala, Merin Jose, Supreme Motnam, C.-C. Chan, Kathy J. Liszka, and Federico de Gregorio. Prediction of Movies Box Office Performance Using Social Media. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), August 25-29, 2013.
[7] Shyam Gopinath, Pradeep K. Chintagunta, Sriram Venkataraman. Blogs, advertising and local-market movie box-office performance. Social Science Research Network, February 8, 2013.
[8] Jaehoon Lee, Giseop Noh, Chong-kwon Kim. Analysis & Visualization on movie's popularity and reviews. In International Conference on Big Data and Smart Computing (BIGCOMP) 2014, pp. 189-190.
[9] King Timothy. Does Film Criticism Affect Box Office Earnings? Evidence from Movies Released in the U.S. in 2003. Journal of Cultural Economics, 2007(31): 171-186.
[10] Zhang Yusong and Xi Zhang. Analysis of Factors that Influence Income of Movies.
Economic Forum, 2009(4): 130-132.
[11] He Ping. Analysis of Factors that Affect the Box Office Income of a Film. Chinese Film Market, 2011(11): 8-10.
[12] Hu Xiao-li, Li Bo, Wu Zheng-peng. The Analysis of the Factors Which Influence Film Box Office. Journal of Communication University of China (Science and Technology). Vol. 20, No. 1, Feb, 2013.
[13] Jehoshua Eliashberg, Sam K. Hui, Z. John Zhang. Assessing Box Office Performance Using Movie Scripts: A Kernel-based Approach. IEEE Transactions on Knowledge and Data Engineering, January 15, 2014.
[14] Google Whitepaper, Industry Perspective + User Insights. Quantifying Movie Magic with Google Search. June 2013.
[15] Jeremy Ginsberg, Matthew H. Mohebbi, Rajan S. Patel, Lynnette Brammer, Mark S. Smolinski & Larry Brilliant. Detecting Influenza Epidemics Using Search Engine Query Data. Nature, Vol 457, 19 February 2009.
[16] Zhang Li. Foreign Movies eWOM and Box Offices. Master thesis: Tsinghua University, 2012.