Proceedings of the World Congress on Engineering and Computer Science 2012 Vol, October 24-26, 2012, San Francisco, USA

Content-Based Movie Recommendation Using Different Feature Sets

Mahiye Uluyagmur, Zehra Cataltepe and Esengul Tayfur

Abstract: Movie recommendation systems aim to recommend movies that users may be interested in. In this paper, we introduce a content-based movie recommendation system which can use different feature sets, namely, actor features, director features, genre features and keyword features. For each user, we assign a weight to each feature in a feature set based on the particular user's past behavior. We produce the user's implicit rating for a movie based on the duration of the movie that the user viewed. In order to predict a rating for a user and a movie using a particular feature set, we merge the user-specific weights of the movie's features. We also produce ratings using all feature sets. We evaluate each recommendation method based on precision, recall and F-measure on ten movie recommendations.

Index Terms: Content-based Movie Recommendation, Feature Weight Calculation, Recommender Systems, Implicit Rating.

I. INTRODUCTION

Digital TV broadcasting brought a huge increase in the number of TV channels and the amount of content they provide. It is very hard for users to find and watch the content they actually like among these many options. Users have to zap around the channels or follow the program guides to find the contents that they are likely to prefer. But the programming guide is a very long list, zapping takes time, and it may not be possible to zap over so many different contents to get an idea of them. So, suitable content selection for each user becomes an important problem [1]-[2].

Recommendation systems have emerged as a solution to this problem. If explicit ratings (e.g. like/dislike, a rating between 0 and 5) are available, then a recommendation system may use these ratings. On the other hand, in many systems users do not want to provide such explicit feedback; therefore implicit ratings need to be produced based on the user viewing history.
Another taxonomy of recommendation systems is based on whether the content of each movie, or the viewing behavior of other users, is taken into account. Collaborative filtering methods rely on a user-item matrix which shows whether a user liked an item or not [3]. Usually, the collaborative filtering methods ask the users to give explicit ratings about the contents they watched previously. So, the ratings are obtained explicitly by the system [4]. If ratings are not available, then they may need to be generated implicitly, based on user behavior. It should be emphasized that using direct ratings or calculated implicit ratings may produce different results.

Sparsity of the user-item matrix is a major problem in collaborative filtering [5]. There are usually a large number of movies in the system and each user gives ratings to a small number of movies. Since the number of movies commonly rated by two users is mostly zero, it becomes very difficult to find users who are nearby. Moreover, if a movie is not rated by any users at all, this movie cannot be recommended (cold-start problem) [5].

Content-based recommendation uses movie information and the users' viewing profile. The Music Genome Project is an example of a music recommendation system [6] which uses a content-based recommendation method. In a content-based method each user is uniquely characterized and the user's interest is not matched to that of some other user as in the collaborative methods [7]. The ability to show the content features that cause an item to be recommended also gives users confidence in the recommendation system and insight into their own preferences [7].

Manuscript received July 6, 2012; revised August 10, 2012. This work was supported in part by the Turkish Ministry of Science, Industry and Technology, SanTez Program, Project number 00966.STZ.20-2 and in part by Digiturk. Authors M. Uluyagmur and Z. Cataltepe are with Istanbul Technical University. Author Esengul Tayfur is with Digiturk. Corresponding Author: Z. Cataltepe, Istanbul Technical University, Computer Engineering Department, cataltepe@itu.edu.tr, +90-22-285-355.
ISBN: 978-988-19251-6-9 ISSN: 2078-0958 (Print); ISSN: 2078-0966 (Online)
In this paper, we introduce a content-based movie recommendation method. First of all, we convert the viewing history of a user for each movie to an implicit rating. We consider feature sets, such as actor, director, keyword etc., that describe a movie. For each feature in a feature set, based on the user's past viewed movies and the user's rating for each movie, we compute a feature weight. Each feature weight is calculated separately for each user. If a user watched a movie completely or watched much of it, then the features extracted from this movie are important, and their weights will be assigned accordingly. When a movie needs to be rated for a user, based on the features of the movie, we produce a different rating for each feature set and compare them. We also produce combined ratings which take into account all the different feature sets.

II. RELATED WORK

With the increase in the amount of items users can buy or watch and the ability to keep the history of the items users consumed, recommendation systems have become both necessary and available.

A recommendation method for a Japanese video service provider has been proposed in [8]. They used the actor and keyword information of the users' films. They also considered the time of the day the users watch TV. They used the ratio of the number of times a user watched a movie with a certain feature (such as actor, keyword) to the number of times the feature is observed in all the movies. This ratio is calculated for all actor and keyword features. Then, for each movie, the sum of the ratios of the movie's features is calculated. They used recall, precision and F-Measure as evaluation measures. Different from our evaluation method,
they measured their performance by getting feedback from the user right after making a recommendation.

An approach to support context-aware recommendation for personalized digital TV has been proposed in [1]. In this work they used a contextual user profile, which consists of the user's personal data profile, the user's contextual information and the genre of the TV program. They got this information from the users by asking them directly. The distinction of our system is that we do not ask the user any questions to get their demographic information or preferences. Since in our system each customer may consist of a number of family members and the information provided may not always be reliable, we do not use demographic information of users at all. Instead of asking the user preference questions, we extract this information from the user's watching history.

The FIT system recommends TV programs to family members [9]. They constructed a user profile by asking each household about their program genre preferences. The user profile also contained the times of the day they watch television. In the recommendation phase, the FIT system first guesses which household member turned on the television by using the time of the day information. If the guess is wrong then the system may make the wrong recommendations for the household.

The TiVo television show collaborative recommendation system uses the item-item form of collaborative filtering [2]. The process starts with a user giving a rating to a movie. There are two types of ratings: explicit and implicit. For explicit feedback, the user presses the remote control button according to how much s/he loves the movie.

III. METHODOLOGY

A. Conversion of Movie Viewing Duration to Rating

In our system, users do not rate movies explicitly, so we calculate implicit ratings by using the viewing durations. Assume that user $u$ watches the movie $i$ for $t(u,i)$ minutes during the year and $t_i$ is the total duration of the movie $i$. We define the normalized viewing duration (the implicit rating) of the user $u$ for movie $i$ as:

$$r(u,i) = \frac{t(u,i)}{t_i} \qquad (1)$$

This formulation is similar to that of [1]. The difference in our formulation is that $r(u,i)$ can be greater than 1 if a user has watched a content more than once. Also, we work with continuous ratings; we do not use a threshold to determine relevant items. As seen in Fig. 1, $r(u,i)$ values are between 0 and 7.2. Most contents are watched once, but there are also contents which are watched more than once.

Fig. 1. Distribution of the normalized viewing durations of all users.

B. Feature Based Weight Calculation Method

The aim of the movie recommendation system is to find the contents that the user may actually want to watch. We use the features of the movies the user viewed in the past, and the implicit ratings for these movies, to compute a weight for each user and feature. We use the feature sets of type actor, genre, director. Training contents have 476 actors, 927 directors, 34 genres. The weight of each feature is assigned according to the implicit rating of user $u$ for all training data contents including that feature. The actor feature set may contain features like Brad Pitt, Harrison Ford, Engin Günaydın; the genre feature set may contain drama, action, comedy; the director feature set may contain Woody Allen, James Cameron etc. In order to determine the weight of each feature for user $u$, we use the training rating data.

Let user $u$ watch items $i_0, \ldots, i_8$, which have features $j_0, \ldots, j_3$. In Table I, we show the features for the feature set $f$ for user $u$. Note that $j_0, \ldots, j_3$ contain the features that appear on items all users watched in the training set. The Rating column shows the rating of user $u$ for movies $i_0, \ldots, i_8$.

TABLE I. ITEM-FEATURE MATRIX FOR USER $u$

The weight of feature $j$ in feature set $f$ for user $u$ is calculated as:

$$w_f(u,j) = \frac{1}{|train(u)|} \sum_{i \in train(u)} x_f(i,j)\, r(u,i) \qquad (2)$$

In equation (2), $f$ represents the type of the feature set, $f \in \{actor, genre, director\}$. $r(u,i)$ is the implicit rating of user $u$ for item $i$ and $x_f(i,j) \in \{0, 1\}$ indicates whether item $i$ has the $j$th feature. $train(u)$ is the set of movies watched by user $u$ in the
training period. If movie $i$ has feature $j$ then $x_f(i,j)$ will be 1 and the rating of user $u$ for movie $i$ will contribute to the sum.

C. Rating Prediction Method

After calculating the weight of each feature for each user, we use it to predict the recommendation ratings of contents. We calculate the rating for each feature set separately using the weights $w_f(u,j)$ obtained from the training data. We use two methods to generate ratings. As seen in (3), the first one is to sum up the feature weights of the contents. Equation (4) represents the second one, which normalizes this sum by dividing it by the number of features of the content.

$$r_f(u,i) = \sum_{j \in D_{f,i}} w_f(u,j) \qquad (3)$$

$$r'_f(u,i) = \frac{1}{|D_{f,i}|} \sum_{j \in D_{f,i}} w_f(u,j) \qquad (4)$$

$r_f(u,i)$ represents the rating that user $u$ gives the movie $i$ according to the feature set $f$. $r'_f(u,i)$ is the rating obtained by normalizing the summation of the feature weights with $|D_{f,i}|$. $D_{f,i}$ is the set of features that appear in movie $i$ from feature set $f$.

D. Combining the Ratings of Different Feature Sets

Each feature set may contain a different number of items; therefore $r_f(u,i)$ values may be in different ranges. In order to determine the total effect of all the features, the rating of each feature set, which is calculated according to (3), needs to be normalized. In this work the min-max normalization method is used. It is conducted as follows:

$$\tilde{r}_f(u,i) = \frac{r_f(u,i) - mr_{f,u}}{MR_{f,u} - mr_{f,u}} \qquad (5)$$

$mr_{f,u}$ indicates the minimum predicted rating of the user $u$ calculated according to feature set $f$ in the training set and $MR_{f,u}$ indicates the maximum predicted rating of the user $u$ calculated according to feature set $f$ in the training set. As in Equation (5), we also normalize the users' actual ratings:

$$\tilde{r}(u,i) = \frac{r(u,i) - mr_u}{MR_u - mr_u} \qquad (6)$$

$mr_u$ indicates the minimum actual rating of the user $u$ and $MR_u$ indicates the maximum actual rating of the user $u$.

Ratings are brought together with the use of two different methods. In the first method, the actor, genre, director, time zone, channel, keyword and release year normalized ratings are summed:

$$r_{sum}(u,i) = \sum_{f=1}^{K} \tilde{r}_f(u,i) \qquad (7)$$

Equation (7) is used to generate ratings. In this equation, $K$ is the number of the feature sets and $\tilde{r}_f(u,i)$ is the normalized rating.
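The weight calculation and rating prediction steps of Equations (2) to (5) and (7) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the dictionary-based data layout, and the choice to return 0.0 when the min-max range is empty are assumptions made here, and the weight is taken to be the per-user average of Equation (2).

```python
from collections import defaultdict


def feature_weights(train_ratings, item_features):
    """Eq. (2) sketch: per-user weight of every feature in one feature set.

    train_ratings: {item: implicit rating r(u, i)} for one user (hypothetical layout)
    item_features: {item: set of features of that item in feature set f}
    """
    totals = defaultdict(float)
    for item, rating in train_ratings.items():
        # x_f(i, j) = 1 exactly for the features listed for this item
        for feature in item_features.get(item, ()):
            totals[feature] += rating
    n = len(train_ratings)
    return {feature: total / n for feature, total in totals.items()} if n else {}


def predict_sum(weights, features):
    """Eq. (3): sum of the user's weights over the movie's features."""
    return sum(weights.get(feature, 0.0) for feature in features)


def predict_mean(weights, features):
    """Eq. (4): Eq. (3) divided by the number of features of the movie."""
    return predict_sum(weights, features) / len(features) if features else 0.0


def min_max(value, lowest, highest):
    """Eq. (5)/(6): min-max normalization (0.0 for an empty range is an assumption)."""
    return (value - lowest) / (highest - lowest) if highest > lowest else 0.0


def combine_sum(normalized_by_set):
    """Eq. (7): sum of the normalized ratings over all feature sets."""
    return sum(normalized_by_set.values())


if __name__ == "__main__":
    ratings = {"m1": 1.0, "m2": 0.5, "m3": 2.0}          # toy implicit ratings
    actors = {"m1": {"A", "B"}, "m2": {"B"}, "m3": {"C"}}  # toy actor feature set
    w = feature_weights(ratings, actors)
    print(predict_sum(w, {"A", "C"}), predict_mean(w, {"A", "C"}))
```

Features the user has never seen contribute a weight of zero here, which is one reasonable reading of Equation (3) for unseen features but is not stated in the text.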
After the combined ratings are generated, the number of correct recommendations is evaluated.

Let $E_{f,u}$ represent the Mean Absolute Error (MAE) of ratings for user $u$ according to feature set $f$. The MAE is calculated as the mean absolute value of the difference of the normalized predicted and actual ratings, as given by Equations (5) and (6), for user $u$ according to feature set $f$ on the training set:

$$E_{f,u} = \frac{1}{|train(u)|} \sum_{i \in train(u)} \left| \tilde{r}_f(u,i) - \tilde{r}(u,i) \right| \qquad (8)$$

The ratings for feature sets which have a lower MAE should have larger weights in the combination. Therefore, the second rating combination method uses the MAE as follows:

$$r_{expsum}(u,i) = \sum_{f=1}^{K} \tilde{r}_f(u,i) \cdot \exp(-E_{f,u}) \qquad (9)$$

E. Evaluation of Recommendations

In order to evaluate the performance of our recommendation methods we used precision, recall and F-Measure metrics. We use the top hit counts to measure the accuracy of our recommendation system. We recommend movies to the user according to the predicted ratings generated as described above. We sort all the test movies according to their generated ratings and recommend the top $N=10$ movies to the user. After that we count the number of movies watched in the test set by the user out of the top 10 recommendations and name this quantity #hitcounts.

Precision measures how many movies the system hits in the top 10 movies:

$$Precision = \frac{\#hitcounts}{N} \qquad (10)$$

Recall is the ratio of the number of hits out of 10 recommendations to $|test_u|$, which is the number of movies that user $u$ watched in the test set:

$$Recall = \frac{\#hitcounts}{|test_u|} \qquad (11)$$

For example, if the recommendation system hits 5 movies which have been watched out of the 10 recommended movies, the precision value will be 0.5. If the user watched 30 movies in the test set, then the recall value will be 5/30, approximately 0.17.

F-measure uses both precision and recall as follows:

$$F\text{-}measure = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \qquad (12)$$

IV. EXPERIMENTS

In our experiments, we used 3 months of log data, the first 2 months for the training phase and the last month for the test phase. There are 62 users and 3700 contents.
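The MAE-weighted combination of Equations (8) and (9) and the top-N evaluation of Equations (10) to (12) can be illustrated with the following Python sketch. The function names and the dictionary-based inputs are hypothetical, and precision is always divided by N, as in Equation (10), even when fewer than N candidates exist.

```python
import math


def mean_absolute_error(predicted, actual):
    """Eq. (8): mean absolute difference over the items present in both dicts."""
    items = [item for item in predicted if item in actual]
    if not items:
        return 0.0
    return sum(abs(predicted[item] - actual[item]) for item in items) / len(items)


def combine_exp(normalized_by_set, mae_by_set):
    """Eq. (9): normalized ratings weighted by exp(-MAE) of their feature set."""
    return sum(
        rating * math.exp(-mae_by_set[feature_set])
        for feature_set, rating in normalized_by_set.items()
    )


def evaluate_top_n(predicted, watched, n=10):
    """Eqs. (10)-(12): hit count, precision, recall and F-measure for the top n."""
    ranked = sorted(predicted, key=predicted.get, reverse=True)[:n]
    hits = sum(1 for item in ranked if item in watched)
    precision = hits / n
    recall = hits / len(watched) if watched else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return hits, precision, recall, f_measure


if __name__ == "__main__":
    # Toy data mirroring the worked example: 5 hits in the top 10, 30 watched movies.
    predicted = {f"m{k}": 20 - k for k in range(20)}
    watched = {f"m{k}" for k in range(5)} | {f"w{k}" for k in range(25)}
    print(evaluate_top_n(predicted, watched, n=10))
```

With the toy data above, the sketch reproduces the worked example in the text: 5 hits give a precision of 0.5 and a recall of 5/30.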
Fig. 2a-2b. Percentage of the users for whom we recommend 10 contents, using (3) and (4) respectively with the actor feature set, and calculate the hit counts.

The challenge of the system is that there are lots of contents available to watch, but users have watched very few of them and we try to catch their interest according to that small number of watched contents.

In Fig. 2a and 2b, we get results by using only the actor features with (3) and (4) respectively. The figures show the percentage of the users who have at least the given number of successful recommendations. In Fig. 2a, 98% of the users have watched at least one of the recommended contents and 88% of them have watched at least 2 recommended contents. On the other hand, in Fig. 2b these ratios are lower. Therefore, we use (3) for the rest of our experiments.

The number of contents available to the users is in the thousands, but the number of contents they actually watch mostly ranges from 1 to 40. In Fig. 2a and 2b, we can see that the success rate of the system is better for the users who watch more than 50 contents. As they watch more and more contents, we can get more information from their watching behaviors and predict better for future recommendations.

Fig. 3a-3b. Using actor, genre, director feature sets to calculate hit counts.
Fig. 3c-3d. Combining all the feature sets according to (7) and (9), respectively.
In this work, we also analyze the performance of ratings which combine different feature sets. Thanks to normalization (Equation (5)), we can combine the ratings for each feature set given by the same user to the same item (7). In Fig. 3c and Fig. 3d we combine the actor, genre, director, keyword, time zone and release year feature sets. The results of Equation (7) are shown in Fig. 3c. For 60% of the users we make 4 or more successful recommendations out of 10 movies using all the feature sets when the number of movies the user watched in the test set is more than 50 (see Fig. 3c). On the other hand, this ratio decreases to 50% when using only the actor feature set (see Fig. 3b). We could make the following inference: when users watch more than 50 contents, the ratio of 4 or more successful recommendations is high when using all feature sets together. But hit counts of 7, 8, 9 or more are obtained when using the feature sets separately.

The other method of combining normalized ratings is given by (9). For each user, the MAE is calculated in the training set. Equation (9) uses this MAE value. This rating calculation provides more accurate recommendations, as seen in Fig. 3d.

Table II shows the average of the performance measures over all users in the test set. Precision, recall and F-Measure metrics are used. For all of these measures, higher values are better. Director gives the best recommendation performance, while combined recommendations have reduced performance. We are in the process of finding better combination methods.

TABLE II. EVALUATION METRIC RESULTS

                    Prec    Recall   F-Measure
Actor_Hit10         0.75    0.078    0.08
Genre_Hit10         0.93    0.086    0.2
Director_Hit10      0.23    0.095    0.3
AllFeat_Hit10       0.36    0.06     0.084
AllFeatExp_Hit10    0.35    0.06     0.083

V. CONCLUSION

In this paper, we developed a content-based recommendation system that makes recommendations according to the users' past watching behavior. We produce recommendations based on the actor, genre and director feature sets separately and also on a combination of the actor, genre, director, keyword, time_zone and release_year feature sets. In the future, we plan to produce a hybrid recommendation system.

REFERENCES

[1] Santos da Silva, F., Alves, L. G. P., and Bressan, G. 2009. PersonalTVware: A proposal of architecture to support the context-aware personalized recommendation of TV programs. In Proceedings of the 7th European Conference on Interactive TV and Video.
[2] K. Ali and W. van Stam. TiVo: making show recommendations using a distributed collaborative filtering architecture. In KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 394-401. ACM, 2004.
[3] Wang, J., de Vries, A.P., Reinders, M.J.T., 2006: Unifying User-based and Item-based Collaborative Filtering Approaches by Similarity Fusion, SIGIR '06, August 6-11, 2006, Seattle, Washington, USA.
[4] Koren, Y., Bell, R., Volinsky, C., "Matrix Factorization Techniques for Recommender Systems," Computer Journal, IEEE Press, 42-49, 2009.
[5] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering. In Proceedings of the 2001 SIGIR Workshop on Recommender Systems, 2001.
[6] Westergren, T., The music genome project. http://www.pandora.com/, Accessed on 9th of July, 2011.
[7] Raymond J. Mooney, Loriene Roy, Content-based book recommending using learning for text categorization, Proceedings of the fifth ACM conference on Digital libraries, p.195-204, June 02-07, 2000, San Antonio, Texas, United States.
[8] K. awa, T. Fukuhara, H. Fujii and H. Takeda: Evaluation of a TV Programs Recommendation using the EPG and Viewer's Log Data, in C. Peng, P. Vuorimaa, P. Naranen, C. Quico, G. Harboe and A. Lugmayr eds., Adjunct Proceedings EuroITV 2010, pp. 82-85, Tampere, Finland (2010), Tampere University of Technology.
[9] Goren-Bar, D., Glinansky, O.: FIT-recommending TV programs to family members. Computers & Graphics 28, 149-156 (2004).