Predicted Movie Rankings: Mixture of Multinomials with Features
CS229 Project Final Report, 12/14/2006

H Ji Chew (hji@stanford.edu), Dimitris Economou (dimeco@stanford.edu), Raylene Yung (raylene@stanford.edu)

1 Introduction
The Netflix Prize is an ongoing contest organized by the online DVD rental company Netflix. Contestants try to design and implement algorithms that can best predict a user's movie rankings based on their movie preferences. We started our project by joining the Stanford Netflix Prize Team and taking the probabilistic model approach. Our first implementation used the mixture of multinomials model [1] as a baseline approach and obtained promising results. We then modified the algorithm to make use of additional features in the form of clusters, creating a method that we will hereafter refer to as Mixture of Multinomials with Features. We evaluated this approach by experimenting with clustering techniques and varying model parameters. Clusters were generated using data extracted from the IMDB database, such as movie genre and cast. To deal with both the complexity of the algorithms and the enormity of the dataset, we parallelized our implementations using the Message-Passing Interface (MPI).

2 Mixture of Multinomials
2.1 Theory
Consider a problem with N users and M items, where each user can give each item a rating $r \in \{1, \ldots, V\}$. In a simple multinomial model for this problem, ratings for individual items are assumed to be independent of one another, and no distinction is made between users. For a user u with rating profile $\mathbf{r}^u$,
$$P(\mathbf{R}^u = \mathbf{r}^u) = \prod_{y=1}^{M} P(R_y = r^u_y).$$
In a mixture of multinomials model, however, we suppose that users fall into one of K latent classes, and item ratings are instead conditionally independent given a user's class.
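The prior probability that a user belongs to class z is given by
$$P(Z^u = z) = \theta_z.$$
The probability that a user gives rating v to item y is now denoted by a parameter,
$$P(R_y = v \mid Z = z) = \beta_{v,y,z}.$$
We then construct the joint probability of a user having rating profile $\mathbf{r}^u$ and class z as
$$P(\mathbf{R}^u = \mathbf{r}^u, Z^u = z) = P(Z^u = z) \prod_{y=1}^{M} \prod_{v=1}^{V} P(R_y = v \mid Z = z)^{[r^u_y = v]} = \theta_z \prod_{y=1}^{M} \prod_{v=1}^{V} \beta_{v,y,z}^{[r^u_y = v]},$$
where $[r^u_y = v]$ is 1 if user u rated item y with value v and 0 otherwise, so unrated items contribute a factor of 1. Finally, we can use the parameters defined above to predict an individual item rating:
$$P(R^u_y = v \mid r^u_1, r^u_2, \ldots, r^u_M) = \frac{\sum_{z=1}^{K} \theta_z \, \beta_{v,y,z} \prod_{y'=1}^{M} \prod_{v'=1}^{V} \beta_{v',y',z}^{[r^u_{y'} = v']}}{\sum_{z=1}^{K} \theta_z \prod_{y'=1}^{M} \prod_{v'=1}^{V} \beta_{v',y',z}^{[r^u_{y'} = v']}} \tag{1}$$

2.2 Implementation
We used the Netflix Prize database as the source for both our training and testing data, and implemented the EM algorithm to learn our model parameters. With 480,189 users (whose IDs range up to 2,649,429) and 17,770 movies in the entire dataset, complexity became an important concern. Let N be the number of user profiles, M the number of movies, V the number of possible rating values, and K the number of latent classes. The complexity of a single iteration of EM was then $O(NMVK)$, or on the order of $10^{10}$. The average number of movies rated by each user was measured to be $\bar{M} = 206$, significantly less than the total number of movies M. Thus, instead of iterating over all movies for every user, at each step we iterate only over the movies rated by that user. This reduces the complexity by a factor of $M/\bar{M}$.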
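The following Python sketch illustrates Sections 2.1 and 2.2. It runs EM for the mixture of multinomials while touching only each user's rated movies, and it evaluates equation (1) for an unrated movie. It is a minimal sketch under our own assumptions, not the authors' implementation. The assumed pieces are:

- the data layout: each profile is a list of (movie, rating) pairs with ratings 1..V, and beta[z][y][v-1] holds $\beta_{v,y,z}$;
- the random initialization;
- the small pseudo-count `smooth` added to the β counts;
- the log-space normalization.

The helper names (`init_params`, `class_posterior`, `em_step`, `predict_distribution`) are hypothetical.

```python
import math
import random

def init_params(K, num_movies, V, seed=0):
    """Uniform class prior and random, strictly positive beta rows (an assumed initialization)."""
    rng = random.Random(seed)
    theta = [1.0 / K] * K
    beta = []
    for _ in range(K):
        rows = []
        for _ in range(num_movies):
            w = [0.5 + rng.random() for _ in range(V)]
            total = sum(w)
            rows.append([x / total for x in w])
        beta.append(rows)
    return theta, beta

def class_posterior(profile, theta, beta):
    """P(Z = z | r^u) for every class z, touching only the user's rated movies."""
    logs = []
    for z, th in enumerate(theta):
        s = math.log(th)
        for y, v in profile:  # unrated movies contribute a factor of 1
            s += math.log(beta[z][y][v - 1])
        logs.append(s)
    m = max(logs)  # log-sum-exp keeps the long product from underflowing
    unnorm = [math.exp(l - m) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def normalize(row):
    total = sum(row)
    return [c / total for c in row]

def em_step(profiles, theta, beta, num_movies, V, smooth=1e-3):
    """One EM iteration; the per-user work is O(M_bar * K) rather than O(M * V * K)."""
    K = len(theta)
    theta_acc = [0.0] * K
    # Expected counts, seeded with an assumed pseudo-count so no beta becomes zero.
    beta_acc = [[[smooth] * V for _ in range(num_movies)] for _ in range(K)]
    for profile in profiles:
        post = class_posterior(profile, theta, beta)  # E-step
        for z in range(K):
            theta_acc[z] += post[z]
            for y, v in profile:
                beta_acc[z][y][v - 1] += post[z]
    # M-step: normalize the expected counts.
    new_theta = [t / len(profiles) for t in theta_acc]
    new_beta = [[normalize(row) for row in beta_acc[z]] for z in range(K)]
    return new_theta, new_beta

def predict_distribution(profile, y, theta, beta):
    """Equation (1): P(R_y = v | r^u) for an unrated movie y, as a list over v = 1..V."""
    post = class_posterior(profile, theta, beta)
    return [sum(post[z] * beta[z][y][v] for z in range(len(theta)))
            for v in range(len(beta[0][y]))]
```

Equation (1) is computed here as $\sum_z P(Z = z \mid \mathbf{r}^u)\,\beta_{v,y,z}$, which is the same ratio rearranged. The target movie y must be unrated, so that it does not appear in the profile.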

3 Mixture of Multinomials with Features
3.1 Motivation
The mixture of multinomials approach derives the probability of a user's ranking using nothing more than the set of known user rankings. Intuitively, adding more information to the ranking prediction algorithm should yield improved results, since we expect user preferences to be influenced by movie properties such as genre, cast, release date, budget, and format. The only information provided for each movie in the Netflix dataset is its year of release and its title. Using this information to index into IMDB, we can extract additional movie properties. With such data available, we modified the prediction phase of the mixture of multinomials algorithm to make use of additional features in the form of movie clusters. The resulting algorithm is discussed further in the next section.

3.2 Algorithm
During the mixture of multinomials prediction step, the probability that a specific user is part of latent class z is calculated using the parameters derived in the learning algorithm. In the mixture of multinomials with features algorithm, we work per user and per movie that the user has not ranked. For each such pair, we calculate the probability that the user is part of latent class z, denoted $\phi_{y,z}$, where y is the movie ID and z is the latent class number. This $\phi_{y,z}$ is calculated using all $P(R_{y'} = v \mid Z = z) = \beta_{v,y',z}$, where y' is in the same cluster as y. It also introduces a scaling factor $\alpha \in [0,1]$ for all $\beta_{v,y'',z}$, where y'' is not in the same cluster. Setting $\alpha = 1$ weights the $\beta_{v,y,z}$ of related and unrelated movies equally, making the method equivalent to the mixture of multinomials algorithm. Setting $\alpha = 0$ ignores unrelated movies entirely.

The pseudo-code for the prediction step is shown below for a single user, so the superscript u is dropped. The function $\gamma(r_{y'}, y, y', v) = 1$ only if movie y' is part of the same cluster as y and the user's ranking of movie y', $r_{y'}$, is equal to v. Otherwise, γ is equal to zero.

PSEUDO CODE:
INPUT: r, θ, β
OUTPUT: r̂
FOR z = 1:K DO
    FOR y = 1:M DO
        $\phi_{y,z} = \dfrac{\theta_z \prod_{y'=1}^{M} \prod_{v'=1}^{V} \beta_{v',y',z}^{\gamma(r_{y'},y,y',v') + \alpha\left([r_{y'} = v'] - \gamma(r_{y'},y,y',v')\right)}}{\sum_{z'=1}^{K} \theta_{z'} \prod_{y'=1}^{M} \prod_{v'=1}^{V} \beta_{v',y',z'}^{\gamma(r_{y'},y,y',v') + \alpha\left([r_{y'} = v'] - \gamma(r_{y'},y,y',v')\right)}}$
FOR y = 1:M DO
    FOR v = 1:V DO
        $p_v = \sum_{z=1}^{K} \beta_{v,y,z} \, \phi_{y,z}$
    $\hat{r}_y = \sum_{v=1}^{V} v \, p_v$

COMMENTS:
For each user, predict a ranking for each unranked movie.
Calculate the probability that the user is in latent class z for unranked movie y. Use the derived parameters for related movies and scaled-down parameters for non-related movies (down-weighted by the factor α ∈ [0,1]).
Use the classification probabilities and the derived parameters for the unranked movie to predict the probability of each ranking.
Calculate the predicted ranking.

3.3 Implementation
As mentioned in [1], one iteration of the original mixture of multinomials EM learning algorithm requires running time of $O(NMVK)$, while the prediction step takes $O(MVK)$ time. This algorithm predicts a user's movie ratings based on all the movies the user has rated. Thus, in equation (1), we only need to compute the parameters once for each user profile and apply them to each of the user's test cases. With the modifications to the prediction step, however, we have added another dimension to the complexity of the algorithm. We now predict a user's movie ratings based on a mixture of related and unrelated movies rated by the user. Consequently, we need to compute a different set of parameters (the $\phi_{y,z}$'s) for each of the test movies in the user profile. Each of up to M test movies now requires its own $O(MVK)$ computation, which increases the complexity of the prediction step to $O(M^2VK)$. This increase in computational time complexity made running the modified algorithm on a single processor intractable.

To overcome this, we parallelized the computation and distributed the workload across multiple machines using MPI. Since the parameters (φ's) are user-specific, it is natural to split the computation along the N dimension. In our distributed computing environment, we have a master node and n child nodes. The master node is responsible for initializing the child nodes, distributing initial parameters, and aggregating results from all the child nodes. Figure 1 shows a graphical representation of our modified algorithm running across multiple machines.

[Figure 1: Distributing computation across multiple machines. Panels: Learning Step and Prediction Step, each showing a master node exchanging θ's, β's, and φ's with child nodes 1 to n. Refer to [1] for details on the learning step.]

To further reduce the computational time required, we note that a user profile contains many movies of the same cluster. Therefore, we do not actually need to compute a distinct set of parameters for every test movie in the user profile. Instead, for each user profile, we can cache one set of parameters for each distinct cluster.
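The prediction step of Section 3.2, together with this per-cluster caching, can be sketched in Python. This is an illustration under stated assumptions, not the authors' code. The assumed pieces are:

- the data layout: a profile is a list of (movie, rating) pairs, beta[z][y][v-1] holds $\beta_{v,y,z}$, and a hypothetical `cluster_of` map sends each movie ID to a frozenset of genre labels;
- the helper names (`is_related`, `phi_for_movie`, `predict_user`), which are also hypothetical.

In our reading, α enters $\phi_{y,z}$ as an exponent. The sketch therefore weights each rated movie's $\log \beta$ by 1 if the movie is related to the target and by α otherwise. Simply multiplying every unrelated β by α would cancel in the normalization over z. The cache key is the target's genre set, because under both clustering schemes, movies with identical genre sets have identical sets of related movies.

```python
import math

def is_related(genres_a, genres_b, overlapping):
    """Hypothetical relatedness test on genre sets (frozensets of labels)."""
    if overlapping:
        return bool(genres_a & genres_b)  # share at least one genre
    return genres_a == genres_b           # identical genre combination

def phi_for_movie(y, profile, theta, beta, cluster_of, alpha, overlapping):
    """Class probabilities phi_{y,z} for one unrated target movie y."""
    # Related rated movies count fully; unrelated ones are down-weighted by alpha.
    weights = [1.0 if is_related(cluster_of[y], cluster_of[y2], overlapping) else alpha
               for y2, _ in profile]
    logs = []
    for z, th in enumerate(theta):
        s = math.log(th)
        for w, (y2, v) in zip(weights, profile):
            s += w * math.log(beta[z][y2][v - 1])  # log of beta ** w
        logs.append(s)
    m = max(logs)  # log-sum-exp normalization over the K classes
    unnorm = [math.exp(l - m) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def predict_user(profile, targets, theta, beta, cluster_of, alpha, overlapping=True):
    """Expected rating for each unrated target, plus the number of phi vectors computed."""
    cache = {}  # target genre set -> phi (same genre set means same related movies)
    preds = {}
    for y in targets:
        key = cluster_of[y]
        if key not in cache:
            cache[key] = phi_for_movie(y, profile, theta, beta, cluster_of, alpha, overlapping)
        phi = cache[key]
        num_values = len(beta[0][y])
        p = [sum(phi[z] * beta[z][y][v] for z in range(len(theta))) for v in range(num_values)]
        preds[y] = sum((v + 1) * p[v] for v in range(num_values))  # ratings are 1..V
    return preds, len(cache)
```

With `alpha=1.0`, every rated movie gets weight 1, and the result reduces to the plain mixture of multinomials prediction regardless of the clusters.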
Whenever we encounter a movie from a cached cluster, we do not need to repeat the computation of the φ's shown in Section 3.2.

3.4 Methodology
The modified algorithm added a few degrees of freedom, such as the value of α and the choice of clustering technique, whose effects and properties were not immediately clear to us. The straightforward way to find the optimal combination would be to run the algorithm with different configurations and pick the one with the best result. However, this is not practical due to the large time and space complexity. To alleviate this, we ran the algorithm with different configurations on a smaller dataset and then applied the best configuration to the full dataset.

We first experimented with different ways of clustering the movies. In particular, we used movie genres to separate the movies into overlapping and non-overlapping clusters. Overall there are 28 basic genres, ranging from horror to romance. Since a movie can belong to more than one of the basic genres, with overlapping clusters two movies are related if at least one of their genres is the same. With non-overlapping clusters, we first preprocessed the movie-genre information to form distinct combinations of genres. We found on the order of 600 distinct genre combinations, with 8900 movies belonging to a single genre. In this scheme, two movies are in the same cluster if and only if their genre combinations are exactly the same.

We evaluated the performance of our algorithm using the root mean squared error (RMSE), the standard performance metric used by the Netflix Prize [2]. RMSE is calculated using the following equation, where N here denotes the number of test ratings, $r_i$ the true rating, and $\hat{r}_i$ the prediction:
$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{r}_i - r_i)^2}$$
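The two genre-clustering schemes and the RMSE metric can be sketched in Python as follows. This is a minimal illustration, not the authors' preprocessing code. It assumes each movie's genres are available as a list of genre labels, for instance from the IMDB mapping. The function names are hypothetical.

```python
import math

def overlapping_related(genres_a, genres_b):
    """Overlapping scheme: two movies are related if at least one genre is shared."""
    return bool(set(genres_a) & set(genres_b))

def nonoverlapping_related(genres_a, genres_b):
    """Non-overlapping scheme: related only if the genre combinations are identical."""
    return frozenset(genres_a) == frozenset(genres_b)

def combination_stats(movie_genres):
    """Return (number of distinct genre combinations, number of single-genre movies)."""
    combos = {frozenset(g) for g in movie_genres.values()}
    single = sum(1 for g in movie_genres.values() if len(set(g)) == 1)
    return len(combos), single

def rmse(predicted, actual):
    """Root mean squared error over N paired predictions and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    sq = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(sq / len(actual))
```

For instance, a romance-comedy and a romance-adult film are related under `overlapping_related` but not under `nonoverlapping_related`.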

To minimize the RMSE, we computed the prediction of user u for movie y by taking the expected value over all possible ratings from 1 to 5, since the expected value minimizes the expected squared error [1]:
$$\hat{r}^u_y = \sum_{v=1}^{5} v \, P(R^u_y = v)$$
To find the optimal α for related and unrelated movies, we ran the algorithm with each of the two clustering schemes on a 100-times-smaller training set. The number of latent classes, K, was set to 5, and each run lasted 10 iterations. We then evaluated the performance of each setting by comparing the RMSE obtained on the corresponding 100-times-smaller test set. The derived optimal clusters and parameter values were then applied to the full dataset.

4 Results
4.1 Mixture of Multinomials
[Figure 2: RMSE vs. Iterations for different numbers of latent classes, k = 5 to 50 (left); RMSE vs. Number of Latent Classes (right).]
The mixture of multinomials algorithm converges after 8 iterations in every run. Increasing the number of latent classes improves the performance of the algorithm. However, beyond K=30 the gain in performance is negligible, while the increase in complexity is still significant. The best performance measured was with K=50, attaining an RMSE of about 0.962.

4.2 Mixture of Multinomials with Features
[Figure 3: RMSE vs. Alpha for movies clustered by non-overlapping genres (left); RMSE vs. Alpha for movies clustered by overlapping genres (right). Results shown for iteration 10 of the algorithm, where K=5, using 100x-smaller training and testing sets.]

The optimal combination of clusters and α was α=0 with overlapping clusters. This combination achieved the lowest RMSE, though α=0.5 with non-overlapping clusters performed almost as well. Applying the algorithm with α=0 and overlapping clusters to the full Netflix dataset with K=50 (on 30 nodes using MPI) achieved an RMSE of 0.958.

Method                                                  RMSE
AvgUserRating                                           1.069
AvgMovieRating                                          1.053
AvgUserRating/(avg(AvgUserRating)) * AvgMovieRating     0.999
MixMulti                                                0.962
MixMultiFeatures                                        0.958
Table 1: Baseline results [3] and results using the Mixture of Multinomials algorithms

5 Discussion
We expected the overlapping clusters to perform worse than the non-overlapping clusters, because the former can relate two movies as different as a romance-comedy and a romance-adult film. However, using overlapping clusters improves the performance, which may be due to the availability of more information. The non-overlapping case excludes many relations between genre combinations that we would expect to improve the results, such as those among sci-fi-action, sci-fi-horror, and sci-fi-adventure movies. To compensate for this lack of information, the optimal α for non-overlapping clusters is greater than 0. This weighs in the influence of all of the user's ratings, including those of unrelated movies.

6 Further Work
Currently, not all of the Netflix movie IDs are mapped to IMDB movie IDs, so all clusters used in the experiments were generated using only 75% of the movies. Completing this mapping should further improve the results of the mixture of multinomials with features approach. Additionally, using different IMDB movie properties, or combinations of them, with more sophisticated clustering techniques may generate clusters that better capture user preferences. The mixture of multinomials with features approach only improves the prediction phase of the mixture of multinomials algorithm. Modifying the learning phase to incorporate cluster information as well would be an interesting variation on our approach. Another slight variation that we think would be interesting to test is a threshold value, t, used to filter out predictions derived from a set of related movies smaller than t.
By setting t=20, for example, the algorithm would revert to the mixture of multinomials prediction approach when the number of related movies is less than 20. In this way, the algorithm would avoid predictions based on very little information, which could be more likely to be incorrect than those of the original approach. Testing would determine the optimal t.

7 Conclusion
We modified the prediction step of the mixture of multinomials algorithm to make use of additional features in the form of clusters. We applied the resulting mixture of multinomials with features algorithm to the Netflix dataset using clusters derived from movie genres. It achieved an improvement in performance over the original algorithm (RMSE 0.958 versus 0.962). The mixture of multinomials with features algorithm can be applied to any set of users ranking a specific set of items that can be grouped based on their properties.

8 Acknowledgments
We would like to thank Tom Do for implementing the high-precision math library and for providing helpful advice as well as systems support. Thanks to Tom Do and Thuc for organizing and maintaining the Stanford Netflix Prize Team. Finally, we would like to thank the other Stanford Netflix Prize Team members for their help with data curation.

9 References
[1] Marlin, Benjamin. Collaborative Filtering: A Machine Learning Perspective. 2004.
[2] The Netflix Prize, http://www.netflixprize.com/faq
[3] Stanford Netflix Prize Wiki, http://stanfordnetflixprize.pbwiki.com/baselines
