2016 3rd International Conference on Engineering Technology and Application (ICETA 2016)
ISBN: 978-1-60595-383-0

Facial Expression Recognition Method Based on Stacked Denoising Autoencoders and Feature Reduction

Jun Zhao, Yan Zhao*, Yong Yang & Yong Huang
College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China

Inkyu Park
Department of Information and Communication Engineering, Inha University, Incheon, Korea

ABSTRACT: Based on deep learning theory, a novel facial expression recognition method that utilizes both Principal Component Analysis (PCA) and stacked denoising autoencoders (SDAE) is proposed in this paper. First, PCA is used as a linear dimensionality reduction method on the expression features; subsequently, a non-linear dimensionality reduction is learnt by the greedy layer-wise training of stacked denoising autoencoders. In this way, meaningful, low-dimensional expression features can be learnt and used for classification. The comparative experimental results show that the proposed method is more effective than some other expression recognition methods based on deep learning, and that it also achieves higher expression recognition accuracy than traditional, non-deep-learning expression recognition methods.

Keywords: facial expression recognition; deep learning; stacked denoising autoencoders; principal component analysis

1 INTRODUCTION

Facial expression is not only the major way in which people express emotion, but also the main cue for identifying human emotion. Facial expression recognition therefore plays a significant role in emotion computing [1]. In recent years, facial expression recognition has been widely applied to man-machine interaction, distance education management, vehicle safety driving, public security monitoring and so on. Traditional facial expression recognition methods include five steps: data acquisition, preprocessing, feature extraction, feature selection, and expression classification [2,3,4]. Feature extraction plays a key role in the classification result and is generally completed independently, before expression classification.
So far, a variety of feature representation methods have been proposed, such as Gabor, the discrete cosine transform (DCT), local binary patterns (LBP), etc.

*Corresponding author: 35473626@qq.com

By building a nonlinear neural network with several hidden layers, deep learning transforms the data layer by layer to learn its essential features. A neural network is thus built and used to represent images, voice, or text in a way that resembles the human brain [5]. Compared with the feature representation methods above, deep learning can depict the rich inner information of the data and thus ultimately improve the classification accuracy. In recent years, some deep learning methods, such as the restricted Boltzmann machine, deep belief networks and the convolutional neural network, have been applied to facial expression recognition. In 2002, Fasel B. used a convolutional neural network to identify expressions [6]. In 2014, Liu Yunfan, et al. utilized the optical flow feature and a sparse autoencoder for facial expression recognition [7]. In the same year, Lv Y., et al. detected the face components with deep belief networks, and then performed facial expression recognition using an autoencoder [8]. In 2015, Jung H., et al. developed a facial expression recognition system using a deep neural network and a convolutional neural network [9]. In 2014, Liu P., et al. proposed the combination of deep belief networks and the AdaBoost method for facial expression recognition [10].

Although facial expression recognition based on deep learning
has achieved great progress, the expression recognition rate is still low if only a single deep learning model is used. The model needs to be modified and considerable additional work is required, even though this makes model training and recognition more complex. In this paper, a novel facial expression recognition method that utilizes both Principal Component Analysis (PCA) and stacked denoising autoencoders (SDAE) is proposed. First, PCA is used for linear feature reduction. Then the dimension of the data is reduced further in a nonlinear way while the stacked denoising autoencoders learn the features. In this way, more effective expression features can be learnt. Both the average prediction time and the facial expression recognition rate are improved. The simulation experiment results show the effectiveness of the proposed method.

2 THE STACKED DENOISING AUTOENCODERS

In this section, we first introduce the denoising autoencoder, which is the basic building block of stacked denoising autoencoders. Then the training process of the SDAE is introduced.

2.1 Denoising autoencoders

The denoising autoencoder is built on the assumption that the autoencoder should be strongly robust to partially destroyed inputs [11,12]. As shown in Figure 1, the input $x$ is corrupted into the partially destroyed vector $\tilde{x}$ by means of the stochastic mapping $\tilde{x} \sim q_D(\tilde{x} \mid x)$. In this paper, we obtain the partially destroyed vector $\tilde{x}$ by randomly setting a fixed proportion of the entries of the input $x \in [0,1]^d$ to 0. The corrupted input $\tilde{x}$ is then mapped, as with the basic autoencoder, to a hidden representation $y = f_\theta(\tilde{x}) = s(W\tilde{x} + b)$, where $\theta = \{W, b\}$, $W$ is a weight matrix of size $d' \times d$, $b$ is a bias vector of dimension $d'$, and $s(x) = 1/(1 + e^{-x})$. Then $y$ is reconstructed to $z = g_{\theta'}(y) = s(W'y + b')$, where $\theta' = \{W', b'\}$. The weight matrix of the reverse mapping can be taken as $W' = W^T$.
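The corruption, encoding and decoding steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy sizes (`d = 8`, `d_hidden = 3`), the corruption proportion of 0.25 and the random initialization are all hypothetical choices made for the example, and no training is performed.

```python
import math
import random


def sigmoid(a):
    """Logistic function s(a) = 1 / (1 + e^-a)."""
    return 1.0 / (1.0 + math.exp(-a))


def corrupt(x, fraction, rng):
    """Set a fixed fraction of the entries of x to 0, chosen at random."""
    n_zero = int(round(fraction * len(x)))
    zeroed = set(rng.sample(range(len(x)), n_zero))
    return [0.0 if i in zeroed else v for i, v in enumerate(x)]


def encode(W, b, x_tilde):
    """Hidden representation y = s(W x_tilde + b); W has d' rows and d columns."""
    return [sigmoid(sum(w * v for w, v in zip(row, x_tilde)) + bj)
            for row, bj in zip(W, b)]


def decode_tied(W, b_prime, y):
    """Reconstruction z = s(W^T y + b'), i.e. tied weights W' = W^T."""
    d = len(W[0])
    return [sigmoid(sum(W[j][k] * y[j] for j in range(len(W))) + b_prime[k])
            for k in range(d)]


# Toy example with hypothetical sizes and an untrained, randomly initialized W.
rng = random.Random(0)
d, d_hidden = 8, 3
x = [rng.random() for _ in range(d)]
W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(d_hidden)]
b = [0.0] * d_hidden
b_prime = [0.0] * d

x_tilde = corrupt(x, 0.25, rng)   # two of the eight entries are zeroed
y = encode(W, b, x_tilde)         # hidden code of dimension d'
z = decode_tied(W, b_prime, y)    # reconstruction of dimension d
```

Because the decoder reuses the transpose of `W`, only `W`, `b` and `b_prime` would need to be learnt when such a model is trained.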
Then, by minimizing the average reconstruction error between the input and the reconstructed vector over the $m$ training samples, we obtain the optimal model parameters as in Equation (1):

$$\theta^{*}, \theta'^{*} = \arg\min_{\theta,\theta'} \frac{1}{m}\sum_{i=1}^{m} L\left(x^{(i)}, z^{(i)}\right) = \arg\min_{\theta,\theta'} \frac{1}{m}\sum_{i=1}^{m} L\left(x^{(i)}, g_{\theta'}\left(f_{\theta}\left(\tilde{x}^{(i)}\right)\right)\right) \tag{1}$$

where $L$ is the loss function. Each component of the vectors $x$ and $z$ is treated as following a Bernoulli distribution, and the cross entropy is used to measure the distance between $x$ and $z$, as in Equation (2):

$$L(x, z) = -\sum_{k=1}^{d}\left(x_k \log z_k + (1 - x_k)\log(1 - z_k)\right) \tag{2}$$

Figure 1. The structure of denoising autoencoders.

2.2 Stacked denoising autoencoders

In this paper, the stacked denoising autoencoders are obtained by stacking denoising autoencoders [13,14]. As shown in Figure 2, $x$ is the input, $\hat{y}$ is the classification output, and $h_i$ are the hidden layers. The training process is as follows:

(1) First, train the first denoising autoencoder, taking the image as the input and setting $h_1 = y$. Train the model with the method stated in Section 2.1 to obtain the parameters of the first denoising autoencoder;
(2) Use $h_1$ obtained from step (1) as the input of the second denoising autoencoder, with $h_2 = y$. Train it as a denoising autoencoder to obtain the parameters of the second denoising autoencoder;
(3) Repeat step (2) to train the remaining denoising autoencoders, up to the $(n-1)$-th denoising autoencoder;
(4) Use $h_{n-1}$ obtained from the last step as the input of the final model, and $\hat{y}$ as the output. Train the last model, also as a denoising autoencoder, to obtain the classification results.

Figure 2. The structure of stacked denoising autoencoders.

3 FRAMEWORK OF FACIAL EXPRESSION RECOGNITION METHOD BASED ON SDAE

As shown in Figure 3, the proposed method consists of three steps. First, the images are cropped and normalized in the preprocessing phase. Then the dimension of the input feature is reduced by Principal Component Analysis (PCA). Finally, the SDAE is trained layer by layer after its parameters have been set, while the test images are preprocessed and their feature
dimension is reduced by PCA. The trained SDAE then learns features layer by layer and classifies the expressions.

Figure 3. Framework of facial emotion recognition based on PCA+SDAE.

3.1 Feature reduction method

In order to remove the redundant components in the facial expression and eliminate the dependence within the data, the dimension of the facial image is reduced by PCA after preprocessing. Mathematically, Principal Component Analysis is defined as an orthogonal linear transformation. It transforms the data to a new coordinate system in which the greatest variance of any projection of the data lies on the first coordinate, the second greatest variance on the second coordinate, and so on [15]. The projection axes for dimension reduction are constructed from the singular value decomposition of the covariance matrix of the dataset $x$:

$$\Sigma = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}\right)\left(x^{(i)}\right)^{T} \tag{3}$$

$$[U, S, V] = \mathrm{svd}(\Sigma) \tag{4}$$

where $m$ is the number of samples, $n$ is the dimension of a sample, and $\varepsilon$ is the error coefficient, which is always less than 0.01. The error of the dimension reduction is:

$$\frac{\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)} - x_{approx}^{(i)}\right\|^{2}}{\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}\right\|^{2}} \le \varepsilon \tag{5}$$

where $x_{approx}^{(i)}$ is the output for sample $x^{(i)}$, i.e. its approximation after dimension reduction. The optimal reduced dimension $k$ is fixed by the principal component contribution rate:

$$\frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \ge \delta \tag{6}$$

where $\delta$ is always 0.99. By using the principal component values to replace the original data, we can extract the main components of the expression. In addition, the SDAE reduces the dimension in a nonlinear way while it is learning features. We therefore first use PCA to reduce the dimension in a linear way. In this way we obtain the main components of the expression and more effective features, so that the training and testing of the SDAE become more efficient.

3.2 Stacked denoising autoencoders

The stacked denoising autoencoders are built by stacking autoencoders trained with the denoising criterion. Compared with other deep learning models, such as deep belief networks and the convolutional neural network, the SDAE has a stronger ability to learn features, and it has been applied successfully to digit recognition [11]. In order to take advantage of this strong feature learning ability, the SDAE is applied to facial expression recognition for the first time in this paper.
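The contribution-rate criterion of Equation (6) in Section 3.1 can be illustrated with a short sketch. It assumes the diagonal entries of $S$ have already been computed and are sorted in decreasing order; the function name `choose_num_components` and the example spectrum are hypothetical and are not taken from the paper.

```python
def choose_num_components(singular_values, delta=0.99):
    """Return the smallest k whose leading values reach a share delta of the total.

    singular_values: diagonal of S from svd(Sigma), sorted in decreasing order.
    """
    total = sum(singular_values)
    if total <= 0:
        raise ValueError("singular values must have a positive sum")
    running = 0.0
    for k, s in enumerate(singular_values, start=1):
        running += s
        if running / total >= delta:
            return k
    return len(singular_values)


# Hypothetical spectrum used only to illustrate the rule.
spectrum = [50.0, 30.0, 15.0, 4.5, 0.4, 0.1]
k = choose_num_components(spectrum)  # cumulative shares: 0.50, 0.80, 0.95, 0.995
```

With this made-up spectrum the first four components already account for 99.5% of the total, so four components would be kept; the data would then be projected onto the first `k` columns of $U$.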
If the raw pixel feature is used by the SDAE to recognize facial expressions, the expression recognition rate is lower because of the redundant components in the facial expression. A new facial expression recognition method based on PCA and SDAE is therefore proposed. First, the features of the facial images are reduced by PCA after preprocessing. The reduced features are then used by the SDAE for feature learning and classification. The parameters of the SDAE are set according to the autoencoder criterion, namely that the output dimension of every denoising autoencoder is smaller than its input dimension, so that lower-dimensional features are learnt [11]. In this paper, the model parameters are chosen with the method put forward in [16]. That is, the dimension of each DAE output is lower than the dimension of the DAE input, in order to learn low-dimensional features and to remove irrelevant features. The number of hidden layer nodes of the SDAE decreases from the input to the output of the SDAE.

4 EXPERIMENTS

4.1 Experimental settings

In order to demonstrate the effectiveness of the proposed method, experiments were conducted on two classical facial expression datasets, the Extended Cohn-Kanade (CK+) dataset and JAFFE [17,18,19]. The experiments aimed to classify the expressions into six classes: angry, disgust, fear, happy, sad and surprise [20]. To obtain the emotion-bearing part of a face, the images were clipped by the method mentioned in [21]. First, the eyes and nose are located; if the distance between the two pupils is d, the size of the clipped image is 2.2d × 1.8d. Finally, normalization is conducted after all the images have been clipped.

600 images from CK+ were selected as the samples and 10-fold cross validation was adopted. For the JAFFE dataset, 183 images without neutral expression were selected as the samples, and 7-fold cross validation was adopted because the sample size was small. The dimension of the data is reduced to 512 by PCA. Accordingly, the numbers of hidden nodes are 500, 400 and 200 from the lowest hidden layer to the highest one for CK+. Because the JAFFE dataset is small, the numbers of hidden nodes are 500, 400, 300, 200 and 100 from the lowest hidden layer to the highest one for JAFFE, in order to learn better features.
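The cross-validation protocol of the experimental settings (10 folds over the CK+ samples, 7 folds over the JAFFE samples) can be sketched as below. The paper does not say how samples are assigned to folds, so the seeded random shuffle, the function name `k_fold_indices` and the handling of the remainder are assumptions made only for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k folds of nearly equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: folds are drawn at random
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        start += size
        yield train, test


ck_folds = list(k_fold_indices(600, 10))    # 10 test folds of 60 samples each
jaffe_folds = list(k_fold_indices(183, 7))  # one test fold of 27, six of 26
```

Each sample appears in exactly one test fold, so averaging the recognition rate over the folds uses every selected image for testing once.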
4.2 Experimental results analysis

4.2.1 Contrast experiments of SDAE with different parameters

To demonstrate that the SDAE parameters [5] used in this paper obtain a higher facial expression recognition rate, contrast experiments are performed. All experiments in this part use the proposed method to recognize the expressions. By setting different parameters for the SDAE, such as the number of layers and the number of nodes in every layer, we obtain several models to test. The experimental results are shown in Figures 4 and 5. The different parameters of the SDAE are shown in Table 1. The column SDAE gives the number of layers of the SDAE, and the $h_i$ give the number of nodes in every hidden layer, as shown in Figure 2.

After analyzing the experimental results, we can conclude that the SDAE parameters used in this paper achieve a higher recognition rate. For both the CK+ and JAFFE datasets, if the number of training iterations is the same for each model, the recognition rate increases as the number of layers increases within a certain range, because the model can learn more advanced features. If the number of layers increases further, however, the increased training complexity causes the model to learn worse features and leads to a worse result. The experiments show that a 5-layer SDAE structure for CK+ and a 7-layer SDAE structure for JAFFE achieve the better recognition rates. For a single curve in the figures, as the number of training iterations increases, the model learns more advanced features and thus achieves a higher recognition rate. If the number of iterations increases further, however, the model over-fits and the recognition results become worse.

4.2.2 The necessity verification of PCA feature reduction experiment

Contrast experiments are set up in this section to verify the necessity of PCA feature reduction. For the input of the SDAE, we compare the PCA-reduced feature used in this paper with the raw pixel feature. The two kinds of features are used to train and test the SDAE. The experiments are conducted on the CK+ and JAFFE datasets. The results are shown below:

Figure 4.
The expression recognition rate of SDAE with different parameters on CK+ dataset.

Table 2. The expression recognition rate comparisons and necessity verification of PCA feature reduction methods (recognition rate, %).

Method       CK+      JAFFE
PCA+SDAE     92.56    83.67
SDAE         80.67    78.73

Table 2 shows the experimental results. Both the raw pixel features and the PCA-reduced features were fed into the model for the experiments. The results indicate that PCA dimension reduction combined with SDAE for learning and classification achieves better performance than the raw pixel features combined with SDAE. This result demonstrates the necessity of feature dimension reduction. In conclusion, the proposed method, which combines PCA and SDAE, achieves a better facial expression recognition rate.

Figure 5. The expression recognition rate of SDAE with different parameters on JAFFE dataset.

Table 1. The number of hidden layer nodes of the models.

SDAE    H6     H5     H4     H3     H2     H1
4       -      -      -      -      300    500
5       -      -      -      200    400    500
6       -      -      200    300    400    500
7       -      100    200    300    400    500
8       50     100    200    300    400    500

4.2.3 Contrast experiments of different deep learning methods

Deep learning has been proposed in recent years, and several deep learning methods have been applied to facial expression recognition. Contrast experiments were therefore set up to compare two deep learning methods and two classical methods with our proposed method. They are, respectively, deep belief networks (DBN), the convolutional neural network (CNN), the artificial neural network (ANN) and the support vector machine with local binary patterns (LBP+SVM). The results are displayed as follows.
Table 3. The expression recognition rate of different methods on CK+ and JAFFE dataset (recognition rate, %).

Method       CK+      JAFFE
PCA+SDAE     92.56    83.67
DBN          88.38    79.00
CNN          90.33    82.90
ANN          79.43    73.22
LBP+SVM      81.83    78.33

After analyzing the experimental results, we can conclude that the proposed PCA+SDAE method achieves a higher recognition rate than the above methods. This demonstrates the automatic learning ability of deep learning, and also shows that automatically learnt features describe the expression better than raw pixel features or extracted features, which leads to a better recognition rate.

5 CONCLUSION

A novel facial expression recognition method based on Principal Component Analysis and stacked denoising autoencoders was proposed in this paper. The experimental results indicate that the proposed method achieves better performance than other deep learning based methods and non-deep-learning methods. In addition, training and testing the SDAE on PCA-reduced features gives better performance than using the raw pixel features. Currently, the model structure is chosen by experiment; how to choose the optimal model structure can be researched in the future. Finally, most facial expression recognition methods can only be applied to frontal faces; how to recognize facial expressions under pose variation can also be investigated in the future.

ACKNOWLEDGEMENT

This paper is sponsored by the Natural Science Foundation Project of CQ (CSTC, 2007BB2445), the Graduate Research and Innovation Project of CQ (CYS574) and the Ministry of Science, ICT & Future Planning (MSIP) of Korea in the ICT R&D Program 2013.

REFERENCES

[1] Fasel B., Luettin J. 2003. Automatic facial expression analysis: a survey. Pattern Recognition, 36(1): 259-275.
[2] Sumathi C.P., Santhanam T., Mahadevi M. 2012. Automatic facial expression analysis a survey. International Journal of Computer Science & Engineering Survey, 3(6): 47-59.
[3] Caleanu C.D. 2013. Face expression recognition: A brief overview of the last decade. Applied Computational Intelligence and Informatics (SACI), 2013 IEEE 8th International Symposium on. IEEE, pp: 157-161.
[4] Pantic M., Rothkrantz L.J.M. 2000.
Automatic analysis of facial expressions: The state of the art. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(12): 1424-1445.
[5] Boureau Y., Cun Y.L. 2008. Sparse feature learning for deep belief networks. Advances in Neural Information Processing Systems. pp: 1185-1192.
[6] Fasel B. 2002. Multiscale facial expression recognition using convolutional neural networks. Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP 02). (EPFL-CONF-82835).
[7] Liu Y., Hou X., Chen J., et al. 2014. Facial expression recognition and generation using sparse autoencoder. Smart Computing (SMARTCOMP), 2014 International Conference on. IEEE, pp: 125-130.
[8] Lv Y., Feng Z., Xu C. 2014. Facial expression recognition via deep learning. Smart Computing (SMARTCOMP), 2014 International Conference on. IEEE, pp: 303-308.
[9] Jung H., Lee S., Park S., et al. 2015. Development of deep learning-based facial expression recognition system. Frontiers of Computer Vision (FCV), 2015 21st Korea-Japan Joint Workshop on. IEEE, pp: 1-4.
[10] Liu P., Han S., Meng Z., et al. 2014. Facial expression recognition via a boosted deep belief network. Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, pp: 1805-1812.
[11] Vincent P., Larochelle H., Bengio Y., et al. 2008. Extracting and composing robust features with denoising autoencoders. Proceedings of the 25th International Conference on Machine Learning. ACM, pp: 1096-1103.
[12] Vincent P., Larochelle H., Lajoie I., et al. 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(6): 3371-3408.
[13] Bengio Y. 2009. Learning deep architectures for AI. Foundations & Trends in Machine Learning, 2(1): 1-127.
[14] Bengio Y., Lamblin P., Popovici D., et al. 2007. Greedy layer-wise training of deep networks. Advances in Neural Information Processing Systems, 19: 153.
[15] Jolliffe I. 2002. Principal Component Analysis. John Wiley & Sons, Ltd.
[16] You Q., Zhang Y.J. 2013. A new training principle for stacked denoising autoencoders. Image and Graphics, International Conference on. IEEE, pp: 384-389.
[17] Kanade T., Cohn J.F., Tian Y. 2000. Comprehensive database for facial expression analysis. Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, pp: 46-53.
[18] Lucey P., Cohn J.F., Kanade T., et al. 2010. The extended Cohn-Kanade dataset (CK+): A complete expression dataset for action unit and emotion-specified expression. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp: 94-101.
[19] Lyons M.J., Budynek J., Akamatsu S. 1999. Automatic classification of single facial images. IEEE Transactions on Pattern Analysis & Machine Intelligence, 21(12): 1357-1362.
[20] Ekman P., Friesen W.V. 1971. Constants across cultures in the face and emotion. Journal of Personality & Social Psychology, 17(2): 124-129.
[21] Deng H.B., Jin L.W., Zhen L.X., Huang J.C. 2005. A new facial expression recognition method based on local Gabor filter bank and PCA plus LDA. International Journal of Information Technology, 11(11): 86-96.