Real-time Facial Expression Recognition in Image Sequences Using an AdaBoost-based Multi-classifier

Chin-Shyurng Fahn, Ming-Hui Wu, and Chang-Yi Kao
National Taiwan University of Science and Technology, Taipei 10607, Taiwan
E-mail: csfahn@mail.ntust.edu.tw Tel: +886-02-2730-1215
E-mail: M9415054@mail.ntust.edu.tw Tel: +886-02-2733-3141 ext. 7425
E-mail: D9515011@mail.ntust.edu.tw Tel: +886-02-2733-3141 ext. 7425

Abstract

In this paper, a highly automatic facial expression recognition system that does not require choosing characteristic blocks in advance is presented. The system is able to detect and locate human faces in image sequences acquired in real environments. To achieve efficient facial expression recognition, we evaluate the performance of three different classifiers using multi-layer perceptrons (MLPs), support vector machines (SVMs), and AdaBoost algorithms (ABAs). From the experimental outcomes, we observe that the average recognition rates obtained from both ABAs and MLPs are better than that from SVMs, but the training of MLPs takes quite a long time. Comparatively, ABAs have the advantage of faster convergence, so they are chosen as the core technique to implement our strong facial expression classifier. Across many experiments, the performance statistics reveal that the accuracy rate of our facial expression recognition system exceeds 90% for a single kind or multiple kinds of expressions appearing in an image sequence.

I. INTRODUCTION

Robots can share the happiness with us [1]. To accomplish this, the interaction between humans and robots relies on critical techniques, especially face detection and facial expression recognition in image sequences, to which more and more researchers have devoted themselves [2-6]. Zhu et al. [7] adopted the hidden Markov model (HMM) as the classification scheme for facial expression recognition.
The accuracy rate of their facial expression recognition system is satisfactory. However, the computation of moment invariants is very time-consuming, so it cannot run in real time. Zhang and Ji [8] proposed a probabilistic framework that combines temporal and spatial information to recognize action units (AUs). Colmenarez et al. [9] presented a Bayesian probabilistic approach to recognizing faces and facial expressions, which exploits the mutual benefits of similarity measures between faces and facial expressions. Qin and He [10] took the technological advantages of both the support vector machine (SVM) and Gabor feature extraction for face recognition. The system proposed in [11] automatically detected frontal faces in an image sequence and classified them into seven classes in real time: neutral, anger, disgust, fear, joy, sadness, and surprise. A facial expression feature stream was used to train a parallel HMM structure in a fashion similar to that explained in [12], which provided a probabilistic model for temporally recurrent facial expression patterns.

To surmount the shortcomings stated above, we attempt to develop an automatic facial expression recognition system that detects human faces and extracts facial features from an image sequence. This system is employed for recognizing six kinds of facial expressions of a computer user: joy, anger, surprise, fear, sadness, and neutral. In the expression classification procedure, we mainly compare the performance of different classifiers using multi-layer perceptrons (MLPs), SVMs, and AdaBoost algorithms (ABAs). The experimental results show that the performance of ABAs is superior to that of the other two. Accordingly, we develop an AdaBoost-based multi-classifier for use in our facial expression recognition system.

II. FACE AND FACIAL FEATURE DETECTION

In our system design philosophy, the skin color cue is an obvious characteristic for detecting human faces. To begin with, we execute skin color detection, then the morphological dilation operation, and then facial feature detection.
Subsequently, a filtering operation based on geometrical properties is applied to eliminate the skin color regions that do not pertain to human faces.

A. Color Space Transformation

Face detection is dependent on skin color detection techniques, which work in one of the frequently used color spaces. In the past, three color spaces (YCbCr, HSI, and RGB) have been extensively applied for skin color detection. Accordingly, we extract the common attribute of skin color regions to perform face detection. The color model of an image captured from the experimental camera is composed of RGB values, but it is easily influenced by lighting. Herein, we adopt the HSI color space in place of the traditional RGB color space for skin color detection. We distinguish skin color regions from non-skin color ones by means of lower and upper bound thresholds. After many face detection experiments, we chose H values between 3 and 38 as the range of skin colors.

B. Connected Component Labeling

After skin color detection, we employ the linear-time connected-component labeling technique proposed by Suzuki et al. [13] to label the connected components. Their algorithm consists of three parts: the first scan, the forward scan, and the backward scan. We resort to this algorithm for two main benefits: (i) it is based only on sequential local operations, so it does not require a search algorithm to resolve label equivalences; (ii) the connectivity is obtained by simply reading and writing a one-dimensional table which stores label equivalences during the scans.

09-0100080017 2009 APSIPA. All rights reserved.

C. Face Region Verification

The detailed steps of face region verification are described in the following:

(i) Component size judgment
Our system deletes every connected component whose pixel number is smaller than 5,000 or greater than 50,000.

(ii) Aspect ratio judgment
Since the height of a human face is mostly greater than the width, we utilize the aspect ratio to verify face regions; that is, we discard any box whose height is smaller than its width. According to our experiments, the height of a human face is usually greater than or equal to the width and smaller than or equal to three times the width. These criteria are expressed in (1) to locate probable face regions.

$$B_W \le B_H \quad \text{and} \quad B_H \le 3B_W \tag{1}$$

where $B_H$ and $B_W$ are the height and width of a circumscribed box, respectively.

(iii) Face region segmentation
Our facial feature extraction method is mainly based on the normal positions of facial features. For example, the mouth lies in the lower half area of a face region, and the eyes lie in the upper half area. Therefore, we must clearly decide the lower boundary of a face region, which is prescribed as follows.

$$F_L = \begin{cases} F_L, & \text{if } B_H / B_W \le 1.4 \\ F_U + 1.4\,B_W, & \text{if } B_H / B_W > 1.4 \end{cases} \tag{2}$$

where $F_L$ and $F_U$ are the lower and upper boundaries of a face region, respectively.

D. Pupils Detection

We exploit the rule that the probable positions of the pupils are situated in a face region at approximately 0.5~0.8 times the height and 0.15~0.85 times the width, measured from the lower left corner of the face region.
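The verification criteria (1) and (2) and the pupil search window just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation (which was built with Borland C++ Builder and MATLAB): the `Box` type and the function names are hypothetical, and row indices are assumed to grow downwards, as Eq. (2) suggests.

```python
from typing import NamedTuple, Optional, Tuple


class Box(NamedTuple):
    """Circumscribed box of a skin-colored component (rows grow downwards)."""
    top: int      # F_U, upper boundary
    bottom: int   # F_L, lower boundary
    left: int
    right: int
    pixels: int   # pixel count of the connected component


def verify_face_region(box: Box) -> Optional[Box]:
    """Return the (possibly cropped) face box, or None if it is rejected."""
    # (i) Component size judgment: keep 5,000..50,000 pixels.
    if box.pixels < 5_000 or box.pixels > 50_000:
        return None
    width = box.right - box.left
    height = box.bottom - box.top
    if width <= 0:
        return None
    # (ii) Aspect ratio judgment, Eq. (1): B_W <= B_H <= 3 * B_W.
    if not (width <= height <= 3 * width):
        return None
    # (iii) Face region segmentation, Eq. (2): cap the height at 1.4 * B_W.
    if height / width > 1.4:
        return box._replace(bottom=box.top + round(1.4 * width))
    return box


def pupil_search_window(face: Box) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) of the probable pupil area.

    Fractions are measured from the lower left corner of the face region:
    0.5~0.8 of the height and 0.15~0.85 of the width.
    """
    width = face.right - face.left
    height = face.bottom - face.top
    return (face.left + 0.15 * width,
            face.bottom - 0.8 * height,
            face.left + 0.85 * width,
            face.bottom - 0.5 * height)
```

For instance, a 100-pixel-wide, 200-pixel-high component is accepted by (1) and then cropped by (2) to a height of 140 pixels before the pupil window is computed.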
The eye regions are comparatively dark relative to the skin color. Hence, we can adopt this characteristic to judge the positions of the pupils. We can observe that the eyebrows are located above the pupil, and the hair is usually also above the pupil or on its left side. Therefore, we start from the lower right corner of the left half image and search leftwards for the first white pixel in row-major order. Then, from this pixel towards both the left and up, we set up a square region of 10 × 10 pixels. By calculating the center of gravity of the white pixels in this region, the position of the left pupil is obtained; likewise, we can find the position of the right pupil.

E. Center of a Mouth Detection

We utilize the rule that the probable position of a mouth lies in the face image at approximately 0.05~0.55 times the height and 0.2~0.8 times the width, measured from the lower left corner of the face region. According to our observation and experiments, the color of a lip is usually darker than the skin color, and it has a greater red component but a smaller blue one. To locate the region of a mouth, we apply the H value to detect lip-colored areas by use of lower and upper bound thresholds. After many experiments, we chose H values between 0 and 6 as the range of lip colors. Finally, by calculating the center of all black pixels, the position of the center of the mouth is acquired.

III. FACIAL LANDMARKS EXTRACTION

After detecting a face region and finding its pupils and the center of the mouth, we perform facial landmarks extraction, which is the crucial step of an expression recognition system. In general, each facial expression contains plenty of distinctive features. If we distinguished facial expressions by extracting all the distinctive features, it would take a lot of execution time. Hence, we instead choose some landmarks which can represent the changes of facial expressions. This greatly reduces the computational load so that our expression recognition system can run in real time. First, we draw the proper ranges of the eyes, mouth, and eyebrows relative to the positions of the pupils and the center of the mouth.
Next, we perform both binarization and edge detection on the above ranged images and find 16 landmarks on a human face, as shown in Fig. 1, to obtain the characteristic information of facial expressions.

Fig. 1 Facial landmarks on a human face.

A. Landmarks Extraction of Eyes

According to our experiments, we observe that most of the upper horizontal bounds of eye regions lie above the base line through the pupils by 0.2 times the unit of length (the distance between the two pupils, as defined in Section IV-A), the lower horizontal bounds of eye regions lie below it by 0.4 times the unit of length, the inner vertical bounds of eye regions lie inwards from the central points of the two pupils by 0.33 times the unit of length, and the outer vertical bounds of eye regions lie outwards by 0.5 times the unit of length. Fig. 2 shows the proper rectangular ranges of the left and right eye regions.
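The eye region bounds above can be turned into rectangles with a short Python sketch. It is a hedged illustration rather than the authors' code: the `Rect` type and `eye_regions` function are hypothetical names, the base line is assumed to be horizontal through each pupil, and image y coordinates are assumed to grow downwards.

```python
import math
from typing import NamedTuple, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y grows downwards


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def eye_regions(pupil_l: Point, pupil_r: Point) -> Tuple[Rect, Rect]:
    """Return the rectangular search ranges of the two eyes."""
    d = math.dist(pupil_l, pupil_r)  # unit of length: inter-pupil distance

    def region(pupil: Point, other: Point) -> Rect:
        x, y = pupil
        inward = 1.0 if other[0] > x else -1.0  # direction of the other eye
        x_inner = x + inward * 0.33 * d         # 0.33 d towards the nose
        x_outer = x - inward * 0.5 * d          # 0.5 d towards the temple
        return Rect(left=min(x_inner, x_outer),
                    top=y - 0.2 * d,            # 0.2 d above the base line
                    right=max(x_inner, x_outer),
                    bottom=y + 0.4 * d)         # 0.4 d below the base line

    return region(pupil_l, pupil_r), region(pupil_r, pupil_l)
```

With pupils at (100, 100) and (200, 100), the unit of length is 100 pixels, so the region of the pupil on the image left spans x from 50 to 133 and y from 80 to 140.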

Fig. 2 The rectangular ranges of eye regions.

In the sequel, we conduct binarization and Sobel edge detection in these ranges of eye regions. Because our experimental environment is often influenced by varied illumination (for example, day, night, or the power of the fluorescent lamp), we must alter the thresholds according to environmental changes so that our system attains better performance. After this, we carry out the logical AND operation on the two binary eye region images that are respectively derived from binarization and edge detection. Our system obtains eight landmarks on both eyes from the candidate landmarks, which represent the part of the facial expression features concerning the eyes. Fig. 3(a) illustrates the range of searching the candidate landmarks of eyes, and some results of extracting the landmarks of eyes are indicated in Fig. 3(b).

Fig. 3 Finding the landmarks of eyes: (a) the searching range; (b) some locating results.

B. Landmarks Extraction of Eyebrows

It is a little difficult to extract the landmarks of eyebrows, because eyebrows often appear in different positions for distinct expressions such as anger, sadness, and joy. Hence, the regular position and range of an eyebrow are hard to define. To overcome this, we utilize the color difference between the eyebrows and the skin. It can be observed that most of the upper horizontal bounds of eyebrow regions lie above the base line through the pupils by 0.75 times the unit of length, the lower horizontal bounds of eyebrow regions lie above it by 0.13 times the unit of length, the inner vertical bounds of eyebrow regions lie inwards from the central points of the two pupils by 0.45 times the unit of length, and the outer vertical bounds of eyebrow regions lie outwards by 0.6 times the unit of length. Fig. 4 shows the proper rectangular ranges of the left and right eyebrow regions.

Fig. 4 The rectangular ranges of eyebrow regions.

First of all, we carry out binarization and Sobel edge detection in the ranges of eyebrow regions, and perform the logical AND operation on the two resulting binary images. The hair usually lies on the periphery or the top of a face, which is prone to disturb eyebrow regions. Therefore, we only define one landmark on the inner rim and another on the center of an eyebrow. Our system finds four landmarks on a pair of eyebrows from the candidate landmarks, which represent the part of the facial expression features concerning the eyebrows. Fig. 5(a) illustrates the range of searching the candidate landmarks of eyebrows, and some outcomes of extracting the landmarks of eyebrows are indicated in Fig. 5(b).

Fig. 5 Finding the landmarks of eyebrows: (a) the searching range; (b) some locating results.

C. Landmarks Extraction of a Mouth

In accordance with our observation and experiments, the lip color is usually darker than the skin color, so we apply this characteristic to extract the landmarks of a mouth. It is also observed that most of the upper horizontal bounds of mouths lie above their central points by 0.35 times the unit of length, the lower horizontal bounds lie below by 0.55 times the unit of length, the left vertical bounds of mouths lie leftwards from the central points by 0.6 times the unit of length, and the right vertical bounds lie rightwards by 0.6 times the unit of length. Fig. 6 shows the proper rectangular range of a mouth region.
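The step shared by the eye, eyebrow, and mouth regions (binarization, Sobel edge detection, then a logical AND of the two binary images) can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, the thresholds are free parameters (the paper adjusts them to the illumination and gives no values), dark pixels are treated as foreground, and the gradient magnitude is taken as the Euclidean norm of the two Sobel responses.

```python
import math
from typing import List

Image = List[List[int]]  # grayscale rows with values 0..255


def binarize_dark(img: Image, threshold: int) -> Image:
    """Mark pixels darker than the threshold as 1 (assumed foreground)."""
    return [[1 if p < threshold else 0 for p in row] for row in img]


def sobel_edges(img: Image, threshold: float) -> Image:
    """Mark interior pixels whose Sobel gradient magnitude reaches the threshold."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
            if math.hypot(gx, gy) >= threshold:
                out[r][c] = 1
    return out


def candidate_mask(img: Image, bin_threshold: int, edge_threshold: float) -> Image:
    """Logical AND of the binarized image and the edge image."""
    dark = binarize_dark(img, bin_threshold)
    edges = sobel_edges(img, edge_threshold)
    return [[d & e for d, e in zip(dark_row, edge_row)]
            for dark_row, edge_row in zip(dark, edges)]
```

The AND keeps only edge pixels that are also dark, which is one plausible reading of how the candidate landmarks are narrowed down before the final landmarks are picked.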

Fig. 6 The rectangular range of a mouth region.

Following that, we first carry out binarization and Sobel edge detection in the range of a mouth. In this phase, we alter the thresholds along with environmental changes so that our system attains better results. Then the logical AND operation is performed on the two binary images to produce a refined binary edge image of the mouth. Our system acquires four landmarks on a mouth from the candidate landmarks, which represent the part of the facial expression features concerning the mouth. Fig. 7(a) illustrates the range of searching the candidate landmarks of a mouth, and some results of extracting the landmarks of mouths are indicated in Fig. 7(b).

Fig. 7 Finding the landmarks of a mouth: (a) the searching range; (b) some locating results.

IV. FACIAL EXPRESSION RECOGNITION

The design philosophy of classification is based on the difference between one kind of facial expression and a neutral facial expression. From the 16 landmarks of a human face, we compute 16 characteristic distances which represent a kind of expression. Then we subtract the 16 characteristic distances of a neutral facial expression from those of a certain kind of expression to acquire its corresponding 16 displacement values. After performance evaluation, we implement a classifier to recognize six kinds of expressions using AdaBoost algorithms (ABAs) rather than multi-layer perceptrons (MLPs) and support vector machines (SVMs) [14].

A. Feature Manipulations

According to our observation, the landmarks of eyebrows and mouths show more obvious displacement for the six kinds of facial expressions except the neutral one. For example, when people are joyful, the facial landmarks on the left and right corners of the mouth are raised and drawn apart to both sides, whereas when people are angry, the facial landmarks on the inner rims of the eyebrows are pressed inwards and downwards.
Because the locations of facial landmarks affected by each kind of facial expression are not the same, in order to recognize facial expressions effectively, we must understand the relationship between a kind of facial expression and the displacement of its corresponding facial landmarks. Table I shows the distinguishing features of different facial expressions.

TABLE I
THE DISTINGUISHING FEATURES OF DIFFERENT FACIAL EXPRESSIONS

Facial Expression | Distinguishing Features
Joy | 1. The corners of the mouth are raised up. 2. The width of the mouth becomes large. 3. The eyes are a little diminished.
Anger | 1. The two eyebrows are close to each other. 2. Vertical lines appear in the interval between the eyebrows. 3. The eyes open widely.
Surprise | 1. The eyebrows are raised up. 2. The chin drops. 3. The height of the mouth becomes large.
Fear | 1. The two eyebrows are close to each other or raised up. 2. The width of the mouth becomes large. 3. The eyes open widely.
Sadness | 1. The corners of the mouth drop. 2. The two eyebrows are a little close to each other. 3. The eyes are a little diminished. 4. The upper lip is raised.

Next, with reference to the 16 landmarks on a human face as shown in Fig. 8, we produce 16 characteristic distances, which are the main features used for recognizing facial expressions and are calculated in the following way:

$$D_1 = \lVert M_1 - M_2 \rVert \tag{3.1}$$
$$D_2 = \lVert M_3 - M_4 \rVert \tag{3.2}$$
$$D_3 = \lVert (M_1 + M_2)/2 - M_3 \rVert \tag{3.3}$$
$$D_4 = \lVert (M_1 + M_2)/2 - M_4 \rVert \tag{3.4}$$
$$D_5 = \lVert EB_1 - EB_3 \rVert \tag{3.5}$$
$$D_6 = \lVert EB_2 - EB_4 \rVert \tag{3.6}$$
$$D_7 = \lVert EB_1 - P_l \rVert \tag{3.7}$$
$$D_8 = \lVert EB_2 - P_l \rVert \tag{3.8}$$
$$D_9 = \lVert EB_3 - P_r \rVert \tag{3.9}$$
$$D_{10} = \lVert EB_4 - P_r \rVert \tag{3.10}$$
$$D_{11} = \lVert E_3 - E_4 \rVert \tag{3.11}$$
$$D_{12} = \lVert E_7 - E_8 \rVert \tag{3.12}$$
$$D_{13} = \lVert E_2 - M_1 \rVert \tag{3.13}$$
$$D_{14} = \lVert E_6 - M_2 \rVert \tag{3.14}$$
$$D_{15} = \lVert E_1 - E_5 \rVert \tag{3.15}$$
$$D_{16} = \lVert (P_l + P_r)/2 - M_4 \rVert \tag{3.16}$$

where $E_1, E_2, \ldots, E_8$ are the landmarks of eyes; $EB_1, EB_2, \ldots, EB_4$ are the landmarks of eyebrows; $M_1, M_2, \ldots, M_4$ are the landmarks of a mouth; $P_l$ and $P_r$ are the positions of the left and right pupils, respectively.

Fig. 8 The 16 landmarks used to generate 16 characteristic distances on a human face.

Because the size of the faces extracted by our recognition system varies, the derived characteristic distances are irregular. Hence, we need to normalize these characteristic distances to make our recognition system more accurate. First of all, we take the distance $d$ between the two pupils as the unit of length, because it does not change with different facial expressions. All the original 16 characteristic distances are divided by the unit of length to obtain 16 normalized characteristic distances expressed as follows.

$$D'_i = D_i / d \tag{4}$$

where $D_i$, $i = 1, 2, \ldots, 16$, are the original characteristic distances. We then apply these normalized characteristic distances to get the displacement values that are fed to the classifier. In a sequence of human face images with neutral facial expressions, say 10 frames, we compute a mean value which is saved as a reference value for each normalized characteristic distance, and subtract the 16 reference values from the 16 normalized characteristic distances in the subsequent frames as depicted in Eq. (5). These 16 displacement values act as the facial expression features input to our recognition system.

$$S_i^{t} = D_i'^{\,t} - D_i'^{\,r}, \quad t = 11, 12, \ldots \tag{5}$$

where $D_i'^{\,t}$ is the $i$-th normalized characteristic distance in the $t$-th frame and $D_i'^{\,r}$ is the $i$-th reference characteristic distance.

B. The AdaBoost Algorithm

The AdaBoost algorithm (ABA) was proposed in the literature of computational learning theory in 1996 [15].
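Before continuing with the ABA, the feature manipulation of Section IV-A, that is, the normalization in Eq. (4) and the displacement in Eq. (5), can be illustrated with a short Python sketch. The function names are hypothetical and the characteristic distances are assumed to be already measured from the landmarks; this is a sketch of the arithmetic only.

```python
from statistics import fmean
from typing import List, Sequence


def normalize(distances: Sequence[float], pupil_distance: float) -> List[float]:
    """Eq. (4): divide every characteristic distance by the unit of length d."""
    if pupil_distance <= 0:
        raise ValueError("the inter-pupil distance must be positive")
    return [x / pupil_distance for x in distances]


def reference_values(neutral_frames: Sequence[Sequence[float]]) -> List[float]:
    """Mean of each normalized distance over the neutral frames (e.g. 10)."""
    return [fmean(column) for column in zip(*neutral_frames)]


def displacements(frame: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Eq. (5): normalized distances of a later frame minus the references."""
    return [x - r for x, r in zip(frame, reference)]
```

In use, the first neutral frames would each be passed through `normalize`, averaged by `reference_values`, and every subsequent frame would yield a 16-element feature vector via `displacements`.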
The ABA has two different versions: one is used for binary classification problems and the other deals with problems with more than two classes. The ABA generates a hypothesis whose error on the training set is small by combining many hypotheses whose errors may be large (but still better than random guessing). Fig. 9 is the generalized version of the ABA for binary classification problems.

Given a training sample set $\tilde{S} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$ with $x_i \in X$ and $y_i \in \{-1, +1\}$.
Initialize the distribution: $\tilde{D}_1(i) = 1/m$, $i = 1, 2, \ldots, m$.
For $t = 1, 2, \ldots, T$:
(1) Train the weak classifier using the distribution $\tilde{D}_t(i)$, $i = 1, 2, \ldots, m$.
(2) Get the weak hypothesis $g_t : X \to \{-1, +1\}$.
(3) Update the distribution
$$\tilde{D}_{t+1}(i) = \tilde{D}_t(i) \exp(-\eta_t y_i g_t(x_i)) / Z_t, \quad i = 1, 2, \ldots, m$$
where $Z_t$ is a normalization factor (guaranteeing that $\tilde{D}_{t+1}$ is still a distribution) and
$$\eta_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t} \quad \text{with} \quad \varepsilon_t = \sum_{i=1}^{m} \tilde{D}_t(i)\,[y_i \ne g_t(x_i)].$$
End For
Output the final hypothesis:
$$G(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \eta_t g_t(x) \right).$$

Fig. 9 A generalized version of the ABA.

C. The Weak Classifier

The weak classifier is the essential part of an ABA. Each weak classifier produces the answer yes or no for particular features. The ABA is very flexible, and can be improved by combining a sequence of weak classifiers, each of whose associated conditional probabilities is determined by the output of the previously tuned weak classifiers. Such boosting and random subspace methods have been designed for decision trees, where they often produce an ensemble of classifiers which is superior to a single classification rule. Herein, we adopt Classification and Regression Trees (CARTs) as the structure of a weak classifier tuned by the ABA.

The classical CART algorithm was proposed by Breiman et al. in 1984 [16]. It builds a binary decision tree which splits a single variable at each node for predicting continuous dependent variables (regression) and categorical predictor variables (classification). The leaves and nodes of the decision tree represent the results of classification and the prediction rules, respectively. The CART algorithm recursively conducts a thorough search over all variables, whose values are categorized into two groups using a threshold, to find an optimal splitting rule for each node. We can regard classification via a decision tree as a tree traversal process.

A node of the CART is constructed by use of the following rules. Given a training sample set $\tilde{S} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where each $x_i$ belongs to an instance space $X \subseteq \mathbb{R}^n$ (each vector has dimensionality $n$; $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$) and each label $y_i$ belongs to a finite label space $Y = \{-1, 1\}$:

Rule 1. For each feature (all dimensions), determine a threshold which separates the sample set $\tilde{S}$ with a minimal classification error.
Rule 2. Select the $j$-th feature with the minimal error and build a CART node.
(i) Set up the branch condition: $\xi_j >$ threshold.
(ii) Arrange the branches that are connected with leaves to perform the respective classification.

Suppose that the classification error associated with a leaf is the probability of a sample being misclassified. We stop the tree traversal when encountering a misclassification. The whole CART is constructed by means of the following steps:

Step 1. Construct the root of a CART with the minimal error node.
Step 2. Select the leaf with the largest error.
Step 3. Construct a node using only those samples which are associated with the chosen leaf.
Step 4. Replace the chosen leaf by the newly constructed node.
Step 5. Repeat Steps 2-4 until all leaves have no error or the predefined conditions are reached.

Fig. 10 illustrates an example of the CART where four nodes are constructed.

Fig. 10 Illustration of a 4-split CART (node conditions: $\xi_2 > 3$, $\xi_1 > 1.5$, $\xi_4 > 2.2$, $\xi_3 > 0$).

D. Our Proposed AdaBoost-Based Multi-Classifier

Because the ABA is primarily applied to a binary classifier, we further develop a bottom-up hierarchical classification structure for the recognition of multi-class facial expressions. This structure has the good property that we can update the models by training only on newly added data, without modifying the whole set of models trained earlier. Fig. 11(a) shows an AdaBoost-based classifier $M_1$ that recognizes two kinds of expressions.

Fig. 11 Illustration of classifier expansion: (a) an AdaBoost-based binary classifier; (b) an AdaBoost-based ternary classifier from expanding (a).

When we add another kind of expression, we just construct the node $M_2$ by taking both the expressions A and B as the negative samples and the expression C as the positive sample, as shown in Fig. 11(b). Hence, we can utilize this structure to recognize three kinds of facial expressions. From this example, we can see that when a new expression is put into the structure, we just construct at most two nodes.

By feeding the features of a facial expression to the strong classifier trained with the ABA, we can acquire a weight of the final prediction. The facial expression is recognized as one kind of expression if this weight is positive with respect to that kind. In our facial expression recognition system, there are six kinds of expressions to be classified: joy, anger, surprise, fear, sadness, and neutral. According to the extracted features, the classification of facial expressions is sometimes ambiguous. Therefore, a seventh leaf standing for the "Other" kind of expressions is required in our AdaBoost-based multi-classifier, as shown in Fig. 12.

Fig. 12 An example of a multi-classifier for discriminating seven classes.
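The binary ABA of Fig. 9, on which every node of the multi-classifier is built, can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: for brevity the weak classifier is a single-threshold decision stump, that is, a one-node simplification of the CART described in Section IV-C, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index j, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Answer +polarity if feature j exceeds the threshold, else -polarity."""
    j, threshold, polarity = stump
    return polarity if x[j] > threshold else -polarity


def train_stump(X, y, w) -> Tuple[Stump, float]:
    """Pick the stump with the smallest weighted error (Rules 1 and 2)."""
    best, best_err = (0, 0.0, 1), float("inf")
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        thresholds = [values[0] - 1.0] + [(a + b) / 2
                                          for a, b in zip(values, values[1:])]
        for threshold in thresholds:
            for polarity in (1, -1):
                stump = (j, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds: int) -> List[Tuple[float, Stump]]:
    """Train for at most `rounds` iterations; labels must be -1 or +1."""
    m = len(X)
    w = [1.0 / m] * m                       # D_1(i) = 1/m
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, eps = train_stump(X, y, w)
        if eps >= 0.5:                      # no better than random guessing
            break
        eps = max(eps, 1e-10)               # avoid division by zero
        eta = 0.5 * math.log((1.0 - eps) / eps)
        ensemble.append((eta, stump))
        w = [wi * math.exp(-eta * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)                          # normalization factor Z_t
        w = [wi / z for wi in w]
    return ensemble


def score(ensemble, x: Sequence[float]) -> float:
    """Weighted vote; its sign gives the final hypothesis G(x)."""
    return sum(eta * stump_predict(stump, x) for eta, stump in ensemble)


def predict(ensemble, x: Sequence[float]) -> int:
    return 1 if score(ensemble, x) >= 0 else -1
```

The value returned by `score` plays the role of the prediction weight mentioned above: a node of the hierarchical structure would accept its positive expression when the score is positive and otherwise pass the sample on to the next node.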

V. EXPERIMENTAL RESULTS

The experimental results consist of three main parts: face detection using skin color segmentation, facial expression recognition using an ABA, and sequential composite expression recognition using the ABA together with the majority voting scheme. The hardware and software used in these experiments are listed in Table II. At present, the facial features of ten persons (eight males and two females) are stored in our database, and each person has 1,200 samples comprising joy, anger, surprise, fear, sadness, and neutral, each of which contains 200 samples. That is, there are 12,000 samples in total in our facial expression database. Fig. 13 shows some image samples of six kinds of expressions, where the faces may be panned from -30° to 30° and tilted from -10° to 10°.

TABLE II
THE HARDWARE AND SOFTWARE USED IN THE FACIAL EXPRESSION RECOGNITION SYSTEM

Hardware | Software
CPU: Pentium 4 3.2 GHz | Tool: Borland C++ Builder 6.0, MATLAB 7.2
RAM: 512 MB | Operating System: Microsoft Windows XP
Camera: Logitech Quick-Cam Pro 4000 |

The experiments were completed at National Taiwan University of Science and Technology. The subjects are all graduate students in the Image Processing and Pattern Recognition Laboratory. In the first part of this section, we illustrate our facial expression database. Then we show the face detection results from captured image sequences. Subsequently, we compare three different classifiers for recognizing facial expressions using MLPs, SVMs, and ABAs. Finally, we show the sequential composite facial expression recognition results from various image sequences.

A. The Facial Expression Database

Up to now, no standard database has been generally acknowledged by international researchers in the field of facial expression recognition, but there are some databases commonly used in experiments; for example, the Japanese Female Facial Expressions (JAFFE) database [17], the Cohn-Kanade Facial Expression Database [18], the Ekman-Hager Facial Action Exemplars [19], and the CMU Pose, Illumination, and Expression (PIE) Database of Human Faces [20]. Of these databases, some have single image frames of facial expressions but no continuous image sequences, and some have gray images but no color images. Not all of them are suitable for testing our facial expression recognition system. Therefore, we set up a small-scale database of facial expressions by ourselves, using the Logitech Quick-Cam Pro 4000 web camera to take image sequences with a resolution of 320 × 240 pixels.

In addition, the database employed in our system is different from ordinary databases of facial expressions. Such databases usually store static face images one by one. In consequence, the facial features are extracted from only a single face image of the database each time. In contrast, in our system we directly extract facial features from an image sequence without storing image frames. This method not only speeds up building the facial expression database but also saves hard disk space.

Fig. 13 The training image samples of six kinds of expressions.

B. The Results of Face Detection

The face detection procedure is accomplished by the following processes in order: skin color detection, morphological operation, connected component labeling, component size judgment, aspect ratio judgment, and proper face region segmentation. This procedure takes about 0.06 seconds. Some face detection results are shown in Fig. 14.

Fig. 14 The results of face detection in complex backgrounds.

We perform the face detection experiments on three image sequences of 300, 400, and 500 continuous frames, respectively.
There are three different subjects, two males and one female, in these sequences. The correct rates of face detection for these three image sequences are almost identical. In addition, we define the Error rate as the percentage of non-face regions regarded as human faces, and the Miss rate as the percentage of cases in which someone's face appears in an image sequence but the system fails to detect it. Table III shows the face detection rates of the above experiments. The errors of face detection occur because the light is insufficient or the colors of some regions are close to the skin color. Consequently, the erroneous face candidates may not meet conditions such as the aspect ratio and the size of the box bounding a face.
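The middle stages of the face detection procedure (connected component labeling, component size judgment, and aspect ratio judgment) can be sketched in pure Python as follows. The sketch assumes that skin color detection and the morphological operation have already produced a binary mask. The function names, the 4-connectivity, and the thresholds `min_pixels`, `min_ratio`, and `max_ratio` are illustrative assumptions, since the values used by the system are not stated here.

```python
from collections import deque


def connected_components(mask):
    """Label 4-connected regions of 1s in a binary mask (list of rows).

    Returns one tuple (top, left, bottom, right, pixel_count) per region.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited skin pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, count = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append((top, left, bottom, right, count))
    return regions


def face_candidates(mask, min_pixels=6, min_ratio=1.0, max_ratio=2.0):
    """Keep bounding boxes that pass the size and aspect ratio judgments.

    All three thresholds are placeholders chosen for the toy mask below.
    """
    kept = []
    for top, left, bottom, right, count in connected_components(mask):
        box_height = bottom - top + 1
        box_width = right - left + 1
        ratio = box_height / box_width  # height-to-width ratio of the box
        if count >= min_pixels and min_ratio <= ratio <= max_ratio:
            kept.append((top, left, bottom, right))
    return kept


# Toy skin mask: one tall block, one wide strip, and one isolated pixel.
mask = [
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
]
candidates = face_candidates(mask)
```

In this toy mask only the tall 3 × 2 block survives: the wide strip and the isolated pixel are rejected for being too small, and the strip also fails the aspect ratio test. Skin-colored background regions that happen to pass both tests correspond to the Error rate defined above.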

TABLE III THE STATISTICS OF FACE DETECTION RATES

Exp.   Total Number of Faces   Number of Detected Faces   Correct Rate   Error Rate   Miss Rate
I      300                     295                        98.33%         1.30%        0.37%
II     400                     393                        98.25%         1.43%        0.32%
III    500                     492                        98.40%         1.25%        0.35%

C. The Results of Facial Expression Recognition

Our system is designed to recognize facial expressions in an image sequence, but in practice the recognition is achieved by judging a single image each time. Herein, we mainly compare the performance of different classifiers using MLPs, SVMs, and ABAs, and then summarize their advantages and disadvantages. Since all these machine learning methods are supervised, we have to acquire samples to train the classifiers. During training, we employ 10-fold cross-validation to estimate the accuracy of the different system models. Owing to limited space, we only describe our AdaBoost-based multi-classifier below.

Because the ABA we adopt is a binary classifier, we propose a bottom-up hierarchical classification structure consisting of properly arranged ABAs for facial expression recognition. Such a decision tree for recognizing the six kinds of expressions is similar to that based on SVMs [21]. The total training time for 12,000 samples with 16 CART splits and 300 iterations of the ABA is about 4 minutes. The detailed experimental data are recorded in Table IV. We can see that the accuracy rates of recognizing expressions are better and more even than those resulting from SVMs.
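As a hedged illustration of how per-class rates follow from the data of Table IV, the sketch below computes recall and precision from that confusion matrix using the standard definitions: recall is true positives over true positives plus false negatives, and precision is true positives over true positives plus false positives. It assumes that each row of Table IV is the ground-truth expression and each column is the recognition result, as the table header suggests. The function name `recall_and_precision` is hypothetical, and the matrix entries are copied from Table IV as printed.

```python
LABELS = ["Neutral", "Joy", "Anger", "Surprise", "Fear", "Sadness", "Other"]

# Rows: ground-truth expression; columns: recognition result (assumed layout).
CONFUSION = [
    [1923, 1, 29, 2, 10, 28, 7],
    [21, 1929, 0, 1, 35, 3, 11],
    [7, 1, 1971, 0, 15, 3, 3],
    [0, 0, 0, 1995, 5, 0, 0],
    [4, 119, 1, 1, 1835, 15, 24],
    [26, 8, 44, 0, 9, 1907, 6],
    [2, 3, 4, 1, 8, 6, 1976],
]


def recall_and_precision(matrix):
    """Return a list of (recall, precision) pairs, one per class."""
    size = len(matrix)
    rates = []
    for k in range(size):
        true_pos = matrix[k][k]
        # Samples of class k that were recognized as something else.
        false_neg = sum(matrix[k]) - true_pos
        # Samples of other classes that were recognized as class k.
        false_pos = sum(matrix[row][k] for row in range(size)) - true_pos
        rates.append((true_pos / (true_pos + false_neg),
                      true_pos / (true_pos + false_pos)))
    return rates


rates = recall_and_precision(CONFUSION)
for label, (recall, precision) in zip(LABELS, rates):
    print(f"{label:9s} recall={recall:.4f} precision={precision:.4f}")
```

Under this row and column convention, the recall of a class is its diagonal entry divided by its row sum, while the precision depends on the column sum and therefore on how often other expressions are mistaken for that class.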
TABLE IV THE FACIAL EXPRESSION RECOGNITION RESULTS FROM ABAS

Expression type   Recognition result
                  Neutral   Joy    Anger   Surprise   Fear   Sadness   Other
Neutral           1923      1      29      2          10     28        7
Joy               21        1929   0       1          35     3         11
Anger             7         1      1971    0          15     3         3
Surprise          0         0      0       1995       5      0         0
Fear              4         119    1       1          1835   15        24
Sadness           26        8      44      0          9      1907      6
Other             2         3      4       1          8      6         1976

To further show the performance of the above experiments, we introduce the definitions of precision and recall rates as depicted in Table V. It means that most expressions can be classified correctly. In the calculation, 2,000 materials are viewed as the correct samples, and the other 10,000 materials are regarded as the wrong samples. Table VI records the precision and recall rates of the above experiments, and Table VII summarizes the system performance using three different classification techniques.

TABLE V DEFINITION OF PRECISION AND RECALL RATES

Notation         Definition
Precision        $\dfrac{\text{True positive}}{\text{True positive} + \text{False positive}}$
Recall           $\dfrac{\text{True positive}}{\text{True positive} + \text{False negative}}$
True positive    $\text{Result} \cap \text{Ground truth}$
False positive   $\text{Result} \cap \overline{\text{Ground truth}}$
False negative   $\overline{\text{Result}} \cap \text{Ground truth}$

TABLE VI THE RECALL AND PRECISION RATES OF THE THREE CLASSIFIERS

Expression type   MLP                  SVM                  ABA
                  Recall   Precision   Recall   Precision   Recall   Precision
Neutral           96.6%    96.6%       94.2%    94.2%       96.2%    96.2%
Joy               94.7%    94.7%       91.5%    91.5%       96.5%    96.5%
Anger             95.9%    95.9%       94.3%    94.3%       98.6%    98.6%
Surprise          99.7%    99.7%       99.9%    99.9%       99.8%    99.8%
Fear              93.8%    93.8%       86.1%    86.1%       91.8%    91.8%
Sadness           93.5%    93.5%       88.2%    88.2%       95.4%    95.4%
Other             95.4%    95.4%       91.3%    91.3%       98.8%    98.8%

By observing Tables VI and VII, we find that the average recognition rates obtained from both the MLPs and the ABAs are better than that from the SVMs. In particular, the accuracy rates are raised for recognizing the expressions Joy, Fear, and Sadness. We also observe that all the recognition rates obtained from the MLPs are very even, but the training of MLPs takes quite a long time. Except for the accuracy rates of recognizing the expressions Neutral and Fear, the accuracy rates obtained from the ABAs are superior to those from the MLPs. On the other hand, when we compare SVMs with ABAs, the performance of the former is worse than that of the latter. This is because the ABAs constitute a strong classifier composed of several weak classifiers, which gives greater adaptability. The goal of an SVM is to find the best hyperplane to separate the input data into two classes; an SVM can itself act as a weak classifier used in the ABAs. On the whole, the classification result obtained from the ABAs is better than that from the SVMs. Consequently, we choose ABAs as the classification method to realize our facial expression recognition system.

TABLE VII SYSTEM PERFORMANCE OF THE THREE CLASSIFIERS

Classification Technique    MLP      SVM      ABA
Average Recognition Rate    95.7%    92.2%    96.7%
Average Training Time       25 min   11 min   8 min

The following tests our system on image sequences, each of which has just two kinds of expressions. Table VIII lists the test image sequences of composite facial expressions.

TABLE IX CLASSIFICATION RESULTS OF SEQUENTIAL COMPOSITE FACIAL EXPRESSIONS

Video label   Sequential classification result
N-J           Neutral Neutral Neutral Neutral (Other) Joy Joy Joy Joy Joy
Sur-A         Surprise Surprise Surprise Surprise Surprise (Anger) Anger Anger Other Anger
A-F           Anger Anger Anger Anger (Other) Fear Joy Fear Fear Fear
N-Sad         Neutral Neutral Neutral Neutral Neutral (Sadness) Sadness Sadness Sadness Sadness
Sur-J         Surprise Surprise Surprise Surprise Surprise (Fear) Joy Joy Joy Joy Joy Joy Joy
F-Sad         Fear Fear Fear Fear Fear Fear (Sadness) Neutral Sadness Sadness Sadness Sadness

In this experiment, the total number of test process units is 65 and the number of process units correctly classified is 59. The correct classification rate is about 90.7%. Fig. 15 shows an example frame of the test image sequences, each of which has only two kinds of facial expressions to simplify the demonstration.
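The correct classification rate reported for Table IX can be re-derived with a short Python sketch. It rests on one labeled assumption: within each sequence, every process unit before the parenthesized unit is expected to show the first expression of the video label, and the parenthesized unit and all later units are expected to show the second. The function name `score_sequence` and the data layout are hypothetical; the labels are copied from Table IX.

```python
# (first expression, second expression, per-unit results from Table IX).
# A result wrapped in parentheses marks the unit where the expression changes.
SEQUENCES = {
    "N-J": ("Neutral", "Joy",
            ["Neutral"] * 4 + ["(Other)"] + ["Joy"] * 5),
    "Sur-A": ("Surprise", "Anger",
              ["Surprise"] * 5 + ["(Anger)", "Anger", "Anger", "Other", "Anger"]),
    "A-F": ("Anger", "Fear",
            ["Anger"] * 4 + ["(Other)", "Fear", "Joy", "Fear", "Fear", "Fear"]),
    "N-Sad": ("Neutral", "Sadness",
              ["Neutral"] * 5 + ["(Sadness)"] + ["Sadness"] * 4),
    "Sur-J": ("Surprise", "Joy",
              ["Surprise"] * 5 + ["(Fear)"] + ["Joy"] * 7),
    "F-Sad": ("Fear", "Sadness",
              ["Fear"] * 6 + ["(Sadness)", "Neutral"] + ["Sadness"] * 4),
}


def score_sequence(first, second, results):
    """Count correctly classified units in one sequence.

    Assumption: units before the parenthesized one should be `first`;
    the parenthesized unit and everything after it should be `second`.
    """
    change = next(i for i, r in enumerate(results) if r.startswith("("))
    correct = 0
    for i, result in enumerate(results):
        expected = first if i < change else second
        if result.strip("()") == expected:
            correct += 1
    return correct, len(results)


total_correct = 0
total_units = 0
for label, (first, second, results) in SEQUENCES.items():
    correct, units = score_sequence(first, second, results)
    total_correct += correct
    total_units += units

rate = total_correct / total_units  # 59 of 65 units under this assumption
```

Under this assumption the sketch counts 59 correct units out of 65, in agreement with the figures stated above. Three of the six failures fall on the unit where the expression changes.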
To classify multiple kinds of expressions in a single image sequence, we report the classification result for each process unit. Herein, we simply treat a single frame as a process unit, which is classified into one kind of expression. A classification result enclosed in parentheses marks the change of expressions in an image sequence. The process unit at such a moment easily makes the recognition system ambiguous.

TABLE VIII TEST IMAGE SEQUENCES OF COMPOSITE FACIAL EXPRESSIONS

Composite expression type   Video label
Neutral and Joy             N-J
Surprise and Anger          Sur-A
Anger and Fear              A-F
Neutral and Sadness         N-Sad
Surprise and Joy            Sur-J
Fear and Sadness            F-Sad

In Table IX, an expression returned by the system that is printed in boldface indicates a misclassification. As mentioned above, apart from the failures in classifying the process unit at the moment of an expression change, the other failures are caused by the high similarity between two kinds of expressions, especially Joy and Fear.

Fig. 15 Demonstration of the facial expression changing from Surprise to Anger.

VI. CONCLUSIONS AND FUTURE WORKS

In this paper, we have presented a highly automatic facial expression recognition system in which a face detection procedure first detects and locates human faces in image sequences acquired in real environments. We need not label or choose characteristic blocks in advance. In the face detection procedure, some geometrical properties are applied to eliminate the skin color regions that do not belong to human faces. This does not require much miscellaneous calculation and can accelerate the processing speed of the facial expression recognition system. In the facial feature extraction procedure, we perform the binarization and edge detection operations only on the proper ranges of the eyes, mouth, and eyebrows to obtain the 16 landmarks of a human face, from which we further produce 16 characteristic distances that represent a kind of expression. This effectively reduces the influence of noise originating from the other ranges and lowers the chance of extracting wrong landmarks, which increases the recognition rate of the whole system.

During the development of the facial expression classification procedure, we evaluate three machine learning methods: MLPs, SVMs, and ABAs. We combine ABAs with CARTs, which selects weak classifiers and integrates them into a strong classifier automatically. This not only takes less training time than the other machine learning methods do, but also enhances the classification capability. Thus, we can update training samples to handle different situations without spending much computational cost. For these reasons, we select the ABA as the classifier of the facial expression recognition system. The throughput obtained is from 5 to 8 frames per second, and the performance of the system is very satisfactory, with a recognition rate of more than 90%. Hence, the facial expression recognition system we proposed is quite close to a real-time one.

Some future works are worth investigating to attain better performance. In our current feature extraction procedure, only color and edge cues are adopted, and we will focus on adding other cues, such as the texture features of a human face, to make it more robust. In the facial expression classification procedure, if the number of expressions to be recognized increases, the execution time will increase with it. We will replace the AdaBoost-based binary classifier with one capable of classifying more than two classes to overcome this problem. Moreover, in the facial expression classification procedure, the crux of the matter is that people's expression changes usually have continuity over time. If we can consider the time information in this procedure, it will raise the overall reliability of the facial expression recognition system. In the near future, we will employ the techniques of hidden Markov models (HMMs) [22] or conditional random fields (CRFs) [23] to improve the accuracy of facial expression recognition.

ACKNOWLEDGMENT

The authors would like to thank the National Science Council of Taiwan for its support in part under Grant NSC95-2213-E-011-105.

REFERENCES

[1] Y. Sakagami et al., "The intelligent ASIMO: System overview and integration," in Proc. of the IEEE/RSJ Int. Conf. on Intell. Robots Syst., Saitama, Japan, vol. 3, pp. 2478-2483, 2002.
[2] G. L. Foresti, C. Micheloni, L. Snidaro, and C. Marchiol, "Face detection for visual surveillance," in Proc. of the 12th IEEE Int. Conf. on Image Anal. Process., Udine, Italy, pp. 115-120, 2003.
[3] Z. Zhang, M. Lyons, M. Schuster, and S. Akamatsu, "Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron," in Proc. of the IEEE Int. Conf. on Automat. Face Gesture Recognit., Sophia Antipolis, France, pp. 454-459, 1998.
[4] M. H. Yang, D. Kriegman, and N. Ahuja, "Detecting faces in images: A survey," IEEE Trans. on Pattern Anal. Machine Intell., vol. 24, no. 1, pp. 34-58, 2002.
[5] G. Yang and T. S. Huang, "Human face detection in a complex background," Pattern Recognit., vol. 27, no. 1, pp. 53-64, 1994.
[6] K. C. Yow and R. Cipolla, "Feature-based human face detection," Image Vision Comput., vol. 15, no. 9, pp. 712-735, 1997.
[7] Y. Zhu, L. C. De Silva, and C. C. Ko, "Using moment invariants and HMM in facial expression recognition," in Proc. of the 4th IEEE Southwest Symp. on Image Anal. and Interpret., Singapore, pp. 305-309, 2000.
[8] Y. Zhang and Q. Ji, "Active and dynamic information fusion for facial expression understanding from image sequences," IEEE Trans. on Pattern Anal. Machine Intell., vol. 27, no. 5, pp. 699-714, 2005.
[9] A. Colmenarez, B. Frey, and T. S. Huang, "A probabilistic framework for embedded face and facial expression recognition," in Proc. of the Int. Conf. on Comput. Vision Pattern Recognit., New York, NY, pp. 592-597, 1999.
[10] J. Qin and Z. S. He, "A SVM face recognition method based on Gabor-featured key points," in Proc. of the 4th Int. Conf. on Mach. Learning Cybern., Chongqing, China, pp. 18-21, 2005.
[11] M. S. Bartlett, G. Littlewort, I. Fasel, and J. R. Movellan, "Real time face detection and facial expression recognition: Development and applications to human computer interaction," in Proc. of the Int. Conf. on Comput. Vision Pattern Recognit. Workshop, San Diego, CA, vol. 5, pp. 53-58, 2003.
[12] M. E. Sargin et al., "Prosody-driven head-gesture animation," in Proc. of the Int. Conf. on Acoust., Speech, and Signal Process., vol. 2, Istanbul, Turkey, pp. 677-680, 2007.
[13] K. Suzuki, I. Horiba, and N. Sugie, "Linear-time connected-component labeling based on sequential local operations," Comput. Vision Image Understand., vol. 89, no. 1, pp. 1-23, 2003.
[14] S. Theodoridis and K. Koutroumbas, Pattern Recognition, 3rd Ed., San Diego: Academic Press, 2006.
[15] Y. Freund and R. E. Schapire, "Experiments with a new boosting algorithm," in Proc. of the 13th Int. Conf. on Mach. Learning, Bari, Italy, pp. 148-156, 1996.
[16] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Boca Raton: Chapman and Hall, 1984.
[17] M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, "Coding facial expressions with Gabor wavelets," in Proc. of the 3rd IEEE Int. Conf. on Automat. Face Gesture Recognit., Nara, Japan, pp. 200-205, 1998.
[18] T. Kanade, J. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," in Proc. of the 4th IEEE Int. Conf. on Automat. Face Gesture Recognit., Pittsburgh, PA, pp. 46-53, 2000.
[19] P. Ekman, J. Hager, C. H. Methvin, and W. Irwin, "Ekman-Hager facial action exemplars," unpublished.
[20] T. Sim, S. Baker, and M. Bsat, "The CMU pose, illumination, and expression (PIE) database of human faces," Tech. Report CMU-RI-TR-01-02, Robotics Inst., Carnegie Mellon Univ., Pittsburgh, PA, 2001.
[21] M. H. Wu, "A facial expression recognition system based on the facial landmarks extracted from an image sequence," Master Thesis, Dept. of Comput. Sci. and Inform. Eng., Nat. Taiwan Univ. of Sci. and Tech., Taipei, Taiwan, 2008.
[22] L. R. Rabiner and B. H. Juang, "An introduction to hidden Markov models," IEEE Acoust. Speech Signal Process. Mag., vol. 3, no. 1, pp. 4-16, 1986.
[23] Y. Wang and Q. Ji, "A dynamic conditional random field model for object segmentation in image sequences," in Proc. of the IEEE Comput. Soc. Conf. on Comput. Vision Pattern Recognit., vol. 1, New York, NY, pp. 264-270, 2005.