Truncated Gray-Coded Bit-Plane Matching Based Motion Estimation and its Hardware Architecture

IEEE Transactions on Consumer Electronics, Vol. 55, No. 3, August 2009

Anıl Çelebi, Student Member, IEEE, Orhan Akbulut, Oğuzhan Urhan, Member, IEEE, Sarp Ertürk, Member, IEEE

Abstract: This paper proposes an efficient motion estimation approach based on a low bit-depth representation, which is particularly suitable for low-power consumer electronics devices. In the proposed approach, motion estimation is carried out using bit-truncated Gray-coded image pixels. The corresponding hardware architecture is also designed and presented in this paper to show the effectiveness of the proposed approach. It is shown that the proposed approach provides improved motion estimation accuracy compared to conventional bit-truncation based approaches that are applied directly to binary-coded pixel values. The proposed approach uses simple Gray coding, which has very low complexity and can be applied on a pixel-by-pixel basis. Hence, the comparatively more complex transformation processes required in One-Bit Transform or Two-Bit Transform based low bit-depth representation ME approaches are avoided. Experimental results show that the proposed approach also outperforms such low bit-depth representation based motion estimation methods previously presented in the literature, in terms of motion estimation accuracy.¹

Index Terms: Motion estimation, Gray coding, bit truncation, hardware architecture, systolic arrays.

I. INTRODUCTION

Recent increases in the computational processing capabilities of microprocessors, as well as highly effective dedicated hardware implementations, together with emerging video coding standards, have increased the number of video applications in consumer electronics devices. Moreover, widespread Internet access and the increased capacity of storage media have contributed to increases in video content distribution.
However, mobile devices typically have limited processing capabilities and battery life, and therefore require low-complexity video compression techniques to enable efficient transmission of captured video over bandwidth-limited channels. Many consumer electronics devices, such as mobile phones and camcorders, typically use H.263 [1] or H.264/AVC [2] based techniques for video compression.

¹ This work was supported by the Scientific and Technical Research Council of Turkey (TUBITAK) and Korean Research Foundation (KRF) Cooperation Program project entitled "Low-Complexity Motion Estimation Techniques and Their System-on-Chip Implementation" (TUBITAK project number 107E179). The authors are with the Kocaeli University Laboratory of Image and Signal Processing (KULIS), Department of Electronics and Telecommunication Engineering, University of Kocaeli, 41040 Kocaeli, Turkey (e-mail: anilcelebi@kocaeli.edu.tr, orhanakbulut@gmail.com, urhano@ieee.org, sarp@ieee.org). Contributed Paper. Manuscript received June 28, 2009. 0098-3063/09/$20.00 © 2009 IEEE

Motion estimation (ME) is the most complex and processing-power-intensive part of the encoder. Although various low-complexity ME approaches have been presented in the literature to reduce the computational complexity of ME, only a few efficient hardware implementations of such approaches have been proposed.

One-bit transform (1BT) based ME has been proposed in [3] as a low-complexity ME approach for video compression. In 1BT, video frames are converted into a single bit-plane by comparing them with their multi-bandpass filtered version. In 1BT based ME, block matching is performed using Exclusive-OR (EX-OR) matching of bit-planes. ME using EX-OR matching of 1BTs enables an efficient hardware implementation compared to conventional SAD (sum of absolute differences) matching, decreasing the computational load by nearly sixteen times at the expense of some accuracy loss.
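As a minimal sketch of the EX-OR matching step only (the multi-bandpass filtering that produces the 1BT planes in [3] is not shown), the number of non-matching points between two binary blocks can be obtained by EX-OR-ing the bit rows and counting the ones. Packing each block row into a Python integer is an assumption made here for compactness, and the function name `nnmp` is hypothetical.

```python
def nnmp(cur_rows, ref_rows):
    """Number of non-matching points between two binary blocks.

    Each block is a list of rows and each row is an int whose bits hold the
    one-bit-transformed pixels of that row (a packing assumed for this sketch).
    """
    # EX-OR marks the positions where the two bit-planes disagree; counting
    # the ones in every row and summing gives the matching error.
    return sum(bin(c ^ r).count("1") for c, r in zip(cur_rows, ref_rows))


print(nnmp([0b1010, 0b1111], [0b1000, 0b0000]))  # 1 + 4 mismatches -> 5
```

The candidate displacement with the smallest count is then selected, in the same way that the minimum SAD would be selected in conventional matching.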
A multiplication-free one-bit transform (MF1BT) is proposed in [4] to decrease the computational load of the 1BT presented in [3] by making use of a novel multiplication-free kernel to obtain the bit-planes. In [5], a two-bit transform (2BT) based ME approach is introduced to improve the performance of 1BT based methods using an additional bit-plane, thereby increasing the computational load compared to 1BT. A constrained one-bit transform (C-1BT) based ME approach is proposed in [6], which introduces a constraint mask so that only the 1BTs of reliable pixels are used in the matching process. The C-1BT based ME approach is shown to provide improved matching compared to other 1BT and 2BT based ME approaches.

Low bit-depth matching based ME approaches can also be combined with additional complexity reduction methods for a further reduction in ME complexity. In [7], a partial distortion search approach combined with a sparse search point approach is utilized with C-1BT based ME to further reduce the computational load of software implementations, at the expense of a slight accuracy loss. An early termination scheme is combined with MF1BT based ME in [8] to further reduce computational complexity.

Various hardware implementations of ME algorithms have been proposed in the literature. It is stated in [9] that one of the most important aspects influencing power consumption in ME hardware architectures is the system memory bandwidth: a higher system memory bandwidth increases power consumption. Most hardware architectures proposed for the block matching algorithm (BMA) in ME are designed using parallel architectures based on systolic or semi-systolic arrays [10-14]. In [10], a hardware architecture utilizing SAD reuse to reduce the system memory bandwidth for variable block size motion estimation is proposed. An efficient hardware architecture for variable block size ME is obtained in [10] by removing the data dependency between the sub-partitions of macroblocks and modifying the prediction flow accordingly. In [11], a detailed architectural analysis of variable block size ME for H.264/AVC is presented. Instead of using a 1D adder tree as in [10], a 2D adder tree architecture that increases parallelism and a new search scheme that improves data reuse are utilized in [11]. In [12], a hardware-oriented fast ME algorithm is proposed with intra-/inter-candidate data reuse considerations. In [13], an adaptive search range algorithm is proposed for the software side and a SAD-tree based architecture is introduced for the hardware side, forming a software/hardware co-solution that achieves the high-throughput ME required for H.264/AVC HDTV.

The common purpose of these hardware implementations is to reduce complexity and/or power consumption. One way to reduce processing complexity and/or power consumption is to reduce the amount of data to be processed. Therefore, architectures that make use of low bit-depth representations such as 1BT and 2BT can provide efficient hardware implementations. Several hardware implementations of binary ME approaches are presented in [3, 14-18]. In [3], all-binary ME using 1BT and a hardware architecture based on a 1D linear PE (processing element) array are presented. An all-binary hierarchical ME approach and the corresponding hardware design are presented in [14]. In [15], a platform-based implementation of the approach proposed in [14] is presented with a bus-interleaved architecture. A fast binary ME algorithm for MPEG-4 shape encoding is presented in [16] together with its hardware architecture. In [17], hardware architectures for 1BT based ME methods are proposed together with an efficient data flow scheme, with which the power consumption is reduced by about 50% compared to the hardware architecture proposed in [3].
Recently, an extension of the 1BT based ME hardware architecture presented in [17] to the sub-pixel level was proposed in [18].

The number of arithmetic operations carried out in the PE array and adder structures is another aspect influencing the complexity and power consumption of ME hardware architectures. In the hardware architecture presented in [13], a 2D PE array composed of 256 PEs is used with a SAD tree and a variable block size (VBS) adder tree, requiring 6368 full adders (FAs) in total. On the other hand, the 1BT based ME architecture presented in [17] requires a total of only 199 FAs.

Instead of using all 8 bits of the pixel values for ME, it is proposed in [19] to use bit truncation, that is, to utilize only a certain number of the most significant bits (MSBs) and truncate the lower bits, in order to reduce the computational load. Alternatively, an adaptive pixel bit-depth reduction technique based ME approach and its VLSI architecture are presented in [9]; however, in this case additional processing is required to obtain the reduced bit-depth representations, compared to simple bit truncation. Bit truncation is furthermore applied to variable block size motion estimation in [20]. Bit-truncation based ME hardware architectures are proposed in the literature to reduce the computational complexity of the 8-bits/pixel based BMA at the expense of some loss in ME accuracy [19, 20, 22]. In [19], the VLSI implementation of bit-truncation based ME is accomplished using a well-known parallel architecture proposed in [21]. A variable-length bit truncation technique based ME approach and its VLSI architecture are proposed in [22]. However, the hardware complexity of the architecture presented in [22] is relatively high compared to low bit-depth based ME architectures, because the PE architecture proposed in [22] is designed to process both low bit-depth and full bit-depth pixels. In [23], Gray-coded pixel values are used to obtain global motion in image sequences for video stabilization. This approach is employed for block matching based ME in [24] to investigate its possible utilization in ME for video coding.
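To make the idea of bit truncation concrete, the following sketch computes a SAD on pixels whose NTB least significant bits have been discarded, so that only 8 − NTB bits per pixel enter the arithmetic. It illustrates the general principle only and is not the specific datapath of [19], [20], or [22]; the function name `truncated_sad` and the flat pixel lists are assumptions.

```python
def truncated_sad(cur, ref, ntb):
    """SAD over two equally long pixel lists after dropping ntb LSBs per pixel."""
    # A right shift by ntb keeps the (8 - ntb) most significant bits.
    return sum(abs((c >> ntb) - (r >> ntb)) for c, r in zip(cur, ref))


cur = [127, 200]
ref = [128, 201]
print(truncated_sad(cur, ref, 0))  # full 8-bit SAD: 1 + 1 = 2
print(truncated_sad(cur, ref, 5))  # 3 MSBs only: |3 - 4| + |6 - 6| = 1
```

With ntb = 5, each absolute difference operates on 3-bit operands instead of 8-bit ones, which is where the hardware savings come from.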
This paper proposes to employ bit truncation on Gray-coded pixel values for low-complexity ME, and it is shown that this approach provides improved prediction performance compared to existing low bit-depth representation based ME approaches such as 1BT, 2BT, MF1BT, and C-1BT. Furthermore, the proposed approach significantly simplifies the binarization process, which is comparatively more complex in 1BT based ME methods due to the filtering process. It is also shown that the proposed method outperforms conventional bit-truncation based ME approaches in terms of ME accuracy. The proposed approach enables a low-complexity and power-efficient ME hardware implementation.

II. TRUNCATED GRAY-CODED BIT-PLANE MATCHING BASED MOTION ESTIMATION

Gray-coding based block motion estimation is presented in [24] to reduce the computational load of the motion estimation process, particularly in hardware implementations. In this paper, it is proposed to use bit truncation with Gray coding to further reduce ME complexity and, at the same time, facilitate efficient hardware design.

A pixel value that is quantized to $2^K$ grey levels and located at $(x, y)$ of frame $f$ at time $t$ can be represented in the form

$$f^t(x,y) = a_{K-1}2^{K-1} + a_{K-2}2^{K-2} + \cdots + a_1 2^1 + a_0 2^0 \qquad (1)$$

where the coefficients $a_k$ represent the natural binary code and take only binary values. If the $k$th bit-plane of frame $t$ is represented as $b_k^t(x, y)$, it contains all $a_k$ bits of level $k$. If a bit depth of 8 bits/pixel is used, $K$ is 8, $b_0^t(x, y)$ is the least significant bit-plane, and $b_7^t(x, y)$ is the most significant bit-plane. The Gray-coded version of a pixel value can be computed from its natural binary code as

$$g_{K-1} = a_{K-1}, \qquad g_k = a_k \oplus a_{k+1}, \quad 0 \le k \le K-2 \qquad (2)$$

where $\oplus$ denotes the EX-OR operation. Since the Gray codes of adjacent grey levels differ in only a single bit, Gray-coded pixel values are more appropriate for EX-OR matching based ME.

In block matching, the original block of the current frame is searched for inside a search window in the reference frame (which is usually the previous frame), using a certain similarity measure. In Gray-coded bit-plane matching based ME, the similarity between the current block of size $N \times N$ pixels located in frame $t$ and the reference block located in frame $t-1$ can be calculated using a correlation measure $CM_G$, defined as

$$CM_G(m,n) = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\sum_{k=0}^{K-1} 2^{k}\left\{ g_k^{t}(i,j) \oplus g_k^{t-1}(i+m,\, j+n) \right\}, \quad -s \le m,n \le s-1 \qquad (3)$$

where $(m, n)$ and $s$ denote the candidate displacement and the search range, respectively. The displacement resulting in the lowest $CM_G$ value is assigned as the motion vector of the current block. Note that a scaling factor of $2^k$ is utilized to include the weight of level $k$ when computing the similarity measure, so that higher-order bit-planes have higher weight.

The proposed truncated Gray-coded bit-plane matching approach does not use all $K$ bit-planes; it uses only the highest $M$ bit-planes to compute the similarity measure. If the number of truncated bits is denoted as NTB, then the highest $M = K - NTB$ bit-planes are used in the matching process. Thus, the new correlation metric $CM_{TG}$ is defined as

$$CM_{TG}(m,n) = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\sum_{k=NTB}^{K-1} 2^{k-NTB}\left\{ g_k^{t}(i,j) \oplus g_k^{t-1}(i+m,\, j+n) \right\}, \quad -s \le m,n \le s-1 \qquad (4)$$

Note that, compared to (3), essentially only the lower boundary of one of the summations (together with the weight exponent) is changed in the similarity measure computation of the proposed approach, but this provides an important reduction in computations. The measure can also be computed with binary operations only, since the multiplication in (4) is by a power of 2 and can therefore be carried out using shift operations. The ME performance of the proposed method for different NTB values is provided in the experimental results section.
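The following Python sketch evaluates (2) and (4) directly and performs a full search over −s ≤ m, n ≤ s − 1. It is a software model for illustration under several assumptions that are not taken from the paper: frames are lists of rows of 8-bit integers, i and m index rows while j and n index columns, the whole search window lies inside the reference frame, and ties keep the first candidate in raster order. The function names are hypothetical.

```python
def gray(value):
    """Eq. (2): g_{K-1} = a_{K-1}, g_k = a_k XOR a_{k+1}."""
    return value ^ (value >> 1)


def cm_tg(cur, ref, by, bx, m, n, N=16, K=8, ntb=5):
    """Eq. (4): weighted Gray-coded bit mismatches over bit-planes NTB..K-1."""
    total = 0
    for i in range(N):
        for j in range(N):
            diff = gray(cur[by + i][bx + j]) ^ gray(ref[by + i + m][bx + j + n])
            for k in range(ntb, K):
                total += ((diff >> k) & 1) << (k - ntb)  # weight 2^(k - NTB)
    return total


def full_search(cur, ref, by, bx, s=16, N=16, ntb=5):
    """Return ((m, n), cost) minimizing CM_TG over -s <= m, n <= s - 1."""
    best_mv, best_cost = None, None
    for m in range(-s, s):
        for n in range(-s, s):
            cost = cm_tg(cur, ref, by, bx, m, n, N=N, ntb=ntb)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (m, n), cost
    return best_mv, best_cost


# Adjacent grey levels 127 and 128 in the 3 MSB planes (NTB = 5):
print(bin((127 ^ 128) >> 5).count("1"))              # natural binary: 3 bits differ
print(bin((gray(127) ^ gray(128)) >> 5).count("1"))  # Gray code: 1 bit differs

# Toy check: the current frame is the reference shifted by (m, n) = (1, -2).
W = 12
ref = [[32 * ((5 * y + 3 * x) % 8) for x in range(W)] for y in range(W)]
cur = [[ref[y + 1][x - 2] if y + 1 < W and x >= 2 else 0 for x in range(W)]
       for y in range(W)]
print(full_search(cur, ref, by=4, bx=4, s=2, N=4))   # ((1, -2), 0)
```

Setting `ntb=0` turns the inner loop into the full measure $CM_G$ of (3); with `ntb=5`, only the planes $g_7$, $g_6$, $g_5$ contribute, with weights 4, 2, and 1.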
Experimental results show that NTB = 5 gives the best performance in terms of motion estimation accuracy while also taking complexity into account; in this case only the three most significant Gray-coded bit-planes are utilized for ME. The hardware design is therefore carried out for NTB = 5.

III. HARDWARE DESIGN

Because of the binary nature of the proposed algorithm, a 1D systolic array architecture is sufficient to provide real-time processing performance. The architectures proposed for 8-bits/pixel representation based ME algorithms mostly utilize 2D systolic arrays to process video in real time. A data throughput similar to that of an 8-bits/pixel representation based 2D systolic array architecture is achievable with only a 1D systolic array for a binary ME method, since each PE in a binary ME hardware architecture can process multiple pixels in each clock cycle [3].

[Fig. 1. Hardware architecture of the proposed approach. Labels: Current Block (16 rows), Search Window (1504 rows), S0/S1, PE Array, Comparator, Control Logic.]

[Fig. 2. PE array: PE 0 to PE 15 connected in a chain, with outputs CM_0, CM_1, CM_2, ..., CM_15 whose widths grow from 7 bits to 11 bits.]

The overall hardware architecture proposed in this paper is shown in Fig. 1. The implemented architecture is capable of performing ME at the macroblock level (16×16 pixels) for a search range of [−16, 15] pixels. A RAM block that is 48 bits wide and 16 rows deep is used to store the current macroblock (the macroblock size is 16×16 pixels, and the three most significant bits of the Gray-coded pixel values are used in the matching process, giving 16 × 3 = 48 bits per row). A dual-port RAM block that is 48 bits wide and 1504 = 47 (for row scan) × 32 (for column scan) rows deep is used to store the search window. Let S_{i,j} denote the 48-bit wide vector formed by the three Gray-coded MSBs of the 16 search-window pixels starting at column j on row i. Two different row vectors of the search window are needed during the operation of the proposed hardware architecture; thus, a dual-port memory is used to store the search window pixels.
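A sketch of the memory word layout described above: each 48-bit word holds the three Gray-coded MSBs of 16 pixels, and the search-window RAM holds one word per (row position, column position) pair. Placing pixel p in bits 3p to 3p + 2 (pixel 0 in bits 0 to 2) is our reading of the concatenation order used for S_{0,0}, and `pack_row` is a hypothetical helper; the memory totals reproduce the figures quoted for this architecture.

```python
def pack_row(pixels, ntb=5):
    """Pack the (8 - ntb) Gray-coded MSBs of each pixel into a single word.

    Assumed layout: pixel p occupies bits [(8 - ntb) * p, (8 - ntb) * p + 7 - ntb].
    """
    width = 8 - ntb
    word = 0
    for p, v in enumerate(pixels):
        msbs = (v ^ (v >> 1)) >> ntb  # Gray-code the pixel, keep the top planes
        word |= msbs << (width * p)
    return word


ROW_BITS = 16 * 3                  # 48-bit word per 16-pixel row
SW_ROWS = 47 * 32                  # 1504 words: 47 row x 32 column positions
implemented_bits = SW_ROWS * ROW_BITS + 16 * ROW_BITS
theoretical_bits = 3 * 47 * 47 + 16 * ROW_BITS

print(SW_ROWS, implemented_bits, theoretical_bits)  # 1504 72960 7395
```

Under the same assumption, a search-window word S_{i,j} would be `pack_row(window[i][j:j + 16])`, so the 32 column positions of each of the 47 rows are stored as separate words, which explains why the implemented memory is much larger than the theoretical minimum.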
ME at the macroblock level with a search range of [−16, 15] pixels requires a search window of 47×47 pixels. Therefore, the theoretical minimum size of the search window memory of the proposed ME architecture is 3 bits/pixel × 47 × 47 = 6,627 bits. Including the current block memory (16 × 48 = 768 bits), this leads in theory to a minimum total on-chip memory of 7,395 bits. In the implementation, however, the total memory used for the proposed architecture is (1504 × 48) + (16 × 48) = 72,960 bits. This is much higher than the theoretical minimum; it is possible to reduce this size by designing additional data scheduling hardware, but this is out of the scope of this paper. For comparison, the total on-chip memory is 24.32 kbits in [17] for 1BT based BMA hardware, where a search window memory bigger than the theoretical minimum is also used for the reasons mentioned above. On the other hand, the on-chip memory used in [11] is 208 kbits for ME hardware with 8-bits/pixel representations.

[Table I. Data flow scheme of the proposed architecture.]

The PE array consists of 16 PEs, as shown in Fig. 2. All PEs in the PE array are essentially identical; the only difference is the width of the CM input/output of each PE. The width of the CM output of each PE is determined by the range of values the CM output can take, in other words by the maximum possible value of the sum of the adder tree output and the CM input. The output of the adder tree is fixed at 7 bits. The CM output of the first PE (CM_0) is therefore also 7 bits wide. The CM output of the next PE (CM_1), on the other hand, needs to be 8 bits wide, as it is the sum of two 7-bit values, i.e., the CM value of the previous PE and the output of its own adder tree. Considering the maximum possible values at each stage, the final CM output (CM_15) is 11 bits wide.

The PE architecture of the proposed hardware is shown in Fig. 3. In addition to the CM inputs and outputs, there are three data inputs to each PE: one for the current macroblock (C) and two for the search window data (S1, S2). Three LUTs are used to obtain the EX-OR matching result, one for each Gray-coded bit-plane, because the three MSBs of the Gray-coded pixel values are used in the matching process. Note that the current macroblock data is not shifted through the PEs; instead, each PE uses the corresponding row of the current macroblock to compute the CM value of that row, so that a shared computation scheme is utilized. This can also be seen in the data flow scheme of the proposed architecture shown in Table I. The multiplexer (Mux) block in the PE architecture selects the correct search data depending on the control signal coming from the control block, according to the data flow scheme shown in Table I.
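The CM widths in Fig. 2 follow from the worst case at each stage. Assuming every one of the 16 pixels in a row mismatches in all three planes, one adder tree produces at most 16 × (4 + 2 + 1) = 112, which needs 7 bits, and PE p has accumulated p + 1 such rows. A short check (the helper name `cm_widths` is hypothetical):

```python
ADDER_TREE_MAX = 16 * (4 + 2 + 1)  # 16 pixels, plane weights 4, 2, 1 -> 112


def cm_widths(num_pes=16):
    """Bit width of CM_p, the running sum leaving PE p, for p = 0..num_pes-1."""
    # CM_p is the sum of p + 1 adder-tree outputs, so its maximum is 112 * (p + 1).
    return [(ADDER_TREE_MAX * (p + 1)).bit_length() for p in range(num_pes)]


print(cm_widths())
# [7, 8, 9, 9, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11]
```

The first three entries (7, 8, and 9 bits) and the final 11-bit width agree with the values stated for CM_0, CM_1, PE 2, and CM_15.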
The outputs of the Latch and Mux blocks are fed into the EX-OR array to compute their EX-OR distance. The output of the EX-OR array is separated into three parts, corresponding to the three separate bit-planes, according to the significance (weight) of each bit-plane. Because the three most significant bit-planes are used for each Gray-coded pixel value, three 16-bit wide vector groups are constructed, and these groups are then applied to the inputs of the three LUTs. Each LUT in the PE architecture shown in Fig. 3 has two 8-bit wide inputs and two 4-bit wide outputs. The reason for this is to simplify the calculation of the number of ones in the 16-bit wide output vector of the EX-OR matching operation. For a 16-bit wide input, the LUT would need a depth of 2^16, whereas this number is 2^8 for 8-bit wide vectors; thus, two smaller LUTs together with an additional 4-bit adder are sufficient to perform this operation. The outputs of the three LUTs are then summed in the adder tree, taking their corresponding weights into account. The output of the adder tree is 7 bits wide, so as to accommodate the largest possible value. Finally, the output of the adder tree is added to the CM output of the previous PE, and this sum is forwarded to the CM input of the next PE.

Note that in Table I, square brackets indicate that the corresponding data is read from the latch instead of the current macroblock RAM. Thus, for each PE, only a single memory read operation is needed for the current macroblock during the entire motion vector computation stage of a macroblock. In Table I, C_i represents the 48-bit wide vector located in the ith row of the current macroblock, comprising the three MSBs of the Gray-coded values of the 16 pixels in the ith row. S_{i,j} represents the 48-bit wide vector located in the ith row of the search window, starting at column j. For example, S_{0,0} denotes the three MSBs of the 16 pixels in the 0th row, concatenated in the form {(S_{0,0} bits 0-2), (S_{0,0} bits 3-5), ..., (S_{0,0} bits 45-47)}. Each PE in the PE array shown in Fig. 2 can process 16 pixels in one clock cycle, so that 16×16 pixels can be processed in each cycle when all of the PEs in the array are utilized.
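The per-row work of one PE can be modelled in software as below: EX-OR the 48-bit current-block and search-window words, split the result into three 16-bit plane vectors, count the ones in each with two 256-entry lookup tables, and weight the counts by shifts. The bit layout (pixel p in bits 3p to 3p + 2, most significant plane in bit 3p + 2) and all names are assumptions for this sketch, not a transcription of the RTL.

```python
# 2^8-entry table; every entry fits in 4 bits (maximum 8).
POPCOUNT_LUT = [bin(v).count("1") for v in range(256)]


def popcount16(x):
    """Ones in a 16-bit vector via two 8-bit lookups and one small addition."""
    return POPCOUNT_LUT[x & 0xFF] + POPCOUNT_LUT[(x >> 8) & 0xFF]


def split_planes(word48):
    """De-interleave a 48-bit word into three 16-bit bit-plane vectors."""
    planes = [0, 0, 0]
    for p in range(16):
        field = (word48 >> (3 * p)) & 0b111
        for b in range(3):
            planes[b] |= ((field >> b) & 1) << p
    return planes  # planes[2] holds the most significant Gray-coded plane


def pe_row_cost(c_word, s_word):
    """Adder-tree output for one row: weighted mismatch count, at most 112."""
    d0, d1, d2 = split_planes(c_word ^ s_word)
    return (popcount16(d2) << 2) + (popcount16(d1) << 1) + popcount16(d0)


print(pe_row_cost(0, (1 << 48) - 1))  # every bit differs -> 64 + 32 + 16 = 112
```

The shifts by 2 and 1 correspond to the weights $2^{k-NTB}$ of (4) for the planes $g_7$ and $g_6$, and 112 is the largest value the 7-bit adder tree output has to hold.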
As shown in Table I, 15 clock cycles are needed for the PE array to become fully functional.
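The 15-cycle offset is the fill latency of a 16-stage chain. In the simplified model below (an assumption, not a transcription of Table I), PE p handles row p of the block for candidate c at cycle c + p, so a new candidate enters every cycle and the CM value of candidate c leaves PE 15 at cycle c + 15. With 32 × 32 = 1024 candidate displacements for the [−16, 15] search range, the model gives 1024 + 15 cycles per macroblock, consistent with the cycle count stated for this hardware. The function name is hypothetical.

```python
def systolic_cycles(num_pes=16, candidates=1024):
    """Total cycles for a 1D chain of num_pes PEs starting one candidate per cycle.

    Simplified model: PE p works on block row p of candidate c at cycle c + p.
    """
    busy = {}
    for c in range(candidates):
        for p in range(num_pes):
            slot = (p, c + p)
            if slot in busy:  # would mean a PE is assigned two rows in one cycle
                raise ValueError(f"conflict at PE {p}, cycle {c + p}")
            busy[slot] = c
    last_cycle = max(cycle for _, cycle in busy)
    return last_cycle + 1  # cycles are counted from 0


candidates = (2 * 16) ** 2  # m, n in [-16, 15]
print(candidates, systolic_cycles(16, candidates))  # 1024 1039
```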

[Fig. 3. PE architecture of the proposed hardware. Inputs CM_{i-1}, C, S1, and S2; blocks: Latch, Mux, EX-OR array, three LUTs with 4-bit outputs, shifts by 1 and 2 bits, adder tree (7-bit output), and CM accumulator; output CM_i.]

The hardware needs 1024 clock cycles for the computation of the motion vector of one macroblock, in addition to this 15-clock-cycle offset. More detailed information about this data flow scheme, which is typical of 1D systolic array architectures, is available in [25].

The total number of full adders needed for the PE array of the proposed architecture is 523, where 16 × 23 = 368 full adders are used in the adder trees and 153 full adders are used in the CM accumulation stage. For comparison, the total full adder count of the MF1BT [4] based ME hardware architecture presented in [17] is 153, as only a single bit-plane is used in the matching process (but the ME accuracy is lower). On the other hand, the hardware architecture presented in [11] requires 6368 full adders to compute the SAD of a macroblock. Therefore, the hardware complexity of the proposed approach lies between those of 1BT based architectures and 8-bits/pixel representation based architectures, but is close to that of 1BT based architectures.

Synthesis of the proposed architecture is performed using the Synplicity Synplify Pro synthesis tool with all advanced features, such as retiming and pipelining, turned off, so as to allow comparison with the architecture presented in [17] for the ME approach proposed in [4]. The total number of CLBs (Configurable Logic Blocks) occupied by the proposed architecture on a Xilinx XC2VP30 device is 2339, while this number is 690 for the architecture presented in [17]. Therefore, the hardware size increases roughly 3.4 times, whereas the depth of the image pixels used in the matching process increases 3 times. Thus, the hardware complexity appears to increase approximately linearly with the pixel bit depth.

IV. EXPERIMENTAL RESULTS

In the experimental setup, the motion estimation performance of the proposed truncated Gray-coded bit-plane matching based ME approach (T-GBPM) is first evaluated using an open-loop scheme, in which the current frame of the video sequence is reconstructed from the previous frame using the motion vectors obtained by the ME approach. The similarity between the original and the estimated frames is computed in terms of Peak Signal-to-Noise Ratio (PSNR). Six different video sequences are utilized in the experiments to properly assess the performance of the proposed approach. Average PSNR values for the test sequences are given in Table II. Here, T-BPM represents the conventional truncated bit-plane matching approach presented in [11]. For T-BPM and T-GBPM, experimental results are provided for various NTB values. The results show that the proposed T-GBPM based ME approach outperforms conventional T-BPM based ME for all NTB values, with a PSNR increase as high as 0.5 dB. These results show that Gray coding improves the performance of truncated bit-plane matching. The experimental results also show that the proposed T-GBPM based ME approach with NTB = 5 provides higher PSNR values than other low bit-depth ME approaches such as C-1BT or 2BT.

TABLE II. AVERAGE PSNR VALUES (dB) FOR SEVERAL TEST SEQUENCES USING AN OPEN-LOOP SCHEME

Method | Football (125 frames) | Flowergarden (115 frames) | Mobile (140 frames) | Tennis (112 frames) | Coastguard (352×288, 300 frames) | Foreman (352×288, 300 frames)
SAD, 8 bits/pixel | 22.88 | 23.79 | 22.99 | 29.87 | 30.48 | 32.11
1BT [3] | 21.83 | 23.32 | 22.71 | 28.77 | 29.84 | 30.44
2BT [5] | 22.08 | 23.43 | 22.72 | 28.89 | 29.93 | 30.71
MF1BT [4] | 21.81 | 23.26 | 22.73 | 28.78 | 29.88 | 30.38
C-1BT [6] | 22.10 | 23.39 | 22.77 | 29.18 | 29.98 | 30.87
T-BPM [11] (NTB=6) | 22.08 | 23.49 | 22.72 | 28.57 | 28.97 | 30.35
T-BPM [11] (NTB=5) | 22.22 | 23.49 | 22.75 | 28.68 | 29.85 | 30.86
T-BPM [11] (NTB=4) | 22.22 | 23.49 | 22.76 | 28.70 | 29.95 | 31.04
T-BPM [11] (NTB=3) | 22.21 | 23.48 | 22.76 | 28.70 | 29.95 | 31.08
T-BPM [11] (NTB=2) | 22.20 | 23.48 | 22.76 | 28.70 | 29.95 | 31.09
T-GBPM (NTB=6) | 22.39 | 23.61 | 22.84 | 28.98 | 29.14 | 30.64
T-GBPM (NTB=5) | 22.59 | 23.67 | 22.86 | 29.19 | 30.16 | 31.32
T-GBPM (NTB=4) | 22.58 | 23.66 | 22.87 | 29.26 | 30.27 | 31.57
T-GBPM (NTB=3) | 22.56 | 23.66 | 22.87 | 29.23 | 30.26 | 31.61
T-GBPM (NTB=2) | 22.56 | 23.66 | 22.87 | 29.23 | 30.25 | 31.61
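A minimal sketch of the open-loop evaluation described above: each block of the current frame is predicted from the previous original frame using its motion vector, and the prediction is compared with the original current frame using PSNR with a peak value of 255. Frames as lists of rows, the dictionary of motion vectors keyed by block corner, and the helper names are assumptions; averaging such per-frame values over a sequence gives average PSNR figures of the kind listed in Table II.

```python
import math


def motion_compensate(prev, mvs, N=16):
    """Open-loop prediction: copy each N x N block from the previous frame.

    mvs maps the top-left corner (by, bx) of a block to its vector (m, n);
    every displaced block is assumed to lie inside the previous frame.
    """
    pred = [[0] * len(prev[0]) for _ in prev]
    for (by, bx), (m, n) in mvs.items():
        for i in range(N):
            for j in range(N):
                pred[by + i][bx + j] = prev[by + i + m][bx + j + n]
    return pred


def psnr(orig, recon, peak=255):
    """PSNR in dB between two equally sized frames."""
    se = sum((a - b) ** 2 for ra, rb in zip(orig, recon) for a, b in zip(ra, rb))
    mse = se / (len(orig) * len(orig[0]))
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


prev = [[4 * y + x for x in range(4)] for y in range(4)]
pred = motion_compensate(prev, {(0, 0): (2, 2)}, N=2)
print(pred[0][:2], pred[1][:2])  # [10, 11] [14, 15]
print(round(psnr([[0]], [[1]]), 2))  # 48.13
```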

This increase in PSNR can be attributed to the fact that three bit-planes are used in the matching process for T-GBPM based ME with NTB = 5, while only two bit-planes are used in the matching process in the case of C-1BT and 2BT based ME. The proposed T-GBPM also has a lower binarization complexity than 1BT and 2BT based approaches.

The proposed hardware is coded in the Verilog hardware description language and verified for a clock frequency of 90 MHz using synthesis with the Synplicity Synplify Pro synthesis tool. The synthesized design occupied 2339 LUTs (8%) on a Xilinx XC2VP30 device. The power consumption of the proposed T-GBPM based ME hardware architecture is about 230 mW on average, while the power consumption for T-BPM based ME is about 245 mW on average, for the Mobile sequence at a clock frequency of 66 MHz. Therefore, the proposed T-GBPM based ME architecture results in a small reduction in power consumption compared to T-BPM based ME, in addition to improved ME accuracy. Note that the XPower and ISE Simulator (ISim) tools from Xilinx are used for the power consumption analysis.

The proposed hardware architecture is also synthesized for a 0.18 µm process to compare its performance with available 8-bits/pixel based ME hardware architectures. According to the synthesis results, the gate count of the 1-bit/pixel based hardware architecture proposed in [17] is 8k, while the total number of gates for the hardware architecture proposed in this work is 23k. In [11], where the least significant 3 bits of the pixels are truncated to reduce ME hardware complexity, the total number of gates for fixed block size ME is 88k. In [26], the total gate count utilized for integer motion estimation is 146k gates, which is roughly 6 times the size of the proposed hardware architecture.
Compared to 1-bit/pixel based architectures, the proposed hardware architecture requires about three times more gates, which is a direct result of using 3-bits/pixel instead of 1-bit/pixel representations. However, the proposed hardware architecture has two important advantages. First, the ME accuracy of the proposed approach is much better (the PSNR of the reconstructed frames is higher by up to nearly 1 dB). Second, the proposed approach only requires Gray coding of the pixel values, which is a very low-complexity process that can be applied directly on a pixel-by-pixel basis, whereas 1BT and 2BT based approaches require a comparatively more complex transform process that introduces substantial additional hardware complexity, which is not accounted for in the presented results. Compared to recently proposed 8-bits/pixel based ME architectures utilizing a SAD-based matching criterion, the hardware complexity of the proposed approach is dramatically lower, making the proposed approach particularly favorable for consumer electronics applications that require low complexity and low power consumption.

V. CONCLUSIONS

A novel Gray-coded bit-plane matching based ME approach with bit truncation is proposed in this paper. The proposed truncated Gray-coded bit-plane matching based ME approach is shown to provide improved motion estimation accuracy compared to conventional bit-plane matching based ME. Furthermore, the presented approach also outperforms low bit-depth representation based ME methods, such as 1BT and 2BT based ME, in terms of ME accuracy, and has a lower binarization complexity. The binarization complexity is much lower because the proposed approach uses simple Gray coding, which has very low complexity and can be applied on a pixel-by-pixel basis, whereas 1BT and 2BT based approaches require much more complex transformation processes that add to the hardware complexity. An efficient hardware architecture for the proposed method is designed and verified in this paper.
The proposed approach is particularly suitable for consumer electronics equipment with low processing resources and limited power capabilities.

REFERENCES

[1] ITU-T Recommendation H.263, "Video coding for low bit rate communication," 1996.
[2] Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, "Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC)," JVT-G050, March 2003.
[3] B. Natarajan, V. Bhaskaran, and K. Konstantinides, "Low-complexity block-based motion estimation via one-bit transforms," IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 4, pp. 702-706, Aug. 1997.
[4] S. Ertürk, "Multiplication-free one-bit transform for low-complexity block-based motion estimation," IEEE Signal Process. Lett., vol. 14, no. 2, pp. 109-112, Feb. 2007.
[5] A. Ertürk and S. Ertürk, "Two-bit transform for binary block motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp. 938-946, July 2005.
[6] O. Urhan and S. Ertürk, "Constrained one-bit transform for low-complexity block motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 4, pp. 478-482, April 2007.
[7] O. Urhan, "Constrained one-bit transform based motion estimation using predictive hexagonal pattern," Journal of Electronic Imaging, vol. 16, no. 3, Article ID 033019, July-Sep. 2007.
[8] H. Lee and J. Jeong, "Early termination scheme for binary block motion estimation," IEEE Trans. Consumer Electron., vol. 53, no. 4, pp. 1682-1686, Nov. 2007.
[9] S. Lee, J.M. Kim, and S.I. Chae, "New motion estimation algorithm using adaptively quantized low bit-resolution image and its VLSI architecture for MPEG2 video encoding," IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 6, pp. 734-744, Oct. 1998.
[10] Y.W. Huang, T.C. Wang, B.Y. Hsieh, and L.G. Chen, "Hardware architecture design for variable block size motion estimation in MPEG-4 AVC/JVT/ITU-T H.264," Proc. IEEE International Symposium on Circuits and Systems (ISCAS), vol. 2, pp. 796-799, May 2003.
[11] C.Y. Chen, W.Y. Chien, Y.W. Huang, T.C. Chen, T.C. Wang, and L.G. Chen, "Analysis and architecture design of variable block-size motion estimation for H.264/AVC," IEEE Trans. Circuits Syst., vol. 53, no. 2, pp. 578-593, Mar. 2006.
[12] T.-C. Chen, Y.-H. Chen, S.-F. Tsai, S.-Y. Chien, and L.-G. Chen, "Fast algorithm and architecture design of low-power integer motion estimation for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 5, pp. 568-577, May 2007.
[13] Z. Chen, T. Ikenaga, and S. Goto, "A hardware/software co-solution to achieving high throughput required by motion estimation part in H.264/AVC HDTV real-time application," Proc. IEEE International Symposium on VLSI Design, Automation and Test (VLSI-DAT), pp. 128-131, Apr. 2008.
[14] J.-H. Luo, C.-N. Wang, and T. Chiang, "A novel all-binary motion estimation (ABME) with optimized hardware architectures," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 8, pp. 700-712, Aug. 2002.
[15] S.-H. Wang, W.-L. Tao, C.-N. Wang, W.-H. Peng, and T. Chiang, "Platform based design of all binary motion estimation (ABME) with bus interleaved architecture," Proc. IEEE International Symposium on VLSI Design, Automation and Test, pp. 241-244, Apr. 2005.

[16] E.A. Al Qaralleh, T.-S. Chang, and K.-B. Lee, "An efficient binary motion estimation algorithm and its architecture for MPEG-4 shape encoding," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 7, pp. 859-868, Jul. 2006.
[17] A. Çelebi, O. Urhan, I. Hamzaoğlu, and S. Ertürk, "Efficient hardware implementations of low bit depth motion estimation algorithms," IEEE Signal Process. Lett., vol. 16, no. 6, pp. 513-516, June 2009.
[18] A. Çelebi, O. Akbulut, O. Urhan, I. Hamzaoğlu, and S. Ertürk, "An all binary sub-pixel motion estimation approach and its hardware architecture," IEEE Trans. Consumer Electron., vol. 54, no. 4, Nov. 2008.
[19] Y. Baek, H.S. Oh, and H.K. Lee, "An efficient block-matching criterion for motion estimation and its VLSI implementation," IEEE Trans. Consumer Electron., vol. 42, no. 4, pp. 885-892, Nov. 1996.
[20] A. Bahari, T. Arslan, and A.T. Erdogan, "Low power variable block size motion estimation using pixel truncation," Proc. IEEE International Symposium on Circuits and Systems, pp. 3663-3666, New Orleans, USA, May 2007.
[21] K.M. Yang, M.T. Sun, and L. Wu, "A family of VLSI designs for the motion compensation block-matching algorithm," IEEE Trans. Circuits Syst., vol. 36, no. 2, pp. 1317-1325, 1989.
[22] Z.-L. He, C.-Y. Tsui, K.-K. Chan, and M. L. Liou, "Low-power VLSI design for motion estimation using adaptive pixel truncation," IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 5, pp. 669-678, Aug. 2000.
[23] S.J. Ko, S.H. Lee, S.W. Jeon, and E.S. Kang, "Fast digital image stabilizer based on Gray-coded bit-plane matching," IEEE Trans. Consumer Electron., vol. 45, no. 3, pp. 598-603, Aug. 1999.
[24] O. Urhan and S. Ertürk, "Gray-coded bit-plane matching for block based motion estimation," Proc. 10th Signal Processing and Communications Applications Conference, vol. 1, pp. 518-523, Denizli, Turkey, June 2002 (in Turkish).
[25] V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms and Architectures, 2nd ed., Kluwer Academic Publishers, London, 1997.
[26] Y.-H. Chen, C.-C. Cheng, T.-Z. Chuang, C.-Y. Chen, and L.-G. Chen, "Efficient architecture design of motion-compensated temporal filtering/motion compensated prediction engine," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 1, pp. 98-109, Jan. 2008.

Orhan Akbulut was born in Kütahya, Turkey. He received the B.Sc. and M.Sc. degrees in electronics and telecommunication engineering from Kocaeli University in 2005 and 2007, respectively. He is currently working towards the Ph.D. degree at the Graduate School of Natural and Applied Sciences, Kocaeli University. His major research interests are image and video coding systems.

Oğuzhan Urhan (S'02-M'06) received his B.Sc., M.Sc., and Ph.D. degrees in electronics and telecommunication engineering from the University of Kocaeli, Kocaeli, Turkey, in 2001, 2003, and 2006, respectively. Since 2001 he has been with the Department of Electronics and Telecommunications Engineering, University of Kocaeli, Turkey, where he is currently an Associate Professor. He was a Visiting Professor at Chung-Ang University, Korea, from 2006 to 2007. His research interests include digital signal and image processing, in particular image and video restoration and coding.

Sarp Ertürk (M'99) received his B.Sc. in Electrical and Electronics Engineering from Middle East Technical University, Ankara, in 1995. He received his M.Sc. in Telecommunication and Information Systems and his Ph.D. in Electronic Systems Engineering in 1996 and 1999, respectively, from the University of Essex, U.K. From 1999 to 2001 he carried out his compulsory service at the Army Academy, Ankara. He is currently a Full Professor at Kocaeli University, where he worked as an Assistant Professor between 2001 and 2002 and as an Associate Professor between 2002 and 2007. His research interests are in the areas of digital signal and image processing, video coding, remote sensing, and digital communications.

Anıl Çelebi (S'00) was born in Ordu, Turkey. He received the B.Sc., M.Sc., and Ph.D. degrees in electronics and communication engineering from Kocaeli University, Kocaeli, Turkey, in 2002, 2005, and 2008, respectively. Since 2002 he has been with the Department of Electronics and Telecommunications Engineering, University of Kocaeli, Turkey, where he is currently an Assistant Professor. His research interests include very large scale integration (VLSI) design and implementation for analog/mixed signal systems, image processing systems, and video coding systems.