A Reconfigurable Frame Interpolation Hardware Architecture for High Definition Video

Ozgur Tasdizen and Ilker Hamzaoglu
Faculty of Engineering and Natural Sciences, Sabanci University
34956, Tuzla, Istanbul, Turkey
tasdizen@su.sabanciuniv.edu, hamzaoglu@sabanciuniv.edu

Abstract

Since Frame Rate Up-Conversion (FRC) has started to be used in recent consumer electronics products like High Definition TV, real-time and low cost implementation of FRC algorithms has become very important. Therefore, in this paper, we propose a low cost hardware architecture for real-time implementation of frame interpolation algorithms. The proposed hardware architecture is reconfigurable and it allows adaptive selection of frame interpolation algorithms for each Macroblock. The proposed hardware architecture is implemented in VHDL and mapped to a low cost Xilinx XC3SD1800A-4 FPGA device. The implementation results show that the proposed hardware can run at 101 MHz on this FPGA and consumes 32 BRAMs and 15384 slices.

Keywords: Frame Rate Up-Conversion, Frame Interpolation, Hardware Implementation, FPGA.

I. INTRODUCTION

Frame Rate Up-Conversion (FRC) is the conversion of a lower frame rate video signal to a higher frame rate video signal. LCD panels used for High Definition TV (HDTV) have a frame rate up to 240 Hz, whereas video signals are usually recorded at 24 Hz, 25 Hz, or 30 Hz. Therefore, FRC is required in order to display the HDTV video signals in the LCD panels. FRC can be done by interpolating a new frame between every two consecutive original frames like in 25 Hz to 50 Hz conversion, and it can be done by interpolating three new frames between every two consecutive original frames like in 25 Hz to 100 Hz conversion. In the case of 24 Hz to 60 Hz conversion, the 3:2 pull-down technique is used [1]. FRC for 1:4 conversion ratio is illustrated in Fig. 1. The dashed frames in this figure are the interpolated frames.

Simple FRC techniques like frame repetition and Linear Interpolation (LI) are used in some consumer electronics products.
However, these techniques often produce artifacts to which the human eye is very sensitive. Frame repetition results in motion jerkiness and LI causes blurring at object boundaries [2,3]. To overcome these problems, FRC algorithms using motion information between consecutive frames are developed [2,3]. For example, the Motion Compensated Averaging (MCA) technique performs frame interpolation by using the Motion Vectors (MVs) found by the Motion Estimation (ME) process.

Figure 1. Frame Rate Up-Conversion

The LI and MCA techniques perform frame interpolation as shown in Eq. (1) and Eq. (2) respectively. In these equations, t is the time instance the frame F belongs to, x is the spatial location of the current pixel in the frame, v is the MV, and τ is the time slot the interpolated frame belongs to. For the conversion ratio 1:2, τ will be 0.5 for the interpolated frame, and for the conversion ratio 1:4, τ will be 0.25, 0.5, and 0.75 for the three interpolated frames.

$$F_{LI}(x, t-\tau) = (1-\tau)\,F(x, t-1) + \tau\,F(x, t) \tag{1}$$

$$F_{MCA}(x, t-\tau) = \frac{1}{2}\left[F(x+\tau v, t-1) + F(x-(1-\tau)v, t)\right] \tag{2}$$

ME is not only used for FRC. It is also used in video compression standards such as MPEG4 and H.264, and in video enhancement applications such as de-interlacing and de-noising. However, the motion information required for FRC is not the same as the one required for video coding. While ME for video coding finds the MVs providing the smallest prediction error, ME for FRC requires finding the MVs reflecting the true motion among consecutive frames. Among the ME techniques, block matching is the most preferred method due to its simplicity. Block matching partitions the current frame into non-overlapping rectangular blocks and tries to find a block from a reference frame in a given search range that best matches the current block. Block matching algorithms cause blocking artifacts. Several FRC algorithms are proposed to reduce these blocking artifacts [3].
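As an illustration of Eq. (1) and Eq. (2), the following Python sketch interpolates a single pixel with LI and with MCA. It is only a sketch under stated assumptions: frames are lists of rows of integers, `clamp`, `fetch`, `li_pixel` and `mca_pixel` are hypothetical helper names, out-of-frame accesses are clamped to the border, and the scaled MV components are rounded to whole pixel positions. None of these choices is specified in the paper.

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def fetch(frame, x, y):
    """Read frame[y][x]; border clamping is an assumption of this sketch."""
    height, width = len(frame), len(frame[0])
    return frame[clamp(y, 0, height - 1)][clamp(x, 0, width - 1)]


def li_pixel(prev, cur, x, y, tau):
    """Eq. (1): weighted average of the co-located pixels of frames t-1 and t."""
    return (1 - tau) * fetch(prev, x, y) + tau * fetch(cur, x, y)


def mca_pixel(prev, cur, x, y, mv, tau):
    """Eq. (2): average of the two motion compensated pixels for (x, y)."""
    vx, vy = mv
    # Rounding the scaled MV to whole pixels is an assumption of this sketch.
    px, py = x + round(tau * vx), y + round(tau * vy)
    cx, cy = x - round((1 - tau) * vx), y - round((1 - tau) * vy)
    return (fetch(prev, px, py) + fetch(cur, cx, cy)) / 2


# One-row frames: a bright pixel at column 4 in frame t-1 and column 2 in frame t.
prev = [[0, 0, 0, 0, 100, 0, 0, 0]]
cur = [[0, 0, 100, 0, 0, 0, 0, 0]]
mv = (2, 0)  # in the sign convention of Eq. (2): position(t-1) - position(t)

print(mca_pixel(prev, cur, 3, 0, mv, 0.5))  # 100.0: the object lands at column 3
print(li_pixel(prev, cur, 3, 0, 0.5))       # 0.0: LI misses the object at column 3
print(li_pixel(prev, cur, 4, 0, 0.5))       # 50.0: LI leaves a half-intensity ghost
```

With a correct MV, MCA places the moving pixel halfway along its trajectory, whereas LI only blends the two original positions, which is the blurring effect described above.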

Figure 2. An Example FRC System

Figure 3. MVs Required to Interpolate Current MB(i,j)

In this paper, we assume that the true MVs required by the proposed FRC hardware will be obtained by another hardware, which includes a block matching ME hardware like the one we proposed in [4], and provided to the proposed FRC hardware. An example FRC system is shown in Fig. 2. Analyzing the off-chip memory bandwidth requirement of this FRC system clearly shows that FRC systems require significant data transfer from the off-chip frame memory. Since this FRC system implements a 1:2 conversion ratio, it will interpolate new frames by using one MV per Macroblock (MB) and accessing one MB from the current frame and one MB from the reference frame. Since each color channel is 10 bits, the RGB values of a pixel take 30 bits which can be stored in a 32 bit word in memory. A Full HD frame has 1920x1080 (1.98M) pixels which take 7.92MB. Therefore, 2x7.92MB = 15.84MB have to be accessed from the off-chip frame memory in order to interpolate one frame. The received input frame and the interpolated frame will be stored in the frame memory and they will be sent to the LCD display from the frame memory. Counting these two frame writes and two frame reads in addition to the two frame reads for interpolation, 6x7.92MB = 47.52MB per frame will be accessed from the off-chip frame memory. As can be seen from this example, FRC systems require significant off-chip memory bandwidth.

FRC algorithms such as Adaptive Motion-Compensated Interpolation and Overlapped Block Motion Compensation proposed in [5-10] produce good quality results. However, for interpolating a MB, these algorithms do not only access the MBs in the current and previous frames pointed by the MV for the current MB, they also access the MBs pointed by the MVs of the eight spatially neighboring MBs of the current MB. The MVs required for interpolating MB(i,j) are shown in Fig. 3. In the figure, i and j denote the x and y coordinates of a MB, respectively. The dark shaded MB is the current MB(i,j) and dashed MBs are its non-causal neighboring MBs. Therefore, these FRC algorithms access 9 MBs from the current frame and 9 MBs from the reference frame for interpolating a MB.
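The bandwidth figures of this example can be reproduced with a few lines of Python. This is a hedged sketch: the function names are hypothetical, "M" is taken to mean 2^20 (which is what makes 1920x1080 come out as 1.98M), the pixel count is rounded to two decimals before multiplying as the text appears to do, and the 9-MB case assumes no on-chip data re-use at all, so it is only an upper bound and not a figure from the paper.

```python
MEBI = 2 ** 20  # assumption: the text's "M" is a binary mega


def frame_size_mb(width=1920, height=1080, bytes_per_pixel=4):
    """Frame size in MB, rounding the pixel count to 1.98M as in the text."""
    mpixels = round(width * height / MEBI, 2)  # 1.98 for Full HD
    return mpixels * bytes_per_pixel           # one 32-bit word per pixel


def traffic_per_interpolated_frame_mb(mbs_read_per_source_frame=1):
    """Off-chip traffic in MB for one interpolated frame at a 1:2 ratio."""
    frame = frame_size_mb()
    # Reads from the current and the reference frame for interpolation.
    interpolation_reads = 2 * mbs_read_per_source_frame * frame
    # The received frame and the interpolated frame are written once...
    stores = 2 * frame
    # ...and read back once each for display.
    display_reads = 2 * frame
    return round(interpolation_reads + stores + display_reads, 2)


print(round(frame_size_mb(), 2))             # 7.92
print(traffic_per_interpolated_frame_mb(1))  # 47.52
print(traffic_per_interpolated_frame_mb(9))  # 174.24 (no re-use, upper bound)
```

Under these assumptions, reading 9 MBs per source frame for every interpolated MB raises the traffic from 47.52MB to roughly 174MB per interpolated frame.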
Accessing 18 MBs per interpolated MB significantly increases the off-chip memory bandwidth requirement of an FRC system. Even though the off-chip memory bandwidth required by these FRC algorithms can be reduced by using a large on-chip memory as proposed in [11], real-time implementation of these FRC algorithms for HDTV is very difficult and they require a significant area for the on-chip memory.

Therefore, in this paper, we propose a low cost hardware architecture for real-time implementation of frame interpolation algorithms requiring much lower off-chip memory bandwidth: LI, MCA, Static Median Filtering (SMF), Dynamic Median Filtering (DMF), Soft Switching (SS) and Cascaded Median Filtering (CMF) [3]. The proposed hardware architecture is reconfigurable and it allows adaptive selection of these frame interpolation algorithms for each 16x16 MB. The proposed hardware architecture is implemented in VHDL and mapped to a low cost Xilinx Spartan XC3SD1800A-4 FPGA using Xilinx ISE 9.2.04. It is verified with RTL simulations using Mentor Graphics Modelsim. The implementation results show that the proposed hardware can work at 101 MHz and it consumes 15384 slices and 32 Block RAMs (BRAMs).

Several complete FRC hardware implementations including these frame interpolation algorithms are proposed in [12-15]. However, they do not specify the details of the frame interpolation part of their hardware, and they do not propose a reconfigurable hardware architecture for implementing these frame interpolation algorithms.

The rest of the paper is organized as follows. Section II explains the frame interpolation algorithms implemented by the proposed FRC hardware. Section III describes the proposed reconfigurable FRC hardware architecture. Section IV concludes the paper.

II. FRAME INTERPOLATION ALGORITHMS

FRC by repetition of the original frames results in motion jerkiness and LI causes blurring at object boundaries. MCA is used to overcome these artifacts. However, it introduces blocking artifacts. Blocking artifacts occur at object boundaries when a block contains multiple objects with different motions. An appropriate solution to these local problems is graceful degradation [3].

Graceful degradation methods are SMF, DMF, SS, and CMF. Their equations are shown in (3), (4), (5), and (6) respectively. Their advantages and drawbacks are discussed in detail in [3]. In general, SMF produces good results for stationary scenes; however, it fails for detailed parts of the video. DMF performs better for these parts of video. The drawback of DMF is its tendency to cause serration of edges in highly detailed areas. The block diagrams of SMF and DMF are shown in Fig. 4 and Fig. 5, respectively.

SS is an alternative to the rapid switching of DMF between LI and motion compensated pixels. SS takes the weighted average of motion compensated and non-motion compensated pixels. As a result, switching between LI and MCA becomes softer. As shown in Eq. (5), the weighting mechanism is controlled by a factor k which shows the reliability of the MVs. For reliable MVs, MCA will be preferred and for unreliable MVs, LI will be preferred. SS may result in local motion jerkiness or local blur. CMF combines the strengths of SMF, DMF, and SS by taking the median of these methods. CMF can overcome the problems of these individual methods if controlled carefully.

$$F_{SMF}(x, t-\tau) = \mathrm{median}\left(F(x, t-1),\ F(x, t),\ F_{MCA}(x, t-\tau)\right) \tag{3}$$

$$F_{DMF}(x, t-\tau) = \mathrm{median}\left(F(x+\tau v, t-1),\ F(x-(1-\tau)v, t),\ F_{LI}(x, t-\tau)\right) \tag{4}$$

$$F_{SS}(x, t-\tau) = k\,F_{LI}(x, t-\tau) + (1-k)\,F_{MCA}(x, t-\tau) \tag{5}$$

$$F_{CMF}(x, t-\tau) = \mathrm{median}\left(F_{SMF}(x, t-\tau),\ F_{DMF}(x, t-\tau),\ F_{SS}(x, t-\tau)\right) \tag{6}$$

Figure 4. SMF

Figure 5. DMF

Figure 6. Top-Level Hardware Architecture

Figure 7. On-Chip Memory and Datapath

III. PROPOSED HARDWARE ARCHITECTURE

The top-level block diagram of the proposed frame interpolation hardware architecture is shown in Fig. 6. The proposed hardware architecture implements LI, MCA, SMF, DMF, SS and CMF frame interpolation algorithms and it allows adaptive selection of these algorithms for each MB. The proposed hardware interpolates frames MB by MB. It takes the selected interpolation algorithm and the MV for each 16x16 MB as inputs and performs the frame interpolation.
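The six selectable algorithms can be summarized per pixel in a short Python sketch that follows Eq. (1) to Eq. (6). It is a behavioural illustration only, not the hardware datapath: `median3` and `interpolate_pixel` are hypothetical names, the four input pixels are assumed to have been fetched already, and floating-point arithmetic is used instead of the 10-bit fixed-point arithmetic of the hardware.

```python
def median3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]


def interpolate_pixel(p, c, p_mc, c_mc, k, tau=0.5):
    """Return the result of all six algorithms for one pixel.

    p, c       : co-located pixels F(x, t-1) and F(x, t)
    p_mc, c_mc : motion compensated pixels F(x + tau*v, t-1) and
                 F(x - (1 - tau)*v, t)
    k          : soft switching factor, weighting LI as in Eq. (5)
    """
    li = (1 - tau) * p + tau * c     # Eq. (1)
    mca = (p_mc + c_mc) / 2          # Eq. (2)
    smf = median3(p, c, mca)         # Eq. (3)
    dmf = median3(p_mc, c_mc, li)    # Eq. (4)
    ss = k * li + (1 - k) * mca      # Eq. (5)
    cmf = median3(smf, dmf, ss)      # Eq. (6)
    return {"LI": li, "MCA": mca, "SMF": smf, "DMF": dmf, "SS": ss, "CMF": cmf}


result = interpolate_pixel(p=100, c=200, p_mc=120, c_mc=130, k=0.5)
print(result)
# LI = 150.0, MCA = 125.0, SMF = 125.0, DMF = 130, SS = 137.5, CMF = 130
```

Selecting an algorithm per MB then amounts to picking one entry of the returned dictionary for all 256 pixels of that MB.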
In this paper, we implemented the on-chip memory and the datapath part of this hardware shown in Fig. 7. The input MV to the frame interpolation hardware points to a MB in the current frame and a MB in the reference frame in a range of (±48, ±24) pixels. MVs used in the interpolation process correspond to a larger search range in the ME process. For example, for the conversion ratio 1:2, the MVs with a range of (±48, ±24) pixels used in the interpolation process correspond to a search range of (±96, ±48) pixels in the ME process.
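The (±48, ±24) MV range determines how much of each frame must be reachable on chip. The Python sketch below works through that arithmetic under one assumption that the paper does not state explicitly: the stored window is the 16x16 MB extended by the MV range on every side. With that assumption it reproduces the 112x64 window and the 448-word BRAM depth used by the on-chip memory described in this section. All function names are hypothetical.

```python
MB_SIZE = 16


def on_chip_window(mv_range_x=48, mv_range_y=24, mb_size=MB_SIZE):
    """Width and height in pixels of the area an MV in the given range can reach."""
    return mb_size + 2 * mv_range_x, mb_size + 2 * mv_range_y


def bram_layout(window_width, window_height, brams_per_frame=16):
    """Lines per BRAM and 32-bit words per BRAM (one word per pixel)."""
    lines_per_bram = window_height // brams_per_frame
    return lines_per_bram, lines_per_bram * window_width


def me_search_range_1to2(mv_range_x=48, mv_range_y=24):
    """For a 1:2 ratio (tau = 0.5) the interpolation MV is half the ME MV."""
    return 2 * mv_range_x, 2 * mv_range_y


width, height = on_chip_window()
print(width, height)                # 112 64
print(bram_layout(width, height))   # (4, 448)
print(me_search_range_1to2())       # (96, 48)
```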

Figure 8. Data Stored in the On-Chip Memory

Figure 9. MB Schedule

As shown in Fig. 7 and Fig. 8, the on-chip memory consists of 32 BRAMs, and it is used to store 112x64 pixels from the current frame and 112x64 pixels from the reference frame. BRAM 0 to BRAM 15 are used to store the proper area from the current frame and BRAM 16 to BRAM 31 are used to store the proper area from the reference frame. Since each color channel (R, G, B) is 10 bits wide, BRAMs are configured as 448x32-bit, and each BRAM is used to store 4 lines of the required area from the corresponding frame. As shown in Fig. 8, most of the data that should be stored in the on-chip memory for two consecutive MBs are the same. Therefore, for the next MB, only the 64x16 pixel non-overlapping area, shown with the dashed lines in Fig. 8, needs to be accessed from the frame memory by using a data re-use methodology. In addition, since the BRAMs in the FPGAs have dual ports, the interpolation of a MB can be overlapped with accessing the non-overlapping area required by the next MB from the frame memory as shown in Fig. 9. However, this requires storing an additional 16 pixels per line in each BRAM and it increases the complexity of the address generation module.

The proposed datapath includes 48 Processing Elements (PEs). The boxes named as R, G, and B in Fig. 7 represent the PEs. Each PE performs the interpolation of a color channel. Therefore, the datapath interpolates R, G, B channels of a pixel in parallel and it interpolates 16 pixels in each clock cycle. The Rotator consists of 30 identical rotators each 16 bits long. They are used to align the interpolated pixels to match their original positions in the current MB. The Output Register File has 256 registers each 30 bits long. The interpolated MB will be stored in this register file, and it will be sent to the frame memory by the memory controller.

The block diagram of a PE is shown in Fig. 10. In the first clock cycle of the interpolation process, the previous pixel $F(x, t-1)$ and the current pixel $F(x, t)$ will be stored in 10 bit registers Reg. P. and Reg. C. In the second clock cycle, motion compensated values of previous pixel $F(x+\tau v, t-1)$ and current pixel $F(x-(1-\tau)v, t)$ will be stored in the 10 bit registers Reg. P. MC and Reg. C. MC. Reg. SMF, Reg. DMF and Reg. CMF each include three 10 bit registers. In the second cycle, outputs of Reg. P. and Reg. C. will be added and the least significant bit will be discarded so that their average will be calculated and stored in the register Reg. DMF. Similarly, in the third cycle the MCA value will be calculated and stored in the register Reg. SMF. Reg. CMF stores the outputs of SMF, DMF and SS.

The SS value is calculated by the Soft Switching module. The block diagram of the Soft Switching module is shown in Fig. 11. This module takes LI and MCA, and multiplies them with k and (1-k). In order to save area, no multiplier or divider is used in this module. By only using one adder/subtractor and one multiplexer, multiplying with the k and (1-k) values 24/32:8/32, 20/32:12/32, 18/32:14/32, 16/32:16/32 can be realized. For example, multiplying with the k value 20/32 can be realized by adding the result of the <<2 (x4) operation to the result of the <<4 (x16) operation, and multiplying with the (1-k) value 12/32 can be realized by subtracting the result of the <<2 operation from the result of the <<4 operation. The least significant 5 bits of the adder-subtractor result are discarded to divide it by 32.

The Median module is shown in Fig. 12. It takes three 10 bit inputs A, B, and C, and finds the median of these inputs. The Median module has three comparators and four 2-to-1 multiplexers. In order to increase its clock frequency, pipelining registers are used at its input and output. First, the median value for SMF is calculated. Then, the median value for DMF is calculated in the next clock cycle. Finally, the median value for CMF is calculated. In order to calculate CMF, the results of the Median module for SMF and DMF are stored in Reg. CMF together with the result of the Soft Switching module.
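The multiplier-free arithmetic of the Soft Switching and Median modules can be mimicked in Python with shifts, one add or subtract per product, and three comparisons. This is a sketch under assumptions, not a model of the actual circuits: the paper only spells out the 20/32:12/32 case, so the shift amounts for the other three ratios are inferred from the same 16x ± delta pattern; the sketch truncates once after summing both products, which may differ from the hardware; and the mapping of the comparison results onto the four 2-to-1 multiplexers is not reproduced. All names are hypothetical.

```python
# Numerator of k (out of 32) -> extra shift; k = (16 + 2**shift) / 32.
# Only the entry for 20 is stated in the text, the others are inferred.
EXTRA_SHIFT = {24: 3, 20: 2, 18: 1, 16: None}


def scale(value, k_num, complement=False):
    """value * k_num, or value * (32 - k_num), using shifts and one add/sub."""
    base = value << 4                       # x16
    shift = EXTRA_SHIFT[k_num]
    delta = 0 if shift is None else value << shift
    return base - delta if complement else base + delta


def soft_switch(li, mca, k_num):
    """Eq. (5) with k = k_num / 32; dropping 5 LSBs divides by 32."""
    return (scale(li, k_num) + scale(mca, k_num, complement=True)) >> 5


def median3(a, b, c):
    """Median of three values from three comparison results."""
    ab, bc, ac = a >= b, b >= c, a >= c     # the three comparators
    if ab == bc:                            # a >= b >= c or a < b < c
        return b
    if ab == ac:                            # c lies between a and b
        return c
    return a


print(scale(150, 20))                    # 3000 = 150 * 20
print(scale(125, 20, complement=True))   # 1500 = 125 * 12
print(soft_switch(150, 125, 20))         # 140 (4500 >> 5)
print(median3(125, 130, 137))            # 130
```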
The Output Mux is used to select the result of the interpolation algorithm specified by the Interpolation Algorithm input. This multiplexer selects either the results of LI, MCA, SS or the result of the Median module. The results of LI, MCA and SS will be ready in the second, third, and fourth clock cycles, respectively. SMF, DMF, and CMF results will be ready in the 5th, 6th, and 8th clock cycles, respectively. When operated in LI, MCA, SMF, DMF, or SS modes, there is no need to stall the pipeline, but CMF mode requires stalling the pipeline for two clock cycles.

The proposed hardware architecture is implemented in VHDL and mapped to a low cost Xilinx Spartan XC3SD1800A-4 FPGA using Xilinx ISE 9.2.04. It is verified with RTL simulations using Mentor Graphics Modelsim. The implementation results show that the proposed hardware can work at 101 MHz and it consumes 15384 slices and 32 BRAMs. A PE consumes 222 slices. Soft Switching and Median modules consume 27 and 35 slices, respectively. When operated in any mode except CMF, the proposed hardware interpolates a 16x16 MB in 16 clock cycles after the first result is ready. When operated in CMF mode, it interpolates a 16x16 MB in 48 clock cycles after the first result is ready.

Figure 10. Processing Element

Figure 11. Soft Switching Module

Figure 12. Median Module

IV. CONCLUSIONS

In this paper, a low cost reconfigurable hardware architecture for frame interpolation of HD frames is presented. The proposed hardware architecture implements the LI, MCA, SMF, DMF, SS, and CMF frame interpolation algorithms and it allows adaptive selection of these algorithms for each MB. The proposed hardware architecture is implemented in VHDL and mapped to a low cost Xilinx XC3SD1800A-4 FPGA. The implementation results show that the proposed hardware can run at 101 MHz on this FPGA and it consumes 32 BRAMs and 15384 slices.

ACKNOWLEDGEMENTS

This work is supported in part by TUBITAK (The Scientific and Technological Research Council of Turkey).

REFERENCES

[1] Bugwadia, K. A., Petajan, E. D., Puri, N. N., Progressive-Scan Rate Up-Conversion of 24/30 Hz Source Materials for HDTV, IEEE Trans. on Consumer Electronics, vol. 42, no. 3, pp. 312-321, Aug. 1996.
[2] Castagno, R., Haavisto, P., Ramponi, G., A Method for Motion Adaptive Frame Rate Up-Conversion, IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 5, pp. 436-442, Oct. 1996.
[3] Ojo, O. A., De Haan, G., Robust Motion-Compensated Video Upconversion, IEEE Trans. on Consumer Electronics, vol. 43, no. 4, pp. 1045-1056, Nov. 1997.
[4] Tasdizen, O., Kukner, H., Akin, A., Hamzaoglu, I., High Performance Reconfigurable Motion Estimation Hardware Architecture, DATE Conference, Nice, France, Apr. 2009.
[5] Ha, T., Lee, S., Kim, J., Motion Compensated Frame Interpolation by new Block-based Motion Estimation Algorithm, IEEE Trans. on Consumer Electronics, vol. 50, no. 2, pp. 752-759, May 2004.
[6] Zhai, J., Yu, K., Li, S., A Low Complexity Motion Compensated Frame Interpolation Method, IEEE ISCAS, pp. 4927-4930, Kobe, Japan, May 2005.
[7] Choi, B. D., Han, J. W., Kim, C. S., Ko, S. J., Motion-Compensated Frame Interpolation Using Bilateral Motion Estimation and Adaptive Overlapped Block Motion Compensation, IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 4, pp. 407-416, Apr. 2007.
[8] Yang, Y. T., Tung, Y. S., Wu, J. L., Quality Enhancement of Frame Rate Up-Converted Video by Adaptive Frame Skip and Reliable Motion Extraction, IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 12, pp. 1700-1713, Dec. 2007.
[9] Lee, S. H., Shin, Y. C., Yang, S., Moon, H. H., Park, R. H., Adaptive Motion-Compensated Interpolation for Frame Rate Up-Conversion, IEEE Trans. on Consumer Electronics, vol. 48, no. 3, pp. 444-450, Aug. 2002.
[10] Lee, S. H., Kwon, O., Park, R. H., Weighted-Adaptive Motion-Compensated Frame Rate Up-Conversion, IEEE Trans. on Consumer Electronics, vol. 49, no. 3, pp. 485-492, Aug. 2003.
[11] Beric, A., Van Meerbergen, J., De Haan, G., Sethuraman, R., Memory-Centric Video Processing, IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 4, pp. 439-452, Apr. 2008.
[12] De Haan, G., Biezen, P. W. A. C., Ojo, O. A., An Evolutionary Architecture for Motion-Compensated 100 Hz Television, IEEE Trans. Circuits Syst. Video Technol., vol. 5, no. 3, pp. 207-217, Jun. 1995.
[13] De Haan, G., Kettenis, J., Löning, A., De Loore, B., IC for Motion-Compensated 100 Hz TV with Natural-Motion Movie-Mode, IEEE Trans. on Consumer Electronics, vol. 42, no. 2, pp. 165-174, May 1996.
[14] De Haan, G., IC for Motion-Compensated De-Interlacing, Noise Reduction, and Picture-Rate Conversion, IEEE Trans. on Consumer Electronics, vol. 45, no. 3, pp. 617-624, Aug. 1999.
[15] Beric, A., De Haan, G., Sethuraman, R., Van Meerbergen, J., An Efficient Picture-Rate Up-Converter, Journal of VLSI Signal Processing, vol. 41, no. 1, pp. 49-63, Aug. 2005.