LOW-COMPLEXITY VIDEO ENCODER FOR SMART EYES BASED ON UNDERDETERMINED BLIND SIGNAL SEPARATION

Jing Liu, Fei Qiao*, Zhijian Ou and Huazhong Yang
Department of Electronic Engineering, Tsinghua University

ABSTRACT

This paper presents a low-complexity video coding method based on Underdetermined Blind Signal Separation (UBSS). The detailed coding framework is designed, and three key techniques are proposed to enhance the compression ratio and the quality of the decoded frames. Experiments show that the proposed method takes 30 ms less encoding time than DISCOVER, and simulation shows that it can save 50% energy compared with H.264.

Index Terms: Low-complexity video encoder, underdetermined blind source separation

1. INTRODUCTION

Recently, smart camera systems have gained popularity with the emergence of wireless video surveillance, multimedia sensor networks and image capturing on mobile devices [1]. However, the limited power supply and low computational ability of the encoder impede the wide deployment of such applications with a sufficient image/video compression ratio. Therefore, low-complexity video encoders with lower computational complexity and power consumption are necessary for such smart eyes.

Attempts in three key categories have been made to reduce the encoding computation complexity. The first is the improvement of the conventional video encoder, including simplified motion estimation, DCT/IDCT and quantization, which together occupy almost 93% of the encoding computation time [2]. However, the computation complexity is still high after the enhancement. The second is the Distributed Video Coding (DVC) approach, developed from Slepian-Wolf (SW) theory and Wyner-Ziv (WZ) theory [3, 4]. Compared with a conventional video codec, the complexity distribution of DVC is swapped between encoder and decoder, but its decoder is too complex to decode the video frames in real time. The third is the Compressive Sensing (CS) method, which compresses video sequences with low complexity [5, 6], but for which it is very difficult to trade off compression ratio against recovered picture quality.
So far, there are no accepted standards for low-complexity encoding applications. New efforts are continually being made to improve existing low-complexity encoding methods or to develop new ones. Motivated by the fact that several sounds can be picked out from a mixed audio signal at a noisy cocktail party, we propose a novel video compression approach in which several video frames are first encoded as one mixed frame. At the decoder side, the mixed frame is separated into estimates of the original video frames via Underdetermined Blind Signal Separation (UBSS). The UBSS framework is a natural fit for low-complexity video encoding, because the only computation involved at the encoder side is matrix multiplication.

This paper is organized as follows. Section 2 briefly reviews the related BSS problem. In Section 3, the detailed structure of the new approach is first provided, and three key technologies are introduced. Section 4 shows simulation results to validate this method. Finally, Section 5 summarizes the proposed method.

2. ENCODING AND DECODING ALGORITHM

Blind Signal Separation was first established by J. Herault and C. Jutten in 1985 [7]. It aims at recovering the unknown source signals from only several observed linear mixtures of those sources, and can be described by the following equation:

$$y = Wx = WAs \qquad (1)$$

where $s(t) = [s_1(t), s_2(t), \ldots, s_n(t)]^T$ is the unknown source signal matrix, $x(t) = [x_1(t), x_2(t), \ldots, x_m(t)]^T$ is the observed signal matrix, $y$ is the estimate of $s$, $A \in R^{m \times n}$ is the mixing matrix, and $W \in R^{n \times m}$ is the separating matrix.

In this paper, a random Gaussian matrix is used as the mixing matrix [8]. The consecutive video frames are taken as the source signals $s$ in equation (1), disregarding their time sequence. Unlike the traditional UBSS problem, the order of the recovered frames is not disrupted, because the mixing matrix is known exactly at the separation side. The coding process is shown in Fig. 1.

At present, there are three types of separation algorithms: greedy algorithms, l1 minimization and Total Variation (TV) minimization.
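The encoder-side mixing $x = As$ implied by equation (1) can be illustrated with a minimal Python sketch. It assumes that each frame is a flattened list of pixel values and that the Gaussian mixing matrix is generated from a shared seed so the decoder can rebuild it. The function names, the `seed` parameter and the seeding scheme are illustrative assumptions, not details given in this paper.

```python
import random


def gaussian_mixing_matrix(m, n, seed=0):
    """Return an m x n matrix with i.i.d. standard Gaussian entries.

    Assumption: a shared seed is how encoder and decoder agree on A.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def mix_frames(A, frames):
    """Compute x = A s, where s stacks n flattened source frames.

    Each of the m mixed frames is a weighted sum of the n source frames,
    so the encoder only performs multiply-accumulate operations.
    """
    n = len(frames)
    if any(len(row) != n for row in A):
        raise ValueError("A must have one column per source frame")
    num_pixels = len(frames[0])
    mixed = []
    for row in A:
        mixed.append([
            sum(row[i] * frames[i][p] for i in range(n))
            for p in range(num_pixels)
        ])
    return mixed
```

With n = 4 source frames and m = 2 mixed frames, for example, `mix_frames` emits half as many values as it receives, which is where the compression of the UBSS frames comes from in this sketch.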
TV minimization can recover signals or images that are not sparse themselves but whose gradient is sparse [9], which makes it suitable for video frames. To reduce the computational complexity of TV minimization and guarantee its robustness, C. Li introduced the augmented Lagrangian method to solve the TV minimization problem and proposed the TVAL3 algorithm in 2009 [9]. In this paper, TVAL3 is adopted as the separation algorithm because it performs better than greedy algorithms, l1 minimization and other TV minimization algorithms. TVAL3 does have inherent errors [9], and the size of the source signal matrix influences both the quality of the recovered frames and the separation time [9].

[Fig. 1. Video coding process by UBSS (n > m): source frames f_1, ..., f_n are mixed by the mixing matrix A into m mixed frames, sent over the channel, and separated by the separating algorithm into the recovered frames.]

[Fig. 2. Video compression framework based on UBSS. Encoder: pre-processing unit, mixing unit, conventional intra encoder (such as JPEG or H.264 Intra). Decoder: conventional intra decoder (such as JPEG or H.264 Intra), separating unit, buffer, inverse pre-processing unit.]

3. CODER IMPLEMENTATION

3.1 Codec Structure

For Underdetermined Blind Signal Separation (UBSS), the number of source signals is larger than the number of observed signals, so it can be used to compress a video sequence [10]. The proposed video compression framework based on UBSS is shown in Fig. 2. The video frames are divided into key frames and UBSS frames. The key frames are encoded and decoded by conventional intra coding methods such as JPEG or H.264 Intra, while the UBSS frames are encoded by underdetermined mixing and decoded by the separation algorithm of UBSS. At the encoder side, a pre-processing unit, which usually needs the information of the key frame, precedes the mixing unit.

For the proposed low-complexity video encoder, three technologies are employed to enhance the decoding quality, lower the decoding complexity and reduce the memory consumption of encoding. First, in the pre-processing unit shown in Fig. 2, the UBSS frames subtract their preceding key frame before mixing, which ensures the recovered quality. Second, for the mixing unit, there are two key technologies.
One is the streamed-way mixing approach, which saves memory at the encoder side. The other is the block-level mixing method, which reduces the decoding complexity and enhances the quality. A detailed explanation follows.

3.2 Residual Mixing

In this work, the pre-processing unit for UBSS frames in Fig. 2 subtracts the key frame, so only the residuals are mixed. We call this approach Residual Mixing. It has two advantages. First, the decoding quality is better because the gradient of the residuals is sparser, so the compression ratio of the UBSS frames can be increased. Second, by subtracting the preceding key frame, the relative error of the decoded UBSS frame caused by the inherent errors of TVAL3 can be reduced. At the decoder side, the UBSS frame residuals are first decoded by TVAL3, and the recovered residuals are then added to the relevant key frame to obtain the decoded UBSS frames.

[Fig. 3. Effect of residual mixing on separation quality (Foreman). x axis: reciprocal of compression ratio; y axis: PSNR (dB); curves: Residual, No Residual.]

To validate the effect of this approach, experiments are performed on the video sequence Foreman, whose spatial resolution is QCIF (144×176). The separation quality is evaluated by Peak Signal-to-Noise Ratio (PSNR). Fig. 3 shows the experimental result, where the x axis represents the reciprocal of the compression ratio. The quality of the recovered image after subtracting the key frame is 10 dB better than that of non-residual mixing.

3.3 Streamed-Way Mixing

The UBSS frames are encoded by mixing several UBSS frames. If the mixing is conducted after all needed UBSS frames have been sampled and stored at the encoder side, it requires a lot of memory, as shown in Fig. 4(a), where n = 4 is taken as an example. To save memory, a special mixing approach is needed. Because the mixing matrix is fixed in this application, the frames can be encoded in sampling order as shown in equation (2), where $a_i$ is the $i$-th column of $A$. Streamed-way mixing can therefore be achieved by dividing $A$ into several subsets $A_i$, each composed of the $i$-th column of the mixing matrix $A$, as shown in Fig. 4(b).

$$x = A\,[f_1, f_2, f_3, f_4]^T = a_1 f_1 + a_2 f_2 + a_3 f_3 + a_4 f_4 = A_1 f_1 + A_2 f_2 + A_3 f_3 + A_4 f_4 \qquad (2)$$

[Fig. 4. Different mixing processes. (a) Without streamed-way mixing: frames f_1 to f_4 are stored in Buffers 1 to 4 and then mixed by the mixing matrix $A \in R^{m \times n}$ (m < n) into the encoded stream x. (b) With streamed-way mixing: each sampled frame is mixed with its part of matrix A and accumulated in a single buffer until encoding is complete, giving the encoded stream x.]

With streamed-way mixing, the memory needed at the encoder side for storing frames is equal to the final compressed frame size. Comparing (a) and (b), streamed-way mixing clearly saves a large amount of memory.

3.4 Block-Level Mixing

In this work, UBSS frames are mixed at block level. This is because, for the TVAL3 algorithm at a fixed compression ratio, the input size influences not only the quality of the recovered image but also the decoding time. To validate this, multiple experiments are conducted on the same video sequence, Foreman. Table I shows the results. When both decoding time and PSNR are taken into account, the optimum block size is 32×32. In our simulation, each 32×32 block is composed of 4 corresponding 16×16 blocks from 4 consecutive UBSS frames, as shown in Fig. 5.
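Returning to equation (2) of Section 3.3, the streamed-way mixing can be sketched in Python as follows. The sketch assumes flattened frames and uses a hypothetical `StreamedMixer` class and `mix_batch` helper, neither of which is part of the implementation described in this paper. The mixer keeps only m accumulator rows, the same size as the final mixed output, and `mix_batch` is the store-everything-first variant of Fig. 4(a), included for comparison.

```python
def mix_batch(A, frames):
    """Fig. 4(a) style: all n frames are stored before x = A s is computed."""
    n = len(frames)
    return [
        [sum(row[i] * frames[i][p] for i in range(n))
         for p in range(len(frames[0]))]
        for row in A
    ]


class StreamedMixer:
    """Fig. 4(b) style: accumulate a_i * f_i as each frame f_i arrives."""

    def __init__(self, A, num_pixels):
        self.A = A
        # Only m rows are kept, i.e. the size of the final mixed output.
        self.acc = [[0.0] * num_pixels for _ in A]
        self.next_index = 0

    def push(self, frame):
        """Mix one incoming frame with column a_i of A, then discard it."""
        i = self.next_index
        if i >= len(self.A[0]):
            raise ValueError("all n frames have already been mixed")
        for r, row in enumerate(self.A):
            weight = row[i]
            acc_row = self.acc[r]
            for p, value in enumerate(frame):
                acc_row[p] += weight * value
        self.next_index += 1

    def done(self):
        """True once all n frames of the group have been accumulated."""
        return self.next_index == len(self.A[0])

    def result(self):
        return self.acc
```

Pushing the n frames of a group one by one and reading `result()` once `done()` is true yields the same mixed frames as `mix_batch`, while the encoder never holds more than one source frame in addition to the accumulator.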
Moreover, choosing corresponding blocks from consecutive UBSS frames is an efficient way to exploit the temporal correlation among frames.

Table I. Decoding time and PSNR for different block sizes

Block Size    4×4      8×8      16×16    32×32    64×64
Time (s)      78.842   24.711   12.355   14.976   43.103
PSNR (dB)     8.920    21.104   21.841   29.975   33.375

[Fig. 5. Mixing process at block level: corresponding blocks from frames 1 to 4 are mixed by the mixing matrix A into x_1, x_2, ..., x_m.]

[Fig. 6. Detailed codec structure of UBSS+H.264. Encoder: H.264 Intra encoder for key frames; key-frame subtraction and mixing for UBSS frames. Decoder: H.264 Intra decoder, separating, buffer; outputs are the decoded key frame and the decoded UBSS frame.]

4. SIMULATION RESULTS AND ANALYSIS

This section compares the performance of the proposed method with three well-known low-complexity video coding methods: H.264 Intra, H.264 No motion and DISCOVER. H.264 Intra means that each video frame is encoded by the intra coding method. H.264 No motion means that the motion vector for motion estimation is set to zero. DISCOVER is one of the best-known low-complexity video coding methods based on DVC. Fig. 6 shows the detailed structure adopted for the performance comparison, referred to as UBSS+H.264. In this structure, H.264 Intra is used as the key frame encoding method, and the three key technologies are adopted as well. Compression performance, encoding time and hardware resource consumption are chosen as the three criteria for evaluating the four video coding approaches.

4.1 Compression Performance

Compression performance is one key performance criterion of a video compression standard. It represents the relationship between the quality of the recovered image and the compression ratio. A good video coding method gives a higher recovered image quality than the others at the same compression ratio. Experiments are conducted on Foreman. The experimental results are shown in Fig. 7.
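Since PSNR is the quality measure behind Fig. 3 and Fig. 7, a minimal Python sketch of the usual definition is given here. It assumes flattened frames of 8-bit samples, so the peak value is 255. The paper does not state the exact formula it used, and the function name `psnr` and the `peak` parameter are illustrative.

```python
import math


def psnr(original, decoded, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized frames."""
    if len(original) != len(decoded):
        raise ValueError("frames must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)
```

Under this definition, a gain of 10 dB such as the one reported for residual mixing corresponds to a tenfold reduction in mean squared error.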
The results show that, as the compression ratio increases, the PSNR of UBSS+H.264 decreases more slowly than that of the other methods. When the compression ratio is small, the PSNR of UBSS+H.264 is the lowest, which is caused by the error of the TVAL3 algorithm. But when the compression ratio is larger than 30, the PSNR of UBSS+H.264 exceeds that of H.264 Intra and is comparable with those of H.264 No motion and DISCOVER.

[Fig. 7. Compression performance of four coding methods. x axis: compression ratio; y axis: PSNR (dB); curves: H.264 Intra, H.264 No motion, DISCOVER, UBSS+H.264.]

4.2 Encoding Time

The encoding complexity of the proposed video codec is composed of two parts: the encoding complexity of the key frames and that of the UBSS frames. There are many ways to measure encoding complexity, but most of them are very difficult to implement. In this paper, encoding time is therefore used as the measure of encoding complexity.

[Fig. 8. Encoding time of four coding methods. x axis: compression ratio; y axis: encoding time per frame (ms); curves: H.264 Intra, H.264 No motion, DISCOVER, UBSS+H.264.]

The test is conducted on a computer with an Intel i5 CPU at 2.67 GHz, 2 GB RAM and a 32-bit Win7 operating system. The simulations are done with C++ code using the release mode of Visual Studio 2008. To ensure that the simulation results are not influenced, nothing is run on the test PC except the operating system. The experiments are again conducted on Foreman. Fig. 8 shows the experimental result. The results demonstrate that UBSS+H.264 has the lowest encoding time. At a compression ratio of 30, the proposed video codec consumes 60%, 57% and 30% less encoding time than H.264 No motion, H.264 Intra and DISCOVER respectively.

4.3 Hardware Resource Consumption

In this section, the energy consumption and the number of equivalent gates are used to measure the complexity of the proposed coding method. The energy consumption per UBSS frame is calculated by the following steps.
First, the power consumption of UBSS frame encoding is simulated with Design Compiler (DC). Then the power consumption is multiplied by the encoding time per UBSS frame to obtain the energy consumption per UBSS frame, shown in Table II. The results show that, as the compression ratio increases, the energy consumption per UBSS frame decreases, which is an important advantage of the proposed method. This is because the number of multiplications needed is reduced as the mixing matrix becomes smaller. Also, the number of equivalent gates for UBSS frame encoding is 980, which is significantly fewer than that of H.264 Intra. The proposed framework UBSS+H.264 can therefore efficiently enhance the compression performance of H.264 Intra at only a small expense in complexity. Fig. 9 shows the original frame and the decoded results of H.264 and the proposed method. Compared with the decoded result of H.264, the quality of the frames decoded by the proposed method is acceptable.

Table II. Energy consumption per UBSS frame (mJ)

1/Compression Ratio   0.02    0.05    0.08    0.10    0.50
Energy (mJ)           0.007   0.016   0.026   0.032   0.159

[Fig. 9. Comparison of the original frame, the H.264 decoded frame and the frame decoded by the proposed method.]

5. CONCLUSION

In this paper, a new low-complexity video coding method based on UBSS is proposed, and the codec structure is presented. Three key technologies (residual mixing, streamed-way mixing and block-level mixing) are proposed to enhance the compression performance. The experimental results demonstrate that the proposed method can achieve a comparable compression ratio at a very low expense in encoding time. In addition, the proposed approach has the important advantage that the energy consumption per UBSS frame decreases as the compression ratio increases.
6. REFERENCES

[1] X. Kunzhi, Q. Fei, Q. Wei, and Y. Huazhong, "Smart-Eyes: a FPGA-based smart camera platform with efficient multi-port memory controller," in ICMT-13, 2013.
[2] Y. K., J. Lv, J. Li, and S. Li, "Practical real-time video codec for mobile devices," in ICME 03, pp. III-509-12 vol.3.
[3] B. Girod, A. M. Aaron, S. Rane, and D. Rebollo-Monedero, "Distributed Video Coding," Proceedings of the IEEE, vol. 93, pp. 71-83, 2005.
[4] F. Dufaux, W. Gao, S. Tubaro, and A. Vetro, "Distributed Video Coding: Trends and Perspectives," Eurasip Journal on Image and Video Processing, 2009.
[5] M. Wakin, J. Laska, M. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. F. Kelly, and R. G. Baraniuk, "Compressive imaging for video representation and coding," in Picture Coding Symposium, 2006.
[6] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, "Single-pixel imaging via compressive sampling," Signal Processing Magazine, IEEE, vol. 25, pp. 83-91, 2008.
[7] A. Cichocki and S.-i. Amari, Adaptive blind signal and image processing: learning algorithms and applications. Chichester, England; New York: John Wiley, 2002.
[8] R. G. Baraniuk, "Compressive sensing [lecture notes]," Signal Processing Magazine, IEEE, vol. 24, pp. 118-121, 2007.
[9] C. Li, "An efficient algorithm for total variation regularization with applications to the single pixel camera and compressive sensing," Rice University, 2009.
[10] J. Liu, F. Qiao, Q. Wei, and H. Yang, "A Novel Video Compression Method Based on Underdetermined Blind Source Separation," in Multimedia and Ubiquitous Engineering. vol. 2, ed: Springer Netherlands, 2013, pp. 13-20.