Interframe Bus Encoding Technique for Low Power Video Compression

Asral Bahari, Tughrul Arslan and Ahmet T. Erdogan
School of Engineering and Electronics, University of Edinburgh, United Kingdom
Email: {a.bahari, tughrul.arslan, ahmet.erdogan}@ed.ac.uk

Abstract

This paper proposes a data encoder that reduces switched capacitance on the system bus. Our method focuses on the transfer of raw video data (pixels) between off-chip memory and on-chip memory, which is common in video compression applications. The method is based on entropy coding to minimize bus transitions. The existing technique exploits the correlation between neighbouring pixels. In our technique, we exploit the pixel correlation between two consecutive frames. Our method shows a 54% transition saving when combined with the existing technique, which is equivalent to a 38% power saving for a 15pF off-chip bus capacitance. This method is suitable for applications where multiple frames are transferred from off-chip memories, such as in an MPEG-4 AVC/H.264 encoder.

1. Introduction

Interactive multimedia has become increasingly popular in today's wireless communications. Consumers can now communicate beyond sound and text: they can see the person they are talking to during video communication, and they can use video streaming and live TV broadcasting. However, video processing is computationally intensive and dissipates a significant amount of power. This is a major limitation in today's portable devices. Existing multimedia devices can only play a video application for a few hours before the battery is depleted. This limits the user's experience and becomes a major bottleneck for the development of more attractive applications.

For MPEG-4 video compression, raw video data (pixels) dominates data transfer [1]. When compressing five minutes of video (QCIF resolution at 15 frames per second), at least 171 million pixels are transferred from memory to the video compressor.
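As a rough check on the pixel count quoted above, the following sketch reproduces the figure under stated assumptions. The QCIF luma size of 176x144 is standard, but the 4:2:0 chroma format and the assumption that each pixel is transferred exactly once are ours, not stated in the text; the function names are illustrative only.

```python
# Back-of-the-envelope pixel count for five minutes of QCIF video.
# Assumptions (not stated in the text): 4:2:0 chroma subsampling and
# each pixel transferred exactly once from memory to the compressor.
QCIF_WIDTH, QCIF_HEIGHT = 176, 144
FRAME_RATE = 15          # frames per second
DURATION_S = 5 * 60      # five minutes


def pixels_per_frame(width, height):
    """Luma samples plus two chroma planes at half resolution (4:2:0)."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


def total_pixels(width, height, fps, seconds):
    """Total samples transferred over the whole sequence."""
    return pixels_per_frame(width, height) * fps * seconds


total = total_pixels(QCIF_WIDTH, QCIF_HEIGHT, FRAME_RATE, DURATION_S)
print(total)  # 171072000
```

Under these assumptions the count comes to about 171 million, matching the lower bound quoted in the text. Any additional reads of reference frames during motion estimation would push the real figure higher.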
These values increase for higher frame rates and frame resolutions. This high data transfer translates into high power dissipation on the memory-processor busses. The problem is severe for systems with off-chip memory, where the bus load is several orders of magnitude higher than that of the on-chip bus. It has been reported that the off-chip bus consumes 10%-80% of overall power [2]. For video communication, where pixels are continuously transferred to or from external memory, bus power consumption cannot be neglected.

CMOS power consumption is given by $P = C_L V_{DD}^2 f \alpha$, where $C_L$ is the load capacitance, $V_{DD}$ is the operating voltage, $f$ is the operating frequency and $\alpha$ is the switching activity. In most cases, designers have no influence on $C_L$, $V_{DD}$ and $f$. The only parameter that can be optimized at the high-level design stage is the switching activity.

In this paper, we present a data encoding technique to minimize the power dissipated when multiple frames are transferred to or from off-chip memory over the system bus. The power reduction is achieved by using bus encoding to reduce switching activity on the bus. Bus encoding transforms the original data such that two consecutive encoded words have lower switching activity than the unencoded ones. The techniques in [3], [4] and [5] apply bus encoding to address busses. These methods exploit the high correlation between consecutive bus addresses to reduce switching activity. Compared to address busses, data busses show more random characteristics. Bus invert [6], codebook [7] and exact algorithm [8] were proposed for this type of data.

For video data, the existing techniques exploit the correlation between neighbouring pixels. However, the pixel correlation between frames has not been fully exploited to reduce bus transitions. In this paper, we propose an interframe bus encoding technique that utilises the pixel correlation between two consecutive frames. The results show that this approach reduces bus transitions by an average of 65% over an unencoded bus.
When combined with existing techniques, this method shows superior performance with a 54% transition saving.

This paper is organised as follows. Section 2 reviews the existing intraframe techniques for bus encoding. Section 3 discusses our approach to reducing the transition activity during memory data transfer. This is followed by the results and performance benchmarking of our method in Section 4. Finally, Section 5 concludes the paper.

2. Intraframe decorrelation

The technique discussed in this paper is based on the combination of difference-based mapping and value-based mapping (dbm-vbm), as discussed in [9]. We adopt this method because it allows us to exploit the pixel correlations widely available in video data.

In MPEG-4 memory-processor communication, the main data transfer consists of streams of pixels representing image sequences. Often, the pixels are scanned in a block-based manner to maintain high correlation between adjacent pixels. Fig. 1 shows the pixel difference distribution for four different frame sequences, and Fig. 2 shows the corresponding switching activity when the pixels are transmitted over the bus. For highly correlated data, a small-magnitude difference between two consecutive pixels is more probable than a large-magnitude one. Dbm-vbm utilises this characteristic to minimize bus transitions. In addition, the bus switching activity for image data is distributed non-uniformly across bit positions and is non-stationary, as shown in Fig. 2. The distribution depends on the type of video sequence. Frames with high texture tend to have more bits with high switching activity, and vice versa.

Fig. 3 shows the block diagram describing the dbm-vbm operation. It consists of a decorrelator (dbm) and an entropy coder (vbm). The dbm-vbm technique is summarized as follows (the interested reader should refer to [9] for a more detailed description). First, two adjacent pixels (intraframe) are decorrelated using dbm. Dbm calculates the relative difference between the two pixels. Vbm then maps the values to patterns that have different weights (i.e., total number of 1s).
To reduce the overall number of transitions, it maps low-magnitude values to patterns that have the fewest 1s, whereas higher-magnitude values are mapped to patterns that have more 1s. At the output, the XOR stage translates each 1 into a transition and each 0 into no transition.

The average number of transitions for the dbm-vbm method depends on its source word, i.e., the decorrelator output. The more the distribution of decorrelator outputs is skewed toward zero, the more often patterns with fewer 1s are transmitted. Thus, one way to improve the transition reduction is to improve the decorrelator.

3. Interframe decorrelation

Video sequences contain both spatial and temporal redundancy. The existing bus encoding techniques utilize spatial redundancy within a frame. However, the temporal redundancy is not fully exploited to reduce bus transitions.

In our method, we propose decorrelating the pixels using two consecutive frames (interframe). This method is based on the observation that two consecutive frames are highly correlated. Often, the background of a scene is stationary. Furthermore, for a moving object, the differences between successive frames are very small. Fig. 4 compares the pixel decorrelation using intraframe and interframe for the Foreman sequence. The figure shows that decorrelating the pixels using interframe skews the distribution further towards zero. This translates into higher transition saving, since patterns with fewer 1s are transmitted more often.

To demonstrate the effectiveness of our method, we use the setup shown in Fig. 5. Assuming 8 bits represent each pixel, we require two 8-bit busses to transfer two pixels in parallel, as shown in Fig. 5(a). Let frames A and B be two consecutive frames, each transmitted on its own bus. To exploit the redundancy between these busses, we decorrelate one with respect to the other: one bus serves as the reference bus, and the other (non-reference) bus is decorrelated with respect to it using dbma. The dotted arrow in Fig. 5(b) represents this.
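The decorrelate, map and XOR steps described above can be sketched in a few lines. This is a simplified stand-in rather than the exact dbm and vbm definitions of [9]: the decorrelator here is assumed to be a modulo-256 signed difference folded into 0..255, and the vbm table is assumed to be all 8-bit patterns ordered by number of 1s. The function names and sample pixel values are hypothetical.

```python
# Simplified dbm-vbm sketch for 8-bit pixels (assumed stand-in, see lead-in).

def zigzag(diff):
    """Fold a signed difference (mod 256) into 0..255: 0, -1, +1, -2, +2, ..."""
    d = (diff + 128) % 256 - 128          # wrap into -128..127
    return 2 * d if d >= 0 else -2 * d - 1


def unzigzag(code):
    """Inverse of zigzag: recover the signed difference."""
    return code // 2 if code % 2 == 0 else -(code + 1) // 2


# Assumed vbm table: small values get patterns with the fewest 1s.
VBM = sorted(range(256), key=lambda p: (bin(p).count("1"), p))
VBM_INV = {pattern: value for value, pattern in enumerate(VBM)}


def encode(pixels, references=None):
    """Intraframe if references is None (previous pixel is the reference),
    interframe otherwise (co-located pixel on the reference bus)."""
    bus, prev, words = 0, 0, []
    for i, pixel in enumerate(pixels):
        ref = prev if references is None else references[i]
        bus ^= VBM[zigzag(pixel - ref)]   # each 1 toggles one bus line
        words.append(bus)
        prev = pixel
    return words


def decode(words, references=None):
    """Invert encode(): XOR recovers the pattern, the table the difference."""
    bus, prev, pixels = 0, 0, []
    for i, word in enumerate(words):
        ref = prev if references is None else references[i]
        pixel = (ref + unzigzag(VBM_INV[word ^ bus])) % 256
        pixels.append(pixel)
        bus, prev = word, pixel
    return pixels


def transitions(words):
    """Count bit toggles on the bus, starting from an all-zero bus."""
    prev, total = 0, 0
    for word in words:
        total += bin(prev ^ word).count("1")
        prev = word
    return total


# Hypothetical co-located pixels from two consecutive frames.
frame_a = [100, 102, 101, 150, 152, 151, 60, 61]
frame_b = [100, 103, 101, 150, 151, 151, 60, 62]
inter = encode(frame_b, references=frame_a)
intra = encode(frame_b)
print(transitions(frame_b), transitions(intra), transitions(inter))
```

On this toy data the interframe differences are all 0 or ±1, so each transmitted pattern has at most a single 1. The intraframe encoder sees larger jumps between neighbouring pixels and toggles more lines. Note that the intraframe path needs the previous pixel as state, while the interframe path uses only the two current bus values.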
Dbma has a similar function to dbm, except that it calculates the relative difference between the two current bus inputs. The output from dbma is then mapped through the vbm table and XOR-ed as in the intraframe case. The decoder works as the inverse function of the encoder. To reduce the bus transitions on the reference bus, we apply the existing bus encoding technique (intraframe) to the reference bus. Fig. 5(c) shows the complete setup for the proposed bus encoding technique.

4. Results and discussion

In this section, we first analyse the effectiveness of the interframe bus encoding technique as applied to the non-reference bus. Next, we analyse the effect on interframe transitions as the distance between frames increases. Finally, we compare the total power reduction achieved using our method. Three video sequences (Akiyo, Foreman and Table Tennis) were used throughout our analysis to represent different sequence classes as defined in [10].

We benchmark our method against the intraframe method and the bus invert method. As shown in Fig. 2, the

data switching activity is non-uniform. Thus, applying the normal bus invert method to the whole 8 bits will not give an optimal solution. Furthermore, since the switching activity is non-stationary, partial bus invert [11] is not optimal for this type of data because its choice of bus lines is static. Instead, we chose adaptive partial bus invert (APBI), as proposed in [12], and clustered bus invert as comparisons to our technique. In our simulations, grouping 4 bits of the bus at the LSB end and another 4 bits at the MSB end gives better transition saving for a wide range of image data.

Figure 1. Intraframe decorrelation using adjacent pixels. (Axes: normalised distribution vs. pixel difference.)
Figure 2. Bit switching activity for image data. (Axes: normalised switching activity rate vs. bit position, LSB to MSB; sequences: Akiyo, Foreman, Table, Grasses.)
Figure 3. Dbm-vbm bus encoder and decoder. (Bus encoder: decorrelator and entropy coder; bus decoder: entropy decoder and correlator.)
Figure 4. Interframe vs. intraframe decorrelation for the Foreman sequence.
Figure 5. Experiment setup: (a) Unencoded busses (b) Encoding using interframe (c) Encoding both busses. (Interframe encoder: dbma, vbm + xor; interframe decoder: xor + vbm, dbma.)

Table 1 shows the percentage of transition reduction obtained by applying the bus encoding techniques to the non-reference bus, as in Fig. 5(c). The results show that the interframe method provides higher transition reduction than both the bus invert and intraframe implementations. On average, our method removes 65% of the transitions of the unencoded bus. This is about 1.5 and 2.5 times the transition saving of intraframe and clustered bus invert, respectively. APBI results in lower transition saving than clustered bus invert.
This is because APBI requires some delay before it determines the right combination of bits to be inverted, which reduces its effectiveness compared to clustered bus invert.

The amount of switching reduction depends on the frame characteristics. For frames with a low amount of movement, the transition reduction is large (87% for Akiyo), because the correlation between successive frames is high. Conversely, for frames with a high amount of movement, the correlation between frames is low. This results in less transition reduction, as shown by the Table Tennis sequence (55%). However, in both cases, interframe shows superior performance compared to the intraframe and bus invert techniques.

In addition, as the distance between two frames increases, the correlation between the frames decreases. This is reflected in the decrease in transition reduction for interframe, as shown in Table 2. For high-motion sequences the decrease is more rapid, as shown in the Table Tennis case: increasing the frame distance from 1 to 2 degrades the transition reduction from 55% to 43%, a relative loss of about 22%. On the other hand, for low-motion sequences (such as Akiyo), the same increase in frame distance degrades the transition reduction by only about 3% (from 87% to 84%). In general, interframe encoding gives the best transition reduction when the two frames are as close together as possible and the correlation between them is high.

Table 3 compares the total transition saving on both busses, as in Fig. 5(c). Since interframe encoding is only applicable to the non-reference bus, we implement intraframe encoding on the reference bus. The results show that the interframe-intraframe combination (inter-intra) performs better than the intraframe-intraframe pair (intra-intra). The inter-intra combination gives a 54% transition reduction over the unencoded busses. This is equivalent to a 22% improvement over the intra-intra pair. The improvement comes from the higher transition reduction that the interframe technique achieves on the non-reference bus.

We synthesized the bus encoder and decoder using a 0.13µm UMC technology library with a 1.25V supply voltage at a 20MHz clock frequency. The design was mapped onto logic gates using Synopsys Design Compiler, and the functional behaviour was verified using Verilog XL. Synopsys Power Compiler was used to perform gate-level power evaluation.
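The inter-intra totals in Table 3 can be cross-checked against the per-bus counts in Table 1. The sketch below uses only transition counts reported in those tables; the dictionary and function names are illustrative.

```python
# Cross-check of Table 3 against Table 1 using the reported transition counts.
NONE = {"Akiyo": 157775, "Foreman": 187429, "Table Tennis": 223166}
INTRA = {"Akiyo": 90483, "Foreman": 116715, "Table Tennis": 122418}
INTER = {"Akiyo": 21290, "Foreman": 84780, "Table Tennis": 101508}


def saving_percent(encoded, unencoded):
    """Transition reduction relative to the unencoded count, in percent."""
    return 100.0 * (1.0 - encoded / unencoded)


inter_intra = {}
savings = {}
for seq in NONE:
    unencoded_total = 2 * NONE[seq]              # both busses unencoded
    inter_intra[seq] = INTER[seq] + INTRA[seq]   # one bus inter, one intra
    savings[seq] = saving_percent(inter_intra[seq], unencoded_total)
    print(seq, inter_intra[seq], round(savings[seq]))

average = sum(savings.values()) / len(savings)
print("average", round(average))
```

The sums reproduce the inter-intra row of Table 3 (111773, 201495 and 223926 transitions), and the per-sequence savings round to 65%, 46% and 50%, for an average of about 54%.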
Table 4 shows the total encoder and decoder circuit area (µm²), power (µW) and delay (ns) overhead for implementing bus coding on both busses (refer to Fig. 5(c)). Both inter-intra and intra-intra require more area than bus invert, because they need more logic gates to implement the decorrelator and the vbm look-up table. Compared to intra-intra, inter-intra requires less area and power. This is because the intraframe decorrelator requires an additional register to store the previous input value, whereas the interframe decorrelator uses the current bus values as its inputs. This eliminates the need for the extra register in the interframe decorrelator.

Table 1. Comparison of different bus encoding techniques applied to the non-reference bus (Trans: number of transitions; %: transition reduction)

Bus encoding   Akiyo           Foreman         Table Tennis    Avg
               Trans     %     Trans     %     Trans     %     %
None           157775    -     187429    -     223166    -     -
BI             122798    22    145229    23    149031    33    26
APBI           134054    15    164032    12    189505    14    14
Intra          90483     43    116715    38    122418    45    42
Inter          21290     87    84780     55    101508    55    65

Table 2. Transition saving for different frame distances applied to interframe bus encoding (Trans: number of transitions; %: transition reduction)

Frame          Akiyo           Foreman         Table Tennis    Avg
distance       Trans     %     Trans     %     Trans     %     %
1              21290     87    84780     55    101508    55    65
2              25852     84    99227     47    126890    43    58
3              30034     81    109492    42    136485    39    54
4              33076     79    116715    38    141427    37    51
5              36117     77    118616    37    144468    35    50

Fig. 6 shows the total power consumption for the bus encoding techniques implemented as in Fig. 5(c). The total power consumption, $P_T$, is calculated as $P_T = P_{Enc} + P_{Dec} + P_{CL}$, where $P_{Enc}$ and $P_{Dec}$ represent the power consumption of the bus encoder and decoder circuits. $P_{CL}$ is the total bus power consumption, estimated by $P_{CL} = \frac{1}{2} C_L V_{DD}^2 f \alpha$. In general, the slope of each graph in Fig. 6 is proportional to $\alpha$, which depends on the type of bus encoding used. From the graph, for capacitances below 5pF, the circuit power offsets the power saving on the busses. However, for wire capacitances greater than 5pF, the bus power reduction exceeds the circuit power overhead.
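The trade-off between circuit overhead and bus saving can be illustrated with a small linear model built from figures reported in this section: 4mW for the unencoded busses at 15pF, the average transition reductions of Table 3, and the circuit power overheads of Table 4. The model assumes bus power scales linearly with both wire capacitance and transition count. That assumption is ours, and the function names are illustrative. Bus invert is left out because its extra invert lines change the bus width.

```python
# Linear power model: P_T = P_Enc + P_Dec + P_CL, with P_CL proportional to
# capacitance and transition count (an assumption made for this sketch).
UNENCODED_BUS_POWER_UW = 4000.0   # reported for unencoded busses at 15 pF
REF_CAP_PF = 15.0

# (average transition reduction from Table 3, circuit power in uW from Table 4)
SCHEMES = {
    "intra-intra": (0.42, 730.0),
    "inter-intra": (0.54, 635.0),
}


def total_power_uw(cap_pf, transition_saving, overhead_uw):
    """Encoder plus decoder power, plus the scaled bus power."""
    bus = UNENCODED_BUS_POWER_UW * (cap_pf / REF_CAP_PF) * (1.0 - transition_saving)
    return overhead_uw + bus


def break_even_cap_pf(transition_saving, overhead_uw):
    """Capacitance at which the bus saving equals the circuit overhead."""
    return overhead_uw * REF_CAP_PF / (UNENCODED_BUS_POWER_UW * transition_saving)


results = {}
for name, (saving, overhead) in SCHEMES.items():
    power = total_power_uw(REF_CAP_PF, saving, overhead)
    percent = 100.0 * (1.0 - power / UNENCODED_BUS_POWER_UW)
    results[name] = (power, percent, break_even_cap_pf(saving, overhead))
    print(name, round(power), round(percent), round(results[name][2], 1))
```

At 15pF the model gives roughly 2.5mW (about 38% saving) for inter-intra and roughly 3.05mW (about 24% saving) for intra-intra, in line with the reported values. The estimated break-even capacitances of about 4.4pF and 6.5pF bracket the 5pF crossover read from the graph.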
As a result, inter-intra dissipates much less power than the other techniques. For a typical off-chip wire capacitance of 15pF, the total bus power consumption is 2.5mW, compared to 4mW for unencoded busses. This is equivalent to a 38% power saving, compared to only 24% and 5% using intra-intra and bus invert, respectively. The greater power saving achieved by inter-intra is due to the much lower number of transitions occurring on the busses, which is reflected in its smaller slope in Fig. 6.

5. Conclusion

We have presented an interframe bus encoding technique for MPEG-4 applications. The combination of interframe and intraframe results in a 54% transition reduction over unencoded busses. This is equivalent to a 22% improvement in transition reduction compared to

existing techniques that use the intraframe approach. This method is suitable for applications where multiple frames are transferred between memories, such as during motion prediction in an MPEG-4 AVC/H.264 encoder.

Table 3. Transition saving for different bus coding implemented on both busses (Trans: number of transitions; %: transition reduction)

Bus encoding   Akiyo           Foreman         Table Tennis    Avg
               Trans     %     Trans     %     Trans     %     %
None           315550    -     374858    -     446332    -     -
BI-BI          245596    22    290458    23    298062    33    26
Intra-Intra    180966    43    233430    38    244836    45    42
Inter-Intra    111773    65    201495    46    223926    50    54

Table 4. Total encoder and decoder circuit overhead for different bus coding implemented on both busses

Bus encoding   Area (µm²)   Power (µW)   Delay (ns)
BI-BI          1936         84           2.6
Intra-Intra    29050        730          4.5
Inter-Intra    28691        635          4.3

Figure 6. Total bus power consumption vs. wire capacitance. (Axes: power in µW vs. capacitance per bus wire in F; curves: no bus coding (16 bit), bus invert (20 bit), intra-intra (16 bit), inter-intra (16 bit).)

References

[1] C.-H. Lin et al., "Low power design for MPEG-2 video decoder," IEEE Trans. on Consumer Electronics, vol. 42, no. 3, pp. 513-521, 1996.
[2] E. Musoll et al., "Exploiting the locality of memory references to reduce the address bus energy," Proc. of ISLPED 1997, pp. 202-7.
[3] H. Mehta et al., "Some issues in gray code addressing," Proc. of the Sixth Great Lakes Symposium on VLSI, pp. 178-81, 1996.
[4] L. Benini et al., "Asymptotic zero-transition activity encoding for address busses in low-power microprocessor-based systems," Proc. of Great Lakes Symposium on VLSI, pp. 77-82, 1997.
[5] W. Fornaciari et al., "Power optimization of system level address buses based on software profiling," Proc. of CODES 2000, pp. 29-33, 2000.
[6] M. Stan et al., "Bus-invert coding for low-power I/O," IEEE Trans. on VLSI Systems, vol. 3, no. 1, pp. 49-58, Mar. 1995.
[7] S. Komatsu et al., "Low power chip interface based on bus data encoding with adaptive code-book method," Proc. Ninth Great Lakes Symposium on VLSI, pp. 368-71, 1999.
[8] L. Benini et al., "Architectures and synthesis algorithms for power-efficient bus interfaces," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 9, pp. 969-80, Sept. 2000.
[9] S. Ramprasad et al., "A coding framework for low-power address and data busses," IEEE Trans. on VLSI Systems, vol. 7, no. 2, pp. 212-21, June 1999.
[10] F. Pereira et al., "MPEG-4 video subjective test procedures and results," IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, no. 1, pp. 32-51, Feb. 1997.
[11] Y. Shin et al., "Partial bus-invert coding for power optimization of system level bus," ISLPED, pp. 127-129, 1998.
[12] R. Siegmund et al., "Adaptive partial bus-invert coding for power-efficient transfer over wide system busses," XIII International Symposium on Integrated Systems and Circuit Design SBCCI 2000, Manaus (Brazil), September 2000, 18-23, 2000.