
SKIP Prediction for Fast Rate Distortion Optimization in H.264

Avishek Saha, Kallol Mallick, Jayanta Mukherjee, Senior Member, IEEE, and Shamik Sural, Senior Member, IEEE

Abstract: In H.264, the optimal coding mode for each macroblock (MB) is selected by exhaustively searching all MB modes in the multiple reference frames. This exhaustive mode search has a very high computational complexity. This paper proposes three different approaches, along with their combination, to reduce the complexity of the rate-distortion optimized mode decision process in H.264. These approaches are based on transform-domain properties and spatio-temporal correlation in video sequences. Experimental results demonstrate that the proposed methods achieve an average speedup of around 5-7 times with good prediction quality and comparable bitrate.¹

Index Terms: SKIP prediction, ρ-domain, restricted reference frame, coding efficiency.

I. INTRODUCTION

H.264/AVC is the state-of-the-art video compression standard recently developed by the ITU-T/ISO/IEC Joint Video Team [1]. Compared to previous standards, this new video coding standard can deliver significantly improved compression efficiency, which makes it possible to transmit high quality video over lower bit rate channels. In addition, the increased flexibility of H.264 encoding and transmission caters to a broad spectrum of video applications, enabling new video services over cable, satellite and mobile networks. However, these performance gains of H.264 come at the cost of increased computational complexity [2]. The decoding complexity increases by a factor of four, whereas the encoding complexity may be as high as nine times that of MPEG-2. This huge increase in encoder complexity is mainly due to Rate-Distortion Optimization (RDO) of the Motion Estimation (ME) and Mode Decision (MD) processes in H.264.
An H.264/AVC video encoder typically consists of the encoding modules of motion estimation, motion compensation, integer transform, quantization and entropy coding. In H.264/AVC, an MB can be encoded using intra prediction from neighboring samples in the same frame or using inter prediction from samples in one or more previously coded frames. In addition, H.264 supports the use of variable macroblock (MB) partition sizes encoded in the form of MB modes [1]. The MBs can be of size 16x16, 8x16, 16x8, 8x8, 8x4, 4x8 and 4x4. Fig. 1 shows the different MB modes used in H.264. An MB with a large partition size requires a single motion vector to represent its motion information. However, a single motion vector may not be able to accurately represent the motion information of the entire MB, resulting in a large residual error and hence a large number of bits for encoding the transformed residual error. Conversely, an MB mode with small MB partitions may require more bits to represent the motion information and fewer bits to encode the residual error. Thus, selection of the proper encoding mode has considerable impact on the compression efficiency of the H.264 encoder.

Fig. 1 Partition sizes of macroblocks (N=16) and submacroblocks (N=8) in H.264/AVC.

¹ This work has been supported by a research grant from the Department of Science and Technology (DST), Govt. of India, under Research Grant No. SR/S3/EECE/024/2003. Avishek Saha, Kallol Mallick and Jayanta Mukherjee are with the Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, WB 721302 India (e-mail: avishek.saha@gmail.com, kallol@cse.iitkgp.ernet.in, jay@cse.iitkgp.ernet.in). Shamik Sural is with the School of Information Technology, Indian Institute of Technology, Kharagpur, WB 721302 India (e-mail: shamik@sit.iitkgp.ernet.in).

In H.264, the MB mode with the best Rate-Distortion (RD) performance is selected as the optimal encoding mode. The optimized RD cost is obtained by using Lagrangian minimization [3]. The cost function to be minimized is as follows:

$J = D + \lambda R$ (1)

where J is the cost, D is the distortion, λ is the Lagrangian multiplier and R is the rate. H.264 uses rate distortion optimization for both motion estimation and mode decision.

To reduce the computational complexity of RD optimized H.264, several fast approaches have been proposed [4]-[12]. These algorithms speed up either the motion estimation or the mode decision process of the encoder. In [4], the macroblock content complexity was employed to reduce the number of inter-modes checked for each MB. The SATD-based mode selection method was used by Tanizawa et al. [5] to reduce the number of candidate modes. Subsequently, RDO was performed on the reduced number of candidates. A novel technique to limit the number of candidate MB modes to a small subset by pre-encoding a down-sampled version of the original image was proposed in [6]. The edge direction

map [7] and amplitude of the edge vector [8], which can be obtained from the Sobel operator, have been successfully used to predict the Inter and Intra MB mode respectively. The MB SKIP decision has also been taken based on the difference between the average boundary error and the average bit-rate cost of the best Inter mode [9]. The weighted cost obtained from quantized transform coefficients has been used [10] to save a substantial amount of computation. In [11] and [12], motion search information has also been used to skip the checking of unlikely block sizes.

This work presents three approaches to expedite the RDO process in H.264. The first two methods are based on SKIP prediction and the last method is based on reducing the number of reference frames to a single best matching frame. The mode decision process is then performed on this selected best reference frame. The first approach toward SKIP prediction uses the zero-count in the transform domain to decide whether to SKIP or not. The transform-domain model used is the ρ-domain model [13], which has so far been used for bit estimation purposes [14]. To the best of our knowledge, the ρ-domain model has never been used for SKIP prediction. In the second approach, the spatio-temporal correlation properties of video sequences have been utilized to further improve the ρ-domain SKIP prediction results. The third approach reduces the number of reference frames on which the mode decision is to be performed. Only a single best reference frame is selected by performing an initial low cost matching on the multiple reference frames. The selected reference frame is then used in the mode decision process.

The rest of this paper is organized as follows. Section II reviews fast rate distortion optimization in H.264. Fast RDO based on skip prediction and reduced reference frame based fast RDO have also been discussed.
The proposed approaches based on SKIP prediction and reduced reference frame have been presented in Section III. Section IV compares and analyses the performance of the individual approaches as well as a combination of all the three approaches. Finally, Section V concludes this paper.

II. FAST RATE DISTORTION OPTIMIZATION IN H.264

Rate-distortion optimization techniques have been widely applied to video encoders [3]. The RDO techniques result in substantial improvement in compression efficiency. In RDO, the Lagrangian minimization method is used to find the best MB mode among the modes of {INTRA4X4, INTRA16X16, INTER16X16, INTER16X8, INTER8X16, INTER8X8, INTER8X4, INTER4X8, INTER4X4, SKIP, DIRECT}. For each MB in an inter-frame, the RDO is first used to obtain the optimal motion vector by minimizing the cost function

$J(mv, \lambda_{MOTION}) = D(c(mv)) + \lambda_{MOTION} \, R(mv - pmv)$ (2)

where mv is the motion vector obtained by motion estimation, pmv is the predicted motion vector and $\lambda_{MOTION}$ is the Lagrange multiplier. $R(mv - pmv)$ represents the motion information and $D(c(mv))$ is the sum of absolute differences (SAD) between the original video signal s and the coded video signal c. For multiple reference frames, the final reference frame is selected by minimizing (3),

$J(REF \mid \lambda_{MOTION}) = D(c(REF, mv(REF))) + \lambda_{MOTION} \, \big( R(mv(REF) - pmv(REF)) + R(REF) \big)$ (3)

where $D(c(REF, mv(REF)))$ is the SAD between the original signal s and the coded signal c from reference frame REF. Rate-distortion optimized mode selection is performed by minimizing

$J(c, MODE \mid QP, \lambda_{MODE}) = D(c, MODE \mid QP) + \lambda_{MODE} \, R(c, MODE \mid QP)$ (4)

where the distortion $D(c, MODE \mid QP)$ is measured as the sum of squared errors between the original block s and the coded block c, and QP is the quantization parameter. The relation between the Lagrangian multipliers $\lambda_{MOTION}$, $\lambda_{MODE}$ and the quantization parameter QP is given by

$\lambda_{MOTION} = \lambda_{MODE} = m \cdot 2^{QP/6}$ (5)

where m is a constant.
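To make the role of the Lagrangian multiplier concrete, the following minimal Python sketch selects a mode by minimizing a cost of the form J = D + λR over a set of candidates. The function names, the dictionary layout and the distortion/rate numbers are hypothetical and chosen only for illustration; they are not taken from the JM encoder.

```python
from typing import Dict, Tuple


def lagrangian_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Cost of the form J = D + lambda * R."""
    return distortion + lam * rate_bits


def select_mode(candidates: Dict[str, Tuple[float, float]], lam: float) -> Tuple[str, float]:
    """Return the (mode, cost) pair with the smallest Lagrangian cost.

    `candidates` maps a mode name to a (distortion, rate_bits) pair.
    """
    best_mode, best_cost = "", float("inf")
    for mode, (distortion, rate_bits) in candidates.items():
        cost = lagrangian_cost(distortion, rate_bits, lam)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost


# Hypothetical (distortion, rate) pairs: smaller partitions lower the
# distortion but spend more bits on motion information.
candidates = {
    "SKIP": (900.0, 1.0),
    "INTER16X16": (400.0, 40.0),
    "INTER8X8": (250.0, 120.0),
}

for lam in (0.5, 2.0, 20.0):
    print(lam, select_mode(candidates, lam))
```

With these made-up numbers, a small multiplier favors the small-partition mode, a moderate one favors INTER16X16, and a large one favors SKIP, which mirrors the trade-off between motion bits and residual bits described in Section I.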
The optimal motion vector and mode selection are dependent on the quantization parameter QP. The optimal mode for inter prediction is obtained by computing the Lagrangian cost (1) for all possible subblock partitions. Moreover, the H.264/AVC encoder uses Intra_16x16 and Intra_4x4 prediction modes for intra-frames. In these modes, the reconstructed pixels in the previously coded adjacent blocks are used to predict the content of the macroblock. Intra_16x16 has four modes, whereas Intra_4x4 prediction has nine modes for each 4x4 subblock. The nine Intra_4x4 modes over the sixteen 4x4 subblocks alone result in a total of 144 cost checks for each intra MB mode decision.

H.264 inter-prediction uses Tree Structured Motion Compensation (TSMC) [15]. TSMC supports block sizes from 16x16 to 4x4 with the facility of dividing each block into sub-blocks. The smallest sub-block size is 4x4. In inter-prediction, all the eight possible MB partitions and the SKIP mode are checked to select the optimal encoding mode. Usually, small partition sizes are selected for detailed regions, whereas large partition sizes perform reasonably well for homogeneous regions. The actual distortion and bit consumption calculations for all candidate modes greatly increase the encoding time. Thus, using RDO to select the best coding mode for each MB is the most computationally intensive process in H.264.

A. Skip Prediction based Fast RDO

The H.264/AVC JM encoder [1] identifies certain MBs as skipped during encoding. The encoding of these MBs is skipped by the encoder. In the decoder, the skipped MB is reconstructed by motion-compensated prediction from the current reference picture using a motion vector predicted from previously decoded motion vectors. The SKIP mode has the lowest RD cost (J) among all MB encoding modes. Moreover, checking the SKIP mode involves the lowest complexity. Thus, a SKIP decision at the start of the mode decision process can substantially lower the entire encoder complexity.
However, incorrect skip decisions may increase the bit rate and may also result in a loss of picture quality.

B. Restricted Reference Frame based Fast RDO

In H.264, multiple reference frames are used for inter motion estimation purposes. The Lagrangian minimization based RD cost function is calculated for all reference frames. An increase in the number of reference frames also increases the computational cost of RDO. A number of reasons have been cited [16] to justify the use of more reference frames to obtain better predictions. However, strong correlations among the motion vectors of successive frames lead to the intuition that the full RD-optimized mode decision search need not be performed for all reference frames, and a decision can be taken based on an initial estimate of the best reference frame.

III. PROPOSED APPROACHES TOWARD FAST RDO

The RD optimized motion estimation process in the H.264/AVC encoder calculates the distortion costs and the bit costs associated with each of the possible MB modes. This mode decision step consumes a considerable portion of the encoder execution time. The proposed schemes delineated in the following subsections aim to reduce the computational complexity of the RDO process without incurring noticeable drops in reconstructed video quality. The first two methods employ SKIP prediction, whereas the last one expedites motion estimation by reducing the number of reference frames.

A. RHO-domain Skip Prediction

Rate-distortion optimized fast ME requires computation of the RD cost J for all modes of an MB. In order to calculate the rate and distortion of an MB mode, the DCT, quantization, inverse transform and inverse quantization operations need to be performed on the MB. The rate information is derived from the quantized transform coefficients. These operations contribute to the high computational complexity of the RDO process. However, low complexity DCT and quantization calculation may substantially bring down the RDO execution time.
Reduced complexity DCT and quantization can be performed using the rate and distortion models. Information-theoretic analysis shows [22] that a generalized Gaussian can best approximate the distribution of transformed AC coefficients of a natural image. Based on this assumption, [13] proposed a rate model called the ρ-domain model. In the ρ-domain, the rate is expressed as a function of the number of zeroes among the transformed coefficients. If ρ is the percentage of zeroes among the quantized transform coefficients, then it can be shown [17] that ρ monotonically increases with the quantization step size q. This implies that there is a one-to-one mapping between ρ and q. Hence, mathematically, the coding bit rate R, which is a function [18] of the quantization step size q, must also be a function of ρ, denoted by R(ρ). In the encoder, the transform coefficients are quantized and then entropy encoded. So, the higher the number of zeroes among the quantized transform coefficients, the lower the bit rate. Thus, the rate calculation in the ρ-domain can be expressed as

$R(\rho) = \theta \, (1 - \rho)$ (6)

where θ is a constant of the linear model [13].

Fig. 2 Prediction from neighboring macroblocks in the previous frame

It is to be observed that, when ρ tends to 1, R tends to zero. This essentially implies the SKIP mode, since in this mode no motion vector or distortion information is encoded. The decoder predicts the motion vector from the MVs of the previously decoded neighboring macroblocks. This observation provides the motivation to predict the SKIP mode from the Sum-of-Absolute-Transformed-Difference (SATD). If the number of zeroes in the SATD of a block is above a pre-defined threshold, a SKIP mode is predicted for the MB. The threshold value is determined empirically. Proper choice of the threshold parameter keeps the PSNR drop within acceptable limits. The proposed method predicts the SKIP mode based on a single SATD calculation and avoids the mode decision calculations for the remaining modes.
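The zero-count test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this paper: it assumes the input is the 16x16 SKIP-mode residual (current MB minus its prediction), uses a 4x4 Hadamard transform normalized by 4 as a stand-in for the encoder transform, treats a coefficient as zero when its magnitude is below the quantization step, and uses a hypothetical threshold of 0.95. The names `zero_fraction`, `estimated_rate` and `predict_skip` are introduced here only for illustration.

```python
import numpy as np

# 4x4 Hadamard matrix, used here as an assumed stand-in transform.
H4 = np.array([[1, 1, 1, 1],
               [1, 1, -1, -1],
               [1, -1, -1, 1],
               [1, -1, 1, -1]], dtype=np.int64)


def zero_fraction(residual, qstep):
    """Return rho, the fraction of transform coefficients that quantize to zero.

    `residual` is a 2-D array whose sides are multiples of 4.
    """
    residual = np.asarray(residual, dtype=np.int64)
    rows, cols = residual.shape
    zeros = 0
    total = 0
    for y in range(0, rows, 4):
        for x in range(0, cols, 4):
            block = residual[y:y + 4, x:x + 4]
            coeffs = H4 @ block @ H4.T
            magnitude = np.abs(coeffs) / 4.0  # normalize the 2-D transform gain
            zeros += int(np.count_nonzero(magnitude < qstep))
            total += coeffs.size
    return zeros / total


def estimated_rate(rho, theta):
    """Linear rho-domain rate model R(rho) = theta * (1 - rho)."""
    return theta * (1.0 - rho)


def predict_skip(residual, qstep, rho_threshold=0.95):
    """Predict SKIP when the zero fraction reaches the (hypothetical) threshold."""
    return bool(zero_fraction(residual, qstep) >= rho_threshold)


residual = np.ones((16, 16), dtype=np.int64)  # flat residual: only DC terms survive
print(zero_fraction(residual, qstep=2.0))     # 0.9375
print(predict_skip(residual, qstep=8.0))      # True
```

In the flat-residual example, each 4x4 block has a single non-zero (DC) coefficient at the finer step, so 15 of every 16 coefficients are zero; at the coarser step every coefficient quantizes to zero and SKIP is predicted.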
Thus, the SKIP prediction saves a large amount of computation. Especially for slow moving video, where the percentage of SKIP mode MBs is particularly high, the proposed method results in a substantial reduction of the motion estimation complexity.

B. Spatio-Temporal Skip Prediction

In a video frame, the collocated macroblocks are highly correlated due to spatial homogeneity in the neighboring regions. Fig. 2 shows the neighboring MBs 'A', 'B' and 'C' in the (n-1)-th frame of the current macroblock 'X' in the n-th frame. It has been shown [19] that the modes of macroblocks A, B and C in the previous frame have a high correlation with the best mode for the current macroblock. Moreover, [20] shows that the rate-distortion cost function (J) between neighboring blocks is highly correlated. These observations provide the necessary motivation to predict the SKIP mode of an MB from the spatio-temporal characteristics of neighboring MBs in the current and the previous frames. First, the proposed algorithm checks the mode selected for the MBs A, B and C in the previous frame. If a majority of them (parameterized as at least 2 or all 3) have SKIP as the best mode, the algorithm moves to the next step. It then finds the SATD cost for the SKIP mode of all the three MBs A, B and C, and also finds the minimum SKIP cost among these three MBs. If the minimum SKIP cost is less than an empirically determined threshold value, then the

mode for the current MB in the current frame is predicted as SKIP. The SATD costs of all MBs in the current frame are calculated and stored as reference for processing of the next frame.

Let $SKIP(MB_{i,n}) = 1$ if the best mode for the i-th MB in the n-th frame is SKIP, and $SKIP(MB_{i,n}) = 0$ otherwise. Using this notation, the spatio-temporal SKIP prediction algorithm can be briefly described as follows: $SKIP(MB_{i,n}) = 1$ if

(i) $\sum_{m \in N_{i,n-1}} SKIP(MB_{m,n-1}) = 3$ (or $\geq 2$, as per parameter), and
(ii) $\min_{m \in N_{i,n-1}} SATD(MB_{m,n-1}) < T$, where T is a threshold.

Here $SATD(MB_{i,n})$ is the SATD cost value calculated for the SKIP mode of the i-th MB in frame n, and $N_{i,n}$ = {m : m denotes the MB number of neighboring MB A, B or C for the i-th MB in frame n}.

Since the MBs in the first column do not have a neighboring MB A, the algorithm is disabled for all the MBs in the first column of the frame. Similarly, MBs in the first row do not have neighboring MBs B and C, so the algorithm is disabled for all the MBs in the first row of the frame. This enables the algorithm to start its prediction from an optimally found best mode, rather than a predicted best mode. The algorithm is also not used for the last column (MB C is absent).

The only drawback of this approach is its positive error accumulation. An incorrectly predicted SKIP mode for an MB in the n-th frame has a positive bias on the SKIP prediction of the (n+1)-th frame. Hence, the error accumulates over frames and the PSNR drops. To overcome this drawback, the algorithm is switched off at every n-th frame, where n here denotes a reset interval rather than the frame index. Experimentally, a value of n=5 is found to perform well for fast moving video streams.

C. Restricted Reference Frame

In H.264 motion estimation, the encoder searches for MVs in multiple reference frames (max 16, as per the JM10.2 reference implementation) to obtain the best mode decision. For each reference frame, the motion estimation algorithm is executed for all possible modes of an MB.
This essentially multiplies the motion estimation time by the number of reference frames used. Our proposed algorithm selects a single reference frame, based on an initial estimate of the Rate-Distortion (RD) cost for the reference frame. The selected reference frame is then used to find the best mode for the MB. By the principle of spatial correlation, neighboring MBs tend to have similar motion vectors and mode decisions. For a similar reason, it can be posited that neighboring MBs should tend to use the same reference frame. Based on this spatio-temporal correlation, a predicted motion vector (which is also used for SKIP cost calculation) is used as the center of the motion vector search region. This idea has been extended to predict the minimum RD cost for the best mode of the MB, for a particular reference frame. The SKIP RD cost of an MB is defined as the RD cost, assuming SKIP mode for the MB, for a particular reference frame. In SKIP mode, a motion vector predictor is used, which predicts the motion vector for the MB based on the MVs of the neighboring macroblocks. The simplest predictor used is the median of the motion vectors of the neighboring macroblocks. For a moving object in a video frame, if a particular MB is part of the same object as its neighboring MBs, the motion vector and reference frame for the current MB will be similar to those of its neighbors. Since in SKIP mode the motion vector predicted from neighboring MBs is used, the SKIP RD cost for this MB will be minimum, among all reference frames, for that reference frame. However, if the MB is not a part of the same object in motion, the SKIP RD cost will be high. So, the reference frame which gives the minimum SKIP RD cost will also give the minimum RD cost among all modes. In the proposed approach, the SKIP RD cost of the MB is calculated first for all the reference frames.
The reference frame having the minimal SKIP RD cost is predicted as the best reference frame, and the full motion estimation and mode decision is carried out only on this reference frame. Experimental results show a substantial improvement in motion estimation time, with the PSNR drop within acceptable limits. This is particularly effective for fast moving video, where the speedup due to the previously proposed SKIP prediction based algorithms is low, since fast moving videos have a lower percentage of SKIP modes.

IV. RESULTS

Our experiments were performed on the JM 10.2 reference implementation [21] of H.264² [1]. The code was compiled using MS Visual C++ .NET on the Windows XP (SP2) platform. The results for performance analysis were collected for different bitrates by varying the Quantization Parameter (QP) from 10 to 30, in steps of 2. The simulations have been performed on the luminance component of the popular video sequences listed in Table 1. These sequences consist of different degrees and types of motion and are in QCIF (176x144), CIF (352x288), SIF (352x240) and CCIR601 (720x486) formats. The first two sequences, namely Container and Foreman, are in QCIF format. The next two sequences are Stefan in CIF format and Football in SIF format. Tennis and Garden are in CCIR601 format. Among these sequences, Container has gentle, smooth and low motion change and consists mainly of stationary and quasi-stationary blocks. Foreman has moderately complex motion and hence is categorized as a "medium" motion content sequence. Rigorous motion based on camera panning with translation and complex motion content can be found in the sequences Stefan, Football, Tennis and Garden. Image sequences are always IPPPPP and no B frames were used.
² Encoder parameter configuration: Profile 100, Level 40, Period of I-frames = 10, Quantization parameter for I and P slices (0-51) = 10 to 30 in steps of 2, No frames skipped, Subpixel motion estimation disabled, Hadamard enabled, Entropy coding method = CABAC, ±16 search range (no search range restriction), Number of previous reference frames = 5, All MB size InterSearch enabled, No B-frames used, SP-picture periodicity disabled, RD-optimized mode decision enabled.

Table 1 Test sequences used in analysis

Name        Format                   Frames   Motion
Container   QCIF (176x144)           300      Low
Foreman     CIF (352x288)            300      Medium
Stefan      CIF (352x288)            89       High
Football    SIF (352x240)            124      High
Tennis      ITU CCIR601 (720x480)    151      High
Garden      ITU CCIR601 (720x486)    199      High

The ρ-domain (RHO), the spatio-temporal (SPT5.3) and the reduced reference frame (RRF) approaches, and their combination (COMB), have been compared with the baseline (ORG) JM 10.2 reference encoder. The comparisons have been made in terms of three metrics, namely (a) PSNR, (b) Bitrate, and (c) SpeedUp Factor (SUF). The PSNR drop and the bitrate give an idea of the loss in prediction quality and the loss in compression efficiency. The speedup factor denotes the reduction in computational complexity. Table 2 presents the results of the experiments.

A. ρ-domain SKIP Prediction Results

The ρ-domain results have been presented in Table 2. As can be seen, the maximum PSNR drop obtained by the ρ-domain is only about 0.05 dB. This shows that the ρ-domain zero count is a good approximation for SKIP prediction. The ρ-domain results are particularly encouraging at low bitrate, where the prediction quality obtained by ρ-domain SKIP prediction is even better than the reference encoder implementation of H.264 for the sequence Container. For the fast moving sequence Stefan, ρ-domain SKIP prediction results in no loss in prediction quality. In addition, it is to be noted that, for both Foreman and Stefan, the ρ-domain prediction results in increased compression. Thus, at low bitrate Stefan has no loss in prediction quality, better compression and increased speedup. Similar improvements can be noticed for other test cases as well. The highest ρ-domain PSNR drop can be observed for Tennis and Garden. However, even the highest ρ-domain PSNR loss is less than 0.1 dB. The average of the ρ-domain results taken over all test sequences at both low and high bitrates shows a loss in prediction quality of only about 0.033 dB.
This quality loss is accompanied by an improvement in compression efficiency.

B. Spatio-temporal SKIP Prediction Results

Table 2 also shows the performance comparison of SPT5.3 with the ORG, RHO, RRF and COMB schemes. The main advantage of SPT5.3 over the other approaches is its increased compression efficiency. In most cases, the coding efficiency of SPT5.3 is very close to that of the RHO approach. Although the loss in prediction quality is more than that of RHO, this loss is well within the MPEG limit of 0.5 dB. SPT5.3 intends to enhance the performance of RHO and hence it predicts SKIP modes from spatially and temporally neighboring MBs. However, this prediction is made over and above the SKIP predictions of the ρ-domain approach. This accounts for the increased PSNR drop of SPT5.3 as compared to the RHO approach. Reducing the number of frames in the resetting interval brings down this PSNR drop. The average loss in prediction quality for the SPT5.3 scheme is the highest among all the three proposed strategies. However, it is to be noted that the average compression efficiency of SPT5.3 is also the highest among the proposed approaches. This increased compression is accompanied by higher speedups as compared to the averaged ρ-domain results.

C. Restricted Reference Frame Results

The Restricted Reference Frame (RRF) approach has been compared with ORG, RHO and SPT5.3 in Table 2. From the tabulated results it can be concluded that the RRF approach is extremely advantageous in terms of speedup. For most test sequences, the RRF approach reduces the computational complexity by 5-8 times with marginal loss in prediction quality. The only disadvantage is its increased bitrate. This is understandable, since the major motivation [16] behind the use of more than one reference frame is the strong correlation between the motion vectors in multiple reference frames, which results in better prediction quality and hence higher compression efficiency.
Hence, a low number of reference frames accounts for the increase in bit count. In terms of prediction quality, the RRF performs better at high bitrates. The average RRF results have the highest speedup with acceptable quality loss. The drawback of reducing the number of reference frames is an increase in the number of bits. An accurate estimate of the reference frame reduces this increase in bit information.

D. Combined Results

All the three aforementioned SKIP prediction schemes perform better in one aspect or the other. The ρ-domain SKIP prediction has the best prediction quality. Spatio-temporal SKIP prediction results demonstrate the highest compression efficiency. Finally, the restricted reference frame based SKIP prediction leads to the best coding efficiency. This observation provides the motivation to combine these schemes and observe the combined results. For all the test sequences, the combined results have the highest speedup, the highest PSNR drop and the lowest compression efficiency. However, it is to be noted that the increase in speedup is substantial as compared to the baseline reference (JM10.2) implementation. Moreover, this increased coding efficiency comes at the expense of negligible loss in quality and a marginal increase in bitrate. As already mentioned, the maximum quality loss is well within the MPEG tolerance limit of 0.5 dB. Moreover, at low bitrate the increase in bit count is very small. Thus, the combination of the three proposed approaches has much better performance than the reference JM implementation, particularly at low bitrates.

Table 2 PSNR, Bit rate and SUF comparison of the proposed approaches

Input                  Parameters            Method   PSNR (dB)  ΔPSNR (dB)  Bitrate (kbps)  ΔBitrate (kbps)  SUF
Container, QCIF        ±16, 30 fps, 10 QP    ORG      50.01      -           746.70          -                1
                                             RHO      50.00      -0.01       747.05          +0.35            1.02
                                             SPT5.3   49.71      -0.30       743.25          -3.45            1.02
                                             RRF      49.84      -0.17       771.89          +25.19           5.92
                                             COMB     49.51      -0.50       766.98          +20.28           6.10
                       ±16, 30 fps, 30 QP    ORG      34.59      -           25.87           -                1
                                             RHO      34.61      +0.02       25.99           +0.12            1.01
                                             SPT5.3   34.42      -0.17       24.89           -0.98            1.48
                                             RRF      34.49      -0.01       28.83           +2.96            5.11
                                             COMB     34.29      -0.30       27.12           +1.25            7.68
Foreman, CIF           ±16, 30 fps, 10 QP    ORG      49.96      -           6254.26         -                1
                                             RHO      49.89      -0.07       6251.76         +2.5             1.008
                                             SPT5.3   49.91      -0.05       6251.60         +2.66            1.01
                                             RRF      49.77      -0.19       6557.77         +303.51          5.59
                                             COMB     49.77      -0.19       6560.94         +306.68          5.6
                       ±16, 30 fps, 30 QP    ORG      35.17      -           277.81          -                1
                                             RHO      35.15      -0.02       277.44          -0.37            0.99
                                             SPT5.3   34.51      -0.66       276.82          -0.99            1.1
                                             RRF      34.94      -0.23       314.13          -36.32           5.86
                                             COMB     34.94      -0.23       313.89          -36.08           5.83
Stefan, CIF            ±16, 30 fps, 10 QP    ORG      49.93      -           8866.14         -                1
                                             RHO      49.92      -0.01       8861.51         -4.63            1.02
                                             SPT5.3   49.92      -0.01       8861.51         -4.63            1.02
                                             RRF      49.76      -0.17       9209.60         +343.46          5.67
                                             COMB     49.76      -0.17       9210.46         +344.32          5.66
                       ±16, 30 fps, 30 QP    ORG      34.20      -           733.50          -                1
                                             RHO      34.20      0.00        731.99          -1.51            1.04
                                             SPT5.3   33.83      -0.37       731.88          -1.62            1.04
                                             RRF      34.00      -0.20       826.62          +93.12           6.07
                                             COMB     33.63      -0.57       826.50          +93.00           6.11
Football, SIF          ±16, 30 fps, 10 QP    ORG      49.59      -           10531.94        -                1
                                             RHO      49.59      0.00        10531.94        0.00             1.01
                                             SPT5.3   49.59      0.00        10531.94        0.00             1.02
                                             RRF      49.49      -0.10       10995.61        +463.67          5.80
                                             COMB     49.49      -0.10       10995.61        +463.67          5.85
                       ±16, 30 fps, 30 QP    ORG      31.94      -           1364.22         -                1
                                             RHO      31.92      -0.02       1364.00         -0.22            1.02
                                             SPT5.3   31.89      -0.05       1363.66         -0.56            1.06
                                             RRF      31.73      -0.21       1517.11         +152.89          5.4
                                             COMB     31.68      -0.26       1515.92         +151.7           5.55
Tennis, ITU CCIR601    ±16, 30 fps, 10 QP    ORG      47.47      -           35408.55        -                1
                                             RHO      47.38      -0.09       35397.82        -10.73           1.01
                                             SPT5.3   47.38      -0.09       35397.82        -10.73           1.01
                                             RRF      47.28      -0.19       35889.70        +481.15          5.20
                                             COMB     47.22      -0.25       35889.70        +481.15          5.32
                       ±16, 30 fps, 30 QP    ORG      35.60      -           6170.26         -                1
                                             RHO      35.52      -0.08       6164.68         -5.58            1.02
                                             SPT5.3   35.52      -0.08       6164.68         -5.58            1.04
                                             RRF      35.48      -0.12       6291.86         +121.6           5.67
                                             COMB     35.42      -0.18       6288.26         +118.0           5.80
Garden, ITU CCIR601    ±16, 30 fps, 10 QP    ORG      49.94      -           58568.16        -                1
                                             RHO      49.88      -0.06       58559.55        -8.61            1.01
                                             SPT5.3   49.87      -0.07       58559.55        -8.61            1.02
                                             RRF      49.81      -0.13       59041.04        +472.88          5.84
                                             COMB     49.71      -0.23       59039.31        +471.15          5.88
                       ±16, 30 fps, 30 QP    ORG      33.29      -           10649.24        -                1
                                             RHO      33.24      -0.05       10641.83        -7.41            1.02
                                             SPT5.3   33.23      -0.06       10641.81        -7.43            1.06
                                             RRF      33.18      -0.11       10783.05        +133.81          5.89
                                             COMB     33.13      -0.16       10781.78        +132.54          6.02
Avg.                                         RHO      -          -0.033      -               -3.007           1.015
Avg.                                         SPT5.3   -          -0.161      -               -3.493           1.073
Avg.                                         RRF      -          -0.153      -               +213.16          5.668
Avg.                                         COMB     -          -0.262      -               +212.31          5.95
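As a worked check of how the "Avg." rows follow from the per-sequence entries, the short Python sketch below averages the twelve ΔPSNR values listed in Table 2 for RHO and COMB, and the twelve COMB speedup factors. The variable names are illustrative; the numbers are copied from the table in row order (Container, Foreman, Stefan, Football, Tennis, Garden; QP 10 then QP 30).

```python
from statistics import mean

# Delta-PSNR entries (dB) from Table 2, twelve test cases per method.
delta_psnr = {
    "RHO": [-0.01, 0.02, -0.07, -0.02, -0.01, 0.00,
            0.00, -0.02, -0.09, -0.08, -0.06, -0.05],
    "COMB": [-0.50, -0.30, -0.19, -0.23, -0.17, -0.57,
             -0.10, -0.26, -0.25, -0.18, -0.23, -0.16],
}

# Speedup factors of the combined scheme from Table 2.
suf_comb = [6.10, 7.68, 5.6, 5.83, 5.66, 6.11,
            5.85, 5.55, 5.32, 5.80, 5.88, 6.02]

avg_delta_psnr = {method: mean(values) for method, values in delta_psnr.items()}
avg_suf_comb = mean(suf_comb)

for method, value in avg_delta_psnr.items():
    print(f"{method}: average delta PSNR = {value:.4f} dB")
print(f"COMB: average SUF = {avg_suf_comb:.2f}")
```

The RHO entries average to about -0.033 dB and the COMB entries to about -0.262 dB, that is, an average PSNR loss of roughly 0.26 dB for the combined scheme, with an average speedup of about 5.95.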

As can be seen in Table 2, the average of the COMB results has the highest drop in prediction quality. The combined effects of SPT5.3 and RRF have resulted in an overall PSNR loss of 0.26 dB, which is well within the MPEG limit of 0.5 dB. The average speedup achieved is about 6 times that of the baseline implementation. Moreover, the average increase in bit information for COMB is less than that of the average RRF results.

Figs. 3-6 show the Rate-Distortion curves of the proposed schemes tested on the aforementioned test sequences. In most cases, the RD-curve of the ρ-domain approach almost coincides with the ORG RD-curve. This shows that the ρ-domain zero count is a good approximation for SKIP prediction. For the sequence Container, the RD-curves at low bitrates are extremely close to one another. Hence, for Container, the proposed approaches perform extremely well at low bitrates. Similar conclusions can be drawn from the RD-curves of Foreman. Except for the SPT5.3 scheme, which has a particularly high PSNR drop at low bitrate, the other RD-curves faithfully follow the baseline (ORG) RD-curve. As can be seen, the RD-curves of both Container and Foreman converge at low bitrates. However, in the case of Stefan, the RD-curves demonstrate more or less uniform performance over the entire bit range. The PSNR drops are identical at both low and high bitrates. In Football, the RD-curves of ORG, RHO and SPT5.3 are superposed onto a single curve, whereas the RD-curves of RRF and COMB superpose onto another curve. Similar to Stefan, the RD-curves for Foreman also exhibit identical PSNR drops for both low and high bitrates.

V. CONCLUSION

This paper has presented new schemes for SKIP prediction based fast motion estimation with applications in H.264. Three different approaches were proposed. The first approach utilizes the ρ-domain rate control model for counting the number of zeroes in the sum-of-absolute-transformed-differences (SATD). The SKIP MB mode was predicted based on this zero count.
Spatial and temporal correlation among the collocated MBs in the same frame as well as in the previous frame forms the basis of the second approach. Finally, the number of reference frames on which the mode search is performed was reduced based on an initial low-complexity matching cost. It was observed that each of the proposed approaches individually exhibited the best performance in terms of one of prediction quality, compression efficiency, or coding efficiency. Subsequently, all three approaches were merged to generate the combined results. The experimental results on standard test sequences demonstrate a substantial speedup with marginal loss in bitrate and prediction quality.

REFERENCES

[1] T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, Overview of the H.264/AVC video coding standard, IEEE Trans. Circuits Syst. Video Technol., July 2003.
[2] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, Video coding with H.264/AVC: tool performance and complexity, IEEE Circuits Syst. Mag., vol. 4, no. 1, pp. 7-28, Jan. 2004.
[3] A. Ortega and K. Ramchandran, Rate-distortion methods for image and video compression, IEEE Sig. Pro. Mag., no. 1, pp. 23-50, Jan. 1998.
[4] M. Yang and W. Wang, Fast macroblock mode selection based on motion content classification in H.264/AVC, in Proc. IEEE Int. Conf. Img. Proc. (ICIP04), no. II, pp. 741-744, 2004.
[5] A. Tanizawa, S. Koto, T. Chujoh and Y. Kikuchi, A study on fast rate-distortion optimized coding mode decision for H.264, in Proc. of IEEE Int. Conf. Img. Proc. (ICIP04), pp. 741-744, 2004.
[6] Q. Dui, D. Zhu and R. Ding, Fast mode decision for inter prediction in H.264, in Proc. of IEEE Int. Conf. Img. Proc. (ICIP04), pp. 119-122, 2004.
[7] F. Pan, X. Lin, R. Susanto, K.P. Lim, Z.G. Li, G.N. Feng, D.J. Wu, and S. Wu, Fast mode decision for intra prediction, in Joint Video Team (JVT) JVT-G013, Mar. 2003.
[8] K.P. Lim, S. Wu, D.J. Wu, S. Rahardja, X. Lin, F. Pan and Z.G. Li, Fast inter mode selection, in Joint Video Team (JVT) JVT-I020, Sep. 2003.
[9] T.Y. Kuo and C.H. Chan, Fast macroblock partition prediction for H.264/AVC, in Proc. IEEE Int. Conf. Multimedia and Expo (ICME04), no. I, pp. 675-678, June 2004.
[10] Y.H. Kim, J.W. Yoo, S.W. Lee, J. Shin, J. Paik and H.K. Jung, Adaptive mode decision for H.264 encoder, Electron. Lett., vol. 40, no. 19, Sep. 2004.
[11] P. Yin, H.Y. Cheong, A.M. Tourapis and J. Boyce, Fast mode decision and motion estimation for JVT/H.264, in Proc. Int. Conf. Imag. Proc. (ICIP03), vol. 3, pp. 853-856, 2003.
[12] Y.K. Tu, J.F. Yang, M.T. Sun and Y. Tsai, Fast variable-size block motion estimation for efficient H.264/AVC encoding, Signal Processing: Image Comm., vol. 20, no. 7, pp. 595-623, Aug. 2005.
[13] Z. He and S.K. Mitra, A linear source model and a unified rate control algorithm for DCT video coding, IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 11, pp. 972-982, 2002.
[14] H. Kim and Y. Altunbasak, Low-complexity macroblock mode selection for H.264-AVC encoder, in Proc. of IEEE Intl. Conf. Img. Proc. (ICIP04), pp. 765-768, 2004.
[15] T. Wiegand, M. Flierl and B. Girod, Entropy-constrained design of quadtree video coding schemes, in Proc. of the 6th Intl. Conf. on Img. Proc. and its Apps., vol. 1, pp. 14-17, 1997.
[16] Y. Su and M.T. Sun, Fast multiple reference frame motion estimation for H.264, in Proc. of IEEE Int. Conf. Mult. Expo. (ICME03), pp. 695-698, 2004.
[17] Z. He and S.K. Mitra, ρ-domain source modeling and rate control for video coding and transmission, in Proc. of the IEEE Intl. Conf. on Acous. Spch. Sig. Pro. (ICASSP01), vol. 3, pp. 1773-1776, 2001.
[18] J.R. Corbera and S. Lei, Rate control in DCT video coding for low-delay communication, IEEE Trans. Circuits Syst. Video Tech., vol. 9, pp. 172-185, 1999.
[19] R. Arminato, R. Schafer, F. Kitson and V. Bhaskaran, Linear predictive coding of motion vector, in Proc. IS&T SPIE EI, 1996.
[20] Y.V. Ivanov and C.J. Bleakley, Skip prediction and early termination for fast mode decision in H.264/AVC, in Proc. of International Conference on Digital Telecomm. (ICDT06), 2006.
[21] H.264 JVT Model JM10.2, http://iphome.hhi.de/suehring/tml/
[22] E.Y. Lam and J.W. Goodman, A mathematical analysis of the DCT coefficient distributions for images, IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 10, pp. 1661-1666, 2000.

ACKNOWLEDGMENT

We would like to thank Dr. L. M. Po of the Dept. of Electronic Engineering, City University of Hong Kong, for supplying us with the fast-moving sequences in ITU CCIR601 format.

Fig. 3 RD Curve for QCIF Container (PSNR in dB versus bitrate in kbps; curves: ORG, RHO, SPT5.3, RRF, COMB).

Fig. 4 RD Curve for CIF Foreman (PSNR in dB versus bitrate in kbps; curves: ORG, RHO, SPT5.3, RRF, COMB).

Fig. 5 RD Curve for CIF Stefan (PSNR in dB versus bitrate in kbps; curves: ORG, RHO, SPT5.3, RRF, COMB).

Fig. 6 RD Curve for SIF Football (PSNR in dB versus bitrate in kbps; curves: ORG, RHO, SPT5.3, RRF, COMB).