A Linear Source Model and a Unified Rate Control Algorithm for DCT Video Coding


IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 12, NO. 11, NOVEMBER 2002

Zhihai He, Member, IEEE, and Sanjit K. Mitra, Life Fellow, IEEE

Abstract: We show that, in any typical transform coding system, there is always a linear relationship between the coding bit rate $R$ and the percentage of zeros among the quantized transform coefficients, denoted by $\rho$. Based on Shannon's source coding theorem, a theoretical justification is provided for this linear source model. The physical meaning of the model parameter is also discussed. We show that it is directly related to the image content and is a measure of picture complexity. In video coding, we propose an adaptive estimation scheme to estimate this model parameter. Based on the linear source model and the adaptive estimation scheme, a unified rate control algorithm is proposed for various standard video coding systems, such as MPEG-2, H.263, and MPEG-4. Our extensive simulation results show that the proposed rate control outperforms other algorithms reported in the literature by providing much more accurate and robust rate control.

Index Terms: Linear rate model, rate control, rate-distortion control, Shannon's source coding theorem, video coding.

I. INTRODUCTION

The output bit rate of a video encoder varies dramatically over time due to scene activity. In visual communication over narrow-band or time-varying channels, rate control is very important to ensure the successful transmission of coded video data. In real-time video communications, such as video conferencing, video phone, interactive classrooms, and real-time Web cast, the end-to-end delay for video transmission has to be very small, which requires more accurate and robust rate control. In standard image and video coding, the output bit rate is controlled by the quantization parameter of the video encoder.
In this work, we denote the quantization parameter by $q$. For a uniform quantizer, $q$ represents the quantization step size. For a perceptual quantizer with a quantization matrix, as in JPEG [1] image and MPEG [2] video coding, $q$ represents the quantization scale factor. The relationship between $R$ and $q$ is described by the rate-quantization (R-Q) function, denoted by $R(q)$. The key issue in rate control is to estimate $R(q)$. Once $R(q)$ is available, to achieve the target coding bit rate, we just select the corresponding quantization parameter.

Manuscript received June 5, 2001; revised November 1. This paper was recommended by Associate Editor H. Sun. Z. He was with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA, USA. He is now with Sarnoff Corporation, Princeton, NJ, USA (e-mail: zhe@sarnoff.com). S. K. Mitra is with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA, USA (e-mail: mitra@iplab.ece.ucsb.edu). Digital Object Identifier /TCSVT

A. Brief Review of Rate Control Algorithms

The rate-distortion (R-D) formula for a simple quantizer has long been established [3], [4]. However, in a typical transform coding system, for example, JPEG coding [1], this type of analytic entropy formula does not work, especially at low bit rates [5]. In Fig. 1, we plot the actual JPEG coding bit rates and the entropies of the quantized discrete cosine transform (DCT) coefficients for the images Lena and Peppers at different quantization scales. It can be seen that the relative error between them is very large. In the classical R-D formula, the only parameter which describes the input source is the variance of the source data. It is well known that the variance alone is far from sufficient to characterize the input source data and to determine the final coding bit rate. In addition, the analytic entropy formula does not take into account the coding behavior of a specific coding algorithm.
For example, the wavelet-based image compression algorithm proposed in [6] is much more efficient than JPEG coding [1]. Therefore, even for the same image, different coding algorithms have quite different R-D functions. Since the analytic entropy formula does not work well in practical coding applications, more sophisticated rate formulas have been developed in the literature and applied to rate control for video coding [5].

To estimate the rate function more accurately, some operational approaches have also been proposed. In [7], the rate curve is modeled by an exponential formula coupled with several control parameters. The model parameters are then estimated from the coding statistics generated by re-encoding the input video. Obviously, this algorithm has very high computational complexity. In video coding, the coding statistics of the previous frames can be utilized to estimate the model parameters of the current frame, as in the VM7 rate control algorithm [8], [9]. In this way, the computational complexity is reduced. However, this approach assumes that neighboring frames have the same R-D characteristics. This is not true at scene changes, so the approach often suffers from performance degradation at scene changes [10].

The empirical estimation of the model parameter can also be carried out at the macroblock (MB) level, as in the TMN8 rate control algorithm [10], [11]. With an adaptive quantization scheme at the MB level, the TMN8 algorithm has superior rate control performance compared with the rate control in VM7. However, due to the limited accuracy of its source model, it also suffers from performance degradation for videos with high motion. Furthermore, the TMN8 rate control algorithm is mainly designed for P pictures. For I pictures, a very rough approximation of the coding bit rate is used. Therefore, the TMN8 algorithm is mainly used for H.263 video coding. It does not work well in MPEG-2 coding, which has at least one I picture in each group of pictures (GOP).

Fig. 1. Entropy and the actual JPEG coding bit rate for images (a) Lena and (b) Peppers.

B. Proposed Source Model and Rate Control Scheme

To the best of our knowledge, all the source models reported in the literature [5], [7], [8], [10] try to find the best expression for the coding bit rate in terms of the quantization parameter $q$. In other words, the rate function is defined and modeled in the $q$ domain. In order to improve the accuracy of the source model, the expression of $R(q)$ has become more and more complicated.

It is well known that zeros play a key role in transform coding of images and videos. The high compression ratio in transform coding is mainly achieved by efficient coding of zeros [1], [2], [6], [13], [15]. Let $\rho$ be the percentage of zeros among the quantized transform coefficients. Note that $\rho$ monotonically increases with $q$, which implies there is a one-to-one mapping between them. (Here, we assume that the transform coefficients have a continuous and positive distribution.) Hence, mathematically, the coding bit rate $R$ is also a function of $\rho$, denoted by $R(\rho)$. In this work we propose to study the rate function in the $\rho$ domain instead of the traditional $q$ domain. Based on our extensive simulations and theoretical justification, we have found that in all typical transform coding systems, such as wavelet-based image compression [6], [16], [17], JPEG image coding [1], MPEG-2 [2], H.263 [13], and MPEG-4 video coding [15], $R(\rho)$ is always a linear function

$$R(\rho) = \theta \, (1 - \rho) \tag{1}$$

where $\theta$ is a constant. This leads to a unified source model for all typical transform coding systems. We have also observed that the only model parameter $\theta$ is directly related to the image content. Its physical meaning is also discussed in this work.
For video coding, we develop an adaptive scheme to estimate the value of $\theta$. Once $\theta$ is estimated, the rate curve $R(\rho)$ can be constructed by (1). As a consequence, a unified linear rate control algorithm is proposed here for MPEG-2, H.263, and MPEG-4 video coding. The proposed algorithm is conceptually simple and has low computational complexity, yet it outperforms other rate control algorithms reported in the literature by providing more accurate and robust rate control.

The paper is organized as follows. In Section II, we present the $\rho$-domain R-D analysis technique. Based on our extensive simulation results, we introduce the linear rate model in Section III. In Section IV, based on Shannon's source coding theorem, we provide a theoretical justification for the linear rate model. The physical meaning of the model parameter is discussed in Section V. A unified rate control algorithm for video coding is proposed in Section VI. The experimental results and performance comparison with other rate control algorithms are presented in Section VII. Finally, some concluding remarks are given in Section VIII.

II. $\rho$-DOMAIN RATE ANALYSIS

As discussed in Section I, if we assume the transform coefficients have a positive distribution, there is a one-to-one mapping between $q$ and $\rho$ (see Footnote 1). Therefore, any function in the $q$ domain can be mapped into the $\rho$ domain and vice versa. This mapping can be easily computed from the distribution of the transform coefficients.

Footnote 1: In fact, the distribution of transform coefficients is nonnegative. However, this does not affect the proposed R-D analysis framework and rate control algorithm, because a zero frequency for a sample value simply means that no transform coefficient with that value exists in the picture data.

Let us consider the H.263 encoder [14] as an example. Let $q$ and $\Delta$ be the step size and dead zone threshold of the quantizer, respectively. In the H.263 coding implementation, $\Delta$ is a fixed multiple of $q$ that differs between intra MBs and inter MBs. Let $D_0(x)$ and $D_1(x)$ be the distributions of the DCT coefficients in the intra and inter MBs, respectively. (In the H.263 codec implementation, the DCT coefficients have integer values. Therefore, $D_0(x)$ and $D_1(x)$ are actually histograms of the DCT coefficients.) For any $q$, the corresponding percentage of zeros can be obtained as follows:

$$\rho(q) = \frac{1}{M} \sum_{|x| < \Delta_{\mathrm{intra}}} D_0(x) + \frac{1}{M} \sum_{|x| < \Delta_{\mathrm{inter}}} D_1(x) \tag{2}$$

where $\Delta_{\mathrm{intra}}$ and $\Delta_{\mathrm{inter}}$ are the values of $\Delta$ for intra and inter MBs at the given $q$, and $M$ is the number of coefficients in the current video frame. The above computation only involves several additions.

In the MPEG-2 quantization scheme, a perceptual quantization matrix is employed [2]. In this case, we first scale each DCT coefficient by its perceptual weight. After scaling, the perceptual quantization becomes uniform. We then generate the distribution of the scaled DCT coefficients and compute the $q$-$\rho$ mapping using a formula similar to (2). In our rate control algorithm, this mapping is stored as a look-up table. The mapping of the rate curve between the $q$ and the $\rho$ domains is performed by table look-up and bilinear interpolation. In the proposed $\rho$-domain source modeling approach, we first estimate the rate function in the $\rho$ domain, then map it to the $q$ domain to obtain the R-Q function, if necessary.

III. LINEAR SOURCE MODEL FOR TRANSFORM CODING

The great advantage of $\rho$-domain analysis is that the rate function is linear in the $\rho$ domain. To show this, a series of experiments has been performed, as described in this section.

A. Linear Source Model for Wavelet-Based Coding

We randomly select 24 sample images which have a wide range of R-D characteristics. The sample images are shown in Fig. 2.

Fig. 2. Sample images for wavelet image coding.

Each sample image is first decomposed by a five-level dyadic scheme with the 9/7 wavelet [19]. The decomposed image is uniformly quantized with a step size $q$ and then coded by the set partitioning in hierarchical trees (SPIHT) [6] algorithm. For each $q$, we compute the corresponding percentage of zeros $\rho$ and record the coding bit rate $R$. By varying the quantization step size, we can generate a series of points on the rate curve $R(\rho)$, which are plotted in Fig. 3. It can be seen that $R(\rho)$ is almost a straight line.
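As an illustration of the $q$-to-$\rho$ mapping described in Section II, the following minimal Python sketch builds a look-up table from two coefficient histograms. The dead zone factors `2.0` and `2.5`, the function names, and the toy histograms are assumptions made for this example and are not taken from the text; the inverse look-up scans the table rather than interpolating.

```python
def rho_of_q(hist_intra, hist_inter, q, intra_factor=2.0, inter_factor=2.5):
    """Fraction of coefficients that fall inside the dead zone at step size q.

    hist_intra / hist_inter map an integer coefficient value to its count.
    The dead zone factors are illustrative assumptions.
    """
    total = sum(hist_intra.values()) + sum(hist_inter.values())
    zeros = sum(c for x, c in hist_intra.items() if abs(x) < intra_factor * q)
    zeros += sum(c for x, c in hist_inter.items() if abs(x) < inter_factor * q)
    return zeros / total


def build_lookup(hist_intra, hist_inter, q_values):
    """Table of (q, rho) pairs; rho is nondecreasing in q."""
    return [(q, rho_of_q(hist_intra, hist_inter, q)) for q in q_values]


def q_for_rho(table, target_rho):
    """Smallest tabulated q whose rho reaches target_rho."""
    for q, rho in table:
        if rho >= target_rho:
            return q
    return table[-1][0]  # target not reachable: use the coarsest q


# Toy histograms (hypothetical data): value -> count
hist_intra = {0: 2, 3: 1, -5: 1, 10: 1}
hist_inter = {0: 3, 2: 1, -7: 1}
table = build_lookup(hist_intra, hist_inter, [1, 2, 3, 6])
print(table)
print(q_for_rho(table, 0.85))
```

Because the table is built once per frame from the histograms, each later query costs only a scan of a short list, which matches the low-complexity look-up described in the text.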
In addition, the line $R(\rho)$ in Fig. 3 passes through the point $(1, 0)$. This is because, when $\rho$ is 1.0, all the coefficients are quantized to zero and the corresponding coding bit rate should also become zero. Fig. 3 shows that the linear source model given by (1) holds for practical wavelet image coding. Experiments over many other images with other wavelet-based coding algorithms, such as the zero-tree [16] and the stack-run (SR) [17] coding algorithms, yield similar results. Due to the page limitation, they are omitted here.

For comparison, in Fig. 4, we plot the rate curve $R(q)$ in the $q$ domain for each sample image. It can be seen that for different images the patterns of $R(q)$ are quite different from each other. In addition, $R(q)$ has a very complex nonlinear behavior. This image-dependent variation and nonlinear behavior make it very hard to develop an accurate and robust source model in the $q$ domain. In the $\rho$ domain, however, the rate curve is a linear function, which is extremely simple. This is the great advantage of the proposed $\rho$-domain rate analysis.

B. Linear Source Model for DCT-Based Video Coding

Next we show that the linear source model also holds in various video coding systems. With the H.263 video codec [14], we encode the test video sequence at a series of quantization step sizes. Let $R$ be the coding bit rate, excluding the motion vectors (MVs) and the header information bits. It should be noted that the number of bits for MV and header information is already fixed before rate control and quantization; it cannot be changed during the rate control process. In Fig. 5, we plot $R(\rho)$ for several frames from the Foreman video sequence. It can be seen that $R(\rho)$ is approximately a linear function, as described by (1).

To demonstrate this linear relationship more systematically, we study the correlation coefficient between $R$ and $\rho$. In Fig. 6, we plot its value for each frame in Akiyo and News coded by MPEG-4, and in Carphone, Salesman, Coastguard, and Tabletennis coded by MPEG-2.
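To make the linear fit and the correlation test concrete, the sketch below estimates the slope of a line constrained to pass through $(1, 0)$ by least squares and computes the correlation coefficient between $R$ and $\rho$. The data points are synthetic (generated with a slope of 6), and the helper names `fit_theta` and `pearson` are hypothetical; none of this reproduces the measurements behind Figs. 3 to 6.

```python
import math


def fit_theta(rhos, rates):
    """Least-squares slope of R = theta * (1 - rho), a line through (1, 0)."""
    num = sum(r * (1.0 - p) for p, r in zip(rhos, rates))
    den = sum((1.0 - p) ** 2 for p in rhos)
    return num / den


def pearson(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Synthetic (rho, R) points lying exactly on a line with slope 6 (assumed value)
rhos = [0.70, 0.80, 0.90, 0.95]
rates = [6.0 * (1.0 - p) for p in rhos]

theta_hat = fit_theta(rhos, rates)
corr = pearson(rhos, rates)
print(theta_hat, corr)
```

Since $R$ falls as $\rho$ rises, the raw correlation coefficient of a perfectly linear data set is negative; a plot of its negated value, as the caption of Fig. 6 suggests, would then sit close to 1.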
It should be noted that, for the MPEG-4 results, $R$ does not include the bits for shape information. From Fig. 6, it can be seen that the plotted correlation value is consistently high; for most of the frames, it is extremely close to 1. This implies that the linear relationship between $R$ and $\rho$ also holds in MPEG-2 and MPEG-4 video coding. In summary, our extensive simulation results demonstrate that the linear source model given by (1) is a unified source model for all typical transform coding systems, such as SPIHT, zero-tree, and JPEG image coding, and MPEG-2, H.263, and MPEG-4 video coding.

Fig. 3. Linear relationship between the percentage of zeros $\rho$ and the coding bit rate $R$ in wavelet image coding with SPIHT. The x axis represents $\rho$, while the y axis represents $R$. All the plots share the same coordinate system.

Fig. 4. Plot of the rate curve $R(q)$ in the $q$ domain for each sample image coded by SPIHT. The x axis represents $q$, while the y axis represents $R$. All the plots share the same coordinate system.

Fig. 5. Linear relationship between the percentage of zeros $\rho$ and the coding bit rate $R$ in video coding with H.263. The test frames are from the Foreman QCIF video.

Fig. 6. The correlation coefficient (inverse) of each frame between the coding bit rate $R$ and $\rho$ in MPEG-4 and MPEG-2 video coding.

IV. HEURISTIC JUSTIFICATION

It is well known that the asymptotic R-D behavior of a coding system is characterized by Shannon's source coding theorem [3], [4]. Based on this theorem, we provide a heuristic justification for the $\rho$-domain linear rate model in (1). The literature [20], [21] indicates that transform coefficients have a generalized Gaussian distribution given by

$$p(x) = \frac{\nu \, \eta(\nu, \sigma)}{2 \, \Gamma(1/\nu)} \exp\left\{ -\left[ \eta(\nu, \sigma) \, |x| \right]^{\nu} \right\} \tag{3}$$

where

$$\eta(\nu, \sigma) = \frac{1}{\sigma} \left[ \frac{\Gamma(3/\nu)}{\Gamma(1/\nu)} \right]^{1/2} \tag{4}$$

Here $\sigma$ is the standard deviation of the transform coefficients and $\nu$ is a model parameter which controls the shape of the distribution. For example, when $\nu = 2$, $p(x)$ becomes a Gaussian distribution given by

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-x^2 / (2\sigma^2)} \tag{5}$$

When $\nu = 1$, $p(x)$ becomes a Laplacian distribution given by

$$p(x) = \frac{\lambda}{2} \, e^{-\lambda |x|}, \qquad \lambda = \frac{\sqrt{2}}{\sigma} \tag{6}$$

For DCT-based image/video coding, Lam and Goodman [22] have mathematically shown that the DCT coefficients have a Laplacian distribution. In the following, before studying the generalized Gaussian source, we first consider its two special cases: the Laplacian and the Gaussian sources.

A. Laplacian Source

For the Laplacian source, let us define the distortion measure as

$$d(x, \hat{x}) = |x - \hat{x}|$$

where $x$ is the input symbol and $\hat{x}$ is the output symbol of the quantizer. According to Shannon's source coding theorem [4], [23], if a distortion $D$ is allowed, the minimum number of bits needed to represent a symbol from a Laplacian source is given by

$$R(D) = \log_2 \frac{1}{\lambda D}, \qquad 0 < D \le \frac{1}{\lambda} \tag{7}$$
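The statement that the Gaussian and Laplacian sources are special cases of the generalized Gaussian source can be checked numerically. The sketch below assumes the standard unit-area parameterization of the generalized Gaussian density with standard deviation `sigma` and shape parameter `nu`; the function names are hypothetical, and the check is an illustration rather than part of the original derivation.

```python
import math


def ggd_pdf(x, sigma, nu):
    """Generalized Gaussian density with standard deviation sigma and shape nu."""
    eta = math.sqrt(math.gamma(3.0 / nu) / math.gamma(1.0 / nu)) / sigma
    scale = nu * eta / (2.0 * math.gamma(1.0 / nu))
    return scale * math.exp(-((eta * abs(x)) ** nu))


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def laplacian_pdf(x, sigma):
    """Zero-mean Laplacian density with standard deviation sigma."""
    lam = math.sqrt(2.0) / sigma
    return 0.5 * lam * math.exp(-lam * abs(x))


sigma = 10.0  # arbitrary value chosen for the check
for x in (-25.0, -3.0, 0.0, 0.5, 12.0):
    print(x,
          ggd_pdf(x, sigma, 2.0) - gaussian_pdf(x, sigma),
          ggd_pdf(x, sigma, 1.0) - laplacian_pdf(x, sigma))
```

Both differences should vanish up to floating-point rounding, which confirms that a single shape parameter moves the model between the two special cases treated separately in this section.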

First we consider a uniform quantizer. For a given quantization step size $q$, by definition, the corresponding distortion is given by (8). With (6), from the Appendix we have (9). Note that the percentage of zeros $\rho$ is given by (10). After changing the independent variable from $q$ to $\rho$, (9) becomes (11). With (7) and (11), we have (12), which is the rate function in the $\rho$ domain. A Taylor expansion of (12) yields (13).

In our extensive simulations we observe that, in transform coding of images and videos at low bit rates, the corresponding percentage of zeros is mostly larger than 70%. To show this, in Table I we list the average percentage of zeros among the quantized DCT coefficients for a wide range of coding bit rates. The four test videos are Carphone, Akiyo, Foreman, and Football in QCIF format coded at 15 frames per second (fps). The last row of the table lists the corresponding peak signal-to-noise ratio (PSNR) for 960 kbps. We can see that these bit rates and PSNR values are much higher than those required in practical video coding applications. Even for 960 kbps, the average value of $\rho$ is still above 70%. Therefore, for practical purposes, we assume $\rho$ is normally larger than 70%. In other words, $1 - \rho$ is less than 0.3, which implies that the nonlinear term in (13) is less than 0.027, which is negligible when compared with the linear term. Therefore, theoretically, $R(\rho)$ is an approximately linear function.

TABLE I. AVERAGE PERCENTAGE OF ZEROS FOR A WIDE RANGE OF CODING BIT RATES

The above mathematical formulation is for the uniform quantizer with its basic dead zone threshold. In image and video coding, a uniform threshold quantizer with a larger dead zone is often used to produce more quantized zeros in order to reduce the coding bit rate. Suppose the dead zone threshold is controlled by some positive constant. The corresponding quantization distortion is given by (14). In this case, the percentage of zeros is given by (15). With (14), (16), and (7), the expression of $R(\rho)$ becomes (16), with the auxiliary term defined in (17). We plot $R(\rho)$ for several values of this constant, including 0.5 and 0.75, in Fig. 7. It can be seen that the plots are all very close to being straight lines. This justifies the linear rate model given by (1) for the Laplacian source.

Fig. 7. Plots of the function $R(\rho)$ given by (16) for different dead zone threshold values.

B. Gaussian Source

Next we consider the Gaussian source, which is another special case of the generalized Gaussian source. For the Gaussian distribution we need to employ the square error distortion

$$d(x, \hat{x}) = (x - \hat{x})^2 \tag{18}$$

According to Shannon's source coding theorem [23], if a square distortion $D$ given by (18) is allowed, the minimum number of bits needed to represent a symbol from a Gaussian source is given by

$$R(D) = \frac{1}{2} \log_2 \frac{\sigma^2}{D}, \qquad 0 < D \le \sigma^2 \tag{19}$$

For a uniform threshold quantizer with a step size $q$ and dead zone $\Delta$, by definition, the corresponding distortion is given by (20). It might be very difficult to derive an explicit expression of $R(\rho)$ in the same way as for the Laplacian distribution. Instead, we evaluate $R(\rho)$ numerically and plot it for different dead zone thresholds in Fig. 8. It can be seen that the rate function in the $\rho$ domain for a Gaussian source is also very close to being a linear function.

Fig. 8. Plots of the function $R(\rho)$ for a Gaussian source at different dead zone threshold values.

C. Generalized Gaussian Distribution

Laplacian and Gaussian sources are two special cases of the generalized Gaussian source. For these two types of sources coupled with appropriate distortion measures, based on the source coding theorem, we have the explicit expressions for their R-D functions as given in (7) and (19). However, for a generalized Gaussian distribution, due to the complex nature of the distribution, it is difficult to obtain an explicit expression for the R-D function. Instead, we can obtain the lower and upper bounds of its R-D function [24]. To be more specific, for a generalized Gaussian source $X$ with zero mean and variance $\sigma^2$, we have

$$h(X) - \frac{1}{2} \log_2 (2 \pi e D) \le R(D) \le \frac{1}{2} \log_2 \frac{\sigma^2}{D} \tag{21}$$

Here $h(X)$ is the differential entropy of $X$ given by

$$h(X) = -\int_{-\infty}^{\infty} p(x) \log_2 p(x) \, dx \tag{22}$$

The parameter $D$ in (21) is the square error distortion. Following the same procedure as for the Gaussian source, we numerically evaluate the lower and upper bounds of the rate function in the $\rho$ domain and plot them in Fig. 9. Here the variance of the source and the distribution control parameter $\nu$ are set as indicated in the caption of Fig. 9.

Fig. 9. Plots of the lower and upper bounds of the $\rho$-domain rate function $R(\rho)$ for a generalized Gaussian source with a source spread parameter of 10 and $\nu = 1.5$.

Fig. 10. The slope $\theta$ of each sample image.
It can be seen that the lower and the upper bounds of the rate function are very close to each other. In addition, both are approximately linear functions. Since the actual rate function lies between them, it should also be approximately a linear function. This justifies the linear rate model given in (1) for a generalized Gaussian source.

V. PHYSICAL UNDERSTANDING OF THE SOURCE MODEL

The only parameter of the proposed source model is the slope $\theta$. Obviously, $\theta$ is related to some characteristic of the input source data, and this characteristic has a deterministic effect on the coding bit rate. In this section, we try to find out what the characteristic is and what its physical meaning is. In Fig. 10, we plot the slope $\theta$ for each sample image listed in Fig. 2. It can be seen that the variation of $\theta$ is very large. We sort all of these sample images by the value of $\theta$. In Fig. 11, the sorted images are listed in raster scan order with $\theta$ increasing from the smallest to the largest. It can be seen that the images in the first half have more high-frequency texture than those in the second half. The images in the second half are smoother and more structured. This suggests that the value of $\theta$ is closely related to the amount of texture present in the corresponding image.

Fig. 11. Sample images sorted by $\theta$ and listed in raster scan order.

In the frequency domain, an image with more texture normally has a relatively larger amount of middle- or high-frequency components [18]. In other words, the energy is more distributed to the middle- or high-frequency subbands. For smoother and more structured images with less texture, the energy is more concentrated in the lower frequency subbands. Let us consider the wavelet transform as an example. After a five-level dyadic subband decomposition, there are 16 subbands in total, denoted by $S_i$, $1 \le i \le 16$. Let $\sigma_i^2$ be the variance of the wavelet coefficients in $S_i$. Let $A$ and $G$ be the arithmetic mean and geometric mean of $\{\sigma_i^2\}$, respectively. We define the energy compaction measure as

$$\gamma = \frac{A}{G} \tag{23}$$

Obviously, a larger $\gamma$ corresponds to more compacted energy and fewer texture components. In fact, $\gamma$ is often used as a feature variable for texture analysis. In Fig. 12, we plot the pair $(\gamma, \theta)$ for each sample image listed in Fig. 2. It can be seen that there is a strong linear correlation between $\gamma$ and $\theta$. The correlation coefficient between them is very high. This strong correlation explains the physical meaning of $\theta$, which is the only parameter of the proposed source model.

Fig. 12. The linear correlation between the coding gain (energy compaction measure) and the slope $\theta$.

VI. UNIFIED RATE CONTROL FOR VIDEO CODING

In Sections II to V, we have developed a linear source model in the $\rho$ domain. The only parameter of the source model is the slope $\theta$. To estimate the rate curve, we first have to accurately estimate the value of $\theta$. Obviously, the physical meaning of $\theta$ provides a way to estimate $\theta$ from the distribution of the subband energy. However, Fig. 10 tells us the estimation is not very accurate. In this section, we propose an adaptive estimation algorithm for $\theta$ in video coding. Based on the linear source model in (1) and the adaptive estimation, a unified rate control algorithm is then proposed for MPEG-2, H.263, and MPEG-4 video coding.

A. Adaptive Estimation of $\theta$

Let $N_m$ be the number of coded MBs in the current frame. Note that in an MB, there are a total of 384 luminance and chrominance coefficients. Let $R_m$ be the number of bits used to encode these MBs. We denote the number of zeros in these MBs by $\rho_m$. Based on (1), $\theta$ can be estimated as follows:

$$\theta = \frac{R_m}{384 \, N_m - \rho_m} \tag{24}$$

The estimated $\theta$ is then applied to the current MB. We can see that the estimated value of $\theta$ is an accumulative statistic of the coded MBs. In Fig. 13(a), we plot the estimated value of $\theta$ at each MB for Frame 80 in the Carphone QCIF video sequence. It can be seen that, as more and more MBs are encoded, the estimated value of $\theta$ converges to its true value for the current video frame. In Fig. 13(b), we plot the standard deviation of the estimated value of $\theta$ at each MB for each coded frame in Carphone. It can be seen that the deviation is small: the relative estimation error is below 5%.

Fig. 13. Analysis of the MB-level adaptive estimation of $\theta$. (a) Estimated value of $\theta$ at each MB. (b) Standard deviation of the estimated $\theta$ for each frame.

TABLE II. RATE CONTROL RESULTS FOR MPEG-2. $\rho$-RC REPRESENTS THE PROPOSED $\rho$-DOMAIN RATE CONTROL ALGORITHM

B. Rate Control Algorithm

With the adaptive estimation of $\theta$ and the proposed linear source model, the rate control for video coding turns out to be very simple and straightforward. Given the target bit rate (in bits) per frame, the encoder buffer size, and the number of bits currently in the buffer, the available bits for coding the current frame, $R_T$, are given by (25), where the target buffer level is by default set to 0.2. Let $N$ be the number of MBs in a video frame. For QCIF videos, $N$ is 99. The quantization parameter is determined by the following steps:

Step 1: Initialization. Before encoding the first MB, set $N_m = \rho_m = R_m = 0$. Compute the distributions $D_0(x)$ and $D_1(x)$ for the DCT coefficients in the intra and inter MBs, respectively. Set $\theta$ to its initial value, which is its average value for typical video sequences.

Step 2: Determine the quantization parameter. According to (1), the number of zeros to be produced by quantizing the remaining MBs should be

$$\rho = 384 \, (N - N_m) - \frac{R_T - R_m}{\theta} \tag{26}$$

Based on the one-to-one mapping between $\rho$ and $q$, the quantization parameter $q$ is determined. The current MB is quantized with $q$ and encoded.

Step 3: Update. Let $\rho_0$ and $R_0$ be the number of zeros and number of bits produced by the current MB, respectively. Set $\rho_m = \rho_m + \rho_0$, $R_m = R_m + R_0$, and $N_m = N_m + 1$. Once enough MBs have been coded, update the value of $\theta$ according to (24).
At the same time, subtract the frequencies of the DCT coefficients in the current MB from $D_0(x)$ if it is an intra MB, or from $D_1(x)$ if it is an inter MB.

Step 4: Loop. Repeat Steps 2 and 3 for the next MB until all the MBs in the current frame are encoded.

This rate control algorithm has very low computational complexity and implementation cost, involving only several additions and a few simple multiplications. The proposed rate control algorithm always divides the video frame into two groups, coded and uncoded MBs, and balances the bit budget between these two groups. This type of rate control mechanism turns out to be very accurate and robust. It should also be noted that, in the proposed algorithm, the rate control of the current frame does not use any information or statistics from its previous frame. Therefore, it does not suffer from the performance degradation at scene changes that affects methods relying on previous-frame statistics.

VII. EXPERIMENTAL RESULTS

The proposed rate control algorithm has been implemented in the H.263, MPEG-2, and MPEG-4 video coders. In the following experiments, we compare the proposed rate control algorithm with the rate control algorithms developed in TM5 [25] of MPEG-2, TMN8 [10] of H.263+, and VM8 [26] of MPEG-4.

A. Rate Control in MPEG-2

In MPEG-2, the video sequence is coded in units of GOPs. Each GOP consists of at least one intracoded picture (I-frame) and some intercoded pictures (P- and B-frames). We employ the TM5 bit allocation scheme [25] to determine the number of bits assigned to each frame in the current GOP. For each frame, the proposed rate control algorithm is employed to achieve the target bit rate. The test videos are Foreman, Salesman, Tabletennis, and Coastguard. The target bit rate for each test is shown in Table II. To measure rate control performance, we define the relative control error by (27), in terms of the actual and target coding bits of each video frame. We plot the relative control error of both rate control algorithms for Foreman and Tabletennis in Figs. 14 and 15, respectively. It can be seen that the proposed algorithm yields a much smaller control error, which is mostly less than 2%. The PSNR of each frame in Foreman and Tabletennis is plotted in Figs. 16 and 17, respectively. It can be seen that with the proposed rate control algorithm the picture quality is significantly improved.
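Returning to the MB-level procedure of Section VI-B (Steps 1 to 4), the loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy encoder that charges a fixed number of bits per nonzero coefficient, the initial slope `7.0`, the warm-up of 10 MBs before the slope is re-estimated, the dead zone factor `2.0`, and the 64-coefficient blocks are all hypothetical choices, and the remaining coefficients are kept in a plain list instead of a histogram.

```python
import random


def toy_encode_mb(mb, q, dead_zone=2.0, bits_per_nonzero=5.0):
    """Stand-in for a real encoder: returns (zeros, bits) for one MB."""
    zeros = sum(1 for x in mb if abs(x) < dead_zone * q)
    return zeros, bits_per_nonzero * (len(mb) - zeros)


def choose_q(remaining, zeros_wanted, q_values, dead_zone=2.0):
    """Smallest q that zeroes at least zeros_wanted of the remaining coefficients."""
    for q in q_values:
        zeros = sum(1 for x in remaining if abs(x) < dead_zone * q)
        if zeros >= zeros_wanted:
            return q
    return q_values[-1]  # budget too small: fall back to the coarsest quantizer


def rate_control_frame(mbs, target_bits, encode_mb=toy_encode_mb,
                       theta=7.0, q_values=range(1, 32), warmup=10):
    """One frame of rho-domain rate control; returns (q per MB, bits, theta)."""
    coded_bits = 0.0
    coded_zeros = 0
    coded_coeffs = 0
    remaining = [x for mb in mbs for x in mb]  # coefficients not yet coded
    chosen = []
    for index, mb in enumerate(mbs):
        # Step 2: zeros the uncoded MBs must produce to meet the budget
        budget = max(target_bits - coded_bits, 0.0)
        zeros_wanted = len(remaining) - budget / theta
        q = choose_q(remaining, zeros_wanted, q_values)
        zeros, bits = encode_mb(mb, q)
        # Step 3: update the running totals and drop the coded coefficients
        coded_bits += bits
        coded_zeros += zeros
        coded_coeffs += len(mb)
        remaining = remaining[len(mb):]
        nonzeros = coded_coeffs - coded_zeros
        if index + 1 >= warmup and nonzeros > 0:
            theta = coded_bits / nonzeros  # bits per nonzero coefficient so far
        chosen.append(q)
    return chosen, coded_bits, theta


# Hypothetical frame: 99 blocks of 64 Laplacian-like integer coefficients
rng = random.Random(0)
frame = [[rng.choice((-1, 1)) * round(rng.expovariate(0.1)) for _ in range(64)]
         for _ in range(99)]
qs, bits, theta = rate_control_frame(frame, target_bits=3000)
print(len(qs), bits, theta)
```

Because the remaining bit budget is rebalanced against the uncoded MBs at every step, an early misestimate of the slope is corrected within the same frame, which is the coded/uncoded balancing behaviour the text attributes to the algorithm.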

The significant picture quality improvement is due to our accurate source model and robust rate control. The picture quality improvements for the other two test videos are listed in Table II. A PSNR gain of 0.87 dB on average is achieved. For some video frames, the gain is as high as 2 dB.

Fig. 14. Relative bit rate control error in percentage for each frame in Foreman when the proposed rate control algorithm (solid-dot line) and the TM5 algorithm are applied to the MPEG-2 coding.

Fig. 15. Relative bit rate control error in percentage for each frame in Tabletennis when the proposed rate control algorithm (solid-dot line) and the TM5 algorithm are applied to the MPEG-2 coding.

Fig. 16. PSNR of each frame in Foreman when the proposed rate control algorithm (solid dotted line) and the TM5 algorithm are applied to the MPEG-2 coding.

Fig. 17. PSNR of each frame in Tabletennis when the proposed rate control algorithm (solid dotted line) and the TM5 algorithm are applied to the MPEG-2 coding.

TABLE III. DESCRIPTION OF THE RATE CONTROL TESTS WITH THE H.263 CODEC

B. Rate Control in H.263

In real-time video coding with H.263, the time delay should be very small, which imposes a strict requirement on the rate control process. In the following experiment, we compare the proposed rate control algorithm with the TMN8 algorithm [10], which is one of the best available rate control algorithms for video coding. The configuration of each test is shown in Table III. The frame rate is fixed at 10 fps. We plot the number of bits in the buffer for each test in Fig. 18. The proposed rate control algorithm maintains a much steadier buffer level than TMN8. The number of bits produced by each frame is plotted in Fig. 19. It can be seen that with the proposed algorithm the output bit rate of the video encoder is well matched to the target bit rate or the channel bandwidth.
The average PSNR of the luminance component in each test is listed in Table III. The proposed algorithm achieves a slightly improved picture quality due to its more robust rate control and fewer skipped frames.

Fig. 18. Number of bits in the buffer when the proposed algorithm (solid line) and the TMN8 rate control algorithm (dotted line) are applied to the H.263 video coding.

Fig. 19. Number of bits produced by each encoded frame when the proposed algorithm (solid line) and the TMN8 rate control algorithm (dotted line) are applied to the H.263 video coding.

C. Rate Control in MPEG-4

We use the MoMuSys MPEG-4 codec [27] with the H.263-type quantization scheme to test the proposed algorithm and the rate control algorithm in VM8. The two test videos are Carphone and News. We treat the whole scene as one video object. The frame rate is 10 fps. The target bit rate is 64 kbps. The number of bits produced by each frame in Carphone and News is plotted in Figs. 20 and 21, respectively. It can be seen that with the proposed algorithm, an almost constant bit rate is achieved. The relative control error is less than 1%. With the VM8 rate control, however, the bit rate variation is very large. On average, PSNR improvements of 0.5 dB and 0.7 dB are achieved by the proposed rate control algorithm for Carphone and News, respectively. The proposed rate control algorithm has also been tested on other video sequences at different coding bit rates and yields similar results. These simulation results, along with the results presented above, show that the proposed algorithm provides much more robust and accurate rate control than the other algorithms reported in the literature.

Fig. 20. Number of bits produced by each encoded frame in Carphone when the proposed algorithm (solid line) and the VM8 rate control algorithm (dotted line) are applied to the MPEG-4 coding.

Fig. 21. Number of bits produced by each encoded frame in News when the proposed algorithm (solid line) and the VM8 rate control algorithm (dotted line) are applied to the MPEG-4 coding.

VIII. CONCLUDING REMARKS

There are two major contributions in this work. First, we have proposed a novel framework for source modeling by studying the rate function in the ρ-domain instead of the traditional q-domain. A unified linear source model has been developed for typical transform video coding systems. The proposed source model is very simple but very accurate. Based on Shannon's source coding theorem, we have also developed a theoretical justification for this linear source model. The physical meaning of the model parameter is discussed. Second, a unified rate control algorithm is proposed for various standard video coding systems, which outperforms other algorithms reported in the literature by providing more accurate and robust rate control. The algorithm proposed in this work is conceptually simple. It has very low computational complexity and implementation cost.

APPENDIX

A detailed derivation of (9) is given in this appendix. Note that

(28)

and

From (8), we have

(29)

(30)

(31)

(32)

(33)

(34)

(35)

REFERENCES

[1] G. K. Wallace, "The JPEG still picture compression standard," Commun. ACM, vol. 34, pp , Apr
[2] D. LeGall, "MPEG: A video compression standard for multimedia application," Commun. ACM, vol. 34, pp , Apr
[3] H. Gish and J. N. Pierce, "Asymptotically efficient quantizing," IEEE Trans. Inform. Theory, vol. IT-14, pp , Sept
[4] T. Berger, Rate Distortion Theory. Englewood Cliffs, NJ: Prentice-Hall,
[5] H.-M. Hang and J.-J. Chen, "Source model for transform video coder and its application Part I: Fundamental theory," IEEE Trans. Circuits Syst. Video Technol., vol. 7, pp , Apr
[6] A. Said and W. A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp , June
[7] W. Ding and B. Liu, "Rate control of MPEG video coding and recording by rate-quantization modeling," IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp , Feb
[8] T. Chiang and Y.-Q. Zhang, "A new rate control scheme using quadratic rate distortion model," IEEE Trans. Circuits Syst. Video Technol., vol. 7, pp , Feb


[9] H.-J. Lee, T. Chiang, and Y.-Q. Zhang, "Scalable rate control for MPEG-4 video," IEEE Trans. Circuits Syst. Video Technol., vol. 10, pp , Sept
[10] J. Ribas-Corbera and S. Lei, "Rate control in DCT video coding for low-delay communications," IEEE Trans. Circuits Syst. Video Technol., vol. 9, pp. 172-185, Feb. 1999.
[11] J. Ribas-Corbera and S. Lei, "Contribution to the rate control Q2 experiment: A quantizer control tool for achieving target bitrates accurately," in Coding of Moving Pictures and Associated Audio MPEG96/M1812 ISO/IEC JTC/SC29/WG11, Sevilla, Spain, Feb
[12] "Video Coding for Low Bit Rate Communications," in ITU-T Recommendation H.263, version 1: ITU-T,
[13] G. Cote, B. Erol, M. Gallant, and F. Kossentini, "H.263+: Video coding at low bit rates," IEEE Trans. Circuits Syst. Video Technol., vol. 8, pp , Nov
[14] ITU-T/SG-15, "Video codec test model, TMN5," in Telenor Research: Telenor codec,
[15] T. Sikora, "The MPEG-4 video standard verification model," IEEE Trans. Circuits Syst. Video Technol., vol. 7, pp , Feb
[16] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. Signal Processing, vol. 41, pp , Dec
[17] M. J. Tsai, J. D. Villasenor, and F. Chen, "Stack-run image coding," IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp , Oct
[18] T. Chang and C.-C. J. Kuo, "Texture analysis and classification with tree-structured wavelet transform," IEEE Trans. Image Processing, vol. 2, pp , Oct
[19] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," IEEE Trans. Image Processing, vol. 1, pp , Apr
[20] R. W. Buccigrossi and E. P. Simoncelli, "Image compression via joint statistical characterization in the wavelet domain," IEEE Trans. Image Processing, vol. 8, pp , Dec
[21] S. M. LoPresto, K. Ramchandran, and M. T. Orchard, "Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework," in Proc. Data Compression Conf., Snowbird, UT, Mar. 1997, pp
[22] E. Y. Lam and J. W. Goodman, "A mathematical analysis of the DCT coefficient distributions for images," IEEE Trans. Image Processing, vol. 9, pp , Oct
[23] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding. New York: McGraw-Hill,
[24] T. G. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley,
[25] "MPEG-2 video test model 5," in ISO/IEC JTC1/SC29/WG11 MPEG93/457,
[26] "Text of ISO/IEC MPEG4 video VM Version 8.0," in ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Associated Audio MPEG 97/W1796. Stockholm, Sweden: Video Group,
[27] "MPEG4 verification model version 7.0," in ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Associated Audio MPEG97. Bristol, U.K.: MoMuSys codec,

Zhihai He (M'01) received the B.S. degree from Beijing Normal University, Beijing, China, and the M.S. degree from the Institute of Computational Mathematics, Chinese Academy of Sciences, Beijing, China, in 1994 and 1997, respectively, both in mathematics, and the Ph.D. degree from the University of California, Santa Barbara, CA, in 2001, in electrical engineering. In 2001, he joined Sarnoff Corporation, Princeton, NJ, as a Member of Technical Staff. His current research interests include image compression, video coding, network transmission, wireless communication, and embedded system design. Dr. He received the 2001 IEEE Circuits and Systems Society CSVT Transactions Best Paper Award.

Sanjit K. Mitra (S'59-M'63-SM'69-F'74-LF'01) received the B.Sc. (Hons.) degree in physics in 1953 from Utkal University, Cuttack, India, the M.Sc. (Tech.) degree in radio physics and electronics in 1956 from Calcutta University, Calcutta, India, the M.S. and Ph.D.
degrees in electrical engineering from the University of California, Berkeley, in 1960 and 1962, respectively, and an Honorary Doctorate of Technology degree from the Tampere University of Technology, Tampere, Finland, in From June 1962 to June 1965, he was with Cornell University, Ithaca, NY, as an Assistant Professor of Electrical Engineering. He was with the AT&T Bell Laboratories, Holmdel, NJ, from June 1965 to January He has been on the faculty at the University of California since then, first at the Davis campus and then at the Santa Barbara campus since 1977, as a Professor of Electrical and Computer Engineering, where he served as Chairman of the Department from July 1979 to June He has published over 550 papers in signal and image processing, 12 books, and holds five patents. Dr. Mitra served as the President of the IEEE Circuits and Systems (CAS) Society in 1986 and as a Member-at-Large of the Board of Governors of the IEEE Signal Processing (SP) Society during He is currently a member of the editorial boards of four journals. He is the recipient of the 1973 F.E. Terman Award and the 1985 AT&T Foundation Award of the American Society of Engineering Education, the 1989 Education Award, and the 2000 Mac Van Valkenburg Society Award of the IEEE CAS Society, the Distinguished Senior U.S. Scientist Award from the Alexander von Humboldt Foundation of Germany in 1989, the 1996 Technical Achievement Award, and the 2001 Society Award of the IEEE SP Society, the IEEE Millennium Medal in 2000, and the McGraw-Hill/Jacob Millman Award of the IEEE Education Society in He is the co-recipient of the 2000 Blumlein Browne Willans Premium of the Institution of Electrical Engineers (London), the 2001 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY Best Paper Award, and the 2002 Technical Achievement Award of the European Association for Signal Processing (EURASIP). 
He is an Academician of the Academy of Finland and a Corresponding Member of the Croatian Academy of Sciences and Arts. He is a Fellow of the AAAS, and SPIE, and a member of EURASIP and ASEE.
