IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 7, JULY 2005

Fast Mode Decision Algorithm for Intraprediction in H.264/AVC Video Coding
Feng Pan, Xiao Lin, Susanto Rahardja, Keng Pang Lim, Z. G. Li, Dajun Wu, and Si Wu

Abstract - The H.264/AVC video coding standard aims to enable significantly improved compression performance compared to all existing video coding standards. In order to achieve this, a robust rate-distortion optimization (RDO) technique is employed to select the best coding mode and reference frame for each macroblock. As a result, the complexity and computation load increase drastically. This paper presents a fast mode decision algorithm for H.264/AVC intraprediction based on local edge information. Prior to intraprediction, an edge map is created and a local edge direction histogram is then established for each subblock. Based on the distribution of the edge direction histogram, only a small part of the intraprediction modes are chosen for the RDO calculation. Experimental results show that the fast intraprediction mode decision scheme increases the speed of intracoding significantly with negligible loss of peak signal-to-noise ratio.

Index Terms - AVC, H.264, intraprediction, JVT, MPEG, video coding.

Fig. 1. Variable block size for rate-distortion optimization.
Fig. 2. Computation of RDcost.

Manuscript received October 21, 2003; revised May 20, 2004. This paper was recommended by Associate Editor F. Pereira. The authors are with the Institute for Infocomm Research, 119613 Singapore (e-mail: efpan@i2r.a-star.edu.sg; linxiao@i2r.a-star.edu.sg; rsusanto@i2r.a-star.edu.sg; kplim@i2r.a-star.edu.sg; ezgli@i2r.a-star.edu.sg; djwu@i2r.a-star.edu.sg; swu@i2r.a-star.edu.sg). Digital Object Identifier 10.1109/TCSVT.2005.848356

I. INTRODUCTION

The newest international video coding standard is H.264/AVC [1]. It has recently been approved by ITU-T as Recommendation H.264 and by ISO/IEC as International Standard 14496-10 (MPEG-4 Part 10) Advanced Video Coding (AVC). The elements common to all video coding standards are present in the current H.264/AVC recommendation: an MB is 16 × 16 in size; luminance (luma) is represented with higher resolution than chrominance (chroma), with 4:2:0 subsampling; motion compensation and block transforms are followed by scalar quantization and entropy coding; motion vectors are predicted from the median of the motion vectors of neighboring blocks; bidirectional pictures (B-pictures) are supported that may be motion compensated from both temporally previous and subsequent pictures; and a direct mode exists for B-pictures in which both forward and backward motion vectors are derived from the motion vector of a co-sited macroblock (MB) in a reference picture. Some new techniques, such as spatial prediction in intracoding, adaptive block size motion compensation, a 4 × 4 integer transform, multiple reference pictures (up to seven reference pictures) and context-adaptive binary arithmetic coding (CABAC), are used in this standard. Testing results for H.264/AVC show that it greatly outperforms existing video coding standards in both peak signal-to-noise ratio (PSNR) and visual quality [2].

To achieve the highest coding efficiency, H.264/AVC uses a nonnormative technique called Lagrangian rate-distortion optimization (RDO) to decide the coding mode of an MB [3]. Fig. 1 shows the possible MB modes and Fig. 2 shows the RDO process. As can be seen from Fig. 2, in order to choose the best coding mode for an MB, the H.264/AVC encoder calculates the rate-distortion (RD) cost (RDcost) of every possible mode and chooses the mode with the minimum value; this calculation is repeated for every possible mode of a given MB. Therefore, the computational burden of this brute-force search is far greater than that of any existing video coding algorithm.
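
As a rough illustration of this exhaustive search (an illustrative sketch, not the JM reference code), the following Python fragment ranks candidate modes by the Lagrangian cost J = D + lambda * R; the mode list and the encode_mb helper are hypothetical placeholders for the encoder's full encode-and-reconstruct path.

# Minimal sketch of exhaustive Lagrangian RDO mode decision (illustrative only).
# encode_mb() is a hypothetical helper that would return (distortion, rate) for a
# macroblock coded with the given mode; it stands in for the full encode/reconstruct path.

def rd_cost(distortion, rate, lam):
    """Lagrangian cost J = D + lambda * R used to rank candidate modes."""
    return distortion + lam * rate

def best_mode(mb, candidate_modes, encode_mb, lam):
    """Try every candidate mode and keep the one with the minimum RD cost."""
    best = None
    best_cost = float("inf")
    for mode in candidate_modes:
        distortion, rate = encode_mb(mb, mode)  # SSD of the reconstruction, bits spent
        cost = rd_cost(distortion, rate, lam)
        if cost < best_cost:
            best_cost, best = cost, mode
    return best, best_cost
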
To reduce the complexity of H.264/AVC, a number of efforts have been made to explore fast algorithms for motion estimation, intramode prediction and intermode prediction in H.264/AVC video coding [4], [5]. Fast motion estimation is a well-studied topic and is widely applied in existing standards such as MPEG-1/2/4 and H.261/H.263. However, these fast motion estimation algorithms cannot be applied directly to H.264/AVC coding because of its variable block size motion estimation. On the other hand, fast intramode decision is a new topic in H.264/AVC coding, and very few previous works exist so far.

It is believed that fast intramode decision algorithms are also very important in reducing the overall complexity of H.264/AVC. We have made two contributions to H.264/AVC related to fast mode decision algorithms, which have been adopted as part of the nonnormative reference model for H.264/AVC [6], [7]. In this paper, we present one of these contributions, a fast intramode decision algorithm for H.264/AVC intraprediction that uses local edge information. The presented algorithm considerably reduces the amount of calculation needed for intraprediction with negligible loss of coding quality. We have observed that the pixels along the direction of a local edge normally have similar values (this is true for both luma and chroma components), and a good prediction can be achieved if we predict the pixels using those neighboring pixels that lie along the direction of the edge. Therefore, an edge map that represents the local edge orientation and strength is created, and a local edge direction histogram is then established for each subblock. Based on the distribution of the edge direction histogram, only a small number of prediction modes are chosen for the RDO calculation during intraprediction. Experimental results show that the fast mode decision algorithm increases the speed of intracoding significantly with negligible loss of quality.

The rest of the paper is organized as follows. Section II gives an overview of intracoding in H.264/AVC. Sections III and IV present in detail the fast intraprediction algorithm based on the edge direction histogram. Experimental results are presented in Section V and conclusions are given in Section VI.

II. OVERVIEW OF INTRACODING IN H.264/AVC

Intracoding refers to the case where only the spatial redundancies within a video picture are exploited. The resulting picture is referred to as an I-picture. Traditionally, I-pictures are encoded by directly applying the transform to all MBs in the picture. In previous video coding standards (namely H.263 and MPEG-4), intraprediction was conducted in the transform domain. Intraprediction in H.264/AVC is always conducted in the spatial domain, by referring to neighboring samples of previously coded blocks. The difference between the actual block/MB and its prediction is then coded. With these advanced prediction modes, the performance of intracoding in H.264/AVC is comparable to that of the recent still-image compression standard JPEG-2000 [8].

If an MB is encoded in intramode, a prediction block is formed based on previously coded and reconstructed blocks before deblocking. This prediction block is subtracted from the current block prior to encoding. For the luma samples, the prediction block may be formed for each 4 × 4 block (denoted I4MB) or for an entire MB (denoted I16MB). When using I4MB prediction, each 4 × 4 block of the luma component uses one of nine prediction modes. Besides DC prediction, eight directional prediction modes are specified. When using I16MB prediction, which is well suited to smooth image areas, a uniform prediction is performed for the whole luma component of the MB; four prediction modes are supported. The chroma samples of an MB are always predicted with a prediction technique similar to that of the luma component in I16MB prediction.

Fig. 3. (a) I4MB prediction is conducted for samples a-p of a block using samples A-Q. (b) Eight prediction directions for I4MB prediction.
A. I4MB Prediction Modes

The nine prediction modes for each 4 × 4 luma block are shown in Fig. 3. It can be seen that I4MB prediction is conducted for samples a-p of a block using samples A-Q. In total there are eight directional prediction modes and one DC prediction mode for I4MB prediction, as detailed in the following [1].
Mode 0: Vertical prediction.
Mode 1: Horizontal prediction.
Mode 2: DC prediction.
Mode 3: Diagonal down-left prediction.
Mode 4: Diagonal down-right prediction.
Mode 5: Vertical-right prediction.
Mode 6: Horizontal-down prediction.
Mode 7: Vertical-left prediction.
Mode 8: Horizontal-up prediction.
For example, if we choose Mode 0 (vertical), then pixels a, e, i, and m are predicted from the neighboring pixel A; pixels b, f, j, and n are predicted from pixel B; and so on. If we choose Mode 7 (vertical-left), each pixel is instead predicted from a weighted average of the neighboring samples above the block, taken along the vertical-left direction. Note that DC is a special prediction mode, in which the mean of the left-hand and upper samples (pixels A to D and I to L in Fig. 3) is used to predict the entire block. DC prediction is normally useful for blocks with little or no local activity.

B. I16MB Prediction Modes

As an alternative to the I4MB prediction described above, the entire MB may be predicted at once. This is well suited to smooth image areas, where a uniform prediction is performed for the whole luma component of the MB. Four prediction modes are supported.
Mode 0 (vertical): extrapolation from the upper samples.
Mode 1 (horizontal): extrapolation from the left samples.
Mode 2 (DC): mean of the upper and left-hand samples.
Mode 3 (plane): plane prediction based on a linear spatial interpolation using the upper and left-hand samples of the MB.
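
For concreteness, the following illustrative sketch (assuming the sample labeling of Fig. 3 and that all neighboring samples are available) forms three of the nine 4 × 4 luma predictions, namely vertical, horizontal and DC; the function name and array layout are illustrative and not taken from the standard text.

import numpy as np

def predict_4x4(mode, top, left):
    """Form a 4x4 luma prediction block from reconstructed neighbors.

    top  : the 4 samples A-D above the block
    left : the 4 samples I-L to the left of the block
    Only modes 0 (vertical), 1 (horizontal) and 2 (DC) are sketched here;
    the remaining directional modes interpolate the neighbors along their angles.
    """
    top = np.asarray(top, dtype=np.int32)
    left = np.asarray(left, dtype=np.int32)
    if mode == 0:                       # vertical: copy A-D down each column
        return np.tile(top, (4, 1))
    if mode == 1:                       # horizontal: copy I-L across each row
        return np.tile(left.reshape(4, 1), (1, 4))
    if mode == 2:                       # DC: rounded mean of A-D and I-L fills the block
        dc = (top.sum() + left.sum() + 4) >> 3
        return np.full((4, 4), dc, dtype=np.int32)
    raise NotImplementedError("directional modes 3-8 omitted in this sketch")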

Fig. 4. Examples of 4 × 4 edge patterns and their preferred intraprediction directions.

C. 8 × 8 Chroma Prediction Modes

Each 8 × 8 chroma component of an MB is predicted from chroma samples above and/or to the left that have previously been encoded and reconstructed. The four chroma prediction modes are very similar to those of the I16MB prediction, except that the order of the mode numbers is different: DC (Mode 0), horizontal (Mode 1), vertical (Mode 2) and plane (Mode 3). The same prediction mode is always applied to both chroma blocks.

H.264/AVC uses the RDO technique to achieve the best coding performance. This means that the encoder has to encode the intrablock using all the mode combinations and choose the one that gives the best RDO performance. Since the choice of prediction modes for the chroma components is independent of that for the luma components, each chroma prediction mode must be tried with every luma prediction mode. Therefore, the number of mode combinations for the luma and chroma components of an MB is N = C8 × (16 × L4 + L16), where C8, L4, and L16 represent the number of modes for chroma prediction, I4MB prediction and I16MB prediction, respectively. With C8 = 4, L4 = 9 and L16 = 4, an MB requires 4 × (16 × 9 + 4) = 592 different RDO calculations before the best RDO mode is determined. As a result, the complexity and computational load of the encoder are extremely high.

III. DETERMINING THE PRIMARY EDGE DIRECTION IN THE IMAGE BLOCK

We observed that the pixels along the direction of a local edge normally have similar values (this is true for both luma and chroma components). Therefore, a good prediction can be achieved if we predict the pixels using those neighboring pixels that lie along the direction of the edge. Fig. 4 shows a few edge patterns of a 4 × 4 block and their preferred directional predictions. There are a number of ways to obtain local edge directional information, such as an edge direction histogram based on a simple edge detection algorithm [9], or directional fields based on local gradients [10]. The algorithm described in this paper is based on edge detection because of its low computational complexity. The rest of this section explains in detail the fast intraprediction algorithm using an edge direction histogram based on edge detection.

A. Edge Map

In order to obtain the edge information in the neighborhood of the intrablock to be predicted, the edge map of the video picture is generated using the Sobel edge operators. Each pixel in the video picture is then associated with an element of the edge map, namely the edge vector containing its edge direction and amplitude. The Sobel operator has two convolution kernels that respond to the degree of difference in the vertical and horizontal directions. For a pixel p(i, j) in a luma (or chroma) picture, we define the corresponding edge vector as

    D(i, j) = {dx(i, j), dy(i, j)},    (1)

where dx(i, j) and dy(i, j) are the responses of the two Sobel kernels, i.e., the degrees of difference along the two orthogonal directions. The amplitude of the edge vector can be roughly estimated by

    Amp(D(i, j)) = |dx(i, j)| + |dy(i, j)|.    (2)

In fact, the amplitude could be obtained more accurately as the square root of the sum of the squares of dx(i, j) and dy(i, j); the latter is computationally expensive, and thus (2) is used. The direction of the edge (in degrees) is given by the arctangent function

    ang(D(i, j)) = arctan(dy(i, j) / dx(i, j)) × 180 / π.    (3)

It must be noted that, in the actual implementation of the algorithm, (3) is not necessary. This is because H.264/AVC has only a limited number of prediction modes for intracoding, so a simple thresholding technique applied to the edge directions is sufficient to build up the edge direction histogram.
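
For illustration, the edge-vector computation of (1)-(3) can be sketched as follows, using the standard 3 × 3 Sobel kernels and the |dx| + |dy| amplitude approximation of (2); the helper below is an assumption-laden sketch (the actual implementation avoids the explicit angle of (3), as noted above) and it assumes the pixel lies in the interior of the picture.

import numpy as np

# Standard 3x3 Sobel kernels; SOBEL_X responds to horizontal differences
# (vertical edges) and SOBEL_Y to vertical differences (horizontal edges).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.int32)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.int32)

def edge_vector(pic, i, j):
    """Return (dx, dy, amplitude, angle_deg) for interior pixel (i, j) of an 8-bit picture."""
    win = pic[i - 1:i + 2, j - 1:j + 2].astype(np.int32)
    dx = int((win * SOBEL_X).sum())
    dy = int((win * SOBEL_Y).sum())
    amp = abs(dx) + abs(dy)                  # approximation (2), cheaper than sqrt(dx^2 + dy^2)
    ang = np.degrees(np.arctan2(dy, dx))     # direction (3), in degrees
    return dx, dy, amp, ang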

B. Edge Direction Histogram

In order to decide whether the image block contains an edge, and how strong this edge is, an edge direction histogram is calculated from the edge map of all the pixels in the block by summing up the amplitudes of the edges with similar edge directions.

1) 4 × 4 Luma Block Edge Direction Histogram: In the case of a 4 × 4 luma block, there are eight directional prediction modes, as shown in Fig. 3, plus a DC prediction mode. The border between any two adjacent directional prediction modes is the bisectrix of the two corresponding directions. For example, the border between Mode 1 (0°) and Mode 8 (26.6°) is the direction at 13.3°, because for Mode 8 prediction is done at an angle of approximately 26.6° above the horizontal direction. It is important to note that Mode 3 and Mode 8 are adjacent because of the circular symmetry of the prediction modes. The mode of each pixel is determined by its edge direction. The edge direction histogram of a 4 × 4 luma block is therefore built as follows: for each pixel in the block, let H(k), k = 0, 1, 3, ..., 8, be the histogram cell of the directional prediction mode k; the edge amplitude Amp(D(i, j)) of the pixel is added to the cell H(k) whose prediction direction is closest to the pixel's edge direction ang(D(i, j)), with the cell borders given by the bisectrices described above.

Note that Mode 2 is not included in the above procedure, because Mode 2 (DC) will always be chosen as one of the candidate modes. Fig. 5 shows the edge direction histogram of Fig. 4(c); it shows that this block exhibits a strong edge in the vertical-right direction.

Fig. 5. Edge direction histogram of Fig. 4(c).

2) Edge Direction Histogram for a 16 × 16 Luma Block and an 8 × 8 Chroma Block: In the case of 16 × 16 luma and 8 × 8 chroma blocks, there are only two directional prediction modes, plus a plane prediction and a DC prediction mode. Therefore, the edge direction histogram in this case is based on three directions, i.e., the horizontal, vertical and diagonal (plane) directions, as shown in Fig. 6. Note that both the diagonal down-right and the diagonal down-left prediction directions are associated with the plane prediction. Although it is not mathematically correct to associate plane prediction with any directional edge, we can certainly associate the vertical and horizontal predictions with their respective directional edges, and it is therefore reasonable to try plane prediction whenever the block is not obviously a DC case. The edge direction histogram for a 16 × 16 luma block is constructed accordingly: the edge amplitude of each pixel is added to the vertical, the horizontal, or the plane cell, depending on which of these three directions is closest to the pixel's edge direction.

Fig. 6. Intra 8 × 8 and 16 × 16 prediction mode directions.

For the same reason as before, Mode 2 (DC) is not included in this histogram. An example of such an edge direction histogram is shown in Fig. 7. Note that for 8 × 8 chroma blocks the same construction is applied, except that the order of the mode numbers is different.
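
A possible realization of the 4 × 4 histogram construction described above is sketched below. The per-mode angles are assumptions for illustration, chosen to be consistent with the values quoted in the text (Mode 1 at 0°, Mode 8 at 26.6°); the exact table follows Fig. 3(b) and the paper's axis convention. The paper applies simple thresholding rather than explicit angle computation; the sketch uses explicit angles purely for clarity.

import numpy as np

# Assumed angle (degrees, modulo 180) for each directional I4MB mode; treat these
# constants as illustrative placeholders for the directions of Fig. 3(b).
MODE_ANGLE = {1: 0.0, 8: 26.6, 3: 45.0, 7: 63.4, 0: 90.0, 5: 116.6, 4: 135.0, 6: 153.4}

def angular_distance(a, b):
    """Distance between two edge directions with 180-degree periodicity."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def edge_direction_histogram_4x4(amps, angs):
    """Accumulate each pixel's edge amplitude into the cell of the angularly
    nearest directional mode (DC, Mode 2, is excluded as in the paper).

    amps, angs: 4x4 arrays of edge amplitudes and directions (degrees) from the edge map.
    """
    hist = {mode: 0.0 for mode in MODE_ANGLE}
    for amp, ang in zip(np.ravel(amps), np.ravel(angs)):
        nearest = min(MODE_ANGLE, key=lambda m: angular_distance(ang, MODE_ANGLE[m]))
        hist[nearest] += float(amp)
    return hist

def primary_mode(hist):
    """The cell with the global maximum amplitude gives the primary prediction mode."""
    return max(hist, key=hist.get)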

Fig. 7. Example of a 16 × 16 luma and 8 × 8 chroma block edge direction histogram.

As mentioned above, each cell in the edge direction histogram sums up the amplitudes of those pixels with similar edge directions in the block. Obviously, the histogram cell with the maximum amplitude indicates that there is a strong edge along this direction in the block, and this direction is therefore considered the preferable prediction direction. The mode whose direction complies with it is chosen as the primary prediction mode. Note that only the cell with the global maximum is chosen as the primary prediction mode, even if the histogram has multiple maxima. The above algorithm thus produces one primary prediction mode each for a 4 × 4 luma block, a 16 × 16 luma block, and an 8 × 8 chroma block.

IV. MODE DECISION FOR INTRAPREDICTION

Based on the primary prediction mode determined above, the fast mode decision algorithms for intraprediction select a small number of prediction modes as the candidates to be used in the RDO computation. It should be noted that the actual RDO computation in H.264/AVC intracoding is based on the reconstructed images, while the edge direction histogram is calculated from the original lossless images, because the reconstructed image is not available when the histogram is computed. Consequently, the primary prediction mode decided above will not always be the best RDO mode in actual coding. We have therefore tried a number of ways of deciding the number of preferred prediction modes, as discussed in the following.

Method 1: The mode with the maximum amplitude in the edge direction histogram is chosen as the candidate prediction mode; if this amplitude is below a predefined threshold, DC is chosen instead.

Method 2: This method simply includes DC mode as a candidate mode in addition to the primary prediction mode. This eliminates the effect that different thresholds lead to different performances on different sequences, which is the case with Method 1.

Method 3: In this method, additional information is added on top of Method 2. The window over which the histogram is computed is enlarged by including the pixels in the column to the left and the row above the block of interest, since the block of interest is predicted from the pixels above and/or to the left of the block.

Method 4: During the experiments with Method 2, it was observed that the chosen intraprediction mode is either the primary prediction mode or one of its two neighboring modes (in terms of direction). Therefore, the two additional candidate prediction modes are taken to be the two directional neighbors of the primary prediction mode (refer to Fig. 3).

Experimental results have shown that Method 4 achieves a good balance between computation time and coding efficiency, and the rest of this section describes the detailed implementation of this method. The experimental section will nevertheless also present a comparison among all four methods.

A. I4MB Prediction Modes

Experimental results have shown that, in general, the histogram cell with the maximum amplitude is the best candidate for intraprediction (Method 1). In the case that all the cells have similar amplitudes, DC mode is a better choice, so an amplitude threshold is needed to decide whether the intrablock exhibits a strong edge or is just a flat region. However, it is difficult to predefine a universal threshold that suits different block contexts and different video sequences. Therefore, we always choose DC mode as the second candidate to participate in the RDO operation (Method 2).
Extensive experiments also show that the chosen intraprediction mode is either the primary prediction mode or one of its two neighboring modes (in terms of direction). The main cause of this phenomenon is that, in H.264/AVC, RDO is based on the reconstructed (lossy) intra images, while the edge direction histogram is calculated from the original lossless images. Therefore, the two additional candidate prediction modes are taken to be the two directional neighbors of the primary prediction mode. For example, if the primary prediction mode is Mode 1, the two additional candidate prediction modes are Mode 8 and Mode 6. Note that Mode 8 and Mode 3 are adjacent modes in terms of direction because of the circular symmetry of the prediction directions. In summary, for I4MB prediction coding, the histogram cell with the maximum amplitude, its two adjacent cells, and the DC mode are chosen to take part in the RDO calculation. Therefore, for each 4 × 4 luma block, we perform the RDO calculation for only 4 modes instead of 9.

B. I16MB Prediction Modes

Based on the same observation as above, the primary prediction mode decided by the edge direction histogram is considered a candidate for the best prediction mode, and DC mode is chosen as the second candidate. Therefore, in I16MB prediction coding, we perform the RDO calculation for only 2 modes instead of 4.
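
Building on the histogram sketch above, the candidate lists for I4MB and I16MB could be assembled as follows; the circular ordering of the directional modes is derived from the assumed angle table above (under which Mode 1's neighbors are Mode 8 and Mode 6, as in the text) and is not taken verbatim from the paper.

# Directional I4MB modes ordered by their (assumed) angles, wrapping circularly.
DIRECTIONAL_ORDER = [1, 8, 3, 7, 0, 5, 4, 6]
DC_MODE = 2

def i4mb_candidates(primary):
    """Primary mode, its two angular neighbors, and DC: 4 RDO candidates instead of 9."""
    i = DIRECTIONAL_ORDER.index(primary)
    left = DIRECTIONAL_ORDER[(i - 1) % len(DIRECTIONAL_ORDER)]
    right = DIRECTIONAL_ORDER[(i + 1) % len(DIRECTIONAL_ORDER)]
    return [primary, left, right, DC_MODE]

def i16mb_candidates(primary):
    """Primary I16MB mode plus DC (mode 2): 2 RDO candidates instead of 4."""
    return [primary, 2]

# Example: if the primary I4MB mode is Mode 1 (horizontal), i4mb_candidates(1)
# returns [1, 6, 8, 2], i.e., horizontal, its two directional neighbors, and DC.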

C. 8 × 8 Chroma Prediction Modes

For intrachroma blocks there are two different edge direction histograms, one from the U component and the other from the V component. The primary prediction modes from the two components are both considered as candidate modes. As before, DC mode is also used in the RDO calculation. Note that, according to the standard, the same prediction mode is always applied to both chroma blocks. Therefore, if the primary prediction modes from the two components are the same, there are only 2 candidate modes for the RDO calculation; otherwise, there are 3. Thus, for each 8 × 8 chroma block, we perform the RDO calculation for either 2 or 3 modes instead of 4.

D. Algorithm Complexity Analysis

TABLE I. NUMBER OF CANDIDATE MODES

Table I summarizes the number of candidates selected for the RDO calculation based on the edge direction histogram. As can be seen from Table I, with the fast mode decision algorithm the encoder needs to perform only 2 × (16 × 4 + 2) = 132 RDO calculations if the two chroma components have the same primary prediction mode. In the case that the two chroma components have different primary prediction modes (which is very rare), the total number of RDO calculations is 3 × (16 × 4 + 2) = 198. Thus, our fast intraprediction algorithm significantly reduces the number of RDO mode calculations compared with the 592 modes used in the full RDO calculation of H.264/AVC video coding.

E. Early Termination of RDO Calculation

During the intracoding of any prediction mode, the calculation can be terminated early if it can be foreseen that the current mode will not be the best prediction mode. By terminating RDO calculations that are bound to be suboptimal, a large time saving can be achieved. In RDO, the coding cost consists of two parts: rate and distortion. After calculating the rate cost, it may turn out that the rate cost alone is already higher than the total coding cost of the best mode among the modes examined so far. This implies that the current mode cannot be the best mode, since its total coding cost cannot be the smallest. In that case, the RDO calculation is terminated and the computation of the distortion is skipped.

An MB is encoded by either I4MB prediction or I16MB prediction. In RDO, the selection between these two coding modes is determined by the coding costs of the MB under each of them. After I16MB prediction coding, I4MB prediction coding is applied to the sixteen 4 × 4 blocks of the MB and their costs are accumulated. However, if the accumulated cost already exceeds that of I16MB prediction coding before all sixteen 4 × 4 blocks have been encoded, the coding of the remaining 4 × 4 blocks in the MB is terminated prematurely.
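
The two early-exit checks just described can be sketched as follows; code_rate, code_distortion and code_block_cost are hypothetical placeholders for the encoder's actual rate and distortion computations.

def rd_cost_with_early_exit(mode, mb, best_cost_so_far, lam, code_rate, code_distortion):
    """Skip the distortion computation when the rate term alone already exceeds
    the best total cost found so far (Section IV-E); helper names are placeholders."""
    rate = code_rate(mb, mode)               # bits needed to signal and code this mode
    if lam * rate >= best_cost_so_far:       # the total cost can only grow, so give up early
        return None
    distortion = code_distortion(mb, mode)   # SSD of the reconstruction
    cost = distortion + lam * rate
    return cost if cost < best_cost_so_far else None

def i4mb_total_cost(blocks, i16_cost, code_block_cost):
    """Accumulate the sixteen 4x4 block costs, stopping as soon as the running total
    already exceeds the cost of coding the whole MB with I16MB prediction."""
    total = 0.0
    for blk in blocks:
        total += code_block_cost(blk)
        if total > i16_cost:
            return None                      # I16MB wins; remaining 4x4 blocks are skipped
    return total
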
V. EXPERIMENTAL RESULTS

Our proposed algorithm was implemented in the JM6.1e reference software provided by JVT. According to the specifications provided in [11], the test conditions are as follows.
1) The MV search range is 32 pels for QCIF and CIF.
2) RD optimization is enabled.
3) The number of reference frames equals 1.
4) CABAC is enabled.
5) The MV resolution is 1/4 pel.
6) The GOP structure is IPPPP or IBBPBB.
A group of experiments was carried out on the recommended sequences with quantization parameters 28, 32, 36, and 40, as specified in [12]. The averaged PSNR over the luma (Y) and chroma (U, V) components is used, computed from the corresponding average mean square error (MSE) of the components. The comparison results were produced and tabulated in terms of the coding time difference, the PSNR difference and the bit-rate difference; the coding time statistics are generated by the JM6.1e encoder.

The test platform is a Pentium IV 2.8-GHz machine with 512 MB of RAM. In order to evaluate the time saving of the fast intramode decision algorithm, the following measure of the time difference is used. Let T_JM denote the coding time used by the JM6.1e encoder and T_fast the time taken by the fast intraprediction algorithm; the time difference is defined as

    ΔTime = (T_fast - T_JM) / T_JM × 100%.

The PSNR and bit-rate differences are calculated as the numerical averages between the RD curves derived from the JM6.1e encoder and from the fast algorithm, respectively. The detailed procedure for calculating these differences can be found in a JVT document authored by Bjontegaard [13], which is recommended by the JVT Test Model Ad Hoc Group [12]. Note that the PSNR and bit-rate differences should be regarded as equivalent, i.e., there is either an increase in PSNR or a decrease in bit rate, not both at the same time.

A. Experiments on IPPPP Sequences

It should be noted that, in H.264/AVC coding, MBs in P-frames also consider intracoding among the possible coding modes in the RDO operation, so a large time saving is expected from the fast intracoding algorithm for this type of sequence. Table II shows the tabulated performance comparison of the proposed algorithm with JM6.1e for the sequences listed in [12]. In this experiment, the total number of frames is 300 for each sequence, and the period of I-frames is 100, i.e., there is one I-frame for every 100 coded frames. Note that in the table positive values mean increments and negative values mean decrements. The differences in PSNR and bit rate are calculated according to [13].

TABLE II. RESULTS FOR IPPPP SEQUENCES

It can be seen that the fast intraprediction algorithm achieves a consistent time saving (25% on average) with negligible loss in PSNR and a negligible increment in bit rate. This means that the fast intra algorithm takes only about 3/4 of the time needed by JM6.1e. Figs. 8 and 9 show the RD curves of the two sequences News and Mobile. Again, these two figures show that the fast intraprediction algorithm has RDO performance similar to that of JM6.1e. We have noticed that the simple early termination scheme described in Section IV-E contributed about 6% to 8% of the total time saving, with negligible loss of PSNR. However, at higher quantization values the increase in bit rate is slightly larger than at lower quantization values.

Fig. 8. News, ΔPSNR = -0.067 dB, ΔBits = 1.226%.
Fig. 9. Mobile, ΔPSNR = -0.018 dB, ΔBits = 0.451%.

Figs. 10 and 11 show the time saving at different intraperiods and with different search areas for motion estimation. It can be seen from these figures that the fast intra algorithm achieves a similar time saving when the intraperiod changes from 50 to 150 frames. However, the time saving is reduced significantly when the size of the search area increases. This is because, in H.264 video coding, the rate-distortion optimization for the intercoding mode decision is much more complex than that for the intracoding mode decision owing to the motion estimation operations; i.e., the time taken to perform the RDO for intercoding is much longer than that for intracoding, and this becomes even more pronounced as the search area increases.

Fig. 10. Time saving at different intraperiods.
Fig. 11. Time saving at different sizes of the search area.

Fig. 12 shows the time saving for different numbers of reference frames. It can be seen that the time saving decreases as the number of reference frames increases. This is similar to the case of Fig. 11, as the increased number of reference frames increases the proportion of intercoding in the overall computational load.

Fig. 12. Time saving for different numbers of reference frames.

B. Experiments on All-Intraframe Sequences

In this experiment, a total of 300 frames is used for each sequence, and the period of I-frames is set to 1, i.e., all the frames in the sequence are intracoded. It can be seen from Table III that the fast intraprediction algorithm achieves a consistent time saving (60% on average), which means that the fast intra algorithm takes only about 40% of the time needed by JM6.1e. The average loss of PSNR is about 0.24 dB or, equivalently, there is a slight increment in bit rate of about 3.7%. Figs. 13 and 14 show the RD curves of the two sequences News and Mobile. Again, these two figures show that the fast intraprediction algorithm has RDO performance similar to that of JM6.1e.

TABLE III. RESULTS FOR ALL-INTRA (IIIII) SEQUENCES
Fig. 13. News, ΔPSNR = -0.294 dB, ΔBits = 3.902%.
Fig. 14. Mobile, ΔPSNR = -0.255 dB, ΔBits = 3.168%.

C. Experiments on IBBPBB Sequences

In this experiment, the picture type is set to IBBPBB, i.e., there are two B-frames between any two I- or P-frames. A total of 300 frames is used for each sequence, and the period of I-frames is set to 100. It can be seen from Table IV that the fast intraprediction algorithm achieves a consistent time saving (10% on average) with negligible loss in PSNR and a negligible increment in bit rate. It is noted that the time saving for this type of sequence is much smaller than for the IPPPP format. This is because, in H.264/AVC coding, B-frames do not use intracoding, and also because, in B-frame coding, motion estimation takes much longer than in P-frame coding.

TABLE IV. RESULTS FOR IBBPBB SEQUENCES

Another interesting observation from the table is that QCIF sequences achieve a larger time saving than CIF sequences. This is due to the high percentage of boundary MBs in a QCIF sequence; the search area for those MBs is much smaller than for nonboundary MBs. Figs. 15 and 16 show the RD curves of the two sequences News and Mobile. Again, these two figures show that the fast intraprediction algorithm has RDO performance similar to that of JM6.1e.

D. Comparison of Different Fast Intraprediction Methods

As mentioned at the beginning of Section IV, besides the proposed method we also tried other ways of deciding the number of preferred prediction modes based on the primary prediction mode. Table V compares these methods. In this experiment, the settings and parameters are the same as in Section V-A, and we only present the results for the two sequences News and Mobile. It can be seen from Table V that all four methods achieve a significant time saving and that, in terms of RD performance, Method 3 achieves the best results, although it is slightly inferior in time saving.

TABLE V. COMPARISON OF DIFFERENT FAST INTRAPREDICTION METHODS
Fig. 15. News, ΔPSNR = -0.156 dB, ΔBits = 3.106%.
Fig. 16. Mobile, ΔPSNR = -0.013 dB, ΔBits = 0.379%.

VI. CONCLUSION

This paper presented a fast mode decision algorithm for intraprediction in H.264/AVC video coding. By making use of the edge direction histogram, the number of mode combinations for the luma and chroma blocks of an MB that take part in the RDO calculation is reduced significantly, from 592 to as low as 132. Other techniques, such as early termination of the RDO mode calculation, are also used to further reduce the computation time. This results in a large reduction of the complexity and computational load of the encoder. Experimental results show that the fast algorithm has a negligible loss of PSNR compared to the original scheme.

REFERENCES

[1] Information Technology - Coding of Audio-Visual Objects - Part 10: Advanced Video Coding, Final Draft International Standard, ISO/IEC FDIS 14496-10, Dec. 2003.
[2] Report of the Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264), MPEG2003/N6231, Dec. 2003.
[3] G. Sullivan, T. Wiegand, and K.-P. Lim, "Joint model reference encoding methods and decoding concealment methods," presented at the 9th JVT Meeting (JVT-I049d0), San Diego, CA, Sep. 2003.
[4] X. Li and G. Wu, "Fast integer pixel motion estimation," presented at the 6th JVT Meeting (JVT-F011), Awaji Island, Japan, Dec. 2002.
[5] Z. Chen, P. Zhou, and Y. He, "Fast integer pel and fractional pel motion estimation for JVT," presented at the 6th JVT Meeting (JVT-F017), Awaji Island, Japan, Dec. 2002.
[6] F. Pan, X. Lin, S. Rahardja, K. P. Lim, Z. G. Li, G. N. Feng, D. J. Wu, and S. Wu, "Fast mode decision algorithm for JVT intra prediction," presented at the 7th JVT Meeting (JVT-G013), Pattaya, Thailand, Mar. 2003.
[7] K. P. Lim, S. Wu, D. J. Wu, S. Rahardja, X. Lin, F. Pan, and Z. G. Li, "Fast intermode decision," presented at the 9th JVT Meeting (JVT-I020), San Diego, CA, Sep. 2003.
[8] D. Marpe, V. George, H. L. Cycon, and K. U. Barthel, "Performance evaluation of Motion-JPEG2000 in comparison with H.264/AVC operated in intra coding mode," in Proc. SPIE Conf. Wavelet Applications in Industrial Processing, Oct. 2003, pp. 129-137.
[9] A. K. Jain and A. Vailaya, "Image retrieval using color and shape," Pattern Recognit., vol. 29, pp. 1233-1244, 1996.
[10] A. M. Bazen and S. H. Gerez, "Systematic methods for the computation of the directional fields and singular points of fingerprints," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 905-919, Jul. 2002.
[11] G. Sullivan, "Recommended simulation common conditions for H.26L coding efficiency experiments on low resolution progressive scan source material," presented at the 14th VCEG Meeting (VCEG-N81), Santa Barbara, CA, Sep. 2001.
[12] JVT Test Model Ad Hoc Group, "Evaluation sheet for motion estimation," Draft version 4, Feb. 19, 2003.
[13] G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," presented at the 13th VCEG Meeting (VCEG-M33), Austin, TX, Apr. 2001.

Feng Pan (M'00-SM'03) received the B.Sc., M.Sc., and Ph.D. degrees in communication and electronic engineering from Zhejiang University, Hangzhou, China, in 1983, 1986, and 1989, respectively. Since then, he has been teaching and doing research at a number of universities in China, the U.K., Ireland, and Singapore. He is now with the Institute for Infocomm Research, Singapore. His research areas are digital image processing, digital signal processing, digital video compression, and digital television broadcasting. He has published numerous technical papers and offered many short courses for industry. Dr. Pan currently serves as the Chapter Chairman of IEEE Consumer Electronics, Singapore.

Xiao Lin (M'99-SM'02) received the Ph.D. degree from the Electronics and Computer Science Department, University of Southampton, Southampton, U.K., in 1993. He worked with the Centre for Signal Processing (CSP) for about five years as a Researcher and Manager on the Multimedia Program. He then worked for DeSOC Technology as a Technical Director, where he contributed to a VoIP solution and speech packet loss concealment for Bluetooth and WCDMA baseband SoC development. He joined the Institute for Infocomm Research, Singapore, in July 2002, where he is now a Research Manager in charge of multimedia signal processing areas.

Z. G. Li (M'97-SM'04) received the B.Sci. and M.Eng. degrees from Northeastern University, Shenyang, China, in 1992 and 1995, respectively, and the Ph.D. degree from Nanyang Technological University, Singapore, in 2001. He is currently with the Institute for Infocomm Research (I2R), Singapore. He is also an Adjunct Assistant Professor at Nanyang Technological University, Singapore. He has published more than 30 journal papers in the fields of video processing, hybrid systems, chaotic secure communication, and computer networks.

Susanto Rahardja (M'00-SM'04) received the B.Eng. degree in electrical engineering from the National University of Singapore (NUS), Singapore, the M.Eng. degree in digital communication and microwave circuits, and the Ph.D. degree in the area of logic synthesis and signal processing from Nanyang Technological University (NTU), Singapore, in 1991, 1993, and 1997, respectively. He joined the Centre for Signal Processing, NTU, as a Research Engineer in 1996, became a Research Fellow in 1997, and served as a Business Development Manager in 1998. In 2001, he joined NTU as an Academic Professor and was appointed Assistant Director of the Centre for Signal Processing. In 2002, he joined the Agency for Science, Technology and Research and was appointed Program Director to lead the Signal Processing Program. He is the Co-Founder of AMIK Raharja Informatika and STMIK Raharja, institutes of higher learning in Tangerang, Indonesia. He is currently the Director of the Media Division in the Institute for Infocomm Research, Singapore, and has published more than 100 articles in international journals and conferences. He is also an Associate Professor at the School of Electrical and Electronic Engineering, Nanyang Technological University. His research interests include binary and multiple-valued logic synthesis, digital communication systems, and digital signal processing. Dr. Rahardja was the recipient of the IEE Hartree Premium Award in 2002 and the Tan Kah Kee Young Inventors' GOLD Award (Open Category) in 2003.

Dajun Wu received the B.S. degree in computer science from Northwest University, Xi'an, China, and the M.Eng. degree in computer engineering from Xi'an Jiaotong University, Xi'an, China, in 1993 and 1998, respectively. From 1998 to 2000, he was a Research Scholar in the School of Computer Engineering, Nanyang Technological University, Singapore. Since 2000, he has been with the Institute for Infocomm Research, Singapore. His research fields include image/video coding and computer vision.

Keng Pang Lim (M'95) received the B.A.Sc. and Ph.D. degrees from the School of Computer Engineering, Nanyang Technological University, Singapore, in 1994 and 2001, respectively.
He is an Associate Lead Scientist at the Institute for Infocomm Research, Singapore, where he currently leads a video coding group. His research interests include video coding, computer vision, and number-theoretic transforms. Dr. Lim was the recipient of the Du Pont Scholarship and the Sony Prize Award.

Si Wu received the B.S. and M.Eng. degrees in telecommunications from Xidian University, Xi'an, China. He is currently a Senior Technical Officer at the Institute for Infocomm Research, Singapore. His research interests are multimedia communication, networking, and video processing.