Study of AVS China Part 7 for Mobile Applications By Jay Mehta EE 5359 Multimedia Processing Spring 2010 1
Contents Parts and profiles of AVS Standard Introduction to Audio Video Standard for Mobile Applications AVS M AVS M Codec Profiles and levels Major and minor tools used in AVS M Error Concealment and Resilience Conclusions Results List of Acronyms References 2
AVS Standard The Audio Video coding Standard (AVS) workgroup was established in China in 2002 to develop national audio and video coding standards. To cover the diverse applications in the area of video, AVS-China is organized into several profiles [16]. AVS-video defines four profiles, namely the Jizhun (base) profile, Jiben (basic) profile, Shenzhan (extended) profile and Jiaqiang (enhanced) profile, each targeting different applications (Table 2) [16]. 3
Part Category
1 System
2 Video
3 Audio
4 Conformance test
5 Reference software
6 Digital media rights management
7 Mobile video
8 Transmission of AVS via IP networks
9 AVS file format
10 Mobile speech and audio coding
Table 1. Different parts of AVS-China [3] 4
Profiles and key applications
Jizhun profile (base): television broadcasting, HDTV, etc.
Jiben profile (basic): mobility applications, etc.
Shenzhan profile (extended): video surveillance, etc.
Jiaqiang profile (enhanced): multimedia entertainment, etc.
Table 2. Application-based profiles of AVS [16] 5
Introduction to AVS M AVS is a complete standard system covering system, video, audio and media copyright management. AVS M is the 7th part of the video coding standard developed by the AVS Workgroup of China, and it targets mobile systems and devices. AVS M defines a single Jiben profile with 9 different levels. AVS follows a layered structure for the data, and this representation is seen in the coded bitstream. The sequence layer provides an entry point into the coded video; it consists of a set of mandatory and optional parameters. 6
Picture The picture layer provides the coded representation of a video frame. It comprises a header with mandatory and optional parameters and optionally with user data. There are 3 types of pictures defined by the AVS: I- Pictures P-Pictures B-Pictures 4:2:0 Sub sampling format is used in AVS M. 7
Picture AVS M supports only I picture and P picture which are shown in Figure 1. AVS M supports only progressive video sequence. Therefore, one picture is one frame. P picture can have a maximum of two reference frames for forward prediction. 8
Slice A slice is a series of macroblocks. Slices must not overlap, must be contiguous, and must begin and terminate at the left and right edges of the picture. A single slice can cover the entire picture. Slices are coded independently, so no slice can refer to another slice during the decoding process. 9
Macroblocks and blocks Picture is divided into macroblocks. The upper left sample of each MB should not exceed picture boundary. Macroblock partitioning is used for motion compensation. The number in each rectangle specifies the order of appearance of motion vectors. 10
AVS M encoder [10] 11
AVS M Codec Every input MB is either intra predicted or inter predicted. In an AVS M encoder, switch S0 selects the prediction method for the current MB, whereas in the decoder S0 is controlled by the MB type of the current MB. Intra predictions are derived from the neighboring pixels in the left and top blocks; the unit size of intra prediction is 4x4, matching the 4x4 integer cosine transform used by AVS M. Inter predictions are derived from previously decoded frames. AVS M employs an adaptive variable length coding (VLC) technique for entropy coding. 12
AVS M Codec The reconstructed image is the sum of the prediction and the reconstructed error image. AVS M places the deblocking filter inside the motion compensation loop. The deblocking process acts directly on the reconstructed reference, first across vertical edges and then across horizontal edges. 13
AVS M Decoder [10] 14
Profiles and levels [3]
AVS M defines the Jiben profile, which has 9 levels:
1.0: up to QCIF and 64 kbps
1.1: up to QCIF and 128 kbps
1.2: up to CIF and 384 kbps
1.3: up to CIF and 768 kbps
2.0: up to CIF and 2 Mbps
2.1: up to HHR and 4 Mbps
2.2: up to SD and 4 Mbps
3.0: up to SD and 6 Mbps
3.1: up to SD and 8 Mbps 15
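The level constraints listed above can be captured in a small lookup. Below is a sketch of a level-selection helper; the function name and the resolution ordering are my own, and the values are transcribed from the list above:

```python
# Jiben profile level constraints: (largest resolution class, max bit rate in bps).
JIBEN_LEVELS = {
    "1.0": ("QCIF", 64_000),
    "1.1": ("QCIF", 128_000),
    "1.2": ("CIF", 384_000),
    "1.3": ("CIF", 768_000),
    "2.0": ("CIF", 2_000_000),
    "2.1": ("HHR", 4_000_000),
    "2.2": ("SD", 4_000_000),
    "3.0": ("SD", 6_000_000),
    "3.1": ("SD", 8_000_000),
}

def min_level(resolution, bitrate_bps):
    """Return the lowest Jiben level whose constraints cover the stream,
    or None if no level fits."""
    order = {"QCIF": 0, "CIF": 1, "HHR": 2, "SD": 3}
    for level, (res, max_rate) in JIBEN_LEVELS.items():  # dict keeps insertion order
        if order[resolution] <= order[res] and bitrate_bps <= max_rate:
            return level
    return None
```

For example, a QCIF stream at 100 kbps needs level 1.1, while an SD stream at 5 Mbps needs level 3.0.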
Major and Minor tools of AVS M Network Abstraction layer NAL Supplemental Enhancement Information SEI Transform 4x4 integer transform Quantization and scaling- scaling only in encoder. Intra prediction 9 modes, simple 4x4 intra prediction and direct intra prediction Motion compensation 16x16/16x8/8x16/8x8/8x4/4x8/4x4 modes Quarter pixel interpolation 8 tap horizontal interpolation filter and 4 tap vertical interpolation filter Simplified in loop deblocking filter Entropy coding Error resilience 16
Network Abstraction Layer NAL [7] In AVS M video compression, a compressed video bitstream is made up of access units (AUs). An AU contains the information needed to decode one picture and consists of a number of NAL units, some of which are optional. A NAL unit can be a sequence parameter set (SPS), a picture parameter set (PPS), an SEI message, a picture header, or a slice_layer_rbsp (raw byte sequence payload), which consists of a slice_header followed by slice data. 17
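To illustrate how a bitstream decomposes into NAL units, here is a sketch that splits a byte stream at start codes. It assumes an Annex-B-style 0x000001 start-code convention, which is a common framing for NAL-based codecs; the exact AVS M byte-stream framing is defined in the standard text:

```python
def split_nal_units(stream: bytes) -> list:
    """Split a byte stream into NAL-unit payloads, assuming each unit is
    preceded by a 3-byte 0x000001 start code (an illustrative convention)."""
    units = []
    start = stream.find(b"\x00\x00\x01")
    while start != -1:
        nxt = stream.find(b"\x00\x00\x01", start + 3)  # next start code, if any
        end = nxt if nxt != -1 else len(stream)
        units.append(stream[start + 3:end])            # payload between start codes
        start = nxt
    return units
```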
Transform [7] The 4x4 block is the unit of the transform and of intra prediction, and the smallest motion compensation unit in AVS M. AVS M uses prescaled integer transform (PIT) technology: all scale-related operations are performed in the encoder, so the decoder needs no scaling operations. PIT is used in AVS M to reduce complexity. 18
Quantization Quantization is performed by an adaptive uniform quantizer on the transform coefficients. The step size of the quantizer can be varied to provide rate control. The transmitted quantization parameter is used directly for luminance coefficients; for chrominance coefficients it is modified at the upper end of its range. The quantization parameter varies from 0 to 63 in steps of one. 19
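A minimal sketch of uniform quantization with a QP-controlled step size follows. The step-size mapping below, which doubles the step every 8 QP increments, is an illustrative assumption in the style of this codec family, not the normative AVS M table:

```python
def quantize(coeff, step):
    """Uniform (mid-tread) quantization: index = round(coeff / step)."""
    return round(coeff / step)

def dequantize(index, step):
    """Inverse quantization: reconstructed value = index * step."""
    return index * step

def step_size(qp):
    """Illustrative assumption: step size doubles every 8 QP steps.
    The actual AVS M quantizer tables are normative."""
    assert 0 <= qp <= 63
    return 2.0 ** (qp / 8.0)
```

A larger QP gives a larger step, hence coarser coefficients and a lower bit rate, which is exactly the rate-control lever described above.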
Intra Prediction [3] Two types: Intra_4x4 and Direct Intra Prediction (DIP), which gives significant complexity reduction while maintaining comparable performance. Intra_4x4: each 4x4 block is predicted from spatially neighboring samples. For each 4x4 block, one of nine prediction modes can be utilized to exploit spatial correlation: eight directional prediction modes (such as Down Left, Vertical, etc.) and one non-directional prediction mode (DC). 20
Intra Prediction [3] The 16 samples of the 4x4 block, labeled a-p, are predicted using previously decoded samples in adjacent blocks, labeled A-D, E-H and X. The upper-right pixels used for prediction are extended from pixel sample D, and the lower-left pixels are extended from H. 21
Intra Prediction [10] One of the 9 prediction modes shown below is used to exploit spatial correlation. 22
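As an illustration of the non-directional DC mode, the sketch below sets every sample of a 4x4 block to the mean of the available neighboring reconstructed samples. The fallback value and rounding here are a common convention; the standard's exact rounding and unavailability rules may differ:

```python
def predict_dc_4x4(top, left):
    """DC intra prediction for a 4x4 block: all 16 samples take the mean
    of the available top-row and left-column neighbor samples."""
    samples = list(top or []) + list(left or [])
    # When no neighbors are available, fall back to mid-gray (8-bit assumption).
    dc = round(sum(samples) / len(samples)) if samples else 128
    return [[dc] * 4 for _ in range(4)]
```

Directional modes work analogously but copy or extrapolate neighbors along a fixed direction instead of averaging them.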
Content based Most Probable Intra Mode Decision A statistical model is used to determine the most probable intra mode of the current block based on video characteristics and content correlation; a look-up table predicts the most probable intra mode of the current block. Irrespective of whether Intra_4x4 or DIP is used, the most probable mode decision proceeds as follows: Get the intra modes of the up block and the left block. If the up (or left) block is not available for intra mode prediction, its mode is defined as -1. Use the up intra mode and left intra mode to look up the most probable mode in the table. 23
Content based Most Probable Intra Mode Decision If the current MB is coded in Intra_4x4 mode, the intra prediction mode is coded as follows: If the best mode equals the most probable mode, a 1-bit flag is transmitted for each block to indicate that the mode of the current block is its most probable mode. If the best mode is not the most probable mode, the 1-bit flag indicates that the mode of the current block is not the most probable mode, and then 3 bits of mode information are transmitted. Thus the mode information of each block is represented in either 1 bit or 4 bits. 24
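The 1-bit/4-bit signaling above can be sketched as follows. The exact bitstream syntax differs, and the remapping of the 8 non-MPM modes into 3 bits is an illustrative assumption (with 9 modes total, excluding the most probable mode leaves exactly 8 candidates):

```python
def encode_intra_mode(best_mode, most_probable):
    """Return the bits spent on one 4x4 block's intra mode:
    '1' when the best mode is the most probable mode (1 bit total),
    otherwise '0' plus a 3-bit index into the 8 remaining modes (4 bits)."""
    if best_mode == most_probable:
        return "1"
    # Skip over the MPM so the remaining 8 modes fit in 3 bits.
    remaining = best_mode if best_mode < most_probable else best_mode - 1
    return "0" + format(remaining, "03b")
```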
Direct Intra Prediction When direct intra prediction is used, a new method is followed to code the intra prediction mode information. Rate-distortion based direct intra prediction comprises 5 steps. Step 1: All 16 4x4 blocks in a MB use their most probable modes for Intra_4x4 prediction; calculate RDCost(DIP) of this MB, where RDCost(mode) = D(mode) + λ·R(mode). Step 2: Perform the Intra_4x4 mode search to find the best intra prediction mode of each block, and calculate RDCost(Intra_4x4). 25
Direct Intra Prediction Step 3: Compare RDCost(DIP) and RDCost(Intra_4x4). If RDCost(DIP) is less than RDCost(Intra_4x4), set the DIP flag to 1 and go to step 4; otherwise set the DIP flag to 0 and go to step 5. Step 4: Encode the MB using DIP and finish the encoding of this MB. Step 5: Encode the MB using ordinary Intra_4x4 and finish the encoding of this MB. 26
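Steps 3-5 reduce to a simple rate-distortion comparison. A minimal sketch using the RDCost formula from the slides, with distortion, rate and lambda as plain numbers:

```python
def rd_cost(distortion, rate_bits, lam):
    """RDCost(mode) = D(mode) + lambda * R(mode), as in the slides."""
    return distortion + lam * rate_bits

def choose_dip(cost_dip, cost_intra4x4):
    """Step 3 above: the DIP flag is 1 exactly when DIP has the lower RD cost."""
    return 1 if cost_dip < cost_intra4x4 else 0
```

DIP wins whenever using only the most probable modes saves more rate than it costs in distortion, which is why it cuts mode-search complexity with little quality loss.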
Interframe Prediction [2] AVS M defines I pictures and P pictures. A P picture uses forward motion compensated prediction, and the maximum number of reference pictures used by a P picture is 2. AVS M also specifies nonreference P pictures: if the nal_ref_idc of a P picture is equal to 0, the P picture shall not be used as a reference picture. Nonreference P pictures can be used for temporal scalability. Reference pictures are identified by the reference picture number, which is 0 for an IDR picture; the reference picture number of a non-IDR reference picture is derived from that of the preceding reference picture in decoding order [2]. 27
Interframe Prediction After decoding the current picture, if its nal_ref_idc is not equal to 0, the current picture is marked as used for reference. If the current picture is an IDR picture, all reference pictures except the current picture shall be marked as unused for reference. Otherwise, if the nal_unit_type of the current picture is not equal to 0 and the total number of reference pictures excluding the current picture equals num_ref_frames, the following applies: If num_ref_frames is 1, the reference pictures excluding the current picture in the DPB shall be marked as unused for reference. If num_ref_frames is 2 and the sliding window size is 2, the reference picture (excluding the current picture) in the DPB with the smaller reference picture number shall be marked as unused for reference. Otherwise, if num_ref_frames is 2 and the sliding window size is 1, the reference picture (excluding the current picture) in the DPB with the larger reference picture number shall be marked as unused for reference. 28
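The marking rules above can be sketched as a small DPB-update function. This is a simplification: the DPB is modeled as a plain list of reference picture numbers, and pictures marked unused are simply dropped from the list:

```python
def mark_after_decoding(dpb, current, num_ref_frames, window_size):
    """Update the list of reference picture numbers after decoding a picture.
    `current` is a (ref_num, nal_ref_idc, is_idr) tuple."""
    ref_num, nal_ref_idc, is_idr = current
    if is_idr:
        dpb = []  # IDR: every other reference picture becomes unused
    elif nal_ref_idc != 0 and len(dpb) == num_ref_frames:
        if num_ref_frames == 1:
            dpb = []                # the single old reference is evicted
        elif window_size == 2:
            dpb = [max(dpb)]        # drop the smaller reference picture number
        else:                       # window_size == 1
            dpb = [min(dpb)]        # drop the larger reference picture number
    if nal_ref_idc != 0:
        dpb = dpb + [ref_num]       # current picture is marked used for reference
    return dpb
```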
Interframe Prediction The size of a motion compensation block can be 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 or 4x4. If half_pixel_mv_flag is equal to 1, the precision of the motion vector is up to ½ pixel; otherwise it is up to ¼ pixel. When half_pixel_mv_flag is not present in the bitstream, it shall be inferred to be 1. The interpolated values at half-sample positions are obtained using the 8-tap filter F1 = (-1, 4, -12, 41, 41, -12, 4, -1) and the 4-tap filter F2 = (-1, 5, 5, -1). 29
Interframe Prediction [3] The positions of the integer, half and quarter pixel samples are shown in the figure 8. Capital letters indicate integer sample positions, while small letters indicate half and quarter sample positions. 30
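A sketch of one-dimensional half-sample interpolation with an 8-tap filter follows. The taps (-1, 4, -12, 41, 41, -12, 4, -1) sum to 64, so the filtered value is rounded and right-shifted by 6; the clipping and picture-border handling here follow common practice rather than the normative text:

```python
TAPS_8 = (-1, 4, -12, 41, 41, -12, 4, -1)  # half-pel filter taps, sum = 64

def half_pel(samples, center):
    """Interpolate the half-sample position between samples[center] and
    samples[center + 1]; `samples` must supply 3 pixels of context on
    each side of the interpolated position."""
    acc = sum(t * samples[center - 3 + k] for k, t in enumerate(TAPS_8))
    val = (acc + 32) >> 6              # round to nearest, divide by 64
    return max(0, min(255, val))       # clip to the 8-bit sample range
```

On a linear ramp the filter lands almost exactly on the midpoint, which is the behavior one wants from an interpolation filter.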
Deblocking Filter AVS M makes use of a simplified deblocking filter in which boundary strength is decided at the MB level. Filtering is applied to the boundaries of luma and chroma blocks, except at picture and slice boundaries. An intra-predicted MB usually has larger residuals than an inter-predicted MB, which leads to very strong blocking artifacts at the same QP; hence a stronger filter is applied to intra-predicted MBs and a weaker filter to inter-predicted MBs. When QP is not very large, the distortion caused by quantization is relatively small, and hence no filtering is required. 31
Deblocking Filter [7] The filtering process is applied only if the following three conditions hold; otherwise it is bypassed:
|p0 - q0| < α(IndexA)
|p1 - p0| < β(IndexB)
|q1 - q0| < β(IndexB)
where α and β are derived from IndexA and IndexB, and p1, p0, q0, q1 are the samples on either side of each block boundary. [12] 32
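The three conditions translate directly into a boundary-activity check: the edge step must be below alpha and both side gradients below beta, so strong real image edges are left unfiltered. A sketch:

```python
def should_filter(p1, p0, q0, q1, alpha, beta):
    """True when the deblocking filter is applied across the p|q boundary:
    a small step across the edge and small gradients on both sides indicate
    a blocking artifact rather than a genuine image edge."""
    return abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta
```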
Entropy Coding [5] Entropy coding maps the video signal, after prediction and transformation, to a variable length coded bitstream. AVS M uses Exp-Golomb codes, as shown in the table below, to encode syntax elements such as quantized coefficients, macroblock coding type, and motion vectors. 18 coding tables are used for encoding quantized coefficients; the encoder uses the run and the absolute value of the current coefficient to select the table. 33
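Exp-Golomb codes themselves are easy to reproduce: the unsigned code writes value + 1 in binary and prefixes it with one leading zero per bit after the first. The signed mapping shown below is the conventional one used for elements such as motion vector differences:

```python
def ue_golomb(value):
    """Unsigned Exp-Golomb code for v >= 0: binary of (v + 1), prefixed
    with (len - 1) zero bits. Small values get short codes."""
    assert value >= 0
    code = bin(value + 1)[2:]          # binary string without the '0b' prefix
    return "0" * (len(code) - 1) + code

def se_golomb(value):
    """Signed Exp-Golomb mapping: v > 0 maps to 2v - 1, v <= 0 maps to -2v,
    then the unsigned code is applied."""
    return ue_golomb(2 * value - 1 if value > 0 else -2 * value)
```

So values 0, 1, 2 code as 1, 010, 011: the code is prefix-free and favors the small magnitudes that dominate after prediction.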
Entropy Coding [5] 34
Context based Adaptive 2-D Variable Length Coding In AVS, an efficient context-based adaptive 2D variable length coding is designed for coding the transform coefficients of a 4x4 block. The transform coefficients are mapped into a one-dimensional (level, run) sequence by a reverse zigzag scan. It employs 2D joint VLC to remove the redundancy between the levels and runs in transform coefficient blocks, and it employs multiple conditionally trained 2D VLC tables that better match the different (level, run) probability distributions at different coding phases through automatic table switching. It also makes use of an improved table switching method and an improved escape coding method. In addition, since the transform block size in AVS is 4x4, a new 4-bit Coded Block Pattern syntax element, CBP_4x4, is introduced for better compatibility with the 4x4 transform. 35
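The (level, run) mapping can be sketched as below. For simplicity this scans in forward zigzag order and pairs each nonzero level with the count of zeros preceding it; as noted above, AVS actually forms the pairs along a reverse zigzag scan:

```python
# Zigzag scan order for a 4x4 block as (row, col) pairs, low frequency first.
ZIGZAG_4X4 = [(0,0),(0,1),(1,0),(2,0),(1,1),(0,2),(0,3),(1,2),
              (2,1),(3,0),(3,1),(2,2),(1,3),(2,3),(3,2),(3,3)]

def run_level_pairs(block):
    """Map a 4x4 block of quantized coefficients to (level, run) pairs:
    each nonzero level carries the number of zeros scanned since the
    previous nonzero coefficient."""
    pairs, run = [], 0
    for r, c in ZIGZAG_4X4:
        v = block[r][c]
        if v == 0:
            run += 1
        else:
            pairs.append((v, run))
            run = 0
    return pairs
```

The 2D VLC tables then assign one codeword to each (level, run) pair, which is where the joint redundancy between levels and runs is removed.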
Error Concealment To deal with transmission errors, several classes of techniques have been specified: Forward error concealment: the encoder plays the primary role. Backward error concealment: the decoder estimates the information lost to transmission errors and performs the concealment. Interactive error concealment: gives the best reconstruction quality, but is difficult to implement. 36
Error Resilience For the purpose of error concealment, scene signaling in SEI carries two kinds of information: (1) the frames in which the scene change starts and ends; and (2) the type of the scene transition. If part of the current picture with which a scene information SEI message is associated is lost or corrupted, the decoder may apply a spatial error concealment algorithm to reconstruct the lost or corrupted parts if the scene has changed since the previously received picture; otherwise the decoder may use a spatiotemporal error concealment algorithm. 37
Comparison between AVS Part 2 and AVS Part 7 Table 3 Comparison between AVS Part 2 and AVS Part 7 [2] 38
Applications [2] AVS Part-7: Mobile video Jiben Profile Record and local playback on mobile devices Multimedia Message Service (MMS) Streaming and broadcasting Real-time video conversation 39
SSIM The structural similarity (SSIM) index is a method for measuring the similarity between two images. The SSIM index is a full reference metric; in other words, it measures image quality using an initial uncompressed or distortion-free image as the reference. SSIM is designed to improve on traditional methods like Peak Signal-to-Noise Ratio (PSNR) and Mean Squared Error (MSE), which have proved to be inconsistent with human visual perception. 40
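Both metrics used in the results below are straightforward to compute. The sketch gives PSNR from the mean squared error, and a single-window SSIM: the full SSIM index averages the same statistic over local windows, and the constants C1 and C2 use the customary 0.01 and 0.03 factors of the peak value:

```python
import math

def psnr(original, reconstructed, peak=255.0):
    """Y-PSNR in dB: 10 * log10(peak^2 / MSE) over the luma samples."""
    n = len(original)
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / n
    if mse == 0:
        return float("inf")            # identical signals
    return 10.0 * math.log10(peak * peak / mse)

def global_ssim(x, y, peak=255.0):
    """Single-window SSIM sketch:
    ((2*mx*my + C1)(2*cov + C2)) / ((mx^2 + my^2 + C1)(var_x + var_y + C2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (var_x + var_y + c2))
```

Identical signals give infinite PSNR and an SSIM of 1.0, matching the trend in the result tables where low QP pushes Y-SSIM toward 0.999.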
Input sequence: QCIF Foreman [10]. Measurements: bit rate, Y-PSNR. QCIF sequence: Foreman (4:2:0 format). Total number of frames: 300. Width: 176. Height: 144. Frame rate: 20 fps. 41
Results for QCIF Foreman Sequence
QP | Original File Size [Kb] | Compressed File Size [Kb] | Compression Ratio | Bit Rate [Kbps] | Y-PSNR [dB]
63 | 11138 | 23 | 523:1 | 29.7 | 25.8
60 | 11138 | 31 | 419:1 | 73.3 | 29.1
55 | 11138 | 51 | 289:1 | 164.5 | 31.4
50 | 11138 | 86 | 123:1 | 251.7 | 33.73
40 | 11138 | 268 | 57:1 | 389.2 | 41.07
30 | 11138 | 780 | 18:1 | 476.5 | 43.2
20 | 11138 | 1960 | 5:1 | 583 | 43.89
10 | 11138 | 4259 | 2.89:1 | 649.3 | 44.99
5 | 11138 | 5085 | 2.12:1 | 849.7 | 45.47
0 | 11138 | 7197 | 1.3:1 | 1015 | 49.85 42
PSNR v/s Bitrate for Foreman QCIF 43
Input sequence: QCIF Car Phone [10]. Measurements: bit rate, Y-PSNR. QCIF sequence: Car Phone (4:2:0 format). Total number of frames: 300. Width: 176. Height: 144. Frame rate: 20 fps. 44
Results for QCIF Car Phone Sequence
QP | Original File Size [Kb] | Compressed File Size [Kb] | Compression Ratio | Bit Rate [Kbps] | Y-PSNR [dB]
63 | 14182 | 23 | 581:1 | 25.85 | 24.58
60 | 14182 | 31 | 431:1 | 52.15 | 29.17
55 | 14182 | 51 | 262:1 | 101.11 | 33.27
50 | 14182 | 86 | 155:1 | 151.14 | 35.52
40 | 14182 | 268 | 50:1 | 235.81 | 42.24
30 | 14182 | 780 | 17:1 | 389.26 | 45.71
20 | 14182 | 1960 | 7:1 | 600.66 | 48.47
10 | 14182 | 4259 | 3:1 | 1473.73 | 50.55
5 | 14182 | 5085 | 2.63:1 | 2051.6 | 51
0 | 14182 | 7197 | 2:1 | 2789.2 | 52.44 45
PSNR v/s Bitrate for Car Phone QCIF 46
Input sequence: QCIF Akiyo [10]. Measurements: bit rate, Y-PSNR, Y-SSIM. QCIF sequence: Akiyo (4:2:0 format). Total number of frames: 300. Width: 176. Height: 144. Frame rate: 20 fps. (Reconstructed frames shown: Original, QP=0, QP=40, QP=63.) 47
Results for QCIF Akiyo Sequence
QP | Original File Size [Kb] | Compressed File Size [Kb] | Compression Ratio | Bit Rate [Kbps] | Y-PSNR [dB] | Y-SSIM
63 | 5569 | 9 | 619:1 | 1.69 | 24.413 | 0.678
60 | 5569 | 10 | 557:1 | 1.91 | 25.299 | 0.705
55 | 5569 | 12 | 464:1 | 2.43 | 27.644 | 0.788
50 | 5569 | 16 | 348:1 | 3.27 | 30.108 | 0.852
40 | 5569 | 32 | 174:1 | 6.53 | 34.885 | 0.934
30 | 5569 | 67 | 83:1 | 13.57 | 39.965 | 0.975
20 | 5569 | 153 | 36:1 | 31.25 | 45.498 | 0.991
10 | 5569 | 376 | 15:1 | 76.85 | 50.333 | 0.996
5 | 5569 | 480 | 12:1 | 98.27 | 51.714 | 0.997
0 | 5569 | 984 | 6:1 | 201.47 | 60.629 | 0.999 48
Input sequence: CIF Tempete [10]. Measurements: bit rate, Y-PSNR, Y-SSIM. CIF sequence: Tempete (4:2:0 format). Total number of frames: 260. Width: 352. Height: 288. Frame rate: 20 fps. (Reconstructed frames shown: Original, QP=0, QP=40, QP=60.) 49
Results for CIF Tempete Sequence
QP | Original File Size [Kb] | Compressed File Size [Kb] | Compression Ratio | Bit Rate [Kbps] | Y-PSNR [dB] | Y-SSIM
63 | 13365 | 23 | 581:1 | 7.59 | 20.938 | 0.486
60 | 13365 | 31 | 431:1 | 10.36 | 22.024 | 0.559
55 | 13365 | 51 | 262:1 | 17.35 | 24.024 | 0.687
50 | 13365 | 86 | 155:1 | 29.04 | 26.164 | 0.790
40 | 13365 | 268 | 50:1 | 91.23 | 30.955 | 0.918
30 | 13365 | 780 | 17:1 | 266.02 | 36.374 | 0.971
20 | 13365 | 1960 | 7:1 | 668.68 | 42.756 | 0.991
10 | 13365 | 4259 | 3:1 | 1453.68 | 49.801 | 0.998
5 | 13365 | 5085 | 2.63:1 | 1735.43 | 51.195 | 0.998
0 | 13365 | 7197 | 2:1 | 2456.31 | 60.515 | 0.999 50
Conclusions AVS M is an application-driven coding standard with well optimized and efficient techniques. AVS Part 7 targets low-complexity, low-resolution mobility applications. The AVS encoder and decoder were implemented using the AVS M software, and tests were carried out on various QCIF and CIF sequences; the bit rate, PSNR and SSIM values are tabulated. The performance of AVS-China was analyzed by varying the quantization parameter, and the PSNR and bit rate were calculated. We observe that performance degrades at higher QP and improves at lower QP. 51
Outcome The project helped in increasing familiarity in working with this codec. The experimental results gave an insight into the efficiency of this codec. The different aspects of simulation of this codec such as the following were learned and understood Modes of Configuration Modification of Parameters Input sequence specifications Analyze the codec output Efficient use of time and re-use of knowledge 52
List of Acronyms AU Access Unit AVS Audio Video Standard AVS-M Audio Video Standard for mobile B-Frame Interpolated Frame CAVLC Context Adaptive Variable Length Coding CBP Coded Block Pattern CIF Common Intermediate Format DIP Direct Intra Prediction DPB Decoded Picture Buffer EOB End of Block HD High Definition HHR Horizontal High Resolution ICT Integer Cosine Transform IDR Instantaneous Decoding Refresh I-Frame Intra Frame IMS IP Multimedia Subsystem ITU-T International Telecommunication Union MB Macroblocks MPEG Moving Picture Experts Group MPM Most Probable Mode MV Motion Vector NAL Network Abstraction Layer 53
List of Acronyms P-Frame Predicted Frame PIT Prescaled Integer Transform PPS Picture Parameter Set QCIF Quarter Common Intermediate Format QP Quantization Parameter RD Cost Rate Distortion Cost SAD Sum of Absolute Differences SD Standard Definition SEI Supplemental Enhancement Information SPS Sequence Parameter Set VLC Variable Length Coding 54
References
[1] AVS working group official website, http://www.avs.org.cn
[2] L. Yu, AVS Project and AVS-Video Techniques, Zhejiang University, ISPACS, Dec. 13, 2005.
[3] L. Yu et al., Overview of AVS-Video: Tools, performance and complexity, SPIE VCIP, vol. 5960, pp. 596021-1 to 596021-12, Beijing, China, July 2005.
[4] W. Gao et al., AVS: the Chinese next-generation video coding standard, National Association of Broadcasters, Las Vegas, 2004.
[5] L. Fan, Mobile Multimedia Broadcasting Standards, ISBN: 978-0-387-78263-8, Springer US, 2009.
[6] F. Yi et al., Low-Complexity Tools in AVS Part 7, J. Comput. Sci. Technol., vol. 21, pp. 345-353, May 2006.
[7] L. Yu, S. Chen and J. Wang, Overview of AVS-video coding standards, Signal Process.: Image Commun., vol. 24, issue 4, pp. 247-262, April 2009.
[8] W. Gao, AVS: A project towards an open and cost-efficient Chinese national standard, ITU-T VICA workshop, ITU Headquarters, Geneva, 22-23 July 2005. 55
References
[9] Z. Zhang et al., Improved Intra Prediction Mode-decision Method, Proc. of SPIE, vol. 5960, pp. 59601W-1 to 59601W-9, Beijing, China, July 2005.
[10] Z. Ma et al., Intra Coding of AVS Part 7 Video Coding Standard, J. Comput. Sci. Technol., vol. 21, Feb. 2006.
[11] W. Gao and T. Huang, AVS Standard: Status and Future Plan, Workshop on Multimedia New Technologies and Application, Shenzhen, China, Oct. 2007.
[12] Y. Cheng et al., Analysis and application of error concealment tools in AVS-M decoder, Journal of Zhejiang University Science A, vol. 7, pp. 54-58, Jan. 2006.
[13] M. Liu and Z. Wei, A fast mode decision algorithm for intra prediction in AVS-M video coding, ICWAPR '07, vol. 1, pp. 326-331, Nov. 2007.
[14] Q. Wang et al., Context-Based 2D-VLC for Video Coding, IEEE Int'l Conf. on Multimedia and Expo (ICME), vol. 1, pp. 89-92, June 2004.
[15] K. N. Ngan, Audio Video Coding Standard (AVS) of China, Department of Electronic Engineering, The Chinese University of Hong Kong, Nov. 19, 2009. 56
References
[16] W. Gao, K. N. Ngan and L. Yu, Special issue on AVS and its applications: Guest editorial, Signal Process.: Image Commun., vol. 24, issue 4, pp. 245-344, April 2009.
[17] S. W. Ma and W. Gao, Low Complexity Integer Transform and Adaptive Quantization Optimization, J. Comput. Sci. Technol., vol. 21, pp. 354-359, May 2006.
[18] S. Hu, X. Zhang and Z. Yang, Efficient Implementation of Interpolation for AVS, Congress on Image and Signal Processing, vol. 3, pp. 133-138, May 2008.
[19] R. Schafer and T. Sikora, Digital video coding standards and their role in video communications, Proc. of the IEEE, vol. 83, pp. 907-924, June 1995. 57
Thank you 58