FINAL REPORT PERFORMANCE ANALYSIS OF AVS-M AND ITS APPLICATION IN MOBILE ENVIRONMENT


EE 5359 MULTIMEDIA PROCESSING FINAL REPORT PERFORMANCE ANALYSIS OF AVS-M AND ITS APPLICATION IN MOBILE ENVIRONMENT Under the guidance of DR. K. R. RAO DEPARTMENT OF ELECTRICAL ENGINEERING UNIVERSITY OF TEXAS AT ARLINGTON Vidur K. Vajani (1000679332) Email id: vidur.vajani@mavs.uta.edu

Acknowledgement: I would like to acknowledge the helpful discussions I had with Dr. K. R. Rao. I sincerely appreciate the help, guidance, support and motivation given by Dr. Rao during the preparation of my project. I would also like to thank my fellow classmates and seniors for their guidance and advice.

List of Acronyms:
AU: Access Unit
AVS: Audio Video Standard
AVS-M: Audio Video Standard for Mobile
B-Frame: Interpolated Frame
CAVLC: Context Adaptive Variable Length Coding
CBP: Coded Block Pattern
CIF: Common Intermediate Format
DIP: Direct Intra Prediction
DPB: Decoded Picture Buffer
EOB: End of Block
HD: High Definition
HHR: Horizontal High Resolution
ICT: Integer Cosine Transform
IDR: Instantaneous Decoding Refresh
I-Frame: Intra Frame
IMS: IP Multimedia Subsystem
ITU-T: International Telecommunication Union - Telecommunication Standardization Sector
MB: Macroblock
MPEG: Moving Picture Experts Group
MPM: Most Probable Mode
MV: Motion Vector
NAL: Network Abstraction Layer
P-Frame: Predicted Frame
PIT: Prescaled Integer Transform
PPS: Picture Parameter Set
QCIF: Quarter Common Intermediate Format

QP: Quantization Parameter
RD Cost: Rate Distortion Cost
SAD: Sum of Absolute Differences
SD: Standard Definition
SEI: Supplemental Enhancement Information
SPS: Sequence Parameter Set
VLC: Variable Length Coding

List of Figures:
Figure 1: History of audio video coding standards
Figure 2: Evolution of AVS China
Figure 3: Common Intermediate Format (CIF) 4:2:0 chroma sampling
Figure 4: Quarter Common Intermediate Format (QCIF) 4:2:0 chroma sampling
Figure 5: Layered structure
Figure 6: Current picture predicted from previous P pictures
Figure 7: Slice layer example
Figure 8: Macroblock in (a) 4:2:0 and (b) 4:2:2 formats
Figure 9: AVS-M encoder
Figure 10: Intra_4x4 prediction including current block and its surrounding coded pixels for prediction
Figure 11: Eight directional prediction modes in AVS-P7
Figure 12: Nine Intra_4x4 prediction modes in AVS-P7
Figure 13: The position of integer, half and quarter pixel samples
Figure 14: Luma and chroma block edges
Figure 15: Horizontal or vertical edge of a 4x4 block
Figure 16: Adaptive sliding window based reference picture marking process
Figure 17: Block diagram of an AVS-M decoder
Figure 18: Inverse DCT matrix of AVS-M
Figure 19: Subpixel locations around integer pixel A
Figure 20: The flow chart of main()
Figure 21: The flow chart of Encode_I_Frame()
Figure 22: The flow chart of Encode_P_Frame()
Figure 23: Video quality at various QP values for miss_america_qcif
Figure 24: PSNR (dB) vs. bitrate (kbps) for miss_america_qcif
Figure 25: SSIM vs. bitrate (kbps) for miss_america_qcif
Figure 26: Video quality at various QP values for mother_daughter_qcif
Figure 27: PSNR (dB) vs. bitrate (kbps) for mother_daughter_qcif
Figure 28: SSIM vs. bitrate (kbps) for mother_daughter_qcif
Figure 29: Video quality at various QP values for stefan_cif
Figure 30: PSNR (dB) vs. bitrate (kbps) for stefan_cif
Figure 31: SSIM vs. bitrate (kbps) for stefan_cif
Figure 32: Video quality at various QP values for silent_cif
Figure 33: PSNR (dB) vs. bitrate (kbps) for silent_cif
Figure 34: SSIM vs. bitrate (kbps) for silent_cif

List of Tables:
Table 1: History of AVS China
Table 2: Different parts of AVS China
Table 3: Comparison between different AVS profiles
Table 4: AVS profiles and their applications
Table 5: Macroblock types of P picture
Table 6: Submacroblock types of P picture
Table 7: Context-based most probable intra mode decision table
Table 8: kth-order Golomb code
Table 9: NAL unit types
Table 10: Interpolation filter coefficients
Table 11: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for miss_america_qcif sequence
Table 12: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for mother_daughter_qcif sequence
Table 13: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for stefan_cif sequence
Table 14: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for silent_cif sequence

Abstract: The modes of digital representation of information such as audio and video signals have undergone transformation in leaps and bounds. Real-time mobile video communication requires a balance between performance and complexity. The Audio Video coding Standard (AVS) was established by the Working Group of China of the same name [4]. To date, there are two separate parts of this standard targeting different video compression applications: AVS Part 2 for high-definition digital video broadcasting and high-density storage media, and AVS Part 7 for low-complexity, low-resolution mobile applications [7]. The primary focus of this project is to study and analyze the performance of AVS-M. The major AVS-M video coding tools and their performance and complexity are analyzed. This report provides an insight into the AVS-M video standard, the architecture of the AVS-M codec, the features it offers and the various data formats it supports. A study is made of key techniques such as transform and quantization, intra prediction, quarter-pixel interpolation, motion compensation modes, entropy coding and the in-loop de-blocking filter. The AVS-M video codec is evaluated on quality measures such as bit rate, PSNR and SSIM.

Contents:
Acknowledgement
List of Acronyms
List of Figures
List of Tables
Abstract
1.0 Introduction
2.0 AVS Standard
2.1 Introduction to AVS-M
2.2 Data Formats
2.3 Picture Format
2.4 Layered Structure
3.0 AVS-M Encoder
3.1 Intra Prediction
3.1.1 Intra_4x4
3.1.2 Context-based Most Probable Intra Mode Decision
3.1.3 Direct Intra Prediction
3.2 Interprediction Mode
3.3 Deblocking Filter
3.4 Entropy Coding
4.0 AVS Tools
4.1 High-Level Tools Similar to H.264/AVC
4.2 High-Level Tools / Features Different from H.264/AVC
5.0 AVS-M Decoder
5.1 Error Concealment Tools in AVS-M Decoder
6.0 Main Program Flow Analysis for Encoder
7.0 Performance Analysis
7.1 Simulation results of sequence miss_america_qcif
7.2 Simulation results of sequence mother_daughter_qcif
7.3 Simulation results of sequence stefan_cif
7.4 Simulation results of sequence silent_cif
8.0 Conclusions
References

1. Introduction: Over the past 20 years, analog communication around the world has largely been supplanted by digital communication. As the demand for audio and video content has grown enormously, the need for worldwide standards for audio, video and images has also increased tremendously. The modes of digital representation of information such as audio and video signals have undergone transformation in leaps and bounds. Many successful audio-video coding standards have been released, and they have plenty of applications in today's digital media. Figure 1 shows the evolution of coding standards.

Figure 1: History of audio video coding standards [5]

The Moving Picture Experts Group (MPEG) [3] was the first body to come up with a format for transferring information in digital form. Soon after its release, this format became the standard for audio and video file compression and transmission. MPEG-2 and MPEG-4 were released subsequently. MPEG-4 part 2 uses advanced coding tools, at additional complexity, to achieve higher compression factors than MPEG-2; it is very efficient in terms of coding, producing files almost 1/4th the size of MPEG-1. These standards held a near monopoly on the market in the 1990s. The AVS video standard was developed by the Audio Video Coding Standard Working Group of China (AVS working group for short), which was approved by the Science and Technology Department of the Ministry of Information Industry of China in June 2002 [3]. This audio and video standard was initiated by the Chinese government in order to counter the monopoly of the MPEG standards, whose licensing was costing it dearly. The mandate of the AVS working group is to establish China's national standards for

compression, manipulation and digital rights management in digital audio and video multimedia equipment and systems.

AVS Mission [5]:
- To develop a second-generation video coding standard with the same or better coding performance than its peers
- To avoid licensing risk, based on a clear analysis of related patents over the last 50 years
- To help DTV, IPTV and new media operators inside and outside China

Three main characteristics of AVS China [4]:
- Advanced: China coordinates the formulation of a technically advanced, second-generation source coding standard
- Independent: a patent pool management system and complete working group legal documents
- Open: the formulation process is open and internationalized

This standard is applied in fields such as high-resolution digital broadcast, wireless communication media, and internet broadcast media.

Short history of AVS China [18]:
- Mar 18-21, 2002: 178th Xiangshan Science Conference, Beijing, on broad-band network and security stream media technology
- June 11, 2002: The Science and Technology Department of the MII released a bulletin about setting up the Audio Video Coding Standard Workgroup of China Electronics
- June 21, 2002: The Department of Science and Technology of the Ministry of Information Industry issued the notice of setting up the Audio Video Coding Standard Working Group and assigned the task of the group
- Aug 23-24, 2002: The Audio Video Coding Standard Working Group was set up in Beijing; first meeting of AVS; AVS united with MPEG-China
- Dec 9, 2002: The AVS website was formally opened to the members
- Dec 19, 2003: At the 7th AVS meeting, AVS-video (1.0) and AVS-system (1.0) were finalized
- Mar 29, 2004: Industry forum on AVS video coding technology towards 3G, Shenzhen, sponsored together with universities and companies of Hong Kong
- Mar 30-31, 2004: Start of video coding standardization for the new generation of mobile communication

Table 1: History of AVS China [18]

The Audio Video Standard for Mobile (AVS-M) is the seventh part of this video coding standard, developed by the AVS workgroup of China, and aims at mobile systems and devices with limited processing capability and power consumption.

Figure 2 shows the evolution of the AVS video coding standards.

2. AVS Standard:

Figure 2: Evolution of AVS China [9]

AVS is an integrated standard system covering system, video, audio and media copyright management. AVS comprises 10 parts; the different parts of AVS China are listed in Table 2.

Table 2: Different parts of AVS China [5]

As can be seen from Table 2, AVS has vast applications across digital media. According to the application requirements, a trade-off between encoding efficiency and encoder/decoder implementation complexity is chosen. Considering these different requirements, AVS is subdivided into four profiles.

1. Jizhun profile: Jizhun is the first profile defined in the national standard AVS-Part 2, approved as a national standard in 2006. It mainly targets digital video applications such as commercial broadcasting and storage media, including high-definition applications. Typically, it favors high coding efficiency on video sequences of higher resolutions, at the expense of moderate computational complexity.

2. Jiben profile: The Jiben profile, defined in AVS-Part 7, targets mobile video applications with smaller picture resolutions, where computational complexity is a critical issue. In addition, error resilience capability is needed because of the wireless transport environment.

3. Shenzhan profile: The AVS-Shenzhan standard focuses exclusively on standardizing video surveillance applications. Surveillance sequences have special characteristics: random noise appearing in the pictures, a relatively low affordable encoding complexity, and a required friendliness to event detection and search.

4. Jiaqiang profile: To fulfill the needs of multimedia entertainment, a major concern of the Jiaqiang profile is movie compression for high-density storage. Relatively higher computational complexity can be tolerated at the encoder side to provide higher video quality, with compatibility with AVS China-Part 2 as well.

Table 3 shows the comparison between the different AVS profiles.

Table 3: Comparison between different AVS profiles [10]

According to their configuration, different profiles have different applications; some key applications of each profile are shown in Table 4.

Table 4: AVS profiles and their applications [6]

2.1 Introduction to AVS-M:

The seventh part of AVS, the Jiben profile of AVS China, aims at mobile systems and devices with limited processing capability and power consumption. It is also called AVS-M, where the letter M stands for mobile. However, the target applications of AVS-M are not limited to mobile applications, as the name might imply. AVS-M has been developed to meet the needs of video compression in digital storage media, networked media streaming, multimedia communications and so on. The standard is applicable to the following applications:
- Interactive storage media
- Wide-band video services
- Real-time telecommunication services
- Remote video surveillance

A short history of AVS-P7 [13]:
- 2004.8: WD (working draft)
- 2004.9: 10th AVS meeting, WD 2.0
- 2004.11: CD (committee draft)
- 2005.9: 14th AVS meeting, FCD (final CD)
- 2006.1: FD
- 2006.3: GB

The most common test sequences use the Common Intermediate Format (CIF) and Quarter Common Intermediate Format (QCIF), shown in Figures 3 and 4 respectively. A CIF sequence has a fixed dimension of 352 (width) x 288 (height), whereas a QCIF sequence has a fixed dimension of 176 (width) x 144 (height). Figures 3 and 4 show the CIF and QCIF structures with 4:2:0 chroma sampling, where Y is the luminance (brightness) component and Cb and Cr are the chrominance (color) components.

Fig 3: Common Intermediate Format (CIF) 4:2:0 chroma sampling [24]

Fig 4: Quarter Common Intermediate Format (QCIF) 4:2:0 chroma sampling [24]

According to sequence format and bit rate, the AVS-M Jiben profile has 9 different levels [6]:
- 1.0: up to QCIF (176x144) at 64 kbps
- 1.1: up to QCIF (176x144) at 128 kbps
- 1.2: up to CIF (352x288) at 384 kbps
- 1.3: up to CIF (352x288) at 768 kbps
- 2.0: up to CIF (352x288) at 2 Mbps
- 2.1: up to HHR (704x480) at 4 Mbps
- 2.2: up to SD (720x576) at 4 Mbps
- 3.0: up to SD (720x576) at 6 Mbps
- 3.1: up to SD (720x576) at 8 Mbps
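As a rough sanity check on these formats, the raw (uncompressed) data rate of a 4:2:0 sequence can be computed from its dimensions. A small sketch (the 30 fps frame rate here is only illustrative) shows why the level bit rates above imply heavy compression:

```python
def raw_bitrate_420(width, height, fps, bits=8):
    """Raw bit rate of 4:2:0 video: one full-size luma plane plus two
    chroma planes subsampled by 2 in each dimension."""
    samples_per_frame = width * height + 2 * (width // 2) * (height // 2)
    return samples_per_frame * bits * fps

# QCIF at 30 fps: 176*144 luma + 2*(88*72) chroma = 38016 samples/frame
print(raw_bitrate_420(176, 144, 30) / 1e6)  # 9.12384 Mbps raw
print(raw_bitrate_420(352, 288, 30) / 1e6)  # 36.49536 Mbps raw (CIF)
```

Compared with the 64 kbps budget of level 1.0, raw QCIF at 30 fps is over two orders of magnitude larger, which motivates the coding tools discussed in the following sections.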

2.2 Data Formats [7]:

1. Progressive scan: This format is directly compatible with all content that originates on film, and can accept input directly from progressive telecine machines. It is also directly compatible with the emerging standard for digital production, the so-called 24p standard. In the next few years, most movie production and much TV production will be converted to this new standard. A significant benefit of the progressive format is the efficiency with which motion estimation operates: progressive content can be encoded at significantly lower bit rates than interlaced content with the same perceptual quality. Furthermore, motion compensated coding of progressive format data is significantly less complex than coding of interlaced data.

2. Interlaced scan: AVS also provides coding tools for the interlaced scan format. These tools allow coding of legacy interlaced format video.

2.3 Picture Format:

AVS is a generic standard and can code pictures with a rectangular format up to 16K x 16K pixels in size. Pixels are coded in luminance-chrominance format (YCbCr) and each component has a precision of 8 bits. AVS supports a range of commonly used frame rates and pixel aspect ratios, and supports the 4:2:0 and 4:2:2 chroma formats. Chromaticity is defined by international standards.

2.4 Layered Structure [5]:

AVS is built on a layered data structure representing traditional video data. This structure is mirrored in the coded video bit stream. Figure 5 illustrates this layered structure.

Figure 5: Layered structure [7]

At the highest layer, sets of frames of continuous video are organized into a sequence. The sequence provides an opportunity to download parameter sets to decoders. Pictures can optionally be subdivided into rectangular regions called slices. Slices are further subdivided into square regions of pixels called macroblocks. Macroblocks are the fundamental coding units used by AVS and comprise a set of luminance and chrominance blocks of pixels covering the same square region of the picture.

Sequence: The sequence layer comprises a set of mandatory and optional downloaded system parameters, and provides an entry point into the coded video. For example, sequence headers should be placed at the start of each chapter on a DVD to facilitate random access; alternatively, they should be placed every half second in broadcast TV to facilitate channel changes.

Picture: The picture layer provides the coded representation of a video frame. It comprises a header, with mandatory and optional parameters, optionally followed by user data. Three types of picture are defined by AVS:
- Intra pictures (I-pictures)
- Predicted pictures (P-pictures)
- Interpolated pictures (B-pictures)

AVS uses adaptive modes for motion compensation at the picture layer and macroblock layer. At the picture layer, the modes are [7]:
- Forward prediction from the most recent reference frame
- Forward prediction from the second most recent reference frame
- Interpolative prediction between the most recent reference frame and a future reference frame
- Intra coding

Figure 6 illustrates how the current picture is predicted from previous reference pictures.

Figure 6: Current picture predicted from previous P pictures [7]

Slice: The slice structure provides the lowest-layer mechanism for resynchronizing the bitstream in case of transmission error. Slices comprise an arbitrary number of raster-ordered rows of macroblocks, as illustrated in the example of Figure 7.

Figure 7: Slice layer example [7]

Macroblock: A macroblock includes the luminance and chrominance component pixels that collectively represent a 16x16 region of the picture. In 4:2:0 mode, the chrominance pixels are subsampled by a factor of two in each dimension; therefore each chrominance component contains only one 8x8 block. In 4:2:2 mode, the chrominance pixels are subsampled by a factor of two in the horizontal dimension only; therefore each chrominance component contains two 8x8 blocks. This is illustrated in Figure 8.

Figure 8: Macroblock in (a) 4:2:0 and (b) 4:2:2 formats [7]

At the macroblock layer, the modes depend on the picture mode [7]:
- In intra pictures, all macroblocks are intra coded.
- In predicted pictures, macroblocks may be forward predicted or intra coded.
- In interpolated pictures, macroblocks may be forward predicted, backward predicted, interpolated or intra coded.
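The macroblock composition just described can be summarized in a small helper (a sketch that simply restates the subsampling arithmetic from the text):

```python
def chroma_8x8_blocks(fmt):
    """8x8 blocks per chroma component in a 16x16 macroblock.
    4:2:0 halves chroma in both dimensions (one 8x8 block);
    4:2:2 halves it horizontally only (two 8x8 blocks)."""
    return {"4:2:0": 1, "4:2:2": 2}[fmt]

def samples_per_mb(fmt):
    """Total samples in one macroblock: 16x16 luma plus two chroma
    components of chroma_8x8_blocks(fmt) 8x8 blocks each."""
    return 16 * 16 + 2 * chroma_8x8_blocks(fmt) * 64

print(samples_per_mb("4:2:0"))  # 384
print(samples_per_mb("4:2:2"))  # 512
```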

Block: The block is the smallest coded unit and contains the transform coefficient data for the prediction errors. For intra-coded blocks, intra prediction is performed from neighboring blocks.

AVS-M specifies two MB types for I-pictures: if mb_type is 1, the type of the current MB is I_4x4; otherwise the type is I_Direct. The MB types of P-pictures are shown in Tables 5 and 6. If the skip mode flag is 1, MbTypeIndex is equal to mb_type plus 1; otherwise, MbTypeIndex is equal to mb_type. If MbTypeIndex is greater than or equal to 5, it is set to 5.

Table 5: Macroblock types of P picture [6]

Table 6: Submacroblock types of P picture [6]

3.0 AVS-M encoder:

The AVS-M encoder is shown in Figure 9. There are 2 modes of prediction:
1. Intra prediction
2. Inter prediction

Figure 9: AVS-M encoder [10]
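The MbTypeIndex derivation described above can be sketched as follows (a hedged reading of the text, not the normative pseudo-code of the standard):

```python
def mb_type_index(mb_type, skip_mode_flag):
    """Derive MbTypeIndex for a P-picture macroblock as described in the
    text: add 1 when the skip mode flag is set, then clamp at 5."""
    idx = mb_type + 1 if skip_mode_flag else mb_type
    return min(idx, 5)

print(mb_type_index(3, skip_mode_flag=True))   # 4
print(mb_type_index(7, skip_mode_flag=False))  # 5 (clamped)
```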

3.1 Intra Prediction [16]:

Two types of intra prediction modes are adopted in AVS-P7: Intra_4x4 and Direct Intra Prediction (DIP). AVS-P7's intra coding brings a significant complexity reduction while maintaining comparable performance.

3.1.1 Intra_4x4:

When the Intra_4x4 mode is used, each 4x4 block is predicted from spatially neighboring samples, as illustrated in Figure 10. The 16 samples of the 4x4 block, labeled a-p, are predicted using previously decoded samples in the adjacent blocks, labeled A-D, E-H and X. The up-right reference pixels are extended from pixel sample D; similarly, the down-left reference pixels are extended from H. Compared with the reference pixel locations used by Intra_4x4 in H.264/AVC, AVS-P7 reduces data fetching and on-chip memory consumption while retaining comparable performance.

Fig. 10: Intra_4x4 prediction including current block and its surrounding coded pixels for prediction [16]

For each 4x4 block, one of nine prediction modes can be used to exploit spatial correlation: eight directional prediction modes (such as Down-Left, Vertical, etc.) and a non-directional mode (DC), in which all 16 pixels of the 4x4 block are predicted by the average of the surrounding available pixels. The eight directional prediction modes are specified as shown in Figure 11.

Fig. 11: Eight directional prediction modes in AVS-P7 [16]
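As an illustration, the two simplest of the nine modes can be sketched for one 4x4 block. The array layout, availability handling, and the mid-grey fallback are my assumptions for illustration, not the normative process:

```python
def predict_vertical(top):
    """Vertical mode: each row copies the four reconstructed pixels above."""
    return [list(top) for _ in range(4)]

def predict_dc(top=None, left=None):
    """DC mode: every pixel is the rounded average of the available
    neighboring pixels (fallback to mid-grey 128 is an assumption)."""
    neighbors = (top or []) + (left or [])
    if not neighbors:
        return [[128] * 4 for _ in range(4)]
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)
    return [[dc] * 4 for _ in range(4)]

blk = predict_vertical([10, 20, 30, 40])  # every row is [10, 20, 30, 40]
```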

The modes adopted by AVS-P7 are used to improve intra coding efficiency in heterogeneous areas, e.g. multiple objects in one macroblock or blocks with different motion tendencies. All the modes are presented in Figure 12.

Fig. 12: Nine Intra_4x4 prediction modes in AVS-P7 [16]

3.1.2 Context-based Most Probable Intra Mode Decision:

A statistical model is used to determine the most probable intra mode of the current block based on video characteristics and content correlation. A look-up table is used to predict the most probable intra mode of the current block. Irrespective of whether Intra_4x4 or DIP is used, the most probable mode decision method is as follows:
- Get the intra modes of the up block and the left block. If the up (or left) block is not available for intra mode prediction, its mode is defined as -1.
- Use the up intra mode and left intra mode to find the most probable mode in the table.

If the current MB is coded in Intra_4x4 mode, the intra prediction mode is coded as follows: if the best mode equals the most probable mode, a 1-bit flag is transmitted for the block to indicate that its mode is the most probable mode.

Table 7: Context-based most probable intra mode decision table [16]

3.1.3 Direct Intra Prediction:

When direct intra prediction is used, a new method is used to code the intra prediction mode information. As analyzed above, when Intra_4x4 is used, at least 1 bit per block is needed to represent the mode information. This means that, for a macroblock, even when the intra prediction modes of all 16 blocks are their most probable modes, 16 bits are needed to signal the mode information.
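The mechanics of the table-based most probable mode decision described above can be sketched as follows. The table contents here are placeholders (the real values are in Table 7, not reproduced in this text), and the DC fallback index is an assumption, so only the lookup logic is illustrated:

```python
UNAVAILABLE = -1  # coded mode for a missing up/left neighbor, per the text

def most_probable_mode(up_mode, left_mode, table):
    """Look up the MPM from the (up, left) intra-mode pair.
    `table` stands in for Table 7, keyed by (up_mode, left_mode);
    the DC fallback (index 2 here) is an illustrative assumption."""
    return table.get((up_mode, left_mode), 2)

# toy table: if both neighbors used mode 0, predict mode 0
toy_table = {(0, 0): 0, (UNAVAILABLE, 0): 0}
print(most_probable_mode(0, 0, toy_table))            # 0
print(most_probable_mode(UNAVAILABLE, 5, toy_table))  # 2 (fallback)
```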

22 A rate-distortion based direct intra prediction mainly contains 5 steps Step 1: All 16 4 4 blocks in a MB use their MPMs to do Intra_4 4 prediction and calculate RDCost(DIP) of this MB. Step 2: Mode search of Intra_4 4, find the best intra prediction mode of each block, and calculate RDCost(Intra_4x4). Step 3: Compare RDCost(DIP) and RDCost(Intra_4x4). If RDCost(DIP) is less than RDCost(Intra_4x4), DIP flag equals to 1 then go to step 4, else DIP flag equals to 0 go to step 5. Step 4: Encode the MB using DIP and finish encoding of this MB. Step 5: Encode the MB using ordinary Intra_ 4 4 and finish encoding of this MB. 3.2 Interprediction Mode: AVS-M defines I picture and P picture. P pictures use forward motion compensated prediction. The maximum number of reference pictures used by a P picture is two. To improve the error resilience capability, one of the two reference pictures can be a I/P pictures far away from current picture. AVS-M also specifies nonreference P pictures. If the nal_ref_idc of a P picture is equal to 0, the P picture shall not be used as a reference picture. The nonreference P pictures can be used for temporal scalability. The reference pictures are identified by the reference picture number, which is 0 for IDR picture. The reference picture number of a non-idr reference picture is calculated as given in equation 1. = +, num (1) = + + 32, otherwise Where num is the frame num value of current picture, is the frame num value of the previous reference picture, and refnum is the reference picture number of the previous reference picture. After decoding current picture, if nal_ref_idc of current picture is not equal to 0, then current picture is marked as used for reference. If current picture is an IDR picture, all reference pictures except current picture in decoded picture buffer (DPB) shall be marked as unused for reference. 
Otherwise, if nal_unit_type of the current picture is not equal to 0 and the total number of reference pictures excluding the current picture is equal to num_ref_frames, the following applies:
1. If num_ref_frames is 1, the reference picture excluding the current picture in the DPB shall be marked as unused for reference.
2. If num_ref_frames is 2 and the sliding window size is 2, the reference picture excluding the current picture in the DPB with the smaller reference picture number shall be marked as unused for reference.
3. Otherwise, if num_ref_frames is 2 and the sliding window size is 1, the reference picture excluding the current picture in the DPB with the larger reference picture number shall be marked as unused for reference.

The size of a motion compensation block can be 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 or 4x4. If half_pixel_mv_flag is equal to 1, the precision of motion vectors is up to 1/2 pixel; otherwise the precision is up to 1/4 pixel [18]. When half_pixel_mv_flag is not present in the bitstream, it shall be inferred to be 1.
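The three marking rules above can be sketched as follows. The DPB is modelled simply as the set of reference picture numbers it currently holds; this is an illustrative simplification, not the normative marking process:

```python
def mark_unused(dpb_refnums, num_ref_frames, sliding_window_size):
    """Return the reference picture numbers to mark 'unused for reference'
    when the DPB already holds num_ref_frames reference pictures.
    dpb_refnums: reference picture numbers in the DPB, excluding the
    current picture."""
    if num_ref_frames == 1:
        return list(dpb_refnums)        # rule 1: drop the only reference
    if num_ref_frames == 2 and sliding_window_size == 2:
        return [min(dpb_refnums)]       # rule 2: drop the smaller number
    if num_ref_frames == 2 and sliding_window_size == 1:
        return [max(dpb_refnums)]       # rule 3: drop the larger number
    return []

print(mark_unused([7], 1, 1))     # [7]
print(mark_unused([7, 9], 2, 2))  # [7]
print(mark_unused([7, 9], 2, 1))  # [9]
```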

The positions of integer, half and quarter pixel samples are depicted in Figure 13. Capital letters indicate integer sample positions, while small letters indicate half and quarter sample positions. The interpolated values at half sample positions are obtained using the 8-tap filter F1 = (-1, 4, -12, 41, 41, -12, 4, -1) and the 4-tap filter F2 = (-1, 5, 5, -1). The interpolated values at quarter sample positions are obtained by linear interpolation.

Figure 13: The positions of integer, half and quarter pixel samples [1]

According to Figure 13, half sample b is calculated as follows:

    b' = -C + 4D - 12E + 41F + 41G - 12H + 4I - J
    b  = (b' + 32) >> 6

And half sample h is calculated by applying F2 to the four vertically aligned integer samples (the fourth sample, below S, is denoted T here):

    h' = -A + 5F + 5S - T
    h  = (h' + 4) >> 3

Both b and h shall be clipped to the range [0, 255]. Quarter sample a is calculated as

    a = (F + b + 1) >> 1

Interpolation of chroma sample values is shown in Figure 14. A, B, C and D are the integer sample values around the interpolated sample; dx and dy are the horizontal and vertical distances from the predicted sample to A, respectively. The predicted sample pred_xy is calculated as given by equation 2:

    pred_xy = ((8 - dx)(8 - dy)A + dx(8 - dy)B + (8 - dx)dy C + dx dy D + 32) >> 6    (2)

pred_xy shall be clipped to the range [0, 255].
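A sketch of the luma half/quarter-sample and chroma bilinear rules above. Sample names follow the text; the fourth vertical sample for h is called T here as an assumed label, since its letter in Figure 13 is not reproduced in this report:

```python
def clip255(x):
    """Clip a sample value to the 8-bit range [0, 255]."""
    return max(0, min(255, x))

def half_sample_b(C, D, E, F, G, H, I, J):
    """Horizontal half sample: 8-tap filter F1 = (-1,4,-12,41,41,-12,4,-1)."""
    b_ = -C + 4*D - 12*E + 41*F + 41*G - 12*H + 4*I - J
    return clip255((b_ + 32) >> 6)

def half_sample_h(A, F, S, T):
    """Vertical half sample: 4-tap filter F2 = (-1,5,5,-1)."""
    h_ = -A + 5*F + 5*S - T
    return clip255((h_ + 4) >> 3)

def quarter_sample_a(F, b):
    """Quarter sample by linear interpolation between F and b."""
    return (F + b + 1) >> 1

def chroma_pred(A, B, C, D, dx, dy):
    """Bilinear chroma interpolation (equation 2); dx, dy in [0, 7]."""
    return clip255(((8 - dx)*(8 - dy)*A + dx*(8 - dy)*B
                    + (8 - dx)*dy*C + dx*dy*D + 32) >> 6)

# Sanity check: on a flat area every interpolated value reproduces the
# sample value, because each filter's coefficients sum to the divisor.
print(half_sample_b(*[100] * 8))              # 100
print(half_sample_h(100, 100, 100, 100))      # 100
print(chroma_pred(100, 100, 100, 100, 3, 5))  # 100
```

The flat-area check works because F1 sums to 64 (matching >> 6), F2 sums to 8 (matching >> 3), and the bilinear weights sum to 64.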

3.3 Deblocking filter [3]
AVS Part 7 makes use of a simplified deblocking filter, wherein the boundary strength is decided at the MB level [3]. Filtering is applied to the boundaries of luma and chroma blocks, except for picture or slice boundaries. In Figure 15 and Figure 16 the dotted lines indicate the boundaries which will be filtered. An intra predicted MB usually has more and larger residuals than an inter predicted MB, which leads to stronger blocking artifacts at the same QP. Therefore, a stronger filter is applied to intra predicted MBs and a weaker filter to inter predicted MBs. When the MB type is P_Skip, there is no coded residual; when the QP is not very large, the distortion caused by quantization is relatively small, hence no filtering is required.

Figure 15: Luma and chroma block edges [3]

Figure 15 shows the pixels used by the sample-level deblocking filter. Different filtering processes are applied to each sample-level boundary under different filter modes, and the values of some pixels are updated.

3.4 Entropy Coding:

Figure 16: Horizontal or vertical edge of a 4x4 block [2]

In entropy coding, the basic concept is a mapping from the video signal, after prediction and transform, to a variable length coded bitstream, generally using one of two entropy coding methods: variable length coding or arithmetic coding. To achieve higher coding efficiency, context-based adaptive entropy coding techniques have been developed and are favored by current coding standards. AVS-M uses the Exp-Golomb code, shown in Table 6, to encode syntax elements such as quantized coefficients, macroblock coding type, and motion vectors. Eighteen coding tables are used in quantized coefficient encoding. The encoder uses the run and the absolute value of the current coefficient to select the coding table.
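The kth-order Exp-Golomb code of Table 6 can be generated with the standard construction below (the selection rules among the eighteen coefficient tables are not reproduced here):

```python
def exp_golomb(value, k):
    """Encode a non-negative integer with the kth-order Exp-Golomb code:
    a prefix of leading zeros, a 1, then the information bits."""
    code_num = value + (1 << k)          # shift into the order-k codeword space
    num_bits = code_num.bit_length()
    prefix = '0' * (num_bits - 1 - k)    # leading zeros of the prefix
    info = bin(code_num)[2:]             # '1' followed by the suffix bits
    return prefix + info

# Order-0 codewords: 1, 010, 011, 00100, ...
print(exp_golomb(0, 0))  # 1
print(exp_golomb(1, 0))  # 010
print(exp_golomb(2, 0))  # 011
# Order-1 codewords: 10, 11, 0100, ...
print(exp_golomb(0, 1))  # 10
```

Higher orders spend more bits on small values but grow more slowly for large ones, which is why different tables suit different coefficient statistics.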

Table 6: kth order Exp-Golomb code [5]

4.0 AVS Tools [19]:
4.1 High level tools similar to H.264/AVC:
1. Network abstraction layer (NAL) unit structure
2. Parameter sets
3. Instantaneous decoding refresh (IDR) picture
4. Gradual decoding refresh (GDR) or gradual random access
5. Flexible slice coding
6. Reference picture numbering
7. Non-reference P picture
8. Constrained intra prediction
9. Loop filter disabling at slice boundaries
10. Byte-stream format

1. NAL Unit Structure: [19]
Video coding standards earlier than H.264/AVC use the start code based bitstream structure, wherein the bitstream consists of several layers, typically including several of the following: a sequence layer, a picture layer, a slice layer, a macroblock layer, and a block layer. The bitstream for each layer typically consists of a header and associated data, and each header of a slice or higher layer starts with a start code for resynchronization and identification. The NAL unit structure was first introduced in H.264/AVC. In this structure, the coded video data is organized into NAL units, each of which contains an NAL unit header and a payload. The NAL unit header indicates the type of data contained in the NAL unit. The NAL unit structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems. A series of NAL units generated by an encoder is referred to as an NAL unit stream. The following NAL unit types are defined: [18]
picture_header_rbsp( ) of non-IDR picture
picture_header_rbsp( ) of IDR picture
slices_layer_rbsp( )
seq_parameter_set_rbsp( )
pic_parameter_set_rbsp( )

sei_rbsp( )
random_access_point_indicator_rbsp( )

Table 9: NAL unit types [6]

The advantages of the NAL unit structure over the start code based structure include the following. The NAL unit structure provides convenient conveyance of video data over different transport layers. For packet-based systems, such as RTP, the transport layer can identify NAL unit boundaries without the use of start codes; those overhead bits can therefore be saved. The NAL unit structure provides flexible extension capability through new NAL unit types. If needed, start code prefixes can be used to transform an NAL unit stream into the start code based structure.

2. Parameter Sets:
Parameter sets contain the sequence-level header information and the infrequently changing picture-level header information. With parameter sets, the infrequently changing information need not be repeated for each sequence or picture, hence coding efficiency is improved. Furthermore, the use of parameter sets enables out-of-band transmission of important header information, so that improved error resilience is achieved. In out-of-band transmission, parameter set NAL units are transmitted in a channel different from the one used for transmission of other NAL units.
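The NAL unit header fields referenced in this section (nal_ref_idc, nal_unit_type) can be parsed with a sketch like the following, which assumes a one-byte H.264/AVC-style header layout (1 forbidden bit, 2-bit nal_ref_idc, 5-bit nal_unit_type); the exact AVS-M field widths should be checked against [18]:

```python
def parse_nal_header(byte):
    """Split a one-byte NAL unit header into its fields, assuming the
    H.264/AVC-style layout: 1 forbidden bit, 2-bit nal_ref_idc,
    5-bit nal_unit_type."""
    forbidden_zero_bit = (byte >> 7) & 0x1
    nal_ref_idc = (byte >> 5) & 0x3   # 0 => picture not used for reference
    nal_unit_type = byte & 0x1F       # which rbsp() the payload carries
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type

print(parse_nal_header(0x65))  # (0, 3, 5)
```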

3. IDR Picture:
Support for random access is important for any video codec. In video coding standards earlier than H.264/AVC, intra coded pictures are used as random access points. However, because of the use of multiple reference pictures, a picture following an intra picture may use inter prediction from a picture preceding the intra picture in decoding order. This may make an intra picture not randomly accessible. Consequently, to perform random access, it is required to check for each intra picture whether any subsequent pictures have a reference dependency on pictures prior to the intra picture in decoding order. This significantly increases the complexity of random access operations. To solve this problem, explicit signaling of IDR pictures is specified so that randomly accessible intra pictures can be easily identified. In both AVS-M and H.264/AVC, IDR pictures are signaled by a unique NAL unit type.

4. Gradual Decoding Refresh (GDR) or Gradual Random Access
GDR enables gradual random access, wherein the decoded video is completely refreshed after a number of decoded pictures, rather than through the instantaneous decoding refresh obtained by random accessing at an IDR picture. A nice feature of GDR is that random access can be performed at non-IDR pictures or even non-intra pictures. Furthermore, GDR can enable gradual random access and improved error resilience simultaneously. In both AVS-M and H.264/AVC, GDR or gradual random access points are signaled using supplemental enhancement information (SEI) messages.

5. Flexible Slice Coding
Slice coding is an efficient tool to improve video error resilience. In many packet-oriented transport systems, the optimal size of a packet and a slice in bytes is a function of the expected packet loss rate and the underlying maximum transmission unit (MTU) size.
It is beneficial to encapsulate one slice into one packet, because if a slice is split into multiple packets, the loss of one of these packets may prevent decoding of the entire slice. If the slice size were restricted to the granularity of MB rows, with each slice having to start from the first MB of an MB row, the sizes of the slices would likely be non-optimal. In other words, such a slice design would prevent optimal transport.

6. Reference Picture Numbering
In AVS-M, reference pictures are labeled by the 5-bit syntax element frame_num, which is incremented by one for each reference picture. For non-reference pictures, the value is incremented by one in relation to the value of the previous reference picture in decoding order. This frame number enables decoders to detect the loss of reference pictures. If there is no loss of reference pictures, the decoder can go on decoding; otherwise, a proper action should be taken.

7. Non-Reference P Picture
In standards earlier than H.264/AVC, I and P pictures are always reference pictures, while B pictures are always non-reference pictures. In H.264/AVC, pictures of any picture type, including B pictures, may be reference pictures, and P pictures may be reference or non-reference pictures. The latest AVS-M specification does not support B pictures. To enable temporal scalability with only I and P pictures, non-reference P pictures are signaled by the syntax element nal_ref_idc equal to 0.
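The frame_num based loss detection described under Reference Picture Numbering can be sketched as follows; the modulo-32 wraparound matches the 5-bit counter:

```python
def missing_references(prev_frame_num, curr_frame_num):
    """frame_num is 5-bit and increments by one per reference picture;
    a gap larger than one between consecutively received reference
    pictures indicates lost reference pictures."""
    gap = (curr_frame_num - prev_frame_num) % 32
    return gap - 1 if gap > 0 else 0

print(missing_references(4, 5))   # 0: nothing lost
print(missing_references(4, 7))   # 2: two reference pictures lost
print(missing_references(31, 1))  # 1: one lost across the wraparound
```

If the result is nonzero, the decoder knows decoding cannot simply continue and a proper action (e.g. concealment or a feedback request) should be taken.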

8. Constrained Intra Prediction
Intra prediction utilizes available neighboring samples in the same picture for prediction of the coded samples, to improve the efficiency of intra coding. In the constrained intra prediction mode, samples from inter coded blocks are not used for intra prediction. Use of this mode can improve error resilience. If errors happen in the reference picture, the errors may propagate to the inter coded blocks of the current picture. If constrained intra prediction were not used, the errors could also propagate to the intra coded blocks through intra prediction; consequently, the intra coded blocks would not efficiently stop error propagation.

9. Loop Filter Disabling at Slice Boundaries
Similar to H.264/AVC, AVS-M includes a deblocking loop filter. Loop filtering is applied to each 4x4 block boundary. Loop filtering is the only decoding operation that makes slices in the same picture not completely independent of each other. To enable completely independent slices in the same picture, turning off loop filtering at slice boundaries is supported, both in H.264/AVC and in AVS-M. AVS-M further supports turning off loop filtering completely. Having completely independent slices makes it possible to achieve perfect gradual random access based on gradual decoding refresh.

10. Byte Stream Format
The byte stream format is specified to enable transport of the video in byte- or bit-oriented transport systems, where start codes are needed to detect the start of each coded slice for resynchronization purposes. The method to transform an NAL unit stream into a byte stream is to prefix a start code to each NAL unit. Start code emulation shall not occur in the bits of each NAL unit. Even though start code emulation is not a problem for packet based transport systems, start code emulation prevention is always used, because otherwise transforming NAL unit streams into byte streams would become much more complex.
4.2 High Level Tools / Features Different Than H.264/AVC [19]
This subsection discusses the following AVS-M high-level tools or features that are different than H.264/AVC:
1. Picture ordering and timing
2. Random access point indicator
3. Picture header
4. Signaling of scalable points
5. Reference picture management
6. Hypothetical reference decoder (HRD)

1. Picture Ordering and Timing
The picture timing mechanism of AVS-M is similar to that of H.263. The syntax element picture_distance in AVS-M is similar to the temporal reference (TR) in H.263. Picture_distance is an indication of picture order. The value of picture_distance increases, relative to the previous coded picture, by one plus the number of skipped pictures after the previous coded picture, and is then taken modulo 256. A difference between picture_distance and TR is that picture_distance resets to zero at

IDR pictures. Together with the syntax element indicating the time duration corresponding to a picture_distance difference of 1, picture_distance also indicates the picture timing.

2. Random Access Point Indicator
AVS-M specifies the random access point indicator (RAPI) NAL unit type. An RAPI NAL unit, if present, shall be the first NAL unit of an access unit in decoding order. An RAPI NAL unit indicates that the access unit contains an IDR picture or a gradual random access SEI message, and that the decoding of any access unit after the RAPI NAL unit in decoding order does not need any sequence parameter set NAL unit, picture parameter set NAL unit or SEI NAL unit prior to the RAPI NAL unit in decoding order. The RAPI NAL unit enables easier random access operation. In order to know that an access unit is a random access point, it is no longer necessary to buffer and parse all the NAL units at the beginning of an access unit until the first coded slice of an IDR picture or the first SEI NAL unit containing a gradual random access SEI message. This feature is particularly useful in broadcast or multicast applications for a client tuning into an ongoing session.

3. Picture Header
AVS-M uses a picture header and a picture parameter set simultaneously. The main reason to reintroduce a picture header into AVS-M was coding efficiency. A conservative estimate of the overhead is as follows. If the picture header were not used and those picture header syntax elements were put in the slice header, then assuming the case of CIF at 30 fps and 256 kbps with each MB row coded as one slice, the additional overhead would be 4.8% of the total bit rate. For mobile video telephony, it is reasonable to have QCIF resolution at 15 fps, 100 bytes per slice and 64 kbps, which results in 80 slices per second. In this case, the additional overhead would be roughly 2.4% of the total bit rate.

4.
Signalling of scalable points
Temporal scalability can be achieved by using the non-reference pictures supported in AVS-M. With such functionality, one AVS-M bitstream may be efficiently served to decoding devices with different decoding capabilities or connected through different transmission bandwidths. For example, an originating terminal creates an AVS-M bitstream that complies with a profile at a certain level and is scalably coded in such a way that a base layer bitstream containing only the reference pictures of the entire bitstream is compliant with a lower level. If the receiver is only capable of decoding the lower level, the server should adapt the video bitstream accordingly. To make sure that the adapted bitstream complies with the lower level, the server should analyze the bitstream, perform the necessary transcoding, and check that the adapted bitstream conforms to the requirements of the lower level. These processing steps require a lot of computation in the server. The above problem can be solved by signaling the profile and level of a scalable bitstream layer. This signaling allows creation or modification of the compressed bitstream in such a way that the created or modified bitstream is guaranteed to conform to a certain pair of profile and level. The signaling greatly simplifies and speeds up multimedia adaptation and computational scalability in local playback or streaming applications.

5. Reference Picture Management
Reference picture management is one important aspect of decoded picture buffer (DPB) management. Decoded pictures used for predicting subsequent coded pictures and for future output are buffered in the DPB. To efficiently utilize the buffer memory, the DPB management processes, including the storage process of decoded pictures into the DPB and the marking process of decoded pictures from the

DPB, shall be specified. This is particularly important when multiple reference pictures are used and when the picture output order is not the decoding order.

Figure 16: Adaptive sliding window based reference picture marking process [19]

6. Hypothetical Reference Decoder (HRD)
In video coding standards, a compliant bitstream must be decodable by a hypothetical decoder that is conceptually connected to the output of an encoder and consists of at least a pre-decoder buffer, a decoder, and an output/display unit. This virtual decoder is known as the hypothetical reference decoder (HRD). The HRD supports multiple operation points, each of which is characterized by the following five parameters: BitRate, indicating the maximum input bit rate of the corresponding operation point; CpbSize, indicating the size of the CPB of the corresponding operation point; DpbSize, indicating the size of the DPB of the corresponding operation point; InitCpbDelay, indicating the delay for the corresponding operation point between the time of arrival in the CPB of the first bit of the first picture and the time of removal from the CPB of the first bit of the first picture; and InitDpbDelay, indicating the delay for the corresponding operation point between the time of arrival in the DPB of the first decoded picture and the time of output from the DPB of the first decoded picture. The instances of the five parameters are signaled in the HRD buffering parameters SEI message. The operation of each HRD operation point is as follows.
1. The buffers are initially empty.
2. Bits of coded access units enter the CPB at a rate equal to BitRate.
3. The decoding timer starts from a negative value equal to -InitCpbDelay when the first bit enters the CPB. Data is not removed from the CPB while the value of the decoding timer is smaller than 0.
4. Removal of the first coded access unit from the CPB occurs when the value of the decoding timer is equal to 0.
Removal of any other coded access unit occurs when the value of the decoding timer is equal to the relative output time of the access unit.
5. When a coded access unit has been removed from the CPB, the corresponding decoded picture is immediately placed into the DPB. The output timer starts from a negative value equal to -InitDpbDelay when the first decoded picture enters the DPB. Data is not output from the DPB while the value of the output timer is smaller than 0.
6. A decoded picture is output from the DPB when the value of the output timer is equal to the relative output time of the decoded picture. If the decoded picture is a non-reference picture, or if the decoded picture is a reference picture but marked as unused

for reference, the data of the decoded picture is also removed from the DPB when it is output. When a decoded picture is output from the DPB, decoded pictures that are not marked as used for reference and whose relative output times are earlier than the value of the output timer are also removed from the DPB.

5.0 AVS-M Decoder [12]:
The generalized block diagram of an AVS-M decoder is presented in Figure 17.

Figure 17: Block diagram of an AVS-M decoder [12]

Decoder complexity estimation is very important and instructive for target platform-dependent codec realization. Storage complexity and computational complexity are the two major concerns of a decoder implementation.

A. Inverse Transform and Quantization:
Only an integer 4x4 block-based DCT is adopted in AVS-M, to simplify the hardware design and decrease the processing complexity of residuals. The inverse transform matrix is illustrated in Figure 18. The QP range is extended from 0 to 63, with the quantization step approximately doubling every 8 steps.

Figure 18: Inverse DCT matrix of AVS-M [12]

B. Interpolation:
In AVS-M, the 1/2-pixels in the luma reference frame are divided into horizontal and vertical classes with different filters, which are illustrated in Table 10. For an integer pixel A, there are three 1/2-pixels, marked bh, bv and c, and twelve 1/4-pixels, denoted d, e, f, g, h, i in Figure 19. Chroma interpolation is omitted from this paper since it is easier to implement than luma interpolation, being performed by straightforward bilinear interpolation operations.

Figure 19: Sub-pixel locations around integer pixel A [12]

Table 10: Interpolation filter coefficients [12]

C. In-loop Deblocking:
Deblocking is performed across the edges of 4x4 blocks on the nearest two pixels. The filter mode is determined by the macroblock type and the QP of the current macroblock, i.e., if the macroblock is INTRA coded, the filter mode is INTRA; if the macroblock is not SKIP or the QP is not less than a predefined threshold, the filter mode is INTER.

5.1 Error Concealment Tools in the AVS-M Decoder [17]
Error concealment in video communication is a very challenging topic. Many techniques have been proposed to deal with the transmission error problem. Generally, these techniques can be categorized into three kinds: forward error concealment, backward error concealment and interactive error concealment. Forward error concealment refers to techniques in which the encoder plays the primary role, partitioning the video data into more than one layer with different priorities. The layer with higher priority is delivered with a higher degree of error protection. Better quality can be achieved when more layers are received at the decoder side. Backward error concealment refers to the concealment or estimation of information lost due to transmission errors, in which the decoder fulfills the error concealment task. Decoder-encoder interactive techniques achieve the best reconstruction quality, but are more difficult to implement. Generally speaking, a feedback channel is needed from the decoder to the encoder, and low time delay should be guaranteed.

6.0 Main Program Flow Analysis for the Encoder [15]:
In this section, we analyze in detail the main program flow in three key functions: Main(), Encode_I_Frame() and Encode_P_Frame(), and give flow diagram instructions [5]. Main() is the AVS-M program's main function. In it, the parameters and caches required by the entire program are allocated and initialized.
Then, according to the parameter pgImage->type, the current image is coded as an I frame or a P frame, entering the I frame or P frame coding procedure respectively. Finally, the motion-compensated image returned to the main function is

reconstructed and stored. Through motion compensation, the amount of image data is significantly reduced. The flow chart of main() is shown in Figure 20.

Figure 20: The flow chart of main() [15]
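The frame-type dispatch described above can be sketched as follows. The names (Image, encode_i_frame, encode_p_frame) are illustrative, not the reference code's actual API; the type field mirrors pgImage->type:

```python
# Illustrative sketch of the main() dispatch on frame type.
class Image:
    def __init__(self, type_, data=None):
        self.type = type_   # 'I' or 'P', cf. the pgImage->type parameter
        self.data = data

def encode_i_frame(img):
    """Placeholder for the I-frame coding procedure."""
    return ('I', img.data)

def encode_p_frame(img):
    """Placeholder for the P-frame coding procedure."""
    return ('P', img.data)

def encode(img):
    """Dispatch the current image to I- or P-frame coding."""
    if img.type == 'I':
        return encode_i_frame(img)
    return encode_p_frame(img)

print(encode(Image('I'))[0])  # I
print(encode(Image('P'))[0])  # P
```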

The flow chart of Encode_I_Frame() is shown in Figure 21.

Figure 21: The flow chart of Encode_I_Frame() [15]

The flow chart of Encode_P_Frame() is shown in Figure 22.

Figure 22: The flow chart of Encode_P_Frame() [15]

7.0 Performance Analysis:
The AVS China Part 7 video codec was analyzed over quantization parameter (QP) values ranging from 0 to 63, and quality measures such as bit rate, PSNR and SSIM [23] were calculated. The test sequences used were of QCIF and CIF formats. PSNR and SSIM were plotted against the bit rate. In total, four sequences were simulated, two in QCIF format and two in CIF format:
1. Miss-america_qcif
2. Mother-daughter_qcif
3. Stefan_cif
4. Silent_cif

7.1 Simulation results of sequence miss-america_qcif [22]
Input sequence: miss-america_qcif.yuv
Total no. of frames: 30
Original file size: 1114 kB
Width: 176. Height: 144. Frame rate: 30 fps.
Figure 23 illustrates the video quality of the miss_america_qcif sequence at various QP values.

Original File   QP = 10
QP = 50   QP = 63
Fig 23: Video quality at various QP values for miss_america_qcif
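The compression ratio and bit rate figures in the tables below follow directly from the file sizes and sequence duration. A sketch of the calculation, assuming 1 kB = 1024 bytes and a duration of frames/fps seconds (small differences from the tabulated bit rates arise because the compressed sizes are rounded to whole kB):

```python
def compression_ratio(original_kb, compressed_kb):
    """Original size over compressed size, e.g. 1114/330 ~ 3.3757:1."""
    return original_kb / compressed_kb

def bit_rate_kbps(compressed_kb, num_frames, fps):
    """Compressed bits divided by sequence duration; 1 kB = 1024 bytes."""
    duration_s = num_frames / fps
    return compressed_kb * 1024 * 8 / 1000 / duration_s

# QP = 0 row of Table 11: 30 frames at 30 fps, 1114 kB original, 330 kB coded
print(round(compression_ratio(1114, 330), 4))  # 3.3758
print(round(bit_rate_kbps(330, 30, 30), 2))    # 2703.36, vs. 2698.85 tabulated
```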

Simulation results for the miss-america_qcif sequence:
Table 11 shows the compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP values for the miss_america_qcif sequence.

QP | Compressed file size [kB] | Compression ratio | Bit rate [kbps] | Y-PSNR [dB] | Y-SSIM
 0 | 330  | 3.3757:1 | 2698.85 | 54.2773 | 0.9972
 5 | 201  | 5.5422:1 | 1641.18 | 51.3345 | 0.9946
10 | 111  | 10.036:1 |  902.83 | 49.1926 | 0.9916
20 | 23   | 48.434:1 |  179.76 | 44.9110 | 0.9842
30 | 8    | 139.25:1 |   56.80 | 40.7322 | 0.9716
40 | 4    | 278.50:1 |   30.66 | 36.0983 | 0.9429
50 | 2.80 | 397.85:1 |   22.21 | 30.7096 | 0.8869
55 | 2.55 | 436.86:1 |   20.17 | 28.0461 | 0.8455
60 | 2.40 | 464.16:1 |   18.98 | 25.1375 | 0.7999
63 | 2.33 | 478.11:1 |   18.42 | 22.0501 | 0.7829

Table 11: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for the miss_america_qcif sequence

Results in graphical form:
Fig 24 shows the plot of PSNR (dB) vs. bitrate (kbps) for miss_america_qcif.
Fig 24: PSNR (dB) vs. Bitrate (kbps) for miss_america_qcif
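PSNR values like those in Table 11 relate to the mean squared error of the luma plane through the standard definition for 8-bit video; a sketch:

```python
import math

def psnr_db(mse, peak=255):
    """PSNR = 10*log10(peak^2 / MSE) for 8-bit samples."""
    if mse == 0:
        return float('inf')  # identical images
    return 10 * math.log10(peak * peak / mse)

def mse_from_psnr(psnr, peak=255):
    """Inverse relation, useful for sanity-checking reported values."""
    return peak * peak / (10 ** (psnr / 10))

print(round(psnr_db(1.0), 2))             # 48.13
print(round(mse_from_psnr(40.7322), 3))   # MSE implied by the QP=30 row
```

Note that PSNR tracks pixel-wise error only, which is why SSIM, a structural measure, is reported alongside it in these tables.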

Figure 25 shows the plot of SSIM vs. bitrate (kbps) for miss_america_qcif.
Fig 25: SSIM vs. Bitrate (kbps) for miss_america_qcif

7.2 Simulation results of sequence mother-daughter_qcif [21]
Input sequence: mother-daughter_qcif.yuv
Total no. of frames: 30
Original file size: 1139 kB
Width: 176. Height: 144. Frame rate: 30 fps.
Figure 26 illustrates the video quality of the mother_daughter_qcif sequence at various QP values.

Original File   QP = 10

QP = 50   QP = 63
Fig 26: Video quality at various QP values for mother_daughter_qcif

Results for the mother-daughter_qcif sequence:
Table 12 shows the compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP values for the mother-daughter_qcif sequence.

QP | Compressed file size [kB] | Compression ratio | Bit rate [kbps] | Y-PSNR [dB] | Y-SSIM
 0 | 237  | 4.8059:1   | 1937.45 | 53.9774 | 0.9981
 5 | 127  | 8.9685:1   | 1037.33 | 51.1885 | 0.9964
10 | 63   | 18.0794:1  |  514.99 | 48.8490 | 0.9945
20 | 19   | 59.9474:1  |  152.79 | 43.4243 | 0.9856
30 | 8    | 142.3750:1 |   60.56 | 38.4661 | 0.9617
40 | 4    | 284.7500:1 |   31.99 | 33.7337 | 0.9030
50 | 2.79 | 408.2437:1 |   22.15 | 29.4328 | 0.8023
55 | 2.56 | 444.9219:1 |   20.29 | 26.0557 | 0.6981
60 | 2.38 | 478.5714:1 |   18.83 | 23.0817 | 0.6371
63 | 2.16 | 527.3148:1 |   18.53 | 22.3413 | 0.6221

Table 12: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for the mother-daughter_qcif sequence

Results in graphical form:
Figure 27 shows the plot of PSNR (dB) vs. bitrate (kbps) for mother_daughter_qcif.
Fig 27: PSNR (dB) vs. Bitrate (kbps) for mother_daughter_qcif
Figure 28 shows the plot of SSIM vs. bitrate (kbps) for mother_daughter_qcif.
Fig 28: SSIM vs. Bitrate (kbps) for mother_daughter_qcif

7.3 Simulation results of sequence stefan_cif [21]
Input sequence: stefan_cif.yuv
Total no. of frames: 15
Original file size: 2227.5 kB
Width: 352. Height: 288. Frame rate: 30 fps.
Figure 29 illustrates the video quality of the stefan_cif sequence at various QP values.

Original File   QP = 10
QP = 50   QP = 63
Fig 29: Video quality at various QP values for stefan_cif

Simulation results for the stefan_cif sequence:
Table 13 shows the compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP values for the stefan_cif sequence.

QP | Compressed file size [kB] | Compression ratio | Bit rate [kbps] | Y-PSNR [dB] | Y-SSIM
 0 | 1082 | 2.0587:1   | 17722.04 | 53.7192 | 0.9987
 5 | 810  | 2.7500:1   | 13257.49 | 50.5553 | 0.9973
10 | 588  | 3.7883:1   |  9616.75 | 48.0813 | 0.9953
20 | 270  | 8.2500:1   |  4419.03 | 41.8208 | 0.9884
30 | 107  | 20.8178:1  |  1749.46 | 36.0297 | 0.9742
40 | 41   | 54.3293:1  |   655.40 | 30.7737 | 0.9403
50 | 19   | 117.2368:1 |   309.20 | 25.9537 | 0.8419
55 | 15   | 148.5000:1 |   233.75 | 23.6506 | 0.7556
60 | 11   | 202.5000:1 |   177.33 | 20.7062 | 0.5875
63 | 10   | 222.7500:1 |   151.39 | 19.0242 | 0.4688

Table 13: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for the stefan_cif sequence

Results in graphical form:
Figure 30 shows the plot of PSNR (dB) vs. bitrate (kbps) for stefan_cif.
Fig 30: PSNR (dB) vs. Bitrate (kbps) for stefan_cif

Figure 31 shows the plot of SSIM vs. bitrate (kbps) for stefan_cif.
Fig 31: SSIM vs. Bitrate (kbps) for stefan_cif

7.4 Simulation results of sequence silent_cif [21]
Input sequence: silent_cif.yuv
Total no. of frames: 15
Original file size: 2227.5 kB
Width: 352. Height: 288. Frame rate: 30 fps.
Fig 32 illustrates the video quality of the silent_cif sequence at various QP values.

Original File   QP = 10
QP = 50   QP = 63
Fig 32: Video quality at various QP values for silent_cif

Simulation results for the silent_cif sequence:
Table 14 shows the compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP values for the silent_cif sequence.

QP | Compressed file size [kB] | Compression ratio | Bit rate [kbps] | Y-PSNR [dB] | Y-SSIM
 0 | 592  | 3.7627:1   | 9694.90 | 53.5225 | 0.9982
 5 | 357  | 6.2395:1   | 5836.70 | 50.6134 | 0.9965
10 | 199  | 11.1935:1  | 3244.26 | 47.7497 | 0.9934
20 | 66   | 33.7500:1  | 1076.57 | 41.8517 | 0.9769
30 | 28   | 79.5536:1  |  445.79 | 36.5718 | 0.9329
40 | 13   | 171.3462:1 |  206.23 | 32.0900 | 0.8498
50 | 8    | 278.4375:1 |  121.74 | 28.1807 | 0.7315
55 | 7    | 318.2143:1 |  101.13 | 25.9689 | 0.6612
60 | 5.54 | 402.0758:1 |   89.21 | 23.8874 | 0.6125
63 | 5.25 | 424.2857:1 |   84.36 | 22.1109 | 0.5366

Table 14: Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for the silent_cif sequence