Contents

1 Introduction
   1.1 Motivation
   1.2 Modus Operandi
   1.3 Thesis Outline

2 Background
   2.1 Overview
   2.2 Analog and Digital Television
   2.3 HDTV
       History
       Notation
       Transmission Formats
       Interlaced versus Progressive Scanning
   2.4 Devices
       Requirements
       Alternatives to HD Ready
   2.5 Content
   2.6 Summary

3 H.264 and Compression Artifacts
   3.1 H.264
       History
       Dataflow Path
       H.264 Structure
       Functional Blocks
   3.2 Artifacts of Compression
       Blocking Effect
       DCT Basis Image Effect
       Blurring
       Color Bleeding
       Staircase Effect
       Mosaic Patterns
       Motion Compensation Mismatch
       Chrominance Mismatch
       Summary

Chapter 1 Introduction

1.1 Motivation

HD is the new buzzword in town. More and more people are buying display devices that carry an HD-Ready logo. TV broadcasters and other organizations in the TV industry are also increasingly interested in capturing and delivering HD content to end users, and are reassessing their business models to work out how HDTV can maximize their revenues. The purpose of HD is to deliver crystal-clear pictures to end users, with a level of detail that was previously unheard of and simply not feasible with the existing infrastructure. Now that the customer has an HD-Ready display device at his disposal, he wants to make sure that the content delivered to him is also of the highest quality. Everybody in the delivery chain, from content producers to broadcasters, is aware of this demand. HD means a lot of data has to be recorded, moved back and forth between various facilities and finally broadcast. Data has to be compressed to transport it between facilities, and compression is the main source of degradation in quality. Video data becomes prone to various kinds of visible artifacts when subjected to compression, and that is not at all acceptable to the customer, who just wants a brilliant picture on his new, costly display device. We focus on developing a quality assessment scheme which takes a video sequence as input and outputs a parameter indicating the quality of that sequence.

1.2 Modus Operandi

(A brief discussion of what we will do will occupy this space.)

1.3 Thesis Outline

In this thesis we review background material that facilitates the understanding of our work and describe quality control procedures. This is followed by a description of our proposed method and the results of the simulation. The rest of the document is structured as follows: Chapter 2 contains a review of background material concerning HDTV and also takes a brief look at who is doing what in the world of HDTV. Chapter 3 describes compression algorithms in general and H.264 in particular; various types of coding artifacts introduced by compression methods are also discussed. Chapter 4 explains the types of quality control techniques and the limitations of existing techniques, and then describes our own proposed method for automated video quality analysis. Chapter 5 discusses the implementation of our proposed methods; we describe the platform and test video sequences used, as well as the various test cases. Chapter 6 presents the results, discusses the drawbacks of our scheme and outlines future work.

Chapter 2 Background

2.1 Overview

In this chapter we review what HDTV is and how it developed gradually in various parts of the world. We also explore the choices available for compression and take a brief look at what the various HD-related terms mean. In the end we will look at various devices which can decode and display HD material.

2.2 Analog and Digital Television

Digital television is the transmission of TV using digital signals. Because of the nature of a digital signal there is no fluctuation: it is either perfectly intact or absent. This property makes digital transmission more precise than analog transmission. An analog signal degrades over distance and may be barely detectable at the boundary of the transmission range; this is the reason radio stations fade in and out at the edges of their transmission areas. With analog signals, the SNR decreases with distance and the quality of the broadcast suffers, which was normally evident in the form of ghosts and snow. With a digital signal, the decrease in SNR does not deteriorate the quality of the signal, but the range shrinks; this is called the cliff effect. As a result, the receiver either receives a clear picture or nothing at all. The traditional standards (PAL, NTSC and SECAM) specify analog transmission. The major DTV standards are ATSC (North America), DVB (Europe) and ISDB (Japan). All three use MPEG-2 video compression and Dolby Digital audio compression; DVB and ISDB also include MPEG audio compression. Digital transmission techniques allow more data to be transmitted, so this increase in bandwidth can be used in two ways. Improved display resolution and sound quality, in the case of high-definition television (HDTV).

Simultaneous broadcast of up to five programs, in the case of standard-definition television (SDTV). Datacasting is also an option with digital transmission: it might allow, for example, someone watching a football match to choose a different camera angle, or to select a display of player or team statistics. Here we would like to differentiate between DTV and HDTV: DTV deals with the method of transmission of the television signal, whereas HDTV defines a new television display format but does not deal with the method of transmission.

2.3 HDTV

High-definition television means broadcast of television signals with a higher resolution than the traditional formats (NTSC, SECAM, PAL) allow. HDTV is mostly broadcast digitally. It is defined as 1080 active lines with a 16 x 9 aspect ratio in ITU-R BT.709. However, in the ATSC broadcast standard, used in the United States and other countries, any ATSC resolution with 720 or more active lines is considered HDTV. The traditional standards do not allow such high resolutions, with NTSC limiting the resolution to 640 x 480 and PAL allowing 768 x 576 as the maximum possible resolution.

History

Modern-day HDTV has its roots in research started in Japan by NHK (Japan Broadcasting Corporation). Things then progressed separately in Europe and North America. HDTV broadcast requires a lot of data to be transmitted, and it soon became evident that analog HDTV would simply not be feasible. First we discuss how things developed in North America, followed by Europe's progress in this field.

North America

In 1977, the SMPTE (Society of Motion Picture and Television Engineers) Study Group on High Definition Television was formed. The group published its initial recommendation in 1980, which included, among other things, the definition of a wide-screen format. The first demonstration of HDTV in the United States took place in 1981 and generated a great deal of interest.
In 1987, the FCC (Federal Communications Commission) sought advice from the private sector and formed the Advisory Committee on Advanced Television Service. Initially there were as many as 23 different ATV (Advanced Television) systems proposed to this committee, but by 1990 only 9 proposals remained, all based on analog technology. However, by mid-1991, the leading ATV designs were based on a new all-digital approach. A joint proposal from several companies detailing an all-digital ATV system was submitted to the FCC. Following certain changes and compromises,

this proposal was approved by the FCC in December 1996 and became the mandated ATSC (Advanced Television Systems Committee) standard for terrestrial DTV/HDTV broadcasting.

Europe

In the early 1990s, European broadcasters, consumer equipment manufacturers and regulatory bodies formed the European Launching Group (ELG) to discuss introducing DTV throughout Europe. The ELG realized the importance of establishing a common frame of reference among members and drafted a document called the Memorandum of Understanding (MoU) to establish a basis of understanding. The MoU was signed by all ELG members, and the DVB (Digital Video Broadcasting) Project was subsequently created from the ELG membership. DVB opens up the possibility of providing crystal-clear television programming to television sets in buses, cars, trains and even hand-held televisions. DVB is beneficial to content providers because they can offer their services anywhere. Today, the DVB Project consists of over 220 organizations in more than 29 countries worldwide. DVB broadcast services are available in Europe, Africa, Asia, Australia and parts of South America.

Present Day Europe

Euro1080 became the first commercial broadcaster in Europe to air full-time HDTV content; it was free to air in its first few months. The German channel group ProSiebenSat.1 ran test transmissions in fall 2004 and early 2005 before launching a complete free-to-air service. BSkyB launched its HDTV service, under the name Sky HD, in May 2006 for viewers in the UK and Ireland; the service requires the new Sky HD Digibox and a monthly subscription. Several other broadcasters in France, Italy, Spain, Poland, Sweden, Belgium and Germany have either started or are planning to start HDTV services.

Notation

HDTV broadcasts are described using a notation that specifies the following:

The number of lines in the display resolution.
Progressive frames (p) or interlaced fields (i).
The number of frames (or fields) per second.
For example, the format 720p60 is 1280 x 720 pixels, with 60 progressive frames per second. The format 1080i50 is 1920 x 1080 pixels, with 50 fields (25 frames) per second. Often the frame rate is not mentioned and the format is described only with the help of resolution as 720p(i) or 1080p(i).
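The notation above can be illustrated with a small parser. This is a hypothetical helper, not part of any standard; the pixel dimensions are the common 16:9 rasters given in the examples.

```python
import re

# Assumed 16:9 rasters for the two HD line counts discussed in the text.
RESOLUTIONS = {720: (1280, 720), 1080: (1920, 1080)}

def parse_hd_format(label):
    """Split an HDTV format label such as '720p60' or '1080i50' into
    its components: resolution, scan type and rate."""
    m = re.fullmatch(r"(\d+)([pi])(\d+)?", label)
    if not m or int(m.group(1)) not in RESOLUTIONS:
        raise ValueError(f"not a recognised HD format label: {label!r}")
    lines, scan, rate = int(m.group(1)), m.group(2), m.group(3)
    width, height = RESOLUTIONS[lines]
    return {
        "width": width,
        "height": height,
        "scan": "progressive" if scan == "p" else "interlaced",
        # The rate counts frames for progressive and fields for interlaced;
        # it may be omitted, as in plain '720p' or '1080i'.
        "rate": int(rate) if rate else None,
    }
```

For instance, `parse_hd_format("1080i50")` yields a 1920 x 1080 interlaced format at 50 fields per second.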

Transmission Formats

SDTV uses MPEG-2 for video compression with bit rates between 10 and 18 Mbps, depending on the type of content. In the early days of HDTV, MPEG-2 was also the standard of choice for compressing the video data. In 1998 MPEG-4 was announced, which promised better compression and hence lower bit rates for the same content. In May 2003, MPEG-4 Part 10 Advanced Video Coding (AVC), also known as H.264, was announced. This new development promised even lower bit rates for the same picture quality. According to Apple, which has incorporated H.264 in QuickTime 7, H.264 delivers up to four times the frame size of the older MPEG-4 Part 2 at the same data rate. Compared to MPEG-2, AVC provides the same quality at a third to half of the data rate. AVC remains efficient across the whole range of media delivery modes, right down to about 1.5 Mbps for web-based services, and even lower rates for mobile content at reduced frame rates on a 176 x 144 pixel screen. With time, more and more broadcasters are adopting the efficient H.264 codec. These include ProSieben and Sat.1 from Germany. CanalSat of France is also broadcasting using H.264. Other broadcasters include RAI (Radio Audizioni Italiane) from Italy, pay-per-view terrestrial channels in France, BBC HD from the UK, and Sky HD (a combination of several channels) from the UK and Ireland.

Interlaced versus Progressive Scanning

Traditional TV uses interlaced scanning in order to conserve bandwidth. In interlaced scanning each frame is displayed in two passes: during the first pass all the odd-numbered lines are drawn in 1/50th of a second, and during the second pass the even-numbered lines are drawn in another 1/50th of a second, giving a combined frame rate of 25 fps. In a progressively scanned system, by contrast, the entire frame is conveyed in every scan sequence (every 1/50th of a second). There have been many discussions on the respective merits of these two formats.
There seems to be agreement on the following points:

A progressive format is easier to compress and leads to lower bitrates.
Motion portrayal is better with 720p50.
Interlaced scanning can introduce image artefacts during rapid motion when shown on natively progressive displays (LCD, plasma, ...).
720p50 provides overall fewer artefacts than 1080i.
Production in 1080i is currently easier due to more available equipment.
Overall, 1080i is in wider use worldwide than 720p.
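The field/frame relationship described above can be sketched in a few lines of code. This is a toy illustration (rows stand in for scan lines); it also hints at why motion between the two field captures produces combing artefacts when the fields are woven together on a progressive display.

```python
def split_fields(frame):
    """Split a progressive frame (a list of rows) into two fields:
    the odd-numbered lines (first pass) and the even-numbered lines
    (second pass), counting from 1 as a CRT does."""
    return frame[0::2], frame[1::2]

def weave(top, bottom):
    """Interleave two fields back into a full frame."""
    frame = []
    for t, b in zip(top, bottom):
        frame.extend([t, b])
    return frame

# A toy 4-line "frame"; each row is just a list of pixel values.
frame = [[row] * 4 for row in range(4)]
top, bottom = split_fields(frame)
```

If nothing moves between the two field captures, `weave(top, bottom)` reproduces the original frame exactly; if an object moves, the odd and even lines show it at different positions.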

2.4 Devices

In January 2005, EICTA (European Information, Communications and Consumer Electronics Technology Industry Associations) announced the requirements for the HD Ready label. EICTA introduced the label as a quality sign for the differentiation of display equipment capable of processing and displaying high-definition signals. The two variations of the HD Ready logo are shown in Fig. 2.1.

Figure 2.1: The two variations of the HD Ready logo.

Requirements

In order for a device to be awarded the HD Ready label, it has to meet the following requirements:

1. Display: The minimum native resolution of the display (e.g. LCD, PDP) or display engine (e.g. DLP) is 720 physical lines in wide aspect ratio.

2. Video connectors: The display device accepts HD input via:

Analog YPbPr. HD Ready displays support analog YPbPr as an HD input format to allow full compatibility with today's HD video sources in the market. Support of the YPbPr signal should be through common industry-standard connectors directly on the HD Ready display or through an adapter.

DVI or HDMI (High Definition Multimedia Interface). HD-capable inputs accept the following HD video formats: 50 and 60 Hz progressive scan (720p), and 50 and 60 Hz interlaced (1080i).

The DVI or HDMI input supports copy protection (HDCP, High-bandwidth Digital Content Protection).

Alternatives to HD Ready

Many PCs and laptops are capable of displaying HD content; however, they do not qualify for the HD Ready label because of the connector requirements. Any sufficiently fast computer with a display resolution of 1280 x 720 or more is fully ready for HD video. The video can come from the internet, data files or a DTV tuner card.

2.5 Content

So what is HD good for? Special events, sports events and movies are predicted to be the major driving force behind HDTV. National Geographic plans to show documentaries and the BBC aims to convert everything into HD. It is just a matter of time before everyone jumps on the bandwagon and switches to HD. With the FIFA World Cup 2006 and Wimbledon 2006 already broadcast in HD, and more events such as the Asian Games 2006 in Doha (Qatar) in the pipeline for HD broadcast, the list is ever growing. The higher resolution of HD brings the viewer a new style of experiencing sports events, with a very high level of detail. Motion pictures are the medium where detail is as important as anything else. Crisp, large, high-resolution pictures will surely revolutionize the whole experience of watching movies for home viewers; HD brings new life to the movie industry. The situation of some of the biggest players looks like this: a large share of MGM's recent production is in HD; 80% of WB's television content is shot in or transferred to HD; and 80% of Disney's production is already in HD, with plans for it to be 100% within the next four years. Although Hollywood is still divided over which new DVD format is to be used, Blu-ray or HD DVD (High Density Digital Versatile Disc), there is no doubt about the use of high-definition video. Whichever format wins, the end user is going to experience HD video at the end of the day.
2.6 Summary

In this chapter we explored what HDTV is, how it developed over the years, and the current situation of HDTV in Europe. We also looked at the various notations

used to describe HDTV, as well as the choices available for video compression. In the end we examined what is meant by the term HD Ready and what type of content is available in HD for end users.


Chapter 3 H.264 and Compression Artifacts

In this chapter we look closely at how an H.264 encoder works and what tools and capabilities are available to compress a given video sequence. In the second half of the chapter we explore the various types of artifacts which arise as a result of the compression procedure.

3.1 H.264

As discussed in the previous chapter, H.264 is slowly but surely gaining momentum and is being adopted by more and more broadcasters. In this section we first take a brief look at the history of H.264 and then move on to the actual working of the coder. A detailed treatment of the encoder is beyond the scope of this document, so only brief descriptions of the different parts of the encoder are presented. Major parts of this chapter (text and figures) have been borrowed from Iain E. G. Richardson's book, H.264 and MPEG-4 Video Compression. The book is recommended for a detailed and in-depth description of the workings and functionality of H.264.

History

In 1993, the MPEG-4 project was launched. The focus of MPEG-4 was originally to provide a more flexible and efficient update to the earlier MPEG-1 and MPEG-2 standards. When it was established in 1994 that the newly developed H.263 standard offered the best compression technology available at that time, MPEG changed its focus and decided to embrace object-based coding and functionality as the distinctive element of the new MPEG-4 standard. Around the time at which MPEG-4 Visual was finalized (1998/99), the ITU-T study group began evaluating proposals for a new video coding initiative entitled H.26L (the "L" stood for "Long Term"). The Joint Video Team (JVT) consists of members of ISO/IEC JTC1/SC29/WG11 (MPEG) and ITU-T SG 16 Q.6 (VCEG). JVT came about as a result of an MPEG requirement for advanced video coding tools. The core coding mechanism of MPEG-4 Visual (Part 2) is based on H.263 (published in 1995). With recent advances in processor capabilities and video coding research, it was evident that the performance of codecs could be improved. After evaluating several competing technologies in 2001, it became apparent that the H.26L test model CODEC was the best choice to meet MPEG's requirement, and it was agreed that members of MPEG and VCEG would form a Joint Video Team to manage the final stages of H.26L development. JVT's main purpose was to see the H.264 Recommendation / MPEG-4 Part 10 standard through to publication (in 2003); now that the standard is complete, the group's focus has switched to extensions to support other color spaces and increased sample accuracy.

Dataflow Path

H.264 does not explicitly define a CODEC but rather defines the syntax of an encoded video bitstream together with the method of decoding this bitstream. Figures 3.1 and 3.2 show the likely structure of a compliant encoder and decoder. Except for the deblocking filter, most of the basic functional elements are present in previous standards as well, but the important changes in H.264 occur in the details of each functional block.

Figure 3.1: H.264 Encoder.

The encoder (Fig. 3.1) includes two dataflow paths: a forward path (left to right) and a reconstruction path (right to left). The right-to-left dataflow path in the decoder illustrates the similarities between encoder and decoder. The main steps of encoding and decoding are discussed below.

Figure 3.2: H.264 Decoder.

Encoder (Forward Path)

An input frame or field Fn is processed in units of a macroblock. Each macroblock is encoded in intra or inter mode and, for each block in the macroblock, a prediction PRED (marked P in Figure 3.1) is formed based on reconstructed picture samples. In intra mode, PRED is formed from samples in the current slice that have previously been encoded, decoded and reconstructed (uF'n in Figures 3.1 and 3.2). In inter mode, PRED is formed by motion-compensated prediction from one or two reference picture(s) selected from the set of list 0 and/or list 1 reference pictures. In the figures the reference picture is shown as the previous encoded picture F'n-1, but the prediction reference for each macroblock partition may be chosen from a selection of past or future pictures that have already been encoded, reconstructed and filtered. The prediction PRED is subtracted from the current block to produce a residual difference block Dn that is transformed and quantised to give X, a set of quantised transform coefficients which are reordered and entropy encoded. The entropy-encoded coefficients, together with the side information required to decode each block within the macroblock (prediction modes, quantisation parameter, motion vectors, etc.), form the compressed bitstream which is passed to a Network Abstraction Layer (NAL) for transmission or storage.

Encoder (Reconstruction Path)

As well as encoding and transmitting each block in a macroblock, the encoder decodes it to provide a reference for further predictions. The coefficients X are rescaled (Q^-1) and inverse transformed (T^-1) to produce a difference block D'n. The prediction block PRED is added to D'n to create a reconstructed block uF'n (a decoded version of the original block; u indicates that it is unfiltered). A filter is applied to reduce the effects of blocking distortion, and the reconstructed reference picture is created from a series of blocks F'n.
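The reason the encoder carries its own reconstruction path can be shown with a toy example. The sketch below replaces the real integer transform and quantisation with a single scalar quantiser step (an assumption made purely for illustration): because quantisation is lossy, predictions must be formed from the same reconstructed samples the decoder will have, otherwise encoder and decoder drift apart.

```python
QSTEP = 8  # toy quantiser step; H.264 actually uses an integer transform + QP

def encode_block(block, pred):
    """Forward path: residual Dn -> quantised 'coefficients' X."""
    return [round((s - p) / QSTEP) for s, p in zip(block, pred)]

def reconstruct_block(coeffs, pred):
    """Reconstruction path (identical in encoder and decoder):
    rescale X, add the prediction, giving the unfiltered uF'n."""
    return [p + c * QSTEP for c, p in zip(coeffs, pred)]

source = [120, 124, 130, 139]        # current block samples
pred = [118, 118, 118, 118]          # prediction from reconstructed samples
coeffs = encode_block(source, pred)  # what gets entropy coded
recon = reconstruct_block(coeffs, pred)
```

Note that `recon` differs slightly from `source` (quantisation loss), which is exactly why the encoder must predict from `recon`, not from the pristine input.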

Decoder

The decoder receives a compressed bitstream from the NAL and entropy decodes the data elements to produce a set of quantised coefficients X. These are rescaled and inverse transformed to give D'n. Using the header information decoded from the bitstream, the decoder creates a prediction block PRED, identical to the original prediction PRED formed in the encoder. PRED is added to D'n to produce uF'n, which is filtered to create each decoded block F'n.

H.264 Structure

Profiles and Levels

Profiles and levels specify restrictions on bitstreams and hence limits on the capabilities needed to decode them. Profiles and levels may also be used to indicate interoperability points between individual decoder implementations. Each profile specifies a subset of algorithmic features and limits that shall be supported by all decoders conforming to that profile. Each level specifies a set of limits on the values that may be taken by the syntax elements of the standard, such as sample processing rate, picture size, coded bitrate and memory requirements. The same set of level definitions is used with all profiles, but individual implementations may support a different level for each supported profile. For any given profile, levels generally correspond to decoder processing load and memory capability. The standard includes the following seven profiles, targeted at specific applications.

Baseline Profile (BP): Primarily for lower-cost applications demanding fewer computing resources; this profile is used widely in video conferencing and mobile applications.

Main Profile (MP): Originally intended as the mainstream consumer profile for broadcast and storage applications.

Extended Profile (XP): Intended as the streaming video profile; it has relatively high compression capability and some extra methods for robustness to data losses and server stream switching.
High Profile (HiP): The primary profile for broadcast and disc storage applications, particularly for high-definition television (e.g. it is used in HD DVD and Blu-ray discs).

High 10 Profile (Hi10P): Going beyond today's mainstream consumer product capabilities, this profile builds on the High Profile, adding support for up to 10 bits per sample of decoded picture precision.

High 4:2:2 Profile (Hi422P): Primarily targeting professional applications that use interlaced video, this profile builds on the High 10 Profile, adding support for the 4:2:2 chroma sampling format while using up to 10 bits per sample of decoded picture precision.

High 4:4:4 Profile (Hi444P) [deprecated]: This profile builds on Hi422P, supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error. (This profile is being removed from the standard in favor of developing a new, improved 4:4:4 profile.)

A detailed discussion of the profiles and levels is beyond the scope of this document.

Coded Data Format

H.264 makes a distinction between a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). The output of the encoding process is VCL data, which is mapped to NAL units prior to transmission or storage. Each NAL unit contains a Raw Byte Sequence Payload (RBSP), a set of data corresponding to coded video data or header information. A coded video sequence is represented by a sequence of NAL units that can be transmitted over a packet-based network or a bitstream transmission link, or stored in a file. The purpose of specifying the VCL and NAL separately is to distinguish between coding-specific features (at the VCL) and transport-specific features (at the NAL).

Reference Pictures

An H.264 encoder may use one or more previously encoded pictures as references for motion-compensated prediction of each inter-coded macroblock or macroblock partition. This enables the encoder to search for the best match for the current macroblock partition in a wider set of pictures than just the previously encoded picture.
The encoder and decoder each maintain one or two lists of reference pictures, containing pictures that have previously been encoded and decoded. Inter-coded macroblocks and macroblock partitions in P slices are predicted from pictures in a single list, list 0. Inter-coded macroblocks and macroblock partitions in B slices may be predicted from two lists, list 0 and list 1.

Slices

A frame is coded as one or more slices, each containing an integral number of macroblocks, from 1 to the total number of macroblocks in the frame. The number of macroblocks per slice may vary within a frame. There is minimal interdependency between coded slices,

which helps in limiting the propagation of errors. There are five types of coded slice, and a coded picture may be composed of different types of slices.

1. I (intra): Contains only I macroblocks.
2. P (predicted): Contains P macroblocks and/or I and/or Skipped macroblocks.
3. B (bi-predictive): Contains B macroblocks and/or I macroblocks.
4. SP (switching P): Facilitates switching between coded streams; contains P and/or I macroblocks.
5. SI (switching I): Facilitates switching between coded streams; contains SI macroblocks (a special type of intra-coded macroblock).

When a Skipped macroblock is signalled in the bitstream, no further data is sent for that macroblock. The decoder calculates a vector for the skipped macroblock and reconstructs the macroblock from the first reference picture in list 0.

Redundant Coded Picture

A picture marked as redundant contains a redundant representation of part or all of a coded picture. Normally, the decoder reconstructs the frame from primary pictures and discards any redundant pictures. However, if a primary coded picture is damaged (e.g. due to a transmission error), the decoder may replace the damaged area with decoded data from a redundant picture, if possible.

Macroblocks

A coded picture consists of a number of macroblocks, each containing 16 x 16 luma samples and the associated chroma samples (8 x 8 Cb and 8 x 8 Cr samples for 4:2:0 sampling). Within each picture, macroblocks are arranged in slices, where a slice is a set of macroblocks in raster scan order. An I slice may contain only I macroblock types, a P slice may contain P and I macroblock types, and a B slice may contain B and I macroblock types.

I macroblocks are predicted using intra prediction from decoded samples in the current slice. A prediction is formed either (a) for the complete macroblock or (b) for each 4 x 4 block of luma samples (and associated chroma samples) in the macroblock.
P macroblocks are predicted using inter prediction from reference picture(s). An inter coded macroblock may be divided into macroblock partitions, i.e. blocks of size 16 x 16, 16 x 8, 8 x 16 or 8 x 8 luma samples (and associated chroma samples). If the 8 x 8 partition size is chosen, each 8 x 8 sub-macroblock may be further divided into sub-macroblock partitions of size 8 x 8, 8 x 4, 4 x 8 or 4 x 4 luma samples (and associated chroma samples). Each macroblock partition may be predicted from one picture in list 0. If present, every sub-macroblock partition in a sub-macroblock is predicted from the same picture in list 0.
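The partitioning rules above can be made concrete with a small sketch that counts how many motion vectors a given partition choice implies. The data structures here are hypothetical (just tuples of luma sample dimensions), introduced only for illustration.

```python
# Macroblock partitions, and the sub-partitions allowed when 8x8 is chosen.
# Each maps a (width, height) in luma samples to the number of motion
# vectors that choice requires.
MB_PARTITIONS = {(16, 16): 1, (16, 8): 2, (8, 16): 2, (8, 8): 4}
SUB_PARTITIONS = {(8, 8): 1, (8, 4): 2, (4, 8): 2, (4, 4): 4}

def motion_vector_count(mb_partition, sub_partitions=None):
    """Number of motion vectors needed for one inter-coded macroblock
    under a given partitioning (illustrative sketch)."""
    if mb_partition != (8, 8):
        return MB_PARTITIONS[mb_partition]
    # Each of the four 8x8 sub-macroblocks may be split independently.
    return sum(SUB_PARTITIONS[s] for s in sub_partitions)
```

For example, a single 16 x 16 partition needs one motion vector, while splitting every 8 x 8 sub-macroblock down to 4 x 4 needs sixteen, which is why small partitions cost many more signalling bits.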

B macroblocks are predicted using inter prediction from reference picture(s). Each macroblock partition may be predicted from one or two reference pictures: one picture in list 0 and/or one picture in list 1.

Functional Blocks

Inter Prediction

Inter prediction creates a prediction model from one or more previously encoded video frames or fields using block-based motion compensation. Important differences from earlier standards include support for a range of block sizes (from 16 x 16 down to 4 x 4) and fine sub-sample motion vectors (quarter-pel resolution in the luma component).

Tree-Structured Motion Compensation

The luminance component of each macroblock (16 x 16 samples) may be split up in four ways and motion compensated either as one 16 x 16 macroblock partition, two 16 x 8 partitions, two 8 x 16 partitions or four 8 x 8 partitions. If the 8 x 8 mode is chosen, each of the four 8 x 8 partitions can be split up in a further four ways, with 4 x 4 being the smallest partition size. These partitions and sub-macroblocks give rise to a large number of possible combinations within each macroblock. This method of partitioning macroblocks into motion-compensated sub-blocks of varying size is known as tree-structured motion compensation. A separate motion vector is required for each partition or sub-macroblock; each motion vector must be coded and transmitted, and the choice of partition(s) must be encoded in the compressed bitstream. Choosing a large partition size (16 x 16, 16 x 8, 8 x 16) means that few bits are required to signal the choice of motion vector(s) and the type of partition, but the motion-compensated residual may contain a significant amount of energy in frame areas with high detail. Choosing a smaller partition size may reduce the residual energy but will require more bits to signal the motion vectors and choice of partition(s).
In general, a large partition size is appropriate for homogeneous areas of the frame and a small partition size may be beneficial for detailed areas. Each chroma component (Cb or Cr) in a macroblock has half the horizontal and vertical resolution of the luma component. Each chroma block is partitioned in the same way as the luma component, except that the partition sizes have exactly half the horizontal and vertical resolution. The horizontal and vertical components of each motion vector are halved when applied to the chroma blocks. Motion Vectors Each partition or sub-macroblock partition in an inter-coded macroblock is predicted from an area of the same size in a reference picture. The offset between the two areas (the motion

vector) has quarter-sample resolution for the luma component and one-eighth-sample resolution for the chroma components. The luma and chroma samples at sub-sample positions do not exist in the reference picture, so it is necessary to create them by interpolation from nearby coded samples.

Figure 3.3: Example of integer and sub-sample prediction.

In Figure 3.3, a 4 x 4 block in the current frame (a) is predicted from a region of the reference picture in the neighborhood of the current block position. If the horizontal and vertical components of the motion vector are integers (b), the relevant samples in the reference block actually exist (grey dots). If one or both vector components are fractional values (c), the prediction samples are generated by interpolation between adjacent samples in the reference frame. Half-pel samples adjacent to integer samples are generated using a six-tap weighted Finite Impulse Response (FIR) filter. Once all of the samples horizontally and vertically adjacent to integer samples have been calculated, the remaining half-pel positions are calculated by interpolating between six horizontal or vertical half-pel samples from the first set of operations. Once all the half-pel samples are available, quarter-pel positions are produced by simple linear interpolation.

Motion Vector Prediction

Encoding a motion vector for each partition can cost a significant number of bits, especially if small partition sizes are chosen. Motion vectors for neighboring partitions are often highly correlated, so each motion vector is predicted from vectors of nearby, previously coded partitions. A predicted vector MVp is formed based on previously calculated motion vectors, and MVD, the difference between the current vector and the predicted vector, is encoded and transmitted.
The method of forming the prediction MVp depends on the motion compensation partition size and on the availability of nearby vectors.
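For the common case the predictor is the component-wise median of the vectors of three neighbouring partitions (left, above and above-right). A minimal Python sketch of this idea follows; the function names are illustrative, and the H.264 special cases (unavailable neighbours, 16 x 8 and 8 x 16 partitions) are deliberately omitted:

```python
def median_mv_predictor(mv_a, mv_b, mv_c):
    """Component-wise median of three neighbouring motion vectors.

    Each vector is an (x, y) tuple in quarter-sample units. The special
    cases of the standard (missing neighbours, 16x8 / 8x16 partitions)
    are omitted in this sketch.
    """
    def median3(a, b, c):
        return sorted([a, b, c])[1]
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))

def mv_difference(mv, mvp):
    """MVD = current vector minus predicted vector; MVD is what is coded."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])

mvp = median_mv_predictor((4, 0), (6, -2), (5, 1))   # -> (5, 0)
mvd = mv_difference((7, 1), mvp)                     # -> (2, 1)
```

Because neighbouring vectors are usually similar, MVD is typically small and cheap to entropy code.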

Bi-Prediction

In Bi-predictive mode, a reference block is created from the list 0 and list 1 reference pictures. Two motion-compensated reference areas are obtained from reference pictures in both lists, and each sample of the prediction block is calculated as an average of the list 0 and list 1 prediction samples. This method of prediction is used in B frames.

Direct Prediction

No motion vector is transmitted for a B slice macroblock or macroblock partition encoded in Direct mode. Instead, the decoder calculates list 0 and list 1 vectors based on previously coded vectors and uses these to carry out bi-predictive motion compensation of the decoded residual samples. A skipped macroblock in a B slice is reconstructed at the decoder using Direct prediction.

Weighted Prediction

Weighted prediction is a method of modifying the samples of motion-compensated prediction data in a P or B slice macroblock. There are three types of weighted prediction in H.264:

1. P slice macroblock, explicit weighted prediction;
2. B slice macroblock, explicit weighted prediction;
3. B slice macroblock, implicit weighted prediction.

Each prediction sample is scaled by a weighting factor ω0 or ω1 prior to motion-compensated prediction. In the explicit type, the weighting factors are determined by the encoder and transmitted in the slice header. If implicit prediction is used, ω0 and ω1 are calculated from the relative temporal positions of the list 0 and list 1 reference pictures: a larger weighting factor is applied if the reference picture is temporally close to the current picture, and a smaller factor if it is temporally distant. One application of weighted prediction is to allow explicit or implicit control of the relative contribution of each reference picture to the motion-compensated prediction process.
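The idea behind implicit weighting can be illustrated with a simplified sketch. This uses straightforward floating-point weights proportional to temporal proximity; the standard instead specifies exact integer arithmetic (DistScaleFactor), which is not reproduced here:

```python
def implicit_biprediction(p0, p1, d0, d1):
    """Blend two prediction samples p0 (list 0) and p1 (list 1).

    d0 and d1 are the temporal distances from the current picture to the
    list 0 and list 1 reference pictures. A temporally closer reference
    receives the larger weight. Simplified floating-point illustration,
    not the integer arithmetic of the standard.
    """
    w0 = d1 / (d0 + d1)   # close list 0 picture (small d0) -> large w0
    w1 = d0 / (d0 + d1)
    return w0 * p0 + w1 * p1

# Equidistant references degenerate to the plain average used in
# ordinary bi-prediction:
implicit_biprediction(100, 120, 1, 1)   # -> 110.0
```

With unequal distances the nearer picture dominates, which is the behaviour described above.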
For example, weighted prediction may be effective in coding fade transitions, where one scene fades into another.

Intra Prediction

In intra mode a prediction block P is formed from previously encoded and reconstructed blocks of the same frame and is subtracted from the current block prior to encoding. For

the luma samples, P is formed for each 4 x 4 block or for a 16 x 16 macroblock. There are a total of nine optional prediction modes for each 4 x 4 luma block, four modes for a 16 x 16 luma block and four modes for the chroma components. The encoder typically selects the prediction mode for each block that minimises the difference between P and the block to be encoded. A further intra coding mode, IPCM, enables the encoder to transmit the values of the image samples directly (without prediction or transformation). In some cases this mode may be more efficient than the usual process of intra prediction, transformation, quantisation and entropy coding. Including the IPCM option makes it possible to place an absolute limit on the number of bits that may be contained in a coded macroblock without constraining decoded image quality.

4 x 4 Luma Prediction Modes

Figure 3.4 shows a 4 x 4 luma block that is to be predicted. The samples above and to the left (labeled A-M in the figure) have been previously encoded and reconstructed and are hence available as references. The samples a, b, c, ..., p of the prediction block P are calculated from the samples A-M as follows. Mode 2 (DC prediction) is modified depending on which of the samples A-M have previously been coded; each of the other modes may only be used if all of the required prediction samples are available. The arrows in Figure 3.5 indicate the direction of prediction in each mode. For modes 3-8, the predicted samples are formed from a weighted average of the prediction samples A-M.

Mode 0 (Vertical): The upper samples A, B, C, D are extrapolated vertically.
Mode 1 (Horizontal): The left samples I, J, K, L are extrapolated horizontally.
Mode 2 (DC): All samples in P are predicted by the mean of samples A...D and I...L.
Mode 3 (Diagonal Down-Left): The samples are interpolated at a 45° angle between lower-left and upper-right.
Mode 4 (Diagonal Down-Right): The samples are extrapolated at a 45° angle down and to the right.
Mode 5 (Vertical-Right): Extrapolation at an angle of about 26.6° to the left of vertical.
Mode 6 (Horizontal-Down): Extrapolation at an angle of about 26.6° below horizontal.
Mode 7 (Vertical-Left): Extrapolation at an angle of about 26.6° to the right of vertical.
Mode 8 (Horizontal-Up): Interpolation at an angle of about 26.6° above horizontal.
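The three simplest modes can be sketched directly from these descriptions. The following illustration assumes all reference samples are available and omits the directional modes 3-8; it is not the reference decoder code:

```python
def intra4x4_predict(mode, above, left):
    """Predict a 4 x 4 block from reconstructed neighbouring samples.

    above: samples A, B, C, D (row above the block)
    left:  samples I, J, K, L (column to the left of the block)
    Only modes 0 (Vertical), 1 (Horizontal) and 2 (DC) are sketched;
    the directional modes 3-8 use weighted averages and are omitted.
    """
    if mode == 0:                               # Vertical: copy A-D downwards
        return [list(above) for _ in range(4)]
    if mode == 1:                               # Horizontal: copy I-L across
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:                               # DC: rounded mean of A-D, I-L
        dc = (sum(above) + sum(left) + 4) // 8
        return [[dc] * 4 for _ in range(4)]
    raise NotImplementedError("modes 3-8 omitted in this sketch")

intra4x4_predict(2, [10, 12, 14, 16], [10, 10, 10, 10])
# DC = (52 + 40 + 4) // 8 = 12 -> a flat 4 x 4 block of 12s
```

The encoder would form each candidate prediction, subtract it from the current block and keep the mode with the smallest residual.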

Figure 3.4: H.264 Labelling of prediction samples.

16 x 16 Luma Prediction Modes

As an alternative to the 4 x 4 luma modes described in the previous section, the entire 16 x 16 luma block may be predicted in one operation. Four modes are available:

Mode 0 (Vertical): Extrapolation from the upper samples.
Mode 1 (Horizontal): Extrapolation from the left samples.
Mode 2 (DC): Mean of the upper and left-hand samples.
Mode 3 (Plane): A linear plane function is fitted to the upper and left-hand samples H and V.

Figure 3.5: H.264 4 x 4 luma prediction modes.

8 x 8 Chroma Prediction Modes

Each 8 x 8 chroma component of an intra-coded macroblock is predicted from previously encoded chroma samples above and/or to the left. Both chroma components always use the same prediction mode. The modes are DC (mode 0), horizontal (mode 1), vertical (mode 2) and plane (mode 3).
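The Plane mode can be illustrated with a simplified sketch that fits a linear function to the neighbouring samples. The standard specifies an exact integer-arithmetic fit with weighting and clipping; this version only conveys the idea and uses hypothetical helper names:

```python
def intra16x16_plane(above, left):
    """Simplified Plane mode for a 16 x 16 luma block.

    above: 16 reconstructed samples in the row above the block
    left:  16 reconstructed samples in the column to its left
    Builds pred(x, y) = a + b*x + c*y, estimating the horizontal slope
    from 'above' and the vertical slope from 'left'. Illustration only;
    H.264 defines an exact weighted integer procedure with clipping.
    """
    b = (above[15] - above[0]) / 15.0       # horizontal gradient estimate
    c = (left[15] - left[0]) / 15.0         # vertical gradient estimate
    a = (above[0] + left[0]) / 2.0          # plane value near the corner
    return [[a + b * x + c * y for x in range(16)] for y in range(16)]

block = intra16x16_plane(list(range(100, 132, 2)), [100] * 16)
# a smooth ramp increasing left-to-right, flat top-to-bottom
```

Plane mode works well for macroblocks containing a smooth luminance gradient, where Vertical, Horizontal and DC prediction would all leave a large residual.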

Signalling Intra Prediction Modes

The choice of intra prediction mode for each 4 x 4 block must be signalled to the decoder, and this could potentially require a very large number of bits. However, intra modes for neighboring 4 x 4 blocks are often correlated. Predictive coding is used to take advantage of this correlation and to save bits.

Deblocking Filter

A filter is applied to each decoded macroblock to reduce blocking distortion. The deblocking filter is applied after the inverse transform in both the encoder and the decoder. The filter smooths block edges, improving the appearance of decoded frames. The filtered image is used for motion-compensated prediction of future frames, and this can improve compression performance because the filtered image is a more faithful reconstruction of the original than a non-filtered, blocky image. It is possible for the encoder to alter the filter strength or to disable the filter. Filtering is applied to vertical or horizontal edges of 4 x 4 blocks in a macroblock in the following order:

1. Filter 4 vertical boundaries of the luma component (in order a, b, c, d in Figure 3.6).
2. Filter 4 horizontal boundaries of the luma component (in order e, f, g, h in Figure 3.6).
3. Filter 2 vertical boundaries of each chroma component (i, j).
4. Filter 2 horizontal boundaries of each chroma component (k, l).

Each filtering operation affects up to 3 samples on either side of the boundary. Figure 3.7 shows four samples on either side of the boundary in adjacent blocks p and q. The strength of the filter depends on the current quantiser, the coding modes of neighboring blocks and the gradient of image samples across the boundary.

Boundary Strength

The choice of filtering outcome depends on the boundary strength and on the gradient of image samples across the boundary. The boundary strength parameter bs is chosen according to the following rules (for progressive frames):

Figure 3.6: Edge filtering order in a macroblock.

p and/or q is intra coded and the boundary is a macroblock boundary: bs = 4
p and q are intra coded and the boundary is not a macroblock boundary: bs = 3
neither p nor q is intra coded; p or q contains coded coefficients: bs = 2
neither p nor q is intra coded; neither contains coded coefficients; p and q use different reference pictures or a different number of reference pictures or have motion vector values that differ by one luma sample or more: bs = 1
otherwise: bs = 0

The result of applying these rules is that the filter is stronger at places where there is likely to be significant blocking distortion, such as the boundary of an intra coded macroblock or a boundary between blocks that contain coded coefficients.

Filter Decision

A group of samples from the set (p2, p1, p0, q0, q1, q2) is filtered only if:

1. bs > 0, and
2. |p0 - q0| < α and |p1 - p0| < β and |q1 - q0| < β.

α and β are thresholds defined in the standard; they increase with the average quantiser parameter QP of the two blocks p and q. The effect of the filter decision is to switch off the filter when there is a significant change across the block boundary in the original image. When QP is small, anything other than a very small gradient across the boundary is likely to be due to image features (rather than blocking artifacts) that should be preserved, so the thresholds α and β are low. When QP is larger, blocking distortion is likely to be more significant, and α and β are higher so that more boundary samples are filtered.

Transform and Quantisation

H.264 uses three transforms depending on the residual data that is to be coded: a Hadamard transform for the 4 x 4 array of luma DC coefficients in intra macroblocks predicted in 16

x 16 mode, a Hadamard transform for the 2 x 2 array of chroma DC coefficients (in any macroblock) and a DCT-based transform for all other 4 x 4 blocks in the residual data.

Figure 3.7: Samples adjacent to horizontal and vertical boundaries.

Figure 3.8: Scanning order of residual blocks within a macroblock.

Data within a macroblock are transmitted in the order shown in Figure 3.8. If the macroblock is coded in 16 x 16 Intra mode, then the block labelled -1, containing the transformed DC coefficient of each 4 x 4 luma block, is transmitted first. Next, the luma residual blocks 0-15 are transmitted in the order shown. Blocks 16 and 17 are sent next, containing a 2 x 2 array of DC coefficients from the Cb and Cr chroma blocks respectively, and finally the chroma residual blocks 18-25 are sent.

4 x 4 Residual Transform (blocks 0-15 and 18-25)

This transform operates on 4 x 4 blocks of residual data after motion-compensated prediction or intra prediction. The H.264 transform is based on the DCT but with some fundamental differences:

1. It is an integer transform (all operations can be carried out using integer arithmetic, without loss of decoding accuracy).
2. It is possible to ensure zero mismatch between encoder and decoder inverse transforms.
3. The core part of the transform can be implemented using only additions and shifts.
4. A scaling multiplication (part of the transform) is integrated into the quantiser, reducing the total number of multiplications.

The inverse quantisation (scaling) and inverse transform operations can be carried out using 16-bit integer arithmetic with only a single multiply per coefficient, without any loss of accuracy. The 4 x 4 forward transform is given by:

Y = C_f X C_f^T ⊗ E_f    (3.1)

where (matrices written row by row, rows separated by semicolons)

C_f = [ 1 1 1 1; 2 1 -1 -2; 1 -1 -1 1; 1 -2 2 -1 ]

E_f = [ a^2 ab/2 a^2 ab/2; ab/2 b^2/4 ab/2 b^2/4; a^2 ab/2 a^2 ab/2; ab/2 b^2/4 ab/2 b^2/4 ]

with a = 1/2, b = sqrt(2/5), and ⊗ denoting element-by-element multiplication. This transform is only an approximation to the 4 x 4 DCT and is not identical to it. The approximate transform has almost identical compression performance to the original 4 x 4 DCT and has a number of important advantages. The core part of the transform, C_f X C_f^T, can be carried out with integer arithmetic using only additions, subtractions and shifts. The dynamic range of the transform operations is such that 16-bit arithmetic may be used throughout, since the inputs are in the range ±255. The post-scaling operation ⊗ E_f requires one multiplication for every coefficient, which can be absorbed into the quantisation process. The inverse transform is given by the following equation, which the H.264 standard defines explicitly as a sequence of arithmetic operations:

X' = C_i^T (Y ⊗ E_i) C_i    (3.2)

where

C_i = [ 1 1 1 1; 1 1/2 -1/2 -1; 1 -1 -1 1; 1/2 -1 1 -1/2 ]

E_i = [ a^2 ab a^2 ab; ab b^2 ab b^2; a^2 ab a^2 ab; ab b^2 ab b^2 ]

This time, Y is pre-scaled by multiplying each coefficient by the appropriate weighting factor from matrix E_i. The factors ±1/2 in C_i can be implemented with a right shift without significant loss of accuracy because the coefficients Y are pre-scaled.

Quantisation

H.264 assumes a scalar quantiser.
The mechanisms of the forward and inverse quantisers are complicated by the requirements to (a) avoid division and/or floating-point arithmetic

and (b) incorporate the post- and pre-scaling matrices E_f and E_i defined in the previous section. The basic forward quantiser operation is:

Z_ij = round(Y_ij / Qstep)

where Y_ij is a coefficient of the H.264 transform, Qstep is a quantiser step size and Z_ij is a quantised coefficient. The rounding operation need not round to the nearest integer; e.g., biasing the round operation towards smaller integers can give perceptual quality improvements. A total of 52 values of Qstep are supported by the standard, indexed by a quantisation parameter QP. Qstep doubles for every increment of 6 in QP. The wide range of quantiser step sizes makes it possible for an encoder to control the tradeoff between bitrate and quality accurately and flexibly. The values of QP can be different for luma and chroma. Both parameters are in the range 0-51, and the default is that the chroma parameter QP_C is derived from QP_Y such that QP_C is less than QP_Y for QP_Y greater than 30. A user-defined mapping between QP_Y and QP_C may be signalled in a Picture Parameter Set.

The post-scaling factor a^2, ab/2 or b^2/4 is incorporated into the quantiser. First the input block X is transformed to give a block of unscaled coefficients W. Then, each coefficient W_ij is quantised and scaled in a single operation:

Z_ij = round(W_ij . PF / Qstep)    (3.3)

where PF is a^2, ab/2 or b^2/4 depending on the position (i, j). In order to simplify the arithmetic, the factor PF/Qstep is implemented in the reference model software [4] as a multiplication by a factor MF and a right shift, avoiding any division operations:

Z_ij = round(W_ij . MF / 2^qbits)    (3.4)

where

MF / 2^qbits = PF / Qstep    and    qbits = 15 + floor(QP/6)    (3.5)

In integer arithmetic, the above equation can be implemented as:

|Z_ij| = (|W_ij| . MF + f) >> qbits,    sign(Z_ij) = sign(W_ij)    (3.6)

where >> indicates a binary right shift.

Rescaling

The basic rescaling (or inverse quantiser) operation is:

Y'_ij = Z_ij . Qstep    (3.7)

The pre-scaling factor for the inverse transform is incorporated in this operation, together with a constant scaling factor of 64 to avoid rounding errors:

W'_ij = Z_ij . Qstep . PF . 64    (3.8)

W'_ij is a scaled coefficient which is transformed by the core inverse transform C_i^T W' C_i. The values at the output of the inverse transform are divided by 64 to remove the scaling factor (this can be implemented using an addition and a right shift). The H.264 standard does not specify Qstep or PF directly. Instead, the parameter V = Qstep . PF . 64 is defined for 0 ≤ QP ≤ 5 and for each coefficient position, so that the scaling operation becomes:

W'_ij = Z_ij . V_ij . 2^floor(QP/6)    (3.9)

The factor 2^floor(QP/6) in the above equation causes the scaled output to increase by a factor of 2 for every increment of 6 in QP.

4 x 4 Luma DC Coefficient Transform and Quantisation

If the macroblock is encoded in 16 x 16 Intra prediction mode, each 4 x 4 residual block is first transformed using the core transform described in the previous section (C_f X C_f^T). The DC coefficient of each 4 x 4 block is then transformed using a 4 x 4 Hadamard transform:

Y_D = (1/2) H W_D H,    H = [ 1 1 1 1; 1 1 -1 -1; 1 -1 -1 1; 1 -1 1 -1 ]    (3.10)

W_D is the block of 4 x 4 DC coefficients and Y_D is the block after transformation. The output coefficients Y_D(i,j) are quantised to produce a block of quantised DC coefficients:

|Z_D(i,j)| = (|Y_D(i,j)| . MF(0,0) + 2f) >> (qbits + 1),    sign(Z_D(i,j)) = sign(Y_D(i,j))    (3.11)

f and qbits are defined as before. At the decoder, an inverse Hadamard transform is applied, followed by rescaling:

W_QD = H Z_D H

with H the Hadamard matrix of (3.10). Decoder rescaling is performed by:

W'_D(i,j) = W_QD(i,j) . V(0,0) . 2^(floor(QP/6) - 2)    (QP ≥ 12)    (3.12)

W'_D(i,j) = [ W_QD(i,j) . V(0,0) + 2^(1 - floor(QP/6)) ] >> (2 - floor(QP/6))    (QP < 12)    (3.13)

V(0,0) is the scaling factor V for position (0, 0). Because V(0,0) is constant throughout the block, rescaling and inverse transformation can be applied in any order. The specified order (inverse transform first, then scaling) is designed to maximise the dynamic range of the inverse transform. The rescaled DC coefficients W'_D are inserted into their respective 4 x 4 blocks and each 4 x 4 block of coefficients is inverse transformed using the core DCT-based inverse transform (C_i^T W' C_i). In a 16 x 16 intra-coded macroblock, much of the energy is concentrated in the DC coefficients of each 4 x 4 block, which tend to be highly correlated. After this extra transform, the energy is concentrated further into a small number of significant coefficients.

2 x 2 Chroma DC Coefficient Transform and Quantisation

Each 4 x 4 block in the chroma components is transformed in the same way as the luma components. The DC coefficients of each 4 x 4 block of chroma coefficients are grouped in a 2 x 2 block (W_D) and are further transformed prior to quantisation:

W_QD = [ 1 1; 1 -1 ] W_D [ 1 1; 1 -1 ]    (3.14)

Quantisation of the 2 x 2 output block W_QD is performed by:

|Z_D(i,j)| = (|W_QD(i,j)| . MF(0,0) + 2f) >> (qbits + 1),    sign(Z_D(i,j)) = sign(W_QD(i,j))    (3.15)

During decoding, the inverse transform is applied before scaling:

W_QD = [ 1 1; 1 -1 ] Z_D [ 1 1; 1 -1 ]    (3.16)

Figure 3.9: Zig-zag scan for 4 x 4 luma block.

Scaling is performed by:

W'_D(i,j) = W_QD(i,j) . V(0,0) . 2^(floor(QP/6) - 1)    (QP ≥ 6)

W'_D(i,j) = [ W_QD(i,j) . V(0,0) ] >> 1    (QP < 6)    (3.17)

The rescaled coefficients are replaced in their respective 4 x 4 blocks of chroma coefficients, which are then inverse transformed. As with the intra luma DC coefficients, the extra transform helps to de-correlate the 2 x 2 chroma DC coefficients and improves compression performance.

Reordering

In the encoder, each 4 x 4 block of quantised transform coefficients is mapped to a 16-element array in a zig-zag order (Figure 3.9). In a macroblock encoded in 16 x 16 Intra mode, the DC coefficients of each 4 x 4 luma block are scanned first; these DC coefficients form a 4 x 4 array that is itself scanned in the order of Figure 3.9. This leaves 15 AC coefficients in each luma block, which are scanned starting from the second position in the figure. Similarly, the 2 x 2 DC coefficients of each chroma component are first scanned in raster order, and then the 15 AC coefficients in each chroma 4 x 4 block are scanned starting from the second position.

Entropy Coding

Above the slice layer, syntax elements are encoded as fixed- or variable-length binary codes. At the slice layer and below, elements are coded using either variable-length codes (VLCs) or context-adaptive binary arithmetic coding (CABAC), depending on the entropy encoding mode. When the entropy coding mode is set to 0, residual block data is coded using a context-adaptive

variable length coding (CAVLC) scheme, and other variable-length coded units are coded using Exp-Golomb codes. When the entropy coding mode is set to 1, an arithmetic coding scheme is used to encode and decode H.264 syntax elements. Parameters that need to be encoded and transmitted include:

Sequence-, picture- and slice-layer syntax elements.
Macroblock type (mb_type): the prediction method for each coded macroblock.
Coded block pattern: indicates which blocks within a macroblock contain coded coefficients.
Quantiser parameter: transmitted as a delta value from the previous value of QP.
Reference frame index: identifies the reference frame(s) for inter prediction.
Motion vector: transmitted as a difference (mvd) from the predicted motion vector.
Residual data: coefficient data for each 4 x 4 or 2 x 2 block.

Context-Based Adaptive Variable Length Coding (CAVLC)

This is the method used to encode residual, zig-zag ordered 4 x 4 (and 2 x 2) blocks of transform coefficients. CAVLC is designed to take advantage of several characteristics of quantised blocks:

After prediction, transformation and quantisation, blocks are typically sparse (containing mostly zeros). CAVLC uses run-level coding to represent strings of zeros compactly.

The highest nonzero coefficients after the zig-zag scan are often sequences of ±1, and CAVLC signals the number of high-frequency ±1 coefficients ("Trailing Ones") in a compact way.

The number of nonzero coefficients in neighboring blocks is correlated. The number of coefficients is encoded using a look-up table, and the choice of look-up table depends on the number of nonzero coefficients in neighboring blocks.

The level (magnitude) of nonzero coefficients tends to be larger at the start of the reordered array and smaller towards the higher frequencies. CAVLC takes advantage of this by adapting the choice of VLC look-up table for the level parameter depending on recently coded level magnitudes.
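The first two of these characteristics can be made concrete with a small sketch that gathers, from a zig-zag-ordered array, the two quantities CAVLC signals first. The function name is illustrative, and this is not the bitstream encoding itself:

```python
def cavlc_statistics(coeffs):
    """Count the values CAVLC signals first for a zig-zag-ordered block.

    Returns (total_coeffs, trailing_ones), where trailing_ones is the
    number of consecutive +/-1 values at the high-frequency end of the
    array, capped at 3 as in the standard. Illustration only, not the
    bitstream syntax.
    """
    total_coeffs = sum(1 for c in coeffs if c != 0)
    trailing_ones = 0
    for c in reversed(coeffs):
        if c == 0:
            continue                     # zeros are handled by run-level coding
        if abs(c) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break                        # first larger level ends the T1 run
    return total_coeffs, trailing_ones

cavlc_statistics([7, 6, -2, 0, -1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0])
# -> (6, 3): six nonzero coefficients, three trailing +/-1s
```

Because typical quantised blocks are sparse with small trailing levels, this pair of values is cheap to code and selects the look-up tables used for the rest of the block.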
CAVLC encoding of a block of transform coefficients proceeds as follows:

1. Encode the number of coefficients and Trailing Ones.

2. Encode the sign of each Trailing One.
3. Encode the levels of the remaining nonzero coefficients.
4. Encode the total number of zeros before the last coefficient.
5. Encode each run of zeros.

Context-Based Adaptive Binary Arithmetic Coding (CABAC)

CABAC achieves good compression performance through (a) selecting probability models for each syntax element according to the element's context, (b) adapting probability estimates based on local statistics and (c) using arithmetic coding rather than variable-length coding. Coding a data symbol involves the following steps:

1. Binarisation: CABAC uses binary arithmetic coding, which means that only binary decisions (1 or 0) are encoded. A non-binary-valued symbol (e.g. a transform coefficient or a motion vector) is binarised, or converted into a binary code, prior to arithmetic coding. Steps 2, 3 and 4 are repeated for each bit of the binarised symbol.

2. Context model selection: A context model is a probability model for one or more bins of the binarised symbol and is chosen from a selection of available models depending on the statistics of recently coded data symbols. The context model stores the probability of each bin being 1 or 0.

3. Arithmetic encoding: An arithmetic encoder encodes each bin according to the selected probability model. There are just two sub-ranges for each bin (corresponding to 1 and 0).

4. Probability update: The selected context model is updated based on the actual coded value (e.g. if the bin value was 1, the frequency count of 1s is increased).

This concludes the discussion of the inner workings of H.264. The standard has been examined in some detail, but a great deal of information has necessarily been left out, and interested readers are advised to consult other texts.
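The Exp-Golomb codes mentioned earlier (used for many variable-length coded syntax elements when CAVLC mode is active) are simple to construct: the codeword for an unsigned value v is a prefix of zeros followed by the binary representation of v + 1. A minimal sketch:

```python
def exp_golomb_encode(v):
    """Unsigned Exp-Golomb codeword for v >= 0, as a bit string.

    The codeword is floor(log2(v + 1)) zero bits followed by the binary
    representation of v + 1, so small (frequent) values get short codes.
    """
    bits = bin(v + 1)[2:]                    # binary representation of v + 1
    return "0" * (len(bits) - 1) + bits

def exp_golomb_decode(code):
    """Inverse mapping: count leading zeros, then read the value bits."""
    zeros = len(code) - len(code.lstrip("0"))
    return int(code[zeros:], 2) - 1

[exp_golomb_encode(v) for v in range(5)]
# -> ['1', '010', '011', '00100', '00101']
```

The leading zeros tell the decoder how many further bits belong to the codeword, so no code is a prefix of another and the bitstream can be parsed without explicit length fields.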

3.2 Artifacts of Compression

The video compression algorithms used in various standards, such as H.261, H.263, H.264, MPEG-1, MPEG-2 and MPEG-4, employ the same basic techniques, i.e. motion compensation, temporal DPCM, the Discrete Cosine Transform (DCT) and quantization. In most cases the compression scheme employed is not lossless, and data is lost depending on the level of compression. This loss of data can give rise to various types of artifacts. Classification, detection and evaluation of these artifacts becomes important if the performances of various algorithms are to be compared with one another. The study of artifacts is also important because it can help in their reduction and minimization.

Due to the complexity of the Human Visual System (HVS), the artifacts observed are not directly proportional to the level of quantization. The characteristics of the video, such as local and global spatial and temporal characteristics, also have their effect on the HVS. For these reasons no particular quantization level can be pinpointed at which a particular artifact is induced, and it is therefore impossible to indicate a specific bitrate at which any one artifact appears. For demonstration purposes a video clip was encoded at various bitrates between 50 Kbps and 100 Kbps and the resulting artifacts were observed. Yuen and Wu (1998) describe the following types of artifacts which may appear in a compressed video stream.

Blocking Effect

The blocking effect can be defined as the discontinuities found at the boundaries of adjacent blocks. It is due to the independent quantisation of the individual blocks (normally 8 x 8 pixels in size) in block-based DCT schemes. Because the blocks are coded independently of each other, the level and characteristics of the coding error introduced into a block may differ between adjacent blocks.
This eventually appears as discontinuities at the boundaries of adjacent blocks: the blocking effect. Examples of the blocking effect can be seen in Fig. 3.10.

Intra Coded Blocks

The intensity of the blocking effect depends on the coarseness of the quantization of the DCT coefficients of neighboring blocks. The threshold at which the blocking effect becomes visible depends on the contents of the block as well as on the properties of the human visual system. The blocking effect is more visible in smooth areas of a frame. However, if spatially active areas are very coarsely quantized, then these areas also exhibit blockiness. Normally bright, dark and spatially active areas do not exhibit this artifact.
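Since this thesis is concerned with quality assessment, it is worth noting that this artifact lends itself to simple measurement. The following sketch is one naive indicator (an illustration with hypothetical names, not a proposed metric): it compares the mean absolute luminance step across vertical 8 x 8 block boundaries with the mean step just inside the blocks; a ratio well above 1 suggests visible blockiness.

```python
def blockiness_ratio(frame, block=8):
    """Naive blocking indicator for a 2-D list of luma samples.

    Compares the summed absolute horizontal step across vertical block
    boundaries with the summed step one column to the left of each
    boundary. Values well above 1.0 suggest boundary discontinuities.
    Illustration only, not a validated quality metric.
    """
    boundary_steps, interior_steps = 0, 0
    for row in frame:
        for x in range(block, len(row), block):
            boundary_steps += abs(row[x] - row[x - 1])      # across boundary
            interior_steps += abs(row[x - 1] - row[x - 2])  # inside block
    return boundary_steps / max(interior_steps, 1)

# A frame built from flat 8 x 8 blocks of different brightness has large
# boundary steps and zero interior steps, giving a very high ratio.
flat_blocky = [[(x // 8) * 50 for x in range(16)] for y in range(16)]
blockiness_ratio(flat_blocky)
```

On a smooth gradient the two sums are comparable and the ratio stays near 1, which is the desired behaviour for an artifact detector.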

Figure 3.10: Block effect.

Predictive Coded Blocks

In predictive coded blocks, the blocking effect can appear both at the macroblock level and at the block level. The internal blocking effect does not appear for macroblocks which have smoothly textured content and for which the motion compensation is good. The combination of an accurate motion prediction and coarse quantization results in the internal blocking effect. At the macroblock level, blocking manifests when the blocks in a macroblock have different motion vectors. The main reason for this phenomenon is poor motion compensation. In earlier standards there was a single motion vector for the whole macroblock, but in later standards, e.g. H.264, motion vectors are possible for units smaller than macroblocks, hence this intra-macroblock blocking effect is reduced significantly.

DCT Basis Image Effect

The DCT basis images have a regular horizontal or vertical pattern. This regular pattern and their fixed size make them visually prominent. An example of the 8 x 8 DCT basis images is shown in Fig. ().
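The basis images themselves are easy to generate, which makes the pattern described above easy to visualise. A small sketch that computes the 8 x 8 DCT basis image for a frequency pair (u, v):

```python
import math

def dct_basis_image(u, v, n=8):
    """n x n DCT basis image for horizontal frequency u and vertical v.

    Entry (y, x) is c(u) c(v) cos((2x+1) u pi / 2n) cos((2y+1) v pi / 2n):
    the pattern a block reconstructed from the single coefficient (u, v)
    would display.
    """
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [[c(u) * c(v)
             * math.cos((2 * x + 1) * u * math.pi / (2 * n))
             * math.cos((2 * y + 1) * v * math.pi / (2 * n))
             for x in range(n)] for y in range(n)]

# The (0, 0) basis image is a flat block; higher (u, v) give the regular
# stripe patterns that become visible when a single AC coefficient
# dominates a coarsely quantised block.
dc = dct_basis_image(0, 0)   # every entry equals 1/8
```

Each basis image has unit energy, so when quantisation leaves one AC coefficient dominant, the reconstructed block is essentially that coefficient times one of these striped patterns.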

Figure 3.11: Blurring effect.

The effect is caused by the quantization of the AC coefficients in areas of high spatial detail. Because of quantisation, situations may arise where only a single AC coefficient is prominent in the representation of a block. This results in an emphasis of the pattern contributed by the prominent basis image. The quantisation matrices show that each basis image has a different quantiser value to deal with, because the HVS does not perceive all the basis images with the same importance. Therefore, the visual effect caused by the basis image effect is proportional to the amount of AC energy concentrated in any coefficient, as well as the significance of the corresponding basis image.

Blurring

Blurring can arise for several reasons at different stages of the encoding process. It manifests as a loss of spatial detail and a reduction in the sharpness of edges in moderate to high spatial activity regions of a picture. Blurring can be a direct result of the suppression of the higher-order AC coefficients by the quantisation process, which acts as a low-pass filtering procedure, reducing edge and texture detail. In H.264 blurring can also be caused if the deblocking filter is very strong. The deblocking filter is an in-loop filter that reduces the blocking effect; it smooths block edges, improving the appearance of decoded frames. Very strong deblocking can also result

Figure 3.12: Original Image.

in blurring of genuine edges as well, and hence in loss of spatial detail. The blurring effect can be seen in action in Fig. 3.11 in the texture of the ground and the road. For comparison, another sample encoded at a very high bitrate is shown in Fig. 3.12; the texture of the ground and road is quite visible in this figure, whereas it is almost smooth in Fig. 3.11.

Color Bleeding

The blurring of luminance information results in the loss of spatial information and smoothing in general. The same phenomenon for chrominance, however, results in smearing of color between areas of strongly contrasting chrominance. Color bleeding is a result of quantization to zero of the higher-order AC coefficients. Unlike luminance, chrominance information is subsampled, so the effects of low-pass filtering in this case are not limited to the 8 x 8 block of pixels but extend to the boundary of the macroblock. Normally strong chrominance edges also imply the presence of strong luminance edges, but the converse does not generally hold, so blurring may not always be accompanied by color bleeding. It is important to note that color bleeding is a combined effect of the subsampling of chrominance and the compression process. Color bleeding can be seen in action in Fig. 3.13, around the parked trailer, where the white color of the trailer can also be seen around the edges of the trailer.


More information

An Overview of Video Coding Algorithms

An Overview of Video Coding Algorithms An Overview of Video Coding Algorithms Prof. Ja-Ling Wu Department of Computer Science and Information Engineering National Taiwan University Video coding can be viewed as image compression with a temporal

More information

Advanced Computer Networks

Advanced Computer Networks Advanced Computer Networks Video Basics Jianping Pan Spring 2017 3/10/17 csc466/579 1 Video is a sequence of images Recorded/displayed at a certain rate Types of video signals component video separate

More information

Audio and Video II. Video signal +Color systems Motion estimation Video compression standards +H.261 +MPEG-1, MPEG-2, MPEG-4, MPEG- 7, and MPEG-21

Audio and Video II. Video signal +Color systems Motion estimation Video compression standards +H.261 +MPEG-1, MPEG-2, MPEG-4, MPEG- 7, and MPEG-21 Audio and Video II Video signal +Color systems Motion estimation Video compression standards +H.261 +MPEG-1, MPEG-2, MPEG-4, MPEG- 7, and MPEG-21 1 Video signal Video camera scans the image by following

More information

MPEG-2. ISO/IEC (or ITU-T H.262)

MPEG-2. ISO/IEC (or ITU-T H.262) 1 ISO/IEC 13818-2 (or ITU-T H.262) High quality encoding of interlaced video at 4-15 Mbps for digital video broadcast TV and digital storage media Applications Broadcast TV, Satellite TV, CATV, HDTV, video

More information

Video compression principles. Color Space Conversion. Sub-sampling of Chrominance Information. Video: moving pictures and the terms frame and

Video compression principles. Color Space Conversion. Sub-sampling of Chrominance Information. Video: moving pictures and the terms frame and Video compression principles Video: moving pictures and the terms frame and picture. one approach to compressing a video source is to apply the JPEG algorithm to each frame independently. This approach

More information

PERCEPTUAL QUALITY OF H.264/AVC DEBLOCKING FILTER

PERCEPTUAL QUALITY OF H.264/AVC DEBLOCKING FILTER PERCEPTUAL QUALITY OF H./AVC DEBLOCKING FILTER Y. Zhong, I. Richardson, A. Miller and Y. Zhao School of Enginnering, The Robert Gordon University, Schoolhill, Aberdeen, AB1 1FR, UK Phone: + 1, Fax: + 1,

More information

Multimedia Communications. Video compression

Multimedia Communications. Video compression Multimedia Communications Video compression Video compression Of all the different sources of data, video produces the largest amount of data There are some differences in our perception with regard to

More information

06 Video. Multimedia Systems. Video Standards, Compression, Post Production

06 Video. Multimedia Systems. Video Standards, Compression, Post Production Multimedia Systems 06 Video Video Standards, Compression, Post Production Imran Ihsan Assistant Professor, Department of Computer Science Air University, Islamabad, Pakistan www.imranihsan.com Lectures

More information

International Journal for Research in Applied Science & Engineering Technology (IJRASET) Motion Compensation Techniques Adopted In HEVC

International Journal for Research in Applied Science & Engineering Technology (IJRASET) Motion Compensation Techniques Adopted In HEVC Motion Compensation Techniques Adopted In HEVC S.Mahesh 1, K.Balavani 2 M.Tech student in Bapatla Engineering College, Bapatla, Andahra Pradesh Assistant professor in Bapatla Engineering College, Bapatla,

More information

Overview of the H.264/AVC Video Coding Standard

Overview of the H.264/AVC Video Coding Standard 560 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 13, NO. 7, JULY 2003 Overview of the H.264/AVC Video Coding Standard Thomas Wiegand, Gary J. Sullivan, Senior Member, IEEE, Gisle

More information

4 H.264 Compression: Understanding Profiles and Levels

4 H.264 Compression: Understanding Profiles and Levels MISB TRM 1404 TECHNICAL REFERENCE MATERIAL H.264 Compression Principles 23 October 2014 1 Scope This TRM outlines the core principles in applying H.264 compression. Adherence to a common framework and

More information

OL_H264MCLD Multi-Channel HDTV H.264/AVC Limited Baseline Video Decoder V1.0. General Description. Applications. Features

OL_H264MCLD Multi-Channel HDTV H.264/AVC Limited Baseline Video Decoder V1.0. General Description. Applications. Features OL_H264MCLD Multi-Channel HDTV H.264/AVC Limited Baseline Video Decoder V1.0 General Description Applications Features The OL_H264MCLD core is a hardware implementation of the H.264 baseline video compression

More information

Multimedia Communications. Image and Video compression

Multimedia Communications. Image and Video compression Multimedia Communications Image and Video compression JPEG2000 JPEG2000: is based on wavelet decomposition two types of wavelet filters one similar to what discussed in Chapter 14 and the other one generates

More information

Research Topic. Error Concealment Techniques in H.264/AVC for Wireless Video Transmission in Mobile Networks

Research Topic. Error Concealment Techniques in H.264/AVC for Wireless Video Transmission in Mobile Networks Research Topic Error Concealment Techniques in H.264/AVC for Wireless Video Transmission in Mobile Networks July 22 nd 2008 Vineeth Shetty Kolkeri EE Graduate,UTA 1 Outline 2. Introduction 3. Error control

More information

OVE EDFORS ELECTRICAL AND INFORMATION TECHNOLOGY

OVE EDFORS ELECTRICAL AND INFORMATION TECHNOLOGY Information Transmission Chapter 3, image and video OVE EDFORS ELECTRICAL AND INFORMATION TECHNOLOGY Learning outcomes Understanding raster image formats and what determines quality, video formats and

More information

Colour Reproduction Performance of JPEG and JPEG2000 Codecs

Colour Reproduction Performance of JPEG and JPEG2000 Codecs Colour Reproduction Performance of JPEG and JPEG000 Codecs A. Punchihewa, D. G. Bailey, and R. M. Hodgson Institute of Information Sciences & Technology, Massey University, Palmerston North, New Zealand

More information

Chapter 3 Fundamental Concepts in Video. 3.1 Types of Video Signals 3.2 Analog Video 3.3 Digital Video

Chapter 3 Fundamental Concepts in Video. 3.1 Types of Video Signals 3.2 Analog Video 3.3 Digital Video Chapter 3 Fundamental Concepts in Video 3.1 Types of Video Signals 3.2 Analog Video 3.3 Digital Video 1 3.1 TYPES OF VIDEO SIGNALS 2 Types of Video Signals Video standards for managing analog output: A.

More information

ELEC 691X/498X Broadcast Signal Transmission Fall 2015

ELEC 691X/498X Broadcast Signal Transmission Fall 2015 ELEC 691X/498X Broadcast Signal Transmission Fall 2015 Instructor: Dr. Reza Soleymani, Office: EV 5.125, Telephone: 848 2424 ext.: 4103. Office Hours: Wednesday, Thursday, 14:00 15:00 Time: Tuesday, 2:45

More information

Information Transmission Chapter 3, image and video

Information Transmission Chapter 3, image and video Information Transmission Chapter 3, image and video FREDRIK TUFVESSON ELECTRICAL AND INFORMATION TECHNOLOGY Images An image is a two-dimensional array of light values. Make it 1D by scanning Smallest element

More information

HEVC: Future Video Encoding Landscape

HEVC: Future Video Encoding Landscape HEVC: Future Video Encoding Landscape By Dr. Paul Haskell, Vice President R&D at Harmonic nc. 1 ABSTRACT This paper looks at the HEVC video coding standard: possible applications, video compression performance

More information

Visual Communication at Limited Colour Display Capability

Visual Communication at Limited Colour Display Capability Visual Communication at Limited Colour Display Capability Yan Lu, Wen Gao and Feng Wu Abstract: A novel scheme for visual communication by means of mobile devices with limited colour display capability

More information

ATI Theater 650 Pro: Bringing TV to the PC. Perfecting Analog and Digital TV Worldwide

ATI Theater 650 Pro: Bringing TV to the PC. Perfecting Analog and Digital TV Worldwide ATI Theater 650 Pro: Bringing TV to the PC Perfecting Analog and Digital TV Worldwide Introduction: A Media PC Revolution After years of build-up, the media PC revolution has begun. Driven by such trends

More information

Part1 박찬솔. Audio overview Video overview Video encoding 2/47

Part1 박찬솔. Audio overview Video overview Video encoding 2/47 MPEG2 Part1 박찬솔 Contents Audio overview Video overview Video encoding Video bitstream 2/47 Audio overview MPEG 2 supports up to five full-bandwidth channels compatible with MPEG 1 audio coding. extends

More information

Implementation of an MPEG Codec on the Tilera TM 64 Processor

Implementation of an MPEG Codec on the Tilera TM 64 Processor 1 Implementation of an MPEG Codec on the Tilera TM 64 Processor Whitney Flohr Supervisor: Mark Franklin, Ed Richter Department of Electrical and Systems Engineering Washington University in St. Louis Fall

More information

Film Grain Technology

Film Grain Technology Film Grain Technology Hollywood Post Alliance February 2006 Jeff Cooper jeff.cooper@thomson.net What is Film Grain? Film grain results from the physical granularity of the photographic emulsion Film grain

More information

complex than coding of interlaced data. This is a significant component of the reduced complexity of AVS coding.

complex than coding of interlaced data. This is a significant component of the reduced complexity of AVS coding. AVS - The Chinese Next-Generation Video Coding Standard Wen Gao*, Cliff Reader, Feng Wu, Yun He, Lu Yu, Hanqing Lu, Shiqiang Yang, Tiejun Huang*, Xingde Pan *Joint Development Lab., Institute of Computing

More information

Ch. 1: Audio/Image/Video Fundamentals Multimedia Systems. School of Electrical Engineering and Computer Science Oregon State University

Ch. 1: Audio/Image/Video Fundamentals Multimedia Systems. School of Electrical Engineering and Computer Science Oregon State University Ch. 1: Audio/Image/Video Fundamentals Multimedia Systems Prof. Ben Lee School of Electrical Engineering and Computer Science Oregon State University Outline Computer Representation of Audio Quantization

More information

Modeling and Evaluating Feedback-Based Error Control for Video Transfer

Modeling and Evaluating Feedback-Based Error Control for Video Transfer Modeling and Evaluating Feedback-Based Error Control for Video Transfer by Yubing Wang A Dissertation Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE In partial fulfillment of the Requirements

More information

The H.263+ Video Coding Standard: Complexity and Performance

The H.263+ Video Coding Standard: Complexity and Performance The H.263+ Video Coding Standard: Complexity and Performance Berna Erol (bernae@ee.ubc.ca), Michael Gallant (mikeg@ee.ubc.ca), Guy C t (guyc@ee.ubc.ca), and Faouzi Kossentini (faouzi@ee.ubc.ca) Department

More information

SUMMIT LAW GROUP PLLC 315 FIFTH AVENUE SOUTH, SUITE 1000 SEATTLE, WASHINGTON Telephone: (206) Fax: (206)

SUMMIT LAW GROUP PLLC 315 FIFTH AVENUE SOUTH, SUITE 1000 SEATTLE, WASHINGTON Telephone: (206) Fax: (206) Case 2:10-cv-01823-JLR Document 154 Filed 01/06/12 Page 1 of 153 1 The Honorable James L. Robart 2 3 4 5 6 7 UNITED STATES DISTRICT COURT FOR THE WESTERN DISTRICT OF WASHINGTON AT SEATTLE 8 9 10 11 12

More information

PAL uncompressed. 768x576 pixels per frame. 31 MB per second 1.85 GB per minute. x 3 bytes per pixel (24 bit colour) x 25 frames per second

PAL uncompressed. 768x576 pixels per frame. 31 MB per second 1.85 GB per minute. x 3 bytes per pixel (24 bit colour) x 25 frames per second 191 192 PAL uncompressed 768x576 pixels per frame x 3 bytes per pixel (24 bit colour) x 25 frames per second 31 MB per second 1.85 GB per minute 191 192 NTSC uncompressed 640x480 pixels per frame x 3 bytes

More information

Selective Intra Prediction Mode Decision for H.264/AVC Encoders

Selective Intra Prediction Mode Decision for H.264/AVC Encoders Selective Intra Prediction Mode Decision for H.264/AVC Encoders Jun Sung Park, and Hyo Jung Song Abstract H.264/AVC offers a considerably higher improvement in coding efficiency compared to other compression

More information

Video System Characteristics of AVC in the ATSC Digital Television System

Video System Characteristics of AVC in the ATSC Digital Television System A/72 Part 1:2014 Video and Transport Subsystem Characteristics of MVC for 3D-TVError! Reference source not found. ATSC Standard A/72 Part 1 Video System Characteristics of AVC in the ATSC Digital Television

More information

Video Over Mobile Networks

Video Over Mobile Networks Video Over Mobile Networks Professor Mohammed Ghanbari Department of Electronic systems Engineering University of Essex United Kingdom June 2005, Zadar, Croatia (Slides prepared by M. Mahdi Ghandi) INTRODUCTION

More information

Contents. xv xxi xxiii xxiv. 1 Introduction 1 References 4

Contents. xv xxi xxiii xxiv. 1 Introduction 1 References 4 Contents List of figures List of tables Preface Acknowledgements xv xxi xxiii xxiv 1 Introduction 1 References 4 2 Digital video 5 2.1 Introduction 5 2.2 Analogue television 5 2.3 Interlace 7 2.4 Picture

More information

AUDIOVISUAL COMMUNICATION

AUDIOVISUAL COMMUNICATION AUDIOVISUAL COMMUNICATION Laboratory Session: Recommendation ITU-T H.261 Fernando Pereira The objective of this lab session about Recommendation ITU-T H.261 is to get the students familiar with many aspects

More information

Intra-frame JPEG-2000 vs. Inter-frame Compression Comparison: The benefits and trade-offs for very high quality, high resolution sequences

Intra-frame JPEG-2000 vs. Inter-frame Compression Comparison: The benefits and trade-offs for very high quality, high resolution sequences Intra-frame JPEG-2000 vs. Inter-frame Compression Comparison: The benefits and trade-offs for very high quality, high resolution sequences Michael Smith and John Villasenor For the past several decades,

More information

The Development of a Synthetic Colour Test Image for Subjective and Objective Quality Assessment of Digital Codecs

The Development of a Synthetic Colour Test Image for Subjective and Objective Quality Assessment of Digital Codecs 2005 Asia-Pacific Conference on Communications, Perth, Western Australia, 3-5 October 2005. The Development of a Synthetic Colour Test Image for Subjective and Objective Quality Assessment of Digital Codecs

More information

Video Compression. Representations. Multimedia Systems and Applications. Analog Video Representations. Digitizing. Digital Video Block Structure

Video Compression. Representations. Multimedia Systems and Applications. Analog Video Representations. Digitizing. Digital Video Block Structure Representations Multimedia Systems and Applications Video Compression Composite NTSC - 6MHz (4.2MHz video), 29.97 frames/second PAL - 6-8MHz (4.2-6MHz video), 50 frames/second Component Separation video

More information

MULTIMEDIA TECHNOLOGIES

MULTIMEDIA TECHNOLOGIES MULTIMEDIA TECHNOLOGIES LECTURE 08 VIDEO IMRAN IHSAN ASSISTANT PROFESSOR VIDEO Video streams are made up of a series of still images (frames) played one after another at high speed This fools the eye into

More information

High Efficiency Video coding Master Class. Matthew Goldman Senior Vice President TV Compression Technology Ericsson

High Efficiency Video coding Master Class. Matthew Goldman Senior Vice President TV Compression Technology Ericsson High Efficiency Video coding Master Class Matthew Goldman Senior Vice President TV Compression Technology Ericsson Video compression evolution High Efficiency Video Coding (HEVC): A new standardized compression

More information

Novel VLSI Architecture for Quantization and Variable Length Coding for H-264/AVC Video Compression Standard

Novel VLSI Architecture for Quantization and Variable Length Coding for H-264/AVC Video Compression Standard Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 2005 Novel VLSI Architecture for Quantization and Variable Length Coding for H-264/AVC Video Compression Standard

More information

OL_H264e HDTV H.264/AVC Baseline Video Encoder Rev 1.0. General Description. Applications. Features

OL_H264e HDTV H.264/AVC Baseline Video Encoder Rev 1.0. General Description. Applications. Features OL_H264e HDTV H.264/AVC Baseline Video Encoder Rev 1.0 General Description Applications Features The OL_H264e core is a hardware implementation of the H.264 baseline video compression algorithm. The core

More information

Principles of Video Compression

Principles of Video Compression Principles of Video Compression Topics today Introduction Temporal Redundancy Reduction Coding for Video Conferencing (H.261, H.263) (CSIT 410) 2 Introduction Reduce video bit rates while maintaining an

More information

Joint Optimization of Source-Channel Video Coding Using the H.264/AVC encoder and FEC Codes. Digital Signal and Image Processing Lab

Joint Optimization of Source-Channel Video Coding Using the H.264/AVC encoder and FEC Codes. Digital Signal and Image Processing Lab Joint Optimization of Source-Channel Video Coding Using the H.264/AVC encoder and FEC Codes Digital Signal and Image Processing Lab Simone Milani Ph.D. student simone.milani@dei.unipd.it, Summer School

More information

Video Coding IPR Issues

Video Coding IPR Issues Video Coding IPR Issues Developing China s standard for HDTV and HD-DVD Cliff Reader, Ph.D. www.reader.com Agenda Which technology is patented? What is the value of the patents? Licensing status today.

More information

A Novel Approach towards Video Compression for Mobile Internet using Transform Domain Technique

A Novel Approach towards Video Compression for Mobile Internet using Transform Domain Technique A Novel Approach towards Video Compression for Mobile Internet using Transform Domain Technique Dhaval R. Bhojani Research Scholar, Shri JJT University, Jhunjunu, Rajasthan, India Ved Vyas Dwivedi, PhD.

More information

Lecture 2 Video Formation and Representation

Lecture 2 Video Formation and Representation 2013 Spring Term 1 Lecture 2 Video Formation and Representation Wen-Hsiao Peng ( 彭文孝 ) Multimedia Architecture and Processing Lab (MAPL) Department of Computer Science National Chiao Tung University 1

More information

Distributed Multimedia Systems. 2.Coding. László Böszörményi Distributed Multimedia Systems Coding - 1

Distributed Multimedia Systems. 2.Coding. László Böszörményi Distributed Multimedia Systems Coding - 1 Distributed Multimedia Systems 2.Coding László Böszörményi Distributed Multimedia Systems Coding - 1 Audio Encoding - Basics Audio (sound) wave One-dimensional acoustic (pressure) wave Causes vibration

More information

H.264/AVC. The emerging. standard. Ralf Schäfer, Thomas Wiegand and Heiko Schwarz Heinrich Hertz Institute, Berlin, Germany

H.264/AVC. The emerging. standard. Ralf Schäfer, Thomas Wiegand and Heiko Schwarz Heinrich Hertz Institute, Berlin, Germany H.264/AVC The emerging standard Ralf Schäfer, Thomas Wiegand and Heiko Schwarz Heinrich Hertz Institute, Berlin, Germany H.264/AVC is the current video standardization project of the ITU-T Video Coding

More information

Understanding Compression Technologies for HD and Megapixel Surveillance

Understanding Compression Technologies for HD and Megapixel Surveillance When the security industry began the transition from using VHS tapes to hard disks for video surveillance storage, the question of how to compress and store video became a top consideration for video surveillance

More information

MPEG + Compression of Moving Pictures for Digital Cinema Using the MPEG-2 Toolkit. A Digital Cinema Accelerator

MPEG + Compression of Moving Pictures for Digital Cinema Using the MPEG-2 Toolkit. A Digital Cinema Accelerator 142nd SMPTE Technical Conference, October, 2000 MPEG + Compression of Moving Pictures for Digital Cinema Using the MPEG-2 Toolkit A Digital Cinema Accelerator Michael W. Bruns James T. Whittlesey 0 The

More information

So far. Chapter 4 Color spaces Chapter 3 image representations. Bitmap grayscale. 1/21/09 CSE 40373/60373: Multimedia Systems

So far. Chapter 4 Color spaces Chapter 3 image representations. Bitmap grayscale. 1/21/09 CSE 40373/60373: Multimedia Systems So far. Chapter 4 Color spaces Chapter 3 image representations Bitmap grayscale page 1 8-bit color image Can show up to 256 colors Use color lookup table to map 256 of the 24-bit color (rather than choosing

More information

P1: OTA/XYZ P2: ABC c01 JWBK457-Richardson March 22, :45 Printer Name: Yet to Come

P1: OTA/XYZ P2: ABC c01 JWBK457-Richardson March 22, :45 Printer Name: Yet to Come 1 Introduction 1.1 A change of scene 2000: Most viewers receive analogue television via terrestrial, cable or satellite transmission. VHS video tapes are the principal medium for recording and playing

More information

ITU-T Video Coding Standards

ITU-T Video Coding Standards An Overview of H.263 and H.263+ Thanks that Some slides come from Sharp Labs of America, Dr. Shawmin Lei January 1999 1 ITU-T Video Coding Standards H.261: for ISDN H.263: for PSTN (very low bit rate video)

More information

H.261: A Standard for VideoConferencing Applications. Nimrod Peleg Update: Nov. 2003

H.261: A Standard for VideoConferencing Applications. Nimrod Peleg Update: Nov. 2003 H.261: A Standard for VideoConferencing Applications Nimrod Peleg Update: Nov. 2003 ITU - Rec. H.261 Target (1990)... A Video compression standard developed to facilitate videoconferencing (and videophone)

More information

Fast MBAFF/PAFF Motion Estimation and Mode Decision Scheme for H.264

Fast MBAFF/PAFF Motion Estimation and Mode Decision Scheme for H.264 Fast MBAFF/PAFF Motion Estimation and Mode Decision Scheme for H.264 Ju-Heon Seo, Sang-Mi Kim, Jong-Ki Han, Nonmember Abstract-- In the H.264, MBAFF (Macroblock adaptive frame/field) and PAFF (Picture

More information

Into the Depths: The Technical Details Behind AV1. Nathan Egge Mile High Video Workshop 2018 July 31, 2018

Into the Depths: The Technical Details Behind AV1. Nathan Egge Mile High Video Workshop 2018 July 31, 2018 Into the Depths: The Technical Details Behind AV1 Nathan Egge Mile High Video Workshop 2018 July 31, 2018 North America Internet Traffic 82% of Internet traffic by 2021 Cisco Study

More information

Lecture 23: Digital Video. The Digital World of Multimedia Guest lecture: Jayson Bowen

Lecture 23: Digital Video. The Digital World of Multimedia Guest lecture: Jayson Bowen Lecture 23: Digital Video The Digital World of Multimedia Guest lecture: Jayson Bowen Plan for Today Digital video Video compression HD, HDTV & Streaming Video Audio + Images Video Audio: time sampling

More information

Rounding Considerations SDTV-HDTV YCbCr Transforms 4:4:4 to 4:2:2 YCbCr Conversion

Rounding Considerations SDTV-HDTV YCbCr Transforms 4:4:4 to 4:2:2 YCbCr Conversion Digital it Video Processing 김태용 Contents Rounding Considerations SDTV-HDTV YCbCr Transforms 4:4:4 to 4:2:2 YCbCr Conversion Display Enhancement Video Mixing and Graphics Overlay Luma and Chroma Keying

More information

Frame Compatible Formats for 3D Video Distribution

Frame Compatible Formats for 3D Video Distribution MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Frame Compatible Formats for 3D Video Distribution Anthony Vetro TR2010-099 November 2010 Abstract Stereoscopic video will soon be delivered

More information

Video 1 Video October 16, 2001

Video 1 Video October 16, 2001 Video Video October 6, Video Event-based programs read() is blocking server only works with single socket audio, network input need I/O multiplexing event-based programming also need to handle time-outs,

More information

H.264/AVC Baseline Profile Decoder Complexity Analysis

H.264/AVC Baseline Profile Decoder Complexity Analysis 704 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 13, NO. 7, JULY 2003 H.264/AVC Baseline Profile Decoder Complexity Analysis Michael Horowitz, Anthony Joch, Faouzi Kossentini, Senior

More information

Implementation of MPEG-2 Trick Modes

Implementation of MPEG-2 Trick Modes Implementation of MPEG-2 Trick Modes Matthew Leditschke and Andrew Johnson Multimedia Services Section Telstra Research Laboratories ABSTRACT: If video on demand services delivered over a broadband network

More information

h t t p : / / w w w. v i d e o e s s e n t i a l s. c o m E - M a i l : j o e k a n a t t. n e t DVE D-Theater Q & A

h t t p : / / w w w. v i d e o e s s e n t i a l s. c o m E - M a i l : j o e k a n a t t. n e t DVE D-Theater Q & A J O E K A N E P R O D U C T I O N S W e b : h t t p : / / w w w. v i d e o e s s e n t i a l s. c o m E - M a i l : j o e k a n e @ a t t. n e t DVE D-Theater Q & A 15 June 2003 Will the D-Theater tapes

More information

UC San Diego UC San Diego Previously Published Works

UC San Diego UC San Diego Previously Published Works UC San Diego UC San Diego Previously Published Works Title Classification of MPEG-2 Transport Stream Packet Loss Visibility Permalink https://escholarship.org/uc/item/9wk791h Authors Shin, J Cosman, P

More information

Digital Media. Daniel Fuller ITEC 2110

Digital Media. Daniel Fuller ITEC 2110 Digital Media Daniel Fuller ITEC 2110 Daily Question: Video How does interlaced scan display video? Email answer to DFullerDailyQuestion@gmail.com Subject Line: ITEC2110-26 Housekeeping Project 4 is assigned

More information

IMAGE SEGMENTATION APPROACH FOR REALIZING ZOOMABLE STREAMING HEVC VIDEO ZARNA PATEL. Presented to the Faculty of the Graduate School of

IMAGE SEGMENTATION APPROACH FOR REALIZING ZOOMABLE STREAMING HEVC VIDEO ZARNA PATEL. Presented to the Faculty of the Graduate School of IMAGE SEGMENTATION APPROACH FOR REALIZING ZOOMABLE STREAMING HEVC VIDEO by ZARNA PATEL Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment of

More information

ATSC vs NTSC Spectrum. ATSC 8VSB Data Framing

ATSC vs NTSC Spectrum. ATSC 8VSB Data Framing ATSC vs NTSC Spectrum ATSC 8VSB Data Framing 22 ATSC 8VSB Data Segment ATSC 8VSB Data Field 23 ATSC 8VSB (AM) Modulated Baseband ATSC 8VSB Pre-Filtered Spectrum 24 ATSC 8VSB Nyquist Filtered Spectrum ATSC

More information

In MPEG, two-dimensional spatial frequency analysis is performed using the Discrete Cosine Transform

In MPEG, two-dimensional spatial frequency analysis is performed using the Discrete Cosine Transform MPEG Encoding Basics PEG I-frame encoding MPEG long GOP ncoding MPEG basics MPEG I-frame ncoding MPEG long GOP encoding MPEG asics MPEG I-frame encoding MPEG long OP encoding MPEG basics MPEG I-frame MPEG

More information

A review of the implementation of HDTV technology over SDTV technology

A review of the implementation of HDTV technology over SDTV technology A review of the implementation of HDTV technology over SDTV technology Chetan lohani Dronacharya College of Engineering Abstract Standard Definition television (SDTV) Standard-Definition Television is

More information

Video Basics. Video Resolution

Video Basics. Video Resolution Video Basics This article provides an overview about commonly used video formats and explains some of the technologies being used to process, transport and display digital video content. Video Resolution

More information

Video coding using the H.264/MPEG-4 AVC compression standard

Video coding using the H.264/MPEG-4 AVC compression standard Signal Processing: Image Communication 19 (2004) 793 849 Video coding using the H.264/MPEG-4 AVC compression standard Atul Puri a, *, Xuemin Chen b, Ajay Luthra c a RealNetworks, Inc., 2601 Elliott Avenue,

More information

Comparative Study of JPEG2000 and H.264/AVC FRExt I Frame Coding on High-Definition Video Sequences
