Chapter 10 Basic Video Compression Techniques


Chapter 10 Basic Video Compression Techniques
10.1 Introduction to Video Compression
10.2 Video Compression with Motion Compensation
10.3 Video Compression Standard H.261
10.4 Video Compression Standard MPEG-1

The Need for Video Compression

Introduction to Video Compression A video consists of a time-ordered sequence of frames, i.e., images. An obvious way to compress video is to apply an image compression algorithm to each frame independently, for instance compressing each frame as a JPEG image.

Introduction to Video Compression Consecutive frames in a video are similar, so temporal redundancy exists. Significantly higher compression ratios can be achieved by exploiting this temporal redundancy.

Introduction to Video Compression Video compression utilizes two basic techniques: Intraframe compression occurs within individual frames and is designed to minimize the duplication of data within each picture (spatial redundancy). Interframe compression operates between frames and is designed to minimize data redundancy across successive pictures (temporal redundancy).

Introduction to Video Compression Temporal redundancy arises when successive frames of video display images of the same scene. It is common for the content of the scene to remain fixed, or to change only slightly, between successive frames. Spatial redundancy occurs because parts of the picture are often replicated (with minor changes) within a single frame of video.

Temporal Redundancy Temporal redundancy is exploited so that not every frame of the video needs to be coded independently as a new image. It makes more sense to code only the information that changes from frame to frame rather than coding each whole frame. With difference coding, only the first image (the I-frame) is coded in its entirety. The following images (P-frames) refer back to the first picture for the static elements, e.g. the house; only the moving parts, e.g. the running man, are coded, using motion vectors, thus reducing the amount of information that is sent and stored.

Temporal Redundancy Temporal redundancy can be better exploited by predictive coding based on previous frames: predicting the motion of pixels and regions from frame to frame, rather than predicting the frame as a whole. Compression proceeds by subtracting images: the difference between the current frame and other frame(s) in the sequence is coded. The difference has small values and low entropy, which is good for compression.
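The subtraction idea can be sketched in a few lines of Python. The synthetic frames and the flat-histogram entropy measure below are illustrative assumptions, not part of any standard; the point is only that the difference frame has much lower entropy than the original frame:

```python
import numpy as np

def entropy_bits(img):
    """Shannon entropy (bits/pixel) of an 8-bit image's histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

# Synthetic scene: a gradient background with a small square that moves.
prev = np.tile(np.arange(64, dtype=np.int16), (64, 1))
curr = prev.copy()
prev[10:18, 10:18] = 255   # object position in frame n
curr[10:18, 12:20] = 255   # same object, shifted right, in frame n+1

diff = curr - prev                                 # difference frame: mostly zeros
residual = ((diff + 256) % 256).astype(np.uint8)   # wrap into 0..255 for the histogram

print("frame entropy:     ", round(entropy_bits(curr.astype(np.uint8)), 2))
print("difference entropy:", round(entropy_bits(residual), 2))
```

The difference frame is zero everywhere except at the edges of the moved square, so its entropy is a small fraction of the original frame's, which is exactly what makes it cheap to code.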


Pixel motion prediction Prediction can be done even better by searching for just the right parts of the previous frame to subtract from the current image. In practice, the coherence from frame to frame is better exploited by observing that groups of contiguous pixels, rather than individual pixels, move together with the same motion vector. Therefore, it makes more sense to predict frame n+1 in terms of regions or blocks rather than individual pixels.

Pixel motion prediction Suppose the pixel Cn(x, y) in frame n has moved to a new location in frame n+1. Then Cn+1(x, y) in frame n+1 is not the same as Cn(x, y) but is offset by the motion vector (dx, dy), leaving only a small error difference: e(x, y) = Cn+1(x, y) - Cn(x + dx, y + dy).

Video Compression with Motion Compensation Steps of video compression based on Motion Compensation (MC): MC-based prediction. Motion estimation (motion vector search). Derivation of the prediction error, i.e., the difference.

Motion Compensation Motion compensation is an algorithmic technique used to predict a frame in a video, given the previous and/or future frames. It describes a picture in terms of the transformation of a reference picture into the current picture. The reference picture may be earlier in time or even from the future.

Motion Compensation How it works: Motion compensation exploits the fact that, for many frames of a movie, the only difference between one frame and the next is the result of either the camera moving or an object in the frame moving. For a video file, this means that much of the information representing one frame is the same as the information used in the next frame. Using motion compensation, a video stream contains some full (reference) frames; the only information stored for the frames in between is the information needed to transform the previous frame into the next frame.

Motion Compensation Each image is divided into macroblocks of size N x N. By default, N = 16 for luminance images. For chrominance images, N = 8 if 4:2:0 chroma subsampling is adopted.

Motion Compensation Motion compensation is performed at the macroblock level. The current image frame is referred to as the Target frame. A match is sought between each macroblock in the Target frame and the most similar macroblock in previous and/or future frame(s), referred to as Reference frame(s). The displacement from the reference macroblock to the target macroblock is called a motion vector (MV). Figure 10.1 shows the case of forward prediction, in which the Reference frame is a previous frame.

Fig. 10.1: Macroblocks and Motion Vector in Video Compression. The MV search is usually limited to a small immediate neighborhood, with both horizontal and vertical displacements in the range [-p, p]. This makes a search window of size (2p + 1) x (2p + 1).
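A minimal full-search implementation of this window search might look as follows. The Python/NumPy code, the synthetic frames, and the sum-of-absolute-differences (SAD) matching criterion are illustrative choices, not mandated by any standard:

```python
import numpy as np

def motion_search(target, ref, x, y, N=16, p=7):
    """Full search: find the (dx, dy) in [-p, p] x [-p, p] minimizing the SAD
    between the target macroblock at (x, y) and a candidate reference block."""
    block = target[y:y + N, x:x + N].astype(np.int32)
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + N > ref.shape[0] or rx + N > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            cand = ref[ry:ry + N, rx:rx + N].astype(np.int32)
            sad = int(np.abs(block - cand).sum())
            if sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad

# Reference frame with a bright square; the target frame shifts it by (+3, +2).
ref = np.zeros((64, 64), dtype=np.uint8)
ref[20:36, 20:36] = 200
target = np.zeros_like(ref)
target[22:38, 23:39] = 200   # moved +3 in x, +2 in y

mv, err = motion_search(target, ref, x=23, y=22, N=16, p=7)
print(mv, err)   # → (-3, -2) 0
```

The vector points from the target macroblock back to where its content lies in the reference frame, so a perfect match is found at displacement (-3, -2) with zero residual. The (2p + 1) x (2p + 1) = 15 x 15 candidate positions per macroblock show why motion estimation dominates encoder cost.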

Size of Macroblocks Smaller macroblocks increase the number of blocks in the target frame, and hence the number of motion vectors needed to predict it. This requires more bits to compress the motion vectors, but smaller macroblocks tend to decrease the prediction error. Larger macroblocks mean fewer motion vectors to compress, but tend to increase the prediction error, because a larger area may cover more than one moving region within a single macroblock.

Video compression standard H.261 H.261: an early digital video compression standard whose principle of MC-based compression is retained in all later video compression standards. The standard was designed for videophone, video conferencing and other audiovisual services over ISDN. The video codec supports bit-rates of p x 64 kbps, where p ranges from 1 to 30 (hence it is also known as p * 64). The standard requires that the delay of the video encoder be less than 150 ms so that the video can be used for real-time video conferencing.

H.261 Frame Sequence Two types of image frames are defined: Intra-frames (I-frames) and Inter-frames (P-frames):

H.261 Frame Sequence I-frames: These are intra-coded frames in which only spatial redundancy is used to compress the frame. They are treated as independent images (they can be reconstructed without any reference to other frames). A transform coding method similar to JPEG is applied within each I-frame. I-frames require more bits than predicted frames, so their compression ratio is not as high.

H.261 Frame Sequence P-frames: P-frames are predictively coded (a forward predictive coding method), exploiting temporal redundancy by comparing them with a preceding reference frame. They are not independent: it is impossible to reconstruct them without the data of another frame (I or P). They contain the motion vectors and error signals. P-frames need less space than I-frames, because only the differences are stored; however, they are expensive to compute, yet necessary for compression. An important problem the encoder faces is deciding when to stop predicting with P-frames and instead insert an I-frame. An I-frame needs to be inserted where P-frames cannot give much compression; this happens during scene transitions or scene changes, where the error images are large.

Fig. 10.4: H.261 Frame Sequence. We typically have a group of pictures (GOP): one I-frame followed by several P-frames. The number of P-frames following each I-frame determines the size of the GOP, which can be fixed or dynamic. Why can't the GOP be too large?

H.261 Frame Sequence Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal. A lost P-frame usually results in artifacts that are folded into subsequent frames; if an artifact persists over time, the likely cause is a lost P-frame. To limit the propagation of coding errors, an I-frame is usually sent a couple of times in each second of the video.

Intra-frame (I-frame) Coding Uses lossless and lossy compression techniques like JPEG. Compression is contained entirely within the current frame, which makes the coding simpler, but it is not enough by itself for high compression: we cannot rely on intra-frame coding alone. However, we also cannot rely on inter-frame differences across a large number of frames, so when the errors get too large, a new I-frame is started.

Intra-frame (I-frame) Coding Macroblocks are of size 16 x 16 pixels for the Y frame, and 8 x 8 for the Cb and Cr frames, since 4:2:0 chroma subsampling is employed. A macroblock thus consists of four Y blocks, one Cb block, and one Cr block, each 8 x 8. A DCT is applied to each 8 x 8 block; the DCT coefficients then go through quantization, zigzag scan and entropy coding.
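The per-block pipeline (DCT, quantization, zigzag scan) can be sketched as follows. The flat quantizer step of 16 and the sample block are simplifications for illustration; H.261's actual quantization is more elaborate:

```python
import numpy as np

def dct_matrix(N=8):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    n = np.arange(N)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N)) * np.sqrt(2 / N)
    C[0, :] /= np.sqrt(2)
    return C

def zigzag(N=8):
    """(row, col) pairs in zigzag scan order: walk the anti-diagonals,
    alternating direction between odd and even diagonals."""
    return sorted(((r, c) for r in range(N) for c in range(N)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

C = dct_matrix()
Q = 16                                    # flat quantizer step (illustrative)
block = (np.arange(64).reshape(8, 8) % 32).astype(np.float64)  # sample 8x8 block

coeffs = C @ (block - 128) @ C.T          # level shift, then 2-D DCT
quant = np.round(coeffs / Q).astype(int)  # uniform quantization
scan = [quant[r, c] for r, c in zigzag()]

# The DC coefficient comes first; most high-frequency coefficients quantize
# to zero, producing the long zero runs that run-length coding exploits.
print("DC:", scan[0], "zeros:", scan.count(0))
```

Because the sample block varies smoothly, nearly all of its energy lands in a handful of low-frequency coefficients, and the zigzag scan groups the surviving values at the front of the 64-element list.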

Block Transform Encoding (I-frame)

Inter-frame (P-frame) Predictive Coding The H.261 P-frame coding scheme is based on motion compensation: For each macroblock in the Target frame, a motion vector is allocated. After the prediction, a difference macroblock is derived to measure the prediction error. Each of its 8 x 8 blocks then goes through the DCT, quantization, zigzag scan and entropy coding procedures.

Inter-frame (P-frame) Predictive Coding P-frame coding encodes the difference macroblock, not the Target macroblock itself. Sometimes a good match cannot be found, i.e., the prediction error exceeds a certain acceptable level; the MB itself is then encoded (treated as an intra MB), and in this case it is termed a non-motion-compensated MB. (To minimize the number of expensive motion estimation calculations, they are performed only if the difference between two blocks at the same position exceeds a threshold; otherwise the whole block is transmitted.) For each motion vector, the difference MVD is sent for entropy coding.
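These threshold-driven decisions can be sketched per macroblock as follows. The Python/NumPy code, the threshold values, and the SAD criterion are illustrative assumptions, not values from the H.261 standard:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def code_mb(target, ref, x, y, prev_mv, N=16, p=4, t_search=256, t_intra=4096):
    """Simplified H.261-style P-frame decision for one macroblock."""
    mb = target[y:y + N, x:x + N]
    colocated = sad(mb, ref[y:y + N, x:x + N])
    if colocated <= t_search:            # co-located block already close enough:
        return "inter", (0, 0), (0, 0)   # skip the expensive motion search
    best_mv, best_err = (0, 0), colocated
    for dy in range(-p, p + 1):          # full search over [-p, p] x [-p, p]
        for dx in range(-p, p + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry and 0 <= rx and ry + N <= ref.shape[0] and rx + N <= ref.shape[1]:
                err = sad(mb, ref[ry:ry + N, rx:rx + N])
                if err < best_err:
                    best_err, best_mv = err, (dx, dy)
    if best_err > t_intra:               # no acceptable match: intra-code the MB
        return "intra", None, None
    mvd = (best_mv[0] - prev_mv[0], best_mv[1] - prev_mv[1])
    return "inter", best_mv, mvd         # send the MVD, not the MV itself

# A square moves 2 pixels right between the reference and target frames.
ref = np.zeros((64, 64), dtype=np.uint8); ref[16:32, 16:32] = 180
target = np.zeros_like(ref);             target[16:32, 18:34] = 180

mode, mv, mvd = code_mb(target, ref, x=16, y=16, prev_mv=(0, 0))
print(mode, mv, mvd)   # → inter (-2, 0) (-2, 0)
```

With a previous MV of (0, 0) the MVD equals the MV itself; in a real encoder, neighboring macroblocks often share similar motion, so the MVD is usually small and cheap to entropy-code.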

Fig. 10.6: H.261 P-frame Coding Based on Motion Compensation.

Video compression standard MPEG MPEG: the Moving Pictures Experts Group, established in 1988 for the development of digital video standards. MPEG compression is essentially an attempt to overcome some shortcomings of H.261: H.261 only encodes video; MPEG-1 encodes video and audio. H.261 only allows forward prediction; MPEG-1 has both forward and backward prediction (B-pictures). MPEG-1 was also designed to allow fast forward and backward search and synchronization of audio and video.

Motion Compensation in MPEG-1 As mentioned before, Motion Compensation (MC) based video encoding in H.261 works as follows: In Motion Estimation (ME), each macroblock (MB) of the Target P-frame is assigned a best-matching MB from the previously coded I- or P-frame; this is the prediction. The prediction error, i.e., the difference between the MB and its matching MB, is sent to the DCT and the subsequent encoding steps. Because the prediction is from a previous frame, this is forward prediction.

Motion Compensation in MPEG-1 Sometimes, areas of the current frame can be better predicted from the next (future) frame. This can happen when objects or the camera move, exposing areas not seen in past frames. For example, an MB containing part of a ball in the Target frame may find no good matching MB in the previous frame because half of the ball was occluded by another object, while a match can readily be obtained from the next frame.

The Need for a Bidirectional Search The problem is that many macroblocks need information that is not in the reference frame: occlusion by objects affects differencing, and occluded objects are difficult to track. MPEG therefore uses forward/backward interpolated prediction. Using both frames increases the accuracy of the prediction during motion compensation. The past and future reference frames can themselves be coded as I- or P-frames.

MPEG B-Frames The MPEG solution is to add a third frame type: the bidirectional frame, or B-frame.

MPEG B-Frames B-frames, also known as bidirectionally coded frames, are inter-coded and also exploit temporal redundancy. To predict a B-frame, both the previous (past) frame and the next (future) frame are used. The coding of B-frames is more complex than that of I- or P-frames, with the encoder having to make more decisions.

MPEG B-Frames To compute a matching macroblock, the encoder searches for the best motion vector in the past reference frame and also for the best motion vector in the future reference frame, so two motion vectors are computed for each macroblock. The macroblock is then coded in one of three modes: Forward predicted, using only the past frame. Backward predicted, using only the future frame. Interpolated, using both, by averaging the two predicted blocks. The mode corresponding to the best macroblock match, yielding the least entropy in the difference, is chosen.
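The three-way mode decision can be sketched as below, using SAD as a simple stand-in for "least entropy in the difference"; the blocks here are synthetic and the function assumes motion estimation has already produced the best-matching past and future blocks:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def b_frame_mode(target_mb, past_mb, future_mb):
    """Pick forward, backward, or interpolated prediction for one B-frame MB."""
    candidates = {
        "forward": past_mb.astype(np.int32),      # past reference only
        "backward": future_mb.astype(np.int32),   # future reference only
        "interpolated": (past_mb.astype(np.int32)
                         + future_mb.astype(np.int32) + 1) // 2,  # average of both
    }
    return min(candidates, key=lambda m: sad(target_mb, candidates[m]))

# A macroblock fading from 100 (past) to 120 (future): the target sits halfway,
# so averaging the two references predicts it exactly.
past = np.full((16, 16), 100, dtype=np.uint8)
future = np.full((16, 16), 120, dtype=np.uint8)
target = np.full((16, 16), 110, dtype=np.uint8)

print(b_frame_mode(target, past, future))   # → interpolated
```

Scenes with uncovered background favor the backward mode, fades and smooth motion favor interpolation, and ordinary motion favors the forward mode; the encoder simply keeps whichever residual is cheapest to code.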

Backward Prediction Implications B-frames also necessitate reordering frames during transmission, which causes delays: The order in which frames arrive at the encoder is known as the display order; this is also the order in which the frames need to be displayed at the decoder after decoding. B-frames induce both a forward and a backward dependency: the encoder has to encode and send both the past and future reference frames to the decoder before coding and transmitting the current B-frame. Because of this change in order, all pending B-frames must be buffered while the encoder codes the future reference frame, forcing the encoder to manage buffering and causing a delay during transmission.

Backward Prediction Implications Example: backward prediction requires that the future frames used for backward prediction be encoded and transmitted first, i.e. out of display order. Fig. 10.9: MPEG Frame Sequence.
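The reordering can be sketched as a small function mapping display order to coding/transmission order (pure Python; it assumes simple GOPs where every B-frame sits between two anchor frames, as in the IBBP patterns on the following slides):

```python
def coding_order(display):
    """Reorder a display-order frame-type string so that every B-frame is
    transmitted after both of its reference (anchor) frames."""
    out, pending_b = [], []
    for i, t in enumerate(display):
        if t == "B":
            pending_b.append(i)        # hold B-frames until the next anchor
        else:                          # I or P: an anchor frame
            out.append(i)              # send the anchor first...
            out.extend(pending_b)      # ...then the buffered B-frames
            pending_b = []
    out.extend(pending_b)
    return out

display = "IBBPBBP"
order = coding_order(display)
print([display[i] + str(i) for i in order])
# → ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

Each P-frame jumps ahead of the B-frames that precede it in display order, which is exactly the buffering and delay described above: B1 and B2 cannot be coded until P3 has been coded and sent.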

Example encoding patterns:
Pattern 1: I P P P P P P P P P P
Dependency: I <---- P <---- P <---- P ...
The I-frame is compressed independently; the first P-frame is compressed using the I-frame; the second P-frame is compressed using the first P-frame; and so on.

Example encoding patterns:
Pattern 2: I BB P BB P BB P BB P BB P BB P
Dependency: I <---- B B ----> P <---- B B ----> P ...
The I-frame is compressed independently; the first P-frame is compressed using the I-frame; the B-frames between the I-frame and the first P-frame are compressed using both of those frames; the second P-frame is compressed using the first P-frame; the B-frames between the first and second P-frames are compressed using both of those frames; and so on.

Example encoding patterns:
Pattern 3: I BBB P BBB P BBB P BBB P
Dependency: I <---- B B B ----> P <---- B B B ----> P ...
The I-frame is compressed independently; the first P-frame is compressed using the I-frame; the B-frames between the I-frame and the first P-frame are compressed using both of those frames; the second P-frame is compressed using the first P-frame; the B-frames between the first and second P-frames are compressed using both of those frames; and so on.

The quality of an MPEG video The usage of particular frame types determines the quality and the compression ratio of the compressed video. I-frames increase the quality (and size), whereas the use of B-frames compresses better but produces poorer quality. The distance between two I-frames can be seen as a measure of the quality of an MPEG video. There is no defined limit to the number of consecutive B-frames that may be used in a group of pictures; the optimal number is application-dependent. Most broadcast-quality applications, however, have tended to use two consecutive B-frames (I, B, B, P, B, B, P, ...) as the best trade-off between compression efficiency and video quality.

MC-based B-frame coding idea (summary) The MC-based B-frame coding idea is illustrated in Fig. 10.8: Each MB from a B-frame has up to two motion vectors (MVs), one from the forward and one from the backward prediction. If matching in both directions is successful, both MVs are sent, and the two corresponding matching MBs are averaged (indicated by % in the figure) before being compared to the Target MB to generate the prediction error. If an acceptable match can be found in only one of the reference frames, then only one MV and its corresponding MB are used, from either the forward or the backward prediction.

Fig. 10.8: B-frame Coding Based on Bidirectional Motion Compensation.