Principles of Video Compression

Topics today
- Introduction
- Temporal Redundancy Reduction
- Coding for Video Conferencing (H.261, H.263)
(CSIT 410)

Introduction
- Goal: reduce video bit rates while maintaining acceptable image quality
- Exploit the strong correlation both between successive picture frames and among the picture elements within each frame
- Exploit the insensitivity of the human visual system to the loss of certain spatio-temporal visual information
- Interframe predictive coding is used in H.261, H.263, MPEG-1, MPEG-2 and MPEG-4

Introduction [2]
Fundamental redundancy reduction principles:
- Spatial redundancy reduction
- Temporal redundancy reduction
- Entropy coding

Temporal Redundancy Reduction
- Uses interframe coding
- Static parts of the image sequence generate temporal differences close to zero and hence are not coded
- Parts that change between frames, due either to illumination variation or to the motion of objects, produce a significant image error, which needs to be coded

Temporal Redundancy Reduction [2]
[Figure panels: Interframe; Motion compensated Interframe]

Temporal Redundancy Reduction [3]
MOTION ESTIMATION
- Estimate the motion of moving objects with the block matching algorithm (BMA)
- Divide the frame into blocks of M × N pixels or, more usually, square blocks of N² pixels
- Assume a maximum motion displacement of w pixels per frame
- Match the current block of pixels against the block at the same coordinates in the previous frame, searching within a square window of width N + 2w
- Take the displacement that gives the best match under the chosen matching criterion
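
The block matching steps above can be sketched in a few lines of Python. This is a minimal illustration, not any codec's actual implementation. It assumes frames are lists of rows of integer luminance values, uses mean absolute error as the matching criterion, and skips candidates that would fall outside the previous frame. The function names `mae` and `full_search` are hypothetical.

```python
def mae(cur, prev, bx, by, dx, dy, n):
    """Mean absolute error between the n x n block of `cur` at (bx, by)
    and the block of `prev` displaced by (dx, dy)."""
    total = 0
    for r in range(n):
        for c in range(n):
            total += abs(cur[by + r][bx + c] - prev[by + dy + r][bx + dx + c])
    return total / (n * n)


def full_search(cur, prev, bx, by, n, w):
    """Check every displacement in [-w, w] x [-w, w] and return
    (best_motion_vector, best_cost). The vector points from the current
    block position to the matching block in the previous frame."""
    height, width = len(prev), len(prev[0])
    best_mv, best_cost = None, float("inf")
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            x, y = bx + dx, by + dy
            # Assumption: candidates partly outside the frame are skipped.
            if x < 0 or y < 0 or x + n > width or y + n > height:
                continue
            cost = mae(cur, prev, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Calling `full_search(cur, prev, bx, by, n, w)` for each block of the current frame gives one motion vector per block. Ties are resolved in favour of the first candidate in scan order, which is an arbitrary choice made for this sketch.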

Temporal Redundancy Reduction [4]
MOTION ESTIMATION
[Figure: The current and previous frames in a search window]

Temporal Redundancy Reduction [5]
MOTION ESTIMATION
Matching functions:
- Mean squared error (MSE)
- Mean absolute error (MAE)
To reduce processing cost, MAE is preferred to MSE and hence is the criterion used in the standard video codecs
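
The formulas for the two criteria did not survive on this slide. A common textbook form is given here as an assumption about notation: f is the current N × N block, g is the previous frame, and (i, j) is a candidate displacement within the search range.

```latex
\mathrm{MSE}(i,j) = \frac{1}{N^2}\sum_{m=1}^{N}\sum_{n=1}^{N}
    \bigl(f(m,n) - g(m+i,\,n+j)\bigr)^2, \qquad -w \le i, j \le w

\mathrm{MAE}(i,j) = \frac{1}{N^2}\sum_{m=1}^{N}\sum_{n=1}^{N}
    \bigl|\,f(m,n) - g(m+i,\,n+j)\,\bigr|, \qquad -w \le i, j \le w
```

MAE replaces the squaring in MSE with an absolute value, so it needs no multiplications per pixel, which is where the saving in processing cost comes from.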

Temporal Redundancy Reduction [6]
MOTION ESTIMATION
- In the simple (full-search) case, BMA requires (2w+1)² matching computations per block
- This is costly: motion estimation accounts for roughly 50 to 70 per cent of the overall encoder's complexity
- Faster motion estimation is required
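
To get a feel for the cost, the count can be worked out directly. The sketch below assumes one matching computation per candidate displacement and N² absolute differences per MAE evaluation. The function names are hypothetical.

```python
def full_search_positions(w):
    """Number of candidate displacements in a [-w, w] x [-w, w] search."""
    return (2 * w + 1) ** 2


def absolute_differences_per_block(w, n):
    """Pixel differences needed to evaluate MAE at every candidate
    for one n x n block."""
    return full_search_positions(w) * n * n


# Example: w = 15 and a 16 x 16 block.
print(full_search_positions(15))                # 961
print(absolute_differences_per_block(15, 16))   # 246016
```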

Temporal Redundancy Reduction [7]
FASTER MOTION ESTIMATION
- Reduce the number of search points by checking only a small number of selected points
- Underlying assumption: the distortion measure decreases monotonically towards the best-matched point
- Approaches:
  - Two-dimensional logarithmic (TDL) search
  - Three-step search (TSS)
  - Modified motion estimation algorithm (MMEA)
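
As an illustration of how such a method cuts the number of search points, here is a minimal sketch of a three-step search. It is written under assumptions, not taken from the slides: `cost(dx, dy)` is a hypothetical callable that returns the distortion for a candidate displacement, and the default first step of 4 gives steps of 4, 2 and 1, which cover displacements up to ±7.

```python
def three_step_search(cost, first_step=4):
    """Coarse-to-fine search: at each step, test the 8 neighbours of the
    current best point at the current step size, move to the best one,
    then halve the step. Returns (best_vector, best_cost, evaluations)."""
    best = (0, 0)
    best_cost = cost(0, 0)
    evaluations = 1
    step = first_step
    while step >= 1:
        cx, cy = best  # centre for this step
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue  # centre already evaluated
                cand = (cx + dx, cy + dy)
                c = cost(*cand)
                evaluations += 1
                if c < best_cost:
                    best, best_cost = cand, c
        step //= 2
    return best, best_cost, evaluations
```

With the default step, the search evaluates 1 + 3 × 8 = 25 candidates, compared with 225 for a full search at w = 7. The result is only guaranteed to be the true minimum when the monotonicity assumption holds.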

Temporal Redundancy Reduction [8]
FASTER MOTION ESTIMATION
[Figure: Two-dimensional logarithmic search]

Temporal Redundancy Reduction [9]
FASTER MOTION ESTIMATION
Exercise: find the maximum number of steps needed to reach the best estimate.

Temporal Redundancy Reduction [10]
HIERARCHICAL MOTION ESTIMATION
- The fast methods rely on the assumption that the distortion varies monotonically with image intensity; as a result they
  - perform well for slow-moving objects, such as those in video conferencing
  - often converge to a local minimum of distortion when the motion is faster
- Remedy: subsample the image to smaller sizes, so that the motion speed is reduced by the sampling ratio
- This is the hierarchical block matching algorithm (HBMA)
[Figure: A three-level image pyramid]
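
A three-level pyramid of this kind can be built by repeated subsampling. The sketch below is one simple way to do it and makes assumptions the slides do not state: each level halves the width and height by averaging 2 × 2 pixel groups with rounding, and frames are lists of rows of non-negative integers with even dimensions. The function names are hypothetical.

```python
def downsample_by_2(frame):
    """Halve width and height by averaging each 2 x 2 group of pixels."""
    h = len(frame) // 2
    w = len(frame[0]) // 2
    return [
        [
            (frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
             + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1] + 2) // 4
            for x in range(w)
        ]
        for y in range(h)
    ]


def build_pyramid(frame, levels=3):
    """Return [level 0 (full resolution), level 1, ..., level levels-1]."""
    pyramid = [frame]
    for _ in range(levels - 1):
        pyramid.append(downsample_by_2(pyramid[-1]))
    return pyramid
```

At level 2 of a three-level pyramid, a displacement of 4 pixels in the original frame appears as a displacement of 1 pixel, so a small search range at the top level covers a large range at full resolution.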

Temporal Redundancy Reduction [11]
HIERARCHICAL SEARCH ALGORITHM
Begin
  // Get macroblock center position at the lowest resolution, Level k
  x0^k = x0^0 / 2^k;  y0^k = y0^0 / 2^k;
  Use Sequential (or 2D Logarithmic) search method to get initial estimated MV (u^k, v^k) at Level k;
  while last ≠ TRUE
  {
    Find one of the nine macroblocks that yields minimum MAD at Level k-1, centered at
      (2(x0^k + u^k) - 1 ≤ x ≤ 2(x0^k + u^k) + 1,  2(y0^k + v^k) - 1 ≤ y ≤ 2(y0^k + v^k) + 1);
    if k = 1 then last = TRUE;
    k = k - 1;
    Assign (x0^k, y0^k) and (u^k, v^k) with the new center location and MV;
  }
End

Temporal Redundancy Reduction [12]
GENERIC INTERFRAME VIDEO CODEC
[Figure: Generic interframe encoder used in standard video codecs, such as H.261, H.263, MPEG-1, MPEG-2 and MPEG-4]

Temporal Redundancy Reduction [14]
GENERIC INTERFRAME VIDEO CODEC
[Figure: Generic interframe decoder]

Coding for Video Conferencing (H.261)
- Allows bit rates between approximately 64 kbit/s and 1920 kbit/s
- Interframe DCT-based coding technique
- Interframe prediction is first carried out in the pixel domain
- The prediction error is then transformed into the frequency domain, where quantization for bandwidth reduction takes place
- Motion compensation can be included in the prediction stage, although it is optional

Coding for Video Conferencing (H.261) [2]
- Two types of frames: I-frames and P-frames
- An I-frame is usually sent every half a second
- Motion vectors are always measured within a neighborhood of ±15 pixels
[Figure: Frame sequence]

Coding for Video Conferencing (H.261) [3]
[Figure: I-frame coding]

Coding for Video Conferencing (H.261) [4]
[Figure: P-frame coding]

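Coding for Video Conferencing (H.261) [5]
Quantization
- The step size is constant for the DCT coefficients within a macroblock: one of the 31 even values from 2 to 62, i.e. step size = 2 × scale, with scale between 1 and 31
- Exception: for the DC coefficient in I-frames, a step size of 8 is always used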
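
The step-size rule can be illustrated with a simplified quantizer. This is a sketch under assumptions, not the normative H.261 procedure: it rounds the intra DC coefficient to the nearest multiple of 8 and applies floor division by 2 × scale to every other coefficient, a form found in textbook treatments. The function names and the plain multiply-back reconstruction are hypothetical simplifications.

```python
def quantize(coef, scale=1, intra_dc=False):
    """Map an integer DCT coefficient to a quantization level."""
    if intra_dc:
        # Fixed step size of 8, rounded to nearest (assumed rounding rule).
        return (coef + 4) // 8
    if not 1 <= scale <= 31:
        raise ValueError("scale must be between 1 and 31")
    # Step size is 2 * scale; // is floor division, also for negatives.
    return coef // (2 * scale)


def dequantize(level, scale=1, intra_dc=False):
    """Simplified reconstruction: multiply the level by the step size."""
    step = 8 if intra_dc else 2 * scale
    return level * step
```

For example, a coefficient of 45 quantized with scale 4 (step size 8) gives level 5, which reconstructs to 40. With scale 31 (step size 62) the same coefficient quantizes to 0, which shows how a larger scale discards more detail.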

Coding for Video Conferencing (H.261) [6]
[Figure: Encoder]

Coding for Video Conferencing (H.261) [7]
[Figure: Decoder]

Video Bitstream Syntax
Four layers:
- Picture layer
- Group of Blocks (GOB) layer
  - A GOB consists of 11 × 3 macroblocks
  - CIF contains 2 × 6 GOBs
  - QCIF contains 3 GOBs
- Macroblock layer
- Block layer
[Figure: Syntax of H.261 video bitstream]
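
The GOB counts follow from the picture dimensions. The short check below assumes 16 × 16 luminance macroblocks and the usual luminance sizes of 352 × 288 for CIF and 176 × 144 for QCIF. The function name `gob_layout` is hypothetical.

```python
MACROBLOCK_SIZE = 16           # luminance pixels per macroblock side
GOB_COLS, GOB_ROWS = 11, 3     # macroblocks per GOB (11 x 3)


def gob_layout(width, height):
    """Return (GOBs across, GOBs down) for a picture of the given size."""
    mb_cols = width // MACROBLOCK_SIZE
    mb_rows = height // MACROBLOCK_SIZE
    return mb_cols // GOB_COLS, mb_rows // GOB_ROWS


print(gob_layout(352, 288))  # CIF:  (2, 6), i.e. 12 GOBs
print(gob_layout(176, 144))  # QCIF: (1, 3), i.e. 3 GOBs
```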

H.263
- An improved video coding standard for video conferencing and other audio-visual services
- Aimed at low-bit-rate communication at less than 64 kbit/s
- Predictive coding for inter-frames
- Transform coding for intra-frames and for the difference macroblocks from inter-frame prediction
- Supports the notion of GOBs

H.263 Motion Compensation
- The predicted MV is the median of MV1, MV2 and MV3
[Figure: a) Predicted MV of the current block; b) Finding the MV when the current block is on the border]
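
The median prediction can be sketched as follows. The sketch assumes the median is taken separately on the horizontal and vertical components of the three candidate vectors and leaves out the border cases from panel b). The function name `median_mv` is hypothetical.

```python
def median_mv(mv1, mv2, mv3):
    """Component-wise median of three (x, y) motion vectors."""
    xs = sorted(v[0] for v in (mv1, mv2, mv3))
    ys = sorted(v[1] for v in (mv1, mv2, mv3))
    return xs[1], ys[1]


print(median_mv((1, 5), (3, -2), (2, 0)))  # (2, 0)
print(median_mv((4, 4), (0, 0), (0, 9)))   # (0, 4)
```

The second call shows that a component-wise median need not equal any one of the three candidates.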

H.263 Motion Compensation [2]
Motion compensation uses half-pixel precision
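
Half-pixel precision means the prediction can be read at positions between integer pixels. A minimal sketch, assuming bilinear interpolation with round-half-up and coordinates given in half-pixel units that stay inside the frame, is shown below. The function name `half_pel_sample` is hypothetical.

```python
def half_pel_sample(frame, x2, y2):
    """Sample `frame` at (x2 / 2, y2 / 2), where x2 and y2 are
    non-negative integers in half-pixel units."""
    x0, y0 = x2 // 2, y2 // 2
    x1 = x0 + (x2 % 2)   # right neighbour only when x2 is odd
    y1 = y0 + (y2 % 2)   # lower neighbour only when y2 is odd
    a, b = frame[y0][x0], frame[y0][x1]
    c, d = frame[y1][x0], frame[y1][x1]
    # Reduces to a at integer positions, (a + b + 1) // 2 between two
    # pixels, and (a + b + c + d + 2) // 4 at the centre of four.
    return (a + b + c + d + 2) // 4


frame = [[10, 20], [30, 40]]
print(half_pel_sample(frame, 0, 0))  # 10
print(half_pel_sample(frame, 1, 0))  # 15
print(half_pel_sample(frame, 1, 1))  # 25
```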

H.263 Optional Coding Modes
- Unrestricted motion vector mode
- Syntax-based arithmetic coding (SAC)
- Advanced prediction mode
- PB-frames

H.263 Optional Coding Modes [2]
Unrestricted Motion Vector mode. In this mode, motion vectors are allowed to point outside the picture. The edge pixels are used as the prediction for the "non-existent" pixels. This mode gives a significant gain when there is movement along the edges of the picture, especially for the smaller picture formats. It also extends the motion vector range so that larger motion vectors can be used, which is especially useful in the case of camera movement.
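
Using edge pixels for positions outside the picture amounts to clamping the coordinates before reading the reference frame. A minimal sketch, assuming the frame is a list of rows and using the hypothetical function name `reference_pixel`:

```python
def reference_pixel(frame, x, y):
    """Read the reference frame at (x, y); positions outside the picture
    are replaced by the nearest edge pixel."""
    height, width = len(frame), len(frame[0])
    cx = min(max(x, 0), width - 1)
    cy = min(max(y, 0), height - 1)
    return frame[cy][cx]


frame = [[1, 2], [3, 4]]
print(reference_pixel(frame, -3, 0))  # 1 (left of the picture)
print(reference_pixel(frame, 5, 1))   # 4 (right of the picture)
```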

H.263 Optional Coding Modes [3]
Syntax-based Arithmetic Coding mode. In this mode, arithmetic coding is used instead of variable-length coding (VLC). The SNR and the reconstructed frames are the same, but generally fewer bits are produced. The average gain for inter frames is 3-4%, depending on the sequence, the bit rate and the other options used. For intra blocks and frames the gain is higher, about 10% on average.

H.263 Optional Coding Modes [4]
Advanced Prediction mode. In this mode, overlapped block motion compensation is used for the P-frames. Four 8x8 vectors instead of one 16x16 vector are used for some of the macroblocks in the picture, and motion vectors are allowed to point outside the picture, as in the UMV mode. The encoder has to decide which type of vectors to use: four vectors use more bits but give better prediction.

H.263 Optional Coding Modes
PB-frames mode. A PB-frame consists of two pictures coded as one unit. The name PB comes from the picture types in MPEG, where there are P-pictures and B-pictures. Thus a PB-frame consists of one P-picture, which is predicted from the last decoded P-picture, and one B-picture, which is predicted from both the last decoded P-picture and the P-picture currently being decoded. The latter is called a B-picture because parts of it may be bi-directionally predicted from the past and future P-pictures.

Reference:
1. Chapter 10
2. Chapter 3 of JPEG2000 Standard for Image Compression: Concepts, Algorithms and VLSI Architectures, by Tinku Acharya and Ping-Sing Tsai, John Wiley & Sons
See Blackboard, Course Materials