Video compression principles

Video: moving pictures. The terms frame and picture are both used for a single image in the sequence.

One approach to compressing a video source is to apply the JPEG algorithm to each frame independently. This approach is known as moving JPEG or MJPEG. The compression ratios obtained this way are typically between 10:1 and 20:1, neither of which is large enough on its own to produce the compression ratios needed.

Redundancy is often present between a set of frames. Examples: the movement of a person's lips or eyes in a video telephony application, or a person or vehicle moving across the screen in a movie. In both cases only a small portion of each frame is involved with any motion that is taking place. Hence, considerable additional savings in bandwidth can be made by sending only information relating to those segments of each frame that have movement associated with them, that is, by exploiting the temporal differences that exist between many of the frames.

Just a selection of frames is sent in individually compressed form and, for the remaining frames, only the differences between the actual frame contents and the predicted frame contents are sent.

Sub-sampling of Chrominance Information

Transforming (R,G,B) -> (Y,Cb,Cr) provides two advantages:
1) The human visual system (HVS) is more sensitive to the Y component than to the Cb or Cr components.
2) Cb and Cr are far less correlated with Y than R is with G, R with B, or B with G, which reduces TV transmission bandwidths.

Cb and Cr both require far less bandwidth and can be sampled more coarsely (Shannon). By doing so we can reduce the amount of data without a perceptible loss in visual quality.

Color Space Conversion

In general, each pixel in a picture consists of three components: R (Red), G (Green), B (Blue). The color value of each pixel can be viewed in either the RGB color space or the YCbCr color space. In MPEG-1, (R,G,B) must be converted to (Y,Cb,Cr) before processing. Because the (Y,Cb,Cr) components are less correlated than (R,G,B), coding using (Y,Cb,Cr) is more efficient.
(Y,U,V) can also be used to denote (Y,Cb,Cr); however, it most appropriately represents the analog TV equivalent.
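The conversion and the coarser chrominance sampling described above can be sketched as follows. This is an illustration only: the slides do not give the conversion coefficients, so the full-range BT.601 weights used by JFIF are assumed here, and the 2x2 averaging (often called 4:2:0) is one common sub-sampling choice rather than the only one. The function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr.

    Assumption: full-range BT.601 weights as used by JFIF; the slides do not
    state which variant is meant.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp each component to the 8-bit range.
    return tuple(max(0, min(255, round(v))) for v in (y, cb, cr))


def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging each 2x2 group.

    `plane` is a list of rows with even width and height.
    """
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]


def samples_per_16x16(subsampled):
    """Number of samples needed for a 16x16 pixel area."""
    luma = 16 * 16
    chroma = 8 * 8 if subsampled else 16 * 16
    return luma + 2 * chroma
```

With these assumptions, a 16x16 area needs 768 samples when all three components are kept at full resolution and 384 when Cb and Cr are averaged over 2x2 groups, so the data is halved before any transform coding takes place.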
Macroblock structure

Macro Blocks & Color Sub-sampling Schemes

The basic coding unit is an 8 by 8 block. A macroblock consists of six blocks: four blocks of luminance (Y), one block of Cb chrominance, and one block of Cr chrominance. The four 8x8 luminance blocks together cover a 16x16 pixel area.
Slide: Courtesy, Hung Nguyen

Picture Frames - Overview

Three frame types:
I-Picture (intra-frame picture)
P-Picture (inter-frame predicted picture)
B-Picture (bi-directional predicted, interpolated picture)

I-frames

I-frames are encoded without reference to any other frames. Each frame is treated as a separate (digitized) picture, and the Y, Cb and Cr matrices are encoded independently using the JPEG algorithm (DCT, quantization, entropy encoding), except that the quantization threshold values used are the same for all DCT coefficients. Hence the level of compression obtained with I-frames is relatively small.
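The macroblock layout described above (four Y blocks, one Cb block, one Cr block) can be sketched as a simple split of a 16x16 luminance area plus its two already sub-sampled 8x8 chrominance blocks. The function name and the block order Y0, Y1, Y2, Y3, Cb, Cr (luminance blocks taken left to right, top to bottom) are assumptions made for this illustration.

```python
def split_macroblock(y16, cb8, cr8):
    """Return the six 8x8 blocks of one macroblock.

    y16 is a 16x16 luminance area (list of 16 rows of 16 samples);
    cb8 and cr8 are the matching 8x8 chrominance blocks.
    Assumed order of the result: Y0, Y1, Y2, Y3, Cb, Cr.
    """
    def sub_block(plane, top, left):
        # Cut an 8x8 window whose top-left corner is (top, left).
        return [row[left:left + 8] for row in plane[top:top + 8]]

    luma = [sub_block(y16, top, left) for top in (0, 8) for left in (0, 8)]
    return luma + [cb8, cr8]
```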
P-frames

The encoding of a P-frame is relative to the contents of either a preceding I-frame or a preceding P-frame. P-frames are encoded using a combination of motion estimation and motion compensation.

B-frames

Their contents are predicted using search regions in both past and future frames. As well as allowing for occasional moving objects, this also provides better motion estimation.

Example Frame Sequences

Group of pictures (GOP): the number of frames/pictures between successive I-frames. It is given the symbol N, and typical values for N are from 3 through to 12.

The number of frames between a P-frame and the immediately preceding I- or P-frame is called the prediction span. It is given the symbol M (1 & 3).

A typical sequence of frames involving just I- and P-frames is shown in Figure 4.11(a), and a sequence involving all three frame types (I, P and B) is shown in part (b) of the figure.
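To make the roles of N and M concrete, the following sketch generates the frame types of one group of pictures in display order. It assumes that N counts the frames from one I-frame up to (but not including) the next, and that M is the distance between successive I- or P-frames; the function name is hypothetical.

```python
def gop_pattern(n, m):
    """Frame types, in display order, for one GOP.

    Assumptions: n frames per GOP starting with an I-frame, and every
    m-th frame after it is a P-frame; all other frames are B-frames.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)
```

Under these assumptions, `gop_pattern(4, 1)` gives `IPPP` (I- and P-frames only) and `gop_pattern(9, 3)` gives `IBBPBBPBB` (all three frame types).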
P-frames: their contents are encoded by considering the contents of the current (uncoded) frame relative to the contents of the immediately preceding (uncoded) frame.

B-frames: here, however, three (uncoded) frame contents are involved: the immediately preceding I- or P-frame, the current frame being encoded, and the immediately succeeding I- or P-frame. This results in an increase in the encoding (and decoding) delay, which is equal to the time to wait for the next I- or P-frame in the sequence.

Decoding P frames

With P-frames, the received information is first decoded and the resulting information is then used, together with the decoded contents of the preceding I- or P-frame, to derive the decoded frame contents.

Decoding B frames

In the case of B-frames, the received information is first decoded and the resulting information is then used, together with both the immediately preceding I- or P-frame contents and the immediately succeeding P- or I-frame contents, to derive the decoded frame contents. Hence, in order to minimize the time required to decode each B-frame, the order of encoding (and transmission) of the (encoded) frames is changed so that both the preceding and succeeding I- or P-frames are available when the B-frame is received.

Frame Types

There are two basic types of compressed frame: those that are encoded independently and those that are predicted.

Intracoded frames -> I-frames
  Level of compression is relatively small: 10:1 to 20:1 compression ratio
  Present at regular intervals to limit the extent of errors
  The number of frames between I-frames is known as the group of pictures (GOP)

Intercoded frames (interpolation frames)
  Predicted frames -> P-frames
    Significant compression level achieved here: 20:1 to 30:1 compression ratio
    Errors are propagated
  Bidirectional frames -> B-frames
    Highest levels of compression achieved: 30:1 to 50:1 compression ratio
    B-frames are not used for prediction, thus errors are not propagated
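The change of frame order for transmission described above can be sketched as follows: each B-frame is held back until the I- or P-frame that follows it in display order has been emitted. The frame labels and the function name are made up for the illustration.

```python
def transmission_order(display_order):
    """Reorder frames so each B-frame follows both of its reference frames.

    display_order is a list of labels such as "I1", "B2", "P4", where the
    first character gives the frame type.
    """
    out = []
    pending_b = []
    for frame in display_order:
        if frame[0] == "B":
            # Hold the B-frame until its succeeding reference has been sent.
            pending_b.append(frame)
        else:
            out.append(frame)
            out.extend(pending_b)
            pending_b = []
    # Trailing B-frames have no later reference in this list; a real stream
    # would send the next I-frame before them. Here they are simply appended.
    out.extend(pending_b)
    return out
```

For the display order I1 B2 B3 P4 B5 B6 P7, this sketch yields I1 P4 B2 B3 P7 B5 B6, so the decoder already holds both neighbours of every B-frame when it arrives.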
Motion Compensation (MC) and Motion Estimation (ME)

Motion estimation predicts the values of a block of pixels in the next picture using a block in the current picture. The difference in location between the two blocks is called the motion vector, and the difference between the contents of the two blocks is called the prediction error. In MPEG-1, the encoder must calculate the motion vector and the prediction error. When the decoder obtains this information, it can use it together with the current picture to reconstruct the next picture. This process is usually called motion compensation. In general, motion compensation is the inverse process of motion estimation.
Slide: Courtesy, Hung Nguyen

Motion Compensation

Try to match each block in the actual picture to content in the previous picture.
Matching is made by shifting each of the 8 x 8 blocks of the two successive pictures pixel by pixel in each direction -> motion vector
Subtract the two blocks -> difference block
Transmit the motion vector and the difference block

Motion Estimation (ME)

Motion estimation: estimating any movement between successive frames. (The accuracy of the prediction operation?)
Motion compensation: additional information must also be sent to indicate any small differences between the predicted and actual positions of the moving segments involved.
Slide: Courtesy, Hung Nguyen
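The matching procedure above (shift the block pixel by pixel, keep the best match, subtract) can be sketched as an exhaustive block search. The slides do not say how the quality of a match is measured, so the sum of absolute differences (SAD) is assumed here; the search range, the function names, and the convention that the motion vector points from the block position in the current picture to the matching position in the previous picture are likewise assumptions.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block in cur and one in ref."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def best_motion_vector(cur, ref, cx, cy, size=8, search=4):
    """Full search over +/- `search` pixels around (cx, cy).

    Returns ((dx, dy), cost) for the candidate with the lowest SAD.
    Candidates that would fall outside the reference picture are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


def difference_block(cur, ref, cx, cy, mv, size=8):
    """Prediction error: current block minus the motion-shifted reference."""
    dx, dy = mv
    return [
        [cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
         for i in range(size)]
        for j in range(size)
    ]
```

The encoder would then send the motion vector and the difference block; the decoder adds the difference block back to the shifted reference block to rebuild the current block.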
P-Frame Encoding: Macroblock Structure

P-Frame Encoding: Encoding Procedure

Intra-frame Encoding Process

1) Decompose the image into its three components in RGB space.
2) Convert RGB to YCbCr.
3) Divide the image into macroblocks (each macroblock has 6 blocks: 4 for Y, 1 for Cb, 1 for Cr).
4) Apply the DCT to each block.
5) After the DCT, quantize each coefficient.
6) Use a zig-zag scan to gather the AC values.
7) Use DPCM to encode the DC value, then use VLC to encode it.
8) Use RLE to encode the AC values, then use VLC to encode them.
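Steps 4 to 8 of the intra-frame process can be sketched for a single 8x8 block as below. This is a minimal illustration under several assumptions: samples are level-shifted by 128 before the transform, one quantizer step (16) is applied to every coefficient, run-length pairs take the form (zero run, value) with an end-of-block marker, and the final VLC stage is left out. All function names are hypothetical.

```python
import math

N = 8


def dct_2d(block):
    """Forward 8x8 DCT-II with orthonormal scaling."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step=16):
    """Divide every coefficient by the same step and round (assumed step)."""
    return [[round(c / step) for c in row] for row in coeffs]


def zigzag_indices(n=N):
    """(row, col) pairs in zig-zag order, lowest frequencies first."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_length_ac(ac):
    """Encode AC values as (number of preceding zeros, value) pairs."""
    pairs, run = [], 0
    for value in ac:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # any trailing zeros collapse into end-of-block
    return pairs


def encode_block(block, prev_dc, step=16):
    """Return (DPCM-coded DC difference, run-length AC pairs, this block's DC)."""
    shifted = [[p - 128 for p in row] for row in block]
    q = quantize(dct_2d(shifted), step)
    scan = [q[r][c] for r, c in zigzag_indices()]
    dc, ac = scan[0], scan[1:]
    return dc - prev_dc, run_length_ac(ac), dc
```

The returned DC value is passed as `prev_dc` when encoding the next block, which is what makes the DC coding differential.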
Coding of P Pictures

As with I pictures, the encoder needs to store the decoded P pictures, since these may be used as the starting point for motion compensation. Therefore, the encoder reconstructs the image from the quantized coefficients.

In coding P pictures, the encoder has more decisions to make than in the case of I pictures:

Selection of macroblock type: there are 8 types of macroblock in P pictures.
Motion compensation decision: the encoder has the option of whether or not to transmit motion vectors for predictive-coded macroblocks.
Intra/non-intra coding decision.
Coded/not coded decision: after quantization, if all the coefficients in a block are zero, the block is not coded.
Quantizer/no quantizer decision: the quantizer scale can be altered, which will affect the picture quality.
Slide: Courtesy, Hung Nguyen
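The coded/not coded decision can be sketched as a six-bit mask with one bit per block of the macroblock, set only when that block still has a nonzero coefficient after quantization. The function name is hypothetical, and the bit order (Y0 in the most significant bit, Cr in the least significant) is an assumption for this illustration rather than something stated on the slides.

```python
def coded_block_mask(blocks):
    """Build a 6-bit mask marking which blocks need to be coded.

    blocks: six quantized 8x8 blocks in the assumed order
    Y0, Y1, Y2, Y3, Cb, Cr. A block whose coefficients are all zero
    contributes a 0 bit and would not be coded.
    """
    if len(blocks) != 6:
        raise ValueError("a macroblock has six blocks")
    mask = 0
    for i, block in enumerate(blocks):
        if any(value != 0 for row in block for value in row):
            mask |= 1 << (5 - i)
    return mask
```

A mask of 0 means that no block of the macroblock carries any difference data.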