Video Compression and Standards
Gail Reynard

In this lecture
- introduction to video compression
- types of frames
- motion estimation
- some video compression standards

Video Compression Principles
- approaches:
  - apply JPEG to each frame (moving JPEG, or MJPEG)
    - compression ratios are not high enough for most applications
    - does not consider the redundancy between frames
  - make use of the redundancy between frames (prediction)

Video Compression Principles (2)
- techniques are used to predict the content of many frames based on a combination of a preceding and, sometimes, a succeeding frame
- this involves sending:
  - a selection of individually compressed frames, PLUS
  - the differences between the actual frame contents and the predicted frame contents
- uses:
  - motion estimation
  - motion compensation

Frame Types
- two basic types of compressed frame:
  - independently encoded frames, known as intracoded frames or I-frames
  - predicted frames, known as intercoded or interpolation frames, of two types:
    - predictive frames, or P-frames
    - bidirectional frames, or B-frames

I-frames
- the Y, Cb and Cr matrices are encoded independently using JPEG
- the quantization threshold values used are the same for all DCT coefficients
  - the level of compression is relatively small (10:1 to 20:1)
- I-frames must be present at regular intervals
  - the frames between successive I-frames form a group of pictures (GOP)
  - the number of frames in a GOP (the GOP span, N) is typically between 3 and 12
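The idea of sending the differences between actual and predicted frame contents can be sketched in a few lines. This is a minimal sketch, assuming frames are small 2-D lists of luminance (Y) values and that the prediction is simply the previous frame, with no motion compensation yet; the function names are hypothetical and for illustration only.

```python
def frame_difference(actual, predicted):
    """Encoder side: residual = actual - predicted, pixel by pixel."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, predicted)]


def reconstruct(predicted, residual):
    """Decoder side: actual = predicted + residual, pixel by pixel."""
    return [[p + r for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(predicted, residual)]


def count_nonzero(matrix):
    """Number of non-zero entries, a rough proxy for how much must be sent."""
    return sum(1 for row in matrix for value in row if value != 0)


if __name__ == "__main__":
    previous = [[100, 100, 100, 100],
                [100, 120, 120, 100],
                [100, 120, 120, 100],
                [100, 100, 100, 100]]
    current = [[100, 100, 100, 100],
               [100, 120, 121, 100],
               [100, 120, 120, 100],
               [100, 100, 100, 100]]
    residual = frame_difference(current, previous)
    print("non-zero residual values:", count_nonzero(residual))  # 1 of 16
    assert reconstruct(previous, residual) == current
```

Because consecutive frames are so similar, the residual is almost entirely zeros. This is the inter-frame redundancy that MJPEG ignores and that predicted frames exploit.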
P-frames
- encoded relative to the contents of either:
  - a preceding I-frame, OR
  - a preceding P-frame
- use a combination of motion estimation and motion compensation
  - much higher levels of compression than I-frames (20:1 to 30:1)
- the number of P-frames between successive I-frames is limited, since any errors in a P-frame are propagated to the frames predicted from it
- the number of frames between a P-frame and the preceding I- or P-frame is called the prediction span, M (typically 1 to 3)

Motion Estimation
- small segments of two consecutive frames are compared
  - if a difference is found, neighbouring segments are searched
  - only a few neighbours are searched
- B-frames: contents predicted using search regions in both past and future frames
  - highest level of compression (30:1 to 50:1)
  - do not propagate errors

[Figure: example encoded frame sequences. (a) Prediction: I P P I P P ..., with M = 1, N = 3. (b) Bidirectional predictions: I B B P B B I ..., with M = 3, N = 6. M = prediction span; N = group of pictures (GOP) span.]

Decoding Received Information
- I-frames are decoded immediately
- P-frames are decoded and then combined with the decoded contents of the preceding I- or P-frame
- B-frames are decoded and then combined with the decoded contents of both the preceding I- or P-frame and the succeeding I- or P-frame
- the order of encoding and transmission is therefore changed, because a B-frame cannot be decoded until its succeeding I- or P-frame has arrived:
  - display order: I B B P B B P B B I ...
  - encoding/transmission order: I P B B P B B I B B P B B ...

Other Types of Frame
- PB-frames
  - not really a new frame type
  - the way two neighbouring P- and B-frames are encoded as if they were a single frame
- D-frames
  - for use in video-on-demand applications
  - highly compressed, low-resolution frames
  - the sequence of frames is decoded at speed during fast-forward/rewind

Compensation
- the digitized contents of the Y matrix are divided into a two-dimensional matrix of 16 x 16 pixel blocks, each called a macroblock
- each macroblock has an address
- to encode a P-frame, the contents of each macroblock in the target frame are compared, pixel by pixel, with the contents of the corresponding macroblock in the preceding I- or P-frame (the reference frame)
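The frame patterns in the figure and the reordering described under Decoding Received Information can be reproduced with a short sketch. It assumes the simple regular structure shown in the slides: an I-frame every N frames, an I- or P-frame (anchor) every M frames, B-frames in between, and N a multiple of M. The function names are hypothetical.

```python
def display_pattern(num_frames, n, m):
    """Frame types in display order: an I-frame every n frames (GOP span),
    an anchor (I or P) every m frames (prediction span), B-frames between.
    Assumes n is a multiple of m."""
    types = []
    for i in range(num_frames):
        if i % n == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return types


def transmission_order(types):
    """Reorder so each run of B-frames is sent after the anchor that follows
    it in display order. Returns (frame_type, display_index) pairs."""
    out = []
    pending_b = []
    for idx, t in enumerate(types):
        if t == "B":
            pending_b.append((t, idx))   # must wait for the next anchor
        else:
            out.append((t, idx))         # send the anchor first...
            out.extend(pending_b)        # ...then the B-frames that need it
            pending_b = []
    out.extend(pending_b)  # trailing B-frames if the clip ends without an anchor
    return out


if __name__ == "__main__":
    types = display_pattern(13, n=9, m=3)
    print("display:     ", " ".join(types))
    print("transmission:", " ".join(t for t, _ in transmission_order(types)))
```

With N = 9 and M = 3 this prints the display order I B B P B B P B B I B B P and the transmission order I P B B P B B I B B P B B, matching the example on the slide. `display_pattern(6, n=3, m=1)` and `display_pattern(7, n=6, m=3)` give the two sequences in the figure.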
Compensation (2)
- if a close match is found, only the address of the macroblock is encoded
- if not, the search is extended to cover an area around the macroblock in the reference frame
  - normally only the contents of the Y matrix are used in the search
  - a match is found if the mean of the absolute errors over all pixel positions in the difference macroblock is less than a given threshold

Compensation (3)
- if a close match is found, two parameters are encoded:
  - motion vector: the (x, y) offset between the macroblock being encoded and the location of the matching block of pixels in the reference frame, to single-pixel resolution
  - prediction error: three matrices (Y, Cb, Cr) containing the difference values between those in the target macroblock and the matching set of pixels in the search area

Compensation (4)
- motion vectors are encoded using differential encoding
  - the resulting codewords are Huffman encoded
- difference matrices are encoded as for I-frames (DCT, quantization, entropy encoding)
- if no match is found at all, the macroblock is encoded independently, in the same way as macroblocks in I-frames

H.261
- compression standard defined by the ITU-T for the provision of video telephony and videoconferencing services over ISDN (64 kbps)
- CIF is used for videoconferencing and quarter CIF (QCIF) for video telephony
- each frame is divided into macroblocks of 16 x 16 pixels
  - therefore, the horizontal resolution is reduced from 360 to 352 pixels, the largest multiple of 16 that fits (22 x 16)

H.261 (2)
- only I- and P-frames are used
- three P-frames between each pair of I-frames
- the start of each new encoded video frame is indicated by the picture start code

H.263
- defined by the ITU-T for use in video applications over wireless networks and the PSTN, e.g.:
  - video telephony, videoconferencing, security surveillance, interactive games playing
  - real-time applications over a modem, therefore 28.8 kbps to 56 kbps
- based on H.261, but H.261 gives poor picture quality below 64 kbps, so H.263 is more advanced
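The matching procedure on the Compensation slides can be sketched as follows. This is a minimal sketch, assuming frames are 2-D lists of Y values, an exhaustive ("full") search within a square window, and an illustrative threshold; the slides do not specify the search strategy or threshold, and the function names and defaults are hypothetical.

```python
def mae(target, ref, bx, by, dx, dy, size):
    """Mean absolute error between the size x size block whose top-left corner
    is (bx, by) in the target frame and the block displaced by (dx, dy) in the
    reference frame. Frames are 2-D lists of Y values indexed [row][column]."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(target[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (size * size)


def find_match(target, ref, bx, by, size=16, search=7, threshold=2.0):
    """Return (dx, dy, error) for the matching block, or None if no candidate
    has a mean absolute error below the threshold."""
    height, width = len(ref), len(ref[0])
    # 1. Compare with the co-located macroblock in the reference frame.
    err = mae(target, ref, bx, by, 0, 0, size)
    if err < threshold:
        return (0, 0, err)
    # 2. Otherwise try every offset within +/- search pixels (full search).
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + size <= width and
                      0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue  # candidate block would fall outside the frame
            err = mae(target, ref, bx, by, dx, dy, size)
            if best is None or err < best[2]:
                best = (dx, dy, err)
    if best is not None and best[2] < threshold:
        return best
    return None  # no match: code the macroblock as in an I-frame


def differential_mvs(vectors):
    """Replace each motion vector by its difference from the previous one;
    the results would then be Huffman coded."""
    prev_dx, prev_dy = 0, 0
    out = []
    for dx, dy in vectors:
        out.append((dx - prev_dx, dy - prev_dy))
        prev_dx, prev_dy = dx, dy
    return out
```

With `search=7` a full search evaluates up to (2 x 7 + 1)^2 = 225 candidate positions per macroblock. This cost is why, as noted under Motion Estimation, practical encoders search only a few neighbours. Neighbouring macroblocks tend to move together, so differential encoding turns vectors such as (1, 0), (1, 0), (2, -1) into the smaller values (1, 0), (0, 0), (1, -1).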
H.263 (2)
- QCIF and sub-QCIF are used
  - horizontal resolution is reduced
- uses I-, P- and B-frames
- also, neighbouring pairs of P- and B-frames can be encoded as a single entity, a PB-frame
  - reduces encoding overheads
  - increases the frame rate

H.263 (3)
- other mechanisms used:
  - unrestricted motion vectors
  - error resilience:
    - error tracking
    - independent segment decoding
    - reference picture selection

Moving Picture Experts Group (MPEG)
- formed by the ISO
- standards relating to the use of video with sound
- MPEG-1
  - video resolution based on SIF
  - for storage of VHS-quality audio and video on CD-ROM at bit rates up to 1.5 Mbps

MPEG (2)
- MPEG-2
  - for recording and transmission of studio-quality audio and video
- MPEG-4
  - initially for applications similar to those of H.263 (very low bit rate channels)
  - then expanded for interactive multimedia applications over the Internet

MPEG (3)
- MPEG-7
  - concerned with describing the structure and features of the content of the (compressed) multimedia information produced by the different standards
  - the resulting descriptions are used in search engines
- MPEG-3
  - originally intended to focus on HDTV
  - subsequently incorporated into MPEG-2

MPEG-1
- compression algorithm similar to H.261
- horizontal resolution is reduced
- uses I-frames only; or I- and P-frames only; or I-, P- and B-frames
- no D-frames supported
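Several of the standards above reduce the horizontal resolution so that each frame divides into a whole number of 16 x 16 macroblocks. The sketch below checks the luminance resolutions of the formats named in these slides (CIF, QCIF, sub-QCIF, SIF). It assumes the commonly quoted luminance dimensions for each format. The helper names are hypothetical.

```python
MACROBLOCK = 16  # macroblock size in pixels (luminance)

# Luminance resolutions (width, height) of the formats named in the slides.
FORMATS = {
    "CIF": (352, 288),
    "QCIF": (176, 144),
    "sub-QCIF": (128, 96),
    "SIF (525-line)": (352, 240),
    "SIF (625-line)": (352, 288),
}


def usable_width(width, mb=MACROBLOCK):
    """Largest multiple of the macroblock size not exceeding width."""
    return width - width % mb


def macroblock_grid(width, height, mb=MACROBLOCK):
    """(columns, rows) of macroblocks; both dimensions must be multiples of mb."""
    if width % mb or height % mb:
        raise ValueError(f"{width}x{height} is not a whole number of {mb}x{mb} macroblocks")
    return width // mb, height // mb


if __name__ == "__main__":
    print("360 pixels ->", usable_width(360))
    for name, (w, h) in FORMATS.items():
        cols, rows = macroblock_grid(w, h)
        print(f"{name:15} {w}x{h}: {cols} x {rows} = {cols * rows} macroblocks")
```

A 360-pixel line would leave half a macroblock over (360 / 16 = 22.5), so it is cut to 352 = 22 x 16. A CIF frame then holds 22 x 18 = 396 macroblocks, and QCIF 11 x 9 = 99.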
MPEG-2
- supports four levels of video resolution:
  - low, main, high 1440 and high
- five profiles are associated with each level:
  - simple, main, spatial resolution, quantization accuracy, and high

MPEG-4
- has content-based functionalities
- before being compressed, each scene is defined in the form of a background and one or more foreground audiovisual objects (AVOs)
  - these are defined in the form of one or more video objects and/or audio objects
    - each object has a separate object descriptor
    - the language used to describe and modify objects is called the binary format for scenes (BIFS)

MPEG-4 (2)
- the composition of a scene in terms of AVOs is defined in a scene descriptor
- each video frame can be segmented into a number of video object planes (VOPs)
  - each VOP corresponds to an AVO of interest
  - each is encoded separately, based on shape, motion and texture
  - the resulting bitstreams are multiplexed together for transmission, along with the related object and scene descriptor information

Summary
- video compression techniques
- standards:
  - H.261
  - H.263
  - MPEG
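The final MPEG-4 step, multiplexing separately encoded object bitstreams with their descriptors, can be pictured with a toy multiplexer. This is only a conceptual sketch under stated assumptions. It does not follow the real MPEG-4 Systems packet syntax, and the `Packet` type, stream labels and function names are all hypothetical.

```python
from collections import namedtuple
from itertools import zip_longest

# Hypothetical packet: stream_id is "scene", "od:<object>" or the object's
# name; payload is the bytes carried for that stream.
Packet = namedtuple("Packet", ["stream_id", "payload"])


def multiplex(scene_descriptor, object_descriptors, object_streams):
    """Toy multiplexer: scene descriptor first, then one object descriptor per
    object, then the per-object (VOP) bitstreams interleaved chunk by chunk."""
    packets = [Packet("scene", scene_descriptor)]
    for name, descriptor in object_descriptors.items():
        packets.append(Packet("od:" + name, descriptor))
    names = list(object_streams)
    for chunks in zip_longest(*object_streams.values()):
        for name, chunk in zip(names, chunks):
            if chunk is not None:  # None means this object's stream has ended
                packets.append(Packet(name, chunk))
    return packets


def demultiplex(packets):
    """Regroup packets by stream_id, preserving their order."""
    streams = {}
    for packet in packets:
        streams.setdefault(packet.stream_id, []).append(packet.payload)
    return streams


if __name__ == "__main__":
    packets = multiplex(
        b"<scene: background + presenter>",
        {"background": b"<od bg>", "presenter": b"<od fg>"},
        {"background": [b"bg-vop-0"], "presenter": [b"fg-vop-0", b"fg-vop-1"]},
    )
    for p in packets:
        print(p.stream_id, p.payload)
```

The design point it illustrates is that each object keeps its own stream. A receiver can therefore regroup the packets per object with `demultiplex` and decode, replace or drop individual objects, which is what gives MPEG-4 its content-based functionality.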