UMCP ENEE631 Slides (created by M.Wu 2004)

Basics on Video Communications and Other Video Coding Approaches/Standards
Spring '06 Instructor: K. J. Ray Liu
ECE Department, Univ. of Maryland, College Park
ENEE631 Digital Image Processing (Spring'06), Lec20 Video Coding (3)

Quick Review
A Few Basics on Video Acquisition, Display, Analog & Digital Formats

Video Camera
- Frame-by-frame capturing
- CCD sensors (Charge-Coupled Devices)
  - 2-D array of solid-state sensors
  - Each sensor corresponds to a pixel
  - Stored in a buffer and sequentially read out
  - Widely used, small and light
- CMOS sensors
  - Each sensor is a transistor

Video Display
- CRT (Cathode Ray Tube)
  - Large dynamic range
  - Bulky for large displays: CRT physical depth has to be similar to screen width
- LCD
  - Flat-panel display
  - Uses an electric field to change the optical properties, and hence the brightness/color, of the liquid crystal
  - The electric field is generated
    - by an array of transistors: active-matrix thin-film transistors
    - by plasma
  - Active-matrix display (also known as TFT) has a transistor located at each pixel, allowing the display to be switched more frequently and with less current to control pixel luminance
  - Passive-matrix LCD has a grid of conductors with pixels located at the grid intersections

Composite vs. Component Video
- Component video
  - Three separate signals for tristimulus color representation or luminance-chrominance representation
  - Pro: higher quality
  - Con: needs high bandwidth and synchronization
- Composite video
  - Multiplexed into a single signal
  - Historical reason: transmitting color TV through a monochrome channel
  - Pro: saves bandwidth
  - Con: cross talk
- S-video: luminance signal + single multiplexed chrominance signal

Analog Video Raster
- Line-by-line raster scan
  - Represents the image frame line by line with a 1-D analog waveform
  - Synchronization signals for horizontal and vertical retrace

Forming Picture on TV Tube (Monochrome)
(Figure: from B. Liu, EE330 S'01, Princeton)

How Many TV Lines?
(Figure: from B. Liu, EE330 S'01, Princeton)
- Determined by the spatial frequency response of the HVS
- Two dots cannot be resolved if viewing distance > 2000 x their separation (~0.03 degree viewing angle)
- How many lines? N = 500 for D = 4H (viewing distance D, picture height H)
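
The arithmetic behind "N = 500 for D = 4H" on the "How Many TV Lines?" slide can be checked with a short sketch. It assumes the slide's rule of thumb (two dots merge once the viewing distance exceeds 2000 times their separation); the function names are illustrative and not part of the lecture material.

```python
import math


def resolvable_lines(distance_over_height, distance_to_separation=2000.0):
    """Number of just-resolvable lines across the picture height H.

    Assumption (from the slide's rule of thumb): the finest useful line
    spacing is D / distance_to_separation, so
    N = H / (D / 2000) = 2000 / (D / H).
    """
    return distance_to_separation / distance_over_height


def acuity_angle_degrees(distance_to_separation=2000.0):
    """Viewing angle subtended by the smallest resolvable separation."""
    return math.degrees(math.atan(1.0 / distance_to_separation))


print(resolvable_lines(4.0))             # 500.0
print(round(acuity_angle_degrees(), 3))  # 0.029
```

Sitting closer (smaller D/H) raises the number of lines the eye can use, which is one way to read the motivation for higher-resolution formats.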

Review: Progressive vs. Interlaced Scan
(Figure: from B. Liu, EE330 S'01, Princeton)

Analog Color TV Systems
- Historical notes: the color TV system had to be compatible with the earlier monochrome TV system
- 3 formats
  - NTSC ~ North America + Japan/Taiwan
  - PAL ~ Western Europe + Asia (China) + Middle East
  - SECAM ~ Eastern Europe + France
- What format is used in your home country?
(Figure: from Wang's Preprint, Fig. 1.5)

Comparison of Three Analog TV Systems (from Wang's Preprint)
- Spatial and temporal resolution
- Color coordinate
- Signal bandwidth
- Multiplexing of luminance, chrominance, and audio

NTSC
- 4:3 aspect ratio (width:height)
- 525 lines/frame, 2:1 interlace at field rate 59.94 Hz
  - 483 active lines per frame; vertical retrace takes the time of 9 lines; the rest carry broadcaster's info such as closed captions
- YIQ color coordinate for transmission
  - RGB primaries slightly different from PAL
  - Orthogonal chrominance: I ~ orange-to-cyan; Q ~ green-to-purple (needs less bandwidth)
- Multiplexing over 6 MHz total bandwidth
  - Artifacts due to cross talk between luminance and chrominance
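
Two of the NTSC numbers above lend themselves to a quick sketch: the line rate implied by 525 lines/frame at a 59.94 Hz field rate, and the RGB-to-YIQ conversion. The matrix coefficients below are commonly quoted approximate values and are an assumption here; the slides do not list them. Function names are illustrative.

```python
def ntsc_line_rate_hz(lines_per_frame=525, field_rate_hz=59.94):
    """Lines scanned per second: 2:1 interlace means two fields per frame."""
    frame_rate_hz = field_rate_hz / 2
    return lines_per_frame * frame_rate_hz


# Commonly quoted approximate RGB -> YIQ coefficients (assumption, not from
# the slides). Each chrominance row sums to zero, so gray has I = Q = 0.
YIQ_MATRIX = (
    (0.299, 0.587, 0.114),    # Y: luminance
    (0.596, -0.274, -0.322),  # I: orange-to-cyan axis
    (0.211, -0.523, 0.312),   # Q: green-to-purple axis
)


def rgb_to_yiq(r, g, b):
    """Convert one RGB triple (components in [0, 1]) to (Y, I, Q)."""
    return tuple(cr * r + cg * g + cb * b for cr, cg, cb in YIQ_MATRIX)


print(ntsc_line_rate_hz())        # about 15734 lines per second
print(rgb_to_yiq(1.0, 1.0, 1.0))  # white: Y close to 1, I and Q close to 0
print(rgb_to_yiq(1.0, 0.5, 0.0))  # orange: positive I component
```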

NTSC 6 MHz Bandwidth
(Figure: from Wang's Preprint, Fig. 1.6(b))

Analog Video Recording
(Table: comparison of common formats, from Wang's Preprint, Table 1.2)

Digital Video Formats
- ITU-R BT.601 recommendation (ITU-R: International Telecommunication Union, Radiocommunication Sector)
- Downsampled chrominance: Y Cb Cr coordinate and four subsampling formats
(Figure: Wang's Preprint, Fig. 1.8)
(Table: from Wang's Preprint, Table 1.3)
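
To see what chrominance downsampling buys, the sketch below computes the uncompressed bit rate of the active picture for the four usual Y Cb Cr subsampling formats. The 720x480 at 30 frames/s, 8 bits/sample example is an assumed illustration (active pixels only, no blanking), and the names are hypothetical.

```python
from fractions import Fraction

# Chroma samples per luma sample, for EACH of Cb and Cr.
CHROMA_FRACTION = {
    "4:4:4": Fraction(1, 1),  # no subsampling
    "4:2:2": Fraction(1, 2),  # halved horizontally
    "4:1:1": Fraction(1, 4),  # quartered horizontally
    "4:2:0": Fraction(1, 4),  # halved horizontally and vertically
}


def raw_bit_rate(width, height, fps, fmt, bits_per_sample=8):
    """Uncompressed bits per second for the active picture area."""
    samples_per_pixel = 1 + 2 * CHROMA_FRACTION[fmt]  # Y plus Cb and Cr
    return int(width * height * fps * bits_per_sample * samples_per_pixel)


for fmt in CHROMA_FRACTION:
    mbps = raw_bit_rate(720, 480, 30, fmt) / 1e6
    print(fmt, round(mbps, 1), "Mbps")
```

Even with 4:2:0, the raw rate stays above 100 Mbps in this example, which is the gap that the coding standards in the rest of the lecture are meant to close.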

Generations of Video Coding
(Figure: from R. Liu Seminar Course '00 @ UMCP)

Resource Background and Motivation on Multimedia Coding / Communications

Channel Bandwidth
(Figure: from R. Liu Seminar Course '00 @ UMCP)

Storage Capacity
(Figure: from R. Liu Seminar Course '00 @ UMCP)

Source Video Formats
(Figure: from R. Liu Seminar Course '00 @ UMCP)

Application Requirements
(Figure: from R. Liu Seminar Course '00 @ UMCP)

Performance Tradeoff for Video Coding
(Figure: from R. Liu's Handbook, Fig. 1.2; MOS ~ 5-point mean opinion score scale of bad, poor, fair, good, excellent)

Other Standards and Considerations for Digital Video Coding

H.26x for Video Telephony
- Remote face-to-face communication: a dream for years
- H.26x video coding targets low bit rates
  - Through ISDN or regular analog telephone line ~ on the order of 64 kbps
  - Needs roughly symmetric complexity on encoder and decoder
- H.261 (early 1990s)
  - Similar to simplified MPEG-1 ~ block-based DCT/MC hybrid coder
  - Integer-pel motion compensation with I/P frames only ~ no B frames
  - Restricted picture size/fps format and M.V. range
- H.263 (mid 1990s) and H.263+/H.263++ (late 1990s)
  - Support half-pel motion compensation & many options for improvement
- H.264 (latest, 2001-): also known as H.26L / JVT / MPEG-4 Part 10
  - Hybrid coding framework with many advanced techniques
  - Focuses on greatly improving compression ratio at the cost of complexity

MPEG-2
- Extends MPEG-1
- Targets high-resolution, high-bit-rate applications
  - Digital video broadcasting, HDTV; also used for DVD
- Supports scalability
- Supports interlaced video
  - Frame pictures vs. field pictures
  - New prediction modes for motion compensation related to interlaced video: use previously encoded fields to do M.E.-M.C.

Scalability in Video Codecs
- Scalability: provide different quality levels in a single stream
  - Stack up more bits on the base layer to provide improved quality
- Possible ways of achieving scalability
  - SNR scalability ~ multiple-quality video services: basic vs. premium quality
  - Spatial scalability ~ multiple-dimension displays: display on PDA vs. PC vs. super-resolution display
  - Temporal scalability ~ multiple frame rates
- Layered coding concept facilitates:
  - Unequal error protection
  - Efficient use of resources
  - Different needs from customers
  - Multiple services

SNR Scalability (from R. Liu Seminar Course @ UMCP)
- Two layers with the same spatio-temporal resolution but different qualities
- (Block diagram: video in; base-layer encoder; base-layer decoder; +/- difference; enhancement-layer encoder; base layer and enhancement layer into a multiplexer; output)
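
The SNR scalability idea above (a coarse base layer plus an enhancement layer that codes what the base layer missed) can be sketched with two uniform quantizers. This is a simplified illustration on raw sample values, assuming step sizes of 16 and 4; a real codec would quantize DCT coefficients and entropy-code the indices. All names are hypothetical.

```python
def quantize(values, step):
    """Uniform quantizer: map each value to the index of its nearest level."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Reconstruct values from quantizer indices."""
    return [i * step for i in indices]


def encode_snr_scalable(samples, base_step=16, enh_step=4):
    """Return (base-layer indices, enhancement-layer indices)."""
    base_idx = quantize(samples, base_step)
    # The encoder runs the base-layer decoder to see what the receiver gets.
    base_rec = dequantize(base_idx, base_step)
    residual = [s - r for s, r in zip(samples, base_rec)]
    enh_idx = quantize(residual, enh_step)  # finer step on the residual
    return base_idx, enh_idx


def decode_snr_scalable(base_idx, enh_idx=None, base_step=16, enh_step=4):
    """Decode the base layer alone, or refine it with the enhancement layer."""
    rec = dequantize(base_idx, base_step)
    if enh_idx is not None:
        refinement = dequantize(enh_idx, enh_step)
        rec = [r + e for r, e in zip(rec, refinement)]
    return rec


samples = [3, 37, 100, 200, 251]
base, enh = encode_snr_scalable(samples)
print(decode_snr_scalable(base))       # basic quality
print(decode_snr_scalable(base, enh))  # premium quality
```

A receiver that only gets the base layer still decodes a usable signal, which is what makes unequal error protection of the two layers attractive.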

Spatial Scalability (from R. Liu Seminar Course @ UMCP)
- Two layers with different spatial resolution
- (Block diagram: video in; down-sampler; base-layer encoder; base-layer decoder; up-sampler; +/- difference; enhancement-layer encoder; base layer and enhancement layer into a multiplexer; output)

Temporal Scalability (from R. Liu Seminar Course @ UMCP)
- Enhancement layer carries additional frames at the same spatial resolution
- (Block diagram: video in; temporal demux; base-layer video in; base-layer encoder; base-layer decoder; base-layer decoded video out; enhancement-layer video in; enhancement-layer encoder; base layer and enhancement layer into a multiplexer; output)

MPEG-4
- Many functionalities targeting a variety of applications
- Introduced object-based coding strategy
  - For better support of interactive applications & graphics/animation video
  - Requires the encoder to perform object segmentation ~ difficult for general applications
- Introduced error-resilient coding techniques
- Streaming video profile for wireless multimedia applications
- Part 10 converged into H.264
  - Focused on improving compression ratio and error resilience
  - Sticks with the hybrid coding framework

Object-Based Coding in MPEG-4 (revised from R. Liu Seminar Course @ UMCP)
- Interactive functionalities
- Higher compression efficiency by separately handling
  - Moving objects
  - Unchanged background
  - New regions
  - M.C.-failure regions
  => Sprite encoding
- Object segmentation needed (not easy)
  - Based on color, motion, edge, texture, etc.
  - Possible for targeted applications
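
The spatial and temporal scalability slides above can be illustrated with a small layered-coding sketch. It assumes a 1-D row of pixels, pair averaging as the down-sampler, sample repetition as the up-sampler, and no quantization (so the two layers together are lossless); the temporal demux simply sends even-indexed frames to the base layer. These choices and names are illustrative assumptions, not the scheme of any particular standard.

```python
def downsample2(row):
    """Down-sampler: average adjacent pairs (assumes an even-length row)."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]


def upsample2(row):
    """Up-sampler: repeat each sample twice."""
    out = []
    for v in row:
        out.extend([v, v])
    return out


def spatial_layers(row):
    """Split a row into a low-resolution base layer and a residual layer."""
    base = downsample2(row)
    prediction = upsample2(base)  # what a base-only decoder could display
    enhancement = [x - p for x, p in zip(row, prediction)]
    return base, enhancement


def reconstruct_full_resolution(base, enhancement):
    """Combine both layers to recover the full-resolution row."""
    return [p + e for p, e in zip(upsample2(base), enhancement)]


def temporal_demux(frames):
    """Base layer gets even-indexed frames, enhancement layer the rest."""
    return frames[0::2], frames[1::2]


row = [10, 12, 20, 24]
base, enh = spatial_layers(row)
print(base, enh)
print(reconstruct_full_resolution(base, enh))
print(temporal_demux(["f0", "f1", "f2", "f3", "f4"]))
```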

Object-Based Coding in MPEG-4 (cont'd)
(Figure: from Wang's book preprint, Fig. 13.30)

Model-Based Video Coding
(Figure: from R. Liu Seminar Course @ UMCP)

Analysis-Synthesis Coding
(Figure: from R. Liu Seminar Course @ UMCP)

Some Coding Models
(Figure: from R. Liu Seminar Course @ UMCP)

MPEG-7: Multimedia Content Description Interface
(Figure: from MPEG-7 Document N4031, March 2001)
- Not a video coding/compression standard like the previous MPEG standards
- Emphasizes how to describe video content for efficient indexing, search, and retrieval
- Standardizes the description mechanism of content
  - Descriptor, Description Scheme, Description Definition Language
- Examples of MPEG-7 visual descriptors: Color, Texture, Shape

Summary
- Scalable coding
- Standards evolved from or similar to MPEG-1: MPEG-2, H.26x
- Brief intro. on model-based coding
- Object-based video coding & MPEG-4
- Additional MPEG-4 activities
  - Error resilience
  - Intellectual property management/protection
- What is after MPEG-4? MPEG-7 for facilitating image/video search and indexing

Reading Assignment
- Readings
  - Wang's book Chapt. 13, Sec. 11.1, Sec. 10.5
  - [Electronic Handout] R. Liu's Handbook Chapt. 1-3
  - Chapter 7 Data Compression (handout)
    - Sec. 7.6 => H.261 & H.263
    - Sec. 7.7.5 & 7.7.6 => MPEG-4 & MPEG-7
  - Tutorial on MPEG Video Coding (handout), IEEE Signal Processing Magazine, Sept. 1997

UMCP ENEE408G Slides (created by M.Wu 2002)
Video Content Analysis

Introduction to Video Content Analysis
- Teach computers to understand video content
  - Define features that a computer can learn to measure and compare
    - color (RGB values or other color coordinates)
    - motion (magnitude and direction)
    - shape (contours)
    - texture and patterns
  - Give example correspondences so that the computer can learn to build connections between features & higher-level semantics/concepts
    - statistical classification and recognition techniques
- Video understanding
  - Break a video sequence into chunks, each with consistent content ~ shot
  - Group similar shots into scenes that represent certain events
  - Describe connections among scenes via story boards or scene graphs
  - Associate each shot/scene with representative features/semantics for future query

Video Understanding (step-1)
- Break a video sequence into chunks, each with consistent content ~ shot
(Figure: from Yeung-Yeo-Liu: STG (Princeton))

Video Understanding (step-2)
- Group similar shots into scenes
(Figure: from Yeung-Yeo-Liu: STG (Princeton))

Video Understanding (step-3)
- Describe connections among scenes via story boards or scene graphs
(Figure: from Yeung-Yeo-Liu: STG (Princeton))

Video Temporal Segmentation
- A first step toward video content understanding
- Two types of transitions
  - Cut ~ abrupt transition
  - Gradual transition: fade out and fade in; dissolve; wipe
- Detecting transitions
  - Detecting a cut is relatively easy ~ check frame-wise difference
  - Detecting dissolve and fade by checking linearity: $f_0 \, (1 - t/T) + f_1 \, t/T$, for $0 \le t \le T$
  - Detecting wipe ~ more difficult; via projection, edge pattern, or linearity of color histogram

Types of Transitions (from talks by Joyce-Liu (Princeton))
- (Figure: transition types offered by Adobe Premiere)
- See also transition demos provided by PowerPoint
- Video transition collection (Rob Joyce): www.ee.princeton.edu/~robjoyce/research/transitions/

Examples of Wipes

Compressed-Domain Processing
- Use I & P frames only to reduce computation and to enhance robustness in scene change detection
  - I b b P b b P b b P b b I b b P
- Working in the compressed domain
  - Process video by doing only partial decoding (inverse VLC, etc.) without full decoding (IDCT) to save computation
  - A low-resolution version already provides enough information for transition detection ~ DC-image
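
The cut-detection rule on the "Video Temporal Segmentation" slide (check the frame-wise difference) and the linear dissolve model can be sketched as follows. Frames are assumed to be flat lists of luminance values, and the threshold of 30 is an arbitrary illustrative choice; neither comes from the slides.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute pixel difference between two equal-size frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def detect_cuts(frames, threshold=30.0):
    """Return indices k such that frames[k] starts a new shot (abrupt cut)."""
    return [
        k
        for k in range(1, len(frames))
        if mean_abs_diff(frames[k - 1], frames[k]) > threshold
    ]


def dissolve(f0, f1, t, T):
    """Frame at time t of a dissolve from f0 to f1: f0*(1 - t/T) + f1*t/T."""
    w = t / T
    return [(1 - w) * a + w * b for a, b in zip(f0, f1)]


# A cut between the second and third frame shows up as one large difference.
shots = [[10] * 4, [11] * 4, [200] * 4, [201] * 4]
print(detect_cuts(shots))  # [2]

# A 10-step dissolve changes by only 10 per frame, so no cut is flagged.
start, end = [0] * 4, [100] * 4
gradual = [dissolve(start, end, t, 10) for t in range(11)]
print(detect_cuts(gradual))  # []
```

The second example shows why adjacent-frame differences alone miss gradual transitions, which motivates the linearity checks listed on the slide.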

DC Image
- Put the DC of each block together
- Already contains most information of the video
- DC frame example (figure from Joyce-Liu (Princeton))

Fast Extraction of DC Image from MPEG-1
- I frame: take the DC coeff. from each block and put them together
- P/B frame
  - Fast approximation of the reference block's DC
  - Add the DC of the motion-compensation residue ~ recall DCT is a linear transform
- (Figure: reference block R overlapping coded blocks 1-4, and current block C)

\[ [\mathrm{DCT}(P_{\mathrm{ref}})]_{00} \approx \sum_{i=1}^{4} \frac{h_i w_i}{64} \, [\mathrm{DCT}(P_i)]_{00} \]

\[ [\mathrm{DCT}(P_{\mathrm{cur}})]_{00} = [\mathrm{DCT}(P_{\mathrm{ref}})]_{00} + [\mathrm{DCT}(P_{\mathrm{diff}})]_{00} \]

Here $h_i$ and $w_i$ are the height and width of the overlap between the reference block and block $P_i$.

Compressed-Domain Scene Change Detection
- Compare nearby frames
  - Take the pixel-wise difference of nearby DC-frames
  - Or take the pixel-wise difference of every N frames to accumulate more changes => useful for detecting gradual transitions
- Observe the pixel-wise difference for different frame pairs
  - Peaks @ cuts, and plateaus @ gradual transitions
(Figure: from Yeo-Liu CSVT'95 paper)

Scene Change Detection (cont'd)
(Figure: from Yeo-Liu CSVT'95 paper)
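
The DC-image idea above can be sketched in the pixel domain: for an 8x8 orthonormal DCT the DC coefficient is 8 times the block mean, so a DC image is just the image of block means up to scale. The sketch below builds such an image and applies the overlap-weighted approximation of a reference block's DC. It works on decoded pixels purely for illustration (the point of the slides is that a real system reads the DC terms from the bitstream instead), and the function names are hypothetical.

```python
def dc_image(frame, block=8):
    """Block means of a frame given as a list of rows.

    Assumes both dimensions are multiples of `block`. Each output value is
    proportional to the DC coefficient of the corresponding block.
    """
    rows, cols = len(frame), len(frame[0])
    out = []
    for r in range(0, rows, block):
        out_row = []
        for c in range(0, cols, block):
            total = sum(
                frame[r + i][c + j] for i in range(block) for j in range(block)
            )
            out_row.append(total / (block * block))
        out.append(out_row)
    return out


def approx_reference_dc(neighbor_dcs, overlaps, block=8):
    """Approximate DC of a motion-compensated reference block.

    neighbor_dcs: DC values of the (up to four) coded blocks it overlaps.
    overlaps: matching (h_i, w_i) overlap heights and widths in pixels.
    """
    return sum(
        h * w / (block * block) * dc for dc, (h, w) in zip(neighbor_dcs, overlaps)
    )


def dc_frame_difference(dc_a, dc_b):
    """Sum of absolute pixel-wise differences between two DC images."""
    return sum(
        abs(a - b) for row_a, row_b in zip(dc_a, dc_b) for a, b in zip(row_a, row_b)
    )


# One row of two flat 8x8 blocks with values 10 and 50.
frame = [[10] * 8 + [50] * 8 for _ in range(8)]
print(dc_image(frame))  # [[10.0, 50.0]]

# Reference block shifted 4 pixels right: half over each block.
print(approx_reference_dc([10.0, 50.0], [(8, 4), (8, 4)]))  # 30.0
```

For flat blocks the approximation is exact, as in this example; for textured blocks it is only approximate because the AC terms are ignored.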

Dissolve: DC Frame Space (from talks by Joyce-Liu (Princeton))
- Dissolve: a linear combination of g and h
- Detect straight lines in DC frame space
  - correlation detection on triplets
- (Figure: DC frame space with axes Pixel 1, Pixel 2, Pixel 3; labels g_k, h_k, m, n; dissolve segment)

Wipe Detection
- Convert the 2-D problem to 1-D by projection
- Perform horizontal, vertical, and diagonal projections to detect diverse wipe types

Color Histogram
- What is a color histogram?
  - Count the # of pixels with the same color
  - Plot color value vs. corresponding pixel count
  - Similarly for luminance histogram
  - Gives an idea of the dominant color and color distribution
  - Ignores the exact spatial location of each color value
  - Useful in image and video analysis
- Color histogram can be used to:
  - Detect gradual shot transitions, esp. fancy wipes
  - Measure content similarity between images / video shots

Wipe Detection (cont'd) (from talks by Joyce-Liu (Princeton))
- More diverse and fancy wipes
- Linear change in color histogram
- (Figure: histogram space with axes Bin 1, Bin 2, Bin 3; labels G_k, H_k, m, n; wipe segment)
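
The color histogram and the "linear change in color histogram" observation above can be sketched as follows. The quantization into 4 bins per channel, the histogram-intersection similarity, and the least-squares linearity check are illustrative assumptions; they are not claimed to be the exact procedure from the Joyce-Liu talks.

```python
import math


def color_histogram(pixels, bins_per_channel=4, max_value=256):
    """Normalized histogram of (r, g, b) pixels; spatial position is ignored."""
    width = max_value // bins_per_channel
    hist = [0] * bins_per_channel ** 3
    count = 0
    for r, g, b in pixels:
        idx = ((r // width) * bins_per_channel + g // width) * bins_per_channel
        idx += b // width
        hist[idx] += 1
        count += 1
    return [h / count for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def linearity_fit(h_start, h_mid, h_end):
    """Least-squares fit h_mid ~ (1 - a) * h_start + a * h_end.

    Returns (a, residual). A small residual with 0 < a < 1 suggests that
    h_mid lies on the straight segment between the two endpoint histograms.
    """
    direction = [e - s for s, e in zip(h_start, h_end)]
    denom = sum(d * d for d in direction)
    if denom == 0:
        a = 0.0
    else:
        a = sum((m - s) * d for s, m, d in zip(h_start, h_mid, direction)) / denom
    residual = math.sqrt(
        sum((m - (s + a * d)) ** 2 for s, m, d in zip(h_start, h_mid, direction))
    )
    return a, residual


red_shot = color_histogram([(255, 0, 0)] * 4)
blue_shot = color_histogram([(0, 0, 255)] * 4)
half_wiped = color_histogram([(255, 0, 0)] * 2 + [(0, 0, 255)] * 2)

print(histogram_intersection(red_shot, blue_shot))    # 0.0
print(linearity_fit(red_shot, half_wiped, blue_shot))  # (0.5, 0.0)
```

Because the histogram discards pixel positions, the half-wiped frame gives the same result whatever the shape of the wipe boundary, which is why this cue suits the more diverse and fancy wipes.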