ECEN 5653/4653 RT Digital Media Systems Introduction to Sampling, Encoding, Decoding and Transport February 14, 2012 Sam Siewert
Digital Media = Digital Video and Audio
Overview: Introduction to Codecs (Encode/Decode for Digital Media). Digital Sampling: Basic Color Models; Color: Luminance/Chrominance; Audio: PCM (Pulse Code Modulation), A-to-D Resolution and Frequency; Audio Playback: DACs, Speakers, Channels. Digital Video Encoding: Pixel (POINT); Frame (XY Pixel Map or Grid); Sequence of Frames (Group of Pictures). Digital Media Systems: Encoding, Transport, Decoding; MPEG Basics and Standards.
Digital Video
Real-Time Video Codecs and Tools. Codec = Compression/Decompression (Encode/Decode). Basic Inputs to the Encoder (Original Uncompressed Data): POINT: RGB vs. YCrCb Color Encoding (4th Color?); MACRO BLOCK: Portion of a Frame (e.g. 8x8 Pixels); FRAME: Run Length Encoding and Huffman (Lossless); SEQUENCE: I-Frame Single-Frame Compression and Forward/Backward Difference Images, Group of Pictures (GoP). Uncompressed: Python Viewer, Irfanview (http://www.irfanview.com/) to Display Uncompressed PPM Sequences; PPM = Portable Pixmap, http://en.wikipedia.org/wiki/portable_pixmap. Open Source Options: http://www.theora.org/ (Stream over Raw TCP to VLC Media Player); http://mjpeg.sourceforge.net/. Leverage HW and SW MPEG Encoder/Decoder: Hauppauge WinTV-HVR-1600 (ATSC, NTSC, Clear QAM; MPEG2 Encode); VLC Media Player, http://en.wikipedia.org/wiki/vlc_media_player, http://www.videolan.org/vlc/; ffmpeg, http://en.wikipedia.org/wiki/ffmpeg, http://www.ffmpeg.org/.
POINT or PIXEL: RGB Color Model. RGB, 24-bit: 8 bits [0-255] sampled for each color band (x, y, z); each pixel is a 3-D vector in RGB space. [Figure: RGB color cube with corners Black, Red, Green, Blue, Cyan, Magenta, Yellow, and White; Opponent Colors]
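A minimal sketch of the 24-bit pixel as a 3-D vector: three 8-bit bands packed into one integer and unpacked again. Placing R in the high byte is an assumption (one common layout, not mandated by the slide); the cube corners are the eight labeled points in the figure.

```python
def pack_rgb24(r: int, g: int, b: int) -> int:
    """Pack three 8-bit bands [0-255] into one 24-bit value (assumed layout: R high, B low)."""
    for band in (r, g, b):
        if not 0 <= band <= 255:
            raise ValueError("each band must be in [0, 255]")
    return (r << 16) | (g << 8) | b

def unpack_rgb24(pixel: int) -> tuple[int, int, int]:
    """Recover the (R, G, B) vector from a packed 24-bit pixel."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF

# The eight corners of the RGB cube
CUBE_CORNERS = {
    "Black": (0, 0, 0), "Red": (255, 0, 0), "Green": (0, 255, 0), "Blue": (0, 0, 255),
    "Yellow": (255, 255, 0), "Cyan": (0, 255, 255), "Magenta": (255, 0, 255),
    "White": (255, 255, 255),
}

for name, rgb in CUBE_CORNERS.items():
    print(f"{name:8s} 0x{pack_rgb24(*rgb):06X}")
```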
Discussion: What Does the Eye See? Paper: "Seeing Forbidden Colors"; Ewald Hering (1872), Opponent Colors (R/G, Y/B): Red and Green are opponent colors (we can't see both simultaneously), and so are Yellow and Blue. Color Models: RGB Cube; HSV (Hue/Saturation/Value): Hue = similarity to R, G, Y, B; Saturation = color vs. brightness; Value: low = black, high = color. Luminance (Candela/Square-Meter): light passing through an area and forming a solid angle in a given direction; more precise than brightness. Candela = luminous intensity, i.e. power per steradian weighted by the eye's sensitivity (lumens/steradian; 1 cd = 1/683 W/sr at 540 THz). Chrominance (CrCb or UV in YCrCb or YUV): U = Blue - Luminance (Y), V = Red - Luminance (Y), each scaled. [Figure: wavelength spectrum (ROYGBIV), RGB cube, and HSV cylinder/cone]
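To connect the RGB cube to the HSV cylinder, a small sketch using Python's standard `colorsys` module, which works on floats in [0, 1]; scaling hue to degrees is a presentation choice made here, not part of the slide.

```python
import colorsys

def rgb8_to_hsv(r: int, g: int, b: int) -> tuple[float, float, float]:
    """Convert 8-bit RGB to HSV: hue in degrees, saturation and value in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360.0, s, v

print(rgb8_to_hsv(255, 0, 0))      # pure red: hue 0, fully saturated, full value
print(rgb8_to_hsv(0, 255, 0))      # pure green: hue about 120 degrees
print(rgb8_to_hsv(128, 128, 128))  # mid gray: zero saturation, value about 0.5
```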
YUV: ITU-R BT.601 Component Video Standard. Y is luma and UV carry the color information, designed so that a black-and-white TV (CRT) still displays a grayscale image from Y alone. Used by NTSC, PAL, SECAM (Standard Definition TV). RGB to YUV Conversion: Y = (0.299 * R) + (0.587 * G) + (0.114 * B); U = -(0.147 * R) - (0.289 * G) + (0.436 * B); V = (0.615 * R) - (0.515 * G) - (0.100 * B). YUV to RGB Conversion: R = Y + (1.140 * V); G = Y - (0.394 * U) - (0.581 * V); B = Y + (2.032 * U).
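A sketch of both conversions using exactly the coefficients above. Because the printed coefficients are rounded to three places, a round trip is close but not exact; gray inputs give U and V of about zero, which is what lets a black-and-white set use Y alone.

```python
def rgb_to_yuv(r: float, g: float, b: float) -> tuple[float, float, float]:
    """RGB -> YUV with the slide's coefficients (unclamped floats)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    return y, u, v

def yuv_to_rgb(y: float, u: float, v: float) -> tuple[float, float, float]:
    """YUV -> RGB with the slide's inverse coefficients."""
    r = y + 1.140 * v
    g = y - 0.394 * u - 0.581 * v
    b = y + 2.032 * u
    return r, g, b

y, u, v = rgb_to_yuv(128, 128, 128)
print(f"gray: Y={y:.1f} U={u:.3f} V={v:.3f}")        # chroma is (almost) zero
print([round(c, 2) for c in yuv_to_rgb(*rgb_to_yuv(200, 100, 50))])
```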
POINT: RGB to Grayscale. Encoding ranges from 36 bits down to 16 bits per pixel: 24-bit uses 8 bits per R, G, B; 16-bit uses 5:6:5 bits per R, G, B; 10-bit and 12-bit samples are used for HD color (e.g. HD-SDI, DCI, post production). Can be RGB or 4:2:0, 4:2:2, or 4:4:4 YUV encoded; 4:2:0 and 4:2:2 sub-sample color, 4:4:4 does not. A single color band from RGB is not true grayscale, but is useful for computer vision applications; some targets, like a laser pointer, are best seen in the red band or green band alone. GIMP uses a conversion to 8-bit luminance, Y = 0.3R + 0.59G + 0.11B: the weights sum to 1, so equal R, G, B values map to the same gray level, and they are unequal because the eye is most sensitive to green, then red, and then blue.
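A sketch of the 16-bit 5:6:5 format and the weighted luminance above. Truncating to the top bits when packing, and replicating high bits into the low bits when expanding back to 8 bits, are common choices assumed here; the luminance helper uses the weights quoted on the slide, not GIMP's actual source code.

```python
def pack_rgb565(r: int, g: int, b: int) -> int:
    """Keep the top 5 bits of R, 6 of G, 5 of B (truncation, so this step is lossy)."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def unpack_rgb565(p: int) -> tuple[int, int, int]:
    """Expand back to 8 bits per band by replicating the high bits into the low bits."""
    r5, g6, b5 = (p >> 11) & 0x1F, (p >> 5) & 0x3F, p & 0x1F
    return (r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2)

def weighted_luma(r: int, g: int, b: int) -> int:
    """8-bit luminance using the slide's weights 0.3R + 0.59G + 0.11B."""
    return round(0.3 * r + 0.59 * g + 0.11 * b)

print(hex(pack_rgb565(255, 128, 0)), unpack_rgb565(pack_rgb565(255, 128, 0)))
print(weighted_luma(0, 255, 0), weighted_luma(255, 0, 0), weighted_luma(0, 0, 255))
```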
POINT: R, G, or B Band Only vs. Balanced RGB [Figure: the same image shown in the R, G, and B bands alone and with balanced RGB]
POINT: YCrCb. An alternative to RGB is YUV (YCrCb), where Y is luminance and CrCb is chrominance; green is encoded across Y, Cr, and Cb rather than as a discrete sample. The following two sets of formulae are taken from Keith Jack's excellent book "Video Demystified" (ISBN 1-878707-09-4). RGB to YCrCb Conversion (for computers with RGB [0-255]): Y = (0.257 * R) + (0.504 * G) + (0.098 * B) + 16; Cr = (0.439 * R) - (0.368 * G) - (0.071 * B) + 128; Cb = -(0.148 * R) - (0.291 * G) + (0.439 * B) + 128. YCrCb to RGB Conversion: R = 1.164(Y - 16) + 1.596(Cr - 128); G = 1.164(Y - 16) - 0.813(Cr - 128) - 0.392(Cb - 128); B = 1.164(Y - 16) + 2.017(Cb - 128). In both cases, you have to clamp the output values to keep them in the [0-255] range.
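A sketch of the two conversions with the clamping step the slide calls for. Black maps to (16, 128, 128) and white to (235, 128, 128), which shows the offset-and-scaled range these formulas produce; the last line feeds in an out-of-range combination to show why the clamp matters.

```python
def clamp8(x: float) -> int:
    """Round and clamp a value into the 8-bit range [0, 255]."""
    return max(0, min(255, round(x)))

def rgb_to_ycrcb(r: int, g: int, b: int) -> tuple[int, int, int]:
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    return clamp8(y), clamp8(cr), clamp8(cb)

def ycrcb_to_rgb(y: int, cr: int, cb: int) -> tuple[int, int, int]:
    r = 1.164 * (y - 16) + 1.596 * (cr - 128)
    g = 1.164 * (y - 16) - 0.813 * (cr - 128) - 0.392 * (cb - 128)
    b = 1.164 * (y - 16) + 2.017 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)

print(rgb_to_ycrcb(0, 0, 0), rgb_to_ycrcb(255, 255, 255))
print(ycrcb_to_rgb(235, 128, 128))
print(ycrcb_to_rgb(255, 255, 128))   # R and B would exceed 255 without clamping
```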
POINT: YCrCb 4:4:4 24-bit/30-bit Format. For every Y sample in a scan-line, there is also one Cr and one Cb sample. Each Y (Y7:Y0), Cr (Cr7:Cr0), and Cb (Cb7:Cb0) sample is 8 bits, so there is no compression between RGB and YCrCb 4:4:4. For DCI and post production, each sample can be 10 bits (30 bits per pixel). [Figure: 320x240 sample grid, pixels 0 to 319 on the first scan-line through 76,480 to 76,799 on the last; legend: Y, Cr, and Cb sample / Y sample only]
POINT: YCrCb 4:2:2 16-bit Format. For every 2 Y samples in a scan-line, there is one Cr and one Cb sample. Each Y (Y7:Y0), Cr (Cr7:Cr0), and Cb (Cb7:Cb0) sample is 8 bits. Two RGB pixels = 48 bits, whereas two YCrCb pixels are 32 bits: 16 bits per pixel vs. 24 bits per pixel (a frame 1/3 smaller). [Figure: 320x240 sample grid, pixels 0 to 319 through 76,480 to 76,799, 48 bits to 32 bits per pixel pair; legend: Y, Cr, and Cb sample / Y sample only] Sample layout (each sample is bits 7:0): Pixel-0 = Y0, Cb0; Pixel-1 = Y1, Cr0; Pixel-2 = Y2, Cb1; Pixel-3 = Y3, Cr1; Pixel-4 = Y4, Cb2; Pixel-5 = Y5, Cr2.
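A sketch of packing one RGB scan-line into the 4:2:2 layout above (Y0 Cb0 Y1 Cr0 ...), using the Keith Jack conversion from the YCrCb slide. Forming each shared chroma sample by averaging the two pixels of a pair is an assumption; a simple encoder could also just take the first pixel's chroma.

```python
def _ycrcb(r, g, b):
    """RGB -> YCrCb (Keith Jack coefficients), unrounded."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    return y, cr, cb

def _u8(x):
    return max(0, min(255, round(x)))

def pack_422_line(pixels):
    """pixels: list of (R, G, B) for one scan-line with an even pixel count.
    Returns bytes laid out Y0 Cb0 Y1 Cr0 Y2 Cb1 Y3 Cr1 ... (16 bits per pixel)."""
    if len(pixels) % 2:
        raise ValueError("4:2:2 packing needs an even number of pixels per line")
    out = bytearray()
    for i in range(0, len(pixels), 2):
        y0, cr0, cb0 = _ycrcb(*pixels[i])
        y1, cr1, cb1 = _ycrcb(*pixels[i + 1])
        # Assumption: the pair's shared chroma is the average of both pixels.
        cb, cr = (cb0 + cb1) / 2, (cr0 + cr1) / 2
        out += bytes((_u8(y0), _u8(cb), _u8(y1), _u8(cr)))
    return bytes(out)

line = [(255, 255, 255)] * 320          # one 320-pixel white scan-line
packed = pack_422_line(line)
print(len(packed), "bytes vs", 320 * 3, "bytes for 24-bit RGB")   # 640 vs 960
```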
POINT: YCrCb 4:2:0 12-bit Format. For every 4 Y samples (a 2x2 block spanning two adjacent scan-lines), there is one Cr and one Cb sample. Each Y (Y7:Y0), Cr (Cr7:Cr0), and Cb (Cb7:Cb0) sample is 8 bits. Four RGB pixels = 96 bits, whereas four YCrCb pixels are 48 bits: 12 bits per pixel vs. 24 bits per pixel (half the frame size). [Figure: 320x240 sample grid, pixels 0 to 319 through 76,480 to 76,799, 96 bits to 48 bits per 4 pixels; legend: Y, Cr, and Cb sample / Y sample only]
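The frame-size savings on the last three slides reduce to bits per pixel times pixel count. This sketch computes them for the 320x240 frame used in the figures, and also shows the planar 4:2:0 view (full-size Y plane plus Cb and Cr planes at half width and half height), which is one common memory layout assumed here for illustration.

```python
# Bits per pixel for 8-bit samples under each scheme (from the slides)
BITS_PER_PIXEL = {"RGB 24-bit": 24, "4:4:4": 24, "4:2:2": 16, "4:2:0": 12}

def frame_bytes(width: int, height: int, scheme: str) -> int:
    """Uncompressed frame size in bytes for the given sampling scheme."""
    return width * height * BITS_PER_PIXEL[scheme] // 8

def plane_sizes_420(width: int, height: int) -> tuple[int, int, int]:
    """Planar 4:2:0: Y at full resolution, Cb and Cr at half width and half height."""
    y = width * height
    c = (width // 2) * (height // 2)
    return y, c, c

for scheme in BITS_PER_PIXEL:
    print(f"{scheme:10s} {frame_bytes(320, 240, scheme):7d} bytes")
print("4:2:0 planes (Y, Cb, Cr):", plane_sizes_420(320, 240))
```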
FRAME: XY Pixel Maps. FRAME Resolution: Computer Graphics Resolutions (Close-in Viewing): VGA = 640x480, SVGA = 800x600. TV and Cinema Resolutions (Lean-Back Viewing): NTSC = Standard Definition, 720x480 Interlaced (Odd, then Even Scan Lines); High Definition (Progressive Full Frame or Interlaced): 720i/p = 1280x720, 1080i/p = 1920x1080; http://en.wikipedia.org/wiki/display_resolution. FRAME Aspect Ratios (X to Y ratio, greatest common divisor in parentheses): NTSC = 3:2, 720x480 (240) pixel grid, displayed at 4:3 with non-square pixels; HD 720 = 16:9, 1280x720 (80); HD 1080 = 16:9, 1920x1080 (120); 2K = about 17:9, 2048x1080 (exactly 256:135, GCD 8). FRAME Rates: 60i, NTSC = 59.94 fields/sec (odd/even), i.e. 29.97 frames/sec (basically 30 fps; the 60 x 1000/1001 rate separates the RF chroma subcarrier from the audio carrier); 24p, Cinema = 24 fps; 60p, HDTV Progressive Modes.
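The aspect ratios and the NTSC rate can be checked with a few lines of arithmetic: reduce width:height by their GCD, and keep the 1000/1001 factor as an exact fraction.

```python
from fractions import Fraction
from math import gcd

def aspect_ratio(width: int, height: int) -> tuple[int, int, int]:
    """Reduce width:height by their greatest common divisor; returns (x, y, gcd)."""
    d = gcd(width, height)
    return width // d, height // d, d

for name, (w, h) in {"NTSC": (720, 480), "HD 720": (1280, 720),
                     "HD 1080": (1920, 1080), "2K": (2048, 1080)}.items():
    x, y, d = aspect_ratio(w, h)
    print(f"{name}: {w}x{h} -> {x}:{y} (gcd {d}), {w / h:.3f}")

ntsc_field_rate = Fraction(60 * 1000, 1001)   # fields per second
ntsc_frame_rate = ntsc_field_rate / 2          # two interlaced fields per frame
print(f"{float(ntsc_field_rate):.2f} fields/s, {float(ntsc_frame_rate):.2f} frames/s")
```

Note that 2K reduces to 256:135 (about 1.896), which is why it is only approximately 17:9 (about 1.889).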
Display and Camera Resolutions Red Epic 645 9K: 9334x7000, Red Epic 617 28K: 28000x9334
SEQUENCE: Series of Frames in an Encoded Group of Pictures. I-Frame: initial frame in the GoP, compression within the frame only. P-Frame: predicted frame (from the previous I- or P-frame). B-Frame: bi-directional interpolated frame (differences relative to the previous I- or P-frame and the next P-frame or I-frame).
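Because a B-frame depends on the next reference frame, the encoder must send that reference before the B-frames that precede it in display order. This sketch reorders a GoP pattern string from display order to transmission/decode order. Appending trailing B-frames at the end is a simplification: in an open GoP they would actually follow the next GoP's I-frame.

```python
def decode_order(gop: str) -> list[int]:
    """Return display indices in the order a decoder needs them.
    A B-frame depends on the next I/P reference, so that reference is sent first."""
    order, pending_b = [], []
    for i, kind in enumerate(gop):
        if kind == "B":
            pending_b.append(i)          # hold until its forward reference has been sent
        elif kind in ("I", "P"):
            order.append(i)              # send the reference frame
            order.extend(pending_b)      # then the B-frames displayed before it
            pending_b = []
        else:
            raise ValueError(f"unknown frame type {kind!r}")
    # Simplification: trailing B-frames would really reference the next GoP's I-frame.
    return order + pending_b

gop = "IBBPBBPBBPBB"
print([f"{gop[i]}{i}" for i in decode_order(gop)])
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5', 'P9', 'B7', 'B8', 'B10', 'B11']
```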
Building Your Own Video Codec. Video Compression Spaces: POINT: Color Space: RGB (24 bits); YCrCb 4:2:2 (16 bits/pixel), lossy compared to RGB; Grayscale (8 bits), lossy. FRAME or MACRO BLOCK: XY Dimension. As an Image: Convolution/Deconvolution (Lossy); Convolution: moving average of pixels to compress multiple pixels to one; Deconvolution: interpolation to estimate original pixel values adjacent to the compressed pixel. As a String: Run Length Encoding (Lossless); Huffman Encoding (Lossless). SEQUENCE: Frame-to-Frame Time Dimension. Difference Images (Lossless, or Lossy with Thresholds): send the pixel address and data for non-zero pixels only; pixel address for 320x240 = 17 bits (76,800 pixels < 2^17); Dpixel = 24 bits for RGB, so each changed pixel costs 17 + 24 = 41 bits. Scenes often don't change quickly, so transmit change-only data; threshold on pixel change to compress more (lossy); detect size blow-up on fast-changing data (change-only data exceeds a raw 24-bit frame once more than about 58% of pixels change).
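A sketch of two of the building blocks listed above: lossless run length encoding of a byte string, and change-only difference images with an optional threshold. For simplicity it assumes one integer per pixel (e.g. an 8-bit grayscale frame); the helper names are illustrative, not part of any course code.

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Lossless run length encoding: list of (value, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs

def rle_decode(runs) -> bytes:
    return bytes(v for v, n in runs for _ in range(n))

def diff_encode(prev, curr, threshold=0):
    """Change-only data: (pixel_address, new_value) where |change| > threshold.
    threshold=0 is lossless; larger thresholds compress more but are lossy."""
    return [(addr, c) for addr, (p, c) in enumerate(zip(prev, curr)) if abs(c - p) > threshold]

def diff_apply(prev, changes):
    """Rebuild the current frame from the previous frame plus change-only data."""
    frame = list(prev)
    for addr, value in changes:
        frame[addr] = value
    return frame

ADDR_BITS = (320 * 240 - 1).bit_length()    # 17 bits address any pixel in 320x240

def change_only_bits(changes, pixel_bits=24):
    """Size of the change-only data, to detect blow-up versus a raw frame."""
    return len(changes) * (ADDR_BITS + pixel_bits)
```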
Basic Definitions. Useful Wikipedia Pages: PPM - http://en.wikipedia.org/wiki/portable_pixmap; GIF - http://en.wikipedia.org/wiki/gif; JPEG - http://en.wikipedia.org/wiki/jpeg (intra-frame compression, normally lossy); MPEG - http://en.wikipedia.org/wiki/mpeg; Theora - http://en.wikipedia.org/wiki/theora. PPM and PGM Info (Portable Pixmap and Graymap): http://netpbm.sourceforge.net/doc/ppm.html (RGB), http://netpbm.sourceforge.net/doc/pgm.html (grayscale). MPEG Info: http://www.mpeg.org/mpeg/index.html, http://www.compression-links.info/mpeg. DivX Info: http://www.divx.com/divx/.
Frame Analysis and Image Processing Resources. Uncompressed single frames or compressed I-frame-only data are used for image processing, e.g. edge enhancement (http://software.intel.com/en-us/articles/using-intelstreaming-simd-extensions-and-intel-integratedperformance-primitives-to-accelerate-algorithms/), color map editing, and post production in general. Can Edit Frame Sequences: Compressed GOP Chop, http://gopchop.org/. Single Frame Viewing and Analysis: http://www.irfanview.com/, http://www.trilon.com/xv/downloads.html, http://www.gimp.org/downloads/. Image Processing Libraries: http://cimg.sourceforge.net/, http://sourceforge.net/projects/opencvlibrary/.
Using the Python PPM Stream Viewer. Not many tools support display of un-encoded, uncompressed frames and sequences. Test your Python and Vpipe installation: run vpipe_display.py first, then run frametx_test.py second. Write a PPM TCP streaming client to connect and send a PPM frame sequence. Single PPM frames can be viewed with Irfanview. Uncompressed PPM streaming requires significant bandwidth (KB = 1024 bytes): 320x240x3 = 225 KB/frame, about 6.6 MB/sec (6,750 KB/sec) at 30 fps; NTSC 720x480x3 = 1012.5 KB/frame, about 30 MB/sec at 30 fps.
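A sketch of building one binary PPM (P6) frame in memory: an ASCII header followed by raw RGB bytes. The wire protocol that vpipe_display.py expects is not described on the slide, so this stops at producing the frame bytes; writing them to a `.ppm` file lets Irfanview open them, and a TCP client could send them with `socket.sendall`.

```python
def ppm_frame(width: int, height: int, rgb: bytes) -> bytes:
    """Build a binary PPM (P6) image: ASCII header, then width*height*3 bytes of RGB."""
    if len(rgb) != width * height * 3:
        raise ValueError("pixel data must be width * height * 3 bytes")
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + rgb

# A solid red 320x240 test frame: 15 header bytes + 230,400 pixel bytes
frame = ppm_frame(320, 240, bytes((255, 0, 0)) * (320 * 240))
print(len(frame), "bytes per frame")
```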
Uncompressed Streams: normally used only in digital cinema and post production. True Color (RGB, 8 bits each); YCrCb (16 bits per pixel). Uncompressed, or lossless or lossy I-frame-only: MJPEG (http://en.wikipedia.org/wiki/mjpeg), JPEG 2000 (http://en.wikipedia.org/wiki/jpeg_2000). Very data and storage intensive: not practical for digital video transport; used in digital cinema and instrumentation. Standard Definition 720x480 RGB @ 30 fps requires 30 MB/sec, which requires Gigabit Ethernet at minimum (about 80 MB/sec after 8b/10b encoding and packet overhead). High Definition 720p (1280x720x3 = 2700 KB/frame) @ 30 fps = 80 MB/sec. High Definition 1080p (1920x1080x3 = 6075 KB/frame) @ 30 fps = 178 MB/sec. DCI 2K, 2048x1080x3 (6480 KB/frame) @ 24 fps = 152 MB/sec: 4/8 Gbps Fibre Channel, 10G Ethernet, or InfiniBand (10/20/40 Gbps). DCI 4K, 4096x2160x3 (25920 KB/frame) @ 24 fps = 607.5 MB/sec: requires 8G FC, 10G Ethernet, or InfiniBand.
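The figures above follow from width x height x 3 bytes x frame rate, with KB = 1024 bytes and MB = 1024 KB. This sketch reproduces them and adds the raw bit rate in Gbit/s for comparison with link speeds (before any protocol or line-coding overhead).

```python
KIB = 1024
MIB = 1024 * 1024

def uncompressed_rate(width: int, height: int, fps: int, bytes_per_pixel: int = 3):
    """Return (KB per frame, MB per second) with KB = 1024 bytes and MB = 1024 KB."""
    frame = width * height * bytes_per_pixel
    return frame / KIB, frame * fps / MIB

formats = {
    "SD 720x480 @ 30": (720, 480, 30),
    "720p @ 30": (1280, 720, 30),
    "1080p @ 30": (1920, 1080, 30),
    "DCI 2K @ 24": (2048, 1080, 24),
    "DCI 4K @ 24": (4096, 2160, 24),
}
for name, (w, h, fps) in formats.items():
    kb, mb = uncompressed_rate(w, h, fps)
    gbps = mb * MIB * 8 / 1e9
    print(f"{name:16s} {kb:8.1f} KB/frame {mb:7.1f} MB/s {gbps:5.2f} Gbit/s")
```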
More on Codec Streaming. Streaming = Codec + Data Transport, e.g. MPEG-4/RTP or Your Codec/UDP. Transport Protocols: UDP: connectionless datagrams, no delivery guarantee; diversely routed data can arrive out of order; lost datagrams are not re-transmitted. TCP: connection-oriented messaging, guaranteed delivery within the window; all messages are segmented, sequenced, and fully acknowledged; all messages are re-assembled from segments and re-ordered; any lost messages are re-transmitted from the re-transmission window, which is based on bandwidth-delay and congestion; after a maximum number of retries, TCP finally gives up. RTP/UDP (Real-Time Transport Protocol): payload type, sequence number, time-stamp, delivery monitoring, http://www.ietf.org/rfc/rfc1889.txt. RTSP (Real Time Streaming Protocol): typically used to control RTP delivery, but can use UDP or other transport, http://www.ietf.org/rfc/rfc2326.txt.
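The RTP fields listed above live in a 12-byte fixed header defined in RFC 1889. This sketch packs and parses that header with Python's `struct` module, assuming no padding, no header extension, and no CSRC list. Payload type 33 is the value the RTP audio/video profile assigns to MPEG-2 transport streams, and the 90000 timestamp reflects the 90 kHz clock commonly used for video; both are example values here.

```python
import struct

RTP_HEADER = struct.Struct("!BBHII")   # 12-byte fixed header, network byte order

def build_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                     marker: bool = False) -> bytes:
    """Fixed RTP header: version 2, no padding, no extension, no CSRCs."""
    byte0 = 2 << 6                                        # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)    # M bit + 7-bit payload type
    return RTP_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def parse_rtp_header(packet: bytes) -> dict:
    byte0, byte1, seq, ts, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }

hdr = build_rtp_header(payload_type=33, seq=1, timestamp=90000, ssrc=0x1234ABCD)
print(len(hdr), parse_rtp_header(hdr))
```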
Digital Audio
Digital Audio. Analog-to-Digital Encoding: PCM (Pulse Code Modulation) ADC; sample rate (e.g. 6 to 48 kHz per channel); 8-bit, 16-bit, and 20-bit mono; 8/16/20-bit stereo (right and left channel); Dolby Surround Sound channel mappings: .1 refers to the low-frequency (subwoofer) speaker; 2.x, 3.x, 5.x, 7.x speaker placement. Compression and Encoding Standards: MP3 = MPEG-1 Audio Layer 3 (lossy compression of PCM-encoded data); AC-3 = Audio Codec 3 (Dolby Digital), multi-channel compression. Transport: audio elementary stream in MPEG packets (188 bytes), multiplexed with video in a UDP or RTP transport stream. [Figure: 3.0, 5.1, and 7.1 speaker placements]
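A sketch of PCM in miniature: sample a sine tone at a fixed rate, quantize each sample to signed 16 bits, and compute the uncompressed data rate for a given sample rate, sample width, and channel count. The 1 kHz tone at 8 kHz and the little-endian byte order are illustrative choices.

```python
import math
import struct

def pcm16_sine(freq_hz: float, sample_rate_hz: int, duration_s: float,
               amplitude: float = 0.5) -> bytes:
    """Sample a sine tone and quantize each sample to signed 16-bit PCM (little-endian)."""
    n = round(sample_rate_hz * duration_s)
    samples = [round(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz))
               for i in range(n)]
    return struct.pack(f"<{n}h", *samples)

def pcm_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample // 8 * channels

tone = pcm16_sine(1000, 8000, 0.01)        # 10 ms of a 1 kHz tone sampled at 8 kHz
print(len(tone), "bytes")                   # 80 samples x 2 bytes = 160
print(pcm_rate(48000, 16, 2), "bytes/s for 48 kHz 16-bit stereo")
```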
Audio Codec and Transport System Basics
Basic Voice/IP Setup. Four tasks (services) on each phone host: Record, Playback, Streaming, Transport. Two phone hosts for a point-to-point phone call. [Figure: record path: audio source (microphone, MP3) waveform -> sound card ADC/codec/DMA -> record half buffer and record buffer (audio ADC codec data: 8-bit mono, 16-bit stereo, etc.; codec sample IRQ) -> Record -> sound bite message data -> Audio Streaming -> transport data sound bite messages -> tnettask (UDP or TCP). Playback path: remote sound bite messages -> Playback and playback control -> playback half buffer and playback buffer (audio codec DAC data) -> sound card DMA/codec/DAC -> speaker waveform]
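The record and playback half buffers in the diagram are a double-buffering scheme: DMA fills one half while the task copies out the half that just completed. This is a plain-Python simulation of that idea, not sound card driver code; the "interrupt" is modeled as the point where a half becomes full, and each yielded chunk stands in for one sound bite message.

```python
def record_with_half_buffers(sample_stream, half_size: int):
    """Simulate half-buffer recording: DMA writes 8-bit samples into a 2-half buffer;
    each time a half fills, the record task copies it out as a sound bite.
    Yields (half_index, sound_bite_bytes)."""
    buffer = bytearray(2 * half_size)
    pos = 0
    for sample in sample_stream:              # one 8-bit mono sample per item
        buffer[pos] = sample                  # "DMA" write
        pos += 1
        if pos % half_size == 0:              # half-complete "interrupt"
            half = pos // half_size - 1       # 0 = first half, 1 = second half
            start = half * half_size
            yield half, bytes(buffer[start:start + half_size])
            pos %= 2 * half_size              # wrap to the start after the second half

for half, bite in record_with_half_buffers(range(12), half_size=4):
    print(half, list(bite))
```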
Point-to-Point VoIP: replicate the services on each host. Host-1 and Host-2 each run the same record, playback, audio streaming, and transport (tnettask, UDP or TCP) services, connected over an Ethernet LAN; tnettask establishes control and data transport. Either side should be able to initiate or answer a call. [Figure: two copies of the record/playback block diagram (audio source, sound card ADC/codec/DMA, record and playback half buffers, sound card DMA/codec/DAC, speakers), one per host, linked over the Ethernet LAN]
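A sketch of the UDP transport between two hosts, run here over the loopback interface so it is self-contained. Each sound bite gets a hypothetical 4-byte sequence-number header (a stand-in for what RTP would provide); the receiver sorts by sequence number and reports gaps instead of waiting for retransmission, since UDP never retransmits.

```python
import socket
import struct

SEQ = struct.Struct("!I")   # hypothetical 4-byte sequence-number header per sound bite

def send_sound_bites(sock, addr, bites):
    for seq, bite in enumerate(bites):
        sock.sendto(SEQ.pack(seq) + bite, addr)

def receive_sound_bites(sock, count, timeout_s=1.0):
    """Collect up to `count` datagrams; UDP may drop or reorder them, so sort by
    sequence number and report missing ones rather than wait for retransmission."""
    sock.settimeout(timeout_s)
    got = {}
    try:
        for _ in range(count):
            data, _sender = sock.recvfrom(65535)
            (seq,) = SEQ.unpack_from(data)
            got[seq] = data[SEQ.size:]
    except socket.timeout:
        pass
    missing = [s for s in range(count) if s not in got]
    return [got[s] for s in sorted(got)], missing

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))                 # let the OS pick a free port
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sound_bites(tx, rx.getsockname(), [b"\x00" * 160, b"\x01" * 160, b"\x02" * 160])
bites, missing = receive_sound_bites(rx, 3)
print(len(bites), "sound bites received, missing:", missing)
tx.close()
rx.close()
```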
MPEG2 Transport Basics and System View
MPEG2 Fundamentals. MPEG2 transport in 188-byte packets, defined by ISO/IEC 13818-1: small packets with PCR (Program Clock Reference). Encapsulated in baseband or broadband PHY/link layers, and perhaps network/transport as well: Baseband: UDP or RTP/RTSP; Broadband: QAM64, QAM256, QPSK modulation schemes. SPTS (single program transport stream) or MPTS (multiple program transport stream) must be multiplexed and timed: PID remapping to avoid SPTS PID conflicts and to generate master PAT/PMT and PSI data. A program: video PIDs (MPEG2 ES), audio PIDs, secondary audio (e.g. AC3). Regeneration of the overall MPTS PCR; can start with SPTS or MPEG2 ES; most likely don't need to regenerate PTS/DTS for the ES; may need to NULL-pad VBR MPTS/SPTS for broadband.
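A sketch of the 4-byte transport packet header from 13818-1: sync byte 0x47, flags, the 13-bit PID, scrambling and adaptation-field control, and the continuity counter. It also shows PID remapping at the packet level and a null packet (PID 0x1FFF) of the kind used for padding. Regenerating the PAT/PMT that reference remapped PIDs is a separate step not shown, and filling the null payload with 0xFF is a common convention assumed here.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF

def parse_ts_header(pkt: bytes) -> dict:
    if len(pkt) != TS_PACKET_SIZE or pkt[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte transport packet starting with 0x47")
    return {
        "transport_error": bool(pkt[1] & 0x80),
        "payload_unit_start": bool(pkt[1] & 0x40),
        "priority": bool(pkt[1] & 0x20),
        "pid": ((pkt[1] & 0x1F) << 8) | pkt[2],
        "scrambling": (pkt[3] >> 6) & 0x3,
        "adaptation_field_control": (pkt[3] >> 4) & 0x3,
        "continuity_counter": pkt[3] & 0x0F,
    }

def remap_pid(pkt: bytes, pid_map: dict) -> bytes:
    """Rewrite the 13-bit PID, keeping the three flag bits; unmapped PIDs pass through.
    (PAT/PMT tables that reference the old PIDs must be regenerated separately.)"""
    old = ((pkt[1] & 0x1F) << 8) | pkt[2]
    new = pid_map.get(old, old)
    if not 0 <= new <= 0x1FFF:
        raise ValueError("PID must fit in 13 bits")
    out = bytearray(pkt)
    out[1] = (pkt[1] & 0xE0) | (new >> 8)
    out[2] = new & 0xFF
    return bytes(out)

def null_packet(cc: int = 0) -> bytes:
    """Null packet (PID 0x1FFF, payload only) for padding toward a constant bit rate."""
    return bytes((SYNC_BYTE, NULL_PID >> 8, NULL_PID & 0xFF, 0x10 | (cc & 0x0F))) + b"\xFF" * 184
```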
Digital Cable Concepts. QAM (Quadrature Amplitude Modulation): modulates the amplitude of two carrier waves 90 degrees out of phase (I and Q channels). [Figure: spectrum analyzer and constellation views] QPSK (bit encoded for digital); QAM-16 (4x4 constellation); standard QAM-256 (16x16 constellation). 6 MHz channels from 57 to 999 MHz (Ch 2 to 158); each can carry an MPEG MPTS, e.g. 10 programs per channel. QPSK, QAM Tuner.
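A sketch of how QAM-16 places 4 bits per symbol on a 4x4 constellation and how the I and Q amplitudes drive two carriers 90 degrees apart. The Gray-coded level table is one common convention assumed for illustration; it is not the exact bit mapping of any particular cable standard.

```python
import math

# Assumed Gray-coded levels for one axis: adjacent levels differ in exactly one bit
GRAY_2BIT = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def qam16_map(bits):
    """Map bits (length a multiple of 4) to 16-QAM symbols (I, Q) on a 4x4 grid.
    First two bits of each group pick the in-phase level, last two the quadrature level."""
    if len(bits) % 4:
        raise ValueError("16-QAM carries 4 bits per symbol")
    return [(GRAY_2BIT[(bits[i], bits[i + 1])], GRAY_2BIT[(bits[i + 2], bits[i + 3])])
            for i in range(0, len(bits), 4)]

def modulate(symbol, carrier_hz: float, t: float) -> float:
    """Passband sample: I on the cosine carrier, Q on the sine carrier (90 degrees apart)."""
    i, q = symbol
    return i * math.cos(2 * math.pi * carrier_hz * t) - q * math.sin(2 * math.pi * carrier_hz * t)

def bits_per_symbol(m: int) -> int:
    """QPSK (M=4) -> 2, QAM-16 -> 4, QAM-64 -> 6, QAM-256 -> 8."""
    return int(math.log2(m))

print(qam16_map([0, 0, 0, 0, 1, 0, 1, 1]))
print([bits_per_symbol(m) for m in (4, 16, 64, 256)])
```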
MPEG2 Fundamentals: Basic Head-End Broadband MPEG2 System [Figure: head-end block diagram; labeled elements: Broadcast and VoD Services, Config & Playlist, Video Services Control Interface, MPTS Playback, SPTS Playback, Server Bit-streams, Pre-mux Tools, DVB-ASI PCI, QAM Driver, two DVB-ASI Analyzers, IP Network, QAM-RF (PRO-1000 Quad), QAM-SA STBs]
IPTV Server with IPTV-STB. IP (UDP/RTP) stream server (e.g. VLC, DekTek); IP return path; PC-STB (e.g. Amino); investigating a VoD server. [Figure: VoD server with VoD filesystem (video-only vault) and streamer sending UDP/IP over an IP network to an IPTV PC-STB (VLC) with HDMI/DVI output; labeled VoD Demo]
Digital Cable Server with PC-STB. QAM 256 with stream server (e.g. DekTek); the return path must be a DSG modem; PC-STB (e.g. Dell + tuner). [Figure: VoD server with stream filesystem (video-only vault) and QAM256 streamer feeding a coax network to a QAM256 PC-STB (MyHD) with HDMI/DVI output; labeled Stream Demo]
Digital VoD System End-to-End [Figure: end-to-end block diagram; labeled elements: Video Vault, Pitcher, Catcher, VoD Servers, GigE/10G IP Transport, Distribution Network, Edge QAMs, HFC Cable Network, DSL or FTTH, IPTV STBs, Billing System]
National On-Demand System with DRM (Digital Rights Management)
MPEG2 Encode/Decode Introduction
MPEG2 ES Fundamentals: MPEG2 Video Elementary Stream. POINT: most often YCrCb coding (4:2:0, 4:2:2, 4:4:4); luminance in every pixel; color is often sub-sampled (in 4:2:0 the color difference is filtered to half resolution horizontally and vertically). FRAME: defines macro blocks; DCT (Discrete Cosine Transform), quantization, run length encoding, Huffman encoding. SEQUENCE: arranged in a group of pictures, often 12 frames at a time; motion vectors (macro block displacement between frames) plus the remaining differences; I-frame (intraframe): spatial compression only; P-frame (predicted): uses the previous I/P-frame as reference; B-frame (bidirectional): never used as a reference, references the previous AND next I/P-frame. Compression is often about 30 to 1; high-motion video is harder to compress; the DCT coefficients retained drive perceived quality.
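A sketch of the first two FRAME steps on an 8x8 block: an orthonormal 2-D DCT-II followed by quantization. A uniform step size is a simplification assumed here; MPEG-2 actually weights each coefficient with a quantizer matrix and scale. A flat block puts all its energy in the DC coefficient, which is why smooth regions compress so well.

```python
import math

N = 8

def _c(k: int) -> float:
    """Orthonormal DCT scale factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct2_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (8 rows of 8 numbers); result[u][v]."""
    return [[_c(u) * _c(v) * sum(block[x][y]
                                 * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                                 * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def quantize(coeffs, step: float):
    """Uniform quantization (lossy): larger steps zero out more coefficients."""
    return [[round(c / step) for c in row] for row in coeffs]

flat = [[128] * N for _ in range(N)]
q = quantize(dct2_8x8(flat), step=16)
print(q[0][0], sum(abs(c) for row in q for c in row) - abs(q[0][0]))   # DC only
```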
Group of Pictures: High Level View
MPEG-2: Order of Operations. POINT (pixel) encoding; macro-block lossy intra-frame compression; motion-based compression in the group of pictures.
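Motion-based compression starts with motion estimation: for each block of the current frame, search the reference frame for the best-matching block and record the displacement as a motion vector. The standard does not prescribe how an encoder searches; this sketch uses full-search block matching with a sum-of-absolute-differences cost, one common technique, on 4x4 blocks for brevity (MPEG-2 macro blocks are 16x16).

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in cur and the
    block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))

def best_motion_vector(cur, ref, bx, by, size=4, search=2):
    """Full search over +/- search pixels; returns ((dx, dy), sad) for the best match."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= bx + dx and bx + dx + size <= w and 0 <= by + dy and by + dy + size <= h):
                continue                  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best

def frame_with_square(x0, y0, size=12):
    """Test frame: dark background with a bright 4x4 square at (x0, y0)."""
    return [[200 if x0 <= x < x0 + 4 and y0 <= y < y0 + 4 else 10 for x in range(size)]
            for y in range(size)]

ref = frame_with_square(5, 5)     # square in the reference frame
cur = frame_with_square(4, 4)     # square moved up-left by one pixel
print(best_motion_vector(cur, ref, 4, 4))   # ((1, 1), 0)
```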
Parsing an Elementary Video Stream. There are many 188-byte packet types, and the packet header allows multiplexing of many video and audio streams on a carrier.
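Once the video elementary stream has been extracted from the transport packets, it is parsed by scanning for MPEG-2 start codes: the prefix 0x00 0x00 0x01 followed by a code byte identifying the sequence header, GoP, picture, or slice. This sketch lists the start codes it finds; the name table covers only a few common codes, and anything else is reported as "other".

```python
START_CODE_NAMES = {
    0x00: "picture_start",
    0xB3: "sequence_header",
    0xB5: "extension_start",
    0xB7: "sequence_end",
    0xB8: "group_start (GoP)",
}

def find_start_codes(es: bytes):
    """Yield (offset, code_value, name) for each 0x000001xx start code in a video ES."""
    i = es.find(b"\x00\x00\x01")
    while i != -1 and i + 3 < len(es):
        code = es[i + 3]
        name = "slice" if 0x01 <= code <= 0xAF else START_CODE_NAMES.get(code, "other")
        yield i, code, name
        i = es.find(b"\x00\x00\x01", i + 3)

sample = (b"\x00\x00\x01\xB3\xAA" b"\x00\x00\x01\xB8\xBB"
          b"\x00\x00\x01\x00\xCC" b"\x00\x00\x01\x01")
for offset, code, name in find_start_codes(sample):
    print(offset, hex(code), name)
```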
Next Lecture: Deeper Dive into 13818-1 and 13818-2. 13818-1: Transport Streams for Video & Audio: container for program streams (188-byte packets); multiplexed video and audio elementary streams; PSI (Program Specific Information); system clock (PCR, PTS/DTS). 13818-2: Elementary Video Stream Encode/Decode: video DCT macro-blocks; color format; GoP (I-Frame, B-Frame, P-Frame); motion compensation and vector quantization; mathematics of DCT and Huffman encoding. Differences between MPEG-2 and MPEG-4. Trick-play operations on program and transport streams.