Chapter 1: Audio/Image/Video Fundamentals
Multimedia Systems
Prof. Ben Lee
School of Electrical Engineering and Computer Science
Oregon State University

Outline
- Computer Representation of Audio
  - Quantization
  - Sampling
- Digital Image Representation
  - Color System
  - Chrominance Subsampling
- Digital Video Representation
- Hardware Requirements

Audio Signals
- Sound is created by the vibration of matter (e.g., air molecules) and is a continuous wave that travels through the air.
- Amplitude: the displacement of the air-pressure wave from its mean or quiescent state (measured in decibels, dB).
- Frequency: the number of periods per second (measured in hertz, Hz, i.e., cycles/second). The period is 1/f.
[Figure: waveform of amplitude (air pressure) versus time, with one period marked]

Digital Audio Representation
- A transducer (inside a microphone) converts pressure to voltage levels.
- The analog signal is converted into a digital stream by discrete sampling.
- Discretization occurs both in time (sampling) and in amplitude (quantization).

Quantization and Sampling
[Figure: sampled waveform; vertical axis "Sample Height" with quantization levels 0.25, 0.5, 0.75, 1.00; horizontal axis "Samples"; the level spacing is labeled "Quantization" and the sample spacing "Sampling Rate"]

Sampling Rate
- There is a direct relationship between sampling rate, sound quality (fidelity), and storage space.
- How often must a signal be sampled to avoid losing information?
  - It depends on how fast the signal is changing.
  - In practice, at least twice per cycle of the highest frequency present (this follows from the Nyquist sampling theorem).
- Human hearing spans 20 Hz to 20 kHz; voice is about 500 Hz to 2 kHz.
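
The two discretization steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sample_sine` and `quantize` are hypothetical, the test signal is a pure sine wave, and the quantizer is a simple uniform one that reconstructs each sample at the midpoint of its step.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Sample a sine wave at regular intervals of 1/sample_rate_hz seconds."""
    n_samples = int(round(sample_rate_hz * duration_s))
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


def quantize(sample, n_bits, v_max=1.0):
    """Map a sample in [-v_max, +v_max] onto one of 2**n_bits uniform levels."""
    levels = 2 ** n_bits
    step = 2 * v_max / levels                 # height of one quantization step
    index = int((sample + v_max) / step)      # which step the sample falls in
    index = min(max(index, 0), levels - 1)    # clamp the top edge into range
    return -v_max + (index + 0.5) * step      # reconstruct at the step midpoint


if __name__ == "__main__":
    # 1 kHz tone sampled at 8 kHz for 1 ms, then quantized to 3 bits (8 levels).
    for s in sample_sine(1000, 8000, 0.001):
        print(f"{s:+.4f} -> {quantize(s, 3):+.4f}")
```

With this midpoint quantizer the error of any in-range sample is at most half a quantization step.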

Nyquist Sampling Theorem
- If a signal f(t) is sampled at regular intervals of time and at a rate higher than twice the highest significant signal frequency, then the samples contain all the information of the original signal.
- Example: the highest playback frequency for CD-quality audio is 22,050 Hz. Following the Nyquist theorem, the signal is sampled at twice this frequency, giving a sampling frequency of 44,100 Hz.

Quantization
- Sample precision: the resolution of a sample value.
- Quantization depends on the number of bits used to measure the height of the waveform.
- 16-bit CD-quality quantization yields 64K (2^16 = 65,536) values.
- Audio formats are specified by sample rate and quantization:
  - Voice quality: 8-bit quantization, 8,000 Hz, mono (64 Kbps)
  - CD quality: 16-bit linear quantization, 44,100 Hz, stereo (705.6 Kbps for mono, 1.411 Mbps for stereo)
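
The bit rates quoted above follow directly from sample rate × bits per sample × channels. A minimal sketch of that arithmetic, with hypothetical helper names `nyquist_rate` and `pcm_bit_rate`:

```python
def nyquist_rate(max_freq_hz):
    """Twice the highest signal frequency; the theorem asks for a rate above this bound."""
    return 2 * max_freq_hz


def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


if __name__ == "__main__":
    print(nyquist_rate(22_050))             # CD sampling frequency in Hz
    print(2 ** 16)                          # number of 16-bit quantization values
    print(pcm_bit_rate(8_000, 8))           # voice quality, mono
    print(pcm_bit_rate(44_100, 16))         # CD quality, mono
    print(pcm_bit_rate(44_100, 16, 2))      # CD quality, stereo
```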

Signal-to-Noise Ratio
- A measure of the quality of the signal.
- Defined as the ratio of the power of the correct signal to the power of the noise:
  $S/N = 20 \log_{10}(V_{signal} / V_{noise})$
- For N-bit quantization, with the signal spanning $2^{N-1}$ quantization steps and a maximum quantization noise of 1/2 step:
  $S/N = 20 \log_{10}\left(\frac{2^{N-1}}{1/2}\right) = 20 \times N \times \log_{10} 2 = 6.02N$ (dB)
- Thus, each bit adds about 6 dB of resolution!
[Figure: $V_{signal}(t)$ versus t, with $V_{max}$ marked on the vertical axis and quantization levels running from $-2^{N-1}$ to $+2^{N-1}$; quantization step $2V_{max}/2^{N} = V_{max}/2^{N-1}$; horizontal spacing labeled "Sampling Rate"; maximum quantization noise $V_{noise}$ marked]

Pulse Code Modulation (PCM)
- The two-step process of sampling and quantization is known as Pulse Code Modulation.
- Based on the Nyquist sampling theorem.
- Used in speech and CD encoding.
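
The 6 dB-per-bit rule from the Signal-to-Noise Ratio slide above can be checked numerically. The sketch below plugs the slide's values (signal amplitude of 2^(N-1) steps, maximum noise of half a step) into the S/N formula; the function name `quantization_snr_db` is hypothetical.

```python
import math


def quantization_snr_db(n_bits):
    """S/N = 20 log10(V_signal / V_noise), with amplitudes measured in quantization steps."""
    v_signal = 2 ** (n_bits - 1)   # peak signal amplitude in steps
    v_noise = 0.5                  # maximum quantization noise: half a step
    return 20 * math.log10(v_signal / v_noise)


if __name__ == "__main__":
    for n in (8, 16):
        print(f"{n}-bit samples: {quantization_snr_db(n):.2f} dB")
```

Going from one bit depth to the next changes the result by 20 log10(2), which is about 6.02 dB.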

How Are Audio Samples Represented?
- Audio samples are represented in formats characterized by four parameters:
  - Sample rate: sampling frequency
  - Precision: number of bits used to store each audio sample
  - Encoding: audio data representation (compression)
  - Channel: multiple channels of audio may be interleaved at sample boundaries
- PCM-encoded speech (64 Kbps) and music (1.411 Mbps) strain the bandwidth of the Internet, so some form of compression is needed!
- See Chapter 5: Audio Compression

Preview of Chapter 5
- Audio samples are encoded (compressed) based on:
  - Non-uniform quantization: humans are more sensitive to changes in quiet sounds than in loud sounds
    - μ-law encoding
  - Difference encoding
  - Psychoacoustic principles: humans do not hear all frequencies the same way, due to auditory masking
    - Simultaneous masking
    - Temporal masking
- This information is used in MPEG-1 Layer 3, known as MP3, which reduces the bit rate for CD-quality music to 128 or 112 Kbps.
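
To make the idea of non-uniform quantization concrete, here is a rough sketch of the continuous μ-law companding curve. The slides only name μ-law encoding; the formula F(x) = sgn(x) ln(1 + μ|x|) / ln(1 + μ) and the value μ = 255 are the commonly quoted textbook form and are assumptions here, as are the function names. This is not a bit-exact telephony codec.

```python
import math

MU = 255  # assumed companding parameter (commonly quoted value)


def mu_law_compress(x, mu=MU):
    """Compress a sample x in [-1, 1]; quiet samples get a larger share of the output range."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


if __name__ == "__main__":
    for x in (0.01, 0.1, 0.5, 1.0):
        y = mu_law_compress(x)
        print(f"x={x:<5} compressed={y:.4f} restored={mu_law_expand(y):.4f}")
```

Quantizing the compressed value uniformly then spends more levels on quiet sounds than on loud ones, which matches the motivation stated above.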

Digital Image Representation
- An image is an n × m array of picture elements, or pixels.
- Pixel representation can be bi-level, gray-scale, or color.
- Resolution is determined by the number of pixels.
[Figure: pixel types. Bi-level: white (W) or black (B), 1 bit. Gray-scale: intensity/brightness level, n bits. Color: R, G, and B components, 3 × n bits.]

Color Depth (Pixel Depth)
- The amount of information per pixel is known as the color depth:
  - Monochrome (1 bit per pixel)
  - Gray-scale (8 bits per pixel)
  - Color (8 or 16 bits per pixel)
    - 8-bit indexes into a color palette
    - 5 bits for each of R, G, B + 1 bit alpha (16 bits)
  - True color (24 or 32 bits per pixel)
    - RGB (24 bits)
    - RGB + alpha (32 bits)

Example Color Depth
[Figure: the same image shown at 1-bit, 4-bit, 8-bit, and 16-bit depth]
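
The color depths listed above translate directly into a count of representable colors, and the 16-bit format can be illustrated by packing three 5-bit components and one alpha bit into a single word. In this sketch the function names are hypothetical, and the bit layout (red in the top bits, alpha in the lowest bit) is an assumption chosen for illustration; real formats differ in ordering.

```python
def num_colors(bits_per_pixel):
    """Number of distinct values a pixel of the given depth can take."""
    return 2 ** bits_per_pixel


def pack_rgba5551(r, g, b, a):
    """Pack 5-bit r, g, b (0..31) and 1-bit a (0..1) into 16 bits (assumed layout RRRRRGGGGGBBBBBA)."""
    return ((r & 0x1F) << 11) | ((g & 0x1F) << 6) | ((b & 0x1F) << 1) | (a & 0x1)


def unpack_rgba5551(value):
    """Recover (r, g, b, a) from a word produced by pack_rgba5551."""
    return (value >> 11) & 0x1F, (value >> 6) & 0x1F, (value >> 1) & 0x1F, value & 0x1


if __name__ == "__main__":
    for depth in (1, 8, 16, 24):
        print(f"{depth:>2} bits per pixel: {num_colors(depth):,} values")
    word = pack_rgba5551(31, 0, 0, 1)
    print(hex(word), unpack_rgba5551(word))
```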

Color Spaces
- A method by which we can specify, create, and visualize color.
- Why more than one color space? Different color spaces are better for different applications:
  - Humans => Hue, Saturation, Lightness or Brightness (HSL or HSB)
  - LCD monitors => Red, Green, Blue (RGB)
  - Printers => Cyan, Magenta, Yellow, Black (CMYK)
  - Compression => Luminance and Chrominance (YIQ, YUV, and YCbCr)

Visible Spectrum
[Figure: visible spectrum with 440 nm, 545 nm, and 580 nm marked]
- The human retina is most sensitive to these wavelengths.

Color Perception
[Figure: luminosity sensitivity curves for blue, green, and red versus wavelength (nm), with the axis running from 400 to 700 nm]

HSB
- H (hue): defines the color itself; the dominant wavelength.
- S (saturation): indicates the degree to which the hue differs from a neutral gray with the same value (brightness); purity, % white.
- B (brightness): indicates the level of illumination; luminance, the intensity of light.
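
As a small illustration of the H, S, and B components described above, Python's standard `colorsys` module can convert an RGB triplet to HSV, which is another name for HSB. The wrapper name `rgb8_to_hsb` and the choice to report hue in degrees and the other two components in percent are assumptions made for readability.

```python
import colorsys


def rgb8_to_hsb(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation in %, brightness in %)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s * 100, v * 100


if __name__ == "__main__":
    for name, rgb in [("red", (255, 0, 0)),
                      ("cyan", (0, 255, 255)),
                      ("gray", (65, 65, 65))]:
        h, s, b = rgb8_to_hsb(*rgb)
        print(f"{name:<5} H={h:6.1f} deg  S={s:5.1f}%  B={b:5.1f}%")
```

A neutral gray comes out with zero saturation, consistent with saturation measuring how far a color departs from gray of the same brightness.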

RGB Color System
- RGB (Red-Green-Blue) is the most widely used color system.
- Each pixel is represented as a color triplet of the form (R, G, B); e.g., for 24-bit color, each value is 8 bits (ranging from 0 to 255).
  - (0, 0, 0) = black
  - (255, 255, 255) = white
  - (255, 0, 0) = red
  - (0, 255, 255) = cyan
  - (65, 65, 65) = a shade of gray

RGB
- RGB is an additive model.
- No beam, no light.
- All 3 beams => white!
[Figure: overlapping red, green, and blue beams; the pairwise overlaps are yellow, magenta, and cyan]
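
The additive behavior described above can be sketched by summing 8-bit triplets component by component. The helper name `add_light` is hypothetical, and clipping each sum at 255 is an assumption made to stay within the 8-bit range.

```python
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)


def add_light(*colors):
    """Additively mix any number of (R, G, B) triplets, clipping each component at 255."""
    return tuple(min(255, sum(c[i] for c in colors)) for i in range(3))


if __name__ == "__main__":
    print("no beam      ", add_light())                  # black
    print("red + green  ", add_light(RED, GREEN))        # yellow
    print("red + blue   ", add_light(RED, BLUE))         # magenta
    print("green + blue ", add_light(GREEN, BLUE))       # cyan
    print("all 3 beams  ", add_light(RED, GREEN, BLUE))  # white
```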

CMYK Color System
- For printing, there is no light source; we see light reflected from the surface of the paper.
- Subtractive color model.
- No ink, 100% reflection of light => white!
- All 3 colors => black! But due to imperfect ink, it's usually a muddy brown. That's why black (K) ink is added.
[Figure: overlapping cyan, magenta, and yellow inks]

YUV Color System
- Used in the PAL (Phase Alternating Line) standard.
- Humans are more sensitive to luminance (brightness) fidelity than to color fidelity.
- Luminance (Y): encodes the brightness or intensity.
- Chrominance (U and V): encodes the color information.
- YUV uses 1 byte for the luminance component and 4 bits for each chrominance component.
  - Requires only 2/3 of the space (RGB = 24 bits), so better compression!
  - This coding ratio is called 4:2:2 subsampling.
- RGB <=> YUV
  Y = 0.3R + 0.59G + 0.11B
  U = (B - Y) * 0.493
  V = (R - Y) * 0.877
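
The RGB <=> YUV equations above can be turned into a pair of conversion functions. The sketch uses exactly the coefficients from the slide; the function names are hypothetical, and the inverse is obtained by solving the three equations for R, G, and B.

```python
def rgb_to_yuv(r, g, b):
    """Forward conversion using the coefficients given above."""
    y = 0.3 * r + 0.59 * g + 0.11 * b
    u = (b - y) * 0.493
    v = (r - y) * 0.877
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Inverse conversion: solve the forward equations for R, G, and B."""
    b = y + u / 0.493
    r = y + v / 0.877
    g = (y - 0.3 * r - 0.11 * b) / 0.59
    return r, g, b


if __name__ == "__main__":
    for rgb in [(255, 255, 255), (255, 0, 0), (65, 65, 65)]:
        y, u, v = rgb_to_yuv(*rgb)
        print(rgb, "->", (round(y, 2), round(u, 2), round(v, 2)))
```

Grays and white have R = G = B = Y, so both chrominance components vanish and only the luminance carries information.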

YCbCr Color System
- Closely related to YUV: it is a scaled and shifted YUV.
- Cb (blue) and Cr (red) chrominance.
- Used in JPEG and MPEG.
- YCbCr <=> RGB
  Y = 0.257R + 0.504G + 0.098B + 16
  Cb = ((B - Y)/2) + 0.5 = -0.148R - 0.291G + 0.439B + 128
  Cr = ((R - Y)/1.6) + 0.5 = 0.439R - 0.368G - 0.071B + 128

YIQ Color System
- Used in NTSC color TV broadcasting. B/W TV uses only Y.
- YIQ signal: similar to YUV
  Y = 0.299R + 0.587G + 0.114B
  I = 0.596R - 0.275G - 0.321B
  Q = 0.212R - 0.523G + 0.311B
- Composite signal: all information is combined into one signal.
  - Decoding requires modulation methods that eliminate interference between the luminance and chrominance components.
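
The 8-bit form of the YCbCr equations above (the expressions ending in + 16 and + 128) can be applied directly to RGB triplets. In this sketch the function name `rgb_to_ycbcr` is hypothetical, and rounding the results to the nearest integer is an assumption for display purposes.

```python
def rgb_to_ycbcr(r, g, b):
    """Apply the 8-bit YCbCr equations given above and round to integers."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    return round(y), round(cb), round(cr)


if __name__ == "__main__":
    for name, rgb in [("black", (0, 0, 0)),
                      ("white", (255, 255, 255)),
                      ("red", (255, 0, 0))]:
        print(f"{name:<5} {rgb} -> {rgb_to_ycbcr(*rgb)}")
```

With these coefficients, black and white both map to the neutral chrominance value 128 and differ only in Y.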

Color Decomposition
[Figure: an image decomposed into the components of each color system]
RGB     CMY       YCbCr   YIQ
Red     Cyan      Y       Y
Green   Magenta   U       I
Blue    Yellow    V       Q

Chrominance Subsampling
- What's another way to cut chrominance bandwidth in half? => Use 4 bits per pixel.
- The human eye is less sensitive to variations in color than in brightness.
- Compression is achieved with little loss in perceptual quality.
- Notation Y:Cb:Cr, e.g., 4:4:4
  - 1st digit (Y): horizontal sampling reference
  - 2nd digit (Cb): horizontal factor (relative to the 1st digit)
  - 3rd digit (Cr): horizontal factor (relative to the 1st digit), except when 0, which means ½ horizontal and ½ vertical
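
The Y:Cb:Cr notation explained above can be read mechanically to get the average number of bits per pixel. The sketch follows the slide's reading of the three digits (including the special case of a trailing 0); the function names are hypothetical and 8 bits per sample is assumed.

```python
def chroma_fractions(scheme):
    """Return (Cb, Cr) sample counts relative to Y for a scheme such as "4:2:2"."""
    ref, second, third = (int(part) for part in scheme.split(":"))
    if third == 0:
        # Special case: chroma halved vertically as well as subsampled horizontally.
        frac = (second / ref) * 0.5
        return frac, frac
    return second / ref, third / ref


def average_bits_per_pixel(scheme, bits_per_sample=8):
    """Average bits per pixel: one Y sample plus the fractional Cb and Cr samples."""
    cb, cr = chroma_fractions(scheme)
    return bits_per_sample * (1 + cb + cr)


if __name__ == "__main__":
    for scheme in ("4:4:4", "4:2:2", "4:1:1", "4:2:0"):
        print(scheme, average_bits_per_pixel(scheme), "bits per pixel")
```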

4:2:2 Subsampling
- For every 4 luminance samples, take 2 chrominance samples (subsampling by 2:1 horizontally only).
- Chrominance planes are just as tall but half as wide.
- Reduces bandwidth by 1/3.
- Used in professional editing (high-end digital video formats).

4:1:1 Subsampling
- For every 4 luminance samples, take 1 chrominance sample (subsampling by 4:1 horizontally only).
- Used in digital video.
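
Horizontal-only subsampling as in 4:2:2 and 4:1:1 above can be sketched on a chrominance plane stored as a list of rows. Averaging each group of neighboring samples is an assumption made here (simply dropping samples is another option), and the function name `subsample_horizontal` is hypothetical.

```python
def subsample_horizontal(plane, factor):
    """Average every `factor` horizontally adjacent samples; row width must be a multiple of factor."""
    return [[sum(row[i:i + factor]) / factor for i in range(0, len(row), factor)]
            for row in plane]


if __name__ == "__main__":
    cb_plane = [[10, 20, 30, 40],
                [50, 60, 70, 80]]
    print("4:2:2:", subsample_horizontal(cb_plane, 2))  # half as wide, same height
    print("4:1:1:", subsample_horizontal(cb_plane, 4))  # quarter as wide, same height
```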

4:2:0 Subsampling
- For every 4 luminance samples, take 1 chrominance sample (subsampling by 2:1 both horizontally and vertically).
- Chrominance is halved in both directions.
- Most commonly used.
- Three varieties.
[Figure: chrominance sample positions for the varieties; labels: "JPEG, MPEG-1, MJPEG" and "MPEG-2"]

How Are Images Represented?
- A single digitized image of 1024 × 1024 pixels at 24 bits per pixel requires ~25 Mbits of storage:
  - ~7 minutes to send over a 64 Kbps modem!
  - ~8-25 seconds to send over a 1-3 Mbps cable modem!
- Some form of compression is needed!
- See Chapter 2: Compression Basics and Chapter 3: Image Compression
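
The storage and transfer-time figures above can be reproduced with two lines of arithmetic. In this sketch the helper names are hypothetical, and link rates are taken as decimal (64 Kbps = 64,000 bits per second), which is an assumption.

```python
def image_bits(width, height, bits_per_pixel):
    """Uncompressed image size in bits."""
    return width * height * bits_per_pixel


def transfer_seconds(bits, link_bps):
    """Ideal transfer time, ignoring protocol overhead."""
    return bits / link_bps


if __name__ == "__main__":
    size = image_bits(1024, 1024, 24)
    print(f"{size:,} bits")
    print(f"64 Kbps modem: {transfer_seconds(size, 64_000) / 60:.1f} minutes")
    print(f"1 Mbps cable:  {transfer_seconds(size, 1_000_000):.1f} seconds")
    print(f"3 Mbps cable:  {transfer_seconds(size, 3_000_000):.1f} seconds")
```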

Preview of Chapters 2 and 3
- Lossless: no information is lost.
  - Exploits redundancy.
  - Most probable data is encoded with fewer bits.
- Lossy: an approximation of the original image.
  - Looks at how pixel values change.
  - The human eye is more sensitive to luminance than to chrominance.
  - The human eye is less sensitive to subtle features of the image.
- JPEG uses both techniques.

Digital Video Representation
- Can be thought of as a sequence of moving images (or frames).
- Important parameters in video:
  - Digital image resolution (e.g., n × m pixels)
  - Quantization (e.g., k bits per pixel)
  - Frame rate (p frames per second, i.e., fps)
- Continuity of motion is achieved at a minimum of 15 fps and is good at 30 fps; HDTV recommends 60 fps!

Standard Video Data Formats
- National Television System Committee (NTSC)
  - Set the standard for transmission of analog color pictures back in 1953!
  - Used in the US and Japan.
  - 525 lines (480 visible).
  - Resolution? Not digital, but equivalent to the quality produced by 720 × 486 pixels.
  - 30 fps (i.e., delay between frames = 33.3 ms).
  - Video aspect ratio of 4:3 (e.g., 12 in. wide, 9 in. high).
- Other standards:
  - PAL (Phase Alternating Line): used in parts of Western Europe.
  - SECAM: French standard.

HDTV
- Advanced Television Systems Committee (ATSC)
- > 1000 lines
- 60 fps
- Resolutions of 1920 × 1080 and 1280 × 720 pixels
- Video aspect ratio of 16:9
- MPEG-2 for video compression
- AC-3 (Audio Coding-3) for audio compression
- 5.1-channel Dolby surround sound

Bandwidth Requirements
- NTSC: 720 × 486 pixels, 30 fps, true color
  3 × 720 × 486 × 8 × 30 = 251,942,400 bps, or ~252 Mbps!
- With 4:2:2 subsampling:
  Luminance part: 720 × 486 × 8 × 30 = 83,980,800 bps
  Chrominance part: 2 × 720/2 × 486 × 8 × 30 = 83,980,800 bps
  Together: ~168 Mbps!
- For uncompressed HDTV-quality video, the bandwidth requirement is
  3 × 1920 × 1080 × 8 × 60 = 2,985,984,000 bps, or ~3 Gbps!
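
The bandwidth figures above all follow one pattern: a luminance plane plus two chrominance planes, each multiplied by bits per sample and frame rate. A minimal sketch, where the function name `video_bit_rate` and its parameter `chroma_h_factor` (the horizontal chrominance subsampling factor) are hypothetical:

```python
def video_bit_rate(width, height, fps, bits_per_sample=8, chroma_h_factor=1):
    """Uncompressed bit rate in bits per second for one Y plane and two chroma planes."""
    luma = width * height * bits_per_sample * fps
    chroma = 2 * (width // chroma_h_factor) * height * bits_per_sample * fps
    return luma + chroma


if __name__ == "__main__":
    print("NTSC true color:", f"{video_bit_rate(720, 486, 30):,} bps")
    print("NTSC 4:2:2:     ", f"{video_bit_rate(720, 486, 30, chroma_h_factor=2):,} bps")
    print("HDTV true color:", f"{video_bit_rate(1920, 1080, 60) / 1e9:.2f} Gbps")
```

Setting `chroma_h_factor=2` models 4:2:2, where each chrominance plane is half as wide as the luminance plane.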

Video Compression
- In addition to the techniques used in JPEG, MPEG uses:
  - Spatial redundancy: correlation between neighboring pixels.
  - Spectral redundancy: correlation between different frequency spectra.
  - Temporal redundancy: correlation between successive frames.
- See Chapter 5: Video Compression.
- What about delay through the network? See Chapter 6: Multimedia Networking.

Hardware Requirements
- Multimedia servers
- Core networks: routers
- Edge networks: wired or wireless
- Multimedia-enhanced PCs
- Wireless mobile devices: smartphones, tab/pad devices, netbooks, wearable devices, Internet appliances, etc.
- See Chapter 7: Mobile Application and Multimedia Processors.

Questions?