1 Essence of Image and Video Wei-Ta Chu 2010/9/23
2 Essence of Image Wei-Ta Chu 2010/9/23 Chapters 2 and 6 of Digital Image Processing by R.C. Gonzalez and R.E. Woods, Prentice Hall, 2nd edition, 2001
Image Sensing and Acquisition 3 Collect the incoming energy and focus it onto an image plane.
A Simple Image Formation Model 4 Denote an image by a 2D function f(x,y), characterized by two components: Illumination: i(x,y), determined by the illumination source. Reflectance: r(x,y), determined by the characteristics of the imaged objects. The two combine as a product: f(x,y) = i(x,y) r(x,y), where 0 < i(x,y) < infinity and 0 < r(x,y) < 1.
Image Sampling and Quantization 5 Sampling Quantization Digitizing the coordinate values Digitizing the amplitude values
Image Sampling and Quantization 6 Continuous image projected onto a sensor array Results of image sampling and quantization
Digital Image Representation 7 Dynamic range: the number of discrete gray levels allowed for each pixel. Due to processing, storage, and sampling hardware considerations, the number of gray levels typically is an integer power of 2: L = 2^k. We refer to images whose gray levels span a significant portion of the gray scale as having a high dynamic range.
Digital Image Representation 8 Image size: for a square image of size N x N with k bits per pixel, the total number of bits required to represent the image is b = N x N x k.
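The storage formula can be checked with a one-line computation; a minimal Python sketch (the function name `image_bits` is illustrative, not from the slides):

```python
def image_bits(N, k):
    """Total bits to store an N x N image with k bits per pixel: b = N * N * k."""
    return N * N * k

# A 1024 x 1024 image with k = 8 (256 gray levels) needs 1 MB:
print(image_bits(1024, 8))  # 8388608 bits = 1,048,576 bytes
```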
Spatial Resolution 9 Sampling is the principal factor determining the spatial resolution of an image. Example: a 1024x1024 image repeatedly downsampled by a factor of 2, down to 32x32.
10 Spatial Resolution Resampling the lower-resolution images (512x512, 256x256, 128x128, 64x64, 32x32) back up to 1024x1024.
Gray-Level Resolution (L = 256, 128, 64, 32, 16, 8, 4, 2) 11
Histogram 12 The histogram of an image with gray levels in the range [0, L-1] is a discrete function h(r_k) = n_k, where r_k is the kth gray level and n_k is the number of pixels with that gray level. Normalized histogram: p(r_k) = n_k / n, where n is the total number of pixels.
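The histogram and its normalized form can be computed directly; a minimal NumPy sketch (the helper name `gray_histogram` is an assumption, not from the slides):

```python
import numpy as np

def gray_histogram(img, L=256):
    """h(r_k) = n_k: count of pixels at each gray level r_k, k = 0..L-1."""
    h = np.bincount(img.ravel(), minlength=L)
    p = h / img.size  # normalized histogram p(r_k) = n_k / n
    return h, p

img = np.array([[0, 1, 1], [2, 1, 0]], dtype=np.uint8)
h, p = gray_histogram(img, L=4)
print(h)  # [2 3 1 0]
```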
Histogram 13 Useful image statistics Image processing applications Image enhancement Image compression Image segmentation
Color Fundamentals 14 Color spectrum: violet, blue, green, yellow, orange & red. Each color in the spectrum blends smoothly into the next. The colors perceived in an object are determined by the nature of the light reflected from the object.
Color Fundamentals 15 Cones can be divided into three principal sensing categories. Due to the absorption characteristics of the human eye, colors are seen as variable combinations of three primary colors (red, green, blue). Approximately 65% of all cones are sensitive to red light, 33% to green light, and 2% to blue light.
Color Fundamentals 16 Secondary colors of light: Magenta (R + B), Cyan (G + B), Yellow (R + G). A primary color of pigments subtracts a primary color of light and reflects the other two.
Color Fundamentals 17 Brightness: embodies the achromatic notion of intensity. Hue: attribute associated with the dominant wavelength in a mixture of light waves; the dominant color as perceived by an observer. Saturation: the relative purity, i.e., the amount of white light mixed with a hue. Less saturated: e.g. pink (red + white), lavender (violet + white). Hue and saturation taken together are called chromaticity.
Specifying Colors 18 The amounts of red, green, and blue needed to form any particular color are called the tristimulus values and are denoted X, Y, Z, respectively. A color is then specified by its trichromatic coefficients, defined as x = X/(X+Y+Z), y = Y/(X+Y+Z), z = Z/(X+Y+Z), with x + y + z = 1. Colors are specified using the CIE chromaticity diagram, which shows color composition as a function of x (red) and y (green).
Specifying Colors 19 The point marked green has approximately 63% green and 25% red content. The composition of blue is approximately 13%.
Color Models (Color Spaces) 20 A color model is a specification of a coordinate system and a subspace within that system where each color is represented by a single point. Models are hardware-oriented or application-oriented: RGB (color monitors, color video cameras), CMY (cyan, magenta, yellow; color printing), CMYK (cyan, magenta, yellow, black; color printing), HSI (hue, saturation, intensity; closely matches human perception).
The RGB Color Model 21 Based on Cartesian coordinate system Different colors are points on or inside the cube Full color image: 8 bits for each component, total 24 bits
22 The RGB Color Model
The CMY and CMYK Color Models 23 When a surface coated with cyan pigment is illuminated with white light, no red light is reflected from the surface. Cyan subtracts red light Most devices that deposit colored pigments on paper require CMY data input or perform RGB to CMY conversion. Equal amounts of CMY pigments should produce black.
The HSI Color Model 24 RGB/CMY color systems are suited for hardware implementations. RGB system matches nicely with the fact that the human eye is strongly perceptive to red, green, and blue primaries. But RGB and CMY are not well suited for describing colors for human interpretation.
The HSI Color Model 25 We describe a color object by its hue, saturation, and brightness. Hue: color attribute that describes a pure color Saturation: degree of pure color diluted by white light Brightness: measured by intensity HSI color model decouples the intensity component from the color-carrying information
The HSI Color Model 26 Take the RGB cube, stand on the black vertex, with the white vertex above it. The intensity (gray scale) is along the line joining these two vertices.
The HSI Color Model 27 The dot is an arbitrary color point. The angle from the red axis gives the hue, and the length of the vector is the saturation. The intensity of all colors is given by the position of the plane on the vertical intensity axis.
HSI 28 HSI is also known as HSL or HLS; the HSV color space is closely related.
Converting colors from RGB to HSI 29 With RGB values normalized to the range [0,1], the angle theta is measured with respect to the red axis of the HSI space: theta = arccos{ (1/2)[(R-G)+(R-B)] / [(R-G)^2 + (R-B)(G-B)]^(1/2) }. Then H = theta if B <= G, and H = 360 degrees - theta if B > G; S = 1 - 3 min(R,G,B)/(R+G+B); I = (R+G+B)/3.
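The conversion can be sketched in Python following the standard Gonzalez & Woods formulation (the function name is illustrative, and degenerate cases such as gray pixels are guarded with small epsilons):

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert normalized RGB in [0,1] to HSI; H is returned in degrees."""
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    # Clamp the ratio into [-1, 1] to guard against rounding errors.
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den)))) if den > 1e-12 else 0.0
    h = theta if b <= g else 360.0 - theta
    i = (r + g + b) / 3.0
    s = 1.0 - min(r, g, b) / i if i > 1e-12 else 0.0
    return h, s, i

# Pure red: hue 0 degrees, fully saturated, intensity 1/3
print(rgb_to_hsi(1.0, 0.0, 0.0))
```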
The LAB (CIELAB) Color Model 30 CIELAB (L*a*b*) color space. L*: lightness dimension. a*, b*: two chromatic dimensions, roughly red-green (a*) and yellow-blue (b*). L*a*b* color is designed to approximate human vision. http://en.wikipedia.org/wiki/lab_color_space http://coatings.specialchem.com.cn/tc/color/index.aspx?id=cielab
Other Color Models 31 YUV, YIQ, YCbCr color spaces YCbCr is widely used in video/image compression schemes such as MPEG and JPEG Please refer to http://en.wikipedia.org/wiki/color_space
Color Histogram 32 A representation of the distribution of colors in an image. Discretize colors into a number of bins, and count the number of pixels with colors in each bin. http://rsb.info.nih.gov/ij/plugins/color-inspector.html
Nonuniform Quantization 33 An example in HLS (HSI) space Considering human perception Lee, et al. Spatial color descriptor for image retrieval and Video summarization, IEEE Trans. on Multimedia, 2003.
Characteristics of Histogram 34 The color histogram of an image represents the global statistics (color distribution) of pixel colors. The histogram is one of the most useful features for describing images or serving as the basis for a similarity measure.
Histogram-based Difference 35 Bin-wise histogram difference between images I1 and I2: D(I1, I2) = sum over bins j of |H1(j) - H2(j)|.
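A minimal sketch of the bin-wise (L1) difference:

```python
import numpy as np

def hist_difference(h1, h2):
    """Bin-wise histogram difference: D(I1, I2) = sum_j |H1(j) - H2(j)|."""
    return int(np.abs(np.asarray(h1) - np.asarray(h2)).sum())

print(hist_difference([4, 2, 0], [1, 2, 3]))  # 6
```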
Short Introduction to Image Features 36 Color features: color histogram, color moments, color coherence vectors (CCV), color correlogram. Ma, et al., "Benchmarking image features for content-based image retrieval," Record of the 32nd Asilomar Conf. on Signals, Systems & Computers, vol. 1, 1998.
Short Introduction to Image Features 37 Texture features: Tamura features (coarseness, directionality, contrast), multi-resolution simultaneous auto-regressive model, Canny edge histogram, Gabor texture feature, pyramid-structured wavelet transform (PWT) feature, tree-structured wavelet transform (TWT) feature. Ma, et al., "Benchmarking image features for content-based image retrieval," Record of the 32nd Asilomar Conf. on Signals, Systems & Computers, vol. 1, 1998.
Color Moments 38 Contain only the dominant features instead of storing the complete color distributions. Store the first three moments of each color channel of an image in the index: average, variance, skewness.
Color Moments 39
Color Moments 40 Distance between two images I1 and I2: a weighted sum of the differences of the average, the differences of the variance, and the differences of the skewness of each color channel.
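A sketch of the moment index and the distance; conventions vary (standard deviation vs. variance as the second moment, and the cube root of the third central moment as skewness), so treat the exact choices here as assumptions:

```python
import numpy as np

def color_moments(img):
    """First three moments per color channel: mean, standard deviation,
    and the cube root of the third central moment as skewness."""
    px = img.reshape(-1, img.shape[-1]).astype(float)  # (pixels, channels)
    mean = px.mean(axis=0)
    std = px.std(axis=0)
    skew = np.cbrt(((px - mean) ** 3).mean(axis=0))
    return np.stack([mean, std, skew])  # shape (3, channels)

def moment_distance(m1, m2, weights=None):
    """Weighted sum of absolute differences of the moments of two images."""
    w = np.ones_like(m1) if weights is None else weights
    return float((w * np.abs(m1 - m2)).sum())
```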
Color Correlogram 41 A color correlogram expresses how the spatial correlation of pairs of colors changes with distance. For an n x n image I, the histogram is defined as h_ci(I) = n^2 Pr[p in I_ci] for a pixel p of I, i.e., the number of pixels of color ci. The colors in I are quantized into m colors c1, ..., cm. The notation p in I_c is synonymous with p in I, I(p) = c. Huang, et al., "Image indexing using color correlograms," CVPR, 1997.
Color Correlogram 42 Let a distance d be fixed a priori. Then the correlogram of I is defined, for i, j in [m] and k in [d], as gamma^(k)_{ci,cj}(I) = Pr[ p2 in I_cj | p1 in I_ci, |p1 - p2| = k ]. This value gives the probability that a pixel at distance k away from a given pixel of color ci is of color cj. The autocorrelogram alpha^(k)_c(I) = gamma^(k)_{c,c}(I) captures spatial correlation between identical colors only.
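A direct, unoptimized sketch of the autocorrelogram (Huang et al. give an efficient dynamic-programming algorithm; here distance is Chebyshev/L-infinity, as in the paper, and the function name is illustrative):

```python
import numpy as np

def autocorrelogram(img, levels, k):
    """alpha^(k)_c: probability that a pixel at Chebyshev distance k
    from a pixel of quantized color c also has color c."""
    h, w = img.shape
    counts = np.zeros(levels)
    totals = np.zeros(levels)
    for y in range(h):
        for x in range(w):
            c = img[y, x]
            # Visit the square ring of pixels at distance exactly k.
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    if max(abs(dy), abs(dx)) != k:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        totals[c] += 1
                        if img[ny, nx] == c:
                            counts[c] += 1
    # Colors that never occur get probability 0 instead of 0/0.
    return np.divide(counts, totals, out=np.zeros(levels), where=totals > 0)
```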
43 Essence of Video Wei-Ta Chu 2010/9/23
Constitution of Digital Video Data 44 A natural video stream is continuous in both the spatial and temporal domains. In order to represent and process a video stream digitally, it is necessary to sample it spatially and temporally.
Video Stream 45 Natural scene -> Camera -> RGB to YC1C2 -> Processing, Storage, Transmission -> YC1C2 to RGB -> Monitor
Video Data Representation 46 RGB is not very efficient for representing real-world images, since equal bandwidths are required to describe all three color components. E.g., with 8 bits per component, 24 bits per pixel are needed. The human eye is more sensitive to luminance, so many image coding standards and broadcast systems use luminance and color-difference signals: YUV and YIQ for analog television standards, YCbCr for their digital version.
Color Models in Video 47 Largely derived from older analog methods of coding color for TV. Luminance is separated from color information. YIQ is the color space used by the NTSC color TV system, employed mainly in North and Central America, and Japan. In Europe, video tape uses the PAL and SECAM codings, which are based on TV that uses a matrix transform called YUV. Digital video mostly uses a matrix transform called YCbCr that is closely related to YUV.
TV Encoding System 48 PAL, short for Phase Alternating Line, is a color encoding system used in broadcast television systems in large parts of the world. SECAM (French for "Sequential Color with Memory") is an analog color television system first used in France. NTSC is the analog television system in use in the United States, Canada, Japan, South Korea, Taiwan, the Philippines, Mexico, and some other countries.
The YUV Color Model 49 The YUV model defines a color space in terms of one luma (brightness) and two chrominance components. The YUV color model is used in the PAL, NTSC, and SECAM composite color video standards. YUV signals are created from an original RGB source. The weighted values of R, G, and B are added together to produce a single Y signal.
The YUV Color Model 50 The U signal is created by subtracting Y from the blue signal and then scaling; V is created by subtracting Y from the red signal and then scaling by a different factor: U = 0.492 (B - Y), V = 0.877 (R - Y).
The YCbCr Color Model 51 YCbCr is a family of color spaces used in video and digital photography systems. Y is the luma component and Cb and Cr are the blue and red chroma components. Recommendation 601 specifies 8-bit coding: Y = 16 + 65.481 R + 128.553 G + 24.966 B; Cb = 128 - 37.797 R - 74.203 G + 112 B; Cr = 128 + 112 R - 93.786 G - 18.214 B, for R, G, B normalized to [0,1].
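The Rec. 601 8-bit conversion can be sketched directly from those coefficients (the function name is illustrative):

```python
def rgb_to_ycbcr601(r, g, b):
    """8-bit Y'CbCr per ITU-R BT.601 from normalized R'G'B' in [0,1].
    Y spans 16..235; Cb and Cr span 16..240, centered at 128."""
    y  = 16  +  65.481 * r + 128.553 * g +  24.966 * b
    cb = 128 -  37.797 * r -  74.203 * g + 112.0   * b
    cr = 128 + 112.0   * r -  93.786 * g -  18.214 * b
    return round(y), round(cb), round(cr)

print(rgb_to_ycbcr601(1.0, 1.0, 1.0))  # white -> (235, 128, 128)
```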
Chroma Subsampling 52 4:2:2 indicates horizontal subsampling of the Cb, Cr signals by a factor of 2. Of four pixels labeled 0 to 3, all four Ys are sent, but only two Cbs and two Crs: (Y0,Cb0) (Y1,Cr0) (Y2,Cb2) (Y3,Cr2). 4:2:0 subsamples in both the horizontal and vertical dimensions by a factor of 2.
Examples 53 Given an image resolution of 720x576 pixels represented with 8 bits per component, the bit rate required is: 4:4:4 resolution: 720x576x8x3 = about 10 Mbits/frame; 4:2:0 resolution: (720x576x8) + (360x288x8)x2 = about 5 Mbits/frame.
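The arithmetic generalizes to the other sampling ratios; a small sketch (the function name is illustrative):

```python
def frame_bits(width, height, bits=8, sampling="4:4:4"):
    """Bits per frame for 8-bit Y'CbCr at common chroma subsampling ratios."""
    luma = width * height * bits
    if sampling == "4:4:4":
        chroma = 2 * width * height * bits
    elif sampling == "4:2:2":
        chroma = 2 * (width // 2) * height * bits
    elif sampling == "4:2:0":
        chroma = 2 * (width // 2) * (height // 2) * bits
    else:
        raise ValueError(sampling)
    return luma + chroma

print(frame_bits(720, 576, sampling="4:4:4"))  # 9953280  (~10 Mbits)
print(frame_bits(720, 576, sampling="4:2:0"))  # 4976640  (~5 Mbits)
```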
Motion Estimation 54 Successive video frames may contain the same objects (still or moving). Motion estimation examines the movement of objects in an image sequence to try to obtain vectors representing the estimated motion.
Motion Estimation 55 The Essence of Image and Video Compression, by A.C. Kokaram http://www.mee.tcd.ie/~ack/teaching/1e8/lecture3.pdf
Three Typical Types of Coded Picture 56 I frame (intraframe): encoded without any temporal prediction. P frame (forward predicted frame): interframe encoded using motion prediction from the previous I or P frame. B frame (bidirectionally predicted frame): interframe encoded using interpolated motion prediction between the previous I or P frame and the next I or P frame.
Motion Prediction 57 A typical Group of Picture (GOP) in MPEG-2
Short Introduction to Video Features 58 Motion-based features Camera motion, object motion Motion activity/magnitude Moving object detection Shot-based features Average shot length/shot change frequency Scene-based features
Motion Type 59 Camera motion (global motion) Zoom-in/Zoom-out Pan Tilt Object motion
Motion Activity/Magnitude 60 Attributes: Intensity of activity Direction of activity Spatial distribution of activity Indication of the number and size of active regions Temporal distribution of activity Variation of activity over the duration of a video segment or shot
Average Shot Length / Shot Change Frequency 61 A statistical measurement that divides the total length of the film by the number of shots: the average duration of a shot between cuts. Directors often change shots frequently (shorter ASL) to attract the audience, e.g. in commercials. Video segments with longer ASLs usually present peaceful scenes.
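The measurement itself is a single division; a trivial sketch (the frame rate parameter is an assumption):

```python
def average_shot_length(total_frames, num_shots, fps=30):
    """Average shot length in seconds: film length divided by shot count."""
    return total_frames / num_shots / fps

# A 90-second clip at 30 fps cut into 30 shots has a 3-second ASL:
print(average_shot_length(2700, 30))  # 3.0
```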
62 Video Syntax Analysis Wei-Ta Chu 2010/9/23
Outline 63 Shot boundary detection Scene boundary detection Keyframe selection
Video Structure 64 Shot A consecutive sequence of frames recorded from a single camera. Scene A collection of semantically related and temporally adjacent shots, depicting and conveying a high-level concept or story. Scene Shot Frame Video
Shot Boundary Detection / Shot Change Detection 65 Shot: a basic unit for advanced accessing, browsing, summarization, and retrieval. Keyframes: representative frame(s) of a shot. Issues: large camera/object motion; editing effects (dissolve, wipe, fade); flashlights.
Types of Shot Change 66 Abrupt change (hard cut): the cut occurs in a single frame, when stopping and restarting the camera. Gradual transition: Fade-in: gradual increase in intensity starting from a black frame. Fade-out: gradual decrease in intensity resulting in a black frame. Dissolve: transiting from the end of one clip to the beginning of another. Wipe: one image is replaced by another with a distinct edge that forms a shape.
Examples of Shot Changes 67 Cut, Dissolve, Wipe. Li and Lee, "Effective detection of various wipe transitions," IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 6, pp. 663-673, 2007.
Examples of Fade 68 Fade out Fade in Cernekova, et al., Information theory-based shot cut/fade detection and video summarization IEEE Trans. on Circuits and Systems for Video Technology, vol. 16, no. 1, pp. 82-91, 2006.
Different Types of Wipe 69 Li and Lee, "Effective detection of various wipe transitions," IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 6, pp. 663-673, 2007. Video example: http://en.wikipedia.org/wiki/wipe_%28transition%29
Detection Process 70 Extract features -> Calculate similarity -> Decide boundaries. Video: Shot 1, Shot 2, Shot 3, Shot 4
Features 71 Pixel difference Statistical difference Histograms Compression differences Edge Motion
Pixel Difference 72 Count the number of pixels that change in value more than some threshold. May be sensitive to camera motion.
1. Pair-wise comparison 73 Compare the corresponding pixels in two frames. Problem: sensitive to camera movement, e.g. camera panning. Improvement: smooth with a 3x3 window before comparison. Zhang, et al., "Automatic partitioning of full-motion video," Multimedia Systems Journal, vol. 1, pp. 10-28, 1993.
2. Histogram Comparison 74 Less sensitive to object motion, since it ignores spatial changes within a frame. Let H_i(j) be the histogram value for the ith frame, where j is one of the G gray levels; the difference is SD_i = sum over j from 0 to G-1 of |H_i(j) - H_{i+1}(j)|.
2. Histogram Comparison Example 75 Example video sequence The intensity histogram of the first three frames
2. Histogram Comparison 76 Color histogram difference: d(I_i, I_{i+1}) = sum over (r,g,b) of |p_i(r,g,b) - p_{i+1}(r,g,b)|, where p_i(r,g,b) is the number of pixels of color (r,g,b) in frame I_i of N pixels. Each color component is discretized to 2^B different values.
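Putting the pieces together, a sketch of cut detection by color-histogram difference (B = 2 bits per channel here; the function names and threshold choice are assumptions, and thresholds are application-dependent):

```python
import numpy as np

def color_hist(img, bins_per_channel=4):
    """Quantize each RGB channel to 2^B values (here B = 2) and count pixels."""
    q = (img // (256 // bins_per_channel)).reshape(-1, 3)
    idx = q[:, 0] * bins_per_channel**2 + q[:, 1] * bins_per_channel + q[:, 2]
    return np.bincount(idx, minlength=bins_per_channel**3)

def is_cut(frame_a, frame_b, threshold):
    """Declare a shot boundary when the bin-wise histogram difference
    between consecutive frames exceeds a threshold."""
    d = np.abs(color_hist(frame_a) - color_hist(frame_b)).sum()
    return d > threshold
```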
3. Likelihood Ratio 77 Compare corresponding regions (blocks) in two successive frames based on second-order statistical characteristics of their intensity values. With m_i the mean intensity and S_i the variance of a given region in frame i, the ratio is lambda = [ (S_i + S_{i+1})/2 + ((m_i - m_{i+1})/2)^2 ]^2 / (S_i S_{i+1}). A camera break is declared whenever the total number of sample areas whose likelihood ratio exceeds a threshold is sufficiently large. This raises the tolerance to slow and small object motion from frame to frame.
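A sketch of the per-block ratio (population variance via NumPy; blocks with zero variance would need guarding in practice, and the function name is illustrative):

```python
import numpy as np

def likelihood_ratio(block_a, block_b):
    """Likelihood ratio between corresponding blocks of two frames, from
    mean (m) and variance (S) of their intensities:
    lambda = [ (S_a + S_b)/2 + ((m_a - m_b)/2)^2 ]^2 / (S_a * S_b).
    A ratio near 1 means the blocks are statistically similar."""
    m_a, m_b = block_a.mean(), block_b.mean()
    s_a, s_b = block_a.var(), block_b.var()
    num = ((s_a + s_b) / 2 + ((m_a - m_b) / 2) ** 2) ** 2
    return num / (s_a * s_b)
```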
4. Edge Change Ratio 78 Zabih, et al., "A feature-based algorithm for detecting and classifying scene breaks," Proc. of ACM Multimedia, pp. 189-200, 1995.
4. Edge Change Ratio 79
4. Edge Change Ratio 80 The edge change ratio for frame n is ECR_n = max( X_n^in / sigma_n, X_{n-1}^out / sigma_{n-1} ), where sigma_n is the number of edge pixels in frame n, X_n^in the number of entering edge pixels in frame n, and X_{n-1}^out the number of exiting edge pixels in frame n-1.
5. Motion Vectors 81 Using the direction of motion prediction as the cue for shot change detection. Pei, et al., "Scene-effect detection and insertion MPEG encoding scheme for video browsing and error concealment," IEEE Trans. on Multimedia, vol. 7, no. 4, pp. 606-614, 2005.
5. Motion Vectors 82 Using motion vector information to filter out false positives. Zhang, et al., "Automatic partitioning of full-motion video," Multimedia Systems Journal, vol. 1, pp. 10-28, 1993.
6. Differences in DCT domain 83 Discrete Cosine Transform (DCT) coefficients: 1. Select a subset of blocks. 2. Select a subset of DCT coefficients of these blocks. 3. Concatenate the selected coefficients of the selected blocks as a vector. 4. Calculate the similarity of two coefficient vectors. Arman, et al., "Image processing on encoded video sequences," Multimedia Systems Journal, vol. 1, no. 5, pp. 211-219, 1994.
Gradual Transition Detection 84 Cuts or abrupt change Gradual transition
1. Twin-Comparison Approach 85 Zhang, et al., Automatic partitioning of full-motion video Multimedia Systems Journal, vol. 1, pp. 10-28, 1993.
2. Edge Change Ratio 86 Lienhart, R., Comparison of automatic shot boundary detection algorithms Proc. of SPIE Storage and Retrieval for Image and Video Databases VII, vol. 3656, pp. 290-301, 1999.
87 2. Edge Change Ratio
3. Characterizing a Wipe Transition 88
Evaluation 89 Precision The percentage of retrieved items that are desired items Recall The percentage of desired items that are retrieved. Precision = # Correctly retrieved items # All retrieved items = # Correctly retrieved items # Correctly retrieved items + # Falsely retrieved items Recall = # Correctly retrieved items # All relevant items = # Correctly retrieved items # Correctly retrieved items + # Items that are not retrieved
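The two measures as code:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# 8 boundaries detected correctly, 2 false alarms, 2 boundaries missed:
print(precision_recall(tp=8, fp=2, fn=2))  # (0.8, 0.8)
```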
Evaluation: Other Terms 90
True positive (TP): # correctly retrieved items
False positive (FP): # falsely retrieved items
True negative (TN): # correctly missed items
False negative (FN), or miss: # items that are not retrieved
Confusion matrix: actual positive -> predicted positive: TP, predicted negative: FN; actual negative -> predicted positive: FP, predicted negative: TN
Evaluation 91 Detected (retrieved) vs. relevant (ground truth) items as a confusion matrix: predicted positive row holds TP (actual positive) and FP (actual negative); predicted negative row holds FN (actual positive) and TN (actual negative).
Relationship between Precision & Recall 92 Precision-Recall (PR) curve
93 Relationship between True Positive and False Positive Receiver Operator Characteristic (ROC) curve
Using PR or ROC Curves? 94 ROC curves can present an overly optimistic view of an algorithm's performance if there is a large skew in the class distribution, i.e., the number of true negative examples greatly exceeds the number of positive examples. Then a large change in the number of false positives leads to only a small change in the false positive rate. Precision compares false positives to true positives and better captures the algorithm's performance. Davis, et al., "The relationship between precision-recall and ROC curves," Proc. of International Conference on Machine Learning, pp. 233-240, 2006.
Comparison of Shot Boundary Detection Techniques 95 Methods: histograms, region histograms, running histograms, motion-compensated pixel differences, DCT coefficient differences. Evaluation data:
Video type | # Frames | Cuts | Gradual transitions
TV | 133204 | 831 | 42
News | 81595 | 293 | 99
Movie | 142507 | 564 | 95
Commercial | 51733 | 755 | 254
Misc. | 10706 | 64 | 16
Total | 419745 | 2507 | 506
Methods Compared 96 Histogram: 64-bin gray-level histogram difference, single threshold. Region (block) histogram: 16 blocks, 64-bin gray-scale histograms, a difference threshold for each block, and a count threshold for changed blocks. Running histogram (twin method): 64-bin gray-scale histogram for each frame, twin thresholds; compute motion vectors and, if motion is excessive, reject gradual changes. Motion-compensated pixel difference: 12 blocks per frame, a motion vector for each block; compute average residual errors and, if larger than a high threshold, detect a cut; use cumulative errors to detect gradual changes (similarly); use motion vectors to reject false gradual changes. DCT difference: concatenate 15 coefficients at the same locations from different blocks to form a vector; compute (1 - inner product of the two vectors from consecutive frames).
PR Curve for TV program 97
PR Curve for News program 98
PR Curve for Movie Videos 99
PR Curve for Commercials 100
PR Curve for All Data 101
PR Curve for All Data Cut Only 102
Observations 103 The histogram-based method is consistent: it produced the first or second best precision, and it is simple and straightforward. The region algorithm seems best where recall is not the highest priority. The running algorithm seems best where recall is important. Motion vectors are helpful for reducing false positives. DCT is the worst: a large number of false positives on black frames.
References 104 J.S. Boreczky, et al., "Comparison of video shot boundary detection techniques" Proc. of SPIE Conference on Storage and Retrieval for Image and Video Databases, vol. 2670, 1996. (must read) R. Lienhart, "Comparison of automatic shot boundary detection algorithms" Proc. of SPIE Storage and Retrieval for Image and Video Databases VII, vol. 3656, pp. 290-301, 1999. J. Yuan, et al., "A formal study of shot boundary detection" IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 2, pp. 168-186, 2007. A. Hanjalic, "Shot-boundary detection: unraveled or resolved?" IEEE Trans. on Circuits and Systems for Video Technology, vol. 12, no. 2, pp. 90-105, 2002.