Video Color Conceptualization using Optimization


Cao Xiaochun, Zhang YuJie, Guo Xiaojie
School of Computer Science and Technology, Tianjin University, China
Tel: +86-138068739 Fax: +86--7406538 Email: xcao, yujiezh, xguo@tju.edu.cn

Yiu-ming Cheung
Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, China
Tel: +85-34115155 Fax: +85-3411789 Email: ymc@comp.hkbu.edu.hk

Color conceptualization aims to propagate color concepts from a library of natural color images to the input image by changing its main color. However, the existing method may lead to spatial discontinuities in images because of the absence of a spatial consistency constraint. In this paper, to solve this problem, we present a novel method to force neighboring pixels with similar intensities to have similar colors. Using this constraint, color conceptualization is formalized as an optimization problem with a quadratic cost function. Moreover, we further expand two-dimensional (still-image) color conceptualization to three dimensions (video), and use the information of neighboring pixels in both space and time to improve the consistency between neighboring frames. The performance of our proposed method is demonstrated for a variety of images and video sequences.

Keywords: color conceptualization, color discontinuity, optimization, color correspondence, video sequence

Images and videos provide visual perceptions. There are many aspects of the content of an image, each providing different information. An important aspect, providing much of the visual perception of an image, is the composition of colors. Csurka et al. [1] abstracted concepts of look and feel (e.g., capricious, classic, cool, and delicate impressions) from an image according to its color combinations. In practice, one may want to edit an image or video according to different task demands or personal preferences.
Generally, altering the color of the image or video is a popular and intuitive way to meet such requirements [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. Reinhard et al. [5] proposed a method of borrowing the color characteristics of an image via simple statistical analysis.

Figure 1: (a) Verbal terms extracted by clustering many images into different moods according to their hue distributions. The left three columns show some of the clustered images, while the right column shows the hue distribution of each category. (b) The input image and its hue distribution.

Researchers [2, 3, 4] have proposed different colorization or color transfer methods that obtain colors from given reference images by taking a color correspondence approach. Automatic colorization methods that search for reference images on the Internet using various filtering algorithms have also been proposed [6, 7]. The success of these methods [2, 3, 4, 5, 6, 7] depends heavily on finding a suitable reference image, which can be a rigorous and time-consuming task. The colorization methods employed in [8, 9] are based on a set of chrominance scribbles. The process is tedious and does not always produce natural-looking results. Cohen-Or et al. [10] and Tang et al. [11] changed the colors of pictures to give the sense of a more harmonious state using empirical harmony templates of color distribution. However, this technique cannot change colors flexibly to meet the demands of users. Hou and Zhang [12] first introduced a technique to change the image color intentionally, which is called image color conceptualization. In their work [12], prototypes of color distributions were generated by clustering a vast number of images, and the mood of the input image was then changed by transferring a color distribution to it. Xu et al. [13] also proposed a method to change the emotion conveyed by images. They used a learning framework to discover emotion-related knowledge, such as color and texture, and then constructed emotion-specific models from features of the image super-pixels. To change the conveyed emotion, they defined a piece-wise linear transformation to align the feature distribution of the target image with the statistical model.
The goal of their method was to change the high-level emotion, while the method we propose here focuses on changing color using low-level features. The method proposed in this paper is most closely related to the work of Hou and Zhang [12]. Hou and Zhang designed a clustering model to generate prototypes of color distributions from an input library of natural landscape and architectural images, and labeled each distribution with a verbal term such as warm or cold (see Fig. 1 (a)). The main component of each color distribution (i.e., the color concept), which corresponds to the representative color mood of the image, is then extracted. The propagation of a certain color concept to the target image is manipulated by adopting the peak-mapping method. However, since the hue wheel is shifted without consideration of spatial information, some artifacts may be introduced during the propagation. In this paper, we use an optimization method to solve the problem, employing a simple premise: neighboring pixels in space-time with similar intensities should have similar colors [8]. With consideration of the spatial information, the spatial continuity of colors in the generated image is ensured. Moreover, the optimization plays an important role in expanding the color conceptualization technique to three dimensions (i.e., video).

Figure 2: The left half of each picture is the original input image, and the right half is the color-conceptualized result.

Color conceptualization for video is much more attractive and challenging than that for a still image. However, to the best of our knowledge, no such system exists. The most straightforward idea is to apply the color conceptualization technique to each frame individually. However, this does not exploit the coherence between adjacent frames. In fact, a video may contain many different shots and, even in the same shot, there are many significant changes such as varying illumination and the movement of objects. Therefore, the result obtained by simple application to individual frames is often far from satisfactory. Even in the same shot, adjacent frames probably differ in terms of their color conceptualization, which results in flickering. Another possible solution for color conceptualization of video is video object segmentation [14, 15], in which changes are made across frames of the same shot to avoid flickering. Unfortunately, current three-dimensional segmentation techniques are not precise enough. In this paper, we alternatively apply the optimization method to video color conceptualization to ensure the continuity of colors both spatially and temporally, and thus provide a pleasant experience when watching the output video. Levin et al. [8] proposed a method of coloring image sequences making the simple assumption that nearby pixels in space-time with similar gray levels should have similar colors. This assumption is further employed in the most important step of our video color conceptualization. The proposed method is demonstrated experimentally to be effective in solving the discontinuity problem.

The paper is organized as follows. In Section 1, we discuss the existing method for image color conceptualization and formulate the problem of color conceptualization. In Section 2, we propose an optimization method to improve the existing technique, and extend the proposed color conceptualization method to three dimensions (i.e., video). We also detail problems arising in implementation and the corresponding solutions. Various experiments are carried out in Section 3 for both images and video frames. Section 4 presents conclusions.

1 Related Work

The main goal of color conceptualization is to extract color concepts by clustering images and to change the mood of an input image by propagating the expected color concept to it (as shown in Fig. 2). All the work in this paper is conducted in the HSV [16] color space and is based on the hue wheel representation.

1.1 Hue Wheel Representation of an Image

In HSV color space, the hue is distributed around a circle, starting with the red primary at 0°, passing through the green primary at 120° and the blue primary at 240°, and then wrapping back to red at 360° (as shown in Fig. 1 (b)). Given an input image I, we first convert it into the HSV color space. The hue wheel H_I(i) of the input image is then defined as [12]

    H_I(i) = Σ_{(i−1)π/180 ≤ H(p) < iπ/180} S(p) V(p),    (1)

where H(p), S(p) and V(p) are the hue, saturation and value of pixel p from image I, and i ∈ [1, 360] is an integer. The range of hue is divided into 360 orientations; we thus obtain 360 bins around the hue wheel. Subsequently, by calculating the value of H_I(i) for every i, the histogram of the hue wheel representation H_I is obtained as shown in Fig. 1 (b). The expression respects the fact that pixels with high saturation and high brightness always attract more attention. For one image, there might be multiple peaks in the hue wheel. However, the dominant color is represented by the dominant peak; therefore, we choose the strongest peak as the main color of the image. To cut the main hue peak at the proper position in the hue wheel, we adopt the three following steps (as shown in the upper half of Fig. 3).

1. Fit the peak Pk(·) by a Gaussian function G with mean μ and standard deviation σ.
2. Set the left cut position α = μ − 2.5σ and the right cut position β = μ + 2.5σ.
3. Save Pk(α, β) as the main hue peak of the image (since about 98.76% of a Gaussian distribution lies within μ ± 2.5σ).
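As a concrete illustration, the saturation- and value-weighted binning of equation (1) takes only a few lines of numpy. This is a minimal sketch; the function name and the degree-based bin indexing are our own assumptions, not the authors' code:

```python
import numpy as np

def hue_wheel(hue_deg, sat, val):
    """Hue wheel of Eq. (1): a 360-bin histogram over 1-degree hue bins,
    where each pixel contributes saturation times value, so that bright,
    saturated pixels dominate the wheel."""
    bins = np.clip(hue_deg.astype(int) % 360, 0, 359)
    return np.bincount(bins, weights=sat * val, minlength=360)

# toy image: two saturated red pixels and two dull green pixels
hue = np.array([0.0, 1.0, 120.0, 120.0])     # degrees
sat = np.array([1.0, 1.0, 0.2, 0.2])
val = np.array([1.0, 1.0, 0.5, 0.5])
H = hue_wheel(hue, sat, val)                 # H[0] = 1.0, H[120] = 0.2
```

The dominant peak of H would then be fitted with a Gaussian to extract the main color, as in the three steps above.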

Figure 3: (Upper) The hue wheel of the input image. (Lower) The hue wheel of the color concept. There are two alternatives for the hue values in [α−d, α): a shift to a range near α_c or no change; likewise, there are two alternatives for the hue values in (β, β+d]: a shift to a range near β_c or no change.

1.2 Clustering Images

Numerous color naming models intend to relate a numerical color space with semantic color names in natural language, such as grass green and light sky blue [17, 18, 19]. The terms relate to color impressions; e.g., light sky blue distinguishes a particular color mood from other color distributions [12]. Most images convey an atmosphere by a main color. By clustering images through the Kullback-Leibler (KL) divergence of the distributions of hue wheels, we can extract typical moods. The KL divergence of hue wheels, D(H_I ‖ H_C), is defined as

    D(H_I ‖ H_C) = Σ_{i=1}^{360} H_I(i) log ( H_I(i) / H_C(i) ),    (2)

where H_I is the hue wheel of the input image and H_C is the hue wheel of an image category. Given an image library, we use the algorithm proposed in [12] to cluster images into different categories. Images in the same category have the same mood, and we label each category with a subjective description such as warm or cold. For an image category, the hue wheels of all the images in the category are calculated according to equation (1), and together form the hue wheel of the category, which is represented by H_C. The dominant peak of H_C represents the current main mood, which we call the color concept.

1.3 Propagating the Color Concept

Color conceptualization is the process of replacing the hue peak of the input image with the desired color concept. Here we normalize the hue peak according to

    ρ(i, H) = Σ_{t=α}^{i} H(t) / Σ_{t=α}^{β} H(t),    (3)

and then use the algorithm in [12] (which we call the color mapping algorithm for convenience) to propagate the color concept as follows.

1. For each i ∈ (α, β), calculate ρ(i, H_I).
2. For each i, find the j that satisfies ρ(i, H_I) = ρ(j, H_C).
3. Assign i → j.

Figure 4: (Left) The input image. (Middle) The result obtained using the method in [12], with color discontinuities on the petal. (Right) The result obtained using our method.

In the color manipulations made using the color mapping algorithm, the peak of a hue wheel is uniformly cut off at i = α and i = β. However, in real implementations, this may result in artifacts in the image introduced by the splitting of a contiguous region of the image [10]. An example is presented in Fig. 4 (middle). The splitting occurs in some regions with similar color, with part of a region falling within the peak and the other part falling outside the peak, which leads to discontinuity of color after the color transformation. To solve this problem, Hou and Zhang [12] used a local minimum position to cut the peak, and achieved good results for most images. However, the method is not always effective (Fig. 4 (middle)). In many cases, directly cutting off the hue peak at any position will similarly result in discontinuity. Therefore, it is necessary to explore a new approach that uses spatial information to enforce spatial continuity.

2 Color Conceptualization using Optimization

2.1 Spatially Consistent Image Color Conceptualization

Inspired by image and video colorization assisted by optimization [8], we combine the cost function and the optimization of the hue wheel to solve the peak boundary problem. The main steps are elaborated below.

1. Fit the hue peak Pk(·) of the input image with a Gaussian function G, as in [12] (see the red fitting line in the upper half of Fig. 3). The left cut position is initialized as α = μ − 2.5σ, and the right cut position as β = μ + 2.5σ. The hue peak falling in [α, β] is changed to the desired color concept using the color mapping algorithm mentioned above.

2. Define two new cut positions, α − d and β + d, and keep the hue values falling in [0, α−d) and (β+d, 360] (i.e., to the left of α−d and to the right of β+d in the upper half of Fig. 3) unchanged. The parameter d will be discussed in Section 3.

3. There are two alternatives for the pixels with hue values falling in [α−d, α) or (β, β+d] (the parts below the black curly braces in the upper half of Fig. 3): to change to the color concept or not. In the case of a change, the hue values of pixels falling in [α−d, α) are changed to α_c, and the hue values of pixels falling in (β, β+d] are changed to β_c. Here α_c and β_c are respectively the left border and the right border of the desired color concept (as shown in Fig. 3). The optimal scheme B(Ω) for the given image is determined by minimizing the following function over the choices for all undetermined pixels:

    B(Ω) = arg min Σ_p ( H(p) − Σ_{q∈N(p)} w_pq H(q) )²,    (4)

where H(p) is the hue value of pixel p in the input image, and N(p) is the set of eight neighbors of pixel p. Note that w_pq is a weight coefficient satisfying [20]

    w_pq ∝ e^( −d(x_p, x_q) / (2σ_p²) ),    (5)

where d(x_p, x_q) is the squared difference between the intensities of pixels p and q, and σ_p² is the variance of the intensity in a window around pixel p. Obviously, w_pq increases as the difference between intensities decreases. For a given pixel p, Σ_{q∈N(p)} w_pq = 1. The minimization in equation (4) guarantees that neighboring pixels have similar colors if their intensities are similar.
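A minimal dense-solver sketch of the minimization in equations (4)-(5) on a small grid follows. A practical implementation would use a sparse solver over full-size images, and the function name, the local-variance estimate, and the border handling here are our own simplifying assumptions. Fixing the determined pixels and requiring each undetermined pixel's hue to equal the weighted average of its neighbors' hues is the stationary condition of the quadratic cost:

```python
import numpy as np

def propagate_hue(intensity, hue, known):
    """Propagate hue to undetermined pixels (Eqs. 4-5, sketched).

    intensity: (h, w) grayscale image from which the weights are derived.
    hue:       (h, w) hue plane; only entries where `known` is True are trusted.
    known:     (h, w) boolean mask of pixels whose hue is already decided.
    Each unknown pixel p satisfies H(p) = sum_{q in N(p)} w_pq H(q), with
    w_pq ~ exp(-(Y_p - Y_q)^2 / (2 sigma_p^2)) normalized to sum to 1.
    """
    h, w = intensity.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    A = np.eye(n)
    b = np.zeros(n)
    for y in range(h):
        for x in range(w):
            p = idx[y, x]
            if known[y, x]:
                b[p] = hue[y, x]              # fixed pixel: H(p) = hue(p)
                continue
            nbrs = [idx[yy, xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))
                    if (yy, xx) != (y, x)]
            d = (intensity[y, x] - intensity.flat[nbrs]) ** 2
            var = d.mean() + 1e-6             # stand-in for the local variance
            wts = np.exp(-d / (2.0 * var))
            wts /= wts.sum()                  # sum_q w_pq = 1
            A[p, nbrs] = -wts                 # H(p) - sum_q w_pq H(q) = 0
    return np.linalg.solve(A, b).reshape(h, w)
```

On a uniform-intensity patch whose border hues are fixed, the interior hue converges to the border hue, which is exactly the spatial-continuity behavior the constraint is meant to enforce.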
2.2 Video Color Conceptualization

Compared with still-image color conceptualization, video color conceptualization is much more attractive and challenging because it involves the ties and changes between adjacent frames. In addition, there may be various scenes in one video, and their theme contents and main colors can vary. If conceptualized uniformly, the video will likely appear awkward and distorted.

Moreover, the color conceptualization one desires should be based on the video content, rather than being arbitrary. Therefore, scene segmentation is essential. State-of-the-art shot-detection methods [22, 23] can be used in our framework. To demonstrate the performance of our method, we use a simple and effective method to distinguish different scenes in the video, based on the square of the absolute difference of gray values. In practice, we compute the average squared absolute difference between adjacent frames, starting from the first frame:

    M_f = (1/n) Σ_{k=1}^{n} ( I_f(k) − I_{f−1}(k) )²,    (6)

where I_f(k) is the gray value of pixel k in frame f and n is the number of pixels. For a given threshold, frame f is treated as the beginning of a new scene if M_f is equal to or greater than the pre-defined threshold. The remaining work then concentrates on each single scene.

Even for the same scene, video color conceptualization cannot be as simple as image color conceptualization. Applying image color conceptualization to each frame individually usually leads to flickering artifacts in the output video; e.g., Fig. 5 (b). There are two main reasons for this. First, the hue wheels of two adjacent frames are highly unlikely to be exactly the same, which results in different colors needing to be changed in the two frames. Second, the edges of objects changing during the conceptualization process are unstable because of the absence of a time consistency constraint. Instead of calculating the hue wheel representation of each single frame separately, a hue wheel representation of the whole shot can be computed using equation (1). Similar to the first two steps in the propagation of the color concept described in Section 2.1, the hue peak of the video shot is fitted with a Gaussian function G, and the left and right borders are α = μ − 2.5σ and β = μ + 2.5σ, respectively. Two additional cut positions are α − d_v and β + d_v.
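The scene-segmentation rule of equation (6) amounts to thresholding the mean squared difference between consecutive gray-level frames. A minimal sketch, with a function name and list-of-frames interface of our own choosing:

```python
import numpy as np

def shot_boundaries(frames, threshold):
    """Return the indices of frames that open a new shot: frame f starts a
    shot when the mean squared gray-level difference M_f from frame f-1
    reaches the threshold (Eq. 6)."""
    starts = [0]                              # the first frame opens a shot
    for f in range(1, len(frames)):
        m_f = np.mean((frames[f].astype(float) - frames[f - 1]) ** 2)
        if m_f >= threshold:
            starts.append(f)
    return starts

# three static dark frames followed by an abrupt cut to bright frames
frames = [np.zeros((4, 4))] * 3 + [np.ones((4, 4))] * 2
starts = shot_boundaries(frames, threshold=0.5)   # -> [0, 3]
```

Each returned index marks the first frame of a scene, and the conceptualization is then applied per scene.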
Subsequently, the hue values falling in [α, β] are changed to [α_c, β_c] according to the color mapping algorithm mentioned above, while the hue values falling in [0, α−d_v) or (β+d_v, 360] remain unchanged. There are also two options for the pixels with hue values falling in [α−d_v, α) or (β, β+d_v]: a shift in the hue value or no shift. However, as opposed to the case of image color conceptualization, we use both spatial and temporal information to structure the optimization problem so that the best scheme for the whole shot can be obtained. Analogously, according to the principle that neighboring pixels in space-time with similar intensities are expected to have similar colors, the objective function to be minimized can be formalized as

    B(Ω) = arg min Σ_{p∈V} ( H(p) − Σ_{q∈N(p)} w_pq H(q) )²,    (7)

where H(p) is the hue value of pixel p in the input video V, and w_pq is the weight coefficient satisfying equation (5). As opposed to the case of image manipulation, N(p) here represents the 26 neighboring pixels in spatial-temporal space [24].

Figure 5: (a) Four successive frames of an input video. (b) Color conceptualization results obtained using Hou and Zhang's method [12] for each frame individually, with discontinuous and varying red regions on a leaf. (c) (d) Color conceptualization results obtained using our method.

2.3 Color Correspondence

In the case that the hue of a pixel is to be changed, we have previously changed the hue to α_c if the hue value falls in [α−d, α), and to β_c if the hue value falls in (β, β+d]. However, this would still result in artifacts, since pixels with different hue values may change to the same value. Consequently, instead of changing all pixels to the same value, we employ a more elaborate scheme [10] to achieve correspondence of color appearance [12]:

    H̃(p) = μ_c + r · σ_c · ( 1 − G_σ( H(p) − μ ) ),    (8)

where p is a pixel with a hue value falling in (β, β+d], H̃(p) is the hue value that pixel p will change to if it needs to change, H(p) is the original hue value of pixel p, and r and σ are parameters that will be discussed later. μ and μ_c are the mean values of the Gaussian functions fitting the hue peak of the shot and the concept peak, respectively, and σ_c is the standard deviation of the concept peak. G_σ is a Gaussian function with mean zero and standard deviation σ, ranging continuously from 0 to 1. From equation (8), we find that the hue values of pixels falling in (β, β+d] will become distributed near β_c, in the same order as their original values, and become compact (as shown in

Fig. 3). The hue values of pixels falling in [α−d, α) are changed to be near α_c using a similar method.

2.4 Circle Problem of Hue

The main principle of our method is that neighboring pixels in space-time that have similar intensities should have similar colors. Under this assumption, we decide the hue value of each undetermined pixel according to the weighted sum of its adjacent pixels. However, the distribution of the hue value is a circular ring, where hue 0 and hue 2π represent the same color. As an extreme example, if the hue of an undetermined pixel depends on two neighboring pixels with weighting coefficients w_1 = 0.5 and w_2 = 0.5, and the hue values of the two pixels are H_1 = 0 and H_2 = 2π, then using the proposed method the expected hue value of the undetermined pixel is π. This means that the color of a pixel in a pile of red pixels may change to green, which is obviously unreasonable. The simplest solution is to make the hue distribution linear by disconnecting the hue wheel at an appropriate point chosen according to the specific input pictures. The undetermined points and their neighboring points are always in or near the hue peaks of the input image and the color concept, so we should find a cutoff point as far from both hue peaks as possible. We can then guarantee that there is only one distance between any two neighboring pixels among the undetermined points, and nearby hue values will not be pulled apart. Let A_1 and A_2 be the median points between μ and μ_c along the two directions around the wheel; the one with the larger distance to the peaks is the farthest point from the two main peaks. Therefore, the median point A_1 or A_2 with the larger distance is selected as the cutoff point.

3 Experimental Results

Figure 6: (a) The input image. (b) (c) (d) The resulting images obtained using our method with d = 10, 140, and 60.

In this section, we present various image and video results obtained using our proposed method. First, we illustrate that color conceptualization differs from color transfer in two respects.
First, the main purpose of color conceptualization is to change the mood of a picture, while that is not the case for color transfer. Second, color conceptualization generates color concepts by clustering a number of pictures once, while a color transfer method has to find a suitable reference picture for each target image.

We experimentally investigate the performance of our method on a variety of pictures and videos. The parameter d (introduced in Section 2.1) is crucial because it decides the number of pixels with undetermined hue values. If the value of d is too small, too few pixels are treated as undetermined (as shown in Fig. 6 (b), where the color of the mountain on the left is not consistent). On the other hand, if the value of d is too large, some background pixels are wrongly labeled as undetermined (as shown in Fig. 6 (c), almost the whole image becomes the same color). In our implementation, we set d = 60, as shown in Fig. 6 (d).

Figure 7: (a) The upper image is the input image. The white areas in the bottom image are undetermined pixels. (b) (c) (d) Resulting images obtained using our method with r = 2.5, r = 3, and r = 4, respectively.

The parameters r and σ (introduced in Section 2.3) together decide the closeness of the hue distribution of the undetermined pixels. The hue values in (β, β+d] change into a range that extends at most r·σ_c beyond μ_c, so r decides the maximum distribution width and σ decides the specific distribution, as shown in Fig. 7. r must be larger than 2.5 because the distribution range must reach the concept border β_c = μ_c + 2.5σ_c. On the other hand, the value of r cannot be arbitrarily large: if it is too large, unexpected colors may appear among the undetermined pixels because the distribution width of the hue is too large. Figure 7 shows results for an image with different values of r. The results show that our method is not sensitive to the parameter r; even the magnified views show only minor differences with respect to varying r. We use r = 3 throughout our experiments.

Figure 8: The first picture is the input image and the other pictures are the output conceptualized images.
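Under our reading of equation (8), fixing σ so that β maps exactly onto β_c = μ_c + 2.5σ_c has a closed form, which also shows why r must exceed 2.5. The sketch below is a reconstruction from the text, with hypothetical function names, not the authors' implementation:

```python
import math

def g(x, sigma):
    """Zero-mean Gaussian kernel, ranging in (0, 1]."""
    return math.exp(-x * x / (2.0 * sigma * sigma))

def correspond_hue(h, mu_in, mu_c, sigma_c, r, sigma):
    """Eq. (8), as reconstructed: hues on the right flank of the input peak
    land just beyond the concept border, in their original order."""
    return mu_c + r * sigma_c * (1.0 - g(h - mu_in, sigma))

def solve_sigma(beta, mu_in, r=3.0):
    """Choose sigma so that beta maps exactly to beta_c = mu_c + 2.5*sigma_c,
    i.e. r * (1 - G(beta - mu_in)) = 2.5; this requires r > 2.5."""
    target = 1.0 - 2.5 / r                    # required value of G at beta
    d = beta - mu_in
    return math.sqrt(-d * d / (2.0 * math.log(target)))

# input peak mu=100, sigma=10 (beta=125); concept peak mu_c=200, sigma_c=8
sigma = solve_sigma(beta=125.0, mu_in=100.0, r=3.0)
mapped = correspond_hue(125.0, 100.0, 200.0, 8.0, 3.0, sigma)  # beta -> beta_c
```

With r = 3 and these assumed peaks, hues in (β, β+d] spread monotonically over the interval just beyond β_c = 220, bounded above by μ_c + rσ_c = 224.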

The value of σ should be obtained to guarantee that β changes exactly to β_c. Therefore, we obtain the value of σ by substituting r = 3, H(p) = β and H̃(p) = β_c into equation (8).

For image color conceptualization, we choose a certain color concept from the existing concepts (here we cluster six color concepts using the database in [25] as the image library). Figure 2 shows two examples of image color conceptualization. For the first picture, the change in color concept implies a change of season, because leaves can be yellow in autumn and tend to be green in spring. The different colors in the second picture suggest different weather. Figure 8 shows another natural scene conceptualized using our method. As we improve Hou and Zhang's method [12] by taking spatial information into consideration, our proposed method performs better in some cases, especially when there are color differences in the same region of an object. In Fig. 9, the magnified images show the performance improvement over the existing method. Moreover, this technique is applicable not only to the field of image processing, but also to the previewing of artwork coloring; an example is shown in Fig. 10.

Figure 9: (a) Three input images. (b) Output images obtained using Hou and Zhang's image color conceptualization [12]. (c) Output images obtained using our image color conceptualization method.

Experiments further demonstrate that our method performs well for video. Simply applying Hou and Zhang's image color conceptualization [12] to each frame individually leads to color discontinuity and flickering, as demonstrated in Fig. 5 (b); for a better view, see our supplemental video material. Since we take the temporal information into account, the results obtained using our method, as shown in Fig. 5 (c) and Fig. 5 (d), are significantly better.
Figure 11 presents more comparisons, not only between our video color conceptualization and Hou and Zhang's image method applied to individual video frames, but also between our video color conceptualization and our new image method applied to video frames individually. This comparison helps us observe the role of temporal information in overcoming the flickering problem and shows the advantage of the video method. Figure 11 (b) shows frames of an input video, and Fig. 11 (a) shows the hue wheel representation of the three frames and of the whole video. We see a difference in the hue wheel representation between frames. Fig. 11 (c), (d) and (e) show three
groups of frames of the resulting video obtained using Hou and Zhang's image method, our image method considering only the spatial information, and our video method considering both spatial and temporal information. Some artifacts can be observed in the magnified views of (c) and (d).

Figure 10: (a) The input image of a crocus artwork and its hue wheel representation. (b) The effect of coloring the crocus yellow, and the hue wheel representation of the output image. (c) The effect of coloring the crocus green, and the hue wheel representation of the output image.

Figure 12 shows other video examples. Color conceptualization can be applied in many fields, such as image and video processing, advertising and music television processing, and mood-consistency regulation in image cut-and-paste.

4 Discussion and Conclusions

We proposed an image color conceptualization method based on an existing method [12] and an optimization algorithm [8], and expanded it to video processing. Our main contributions are taking spatial information into account to improve color continuity, and extending our image-based method to video color conceptualization by enforcing spatio-temporal consistency. Experiments carried out on both images and videos demonstrated the performance of our proposed method.

Figure 11: (a) Hue wheel representation of the three frames in (b) and of the whole video. (b) Three frames of the input video. (c), (d) and (e): Resulting frames obtained using Hou and Zhang's image method [12], our image method, and our video method, respectively.

References

1. Csurka G, Skaff S, Marchesotti L, et al. Building look & feel concept models from color combinations. The Visual Computer, vol. 27, no. 12, pp. 1039-1053, 2011.
2. Welsh T, Ashikhmin M and Mueller K. Transferring color to greyscale images. ACM Transactions on Graphics (TOG) - Proceedings of ACM SIGGRAPH, vol. 21, no. 3, pp. 277-280, 2002.
3. Irony R, Cohen-Or D and Lischinski D. Colorization by example. In Eurographics Symposium on Rendering, pp. 201-210, 2005.
4. Charpiat G, Hofmann M and Scholkopf B. Automatic image colorization via multimodal predictions. In Proc. ECCV, pp. 126-139, 2008.
5. Reinhard E, Ashikhmin M, Gooch B, et al. Color transfer between images. IEEE Computer Graphics and Applications, vol. 21, no. 5, pp. 34-41, 2001.
6. Liu, Wan, Qu Y, et al. Intrinsic colorization. ACM Transactions on Graphics (SIGGRAPH Asia 2008 issue), vol. 27, no. 5, pp. 152:1-152:9, 2008.
7. Chia A, Zhuo S, Gupta R, et al. Semantic colorization with internet images. ACM Transactions on Graphics, vol. 30, no. 6, pp. 156:1-156:7, 2011.

8. Levin A, Lischinski D and Weiss Y. Colorization using optimization. In Proceedings of ACM SIGGRAPH, pp. 689-694, 2004.
9. Yatziv L and Sapiro G. Fast image and video colorization using chrominance blending. IEEE Transactions on Image Processing, vol. 15, no. 5, pp. 1120-1129, 2006.
10. Cohen-Or D, Sorkine O, Gal R, et al. Color harmonization. ACM Transactions on Graphics (TOG), vol. 25, no. 3, pp. 624-630, 2006.
11. Tang Z, Miao Z, Wan Y, et al. Color harmonization for images. Journal of Electronic Imaging, vol. 20, no. 2, pp. 023001-023012, 2011.
12. Hou X and Zhang L. Colour conceptualization. In Proceedings of the fifteenth ACM international conference on Multimedia, pp. 265-268, 2007.
13. Lu M, Ni B, Tang J and Yan S. Image re-emotionalizing. PCM, 2011.
14. Lee Y, Kim J and Grauman K. Key-segments for video object segmentation. In ICCV, pp. 1995-2002, 2011.
15. Zhang B, Zhao H and Cao. Video object segmentation with shortest path. The 20th Anniversary ACM Multimedia, Preprint, 2012.
16. Hanbury A. Constructing cylindrical coordinate colour spaces. Pattern Recognition Letters, vol. 29, no. 4, pp. 494-500, 2008.
17. Liu Y, Zhang D, Lu G, et al. Region-based image retrieval with high-level semantic color names. In Proc. of IEEE 11th International Multi-Media Modelling Conference, pp. 180-187, 2005.
18. Goldstein E. Sensation and Perception (5th Edition), Brooks/Cole, 1999.
19. Berk T, Brownston L and Kaufmann A. A new color-naming system for graphics languages. IEEE Computer Graphics and Applications, vol. 2, no. 3, pp. 37-44, 1982.
20. Weiss Y. Segmentation using eigenvectors: A unifying view. In International Conference on Computer Vision, pp. 975-982, 1999.

Figure 12: Three groups of video color conceptualization results. In each group, the upper row shows five frames of the input video, and the lower row shows the output.

21. Morovic J and Luo M. The fundamentals of gamut mapping: A survey. Journal of Imaging Science and Technology, vol. 45, no. 3, pp. 283-290, 2001.
22. Lee H, Yu J, Im Y, et al. A unified scheme of shot boundary detection and anchor shot detection in news video story parsing. Multimedia Tools and Applications, vol. 51, no. 3, pp. 1127-1145, 2011.
23. Amudha J, Radha D and Naresh P. Video shot detection using saliency measure. International Journal of Computer Applications, vol. 45, no. 2, pp. 17-24, 2012.
24. Shi J and Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, 2000.
25. Oliva A and Torralba A. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, vol. 42, no. 3, pp. 145-175, 2001.