arxiv: v2 [cs.cv] 4 Dec 2017

Will People Like Your Image? Learning the Aesthetic Space

Katharina Schwarz, Patrick Wieschollek, Hendrik P. A. Lensch
University of Tübingen

Figure 1. Aesthetically pleasing images from our derived scores. The complex matter of aesthetics depends on many factors, e.g., visual appearance, composition, content, or style, and makes it almost impossible to directly compare all images as to whether they are similarly beautiful.

Abstract

Rating how aesthetically pleasing an image appears is a highly complex matter and depends on a large number of different visual factors. Previous work has tackled the aesthetic rating problem by ranking on a 1-dimensional rating scale, e.g., incorporating handcrafted attributes. In this paper, we propose a rather general approach to automatically map aesthetic pleasingness with all its complexity into an aesthetic space to allow for a highly fine-grained resolution. In detail, making use of deep learning, our method directly learns an encoding of a given image into this high-dimensional feature space resembling visual aesthetics. In addition to the mentioned visual factors, differences in personal judgments have a large impact on the likeableness of a photograph. Nowadays, online platforms allow users to like or favor certain content with a single click. To incorporate a huge diversity of people, we make use of such multi-user agreements and assemble a large data set of 380K images (AROD) with associated meta information and derive a score to rate how visually pleasing a given photo is. We validate our derived model of aesthetics in a user study. Further, without any extra data labeling or handcrafted features, we achieve state-of-the-art accuracy on the AVA benchmark data set.
Finally, as our approach is able to predict the aesthetic quality of any arbitrary image or video, we demonstrate our results on applications for resorting photo collections, capturing the best shot on mobile devices, and aesthetic key-frame extraction from videos.

1. Introduction

The wide distribution of digital devices allows us to take series of photos to make sure we do not miss any big moment. Manually picking the best shots afterwards is not only time-consuming but also challenging. Generally, approaches for automatically ranking images by their aesthetic appeal can be useful in many applications, e.g., to handle personal collections or for retrieval tasks. Overall, deciding how aesthetically pleasing an image appears is always a highly complex matter depending on a large number of factors: visual appearance, image composition, displayed content, or style all influence its aesthetic appeal. Fig. 1 shows a set of beautiful images with different appearance. Assume one would score each of them separately, e.g., within the interval [1, 10] to obtain some granularity. This is not only a challenging task; even more critically, mapping those ratings to an absolute scale afterwards can lead to wrong relations between them. Asking for relative comparisons is not only an easier task, but also leads to a more reliable scale. For images like those in Fig. 1, it is still almost impossible to directly compare all of them, e.g., the beautiful warmth of a sunset can hardly be related in general to the coolness of an image in noir style. Overall, it is often unclear which particular attribute influences the aesthetic comparison of an image pair the most. Thus, we propose to arrange images in a high-dimensional space to obtain a better, highly fine-grained understanding of how the aesthetic appeal correlates between them without predefining specific factors. On saliency maps, we further demonstrate the necessity of considering global features in aesthetic tasks.

Figure 2. Overview. Based on images we assemble from Flickr, we derive a model that scores the aesthetic appeal of an image from its views and faves. This model then guides the training process to learn fine-grained relations in the high-dimensional aesthetic space. Finally, our trained CNN is able to generate encodings for any arbitrary image, leading to several applications (Aesthetic Collection, Handy Shot, Video Spots).

In addition to the previously mentioned factors, differences in personal judgments have a large impact on the likeableness of a photograph. Nowadays, online platforms allow users to like or favor certain content with a single click. Usually people like beautiful images or, in other words, aesthetically pleasing ones. Sometimes, people might also favor images for other reasons, such as their scene content, e.g., a picture of the newest mobile phone. Nevertheless, our user study shows that our derived model is still reliable. In this work, we consider both the complexity of aesthetics in its high dimensionality and a huge diversity of multi-user online ratings to obtain broad information about aesthetic relations without extra data labeling.

An overview of our method is illustrated in Fig. 2. First, we assemble a large number of images from Flickr and present a new database to exploit Aesthetic Ratings from Online Data (AROD). From it, we derive a model of aesthetics to score the quality of an image by making use of the huge amount of available online user behavior, the views and faves. Then, we make use of deep learning and include the introduced measurements of aesthetic appeal indirectly as hints to guide the training process. Thereby, we only incorporate the information whether two images are aesthetically similar or not instead of using the direct score.
This allows us to consider every single image relative to other images even if they do not seem visually comparable, i.e., due to large differences in their visual factors like appearance, displayed content, or style. Our trained CNN is then able to directly learn an encoding of any given image in a high-dimensional feature space resembling visual aesthetics. Our aesthetic space encodes the complex matter of aesthetics, in which not every pair can be directly compared, at a highly fine-grained resolution of relative distances. Finally, as those encodings can be obtained for any arbitrary image, we demonstrate how they can be easily transferred into several applications on images as well as videos.

In summary, our main contributions are:
- A new large-scale data set containing dense and diverse meta information and statistics to reliably predict visual aesthetics, which is easily extendable.
- A model that approximates aesthetic ratings over a broad diversity of users without specifically requesting expensive labels beforehand, which we validate in a user study.
- A formulation of the complexity of aesthetic prediction as an encoding problem that directly learns the feature space, allowing for fine-grained relative rankings on a high-dimensional level.
- Application prototypes such as an app for mobile devices, a photo-collection manager powered by visual aesthetic prediction, and a video processing tool to score frames.

2. Related Work

Aesthetics in Images. Previous research on visual aesthetics assessment focused on handcrafted visual cues such as color [37, 6, 36], texture [6, 19], or content [7, 30]. Generally, no absolute rules exist to ensure high aesthetic quality of a photograph. Photo quality has been explored to distinguish between high and low quality [19] or to classify whether a photograph was taken by a professional or a layman [40]. Besides quality, interest has arisen in the importance of images.
Previous work has explored whether and to what extent an image can be predicted as popular [20], memorable [14], or interesting [7, 11, 8]. In this context, aesthetics has been studied with respect to, e.g., how it influences the memorability of an image [14]. Also, the relationship between aesthetics and images has been explored from multiple perspectives [17]. Further, making use of deep learning, the style of an image has been of recent interest: either to recognize a specific image style [18] or even to manipulate images by transferring artistic style from a painting to a captured photo [9, 16]. Such style attributes have been incorporated to improve aesthetic categorization [28]. In addition to style, the composition of an image largely influences aesthetic pleasingness and has been explored in terms of rules or enhancement [15, 26, 10]. Overall, many approaches have invested a lot of work in finding adequate attributes to approach aesthetics, e.g., generic image descriptors [32], attributes humans might use [7], cues derived from psychological experiments [11], features based on artistic intuition [6], content-based features [30], or features with high computational efficiency [27]. Other methods have focused on classifying the aesthetic appeal while restricting their content to consumer photos with faces [23, 24], consumer videos [34, 1], or other visual domains, e.g., paintings [22] or evolved abstract images [4]. In contrast to those previous methods, we aim for a general approach to explore the global overall aesthetic appeal without any necessity to restrict image content or define any specific attributes or properties.

Table 1. Comparison of different data sets containing images for judging visual pleasingness of images. (* per-image properties)

                    AVA [35]   AADB [21]   AROD (ours)
max ratings*                               M
mean ratings*
rating distr.       normal     normal      uniform
number of images    250K       10K         380K
avg. image size

Deep Metric Learning. Neural networks are capable of organizing arbitrary input in a latent space. Approaches directly manipulating this space have been successfully applied to signature verification [2], face recognition [5, 41], and comparing image patches [42] for depth estimation. Here, feature representations of the inputs are optimized such that they describe similarity relations within the data. For this purpose, metric learning methods such as Siamese networks [5] and Triplet networks [13] are widely used. Inspired by those successful networks, we approach the aesthetic learning problem by directly optimizing a metric to position aesthetic relations in a high-dimensional space.

Deep Learning Aesthetics. Transferring aesthetics into a deep learning approach without defining hand-crafted features has been formulated as a categorization problem based on extracting patches for training [28, 29]. However, reducing visual content to small patches can destroy the global appearance, which is important for aesthetic tasks. In contrast, we incorporate the entire image and demonstrate the importance of global features on saliency maps. Other methods have considered image quality rating as a traditional classification or regression problem predicting a single scalar, either real-valued or binary [35, 21]. Thus, they do not meet the complex nature of aesthetics, as they oversimplify the task. They focus on a single-scale problem that even humans might not be able to solve, as they probably disagree on the actual level of visual pleasingness. Further, these approaches either use hand-crafted features [21] or examine a data set of small annotation density [35, 21].
In contrast to those methods, we make use of deep metric learning to transfer the problem of aesthetic ranking into a high-dimensional feature space representation. We rely on the plain image without defining any kinds of attributes.

3. Data Sets

Generally, the training of deep networks requires large annotated data sets [38, 25] to obtain reliable results. Further, as the visual aesthetics of photos is highly subjective and depends on the current mood as well as on emotions, training a data-driven model requires extensive, diverse annotations. To overcome flaws of previous benchmark sets, we introduce a new data set, with a comparison given in Table 1.

3.1. Previous Data Sets

AVA. The AVA data set [35] provides 250K images classified into visually well-crafted and mediocre ones on a fixed scale. These images are obtained from a professional community of photographic challenges. Through their annotation process, only a very small number of annotations are collected in comparison to the scale of social networks, whose members also include non-professional photographers. Note that, to reliably judge image aesthetics, it is essential to consider the consensus of highly diverse participants.

AADB. Recently, Kong et al. [21] introduced a new aesthetics and attributes data set (AADB) comprising 10K images. Each individual image score in AADB represents the averaged rating of five AMT (Amazon Mechanical Turk) workers, who are asked to give each image an overall aesthetic score. In addition, they provide attribute assignments from 11 predefined categories as judged by AMT workers. Their database contains photos downloaded directly from Flickr, which are likely not to be post-processed, in contrast to the professional results contained in AVA [35].

3.2. Our Flickr Subset

Whereas AADB is quite small, the image data of AVA seems rather biased. Besides, both only provide a small number of collected ratings (Table 1).
Thus, we propose a new, much larger data set comprising aesthetic ratings from online data (AROD). This data can be downloaded immediately, including meta data as well as extensive, diverse labels, without the need to collect extra ratings at the cost of additional time, effort, and money.

AROD. A single click allows users to give feedback on media content. We propose to use this information. For example, Flickr allows users to add any photo to a personal list of favorites, which is counted as faves. Since this feature is optional, users are absolutely free to add a particular image to their favorite list. Their only motivation is to tag a photo which is worth remembering. In addition, these images are not uploaded to participate in a concrete challenge and are not limited to a specific topic. To collect these images, we crawl around 380K photos from Flickr including meta data such as their number of views, comments, the favorite lists containing the photo, the title of the image, and its description from the Flickr website. Our collection contains images which were published and uploaded between January 2004 and November [...]. As each photo is visited 7K times on average, this allows for a much finer granularity and gives more hints about the aesthetics of images compared to previous data sets. Based on this data, we derive a model to obtain information about the aesthetic pleasingness of the underlying image.

4. Model of Aesthetics

On online platforms, people usually tend to like beautiful images or, in other words, aesthetically pleasing ones. Thus, we now aim to explore those multi-user agreements and turn them into a new useful measurement of aesthetic appeal. We extract time-independent statistics, the faves and views (Fig. 2), which contain information about the underlying image quality.

4.1. Model Definition

Previous attempts tried to directly regress some score or trained a simple binary model [36, 6] to decide whether an image is visually pleasing or ordinary. To overcome the classification approaches, Kong et al. [21] employ a modification of the Siamese loss function [2] to re-rank images according to their predicted aesthetic score. In contrast to [21, 36, 6], we leverage freely available information in social networks to score the image quality. These statistics are only used as hints to guide the training process rather than as a direct label or score.

To judge the pleasingness of an image, we examine the relation between the views (number of visits) and the faves (number of clicks that favor the image) as a proxy for visual aesthetics. Both quantities depend strongly on visual aesthetics and encode the visual quality in all its facets. In addition, the low hurdle of giving feedback ("like" or "favor") allows averaging over an amount of information that is orders of magnitude larger than in data sets obtained via AMT. This is especially necessary when treating images which are highly debatable.

As is common in population dynamics, we assume an exponential increase of the views, $\frac{dV(i)}{dt} = r_V(i)\, V(i)$, and the faves, $\frac{dF(i)}{dt} = r_F(i)\, F(i)$, over time $t \in \mathbb{N}$ for any arbitrary image $i \in I$ with growth rates $r_{(\cdot)} > 0$. Under this assumption, $\log V(i)$ and $\log F(i)$ both grow linearly in $t$, so their ratio approaches a constant for large $t$. This allows us to approximate the score $S(i)$ of the image quality independently of time $t$ by

$$S(i) \approx \frac{\log F(i)}{\log V(i)}. \tag{1}$$

This time-independence for any image $i$ is necessary when using images with different online life-spans.
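As a minimal sketch of Eq. (1), the score can be computed directly from the two counters. The function name `aesthetic_score` and the guard conditions are illustrative assumptions and not part of the original method; the counts in the usage lines are the rounded values printed with the Fig. 3 labels.

```python
import math


def aesthetic_score(faves: int, views: int) -> float:
    """Approximate S(i) = log F(i) / log V(i) as in Eq. (1).

    Assumption (not from the paper): at least one fave and more than one
    view, so that both logarithms are defined and the denominator is
    non-zero.
    """
    if faves < 1:
        raise ValueError("faves must be >= 1")
    if views < 2:
        raise ValueError("views must be >= 2")
    return math.log(faves) / math.log(views)


# Rounded (faves, views) counts as printed with two of the Fig. 3 labels.
score_a = aesthetic_score(1_600, 16_000)  # rounds to 0.76
score_b = aesthetic_score(1_000, 13_000)  # rounds to 0.73
```

Because only the ratio of the logarithms enters, the base of the logarithm does not matter. For images with very few faves, the plain ratio is sensitive to single clicks, so a practical implementation may want some form of smoothing; that choice is left open here.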
In addition, the model in Eq. (1) accounts for the effect that popular Flickr users receive more faves per image due to the mechanism of followers. Note that the action of not adding an image to one's faves contains valuable information, too. The score S(i) gives a criterion to rank images $i \in I$, whose values can be imitated by neural networks (see Fig. 3). A histogram of the distribution of S(i) (Eq. (1)) is illustrated in Fig. 4. The uniform distribution of the data shows that the data has high entropy, which allows us to even judge borderline images.

Figure 3. Images from our data set with approximated score S from (#faves #views). The upper row shows images i with large values of S(i) and the bottom row images with relatively low scores S(i). Upper row: S=0.76 (1.6K 16K), S=0.76 (1.3K 12.6K), S=0.74 (3.5K 6.2K), S=0.73 (1K 13K). Bottom row: S=0.06 (1 60K), S=0.07 (1 21K), S=0.07 (1 14K), S=0.08 (1 10K).

Figure 4. Distribution of the collected score S(i), plotted over bins of the log(faves)/log(views) score. The uniform distribution allows us to even judge images with borderline ratings.

4.2. Human Evaluation

As we introduce our aesthetic model as a score based on online behavior from uncontrolled user clicks, we validate the usefulness of our derived metric in a controlled experiment. We formulate our hypotheses H as follows:

H1: Our derived aesthetic model based on freely available ratings from uncontrolled human online behavior is reasonable.
H2: Higher scored images are also rated better in a controlled user study and worse ones are also rated worse.

Rating the aesthetic quality of an image is highly subjective and differs between persons. Performing a user study over a diversified crowd is essential to validate trends. As stated by Buhrmester et al. [3], Amazon Mechanical Turk (AMT) yields reliable data on a demographically diverse level. Thus, we use AMT to evaluate our aesthetic model.

Experiment Setup.
To overcome differences in internal ratings between persons, we aim for relative ratings instead of an absolute scale. Further, to ensure that images obtaining a higher score are really more pleasant than lower scored ones, we design the study as pairwise preference tests. The AMT workers are presented two images with different scores as shown in Fig. 5. In each binary forced-choice task, the Turker is asked to select the image that is aesthetically more pleasing. We directly ask for aesthetic selection to ensure that our score derived from online faves is a suitable measure to rate aesthetics. From our downloaded data set, we evaluate 700 randomly selected image pairs. Each pair is presented to 5 Turkers. To negate click biases, ordering as well as positioning are randomized.

Figure 5. Example as seen by AMT workers. The task (top), "Select the image that you think is aesthetically more pleasing:", is to select one image of the presented pair (bottom).

User Study Results. In our user study, we randomly test image pairs with varying distances between the scores derived by our model. The lowest scored images obtained at least one fave. All evaluated distances are listed in Table 2. A small distance means that our derived scores are very similar and that the images are almost identically pleasing aesthetically. However, even when the minimal distance between the scores of the two images in a pair is set to 0.1, 78% of the Turkers already rate in the direction given by our scores. Further, for score distances bigger than 0.4, even 89% of the test persons agreed with the selection of the better image. Overall, we obtain ratings with surprisingly small variance. Besides, the already relatively small variance decreases even further with increasing distance. This indicates a high agreement between the different Turkers. As verified with a Kolmogorov-Smirnov test [33], the underlying data does not come from a normal distribution.

Table 2. User study results. More similar rating decisions of Turkers are obtained for larger distances |S(i) - S(j)| between our derived scores S(·) of the images within a pair. Columns: dist > 0.1, > 0.2, > 0.3, > 0.4, > 0.5, > 0.6. Rows: mean µ, var σ, sign. level α (10% (p < 0.10), 5% (p < 0.05)).
Thus, we verified statistical relevance by performing the Mann-Whitney U-test [31], which rejected the null hypothesis for all distances at least at the 10% level (p < 0.10) and for distances > 0.3 at the 5% level (p < 0.05), revealing a statistically significant dependency between the scores of our model and the user study ratings (H2). As we explicitly ask the Turkers to rate according to the term "aesthetically pleasing", our presented score S(i) can really be seen as an aesthetic measure, validating our first hypothesis H1.

Figure 6. Previous approaches treat aesthetic learning as a low-dimensional problem, which projects encodings onto a 1-dimensional scale [21] or into discrete bins [35]. Rather than learning a bin mapping for each image $i \in \{a, b, c\}$ into bins $B_i$ or directly $\varphi_i$, we propose to learn pair-wise distances $\delta_{ij}$ to resolve the highly complex matter of aesthetics in a high-dimensional space.

5. Learning Aesthetics

As the visual quality of images is naturally hard to encode in a single scalar and it is hard to match images to discrete bins of aesthetic levels, we aim at directly learning an encoding of a given image in a high-dimensional feature space resembling visual aesthetics, in contrast to 1-dimensional ranking as in [21] (Fig. 6). We will refer to this feature space as the aesthetic space. Ranking approaches like [21] predict scalars and inherently assume that image orders are possible on a 1-dimensional discrete or continuous rating scale. In contrast, while a latent group of images might be globally misplaced in the aesthetic space, our formulation still allows ordering the images within the specific group correctly.

5.1. Encoding Aesthetics

Inspired by metric learning [5, 13], our approach is to directly optimize relative distances

$$\delta : I \times I \to \mathbb{R}, \quad (i, j) \mapsto \|\Phi_i - \Phi_j\|_2$$

between encodings $\Phi_i, \Phi_j$ from image pairs $(i, j)$. We use a CNN to learn these encodings, which will be described later in detail.
Importantly, this training procedure can be done without associating images with any specifically requested ratings or scores from human annotators. Instead, it solely uses the information whether two images are similarly aesthetic or not on an almost arbitrary scale. We minimize the triplet loss function [13]

$$L_e(a, p, n) = \left[\, m + \|\Phi_a - \Phi_p\|_2^2 - \|\Phi_a - \Phi_n\|_2^2 \,\right]_+ \tag{2}$$

for images $a, p, n$ and some margin $m$. Here, $[x]_+$ denotes the non-negative part of $x$, like the ReLU activation function. This loss resembles a visual comparison, i.e., the distance between two mediocre images $a, p$ should be smaller than the distance to a well-crafted image $n$ and vice versa. Note that our objective function is not directly built on predicting $S(\cdot)$ for a particular image on a specific scale and range. To decide whether two images are aesthetically similar or not, we use our score $S(i)$ to guide the sampling of the training data consisting of image triplets

$$D = \left\{ (a, p, n) \ \text{with} \ \begin{array}{l} S(a) \approx S(p) \\ \alpha < |S(\star) - S(n)| < \beta \end{array} \right\} \tag{3}$$

with $\star \in \{a, p\}$ and $\alpha, \beta \in \mathbb{R}$. Thus, any pair $(a, p)$ with a rather small difference in the score allows for adaptive sampling of much harder negatives $n$ by rejecting triplets with too large differences. An example of such image triplets is shown in Fig. 7. We allow $(a, p)$ to contain images with a higher or lower score than $n$ to generate balanced training data.

Figure 7. Image triplets (a, p, n) for training with scores S(i). Each triplet consists of either 2 good and 1 bad image concerning its approximated quality or 1 good and 2 bad ones. Labels: S=0.71 (2.6K 62K), S=0.71 (2K 50K), S=0.29 (6 873); S=0.19 (5 12K), S=0.19 (2 305), S=0.62 (404 16K).

This approach has the following advantages:
1. Every single image can be considered during training relative to other images, which also allows training on highly debatable images.
2. There is no need to either learn a scalar or solve a binary classification problem in the fashion of ranking [21] or aesthetic-label prediction [35]. Instead, we learn the encoding itself.

5.2. Rating Aesthetics

As the encoding space $\mathbb{R}^d$ is only a partially ordered set, knowing the aesthetic distance $\|\Phi_x - \Phi_y\|$ for any two images $x, y$ carries no information about whether $x$ should be considered more visually pleasing than $y$. Thus, ordering multiple images is not possible. If a universally accepted worst image $\omega$ existed, one might simply use the learned distance $\delta(x, \omega)$. But as we are allowed to rotate the entire space, a more practical solution is to force the encoding into a particular direction.
We therefore add

$$L_d(a, n) = \operatorname{sign}\big(S(n) - S(a)\big) \left[\, \|\Phi_a\| - \|\Phi_n\| + m \,\right]_+ \tag{4}$$

as a directional term to the loss function. This term guides the triplet loss by reducing the norms of encodings belonging to less visually pleasing images and increasing the norms of well-crafted images. Note that we again do not directly use any absolute score values from our data model. Altogether, we minimize the directional triplet loss

$$L(a, p, n) = L_e(a, p, n) + L_d(a, n)$$

to get a natural ordering by the Euclidean norm and relative distances. The effect of $L_d$ is depicted in Fig. 8.

Figure 8. Triplet loss. For each triplet (a, p, n) with anchor point a, we aim at encoding aesthetically similar images a, p nearby and force a larger distance to aesthetically dissimilar images n. The three CNN branches share their weights. Adding $L_d$ to $L_e$ alters the update directions with respect to the aesthetic space origin $\omega$.

5.3. Learning the Aesthetic Space

Network Architecture. We use the standard ResNet-50 architecture [12] $f_\theta$ with trainable parameters $\theta$ to learn the encodings $\Phi_i = f_\theta(i)$. We add a projection from the pool5 layer creating a 1000-dimensional descriptor $\Phi$ for each frame. Please refer to [12] for more details. Training was done on two Nvidia Titan X GPUs using stochastic gradient descent with an initial learning rate of $10^{-3}$, which is divided by 10 when the error plateaus.

Sampling Training Data. We randomly sample images from our entire collection on-the-fly according to $D$ in (3). We estimate the cardinality of $D$ as $|D| =$ [...] from tracking the reject rate during training. Hence, no data augmentation is required, which would further influence aesthetics. As ResNet expects a fixed input size, we resize the original image to match the input dimensions. Although this down-sampling might remove small details, it keeps the relations of the image content.
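The directional triplet loss of Eqs. (2) and (4) can be sketched for a single triplet in plain Python. This is a minimal illustration under stated assumptions, not the training code: the function names, the default margin of 1.0, and the toy 2-dimensional encodings are hypothetical, and a real implementation would evaluate the loss on batches of CNN outputs inside a deep learning framework.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def squared_distance(x: Vector, y: Vector) -> float:
    """Squared Euclidean distance ||x - y||_2^2."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def norm(x: Vector) -> float:
    """Euclidean norm ||x||_2."""
    return math.sqrt(sum(xi * xi for xi in x))


def sign(value: float) -> int:
    """Return -1, 0 or +1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def triplet_loss(phi_a: Vector, phi_p: Vector, phi_n: Vector, margin: float) -> float:
    """L_e of Eq. (2): hinge on the difference of squared distances."""
    return max(0.0, margin + squared_distance(phi_a, phi_p)
               - squared_distance(phi_a, phi_n))


def directional_loss(phi_a: Vector, phi_n: Vector,
                     score_a: float, score_n: float, margin: float) -> float:
    """L_d of Eq. (4): only the sign of the score difference is used."""
    hinge = max(0.0, norm(phi_a) - norm(phi_n) + margin)
    return sign(score_n - score_a) * hinge


def directional_triplet_loss(phi_a: Vector, phi_p: Vector, phi_n: Vector,
                             score_a: float, score_n: float,
                             margin: float = 1.0) -> float:
    """L = L_e + L_d for one triplet (margin default is an assumption)."""
    return (triplet_loss(phi_a, phi_p, phi_n, margin)
            + directional_loss(phi_a, phi_n, score_a, score_n, margin))


# Toy triplet: a and p are close, n has a higher score but lies too near a.
phi_a, phi_p, phi_n = [1.0, 0.0], [1.0, 1.0], [1.0, 0.5]
loss = directional_triplet_loss(phi_a, phi_p, phi_n, score_a=0.2, score_n=0.7)
```

In the toy triplet both terms are active: the negative is closer to the anchor than the positive, and its norm does not yet exceed the anchor's norm by the margin. Moving `phi_n` far away from the origin, e.g. to `[4.0, 4.0]`, drives both terms to zero.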
Further, we are interested in the aesthetic quality rather than the photo quality from a computational photography viewpoint.

5.4. From Space to Scale

To allow for multiple applications, e.g., ranking a set of images, it can be necessary to map our derived encodings within our high-dimensional space to a relative scale. As described earlier, while a latent group of images might be globally misplaced in the aesthetic space, our formulation still allows ordering the images within the specific group correctly. Thus, we simply consider the norm of the encoding $\|\Phi_i\|_2$ as the projection score. Thereby, independently of the positions of the encodings in space, the relations between them are maintained on the scale.

Table 3. Performance comparison on the AVA data set. Different models (top row) with corresponding accuracy (bottom row). Our approach outperforms all models that do not use additional information and even most methods that include additional information during training. Models, left to right, grouped under "Additional information during training" and "No additional information": RDCNN [28], Reg-Rank+Att [21], Reg-Rank+Att+Cont [21], Alexnet-FTune [29], Murray [35], Reg-Rank [21], Reg [21], SPP [28], DCNN [28], DMA [29], Ours. Accuracies are given in %.

6. Experimental Results

We pursue two ways of evaluation: a quantitative evaluation on the common benchmark set and a qualitative evaluation to analyze the internal network mechanism. Further results in combination with applications are presented in Sec. 7 and the supplemental material.

Figure 9. Different photographs (top, Input) and related saliency maps for vanilla ResNet (middle, Object Recognition) and our model (bottom, Aesthetic Rating) produced by guided-ReLU [39]. Darker regions indicate higher influence on the actual network prediction.

Quantitative Evaluation. For a fair comparison to previous approaches, we fine-tune our model network to the distributions of the ratings in the AVA data set [35]. This is done using a subset of the AVA training data to predict discrete labels instead of relative embeddings. Table 3 shows such a quantitative comparison in accuracy to previous methods. Using an indirect approach such as ranking (Reg+Rank [21]), which resembles the nature of aesthetic judgments much better than standard approaches like classification [28, 29, 35], also yields better performance on this benchmark set.
Our approach further boosts this accuracy significantly, which we attribute to the more natural choice of our loss formulation. In contrast to previous work [21, 28], we do not rely on a dedicated neural network architecture but use a rather common model design. Results on the left of Table 3 use additional information such as attribute data or content description. Hence, although we trained on a data set which was constructed with literally no extensive explicit labeling, we outperform all previous methods relying solely on ratings they obtained in an expensive process. Further, learning from the consensus of many Flickr users is sufficient to gain higher accuracy (our network) on the AVA benchmark set than recent approaches with additional attributes (Reg+Rank+Attr, RDCNN). Note that these attribute categories essentially act as a prior and were selected after consulting professional photographers [21]. We expect to further improve our results when adding more explicit information about the content, as in the construction of Reg+Rank+Att+Cont. As our main focus is to exploit solely freely available information, this explicit meta information can be image-related comments and tags.

Figure 10. Aesthetically resorted set of photos with decreasing score from our provided tool, starting with the visually most pleasing image (left).

What is the network looking for? Judging the visual quality of an image is totally different from plain object recognition tasks. By extracting the relevant information which is used by the neural network to perform aesthetics prediction, it is possible to visualize prominent traits in the input. To extract these saliency maps, we use guided-ReLU [39]. It is based on the idea that large gradients of the output with respect to the input have a high impact on the actual network prediction. Fig. 9 highlights those pixels in the input with large impact. Hence, this information is strongly coupled with the encoding in our aesthetic space.
It clearly shows how our network considers larger regions in the image space compared to sparse saliency along gradients in the untrained network. More precisely, the network model reveals strong synergy effects between surrounding regions in the lighthouse example in Fig. 9. At the same time, the vanilla ResNet only focuses on the lighthouse itself.

7. Applications

In order to demonstrate the usability of our approach, we apply our derived aesthetics prediction score to images as well as videos, allowing for several applications. Thereby, we map the encodings from space to a relative scale as described in Sec. 5.4, maintaining fine-grained relations.
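The mapping from space to scale used by all applications can be sketched as follows: each encoding is reduced to its Euclidean norm (the projection score of Sec. 5.4) and images are sorted by that value. The function names, file names, and toy 3-dimensional encodings are hypothetical; real encodings would be the descriptors produced by the trained CNN.

```python
import math
from typing import Dict, List, Sequence


def projection_score(encoding: Sequence[float]) -> float:
    """Euclidean norm of an encoding, used as the relative aesthetic score."""
    return math.sqrt(sum(value * value for value in encoding))


def rank_by_aesthetics(encodings: Dict[str, Sequence[float]]) -> List[str]:
    """Return image names sorted from highest to lowest projection score."""
    return sorted(encodings,
                  key=lambda name: projection_score(encodings[name]),
                  reverse=True)


# Toy encodings standing in for CNN outputs (assumption for illustration).
toy_encodings = {
    "sunset.jpg": [3.0, 4.0, 0.0],
    "blurry.jpg": [0.5, 0.5, 0.0],
    "street.jpg": [1.0, 2.0, 2.0],
}
ranking = rank_by_aesthetics(toy_encodings)
best_shot = ranking[0]
```

The same two functions cover resorting a collection (the full `ranking`) and picking a single best frame from a burst or video (`best_shot`).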

Aesthetic Photo Collection. First of all, we support resorting an arbitrary photo collection based on our predicted relative aesthetic scores between the images. An example of a small set of aesthetically sorted images is shown in Fig. 10. This tool makes it easy to quickly resort one's holiday collection and directly share the best moments without time-consuming manual browsing of the usually rather large set of pictures.

Best Handy Shot. A commonly known situation is that people want to take a picture but are not completely sure what the best shot of the view could be. They tend to take multiple pictures and just postpone the decision process. This can even lead to missing the one best shot completely. We provide a simple application that allows slightly moving the phone around and temporarily captures a video. The idea is illustrated in Fig. 12. All the single images are then analyzed and rated by our system and the image of the best view is saved. The application helps the user to directly obtain the most aesthetically pleasing image and avoids the time-consuming decision process afterwards. Fig. 13 shows several frames from movements we recorded with a Samsung Galaxy SII phone and the predicted best shots. Sky proportions, saturation and the tension of the overall image layout play an important role within the decision. Due to its small memory footprint of only 102 MB containing the network weights, running this application directly on mobile devices is easily possible. Please see the supplemental video for a short demo. This application could further be extended to lead the user to the best shot during the movement by indicating better directions.

Figure 12. Best handy shot. Based on slight movements in any direction, the application automatically captures the best shot.

Figure 13. Best predicted image (blue frame) during capturing. The movements were recorded with a mobile device.

Video Spots. Similarly, our system is able to find great shots in a video. Those shots can be selected as aesthetic key frames or, e.g., in documentary films, to identify the most wonderful places or spots. To this end, we calculate a complete prediction curve along the video displaying the aesthetic relation between the frames. Fig. 11 displays an example of a video and the corresponding aesthetic prediction curve. Kalman filtering is applied to smooth the final predictions over time. Extracting the frame scores is done at a speed of 140 fps on an NVIDIA GTX 960. Embedding common videos requires only 25% of the actual playback time, demonstrating high efficiency and enabling real-time applications. Please see the supplemental material for examples.

Figure 11. Best video spots. Each frame is extracted at the peaks in the score signal.

8. Conclusion

We propose a new data-driven approach which learns to map aesthetics with all its complexity into a high-dimensional feature space. Additionally, we make use of online behavior to incorporate a broad diversity of user reactions, as rating aesthetics is a highly subjective task. In detail, we assemble a novel large-scale data set of images from social media content. Hereby, aesthetics ground-truth scores for training are obtained without explicitly requesting user ratings in a time-consuming and costly process. Hence, our dataset can be easily extended, as our approach requires effectively no labeling effort and uses freely available information from social media content. The assumption of our underlying model is validated in a user study. To automatically judge aesthetics, we formulate the aesthetic prediction directly as an encoding problem. Consequently, we propose a more natural loss objective for dealing with the complex task of learning a feature representation of visual aesthetics. Our focus lies on the abstract representation of aesthetics using online media.
Thus, we rely solely on a commonly used model architecture and a much weaker training signal, which nevertheless leads to state-of-the-art results on previous benchmarks. Finally, we confirm the success of our model in several real-world applications, namely resorting photo collections, capturing the best shot, and smooth aesthetics prediction along a video stream.
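The video-spot and best-shot selection described in Sec. 7 can be sketched as follows: per-frame scores are smoothed over time with a Kalman filter, and key frames are taken at the peaks of the smoothed curve. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a scalar random-walk state model, and the noise variances, function names, and toy scores are all hypothetical.

```python
def kalman_smooth(scores, process_var=1e-3, measurement_var=1e-1):
    """Scalar Kalman filter with a random-walk state model (assumed here).
    Returns one filtered estimate per input score."""
    if not scores:
        return []
    estimate = scores[0]   # initial state: first observed score
    error_var = 1.0        # initial uncertainty (assumed value)
    smoothed = []
    for z in scores:
        error_var += process_var                       # predict step
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)              # update with measurement z
        error_var *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed


def find_peaks(signal):
    """Indices of local maxima (strictly rising before, not rising after)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def best_frame(scores):
    """Index of the highest-scoring frame, as in the best-shot application."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy per-frame scores standing in for the network's predictions.
frame_scores = [0.2, 0.3, 0.8, 0.7, 0.4, 0.5, 0.9, 0.6, 0.3]
curve = kalman_smooth(frame_scores)
print(find_peaks(curve), best_frame(frame_scores))
```

Larger values of `measurement_var` relative to `process_var` give a smoother curve and fewer peaks, so the ratio acts as the knob for how many key frames are extracted.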

References

[1] S. Bhattacharya, B. Nojavanasghari, D. Liu, T. Chen, S.-F. Chang, and M. Shah. Towards a Comprehensive Computational Model for Aesthetic Assessment of Videos. In Proc. of MM. ACM.
[2] J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah. Signature Verification using a Siamese Time Delay Neural Network. In Proc. of NIPS.
[3] M. Buhrmester, T. Kwang, and S. D. Gosling. Amazon's Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data? Perspect. Psychol. Sci., 6(1):3-5.
[4] A. Campbell, V. Ciesielski, and A. K. Qin. Feature Discovery by Deep Learning for Aesthetic Analysis of Evolved Abstract Images. In Proc. of EvoMUSART. Springer.
[5] S. Chopra, R. Hadsell, and Y. LeCun. Learning a Similarity Metric Discriminatively, with Application to Face Verification. In Proc. of CVPR.
[6] R. Datta, D. Joshi, J. Li, and J. Z. Wang. Studying Aesthetics in Photographic Images Using a Computational Approach. In Proc. of ECCV. Springer.
[7] S. Dhar, V. Ordonez, and T. L. Berg. High Level Describable Attributes for Predicting Aesthetics and Interestingness. In Proc. of CVPR.
[8] Y. Fu, T. M. Hospedales, T. Xiang, S. Gong, and Y. Yao. Interestingness Prediction by Robust Learning to Rank. In Proc. of ECCV.
[9] L. A. Gatys, A. S. Ecker, and M. Bethge. A Neural Algorithm of Artistic Style. CoRR, abs/.
[10] Y. Guo, M. Liu, T. Gu, and W. Wang. Improving Photo Composition Elegantly: Considering Image Similarity During Composition. Comput. Graph. Forum, 31.
[11] M. Gygli, H. Grabner, H. Riemenschneider, F. Nater, and L. Van Gool. The Interestingness of Images. In Proc. of ICCV.
[12] K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. CoRR, abs/.
[13] E. Hoffer and N. Ailon. Deep Metric Learning Using Triplet Network. In Proc. of SIMBAD, pages 84-92.
[14] P. Isola, J. Xiao, A. Torralba, and A. Oliva. What makes an image memorable? In Proc. of CVPR.
[15] S. Jacobitz. Accessed:
[16] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. In Proc. of ECCV.
[17] D. Joshi, R. Datta, E. A. Fedorovskaya, Q.-T. Luong, J. Z. Wang, J. Li, and J. Luo. Aesthetics and Emotions in Images. IEEE Signal Process. Mag., 28(5):94-115.
[18] S. Karayev, A. Hertzmann, H. Winnemoeller, A. Agarwala, and T. Darrell. Recognizing Image Style. CoRR, abs/.
[19] Y. Ke, X. Tang, and F. Jing. The Design of High-Level Features for Photo Quality Assessment. In Proc. of CVPR.
[20] A. Khosla, A. Das Sarma, and R. Hamid. What Makes an Image Popular? In Proc. of WWW. ACM.
[21] S. Kong, X. Shen, Z. Lin, R. Mech, and C. Fowlkes. Photo Aesthetics Ranking Network with Attributes and Content Adaptation. In Proc. of ECCV.
[22] C. Li and T. Chen. Aesthetic Visual Quality Assessment of Paintings. IEEE J. Sel. Topics Signal Process., 3(2).
[23] C. Li, A. Gallagher, A. C. Loui, and T. Chen. Aesthetic quality assessment of consumer photos with faces. In Proc. of ICIP. IEEE.
[24] C. Li, A. C. Loui, and T. Chen. Towards Aesthetics: A Photo Quality Assessment and Photo Selection System. In Proc. of MM. ACM.
[25] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common Objects in Context. In Proc. of ECCV.
[26] L. Liu, R. Chen, L. Wolf, and D. Cohen-Or. Optimizing Photo Composition. Comput. Graph. Forum, 29(2).
[27] K. Lo, K. Liu, and C. Chen. Assessment of photo aesthetics with efficiency. In Proc. of ICPR.
[28] X. Lu, Z. Lin, H. Jin, J. Yang, and J. Z. Wang. RAPID: Rating Pictorial Aesthetics Using Deep Learning. In Proc. of MM. ACM.
[29] X. Lu, Z. Lin, X. Shen, R. Mech, and J. Z. Wang. Deep Multi-patch Aggregation Network for Image Style, Aesthetics, and Quality Estimation. In Proc. of ICCV.
[30] W. Luo, X. Wang, and X. Tang. Content-based Photo Quality Assessment. In Proc. of ICCV.
[31] H. B. Mann and D. R. Whitney. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. The Annals of Mathematical Statistics, 18(1):50-60.
[32] L. Marchesotti, F. Perronnin, D. Larlus, and G. Csurka. Assessing the Aesthetic Quality of Photographs Using Generic Image Descriptors. In Proc. of ICCV.
[33] F. J. Massey. The Kolmogorov-Smirnov Test for Goodness of Fit. Journal of the American Statistical Association, 46(253):68-78.
[34] A. K. Moorthy, P. Obrador, and N. Oliver. Towards Computational Models of the Visual Aesthetic Appeal of Consumer Videos. In Proc. of ECCV, pages 1-14.
[35] N. Murray, L. Marchesotti, and F. Perronnin. AVA: A Large-Scale Database for Aesthetic Visual Analysis. In Proc. of CVPR.
[36] M. Nishiyama, T. Okabe, I. Sato, and Y. Sato. Aesthetic Quality Classification of Photographs Based on Color Harmony. In Proc. of CVPR, pages 33-40.
[37] P. O'Donovan, A. Agarwala, and A. Hertzmann. Color Compatibility from Large Datasets. ACM Trans. Graph., 30(4):63:1-63:12, 2011.
[38] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vision, 115(3).
[39] J. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for Simplicity: The All Convolutional Net. In ICLR (workshop track).
[40] H. Tong, M. Li, H.-J. Zhang, J. He, and C. Zhang. Classification of Digital Photos Taken by Photographers or Home Users. In Proc. of PCM.
[41] R. R. Varior, M. Haloi, and G. Wang. Gated Siamese Convolutional Neural Network Architecture for Human Re-Identification. CoRR, abs/.
[42] S. Zagoruyko and N. Komodakis. Learning to Compare Image Patches via Convolutional Neural Networks. In Proc. of CVPR, 2015.


More information

Music Genre Classification and Variance Comparison on Number of Genres

Music Genre Classification and Variance Comparison on Number of Genres Music Genre Classification and Variance Comparison on Number of Genres Miguel Francisco, miguelf@stanford.edu Dong Myung Kim, dmk8265@stanford.edu 1 Abstract In this project we apply machine learning techniques

More information

inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering August 2000, Nice, FRANCE

inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering August 2000, Nice, FRANCE Copyright SFA - InterNoise 2000 1 inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering 27-30 August 2000, Nice, FRANCE I-INCE Classification: 7.9 THE FUTURE OF SOUND

More information

A Discriminative Approach to Topic-based Citation Recommendation

A Discriminative Approach to Topic-based Citation Recommendation A Discriminative Approach to Topic-based Citation Recommendation Jie Tang and Jing Zhang Department of Computer Science and Technology, Tsinghua University, Beijing, 100084. China jietang@tsinghua.edu.cn,zhangjing@keg.cs.tsinghua.edu.cn

More information

VISUAL CONTENT BASED SEGMENTATION OF TALK & GAME SHOWS. O. Javed, S. Khan, Z. Rasheed, M.Shah. {ojaved, khan, zrasheed,

VISUAL CONTENT BASED SEGMENTATION OF TALK & GAME SHOWS. O. Javed, S. Khan, Z. Rasheed, M.Shah. {ojaved, khan, zrasheed, VISUAL CONTENT BASED SEGMENTATION OF TALK & GAME SHOWS O. Javed, S. Khan, Z. Rasheed, M.Shah {ojaved, khan, zrasheed, shah}@cs.ucf.edu Computer Vision Lab School of Electrical Engineering and Computer

More information

II. SYSTEM MODEL In a single cell, an access point and multiple wireless terminals are located. We only consider the downlink

II. SYSTEM MODEL In a single cell, an access point and multiple wireless terminals are located. We only consider the downlink Subcarrier allocation for variable bit rate video streams in wireless OFDM systems James Gross, Jirka Klaue, Holger Karl, Adam Wolisz TU Berlin, Einsteinufer 25, 1587 Berlin, Germany {gross,jklaue,karl,wolisz}@ee.tu-berlin.de

More information

DETECTION OF SLOW-MOTION REPLAY SEGMENTS IN SPORTS VIDEO FOR HIGHLIGHTS GENERATION

DETECTION OF SLOW-MOTION REPLAY SEGMENTS IN SPORTS VIDEO FOR HIGHLIGHTS GENERATION DETECTION OF SLOW-MOTION REPLAY SEGMENTS IN SPORTS VIDEO FOR HIGHLIGHTS GENERATION H. Pan P. van Beek M. I. Sezan Electrical & Computer Engineering University of Illinois Urbana, IL 6182 Sharp Laboratories

More information

PERCEPTUAL QUALITY COMPARISON BETWEEN SINGLE-LAYER AND SCALABLE VIDEOS AT THE SAME SPATIAL, TEMPORAL AND AMPLITUDE RESOLUTIONS. Yuanyi Xue, Yao Wang

PERCEPTUAL QUALITY COMPARISON BETWEEN SINGLE-LAYER AND SCALABLE VIDEOS AT THE SAME SPATIAL, TEMPORAL AND AMPLITUDE RESOLUTIONS. Yuanyi Xue, Yao Wang PERCEPTUAL QUALITY COMPARISON BETWEEN SINGLE-LAYER AND SCALABLE VIDEOS AT THE SAME SPATIAL, TEMPORAL AND AMPLITUDE RESOLUTIONS Yuanyi Xue, Yao Wang Department of Electrical and Computer Engineering Polytechnic

More information

Convolutional Neural Networks as a Computational Model for the Underlying Processes of Aesthetics Perception

Convolutional Neural Networks as a Computational Model for the Underlying Processes of Aesthetics Perception Convolutional Neural Networks as a Computational Model for the Underlying Processes of Aesthetics Perception Joachim Denzler, Erik Rodner, Marcel Simon Computer Vision Group, Friedrich Schiller University

More information

Achieving Faster Time to Tapeout with In-Design, Signoff-Quality Metal Fill

Achieving Faster Time to Tapeout with In-Design, Signoff-Quality Metal Fill White Paper Achieving Faster Time to Tapeout with In-Design, Signoff-Quality Metal Fill May 2009 Author David Pemberton- Smith Implementation Group, Synopsys, Inc. Executive Summary Many semiconductor

More information

Brain.fm Theory & Process

Brain.fm Theory & Process Brain.fm Theory & Process At Brain.fm we develop and deliver functional music, directly optimized for its effects on our behavior. Our goal is to help the listener achieve desired mental states such as

More information

Predicting Time-Varying Musical Emotion Distributions from Multi-Track Audio

Predicting Time-Varying Musical Emotion Distributions from Multi-Track Audio Predicting Time-Varying Musical Emotion Distributions from Multi-Track Audio Jeffrey Scott, Erik M. Schmidt, Matthew Prockup, Brandon Morton, and Youngmoo E. Kim Music and Entertainment Technology Laboratory

More information

Music Genre Classification

Music Genre Classification Music Genre Classification chunya25 Fall 2017 1 Introduction A genre is defined as a category of artistic composition, characterized by similarities in form, style, or subject matter. [1] Some researchers

More information

Constant Bit Rate for Video Streaming Over Packet Switching Networks

Constant Bit Rate for Video Streaming Over Packet Switching Networks International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Constant Bit Rate for Video Streaming Over Packet Switching Networks Mr. S. P.V Subba rao 1, Y. Renuka Devi 2 Associate professor

More information

Image Aesthetics and Content in Selecting Memorable Keyframes from Lifelogs

Image Aesthetics and Content in Selecting Memorable Keyframes from Lifelogs Image Aesthetics and Content in Selecting Memorable Keyframes from Lifelogs Feiyan Hu and Alan F. Smeaton Insight Centre for Data Analytics Dublin City University, Dublin 9, Ireland {alan.smeaton}@dcu.ie

More information

... A Pseudo-Statistical Approach to Commercial Boundary Detection. Prasanna V Rangarajan Dept of Electrical Engineering Columbia University

... A Pseudo-Statistical Approach to Commercial Boundary Detection. Prasanna V Rangarajan Dept of Electrical Engineering Columbia University A Pseudo-Statistical Approach to Commercial Boundary Detection........ Prasanna V Rangarajan Dept of Electrical Engineering Columbia University pvr2001@columbia.edu 1. Introduction Searching and browsing

More information

GENDER IDENTIFICATION AND AGE ESTIMATION OF USERS BASED ON MUSIC METADATA

GENDER IDENTIFICATION AND AGE ESTIMATION OF USERS BASED ON MUSIC METADATA GENDER IDENTIFICATION AND AGE ESTIMATION OF USERS BASED ON MUSIC METADATA Ming-Ju Wu Computer Science Department National Tsing Hua University Hsinchu, Taiwan brian.wu@mirlab.org Jyh-Shing Roger Jang Computer

More information

arxiv: v2 [cs.cv] 15 Mar 2016

arxiv: v2 [cs.cv] 15 Mar 2016 arxiv:1601.04155v2 [cs.cv] 15 Mar 2016 Brain-Inspired Deep Networks for Image Aesthetics Assessment Zhangyang Wang, Shiyu Chang, Florin Dolcos, Diane Beck, Ding Liu, and Thomas Huang Beckman Institute,

More information

How to Manage Video Frame- Processing Time Deviations in ASIC and SOC Video Processors

How to Manage Video Frame- Processing Time Deviations in ASIC and SOC Video Processors WHITE PAPER How to Manage Video Frame- Processing Time Deviations in ASIC and SOC Video Processors Some video frames take longer to process than others because of the nature of digital video compression.

More information

Detection of Panoramic Takes in Soccer Videos Using Phase Correlation and Boosting

Detection of Panoramic Takes in Soccer Videos Using Phase Correlation and Boosting Detection of Panoramic Takes in Soccer Videos Using Phase Correlation and Boosting Luiz G. L. B. M. de Vasconcelos Research & Development Department Globo TV Network Email: luiz.vasconcelos@tvglobo.com.br

More information

gresearch Focus Cognitive Sciences

gresearch Focus Cognitive Sciences Learning about Music Cognition by Asking MIR Questions Sebastian Stober August 12, 2016 CogMIR, New York City sstober@uni-potsdam.de http://www.uni-potsdam.de/mlcog/ MLC g Machine Learning in Cognitive

More information

Evaluating Melodic Encodings for Use in Cover Song Identification

Evaluating Melodic Encodings for Use in Cover Song Identification Evaluating Melodic Encodings for Use in Cover Song Identification David D. Wickland wickland@uoguelph.ca David A. Calvert dcalvert@uoguelph.ca James Harley jharley@uoguelph.ca ABSTRACT Cover song identification

More information

Quantify. The Subjective. PQM: A New Quantitative Tool for Evaluating Display Design Options

Quantify. The Subjective. PQM: A New Quantitative Tool for Evaluating Display Design Options PQM: A New Quantitative Tool for Evaluating Display Design Options Software, Electronics, and Mechanical Systems Laboratory 3M Optical Systems Division Jennifer F. Schumacher, John Van Derlofske, Brian

More information

Adaptive Distributed Compressed Video Sensing

Adaptive Distributed Compressed Video Sensing Journal of Information Hiding and Multimedia Signal Processing 2014 ISSN 2073-4212 Ubiquitous International Volume 5, Number 1, January 2014 Adaptive Distributed Compressed Video Sensing Xue Zhang 1,3,

More information

Film Grain Technology

Film Grain Technology Film Grain Technology Hollywood Post Alliance February 2006 Jeff Cooper jeff.cooper@thomson.net What is Film Grain? Film grain results from the physical granularity of the photographic emulsion Film grain

More information

Digital Correction for Multibit D/A Converters

Digital Correction for Multibit D/A Converters Digital Correction for Multibit D/A Converters José L. Ceballos 1, Jesper Steensgaard 2 and Gabor C. Temes 1 1 Dept. of Electrical Engineering and Computer Science, Oregon State University, Corvallis,

More information

FOIL it! Find One mismatch between Image and Language caption

FOIL it! Find One mismatch between Image and Language caption FOIL it! Find One mismatch between Image and Language caption ACL, Vancouver, 31st July, 2017 Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi

More information

6.UAP Project. FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System. Daryl Neubieser. May 12, 2016

6.UAP Project. FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System. Daryl Neubieser. May 12, 2016 6.UAP Project FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System Daryl Neubieser May 12, 2016 Abstract: This paper describes my implementation of a variable-speed accompaniment system that

More information

A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations in Audio Forensic Authentication

A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations in Audio Forensic Authentication Journal of Energy and Power Engineering 10 (2016) 504-512 doi: 10.17265/1934-8975/2016.08.007 D DAVID PUBLISHING A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations

More information

Discriminative and Generative Models for Image-Language Understanding. Svetlana Lazebnik

Discriminative and Generative Models for Image-Language Understanding. Svetlana Lazebnik Discriminative and Generative Models for Image-Language Understanding Svetlana Lazebnik Image-language understanding Robot, take the pan off the stove! Discriminative image-language tasks Image-sentence

More information

HIT SONG SCIENCE IS NOT YET A SCIENCE

HIT SONG SCIENCE IS NOT YET A SCIENCE HIT SONG SCIENCE IS NOT YET A SCIENCE François Pachet Sony CSL pachet@csl.sony.fr Pierre Roy Sony CSL roy@csl.sony.fr ABSTRACT We describe a large-scale experiment aiming at validating the hypothesis that

More information

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /ISCAS.2005.

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /ISCAS.2005. Wang, D., Canagarajah, CN., & Bull, DR. (2005). S frame design for multiple description video coding. In IEEE International Symposium on Circuits and Systems (ISCAS) Kobe, Japan (Vol. 3, pp. 19 - ). Institute

More information

Learning beautiful (and ugly) attributes

Learning beautiful (and ugly) attributes MARCHESOTTI, PERRONNIN: LEARNING BEAUTIFUL (AND UGLY) ATTRIBUTES 1 Learning beautiful (and ugly) attributes Luca Marchesotti luca.marchesotti@xerox.com Florent Perronnin florent.perronnin@xerox.com XRCE

More information