An Introduction to Deep Image Aesthetics

Seminar in Laboratory of Visual Intelligence and Pattern Analysis (VIPA)
An Introduction to Deep Image Aesthetics
Yongcheng Jing, College of Computer Science and Technology, Zhejiang University
Zhenchuan Huang, Alibaba Group
17/01/2018, Hangzhou

Outline
Problem Statement
Development
Methods: Traditional Methods, Deep Methods
Conclusions and Future Work

Problem Statement
(Photographic) Image Aesthetics Assessment: computationally distinguishing high-quality photos from low-quality ones based on photographic rules, typically in one of two forms:
Binary Classification (a classification problem).
Quality Scoring (a regression problem).
Deep Image Aesthetics: deep learning based image aesthetics assessment.
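The two problem forms above can be derived from the same annotations. The following is a minimal sketch, assuming each image comes with a histogram of votes over a 1 to 10 scale; the vote counts and the threshold of 5.0 are illustrative assumptions, not values taken from the slides.

```python
def mean_score(vote_counts, min_score=1):
    """Average rating from a vote histogram (index 0 holds votes for min_score)."""
    total = sum(vote_counts)
    if total == 0:
        raise ValueError("image has no ratings")
    weighted = sum((min_score + i) * c for i, c in enumerate(vote_counts))
    return weighted / total


def binary_label(score, threshold=5.0):
    """1 = high quality, 0 = low quality. The threshold is an assumed cut-off."""
    return 1 if score > threshold else 0


# Hypothetical votes for scores 1..10 on a single image.
votes = [0, 1, 2, 5, 10, 20, 30, 20, 10, 2]
score = mean_score(votes)    # regression target (quality scoring)
label = binary_label(score)  # classification target (binary classification)
print(score, label)
```

The regression target keeps the full score, while the classification target discards everything except which side of the threshold the score falls on.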

Problem Statement
Examples of high-quality and low-quality (photographic) images.
[Figure: example photos labeled High-quality and Low-quality, annotated with the cues content, rule of thirds, color, lighting, and blur.]

Development
"If we dig more on Scopus data, we find that the majority of publications comes from: Asia (National University of Singapore, University Tenaga Nasional and Zhejiang University) and North America (Simon Fraser University, Carnegie Mellon University, and Georgia Institute of Technology)." [1]
[Figure: paper count.]
[1] Spathis, D. (2016). Photo-Quality Evaluation based on Computational Aesthetics: Review of Feature Extraction Techniques. arXiv preprint arXiv:1612.06259.

Methods
Framework: Input Image, then Feature Extraction, then Decision. [2]
Feature Extraction:
  Handcrafted Features (Traditional Methods): Simple Image Features, Image Composition Features, General-Purpose Features, Task-Specific Features.
  Deep Features (Deep Methods): Generic Deep Features, Learned Aesthetics Deep Features.
Decision: Classification, Regression.
[2] Deng, Y., Loy, C. C., & Tang, X. (2017). Image aesthetic assessment: An experimental survey. IEEE Signal Processing Magazine, 34(4), 80-106.

Deep Aesthetic Methods
2017
  ICCV: Personalized Image Aesthetics
  ICCV: Deep Cropping via Attention Box Prediction and Aesthetics Assessment
  CVPR: A-Lamp: Adaptive Layout-Aware Multi-Patch Deep Convolutional Neural Network for Photo Aesthetic Assessment
  TIP: Deep Aesthetic Quality Assessment with Semantic Information
2016
  CVPR: Composition-preserving Deep Photo Aesthetics Assessment
  ECCV: Photo Aesthetics Ranking Network with Attributes and Content Adaptation
  ACM MM: Joint Image and Text Representation for Aesthetics Analysis
2015
  ICCV: Deep Multi-Patch Aggregation Network for Image Style, Aesthetics, and Quality Estimation

Deep Aesthetic Methods
Approach: in general, exploit a multi-task CNN. The tasks include aesthetic assessment, semantic content prediction, attribute prediction, etc.
Network Architecture: add multiple branches after a classification network (AlexNet, VGG, ResNet, etc.).
Loss Function: regression loss, content/attribute loss, ranking loss.
Multi-branch Training Strategy: joint training, sequential training, pairwise training.
Training from scratch vs. fine-tuning a pre-trained CNN.
Dataset: AVA, AADB, etc. (see more in the next slide)
Sampling Strategies:
1. Sample pairs of images with a relatively large difference in their average aesthetic scores.
2. Sample image pairs that have been scored by the same individual.
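To make the loss terms and the first sampling strategy concrete, here is a minimal sketch in plain Python. The hinge form of the ranking loss, the margin, the score gap, the loss weight, and the toy data are all assumptions chosen for illustration; the slides do not specify the exact formulation used by any of the cited papers.

```python
import random


def regression_loss(pred, target):
    """Squared error between a predicted and an annotated aesthetic score."""
    return (pred - target) ** 2


def ranking_loss(pred_a, pred_b, target_a, target_b, margin=0.5):
    """Hinge-style ranking loss: zero when the predictions are ordered like
    the annotations and separated by at least `margin` (assumed form)."""
    sign = 1.0 if target_a > target_b else -1.0
    return max(0.0, margin - sign * (pred_a - pred_b))


def pairwise_loss(pred_a, pred_b, target_a, target_b, rank_weight=1.0):
    """Regression loss on both images plus a weighted ranking term."""
    reg = regression_loss(pred_a, target_a) + regression_loss(pred_b, target_b)
    return reg + rank_weight * ranking_loss(pred_a, pred_b, target_a, target_b)


def sample_pairs(images, min_gap=1.0, n_pairs=4, seed=0):
    """Sampling strategy 1: keep only pairs whose average scores differ
    by at least `min_gap`, then draw up to `n_pairs` of them."""
    candidates = [
        (a, b)
        for i, a in enumerate(images)
        for b in images[i + 1:]
        if abs(a["score"] - b["score"]) >= min_gap
    ]
    random.Random(seed).shuffle(candidates)
    return candidates[:n_pairs]


# Hypothetical images with average aesthetic scores.
images = [
    {"id": "a", "score": 3.0},
    {"id": "b", "score": 3.4},
    {"id": "c", "score": 6.5},
    {"id": "d", "score": 7.2},
]
pairs = sample_pairs(images)
for a, b in pairs:
    print(a["id"], b["id"], abs(a["score"] - b["score"]))
```

Pairs with a small score gap are skipped because their relative order is the least reliable part of the annotation. The second strategy would additionally need per-rater scores, which this toy data does not model.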

Public Dataset
[Figure: data label example showing the Score and Attribute labels of an image.]

Public Dataset
A summary of current datasets:

NAME        TOTAL IMG #  RATERS PER IMG  DESCRIPTION
Photo.Net   20,278       > 10            Score from 0 to 7.
CUHK-PQ     17,690       8-10            Binary label. Has semantic tags.
AVA         ~250,000     78-549          Score from 1 to 10. Has semantic tags and attribute tags.
AADB        10,000       5               Five workers annotate all the images. Has semantic tags and attribute tags. Attribute tags are confidence scores.
FLICKR-AES  40,000       5               Score from 1 to 5.

Selected Deep Aesthetic Methods
Multi-task Convolutional Neural Network (MTCNN) [3]
[Figure: overview.]
[3] Kao, Y., He, R., & Huang, K. (2017). Deep Aesthetic Quality Assessment With Semantic Information. IEEE Transactions on Image Processing, 26(3), 1482-1495.

Selected Deep Aesthetic Methods
Multi-task Convolutional Neural Network (MTCNN)
[Figure: network architecture, with layers labeled CONV, CONV + POOLING, and FC.]

Selected Deep Aesthetic Methods
Personalized Image Aesthetics [4]
It is impractical to ask every user to label lots of data and train a user-specific model. Therefore, the solution is to:
Step 1. Train a generic aesthetic model (common preference).
Step 2. Adapt the generic model to individual users using a limited number of each user's labeled examples.
Where to get an individual user's labeled examples? Collect 14 personal albums, each with ~205 photos, and ask the owner of each album to rate their own photos.
[4] Ren, J., Shen, X., Lin, Z., Mech, R., & Foran, D. J. (2017, October). Personalized Image Aesthetics. In Proceedings of the IEEE International Conference on Computer Vision (pp. 638-647).

Selected Deep Aesthetic Methods
Personalized Aesthetics Model (PAM)
1) Generic aesthetic prediction.
2) Residual learning for personalized aesthetics: learn the offset between the generic prediction and the user's own rating. As the data is NOT enough, use high-level features and simply use a support vector regressor (SVR) for the regression, instead of an FC layer.
[Figure: PAM design, including the support vector regressor.]
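The residual idea can be sketched in a few lines: keep the generic score fixed and fit a small regressor that predicts only the per-user offset. To stay dependency-free, this sketch swaps the SVR for a plain linear regressor trained by gradient descent with a small L2 penalty, so it illustrates the residual setup rather than the regressor used in PAM. The feature vectors, scores, and hyperparameters are hypothetical.

```python
def fit_offset_model(features, offsets, lr=0.1, epochs=2000, l2=1e-3):
    """Fit offset ~ w.x + b by gradient descent on squared error (L2 on w)."""
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, offsets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def personalized_score(generic, x, model):
    """Generic prediction plus the learned user-specific offset."""
    w, b = model
    return generic + sum(wi * xi for wi, xi in zip(w, x)) + b


def mean_abs_error(preds, targets):
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


# Hypothetical high-level features, generic scores, and one user's ratings.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.5, 0.5]]
generic = [5.0, 5.0, 6.0, 4.0, 5.0]
user = [6.0, 4.5, 6.5, 4.0, 5.25]

offsets = [u - g for u, g in zip(user, generic)]  # residual targets
model = fit_offset_model(features, offsets)
personalized = [personalized_score(g, x, model) for g, x in zip(generic, features)]

print(mean_abs_error(generic, user), mean_abs_error(personalized, user))
```

Because only the offset is learned, a user with few labeled photos still falls back to the generic score wherever the offset model has little to say.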

Selected Deep Aesthetic Methods
Composition-preserving Deep CNN
Motivation: current aesthetics algorithms typically transform images as pre-processing, which hurts performance. The transformation is needed because the FC layers that perform the regression require a fixed-size input.
[Figure: pre-processing.]

Selected Deep Aesthetic Methods
Composition-preserving Deep CNN
Approach: propose an adaptive spatial pooling operation. [5]
Adaptive Spatial Pooling: output feature map size is fixed.
Regular Pooling: output feature map size varies with the input.
[5] Mai, L., Jin, H., & Liu, F. (2016). Composition-preserving deep photo aesthetics assessment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 497-506).
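The contrast between the two pooling modes can be shown on toy feature maps. This is a minimal sketch, assuming max pooling and floor/ceil bin boundaries; the function names and the 2 x 2 output size are illustrative choices, not details taken from the paper.

```python
import math


def regular_max_pool(fmap, k):
    """Non-overlapping k x k max pooling: output size depends on the input."""
    out_h, out_w = len(fmap) // k, len(fmap[0]) // k
    return [
        [
            max(fmap[i * k + r][j * k + c] for r in range(k) for c in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def adaptive_max_pool(fmap, out_h, out_w):
    """Adaptive pooling: bin sizes follow the input so the output is always
    out_h x out_w, whatever the input size or aspect ratio."""
    in_h, in_w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(out_h):
        r0 = (i * in_h) // out_h
        r1 = math.ceil((i + 1) * in_h / out_h)
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = math.ceil((j + 1) * in_w / out_w)
            row.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Two toy feature maps with different aspect ratios (4 x 6 and 6 x 4).
wide = [[r * 6 + c for c in range(6)] for r in range(4)]
tall = [[r * 4 + c for c in range(4)] for r in range(6)]

wide_regular = regular_max_pool(wide, 2)  # 2 x 3
tall_regular = regular_max_pool(tall, 2)  # 3 x 2
wide_adaptive = adaptive_max_pool(wide, 2, 2)  # 2 x 2
tall_adaptive = adaptive_max_pool(tall, 2, 2)  # 2 x 2
print(wide_adaptive)  # [[8, 11], [20, 23]]
```

With a fixed-size output, the fully connected layers can follow directly, so the input image does not have to be resized or cropped to a fixed shape first.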

Conclusions and Future Work
Conclusions
A multi-task network can incorporate different kinds of information and help learn the aesthetic scores.
Semantic and attribute information is effective in learning aesthetic scores as well as personalized aesthetics.
Resizing and cropping input images as pre-processing can hurt the performance of an aesthetics prediction network.

Conclusions and Future Work
Future Work
Explore self-supervised or unsupervised task-specific image aesthetics assessment algorithms (not only photography aesthetics).
Create images with high aesthetic scores. There is already one paper from Google: Creatism: A deep-learning photographer capable of creating professional work.
[Figure: input and output.]

Discussion

Thanks!