Impact of Deep Learning

Impact of Deep Learning: speech recognition, computer vision, recommender systems, language understanding, drug discovery and medical image analysis. [Courtesy of R. Salakhutdinov]

Deep Belief Networks: Training [Hinton & Salakhutdinov, 2006]

Very Large Scale Use of DBNs [Quoc Le et al., ICML 2012]. Data: 10 million 200x200 unlabeled images, sampled from YouTube. Training: 1000 machines (16000 cores) for 1 week. Learned network: 3 multi-stage layers, 1.15 billion parameters. Achieves 15.8% accuracy (previously 9.5%) classifying 1 of 20k ImageNet items. [Figure: real images that most excite the feature; image synthesized to most excite the feature]

Restricted Boltzmann Machines. Graphical models are a powerful framework for representing dependency structure between random variables. [Figure: hidden variables (feature detectors) connected to visible variables (image) through pairwise and unary terms] An RBM is a Markov random field with stochastic binary visible variables, stochastic binary hidden variables, and bipartite connections. Related models: Markov random fields, Boltzmann machines, log-linear models.
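The slide's equations did not survive transcription. For reference, here is the standard energy-based form of a binary RBM, written with symbols chosen for this sketch: W for the pairwise weights and b, a for the unary visible and hidden biases. The bipartite connections are what make both conditionals factorize over units (sigma is the logistic sigmoid).

```latex
\begin{aligned}
E(\mathbf{v},\mathbf{h}) &= -\sum_{i,j} W_{ij} v_i h_j - \sum_i b_i v_i - \sum_j a_j h_j \\
P(\mathbf{v},\mathbf{h}) &= \frac{1}{Z}\exp\big(-E(\mathbf{v},\mathbf{h})\big), \qquad Z = \sum_{\mathbf{v},\mathbf{h}} \exp\big(-E(\mathbf{v},\mathbf{h})\big) \\
P(h_j = 1 \mid \mathbf{v}) &= \sigma\Big(a_j + \sum_i W_{ij} v_i\Big) \\
P(v_i = 1 \mid \mathbf{h}) &= \sigma\Big(b_i + \sum_j W_{ij} h_j\Big)
\end{aligned}
```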

Model Learning. Given a set of i.i.d. training examples, we want to learn the model parameters by maximizing the log-likelihood objective, using its derivative with respect to the parameters. [Figure: hidden units connected to image visible units]
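The objective and its derivative were also lost in transcription. For a binary RBM with weight W_ij between visible unit v_i and hidden unit h_j, the derivative of the average log-likelihood is E_data[v_i h_j] - E_model[v_i h_j]. The model expectation is intractable, so it has to be approximated. The minimal NumPy sketch below uses one-step contrastive divergence (CD-1, Hinton 2002), a common approximation that the slide itself does not name; the function name `cd1_step`, the shapes, the learning rate, and the toy data are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, a, lr, rng):
    """One CD-1 update on a batch v0 of shape (N, n_visible); modifies W, b, a in place.

    W: (n_visible, n_hidden) pairwise weights; b: visible biases; a: hidden biases.
    Returns the mean squared reconstruction error, a rough progress signal.
    """
    n = v0.shape[0]
    # Positive phase: hidden probabilities given the data estimate E_data[v_i h_j].
    ph0 = sigmoid(v0 @ W + a)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step v0 -> h0 -> v1 -> h1 stands in for E_model[v_i h_j].
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + a)
    # Approximate gradient ascent: data statistics minus one-step model statistics.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    a += lr * (ph0 - ph1).mean(axis=0)
    return float(np.mean((v0 - pv1) ** 2))

# Toy usage (hypothetical data): two complementary binary patterns, 50 copies each.
rng = np.random.default_rng(0)
patterns = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
data = np.repeat(patterns, 50, axis=0)
W = 0.01 * rng.standard_normal((6, 4))
b, a = np.zeros(6), np.zeros(4)
errors = [cd1_step(data, W, b, a, lr=0.1, rng=rng) for _ in range(2000)]
```

CD-1 replaces the model expectation with statistics from a single Gibbs step started at the data. It is cheap but biased; running more Gibbs steps (CD-k) or keeping persistent chains reduces that bias.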

Deep Boltzmann Machines. Low-level features (edges) are built from unlabeled inputs. Input: image pixels. (Salakhutdinov & Hinton, Neural Computation 2012)

Deep Boltzmann Machines. Learn simpler representations, then compose more complex ones: low-level features (edges) are built from unlabeled inputs, and higher-level features are combinations of edges. Input: image pixels. (Salakhutdinov 2008; Salakhutdinov & Hinton 2012)
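One way to learn simpler representations first and then compose more complex ones is greedy layer-wise pretraining: train an RBM on the input, train the next RBM on the first one's hidden activations, and so on. The NumPy sketch below shows that stacking. It is the DBN-style procedure; DBMs use a modified form of it only to initialize the weights before all layers are trained jointly. The helper names `train_rbm` and `greedy_pretrain`, the layer sizes, epochs, learning rate, and the random toy data are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs, lr, rng):
    """Train one RBM with full-batch CD-1; return its weights and hidden biases."""
    n, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)  # visible biases
    a = np.zeros(n_hidden)   # hidden biases
    for _ in range(epochs):
        ph0 = sigmoid(data @ W + a)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + b)  # reconstruction probabilities, a common CD-1 shortcut
        ph1 = sigmoid(pv1 @ W + a)
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / n
        b += lr * (data - pv1).mean(axis=0)
        a += lr * (ph0 - ph1).mean(axis=0)
    return W, a

def greedy_pretrain(data, layer_sizes, epochs=200, lr=0.1, seed=0):
    """Stack RBMs: each layer is trained on the hidden probabilities of the layer below."""
    rng = np.random.default_rng(seed)
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, a = train_rbm(x, n_hidden, epochs, lr, rng)
        layers.append((W, a))
        x = sigmoid(x @ W + a)  # features passed up to the next layer
    return layers, x

# Toy usage: 64 random binary vectors of length 12, three stacked layers.
rng = np.random.default_rng(1)
toy = (rng.random((64, 12)) < 0.5).astype(float)
layers, top = greedy_pretrain(toy, [8, 6, 4])
```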

Model Formulation. [Figure: a DBM with input layer v and hidden layers h1, h2, h3, connected by weights W1, W2, W3] The formulation is the same as for RBMs, but training requires approximate inference; it can be done and scales to millions of examples.
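With biases omitted to match the figure, a three-layer DBM has energy E(v, h1, h2, h3) = -v^T W1 h1 - h1^T W2 h2 - h2^T W3 h3. Unlike in an RBM, the hidden layers are not conditionally independent given v, which is why training needs approximate inference. A common approximation is mean-field: iterate the updates below, where each middle layer combines bottom-up input from below with top-down input from above, until they stop changing. This NumPy sketch assumes the bias-free form, row-vector batches, and a bottom-up initialization; `dbm_mean_field` is a hypothetical name, not code from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dbm_mean_field(v, W1, W2, W3, n_iters=50, tol=1e-6):
    """Mean-field approximation to P(h1, h2, h3 | v) for a bias-free 3-hidden-layer DBM.

    v: (N, n_v); W1: (n_v, n1); W2: (n1, n2); W3: (n2, n3).
    Returns the mean activations mu1, mu2, mu3.
    """
    # Initialise with a single bottom-up pass (an assumed, common choice).
    mu1 = sigmoid(v @ W1)
    mu2 = sigmoid(mu1 @ W2)
    mu3 = sigmoid(mu2 @ W3)
    for _ in range(n_iters):
        new1 = sigmoid(v @ W1 + mu2 @ W2.T)      # bottom-up from v, top-down from h2
        new2 = sigmoid(new1 @ W2 + mu3 @ W3.T)   # bottom-up from h1, top-down from h3
        new3 = sigmoid(new2 @ W3)                # top layer: bottom-up only
        delta = max(np.abs(new1 - mu1).max(),
                    np.abs(new2 - mu2).max(),
                    np.abs(new3 - mu3).max())
        mu1, mu2, mu3 = new1, new2, new3
        if delta < tol:
            break
    return mu1, mu2, mu3

# Toy usage with small random weights and a batch of 5 binary inputs.
rng = np.random.default_rng(0)
W1 = 0.1 * rng.standard_normal((10, 8))
W2 = 0.1 * rng.standard_normal((8, 6))
W3 = 0.1 * rng.standard_normal((6, 4))
v = (rng.random((5, 10)) < 0.5).astype(float)
mu1, mu2, mu3 = dbm_mean_field(v, W1, W2, W3)
```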

Samples Generated by the Model. [Figure: training data alongside model-generated samples]

Handwriting Recognition

MNIST dataset: 60,000 examples of 10 digits

Learning algorithm                          Error
Logistic regression                         12.0%
K-NN                                        3.09%
Neural Net (Platt 2005)                     1.53%
SVM (Decoste et al. 2002)                   1.4%
Deep Autoencoder (Bengio et al. 2007)       1.4%
Deep Belief Net (Hinton et al. 2006)        1.2%
DBM                                         0.95%

Optical Character Recognition: 42,152 examples of 26 English letters

Learning algorithm                          Error
Logistic regression                         22.14%
K-NN                                        18.92%
Neural Net                                  14.62%
SVM (Larochelle et al. 2009)                9.70%
Deep Autoencoder (Bengio et al. 2007)       10.05%
Deep Belief Net (Larochelle et al. 2009)    9.68%
DBM                                         8.40%

Permutation-invariant version.

3-D Object Recognition

NORB dataset: 24,000 examples

Learning algorithm                          Error
Logistic regression                         22.5%
K-NN (LeCun 2004)                           18.92%
SVM (Bengio & LeCun 2007)                   11.6%
Deep Belief Net (Nair & Hinton 2009)        9.0%
DBM                                         7.2%

[Figure: pattern completion]

Learning Shared Representations Across Sensory Modalities. [Figure: an image paired with the concept tags: sunset, pacific ocean, baker beach, seashore, ocean]

Multimodal DBM: a Gaussian model over dense, real-valued image features and a Replicated Softmax model over word counts. (Srivastava & Salakhutdinov, NIPS 2012; JMLR 2014)
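The image and text pathways need different input models. As a sketch, assuming the standard Gaussian-Bernoulli RBM for image features (per-feature standard deviation sigma) and the Replicated Softmax for word counts, the first hidden layer of each pathway responds to its input as below; note that the Replicated Softmax scales the hidden biases by the document length D. Function names and shapes are hypothetical, and the joint layers above both pathways are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gaussian_hidden_probs(x, W, a, sigma):
    """Image pathway (Gaussian-Bernoulli RBM):
    P(h_j = 1 | x) = sigmoid(a_j + sum_i W_ij * x_i / sigma_i)."""
    return sigmoid(a + (x / sigma) @ W)

def replicated_softmax_hidden_probs(counts, W, a):
    """Text pathway (Replicated Softmax):
    P(h_j = 1 | c) = sigmoid(D * a_j + sum_k W_kj * c_k), with D = sum_k c_k."""
    D = counts.sum(axis=1, keepdims=True)  # document lengths, shape (N, 1)
    return sigmoid(D * a + counts @ W)

# Toy usage with hypothetical sizes: 5 image features, a 7-word vocabulary, 3 hidden units each.
rng = np.random.default_rng(0)
image_feats = rng.standard_normal((2, 5))
word_counts = rng.integers(0, 4, size=(2, 7)).astype(float)
img_h = gaussian_hidden_probs(image_feats, 0.1 * rng.standard_normal((5, 3)), np.zeros(3), np.ones(5))
txt_h = replicated_softmax_hidden_probs(word_counts, 0.1 * rng.standard_normal((7, 3)), np.zeros(3))
```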

Multimodal DBM (bottom-up + top-down): a Gaussian model over dense, real-valued image features and a Replicated Softmax model over word counts. (Srivastava & Salakhutdinov, NIPS 2012; JMLR 2014)

Text Generated from Images. [Figure: given images, each shown with its generated tags] Generated tags:
dog, cat, pet, kitten, puppy, ginger, tongue, kitty, dogs, furry
insect, butterfly, insects, bug, butterflies, lepidoptera
sea, france, boat, mer, beach, river, bretagne, plage, brittany
graffiti, streetart, stencil, sticker, urbanart, graff, sanfrancisco
portrait, child, kid, ritratto, kids, children, boy, cute, boys, italy
canada, nature, sunrise, ontario, fog, mist, bc, morning

Text Generated from Images. [Figure: given images, each shown with its generated tags] Generated tags:
portrait, women, army, soldier, mother, postcard, soldiers
obama, barackobama, election, politics, president, hope, change, sanfrancisco, convention, rally
water, glass, beer, bottle, drink, wine, bubbles, splash, drops, drop

Images Selected from Text. [Figure: given text queries, each shown with retrieved images] Given queries:
water, red, sunset
nature, flower, red, green
blue, green, yellow, colors
chocolate, cake
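The slide shows only the text queries. As an illustration of the retrieval step, assuming every image and query has already been mapped into a common representation (for example, inferred activations of the joint hidden layer), images can be ranked by cosine similarity to the query. The ranking rule, the function name `rank_by_cosine`, and the toy vectors are assumptions, not necessarily the procedure used by Srivastava & Salakhutdinov.

```python
import numpy as np

def rank_by_cosine(query, database):
    """Rank rows of `database` by cosine similarity to `query`, most similar first."""
    q = query / np.linalg.norm(query)
    d = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores), scores

# Hypothetical 2-D joint representations for one text query and three images.
query = np.array([1.0, 0.0])
images = np.array([[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]])
order, scores = rank_by_cosine(query, images)
```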

Summary. Efficient learning algorithms for deep learning models. Learning more adaptive, robust, and structured representations. [Figure: example applications: learning a category hierarchy; image tagging; text and image retrieval; object recognition (mosque, tower, building, cathedral, dome, castle); speech recognition (HMM decoder); multimodal data; caption generation (sunset, pacific ocean, beach, seashore)] Deep models improve the current state of the art in many application domains: object recognition and detection, text and image retrieval, handwritten character and speech recognition, and others. [Courtesy of R. Salakhutdinov]