Impact of Deep Learning: Speech Recognition, Computer Vision, Recommender Systems, Language Understanding, Drug Discovery and Medical Image Analysis. [Courtesy of R. Salakhutdinov]
Deep Belief Networks: Training [Hinton & Salakhutdinov, 2006]
Very Large Scale Use of DBNs [Quoc Le et al., ICML 2012]
Data: 10 million 200x200 unlabeled images, sampled from YouTube
Training: 1,000 machines (16,000 cores) for 1 week
Learned network: 3 multi-stage layers, 1.15 billion parameters
Achieves 15.8% accuracy (previous best: 9.5%) classifying 1 of 20k ImageNet items
[Figures: real images that most excite a learned feature; image synthesized to most excite the feature]
Restricted Boltzmann Machines
Graphical models are a powerful framework for representing dependency structure between random variables.
An RBM is a Markov random field with:
- stochastic binary visible variables (e.g., image pixels)
- stochastic binary hidden variables (feature detectors)
- bipartite connections: pairwise visible-hidden terms plus unary terms.
Related model families: Markov random fields, Boltzmann machines, log-linear models.
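For reference, the bipartite structure described above corresponds to the standard RBM energy function, written out here since the slide shows it only as a figure (b and c denote visible and hidden biases, W the pairwise weights):

```latex
E(\mathbf{v},\mathbf{h}) = -\sum_i b_i v_i - \sum_j c_j h_j - \sum_{i,j} v_i W_{ij} h_j,
\qquad
P(\mathbf{v},\mathbf{h}) = \frac{1}{Z}\exp\!\big(-E(\mathbf{v},\mathbf{h})\big).
```

Because the connections are bipartite, the conditionals factorize: P(h_j = 1 | v) = σ(c_j + Σ_i W_ij v_i), and symmetrically for the visible units given the hiddens.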
Model Learning
Given a set of i.i.d. training examples (visible units, e.g. images), we want to learn the model parameters by maximizing the log-likelihood objective.
The derivative of the log-likelihood is the difference between a data-dependent expectation and a model expectation; the model expectation is intractable to compute exactly, so learning relies on approximations.
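In practice the model expectation in the gradient is commonly approximated with contrastive divergence (a standard approximation, not spelled out on the slide). A minimal numpy sketch of a CD-1 gradient estimate for a binary RBM, with all variable names illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_gradient(v0, W, b, c):
    """One contrastive-divergence (CD-1) step for a binary RBM.

    Approximates the log-likelihood gradient
        d log p / dW = E_data[v h^T] - E_model[v h^T]
    by replacing the model expectation with a single Gibbs step.
    """
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step v0 -> h0 -> v1 -> h1.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    n = v0.shape[0]
    dW = (v0.T @ ph0 - v1.T @ ph1) / n
    db = (v0 - v1).mean(axis=0)
    dc = (ph0 - ph1).mean(axis=0)
    return dW, db, dc

# Tiny usage example: 4 visible units, 3 hidden units, batch of 8.
W = 0.01 * rng.standard_normal((4, 3))
b = np.zeros(4)
c = np.zeros(3)
v = (rng.random((8, 4)) < 0.5).astype(float)
dW, db, dc = cd1_gradient(v, W, b, c)
```

The returned gradients would be applied with a small learning rate; more Gibbs steps (CD-k) trade compute for a less biased estimate.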
Deep Boltzmann Machines
Low-level features: edges, built from unlabeled inputs.
Input: image pixels.
(Salakhutdinov & Hinton, Neural Computation 2012)
Deep Boltzmann Machines
Learn simpler representations first, then compose more complex ones.
Higher-level features: combinations of edges.
Low-level features: edges, built from unlabeled inputs.
Input: image pixels.
(Salakhutdinov 2008; Salakhutdinov & Hinton 2012)
Model Formulation
Layers: input v and hidden layers h1, h2, h3, coupled by weight matrices W1, W2, W3.
The units are the same stochastic binary units as in RBMs. Training requires approximate inference, but it can be done, and it scales to millions of examples.
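The approximate inference mentioned above is typically mean-field variational inference: each hidden layer's posterior is updated using both the bottom-up signal from the layer below and the top-down signal from the layer above. A minimal sketch for the 3-hidden-layer model on the slide, with biases omitted and all shapes illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_posterior(v, W1, W2, W3, n_iters=25):
    """Mean-field approximation q(h1, h2, h3 | v) for a 3-layer binary DBM.

    Each update combines bottom-up input from the layer below with
    top-down input from the layer above (biases omitted for brevity).
    """
    q1 = np.full(W1.shape[1], 0.5)
    q2 = np.full(W2.shape[1], 0.5)
    q3 = np.full(W3.shape[1], 0.5)
    for _ in range(n_iters):
        q1 = sigmoid(v @ W1 + q2 @ W2.T)   # from v (below) and h2 (above)
        q2 = sigmoid(q1 @ W2 + q3 @ W3.T)  # from h1 (below) and h3 (above)
        q3 = sigmoid(q2 @ W3)              # top layer: bottom-up only
    return q1, q2, q3

# Usage with toy dimensions: 6 visibles, hidden layers of size 5, 4, 3.
rng = np.random.default_rng(0)
v = (rng.random(6) < 0.5).astype(float)
W1 = 0.1 * rng.standard_normal((6, 5))
W2 = 0.1 * rng.standard_normal((5, 4))
W3 = 0.1 * rng.standard_normal((4, 3))
q1, q2, q3 = mean_field_posterior(v, W1, W2, W3)
```

The fixed point of these coupled updates gives the variational posterior used in the data-dependent term of the learning gradient.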
Samples Generated by the Model
[Figure: training data alongside model-generated samples]
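Samples like the ones shown are drawn by running a Gibbs chain in the trained model. A minimal sketch for a single binary RBM (for a DBM one alternates over the layers in the same way); parameter names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sample(W, b, c, n_steps=500):
    """Draw an approximate sample from a binary RBM by alternating Gibbs steps."""
    v = (rng.random(b.shape) < 0.5).astype(float)  # random initialization
    for _ in range(n_steps):
        # Sample hiddens given visibles, then visibles given hiddens.
        h = (rng.random(c.shape) < sigmoid(v @ W + c)).astype(float)
        v = (rng.random(b.shape) < sigmoid(h @ W.T + b)).astype(float)
    return v

# Toy model: 8 visible units, 4 hidden units.
W = 0.05 * rng.standard_normal((8, 4))
b = np.zeros(8)
c = np.zeros(4)
sample = gibbs_sample(W, b, c)
```

With enough steps the chain mixes toward the model distribution; the slide's samples come from a trained model, whereas this toy model has random weights.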
Handwriting Recognition (Optical Character Recognition)
MNIST Dataset: 60,000 examples of 10 digits. Second dataset: 42,152 examples of 26 English letters. Permutation-invariant version.

MNIST digits (Learning Algorithm: Error):
Logistic regression: 12.0%
K-NN: 3.09%
Neural Net (Platt 2005): 1.53%
SVM (Decoste et al. 2002): 1.4%
Deep Autoencoder (Bengio et al. 2007): 1.4%
Deep Belief Net (Hinton et al. 2006): 1.2%
DBM: 0.95%

English letters (Learning Algorithm: Error):
Logistic regression: 22.14%
K-NN: 18.92%
Neural Net: 14.62%
SVM (Larochelle et al. 2009): 9.7%
Deep Autoencoder (Bengio et al. 2007): 10.05%
Deep Belief Net (Larochelle et al. 2009): 9.68%
DBM: 8.4%
3-D Object Recognition
NORB Dataset: 24,000 examples

Learning Algorithm: Error
Logistic regression: 22.5%
K-NN (LeCun 2004): 18.92%
SVM (Bengio & LeCun 2007): 11.6%
Deep Belief Net (Nair & Hinton 2009): 9.0%
DBM: 7.2%

[Figure: pattern completion]
Learning Shared Representations Across Sensory Modalities
Concept: sunset, pacific ocean, baker beach, seashore, ocean
Multimodal DBM
Image pathway: Gaussian model over dense, real-valued image features.
Text pathway: Replicated Softmax over word counts.
(Srivastava & Salakhutdinov, NIPS 2012; JMLR 2014)
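The two input pathways use different visible-layer models; for reference, the standard energy functions behind them are sketched below (a hedged reconstruction, assuming unit-variance Gaussian units, vocabulary size K, and document length D, since the slide shows the models only as a figure):

```latex
% Gaussian--Bernoulli RBM over real-valued image features v:
E(\mathbf{v},\mathbf{h}) = \sum_i \frac{(v_i - b_i)^2}{2}
  - \sum_j c_j h_j - \sum_{i,j} v_i W_{ij} h_j.

% Replicated Softmax over word counts \hat{v}_k (count of word k),
% with hidden biases scaled by the document length D:
E(\hat{\mathbf{v}},\mathbf{h}) = -\sum_{k,j} W_{kj}\,\hat{v}_k\,h_j
  - \sum_k b_k \hat{v}_k - D \sum_j c_j h_j.
```

Scaling the hidden biases by D lets documents of different lengths share one set of parameters, which is the point of the "replicated" construction.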
Multimodal DBM: Bottom-up + Top-down Inference
Image pathway: Gaussian model over dense, real-valued image features.
Text pathway: Replicated Softmax over word counts.
(Srivastava & Salakhutdinov, NIPS 2012; JMLR 2014)
Text Generated from Images
Given an image, the model generates tags:
- dog, cat, pet, kitten, puppy, ginger, tongue, kitty, dogs, furry
- insect, butterfly, insects, bug, butterflies, lepidoptera
- sea, france, boat, mer, beach, river, bretagne, plage, brittany
- graffiti, streetart, stencil, sticker, urbanart, graff, sanfrancisco
- portrait, child, kid, ritratto, kids, children, boy, cute, boys, italy
- canada, nature, sunrise, ontario, fog, mist, bc, morning
Text Generated from Images
Given an image, the model generates tags:
- portrait, women, army, soldier, mother, postcard, soldiers
- obama, barackobama, election, politics, president, hope, change, sanfrancisco, convention, rally
- water, glass, beer, bottle, drink, wine, bubbles, splash, drops, drop
Images Selected from Text
Given a text query, images are retrieved:
- water, red, sunset
- nature, flower, red, green
- blue, green, yellow, colors
- chocolate, cake
Summary
Efficient learning algorithms for deep learning models.
Learning more adaptive, robust, and structured representations.
[Figure collage: learning a category hierarchy; image tagging; text & image retrieval; object recognition (mosque, tower, building, cathedral, dome, castle); speech recognition (HMM decoder); multimodal data; caption generation (sunset, pacific ocean, beach, seashore)]
Deep models improve the current state of the art in many application domains:
- object recognition and detection, text and image retrieval, handwritten character and speech recognition, and others.
[Courtesy, R. Salakhutdinov]