Will computers ever be able to chat with us?


Transcription:

Will computers ever be able to chat with us? Marco Baroni, Center for Mind/Brain Sciences, University of Trento. ESSLLI Evening Lecture, August 18th, 2016. 1 / 26

Acknowledging... Angeliki Lazaridou, Gemma Boleda, Germán Kruszewski, Roberto Zamparelli, Raffaella Bernardi, Nghia Pham, Jason Weston, Louise McNally, Marc'Aurelio Ranzato, Sumit Chopra, Antoine Bordes, Arthur Szlam, Tomas Mikolov, Raquel Fernandez, Denis Paperno. 2 / 26

Conversation is easy (?) Garrod and Pickering TICS 2004 3 / 26

The conversational agent pipeline
[Excerpt and Fig. 1 from Young et al., partly cropped on the slide. Figure components: Input speech → Spoken Language Understanding (SLU) → u_t → State Estimator → s_t → Policy → a_t → Natural Language Generation (NLG) → System Response; the State Estimator and Policy together form the Dialogue Manager.]
"... must therefore implement dialogue strategies that account for the unreliability of the input and provide error checking and recovery mechanisms. As a consequence, conventional deterministic flowchart-based systems are expensive to build and often fragile in operation. [...] Fig. 1. Components of a finite state-based spoken dialogue system. At each turn the input speech is converted to an abstract representation of the user's intent u_t, the dialogue state s_t is updated and a deterministic decision rule called a policy maps the state into an action a_t in response. [...] During the last few years, a new approach to dialogue management has emerged based on the mathematical framework of partially observable Markov decision processes (POMDPs) [9], [10], [11]. This approach assumes that dialogue evolves ..."
Young et al. IEEE 2012. 4 / 26
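
To make the loop in Fig. 1 concrete, here is a minimal, purely illustrative sketch of a finite state-based dialogue manager: SLU turns an utterance into an intent u_t, a state estimator folds it into the dialogue state s_t, and a hand-written policy maps s_t to an action a_t. The function names and the toy weather domain are invented for illustration; they are not from Young et al. or from the lecture.

```python
# Toy finite state-based dialogue loop: u_t -> s_t -> a_t -> response.
# Everything here (domain, slots, templates) is a made-up illustration.

def slu(utterance: str) -> dict:
    """Very crude spoken language understanding: keyword spotting -> intent u_t."""
    u_t = {}
    if "weather" in utterance.lower():
        u_t["intent"] = "ask_weather"
    for city in ("trento", "london", "paris"):
        if city in utterance.lower():
            u_t["city"] = city
    return u_t

def update_state(s_t: dict, u_t: dict) -> dict:
    """State estimator: fold the new observation into the dialogue state."""
    s_next = dict(s_t)
    s_next.update(u_t)
    return s_next

def policy(s_t: dict) -> str:
    """Deterministic decision rule mapping state s_t to action a_t."""
    if s_t.get("intent") == "ask_weather" and "city" not in s_t:
        return "request_city"   # error checking / recovery: ask for the missing slot
    if s_t.get("intent") == "ask_weather":
        return "inform_weather"
    return "ask_how_can_i_help"

def nlg(a_t: str, s_t: dict) -> str:
    """Template-based natural language generation."""
    templates = {
        "ask_how_can_i_help": "How can I help you?",
        "request_city": "Which city do you mean?",
        "inform_weather": "The forecast for {city} is sunny.",
    }
    return templates[a_t].format(**s_t)

# One simulated exchange (text stands in for the recognised speech).
state = {}
for user_turn in ["What's the weather like?", "In Trento, please."]:
    state = update_state(state, slu(user_turn))
    action = policy(state)
    print("USER:  ", user_turn)
    print("SYSTEM:", nlg(action, state))
```

The point of the sketch is the fragility the excerpt describes: every intent, slot, and recovery branch has to be anticipated and hard-coded by hand.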

End-to-end neural networks http://stats.stackexchange.com/questions/114385 5 / 26

Pipeline-based machine translation Anthes ACM 2010 6 / 26

7 / 26 End-to-end machine translation Training data Io sono il dottore di cui in questa novella si parla talvolta con parole poco lusinghiere. ==> I am the doctor occasionally mentioned in this story, in unflattering terms.

End-to-end (sequence-to-sequence) machine translation
Architecture
[Excerpt and Figure 1 from Sutskever et al., partly cropped on the slide:]
"... were the first to map the entire input sentence to vector, and is related to Cho et al. [5] although the latter was used only for rescoring hypotheses produced by a phrase-based system. Graves [...] introduced a novel differentiable attention mechanism that allows neural networks to focus on different parts of their input, and an elegant variant of this idea was successfully applied to machine translation by Bahdanau et al. [2]. The Connectionist Sequence Classification is another popular technique for mapping sequences to sequences with neural networks, but it assumes a monotonic alignment between the inputs and the outputs [11]."
Figure 1: Our model reads an input sentence "ABC" and produces "WXYZ" as the output sentence. The model stops making predictions after outputting the end-of-sentence token. Note that the LSTM reads the input sentence in reverse, because doing so introduces many short term dependencies in the data that make the optimization problem much easier.
"The main result of this work is the following. On the WMT'14 English to French translation task, we obtained a BLEU score of 34.81 by directly extracting translations from an ensemble of 5 deep LSTMs (with 384M parameters and 8,000 dimensional state each) using a simple left-to-right beam search decoder. This is by far the best result achieved by direct translation with large neural networks."
Sutskever et al. NIPS 2014. 8 / 26
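
As a concrete illustration of the sequence-to-sequence idea in the excerpt, here is a minimal encoder-decoder sketch in PyTorch: the encoder reads the source sentence, its final hidden state conditions the decoder, and the decoder predicts the target token by token. The toy vocabulary sizes and single-layer LSTMs are assumptions for brevity; they are nothing like the deep, 384M-parameter ensembles in the paper.

```python
# Minimal seq2seq sketch (toy sizes, single-layer LSTMs), not the paper's model.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=32, hid_dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt_in):
        # src: (batch, src_len); tgt_in: (batch, tgt_len), the target shifted right.
        _, state = self.encoder(self.src_emb(src))   # final (h, c): the "sentence vector"
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)
        return self.out(dec_out)                     # (batch, tgt_len, tgt_vocab) logits

# Toy usage: one source sentence (read in reverse, as in the paper) and one target.
model = Seq2Seq(src_vocab=100, tgt_vocab=100)
src = torch.tensor([[5, 4, 3, 2, 1]])
tgt_in = torch.tensor([[0, 7, 8, 9]])    # <bos>, then target tokens
tgt_out = torch.tensor([[7, 8, 9, 1]])   # target tokens, then <eos>
logits = model(src, tgt_in)
loss = nn.functional.cross_entropy(logits.transpose(1, 2), tgt_out)
loss.backward()
```

Training simply minimises this cross-entropy over many (source, target) sentence pairs; at test time the decoder is run greedily or with beam search from the encoder's final state.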

End-to-end (sequence-to-sequence) conversational agents
Training data
A: Shall we call Alan Turner?
B: Alan Turner?
A: Yeah.
B: I haven't got his number have you got it?
A: No I thought you had your book with you this week.
http://bncweb.lancs.ac.uk/ 9 / 26

End-to-end (sequence-to-sequence) conversational agents
Architecture
[Excerpt and Figure 1 from Vinyals and Le, partly cropped on the slide:]
Figure 1. Using the seq2seq framework for modeling conversations.
"... and train to map ABC to WXYZ as shown in Figure 1 above. The hidden state of the model when it receives the end of sequence symbol <eos> can be viewed as the ..."
Vinyals and Le ICML DLW 2015. 10 / 26

End-to-end conversational agents
Examples
H: i am seeing an error related to vpn
M: what is the error message that you are getting when connecting to vpn using network connect?
H: connection refused or something like that
M: try this, goto <URL> page and enter your name and password
H: done
M: try logging in with <NAME> and check if its working fine now
H: yes, now it works!
Vinyals and Le ICML DLW 2015. 11 / 26

End-to-end conversational agents
Examples
H: what do you think about messi?
M: he's a great player.
H: what do you think about cleopatra?
M: oh, she's very regal.
H: what do you think about england during the reign of elizabeth?
M: it was a great place.
Vinyals and Le ICML DLW 2015. 12 / 26

End-to-end conversational agents
Examples
H: what is your job?
M: i'm a lawyer.
H: what do you do?
M: i'm a doctor.
Vinyals and Le ICML DLW 2015. 13 / 26

End-to-end conversational agents
Anything you say can and will be used against you in a court of law
H: what are you doing?
M: nothing.
H: what is your name?
M: i don't know, sir.
H: how old are you?
M: i'm not sure.
Li et al. arxiv 2016. 14 / 26

Learning to converse by predicting the next turn
A: Shall we call Alan Turner?
B: Alan Turner?
A: Yeah.
B: I haven't got his number have you got it?
A: No I thought you had your book with you this week.
http://bncweb.lancs.ac.uk/ 15 / 26
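
To make "predicting the next turn" concrete, here is a small sketch of how a dialogue like the one above can be turned into (context, next-turn) training pairs for a sequence-to-sequence model. The windowing choice (keep the last few turns as context) is one common option and an assumption here, not necessarily what the lecture or any particular paper did.

```python
# Turn a dialogue into (context, next-turn) training pairs: every turn becomes
# a target, with the preceding turns joined into one string as the source.
dialogue = [
    ("A", "Shall we call Alan Turner?"),
    ("B", "Alan Turner?"),
    ("A", "Yeah."),
    ("B", "I haven't got his number have you got it?"),
    ("A", "No I thought you had your book with you this week."),
]

def next_turn_pairs(turns, max_context=3):
    """Yield (context, next_turn) pairs, keeping at most `max_context` previous turns."""
    for i in range(1, len(turns)):
        context = " ".join(text for _, text in turns[max(0, i - max_context):i])
        yield context, turns[i][1]

for context, target in next_turn_pairs(dialogue):
    print(f"SOURCE: {context}\nTARGET: {target}\n")
```

Each pair is then fed to the same kind of encoder-decoder shown earlier, so the model is optimised purely to imitate the next thing a human said, with no notion of a conversational goal.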

16 / 26 Reinforcement learning http://robohub.org/

Learning to interact with humans in the loop
Training session #2546
Human: Shall we call Alan Turner?
Bot: akfkjkjkjkjf fffff
Human: Sorry, what was that?
Bot: ?taht saw tahw, yrros
Human: [shakes head in despair]
17 / 26

Referential games for conversational agents
[Figure: two agents, A1 and A2, exchanging messages (e.g. "blue", "this!") about an image.]
Lazaridou et al. arxiv 2016 (Towards Multi-Agent Communication-Based Language Learning). 18 / 26

Referential games for conversational agents
[Figure: schematic of the game played between agents A1 and A2.]
19 / 26

Referential games for conversational agents
Datasets
Shapes: 2.4k unique images, 100k image pairs, human vocabulary of 18 words.
ReferIt: 22.5k unique images, 25k image pairs, human vocabulary of 3467 words.
20 / 26
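
A minimal sketch of a referential game in the spirit of Lazaridou et al.: a sender sees two image vectors and which one is the target and emits a single symbol; a receiver sees the symbol plus both images and guesses the target; both agents get reward 1 on success and are trained with REINFORCE. Random vectors stand in for real image features here, and all sizes and architectural details are assumptions for illustration, not the paper's.

```python
# Toy referential game trained with REINFORCE (illustrative, not the paper's setup).
import torch
import torch.nn as nn

IMG_DIM, VOCAB, HID = 16, 10, 32

class Sender(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * IMG_DIM, HID), nn.Tanh(),
                                 nn.Linear(HID, VOCAB))
    def forward(self, target_img, distractor_img):
        logits = self.net(torch.cat([target_img, distractor_img], dim=-1))
        return torch.distributions.Categorical(logits=logits)

class Receiver(nn.Module):
    def __init__(self):
        super().__init__()
        self.symbol_emb = nn.Embedding(VOCAB, IMG_DIM)
    def forward(self, symbol, images):
        # Score each candidate image by its dot product with the symbol embedding.
        scores = torch.einsum("bd,bkd->bk", self.symbol_emb(symbol), images)
        return torch.distributions.Categorical(logits=scores)

sender, receiver = Sender(), Receiver()
opt = torch.optim.Adam(list(sender.parameters()) + list(receiver.parameters()), lr=1e-2)

for step in range(200):
    batch = 32
    images = torch.randn(batch, 2, IMG_DIM)        # two candidate "images"
    target = torch.randint(0, 2, (batch,))         # which one the sender describes
    tgt_img = images[torch.arange(batch), target]
    dst_img = images[torch.arange(batch), 1 - target]

    msg_dist = sender(tgt_img, dst_img)
    symbol = msg_dist.sample()                     # the discrete message
    guess_dist = receiver(symbol, images)
    guess = guess_dist.sample()

    reward = (guess == target).float()             # 1 = communication success
    # REINFORCE: push up log-probabilities of the actions that led to success.
    loss = -((reward - reward.mean()) *
             (msg_dist.log_prob(symbol) + guess_dist.log_prob(guess))).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Communication success (the fraction of rounds where the receiver picks the target) is exactly the quantity plotted on the next slide.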

Communication success
[Plots: two panels, Shapes and ReferIt, showing communication success (y-axis, 0.4 to 1.0) against # iterations (x-axis, 0 to 3500).]
21 / 26

AlphaGo observe play 22 / 26

AlphaConverse?
[Figure labels: observe, talk. Example exchange: "which picture?" / "the one with a green background".]
23 / 26

What is the goal? What is reward?
Goal-directed conversation
G: you've got a diamond mine beside [the start]?
F: uh-huh
G: and you've got a fast flowing river over the other corner
F: uh-huh
G: right, so if you just circle round the diamond mine
F: right
G: and have you got a graveyard in the middle of the fast flowing river and the diamond mine
F: no i don't
http://groups.inf.ed.ac.uk/maptask/index.html 24 / 26

What is the goal? What is reward?
Non-goal-directed conversation
A: I played football for, when I was ten year old! But, I don't go to football now, I watch the television cos I'm seventy three now, but it never any harm!
B: Some people say it's not very good for women to play football.
A: That's a lot of rubbish!
B: Well tell me what's good about football for women?
A: Because it's a good sport!
B: Mhm.
A: There's a, you could do a lot worse things that play football!
http://bncweb.lancs.ac.uk/ 25 / 26

thank you! 26 / 26