Bridging the Gap Between Humans and Machines: Lessons from Spoken Language
Prof. Roger K. Moore

Chair of Spoken Language Processing, Dept. Computer Science, University of Sheffield
(Visiting Prof., Dept. Phonetics, University College London)
(Visiting Prof., Bristol Robotics Lab.)
EU-FP7-EASEL
DIGIHUM-2017, Helsinki, 4th May 2017 (slide 1)

Rich History of Technological Development (slide 2)
- Von Kempelen's talking machine (1791)
- Radio Rex (1922)
- Parametric Artificial Talker (1953)
- Speak 'n' Spell (1983)
- Interactive Talking Doll (1987)

Rich History of Technological Development (slides 3-4)
- Marconi SR128 (1982)
- Dragon NaturallySpeaking (1997)
- Voice dictation on SmartPhone (2007)
- Apple's Siri (2011)
- Speech-to-Speech Translation

Rich History of Technological Development (slide 5)
- Speech-to-Speech Translation
- Amazon Echo (2015)
- Google Home (2016)

Amazing Progress (slide 6)

Amazing Progress (slides 7-8)
- Command and Control Systems
- Dictation Systems
- Interactive Voice Response (IVR) Systems
- Voice-Enabled Personal Assistants
- Embodied Conversational Agents (ECAs)
- Autonomous Social Agents

Are We There Yet? (slides 11-12)
Moore, R. K., Li, H., & Liao, S.-H. (2016). Progress and prospects for spoken language technology: what ordinary people think. INTERSPEECH (pp. 3007-3011). San Francisco, CA.

Are We There Yet? (slide 13)

What's the Problem? (slide 14)
Variable
Ambiguous
I do not know
I dn uh
Meaningful
fork handles / four candles
This nudist play will wreck a nice beach
Emotional!
Contaminated

What's the Problem? (slide 15)
[Graph of Usability against Flexibility, with the labels "Structured Dialog", "Add NL/Dialog", "Habitability Gap" and "Like a Human". Graph courtesy of Mike Phillips (CEO, Mobeus Corporation).]

Masahiro Mori (slide 16)

The State-of-the-Art (slide 18)
:-) :-) :-) :-| :-(
- There is steady year-on-year technical progress
- Recent years have seen significant market penetration and public awareness
- Improvements come from:
  - increase in available computer power
  - corpus-driven modelling (deep learning)
  - public benchmark testing
- Progress has not come about as a result of deep insights into human spoken language
- Spoken language technology is:
  - fragile (in real conditions)
  - expensive (to port to new applications / languages)
  - shallow (it doesn't understand language)

Standard SLP Architecture (slide 20)
Introduction and Overview of W3C Speech Interface Framework
http://www.w3.org/tr/voice-intro/

Standard SLP Architecture (slide 21)
Behaviourist: STIMULUS, RESPONSE
Introduction and Overview of W3C Speech Interface Framework
http://www.w3.org/tr/voice-intro/

What is Language? (slide 22)
Cummins, F. (2011). Periodic and aperiodic synchronization in skilled action. Frontiers in Human Neuroscience, 5(170), 1-9.

What is Language? (slide 23)
Ostensive Inferential Recursive Mind-Reading
Scott-Phillips, T. (2015). Speaking Our Minds: Why human communication is different, and how language evolved to make it special. London, New York: Palgrave MacMillan.

Human-Human Languaging = Ostensive Inferential Recursive Mind-Reading (slide 24)
Moore, R. K. (2016). Introducing a pictographic language for envisioning a rich variety of enactive systems with different degrees of complexity. Int. J. Advanced Robotic Systems, 13(74).

Human-Agent Languaging = Ostensive Inferential Recursive Mind-Reading (slide 25)
Moore, R. K. (2016). Introducing a pictographic language for envisioning a rich variety of enactive systems with different degrees of complexity. Int. J. Advanced Robotic Systems, 13(74).

What is Language Like? (slide 26)
Cummins, F. (2011). Periodic and aperiodic synchronization in skilled action. Frontiers in Human Neuroscience, 5(170), 1-9.

Houston, we (may) have a problem (slide 27)
- Spoken language interaction between human beings is founded on shared experiences, representations and priors
- The assumption of continuity between a fully coded communication system at one end, and language at the other, is simply not justified.
- So, is there a fundamental limit to the language-based interaction that can take place between mismatched partners?
Moore, R. K. (2016). Is spoken language all-or-nothing? Implications for future speech-based human-machine interaction. In K. Jokinen & G. Wilcock (Eds.), Dialogues with Social Robots: Enablements, Analyses, and Evaluation. Springer Lecture Notes in Electrical Engineering (LNEE).

Getting it Right (slide 28)
Wired: Do you think it's possible to bridge the uncanny valley?
Mori: Yes, but why try? I think it's better to design things like Honda's Asimo, which stops right before it gets to be uncanny.

Getting it Right (slide 30)
http://consequentialrobotics.com/miro/

Getting it Right (slide 31)
http://www.dcs.shef.ac.uk/~roger/MarkowitzCh12manuscript.pdf
Moore, R. K. (2015). From talking and listening robots to intelligent communicative machines. In J. Markowitz (Ed.), Robots That Talk and Listen (pp. 317-335). Boston, MA: De Gruyter.

A Glimpse of the Future? (slide 32)

Thank You (slide 33)
Any questions?
http://www.dcs.shef.ac.uk/~roger

VIHAR-2017 (slide 34)
1st International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots
25-26 August 2017, University of Skövde, Sweden
http://vihar-2017.vihar.org