Automatic Speech Recognition (CS753)

Lecture 22: Conversational Agents
Instructor: Preethi Jyothi
Oct 26, 2017
(All images were reproduced from JM, chapters 29, 30)

Chatbots
Rule-based chatbots
  Historical prototype: ELIZA
  Responses generated by detecting patterns and applying rules
Data-driven chatbots
  Two main types:
  1. Information retrieval (IR) based bots
  2. Machine learning (ML) based bots
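To make "detecting patterns and applying rules" concrete, here is a minimal sketch of an ELIZA-style responder. The rules are hypothetical and written for illustration only; they are not taken from the original ELIZA script.

```python
import re

# Hypothetical pattern/response rules, tried in order.
# The last rule is a catch-all so that every utterance gets a reply.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
    (re.compile(r".*"), "Please go on."),
]


def respond(utterance: str) -> str:
    """Return the response of the first rule whose pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Captured groups are substituted into the response template.
            return template.format(*match.groups())
    return "Please go on."
```

For example, `respond("I need a holiday.")` matches the first rule and reuses the captured text in the reply.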

Frame-based Systems

Frame-based dialog systems
Modern task-specific dialog systems are based on a domain ontology which defines one or more frames
Frames: Collection of slots and values

Slot               Type
ORIGIN CITY        city
DESTINATION CITY   city
DEPARTURE TIME     time
DEPARTURE DATE     date
ARRIVAL TIME       time
ARRIVAL DATE       date
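A frame can be sketched as a small data structure that maps slot names to types and to the values collected so far. The class and method names below (`Frame`, `fill`, `missing`) are assumptions made for illustration; only the slot names and types come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Slot names and types as listed in the flight frame on the slide.
FLIGHT_SLOTS: Dict[str, str] = {
    "ORIGIN CITY": "city",
    "DESTINATION CITY": "city",
    "DEPARTURE TIME": "time",
    "DEPARTURE DATE": "date",
    "ARRIVAL TIME": "time",
    "ARRIVAL DATE": "date",
}


@dataclass
class Frame:
    """Hypothetical frame: typed slots plus the fillers gathered so far."""
    slot_types: Dict[str, str]
    values: Dict[str, Optional[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every slot starts out unfilled.
        for slot in self.slot_types:
            self.values.setdefault(slot, None)

    def fill(self, slot: str, value: str) -> None:
        if slot not in self.slot_types:
            raise KeyError(f"unknown slot: {slot}")
        self.values[slot] = value

    def missing(self) -> List[str]:
        """Slots the dialog manager still has to ask about."""
        return [s for s in self.slot_types if self.values[s] is None]
```

A dialog manager could call `missing()` after each turn to decide which question to ask next.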

Finite-state dialog manager
Goal is to fill slots in the frames with fillers obtained from the user
System initiative!

Figure 28.9: A simple finite-state automaton architecture for frame-based dialog. The states are system questions:
1. What city are you leaving from?
2. Where are you going?
3. What date do you want to leave?
4. Is it a one-way trip?
   Yes: Do you want to go from <FROM> to <TO> on <DATE>?
        Yes: Book the flight
        No: return to the first question
   No: What date do you want to return?
       Do you want to go from <FROM> to <TO> on <DATE> returning on <RETURN>?
        Yes: Book the flight
        No: return to the first question
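The finite-state flow can be sketched as a loop that asks the fixed questions in order. This is only an illustration: the function name `run_dialog` and the `ask` callback are hypothetical, and restarting after a "no" at the confirmation step is an assumption about how the automaton handles rejection.

```python
from typing import Callable, Dict


def run_dialog(ask: Callable[[str], str]) -> Dict[str, str]:
    """Walk the finite-state flow; `ask` poses a question and returns the answer."""
    while True:
        # The system keeps the initiative: the questions come in a fixed order.
        slots = {
            "FROM": ask("What city are you leaving from?"),
            "TO": ask("Where are you going?"),
            "DATE": ask("What date do you want to leave?"),
        }
        one_way = ask("Is it a one-way trip?").strip().lower() == "yes"
        if one_way:
            confirm = "Do you want to go from {FROM} to {TO} on {DATE}?"
        else:
            slots["RETURN"] = ask("What date do you want to return?")
            confirm = ("Do you want to go from {FROM} to {TO} on {DATE} "
                       "returning on {RETURN}?")
        if ask(confirm.format(**slots)).strip().lower() == "yes":
            return slots  # corresponds to the "Book the flight" state
        # Assumption: a "no" at confirmation restarts from the first question.
```

Passing `input` as the `ask` argument gives an interactive console version; a scripted callback makes the flow easy to test.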

Natural Language Understanding (NLU)
Extract frame-based information from the user's utterances
Goals of the NLU component:
1. Domain classification
2. Intent determination
3. Slot filling

Example: Show me morning flights from Boston to San Francisco on Tuesday
DOMAIN:       AIR-TRAVEL
INTENT:       SHOW-FLIGHTS
ORIGIN-CITY:  Boston
ORIGIN-DATE:  Tuesday
ORIGIN-TIME:  morning
DEST-CITY:    San Francisco

Rule-based semantic grammars

SHOW → show me | i want | can i see | ...
DEPART TIME RANGE → (after | around | before) HOUR | morning | afternoon | evening
HOUR → one | two | three | four | ... | twelve (AMPM)
FLIGHTS → (a) flight | flights
AMPM → am | pm
ORIGIN → from CITY
DESTINATION → to CITY
CITY → Boston | San Francisco | Denver | Washington

Figure 28.10: A semantic grammar parse for a user sentence, using slot names as the internal parse tree nodes. The root S expands to SHOW FLIGHTS ORIGIN DESTINATION DEPARTDATE DEPARTTIME:
SHOW:        Show me
FLIGHTS:     flights
ORIGIN:      from Boston
DESTINATION: to San Francisco
DEPARTDATE:  on Tuesday
DEPARTTIME:  morning
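A rough way to see a semantic grammar in action is to encode a simplified version of these rules as a regular expression with one named group per slot. This is a sketch, not a real parser: the DEPARTDATE rule (days of the week) is an assumption because its expansion is not listed on the slide, and the DEPARTTIME rule omits the "(after | around | before) HOUR" branch.

```python
import re
from typing import Dict, Optional

CITY = r"(?:Boston|San Francisco|Denver|Washington)"
DAY = r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"  # assumed rule

# One named group per nonterminal in S -> SHOW FLIGHTS ORIGIN DESTINATION DEPARTDATE DEPARTTIME.
GRAMMAR = re.compile(
    r"(?P<SHOW>show me|i want|can i see)\s+"
    r"(?P<FLIGHTS>flights|(?:a )?flight)\s+"
    rf"from (?P<ORIGIN>{CITY})\s+"
    rf"to (?P<DESTINATION>{CITY})"
    rf"(?:\s+on (?P<DEPARTDATE>{DAY}))?"
    r"(?:\s+(?P<DEPARTTIME>morning|afternoon|evening))?",
    re.IGNORECASE,
)


def parse(sentence: str) -> Optional[Dict[str, str]]:
    """Return the matched slots, or None if the sentence is outside the grammar."""
    match = GRAMMAR.fullmatch(sentence.strip())
    if match is None:
        return None
    return {name: value for name, value in match.groupdict().items() if value is not None}
```

Hand-written grammars like this are precise but brittle: any phrasing not covered by a rule yields no parse at all.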

Alternative: ML-based approaches
Use a sequence model (CRF, RNN) to directly assign slot labels to words in the sentence

Word        Tag
I           O
want        O
to          O
fly         O
to          O
San         B-DES
Francisco   I-DES
on          O
Monday      B-DEPTIME
afternoon   I-DEPTIME
please      O

Rule-based systems could be used to bootstrap ML-based systems
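Once a sequence model has produced B/I/O tags, turning them into slot values is a simple decoding step. The sketch below shows only that step (the tagger itself is not implemented); the function name `bio_to_slots` is hypothetical, and it assumes each slot type occurs at most once per sentence.

```python
from typing import Dict, List, Optional


def bio_to_slots(words: List[str], tags: List[str]) -> Dict[str, str]:
    """Group words tagged B-X, I-X, ... into one value per slot X."""
    spans: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            # A B- tag opens a new span (overwriting any earlier span of that slot).
            current = tag[2:]
            spans[current] = [word]
        elif tag.startswith("I-") and current == tag[2:]:
            # An I- tag continues the span that is currently open.
            spans[current].append(word)
        else:
            # An O tag (or an inconsistent I- tag) closes the open span.
            current = None
    return {slot: " ".join(tokens) for slot, tokens in spans.items()}
```

Applied to the tagged sentence from the slide, this yields DES = "San Francisco" and DEPTIME = "Monday afternoon".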

Evaluating dialogue systems
Subjective score: User satisfaction ratings
Objective metrics:
1. Task completion success: Evaluate correctness of the whole solution
2. Efficiency cost: Total elapsed time for the dialog, total number of turns, number of system non-responses, etc.

Dialog State Systems

Dialog-state or belief-state architecture
Fill slots like the frame-based dialog systems. But also:
  Determine what dialog act the user was making
  Generate new dialog acts: ask questions, reject suggestions, acknowledge an utterance, etc.
  Take into account the dialog context
Needs a dialog policy

Dialog-state system
Figure 29.1: Architecture of a dialog-state system for task-oriented dialog, from Williams et al. (2016). The example in the figure passes through these components:
Automatic Speech Recognition (ASR): scored hypotheses
  LEAVING FROM DOWNTOWN 0.6
  LEAVING AT ONE P M 0.2
  ARRIVING AT ONE P M 0.1
Spoken Language Understanding (SLU): scored slot hypotheses
  { from: downtown } 0.5
  { depart-time: 1300 } 0.3
  { arrive-time: 1300 } 0.1
Dialog State Tracker (DST): scored dialog-state hypotheses (scores 0.65, 0.15, 0.10), each with from, to, depart-time and confirmed fields
Dialog Policy: act: confirm, from: downtown
Natural Language Generation (NLG) and Text to Speech (TTS): FROM DOWNTOWN, IS THAT RIGHT?

What is a dialog act?
Speech acts: Each utterance in a dialog is an action performed by the speaker
  E.g.: making orders (issuing directives), stating constraints (issuing assertives), thanking the system (issuing acknowledgements), etc.
Grounding: Ground the speaker's utterances and make it clear that the hearer has understood the speaker's meaning
  System: Did you want to review some more of your personal profile?
  Caller: No.
  Without grounding: System: What's next?
  With grounding: System: Okay, what's next?
Dialog acts: Speech acts + grounding combined in a single action

Dialog acts used by a restaurant recommendation system

Tag | Sys | User | Description
HELLO(a = x, b = y, ...) | ✓ | ✓ | Open a dialog and give info a = x, b = y, ...
INFORM(a = x, b = y, ...) | ✓ | ✓ | Give info a = x, b = y, ...
REQUEST(a, b = x, ...) | ✓ | ✓ | Request value for a given b = x, ...
REQALTS(a = x, ...) | ✗ | ✓ | Request alternative with a = x, ...
CONFIRM(a = x, b = y, ...) | ✓ | ✓ | Explicitly confirm a = x, b = y, ...
CONFREQ(a = x, ..., d) | ✓ | ✗ | Implicitly confirm a = x, ... and request value of d
SELECT(a = x, a = y) | ✓ | ✗ | Select either a = x or a = y
AFFIRM(a = x, b = y, ...) | ✓ | ✓ | Affirm and give further info a = x, b = y, ...
NEGATE(a = x) | ✗ | ✓ | Negate and give corrected value a = x
DENY(a = x) | ✗ | ✓ | Deny that a = x
BYE() | ✓ | ✓ | Close a dialog

Figure 29.4 Dialogue acts used by the HIS restaurant recommendation system of Young

Sample dialogue, each utterance followed by its dialogue act:
U: Hi, I am looking for somewhere to eat.
   hello(task = find, type = restaurant)
S: You are looking for a restaurant. What type of food do you like?
   confreq(type = restaurant, food)
U: I'd like an Italian somewhere near the museum.
   inform(food = Italian, near = museum)
S: Roma is a nice Italian restaurant near the museum.
   inform(name = "Roma", type = restaurant, food = Italian, near = museum)
U: Is it reasonably priced?
   confirm(pricerange = moderate)
S: Yes, Roma is in the moderate price range.
   affirm(name = "Roma", pricerange = moderate)
U: What is the phone number?
   request(phone)
S: The number of Roma is 385456.
   inform(name = "Roma", phone = "385456")
U: Ok, thank you goodbye.
   bye()

How do we interpret dialog acts?
Given an utterance, decide whether it is a question, a statement, an acknowledgement, etc.
Maybe just look at the form of the utterance and decide?
  YES-NO QUESTION: Will breakfast be served on USAir 1557?
  STATEMENT: I don't care about lunch.
  COMMAND: Show me flights from Milwaukee to Orlando.
This often fails: "Can you give me a list of the flights from Atlanta to Boston?" has the form of a yes-no question but is actually a request.

Example dialog labeled with dialog acts:
A (OPEN-OPTION): I was wanting to make some arrangements for a trip that I'm going to be taking uh to LA uh beginning of the week after next.
B (HOLD): OK uh let me pull up your profile and I'll be right with you here. [pause]
B (CHECK): And you said you wanted to travel next week?
A (ACCEPT): Uh yes.

Dialog act detection + slot-filling
First pass: a classifier that determines the dialog act of the sentence
Second pass: a sequence classification algorithm that performs slot-filling
Both can rely on a wide variety of features, including unigrams and bigrams, parse features, punctuation, dialog context, prosodic features, etc.

Dialog Policy
What dialog act should the system generate next?
Example: Confirmation/Rejection

Explicit confirmation
  S: Which city do you want to leave from?
  U: Baltimore.
  S: Do you want to leave from Baltimore?
  U: Yes.

  U: I'd like to fly from Denver Colorado to New York City on September

Implicit confirmation
  U: I want to travel to Berlin
  S: When do you want to travel to Berlin?

  U2: Hi I'd like to fly to Seattle Tuesday Morning

Rejection
  System: When would you like to leave?
  Caller: Well, um, I need to be in New York in time for the first World Series game.
  System: <reject>. Sorry, I didn't get that. Please say the month and day you'd like to leave.
  Caller: I wanna go on October fifteenth.

Dialog Policy
At turn i, predict which action A_i to take based on the entire sequence of dialog acts from the system (A) and user (U):

$$\hat{A}_i = \operatorname*{argmax}_{A_i \in A} P(A_i \mid A_1, U_1, \ldots, A_{i-1}, U_{i-1})$$

The entire dialog state could be simplified by conditioning only on the current state of the frame and the last turn by A and U:

$$\hat{A}_i = \operatorname*{argmax}_{A_i \in A} P(A_i \mid \mathrm{Frame}_{i-1}, A_{i-1}, U_{i-1})$$

Probabilities can be estimated using a classifier
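As a rough illustration of the simplified policy, the sketch below estimates P(A_i | Frame_{i-1}, A_{i-1}, U_{i-1}) by relative frequency over a corpus and returns the argmax. This count-based estimate stands in for the classifier mentioned above; the function names, the representation of the frame state as a set of filled slots, and the tiny corpus in the usage note are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple

# Context = (filled slots in the frame, previous system act, previous user act)
Context = Tuple[FrozenSet[str], str, str]


def train(corpus: List[Tuple[Context, str]]) -> Dict[Context, Counter]:
    """Count how often each system action follows each context."""
    counts: Dict[Context, Counter] = defaultdict(Counter)
    for context, action in corpus:
        counts[context][action] += 1
    return counts


def best_action(counts: Dict[Context, Counter],
                context: Context) -> Optional[Tuple[str, float]]:
    """Return (argmax action, estimated probability), or None for unseen contexts."""
    seen = counts.get(context)
    if not seen:
        return None
    total = sum(seen.values())
    action, n = seen.most_common(1)[0]
    return action, n / total
```

With a toy corpus in which the context (from filled, request(from), inform(from)) is followed twice by confreq(to) and once by confirm(from), `best_action` returns confreq(to) with probability 2/3. A trained classifier would generalize to unseen contexts, which this lookup table cannot do.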

Dialog Policy
The approach above only looks at the past of the dialog and ignores whether the action is likely to lead to a successful interaction
MDP (Markov decision process): The agent has complete knowledge of the environment and its own current state, but the effects of its actions are non-deterministic.
  S: finite set of states
  A: finite set of actions
  R : S × A → ℝ is the reward function
  T : S × A × S → [0, 1] is the state transition function
Compute a policy π that specifies which action a the agent should take in a given state s so as to receive the highest reward

Bellman equation
The expected cumulative reward Q(s, a) for taking action a from state s:

$$Q(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \max_{a'} Q(s', a')$$

where γ is the discount factor.
Value iteration algorithm used to solve the Bellman equation
POMDPs (Partially observable MDPs): The agent only has probabilistic information about its current state.
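The Bellman equation can be solved by repeatedly applying it as an update until the Q values stop changing. The sketch below does this for a finite MDP given as dictionaries; the function names and the two-state example in the usage note are hypothetical, and the stopping tolerance and discount factor are arbitrary choices.

```python
from typing import Dict, List, Tuple

StateAction = Tuple[str, str]


def q_value_iteration(states: List[str], actions: List[str],
                      R: Dict[StateAction, float],
                      P: Dict[StateAction, Dict[str, float]],
                      gamma: float = 0.9, tol: float = 1e-10,
                      max_iter: int = 10_000) -> Dict[StateAction, float]:
    """Iterate Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * max_a' Q(s',a')."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(max_iter):
        new_Q = {}
        for s in states:
            for a in actions:
                # Expected value of acting greedily from the successor state.
                future = sum(p * max(Q[(s2, a2)] for a2 in actions)
                             for s2, p in P[(s, a)].items())
                new_Q[(s, a)] = R[(s, a)] + gamma * future
        delta = max(abs(new_Q[k] - Q[k]) for k in Q)
        Q = new_Q
        if delta < tol:
            break
    return Q


def greedy_policy(states: List[str], actions: List[str],
                  Q: Dict[StateAction, float]) -> Dict[str, str]:
    """The policy pi picks, in each state, the action with the highest Q value."""
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
```

As a toy check, take states "open" and "done", where "ask" in "open" earns reward 1 and reaches "done" with probability 0.5, "wait" earns nothing and stays in "open", and "done" is absorbing with zero reward. With γ = 0.9 the fixed point satisfies Q(open, ask) = 1 + 0.45 · Q(open, ask), so Q(open, ask) = 1/0.55, and the greedy policy chooses "ask" in "open".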

Designing dialog agents

Example: Find out whether users prefer interacting with a bot that produces speech in an accent closer to their own
Two comparable conversational agents: One bot with an Indian accent, the other with an American accent
Wizard of Oz study: human-controlled bots interacting with participants

The two agents:
  Recommend a restaurant / recommend a movie
  Comparable lengths, prompts, interestingness
  Canned responses in Indian and non-Indian accents

Restaurant bot prompts:
  Hi, I'm Rajeev. I hear you want to go out for a meal
  Would you prefer vegetarian or non-vegetarian food?
  I mean, do you like live music or would you prefer a quiet place? A small cafe or a grand restaurant?

Movie bot prompts:
  Hi, I'm Rohit. I hear you want to watch a movie...
  Which language do you want to watch it in?
  That's very interesting
  Now tell me, what kind of movies do you dislike?

How about end-to-end dialog systems?
Serban et al., "Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models", AAAI 2016