Automatic Speech Recognition (CS753)

Lecture 22: Conversational Agents
Instructor: Preethi Jyothi
Oct 26, 2017
(All images were reproduced from JM, chapters 29, 30)

Chatbots
Rule-based chatbots
  Historical prototype: ELIZA
  Responses generated by detecting patterns and applying rules
Data-driven chatbots
  Two main types:
  1. Information retrieval (IR) based bots
  2. Machine learning (ML) based bots

Frame-based Systems

Frame-based dialog systems
Modern task-specific dialog systems are based on a domain ontology, which defines one or more frames.
Frames: collections of slots and values
  Slot              Type
  ORIGIN CITY       city
  DESTINATION CITY  city
  DEPARTURE TIME    time
  DEPARTURE DATE    date
  ARRIVAL TIME      time
  ARRIVAL DATE      date

Finite-state dialog manager
Goal: fill the slots in the frames with fillers obtained from the user.
This is a system-initiative design: the system asks every question.
  1. What city are you leaving from?
  2. Where are you going?
  3. What date do you want to leave?
  4. Is it a one-way trip?
     Yes: Do you want to go from <FROM> to <TO> on <DATE>?
     No:  What date do you want to return?
          Do you want to go from <FROM> to <TO> on <DATE> returning on <RETURN>?
  5. If the user confirms (Yes): book the flight. Otherwise (No): return to the first question.
Figure 28.9 A simple finite-state automaton architecture for frame-based dialog.
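The automaton in Figure 28.9 can be sketched in a few lines of Python. This is only an illustrative sketch: the state names, the `run_dialog` and `scripted` helpers, and the rule that anything other than "yes"/"y" counts as "no" are assumptions made here, not part of the lecture.

```python
from typing import Callable, Dict, Iterable

# Prompts follow Figure 28.9; the state names are hypothetical.
PROMPTS = {
    "ask_from": "What city are you leaving from?",
    "ask_to": "Where are you going?",
    "ask_date": "What date do you want to leave?",
    "ask_one_way": "Is it a one-way trip?",
    "ask_return": "What date do you want to return?",
}


def is_yes(reply: str) -> bool:
    """Assumed convention: only 'yes' or 'y' counts as a yes."""
    return reply.strip().lower() in {"yes", "y"}


def run_dialog(answer: Callable[[str], str]) -> Dict[str, str]:
    """Walk the automaton; `answer` maps a system prompt to the user's reply."""
    frame: Dict[str, str] = {}
    state = "ask_from"
    while state != "book":
        if state == "ask_from":
            frame["FROM"] = answer(PROMPTS[state])
            state = "ask_to"
        elif state == "ask_to":
            frame["TO"] = answer(PROMPTS[state])
            state = "ask_date"
        elif state == "ask_date":
            frame["DATE"] = answer(PROMPTS[state])
            state = "ask_one_way"
        elif state == "ask_one_way":
            state = "confirm" if is_yes(answer(PROMPTS[state])) else "ask_return"
        elif state == "ask_return":
            frame["RETURN"] = answer(PROMPTS[state])
            state = "confirm"
        else:  # state == "confirm"
            question = "Do you want to go from {FROM} to {TO} on {DATE}".format(**frame)
            if "RETURN" in frame:
                question += " returning on {RETURN}".format(**frame)
            if is_yes(answer(question + "?")):
                state = "book"
            else:
                # Rejected confirmation: discard the frame and start over.
                frame.clear()
                state = "ask_from"
    return frame


def scripted(replies: Iterable[str]) -> Callable[[str], str]:
    """Stand-in for a real user: returns the next canned reply for any prompt."""
    it = iter(replies)
    return lambda prompt: next(it)


print(run_dialog(scripted(["Boston", "Denver", "Tuesday", "no", "Friday", "yes"])))
```

Because the system controls every transition, the user can never volunteer a slot out of order, which is the main limitation of a purely system-initiative design.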

Natural Language Understanding (NLU)
Extract frame-based information from the user's utterances.
Goals of the NLU component:
  1. Domain classification
  2. Intent determination
  3. Slot filling
Example: "Show me morning flights from Boston to San Francisco on Tuesday"
  DOMAIN:       AIR-TRAVEL
  INTENT:       SHOW-FLIGHTS
  ORIGIN-CITY:  Boston
  ORIGIN-DATE:  Tuesday
  ORIGIN-TIME:  morning
  DEST-CITY:    San Francisco

Rule-based semantic grammars
  SHOW              → show me | i want | can i see | ...
  DEPART_TIME_RANGE → (after | around | before) HOUR | morning | afternoon | evening
  HOUR              → one | two | three | four | ... | twelve (AMPM)
  FLIGHTS           → (a) flight | flights
  AMPM              → am | pm
  ORIGIN            → from CITY
  DESTINATION       → to CITY
  CITY              → Boston | San Francisco | Denver | Washington
Parse of "Show me flights from Boston to San Francisco on Tuesday morning":
  S
    SHOW         Show me
    FLIGHTS      flights
    ORIGIN       from Boston
    DESTINATION  to San Francisco
    DEPARTDATE   on Tuesday
    DEPARTTIME   morning
Figure 28.10 A semantic grammar parse for a user sentence, using slot names as the internal parse tree nodes.
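A rough way to try out such a grammar is to compile it into a single regular expression with one named group per slot. The sketch below is a regex approximation rather than a real context-free parser, and it makes several assumptions not in the slide: the day-of-week list for DEPARTDATE, the fixed slot order, the lowercasing of the input, and the function name `parse`.

```python
import re
from typing import Dict, Optional

CITY = r"boston|san francisco|denver|washington"
HOUR = r"one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve"
# Assumption: the slide does not define DEPARTDATE, so days of the week are used.
DAY = r"monday|tuesday|wednesday|thursday|friday|saturday|sunday"

PATTERN = re.compile(
    rf"(?P<SHOW>show me|i want|can i see)\s+"
    rf"(?P<FLIGHTS>flights|(?:a\s+)?flight)"
    rf"(?:\s+from\s+(?P<ORIGIN>{CITY}))?"
    rf"(?:\s+to\s+(?P<DESTINATION>{CITY}))?"
    rf"(?:\s+on\s+(?P<DEPARTDATE>{DAY}))?"
    rf"(?:\s+(?P<DEPARTTIME>(?:after|around|before)\s+(?:{HOUR})(?:\s+(?:am|pm))?"
    rf"|morning|afternoon|evening))?"
    rf"\s*$"
)


def parse(utterance: str) -> Optional[Dict[str, str]]:
    """Return the filled slots, or None if the utterance is outside the grammar."""
    match = PATTERN.match(utterance.strip().lower())
    if match is None:
        return None
    return {
        slot: value
        for slot, value in match.groupdict().items()
        if value is not None and slot not in ("SHOW", "FLIGHTS")
    }


print(parse("Show me flights from Boston to San Francisco on Tuesday morning"))
```

Hand-written patterns like this are precise but brittle: any phrasing the grammar author did not anticipate returns `None`.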

Alternative: ML-based approaches
Use a sequence model (CRF, RNN) to directly assign slot labels to the words in the sentence:
  I  want  to  fly  to  San    Francisco  on  Monday     afternoon  please
  O  O     O   O    O   B-DES  I-DES      O   B-DEPTIME  I-DEPTIME  O
Rule-based systems could be used to bootstrap ML-based systems.
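Once a sequence model has produced one B/I/O label per word, the slot values are recovered by grouping each B- tag with the I- tags that follow it. The sketch below shows only that decoding step; the tags are hard-coded from the example above rather than predicted by a CRF or RNN, and the function name `bio_to_slots` is hypothetical.

```python
from typing import Dict, List


def bio_to_slots(words: List[str], tags: List[str]) -> Dict[str, str]:
    """Group B-/I- tagged words into slot values (a repeated slot overwrites the earlier one)."""
    spans: Dict[str, List[str]] = {}
    current = None  # name of the slot currently being extended, if any
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            spans[current] = [word]
        elif tag.startswith("I-") and current == tag[2:]:
            spans[current].append(word)
        else:
            # "O", or an I- tag that does not continue the open slot.
            current = None
    return {slot: " ".join(tokens) for slot, tokens in spans.items()}


words = "I want to fly to San Francisco on Monday afternoon please".split()
tags = ["O", "O", "O", "O", "O", "B-DES", "I-DES", "O", "B-DEPTIME", "I-DEPTIME", "O"]
print(bio_to_slots(words, tags))
```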

Evaluating dialogue systems
Subjective score: user satisfaction ratings
Objective metrics:
  1. Task completion success: evaluate the correctness of the whole solution
  2. Efficiency cost: total elapsed time for the dialog, total number of turns, number of system non-responses, etc.

Dialog State Systems

Dialog-state or belief-state architecture
Fills slots like the frame-based dialog systems, but also:
  Determines what dialog act the user was making
  Generates new dialog acts: ask questions, reject suggestions, acknowledge an utterance, etc.
  Takes the dialog context into account
  Needs a dialog policy

Dialog-state system
Pipeline components, with the example values shown in the figure:
  Automatic Speech Recognition (ASR): scored hypotheses
    LEAVING FROM DOWNTOWN  0.6
    LEAVING AT ONE P M     0.2
    ARRIVING AT ONE P M    0.1
  Spoken Language Understanding (SLU): scored slot hypotheses
    { from: downtown }     0.5
    { depart-time: 1300 }  0.3
    { arrive-time: 1300 }  0.1
  Dialog State Tracker (DST): scored dialog states
    from: downtown, to: airport, depart-time: --, confirmed: no, score: 0.65
    two lower-ranked states (from: CMU, to: airport, depart-time: 1300, confirmed: no) with scores 0.15 and 0.10
  Dialog Policy: act: confirm, from: downtown
  Natural Language Generation (NLG): FROM DOWNTOWN, IS THAT RIGHT?
  Text to Speech (TTS)
Figure 29.1 Architecture of a dialog-state system for task-oriented dialog from Williams et al. (2016).

What is a dialog act?
Speech acts: each utterance in a dialog is an action performed by the speaker.
  E.g.: making orders (issuing directives), stating constraints (issuing assertives), thanking the system (issuing acknowledgements), etc.
Grounding: ground the speaker's utterances and make it clear that the hearer has understood the speaker's meaning.
  System: Did you want to review some more of your personal profile?
  Caller: No.
  System: What's next?          (no grounding)
  System: Okay, what's next?    (grounded: "Okay" acknowledges the caller's answer)
Dialog acts: speech acts + grounding combined in a single action.

Dialog acts used by a restaurant recommendation system
  Tag                         Sys  User  Description
  HELLO(a = x, b = y, ...)    yes  yes   Open a dialog and give info a = x, b = y, ...
  INFORM(a = x, b = y, ...)   yes  yes   Give info a = x, b = y, ...
  REQUEST(a, b = x, ...)      yes  yes   Request value for a given b = x, ...
  REQALTS(a = x, ...)         no   yes   Request alternative with a = x, ...
  CONFIRM(a = x, b = y, ...)  yes  yes   Explicitly confirm a = x, b = y, ...
  CONFREQ(a = x, ..., d)      yes  no    Implicitly confirm a = x, ... and request value of d
  SELECT(a = x, a = y)        yes  no    Select either a = x or a = y
  AFFIRM(a = x, b = y, ...)   yes  yes   Affirm and give further info a = x, b = y, ...
  NEGATE(a = x)               no   yes   Negate and give corrected value a = x
  DENY(a = x)                 no   yes   Deny that a = x
  BYE()                       yes  yes   Close a dialog
Figure 29.4 Dialogue acts used by the HIS restaurant recommendation system of Young
Utterance, followed by its dialogue act:
  U: Hi, I am looking for somewhere to eat.
       hello(task = find, type = restaurant)
  S: You are looking for a restaurant. What type of food do you like?
       confreq(type = restaurant, food)
  U: I'd like an Italian somewhere near the museum.
       inform(food = Italian, near = museum)
  S: Roma is a nice Italian restaurant near the museum.
       inform(name = "Roma", type = restaurant, food = Italian, near = museum)
  U: Is it reasonably priced?
       confirm(pricerange = moderate)
  S: Yes, Roma is in the moderate price range.
       affirm(name = "Roma", pricerange = moderate)
  U: What is the phone number?
       request(phone)
  S: The number of Roma is 385456.
       inform(name = "Roma", phone = "385456")
  U: Ok, thank you goodbye.
       bye()

How do we interpret dialog acts?
Given an utterance, decide whether it is a question, a statement, an acknowledgement, etc.
Maybe just look at the form of the utterance and decide?
  YES-NO QUESTION  Will breakfast be served on USAir 1557?
  STATEMENT        I don't care about lunch.
  COMMAND          Show me flights from Milwaukee to Orlando.
This often does not work: "Can you give me a list of the flights from Atlanta to Boston?" has the form of a yes-no question but is actually a request.
Example dialog labeled with dialog acts:
  A  OPEN-OPTION  I was wanting to make some arrangements for a trip that I'm going to be taking uh to LA uh beginning of the week after next.
  B  HOLD         OK uh let me pull up your profile and I'll be right with you here. [pause]
  B  CHECK        And you said you wanted to travel next week?
  A  ACCEPT       Uh yes.

Dialog act detection + slot-filling
First pass: a classifier determines the dialog act of the sentence.
Second pass: a sequence classification algorithm performs slot-filling.
Both can rely on a wide variety of features, including unigrams and bigrams, parse features, punctuation, dialog context, prosodic features, etc.

Dialog Policy
What dialog act should the system generate next?
Example: confirmation/rejection
Explicit confirmation:
  S: Which city do you want to leave from?
  U: Baltimore.
  S: Do you want to leave from Baltimore?
  U: Yes.
  U: I'd like to fly from Denver Colorado to New York City on September
Implicit confirmation:
  U: I want to travel to Berlin
  S: When do you want to travel to Berlin?
  U2: Hi I'd like to fly to Seattle Tuesday Morning
Rejection:
  System: When would you like to leave?
  Caller: Well, um, I need to be in New York in time for the first World Series game.
  System: <reject>. Sorry, I didn't get that. Please say the month and day you'd like to leave.
  Caller: I wanna go on October fifteenth.

Dialog Policy
At turn i, predict which action A_i to take based on the entire sequence of dialog acts from the system (A) and the user (U):
  $$ \hat{A}_i = \operatorname*{argmax}_{A_i \in A} P(A_i \mid A_1, U_1, \ldots, A_{i-1}, U_{i-1}) $$
Conditioning on the entire dialog history can be simplified by conditioning only on the current state of the frame and the last turn by A and U:
  $$ \hat{A}_i = \operatorname*{argmax}_{A_i \in A} P(A_i \mid \text{Frame}_{i-1}, A_{i-1}, U_{i-1}) $$
Probabilities can be estimated using a classifier.
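As a minimal illustration of the simplified policy, the conditional probability can be estimated by relative frequency over (context, action) pairs and the most probable action returned. This count-based estimate is a stand-in chosen here for brevity, not the classifier the slide refers to; the toy training pairs, the context encoding (filled slots, last system act, last user act) and the function names are all invented for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Optional, Tuple

Context = Tuple[Hashable, str, str]  # (Frame_{i-1}, A_{i-1}, U_{i-1})


def train_policy(examples: Iterable[Tuple[Context, str]]) -> Dict[Context, Counter]:
    """Count how often each system action follows each context."""
    counts: Dict[Context, Counter] = defaultdict(Counter)
    for context, action in examples:
        counts[context][action] += 1
    return counts


def next_action(counts: Dict[Context, Counter], context: Context) -> Optional[Tuple[str, float]]:
    """Return (argmax action, estimated probability), or None for an unseen context."""
    dist = counts.get(context)
    if not dist:
        return None
    action, n = dist.most_common(1)[0]
    return action, n / sum(dist.values())


# Invented toy data: the frame is summarised by the tuple of slots already filled.
examples = [
    ((("from",), "request(to)", "inform(to)"), "request(date)"),
    ((("from",), "request(to)", "inform(to)"), "request(date)"),
    ((("from",), "request(to)", "inform(to)"), "confirm(to)"),
    (((), "hello()", "inform(from)"), "request(to)"),
]
policy = train_policy(examples)
print(next_action(policy, (("from",), "request(to)", "inform(to)")))
```

A real system would replace the lookup table with a classifier over features of the context, so that it can generalise to contexts never seen in the corpus.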

Dialog Policy
The suggested approach looks only at the past of the dialog and ignores whether the action is likely to lead to a successful interaction.
MDP (Markov decision process): the agent has complete knowledge of the environment and its own current state, but the effects of its actions are non-deterministic.
  S: finite set of states
  A: finite set of actions
  R : S × A → ℝ is the reward function
  T : S × A × S → [0, 1] is the state transition function
Compute a policy π that specifies which action a the agent should take in a given state s so as to receive the highest reward.

Bellman equation
The expected cumulative reward Q(s, a) for taking action a from state s:
  $$ Q(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \max_{a'} Q(s', a') $$
Here γ is the discount factor and P(s' | s, a) = T(s, a, s') is the state transition probability.
The value iteration algorithm is used to solve the Bellman equation.
POMDPs (partially observable MDPs): the agent only has probabilistic information about its current state.
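Value iteration applies the Bellman equation repeatedly as an update rule until the Q-values stop changing. The sketch below runs it on a made-up three-state booking MDP; the states, rewards, transition probabilities, discount factor 0.9 and stopping tolerance are all invented for illustration, and a POMDP would need a different algorithm.

```python
from typing import Dict, List, Tuple

QTable = Dict[Tuple[str, str], float]


def q_value_iteration(states: List[str], actions: List[str],
                      R: QTable, P: Dict[Tuple[str, str], Dict[str, float]],
                      gamma: float = 0.9, tol: float = 1e-8,
                      max_iters: int = 10_000) -> QTable:
    """Iterate Q(s,a) <- R(s,a) + gamma * sum_s' P(s'|s,a) * max_a' Q(s',a')."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(max_iters):
        new_Q = {}
        delta = 0.0
        for s in states:
            for a in actions:
                expected_next = sum(
                    p * max(Q[(s2, a2)] for a2 in actions)
                    for s2, p in P[(s, a)].items()
                )
                new_Q[(s, a)] = R[(s, a)] + gamma * expected_next
                delta = max(delta, abs(new_Q[(s, a)] - Q[(s, a)]))
        Q = new_Q
        if delta < tol:  # largest change in this sweep is negligible
            break
    return Q


def greedy_policy(Q: QTable, states: List[str], actions: List[str]) -> Dict[str, str]:
    """pi(s) = argmax_a Q(s, a)."""
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}


# Invented toy MDP: asking costs 1 and may fail to fill the slot;
# booking pays +10 with a filled frame and -10 with an empty one.
states = ["empty", "filled", "done"]
actions = ["ask", "book"]
R = {("empty", "ask"): -1.0, ("empty", "book"): -10.0,
     ("filled", "ask"): -1.0, ("filled", "book"): 10.0,
     ("done", "ask"): 0.0, ("done", "book"): 0.0}
P = {("empty", "ask"): {"filled": 0.8, "empty": 0.2},
     ("empty", "book"): {"done": 1.0},
     ("filled", "ask"): {"filled": 1.0},
     ("filled", "book"): {"done": 1.0},
     ("done", "ask"): {"done": 1.0},
     ("done", "book"): {"done": 1.0}}

Q = q_value_iteration(states, actions, R, P)
print(greedy_policy(Q, states, actions))
```

In this toy problem the greedy policy asks while the frame is empty and books once it is filled, which is the behaviour the reward function was designed to favour.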

Designing dialog agents

Designing dialog agents
Example: find out whether users prefer interacting with a bot that produces speech in an accent closer to their own.
Two comparable conversational agents: one recommends a restaurant, the other recommends a movie.
  Comparable lengths, prompts, interestingness
  Canned responses in Indian and non-Indian accents
Restaurant bot:
  Hi, I'm Rajeev. I hear you want to go out for a meal
  Would you prefer vegetarian or non-vegetarian food?
  I mean, do you like live music or would you prefer a quiet place? A small cafe or a grand restaurant?
Movie bot:
  Hi, I'm Rohit. I hear you want to watch a movie...
  Which language do you want to watch it in?
  That's very interesting. Now tell me, what kind of movies do you dislike?

How about end-to-end dialog systems?
Serban et al., "Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models", AAAI 2016