Detecting Attempts at Humor in Multiparty Meetings

Similar documents
Analysis of the Occurrence of Laughter in Meetings

Automatic Laughter Detection

Improving Frame Based Automatic Laughter Detection

Automatic Laughter Segmentation. Mary Tai Knox

Automatic Laughter Detection

Narrative Theme Navigation for Sitcoms Supported by Fan-generated Scripts

Automatic discrimination between laughter and speech

First Step Towards Enhancing Word Embeddings with Pitch Accents for DNN-based Slot Filling on Recognized Text

A Phonetic Analysis of Natural Laughter, for Use in Automatic Laughter Processing Systems

Narrative Theme Navigation for Sitcoms Supported by Fan-generated Scripts

Browsing News and Talk Video on a Consumer Electronics Platform Using Face Detection

Phone-based Plosive Detection

Formalizing Irony with Doxastic Logic

Smile and Laughter in Human-Machine Interaction: a study of engagement

Story Tracking in Video News Broadcasts. Ph.D. Dissertation Jedrzej Miadowicz June 4, 2004

Analysis and modeling of the role of laughter in Motivational Interviewing based psychotherapy conversations

Transcription of the Singing Melody in Polyphonic Music

Humor: Prosody Analysis and Automatic Recognition for F * R * I * E * N * D * S *

Comparison Parameters and Speaker Similarity Coincidence Criteria:

Automatic Labelling of tabla signals

... A Pseudo-Statistical Approach to Commercial Boundary Detection. Prasanna V Rangarajan Dept of Electrical Engineering Columbia University

Fusion for Audio-Visual Laughter Detection

On Human Capability and Acoustic Cues for Discriminating Singing and Speaking Voices

Laughter and Topic Transition in Multiparty Conversation

A System for Automatic Chord Transcription from Audio Using Genre-Specific Hidden Markov Models

Laughbot: Detecting Humor in Spoken Language with Language and Audio Cues

AUTOMATIC RECOGNITION OF LAUGHTER

Computational Modelling of Harmony

LAUGHTER serves as an expressive social signal in human

Research Article Drum Sound Detection in Polyphonic Music with Hidden Markov Models

Laughter and Smile Processing for Human-Computer Interactions

An Impact Analysis of Features in a Classification Approach to Irony Detection in Product Reviews

Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC

10GBASE-R Test Patterns

Laughbot: Detecting Humor in Spoken Language with Language and Audio Cues

Experiments with Fisher Data

DAY 1. Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval

Music Emotion Recognition. Jaesung Lee. Chung-Ang University

Frame Processing Time Deviations in Video Processors

Available online at ScienceDirect. Procedia Computer Science 46 (2015)

AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC

Towards Using Hybrid Word and Fragment Units for Vocabulary Independent LVCSR Systems

Pitch-Gesture Modeling Using Subband Autocorrelation Change Detection

Computer Coordination With Popular Music: A New Research Agenda 1

A repetition-based framework for lyric alignment in popular songs

A probabilistic framework for audio-based tonal key and chord recognition

A Framework for Segmentation of Interview Videos

Outline. Why do we classify? Audio Classification

SIDRA INTERSECTION 8.0 UPDATE HISTORY

A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification

Design Project: Designing a Viterbi Decoder (PART I)

Sentiment Analysis. Andrea Esuli

PREDICTING HUMOR RESPONSE IN DIALOGUES FROM TV SITCOMS. Dario Bertero, Pascale Fung

Introduction to Sentiment Analysis. Text Analytics - Andrea Esuli

Music Recommendation from Song Sets

Reducing False Positives in Video Shot Detection

A Novel Bus Encoding Technique for Low Power VLSI

Quick Reference Manual

Retrieval of textual song lyrics from sung inputs

Personalized TV Recommendation with Mixture Probabilistic Matrix Factorization

Laughter Valence Prediction in Motivational Interviewing based on Lexical and Acoustic Cues

Deep Learning of Audio and Language Features for Humor Prediction

MODELS of music begin with a representation of the

MC9211 Computer Organization

AMERICAN NATIONAL STANDARD

Seminar CHIST-ERA Istanbul : 4 March 2014 Kick-off meeting : 27 January 2014 (call IUI 2012)

CS 1674: Intro to Computer Vision. Face Detection. Prof. Adriana Kovashka University of Pittsburgh November 7, 2016

Time Series Models for Semantic Music Annotation Emanuele Coviello, Antoni B. Chan, and Gert Lanckriet

MUSIC TONALITY FEATURES FOR SPEECH/MUSIC DISCRIMINATION. Gregory Sell and Pascal Clark

Jazz Melody Generation and Recognition

DETECTION OF SLOW-MOTION REPLAY SEGMENTS IN SPORTS VIDEO FOR HIGHLIGHTS GENERATION

Detection and demodulation of non-cooperative burst signal Feng Yue 1, Wu Guangzhi 1, Tao Min 1

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University

IMPROVED MELODIC SEQUENCE MATCHING FOR QUERY BASED SEARCHING IN INDIAN CLASSICAL MUSIC

Searching for Similar Phrases in Music Audio

UNIT IV. Sequential circuit

Audio Feature Extraction for Corpus Analysis

(12) United States Patent (10) Patent No.: US 6,628,712 B1

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

Sequencing and Control

Efficient Computer-Aided Pitch Track and Note Estimation for Scientific Applications. Matthias Mauch Chris Cannam György Fazekas

A fragment-decoding plus missing-data imputation ASR system evaluated on the 2nd CHiME Challenge

A Low-Power 0.7-V H p Video Decoder

A STATISTICAL VIEW ON THE EXPRESSIVE TIMING OF PIANO ROLLED CHORDS

MUSI-6201 Computational Music Analysis

Bi-Modal Music Emotion Recognition: Novel Lyrical Features and Dataset

GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)

Classification of Timbre Similarity

Dual frame motion compensation for a rate switching network

Acoustic Prosodic Features In Sarcastic Utterances

Video Transmission. Thomas Wiegand: Digital Image Communication Video Transmission 1. Transmission of Hybrid Coded Video. Channel Encoder.

Section 001. Read this before starting!

CS 591 S1 Computational Audio

Sarcasm Detection in Text: Design Document

A Comparative Study of Spectral Transformation Techniques for Singing Voice Synthesis

Analogue Versus Digital [5 M]

Perceptual dimensions of short audio clips and corresponding timbre features

Signal Persistence Checking of Asynchronous System Implementation using SPIN

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING 1. Note Segmentation and Quantization for Music Information Retrieval

Transcription:

Detecting Attempts at Humor in Multiparty Meetings
Kornel Laskowski
Carnegie Mellon University, Pittsburgh PA, USA
ICSC 2009, Berkeley CA, USA, 14 September 2009

Why bother with humor?
- generally, systems assume uniform truth across utterances; humans do not make that assumption
- a speaker may be unconcerned with how their utterance is interpreted
- but a speaker may covertly perform extra work to pass off as true/serious that which is not: the speaker is not helping us detect their effort (e.g. lying)
- or a speaker may overtly perform extra work to pass off as untrue/unserious that which might be taken at face value: the speaker is helping us detect their effort (e.g. joking)
- we need to detect grades of truth, at least when speakers are collaborative

Why bother with humor (part II)?
- humor plays a socially cohesive role: it creates a vehicle for expressing, maintaining, constructing, and dissolving interpersonal relationships
- systems must detect it, or miss important cues underlying variability across participants in conversation

Why bother with humor (part III)?
- humor does not occur uniformly in time: its occurrence is co-located with segment boundaries
- its detection may be helpful to segmentation of conversation at the turn level, topic level, and meta-conversation level
- systems must detect it, or miss important cues underlying variability across time in conversation

Outline of this Talk
1 Introduction
2 Humor in our Data
3 HMM Decoder Framework: baseline (oracle) lexical features
4 Modeling Conversational Context: speech activity/interaction features; laughter activity/interaction features
5 Analysis
6 Conclusions & Recommendations

Potential Impact of Modeling Laughter
- must determine whether the current speaker is intending to amuse; that task may be too hard for a computer
- instead, let humans do the work:
- offline: wait to see if others laugh; even if the attempt to amuse fails, others may laugh to show that they understand the utterance is not meant seriously
- online: wait to see if the speaker laughs, to show that the utterance is not meant seriously
[Diagram, built up over three slides: SPKR A produces a JOKE at time t and then laughs; SPKR B and SPKR C laugh at times t+1 and t+2.]

Computational Context and Prior Work
[Diagram: a stack of tasks built on AUDIO via SPEECH RECOGNITION, SPEECH ACTIVITY, PROSODIC MODELING, and LAUGHTER ACTIVITY; successive builds highlight each of the areas below.]
- SENTIMENT: Somasundaran et al., 2007
- HUMOR: Clark & Popescu-Belis, 2004
- EMOTIONAL VALENCE: Laskowski & Burger, 2006; Neiberg et al., 2006
- EMOTIONALLY INVOLVED SPEECH: Wrede & Shriberg, 2003; Laskowski, 2008
- LAUGHTER ACTIVITY: Kennedy & Ellis, 2004; Truong & van Leeuwen, 2005; Knox & Mirghafori, 2007

ICSI Meeting Corpus (Janin et al., 2003; Shriberg et al., 2004)
- naturally occurring meetings: 75 meetings, 66 hours of meeting time
- TrainSet: 51 meetings; DevSet: 11 meetings; EvalSet: 11 meetings
- 3-9 participants per meeting
- different meeting types: unstructured discussion among peers; round-table reporting among peers; 1-professor-and-N-students meetings
- human-transcribed words (with forced alignment) and dialog acts

Humor Annotation in ICSI Meetings
Based on the 8 DA types studied in Laskowski & Shriberg, "Modeling Other Talkers for Improved Dialog Act Recognition in Meetings", INTERSPEECH 2009, plus a ninth, humor-bearing type.

Propositional Content DA Types: statement (s, 85%); question (q, 6.6%)
Humor-Bearing DA Types: joke (j, 0.6%)
Feedback DA Types: backchannel (b, 2.8%); acknowledgment (bk, 1.4%); assert (aa, 1.1%)
Floor Mechanism DA Types: floor holder (fh, 2.5%); floor grabber (fg, 0.6%); hold (h, 0.3%)
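
For reference in the sketches that follow, the DA inventory can be written down as a small lookup table; a minimal sketch in Python (the structure is illustrative, the frequencies are copied from the slide above):

```python
# The nine dialog-act (DA) classes and their relative frequencies in the
# ICSI meeting data, as reported on the slide above.
DA_TYPES = {
    "s":  ("statement",      0.850),
    "q":  ("question",       0.066),
    "j":  ("joke",           0.006),  # the humor-bearing class of interest
    "b":  ("backchannel",    0.028),
    "bk": ("acknowledgment", 0.014),
    "aa": ("assert",         0.011),
    "fh": ("floor holder",   0.025),
    "fg": ("floor grabber",  0.006),
    "h":  ("hold",           0.003),
}
```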

Goal of this Work
[Diagram, built up over several slides: vocal-activity timelines for SPKR A-D, with one interval labeled TALKSPURT and another labeled LAUGHBOUT.]
TASK: find speech which is humor-bearing (DA segmentation and recognition, with a focus on a subset of DAs)

Talkspurt (TS) Boundaries vs DA Boundaries
[Diagram, built up over several slides: SPKR B's channel, with TALKSPURT and DIALOG ACT extents marked.]
- decoding the state of one participant at a time
- may have a 1:1 correspondence between DAs and TSs, and a 1:1 correspondence between DA-gaps and TS-gaps
- but may also have TS gaps inside DAs, a 1:N correspondence between DAs and TSs: we explicitly model intra-DA silence (see the sketch below)
- the opposite (an N:1 correspondence) may also occur
- we entertain the possibility that DA boundaries occur anywhere
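
A minimal sketch of how the TS/DA mismatch can be made concrete, assuming reference talkspurt and DA intervals for one participant (function and type names are illustrative, not from the paper):

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)

def gap_types(talkspurts: List[Interval], das: List[Interval]) -> List[str]:
    """Label each gap between consecutive talkspurts of one participant as
    'intra-DA' (the gap lies entirely inside a single dialog act, the 1:N
    case above) or 'inter-DA' (it separates two dialog acts)."""
    labels = []
    for (_, ts_end), (next_start, _) in zip(talkspurts, talkspurts[1:]):
        # A gap is intra-DA iff some reference DA spans it entirely.
        inside = any(da_s <= ts_end and next_start <= da_e
                     for da_s, da_e in das)
        labels.append("intra-DA" if inside else "inter-DA")
    return labels
```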

Proposed HMM Sub-Topology for DAs
[Diagram, built up over many slides: each DA sub-topology runs from an ENTRY node to an EGRESS node through three state types: NON-DA-TERMINAL TALKSPURT FRAGMENT, INTRA-DA TALKSPURT GAP, and DA-TERMINAL TALKSPURT FRAGMENT; the builds trace an example path through SPKR B's channel.]

Proposed HMM Topology for Conversational Speech
- the complete topology consists of a DA sub-topology for each of the 9 DA types: s, j, aa, q, b, h, fh, fg, bk
- sub-topologies are fully connected via inter-DA GAP subnetworks (a construction sketch follows)
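
A minimal sketch of how such a decoder graph might be assembled; the state names follow the sub-topology slide, but the exact arc set shown here is an assumption, not the paper's:

```python
DA_TYPES = ["s", "j", "aa", "q", "b", "h", "fh", "fg", "bk"]

def build_topology():
    """Assemble the full decoder graph: one sub-topology per DA type,
    fully connected through an inter-DA GAP subnetwork (collapsed here
    to a single state for brevity)."""
    states, arcs = ["GAP"], [("GAP", "GAP")]  # inter-DA silence may persist
    for da in DA_TYPES:
        frag  = f"{da}:TS_FRAG"    # non-DA-terminal talkspurt fragment
        gap_i = f"{da}:INTRA_GAP"  # intra-DA talkspurt gap
        term  = f"{da}:TS_TERM"    # DA-terminal talkspurt fragment
        states += [frag, gap_i, term]
        arcs += [
            ("GAP", frag), ("GAP", term),         # ENTRY: a new DA begins
            (frag, frag), (frag, gap_i),          # fragment, then intra-DA silence
            (gap_i, gap_i), (gap_i, frag), (gap_i, term),
            (term, term), (term, "GAP"),          # EGRESS: the DA ends
        ]
    return states, arcs
```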

Oracle Lexical Features
- each 100 ms frame of speech can be assigned to one word w
- assign to that frame the emission probability of the bigram of which w is the right token, and of the bigram of which w is the left token
- train a generative model over left and right bigrams for each HMM state (see the sketch below)
- bigrams whose probability of occurrence for any DA type is < 0.1% are mapped to UNK
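
A sketch of one plausible reading of these features: per-state counts over (left, right) bigrams, where a bigram is retained only if it reaches the 0.1% threshold within at least one DA type. The class structure and the add-one smoothing below are illustrative assumptions, not the paper's exact recipe:

```python
import math
from collections import Counter, defaultdict

UNK = "<UNK>"

class BigramEmissionModel:
    """Per-state generative model over (left, right) word bigrams."""

    def __init__(self, min_prob=0.001):  # the slide's 0.1% threshold
        self.min_prob = min_prob
        self.counts = defaultdict(Counter)  # state -> bigram counts
        self.vocab = set()

    def train(self, frames):
        """frames: iterable of (state, prev_word, word, next_word),
        one tuple per 100 ms frame of speech."""
        raw = defaultdict(Counter)
        for state, prev_w, w, next_w in frames:
            raw[state][(prev_w, w)] += 1   # bigram with w as right token
            raw[state][(w, next_w)] += 1   # bigram with w as left token
        # Keep a bigram only if it reaches min_prob within at least one
        # DA type; all other bigrams are mapped to UNK.
        for state, ctr in raw.items():
            total = sum(ctr.values())
            for bg, c in ctr.items():
                if c / total >= self.min_prob:
                    self.vocab.add(bg)
        for state, ctr in raw.items():
            for bg, c in ctr.items():
                self.counts[state][bg if bg in self.vocab else UNK] += c

    def log_emission(self, state, prev_w, w, next_w):
        """Log-probability assigned to one frame: the sum over the bigram
        in which w is the right token and the one in which it is the left."""
        ctr = self.counts[state]
        total = sum(ctr.values())
        score = 0.0
        for bg in [(prev_w, w), (w, next_w)]:
            key = bg if bg in self.vocab else UNK
            # add-one smoothing over the retained bigrams plus UNK
            score += math.log((ctr[key] + 1) / (total + len(self.vocab) + 1))
        return score
```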

Baseline Performance
- w/o T: fully-connected topology, equiprobable transitions
- w/ T0: proposed topology, equiprobable transitions
- w/ T1: proposed topology, transitions trained using TrainSet (ML)

System      | DevSet FA   MS   ERR | EvalSet FA   MS   ERR
T0          |   8.1  90.6  98.7    |   8.3  92.5  100.7
T1          |   0.3  96.7  97.0    |   0.2  94.0   94.2
LEX w/o T   |  53.6  32.8  86.4    |  53.7  32.9   86.6
LEX w/ T0   |  40.2  42.9  83.1    |  40.5  44.2   84.7
LEX w/ T1   |  12.7  67.0  79.6    |  12.8  70.5   83.3
(FA = false alarm rate, MS = miss rate, ERR = FA + MS; all in %)

Speech Activity/Interaction Features, S
[Diagram, built up over several slides: binary speech-activity timelines for the decoded participant (SPKR) and four others (OTH1-OTH4), a T-width window centered on instant t, and the K most-active others retained as the feature "vector".]
- decoding one participant (SPKR) at a time
- at instant t, model the "thumbnail image" of context: consider a temporal context of width T
- want invariance under participant-index rotation: rank the OTH participants by local speaking time
- want a fixed-size feature vector: consider only the K most-active others
- model features using state-specific GMMs (after LDA); the extraction is sketched below
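
A minimal sketch of the thumbnail extraction, assuming a binary vocal-activity matrix V sampled at the 100 ms frame rate; the subsequent LDA projection and the state-specific GMMs are not shown:

```python
import numpy as np

def context_features(V, spkr, t, T, K):
    """Fixed-size speech-activity 'thumbnail' at frame t.

    V    : (num_participants, num_frames) binary vocal-activity matrix
    spkr : index of the participant currently being decoded
    T    : context width in frames, centered on t
    K    : number of most-active interlocutors to retain
    Returns a (K+1, T) matrix: SPKR's own row first, then the K others
    ranked by speaking time within the window (rotation invariance)."""
    start = t - T // 2
    lo, hi = max(0, start), min(V.shape[1], start + T)
    win = np.zeros((V.shape[0], T), dtype=V.dtype)
    win[:, lo - start:hi - start] = V[:, lo:hi]  # zero-pad at recording edges
    others = sorted((i for i in range(V.shape[0]) if i != spkr),
                    key=lambda i: win[i].sum(), reverse=True)
    rows = [win[spkr]] + [win[i] for i in others[:K]]
    while len(rows) < K + 1:  # fewer than K interlocutors: pad with silence
        rows.append(np.zeros(T, dtype=V.dtype))
    return np.stack(rows)
```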

Laughter Activity/Interaction Features, L
- the process is the same as for the speech activity/interaction features (see the usage sketch below):
  1 sort others by amount of laughing time in the T-width window
  2 extract features from the K most-laughing others
- may be suboptimal (too complex → overfit): laughter accounts for only 9.6% of vocalizing time
- in the paper, we also consider subsetting all laughter bouts into voiced bouts (approx. 2/3 of laughter by time) and unvoiced bouts (approx. 1/3 of laughter by time)
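
Assuming the context_features helper sketched above, the laughter features reuse it unchanged on a binary laughter-activity matrix; L_voiced and L_unvoiced are hypothetical names for the voiced/unvoiced subsets described on the slide:

```python
# L, L_voiced, L_unvoiced: binary laughter-activity matrices shaped like V.
F_S  = context_features(V, spkr, t, T, K)           # speech context
F_L  = context_features(L, spkr, t, T, K)           # laughter context
F_Lv = context_features(L_voiced, spkr, t, T, K)    # voiced bouts only
F_Lu = context_features(L_unvoiced, spkr, t, T, K)  # unvoiced bouts only
```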

System Combination (a scoring sketch follows)
1 model-space combination (⊕M):
   P([F_S, F_L] | [M_S, M_L]) ≈ P(F_S | M_S) · P(F_L | M_L)
   with F_S = f(K, rank(S), S) and F_L = f(K, rank(L), L)
2 feature-space combination (⊕F):
   P([F_S, F_L] | [M_S, M_L]) ≈ P([F_S, F_L] | M_{S⊕L})
   with F_S = f(K, rank(S), S) and F_L = f(K, rank(L), L)
3 feature-computation-space combination (⊕C):
   P([F_S, F_L] | [M_S, M_L]) ≈ P([F_S, F_L] | M_{S⊕L})
   with F_S = f(K, rank(S∨L), S) and F_L = f(K, rank(S∨L), L)
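
A sketch of the three schemes, assuming hypothetical per-state scorers gmm_S, gmm_L, and gmm_SL with a logpdf(state, x) method returning log-likelihoods; context_features_ranked generalizes the earlier helper by ranking interlocutors on a separate activity track:

```python
import numpy as np

def context_features_ranked(X, rank_by, spkr, t, T, K):
    """Like context_features above, but interlocutors are ranked by their
    activity in rank_by rather than in X itself."""
    start = t - T // 2
    lo, hi = max(0, start), min(X.shape[1], start + T)
    win, rwin = (np.zeros((X.shape[0], T), dtype=X.dtype) for _ in range(2))
    win[:, lo - start:hi - start] = X[:, lo:hi]
    rwin[:, lo - start:hi - start] = rank_by[:, lo:hi]
    others = sorted((i for i in range(X.shape[0]) if i != spkr),
                    key=lambda i: rwin[i].sum(), reverse=True)
    rows = [win[spkr]] + [win[i] for i in others[:K]]
    while len(rows) < K + 1:
        rows.append(np.zeros(T, dtype=X.dtype))
    return np.stack(rows).ravel()

def combined_scores(gmm_S, gmm_L, gmm_SL, V, L, spkr, t, T, K, state):
    F_S = context_features_ranked(V, V, spkr, t, T, K)
    F_L = context_features_ranked(L, L, spkr, t, T, K)
    # (1) model-space (⊕M): independent models, log-likelihoods add
    m = gmm_S.logpdf(state, F_S) + gmm_L.logpdf(state, F_L)
    # (2) feature-space (⊕F): one joint model over concatenated features
    f = gmm_SL.logpdf(state, np.concatenate([F_S, F_L]))
    # (3) feature-computation-space (⊕C): joint model, but both thumbnails
    #     rank interlocutors by combined speech-or-laughter activity (S ∨ L)
    SorL = np.logical_or(V, L)
    F_Sc = context_features_ranked(V, SorL, spkr, t, T, K)
    F_Lc = context_features_ranked(L, SorL, spkr, t, T, K)
    c = gmm_SL.logpdf(state, np.concatenate([F_Sc, F_Lc]))
    return m, f, c
```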

Results

System        | DevSet FA   MS   ERR | EvalSet FA   MS   ERR
LEX           |  12.7  67.0  79.6    |  12.8  70.5  83.3
S             |   7.5  47.4  54.9    |   8.6  62.8  71.4
L             |  14.0   5.3  19.3    |  15.6   8.1  23.7
S ⊕M L        |   9.7   6.6  16.3    |  11.0   8.4  19.4
S ⊕F L        |   6.0  17.8  23.8    |   6.8  21.6  28.4
S ⊕C L        |   6.0  16.0  22.0    |   6.4  17.8  24.2
LEX ⊕M S ⊕M L |   7.7   7.2  14.8    |   8.3  11.0  19.4

- L is the best single source of information for this task
- model-space combination with S leads to improvement
- combination with LEX leads to improvement on DevSet only

Receiver Operating Characteristics (DevSet)
[Figure: ROC curves for the LEX, S, L, and LEX+S+L systems; false positive rate (%), 0-20, vs true positive rate (%), 0-100, with no-discrimination and equal-error reference lines.]

Interpreting Emission Probability Diagrams
- condition: given an event of type A occurring at time t, what is the likelihood that an event of type B occurs at a time t′ ∈ [t − 5, t + 5] (seconds)?
- retrain a single-Gaussian model on unnormalized features
[Diagram axes: probability of occurrence of B vs time of occurrence of B.]
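
One plausible way to compute such a profile from the data: average the binary B-activity track in a ±5 s window around every A event. A sketch, assuming the 100 ms frames used elsewhere in the talk (i.e., 10 frames/s):

```python
import numpy as np

def cooccurrence_profile(A_times, B, window_s=5.0, frames_per_s=10):
    """Empirical probability that B is active at t + dt, for dt in
    [-window_s, +window_s], given an A event at time t.

    A_times : event times of type A, in seconds
    B       : binary activity track for events of type B, one value
              per 100 ms frame"""
    w = int(window_s * frames_per_s)
    profile = np.zeros(2 * w + 1)
    n = 0
    for t in A_times:
        c = int(round(t * frames_per_s))
        if c - w < 0 or c + w >= len(B):
            continue  # skip events too close to the track edges
        profile += B[c - w:c + w + 1]
        n += 1
    return profile / max(n, 1)  # one probability per 100 ms offset
```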

Interlocutor Laughter Context at DA Termination
[Diagram, repeated across several builds: emission probability profiles around the termination of j DAs, for the locally 1st-most-laughing and locally 2nd-most-laughing interlocutors.]

Target Speaker Laughter Context
[Diagram: emission probability profiles around j DAs for the target speaker.]
How well do we do with laughter only from the target speaker?

System                  | DevSet FA   MS   ERR | EvalSet FA   MS   ERR
S                       |   7.5  47.4  54.9    |   8.6  62.8  71.4
L (all participants)    |  14.0   5.3  19.3    |  15.6   8.1  23.7
L (target speaker only) |   8.7  20.3  28.9    |   8.5  22.4  31.0

Interlocutor j-speech Context at j-DA Termination
[Diagram: emission probability profiles around j-DA termination for the target speaker and the locally 1st- and 2nd-most j-talkative interlocutors.]

Summary
GOAL: detect humor-bearing speech
APPROACH: frame-level HMM decoding, considering multiparticipant speech & laughter context
RESULTS:
1 at FPRs of 5% (DevSet): lexical features yield TPRs 4× higher than random guessing; speech context yields TPRs 2× higher than lexical features; laughter context yields TPRs 2× higher than speech context
2 laughter context features: EER < 24% (EvalSet)
3 model-space combination improves EERs by 5% absolute
4 the locally most-laughing interlocutor is more likely to laugh than not
5 evidence that jokers themselves laugh, perhaps to signal intent
6 at most 2 participants are likely to joke in any 10-second interval

THANK YOU
Special thanks to Liz Shriberg for access to the ICSI MRDA annotations and for helpful discussion during this work.