CSE 517 Natural Language Processing Winter 2013

CSE 517 Natural Language Processing Winter 2013 Phrase Based Translation Luke Zettlemoyer Slides from Philipp Koehn and Dan Klein

Phrase-Based Systems
Pipeline: sentence-aligned corpus, then word alignments, then phrase table (translation model)
Example phrase table entries:
  English     Foreign      Score
  cat         chat         0.9
  the cat     le chat      0.8
  dog         chien        0.8
  house       maison       0.6
  my house    ma maison    0.9
  language    langue       0.9

Phrase Translation Tables
Defines the space of possible translations
  each entry has an associated probability
One learned example, for "den Vorschlag", from Europarl data:
  English            φ(ē | f)    English            φ(ē | f)
  the proposal       0.6227      the suggestions    0.0114
  's proposal        0.1068      the proposed       0.0114
  a proposal         0.0341      the motion         0.0091
  the idea           0.0250      the idea of        0.0091
  this proposal      0.0227      the proposal,      0.0068
  proposal           0.0205      its proposal       0.0068
  of the proposal    0.0159      it                 0.0068
  the proposals      0.0159      ...                ...
This table is noisy, has errors, and the entries do not necessarily match our linguistic intuitions about consistency.

Phrase-Based Decoding
Decoder design is important: [Koehn et al. 03]

Extracting Phrases
We will use word alignments to find phrases
  Mary did not slap the green witch
  María no daba una bofetada a la bruja verde
Question: what is the best set of phrases?

Extracting Phrases
Phrase alignment must
  Contain at least two aligned words
  Contain all alignments for phrase pair
Phrase Extraction Criteria
[Figure: three candidate phrase pairs drawn on the alignment grid for "Maria no daba" and "Mary did not slap"; the first is marked consistent, the other two inconsistent]
Extract all such phrase pairs!
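The consistency criterion can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function names are hypothetical, the alignment links are an assumption inferred from the example sentence pair (the slides do not list them explicitly), and "at least two aligned words" is read here as "at least one alignment link inside the pair".

```python
def is_consistent(links, f1, f2, e1, e2):
    """True if foreign span [f1, f2] and English span [e1, e2] are consistent.

    links is a set of (foreign_index, english_index) alignment points.
    """
    has_link = False
    for f, e in links:
        f_in, e_in = f1 <= f <= f2, e1 <= e <= e2
        if f_in != e_in:
            return False  # a link leaves the box: some alignment is not contained
        has_link = has_link or f_in
    return has_link  # require at least one link inside the pair


def extract_phrases(foreign, english, links):
    """Enumerate every consistent phrase pair (brute force, no length limit)."""
    pairs = []
    for f1 in range(len(foreign)):
        for f2 in range(f1, len(foreign)):
            for e1 in range(len(english)):
                for e2 in range(e1, len(english)):
                    if is_consistent(links, f1, f2, e1, e2):
                        pairs.append((" ".join(foreign[f1:f2 + 1]),
                                      " ".join(english[e1:e2 + 1])))
    return pairs


foreign = "Maria no daba una bofetada a la bruja verde".split()
english = "Mary did not slap the green witch".split()
# Assumed alignment (0-indexed): no -> did not, daba una bofetada -> slap,
# a la -> the, bruja -> witch, verde -> green.
links = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),
         (5, 4), (6, 4), (7, 6), (8, 5)}
pairs = extract_phrases(foreign, english, links)
```

Under this assumed alignment, ("Maria no", "Mary did not") is extracted, while ("Maria no", "Mary did") is rejected because "no" is also linked to "not", which lies outside the English span. Practical extractors usually also cap the phrase length, which this sketch omits.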

Phrase Pair Extraction Example
[Figure: word alignment grid between "Maria no daba una bofetada a la bruja verde" and "Mary did not slap the green witch"]
(Maria, Mary), (no, did not), (daba una bofetada, slap), (a la, the), (bruja, witch), (verde, green)
(Maria no, Mary did not), (no daba una bofetada, did not slap), (daba una bofetada a la, slap the), (bruja verde, green witch)
(Maria no daba una bofetada, Mary did not slap), (no daba una bofetada a la, did not slap the), (a la bruja verde, the green witch)
(Maria no daba una bofetada a la, Mary did not slap the), (daba una bofetada a la bruja verde, slap the green witch)
(Maria no daba una bofetada a la bruja verde, Mary did not slap the green witch)

Phrase Size
Phrases do help
  But they don't need to be long
  Why should this be?

Bidirectional Alignment

Alignment Heuristics

Phrase Scoring
[Figure: word-aligned sentence pair "cats like fresh fish." / "les chats aiment le poisson frais."]
$g(f, e) = \log \frac{c(e, f)}{c(e)}$
Learning weights has been tried, several times: [Marcu and Wong, 02] [DeNero et al, 06] and others
Seems not to work well, for a variety of partially understood reasons
Main issue: big chunks get all the weight, obvious priors don't help
  Though, [DeNero et al 08]
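The relative-frequency score above amounts to counting extracted phrase pairs. A minimal sketch, assuming the extracted pairs are available as a list of (foreign, English) strings; the function name and the toy data are hypothetical:

```python
import math
from collections import Counter


def phrase_scores(extracted_pairs):
    """Return g(f, e) = log(c(e, f) / c(e)) for every extracted (f, e) pair."""
    pair_counts = Counter(extracted_pairs)               # c(e, f)
    english_counts = Counter(e for _, e in extracted_pairs)  # c(e)
    return {(f, e): math.log(count / english_counts[e])
            for (f, e), count in pair_counts.items()}


# Toy list of extracted pairs (made up for illustration).
extracted = [("le chat", "the cat"), ("le chat", "the cat"),
             ("chat", "the cat"), ("chien", "dog")]
g = phrase_scores(extracted)
```

Here "the cat" was extracted three times, twice with "le chat", so g("le chat", "the cat") = log(2/3), while "dog" only ever pairs with "chien" and gets log(1) = 0.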

Scoring: Basic approach, sum up phrase translation scores and a language model
Define $y = p_1 p_2 \ldots p_L$ to be a translation with phrase pairs $p_i$
Define $e(y)$ to be the output English sentence in $y$
Let $h(\cdot)$ be the log probability under a trigram language model
Let $g(\cdot)$ be a phrase pair score (from last slide)
Then, the full translation score is:
  $f(y) = h(e(y)) + \sum_{k=1}^{L} g(p_k)$
Goal, compute the best translation:
  $y^*(x) = \arg\max_{y \in \mathcal{Y}(x)} f(y)$

The Pharaoh Decoder Scores at each step include LM and TM

Scoring: In practice, much like for alignment models, also include a distortion penalty
Define $y = p_1 p_2 \ldots p_L$ to be a translation with phrase pairs $p_i$
Let $s(p_i)$ be the start position of the foreign phrase
Let $t(p_i)$ be the end position of the foreign phrase
Define $\eta$ to be the distortion score (usually negative!)
Then, we can define a score with distortion penalty:
  $f(y) = h(e(y)) + \sum_{k=1}^{L} g(p_k) + \sum_{k=1}^{L-1} \eta \times |t(p_k) + 1 - s(p_{k+1})|$
Goal, compute the best translation:
  $y^*(x) = \arg\max_{y \in \mathcal{Y}(x)} f(y)$
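The score with distortion penalty can be computed directly from a list of phrase pairs. The sketch below assumes each phrase pair is a tuple (s, t, english, g) with 1-indexed foreign positions and a precomputed phrase score g, and it takes the language model h as a caller-supplied function. The tuple layout, the names, and the toy language model are all illustrative assumptions.

```python
def score_derivation(derivation, lm_logprob, eta):
    """f(y) = h(e(y)) + sum_k g(p_k) + sum_k eta * |t(p_k) + 1 - s(p_{k+1})|.

    derivation: phrase pairs in English output order, each (s, t, english, g).
    lm_logprob: function from a list of English words to a log probability.
    eta: distortion score, usually negative.
    """
    words = [w for (_, _, english, _) in derivation for w in english.split()]
    score = lm_logprob(words)                       # h(e(y))
    score += sum(g for (_, _, _, g) in derivation)  # phrase pair scores
    # Distortion between each consecutive pair of phrases (L - 1 terms).
    for (_, t_k, _, _), (s_next, _, _, _) in zip(derivation, derivation[1:]):
        score += eta * abs(t_k + 1 - s_next)
    return score


def toy_lm(words):
    # Stand-in for a trigram model: a flat cost of -1 per word.
    return -float(len(words))


monotone = [(1, 1, "Mary", -0.5), (2, 2, "did not", -1.0)]
reordered = [(1, 1, "a", 0.0), (3, 3, "b", 0.0), (2, 2, "c", 0.0)]
monotone_score = score_derivation(monotone, toy_lm, eta=-1.0)
reordered_score = score_derivation(reordered, toy_lm, eta=-1.0)
```

A monotone derivation pays no distortion, since each phrase starts right after the previous one ends. The reordered example jumps forward by one word and then back by two, so it accumulates a total distortion of 3, scaled by η.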

Hypothesis Expansion
Input: Maria no dio una bofetada a la bruja verde
Translation options (rows of the chart in the figure):
  Mary not give a slap to the witch green
  did not a slap by green witch
  no slap to the
  did not give to the
  slap the witch
Hypotheses (e: English output added, f: foreign coverage, p: score):
  e:               f: ---------   p: 1
  e: Mary          f: *--------   p: .534
  e: witch         f: -------*-   p: .182
  e: slap          f: *-***----   p: .043
  e: did not       f: **-------   p: .154
  e: slap          f: *****----   p: .015
  e: the           f: *******--   p: .004283
  e: green witch   f: *********   p: .000271
Start with empty hypothesis
  e: no English words
  f: no foreign words covered
  p: score 1
Add another hypothesis
Further hypothesis expansion
... until all foreign words covered
  find best hypothesis that covers all foreign words
  backtrack to read off translation

Hypothesis Explosion!
[Figure: the search graph for "Maria no dio una bofetada a la bruja verde", with the same translation options and hypotheses as on the Hypothesis Expansion slide]
Q: How much time to find the best translation?
  NP-hard, just like for word translation models
  So, we will use approximate search techniques!

Hypothesis Lattices

Pruning
Problem: easy partial analyses are cheaper
  Solution 1: use separate beams per foreign subset
  Solution 2: estimate forward costs (A*-like)
[Figure: organization of hypotheses into queues 1 to 6]
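A beam-search decoder with separate queues can be sketched as follows. This is a simplified illustration under several assumptions, not the Pharaoh decoder: queues are indexed by the number of foreign words covered, the language model term is omitted (only phrase scores and distortion are used), future-cost estimation is left out, and the phrase table, its probabilities, and all names are made up for the example.

```python
import math


def decode(foreign, phrase_table, eta=-1.0, beam_size=10, max_phrase_len=3):
    """Return (english, score) for the best hypothesis found, or None."""
    n = len(foreign)
    # stacks[k] holds hypotheses covering k foreign words, keyed for
    # recombination by (coverage set, end of the last foreign phrase).
    stacks = [{} for _ in range(n + 1)]
    # Hypothesis: (score, covered positions, end of last phrase, English phrases)
    stacks[0][(frozenset(), None)] = (0.0, frozenset(), None, ())
    for k in range(n):
        # Pruning: only the beam_size best hypotheses in each queue are expanded.
        beam = sorted(stacks[k].values(), key=lambda h: h[0], reverse=True)
        for score, covered, end, english in beam[:beam_size]:
            for i in range(n):
                for j in range(i + 1, min(i + max_phrase_len, n) + 1):
                    if (j - 1) in covered:
                        break  # span overlaps covered words; longer spans do too
                    options = phrase_table.get(tuple(foreign[i:j]), [])
                    for e_phrase, g in options:
                        # No distortion charge for the first phrase, matching
                        # a penalty sum over consecutive phrase pairs only.
                        jump = 0 if end is None else abs(end - i)
                        new_cov = covered | set(range(i, j))
                        new = (score + g + eta * jump, new_cov, j,
                               english + (e_phrase,))
                        stack, key = stacks[len(new_cov)], (new_cov, j)
                        if key not in stack or new[0] > stack[key][0]:
                            stack[key] = new  # recombination: keep the better one
    if not stacks[n]:
        return None
    best = max(stacks[n].values(), key=lambda h: h[0])
    return " ".join(best[3]), best[0]


# Toy phrase table with made-up probabilities, stored as log scores.
table = {
    ("Maria",): [("Mary", math.log(0.9))],
    ("no",): [("did not", math.log(0.6)), ("not", math.log(0.3))],
    ("daba", "una", "bofetada"): [("slap", math.log(0.8))],
    ("a", "la"): [("the", math.log(0.7))],
    ("bruja",): [("witch", math.log(0.8))],
    ("verde",): [("green", math.log(0.8))],
    ("bruja", "verde"): [("green witch", math.log(0.8))],
}
sentence = "Maria no daba una bofetada a la bruja verde".split()
translation, total = decode(sentence, table)
```

Because this sketch has no language model, two hypotheses with the same coverage and the same last position have identical future scores, so keeping only the better one loses nothing. With a trigram model the recombination key would also need the last two English words.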

Tons of Data?
Discussed for LMs, but can now understand full model!

Tuning for MT
Features encapsulate lots of information
  Basic MT systems have around 6 features
  P(e | f), P(f | e), lexical weighting, language model
How to tune feature weights?
Idea 1: Use your favorite classifier

Why Tuning is Hard
Problem 1: There are latent variables
  Alignments and segmentations
  Possibility: forced decoding (but it can go badly)

Why Tuning is Hard
Problem 2: There are many right answers
  The reference or references are just a few options
  No good characterization of the whole class
  BLEU isn't perfect, but even if you trust it, it's a corpus-level metric, not sentence-level

Why Tuning is Hard
Problem 3: Computational constraints
  Discriminative training involves repeated decoding
  Very slow! So people tune on sets much smaller than those used to build phrase tables

Minimum Error Rate Training
Standard method: optimize BLEU directly, i.e. minimize the error of the 1-best output (Och 03)
  The MERT objective is discontinuous
  Only works for at most ~10 features, but works very well then
  Here: k-best lists, but forest methods exist (Macherey et al 08)
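The discontinuity can be made concrete with a one-dimensional search over k-best lists. The sketch below is a simplification and not Och's exact procedure: it assumes each candidate carries a per-sentence error count that can simply be summed, whereas BLEU is corpus-level and a real implementation would accumulate n-gram statistics instead. The data layout and names are hypothetical.

```python
def line_search(kbest_lists, weights, dim):
    """Pick a value for weights[dim] that minimizes total 1-best error.

    kbest_lists: one list per sentence of (feature_vector, error) candidates.
    With all other weights fixed, each candidate's model score is a line
    offset + x * slope in x = weights[dim], so the 1-best choice (and the
    error) can only change where two lines of the same sentence cross.
    """
    per_sentence = []
    for candidates in kbest_lists:
        lines = []
        for feats, err in candidates:
            offset = sum(w * f for d, (w, f) in enumerate(zip(weights, feats))
                         if d != dim)
            lines.append((offset, feats[dim], err))
        per_sentence.append(lines)

    crossings = sorted({(o2 - o1) / (s1 - s2)
                        for lines in per_sentence
                        for (o1, s1, _) in lines
                        for (o2, s2, _) in lines
                        if s1 != s2})
    if not crossings:
        return weights[dim]  # the 1-best never changes along this direction

    # The error is constant between crossings: probe one point per interval.
    probes = ([crossings[0] - 1.0]
              + [(a + b) / 2 for a, b in zip(crossings, crossings[1:])]
              + [crossings[-1] + 1.0])

    def total_error(x):
        return sum(max(lines, key=lambda l: l[0] + x * l[1])[2]
                   for lines in per_sentence)

    return min(probes, key=total_error)


# One sentence, two candidates: the second has lower error but needs a
# large enough weight on feature 1 to become the model's 1-best.
kbest = [[((1.0, 0.0), 1), ((0.0, 1.0), 0)]]
best_w1 = line_search(kbest, weights=[1.0, 0.0], dim=1)
```

In the toy example the two candidates' score lines cross at a weight of 1, so any value above 1 makes the lower-error candidate the 1-best. The error jumps at the crossing rather than changing smoothly, which is why gradient-based optimizers do not apply directly.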

MERT
[Figure: plot labeled "Model Score" and "BLEU Score"]

MERT