Statistical NLP Spring 2011. Lecture 7: Phrase-Based MT

Transcription:

Statistical NLP, Spring 2011
Lecture 7: Phrase-Based MT
Dan Klein, UC Berkeley

Machine Translation: Examples

Levels of Transfer

Word-Level MT: Examples
  la politique de la haine.
    politics of hate.
    the policy of the hatred.
  nous avons signé le protocole.
    we did sign the memorandum of agreement.
    we have signed the protocol.
  où était le plan solide?
    but where was the solid plan?
    where was the economic base?

Phrasal / Syntactic MT: Examples

MT: Evaluation
  Human evaluations: subjective measures, fluency/adequacy
  Automatic measures: n-gram match to references
    NIST measure: n-gram recall (worked poorly)
    BLEU: n-gram precision (no one really likes it, but everyone uses it)
  BLEU:
    P1 = unigram precision
    P2, P3, P4 = bi-, tri-, 4-gram precision
    Weighted geometric mean of P1-4
    Brevity penalty (why?)
    Somewhat hard to game
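The BLEU recipe on the evaluation slide (n-gram precisions P1 to P4, a geometric mean, and a brevity penalty) can be sketched in a few lines of Python. This is a minimal sentence-level sketch under stated assumptions, not the official scoring script: the weights are uniform, n-gram counts are clipped against the references as in the standard BLEU definition (the slide does not mention clipping), and the function names `ngrams`, `modified_precision`, and `bleu` are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of the candidate against the references."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    # For each n-gram, the most times it appears in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


def bleu(candidate, references, max_n=4):
    """Uniformly weighted geometric mean of P1..P4 times a brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # the geometric mean is zero if any precision is zero
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties: shorter one).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "we have signed the protocol".split()
full = bleu(reference, [reference])
truncated = bleu("we have signed the".split(), [reference])
```

The truncated candidate has perfect precision at every order, so precision alone would give it a perfect score; only the brevity penalty separates it from the full sentence, which is one answer to the slide's "why?".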

Automatic Metrics Work (?)

Corpus-Based MT
  Modeling correspondences between languages
  Sentence-aligned parallel corpus:
    Yo lo haré mañana    I will do it tomorrow
    Hasta pronto         See you soon
    Hasta pronto         See you around
  Machine translation system (model of translation):
    Yo lo haré pronto    I will do it soon
                         I will do it around
                         See you tomorrow

Phrase-Based Systems
  Sentence-aligned corpus
  Word alignments
  Phrase table (translation model):
    cat        chat        0.9
    the cat    le chat     0.8
    dog        chien       0.8
    house      maison      0.6
    my house   ma maison   0.9
    language   langue      0.9
  Many slides and examples from Philipp Koehn or John DeNero

Phrase-Based Decoding
  这 7 人中包括来自法国和俄罗斯的宇航员.
  Decoder design is important: [Koehn et al. 03]

The Pharaoh Model [Koehn et al, 2003]
  Segmentation
  Translation
  Distortion
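The three Pharaoh steps (segmentation, translation, distortion) can be illustrated by scoring one fixed derivation with the phrase table from the Phrase-Based Systems slide. This is a sketch under assumptions: the slide does not say which direction the table probabilities are conditioned in, so they are treated as opaque phrase scores; the distortion penalty is taken to be `alpha ** |start - previous_end|` with a made-up `alpha = 0.5`; the language model term that a real decoder also multiplies in is left out; and `score_derivation` is an illustrative name.

```python
# Phrase scores copied from the slide, keyed by (English phrase, French phrase).
PHRASE_TABLE = {
    ("cat", "chat"): 0.9,
    ("the cat", "le chat"): 0.8,
    ("dog", "chien"): 0.8,
    ("house", "maison"): 0.6,
    ("my house", "ma maison"): 0.9,
    ("language", "langue"): 0.9,
}


def score_derivation(foreign, derivation, alpha=0.5):
    """Score one segmentation and ordering of the foreign sentence.

    derivation lists (start, end, english) triples in English output order,
    where foreign[start:end] is the phrase being translated.
    alpha is a hypothetical distortion base, not a value from the slides.
    """
    covered = [False] * len(foreign)
    score = 1.0
    previous_end = 0
    output = []
    for start, end, english in derivation:
        if any(covered[start:end]):
            raise ValueError("foreign words may be translated only once")
        for i in range(start, end):
            covered[i] = True
        foreign_phrase = " ".join(foreign[start:end])
        score *= PHRASE_TABLE[(english, foreign_phrase)]   # translation
        score *= alpha ** abs(start - previous_end)        # distortion
        previous_end = end
        output.append(english)
    if not all(covered):
        raise ValueError("every foreign word must be translated")
    return " ".join(output), score


foreign = "ma maison le chat".split()
monotone = score_derivation(foreign, [(0, 2, "my house"), (2, 4, "the cat")])
reordered = score_derivation(foreign, [(2, 4, "the cat"), (0, 2, "my house")])
```

The monotone derivation pays no distortion cost, while the reordered one is penalized for each jump, so with these toy numbers it scores far lower even though it uses the same phrase pairs.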

The Pharaoh Model
  Phrase Weights
  Where do we get these counts?

Phrase-Based Decoding

Monotonic Word Translation
  Cost is LM * TM
  It's an HMM?
    P(e | e-1, e-2)
    P(f | e)
  State includes
    Exposed English
    Position in foreign
  Example states and scores:
    [. a slap, 5]    0.00001
    [. slap to, 6]   0.00000016
    [. slap by, 6]   0.00000001
  Dynamic program loop?
    for (fPosition in 1..|f|)
      for (eContext in allEContexts)
        for (eOption in translations[fPosition])
          score = scores[fPosition-1][eContext] * LM(eContext + eOption) * TM(eOption, fWord[fPosition])
          scores[fPosition][eContext[2] + eOption] = max(scores[fPosition][eContext[2] + eOption], score)

Beam Decoding
  For real MT models, this kind of dynamic program is a disaster (why?)
  Standard solution is beam search: for each position, keep track of only the best k hypotheses
    for (eContext in bestEContexts[fPosition])
      for (eOption in translations[fPosition])
        score = scores[fPosition-1][eContext] * LM(eContext + eOption) * TM(eOption, fWord[fPosition])
        bestEContexts.maybeAdd(eContext[2] + eOption, score)
  Still pretty slow... why?
  Useful trick: cube pruning (Chiang 2005)
  Example from David Chiang

Phrase Translation
  If monotonic, almost an HMM; technically a semi-HMM
    for (fPosition in 1..|f|)
      for (lastPosition < fPosition)
        for (eContext in eContexts)
          combine hypothesis for (lastPosition ending in eContext) with eOption
  If distortion... now what?
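The beam-search loop for monotonic word translation can be written out as runnable Python. This is a minimal sketch under assumptions rather than the decoder from the lecture: the state is the two most recent English words (the exposed trigram context), the toy translation table and language model below are invented for illustration, and `beam_decode`, `TRANSLATIONS`, `GOOD_TRIGRAMS`, and `toy_lm` are hypothetical names.

```python
import heapq


def beam_decode(foreign, translations, lm, beam_size=3):
    """Monotonic word-by-word decoding, keeping the best k states per position.

    translations: dict mapping a foreign word to [(english_word, tm_prob), ...]
    lm: function (two-word context tuple, english_word) -> probability
    Returns (score, english_words) for the best surviving hypothesis.
    """
    # State = last two English words; value = (score, English output so far).
    beam = {("<s>", "<s>"): (1.0, [])}
    for f_word in foreign:
        candidates = {}
        for context, (score, output) in beam.items():
            for e_word, tm_prob in translations[f_word]:
                new_score = score * lm(context, e_word) * tm_prob
                new_context = (context[1], e_word)
                # Recombination: keep only the best path into each state.
                if (new_context not in candidates
                        or new_score > candidates[new_context][0]):
                    candidates[new_context] = (new_score, output + [e_word])
        # Pruning: keep the beam_size highest-scoring states.
        beam = dict(heapq.nlargest(beam_size, candidates.items(),
                                   key=lambda item: item[1][0]))
    return max(beam.values(), key=lambda hyp: hyp[0])


# Toy models, invented for this example.
TRANSLATIONS = {
    "le": [("the", 0.7), ("it", 0.3)],
    "chat": [("cat", 0.9), ("chat", 0.1)],
    "dort": [("sleeps", 0.8), ("sleep", 0.2)],
}
GOOD_TRIGRAMS = {("<s>", "<s>", "the"), ("<s>", "the", "cat"),
                 ("the", "cat", "sleeps")}


def toy_lm(context, word):
    """Hypothetical trigram LM: 0.5 for listed trigrams, 0.1 otherwise."""
    return 0.5 if context + (word,) in GOOD_TRIGRAMS else 0.1


best_score, best_output = beam_decode("le chat dort".split(),
                                      TRANSLATIONS, toy_lm)
```

With an unbounded beam this reduces to the exact dynamic program, because recombination by context is the Viterbi max update; the beam trades that exactness for a bounded number of states per foreign position.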

Non-Monotonic Phrasal MT

Pruning: Beams + Forward Costs
  Problem: easy partial analyses are cheaper
    Solution 1: use beams per foreign subset
    Solution 2: estimate forward costs (A*-like)

The Pharaoh Decoder

Hypothesis Lattices

Word Alignment
  What is the anticipated cost of collecting fees under the new proposal?
  En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?

Word Alignment
  x: What is the anticipated cost of collecting fees under the new proposal?
  z: En vertu de les nouvelles propositions, quel est le coût prévu de perception de les droits?

Unsupervised Word Alignment
  Input: a bitext: pairs of translated sentences
    nous acceptons votre opinion .
    we accept your view .
  Output: alignments: pairs of translated words
  When words have unique sources, can represent as a (forward) alignment function a from French to English positions

1-to-Many Alignments

Many-to-Many Alignments

IBM Model 1 (Brown 93)
  Alignments: a hidden vector called an alignment specifies which English source is responsible for each French target word.

IBM Models 1/2
  E: 1     2   3 4 5     6  7  8      9
     Thank you , I shall do so gladly .
  A: 1 3 7 6 8 8 8 8 9
  F: Gracias , lo haré de muy buen grado .
  Model Parameters
    Emissions: P(F1 = Gracias | E_A1 = Thank)
    Transitions: P(A2 = 3)
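The emission parameters of IBM Model 1 can be estimated from a bitext with EM, since Model 1 treats every alignment position as equally likely and so only the word translation probabilities need learning. The sketch below is a simplified illustration under assumptions: it omits the NULL source word, uses a two-sentence toy bitext invented for the example, and the names `train_model1` and `align` are illustrative.

```python
from collections import defaultdict


def train_model1(bitext, iterations=10):
    """EM for IBM Model 1 emissions, returned as t[(f, e)] = P(f | e).

    bitext: list of (english_tokens, foreign_tokens) pairs.
    Simplification: no NULL source word is added to the English side.
    """
    f_vocab = {f for _, fs in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (f, e) links
        total = defaultdict(float)   # expected count of links from e
        for es, fs in bitext:
            for f in fs:
                # E-step: posterior over which English word generated f.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    posterior = t[(f, e)] / z
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalise the expected counts per English word.
        t = defaultdict(float,
                        {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def align(es, fs, t):
    """Most likely 1-based English position for each foreign word."""
    return [max(range(len(es)), key=lambda j: t[(f, es[j])]) + 1 for f in fs]


bitext = [("green house".split(), "casa verde".split()),
          ("the house".split(), "la casa".split())]
t = train_model1(bitext)
alignment = align("green house".split(), "casa verde".split(), t)
```

The returned list plays the role of the A vector on the slide: one English position per foreign word. Model 2 would additionally learn the transition terms such as P(A2 = 3) instead of keeping them uniform.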