Statistical NLP, Spring 2011
Lecture 7: Phrase-Based MT
Dan Klein, UC Berkeley

Machine Translation: Examples

Levels of Transfer

Word-Level MT: Examples
  la politique de la haine.
  politics of hate.
  the policy of the hatred.

  nous avons signé le protocole.
  we did sign the memorandum of agreement.
  we have signed the protocol.

  où était le plan solide?
  but where was the solid plan?
  where was the economic base?

Phrasal / Syntactic MT: Examples

MT: Evaluation
Human evaluations: subjective measures of fluency/adequacy
Automatic measures: n-gram match to references
NIST measure: n-gram recall (worked poorly)
BLEU: n-gram precision (no one really likes it, but everyone uses it)

BLEU:
  P1 = unigram precision
  P2, P3, P4 = bi-, tri-, 4-gram precision
  Weighted geometric mean of P1-4
  Brevity penalty (why?)
  Somewhat hard to game
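The BLEU recipe above (clipped n-gram precisions P1-P4, geometric mean, brevity penalty) can be sketched at the sentence level as follows; the uniform 1/4 weights are the standard choice, and this is a minimal illustration rather than the official multi-reference, corpus-level formulation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of clipped n-gram
    precisions P1..P4, times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # clip each candidate n-gram count by its count in the reference
        matched = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(matched / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # uniform weights 1/4 -> geometric mean of P1..P4
    geo = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # brevity penalty: punish candidates shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * geo
```

The brevity penalty answers the slide's "(why?)": precision alone can be gamed by emitting only a few very safe words, so short outputs are penalized.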
Automatic Metrics Work (?)

Corpus-Based MT
Modeling correspondences between languages
Sentence-aligned parallel corpus:
  Yo lo haré mañana / I will do it tomorrow
  Hasta pronto / See you soon
  Hasta pronto / See you around
Machine translation system (model of translation):
  Yo lo haré pronto →
    I will do it soon
    I will do it around
    See you tomorrow

Phrase-Based Systems
Sentence-aligned corpus → Word alignments → Phrase table (translation model):
  cat → chat (0.9)
  the cat → le chat (0.8)
  dog → chien (0.8)
  house → maison (0.6)
  my house → ma maison (0.9)
  language → langue (0.9)
Many slides and examples from Philipp Koehn or John DeNero

Phrase-Based Decoding
这7人中包括来自法国和俄罗斯的宇航员. (These 7 people include astronauts coming from France and Russia.) [Koehn et al., 2003]
Segmentation / Translation / Distortion
Decoder design is important: [Koehn et al., 03]
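The segmentation step above can be made concrete: given a phrase table, a phrase-based system must first enumerate ways of chopping the source sentence into table phrases. A minimal sketch, using a toy table in the spirit of the one on the slide (the "my → ma" entry and all scores are illustrative, not learned):

```python
def segmentations(words, table, max_len=3):
    """Enumerate every way to segment `words` into phrases that appear
    in the phrase table (monotonic: no reordering/distortion yet)."""
    if not words:
        yield []
        return
    for k in range(1, min(max_len, len(words)) + 1):
        phrase = tuple(words[:k])
        if phrase in table:
            for rest in segmentations(words[k:], table, max_len):
                yield [phrase] + rest

# Toy phrase table: source phrase -> (target phrase, score) options.
TABLE = {
    ("my",): [("ma", 0.7)],          # hypothetical entry
    ("house",): [("maison", 0.6)],
    ("my", "house"): [("ma maison", 0.9)],
}
```

For "my house" this yields two segmentations — the single phrase ("my", "house") and the two unigram phrases — which is exactly the ambiguity the decoder must score.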
The Pharaoh Model

Phrase Weights
Where do we get these counts?

Phrase-Based Decoding

Monotonic Word Translation
Cost is LM * TM
It's an HMM?
  P(e | e-1, e-2)
  P(f | e)
State includes:
  Exposed English
  Position in foreign
Dynamic program loop?
  [. a slap, 5] 0.00001
  [. slap to, 6] 0.00000016
  [. slap by, 6] 0.00000001

  for (eContext in allEContexts)
    score = scores[fPosition-1][eContext] * LM(eContext + eOption) * TM(eOption, fWord[fPosition])
    scores[fPosition][eContext[2] + eOption] = max score

Beam Decoding
For real MT models, this kind of dynamic program is a disaster (why?)
Standard solution is beam search: for each position, keep track of only the best k hypotheses

  for (eContext in bestEContexts[fPosition])
    score = scores[fPosition-1][eContext] * LM(eContext + eOption) * TM(eOption, fWord[fPosition])
    bestEContexts.maybeAdd(eContext[2] + eOption, score)

Still pretty slow... why?
Useful trick: cube pruning (Chiang 2005)

Phrase Translation
If monotonic, almost an HMM; technically a semi-HMM

  for (lastPosition < fPosition)
    for (eContext in eContexts)
      combine hypothesis for (lastPosition ending in eContext) with eOption

If distortion... now what?
Example from David Chiang
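The monotonic beam search in the pseudocode above can be sketched end to end in Python. A bigram LM context stands in for the slide's trigram context, and `tm`/`lm` are toy stand-ins for real model scores:

```python
import heapq

def beam_decode(fwords, tm, lm, beam_size=5):
    """Monotonic word-by-word beam decoder sketch.
    State = (English prefix, foreign position); score = product of
    TM and bigram-LM probabilities, as in 'cost is LM * TM'.
    tm: dict fword -> list of (eword, P(f|e)).
    lm(prev, word) -> conditional probability."""
    # each beam entry: (score, english_prefix)
    beams = [[(1.0, ("<s>",))]]
    for i, f in enumerate(fwords):
        candidates = []
        for score, prefix in beams[i]:
            for e, p_fe in tm.get(f, []):
                s = score * lm(prefix[-1], e) * p_fe
                candidates.append((s, prefix + (e,)))
        # beam pruning: keep only the best k hypotheses at this position
        beams.append(heapq.nlargest(beam_size, candidates))
    best_score, best = max(beams[-1])
    return best[1:], best_score
```

With `beam_size` small this trades exactness for speed — the slide's point that the full dynamic program over all exposed-English contexts is too expensive for real models.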
Non-Monotonic Phrasal MT

Pruning: Beams + Forward Costs
Problem: easy partial analyses are cheaper
Solution 1: use beams per foreign subset
Solution 2: estimate forward costs (A*-like)

The Pharaoh Decoder

Hypothesis Lattices

Word Alignment
  What is the anticipated cost of collecting fees under the new proposal?
  En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?
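"Solution 1" above — hypotheses compete only against others that have covered comparably much of the foreign sentence — is the idea behind Pharaoh-style stack decoding. A schematic skeleton (not Koehn's actual implementation; the `expand` callback and its signature are assumptions for illustration):

```python
from collections import defaultdict

def stack_decode(n_foreign, expand, beam_size=10):
    """Stack decoding sketch: one beam ('stack') per number of foreign
    words covered, so cheap short partial analyses cannot crowd out
    more complete ones. `expand(hyp, covered)` yields
    (new_hyp, n_newly_covered, step_log_score) tuples."""
    stacks = defaultdict(list)            # words covered -> hypotheses
    stacks[0] = [(0.0, "<start>")]        # (log score, hypothesis state)
    for covered in range(n_foreign):
        # prune: keep only the best k hypotheses in this stack
        stacks[covered] = sorted(stacks[covered], reverse=True)[:beam_size]
        for score, hyp in stacks[covered]:
            for new_hyp, n_new, step in expand(hyp, covered):
                stacks[covered + n_new].append((score + step, new_hyp))
    return max(stacks[n_foreign])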
Unsupervised Word Alignment
Input: a bitext: pairs of translated sentences
  nous acceptons votre opinion.
  we accept your view.
Output: alignments: pairs of translated words

1-to-Many Alignments
When words have unique sources, can represent as a (forward) alignment function a from French to English positions

Many-to-Many Alignments

IBM Model 1 (Brown et al., 93)
Alignments: a hidden vector called an alignment specifies which English source is responsible for each French target word.

IBM Models 1/2
  E: Thank(1) you(2) ,(3) I(4) shall(5) do(6) so(7) gladly(8) .(9)
  A: 1 3 7 6 8 8 8 8 9
  F: Gracias , lo haré de muy buen grado .

Model Parameters
  Emissions: P( F1 = Gracias | EA1 = Thank )
  Transitions: P( A2 = 3 )
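Since the alignment vector is hidden, IBM Model 1 is trained with EM: the E-step computes a posterior over which English word generated each French word, and the M-step re-estimates the emission table t(f | e) from the expected counts. A minimal sketch on a toy bitext (Model 1 ignores the transition/position model, which is what Model 2 adds):

```python
from collections import defaultdict

def ibm_model1(bitext, iterations=10):
    """EM for IBM Model 1: learn t(f|e) from sentence pairs, treating
    each French word's English source as a hidden variable.
    bitext: list of (english_tokens, french_tokens) pairs."""
    e_vocab = {e for es, _ in bitext for e in es}
    t = defaultdict(lambda: 1.0 / len(e_vocab))   # uniform init for t(f|e)
    for _ in range(iterations):
        count = defaultdict(float)                # expected counts c(e, f)
        total = defaultdict(float)                # expected counts c(e)
        for es, fs in bitext:
            for f in fs:
                # E-step: posterior over which English word generated f
                z = sum(t[(e, f)] for e in es)
                for e in es:
                    p = t[(e, f)] / z
                    count[(e, f)] += p
                    total[e] += p
        # M-step: re-normalize expected counts into probabilities
        for (e, f), c in count.items():
            t[(e, f)] = c / total[e]
    return t

bitext = [("the house".split(), "la maison".split()),
          ("the book".split(), "le livre".split()),
          ("a house".split(), "une maison".split())]
t = ibm_model1(bitext)
```

After a few iterations, co-occurrence statistics disambiguate the alignments: "maison" appears with both "the" and "house", but "house" appears in both of its sentence pairs, so t(maison | house) grows at the expense of t(maison | the).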