The decoder in statistical machine translation: how does it work?


The decoder in statistical machine translation: how does it work? Alexandre Patry RALI/DIRO Université de Montréal June 20, 2006 Alexandre Patry (RALI) The decoder in SMT June 20, 2006 1 / 42

Machine translation
The goal of machine translation is to create a system that translates a document without human intervention. Several paradigms have been proposed to solve this problem:
- Symbolic translation: human experts encode their knowledge in the system.
- Example-based translation: knowledge is acquired from a bilingual text (bitext) using basic statistics (similar to learning by analogy).
- Statistical machine translation: knowledge is acquired from a bitext using statistics.

Statistical machine translation
In statistical machine translation, we try to solve two problems:
- Modeling: the acquisition and type of knowledge.
- Decoding: the use of this knowledge to translate a new document.
This presentation focuses on the second problem, the one addressed by the decoder.

Overview
1. The traveler's decoder
2. Conceptual framework
3. mood
4. Implementing a phrase-based decoder
5. Experiments
6. Conclusion

Little story
A French-speaking traveler equipped with a bilingual dictionary walks into a New York store. While reading the price list, he encounters a line that he does not understand:
A sheet of paper ........ $0.25
We will look at a process this traveler could use to decode this strange sentence.

Resources inventory
Sentence to translate: A sheet of paper
Bilingual dictionary (source → target):
- A → Un, Une
- sheet → feuille, drap
- of → de, du
- paper → papier
The traveler's common sense: the traveler can intuitively evaluate the likelihood of a French sentence.

The traveler's decoder
With these resources in hand, the traveler can use the following algorithm to translate the sentence one word at a time:
1. Initialize the set of candidate translations H with an empty sentence.
2. While there are incomplete sentences in H:
   2.1. Pick the least completed translation h from H.
   2.2. For each possible translation δ of the next word to translate in h:
        2.2.1. Append δ to the end of h and store the result in h_copy.
        2.2.2. If h_copy is likely according to the traveler's intuition, add it to H.
3. Return the best candidate in H.
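The algorithm above can be sketched in Python as follows. The dictionary is the one from the resources inventory; the traveler's "common sense" is replaced by a hypothetical table of bigram plausibility scores with invented numbers, and the pruning threshold is likewise an assumption. A FIFO queue makes a least completed translation the next one picked, so the exploration is breadth-first.

```python
from collections import deque

# Bilingual dictionary from the slide: source word -> possible target words.
DICTIONARY = {
    "A": ["Un", "Une"],
    "sheet": ["feuille", "drap"],
    "of": ["de", "du"],
    "paper": ["papier"],
}

# Hypothetical stand-in for the traveler's common sense: invented bigram
# plausibility scores ("<s>" marks the sentence start). Unlisted bigrams score 0.
BIGRAM_SCORE = {
    ("<s>", "Un"): 0.5, ("<s>", "Une"): 0.5,
    ("Une", "feuille"): 0.9, ("Un", "drap"): 0.8,
    ("feuille", "de"): 0.8, ("feuille", "du"): 0.2,
    ("drap", "de"): 0.6, ("drap", "du"): 0.3,
    ("de", "papier"): 0.9, ("du", "papier"): 0.4,
}
THRESHOLD = 0.1  # hypothetical: candidates scoring below this are "unlikely"


def plausibility(words):
    """Product of bigram scores; e.g. 'Un feuille' gets 0 and is dismissed."""
    score = 1.0
    for prev, cur in zip(["<s>"] + words, words):
        score *= BIGRAM_SCORE.get((prev, cur), 0.0)
    return score


def traveler_decode(source):
    H = deque([[]])  # step 1: start with the empty sentence
    complete = []
    while H:  # step 2: FIFO order picks a least completed translation first
        h = H.popleft()
        if len(h) == len(source):
            complete.append(h)  # finished sentences are set aside
            continue
        for delta in DICTIONARY[source[len(h)]]:
            h_copy = h + [delta]
            if plausibility(h_copy) >= THRESHOLD:
                H.append(h_copy)
    best = max(complete, key=plausibility)  # step 3: the best candidate
    return " ".join(best)


print(traveler_decode(["A", "sheet", "of", "paper"]))  # Une feuille de papier
```

With these invented scores, Un feuille and Une drap are dismissed immediately, Une feuille du falls below the threshold, and the two surviving complete candidates are Une feuille de papier and Un drap de papier, of which the first wins.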

Search graph
[Figure: search graph for A sheet of paper]
The traveler concludes that the most likely translation is Une feuille de papier.

Search space complexity
A sentence of 10 words, each having 5 translations, can be translated into more than 9 million target sentences, and the corresponding search graph has more than 12 million vertices:
$$\text{translations} = 5^{10}, \qquad \text{vertices} = \sum_{i=0}^{10} 5^{i}$$
If we allow word reordering, the same sentence has more than 35,000 billion translations and its search graph contains more than 43,000 billion vertices:
$$\text{translations} = 10!\,5^{10}, \qquad \text{vertices} = \sum_{i=0}^{10} \binom{10}{i}\, i!\, 5^{i}$$
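The counts quoted above can be checked with a few lines of Python (`math.comb` needs Python 3.8 or later). The vertex formulas are those of the slide: one vertex per partial translation, i.e. per choice of which words are already translated, in which order, and with which translations.

```python
from math import comb, factorial


def monotone_counts(n: int, k: int):
    """n source words, k translations each, words translated in source order."""
    translations = k ** n                          # one choice per word
    vertices = sum(k ** i for i in range(n + 1))   # one vertex per translated prefix
    return translations, vertices


def reordering_counts(n: int, k: int):
    """Same, but the words may be translated in any order."""
    translations = factorial(n) * k ** n           # orderings x translation choices
    # which i words are done (comb), in which order (i!), with which translations (k**i)
    vertices = sum(comb(n, i) * factorial(i) * k ** i for i in range(n + 1))
    return translations, vertices


print(monotone_counts(10, 5))    # (9765625, 12207031)
print(reordering_counts(10, 5))
```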

Some mathematics
A decoder searches for the target document $t^*$ that has the highest probability of translating a given source document $s$:
$$t^* = \operatorname*{argmax}_{t \in T} \Pr(t \mid s)$$
This equation is hard to solve: it requires every possible target document in $T$ to be evaluated!

Some Greek
$$t^* = \operatorname*{argmax}_{t \in T} \Pr(t \mid s)$$
Can we translate a source document one step at a time?
$$t^* = \operatorname*{argmax}_{t \in T} \sum_{\delta_1^n \in \Delta(s,t)} \Pr(t, \delta_1^n \mid s)$$
where $\Delta(s, t)$ is the set of all the sequences of transformations $\delta_1^n = \delta_1, \ldots, \delta_n$ that can be applied to an initial target sentence to translate $s$ into $t$.

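More Greek
$$t^* = \operatorname*{argmax}_{t \in T} \sum_{\delta_1^n \in \Delta(s,t)} \Pr(t, \delta_1^n \mid s)$$
This equation is still hard to solve, so we redefine the problem by keeping only the most probable sequence of transformations for each target instead of summing over all of them:
$$\hat{t} = \operatorname*{argmax}_{t \in T} \max_{\delta_1^n \in \Delta(s,t)} \Pr(t, \delta_1^n \mid s)$$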

Sum vs. max
The most likely translation is Une feuille de papier (0.1 + 0.35 = 0.45 > 0.4).
The most likely sequence of transformations leads to the target sentence Un drap de papier (0.4 > 0.2 and 0.4 > 0.35).
We can't win every time!
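To make the difference between the two objectives concrete, here is a small Python sketch. The figure behind this slide is not available, so the list of derivations below is an assumption rebuilt from the probabilities quoted in the text: two transformation sequences yielding Une feuille de papier (0.1 and 0.35) and one yielding Un drap de papier (0.4). The 0.2 alternative is left out because its target sentence cannot be recovered.

```python
from collections import defaultdict

# Hypothetical derivations: (target sentence, probability of one transformation
# sequence), rebuilt from the numbers on the slide; the original figure is lost.
DERIVATIONS = [
    ("Une feuille de papier", 0.10),
    ("Une feuille de papier", 0.35),
    ("Un drap de papier", 0.40),
]


def best_by_sum(derivations):
    """argmax over t of the SUM over its derivations (the exact objective)."""
    totals = defaultdict(float)
    for target, p in derivations:
        totals[target] += p
    return max(totals, key=totals.get)


def best_by_max(derivations):
    """argmax over t of the MAX over its derivations (the redefined problem)."""
    best = defaultdict(float)
    for target, p in derivations:
        best[target] = max(best[target], p)
    return max(best, key=best.get)


print(best_by_sum(DERIVATIONS))  # Une feuille de papier
print(best_by_max(DERIVATIONS))  # Un drap de papier
```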

Simplification
Most decoders assume that the sentences of a document are independent of one another, so the decoder can translate each sentence individually. Shortcomings:
- A sentence cannot be omitted, merged with another one, repositioned or split by the decoder.
- The context of a sentence is not considered when it is translated.

The decoder's task
The task of the decoder is to use its knowledge and a density function to find the best sequence of transformations that can be applied to an initial target sentence to translate a given source sentence.
This problem can be reformulated as a classic AI problem: searching for the shortest path in an implicit graph.
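One way to see the shortest-path reformulation: if each edge (transformation) with probability p is given the cost -log p, the path with the smallest total cost is the sequence of transformations with the largest product of probabilities. The sketch below runs a uniform-cost (Dijkstra) search over a graph whose vertices are only generated on demand. The per-word probabilities are hypothetical, and scoring each word independently is a simplification, not how a real SMT model scores a translation.

```python
import heapq
import math
from itertools import count


def uniform_cost_search(start, successors, is_goal):
    """Dijkstra-style search in an implicit graph: vertices are only created when
    `successors(state)` yields (next_state, edge_cost) pairs, with edge_cost >= 0.
    Returns (total_cost, goal_state), or None if no goal is reachable."""
    tie = count()  # tie-breaker so that states are never compared directly
    frontier = [(0.0, next(tie), start)]
    settled = set()
    while frontier:
        cost, _, state = heapq.heappop(frontier)
        if state in settled:
            continue  # a cheaper path to this state was already expanded
        settled.add(state)
        if is_goal(state):
            return cost, state
        for nxt, edge_cost in successors(state):
            if nxt not in settled:
                heapq.heappush(frontier, (cost + edge_cost, next(tie), nxt))
    return None


# Hypothetical per-word translation probabilities for the traveler's sentence.
OPTIONS = {"A": {"Une": 0.6, "Un": 0.4}, "sheet": {"feuille": 0.7, "drap": 0.3},
           "of": {"de": 0.8, "du": 0.2}, "paper": {"papier": 1.0}}
SOURCE = ("A", "sheet", "of", "paper")


def successors(target):
    """Out-edges of a vertex: translate the next source word (cost = -log p)."""
    if len(target) < len(SOURCE):
        for word, p in OPTIONS[SOURCE[len(target)]].items():
            yield target + (word,), -math.log(p)


cost, best = uniform_cost_search((), successors, lambda t: len(t) == len(SOURCE))
print(" ".join(best), round(math.exp(-cost), 3))  # Une feuille de papier 0.336
```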

Challenges
Two independent problems must be solved in order to build a decoder:
- Model representation: the model defines what a transformation is and how to evaluate the quality of a translation.
- Search space exploration: enumerating all possible sequences of transformations is often impractical, so we must carefully select the ones that will be evaluated.

Model representation: Partial translation
A partial translation is a translation that is being transformed. It is composed of:
- the source sentence;
- the target sentence being built;
- a progress indicator that shows how to continue this translation.
The source and target sentences can be sequences of words, trees, non-contiguous sentences, ...
Example: in the traveler's decoder, a partial translation could be:
- source: A sheet of paper
- target: Une feuille
- progress indicator: the next word to translate is the third one.

Model representation: Transformation
A partial translation evolves when a transformation is applied to it. A transformation can take many forms:
- translation of one word;
- translation of several words;
- reordering of the children of a node in a tree.
Example: in the traveler's decoder, a transformation could be: add the word feuille at the end of the target sentence and update the progress indicator.

Model representation: Cost
A cost quantifies the quality of a partial translation. Usually, it evaluates at least:
- the likelihood of the transformations;
- the fluency of the target sentence generated so far;
- the word reordering that occurred.
The cost is used to identify the partial translations to dismiss and to select the best complete translation when the search ends.
Example: in the traveler's decoder, the cost was the traveler's common sense. It allowed him to dismiss unlikely partial translations like Un feuille and to prefer Une feuille de papier to Un drap de papier.

Model representation: Transformation generator
The transformation generator takes a partial translation as input and outputs the set of transformations that can be applied to it.
Example: in the traveler's decoder, the transformation generator indicates that the partial translation Une feuille can be transformed into Une feuille du or Une feuille de.

Search space exploration: Hypothesis
A hypothesis is made of a partial translation and a cost.

Search space exploration: Search strategy
The task of the search strategy is twofold:
- deciding the order in which the hypotheses are explored;
- identifying the hypotheses to dismiss (using the value of the cost).
Example: in the traveler's decoder, the search strategy was a breadth-first search in which the unlikely hypotheses were dismissed.

Putting it all together
- Each vertex is a hypothesis (partial translation and cost).
- Each edge corresponds to a transformation.
- The transformation generator enumerates the out-edges of each vertex.
- The search strategy defines how to explore the graph.
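These roles can be wired together in a few lines. The following Python sketch only illustrates the separation between model and search; it is not mood's actual C++ interface. The callbacks `generate`, `apply`, `cost`, `is_complete` and `prune` are hypothetical names, the search explores the graph level by level, and the toy probabilities for the traveler's sentence are invented.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class Hypothesis:
    partial: Any  # a vertex of the search graph: a partial translation...
    cost: float   # ...together with its cost (lower is better)


def decode(initial: Any,
           generate: Callable[[Any], Iterable[Any]],  # transformation generator: out-edges
           apply: Callable[[Any, Any], Any],          # follow an edge (apply a transformation)
           cost: Callable[[Any], float],
           is_complete: Callable[[Any], bool],
           prune: Callable[[List[Hypothesis]], List[Hypothesis]]) -> Hypothesis:
    """Model parts (generate, apply, cost) are kept apart from the search strategy,
    here a level-by-level exploration whose dismissal rule is `prune`."""
    frontier = [Hypothesis(initial, cost(initial))]
    finished: List[Hypothesis] = []
    while frontier:
        children = []
        for hyp in frontier:
            if is_complete(hyp.partial):
                finished.append(hyp)
                continue
            for t in generate(hyp.partial):
                child = apply(hyp.partial, t)
                children.append(Hypothesis(child, cost(child)))
        frontier = prune(children)
    return min(finished, key=lambda h: h.cost)


# Toy instantiation for the traveler's sentence (probabilities are hypothetical).
SOURCE = ("A", "sheet", "of", "paper")
OPTIONS = {"A": ["Un", "Une"], "sheet": ["feuille", "drap"],
           "of": ["de", "du"], "paper": ["papier"]}
PROB = {"Un": 0.4, "Une": 0.6, "feuille": 0.7, "drap": 0.3,
        "de": 0.8, "du": 0.2, "papier": 1.0}

best = decode(
    initial=(),
    generate=lambda words: OPTIONS[SOURCE[len(words)]],
    apply=lambda words, w: words + (w,),
    cost=lambda words: -sum(math.log(PROB[w]) for w in words),
    is_complete=lambda words: len(words) == len(SOURCE),
    prune=lambda hyps: sorted(hyps, key=lambda h: h.cost)[:2],  # 2 best per level
)
print(" ".join(best.partial))  # Une feuille de papier
```

Swapping `prune` or the exploration order changes the search strategy without touching the model callbacks, which is the kind of reuse the modular decomposition aims for.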

mood
What is mood?
- An acronym for Modular Object-Oriented Decoder.
- An architecture decomposing a decoder into six reusable modules.
- A C++ object-oriented framework to create decoders.
- A project that is freely available (free as in speech and as in beer) under the GPL license.
Why mood?
- To ease the development of new decoders.
- To give us a tool for research in statistical machine translation.

Big picture of mood
[Diagram of mood's architecture: a Model part containing Cost, PartialTranslation, Transformation, TransformationGenerator and Hypothesis; a Search part containing SearchStrategy, which uses :Cost, :Transformation, :Translation, :TransformationGenerator and :Hypothesis.]

ramses
To see whether mood can be used successfully, we used it to create ramses, a new implementation of pharaoh (Koehn, 2004), a popular state-of-the-art decoder. pharaoh uses a phrase-based model: one transformation can translate a sequence of contiguous words.
Example: a phrase-based model can have rules like:
- red herring → distraction
- house of commons → chambre des communes

ramses: Partial translation
A partial translation is made of:
- source: a sequence of words;
- target: a sequence of words;
- progress indicator: a mask indicating the words that have been translated so far, and the position of the next word to translate for the translation to stay monotone.
Example:
- source: what a wonderful world
- progress: 1101, next=5
- target: quel monde

ramses: Transformation
A transformation is composed of a rule and the position at which this rule applies. A rule translates a sequence of source words into a sequence of target words with a certain probability.
Example:
- rule: what a → quel, with probability 0.3
- position: 1
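A minimal Python sketch of the ramses partial translation and transformation described above, assuming 1-based positions as on the slides. The class and field names (`Rule`, `Transformation`, `PartialTranslation`, `next_pos`) are hypothetical, not ramses's own. `apply` appends the rule's target words to the end of the target sentence, marks the covered source words in the mask, and sets the monotone continuation point just after the translated phrase. Replaying the rules what a → quel (position 1) and world → monde (position 4) reproduces the partial translation of the previous example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rule:
    source: Tuple[str, ...]  # contiguous source phrase
    target: Tuple[str, ...]  # its translation
    prob: float


@dataclass(frozen=True)
class Transformation:
    rule: Rule
    position: int  # 1-based position of the first source word the rule covers


@dataclass(frozen=True)
class PartialTranslation:
    source: Tuple[str, ...]
    target: Tuple[str, ...]
    covered: Tuple[bool, ...]  # mask of the source words translated so far
    next_pos: int              # 1-based position that keeps the translation monotone

    def mask(self) -> str:
        return "".join("1" if c else "0" for c in self.covered)


def initial(source: Tuple[str, ...]) -> PartialTranslation:
    return PartialTranslation(source, (), (False,) * len(source), 1)


def apply(pt: PartialTranslation, t: Transformation) -> PartialTranslation:
    start = t.position - 1  # 0-based slice bounds
    end = start + len(t.rule.source)
    if pt.source[start:end] != t.rule.source:
        raise ValueError("rule does not match the source at this position")
    if any(pt.covered[start:end]):
        raise ValueError("some of these source words are already translated")
    covered = pt.covered[:start] + (True,) * (end - start) + pt.covered[end:]
    # Append the target phrase; the monotone continuation is right after the phrase.
    return PartialTranslation(pt.source, pt.target + t.rule.target, covered, end + 1)


pt = initial(("what", "a", "wonderful", "world"))
pt = apply(pt, Transformation(Rule(("what", "a"), ("quel",), 0.3), 1))
pt = apply(pt, Transformation(Rule(("world",), ("monde",), 0.7), 4))
print(pt.mask(), pt.next_pos, " ".join(pt.target))  # 1101 5 quel monde
```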

ramses: Cost
The cost used by ramses is a weighted sum of:
- Sum of the log-probabilities of the rules applied so far: evaluates the likelihood of the transformation sequence.
- Language model: evaluates the fluency of the target sentence.
- Distortion: penalizes the word reordering between the source and the target sentence.
- Length penalty: controls the length of the generated target sentences.
- Heuristic: estimates the cost needed to complete the translation.
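The weighted sum can be sketched as follows. This is an illustration under assumptions, not ramses's code: the distortion feature uses the usual distance-based definition (the jump |start_i - end_(i-1) - 1| between consecutive source phrases), the language model and the heuristic are a hypothetical callback and number, and the weights are invented. Scores here are higher-is-better log quantities; the cost a decoder minimizes is simply their negation.

```python
import math
from typing import Callable, Dict, List, Tuple

# One applied rule: (first source position, last source position, probability,
# target words), with 1-based inclusive positions.
Step = Tuple[int, int, float, Tuple[str, ...]]


def distortion(steps: List[Step]) -> int:
    """Distance-based reordering: sum of |start_i - end_(i-1) - 1|, 0 if monotone."""
    total, prev_end = 0, 0
    for start, end, _, _ in steps:
        total += abs(start - prev_end - 1)
        prev_end = end
    return total


def features(steps: List[Step], lm_logprob: Callable[[List[str]], float],
             heuristic: float) -> Dict[str, float]:
    target = [w for _, _, _, words in steps for w in words]
    return {
        "tm": sum(math.log(p) for _, _, p, _ in steps),  # likelihood of the rules
        "lm": lm_logprob(target),        # fluency of the target so far
        "d": -float(distortion(steps)),  # reordering is penalized
        "lp": float(len(target)),        # length; the sign of its weight decides
        "h": heuristic,                  # estimate for the words still to translate
    }


def weighted_score(feats: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the features (higher is better); the cost is its negation."""
    return sum(weights[name] * value for name, value in feats.items())


def toy_lm(words: List[str]) -> float:
    return -1.0 * len(words)  # hypothetical language model


# "what a" -> quel (positions 1-2), then "world" -> monde (position 4): one word skipped.
steps = [(1, 2, 0.3, ("quel",)), (4, 4, 0.7, ("monde",))]
weights = {"tm": 1.0, "lm": 0.5, "d": 0.3, "lp": -0.1, "h": 1.0}  # hypothetical
feats = features(steps, toy_lm, heuristic=-1.2)                     # hypothetical estimate
print(distortion(steps), round(weighted_score(feats, weights), 3))  # 1 -4.261
```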

ramses: Transformation generator
The transformation generator returns all the transformations that translate a sequence of words that have not yet been translated. We can restrict the search space by limiting the number of source words that can be skipped between two consecutive transformations.

ramses: Transformation generator (example)
With the following partial translation:
- source: what a wonderful world
- progress: 1100, next=3
- target: quel
the transformation generator could return:
- rule: wonderful → merveilleux, with probability 0.3, position 3
- rule: wonderful → splendide, with probability 0.1, position 3
- rule: world → monde, with probability 0.7, position 4
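The transformation generator can be sketched in Python. Assumptions: the phrase table is a plain dictionary holding only the rules shown in the examples, `max_jump` bounds the distance between the start of a new phrase and the monotone continuation point (one possible reading of "limiting the number of source words that can be skipped"), and `max_phrase_len` is a hypothetical cap on phrase length. With the progress indicator 1100, next=3, it returns exactly the three transformations listed above.

```python
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Phrase = Tuple[str, ...]
PhraseTable = Dict[Phrase, List[Tuple[Phrase, float]]]


def generate(source: Sequence[str], covered: Sequence[bool], next_pos: int,
             table: PhraseTable, max_jump: Optional[int] = None,
             max_phrase_len: int = 3) -> Iterator[Tuple[Phrase, Phrase, float, int]]:
    """Yield (source phrase, target phrase, probability, 1-based position) for every
    rule that translates a contiguous run of words not translated yet."""
    n = len(source)
    for start in range(n):
        position = start + 1
        if max_jump is not None and abs(position - next_pos) > max_jump:
            continue  # too far from the monotone continuation point
        for end in range(start + 1, min(n, start + max_phrase_len) + 1):
            if any(covered[start:end]):
                break  # longer spans from this start would overlap as well
            phrase = tuple(source[start:end])
            for target, prob in table.get(phrase, []):
                yield phrase, target, prob, position


# Hypothetical phrase table containing the rules used on the slides.
TABLE: PhraseTable = {
    ("what", "a"): [(("quel",), 0.3)],
    ("wonderful",): [(("merveilleux",), 0.3), (("splendide",), 0.1)],
    ("world",): [(("monde",), 0.7)],
}
source = "what a wonderful world".split()
covered = [True, True, False, False]  # progress 1100, next=3, target "quel"
for transformation in generate(source, covered, next_pos=3, table=TABLE, max_jump=2):
    print(transformation)
```

Setting `max_jump=0` in this sketch would keep only the two wonderful rules, i.e. a strictly monotone continuation.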

ramses: Search strategy
ramses uses a beam search strategy:
- There are N + 1 stacks of hypotheses (where N is the number of source words).
- A hypothesis in which x words are already translated is stored in the x-th stack.
- Stacks are visited in order.
- Each stack is pruned independently of the others.
[Figure: N + 1 stacks labeled 0 words translated, 1 word translated, 2 words translated, ..., N words translated, each holding hypotheses (H1, Hi, Hj, Hk, Hl, ..., Hz).]
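A compact Python sketch of this stack-based beam search follows. It is a simplification under stated assumptions: the phrase table and its probabilities are hypothetical, each hypothesis is scored only by rule log-probabilities plus a distance-based distortion penalty (no language model, length penalty or heuristic), and pruning keeps the `beam_size` best hypotheses of each stack (histogram pruning). Positions are 0-based inside the code.

```python
import math
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]

# Hypothetical phrase table: source phrase -> [(target phrase, probability)].
TABLE: Dict[Phrase, List[Tuple[Phrase, float]]] = {
    ("what", "a"): [(("quel",), 0.3)],
    ("what",): [(("que",), 0.2)],
    ("a",): [(("un",), 0.5)],
    ("wonderful",): [(("merveilleux",), 0.3), (("splendide",), 0.1)],
    ("world",): [(("monde",), 0.7)],
    ("wonderful", "world"): [(("monde", "merveilleux"), 0.4)],
}


def beam_search(source, table, beam_size=3, max_phrase_len=2, distortion_weight=0.5):
    n = len(source)
    # stacks[x] holds hypotheses with exactly x source words translated.
    # Hypothesis = (score, coverage mask, 0-based monotone next position, target words).
    stacks: List[list] = [[] for _ in range(n + 1)]
    stacks[0].append((0.0, (False,) * n, 0, ()))
    for x in range(n):  # stacks are visited in order
        # Each stack is pruned on its own: keep its beam_size best hypotheses.
        stacks[x] = sorted(stacks[x], key=lambda h: h[0], reverse=True)[:beam_size]
        for score, covered, next_pos, target in stacks[x]:
            for start in range(n):
                for end in range(start + 1, min(n, start + max_phrase_len) + 1):
                    if any(covered[start:end]):
                        break
                    for tgt, prob in table.get(tuple(source[start:end]), []):
                        new_score = (score + math.log(prob)
                                     - distortion_weight * abs(start - next_pos))
                        new_covered = covered[:start] + (True,) * (end - start) + covered[end:]
                        # Covering end - start more words moves it to a later stack.
                        stacks[x + end - start].append(
                            (new_score, new_covered, end, target + tgt))
    best = max(stacks[n], key=lambda h: h[0])
    return " ".join(best[3]), best[0]


translation, score = beam_search("what a wonderful world".split(), TABLE)
print(translation, round(math.exp(score), 3))  # quel monde merveilleux 0.12
```

Because every hypothesis in a stack has translated the same number of words, their scores are comparable, which is what makes pruning each stack independently reasonable; a real decoder would also add the heuristic so that hypotheses covering easy and hard words compete fairly.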

WMT 06 (Koehn and Monz, 2006)
- Europarl corpus (Koehn, 2005)
- 6 translation directions: (fr, es, de) ↔ en
- http://www.statmt.org/wmt06
- Number of sentence pairs: train 700,000; dev 500 (of 2,000); test 2,000
- An open setting for testing new ideas and fairly comparing different translation systems

A pairwise comparison of ramses and pharaoh
- Same language and translation models (obtained using SRILM, GIZA++ and the tools available at http://www.statmt.org).
- Same function to maximize: a weighted sum of 8 features,
$$\lambda_{lp}\,\text{length penalty} + \lambda_{lm}\,\text{language model} + \lambda_{d}\,\text{distortion} + \sum_{i=1}^{5} \lambda_i\,\text{score of the } i\text{th translation table}$$
- Separate tuning of the 8 coefficients using (Och, 2003) (maximization of the bleu score on the dev corpus using a smart grid search).
- Automatic evaluation using bleu.

bleu score
The translations produced by ramses and pharaoh were evaluated using the bleu score:
$$\text{bleu} = BP \cdot \exp\left(\frac{1}{4} \sum_{n=1}^{4} \log p_n\right), \qquad BP = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases}$$
where
- c is the number of words in the target (candidate) document,
- r is the number of words in the reference document,
- p_n is the proportion of target n-grams that also appear in the reference (each n-gram counted at most as often as it occurs in the reference).
The bleu score is a value between 0 and 1 (often reported multiplied by 100), and a higher score is better.
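The formula translates directly into code. This sketch computes single-reference bleu for one tokenized segment; corpus-level bleu, as used for the evaluation here, accumulates the n-gram counts and lengths over all sentences before applying the same formula. Clipped counts are used for p_n and no smoothing is applied, so any p_n = 0 yields a score of 0.

```python
import math
from collections import Counter
from typing import List


def ngrams(words: List[str], n: int) -> Counter:
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def bleu(candidate: List[str], reference: List[str], max_n: int = 4) -> float:
    """Single-reference, single-segment bleu following the formula above."""
    log_p_sum = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        # Clipping: an n-gram is matched at most as often as it occurs in the reference.
        matches = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or matches == 0:
            return 0.0  # log p_n is undefined; unsmoothed bleu is 0
        log_p_sum += math.log(matches / total)
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)  # only short candidates are penalized
    return bp * math.exp(log_p_sum / max_n)


reference = "the cat sat on the mat".split()
print(bleu(reference, reference))      # 1.0
print(bleu(reference[:5], reference))  # all p_n = 1, so only BP = exp(1 - 6/5) remains
```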

bleu results (bleu and p1 to p4 in %, BP as a factor)

| Direction | Decoder | bleu | p1 | p2 | p3 | p4 | BP |
|---|---|---|---|---|---|---|---|
| German → English | pharaoh | 25.15 | 61.19 | 31.32 | 18.53 | 11.61 | 0.99 |
| | ramses | 24.49 | 61.06 | 30.75 | 17.73 | 10.81 | 1.00 |
| Spanish → English | pharaoh | 30.65 | 64.10 | 36.52 | 23.70 | 15.91 | 1.00 |
| | ramses | 30.48 | 64.08 | 36.30 | 23.52 | 15.76 | 1.00 |
| French → English | pharaoh | 30.42 | 64.28 | 36.45 | 23.39 | 15.64 | 1.00 |
| | ramses | 30.43 | 64.58 | 36.59 | 23.54 | 15.73 | 0.99 |
| English → German | pharaoh | 18.03 | 52.77 | 22.70 | 12.45 | 7.25 | 0.99 |
| | ramses | 18.14 | 53.38 | 23.15 | 12.75 | 7.47 | 0.98 |
| English → Spanish | pharaoh | 29.40 | 61.86 | 35.32 | 22.77 | 15.02 | 1.00 |
| | ramses | 28.75 | 62.23 | 35.03 | 22.32 | 14.58 | 0.99 |
| English → French | pharaoh | 30.96 | 61.10 | 36.56 | 24.49 | 16.80 | 1.00 |
| | ramses | 31.79 | 61.57 | 37.38 | 25.30 | 17.53 | 1.00 |

Looking back at WMT 06
Main results highlighted at WMT 06:
- The baseline phrase-based system is not far from the best systems.
- The quality of the translations produced by SMT systems clearly drops when translating out-of-domain corpora.

Discussion
- mood can be used to create real-life decoders.
- If the features of pharaoh suit your needs, then pharaoh is preferable to ramses.
- ramses and mood are good contenders for research.

Conclusion
In brief:
- A decoder searches for the best sequence of transformations that translates a source sentence.
- A decoder can be divided into two independent parts:
  - a model representation (transformations, partial translations, cost and transformation generator);
  - a search strategy that defines the order in which the hypotheses are explored, as well as a pruning strategy.
- mood is a modular open-source framework that can be used to implement new decoders.
- ramses produces translations as good as pharaoh's, but is open source.

Future work
Research:
- We can probably do better than a weighted sum.
- See how the context of a sentence can be used.
- Phrase-based models overfit; see if we can do better.
Future work for mood:
- Write a programmer manual.
- Add new decoders to mood.
- Speed up ramses.

ramses in action
Source text (English):
The decoder in statistical machine translation: how does it work? A statistical machine translation system translates a source document by the target document having the highest probability to translate it. Such a system is made of a model and of a decoder. The model computes the probability that a document translates another one and the decoder uses the model to find the target document having the highest probability to translate a source document. In this presentation, I will explain how a state-of-the-art decoder for a phrase-based model works and I will present MOOD, a framework to develop such a decoder

Output of ramses (French):
Dans le décodeur statistique machine traduction : comment cela se passe-t-il? Un système statistique machine traduction se traduit par l'objectif d'avoir la plus haute probabilité de traduire une source document document. Un tel système est faite d'un modèle et d'un décodeur. Le modèle computes la probabilité qu'un document traduit une autre et le décodeur utilise le modèle à l'objectif d'avoir la plus haute probabilité de traduire une source document document. Dans cette présentation, je vais vous expliquer comment une pointe d'un décodeur phrase-based modèle fonctionne et je présenterai humeur, un cadre à développer un tel décodeur.

Thank you!
http://smtmood.sourceforge.net