Comparative Study of Word Alignment Heuristics and Phrase-Based SMT


Hua Wu and Haifeng Wang
Toshiba (China) Research and Development Center
5/F., Tower W2, Oriental Plaza, No. 1, East Chang An Ave., Dong Cheng District, Beijing, 100738, China
{wuhua, wanghaifeng}@rdc.toshiba.com.cn

Abstract

This paper comparatively analyzes six different word alignment heuristics and their impacts on translation quality. We also propose a method to filter the noise in the phrase tables extracted by these heuristic methods and examine the effectiveness of combining the methods. Experiments are performed on the Europarl corpus, where a multilingual in-domain training corpus, an in-domain test set, and an out-of-domain test set are available. Results indicate that (1) the heuristics show similar tendencies in the word alignment task on both test sets, but they perform differently in the translation task on the in-domain and out-of-domain test sets; (2) in general, the relationship between word alignment and machine translation performance is difficult to predict, depending on the domains of the training and testing corpora besides other factors such as evaluation metrics and the characteristics of translation systems; (3) noise filtering and combination of these heuristic methods achieve larger improvement on the out-of-domain test set than on the in-domain test set.

Introduction

Word or phrase alignment plays a crucial role in statistical machine translation (SMT). During training, the SMT systems produce alignments between words or phrases of existing examples to estimate the statistical parameters. With these estimated parameters, the SMT systems translate source sentences into target sentences. Current state-of-the-art models in machine translation are based on alignments between phrases (Koehn et al., 2003; Chiang, 2005). Phrase-based generative models were first proposed by Marcu and Wong (2002) to extract phrase pairs. Zhao and Waibel (2005) also proposed several generative models to generate phrase pairs for machine translation. An alternative is to first generate word alignments. Phrase alignments are then inferred heuristically from these word alignments (Och et al., 1999; Koehn et al., 2003). DeNero et al. (2006) showed in their experiments that the heuristic methods outperform the generative models. Their analysis indicates that the performance gap stems primarily from the segmentation variable in the generative model, which increases the possibility of overfitting during training.

Recently, several studies have explored the relationship between word alignment quality measures and machine translation quality. The main points are summarized as follows.

1. It is difficult to find a direct correlation between word alignment measures (such as alignment error rate) and automated MT metrics (Ayan and Dorr, 2006; Fraser and Marcu, 2006).
2. Large gains in alignment performance under any metric are confirmed to achieve relatively small gains in translation performance (Lopez and Resnik, 2006).
3. Better feature mining can lead to substantial gains in translation quality (Lopez and Resnik, 2006).
4. It is better to generate alignments adapted to the characteristics of the translation models that will make use of this alignment information (Vilar et al., 2006).

However, all of the above conclusions are made on in-domain test sets and never on out-of-domain test sets. In addition, although Lopez and Resnik (2006) pointed out that it may be more useful to handle noise in phrase extraction than to improve word alignment quality, they did not provide detailed information to verify this point.

In this paper, we use different heuristics to generate word alignments, and examine the impacts of these heuristics on machine translation quality. We then re-evaluate the relationship between word alignment and machine translation quality on both in-domain and out-of-domain test sets. Furthermore, in order to examine the noise in the phrase pairs extracted using different alignment heuristics, we propose a method to filter the noise in the phrase tables using association measures. We also investigate whether combining the phrase tables extracted by different heuristics improves translation quality.

We performed experiments on the Europarl corpus (Koehn, 2005; Koehn and Monz, 2006), where a multilingual in-domain training corpus, an in-domain test set, and an out-of-domain test set are available. We obtained the following results:

1. Word alignment results show that the compromise method, which makes a compromise between precision and recall, performs the best on both in-domain and out-of-domain alignment test sets.
2. Translation results indicate that the heuristic methods perform differently on the in-domain and out-of-domain test sets. On the in-domain test set, the recall-oriented heuristic methods yield better translation quality. On the out-of-domain test set, the precision-oriented heuristic methods yield better translation quality. On both of the test sets, the compromise method achieves satisfactory translation quality.
3. The relationship between word alignment and machine translation performance depends on the domains of the training and testing corpora besides other factors such as evaluation metrics and the characteristics of the translation systems used.
4. Filtering the noise in the phrase tables and combining different phrase tables achieve larger improvement on the out-of-domain test set than on the in-domain test set.

The remainder of this paper is organized as follows. First, we describe phrase-based machine translation and the corresponding word alignment heuristics used in this paper. Then we propose a method to filter the noisy phrase pairs. Following this, we propose methods to combine the phrase pairs extracted by different methods. After that, we present the experimental results. Lastly, we conclude this paper.

Phrase-Based Statistical Machine Translation

In phrase-based SMT systems, the unit of translation is any contiguous sequence of words, which is called a phrase. A phrase-based system involves two steps: training and translation. During training, a parallel corpus is employed to induce phrase alignments in the sentence pairs and to estimate translation probabilities. A target monolingual corpus is employed to train a language model. During translation, the source sentence is first segmented into phrases and then translated into target phrases using the learned phrase pairs. The target phrases are then recombined to form a target sentence.

Log-Linear Model

Given a source sentence f, the best target translation e_best can be obtained according to the following log-linear model:

$$ e_{best} = \arg\max_{e} p(e \mid f) = \arg\max_{e} \sum_{m=1}^{M} \lambda_m h_m(e, f) \qquad (1) $$

where h_m(e, f) represents the feature functions and λ_m is the weight assigned to the corresponding feature function. In this paper, we use the Pharaoh system (Koehn, 2004). Eight different features are used in this system:

1. a phrase translation probability
2. an inverse phrase translation probability
3. a lexical weight, measuring the quality of word alignment inside the phrase pair
4. an inverse lexical weight
5. language model
6. phrase penalty
7. word penalty
8. reordering

For phrase translation probability, lexical weight, and reordering, we use the same models as Koehn et al. (2003). We use n-grams for language modelling. For the phrase penalty and word penalty, we use the same heuristics as Zens and Ney (2004).

Word Alignment Heuristics

One important component used in the Pharaoh system is the phrase translation table. Since DeNero et al. (2006) showed in their experiments that the heuristic methods outperform the generative models for phrase pair extraction, we use heuristic methods in this paper.
We first align the words in the training parallel corpus, extract phrase pairs that are consistent with the word alignments, and then assign probabilities to the obtained phrase pairs.

Word alignments are obtained by using the GIZA++ toolkit¹ in both translation directions and then symmetrizing the two alignments. In the statistical translation models implemented in GIZA++, only one-to-one and more-to-one word alignment links can be found. Thus, some multi-word units cannot be correctly aligned. The symmetrization method is used to effectively overcome this deficiency (Och and Ney, 2003). In this paper, we use six kinds of symmetrization methods. Let A1 and A2 represent the two alignments in the source-to-target and target-to-source translation directions; the six symmetrization methods can be described as follows.

1. intersection: A = A1 ∩ A2
2. union: A = A1 ∪ A2
3. grow: the alignment points in the intersection of the two alignments are added first. Then the neighboring alignment points in the union that lie directly to the left, right, top, or bottom are added.
4. grow-diag: besides the neighboring points in the grow method, the diagonally neighboring alignment points are also included.
5. grow-diag-final: in addition to the alignment points in grow-diag, the non-neighboring alignment points between words, of which at least one is currently unaligned, are added in a final step.
6. grow-final: in addition to the alignment points in grow, the non-neighboring alignment points between words, of which at least one is currently unaligned, are added in a final step.

Phrase Extraction

With the word alignment results obtained by the above six heuristic methods, we extract phrase pairs that satisfy the following restrictions:

1. all source words within a phrase are aligned only to target words within the phrase
2. all target words within a phrase are aligned only to source words within the phrase

More formally, the set of bilingual phrases consistent with a word alignment A is defined as

$$ BP(f_1^J, e_1^I, A) = \{ (f_j^{j+m}, e_i^{i+n}) : \forall (i', j') \in A : j \le j' \le j+m \leftrightarrow i \le i' \le i+n \} \qquad (2) $$

The phrase translation probability is defined as

$$ p(f \mid e) = \frac{count(f, e)}{\sum_{f'} count(f', e)} \qquad (3) $$

where count(f, e) is the frequency with which the phrase f is aligned with the phrase e in the parallel corpus.
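As an illustration of the symmetrization heuristics and of the consistency criterion in equation (2), the following is a minimal Python sketch, not the implementation used in the experiments. Alignments are assumed to be sets of (source position, target position) links. The function names `symmetrize` and `extract_phrases`, the `max_len` limit, and the toy alignments are hypothetical. The growing step assumes the commonly used rule that a neighboring point of the union is added only if it links a word that is still unaligned, a condition the description above does not spell out. The extraction step returns only the tightest target span for each source span.

```python
def symmetrize(a1, a2, method="grow-diag"):
    """Combine two directional alignments given as sets of (src, tgt) links."""
    inter, union = a1 & a2, a1 | a2
    if method == "intersection":
        return set(inter)
    if method == "union":
        return set(union)
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # grow: direct neighbors
    if method == "grow-diag":
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # plus diagonal neighbors
    elif method != "grow":
        raise ValueError("unknown method: " + method)
    aligned = set(inter)
    changed = True
    while changed:                                      # repeat until nothing is added
        changed = False
        for s, t in sorted(aligned):
            for ds, dt in steps:
                cand = (s + ds, t + dt)
                if cand not in union or cand in aligned:
                    continue
                src_free = all(cand[0] != x for x, _ in aligned)
                tgt_free = all(cand[1] != y for _, y in aligned)
                if src_free or tgt_free:                # assumed condition (see lead-in)
                    aligned.add(cand)
                    changed = True
    return aligned


def extract_phrases(alignment, src_len, max_len=3):
    """Return ((s1, s2), (t1, t2)) span pairs consistent with the alignment."""
    pairs = set()
    for s1 in range(src_len):
        for s2 in range(s1, min(s1 + max_len, src_len)):
            tgt = [t for s, t in alignment if s1 <= s <= s2]
            if not tgt:
                continue                                # no link anchors this span
            t1, t2 = min(tgt), max(tgt)
            if t2 - t1 >= max_len:
                continue
            # Consistency: every link touching the target span stays in the source span.
            if all(s1 <= s <= s2 for s, t in alignment if t1 <= t <= t2):
                pairs.add(((s1, s2), (t1, t2)))
    return pairs


a1 = {(0, 0), (1, 1), (3, 3)}   # toy source-to-target links (hypothetical)
a2 = {(0, 0)}                   # toy target-to-source links (hypothetical)
grow = symmetrize(a1, a2, "grow")
grow_diag = symmetrize(a1, a2, "grow-diag")
phrases = extract_phrases(grow_diag, src_len=4)
```

In this toy case the link (1, 1) is reachable from the intersection only diagonally, so it is added by grow-diag but not by grow, while the isolated link (3, 3) would only be picked up by a final step or by the union.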
Given a phrase pair (f, e) and a word alignment a between the source word positions i = 1, ..., n and the target word positions j = 1, ..., m, the lexical weight can be estimated according to the following method (Koehn et al., 2003):

$$ p_w(f \mid e, a) = \prod_{i=1}^{n} \frac{1}{|\{ j \mid (i, j) \in a \}|} \sum_{(i, j) \in a} w(f_i \mid e_j) \qquad (4) $$

¹ It is located at http://www.fjoch.com/GIZA++.html.

Noise Filtering of Phrase Pairs

Phrase translation probability and lexical weight are important features in the phrase translation table. Lopez and Resnik (2006) found that alignment quality has little impact on the lexical weighting feature, which itself provides only a modest improvement in translation quality. Thus, we only filter the phrase pairs using phrase translation statistics. Although the phrase translation probability described in equation (3) can be used to filter the phrase table, translation probability usually overestimates the infrequently occurring pairs. In order to solve this problem, we use association measures to filter some phrase pairs. Dunning (1993) showed that the log likelihood ratio performs very well on infrequently occurring data. Thus, we calculate the log likelihood ratio for each phrase pair. First we construct a contingency table as shown in Table 1.

                  Target phrase   ~Target phrase   Totals
Source phrase     n11             n12              R1
~Source phrase    n21             n22              R2
Totals            C1              C2               N

Table 1. Contingency Table for Phrase Pairs

According to the contingency table, the log likelihood ratio for each phrase pair is defined as

$$ G^2(f, e) = -2 \log \lambda = 2 \sum_{i,j} n_{ij} \log \frac{n_{ij} N}{R_i C_j} \qquad (5) $$

Each source phrase may be translated into one or more target phrases. For these phrase pairs, we can obtain the maximum log likelihood value as follows:

$$ Max(f) = \max_{e} G^2(f, e) \qquad (6) $$

Compared with the maximum value, we can get a relative value as described in (7). We only keep those phrase pairs whose relative values are larger than a threshold.²

$$ Ratio(f, e) = \frac{G^2(f, e)}{Max(f)} \qquad (7) $$

² This threshold is determined on a development set.

Model Combination

Model Interpolation

To combine the different phrase tables, we use the linear interpolation method in this paper. For the phrase translation probability and lexical weight in the translation models, we interpolate them as shown in equations (8) and (9).

$$ p(f \mid e) = \sum_{i=1}^{n} \alpha_i \, p_i(f \mid e) \qquad (8) $$

$$ p_w(f \mid e, a) = \sum_{i=1}^{n} \beta_i \, p_{w,i}(f \mid e, a) \qquad (9) $$

where p_i(f | e) and p_{w,i}(f | e, a) (i = 1, ..., n) are the phrase translation probability and lexical weight estimated by the different methods, and α_i and β_i are interpolation coefficients, ensuring $\sum_{i=1}^{n} \alpha_i = 1$ and $\sum_{i=1}^{n} \beta_i = 1$.

Count Merging

Another way to combine the phrase pairs extracted by different methods is to use the count merging method, which is widely used in language modeling (Bacchiani and Roark, 2003; Bacchiani et al., 2004). The main idea of count merging is to assign weights to the occurrence counts of phrase pairs, and then merge them to build translation models. The method to estimate the translation probability is shown in equation (10).

$$ p(f \mid e) = \frac{\sum_{i=1}^{n} \alpha_i \, count_i(f, e)}{\sum_{i=1}^{n} \alpha_i \sum_{f'} count_i(f', e)} \qquad (10) $$

where count_i(f, e) is the frequency with which the phrase f is aligned with the phrase e by the i-th method, and α_i is the weight assigned to the corresponding method. For the lexical weight, we first get the lexical translation probability as shown in (11), and then calculate the lexical weight as shown in equation (4). When calculating the lexical weight, the word alignment information can be set as the union of the alignments involved.

$$ w(f \mid e) = \frac{\sum_{i=1}^{n} \beta_i \, count_i(f, e)}{\sum_{i=1}^{n} \beta_i \sum_{f'} count_i(f', e)} \qquad (11) $$

where count_i(f, e) is the frequency with which the word f is aligned with the word e by the i-th method, and β_i is the weight assigned to the corresponding method.

Experiments on Word Alignment and Translation

This section first describes the word alignment and translation results, and then analyzes the relationship between word alignment methods and machine translation quality.

Corpus Description

Translation Data

A shared task to evaluate machine translation performance was organized as part of the NAACL/HLT 2006 Workshop on Statistical Machine Translation (Koehn and Monz, 2006). The shared task used the Europarl corpus (Koehn, 2005), in which four languages are involved: English, French, Spanish, and German. The shared task performed translation between English and the other three languages. In our work, we perform translation from the other three languages to English. Table 2 shows the information about the bilingual training data. In the table, "fr", "en", "es", and "de" denote "French", "English", "Spanish", and "German", respectively.

Language pairs   Sentence pairs   Source words   Target words
fr-en            688,03           5,323,737      3,808,04
es-en            730,740          5,676,70       5,222,05
de-en            75,088           5,256,793      6,052,269

Table 2. Training Corpus for European Languages

For the language models, we use the same data provided in the shared task. We also use the same development set and test set provided by the shared task. The in-domain test set includes 2,000 sentences and the out-of-domain test set includes 1,064 sentences for each language.

Word Alignment Data

The training data for word alignment is the same as that used for translation. For the in-domain test set, we use the Spanish-English European Parliament Plenary Sessions (EPPS) test set,³ which is extracted from the proceedings of the European Parliament. It includes 500 sentences of at most 100 words that have been selected at random from the English-Spanish training corpus. The data set has been split into a development corpus of 100 sentence pairs and a test corpus of 400 sentence pairs. In our experiments, we use the same 400 sentence pairs as the test set. The test set was aligned manually by agreement of three manual reference alignments (Lambert et al., 2005). It includes 7,474 reference alignment links; 66.7% of them are sure links whereas 33.3% are possible links. For the out-of-domain test set, we randomly extract 395 sentence pairs from the out-of-domain translation test set described in the above section. This set is also manually annotated, but we do not classify the links into sure or possible links and take all of them as sure links. The reference set includes 7,037 alignment links. The detailed information about these two sets is described in Table 3.

Test set        Language   Vocabulary   Words    Average length
In-domain       Spanish    2,998        2,369    30.9
In-domain       English    2,537        ,790     29.5
Out-of-domain   Spanish    2,8          0,73     27.
Out-of-domain   English    2,546        93,733   23.7

Table 3. Word Alignment Test Set Statistics

Evaluation Metrics

We use the same word alignment evaluation metrics as described in (Och and Ney, 2003). If we use A to indicate the alignments identified by the proposed methods, and S and P to denote the sure and possible links in the reference alignments, the precision, recall, and alignment error rate (AER) are calculated as described in equations (12), (13) and (14). If we take all links as sure links, then P = S.

$$ precision = \frac{|A \cap P|}{|A|} \qquad (12) $$

$$ recall = \frac{|A \cap S|}{|S|} \qquad (13) $$

$$ AER = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|} \qquad (14) $$

The translation quality was evaluated using a well-established automatic measure: BLEU score (Papineni et al., 2002). We use the tool provided in the NAACL/HLT 2006 shared task on SMT to calculate the BLEU scores. We use the same method described in (Koehn and Monz, 2006) to perform the significance test.

Word Alignment Results

We perform bi-directional (source to target and target to source) word alignments using the GIZA++ toolkit, and obtain the symmetrized alignment results using the six word alignment heuristics described in this paper. The alignment results are shown in Table 4 for the in-domain and out-of-domain test sets. On both of the test sets, the compromise method "grow-diag" obtains the lowest AER because it makes a compromise between precision and recall. The intersection performs the worst because it achieves a much lower recall than the other methods. From the results, it can be seen that grow-diag-final, grow-final, and union are recall-oriented methods. Intersection and grow are precision-oriented methods. In general, the compromise method achieves the best word alignment results, and the precision-oriented method "intersection" gets the worst results.

Translation Results

We use Koehn's training scripts⁴ to train the translation model, and the SRILM toolkit (Stolcke, 2002) to train the language model. For translation, we use the Pharaoh decoder (Koehn, 2004). We run the decoder with its default settings. We use the six word alignment methods described in this paper to get different word alignment results, and then extract phrase pairs consistent with the word alignment results. Table 5 shows the number of the extracted phrase pairs. The intersection method obtains many more phrase pairs, by about a factor of five as compared with the grow-diag-final method. We also compare the detailed information of the phrase pairs. The phrase pairs extracted by grow-diag-final and grow-final include all phrase pairs extracted by union, and the phrase pairs extracted by intersection include all phrase pairs extracted by grow. More than 90% of the phrase pairs extracted by grow-final are covered by grow-diag-final, and more than 95% of the phrase pairs extracted by grow-diag are covered by intersection.

The translation results of the different methods on the in-domain test set and out-of-domain test set are shown in Tables 6 and 7, respectively. On both test sets, grow-diag achieves the best BLEU scores for three tasks and the second best for two tasks among six tasks. On the in-domain test set, recall-oriented methods achieve better results than precision-oriented methods. On the out-of-domain test set, the result is very different: precision-oriented methods achieve better results than recall-oriented methods.⁵ This is because precision-oriented methods extract many more phrase pairs, and may cover more source words of the out-of-domain test set. Further analysis shows that, on the out-of-domain test sets, only 355 words (of 29,488) are not covered by the intersection method, while 503 words are not covered by the grow-diag-final method for Spanish to English translation.

³ It is located at http://gps-tsc.upc.es/veu/personal/lambert/data/epps-alignref.html.
⁴ It is located at http://www.statmt.org/wmt06/shared-task/baseline.html.

Symmetrization     In-domain                     Out-of-domain
strategy           Precision   Recall   AER      Precision   Recall   AER
grow-diag-final    0.77        0.723    0.285    0.6383      0.7337   0.373
grow-final         0.733       0.742    0.2863   0.6290      0.7346   0.3223
union              0.695       0.7224   0.2947   0.643       0.7397   0.3288
grow-diag          0.7894      0.687    0.2645   0.6859      0.7052   0.3046
grow               0.822       0.6475   0.270    0.7099      0.6663   0.326
intersection       0.8689      0.577    0.3064   0.783       0.5822   0.332

Table 4. Word Alignment Results

Language pair   grow-diag-final   grow-final   union        grow-diag    grow         intersection
es-en           37,628,890        36,868,632   33,249,362   99,472,934   39,303,869   77,74,005
fr-en           34,5,677          33,64,98     29,582,74    02,494,636   42,602,885   73,847,297
de-en           32,954,99         3,869,009    28,09,95     23,086,503   66,904,79    23,87,373

Table 5. The Number of Phrase Pairs

Language pair   grow-diag-final   grow-final   union    grow-diag   grow     intersection
es-en           0.3053            0.3063       0.3042   0.3058      0.2976   0.2892
fr-en           0.304             0.3020       0.3006   0.3040      0.2964   0.2905
de-en           0.2407            0.2397       0.2389   0.2349      0.2288   0.2053

Table 6. Translation Results on the In-Domain Test Set

Language pair   grow-diag-final   grow-final   union    grow-diag   grow     intersection
es-en           0.2479            0.2503       0.2494   0.256       0.253    0.250
fr-en           0.995             0.997        0.979    0.2040      0.209    0.2022
de-en           0.666             0.663        0.643    0.707       0.643    0.530

Table 7. Translation Results on the Out-of-Domain Test Set

Language pair   grow-diag-final   grow-final   union    grow-diag   grow     intersection
es-en           0.304             0.300        0.3093   0.307       0.2988   0.290
fr-en           0.3084            0.3083       0.308    0.3047      0.2982   0.2923
de-en           0.2428            0.2427       0.2408   0.2360      0.2325   0.230

Table 8. Translation Results of Filtering on the In-Domain Test Set

Language pair   grow-diag-final   grow-final   union    grow-diag   grow     intersection
es-en           0.2608            0.267        0.2609   0.2634      0.2652   0.2607
fr-en           0.2088            0.2090       0.22     0.23        0.227    0.228
de-en           0.74              0.720        0.692    0.758       0.702    0.598

Table 9. Translation Results of Filtering on the Out-of-Domain Test Set

[Figure 1. Spanish-English Filtering Results. Two panels, In-domain Results and Out-of-domain Results, each comparing Filtering and Baseline for grow-diag-final, grow-final, union, grow-diag, grow, and intersection.]

Word Alignment and Translation Quality

As described in the above sections, the grow-diag method, which makes a compromise between recall and precision, performs very well on both in-domain and out-of-domain alignment test sets. Although the six different heuristic methods show a similar tendency for word alignment on both test sets, they perform very differently for translation on the two test sets. Recall-oriented methods perform better on the in-domain test set and precision-oriented methods perform better on the out-of-domain test set. Thus, the relationship between word alignment and machine translation is very complicated. It depends not only on the metrics taken for word alignment and translation quality and on the characteristics of the translation system used (Vilar et al., 2006), but also on the domains of the corpora investigated. In conclusion, it is a good idea to use heuristics that make a compromise between precision and recall, which can achieve satisfactory translation results with phrase-based SMT on both in-domain and out-of-domain texts.
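The alignment metrics behind the word alignment results in this section, equations (12) to (14), can be computed along the following lines. This is a minimal sketch under stated assumptions: links are (source position, target position) tuples, the set of possible links is taken to include the sure links, and the function name `alignment_scores` and the toy links are hypothetical. Both `predicted` and `sure` are assumed to be non-empty.

```python
def alignment_scores(predicted, sure, possible=None):
    """Precision, recall and AER for sets of (src, tgt) alignment links."""
    # Possible links include the sure ones; with no possible set given, P = S.
    possible = set(sure) if possible is None else set(possible) | set(sure)
    hit_sure = len(predicted & sure)
    hit_possible = len(predicted & possible)
    precision = hit_possible / len(predicted)
    recall = hit_sure / len(sure)
    aer = 1.0 - (hit_sure + hit_possible) / (len(predicted) + len(sure))
    return precision, recall, aer


predicted = {(0, 0), (1, 2), (2, 2)}   # toy system output (hypothetical)
sure = {(0, 0), (1, 1)}                # toy sure links (hypothetical)
possible = {(2, 2)}                    # toy possible links (hypothetical)
precision, recall, aer = alignment_scores(predicted, sure, possible)
```

Calling `alignment_scores(predicted, sure)` without a third argument corresponds to the out-of-domain setting, where all reference links are treated as sure links.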
Experiments on Noise Filtering

In this section, we perform experiments to filter the phrase pairs used in the Pharaoh system. The training and testing data are the same as those in the translation task.

Log Likelihood Ratio vs. Translation Probability

This section compares two noise filtering methods: log likelihood ratio and translation probability. Here, we use the grow-diag-final method in French to English translation as a case study. The threshold in equation (7) is set to 0.5 and 0.05 for log likelihood ratio and translation probability, respectively.⁶ The filtering results are shown in Table 10. The baseline represents the method before filtering. From the results, it can be seen that both of the filtering methods outperform the baseline, with log likelihood ratio performing better. Significance tests show that log likelihood ratio significantly outperforms the other methods. Thus, in the following sections, we only use log likelihood ratio for noise filtering.

Filtering Method   In-domain   Out-of-domain
Baseline           0.304       0.995
Log likelihood     0.3084      0.2088
Probability        0.3043      0.2030

Table 10. Comparison of Filtering Methods

Filtering Results

Using log likelihood ratio as the filtering method, the translation results after filtering are shown in Tables 8 and 9. In order to directly compare the translation results between the baseline (before filtering) and our filtering method, the results for Spanish to English translation are shown in Figure 1.⁷ From the results, it can be seen that the filtering method is more effective on the out-of-domain test set than on the in-domain test set. On the in-domain test set, the filtering method is only effective for recall-oriented methods. For the compromise method and precision-oriented methods, the phrase tables are much larger than those of the recall-oriented methods, and may contain much more noise. Log likelihood ratio is not discriminative enough to remove much noise from them. On the out-of-domain test set, the filtering method is very effective for all of the heuristics, achieving an improvement of more than 0.01 BLEU score as compared with the baselines. This is because some out-of-domain phrases may occur infrequently in the in-domain training corpus, and the phrase translation probability of the infrequently occurring pairs is usually overestimated. Thus, these infrequently occurring phrase pairs may be used for translation. In this case, log likelihood ratio is effective in removing these infrequently occurring pairs, which results in the improvement of translation quality.

Results by Using Different Sizes of Training Corpus

In order to further analyze the effect of the size of the training corpus, we take Spanish to English translation as a case study. We obtain the training corpora by randomly selecting 100K, 200K, and 400K sentence pairs from the entire Spanish-English parallel corpus to train translation models. Here we use the three heuristics "grow-diag-final", "intersection", and "grow-diag" to represent the recall-oriented methods, precision-oriented methods, and compromise methods, respectively. The results are shown in Figures 2, 3, and 4. From the figures, it can be seen that, on all sizes of training corpora, the filtering method achieves larger improvement on the out-of-domain test set than on the in-domain test set. On the in-domain test set, filtering is effective for recall-oriented methods and only achieves minor improvement for the other two methods.

⁵ In German to English translation, the intersection method achieves a lower BLEU score than the other methods on both of the test sets. However, the grow method achieves results comparable with the other methods on the out-of-domain test set.
⁶ The thresholds are set using the development set, so as to achieve the best results on this set.
⁷ The results for the other two translation directions are omitted here because they are similar to those shown in Figure 1.
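The log likelihood ratio filter used in these experiments follows equations (5) to (7). The sketch below is one way to realize it, under stated assumptions: the phrase table is a dictionary from (source phrase, target phrase) to an extraction count, the contingency cells of Table 1 are derived from those counts, and the names `g2` and `llr_filter`, the toy counts, and the threshold value passed in the example are hypothetical rather than taken from the experiments.

```python
import math
from collections import Counter


def g2(n11, n12, n21, n22):
    """Log likelihood ratio G^2 of a 2x2 contingency table, as in equation (5)."""
    cells = ((n11, n12), (n21, n22))
    rows = (n11 + n12, n21 + n22)
    cols = (n11 + n21, n12 + n22)
    total = rows[0] + rows[1]
    score = 0.0
    for i in range(2):
        for j in range(2):
            n = cells[i][j]
            if n > 0:                          # empty cells contribute nothing
                score += n * math.log(n * total / (rows[i] * cols[j]))
    return 2.0 * score


def llr_filter(pair_counts, threshold):
    """Keep phrase pairs whose Ratio(f, e) of equation (7) exceeds the threshold."""
    src_total, tgt_total = Counter(), Counter()
    for (f, e), c in pair_counts.items():
        src_total[f] += c
        tgt_total[e] += c
    total = sum(pair_counts.values())
    scores = {}
    for (f, e), c in pair_counts.items():
        n12 = src_total[f] - c                 # f paired with other target phrases
        n21 = tgt_total[e] - c                 # e paired with other source phrases
        scores[(f, e)] = g2(c, n12, n21, total - c - n12 - n21)
    best = {}                                  # Max(f) of equation (6)
    for (f, e), s in scores.items():
        best[f] = max(best.get(f, 0.0), s)
    return {pair for pair, s in scores.items()
            if best[pair[0]] == 0.0 or s / best[pair[0]] > threshold}


toy_counts = {                                 # hypothetical extraction counts
    ("la maison", "the house"): 8,
    ("la maison", "house"): 1,
    ("maison", "house"): 9,
    ("la", "the"): 20,
}
kept = llr_filter(toy_counts, threshold=0.5)
```

In this toy table the pair ("la maison", "house") has a relative-frequency probability of 1/10 given "house" but a relative value of only about 0.05 against the best candidate for "la maison", so it is the one pair the filter drops.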

32 3 30 29 I-doma Results Flterg Basele 27 26 25 24 23 Out-of-doma Results Flterg Basele 28 00k 200k 400k all 22 00k 200k 400k all Fgure 2. Flterg Results of Grow-dag-fal by Usg Dfferet Szes of Trag Corpus 3 30 29 28 I-doma Results Flterg Basele 00k 200k 400k all 27 26 25 24 23 22 Out-of-doma Results Flterg Basele 00k 200k 400k all Fgure 3. Flterg Results of Grow-dag by Usg Dfferet Szes of Trag Corpus 30 29 28 I-doma Results Flterg Basele 00k 200k 400k all 27 26 25 24 23 22 Out-of-doma Results Flterg Basele 00k 200k 400k all Fgure 4. Flterg Results of Itersecto by Usg Dfferet Szes of Trag Corpus By usg smaller sze (00K) of trag corpus, flterg the ose s ot so effectve for both test sets because t s subject to the problem of data sparseess ad log lkelhood rato s ot so dstgushable to remove them. Ad by creasg the szes of the trag data, flterg the phrase pars the phrase table becomes more effectve to acheve gas traslato qualty. Expermets o Model Combato I ths secto, we stll use Spash to Eglsh traslato as a case study to exame the effect of model combato of the dfferet methods. Model Iterpolato Vs. Cout Mergg Sce "grow-dag-fal", "grow-dag", ad "tersecto" represet the three kds of heurstcs, we oly perform model combato amog these three methods. The combato results are show Table. All of the methods uses log lkelhood rato to flter the ose the phrase tables. For the -doma case, the coeffcets are set to 0.8, 0., ad 0. for "grow-dag-fal", "grow-dag", ad "tersecto" for both model terpolato ad cout mergg. For the out-of-doma case, the coeffcets are set to 0., 0.2, ad 0.7 for "grow-dag-fal", "grow-dag", ad "tersecto" for both model terpolato ad cout mergg. All of the weghts are tued o the developmet set. The results show that cout mergg slghtly outperforms model terpolato. O the -doma test set, model combato oly slghtly mprove traslato qualty whle o the out-of-doma test set, model combato sgfcatly mprove traslato qualty. 
This is because the combination of phrase tables extracted using different heuristics does not provide additional information for in-domain translation. In contrast, combining these tables for out-of-domain translation can provide more information to improve translation quality.

                      In-domain   Out-of-domain
Grow-diag-final       31.04       26.0
Grow-diag             30.7        26.34
Intersection          29.0        26.09
Model interpolation   31.09       26.89
Count merging         31.2        27.0

Table 1. Results of Model Combination after Filtering
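The difference between the two combination schemes can be sketched as follows. This is a minimal illustration under stated assumptions: model interpolation is taken to be a weighted sum of each table's relative-frequency estimate p(t|s), and count merging is taken to be a relative-frequency estimate over weighted, pooled counts. The function names and the toy counts are hypothetical and are not taken from the paper's implementation.

```python
from collections import defaultdict


def relative_frequencies(counts):
    """Estimate p(t|s) = c(s, t) / c(s) from phrase pair counts."""
    src_totals = defaultdict(float)
    for (s, _t), c in counts.items():
        src_totals[s] += c
    return {(s, t): c / src_totals[s] for (s, t), c in counts.items()}


def model_interpolation(tables, weights):
    """Weighted sum of the per-table probability estimates."""
    combined = defaultdict(float)
    for counts, w in zip(tables, weights):
        for pair, p in relative_frequencies(counts).items():
            combined[pair] += w * p
    return dict(combined)


def count_merging(tables, weights):
    """Pool the weighted counts first, then estimate probabilities once."""
    merged = defaultdict(float)
    for counts, w in zip(tables, weights):
        for pair, c in counts.items():
            merged[pair] += w * c
    return relative_frequencies(merged)


# Toy phrase pair counts from two hypothetical extraction heuristics.
table_a = {("casa", "house"): 8, ("casa", "home"): 2}
table_b = {("casa", "house"): 1, ("casa", "home"): 1}
weights = [0.5, 0.5]

interpolated = model_interpolation([table_a, table_b], weights)
merged = count_merging([table_a, table_b], weights)
```

With equal weights, interpolation treats both tables as equally reliable, while count merging lets the table with more evidence for a source phrase dominate the estimate. In a sketch like this the weights would be tuned on a development set, as the text describes.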

Conclusion

This paper evaluated six different word alignment heuristics and their impacts on translation quality. We also investigated the effectiveness of the log likelihood ratio for filtering the noise in the phrase tables extracted using different heuristic methods, and examined the effectiveness of model combination of these methods.

Word alignment results show that, using the alignment error rate as a metric, the compromise method performs the best and the precision-oriented method "intersection" performs the worst on both in-domain and out-of-domain test sets.

In the translation task, the results show that the heuristic methods perform differently on the in-domain and out-of-domain test sets. On both of the test sets, the method that makes a compromise between precision and recall achieves satisfactory translation quality. On the in-domain test set, the recall-oriented heuristic methods yield better translation quality. On the out-of-domain test set, the precision-oriented heuristic methods yield better translation quality. Thus, the relationship between word alignment and machine translation performance also depends on the domains of the training and testing corpora, besides other factors such as evaluation metrics and the characteristics of the translation systems.

Results also show that filtering the noise in the phrase tables yields more improvement in translation quality on the out-of-domain test set than on the in-domain test set. The filtering methods achieve an improvement of about 0.0 BLEU score on the out-of-domain test set. Model combination results show that count merging performs slightly better than model interpolation on our test sets, and that these two methods significantly improve the translation quality on the out-of-domain test set.