Machine Translation Part 2, and the EM Algorithm


Machine Translation Part 2, and the EM Algorithm
CS 585, Fall 2015: Introduction to Natural Language Processing
http://people.cs.umass.edu/~brenocon/inlp2015/
Brendan O'Connor
College of Information and Computer Sciences, University of Massachusetts Amherst
[Some slides borrowed from mt-class.org]

Georges Artrouni's "mechanical brain," a translation device patented in France in 1933. (Image from Corbé by way of John Hutchins)

IBM Model 1: Inference and learning

Alignment inference: given lexical translation probabilities, infer the posterior or Viterbi alignment: $\arg\max_a p(a \mid e, f, \theta)$

How do we learn the translation parameters? The EM algorithm: $\arg\max_\theta p(e \mid f, \theta)$

Translation: incorporate into the noisy channel (this model isn't good at this): $\arg\max_f p(e \mid f, \theta)\, p(f)$

Chicken-and-egg problem: if we knew the alignments, the translation parameters would be trivial (just counting).
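To make the alignment-inference step concrete, here is a minimal sketch (an illustration, not code from the lecture) of Viterbi alignment under Model 1. The lexical translation table t, keyed by (target word, source word), and the NULL convention are assumptions:

```python
NULL = "<null>"  # Model 1's NULL source word, conventionally at position 0

def viterbi_alignment(e_sent, f_sent, t):
    """Under Model 1 each target word aligns independently, so the Viterbi
    alignment just picks, for each e_i, the source f_j maximizing t(e_i | f_j).
    t is a dict with t[(e, f)] = p(e | f)."""
    f_words = [NULL] + f_sent
    return [max(range(len(f_words)), key=lambda j: t.get((e, f_words[j]), 0.0))
            for e in e_sent]
```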

Exercise 4

1a. Garcia and associates. 1b. Garcia y asociados.
2a. Carlos Garcia has three associates. 2b. Carlos Garcia tiene tres asociados.
3a. his associates are not strong. 3b. sus asociados no son fuertes.
4a. Garcia has a company also. 4b. Garcia tambien tiene una empresa.
5a. its clients are angry. 5b. sus clientes están enfadados.
6a. the associates are also angry. 6b. los asociados tambien están enfadados.
7a. the clients and the associates are enemies. 7b. los clientes y los asociados son enemigos.
8a. the company has three groups. 8b. la empresa tiene tres grupos.
9a. its groups are in Europe. 9b. sus grupos están en Europa.
10a. the modern groups sell strong pharmaceuticals. 10b. los grupos modernos venden medicinas fuertes.
11a. the groups do not sell zanzanine. 11b. los grupos no venden zanzanina.
12a. the small groups are not modern. 12b. los grupos pequeños no son modernos.

MLE

Maximum Likelihood Estimation: a general method to learn parameters $\theta$ from observed data $x$: $\hat{\theta}^{MLE} = \arg\max_\theta P(x \mid \theta)$

It turns out that for simple multinomial models, the MLE is simply normalized counts:

$\hat{\theta}^{MLE}_{dog} = P(w = \text{dog} \mid \hat{\theta}^{MLE}) = \dfrac{\text{count of ``dog''}}{\text{total number of tokens}}$
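As a quick illustration of "normalized counts," a minimal sketch over a toy corpus (the corpus and variable names are assumptions, not from the slides):

```python
from collections import Counter

corpus = "the dog barks at the dog".split()  # toy corpus (assumption)
counts = Counter(corpus)
total = sum(counts.values())
theta = {w: c / total for w, c in counts.items()}  # MLE = normalized counts
print(theta["dog"])  # count("dog") / total tokens = 2/6 ≈ 0.333
```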

Naive Bayes: x: text, z: classes

Supervised learning: given z, learn θ. MLE algorithm: count words per class, θ = count(w,k) / count(k).

Unsupervised learning: learn z and θ at once (clustering). Hard EM algorithm:
Randomly initialize θ.
Iterate:
1. Predict each document's class: z := argmax_z P(z | x, θ)
2. Count words per class: θ = count(w,k) / count(k)

Soft EM: E-step: calculate posterior probabilities over z; M-step: count words per class using fractional counts.
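A minimal sketch of that hard-EM loop for unsupervised Naive Bayes clustering. The uniform class prior, add-one smoothing, and random initialization of assignments (standing in for a random θ) are my assumptions:

```python
import math
import random
from collections import defaultdict

def hard_em_naive_bayes(docs, K, iterations=10):
    """Hard EM for a unigram Naive Bayes clustering model.
    docs: list of token lists; K: number of classes."""
    vocab = {w for d in docs for w in d}
    z = [random.randrange(K) for _ in docs]  # random initial assignments
    for _ in range(iterations):
        # M-step: theta[k][w] = count(w, k) / count(k), add-one smoothed
        counts = [defaultdict(int) for _ in range(K)]
        totals = [0] * K
        for d, k in zip(docs, z):
            for w in d:
                counts[k][w] += 1
            totals[k] += len(d)
        theta = [{w: (counts[k][w] + 1) / (totals[k] + len(vocab))
                  for w in vocab} for k in range(K)]
        # E-step (hard): z := argmax_k P(z = k | x, theta), uniform prior
        z = [max(range(K), key=lambda k, d=d: sum(math.log(theta[k][w]) for w in d))
             for d in docs]
    return z, theta
```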

EM

Motivation: we want to learn parameters θ from observed data (text), but the model involves latent/missing variables (alignments).

Applications:
Unsupervised learning: e.g. unsupervised Naive Bayes, unsupervised HMM
Alignment models: e.g. IBM Model 1

Is Model 1 unsupervised?

EM Algorithm

Pick some random (or uniform) parameters.
Repeat until you get bored (~5 iterations for lexical translation models):
- Using your current parameters, compute expected alignments for every target word token in the training data: p(a_i | e, f) (on board)
- Keep track of the expected number of times f translates into e throughout the whole corpus
- Keep track of the expected number of times that f is used as the source of any translation
- Use these expected counts as if they were real counts in the standard MLE equation
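The recipe above translates almost line for line into the following minimal sketch of EM for Model 1's lexical translation table (uniform initialization, no NULL word; the corpus format is an assumption):

```python
from collections import defaultdict

def model1_em(corpus, iterations=5):
    """EM for IBM Model 1 lexical translation probabilities.
    corpus: list of (e_sent, f_sent) pairs of token lists.
    Returns t with t[(e, f)] approximating p(e | f)."""
    e_vocab = {e for es, _ in corpus for e in es}
    t = defaultdict(lambda: 1.0 / len(e_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected times f translates into e
        total = defaultdict(float)  # expected times f is the source of anything
        for e_sent, f_sent in corpus:
            for e in e_sent:
                # alignment posterior p(a_i = j | e, f): t(e | f_j), normalized over j
                norm = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    c = t[(e, f)] / norm
                    count[(e, f)] += c
                    total[f] += c
        # use the expected counts as if they were real counts in the MLE equation
        t = defaultdict(float, {ef: count[ef] / total[ef[1]] for ef in count})
    return t
```

Run on the Exercise 4 sentence pairs, a few iterations should push t toward the intuitive word translations (e.g. grupos → groups).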

EM for Model 1 [worked example developed over several slides; the figures did not survive transcription]

Convergence [slide content not transcribed]

[stopped here]

MT: phrase-based models; evaluation

Phrase-based MT

$p(f, a \mid e) = p(f \mid e, a)\, p(a \mid e)$

Phrase-to-phrase translations.
Phrases can memorize local reorderings.
State of the art (currently or very recently) in industry, e.g. Google Translate.

Phrase Extraction

For training: preprocess with IBM Models to predict alignments.

Example: "I open the box" / "watashi wa hako wo akemasu"
Extracted phrase pair: "open the box" / "hako wo akemasu"
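Going from word alignments to phrase pairs uses a consistency check: a phrase pair is extracted only if no alignment link connects a word inside the pair to a word outside it. A minimal sketch (the alignment format, a set of (e_index, f_index) links, is an assumption):

```python
def extract_phrases(e_sent, f_sent, alignment, max_len=4):
    """Extract phrase pairs consistent with a word alignment.
    alignment: set of (i, j) links between e_sent[i] and f_sent[j]."""
    phrases = []
    for e1 in range(len(e_sent)):
        for e2 in range(e1, min(len(e_sent), e1 + max_len)):
            # source positions linked to the target span [e1, e2]
            fs = [j for (i, j) in alignment if e1 <= i <= e2]
            if not fs:
                continue
            f1, f2 = min(fs), max(fs)
            # consistent iff no link from f-span [f1, f2] leaves e-span [e1, e2]
            if all(e1 <= i <= e2 for (i, j) in alignment if f1 <= j <= f2):
                phrases.append((" ".join(e_sent[e1:e2 + 1]),
                                " ".join(f_sent[f1:f2 + 1])))
    return phrases
```

With an assumed alignment {(0, 0), (1, 4), (3, 2)} for I-watashi, open-akemasu, box-hako, this recovers "open the box" / "hako wo akemasu" among the consistent pairs.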

Decoding

Source: Maria no dio una bofetada a la bruja verde

Candidate phrase translations form a lattice over the source; rows of options from the slide:
Mary / not / give / a slap / to / the / witch / green
did not / a slap / by / hag bawdy
no / slap / to the / green witch
did not give / the / the witch

The decoder searches for the highest-scoring sequence of phrases covering the source sentence (see the sketch below).
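A minimal sketch of a monotone phrase-based decoder over such a lattice. It has no reordering or beam pruning (so it cannot produce "the green witch" from "la bruja verde"); phrase_table and lm_score are hypothetical inputs, not the lecture's decoder:

```python
import math

def monotone_decode(src, phrase_table, lm_score):
    """Dynamic program over source prefixes: best[i] holds the best-scoring
    translation of src[:i]. phrase_table maps a source phrase string to a
    list of (target phrase, log-probability) options; lm_score is a function
    scoring a target phrase."""
    best = [(-math.inf, "")] * (len(src) + 1)
    best[0] = (0.0, "")
    for i in range(len(src)):
        if best[i][0] == -math.inf:
            continue
        for j in range(i + 1, len(src) + 1):
            for e_phrase, logp in phrase_table.get(" ".join(src[i:j]), []):
                score = best[i][0] + logp + lm_score(e_phrase)
                if score > best[j][0]:
                    best[j] = (score, (best[i][1] + " " + e_phrase).strip())
    return best[len(src)][1]
```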

MT Evaluation

Illustrative translation results

la politique de la haine. (foreign original)
politics of hate. (reference translation)
the policy of the hatred. (IBM4 + n-grams + stack decoder)

nous avons signé le protocole. (foreign original)
we did sign the memorandum of agreement. (reference translation)
we have signed the protocol. (IBM4 + n-grams + stack decoder)

où était le plan solide? (foreign original)
but where was the solid plan? (reference translation)
where was the economic base? (IBM4 + n-grams + stack decoder)

the Ministry of Foreign Trade and Economic Cooperation, including foreign direct investment 40.007 billion US dollars today provide data include that year to November china actually using foreign 46.959 billion US dollars and (MT output fragment)

MT Evaluation

Manual (the best!?):
- SSER (subjective sentence error rate)
- Correct/Incorrect
- Adequacy and Fluency (5- or 7-point scales)
- Error categorization
- Comparative ranking of translations

Testing in an application that uses MT as one subcomponent:
- E.g., question answering from foreign-language documents
- May not test many aspects of the translation (e.g., cross-lingual IR)

Automatic metrics:
- WER (word error rate): why problematic?
- BLEU (Bilingual Evaluation Understudy)

BLEU Evaluation Metric (Papineni et al., ACL 2002)

Reference (human) translation: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport.

Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail, which sends out; The threat will be able after public place and so on the airport to start the biochemistry attack, [?] highly alerts after the maintenance.

N-gram precision (score is between 0 and 1): what percentage of machine n-grams can be found in the reference translation? An n-gram is a sequence of n words.
Not allowed to match the same portion of the reference translation twice at a given n-gram level (two MT words "airport" are only correct if the reference has two words "airport"; you can't cheat by typing out "the the the the the").
Unigrams inside a matched bigram also count toward unigram precision, etc.
Brevity penalty: otherwise you could just output the single word "the" (precision 1.0!).
It was thought quite hard to game the system (i.e., to find a way to change machine output so that BLEU goes up but quality doesn't).

BLEU Evaluation Metric (Papineni et al., ACL 2002)

(Same reference and machine translation as above.)

BLEU is a weighted geometric mean of n-gram precisions, with a brevity-penalty factor. Note that it is precision-oriented.

BLEU4 formula (counts n-grams up to length 4), where $p_n$ is the n-gram precision:

$\text{BLEU4} = \exp\Big(1.0 \log p_1 + 0.5 \log p_2 + 0.25 \log p_3 + 0.125 \log p_4 - \max\big(\tfrac{\text{words in reference}}{\text{words in machine output}} - 1,\ 0\big)\Big)$

Note: this only works at the corpus level (zeros kill it); there is a smoothed variant for sentence-level use.
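The formula maps to a short implementation. A minimal sketch of corpus-level BLEU4 with the slide's weights and a single reference per sentence (multi-reference clipping would instead take, per n-gram, the max count across references):

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidates, references, weights=(1.0, 0.5, 0.25, 0.125)):
    """candidates, references: parallel lists of token lists.
    Clipped n-gram precision: a candidate n-gram matches at most as many
    times as it occurs in the reference (no credit for 'the the the')."""
    log_score = 0.0
    for n, w in enumerate(weights, start=1):
        matched = total = 0
        for cand, ref in zip(candidates, references):
            c, r = ngram_counts(cand, n), ngram_counts(ref, n)
            matched += sum(min(cnt, r[g]) for g, cnt in c.items())
            total += sum(c.values())
        log_score += w * math.log(matched / total)  # a zero kills corpus BLEU
    cand_len = sum(len(c) for c in candidates)
    ref_len = sum(len(r) for r in references)
    return math.exp(log_score - max(ref_len / cand_len - 1.0, 0.0))
```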

BLEU in Action

(Foreign original not shown.)
Reference translation: the gunman was shot to death by the police.

Candidate machine translations:
#1 the gunman was police kill.
#2 wounded police jaya of
#3 the gunman was shot dead by the police.
#4 the gunman arrested by police kill.
#5 the gunmen were killed.
#6 the gunman was shot to death by the police.
#7 gunmen were killed by police
#8 al by the police.
#9 the ringer is killed by the police.
#10 police killed the gunman.

On the slide, green marks a 4-gram match (good!) and red marks a word not matched (bad!).

Multiple Reference Translations

Reference translation 1: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport.

Reference translation 2: Guam International Airport and its offices are maintaining a high state of alert after receiving an e-mail that was from a person claiming to be the wealthy Saudi Arabian businessman Bin Laden and that threatened to launch a biological and chemical attack on the airport and other public places.

Reference translation 3: The US International Airport of Guam and its office has received an email from a self-claimed Arabian millionaire named Laden, which threatens to launch a biochemical attack on such public places as airport. Guam authority has been on alert.

Reference translation 4: US Guam International Airport and its office received an email from Mr. Bin Laden and other rich businessman from Saudi Arabia. They said there would be biochemistry air raid to Guam Airport and other public places. Guam needs to be in high precaution about this matter.

Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail, which sends out; The threat will be able after public place and so on the airport to start the biochemistry attack, [?] highly alerts after the maintenance.

Initial results showed that BLEU predicts human judgments well.

[Figure: NIST score (a BLEU variant) plotted against human judgments, with R² = 90.2% for adequacy and R² = 88.0% for fluency. Slide from G. Doddington (NIST).]