Yorick Wilks. Machine Translation. Its Scope and Limits


Yorick Wilks
Department of Computer Science
The University of Sheffield
Regent Court, 211 Portobello Street
Sheffield, S1 4DP, UK
Y.Wilks@dcs.shef.ac.uk

ISBN: 978-0-387-72773-8
e-ISBN: 978-0-387-72774-5
Library of Congress Control Number: 2008931409

© Springer Science+Business Media, LLC 2009

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

springer.com

Foreword

This book is a set of essays covering aspects of machine translation (MT) past, present and future. Some have been published before, some are new but, taken together, they are meant to present a coherent account of the state of MT, its evolution up to the present, and its scope for the future. At certain points, Afterwords have been added to comment on the, possibly changed, relevance of a chapter at the time of publication. The argument for reprinting here some older thoughts on MT is an attempt to show some continuity of one researcher's thoughts, so far as possible, in the welter of argument and dispute that has gone on over decades on how MT is to be done. The book is certainly not intended as a comprehensive history of the field; such histories already exist. Nor is any one MT system described or advocated here.

The author has been involved in the production of three quite different systems. The first was a toy semantics-based system at Stanford in 1971, whose code was placed in the Computer Museum in Boston as the first meaning-driven MT system. Later, in 1985, I was involved in New Mexico in ULTRA, a multi-language system with strong semantic and pragmatic features, intended to show that the architecture of the (failed) EUROTRA system could perform better at 1% of that system's cost. No comprehensive description of EUROTRA is given here, and the history of that project remains to be written. Lastly, in 1990, I was one of three PIs in the DARPA-funded system PANGLOSS, a knowledge-based system set up in competition with IBM's CANDIDE system, that became the inspiration for much of the data-driven changes that have overtaken language processing since 1990. None of these systems is being presented here as a solution to MT, for nothing yet is, but only as a test-bed of ideas and performance.

Machine translation is not, as some believe, solved, nor is it impossible, as others still claim. It is a lively and important technology, whose importance in a multi-lingual and information-driven world can only increase, intellectually and commercially. Intellectually, it remains, as it always has been, the ultimate testbed of all linguistic and language processing theories.

In writing this book, I am indebted to too many colleagues and students to mention, though I must acknowledge joint work with David Farwell (chapters 10 and 14), and Sergei Nirenburg, Jaime Carbonell and Ed Hovy (chapter 8). I also need to thank Lucy Moffatt for much help in its preparation, and Roberta, for everything, as always.

Sheffield, 2008
Yorick Wilks

History Page

Some chapters have appeared in other forms elsewhere:

Chapter 2: Wilks, Y. (1984) Artificial Intelligence and Machine Translation. In S. and W. Sedelow (eds.) Current Trends in the Language Sciences. Amsterdam: North Holland.

Chapter 3: Wilks, Y. (1973) An Artificial Intelligence Approach to Machine Translation. In R. Schank and K. Colby (eds.) Computer Models of Thought and Language. San Francisco: Freeman.

Chapter 4: Wilks, Y. (1992) SYSTRAN: it obviously works, but how much can it be improved? In J. Newton (ed.) Computers and Translation. London: Routledge.

Chapter 7: Wilks, Y. (1994) Developments in machine translation research in the US. In the Aslib Proceedings, Vol. 46 (The Association of Information Management).

Contents

1 Introduction

Part I MT Past

2 Five Generations of MT
3 An Artificial Intelligence Approach to Machine Translation
4 It Works but How Far Can It Go: Evaluating the SYSTRAN MT System

Part II MT Present

5 Where Am I Coming From: The Reversibility of Analysis and Generation in Natural Language Processing
6 What are Interlinguas for MT: Natural Languages, Logics or Arbitrary Notations?
7 Stone Soup and the French Room: The Statistical Approach to MT at IBM
8 The Revival of US Government MT Research in 1990
9 The Role of Linguistic Knowledge Resources in MT
10 The Automatic Acquisition of Lexicons for an MT System

Part III MT Future

11 Senses and Texts
12 Sense Projection
13 Lexical Tuning
14 What Would Pragmatics-Based Machine Translation be Like?
15 Where was MT at the End of the Century: What Works and What Doesn't?
16 The Future of MT in the New Millennium

References

Index