TEXT ENCODING INITIATIVE
Text Encoding Initiative: Background and Context

Edited by
Nancy Ide
Department of Computer Science, Vassar College, Poughkeepsie, NY, USA
and
Jean Véronis
Laboratoire Parole et Langage, CNRS & Université de Provence, Aix-en-Provence, France

Reprinted from Computers and the Humanities, Volume 29, Nos. 1, 2 & 3 (1995), edited by Glyn Holmes
(With the addition of an SGML/TEI Bibliography)

Springer Science+Business Media, B.V.
Library of Congress Cataloging-in-Publication Data
Text encoding initiative : background and contexts / edited by Nancy Ide and Jean Véronis.
p. cm.
"Reprinted from Computers and the humanities 29: 1-3, 1995."
ISBN 978-0-7923-3704-1
ISBN 978-94-011-0325-1 (eBook)
DOI 10.1007/978-94-011-0325-1
1. Text processing (Computer science) 2. Coding theory. I. Ide, Nancy M. II. Véronis, Jean. III. Computers and the humanities.
QA76.9.T48T47 1995
005.7'2--dc20 95-31289

ISBN 978-0-7923-3704-1

Printed on acid-free paper

All Rights Reserved
© 1995 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 1995
Softcover reprint of the hardcover 1st edition 1995
No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.
Table of Contents

CHARLES GOLDFARB / Preface
NANCY IDE and JEAN VÉRONIS / Introduction

PART I: GENERAL TOPICS
NANCY IDE and C.M. SPERBERG-McQUEEN / The Text Encoding Initiative: Its History, Goals, and Future Development
C.M. SPERBERG-McQUEEN and LOU BURNARD / The Design of the TEI Encoding Scheme
LOU BURNARD / What is SGML and How Does It Help?

PART II: DOCUMENT-WIDE ENCODING ISSUES
HARRY GAYLORD / Character Representation
RICHARD GIORDANO / The TEI Header and the Documentation of Electronic Texts
DOMINIC DUNLOP / Practical Considerations in the Use of TEI Headers in Large Corpora

PART III: ENCODING SPECIFIC TEXT TYPES
DAVID CHISHOLM and DAVID ROBEY / Encoding Verse Texts
JOHN LAVAGNINO and ELLI MYLONAS / The Show Must Go On: Problems of Tagging Performance Texts
ROBIN COVER and PETER ROBINSON / Textual Criticism
DANIEL GREENSTEIN and LOU BURNARD / Speaking with One Voice: Encoding Standards and the Prospects for an Integrated Approach to Computing in History
STIG JOHANSSON / The Encoding of Spoken Texts
ALAN MELBY / E-TIF: An Electronic Terminology Interchange Format
NANCY IDE and JEAN VÉRONIS / Encoding Dictionaries

PART IV: SPECIAL ENCODING MECHANISMS
STEVEN J. DeROSE and DAVID DURAND / The TEI Hypertext Guidelines
D. TERENCE LANGENDOEN and GARY F. SIMONS / Rationale for the TEI Recommendations for Feature-Structure Markup
DAVID BARNARD, LOU BURNARD, JEAN-PIERRE GASPART, LYNNE A. PRICE, C.M. SPERBERG-McQUEEN, GIOVANNI BATTISTA VARILE / Hierarchical Encoding of Text: Technical Problems and SGML Solutions
SGML/TEI Bibliography by Robin C. Cover
Computers and the Humanities 29: 1, 1995.

Preface

Charles F. Goldfarb
Saratoga, California

If asked for a sure recipe for chaos, I would propose a project in which several thousand impassioned specialists in scores of disciplines from a dozen or more countries would be given five years to produce some 1300 pages of guidelines for representing the information models of their specialties in a rigorous, machine-verifiable notation. Clearly, it would be sociologically and technologically impossible for such a group even to agree on the subject matter of such guidelines, let alone the coding details.

But just as clearly as the bumblebee flies despite the laws of aerodynamics, the Text Encoding Initiative has actually succeeded in such an effort.

The TEI Guidelines are extraordinary. Even if they were never adopted they would stand as a significant contribution to scholarship for their detailed analysis of the information sets of a huge range of complex text types. But in fact they have already been implemented, both by scholars for research and interchange and by commercial publishers for the publication of linguistic and humanistic works.

I am delighted that my invention, the Standard Generalized Markup Language, was able to play a role in the TEI's magnificent accomplishment, particularly because almost all of the original applications of SGML were in the commercial and technological realms. It is reasonable, of course, that organizations with massive economic investments in new and changing information should want the benefits of information asset preservation and reuse that SGML offers. It is gratifying that the TEI, representing the guardians of humanity's oldest and most truly valuable information, chose SGML for those same benefits.

The vaunted "information superhighway" would hardly be worth traveling if the landscape were dominated by industrial parks, office buildings, and shopping malls.
Thanks to the Text Encoding Initiative, there will be museums, libraries, theaters, and universities as well.