A Proposal For a Standardized Common Use Character Set in East Asian Countries

Similar documents
Meetings and Conferences

The Current Status of Authority Control of Author Names in the National Diet Library

China National Bibliography at the Crossroad. Ben Gu ( 顧犇 ) National Library of China

Do we still need bibliographic standards in computer systems?

Cataloging in the National Diet Library : Centering on the outline from April 2002 and the relationship with the NII

Orientalist Libraries in the U.S.: Emerging Issues in Information Exchange

What's New in Technical Processing

NDL's Digital Collection and Service for Information Access

A Case Study of Web-based Citation Management Tools with Japanese Materials and Japanese Databases

In Need of a Total Plan: From Wade-Giles to Pinyin

Final Report on Pinyin Conversion by the CEAL Pinyin Liaison Group

1. PARIS PRINCIPLES 1.1. Is your cataloguing code based on the Paris Principles for choice and form of headings and entry words?

Digital reunification of dispersed collections: The National Library of Korea digitization project

From Clay Tablets to MARC AMC: The Past, Present, and Future of Cataloging Manuscript and Archival Collections

AU-6407 B.Lib.Inf.Sc. (First Semester) Examination 2014 Knowledge Organization Paper : Second. Prepared by Dr. Bhaskar Mukherjee

KOREA ESSENTIALS No. 1. Hangeul. Korea's Unique Alphabet

Users guide: Downloading Bibliographic Records via the NDL-OPAC

On the Development of the Institute of Chinese Studies Library at Heidelberg University

Metadata FRBR RDA. BIBLID (2008) 97:1 p (2008.6) 1

Automation of Processes in the National Library of China: Historical Review and Future Perspective

Retrospective Conversion of East Asian Materials

Cataloging Principles: IME ICC

Harvard Law School Library Collection Development Policy

The Organization and Classification of Library Systems in China By Candise Branum LI804XO

2009 CDNLAO COUNTRY REPORT

Survey on the state of national bibliographies in Asia Unni Knutsen, Oslo University College July 2006

Cataloguing Code Comparison for the IFLA Meeting of Experts on an International Cataloguing Code July 2003

Should the Journal of East Asian Libraries Be a Peer-Reviewed Journal? A Report of the Investigation and Decision

Preservation of East Asian Language Materials at the Library of Congress

Advocacy Actions of LAROC

AACR2 versus RDA. Presentation given at the CLA Pre-Conference Session From Rules to Entities: Cataloguing with RDA May 29, 2009.

INTRODUCTION TO. prepared by. Library of Congress Acquisitions and Bibliographic Access Directorate. (Internet:

Reference Books in Japanese Public Libraries that Provide Good Reference Services

ENCYCLOPEDIA DATABASE

POSITION DESCRIPTION Library Services Assistant-Advanced. Position Summary

Catalogues and cataloguing standards

Free Ebooks Brush Writing: Calligraphy Techniques For Beginners

INR 2002 Research Paper Assignment

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.

A History of Writing. one of the earliest examples of writing, a 4th millennium tablet from Uruk, lists sacks of grain and heads of cattle

The CYCU Chang Ching Yu Memorial Library Resource Development Policy

Comparison between PR China and USA in the Field of Library and Information Sciences

Department of American Studies M.A. thesis requirements

MARC 21: The Standard Exchange Format for the 21st Century

INFS 427: AUTOMATED INFORMATION RETRIEVAL (1st Semester, 2018/2019)

WORLD LIBRARY AND INFORMATION CONGRESS: 75TH IFLA GENERAL CONFERENCE AND COUNCIL

McGill-Harvard-Yenching Library Joint Digitization Project: Ming-Qing Women's Writings

Organizations and Institutions

Guidelines for Manuscript Preparation for Advanced Biomedical Engineering

Module-2. Organization of Library Resources: Advanced. Unit-2: Library Cataloguing. Downloaded from

REFERENCE SERVICE INTERLIBRARY ORGANIZATION OF. Mary Radmacher. Some of the types of library systems in existence include:

WHAT BELONGS IN MY RESEARCH PAPER?

RDA: The Inside Story

Masters in Film Studies

Chinese Collections in Japan

Cataloging Fundamentals AACR2 Basics: Part 1

ASIAN DEVELOPMENT REVIEW GUIDELINES FOR THE SUBMISSION OF MANUSCRIPTS

Continuities. Serials Catalogers Should Take the Plunge with RDA. By Steve Kelley

Instructions to Authors

The Korean Collection in the Harvard-Yenching Library

GENERAL WRITING FORMAT

WG2: Transcription of Early Letter Forms Brian Hillyard

STATEMENT OF INTERNATIONAL CATALOGUING PRINCIPLES

Comparing Books Held by Japanese Public Libraries: Outsourcing versus Local Government Management

Jerry Falwell Library RDA Copy Cataloging

How to write a RILM thesis Guidelines

No. 019 Newsletter - Association for Asian Studies. Committee on American Library Resources in the Far East

Journal of East Asian Libraries

An introduction to RDA for cataloguers

Cataloguing for the world: motivation, method and madness

History of Library Cataloguing. Nanaji Shewale Librarian, GIPE, Pune (India)

Cataloguing the Slavonic Manuscript Collection of the Plovdiv Public Library MARC21 * Template

ARAB REPUBLIC. Introduction of Machine-Readable Cataloguing at the National Information and Documentation Centre. SeppoVuorinen

Guidelines for academic writing

Instructions to Authors

Written language: a research guide LIS 407. Paul Hoffman

Cataloguing Code Comparison for the IFLA Meeting of Experts on an International Cataloguing Code July 2003 PARIS PRINCIPLES

You Say Pei-ching, I Say Beijing: Should We Call the Whole Thing Off?

GUIDE FOR ENGLISH THESIS PREPARATION

Discovering Modern China: Report on CLIR Project of the East Asia Library. Presented to UW Library Council By EAL CLIR Project Team May 12, 2016

Subject: RDA: Resource Description and Access Constituency Review of Full Draft Workflows Book Workflow

Report. General Comments

Introduction. The following draft principles cover:

LC GUIDELINES SUPPLEMENT TO THE MARC 21 FORMAT FOR AUTHORITY DATA

Authority Control: A Conversation

THE REGULATION. to support the License Thesis for the specialty 711. Medicine

Discussion Of Industrial Design Protection Practice In Governmental Agencies And Courts

INDEX. classical works 60 sources without pagination 60 sources without date 60 quotation citations 60-61

Review Your Thesis or Dissertation

ROLE OF FUNCTIONAL REQUIREMENTS FOR BIBLIOGRAPHIC RECORDS IN DIGITAL LIBRARY SYSTEM

By Aksel G. S. Josephson. THE Proposition for the establishment of a Bibliographi

Capturing the Mainstream: Subject-Based Approval

Reasons for separating information about different types of responsibility

(Presenter) Rome, Italy. locations. other. catalogue. strategy. Meeting: Manuscripts

Constructing Bibliographic Relationships through DOI for Asian Studies. Estelle Cheng

Read And Write Chinese: A Simplified Guide To The Chinese Characters By Rita Mei-Wah Choy READ ONLINE

CALL FOR PAPERS. standards. To ensure this, the University has put in place an editorial board of repute made up of

Author Guidelines Foreign Language Annals

The Founding of the Harvard-Yenching Library

Path between Authenticity and Integrity

Transcription:

Journal of East Asian Libraries Volume 1980 Number 63 Article 9 10-1-1980 A Proposal For a Standardized Common Use Character Set in East Asian Countries Tokutaro Takahashi Follow this and additional works at: https://scholarsarchive.byu.edu/jeal BYU ScholarsArchive Citation Takahashi, Tokutaro (1980) "A Proposal For a Standardized Common Use Character Set in East Asian Countries," Journal of East Asian Libraries: Vol. 1980 : No. 63, Article 9. Available at: https://scholarsarchive.byu.edu/jeal/vol1980/iss63/9 This Article is brought to you for free and open access by the All Journals at BYU ScholarsArchive. It has been accepted for inclusion in Journal of East Asian Libraries by an authorized editor of BYU ScholarsArchive. For more information, please contact scholarsarchive@byu.edu, ellen_amatangelo@byu.edu.

A PROPOSAL FOR A STANDARDIZED COMMON USE CHARACTER SET IN EAST ASIAN COUNTRIES

Tokutaro Takahashi
National Diet Library

According to a current estimate, the world produces about half a million book titles each year; some 160,000 of these, about 30% of the total, are said to be in non-roman scripts, and about 80,000 come from East Asian countries. Because East Asian countries account for a quarter of the world's population, a proportional increase in publishing output from this region may be expected in the future.

1. Characteristics of East Asian scripts

Written Japanese usually combines kanji (Chinese characters) with kana (phonetic syllabaries used to write words that often function like prepositions, as well as syllables that indicate other grammatical relationships). As far back as 1949, the Japanese Government formulated a set of some 2,000 Chinese characters, simplifying the radicals and strokes of some of them, and instructed public institutions, including schools, to use only these characters in writing. Consequently, some of the Chinese characters now in normal use in Japan differ from the traditional forms and, to varying degrees, from the Chinese characters currently used by the Chinese and Koreans. Furthermore, the People's Republic of China uses radically simplified characters, while Taiwan retains the traditional forms; Korea, for its part, uses a combination of Chinese characters and Hangul script. Fig. 1 shows how the same characters differ from one country to another.

Fig. 1. The word "library" as written in Taiwan and Korea, in China (P.R.C.), and in Japan

2. Essential requisites for common use

To attain the goal of UBC (Universal Bibliographic Control), an East Asian version of the UNIMARC format must be developed through close cooperation among the countries concerned and with the IFLA UBC Office in London.
The requisites for developing an East Asian version within the UNIMARC framework are the following:

1) Essential elements written in Chinese characters in the descriptive part of cataloging, namely title, author name, etc., should be linked with access points transcribed in romanized form or in other phonetic scripts.
2) To make mutual use of MARC tapes produced by East Asian countries possible, a standardized set of characters in common use must be developed and formulated.

3. Uniform coding of Chinese-character sets for data processing

Suppose that we obtain MARC tapes from other East Asian countries that use different forms of Chinese characters, and that we undertake to follow ISBD principles strictly. Some sort of conversion table must then be devised so that the descriptive part of each record can be output exactly as it appears in the original language.

In March 1980, the National Institute of Japanese Literature published a technical report entitled "Kanji Dictionaries for Data Processing Systems". The report identifies the following three components as requisite for processing kanji:

1) Kanji coding dictionary: a technical approach to formulating a correlation table between Chinese characters and their codes, together with a specification of the fonts used.
2) Kanji thesaurus: a machine-readable Chinese-character authority file which, like an ordinary thesaurus, groups synonymous and other related characters together to show their interrelationships.
3) Kanji controller: the role responsible for maintaining the kanji character set; it assigns code numbers to newly registered characters and determines their forms, fonts, etc.

This is an example of the Chinese-character control process that must be determined in advance as part of the total system of bibliographic information processing in East Asian scripts. Such a control system may require a full-time person.
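The three components can be pictured with a minimal Python sketch. The class name `KanjiController`, its methods, and the sequential numbering scheme are hypothetical illustrations, not the design in the National Institute of Japanese Literature report: a code table stands in for the coding dictionary, variant groups stand in for the thesaurus, and `register()` stands in for the controller assigning codes to new characters. The sample characters are the Japanese, traditional, and P.R.C. forms of the first character of the word "library".

```python
from dataclasses import dataclass, field


@dataclass
class KanjiController:
    """Hypothetical sketch of the three kanji-processing components.

    code_table: plays the role of the kanji coding dictionary (character -> code)
    variants:   plays the role of the kanji thesaurus (character -> related group)
    register(): plays the role of the kanji controller assigning new codes
    """
    next_code: int = 1
    code_table: dict = field(default_factory=dict)
    variants: dict = field(default_factory=dict)

    def register(self, char: str) -> int:
        """Assign the next code number to a newly registered character."""
        if char not in self.code_table:
            self.code_table[char] = self.next_code
            self.next_code += 1
            self.variants[char] = frozenset({char})
        return self.code_table[char]

    def link_variants(self, a: str, b: str) -> None:
        """Record in the thesaurus that a and b are forms of the same character."""
        self.register(a)
        self.register(b)
        group = self.variants[a] | self.variants[b]
        for ch in group:
            self.variants[ch] = group

    def related(self, char: str) -> frozenset:
        """Return every character the thesaurus links to char (including itself)."""
        return self.variants.get(char, frozenset({char}))


ctl = KanjiController()
ctl.link_variants("図", "圖")  # Japanese form and traditional form
ctl.link_variants("圖", "图")  # traditional form and P.R.C. simplified form
print(ctl.code_table)          # {'図': 1, '圖': 2, '图': 3}
print(sorted(ctl.related("図")))
```

Keeping the thesaurus separate from the code table means a conversion table between national forms can be derived from the variant groups without renumbering any character.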
To standardize a Chinese character set at the national level and control it effectively, operating a national-scale Chinese-character authority system may be the best solution. Should such a system become operational in each East Asian country, standardization of a common Chinese-character set and the exchange of Chinese-character MARC tapes among East Asian countries would become a reality.

4. Chinese-character sets in East Asian countries

To make it possible to exchange Chinese-character bibliographic information in a form such as the JAPAN/MARC tape among the countries concerned, each country must have its own standardized character set.

Graphic characters commonly used for information interchange, such as diacritics, numerals, and Roman, Greek, and Cyrillic letters, are few in number and can easily be provided in any character set. The Chinese characters used in this region, however, are so numerous that it is essential first to survey the frequency of usage of every individual character and then determine which characters are to be included in the standard set.

In Korea, Chinese characters are used together with a Korean script called Hangul. The Korean Scientific and Technological Information Center (KORSTIC) has been developing a Hangul and Chinese-character processing system since 1975.

In China, a radical simplification of Chinese characters has been enforced since 1956, yet a standard character set has not yet been established.

In Taiwan, several surveys on the frequency of usage of characters have been conducted, leading to the formulation of a standardized set of radicals and other components that, when properly combined by a mechanized process, produce individual characters. To date, the only standard interchange code available there is the telegraphic code, which, however, is not suitable for most existing Chinese data processing systems. At present, the Institute of Information Industry is working on the establishment of a standard Chinese character set.

5. Standardization of a character set in Japan

Japanese bibliographic data are usually represented by a combination of kana and kanji. To process such data, the Japan Information Processing Development Center, at the request of the Agency of Industrial Science and Technology, studied and prepared standard kanji codes for information interchange in 1974. The study report was submitted to the Agency for further review, and its final outcome was the Japanese Industrial Standard for kanji coding, promulgated as JIS C6226-1978 in January 1978.
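The frequency survey described above can be sketched as a greedy selection: count each character's occurrences, then take characters in descending frequency until they account for a chosen share of all usage. The function name, the coverage threshold, and the restriction to the Unicode CJK Unified Ideographs block (a modern convenience that did not exist in 1980) are assumptions for illustration; the toy corpus is far smaller than any real survey.

```python
from collections import Counter


def select_standard_set(corpus: str, target_coverage: float) -> list:
    """Greedily pick the most frequent ideographs until they account for
    target_coverage (a fraction between 0 and 1) of all ideograph occurrences.

    Assumption: only characters in the Unicode CJK Unified Ideographs block
    (U+4E00..U+9FFF) are counted; kana, digits and Latin letters are ignored.
    """
    counts = Counter(ch for ch in corpus if "\u4e00" <= ch <= "\u9fff")
    total = sum(counts.values())
    selected, covered = [], 0
    for ch, n in counts.most_common():  # most frequent first
        if covered >= target_coverage * total:
            break
        selected.append(ch)
        covered += n
    return selected


sample = "図書館の図書目録と図書館"  # toy corpus; real surveys use millions of characters
print(select_standard_set(sample, 0.8))  # ['図', '書', '館']
```

Raising the target coverage enlarges the set with progressively rarer characters, which is the trade-off behind splitting a standard into a small first level and a larger second level.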
Graphic characters were included in this standard set after close examination of their frequency of usage. The set consists of the following characters:

Diacritics            108
Numerals               10
Roman letters          52
Greek letters          48
Cyrillic letters       66
Kana                  169
Kanji               6,349
Total               6,802

Coded kanji characters are divided into two groups: the first-level group includes the 2,965 most frequently used characters, and the second-level group includes 3,384. To process these several thousand kanji in Japan, input and output devices must be built to conform to the JIS standard.

6. Proposed ISO standard for East Asian scripts

The peoples of East Asia share, to a large extent, a common cultural origin traceable to Chinese civilization. Over the long course of their separate development, however, things unique to each country have emerged and taken root. Language is no exception, and the same Chinese characters frequently have different meanings. For example, the word 汽車 means a railway train in Japan but an automobile in China; again, the Chinese character J%j is frequently used in China but is not in common use in Japan. Conversely, because of the cultural background the East Asian countries share, as noted above, many words written in Chinese characters are used with identical meanings throughout the region. Therefore, if a full-scale survey were made of the characters commonly used, albeit in variant forms, among these nations, it would be possible to work out a Common Core Character Set (CCCS) to be shared by all of them. As a first step toward this end, an adequate core size must be determined.

In Japan, as Miss Kiyoko Tamura explained in her paper, the NDL character set has two groups: the first contains about 2,000 frequently used characters, and the second consists of less frequently used characters. Characters not in our set also ought to be handled. To output data from China, for instance, a system must be developed that can generate the characters used in that country, so that essential elements such as title and author are transcribed accurately, if ISBD principles are to be followed.
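The idea of a common core can be illustrated by folding each national form of a character to one shared identity and intersecting the national sets; whatever falls outside the intersection corresponds to the "external characters" of each country in Fig. 2. The folding table, the choice of the traditional form as the shared identity, and the tiny national sets are hypothetical. The pairs 図/圖/图, 書/书, and 館/馆 are genuine variant forms, and 込 is a character coined in Japan.

```python
# Hypothetical folding table: each national form -> one shared core identity.
# Choosing the traditional form as the identity is an arbitrary assumption.
FOLD = {"図": "圖", "图": "圖", "书": "書", "馆": "館"}


def fold(ch: str) -> str:
    """Map a national form to its shared identity (unlisted characters map to themselves)."""
    return FOLD.get(ch, ch)


def common_core(national_sets: dict) -> set:
    """Characters present, in some form, in every national set."""
    folded = [{fold(c) for c in chars} for chars in national_sets.values()]
    return set.intersection(*folded) if folded else set()


def external_characters(national_sets: dict, core: set) -> dict:
    """Per country, the characters that fall outside the common core."""
    return {name: {c for c in chars if fold(c) not in core}
            for name, chars in national_sets.items()}


national_sets = {
    "Japan": {"図", "書", "館", "込"},  # 込 was coined in Japan
    "China": {"图", "书", "馆"},
    "Taiwan": {"圖", "書", "館"},
    "Korea": {"圖", "書", "館"},
}
core = common_core(national_sets)
print(sorted(core))
print(external_characters(national_sets, core))
```

In a real survey the folding table itself would come from the kind of character thesaurus described in section 3, and the core size would depend on how many variant pairs it recognizes.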
To handle characters not in the standard set, the National Institute of Japanese Literature has developed a unique system called FONTGEN (font generator). The complexity and the enormous number of Chinese characters (50,000 or more) require highly sophisticated software and hardware.

The proposed Common Core Character Set (CCCS) assigns a single code number to each character, even where its form differs somewhat from country to country, as illustrated in Figure 1. When it is necessary, in outputting a particular record, to switch from one form of a character to another within a given data field, an escape sequence technique could be employed. A conceptual outline of the CCCS is given in Figure 2.

7. Format and authority control of East Asian bibliographic records

The JAPAN/MARC format may well serve as a model for the proposed East Asian version of UNIMARC, and we are willing to contribute a reasonable share, based on our own experience. JAPAN/MARC consists of two parts: a descriptive part and access points.
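The single-code, escape-switched scheme can be sketched as follows: a data field stores CCCS code numbers, and an embedded escape sequence selects which national form is rendered from that point on. The `ESC` + letter convention, the form letters, and the three-entry table are hypothetical; the paper does not specify an encoding. The closing lines show a real relative of the technique, presumably the kind the author had in mind: ISO-2022-JP, as implemented by Python's `iso2022_jp` codec, switches from ASCII to JIS X 0208 with the escape sequence ESC $ B.

```python
ESC = "\x1b"  # hypothetical convention: ESC + one letter selects the output form

# Hypothetical CCCS table: one code number per character, one glyph per national form
# (J = Japan, C = China (P.R.C.), T = traditional form as used in Taiwan and Korea).
CCCS = {
    1: {"J": "図", "C": "图", "T": "圖"},
    2: {"J": "書", "C": "书", "T": "書"},
    3: {"J": "館", "C": "馆", "T": "館"},
}


def esc(form: str) -> str:
    """Build the escape sequence that switches to the given form."""
    return ESC + form


def render(field: list, form: str = "J") -> str:
    """Render a data field of CCCS code numbers, obeying embedded escapes."""
    out = []
    for token in field:
        if isinstance(token, str) and token.startswith(ESC):
            form = token[len(ESC):]  # switch forms for the rest of the field
        else:
            out.append(CCCS[token][form])
    return "".join(out)


print(render([1, 2, 3]))                      # 図書館
print(render([esc("C"), 1, 2, 3]))            # 图书馆
print(render([esc("C"), 1, 2, esc("J"), 3]))  # 图书館

# A real-world relative: ISO-2022-JP designates JIS X 0208 with ESC $ B
# and returns to ASCII with ESC ( B.
jis_bytes = "ABC図".encode("iso2022_jp")
print(jis_bytes)
```

Because the code number never changes, two libraries can exchange the same record and each render it in its own form, while an ISBD transcription can still request the original form explicitly.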

The descriptive part must be transcribed according to ISBD rules, whereas access points can be approached differently, since they must be established to meet the needs of most users. Let me relate an illustrative episode about a Japanese librarian who visited China some time ago. His name is Urata, which we write "i iz7". A Chinese colleague pointed out that if his books, published in Japan, were catalogued in China, his name would be entered as "f. fq". This means that changing the form of Chinese characters may be permissible for access points but not for the descriptive part, in which ISBD principles should be observed strictly. Thus, to achieve effective control over access points in these countries, what will be needed is the development and implementation of an international MARC network, together with a national authority control system.

In conclusion, I wish to point out that, in order to share MARC tapes produced in countries using different forms of Chinese characters, a prerequisite is the establishment of a Common Core Character Set (CCCS). To propose such a set is the main purpose of this paper.

Fig. 2. Conceptual outline of the CCCS: the Korea, China, Taiwan, and Japan character sets overlap in a Common Core, and each national set also contains external characters.

(Mr. Takahashi is Director of the Administrative Division of the National Diet Library, Tokyo. Mr. Takahashi asks us to point out that this article was prepared in cooperation with Mr. Toshikazu Kanaka, Chief, Computer Section, Administrative Division, and Mr. Shojiro Maruyama, Senior Researcher, Acquisitions and Processing Division.)
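The distinction between the descriptive part and access points can be sketched as a record builder that keeps the descriptive fields exactly as transcribed from the item while converting access points to locally preferred character forms. The conversion table, the field names, and the sample book are hypothetical; "张三" is a placeholder name (comparable to "John Doe"), not a real author. Romanized access points, the paper's first requisite, are omitted for brevity.

```python
# Hypothetical table of foreign forms -> forms preferred in a Japanese catalogue.
LOCAL_FORM = {"圖": "図", "图": "図", "书": "書", "馆": "館", "论": "論", "张": "張"}


def localize(text: str) -> str:
    """Convert each character to its locally preferred form, if one is listed."""
    return "".join(LOCAL_FORM.get(ch, ch) for ch in text)


def catalog_record(title: str, author: str) -> dict:
    """Build a toy record: descriptive part verbatim, access points localized."""
    return {
        # Descriptive part: transcribed exactly as on the item (ISBD).
        "descriptive": {"title": title, "author": author},
        # Access points: forms chosen to suit local users.
        "access_points": {"title": localize(title), "author": localize(author)},
    }


record = catalog_record("图书馆学概论", "张三")  # hypothetical P.R.C. book
print(record["descriptive"]["title"])     # 图书馆学概论
print(record["access_points"]["title"])   # 図書館学概論
print(record["access_points"]["author"])  # 張三
```

A shared authority file would replace the hard-coded table in practice, so that every library entering the same author produces the same access point.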