Research on concept-sememe tree and semantic relevance computation

GuiPing Zhang 1, Chao Yu 1, DongFeng Cai 1, Yan Song 1, JingGuang Sun 1

1 Natural Language Processing Laboratory, Shenyang Institute of Aeronautical Engineering, P.O. Box 118, No. 52 North Huanghe Street, Shenyang, Liaoning, China, 110034.
Email: zgp@ge-soft.com, yc089067@sina.com, cdf@ge-soft.com, mattsure@gmail.com, sunjingguang@gmail.com

Abstract. In this paper, we parse the hierarchical relations of concepts in HowNet and, based on this parsing, design a concept-sememe tree structure. The tree makes the relations between the sememes in a concept easy to understand and more convenient to process by computer. The steps for building the tree are also presented. We then discuss relevance computation based on HowNet. A preliminary experiment shows that the relevance computation method achieves satisfactory results.

Keywords: HowNet; relevance; concept-sememe tree.

1 Introduction

The computation of similarity and relevance between words has wide application in machine translation, word sense disambiguation, information retrieval (IR), etc. The two notions are sometimes confused. Similarity between words refers to the feature of clustering, while relevance refers to the extent of association [1]. Two semantically similar words may share associated words: 医生 (doctor) and 护士 (nurse) are both associated with 医院 (hospital), 病人 (patient), 打针 (have an injection), 吃药 (take medicine), etc. Associated words, however, are usually not similar, as with the word pair 吃 (eat) and 食物 (food). The computation of similarity and relevance is widely studied. Recently, two semantic similarity computation methods based on HowNet have been proposed: one is described by Professor Liu Qun in [2] and the other by Professor Dong ZhenDong in [3].
In this paper, we describe how to build the concept-sememe tree by decomposing the concept definition (DEF) of a word. The concept-sememe tree illustrates the DEFs of words and makes them easy to process by computer.

2 Concept similarity computation and the implementation of the concept-sememe tree

A concept is a description of a word sense. A polysemous word has several different concepts, so we must ascertain which DEFs are involved before we compute the semantic similarity of a word pair. Computing the semantic similarity of words is thus, in fact, computing the similarity of concepts.

2.1 HowNet based concept similarity computation

The number of identical concept nodes determines the similarity of two concepts: the more concept nodes two concepts share, the more similar they are. Here a concept node is a pair semantic role=sememe. Two concept nodes are the same when their sememe descriptions are the same in character form and they occupy the same position in the structure of their concepts. In practical computation, it is not easy to determine whether two concept nodes have the same structure. We therefore design the concept-sememe tree, which translates a concept DEF into a tree structure. This makes the hierarchical structure of the DEF easy to understand and easy to process by computer. For example, the DEFs of the concepts 医生 (doctor) and 护士 (nurse) are:

医生 (doctor): DEF={human 人:HostOf={Occupation 职位},domain={medical 医},{doctor 医治:agent={~}}}
护士 (nurse): DEF={human 人:HostOf={Occupation 职位},domain={medical 医},{TakeCare 照料:agent={~}}}

Through analysis of the concept DEF, we developed a program that decomposes a DEF and builds the concept-sememe tree, as shown in Fig. 1 and Fig. 2:

    医生 (doctor)
        NONAME={human 人}
            HostOf={Occupation 职位}
            domain={medical 医}
            NONAME={doctor 医治}
                agent={~}

Fig. 1. The concept-sememe tree of concept 医生 (doctor)

    护士 (nurse)
        NONAME={human 人}
            HostOf={Occupation 职位}
            domain={medical 医}
            NONAME={TakeCare 照料}
                agent={~}

Fig. 2. The concept-sememe tree of concept 护士 (nurse)

There are five concept nodes in each of the two concept-sememe trees. In character form, four pairs of concept nodes are the same: NONAME={human 人}, HostOf={Occupation 职位}, domain={medical 医} and agent={~}. But in the tree of 医生 (doctor) the father node of agent={~} is NONAME={doctor 医治}, whereas in the tree of 护士 (nurse) the father node of agent={~} is NONAME={TakeCare 照料}. The corresponding father nodes are different, so we consider the concept node agent={~} of 医生 (doctor) to be different in structure from that of 护士 (nurse). Thus, of the five concept nodes in each of the two concepts, three are the same, and this count is the most important input to the similarity computation between concepts.
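The node-matching rule above can be illustrated with a short Python sketch. It is not the authors' program: the `(label, children)` tuple representation and the function names `node_paths` and `shared_nodes` are hypothetical, and the sketch assumes that "same structure" means the same chain of ancestors from the root, which is one reading of the rule.

```python
def node_paths(tree, prefix=()):
    """Yield the root-to-node label path of every node in a (label, children) tree."""
    label, children = tree
    path = prefix + (label,)
    yield path
    for child in children:
        yield from node_paths(child, path)


def shared_nodes(tree_a, tree_b):
    """Count nodes whose label AND ancestor chain are identical in both trees."""
    return len(set(node_paths(tree_a)) & set(node_paths(tree_b)))


# Trees of Fig. 1 and Fig. 2, written out by hand (assumed representation).
doctor = ("NONAME={human 人}", [
    ("HostOf={Occupation 职位}", []),
    ("domain={medical 医}", []),
    ("NONAME={doctor 医治}", [("agent={~}", [])]),
])
nurse = ("NONAME={human 人}", [
    ("HostOf={Occupation 职位}", []),
    ("domain={medical 医}", []),
    ("NONAME={TakeCare 照料}", [("agent={~}", [])]),
])

print(shared_nodes(doctor, nurse))  # 3
```

The two agent={~} nodes have the same label but different paths, so they are not counted, which reproduces the count of three given above.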
The similarity computation itself is described in detail in [3].

2.2 The process of building concept-sememe tree

1: We categorize concept nodes into two types. The first type has the form dynamic role={value} and is fully described. The partly described nodes form the second type. For example, in the DEF of 医生 (doctor), HostOf={Occupation 职位} is of the first type, while human 人 and doctor 医治 are of the second type.

2: To find the father node of a concept node Node(i) of the first type, find the nearest colon j that precedes Node(i). If the number of { symbols equals the number of } symbols between colon j and Node(i), the concept node immediately preceding colon j is the father node of Node(i). Otherwise, move to the next colon preceding colon j and repeat the same test.

3: To find the father node of a concept node Node(i) of the second type, find the nearest colon j that precedes Node(i). If the number of { symbols is one more than the number of } symbols between colon j and Node(i), the concept node immediately preceding colon j is the father node of Node(i). Otherwise, move to the next colon preceding colon j and repeat the same test.

4: When the father nodes of all concept nodes have been found, the relationships among the sememes can be read directly from the concept-sememe tree.

We now give an example to illustrate the use of the concept-sememe tree. The DEFs of the words 洗衣 (wash clothes) and 洗衣机 (washer) are {wash 洗涤:patient={clothing 衣物}} and {tool 用具:{wash 洗涤:instrument={~},patient={clothing 衣物}}} respectively. Following the procedure above, we build the concept-sememe trees shown in Fig. 3 and Fig. 4:

    洗衣 (wash clothes)
        NONAME={wash 洗涤}
            patient={clothing 衣物}

Fig. 3. Concept-sememe tree of 洗衣 (wash clothes)

    洗衣机 (washer)
        NONAME={tool 用具}
            NONAME={wash 洗涤}
                instrument={~}
                patient={clothing 衣物}

Fig. 4. Concept-sememe tree of 洗衣机 (washer)

It is easily seen from Fig. 3 and Fig. 4 that the concepts 洗衣 (wash clothes) and 洗衣机 (washer) have two concept nodes that are the same in character form: wash 洗涤 and patient={clothing 衣物}. But the root nodes of the two concept-sememe trees are wash 洗涤 and tool 用具 respectively, so the shared nodes sit under different ancestors and no concept node is the same in structure. Thus the similarity between the concepts 洗衣 (wash clothes) and 洗衣机 (washer) is small in HowNet.

3 Semantic relevance computation

Semantic relevance refers to how closely two words are related. In this paper, we implement concept relevance computation using HowNet. The sememes in the concepts of words and the related concepts field [4] in HowNet provide an approach to relevance computation. Relevant concepts are the concepts associated with the concept of a given word. The related concepts field is the set of relevant concepts, represented by words.
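Returning to the father-node search of Section 2.2, the following Python sketch applies the colon-and-brace rule to a DEF string. It is an illustration under stated assumptions, not the authors' program: the function name `build_tree` and both regular expressions are hypothetical, the DEF is assumed to be written without spaces around punctuation, and role values that themselves contain nested structure are not handled.

```python
import re

# Type 1: fully described node, role={value}  (assumed pattern)
ROLE_NODE = re.compile(r"(\w+)=\{([^{}:]*)\}")
# Type 2: partly described node, "{sememe" followed by ':' or '}'  (assumed pattern)
BARE_NODE = re.compile(r"(?<!=)\{([^{}:=]+)(?=[:}])")


def build_tree(definition):
    """Return [(label, parent_index or None), ...] in DEF order."""
    nodes = []  # (start, kind, label, end)
    for m in ROLE_NODE.finditer(definition):
        nodes.append((m.start(), 1, f"{m.group(1)}={{{m.group(2)}}}", m.end()))
    for m in BARE_NODE.finditer(definition):
        # start is placed after the node's own '{', so that brace is
        # counted as the "one more {" of step 3
        nodes.append((m.start(1), 2, f"NONAME={{{m.group(1)}}}", m.end(1)))
    nodes.sort()

    # Map each colon to the type-2 node that immediately precedes it.
    colon_owner = {n[3]: i for i, n in enumerate(nodes)
                   if n[1] == 2 and definition[n[3]] == ":"}

    tree = []
    for start, kind, label, _ in nodes:
        parent = None
        # Walk the colons ahead of the node, nearest first.
        for colon in sorted((c for c in colon_owner if c < start), reverse=True):
            between = definition[colon + 1:start]
            surplus = between.count("{") - between.count("}")
            if surplus == kind - 1:  # 0 for type 1 (step 2), 1 for type 2 (step 3)
                parent = colon_owner[colon]
                break
        tree.append((label, parent))
    return tree


doctor = "{human 人:HostOf={Occupation 职位},domain={medical 医},{doctor 医治:agent={~}}}"
washer = "{tool 用具:{wash 洗涤:instrument={~},patient={clothing 衣物}}}"

for label, parent in build_tree(doctor):
    print(label, "<-", parent)
```

A node with no qualifying colon ahead of it gets `None` as parent and becomes the root, which matches NONAME={human 人} in Fig. 1 and NONAME={tool 用具} in Fig. 4. A stack-based brace parser would give the same trees for these examples; the scan above is used only because it mirrors the four steps as written.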
3.1 Compute the relevance Rel_1 of the sememes of DEFs between two words

The sememes in the concept of a word provide clues for building association relationships between words. We decompose each DEF into its set of sememes. The overlap of the two sememe sets indicates the extent of semantic relevance, which we denote Rel_1. See formula (1):

$$\mathrm{Rel}_1(A,B) = \frac{|\mathrm{Sememe}_A \cap \mathrm{Sememe}_B|}{|\mathrm{Sememe}_A \cup \mathrm{Sememe}_B|} \tag{1}$$

Here Sememe_A and Sememe_B refer to the sememe set of concept A and the sememe set of concept B respectively. The numerator is the number of sememes that concept A and concept B have in common, while the denominator is the number of sememes in the union of the two sets. For example, the concepts of the words 报纸 (newspaper) and 新闻 (news) are {publications 书刊:{publish 出版:ContentProduct={news 新闻},LocationFin={~}}} and {news 新闻} respectively. The two concepts have 4 sememes and 1 sememe, and the only sememe they share is {news 新闻}, so the union has 4 elements and Rel_1(报纸 (newspaper), 新闻 (news)) = 1/4 = 0.25.

3.2 Compute the complete containing degree Rel_2 of the sememes of DEF

Two related words may have no sememe in common in their DEFs. Two words may be related in sense because they are connected through an intermediate entity in some context. For example, the DEFs of 食物 (food), 鱼 (fish) and 海货 (seafood) are {food 食品}, {fish 鱼} and {food 食品:material={fish 鱼}} respectively. The DEF of 海货 (seafood) describes the relationship between 食物 (food) and 鱼 (fish): because 鱼 (fish) can be regarded as a material of 食物 (food), the relevance of 食物 (food) and 鱼 (fish) can be established through the DEF of the word 海货 (seafood).

As another example, the DEFs of the words 吃 (eat) and 面包 (bread) are {eat 吃} and {food 食品} respectively. The DEFs are too simple to reveal any association, yet common sense says the words are relevant. We therefore ascertain their relevance from the sememe descriptions of the words in their related concepts fields. There are 2554 words in the related concepts field of {eat 吃} of the word 吃 (eat), and 116 of them contain the DEF {food 食品}. There are 857 words in the related concepts field of {food 食品} of the word 面包 (bread), and 13 of them contain the DEF {eat 吃}. We thus count the number of words whose concept completely contains the concept of the other word and use it to express the containing degree Rel_2. See formula (2), where the square roots are those implied by the worked values given for this example:

$$\mathrm{Rel}_2(A,B) = \mathrm{MAX}\left(\sqrt{\frac{\mathrm{Num}(b_j)}{N_B}},\ \sqrt{\frac{\mathrm{Num}(a_i)}{N_A}}\right) \tag{2}$$

In formula (2), A and B refer to different concepts, a_i refers to the concept of word i in the related concepts field of concept A, and b_j refers to the concept of word j in the related concepts field of concept B. Num(a_i) is the number of words i whose concept a_i completely contains the DEF of B, and Num(b_j) is the number of words j whose concept b_j completely contains the DEF of A.
N_A refers to the number of words in the related concepts field of concept A, and N_B refers to the number of words in the related concepts field of concept B. Supposing concept A is {eat 吃} and concept B is {food 食品}, formula (2) gives:

$$\mathrm{Rel}_2(A,B) = \mathrm{MAX}\left(\sqrt{\frac{13}{857}},\ \sqrt{\frac{116}{2554}}\right) = \mathrm{MAX}(0.123163,\ 0.213117) = 0.213117$$

3.3 Compute the containing degree of the words in the related concepts field of two words (Rel_3)

With the methods above, it is still hard to establish a relationship between some words, such as 鱼 (fish) and 水 (water). Their DEFs are {fish 鱼} and {water 水域} respectively. There is no common sememe and no mutual containment between the sememes, yet the word 水 (water) still appears in the related concepts field of the word 鱼 (fish). This is because the first sememe of the DEF of 鱼 (fish) is fish 鱼, whose sememe frame is {animal 兽:MaterialOf={edible 食物},{alive 活着:experiencer={~},location={waters 水域}},{eat 吃:patient={~}},{swim 游:agent={~}}}. Professor Dong ZhenDong extracts the DEF segment location={waters 水域}, performs a fuzzy search with it, and includes in the related concepts field of 鱼 (fish) all the words whose DEF contains the whole segment. This provides a way to relate the words 鱼 (fish) and 水 (water). We use the number of words shared by the related concepts fields of two words to indicate the extent of their semantic relation. See formula (3):

$$\mathrm{Rel}_3(A,B) = \frac{|W_A \cap W_B|}{|W_A \cup W_B|} \tag{3}$$

Here W_A and W_B refer to the word sets of the related concepts fields of concept A and concept B respectively. The numerator is the size of the intersection of the two related concepts fields, while the denominator is the size of their union.
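The three measures of Sections 3.1 to 3.3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the square roots in `rel2` are inferred from the worked values 0.123163 and 0.213117, the fourth sememe of 报纸 (newspaper) is assumed to be the self-reference ~, and the word sets passed to `rel3` are toy data rather than real related concepts fields.

```python
import math


def overlap(set_a, set_b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    set_a, set_b = set(set_a), set(set_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def rel1(sememes_a, sememes_b):
    """Formula (1): overlap of the sememe sets of two concepts."""
    return overlap(sememes_a, sememes_b)


def rel2(num_a, n_a, num_b, n_b):
    """Formula (2): num_a of the n_a words in A's field contain B's DEF, and vice versa."""
    return max(math.sqrt(num_b / n_b), math.sqrt(num_a / n_a))


def rel3(words_a, words_b):
    """Formula (3): overlap of the related concepts fields of two concepts."""
    return overlap(words_a, words_b)


# Section 3.1 example; the "~" entry is an assumption about the fourth sememe.
newspaper = {"publications 书刊", "publish 出版", "news 新闻", "~"}
news = {"news 新闻"}
print(rel1(newspaper, news))               # 0.25

# Section 3.2 example: A = {eat 吃}, B = {food 食品}.
print(round(rel2(116, 2554, 13, 857), 6))  # 0.213117

# Toy related concepts fields, invented purely for illustration.
print(rel3({"水", "海"}, {"水", "河", "湖"}))  # 0.25
```

Formulas (1) and (3) share one helper because both are the same intersection-over-union ratio applied to different sets.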

4 Experiment

After computing Rel_1(A,B), Rel_2(A,B) and Rel_3(A,B), we use formula (4) to compute the concept relevance between words A and B:

$$\mathrm{Rel}(A,B) = \beta_1\,\mathrm{Rel}_1(A,B) + \beta_2\,\mathrm{Rel}_2(A,B) + \beta_3\,\mathrm{Rel}_3(A,B) \tag{4}$$

Here β_i is an adjustable weight coefficient and the weights sum to 1. In our experiment, these parameters are β_1 = 0.3, β_2 = 0.2, β_3 = 0.5. Some experimental results are listed in Table 1.

Table 1. Relevance computation results

| Word 1 | Word 2 | Similarity (HowNet) | Relevance |
|---|---|---|---|
| 鱼 (Fish) | 水 (Water) | 0.016667 | 0.3686129 |
| 吃 (Eat) | 食物 (Food) | 0.000624 | 0.2339492 |
| 吃 (Eat) | 水 (Water) | 0.000624 | 0.0454115 |
| 吃 (Eat) | 报纸 (Newspaper) | 0.000624 | 0.002011 |
| 新闻 (News) | 报纸 (Newspaper) | 0.116667 | 0.76 |
| 新闻 (News) | 记者 (Correspondent) | 0.118605 | 0.733 |
| 新闻 (News) | 传播 (Disseminate) | 0.000624 | 0.1710146 |
| 警察 (Policeman) | 法官 (Judge) | 0.825000 | 0.7278904 |
| 警察 (Policeman) | 警衔 (Police rank) | 0.101247 | 0.2913509 |
| 警察 (Policeman) | 治安 (Public order) | 0.001247 | 0.0540706 |
| 医生 (Doctor) | 护士 (Nurse) | 0.620000 | 0.6752305 |
| 医生 (Doctor) | 手术 (Operation) | 0.000624 | 0.5999999 |
| 护士 (Nurse) | 手术 (Operation) | 0.000624 | 0.4780876 |

From the table we can see that most experimental results are satisfactory. Words with strong similarity also get a high relevance value, such as 警察 (policeman) and 法官 (judge). Words with strong relevance usually do not show strong similarity, such as 新闻 (news) and 记者 (correspondent).

5 Conclusion

The experimental results are acceptable and conform to human intuition. In future work, we will further study the semantic information in the concepts of words and classify the concepts to make the relevance values more reasonable.

References

1. Dagan I., Lee L. and Pereira F. (1999), Similarity-based models of word cooccurrence.
2. Liu Qun, Li SuJian, Word similarity computation based on HowNet, http://www.keenage.com, 2002.
3. Dong ZhenDong, HowNet and Computation of Meaning [M]. Singapore: World Scientific Press, pp. 197-206, 2006.
4. Dong Qiang, Dong ZhenDong, Related concepts field's building based on HowNet [J]. Language Computing and Text Processing Based on Context, pp. 364-369, 2003.