2009 International Conference on Machine Learning and Computing
IPCSIT vol.3 (2011) © (2011) IACSIT Press, Singapore

Research on Sentence Relevance Based on Semantic Computation

Jinzhong Xu, Xiaozhong Fan, Jintao Mao

School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China

Abstract. In an Automatic Question Answering System, one key to question parsing and answer extraction is the relevance computation of sentences. This paper introduces an approach to computing sentence relevance. Using the semantic computation in HowNet, the relevance of sentences is calculated through the relevance between the words of a sentence and the subject words. Experimental results show the effectiveness of the method.

Keywords: question parsing, sentence relevance, semantic computation, HowNet.

1. Introduction

Research on Automatic Question Answering Systems includes question matching. In the processing pipeline of our system, the relevance between question and answer is used. This research involves sentence relevance, which is realized by computing the semantic relevance of the words in sentences. The key point of the study is to have an appropriate calculation method for sentence relevance.

Semantic relevance and semantic similarity are two different but closely linked concepts. Semantic similarity is the degree to which two words in different contexts can replace each other without altering the syntactic and semantic structure of the text. Semantic relevance includes some concepts of semantic similarity, and the calculation method of similarity has reference value for the research of relevance. Here, semantic relevance is realized on the basis of semantic similarity computed with HowNet.

2. Common calculation methods of relevance

At present, calculation methods of sentence relevance are mainly of two types: the method based on similarity and the method based on ontology.

2.1. Calculation method based on similarity

The calculation method based on similarity generally uses the vector space model. It is very similar to the statistics-based calculation method of word similarity.
The query sentence and the sentences of the sentence base are transformed into vectors in the space of characteristic words, and then the cosine of the angle between two vectors is used to describe the sentence relevance. This method is simple, but modelling and computing based on vector space do not accurately reflect the semantic information of the query sentence.

2.2. Calculation method based on ontology

Ontology has received more and more attention in computer science. A lot of research has tried to apply ontology to computing relevance. Sentence relevance computing based on ontology includes three parts: constructing a domain ontology, keyword weighting and relevance computing. An ontology describes the common concepts of a field explicitly and formally. Through the definition of shared domain-specific concepts and words, users and computers can communicate accurately at the semantic level, and not just exchange data in its grammatical representation. The concept is the basic structural unit of an ontology, and concept sets are organized by concept hierarchies. A concept has attributes, and concepts are related to each other through the attributes. In addition, examples and synonyms are structured in the ontology to describe the sentence contents adequately. Ontology theory has been widely used in knowledge engineering, natural language processing, digital libraries and other fields. An ontology describes the words of a sentence by a series of definitions of attributes, relations and examples; these are the basic resources for computing relevance.

Corresponding author. Tel.: + 86 369333066; fax: +86 006895944. E-mail address: xunzhong@63.net.

3. Sentence relevance

3.1. HowNet

HowNet is an on-line common-sense knowledge base unveiling inter-conceptual relations and inter-attribute relations of concepts as connoting in lexicons of the Chinese and their English equivalents. Concept and sememe are two important parts of HowNet. A concept is a description of word semantics; a word can be described with several concepts, and a concept is described by sememes. The description of concepts in HowNet is an attempt to present the inter-relation between concepts and that between the attributes. HowNet has 1618 sememes, which fall into 10 types: Event|事件, entity|实体, attribute|属性, aValue|属性值, quantity|数量, qValue|数量值, SecondaryFeature|次要特征, syntax|语法, EventRole|动态角色 and EventFeatures|动态属性.

The description of concepts in HowNet is necessarily complex. A concept is illustrated here by an example. It is found in the knowledge dictionary as:

    NO.=03683
    W_C=打
    G_C=V [da3]
    S_C=
    E_C=~球,~网球,~篮球,~羽毛球,~牌,~扑克,~麻将,~秋千,~太极拳,球~得很棒
    W_E=play
    G_E=V
    S_E=
    E_E=
    DEF={exercise|锻炼:domain={sport|体育}}

In the example, NO. is the entry number of the concept in the dictionary, G_C is the part of speech of this concept in Chinese and G_E is that in English, E_C is the example of the concept, W_E is the concept in English, and DEF is the definition.

Semantic computing has been introduced into the computation of relevance with HowNet. First, a calculation mechanism is set up between the similarity and the relevance of sememes. Second, from the calculation results for sememes, the similarity and relevance of words can be obtained. Last, sentence relevance can be computed from the similarity and relevance of words.

3.2. Words semantic similarity

The sememe classification tree is given in HowNet, and an upper and lower semantic relation exists between parent node and child node, so semantic similarity can be computed by using the sememe classification tree:

$$\mathrm{Sim}(p_1, p_2) = \frac{2 \cdot \mathrm{Spd}(p_1, p_2)}{\mathrm{Depth}(p_1) + \mathrm{Depth}(p_2)}$$

In the formula, $p_1$ and $p_2$ are two sememes, $\mathrm{Spd}(p_1, p_2)$ is the coincidence degree of $p_1$ and $p_2$, and $\mathrm{Depth}(p)$ is the depth of sememe $p$ in the sememe tree.

In HowNet, a concept has 4 parts:

(1) first basic sememe description: the first sememe in DEF;
(2) other basic sememe description: the other sememes in DEF except the first sememe;
(3) relation sememe description: the parts of the concept described in DEF by "relation sememe = basic sememe", "relation sememe = (specific word)" or "(relation sememe = specific word)";
(4) symbol sememe description: the parts of the concept described in DEF by "relation symbol basic sememe" or "relation symbol (specific word)".

The similarities of two concepts on the 4 parts are $\mathrm{Sim}_1(C_1, C_2)$, $\mathrm{Sim}_2(C_1, C_2)$, $\mathrm{Sim}_3(C_1, C_2)$ and $\mathrm{Sim}_4(C_1, C_2)$. So the whole similarity of concepts is as follows:
$$\mathrm{Sim}(C_1, C_2) = \sum_{i=1}^{4} \beta_i \, \mathrm{Sim}_i(C_1, C_2)$$

In the formula, $\beta_i$ ($1 \le i \le 4$) are weights with $\beta_1 + \beta_2 + \beta_3 + \beta_4 = 1$ and $\beta_1, \beta_2, \beta_3, \beta_4 > 0$.

For two words $W_1$ and $W_2$, if $W_1$ has n concepts $c_{11}, c_{12}, \ldots, c_{1n}$ and $W_2$ has m concepts $c_{21}, c_{22}, \ldots, c_{2m}$, the similarity of $W_1$ and $W_2$ is the maximum similarity over their concepts:

$$\mathrm{Sim}(W_1, W_2) = \max_{1 \le i \le n,\; 1 \le j \le m} \mathrm{Sim}(c_{1i}, c_{2j})$$

3.3. Words semantic relevance

Semantic relevance is a vague concept; there are no specific objective criteria by which it can be measured. In syntactic analysis, the higher the relevance of two words, the shorter their distance in the syntactic tree. Relevance of words involves morphology, syntax, semantics, even pragmatics and others. Among these, semantic relevance has the greatest impact on the relevance of words. Relevance is defined as a real number between 0 and 1.

Definition 1: In syntactic analysis, semantic relevance is the degree of the modification relation, subject-predicate relation and co-referential relation of two words in a phrase structure.

Definition 2: In HowNet, let $W_1$ and $W_2$ be any two words, where $W_1$ has n meanings $S_{11}, S_{12}, \ldots, S_{1n}$ and $W_2$ has m meanings $S_{21}, S_{22}, \ldots, S_{2m}$. If there exist i and j ($1 \le i \le n$, $1 \le j \le m$) such that $S_{1i} = S_{2j}$, then the relevance of $W_1$ and $W_2$ is 1.

If the similarity of two words is high, their relevance is high; but if the relevance of two words is high, their similarity is not necessarily high.

Semantics is described by sememes in HowNet. The sememes fall into 6 classes in HowNet; each class is a tree structure, and these classes are related to each other by explanation sememes. Hyponymy in the sememe tree constitutes one kind of sememe relevance, and the relation between a sememe and its explanation sememes constitutes another. In the system consisting of sememes, each sememe may also have a certain relation with sememes in a different tree. This adds lateral ties to the tree hierarchy of sememes, so the sememe system becomes a network structure. According to inheritance, a lower sememe inherits the explanation sememes of its upper sememe, and the explanation sememes also have a certain hierarchical structure, so sememes have lateral relevance.
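Before turning to the relevance formulas, the similarity computation of Section 3.2 can be illustrated with a small sketch. It is not the authors' implementation. The sememe tree below is a hypothetical toy tree, not HowNet data; the coincidence degree Spd is read here as the number of nodes shared by the two root paths; sememe similarity is assumed to be 2·Spd divided by the sum of the depths, with the root at depth 1; and the default β values are placeholders chosen only so that they are positive and sum to 1.

```python
from itertools import product

# Hypothetical toy sememe tree (child -> parent); None marks the root.
PARENT = {
    "entity": None,
    "thing": "entity",
    "animate": "thing",
    "human": "animate",
    "animal": "animate",
    "artifact": "thing",
}

def path_to_root(sememe):
    """Return the nodes from the sememe up to and including the root."""
    path = []
    while sememe is not None:
        path.append(sememe)
        sememe = PARENT[sememe]
    return path

def depth(sememe):
    """Depth of a sememe in the tree, counting the root as depth 1."""
    return len(path_to_root(sememe))

def spd(p1, p2):
    """Coincidence degree, assumed to be the number of shared root-path nodes."""
    return len(set(path_to_root(p1)) & set(path_to_root(p2)))

def sememe_sim(p1, p2):
    """Assumed form: Sim(p1, p2) = 2 * Spd(p1, p2) / (Depth(p1) + Depth(p2))."""
    return 2 * spd(p1, p2) / (depth(p1) + depth(p2))

def concept_sim(part_sims, betas=(0.5, 0.2, 0.17, 0.13)):
    """Weighted sum of the four part similarities; betas are placeholder weights."""
    assert len(part_sims) == len(betas) == 4
    assert abs(sum(betas) - 1.0) < 1e-9 and all(b > 0 for b in betas)
    return sum(b * s for b, s in zip(betas, part_sims))

def word_sim(concepts1, concepts2, sim):
    """Word similarity: the maximum similarity over all concept pairs."""
    return max(sim(c1, c2) for c1, c2 in product(concepts1, concepts2))

if __name__ == "__main__":
    # Simplification: each concept is represented by its first basic sememe only.
    print(sememe_sim("human", "animal"))
    print(word_sim(["human", "artifact"], ["animal"], sememe_sim))
    print(concept_sim([1.0, 0.5, 0.0, 0.0]))
```

In this toy setting a word is just a list of concepts, and `word_sim` accepts any concept-level similarity function, so the four-part weighted similarity can be substituted for `sememe_sim` once the parts of each DEF have been parsed.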
The relevance between two sememes is SRel, and the formula is as follows:

    SRel(S1, S2)   max   D   d(p1, p2)

In the formula, $p_1$ and $p_2$ are the first basic sememes of concepts $S_1$ and $S_2$ respectively; D is the degree of lateral relation, that is, the difference between the layer of an explanation sememe influencing a sememe and the sememe's own layer; beyond that many layers the influence can be ignored, and a suitable value of D is 0; $d(p_1, p_2)$ is the layer difference at which $p_1$ appears in the explanation sememes of $p_2$.

In HowNet, a concept is described by sememes, so concept relevance must be computed from sememe relevance. If concept $C_1$ is expressed by n sememes and concept $C_2$ is expressed by m sememes, then the concept relevance is approximated by the maximum relevance between a sememe in $C_1$ and a sememe in $C_2$:

$$C\mathrm{Rel}(C_1, C_2) = \max_{1 \le i \le n,\; 1 \le j \le m} S\mathrm{Rel}(S_{1i}, S_{2j})$$

Here $S_{1i}$ is the i-th sememe in $C_1$, $S_{2j}$ is the j-th sememe in $C_2$, and $1 \le i \le n$, $1 \le j \le m$.

In HowNet, a word can have several concepts, so word relevance can be computed from concept relevance. If word $W_1$ has x concepts and word $W_2$ has y concepts, then the word relevance is approximated by the maximum relevance between $C_{1i}$ and $C_{2j}$:

$$W\mathrm{Rel}(W_1, W_2) = \max_{1 \le i \le x,\; 1 \le j \le y} C\mathrm{Rel}(C_{1i}, C_{2j})$$

Here $C_{1i}$ is the i-th concept in $W_1$, $C_{2j}$ is the j-th concept in $W_2$, and $1 \le i \le x$, $1 \le j \le y$.
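The two maximizations above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the sememe-level relevance is supplied as a function, and the table `TOY_SREL` with its values is a hypothetical stand-in for SRel, not something derived from HowNet. A word is modelled as a list of concepts, and a concept as a list of sememes.

```python
from itertools import product

# Hypothetical sememe relevance values (symmetric), standing in for SRel.
TOY_SREL = {
    frozenset({"exercise", "ball"}): 0.6,
    frozenset({"sport", "ball"}): 0.7,
    frozenset({"beat", "tool"}): 0.2,
}

def toy_sememe_rel(s1, s2):
    """Toy SRel: 1.0 for identical sememes, a table value otherwise, else 0.0."""
    if s1 == s2:
        return 1.0
    return TOY_SREL.get(frozenset((s1, s2)), 0.0)

def concept_rel(sememes1, sememes2, sememe_rel):
    """CRel: maximum sememe relevance over all sememe pairs of two concepts."""
    return max(sememe_rel(s1, s2) for s1, s2 in product(sememes1, sememes2))

def word_rel(concepts1, concepts2, sememe_rel):
    """WRel: maximum concept relevance over all concept pairs of two words."""
    return max(
        concept_rel(c1, c2, sememe_rel)
        for c1, c2 in product(concepts1, concepts2)
    )

if __name__ == "__main__":
    # Hypothetical entries: "play" has two concepts, "racket" has one.
    play = [["exercise", "sport"], ["beat"]]
    racket = [["tool", "ball"]]
    print(word_rel(play, racket, toy_sememe_rel))
```

Passing the sememe-level function as a parameter keeps the max-over-pairs structure independent of how SRel itself is computed, so a HowNet-based implementation could be substituted without changing `concept_rel` or `word_rel`.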
According to the research above, the semantic relevance of two words consists of semantic similarity and word relevance. So the words semantic relevance $WS\mathrm{Rel}(W_1, W_2)$ is defined as:

$$WS\mathrm{Rel}(W_1, W_2) = \eta_1 \, \mathrm{Sim}(W_1, W_2) + \eta_2 \, W\mathrm{Rel}(W_1, W_2)$$

Here $\mathrm{Sim}(W_1, W_2)$ is the similarity between word $W_1$ and word $W_2$, $W\mathrm{Rel}(W_1, W_2)$ is the word relevance, and $\eta_1$ and $\eta_2$ are the weights of similarity and relevance.

3.4. Keywords weighting computation

A sentence is described by a set of words, but some of the words are keywords, which have high weight in the sentence. The keyword weights of a sentence are computed before relevance is computed. This implies that when the distance between keywords is short, the corresponding keyword weight is high, and vice versa. Keyword weighting adopts TF-IDF, and the formula is as follows:

$$w_i = \frac{tf_i \times idf_i}{\sqrt{\sum_{k=1}^{n} (tf_k \times idf_k)^2}} = \frac{tf_i \times \log(N/n_i + 0.01)}{\sqrt{\sum_{k=1}^{n} \left(tf_k \times \log(N/n_k + 0.01)\right)^2}}$$

Here $w_i$ is the weight of a word in the sentence, $tf_i$ is the occurrence frequency of the word of the sentence in the corresponding document, and $idf_i = \log(N/n_i + 0.01)$ quantifies the distribution of the keyword over all documents of the document set. N is the total number of documents, $n_i$ is the number of documents in which the keyword appears, and n is the number of sentences. The denominator of the formula normalizes every component. The underlying assumption is that the occurrence frequency of a word of the sentence must be high in the corresponding document but low in the other documents. So the weight is the product of tf and idf, and it is quantified from the occurrence frequency of words and from the documents.

3.5. Sentence relevance computation

Sentence relevance is the relevance between the query sentence and the sentences of the sentence base. The more keywords of a sentence in the sentence base the query sentence contains, and the higher the weights of those keywords in the query sentence, the higher the relevance. So sentence relevance is measured by the words and the keyword weights in the sentences. The definitions are as follows:

(1) the set of words in sentence A: $WordSetA = \{Word_{11}, Word_{12}, \ldots, Word_{1n}\}$;
(2) the set of words in sentence B of the sentence base: $WordSetB = \{Word_{21}, Word_{22}, \ldots, Word_{2m}\}$;
(3) the similarity between $Word_{1i}$ and $Word_{2j}$: $\mathrm{Sim}(Word_{1i}, Word_{2j})$, $1 \le i \le n$, $1 \le j \le m$;
(4) the set of weighting relevance: $Weight = \{Weight_1, Weight_2, \ldots, Weight_n\}$;
(5) the relevance between A and B: StRel(A, B).

Take a word $Word_{1i}$ from WordSetA and find the word $Word_{2p}$ in WordSetB whose similarity $\mathrm{Sim}(Word_{1i}, Word_{2p})$ is the largest among $\mathrm{Sim}(Word_{1i}, Word_{2j})$, $1 \le j \le m$. If $\mathrm{Sim}(Word_{1i}, Word_{2p})$ is more than the threshold tmp, the weighting relevance $Weight_i$ is assigned. Every word in WordSetA is processed as above, and the resulting weighting relevance is WeightA. So the relevance between A and B is the sentence relevance StRel(A, B):

    StRel(A, B)   w   WeightA   WeightB

The sentence relevance after normalization processing is:
    StRel(A, B)   w   WeightA   w   WeightA   WeightB   WeightB

λ > 0, where λ is a constant whose value is determined according to the specific conditions.

4. Experiment

4.1. Sentence relevance computation

Word segmentation is the basis of this experimental system; it segments the sentence into independent words. Word relevance computation is the key to sentence relevance computation; it is the theoretical basis of query word matching, and its result has an important effect on the relevance statistics. Sentence relevance statistics are the basis of matching: relevance statistics are gathered for the query sentence, and the system outputs the sentence of maximal relevance together with its relevance. The experiment is realized as shown in Figure 1.

Fig. 1: Calculation process of sentence relevance

4.2. Experimental analysis

In the experiment, we choose the computer field and collect 0 common computer failures from the Web, magazines and books. These sentences are the query sentences. To examine the result of sentence relevance computation, we choose sentences as the sentence base from the book Common Computer Problem and 000 Cases of Failures. In the experimental system, we gather statistics. The result is in Table 1, where QN is the query No., RS is the number of relevant sentences of the query sentence, and LR is the largest relevance value.

Table 1: Result of experiment

Words semantic relevance computation is based on HowNet, and sentence relevance is computed using word relevance. In the experiment, the result is satisfactory, and the method is effective in the Automatic Question Answering System. The method can also be applied to other fields, such as text categorization, text clustering, information retrieval and others.

5. References

[1] Qun Liu and Sujian Li. Word similarity computing based on HowNet. Computational Linguistics and Chinese Language Processing. 2002, pp. 59-76.
[2] Zhendong Dong and Qiang Dong. HowNet [EB/OL]. http://www.keenage.com, 000030 0030.
[3] Tian Xia. Study on Chinese Words Semantic Similarity Computation. Computer Engineering. 2007, 33(6): 191-194.
[4] K.W. Gan and P.W. Wong. Annotation information structures in Chinese texts using HowNet. Chana E. Second Chinese Language Processing Workshop. Hong Kong: Hong Kong University of Science and Technology. 2000, pp. 85-92.
[5] Sujian Li. Research of Relevance between Sentences Based on Semantic Computation. Computer Engineering and Applications. 2002, 38(7): 75-76.