arXiv: v1 [cs.CL] 16 Aug 2018

Sememe Prediction: Learning Semantic Knowledge from Unstructured Textual Wiki Descriptions

Wei Li, Xuancheng Ren, Damai Dai, Yunfang Wu, Houfeng Wang, Xu Sun
MOE Key Laboratory of Computational Linguistics, School of Electronics Engineering and Computer Science, Peking University

Abstract

Huge numbers of new words emerge every day, creating a great need to represent them with semantic meaning that NLP systems can understand. Sememes are defined as the minimum semantic units of human languages, the combination of which can represent the meaning of a word. Manual construction of sememe-based knowledge bases is time-consuming and labor-intensive. Fortunately, communities are devoted to composing descriptions of words on wiki websites. In this paper, we explore automatically predicting lexical sememes from the descriptions of words on wiki websites. We view this problem as a weakly ordered multi-label task and propose a Label Distributed seq2seq model (LD-seq2seq) with a novel soft loss function to solve it. In the experiments, we take the real-world sememe knowledge base HowNet and the corresponding descriptions of the words in Baidu Wiki for training and evaluation. The results show that our LD-seq2seq model not only beats all the baselines significantly on the test set, but also outperforms amateur human annotators on a random subset of the test set.

1 Introduction

With the development of the Internet, new words are emerging at an unprecedented speed. With limited context and no auxiliary information, it is difficult for natural language processing (NLP) systems to understand these new words and phrases. Fortunately, many volunteers in the community are devoted to constructing wiki pages for many of the new words and phrases, which makes wiki websites like Wikipedia and Baidu Wiki very valuable resources. However, the descriptions in the wiki pages are written in natural language, which is unstructured, noisy and hard for NLP systems to understand. Therefore, there is a great need to represent these words with semantic meanings in a structured fashion that NLP systems can easily understand.

Table 1: Example of Sememe Prediction via Wiki Description
Word: 缕析 (analysis in detail)
Description in Baidu Wiki: 逐条认真的分析缕析行情 (A careful analysis. Analyze the market in detail)
Sememes: 分析 (analyze), 详 (detailed)

Words can be represented with semantic sub-units from a finite set of limited size. For example, the word "lovers" can be approximately represented as {Human, Friend, Love, Desire}, and the word 缕析 (analysis in detail) can be represented as "analyze" and "detailed" (see Table 1). Linguists define sememes as this kind of semantic sub-unit of human languages (Bloomfield, 1926) that expresses the semantic meanings of concepts. This idea is similar to the idea of language universals (Goddard and Wierzbicka, 1994).

To represent the semantic meaning of words with sememes, researchers build sememe-based knowledge bases (KBs) by annotating words with a pre-defined set of sememes. One of the most usable and well-known sememe KBs is HowNet (Dong and Dong, 2006). The ontology of HowNet contains over 2,000 sememes, with which more than 100,000 Chinese words and phrases have been manually annotated in a hierarchical structure. Because HowNet represents knowledge explicitly (a limited number of sememes embodies the knowledge), it is easy to adopt in NLP systems while remaining understandable to human beings.

The manual construction of such KBs is very time-consuming and labor-intensive; for instance, HowNet was built over more than 10 years by a number of linguistic experts. However, many of the annotated words in the KBs are already out of date, and manual construction cannot keep up with the speed at which new words emerge.

In the real world, there are many different wiki websites, such as Wikipedia, Baidu Wiki, Hudong Wiki, and so on. These websites contain millions of high-quality articles describing the world knowledge embodied in words and phrases. For instance, Baidu Wiki contains 15,243,192 articles, mostly in Chinese. When people are not familiar with a word, nowadays they prefer to look up its description on these wiki websites. However, for commonly used classical words, dictionaries are still a valuable source in which people look up the meanings of words. Therefore, we think it is reasonable to use resources from both kinds of web pages.

In this paper, we explore a way to predict the lexical sememes of a word based on its corresponding descriptions in wiki (dictionary) pages. We view this task as a weakly ordered multi-label problem (the order is already given by HowNet). Vinyals et al. (2015) claimed that the order between labels matters, and they proposed to use seq2seq learning for the multi-label problem. Nam et al. (2017) proposed several ways to organize the order of labels so that seq2seq would work better on the multi-label classification (MLC) task. We observe that the classical sequence-to-sequence (seq2seq) model makes a strong assumption on the order of the labels, which is not suitable for the multi-label problem. Imposing an order on the tokens with heuristic rules is also problematic. Therefore, we propose a novel label distributed seq2seq model (LD-seq2seq) with a soft loss function to solve the problem. Since a single wiki description may be noisy and not comprehensive, we design a multi-resource encoder that can take various description resources (e.g., descriptions from different wiki websites) into consideration.
Our contributions lie in the following aspects:

- We propose to predict the sememes of a word based on its textual descriptions in wiki pages, which transforms the unstructured textual knowledge from wiki pages into distributed semantic knowledge.
- We view this task as a weakly ordered multi-label problem and propose a Label Distributed Seq2seq model with a soft loss function to solve it.
- We conduct extensive experiments on sememe prediction and observe that our model beats all the baselines. Our model even outperforms amateur human annotators on a random subset of the test set. Furthermore, we give a detailed analysis of the causes of errors, with concrete examples and possible solutions.

2 Related Work

HowNet has been widely used in various NLP tasks such as word similarity computation (Liu and Li, 2002), word sense disambiguation (Duan et al., 2007) (similar to word clustering (Jin et al., 2007)), sentiment analysis (Huang et al., 2014) and named entity recognition (Li et al., 2016). Niu et al. (2017) claimed that using the word sememe information in HowNet can improve word representation. Zeng et al. (2018) proposed to expand the Linguistic Inquiry and Word Count (Pennebaker et al., 2001) lexicons based on word sememes.

Xie et al. (2017) proposed to predict the sememes of a word by measuring the similarity between jointly learned word embeddings and sememe embeddings. Their solution is simple and straightforward. However, in many real-world applications we do not have access to accurately learned word embeddings, especially for new words. First, it is hard to collect enough context data to learn the embeddings of new words. Second, in most deep learning applications the word embeddings are fixed after training, which makes it difficult to learn the embeddings of new words and fit them into the system.
There are three main types of traditional machine learning algorithms for the Multi-Label Classification (MLC) task: problem transformation methods (Boutell et al., 2004; Tsoumakas and Vlahavas, 2007; Read et al., 2011), algorithm adaptation methods (Clare and King, 2001; Zhang and Zhou, 2007; Fürnkranz et al., 2008) and ensemble methods (Tsoumakas et al., 2011; Szymański et al., 2016). Simple neural network models have also been applied to MLC tasks (Zhang and Zhou, 2006; Nam et al., 2014; Benites and Sapozhnikova, 2015; Kurata et al., 2016). Li et al. (2015) proposed to consider the previously generated labels as features for predicting new ones. Yang et al. (2018) further developed this idea, using recurrent neural networks to model the correlation between labels.

3 Our Approach

In this section, we show our solution to the sememe prediction task. An overview of our model is shown in Figure 1.

3.1 Task Definition

Given one (single resource) or a few (multiple resources) textual descriptions $D = (d^{(1)}, d^{(2)}, \ldots, d^{(m)})$ of a word from the wiki pages, our goal is to predict the corresponding sememes $s = (s_1, s_2, \ldots, s_n)$ of the word, where $s$ is a subset of the sememe label space $S$. Our task can be modeled as finding an optimal label sequence $s$ that maximizes the conditional probability $p(s \mid D)$, which is calculated as follows,

$$ p(s \mid D) = \prod_{i=1}^{n} p(s_i \mid s_1, s_2, \ldots, s_{i-1}, D) \tag{1} $$

3.2 Basic Seq2seq Model for Multi-Label

Vinyals et al. (2015) proposed to use the seq2seq paradigm to deal with the problem of predicting labels that form a set. They claimed that the order of the labels matters even for labels that form a set.

Encoder: A textual description $d^{(i)}$ in $D$ with $l$ words is first encoded into $l$ hidden states $(h_1, h_2, \ldots, h_l)$ by a bidirectional gated recurrent neural network (BiGRNN), the last of which is treated as the vector $v$ for the textual description,

$$ h_t = \mathrm{GRU}(h_{t-1}, x_t) \tag{2} $$

Decoder: The decoder generates the sememes one by one based on the vector $v$. At the $t$-th decoding step, the probability distribution $p_t$ over sememes is calculated as follows,

$$ s_t = \mathrm{GRU}([s_{t-1}; c_t; e_{t-1}]) \tag{3} $$
$$ p_t = \mathrm{softmax}(W s_t + b) \tag{4} $$
$$ c_t = \sum_{i=1}^{l} \alpha_{t,i} h_i \tag{5} $$
$$ \mathrm{score}_{t,i} = v_a^{T} \tanh(W_a s_t + U_a h_i) \tag{6} $$
$$ \alpha_{t,i} = \frac{\exp(\mathrm{score}_{t,i})}{\sum_{j=1}^{l} \exp(\mathrm{score}_{t,j})} \tag{7} $$

where $s_t$ is the hidden state at the $t$-th step, $c_t$ is the context vector calculated with the attention mechanism over the hidden states of the description $(h_1, h_2, \ldots, h_l)$, and $e_{t-1}$ is the embedding of the sememe with the highest predicted probability at the $(t-1)$-th step.
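The attention step in Equations (5) to (7) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the helper names (`matvec`, `attention_context`), the toy vectors and the identity projections are all assumptions chosen for readability, and the decoder state passed in as `s_t` simply stands for whichever state is used as the attention query.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def attention_context(s_t, hidden_states, W_a, U_a, v_a):
    """Return (alpha_t, c_t) for one decoding step, following Eq. (5)-(7)."""
    projected_state = matvec(W_a, s_t)
    scores = []
    for h_i in hidden_states:
        projected_h = matvec(U_a, h_i)
        activated = [math.tanh(a + b) for a, b in zip(projected_state, projected_h)]
        # Eq. (6): score_{t,i} = v_a^T tanh(W_a s_t + U_a h_i)
        scores.append(sum(v * x for v, x in zip(v_a, activated)))
    # Eq. (7): softmax over the l scores (max subtracted for numerical stability)
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # Eq. (5): c_t is the attention-weighted sum of the hidden states
    dim = len(hidden_states[0])
    c_t = [sum(a * h[k] for a, h in zip(alpha, hidden_states)) for k in range(dim)]
    return alpha, c_t


# Toy inputs (hypothetical values; identity projections keep the numbers readable)
s_t = [0.5, -0.2]
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
alpha, c_t = attention_context(s_t, hidden_states, identity, identity, [1.0, 1.0])
print(alpha, c_t)
```

The weights in `alpha` sum to one, so `c_t` is a convex combination of the encoder states; in a trained model `W_a`, `U_a` and `v_a` would be learned rather than fixed.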
3.3 Proposed Label Distributed Seq2seq Model

We think that even though the order of the labels matters, we should not strictly restrict it. However, the traditional cross entropy loss function applied to the classical seq2seq model puts a strict assumption on the order of the labels. For example, if the third token in the target sequence is predicted in the first position, it is penalized exactly as much as an utterly wrong token. To deal with the task of predicting weakly ordered labels (or even unordered labels), we propose a soft loss function in place of the original hard cross entropy loss function,

$$ \mathrm{loss} = -\sum_{i} y_i \log(p_i) \tag{8} $$

Instead of using the original hard one-hot target probability $y_i$, we use a soft target probability distribution, which is calculated according to $y_i$ and the sememe sequence $s$ of this sample. Let $q_s$ denote the bag-of-words representation of $s$, where only the slots of the sememes in $s$ are filled with 1s. We use a function $\xi$ to project the original target label probability $y$ into a new probability distribution $y'$,

$$ y'_t = \xi(y_t, q_s) \tag{9} $$

This function is designed to soften the harsh punishment when the model predicts the labels in the wrong order. In this paper, we apply a simple yet effective projection function, given in Equation (10). Note that this is an example implementation; one can also design more sophisticated projection functions if needed,

$$ \xi(y_t, q_s) = \frac{(q_s / M) + y_t}{2} \tag{10} $$

where $M$ is the length of $s$. This function means that at the $t$-th decoding step we first split a probability mass of 1.0 equally across all $M$ target tokens, so that each token $s_i$ receives $1/M$. Then, we take the average of this probability distribution and the original probability $y_t$ as the final probability distribution at step $t$.
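To make Equations (8) to (10) concrete, the following sketch builds the soft target for one decoding step and compares the loss of two imperfect predictions. It is an illustration under stated assumptions rather than the authors' code: the five-sememe label space, the probability vectors and the function names are hypothetical, and $M$ is taken as the number of 1s in $q_s$, which assumes each sememe appears once in $s$.

```python
import math


def soft_target(y_t, q_s):
    """Eq. (10): average the one-hot target y_t with the uniform bag q_s / M."""
    M = sum(q_s)  # length of s, assuming each sememe appears once in s
    return [((q / M) + y) / 2 for q, y in zip(q_s, y_t)]


def cross_entropy(target, p):
    """Eq. (8): -sum_i target_i * log(p_i); zero-weight slots are skipped."""
    return -sum(t * math.log(p_i) for t, p_i in zip(target, p) if t > 0)


# Hypothetical label space of 5 sememes; the target sequence s is (0, 2, 3).
q_s = [1, 0, 1, 1, 0]           # bag-of-words vector of s
y_1 = [1, 0, 0, 0, 0]           # hard one-hot target at decoding step 1
y_soft = soft_target(y_1, q_s)  # sememe 0 gets 2/3, sememes 2 and 3 get 1/6 each

# Two imperfect predictions at step 1: one puts its mass on sememe 2
# (a correct label in the wrong position), the other on sememe 1 (not in s).
p_swapped = [0.10, 0.05, 0.75, 0.05, 0.05]
p_wrong = [0.10, 0.75, 0.05, 0.05, 0.05]

hard_gap = cross_entropy(y_1, p_wrong) - cross_entropy(y_1, p_swapped)
soft_gap = cross_entropy(y_soft, p_wrong) - cross_entropy(y_soft, p_swapped)
print(y_soft, hard_gap, soft_gap)
```

With the one-hot target both predictions receive the same loss (`hard_gap` is zero), whereas with the soft target the out-of-order prediction is penalized less than the utterly wrong one (`soft_gap` is positive), which is the behaviour the projection is meant to produce.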

Figure 1: An overview of the proposed label distributed seq2seq model. We compute the loss based on a soft probability distribution rather than the one-hot distribution.

Figure 2: An illustration of the multi-resource model. The different description vectors and context vectors are combined with a gate mechanism. The figure shows two descriptions, while our model can be extended to multiple descriptions.

3.4 Multi-Resource Model

A description from a single source can be unreliable and cannot express the meaning of the word comprehensively. In this paper, we propose a multi-resource encoder to make use of descriptions from multiple resources. An overview of this model is shown in Figure 2. To demonstrate the effectiveness of multiple resources, we implement our encoder using two resources for simplicity, but it can be extended to more resources without much effort.

Assume that for a word we have two textual descriptions $d^{(1)}$ and $d^{(2)}$, containing $l^{(1)}$ and $l^{(2)}$ words $(w^{(1)}_1, w^{(1)}_2, \ldots, w^{(1)}_{l^{(1)}})$ and $(w^{(2)}_1, w^{(2)}_2, \ldots, w^{(2)}_{l^{(2)}})$ respectively. We use BiGRNN to encode the two descriptions separately into two sequences of hidden states, $(h^{(1)}_1, h^{(1)}_2, \ldots, h^{(1)}_{l^{(1)}})$ and $(h^{(2)}_1, h^{(2)}_2, \ldots, h^{(2)}_{l^{(2)}})$. We use the hidden states at the last time step, $h^{(1)}_{l^{(1)}}$ and $h^{(2)}_{l^{(2)}}$, as the representations of the corresponding descriptions $d^{(1)}$ and $d^{(2)}$, which we denote as $v^{(1)}$ and $v^{(2)}$. To combine the two vectors $v^{(1)}$ and $v^{(2)}$ into one uniform $v$, we apply the gate mechanism, which is calculated as follows,

$$ g_1 = \sigma(W_1 [v^{(1)}; v^{(2)}] + b_1) \tag{11} $$
$$ g_2 = \sigma(W_2 [v^{(1)}; v^{(2)}] + b_2) \tag{12} $$
$$ v = g_1 \odot v^{(1)} + g_2 \odot v^{(2)} \tag{13} $$

where $\sigma$ indicates the sigmoid function, $W_1$, $W_2$, $b_1$ and $b_2$ are learnable parameters, and $\odot$ means element-wise multiplication.
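The gate mechanism of Equations (11) to (13) can be sketched as follows. This is a minimal illustration with hypothetical helper names and toy parameters (all-zero weights, so both gates equal 0.5); in the actual model $W_1$, $W_2$, $b_1$ and $b_2$ would be learned.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(W, b, concat):
    """sigma(W [v1; v2] + b): one gate value per output dimension."""
    return [sigmoid(sum(w * x for w, x in zip(row, concat)) + b_k)
            for row, b_k in zip(W, b)]


def combine(v1, v2, W1, b1, W2, b2):
    """Fuse two description vectors with two independent gates."""
    concat = v1 + v2            # list concatenation plays the role of [v1; v2]
    g1 = gate(W1, b1, concat)   # Eq. (11)
    g2 = gate(W2, b2, concat)   # Eq. (12)
    # Eq. (13): element-wise gating of each description vector
    return [a * x + c * y for a, x, c, y in zip(g1, v1, g2, v2)]


# Toy example with 2-dimensional vectors (hypothetical values).
v1, v2 = [1.0, 2.0], [3.0, -2.0]
zeros_W = [[0.0] * 4 for _ in range(2)]
zeros_b = [0.0, 0.0]
v = combine(v1, v2, zeros_W, zeros_b, zeros_W, zeros_b)  # both gates are 0.5
print(v)
```

Because $g_1$ and $g_2$ are computed by separate sigmoids, they are not constrained to sum to one: the model can pass both descriptions through almost unchanged, suppress both, or favour one of them per dimension.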
The decoder follows the same structure as the model in Section 3.3, except that we first calculate the context vectors $c^{(1)}_t$ and $c^{(2)}_t$ separately with the attention mechanism. Then we use the gate mechanism to combine the two vectors $c^{(1)}_t$ and $c^{(2)}_t$ into one context vector $c_t$. The gate mechanism here follows the same process as the combination of $v^{(1)}$ and $v^{(2)}$, with different parameters.

4 Experiment

4.1 Dataset

HowNet: HowNet is a knowledge base that uses sememes to represent the semantic meaning of a word or a phrase. There are over 100,000 annotated words in HowNet. Words can have multiple senses. Each sense is further represented by a combination of no more than 8 sememes. The sememes form a hierarchical structure. However, following the settings of most previous work, we do not consider the specific relations between sememes, but only the order between them, which we call weakly ordered sememes. For simplicity, we do not consider multiple senses, and just assume that the first sense of the word is its basic sense.

Wiki Pages: Because the words annotated in HowNet consist of both common words and newly emerged words (at that time), we choose two description sources for the words, Baidu Wiki (百度百科) and Baidu Dictionary (百度词典). Baidu Wiki contains 15,244,702 articles edited by volunteers, covering many newly emerged words, while Baidu Dictionary is similar to language dictionaries (though still crowd-sourced), with better-quality definitions and descriptions for common words. We collect the textual descriptions of the words annotated in HowNet from Baidu Wiki and Baidu Dictionary, and obtain 62,810 words with attached descriptions (a word counts if at least one of the two sources has a description for it). We randomly split the data into three parts: Train (80%), Dev (10%) and Test (10%).

4.2 Baseline Models

- ML-KNN (Multi-label KNN): This is the k-nearest-neighbor classification method adapted to multi-label classification.
- LP (Label Powerset): LP (Tsoumakas and Vlahavas, 2007) is a problem transformation approach that turns a multi-label problem into a multi-class problem, with one multi-class classifier trained on all unique label combinations found in the training data.
- CC (Classifier Chain): For a label space with L labels, CC (Read et al., 2011) trains L classifiers ordered in a chain according to the Bayesian chain rule.
- BR (Binary Relevance): BR (Boutell et al., 2004) transforms a multi-label classification problem with L labels into L separate single-label binary classification problems using the same base classifier.
- RNN-MLLR (RNN multi-label logistic regression): This model uses the same multi-resource encoder as our proposed model, but uses a one-versus-all logistic regression multi-label classifier to predict the sememes from the encoded vector of the descriptions.

4.3 Experiment Details

For the textual descriptions, we use characters as the input; the character vocabulary size is 11,097. We randomly initialize the character embeddings. There are 2,185 sememes in HowNet. We use the word2vec (Mikolov et al., 2013) toolkit with its default parameters to pre-train the sememe embeddings, so as to capture the co-occurrence relationships of the sememes. The sememe embeddings are fine-tuned during training. The dimension of both the character embeddings and the sememe embeddings is 200.
All the dimensions of hidden states are set to [...]. The batch size is 20. An <EOS> token is added to the end of each sememe sequence to indicate when to stop prediction. We use the Adam optimizer (Kingma and Ba, 2014) to minimize the loss. We train our model for 10 epochs, and choose the model parameters from the epoch with the highest F1 score on the Dev set.

Table 2: Comparison with different baseline models. All the models use two wiki resources in this table. P means precision, R means recall. Columns: Model, P, R, F1. Rows: ML-KNN, LP, BR, CC, RNN-MLLR, Basic Seq2seq, LD-Seq2seq (Proposal).

4.4 Results and Analysis

We use micro Precision (P), Recall (R) and F1 score as the evaluation metrics.

Comparison with Baselines: In Table 2, we compare our experiment results with the baseline methods. The clustering-based method ML-KNN performs the worst for sememe prediction. We assume that this is because the textual descriptions are very diverse, which makes it hard for KNN to determine the borders between the spaces of different labels. Methods that transform classifiers to the multi-label task perform close to each other, with F1 scores around 25%. Compared with traditional machine learning methods (ML-KNN, LP, CC, BR), neural network based methods (RNN-MLLR, Basic seq2seq) perform much better, beating the other baselines by a big margin. Although RNN-MLLR achieves good results, it is still not as good as the seq2seq-based model. We assume that this is because MLLR-based models are not very good at modeling the connections between labels. In our sememe prediction task, the sememes are weakly ordered. Moreover, some sememes are strongly related to others and some sememes often co-occur. For instance, when the sememe Emotion occurs, it is likely to be followed by FeelingByBad, generic and desire. Our proposed Label Distributed seq2seq model achieves the best performance. We assume that this is because, even though the order between labels matters (Vinyals et al., 2015), for the weakly ordered multi-label problem a strong assumption on ordering hurts the performance, and our soft loss function effectively relieves the problem.

Table 3: Comparison with human performance on a random subset of test samples. Human means that the annotator does not have access to the wiki descriptions. Human+Wiki means the annotator has access to the wiki descriptions. Columns: Method, Precision, Recall, F1. Rows: Human, Human+Wiki, Proposal.

Comparison with Human Performance: In Table 3, we show the results of amateur human annotators and of our model on a subset of the test set. We randomly select 100 samples from the test set, and ask human annotators to select 1 to 5 sememes out of 20 that they think can describe the meaning of the word. Because the annotators have no background knowledge of HowNet, this selection task is deliberately simpler than annotating from scratch. The annotators are highly educated (with proper knowledge) amateur native speakers without special training in linguistics or the annotation system of HowNet. We guarantee that all the correct sememes are within the selected 20 sememes. The annotators are asked to first predict the sememes based on their common sense (Human); then they are provided with the descriptions from Baidu Wiki and asked to do the work again (Human+Wiki).

The results show that even for human beings, it is hard to predict the sememes completely right without special training on the annotation system of HowNet. Human annotators are able to understand the semantic meaning of the word and can understand the description very well. However, they tend to predict more sememes than there actually are, which is reflected by the high recall. The imbalance between precision and recall indicates that the sememe architecture of HowNet may be too fine-grained: many sememes other than the actual ones are also related to the word, meaning-wise.
Still, by referring to wiki descriptions, human annotators are able to predict more precisely, because the dataset contains some rare words and entities that people seldom use in real life. Although the recall of our proposed model is not as high as that of the human annotators, its precision beats theirs by a big margin, which makes its F1 score higher. We assume that this is because, by learning from a large amount of training data, our model is more likely to be consistent with the logic of the annotation system.

Effect of Proposed Soft Loss Function: From Table 2 we can see that the seq2seq model with our novel soft loss (LD-Seq2seq) performs much better than the basic seq2seq model. We think that this is because our loss function eases the restriction on the order between labels. For example, assume the target sememes are $(s_1, s_2, s_3)$ in order. At the first decoding step, the one-hot loss function would strongly punish the decoder for giving $s_2$ or $s_3$ any probability, which may confuse the decoder, because the difference between time step 1 and time step 2 may not be significant when the order of the labels is not obvious. However, our soft loss function still leads the decoder to choose $s_1$ first, while the two labels $s_2$ and $s_3$ are also encouraged with some probability less than that of $s_1$. The experiment results show that this modification is very effective in making seq2seq work well on the multi-label problem.

Effect of Applying Multi-Resource: From Table 4 we observe that using multiple resources instead of a single one can greatly improve the performance. This corresponds with our expectation, as more descriptions can provide more comprehensive information about the word from various aspects. Moreover, since the alignment between sememes and descriptions is noisy, the gate mechanism can automatically decide how much each description contributes to the prediction based on its relatedness.
Between the two resources we use (Baidu Wiki and Baidu Dictionary), the dictionary-style resource provides much higher precision. We assume this is because the descriptions in this kind of resource have better quality in general. However, many new words and rare words are not included in the dictionary, and some of the entries in Baidu Dictionary have noisy descriptions as well (e.g., English descriptions instead of Chinese), so the dictionary alone does not predict as well as the multi-resource model.
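The micro-averaged precision, recall and F1 used throughout this section pool the counts over all words before dividing. The sketch below shows one way to compute them for set-valued predictions; the function name is hypothetical, the labels are treated as unordered sets, and the two toy samples reuse the English glosses of 宦门 and 混纺 from the case study purely for illustration.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F1 over lists of label sets."""
    true_pos = sum(len(set(g) & set(p)) for g, p in zip(gold, pred))
    n_pred = sum(len(set(p)) for p in pred)
    n_gold = sum(len(set(g)) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Two toy samples: reference sememes vs. predicted sememes.
gold = [["family", "human", "official"], ["artifact", "clothing", "tool"]]
pred = [["family", "official"], ["material", "clothing", "tool"]]
precision, recall, f1 = micro_prf(gold, pred)
print(precision, recall, f1)
```

Here 4 of the 5 predicted sememes are correct and 4 of the 6 reference sememes are recovered, so precision is 0.8, recall is 2/3 and F1 is 8/11. An annotator who tends to select extra sememes raises the pooled prediction count, which lowers precision while leaving recall high, matching the pattern discussed for the human annotators.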

Table 4: Results of using different resources. The seq2seq model applies the basic architecture without adaptation to the multi-label problem. SingleRes indicates that the encoder only considers a single textual resource. MultiRes indicates that the encoder considers multiple textual resources (Wiki and Dictionary). Columns: Model, Precision, Recall, F1. Rows: SingleRes-Wiki, SingleRes-Dict, MultiRes.

4.5 Error Analysis and Case Study

In Figure 3, we show the distribution of the results on a randomly chosen subset of test samples (100 samples), and we give some concrete examples of sememe prediction in Table 5. We use accuracy (a case is viewed as right only if all of its sememes are matched) as the evaluation metric in the error analysis.

Figure 3: The distribution of prediction result types (Correct, Plausible, Partial, Wrong).

Correct means the prediction is completely right.

Wrong means that our model makes wrong predictions. For instance, for the word 国有化 (nationalize), the standard answer is "-ize" and "central", while our prediction is "place", "own", "country" and "politics". None of the predicted sememes is in the answer set, but these sememes actually make sense, because to nationalize is indeed to make something owned by the country, which is usually a political action. Our prediction fails to capture the dynamic procedure of "-ize", but this sequence of sememes can still describe some aspects of the word, and is thus able to help in downstream tasks.

Partial means that part of the result is correct or the result is a subset of the real answer. For instance, for the word 宦门 (official family), our prediction is "family" and "official", while the correct answer is "family", "human" and "official". Our prediction captures most of the meaning, and the missing sememe "human" can actually be deduced from the sememe "family".

Plausible means that we think the predicted sememes reflect the meaning of the word as well as or better than the original ones, even though they differ. For example, for the word 混纺 (blend fabric), our prediction is "material", "clothing" and "tool", while the answer is "artifact", "clothing" and "tool". The difference between the two sequences of sememes lies between "material" and "artifact". Blend fabric is clearly an artificial material, so both the answer and our prediction capture one aspect of the word, and our sequence of sememes is arguably even better at presenting its semantic meaning. The existence of plausible predictions (not entirely equal to the reference) may be related to the annotation system of HowNet. Some of the sememes we observe in the reference are very sparse; for instance, weatherfine is a sememe in HowNet, which we think can be split into other sememes like weather and begood.

Table 5: Examples of words and sememes. Reference indicates the standard sememes in HowNet, Prediction indicates our predicted results. The categories of the examples correspond to Figure 3.

Word: 历史唯物主义 (historical materialism)
Reference: 知识 (knowledge), 思想 (thinking), 物质 (physical), 主 (primary), 最 (most)
Prediction: 知识 (knowledge), 思想 (thinking), 物质 (physical), 主 (primary), 最 (most)
Category: Correct

Word: 宦门 (official family)
Reference: 家庭 (family), 人 (human), 官 (official)
Prediction: 家庭 (family), 官 (official)
Category: Partial

Word: 混纺 (blend fabric)
Reference: 人工物 (artifact), 衣物 (clothing), 用具 (tool)
Prediction: 材料 (material), 衣物 (clothing), 用具 (tool)
Category: Plausible

Word: 国有化 (nationalize)
Reference: 变性态 (ize), 归属中央 (central)
Prediction: 地方 (place), 有 (own), 国家 (country), 政 (politics)
Category: Wrong

Except for the wrong predictions (29%), we observe that the remaining prediction result types are all similar to, or can substitute for, the standard sememes of the word. We think that for these predictions, the predicted sememes are able to represent most of the meaning of the word, which is helpful for downstream tasks. In fact, even part of the wrong predictions can be of help, as we explain in detail below.

In Figure 4, we further split the causes of the Wrong predictions in Figure 3 into seven categories.

Figure 4: The distribution of error types in Wrong (Literal, Close, Polysemy, Complex, Pattern, Too Simple, Unable).

Literal: A large part of the wrong predictions (24.14%) occurs because the model is distracted by the literal meaning of some part of the description that is not the key information about the word. For example, for the word 磕 (knock), our model predicts the sememes "position" and "wholly", because the description contains expressions such as 碰在硬东西上 (knocked on a hard thing), 人与人之间 (between people) and 使附着物掉下来 (make the attachment fall off). These expressions all concern the position of something, which misleads the model.

Close: 20.69% of the wrong predictions are actually close to the answers. 国有化 (nationalize), mentioned above, is an example of this type.

Polysemy: 17.24% of the wrong predictions are caused by polysemy: some words have multiple meanings, and the standard sememes refer to a different meaning from the one in the description. For example, 一如 can mean "title of a rank in karate" or "the same". The sememes refer to the meaning "the same", while the description in the wiki is about karate. The mismatch between the description and the answer causes such problems.

Complex: 10.34% of the wrong predictions occur because the descriptions are too complex or long, and usually include many other meanings of the word. Because we only use a heuristic way to align the senses with the description, and the senses in the wiki descriptions are not clearly aligned, sometimes the sememes in the reference correspond to only a part of the description that is not in the dominant position. For example, the word 践履 can mean "step on" and "fulfill". "Step on" is the original meaning of the word, but its most common usage now is "fulfill". A large part of the description explains the meaning "step on" and gives instances of it. This makes our model focus on the wrong part of the description and thus make wrong predictions.

Pattern: 6.9% of the wrong predictions are due to the pattern of the annotated answer, most of which involve the explanation of some rarely used Chinese characters. For example, the word 轲 means "wooden vehicle", but this original meaning is rarely used now, and the word is better known as part of the name of a Chinese sage, 孟轲 (Mencius), so the sememes in the reference are "character" and "China".

Too Simple: 3.45% of the wrong predictions occur because the descriptions from the wiki are too simple. For example, the description of the word 猛子 is 扎猛子, which is just another way of expressing it without much explanation.

Unable: For the rest of the wrong predictions (17.24%), we cannot tell why our model fails to predict the right answer. In these cases, the descriptions are clear, but the predicted sememes are not related to the description.

Several possible methods can be applied to address the mistakes mentioned above. First, a more powerful word sense alignment step can be applied, so that the description and the sememes correspond to each other. Second, the annotation system can be modified, so that the sememes become less sparse and overlap less. Third, the context of the words can be introduced to help distinguish between different senses.

5 Conclusion and Future Work

In this paper, we focus on the task of learning knowledge from unstructured textual descriptions in wiki pages. We choose to represent words and phrases with weakly ordered sememes. To predict the sememes of a word based on its descriptions, we propose to apply a seq2seq-based model.
We observe that directly applying the seq2seq framework is problematic because of its strong assumption on the order between labels. To make the seq2seq model more suitable for multi-label tasks, we propose a novel soft loss function that turns the one-hot target label into a probability distribution. To make prediction more accurate, we also propose a multi-resource encoder that makes use of multiple wiki resources. Experiment results show that our label distributed seq2seq model works well on the sememe prediction task. Its performance is even better than that of amateur human annotators on a randomly selected subset of the test set. We make a detailed error analysis and propose possible solutions.

In the future, we would like to explore how to better align the word senses with the articles in the wiki pages. It would also be interesting to take the more sophisticated structures of sememes into consideration.

References

Fernando Benites and Elena Sapozhnikova. 2015. HARAM: a hierarchical ARAM neural network for large-scale text classification. In Data Mining Workshop (ICDMW), 2015 IEEE International Conference on. IEEE.

Leonard Bloomfield. 1926. A set of postulates for the science of language. Language, 2(3).

Matthew R Boutell, Jiebo Luo, Xipeng Shen, and Christopher M Brown. 2004. Learning multi-label scene classification. Pattern Recognition, 37(9).

Amanda Clare and Ross D King. 2001. Knowledge discovery in multi-label phenotype data. In European Conference on Principles of Data Mining and Knowledge Discovery. Springer.

Zhendong Dong and Qiang Dong. 2006. HowNet and the Computation of Meaning (With CD-ROM). World Scientific.

Xiangyu Duan, Jun Zhao, and Bo Xu. 2007. Word sense disambiguation through sememe labeling. In IJCAI.

Johannes Fürnkranz, Eyke Hüllermeier, Eneldo Loza Mencía, and Klaus Brinker. 2008. Multilabel classification via calibrated label ranking. Machine Learning, 73(2).

Cliff Goddard and Anna Wierzbicka. 1994. Semantic and lexical universals: Theory and empirical findings, volume 25. John Benjamins Publishing.

Minlie Huang, Borui Ye, Yichen Wang, Haiqiang Chen, Junjun Cheng, and Xiaoyan Zhu. 2014. New word detection for sentiment analysis. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1.

Peng Jin, Xu Sun, Yunfang Wu, and Shiwen Yu. 2007. Word clustering for collocation-based word sense disambiguation. In International Conference on Intelligent Text Processing and Computational Linguistics. Springer.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR, abs/.

Gakuto Kurata, Bing Xiang, and Bowen Zhou. 2016. Improved neural network-based multi-label classification with better initialization leveraging label co-occurrence. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

Li Li, Houfeng Wang, Xu Sun, Baobao Chang, Shi Zhao, and Lei Sha. 2015. Multi-label text categorization with joint learning predictions-as-features method. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.

Wei Li, Yunfang Wu, and Xueqiang Lv. 2016. Improving word vector with prior knowledge in semantic dictionary. In Natural Language Understanding and Intelligent Applications, Cham. Springer International Publishing.

Qun Liu and Sujian Li. 2002. Word similarity computing based on HowNet. Computational Linguistics and Chinese Language Processing, 7(2).

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems.

Jinseok Nam, Jungi Kim, Eneldo Loza Mencía, Iryna Gurevych, and Johannes Fürnkranz. 2014. Large-scale multi-label text classification - revisiting neural networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer.

Jinseok Nam, Eneldo Loza Mencía, Hyunwoo J Kim, and Johannes Fürnkranz. 2017. Maximizing subset accuracy with recurrent neural networks in multi-label classification. In Advances in Neural Information Processing Systems.

Yilin Niu, Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2017. Improved word representation learning with sememes. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1.

James W Pennebaker, Martha E Francis, and Roger J Booth. 2001. Linguistic inquiry and word count: LIWC. Mahway: Lawrence Erlbaum Associates, 71(2001):2001.

Jesse Read, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank. 2011. Classifier chains for multi-label classification. Machine Learning, 85(3):333.

Piotr Szymański, Tomasz Kajdanowicz, and Kristian Kersting. 2016. How is a data-driven approach better than random choice in label space division for multi-label classification? Entropy, 18(8):282.

10 Grigorios Tsoumakas, Ioannis Katakis, an Ioannis Vlahavas Ranom k-labelsets for multilabel classification. IEEE Transactions on Knowlege an Data Engineering, 23(7): Grigorios Tsoumakas an Ioannis Vlahavas Ranom k-labelsets: An ensemble metho for multilabel classification. In European Conference on Machine Learning, pages Oriol Vinyals, Samy Bengio, an Manjunath Kulur Orer matters: Sequence to sequence for sets. arxiv preprint arxiv: Ruobing Xie, Xingchi Yuan, Zhiyuan Liu, an Maosong Sun Lexical sememe preiction via wor embeings an matrix factorization. In Proceeings of the 26th International Joint Conference on Artificial Intelligence, pages AAAI Press. Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu, an Houfeng Wang Sgm: Sequence generation moel for multi-label classification. arxiv preprint arxiv: Xiangkai Zeng, Cheng Yang, Cunchao Tu, Zhiyuan Liu, an Maosong Sun Chinese liwc lexicon expansion via hierarchical classification of wor embeings with sememe attention. Min-Ling Zhang an Zhi-Hua Zhou Multilabel neural networks with applications to functional genomics an text categorization. IEEE transactions on Knowlege an Data Engineering, 18(10): Min-Ling Zhang an Zhi-Hua Zhou Ml-knn: A lazy learning approach to multi-label learning. Pattern recognition, 40(7):
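The idea of turning a one-hot target into a probability distribution can be illustrated with a minimal sketch. This is not the exact loss used in our model: the function names, the uniform sharing rule, and the weight `alpha` are hypothetical choices made only for illustration. The sketch assumes that, at each decoding step, a fraction `alpha` of the target mass is moved from the label at that step to the other gold sememes of the same word, so that predicting a correct sememe in a different position is penalized less than under a one-hot target.

```python
import math


def soft_target(gold_labels, step_label, num_labels, alpha=0.3):
    """Build a soft target distribution for one decoding step.

    gold_labels: indices of all gold sememes of the word.
    step_label:  index of the gold sememe at this decoding step.
    alpha:       hypothetical share of mass given to the other gold labels.
    """
    target = [0.0] * num_labels
    others = [label for label in set(gold_labels) if label != step_label]
    if not others:
        # Only one gold label: the soft target reduces to one-hot.
        target[step_label] = 1.0
        return target
    target[step_label] = 1.0 - alpha
    for label in others:
        # Assumption: the moved mass is shared uniformly.
        target[label] = alpha / len(others)
    return target


def soft_cross_entropy(probs, target, eps=1e-12):
    """Cross-entropy between predicted probabilities and a soft target."""
    return -sum(t * math.log(p + eps) for p, t in zip(probs, target) if t > 0.0)


# The model puts most of its probability on gold label 3 at a step
# where the reference sequence lists gold label 1.
probs = [0.05, 0.10, 0.05, 0.75, 0.05]
hard_loss = soft_cross_entropy(probs, soft_target([1, 3], 1, 5, alpha=0.0))
soft_loss = soft_cross_entropy(probs, soft_target([1, 3], 1, 5, alpha=0.4))
```

With `alpha=0.0` the target is one-hot and the loss treats the prediction of label 3 as a plain error. With `alpha=0.4` part of the target mass sits on label 3, so the same prediction yields a smaller loss, which reflects the weak ordering among the labels.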


More information

LEARNING AUDIO SHEET MUSIC CORRESPONDENCES. Matthias Dorfer Department of Computational Perception

LEARNING AUDIO SHEET MUSIC CORRESPONDENCES. Matthias Dorfer Department of Computational Perception LEARNING AUDIO SHEET MUSIC CORRESPONDENCES Matthias Dorfer Department of Computational Perception Short Introduction... I am a PhD Candidate in the Department of Computational Perception at Johannes Kepler

More information

Design of Polar List Decoder using 2-Bit SC Decoding Algorithm V Priya 1 M Parimaladevi 2

Design of Polar List Decoder using 2-Bit SC Decoding Algorithm V Priya 1 M Parimaladevi 2 IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 03, 2015 ISSN (online): 2321-0613 V Priya 1 M Parimaladevi 2 1 Master of Engineering 2 Assistant Professor 1,2 Department

More information

MUSI-6201 Computational Music Analysis

MUSI-6201 Computational Music Analysis MUSI-6201 Computational Music Analysis Part 9.1: Genre Classification alexander lerch November 4, 2015 temporal analysis overview text book Chapter 8: Musical Genre, Similarity, and Mood (pp. 151 155)

More information

Using Genre Classification to Make Content-based Music Recommendations

Using Genre Classification to Make Content-based Music Recommendations Using Genre Classification to Make Content-based Music Recommendations Robbie Jones (rmjones@stanford.edu) and Karen Lu (karenlu@stanford.edu) CS 221, Autumn 2016 Stanford University I. Introduction Our

More information

CS229 Project Report Polyphonic Piano Transcription

CS229 Project Report Polyphonic Piano Transcription CS229 Project Report Polyphonic Piano Transcription Mohammad Sadegh Ebrahimi Stanford University Jean-Baptiste Boin Stanford University sadegh@stanford.edu jbboin@stanford.edu 1. Introduction In this project

More information

Laughbot: Detecting Humor in Spoken Language with Language and Audio Cues

Laughbot: Detecting Humor in Spoken Language with Language and Audio Cues Laughbot: Detecting Humor in Spoken Language with Language and Audio Cues Kate Park, Annie Hu, Natalie Muenster Email: katepark@stanford.edu, anniehu@stanford.edu, ncm000@stanford.edu Abstract We propose

More information

Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes

Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes hello Jay Biernat Third author University of Rochester University of Rochester Affiliation3 words jbiernat@ur.rochester.edu author3@ismir.edu

More information

Automatic Piano Music Transcription

Automatic Piano Music Transcription Automatic Piano Music Transcription Jianyu Fan Qiuhan Wang Xin Li Jianyu.Fan.Gr@dartmouth.edu Qiuhan.Wang.Gr@dartmouth.edu Xi.Li.Gr@dartmouth.edu 1. Introduction Writing down the score while listening

More information

Video Quality Evaluation with Multiple Coding Artifacts

Video Quality Evaluation with Multiple Coding Artifacts Video Quality Evaluation with Multiple Coding Artifacts L. Dong, W. Lin*, P. Xue School of Electrical & Electronic Engineering Nanyang Technological University, Singapore * Laboratories of Information

More information

The Research Overview of Variant Chinese Characters

The Research Overview of Variant Chinese Characters Cross-Cultural Communication Vol. 11, No. 7, 2015, pp. 61-65 DOI: 10.3968/7314 ISSN 1712-8358[Print] ISSN 1923-6700[Online] www.cscanada.net www.cscanada.org The Research Overview of Variant Chinese Characters

More information