Adversarial Learning for Chinese NER from Crowd Annotations


Yaosheng Yang 1, Meishan Zhang 4, Wenliang Chen 1, Wei Zhang 2, Haofen Wang 3, Min Zhang 1
1 School of Computer Science and Technology, Soochow University, China
2 Alibaba Group and 3 Shenzhen Gowild Robotics Co. Ltd
4 School of Computer Science and Technology, Heilongjiang University, China
1 ysyang@stu.suda.edu.cn, {wlchen, minzhang}@suda.edu.cn, 4 mason.zms@gmail.com, 2 lantu.zw@alibaba-inc.com, 3 wang_haofen@gowild.cn

The corresponding author is Wenliang Chen. Copyright (c) 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

arXiv:1801.05147v1 [cs.CL] 16 Jan 2018

Abstract

To quickly obtain new labeled data, we can choose crowdsourcing as an alternative way at lower cost in a short time. But as an exchange, crowd annotations from non-experts may be of lower quality than those from experts. In this paper, we propose an approach to performing crowd annotation learning for Chinese Named Entity Recognition (NER) to make full use of the noisy sequence labels from multiple annotators. Inspired by adversarial learning, our approach uses a common Bi-LSTM and a private Bi-LSTM for representing annotator-generic and -specific information. The annotator-generic information is the common knowledge for entities easily mastered by the crowd. Finally, we build our Chinese NE tagger based on the LSTM-CRF model. In our experiments, we create two data sets for Chinese NER tasks from two domains. The experimental results show that our system achieves better scores than strong baseline systems.

Introduction

There has been significant progress on Named Entity Recognition (NER) in recent years using models based on machine learning algorithms (Zhao and Kit 2008; Collobert et al. 2011; Lample et al. 2016). As with other Natural Language Processing (NLP) tasks, building NER systems typically requires a massive amount of labeled training data which are annotated by experts. In real applications, we often need to consider new types of entities in new domains where we do not have existing annotated data. For such new types of entities, however, it is very hard to find experts to annotate the data within short time limits, and hiring experts is costly and non-scalable, both in terms of time and money.

In order to quickly obtain new training data, we can use crowdsourcing as one alternative way at lower cost in a short time. But as an exchange, crowd annotations from non-experts may be of lower quality than those from experts. It is a big challenge to build a powerful NER system on such low-quality annotated data. Although we can obtain high-quality annotations for each input sentence by majority voting, it can be a waste of human labor to achieve such a goal, especially for some ambiguous sentences which may require a number of annotations to reach an agreement. Thus, most work directly builds models on the crowd annotations, trying to model the differences among annotators; for example, some of the annotators may be more trustful (Rodrigues, Pereira, and Ribeiro 2014; Nguyen et al. 2017).

Here we focus mainly on Chinese NER, which is more difficult than NER for other languages such as English because of the lack of morphological variations such as capitalization and, in particular, the uncertainty in word segmentation. Chinese NE taggers trained on the news domain often perform poorly in other domains. Although we can alleviate the problem by using character-level tagging to resolve the problem of poor word segmentation performances (Peng and Dredze 2015), there still exists a large gap when the target domain changes, especially for the texts of social media. Thus, in order to get a good tagger for new domains and also for the conditions of new entity types, we require large amounts of labeled data. Therefore, crowdsourcing is a reasonable solution for these situations.

In this paper, we propose an approach to training a Chinese NER system on crowd-annotated data. Our goal is to extract additional annotator-independent features by adversarial training, alleviating the annotation noises of non-experts. The idea of adversarial training in neural networks has been used successfully in several NLP tasks, such as cross-lingual POS tagging (Kim et al. 2017) and cross-domain POS tagging (Gui et al. 2017). They use it to reduce the negative influences of the input divergences among different domains or languages, while we use adversarial training to reduce the negative influences brought by different crowd annotators. To our best knowledge, we are the first to apply adversarial training to crowd annotation learning.

In the learning framework, we perform adversarial training between the basic NER and an additional worker discriminator. We have a common Bi-LSTM for representing annotator-generic information and a private Bi-LSTM for representing annotator-specific information. We build another label Bi-LSTM over the crowd-annotated NE label sequence, which reflects the mind of the crowd annotators who learn entity definitions by reading the annotation guidebook. The common and private Bi-LSTMs are used for NER, while the common and label Bi-LSTMs are used as inputs for the worker discriminator. The parameters of the common Bi-LSTM are learned by adversarial training, maximizing the worker discriminator loss and meanwhile minimizing the NER loss. Thus the resulting features of the common Bi-LSTM are worker-invariant and NER-sensitive.

For evaluation, we create two Chinese NER datasets in two domains: dialog and e-commerce. We require the crowd annotators to label the types of entities, including person, song, brand, product, and so on. Identifying these entities is useful for chatbot and e-commerce platforms (Klüwer 2011). Then we conduct experiments on the newly created datasets to verify the effectiveness of the proposed adversarial neural network model. The results show that our system outperforms very strong baseline systems. In summary, we make the following contributions:

- We propose a crowd-annotation learning model based on adversarial neural networks. The model uses labeled data created by non-experts to train a NER classifier and simultaneously learns the common and private features among the non-expert annotators.
- We create two data sets in the dialog and e-commerce domains by crowd annotations. The experimental results show that the proposed approach performs the best among all the comparison systems.

Related Work

Our work is related to three lines of research: sequence labeling, adversarial training, and crowdsourcing.

Sequence labeling. NER is widely treated as a sequence labeling problem, by assigning a unique label over each sentential word (Ratinov and Roth 2009). Early studies on sequence labeling often use the models of HMM, MEMM, and CRF (Lafferty et al. 2001) based on manually-crafted discrete features, which can suffer from the feature sparsity problem and require heavy feature engineering. Recently, neural network models have been successfully applied to sequence labeling (Collobert et al. 2011; Huang, Xu, and Yu 2015; Lample et al. 2016). Among these works, the model which uses a Bi-LSTM for feature extraction and a CRF for decoding has achieved state-of-the-art performances (Huang, Xu, and Yu 2015; Lample et al. 2016), and it is exploited as the baseline model in our work.

Adversarial Training. Adversarial networks have achieved great success in computer vision, such as image generation (Denton et al. 2015; Ganin et al. 2016). In the NLP community, the method is mainly exploited under the settings of domain adaptation (Zhang, Barzilay, and Jaakkola 2017; Gui et al. 2017), cross-lingual learning (Chen et al. 2016; Kim et al. 2017) and multi-task learning (Chen et al. 2017; Liu, Qiu, and Huang 2017). All these settings involve feature divergences between the training and test examples, and aim to learn features that are invariant across the divergences by an additional adversarial discriminator, such as a domain discriminator. Our work is similar to these works but is applied to crowdsourcing learning, aiming to find invariant features among different crowdsourcing workers.

Crowdsourcing. Most NLP tasks require a massive amount of labeled training data which are annotated by experts. However, hiring experts is costly and non-scalable, both in terms of time and money. Instead, crowdsourcing is another solution to obtain labeled data at a lower cost but with relatively lower quality than those from experts. Snow et al. (2008) collected labeled results for several NLP tasks from Amazon Mechanical Turk and demonstrated that non-expert annotations were quite useful for training new systems. In recent years, a series of works have focused on how to use crowdsourcing data efficiently in tasks such as classification (Felt et al. 2015; Bi et al. 2014), and on comparing the quality of crowd and expert labels (Dumitrache, Aroyo, and Welty 2017). In sequence labeling tasks, Dredze, Talukdar, and Crammer (2009) viewed this task as a multi-label problem, while Rodrigues, Pereira, and Ribeiro (2014) took workers' identities into account by assuming that each sentential word was tagged correctly by one of the crowdsourcing workers and proposed a CRF-based model with multiple annotators. Nguyen et al. (2017) introduced a crowd representation in which the crowd vectors were added into the LSTM-CRF model at train time, but ignored at test time.
In this paper, we apply adversarial training to crowd annotations for Chinese NER in new domains, and achieve better performances than previous studies on crowdsourcing learning.

Baseline: LSTM-CRF

We use a neural CRF model as the baseline system (Ratinov and Roth 2009), treating NER as a sequence labeling problem over Chinese characters, which has achieved state-of-the-art performances (Peng and Dredze 2015). To this end, we explore the BIEO schema to convert NER into sequence labeling, following Lample et al. (2016), where each sentential character is assigned one unique tag. Concretely, we tag a non-entity character by the label O, the beginning character of an entity by B-XX, the ending character of an entity by E-XX, and the other characters of an entity by I-XX, where XX denotes the entity type.

We build high-level neural features from the input character sequence by a bi-directional LSTM (Lample et al. 2016). The resulting features are combined and then fed into an output CRF layer for decoding. In summary, the baseline model has three main components. First, we make vector representations for the sentential characters x_1 x_2 ... x_n, transforming the discrete inputs into low-dimensional neural inputs. Second, feature extraction is performed to obtain high-level features h^{ner}_1 h^{ner}_2 ... h^{ner}_n, by using a bi-directional LSTM (Bi-LSTM) structure together with a linear transformation over x_1 x_2 ... x_n. Third, we apply a CRF tagging module over h^{ner}_1 h^{ner}_2 ... h^{ner}_n, obtaining the final output NE labels. The overall framework of the baseline model is shown by the right part of Figure 1.

Vector Representation of Characters

To represent Chinese characters, we simply exploit a neural embedding layer to map discrete characters into low-dimensional vector representations. The goal is achieved by a looking-up table E^W, which is a model parameter and will be fine-tuned during training. The looking-up table can be initialized either randomly or by using pretrained embeddings from a large-scale raw corpus. For a given Chinese character sequence c_1 c_2 ... c_n, we obtain the vector representation of each sentential character by:

  x_t = \text{look-up}(c_t, E^W), \quad t \in [1, n].
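To make the BIEO schema concrete, the sketch below converts character-level entity spans into BIEO tags. It is a minimal illustration under our own assumptions (the function name and span format are not from the paper, and the handling of single-character entities is our guess, since the paper does not describe it).

```python
def spans_to_bieo(num_chars, spans):
    """Convert entity spans into character-level BIEO tags.

    spans: list of (start, end, type) with inclusive character indices,
    e.g. (3, 5, "SONG") marks characters 3..5 as a song name.
    """
    tags = ["O"] * num_chars
    for start, end, etype in spans:
        if start == end:                 # single-character entity: fall back to B-
            tags[start] = "B-" + etype   # (assumption; the paper's schema has no S- tag)
            continue
        tags[start] = "B-" + etype
        for i in range(start + 1, end):
            tags[i] = "I-" + etype
        tags[end] = "E-" + etype
    return tags

# Example: a 6-character sentence whose last three characters form a song name.
print(spans_to_bieo(6, [(3, 5, "SONG")]))
# ['O', 'O', 'O', 'B-SONG', 'I-SONG', 'E-SONG']
```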

Figure 1: The framework of the proposed model, which consists of two parts: the worker discriminator (left) and the baseline LSTM-CRF for NER (right).

Feature Extraction

Based on the vector sequence x_1 x_2 ... x_n, we extract higher-level features h^{ner}_1 h^{ner}_2 ... h^{ner}_n by using a bi-directional LSTM module and a simple feed-forward neural layer, which are then used for CRF tagging at the next step.

LSTM is a type of recurrent neural network (RNN) designed to address the exploding and diminishing gradient problems of basic RNNs (Graves and Schmidhuber 2005). It has been widely used in a number of NLP tasks, including POS tagging (Huang, Xu, and Yu 2015; Ma and Hovy 2016), parsing (Dyer et al. 2015) and machine translation (Wu et al. 2016), because of its strong capability of modeling natural language sentences.

By traversing x_1 x_2 ... x_n in order and reversely, we obtain the output features h^{private}_1 h^{private}_2 ... h^{private}_n of the Bi-LSTM, where h^{private}_t = \overrightarrow{h}_t \oplus \overleftarrow{h}_t. Here we refer to this Bi-LSTM as private in order to differentiate it from the common Bi-LSTM over the same character inputs, which will be introduced in the next section.

Further, we integrate the output vectors of the bi-directional LSTM by a linear feed-forward neural layer, resulting in the features h^{ner}_1 h^{ner}_2 ... h^{ner}_n by the equation:

  h^{ner}_t = W h^{private}_t + b,    (1)

where W and b are both model parameters.

CRF Tagging

Finally, we feed the resulting features h^{ner}_t, t \in [1, n], into a CRF layer directly for NER decoding. CRF tagging is a globally normalized model, aiming to find the best output sequence considering the dependencies between successive labels. In the sequence labeling setting for NER, the output label of one position has a strong dependency on the label of the previous position. For example, the label before I-XX must be either B-XX or I-XX, where XX should be exactly the same.

CRF involves two parts for prediction. First, we compute the scores for each position based on h^{ner}_t, resulting in o^{ner}_t, whose dimension is the number of output labels. The other part is a transition matrix T which defines the scores of two successive labels; T is also a model parameter. Based on o^{ner}_t and T, we use the Viterbi algorithm to find the best-scoring label sequence. We can formalize the CRF tagging process as follows:

  o^{ner}_t = W^{ner} h^{ner}_t, \quad t \in [1, n]
  \text{score}(X, y) = \sum_{t=1}^{n} \big( o^{ner}_{t, y_t} + T_{y_{t-1}, y_t} \big)    (2)
  y^{ner} = \arg\max_{y} \text{score}(X, y),

where score(.) is the scoring function for a given output label sequence y = y_1 y_2 ... y_n based on input X, y^{ner} is the resulting label sequence, and W^{ner} is a model parameter.

Training

To train the model parameters, we exploit a negative log-likelihood objective as the loss function. We apply softmax over all candidate output label sequences, thus the probability of the crowd-annotated label sequence is computed by:

  p(\bar{y} \mid X) = \frac{\exp\big(\text{score}(X, \bar{y})\big)}{\sum_{y \in Y_X} \exp\big(\text{score}(X, y)\big)},    (3)

where \bar{y} is the crowd-annotated label sequence and Y_X is the set of all candidate label sequences of input X. Based on the above formula, the loss function of our baseline model is:

  \text{loss}(\Theta, X, \bar{y}) = -\log p(\bar{y} \mid X),    (4)

where \Theta is the set of all model parameters. We use the standard back-propagation method to minimize the loss function of the baseline CRF model.
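As a concrete illustration of Equation (2), the following NumPy sketch scores a label sequence and runs Viterbi decoding over per-position emission scores and a transition matrix. It is a minimal sketch with made-up array names (start/stop transitions are omitted), not the authors' implementation.

```python
import numpy as np

def sequence_score(emit, trans, labels):
    """score(X, y) = sum_t (o_{t,y_t} + T_{y_{t-1},y_t}); emit is (n, L), trans is (L, L)."""
    score = emit[0, labels[0]]
    for t in range(1, len(labels)):
        score += emit[t, labels[t]] + trans[labels[t - 1], labels[t]]
    return score

def viterbi_decode(emit, trans):
    """Return the best-scoring label sequence under the same scoring function."""
    n, L = emit.shape
    dp = np.zeros((n, L))                # best score of any path ending in label l at step t
    back = np.zeros((n, L), dtype=int)   # backpointers to the previous label
    dp[0] = emit[0]
    for t in range(1, n):
        cand = dp[t - 1][:, None] + trans + emit[t][None, :]   # [prev, cur]
        back[t] = cand.argmax(axis=0)
        dp[t] = cand.max(axis=0)
    best = [int(dp[-1].argmax())]
    for t in range(n - 1, 0, -1):
        best.append(int(back[t, best[-1]]))
    return best[::-1]

# Tiny example with 3 positions and 2 labels.
emit = np.array([[2.0, 0.1], [0.3, 1.5], [0.2, 1.0]])
trans = np.array([[0.5, -0.2], [-0.3, 0.8]])
path = viterbi_decode(emit, trans)
print(path, sequence_score(emit, trans, path))
```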

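The denominator of Equation (3) sums over exponentially many candidate sequences, but it can be computed exactly with a forward dynamic program in log space. The sketch below is only illustrative of this standard computation (same made-up emit/trans arrays as above, start/stop transitions omitted), not the authors' code.

```python
import numpy as np

def log_partition(emit, trans):
    """log sum_y exp(score(X, y)) via the forward algorithm."""
    alpha = emit[0].copy()                                   # log-scores of length-1 prefixes
    for t in range(1, emit.shape[0]):
        cand = alpha[:, None] + trans + emit[t][None, :]     # [prev, cur]
        m = cand.max(axis=0)                                 # log-sum-exp over the previous label
        alpha = m + np.log(np.exp(cand - m).sum(axis=0))
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def crf_nll(emit, trans, gold_labels):
    """loss = -log p(y_bar | X) = log Z - score(X, y_bar), as in Equations (3)-(4)."""
    gold = emit[0, gold_labels[0]]
    for t in range(1, len(gold_labels)):
        gold += emit[t, gold_labels[t]] + trans[gold_labels[t - 1], gold_labels[t]]
    return log_partition(emit, trans) - gold

emit = np.array([[2.0, 0.1], [0.3, 1.5]])
trans = np.zeros((2, 2))
print(crf_nll(emit, trans, [0, 1]))
```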
Worker Adversarial

Adversarial learning has been an effective mechanism to resolve the problem of large divergences between the input features of training and test examples (Goodfellow et al. 2014; Ganin et al. 2016). It has been successfully applied to domain adaptation (Gui et al. 2017), cross-lingual learning (Chen et al. 2016) and multi-task learning (Liu, Qiu, and Huang 2017). All these settings involve feature shifting between training and testing. In this paper, our setting is different. We are using the annotations from non-experts, which are noisy and can influence the final performances if they are not properly processed. Directly learning from the resulting corpus may adapt the neural feature extraction to the biased annotations. In this work, we assume that individual workers have their own guidelines in mind after short training. For example, a perfect worker can annotate highly consistently with an expert, while common crowdsourcing workers may be confused and have different understandings of certain contexts. Based on this assumption, we adapt the original adversarial neural network to our setting.

Our adaptation is very simple. Briefly speaking, the original adversarial learning adds an additional discriminator to classify the type of the source inputs, for example the domain category in the domain adaptation setting, while we add a discriminator to classify the annotation workers. The features from the input sentence alone are not enough for worker classification; the annotation result of the worker is also required. Thus the inputs of our discriminator are different. Here we exploit both the source sentences and the crowd-annotated NE labels as basic inputs for the worker discrimination.

In the following, we describe the proposed adversarial learning module, including both the submodels and the training method. As shown by the left part of Figure 1, the submodel consists of four parts: (1) a common Bi-LSTM over input characters; (2) an additional Bi-LSTM to encode the crowd-annotated NE label sequence; (3) a convolutional neural network (CNN) to extract features for the worker discriminator; (4) output and prediction.

Common Bi-LSTM over Characters

To build the adversarial part, first we create a new bi-directional LSTM, named the common Bi-LSTM:

  h^{common}_1 h^{common}_2 \ldots h^{common}_n = \text{Bi-LSTM}(x_1 x_2 \ldots x_n).    (5)

As shown in Figure 1, this Bi-LSTM is constructed over the same input character representations as the private Bi-LSTM, in order to extract worker-independent features. The resulting features of the common Bi-LSTM are used for both NER and the worker discriminator, different from the features of the private Bi-LSTM, which are used for NER only. As shown in Figure 1, we concatenate the outputs of the common and private Bi-LSTMs together, and then feed the result into the feed-forward combination layer of the NER part. Thus Formula 1 can be rewritten as:

  h^{ner}_t = W (h^{common}_t \oplus h^{private}_t) + b,    (6)

where W is wider than the original combination because of the newly-added h^{common}_t (a small code sketch of this shared-input, concatenated-output design is given below).

Noticeably, although the resulting common features are used for the worker discriminator, they actually have no capability to distinguish the workers, because this part is exploited to maximize the loss of the worker discriminator, which will be explained in the later training subsection. These features are invariant among different workers, thus they can carry less noise for NER. This is the goal of adversarial learning, and we hope that the NER part is able to find useful features among these worker-independent features.

Additional Bi-LSTM over Annotated NER Labels

In order to incorporate the annotated NE labels to predict the exact worker, we build another bi-directional LSTM (named the label Bi-LSTM) based on the crowd-annotated NE label sequence. This Bi-LSTM is used for the worker discriminator only. During decoding in the testing phase, we do not use this Bi-LSTM, because the worker discriminator is no longer required.
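The sketch below shows one way to realize Equations (5)-(6): two Bi-LSTMs over shared character embeddings whose outputs are concatenated before the linear combination layer. Module and dimension names are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class CommonPrivateEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)            # looking-up table E^W
        # Two Bi-LSTMs over the same character embeddings.
        self.private_lstm = nn.LSTM(emb_dim, hidden_dim // 2,
                                    bidirectional=True, batch_first=True)
        self.common_lstm = nn.LSTM(emb_dim, hidden_dim // 2,
                                   bidirectional=True, batch_first=True)
        # Eq. (6): h_ner = W (h_common (+) h_private) + b
        self.combine = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, char_ids):
        x = self.embed(char_ids)                                   # (batch, n, emb_dim)
        h_private, _ = self.private_lstm(x)                        # NER-only features
        h_common, _ = self.common_lstm(x)                          # shared with the discriminator
        h_ner = self.combine(torch.cat([h_common, h_private], dim=-1))
        return h_ner, h_common

# Usage: a batch of two 5-character sentences with a hypothetical vocabulary of 3000.
enc = CommonPrivateEncoder(vocab_size=3000)
h_ner, h_common = enc(torch.randint(0, 3000, (2, 5)))
print(h_ner.shape, h_common.shape)   # torch.Size([2, 5, 200]) torch.Size([2, 5, 200])
```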
Assuming the crowd-annotated NE label sequence produced by one worker is \bar{y} = \bar{y}_1 \bar{y}_2 \ldots \bar{y}_n, we exploit a looking-up table E^L to obtain the corresponding sequence of vector representations x'_1 x'_2 \ldots x'_n, similar to the method that maps characters into their neural representations. Concretely, for one NE label \bar{y}_t (t \in [1, n]), we obtain its neural vector by: x'_t = \text{look-up}(\bar{y}_t, E^L).

In the next step we apply a bi-directional LSTM over the sequence x'_1 x'_2 \ldots x'_n, which can be formalized as:

  h^{label}_1 h^{label}_2 \ldots h^{label}_n = \text{Bi-LSTM}(x'_1 x'_2 \ldots x'_n).    (7)

The resulting feature sequence is concatenated with the outputs of the common Bi-LSTM, and further used for worker classification.

CNN

Next, we add a convolutional neural network (CNN) module based on the concatenated outputs of the common Bi-LSTM and the label Bi-LSTM, to produce the final features for the worker discriminator. A convolutional operator with window size 5 is used, and the max pooling strategy is applied over the convolution sequence to obtain the final fixed-dimensional feature vector (see the convolution sketch below). The whole process can be described by the following equations:

  \tilde{h}_t = h^{common}_t \oplus h^{label}_t
  h^{worker}_t = \tanh\big( W^{c} [\tilde{h}_{t-2}, \ldots, \tilde{h}_{t+2}] \big)    (8)
  h^{worker} = \text{max-pooling}(h^{worker}_1 h^{worker}_2 \ldots h^{worker}_n),

where t \in [1, n] and W^{c} is a model parameter. We exploit zero vectors to pad the out-of-index positions.

Output and Prediction

After obtaining the final feature vector for the worker discriminator, we use it to compute the output vector, which scores all the annotation workers. The score function is defined by:

  o^{worker} = W^{worker} h^{worker},    (9)

where W^{worker} is a model parameter and the output dimension equals the total number of non-expert annotators.
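A minimal sketch of the window-5 convolution and max pooling of Equation (8), using PyTorch's Conv1d with zero padding for the out-of-index positions; the layer names and feature sizes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class WorkerFeatureCNN(nn.Module):
    """tanh(W^c [.]) over a width-5 window, then max pooling over time (Eq. 8)."""
    def __init__(self, in_dim=400, out_dim=200):
        super().__init__()
        # padding=2 supplies the zero vectors for positions outside the sentence
        self.conv = nn.Conv1d(in_dim, out_dim, kernel_size=5, padding=2)

    def forward(self, h_common, h_label):
        h = torch.cat([h_common, h_label], dim=-1)        # (batch, n, in_dim)
        h = torch.tanh(self.conv(h.transpose(1, 2)))      # (batch, out_dim, n)
        return h.max(dim=2).values                        # max pooling -> (batch, out_dim)

cnn = WorkerFeatureCNN()
h_common = torch.randn(2, 7, 200)   # e.g. common Bi-LSTM outputs for a 7-character batch
h_label = torch.randn(2, 7, 200)    # label Bi-LSTM outputs
h_worker = cnn(h_common, h_label)
print(h_worker.shape)               # torch.Size([2, 200])
```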

The prediction is to find the worker who is responsible for this annotation.

Adversarial Training

The training objective with the adversarial neural network is different from that of the baseline model, as it includes the extra worker discriminator. Thus the new objective includes two parts, one being the negative log-likelihood from NER, which is the same as the baseline, and the other being the negative log-likelihood from the worker discriminator.

In order to obtain the negative log-likelihood of the worker discriminator, we use softmax to compute the probability of the actual worker \bar{z} as well, which is defined by:

  p(\bar{z} \mid X, \bar{y}) = \frac{\exp(o^{worker}_{\bar{z}})}{\sum_{z} \exp(o^{worker}_{z})},    (10)

where z enumerates all workers. Based on the above definition of probability, our new objective is defined as follows:

  R(\Theta, \Theta', X, \bar{y}, \bar{z}) = \text{loss}(\Theta, X, \bar{y}) - \text{loss}(\Theta, \Theta', X, \bar{y}, \bar{z})
                                        = -\log p(\bar{y} \mid X) + \log p(\bar{z} \mid X, \bar{y}),    (11)

where \Theta is the set of all model parameters related to NER, and \Theta' is the set of the remaining parameters which are only related to the worker discriminator; X, \bar{y} and \bar{z} are the input sentence, the crowd-annotated NE labels and the corresponding annotator for this annotation, respectively. It is worth noting that the parameters of the common Bi-LSTM are included in the set \Theta by definition.

In particular, our goal is not to simply minimize the new objective. Actually, we aim for a saddle point, finding the parameters \Theta and \Theta' satisfying the following conditions:

  \hat{\Theta} = \arg\min_{\Theta} R(\Theta, \Theta', X, \bar{y}, \bar{z})
  \hat{\Theta}' = \arg\max_{\Theta'} R(\hat{\Theta}, \Theta', X, \bar{y}, \bar{z}),    (12)

where the first equation aims to find one \Theta that minimizes our new objective R(.), and the second equation aims to find one \Theta' maximizing the same objective.

Intuitively, the first equation of Formula 12 tries to minimize the NER loss, but at the same time maximize the worker discriminator loss through the shared parameters of the common Bi-LSTM. Thus the resulting features of the common Bi-LSTM actually attempt to hurt the worker discriminator, which makes these features worker-independent since they are unable to distinguish different workers. The second equation tries to minimize the worker discriminator loss by its own parameters \Theta'.

We use the standard back-propagation method to train the model parameters, the same as for the baseline model. In order to incorporate the argmax part of Formula 12, we follow previous work on adversarial training (Ganin et al. 2016; Chen et al. 2016; Liu, Qiu, and Huang 2017) by introducing a gradient reverse layer between the common Bi-LSTM and the CNN module, whose forward pass does nothing but whose backward pass simply negates the gradients.
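The gradient reverse layer mentioned above is typically implemented as an identity function whose backward pass flips the sign of the gradient. Below is a minimal PyTorch sketch of that standard trick; the class and variable names are ours, not the authors' code.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and optionally scales) gradients in backward."""
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# The common Bi-LSTM output would pass through grad_reverse before the worker CNN,
# so minimizing the discriminator loss pushes the common features to be worker-invariant.
h_common = torch.randn(2, 7, 200, requires_grad=True)
out = grad_reverse(h_common).sum()
out.backward()
print(h_common.grad[0, 0, 0])   # gradients are negated: tensor(-1.)
```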
Experiments

Data Sets

With the purpose of obtaining evaluation datasets from crowd annotators, we collect sentences from two domains: the dialog and e-commerce domains. We hire undergraduate students to annotate the sentences. They are required to identify the predefined types of entities in the sentences. Together with the guideline document, the annotators are given some tips in a fifteen-minute training and are also provided with 20 exemplifying sentences.

Labeled Data: DL-PS. In the dialog domain (DL), we collect raw sentences from a chatbot application. We then randomly select 20K sentences as our pool and hire 43 students to annotate the sentences. We ask the annotators to label two types of entities: Person-Name and Song-Name. The annotators label the sentences independently. In particular, each sentence is assigned to three annotators for this data. Although this setting can be wasteful of labor, we can use the resulting dataset to test several well-known baselines such as majority voting. After annotation, we remove some illegal sentences reported by the annotators. Finally, we have 16,948 sentences annotated by the students. Table 1 shows the information of the annotated data. The average Kappa value among the annotators is 0.6033, indicating that the crowd annotators have moderate agreement on identifying entities in this data.

         #Sent    AvgLen   Kappa
DL-PS    16,948   9.21     0.6033
EC-MT    2,337    34.97    0.7437
EC-UQ    2,300    7.69     0.7529
Table 1: Statistics of the labeled datasets.

In order to evaluate the system performances, we create a set of corpora with gold annotations. Concretely, we randomly select 1,000 sentences from the final dataset and let two experts generate the gold annotations. Among them, we use 300 sentences as the development set and the remaining 700 as the test set. The rest of the sentences, with only student annotations, are used as the training set.

Labeled Data: EC-MT and EC-UQ. In the e-commerce domain (EC), we collect raw sentences from two types of texts: one consists of titles of merchandise entries (EC-MT) and the other of user queries (EC-UQ). The annotators label five types of entities: Brand, Product, Model, Material, and Specification. These five types of entities are very important for e-commerce platforms, for example for building knowledge graphs of merchandise. Five students participate in the annotations for this domain since the number of sentences is small. We use a similar strategy as for DL-PS to annotate the sentences, except that only two annotators are assigned to each sentence, because we aim to test the system performances under very few duplicated annotations. Finally, we obtain 2,337 sentences for EC-MT and 2,300 for EC-UQ. Table 1 shows the information of the annotated results.

Similarly, we produce the development and test datasets for system evaluation, by randomly selecting 400 sentences and letting two experts generate the ground-truth annotations. Among them, we use 100 sentences as the development set and the remaining 300 as the test set. The rest of the sentences, with only crowdsourcing annotations, are used as the training set.
Unlabeled data. The vector representations of characters are basic inputs of our baseline and proposed models, and they are obtained by the looking-up table E^W. As introduced before, we can use pretrained embeddings from a large-scale raw corpus to initialize the table. In order to pretrain the character embeddings, we use a large-scale unlabeled dataset of user-generated content from the Internet. In total, we obtain 5M sentences. Finally, we use the tool word2vec(1) to pretrain the character embeddings based on this unlabeled dataset in our experiments.

Settings

For evaluation, we use the entity-level metrics of Precision (P), Recall (R), and their F1 value in our experiments, treating one tagged entity as correct only when it matches the gold entity exactly (a short sketch of this exact-match scoring is given after Table 3 below).

There are several hyper-parameters in the baseline LSTM-CRF and our final models. We set them empirically according to the development performances. Concretely, we set the dimension size of the character embeddings to 100, the dimension size of the NE label embeddings to 50, and the dimension sizes of all the other hidden features to 200. We exploit online training with a mini-batch size of 128 to learn model parameters. The max-epoch iteration is set to 200, and the best-epoch model is chosen according to the development performances. We use RMSprop (Tieleman and Hinton 2012) with a learning rate of 10^-3 to update model parameters, and use l2-regularization with a parameter of 10^-5. We adopt the dropout technique to avoid overfitting, with a drop value of 0.2.

Comparison Systems

The proposed approach (henceforward referred to as ALCrowd) is compared with the following systems:

- CRF: We use the CRFsuite tool(2) to train a model on the crowdsourcing labeled data. As for the feature settings, we use the supervised version of Zhao and Kit (2008).
- CRF-VT: We use the same settings as the CRF system, except that the training data is the voted version, whose ground-truths are produced by majority voting at the character level for each annotated sentence.
- CRF-MA: The CRF model proposed by Rodrigues, Pereira, and Ribeiro (2014), which uses a prior distribution to model multiple crowdsourcing annotators. We use the source code provided by the authors.
- LSTM-CRF: Our baseline system trained on the crowdsourcing labeled data.
- LSTM-CRF-VT: Our baseline system trained on the voted corpus, which is the same as for CRF-VT.
- LSTM-Crowd: The LSTM-CRF model with crowd annotation learning proposed by Nguyen et al. (2017). We use the source code provided by the authors.

(1) https://code.google.com/archive/p/word2vec
(2) http://www.chokkan.org/software/crfsuite/

The first three systems are based on the CRF model using traditional handcrafted features, and the last three systems are based on the neural LSTM-CRF model. Among them, CRF-MA, LSTM-Crowd and our system with adversarial learning (ALCrowd) are based on crowd annotation learning, which directly trains the model on the crowd annotations. Five systems, including CRF, CRF-MA, LSTM-CRF, LSTM-Crowd, and ALCrowd, are trained on the original version of the labeled data, while CRF-VT and LSTM-CRF-VT are trained on the voted version. Since CRF-VT, CRF-MA and LSTM-CRF-VT all require ground-truth answers for each training sentence, which are difficult to produce with only two annotations, we do not apply these three models to the two EC datasets.

Model          P      R      F1
CRF            89.48  70.38  78.79
CRF-VT         85.16  65.07  73.77
CRF-MA         72.83  90.79  80.82
LSTM-CRF       90.50  79.97  84.91
LSTM-CRF-VT    88.68  75.51  81.57
LSTM-Crowd     86.40  83.43  84.89
ALCrowd        89.56  82.70  85.99
Table 2: Main results on the DL-PS data.

Data: EC-MT
Model          P      R      F1
CRF            75.12  66.67  70.64
LSTM-CRF       75.02  72.84  73.91
LSTM-Crowd     73.81  75.18  74.49
ALCrowd        76.33  74.00  75.15
Data: EC-UQ
CRF            65.45  55.33  59.96
LSTM-CRF       71.96  66.55  69.15
LSTM-Crowd     67.51  71.10  69.26
ALCrowd        74.72  68.60  71.53
Table 3: Main results on the EC-MT and EC-UQ datasets.
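As referenced in the Settings paragraph, the entity-level scores count a predicted entity as correct only on an exact span-and-type match. The sketch below illustrates that computation; the function and variable names are our own and are not taken from the paper.

```python
def entity_prf(gold_entities, pred_entities):
    """Entity-level P/R/F1; entities are sets of (start, end, type) spans.

    A predicted entity counts as correct only when it matches a gold entity exactly.
    """
    correct = len(gold_entities & pred_entities)
    p = correct / len(pred_entities) if pred_entities else 0.0
    r = correct / len(gold_entities) if gold_entities else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1

gold = {(0, 2, "PER"), (5, 7, "SONG")}
pred = {(0, 2, "PER"), (5, 6, "SONG")}   # wrong right boundary, so it is not counted
print(entity_prf(gold, pred))            # (0.5, 0.5, 0.5)
```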
Main Results

In this section, we show the model performances of our proposed crowdsourcing learning system (ALCrowd), and meanwhile compare it with the other systems mentioned above. Table 2 shows the experimental results on the DL-PS dataset and Table 3 shows the experimental results on the EC-MT and EC-UQ datasets, respectively.

The results of CRF and LSTM-CRF show that crowd annotation is an alternative solution with low cost for labeling data that can be used for training a NER system even if there are some inconsistencies. Compared with CRF, LSTM-CRF achieves much better performances on all three datasets, showing a +6.12 F1 improvement on DL-PS, +4.51 on EC-MT, and +9.19 on EC-UQ. This indicates that LSTM-CRF is a very strong baseline system, demonstrating the effectiveness of neural networks.

Interestingly, when compared with CRF and LSTM-CRF, CRF-VT and LSTM-CRF-VT trained on the voted version perform worse on the DL-PS dataset. This trend is also mentioned in Nguyen et al. (2017). This fact shows that the majority voting method might be unsuitable for our task. There are two possible reasons accounting for this observation. On the one hand, simple character-level voting based on three annotations per sentence may still not be enough. In the DL-PS dataset, even with only two predefined entity types, one character can have nine NE labels. Thus majority voting may be incapable of handling some cases, while the cost of adding more annotations for each sentence would be greatly increased. On the other hand, the information lost by majority voting may be important; at least the ambiguous annotations indicate that the input sentence is difficult for NER. The normal CRF and LSTM-CRF models, which do not discard any annotations, can differentiate these difficult contexts through learning.

The three crowd-annotation learning systems provide better performances than their counterpart systems (CRF-MA vs. CRF, and LSTM-Crowd/ALCrowd vs. LSTM-CRF). Compared with the strong baseline LSTM-CRF, ALCrowd shows its advantage with +1.08 F1 improvement on DL-PS, +1.24 on EC-MT, and +2.38 on EC-UQ, respectively. This indicates that adding crowd-annotation learning is quite useful for building NER systems. In addition, ALCrowd also outperforms LSTM-Crowd on all the datasets consistently, demonstrating the high effectiveness of ALCrowd in extracting worker-independent features. Among all the systems, ALCrowd performs the best, and significantly better than all the other models (the p-value is below 10^-5 using the t-test). The results indicate that, with the help of adversarial training, our system can learn a better feature representation from crowd annotations.

Discussion

Impact of Character Embeddings. First, we investigate the effect of the pretrained character embeddings in our proposed crowdsourcing learning model. The comparison results are shown in Figure 2, where Random refers to randomly initialized character embeddings, and Pretrained refers to the embeddings pretrained on the unlabeled data. According to the results, we find that our model with the pretrained embeddings significantly outperforms the one using the random embeddings, demonstrating that the pretrained embeddings successfully provide useful information.

Figure 2: Comparisons using different character embeddings (Random vs. Pretrained) on DL-PS, EC-MT and EC-UQ, where the Y-axis shows the F1 values.

Case Studies. Second, we present several case studies in order to study the differences between our baseline and the worker adversarial models. We conduct a closed test on the training set, the results of which can be regarded as modifications of the training corpus, since there exist inconsistent annotations for each training sentence among the different workers. Figure 3 shows two examples from the DL-PS dataset, which compare the outputs of the baseline and our final models, as well as the majority-voting strategy.

Figure 3: Case studies of different systems, where named entities are illustrated by square brackets.

In the first case, none of the annotations gets the correct NER result, but our proposed model can capture it. The result of LSTM-CRF is the same as majority voting. In the second example, the output of majority voting is the worst, which can account for the reason why the same model trained on the voted corpus performs so badly, as shown in Table 2. The LSTM-CRF model fails to recognize the named entity Xiexie because it does not trust the second annotation, treating it as a noise annotation. Our proposed model is able to recognize it, because of its ability to extract worker-independent features.

Conclusions

In this paper, we presented an approach to performing crowd annotation learning based on the idea of adversarial training for Chinese Named Entity Recognition (NER).
In our approach, we use a common and a private Bi-LSTM for representing annotator-generic and -specific information, and learn a label Bi-LSTM from the crowd-annotated NE label sequences. Finally, the proposed approach adopts a LSTM-CRF model to perform tagging. In our experiments, we create two data sets for Chinese NER tasks in the dialog and e-commerce domains. The experimental results show that the proposed approach outperforms strong baseline systems.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant No. 61572338, 61525205, and 61602160). This work is also partially supported by the joint research project of Alibaba and Soochow University. Wenliang is also partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

References

Bi, W.; Wang, L.; Kwok, J. T.; and Tu, Z. 2014. Learning to predict from crowdsourced data. In UAI, 82-91.
Chen, X.; Sun, Y.; Athiwaratkun, B.; Cardie, C.; and Weinberger, K. 2016. Adversarial deep averaging networks for cross-lingual sentiment classification. arXiv preprint arXiv:1606.01614.
Chen, X.; Shi, Z.; Qiu, X.; and Huang, X. 2017. Adversarial multi-criteria learning for Chinese word segmentation. arXiv preprint arXiv:1704.07556.
Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; and Kuksa, P. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research 12:2493-2537.
Denton, E. L.; Chintala, S.; Fergus, R.; et al. 2015. Deep generative image models using a Laplacian pyramid of adversarial networks. In NIPS, 1486-1494.
Dredze, M.; Talukdar, P. P.; and Crammer, K. 2009. Sequence learning from data with multiple labels. In Workshop Co-Chairs, 39.
Dumitrache, A.; Aroyo, L.; and Welty, C. 2017. Crowdsourcing ground truth for medical relation extraction. arXiv preprint arXiv:1701.02185.
Dyer, C.; Ballesteros, M.; Ling, W.; Matthews, A.; and Smith, N. A. 2015. Transition-based dependency parsing with stack long short-term memory. In ACL, 334-343.
Felt, P.; Black, K.; Ringger, E. K.; Seppi, K. D.; and Haertel, R. 2015. Early gains matter: A case for preferring generative over discriminative crowdsourcing models. In HLT-NAACL, 882-891.
Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; and Lempitsky, V. 2016. Domain-adversarial training of neural networks. Journal of Machine Learning Research 17(59):1-35.
Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. In NIPS, 2672-2680.
Graves, A., and Schmidhuber, J. 2005. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks 18(5):602-610.
Gui, T.; Zhang, Q.; Huang, H.; Peng, M.; and Huang, X. 2017. Part-of-speech tagging for Twitter with adversarial neural networks. In Proceedings of the 2017 Conference on EMNLP, 2401-2410. Copenhagen, Denmark: Association for Computational Linguistics.
Huang, Z.; Xu, W.; and Yu, K. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991.
Kim, J.-K.; Kim, Y.-B.; Sarikaya, R.; and Fosler-Lussier, E. 2017. Cross-lingual transfer learning for POS tagging without cross-lingual resources. In Proceedings of the 2017 Conference on EMNLP, 2822-2828. Copenhagen, Denmark: Association for Computational Linguistics.
Klüwer, T. 2011. From chatbots to dialog systems. Conversational Agents and Natural Language Interaction: Techniques and Effective Practices, 1-22.
Lafferty, J.; McCallum, A.; Pereira, F.; et al. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, volume 1, 282-289.
Lample, G.; Ballesteros, M.; Subramanian, S.; Kawakami, K.; and Dyer, C. 2016. Neural architectures for named entity recognition. In NAACL, 260-270.
Liu, P.; Qiu, X.; and Huang, X. 2017. Adversarial multi-task learning for text classification. In Proceedings of the 55th ACL, 1-10. Vancouver, Canada: Association for Computational Linguistics.
Ma, X., and Hovy, E. 2016. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th ACL, 1064-1074.
Nguyen, A. T.; Wallace, B.; Li, J. J.; Nenkova, A.; and Lease, M. 2017. Aggregating and predicting sequence labels from crowd annotations. In Proceedings of the 55th ACL, volume 1, 299-309.
Peng, N., and Dredze, M. 2015. Named entity recognition for Chinese social media with jointly trained embeddings. In Proceedings of EMNLP, 548-554.
Ratinov, L., and Roth, D. 2009. Design challenges and misconceptions in named entity recognition. In Proceedings of CoNLL-2009, 147-155.
Rodrigues, F.; Pereira, F.; and Ribeiro, B. 2014. Sequence labeling with multiple annotators. Machine Learning 95(2):165-181.
Snow, R.; O'Connor, B.; Jurafsky, D.; and Ng, A. Y. 2008. Cheap and fast - but is it good?: Evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on EMNLP, 254-263. Association for Computational Linguistics.
Tieleman, T., and Hinton, G. 2012. Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4(2):26-31.
Wu, Y.; Schuster, M.; Chen, Z.; Le, Q. V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
Zhang, Y.; Barzilay, R.; and Jaakkola, T. 2017. Aspect-augmented adversarial networks for domain adaptation. arXiv preprint arXiv:1701.00188.

Zhao, H., and Kit, C. 2008. Unsupervised segmentation helps supervised learning of character tagging for word segmentation and named entity recognition. In IJCNLP, 106-111.