Minimum Penalized Hellinger Distance for Model Selection in Small Samples


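Open Journal of Statistics, 2012, 2, 369-382
http://dx.doi.org/10.4236/ojs.2012.24045 Published Online October 2012 (http://www.SciRP.org/journal/ojs)

Papa Ngom*, Bertrand Ntep
Laboratoire de Mathématiques et Applications (LMA), Université Cheikh Anta Diop, Dakar-Fann, Sénégal
Email: *papa.ngom@ucad.edu.sn, tepoo@yaoofr
Received May 2012; revised June 2012; accepted July 2012
*Corresponding author.
Copyright © 2012 SciRes.

ABSTRACT

In statistical modeling, the Akaike information criterion (AIC) is a widely known and extensively used tool for model choice. The φ-divergence test statistic is a more recently developed tool for statistical model selection. The popularity of the divergence criterion is, however, tempered by its known lack of robustness in small samples. In this paper, penalized minimum Hellinger distance type statistics are considered and some of their properties are established. The limit laws of the estimates and test statistics are given under both the null and the alternative hypotheses, and approximations of the power functions are deduced. A model selection criterion relative to these divergence measures is developed for parametric inference. Our interest is in the problem of testing for a choice between two models using informational type statistics when independent samples are drawn from a discrete population. We discuss the asymptotic properties and the performance of the new test procedures and investigate their small sample behavior.

Keywords: Generalized Information; Estimation; Hypothesis Test; Monte Carlo Simulation

1. Introduction

Comprehensive surveys of Pearson chi-square type statistics have been provided by many authors, such as Cochran [1], Watson [2] and Moore [3,4], in particular on quadratic forms in the cell frequencies. Andrews [5] extended the Pearson chi-square testing method to non-dynamic parametric models, i.e., to models with covariates. Because Pearson chi-square statistics provide natural measures of the discrepancy between the observed data and a specific parametric model, they have also been used for discriminating among competing models. Such a situation is frequent in the social sciences, where many competing models are proposed to fit a given sample. A well-known difficulty is that each chi-square statistic tends to become large, without an increase in its degrees of freedom, as the sample size increases. As a consequence, goodness-of-fit tests based on Pearson type chi-square statistics will generally reject the correct specification of every competing model.

To circumvent this difficulty, a popular method for model selection, similar to the use of the Akaike [6] Information Criterion (AIC), consists in considering that the lower the chi-square statistic, the better the model. This selection rule, however, does not take into account the random variations inherent in the values of the statistics. We propose here a procedure that takes into account the stochastic nature of these differences so as to assess their significance; addressing this issue is the main purpose of this paper. We propose some convenient asymptotically standard normal tests for model selection based on φ-divergence type statistics. Following Vuong [7,8], the procedures considered here test the null hypothesis that the competing models are equally close to the data generating process (DGP) against the alternative hypothesis that one model is closer to the DGP, where the closeness of a model is measured according to the discrepancy implicit in the φ-divergence type statistic used. Thus the outcomes of our tests provide information on the strength of the statistical evidence for the choice of a model based on its goodness-of-fit (see Ngom [9]; Diédhiou and Ngom [10]). The model selection approach proposed here differs from those of Cox [11] and Akaike [12] for non-nested hypotheses. The difference is that the present approach is based on the discrepancy implicit in the divergence type statistics used, while these other approaches, like Vuong's [7] tests for model selection, rely on the Kullback-Leibler [13] information criterion (KLIC).

Beran [14] showed that by using the minimum Hellinger distance estimator, one can simultaneously obtain asymptotic efficiency and robustness properties in the presence of outliers. The works of Simpson [15] and Lindsay [16] have shown that, in tests of hypotheses, robust alternatives to the likelihood ratio test can be generated by using the Hellinger distance. We consider a general class of estimators that is very broad and contains most of the estimators currently used in practice when forming divergence type statistics. This covers the cases studied in Harris and Basu [17], Basu et al. [18] and Basu and Basu [19], where the penalized Hellinger distance is used.

The remainder of this paper is organized as follows. Section 2 introduces the basic notation and definitions. Section 3 gives a short overview of divergence measures. Section 4 investigates the asymptotic distribution of the penalized Hellinger distance. In Section 5, some applications to hypothesis testing are proposed. Section 6 presents some simulation results. Section 7 concludes the paper.

2. Definitions and Notation

In this section, we briefly present the basic assumptions on the model and the parameter estimators, and we define our generalized divergence type statistics. We consider a discrete statistical model, i.e., $X_1, X_2, \ldots, X_n$ is an independent random sample from a discrete population with support $\mathcal{X} = \{x_1, \ldots, x_m\}$. Let $P = (p_1, \ldots, p_m)$ be a probability vector, i.e., $P \in \Omega_m$, where $\Omega_m$ is the simplex of probability $m$-vectors,

$$\Omega_m = \Big\{ (p_1, p_2, \ldots, p_m) : p_i \ge 0,\ i = 1, \ldots, m,\ \sum_{i=1}^m p_i = 1 \Big\}.$$

We consider a parametric model

$$\mathcal{P}_\Theta = \{ P_\theta = (p_1(\theta), \ldots, p_m(\theta)) : \theta \in \Theta \},$$

which may or may not contain the true distribution $P$, where $\Theta$ is a compact subset of $k$-dimensional Euclidean space (with $k < m - 1$). If $\mathcal{P}_\Theta$ contains $P$, then there exists a $\theta_0 \in \Theta$ such that $P_{\theta_0} = P$ and the model is said to be correctly specified.

We are interested in testing $H_0 : P \in \mathcal{P}_\Theta$ (with true parameter $\theta_0$) versus $H_1 : P \in \Omega_m \setminus \mathcal{P}_\Theta$.

By $\|\cdot\|$ we denote the usual Euclidean norm, and we interpret probability distributions on $\mathcal{X}$ as row vectors from $\mathbb{R}^m$. For simplicity we restrict ourselves to unknown true parameters $\theta_0$ satisfying the classical regularity conditions given by Birch []:

1) The true $\theta_0$ is an interior point of $\Theta$ and $p_i(\theta_0) > 0$ for $i = 1, \ldots, m$. Thus $P_{\theta_0} = (p_1(\theta_0), \ldots, p_m(\theta_0))$ is an interior point of the set $\Omega_m$.

2) The mapping $P : \Theta \to \Omega_m$ is totally differentiable at $\theta_0$, so that the partial derivatives of $p_i$ with respect to each $\theta_j$ exist at $\theta_0$ and $p_i(\theta)$ has a linear approximation at $\theta_0$ given by

$$p_i(\theta) = p_i(\theta_0) + \sum_{j=1}^k (\theta_j - \theta_{0j}) \frac{\partial p_i(\theta_0)}{\partial \theta_j} + o(\|\theta - \theta_0\|),$$

where $o(\|\theta - \theta_0\|)$ denotes a function verifying $\lim_{\theta \to \theta_0} o(\|\theta - \theta_0\|) / \|\theta - \theta_0\| = 0$.

3) The Jacobian matrix $J(\theta_0) = \big( \partial p_i(\theta_0) / \partial \theta_j \big)_{1 \le i \le m,\ 1 \le j \le k}$ is of full rank (i.e., of rank $k$, with $k < m$).

4) The inverse mapping $P^{-1} : \mathcal{P}_\Theta \to \Theta$ is continuous at $P_{\theta_0}$.

5) The mapping $P : \Theta \to \Omega_m$ is continuous at every point $\theta \in \Theta$.

Under the hypothesis that $P \in \mathcal{P}_\Theta$, there exists an unknown parameter $\theta_0$ such that $P = P_{\theta_0}$, and the problem of point estimation appears in a natural way. Let $n$ be the sample size. We can estimate the distribution $P_{\theta_0} = (p_1(\theta_0), \ldots, p_m(\theta_0))$ by the vector of observed frequencies $\hat{P} = (\hat{p}_1, \ldots, \hat{p}_m)$ on $\mathcal{X}$, i.e., by a measurable mapping $\mathcal{X}^n \to \Omega_m$. This nonparametric estimator is defined by

$$\hat{p}_j = \frac{N_j}{n}, \qquad N_j = \sum_{i=1}^n T_j(X_i), \qquad T_j(X_i) = \begin{cases} 1 & \text{if } X_i = x_j, \\ 0 & \text{otherwise.} \end{cases} \tag{2.1}$$

We can now define the class of φ-divergence type statistics considered in this paper.

3. A Brief Review of φ-Divergences

Many different measures quantifying the degree of discrimination between two probability distributions have been studied in the past. They are frequently called distance measures, although some of them are not strictly metrics. They have been applied in different areas, such as medical image registration (Josien P. W. Pluim []), classification and retrieval, among others. This class of distances is referred to in the literature as the class of φ-, f- or g-divergences (Csiszár []; Vajda []; Morales et al. [23]) or the class of disparities (Lindsay [16]). Divergence measures play an important role in statistical theory, especially in large sample theories of estimation and testing. Many later papers have used divergence or entropy type measures of information in testing statistical hypotheses. Among others we refer to Read and Cressie [24], Zografos et al. [25], Salicrú et al. [26], Bar-Hen and Daudin [27], Menéndez et al. [28], Pardo et al. [29] and the references therein. A measure of discrimination between two probability distributions, called the φ-divergence, was introduced by Csiszár [30]. Recently, Broniatowski et al. [31] presented a new dual representation for divergences. Their aim was to introduce estimation and test procedures through divergence optimization for discrete or continuous parametric models. For the problem where independent samples are drawn from two different discrete populations, Basu et al. [32] developed some tests based on the Hellinger distance and penalized versions of it.

Consider two populations $X$ and $Y$ which, according to classification criteria, can be grouped into $m$ classes (species) $x_1, x_2, \ldots, x_m$ and $y_1, y_2, \ldots, y_m$ with probabilities $P = (p_1, p_2, \ldots, p_m)$ and $Q = (q_1, q_2, \ldots, q_m)$ respectively. Then

$$D_\varphi(P, Q) = \sum_{i=1}^m q_i\, \varphi\!\left( \frac{p_i}{q_i} \right) \tag{3.1}$$

is the φ-divergence between $P$ and $Q$ (see Csiszár [30]) for every $\varphi$ in the set $\Phi$ of real convex functions defined on $[0, \infty)$. The function $\varphi(t)$ is assumed to verify the following regularity condition: $\varphi : [0, \infty) \to (-\infty, \infty]$ is convex and continuous, where $0\,\varphi(0/0) = 0$ and $0\,\varphi(p/0) = p \lim_{u \to \infty} \varphi(u)/u$. Its restriction to $(0, \infty)$ is finite and twice continuously differentiable in a neighborhood of $u = 1$, with $\varphi(1) = 0$ and $\varphi''(1) > 0$ (cf. Liese and Vajda [33]).

We shall also be interested in parametric estimators $\hat{Q} = P_{\hat\theta}$ of $P_{\theta_0}$, which can be obtained by means of various point estimators $\hat\theta : \mathcal{X}^n \to \Theta$ of the unknown parameter $\theta_0$.

It is convenient to measure the difference between observed and expected frequencies. A minimum divergence estimator of $\theta$ is a minimizer of $D_\varphi(\hat{P}, P_\theta)$, where $\hat{P}$ is a nonparametric distribution estimate. In our case, where data come from a discrete distribution, the empirical distribution defined in (2.1) can be used.

In particular, if we take $\varphi(x) = 4\left[ \frac{1 + x}{2} - \sqrt{x} \right] = 2(\sqrt{x} - 1)^2$, we get the Hellinger distance between the distributions $\hat{P}$ and $P_\theta$, given by

$$D_H(\hat{P}, P_\theta) = 2 \sum_{i=1}^m \left( \sqrt{\hat{p}_i} - \sqrt{p_i(\theta)} \right)^2. \tag{3.4}$$

Liese and Vajda [33], Lindsay [16] and Morales et al. [23] introduced the so-called minimum φ-divergence estimate, defined by

$$\hat\theta_\varphi = \arg\min_{\theta \in \Theta} D_\varphi(\hat{P}, P_\theta), \qquad \text{i.e.,} \qquad D_\varphi(\hat{P}, P_{\hat\theta_\varphi}) = \inf_{\theta \in \Theta} D_\varphi(\hat{P}, P_\theta). \tag{3.5}$$

Remark 3.1. The class of estimates (3.5) contains the maximum likelihood estimator (MLE). In particular, if we take $\varphi(x) = x \log x - x + 1$, we get

$$\hat\theta_{KL} = \arg\min_{\theta \in \Theta} KL_m(\hat{P}, P_\theta) = \arg\max_{\theta \in \Theta} \sum_{i=1}^m \hat{p}_i \log p_i(\theta) = \mathrm{MLE},$$

where $KL_m$ is the modified Kullback-Leibler divergence.

Beran [14] first pointed out that the minimum Hellinger distance estimator (MHDE) of $\theta$, defined by

$$\hat\theta_H = \arg\min_{\theta \in \Theta} D_H(\hat{P}, P_\theta), \tag{3.7}$$

has robustness properties. Further results were given by Tamura and Boos [34], Simpson [15] and Basu et al. [35], which give more details on this method of estimation. Simpson, however, noted that the small sample performance of the Hellinger deviance test at some discrete models, such as the Poisson, is somewhat unsatisfactory, in the sense that the test requires a very large sample size for the chi-square approximation to be useful (Simpson [15], Table 3). To avoid this problem, one possibility is to use the penalized Hellinger distance (see Harris and Basu [36]; Basu, Basu and Basu [19]; Basu et al. [32]). The penalized Hellinger distance family between the probability vectors $\hat{P}$ and $P_\theta$ is defined by

$$D_H^h(\hat{P}, P_\theta) = 2 \left[ \sum_{i \in \varpi} \left( \sqrt{\hat{p}_i} - \sqrt{p_i(\theta)} \right)^2 + h \sum_{i \in \varpi^c} p_i(\theta) \right], \tag{3.8}$$

where $h$ is a real positive number, $\varpi = \{ i : \hat{p}_i \ne 0 \}$ and $\varpi^c = \{ i : \hat{p}_i = 0 \}$.

Note that $h = 1$ gives the ordinary Hellinger distance (Simpson [15]), since an empty cell then contributes $(0 - \sqrt{p_i(\theta)})^2 = p_i(\theta)$. Hence (3.7) can be generalized as follows:

$$\hat\theta_h = \arg\min_{\theta \in \Theta} D_H^h(\hat{P}, P_\theta). \tag{3.9}$$

One motivation for using the penalized Hellinger distance is that a suitable choice of $h$ may lead to an estimate more robust than the MLE.

A model selection criterion can be designed to estimate an expected overall discrepancy, a quantity which reflects the degree of similarity between a fitted approximating model and the generating or true model. Estimation of Kullback's information (see Kullback-Leibler [13]) is the key to deriving the Akaike Information Criterion AIC (Akaike [6]).

Motivated by the above developments, we propose, by analogy with the approach introduced by Vuong [7,8], a new information criterion relating to the φ-divergences. In our test, the null hypothesis is that the competing models are equally close to the data generating process (DGP), where the closeness of a model is measured according to the discrepancy implicit in the penalized Hellinger divergence.

4. Asymptotic Distribution of the Penalized Hellinger Distance

Hereafter, we focus on asymptotic results. We assume that the true parameter $\theta_0$ and the mapping $P : \Theta \to \Omega_m$ satisfy conditions 1-6 of Birch [].

We consider the $m$-vector $P_\theta = (p_1(\theta), \ldots, p_m(\theta))$, the $m \times k$ Jacobian matrix $J_\theta = (J_{jl}(\theta))$ with $J_{jl}(\theta) = \partial p_j(\theta) / \partial \theta_l$, $j = 1, \ldots, m$, $l = 1, \ldots, k$, the $m \times k$ matrix $D_\theta = \mathrm{diag}(P_\theta^{-1/2}) J_\theta$, and the $k \times k$ Fisher information matrix

$$I_\theta = \left( \sum_{j=1}^m \frac{1}{p_j(\theta)} \frac{\partial p_j(\theta)}{\partial \theta_r} \frac{\partial p_j(\theta)}{\partial \theta_s} \right)_{r, s = 1, \ldots, k} = D_\theta^T D_\theta,$$

where $\mathrm{diag}(P_\theta^{-1/2}) = \mathrm{diag}\big( 1/\sqrt{p_1(\theta)}, \ldots, 1/\sqrt{p_m(\theta)} \big)$. These matrices are considered at points $\theta \in \Theta$ where the derivatives exist and all the coordinates $p_j(\theta)$ are positive.

The stochastic convergences of random vectors $X_n$ to a random vector $X$ are denoted by $X_n \xrightarrow{P} X$ and $X_n \xrightarrow{L} X$ (convergence in probability and in law, respectively). Instead of $c_n X_n \xrightarrow{P} 0$ for a sequence of positive numbers $c_n$ we can write $X_n = o_p(c_n^{-1})$; that is, $\lim_{n \to \infty} P(\| c_n X_n \| > \varepsilon) = 0$ for every $\varepsilon > 0$.

An estimator $\hat\theta$ of $\theta$ is consistent if, for every $\theta_0 \in \Theta$, the random vector $(p_1(\hat\theta), \ldots, p_m(\hat\theta))$ tends in probability to $(p_1(\theta_0), \ldots, p_m(\theta_0))$, i.e., if

$$\lim_{n \to \infty} P\big( \| P_{\hat\theta} - P_{\theta_0} \| > \varepsilon \big) = 0 \quad \text{for all } \varepsilon > 0.$$

We need the following result to prove Theorem 4.3.

Proposition 4.1 (Mandal et al. [37]). Let $\varphi \in \Phi$, let $p : \Theta \to \Omega_m$ be twice continuously differentiable in a neighborhood of $\theta_0$, and assume that conditions 1-5 of Section 2 hold. Suppose that $I_{\theta_0}$ is the $k \times k$ Fisher information matrix and that $\hat\theta_h$ satisfies (3.9). Then the limiting distribution of $\sqrt{n}(\hat\theta_h - \theta_0)$ as $n \to +\infty$ is $N(0, I_{\theta_0}^{-1})$.

Lemma 4.2. We have

$$\sqrt{n}\,(\hat{P} - P_{\theta_0}) \xrightarrow{L} N(0, \Sigma_{\theta_0}), \qquad \Sigma_{\theta_0} = \mathrm{diag}(P_{\theta_0}) - P_{\theta_0}^T P_{\theta_0}, \tag{4.1}$$

where $\hat{P} = (\hat{p}_1, \ldots, \hat{p}_m)$ is the estimator of $P = (p_1, \ldots, p_m)$ defined in (2.1).

Proof. With $N_j = \sum_{i=1}^n T_j(X_i)$ as in (2.1), the vector $(N_1, \ldots, N_m)$ is a sum of $n$ independent and identically distributed indicator vectors with mean $P_{\theta_0}$ and covariance matrix $\Sigma_{\theta_0}$. Applying the Central Limit Theorem gives

$$\sqrt{n} \left( \frac{N_1}{n} - p_1, \ldots, \frac{N_m}{n} - p_m \right) \xrightarrow{L} N(0, \Sigma_{\theta_0}).$$

For simplicity, we write $D_H^h(\hat{P}, P_{\hat\theta})$ instead of $D_H^h(\hat{P}, P_{\hat\theta_h})$.

Theorem 4.3. Under the assumptions of Proposition 4.1, we have

$$\sqrt{n} \big( \hat{P} - P_{\theta_0},\ P_{\hat\theta} - P_{\theta_0} \big) \xrightarrow{L} N\!\left( 0, \begin{pmatrix} \Sigma_{\theta_0} & \Sigma_{\theta_0} M_{\theta_0}^T \\ M_{\theta_0} \Sigma_{\theta_0} & M_{\theta_0} \Sigma_{\theta_0} M_{\theta_0}^T \end{pmatrix} \right), \qquad M_{\theta_0} = J_{\theta_0} I_{\theta_0}^{-1} D_{\theta_0}^T\, \mathrm{diag}(P_{\theta_0}^{-1/2}).$$

Proof. A first order Taylor expansion gives

$$P_{\hat\theta} = P_{\theta_0} + (\hat\theta - \theta_0) J_{\theta_0}^T + o(\| \hat\theta - \theta_0 \|). \tag{4.2}$$

In the same way as in Morales et al. [28], it can be established that

$$\hat\theta = \theta_0 + (\hat{P} - P_{\theta_0})\, \mathrm{diag}(P_{\theta_0}^{-1/2})\, D_{\theta_0} I_{\theta_0}^{-1} + o(\| \hat{P} - P_{\theta_0} \|). \tag{4.3}$$

From (4.2) and (4.3) we obtain

$$P_{\hat\theta} = P_{\theta_0} + (\hat{P} - P_{\theta_0})\, M_{\theta_0}^T + o(\| \hat{P} - P_{\theta_0} \|).$$

Therefore the random vectors $\sqrt{n}(\hat{P} - P_{\theta_0}, P_{\hat\theta} - P_{\theta_0})$ and $\sqrt{n}(\hat{P} - P_{\theta_0}) (I, M_{\theta_0}^T)$, where $I$ is the $m \times m$ unit matrix, have the same asymptotic distribution. Furthermore, by the Central Limit Theorem, $\sqrt{n}(\hat{P} - P_{\theta_0}) \xrightarrow{L} N(0, \Sigma_{\theta_0})$, which implies

$$\sqrt{n} \big( \hat{P} - P_{\theta_0},\ P_{\hat\theta} - P_{\theta_0} \big) \xrightarrow{L} N\!\left( 0, \begin{pmatrix} \Sigma_{\theta_0} & \Sigma_{\theta_0} M_{\theta_0}^T \\ M_{\theta_0} \Sigma_{\theta_0} & M_{\theta_0} \Sigma_{\theta_0} M_{\theta_0}^T \end{pmatrix} \right). \tag{4.4}$$

The case of interest to us here is to test the hypothesis $H_0 : P \in \mathcal{P}_\Theta$. Our proposal is based on the penalized divergence test statistic $2n\, D_H^h(\hat{P}, P_{\hat\theta_h})$, where $\hat{P}$ and $\hat\theta_h$ have been introduced in (2.1) and (3.9) respectively.

Using arguments similar to those developed by Basu [17], under the assumptions of Theorem 4.3 and the hypothesis $H_0 : P = P_{\theta_0}$, the asymptotic distribution of $2n\, D_H^h(\hat{P}, P_{\hat\theta_h})$ is chi-square with $m - k - 1$ degrees of freedom when $h = 1$. Since the other members of the penalized Hellinger distance family of tests differ from the ordinary Hellinger distance test only at the empty cells, they too have the same asymptotic distribution.

Consider now the case where the model is wrong, i.e., $H_1 : P \notin \mathcal{P}_\Theta$. We introduce the following regularity assumptions.

(A1) There exists $\theta^* = \arg\inf_{\theta \in \Theta} D_H^h(P, P_\theta)$ such that $\hat\theta_h \to \theta^*$ almost surely as $n \to +\infty$.

(A2) There exists $\Lambda^* = \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix}$, with $\Lambda_{11} = \mathrm{diag}(P) - P^T P$ as in (4.1) and $\Lambda_{12} = \Lambda_{21}^T$, such that

$$\sqrt{n} \big( \hat{P} - P,\ \hat\theta_h - \theta^* \big) \xrightarrow{L} N(0, \Lambda^*).$$

Theorem 4.4. Under $H_1 : P \ne P_\theta$, and assuming that conditions (A1) and (A2) hold, we have

$$\sqrt{n} \left( D_H^h(\hat{P}, P_{\hat\theta_h}) - D_H^h(P, P_{\theta^*}) \right) \xrightarrow{L} N(0, \Gamma_{h,P}^2),$$

where

$$\Gamma_{h,P}^2 = H_h \Lambda_{11} H_h^T + H_h \Lambda_{12} J_h^T + J_h \Lambda_{21} H_h^T + J_h \Lambda_{22} J_h^T, \tag{4.5}$$

with $H_h = \left( \partial D_H^h / \partial p_1, \ldots, \partial D_H^h / \partial p_m \right)$ and $J_h = \left( \partial D_H^h / \partial \theta_1, \ldots, \partial D_H^h / \partial \theta_k \right)$, both evaluated at $(P, P_{\theta^*})$.

Proof. A first order Taylor expansion gives

$$D_H^h(\hat{P}, P_{\hat\theta_h}) = D_H^h(P, P_{\theta^*}) + (\hat{P} - P) H_h^T + (\hat\theta_h - \theta^*) J_h^T + o\big( \| \hat{P} - P \| + \| \hat\theta_h - \theta^* \| \big). \tag{4.6}$$

From assumptions (A1) and (A2), the result follows.

5. Applications for Testing Hypotheses

The estimate $D_H^h(\hat{P}, P_{\hat\theta_h})$ can be used to perform statistical tests.

5.1. Test of Goodness-of-Fit

For completeness, we look at $D_H^h(\hat{P}, P_{\hat\theta_h})$ in the usual way, i.e., as a goodness-of-fit statistic. Recall that here $\hat\theta_h$ is the minimum penalized Hellinger distance estimator of $\theta$. Since $D_H^h(\hat{P}, P_{\hat\theta_h})$ is a consistent estimator of $D_H^h(P, P_{\theta^*})$, the null hypothesis when using the statistic $2n\, D_H^h(\hat{P}, P_{\hat\theta_h})$ is

$$H_0 : D_H^h(P, P_{\theta^*}) = 0, \quad \text{or equivalently} \quad H_0 : P = P_{\theta^*}.$$

Hence, if $H_0$ is rejected, one can infer that the parametric model is misspecified. Since $D_H^h(P, P_{\theta^*})$ is non-negative and takes the value zero only when $P = P_{\theta^*}$, the tests are defined through the critical region

$$C = \left\{ 2n\, D_H^h(\hat{P}, P_{\hat\theta_h}) > q_{\alpha, k} \right\},$$

where $q_{\alpha,k}$ is the $(1 - \alpha)$-quantile of the $\chi^2$-distribution with $m - k - 1$ degrees of freedom.

Remark 5.1. Theorem 4.4 can be used to give the following approximation to the power of the test of $H_0 : D_H^h(P, P_{\theta^*}) = 0$. The approximated power function is

$$\beta_n(P) = P\left( 2n\, D_H^h(\hat{P}, P_{\hat\theta_h}) > q_{\alpha,k} \right) \approx 1 - F_n\!\left( \frac{\sqrt{n}}{\Gamma_{h,P}} \left( \frac{q_{\alpha,k}}{2n} - D_H^h(P, P_{\theta^*}) \right) \right), \tag{5.7}$$

where $q_{\alpha,k}$ is the $(1 - \alpha)$-quantile of the $\chi^2$-distribution with $m - k - 1$ degrees of freedom and $F_n$ is a sequence of distribution functions tending uniformly to the standard normal distribution function $\Phi(x)$.

Note that if $H_1 : D_H^h(P, P_{\theta^*}) > 0$ holds, then for any fixed size $\alpha$ the probability of rejecting $H_0 : D_H^h(P, P_{\theta^*}) = 0$ with the rejection rule $2n\, D_H^h(\hat{P}, P_{\hat\theta_h}) > q_{\alpha,k}$ tends to one as $n \to \infty$.

Obtaining the approximate sample size $n$ guaranteeing a power $\beta$ for a given alternative $P$ is an interesting application of Formula (5.7). If we wish the power to be equal to $\beta^*$, we must solve the equation

$$\beta^* = 1 - \Phi\!\left( \frac{\sqrt{n}}{\Gamma_{h,P}} \left( \frac{q_{\alpha,k}}{2n} - D_H^h(P, P_{\theta^*}) \right) \right).$$

Writing $D = D_H^h(P, P_{\theta^*})$, squaring shows that the sample size $n^*$ is a solution of the quadratic equation

$$D^2 n^2 - (a + b)\, n + \frac{q_{\alpha,k}^2}{4} = 0, \qquad a = \Gamma_{h,P}^2 \left( \Phi^{-1}(1 - \beta^*) \right)^2, \quad b = q_{\alpha,k}\, D,$$

whose roots are $\big( a + b \pm \sqrt{a(a + 2b)} \big) / (2 D^2)$. For $\beta^* > 1/2$ the relevant root is the larger one,

$$n^* = \frac{a + b + \sqrt{a(a + 2b)}}{2 D^2},$$

and the required size is $n_0 = [n^*] + 1$, where $[\cdot]$ denotes the integer part.

5.2. Test for Model Selection

As mentioned above, when one chooses a particular φ-divergence type statistic $D_H^h(\hat{P}, P_{\hat\theta_h})$, with $\hat\theta_h$ the corresponding minimum penalized Hellinger distance estimator of $\theta$, one actually evaluates the goodness-of-fit of the parametric model $\mathcal{P}_\Theta$ according to the discrepancy $D_H^h(P, P_{\theta^*})$ between the true distribution $P$ and the specified model. Thus it is natural to define the best model among a collection of competing models to be the model that is closest to the true distribution according to the discrepancy $D_H^h$.

In this paper we consider the problem of selecting between two models. Let $\mathcal{G}_\Gamma = \{ G_\gamma = G(\cdot\,; \gamma) : \gamma \in \Gamma \}$ be another model, where $\Gamma$ is a $q$-dimensional parameter space. In a similar way, we can define the minimum penalized Hellinger distance estimator $\hat\gamma_h$ of $\gamma$ and the corresponding discrepancy $D_H^h(P, G_{\gamma^*})$ for the model $\mathcal{G}_\Gamma$.

Our special interest is the situation in which a researcher has two competing parametric models $\mathcal{P}_\Theta$ and $\mathcal{G}_\Gamma$ and wishes to select the better of the two based on the discrimination statistics between the observations and the two models, defined respectively by $D_H^h(\hat{P}, P_{\hat\theta_h})$ and $D_H^h(\hat{P}, G_{\hat\gamma_h})$.

Consider the two competing parametric models $\mathcal{P}_\Theta$ and $\mathcal{G}_\Gamma$ with the given discrepancy $D_H^h$.

Definition 5.2.
$H^{eq} : D_H^h(P, P_{\theta^*}) = D_H^h(P, G_{\gamma^*})$ means that the two models are equivalent;
$H^{P} : D_H^h(P, P_{\theta^*}) < D_H^h(P, G_{\gamma^*})$ means that $P_{\theta^*}$ is better than $G_{\gamma^*}$;
$H^{G} : D_H^h(P, P_{\theta^*}) > D_H^h(P, G_{\gamma^*})$ means that $P_{\theta^*}$ is worse than $G_{\gamma^*}$.

Remark 5.3. 1) The definition does not require that the same divergence type statistic be used in forming $D_H^h(\hat{P}, P_{\hat\theta_h})$ and $D_H^h(\hat{P}, G_{\hat\gamma_h})$. Choosing different discrepancies for evaluating competing models is, however, hardly justified. 2) The definition does not require that either of the competing models be correctly specified. On the other hand, a correctly specified model must be at least as good as any other model.

The indicator $D_H^h(P, P_{\theta^*}) - D_H^h(P, G_{\gamma^*})$ is unknown but, from the previous section, it can be estimated by the difference $D_H^h(\hat{P}, P_{\hat\theta_h}) - D_H^h(\hat{P}, G_{\hat\gamma_h})$. This difference converges to zero under the null hypothesis $H^{eq}$, but converges to a strictly negative or positive constant when $H^{P}$ or $H^{G}$ holds. These properties justify the use of $D_H^h(\hat{P}, P_{\hat\theta_h}) - D_H^h(\hat{P}, G_{\hat\gamma_h})$ as a model selection indicator, and the common procedure of selecting the model with the highest goodness-of-fit.

As argued in the introduction, however, it is important to take into account the random nature of this difference so as to assess its significance. To do so we consider the asymptotic distribution of $\sqrt{n} \big( D_H^h(\hat{P}, P_{\hat\theta_h}) - D_H^h(\hat{P}, G_{\hat\gamma_h}) \big)$ under $H^{eq}$.

Our major task is to propose some tests for model selection, i.e., for the null hypothesis $H^{eq}$ against the alternatives $H^{P}$ or $H^{G}$. We use the next lemma with $\hat\theta_h$ and $\hat\gamma_h$ as the corresponding minimum penalized Hellinger distance estimators of $\theta^*$ and $\gamma^*$.

Using $P_{\theta^*}$ and $G_{\gamma^*}$ defined earlier, we consider the vectors $K = (k_1, \ldots, k_m)$ and $Q = (q_1, \ldots, q_m)$, where

$$k_i = \frac{\partial D_H^h(P, P_{\theta^*})}{\partial p_i}, \qquad q_i = \frac{\partial D_H^h(P, G_{\gamma^*})}{\partial p_i}, \qquad i = 1, \ldots, m.$$

Lemma 5.4. Under the assumptions of Theorem 4.4, we have

(i) for the model $P_\theta$: $\ D_H^h(\hat{P}, P_{\hat\theta_h}) = D_H^h(P, P_{\theta^*}) + (\hat{P} - P) K^T + o_p(n^{-1/2})$;

(ii) for the model $G_\gamma$: $\ D_H^h(\hat{P}, G_{\hat\gamma_h}) = D_H^h(P, G_{\gamma^*}) + (\hat{P} - P) Q^T + o_p(n^{-1/2})$.

Proof. The results follow from a first order Taylor expansion, using that $\theta^*$ and $\gamma^*$ minimize the respective discrepancies.

We define

$$\Gamma^2 = (K - Q)\, \Lambda_{11}\, (K - Q)^T,$$

which is the asymptotic variance of $\sqrt{n}\, (\hat{P} - P)(K - Q)^T$. Since $K$, $Q$ and $\Lambda_{11}$ are consistently estimated by their sample analogues $\hat{K}$, $\hat{Q}$ and $\hat\Lambda_{11}$, $\Gamma^2$ is consistently estimated by

$$\hat\Gamma^2 = (\hat{K} - \hat{Q})\, \hat\Lambda_{11}\, (\hat{K} - \hat{Q})^T.$$

Next we define the model selection statistic and give its asymptotic distribution under the null and alternative hypotheses. Let

$$HI_h = \frac{\sqrt{n}}{\hat\Gamma} \left( D_H^h(\hat{P}, P_{\hat\theta_h}) - D_H^h(\hat{P}, G_{\hat\gamma_h}) \right),$$

where $HI_h$ stands for the penalized Hellinger Indicator.

The following theorem provides the limit distribution of $HI_h$ under the null and alternative hypotheses.

Theorem 5.5. Under the assumptions of Theorem 4.4, suppose that $\Gamma \ne 0$. Then
1) under the null hypothesis $H^{eq}$, $HI_h \xrightarrow{L} N(0, 1)$;
2) under the hypothesis $H^{P}$, $HI_h \to -\infty$ in probability;
3) under the hypothesis $H^{G}$, $HI_h \to +\infty$ in probability.

Proof. From Lemma 5.4, it follows that

$$D_H^h(\hat{P}, P_{\hat\theta_h}) - D_H^h(\hat{P}, G_{\hat\gamma_h}) = D_H^h(P, P_{\theta^*}) - D_H^h(P, G_{\gamma^*}) + (\hat{P} - P)(K - Q)^T + o_p(n^{-1/2}).$$

Under $H^{eq}$ the first difference on the right-hand side vanishes, and we get

$$D_H^h(\hat{P}, P_{\hat\theta_h}) - D_H^h(\hat{P}, G_{\hat\gamma_h}) = (\hat{P} - P)(K - Q)^T + o_p(n^{-1/2}).$$

Finally, applying the Central Limit Theorem and assumptions (A1) and (A2), we immediately obtain $HI_h \xrightarrow{L} N(0, 1)$.

6. Computational Results

6.1. Example

To illustrate the model selection procedure discussed in the preceding section, we consider an example. We need to define the competing models, the estimation method used for each competing model, and the penalized Hellinger type statistic used to measure the departure of each proposed parametric model from the true data generating process.

For our competing models, we consider the problem of choosing between the family of Poisson distributions and the family of Geometric distributions. The Poisson distribution $P(\lambda)$ is parameterized by $\lambda$ and has density

$$f(x, \lambda) = \frac{\exp(-\lambda)\, \lambda^x}{x!} \quad \text{for } x \in \mathbb{N},$$

and zero otherwise. The Geometric distribution $G(p)$ is parameterized by $p$ and has density

$$G(x, p) = p\,(1 - p)^{x - 1} \quad \text{for } x \in \mathbb{N}^*,$$

and zero otherwise.

We use the minimum penalized Hellinger distance statistic to evaluate the discrepancy of the proposed model from the true data generating process. We partition the real line into $m$ intervals $\{ [C_{i-1}, C_i),\ i = 1, \ldots, m \}$, where $C_0 = -\infty$ and $C_m = +\infty$. The choice of the cells is discussed below. The corresponding minimum penalized Hellinger distance estimators of $\lambda$ and $p$ are

$$\hat\lambda_h = \arg\min_{\lambda} D_H^h(\hat{P}, P_\lambda) = \arg\min_{\lambda}\ 2 \left[ \sum_{i \in \varpi} \left( \sqrt{\hat{p}_i} - \sqrt{p_i(\lambda)} \right)^2 + h \sum_{i \in \varpi^c} p_i(\lambda) \right],$$

$$\hat{p}_h = \arg\min_{p} D_H^h(\hat{P}, G_p) = \arg\min_{p}\ 2 \left[ \sum_{i \in \varpi} \left( \sqrt{\hat{p}_i} - \sqrt{p_i(p)} \right)^2 + h \sum_{i \in \varpi^c} p_i(p) \right],$$

where $p_i(\lambda)$ and $p_i(p)$ are the probabilities of the cell $[C_{i-1}, C_i)$ under the Poisson and Geometric distributions respectively.

We consider various sets of experiments in which data are generated from a mixture of a Poisson and a Geometric distribution. These two distributions are calibrated so that their means are close (4 and 5 respectively). Hence the DGP (Data Generating Process) is generated from $M(\pi)$ with density

$$m(\pi) = \pi\, \mathrm{Pois}(4) + (1 - \pi)\, \mathrm{Geo}(0.2),$$

where $\pi \in [0, 1]$ is a value specific to each set of experiments. In each set of experiments several random samples are drawn from this mixture of distributions. The sample size $n$ takes five values, and for each sample size the experiment is replicated a fixed number of times. In each set of experiments, we use two values of the parameter, $h = 1$ and $h = 1/2$, where $h = 1$ corresponds to the classical Hellinger distance. The aim is to compare the accuracy of the model selection depending on the parameter setting chosen.

To obtain a good fit by the proposed method, we note that, for the chosen parameters of these two distributions, most of the mass is concentrated on small values. Therefore, the chosen partition has eight cells, with $[C_{i-1}, C_i) = [i - 1, i)$ for $i = 1, \ldots, 7$ and $[C_7, C_8) = [7, +\infty)$ as the last cell.

We choose different values of $\pi$, namely 0.00, 0.25, 0.535, 0.75 and 1.00. Although our proposed model selection procedure does not require that the data generating process belong to either of the competing models, we consider the two limiting cases $\pi = 1$ and $\pi = 0$, since they correspond to the correctly specified cases. To investigate the case where both competing models are misspecified but not at equal distance from the DGP, we consider the cases $\pi = 0.75$ and $\pi = 0.25$. The former corresponds to a DGP which is Poisson but slightly contaminated by a Geometric distribution; the second is interpreted similarly as a Geometric slightly contaminated by a Poisson distribution. In the last case, $\pi = 0.535$ is the value for which the Poisson family, with discrepancy $D_H^h(P, P_{\lambda^*})$, and the Geometric family, with discrepancy $D_H^h(P, G_{p^*})$, are approximately at equal distance from the mixture $m(\pi)$ according to the penalized Hellinger distance with the above cells. Thus this set of experiments corresponds approximately to the null hypothesis of our proposed model selection test.

The results of our different sets of experiments are presented in Tables 1-5. The first half of each table gives the average values of the minimum penalized Hellinger distance estimators $\hat\lambda_h$ and $\hat{p}_h$, the penalized Hellinger goodness-of-fit statistics $2n\, D_H^h(\hat{P}, P_{\hat\lambda_h})$ and $2n\, D_H^h(\hat{P}, G_{\hat{p}_h})$, and the Hellinger indicator statistics $HI_h$. The values in parentheses are standard errors. The second half of each table gives, in percentages, the number of times our proposed model selection procedure based on $HI_h$ favors the Poisson model, favors the Geometric model, or is indecisive. The tests are conducted at the 5% nominal significance level. In the first two sets of experiments ($\pi = 1$ and $\pi = 0$), where one model is correctly specified, we use the labels "correct", "incorrect" and "indecisive" when a choice is made.

The first halves of Tables 1-5 confirm our asymptotic results. They all show that the minimum penalized Hellinger estimators $\hat\lambda_h$ and $\hat{p}_h$ converge to their pseudo-true values in the misspecified cases, and to their true values in the correctly specified cases, as the sample size increases. The statistic $HI_h$ diverges to $-\infty$ or $+\infty$ at the approximate rate of $\sqrt{n}$, except in Table 5. In the latter case the $HI_h$ statistic converges, as expected, to zero, which is the mean of the asymptotic $N(0, 1)$ distribution under our null hypothesis of equivalence.

With the exception of Tables 1 and 2, we observed a large percentage of incorrect decisions. This is because both models are now incorrectly specified. In contrast, turning to the second halves of Tables 1 and 2, we first note that the percentage of correct choices using the $HI_h$ statistic steadily increases and ultimately converges to one.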
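The experimental design described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: it assumes the eight cells [0,1), ..., [6,7), [7, +inf), a Geometric support starting at 1, the mixture weights as reconstructed in the text, and a plain grid search in place of whatever optimizer was actually used. All function names (`poisson_cells`, `geometric_cells`, `penalized_hellinger`, `sample_mixture`, `min_phd_estimate`) are hypothetical.

```python
import math
import random

N_CELLS = 8  # assumed cells: [0,1), ..., [6,7) and a last open cell [7, +inf)


def poisson_cells(lam):
    """Cell probabilities of Pois(lam) on the eight assumed cells."""
    probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(N_CELLS - 1)]
    return probs + [max(0.0, 1.0 - sum(probs))]


def geometric_cells(p):
    """Cell probabilities of Geo(p) with support 1, 2, ... on the same cells."""
    probs = [0.0] + [p * (1.0 - p) ** (x - 1) for x in range(1, N_CELLS - 1)]
    return probs + [max(0.0, 1.0 - sum(probs))]


def penalized_hellinger(freq, probs, h):
    """Penalized Hellinger distance: empty cells contribute h * p_i(theta)."""
    total = 0.0
    for d, p in zip(freq, probs):
        total += (math.sqrt(d) - math.sqrt(p)) ** 2 if d > 0 else h * p
    return 2.0 * total


def sample_mixture(n, pi, rng, lam=4.0, p=0.2):
    """Draw n values from pi * Pois(lam) + (1 - pi) * Geo(p)."""
    sample = []
    for _ in range(n):
        if rng.random() < pi:
            # Knuth's multiplication method for a Poisson draw
            limit, k, prod = math.exp(-lam), 0, rng.random()
            while prod > limit:
                k += 1
                prod *= rng.random()
            sample.append(k)
        else:
            # inverse-transform draw from Geo(p) on 1, 2, ...
            sample.append(int(math.log(1.0 - rng.random()) / math.log(1.0 - p)) + 1)
    return sample


def cell_frequencies(sample):
    """Observed relative frequencies on the eight cells."""
    counts = [0] * N_CELLS
    for x in sample:
        counts[min(x, N_CELLS - 1)] += 1
    return [c / len(sample) for c in counts]


def min_phd_estimate(freq, cell_probs, grid, h):
    """Grid-search minimizer of the penalized Hellinger distance."""
    return min(grid, key=lambda t: penalized_hellinger(freq, cell_probs(t), h))


rng = random.Random(0)
freq = cell_frequencies(sample_mixture(50, 0.535, rng))
lam_grid = [i / 100 for i in range(50, 1001)]
p_grid = [i / 1000 for i in range(10, 991)]
for h in (1.0, 0.5):
    lam_hat = min_phd_estimate(freq, poisson_cells, lam_grid, h)
    p_hat = min_phd_estimate(freq, geometric_cells, p_grid, h)
    print(h, lam_hat, p_hat)
```

With `h = 1` the empty-cell term equals the ordinary Hellinger contribution, so the two values of `h` differ only when some cell is empty in the sample, which is exactly the small-sample situation the penalty targets.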

NGOM, B NE 377 able DG = os(4) 3 4 5 3 (3) 95(3) 97() 5() () 395(4) 49(4) 45(3) 45(8) 45(3) D (os) = 33(7) 8(5) 59(3) 4(3) 37() = / 96(4) 64(3) 48() 34() 3() D (Geo) = 39(8) 348() 8(9) 8() 7(5) = 78(7) 6(8) 4(6) 36(6) 3(3) = Correct Idecsve Icorrect 367(4) 43(69) 434(38) 483(5) 497(8) 77% 3% 87% 3% = 36(33) 398(48) 373(9) 46(35) 45(87) Correct Idecsve Icorrect 7% 3% 79% % 9% 8% 83% 7% 96% 4% 86% 7% 93% 7% able DG = Geo() 3 4 5 3 96(4) 3(3) 3() 3() () 39() 46(89) 49(67) 49(58) 435(34) D (os) = 356(4) 39() 7(9) 53(8) 44(7) = 5 8() 73(7) 54(7) 46(7) 37() D (Geo) = 5(6) 89(5) 53(3) 39() 33() = 3(4) 67(3) 44() 35() 7(98) = 88(43) 56(37) 3(5) 334(4) 34(3) Correct 36% 6% 77% 84% 9% Idecsve 64% 38% 3% 6% 8% Icorrect = 7(7) 6(5) 76(96) 3(65) 49(3) Correct Idecsve Icorrect 36% 64% 6% 38% 77% 3% 84% 6% 9% 8% able 3 DG = 75 Geo() + 5 os(4) 3 4 5 3 3(3) 97() 8(8) (5) () 46(7) 39(55) 48(55) 397(43) 4() D (os) = 546(3) 47() 4(9) 4(8) 367(6) = 344(7) 34(5) 3(5) 3(5) 34(3) D (Geo) = 5(6) 89(5) 53(3) 39() 33() = 367(6) 43(53) 434(47) 483(7) 537() = () 8(89) 8() 37(99) 3(84) Geo 3% 4% 5% 64% 8% Idecsve 77% 6% 5% 36% 9% os = 84(9) 83(7) 845(6) 967(5) 3(78) Geo Idecsve os 7% 8% 3% 5% 83% % 9% 89% % % 77% % 33% 66% % Copyrgt ScRes

378 P. NGOM, B. NTEP

Table 4. DGP = 0.75 Pois(4) + 0.25 Geo()
n: 3 4 5 3
3(3) (3) () 6() 3()
4(43) 49(3) 397(8) 4(6) 49(7)
D(Pois) = 779(45) 634(3) 65(8) 57(4) 5()
= 443(4) 473() 5() 5(8) 483(4)
D(Geo) = 55(35) 87(5) 53(3) 39() 33()
= 64(5) 66(5) 7(4) 69(3) 63()
= 4(7) 44() 49(8) 77() 89(9)
Geo, Indecisive, Pois: 38% 6% 37% 63% 3% 68% 7% 83% % 79%
= 8(37) 37(33) 3(6) 66(8) 83(6)
Geo, Indecisive, Pois: 48% 5% 45% 55% 46% 54% 3% 7% 4% 76%

Table 5. DGP = 0.535 Pois(4) + 0.465 Geo()
n: 3 4 5 3
96(6) 4(5) (3) 3(7) 4()
3968(6) 396(46) 398(374) 43(39) 4()
D(Pois) = 869(63) 6(46) 58(36) 55(38) 3(5)
= 633(3) 49(8) 369(7) 3(6) 4(7)
D(Geo) = 867(5) 68(37) 553(3) 495(6) 37()
= 57() () 63() 87(9) 37()
= 79(4) 38(5) 8(99) 334() 44(67)
Geo: 3% 4% 5% % 3%
Indecisive: 9% 9% 93% 88% 88%
Pois: 5% 4% % % %
= 86(4) 48(64) 378(9) 45(86) 67(73)
Geo, Indecisive, Pois: 5% 9% 3% 6% 9% 4% 4% 95% % 9% 9% % % 88% %

The preceding comments for the second halves of Tables 1 and 2 also apply to the second halves of Tables 3 and 4. In all of Tables 1-4, the results confirm that, in small samples and in terms of percentages of correct decisions, the model selection procedure based on the penalized Hellinger statistic test (h = 1/2) dominates the one corresponding to the classical Hellinger statistic test (h = 1). Table 5 also confirms our asymptotic results: as the sample size increases, the percentage of rejection of both models converges, as it should, to its limiting value.

In Figures 1, 3, 5, 7 and 9 we plot the histograms of the datasets and overlay the curves for the Geometric and Poisson distributions. When the DGP is correctly specified (Figure 1), the Poisson distribution has a reasonable chance of being distinguished from the Geometric distribution. Similarly, in Figure 3, as can be seen, the Geometric distribution closely approximates the data sets. In Figures 5 and 7 the two distributions are close, but the Geometric (Figure 5) and the Poisson (Figure 7) distributions do appear to be much closer to the data sets. When the Poisson weight is 0.535 (Figure 9), the Poisson and Geometric distributions are similar, while being slightly symmetrical about the axis that passes through the mode of the data distribution. This follows from the fact that these two distributions are equidistant from the DGP and would be difficult to distinguish from data in practice.

The preceding results in the tables and Theorem (5.5) confirm, in Figures 2, 4, 6 and 8, that the Hellinger indicator for the model selection procedure based on the penalized Hellinger divergence statistic with h = 0.5 (light bars) dominates the procedure obtained with h = 1 (dark bars). Figure 10 allows a comparison with the asymptotic N(0, 1) approximation under our null hypothesis of equivalence. Hence the indicator with h = 1/2, based on the penalized Hellinger distance, is closer to the mean of N(0, 1) than is the indicator corresponding to the ordinary Hellinger distance. As expected, our divergence statistic diverges to −∞ (Figures 2 and 8) and to +∞ (Figures 4 and 8) more rapidly when we use the penalized Hellinger distance test than when we use the classical Hellinger distance test.

Figure 1. Histogram of DGP = Pois(4) with n = 50.
Figure 2. Comparative barplot of the Hellinger indicator depending on n.
Figure 3. Histogram of DGP = Geo() with n = 50.
Figure 4. Comparative barplot of the Hellinger indicator depending on n.
Figure 5. Histogram of DGP = 0.75 Geo + 0.25 Pois with n = 50.

Copyright © SciRes.
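The comparison discussed above can be illustrated with a minimal sketch. It assumes the usual form of the penalized Hellinger distance between empirical frequencies d and a model pmf f, namely 2[Σ over observed cells of (√d(x) − √f(x))² + h Σ over empty cells of f(x)], where h = 1 gives the ordinary Hellinger distance up to the constant factor. The function names, the grid-search fit, the Geometric support {0, 1, 2, ...} and the Geometric parameter 0.2 are illustrative assumptions, not values or methods taken from the paper, and the sketch only compares fitted distances rather than computing the paper's test statistic.

```python
import math
import random
from collections import Counter


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson(lam) variable, computed in log space."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def geometric_pmf(x, p):
    """P(X = x) for a geometric variable on {0, 1, 2, ...} (assumed support)."""
    return p * (1.0 - p) ** x


def penalized_hellinger(counts, n, pmf, h):
    """Penalized Hellinger distance between empirical frequencies and pmf.

    Observed cells contribute (sqrt(d) - sqrt(f))^2; empty cells contribute
    h * f(x), obtained as h * (1 - model mass on the observed cells).
    """
    observed_part = 0.0
    mass_on_observed = 0.0
    for x, c in counts.items():
        d = c / n
        f = pmf(x)
        observed_part += (math.sqrt(d) - math.sqrt(f)) ** 2
        mass_on_observed += f
    empty_part = max(0.0, 1.0 - mass_on_observed)
    return 2.0 * (observed_part + h * empty_part)


def min_phd_fit(counts, n, family, grid, h):
    """Grid-search the parameter that minimizes the penalized distance."""
    def dist(theta):
        return penalized_hellinger(counts, n, lambda x: family(x, theta), h)
    best = min(grid, key=dist)
    return best, dist(best)


def sample_mixture(n, weight_pois, lam, p, rng):
    """Draw n values from weight_pois * Pois(lam) + (1 - weight_pois) * Geo(p)."""
    out = []
    for _ in range(n):
        if rng.random() < weight_pois:
            # Knuth's multiplication method for a Poisson draw.
            k, prod, limit = 0, rng.random(), math.exp(-lam)
            while prod > limit:
                k += 1
                prod *= rng.random()
        else:
            # Inverse-transform draw for a geometric variable on {0, 1, ...}.
            u = 1.0 - rng.random()  # u lies in (0, 1]
            k = int(math.log(u) / math.log(1.0 - p))
        out.append(k)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n = 50
    # Geometric parameter 0.2 is a hypothetical choice for this sketch.
    counts = Counter(sample_mixture(n, 0.75, 4.0, 0.2, rng))
    lam_grid = [0.1 * k for k in range(1, 151)]
    p_grid = [0.01 * k for k in range(1, 100)]
    for h in (0.5, 1.0):
        lam_hat, d_pois = min_phd_fit(counts, n, poisson_pmf, lam_grid, h)
        p_hat, d_geo = min_phd_fit(counts, n, geometric_pmf, p_grid, h)
        closer = "Pois" if d_pois < d_geo else "Geo"
        print(f"h={h}: D(Pois)={d_pois:.4f} D(Geo)={d_geo:.4f} closer={closer}")
```

Because empty cells are weighted by h, the h = 1/2 distance is never larger than the h = 1 distance for the same fitted model; with small n many cells are empty, which is where the two criteria differ most.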

Figure 6. Comparative barplot of the Hellinger indicator depending on n.
Figure 7. Histogram of DGP = 0.25 Geo + 0.75 Pois with n = 50.
Figure 8. Comparative barplot of the Hellinger indicator depending on n.
Figure 9. Histogram of DGP = 0.465 Geo + 0.535 Pois with n = 50.
Figure 10. Comparative barplot of the Hellinger indicator depending on n.

7. Conclusion

In this paper we investigated the problems of model selection using divergence type statistics. Specifically, we proposed some asymptotically standard normal and chi-square tests for model selection based on divergence type statistics that use the corresponding minimum penalized Hellinger estimator. Our tests are based on testing whether the competing models are equally close to the true distribution against the alternative hypotheses that one model is closer than the other, where closeness of a model is measured according to the discrepancy implicit in the divergence type statistics used. The penalized Hellinger divergence criterion outperforms classical criteria for model selection based on the ordinary Hellinger distance, especially in small samples; the difference is expected to be minimal for large sample sizes.

Our work can be extended in several directions. One extension is to use random instead of fixed cells. Random cells arise when the boundaries of each cell depend on some unknown parameter vector, which is estimated. For various examples, see e.g., Andrews [5]. For instance, with appropriate random cells, the asymptotic distribution of a Pearson type statistic may become independent of the true parameter θ0 under correct specification. In view of this latter result, it is expected that our model selection test based on penalized Hellinger divergence measures will remain asymptotically normally or chi-square distributed.

8. Acknowledgements

This research was supported, in part, by grants from AIMS (African Institute for Mathematical Sciences), 6 Melrose Road, Muizenberg-Cape Town 7945, South Africa.

REFERENCES

[1] W. G. Cochran, "The χ² Test of Goodness of Fit," The Annals of Mathematical Statistics, Vol. 23, No. 3, 1952, pp. 315-345. doi:10.1214/aoms/1177729380
[2] G. S. Watson, "On the Construction of Significance Tests on the Circle and the Sphere," Biometrika, Vol. 43, No. 3-4, 1956, pp. 344-352. doi:10.2307/2332913
[3] D. S. Moore, "Chi-Square Tests in Studies in Statistics," 1978.
[4] D. S. Moore, "Tests of Chi-Squared Type in Goodness of Fit Techniques," 1986.
[5] D. W. K. Andrews, "Chi-Square Diagnostic Tests for Econometric Models: Theory," Econometrica, Vol. 56, No. 6, 1988, pp. 1419-1453. doi:10.2307/1913105
[6] H. Akaike, "Information Theory and Extension of the Likelihood Ratio Principle," Proceedings of the Second International Symposium of Information Theory, 1973, pp. 257-281.
[7] Q. H. Vuong, "Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses," Econometrica, Vol. 57, No. 2, 1989, pp. 307-333. doi:10.2307/1912557
[8] Q. H. Vuong and W. Wang, "Minimum Chi-Square Estimation and Tests for Model Selection," Journal of Econometrics, Vol. 57, No. 1-2, 1993, pp. 141-168. doi:10.1016/0304-4076(93)90104-D
[9] P. Ngom, "Selected Estimated Models with φ-Divergence Statistics," Global Journal of Pure and Applied Mathematics, Vol. 3, No. 1, 2007, pp. 47-61.
[10] A. Diédhiou and P. Ngom, "Cutoff Time Based on Generalized Divergence Measure," Statistics and Probability Letters, Vol. 79, No. 10, 2009, pp. 1343-1350. doi:10.1016/j.spl.2009.02.006
[11] D. R. Cox, "Tests of Separate Families of Hypotheses," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Los Angeles, 20-30 June 1960, pp. 105-123.
[12] H. Akaike, "A New Look at the Statistical Model Identification," IEEE Transaction on Information Theory, Vol. 19, No. 6, 1974, pp. 716-723.
[13] S. Kullback and R. A. Leibler, "On Information and Sufficiency," The Annals of Mathematical Statistics, Vol. 22, No. 1, 1951, pp. 79-86. doi:10.1214/aoms/1177729694
[14] R. J. Beran, "Minimum Hellinger Distance Estimates for Parametric Models," The Annals of Mathematical Statistics, Vol. 5, No. 3, 1977, pp. 445-463.
[15] D. G. Simpson, "Hellinger Deviance Test: Efficiency, Breakdown Points and Examples," Journal of American Statistical Association, Vol. 84, No. 405, 1989, pp. 107-113. doi:10.1080/01621459.1989.10478744
[16] B. G. Lindsay, "Efficiency versus Robustness: The Case for Minimum Distance Hellinger Distance and Related Methods," Annals of Statistics, Vol. 22, No. 2, 1994, pp. 1081-1114. doi:10.1214/aos/1176325512
[17] A. Basu and B. G. Lindsay, "Minimum Disparity Estimation for Continuous Models: Efficiency, Distributions and Robustness," The Annals of Mathematical Statistics, Vol. 46, No. 4, 1994, pp. 683-705. doi:10.1007/BF00773476
[18] A. Basu, I. R. Harris and S. Basu, "Tests of Hypotheses in Discrete Models Based on the Penalized Hellinger Distance," Statistics and Probability Letters, Vol. 27, No. 4, 1996, pp. 367-373. doi:10.1016/0167-7152(95)00101-8
[19] A. Basu and S. Basu, "Penalized Minimum Disparity Methods for Multinomial Models," Statistica Sinica, Vol. 8, 1998, pp. 841-860.
[20] M. W. Birch, "The Detection of Partial Association, II: The General Case," Journal of the Royal Statistical Society, Vol. 27, No. 1, 1965, pp. 111-124.
[21] J. P. W. Pluim, J. B. A. Maintz and A. M. Viergever, "f-Information Measures to Medical Image Registration," IEEE Transactions on Medical Imaging, Vol. 23, No. 12, 2004, pp. 1508-1516. doi:10.1109/TMI.2004.836872
[22] I. Vajda, "Theory of Statistical Evidence and Information," Kluwer Academic Publisher, Dordrecht, 1989.
[23] D. Morales, L. Pardo and I. Vajda, "Asymptotic Divergence of Estimates of Discrete Distribution," Journal of Statistical Planning and Inference, Vol. 48, No. 3, 1995, pp. 347-369. doi:10.1016/0378-3758(95)00013-Y
[24] N. Cressie and T. R. C. Read, "Multinomial Goodness of Fit Test," Journal of the Royal Statistical Society, Vol. 46, No. 3, 1984, pp. 440-464.
[25] K. Zografos and K. Ferentinos, "Divergence Statistics Sampling Properties and Multinomial Goodness of Fit and Divergence Tests," Communications in Statistics: Theory and Methods, Vol. 19, No. 5, 1990, pp. 1785-1802. doi:10.1080/03610929008830290
[26] M. Salicrú, D. Morales, M. L. Menéndez, et al., "On the Applications of Divergence Type Measures in Testing Statistical Hypotheses," Journal of Multivariate Analysis, Vol. 51, No. 2, 1994, pp. 372-391. doi:10.1006/jmva.1994.1068
[27] A. Bar-Hen and J. J. Daudin, "Generalisation of the Mahalanobis Distance in the Mixed Case," Journal of Multivariate Analysis, Vol. 53, No. 2, 1995, pp. 332-342. doi:10.1006/jmva.1995.1040
[28] L. Pardo, D. Morales, M. Salicrú and M. L. Menéndez, "Generalized Divergences Measures: Amount of Information, Asymptotic-Distribution and Its Applications to Test Statistical Hypotheses," International Sciences, Vol. 84, No. 3-4, 1995, pp. 181-198.
[29] M. L. Menéndez, L. Pardo, M. Salicrú and D. Morales, "Divergence Measures, Based on Entropy Functions and Statistical Inference," Sankhyā: The Indian Journal of Statistics, Vol. 57, No. 3, 1995, pp. 315-337.
[30] I. Csiszár, "Information-Type Measure of Difference of Probability Distribution and Indirect Observations," Studia Scientiarum Mathematicarum Hungarica, Vol. 2, 1967, pp. 299-318.
[31] M. Broniatowski and A. Toma, "Dual Divergence Estimators and Tests: Robustness Results," Journal of Multivariate Analysis, Vol. 102, No. 1, 2011, pp. 20-36.
[32] A. Basu, A. Mandal and L. Pardo, "Hypothesis Testing for Two Discrete Populations Based on the Hellinger Distance," Statistics and Probability Letters, Vol. 80, No. 3-4, 2010, pp. 206-214. doi:10.1016/j.spl.2009.10.008
[33] F. Liese and I. Vajda, "Convex Statistical Distance," Vol. 95 of Teubner-Texte zur Mathematik, 1987.
[34] R. Tamura and D. D. Boos, "Minimum Hellinger Distance Estimation for Multivariate Location and Covariance," Journal of American Statistical Association, Vol. 81, No. 333, 1989, pp. 223-229.
[35] A. Basu, S. Sarkar and A. N. Vidyashankar, "Minimum Negative Exponential Disparity Estimation in Parametric Models," Journal of Statistical Planning and Inference, Vol. 58, No. 2, 1997, pp. 349-370. doi:10.1016/S0378-3758(96)00078-X
[36] I. R. Harris and A. Basu, "Hellinger Distance as Penalized Loglikelihood," Communications in Statistics: Theory and Methods, No. 3, 1994, pp. 637-646. doi:8/369988384
[37] A. Mandal, R. K. Patra and A. Basu, "Minimum Hellinger Distance Estimation with Inlier Modification," Sankhya, Vol. 70, 2008, pp. 310-322.