Hierarchical Sequential Memory for Music: A Cognitive Model

10th International Society for Music Information Retrieval Conference (ISMIR 2009)

James B. Maxwell, Simon Fraser University, jbmaxwell@sfu.ca
Philippe Pasquier, Simon Fraser University, pasquier@sfu.ca
Arne Eigenfeldt, Simon Fraser University, arne_e@sfu.ca

ABSTRACT

We propose a new machine-learning framework called the Hierarchical Sequential Memory for Music, or HSMM. The HSMM is an adaptation of the Hierarchical Temporal Memory (HTM) framework, designed to make it better suited to musical applications. The HSMM is an online learner, capable of recognition, generation, continuation, and completion of musical structures.

1. INTRODUCTION

In our previous work on the MusicDB [10] we outlined a system inspired by David Cope's notion of music recombinance [1]. The design used Cope's SPEAC system of structural analysis [1] to build hierarchies of musical objects. It was similar to existing music representation models [7, 9, 13], in that it emphasized the construction of hierarchies in which the objects at each consecutively higher level demonstrated increasing temporal invariance [5]; i.e., an S phrase in SPEAC analysis, and a "head" in the Generative Theory of Tonal Music [9], both use singular names at higher levels to represent sequences of musical events at lower levels. Other approaches to learning musical structure include neural network models [8], recurrent neural network (RNN) models [11], RNNs with Long Short-Term Memory [3], Markov-based models [12, 14], and statistical models [2]. Many of these approaches have achieved high degrees of success, particularly in modeling melodic and/or homophonic music. With the HSMM we hope to extend such approaches by enabling a single system to represent melody, harmony, homophony, and various contrapuntal formations, with little or no explicit a priori modeling of musical "rules": the HSMM will learn only by observing musical input. Further, because the HSMM is a cognitive model, it can be used to exploit musical knowledge, in real time, in a variety of interesting and interactive ways.

2. BACKGROUND: THE HTM FRAMEWORK

In his book On Intelligence, Jeff Hawkins proposes a top-down model of the human neocortex, called the Memory Prediction Framework (MPF) [6]. The model is founded on the notion that intelligence arises through the interaction of perceptions and predictions; the perception of sensory phenomena leads to the formation of predictions, which in turn guide action. When predictions fail to match learned expectations, new predictions are formed, resulting in revised action. The MPF, as realized computationally in the HTM [4, 5], operates under the assumption of two fundamental ideas: 1) that memories are hierarchically structured, and 2) that higher levels of this structure show increasing temporal invariance.

The HTM is a type of Bayesian network, and is best described as a memory system that can be used to discover or infer causes in the world, to make predictions, and to direct action. Each node has two main processing modules: a Spatial Pooler (SP) for storing unique spatial patterns (discrete data representations expressed as single vectors), and a Temporal Pooler (TP) for storing temporal groupings of such patterns. The processing in an HTM occurs in two phases: a bottom-up classification phase, and a top-down recognition, prediction, and/or generation phase. Learning is a bottom-up process, involving the storage of discrete vector representations in the SP, and the clustering of such vectors into temporal groups [4], or variable-order Markov chains, in the TP. A node's learned Markov chains thus represent temporal structure in the training data. As information flows up the hierarchy, beliefs about the identity of the discrete input representations are formed in each node's SP, and beliefs about the membership of those representations in each of the stored Markov chains are formed in the TP. Since the model is hierarchical, higher-level nodes store invariant representations of lower-level states, leading to the formation of high-level spatio-temporal abstractions, or concepts.

A simplified representation of HTM processing is given in Figure 1.

Figure 1. Simplified HTM processing.

Here we see a 2-level hierarchy with two nodes at L1 and one node at L2. This HTM has already received some training, so that each L1 node has stored four spatial patterns and two Markov chains, while the L2 node has stored three spatial patterns and two Markov chains. There are two input patterns, p1 and p2. It can be seen that p1 corresponds to pattern 4 of Node 1, and that pattern 4 of Node 1 is a member of Markov chain b.

When presented with p1, the node identifies pattern 4 as the stored pattern most similar to p1, calculates the membership of pattern 4 in each of the stored Markov chains, and outputs the vector [0, 1], indicating the absence of belief that p1 is a member of Markov chain a, and the certainty that p1 is a member of Markov chain b. It can also be seen from Figure 1 that the outputs of the children in the hierarchy are concatenated to form the inputs to the parent. The SP of Node 3 thus treats the concatenated outputs of Nodes 1 and 2 as a discrete representation of their temporal state at a given moment; i.e., time is frozen by the parent node's SP. Node 3 then handles its internal processing in essentially the same manner as Nodes 1 and 2. The dotted lines indicate the top-down processes by which discrete state representations can be extracted or inferred from the stored Markov chains, and passed down the network. Top-down processing can be used to support the recognition of input patterns, to make predictions, or to generate output.

3. MOTIVATIONS BEHIND THE HSMM

Our interest in the HTM as a model for representing musical knowledge derives from its potential to build spatio-temporal hierarchies. The current HTM implementation from Numenta Inc., however, is focused primarily on visual pattern recognition [4, 5], and is currently incapable of learning the sort of high-level temporal structure found in music. This structure depends not only on the temporal proximity of input patterns, but also on the specific sequential order in which those patterns arrive. The HSMM treats sequential order explicitly, and can thus build detailed temporal hierarchies. Another motivation behind the HSMM lies in the fact that the HTM is strictly an offline learner. For compositional applications, we are interested in a system that can acquire new knowledge during interaction, and exploit that knowledge in the compositional process. We have thus designed the HSMM with four primary functions in mind:

1) Recognition: The system should have a representation of the hierarchical structure of the music at any given time in a performance.
2) Generation: The system should be capable of generating stylistically integrated musical output.
3) Continuation: If a performance is stopped at a given point, the system should continue in a stylistically appropriate manner.
4) Pattern Completion: Given a degraded, or partial, input representation, the system should provide a plausible completion of that input (i.e., by adding a missing note to a chord).

4. MUSIC REPRESENTATION

For the current study, we are working with standard MIDI files from which note data is extracted and formatted into three 10-member vectors: one for pitch data, one for rhythmic data, and one for velocity data. The incoming music is first pooled into structures similar to Cope's Groups [1]: vertical slices of music containing the total set of unique pitches at a given moment. A new Group is created every time the harmonic structure changes, as shown in Figure 2. The Groups are preprocessed using a simple voice-separation algorithm, which divides polyphonic material across the 10 available voices in the vector representation. Group pitches are first sorted in ascending order, after which the voice separation routine follows one simple rule: tied (sustained) notes must not switch voices.

Figure 2. Music representation for the HSMM.

Pitch material is represented using the inter-pitch ratio [16], calculated by converting the MIDI notes to hertz, and dividing the pitch at time t-1 by the pitch at time t. In order to avoid misleading values resulting from calculating the ratio between a rest (pitch value 0.0) and a pitch, rests are omitted from the pitch representation, and the velocity representation is used to indicate when notes are active or inactive (see Figure 2). It will be noted that velocity is not given using conventional MIDI values, but is rather used as a flag to indicate the state of a given voice in the Group. Positive values indicate onsets, negative values indicate sustained notes, and zeros indicate offsets. We have simplified the non-zero values to 1 and -1 in order to avoid attributing too much weight to note velocity in the training and inference process. The rhythmic values represent the times at which each voice undergoes a transition, either from one pitch to another, or from a note-on to a note-off. We use the inter-event time between such changes, and calculate the ratio between consecutive inter-event times, for each voice n, according to the following:

    intereventratio_t[n] = intereventtime_{t-1}[n] / intereventtime_t[n]    (1)

The final representation for the HSMM will thus consist of one 10-member inter-pitch ratio vector, one 10-member inter-event time ratio vector, and one 10-member velocity flag vector.
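To make the representation concrete, the short Python sketch below encodes one Group transition as the three 10-member vectors: inter-pitch ratios computed in hertz, the inter-event time ratios of Equation 1, and the onset/sustain/offset flags. This is an illustrative sketch written from the prose above, not the authors' implementation; the function names and the simplified voice handling are our assumptions.

    NUM_VOICES = 10  # fixed number of voices in the HSMM vector representation

    def midi_to_hz(note):
        """Convert a MIDI note number to frequency in hertz (A4 = 69 = 440 Hz)."""
        return 440.0 * 2.0 ** ((note - 69) / 12.0)

    def group_vectors(prev_pitches, cur_pitches, prev_iet, cur_iet, flags):
        """Build the three 10-member vectors for one Group transition.

        prev_pitches, cur_pitches : per-voice MIDI notes (0 = rest), sorted ascending
        prev_iet, cur_iet         : per-voice inter-event times (seconds)
        flags                     : per-voice 1 (onset), -1 (sustained), 0 (offset)
        """
        pitch_ratio = [0.0] * NUM_VOICES
        time_ratio = [0.0] * NUM_VOICES
        for n in range(NUM_VOICES):
            # Rests are omitted from the pitch representation; the velocity
            # flag vector carries the active/inactive information instead.
            if prev_pitches[n] and cur_pitches[n]:
                pitch_ratio[n] = midi_to_hz(prev_pitches[n]) / midi_to_hz(cur_pitches[n])
            # Equation (1): ratio of consecutive inter-event times per voice.
            if cur_iet[n]:
                time_ratio[n] = prev_iet[n] / cur_iet[n]
        return pitch_ratio, time_ratio, list(flags)

    # Example: a two-voice Group moving C4/E4 -> D4/E4, with E4 sustained.
    prev = [60, 64] + [0] * 8
    cur = [62, 64] + [0] * 8
    pr, tr, vf = group_vectors(prev, cur, [0.5] * 10, [0.25] * 10, [1, -1] + [0] * 8)
    print(pr[:2], tr[:2], vf[:2])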

5. HSMM LEARNING AND INFERENCE

Figure 3 shows a four-level HSMM hierarchy with inputs for pitch, rhythm, and velocity information, an association node (L2) for making correlations between the L1 outputs, and two upper-level nodes for learning higher-ordered temporal structure. The association node at L2 provides the necessary connectivity between the pitch, rhythm, and velocity elements required for the identification of musical motives. The upper-level nodes at L3 and L4 are used to learn high-order musical structure from the motives learned at L2.

Figure 3. A four-level HSMM hierarchy.

5.1 Online Learning in the HSMM

Whereas the TP in the HTM builds Markov chains during its training phase, in the HSMM we focus simply on constructing discrete sequences from the series of patterns input to the node. As in the HTM, the patterns stored in the SP will be referred to as coincidences. The sequences stored by the TP will be referred to simply as sequences.

5.1.1 Learning and Inference in the Spatial Pooler

The objective of SP learning is to store unique input patterns as coincidences, while the objective of SP inference is to classify input patterns according to the stored coincidences. The algorithm used is given in Figure 4. As in the HTM, when a new input is received the SP checks the input against all stored coincidences, C. The result is an SP output vector y, calculated according to:

    y_t[i] = exp( -d(p, C[i])^2 / σ^2 ),  for i = 0 to |C| - 1    (2)

where d(p, C[i]) is the Euclidean distance from input p to coincidence C[i], and σ is a constant of the SP. The constant σ is used to account for noise in the input, and is useful for handling rhythm vectors, where frequent fluctuations of timing accuracy are expected. The output y is a belief distribution over coincidences, in which a higher value indicates greater similarity between input pattern p and stored coincidence C[i], and thus greater evidence that p should be classified as C[i]. If the similarity of p to all stored coincidences is less than the minimum allowed similarity, simthreshold, p is added as a new coincidence. In the event that a new coincidence is added, the algorithm uses the value of maxsimilarity (i.e., the belief in the coincidence most similar to input p) as the initial belief value when adding the new coincidence. It then normalizes y in order to scale the new belief according to the belief over all stored coincidences.

    p               The current input pattern
    C               The table of stored coincidences
    maxsimilarity   The maximum similarity value found
    simthreshold    The minimum degree of similarity between input p and coincidence C[i] required for p to be classified as C[i]
    unmatchedcoinc  A count of the number of times input p was found to be insufficiently similar to all coincidences in C

    Set maxsimilarity to 0
    For each stored coincidence C[i] in C
        Calculate y[i], given input p, according to Equation 2
        If y[i] > maxsimilarity
            Set maxsimilarity to y[i]
        If y[i] < simthreshold
            Increment unmatchedcoinc count
    If unmatchedcoinc count = size of C
        Add input p to stored coincidences C
        Append maxsimilarity to end of y vector
        Normalize y vector

Figure 4. Online SP learning and inference.

In order to decide whether a new coincidence is required at higher levels, we start by first determining whether the input pattern λ (see Figure 1) should be stored as a new coincidence. This is simply a matter of checking the length of the λ vector at time t against its length at time t-1. If the length has increased, we know that at least one of the children has learned a new representation in the current time step, and that a new coincidence must be added in order to account for the additional information.

For each new coincidence stored by the SP, a histogram called the counts vector is updated. In the HTM, the update is an integer incrementation: a count of how many times the coincidence has been seen during training. However, because the HSMM is an online learner, an integer incrementation is not appropriate, as it would lead to counts of vanishingly small proportions being assigned to new coincidences if the system were left running for long periods of time. Thus, in the HSMM, the counts vector is updated according to the following:

    inc = |C| · 0.01    (3)

    counts_t[topcoinc] = counts_{t-1}[topcoinc] + inc    (4)

    counts_t[i] = counts_t[i] / (1 + inc),  for i = 0 to |counts| - 1    (5)

where |C| is the number of learned coincidences, inc is the incrementation value, and topcoinc is the coincidence rated as having the maxsimilarity (Figure 4) to the input. Because counts is regularly normalized, it represents a time-limited histogram in the HSMM.

SP inference above L1 is calculated as in the HTM, but we outline it here for the sake of clarity. At higher levels we want to calculate the probability that the new input λ should be classified as one of the stored coincidences. When the node has more than one child, we consider each child's contribution to the overall probability separately:

    C = [C^1 ... C^M],  λ = [λ^1 ... λ^M]

    y_t[i] = Π_{j=1..M} max_k ( C^j[k, i] · λ^j[k] ),  for i = 0 to |C| - 1    (6)

where M is the number of child nodes, C^j is the portion of the coincidence vectors attributed to child j, and λ^j is the portion of λ attributed to child j (recall that when the node has more than one child, each coincidence is a concatenation of child outputs). Figure 5 shows an example calculation for a hypothetical SP with two stored coincidences.

Figure 5. SP inference calculations above L1.
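The L1 SP procedure can be made concrete with a short sketch. The following Python fragment, written from the prose and Figure 4 rather than from the authors' implementation, combines Equation 2 with the Figure 4 matching loop and the counts update of Equations 3-5; the class name, the parameter defaults, and the handling of the very first input are our assumptions.

    import math

    class SpatialPooler:
        """Minimal online SP sketch: stores coincidences, returns belief vector y."""

        def __init__(self, sigma=0.5, sim_threshold=0.5):
            self.sigma = sigma                  # noise constant sigma in Equation (2)
            self.sim_threshold = sim_threshold  # minimum similarity for classification
            self.coincidences = []              # stored coincidence vectors C
            self.counts = []                    # time-limited histogram over C

        def infer_and_learn(self, p):
            """Figure 4: classify input p against C, adding p as a new coincidence
            when it is insufficiently similar to every stored coincidence."""
            y, max_sim, unmatched = [], 0.0, 0
            for c in self.coincidences:
                d2 = sum((a - b) ** 2 for a, b in zip(p, c))     # squared Euclidean distance
                belief = math.exp(-d2 / self.sigma ** 2)         # Equation (2)
                y.append(belief)
                max_sim = max(max_sim, belief)
                if belief < self.sim_threshold:
                    unmatched += 1
            if unmatched == len(self.coincidences):              # p matched nothing stored
                self.coincidences.append(list(p))
                self.counts.append(0.0)
                y.append(max_sim if max_sim > 0.0 else 1.0)      # first-ever input: assumption
                total = sum(y)
                y = [v / total for v in y]                       # normalize y (Figure 4)
            top_coinc = max(range(len(y)), key=y.__getitem__)
            self._update_counts(top_coinc)
            return y

        def _update_counts(self, top_coinc):
            """Equations (3)-(5): bounded increment plus renormalization, so that counts
            behaves as a time-limited histogram of topcoinc occurrences."""
            inc = len(self.coincidences) * 0.01                  # Equation (3)
            self.counts[top_coinc] += inc                        # Equation (4)
            self.counts = [c / (1.0 + inc) for c in self.counts] # Equation (5)

    # Toy usage: two similar inputs map to one coincidence, a distant one adds another.
    sp = SpatialPooler(sigma=0.5)
    for pattern in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]):
        print(sp.infer_and_learn(pattern), len(sp.coincidences))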

5.1.2 Learning in the Temporal Pooler

The objective of TP learning is to construct sequences from the series of belief vectors (y) received from the SP. When a new input to the TP is received, the TP first calculates the winning coincidence of y:

    topcoinc_t = argmax_i y_t[i]    (7)

It then determines whether this coincidence has changed since the previous time step, i.e., whether topcoinc_t equals topcoinc_{t-1}, and stores the result in a flag called change. The next step is to determine whether the transition from topcoinc_{t-1} to topcoinc_t exists among the TP's stored sequences. To do this, we depart from the HTM entirely, and use an algorithm we refer to as the Sequencer algorithm. In the Sequencer algorithm, we consider two aspects of the relationship between topcoinc and a given stored sequence, Seq_n: 1) the position of topcoinc in Seq_n (zero if topcoinc is not in Seq_n), referred to as the sequencer state, and 2) the cumulative slope formed by the history of sequencer states for Seq_n. Thus, if Seq_n is four coincidences in length, and each successive topcoinc matches each coincidence in Seq_n, then the history of sequencer states will be the series {1, 2, 3, 4}, with each transition having a slope of 1.0. We use a vector called seqstate to store the sequencer states, and a vector called seqslope to store the cumulative slope for each sequence, formed by the history of sequencer states. The slope is calculated as follows:

    seqslope_t[i] = seqstate_t[i] - seqstate_{t-1}[i]    (8)

    seqslope_t[i] = seqslope_{t-1}[i] + seqslope_t[i],  if seqslope_t[i] = 1.0
                  = seqslope_{t-1}[i] - seqslope_t[i],  if seqslope_t[i] ≠ 1.0    (9)

    seqslope_t[i] = seqslope_t[i] / (1 + e^(k(1 - seqslope_t[i])))    (10)

where seqstate[i] indicates the position of topcoinc in sequence i (zero if non-member). The sigmoid scaling performed in Equation 10, governed by the constant k, helps to constrain the cumulative slope values. Figure 6 shows an example of using cumulative sequence slopes to reveal the best sequence. At levels above L1, we only update the seqslope vector when change = 1, in order to help the TP learn at a time scale appropriate to its level in the hierarchy.

Figure 6. Using seqslope to find the best sequence.

A node parameter, slopethresh, is used to determine the minimum slope required for the TP to pass on to the inference stage without adding a new sequence or extending an existing sequence. If the maximum value in seqslope does not exceed the value of slopethresh, then either a new sequence is created, or an existing sequence is extended. Generally, we allow only one occurrence of any given coincidence in a single sequence at all levels above L1, though any number of sequences may share that coincidence. This is done to avoid building long sequences at the bottom of the hierarchy, thus dividing the construction of longer sequences across the different levels. We allow consecutive repetitions of coincidences at L1, but do not allow non-consecutive repetitions. This is a musical consideration, given the frequent use of repetitions in musical language.

5.1.3 Inference in the Temporal Pooler

The objective of TP inference is to determine the likelihood that topcoinc is a member of a given stored sequence. At each time step, the TP uses the counts vector, from the SP, to update a Conditional Probability Table, called the weights matrix, which indicates the probability of a specific coincidence occurring in a given sequence. The weights matrix is calculated as:

    weights[i, j] = counts[j] · I(i, j) / Σ_k ( counts[k] · I(i, k) )    (11)

    I(i, j) = 1 if C[j] ∈ S[i],  0 if C[j] ∉ S[i]

where C[j] is the jth stored coincidence and S[i] is the ith stored sequence. The probabilities stored by the weights matrix are used during TP inference, and also when forming top-down beliefs in the hierarchy, as introduced in Section 2. It is a row-normalized matrix in which rows represent sequences and columns represent coincidences. Because the counts vector maintains its histogram of topcoinc occurrences over a limited temporal window, the weights matrix in the HSMM is able to act as a form of short-term memory for the node. The output of TP inference is the bottom-up belief vector z, which indicates the degree of membership of topcoinc in each of the stored sequences. The argmax of z thus identifies the sequence most strongly believed to be active, given topcoinc. To calculate z, we use a variant of the sumprop and maxprop algorithms used in the HTM [6], which we refer to as pmaxprop. The algorithm uses the weights matrix to calculate a belief distribution over sequences, as follows:

    z[i] = max_j ( weights[i, j] · y[j] )    (12)

An example run of the pmaxprop algorithm is given in Figure 7, using the coincidences and sequences from Figure 6.
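A minimal Python sketch of the weights matrix (Equation 11) and the pmaxprop calculation (Equation 12) follows. It assumes that sequences are stored as lists of coincidence indices; the function names and data layout are ours, not the authors'.

    def weights_matrix(counts, sequences):
        """Equation (11): row-normalized CPT; rows are sequences, columns are coincidences.
        counts[j] is the time-limited histogram value for coincidence j;
        sequences[i] is the list of coincidence indices belonging to sequence i."""
        rows = []
        for seq in sequences:
            member = [counts[j] if j in seq else 0.0 for j in range(len(counts))]
            total = sum(member)
            rows.append([m / total if total else 0.0 for m in member])
        return rows

    def p_max_prop(weights, y):
        """Equation (12): z[i] = max_j weights[i][j] * y[j], a belief over sequences."""
        return [max(w * b for w, b in zip(row, y)) for row in weights]

    # Toy run: four coincidences, two stored sequences.
    counts = [0.4, 0.3, 0.2, 0.1]
    sequences = [[0, 1, 2], [2, 3]]
    y = [0.05, 0.7, 0.15, 0.1]          # SP belief over coincidences
    W = weights_matrix(counts, sequences)
    z = p_max_prop(W, y)
    print(z, "best sequence:", max(range(len(z)), key=z.__getitem__))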

Figure 7. The pmaxprop algorithm calculations.

Because the weights matrix in the HSMM is a short-term memory, and the pmaxprop algorithm is a one-shot inference, with no consideration of the previous time step, we combine the results of pmaxprop with the results of the Sequencer algorithm, to yield the final bottom-up belief vector:

    z_t[i] = z[i] · seqslope_t[i]    (13)

5.2 Belief Formation in an HSMM Node

The final belief vector to be calculated, a belief distribution over coincidences called BelC, represents the combination of the node's classification of a given input, and its prediction regarding that input in the current temporal context. Thus, for every bottom-up input there is a top-down, feedback response. Bottom-up vector representations passing between nodes are denoted with λ, while top-down, feedback representations are denoted with π. (At the node level, the λ and z vectors are equivalent; the naming is intended to distinguish the between-node processes from the within-node processes.) A schematic of node processing can be seen in Figure 8.

Figure 8. HSMM node processing.

The top-down, feedback calculations used in the HSMM are the same as those used in the HTM, but we outline them here for completeness. The first step in processing the top-down message is to divide the top-down parent belief π by the node's bottom-up belief λ (at the top of the hierarchy, the bottom-up belief z is used for the top-down calculations):

    π'[i] = π[i] / λ[i]    (14)

Next, the π' vector is used to calculate the top-down belief distribution over stored coincidences as:

    y_down[i] = max_{Seq_j ∈ S} ( weightsT[i, j] · π'[j] ),  for i = 0 to |C| - 1    (15)

where weightsT[i, j] is the transposed weights matrix, y_down is the top-down belief over coincidences, and S is the table of stored sequences. Figure 9 gives an example, assuming the coincidences and sequences from Figure 7.

Figure 9. Using the weights matrix to calculate the top-down belief over coincidences.

The BelC vector is then calculated as the product of the top-down (y_down) and bottom-up (y_up, i.e., the SP belief y) distributions over coincidences:

    BelC[i] = y_down[i] · y_up[i]    (16)

This calculation ensures that the belief of the node is always based on the available evidence both from above and below the node's position in the hierarchy. At all levels above L1, the top-down output of the node (the message sent to the children) is calculated using the BelC vector and the table of stored coincidences C:

    π_out[i] = BelC[j*] · C[j*][i],  where j* = argmax_j BelC[j],  for i = 0 to |C[j*]| - 1    (17)

This calculation ensures that each child portion of the top-down output is proportional to the belief in the node. In cases where the parent node has two or more children, the π vector is divided into segments of length equal to the length of each child's λ vector (i.e., reversing the concatenation of child messages used during bottom-up processing). The various stages of top-down processing are illustrated on the right side of Figure 8.

One extra step, in support of TP inference in the HSMM, is added that is not present in the HTM. In accordance with the ideas of the MPF, it seemed intuitively clear to us that predictions could be used locally in the TP to support the learning process by disambiguating SP inputs whenever possible. With this in mind we added a calculation to the TP inference algorithm that biases the SP belief vector, y, according to the state of the change flag and the current top-down inference over sequences. In cases where the sequence inferred by top-down processing at time t-1 contains topcoinc_{t-1}, and change = 0, the belief value for topcoinc_{t-1} is strengthened. However, when change = 1, belief in the next coincidence in the inferred sequence is strengthened. The algorithm is given in Figure 10. Thus, when the state of the node appears to be changing, belief is biased slightly toward what is most likely to occur, whereas when the state appears to be stable, the most recent belief is assumed to be correct.

    topseq_{t-1}   The sequence inferred by top-down processing
    predcoinc      The predicted coincidence

    For each coincidence c in topseq_{t-1}
        If topseq_{t-1}[c] equals topcoinc_{t-1}
            Set predcoinc to topseq_{t-1}[c+1]
    If change = 0
        y[topcoinc_{t-1}] = y[topcoinc_{t-1}] * 1.1
    else if change = 1
        y[predcoinc] = y[predcoinc] * 1.1

Figure 10. Biasing the predicted coincidence.
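A minimal Python sketch of the top-down pass (Equations 14-16), the child message as we read Equation 17, and the Figure 10 biasing step is given below. The function names, the zero-division guard, the end-of-sequence guard, and the vector layouts are our assumptions, not the authors' code.

    def top_down_belief(pi_parent, lam, weights_t, bel_up):
        """Equations (14)-(16): combine parent feedback with bottom-up evidence.

        pi_parent : parent's top-down belief over this node's sequences
        lam       : this node's bottom-up output over sequences (z at the top level)
        weights_t : transposed weights matrix, rows = coincidences, cols = sequences
        bel_up    : bottom-up belief over coincidences (the SP output y)
        """
        # Equation (14): pi'[i] = pi[i] / lambda[i]  (guarding zeros is our assumption)
        pi_prime = [p / l if l else 0.0 for p, l in zip(pi_parent, lam)]
        # Equation (15): top-down belief over coincidences via the transposed weights.
        y_down = [max(w * p for w, p in zip(row, pi_prime)) for row in weights_t]
        # Equation (16): BelC combines evidence from above and below.
        return [d * u for d, u in zip(y_down, bel_up)]

    def child_message(bel_c, coincidences):
        """Equation (17), as we read it: emit the most-believed coincidence,
        scaled by its belief, as the top-down message to the children."""
        j = max(range(len(bel_c)), key=bel_c.__getitem__)
        return [bel_c[j] * v for v in coincidences[j]]

    def bias_prediction(y, top_seq_prev, top_coinc_prev, change):
        """Figure 10: strengthen the current or predicted coincidence by 10%."""
        pred = None
        for pos, c in enumerate(top_seq_prev[:-1]):       # skip the last position: no successor
            if c == top_coinc_prev:
                pred = top_seq_prev[pos + 1]              # next coincidence in the inferred sequence
        if change == 0:
            y[top_coinc_prev] *= 1.1
        elif change == 1 and pred is not None:
            y[pred] *= 1.1
        return y

    # Toy shapes: 2 sequences, 3 coincidences.
    W_T = [[0.8, 0.0], [0.2, 0.5], [0.0, 0.5]]
    bel = top_down_belief([0.6, 0.4], [0.5, 0.5], W_T, [0.3, 0.5, 0.2])
    y_biased = bias_prediction([0.2, 0.5, 0.3], top_seq_prev=[0, 1, 2], top_coinc_prev=1, change=1)
    print(bel, child_message(bel, [[1, 0], [0, 1], [1, 1]]), y_biased)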
6. DISCUSSION AND CONCLUSION

The strength of the HSMM lies in its balancing of hierarchical interdependence with node-level independence. Each node learns in the context of the hierarchy as a whole, but also forms its own representations and beliefs over a particular level of musical structure. At L1, simple motivic patterns can be recognized and/or generated, and at the higher levels, larger musical structures like phrases, melodies, and sections can also be learned, classified, and generated.

Further, since nodes above L1 all process information in an identical manner, and only require a single child, additional high-level nodes could be added, enabling the learning of higher levels of formal structure: songs, movements, compositions. Each node can be monitored independently, and its state exploited for compositional, analytical, or musicological purposes. Composition tools could be developed, offering various levels of interactivity, while maintaining stylistic continuity with the user's musical language. In the area of classic Music Information Retrieval, low levels of the HSMM could be used to identify common motivic gestures among a given set of works, while higher levels could be used to recognize the music of individual composers, or to cluster a number of works by stylistic similarity.

The HSMM exploits short-term and long-term memory structures, and uses an explicit sequencing model to build its temporal hierarchies, thus giving it the capacity to learn high-level temporal structure without the tree-like topologies required by HTM networks. Tremendous progress has been made in the cognitive sciences and cognitive modeling, but such work has remained largely unexplored by the computer music community, which has focused more on pure computer science and signal processing. The HSMM offers a first step toward the development and exploitation of a realistic cognitive model for the representation of musical knowledge, and opens up a myriad of areas for exploration with regard to the associated cognitive behavior.

7. FUTURE WORK

A working prototype of the HSMM has been implemented, and initial tests have shown great promise. A future paper will cover the evaluation in detail, with an emphasis on exploiting the strengths offered by the cognitive model. We are interested in exploring alternative distance metrics for the L1 nodes, particularly the pitch and rhythm nodes, where more musically-grounded metrics may be effective. We are also interested in exploring different topologies for the hierarchy; in particular, topologies that isolate individual voices and allow the system to learn both independent monophonic hierarchies and associative polyphonic hierarchies. Along similar lines, we would like to explore the possibilities offered by state-based gating of individual nodes in more complex hierarchies, in order to simulate the cognitive phenomenon of attention direction.

8. REFERENCES

[1] D. Cope: Computer Models of Musical Creativity, MIT Press, Cambridge, MA, 2005.

[2] S. Dubnov, G. Assayag, and O. Lartillot: Using Machine-Learning Methods for Musical Style Modelling, Computer, 36/10, 2003.

[3] D. Eck and J. Schmidhuber: Learning the Long-Term Structure of the Blues, Lecture Notes in Computer Science, Vol. 2415, Springer, Berlin, 2002.

[4] D. George: How the Brain Might Work: A Hierarchical and Temporal Model for Learning and Recognition, PhD Thesis, Stanford University, Palo Alto, CA, 2008.

[5] D. George and J. Hawkins: A Hierarchical Bayesian Model of Invariant Pattern Recognition in the Visual Cortex, Redwood Neuroscience Institute, Menlo Park, CA, 2004.

[6] J. Hawkins and S. Blakeslee: On Intelligence, Times Books, New York, NY, 2004.

[7] K. Hirata and T. Aoyagi: Computational Music Representation Based on the Generative Theory of Tonal Music and the Deductive Object-Oriented Database, Computer Music Journal, 27/3, 2003.

[8] D. Hörnel and W. Menzel: Learning Musical Structure and Style with Neural Networks, Computer Music Journal, 22/4, 1998.

[9] F. Lerdahl and R. Jackendoff: A Generative Theory of Tonal Music, MIT Press, Cambridge, MA, 1983.

[10] J. Maxwell and A. Eigenfeldt: The MusicDB: A Music Database Query System for Recombinance-based Composition in Max/MSP, Proceedings of the 2008 International Computer Music Conference, Belfast, Ireland, 2008.

[11] M.C. Mozer: Neural Network Music Composition by Prediction: Exploring the Benefits of Psychoacoustic Constraints and Multi-scale Processing, Connection Science, 6/2-3, 1994.

[12] E. Pollastri and G. Simoncelli: Classification of Melodies by Composer with Hidden Markov Models, Proceedings of the First International Conference on WEB Delivering of Music, 2001.

[13] Y. Uwabu, H. Katayose and S. Inokuchi: A Structural Analysis Tool for Expressive Performance, Proceedings of the International Computer Music Conference, San Francisco, 1997.

[14] K. Verbeurgt, M. Dinolfo and M. Fayer: Extracting Patterns in Music for Composition via Markov Chains, Lecture Notes in Computer Science, Springer, Berlin, 2004.

[15] G. Wilder: Adaptive Melodic Segmentation and Motivic Identification, Proceedings of the International Computer Music Conference, Belfast, Ireland, 2008.