A hierarchical self-organizing map model for sequence recognition

Otávio Augusto S. Carpinteiro
Instituto de Engenharia Elétrica
Escola Federal de Engenharia de Itajubá
Av. BPS 1303, Itajubá, MG, 37500-000, Brazil
e-mail: otavio@iee.efei.br
home page: http://www.iee.efei.br/~otavio
phone: +55-35-6291325
fax: +55-35-6291187

March 1999

Keywords: artificial intelligence, artificial neural networks, pattern recognition, self-organizing map.

Abstract

The paper presents the analysis of an original hierarchical neural model on a complex sequence: the complete sixteenth four-part fugue in G minor from the Well-Tempered Clavier (vol. I) of J. S. Bach. The model makes effective use of context information through its hierarchical topology and its embedded time integrators, and that enables it to keep a very good account of past events. The model performs sequence classification and discrimination efficiently. It has application in domains which require pattern recognition, and particularly in those which demand recognizing either a set of sequences of vectors in time or sub-sequences within a single large sequence of vectors in time.

1. Introduction

Many researchers have developed artificial neural models to classify sequences in time. Several models based on Kohonen's self-organizing feature map [1], nonetheless, have faced some well-known flaws.

In windowed data models, as in Kangas's model [2], the input vectors are concatenated in a fixed-size window which slides in time. The memory of past inputs is limited to the size of the window. The most serious deficiency of such models is that they become computationally expensive as wider windows are required.

In time integral models, as in Chappell and Taylor's model [3], the activation of a unit is a combination of its current input and its former outputs decayed in time. Such models suffer from loss of context: sequences that differ slightly in their initial elements (e.g., abccc, baccc, and bbccc) would probably receive the same classification (see the numeric sketch at the end of this section).

In non-orthodox models, neither windows nor time integrators are employed. In James and Miikkulainen's model [4], for instance, when a vector in the sequence is input, the output unit in the map which wins the competition for that vector is disabled for further competition, and its activation decays in time to indicate which vector in the sequence it is representing. Each winning output unit represents just a single vector in the input sequence, and so the representation for the whole input sequence is given by a sequence of winning output units in the map. This distributed representation is more complicated because it has to take into consideration not only the winning units but also their activations. For example, input sequences with the same elements but in different order (e.g., abcd and dcba) will have the same winning units, and so one has to look at their activations to verify which input sequence the map is representing. Another disadvantage is that the model is unable to recognize sub-sequences inside a single large input sequence. The reason is that, after winning the competition in the map, the winning units are disabled for further competitions; consequently, identical or similar sub-sequences inserted at different points of the large sequence will never hold identical or similar representations in the map.

The model presented here is also based on Kohonen's map. However, thanks to its hierarchical topology and its embedded time integrators, it makes efficient use of context information, and that prevents it from suffering from the flaws mentioned above.

Artificial neural models have been widely employed in the musical domain. Among them, we may cite models for pitch perception [5, 6], chord perception [7], tonal perception [8, 9, 10, 11, 12], musical time perception [13], perception of musical sequences [14], musical pattern categorization [15, 16], and musical composition [17, 18, 19, 20, 21]. Gjerdingen's work [15, 16] is the closest to the area of musical sequence recognition. His neural model was required to classify groups of notes, produced by segmentation, into categories. Unfortunately, though, nothing was mentioned about the existence or not of similar sequences of patterns in the training set, so one could not assess the performance of the net in terms of classifications and misclassifications of such sequences.
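To make the loss-of-context flaw of time integral models concrete, here is a minimal numeric sketch of our own (the one-hot codes for the symbols a, b, and c are an arbitrary illustrative choice, not part of any cited model):

    import numpy as np

    # Arbitrary one-hot codes for the symbols a, b, c.
    codes = {s: np.eye(3)[k] for k, s in enumerate("abc")}

    def integrate(seq, decay=0.5):
        """Leaky time integrator: X(t) = V(t) + decay * X(t-1)."""
        x = np.zeros(3)
        for symbol in seq:
            x = codes[symbol] + decay * x
        return x

    xa, xb = integrate("abccc"), integrate("baccc")
    # The differing prefixes have decayed by 0.5**4 and 0.5**3, so the
    # final activations are nearly identical, and a map trained on them
    # would very likely assign both sequences to the same class.
    print(np.linalg.norm(xa - xb))  # ~0.09, against vector norms of ~1.76

The longer the shared suffix, the smaller this residual difference becomes, which is exactly why such models cannot separate sequences that differ only in their early elements.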

This paper reports our final research on neural models applied to the musical domain. The initial results were published in [22]. The research reported here, however, is much more relevant, for the musical sequences employed in the studies are much more complex than those employed in [22], and the results obtained and the conclusions drawn are also much more significant. In [22], we applied the model to a single part (or line) of a fugue of J. S. Bach. Here, the model is trained and evaluated on a more complex sequence in time: a complete four-part fugue of J. S. Bach. In the following sections, we shall describe the representation, the model, and the experiment on the musical sequences.

2. Representation for the sequences

The concept behind the representation is the division of a musical piece into time intervals of equal size. The smallest figure in a musical sequence is set to be the time interval (TI); thus, all other figures in the sequence are multiples of TI. For example, if TI is an eighth note, a quarter note lasts two TIs, a half note lasts four TIs, and so on. A time interval counter (TIC) may also be defined: one TIC lasts one TI, and TIC is the unit in which the musical sequence is measured. Therefore, at each TIC, there is either a rest, a note onset, or a note sustained.

The input data in the experiment consists of a sequence of musical intervals, which corresponds to Bach's fugue. Data is input one TIC at a time. Fifteen neural units are used in the representation; each unit represents one musical interval, ranging from an octave down to an octave up. We assume here, therefore, that listeners are able to separate out voices if intervals between notes in the voices are greater than a fifteenth. When there is a rest, none of the input units receives activation. Otherwise, when a note is onset or sustained, the unit corresponding to the interval receives activation.

The representation for musical sequences takes all musical voices into consideration, and is therefore complex, because the voices interact. The representation makes three assumptions. First, any note onset occurring in a TIC makes up an interval with all notes onset or sustained in the TIC immediately before. Second, the representation represents the intervals which occur in a TIC, but does not represent multiple instances of an interval occurring across the voices in that TIC. Third, at any given TIC, an interval corresponding to a note onset masks any occurrence of the same interval corresponding to a note sustained.

Let us consider the multivoiced musical sequence in figure 1, and a time integrator with a decay rate of 0.5 applied to the input units. The sequence lasts 8 TICs, and TI is a sixteenth note. Units corresponding to intervals made up by a note onset or a note sustained receive activations of 1.0 and 0.5 respectively. Table 1 displays the integrated values of the activations of the units when inputting the sequence in figure 1, TIC by TIC.
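Before the worked example in figure 1 and table 1, the following minimal sketch (ours; the event stream and helper names are illustrative, not the paper's data) shows how one TIC would be encoded over the fifteen interval units and passed through the time integrator:

    import numpy as np

    # Fifteen interval units: -8..-2, 0 (unison), +2..+8, as in table 1.
    INTERVALS = list(range(-8, -1)) + [0] + list(range(2, 9))
    INDEX = {iv: k for k, iv in enumerate(INTERVALS)}

    ONSET, SUSTAINED = 1.0, 0.5  # activations given in the text

    def encode_tic(events):
        """events: (interval, is_onset) pairs for one TIC; [] is a rest.
        An onset masks a sustained occurrence of the same interval, and
        duplicates across voices collapse into a single activation."""
        v = np.zeros(len(INTERVALS))
        for interval, is_onset in events:
            k = INDEX[interval]
            v[k] = max(v[k], ONSET if is_onset else SUSTAINED)
        return v

    def integrated_inputs(tic_stream, decay=0.5):
        """Time integrator: X(t) = V(t) + decay * X(t-1)."""
        x = np.zeros(len(INTERVALS))
        for events in tic_stream:
            x = encode_tic(events) + decay * x
            yield x

    # Illustrative stream: a rest, two onsets, then an onset plus a sustain.
    stream = [[], [(-2, True), (2, True)], [(-2, True), (4, False)]]
    for t, x in enumerate(integrated_inputs(stream), start=1):
        print(t, dict(zip(INTERVALS, np.round(x, 3))))

At the third TIC of this toy stream, the unit for the repeated onset reads 1.0 + 0.5 x 1.0 = 1.5, the same kind of accumulation visible in table 1.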

Figure 1: A multivoiced musical sequence [musical notation omitted; the sequence spans TICs 1 to 8]

Table 1: Representation for the musical sequence in figure 1 (integrated activations of the units representing intervals; blank cells are zero, "..." marks elided columns)

    TIC    -8 -7 ... -4 -3 -2 0 +2 ... +5 +6 +7 +8
    1      (no activation: rest)
    2      ... 1.0 ... 1.0
    3      1.0 ... 1.0 0.5 ... 1.0 0.5
    4      1.0 ... 1.5 0.25 ... 1.5 0.25
    5      0.5 ... 1.0 0.75 1.125 ... 0.75 1.125
    6      0.25 ... 1.0 0.375 1.063 ... 0.375 1.063
    7      0.125 ... 0.5 0.188 0.531 ... 0.188 0.531
    8      0.063 ... 0.25 0.094 0.266 ... 0.094 0.266

3. The model

The model is made up of two self-organizing maps (SOMs), as shown in figure 2. Its features, performance, and potential are evaluated in greater depth in [22, 23]. The problem of loss of context which occurs in other models, the analysis of our model, the analysis of Kohonen's SOM model, and the comparison of our model with Kohonen's on a simple temporal pattern recognition problem are reported in [23]. In that paper, we made use of contrived sequences to explain the behaviour of the two models in the presence of context information, and we explained the effect of the time integrators on the results produced as well. In our model, the bottom SOM is responsible for building representations which take into consideration not only the information given by the input vectors, but also the context in which those vectors are inserted. The information passed up to the top SOM is therefore much more accurate, and consequently, so are the classifications. We finally showed that our model performs much better than Kohonen's.

Figure 2: The model [diagram omitted: the input V(t) feeds a time integrator and the bottom SOM map; the transfer function Λ of the bottom map feeds a second time integrator and the top SOM map]

The input to the model is a sequence in time of m-dimensional vectors, S1 = V(1), V(2), ..., V(t), ..., V(z), where the components of each vector are non-negative real values. The sequence is presented to the input layer of the bottom SOM, one vector at a time. The input layer has m units, one for each component of the input vector V(t), and a time integrator. The activation X(t) of the units in the input layer is given by

    X(t) = V(t) + δ1 X(t-1)     (1)

where δ1 ∈ (0,1) is the decay rate.

For each input vector X(t), the winning unit i*(t) in the map [note 1] is the unit which has the smallest distance Ψ(i,t). For each output unit i, Ψ(i,t) is given by the Euclidean distance between the input vector X(t) and the unit's weight vector W_i. Each output unit i in the neighbourhood N*(t) of the winning unit i*(t) has its weight W_i updated by

    W_i(t+1) = W_i(t) + α ϒ(i) [X(t) - W_i(t)]     (2)

where α ∈ (0,1) is the learning rate. ϒ(i) is the neighbourhood interaction function [24], a Gaussian-type function, given by

    ϒ(i) = κ1 + κ2 exp( -κ3 [Φ(i, i*(t))]^2 / (2σ^2) )     (3)

where κ1, κ2, and κ3 are constants, σ is the radius of the neighbourhood N*(t), and Φ(i, i*(t)) is the distance in the map between the unit i and the winning unit i*(t). The distance Φ(i', i'') between any two units i' and i'' in the map is calculated according to the maximum norm,

    Φ(i', i'') = max{ |l' - l''|, |c' - c''| }     (4)

where (l', c') and (l'', c'') are the coordinates of the units i' and i'' respectively in the map.
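The following compact sketch (ours, not the authors' code) gives one reading of equations 1 to 4, together with the transfer function Λ of equation 5 defined below; the map sizes and decay rates follow section 4, while the κ constants, the neighbourhood radius σ, and the input vector are illustrative stand-ins:

    import numpy as np

    rng = np.random.default_rng(0)

    class SOMLayer:
        def __init__(self, rows, cols, dim, decay):
            # Initial weights drawn in [0, 0.1], as in section 4.
            self.w = rng.uniform(0.0, 0.1, size=(rows, cols, dim))
            self.decay = decay                 # delta_1 or delta_2
            self.x = np.zeros(dim)             # integrated input X(t)
            self.l, self.c = np.indices((rows, cols))

        def integrate(self, v):
            # Equations 1 and 6: X(t) = input + decay * X(t-1).
            self.x = v + self.decay * self.x
            return self.x

        def winner(self):
            # Unit with the smallest Euclidean distance Psi(i, t).
            d = np.linalg.norm(self.w - self.x, axis=2)
            return np.unravel_index(np.argmin(d), d.shape)

        def grid_dist(self, i_star):
            # Equation 4: maximum-norm distance Phi(i, i*(t)) on the map.
            return np.maximum(np.abs(self.l - i_star[0]),
                              np.abs(self.c - i_star[1]))

        def update(self, i_star, alpha, sigma, k1=0.0, k2=1.0, k3=1.0):
            # Equations 2 and 3 (k1..k3 are unspecified in the paper).
            phi = self.grid_dist(i_star)
            gain = k1 + k2 * np.exp(-k3 * phi**2 / (2.0 * sigma**2))
            self.w += alpha * gain[..., None] * (self.x - self.w)

    def transfer(phi, radius=4, kappa=0.25):
        # Equation 5, with the second setting of section 4:
        # N*(t) = {i : Phi(i, i*(t)) < 4} and kappa = 0.25.
        return np.where(phi < radius, 1.0 - kappa * phi, 0.0)

    # Bottom SOM: 15 inputs, 15 x 15 map; top SOM: 225 inputs, 18 x 18 map.
    bottom = SOMLayer(15, 15, dim=15, decay=0.3)
    top = SOMLayer(18, 18, dim=15 * 15, decay=0.7)

    v = rng.random(15)                  # stand-in for one TIC's interval vector
    bottom.integrate(v)
    i_star = bottom.winner()
    bottom.update(i_star, alpha=0.5, sigma=7.0)  # sigma: mid-training radius
    top.integrate(transfer(bottom.grid_dist(i_star)).ravel())
    # The top SOM then finds its own winner and updates its weights in
    # exactly the same way; its winners provide the final classification.
    top_winner = top.winner()

The design point worth noting is that the top SOM never sees the raw input: it sees a spatial activity pattern around the bottom map's winner, already blended over time by its own integrator, which is how context from past vectors reaches the final classification.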

The input to the top SOM is determined by the distances Φ(i, i*(t)) of the n units in the map of the bottom SOM. The input is thus a sequence in time of n-dimensional vectors, S2 = Λ(Φ(i, i*(1))), Λ(Φ(i, i*(2))), ..., Λ(Φ(i, i*(t))), ..., Λ(Φ(i, i*(z))), where Λ is an n-dimensional transfer function on an n-dimensional space domain. Λ is defined as

    Λ(Φ(i, i*(t))) = 1 - κ Φ(i, i*(t))   if i ∈ N*(t)
                   = 0                    otherwise     (5)

where κ is a constant, and N*(t) is a neighbourhood of the winning unit.

The sequence S2 is then presented to the input layer of the top SOM, one vector at a time. The input layer has n units, one for each component of the input vector Λ(Φ(i, i*(t))), and a time integrator. The activation X(t) of the units in the input layer is thus given by

    X(t) = Λ(Φ(i, i*(t))) + δ2 X(t-1)     (6)

where δ2 ∈ (0,1) is the decay rate. The dynamics of the top SOM is identical to that of the bottom SOM.

4. The experiment

We wanted the model to be tested on musical sequences, because the musical domain sets three strong conditions on the model. First, the model must be able to recognize both a set of input sequences and a set of sub-sequences within a single large input sequence. The model was required to recognize a set of input sequences when the whole piece was segmented; otherwise, the entire piece consisted of a unique input sequence, and the model was required to recognize sub-sequences of that sequence. Second, the model must classify sequences (or sub-sequences) properly in the presence of noise, since any two sequences which differ only slightly must achieve similar classifications. Third, the model must recognize sequences (or sub-sequences) in a very precise form: any two sequences which share some intervals, or even all intervals, but in a different order or rhythm, are musically different and must consequently be recognized as distinct.

The experiment was on recognition of the instances of a theme occurring in a complex sequence: the complete sixteenth four-part fugue in G minor of the first volume of The Well-Tempered Clavier of Bach. The fugue had 544 TICs, and TI was a sixteenth note. The theme of the fugue is shown in figure 3.

Figure 3: Theme of the sixteenth fugue in G minor [musical notation omitted]

The fugue in G minor was chosen for several reasons. First, like many of Bach's fugues, it has four voices. Second, it possesses many perfect and modified instances of the theme. Third, it includes two cases of stretto [note 2]: one occurs between its seventeenth and eighteenth bars, in which two instances of the theme overlap, and the other occurs between its twenty-eighth and thirtieth bars, in which three instances of the theme overlap. Fourth, the thematic material of the theme is extensively developed throughout the fugue; such developments [note 3], although quite similar to the theme, are not instances of the theme. Fifth, very common intervals, such as seconds up and down, occur extensively in the theme as well as in many passages of the fugue. All the facts above are usually present in real situations, for example, those in which humans are asked to perform thematic recognition in musical domains. Apart from providing a typical situation in a real domain, they also greatly increase the level of difficulty of the domain to which the artificial neural model is applied.

The input data consisted of two sets, hereafter referred to as input set I and input set II. Input set I consisted of a single large sequence of musical intervals, which corresponded to the fugue. Input set II contained many sequences, formed by segmenting the fugue wherever there were rests.

The experiment pursued two aims: firstly, to determine whether the model recognizes all instances of the theme in the fugue; secondly, to determine whether any other sequence (or sub-sequence) which was not an instance was misclassified as the theme.

The training of the two SOMs of the model took place in two phases: coarse-mapping and fine-tuning. The initial learning rate was set to 0.5, and the initial size of the neighbourhood was set to the size of the map. In the coarse-mapping phase, the learning rate and the radius of the neighbourhood were reduced linearly, whereas in the fine-tuning phase they were kept constant at 0.01 and 1 respectively. The coarse-mapping phase took 20%, and the fine-tuning phase 80%, of the total number of epochs. The initial weights of both SOMs were drawn randomly in the range between 0 and 0.1.

Different values for the decay rate were tested: in the bottom SOM of model II, from 0.1 to 0.7, and in the top SOM, from 0.5 to 0.9. We present here, however, only the results using decay rates of 0.3 and 0.7 for the bottom and top SOM respectively, for those rates produced the best results across all studies performed. The bottom SOM of model II was tested with a map size of 15 x 15, and was trained for 700 epochs. The top SOM was tested with a map size of 18 x 18, and trained for 850 epochs.

Two transfer functions Λ were tested. The first was given by equation 5, with neighbourhood N*(t) = {i : Φ(i, i*(t)) < 2} and κ = 0.5. The second was also given by equation 5, but with neighbourhood N*(t) = {i : Φ(i, i*(t)) < 4} and κ = 0.25. We report here studies using the second transfer function only, for it produced the better results.

The input layer of the bottom SOM of model II held fifteen units, one for each musical interval ranging from an octave down to an octave up. The representation employed in these units is fully described in section 2.

The experiment comprised five studies. In the last four, in order to study the effect of noise on the classifications, reinforcement in activation was given to input units when they represented instances of the theme. For example, in the second study, note onset and note sustained received activations of 0.1 and 0.07 respectively; when corresponding to instances of the theme, they received instead activations of 0.5 and 0.35 respectively. Table 2 shows the activation values of notes onset and sustained, whether reinforced or not, as well as the input set employed in each study [note 4].

Table 2: Parameter values of the studies

    Study  Input Set  Reinforcement Value  Note Onset  Note Sustained  N. Onset (Reinforced)  N. Sustained (Reinforced)
    I      I          1                    0.1         0.07            0.1                    0.07
    II     I          5                    0.1         0.07            0.5                    0.35
    III    I          10                   0.1         0.07            1.0                    0.7
    IV     I          100                  0.1         0.07            10.0                   7.0
    V      II         100                  0.1         0.07            10.0                   7.0

A sequence (or sub-sequence) S_a is said to have the same classification as that of the theme S_t if the distance Φ(i*_a(z), i*_t(z)) < 2, where i*_a(z) and i*_t(z) are the last winning units of S_a and S_t. If S_a is also an instance of the theme, the mean error of the instance S_a, Ê(S_a), is given by

    Ê(S_a) = [ Σ from z = t'_a to t''_a of Φ(i*_a(z), i*_t(z)) ] / (t''_a - t'_a + 1)     (7)

where t'_a and t''_a are the initial and final TICs of the instance S_a [note 5]. The total mean error is given by the sum of the mean errors of all instances.

We have selected some instances of the theme to present here. Those instances hold some peculiarities, whether in terms of similarity with the theme, convergence to the theme classification, or the context in which they were inserted, which make them representative of the whole set of instances.
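A small sketch of the classification test and of equation 7, under our reading of them; the winner trajectories below are hypothetical stand-ins for the per-TIC winning units i*_a(z) and i*_t(z) in the top SOM map:

    import numpy as np

    def phi(i, j):
        """Equation 4: maximum-norm distance between two map units."""
        return max(abs(i[0] - j[0]), abs(i[1] - j[1]))

    def same_classification(win_a, win_t):
        """Section 4's criterion: S_a shares the theme's classification
        when the last winning units are closer than 2 in the top map."""
        return phi(win_a[-1], win_t[-1]) < 2

    def mean_error(win_a, win_t):
        """Equation 7: average winner distance over the instance's TICs,
        summed from its seventh TIC onwards (footnote 5)."""
        pairs = list(zip(win_a, win_t))[6:]
        return sum(phi(a, t) for a, t in pairs) / len(pairs)

    # Hypothetical winner trajectories on an 18 x 18 top map.
    rng = np.random.default_rng(1)
    win_theme = [tuple(rng.integers(0, 18, size=2)) for _ in range(18)]
    win_inst = [(l, min(c + 1, 17)) for l, c in win_theme]  # near copy
    print(same_classification(win_inst, win_theme),
          mean_error(win_inst, win_theme))

Note that the criterion looks only at the last winning units, so the per-TIC distances plotted in figures 4 to 10 show how, and whether, an instance converges toward the theme's classification as its TICs are consumed.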

Figures 4 to 10 plot, for each selected instance of theme, the distances between each winning unit of the instance and its corresponding winning unit in the theme for each study carried out on the model. We used equation 4 to measure those distances in the map of the top SOM. Table 3 displays the main characteristics of the instances of theme. Table 4 shows, for each study, the total error, and the mean errors of each instance of theme. Table 5 displays the classifications and misclassifications of the studies. Figure 11 plots the total mean error of classifications in accordance with reinforcement provided in the first four studies. 16 14 12 study I study II study III study IV study V Distance 10 8 6 4 2 0 33 36 39 42 45 48 Figure 4: Classifications of the first instance of theme (TICs 33 50) relative to theme. The instance occurs in the fourth voice the highest one concurrently with another voice. The instance differs from the theme in its first two TICs. The classifications of the instance yielded by the model only converge to that of the theme in the fourth and fifth studies. TIC

Figure 5: Classifications of the seventh instance of theme (TICs 265-282) relative to the theme. [Line plot: distance versus TIC, one curve per study I to V.] The instance occurs in the first voice, concurrently with three other voices. The instance is a perfect copy of the theme. The classifications of the instance yielded by the model do not converge to that of the theme in any of the studies.

Figure 6: Classifications of the eighth instance of theme (TICs 273-290) relative to the theme. [Line plot: distance versus TIC, one curve per study I to V.] The instance occurs in the third voice, and is a perfect copy of the theme. At its beginning, the eighth instance occurs concurrently with three other voices; from TIC 283 onwards, it occurs concurrently with just one voice. The classifications of the instance produced by the model converge to that of the theme only in the second, third, fourth, and fifth studies.

Figure 7: Classifications of the eleventh instance of theme (TICs 361-377) relative to the theme. [Line plot: distance versus TIC, one curve per study I to V.] The instance occurs in the third voice, concurrently with two other voices. The instance differs from the theme in its first two TICs, and in two TICs in its middle. The classifications of the instance yielded by the model converge to that of the theme only in the fifth study.

Figure 8: Classifications of the twelfth instance of theme (TICs 441-456) relative to the theme. [Line plot: distance versus TIC, one curve per study I to V.] The instance occurs in the fourth voice, and is a perfect copy of the theme. At its beginning, the instance occurs unaccompanied; from TIC 443 onwards, it occurs concurrently with two other voices, and from TIC 451 onwards, with three other voices. The classifications of the instance produced by the model do not converge to that of the theme in any of the studies.

Figure 9: Classifications of the thirteenth instance of theme (TICs 449-466) relative to the theme. [Line plot: distance versus TIC, one curve per study I to V.] The instance occurs in the second voice, and is a perfect copy of the theme. At its beginning, the instance occurs concurrently with two other voices; from TIC 451 onwards, it occurs concurrently with three other voices, and from TIC 459 onwards, with just one voice. The classifications of the instance yielded by the model converge to that of the theme only in the second, third, fourth, and fifth studies.

Figure 10: Classifications of the fourteenth instance of theme (TICs 457-460) relative to the theme. [Line plot: distance versus TIC, one curve per study I to V.] The instance occurs in the first voice, concurrently with three other voices. The instance is a perfect copy of the theme. The classifications of the instance produced by the model do not converge to that of the theme in any of the studies.

Table 3: Main characteristics of the instances of theme
Voice: voice in which the instance occurs (1: lowest; 4: highest). No. Voices: number of voices sounding concurrently (including the one in which the instance occurs). Dif. TICs: difference in number of TICs between the instance and the theme. Conv.: studies in which the instance converges to the classification of the theme. TIC: TICs in which the instance occurs.

    Voice  No. Voices  Dif. TICs  Conv.     TIC
    1      3           -          4/5       73-90
    1      3           2          4/5       209-226
    1      3           -          4/5       313-330
    1      4           -          -         265-282
    1      4           -          -         457-460
    2      3/4/2       -          2/3/4/5   449-466
    2      3           2          4/5       97-114
    2      4           -          4/5       521-538
    3      2           -          3/4/5     185-201
    3      4/2         -          2/3/4/5   273-290
    3      3           4          5         361-377
    3      3           -          2/4/5     497-514
    4      2           2          4/5       33-50
    4      3           -          3/5       337-353
    4      4           -          4/5       233-250
    4      1/3/4       -          -         441-456

Table 4: Mean errors of the instances of theme

    Instance (TICs)  Study I  Study II  Study III  Study IV  Study V
    33-50            6.78     3.61      1.72       0.56      0.39
    73-90            13.44    4.28      4.67       1.56      0.72
    97-114           9.67     3.83      1.83       0.72      0.39
    185-201          4.76     3.18      1.00       0.47      0.29
    209-226          13.22    5.56      5.89       2.00      2.06
    233-250          14.28    5.61      5.39       2.11      2.61
    265-282          11.06    6.17      5.61       5.00      5.61
    273-290          8.61     7.39      5.61       6.67      7.67
    313-330          12.33    4.94      4.89       1.33      2.72
    337-353          7.41     5.06      2.59       1.18      1.06
    361-377          9.18     4.24      6.35       1.53      2.82
    441-456          8.62     4.00      4.00       4.00      5.19
    449-466          8.72     6.39      6.39       6.56      8.28
    457-460          11.25    10.75     7.25       6.75      11.00
    497-514          8.67     4.06      2.83       1.33      0.89
    521-538          14.44    6.44      6.17       1.89      1.22
    Total            162.44   85.51     72.19      43.66     52.92

Table 5: Classifications and misclassifications of the model

    Study  No. Hits  No. Failures  No. Minor Miscl.  No. Major Miscl.
    I      0         16            2                 5
    II     3         13            0                 6
    III    4         12            0                 14
    IV     11        5             11                0
    V      13        3             6                 0

Figure 11: Total mean error of classifications as a function of the reinforcement value in the first four studies. [Line plot: total mean error (about 30 to 190) versus reinforcement (0 to 100).]

Some conclusions may be drawn from the results.

First, as displayed in table 5, the model produced a high number of misclassifications in the third study. This was because the model classified an intermediate part of the theme as its final part, and consequently kept misclassifying intermediate parts of instances of the theme as their final parts as well.

Second, as shown in tables 4 and 5, the model exhibited similarly good performance in the fourth and fifth studies. In the fourth, it exhibited lower mean errors, whereas in the fifth it presented better results for classifications and misclassifications. The model therefore satisfied the first of the conditions stated in section 4: it was able to perform efficiently both on a set of input sequences and on a set of sub-sequences within a single large input sequence.

Third, by analysing the results displayed in figures 4 to 11 and in table 3, one may observe that the model was fault-tolerant, satisfying the second of the conditions stated in section 4. It properly classified several instances which differed slightly from the theme, whether in the pitch or in the duration of a single note. The model also performed classification efficiently in the presence of noise. When instances of the theme occurred concurrently with other polyphonic voices, the degree of noise was so high that the model failed to classify instances correctly; this may be observed in the results of studies I, II, and III. However, when greater thematic reinforcement was given to instances, the remaining polyphonic voices started playing the role of a noisy background, and the model began to classify instances of the theme correctly, as may be observed in the results of studies IV and V.

Fourth, as may be observed in figures 5, 8, and 10 and in table 3, the model failed, in all studies, to recognize three instances of the theme. It succeeded, however, as shown in figures 6 and 9 and in table 3, in recognizing two other instances in the last four studies. These instances, which occur between TICs 265 and 282, TICs 273 and 290, TICs 441 and 456, TICs 449 and 466, and TICs 457 and 460, overlapped across the voices, making up the two cases of stretto present in the fugue. One may conclude, therefore, that the model did not perform stretto recognition satisfactorily.

Fifth, the model presented a fairly high number of minor misclassifications in the fourth and fifth studies, according to the results in table 5. Those results should not be judged severely, however, because they are much more a case of 'inertia' than a proper error: we counted a minor misclassification whenever the model kept classifying as an instance of the theme the few TICs which immediately followed an instance. With respect to major errors, the model performed perfectly, producing none. The results thus reveal that the third of the conditions stated in section 4 was satisfied.

Sixth, by comparing studies IV and V in tables 4 and 5, one may verify that there is no significant difference between their results. The model exhibited lower mean errors in study IV, but presented better results in terms of classifications and misclassifications in study V. The straightforward conclusion is that the segmentation of the fugue on rests, i.e., the resetting of the model on inputs corresponding to rests, does not yield any improvement in terms of classification.

Seventh, by analysing the relation between the number of voices sounding and the mean errors in tables 3 and 4, one may observe that the recognition of an instance of the theme becomes more difficult as the number of voices sounding simultaneously increases. The only exception is the instance of the theme occurring along TICs 449 to 466: its mean errors in the fourth and fifth studies, 6.56 and 8.28 respectively, are much higher than those in the same studies for the instances along TICs 97 to 114 and 521 to 538, for example. Nevertheless, one must note that the instance along TICs 449 to 466 is the one which, together with the instances along TICs 441 to 456 and 457 to 460, makes up the second stretto present in the fugue. Hence, the entrances of the first instance (TICs 441 to 456) and of the third instance (TICs 457 to 460) probably contributed to those high mean errors.

Finally, there seems to be no relation between the voice in which an instance of the theme occurs and the ease of recognizing it. Let us present three examples taken from tables 3 and 4. The first is given by the instances I_a and I_b, from TICs 185 to 201 and from TICs 33 to 50 respectively. I_a occurs in the third voice, and I_b in the fourth; both occur together with one other voice. The mean errors of I_a are lower than those of I_b. The second example is given by the instances I_c and I_d, from TICs 313 to 330 and 337 to 353 respectively. I_c occurs in the first voice, I_d in the fourth, and two other voices sound concurrently with each. The mean errors of I_c are higher than those of I_d, with the exception of study II. The third example is given by the instances I_e, from TICs 521 to 538, and I_f, from TICs 233 to 250. I_e occurs in the second voice, and I_f in the fourth; three other voices sound concurrently with each. The mean errors of I_e are lower than those of I_f in studies IV and V. Thus, the recognition of instances of the theme was not facilitated by the instance occurring in any specific voice.

5. Conclusion

An original representation for musical sequences and an original artificial neural model for sequence classification have been presented. The model has a topology made up of two self-organizing map networks, one on top of the other. It encodes and manipulates context information effectively, and that enables it to perform sequence classification and discrimination efficiently. The model has application in domains which demand classifying either a set of sequences of vectors in time or sub-sequences within a single large sequence of vectors in time.

The results obtained have shown that the artificial neural model was able to perform sequence classification and discrimination efficiently. The model was able to classify properly most of the instances of the theme occurring in the musical piece. It is worth noticing that the model performed classification even in the presence of noise, i.e., even when instances occurred modified, in different past contexts, and amidst different polyphonic voices in the musical piece.

The model could also discriminate instances of the theme from sequences that shared some similarity with the theme. A great many of these pseudo-instance sequences occurred in the fugue, and the model correctly did not classify them as instances of the theme.

The parameters of the model, and the model itself, are not in any way tied to any specific application, whether a musical piece or any other domain. Indeed, we are currently doing further research on the model, applying it to the electrical engineering domain to predict the demand for electrical load. The first results of this work are to be published in [25].

Footnotes

1. Also known as array, grid, or output layer.
2. A stretto is a musical passage in which two or more instances of the theme overlap.
3. In the fugal domain, the developments of thematic material take place in parts called episodes.
4. Reinforcement was provided from the seventh common TIC between the theme and any of its instances.
5. The initial TIC is the seventh TIC of the instance.

References

[1] T. Kohonen. Self-Organization and Associative Memory. Springer-Verlag, Berlin, third edition, 1989.

[2] J. Kangas. On the Analysis of Pattern Sequences by Self-Organizing Maps. PhD thesis, Laboratory of Computer and Information Science, Helsinki University of Technology, Rakentajanaukio 2 C, SF-02150, Finland, 1994.

[3] G. J. Chappell and J. G. Taylor. The temporal Kohonen map. Neural Networks, 6:441-445, 1993.

[4] D. L. James and R. Miikkulainen. SARDNET: a self-organizing feature map for sequences. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems, volume 7. Morgan Kaufmann, 1995.

[5] H. Sano and B. Jenkins. A neural network model for pitch perception. In P. Todd and D. Loy, editors, Music and Connectionism, pages 42-53. The MIT Press, Cambridge, MA, 1991.

[6] I. Taylor and M. Greenhough. Modelling pitch perception with adaptive resonance theory artificial neural networks. Connection Science, 6(2&3):135-154, 1994.

[7] B. Laden and D. Keefe. The representation of pitch in a neural net model of chord classification. In P. Todd and D. Loy, editors, Music and Connectionism, pages 64-83. The MIT Press, Cambridge, MA, 1991.

[8] D. Scarborough, B. Miller, and J. Jones. Connectionist models for tonal analysis. In P. Todd and D. Loy, editors, Music and Connectionism, pages 54-63. The MIT Press, Cambridge, MA, 1991.

[9] M. Leman. The ontogenesis of tonal semantics: results of a computer study. In P. Todd and D. Loy, editors, Music and Connectionism, pages 100-127. The MIT Press, Cambridge, MA, 1991.

[10] J. Bharucha. Music cognition and perceptual facilitation: a connectionist framework. Music Perception, 5(1):1-30, 1987.

[11] J. Bharucha. Pitch, harmony, and neural nets: a psychological perspective. In P. Todd and D. Loy, editors, Music and Connectionism, pages 84-99. The MIT Press, Cambridge, MA, 1991.

[12] J. Bharucha and P. Todd. Modeling the perception of tonal structure with neural nets. In P. Todd and D. Loy, editors, Music and Connectionism, pages 128-137. The MIT Press, Cambridge, MA, 1991.

[13] P. Desain and H. Honing. The quantization of musical time: a connectionist approach. In P. Todd and D. Loy, editors, Music and Connectionism, pages 150-167. The MIT Press, Cambridge, MA, 1991.

[14] M. Page. Modelling the perception of musical sequences with self-organizing neural networks. Connection Science, 6(2&3):223-246, 1994.

[15] R. Gjerdingen. Categorization of musical patterns by self-organizing neuronlike networks. Music Perception, 7(4):339-370, 1990.

[16] R. Gjerdingen. Using connectionist models to explore complex musical patterns. In P. Todd and D. Loy, editors, Music and Connectionism, pages 138-149. The MIT Press, Cambridge, MA, 1991.

[17] P. Todd. A connectionist approach to algorithmic composition. In P. Todd and D. Loy, editors, Music and Connectionism, pages 173-194. The MIT Press, Cambridge, MA, 1991.

[18] J. Lewis. Algorithms for music composition by neural nets: improved CBR paradigms. In Proceedings of the International Computer Music Conference, pages 180-183. Computer Music Association, 1989.

[19] J. Lewis. Creation by refinement and the problem of algorithmic music composition. In P. Todd and D. Loy, editors, Music and Connectionism, pages 212-228. The MIT Press, Cambridge, MA, 1991.

[20] M. Mozer. Connectionist music composition based on melodic, stylistic, and psychophysical constraints. In P. Todd and D. Loy, editors, Music and Connectionism, pages 195-211. The MIT Press, Cambridge, MA, 1991.

[21] M. Mozer and T. Soukup. Connectionist music composition based on melodic and stylistic constraints. In R. Lippmann, J. Moody, and D. Touretzky, editors, Advances in Neural Information Processing Systems, volume 3, pages 789-796. Morgan Kaufmann, 1991.

[22] O. A. S. Carpinteiro. A hierarchical self-organizing map model for sequence recognition. Neural Processing Letters, 9:1-12, 1999.

[23] O. A. S. Carpinteiro. A hierarchical self-organizing map model for pattern recognition. In L. Caloba and J. Barreto, editors, Proceedings of the Brazilian Congress on Artificial Neural Networks 97 (CBRN 97), pages 484-488, UFSC, Florianópolis, SC, Brazil, 1997.

[24] Z. Lo and B. Bavarian. Improved rate of convergence in Kohonen neural network. In Proceedings of the International Joint Conference on Neural Networks, volume 2, pages 201-206, July 8-12, 1991.

[25] O. A. S. Carpinteiro. A hierarchical self-organizing map model in short-term load forecasting. To appear in the Proceedings of the Fifth International Conference on Engineering Applications of Neural Networks (EANN), Warsaw, Poland, September 13-15, 1999.