A mathematical model for a metric index of melodic similarity

PIETRO DI LORENZO
Dipartimento di Matematica
Seconda Università degli Studi di Napoli
Abstract

What is a melody? When are two melodies similar? Musicians cannot answer these questions operationally, and they often doubt whether any objective definition exists. Moreover, musical and melodic similarity is very difficult to estimate. Signal analysis offers a solution. In this paper each melody is encoded as a sequence of integers that represent the pitches of its notes. Starting from a multidisciplinary point of view, I propose to build a metric index on the space of all simple melodies, based on the cross-correlation function, a well-defined and well-known function in statistics that is used in several natural sciences. Cross-correlation is applied to two melodies, and its value at lag time τ=0 is used to define the index. This real-valued index is empirical, like the Mercalli-Cancani-Sieberg seismic scale. A metric index of melodic similarity turns out to be a practical and helpful instrument for cataloguing musical variants of the same chants (canticulae) in Gregorian chant and in folk melodies. I have applied the method to several melodies to measure the distance between each original melody and its variants. The results agree well with perception. Between a melody and its transposition (up or down) there is no distance. Changing a single note (one without a particular role such as tonic or dominant) yields a small distance. The distance grows as the pitch of the changed note moves further from the original one.
Introduction

There are many different definitions of music and melody. I look for operative definitions of both within signal theory. These concepts are familiar to every musician; nevertheless their usual definitions are not operative but only metaphorical and philosophical.

Musical theory and physical aspects

Despite the subjectivity of contemporary definitions of music and melody, I consider it essential to reach a result that can be accepted unanimously. Musical theory recognizes some features of music that correspond to physical quantities: pitch (height), intensity, rhythm, duration (and time), tonality (harmony), and timbre. But what is really meant by melody? The entry in D.E.U.M.M. seems to be the most comprehensive. Melody is certainly one of the most fundamental elements of music. It is commonly defined as a sequence of sounds organically arranged, i.e. sounds forming an entity of well-defined expressive value. Its structure is seen as a sequence of intervals. As for its temporal development (beyond the linear pattern of musical notes), melody is seen as the horizontal (diachronic) element of a musical structure, whereas harmony is the vertical (synchronic) element. The concept of melody does not necessarily imply continuity of the sound event, which may be interrupted by short pauses. Intonation and rhythm are essential elements of melody, but so are scale, tonal area, timbre, intensity, movement, and articulation.

However, if we equip the geometrical space of melodies with a natural equivalence relation based on the above-mentioned qualities (scale, tone, timbre, ...), we can partition it into significant equivalence classes. Indeed, a melody is always recognizable (I am simplifying) when played on a different scale, with different chords, on one or more instruments, within any intensity range, with variations of intensity rather than of words. Selfridge-Field (pp. 9-12) also classifies melodies on a larger scale (e.g.
as mixed, self-accompanied, submerged, wandering and widespread). The motive is the seed of melody; it is sometimes a short or very short fragment of the sequence, rhythmically characterized by one or more melodic intervals, and it shows a self-contained pattern.

Here I suggest an operative and physical definition of music, in the spirit of signal theory: a signal in time delimited (at its extremes) by the persistent absence of signal or by values considered involuntary or meaningless (noise). In the same physical sense, I state
the following definition of melody: a fragment of signal, i.e. a temporal window delimited by other sounds or by voluntary pauses. Clearly, the delimitation rests on linguistic and Gestalt choices: a self-contained sound pattern, complete features, and the absence of macro-structural redundancy. In brief, something that recalls the necessary and sufficient conditions familiar to mathematicians.

Melodic signal analysis

If music is encoded in numerical form $x_t$, each melody is a proper subset of the signal space. In this case there are no sampling problems such as aliasing, windowing, filtering, digital noise and so on. Let us consider the pair of records $x_t$ and $y_t$,

$$x(t) = x_t, \quad y(t) = y_t, \quad t \in \{1, 2, \ldots, N\},$$

where the time index is discrete. Define the cross-correlation function by

$$R_{xy}(\tau) = \frac{1}{2T+1} \sum_{t=-T}^{T} x_t \, y_{t+\tau}, \quad t \in \{-T, -T+1, \ldots, -1, 0, 1, \ldots, T-1, T\},$$
where $N$ is the size of the record sample and $\tau$ is the lag time ($\tau \le N$). Define the auto-correlation function by

$$A_{xx}(\tau) = \frac{1}{2T+1} \sum_{t=-T}^{T} x_t \, x_{t+\tau}, \quad t \in \{-T, -T+1, \ldots, -1, 0, 1, \ldots, T-1, T\},$$

with $\tau \ge 0$ and $\tau \le N$. The auto-correlation function is symmetric with respect to τ; for the cross-correlation the corresponding relation is $R_{xy}(\tau) = R_{yx}(-\tau)$. The auto-correlation function describes the general dependence of the values of the data at one time on the values at another time. The cross-correlation function of two records describes the general dependence of the values of x at one time on the values of y at another time. If there exists a τ such that $R_{xy}(\tau) = 0$, then $x_t$ and $y_t$ are uncorrelated at that lag. If $R_{xy}(\tau) = 0$ for every $\tau \ge 0$, then $x_t$ and $y_t$ are uncorrelated at every lag and are treated here as independent.

Melodic encoding and model

Here melodies are encoded numerically: every note corresponds to an integer code, according to the rule C=1, C#=2, D=3, etc. Note durations are conventionally referred to a basic musical figure (here a quarter note). Every longer figure is represented by repeating the same code as many times as needed to cover its duration. This code is justified by perception: listeners distinguish the intervals in a melody, that is the distances between notes (physically, logarithms of frequency ratios). The code 0 denotes unintentional silence, rests, pauses or absence of signal.

The model is conceived as a measuring instrument: given an input and a protocol of rules, it returns a numerical value, much like a thermometer. The model is built on the cross-correlation function. The auto-correlation function typically has its maximum at τ=0. In contrast, the cross-correlation function in general has no maximum at τ=0. If a maximum exists, the two data samples $x_t$ and $y_t$ are correlated at the corresponding value of τ, and I then define the distance from $x_t$ to $y_t$ by
$$d(x_t, y_t) = \frac{1}{R_{xy}(\tau)} - 1, \quad d \in \mathbb{R}.$$

Here d is the index of melodic similarity. It is an empirical proximity index, similar to the Mercalli-Cancani-Sieberg seismic scale. This model can analyze every musical signal. I think it is particularly suited to:
1) "pure" melodies (melodies like those shown in Selfridge-Field, pp. 9-12, are not correctly computable);
2) monodic music without accompaniment;
3) homorhythmic pieces;
4) melodies characterized by melodic patterns with conjunct (stepwise) or small intervals.

Experimental data: analyzed melodies and variants

To illustrate an application of the model I select a piece from the monodic repertoire of early western Christian music (briefly, Gregorian chant in the following), for several important reasons.
1) It is a monodic chant.
2) The selected piece is in syllabic style (each neume is a single note). In a modern transcription the neumes are all written with the same symbol (usually a quaver). The transcription does not force us to quantize the sound duration or the metric accent.
3) In Gregorian chant every melody is built on just one selected scale. In medieval music theory there were eight musical modes, and each individual chant had to be assigned to its proper mode. Hence no modulation appears in any melody, since there is no possibility of moving from one scale to another within the same melody.
4) Melodic simplicity. The intervals are small, with just a few small leaps.

This example of application of the model uses the "Stabat Mater". Every neume has a fixed value equal to a quaver, and the final note is always a crotchet. The method is more faithful and powerful in musical applications since no physical sampling procedure is needed.
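The encoding and the distance index described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the function names and the `(name, quarters)` input format are hypothetical; the correlation is evaluated at lag zero, as the abstract indicates; it is computed on mean-removed sequences and normalized (a Pearson-style choice that the text does not spell out, but that makes R(0)=1 for a melody and its transpositions, as the reported results require); and the distance is read as d = 1/R(0) - 1.

```python
import math

# Pitch codes following the paper's rule C=1, C#=2, D=3, ...; 0 marks a rest.
PITCH_CODE = {"C": 1, "C#": 2, "D": 3, "D#": 4, "E": 5, "F": 6, "F#": 7,
              "G": 8, "G#": 9, "A": 10, "A#": 11, "B": 12, "rest": 0}


def encode(notes):
    """Turn (name, duration_in_quarters) pairs into one code per quarter."""
    sequence = []
    for name, quarters in notes:
        sequence.extend([PITCH_CODE[name]] * quarters)
    return sequence


def correlation_at_zero_lag(x, y):
    """Mean-removed, normalized cross-correlation at lag 0 (assumed form)."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return numerator / denominator


def distance(x, y):
    """Assumed reading of the index: d = 1 / R(0) - 1, for positive R(0)."""
    r = correlation_at_zero_lag(x, y)
    if r <= 0:
        raise ValueError("distance undefined for non-positive correlation")
    return 1.0 / r - 1.0


# A toy motive (not taken from the Stabat Mater): C D E D C, last note doubled.
melody = encode([("C", 1), ("D", 1), ("E", 1), ("D", 1), ("C", 2)])
transposed = [code + 1 for code in melody]   # every pitch one semitone up
altered = [2] + melody[1:]                   # first C changed to C sharp
```

With this normalization, `distance(melody, transposed)` is zero up to rounding, while `distance(melody, altered)` is small but positive, which matches the qualitative behaviour the paper reports for transpositions and single-note changes.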
My goal is to compare two melodies and determine when they can be declared similar. I study eight artificial variants of each original melody (namely the ten motives of the Stabat Mater). For example, the variants of the first melody are the following.
V1) Change the pitch of the first C only to C sharp.
V2) Change every C to C sharp.
V3) Transpose all pitches one semitone up.
V4) Transpose all pitches one seventh up.
V5) Double all note values (e.g. from quaver to crotchet).
V6) Replace every crotchet with a quaver and a quaver rest.
V7) Fill the rests of V6 with a fixed interpolation note D.
V8) Fill the rests with a random pitch (generated in the range A4=69 to A5=81). I compute twenty V8 variants and use the mean of their cross-correlation values to obtain the distance.

Data elaboration

In the following, I call $A_{x_j x_j}(\tau)$, for $j \in \{1, 2, \ldots, 10\}$, the auto-correlation function, and $R_{x_j y_m}(\tau)$, for $m \in \{1, 2, \ldots, 8\}$, the cross-correlation function between each original melody (or its doubled variant V5) and the other variants. Only the original melodies and their V5 variants are also analyzed by auto-correlation. I then compute the cross-correlation functions between the original melody and each variant from V1 to V4, and between variant V5 and each of the following variants (V6, V7, V8). Thus very close melodies have a distance near zero and, conversely, quite different melodies have $d(x_j, y_m)$ well above zero.
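The variants V1 to V8 listed above can be expressed as simple operations on an encoded sequence. The following Python sketch is an illustration only: the function names are hypothetical, each quaver is assumed to be one list element, and V4 reuses the transposition function with a semitone count chosen by the reader, since the text does not say whether the seventh is major or minor. The default random range 69 to 81 is the one quoted in the text, which presupposes a MIDI-style numbering rather than the C=1 rule used for the toy melody here.

```python
import random

# Codes from the paper's rule C=1, C#=2, D=3; 0 marks a rest.
C, C_SHARP, D, REST = 1, 2, 3, 0


def v1_first_c_to_c_sharp(seq):
    """V1: change only the first C to C sharp."""
    out = list(seq)
    if C in out:
        out[out.index(C)] = C_SHARP
    return out


def v2_all_c_to_c_sharp(seq):
    """V2: change every C to C sharp."""
    return [C_SHARP if code == C else code for code in seq]


def v3_transpose(seq, semitones=1):
    """V3 (and V4 with a larger shift): transpose, leaving rests untouched."""
    return [code + semitones if code != REST else code for code in seq]


def v5_double(seq):
    """V5: double every note value by repeating each code twice."""
    return [code for code in seq for _ in range(2)]


def v6_note_then_rest(seq):
    """V6: each doubled note becomes the note followed by a rest."""
    return [value for code in seq for value in (code, REST)]


def v7_note_then_fixed(seq, filler=D):
    """V7: the rests of V6 are filled with a fixed note (D by default)."""
    return [value for code in seq for value in (code, filler)]


def v8_note_then_random(seq, low=69, high=81, rng=None):
    """V8: the rests of V6 are filled with random pitches in [low, high]."""
    rng = rng or random.Random()
    return [value for code in seq for value in (code, rng.randint(low, high))]


melody = [1, 3, 5, 3, 1]            # toy motive C D E D C, one code per quaver
doubled = v5_double(melody)         # reference for comparing V6, V7 and V8
with_rests = v6_note_then_rest(melody)
```

Because V6, V7 and V8 have twice the length of the original, they can only be compared element by element with the doubled variant V5, which is consistent with the comparison scheme described in the data elaboration paragraph.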
Results

The auto-correlation patterns $A_{x_j x_j}(\tau)$ of the original melodies are always very similar to the auto-correlation $A_{y_5 y_5}(\tau)$ of their fifth variants. I try to quantify, according to a scientific criterion, the human perception of melodic difference. The cross-correlation patterns are similar to the auto-correlation ones. One can also note (Table 1) that, in general:
1) $R_{x_j y_1}(0)$ is less than 1 and the other peaks of $R_{x_j y_1}(\tau)$ are attenuated;
2) $R_{x_j y_2}(\tau)$ is quite similar to $R_{x_j y_1}(\tau)$ but with barely greater values;
3) $R_{x_j y_3}(\tau)$ and $R_{x_j y_4}(\tau)$ are exactly equal to the auto-correlation $A_{x_j x_j}(\tau)$ of the original;
4) $R_{x_j y_6}(\tau)$ follows a periodically decaying pattern: the cross-correlation is attenuated and flattened, squared off with horizontal segments;
5) $R_{x_j y_7}(\tau)$ is close to $R_{x_j y_6}(\tau)$ but with larger values;
6) $R_{x_j y_8}(\tau)$ is clearly random in amplitude and pattern.

Melody  auto orig  orig-V1  orig-V2  orig-V3  orig-V4  auto V5  V5-V6  V5-V7  V5-V8
1       0          0.005    0.02     0        0        0        40     0.5    2.6
2       0          0.003    0.01     0        0        0        29     0.9    2.2
3       0          0.003    0.01     0        0        0        30     0.4    1.8
4       0          0.003    0.01     0        0        0        29     0.5    1.6
5       0          0.005    0.03     0        0        0        30     0.6    1.6
6       0          0.006    0.01     0        0        0        35     0.8    1.9
7       0          0.003    0.01     0        0        0        32     0.8    1.7
8       0          0.005    0.01     0        0        0        32     0.8    1.8
9       0          0.003    0.02     0        0        0        32     0.7    1.7
10      0          0.002    0.00     0        0        0        28     0.6    1.6

Table 1. Distances between original melodies and variants. Rows are different melodies. Columns are variants (except the first).
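To relate the distances of Table 1 back to correlation values, the index can be inverted. The sketch below assumes that the distance has the form d = 1/R(0) - 1, so that R(0) = 1/(1 + d); this reading of the formula is an assumption, and the function and variable names are hypothetical. The numbers are the row of the first melody in Table 1.

```python
def correlation_from_distance(d):
    """Invert the assumed index d = 1 / R(0) - 1 to recover R(0)."""
    if d <= -1:
        raise ValueError("d must be greater than -1")
    return 1.0 / (1.0 + d)


# Distances for melody 1, copied from Table 1.
melody_1_distances = {
    "orig-var1": 0.005,
    "orig-var2": 0.02,
    "orig-var3": 0.0,
    "var5-var6": 40.0,
    "var5-var7": 0.5,
    "var5-var8": 2.6,
}

implied = {pair: correlation_from_distance(d)
           for pair, d in melody_1_distances.items()}

for pair, r in implied.items():
    print(f"{pair}: d = {melody_1_distances[pair]:g}, implied R(0) = {r:.3f}")
```

Under this assumption a distance of 0 corresponds to R(0) = 1, a distance of 0.5 to R(0) = 2/3, and a distance of 40 to R(0) = 1/41, which shows how strongly the index stretches the low-correlation end of the scale.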
Discussion and conclusions

The proposed model emphasizes single and multiple changes of pitch. The further $R_{x_j y_m}(\tau)$ falls below 1, the farther apart the melodies are, and conversely. The difference in trend between $A_{x_j x_j}(\tau)$ and $R_{x_j y_m}(\tau)$ increases as the perceptive distance rises, whereas $R_{x_j y_m}(\tau=0)$ decreases as the perceptive distance increases.

In the V1 variants there is only a small distance from the original melody: only expert musicians can perceive this change. The V2 variants are likewise similar but at a greater distance (perhaps because the change in tonal structure is perceptively more evident). All V3 and V4 variants are exactly equal to the original, in accordance with the perceptive equivalence of upward or downward pitch transpositions. The V5 variants also coincide with the original, in accordance with perception, i.e. a change of speed is not perceived as a different melody. In the V6 variants the distances are the greatest, in accordance with the increased difficulty of perception (moreover, in Gregorian chant there are no rests!). The V7 variants are similar to the V6 ones but the flattened segments are not horizontal: distances are smaller than for the sixth variants because the interpolated notes are fixed (the principal note of the mode). A large distance also occurs for the V8 variants (random interpolation): here it is very difficult to recognize the original melodic pattern. However, the complex structure of the melody masks the randomly interpolated data, and thus the distance is not the greatest one.
Address for correspondence:
PIETRO DI LORENZO
Dipartimento di Matematica
Seconda Università degli Studi di Napoli
81100, Caserta, Italia - Via Antonio Vivaldi, 43
+39+(0)823/274752 - fax +39+(0)823/274753
e-mail: pietro.dilorenzo@unina2.it
References

AA.VV. (1995): entry "Melodia", in DEUMM, Dizionario Enciclopedico Universale della Musica e dei Musicisti, UTET.
Baroni, M., Dalmonte, R., Jacoboni, C. (1999): Le regole della musica, Torino, EDT.
Bendat, J.S., Piersol, A.G. (1970): Random data, Wiley.
Deutsch, D. (1987): Memoria ed attenzione nella musica, in La musica ed il cervello, eds. McDonald Critchley, R.A. Henson.
Selfridge-Field, E. (1998): Conceptual and Representational Issues in Melodic Comparison, in Melodic similarity, eds. W.B. Hewlett, E. Selfridge-Field.
Zaripov, R.C. (1970): Musica con il Calcolatore, Padova, F. Muzzio.