Detection of Near-Duplicate Musical Documents from a Multi-Level Comparison of Tonal Information


Julien Allali
LaBRI and Pacific Institute for the Mathematical Sciences, Université de Bordeaux, France, and Simon Fraser University, Canada

Pascal Ferraro
LaBRI and Pacific Institute for the Mathematical Sciences, Université de Bordeaux, France, and University of Calgary, Canada

Pierre Hanna, Matthias Robine, Thomas Rocher
LaBRI, Université de Bordeaux, France

1 Introduction

Plagiarism is the act of copying or including another author's idea without proper acknowledgment. More precisely, in the music industry, a copyright infringement case requires the plaintiff to demonstrate not only that the defendant had access to the plaintiff's song, but also that the two songs are substantially similar. Some famous plagiarism trials took place in the last 50 years. In 1976 and then in 1981, George Harrison was sued for plagiarism over the single My Sweet Lord because of its similarity to the 1963 Chiffons single He's So Fine [1]. More recently, in 2005, a Belgian songwriter, Salvatore Acquaviva, took Madonna to court, claiming that her worldwide 1998 hit Frozen plagiarized a tune he wrote [2].

The number of music documents available on the World Wide Web is increasing rapidly, and so is the risk of plagiarism. For instance, each year, more than […] new record albums are released, which corresponds to over […] new musical pieces registered for copyright (Vogel, 2010). In France, the Société des auteurs, compositeurs et éditeurs de musique (SACEM), a French professional association collecting payments of artists' rights, registered over […] pieces in 2004 (Miyet, 2009). That same year, the SACEM, which is also in charge of checking the originality of new musical pieces in order to limit the number of copyright infringements, was only able to manually investigate a few of them.
Furthermore, this verification is still limited, and a full musical analysis has to be performed by human experts once a complaint is lodged. In order to better protect the rights of artists, new automatic tools that evaluate similarities between music fragments must be proposed.

[1] (accessed Oct 28, 2010)
[2] (accessed Oct 28, 2010)

Some studies in the context of Music Information Retrieval deal with computer-based techniques that may help listeners to retrieve near-duplicate music documents and courts to assess plagiarism. One of the main goals of music retrieval systems is to find musical pieces in large databases given a description or an example. These systems compute a numeric score measuring how well a query matches each piece of the database and rank the pieces according to this score. Computing such a degree of resemblance between two pieces of music is a difficult problem. The performance of the existing systems may strongly depend on musical culture, on personal opinion, on mood, etc.

From a computational point of view, evaluating similarities consists in computing a measure between a pair of musical segments. Three families of methodologies have been proposed (Orio, 2006). Approaches based on index terms generally rely on N-gram techniques (Doraisamy & Rüger, 2003; Uitdenbogerd, 2002), which count the number of distinct terms shared by the query and a potential answer. Geometric algorithms (Ukkonen et al., 2003; Typke et al., 2004; Typke & Walczak-Typke, 2008) consider geometric representations of music and compute distances between objects. Techniques based on string matching (Hanna et al., 2007) are generally more accurate, as they can take into account variations in the query, in the database, or in both. This is a major feature in the context of music retrieval systems, since audio analysis always induces approximations. These techniques often assume a sequential representation of the melody, either as a sequence of notes (Mongeau & Sankoff, 1990) or, in the case of polyphonic tonal music, as a sequence of sets of notes (Hanna et al., 2008), although more structured representations have also been proposed, e.g. (Rizo et al., 2006). However, the harmony of a musical piece can be seen at different levels of decomposition, from notes to the main key (Robine et al., 2009).
In this chapter, Sections 2 and 3 show how these different levels can be structured hierarchically as an ordered rooted tree. With this latter representation, more sophisticated algorithms that compare trees must be introduced in order to compare musical pieces (Section 4). In Section 5, we show how an adaptation of tree-to-tree comparison is used to propose a system for detecting near-duplicate music documents. Finally, Section 6 presents some perspectives and remaining problems in the context of the detection of music plagiarism.

2 Modularity of Harmony

The notion of modularity in music relies on the idea of decomposing a piece of music into elementary constituents and describing their connections. Studying musical modularity thus amounts to studying how harmony can be decomposed into elementary constituents and what the properties of the structures resulting from these decompositions are. Interestingly, modularity obtained from decompositions is also readily noticeable in many natural processes or living organisms (e.g. plants (Godin & Caraglio, 1998)), and in that sense music can be likened to a natural process.

Harmony decomposition may a priori be either artificial or natural. It is artificial if the decomposition is carried out using criteria that have no musical meaning: a musical piece can for instance be decomposed into a set of 10 ms-long sections. On the other hand, the decomposition is natural if the listener uses musical features to identify harmony constituents. In this section, we focus on natural (i.e. musical) decompositions, which have more specific properties than artificial decompositions.

Different types of modularity exist within a musical piece at the same time. Five levels of decomposition are thus discussed, from a sequence of notes to the main key. This decomposition allows tonal properties to be described precisely and will be essential in Section 3 to define a multiscale representation of harmony.
The modularity presented in the following is based on (Robine et al., 2009); however, this decomposition is not unique.

Figure 1: (a) a polyphonic excerpt; (b) the homorhythmic transformation introduced by the simultaneity definition.

2.1 Notes

Notes are the lowest level of harmony. They are well-known components of a musical piece that can be retrieved from audio by automatic transcription (Klapuri & Davy, 2006), or directly from symbolic data. Following a model proposed in (Mongeau & Sankoff, 1990), any monophonic score can be represented as a sequence of ordered pairs, with the pitch of the note as the first component and its length as the second. Since they are the smallest units that carry tonal properties, notes are a fundamental harmonic level for music retrieval purposes.

2.2 Simultaneities

Intuitively, the next higher level, called simultaneity, consists of a monophonic reduction of the notes that sound at the same time. To prevent the problem of overlapping notes in the representation of this level, a simultaneity is defined as a set of notes beginning and ending at the same time. Such a set can consist of one or several notes. Thus, a music piece is represented by a sequence of simultaneities using a homorhythmic transformation, as illustrated in Fig. 1. Music perception is thus preserved, while polyphonic sounds are reduced to a monophonic sequence of simultaneities.

A simultaneity may be represented using four features: a bass (the note with the lowest pitch in the group of notes), a figured bass (the organization of the intervals from the bass), an onset time and a duration. This provides a unique representation of a simultaneity with respect to harmony. Indeed, two simultaneities having both the same bass and the same figured bass (e.g. C4, F4, A4 and C4, A4, C5, F5) are harmonically equivalent to each other. However, many other notations could be used for a simultaneity (Harte et al., 2005).
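The homorhythmic transformation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `Note`, `Simultaneity` and `homorhythmic` are hypothetical, pitches are encoded as MIDI numbers (60 = C4), times are counted in beats, and the figured bass is approximated by the set of interval classes above the bass, taken modulo 12.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Note:
    pitch: int           # MIDI number (assumption: 60 = C4)
    onset: Fraction      # start time, in beats
    duration: Fraction   # length, in beats


@dataclass(frozen=True)
class Simultaneity:
    bass: int            # lowest sounding pitch
    figured_bass: tuple  # interval classes above the bass, modulo 12 (assumption)
    onset: Fraction
    duration: Fraction


def homorhythmic(notes):
    """Cut the timeline at every note onset and offset, so that all notes
    inside one segment begin and end together (a simultaneity)."""
    cuts = sorted({n.onset for n in notes} | {n.onset + n.duration for n in notes})
    result = []
    for start, end in zip(cuts, cuts[1:]):
        # Pitches of the notes that sound during the whole segment.
        sounding = [n.pitch for n in notes
                    if n.onset <= start and n.onset + n.duration >= end]
        if not sounding:
            continue  # silence: no simultaneity is emitted
        bass = min(sounding)
        figures = tuple(sorted({(p - bass) % 12 for p in sounding} - {0}))
        result.append(Simultaneity(bass, figures, start, end - start))
    return result


# A held C4 under E4 followed by G4 yields two simultaneities.
excerpt = [Note(60, Fraction(0), Fraction(2)),
           Note(64, Fraction(0), Fraction(1)),
           Note(67, Fraction(1), Fraction(1))]
for simultaneity in homorhythmic(excerpt):
    print(simultaneity)
```

With this encoding, the two groups quoted in the text (C4, F4, A4 and C4, A4, C5, F5) receive the same bass and the same figured bass. Adjacent segments with identical content are deliberately not merged here, which keeps the sketch short.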
2.3 Chords

The upper level refers to the sequence of chords composing a musical piece. Based on a definition proposed in (Harte et al., 2005), a musical chord may be represented using a root (the note upon which the chord is built), a bass note (or its inversion, defined by the relation between the root and the bass), a type (defined by the component intervals that make up the chord relative to the root), a mode (which may be major, minor or undefined), an onset time, a duration, a degree regarding the key, and a tonal tension with respect to the key. Chords are organized in sequences according to time. Furthermore, depending on the application, different kinds of chord representations may be proposed: absolute, relative to the preceding chord, or relative to the key. We illustrate these differences by representing the chord progression of the excerpt in Fig. 2.

Figure 2: Excerpt of The Phantom Of The Opera (A.L. Webber).

An absolute representation of this sequence, based on the triple (root, mode, length), would be the following:

(D,m,6)(D,m,1/2)(Db,m,1/2)(C,m,1/2)(B,M,1/2)(Bb,M,4)

A relative representation would be achieved using the successive differences of roots in semitones and the ratios of successive lengths:

(0,1/12)(-1,1)(-1,1)(-1,1)(-1,8)

The pair (degree, tension) is an example of a key-relative representation. The chord degree is an integer between 1 and 7, with a value of -1 when the chord root does not belong to the scale of the key. The tension is Lerdahl's distance (Lerdahl, 2001), computed between the chord and the triad of the key:

(1,0)(1,0)(-1,11)(7,9)(-1,12)(6,7)

Several methods for chord detection from audio are based on a windowed analysis (Chew, 2000; Gómez, 2006). Such an analysis gives the same length to all chords, depending on the parameters of the window. However, this uniform sampling does not distinguish chords from simultaneities, which may contain ornamental or single notes. A more sophisticated method, proposed by Temperley (Temperley, 1999), extracts non-uniform chord sequences from symbolic data.

2.4 Local Keys

The sequence of keys composing a musical piece, thus forming the different modulations, is defined at the level of local key. A local key can therefore be represented with parameters such as a tonic (a pitch class and an accidental), a mode, an onset time and a duration. Since key detection has been a very active field for years, many methods have been proposed, based on pitch profiles (Temperley, 1999), chromagrams (Chuan & Chew, 2005), or Hidden Markov Models (Noland & Sandler, 2006). The different key-finding algorithms have proven to be quite efficient, especially with classical music databases [3].
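Returning briefly to the chord representations of Section 2.3, the step from the absolute triples to the relative pairs can be checked with a short sketch. The function names and the pitch-class table are hypothetical, and folding each root difference to the nearest signed interval (so that C to B counts as -1 rather than +11) is an assumption made here to reproduce the values quoted in the text.

```python
from fractions import Fraction

# Pitch classes of the roots used below (flat spellings only, for brevity).
PITCH_CLASS = {"C": 0, "Db": 1, "D": 2, "Eb": 3, "E": 4, "F": 5,
               "Gb": 6, "G": 7, "Ab": 8, "A": 9, "Bb": 10, "B": 11}


def signed_interval(root_a, root_b):
    """Difference of roots in semitones, folded into the range -5..6."""
    diff = (PITCH_CLASS[root_b] - PITCH_CLASS[root_a]) % 12
    return diff - 12 if diff > 6 else diff


def relative(chords):
    """Turn (root, mode, length) triples into (root difference, length ratio)
    pairs between successive chords; the mode is not used."""
    return [(signed_interval(r0, r1), Fraction(l1) / Fraction(l0))
            for (r0, _, l0), (r1, _, l1) in zip(chords, chords[1:])]


half = Fraction(1, 2)
absolute = [("D", "m", 6), ("D", "m", half), ("Db", "m", half),
            ("C", "m", half), ("B", "M", half), ("Bb", "M", 4)]
print(relative(absolute))
```

Applied to the absolute sequence of the excerpt, the sketch yields (0,1/12)(-1,1)(-1,1)(-1,1)(-1,8), matching the relative representation given in Section 2.3. Because only differences and ratios are kept, the result is unchanged by transposition and by a uniform change of tempo.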
Note that this level can be relevant for music retrieval purposes, since similar modulations may indicate similarity in the form of musical pieces.

[3] (accessed Oct 28, 2010)

2.5 Key

The main key of a musical piece is obviously an important parameter of the harmony. A key has the same parameters as a local key (tonic, mode, onset time and duration).

The key of a musical piece may be used for classification purposes in classical Western music. During the classical period, the choice of key was indeed important. For example, (Steblin, 1996) indicates that C Major may be described in terms of purity or simplicity, whereas the feelings induced by Ab Major are related to death or the grave. The key could also be useful for retrieving a musical piece that fits a physical constraint, for instance a song that suits the voice of a singer. Analysis methods for main keys are the same as for local key detection (see Section 2.4).

3 A Multiscale Representation of Harmony

In this section, we build a model to represent harmony that satisfies the requirements derived from the analysis carried out in the previous section.

3.1 List of Sequences

As introduced in (Mongeau & Sankoff, 1990), any monophonic score may be formally regarded as a series of notes, each represented by an ordered pair made of the pitch and the length of the note. This representation as a sequence of notes can be naturally extended to any level of decomposition of a musical piece. A first representation then consists of a list of different sequences, i.e. a sequence of notes, a sequence of simultaneities, a sequence of chords, a sequence of keys and a main key. Although these representations have been extensively used in the literature, they are somewhat limited: without structured information about the harmony, inadequate choices could be made when estimating the similarity. We therefore propose to use a global structure for the representation of the harmony, including the different modularities introduced earlier, instead of a list of sequences.

3.2 Tree Representation

Definitions and Notations

A directed graph $G$ is defined by a pair of sets $(V, E)$, where $V$ denotes the set of nodes and $E$ is a finite set of ordered pairs of nodes called edges.
Let $(v_1, v_2)$ be an edge of $E$; $v_1$ is called a parent of $v_2$ and $v_2$ is a child of $v_1$. The set of children of a node $v$ is denoted by $child(v)$. A node that has no child is called a leaf. $|G|$ represents the number of nodes of $G$. We shall sometimes say that a node $x$ is in $G$, meaning $x \in V$. A directed acyclic graph (DAG) $D$ is a connected directed graph containing no directed cycles. A rooted tree is a DAG such that there exists a unique node, called the root, which has no parent node, and any node different from the root has exactly one parent node. In the following, a rooted tree is simply called a tree.

In this chapter, we consider the set of rooted ordered trees, noted $T$. A rooted tree is said to be ordered if the children of any given node are ordered. These are therefore trees for which the left-to-right order among sibling nodes is significant. In our case this order is given by time. A subtree is a particular connected subgraph of a tree. More precisely, the complete subtree $T[x]$ is the maximal subtree rooted in $x$. Finally, a set of disconnected ordered trees is called a forest.

Each node of a tree can be associated with one or several attributes that represent musical characteristics and consist of either real numbers (e.g. pitch, length) or symbols (e.g. chord type). Let $\alpha$ be a labeling function which associates a label from a finite or infinite set $\Sigma = \{a, b, c, \ldots\}$ with each node and each edge. A score function $s$, called elementary score, is assumed to be defined on labels. A score between nodes of a graph can be defined using the score on labels: $s(v_1, v_2) = s(\alpha(v_1), \alpha(v_2))$. Let $\lambda$ be a unique symbol not in $\Sigma$; $s$ is extended by defining the quantities $s(\alpha(v_1), \lambda)$ and $s(\lambda, \alpha(v_2))$, so that $s$ defines a score on $\Sigma \cup \{\lambda\}$. By convention, the score $s(\alpha(v_1), \lambda)$ between the label of a node $v_1$ and the label $\lambda$ is denoted by $s(v_1, \lambda)$. Tab. 1 shows the score values chosen for the experiments reported below. The score between a note and a rest has been fixed to 0.5 (Hanna et al., 2007).

Table 1: Scores associated with the substitution of two notes as a function of the interval between the notes (pitch difference in semitones, with an additional entry for rests), chosen principally according to consonance.

Harmony Tree

Many works have already proposed ways to structure the representation of music. Methods based on the theories of (Schenker, 1935) or (Lerdahl & Jackendoff, 1985) induce a tree with rule-based reduction algorithms. (Rizo et al., 2006) also propose a tree representation for key guessing based on time reduction. This latter representation implies a hierarchy relying on bars induced by the time signature of the score notation. However, different trees can represent the same melody (same sequence of pitches and durations). In particular, two melodies with two different time signatures are represented by two different musical scores. In this case, the two melodies may sound similar but are represented in different ways.

Figure 3: Ordered tree for harmony representation. Levels from the root to the leaves: key, local keys, chords, simultaneities, notes; siblings are ordered by time.

We present in the following a new multilevel representation of the harmony of a musical piece using an ordered tree. The tree represented in Fig. 3 is ordered according to time and has a depth of five, where each depth corresponds to one of the five levels described in Section 2. Thus, let us define the five depths of the harmony tree:

1. The root of the structure represents the main key of the musical piece.
2. The main key may be decomposed as a sequence of local keys. These local keys, i.e. the successive modulations, are represented by nodes at depth 2 and connected to the root.
3. The third level is the chord sequence, in the way the harmonic background of a jazz standard could be notated, for example. Each chord of this level belongs to a key and is therefore represented by a node linked to a local key above.
4. Simultaneities are represented by nodes at depth 4. A sequence of simultaneities is connected to a parent which represents the chord embedding the corresponding simultaneities.
5. The fifth and last depth of the tree representation contains the sequence of notes. Each note, being a part of a simultaneity, is linked to a parent representing that simultaneity.

Fig. 4 shows how the harmony of a musical excerpt can be represented using the harmony tree.

Figure 4: A short musical piece is shown in (a); (b) shows its tree representation. A simultaneity is represented here by its bass and the number of different pitch classes in its configuration.
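The five depths listed above map directly onto a small ordered-tree data structure. The sketch below is illustrative only: the `Node` class, the helper functions and the labels of the toy excerpt are hypothetical and do not reproduce the tree of Fig. 4. Simultaneities are labeled as in that figure, by their bass and their number of distinct pitch classes.

```python
from dataclasses import dataclass, field

# Depths 1 to 5 of the harmony tree, from the root to the leaves.
LEVELS = ("key", "local key", "chord", "simultaneity", "note")


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)  # ordered by time


def height(node):
    """Number of levels between this node and its deepest leaf, inclusive."""
    return 1 + max((height(child) for child in node.children), default=0)


def level_sequence(root, depth):
    """Labels found at a given depth (1 = root), read from left to right."""
    nodes = [root]
    for _ in range(depth - 1):
        nodes = [child for node in nodes for child in node.children]
    return [node.label for node in nodes]


# Hypothetical excerpt in C major: one local key, two chords.
tree = Node("CM", [
    Node("CM", [
        Node("CM", [Node("C,3", [Node("C"), Node("E"), Node("G")])]),
        Node("GM", [Node("G,2", [Node("G"), Node("B")]),
                    Node("G,1", [Node("G")])]),
    ]),
])

for depth, name in enumerate(LEVELS, start=1):
    print(name, level_sequence(tree, depth))
```

Reading one depth from left to right recovers the corresponding sequence of Section 3.1, so the list-of-sequences representation is contained in the tree, while the parent links add the information about which notes belong to which chord and key.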

4 Measuring Similarity between Trees

The comparison of trees is an important operation applied in several fields, such as molecular biology (Collins et al., 2000) or pattern recognition (Lu, 1979). To compute the similarity between trees, edit distance metrics, initially introduced for the string-to-string comparison problem (Wagner & Fisher, 1974), were first extended to compare ordered trees (Selkow, 1977; Tai, 1979). A distance between two trees is thus computed as the minimum cost of a sequence of elementary operations that converts one tree into the other, thus evaluating the global resemblance between these trees.

However, in many cases trees share only a limited region of similarity. This may be a common domain or simply a short region of recognizable similarity. This case is dealt with by so-called local mapping, in an algorithm developed in (Smith & Waterman, 1981) to evaluate local similarity between strings. Local similarity aims at identifying the best pair of regions, one from each tree, such that the optimal (global) similarity of these two regions is the best possible. This relies on a scoring scheme (called local score) that maximizes a similarity score because otherwise, in the minimization case, an empty sequence of edit operations would always yield the smallest score.

Three edit operations are commonly used: substituting a node x into a node y means changing the label of x into the label of y; deleting a node x means making the children of x become children of the parent of x and then removing x; inserting a node y means that y becomes the child of a node x and that a subset of consecutive children of x (relative to their order) becomes the set of children of y.
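For strings, the local-mapping idea mentioned above can be sketched with the classical dynamic programming table of (Smith & Waterman, 1981). The code below is a minimal illustration on pitch sequences, not the tree algorithm used in this chapter; the function names and the scores (+1 for equal pitches, -1 otherwise, -1 for an insertion or deletion) are hypothetical placeholders rather than the values of Tab. 1.

```python
def local_similarity(a, b, score, gap):
    """Best score over all pairs of contiguous regions of a and b.
    H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1];
    the 0 in the max lets an alignment restart anywhere."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # substitution
                          H[i - 1][j] + gap,                            # deletion
                          H[i][j - 1] + gap)                            # insertion
            best = max(best, H[i][j])
    return best


def toy_score(x, y):
    # Hypothetical elementary score on pitches.
    return 1.0 if x == y else -1.0


melody = [60, 62, 64, 65, 67]
other = [50, 62, 64, 65, 40]
print(local_similarity(melody, other, toy_score, gap=-1.0))  # 3.0
```

Here the shared fragment 62, 64, 65 is found although both sequences differ at their ends. A global edit distance would penalize those differing ends, which is precisely why a maximized local score is used for detecting partial reuse.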
Let $e$ be an edit operation. A score $\sigma$ is assigned to each edit operation (using the score defined on tree nodes) as follows: if $e$ substitutes $x$ into $y$ then $\sigma(e) = s(\alpha(x), \alpha(y))$; if $e$ deletes $x$ then $\sigma(e) = s(\alpha(x), \lambda)$; and if $e$ inserts the node $y$ then $\sigma(e) = s(\lambda, \alpha(y))$. The score $\sigma$ is extended to a sequence of edit operations $E = (e_1, e_2, \ldots, e_n)$ by letting $\sigma(E) = \sum_{i=1}^{n} \sigma(e_i)$. This makes it possible to define a similarity $S(T_1, T_2)$ between trees $T_1$ and $T_2$ as the maximum score of edit operation sequences transforming $T_1$ into $T_2$, namely

$$S(T_1, T_2) = \max_{E \in \mathcal{E}} \{\sigma(E)\}, \quad (1)$$

where $\mathcal{E}$ represents the set of sequences of edit operations transforming $T_1$ into $T_2$. Likewise, we can extend this notion to the similarity between forests, $S_F(F_1, F_2)$.

We consider here an extension of Selkow's algorithm (Selkow, 1977) that computes the local similarity between two trees by considering an optimal sequence of edit operations transforming these two trees (Ouangraoua et al., 2007). In our application, edit operations are constrained such that depth and order relationships between nodes are preserved. The computation of this local similarity makes it possible to detect locally conserved areas between both trees.

Naively, the algorithm to compute the local similarity would need to inspect every pair of regions and apply a global comparison algorithm to it. We propose a generalization of the (Smith & Waterman, 1981) approach based on the notion of prefix mapping between trees.

Definition 1. Let $T$ be a tree rooted in $r$. Any partial subtree of $T$ rooted in $r$ is called a prefix of $T$ or a $T$-prefix. By convention, the empty tree $\theta$ is a $T$-prefix.

Note that a particular prefix of $T$ rooted in $r$ is $T[r]$ itself. Let $T_1$ and $T_2$ be two trees and let $x_1$ and $x_2$ be two nodes of $T_1$ and $T_2$; the sets of $T_1[x_1]$-prefixes and $T_2[x_2]$-prefixes are respectively denoted by $\mathcal{T}_1[x_1]$ and $\mathcal{T}_2[x_2]$. A similar definition can be proposed for a forest:

Definition 2. Let $F$ be a forest made of $n$ trees $T_1, \ldots, T_n$ respectively rooted in $r_1, r_2, \ldots, r_n$. An $F$-prefix is a sub-forest of $F$ made of any prefixes of $T_1, \ldots, T_n$.

The local prefix mapping problem for a given pair $x_1, x_2$ of nodes is to find a (possibly empty) prefix $\rho_1$ of $T_1[x_1]$ and a (possibly empty) prefix $\rho_2$ of $T_2[x_2]$ such that the score of the optimal sequence of edit operations transforming $\rho_1$ into $\rho_2$ is the maximum over all scores of sequences of edit operations between prefixes of $T_1[x_1]$ and $T_2[x_2]$. The score of the sequence solving the optimal local prefix mapping problem (called local score) for a given pair $x_1, x_2$ of nodes is denoted by $LS(T_1[x_1], T_2[x_2])$:

$$LS(T_1[x_1], T_2[x_2]) = \max\{S(\rho_1, \rho_2),\ (\rho_1, \rho_2) \in \mathcal{T}_1[x_1] \times \mathcal{T}_2[x_2]\}. \quad (2)$$

Note that a local prefix problem between two forests $F_1[x_1 \ldots y_1]$ and $F_2[x_2 \ldots y_2]$ is similarly defined as:

$$LS(F_1[x_1 \ldots y_1], F_2[x_2 \ldots y_2]) = \max\{S_F(\rho_1, \rho_2),\ (\rho_1, \rho_2) \in \mathcal{F}_1[x_1 \ldots y_1] \times \mathcal{F}_2[x_2 \ldots y_2]\}, \quad (3)$$

where $\mathcal{F}_1[x_1 \ldots y_1]$ and $\mathcal{F}_2[x_2 \ldots y_2]$ represent respectively the sets of $F_1[x_1 \ldots y_1]$-prefixes and $F_2[x_2 \ldots y_2]$-prefixes. Local similarity between two trees is then defined as the score of the best pair of local prefixes in trees $T_1$ and $T_2$:

$$LS(T_1, T_2) = \max\{LS(T_1[x_1], T_2[x_2]),\ (x_1, x_2) \in V_1 \times V_2\}. \quad (4)$$

In order to evaluate local similarity, the algorithm thus needs first to find the maximum similarity between prefixes of $T_1[x_1]$ and $T_2[x_2]$, for any pair of nodes $(x_1, x_2)$ of $V_1 \times V_2$, and then to determine the best pair of nodes $x_1^{Max}, x_2^{Max}$ of $T_1$ and $T_2$. The local similarity is computed by a dynamic programming algorithm using the following recursive relation:

$$S(F_1[x_1 \ldots y_1], F_2[x_2 \ldots y_2]) = \max \begin{cases} 0 \\ S(F_1[x_1 \ldots y_1 - 1], F_2[x_2 \ldots y_2 - 1]) + S(T_1[y_1], T_2[y_2]) \\ S(F_1[x_1 \ldots y_1], F_2[x_2 \ldots y_2 - 1]) + S(\theta, T_2[y_2]) \\ S(F_1[x_1 \ldots y_1 - 1], F_2[x_2 \ldots y_2]) + S(T_1[y_1], \theta) \end{cases} \quad (5)$$

Note that this equation guarantees that any local similarity is a non-negative real number.

5 A System for Detecting Near-Duplicate Music Documents

In this section, in order to show how a computer-based method is able to automatically detect near-duplicate music documents, we consider some famous examples of plagiarism.
Unfortunately, only a few plagiarism cases available in the literature allow a complete description using a symbolic representation. We have thus restricted our analysis to a small set of instances.

5.1 An Empirical Evaluation

In the early 1970s, a famous music plagiarism trial involved George Harrison and his song My Sweet Lord, released in 1970 on his album All Things Must Pass (Copyright Website LLC, 1995). He was sued for plagiarism of the song He's So Fine, composed in 1963 by Ronald Mack and performed by The Chiffons. Although Harrison explained that he did not knowingly appropriate the melody of this song, the court concluded in 1976 that he had, perhaps unconsciously, copied the melody of He's So Fine.

In order to reach its decision, the court investigated the structure of both songs. Fig. 5 shows two fragments of each of these songs. He's So Fine is composed of four repetitions of a short musical motif (Motif A, Fig. 6), followed by four repetitions of Motif B (Fig. 6). My Sweet Lord has a very similar structure, with four repetitions of Motif A followed by three repetitions of Motif B. The fourth repetition of Motif B includes the grace note illustrated in Motif C. Based on these observations, the court ruled that copyright had been infringed.

Nowadays, judges' rulings in plagiarism trials still rely on similar empirical evaluations of two musical pieces, which calls for accurate automatic methods. Based on the implementation of algorithms to compare trees or sequences, we propose to carry out some experiments to evaluate the capability of such computational methods to assess plagiarism.

Figure 5: Manual transcriptions of excerpts (corresponding to motif A and motif B) of the two songs My Sweet Lord (G. Harrison) and He's So Fine (R. Mack).

Figure 6: Short musical motifs composing the structure of the two songs My Sweet Lord (G. Harrison) and He's So Fine (R. Mack): motif A (top), motif B (middle) and motif C (bottom).

Table 2: Results of a few experiments with copyright infringement cases, considering three different representations (notes, chords and the harmony tree). For each query and each representation (Note, Chord, Tree), the table gives the similarity score of the query with itself, with the associated piece, and with the Essen piece of rank 1.

Case: R. Mack vs G. Harrison (1976). Queries: Sweet Lord; So Fine.
Case: Selle vs Gibb (1984). Queries: Let It End; How Deep.
Case: Heim vs Universal (1946). Queries: Vagyok; Perhaps.
Case: Autumn Leaves vs La Maritza (1974). Queries: Autumn Leaves; Maritza.

5.2 Toward an Automatic System

The following tests concern 4 music copyright infringement cases in the United States over the last 50 years [4]. Two different musical pieces are associated with each case. These two pieces have been declared very similar by a court. Each of these pieces is successively considered as the query, and has been added to a noise collection. The noise collection has been set up from the Essen folksong database, which contains more than 5000 musical pieces symbolically encoded as MIDI files. We expect the system not only to retrieve the query in the collection as the most similar piece, but also to retrieve the associated piece which has been ruled very similar, even though it is potentially harmonically and/or melodically different.

Since the comparison of polyphonic music excerpts remains a difficult problem, these evaluations focus on the comparison of monophonic melodies. Three representations of the melody are considered: as a sequence of notes, as a sequence of chords and as a harmony tree. Since we are evaluating monophonic musical pieces, notes and simultaneities are equivalent. In the same way, we assumed that the musical pieces do not have any modulations, thus the key and local keys are similar and reduced to only one note. Finally, the two levels that contain the main musical information are notes and chords. The note level is directly obtained from the MIDI files. The chord level is computed by the software Melisma Music Analyzer developed by Temperley and Sleator, which automatically estimates the chords according to the notes of the MIDI files (Temperley, 1999). A note is described by a pitch and a length. A chord is described only by one integer according to the line of fifths (similar to the circle of fifths) (Temperley, 1999). In order to evaluate their similarity, melodies are compared using edit-distance based algorithms.
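The evaluation protocol described above (score the associated piece against the query, then compare with the best-scoring piece of the noise collection) can be sketched as follows. Everything here is a hypothetical stand-in: `toy_similarity` is a deliberately simple, transposition-invariant measure (the longest common run of pitch intervals) used in place of the edit-distance algorithms of this chapter, and the melodies are invented rather than taken from the Essen database.

```python
def intervals(pitches):
    """Successive pitch differences, which are unchanged by transposition."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def toy_similarity(p, q):
    """Length of the longest common contiguous run of intervals (stand-in measure)."""
    a, b = intervals(p), intervals(q)
    best, prev = 0, [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def evaluate_case(query, associated, noise, similarity):
    """Compare the score of the associated piece with the rank-1 noise piece."""
    target = similarity(query, associated)
    name, score = max(((n, similarity(query, piece)) for n, piece in noise.items()),
                      key=lambda item: item[1])
    return {"associated_score": target, "rank1_noise": name,
            "rank1_score": score, "retrieved": target > score}


query = [60, 62, 64, 65, 67]
associated = [62, 64, 66, 67, 69]           # the query transposed up a tone
noise = {"a": [60, 60, 60, 60], "b": [60, 62, 61, 63]}
print(evaluate_case(query, associated, noise, toy_similarity))
```

A case counts as retrieved when the associated piece scores strictly higher than every noise piece, which mirrors the comparison with the "Essen Rank 1" entry in Tab. 2.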
As for tree comparison, a natural way to evaluate the differences between two sequences is to compute the maximum score of a sequence of transformations (chosen among a predetermined set of allowed transformations) which must be applied to obtain one sequence from the other, where a score is assigned to each operation. The literature proposes many different sets of transformations, e.g. (Wagner & Fisher, 1974; Smith & Waterman, 1981; Mongeau & Sankoff, 1990). The method developed in (Allali et al., 2007), which is robust to transpositions, is used here to compare sequences.

Tab. 2 shows the results of the different experiments. For each query, it also reports the similarity score of the most similar music piece in the Essen folksong database. This music piece is called Rank 1, as all the pieces are ranked according to their similarity score with the query. Although the representations used are rather simple, these experiments highlight that the tree structure significantly improves the music retrieval system. As indicated in Tab. 2, for each case, at least one sequence-based system (sequence of notes or sequence of chords) is able to extract from the database the piece judged to be a plagiarism by a court ruling. However, these results are highly dependent on the representation. In two cases the comparison of sequences of notes delivers the right verdict, whereas in the two other cases it is obtained from the sequence-of-chords model. Moreover, for the wrong representations, the discrepancy between the best score and the score of the plagiarism may be quite high. For instance, the score obtained by considering only the notes between La Maritza and Autumn Leaves is 11.4, whereas the best score of the Essen database is […]. Yet the system using the harmony tree representation identified the correct plagiarism in every single case, which allows the development of a complete automatic method to help the evaluation of plagiarism.
These results clearly show that considering both the chord and note levels improves the detection of near-duplicate melodies compared with using only a representation as a sequence of notes, though the analysis must be extended to a more complete set of cases.

[4] (accessed Oct 28, 2010)

6 Conclusion and Perspectives

Existing algorithms that can be applied to detect near-duplicate music documents rely on string matching or geometric algorithms. Although these algorithms are generally quite efficient, elements of music theory have to be taken into account in order to improve these existing systems. In particular, the multiscale representation of harmony led to important improvements of these edit-distance based systems. Even though the results presented in this chapter are very promising, they are still limited to a few plagiarism infringement cases. A larger database of court rulings should be used to fully assess the reliability of the system. Furthermore, comparison methods have to be improved for each level of the harmony tree. Then, a complete evaluation of the retrieval system based on this representation has to be carried out. Although many analysis methods already provide harmony parameters of a musical piece, improvements may be obtained by using the structured representation. Indeed, the information at a given level could be pertinent for analyzing the parent level.

Although we did not evaluate this aspect, the method should be powerful enough to detect samples in musical pieces. If a musical document combines small pieces of music from several musical documents, it is still regarded as a plagiarised work even though it has very low similarity with each of them. For example, the song Bitter Sweet Symphony by the English alternative rock band The Verve became famous for the legal controversy surrounding plagiarism charges. This song is based on music from an Andrew Loog Oldham adaptation of a Rolling Stones song, The Last Time. Originally, The Verve had negotiated a licence to use a sample from the Oldham recording, but it was successfully argued that The Verve had used too much of the sample. Technically, since our method identifies the local similarity between two musical pieces (i.e. portions of both musical pieces sharing a maximum similarity), these samples, and hence the plagiarism (or the legal use of a sample), will be identified.

The system presented here may be seen as a first step toward an automatic evaluation of plagiarism infringement. Nevertheless, a human expert will always have to confirm the results obtained from such a system.

References

Allali, J., Ferraro, P., Hanna, P., & Iliopoulos, C. (2007). Local transpositions in alignment of polyphonic musical sequences. In N. Ziviani & R. Baeza-Yates (Eds.), 14th String Processing and Information Retrieval Symposium, volume 4726 of Lecture Notes in Computer Science. Springer.

Chew, E. (2000). Towards a Mathematical Model of Tonality. PhD thesis, MIT, Cambridge, MA.

Chuan, C. & Chew, E. (2005). Fuzzy Analysis in Pitch-Class Determination for Polyphonic Audio Key Finding. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR). London, UK.

Collins, G., Le, S.-Y., & Zhang, K. (2000). A New Method for Computing Similarity Between RNA Structures. In Proceedings of the 5th Joint Conference on Information Sciences, volume 2. Atlantic City, NJ.

Copyright Website LLC (1995). Copyright Website. Copyright Website LLC. Harrison.aspx (accessed Oct 28, 2010).

Doraisamy, S. & Rüger, S. (2003). Robust Polyphonic Music Retrieval with N-grams. Journal of Intelligent Information Systems, 21(1).

Godin, C. & Caraglio, Y. (1998). A multiscale model of plant topological structures. Journal of Theoretical Biology, 191.

Gómez, E. (2006). Tonal Description of Music Audio Signals. PhD thesis, University Pompeu Fabra, Barcelona, Spain.

Hanna, P., Ferraro, P., & Robine, M. (2007). On optimizing the editing algorithms for evaluating similarity between monophonic musical sequences. Journal of New Music Research, 36(4).

Hanna, P., Robine, M., Ferraro, P., & Allali, J. (2008). Improvements of Alignment Algorithms for Polyphonic Music Retrieval. In Proceedings of the Int. Computer Music Modeling and Retrieval Conference (CMMR). Copenhagen, Denmark.

Harte, C., Sandler, M., Abdallah, S., & Gómez, E. (2005). Symbolic Representation of Musical Chords: A Proposed Syntax for Text Annotations. In Proceedings of the International Conference on Music Information Retrieval (ISMIR). London, UK.

Klapuri, A. & Davy, M., Eds. (2006). Signal Processing Methods for Music Transcription. New York: Springer.

Lerdahl, F. (2001). Tonal Pitch Space. Oxford University Press.

Lerdahl, F. & Jackendoff, R. (1985). A Generative Theory of Tonal Music. Cambridge, Massachusetts: MIT Press.

Lu, S. (1979). A Tree-to-Tree Distance and its Application to Cluster Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1.

Miyet, B. (2009). Rapport d'activité. Technical report, SACEM.

Mongeau, M. & Sankoff, D. (1990). Comparison of Musical Sequences. Computers and the Humanities, 24(3).

Noland, K. & Sandler, M. (2006). Key Estimation Using a Hidden Markov Model. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR). Victoria, Canada.

Orio, N. (2006). Music Retrieval: A Tutorial and Review. Foundations and Trends in Information Retrieval, 1(1).

Ouangraoua, A., Ferraro, P., Tichit, L., & Dulucq, S. (2007). Local Similarity Between Quotiented Ordered Trees. Journal of Discrete Algorithms, 5(1).

Rizo, D., Inesta, J., & Ponce de Leon, P. (2006). Tree Model of Symbolic Music for Tonality Guessing. In Proceedings of the 24th IASTED International Conference on Artificial Intelligence and Applications (IAIA). Innsbruck, Austria.

Robine, M., Hanna, P., Rocher, T., & Ferraro, P. (2009). Structured representation of harmony for music retrieval. In Proceedings of the International Computer Music Conference (ICMC). Montreal, Quebec, Canada.

Schenker, H. (1935). Der freie Satz. Vienna: Universal Edition. Published in English as Free Composition, translated and edited by E. Oster, Longman.

Selkow, S. (1977). The Tree-to-Tree Editing Problem. Information Processing Letters.

Smith, T. & Waterman, M. (1981). Identification of Common Molecular Subsequences. Journal of Molecular Biology, 147.

Steblin, R. (1996). A History of Key Characteristics in the 18th and Early 19th Centuries. New York: University of Rochester Press, second edition.

Tai, K. (1979). The Tree-to-Tree Correction Problem. Journal of the Association for Computing Machinery.

Temperley, D. (1999). The Cognition of Basic Musical Structures. The MIT Press.

Typke, R., Veltkamp, R., & Wiering, F. (2004). Searching Notated Polyphonic Music Using Transportation Distances. In Proceedings of the 12th ACM Multimedia Conference (MM). New York, USA.

Typke, R. & Walczak-Typke, A. (2008). A tunneling-vantage indexing method for non-metrics. In Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR). Philadelphia, USA.

Uitdenbogerd, A. (2002). Music Information Retrieval Technology. PhD thesis, RMIT University, Melbourne, Australia.

Ukkonen, E., Lemström, K., & Mäkinen, V. (2003). Geometric Algorithms for Transposition Invariant Content-based Music Retrieval. In Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR). Baltimore, USA.

Vogel, H. L. (2010). Financial Market Bubbles and Crashes. Cambridge University Press.

Wagner, R. & Fisher, M. (1974). The String-to-String Correction Problem. Journal of the Association for Computing Machinery, 21.

Toward a General Framework for Polyphonic Comparison

Fundamenta Informaticae XX (2009) 1 16 1 IOS Press Toward a General Framework for Polyphonic Comparison Julien Allali LaBRI - Université de Bordeaux 1 F-33405 Talence cedex, France julien.allali@labri.fr