Detecting Changes in Music Using Compression

Detecting Changes in Music Using Compression BA Thesis (Afstudeerscriptie) written by Arseni Storojev (born April 27th, 1987 in Kazan, Russia) under the supervision of Maarten van Someren and Sennay Ghebreab, submitted in partial fulfillment of the requirements for the degree of BA Kunstmatige Intelligentie at the Universiteit van Amsterdam.

Abstract Using compression to determine the similarity between files, based on a compression distance, seems to work in different fields such as image classification, language classification, music classification and more. In this paper we explore the possibilities of applying compression to clustering music in time and to detecting the changes in music across different periods of music history. We also investigate whether the development of music history was linear, meaning that the differences between music pieces further from each other in time are larger than the differences between pieces from periods closer to each other.

Contents
0.1 Introduction
  0.1.1 Motivation
  0.1.2 Music history
  0.1.3 Related work
0.2 Background
  0.2.1 Changes in music with the time
  0.2.2 Linear development & points of focus
0.3 Method
  0.3.1 Applying compression to classification
  0.3.2 MIDI files and preprocessing
  0.3.3 Datasets
  0.3.4 Implementation
0.4 Results
  0.4.1 Dataset 1
  0.4.2 Dataset 2
  0.4.3 Dataset 3
0.5 Conclusion & Discussion
0.6 Future work

0.1 Introduction

0.1.1 Motivation

Compression methods like those used in popular programs and formats such as WinZIP, WinRAR and tar.gz seem to work well for the classification of different kinds of files. The method is based on the fact that the more structure there is in a file before compression, the smaller the compressed version of this file will be. The same applies to the compression of 2 or more different files together: the more similar the structures of the files are, the smaller the compressed version of these files will be. This makes it possible to measure the distances between files by the sizes of their compressed versions and to cluster them according to their similarity. This approach has proved successful in different fields of study, for example in image, text and music classification, plagiarism detection and more. The application of this method by [4] to music genre classification shows that the method works for musical files and opens a new field of study. Compression can, for example, be applied to attributing music by unknown authors, to classifying files in large music collections without textual input, and to other tasks.

0.1.2 Music history

One point of interest is in musicology. Music is known to be a very structured field: almost all the music pieces written by humans possess features like melody, pitch, tempo, volume, rhythm, tone and others. In fact, music representations like sheet music are a function of these features through time (for instance, melody is nothing other than structured pitch changes in time). Different composers who live in the same time usually work in a small number of styles common for that period, and the pieces of the same time are much more alike than the pieces of different times. Even people who are not familiar with music history can easily classify pieces by genre and time of creation. The birth of new styles and genres is chronological and happens when, due to some kind of evolution, new features appear.

So the features, at least in the usual music styles, should be easy for compression algorithms to compress and can thus be used for classification. In this paper we investigate whether compression can be used for clustering a set of music pieces in chronological order based on their similarity. We will also make some secondary observations about the changes of music features in time, the diversity of different genres, and the assumption that pieces closer to each other in time are more similar to each other than pieces far from each other.

0.1.3 Related work

Music classification by compression has been examined by a number of scientists. [4, 3] have proposed classification by compression in different domains like language trees, literature, the SARS virus, DNA, and especially music, using a Quartet-Tree method for visualization. The experiments with prepared MIDI files proved effective for distinguishing genres and composers. [8] applied compression to a dataset of MIDI files in 3 genres, preprocessed by low-pass filters, derivatives and average pitches in order to ignore transformations in music which are considered equivalent. The classification was then performed using a standard compressor and a simple k-nearest-neighbor algorithm, with good results for the classification of genres and somewhat worse results for the classification of composers. Another work [7] was inspired by [4] and was intended to test classification by compression against other methods like Statistical Language Models and Support Vector Machines. An algorithm for approximating Kolmogorov Complexity based on the Lempel-Ziv encoder was given, and the method was tested using the k-nearest-neighbor algorithm on 770 preprocessed MIDI files. The method appeared to perform better than Statistical Language Models, and an effective representation of MIDI files as a time-pitch function was proposed. None of the related work has so far focused specifically on the evolution of music in time.

0.2 Background

0.2.1 Changes in music with the time

The field of musicology studies, among other aspects, the changes of music in time. There is a common standard classification of genres and styles in music history. Ignoring Early and Medieval music, of which few examples can be found, we decided to take the whole span of music genres in the Western tradition, starting with Baroque and ending with modern pop music. The period we are considering is from 1500 to modern times. The rest of this section is a short description of the main music genres and their properties. We have used [6, 5] for the background information on music history.

Baroque (1600-1760)
Although Baroque music is very diverse, the structure of this kind of music is very distinguishable and some common characteristic features of this period can be mentioned. The melodies of Baroque pieces are long, continuous and uniform and have many ornaments. The mood of the pieces is constant. The rhythmic patterns are very regular and are repeated many times during the piece, often by different voices (instruments). Tempo changes are unusual; the tempo stays the same during the whole piece. The dynamics (the volume) also do not change much, partly due to the construction of the instruments of that period, for which gradual changes in volume were impossible. Another feature of the instruments of that time is that the timbres can be changed, but not gradually, so Baroque music has lots of contrasts in dynamics and timbres. One of the main features of Baroque is that the texture of most of the music of that time is polyphonic, which means that each instrument or voice has its own complex melody, usually different from that of the other instruments, which leads to a complex structure of the pieces. Although these voices are different, parts of their melodies are repeated, duplicated or sometimes even continued by other voices. Another characteristic feature of Baroque music is the use of figured bass (Basso Continuo), the harmonic part of the pieces, provided by a special group of instruments. The typical types and forms of this period are the Fugue, the Chorale, Opera, the Dance Suite and more.

Classical (1730-1820)
In the Classical period, the polyphonic texture of music is abandoned and the pieces become homophonic, meaning there is a leading melody with accompaniment. New forms like the Sonata, Concerto, String Quartet, Symphony and more appear and become widespread. The melodies of the Classical period are shorter and less structured than those of Baroque music and their mood can change very often, step by step or at once, often resulting in a big contrast. The melodies are also much more accessible to ordinary people and can easily be remembered and sung; some of them are even borrowed from popular folk music. Some typical patterns in the melodies are arpeggios and the Alberti Bass (repeating bass patterns in piano music). The rhythmical structure is very flexible: the patterns are often syncopated and change quickly. The dynamics are also rich: the volume changes can be unexpected or gradual, but in any case the volume is not constant. The invention and introduction of new musical instruments, like the piano as we know it, let the musicians control the changes in volume easily, resulting in more variation in dynamics and in gradual developments.

Romantic (1815-1910)
The characteristics of the Romantic period are an extension of the tendencies of the Classical period. The music becomes more and more emotional and expressive and the structures of the music become even more complex. The orchestra of the Romantic period is extended by more instruments and the pieces are longer than before. The instruments become much more sophisticated (for example, the piano gets pedals and a metal frame, and the brass instruments get valves) and musicians with extraordinary playing skill appear, for example the famous violinist Paganini. For practising skills, a new form of piece, called the etude, appears, among some other new common forms (e.g. the Waltz and the Mazurka). Programme music is born, meaning that the pieces are not just beautiful sound compositions, but also tell stories.

The melodies are song-like, with many chromatic harmonies and discords, and recur during the pieces. Dramatic contrasts in pitch and dynamics are usual.

20th century music and Second Viennese School (1900-1930)
The 20th century spans a broad variety of styles and genres in music, partly due to technological progress. Music also becomes more experimental. The traditional consonance in harmony is often replaced by dissonance. The classical 12 notes are sometimes extended by half-tones, quarter-tones and even smaller pitch differences. The instruments are no longer used strictly in the traditional groups like orchestras and quartets, but can be played in all possible combinations. There are also new instruments, not typical for the Western tradition, which are borrowed from other traditions or even invented by the composers. For example, percussion takes its place in modern music and electronic instruments appear, but noise-like sounds from daily life, like clapping hands, body sounds and sounds of kitchen equipment, are also common. The melodies are often unpredictable: the volume and pitch can change chaotically and the rhythmic structures can break all the traditional rules, for example with different voices having different tempos. Although the changes in music can be extreme, lots of genres in the 20th century are still structured and possess characteristic features. We decided to take the classical music of the beginning of this century as one class and to put the other genres and styles in their own categories: Impressionism, Jazz, Rock, Electronic music, and Hip-Hop. The first category we consider in the group of 20th century music is the music of the Second Viennese School, whose composers use some new techniques and rules. The main characteristic feature of this school is the use of 12-tone series instead of traditional melodies, which forms the main change of this music with respect to the Romantic period.

Impressionism (1890-1940)
Impressionism is not only a period in music, but also in the arts and other fields of culture. The impressionist artists were focused on atmosphere, so the stories of programme music, as in the Romantic period, are no longer present. In contrast to the traditional music scales, the composers use dissonance and alternative tone scales like the whole-tone scale and pentatonics. Short forms of music, like the arabesque, prelude or nocturne, are used instead of big structures like the symphony or sonata. The melodies and harmonies of Impressionist music are intended to imitate colors and emotions through pitch, rhythm and tone, creating complex textures.

Jazz (1920-1960)
Originating from the music traditions of African slaves in America, Jazz has taken a solid place in Western music history. Although Jazz has lots of substyles and combinations with other genres, the main features of this style can easily be named. One of the main principles of Jazz is improvisation, meaning that musicians create music in real time instead of playing pieces composed by someone else. The improvisations are based on more or less simple basic melodies with solid rhythmical and harmonic structures. These structures usually remain the same and form a repetitive background for the improvisation by some main voice. The melodies are very rhythmical and syncopated, and are based on different melodic and tonal patterns than in traditional music. The background is a repetition of certain straight rhythmical and bass line patterns. Jazz is often played by small groups of instruments, with voice, saxophone or trumpet as the main voices and double bass and drums as the rhythm section, but in some periods it is also played by a Jazz orchestra, consisting mostly of wind instruments.

Rock (1960-2009)
Rock has its roots in the preceding period of jazz and blues, and some features of those periods can be found again in this genre.
For example, the rhythm section of rock groups is still a combination of drums and a bass instrument, typically a bass guitar, with the addition of a rhythm guitar to create a solid basis for the vocals. The melodies are then performed by voice, keyboard or solo guitar, although other instruments are also frequently used. As rock music is mass music, intended for a broad audience, the songs are usually short and the melodies and accompaniments are relatively simple and accessible to the average listener. Rock has a huge number of subgenres, from melodic blues-based styles to dark and aggressive heavy metal, but all of them are characterized by the electric guitar sound (guitar riffs and ostinatos), singing and the rhythmical background. The pitch and the tempo of rock music are usually constant and the dynamics are not very rich, although contrasts between soft and loud parts of the music are often present. We decided to take 7 subgenres of Rock according to [1] for closer examination; the list of these genres can be found in section 0.3.3.

Electronic music (1990-2009)
Electronic music is a new type of music which appeared after the invention of electronic musical instruments like synthesizers, samplers and drum machines. Although part of the music of this type is a variation of traditional instrumental music, the advances in technology led to totally new features and, consequently, genres. New voices/instruments and sounds, high tempos impossible for human musicians, and an unlimited number of exact repetitions of the same patterns or samples led to the creation of techno, synthpop, drum 'n' bass and other styles. The features of these types of music are: monotonous repetitions of the same pattern in squares, constant tempo, and the use of a fixed number of voices (drum machines/drum kits, bass lines, sometimes vocals). We will focus on popular electronic music, usually used for dancing in discos. We will not consider electronic music as an extension of classical music, where all kinds of random experiments with sound, rhythm and other parameters make it harder to classify. The rhythmic structure of electronic pop music is usually square and solid, although sometimes syncopated, as in genres like drum 'n' bass. The melodies are simple if present: in lots of pieces there is no melody at all, but just a rhythmic structure based on drums and additional samples.
Drums are the main component and hold the whole structure at a constant tempo; variations in tempo are rare. Dynamics are usually absent: the music is played at constant maximum volume. The tempo is the one best suited for dancing, between 130 and 160 BPM.

Hip Hop (1990-2009)
Like the electronic music mentioned above, Hip Hop is characterized by a repeating sampled background, combined with recitative lyrics. As the text is the main part of this music, the background can be a very simple beat or a repetition of samples taken from some other music. This means that, ignoring the text, compressing Hip Hop should be very simple. Hip Hop has evolved considerably in the last 20 years, but the tempo of Hip Hop in the 2000s is slower than that of electronic music and lies somewhere between 80 and 120 BPM. There is usually no melody present, except the oral recitation, but sometimes the producers use samples from other genres, like jazz, to make the music more melodic. The main component of Hip Hop is drums, rooted in traditional African music, and harmony can be totally absent in Hip Hop pieces.

0.2.2 Linear development & points of focus

Music changes in time, and even people who are not familiar with the history of music can approximate the time of creation of almost any piece with quite good precision. In addition, the results of other research [4, 7, 8] show that compression is suitable for clustering genres. We therefore expect compression to also be able to reconstruct the evolution of music in some representation. As described above, some of the changes in music history are gradual, as with the development of Classical music into Romantic music, but some are more revolutionary, for example the change from polyphonic to homophonic music between the Baroque and Classical periods. Another big revolutionary change is the invention of electronic music instruments, particularly sequencers and samplers, which show up in the music in the form of exact repetitions.
We will focus on these 2 revolutionary developments in our experiments and expect to see them reflected in our visualisations (described below). We also expect the evolution of music to be sequential, meaning that an arbitrary period of music will have features more like those of the periods in its direct neighborhood and less like those of periods further away in time. For example, the music of a Baroque composer like Bach will have fewer features in common with modern techno music than with the pieces of Romantic composers like Chopin. For this reason we expect the distances between pieces, approximated by NCD, to be bigger for pieces further from each other in time than for pieces in the direct time neighborhood. The differences with respect to some starting point, for example the year of creation of the earliest piece we are considering, will then get bigger with time. Although the styles and genres did not change uniformly in time, we expect the development to be linear over long periods of time, as composers are human and have limited capabilities to change music fundamentally without building on the features of the music of their time.

0.3 Method

0.3.1 Applying compression to classification

Compression distance & Kolmogorov complexity
A number of papers (see section 0.1.3) state that compression can be successfully applied to different kinds of problems. The idea of compression rests on the Kolmogorov Complexity. Kolmogorov Complexity means that every file has a theoretical minimal number of bits needed to represent it losslessly. So, K(X) is an integer representing the length of the shortest binary version of X from which X can be reproduced losslessly. K(X) is an ideal, non-computable measure: it could only be found by the best possible compression algorithm, which is not known. However, Kolmogorov Complexity can be approximated using the existing compression algorithms. In our case we will use existing compression programs like GZIP, BZIP, WinZIP (or any other lossless compressor) to approximate K(X) by the resulting file sizes. As K(X) is the minimum number of bits needed for representing X, K(X|Y) is the minimum number of bits needed for reconstructing X given some other file Y (so, in our case, representing some music file by means of another music file).
To approximate K(X|Y) we can use the symmetry of algorithmic information theorem by Li and Vitanyi (1997), stating that

$$K(X|Y) = K(XY) - K(Y),$$

where XY is the concatenation of the strings X and Y. The sizes of the compressed versions of the files alone are not enough for classification purposes. We need some distance measure to represent the distances between files. A metric similar to the Euclidean distance, with the following properties, will be used:

$$D(a, b) = 0 \text{ if } a = b \quad (1)$$
$$D(a, b) = D(b, a) \quad (2)$$
$$D(a, b) \le D(a, c) + D(c, b) \quad (3)$$

The practical implementation of the metric is the Normalized Compression Distance, in the following form:

$$NCD(x, y) = \frac{\max(K(y|x), K(x|y))}{\max(K(x), K(y))} \quad (4)$$

For more information on Kolmogorov Complexity and the NCD, please refer to [3].

Dissimilarity matrices
Given a set of files for classification, it is logical to calculate the pair-wise NCD distances between the files and put them into a dissimilarity matrix. The full dissimilarity matrix of this classification will be an N-by-N matrix, with N = number of files. This is not the most accessible notation for humans, so we need to visualize the data in some more user-friendly way. The simplest sufficient representations for a small number of files are visualizations of the 1D or 2D projections of the dissimilarity matrices. The 1D visualization is a line with points, where the distances between the points correspond to the dissimilarities between the files. The 2D visualization is also a representation of the dissimilarities by distances in space, but in this case an extra dimension lets close points group in the same parts of the plot, which is more convenient when a bigger number of files is being analyzed. An alternative Quartet Method for visualization was proposed by Cilibrasi et al. in [4], which seems to be very good for clustering files with each other. All of the above methods can only serve for testing the compression methods on clustering files with each other and do not say anything about the changes in the similarities in time. That is why we need a representation which also takes time into consideration and is suitable for tracing and visualizing the changes of the similarities in time.

NCD distances vs chronological order
If, as in our domain of music and in some other domains, the time of creation of the items is known (e.g. J.S. Bach wrote the Well-Tempered Clavier in 1722, a typical example of the Baroque period), the files containing these items can be placed chronologically (e.g. "File 1 is a typical example of Period 3"). We can also calculate the dissimilarity matrices of these files and scale those matrices to 1D. The 1D scaling of the dissimilarity matrix can then be plotted against the chronological time line in a 2D figure. This means that we can see the development of the files through time. Let us define such a 2D representation. Let the chronological time line T be a scale possessing the same metric properties as mentioned for NCD. The values of T represent periods in time, for example the earliest period is indicated by T = 1. The periods are aligned chronologically. This means that the bigger T is, the later the corresponding period is. Denoting time periods by points on T could be confusing, as we are taking a period of time (e.g. 1550-1700) for a point, but as we only want to know the relations between periods, we can neglect this. This is the first dimension of our representation. The second dimension is a dissimilarity scale. The 1D projection of the dissimilarity matrix preserves the relations between the files, so we can use this projection of the NCD distances as the second dimension in our 2D visualization.
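As an illustration of section 0.3.1, the following minimal Python sketch approximates K by the size of the bzip2-compressed data and builds a pair-wise NCD matrix. It is not the CompLearn implementation used in our experiments; the function names (`csize`, `ncd`, `dissimilarity_matrix`) are hypothetical, the conditional complexity is approximated as K(X|Y) ≈ C(XY) - C(Y) following the symmetry of information rule above, and the diagonal is set to zero by definition rather than computed.

```python
import bz2


def csize(data: bytes) -> int:
    """Approximate K(data) by the length of its bzip2-compressed form."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings.

    K(x|y) is approximated as C(xy) - C(y) and K(y|x) as C(yx) - C(x),
    where C is the compressed size and xy is the concatenation.
    """
    cx, cy = csize(x), csize(y)
    k_x_given_y = csize(x + y) - cy
    k_y_given_x = csize(y + x) - cx
    return max(k_x_given_y, k_y_given_x) / max(cx, cy)


def dissimilarity_matrix(items):
    """Build the N-by-N matrix of pair-wise NCD values.

    The diagonal is set to 0.0 by definition (property 1 of the metric);
    a real compressor would give a small non-zero value there.
    """
    n = len(items)
    return [
        [0.0 if i == j else ncd(items[i], items[j]) for j in range(n)]
        for i in range(n)
    ]
```

Because the order of concatenation influences the compressed size, the matrix produced this way is only approximately symmetric, which matches the behaviour described in section 0.3.4.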
0.3.2 MIDI files and preprocessing

We decided to use MIDI files for our experiments due to their small size and their formal structure. MIDI files are a meta notation of music, similar to sheet music, consisting of a header chunk with information about the piece (e.g. the instruments used) and one or more track chunks, where every channel/instrument has its own track. Each track is a sequence of note-on and note-off events, similar to pressing and releasing piano keys. Every event has a time stamp and a number of parameters, like pitch and velocity (volume). So, the structure of a MIDI file is an almost noise-free representation of features like melodies, tempo, rhythm, and the number and kinds of instruments. MIDI files also contain some meta information, like titles, the program used for creation and the names of instruments, which is not relevant for the classification but could distort the results, so we decided to remove it from the files in our datasets. Each MIDI file was converted into a text file with MIDI commands, after which the unnecessary parameters were removed and the text file was converted back to a binary file. As we were interested in how the compression algorithms work on different kinds of files (e.g. textual files are a completely different representation than binary files) and because the compressors are often based on extracting syntactic features, we decided to use all 3 types of files for our experiments:

1. Original MIDI files as downloaded from the Internet
2. Textual dumps of these MIDI files
3. Binary files, created by removing all the irrelevant parameters from the textual dumps

0.3.3 Datasets

We have used 3 datasets for the experiments with changes of music in time.

Cilibrasi Genres set (Dataset 1)
The first dataset is the original dataset of genres, collected by R. Cilibrasi and others for the experiments in [4]. This dataset consists of pieces in 3 genres: Classic, Jazz and Rock. As the pieces in these genres were written in different periods (e.g. the pieces of Bach and Debussy in the Classic genre have a 2-century gap between them), we decided to consider smaller clusters inside these groups, according to time. The resulting categories are:

1. Classic: Bach (Baroque period), 4 files
2. Classic: Chopin (Classic period), 4 files
3. Classic: Debussy (Impressionism), 4 files
4. Jazz 1: earlier pieces (G. Gershwin, D. Gillespie, etc), 3 files
5. Jazz 2: later pieces (J. Coltrane, M. Davis, etc), 9 files
6. Rock 1: earlier pieces (Beatles, Jimi Hendrix, etc), 7 files
7. Rock 2: later pieces (Metallica, Police, etc), 6 files

10 Genres (Dataset 2)
The second dataset is a self-collected set of MIDI files in 10 categories, described in section 0.2.1. This dataset is the main set used in the experiments. The files were downloaded from different websites with MIDI file archives, randomly taking pieces by the most famous composers of each genre. As the files were downloaded from different sources, were encoded by different people, and differ in length, number of tracks and instruments used, they are suitable for our experiments on classifying music without preprocessing, which test the ability of the method to work on heterogeneous data. This dataset consists of 333 MIDI files in 10 categories, ordered in time:

1. Baroque, 22 files
2. Classical period, 31 files
3. Romantic period, 57 files
4. Impressionism, 22 files
5. 20th century music, 17 files
6. Jazz, 20 files
7. Rock, 54 files
8. Electronic music, 36 files
9. Hip Hop, 26 files
10. Pop music, 31 files

Pop is a very wide genre consisting of very different pieces from the period 1970-2009. Because some test experiments showed very big distances between pieces within the Pop genre, we decided to use only the 300 files in categories 1-9.

7 Subgenres of Rock (Dataset 3)
The last dataset was collected after some experimentation with the first 2 datasets, in order to zoom in on one small genre with fewer parameters and to check the performance of the method where the changes in music are easier to trace.
Another reason is to check the hypothesis that pieces closer to each other in time are more similar to each other than pieces from periods further from each other. The chosen genre is Rock, as it spans a rather short period (1960-2009) and is characterized by a standard set of instruments (2 guitars, bass guitar, drums and voice) and short structured songs, so that the parameters of the MIDI files are relatively alike and only the important changes in the music itself can be extracted by the compressor. This dataset consists of 134 Rock MIDI files in 7 sequential groups representing the 7 subgenres of rock in 1963-2009, according to [1]:

1. Blues Based Rock (1963-1970), 23 files
2. Art Rock (1966-1980), 13 files
3. Punk (1973-1980), 11 files
4. Heavy Metal (1970-1991), 22 files
5. Stadium Rock (1965-1993), 21 files
6. Alternative Rock (1980-1994), 20 files
7. Indie (1980-2007), 20 files

0.3.4 Implementation

We use the CompLearn toolkit by R. Cilibrasi et al. [3] for measuring pair-wise compression distances between the given files and creating the corresponding dissimilarity matrices. The compressor used is the open-source, patent-free bzip2 compressor. The matrices produced are N-by-(N+1) matrices, consisting of a first column containing the given file names and an N-by-N dissimilarity matrix with zeroes on the diagonal, where N = number of files. Because the order in which the files are given to the compression algorithm matters for the resulting compressed file size, the dissimilarity matrices are not symmetrical and show slight differences in the distances between the same pair of files. We ignore these small differences and simply mirror the upper triangle of the matrix onto the lower triangle, around the diagonal.

The further processing takes place in MATLAB, where the dissimilarity matrices are loaded, scaled, analyzed and plotted. We first decided to scale the matrices to 1D and 2D to get a visualization of the distances between files, to see if clustering is successful and works for the given sets. These visualizations can easily be interpreted by humans, as they correspond to the natural Euclidean distances of daily life. The scaling was performed with the standard MATLAB function mdscale. Because some of the dissimilarity matrices created by CompLearn, namely those of the clean binary versions of the MIDI files, could not be scaled by the standard non-metric mdscale (with stress normalized by the sum of squares of the inter-point distances), we decided to use classical multidimensional scaling with the strain [2] criterion for scaling the matrices.
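The symmetrization and the classical scaling to 1D described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the MATLAB mdscale code used in the experiments: the function names are hypothetical, the leading eigenvector of the double-centered matrix is found by simple power iteration, and it is assumed that the eigenvalue of largest magnitude is positive (which holds for Euclidean-like dissimilarities but is not guaranteed for arbitrary NCD matrices).

```python
import math


def symmetrize(d):
    """Mirror the upper triangle onto the lower triangle, zero diagonal."""
    n = len(d)
    return [
        [0.0 if i == j else (d[i][j] if i < j else d[j][i]) for j in range(n)]
        for i in range(n)
    ]


def classical_mds_1d(d, iterations=500):
    """Project a symmetric dissimilarity matrix onto one dimension.

    Classical scaling: double-center the squared dissimilarities and take
    the leading eigenvector, scaled by the square root of its eigenvalue.
    """
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]  # equals column means (symmetric)
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J
    b = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
         for j in range(n)]
        for i in range(n)
    ]
    # Power iteration; the start vector must not be constant, because the
    # constant vector lies in the null space of the centered matrix.
    v = [1.0 + i for i in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # all items coincide
        v = [x / norm for x in w]
        eigenvalue = norm
    return [math.sqrt(eigenvalue) * x for x in v]
```

The sign of the resulting axis is arbitrary, so only the distances between the 1D positions, not their absolute values, should be interpreted.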
The second step was plotting the 1D dissimilarity matrix against the chronological line, consisting of the periods, numbered to their position in time. For example, for one dataset, Baroque was associated with period #10, Classic was assigned #20, Romantic - #30, etc. The numbering of periods in tens was chosen not to renumber all categories in case some sub category will be inserted in the matrix. Every piece in the datasets was assigned a period number at the beginning of the file name, e.g. 10 Bach - Invention6.mid. The 1D similarity matrix was then combined with the column with file names, so that we got 2D matrix consisting the file names vs their position on the similarity scale. As MATLAB automatically converts the strings beginning with numbers, the resulting matrix was a matrix of a period numbers column and a similarities column, with 1 piece per row. The matrix was then plotted by the standard MATLAB function plot. As we expect a linear character of changes of music in time, a correlation between the time line and the similarities, thus the values in these 2 columns, was expected, meaning that for the pieces in the later periods on the chronological line, the position on the similarity scale will be higher than for the pieces in the earlier periods. The corrcoef function of MATLAB was used for finding the correlation coefficients. It is possible that music in some periods in time is much more diverse than in other periods or that there is a huge amount of noise for the files of certain periods. For these reasons we also decided to research which groups of files do have stronger correlations between each other. For example, pairs of groups like Period 1 vs Period 2, Period 1 vs Period 3 or triples like Period 3 vs Period 5 vs Period 7 were taken among all other possible combinations and subsets of all groups. 
The subset analysis was done automatically by a self-written MATLAB function, which extracted the relevant rows of our matrix and computed the correlation coefficient on those rows only. Only relatively strong correlations (above 0.7 and 0.6) were retained, at a significance level of 0.05.
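The filter on correlation strength and significance can be illustrated with the sketch below. It does not reproduce the MATLAB function: as a stand-in for a p-value from corrcoef, it estimates the p-value with a permutation test, which is a swapped-in technique. All names, data and the number of trials are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-constant lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one correction avoids p = 0


def is_strong(xs, ys, min_abs_r=0.7, alpha=0.05):
    """Keep a result only if it is both strong and significant."""
    return abs(pearson(xs, ys)) > min_abs_r and permutation_p_value(xs, ys) < alpha


# Hypothetical period numbers and similarity positions.
periods = [10] * 5 + [20] * 5 + [30] * 5
trend = [0.10, 0.12, 0.15, 0.11, 0.14,
         0.30, 0.28, 0.33, 0.31, 0.35,
         0.52, 0.50, 0.55, 0.49, 0.53]
no_trend_periods = [10, 10, 20, 20, 30, 30]
no_trend = [0.5, 0.1, 0.3, 0.3, 0.1, 0.5]

keep_trend = is_strong(periods, trend)
keep_no_trend = is_strong(no_trend_periods, no_trend)
```

A permutation test makes no distributional assumptions, at the cost of a p-value that is only resolved down to about 1 / (trials + 1).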

Figure 1: Dataset 1 - MIDI (up), BIN (middle), TXT (bottom) files: Time-Similarity and 2D plots

0.4 Results

The described implementation was tested on all 3 datasets. As it is impossible to plot all the period combinations with high correlations, due to the large number of files, we give the plots of the compression classification results on the whole datasets in MIDI, TXT and clean BINARY form. For the second, large dataset we also plot some additional category combinations for illustration. The rest of the results are given as lists of groups with significant correlations.

0.4.1 Dataset 1

The classification of the first dataset produced mixed results. The 2D plots in the left column of figure 1 show that the points are grouped quite well by group number. The groups closer to each other in time are also closer to each other in the visualization. Some groups, like group 6, also seem to be much more uniform than the other groups, so that their points lie close together in the plot. The Timeline-Similarity plots also show good clustering and, in the case of the clean BINARY files, the expected linear development. However, the curves in the Timeline-Similarity plots for the original MIDI files and their TXT dumps do not show any strong linear development. The correlations between time and similarity obtained in the experiments confirm our observations: 0.5213 for the original MIDI files, 0.0268 for their TXT dumps and 0.8100 for the clean BINARY versions of the files.

The experiments with different combinations of period groups give a similar picture: some combinations of groups for the MIDI files show (extremely) high correlations (e.g. 0.995 for groups 3 and 6), the groups of periods for the TXT files show no high correlations, and almost all combinations of groups for the BINARY files have correlations above 0.75; see figures 2 and 3 (due to space restrictions we show only the correlations whose absolute value is above 0.75).

Cilibrasi dataset MIDI
Correlation for groups [1 3] = -0.98975
Correlation for groups [1 4] = 0.8759
Correlation for groups [1 5] = 0.82413
Correlation for groups [1 6] = 0.94136
Correlation for groups [1 7] = 0.86144
Correlation for groups [3 4] = 0.98035
Correlation for groups [3 5] = 0.96723
Correlation for groups [3 6] = 0.99496
Correlation for groups [3 7] = 0.98251
Correlation for groups [1 4 5] = 0.78277
Correlation for groups [1 6 7] = 0.85798
Correlation for groups [3 4 5] = 0.87001
Correlation for groups [3 5 6] = 0.80443
Correlation for groups [3 6 7] = 0.95878
Figure 2: Dataset 1 - correlations > 0.75 between combinations of time periods for MIDI files
Cilibrasi dataset BINARY
Correlation for groups [1 5] = 0.78816
Correlation for groups [1 6] = 0.84371
Correlation for groups [1 7] = 0.85814
Correlation for groups [2 4] = 0.77143
Correlation for groups [2 6] = 0.8528
Correlation for groups [2 7] = 0.87729
Correlation for groups [3 5] = 0.75355
Correlation for groups [3 6] = 0.84849
Correlation for groups [3 7] = 0.87159
Correlation for groups [1 2 5] = 0.79127
Correlation for groups [1 2 6] = 0.84098
Correlation for groups [1 2 7] = 0.85232
Correlation for groups [1 3 5] = 0.77408
Correlation for groups [1 3 6] = 0.83117
Correlation for groups [1 3 7] = 0.84566
Correlation for groups [1 4 5] = 0.78005
Correlation for groups [1 4 6] = 0.82979
Correlation for groups [1 4 7] = 0.84246
Correlation for groups [1 5 6] = 0.7907
Correlation for groups [1 5 7] = 0.80107
Correlation for groups [1 6 7] = 0.84348
Correlation for groups [2 3 6] = 0.83486
Correlation for groups [2 3 7] = 0.85712
Correlation for groups [2 4 6] = 0.83022
Correlation for groups [2 4 7] = 0.85608
Correlation for groups [2 5 7] = 0.76724
Correlation for groups [2 6 7] = 0.82966
Correlation for groups [3 4 6] = 0.8006
Correlation for groups [3 4 7] = 0.82468
Correlation for groups [3 6 7] = 0.82609
Correlation for groups [1 2 3 5] = 0.76543
Correlation for groups [1 2 3 6] = 0.81839
Correlation for groups [1 2 3 7] = 0.83087
Correlation for groups [1 2 4 5] = 0.78802
Correlation for groups [1 2 4 6] = 0.83292
Correlation for groups [1 2 4 7] = 0.84325
Correlation for groups [1 2 5 6] = 0.80961
Correlation for groups [1 2 5 7] = 0.81831
Correlation for groups [1 2 6 7] = 0.85513
Correlation for groups [1 3 4 5] = 0.76905
Correlation for groups [1 3 4 6] = 0.81862
Correlation for groups [1 3 4 7] = 0.83077
Correlation for groups [1 3 5 6] = 0.79958
Correlation for groups [1 3 5 7] = 0.80779
Correlation for groups [1 3 6 7] = 0.8495
Correlation for groups [1 4 5 6] = 0.78125
Correlation for groups [1 4 5 7] = 0.79182
Correlation for groups [1 4 6 7] = 0.83158
Correlation for groups [1 5 6 7] = 0.78811
Correlation for groups [2 3 4 6] = 0.81487
Correlation for groups [2 3 4 7] = 0.83583
Correlation for groups [2 3 5 6] = 0.77459
Correlation for groups [2 3 5 7] = 0.79018
Correlation for groups [2 3 6 7] = 0.84582
Correlation for groups [2 4 5 7] = 0.75748
Correlation for groups [2 4 6 7] = 0.81782
Correlation for groups [3 4 6 7] = 0.7958
Correlation for groups [1 2 3 4 5] = 0.76531
Correlation for groups [1 2 3 4 6] = 0.81192
Correlation for groups [1 2 3 4 7] = 0.82283
Correlation for groups [1 2 3 5 6] = 0.80369
Correlation for groups [1 2 3 5 7] = 0.81152
Correlation for groups [1 2 3 6 7] = 0.84836
Correlation for groups [1 2 4 5 6] = 0.80229
Correlation for groups [1 2 4 5 7] = 0.81093
Correlation for groups [1 2 4 6 7] = 0.84598
Correlation for groups [1 2 5 6 7] = 0.81461
Correlation for groups [1 3 4 5 6] = 0.79
Correlation for groups [1 3 4 5 7] = 0.79795
Correlation for groups [1 3 4 6 7] = 0.83669
Correlation for groups [1 3 5 6 7] = 0.8065
Correlation for groups [1 4 5 6 7] = 0.78089
Correlation for groups [2 3 4 5 6] = 0.76167
Correlation for groups [2 3 4 5 7] = 0.7777
Correlation for groups [2 3 4 6 7] = 0.83021
Correlation for groups [2 3 5 6 7] = 0.78317
Correlation for groups [1 2 3 4 5 6] = 0.79695
Correlation for groups [1 2 3 4 5 7] = 0.80449
Correlation for groups [1 2 3 4 6 7] = 0.8392
Correlation for groups [1 2 3 5 6 7] = 0.81691
Correlation for groups [1 2 4 5 6 7] = 0.80794
Correlation for groups [1 3 4 5 6 7] = 0.7977
Correlation for groups [2 3 4 5 6 7] = 0.77321
Correlation for groups [1 2 3 4 5 6 7] = 0.80995
Figure 3: Dataset 1 - correlations > 0.75 between combinations of time periods for BINARY files

Figure 4: Dataset 2 - Baroque VS Hip Hop MIDI (up), BIN (middle), TXT (bottom) files: Time-Similarity and 2D plots

0.4.2 Dataset 2

As the second dataset consists of 10 groups, of which we used 9 for classification, the 2D and Timeline-Similarity plots contain 300 points, which makes them difficult to read. For this reason, besides the visualization of the classification of all files in the dataset in MIDI, TXT and BINARY form, we include the visualizations of an experiment with 2 groups, namely Baroque vs Hip-Hop.

In the right column of figure 4 we see a clearly successful 2D clustering of Hip-Hop (labelled as group 90) and Baroque in 2 periods (labelled 10 and 15). The visualizations for the MIDI and BINARY files are the best, showing that the variation in Baroque music is much wider than in Hip-Hop, which is clustered more tightly. The plot for the TXT files looks noisier: most points are clustered with their groups, but some lie far outside the clusters.

The Timeline-Similarity plot for the original MIDI files is the best, showing a nearly linear development, confirmed by a correlation of 0.8005. The Baroque groups are less uniform than Hip-Hop and are clustered with each other, while Hip-Hop lies much further away and is clustered together. The plot for the BINARY files looks similar when flipped around the horizontal axis: for some internal reason the ordering comes out the other way around, preserving the relations between files but leading to a declining curve. The pieces inside the Baroque period lie further from each other than in the plot for the original MIDI files, but the correlation is still -0.6704, showing a somewhat weaker linear development. The plot for the TXT dumps of the MIDI files looks like a poor variation of the plot for the BINARY files and is the worst, with a very weak correlation of 0.2333. There are many differences inside the groups and more similarity between files from different groups, so the classification is not successful.

NOTE: the correlations for Baroque vs Hip-Hop do not appear in figure 6, because a separate dissimilarity matrix was created for these 2 genres alone, which resulted in better clustering than with the full dissimilarity matrix. The full matrix contains the distances between all files in all genres, which makes the distances between Baroque and Hip-Hop less important in the whole picture and results in a correlation lower than 0.75.

Figure 5: Dataset 2 - MIDI, BINARY (up) and TXT (bottom) files: Time-Similarity plots

We decided not to include the 2D visualizations of the classification of the whole dataset, due to their poor readability, and to focus on the Timeline-Similarity plots only (figure 5). The plots in the first row of the figure are those for the MIDI and BINARY files, and they are again similar when flipped around the horizontal axis. The third plot, for the TXT files, shows the same pattern in a somewhat different form and resembles the others when flipped upside down. The corresponding correlations are 0.5328, -0.6170 and -0.4279. The results for the BINARY versions of the files are again the best, which is also visible in the plot in figure 5. The experiments with the combinations of subperiods confirm our observations: there are no high correlations for the TXT dumps, a few high correlations for the original MIDI files (figure 6) and many group combinations with high correlations for the clean BINARY versions of the files (figure 7).
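One possible explanation for the mirrored curves (an assumption, not something verified in these experiments) is that a multidimensional scaling solution is only determined up to reflection. Negating all coordinates preserves every pairwise distance, so the sign of the correlation with time is arbitrary and only its magnitude is informative. The sketch below illustrates this with a pure-Python 1D classical scaling on hypothetical toy distances; it uses power iteration rather than MATLAB's mdscale, and all names are illustrative.

```python
import math


def classical_mds_1d(dist, iterations=200):
    """1D classical scaling: dominant eigenvector of the double-centered matrix.

    Power iteration is used for brevity; it assumes the dominant eigenvalue
    is positive, which holds for the Euclidean toy data below.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [scale * x for x in v]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-constant lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical pieces lying on a line, one per period.
true_positions = [0.0, 1.0, 3.0, 6.0]
periods = [10, 20, 30, 40]
dist = [[abs(a - b) for b in true_positions] for a in true_positions]

coords = classical_mds_1d(dist)
flipped = [-c for c in coords]  # the mirrored solution fits the distances equally well

r_coords = pearson(periods, coords)
r_flipped = pearson(periods, flipped)
```

Under this reading, a correlation of -0.6704 and one of 0.6704 describe the same strength of linear relation between time and similarity.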
One interesting observation is that the Baroque genre is not present in the combinations with the highest correlations, probably due to the large diversity between the pieces inside the genre itself. The diversity on the plots also gets less for the periods closer to the

10 genres set MIDI files
Correlation for groups [20 80] = 0.7542
Correlation for groups [40 70] = 0.80867
Correlation for groups [40 80] = 0.82301
Correlation for groups [40 90] = 0.78846
Correlation for groups [40 50 70] = 0.76307
Correlation for groups [40 50 80] = 0.75749
Correlation for groups [40 60 70] = 0.80234
Correlation for groups [40 60 80] = 0.8071
Correlation for groups [40 60 90] = 0.76204
Correlation for groups [40 70 80] = 0.78701
Correlation for groups [40 80 90] = 0.81705
Correlation for groups [40 50 60 70] = 0.7579
Correlation for groups [40 50 80 90] = 0.75458
Correlation for groups [40 60 70 80] = 0.77861
Correlation for groups [40 60 80 90] = 0.79472
Figure 6: Dataset 2 - correlations > 0.75 between combinations of time periods for MIDI files

10 genres set BINARY files
Correlation for groups [20 70] = -0.76508
Correlation for groups [20 80] = -0.7514
Correlation for groups [30 70] = -0.78511
Correlation for groups [30 80] = -0.76846
Correlation for groups [40 70] = -0.81823
Correlation for groups [40 80] = -0.79801
Correlation for groups [40 90] = -0.78126
Correlation for groups [20 60 70] = -0.75269
Correlation for groups [30 40 70] = -0.75449
Correlation for groups [30 50 70] = -0.76657
Correlation for groups [30 50 80] = -0.75421
Correlation for groups [30 60 70] = -0.78473
Correlation for groups [30 60 80] = -0.76688
Correlation for groups [30 70 80] = -0.77964
Correlation for groups [30 70 90] = -0.75836
Correlation for groups [30 80 90] = -0.76603
Correlation for groups [40 50 70] = -0.80551
Correlation for groups [40 50 80] = -0.78657
Correlation for groups [40 50 90] = -0.75741
Correlation for groups [40 60 70] = -0.80887
Correlation for groups [40 60 80] = -0.78048
Correlation for groups [40 70 80] = -0.78271
Correlation for groups [40 70 90] = -0.7584
Correlation for groups [40 80 90] = -0.77961
Correlation for groups [20 30 70 80] = -0.75273
Correlation for groups [30 40 60 70] = -0.76359
Correlation for groups [30 40 70 80] = -0.77855
Correlation for groups [30 40 70 90] = -0.75151
Correlation for groups [30 40 80 90] = -0.75689
Correlation for groups [30 50 60 70] = -0.76813
Correlation for groups [30 50 60 80] = -0.75387
Correlation for groups [30 50 70 80] = -0.77296
Correlation for groups [30 50 80 90] = -0.75876
Correlation for groups [30 60 70 80] = -0.77302
Correlation for groups [30 60 70 90] = -0.75009
Correlation for groups [30 60 80 90] = -0.75473
Correlation for groups [30 70 80 90] = -0.75333
Correlation for groups [40 50 60 70] = -0.79864
Correlation for groups [40 50 60 80] = -0.77263
Correlation for groups [40 50 70 80] = -0.78258
Correlation for groups [40 50 70 90] = -0.75579
Correlation for groups [40 50 80 90] = -0.77505
Correlation for groups [40 60 70 80] = -0.76911
Correlation for groups [40 60 80 90] = -0.75281
Correlation for groups [20 30 60 70 80] = -0.7511
Correlation for groups [30 40 50 60 70] = -0.75154
Correlation for groups [30 40 50 70 80] = -0.77175
Correlation for groups [30 40 60 70 80] = -0.77622
Correlation for groups [30 40 60 80 90] = -0.75233
Correlation for groups [30 40 70 80 90] = -0.76636
Correlation for groups [30 50 60 70 80] = -0.76701
Correlation for groups [30 50 70 80 90] = -0.7516
Correlation for groups [40 50 60 70 80] = -0.77028
Correlation for groups [40 50 60 80 90] = -0.75196
Correlation for groups [30 40 50 60 70 80] = -0.7699
Correlation for groups [30 40 50 70 80 90] = -0.76209
Correlation for groups [30 40 60 70 80 90] = -0.76078
Correlation for groups [30 40 50 60 70 80 90] = -0.75692
Figure 7: Dataset 2 - correlations > 0.75 between combinations of time periods for BINARY files

0.4.3 Dataset 3

The third dataset, consisting of rock pieces in 7 genres, is the least correlated in terms of time and similarity. None of the plots (figure 8), whether MIDI, BINARY or TXT, shows any linear development, and the groups look very similar to each other. The points in the TXT and BINARY plots even form distinct groups inside periods, which are much closer to some groups in other periods than to each other. The correlations are 0.3704, 0.1859 and 0.1315 for the MIDI, BINARY and TXT files respectively. The experiments with the combinations of subperiods also did not show any strong correlations: none of them is above 0.5, even in the best case with the original MIDI files.

Figure 8: Dataset 3 - MIDI, BINARY (up) and TXT (bottom) files: Time-Similarity plots

0.5 Conclusion & Discussion

The results of the experiments show that the method works successfully for clustering individual genres, pairwise or in combinations. The 2D plots confirm that in most cases the MIDI and BINARY files are clustered well, while the TXT files perform somewhat worse. The proposed 2D Timeline-Similarity plots also seem to be effective for showing the changes in the similarity of music over time. For example, we can see that the Baroque, Classical and Romantic periods from dataset 2 have much more internal variation than the more primitive genres like Hip-Hop, Rock or Electronic music. This could mean that music became less varied over time, or that the chosen modern genres are relatively small subgenres of the modern period compared to the big genres of earlier times. So, for example, Hip-Hop is a small subgenre of 20th century music, which will probably not be visible to people 300 years from now, just as the small subgenres of Baroque are not visible to us. In any case we can state that our visualization can successfully represent the variations in features of certain genres.

The chosen revolutionary changes (the change from polyphonic to homophonic music and the invention of the sampler) are not really visible in the plots, apart from the lower variety in sampled music like Hip-Hop and Electronic music in the pairwise period 2D plots and the gap between Baroque and Classical music in some of the Timeline-Similarity plots (figure 5). This could be because music has many more parameters than just texture features like polyphony or the amount of repetition. For example, sampled music also uses samples with complex melodic and harmonic structures, or more instruments, so the fact that a sample is repeated could be less important to the classifier than the structure of the repeated sample.
This raises the question of which features are more important for the chosen compressor.

The assumption that the development of music over time is linear is partly confirmed for different combinations of periods/genres. The relationship can be seen in the Timeline-Similarity plots and in the high correlations between combinations of periods. The correlations are highest for the clean BINARY files and increase when wide and rich genres are left out. From the experiments with the