Visualizing the Chromatic Index of Music


Dionysios Politis, Dimitrios Margounakis, Konstantinos Mokos
Multimedia Lab, Department of Informatics, Aristotle University of Thessaloniki, Greece
{dpolitis, dmargoun}@csd.auth.gr, mokosko@otenet.gr

Abstract
Musical imaging is a recent trend in visualizing hidden dimensions of one-dimensional audio signals. The ascription of colors to psychoacoustic phenomena is consistent with the music perception reflected in the variety of scales and styles of ethnic music. Audio tools based on software engineering techniques are built for visualizing the chrominance of global music.

Key-Words: Color in Music, Psychoacoustics, Scales, Styles and Perception, Audio Tools.

1. Introduction
In our previous WEDELMUSIC paper, entitled "Determining the Chromatic Index of Music", a multidimensional model for musical chromatic analysis was presented in detail [1]. Algorithms were developed to index the chrominance of a scale, as well as the chrominance of a musical piece. The peculiarities of different kinds of music were taken into account so that these special characteristics could be distinguished. Based on these indices, a colorful sequence was finally produced, constituting a unique and exact chromatic representation of a musical composition (see Fig. 1).

The aim of this paper is to present the tool developed during our research on chroma in music [1][2]. MEL-IRIS v.1.0 provides an integrated environment for the chromatic analysis of MIDI and audio pieces, their classification according to their chroma, and the visualization of their chromatic index in real time. The name MEL-IRIS derives from the words "Melodic Iris": Iris, pronounced Irida in Greek, is the Greek goddess of the rainbow and is therefore associated with colors. MEL-IRIS was mainly developed in Borland C++ Builder, while MATLAB was used for the initial processing of audio files.

Figure 1. Chromatic strips produced by MEL-IRIS.

2. Goals
The main goal of this research effort is to suggest a new music classification schema based on musical chroma. MEL-IRIS is designed to process musical pieces from audio servers, create a unique chromatic index for each of them, and classify them according to this index. The chromatic indices are metadata that can be utilized in a wide range of applications, e.g. MIR systems. They can serve as a musical genus identifier, or even as an artist identifier. Genuses or genres are not perceived merely as Western music predicates [3] but as concepts of ethnic music perception in diachrony [4] and synchrony [5][6]. A colorful strip can be associated with a musical piece, serving both as a signature and as a classifier. Further processing of the colorful strips could lead to elaborate real-time animation based on the chromatic elements of a song, or even to some kind of algorithmic audio-visual show. Finally, a music composer can take advantage of the chromatic indices in order to chromatically process his own compositions. Considering the primary correspondence between chromatic sequences and feelings, an artist is able to fix the desired, or even an added, emotional value in his own musical pieces.

3. Theoretical background
Next, the basic principles applied in MEL-IRIS are briefly discussed.

The mathematical background of our application is presented analytically in our previous WEDELMUSIC paper [1]. To start with, we define as chromatic any sound whose frequency does not coincide with the discrete frequencies of the scale. In proportion to the distance of the intervals that this sound forms with its neighbors (the previous and the next sound), we can estimate how chromatic the sound is. In order to approximate modes other than the Western ones, we further subdivided the note spaces, using the half-flat and half-sharp signs of Arabic music [7]. As a result of this addition, the minimum space between two notes is the quartertone.

For a personal computer to map a frequency (from the whole spectrum of frequencies) to a note in a meaningful way, microtonal thresholds were defined for each note, by means of a formula that uses the note A4 = 440 Hz as a benchmark (see Table I).

Note              Low threshold (Hz)   High threshold (Hz)
C half-sharp 5    530.86               546.415
C# 5              546.415              562.433
D half-flat 5     562.433              578.914
D 5               578.914              595.873
D half-sharp 5    595.873              613.333
Table I. Oriental scales: microtonal spectrum thresholds.
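As a rough illustration of this threshold computation, the following sketch assumes that the quartertone grid is anchored at the A4 = 440 Hz benchmark and that the thresholds of each note lie an eighth-tone (a factor of 2^(1/48)) to either side of its center; these assumptions reproduce the values of Table I, but they are not taken from the MEL-IRIS source, and the function names are ours.

#include <cmath>
#include <cstdio>

// Assumed quartertone grid anchored at A4 = 440 Hz:
//   center frequency of step k:  f(k) = 440 * 2^(k/24)
//   low/high thresholds:         f(k) * 2^(-1/48)  and  f(k) * 2^(+1/48)
int nearestQuartertoneStep(double freqHz) {
    // Number of quartertones away from A4, rounded to the nearest step.
    return (int)std::lround(24.0 * std::log2(freqHz / 440.0));
}

void thresholds(int step, double& lowHz, double& highHz) {
    double center = 440.0 * std::pow(2.0, step / 24.0);
    lowHz  = center * std::pow(2.0, -1.0 / 48.0);
    highHz = center * std::pow(2.0,  1.0 / 48.0);
}

int main() {
    // Example: 560 Hz falls inside the C#5 range of Table I (546.4 - 562.4 Hz).
    double lo, hi;
    int k = nearestQuartertoneStep(560.0);   // 8 quartertones above A4 = C#5
    thresholds(k, lo, hi);
    std::printf("step %+d from A4, thresholds %.2f - %.2f Hz\n", k, lo, hi);
    return 0;
}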
The procedure of the chromatic analysis is serial and consists of five steps (the output of one step is the input to the next):
1. Extraction of the melodic sequence (frequencies)
2. Scale matching
3. Segmentation
4. Calculation of the chromatic values
5. Creation of the color strip

The procedure for melody isolation is obviously not identical for MIDI and audio files. This difference has led to special handling of .wav and .mp3 files, using sonogram analysis in the MATLAB environment. The remaining steps are identical for both MIDI and audio files, with an exception in step 3 (segmentation), where audio files may optionally be treated in a special way.

In step 2, a simple algorithm is used to extract the dominant frequencies of a melody and, according to the spaces they create among them, a scale is matched to the musical piece. The selected scale is the one whose spaces best approximate the spaces of the dominant frequencies; essentially, the scale that yields the minimum error (calculated from the absolute values of the differences between the various combinations of spaces) is chosen.
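One possible reading of this matching criterion is sketched below: each candidate scale from the Scale Bank is scored by the total absolute deviation of the observed spaces from the closest interval the scale contains, and the scale with the smallest total error is selected. The data structures and names are hypothetical, and the actual MEL-IRIS algorithm may combine the spaces differently.

#include <algorithm>
#include <cmath>
#include <limits>
#include <string>
#include <vector>

// Hypothetical representation of a Scale Bank entry: a set of interval
// spaces, expressed in quartertones, plus the chroma stored with the scale.
struct Scale {
    std::string name;
    std::vector<int> spaces;
    double chroma;
};

// Error of one observed space: distance to the closest interval of the scale.
static double spaceError(double observed, const Scale& s) {
    double best = std::numeric_limits<double>::max();
    for (int space : s.spaces)
        best = std::min(best, std::fabs(observed - space));
    return best;
}

// Pick the scale with the minimum accumulated error over all observed spaces
// between the dominant frequencies of the melody.
const Scale& matchScale(const std::vector<double>& observedSpaces,
                        const std::vector<Scale>& bank) {
    const Scale* best = &bank.front();
    double bestErr = std::numeric_limits<double>::max();
    for (const Scale& s : bank) {
        double err = 0.0;
        for (double sp : observedSpaces) err += spaceError(sp, s);
        if (err < bestErr) { bestErr = err; best = &s; }
    }
    return *best;
}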

The standard segmentation method used in the application is a modified version of the Cambouropoulos-Widmer algorithm [8]. Heuristics are also used, and the algorithm is constrained by rules such as:
- IF (time > 1 sec) AND (no sound is played) THEN split the segment at exactly the middle of the silence;
- segments that contain fewer than 10 notes are not allowed.
The alternative for audio files (if the user does not want to use the standard method) is the automated segmentation, which is based on the features of the particular wavelet.

The initial chroma of a musical piece (c0) is the chroma c of the chosen scale. According to the sequence of frequencies produced in the first step, each space affects the current c value (a possible increment or reduction), creating in this way a continuous chromatic c value for the musical piece (see Fig. 2).

Figure 2. A c-time diagram of an audio file (FAIRUZ: Ya Zambaa, WAV).

The number of c values is equal to the number of notes that comprise the melody. These c values produce the final colorful strip. This chromatic visualization (see Fig. 1) consists of boxes, each of which represents a segment. The length of a box is proportional to the duration of the segment it represents, which results in the real-time lengthwise creation of the chromatic strip. The basic color of a segment is the average <c> of the c values that correspond to all the notes of the particular segment. As the creation of a box comes near its end, the basic color changes in order to achieve a smooth transition to the basic color of the next segment. A 12-grade color scale was designed to map c values to colors [9]. In Table II the colors are arranged in chromatic order, beginning from white and ending at black [10].

c      Color
1      White
1.1    Sky Blue / Turquoise
1.2    Green
1.3    Yellow / Gold
1.4    Orange
1.5    Red
1.6    Pink
1.7    Blue / Royal Blue
1.8    Purple
1.9    Brown
2      Gray
2.1    Black
Table II. Correspondence between colors and c values.

The actual color of each segment is determined by the combination of the R, G and B variables (Red, Green, Blue) [11]. Given the average <c>, the values of R, G and B are calculated from the functions in Table III.

c (input)        R, G, B (output)
c <= 1           R = G = B = 255
1 < c <= 1.1     R = -2550c + 2805,  G = B = 255
1.1 < c <= 1.2   R = 0,  G = 255,  B = -2550c + 3060
1.2 < c <= 1.3   R = 2550c - 3060,  G = 255,  B = 0
1.3 < c <= 1.4   R = 255,  G = -1270c + 1906,  B = 0
1.4 < c <= 1.5   R = 255,  G = -1280c + 1920,  B = 0
1.5 < c <= 1.6   R = 255,  G = 0,  B = 2550c - 3825
1.6 < c <= 1.7   R = -2550c + 4335,  G = 0,  B = 255
1.7 < c <= 1.8   R = 1280c - 2176,  G = 0,  B = -1270c + 2414
1.8 < c <= 1.9   R = 128,  G = 0,  B = -1280c + 2432
1.9 < c <= 2     R = 128,  G = B = 1280c - 2432
2 < c <= 2.1     R = G = B = -1280c + 2688
c > 2.1          R = G = B = 0
Table III. Calculation of the R, G and B variables.
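For illustration, the piecewise functions of Table III can be collected into a single mapping routine. This is a sketch rather than the MEL-IRIS source; the rounding and clamping of intermediate values are our own assumptions.

#include <cmath>

struct RGB { int r, g, b; };

// Piecewise linear mapping of the average chroma <c> of a segment to an RGB
// color, following Table III (white at c <= 1, black at c > 2.1).
RGB chromaToColor(double c) {
    auto px = [](double v) {
        return (int)std::lround(std::fmin(255.0, std::fmax(0.0, v)));
    };
    if (c <= 1.0) return {255, 255, 255};
    if (c <= 1.1) return {px(-2550*c + 2805), 255, 255};
    if (c <= 1.2) return {0, 255, px(-2550*c + 3060)};
    if (c <= 1.3) return {px(2550*c - 3060), 255, 0};
    if (c <= 1.4) return {255, px(-1270*c + 1906), 0};
    if (c <= 1.5) return {255, px(-1280*c + 1920), 0};
    if (c <= 1.6) return {255, 0, px(2550*c - 3825)};
    if (c <= 1.7) return {px(-2550*c + 4335), 0, 255};
    if (c <= 1.8) return {px(1280*c - 2176), 0, px(-1270*c + 2414)};
    if (c <= 1.9) return {128, 0, px(-1280*c + 2432)};
    if (c <= 2.0) return {128, px(1280*c - 2432), px(1280*c - 2432)};
    if (c <= 2.1) return {px(-1280*c + 2688), px(-1280*c + 2688), px(-1280*c + 2688)};
    return {0, 0, 0};
}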

4. MEL-IRIS v.1.0: A short description of the audio tool
The MEL-IRIS project is programmed with the Borland C++ Builder 6 compiler and uses the Paradox database. It runs under any Microsoft Windows operating system and is fully functional both on stand-alone systems and on networks, where users can share, view, edit and search existing records in the same database, as long as the Borland Database Engine is installed. It supports internal multi-window viewing and requires Microsoft Media Player for the playback of audio files.

4.1. Frequency and segment extraction
Opening an audio file triggers an internal automated editor (for MIDI files) or a sonogram analyzer (for other audio formats), which isolates the melody according to the file format. In MIDI files, for example, notes and frequencies are represented as binary-coded events (see Fig. 3) and are converted into real-world representations such as notes, delta times and velocities, together with other essential information such as tempo and time signature, which helps us estimate the exact time of each note (see Fig. 4).

Figure 3. The binary representation of a MIDI file.

Figure 4. The real-world representation of a MIDI file.

During this step, the candidate segments of the audio file and their times in milliseconds are calculated using a modified algorithm derived from the Cambouropoulos-Widmer clustering algorithm [8]. Finally, the user has the opportunity to save the notes, frequencies and segments in text files for further examination and analysis, which are also needed by the other parts: the file conv.txt contains the frequency of each note, segments.txt the number of notes in each segment, and times.txt the partial segment times of the audio file in milliseconds.

4.2. Chroma extraction
This part uses the files created during frequency and segment extraction. Opening these files automatically displays the scale distribution of the audio file, computed by our scale algorithm, along with a prompt to name the song, so that our file system can be kept in order and the name can be used in the chromatic categorization. The scale distribution consists of seven values, which are unique for every audio file (see Fig. 5).

Figure 5. Scale distributions.

At this point the user first runs scale match and then index chroma (see Fig. 5). The former automatically finds the most suitable scale, mode and chroma, drawing on our Scale Bank, a database of scales and modes from Western, Balkan, Arabic, Oriental, Ancient Greek and Byzantine music, each of which has a unique chroma value (see Fig. 6). The latter assigns the song to one of five categories that show how chromatic it is, depending on the chroma of the scale selected by scale match (see Fig. 7). The five categories are:
- Very Low Chromatic (scale chroma <= 1.3)
- Low Chromatic (1.3 < scale chroma <= 1.6)
- Medium Chromatic (1.6 < scale chroma <= 1.9)
- High Chromatic (1.9 < scale chroma <= 2.2)
- Very High Chromatic (scale chroma > 2.2)

Figure 6. Scale Bank.

Figure 7. Song classification.
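These category boundaries translate directly into a simple classifier; the sketch below uses a function name of our own choosing.

#include <string>

// Classify a piece by the chroma of its matched scale, using the five
// intervals listed above.
std::string chromaticCategory(double scaleChroma) {
    if (scaleChroma <= 1.3) return "Very Low Chromatic";
    if (scaleChroma <= 1.6) return "Low Chromatic";
    if (scaleChroma <= 1.9) return "Medium Chromatic";
    if (scaleChroma <= 2.2) return "High Chromatic";
    return "Very High Chromatic";
}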

The attributes that are kept for further use for each song are its chroma, tone, name and origin (see Fig. 8). From these attributes, together with the scale distribution and the segment file, a sample for the visual representation of the audio file is created: the file deigma.txt is the per-segment sample of the audio file, consisting of a time value in milliseconds, a chroma value and a brightness value for each segment.

Figure 8. Scale attributes.

4.3. Visual representation
In this part the user selects the audio file he wants to play. While he listens to the music, a chromatic strip is filled in. The coloring procedure begins the exact moment playback starts, and the coloring of the strip is synchronized with the playback of the audio file: internal CPU time is used to calculate both the coloring delay, which is taken from the sample file created during chroma extraction, and the audio file delay (every millisecond is converted to CPU ticks). The refresh rate of the chromatic strip is one millisecond, for better visualization. About two pixels of the strip are colored for every second that passes, using a step method. According to the chroma and brightness values of each segment, taken from the sample file, an RGB color is chosen for every pixel using a color-mapping algorithm (see Fig. 9). At the end of each segment a black pixel is drawn (see Fig. 9), marking the end of the segment. The user can see the exact time in milliseconds at which each segment ended while the audio file plays (see Fig. 10), and can also pause or resume the process for further examination of the visualization. Finally, all chromatic strips are saved in a personal database keyed by the name of the song, in order to keep track of our experiments with the visual representation of the audio files.

Figure 9. Chromatic strips.

Figure 10. Partial-segmentation time.
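The timing logic can be shown schematically as follows; the layout of the sample-file entries and all names are our assumptions, and the actual MEL-IRIS drawing code (which works with CPU ticks and the GUI strip control) is not reproduced here.

#include <cstddef>
#include <vector>

// Assumed per-segment entry of the sample file: end time in milliseconds,
// chroma and brightness (field names are ours).
struct SegmentSample { int endTimeMs; double chroma; double brightness; };

struct StripPosition { int pixelColumns; std::size_t segmentIndex; };

// Given the elapsed playback time, report how many pixel columns of the strip
// should already be filled (about two per second) and which segment the
// playhead is in; a black column is drawn whenever the segment index advances.
StripPosition stripPosition(int elapsedMs,
                            const std::vector<SegmentSample>& segments) {
    const int msPerPixel = 500;   // ~2 pixels per second of audio
    std::size_t seg = 0;
    while (seg + 1 < segments.size() && elapsedMs >= segments[seg].endTimeMs)
        ++seg;
    return { elapsedMs / msPerPixel, seg };
}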

4.4. Audio files processing
A special process is required for the extraction of the sequence of frequencies from audio recordings, as mentioned before. MEL-IRIS therefore provides a special interface (using MATLAB) for producing sonograms, from which the melodic sequence can be extracted. Figure 11 shows this interface.

Figure 11. The audio-processing interface.

As can be seen, the user can change the parameters used in the spectrum analysis from the Windows menu and the FFT menu (Sample Frequency, Frequency Limit, Window Size, Window, FFT Size, FFT Overlap). Figure 12 shows the default values of MEL-IRIS.

Figure 12. The Main Menu.

MEL-IRIS also offers the choice of automatic segmentation of an audio file, based on the attributes of its wavelet. The user is allowed to use this automatic method, modify it, or even split the piece up manually, by defining the beginning and the end of each segment according to his acoustic perception. After the segmentation of the audio file is done, the sonogram and the sequence of frequencies of a particular segment can be created (see Fig. 13), using the Spectrum Analysis button in the Action menu (see Fig. 12). The sequence of frequencies is produced by sampling, at very short pre-defined time intervals, the frequencies that carry the highest volume (the darkest peaks on the 3-D graph) at each instant. Clicking the Spectrogram button, the user is offered another, more flexible view of the same sonogram, which can be processed in several ways (see Fig. 14).

Figure 13. The array editor (extracted frequencies).

Figure 14. MEL-IRIS spectrograms.

The results of the spectrum analysis are automatically used as the input of step 2, and the serial procedure continues as described before.
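Schematically, and under the assumption that the spectrum analysis yields a frame-by-bin magnitude matrix, the per-frame extraction of the dominant frequency can be sketched as follows; the names and data layout are ours, not those of the MATLAB code.

#include <cstddef>
#include <vector>

// For every analysis frame (a short, fixed time interval), keep the frequency
// of the bin with the highest magnitude (the "darkest peak" of the sonogram).
std::vector<double> dominantFrequencies(
        const std::vector<std::vector<double>>& magnitude,  // [frame][bin]
        double hzPerBin) {
    std::vector<double> freqs;
    freqs.reserve(magnitude.size());
    for (const auto& frame : magnitude) {
        std::size_t best = 0;
        for (std::size_t b = 1; b < frame.size(); ++b)
            if (frame[b] > frame[best]) best = b;
        freqs.push_back(best * hzPerBin);
    }
    return freqs;
}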

5. Observations
We tested a very large number of musical pieces in MEL-IRIS, and the results were more than interesting; they were also encouraging for the continuation of our research. To begin with, one observation is that the classification that arose from the scale index gave a chromatic dimension to the way music is perceived [9]. The happy and shiny songs were categorized together, while the sad and melancholic pieces fell into another category; moreover, the pieces that sound heavy and strange to Western musicians came under yet another category. Apart from the similarities in the way the pieces sound, we also observed that the chromatic strips of songs in the same category appeared to be quite similar in their colors and/or their melodic evolution, e.g. Chant Sacris del Orient and Salmos para el 3er Milenio por Soueur Marie Keyrouz in the very high chromatic category. Finally, an important observation is that the distinction between audio files and MIDI files can very easily be made from the chromatic strips. This stems from the fact that the freedom of melodic motion and the ability to use the whole spectrum of frequencies in audio recordings give greater chromatic fluctuation, in contrast to MIDI files, where this freedom is limited. However, it is also possible to achieve chromatic variance in MIDI files using pitch bend (the pitch wheel).

6. Future work
Our aim is to continue our research and enrich MEL-IRIS with new capabilities. One of them is multi-channel chromatic processing for MIDI files, where multiple strips would depict the chroma of a musical piece (one strip for every channel, not only for the melody). These strips could be shown mixed together or separately, according to the evolution of the musical composition. Another goal, on which we are already working, is the creation of music from colors, the reverse of what we have presented here: an algorithmic composer that will be able to create new music and to mix already recorded musical patterns stored in a dynamic database. The starting point of this synthesis process will be the user's choices from a chromatic palette. Finally, one of our future plans is the design of a dedicated interface for chromatic emotional synthesis, aimed at composers and singers who wish to change the chroma of their musical pieces at will.

7. References
[1] Politis, D., Margounakis, D., "Determining the Chromatic Index of Music", Proceedings of the 3rd WEDELMUSIC Conference, September 15-17, 2003.
[2] Politis, D., Margounakis, D., "In Search for Chroma in Music", Proceedings of the 7th WSEAS International Multiconference on Circuits, Systems, Communications and Computers (CSCC 2003), Corfu, July 7-10, 2003.
[3] Tzanetakis, G., Cook, P., "Musical Genre Classification of Audio Signals", IEEE Transactions on Speech and Audio Processing, 10(5), July 2002.
[4] West, M.L., Ancient Greek Music, Oxford University Press, 1994.
[5] Burns, E., "Intervals, Scales and Tuning", in Deutsch, D. (Ed.), The Psychology of Music, 2nd edition, Academic Press, London, 1999.
[6] Shepard, R., "Pitch Perception and Measurement", in Cook, P. (Ed.), Music, Cognition and Computerized Sound, MIT Press, Cambridge, Massachusetts, 1999.
[7] Giannelos, D., La Musique Byzantine, L'Harmattan, 1996, pp. 63-75.
[8] Cambouropoulos, E., Widmer, G., "Automatic motivic analysis via melodic clustering", Journal of New Music Research, 29(4), 2000, pp. 303-317.
[9] Juslin, P., "Communicating Emotion in Music Performance: A Review and Theoretical Framework", in Juslin, P. & Sloboda, J. (Eds.), Music and Emotion: Theory and Research, Oxford University Press, 2001.
[10] Chamoudopoulos, D., "Music and Chroma", The Arts of Sound, Papagregoriou-Nakas, Greece, 1997, pp. 249-253.
[11] Fels, S., Nishimoto, K., Mase, K., "MusiKalscope: A Graphical Musical Instrument", IEEE Multimedia Magazine, Vol. 5, No. 3, July-September 1998, pp. 26-35.