INVESTIGATING KEY DETECTION TO FACILITATE HARMONIC MIXING


INVESTIGATING KEY DETECTION TO FACILITATE HARMONIC MIXING

Jurgen Cuschieri
Department of Computer Information Systems
University of Malta
May 2015

Submitted in partial fulfilment for the degree of Bachelor of Science (Honours) in IT at the University of Malta

University of Malta Library - Electronic Thesis & Dissertations (ETD) Repository

The copyright of this thesis/dissertation belongs to the author. The author's rights in respect of this work are as defined by the Copyright Act (Chapter 415) of the Laws of Malta or as modified by any successive legislation. Users may access this full-text thesis/dissertation and can make use of the information contained in accordance with the Copyright Act, provided that the author is properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder.

FACULTY OF INFORMATION AND COMMUNICATION TECHNOLOGY

Declaration

"Plagiarism is defined as the unacknowledged use, as one's own, of work of another person, whether or not such work has been published, and as may be further elaborated in Faculty or University guidelines" (University Assessment Regulations, 2009, Regulation 39 (b)(i), University of Malta).

I / We*, the undersigned, declare that the [assignment / Assigned Practical Task report / Final Year Project report] submitted is my / our* work, except where acknowledged and referenced. I / We* understand that the penalties for committing a breach of the regulations include loss of marks; cancellation of examination results; enforced suspension of studies; or expulsion from the degree programme. Work submitted without this signed declaration will not be corrected, and will be given zero marks.

* Delete as appropriate.

(N.B. If the assignment is meant to be submitted anonymously, please sign this form and submit it to the Departmental Officer separately from the assignment.)

Student Name / Signature (four entries provided)
Course Code
Title of work submitted
Date

Submission Size

Submission size is the maximum allowable size of a submitted Final Year Project report, measured as the number of pages or as the number of words, or both.

I, the undersigned, submit this FYP report with full knowledge that the maximum allowable number of pages, excluding report preamble (title page/s, abstract, acknowledgments, etc.), table of contents, any appendices and annexes, bibliography and reference list, is 60 (sixty) pages. The bibliography and reference list can have a maximum length of 15 (fifteen) pages, additional to the 60 (sixty) pages. I am aware that any case to include pages exceeding the above-mentioned limits must be made through my supervisor at least one month prior to the date of my submission. Failing this, I understand that I will be given a maximum of 3 (three) working days in which to submit changes. I also understand that if I do not submit these changes on time, a 10 (ten) percent penalty will be applied to my final awarded mark. I also understand that if my final submission does not conform to the above-mentioned limits, it will not be accepted for marking.

I, the undersigned, also declare that I am aware of the FYP submission guidelines as listed on the Faculty of ICT web-site: data/assets/pdf_file/0017/208160/final_year_project_harmonisation_guidelines.pdf, and that my work conforms to these guidelines.

Work submitted without this signed declaration will not be corrected, and will be given 0 (zero) marks.

Student's full name
Study-unit code
Date of submission
Title of submitted work (two lines available)
Student's signature

Abstract

Music track mixing is the process that DJs undertake to transition from one track to another whilst maintaining continuous play. Next track decisions are generally based on two musical characteristics: the tempo and the key. A track will synchronise with another if both tracks are at the same tempo. However, even if two tracks are synchronised, the effect they create is not necessarily audibly pleasing. Essentially, two tracks will mix well if they reside in the same or compatible keys. Harmonic mixing is a solution to this problem.

The primary aim of this dissertation is to investigate the different techniques that lead to the extraction of the key from a song in audio file formats. The dissertation is also accompanied by a proof of concept application that aids DJs and musicians in achieving harmonic mixing by automating the processes of key and tempo detection and recommending a choice of tracks that are in the same or compatible keys.

The key detection algorithm converts the audio signal to the frequency domain using the Short Time Fourier Transform. The frequencies in Hz are mapped to Pitch Class Profile bins, resulting in a 12-dimensional chroma vector. Each dimension represents the intensity of one semitone class (chroma) along time. Correlation of the chroma vectors with a set of pre-defined binary templates representing the 24 possible keys is then performed, generating a correlation coefficient for each possible key. Finally, a fair weighting system is used to extract the most probable key.

Furthermore, various recommender tools are presented. Next track candidates can either be harmonically compatible or within a user-specified tempo range boundary. The application also offers whole mix sequence recommendations corresponding to a set of user-specified criteria, including a starting track and its position and the accepted tempo difference between successive tracks.

The testing stage ensures that the parameters which provide the best results are used in the final key detection algorithm. This step contributes positively to the final accuracy rate. In fact, in the evaluation, an average accuracy rate of 80% was recorded when testing the key detection on test sets of classical and dance music. When testing for correctness using a black-box approach, the recommender functions returned the expected and correct combinations.

Acknowledgements

I would like to dedicate this dissertation to my family, especially my parents, who have been the rock on which I have built my whole academic experience. From primary school till university, I have always found their help and support in every time of need.

I would like to express my sincere gratitude to my supervisor, Dr. Joseph Vella, whose experience proved truly valuable throughout the writing of this dissertation. I would also like to take the opportunity to thank the lecturers and staff at the University of Malta Faculty of ICT for their help throughout these last three years.

I would also like to show my appreciation to all of my friends and loved ones, particularly: Jake, who has been my DJ'ing partner for more than 5 years; Gabriel, my study companion throughout these 3 years; Matteo, for helping me with questions regarding musical theory; Christine, for the long friendship and constant help and support; and my girlfriend, Veronique, whose support throughout difficult times was greatly needed and duly appreciated.

Last but not least, this dissertation is for the thing that kept me going and helped me overcome every obstacle in my life... for the love of music and DJ'ing.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
1 Introduction
2 Literature Review
   2.1 Properties and Concepts of Audio
      2.1.1 Frequency
      2.1.2 Sample Rate
      2.1.3 Tempo
      2.1.4 Monophonic Sound
      2.1.5 Stereophonic Sound
   2.2 Background of Musical Theory and Terminology
      2.2.1 Pitch
      2.2.2 Octave
      2.2.3 Chord
      2.2.4 Scale
      2.2.5 Key
      2.2.6 Pitch Class
      2.2.7 Chroma
      2.2.8 Harmonics
   2.3 Common Audio Formats
      2.3.1 WAV
      2.3.2 MP3
   2.4 Background of DJ'ing and Harmonic Mixing
      2.4.1 History of DJ'ing
      2.4.2 A DJ Setup
      2.4.3 Harmonic Mixing
   2.5 Correlation
   2.6 Fourier Analysis
      2.6.1 Fast Fourier Transform
      2.6.2 Short Time Fourier Transform
   2.7 Key Detection Algorithms
      2.7.1 Human Auditory Perception and Musical Cognition Models
      2.7.2 Chord Segmentation and Recognition using EM Trained HMMs
      2.7.3 Pattern Matching Techniques
   2.8 Beat Detection Algorithms
   2.9 Case Studies on DJ Hardware and Applications
      2.9.1 Rekordbox
      2.9.2 Beatport
      2.9.3 Pioneer CDJ 2000
   2.10 Case Studies on Development Tools
      2.10.1 Matlab
      2.10.2 PostgreSQL
3 Requirements
   3.1 Managing the Music Collection
   3.2 Playlists
   3.3 Audio Player
   3.4 Key Analysis
   3.5 Tempo Detection
   3.6 Recommending Harmonically Matching Tracks
   3.7 Recommending Candidates within a BPM Range
   3.8 Whole Mix Recommendations
4 Specification and Design
   4.1 Audio Functions
      4.1.1 Add Track
      4.1.2 Search / Delete / Modify Tracks
      4.1.3 Creating Playlists
      4.1.4 Key Detection
      4.1.5 Beat Detection
   4.2 Recommender Functions
      4.2.1 Getting Candidates by Key and BPM
      4.2.2 Getting a Whole Mix Recommendation
   4.3 Front End - Graphical User Interface Design
   4.4 The Database Schema
5 Implementation
   Software
   Audio Functions
      Add Track
      Create Playlist
      Key Detection
   Recommender Tools
      Generating Candidates by Key and BPM
      Mix Recommendation by Key
6 Testing and Evaluation
   Parameter Testing on Key Detection Algorithm
      FFT Length
      Hop Size
      Downsampling
   Quantitative Evaluation on Key Detection Algorithm
      Accuracy Test on Single Chords and Chord Progressions
      Accuracy Test on Classical Music
      Accuracy Test on Dance Music
   Qualitative Evaluation
   Performance Evaluation
      Key Detection Performance Evaluation
      Mix Recommendation
   Get Candidates By Key / BPM Evaluation
   Whole Mix Recommendations Evaluation
   Summary of Evaluation
7 Future Work
8 Conclusions
Bibliography
   General References
   Web Pages References
Appendix A
   Harmonically Compatible Examples
   Non-Harmonically Compatible Examples

List of Figures

Figure 1 - Turntable DJ Setup [1]
Figure 2 - Camelot Wheel [2]
Figure 3 - Time Frequency Spectrogram Representation [3]
Figure 4 - Separation of Concerns between Data Access and Presentation Layers
Figure 5 - Visual Representation of the Modules of the System
Figure 6 - Data Flow Diagram of the General System
Figure 7 - Activity Diagram of the General System
Figure 8 - Data Flow Diagram of the Create Playlist Function
Figure 9 - Activity Diagram of the getCandidatesByKey Function
Figure 10 - Early Prototype of the Graphical User Interface
Figure 11 - Entity Relationship Diagram of the System
Figure 12 - Activity Diagram of the Key Detection Algorithm
Figure 13 - Spectrogram Representation of the A-Major Chord
Figure 14 - Chromagram Representation of the A-Major Chord
Figure 15 - Evaluation Results when Generating Candidates by Key
Figure 16 - Evaluation Results when Generating Candidates within a BPM Range
Figure 17 - Snippet of the Playlist Contents before Conducting the Test
Figure 18 - Evaluation Results when Generating Candidates by Key from a Playlist
Figure 19 - Evaluation Results for Mix Recommendations, Scenario 1
Figure 20 - Evaluation Results for Mix Recommendations, Scenario 2
Figure 21 - Evaluation Results for Mix Recommendations, Scenario 3
Figure 22 - Evaluation Results for Mix Recommendations, Scenario 4
Figure 23 - Evaluation Results for Mix Recommendations, Scenario 5, Take 1
Figure 24 - Evaluation Results for Mix Recommendations, Scenario 5, Take 2
Figure 25 - Evaluation Results for Mix Recommendations, Scenario 5, Take 3

[1] Turntable DJ Setup. [Online]. Available: [Accessed: May. 3, 2015].
[2] Camelot Wheel. [Online]. Available: [Accessed: May. 3, 2015].
[3] Spectrogram. [Online]. Available: [Accessed: May. 3, 2015].

List of Tables

Table 1 - Database Schema of MusicCollection
Table 2 - Database Schema of Playlist
Table 3 - Database Schema of MusicCollection_Playlist
Table 4 - The Data in MusicCollection_Playlist after Creating a Playlist
Table 5 - Key Detection Parameter Testing Results for FFT Length
Table 6 - Key Detection Parameter Testing Results for Hop Size
Table 7 - Key Detection Parameter Testing Results for Down Sampling
Table 8 - Key Detection Accuracy Rate Evaluation on Classical Music
Table 9 - Key Detection Accuracy Rate Evaluation on Dance Music
Table 10 - Key Detection Performance Rate Evaluation
Table 11 - Mix Recommendation Performance Rate Evaluation
Table 12 - Whole Mix Recommendations Evaluation Scenarios
Table 13 - Harmonically Compatible Pairs
Table 14 - Non-Harmonically Compatible Pairs

1 Introduction

The term DJ [4] refers to a person who mixes a sequence of audio tracks for an audience at an event or on radio broadcasts. The term mixing refers to the process that the DJ undertakes to transition from one track to another whilst keeping a constant flow and achieving a pleasurable audible effect. Two tracks can be mixed together and be perfectly synchronised beat-to-beat and still sound off. The reason for this is likely to be that the keys of the two tracks are incompatible, which causes their melodies to clash, in turn creating an unpleasant effect. Harmonic mixing is a solution to this problem.

Harmonic mixing is a technique used by professional DJs. It is based on choosing audio tracks that are in compatible keys. The term harmonic therefore refers to a mash-up that sounds pleasurable to the human ear. A track mixes well with another track that is in the same key or in one of another three keys. These three are called the Perfect Fourth, the Perfect Fifth, and the Relative Minor if the given track's key is major, or the Relative Major if the given track's key is minor [5].

[4] DJ is an acronym for Disk Jockey.
[5] Mixing Harmonically. [Online]. Available: [Accessed: Dec. 10, 2015].

A person who can detect the key of a musical piece by ear is said to have Perfect Pitch, which is rather uncommon. In fact, Perfect Pitch is not something that can be achieved through musical training (Levitin, 2008). In practice, a DJ has to know the key of every track in the collection to achieve harmonic mixing. Even if the DJ has perfect pitch, the task of scanning through thousands of possibilities and determining which ones would match the track currently being played is too tedious and time consuming. Most often the transition time between one track and another is as little as two minutes, and in this time the DJ also has to take care of other factors, such as beat-matching and track transitioning.

The aim of this dissertation is to explore the possibility of developing a system that incorporates key detection and harmonic mixing. The findings of the research are supported with an artifact in the form of a computer application. The system takes a track, analyses its details and stores its data in the database. The application, being a DJ tool, offers various functions that correspond to a set of requirements. These functions

include key detection, recommendations of harmonically compatible tracks, and whole mix sequence recommendations.

The term tempo in itself means time and, when used to describe a musical piece, it refers to the rate at which the music progresses (Pilhofer & Day, 2007). Extracting the tempo from an audio track is another interesting area of research in audio analysis. However, given the time frame and the main objective of this project, implementing a tempo detection algorithm from scratch was not feasible. Consequently, various algorithms described in the literature were analysed and a suitable open-source option was included in the application. By having a value for the BPM [6] of a track at hand, next track decisions can be extended to consider options that are not only harmonically compatible but also within a user-specified BPM range.

[6] BPM is an acronym for Beats Per Minute and is commonly used to describe tempo.

Moreover, it is pertinent to note that in the field of audio analysis, key detection is not considered a straightforward task. Henderson (Henderson, 2013) states that commercial applications do not exceed an accuracy of 80%. This highlights the computational difficulty of achieving a high accuracy, especially given a strict time frame and limited resources.

2 Literature Review

2.1 Properties and Concepts of Audio

This section first outlines a brief introduction to terminologies, properties and aspects of sound. Sound can be described as air-travelling vibrations caused by the reverberations of objects. The sound waves spread outward in several directions from these objects. Sound waves travel through the air, where they are reflected off different objects, creating more changes in the molecules found in the surrounding air. These changes and movements of air molecules cause vibrations against the human eardrum. These vibrations are processed as sound by the human brain (Mahaney, 2007).

2.1.1 Frequency

Frequency can be described as the number of wave cycles that occur in a second. It is the property of sound that most determines pitch (Pilhofer & Day, 2007).

2.1.2 Sample Rate

The sample rate is the number of samples of audio carried per second, measured in Hz or kHz. The range of human hearing is said to be between 20 Hz and 20 kHz (Rosen, 2011). When a piece of audio is being created for audible frequency ranges, the audio waveforms are typically sampled at 44.1 kHz, 48 kHz, 88.2 kHz, or 96 kHz (Self, 2012). The greater the number of samples per second, the better the quality of the audio will be.

2.1.3 Tempo

The term beat is the basic unit of time or the pulse of music (Witlich, 1965). Tempo refers to the speed, rate or pace of the beat: the speed at which music progresses (Pilhofer & Day, 2007). The abbreviation BPM refers to the number of beats per minute in an audio track. It is a commonly used term to refer to tempo.

2.1.4 Monophonic Sound

Monophonic sound refers to a method of sound production achieved by the use of one audio channel. It is the most basic form of sound output. Monophonic sound usually comes from one path, such as a single microphone or a single loudspeaker (Self, 2012).
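To make these properties concrete, the short sketch below inspects an audio file with Matlab's built-in audioread, reporting the sample rate, channel count (one channel being monophonic, two stereophonic) and duration. The file name is a placeholder and the snippet is purely illustrative.

```matlab
% Minimal sketch: inspecting basic audio properties in Matlab.
% 'track.wav' is a placeholder file name.
[y, fs] = audioread('track.wav');   % samples and sample rate (Hz)

numChannels = size(y, 2);           % 1 = monophonic, 2 = stereophonic
durationSec = size(y, 1) / fs;      % duration = number of samples / sample rate

fprintf('Sample rate: %d Hz\n', fs);
fprintf('Channels:    %d\n', numChannels);
fprintf('Duration:    %.1f s\n', durationSec);
```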

2.1.5 Stereophonic Sound

The invention of stereophonic sound was mainly intended to create the impression of sound coming from various directions. It creates an illusion of directionality and audible perspective [7]. This is usually achieved with a configuration of a set of stereo headphones or two or more speakers.

[7] Stereophonic. [Online]. Available: [Accessed: Dec. 12, 2014].

2.2 Background of Musical Theory and Terminology

This section gives background on how musical theory is put into context, alongside a list of terminologies that are relevant to the topic of this dissertation.

2.2.1 Pitch

Pitch is an auditory sensation in which a listener uses the frequency of a vibration to match the musical tone to the relative position on a musical scale (Plack & Oxenham, 2005). Although pitch may be quantified as a frequency, pitch and frequency are not the same thing. Pitch is not a purely objective physical property but rather a subjective attribute of sound (Hartmann, 1997).

2.2.2 Octave

An octave can be described as the interval between one pitch and another with half or double its frequency. Cooper (Cooper, 1973) called this relationship a natural phenomenon that has been referred to as the basic miracle of music. The use of the octave is very common in various musical systems.

2.2.3 Chord

A chord is a harmonic unit with at least three different tones sounding simultaneously (Benward & Saker, 2003). In simpler terms, a chord can be described as a combination of three or more pitches sounding at the same time.

2.2.4 Scale

In music, a scale is a series of notes in a specific, consecutive order. It is common practice that all parts of a musical piece, including melody and/or harmony, are built using notes from a single scale (Benward & Saker, 2003).

The distance between one note and the successive note is called a scale step. The most common and well-known types of scales are the major and minor scales.

2.2.5 Key

In music, the key identifies the tonal centre of a song. The tonal centre is a note around which the whole song revolves. To sound right, every note in a musical piece has to gravitate towards the tonal centre. For example, if a song is in the key of C, then every note in the song gravitates towards C. In simpler terms, the key describes the specific pattern of notes that governs a song.

Apel (Apel, 1969) discusses how key and scale, although bearing similarities, are not the same thing. He describes a scale as an ordered set of notes typically used in a key, and the key of a musical piece as the centre of gravity, established by particular chord progressions.

Major and minor are two frequently used terms when describing a piece of music with regard to its key. These two terms can also describe a section, scale, chord, or interval. Kamien (Kamien, 2008) states that the crucial difference between major and minor scales is that in the minor scales there is only a half-step between the second and third tones, whereas in the major scales, both the differences between the third and fourth notes and between the seventh and eighth notes are half-steps.

2.2.6 Pitch Class

A pitch class is described as the set of all pitches that are a whole number of octaves apart. Randel (Randel, 2013) uses the example of the pitch class C and explains how it represents all the possible C pitches, irrespective of octave position.

2.2.7 Chroma

Tymoczko (Tymoczko, 2011) explains how psychologists refer to the quality of a pitch as its chroma. A chroma is an attribute of pitches. Muller (Muller, 2007) relates the terms pitch class and chroma by describing a pitch class as the set of all pitches sharing the same chroma. The Pitch Class Profile (PCP), first proposed by Fujishima, is based on the same idea as the chroma vector, in which the Fourier transform intensities are mapped to the twelve semitone pitch classes corresponding to musical notes (Fujishima, 1999).
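To make the pitch class idea concrete, the sketch below maps a frequency in Hz to one of the twelve pitch classes using the equal temperament relation around the A4 = 440 Hz reference. This is an illustrative helper, not code from the dissertation's implementation.

```matlab
% Minimal sketch: mapping a frequency (Hz) to one of the 12 pitch classes
% (chroma), assuming equal temperament tuned to A4 = 440 Hz.
function pc = freqToPitchClass(f)
    names = {'C','C#','D','D#','E','F','F#','G','G#','A','A#','B'};
    midi  = round(69 + 12 * log2(f / 440));  % MIDI note number (69 = A4)
    pc    = names{mod(midi, 12) + 1};        % pitch class, octave discarded
end
```

For example, 261.63 Hz, 523.25 Hz and 1046.5 Hz all map to 'C', illustrating how a pitch class collapses all octaves of a note into a single bin.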

The chromagram is a chroma-based feature, a variation on time-frequency distributions, which represents the spectral energy at each of the 12 pitch classes (Yu et al., 2010).

2.2.8 Harmonics

The term harmonic in music refers to notes that are played in a certain way that is pleasing to the ears. A harmonic series specifically refers to a series of numbers related by whole-number ratios. "For example, the series of frequencies 1000, 2000, 3000, 4000, 5000, 6000, etc., given in Hertz (Hz), is a harmonic series; so is the series 500, 1000, 1500, 2000, 2500, 3000, etc. Notice that the difference in frequency between adjacent members of both series is constant, that is to say, the harmonics are equally spaced" (Bain, 2003). In such a series, the lowest frequency is said to be the fundamental, and its multiples are referred to as overtones, harmonics or partials. In signal processing, a harmonic of a wave is a component frequency of the signal that is an integer multiple of the fundamental frequency.

2.3 Common Audio Formats

The two most common audio formats are the Waveform Audio File Format (WAV) and MPEG-2 Audio Layer III (MP3).

2.3.1 WAV

The Waveform Audio File Format is a multimedia file format for storing digital audio data. It is a file format standard for storing an audio bit stream on PCs. It supports a variety of bit resolutions, sample rates, and channels of audio. The WAV file is an instance of the Resource Interchange File Format (RIFF) defined by IBM and Microsoft. A RIFF [8] file is a tagged file format.

[8] Microsoft RIFF. [Online]. Available: [Accessed: Dec. 12, 2014].

2.3.2 MP3

MPEG-2 Audio Layer III, more commonly known as MP3, is an audio coding format which uses lossy data compression. It is intended to greatly reduce the amount of data required to represent the audio file whilst still sounding adequate to most listeners. It is a very popular method of compressing audio with a low quality loss. The compressed

audio is split into several small data frames, each containing a frame header and compressed audio data (Nilsson, 2000).

2.4 Background of DJ'ing and Harmonic Mixing

This section examines how the work of DJ'ing is tackled. Furthermore, harmonic mixing is discussed in detail.

2.4.1 History of DJ'ing

The acronym DJ stands for disc jockey. The name comes from a combination of disc, referring to the record, and jockey, referring to the person who operates the machine [9]. DJs are not an invention of the twenty-first century. In fact, the first DJ performed back in 1906: more than a hundred years ago, Reginald Fessenden became the first DJ when he participated in the first ever radio broadcast in history (Brewster, 2006).

The term Disc Jockey did not appear publicly prior to the 1940s. In fact, it was in 1941 that it appeared in print in Variety (Fisher, 2007). Two years later, Jimmy Savile organised the first ever DJ dance party in Otley, England. In 1947, he claimed to have become the first DJ to use two turntables to achieve continuous play, and he started working as a radio DJ at Radio Luxembourg eleven years later (Mc Callum & Steele, 1983). More than thirty years later, as of 2015, DJ Mag [10] claims Hardwell to be the world's number one DJ.

[9] DJ. [Online]. Available: [Accessed: May. 4, 2015].
[10] Top 100 DJs. [Online]. Available: [Accessed: May. 25, 2015].

2.4.2 A DJ Setup

It is important to observe how the necessary equipment for DJ'ing is set up. The conventional DJ setup consists of two turntables or CD players which offer DJ features; the latter are referred to as CDJs. Additionally, a mixer, a pair of headphones, a set of monitor speakers and a sound system complete the DJ setup.

Figure 1 presents two turntables on the sides and a mixer in between. Nowadays, DJ CD players are more commonly used and have, more or less, replaced the traditional turntables.

Figure 1 - Turntable DJ Setup

The turntables/CDJs are devices that act as the source for audio playback. The mixer is a device for smoothing transitions from one track to another. Moreover, the headphones are essential for helping the DJ listen to the next track carefully, without the audience hearing it. This is important for performing tasks such as pitching (tempo matching) and beat matching (beat synchronisation), prior to mixing the audio track into the track that is currently being played.

2.4.3 Harmonic Mixing

When mixing two tracks, even if they are at an equal tempo and perfectly synchronised beat-to-beat, it does not necessarily mean that they will sound right. For two tracks to sound great together, they have to be in the same or related keys. Harmonic mixing is an advanced technique used by renowned DJs all around the world. It enables long blends and mash-ups between tracks whilst eliminating key clashes. This is achieved by mixing tracks that are in the same or related keys [11]. Harmonic mixing is therefore an area of DJ'ing in which the DJ does not randomly pick out the next song, but rather chooses one which, when mixed, will create a pleasing audible effect.

[11] Harmonic Mixing: How to Guide. [Online]. Available: [Accessed: Dec. 14, 2014].

Since harmonic mixing deals with the keys of audio pieces, the DJ should have the keys of his tracks at hand. The DJ should also know which keys are compatible. A track A is compatible with another track B if both tracks share key signatures, if B's key is a fifth above and both have the same quality, or if B's key is a fourth above and both have the same quality. The term quality refers to the key being either major or minor.

In order to help DJs achieve harmonic mixing, Mark Davis created the Camelot Wheel: a visual representation of which keys are compatible with each other (see Figure 2).

Figure 2 - Camelot Wheel

If we take a track in the key of C-Major, the Camelot Wheel helps us understand that a track in either the G-Major or F-Major key will harmonically match, since on this visual representation C-Major is bounded between these two keys, and they are directly touching the C-Major section. A-Minor is also directly touching; this is the relative minor of C-Major, and hence a track in A-Minor would also match a track in the key of C-Major. Another track which has the same key signature as C-Major will obviously harmonically match too. Another example would be D-Major, which harmonically matches itself (D-Major), G-Major, A-Major and its relative minor, B-Minor. If we were to use the coding system defined by harmonicmixing.com, 8B is harmonically matching to itself, the previous key 7B, the following key 9B and its relative minor 8A. Now take an instance of a track in a minor key: a track in the key of A-Flat Minor, defined by code 1A, is harmonically matching to itself (1A), the previous key (12A), the following key (2A) and its relative major (1B).
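Under the Camelot coding, this neighbourhood rule reduces to simple arithmetic on the wheel positions: a track with code nX is compatible with nX itself, the previous and next positions with the same letter, and the same position with the opposite letter. The sketch below is a hypothetical helper illustrating the rule, not code from the dissertation.

```matlab
% Minimal sketch: compatible Camelot codes for a given code such as '8B'.
% Wheel positions 1..12 wrap around; 'A' denotes minor and 'B' major keys.
function codes = camelotCompatible(code)
    n      = str2double(code(1:end-1));  % wheel position, 1..12
    letter = code(end);                  % 'A' or 'B'
    prev   = mod(n - 2, 12) + 1;         % previous position (1 wraps to 12)
    next   = mod(n, 12) + 1;             % next position (12 wraps to 1)
    if letter == 'A', rel = 'B'; else, rel = 'A'; end
    codes = {code, ...                              % same key
             sprintf('%d%c', prev, letter), ...     % perfect fourth
             sprintf('%d%c', next, letter), ...     % perfect fifth
             sprintf('%d%c', n, rel)};              % relative major/minor
end
```

Here camelotCompatible('8B') returns {'8B','7B','9B','8A'}, matching the C-Major example above, and camelotCompatible('1A') returns {'1A','12A','2A','1B'}, matching the A-Flat Minor example.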

2.5 Correlation

Correlation is a statistical term that measures how two entities move in relation to each other. The correlation coefficient, a value from -1 to 1, is the computation of correlation. A correlation coefficient of positive one is called Perfect Positive Correlation and implies that as one entity moves in a direction, the other entity moves in lockstep, in the same direction. On the other hand, a correlation coefficient of -1, i.e. Perfect Negative Correlation, represents the scenario where one entity moves in a specific direction and the other entity moves in the opposite direction. If the correlation coefficient is 0, the movements of the two entities are said to have no correlation [12].

[12] Correlation. [Online]. Available: [Accessed: Dec. 16, 2014].

2.6 Fourier Analysis

Fourier Analysis is the term given to a family of techniques whose aim is to decompose signals into sinusoids. It is named after the French mathematician and physicist Jean Baptiste Joseph Fourier (1768-1830) (Smith, 1998). In this section, the two techniques that will be discussed are the Fast Fourier Transform and the Short Time Fourier Transform.

2.6.1 Fast Fourier Transform

Heidemann, Johnson and Burrus (Heidemann et al, 1985) describe the Fast Fourier Transform (FFT) as a well-known and very adequate algorithm for calculating the Discrete Fourier Transform (DFT): a formula for calculating the N Fourier coefficients from a sequence of N numbers. Carl Friedrich Gauss was the first to work on the FFT, back in 1805. During his experiments to find the orbits of certain asteroids from sample locations, he developed the DFT. The DFT is a part of the family of Fourier Analysis, used with digitised signals (Smith, 1998). He also invented an algorithm for calculating the DFT. However, this was not published, as there appeared to be other solutions proving more convenient for solving his problem of finding the orbits of the asteroids (Worner, 2013).

More recently, in 1965, 160 years after Gauss came up with his algorithm, Cooley & Tukey (Cooley & Tukey, 1965) proposed an equivalent algorithm, calling it the Fast Fourier Transform. Cooley et al. (Cooley et al., 1967) discuss the time complexity of the algorithm as opposed to older Fourier transform methods. Up until the recent publications of the FFT, implementations were using N^2 operations to compute a Fourier Transform of N data points, and it is hence not surprising that the new methods requiring N log N operations were given much attention.

2.6.2 Short Time Fourier Transform

The Short Time Fourier Transform (STFT) is a classical linear time-frequency representation. It is used to analyse the frequency of a signal as it varies with time. The STFT is achieved by applying the Fourier transform with a fixed-sized, moving window to the input series. The window is moved by one time point at a time, so overlapping windows are achieved (Okamura, 2011).

Gauthier and Duval (Gauthier & Duval, 2007) describe the STFT as relatively simple. Despite this, they also add that the STFT has become a standard tool for the analysis of non-stationary signals. Baba (Baba, 2012) describes how the STFT method has been used in the field of ultrasound blood-flow imaging for a long time, due to its suitability for the analysis of non-stationary signals.

The spectrogram, shown in Figure 3, returns the time-dependent Fourier transform for a sequence (Haque et al, 2010). It is basically a graph with three axes. The x-axis represents the time domain, the y-axis represents the frequency domain, and the third axis represents the intensity of the frequencies at a particular point in time. The amplitude of the frequencies is represented by the intensity or colour of each point in the image.

Figure 3 - Time Frequency Spectrogram Representation
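As a brief illustration, the STFT of an audio signal can be computed in Matlab with the spectrogram function (Signal Processing Toolbox); the window and hop values below are illustrative, not the parameters used in this project.

```matlab
% Minimal sketch: computing an STFT / spectrogram in Matlab.
[y, fs] = audioread('track.wav');   % placeholder file name
y = mean(y, 2);                     % collapse stereo to mono

win  = hamming(4096);               % fixed-size analysis window
hop  = 2048;                        % 50% overlap between successive windows
nfft = 4096;

% s: complex STFT values, f: frequency axis (Hz), t: time axis (s)
[s, f, t] = spectrogram(y, win, length(win) - hop, nfft, fs);

% The magnitude |s| is what a spectrogram plot displays as intensity/colour.
magnitude = abs(s);
```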

2.7 Key Detection Algorithms

The detection of key from audio is not new, though it is rarely described in the literature. Many existing implementations operate on symbolic data, e.g. MIDI or notated music, in which the notes of the incoming signal are already known. This is not possible when musical key detection is to be performed on an audio file.

2.7.1 Human Auditory Perception and Musical Cognition Models

Pauws (Pauws, 2004) bases his implementation on human auditory perception and musical cognition models. His algorithm works directly on raw audio input. The algorithm takes a 100-millisecond section of the signal source and downsamples its audio to around 10 kHz. This cuts off any unneeded frequencies above the 5 kHz mark and reduces the computing cost without significantly affecting the results. The remaining samples in the section are multiplied by a Hamming window, zero-padded, and the amplitude spectrum is calculated from a 1024-point FFT. Next, a procedure is applied to enhance the peaks without seriously affecting the frequencies or their magnitudes. The resulting spectrum is then smoothed using a Hanning filter.

A twelve-dimensional chroma vector (chromagram) is then calculated from the frequency spectrum, which converts the frequencies into the twelve musical notes. The chroma vector is normalised to show the relative ratios of each musical note in the frequency spectrum. Adding and normalising the chromagrams over all the frames results in a chromagram for the complete audio. These are correlated with Krumhansl's (Krumhansl, 1990) key profiles, and the key profile that has maximum correlation with the computed chroma vector is taken as the most likely key.

The performance accuracy of this implementation reaches 75.1%. However, Pauws noted how most mistakes arose due to confusion between the relative, dominant, subdominant or parallel key. By considering these keys as friendly or similar keys, the accuracy reportedly goes up to 94.1%.

2.7.2 Chord Segmentation and Recognition using EM Trained HMMs

Sheh and Ellis (Sheh & Ellis, 2003) describe a method of recognising the chords in a piece of music using Pitch Class Profiles and Hidden Markov Models (HMMs), using the Expectation Maximisation (EM) algorithm to train them. A HMM is a stochastic finite automaton in which each state generates an observation (Gold & Morgan, 1999). The EM algorithm is an approach that structures the statistical classifier parameter estimation problem to incorporate hidden variables (Gold & Morgan, 1999).

The algorithm uses down-sampled monophonic audio. The audio is divided into N overlapping frames of 4096 points and transformed to the frequency domain using the STFT technique. The result is mapped to the PCP (Pitch Class Profile) features, which are commonly made up of twelve-dimensional vectors. Each dimension corresponds to the intensity of a semitone class (chroma). The procedure collapses pure tones of the same pitch class, independent of octave, to the same PCP bin. Frequency to pitch mapping is obtained using the logarithmic characteristics of the equal temperament scale. The PCP vectors are normalised to show the intensities of each pitch class relative to one another.

Pre-determined PCP vectors are used as features to train a HMM with one state for each chord distinguished by the system. EM is used to find maximum-likelihood parameter estimates for the HMM, by calculating the mean and variance vector values and the transition probabilities for each chord. Finally, chord alignment/recognition is performed with the Viterbi algorithm. The PCP vector corresponding to the chord that aligned itself the most with the PCP vectors computed from the song is chosen as the most likely key. Sheh and Ellis state that they tested the system on a data set of 20 tracks and obtained an accuracy result of 75%.

2.7.3 Pattern Matching Techniques

Other key detection algorithms exist that achieve similar results but do not require coding and training a Hidden Markov Model. Fujishima (Fujishima, 1999) describes an algorithm which first transforms an input sound using the Discrete Fourier Transform spectrum and then does pattern matching on the Pitch Class Profile (PCP) to determine the chord type.

Roebuck (Roebuck, 2011), in his implementation, attempts to avoid the use of Hidden Markov Models. He uses pre-defined templates which represent the possible keys. Correlation of the computed chroma values with each of the templates is performed, and a correlation coefficient is calculated for each of the twenty-four keys. Roebuck then uses a weighting system that rewards the highest correlating keys whilst penalising the highest correlating keys that correlate closely to the second highest correlating key. Roebuck explains how his method achieved results of 73% accuracy when comparing results from his algorithm with results from three similar programs. Interestingly, he noted that different genres of music returned different accuracies.
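The core of such a pattern matching step can be sketched in a few lines. The snippet below correlates a chroma vector against binary key templates; the chroma vector here is a random placeholder standing in for a real pre-computed 12-dimensional vector of pitch class intensities, only the twelve major keys are shown for brevity, and the weighting schemes discussed above are deliberately omitted.

```matlab
% Minimal sketch: correlating a chroma vector with binary key templates.
chroma = rand(12, 1);   % placeholder for a real pre-computed chroma vector

% Binary template for C-Major: the scale tones C D E F G A B are set to 1.
cMajor = [1 0 1 0 1 1 0 1 0 1 0 1]';

% Templates for the other major keys are circular shifts of the base
% pattern, one shift per semitone.
templates = zeros(12, 12);
for k = 1:12
    templates(:, k) = circshift(cMajor, k - 1);
end

% Correlation coefficient of the chroma vector with each template.
rho = zeros(12, 1);
for k = 1:12
    c      = corrcoef(chroma, templates(:, k));
    rho(k) = c(1, 2);   % off-diagonal entry is the coefficient
end

[~, best] = max(rho);   % index of the highest correlating (major) key
```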

2.8 Beat Detection Algorithms

The terms Beat Detection or Beat Tracking Algorithms refer to the extraction of the tempo (in BPM) from an audio track. This section describes a beat detection algorithm that has been discussed in the literature and briefly mentions other possibilities.

An algorithm which uses dynamic programming to achieve beat tracking was described by Ellis (Ellis, 2007). The process sets off with the conversion of the audio into an onset strength value with a 250 Hz sampling rate. An approximate global tempo is estimated by auto-correlation of the onset strength, applying a preference window which is a Gaussian on a log-time axis, and choosing the period with the largest autocorrelation as the tempo. Dynamic programming is then used to find the set of beat times that optimise both the onset strength at each beat and the spacing between beats. This technique efficiently searches all possible beat sequences to optimise a total cost that can be broken down into a local score at each beat time and a transition cost. For every possible beat time, the best preceding beat time is located and the cumulative score up to that beat is calculated. Then, the largest score close to the end of the audio is located, and the entire sequence of beats leading to that beat time is recovered through a backtrace table storing the predecessor for every beat time.

One of the evaluation strategies used involved picking out different versions of the same track by different artists and setting up two lists, comprised of one member from each pair in each list. The testing is performed by comparing, for example, the first track from the first list with each track in the second list and observing whether the algorithm recognises the right track as the cover version. They report an accuracy rate of 66%.
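The global tempo stage of such a tracker can be approximated compactly: derive an onset strength envelope, autocorrelate it, and pick the strongest lag within a plausible tempo range. The sketch below is a simplified illustration of that idea using a crude energy-based onset measure, not Ellis's actual algorithm; xcorr requires the Signal Processing Toolbox.

```matlab
% Minimal sketch: rough global tempo estimate via autocorrelation of an
% onset strength envelope.
[y, fs] = audioread('track.wav');       % placeholder file name
y = mean(y, 2);

% Crude onset strength: half-wave rectified difference of the short-time
% energy envelope, computed at roughly 250 Hz.
hop    = round(fs / 250);
frames = floor(length(y) / hop);
env    = zeros(frames, 1);
for i = 1:frames
    seg    = y((i-1)*hop + 1 : i*hop);
    env(i) = sum(seg .^ 2);
end
onset = max(diff(env), 0);

% Autocorrelate and search the lags corresponding to 60-180 BPM.
envRate = fs / hop;                     % envelope sampling rate (~250 Hz)
acf     = xcorr(onset);
acf     = acf(length(onset)+1:end);     % keep positive lags 1..N-1
minLag  = round(envRate * 60 / 180);    % 180 BPM -> shortest beat period
maxLag  = round(envRate * 60 / 60);     %  60 BPM -> longest beat period
[~, idx] = max(acf(minLag:maxLag));
lag  = minLag + idx - 1;                % winning lag in envelope samples
bpm  = 60 * envRate / lag;              % beat period -> beats per minute
```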

Another beat detection algorithm was described by Tzanetakis et al. (Tzanetakis et al., 2001). This method uses the Discrete Wavelet Transform and is based on detecting the most noticeable periodicities of the signal. Another option was described by Scheirer (Scheirer, 1998) in the paper Tempo and Beat Analysis of Acoustic Music Signals. This method uses a small number of bandpass filters and banks of parallel comb filters to analyse the tempo of music signals.

2.9 Case Studies on DJ Hardware and Applications

This section is dedicated to applications and tools which aid DJs in their work.

2.9.1 Rekordbox

Rekordbox [13] is an application by Pioneer DJ which acts as a tool to help DJs in all aspects of DJ'ing. The typical user experience starts by loading music into the Rekordbox collection and organising the tracks into playlists. Rekordbox offers features such as automatic analysis of BPM, cue and loop point configuration, and key detection. The user, having prepared his DJ set at home, would then export the playlists onto a USB storage device. At the performance venue, when using the USB device on specialised Pioneer CD players such as the Pioneer CDJ 2000 Nexus, the playlists previously created in Rekordbox can be accessed, enabling the DJ to quickly select and access the tracks with pre-configured cue and loop points. The information previously found in Rekordbox, such as BPM, key and waveform, is also shown on the CDJs. In Henderson's (Henderson, 2013) study, Rekordbox recorded an accuracy rate of 80%. This was the highest result among seven applications.

[13] Rekordbox. [Online]. Available: [Accessed: Dec. 16, 2015].

2.9.2 Beatport

Sisario (Sisario, 2013) describes Beatport as the pre-eminent store for downloading electronic dance music. The service has nearly 40 million users and is equipped with a catalogue of more than one million tracks. Beatport also offers a key detection value for every track in the database. In Henderson's (Henderson, 2013) study, it is stated that Beatport recorded an overall accuracy rate of 74%.

2.9.3 Pioneer CDJ 2000

The term CDJ refers to a line of DJ CD players from Pioneer [14], mainly intended as playback devices for DJs, allowing analogue control of music from CDs, SD cards or, more recently, USB flash storage devices. The most recent version of Pioneer's CDJ range is the CDJ 2000 Nexus. This device offers an extensive set of features, including: a full colour LCD display, ensuring quick song selection and instant understanding of musical progress through the waveform of the track; Pro DJ Link, which lets the user link one device to another, expanding the scale of DJ performances to a whole new level; Slip and Quantize functions, which help create a reliable performance without changing the progress of the original music; key lock, which prevents the key of the track from changing when increasing or decreasing tempo; and many more. The Rekordbox software was produced by Pioneer and is intended for use with this particular CDJ and other new versions of the CDJ range.

[14] Pioneer DJ. [Online]. Available: [Accessed: Dec. 16, 2014].

2.10 Case Studies on Development Tools

This section discusses tools that are used in the development stages of this project.

2.10.1 Matlab

Matlab [15] is described as a high-level language and interactive environment used by engineers and scientists. It allows the exploration and visualisation of ideas. It also helps to enforce collaborations across various disciplines, including signal and image processing, communications, control systems and computational finance. Matlab Guide (Graphical User Interface Development Environment) is an environment that helps developers design graphical user interfaces. Common controls in Guide include list boxes, pull-down menus and push buttons, as well as Matlab plots.

[15] Mathworks. Matlab. [Online]. Available: [Accessed: May. 20, 2015].

2.10.2 PostgreSQL

PostgreSQL [16] is an open source object-relational database system. With over 15 years of development behind it, its architecture has proven effective, reliable and reputable for data integrity and correctness. PostgreSQL has won numerous awards over the years, including the Developer.com Product of the Year: Database Tool award and the Linux Journal Editors' Choice Award for Best Database.

[16] PostgreSQL. About. [Online]. Available: [Accessed: May. 20, 2015].

The Literature Review outlined various key elements which are relevant to key detection and harmonic mixing. The following chapter describes the user requirements of the application.

3 Requirements

This section presents the user requirements of the system. The user is expected to be a DJ or an artist, and the requirements therefore cover both the off-gig preparation of DJ sets and the live aspect of DJ'ing.

3.1 Managing the Music Collection

The music collection is a library of digital audio files that the DJ maintains and updates on a regular basis. In the music collection, the details of each track are saved under appropriate fields. A DJ is likely to require details such as title, artist, album, genre, BPM, duration and key.

It is important for the DJ to have functions that facilitate the adding of tracks to the database. Track information is normally analysed automatically, but the user still needs to be able to manually input specific details. Another important requirement is the searching of tracks in the music collection. It is important that the track data is editable, as this gives the DJ the possibility of customising the music collection. Individual tracks may need to be deleted from the music collection; hence, a delete functionality is also necessary.

3.2 Playlists

Playlists aid DJs in the customisation of a music collection. Baccigalupo comments that the function of a playlist is to separate a music collection based on different characteristics, likings or similarities (Baccigalupo & Plaza, 2007). This is put into context in the ways in which a DJ may choose to create different playlists based on different sets, genres or other factors. Thus, the main requirement in this regard is to find a way to facilitate playlist creation. Furthermore, another element that allows further accessibility is a display option that enables the user to view the tracks in a particular playlist. Finally, a delete function is also an important requirement, so that the user is able to remove unwanted playlists.

3.3 Audio Player

Any DJ application inevitably requires an audio player, which lets the user control the playback of the audio tracks within the collection. This enables the user to play the audio

from within the application, eliminating the need to search for the track within the PC directory. Consequently, an audio player will be implemented as part of the system.

3.4 Key Analysis

Musical key analysis is well known for its difficulty in the areas of musical theory and audio scene analysis. The majority of DJs will not be able to hear a track and audibly detect its key. Even if this were possible, the time taken to do so over a music collection with thousands of tracks would be too long. Therefore, a DJ is very likely to require a key analysis algorithm that automatically detects the key of the audio tracks that are added to the music collection.

3.5 Tempo Detection

Detecting the number of beats per minute (BPM) is another important requirement in any DJ application. For two tracks to synchronise, in other words beat-match, both tracks' BPMs must be exactly equal. When a DJ intends to mix one track into another, knowing the BPM of every track facilitates the process and removes the necessity of gauging the tempo by ear, which is not an ability that every DJ possesses.

3.6 Recommending Harmonically Matching Tracks

During a performance, a DJ finds it much easier if the program automatically recommends harmonically matching tracks for a specifically selected song. Therefore, the main requirement that will be made available by the system is that of presenting harmonically matching options for a selected track. A case in point would be a collection of twenty-six tracks, named after the letters of the alphabet: the user chooses track X and the system tells the user that tracks A, N, O and Z are harmonically matching to X. A DJ may want to perform this function on a particular sub-set of the whole music collection. Therefore, another requirement for the system is to offer the possibility of restricting the results to a specific playlist that is pre-selected by the user.

3.7 Recommending Candidates within a BPM Range

Another requirement would see the program suggest candidates that fit a user-specified BPM range. This follows a similar approach to the previous section; however, the key of the track is not considered. Say the chosen track is 128 BPM and the range

is 2, tracks which are in the range of 126 to 130 will be returned as possible candidates. As with the previous function, the possibility of restricting the results to a specific playlist should also be considered a requirement.

3.8 Whole Mix Recommendations

Some DJs who have large music collections and particularly prefer to prepare DJ sets would most likely appreciate the functionality of getting whole mix recommendations from some starting point decided by them. The starting point can be the first, middle or last track. The user is likely to want mix recommendations based on either key or BPM. In the latter case, the next track would simply be a track which fits the BPM range boundary decided by the user. In the recommendations by key case, the user would find it more useful to get a recommendation for a harmonic mix which also considers an accepted BPM range boundary. The reason behind this is that tracks that are harmonically matching but too distant in BPM have to first be synchronised to the same speed in order to be mixed. When a track's tempo is altered, its key also changes. Even if key lock functions are now being introduced on most DJ devices, there is still only a limited BPM range that key lock will respect before becoming useless. Consequently, choosing close BPMs ensures better usability of the system. Similarly to the previously described functions, mix recommendations can be produced either for the whole music collection or for a single playlist.

4 Specification and Design

As described in the previous sections, the artifact proposed as proof of concept for this dissertation is an application that acts as a tool for DJs and artists alike. It makes use of key detection and beat tracking algorithms to aid the user in achieving harmonic mixing whilst performing. The specifications and decisions taken to design the system are discussed in this chapter.

The application starts with an empty music collection and first requires the user to add a track in order to benefit from the range of functions offered. The track is analysed and useful information is extracted. This information, along with the analysed key and BPM, is saved in the respective fields of the MusicCollection database table. The user then chooses any of the functions offered by the system. The functions communicate with the database to give recommendations to the user. The final result is always shown in the view table.

Figure 4 describes the separation of concerns between the presentation and data access layers. It also shows the relationship between the external entities and the Graphical User Interface.

Figure 4 - Separation of Concerns between Data Access and Presentation Layers

The system can be split into four modules, namely: Audio Functions, Data, Recommender Functions and the View. As shown in Figure 5, the data within the system flows between all the components.

Figure 5 - Visual Representation of the Modules of the System

Figure 6 is a level 0 Data Flow Diagram that helps in understanding the general flow of data in and out of the system. It also points out the communication between the external entities, i.e. the user and the database. It can be applied to any function in the system.

Figure 6 - Data Flow Diagram of the General System

In the next part of this section, the design measures taken to implement the different modules are discussed. Finally, a part of the section is dedicated to the Graphical User Interface (GUI) and the database schema.

4.1 Audio Functions

4.1.1 Add Track

The application starts with an empty music collection and first requires the user to add a track in order to benefit from the range of functionalities offered.

The Add Track button opens a file selection dialogue box, which allows the user to browse through the directory and select the required file. The file can be either WAV or MP3. The chosen track is analysed and useful data is extracted. The track's data is then saved to the relevant table in the database with a unique ID and the path to the actual file on the computer.

Key analysis and BPM detection are performed on the audio track immediately once it is added to the collection. The predicted key and BPM values are saved among the other fields in the table. These fields are later used by the parts of the system that require such data and information. This approach of triggering key analysis and beat detection upon track addition avoids having to perform needless, computationally demanding operations later on while using the application.

The music added to the library is immediately shown in the view table, which displays all the tracks currently added to the collection. The same table also serves as a result view to display the results of the various functions offered by the system. The user is, however, able to switch back to displaying all tracks by clicking on the All Tracks button. Figure 7 is a general activity diagram which summarises what happens in the system in simple diagrammatic form.

Figure 7 - Activity Diagram of the General System

4.1.2 Search / Delete / Modify Tracks

The user can choose to search for a track within the music collection. The search function uses pattern matching facilities aided by the SQL LIKE operator, which searches for keywords. When the search button is clicked, the user is asked to enter any keyword. The search function will search all the columns in the table and return, in the view table, all the records that contain the keyword input by the user.
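Such a keyword search might be issued as a single SQL statement. The sketch below shows a hypothetical version built in Matlab and executed through a Database Toolbox connection; the connection object conn, the table name musiccollection and the column names are all assumptions for illustration (in production code, user input would also need to be escaped before being embedded in the query).

```matlab
% Minimal sketch: keyword search across several columns with SQL LIKE.
keyword = 'love';                       % example user input
pattern = ['%' keyword '%'];            % match the keyword anywhere
query = sprintf(['SELECT * FROM musiccollection ' ...
                 'WHERE title LIKE ''%s'' OR artist LIKE ''%s'' ' ...
                 'OR album LIKE ''%s'' OR genre LIKE ''%s'''], ...
                pattern, pattern, pattern, pattern);
results = fetch(conn, query);           % conn: open database connection
```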

The user is able to delete any record from the music collection by selecting the desired track in the table and clicking on delete. The delete button is made active when a track has been selected. To modify a track, the user clicks on its row in the view table and changes the content of the specific field as desired. When modifying fields, the program takes care of data validation to ensure that user input satisfies the column types within the database.

4.1.3 Creating Playlists

Playlists are created and labelled according to the user's indications. The user can choose to switch the view to show any of the previously created playlists, but is able to switch back to the initial view showing all tracks in the music collection. The idea behind introducing a playlist option came from the ever-increasing use of playlists in similar software applications.

When choosing to create a new playlist, the user is prompted to enter the name of the playlist. A message is then displayed, asking the user to select the tracks from the music collection table and click on the save button when ready. The user selects tracks for the playlist by clicking on the desired tracks in succession. Any tracks that are repeatedly chosen are only considered once. When saving new playlists, the program takes care of creating unique names in the case that the chosen name already exists.

Figure 8 is a data flow diagram describing the data flow in the Create Playlist function. The function interacts with the database to create a new playlist and associate the selected tracks with the newly created playlist.

Figure 8 - Data Flow Diagram of the Create Playlist Function

4.1.4 Key Detection

Any key detection algorithm inevitably needs to convert the signal from the time domain to the frequency domain. There are several techniques that can be used, namely the Constant Q, Wavelet or Fourier Transforms. The Fourier Transform family is widely used for time-frequency conversions. The options available were the Fast Fourier Transform (FFT) and the Short Time Fourier Transform (STFT). The STFT was eventually opted for, due to its ability to operate more effectively on non-stationary audio signals, as opposed to the FFT. Essentially, the STFT is the FFT applied to small sections of a signal at a time.

Once the program receives the STFT result, the values are mapped to a chroma vector, which translates the frequencies in Hz to actual musical notes (chroma). The chroma vector was opted for due to its computational efficiency and low dimensionality. In fact, the chroma vector is a commonly used option in the field of key detection and chord recognition. The result of the chroma vector can be visually displayed on a chromagram.

When faced with the question of analysing the values of the chromagram, there were two possible routes to the final result of detecting a key. Hidden Markov Models (HMMs) can be used to record the template which provides the best alignment with the chroma vector; the prediction is then worked out through a probabilistic model. Alternatively, a pattern matching technique using correlation can be performed with a set of pre-defined templates representing the 24 possible keys. In this case, the chroma vector is correlated against the 24 possible keys, yielding a correlation coefficient for each key. The probabilistic method requires a lot of time for the development and training of HMMs, while the pattern matching approach, although a different method, still offers very similar results. Thus, the pattern matching technique was used for this system.
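Putting the chosen steps together, a skeleton of the pipeline might look as follows. This is a hedged sketch of the approach just described (STFT, chroma mapping, correlation against binary templates), not the dissertation's actual implementation: the parameter values are illustrative, and the simple maximum at the end stands in for the fairer weighting system.

```matlab
% Minimal sketch of the key detection pipeline: STFT -> chroma ->
% template correlation over the 24 possible keys.
[y, fs] = audioread('track.wav');      % placeholder file name
y = mean(y, 2);

[s, f, ~] = spectrogram(y, hamming(4096), 2048, 4096, fs);   % STFT
mag = abs(s);

% Accumulate the spectral energy into a 12-bin chroma vector.
chroma = zeros(12, 1);
for b = 2:length(f)                    % skip the DC bin
    pc = mod(round(69 + 12 * log2(f(b) / 440)), 12) + 1;  % 1 = C, ..., 12 = B
    chroma(pc) = chroma(pc) + sum(mag(b, :));
end
chroma = chroma / norm(chroma);        % normalise the relative intensities

% Correlate against 24 binary key templates (12 major + 12 minor).
major = [1 0 1 0 1 1 0 1 0 1 0 1]';   % C-Major scale tones
minor = [1 0 1 1 0 1 0 1 1 0 1 0]';   % C natural minor scale tones
rho = zeros(24, 1);
for k = 1:12
    cM = corrcoef(chroma, circshift(major, k - 1));
    cm = corrcoef(chroma, circshift(minor, k - 1));
    rho(k)      = cM(1, 2);
    rho(k + 12) = cm(1, 2);
end
[~, best] = max(rho);                  % most probable of the 24 keys
```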

37 Ellis (Ellis, 2007) implementation was well described in his paper Beat Tracking by Dynamic Programming and he also offered some of his implementation source files under a free license. The implementation was described in the Literature Review chapter. Small details in the source code were edited and the algorithm was included in the application to extend its usability. This proved to be the best option given the timeframe of the project. 4.2 Recommender Functions Once the user has added a significant amount of entries into the collection, the recommender functions are used to aid in harmonic mixing. The following is a description of the functions available within the system Getting Candidates by Key and BPM These two functions were designed with the DJ s main requirement in mind; that of getting next track suggestions. The Get Candidates by Key function returns harmonically matching candidates to the selected track. The Get Candidates by BPM on the other hand returns tracks whose BPM resides within a range of the chosen track s BPM. Upon opening the application, the Key and BPM buttons appear to be inactivated. This indicates that first a track has to be selected in order to use any of the functions. Once the user proceeds with selecting the track by clicking on the desired row in the table view, the buttons are activated indicating that an option can be chosen. When the user is within the music collection view, the functions work on the whole music collection table in the database. On the other hand, when the user has previously chosen a playlist view, the functions only return recommendations from that specific playlist. As described in the previous chapter, a key is compatible with itself and three other keys. When the key button is chosen, the function queries the database, returning the records, which match the four possible keys. The relationship between keys is visualised in the Camelot Wheel (figure 2 in Section 2.4.3). In the BPM case, the function returns tracks that are within a range of the selected track s BPM. Initially, it was planned that the program would return tracks that were in the range 2 < x < 2 where x is the current track s BPM, meaning that the accepted range 26

38 would be that of +/ 2. However, it was decided to let the user decide the range to make the function more intuitive and user friendly. At the end of both options, the results are displayed in the view table. In the BPM case the results are displayed in order, showing the closest candidates to the selected track first. In the case that the program does not find any candidates, an appropriate message is displayed to notify the user. The user can revert to display the music collection by clicking on the All tracks button available in the GUI. In the Figure 9, the general flow of activity in the getcandidatesbykey function is described. Figure 9 - Activity Diagram of the getcandidatesbykey function Getting a whole mix recommendation In this section, the functions that return a mix recommendation based on either Key or BPM values, are described. The Recommend a Mix button is prominent under the Recommender Tools section of the GUI. The number of tracks in the mix is predetermined by the user. Both functions start off with the user selecting the starting track of the mix from the table. Similarly to the previous section, the Recommend a Mix button is inactivated upon opening the program. In these functions case, however, a new GUI component is present. A pop up menu is used to let the user decide on whether to get recommendations based on Key or BPM. In this case, when the user selects a track from the table, the pop up menu button is activated, indicating that the user can select an option. When a selection is made in the pop up menu, the Recommend a mix button is activated too, indicating that the user can proceed to get a mix sequence recommendation. The next option lets the user decide whether the selected track of the mix recommendation should be the first, middle or last track. In the last track scenario the function simply flips the result of the first track scenario, consequently showing the chosen track as the last one. In the middle track scenario a different function is created. The middle function is based on the same idea as the start function. First, the user 27

39 specified number of tracks variable is split into two variables left and right. Then, these variables are used to iterate through two loops and generate tracks on both sides of the middle track. This way, if the chosen number of tracks value is eight, left will be four, right will be four and the selected track will be in the middle. In this section the focus is on the first track scenario which will be described in more detail. Similarly to the functions described in the previous section, the recommendation can be restricted to only suggest tracks from a specific playlist, rather from the whole music collection. The program always holds the path of the current track in memory and uses the key and BPM fields of that particular track to perform a set of operations in order to establish the next track. Specifically to the harmonic mix recommendation case, the function does not only consider the key of the next possible track, but also which of the harmonically matching keys is closest in BPM. The reasoning behind this is that a track which is harmonically matching but is faster or slower, will have to be increased or decreased in speed in order to be synchronized with another track, a factor which would change the original key of the track. Consequently, choosing the next possible track solely based on whether it is harmonically compatible is inaccurate. This feature is put into practice by having the user select an accepted BPM range boundary along with the starting track position and the number of tracks in the mix. The harmonic mix recommendation function uses the previously created functions (described in the previous section) to get the candidates by Key and the candidates by BPM. The next track should be key compatible and as close as possible to the BPM of the current track. On the other hand, when generating mix recommendations based on BPM range boundaries, the function retrieves tracks whose BPM value resides within a range of the previous track s BPM. The BPM range is pre-determined by the user in a dialog box that appears upon function selection. The BPM mix recommendation function uses the method (described in the previous section) that generates candidates by BPM and sets the track with the closest BPM as the next track in the mix. In the case that more than one track has the same value, a random track is chosen. Initially, the first track that fits the range was selected, however to ensure that the system does not always output the 28

40 same results for the same options, a random selection system was introduced. In the eventuality that not enough tracks are found, an appropriate message is output to the user. 4.3 Front End - Graphical User Interface Design Since this application is aimed to facilitate the DJ s work, the front end plays a very important part, as this is the part of the application that the user will be constantly be interacting with. In this section various aspects related to the GUI of the application and the measures taken to achieve an effective result will be discussed (Martin, n.d). The following figure shows an early prototype of the system. Figure 10- Early Proto-type of the Graphical User Interface The final GUI is very similar. The main difference is that an Audio Player with Play and Stop control buttons was added above the Recommender Tools section. An effective user interface should be structured purposefully. Similar things should be grouped, whilst dissimilar things should be differentiated. For this reason, the interface for the artifact is divided into sections. These are Track Options, Recommender Tools and the View Table. The latter is used to display information about the tracks in the music collection, the playlists created by the user and the results of the functions in Recommender Tools. In a later version of the GUI, an extra section dedicated to the 29

41 Audio Player was added. In this section the functionality of playing audio files was introduced. A user interface should also be simple, easy to understand and straightforward to use. This is achieved by ensuring that all options needed to perform a specific task are clearly visible and can be easily found and accessed. Any unneeded or redundant information was left out as not to distract the user. An important feature, as already described in the previous section, is that upon opening the application, some buttons are de-activated. This sends the message that an action should be completed before the underlying functions can be used, e.g before using the function to recommend tracks by key, a track should be selected from the view table, or a playlist has to be selected prior to being deleted. More importantly, users should be informed of errors and changes in state and condition. Consequently, for every executed function, the system returns a message in an appropriate dialogue. Additionally, a progress bar showing the amount of time left to finish the task is shown in functions that take a significant amount of time to process completely e.g. when adding a track and analyzing it s key. This asks the user to wait until the task is finished before attempting to perform another. 4.4 The Database Schema A database was essential for the system. The database was built using Postgresql DBMS. The data of the audio file is stored in the database along with a unique id for each record. The path is also unique as a single track cannot be added to the database twice. Information about audio tracks is stored in a table called musiccollection. Music Collection Table Attribute Data Type Constraint id Character Varying(8) Primary Key title Character Varying(200) artist Character Varying(200) album Character Varying(100) genre Character Varying(100) bpm Character Varying(3) duration CharacterVarying(12) 30

42 musickey Character Varying(3) path Character Vasrying(200) Unique, Not Null Table 1 - Database Schema of MusicCollection Another table playlist is used to record playlist entries. This table consists of a unique Playlist id and a Playlist name that is decided by the user. The latter is also unique to provide means for the user to visually distinguish one playlist from another. Playlist Table Attribute Data Type Constraint playlistid Character varying(8) Primary Key playlistname Character varying(50) Unique, Not Null Table 2 - Database Schema of Playlist Finally another table called musiccollection_playlist resolves the many to many relationship between the musiccollection and playlist tables. MusicCollection_Playlist Table Attribute Data Type Constraint mcpid Character varying(8) Primary Key playlistid Character varying(8) Foreign Key trackid Character varying(8) Foreign Key Table 3 - Database Schema of MusicCollection_Playlist The relationship between the described tables can be better understood in the Entity Relationship Diagram in Figure 11. The entities in the figure only feature the key fields. Figure 11 - Entity Relationship Diagram of the system 31

43 5 Implementation This chapter describes the implementation of the system. The aim is to give a detailed and intensive idea of how the system was coded. In some cases, the explanation will be merely an extension and a more detailed description of the previous chapter. On the other hand, in other cases, snippets of the source code will be presented to help the reader understand the implementation of the system through the use of the various functions, methods and algorithms. This section will be split in three main sections: Software, Audio Functions and Recommender Functions. 5.1 Software The system was coded using Matlab_R2014A. Matlab offers the required level of programming for the purpose of this project and was eventually opted for. Matlab also offers a Graphical User Interface creator called Guide, which makes it pretty easy for the programmer to create and manage GUIs. The artifact also required the use of a Database Management System. The choice was Postgresql 9.4. The Java Database Connectivity (JDBC) driver was used to establish connections between Matlab and Postgresql. 5.2 Audio Functions Add Track The add_file_to_db function starts by opening a file selection dialogue box with the Matlab function uigetfile. The user browses through the computer directory and chooses the desired track. Once a track is chosen, the program immediately executes a query with the exists SQL function to determine whether the track already exists within the database. The next step is to get the data about the chosen file. This is achieved through the use of the functions audioinfo and fileparts. In some cases, when the relevant details are not retrievable from the audio, a default value is given. The key and BPM detection functions are called, which consequently execute a series of other functions that work together to detect the key and tempo of the chosen audio track. The id is created by 32

44 getting the count of rows within the musiccollection table and using this value to get the id of the last track in the table and incrementing the value to create a new id. This ensures that the unique constraint is always respected. When the required information has been obtained, the snippet of source code shown below is executed to submit the fields and create a new record within the musiccollection table. tablename = 'musiccollection'; colnames = 'id','title','artist','album','genre','bpm', 'duration','musickey','path'}; data = {id,title,artist,album,genre,bpm, duration,musickey,fullpath}; data_table = cell2table(data,'variablenames',colnames); fastinsert(conn,tablename,colnames,data_table); curs = exec(conn,'select * from musiccollection'); curs = fetch(curs); close(curs); close(conn); The music collection is immediately refreshed and the added track is visible instantly Create Playlist When the user clicks on the Create Playlist button, an input dialog is shown prompting the user to enter the name of the playlist. The name is saved in a variable playlistname. Upon entering the name, the user is instructed to choose the tracks to be added to the playlist by clicking on their respective rows in the music collection table and clicking on the Save button when ready. When a row is clicked in the music collection table, the Cell Selection Callback function is called. A function whentrackclicked was created to get the path and id of the track in that row. This function is being used in the callback and the ids of the tracks being selected by the user are saved in an array idvector. The program does not allow duplicates and any track chosen twice is simply not considered. The array idvector is used in the Save callback function. In this function the createplaylist function is called and playlistname and idvector are passed as arguments. 33

45 In the createplaylist function a query is immediately executed to order the records within the playlist table by id. Then the count is used to get the value of the largest id and create a new id which is simply an increment of one on the largest id. Another query checks if the name of the playlist already exists within the table in which case the name of the playlist is concatenated with the occurrence count to make it unique. Next, the function simply creates a new playlist with the new id and the unique playlist name in the Playlist table in the database. A for loop iterates for x times, where x is the number of tracks in idvector. During every iteration, the track ids are retrieved from idvector one by one and a new unique id is also created for the musiccollection_playlist table record entry. A new record is created in the mentioned table after every iteration, with a unique id mcpid, the previously created playlist id playlistid which is a foreign key from the playlist table and the track id trackid in that iteration (from idvector ) which is a foreign key from the music collection table. The following table shows the records created in the musiccollection_playlist table after creating a new playlist with playlistid 21 and the selected tracks with trackid 6, 9, 13 and 17. Table 4 - The Data in MusicCollection Playlist after creating a playlist Finally, the playlist should be shown in the music collection table. This achieved through the execution of the following query: SELECT a.* FROM musiccollection a INNER JOIN musiccollection_playlist b ON a.id = b.trackid WHERE b.playlistid = i where i is the playlistid of the playlist to be displayed. The above query refers to the one-to-many relationship between the tables where the primary key in the musiccollection table is a foreign key in the musiccollection_playlist table. 34

46 5.2.3 Key Detection As mentioned in the Add Track section, the key detection function is called as soon as a track is added to the collection and the detected key is saved among the other fields in the database. The following figure continues to graphically describe the flow of activity in the Key Detection algorithm. In this section all the functions that the program runs through to get to the detected key are described. Figure 12 - Activity Diagram of the Key Detection Algorithm The audio track is read using the audioread function. The samples of the track are saved in y. Then the split_audio_news function, splits y into 5.5 seconds intervals. At the end of every loop, each 5.5 second interval is created and the samples that make up the interval are saved in section_samples. Section_samples is passed as an argument to the function stereo_to_mono which transforms the audio from stereo to mono. This is done by taking the average of every two consecutive samples and thus reducing the number of samples by two. After having performed the above mentioned two processes on each 5.5 second interval, each interval s samples are added to a matrix final. Every audio process that follows is applied on each column vector in final (on every 5.5 second interval) using the demo_chroma function.. The audio is divided into overlapping frames of N = 8192 points and converted to the frequency domain using the STFT representation. The result can be displayed using a time-frequency spectrogram which shows frequency intensitities of the audio file along time. The following figure shows the Spectrogram reprsentation of an A-Major chord. 35

47 Figure 13 - Spectrogram representation of the A-Major Chord The next step is to map the STFT result which consists of frequency intensities in Hz to pitch classes or real music notes. Once the STFT result is obtained, it is mapped to the Pitch Class Profile (PCP) features. The result is a chromagram which consists of twelve dimensional vectors, having each dimension correspond to the intensity of a semitone class (chroma). This procedure collapses pure tones of the same pitch class to the same PCP bin (i.e. notes having the same pitch irrespective of octave). Frequency to pitch mapping is achieved using the logarithmic characteristics of the equal temperament scale. STFT bins k are mapped to PCP bins p according to: p(k) = [24. log 2 ( k N. f sr f ref )] mod 24 where f ref is the reference frequency corresponding to the first index in the chroma vector. The frequency f ref = 440Hz (corresponding to pitch class A ) was chosen. f sr is the sampling rate. Having left the audio to its original sample rate the f sr is N is the size of the FFT in samples. The choice of N is For each time slice, the value of each PCP element is calculated, by summing the magnitude of all frequency bins that correspond to a particular pitch class i.e. for p = 0, 1,, 23, PCP[P] = X[k] 2 k:p(k)=p The result of the described frequency to pitch mapping can be shown on the Chromagram representation in Figure 14. The intensities at the 1, 5 and 8 mark on the 36

48 y-axis represent A, C# and E respectively, which are the root, third and fifth note of the A-Major chord. Figure 14 - Chromagram representation of the A-Major Chord In the case when the chroma vector has a significant amount of 0 values, it is simply skipped. This ensures that points in the song where there might be silence and which can therefore affect the final accuracy rate of the algorithm are not considered. The next step is to match the processed chroma vector to pre-defined key templates. A matrix of binary templates representing the twelve Major and twelve Minor possible keys, is created. Each row vector represents a key and is 12 twelve dimensional, as the chroma vector. Each bin in each template represents a pitch class. Each key is labelled according to the root, third and fifth notes within the chord. The labelling is according to: [A,A#,B,C,C#,D,D#,E,F,F#,G,G#]. An example of the A Major key template is shown below. AMaj = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]; where the 1 s in the template represent the root note A, the third note C# and the fifth note E of the chord A Major. Every different major key template is just a shifted version of the other. On the other hand, the minor key templates are the same as the major ones, but the third note is shifted by 1 to the left. The following example shows the A Minor key template. AMin = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]; 37

49 Now, each chroma vector is correlated against each key template. This step produces a correlation coefficient for each of the twenty-four keys. The correlation is achieved through the use of the corrcoef function offered by Matlab. For each chroma vector, there will be a correlation coefficient R for each key. For this reason, a twenty-four dimensional vector is created for each chroma vector having each element represent the correlation co-efficient of the said chroma vector with the key represented by that particular index e.g. Index 1 represents A Major, index thirteen represents A Minor. The next step involves the use of these coefficients to give a weighting to the highest correlated key. First, the coefficient vector is normalised by dividing each coefficient value by the top value, so that the top value becomes one. Then a weighting is assigned to the key that returned the highest coefficient. This weighting corresponds to the difference between the highest and second highest correlating keys. This weighting system rewards keys that returned a correlation coefficient which was by far the highest value. On the other hand, it penalises keys that correlate closely to the other values. A weighting vector is kept and the weighting for the most probable key is simply added to its respective index at the end of every 5.5 section section iteration. When the end of the song is reached, a weighting for every key will be available in the weighting vector. The key with the highest weighting is chosen as the detected key. The key value is added to the music collection table record, among the other detail fields. 5.3 Recommender Tools Generating Candidates by Key and BPM In the main menu, the user needs to first select a track from the music collection table view and choose to either get candidates by key or BPM by clicking on the desired button. Then the user decides the number of tracks in the mix and the BPM range boundary. Both functions begin by connecting to the database and retrieving the BPM or key of the selected track. Then the program checks whether the user has selected a playlist or not. If a playlist has been chosen, the following query is executed so the function only considers the tracks from within that particular playlist. 38

50 SELECT a.id FROM musiccollection a INNER JOIN musiccollection_playlist b ON a.id = b.trackid WHERE b.playlistid = pid where pid is the playlistid of the chosen playlist. The id of every track within the said playlist is saved in a array idlistcell. In the next section the candidates are retrieved from the appropriate data, set i.e. if a playlist has been previously chosen, the considered data set is made up of the tracks within that playlist ( idlistcell ) and if not, the data set considered is the whole music collection. At the end, when both functions have returned the candidates in the array data, the function difference offered by Matlab is applied on the ids in data and the id of the chosen track. This ensures that when displaying the results, the id of the selected track is not considered. The final step of both functions involves showing the candidates in the table. This is achieved by the execution of the following query: SELECT mc.* FROM musiccollection mc JOIN ( SELECT * FROM unnest(array[candidates]) WITH ordinality) AS x (id, ordering) ON mc.id = x.id ORDER BY x.ordering); where candidates is the array consisting of the candidates to be displayed. This query returns a new array ordering the candidates to be displayed according to the occurrence of ids in candidates, e.g. if the contents of candidates is 3, 10, 1, 12 then the results displayed in the table will be displayed in that order. This is especially useful in the BPM case, where the closest tracks in BPM to the selected track are given higher priority in the table. This section will now branch out to describe details which are specific to the BPM range candidates and harmonic candidates functions respectively. Specifically to the BPM scenario, in the getcandidatesbybpm function, the range boundary is established depending on the value chosen by the user in the main menu e.g. If the user chooses a track whose bpm is 128 and a BPM range of 2, the lower and higher boundary would be 126 and 130, respectively. 39

51 The query used to retrieve the candidates that fit the bpm boundary range in the event that a playlist had been chosen follows. The result is saved in an array data. SELECT id FROM musiccollection JOIN Unnest(String_to_array('idlist', ',')) id using (id) WHERE bpm BETWEEN lowerbpm AND higherbpm where the subset of the music collection that is considered is the list of ids found within idlist. The variables lowerbpm and higherbpm are the lower and higher BPM range boundaries, respectively. Similarly, when a playlist has not been chosen, the query used is the same but does not require the join and unnest statements and instead queries the whole music collection. The next step creates an array bpmarr with the same size of data and populates it with the respective bpm values of every track in data. Next, bpmarr is sorted with relation to the BPM value of the originally chosen track. Then the indices of the sorted bpmarr which contains BPM values are used to sort the original array data which contains track ids. This ensures that the tracks closest in BPM to the chosen track are given higher priority when the result is viewed in the table. On the other hand, when generating candidates by key, using the getcandidatesbybpm function, a case scenario is considered through the use of a switch statement. This is done on the key of the chosen track. A typical case would be: case 'A' hmk = ['A ';'D ';'E ';'F#m']; where A is the key of the selected track and hmk are the harmonically matching keys. Similarly to the BPM function, when the user has previously selected a query, the query executed only considers a subset of the music collection table. The query in this case is shown below. SELECT id FROM musiccollection JOIN unnest(string_to_array('idlist, ',')) id USING (id) WHERE m usickey = 'keyone' OR musickey = 'keytwo' OR musickey = 'keythree' OR musickey = 'keyfour'); where the harmonically matching keys are split into four different variables keyone, keytwo, keythree and keyfour, and the only ids that are considered are those in idlist. 40

52 When the user has not previously selected a playlist, the query is executed on the whole music collection Mix Recommendation by Key In the main menu the user selects the starting track for the mix from the music collection table. The user has the option to select between Key and BPM from the pop up menu near the Recommend a Mix button. Once the user selects the Key option and clicks on Recommend a Mix, a dialogue box asks for two user inputs: the number of tracks desired in the mix, and the accepted BPM range. The latter value is used, so the function does not only consider whether the next chosen track is harmonically matching but also which of the harmonically matching tracks are closest in BPM to the selected track. In the next dialogue, the user decides whether the chosen track should be the first, middle or last track of the mix. For now, only the first track scenario is considered. This function also considers the two different scenarios in which the user has previously selected a playlist or not. This is due to the use of the aforementioned functions getcandidatesbykey and getcandidatesbybpm which query subsets of the music collection table when the user has selected a playlist view. The function, first gets the count of the tracks to be considered i.e. the count of the tracks in the whole music collection or the count of the tracks in the playlist (if a playlist had been previously chosen). This is done to inform the user in the case that not enough tracks are available to return a mix recommendation with the desired number of tracks, in which case the function terminates at that point. The function then repeats the following in a loop for x times, where x is the number of tracks decided by the user. The id of the starting track is retrieved and immediately added to arrayids, which is intended to store the list of ids of the tracks that make up the mix. The BPM of the chosen track is also retrieved. Then the getcandidatesbykey function is executed on the track in that iteration and the result is saved in possibletracksids. The difference function offered by Matlab, is applied to the lists of ids in possibletracksids and arrayids. This ensures that tracks that are already within the mix in that iteration are not considered as possible next track recommendations. If the difference function returns an empty array, it means that either all the ids in the result returned by getcandidatesbykey were already previously chosen or no harmonically compatible 41

53 candidates were found, in which case the function breaks out of the loop and outputs a message declaring that not enough tracks were found to make up the whole mix recommendation. Now the getcandidatesbybpm function is executed on the track in the current iteration. Similarly, the difference function is applied to only consider tracks which are not already in arrayids. The result is saved in a array possibletracksbpm. Next, the intersection of possibletracksids and possibletracksbpm is taken. The result saved in intersection represents tracks that are both compatible by key and within the requested BPM range. If the size of intersection is 1, the function takes the first and only occurrence of intersection and sets it as the next track. The other 2 scenarios are when the intersection size is greater than 1 and when the intersection is empty. In the first scenario, when the intersection size is greater than 1, the intersection array consisting of ids is translated into another array of the same size called newintbpmarr, with the respective BPM value of every track in intersection. Next, the function getclosesttrackbybpm is called. This function starts off by sorting newintbpmarr with respect to the BPM value of the track in the current iteration, ctbpm. The indices of the new sorted array are used to sort the original intersection array containing ids. The function takes the first element of the now sorted intersection array in a temporary variable, and checks if it has any key candidates that have not already been used in the parent function. If no harmonically compatible candidates are available with this option, the next track in the sorted array is considered and the same test to establish whether the track in question would have any candidates or not is re-performed. If no track has any candidates, the function just chooses the first element of the sorted array, since in this scenario the function does not have any other suitable candidate for the next position in the mix recommendation and subsequently a deadlock is inevitable. The described process ensures that the chosen track by the getclosesttrackbybpm function, does not result in an unneeded deadlock if any of the other tracks within the BPM range could offer new track possibilities for the mix recommendation. This means that the program looks one step ahead, in the sense that it avoids choosing tracks which in the next iteration would have no harmonically compatible candidates. In the second scenario, since the intersection size is zero, there were no common candidates in possibletracksids and possibletracksbpm : no candidates that are both 42

54 harmonically matching and within the BPM range. Therefore, the track closest in BPM to the track in the current iteration is retrieved from possibletracksids again with the use of the function getclosesttrackbybpm described above. This will only be considering the harmonically matching tracks but will return the closest one in terms of BPM for better accuracy. At the end, the chosen track to occupy the next position in the mix, is checked one last time to ensure that it has not already been used, which would mean that every possible candidate has already been used. If the chosen track is indeed a possible candidate, its path is retrieved from the database and is set as the new path string which will be used again in the next iteration. In this section, however, only the scenario in which the chosen track was set to be the first track in the mix, was considered. When the track is chosen to be the last track of the mix, the result from the described function is simply flipped, such that the first track becomes the last. When the track is chosen to be the middle one, 2 loops are traversed instead of just 1. First, the number of tracks is split into variables, left and right. A loop iterating for the value of left generates half the tracks. The value is flipped, such that the chosen track becomes the last track. Another loop generates the other half of the tracks to satisfy the right side of the mix. Both results are then concatenated and displayed in the result view table. 43

55 6 Testing and Evaluation This chapter concerns a series of testing and evaluation, starting with parameter testing. In this regard, the optimum parameter values are obtained by testing the algorithm with different values. Moving on to the evaluation, a quantitative approach is adopted to define the accuracy rate for the key detection in three different scenarios. The first scenario concerns key detection on single chords and chord progressions. Songs from the classical music genre and those from various sub-genres of dance music are analysed in the second and third scenarios respectively. The second part of the evaluation section is dedicated to qualitative evaluation. The first part evolves around black box tests, in which the output of the various functions of the system is tested on smaller test sets. The performance rates of the key detection algorithm and the other functions of the application are also evaluated in this section. After each test, a discussion on the results obtained will follow and the strengths and weaknesses of the system will be outlined accordingly. 6.1 Parameter Testing on Key Detection Algorithm Parameters play a crucial part in ensuring the best results from the algorithm. In some cases different parameters were returning a wide variety of results. In other cases it was interesting to determine whether parts of the system were beneficial or not for the algorithm. Therefore, it was decided that the best course of action was to perform parameter testing on a set of seven tracks. The genre opted for was Dance Music for the reason that a DJ is most likely to be using the application with similar tracks FFT Length The right FFT length N is a balance between analysing enough data to avoid inaccuracies and not taking too much of the signal at a time. A too small N may result in not enough harmonic data being captured, whilst a too large N would defeat the whole purpose of using STFT in the first place. In the following test, different lengths are tried on the set of tracks and the different results are examined. The FFT lengths tried are always a power of 2. This ensures the exploitation of the computation efficiency of the FFT. 44

56 FFT Length (N) in Samples Elderbrook How many times (Andhim Remix) (Bm) Milos Ilic Slow Loris (Abm/ G#m) Jules & Moss Double False Face (C) Johannes Brecht Nuages (Original Mix) (Gm) Luca Guerrieri Harmony (Original Mix) (Ebm / D#m) Show- B -Arps to Heaven (Original Mix) (Gm) Robosonic Busted (Original Mix) (Abm / G#m) 1024 Bm G#m C G D#m A# B 2048 Bm G#m C G D#m Gm B 4096 Bm G#m C Gm D#m Gm G#m 8192 Bm G#m C Gm D#m Gm G#m Bm G#m C Gm D# Gm D#m Table 5 - Key Detection Parameter Testing Results for FFT Length A test performed on the above test set, showed that the FFT length that returned the best accuracy was N = 8192 samples. On the N = 1024 and N = 2048 marks, the key detection process was particularly slow. There are constant positive detections on the 8192 FFT length mark and the performance rate is relatively fast. Hence, the FFT length parameter is chosen to be N = Hopsize The hopsize is the time in between overlapping frames of the STFT. More overlap will give more analysis points and in turn more accurate results, however, the performance will be affected proportionally. A small hop size generally offers the best results, but a balance should be reached between an accurate and relatively fast key detection algorithm. Hopsize (ms) Elderbrook How many times (Andhim Remix) (Bm) Milos Ilic Slow Loris (Abm/ G#m) Jules & Moss Double False Face (C) Johannes Brecht Nuages (Original Mix) (Gm) Luca Guerrieri Harmony (Original Mix) (Ebm / D#m) Show- B -Arps to Heaven (Original Mix) (Gm) Robosonic Busted (Original Mix) (Abm / G#m) 11.6ms Bm G#m C Gm D#m Gm G#m 23.2s Bm G#m C Gm D#m Gm G#m 46.4ms) Bm G#m C Gm D#m Gm G#m 92.9ms Bm G#m C Gm D#m Gm G#m 185.8ms Bm G#m C Gm D# Gm G#m Table 6 - Key Detection Parameter Testing Results for Hop Size 45

57 In the smaller hop sizes e.g 11.6ms, the time taken for the key detection algorithm to run completely was too long, especially given the fact that every track s duration was beyond the 5 minute mark. The key detection was unaltered in all the recordings except in Harmony in which an incorrect detection was recorded at 185.8ms. On the 92.9ms mark, it can be noted that the algorithm returned correct detections and the algorithm executed with an acceptable performance rate. Therefore, the hop size in the final implementation was chosen to be equivalent to 92.9ms Downsampling The intention behind down sampling is to reduce the number of samples within the audio file and speed up the computation of the STFT. In this section, the effect that different down sampling rates has on the final key detection is tested with regard to the detected key and the time taken for the key detection algorithm to successfully return a key value. Down Sampling Rate (hz) Elderbrook How many times (Andhim Remix) (Bm) Bm (7.10s) Bm (10.57s) Bm (18.71s) Milos Ilic Slow Loris (G#m) G#m (6.57s) G# (10.17s) G#m (17.17s) Jules & Moss Double False Face (C) C (6.84s) C (10.02s) C (18.10s) Johannes Brecht Nuages (Original Mix) (Gm) Gm (5.39s) Gm (8.00s) Gm (13.72s) Luca Guerrieri Harmony (Original Mix) (Ebm) D#m (5.32s) D# (8.02s) D#m (13.51s) Show- B -Arps to Heaven (Original Mix) (Gm) Gm (4.99s) Gm (7.84s) Gm (13.00s) Table 7 - Key Detection Parameter Testing Results for Down Sampling Robosonic Busted (Original Mix) Abm D#m (6.01s) D#m (8.28s) G#m (15.78s) On the 11025hz and 22050hz sample rates, some of the tracks recorded incorrect detections. On the 44100hz (no down-sampling), the algorithm returned a perfect accuracy rate. The average time taken to perform key detection with an 11025Hz sample rate (down-sampling ratio of 4) was 6.03 seconds as opposed to seconds on the 44100Hz mark. The difference is not very significant and this time frame is acceptable since key detection is only performed once upon track addition. Therefore, downsampling was omitted from the final implementation. 46

58 6.2 Quantitative Evaluation on Key Detection Algorithm In this section, accuracy rates will be established from tests performed on three different test sets. The first will be a test set comprised of single chords or chord progressions. The second and third test sets are from Classical and Dance music, respectively Accuracy Test on Single Chords and Chord Progressions When testing the key detection algorithm on four single chords and four chord progressions, the algorithm recorded an accuracy rate of 100%. The test files were created on a Digital Audio Workstation particularly for this project. The single chord files were simply a repetition of one chord in a period of time. The chord progressions were a sequence of chords that can be found in a particular key. Both sets of audio files always returned a correct key detection. This high accuracy rate can be contributed to the fact that only a single instrument was used in the audio files, which is the easiest scenario for the key detection algorithm to successfully detect a key Accuracy Test on Classical Music An evaluation exercise was performed on a set of twenty classical songs. In such examples, the keys of the tracks are known in advance. The results can be seen in the following table. The tracks, which are labelled with, were detected correctly. Name of Song Actual key Detected Key 1 Ludwig Van Beethoven Symphony No 5 in C Minor 2 Antonio Vivaldi, The Four Seasons (Le Quattro Stagioni) - Concerto for Violin in E Major, RV 269, Op. 8, Spring Allegro 3 Johann Sebastian Bach, Partita for Solo Violin No. 3 in E Major, BWV 1006 Preludio Cm E E Cm E E 4 Johann Pachelbel, Canon in D Major D D 5 Johann Sebastian Bach, Brandenburg Concerto No.3 In G Major, BWV 1048, 1. Allegro G G 6 Wolfgang Amadeus Mozart, Concerto for Piano and Orchestra No. 21 in C Major, K. 467 II. Andante 7 Franz Liszt Liebestraum No. 3 In A-Flat Major, G 541, Op. 62 F F G# G# 8 Wolfgang Amadeus Mozart, Symphony No. 40 in G Minor, KV 550 I. Allegro Molto Gm Gm 9 Tomaso Albinoni, Adagio in G Minor Gm Gm 47

59 10 Ludwig Van Beethoven, Bagatelle for Piano in A Minor, WoO 59, Fu r Elise, No. 59 Am Am 11 Stanley Myers The Deer Hunter - Cavatina E E 12 Joseph Haydn, Symphony No. 94 in G Major, The Surprise II. Andante C C 13 Georg Philipp Telemann, Concerto for Viola and Strings in G Major II. Largo 14 Johann Sebastian Bach, Mass in B Minor, BWV 232_ VI. Et Resurrexit 15 Antonín Dvořák, Symphonie No. 9 in E Minor, From the New World, II. Largo G G Bm C# Em (varying) C# 16 Wolfgang Amadeus Mozart, Piano Sonata No. 11 in A Major, K.331 III. Alla Turca - Allegretto 17 Edvard Grieg, Concerto for Piano and Orchestra, Op. 16 in A Minor Allegro Molto Moderato 18 Alexander Borodin, String Quartet No. 2 in D Major III. Notturno 19 George Frideric Handel, Water Music Suite No. 1 in F Major Air 20 Claude Debussy, Suite Bergamasque No. 3, L 75 Clair de Lune A Am D F Db A Am A F C# Table 8 - Key Detection Accuracy Rate Evaluation on Classical Music The program correctly detected 17/20 of the tracks which corresponds to 85% of the whole test set. In the 15 th track, the detected key was C# and the actual key is said to be Em, however, when looking at the musical staff of the song, it can be clearly seen that there are various changes in the key, which could contribute to the inaccuracy in the final detection. The other two tracks which were not detected correctly are the 14 th and 18 th track. When detecting these 2 tracks on Rekordbox, however, it incorrectly detected the key with the same key detected by our algorithm. This fact can point out that there might be some kind of inconsistency within the key of the tracks, a factor which is not very rare in the genre of Classical Music. In general, the key detection algorithm performed well with Classical music, in which acoustic instruments are frequently used. This contributes to a better definition of the chords being played and hence the STFT finds it easier to define the pitches in specific time instances Accuracy Test on Dance Music A set of 30 tracks were first randomly selected from a library of Dance tracks. The key detections of these tracks were first obtained from the Rekordbox software and the 48

60 Beatport website. Then, the test set was refined to 20, by only selecting the tracks which returned identical key values on both Rekordbox and Beatport. This ensures a better probability that the key which is considered to be the Actual key is actually correct. The results obtained can be seen in the table below. Name of Song Actual key Detected Key 1 RAR Do We (Original Mix) Am D 2 Special Case Ice Twice (Original Mix) C C 3 Max Cooper - Woven Ancestry (Lusine Remix) Dbm C#m (Dbm) 4 Hot Natured Benediction (Original Mix) Am Am 5 16 Bit Lolitas Not The Only One (Original Mix) Em Em 6 Recondite Caldera (Original Mix) Ebm D#m (Ebm) 7 Moderat No. 22 (Original Mix) Em Em 8 Whomi Near Walls (Original Mix) Am A 9 Jamie Stevens The Wonder of you (Original Mix) Cm Cm 10 Lizzie Curious Wiggle (Original Mix) Bb A# (Bb) 11 Johannes Brecht - Sirenes (Original Mix) Cm Cm 12 Olivier Giacomotto, Kiko Beautiful Place (Original Mix) 13 London Grammar Sights (Dennis Ferrer Remix) Ebm Ebm (D#m) 14 Dr. Kucho Can t Stop Playing (Oliver Heldens Ebm Ebm (D#m) Remix) 15 Guy J Once in a Blue Moon (Original Mix) A# Fm 16 Sasha - Xpander (Original Mix) Gm Gm 17 Ten Walls Walking with Elephants (Original Mix) Am A 18 Clarian Claire (Original Mix) Cm Cm 19 Paul Van Dyk For An Angel Am C * 20 HVOB Oxid (original Mix) Dm D Table 9 - Key Detection Accuracy Rate Evaluation on Dance Music In the above test, 15 out of 20 tracks were detected correctly. This corresponds to 75% of the test set. In the 19 th track, the detected key is the relative major of the actual key. In this case both the detected and actual keys share the same key signature. If it is considered a near miss, the accuracy rate goes up to 80%. F F 49

61 It is interesting to note how in the 8 th, 17 th and 20 th track, the detected and actual keys are of the same pitch class but an incorrect mode i.e. A instead of Am and D instead of Dm. Various articles on the internet describe key detection programs as biased towards the minor keys, as most tracks in Dance music are said to be written in minor keys. However, this was not confirmed by a reliable source and therefore, was not taken into consideration in the evaluation. The accuracy rate (75%) was slightly lower than that of the Classical music (85%); however this was expected as in Dance music the tracks are usually made up of more sounds that can interfere when detecting a key. The kick drum used in practically every Dance music track could itself reside in a different key and could therefore very easily affect the final accuracy of the key detection algorithm. In general, the accuracy rate was pleasant, especially when considering that tests performed on commercial applications concluded similar results. 6.3 Qualitative Evaluation Performance Evaluation Key Detection The performance of the key detection algorithm was considered to be one of the important aspects in the requirements of the system. In this section, the performance of the algorithm will be tested. The specifications of the test laptop computer are Intel Core- I7 2.3ghz with 16gb of memory. Name Duration Time Taken (s) C Minor Chord Progression Denney Low Frequency (Original Mix) Eddie Amador House Music (Robosonic Remix) Climbers Tomorrow Never Comes (N. Stojan Remix) Table 10 - Key Detection Performance Rate Evaluation The test set consisted of 4 audio tracks ranging from 9 seconds to 11 minutes 14 seconds. On an audio file of 9 seconds, the time taken to detect a key was under 1 second. In an audio file of over 11 minutes, the time taken was under 30 seconds (25.44s). This can be considered as an acceptable waiting time for the application. 50

62 6.3.2 Performance Evaluation Mix Recommendation One of the main functions of the application is that of returning mix recommendations starting from a user selected track and given a set of values. It is important that these recommendations are returned in an efficient manner. In this section, both the key and BPM functions will be evaluated against the time they take to run when fed different values. The number of tracks within the music collection at the time of test was 114. Key or BPM Number of Tracks Starting Track Time Taken (s) Key 5 First 7.08 BPM 10 Last 4.52 Key 15 First BPM 20 Middle Key 20 Middle Table 11 - Mix Recommendation Performance Rate Evaluation The least time taken to successfully return a mix recommendation with 5 tracks was 4.52s. The longest recorded time was 33.71s when generating a 20 track long harmonically compatible mix. The Key Mix recommendations are slower, as the computation is performed on both the key and BPM, as opposed to the BPM case, in which only the BPM values are considered. However, in general, the time taken to return a mix recommendation is quite fast and acceptable for the purpose of the project Get Candidates By Key / BPM Evaluation In this section 3 scenarios will first be reasoned out. Then, the output of the program will be evaluated to determine whether the functions are working as intended. These scenarios include: an instance for harmonically matching candidates, an instance for candidates within a specified BPM range and an instance for harmonically compatible candidates found within a chosen playlist. Let us consider the track Luca Guerrieri Harmony whose key is D#m. If the Key function is chosen, the program should returns tracks that are in D#m, G#m, A#m and F#. When generating the results, a total of 19 tracks were returned. The tracks all belonged to keys which were compatible to D#m. A snippet of the result from the result view table can be found in Figure

63 Figure 15 - Evaluation Results when generating Candidates by Key Now examine a scenario for the BPM case, by applying the function on the Minicoolboyz Absent track, whose BPM value is The BPM range was set to 2. Therefore, the application should return values from to The results should also be displayed sorted in relation to the chosen track s BPM value, which means that tracks around the 125 BPM mark should be displayed first and tracks near the 127 BPM mark should be displayed last. 11 tracks were returned and were all within the accepted range. The result was also displayed correctly. A snippet of the result can be seen in Figure 16. Figure 16 - Evaluation Results when generating Candidates within a BPM range In the last scenario, first, the playlist Deep House is selected. The playlist consists of 8 tracks. A snippet of the playlist can be seen in Figure 17. Figure 17 - Snippet of the playlist contents before conducting the test 52

64 The track Special Case Ice Twice is selected and the Key function is selected. The application should only return tracks from the currently selected playlist, Deep House. The candidates should be harmonically compatible to the chosen track i.e. in the key of B, E, F# or G#m. The result featured the only harmonically compatible tracks from within the playlist Deep House. A snippet of the result is shown in Figure 18. Figure 18 - Evaluation Results when generating Candidates by Key from a playlist Whole Mix Recommendations Evaluation In this section the correctness of the functions that return whole mix recommendations is tested and evaluated. Five scenarios that tackle the main features of these functions are considered. The following table features a description of each scenario and the used values. Key / BPM BPM Range Number of Tracks Starting Track 1 Key 2 5 First 2 BPM 4 6 First 3 Key 1 9 Middle 4 BPM 3 12 Last 5 Key 2 4 First Table 12 - Whole Mix Recommendations Evaluation Scenarios In the first two scenarios, one for key and one for BPM, the chosen track is the first track of the mix. The third scenario is a harmonically compatible mix recommendation, where the starting track will be the in the middle of the mix. In the fourth scenario a BPM mix recommendation is evaluated, in which the chosen track is the last track of the mix. In each of the mentioned scenarios, different parameter values for the BPM range and number of tracks are applied and the results evaluated accordingly. In the last scenario a harmonically compatible mix recommendation is generated for three consecutive times, using the same parameter values to test whether the function returns random mix recommendations or not. 53

65 For the first scenario the track Bob Sinclair Back Again was chosen as the starting track. The track s key is D and the BPM value is The BPM range was set at 2 and the number of tracks in the mix was chosen to be 5. The starting track was chosen to be the first track of the mix. The result should start from Back Again and return another 5 tracks. Each track should be harmonically compatible to the previous and as close as possible in BPM. The result can be seen in Figure 19. Figure 19 - Evaluation Results for Mix Recommendations Scenario 1 In the second scenario, another random track is chosen as the starting track. This is &Me Everless whose BPM value is Similarly to the previous scenario, the track was set to be the first track of the mix. The BPM range in this case was chosen to be 4 and the number of tracks for the mix was chosen to be 6. The first track in the result should be Everless and each new track s BPM must be within the specified BPM range, when compared to the previous track. The result can be seen in Figure 20. Figure 20 - Evaluation Results for Mix Recommendations Scenario 2 In the third scenario, the chosen track was Vincent Leijen Rosette. The track s key is Cm and the BPM value is The track was set to be in the middle of the mix recommendation. The number of tracks in the mix was chosen to be 9 and the BPM range was restricted to 1. The result should be a harmonically compatible mix in which the selected track should be the middle track. Similarly to scenario one, each track should match the previous track harmonically and be as close as possible in BPM. The result consisted of a total of 10 tracks (including the selected track) and Rosette was 54

66 the 6 th track. Each track is unique and bounded by harmonically compatible options. The result is shown Figure 21. Figure 21 - Evaluation Results for Mix Recommendations Scenario 3 In the fourth scenario, the chosen track was Jamie Stevens The Wonder of You. It s BPM value is The chosen BPM range was 3 and the number of tracks was 12. The chosen track The Wonder of You was chosen to be the last track of the mix. Thirteen tracks were returned in the result and the chosen track was in the 13 th (last) position. The difference between BPM values from one track to another is always within the previously specified BPM range. A snippet of the result is shown in Figure 22. Figure 22 - Evaluation Results for Mix Recommendations Scenario 4 In the fifth and last scenario we will test whether harmonically compatible mix recommendations generate random results. The same test will be executed three times, using the same values for the BPM range and number of tracks. The chosen track is Maceo Plex & Gabriel Ananda Solitary Daze. The track is in the key of D#m and its BPM value is The chosen BPM range and number of tracks values were 2 and 4, respectively. The chosen track was specified to be the first track of the mix. A snippet of the 3 results can be found in Figure 23, Figure 24 and Figure 25, respectively. 55

67 Figure 23 - Evaluation Results for Mix Recommendations Scenario 5 Take 1 Figure 24 - Evaluation Results for Mix Recommendations Scenario 5 Take 2 Figure 25 - Evaluation Results for Mix Recommendations Scenario 5 Take 3 In each of the recommendations, the succession of tracks made sensible and attractive sense. The results were also different in every instance. As can be seen from Figure 22 to Figure 24, the 2 nd track is always the same in each result. However, upon checking, this track was the only track in the music collection, which matched the chosen track both in BPM and in key. In this chapter, some of the Key and BPM values, shown from Figure 14 to Figure 24 are dummy values intended for testing purposes and may not necessarily represent the actual values of the tracks. 56
