UNIVERSITY OF MIAMI FROST SCHOOL OF MUSIC A METRIC FOR MUSIC SIMILARITY DERIVED FROM PSYCHOACOUSTIC FEATURES IN DIGITAL MUSIC SIGNALS.


UNIVERSITY OF MIAMI
FROST SCHOOL OF MUSIC

A METRIC FOR MUSIC SIMILARITY DERIVED FROM PSYCHOACOUSTIC FEATURES IN DIGITAL MUSIC SIGNALS

By Kurt Jacobson

A Research Project Submitted to the Faculty of the University of Miami in partial fulfillment of the requirements for the degree of Master of Science in Music Engineering Technology

Coral Gables, FL
April 2006

UNIVERSITY OF MIAMI

Submitted in partial fulfillment of the requirements for the degree of Master of Science in Music Engineering Technology

A METRIC FOR MUSIC SIMILARITY DERIVED FROM PSYCHOACOUSTIC FEATURES IN DIGITAL MUSIC SIGNALS

Kurt Jacobson

Approved:

Ken C. Pohlmann, Professor, Music Engineering
Dr. Edward P. Asmus, Associate Dean, Graduate Studies
Fred DeSena, Asst. Prof., Music Theory and Composition
Dr. James Shelley, VP, Academic and Research Systems

JACOBSON, KURT (M.S. in Music Engineering Technology) (April 2006)
A Metric for Music Similarity Derived from Psychoacoustic Features in Digital Music Signals

Abstract of a Master's Research Project at the University of Miami. Research project supervised by Professor Ken Pohlmann.

From purchase to playback, digital music formats are becoming the pervasive mode of music consumption. Technologies like perceptual audio encoding and peer-to-peer networking have enabled even casual enthusiasts to amass large digital music collections. New online music services offer customers millions of song titles for download. Portable digital music players allow listeners to carry thousands of music files in their front pocket. At every level, the amount of digital music content available for consumption is growing to nearly unmanageable proportions. Finding better ways to organize, index, and search digital music collections is the focus of content-based music information retrieval (MIR). A diverse body of research, MIR deals with problems like automatic genre classification, automatic song summarization, and music similarity quantification, among others. This work describes a system for deriving music similarity measures from a set of music signals using digital signal processing techniques. The system employs three distinct dimensions of similarity (timbral similarity, rhythmic similarity, and structural similarity) to place individual songs in a music similarity space. The system is tested on a set of popular music files obtained from the itunes online music store as well as other music collections. Multidimensional scaling of the resulting similarity data is used to visualize song files in the music similarity space and to calibrate the system to estimate genre boundaries. Also, an intelligent jukebox application is implemented that generates playlists based on the system's music similarity measures.

Acknowledgements

I would like to thank my family and friends for their support, especially my mother for proofreading. I would like to thank Ken Pohlmann, Colby Lieder, and James Shelley for their direction and assistance with this work, and Fred DeSena for agreeing to join my thesis committee on such short notice. I would also like to acknowledge and thank Jason Haft for sharing his insightful thoughts on music similarity and Etienne Handman of the Music Genome Project for contributing independent music similarity data for benchmarking this system. I would also like to thank my peers Ben Fields and Nicolas Betancur for their assistance with this work.

Table of Contents

Introduction
1. Music Similarity Defined
  1.1 The Music Genre
  1.2 Psychoacoustic Features
    Melodic Content
    Rhythmic Patterns
    Timbre Features
    Song Structure
2. Design Goals for a Music Similarity System
3. Literature Review
  3.1 MPEG-7
  3.2 Mel-Frequency Cepstral Analysis
  3.3 K-means Clustering
  3.4 Beat Spectrum Analysis
  3.5 Automatic Audio Segmentation
4. Implementation
  4.1 Feature Extraction System
    Tempo Calculation
    MFCC Analysis
    Timbre Model
    Rhythm Model
    Structure Model
  4.2 SimMetrix Model Comparison
    Tempo Similarity
    Timbre Model Similarity
    Rhythm Model Similarity
    Structure Model Similarity
    Combined Model Similarity
5. Experiments
  5.1 Preliminary Tests
    Assumptions about the riddim
    Small-Scale Experiment Design
    Preliminary Test Results
  5.2 Larger-scale Test
    itunes Top Ten Test Set
    Additional Test Sets
  Results
    The Same Song Title Sanity-Check
    Multi-dimensional Scaling
    Music Genome Project
  SimMetricPlayer
6. Discussion
  Future Work
Conclusions
References
Appendix A
Appendix B
Appendix C

List of Figures and Tables

Figures 3.1, 3.2, 4.1-4.6, 6.1a-6.1d
Tables 5.1, A, B.1, B.2, B.3

Introduction

Digital audio has become nearly ubiquitous. The combination of increasing storage capacity, robust perceptual codec technology, and efficient networking protocols has led to a music content explosion. Technologies like mp3 and peer-to-peer networking have enabled even the casual music enthusiast to amass large digital audio collections. Navigating such large stores of digital audio can be daunting. While text-based queries can sort audio files based on manually entered metadata (i.e. ID3 tags), this approach has some significant disadvantages: manually entering metadata is time-consuming, the metadata could be entered incorrectly or it could be incomplete, and metadata descriptions are limited to pre-defined fields. Some of these problems are addressed by recent technologies that retrieve metadata from the internet automatically (Windows Media Player, CDDB, etc.), but the underlying musical content retrieval issues remain. Although text-based searching methods are well developed, text-based descriptions of digital music fail to significantly describe the musical content. The artist name and genre are often the only text-based information available to describe the musical content of a digital audio file. This provides a weak foundation for applying text-based information retrieval methods to large digital music collections. The prevalence of digital music content and the shortcomings of text-based retrieval methods have motivated a significant amount of research in content-based music information retrieval (MIR), an approach that utilizes some characteristics of the musical signal rather than text-based metadata.

A diverse group of musicians, librarians, researchers, and software developers have contributed to this effort, producing a wide array of theories and techniques related to MIR. The first annual International Conference on Music Information Retrieval (ISMIR), held in October of 2000, provided an international forum for those involved in work on accessing digital music content. The Moving Picture Experts Group (MPEG), a body of the International Organization for Standardization (ISO), has developed a substantial international standard for media content description known as MPEG-7. Although not as widely implemented as other MPEG technologies, MPEG-7 provides a robust and flexible framework for content-based retrieval of digital media. Simply put, these efforts aim to enable more efficient indexing and searching of media content, whether music, images, or video. Such efforts are of interest to education, academia, entertainment, and industry. One approach to MIR is the audio-based music similarity measure. Audio-based music similarity measures could be applied to MIR in a number of ways, including automatic playlist generation, recommendation of unknown song titles or artists, organization and visualization of music collections, and music retrieval by example. Providing a consumer who purchases a given song with recommendations of similar songs is a very common scenario in music distribution. In current systems, this is almost always accomplished using accumulated sales data. If many consumers have purchased both song title A and song title B, and a given consumer decides to purchase song title A, title B is automatically recommended. Such a system is easy to implement and undeniably effective as a sales mechanism.

However, such systems are not content-based and tend to neglect less popular song titles, discouraging diversity and even hindering the advancement of new artists. Automatic playlist generation is another significant application of music similarity measures. Generating playlists based on music similarity information enables intelligent jukebox applications, where the user would select an initial song and the application would automatically play similar songs. Such an application is developed as a means of subjectively evaluating the music similarity system described here. As a contribution to the field of music information retrieval and as the foundation for an intelligent jukebox application, this work describes a system for quantifying music similarity between a set of songs based on the psychoacoustic features present in the digital signals corresponding to that set of songs. The proposed system combines several previously developed methods for quantifying music similarity, including the works of Logan [8], Aucouturier [5], Pampalk [6], and Foote [15]. A timbre model for music similarity is implemented based on [5, 6, 8]. The timbre model also uses parts of the open-source MA Toolbox [19]. A rhythm model for music similarity is implemented based on [15, 16]. A new method for modeling song structures is introduced as well. The similarity measures of these three models are combined to get an overall similarity distance between digital music signals.

1 - Music Similarity Defined

Before developing a system for quantifying music similarity, we must ask: what is it that really makes distinct pieces of music similar? When comparing pieces of music, a listener forms a judgment about similarity based on the psychoacoustic features of the set of songs in question. These features include short-time temporal variations (rhythm), spectral content (timbre), the changes in spectral content (melody, harmony), and overall temporal variations (song structure). Of course, the perception of sound, particularly the perception of musical sound, is a very complex phenomenon, and this list of psychoacoustic features is by no means complete. However, after soliciting the opinions of a wide variety of professional musicians, music engineers, and music enthusiasts, there seems to be a general consensus that this list includes the more salient psychoacoustic features for determining music similarity. Let us then define music similarity as the product of an individual's personal taste and the psychoacoustic features present in a set of distinct musical signals. Of course, music similarity is largely an abstraction and notions of music similarity are subjective. However, there is usually at least a loose consensus among individuals as to which songs are similar and which artists are similar. This is evidenced by the fact that music critics almost always describe a particular artist or song in terms of similar artists or songs. Statements like "Blind Melon sounds like Lynyrd Skynyrd meets the Grateful Dead" are common and can generally be agreed upon. This suggests that, at least in some broad sense, there exists a ground truth for music similarity, independent of perception. Researchers have proposed a variety of methods for obtaining such ground truth data, but with only moderate success.

If we neglect the individual's personal taste as a contributing factor, music similarity can be considered a multidimensional model, derived from the psychoacoustic features present in a set of songs. We will treat music similarity as such in this work.

1.1 Musical Genres

The music genre is the most common manifestation of music similarity. Songs that share a certain style or basic musical language are considered to be of the same genre. Although music genre implies music similarity and vice versa, it is important to note that musical genre is not identical to musical similarity. Music genres can be based on time period (Baroque Music), geographical origin (Cuban Music), or even media format (videogame music). While these genres reflect similarity, they do not necessarily reflect musical similarity. Therefore, music similarity cannot be defined solely in terms of genres; however, the music genre provides important clues to music similarity. Any measure of music similarity should adhere to at least some of the inter-song relationships established by music genres and, to a greater degree, music sub-genres. At the same time, a measure of music similarity should find meaningful relationships in songs across genres.

1.2 Psychoacoustic Features

Putting aside the pigeonhole approach of assigning song titles to music genres, consider again that music similarity is a function of psychoacoustic features. Research in psychoacoustics has shown that certain aspects of an audio signal are more salient than others. Therefore it is reasonable to assume that certain psychoacoustic features of a set of music signals are more relevant in determining music similarity.

Let us consider more closely the psychoacoustic features mentioned above: short-time temporal variations (rhythm), spectral content (timbre), the changes in spectral content (melody, harmony), and overall temporal variations (song structure).

Melodic and Harmonic Features

In part, music similarity is a function of melodic and harmonic content. This is perhaps most true with respect to trained musicians, who actively recognize the melodic and harmonic relationships in a piece of music. However, a listener with no musical training is still sensitive to the harmonic content of a song, consciously or unconsciously. Distinct harmonic relationships tend to evoke distinct perceptions of mood or emotion. Musicologists have long argued that the mode and the tonic of a musical piece relate to the feelings and moods evoked by that piece. While there exists a large body of work on pitch tracking and harmonic analysis of digital music signals, the contribution of melodic and harmonic features to music similarity is essentially neglected here; however, possible modeling techniques are discussed.

Rhythmic Features

Music similarity is also a function of rhythm. In Electronic Dance music, rhythm tends to be one of the most salient features. It is not surprising that subgenre divisions of Dance music correspond to rhythmic style: Trance, House, DnB, etc. Even in a broader scope, intuition holds that music similarity is, in part, a function of rhythmic style.

There exist several methods for modeling the rhythm or temporal characteristics of a digital signal, including [16] and [18]. The system developed here uses a specialized autocorrelation method first proposed by Foote in [16].

Timbral Features

Timbre is another dimension of music similarity. Timbre is commonly defined as the attribute which allows a listener to discriminate two sounds with the same pitch and loudness. The voicing, the instrumentation, the singing style, and many other elements contribute to the overall timbre of a song. From a signal processing perspective, timbre is more difficult to define. It is clear that the timbre of an audio signal is somehow related to its spectral content. This relationship is not as deterministic as the relationship between an audio signal's perceived pitch and its spectral content, where a certain fundamental frequency corresponds to a specific pitch. There does not exist a scale of timbres as there exists a scale of pitches. However, timbre is still a useful concept in audio signal processing. Timbre modeling techniques have been applied to automatic instrument identification by Brown [11], content-based retrieval of audio samples by MuscleFish Audio (1996), and even content-based music information retrieval [5, 6, 8, 9]. The MPEG-7 standard for multimedia content description even provides a set of descriptors and higher-level description schemes for modeling the timbre of an audio signal. Specifically, the standard includes InstrumentTimbre, HarmonicInstrumentTimbre, and PercussiveInstrumentTimbre as frameworks for modeling the timbre of a given audio signal.

These models, as well as other descriptors and description schemes in MPEG-7, can be used to evaluate music similarity in the timbral dimension. Beyond the MPEG-7 standard, the method of Mel-frequency cepstral analysis has also been associated with musical timbre. Initially used in speech processing, Mel-frequency cepstral coefficients (MFCC) are used throughout the system developed here and will be described in section 3.2.

Song Structure Features

Another aspect of music similarity is song structure. The phrasing and the changes in a set of songs contribute to music similarity. An electronic dance tune with simple phrasing and only one break-down or change is more similar to another dance tune with a similar structure than to an experimental electronic tune with complicated phrasing and many changes. This example illustrates how song structure can reflect music similarity (or dissimilarity) within the broad genre of electronic music. However, song structure tends to follow genre divisions: Hip-hop and Dance music have fewer changes and a simpler structure than Jazz or Orchestral music. While methods for automatic audio segmentation can be used to model song structure as in [13, 17], these techniques have not been applied to music similarity measures.

2. Design Goals for a Music Similarity System

If we neglect the individual's personal taste and assume music similarity is a function of psychoacoustic features, it is possible to design a system that automatically models music similarity for a set of songs. The goal of this work is to develop and test such a system. The system should rely only on psychoacoustic features extracted from the music signals in question. To facilitate scalability, the system should be divided into two parts: one part to extract, model, and store the psychoacoustic features of a given music signal, and a second part to compare the stored models of two distinct music signals to derive a music similarity measure. Let us call part one a Feature Extractor or FeatX, and part two a Similarity Metric calculator or SimMetrix. Such a configuration allows for songs to be added to or deleted from the test set at any time. The FeatX process need only run once on a given song, so the most processor-intensive work should be done there. The SimMetrix part of the system should be of lower complexity so comparisons between songs can be made quickly (assuming FeatX has already annotated the songs). The system should make use of the psychoacoustic features identified in the previous section (due to time constraints, melodic content will be neglected). For each song, FeatX will extract and store a model for timbre, rhythm, and song structure. SimMetrix will compare the models between songs and calculate a similarity distance measure.

3. Literature Review

3.1 MPEG-7 Descriptors

Since 1998, the ISO body known as MPEG has been developing a standard for multimedia content description called MPEG-7. The standard applies to digital video, audio, and images. Part 4 of the standard is dedicated to audio and specifies a rich set of description tools pertaining to audio content. These include low-level feature Descriptors (Ds), like AudioSpectrumEnvelopeType and LogAttackTimeType, that directly describe features of the audio signal, as well as higher-level Description Schemes (DSs), like AudioSignatureType and InstrumentTimbreType, which combine low-level features to form more abstract description schemes [1]. Any standard MPEG-7 description relies on three main components: Descriptors (Ds), Description Schemes (DSs), and the Description Definition Language (DDL). Descriptors are representations of distinctive characteristics, or features, of the media data. Description Schemes specify the structure and semantics of the relationships between their components, which may be either Descriptors and/or Description Schemes. Both Descriptors and Description Schemes are expressed using the Description Definition Language (DDL). The DDL is based on XML Schema. One Descriptor of particular interest here is the AudioBPMType. It is intended to describe the frequency of beats in an audio signal representing musical content. The beat frequency information is given in units of beats per minute (bpm), together with optional weights indicating the reliability of this measurement. This basic description of tempo is quite objective, and derived measures can be checked against a software or hardware BPM calculator, a device that allows the user to tap along with the music to get the BPM tempo.

Knowing the tempo of a music signal is also useful in creating dance music playlists or DJ-style mixes of music. Although the actual tempos of a given set of songs undoubtedly affect their similarity, tempo measures themselves provide little help. Of course, tempo measures are insufficient for determining music similarity, but more importantly, tempo measures are error prone. Even the most accurate tempo measures can be off by a factor of two because of the half-time / double-time effect. Tempo analysis for a song that actually has a presto tempo may result in a largo tempo measure because the algorithm was counting half-time. However, the AudioBPMType is implemented in this system as a starting point and as a guide for storing model information in xml format.

3.2 Mel-Frequency Cepstral Analysis

Although MPEG-7 provides a vast array of content description methods for audio, there are some very interesting techniques not included in the standard. One notable omission is that of Mel-frequency cepstral coefficient (MFCC) analysis. Several groups of researchers, including Logan and Aucouturier, have explored using MFCCs as a means to describe the timbre of music signals with some promising results. Some research even indicates that MFCCs outperform MPEG-7 implementations for general recognition tasks [10]. MFCCs are derived from the discrete cosine transform (DCT) or Fourier Transform of the log amplitude of the Mel-frequency spectrum of an audio signal.

The Mel scale is a perceptual scale of pitches judged by listeners to be equal in distance from one another. Quantitatively, to convert from f hertz to m mel:

  m = 1127 * ln(1 + f / 700)    (1)

To perform Mel-scale frequency warping, a Mel filter bank like the one in Figure 3.1 is applied. Applying the Mel filter bank positions spectral information logarithmically instead of linearly. This approach more closely approximates the human auditory system's response. Conceptually, MFCCs can be thought of as information about the rate of change in the different perceptual spectral bands. Since the method was first proposed as a front-end to a word recognition system by Davis and Mermelstein in 1980, MFCC analysis has been one of the most successful techniques in speech processing.

Figure 3.1. A typical Mel filter bank used in Mel-Frequency Cepstral Coefficient (MFCC) analysis.
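As a rough illustration of this warping, the following Python sketch builds a triangular Mel filter bank from the Hz-to-mel conversion of equation (1). It is not the MATLAB implementation from Appendix C, and the filter count and frequency range are illustrative choices rather than values taken from the text.

import numpy as np

def hz_to_mel(f):
    # Hz-to-mel conversion, natural-log form of equation (1).
    return 1127.0 * np.log(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (np.exp(m / 1127.0) - 1.0)

def mel_filterbank(n_filters=36, n_fft=1024, sr=22050, fmin=0.0, fmax=11025.0):
    # Triangular filters spaced evenly on the mel scale (cf. Figure 3.1).
    # n_filters, fmin, and fmax are assumptions made for this sketch.
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb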

MFCC analysis has also proven useful in other audio applications. Automatic instrument identification tasks have been performed using MFCC analysis by Brown [11]. This motivates the use of MFCC analysis to model musical timbre. Some recent research has focused on using MFCC analysis as a means of measuring timbral similarity, including the work of Aucouturier [5], Logan [8], and Pampalk [6].

3.3 K-Means Clustering

In the MFCC analysis of an entire song, a set of coefficients is generated for every frame of audio. Two main statistical modeling approaches have been proposed to model a music signal's timbre from frames of MFCCs. Gaussian Mixture Models (GMMs) have been suggested by Aucouturier to model the timbre of a song by a probability density function describing the spectral distribution across all MFCC frames [5]. However, Logan has shown that K-means clustering provides comparable performance with less complexity [8]. The K-means algorithm starts by partitioning the input points into k initial sets. It then calculates the centroid of each set. A new partition is constructed by associating each point with the closest centroid. The centroids are then recalculated for the new clusters, and the algorithm repeats these two steps in alternation until convergence, which is obtained when the points no longer switch clusters (or, alternatively, when the centroids no longer change). The result of K-means clustering is a set of K clusters characterized by the mean, covariance, and weight of each cluster.
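A minimal Python sketch of this alternating procedure follows, returning the per-cluster means, diagonal covariances, and weights described above. The value of k, the iteration cap, and the random initialization are illustrative assumptions, not details of the MA Toolbox routine used later in the implementation.

import numpy as np

def kmeans(points, k=10, max_iter=100, seed=0):
    # Plain K-means: alternate the assignment and update steps described above
    # until no point switches clusters.
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # convergence: no point switched clusters
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # Summarize each cluster by its mean, diagonal covariance, and weight.
    covariances = np.array([points[labels == j].var(axis=0) if np.any(labels == j)
                            else np.zeros(points.shape[1]) for j in range(k)])
    weights = np.bincount(labels, minlength=k) / len(points)
    return centroids, covariances, weights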

K-means cluster models are compared to each other using a technique called the Earth Mover's Distance (EMD). The EMD between two songs provides a measure of timbre similarity. EMD is described in detail in section 4.2.

3.4 Beat Spectrum

Given that the cluster model approach to timbre description using MFCCs ignores all temporal information, it is necessary to consider additional methods for describing the rhythmic qualities of a musical signal. The BPM measure is only an indication of tempo, and gives no clues to the rhythmic feel or style of beat present in the song. However, methods for quantifying the rhythmic similarity of a set of music signals have been developed.

Figure 3.2. Self-similarity matrix for a 10 s segment of Bob Marley's No More Trouble. Both axes are in terms of frame number. The checkerboard patterns indicate rhythmic repetition in the music signal.

Most notable is the idea of the Beat Spectrum proposed by Foote in [15]. The Beat Spectrum is calculated from an audio signal using three basic steps:

1) Parameterization: The audio is parameterized using some spectral representation. MFCC frames are actually well-suited to this task.
2) Self-similarity: Some distance measure is used to calculate the similarity between every frame of audio. These measures are embedded in a 2-D self-similarity matrix. The visualization of such a matrix is shown in Figure 3.2.
3) Summation: The Beat Spectrum results from finding the periodicities in the similarity matrix using diagonal sums or autocorrelation.

The result is a vector that models the rhythmic patterns in a music signal as a function of time. The Beat Spectrum vectors can be stored and compared directly using any standard distance measure; however, Foote's research suggests that the cosine distance measure is most appropriate [15, 16].

3.5 Automatic Audio Segmentation

Methods for automatically segmenting audio have been developed by Tzanetakis [13], Foote [17], and others. While these methods have been developed with the application of browsing and annotating in mind, they could be applied to model the overall song structure of a music signal. The segmentation technique presented in [17] is used here as a front-end for modeling song structure similarity.

4. Implementation: SimMetrix System Design

Using the MATLAB programming environment, a system called FeatX was developed to extract and store the tempo, timbre model, rhythm model, and structure model for a digital music file. The various models and ID3 tag information are stored in an MPEG-7 compliant xml file (named songfilename.xml).

Figure 4.1. A high-level block diagram describing the FeatX system, which generates models for musical timbre, rhythm, and musical structure and stores them in xml format. The bpm measure is not shown.

To compare music files based on the song models stored in xml meta-files, SimMetrix was developed. Different methods are used to compute appropriate similarity distances for each of the three models. These methods are described in detail later. Each song-to-song comparison results in three similarity distances, one for each model.

The distances can be weighted and combined to plot music files in a similarity space. Techniques like multidimensional scaling (MDS) allow for music similarity visualization in two, three, or four dimensions. The source code is included in Appendix C.

4.1 Feature Extraction with FeatX

The feature extraction process begins by searching the target directory for all readable audio files. At this time FeatX only supports .wav and .mp3 formats. The system has five main processes:

4.1.1 Tempo Calculation with getbpm()

To derive the tempo of a music signal the MPEG-7 AudioBPMType Descriptor is used [2]. In FeatX this Descriptor is encapsulated in the function getbpm(), which is passed the actual music signal along with some parameters. The incoming signal is decomposed and pre-processed into a number of spectral bands with transition frequencies of 200 Hz, 400 Hz, 800 Hz, 1600 Hz, and 3200 Hz, respectively. The following processing steps are carried out for each frequency band:

- The band-limited signal is derived from the input signal by means of bandpass filtering (lowpass filtering for the first frequency band, highpass filtering for the last frequency band).
- The band-limited signal is two-way rectified (i.e. the absolute values are taken) and smoothed over time with a time constant around 100 ms to calculate an envelope signal. At this point, the signal may be decimated in order to reduce computational complexity.
- The envelope signal is differentiated (i.e. the differences between subsequent samples are calculated) and the result is limited to non-negative values, corresponding to the onset portions of the signal. Each differentiated envelope is normalized by its maximum value.

- The biased, normalized autocorrelation function (ACF) is calculated for all lag values up to a maximum value which corresponds to a minimum detectable beat frequency value (note: in this context, "normalized" means that the function is scaled to reach a maximum value of 1.0 for lag = 0; "biased" means that the so-called auto-correlation method is used to estimate the correlation coefficients rather than the covariance method). The relation between beat frequency values (in bpm) and lag is given by:

  BeatFreq = 60 / (f_s * t_lag), for t_lag > 0    (2)

Since using integer lag values leads to a limitation in representable bpm values, further refinement can be achieved by using appropriate (e.g. quadratic) means for interpolation.
- A weighting factor for each frequency band is determined, e.g. as the ratio between the maximum value and the mean value of the bandwise ACF within the range of relevant lags, reduced by one.
- A combined envelope is then computed by summing all individual (differentiated and normalized) envelopes weighted by their respective weighting factors.
- Next, the combined envelope autocorrelation function (CEACF) is calculated from the combined envelope in the same way as described above for the ACF calculation in individual frequency bands. A reliability measure may be extracted in the same manner as described before for the weighting factor for individual frequency bands.
- Next, all meaningful local maximum peaks in the CEACF are detected within the range of meaningful lags, i.e. the lag range corresponding to the permissible range of musical beat frequencies (e.g. 60 BPM to 200 BPM). Each peak value, weighted by the summary reliability, is stored in a result vector together with its corresponding BPM value. Each weighted peak indicates the possibility of the corresponding BPM value to represent the correct beat frequency value.
- A final estimation stage decides which of the detected BPM values will eventually be returned as the beat frequency. If there are BPM values detected that fit into a tolerance range of already stored BPM values (BPM class), the corresponding relative peaks are added. If there are BPM values detected that are not within an already stored BPM class, these values and their corresponding relative peaks are inserted into the vector and define a new BPM class.

Each BPM class represents a plausible BPM estimate at the BPM value averaged across its members. This process is repeated twice for each audio file: once for a frame near the beginning of the song and once seven seconds into the song. Two BPM values and two weight values are stored in an xml file with the same name as the audio file, as described in [2].

4.1.2 MFCC Analysis with getmfcc()

The next process is the extraction of the MFCCs. The audio signal is subsampled to 22.05 kHz to save on computation. This of course has the effect of cutting the signal's spectral content above 11 kHz. But imagine an even more dramatic low-pass filtering of a set of audio signals with a cutoff around 8 kHz. Although the audio quality would be diminished, the music would still be intelligible and a human listener would still perceive the same music similarity relationships. It is a reasonable assumption that the most salient features for music similarity are present in the lower frequencies, so this type of sub-sampling is common in music information retrieval processes. The sub-sampled signal is windowed with a Hanning window of length 1024 and 50% overlap. A Fast Fourier Transform (FFT) is performed on each frame using the Matlab fft routine. A Mel filter-bank is used to perform the frequency warping to the Mel-frequency scale. The filter-bank, described in terms of its magnitude response, is multiplied by the scaled output of the fft to derive a Mel-frequency spectrum. The Discrete Cosine Transform (DCT) of the logarithm of the magnitude of the Mel-frequency spectrum results in the vector of MFCCs. Twenty MFCCs were used, including the first coefficient.
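The per-frame computation just described can be summarized in a short Python sketch; the actual implementation is the MATLAB code in Appendix C. The frame length, hop size, and coefficient count follow the text, and the sketch reuses the illustrative mel_filterbank() defined earlier.

import numpy as np
from scipy.fftpack import dct

def mfcc_frames(x, sr=22050, frame_len=1024, hop=512, n_coeffs=20, fb=None):
    # MFCCs per frame: Hann window -> FFT magnitude -> mel filter bank ->
    # log -> DCT, keeping the first n_coeffs coefficients.
    # `fb` is a mel filter bank, e.g. from mel_filterbank() above.
    if fb is None:
        fb = mel_filterbank(n_fft=frame_len, sr=sr)
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))       # magnitude spectrum
        mel_spectrum = fb @ spectrum                 # warp to mel bands
        log_mel = np.log(mel_spectrum + 1e-10)       # log magnitude
        frames.append(dct(log_mel, type=2, norm='ortho')[:n_coeffs])
    return np.array(frames)                          # shape: (n_frames, n_coeffs)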

4.1.3 TimbreModel()

To store the MFCCs, a statistical model is constructed. The K-means approach described earlier is used to cluster the MFCCs of a given music signal into 10 clusters (this number was chosen as a compromise between the suggestions of Logan [8] and Aucouturier [5]).

Figure 4.2. The TimbreModel for Your Man. The ten means are indicated by the fine dark lines and the covariances are indicated by the gray shading.

The ma_kmeans algorithm is used from the open-source MA Toolbox developed by Pampalk [19]. All the parameters returned by the k-means algorithm are appended to the xml file in the type TimbreModel; the syntax is shown below:

<SoundModel xsi:type="timbremodel" ClusterMethod="kmeans" CovarianceType="diag">
  <SeriesOfVectors nin="20" totalnumcentres="10" nwts="410">
    <Means> (10x20 matrix of floats) </Means>
    <Covariances> (10x20 matrix of floats) </Covariances>
    <priors> (1x10 matrix of floats) </priors>
  </SeriesOfVectors>
</SoundModel>

The TimbreModel type includes the means (centers), covariances, and weights of the k-means clustering of the MFCC frames. Various studies indicate that this type of statistical description of MFCC distributions across a music signal provides a good model for the timbre of an entire song [8, 9].

4.1.4 RhythmModel()

To model the rhythm of the musical signals in question, the beat spectrum technique developed by Foote in [15] is implemented. The MFCC representation of the audio signal is used again to calculate the RhythmModel. To minimize computation time, only a 1000x1000 frame self-similarity matrix is used. This corresponds to about 10 seconds of audio taken about 10 seconds into the song. Because the matrix is symmetric, only half of the self-similarity matrix is calculated. The pdist Matlab function is used to calculate the cosine distance between the MFCC vectors of frames i and j for the self-similarity matrix S:

  S(i, j) = 1 - [ sum_k MFCC_k(i) * MFCC_k(j) ] / [ sqrt(sum_k MFCC_k(i)^2) * sqrt(sum_k MFCC_k(j)^2) ]    (3)

Because the beat spectrum is a function of lag time, l = i - j, and only a finite lag time needs to be considered, S(i, j) only needs to be calculated for l_max > i - j. This saves considerable computation time.

Once S(i, j) has been calculated, the beat spectrum B(l) can be found by simply summing the diagonals of S:

  B(l) = sum_{k in R} S(k, k + l)    (4)

Here, B(0) is the sum along the main diagonal across the range R for which S has been solved, B(1) is the sum along the first super-diagonal, B(2) the second super-diagonal, and so on. The RhythmModel is stored in xml as follows:

<SoundModel xsi:type="rhythmmodel" DistanceType="cosine">
  <SeriesOfVectors totalnum="25">
    <Bl> (1x25 of floats) </Bl>
  </SeriesOfVectors>
</SoundModel>

The RhythmModel is adapted from Foote's Beat Spectrum method. The RhythmModels for two songs are shown in Figure 4.3. Axel F is an electronic dance tune and its repetitive four-on-the-floor rhythm creates regular peaks in the Beat Spectrum. The jazz classic Take Five creates a less regular Beat Spectrum, but the 5/4 time signature is still apparent.

Figure 4.3. Rhythm models for Axel F and Take Five. Notice the strong repetitive peaks in the four-on-the-floor dance music and the 5/4 beat structure of Take Five.
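As an illustration of equations (3) and (4), the following Python sketch computes a beat spectrum directly from a block of MFCC frames. The 25-lag limit mirrors the stored RhythmModel length, but the sketch is an illustrative re-expression rather than the MATLAB code of Appendix C.

import numpy as np

def beat_spectrum(mfcc, max_lag=25):
    # Beat spectrum from MFCC frames: cosine-distance self-similarity
    # (equation (3)) followed by diagonal sums (equation (4)). Only lags up
    # to max_lag are needed, so only a band of S near the diagonal is used.
    n = len(mfcc)
    norms = np.linalg.norm(mfcc, axis=1) + 1e-10
    B = np.zeros(max_lag)
    for l in range(max_lag):
        # S(k, k+l) summed over all valid k, per equation (4).
        num = np.sum(mfcc[:n - l] * mfcc[l:], axis=1)
        S_diag = 1.0 - num / (norms[:n - l] * norms[l:])
        B[l] = S_diag.sum()
    return B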

Figure 4.4. The lo-resolution self-similarity matrix and novelty index for Axel F. Note the changes suggested by the self-similarity matrix are reflected as spikes in the novelty index. The spikes in the novelty index around 1.5 minutes correspond to a break-down section in the song.

4.1.5 SongStructure()

To model a song's musical structure, a modified version of the self-similarity matrix in getrhythmmodel() is created. A self-similarity matrix is created for the entire song, skipping some integer number of frames. The number of frames skipped is the subframerate (usually 16). This sub-sampling of frames has the effect of low-pass filtering the self-similarity matrix, eliminating more rapid temporal changes, as well as reducing the required computing time. The lo-resolution self-similarity matrix, S_lo-res, is calculated as in getrhythmmodel(), using a cosine distance between MFCC vectors. Scaling S_lo-res to gray-scale values produces an image of the song's overall structure. The S_lo-res image for Axel F can be seen in Figure 4.4. Note the lighter cross-like area in the center of the image.

Figure 4.5. A Gaussian-tapered checkerboard kernel used for audio segmentation. The kernel is correlated along the main diagonal of the self-similarity matrix S_lo-res to produce the novelty index Nv, as shown in Figure 4.4.

This area corresponds to a break-down section in this repetitive, four-on-the-floor dance tune. To find points where a given song changes, a Gaussian Checkerboard kernel is correlated across the main diagonal (see Figure 4.5). This is a method developed by Foote for automatic audio segmentation and automatic thumbnailing of music. The kernel consists of four blocks in a square checkerboard configuration. The top-left and bottom-right blocks are matrices of 1's and the top-right and bottom-left blocks are matrices of -1's. The entire checkerboard square is multiplied by a Gaussian window, resulting in the Gaussian Checkerboard kernel shown in Figure 4.5. The correlation of the kernel across the main diagonal of S_lo-res results in a novelty index, Nv, as shown in Figure 4.4. The novelty index Nv constitutes the StructureModel and it is stored in an xml file as follows:

<SoundModel xsi:type="structuremodel" DistanceType="cosine">
  <SeriesOfVectors totalnum="N" subframerate="16">
    <Nv> (1xN of floats) </Nv>
  </SeriesOfVectors>
</SoundModel>

The size of Nv is determined by the length of the music signal in question. A longer song will have a longer Nv. Preliminary experiments indicated that non-overlapping Hann windows of length 512, with a subframerate of 16, and a 32x32 kernel produce the most meaningful StructureModels. This results in vector lengths between 1700 and 2800 for the songs in the itunes test set.
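A small Python sketch of this segmentation step follows: it builds a Gaussian-tapered checkerboard kernel and slides it along the diagonal of the lo-resolution self-similarity matrix. The 32x32 size follows the text, while the Gaussian taper width is an illustrative assumption.

import numpy as np

def gaussian_checkerboard(size=32):
    # Checkerboard kernel (+1 / -1 quadrants) tapered by a 2-D Gaussian,
    # as in Figure 4.5. `size` must be even.
    half = size // 2
    quadrant_sign = np.ones((size, size))
    quadrant_sign[:half, half:] = -1.0
    quadrant_sign[half:, :half] = -1.0
    t = np.linspace(-1.0, 1.0, size)
    gauss = np.exp(-4.0 * t ** 2)  # taper width is an illustrative choice
    return quadrant_sign * np.outer(gauss, gauss)

def novelty_index(S_lores, kernel):
    # Correlate the kernel along the main diagonal of the lo-resolution
    # self-similarity matrix to obtain the novelty index Nv.
    size = kernel.shape[0]
    half = size // 2
    n = S_lores.shape[0]
    Nv = np.zeros(n)
    for i in range(half, n - half):
        patch = S_lores[i - half:i + half, i - half:i + half]
        Nv[i] = np.sum(patch * kernel)
    return Nv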

Although the StructureModel could be represented more compactly (this will be shown later), the entire novelty vector is stored to facilitate experimentation with different methods for comparing StructureModels.

4.2 SimMetrix Model Comparison

Feature vectors describing a musical signal only become useful if they can provide some additional functionality for comparison or search. A high-level block diagram of the SimMetrix system can be seen in Figure 4.6. The system operates on a query music file and a target directory containing music files and their FeatX-generated xml metadata. The system can also calculate similarities between every song in a target directory and create a square matrix of inter-song distances, SM.

Figure 4.6. A high-level block diagram describing the SimMetrix system, which calculates the music similarity between songs p and q from the corresponding xml files. The bpm measure is not shown.

SimMetrix generates such a matrix for TimbreModel similarity, RhythmModel similarity, and StructureModel similarity. A tempo similarity method is included for completeness. Each model requires a different distance calculation.

4.2.1 Tempo similarity

Tempo similarity is the simplest distance calculation because it involves simply taking the absolute value of the difference of the query file BPM and the target file BPM. Since BPM values are frequently off by a factor of two, it is best to take the minimum value of the following:

  dBPM = min_{n = [-1, 0, 1]} | BPM_q - 2^n * BPM_p |    (5)
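A one-line Python sketch of equation (5), allowing half-time and double-time matches:

def tempo_distance(bpm_q, bpm_p):
    # Octave-tolerant tempo distance of equation (5): compare the query BPM
    # against the target BPM at half, equal, and double time.
    return min(abs(bpm_q - (2.0 ** n) * bpm_p) for n in (-1, 0, 1))

For example, tempo_distance(170, 85) evaluates to 0, so a double-time estimate does not penalize the comparison.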

4.2.2 TimbreModel similarity

The TimbreModel similarity requires a more complicated distance measure. Again, the open-source MA Toolbox [19] is used, this time to implement the Earth Mover's Distance (EMD). EMD calculates the minimum amount of work required to transform one model into another. Considering the clusters as piles of earth, we are interested in how much earth (or probability mass) we need to move to transform model TimbreModel P into model TimbreModel Q. Let P = {(µ_p1, σ_p1, w_p1), ..., (µ_pm, σ_pm, w_pm)} be the model for song P with m clusters, where µ_pi, σ_pi, and w_pi are the mean, covariance, and weight of cluster i. Similarly, let Q = {(µ_q1, σ_q1, w_q1), ..., (µ_qn, σ_qn, w_qn)} be the model for the query song. We can calculate the distance between clusters by

  d_{pi,qj} = σ_pi / σ_qj + σ_qj / σ_pi + (µ_pi - µ_qj)^2 * (1 / σ_pi + 1 / σ_qj)    (6)

Let f_{pi,qj} be the flow between p_i and q_j. This flow reflects the cost of moving probability mass from one cluster to another. We solve for all f_{pi,qj} that minimize the overall cost W defined by

  W = sum_{i=1..m} sum_{j=1..n} d_{pi,qj} * f_{pi,qj}    (7)

That is, we seek the cheapest way to transform signal P to signal Q. This can be formulated as a linear programming task for which efficient solutions exist. Having solved for all f_{pi,qj}, the EMD is then calculated as

  EMD(p, q) = [ sum_{i=1..m} sum_{j=1..n} d_{pi,qj} * f_{pi,qj} ] / [ sum_{i=1..m} sum_{j=1..n} f_{pi,qj} ]    (8)

If P and Q are very different TimbreModels, the EMD will be large. If P and Q are very similar TimbreModels, the EMD will be small. If P and Q are the same TimbreModel, the EMD will be zero. Because EMD values for very different TimbreModels can be arbitrarily large, a maximum limit is set. Preliminary experiments indicated that for most TimbreModel pairs, EMD <= 1000. However, for some pairs EMD >> 1000. To compensate for this, a special normalization process is applied:

  if EMD(p, q) > 1000, then EMD(p, q) = 1000
  SM_timbre(p, q) = EMD(p, q) / 1000    (9)

This normalizes the range for TimbreModel similarity to 0 <= SM_timbre <= 1, conforming to the other model similarities.
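To make the linear-programming formulation concrete, here is a Python sketch that computes the ground distances of equation (6) and solves the transportation problem of equations (7) and (8) with scipy.optimize.linprog. It assumes diagonal covariances and cluster weights that each sum to one; it is a stand-in for the MA Toolbox EMD routine actually used, not a copy of it.

import numpy as np
from scipy.optimize import linprog

def cluster_distance(mu_p, var_p, mu_q, var_q):
    # Ground distance between two diagonal-Gaussian clusters (equation (6)),
    # summed over the MFCC dimensions.
    return np.sum(var_p / var_q + var_q / var_p
                  + (mu_p - mu_q) ** 2 * (1.0 / var_p + 1.0 / var_q))

def emd(means_p, vars_p, w_p, means_q, vars_q, w_q):
    # Earth Mover's Distance between two TimbreModels, posed as a
    # transportation linear program (equations (7) and (8)).
    # Assumes sum(w_p) == sum(w_q) == 1.
    m, n = len(w_p), len(w_q)
    d = np.array([[cluster_distance(means_p[i], vars_p[i], means_q[j], vars_q[j])
                   for j in range(n)] for i in range(m)])
    c = d.ravel()                            # minimize sum_ij d_ij * f_ij
    A_eq, b_eq = [], []
    for i in range(m):                       # all mass leaving cluster p_i
        row = np.zeros(m * n)
        row[i * n:(i + 1) * n] = 1.0
        A_eq.append(row)
        b_eq.append(w_p[i])
    for j in range(n):                       # all mass arriving at cluster q_j
        row = np.zeros(m * n)
        row[j::n] = 1.0
        A_eq.append(row)
        b_eq.append(w_q[j])
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun / np.sum(w_p)             # equation (8); denominator is 1 here

The normalization of equation (9) would then clip the returned value at 1000 and divide by 1000 to obtain SM_timbre.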

4.2.3 RhythmModel similarity

The RhythmModel similarity can be found quite easily by computing the cosine distance between two RhythmModels:

  SM_rhythm(p, q) = 1 - [ sum B_p * B_q ] / [ sqrt(sum B_p^2) * sqrt(sum B_q^2) ]    (10)

where B_p and B_q are the RhythmModels for songs p and q, respectively. The cosine distance is inherently normalized to one. This is implemented based on Foote's work in [16].

4.2.4 StructureModel similarity

The StructureModel similarity calculation is based on the number of changes detected in a music signal and their locations relative to the length of the song. The number of significant changes, C_p, in song p can be determined from Nv_p, the novelty index:

  if (Nv_p(i) > Nv_threshold and Nv_p(i-1) <= Nv_threshold):
      C_p = C_p + 1
      rl_p(j) = i / length(Nv_p)    (11)
      j = j + 1

where Nv_threshold is some constant and Nv_p(i-1) refers to the previous value of the novelty index. In this way, only the positive crossings of Nv_threshold increment C_p. This is a fairly accurate method for finding changes in an audio signal. Note that rl_p records the normalized locations of the changes. Should rl_p = 0.5, this would indicate a change in the middle of the song.
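The two calculations above fit in a few lines of Python; this sketch mirrors equations (10) and (11), with the novelty threshold left as a free parameter rather than a value taken from the experiments.

import numpy as np

def rhythm_distance(B_p, B_q):
    # Cosine distance between two beat-spectrum RhythmModels (equation (10)).
    return 1.0 - np.dot(B_p, B_q) / (np.linalg.norm(B_p) * np.linalg.norm(B_q))

def count_changes(Nv, threshold):
    # Count positive crossings of the novelty threshold and record their
    # normalized locations rl (equation (11)).
    C, rl = 0, []
    for i in range(1, len(Nv)):
        if Nv[i] > threshold and Nv[i - 1] <= threshold:
            C += 1
            rl.append(i / len(Nv))
    return C, np.array(rl)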

The mean of rl_p is taken to get µl_p. To compare the StructureModel of song p with that of song q, C_p and C_q are considered magnitudes, while µl_p and µl_q are taken to be the corresponding angles. The StructureModel similarity between songs p and q is calculated as a Euclidean distance between the resulting vectors:

  SM_structure(p, q) = sqrt[ (C_p * cos(θ * µl_p) - C_q * cos(θ * µl_q))^2 + (C_p * sin(θ * µl_p) - C_q * sin(θ * µl_q))^2 ]    (12)

Here, θ represents some maximum angle. The lower θ is set, the less impact the relative location of changes has on StructureModel similarity. Preliminary tests indicate that θ = π/4 is an appropriate value. This allows songs with different change distributions, but identical numbers of changes, to still be similar. SM_structure is normalized to one by dividing by the maximum value of SM_structure. This normalization conforms to the other similarity measures.

4.2.5 Combined Model Similarity

To derive one matrix of similarity distances SM_total that combines all three model similarities, the following equation is used:

  SM_total = w_timbre * SM_timbre + w_rhythm * SM_rhythm + w_structure * SM_structure    (13)

The weights reflect the relative importance of each model to overall music similarity. Initial values were chosen as w_timbre = 0.5, w_rhythm = 0.4, and w_structure = 0.1. The weights are adjusted experimentally as described in the experiments that follow.
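A short Python sketch of equations (12) and (13) follows; the default angle of π/4 and the initial weights are taken from the text, and the per-pair normalization of SM_structure is left to the caller.

import numpy as np

def structure_distance(C_p, mul_p, C_q, mul_q, theta=np.pi / 4):
    # StructureModel distance of equation (12): each song becomes a vector
    # with magnitude C (number of changes) and angle theta * mean location.
    dx = C_p * np.cos(theta * mul_p) - C_q * np.cos(theta * mul_q)
    dy = C_p * np.sin(theta * mul_p) - C_q * np.sin(theta * mul_q)
    return np.hypot(dx, dy)

def combined_similarity(sm_timbre, sm_rhythm, sm_structure,
                        w_timbre=0.5, w_rhythm=0.4, w_structure=0.1):
    # Weighted combination of the three normalized distances (equation (13)).
    return (w_timbre * sm_timbre + w_rhythm * sm_rhythm
            + w_structure * sm_structure)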

5 - Experiments

5.1 Preliminary Experiments

During the development process for FeatX and SimMetrix, some small-scale experiments were designed to quickly assess the systems' performance. These tests were not at all rigorous, but still provided direction for improving the system.

5.1.1 Assumptions About The Riddim

One of the biggest challenges in music information retrieval lies in evaluating a MIR system's performance. Because musical tastes and genre descriptions are rather subjective, it is difficult to objectively grade any music similarity system. Web surveys, probing P2P networks, and text-mining online music guides are methods that have been proposed as a means to obtain ground truth data on music similarity. Time constraints dictate a simpler approach for initial experiments. It is commonplace in some musical genres for several different artists to use the same backing music or riddim. This is especially common in modern Dancehall and Reggae music. Dozens of different vocalists may release different song titles, with unique vocal melodies and arrangements, but all using the same riddim. One example is the Diesel Riddim, which has been used by artists like Beenie Man, Elephant Man, Lexxus, Captain Barkey, and many others. Intuition suggests SimMetrix should score song titles on the same riddim as similar; if not, the system fails.

5.1.2 Small-Scale Experiment Design

A total of 30 music files were included in the initial test set. The files were selected from my personal digital music collection. Of the small-scale test set, six songs used the Soprano's Riddim, five songs used the Diesel Riddim, and two songs used the Bad Road Riddim. The rest of the songs were selected from Reggae, Hip-Hop, Rock, and Jazz Vocalist genres. Xml files were generated for each file using FeatX.

5.1.3 Preliminary Test Results

Three query songs were chosen: Elephant Man's Passa Passa on the Diesel Riddim, Alavode's Burn Dem on the Soprano's Riddim, and Billie Holiday's God Bless the Child as a control. Each query was run separately. Averages across all songs of the same riddim and across all other songs were calculated. These results are presented in Table 5.1. The results of these initial experiments suggest that all three models are useful in measuring music similarity.

Table 5.1. The average distances for each similarity parameter (average tempo distance, average cluster distance, and average Beat Spectrum distance) for three different query songs, with averages reported per group of songs (songs on the Diesel Riddim, songs on the Soprano's Riddim, other songs, or all songs). The query songs are Elephant Man (on Diesel), Alavode (on Soprano's), and Billie Holiday's God Bless the Child. These results seem to indicate that songs which use the same riddim have considerably smaller distances across all parameters. Note this is prior to normalization.

The intra-riddim similarity distances should be smaller than the cross-riddim similarity distances. In other words, songs on the same riddim should be rated as most similar. This was true for both the TimbreModel and the RhythmModel. This small-scale experiment served as a sanity check for the system. Given that intra-riddim similarity distances were smallest, the system passes.

5.2 Full-scale Experiment

5.2.1 itunes Top Ten Test Set

A more thorough test of the system would require a larger pool of test songs not bound by any single individual's personal collection. To create such a test pool of digital music files, the top ten rated songs in eleven different genres were purchased from the itunes music store. The genres were selected arbitrarily and included Hip-hop/Rap, Classical, Pop, World, Jazz, Dance, Electronic, Country, Blues, Alternative, and R&B/Soul. The itunes top ten ratings are based on sales data. To date, over 980 million songs have been purchased since the service first launched on April 28, 2003. There are currently itunes stores available in the United States, United Kingdom, France, Germany, Austria, Belgium, Finland, Greece, Ireland, Italy, Luxembourg, the Netherlands, Portugal, Spain, Canada, Denmark, Norway, Sweden, Switzerland, Japan, and Australia. Given the popularity and broad scope of the itunes music store, the genre distinctions applied to the test songs can be used as a benchmark for the music similarity system. A useful system should at least loosely identify genre boundaries defined by the itunes music store.


2 2. Melody description The MPEG-7 standard distinguishes three types of attributes related to melody: the fundamental frequency LLD associated to a t MPEG-7 FOR CONTENT-BASED MUSIC PROCESSING Λ Emilia GÓMEZ, Fabien GOUYON, Perfecto HERRERA and Xavier AMATRIAIN Music Technology Group, Universitat Pompeu Fabra, Barcelona, SPAIN http://www.iua.upf.es/mtg

More information

Music Radar: A Web-based Query by Humming System

Music Radar: A Web-based Query by Humming System Music Radar: A Web-based Query by Humming System Lianjie Cao, Peng Hao, Chunmeng Zhou Computer Science Department, Purdue University, 305 N. University Street West Lafayette, IN 47907-2107 {cao62, pengh,

More information

The song remains the same: identifying versions of the same piece using tonal descriptors

The song remains the same: identifying versions of the same piece using tonal descriptors The song remains the same: identifying versions of the same piece using tonal descriptors Emilia Gómez Music Technology Group, Universitat Pompeu Fabra Ocata, 83, Barcelona emilia.gomez@iua.upf.edu Abstract

More information

MPEG-7 AUDIO SPECTRUM BASIS AS A SIGNATURE OF VIOLIN SOUND

MPEG-7 AUDIO SPECTRUM BASIS AS A SIGNATURE OF VIOLIN SOUND MPEG-7 AUDIO SPECTRUM BASIS AS A SIGNATURE OF VIOLIN SOUND Aleksander Kaminiarz, Ewa Łukasik Institute of Computing Science, Poznań University of Technology. Piotrowo 2, 60-965 Poznań, Poland e-mail: Ewa.Lukasik@cs.put.poznan.pl

More information

Enhancing Music Maps

Enhancing Music Maps Enhancing Music Maps Jakob Frank Vienna University of Technology, Vienna, Austria http://www.ifs.tuwien.ac.at/mir frank@ifs.tuwien.ac.at Abstract. Private as well as commercial music collections keep growing

More information

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University Week 14 Query-by-Humming and Music Fingerprinting Roger B. Dannenberg Professor of Computer Science, Art and Music Overview n Melody-Based Retrieval n Audio-Score Alignment n Music Fingerprinting 2 Metadata-based

More information

DAY 1. Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval

DAY 1. Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval DAY 1 Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval Jay LeBoeuf Imagine Research jay{at}imagine-research.com Rebecca

More information

Voice & Music Pattern Extraction: A Review

Voice & Music Pattern Extraction: A Review Voice & Music Pattern Extraction: A Review 1 Pooja Gautam 1 and B S Kaushik 2 Electronics & Telecommunication Department RCET, Bhilai, Bhilai (C.G.) India pooja0309pari@gmail.com 2 Electrical & Instrumentation

More information

The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng

The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng S. Zhu, P. Ji, W. Kuang and J. Yang Institute of Acoustics, CAS, O.21, Bei-Si-huan-Xi Road, 100190 Beijing,

More information

CTP431- Music and Audio Computing Music Information Retrieval. Graduate School of Culture Technology KAIST Juhan Nam

CTP431- Music and Audio Computing Music Information Retrieval. Graduate School of Culture Technology KAIST Juhan Nam CTP431- Music and Audio Computing Music Information Retrieval Graduate School of Culture Technology KAIST Juhan Nam 1 Introduction ü Instrument: Piano ü Genre: Classical ü Composer: Chopin ü Key: E-minor

More information

Music Structure Analysis

Music Structure Analysis Lecture Music Processing Music Structure Analysis Meinard Müller International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de Book: Fundamentals of Music Processing Meinard Müller Fundamentals

More information

Topic 10. Multi-pitch Analysis

Topic 10. Multi-pitch Analysis Topic 10 Multi-pitch Analysis What is pitch? Common elements of music are pitch, rhythm, dynamics, and the sonic qualities of timbre and texture. An auditory perceptual attribute in terms of which sounds

More information

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring 2009 Week 6 Class Notes Pitch Perception Introduction Pitch may be described as that attribute of auditory sensation in terms

More information

A QUERY BY EXAMPLE MUSIC RETRIEVAL ALGORITHM

A QUERY BY EXAMPLE MUSIC RETRIEVAL ALGORITHM A QUER B EAMPLE MUSIC RETRIEVAL ALGORITHM H. HARB AND L. CHEN Maths-Info department, Ecole Centrale de Lyon. 36, av. Guy de Collongue, 69134, Ecully, France, EUROPE E-mail: {hadi.harb, liming.chen}@ec-lyon.fr

More information

Robert Alexandru Dobre, Cristian Negrescu

Robert Alexandru Dobre, Cristian Negrescu ECAI 2016 - International Conference 8th Edition Electronics, Computers and Artificial Intelligence 30 June -02 July, 2016, Ploiesti, ROMÂNIA Automatic Music Transcription Software Based on Constant Q

More information

Music Segmentation Using Markov Chain Methods

Music Segmentation Using Markov Chain Methods Music Segmentation Using Markov Chain Methods Paul Finkelstein March 8, 2011 Abstract This paper will present just how far the use of Markov Chains has spread in the 21 st century. We will explain some

More information

Voice Controlled Car System

Voice Controlled Car System Voice Controlled Car System 6.111 Project Proposal Ekin Karasan & Driss Hafdi November 3, 2016 1. Overview Voice controlled car systems have been very important in providing the ability to drivers to adjust

More information

Recognising Cello Performers using Timbre Models

Recognising Cello Performers using Timbre Models Recognising Cello Performers using Timbre Models Chudy, Magdalena; Dixon, Simon For additional information about this publication click this link. http://qmro.qmul.ac.uk/jspui/handle/123456789/5013 Information

More information

IMPROVING RHYTHMIC SIMILARITY COMPUTATION BY BEAT HISTOGRAM TRANSFORMATIONS

IMPROVING RHYTHMIC SIMILARITY COMPUTATION BY BEAT HISTOGRAM TRANSFORMATIONS 1th International Society for Music Information Retrieval Conference (ISMIR 29) IMPROVING RHYTHMIC SIMILARITY COMPUTATION BY BEAT HISTOGRAM TRANSFORMATIONS Matthias Gruhne Bach Technology AS ghe@bachtechnology.com

More information

Music Genre Classification

Music Genre Classification Music Genre Classification chunya25 Fall 2017 1 Introduction A genre is defined as a category of artistic composition, characterized by similarities in form, style, or subject matter. [1] Some researchers

More information

Recognising Cello Performers Using Timbre Models

Recognising Cello Performers Using Timbre Models Recognising Cello Performers Using Timbre Models Magdalena Chudy and Simon Dixon Abstract In this paper, we compare timbre features of various cello performers playing the same instrument in solo cello

More information

2. AN INTROSPECTION OF THE MORPHING PROCESS

2. AN INTROSPECTION OF THE MORPHING PROCESS 1. INTRODUCTION Voice morphing means the transition of one speech signal into another. Like image morphing, speech morphing aims to preserve the shared characteristics of the starting and final signals,

More information

How to Obtain a Good Stereo Sound Stage in Cars

How to Obtain a Good Stereo Sound Stage in Cars Page 1 How to Obtain a Good Stereo Sound Stage in Cars Author: Lars-Johan Brännmark, Chief Scientist, Dirac Research First Published: November 2017 Latest Update: November 2017 Designing a sound system

More information

Tempo and Beat Tracking

Tempo and Beat Tracking Tutorial Automatisierte Methoden der Musikverarbeitung 47. Jahrestagung der Gesellschaft für Informatik Tempo and Beat Tracking Meinard Müller, Christof Weiss, Stefan Balke International Audio Laboratories

More information

International Journal of Advance Engineering and Research Development MUSICAL INSTRUMENT IDENTIFICATION AND STATUS FINDING WITH MFCC

International Journal of Advance Engineering and Research Development MUSICAL INSTRUMENT IDENTIFICATION AND STATUS FINDING WITH MFCC Scientific Journal of Impact Factor (SJIF): 5.71 International Journal of Advance Engineering and Research Development Volume 5, Issue 04, April -2018 e-issn (O): 2348-4470 p-issn (P): 2348-6406 MUSICAL

More information

Music Information Retrieval

Music Information Retrieval CTP 431 Music and Audio Computing Music Information Retrieval Graduate School of Culture Technology (GSCT) Juhan Nam 1 Introduction ü Instrument: Piano ü Composer: Chopin ü Key: E-minor ü Melody - ELO

More information

Music Complexity Descriptors. Matt Stabile June 6 th, 2008

Music Complexity Descriptors. Matt Stabile June 6 th, 2008 Music Complexity Descriptors Matt Stabile June 6 th, 2008 Musical Complexity as a Semantic Descriptor Modern digital audio collections need new criteria for categorization and searching. Applicable to:

More information

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY Eugene Mikyung Kim Department of Music Technology, Korea National University of Arts eugene@u.northwestern.edu ABSTRACT

More information

Motion Video Compression

Motion Video Compression 7 Motion Video Compression 7.1 Motion video Motion video contains massive amounts of redundant information. This is because each image has redundant information and also because there are very few changes

More information

Features for Audio and Music Classification

Features for Audio and Music Classification Features for Audio and Music Classification Martin F. McKinney and Jeroen Breebaart Auditory and Multisensory Perception, Digital Signal Processing Group Philips Research Laboratories Eindhoven, The Netherlands

More information

Audio Feature Extraction for Corpus Analysis

Audio Feature Extraction for Corpus Analysis Audio Feature Extraction for Corpus Analysis Anja Volk Sound and Music Technology 5 Dec 2017 1 Corpus analysis What is corpus analysis study a large corpus of music for gaining insights on general trends

More information

HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH

HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH Proc. of the th Int. Conference on Digital Audio Effects (DAFx-), Hamburg, Germany, September -8, HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH George Tzanetakis, Georg Essl Computer

More information

MODELS of music begin with a representation of the

MODELS of music begin with a representation of the 602 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 18, NO. 3, MARCH 2010 Modeling Music as a Dynamic Texture Luke Barrington, Student Member, IEEE, Antoni B. Chan, Member, IEEE, and

More information

Music Emotion Recognition. Jaesung Lee. Chung-Ang University

Music Emotion Recognition. Jaesung Lee. Chung-Ang University Music Emotion Recognition Jaesung Lee Chung-Ang University Introduction Searching Music in Music Information Retrieval Some information about target music is available Query by Text: Title, Artist, or

More information

Timing In Expressive Performance

Timing In Expressive Performance Timing In Expressive Performance 1 Timing In Expressive Performance Craig A. Hanson Stanford University / CCRMA MUS 151 Final Project Timing In Expressive Performance Timing In Expressive Performance 2

More information

Full Disclosure Monitoring

Full Disclosure Monitoring Full Disclosure Monitoring Power Quality Application Note Full Disclosure monitoring is the ability to measure all aspects of power quality, on every voltage cycle, and record them in appropriate detail

More information

Transcription of the Singing Melody in Polyphonic Music

Transcription of the Singing Melody in Polyphonic Music Transcription of the Singing Melody in Polyphonic Music Matti Ryynänen and Anssi Klapuri Institute of Signal Processing, Tampere University Of Technology P.O.Box 553, FI-33101 Tampere, Finland {matti.ryynanen,

More information

A repetition-based framework for lyric alignment in popular songs

A repetition-based framework for lyric alignment in popular songs A repetition-based framework for lyric alignment in popular songs ABSTRACT LUONG Minh Thang and KAN Min Yen Department of Computer Science, School of Computing, National University of Singapore We examine

More information

EE391 Special Report (Spring 2005) Automatic Chord Recognition Using A Summary Autocorrelation Function

EE391 Special Report (Spring 2005) Automatic Chord Recognition Using A Summary Autocorrelation Function EE391 Special Report (Spring 25) Automatic Chord Recognition Using A Summary Autocorrelation Function Advisor: Professor Julius Smith Kyogu Lee Center for Computer Research in Music and Acoustics (CCRMA)

More information

Effects of acoustic degradations on cover song recognition

Effects of acoustic degradations on cover song recognition Signal Processing in Acoustics: Paper 68 Effects of acoustic degradations on cover song recognition Julien Osmalskyj (a), Jean-Jacques Embrechts (b) (a) University of Liège, Belgium, josmalsky@ulg.ac.be

More information

ISMIR 2008 Session 2a Music Recommendation and Organization

ISMIR 2008 Session 2a Music Recommendation and Organization A COMPARISON OF SIGNAL-BASED MUSIC RECOMMENDATION TO GENRE LABELS, COLLABORATIVE FILTERING, MUSICOLOGICAL ANALYSIS, HUMAN RECOMMENDATION, AND RANDOM BASELINE Terence Magno Cooper Union magno.nyc@gmail.com

More information

Perceptual dimensions of short audio clips and corresponding timbre features

Perceptual dimensions of short audio clips and corresponding timbre features Perceptual dimensions of short audio clips and corresponding timbre features Jason Musil, Budr El-Nusairi, Daniel Müllensiefen Department of Psychology, Goldsmiths, University of London Question How do

More information

WE ADDRESS the development of a novel computational

WE ADDRESS the development of a novel computational IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 18, NO. 3, MARCH 2010 663 Dynamic Spectral Envelope Modeling for Timbre Analysis of Musical Instrument Sounds Juan José Burred, Member,

More information

POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS

POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS Andrew N. Robertson, Mark D. Plumbley Centre for Digital Music

More information

MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES

MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES Jun Wu, Yu Kitano, Stanislaw Andrzej Raczynski, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono and Shigeki Sagayama The Graduate

More information

Music Source Separation

Music Source Separation Music Source Separation Hao-Wei Tseng Electrical and Engineering System University of Michigan Ann Arbor, Michigan Email: blakesen@umich.edu Abstract In popular music, a cover version or cover song, or

More information

Creating a Feature Vector to Identify Similarity between MIDI Files

Creating a Feature Vector to Identify Similarity between MIDI Files Creating a Feature Vector to Identify Similarity between MIDI Files Joseph Stroud 2017 Honors Thesis Advised by Sergio Alvarez Computer Science Department, Boston College 1 Abstract Today there are many

More information

Analysis, Synthesis, and Perception of Musical Sounds

Analysis, Synthesis, and Perception of Musical Sounds Analysis, Synthesis, and Perception of Musical Sounds The Sound of Music James W. Beauchamp Editor University of Illinois at Urbana, USA 4y Springer Contents Preface Acknowledgments vii xv 1. Analysis

More information

Music Information Retrieval Community

Music Information Retrieval Community Music Information Retrieval Community What: Developing systems that retrieve music When: Late 1990 s to Present Where: ISMIR - conference started in 2000 Why: lots of digital music, lots of music lovers,

More information

Composer Identification of Digital Audio Modeling Content Specific Features Through Markov Models

Composer Identification of Digital Audio Modeling Content Specific Features Through Markov Models Composer Identification of Digital Audio Modeling Content Specific Features Through Markov Models Aric Bartle (abartle@stanford.edu) December 14, 2012 1 Background The field of composer recognition has

More information

A Framework for Segmentation of Interview Videos

A Framework for Segmentation of Interview Videos A Framework for Segmentation of Interview Videos Omar Javed, Sohaib Khan, Zeeshan Rasheed, Mubarak Shah Computer Vision Lab School of Electrical Engineering and Computer Science University of Central Florida

More information

Book: Fundamentals of Music Processing. Audio Features. Book: Fundamentals of Music Processing. Book: Fundamentals of Music Processing

Book: Fundamentals of Music Processing. Audio Features. Book: Fundamentals of Music Processing. Book: Fundamentals of Music Processing Book: Fundamentals of Music Processing Lecture Music Processing Audio Features Meinard Müller International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de Meinard Müller Fundamentals

More information

Speech and Speaker Recognition for the Command of an Industrial Robot

Speech and Speaker Recognition for the Command of an Industrial Robot Speech and Speaker Recognition for the Command of an Industrial Robot CLAUDIA MOISA*, HELGA SILAGHI*, ANDREI SILAGHI** *Dept. of Electric Drives and Automation University of Oradea University Street, nr.

More information

Musical Hit Detection

Musical Hit Detection Musical Hit Detection CS 229 Project Milestone Report Eleanor Crane Sarah Houts Kiran Murthy December 12, 2008 1 Problem Statement Musical visualizers are programs that process audio input in order to

More information

Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting

Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting Dalwon Jang 1, Seungjae Lee 2, Jun Seok Lee 2, Minho Jin 1, Jin S. Seo 2, Sunil Lee 1 and Chang D. Yoo 1 1 Korea Advanced

More information

A Categorical Approach for Recognizing Emotional Effects of Music

A Categorical Approach for Recognizing Emotional Effects of Music A Categorical Approach for Recognizing Emotional Effects of Music Mohsen Sahraei Ardakani 1 and Ehsan Arbabi School of Electrical and Computer Engineering, College of Engineering, University of Tehran,

More information

CONCATENATIVE SYNTHESIS FOR NOVEL TIMBRAL CREATION. A Thesis. presented to. the Faculty of California Polytechnic State University, San Luis Obispo

CONCATENATIVE SYNTHESIS FOR NOVEL TIMBRAL CREATION. A Thesis. presented to. the Faculty of California Polytechnic State University, San Luis Obispo CONCATENATIVE SYNTHESIS FOR NOVEL TIMBRAL CREATION A Thesis presented to the Faculty of California Polytechnic State University, San Luis Obispo In Partial Fulfillment of the Requirements for the Degree

More information

Semi-supervised Musical Instrument Recognition

Semi-supervised Musical Instrument Recognition Semi-supervised Musical Instrument Recognition Master s Thesis Presentation Aleksandr Diment 1 1 Tampere niversity of Technology, Finland Supervisors: Adj.Prof. Tuomas Virtanen, MSc Toni Heittola 17 May

More information

Predicting Time-Varying Musical Emotion Distributions from Multi-Track Audio

Predicting Time-Varying Musical Emotion Distributions from Multi-Track Audio Predicting Time-Varying Musical Emotion Distributions from Multi-Track Audio Jeffrey Scott, Erik M. Schmidt, Matthew Prockup, Brandon Morton, and Youngmoo E. Kim Music and Entertainment Technology Laboratory

More information

Browsing News and Talk Video on a Consumer Electronics Platform Using Face Detection

Browsing News and Talk Video on a Consumer Electronics Platform Using Face Detection Browsing News and Talk Video on a Consumer Electronics Platform Using Face Detection Kadir A. Peker, Ajay Divakaran, Tom Lanning Mitsubishi Electric Research Laboratories, Cambridge, MA, USA {peker,ajayd,}@merl.com

More information

Fast Ethernet Consortium Clause 25 PMD-EEE Conformance Test Suite v1.1 Report

Fast Ethernet Consortium Clause 25 PMD-EEE Conformance Test Suite v1.1 Report Fast Ethernet Consortium Clause 25 PMD-EEE Conformance Test Suite v1.1 Report UNH-IOL 121 Technology Drive, Suite 2 Durham, NH 03824 +1-603-862-0090 Consortium Manager: Peter Scruton pjs@iol.unh.edu +1-603-862-4534

More information

However, in studies of expressive timing, the aim is to investigate production rather than perception of timing, that is, independently of the listene

However, in studies of expressive timing, the aim is to investigate production rather than perception of timing, that is, independently of the listene Beat Extraction from Expressive Musical Performances Simon Dixon, Werner Goebl and Emilios Cambouropoulos Austrian Research Institute for Artificial Intelligence, Schottengasse 3, A-1010 Vienna, Austria.

More information

Citation for published version (APA): Jensen, K. K. (2005). A Causal Rhythm Grouping. Lecture Notes in Computer Science, 3310,

Citation for published version (APA): Jensen, K. K. (2005). A Causal Rhythm Grouping. Lecture Notes in Computer Science, 3310, Aalborg Universitet A Causal Rhythm Grouping Jensen, Karl Kristoffer Published in: Lecture Notes in Computer Science Publication date: 2005 Document Version Early version, also known as pre-print Link

More information

Computational Modelling of Harmony

Computational Modelling of Harmony Computational Modelling of Harmony Simon Dixon Centre for Digital Music, Queen Mary University of London, Mile End Rd, London E1 4NS, UK simon.dixon@elec.qmul.ac.uk http://www.elec.qmul.ac.uk/people/simond

More information

Music Information Retrieval with Temporal Features and Timbre

Music Information Retrieval with Temporal Features and Timbre Music Information Retrieval with Temporal Features and Timbre Angelina A. Tzacheva and Keith J. Bell University of South Carolina Upstate, Department of Informatics 800 University Way, Spartanburg, SC

More information

Lab 5 Linear Predictive Coding

Lab 5 Linear Predictive Coding Lab 5 Linear Predictive Coding 1 of 1 Idea When plain speech audio is recorded and needs to be transmitted over a channel with limited bandwidth it is often necessary to either compress or encode the audio

More information

DELTA MODULATION AND DPCM CODING OF COLOR SIGNALS

DELTA MODULATION AND DPCM CODING OF COLOR SIGNALS DELTA MODULATION AND DPCM CODING OF COLOR SIGNALS Item Type text; Proceedings Authors Habibi, A. Publisher International Foundation for Telemetering Journal International Telemetering Conference Proceedings

More information

Research Article. ISSN (Print) *Corresponding author Shireen Fathima

Research Article. ISSN (Print) *Corresponding author Shireen Fathima Scholars Journal of Engineering and Technology (SJET) Sch. J. Eng. Tech., 2014; 2(4C):613-620 Scholars Academic and Scientific Publisher (An International Publisher for Academic and Scientific Resources)

More information

A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations in Audio Forensic Authentication

A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations in Audio Forensic Authentication Proceedings of the 3 rd International Conference on Control, Dynamic Systems, and Robotics (CDSR 16) Ottawa, Canada May 9 10, 2016 Paper No. 110 DOI: 10.11159/cdsr16.110 A Parametric Autoregressive Model

More information

Data will be analysed based upon actual screen size, but may be presented if necessary in three size bins : Screen size category Medium (27 to 39 )

Data will be analysed based upon actual screen size, but may be presented if necessary in three size bins : Screen size category Medium (27 to 39 ) Mapping Document Country: Technology: Sub Category: All Introduction The first stage in the Mapping and Benchmarking process is the definition of the products, i.e. clearly setting the boundaries that

More information

STRUCTURAL CHANGE ON MULTIPLE TIME SCALES AS A CORRELATE OF MUSICAL COMPLEXITY

STRUCTURAL CHANGE ON MULTIPLE TIME SCALES AS A CORRELATE OF MUSICAL COMPLEXITY STRUCTURAL CHANGE ON MULTIPLE TIME SCALES AS A CORRELATE OF MUSICAL COMPLEXITY Matthias Mauch Mark Levy Last.fm, Karen House, 1 11 Bache s Street, London, N1 6DL. United Kingdom. matthias@last.fm mark@last.fm

More information