AUDIO CLASSIFICATION USING SEMANTIC TRANSFORMATION AND CLASSIFIER ENSEMBLE


Ju-Chiang Wang, Hung-Yi Lo, Shyh-Kang Jeng, and Hsin-Min Wang
Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan
Institute of Information Science, Academia Sinica, Taipei, Taiwan
asriver@iis.sinica.edu.tw, hungyi@iis.sinica.edu.tw, skjeng@cc.ee.ntu.edu.tw, whm@iis.sinica.edu.tw

ABSTRACT

This paper presents our winning audio classification system in MIREX 2010. Our system is implemented as follows. First, in the training phase, frame-based 70-dimensional feature vectors are extracted from each training audio clip by MIRToolbox. Next, the Posterior Weighted Bernoulli Mixture Model (PWBMM) is applied to transform the frame-decomposed feature vectors of the training song into a fixed-dimensional semantic vector representation based on pre-defined music tags; this procedure is called Semantic Transformation. Finally, for each class, the semantic vectors of the associated training clips are used to train an ensemble classifier consisting of SVM and AdaBoost classifiers. In the classification phase, a testing audio clip is first represented by a semantic vector, and then the class with the highest score is selected as the final output. Our system was ranked first out of 36 submissions in the MIREX 2010 audio mood classification task.

1. INTRODUCTION

Automatic music classification is a very important topic in the music information retrieval (MIR) field. It was first addressed by Tzanetakis et al., who worked on automatic musical genre classification of audio signals in 2001 [1]. After ten years of development, many kinds of audio classification datasets have been created, with category definitions and class labels corresponding to a set of audio examples. In addition, many approaches have been proposed for classifying music data according to genre [1, 2], mood [3, 4], or artists [5, 6]. The Music Information Retrieval Evaluation eXchange (MIREX), an annual MIR algorithm competition held jointly with ISMIR, started to evaluate audio classification in 2005.

In the audio classification field, a fixed number of categories or classes is usually pre-defined by experts for a given application task. In general, these categories or classes should be definite and as mutually exclusive as possible. However, when most people listen to a song they have never heard before, they usually form certain musical impressions in their minds, even though they may not be able to name the exact musical category of the song. These musical impressions, inspired by direct auditory cues, can be described by general words such as exciting, noisy, fast, male vocal, drum, and guitar. We believe that the co-occurrences of these musical impressions or concepts may indicate the membership of a song in a specific audio class. Therefore, in this study, we explore the relationship between general tag words and specific categories.

Since people tend to mentally tag a piece of music with specific words when they listen to it, music tags are a natural way to describe general musical concepts. Tags can cover different types of musical information, such as genre, mood, and instrumentation. Therefore, we believe that the knowledge of pre-generated music tags in one music dataset can help the classification of another music dataset.
In other words, we can first train a music tagging system to recognize the musical concepts of a song in terms of semantic tags, and the music classification system can then classify the song into specific classes based on this semantic representation.

Figure 1 shows an overview of our music classification system. There are two layers in our system, i.e., semantic transformation (ST) and ensemble classification (EC). In the training phase of the ST layer, we first extract audio features with respect to various types of musical characteristics, including dynamics, spectral, timbre, and tonal features, from the training audio clips. Next, we apply the Posterior Weighted Bernoulli Mixture Model (PWBMM) [7] to automatically tag the clips. The PWBMM performed very well in terms of the tag-based area under the receiver operating characteristic curve (AUC-ROC) in the MIREX 2010 audio tag classification task [8]. The AUC-ROC of the tag affinity output is an important way to evaluate the correctness of the tagging predictions; therefore, we have reasonable confidence in applying the PWBMM in the music tagging step of our system. The PWBMM is trained on the MajorMiner dataset, crawled from the website of the MajorMiner music tagging game. The dataset contains 2,472 ten-second audio clips and their associated tags. As shown in Table 1, we select 45 tags to define the semantic space.

In other words, a song is transformed by the ST layer into a 45-dimensional semantic vector over the pre-defined tags, based on the tagging procedure. In the MajorMiner dataset, the count of a tag given to a music clip ranges from 2 to 21. These counts are also modeled by the PWBMM and have been shown to improve the performance of music tag annotation [7].

In the training phase of the EC layer, for each class, the associated training audio clips, each represented by a 45-dimensional semantic vector, are used to train an ensemble classifier consisting of support vector machine (SVM) and AdaBoost classifiers. In the final classification phase, given a testing audio clip, the class with the highest output score is assigned to it.

Figure 1. The flowchart of our audio classification system. (The figure depicts the pipeline: audio clips → MIRToolbox 1.3 → 70-dim feature vectors → PWBMM → semantic representation over tags T1–T45 → SVM and AdaBoost → probability ensemble → final scores of each class. The PWBMM is fitted by maximum likelihood on the MajorMiner dataset: 2,472 ten-second clips, 45 pre-defined tags.)

The remainder of this paper is organized as follows. In Section 2, we describe the music features used in this work. In Section 3, we present how to apply the PWBMM for music semantic representation, and in Section 4, we present our ensemble classification method. We introduce the MIREX 2010 audio train/test: mood classification task and discuss the results in Section 5. Finally, the conclusion is given in Section 6.

Table 1. The 45 tags used in our music classification system.
metal, instrumental, horns, piano, guitar, ambient, saxophone, house, loud, bass, fast, keyboard, rock, noise, british, solo, electronica, beat, 80s, dance, strings, drum machine, jazz, pop, r&b, female, electronic, voice, rap, male, trumpet, distortion, quiet, techno, drum, funk, acoustic, vocal, organ, soft, country, hip hop, synth, slow, punk

2. MUSIC FEATURE EXTRACTION

We use MIRToolbox 1.3 for music feature extraction [9]. As shown in Table 2, four types of features are used in our system: dynamics, spectral, timbre, and tonal features. To ensure the alignment and prevent the mismatch of different features in a vector, all the features are extracted from the same fixed-size short-time frame. Given a song, a sequence of 70-dimensional feature vectors is extracted with a 50 ms frame size and a 0.5 hop shift (i.e., half-frame overlap).

Table 2. The music features used in the 70-dimensional frame-based vector.
Type      Features                                                     Dim
dynamics  rms                                                            1
spectral  centroid, spread, skewness, kurtosis, entropy, flatness,
          rolloff at 85%, rolloff at 95%, brightness, roughness,
          irregularity                                                  11
timbre    zero crossing rate, spectral flux, MFCC (13),
          delta MFCC (13), delta-delta MFCC (13)                        41
tonal     key clarity, key mode possibility, HCDF, chroma peak,
          chroma centroid, chroma (12)                                  17
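The actual front end is MIRToolbox 1.3 in Matlab. As a rough illustration of the same framing scheme, the following Python sketch computes a comparable, but not identical, subset of the per-frame features with librosa; the feature subset and librosa's implementations are assumptions, not the authors' exact pipeline.

```python
# Illustrative Python analogue of the MIRToolbox 1.3 front end, using librosa.
# It follows the 50 ms frame / 0.5 hop framing of Section 2 but covers only a
# subset of the 70 dimensions.
import numpy as np
import librosa

def extract_frame_features(path, sr=22050):
    y, sr = librosa.load(path, sr=sr, mono=True)
    n_fft = int(0.050 * sr)              # 50 ms analysis frame
    hop = n_fft // 2                     # 0.5 hop shift (half-frame overlap)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=n_fft, hop_length=hop)
    feats = [
        librosa.feature.rms(y=y, frame_length=n_fft, hop_length=hop),
        librosa.feature.spectral_centroid(y=y, sr=sr, n_fft=n_fft,
                                          hop_length=hop),
        librosa.feature.spectral_bandwidth(y=y, sr=sr, n_fft=n_fft,
                                           hop_length=hop),   # ~ spread
        librosa.feature.spectral_flatness(y=y, n_fft=n_fft, hop_length=hop),
        librosa.feature.spectral_rolloff(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop, roll_percent=0.85),
        librosa.feature.spectral_rolloff(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop, roll_percent=0.95),
        librosa.feature.zero_crossing_rate(y, frame_length=n_fft,
                                           hop_length=hop),
        mfcc,
        librosa.feature.delta(mfcc),
        librosa.feature.delta(mfcc, order=2),
        librosa.feature.chroma_stft(y=y, sr=sr, n_fft=n_fft, hop_length=hop),
    ]
    return np.vstack(feats).T            # shape: (num_frames, num_dims)
```

In the actual system, the per-frame vectors from all training clips are then pooled for the per-dimension normalization described in Section 3.1.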

3. POSTERIOR WEIGHTED BERNOULLI MIXTURE MODEL

The PWBMM-based music tagging system contains two steps. First, it converts the frame-based feature vectors of a song into a fixed-dimensional vector (a Gaussian Mixture Model (GMM) posterior representation). Then, a Bernoulli Mixture Model (BMM) [10] predicts the scores over the 45 music tags for the song.

3.1. GMM Posterior Representation

Before training the GMM, the feature vectors from all training audio clips are normalized to have a mean of 0 and a standard deviation of 1 in each dimension. Then, the GMM is fitted by the expectation-maximization (EM) algorithm. The generation of the GMM posterior representation can be viewed as a process of soft tokenization from a music background model. We denote a latent music class by a latent variable z \in \{1, 2, \dots, K\}, corresponding to the z-th Gaussian component with mixture weight \pi_z, mean vector \mu_z, and covariance matrix \Sigma_z in the GMM. With the GMM, we can describe how likely a given feature vector x belongs to a latent music class by the posterior probability of the latent music class:

    p(z \mid x) = \frac{\pi_z \, \mathcal{N}(x \mid \mu_z, \Sigma_z)}{\sum_{i=1}^{K} \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i)}.    (1)

Given a song s, by assuming that each frame contributes equally to the song, the posterior probability of a certain latent music class can be computed by

    p(z \mid s) = \frac{1}{N} \sum_{n=1}^{N} p(z \mid x_n),    (2)

where x_n is the feature vector of the n-th frame of song s and N is the number of frames in song s.
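A minimal sketch of this soft tokenization, Eqs. (1)-(2), assuming scikit-learn's GaussianMixture as the background model; the component count K = 128 is an illustrative assumption (the paper does not state the value used).

```python
# Sketch of the GMM posterior representation (Eqs. (1)-(2)) with scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

def fit_background_model(all_frames, K=128):
    """Fit the normalizer and GMM on pooled training frames (Section 3.1)."""
    scaler = StandardScaler().fit(all_frames)        # zero mean, unit variance
    gmm = GaussianMixture(n_components=K, covariance_type='diag')
    gmm.fit(scaler.transform(all_frames))            # fitted by EM
    return scaler, gmm

def song_posterior(scaler, gmm, frames):
    """Map a song's (N, 70) frame matrix to its K-dim posterior weight vector.

    predict_proba returns p(z | x_n) of Eq. (1); averaging over the N frames
    gives theta_z = (1/N) * sum_n p(z | x_n), i.e., Eq. (2).
    """
    return gmm.predict_proba(scaler.transform(frames)).mean(axis=0)
```

The diagonal covariance here is a design choice for the sketch; any covariance structure yields the same posterior-averaging scheme.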

3.2. Bernoulli Mixture Model

Assume that we have a training music corpus with J audio clips, each denoted as s_j, j = 1, ..., J, with associated tag counts c_{jw}, w = 1, ..., W. The tag counts are positive integers indicating the number of times that tag t_w has been assigned to clip s_j. The binary random variable y_{jw}, with y_{jw} \in \{0, 1\}, represents the event of tag t_w applying to clip s_j.

3.2.1. Generative Process

The generative process of the BMM has two steps. First, the latent class z, z = 1, ..., K, is chosen from clip s_j's class weight vector \theta_j:

    p(z \mid \theta_j) = \theta_{jz},    (3)

where \theta_{jz} is the weight of the z-th latent class. Second, a value of the discrete variable y_{jw} is selected based on the following conditional probabilities:

    p(y_{jw} = 0 \mid z, \beta) = 1 - \beta_{zw},
    p(y_{jw} = 1 \mid z, \beta) = \beta_{zw}.    (4)

The conditional probability that models clip s_j having tag t_w is thus a Bernoulli distribution over the discrete variable y_{jw} with parameter \beta_{zw} for the z-th class. The complete joint distribution over y and z is described with model parameter \beta and weight matrix \Theta, whose j-th row vector is the \theta_j of clip s_j:

    p(y, z \mid \beta, \Theta) = \prod_{j=1}^{J} p(y_j, z_j \mid \beta, \theta_j) = \prod_{j=1}^{J} \prod_{w=1}^{W} \theta_{j z_{jw}} \, \beta_{z_{jw} w}^{y_{jw}} \, (1 - \beta_{z_{jw} w})^{1 - y_{jw}}.    (5)

The marginal log-likelihood of the music corpus can be expressed as

    \log p(y \mid \beta, \Theta) = \sum_{j=1}^{J} \sum_{w=1}^{W} \log \sum_{z=1}^{K} \theta_{jz} \, \beta_{zw}^{y_{jw}} (1 - \beta_{zw})^{1 - y_{jw}}.    (6)

3.2.2. Model Inference by the EM Algorithm

The BMM can be fitted with respect to parameter \beta and weight matrix \Theta by maximum-likelihood (ML) estimation. By linking the latent class of the BMM with the latent music class of the GMM described in Section 3.1, the posterior probability in Eq. (2) can be viewed as the class weight, i.e., \theta_{jz} = p(z \mid s_j). Therefore, we only need to estimate \beta, which corresponds to the probability that a tag occurs given a latent music class. We apply the EM algorithm to maximize the corpus-level log-likelihood in Eq. (6) in the presence of the latent variable z. In the E-step, given the clip-level weight matrix \Theta and the model parameter \beta, the posterior probability of each latent variable can be computed by

    \gamma(z_{jw}) = p(z \mid y_{jw}, \beta, \Theta) =
      \frac{\theta_{jz} \, \beta_{zw}}{\sum_{i} \theta_{ji} \, \beta_{iw}}                    for y_{jw} = 1,
      \frac{\theta_{jz} \, (1 - \beta_{zw})}{\sum_{i} \theta_{ji} \, (1 - \beta_{iw})}        for y_{jw} = 0.    (7)

In the M-step, the update rule for \beta is as follows:

    \beta_{zw} = \frac{\sum_{j:\, y_{jw} = 1} \gamma(z_{jw})}{\sum_{j} \gamma(z_{jw})}.    (8)

From the tag counts of the music corpus, we know that there exist different levels of relationship between a clip and a tag. If clip s_j has a more-than-one tag count c_{jw} for tag t_w, we can make clip s_j contribute to \beta_{zw} c_{jw} times rather than only once in each iteration of EM. This leads to a new update rule for \beta:

    \beta_{zw} = \frac{\sum_{j:\, y_{jw} = 1} c_{jw} \, \gamma(z_{jw})}{\sum_{j:\, y_{jw} = 1} c_{jw} \, \gamma(z_{jw}) + \sum_{j:\, y_{jw} = 0} \gamma(z_{jw})}.    (9)

3.3. Semantic Transformation with PWBMM

The w-th component of the semantic vector v of a given clip s is computed as the conditional probability of y_w = 1 given \theta and \beta:

    v_w = p(y_w = 1 \mid \beta, \theta) = \sum_{z=1}^{K} p(z \mid \theta) \, p(y_w = 1 \mid z, \beta) = \sum_{z=1}^{K} \theta_z \, \beta_{zw}.    (10)

For the ensemble classification layer, given an audio clip s_m, m = 1, 2, ..., M, its semantic representation is generated in the same way. First, a sequence of music feature vectors is extracted from s_m. Second, the vector sequence is transformed into a fixed-dimensional posterior weight vector \theta_m via Eq. (2). Third, the weight vector \theta_m is transformed into a fixed-dimensional semantic vector v_m via Eq. (10).
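The following NumPy sketch is a didactic reconstruction of the count-weighted EM updates, Eqs. (7)-(9), and of the semantic transformation, Eq. (10); the variable names, random initialization, and iteration count are illustrative, not the authors' implementation. The theta matrix would come from song_posterior() above, computed on the MajorMiner clips.

```python
# Didactic sketch of BMM fitting (Eqs. (7)-(9)) and semantic transformation
# (Eq. (10)); illustrative reconstruction from the formulas in Section 3.
import numpy as np

def fit_bmm(theta, y, c, n_iter=50, eps=1e-10):
    """theta: (J, K) fixed clip weights; y: (J, W) binary tags; c: (J, W) counts."""
    K, W = theta.shape[1], y.shape[1]
    rng = np.random.default_rng(0)
    beta = rng.uniform(0.25, 0.75, size=(K, W))        # p(y_w = 1 | z)
    for _ in range(n_iter):
        # E-step, Eq. (7): gamma[j, z, w] = p(z | y_jw, beta, Theta).
        like = np.where(y[:, None, :] == 1, beta[None], 1.0 - beta[None])
        gamma = theta[:, :, None] * like               # (J, K, W)
        gamma /= gamma.sum(axis=1, keepdims=True) + eps
        # M-step, Eq. (9): a positive (clip, tag) pair contributes c_jw times.
        pos = np.einsum('jw,jzw->zw', c * y, gamma)    # y_jw = 1 terms
        neg = np.einsum('jw,jzw->zw', 1.0 - y, gamma)  # y_jw = 0 terms
        beta = pos / (pos + neg + eps)
    return beta

def semantic_vector(theta_m, beta):
    """Eq. (10): v_w = sum_z theta_z * beta_zw -> the 45-dim semantic vector."""
    return theta_m @ beta                              # (K,) @ (K, W) -> (W,)
```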
4. THE ENSEMBLE CLASSIFICATION METHOD

Assume that we have G classes for the audio classification task and that all the classes are independent. We train G binary ensemble classifiers, denoted as C_g, g = 1, 2, ..., G, one for each class. Each ensemble classifier C_g calculates a final score by combining the outputs of two sub-classifiers: an SVM and an AdaBoost classifier.

4.1. Support Vector Machine

SVM finds a separating surface with a large margin between the training samples of two classes in a high-dimensional feature space implicitly introduced by a computationally efficient kernel mapping. The large margin implies good generalization ability in theory. In this work, we exploit a linear SVM classifier f(v) of the following form:

    f(v) = \sum_{w=1}^{W} \lambda_w v_w + b,    (11)

where v_w is the w-th component of the semantic vector v of a testing clip; \lambda_w and b are parameters trained from (v_m, l_{mg}), m = 1, ..., M, where v_m is the semantic vector of the m-th training clip and l_{mg} \in \{1, 0\} is the g-th class label of the m-th training clip; and W is the dimension of the semantic vector. The advantage of linear SVM is its training efficiency, and recent literature has shown that its prediction performance is comparable to that of non-linear SVM. Its single cost parameter is determined by cross-validation.

4.2. AdaBoost

Boosting is a method of finding a highly accurate classifier by combining several base classifiers, even though each of them is only moderately accurate. We use decision stumps as the base learner. The decision function of the boosting classifier takes the following form:

    g(v) = \sum_{t=1}^{T} \alpha_t h_t(v),    (12)

where the weight \alpha_t is set as suggested in [11]. The model selection procedure can be done efficiently since we can iteratively increase the number of base learners and stop when the generalization ability with respect to the validation set no longer improves.

4.3. Calibrated Probability Scores and Probability Ensemble

The ensemble classifier averages the scores of the two sub-classifiers, i.e., SVM and AdaBoost. However, since the sub-classifiers for different classes are trained independently, their raw scores are not comparable. Therefore, we transform the raw score of each sub-classifier into a probability score with a sigmoid function [13]:

    \Pr(l = 1 \mid v) = \frac{1}{1 + \exp(A f + B)},    (13)

where f is the raw score of a sub-classifier, and A and B are learned by solving a regularized maximum likelihood problem [14]. As the sub-classifier outputs have been calibrated into probability scores, the classifier ensemble for a specific class is formed by averaging the probability scores of the associated SVM and AdaBoost sub-classifiers, and the probability scores of the classifiers for different classes become comparable. The class with the highest output score is assigned to a testing music clip.

4.4. Cross-Validation

We first perform inner cross-validation on the training set to determine the cost parameter C of the linear SVM and the number of base learners in AdaBoost. Then, we retrain the classifiers with the complete training set and the selected parameters. We use the AUC-ROC as the model selection criterion.
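A hedged scikit-learn sketch of one per-class ensemble C_g: a linear SVM and AdaBoost with decision stumps, each calibrated with a sigmoid (Platt scaling, which solves the regularized maximum-likelihood fit for A and B in Eq. (13)), with the calibrated probabilities averaged. The parameter grids and fold counts are illustrative stand-ins for the inner cross-validation of Section 4.4, not the authors' exact settings.

```python
# Sketch of one binary ensemble classifier C_g (Sections 4.1-4.4).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import GridSearchCV

def train_class_ensemble(V, labels):
    """V: (M, 45) semantic vectors; labels: (M,) binary labels for class g."""
    # Linear SVM; the cost parameter C is chosen by inner CV with AUC-ROC.
    svm = GridSearchCV(LinearSVC(), {'C': [0.01, 0.1, 1, 10]},
                       scoring='roc_auc', cv=3).fit(V, labels).best_estimator_
    # AdaBoost over decision stumps; the number of base learners is chosen
    # the same way (Section 4.4).
    ada = GridSearchCV(AdaBoostClassifier(DecisionTreeClassifier(max_depth=1)),
                       {'n_estimators': [50, 100, 200]},
                       scoring='roc_auc', cv=3).fit(V, labels).best_estimator_
    # Sigmoid (Platt) calibration maps raw scores to probabilities, Eq. (13).
    cal_svm = CalibratedClassifierCV(svm, method='sigmoid', cv=3).fit(V, labels)
    cal_ada = CalibratedClassifierCV(ada, method='sigmoid', cv=3).fit(V, labels)
    return cal_svm, cal_ada

def ensemble_score(cal_svm, cal_ada, v):
    """Average calibrated probability of the two sub-classifiers for one clip."""
    v = np.asarray(v).reshape(1, -1)
    return 0.5 * (cal_svm.predict_proba(v)[0, 1] + cal_ada.predict_proba(v)[0, 1])
```

In the classification phase, ensemble_score is evaluated for each of the G per-class ensembles, and the class with the highest calibrated probability is returned, as in Section 4.3.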

5. MIREX 2010 AUDIO TRAIN/TEST: MUSIC MOOD CLASSIFICATION

We submitted the audio classification system described above to the MIREX 2010 Audio Train/Test tasks. For some unknown reason, only the evaluation results on the music mood dataset were reported (this also happened to some other teams), although we believe that our system can adapt to any kind of audio classification dataset. In the following discussion, this system is denoted as WLJW2. We also submitted a simple system (WLJW1) as a baseline. In WLJW1, the representation of an audio clip is the mean vector of all frame-based feature vectors of the clip, and a simple quadratic classifier [15] is trained for each class.

5.1. The Music Mood Dataset

The music mood dataset [4] was first used in MIREX 2007. It consists of 600 thirty-second audio clips in 22,050 Hz mono wave format, selected from the APM collection. The five mood categories, each containing 120 clips, are shown in Table 3. The mood class of an audio clip is labeled by human judges using the Evalutron 6000 system [16].

Table 3. The five mood categories and their components.
Class 1: passionate, rousing, confident, boisterous, rowdy
Class 2: rollicking, cheerful, fun, sweet, amiable/good natured
Class 3: literate, poignant, wistful, bittersweet, autumnal, brooding
Class 4: humorous, silly, campy, quirky, whimsical, witty, wry
Class 5: aggressive, fiery, tense/anxious, intense, volatile, visceral

5.2. Evaluation Results

MIREX uses three-fold cross-validation to evaluate the submitted systems. In each fold, one subset is selected as the test set and the remaining two subsets serve as the training set. The performance is summarized in Table 4 [17]; the summary accuracy is the average accuracy over the three folds, and the bold values represent the best performance in each evaluation metric.

Table 4. The performance of all submissions on the music mood dataset (summary accuracy and accuracy per testing fold for the 36 submissions; see [17] for the full table).

Our system WLJW2 was ranked first out of the 36 submissions in terms of summary accuracy. The summary accuracy of WLJW2 is 10.34% higher than that of our baseline system WLJW1. The results demonstrate that semantic transformation and classifier ensemble indeed enhance the audio classification performance. MIREX also performed significance tests, and the results are shown in Figure 2. Figure 3 shows the overall class-pairs confusion matrix of WLJW2. According to the confusion matrix, our system shows high confidence in classes 3 and 5, with accuracies of 83.33% and 88.33%, respectively.
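As a reference point, the sketch below mirrors the MIREX-style three-fold protocol of Section 5.2, computing the per-fold and summary accuracies; the train_all_classes / predict_class helpers are assumed wrappers around the per-class ensembles above, not part of the original submission.

```python
# Sketch of the three-fold evaluation protocol of Section 5.2.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def summary_accuracy(V, y, train_all_classes, predict_class):
    """V: (M, 45) semantic vectors; y: (M,) mood labels in {1, ..., 5}."""
    fold_acc = []
    for tr, te in StratifiedKFold(n_splits=3).split(V, y):
        model = train_all_classes(V[tr], y[tr])       # G calibrated ensembles
        pred = np.array([predict_class(model, v) for v in V[te]])
        fold_acc.append(float(np.mean(pred == y[te])))
    return fold_acc, float(np.mean(fold_acc))         # per-fold and summary
```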

Figure 2. The significance tests on accuracy per fold by Friedman's ANOVA / Tukey-Kramer HSD [17].

Figure 3. The overall confusion matrix of WLJW2.

6. CONCLUSIONS

In this paper, we have presented a music classification system integrating two layers of prediction based on semantic transformation and ensemble classification. The semantic transformation provides a given audio clip with a musically conceptual representation, which matches the human auditory sense to some extent. The robust ensemble classifier facilitates the final classification step. The results of the MIREX evaluation tasks have shown that our system achieves very good performance compared to other systems.

7. ACKNOWLEDGEMENTS

This work was supported in part by the Taiwan e-Learning and Digital Archives Program (TELDAP) sponsored by the National Science Council of Taiwan under Grant NSC H.

REFERENCES

[1] G. Tzanetakis, G. Essl, and P. Cook, "Automatic Musical Genre Classification of Audio Signals," ISMIR, 2001.
[2] T. Li, M. Ogihara, and Q. Li, "A Comparative Study on Content-Based Music Genre Classification," ACM SIGIR, 2003.
[3] D. Liu, L. Lu, and H.-J. Zhang, "Automatic Mood Detection from Acoustic Music Data," ISMIR, 2003.
[4] X. Hu, J. S. Downie, C. Laurier, M. Bay, and A. F. Ehmann, "The 2007 MIREX Audio Mood Classification Task: Lessons Learned," ISMIR, 2008.
[5] D. Ellis, B. Whitman, A. Berenzweig, and S. Lawrence, "The Quest for Ground Truth in Musical Artist Similarity," ISMIR, 2002.
[6] T. Li and M. Ogihara, "Music Artist Style Identification by Semi-supervised Learning from both Lyrics and Content," ACM MM, 2004.
[7] J.-C. Wang, H.-S. Lee, S.-K. Jeng, and H.-M. Wang, "Posterior Weighted Bernoulli Mixture Model for Music Tag Annotation and Retrieval," APSIPA ASC, 2010.
[8] MIREX 2010 Results: Audio Tag Affinity Estimation, Submission Code: WLJW3, Name: Adaptive PWBMM. g/subtas2_report/aff/
[9] O. Lartillot and P. Toiviainen, "A Matlab Toolbox for Musical Feature Extraction from Audio," DAFx, 2007.
[10] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[11] Y. Freund and R. E. Schapire, "A Decision-theoretic Generalization of On-line Learning and an Application to Boosting," Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119-139, 1997.
[12] H.-Y. Lo, J.-C. Wang, and H.-M. Wang, "Homogeneous Segmentation and Classifier Ensemble for Audio Tag Annotation and Retrieval," ICME, 2010.
[13] J. Platt, "Probabilistic Outputs for Support Vector Machines and Comparison to Regularized Likelihood Methods," Advances in Large Margin Classifiers, Cambridge, MA, 1999.
[14] H.-T. Lin, C.-J. Lin, and R.-C. Weng, "A Note on Platt's Probabilistic Outputs for Support Vector Machines," Machine Learning, vol. 68, no. 3, pp. 267-276, 2007.
[15] W. J. Krzanowski, Principles of Multivariate Analysis: A User's Perspective, New York: Oxford University Press, 1988.
[16] A. A. Gruzd, J. S. Downie, M. C. Jones, and J. H. Lee, "Evalutron 6000: Collecting Music Relevance Judgments," ACM JCDL, 2007.
[17] MIREX 2010 Results: Audio Mood Classification.
