MUSI-6201 Computational Music Analysis

1 MUSI-6201 Computational Music Analysis, Part 9.1: Genre Classification, Alexander Lerch, November 4, 2015

2 overview
  text book: Chapter 8: Musical Genre, Similarity, and Mood (pp )
  sources: slides (latex) & Matlab github repository
  lecture content
    definition of musical genre
    typical features and feature categories
    simple classifiers and basic classifier properties

6 introduction
  one of the oldest research topics in MIR
  classic machine learning task
  related fields: speech-music classification, instrument recognition, artist identification, music emotion recognition

9 applications
  large music databases: annotation, sorting, browsing, retrieving
  recommendation systems
  automatic playlist generation
  mashup generation

17 genre: definition
  what is musical genre? clusters of musical similarity?
  hard to answer in general; there are many systematic problems:
  1 non-agreement on taxonomies
  2 genre label scope: song, album, artist, piece of a song
  3 ill-defined genre labels: geographic (Indian music), historic (baroque), technical (barbershop), instrumentation (symphonic music), usage (Christmas songs)
  4 taxonomy scalability: genres and subgenres evolve over time
  5 non-orthogonality: several genres for one piece of music

18 genre: taxonomy examples
  [figure: two example genre taxonomy trees; node labels in extraction order: Speech, Music, Male, Female, Sports, Disco, Country, Hip Hop, Rock, Blues, Reggae, Pop, Metal, Classical, Jazz, Choir, Orchestra, Piano, String Quartet, Big Band, Cool, Fusion, Piano, Quartet, Swing, Background, Speech, Music, Male, Female, +Background, Classical, Non-Classical, Chamber, Orchestra, Rock, Electro/Pop, Jazz/Blues, Piano Solo, String Quartet, Other, Symphonic, +Choir, +Soloist, Soft Rock, Hard Rock, Hip Hop, Techno/Dance, Pop]

19 observations with humans
  1 human classification is far from perfect: 75-90% for a limited set of classes
  2 for many genres, humans need only a fraction of a second to classify
      are short-time timbre features sufficient?
  plots from [1], [2]
  [1] S. Lippens, J.-P. Martens, T. D. Mulder, et al., A Comparison of Human and Automatic Musical Genre Classification, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Montreal.
  [2] R. O. Gjerdingen and D. Perrott, Scanning the Dial: The Rapid Recognition of Music Genres, Journal of New Music Research, vol. 37, no. 2, Jun. 2008.

21 overview
  Audio Signal -> Feature Extraction -> Classification -> Genre Label
  1 feature extraction
      dimensionality reduction
      meaningful representation
  2 classification
      map or convert feature to comprehensible domain

23 feature categories
  high level similarities?
    melody, hook lines, bass lines, harmony progression
    rhythm & tempo
    structure
    instrumentation & timbre
    ...
  technical feature categories: tonal, technical, timbral, temporal, intensity
  extracted features should be
    extractable (not: time envelope in polyphonic signals)
    relevant (not: pitch chroma for instrument ID)
    non-redundant
  and should have discriminative power (robust to noise)

26 instantaneous features
  spectral features (timbre): Spectral Centroid, MFCCs, Spectral Flux, ...
  pitch features (tonal): pitch chroma distribution/change, ...
  rhythm features (temporal): onset density, beat histogram features, ...
  statistical features (technical): standard deviation, skewness, zero crossings, ...
  intensity features: level variation, number of pauses, ...
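A minimal sketch of three of the instantaneous features named above (RMS, zero crossings, spectral centroid), computed per block. The function names, block size, hop size, and the naive DFT are assumptions made for illustration; they are not taken from the Matlab repository the slides refer to.

```python
import cmath
import math

def block_signal(x, block_size, hop_size):
    """Split x into overlapping blocks (a trailing partial block is dropped)."""
    return [x[i:i + block_size]
            for i in range(0, len(x) - block_size + 1, hop_size)]

def rms(block):
    """Root mean square of one block (intensity feature)."""
    return math.sqrt(sum(v * v for v in block) / len(block))

def zero_crossing_rate(block):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(block, block[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(block) - 1)

def spectral_centroid(block, sample_rate):
    """Magnitude-weighted mean frequency in Hz, using a naive O(N^2) DFT."""
    n = len(block)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):
        s = sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitude = abs(s)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0

# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
x = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(1024)]
blocks = block_signal(x, block_size=256, hop_size=128)
features = [(rms(b), zero_crossing_rate(b), spectral_centroid(b, sample_rate))
            for b in blocks]
```

Each entry of `features` is one observation of the three feature series; a real implementation would typically apply a window function and use an FFT instead of the naive DFT.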

32 feature extraction
  1 extract instantaneous features
  2 compute derived features (derivative, filtered)
  3 compute long term features & subfeatures per texture window
  4 compute subfeatures per file
  5 normalize subfeatures
  6 (select or) transform subfeatures
  7 feature vector: classifier input
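Steps 3 and 4 above can be sketched as follows, assuming that the subfeatures are simply the mean and standard deviation of a feature series. The window length, hop size, and function names are hypothetical choices for illustration, not the lecture's settings.

```python
import statistics

def texture_window_subfeatures(series, window_len, hop):
    """Mean and standard deviation of a feature series per texture window."""
    out = []
    for start in range(0, len(series) - window_len + 1, hop):
        window = series[start:start + window_len]
        out.append((statistics.fmean(window), statistics.pstdev(window)))
    return out

def file_level_vector(series, window_len, hop):
    """Aggregate the per-window subfeatures into one vector per file."""
    windows = texture_window_subfeatures(series, window_len, hop)
    means = [m for m, _ in windows]
    stds = [s for _, s in windows]
    return [statistics.fmean(means), statistics.pstdev(means),
            statistics.fmean(stds), statistics.pstdev(stds)]

# Hypothetical feature series (e.g. one RMS value per block).
series = [0.0, 1.0, 0.0, 1.0, 2.0, 3.0, 2.0, 3.0]
windows = texture_window_subfeatures(series, window_len=4, hop=4)
vector = file_level_vector(series, window_len=4, hop=4)
```

With one instantaneous feature this gives four subfeatures per file; with several features the per-feature vectors would be concatenated before normalization.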

37 feature extraction: example
  [figure: scatter plot of music and speech files; axes: mean spectral centroid, std rms]

38 long term features 1/2
  derived from beat histogram [3]
  [3] G. Tzanetakis and P. Cook, Musical genre classification of audio signals, Transactions on Speech and Audio Processing, vol. 10, no. 5, Jul. 2002.

39 long term features 2/2
  derived from pitch histogram or pitch chroma [4]
  [4] G. Tzanetakis, A. Ermolinskyi, and P. Cook, Pitch Histograms in Audio and Symbolic Music Information Retrieval, in Proceedings of the 3rd International Conference on Music Information Retrieval (ISMIR), Paris, 2002.

40 additional feature examples
  stereo features
    mid channel energy vs. side channel energy
    spectral channel differences
  features at higher semantic levels: tempo, structure, harmonic complexity, instrumentation

42 classification: general steps
  1 define training set: annotated results
  2 normalize training set
  3 train classifier
  4 evaluate classifier with test set
  5 (adjust classifier settings, return to 4.)
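Step 2 above could look like the following sketch, assuming z-score normalization (zero mean, unit standard deviation per feature). The slides do not name a specific normalization, so this choice and the function names are assumptions. The point of the sketch is that the parameters are estimated on the training set only and then reused for the test set.

```python
import statistics

def fit_zscore(train_vectors):
    """Estimate per-feature mean and standard deviation from the training set."""
    columns = list(zip(*train_vectors))
    means = [statistics.fmean(c) for c in columns]
    # Guard against constant features, which would give a zero divisor.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stds

def apply_zscore(vectors, means, stds):
    """Normalize vectors with previously estimated parameters."""
    return [[(v - m) / s for v, m, s in zip(vec, means, stds)]
            for vec in vectors]

# Hypothetical 2-dimensional feature vectors with very different ranges.
train = [[1.0, 100.0], [3.0, 300.0], [5.0, 200.0]]
test = [[3.0, 200.0]]
means, stds = fit_zscore(train)
train_n = apply_zscore(train, means, stds)
test_n = apply_zscore(test, means, stds)
```

Without this step, a distance-based classifier would be dominated by the feature with the larger numeric range.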

47 classifier: rules of thumb
  training set
    training set size vs. number of features
      training set too small -> overfitting
      feature number too large -> overfitting
    training set too noisy -> underfitting
    training set not representative -> bad classification performance
  classifier
    poor classifier -> bad classification performance -> different classifier
  features
    poor features -> bad classification performance -> feature selection; new, better features
    features not normalized -> possibly bad classification performance -> feature range, feature mean, feature distribution

53 classifier: evaluation
  define test set for evaluation
    test set different from training set
    otherwise, same requirements
  example: N-fold cross validation
    1 split training set into N parts (randomly, but preferably identical number per class)
    2 select one part as test set
    3 train the classifier with all observations from remaining N-1 parts
    4 compute the classification rate for the test set
    5 repeat until all N parts have been tested
    6 overall result: average classification rate
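The six steps of N-fold cross validation above can be sketched as follows. The nearest-neighbor function is only a placeholder classifier, and all names and the toy data are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds, seed=0):
    """Step 1: assign indices to n_folds parts, balanced per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds

def cross_validate(features, labels, n_folds, train_and_predict):
    """Steps 2-6: average classification rate over all folds."""
    rates = []
    for test_idx in stratified_folds(labels, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predicted = train_and_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx])
        correct = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        rates.append(correct / len(test_idx))
    return sum(rates) / len(rates)

def nearest_neighbor(train_x, train_y, test_x):
    """Placeholder classifier: label of the closest training vector."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [train_y[min(range(len(train_x)), key=lambda i: dist2(train_x[i], x))]
            for x in test_x]

# Hypothetical, well-separated one-dimensional data.
features = [[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [1.3]]
labels = ["speech"] * 4 + ["music"] * 4
folds = stratified_folds(labels, n_folds=4)
rate = cross_validate(features, labels, n_folds=4,
                      train_and_predict=nearest_neighbor)
```

The sketch assumes every class has at least `n_folds` observations, so that no fold is empty.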

60 classifier: kNN
  training: extract reference vectors from training set (keep class labels)
  classification: extract test vector and set class to majority of k nearest reference vectors
  classifier data: all training vectors
  [figure: kNN classification examples for k = 3, k = 5, k = 7; matlab source: matlab/displayknn.m]
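A minimal kNN sketch matching the description above, assuming Euclidean distance (the slides do not specify the distance measure). Function names and the reference vectors are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train_x, train_y, x, k=3):
    """Majority label among the k reference vectors closest to x."""
    # "Training" is just storing the reference vectors with their labels.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-dimensional reference vectors.
train_x = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
           [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
train_y = ["speech", "speech", "speech", "music", "music", "music"]
label_a = knn_classify(train_x, train_y, [0.15, 0.1], k=3)
label_b = knn_classify(train_x, train_y, [0.8, 0.9], k=3)
```

With two classes an odd k avoids ties in the vote; the classifier data is the complete training set, so memory and classification cost grow with its size.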

66 classifier: GMM
  training: build model of each class distribution as superposition of Gaussian distributions
  classification: compute output of each Gaussian and select class with highest probability
  classifier data: per class per Gaussian: µ and covariance, mixture weight
  [figure: matlab source: matlab/displaygmm.m]
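The classification step above can be sketched as follows. The sketch assumes diagonal covariances, equal class priors, and already trained models; the training itself (typically expectation-maximization) is not shown, and all parameter values are hypothetical.

```python
import math

def gaussian_pdf_diag(x, mean, var):
    """Density of a Gaussian with diagonal covariance (per-dimension variances)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def gmm_likelihood(x, components):
    """Weighted sum of component densities; components = [(weight, mean, var), ...]."""
    return sum(w * gaussian_pdf_diag(x, m, v) for w, m, v in components)

def gmm_classify(x, class_models):
    """Select the class whose mixture model gives x the highest likelihood."""
    return max(class_models,
               key=lambda label: gmm_likelihood(x, class_models[label]))

# Hypothetical models: two Gaussians per class, weights sum to 1 per class.
class_models = {
    "speech": [(0.5, [0.0, 0.0], [0.1, 0.1]), (0.5, [0.5, 0.0], [0.1, 0.1])],
    "music": [(0.7, [2.0, 2.0], [0.2, 0.2]), (0.3, [3.0, 2.0], [0.2, 0.2])],
}
label_a = gmm_classify([0.2, 0.1], class_models)
label_b = gmm_classify([2.4, 1.9], class_models)
```

Unlike kNN, the classifier data is compact: only the weights, means, and variances are stored, not the training vectors.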

69 classifier: SVM (support vector machine)
  training
    map features to high dimensional space
    find separating hyperplane (linear classification) through maximum distance of support vectors (data points)
  classification: apply feature transform and proceed with linear classification
  classifier data: support vectors, kernel, kernel parameters
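The classification step of a trained two-class SVM can be sketched with the usual kernel form of the decision function, f(x) = sum_i alpha_i y_i K(x_i, x) + b. The support vectors, multipliers alpha_i, and bias below are hypothetical hand-picked values for a trivial problem; finding them is the training step and is not shown.

```python
import math

def linear_kernel(a, b):
    """Plain dot product: no mapping to a higher-dimensional space."""
    return sum(u * v for u, v in zip(a, b))

def rbf_kernel(a, b, gamma=1.0):
    """Radial basis function kernel; gamma is a kernel parameter."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

def svm_decision(x, support_vectors, labels, alphas, bias, kernel):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; the sign gives the class."""
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + bias

# Hypothetical trained model: two support vectors, labels +1 and -1.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
labels = [+1, -1]
alphas = [0.25, 0.25]
bias = 0.0

f_pos = svm_decision([2.0, 0.5], support_vectors, labels, alphas, bias, linear_kernel)
f_sv = svm_decision([1.0, 1.0], support_vectors, labels, alphas, bias, linear_kernel)
f_rbf = svm_decision([1.0, 1.0], support_vectors, labels, alphas, bias, rbf_kernel)
```

With the linear kernel the support vector itself lies exactly on the margin (f = 1); swapping in another kernel changes the implied feature space without changing the decision code.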

72 results
  classification results depend on training set, test set, and number of classes
  typical ranges: 10 classes: 50-80%
  note: results vary largely between datasets
    ill-defined genre boundaries
    non-uniformly distributed classes
    overfitting through songs from same album or artist
    ...
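One common way to address the last point above is to split the data by artist (or album), so that no artist contributes songs to both the training and the test set. The following is a sketch of such a split; the function name, the test fraction, and the artist labels are assumptions for illustration.

```python
import random
from collections import defaultdict

def grouped_split(groups, test_fraction=0.25, seed=0):
    """Split item indices so that no group (artist/album) spans train and test."""
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    names = sorted(by_group)
    random.Random(seed).shuffle(names)
    n_test = max(1, round(test_fraction * len(names)))
    test_groups = set(names[:n_test])
    train = [i for i, g in enumerate(groups) if g not in test_groups]
    test = [i for i, g in enumerate(groups) if g in test_groups]
    return train, test

# Hypothetical artist label per song.
artists = ["a", "a", "b", "b", "b", "c", "d", "d"]
train_idx, test_idx = grouped_split(artists)
```

The fraction applies to the number of groups, not songs, so the resulting test set size depends on how many songs the selected artists have.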

75 speech/music classification baseline example
  1 extract features
  2 represent each file with its 2-dimensional feature vector
  3 kNN to classify unknown audio files
  4 evaluate classification performance

76 speech/music classification example: features 1/2
  for each audio file
  1 split input signal into (overlapping) blocks
  2 compute 2 feature series (spectral centroid, RMS)
  3 aggregate feature series to one value each
      mean of Spectral Centroid: $\mu_{SC} = \frac{1}{N} \sum_{n} v_{SC}(n)$
      standard deviation of RMS: $\sigma_{RMS} = \sqrt{\frac{1}{N} \sum_{n} \left( v_{RMS}(n) - \mu_{RMS} \right)^2}$
  4 represent each file as 2-dimensional vector $(\mu_{SC}, \sigma_{RMS})^T$

79 speech/music classification example: features 2/2
  [figure: scatter plot of music and speech files; axes: mean spectral centroid, std rms; matlab source: matlab/displayscatter.m]

80 speech/music classification example: training set
  use dataset annotated as speech and music; requirements:
    large compared to number of features
    representative for use case (diverse)
  here: 110 speech files, 119 music files
  extract the features for the dataset

81 speech/music classification example: results (kNN)
  confusion matrix: [table: rows and columns speech, music, plus # files per class; cell values missing]
  classification rate: 84.2%
  single feature classification results
    Spectral Centroid: 56.7%
    RMS: 85.1%
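For reference, the classification rate is the sum of the confusion matrix diagonal divided by the total number of files. The counts in the sketch below are hypothetical and do not reproduce the lecture's result; only the row sums (110 speech files, 119 music files) follow the dataset size given in the slides.

```python
def classification_rate(confusion):
    """Correctly classified files (diagonal) divided by all files."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

def per_class_rate(confusion):
    """Fraction of correctly classified files per true class (row)."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]

# Hypothetical counts; rows: true class (speech, music),
# columns: predicted class (speech, music).
confusion = [[100, 10],
             [20, 99]]
rate = classification_rate(confusion)
rates_per_class = per_class_rate(confusion)
```

Reporting the per-class rates next to the overall rate shows whether errors are concentrated in one class, which the single overall number hides.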

84 summary: lecture content
  1 name three possible problems in the definition of the ground truth for genre classification
  2 is it possible for genre classifiers to yield better accuracy than human experts?
  3 list the feature processing steps from audio to the input of the classifier
