MUSI-6201 Computational Music Analysis

MUSI-6201 Computational Music Analysis
Part 9.1: Genre Classification
Alexander Lerch
November 4, 2015

overview
text book: Chapter 8: Musical Genre, Similarity, and Mood (pp. 151–155)
sources: slides (latex) & Matlab github repository
lecture content
- definition of musical genre
- typical features and feature categories
- simple classifiers and basic classifier properties

introduction
- one of the oldest research topics in MIR
- classic machine learning task
- related fields:
  - speech-music classification
  - instrument recognition
  - artist identification
  - music emotion recognition

applications
- large music databases:
  - annotation
  - sorting, browsing, retrieving
- recommendation systems
- automatic playlist generation
- mashup generation

genre: definition
what is musical genre: clusters of musical similarity?
hard to answer in general, there are many systematic problems
1. non-agreement on taxonomies
2. genre label scope: song, album, artist, piece of a song
3. ill-defined genre labels: geographic (indian music), historic (baroque), technical (barbershop), instrumentation (symphonic music), usage (christmas songs)
4. taxonomy scalability: genres and subgenres evolve over time
5. non-orthogonality: several genres for one piece of music

genre: taxonomy examples
[Figure: two genre taxonomy trees; only the node labels survive, listed in extraction order]
taxonomy 1: Speech, Music; Male, Female, Sports; Disco, Country, Hip Hop, Rock, Blues, Reggae, Pop, Metal, Classical, Jazz; Choir, Orchestra, Piano, String Quartet; Big Band, Cool, Fusion, Piano, Quartet, Swing
taxonomy 2: Background, Speech, Music; Male, Female, +Background; Classical, Non-Classical; Chamber, Orchestra, Rock, Electro/Pop, Jazz/Blues; Piano Solo, String Quartet, Other; Symphonic, +Choir, +Soloist; Soft Rock, Hard Rock; Hip Hop, Techno/Dance, Pop

observations with humans
1. human classification is far from perfect: 75–90 % for a limited set of classes
2. for many genres, humans need only a fraction of a second to classify
   → short-time timbre features sufficient?
plots from [1], [2]
[1] S. Lippens, J.-P. Martens, T. D. Mulder, et al., A Comparison of Human and Automatic Musical Genre Classification, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Montreal, 2004.
[2] R. O. Gjerdingen and D. Perrott, Scanning the Dial: The Rapid Recognition of Music Genres, Journal of New Music Research, vol. 37, no. 2, pp. 93–100, Jun. 2008, issn: 0929-8215.

overview
Audio Signal → Feature Extraction → Classification → Genre Label
1. feature extraction
   - dimensionality reduction
   - meaningful representation
2. classification
   - map or convert feature to comprehensible domain

feature categories
high level similarities?
- melody, hook lines, bass lines, harmony progression
- rhythm & tempo
- structure
- instrumentation & timbre
- ...
technical feature categories
- tonal
- technical
- timbral
- temporal
- intensity
extracted features should be
- extractable (not: time envelope in polyphonic signals)
- relevant (not: pitch chroma for instrument ID)
- non-redundant
- have discriminative power (robust to noise)

instantaneous features
- spectral features (timbre): Spectral Centroid, MFCCs, Spectral Flux, ...
- pitch features (tonal): pitch chroma distribution/change, ...
- rhythm features (temporal): onset density, beat histogram features, ...
- statistical features (technical): standard deviation, skewness, zero crossings, ...
- intensity features: level variation, number of pauses, ...
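As one concrete instance of an instantaneous feature from the list above, the following is a minimal sketch of a zero crossing rate computed for a single block of samples. The function name and the choice to normalize by the number of adjacent sample pairs are assumptions made for illustration; they are not taken from the lecture's Matlab repository.

```python
def zero_crossing_rate(block):
    """Fraction of adjacent sample pairs whose signs differ (assumed normalization)."""
    if len(block) < 2:
        return 0.0
    # a crossing is counted whenever one sample is non-negative and its neighbor is negative
    crossings = sum(1 for a, b in zip(block, block[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(block) - 1)

# toy blocks: an alternating signal crosses zero at every step, a rising ramp never does
alternating = [1.0, -1.0, 1.0, -1.0]
ramp = [1.0, 2.0, 3.0]
zcr_alternating = zero_crossing_rate(alternating)
zcr_ramp = zero_crossing_rate(ramp)
```

Computing this value once per block yields a feature series, which can then be aggregated in the same way as any other instantaneous feature.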

feature extraction
1. extract instantaneous features
2. compute derived features (derivative, filtered)
3. compute long term features & subfeatures per texture window
4. compute subfeatures per file
5. normalize subfeatures
6. (select or) transform subfeatures
7. feature vector → classifier input
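Steps 2 to 5 above could be sketched roughly as follows for a single instantaneous feature series per file. All function names, the texture window length and hop, the choice of mean and standard deviation as subfeatures, and the z-score normalization are illustrative assumptions; the lecture's own implementation may differ.

```python
import statistics

def derivative(series):
    """Step 2: a derived feature, here the first-order difference."""
    return [b - a for a, b in zip(series, series[1:])]

def texture_window_subfeatures(series, win_len, hop):
    """Step 3: (mean, standard deviation) of the series per texture window."""
    subfeatures = []
    for start in range(0, len(series) - win_len + 1, hop):
        window = series[start:start + win_len]
        subfeatures.append((statistics.fmean(window), statistics.pstdev(window)))
    return subfeatures

def file_subfeatures(window_subfeatures):
    """Step 4: average each subfeature over all texture windows of one file."""
    return [statistics.fmean(column) for column in zip(*window_subfeatures)]

def zscore_normalize(vectors):
    """Step 5: scale each subfeature to zero mean and unit standard deviation across files."""
    columns = list(zip(*vectors))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid division by zero
    return [[(v - m) / s for v, m, s in zip(vec, means, stds)] for vec in vectors]

# toy feature series for two hypothetical files
file_a = [0.1, 0.2, 0.4, 0.3, 0.5, 0.6, 0.4, 0.2]
file_b = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 1.0]
per_file = [
    file_subfeatures(texture_window_subfeatures(series, win_len=4, hop=2))
    for series in (file_a, file_b)
]
normalized = zscore_normalize(per_file)
```

Each entry of `normalized` is one feature vector ready to be passed to a classifier (step 7); the optional selection or transform of step 6 is omitted here.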

long term features 1/2
derived from beat histogram [3]
[3] G. Tzanetakis and P. Cook, Musical genre classification of audio signals, IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, pp. 293–302, Jul. 2002, issn: 1063-6676. doi: 10.1109/TSA.2002.800560.

long term features 2/2
derived from pitch histogram or pitch chroma [4]
[4] G. Tzanetakis, A. Ermolinskyi, and P. Cook, Pitch Histograms in Audio and Symbolic Music Information Retrieval, in Proceedings of the 3rd International Conference on Music Information Retrieval (ISMIR), Paris, 2002.

additional feature examples
- stereo features
  - mid channel energy vs. side channel energy
  - spectral channel differences
- features at higher semantic levels: tempo, structure, harmonic complexity, instrumentation
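The comparison of mid and side channel energy could be sketched as below. The mid/side definition with a factor of 1/2 is one common convention and is assumed here, as are the function name and the choice to report the side channel's share of the total energy.

```python
def side_energy_fraction(left, right):
    """Share of the total mid/side energy that falls in the side channel (0 = mono)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    energy_mid = sum(m * m for m in mid)
    energy_side = sum(s * s for s in side)
    total = energy_mid + energy_side
    return energy_side / total if total > 0 else 0.0

# toy signals: identical channels have no side energy, inverted channels have no mid energy
left = [0.5, -0.25, 0.75, -0.5]
mono_fraction = side_energy_fraction(left, left)
inverted_fraction = side_energy_fraction(left, [-x for x in left])
```

A value near 0 indicates an essentially mono recording, while larger values indicate a wider stereo image.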

classification: general steps
1. define training set: annotated results
2. normalize training set
3. train classifier
4. evaluate classifier with test set
5. (adjust classifier settings, return to 4.)

classifier: rules of thumb
training set
- training set size vs. number of features
  - training set too small → overfitting
  - feature number too large → overfitting
- training set too noisy → underfitting
- training set not representative → bad classification performance
classifier
- poor classifier → bad classification performance → different classifier
features
- poor features → bad classification performance → feature selection; new, better features
- features not normalized → possibly bad classification performance → feature range, feature mean, feature distribution

classifier: evaluation
define test set for evaluation
- test set different from training set
- otherwise, same requirements
example: N-fold cross validation
1. split training set into N parts (randomly, but preferably identical number per class)
2. select one part as test set
3. train the classifier with all observations from remaining N − 1 parts
4. compute the classification rate for the test set
5. repeat until all N parts have been tested
6. overall result: average classification rate
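The N-fold cross validation procedure above might look roughly like this. The function names, the fixed random seed, and the threshold classifier used as a stand-in are all hypothetical; any classifier exposing a train and a predict function could be plugged in instead.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds, seed=0):
    """Step 1: split indices into n_folds parts with roughly equal counts per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds

def cross_validate(features, labels, n_folds, train_fn, predict_fn):
    """Steps 2 to 6: train on N-1 parts, test on the held-out part, average the rates."""
    rates = []
    for test_idx in stratified_folds(labels, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = train_fn([features[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, features[i]) == labels[i] for i in test_idx)
        rates.append(correct / len(test_idx))
    return sum(rates) / len(rates)

# hypothetical stand-in classifier: threshold halfway between the two class means
def train_threshold(xs, ys):
    def class_mean(cls):
        values = [x for x, y in zip(xs, ys) if y == cls]
        return sum(values) / len(values)
    return (class_mean("speech") + class_mean("music")) / 2

def predict_threshold(threshold, x):
    return "music" if x > threshold else "speech"

# toy 1-D features for ten files
features = [0.1, 0.2, 0.3, 0.25, 0.15, 0.9, 0.8, 0.7, 0.85, 0.95]
labels = ["speech"] * 5 + ["music"] * 5
average_rate = cross_validate(features, labels, 5, train_threshold, predict_threshold)
```

Distributing each class round-robin over the folds is one simple way to keep the number of observations per class similar in every part.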

classifier: knn
training: extract reference vectors from training set (keep class labels)
classification: extract test vector and set class to majority of k nearest reference vectors
classifier data: all training vectors
[Figure: kNN classification examples for k = 3, k = 5, and k = 7]
matlab source: matlab/displayknn.m
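The kNN rule described above can be written in a few lines. This is a minimal sketch assuming Euclidean distance and a plain majority vote; the function name and the toy reference vectors are made up for illustration.

```python
import math
from collections import Counter

def knn_classify(test_vector, reference_vectors, reference_labels, k=3):
    """Assign the majority label among the k nearest reference vectors."""
    # "training" is just storing the reference vectors; all work happens at query time
    distances = sorted(
        (math.dist(test_vector, ref), label)
        for ref, label in zip(reference_vectors, reference_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# toy 2-dimensional reference vectors with class labels
reference_vectors = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
reference_labels = ["speech", "speech", "speech", "music", "music", "music"]

label_near_speech = knn_classify((0.15, 0.15), reference_vectors, reference_labels, k=3)
label_near_music = knn_classify((0.95, 1.0), reference_vectors, reference_labels, k=5)
```

With two classes, an odd k avoids tied votes, which is one reason values such as 3, 5, and 7 are common choices.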

classifier: GMM
training: build model of each class distribution as superposition of Gaussian distributions
classification: compute output of each Gaussian and select class with highest probability
classifier data: per class per Gaussian: µ and covariance, mixture weight
matlab source: matlab/displaygmm.m
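The classification step of a GMM classifier could be sketched as follows for a one-dimensional feature. The sketch assumes the models have already been trained, assumes equal class priors, and uses made-up mixture weights, means, and variances; none of these numbers come from the lecture.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def gmm_likelihood(x, components):
    """Weighted sum of Gaussian outputs; components is a list of (weight, mean, variance)."""
    return sum(weight * gaussian_pdf(x, mean, variance) for weight, mean, variance in components)

def gmm_classify(x, class_models):
    """Select the class whose mixture model gives the highest likelihood."""
    return max(class_models, key=lambda cls: gmm_likelihood(x, class_models[cls]))

# hypothetical pre-trained models: two Gaussians per class, weights sum to 1
class_models = {
    "speech": [(0.6, 0.2, 0.01), (0.4, 0.4, 0.02)],
    "music": [(0.5, 0.7, 0.02), (0.5, 0.9, 0.01)],
}

label_low = gmm_classify(0.25, class_models)
label_high = gmm_classify(0.8, class_models)
```

For multi-dimensional features, the scalar variance would be replaced by a covariance matrix per Gaussian, which is why the stored classifier data lists a covariance.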

classifier: SVM
training:
- map features to high dimensional space
- find separating hyperplane (linear classification) through maximum distance of support vectors (data points)
classification: apply feature transform and proceed with linear classification
classifier data: support vectors, kernel, kernel parameters
https://en.wikipedia.org/wiki/Support_vector_machine
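The classification step of an already trained two-class SVM could be sketched as below. The sketch assumes an RBF kernel, so the mapping to the high dimensional space is implicit in the kernel evaluation. The support vectors, coefficients, bias, and kernel parameter are invented for illustration and do not come from a real training run.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def svm_decision(x, support_vectors, coefficients, bias, gamma):
    """Signed output of the linear classifier in the kernel-induced feature space."""
    return sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, coefficients)
    ) + bias

def svm_classify(x, support_vectors, coefficients, bias, gamma):
    """Map the sign of the decision value to a class label."""
    value = svm_decision(x, support_vectors, coefficients, bias, gamma)
    return "music" if value >= 0 else "speech"

# hypothetical classifier data: support vectors, signed coefficients, kernel parameter
support_vectors = [(0.2, 0.2), (0.8, 0.8)]
coefficients = [-1.0, 1.0]  # negative pulls toward "speech", positive toward "music"
bias = 0.0
gamma = 2.0

label_a = svm_classify((0.9, 0.7), support_vectors, coefficients, bias, gamma)
label_b = svm_classify((0.1, 0.3), support_vectors, coefficients, bias, gamma)
```

Only the support vectors, the kernel, and its parameters are needed at classification time, which matches the classifier data listed above.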

results
classification results depend on training set, test set, and number of classes
typical ranges: 10 classes → 50–80 %
note: results vary largely between datasets
- ill-defined genre boundaries
- non-uniformly distributed classes
- overfitting through songs from same album or artist
- ...

speech/music classification
baseline example
1. extract features
2. represent each file with its 2-dimensional feature vector
3. knn to classify unknown audio files
4. evaluate classification performance

speech/music classification example: features 1/2
for each audio file
1. split input signal into (overlapping) blocks
2. compute 2 feature series (spectral centroid, RMS)
3. aggregate feature series to one value each
   mean of Spectral Centroid
   \[ \mu_{SC} = \frac{1}{N} \sum_{n} v_{SC}(n) \]
   standard deviation of RMS
   \[ \sigma_{RMS} = \sqrt{\frac{1}{N} \sum_{n} \left( v_{RMS}(n) - \mu_{RMS} \right)^2} \]
4. represent each file as 2-dimensional vector \( (\mu_{SC}, \sigma_{RMS})^T \)
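The four steps above could be sketched as follows. The block length, hop size, sample rate, and function names are assumptions, and the spectral centroid is computed here as the magnitude-weighted mean frequency from a naive DFT, which is slow and only one of several definitions in use. The test signal is a synthetic sine wave, not data from the lecture.

```python
import cmath
import math
import statistics

def split_blocks(signal, block_len, hop):
    """Step 1: split the signal into overlapping blocks."""
    return [signal[i:i + block_len] for i in range(0, len(signal) - block_len + 1, hop)]

def rms(block):
    """Root mean square of one block."""
    return math.sqrt(sum(x * x for x in block) / len(block))

def spectral_centroid(block, sample_rate):
    """Magnitude-weighted mean frequency in Hz (naive DFT, for illustration only)."""
    n = len(block)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(block))
        mags.append(abs(coeff))
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total

def file_feature_vector(signal, sample_rate, block_len=64, hop=32):
    """Steps 2 to 4: feature series per block, then (mean SC, std RMS) per file."""
    blocks = split_blocks(signal, block_len, hop)
    v_sc = [spectral_centroid(b, sample_rate) for b in blocks]
    v_rms = [rms(b) for b in blocks]
    return statistics.fmean(v_sc), statistics.pstdev(v_rms)

# synthetic test signal: 1 kHz sine with constant amplitude, sampled at 8 kHz
sample_rate = 8000
signal = [0.5 * math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(512)]
mu_sc, sigma_rms = file_feature_vector(signal, sample_rate)
```

For this stationary sine, the mean spectral centroid lies at about 1000 Hz and the RMS barely varies between blocks, so the standard deviation of the RMS is close to zero.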

speech/music classification example: features 2/2
[Figure: scatter plot of music and speech files; axes: mean spectral centroid and std rms]
matlab source: matlab/displayscatter.m

speech/music classification example: training set
use dataset annotated as speech and music
requirements
- large compared to number of features
- representative for use case (diverse)
here:
- 110 speech files
- 119 music files
extract the features for the dataset

speech/music classification example: results (knn)
confusion matrix (rows: annotated class, columns: classified as):
          speech  music  # files
speech        93     17      110
music         19    100      119
classification rate: (100 + 93) / (110 + 119) ≈ 84.3 %
single feature classification results
- Spectral Centroid: 56.7 %
- RMS: 85.1 %
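The classification rate follows directly from the confusion matrix counts, as in this small sketch. The dictionary layout (annotated class first, classified class second) and the function names are assumptions; the four counts are the ones reported above.

```python
def classification_rate(confusion):
    """Correctly classified files (diagonal) divided by all files."""
    correct = sum(confusion[cls][cls] for cls in confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total

def per_class_rate(confusion):
    """Fraction of each annotated class that was classified correctly."""
    return {cls: confusion[cls][cls] / sum(confusion[cls].values()) for cls in confusion}

# counts from the kNN speech/music example: confusion[annotated][classified]
confusion = {
    "speech": {"speech": 93, "music": 17},
    "music": {"speech": 19, "music": 100},
}

rate = classification_rate(confusion)  # 193 / 229
class_rates = per_class_rate(confusion)
```

The per-class rates show whether errors are spread evenly or concentrated in one class, which a single overall rate hides when the classes differ in size.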

summary
lecture content
1. name three possible problems in the definition of the ground truth for genre classification
2. is it possible for genre classifiers to yield better accuracy than human experts?
3. list the feature processing steps from audio to the input of the classifier
