Features for Audio and Music Classification

Martin F. McKinney and Jeroen Breebaart
Auditory and Multisensory Perception, Digital Signal Processing Group
Philips Research Laboratories, Eindhoven, The Netherlands

Introduction
- Wanted: an automatic audio and music classifier
- Previous work:
  - Typical method: feature extraction followed by classification
  - The specific method of classification is not always crucial, i.e., features are the limiting factor
  - Temporal properties of audio are important for classification and summarization
- Our focus here is on features for audio classification and their temporal properties

Method: General
- Compare classification performance of four feature sets:
  - Standard low-level signal parameters (SLL)
  - Mel-frequency cepstral coefficients (MFCC)
  - Psychoacoustic features (PA)
  - Auditory filterbank temporal envelope (AFTE)
- Include statistics of feature temporal behavior as additional features
- Evaluate classification using a multivariate Gaussian framework (Quadratic Discriminant Analysis, QDA)
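The multivariate Gaussian framework named on this slide can be illustrated with a minimal QDA sketch: one Gaussian (mean and full covariance) is fitted per class, and a sample is assigned to the class with the highest log-posterior. This is not the authors' implementation. The class name `GaussianQDA`, the use of class-frequency priors, and the small covariance regularizer `reg` are assumptions made for illustration.

```python
import numpy as np


class GaussianQDA:
    """Minimal quadratic discriminant analysis (illustrative sketch)."""

    def fit(self, X, y, reg=1e-6):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            # Full per-class covariance; `reg` (an assumption) keeps it invertible.
            cov = np.atleast_2d(np.cov(Xc, rowvar=False)) + reg * np.eye(X.shape[1])
            precision = np.linalg.inv(cov)
            logdet = np.linalg.slogdet(cov)[1]
            # Prior taken from class frequencies (an assumption, not from the slides).
            log_prior = np.log(len(Xc) / len(X))
            self.params_.append((mean, precision, logdet, log_prior))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = []
        for mean, precision, logdet, log_prior in self.params_:
            d = X - mean
            # Squared Mahalanobis distance of every row to the class mean.
            maha = np.einsum("ij,jk,ik->i", d, precision, d)
            scores.append(-0.5 * (maha + logdet) + log_prior)
        return self.classes_[np.argmax(np.vstack(scores), axis=0)]
```

Because each class keeps its own covariance, the decision boundaries are quadratic rather than linear, which is what distinguishes QDA from linear discriminant analysis.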

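Method: Feature extraction
Processing chain:
1. 743-ms analysis frame, divided into 23-ms subframes
2. Feature extraction on each subframe, giving subframe feature vectors
3. Spectral feature modeling: the spectral feature model describes the temporal behavior of each feature in four modulation bands: 0 Hz, 1-2 Hz, 3-15 Hz, 20-43 Hz
4. Feature selection (9 best for maximum prediction on training data)
5. Final feature vector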
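The spectral feature modeling step can be sketched as follows: the sequence of values one feature takes across the subframes of an analysis frame is treated as a signal, its power spectrum is computed, and the power is summed in the four bands named on the slide. The function name and the feature sampling rate are assumptions. The slides do not state the subframe hop; a rate near 86 Hz (for example, 23-ms subframes with 50% overlap) would put the Nyquist limit at about 43 Hz, matching the top band.

```python
import numpy as np

# Modulation bands from the slide; "DC" is the 0 Hz bin only.
BANDS_HZ = {
    "DC": (0.0, 0.0),
    "1-2 Hz": (1.0, 2.0),
    "3-15 Hz": (3.0, 15.0),
    "20-43 Hz": (20.0, 43.0),
}


def modulation_band_energies(trajectory, feature_rate_hz):
    """Sum the power spectrum of one feature trajectory in each band.

    `trajectory` holds one feature value per subframe within an analysis
    frame; `feature_rate_hz` is the number of subframes per second
    (an assumed parameter, not given in the slides).
    """
    x = np.asarray(trajectory, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / feature_rate_hz)
    energies = {}
    for name, (lo, hi) in BANDS_HZ.items():
        mask = (freqs >= lo) & (freqs <= hi)
        energies[name] = float(power[mask].sum())
    return energies
```

With a 743-ms frame the frequency resolution is roughly 1/0.743 ≈ 1.35 Hz, so the 1-2 Hz band contains a single FFT bin in this sketch.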

Method: Classification
- Classification tasks:
  - Five-class general audio classification: classical music (35), popular music (188), speech (31), background noise (25), crowd noise (31)
  - Seven-class music genre classification: Jazz (38), Folk (23), Electronica (27), R&B (43), Rock (37), Reggae (11), Vocal (9)
- QDA training and cross-validation with the .632+ bootstrap method
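The .632+ bootstrap combines the optimistic training error with the pessimistic out-of-bag error, weighting them according to how strongly the classifier overfits. The sketch below follows the commonly used form of the estimator (Efron and Tibshirani) with clipped quantities; it is not taken from the slides. The function names, the number of bootstrap rounds, and the nearest-centroid stand-in classifier are all assumptions for illustration.

```python
import numpy as np


def fit_centroids(X, y):
    """Stand-in classifier (hypothetical): store one centroid per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])


def predict_centroids(model, X):
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]


def bootstrap_632_plus(X, y, fit, predict, n_boot=50, seed=0):
    """Estimate the prediction error with the .632+ bootstrap (sketch)."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = len(y)

    # Resubstitution error: train and test on the full data set.
    pred_full = predict(fit(X, y), X)
    err_train = float(np.mean(pred_full != y))

    # Out-of-bag error: each sample is scored only by models whose
    # bootstrap draw did not contain it.
    wrong, counts = np.zeros(n), np.zeros(n)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), idx)
        if oob.size == 0:
            continue
        model = fit(X[idx], y[idx])
        wrong[oob] += predict(model, X[oob]) != y[oob]
        counts[oob] += 1
    seen = counts > 0
    err_oob = float(np.mean(wrong[seen] / counts[seen]))

    # No-information error rate: expected error if labels and
    # predictions were independent.
    classes = np.unique(y)
    p = np.array([np.mean(y == c) for c in classes])
    q = np.array([np.mean(pred_full == c) for c in classes])
    gamma = float(np.sum(p * (1.0 - q)))

    # Relative overfitting rate R in [0, 1] and the resulting weight.
    err_oob = min(err_oob, gamma)
    if err_oob > err_train and gamma > err_train:
        R = (err_oob - err_train) / (gamma - err_train)
    else:
        R = 0.0
    w = 0.632 / (1.0 - 0.368 * R)
    return (1.0 - w) * err_train + w * err_oob
```

When there is no overfitting (R = 0) the weight reduces to 0.632, giving the plain .632 estimator; as R approaches 1 the estimate moves toward the out-of-bag error.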

Results: Standard Low Level features
Feature ranking table: ranks are given as (General Audio, Music Genre) for each feature in the columns DC, 1-2 Hz, 3-15 Hz, 20-43 Hz. Rows:
1. RMS level
2. Spectral centroid
3. Bandwidth
4. Zero crossing rate
5. Spectral roll-off freq
6. Band energy ratio
7. Delta spectrum mag.
8. Pitch
9. Pitch strength
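Several of the standard low-level features listed on this slide have widely used textbook definitions. The sketch below computes RMS level, zero crossing rate, spectral centroid, bandwidth and spectral roll-off for a single subframe. The exact definitions the authors used are not given in the slides, so the Hann window, the magnitude weighting and the 85% roll-off fraction are assumptions, as is the function name.

```python
import numpy as np


def low_level_features(subframe, sample_rate, rolloff_fraction=0.85):
    """Compute a few standard low-level features for one subframe (sketch)."""
    x = np.asarray(subframe, dtype=float)

    # RMS level of the time-domain samples.
    rms = float(np.sqrt(np.mean(x ** 2)))

    # Zero crossing rate: fraction of adjacent sample pairs with a sign change.
    signs = np.signbit(x)
    zcr = float(np.mean(signs[1:] != signs[:-1]))

    # Magnitude spectrum of the Hann-windowed subframe (window is an assumption).
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    cumulative = np.cumsum(mag)
    total = cumulative[-1]
    if total == 0.0:
        return {"rms": rms, "zcr": zcr, "centroid": 0.0,
                "bandwidth": 0.0, "rolloff": 0.0}

    # Centroid: magnitude-weighted mean frequency.
    centroid = float((freqs * mag).sum() / total)
    # Bandwidth: magnitude-weighted spread around the centroid.
    bandwidth = float(np.sqrt((((freqs - centroid) ** 2) * mag).sum() / total))
    # Roll-off: lowest frequency below which the chosen fraction of the
    # total magnitude lies (85% is an assumed value).
    rolloff = float(freqs[np.searchsorted(cumulative, rolloff_fraction * total)])

    return {"rms": rms, "zcr": zcr, "centroid": centroid,
            "bandwidth": bandwidth, "rolloff": rolloff}
```

Calling this on every 23-ms subframe of an analysis frame would yield one trajectory per feature, which is the input the temporal modeling stage needs.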

Results: Standard Low Level features
Classification with 9 best features. Per-class rates (diagonal of the confusion matrix, Real Class vs. Classification Result):
General Audio (86±4%)
  Clas   0.98 ±0.02
  Pop    0.83 ±0.03
  Spch   0.94 ±0.04
  Nse    0.6 ±0.12
  Crwd   0.97 ±0.02
Music Genre (61±11%)
  Jazz   0.64 ±0.1
  Folk   0.8 ±0.09
  Elct   0.51 ±0.15
  R&B    0.49 ±0.08
  Rock   0.76 ±0.07
  Regg   0.57 ±0.17
  Vocl   0.52 ±0.22
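Per-class rates like those on this slide are conventionally read off the diagonal of a row-normalized confusion matrix, where each row is a real class and each column a classification result. The sketch below shows that computation; the function name is hypothetical. The slides do not say how the overall percentage is derived, but for the general audio task the unweighted mean of the five rates (0.98, 0.83, 0.94, 0.6, 0.97) is 0.864, which is consistent with the quoted 86%.

```python
import numpy as np


def per_class_rates(y_true, y_pred, classes):
    """Return the row-normalized confusion matrix and its diagonal."""
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)))
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1  # rows: real class, columns: result
    row_sums = cm.sum(axis=1, keepdims=True)
    # Classes with no samples keep an all-zero row instead of dividing by zero.
    normalized = np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)
    return normalized, np.diag(normalized)
```

Averaging the diagonal without weighting treats every class equally, which matters here because the class sizes are very uneven (for example, 188 popular music excerpts against 25 background noise excerpts).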

Results: MFCC features
Feature ranking table: ranks are given as (General Audio, Music Genre) for each feature in the columns DC, 1-2 Hz, 3-15 Hz, 20-43 Hz. Rows:
1. MFCC 0
2. MFCC 1
3. MFCC 2
4. MFCC 3
5. MFCC 4
6. MFCC 5
7. MFCC 6
8. MFCC 7
9. MFCC 8
10. MFCC 9
11. MFCC 10
12. MFCC 11
13. MFCC 12

Results: MFCC features
Classification with 9 best features. Per-class rates (diagonal of the confusion matrix, Real Class vs. Classification Result):
General Audio (92±3%)
  Clas   0.89 ±0.05
  Pop    0.92 ±0.01
  Spch   0.97 ±0.02
  Nse    0.82 ±0.07
  Crwd   0.97 ±0.02
Music Genre (65±10%)
  Jazz   0.68 ±0.08
  Folk   0.83 ±0.07
  Elct   0.53 ±0.13
  R&B    0.46 ±0.09
  Rock   0.78 ±0.05
  Regg   0.54 ±0.16
  Vocl   0.73 ±0.2

Results: Psychoacoustic features
Feature ranking: General Audio, Music Genre
                          DC     1-2 Hz   3-15 Hz   20-43 Hz
1. Roughness              3, 2   N/A      N/A       N/A
2. Roughness Std. Dev.    7      N/A      N/A       N/A
3. Loudness               4, 5   8        6, 6      5, 4
4. Sharpness              2, 1   9, 7     1, 3      8, 9

Results: Psychoacoustic features
Classification with 9 best features. Per-class rates (diagonal of the confusion matrix, Real Class vs. Classification Result):
General Audio (92±3%)
  Clas   0.94 ±0.02
  Pop    0.85 ±0.02
  Spch   1 ±0
  Nse    0.89 ±0.05
  Crwd   0.9 ±0.03
Music Genre (62±10%)
  Jazz   0.63 ±0.08
  Folk   0.72 ±0.09
  Elct   0.71 ±0.09
  R&B    0.52 ±0.09
  Rock   0.69 ±0.08
  Regg   0.55 ±0.18
  Vocl   0.5 ±0.2

Results: AFTE features
Feature ranking table: ranks are given as (General Audio, Music Genre) for each feature in the columns DC, 3-15 Hz, 20-150 Hz, 150-1000 Hz. Rows:
1. AFTE 1 (Fc = 26 Hz)
2. AFTE 2 (Fc = 88 Hz)
3. AFTE 3 (Fc = 164 Hz)
4. AFTE 4 (Fc = 258 Hz)
7. AFTE 7 (Fc = 703 Hz)
8. AFTE 8 (Fc = 927 Hz)
9. AFTE 9 (Fc = 1206 Hz)
12. AFTE 12 (Fc = 2514 Hz)
16. AFTE 16 (Fc = 6279 Hz)
17. AFTE 17 (Fc = 7848 Hz)
18. AFTE 18 (Fc = 9795 Hz)

Results: AFTE features
Classification with 9 best features. Per-class rates (diagonal of the confusion matrix, Real Class vs. Classification Result):
General Audio (93±2%)
  Clas   0.94 ±0.01
  Pop    0.95 ±0.01
  Spch   0.97 ±0.02
  Nse    0.85 ±0.06
  Crwd   0.91 ±0.03
Music Genre (74±9%)
  Jazz   0.81 ±0.05
  Folk   0.84 ±0.06
  Elct   0.71 ±0.11
  R&B    0.68 ±0.07
  Rock   0.77 ±0.07
  Regg   0.61 ±0.17
  Vocl   0.76 ±0.16

Results Summary
                 SLL       MFCC      PA        AFTE
General Audio    86±4%     92±3%     92±3%     93±2%
Music Genre      61±11%    65±10%    62±10%    74±9%

Conclusions
- Classification based on features from an auditory model (AFTE) is better than that from other standard feature sets.
- Temporal modulations of features are important for audio and music classification.
- Feature development can improve audio and music classification.