Acoustic Scene Classification

Marc-Christoph Gerasch
Seminar Topics in Computer Music, 6/24/2015

Outline
- Acoustic Scene Classification: definition
- History and state of the art
- Two approaches
  o Statistical
  o Human-based
- Conclusion
- Further research
- Questions and answers

Acoustic Scene Classification
- Computational Auditory Scene Analysis (CASA)
- Classifying the environment of an audio recording
- Acoustic event classification
- Cherry (1953): the cocktail party problem
- Human vs. machine
- Applications:
  o Hearing aids
  o Speech recognition
  o Context-aware computing applications
(Image: Shutterstock.com)

History and state of the art
- 1932: Speech recognition at Bell Labs
- 1953: Cherry: the cocktail party problem
- 1982: David Marr: information processing of the brain from a computational view
- 1990: Bregman: Auditory Scene Analysis
- 1997: Development of digital hearing aids pushed CASA
- 1998: Sawhney and Maes: first dedicated CASA method
- Hidden Markov Models
- 2003: TRECVid started
- Mel Frequency Cepstral Coefficients
- 2013: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
- 2015: IEEE WASPAA (forthcoming)

Two approaches
Statistical:
- Pure physical information
- Low-level grouping
- Monaural
- Brute force: all data analysed
Human:
- Brainwork
- Low-level grouping
- High-level grouping
- Binaural
- Attention
- Filters (band-pass, ...)

Two approaches: similarities
- Preparation of the audio stream (e.g. windowing, ...)
- Physical features of the audio stream are extracted (e.g. MFCC, F0, ...)
- Events are hints to the scene
- Training and classification phases
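
Both approaches begin by cutting the recording into short, overlapping, windowed frames before any features are computed. A minimal framing sketch in Python (NumPy only; the frame length, hop size, and Hann window are illustrative choices, not settings taken from the cited papers):

```python
import numpy as np

def frame_signal(y, frame_len=1024, hop=512):
    """Cut a mono signal into overlapping, Hann-windowed frames.
    frame_len and hop are illustrative values."""
    assert len(y) >= frame_len, "signal shorter than one frame"
    window = np.hanning(frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([
        y[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return frames  # shape: (n_frames, frame_len)
```

Features such as MFCCs are then computed per frame, and the frame-level features feed the training and classification phase.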

Technical methods
- F0 (fundamental frequency)
  o Detection and summation of harmonics for finding F0
  o Used in speech recognition
  o Multi-speaker problem
- MFCC (Mel Frequency Cepstral Coefficients)
  o Transformation of audio originally invented for speech recognition
  o Mel: perceptual scale of pitches
  o Cepstrum: inverse Fourier transform (IFT) of the logarithm of the estimated spectrum of a signal
  o Makes it possible to separate vocal excitation (pitch) from the vocal tract (formants)
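
As an illustration of these two feature types (not the exact extractors used in the systems discussed on the following slides), the librosa library computes MFCCs and an F0 track in a few lines. Note that librosa.pyin implements probabilistic YIN rather than sub-harmonic summation, so it only stands in here as a readily available F0 estimator; the file name is a placeholder:

```python
import numpy as np
import librosa

# Load a mono recording (placeholder path).
y, sr = librosa.load("scene.wav", sr=None, mono=True)

# 13 MFCCs per frame (librosa's default frame and hop sizes).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# F0 track with a voicing probability per frame.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

print(mfcc.shape)      # (13, n_frames)
print(np.nanmean(f0))  # mean F0 over voiced frames
```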

Technical methods
- LPI (Latent Perceptual Indexing)
  o Similar to latent semantic indexing for text analysis
  o Brings out the superordinate attributes/key attributes
  o Suited for huge amounts of data
  o Needs a lot of training
- SVM (Support Vector Machine)
  o Representation of acoustic events as vectors
  o Certain vectors (support vectors) construct a hyperplane dividing scene classes
(Image: Ennepetaler86, www.wikipedia.org)
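
A minimal sketch of the SVM idea with scikit-learn: each recording is represented by a single feature vector, and a linear SVM learns the separating hyperplane between scene classes. The feature vectors and labels below are random placeholders standing in for real MFCC/F0 statistics:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder data: 200 recordings, 60-dimensional feature vectors, 5 scene classes.
X = rng.normal(size=(200, 60))
y = rng.integers(0, 5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="linear")   # the support vectors define the separating hyperplanes
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```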

Statistical approach (Geiger et al.)
- Audio preparation
  o Monaural
  o Overlapping windows
- openSMILE feature extractor
  o MFCC (Mel Frequency Cepstral Coefficients)
  o F0 (sub-harmonic summation and probability of voicing)
- Classification
  o SVM (Support Vector Machines)
  o LPI (Latent Perceptual Indexing)
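
The sketch below imitates the overall shape of such a pipeline (mono audio, framewise MFCCs, clip-level statistics, SVM) rather than the actual openSMILE feature set or the exact configuration of Geiger et al.; the file names, labels, and the mean/std statistics are illustrative assumptions:

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def clip_features(path, n_mfcc=13):
    """Framewise MFCCs reduced to clip-level mean and standard deviation.
    A small stand-in for the much larger openSMILE feature set."""
    y, sr = librosa.load(path, sr=None, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical labelled training clips.
train_files = ["bus_01.wav", "bus_02.wav", "park_01.wav",
               "park_02.wav", "office_01.wav", "office_02.wav"]
train_labels = ["bus", "bus", "park", "park", "office", "office"]

X = np.stack([clip_features(f) for f in train_files])
clf = SVC(kernel="linear").fit(X, train_labels)

print(clf.predict([clip_features("unknown_scene.wav")]))  # predicted scene label
```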

Statistical approach: results
- More training data needed for LPI
- SVM obtained the best results
- Window size matters
- MFCC features do most of the work (68% with SVM alone)
- 71% on training data
- 69% on evaluation data

Human-based approach (Kalinli et al.)
- How does the ear perceive sounds?
- What happens in the brain while listening?
- What influence does experience have?
- How does attention work?
- LISA (Latent Indexing using SAliency)

Human-based approach: sound perception
Human:
- Usually two ears (binaural hearing)
- Sounds have spectral harmonics
- Frequency-dependent perception in the cochlea
Implementation:
- Two microphones
- F0
- Band-pass filter
- Noise reduction: constant noises are partially suppressed
(Image: "Anatomy of the Human Ear", A. Brockmann)
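
A minimal sketch of such a front-end using SciPy: a Butterworth band-pass roughly restricting the signal to the perceptually useful range, plus a crude spectral-subtraction style attenuation of constant background noise. The cutoff frequencies and the median-based noise estimate are illustrative assumptions, not values from Kalinli et al.:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, stft, istft

def bandpass(y, sr, lo=100.0, hi=8000.0, order=4):
    """Butterworth band-pass; lo/hi are illustrative cutoffs (hi must be < sr/2)."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, y)

def suppress_constant_noise(y, sr, strength=1.0):
    """Crude noise reduction: treat the median magnitude per frequency bin
    as the constant background and subtract it from every frame."""
    f, t, Z = stft(y, fs=sr, nperseg=1024)
    mag, phase = np.abs(Z), np.angle(Z)
    noise = np.median(mag, axis=1, keepdims=True)
    clean = np.maximum(mag - strength * noise, 0.0) * np.exp(1j * phase)
    _, y_clean = istft(clean, fs=sr, nperseg=1024)
    return y_clean
```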

Human-based approach: brainwork
Human:
- Auditory cortex (feature extraction)
- Comparison and grouping of cues
- Experience
- Information storage
Implementation:
- MFCC, F0
- High-level cue grouping
- Context awareness
- Neural network
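
As a toy illustration of the last implementation item (a neural network taking over the high-level grouping that the brain performs on extracted cues), scikit-learn's MLPClassifier can map feature vectors such as MFCC and F0 statistics to scene labels; the data below is random placeholder material:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 26))    # e.g. MFCC mean/std plus F0 statistics per clip
y = rng.integers(0, 4, size=300)  # 4 placeholder scene classes

net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=1)
net.fit(X, y)
print(net.predict(X[:5]))
```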

Human-based approach: attention
Human:
- Attention works like a spotlight
- Suppression of noise without attention (binaural); a plain microphone, by contrast, captures just cacophony
- Direction and movement detection (binaural)
Implementation:
- Salient event detector
- Saliency feature filter
  o Intensity
  o Frequency contrast
  o Temporal contrast
  o Orientations/latency
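
To make the saliency features more concrete, the sketch below derives simple stand-ins from a mel spectrogram: per-frame intensity, frequency contrast (differences along the frequency axis), and temporal contrast (differences along the time axis). These are rough analogues of the kinds of cues Kalinli et al. describe, not their actual auditory saliency filters, and the orientation/latency filters are omitted:

```python
import numpy as np
import librosa

def saliency_like_features(y, sr):
    """Per-frame stand-ins for intensity, frequency contrast and temporal
    contrast on a mel spectrogram; peaks mark candidate salient events."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    log_S = librosa.power_to_db(S)  # (n_mels, n_frames)

    intensity = log_S.mean(axis=0)                                # overall energy
    freq_contrast = np.abs(np.diff(log_S, axis=0)).mean(axis=0)   # change across bands
    temp_contrast = np.abs(
        np.diff(log_S, axis=1, prepend=log_S[:, :1])
    ).mean(axis=0)                                                # change over time

    return np.stack([intensity, freq_contrast, temp_contrast])  # (3, n_frames)
```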

Human-based approach: results
- The goal was not to reach the best results
- Comparison: LISA vs. baseline (40%)
- 74% of the data can be discarded while improving results (50% using the top 35 salient events)
- Up to 98% of the data can be discarded while matching baseline results (40% using the top 10 salient events)

Conclusion
- Basic methods are similar (MFCC, LPI, ...)
- Different audio databases, so no direct comparison
- Statistical methods seem to be more accurate
- Methods that mimic human hearing vastly reduce data and computing effort
- Neither approach reaches mean human accuracy (71%)

Further research
- Algorithms for devices with limited computational power
- Independent systems for unlabelled scenes
- Including external information, e.g. geolocation

References
- Daniele Barchiesi, Dimitrios Giannoulis, Dan Stowell, Mark D. Plumbley. Acoustic Scene Classification. School of Electronic Engineering and Computer Science, November 17, 2014.
- Ozlem Kalinli, Shiva Sundaram, Shrikanth Narayanan. Saliency-Driven Unstructured Acoustic Scene Classification Using Latent Perceptual Indexing. MMSP '09, October 5-7, 2009.
- Jürgen T. Geiger, Björn Schuller, Gerhard Rigoll. Large-Scale Audio Feature Extraction and SVM for Acoustic Scene Classification. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), October 20-23, 2013, New Paltz, NY.
- Malcolm Slaney. The History and Future of CASA. In Perspectives on Speech Separation, ed. P. Divenyi, Kluwer, 2006.
- DeLiang Wang, Guy J. Brown. Fundamentals of Computational Auditory Scene Analysis. 2006.
- Ben Milner, Dan Smith. Acoustic Environment Classification. ACM Transactions on Speech and Language Processing, Vol. 3, No. 2, July 2006, pages 1-22.