GCT535 - Sound Technology for Multimedia: Timbre Analysis. Graduate School of Culture Technology, KAIST. Juhan Nam

Outline: Timbre Analysis
- Definition of timbre
- Timbre features: zero-crossing rate, spectral summary features, Mel-frequency cepstral coefficients (MFCC)

What is timbre?
- Definition: the attribute of auditory sensation in terms of which a listener can judge that two sounds having the same loudness and pitch are dissimilar (ANSI); the tone color or quality that defines a particular sound.
- Associated with classifying or identifying sound sources. Class: piano, guitar, singing voice, engine sound. Identity: Steinway Model D, Fender Stratocaster, Michael Jackson, Harley-Davidson.
- Also used to holistically describe polyphonic sounds, for example music or environmental sounds; associated with genre, mood, or other high-level descriptions.

What is timbre?
- Timbre is a vague concept: there is no single quantitative scale like loudness or pitch; it actually comprises multiple attributes.
- Different aspects of this multiplicity: acoustic attributes (temporal or spectral factors), timbre space (perceptual similarity/dissimilarity), and semantic attributes (textual descriptions).

Acoustic Attributes in Timbre Perception (Schouten, 1968)
- Harmonicity: the range between tonal and noise-like character
- Time envelope (ADSR: attack, decay, sustain, release)
- Spectral envelope
- Changes of the spectral envelope and fundamental frequency over time
- The onset of a sound, which differs notably from the sustained vibration

Acoustic Attributes in Timbre Perception: a sound design problem?

Timbre Space
- Perceptual multi-dimensional attributes obtained by measuring similarity: listeners hear pairs of sounds and rate the degree of similarity between them.
- The resulting (dis)similarity matrix is processed with multidimensional scaling (MDS), a dimensionality reduction algorithm that determines the timbre space.
- Acoustic correlates of the three (reduced) dimensions: spectral energy distribution, attack and decay time, and the amount of inharmonic sound in the attack (Grey, 1977).
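As a rough illustration of this procedure, the sketch below runs metric MDS from scikit-learn on a small, made-up dissimilarity matrix; the ratings and the choice of three dimensions are assumptions for the example, not Grey's data.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical pairwise dissimilarity ratings for 4 instrument tones
# (symmetric, zero diagonal); real data would come from listening tests.
dissimilarity = np.array([
    [0.0, 2.1, 4.5, 3.0],
    [2.1, 0.0, 3.8, 2.6],
    [4.5, 3.8, 0.0, 1.9],
    [3.0, 2.6, 1.9, 0.0],
])

# Metric MDS on the precomputed dissimilarities, reduced to 3 dimensions.
mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
timbre_space = mds.fit_transform(dissimilarity)  # shape: (4, 3)
print(timbre_space)
```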

Semantic Attributes
- Different characteristics of timbre are described verbally with word pairs.
- Dull/Brilliant, Cold/Warm, Pure/Rich (Pratt and Doak, 1976)
- Dull/Sharp, Compact/Scattered, Full/Empty, Colorful/Colorless (von Bismarck, 1974)
(adapted from T. Rossing's Music 150 slides)

Timbre Feature Extraction
Extracting acoustic features from signals. Low-level acoustic features:
- Zero-crossing rate
- Spectral summary features
- Spectral envelope: MFCC

Zero-Crossing Rate (ZCR)
- ZCR is low for harmonic (voiced) sounds and high for noisy (unvoiced) sounds.
- For simple periodic signals, it is related to the fundamental frequency (F0).
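A minimal NumPy sketch of the ZCR, applied to synthetic voiced-like and unvoiced-like signals; the 220 Hz sine and the noise are illustrative assumptions.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    signs[signs == 0] = 1               # treat exact zeros as positive
    return np.mean(signs[1:] != signs[:-1])

sr = 16000
t = np.arange(sr) / sr
voiced = np.sin(2 * np.pi * 220 * t)    # periodic: low ZCR (~0.03)
unvoiced = np.random.randn(sr)          # noise-like: high ZCR (~0.5)
print(zero_crossing_rate(voiced), zero_crossing_rate(unvoiced))
```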

Spectral Summary Features
- Spectral centroid: the center of gravity of the spectrum, associated with the brightness of sounds:
  SC(t) = \frac{\sum_k f_k |X_t(k)|}{\sum_k |X_t(k)|}
- Spectral roll-off: the frequency R_t below which 85% (or 95%) of the spectral energy is concentrated:
  \sum_{k=1}^{R_t} |X_t(k)| = 0.85 \sum_{k=1}^{N} |X_t(k)|
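A possible NumPy implementation of these two summaries, applied to a placeholder noise frame; the window and FFT size are arbitrary choices for the example.

```python
import numpy as np

def spectral_centroid(mag, freqs):
    """Center of gravity of the magnitude spectrum (Hz)."""
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)

def spectral_rolloff(mag, freqs, ratio=0.85):
    """Frequency below which `ratio` of the spectral energy is concentrated."""
    cumulative = np.cumsum(mag)
    idx = np.searchsorted(cumulative, ratio * cumulative[-1])
    return freqs[idx]

sr, n_fft = 22050, 2048
frame = np.random.randn(n_fft)                        # placeholder audio frame
mag = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
print(spectral_centroid(mag, freqs), spectral_rolloff(mag, freqs))
```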

Spectral Summary Features
- Spectral spread (SS): a measure of the bandwidth of the spectrum around the centroid:
  SS(t) = \sqrt{\frac{\sum_k (f_k - SC(t))^2 |X_t(k)|}{\sum_k |X_t(k)|}}
- Spectral flatness (SF): a measure of the noisiness of the spectrum, the ratio between the geometric and arithmetic means of the magnitude spectrum:
  SF(t) = \frac{\left(\prod_{k=1}^{K} |X_t(k)|\right)^{1/K}}{\frac{1}{K}\sum_{k=1}^{K} |X_t(k)|}
  Examples: white noise → 1, pure tone → 0
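The same style of sketch for spread and flatness; the square root in the spread follows the standard-deviation reading of the formula above.

```python
import numpy as np

def spectral_spread(mag, freqs):
    """Magnitude-weighted standard deviation around the spectral centroid (Hz)."""
    centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
    variance = np.sum(((freqs - centroid) ** 2) * mag) / (np.sum(mag) + 1e-12)
    return np.sqrt(variance)

def spectral_flatness(mag, eps=1e-12):
    """Geometric mean over arithmetic mean of the magnitude spectrum."""
    geometric = np.exp(np.mean(np.log(mag + eps)))
    arithmetic = np.mean(mag) + eps
    return geometric / arithmetic

mag = np.abs(np.fft.rfft(np.random.randn(2048)))
freqs = np.fft.rfftfreq(2048, d=1.0 / 22050)
print(spectral_spread(mag, freqs), spectral_flatness(mag))  # noise: flatness near 1
```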

Examples of Spectral Centroids
[Two spectrograms, frequency (Hz) vs. time (sec)]
Classical: Beethoven string quartet. Pop: "Video Killed the Radio Star".

Mel-Frequency Cepstral Coefficients (MFCC)
- The most popular audio feature for extracting the spectral envelope from an audio frame; the standard audio feature in speech recognition, introduced to the music domain by Logan (2000).
- Computation steps: DFT of the audio frame → map the frequency scale to mel → log magnitude → DCT.
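A step-by-step sketch of this pipeline, assuming librosa is available for the mel filterbank; the 220 Hz test tone, 40 mel bands, and 13 coefficients are illustrative choices.

```python
import numpy as np
import librosa
import scipy.fft

sr, n_fft = 22050, 2048
t = np.arange(n_fft) / sr
frame = np.sin(2 * np.pi * 220 * t) * np.hanning(n_fft)  # placeholder analysis frame

# 1. DFT of the windowed frame
spectrum = np.abs(np.fft.rfft(frame))

# 2. Map the linear frequency axis to the mel scale (40 triangular filters)
mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=40)
mel_spectrum = mel_fb @ spectrum

# 3. Log magnitude
log_mel = np.log(mel_spectrum + 1e-10)

# 4. DCT, keeping the first 13 coefficients
mfcc = scipy.fft.dct(log_mel, type=2, norm="ortho")[:13]
print(mfcc.shape)  # (13,)
```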

Mel-Frequency Spectrogram
- Convert the linear frequency axis to the mel scale, which usually reduces the dimensionality of the spectrum.
[Panels: spectrum; spectrum (mel-scaled)]
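For reference, a from-scratch triangular mel filterbank; the HTK-style mel formula used here is an assumption, since the slide does not specify a particular mel variant.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (an assumed variant for this sketch)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=40):
    """Triangular filters spaced evenly on the mel scale."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bin_edges = np.floor((n_fft // 2) * mel_to_hz(mel_edges) / (sr / 2)).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bin_edges[i], bin_edges[i + 1], bin_edges[i + 2]
        fb[i, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fb[i, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    return fb

# A 1025-bin spectrum (n_fft = 2048) is reduced to 40 mel bands.
fb = mel_filterbank(sr=22050, n_fft=2048)
print(fb.shape)  # (40, 1025)
```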

Discrete Cosine Transform (DCT)
- A real-valued transform, similar to the DFT.
- Decorrelates the mel-scaled log spectrum and reduces the dimensionality again:
  X_{DCT}(k) = \sqrt{\frac{2}{N}} \sum_{n=1}^{N} x(n) \cos\left( \frac{\pi k}{N} (n - 0.5) \right)
[Panels: spectrum (mel-scaled); MFCC]
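A direct, naive implementation of this formula, checked against SciPy's orthonormal DCT-II; the k = 0 coefficient is the only special case in the scaling.

```python
import numpy as np
import scipy.fft

def dct_naive(x):
    """Orthonormal DCT-II computed straight from the slide's formula."""
    N = len(x)
    n = np.arange(1, N + 1)                       # 1-indexed samples
    X = np.array([np.sqrt(2.0 / N) * np.sum(x * np.cos(np.pi * k / N * (n - 0.5)))
                  for k in range(N)])
    X[0] /= np.sqrt(2.0)                          # k = 0 uses a 1/sqrt(N) factor instead
    return X

x = np.random.randn(16)
assert np.allclose(dct_naive(x), scipy.fft.dct(x, type=2, norm="ortho"))
```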

Reconstructed Frequency Spectrum from MFCC
[Panels: frequency spectrum (512 bins); frequency spectrum (mel-scaled, 60 bins); MFCC (13 dimensions); reconstructed frequency spectrum (mel-scaled); reconstructed frequency spectrum]
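A rough sketch of such a reconstruction, reusing the `mfcc` and `mel_fb` variables from the earlier MFCC sketch (both are assumptions of this example); zero-padding the coefficients and pseudo-inverting the filterbank gives only an approximate, smoothed inverse.

```python
import numpy as np
import scipy.fft

# 1. Undo the DCT: zero-pad the 13 coefficients back to 40 mel bands.
padded = np.zeros(40)
padded[:13] = mfcc
log_mel_rec = scipy.fft.idct(padded, type=2, norm="ortho")

# 2. Undo the log to get a smoothed mel spectrum.
mel_rec = np.exp(log_mel_rec)

# 3. Map back to the linear frequency axis with the filterbank pseudo-inverse.
spectrum_rec = np.linalg.pinv(mel_fb) @ mel_rec
spectrum_rec = np.maximum(spectrum_rec, 0.0)      # clip negative leakage
```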

Comparison of Spectrogram and MFCC
[Panels: spectrogram; mel-frequency spectrogram; MFCC; spectrogram reconstructed from MFCC]

Sound Examples of MFCC
- Original recording
- MFCC-based reconstruction (using white noise as the excitation source)

Post-processing
- Adding temporal dynamics: the short-term dynamics of features are characterized with delta and double-delta coefficients:
  \Delta x(n) = \frac{x(n) - x(n - h)}{h}, \quad \Delta\Delta x(n) = \frac{\Delta x(n) - \Delta x(n - h)}{h}
  Speech recognition typically uses 39 coefficients per frame: 13 MFCCs + 13 deltas + 13 double-deltas.
- Normalization: Cepstral Mean Subtraction (CMS) subtracts the mean over surrounding frames; standardization subtracts the mean and divides by the standard deviation.
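A small NumPy sketch of these post-processing steps; the simple one-sided difference mirrors the formula above, while real systems often use a regression-based delta.

```python
import numpy as np

def delta(features, h=1):
    """First-order difference along time; features has shape (n_frames, n_coeffs)."""
    d = np.zeros_like(features)
    d[h:] = (features[h:] - features[:-h]) / h
    return d

def cepstral_mean_subtraction(features):
    """Subtract the per-coefficient mean over all frames."""
    return features - features.mean(axis=0, keepdims=True)

def standardize(features, eps=1e-8):
    """Zero mean, unit standard deviation per coefficient."""
    return (features - features.mean(axis=0)) / (features.std(axis=0) + eps)

mfccs = np.random.randn(100, 13)                  # placeholder: 100 frames x 13 MFCCs
full = np.hstack([mfccs, delta(mfccs), delta(delta(mfccs))])
print(full.shape)                                 # (100, 39), as in speech recognition
```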

Applications
- Music: musical instrument classification, music genre/mood classification, similarity-based audio retrieval
- Speech: speech recognition, speaker recognition

References
- J. Grey, "Multidimensional Perceptual Scaling of Musical Timbre," 1977.
- D. Wessel, "Timbre Space as a Musical Control Structure," 1979.
- S. Donnadieu, "Mental Representation of the Timbre of Complex Sounds," chapter 8 in Analysis, Synthesis, and Perception of Musical Sounds, ed. J. Beauchamp, 2007.
- B. Logan, "Mel Frequency Cepstral Coefficients for Music Modeling," 2000.