Book: Fundamentals of Music Processing. Audio Features.

Lecture: Music Processing, Audio Features. Meinard Müller, International Audio Laboratories Erlangen, meinard.mueller@audiolabs-erlangen.de

Book: Meinard Müller, Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications. Springer, 2015. 483 p., 249 illus., hardcover, ISBN 978-3-319-21944-8. Accompanying website: www.music-processing.de

Chapter 2: Fourier Analysis of Signals
2.1 The Fourier Transform in a Nutshell
2.2 Signals and Signal Spaces
2.3 Fourier Transform
2.4 Discrete Fourier Transform (DFT)
2.5 Short-Time Fourier Transform (STFT)
2.6 Further Notes

Important technical terminology is covered in Chapter 2. In particular, we approach the Fourier transform, which is perhaps the most fundamental tool in signal processing, from various perspectives. For the reader who is more interested in the musical aspects of the book, Section 2.1 provides a summary of the most important facts on the Fourier transform. In particular, it introduces the notion of a spectrogram, which yields a time-frequency representation of an audio signal. The remainder of the chapter treats the Fourier transform in greater mathematical depth and also covers the fast Fourier transform (FFT), an algorithm of great beauty and high practical relevance.

Chapter 3: Music Synchronization
3.1 Audio Features
3.2 Dynamic Time Warping
3.3 Applications
3.4 Further Notes

As a first music processing task, we study in Chapter 3 the problem of music synchronization. The objective is to temporally align compatible representations of the same piece of music. Considering this scenario, we explain the need for musically informed audio features. In particular, we introduce the concept of chroma-based music features, which capture properties related to harmony and melody. Furthermore, we study an alignment technique known as dynamic time warping (DTW), a concept that is applicable to the analysis of general time series. For its efficient computation, we discuss an algorithm based on dynamic programming, a widely used method for solving a complex problem by breaking it down into a collection of simpler subproblems.
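The dynamic-programming idea behind DTW can be illustrated in a few lines. This is a minimal sketch, not the book's exact formulation: it assumes the step sizes (1, 0), (0, 1), and (1, 1), a user-supplied local cost function, and the hypothetical function name `dtw_cost`; it returns only the total cost of an optimal warping path.

```python
def dtw_cost(x, y, dist):
    """Total cost of an optimal warping path between sequences x and y.

    D[i][j] holds the minimal accumulated cost of aligning x[0..i] with y[0..j];
    the allowed steps are (1, 0), (0, 1) and (1, 1) (an assumption of this sketch).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = dist(x[i], y[j])
            if i == 0 and j == 0:
                D[i][j] = cost
                continue
            # Each cell reuses the solutions of three smaller subproblems.
            best_prev = min(
                D[i - 1][j] if i > 0 else inf,
                D[i][j - 1] if j > 0 else inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            D[i][j] = cost + best_prev
    return D[n - 1][m - 1]


def absolute_difference(a, b):
    return abs(a - b)


# A "slow" and a "fast" rendition of the same contour align at zero cost.
print(dtw_cost([1, 2, 3], [1, 2, 2, 3], absolute_difference))  # 0
```

Filling the matrix row by row costs O(n·m) operations instead of enumerating all warping paths explicitly.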

Idea: Decompose a given signal into a superposition of sinusoids (elementary signals). Each sinusoid has a physical meaning and can be described by three parameters: amplitude, frequency, and phase. The Fourier transform maps the signal to this sinusoidal representation.

Examples (figures): superposition of two sinusoids; the note C4 played by piano, trumpet, violin, and flute; speech ("Bonn", "Zürich"); C-major scale played on a piano.
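The decomposition idea can be checked numerically on a toy superposition of two sinusoids. This sketch assumes the convention amplitude · cos(2π(ω·t − φ)), a sampling rate of 64 Hz, a duration of one second, and a naive DFT instead of an FFT; all names are hypothetical. The two frequencies reappear as the only nonzero coefficients.

```python
import cmath
import math


def sinusoid(amplitude, freq, phase, fs, num_samples):
    """Samples of amplitude * cos(2*pi*(freq*t - phase)) at t = n / fs."""
    return [amplitude * math.cos(2 * math.pi * (freq * n / fs - phase))
            for n in range(num_samples)]


def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for toy sizes)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


fs = N = 64  # toy values: 64 samples at 64 Hz, i.e. one second
x = [a + b for a, b in zip(sinusoid(1.0, 5, 0.0, fs, N),
                           sinusoid(0.5, 12, 0.25, fs, N))]

# For a real signal only the first N/2 + 1 coefficients are needed;
# coefficient k corresponds to the frequency k * fs / N Hz.
mags = [abs(c) for c in dft(x)[: N // 2 + 1]]
peaks = [k for k, m in enumerate(mags) if m > 1e-6 * N]
print(peaks)  # [5, 12]
```

The magnitudes at bins 5 and 12 are proportional to the amplitudes (32 and 16 here, i.e. amplitude · N/2), while the phase φ = 0.25 of the second sinusoid only affects the angle of its coefficient.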

Example (figure): chirp signal.

Complex formulation of sinusoids: using polar coordinates in the complex plane (real part Re, imaginary part Im), a complex number is described by its absolute value and its angle. In the Fourier representation, the absolute value encodes the amplitude and the angle encodes the phase of the corresponding sinusoid.

The Fourier transform tells which frequencies occur, but it does not tell when they occur: the frequency information is averaged over the entire time interval, and the time information is hidden in the phase.

Short-Time Fourier Transform (STFT). Idea (Dennis Gabor, 1946): consider only a small section of the signal for the spectral analysis in order to recover time information. The section is determined by pointwise multiplication of the signal with a localizing window function.
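The statement that time information is hidden in the phase can be made concrete with the DFT shift property: circularly shifting a signal leaves all magnitudes unchanged and only multiplies each coefficient by a phase factor. A minimal sketch; the toy burst, the frame length N = 32, and the shift of 10 samples are arbitrary assumptions.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


N, shift = 32, 10
# A short tone burst occupying samples 4..11 of an otherwise silent frame.
x = [math.sin(2 * math.pi * 3 * n / N) if 4 <= n < 12 else 0.0 for n in range(N)]
# The same burst occurring 10 samples later (circular shift): y[n] = x[n - shift].
y = x[-shift:] + x[:-shift]

X, Y = dft(x), dft(y)
# Same magnitudes: the magnitude spectrum does not reveal when the burst occurs ...
same_magnitudes = all(abs(abs(a) - abs(b)) < 1e-9 for a, b in zip(X, Y))
# ... the time shift only appears as the phase factor exp(-2*pi*i*k*shift/N).
phase_encodes_shift = all(
    abs(Y[k] - X[k] * cmath.exp(-2j * math.pi * k * shift / N)) < 1e-9
    for k in range(N))
print(same_magnitudes, phase_encodes_shift)  # True True
```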


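Definition of the STFT: Let $f:\mathbb{R}\to\mathbb{R}$ be a signal and $g:\mathbb{R}\to\mathbb{R}$ a window function. For $(t,\omega)\in\mathbb{R}\times\mathbb{R}$, the STFT of $f$ with respect to $g$ is defined as

$$\tilde{f}(t,\omega) := \int_{\mathbb{R}} f(u)\,\overline{g(u-t)}\,e^{-2\pi i\omega u}\,du = \langle f \mid g_{t,\omega}\rangle, \qquad g_{t,\omega}(u) := g(u-t)\,e^{2\pi i\omega u}.$$

Intuition: $g_{t,\omega}$ is a "musical note" of frequency $\omega$ that oscillates within the window translated by $t$. The inner product $\langle f \mid g_{t,\omega}\rangle$ measures the correlation between this musical note and the signal.

Window functions (figures): box window; triangle window.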
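A discrete analogue of the inner-product intuition: correlating the signal with a windowed "musical note" yields a large value only where the note's frequency matches the signal's local frequency. This sketch assumes a toy sampling rate of 8000 Hz, a box window of 400 samples (50 ms) starting at sample m, and the hypothetical helper name `stft_coefficient`.

```python
import cmath
import math


def stft_coefficient(x, fs, window, m, freq):
    """Discrete analogue of <f | g_{t,omega}>: correlate x with a "musical note" of
    frequency freq (Hz) that oscillates inside the window translated to sample m.
    The window is real-valued, so its complex conjugate is the window itself."""
    total = 0j
    for j, w in enumerate(window):
        n = m + j
        if 0 <= n < len(x):
            total += x[n] * w * cmath.exp(-2j * math.pi * freq * n / fs)
    return total


fs = 8000
# Toy signal: 200 Hz during the first half second, 400 Hz during the second.
x = [math.cos(2 * math.pi * (200 if n < fs // 2 else 400) * n / fs) for n in range(fs)]
box = [1.0] * 400  # box window of 400 samples = 50 ms

for m in (1000, 5000):
    print(m, round(abs(stft_coefficient(x, fs, box, m, 200))),
          round(abs(stft_coefficient(x, fs, box, m, 400))))
# 1000 200 0
# 5000 0 200
```

Because both frequencies complete whole periods within the 400-sample window, the mismatched note correlates to (numerically) zero; with arbitrary frequencies the mismatch would be small but nonzero.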

Window functions (figure, continued): Hann window. The choice of the window is a trade-off between smoothing and ringing.

Time-frequency representation (figures): a spectrogram displays the intensity (in dB) of the STFT over time (seconds) and frequency (Hz). Shown are a chirp signal and its STFT computed with a box window of length 0.05 seconds and with a Hann window of length 0.05 seconds.
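The smoothing-versus-ringing trade-off can be quantified by the spectrum of each window: a wider main lobe means more smoothing, higher sidelobes mean more ringing (leakage). This sketch assumes the symmetric window definitions in the comments (other variants, such as the periodic Hann window, exist), a window length of 32 samples, and 16-fold zero padding; all names are hypothetical.

```python
import cmath
import math


def box_window(N):
    return [1.0] * N


def triangle_window(N):
    # Symmetric triangle with zeros at both ends (one common definition).
    return [1.0 - abs(2.0 * n / (N - 1) - 1.0) for n in range(N)]


def hann_window(N):
    # Symmetric Hann window (one common definition).
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def magnitude_spectrum(window, pad):
    """|DFT| of the window zero-padded to pad * len(window) points (bins 0..L/2)."""
    L = pad * len(window)
    return [abs(sum(w * cmath.exp(-2j * math.pi * k * n / L) for n, w in enumerate(window)))
            for k in range(L // 2 + 1)]


def main_lobe_and_sidelobe(window, pad=16):
    """Return (main-lobe half-width in padded bins, peak sidelobe level in dB)."""
    mags = magnitude_spectrum(window, pad)
    k = 0
    while k + 1 < len(mags) and mags[k + 1] <= mags[k]:
        k += 1  # walk down the main lobe to its first local minimum
    return k, 20 * math.log10(max(mags[k:]) / mags[0])


for name, make in [("box", box_window), ("triangle", triangle_window), ("hann", hann_window)]:
    width, sidelobe_db = main_lobe_and_sidelobe(make(32))
    print(f"{name:8s} main-lobe half-width {width:3d} bins, peak sidelobe {sidelobe_db:6.1f} dB")
```

The box window has the narrowest main lobe but the highest sidelobes (about -13 dB), whereas the Hann window trades a wider main lobe for much lower sidelobes.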

Time-frequency localization (figures): a signal and its STFT with a Hann window of length 0.02 seconds and with a Hann window of length 0.1 seconds. The size of the window constitutes a trade-off between time resolution and frequency resolution:
- Large window: poor time resolution, good frequency resolution.
- Small window: good time resolution, poor frequency resolution.
Heisenberg uncertainty principle: there is no window function that localizes in time and frequency with arbitrary precision.

Short-Time Fourier Transform in MATLAB: function SPECTROGRAM
- N = window length (in samples)
- M = overlap (usually M = N/2)
- Compute the DFT of size N for every windowed section.
- Keep the lower N/2 + 1 Fourier coefficients (for real-valued signals, the remaining coefficients are redundant due to symmetry).
- Result: a sequence of spectral vectors, one vector of dimension N/2 + 1 for each window.

Example: Let x be a discrete-time signal with sampling rate F_s, window length N, overlap M, and hop size H = N - M. Then the time resolution is H/F_s seconds (the distance between consecutive windows), and the frequency resolution is F_s/N Hz: the k-th Fourier coefficient corresponds to the frequency k · F_s/N Hz. Each window spans N/F_s seconds.
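The spectrogram recipe above can be sketched in plain Python. This is not a reimplementation of MATLAB's SPECTROGRAM; it assumes a periodic Hann window, a naive per-frame DFT, toy values F_s = 1000 Hz, N = 100, M = 50, and the hypothetical function name `spectrogram`.

```python
import cmath
import math


def spectrogram(x, N, M):
    """Magnitude STFT: frames of N samples with overlap M (hop size H = N - M),
    a window per frame, a size-N DFT per frame, and only the lower
    N // 2 + 1 coefficients kept."""
    H = N - M
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # periodic Hann
    frames = []
    for start in range(0, len(x) - N + 1, H):
        segment = [x[start + n] * window[n] for n in range(N)]
        frames.append([abs(sum(segment[n] * cmath.exp(-2j * math.pi * k * n / N)
                               for n in range(N)))
                       for k in range(N // 2 + 1)])
    return frames


fs, N, M = 1000, 100, 50          # toy values
H = N - M
time_resolution = H / fs          # 0.05 seconds between spectral vectors
frequency_resolution = fs / N     # 10.0 Hz between Fourier coefficients
x = [math.cos(2 * math.pi * 100 * n / fs) for n in range(fs)]  # 1 s of a 100 Hz tone
S = spectrogram(x, N, M)
peak_bins = {max(range(len(v)), key=v.__getitem__) for v in S}
print(len(S), len(S[0]), peak_bins)  # 19 51 {10}
```

Every spectral vector peaks at bin k = 10, i.e. at 10 · F_s/N = 100 Hz, the frequency of the tone.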

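Model assumption: equal-tempered scale.
- MIDI pitches: p ∈ [0:127]
- Piano notes: p = 21 (A0) to p = 108 (C8)
- Concert pitch: p = 69 (A4, 440 Hz)
- Center frequency: $F_{\mathrm{pitch}}(p) = 2^{(p-69)/12}\cdot 440$ Hz

Idea: binning of Fourier coefficients. Divide up the frequency axis into logarithmically spaced pitch regions and combine the spectral coefficients of each region into a single pitch coefficient. This yields a logarithmic frequency distribution, in which an octave corresponds to a doubling of frequency. (Figures: time-frequency representation; windowing in the time domain; windowing in the frequency domain.)

Details: Let X be a spectral vector obtained from a spectrogram with respect to a sampling rate F_s and a window length N. The spectral coefficient X(k) corresponds to the frequency

$$F_{\mathrm{coef}}(k) = \frac{k\cdot F_s}{N}.$$

Let

$$S(p) = \{k : F_{\mathrm{pitch}}(p-0.5) \le F_{\mathrm{coef}}(k) < F_{\mathrm{pitch}}(p+0.5)\}$$

be the set of coefficients assigned to a pitch p. Then the pitch coefficient is defined as

$$Y(p) = \sum_{k\in S(p)} |X(k)|^2.$$

Example: A4, p = 69. Center frequency: 440 Hz. Lower bound: F_pitch(68.5) ≈ 427.5 Hz. Upper bound: F_pitch(69.5) ≈ 452.9 Hz. For an STFT with sampling rate F_s and window length N, S(69) thus contains all k with 427.5 Hz ≤ k · F_s/N < 452.9 Hz.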
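The binning of Fourier coefficients into pitch coefficients can be written almost literally from the definitions of F_pitch, F_coef, S(p), and Y(p). This sketch assumes toy STFT parameters F_s = 22050 Hz and N = 4096 and uses hypothetical function names; it scans all bins for each pitch, which is simple rather than efficient.

```python
def f_pitch(p, concert_pitch=440.0):
    """Center frequency in Hz of MIDI pitch p (equal temperament, A4 = p 69)."""
    return 2 ** ((p - 69) / 12) * concert_pitch


def f_coef(k, fs, N):
    """Frequency in Hz of the k-th Fourier coefficient."""
    return k * fs / N


def bin_set(p, fs, N):
    """S(p): all k with F_pitch(p - 0.5) <= F_coef(k) < F_pitch(p + 0.5)."""
    lower, upper = f_pitch(p - 0.5), f_pitch(p + 0.5)
    return [k for k in range(N // 2 + 1) if lower <= f_coef(k, fs, N) < upper]


def pitch_vector(X, fs, N):
    """Y(p) = sum of |X(k)|^2 over k in S(p), for the MIDI pitches p = 0..127."""
    return [sum(abs(X[k]) ** 2 for k in bin_set(p, fs, N)) for p in range(128)]


fs, N = 22050, 4096  # toy values (assumptions)
print(bin_set(69, fs, N))  # [80, 81, 82, 83, 84]
```

With these parameters the coefficients are about 5.4 Hz apart, so the A4 band from 427.5 Hz to 452.9 Hz collects five coefficients.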

Pitch bands of the equal-tempered scale (center frequency and band boundaries in Hz):

Note  MIDI pitch  Center [Hz]  Left boundary [Hz]  Right boundary [Hz]  Width [Hz]
A3    57          220.0        213.7               226.4                12.7
A#3   58          233.1        226.4               239.9                13.5
B3    59          246.9        239.9               254.2                14.3
C4    60          261.6        254.2               269.3                15.1
C#4   61          277.2        269.3               285.3                16.0
D4    62          293.7        285.3               302.3                17.0
D#4   63          311.1        302.3               320.2                18.0
E4    64          329.6        320.2               339.3                19.0
F4    65          349.2        339.3               359.5                20.2
F#4   66          370.0        359.5               380.8                21.4
G4    67          392.0        380.8               403.5                22.6
G#4   68          415.3        403.5               427.5                24.0
A4    69          440.0        427.5               452.9                25.4

Note: For some pitches, S(p) may be empty. This particularly holds for low notes, which correspond to narrow frequency bands. The linear frequency sampling of the Fourier coefficients is therefore problematic. Solution: multi-resolution spectrograms or multirate filter banks.

Example (figures): Friedrich Burgmüller, Op. 100, No. 2. Spectrogram and pitch representation (intensity over time), with reference frequencies C4: 261 Hz, C5: 523 Hz, C6: 1046 Hz, C7: 2093 Hz, C8: 4186 Hz.
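The table rows follow from the center-frequency formula, with band boundaries half a semitone away, and the same bands show why S(p) can be empty for low notes. This sketch assumes toy STFT parameters F_s = 22050 Hz and N = 4096 (coefficient spacing about 5.4 Hz); the helper names are hypothetical.

```python
def f_pitch(p):
    """Center frequency in Hz of MIDI pitch p (equal temperament, A4 = p 69 = 440 Hz)."""
    return 440.0 * 2 ** ((p - 69) / 12)


NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def band(p):
    """(center, left, right, width) in Hz; the boundaries lie half a semitone away."""
    left, right = f_pitch(p - 0.5), f_pitch(p + 0.5)
    return f_pitch(p), left, right, right - left


# Rows A3 (p = 57) to A4 (p = 69), rounded to 0.1 Hz.
for p in range(57, 70):
    name = f"{NAMES[p % 12]}{p // 12 - 1}"
    print(name, p, *(f"{v:.1f}" for v in band(p)))

# Which piano pitches (p = 21..108) receive no Fourier coefficient at all?
fs, N = 22050, 4096          # toy STFT parameters (assumptions)
bin_width = fs / N           # about 5.4 Hz between coefficients


def is_empty(p):
    _, left, right, _ = band(p)
    return not any(left <= k * bin_width < right for k in range(N // 2 + 1))


empty = [p for p in range(21, 109) if is_empty(p)]
print(empty)
```

The empty pitches all lie in the low register: there, the band width drops below the coefficient spacing, so many bands fall between two consecutive coefficients.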

Example (figures): chromatic scale. Spectrogram; log-frequency spectrogram with the pitch axis in MIDI note numbers and reference frequencies C3: 131 Hz, C4: 261 Hz, C5: 523 Hz, C6: 1046 Hz, C7: 2093 Hz, C8: 4186 Hz; chroma representation; chroma representation normalized with respect to the Euclidean norm.

Human perception of pitch is periodic in the sense that two pitches are perceived as similar in color if they differ by an octave. Accordingly, pitch can be separated into two components: tone height (octave number) and chroma. Chroma refers to the 12 traditional pitch classes of the equal-tempered scale. For example, chroma C comprises the pitches C2, C3, C4, and so on, i.e., all MIDI pitches p with p mod 12 = 0.

Computation (pitch features → chroma features): add up all pitch coefficients belonging to the same chroma class. The result is a 12-dimensional chroma vector:

$$C(c) = \sum_{p \,:\, p \bmod 12 = c} Y(p), \qquad c \in [0:11].$$

(Figure: the pitches C2, C3, C4 are assigned to chroma C.)
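Folding a pitch vector into a chroma vector and normalizing it takes only a few lines. This is a minimal sketch assuming a 128-dimensional pitch vector indexed by MIDI pitch; the treatment of near-silent frames (replacing them by a flat unit vector) is one possible convention assumed here, and the names are hypothetical.

```python
import math

CHROMA_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_vector(Y):
    """Fold a pitch vector Y (index = MIDI pitch) into 12 chroma bins:
    C(c) = sum of Y(p) over all p with p mod 12 == c."""
    C = [0.0] * 12
    for p, value in enumerate(Y):
        C[p % 12] += value
    return C


def normalize_euclidean(v, threshold=1e-9):
    """Scale v to unit Euclidean norm; near-silent frames become a flat unit vector
    (one possible convention, assumed here)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm < threshold:
        return [1.0 / math.sqrt(len(v))] * len(v)
    return [x / norm for x in v]


Y = [0.0] * 128
Y[48] = Y[60] = Y[72] = 1.0   # C3, C4, C5
Y[64] = 2.0                   # E4
C = normalize_euclidean(chroma_vector(Y))
print(CHROMA_NAMES[C.index(max(C))], round(C[0], 3), round(C[4], 3))  # C 0.832 0.555
```

The three C's in different octaves all land in the same chroma bin, which is exactly the octave equivalence described above.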

Further figures: the pitches C#2, C#3, C#4 are assigned to chroma C#, and the pitches D2, D3, D4 to chroma D; chromatic circle; Shepard's helix of pitch perception; example: C-major scale, shown as pitch representation (MIDI pitch, C4 to C8) and as chroma representation. Figure credit: Meinard Müller: Fundamentals of Music Processing, Chapter 1: Music Representations, Fig. 1.3, Springer International Publishing Switzerland, 2015.

Example (figures): Beethoven's Fifth, performances by Karajan and Scherbakov. Normalized chroma representations at three feature resolutions:
- 10 Hz;
- 2 Hz, obtained by smoothing over 2 seconds and downsampling by a factor of 5;
- 1 Hz, obtained by smoothing over 4 seconds and downsampling by a factor of 10.

Example (figures): Bach Toccata, performances by Koopman and Ruebsam, chroma representations over time (in samples) at a feature resolution of 10 Hz.
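Smoothing and downsampling a chroma sequence, as in the Beethoven example, can be sketched as a moving average followed by decimation. Assumptions: a rectangular smoothing window centered on each frame and truncated at the borders (other window shapes are common), window lengths given in frames (2 seconds at 10 Hz = 20 frames), and the hypothetical name `smooth_and_downsample`.

```python
def smooth_and_downsample(features, win_len, factor):
    """Moving average over win_len frames (centered, truncated at the borders),
    followed by keeping every factor-th frame."""
    num_frames, dim = len(features), len(features[0])
    half = win_len // 2
    smoothed = []
    for n in range(num_frames):
        lo, hi = max(0, n - half), min(num_frames, n - half + win_len)
        smoothed.append([sum(features[m][d] for m in range(lo, hi)) / (hi - lo)
                         for d in range(dim)])
    return smoothed[::factor]


feature_rate = 10.0                       # Hz, i.e. 10 chroma vectors per second
win_len = int(2.0 * feature_rate)         # 2 seconds -> 20 frames
factor = 5
new_rate = feature_rate / factor          # 2.0 Hz

# Toy 2-dimensional sequence that flickers between two states every frame.
seq = [[1.0, 0.0] if n % 2 == 0 else [0.0, 1.0] for n in range(100)]
out = smooth_and_downsample(seq, win_len, factor)
print(len(out), new_rate, out[5])  # 20 2.0 [0.5, 0.5]
```

Using win_len = 40 and factor = 10 instead gives the 1 Hz variant. Short-term fluctuations are averaged out, which is why the coarser representations of the two performances look more alike.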

Example (figures, continued): Bach Toccata, Koopman and Ruebsam, at feature resolutions of 1 Hz and 0.33 Hz.

Observation: the sequence of chroma vectors correlates with the harmonic progression. Example (figure): Zager & Evans, "In The Year 2525".

Further properties and variants:
- Normalization makes the features invariant to changes in dynamics.
- Further quantization and smoothing yields CENS features.
- Taking the logarithm before adding up the pitch coefficients accounts for the logarithmic sensation of intensity.

How to deal with transpositions? Example (figures): Zager & Evans, "In The Year 2525", original chroma representation and cyclically shifted chroma representation.
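Logarithmic compression, normalization, and the handling of transpositions by cyclic chroma shifts can be sketched as follows. The compression form log(1 + γ·y), the constant γ = 100, the inner-product comparison, and all function names are assumptions for illustration, not a specific published chroma variant.

```python
import math


def log_compress(Y, gamma=100.0):
    """Logarithmic compression log(1 + gamma * y) of nonnegative pitch coefficients."""
    return [math.log(1.0 + gamma * y) for y in Y]


def normalize(v):
    """Scale v to unit Euclidean norm (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def shift_chroma(v, i):
    """Cyclically shift a 12-dim chroma vector up by i semitones."""
    return [v[(c - i) % 12] for c in range(12)]


def best_transposition(a, b):
    """Shift i that maximizes the inner product between normalized a and shifted b."""
    a, b = normalize(a), normalize(b)
    scores = [sum(x * y for x, y in zip(a, shift_chroma(b, i))) for i in range(12)]
    return max(range(12), key=scores.__getitem__)


c_major = [1.0 if c in (0, 4, 7) else 0.0 for c in range(12)]   # C, E, G
d_major = [1.0 if c in (2, 6, 9) else 0.0 for c in range(12)]   # D, F#, A
print(best_transposition(d_major, c_major))  # 2
```

Scaling a chroma vector by a constant (a louder performance) does not change its normalized version, and trying all 12 cyclic shifts finds the transposition (here two semitones) that best aligns two chroma representations.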

Audio features: There are many ways to implement chroma features, and their properties may differ significantly. Which variant is appropriate depends on the respective application. MATLAB implementations of various chroma variants: http://www.mpi-inf.mpg.de/resources/mir/chromatoolbox/