Week 14 Music Understanding and Classification


Week 14: Music Understanding and Classification
Roger B. Dannenberg, Professor of Computer Science, Music & Art

Overview
- Music Style Classification
- What's a classifier?
- Naïve Bayesian Classifiers
- Style Recognition for Improvisation
- Genre Classification
- Emotion Classification
- Beat Tracking
- Key Finding
- Harmonic Analysis (Chord Labeling)

Music Style Classification
- Pointillistic?
- Lyrical
- Frantic
- Syncopated

[Video demonstration]

What Is a Classifier?
- What is the class of a given object?
  - Image: water, land, sky
  - Printer: people, nature, text, graphics
  - Tones: A, A#, B, C, C#, ...
  - Broadcast: speech or music, program or ad
- In every case, objects have features:
  - RGB color
  - RGB histogram
  - Spectrum
  - Autocorrelation
  - Zero crossings/second
  - Width of spectral peaks

What Is a Classifier? (2)
- Training data
  - Objects with (manually) assigned classes
  - Assumed to be a representative sample
- Test data
  - Separate from the training data
  - Also labeled with classes
  - But the labels are not known to the classifier
- Evaluation:
  - Percentage of correctly labeled test data

Game Plan
- We can look at training data to figure out typical features for each class
- How do we get classes from features? → Bayes' Theorem
- We'll need to estimate P(features | class)
- Put it all together

Bayes' Theorem
P(A|B) = P(A&B)/P(B)
P(B|A) = P(A&B)/P(A)
[Venn diagram: regions A, A&B, B]
P(A|B)P(B) = P(A&B)
P(B|A)P(A) = P(A&B)
Therefore P(A|B)P(B) = P(B|A)P(A), so
P(A|B) = P(B|A)P(A)/P(B)

P(A|B) = P(B|A)P(A)/P(B)
- P(class|features) = P(features|class)P(class)/P(features)
- Let's guess the most likely class (maximum likelihood estimation, MLE)
- Find the class that maximizes: P(features|class)P(class)/P(features)
- Since P(features) is independent of the class, maximize P(features|class)P(class)
- Or, if classes are equally likely, maximize: P(features|class)

Bayesian Classifier
- The most likely class is the one for which the observed features are most likely.
- The most likely class: argmax_class P(class|features)
- The class for which the features are most likely: argmax_class P(features|class)

Game Plan
- We can look at training data to figure out typical features for each class
- How do we get classes from features? → Bayes' Theorem
- We'll need to estimate P(features | class)
- Put it all together

Estimating P(features|class)
- A word of caution: machine learning involves the estimation of parameters. The size of the training data should be much larger than the number of parameters to be learned.
- Naïve Bayesian classifiers have relatively few parameters, so they tend to be estimated more reliably than the parameters of more sophisticated classifiers; hence, they are a good place to start.

What's P(features|class)?
- Let's make a big (and wrong) assumption:
  P(f1, f2, f3, ..., fn | class) = P(f1|class) P(f2|class) P(f3|class) ... P(fn|class)
- This is the independence assumption
- Let's also assume (also wrong) that each P(fi|class) is normally distributed
- So it's characterized completely by:
  - mean
  - standard deviation
- Naïve Bayesian Classifier: assumes the features are independent and Gaussian

Estimating P(features|class) (2)
- Assume the distribution is Normal (same as Gaussian, "bell curve")
- The mean and variance are estimated by simple statistics on the training set:
  - The classes partition the training set into distinct sets
  - Collect the mean and variance for each class
- Multiple features have a multivariate normal distribution:
  - Intuition: assuming independence, P(features|class) is related to the distance from the peak (mean) to the feature

Putting It All Together
- Fi = i-th feature
- C = class
- μ = mean
- σ = standard deviation
- ΔC = normalized distance from class C, i.e. ΔC = Σi ((Fi − μC,i) / σC,i)²
- Estimate the mean and standard deviation just by computing statistics on the training data
- The classifier computes ΔC for every class and picks the class C with the smallest value (a code sketch follows the feature list below).

Style Recognition for Improvisation
- Features (windowed MIDI data):
  - Number of notes
  - Avg. MIDI key number
  - Std. dev. of MIDI key number
  - Avg. duration
  - Std. dev. of duration
  - Avg. duty factor
  - Std. dev. of duty factor
  - Number of pitch bends
  - Avg. pitch
  - Std. dev. of pitch
  - Number of volume controls
  - Avg. volume
  - Std. dev. of volume
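To make the ΔC computation concrete, here is a minimal sketch of the naive Bayesian classifier just described, written in Python with NumPy. The function names and the toy data are illustrative assumptions, not from the slides.

```python
import numpy as np

def fit_class_stats(X, y):
    """Per-class mean and standard deviation of each feature (training)."""
    stats = {}
    for c in np.unique(y):
        Xc = X[y == c]
        stats[c] = (Xc.mean(axis=0), Xc.std(axis=0) + 1e-9)  # avoid /0
    return stats

def classify(stats, x):
    """Pick the class with the smallest normalized distance Delta_C."""
    def delta(c):
        mu, sigma = stats[c]
        return float(np.sum(((x - mu) / sigma) ** 2))
    return min(stats, key=delta)

# Toy example: two styles, two features
X = np.array([[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.3]])
y = np.array(["lyrical", "lyrical", "frantic", "frantic"])
print(classify(fit_class_stats(X, y), np.array([1.1, 1.9])))  # -> lyrical
```

Note that minimizing ΔC is equivalent to maximizing the Gaussian P(features|class) under the independence and equal-priors assumptions above.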

A Look At Some Data
[Scatter plots of feature pairs by class; not all scatter plots show the data so well separated]

Training
- The computer says what style to play
- The musician plays in that style until the computer says stop
- Rest
- Play another style
- Note that the collected data is labeled data

Results
- With 4 classes, 98.1% accuracy
  - Lyrical
  - Syncopated
  - Frantic
  - Pointillistic
- With 8 classes, 90.0% accuracy
  - Additional classes: blues, quote, high, low
- The results did not carry over to the real performance situation, but retraining in context helped

Cross-Validation
[Diagram: the labeled data is split into folds; each fold serves in turn as test data while the remaining folds are used as training data]
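The fold rotation pictured above takes only a few lines; here is a minimal sketch reusing fit_class_stats and classify from the earlier sketch (the fold count and seed are arbitrary):

```python
import numpy as np

def k_fold_accuracy(X, y, k=5, seed=0):
    """Each fold serves once as test data; the rest is training data."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        stats = fit_class_stats(X[train], y[train])
        preds = np.array([classify(stats, x) for x in X[test]])
        accs.append(float(np.mean(preds == y[test])))
    return float(np.mean(accs))
```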

Other Types of Classifiers
- Linear classifier
  - assumes normal distributions, but not independence
  - closed-form, very fast training (unless there are many features)
- Neural networks: capable of learning when features are not normally distributed, e.g. bimodal distributions
- kNN: k-nearest neighbors
  - Find the k closest exemplars in the training data
- SVM: support vector machines

In Practice: Classifier Software
- MATLAB: neural networks, others
- Weka: http://www.cs.waikato.ac.nz/~ml/weka/
  - Widely used
  - General data-mining toolset
- ACE: http://coltrane.music.mcgill.ca/ace/
  - Especially made for music research
  - Handles classes organized as a hierarchical taxonomy
  - Includes sophisticated feature selection (note that classifiers sometimes get better with fewer features!)

Genre Classification
- Popular task in Music Information Retrieval
- Usually applied to audio
- Features:
  - Spectrum (energy at different frequencies)
  - Spectral centroid
  - Cepstrum coefficients (from speech recognition)
  - Noise vs. narrow spectral lines
  - Zero crossings
  - Estimates of beat strength and tempo
  - Statistics on these, including variance or histograms

Typical Results
- Artist ID: 148 artists, 1800 files → 60-70% correct
- Genre: 10 classes (ambient, blues, classical, electronic, ethnic, folk, jazz, new_age, punk, rock) → ~80% correct
- Example: http://www.youtube.com/watch?v=ndlhrc_wr5q

Summary
- Machine classifiers are an effective and not-so-difficult way to process music data
- They convert low-level features into high-level abstract concepts such as style
- They can be applied to many problems:
  - Genre
  - Emotion
  - Timbre
  - Speech/music discrimination
  - Snare/hi-hat/bass drum/cowbell/etc.

Summary (2)
- General problem: map a feature vector to a class
- Bayes' Theorem tells us that the probability of a class given a feature vector is related to the probability of the feature vector given the class
- We can estimate the latter from training data

Beat Tracking: The Problem
- The "foot tapping" problem
- Find the positions of beats in a song
- Related problem: estimate the tempo (without resolving beat locations)
- Two big assumptions:
  - Beats correspond to some acoustic feature(s)
  - Successive beats are spaced about equally (i.e., the tempo varies slowly)

Acoustic Features
- Can be local energy peaks
- Spectral flux: the change from one short-term spectrum to the next
- High-frequency content: the spectrum weighted toward high frequencies
- With MIDI data, you can use note onsets

A Basic Beat Tracker
- Start with an initial tempo and first beat (maybe the onset of the first note)
- Predict the expected location of the next beat
- If an actual beat is in the neighborhood, speed up or slow down according to the error
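A minimal sketch of this basic beat tracker, assuming we already have a non-empty list of onset times in seconds; the correction gain, window size, and names are illustrative assumptions, not from the slides:

```python
def track_beats(onsets, period, first_beat, window=0.1, gain=0.3):
    """Predict each next beat; nudge phase and tempo toward nearby onsets."""
    beats = [first_beat]
    t = first_beat
    while t + period < onsets[-1]:
        predicted = t + period
        nearest = min(onsets, key=lambda o: abs(o - predicted))
        if abs(nearest - predicted) < window:
            error = nearest - predicted
            period += gain * error        # speed up or slow down
            t = predicted + gain * error  # and correct the phase
        else:
            t = predicted                 # no support: coast at current tempo
        beats.append(t)
    return beats
```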

Society of Agents Model
[Diagram: multiple beat-tracking agents, each with its own tempo hypothesis, reporting to a supervisor]

Society of Agents (2)
- Each agent tries to find periodic beats much like the basic beat tracker, but within a limited range of tempi
- Agents report how well they are doing
- A supervisor picks the best agent and may arrange a handoff from one agent to another
- "Agent" is a bit overblown and anthropomorphic; it's just a simple software object

Filter Bank and Oscillator Models
[Diagram: onset detection feeding a bank of oscillators]

Oscillators
- Some oscillator models (particularly in work by Ed Large) are inspired by actual neurons
- Oscillators maintain an approximate frequency, but their phase can be adjusted

Agents and Oscillators
- Note that agents act like oscillators:
  - Detect periodicity
  - Tuned to a small range of tempi
- My opinion:
  - Music data is so noisy that you need to search within a narrow range of tempi
  - A wide-tempo-range tracker is likely to get lost
  - That's why multiple agents/oscillators work

Key Finding
- The standard (or at least common) approach is based on the Krumhansl-Schmuckler key-finding algorithm
- It is in turn based on key profiles: essentially, a histogram of the pitches observed in a given key
- The key is estimated by:
  - Creating a profile for the given work
  - Finding the closest match among the Krumhansl-Schmuckler profiles
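A minimal sketch of this approach: build a duration-weighted pitch-class histogram for the piece and correlate it against all 24 rotated key profiles. The profile values below are the published Krumhansl-Kessler probe-tone ratings; the surrounding code is an illustrative sketch.

```python
import numpy as np

# Krumhansl-Kessler probe-tone profiles for C major and C minor
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def find_key(notes):
    """notes: list of (midi_pitch, duration) pairs for the work."""
    hist = np.zeros(12)
    for pitch, dur in notes:
        hist[pitch % 12] += dur   # duration-weighted pitch-class histogram
    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # np.roll transposes the C profile to the candidate tonic
            r = np.corrcoef(hist, np.roll(profile, tonic))[0, 1]
            if best is None or r > best[0]:
                best = (r, tonic, mode)
    return best  # (correlation, tonic pitch class, mode)
```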

Variations on Key Finding
- Weighting the profile by note duration
- Using exponential decay to give a more local estimate of the key center
- Using the spectrum rather than pitches when the data is audio
- Probably better results can be obtained with machine learning approaches and more features related to tonal harmony

Harmonic Analysis/Chord Labeling
- An under-constrained problem
- The goal is to give chord labels to music
[Notated example with two valid analyses; Labeling #1: C, F, C; Labeling #2: C throughout, where the F is a passing tone]

Chords
- Conventionally, chords have 3 or 4 notes separated by major and minor thirds (intervals of 4 or 3 semitones):
  - Major triad = 4 + 3
  - Minor triad = 3 + 4
  - Dominant seventh = 4 + 3 + 3

Chords Can Be Complex
- Any configuration of notes has an associated chord type (which may be highly improbable):
  - E.g., [notated chord] = C dominant seventh with a flat 5, added sharp 9th, 11th, and 13th
- Chords can change at any time
- Chords do not necessarily match all the notes (extra notes are called non-chord tones)

Chords as Hidden Variables
[Diagram: a hidden state sequence of chords, with the observed notes emitted by each chord]

How Can We Approach This Problem?
- Find a balance between:
  - using relatively few chords
  - getting a good match between the observed notes and the chords (minimizing non-chord tones)
- Create a scoring function to rate a chord labeling:
  - a penalty for each new chord
  - a penalty for each non-chord tone
- Search for the optimal labeling

What Do We Label?
- Every place a note begins or ends, start a new segment (Pardo and Birmingham call this a "concurrency")

Chord Labeling as a Graph Algorithm
- Nodes are concurrencies; arcs carry the cost of consolidating a run of concurrencies and labeling them as one chord
- The cost depends on some assumptions, but the search can be done in O(N²) time using a shortest-path algorithm
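A minimal dynamic-programming sketch of this idea (not Pardo and Birmingham's actual implementation): best[j] is the cost of the best labeling of concurrencies 0..j-1, and each arc i→j consolidates concurrencies i..j-1 into one chord, paying one new-chord penalty plus a penalty per non-chord tone. The penalty values and names are illustrative assumptions.

```python
def label_chords(concurrencies, chord_templates,
                 new_chord_cost=1.0, nct_cost=0.5):
    """concurrencies: list of pitch-class sets; chord_templates: dict
    mapping chord name -> pitch-class set. Returns (cost, labeling)."""
    n = len(concurrencies)
    INF = float("inf")
    best = [0.0] + [INF] * n   # best[j]: cost of labeling segments 0..j-1
    back = [None] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):  # label concurrencies i..j-1 as one chord
            notes = set().union(*concurrencies[i:j])
            for name, tpl in chord_templates.items():
                cost = (best[i] + new_chord_cost
                        + nct_cost * len(notes - tpl))  # non-chord tones
                if cost < best[j]:
                    best[j] = cost
                    back[j] = (i, name)
    # Recover the labeling by walking the backpointers
    labels, j = [], n
    while j > 0:
        i, name = back[j]
        labels.append((i, j, name))
        j = i
    return best[n], labels[::-1]
```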

Chord Recognition from Audio
- For the latest, most advanced techniques, see the literature (esp. ISMIR Proceedings)
- Another classification problem?
  - Given audio, classify it into a chord type
- Need to think about:
  - Labeled training data
  - Features
  - Training procedure

Chord Recognition: Training Data
1. Use hand-labeled audio
2. Create labels automatically from MIDI data; create audio by synthesizing the MIDI
3. Create labels automatically from MIDI; align the MIDI to "real" audio (we will talk about alignment later)
- Note: there are theoretically 2^12 chords, but systems typically stick to some subset of major, minor, dominant 7th, diminished, and augmented (each in all 12 transpositions)

Features: A Diversion on the FFT
- Audio analysis often begins with analysis of the frequency content
- Our ear is, in some sense, a frequency analyzer
- The shape of the audio waveform is not really significant: shifting the phase of one note can change the wave shape completely, even if it "sounds the same"
- Every sound can be broken down into frequency components
[Diagram: the left and right channels of a sound file each feed a frequency analyzer]

FFT
- Typically many more frequency "bins"
- Not continuous
- Divide the signal into regions called frames (not to be confused with sample periods)
- A typical frame is 10 to 100 ms
- Each frame is analyzed separately
- 256 to 2048 frequency bins per frame
http://www.dsprelated.com/josimages/sasp/img1411.png

FFT Frames
[Figure: a signal divided into successive FFT frames]

FFT Parameters
- Frequencies in audio range from 0 to half the sample rate
- An n-point FFT uses n samples, so it spans n/SR seconds
- There are n/2 frequency bins, all the same width, over the range from 0 to SR/2, so each bin is SR/n Hz wide
- Example: 4096-point FFT and 44.1 kHz sample rate
  - Bins are 44100/4096 ≈ 10.7 Hz wide
  - Semitones (ratio of 1.0595) are 10.7 Hz wide at about 181 Hz
  - F3 is 175 Hz; F#3 is 185 Hz
- Larger FFT → better frequency resolution
- Smaller FFT → better time resolution
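The bin-width arithmetic above is easy to check in a few lines (a sketch; 1.0595 is the equal-tempered semitone ratio 2^(1/12)):

```python
sr, n = 44100, 4096
bin_hz = sr / n                  # ~10.77 Hz per bin
semitone = 2 ** (1 / 12)         # ~1.0595
# Frequency at which one semitone spans exactly one bin:
f = bin_hz / (semitone - 1)      # ~181 Hz
print(f"bin width = {bin_hz:.2f} Hz; semitone = one bin near {f:.0f} Hz")
```

Below roughly 181 Hz, adjacent semitones fall into the same 4096-point bin, which is why larger FFTs are needed to resolve low pitches.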

Chroma Vector
[Figure: chroma vector computation. Source: Tristan Jehan, PhD thesis]

Chroma Vectors
- Note that any given tone has overtones that contribute to many chroma bins:
  - The 3rd harmonic is roughly 19 semitones above the fundamental
  - The 5th harmonic is roughly 28 semitones
  - The 6th harmonic is roughly 31 semitones
  - The 7th harmonic is roughly 34 semitones
- (None of these is a multiple of 12)

Why the Chroma Vector?
- Experience shows that chroma vectors capture harmonic and melodic information
- Chroma vectors do not capture timbral information (well)
  - C major on a piano looks like C major from a string orchestra; this is a good thing!
- Chroma vectors are typically normalized to eliminate any loudness information

Building a Simple Classifier
- Classes are chords
  - E.g., major/minor × 12 gives 24 classes
- Train the classifier on labeled data
- Computation, for each FFT frame:
  - Compute the chroma vector (12 features)
  - Run the classifier
  - Output the most likely chord label
- Example: https://www.youtube.com/watch?v=kh8mgjkefou
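A minimal sketch of the per-frame computation just described: map FFT bin magnitudes to 12 chroma bins, normalize, and compare against chord templates. Nearest-template matching stands in here for a trained classifier, and all names are illustrative assumptions.

```python
import numpy as np

def chroma_from_frame(frame, sr):
    """Map FFT bin magnitudes to a normalized 12-bin chroma vector."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    chroma = np.zeros(12)
    for f, mag in zip(freqs[1:], spectrum[1:]):  # skip the DC bin
        midi = 69 + 12 * np.log2(f / 440.0)      # bin center as MIDI pitch
        chroma[int(round(midi)) % 12] += mag
    norm = np.linalg.norm(chroma)
    return chroma / norm if norm > 0 else chroma

def chord_templates():
    """Normalized binary templates: 12 major + 12 minor triads."""
    t = {}
    for tonic in range(12):
        for mode, ivals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
            v = np.zeros(12)
            v[[(tonic + i) % 12 for i in ivals]] = 1.0
            t[(tonic, mode)] = v / np.linalg.norm(v)
    return t

def label_frame(frame, sr, templates=None):
    """Output the chord whose template best matches the frame's chroma."""
    if templates is None:
        templates = chord_templates()
    c = chroma_from_frame(frame, sr)
    return max(templates, key=lambda k: float(np.dot(c, templates[k])))
```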

Using Context
- "Absolute" (a priori) information:
  - Chord probabilities, e.g. P(major) > P(augmented)
- Smoothing:
  - The sequence C C C C G C C C C C is likely all C's
  - Dynamic programming is a good way to optimize the tradeoff between the "cost" of transitions to new chords and the likelihoods of the chord choices (a sketch follows the references below)
- Context:
  - Chord sequences are not random
  - Hidden Markov Models are often used to model chord sequences and to prefer chords that are more likely given the context

Some References
- Robert Rowe: Machine Musicianship
- David Temperley: The Cognition of Basic Musical Structures
- Danny Sleator: http://www.link.cs.cmu.edu/music-analysis/ (algorithms online)
- ISMIR Proceedings (all online)
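The dynamic-programming smoothing described under "Using Context" is essentially Viterbi decoding over the frame-wise chord scores with a uniform penalty for switching chords. A minimal sketch; the penalty value and names are illustrative assumptions, not a published system.

```python
import numpy as np

def smooth_chords(frame_scores, labels, switch_penalty=2.0):
    """frame_scores: (n_frames, n_chords) array of per-frame match scores;
    labels: list of n_chords chord names. Maximizes the total score minus
    a penalty for each chord change."""
    n, m = frame_scores.shape
    best = frame_scores[0].copy()
    back = np.zeros((n, m), dtype=int)
    for t in range(1, n):
        stay = best                            # keep the same chord
        switch = best.max() - switch_penalty   # change to a new chord
        back[t] = np.where(stay >= switch, np.arange(m), best.argmax())
        best = np.maximum(stay, switch) + frame_scores[t]
    # Trace the best path backward through the backpointers
    path = [int(best.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [labels[i] for i in reversed(path)]
```

Replacing the flat switching penalty with chord-to-chord transition probabilities learned from data gives the HMM approach mentioned above.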

Summary and Conclusions
- Music involves communication
- Communication usually involves conventions: syntax, phonemes, frequencies, selected and modulated to convey meaning
- In music, the notes are the syntax; the meaning is somewhere else
- Music Understanding attempts to get at these more abstract levels of meaning

Summary and Conclusions (2)
- Many of these techniques are for tonal music
  - It's rich with structure and convention
  - We understand it well enough to decide what's right and what's wrong (to some extent)
  - But it's not what's happening now in music, or at least it's restricted to popular music
- Future work needs music theory, representations for time-based data, and sophisticated pattern recognition