Week 14: Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg, Professor of Computer Science, Art and Music, Carnegie Mellon University


Overview
- Melody-Based Retrieval
- Audio-Score Alignment
- Music Fingerprinting

Metadata-based Retrieval
- Title
- Artist
- Genre
- Year
- Instrumentation
- Etc.
- What if we could search by content instead?

Melody-Based Retrieval
- Representations:
  - Pitch sequence (not transposition invariant)
  - Intervals (chromatic or diatonic)
  - Approximate intervals (unison, seconds, thirds, large)
  - Up/Down/Same: sududdsududdsuddddusddud
- Rhythm can be encoded too:
  - IOI = inter-onset interval
  - Duration sequences
  - Duration ratio sequences
  - Various quantization schemes

Indexing
- Easily done, given exact, discrete keys*
- Pitch-only index of incipits**
- A manual / printed index works if the melody is transcribed without error

* Here, "key" is used in the CS sense: searching involves deciding whether a search key is present in the data (as opposed to musical keys).
** The initial notes of a musical work.

Computer-Based Melodic Search
- Dynamic programming
- Typical problem statement: find the best match in a database to a query
  - The query is a sequence of pitches
  - "Best match" means some substring of some song in the database with minimum edit distance
  - The query does not have to match the beginning of the song
  - The query does not have to contain the entire song

What Features to Match?
- Absolute pitch: 67 69 71 67
- Relative pitch (intervals): +2 +2 -4
- IOI: 1 .5 .5 1
- IOI ratio: .5 1 2
- Log IOI ratio: -1 0 1
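The conversions among these representations are mechanical. A minimal sketch (assuming notes are given as (MIDI pitch, IOI) pairs; this example is illustrative, not from the slides):

```python
import math

# Illustrative note list: (MIDI pitch, inter-onset interval in beats)
notes = [(67, 1.0), (69, 0.5), (71, 0.5), (67, 1.0)]

pitches = [p for p, _ in notes]
iois    = [d for _, d in notes]

intervals     = [b - a for a, b in zip(pitches, pitches[1:])]   # relative pitch
contour       = "".join("u" if i > 0 else "d" if i < 0 else "s" for i in intervals)
ioi_ratios    = [b / a for a, b in zip(iois, iois[1:])]          # tempo invariant
log_ioi_ratio = [math.log2(r) for r in ioi_ratios]

print(intervals)       # [2, 2, -4]
print(contour)         # uud
print(ioi_ratios)      # [0.5, 1.0, 2.0]
print(log_ioi_ratio)   # [-1.0, 0.0, 1.0]
```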

Dynamic Programming for Music Retrieval
- Initial skip cost (skipping notes at the start of the database melody) is zero
- Skip cost for query notes is 1 (per note)
- Read off the minimum value in the last column to find the best match
(Figure: DP matrix; the first column holds the cumulative query skip costs -1, -2, -3, ..., -7.)

Example
(Figures: the DP matrix for matching the search key C D A G F C against the melody A G E C D G, first with only the skip costs filled in, then fully filled; the best match score is read from the last column.)

Search Algorithm
- For each melody in the database:
  - Compute the best match cost for the query
- Report the melody with the lowest cost
- Linear in the size of the database and the size of the query (a sketch follows)
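A hedged sketch of this substring dynamic program; the skip and mismatch costs here are illustrative defaults, not the exact values used in the lecture's experiments:

```python
def best_match_cost(query, song, skip_query=1.0, skip_song=1.0, mismatch=1.0):
    """Minimum cost of aligning `query` against any substring of `song`."""
    n, m = len(query), len(song)
    INF = float("inf")
    # cost[i][j]: best cost of matching the first i query notes, ending at song note j
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        cost[0][j] = 0.0                               # free to start anywhere in the song
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + skip_query
        for j in range(1, m + 1):
            sub = 0.0 if query[i - 1] == song[j - 1] else mismatch
            cost[i][j] = min(cost[i - 1][j - 1] + sub,      # match or substitute
                             cost[i - 1][j] + skip_query,   # skip a query note
                             cost[i][j - 1] + skip_song)    # skip a song note
    return min(cost[n])                                # free to end anywhere in the song

def search(query, database):
    """Rank every melody in the database by its best match cost to the query."""
    return sorted(database, key=lambda song: best_match_cost(query, song))

print(search("CDAGFC", ["AGECDG", "GGAGCBGG", "CDAGFC"]))
```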

Themes
- In many projects, themes are entered by hand
- In MUSART, themes are extracted automatically from MIDI files
- Interesting research in its own right
- Colin Meek: themes are patterns that occur most often
  - Encode n-grams as bit strings and sort
  - Add some heuristics to emphasize interesting melodic material
  - Validated by comparing to a published thematic index

How Do We Evaluate Searching?
- Typically there is a match score for each document
- Sort the documents according to their scores
- Percent in top 10: count the number of relevant/correct documents ranked in the top 10
- Mean Reciprocal Rank (MRR): the mean value of 1/rank, where rank is the lowest rank of a correct document. 1 = perfect; the worst case approaches 0

MRR Example
- Test with 5 keys (example only; you really should test with many)
- Each search returns a list of top picks
- Say the correct matches rank #3, #1, #2, #20, and #10 in the lists of top picks
- Reciprocals: 1/3, 1/1, 1/2, 1/20, 1/10 = .33, 1., .5, .05, .1
- Sum = 1.98; divide by 5 → MRR ≈ .4
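The same computation as a short function, using the ranks from the example:

```python
def mean_reciprocal_rank(ranks):
    """ranks[i] is the rank of the highest-ranked correct document for query i."""
    return sum(1.0 / r for r in ranks) / len(ranks)

print(mean_reciprocal_rank([3, 1, 2, 20, 10]))   # ~0.397
```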

MUSART system overview (block diagram, from the ISMIR 2001/2003 papers): a corpus passes through musical abstraction and processing (theme finding, chroma analysis) into representations (Markov representation, frame representation); search techniques include Markov distance, melodic pattern / contour distance, a style classifier, a vowel classifier, and Viterbi search over the database; users interact through a query interface and a browsing interface.

Queries and Databases
Queries:
- High quality: 160 queries, 2 singers, 10 folk songs
- Beatles (Set #1): 131 queries, 10 singers, 10 Beatles songs
- Popular (Set #2): 165 queries, various popular songs
Databases:
- 10,000 folk songs
- 258 Beatles songs (2844 themes)
- 868 popular songs (8926 themes)

How good/bad are the queries? Examples fall into categories: good match, partial match, out-of-order or repetition, and no match. (Audio examples on the original slide.)

Results (Representation → MRR)
- Absolute Pitch & IOI: .194
- Absolute Pitch & IOIR: .452
- Absolute Pitch & LogIOIR: .516
- Relative Pitch & IOI: .132
- Relative Pitch & IOIR: .1355
- Relative Pitch & LogIOIR: .2323

Insertion/Deletion Costs (C_ins : C_del → MRR)
- 0.5 : 0.5 → .129
- 1.0 : 1.0 → .1484
- 2.0 : 2.0 → .1613
- 1.0 : 0.5 → .1161
- 1.5 : 1.0 → .1355
- 2.0 : 1.0 → .129
- 0.5 : 1.0 → .1742
- 1.0 : 1.5 → .2
- 0.2 : 2.0 → .2194
- 0.4 : 2.0 → .2323
- 0.6 : 2.0 → .2323
- 0.8 : 2.0 → .2258
- 1.0 : 2.0 → .2129

Other Possibilities
- Indexing: not robust because of errors
- N-gram indexing: also not very robust
- Dynamic Time Warping
- Hidden Markov Models

N-Grams
- G G A G C B G G → GGA, GAG, AGC, GCB, CBG, BGG, ...
- A common text search technique
- Rate documents by the number of matches
- Fast search by index (from n-gram to the documents containing the n-gram)
- Term frequency weighting: tf = count or percentage of occurrences in the document
- Inverse document frequency weighting: idf = log(#docs / #(docs with matches)) (sketched below)
- Does not work well (in our studies) with sung queries due to the high error rates:
  - n-grams are either too short to be specific, or
  - too long to get exact matches
- Need something with higher precision
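A small sketch of n-gram indexing with idf weighting over melodies encoded as symbol strings; the melodies and parameters below are illustrative:

```python
import math
from collections import defaultdict

def ngrams(s, n=3):
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def build_index(database, n=3):
    """Map each n-gram to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, melody in enumerate(database):
        for g in ngrams(melody, n):
            index[g].add(doc_id)
    return index

def score(query, database, index, n=3):
    """Rate documents by the idf-weighted count of matching n-grams."""
    ndocs = len(database)
    scores = defaultdict(float)
    for g in ngrams(query, n):
        docs = index.get(g, set())
        if docs:
            idf = math.log(ndocs / len(docs))
            for doc_id in docs:
                scores[doc_id] += idf
    return sorted(scores.items(), key=lambda kv: -kv[1])

database = ["GGAGCBGG", "CDAGFC", "AGECDG"]
index = build_index(database)
print(score("GAGCB", database, index))
```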

Dynamic Time Warping

Dynamic Time Warping (2)
(Figure: a sung query pitch track warped against target data; both contours hover around MIDI pitches 60 and 65.)

DP vs DTW
- Dynamic Time Warping (DTW) is a special case of dynamic programming
- (As is the LCS algorithm)
- DTW implies matching or alignment of time-series data that is sampled at equal time intervals
- Has some advantage for melody matching: no need to parse the melody into discrete notes

Calculation Patterns for DTW
Neighboring cells:
  a  b
  c  d
d = max(a, b + deletecost, c + insertcost) + distance
The slope of the path is constrained between 1/2 and 2. This tends to make warping more plausible, but ultimately you should test on real data rather than speculate about these things. (In our experiments, this really does help for query-by-humming searches.)
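For illustration only, here is a minimal DTW in the common cost-minimizing form; it does not reproduce the slide's exact cost convention or the slope constraint:

```python
import numpy as np

def dtw_cost(query, target, dist=lambda x, y: abs(x - y)):
    """Cumulative cost of the best alignment between two feature sequences."""
    n, m = len(query), len(target)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(query[i - 1], target[j - 1])
            D[i, j] = d + min(D[i - 1, j - 1],   # diagonal step
                              D[i - 1, j],       # stretch the target
                              D[i, j - 1])       # stretch the query
    return D[n, m]

# Example: two pitch tracks sampled at equal time intervals (frames, not notes).
print(dtw_cost([60, 60, 65, 65, 65], [60, 65, 65]))
```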

Hidden Markov Models
- Queries can have many types of errors:
  - Local pitch errors
  - Modulation errors
  - Local rhythm errors
  - Tempo change errors
  - Insertion and deletion errors
- HMMs can encode errors as states and use the current state (error type) to predict what will come next
- The best match is an explanation of the errors, including their probabilities

Dynamic Programming with Probabilities
- What does DP compute? Path length: a sum of costs based on mismatches, skips, and deletions.
- Probability of independent events: P(a, b, c) = P(a)P(b)P(c)
- So log(P(a, b, c)) = log(P(a)) + log(P(b)) + log(P(c))
- Therefore, DP computes the most likely path, where each branch in the path is independent, and where skip, delete, and match costs represent logs of probabilities.

Example for Melodic Matching
- Collect some typical vocal queries
- By hand, label the queries with the correct pitches (what the singer was trying to sing, not what they actually sang)
- Have the computer transcribe the queries
- Construct a histogram of relative pitch error, from -12 (octave error) to +12 (octave error)
- With DP string matching, we added 1 for a match. With this approach, we add log(P(interval)). Skip and deletion costs are still ad hoc. (A sketch of this conversion follows.)
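A hedged sketch of turning such an error histogram into log-probability match rewards; the error data below are made up for illustration:

```python
import math
from collections import Counter

# errors: transcribed interval minus intended interval, one entry per note,
# gathered from hand-labeled queries (hypothetical data here).
errors = [0, 0, 0, 1, -1, 0, 2, 0, -12, 12, 1, 0]

counts = Counter(errors)
total = sum(counts.values())
log_prob = {e: math.log(c / total) for e, c in counts.items()}

def match_reward(sung_interval, target_interval, floor=math.log(1e-4)):
    """log P(observed error); unseen errors get a small floor probability."""
    return log_prob.get(sung_interval - target_interval, floor)

print(match_reward(2, 2))    # error 0: the most common case, highest reward
print(match_reward(14, 2))   # error +12 (octave): rarer, lower reward
```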

Audio to Score Alignment. Ning Hu, Roger B. Dannenberg, and George Tzanetakis

Music Representations
- Symbolic representation: easy to manipulate, but a "flat" performance
- Audio representation: an expressive performance, but opaque and unstructured
- Goal: align the two representations

Motivation
- Query-by-Humming: find an audio file from a sung query
- Where do we get a database of melodies (we can't extract melody from general audio)?
- Melodies can be extracted from MIDI files
- Can we then match the MIDI files to audio files?

Alignment to Audio
- Related work: please see the paper and the ISMIR 2003 proceedings
- Obtain features from the audio and from the score:
  - Chromagram
  - Pitch histogram
  - Mel-frequency cepstral coefficients (MFCC)
- Use DTW to align the feature strings

Acoustic Features: Chromagram
- A sequence of 12-element chroma vectors
- Each element represents the spectral energy corresponding to one pitch class (C, C#, D, ...)
- Computing process: audio data (0.25 s per frame, non-overlapping) → FFT → average magnitude of FFT bins, collapsed into one octave (12 pitch classes) → chroma vectors
- Advantages:
  - Sensitive to prominent pitches and chords
  - Insensitive to spectral shape

Chromagram Representation
- Spectrum: mapping linear frequency to log frequency gives a "semitone vector" (one bin per semitone)
- Projection to pitch classes gives the "chroma vector": C1 + C2 + ... + C7, C#1 + C#2 + ... + C#7, etc.
- Distance function: Euclidean, cosine, etc.
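A rough sketch of computing and comparing chroma vectors along these lines; the frame length, windowing, and frequency range are illustrative choices, not the lecture's exact settings:

```python
import numpy as np

def chroma_vector(frame, sample_rate=22050, ref_a4=440.0):
    """12-element vector of spectral magnitude per pitch class (index 0 = C)."""
    mags = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    chroma = np.zeros(12)
    for f, m in zip(freqs, mags):
        if f < 27.5 or f > 4200.0:                 # ignore bins outside the pitch range
            continue
        midi = 69 + 12 * np.log2(f / ref_a4)       # frequency -> (fractional) MIDI pitch
        chroma[int(np.round(midi)) % 12] += m      # accumulate into its pitch class
    return chroma

def normalize(v):
    """Zero mean, unit variance, as done before computing Euclidean distance."""
    return (v - v.mean()) / (v.std() + 1e-9)

# Distance between two frames' normalized chroma vectors; 0 would be perfect agreement.
a = normalize(chroma_vector(np.random.randn(5512)))   # 0.25 s at 22050 Hz
b = normalize(chroma_vector(np.random.randn(5512)))
print(np.linalg.norm(a - b))
```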

Alignment pipeline: the audio file is analyzed into a frame sequence; the MIDI file is rendered to audio and analyzed into a frame sequence; DTW aligns the two, producing an alignment path and the average frame distance along the path.

Comparing & Matching Chroma
- Two sequences of chroma vectors:
  - Audio rendered from MIDI (using the Timidity renderer)
  - The acoustic recording
- Chroma comparison:
  - Normalize the chroma vectors (µ = 0, σ = 1)
  - Calculate the Euclidean distance between vectors
  - Distance 0 = perfect agreement

Locate Optimal Alignment Path
- Dynamic Time Warping (DTW) algorithm
- Calculation pattern for cell (i, j) in the matrix: given the three neighboring cells a, b, c (diagonal, above, left), the new cell is D = M(i,j) = min(a, b, c) + dist(i, j)

Similarity matrix for Beethoven's 5th Symphony, first movement: the acoustic recording against audio rendered from MIDI (durations 7:49 and 6:17). (Figures: the similarity matrix, then the same matrix with the optimal alignment path overlaid; the oboe solo is indicated.)

Optimization
- Chroma is not sensitive to timbre: MIDI synthesized with the original symphonic instrumentation and MIDI synthesized with only a piano sound give similar chromagrams
- So avoid MIDI synthesis and chroma extraction altogether:
  - Map each pitch directly to a chroma vector
  - Sum the vectors and then normalize (sketched below)
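A hedged sketch of building chroma frames directly from MIDI notes, skipping synthesis; notes are assumed to be (onset, duration, pitch) triples:

```python
import numpy as np

def midi_to_chroma(notes, frame_period=0.25, total_time=None):
    """One 12-element chroma vector per frame; each sounding pitch adds weight
    to its pitch class, and each frame is then normalized."""
    if total_time is None:
        total_time = max(onset + dur for onset, dur, _ in notes)
    n_frames = int(np.ceil(total_time / frame_period))
    chroma = np.zeros((n_frames, 12))
    for onset, dur, pitch in notes:
        first = int(onset / frame_period)
        last = int(np.ceil((onset + dur) / frame_period))
        chroma[first:last, pitch % 12] += 1.0
    # normalize each frame (mean 0, std 1) to match the audio chroma processing
    mean = chroma.mean(axis=1, keepdims=True)
    std = chroma.std(axis=1, keepdims=True) + 1e-9
    return (chroma - mean) / std

notes = [(0.0, 1.0, 60), (0.0, 1.0, 64), (0.0, 1.0, 67), (1.0, 1.0, 65)]  # C major chord, then F
print(midi_to_chroma(notes).shape)
```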

Alignment pipeline (shown again): audio file → analysis → frame sequence; MIDI file → MIDI-to-audio → analysis → frame sequence; DTW → alignment path and average frame distance in the path.

Alignment Successful Even With Vocals
(Figure: "Let It Be", with vocals, matched with MIDI data.)

Intelligent Audio Editor Mock-up (figure)

Summary & Conclusions (on Audio to Score Alignment)
- How to align MIDI to audio
- Simple computation: no learning, few parameters to tune
- Evaluated several different features
- Investigated searching for MIDI files given audio
- Building a bridge between signal and symbol representations
- In many cases, serves as a replacement for polyphonic music transcription

Music Fingerprinting
(Photo by Philips)

Music Fingerprinting
- Motivation: how do you
  - find the title of a song playing in a club or on the radio
  - generate playlists from radio broadcasts for royalty distribution
  - detect copies of songs
  - find the original work, given a copy
- Note: recordings and copies have many kinds of distortion and time stretching

Audio Fingerprinting Problem Statement
- Given: a partial copy of a music recording (usually about 10 or 15 seconds), with some distortion
  - e.g. cell phone audio
  - radio stations often shorten songs
- Given: a database of original, high-quality audio
- Find: the audio in the database that is, with high probability, the original recording

How It Works (General)
- Find some unique audio features that survive distortion and transformation with high probability
- Build an index from (quantized) features to the database
- Search (see the sketch below):
  - Calculate (many) features from the query
  - Look up matching songs in the database
  - Output the song(s) with a sufficient number of matches
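A minimal sketch of this index-and-vote search; the feature words and the offset-quantization step are placeholders for a real fingerprint front end:

```python
from collections import defaultdict, Counter

def build_index(database):
    """database: {song_id: [(time, feature_word), ...]} -> feature_word -> [(song_id, time)]"""
    index = defaultdict(list)
    for song_id, features in database.items():
        for t, word in features:
            index[word].append((song_id, t))
    return index

def match(query_features, index, min_hits=3):
    """Vote on (song, time offset); many hits at one offset suggest a real match."""
    votes = Counter()
    for q_time, word in query_features:
        for song_id, t in index.get(word, []):
            votes[(song_id, round(t - q_time, 1))] += 1
    return [(song, offset, n) for (song, offset), n in votes.most_common() if n >= min_hits]

db = {"songA": [(0.0, 0x1A2B), (0.1, 0x3C4D), (0.2, 0x5E6F)]}
idx = build_index(db)
print(match([(5.0, 0x1A2B), (5.1, 0x3C4D), (5.2, 0x5E6F)], idx, min_hits=2))
```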

Features: Spectral Flux
- The Philips system uses spectral flux: audio → FFT → per-band derivative → quantize (one bit per band)
- The output is a stream of 32-bit words
- Each word is indexed
- The search looks for a number of exact matches that indicate a roughly constant time stretch
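A rough sketch of computing 32-bit sub-fingerprints, loosely following the published Haitsma & Kalker description (the sign of the band-energy difference across frequency and time); the band energies here are random stand-ins:

```python
import numpy as np

def sub_fingerprints(band_energies):
    """band_energies: array of shape (frames, 33) -> one 32-bit word per frame."""
    d = np.diff(band_energies, axis=1)           # difference across frequency -> (frames, 32)
    dd = d[1:] - d[:-1]                          # difference across time -> (frames - 1, 32)
    bits = (dd > 0).astype(np.uint32)            # quantize to one bit per band pair
    words = np.zeros(len(bits), dtype=np.uint32)
    for k in range(32):
        words |= bits[:, k] << np.uint32(k)      # pack 32 bits into one word
    return words

energies = np.abs(np.random.randn(100, 33))      # stand-in for real band energies
print(sub_fingerprints(energies)[:5])
```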

Comparing Fingerprints (figure)

Features: Spectral Peaks
- Shazam uses pairs of spectral peaks (figure: a "constellation" of peaks over the spectrogram)
- Peaks are likely to survive any distortion and time stretch
- Pairs are unique enough to serve as a good index
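A hedged sketch of the peak-pair idea: pick the strongest spectrogram bins per frame and hash pairs of peaks with their time gap; all parameters are illustrative, not Shazam's actual values:

```python
import numpy as np

def spectrogram(x, frame=1024, hop=512):
    frames = [x[i:i + frame] * np.hanning(frame)
              for i in range(0, len(x) - frame, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))        # (time, frequency)

def peaks(spec, per_frame=3):
    """Keep the few strongest bins in each frame as (time, freq_bin) peaks."""
    out = []
    for t, row in enumerate(spec):
        for f in np.argsort(row)[-per_frame:]:
            out.append((t, int(f)))
    return out

def hashes(pks, fan_out=5, max_dt=20):
    """Pair each peak with nearby later peaks; (f1, f2, dt) becomes the lookup key."""
    pks = sorted(pks)
    out = []
    for i, (t1, f1) in enumerate(pks):
        for t2, f2 in pks[i + 1:i + 1 + fan_out]:
            dt = t2 - t1
            if 0 < dt <= max_dt:
                out.append(((f1, f2, dt), t1))         # key, anchor time
    return out

x = np.random.randn(22050)                             # stand-in for one second of audio
print(len(hashes(peaks(spectrogram(x)))))
```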

Performance and Business
- Shazam: >1M tunes, ~3 s retrieval by cell phone
- Gracenote bought the Philips technology (Gracenote is behind CDDB); says 28M songs (Wikipedia); Mobile Music ID: phone in a song to buy the matching ringtone
- NTT and others announced systems in the past
- Echo Nest (bought by Spotify)
- Last.fm

Query-By-Humming with an Audio Database
- Problem: given an audio database, find songs that match a sung audio query
- So far, extracting melody from audio is quite difficult and error prone
- QBH with symbolic data is already pretty marginal
- A few systems have been built (SoundHound, Midomi), but results are not nearly as strong as with music fingerprinting

Finding Covers
- A cover is a performance of a song by someone other than the original artist
- Finding covers in a database, given the original recording, is similar to music fingerprinting, but:
  - music fingerprinting uses distinctive acoustic features,
  - not high-level semantic features that are shared between originals and covers
- Some success matching chromagram features computed at very low rates (about one frame per second), which averages out almost everything except chords, key changes, and very prominent melodic material

Music Information Retrieval Summary
- Query-by-Humming techniques:
  - String matching techniques
  - Dynamic Time Warping
  - Hidden Markov Models
  - N-grams
- Representation is critical
- Tie DP & DTW to (log) probabilities

Music Information Retrieval Summary (2)
- Audio-to-score matching:
  - The chromagram representation is very successful
  - Robust enough for real-world applications now
- Audio fingerprinting:
  - The key is to find robust and distinctive acoustic features
  - Indexing is used for fast retrieval
  - Some post-processing to select songs with multiple consistent hits
  - Already a big business

Summary
- Music fingerprinting works by forming an index of features that are highly reproducible from (re)recorded audio
- Audio-to-symbolic music alignment works well, at least with limited temporal precision
- Other MIR tasks, such as query-by-humming and cover song detection, are much more difficult; no general and robust solutions exist