Audio Structure Analysis

Lecture: Music Processing
Audio Structure Analysis
Meinard Müller
International Audio Laboratories Erlangen
meinard.mueller@audiolabs-erlangen.de

Music Structure Analysis

Music segmentation
  pitch content (e.g., melody, harmony)
  music texture (e.g., timbre, instrumentation, sound)
  rhythm
Detection of repeating sections, phrases, motives
  song structure (e.g., intro, verse, chorus)
  musical form (e.g., sonata form, rondo form)
Detection of other hidden relationships

Repetition-Based Audio Structure Analysis

Extract the repetitive structure of a given audio recording
Often corresponds to the musical form of the underlying piece
The thumbnail is the most repetitive segment

Examples (each shown as a figure over a time axis):
  Zager & Evans, In The Year 2525
  Folk song field recording (Nederlandse Liederenbank)
  Brahms, Hungarian Dance No. 5 (Ormandy)

Repetition-Based Audio Structure Analysis

Lots of previous work, such as:
  Dannenberg/Hu (ISMIR 2002)
  Peeters/Burthe/Rodet (ISMIR 2002)
  Cooper/Foote (ISMIR 2002)
  Goto (ICASSP 2003)
  Chai/Vercoe (ACM Multimedia 2003)
  Lu/Wang/Zhang (ACM Multimedia 2004)
  Bartsch/Wakefield (IEEE Trans. MM 2005)
  Goto (IEEE Trans. Audio 2006)
  Müller/Kurth (EURASIP 2007)
  Rhodes/Casey (ISMIR 2007)
  Peeters (ISMIR 2007)
  Paulus/Klapuri (IEEE TASLP 2009)
  Paulus/Müller/Klapuri (ISMIR 2010)
  Müller/Grosche/Jiang (ISMIR 2011)

System: SmartMusicKiosk (Goto)

System: SyncPlayer/AudioStructure

Basic Procedure

Audio features
Cost measure and cost matrix (self-similarity matrix)
Path extraction (pairwise similarity of segments)
Global structure (clustering, grouping)

Basic Procedure (features and cost matrix)

Audio features: 12-dimensional normalized chroma vectors
Local cost measure
Cost matrix: quadratic self-similarity matrix

[Figure: self-similarity matrix and its similarity structure; axis ticks 50 to 200]
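The step from chroma features to a cost matrix can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the formula for the local cost measure did not survive in the transcript, so the sketch assumes the cost is one minus the inner product of unit-norm chroma vectors. The function names `normalize_columns` and `cost_matrix` are hypothetical.

```python
import numpy as np

def normalize_columns(X, eps=1e-9):
    """Scale every column (one chroma vector per frame) to unit Euclidean norm."""
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    return X / np.maximum(norms, eps)

def cost_matrix(X):
    """Quadratic cost matrix C[n, m] = 1 - <x_n, x_m> (assumed cost measure).

    X has shape (12, N): 12 chroma bands, N frames.
    Low cost means that two frames are harmonically similar.
    """
    Xn = normalize_columns(X)
    return 1.0 - Xn.T @ Xn
```

With non-negative chroma input the cost lies between 0 and 1, and the main diagonal is zero because every frame matches itself.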

Basic Procedure (from matrix to structure)

Self-similarity matrix
Similarity structure
Path relations (paths numbered 1 to 7)
Grouping / Transitivity

[Figures: self-similarity matrix with similarity structure, path relations and grouping, shown in several build steps; axis ticks 50 to 200]

Matrix Enhancement

Challenge: presence of musical variations
  Fragmented paths and gaps
  Paths of poor quality
  Regions of constant (low) cost
  Curved paths
Idea: enhancement of path structure

Matrix Enhancement

Example: Shostakovich, Waltz 2, Jazz Suite No. 2 (Chailly)

Idea: usage of contextual information (Foote 1999)
  Comparison of entire sequences
  Length of sequences
  Enhanced cost matrix
  Smoothing effect
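Comparing entire sequences instead of single frames amounts to averaging the cost matrix along the direction of the main diagonal. The sketch below assumes a forward window of `L` frames and shortens the window at the matrix border; both choices, and the name `smooth_diagonal`, are assumptions made for illustration.

```python
import numpy as np

def smooth_diagonal(C, L):
    """Average C over L consecutive cells along the main-diagonal direction.

    Result[n, m] is the mean of C[n + l, m + l] for l = 0 .. L-1,
    using only the cells that lie inside the matrix.
    """
    C = np.asarray(C, dtype=float)
    N, M = C.shape
    L = max(1, min(L, N, M))          # window cannot exceed the matrix size
    acc = np.zeros((N, M))
    cnt = np.zeros((N, M))
    for l in range(L):
        acc[:N - l, :M - l] += C[l:, l:]
        cnt[:N - l, :M - l] += 1.0
    return acc / cnt                   # cnt >= 1 everywhere (l = 0 covers all cells)
```

A larger `L` gives a stronger smoothing effect, at the price of blurring short repetitions.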

[Figures: Shostakovich, cost matrix and enhanced cost matrix]
[Figures: Brahms, cost matrix and enhanced cost matrix]

Problem: relative tempo differences are smoothed out

Matrix Enhancement

Idea: smoothing along various directions and minimizing over all directions
  Each direction of smoothing yields its own enhanced cost matrix
  Usage of eight slope values covers tempo changes of -30 to +40 percent
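Smoothing along several directions and taking the cell-wise minimum can be sketched as below. The slides state only that eight slope values cover tempo changes of -30 to +40 percent; the evenly spaced slopes between 0.7 and 1.4 used in the usage note, the border handling by index clipping, and the names `smooth_direction` and `enhance_multidirection` are assumptions.

```python
import numpy as np

def smooth_direction(C, L, slope):
    """Average C along the direction (1, slope): cells (n + l, m + round(l * slope))."""
    C = np.asarray(C, dtype=float)
    N, M = C.shape
    n_idx = np.arange(N)[:, None]      # column vector of row indices
    m_idx = np.arange(M)[None, :]      # row vector of column indices
    acc = np.zeros((N, M))
    for l in range(L):
        rows = np.minimum(n_idx + l, N - 1)                       # clip at the border
        cols = np.minimum(m_idx + int(round(l * slope)), M - 1)
        acc += C[rows, cols]           # broadcasts to shape (N, M)
    return acc / L

def enhance_multidirection(C, L, slopes):
    """Cell-wise minimum over the matrices smoothed along each slope."""
    return np.min([smooth_direction(C, L, s) for s in slopes], axis=0)
```

A possible call is `enhance_multidirection(C, 10, np.linspace(0.7, 1.4, 8))`. Taking the minimum keeps a path visible as long as at least one direction matches its local tempo ratio.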

Matrix Enhancement (figure sequence)
  Cost matrix
  Cost matrix with filtering along the main diagonal
  Cost matrix with filtering along 8 different directions and minimizing

Path Extraction

Start with initial point
Extend path in greedy fashion
Remove path neighborhood
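The three path-extraction steps can be sketched as a simple greedy loop. This is a rough illustration under stated assumptions, not the procedure used in the lecture: the step sizes (1,1), (1,2), (2,1), the restriction to cells above the main diagonal, the square removal neighborhood, and all names (`STEPS`, `extract_paths`, `threshold`, `radius`, `min_len`) are hypothetical choices.

```python
import numpy as np

STEPS = [(1, 1), (1, 2), (2, 1)]       # assumed step sizes

def _extend(C, start, sign, threshold):
    """Greedily follow the cheapest admissible neighbor (forward if sign=+1, backward if -1)."""
    N, M = C.shape
    n, m = start
    path = []
    while True:
        best = None
        for dn, dm in STEPS:
            a, b = n + sign * dn, m + sign * dm
            if 0 <= a < N and 0 <= b < M and C[a, b] < threshold:
                if best is None or C[a, b] < C[best]:
                    best = (a, b)
        if best is None:
            return path
        path.append(best)
        n, m = best

def extract_paths(C, threshold, min_len=2, radius=1):
    """Return a list of paths, each a list of (n, m) cells with cost below threshold."""
    C = np.array(C, dtype=float)       # work on a copy
    N, M = C.shape
    C[np.tril_indices(N, k=0, m=M)] = np.inf   # keep only cells with m > n
    paths = []
    while True:
        start = np.unravel_index(np.argmin(C), C.shape)
        if C[start] >= threshold:      # no admissible initial point left
            break
        start = (int(start[0]), int(start[1]))
        back = _extend(C, start, -1, threshold)
        fwd = _extend(C, start, +1, threshold)
        path = back[::-1] + [start] + fwd
        for n, m in path:              # remove the neighborhood of the path
            C[max(n - radius, 0):n + radius + 1,
              max(m - radius, 0):m + radius + 1] = np.inf
        if len(path) >= min_len:
            paths.append(path)
    return paths
```

Removing the neighborhood after each path prevents the same repetition from being reported several times as slightly shifted paths.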

Path Extraction (figure sequence)
  Cost matrix
  Enhanced cost matrix
  Thresholded
  Thresholded, upper left
  Path removal
  Extracted paths
  Extracted paths after postprocessing

Global Structure

How can one derive the global structure from pairwise relations?

Task: computation of similarity clusters
Problem: missing and inconsistent path relations
Strategy: approximate transitive hull

[Figure: path relations]
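One simple way to approximate a transitive hull over noisy pairwise relations is to merge segments that overlap strongly and then take connected components. The sketch below does this with a union-find structure. It is a stand-in for illustration only, not the clustering scheme used in the lecture; the overlap criterion, the default value of `min_overlap`, and the function names are assumptions.

```python
def overlap_ratio(a, b):
    """Intersection length divided by the length of the joint span of intervals a and b."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    span = max(a[1], b[1]) - min(a[0], b[0])
    return max(inter, 0) / span if span > 0 else 0.0

def similarity_clusters(relations, min_overlap=0.8):
    """Group segments into clusters.

    relations: list of pairs ((s1, e1), (s2, e2)), one pair per extracted path.
    Two segments end up in the same cluster if a path relates them or if they
    overlap by at least min_overlap; clusters are the connected components.
    """
    segs = [seg for pair in relations for seg in pair]
    parent = list(range(len(segs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for k in range(0, len(segs), 2):        # segments related by a path
        union(k, k + 1)
    for i in range(len(segs)):              # nearly identical segments
        for j in range(i + 1, len(segs)):
            if overlap_ratio(segs[i], segs[j]) >= min_overlap:
                union(i, j)

    clusters = {}
    for i, seg in enumerate(segs):
        clusters.setdefault(find(i), []).append(seg)
    return [sorted(set(c)) for c in clusters.values()]
```

For example, the relations A1~A2 and A2'~A3, where A2 and A2' are almost the same interval, yield a single cluster containing A1, A2, A2' and A3 even though no path links A1 and A3 directly.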

[Figures: path relations in several build steps, followed by the final result compared with the ground truth]

Transposition Invariance

Example: Zager & Evans, In The Year 2525

Transposition Invariance

Goto (ICASSP 2003)
  Cyclically shift chroma vectors in one sequence
  Compare shifted sequence with original sequence
  Perform for each of the twelve shifts a separate structure analysis
  Combine the results
Müller/Clausen (ISMIR 2007)
  Integrate all cyclic information in one transposition-invariant self-similarity matrix
  Perform one joint structure analysis

Transposition Invariance

Example: Zager & Evans, In The Year 2525
[Figures: self-similarity matrices for the original and for the shifted chroma sequence]

Minimize over all twelve matrices
[Figure: thresholded self-similarity matrix]
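Minimizing over the twelve cyclically shifted matrices can be sketched as follows. The cost measure (one minus the inner product of unit-norm chroma vectors) is an assumption, as is the function name `transposition_invariant_cost`. The second return value records which shift achieved the minimum in each cell.

```python
import numpy as np

def transposition_invariant_cost(X, eps=1e-9):
    """Return (C_ti, shift_index) for a chroma sequence X of shape (12, N).

    For each shift i in 0..11 the second copy of the sequence is rotated by
    i chroma bands and compared with the original; C_ti is the cell-wise minimum.
    """
    Xn = X / np.maximum(np.linalg.norm(X, axis=0, keepdims=True), eps)
    costs = np.stack([1.0 - Xn.T @ np.roll(Xn, i, axis=0) for i in range(12)])
    return costs.min(axis=0), costs.argmin(axis=0)
```

A section that is repeated in a different key then produces a low-cost path in `C_ti`, and the shift index along that path indicates the transposition interval in semitones (modulo 12).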

Transposition Invariance

Path extraction
Computation of similarity clusters

Stabilizing effect
[Figures: self-similarity matrix (thresholded) and transposition-invariant self-similarity matrix (thresholded)]

Transposition-invariant matrix and minimizing shift index
[Figures: cells with minimizing shift index = 0, 1, 2]

Serra/Gomez (ICASSP 2008): used for cover song ID
Discrete structure suitable for indexing?

Example: Beethoven, Tempest
[Figures: self-similarity matrix and transposition-invariant self-similarity matrix]

Conclusions: Audio Structure Analysis

Challenge: musical variations
  Timbre, dynamics, tempo
  Musical key (cyclic chroma shifts)
  Major/minor
  Differences at note level / improvisations

Strategy: matrix enhancement
  Filtering techniques / contextual information
    Cooper/Foote (ISMIR 2002)
    Müller/Kurth (ICASSP 2006)
  Transposition-invariant similarity matrices
    Goto (ICASSP 2003)
    Müller/Clausen (ISMIR 2007)
  Higher-order similarity matrices
    Peeters (ISMIR 2007)

Novel Approach for Audio Thumbnailing

Original approach: two steps
  1. Path extraction
     Paths of poor quality (fragmented, gaps)
     Regions of constant (low) cost
     Curved paths
  2. Grouping
     Noisy relations (missing, distorted, overlapping)
     Transitivity computation difficult
Both steps are problematic!

Our main idea: do both path extraction and grouping jointly
  One optimization scheme for both steps
  Stabilizing effect
  Efficient

For each audio segment we define a fitness value
  This fitness value expresses how well the segment explains the entire audio recording
  The segment with the highest fitness value is considered to be the thumbnail
  As main technical concept we introduce the notion of a path family

Fitness Measure

Self-similarity matrix
  Smoothing
  Transposition invariance
  Normalization
  Thresholding
  Negative score
[Figure: self-similarity matrix with color bar; both axes from 0 to 200]

Path over segment
  Consider a fixed segment
  A path over the segment, with its induced segment: score is high
  A second path over the segment, with its induced segment: score is not so high
  A third path over the segment, with its induced segment: score is very low

Path family
  A path family over a segment is a family of paths such that the induced segments do not overlap.
  Counterexample (figure): This is not a path family!
  Example (figure): This is a path family! (Even though not a good one)

Optimal path family
  Consider over the segment the optimal path family, i.e., the path family having maximal overall score.
  Call this value: Score(segment)
  Note: This optimal path family can be computed using dynamic programming.
  Furthermore, consider the amount covered by the induced segments.
  Call this value: Coverage(segment)
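The dynamic-programming computation of Score(segment) can be sketched as below. The slides do not spell out the recursion, so this is one possible formulation under stated assumptions: rows of `S_seg` index the frames of the whole recording, columns index the frames of the fixed segment, an extra first column represents "currently outside any path", and the admissible steps are (1,1), (1,2), (2,1). The function name is hypothetical, and the sketch returns only the score, not the coverage.

```python
import numpy as np

def optimal_path_family_score(S_seg):
    """Maximal total score of a path family over the segment.

    S_seg: array of shape (N, M), the columns of the self-similarity matrix
           that belong to the segment (N recording frames, M segment frames).
    D[n, 0]  : best score so far when no path is active before row n
    D[n, m]  : best score of a family whose current path ends at cell (n, m-1)
    """
    S_seg = np.asarray(S_seg, dtype=float)
    N, M = S_seg.shape
    D = np.full((N, M + 1), -np.inf)
    D[0, 0] = 0.0
    D[0, 1] = S_seg[0, 0]
    for n in range(1, N):
        # Either stay outside, or the previous path just covered the whole segment.
        D[n, 0] = max(D[n - 1, 0], D[n - 1, M])
        # A new path starts at the first segment frame.
        D[n, 1] = D[n, 0] + S_seg[n, 0]
        for m in range(2, M + 1):
            best = D[n - 1, m - 1]                 # step (1, 1)
            if m >= 3:
                best = max(best, D[n - 1, m - 2])  # step (1, 2)
            if n >= 2:
                best = max(best, D[n - 2, m - 1])  # step (2, 1)
            D[n, m] = S_seg[n, m - 1] + best
    return float(max(D[N - 1, 0], D[N - 1, M])), D
```

Because a new path may start only after the previous one has reached the last segment frame in an earlier row, the induced segments of the resulting paths cannot overlap. Coverage(segment) would follow from backtracking through `D`, which is omitted here.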

Fitness Measure: Fitness

Consider a fixed segment

  P := Score(segment)
  R := Coverage(segment)

Self-explanations are trivial! Subtract the length of the segment:

  P := Score(segment) - length(segment)
  R := Coverage(segment) - length(segment)

Normalization to [0,1]:

  P := Normalize( Score(segment) - length(segment) )
  R := Normalize( Coverage(segment) - length(segment) )

Fitness(segment):

  F := 2 P R / (P + R)

Thumbnail: Fitness Scape Plot
[Figure: fitness scape plot; axes: segment center and segment length; color: fitness]
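The fitness definition F := 2 P R / (P + R) can be sketched as a small function. The slides do not say how Normalize is defined, so the sketch assumes that the score is divided by the total length of the path family and the coverage by the length of the recording, with both results clipped to [0,1]. The parameter names `path_len` and `total_len` are hypothetical.

```python
def fitness(score, coverage, seg_len, path_len, total_len):
    """Harmonic mean of normalized score P and normalized coverage R.

    score, coverage : Score(segment) and Coverage(segment)
    seg_len         : length(segment), subtracted to discard the self-explanation
    path_len        : total length of all paths in the family (assumed normalizer for P)
    total_len       : length of the whole recording (assumed normalizer for R)
    """
    if path_len <= 0 or total_len <= 0:
        return 0.0
    P = min(max((score - seg_len) / path_len, 0.0), 1.0)
    R = min(max((coverage - seg_len) / total_len, 0.0), 1.0)
    if P + R == 0:
        return 0.0                     # only the trivial self-explanation
    return 2 * P * R / (P + R)
```

The harmonic mean is high only if both P and R are high, so a segment must repeat accurately and also account for a large part of the recording.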

Thumbnail: Fitness Scape Plot

Each point shows Fitness(segment), indexed by segment center and segment length
Note: self-explanations are ignored (fitness is zero)
Thumbnail := segment having the highest fitness

Examples (figures):
  Brahms, Hungarian Dance No. 5 (Ormandy)
  Zager & Evans, In The Year 2525
  Beethoven, Tempest (Pollini)

Musical knowledge: minimum length for thumbnail
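Selecting the thumbnail from a table of fitness values, with a minimum length as prior musical knowledge, can be sketched as follows. The storage layout is an assumption made for illustration: `F[l, s]` holds the fitness of the segment of length `l + 1` frames that starts at frame `s` (the slides index by segment center instead), and cells for segments that do not fit are NaN. The name `select_thumbnail` is hypothetical.

```python
import numpy as np

def select_thumbnail(F, min_len=1):
    """Return (start, end, fitness) of the fittest segment with at least min_len frames.

    F[l, s]: fitness of the segment covering frames s .. s + l (inclusive).
    """
    F = np.array(F, dtype=float)                 # work on a copy
    F = np.where(np.isnan(F), -np.inf, F)        # invalid cells never win
    F[:max(min_len - 1, 0), :] = -np.inf         # enforce the minimum length
    l, s = np.unravel_index(np.argmax(F), F.shape)
    return int(s), int(s + l), float(F[l, s])
```

Raising `min_len` keeps a short but highly repetitive fragment from being chosen over a musically meaningful section.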

[Figure: fitness scape plot; example NLB72246]

Conclusions

Path family: couples path extraction and grouping
Fitness: quality of segment in context of entire recording
  Combination of score and coverage
  Trivial self-explanations are disregarded
Thumbnail: segment of maximal fitness
Fitness scape plot: global structure visualization

Future work
  Multiscale approach
  Combination with novelty detection
  Interface for structure navigation