Extracting and Using Music Audio Information


Extracting and Using Music Audio Information

Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio (LabROSA)
Dept. of Electrical Engineering, Columbia University, NY, USA
http://labrosa.ee.columbia.edu/

1. Motivation: Music Collections
2. Music Information
3. Music Similarity
4. Music Structure Discovery

Music Audio Information - Ellis 2007-11-02 p. 1/42

LabROSA Overview

Information extraction for music, speech, and environmental audio: machine learning and signal processing applied to recognition, separation, and retrieval.

1. Managing Music Collections

A lot of music data is available: e.g. 60 GB of MP3 is about 1000 hours of audio, or 15k tracks.
The management challenge: how can computers help?
Application scenarios:
- personal music collections
- discovering new music
- music placement

Learning from Music

What can we infer from 1000 hours of music?
- common patterns: sounds, melodies, chords, form
- what is, and what isn't, music
Data-driven musicology?
Applications: modeling/description/coding, computer-generated music, curiosity...
[Figure: scatter of PCA dimensions 3-6 of 12x16 beat-chroma patches]

The Big Picture

Music audio is reduced to low-level features, then to musical elements: melody and notes, key and chords, tempo and beat. These feed classification and similarity (browsing, discovery, production), which is the work so far, and ultimately music structure discovery (modeling, generation, curiosity).

2. Music Information

How should we represent music audio?
- Audio features: spectrogram, MFCCs, bases
- Musical elements: notes, beats, chords, phrases (requires transcription)
- Or something in between, optimized for a certain task?
[Figure: spectrogram, 0-4 kHz over 5 s]

Transcription as Classification (Poliner & Ellis 05, 06, 07)

Exchange signal models for data: treat transcription as a pure classification problem.
- Training data and features: MIDI, multi-track recordings, playback piano, and resampled audio (less than 28 minutes of training audio); normalized-magnitude STFT.
- Classification: N binary SVMs, one per note; independent frame-level classification on a 10 ms grid; distance to the class boundary serves as a posterior.
- Temporal smoothing: a two-state (on/off) HMM per note, with parameters learned from the training data; find the Viterbi sequence for each note.
Pipeline: feature vector -> classification posteriors -> HMM smoothing.
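The temporal-smoothing step above can be sketched as follows. This is a minimal stand-in, not the Poliner & Ellis implementation: it takes whatever per-frame note posteriors a classifier produces and runs a two-state Viterbi pass over them (the `p_stay` self-transition value here is an illustrative guess, not a learned parameter):

```python
import numpy as np

def viterbi_smooth(post, p_stay=0.9):
    """Two-state (off/on) HMM smoothing of per-frame note posteriors.

    post   : (T,) array of P(note on | frame) from a per-note classifier
    p_stay : self-transition probability for both states
    Returns a (T,) boolean array: the Viterbi on/off sequence.
    """
    T = len(post)
    eps = 1e-9
    # log emission likelihoods for states 0 (off) and 1 (on)
    logB = np.log(np.clip(np.stack([1 - post, post], axis=1), eps, 1.0))
    logA = np.log(np.array([[p_stay, 1 - p_stay],
                            [1 - p_stay, p_stay]]))
    delta = logB[0].copy()              # uniform initial prior (constant dropped)
    psi = np.zeros((T, 2), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA  # scores[prev, next]
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[t]
    # backtrace the best state sequence
    states = np.zeros(T, dtype=int)
    states[-1] = int(delta.argmax())
    for t in range(T - 1, 0, -1):
        states[t - 1] = psi[t, states[t]]
    return states.astype(bool)
```

The smoothing rides out single-frame dips in the posteriors that would otherwise fragment a sustained note.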

Polyphonic Transcription (MIREX 2007)

Real music excerpts with ground truth.
- Frame-level transcription: estimate the fundamental frequencies of all notes present on a 10 ms grid (scored by precision, recall, accuracy, and total error Etot = Esubs + Emiss + Efa).
- Note-level transcription: group frame-level predictions into notes by estimating onsets and offsets (scored by precision, recall, average F-measure, and average overlap).

Beat Tracking (Ellis 06, 07)

Goal: one feature vector per beat (tatum), for tempo normalization and efficiency.
Onset strength envelope, computed over a mel-frequency spectrogram:

  O(t) = sum_f max(0, d/dt log |X(t, f)|)

Autocorrelation of O(t), with a window favoring typical tempos, gives a global tempo estimate (168.5 BPM in the example).
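The envelope and tempo estimate above can be sketched directly; this is a minimal version that omits the tempo-preference window (the `min_lag` guard below stands in for it, and is an assumption of this sketch):

```python
import numpy as np

def onset_strength(S):
    """Onset envelope O(t) = sum_f max(0, d/dt log|X(t,f)|).

    S : (n_freq, n_frames) magnitude spectrogram (e.g. mel-scaled).
    """
    logS = np.log(np.maximum(S, 1e-10))
    diff = np.diff(logS, axis=1)        # first difference along time
    return np.maximum(0.0, diff).sum(axis=0)

def tempo_lag(onset, min_lag=4):
    """Global tempo period as the best autocorrelation lag (in frames)."""
    o = onset - onset.mean()
    ac = np.correlate(o, o, mode="full")[len(o) - 1:]
    ac[:min_lag] = -np.inf              # ignore trivially short lags
    return int(np.argmax(ac))
```

On a spectrogram with energy bursts every 8 frames, the recovered lag is 8.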

Beat Tracking by Dynamic Programming

Dynamic programming finds beat times {t_i} that optimize

  sum_i O(t_i) + sum_i W((t_{i+1} - t_i - tau_p) / beta)

where O(t) is the onset strength envelope (local score), W is a log-Gaussian window (transition cost), and tau_p is the default beat period per the measured tempo. Incrementally find the best predecessor at every time:

  C*(t) = gamma O(t) + (1 - gamma) max_tau { W((t - tau - tau_p)/beta) C*(tau) }
  P(t)  = argmax_tau { W((t - tau - tau_p)/beta) C*(tau) }

then backtrace from the largest final score to get the beats.
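A minimal sketch of this recursion, using an additive (log-domain-style) combination of window and cumulative score rather than the multiplicative form above; `alpha` and `beta` are illustrative defaults, not the published settings:

```python
import numpy as np

def track_beats(onset, period, alpha=0.8, beta=4.0):
    """Dynamic-programming beat tracker (sketch of the slide's recursion).

    onset  : (T,) onset-strength envelope O(t)
    period : default beat period tau_p in frames (from the tempo estimate)
    alpha  : weight trading local score vs. transition cost (gamma above)
    beta   : width of the log-Gaussian transition window W
    """
    T = len(onset)
    C = np.zeros(T)           # best cumulative score ending with a beat at t
    P = np.zeros(T, dtype=int)  # best predecessor beat for each t
    for t in range(T):
        lo = max(0, t - 2 * period)
        if t - lo < 1:
            C[t] = alpha * onset[t]
            P[t] = t          # no predecessor: chain starts here
            continue
        prev = np.arange(lo, t)
        # log-Gaussian window penalizing deviation from the ideal spacing
        W = -beta * np.log((t - prev) / period) ** 2
        scores = W + C[prev]
        best = int(np.argmax(scores))
        C[t] = alpha * onset[t] + (1 - alpha) * scores[best]
        P[t] = prev[best]
    # backtrace from the largest final score
    beats = [int(np.argmax(C))]
    while P[beats[-1]] != beats[-1]:
        beats.append(int(P[beats[-1]]))
    return beats[::-1]
```

On an impulse train with period 8, the tracker recovers exactly the pulse positions.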

Beat Tracking: Behavior

DP will bridge gaps (it is non-causal): there is always a best path.
2nd place in the MIREX 2006 beat tracking evaluation, compared against the McKinney & Moelants human tapping data.
[Figures: onset envelope and beats bridging a gap in Alanis Morissette, "All I Want"; per-subject tap times for McKinney & Moelants test excerpt 2 (Bragg)]

Chroma Features

Chroma features convert spectral energy into musical weights in a canonical octave, i.e. 12 semitone bins: all octaves at once.
Chroma can be resynthesized as Shepard tones, whose spectra repeat at octave spacings.
[Figures: spectrogram and instantaneous-frequency chroma of a piano chromatic scale; Shepard tone spectra; Shepard-tone resynthesis]
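The octave folding can be sketched as a direct mapping from STFT bins to pitch classes; this simple nearest-semitone assignment is an assumption of the sketch (the slide's instantaneous-frequency version is more refined):

```python
import numpy as np

def chroma_from_spectrum(S, sr, tuning_hz=440.0):
    """Fold a magnitude spectrogram into 12 semitone (chroma) bins.

    S  : (n_bins, n_frames) magnitude STFT; bin k sits at frequency k*sr/n_fft
    sr : sample rate in Hz; A4 (tuning_hz) maps to chroma class 0
    """
    n_bins = S.shape[0]
    n_fft = 2 * (n_bins - 1)
    C = np.zeros((12, S.shape[1]))
    for k in range(1, n_bins):                      # skip the DC bin
        f = k * sr / n_fft
        pc = int(round(12 * np.log2(f / tuning_hz))) % 12
        C[pc] += S[k]                               # all octaves fold together
    return C
```

Energy at 440 Hz and 880 Hz lands in the same chroma bin, which is the point of the representation.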

Key Estimation (Ellis, ICASSP 07)

The covariance of chroma features reflects the key. Normalize by transposing for the best fit:
- fit a single Gaussian model to one piece
- find the maximum-likelihood rotation of each other piece
- re-model on all the transposed pieces
- iterate until convergence
[Figure: per-track chroma distributions for Beatles songs (Taxman, Eleanor Rigby, I'm Only Sleeping, Love You To, Yellow Submarine, She Said She Said, Good Day Sunshine, And Your Bird Can Sing), aligned to a global model]
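The inner step, finding the best transposition of one piece against a reference, can be sketched with a plain correlation score standing in for the maximum-likelihood fit under the Gaussian model (an assumption of this sketch):

```python
import numpy as np

def best_rotation(ref, piece):
    """Return the semitone rotation of `piece` that best matches `ref`.

    ref, piece : (12,) average chroma vectors.  Dot-product correlation is a
    simple stand-in for the ML rotation under the single-Gaussian model.
    """
    scores = [np.dot(ref, np.roll(piece, r)) for r in range(12)]
    return int(np.argmax(scores))
```

A piece that is a transposed copy of the reference is mapped back by exactly the inverse rotation.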

Chord Transcription (Sheh & Ellis 03)

Real Books give chord transcriptions, but with no exact timing: just like speech transcripts. Use EM to simultaneously learn and align chord models, exactly as in training a speech recognizer from unaligned transcripts:

  # The Beatles - A Hard Day's Night
  G Cadd9 G F6 G Cadd9 G F6 G C D G C9 G
  G Cadd9 G F6 G Cadd9 G F6 G C D G C9 G
  Bm Em Bm G Em C D
  G Cadd9 G F6 G Cadd9 G F6 G C D G C9 G
  D G C7 G F6 G C7 G F6 G C D G C9 G
  Bm Em Bm G Em C D
  G Cadd9 G F6 G Cadd9 G F6 G C D G C9 G
  C9 G Cadd9 Fadd9

Start from uniform initialization alignments and a model inventory, then repeat until convergence:
- E-step: compute the probabilities of the unknown state assignments, p(q_i^n | X_1^N, Theta_old)
- M-step: maximize over the parameters, Theta := argmax E[log p(X, q | Theta)]

Chord Transcription: Frame-level Accuracy

  Feature  | Recognition | Alignment
  MFCC     |        8.7% |     22.0%
  PCP_ROT  |       21.7% |     76.0%
  (chance is roughly 3%)

MFCCs are poor (they can overtrain); rotated pitch-class profiles (PCP_ROT) are better, with the rotation helping generalization. More training data was needed.
[Figure: true vs. aligned vs. recognized chord labels for The Beatles, "Eight Days a Week": true E G D Bm G; aligned E G D Bm G; recognized E G Bm Am Em7 Bm Em7]

3. Music Similarity

The most central problem: it motivates extracting musical information, and it supports real applications (playlists, discovery).
But do we need content-based similarity? It must compete with collaborative filtering, and with fingerprinting plus metadata. Maybe, for the Future of Music: connecting listeners directly to musicians.

Discriminative Classification (Mandel & Ellis 05)

Classification as a proxy for similarity.
- Distribution models: train a GMM on each artist's MFCC frames, then label a test song by the minimum KL divergence.
- Versus discriminative training: song-level features into a DAG of pairwise SVMs over artists.

Segment-Level Features (Mandel & Ellis 07)

Statistics of spectra and the temporal envelope define a point in feature space, for SVM classification or Euclidean similarity.
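One common form of such segment-level statistics, which I use here as an illustrative sketch rather than the exact Mandel & Ellis feature set, is the per-coefficient mean stacked with the covariance of the frame features:

```python
import numpy as np

def segment_features(mfccs):
    """Summarize a segment's MFCC frames as a single feature point.

    mfccs : (n_frames, n_coef) array.  Returns one fixed-size vector per
    segment: the mean of each coefficient plus the upper triangle of the
    frame covariance, usable for SVMs or Euclidean matching.
    """
    mu = mfccs.mean(axis=0)
    cov = np.cov(mfccs, rowvar=False)
    iu = np.triu_indices(cov.shape[0])
    return np.concatenate([mu, cov[iu]])
```

Every segment, whatever its length, maps to the same-dimensional point, which is what makes whole-song Euclidean comparison possible.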

MIREX 07 Results

One system served for both similarity and classification.
- Audio Music Similarity entrants: PS = Pohle, Schnitzer; GT = George Tzanetakis; LB = Barrington, Turnbull, Torres, Lanckriet; CB = Christoph Bastuck; TL = Lidy, Rauber, Pertusa, Iñesta; ME = Mandel, Ellis; BK = Bosteels, Kerre; PC = Paradzinets, Chen.
- Audio Classification tasks: genre ID, hierarchical genre ID, raw mood ID, composer ID, artist ID. Entrants: IM = IMIRSEL M2K; ME = Mandel, Ellis; TL = Lidy, Rauber, Pertusa, Iñesta; GT = George Tzanetakis; KL = Kyogu Lee; CL = Laurier, Herrera; GH = Guaus, Herrera.

Active-Learning Playlists

SVMs are well suited to active learning: solicit labels on the items closest to the current boundary. An automatic player with a skip button doubles as ground-truth collection, giving active-SVM automatic playlist generation.

Cover Song Detection (Ellis & Poliner 07)

Cover songs are reinterpretations of a piece, with different instrumentation and character; timbral features give no match. We need a different representation: beat-synchronous chroma features.
[Figure: spectrograms and beat-synchronous chroma for verse 1 of "Let It Be" by The Beatles and by Nick Cave]

Beat-Synchronous Chroma Features

Beat tracking plus chroma features on 30 ms frames; average the chroma within each beat. Compact; is it sufficient?
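The averaging step is a one-liner worth seeing; this sketch assumes beat times are already available as frame indices:

```python
import numpy as np

def beat_sync(chroma, beats):
    """Average frame-level chroma within each beat interval.

    chroma : (12, n_frames) frame-level chroma matrix
    beats  : increasing frame indices of beat times
    Returns (12, n_beats - 1): one column per inter-beat interval.
    """
    cols = [chroma[:, b0:b1].mean(axis=1)
            for b0, b1 in zip(beats[:-1], beats[1:])]
    return np.stack(cols, axis=1)
```

The result is compact (one column per beat) and tempo-normalized, which is what the cover-song matcher below relies on.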

Matching: Global Correlation

Cross-correlate entire beat-chroma matrices at all possible transpositions: an implicit combination of match quality and duration. One good matching fragment may be sufficient.
[Figure: beat-chroma matrices for Elliott Smith and Glen Phillips performing "Between the Bars", and their cross-correlation, peaking near zero skew]
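A brute-force sketch of the search over transpositions and beat skews (the published system adds filtering and normalization that this sketch omits):

```python
import numpy as np

def cover_score(A, B):
    """Best cross-correlation of two beat-chroma matrices over all 12
    transpositions and all beat skews: a single match score.

    A, B : (12, n_beats) beat-synchronous chroma matrices.
    Returns (score, semitone_rotation, beat_skew).
    """
    best = (-np.inf, 0, 0)
    nA, nB = A.shape[1], B.shape[1]
    for rot in range(12):
        Br = np.roll(B, rot, axis=0)            # transpose B by `rot` semitones
        for skew in range(-nB + 1, nA):         # slide B along the beat axis
            lo, hi = max(0, skew), min(nA, skew + nB)
            if hi - lo < 1:
                continue
            s = float(np.sum(A[:, lo:hi] * Br[:, lo - skew:hi - skew]))
            if s > best[0]:
                best = (s, rot, skew)
    return best
```

For a pair that differ only by a pitch transposition, the search recovers exactly that rotation at zero skew.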

MIREX 06 Results

Cover song contest: 30 songs x 11 versions of each (the data has not been disclosed), scored by the number of true covers in the top 10. Eight systems were compared (4 cover-song systems, 4 similarity systems). Ours found 761/3300 = 23% recall; the next best managed 11%, and guessing gives 3%.
[Figure: covers retrieved per query song per system (CS, DE, KL1, KL2, KWL, KWT, LR, TP)]

Cross-Correlation Similarity

Use the cover-song approach to find similarity: a similar note or instrumentation sequence may sound very similar to judges.
Numerous variants:
- try chroma (melody/harmony) and MFCCs (timbre)
- try full search (cross-correlation) or landmarks (indexable)
- compare to random choice and to segment-level statistics
Evaluate by subjective tests modeled after the MIREX similarity evaluation.

Cross-Correlation Similarity: Results

Human web-based judgments: binary judgments for speed; 6 users x 30 queries x 10 candidate returns, so each count is out of a possible 180.

  Algorithm                     | Similar count
  (1) Xcorr, chroma             | 48/180 = 27%
  (2) Xcorr, MFCC               | 48/180 = 27%
  (3) Xcorr, combo              | 55/180 = 31%
  (4) Xcorr, combo + tempo      | 34/180 = 19%
  (5) Xcorr, combo at boundary  | 49/180 = 27%
  (6) Baseline, MFCC            | 81/180 = 45%
  (7) Baseline, rhythmic        | 49/180 = 27%
  (8) Baseline, combo           | 88/180 = 49%
  Random choice 1               | 22/180 = 12%
  Random choice 2               | 28/180 = 16%

Cross-correlation is inferior to the baseline, but it is getting somewhere, even with landmarks.

Cross-Correlation Similarity: Caveats

The results are not overwhelming, but the database is only a few thousand clips.

Anchor Space (Berenzweig & Ellis 03)

Acoustic features describe each song, but from a signal perspective rather than a perceptual one, and they do not directly capture the differences between songs. Instead, use genre classifiers to define a new space: prototype genres are the anchors. Each audio frame x maps to the n-dimensional vector of classifier outputs [p(a_1|x), p(a_2|x), ..., p(a_n|x)]; GMMs are then fit in anchor space and compared by KL divergence, EMD, etc.

Anchor Space: Behavior

Frame-by-frame high-level categorizations: how do they compare to raw features? Look at properties of the distributions, and at the dynamics.
[Figure: Madonna vs. Bowie frames in cepstral space and in (Electronica, Country) anchor space]
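The anchor-space mapping can be sketched with trivial classifiers; here unit-variance Gaussian prototypes stand in for the trained genre models, which is purely an assumption of the sketch:

```python
import numpy as np

def anchor_space(frames, anchors):
    """Map feature frames into 'anchor space': per-frame posteriors over a
    set of anchor genre models.

    frames  : (n_frames, d) acoustic features (e.g. MFCCs)
    anchors : (n_anchors, d) one prototype mean per anchor genre
    Returns (n_frames, n_anchors): rows of posteriors p(a_i | x).
    """
    # squared distance to each anchor -> Gaussian log-likelihood (up to const)
    d2 = ((frames[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    logp = -0.5 * d2
    logp -= logp.max(axis=1, keepdims=True)   # stabilize the softmax
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)
```

Each row is a proper posterior (sums to one), so downstream distribution modeling sees perceptually anchored coordinates instead of raw cepstra.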

Playola Similarity Browser

Ground-Truth Data (Ellis et al. 02)

Playola's accuracy is hard to evaluate: user tests need ground truth. The Musicseer online survey/game ran for 9 months in 2002, gathering over 1,000 users and over 20k judgments.
http://labrosa.ee.columbia.edu/projects/musicsim/

Semantic Bases

Describe each segment in human-relevant terms: like anchor space, but more so. Ground truth is needed: what words do people use? The MajorMiner game collected 400 users, 7,500 unique tags, and 70,000 taggings over 2,200 10-second clips; classifiers are then trained on the tags.

4. Music Structure Discovery

Use the many examples to map out the manifold of music audio, and hence define the subset that is music.
Problems:
- alignment/registration of the data
- factoring and abstraction
- separating parts?
[Figure: log-likelihoods of 18 artists' test tracks under per-artist 32-component GMMs trained on 1000 MFCC20 frames]

Eigenrhythms: Drum Pattern Space (Ellis & Arroyo 04)

Pop songs are built on a repeating drum loop: variations on a few bass, snare, and hi-hat patterns. Can eigen-analysis (or similar) capture the variations, by analyzing lots of (MIDI) data, or from audio?
Applications: music categorization, beat-box synthesis, insight.

Aligning the Data

Patterns must be aligned prior to modeling:
- tempo (stretch): infer the BPM and normalize
- downbeat (shift): correlate against a mean template
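The downbeat step can be sketched as a circular-shift search against the mean template (treating the pattern as a flattened loop; the circular assumption is mine):

```python
import numpy as np

def downbeat_shift(pattern, template):
    """Find the circular shift that best aligns a drum pattern to the mean
    template: the estimated downbeat offset in samples."""
    scores = [np.dot(np.roll(pattern, -s), template)
              for s in range(len(pattern))]
    return int(np.argmax(scores))
```

A pattern that is a rotated copy of the template is recovered with exactly its rotation.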

Eigenrhythms (PCA)

20+ eigenvectors are needed for good coverage of the 100 training patterns (1200 dimensions). Eigenrhythms both add and subtract beat weight.
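The eigen-analysis itself is standard PCA on the aligned, flattened patterns; a minimal SVD-based sketch:

```python
import numpy as np

def eigenrhythms(patterns, n_components):
    """PCA of aligned drum patterns: mean pattern plus top eigen-patterns.

    patterns : (n_songs, n_dims) aligned, flattened drum-loop matrices
    Returns (mean, components, projections); the component rows are the
    'eigenrhythms', which can both add and subtract beat weight.
    """
    mu = patterns.mean(axis=0)
    X = patterns - mu
    # SVD of the centered data gives the principal directions
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    comps = Vt[:n_components]
    return mu, comps, X @ comps.T
```

With enough components, mean + projections reconstructs the training patterns; the slide's point is that 1200-dimensional patterns need 20+ of them.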

Posirhythms (NMF)

Nonnegative matrix factorization yields "posirhythms" that only add beat weight, and they capture some structure.
[Figure: six posirhythms as hi-hat/snare/bass-drum activations over a 4-beat pattern at 120 BPM]
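For contrast with PCA, here is a generic Lee & Seung multiplicative-update NMF; this is a textbook sketch, not the exact factorization used for the posirhythms, and `iters` and the random initialization are illustrative:

```python
import numpy as np

def nmf(V, r, iters=1000, seed=0):
    """Factor a nonnegative matrix V ~= W @ H (Euclidean cost, rank r).

    Unlike PCA components, W and H stay nonnegative, so each component
    can only *add* beat weight.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, m)) + 0.1
    for _ in range(iters):
        # multiplicative updates keep every entry nonnegative
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H
```

On an exactly rank-2 nonnegative matrix, the rank-2 factorization reconstructs it closely while both factors remain nonnegative.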

Eigenrhythm BeatBox

Resynthesize rhythms from points in the eigen-space.

Melody Clustering

Goal: find fragments that recur in melodies across a large music database, trading data for model sophistication.
Pipeline: data sources (a pitch tracker, or MIDI training data) -> melody extraction -> 5-second fragments -> fragment representation -> VQ clustering -> top clusters.
Fragments are represented by DCT coefficients 1-20, which removes the average (transposition) and smooths away detail.
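The fragment representation can be sketched directly from the DCT-II definition; note how dropping coefficient 0 makes the representation invariant to transposing the whole fragment:

```python
import numpy as np

def dct_fragment(pitch, n_coef=20):
    """Represent a melody fragment by DCT-II coefficients 1..n_coef.

    pitch : (N,) pitch track of the fragment (e.g. MIDI note numbers).
    Dropping coefficient 0 removes the average pitch (transposition);
    truncating the high coefficients smooths away detail.
    """
    N = len(pitch)
    n = np.arange(N)
    k = np.arange(1, n_coef + 1)
    # DCT-II basis rows 1..n_coef (row 0, the DC term, is omitted)
    basis = np.cos(np.pi * np.outer(k, 2 * n + 1) / (2 * N))
    return basis @ pitch
```

Two fragments an octave apart get identical representations, so clustering groups them by contour rather than absolute pitch.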

Melody Clustering: Results

Clusters match the underlying contour, and there are some interesting matches, e.g. Pink + Nsync.

Beat-Chroma Fragment Codebook

Idea: find the very popular music fragments (the perfect cadence, the rising melody, ...?). Clustering a large enough database should reveal them, but registration of phrase boundaries and transposition get in the way. We also need to deal with really large datasets, e.g. 100k+ tracks with multiple landmarks in each; Locality Sensitive Hashing can help, since it quickly finds most points within a certain radius. Experiments are in progress.
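The LSH idea can be sketched with random-hyperplane hashing; this is a generic sketch of the technique, not the specific index used in the experiments:

```python
import numpy as np

def lsh_buckets(points, n_planes=8, seed=0):
    """Random-hyperplane LSH: hash each point to a bit-string bucket so
    that nearby points tend to collide, letting a neighbor search touch
    only one bucket instead of the whole database.
    """
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_planes, points.shape[1]))
    bits = (points @ planes.T) > 0          # one sign bit per hyperplane
    buckets = {}
    for i, b in enumerate(bits):
        buckets.setdefault(b.tobytes(), []).append(i)
    return buckets, planes

def lsh_query(buckets, planes, q):
    """Candidate indices in the query's bucket (then verify exactly)."""
    key = ((planes @ q) > 0).tobytes()
    return buckets.get(key, [])
```

A query only inspects the candidates sharing its hash, which is what makes 100k+ track databases tractable; candidates are then verified with the exact beat-chroma distance.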

Conclusions

Music audio yields low-level features, then melody and notes, key and chords, and tempo and beat; these feed classification and similarity (browsing, discovery, production) and music structure discovery (modeling, generation, curiosity).
Lots of data + noisy transcription + weak clustering = musical insights?