Lecture 15: Research at LabROSA

ELEN E4896 MUSIC SIGNAL PROCESSING
1. Sources, Mixtures, & Perception
2. Spatial Filtering
3. Time-Frequency Masking
4. Model-Based Separation
Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
http://www.ee.columbia.edu/~dpwe/e4896/
E4896 Music Signal Processing (Dan Ellis) 2014-05-05 - 1/19

Sparse + Low-Rank + NMF (Zhuo Chen)
Optimization to decompose the spectrogram:
  minimize $\|S\|_1 + \|L\|_* + D_{KL}(Y - S - L \,\|\, HW)$
Model: $Y = S + L + HW$ (sparse S, low-rank L, NMF term HW)
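To make the three terms of such an objective concrete, here is a minimal NumPy sketch that only evaluates the cost; it is not a solver and not the author's implementation. The function names, the equal weighting of the terms, the reading of the KL term as the divergence between Y - S - L and HW, and the clipping of negative residuals are all assumptions made for illustration.

```python
import numpy as np

def l1_norm(S):
    """Sum of absolute values: small when S is sparse."""
    return float(np.abs(S).sum())

def nuclear_norm(L):
    """Sum of singular values: small when L is low-rank."""
    return float(np.linalg.svd(L, compute_uv=False).sum())

def generalized_kl(V, V_hat, eps=1e-12):
    """Generalized KL divergence D(V || V_hat) between nonnegative arrays."""
    V = np.maximum(V, eps)          # assumption: clip so the log stays finite
    V_hat = np.maximum(V_hat, eps)
    return float(np.sum(V * np.log(V / V_hat) - V + V_hat))

def objective(Y, S, L, H, W):
    """Equal-weight sum of the three terms (the weighting is an assumption)."""
    return l1_norm(S) + nuclear_norm(L) + generalized_kl(Y - S - L, H @ W)

# Toy example: a spectrogram explained entirely by the NMF term.
rng = np.random.default_rng(0)
H = rng.random((6, 2))   # hypothetical: 6 frequency bins x 2 components
W = rng.random((2, 8))   # hypothetical: 2 components x 8 frames
Y = H @ W
zero = np.zeros_like(Y)
cost = objective(Y, zero, zero, H, W)   # every term vanishes in this case
```

In practice each term would carry its own weight, and the minimization itself needs an iterative solver, which this sketch leaves out.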

Beta Process NMF (Liang, Hoffman)
Automatically choose how many components to use
$X = D(S \odot Z) + E$

Music Complexity (Colin Raffel)
How can we capture musical patterns in the Million Song Dataset?
Network analysis of quantized simultaneities, after Serrà et al. 2012
[Figure from Serrà, Corral, Boguñá, Haro, & Arcos, 2012]

Large-Scale Cover Recognition 1 (Thierry Bertin-Mahieux)
How can we find covers in 1M songs?
@ 1 sec / comparison, one search = 11.5 CPU-days
full $N^2$ mining = 16,000 CPU-years
Need a hashing technique
landmark-based description of chroma patches
Euclidean space projection?
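The cost estimates follow from simple arithmetic, sketched below. The constant and function names are illustrative, and counting each unordered pair once (N(N-1)/2 comparisons) is an assumption; it is the reading that lands near the slide's figure for full mining.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

def one_search_cpu_days(n_songs, sec_per_comparison=1.0):
    """Compare one query against every song in the collection."""
    return n_songs * sec_per_comparison / SECONDS_PER_DAY

def all_pairs_cpu_years(n_songs, sec_per_comparison=1.0):
    """Compare every unordered pair of songs once: N(N-1)/2 comparisons."""
    n_pairs = n_songs * (n_songs - 1) // 2
    return n_pairs * sec_per_comparison / SECONDS_PER_YEAR

N = 1_000_000
days = one_search_cpu_days(N)     # a bit over 11.5 CPU-days
years = all_pairs_cpu_years(N)    # a bit under 16,000 CPU-years
print(f"one search: {days:.1f} CPU-days, full mining: {years:,.0f} CPU-years")
```

Either number rules out exhaustive pairwise comparison, which is the motivation for a hashing or fixed-size-feature approach.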

Large-Scale Cover Recognition 2 (Thierry Bertin-Mahieux)
2D Fourier Transform Magnitude (2DFTM): fixed-size feature to capture the essence of the chromagram
First results on finding covers in 1M songs:
Method          Average rank   Mean AP
random          5,             .
jumpcodes 2     38,369         .2
2DFTM (5 PC)    137,117        .2
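One reason a 2D Fourier transform magnitude suits chroma patches is that it is unchanged by circular shifts along either axis, so a transposition (rotation of the 12 pitch classes) gives the same feature. The NumPy sketch below demonstrates that property. The patch size (12 x 75) and function name are assumptions for illustration, and real time offsets are only approximately circular.

```python
import numpy as np

def twod_ftm(patch):
    """Magnitude of the 2D DFT of a chroma patch (pitch classes x beats)."""
    return np.abs(np.fft.fft2(patch))

rng = np.random.default_rng(0)
patch = rng.random((12, 75))     # hypothetical patch: 12 chroma bins x 75 beats

# Transpose by 3 semitones and circularly shift by 10 beats.
shifted = np.roll(patch, shift=(3, 10), axis=(0, 1))

feature = twod_ftm(patch)
feature_shifted = twod_ftm(shifted)
same = np.allclose(feature, feature_shifted)   # shifts change only the phase
```

Because the feature has a fixed size, patches can be compared with a plain Euclidean distance, optionally after dimensionality reduction.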

Jazz Discography Project
How can MIR help organize jazz collections?
our tools are quite genre-specific
e.g. beat tracker is fine for pop, useless for jazz

Local Tagging
MFCC-statistics classifiers on 5 sec windows, trained from MajorMiner data
[Figure: spectrogram of "Soul Eyes" (freq / Hz vs. time / s) above per-window tag activations. Tags: _9s, club, trance, end, drum_bass, singing, horns, punk, samples, silence, quiet, noise, solo, strings, indie, house, alternative, r_b, funk, soft, ambient, british, distortion, drum_machine, country, keyboard, saxophone, fast, instrumental, electronica, 8s, voice, beat, slow, rap, hip_hop, jazz, piano, techno, dance, female, bass, vocal, pop, electronic, rock, synth, male, guitar, drum]

Onset Correlation (Brian McFee)
Ahead of or behind the beat?
[Figure panels: Tony Williams, Elvin Jones]

Structural Similarity (Diego Silva, Helene Papadopoulos)
Self-similarity shows repeating structure in music
Can we find similar pieces by finding similar structures?
[Figure from Bello 2011]
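A self-similarity matrix can be sketched in a few lines: compare every frame's feature vector with every other frame's. The version below uses cosine similarity on toy 3-dimensional features. Both the similarity measure and the feature layout are assumptions for illustration; the slide does not say which are used.

```python
import numpy as np

def self_similarity(features):
    """Cosine similarity between all pairs of frames.

    features: array of shape (n_frames, n_dims).
    Returns an (n_frames, n_frames) matrix with ones on the diagonal.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)   # guard against all-zero frames
    return unit @ unit.T

# Toy piece with section structure A A B B A A B B.
A = np.array([1.0, 0.0, 0.0])
B = np.array([0.0, 1.0, 0.0])
frames = np.array([A, A, B, B, A, A, B, B])
S = self_similarity(frames)
```

The repeat of the A section four frames later shows up as high values along the diagonal offset by four (for example `S[0, 4]`), which is the stripe pattern that reveals repeating structure.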

Ordinal LDA Segmentation (McFee)
Low-rank decomposition of skewed self-similarity to identify repeats
Learned weighting of multiple factors to segment
Linear Discriminant Analysis between adjacent segments
[Figure panels: Self-similarity (Beat vs. Beat); Skewed self-sim. (Lag vs. Beat); Filtered self-sim. (Lag vs. Beat); Latent repetition (Factor vs. Beat)]
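Skewing a self-similarity matrix re-indexes it by lag, so that a repeat at a fixed offset becomes a horizontal line rather than a diagonal stripe. The sketch below is one simple way to do this, assuming circular wrap-around at the edges. The function name and the wrap-around are illustrative choices, not necessarily what the slide's figures use.

```python
import numpy as np

def time_lag(S):
    """Skew a square self-similarity matrix: R[lag, t] = S[t, (t + lag) % n]."""
    n = S.shape[0]
    t = np.arange(n)[None, :]        # time index along the columns
    lag = np.arange(n)[:, None]      # lag index along the rows
    return S[t, (t + lag) % n]

# Toy self-similarity of 8 beats repeating with period 4.
idx = np.arange(8)
S = ((idx[:, None] - idx[None, :]) % 4 == 0).astype(float)
R = time_lag(S)
```

In `R`, the rows at lags 0 and 4 are constant ones and all other rows are zero. Horizontal structure like this is easier to smooth with a filter along the time axis and to summarize with a low-rank decomposition.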

Lyric Recognition (Matt McVicar)
Speech Recognition for Songs
lots of interference
atypical speech
[Figure: four spectrograms, Frequency (kHz) vs. Time (seconds): Polyphonic Audio, Acapella Audio, Natural Speech, Synthesized Speech]

Singing ASR (McVicar)
Speech recognition adapted to singing needs aligned data
Align scraped acapellas and full mix, including jumps

"Remixavier" (Raffel)
Optimal align-and-cancel of mix and acapella
timing and channel may differ
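The align-and-cancel idea can be illustrated with a deliberately simplified sketch: estimate one integer delay by cross-correlation, fit one scalar gain by least squares, and subtract. This is not the Remixavier algorithm itself. It assumes a fixed delay and a flat channel, whereas the slide notes that timing and channel may differ, and all function names are hypothetical.

```python
import numpy as np

def estimate_delay(mix, acapella):
    """Integer lag (in samples) at which the acapella best matches the mix."""
    xcorr = np.correlate(mix, acapella, mode="full")
    return int(np.argmax(xcorr)) - (len(acapella) - 1)

def shift(signal, delay, length):
    """Delay (or advance, if negative) a signal, zero-padded to `length`."""
    out = np.zeros(length)
    if delay >= 0:
        n = max(0, min(len(signal), length - delay))
        out[delay:delay + n] = signal[:n]
    else:
        n = max(0, min(len(signal) + delay, length))
        out[:n] = signal[-delay:-delay + n]
    return out

def align_and_cancel(mix, acapella):
    """Return (residual, delay, gain) after removing the aligned acapella."""
    delay = estimate_delay(mix, acapella)
    aligned = shift(acapella, delay, len(mix))
    gain = np.dot(mix, aligned) / np.dot(aligned, aligned)  # least-squares fit
    return mix - gain * aligned, delay, gain

# Synthetic check: noise "vocals" delayed by 37 samples over a quiet backing.
rng = np.random.default_rng(0)
vocals = rng.standard_normal(2000)
backing = 0.1 * rng.standard_normal(2000)
mix = backing + 0.8 * shift(vocals, 37, 2000)
residual, delay, gain = align_and_cancel(mix, vocals)
```

On real recordings the delay drifts over time and the vocal is filtered differently in the mix, so a practical version would estimate alignment and a short filter per frame rather than one global delay and gain.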

Million Song Dataset (Bertin-Mahieux, McFee)
Many Facets
Echo Nest: audio features + metadata
Echo Nest taste profile: user-song-listen count
Second Hand Song: covers
musixmatch: lyric BoW
last.fm: tags
Now with audio?
resolving artist / album / track / duration against what.cd

MIDI-to-MSD (Raffel, Shi)
Aligned MIDI to Audio is a nice transcription

De-DTMF
Problem: Stationary tones confuse speech detector
Adaptively filter sinusoids with steady amplitude
Processing chain: Input audio -> Framing -> LPC fit -> Find roots -> Transform radii -> Map to zeros -> Add poles -> Filter audio frames -> Overlap-add -> Output audio
[Figure panels: input spectrogram (tcp_d1_2_counting_cia_irdial; Frequency vs. Time); Spectrum and LPC fit (Gain / dB vs. Freq / Hz); LPC poles and detail (Real Part vs. Imaginary Part); Mapped radius; Transformed filter and detail (Real Part vs. Imaginary Part); Filter response & spectrum (Gain / dB vs. Freq / Hz); Filtered signal (Frequency vs. Time)]
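The core of this chain (LPC fit, find roots, turn near-unit-circle poles into notch zeros, add nearby poles) can be sketched with NumPy alone. This is a simplified illustration, not the lecture's implementation. It processes one whole signal instead of overlap-added frames, replaces the slide's radius mapping with a hard radius threshold, and uses hypothetical function names and parameter values.

```python
import numpy as np

def lpc_poles(frame, order):
    """Autocorrelation-method LPC fit; returns the poles of the model."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])                  # predictor coefficients
    return np.roots(np.concatenate(([1.0], -a)))   # roots of A(z)

def tone_angles(poles, min_radius=0.98):
    """Angles of upper-half-plane poles close enough to the unit circle."""
    return [float(np.angle(p)) for p in poles
            if abs(p) > min_radius and p.imag > 0]

def notch_coefficients(angles, pole_radius=0.98):
    """Zeros on the unit circle at each angle, poles just inside them."""
    b, a = np.array([1.0]), np.array([1.0])
    for theta in angles:
        b = np.convolve(b, [1.0, -2.0 * np.cos(theta), 1.0])
        a = np.convolve(a, [1.0, -2.0 * pole_radius * np.cos(theta),
                            pole_radius ** 2])
    return b, a

def iir_filter(b, a, x):
    """Direct-form difference equation (slow but dependency-free)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc / a[0]
    return y

# Synthetic check: a steady 1 kHz tone in faint noise, sampled at 8 kHz.
fs = 8000
t = np.arange(2000)
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 1000 * t / fs) + 0.01 * rng.standard_normal(len(t))

angles = tone_angles(lpc_poles(x, order=2))
b, a = notch_coefficients(angles)
y = iir_filter(b, a, x)
tone_hz = [theta * fs / (2 * np.pi) for theta in angles]
```

A steady tone pulls an LPC pole pair almost onto the unit circle, which is what makes it detectable. Speech resonances are broader, so their poles stay further inside and are left alone. Real DTMF contains two simultaneous tones, so a higher LPC order would be needed than in this single-tone example.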

Pitch-based Filtering
Resample to flatten pitch, then filter

Summary
Signal Separation
  NMF, RPCA, cancellation, filtering
Music Information
  Beat tracking, segmentation
  Large datasets
  Indexing & retrieval
Speech
  Lyric recognition
  Speech detection & enhancement

References
[Bello 2011] J. P. Bello, Measuring structural similarity in music, IEEE Tr. Audio, Speech, & Lang., 19(7): 2013-2025, 2011.
[Serrà et al. 2012] J. Serrà, A. Corral, M. Boguñá, M. Haro, & J. Arcos, Measuring the evolution of contemporary western popular music, Scientific Reports, 2:521, 2012.