Learning about Music Cognition by Asking MIR Questions
Sebastian Stober
August 12, 2016, CogMIR, New York City
sstober@uni-potsdam.de
http://www.uni-potsdam.de/mlcog/
Machine Learning in Cognitive Science Lab
From MIR to MIIR
Music Imagery Information Retrieval (MIIR) = retrieving music information from brain signals
Sebastian Stober - CogMIR 2016 2016-08-12
12 audio stimuli from 8 music pieces
  4 songs, each recorded with and without lyrics
  4 instrumental pieces
complete musical phrases
length between 6.9 s and 16 s (mean 10.5 s)
https://github.com/sstober/openmiir
Experiment Setup
[Diagram labels: sound booth; presentation system (feedback, video, audio, events); screen & speakers; feedback keyboard; markers; recording system; StimTracker; receiver (optical); EEG amp on battery]
Biosemi ActiveTwo, 64 EEG + 4 EOG channels @ 512 Hz
The 12 Music Stimuli

songs with / without lyrics (lengths given as with lyrics / without lyrics):
#  Title                              Meter  Tempo (BPM)  Length (s)
1  Chim Chim Cheree                   3/4    210          13.3 / 13.5
2  Take Me Out to the Ballgame        3/4    186           7.7 /  7.7
3  Jingle Bells                       4/4    200           9.7 /  9.0
4  Mary Had a Little Lamb             4/4    160          11.6 / 12.2

instrumental pieces:
#  Title                              Meter  Tempo (BPM)  Length (s)
1  Emperor Waltz                      3/4    178           8.3
2  Harry Potter Theme                 3/4    166          16.0
3  Imperial March (Star Wars Theme)   4/4    104           9.2
4  Eine Kleine Nachtmusik             4/4    140           6.9
MIIR Questions
  audio reconstruction: failed (non-sparsity)
  stimulus identification
  beat and tempo tracking
  meter classification
  lyrics / non-lyrics / instrumental classification
Stimulus Identification: Pre-Training Method
Learning Distinguishing Features using Similarity-Constraint Encoders
Similarity-Constraint Encoder
  exploit synchronization between trials
  expect similar temporal patterns for the same stimulus
  goal: improve signal-to-noise ratio
  learn signal filters that lead to distinguishing (temporal) patterns for the different classes
Similarity-Constraint Encoder
  motivated by relative constraints used for metric learning:
    for all paired trials (A,B) + trial C from another class: sim(a,b) > sim(a,c)
  many combinations of (A,B) and C
  favors features that are representative and allow the classes to be distinguished
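The relative constraint above can be illustrated with a minimal sketch. It enumerates all triplets (a, b, c) in which a and b share a class and c does not, and reports how often sim(a,b) > sim(a,c) fails. The helper names and the toy feature vectors are assumptions made for illustration only; they are not taken from the talk or its code base.

```python
from itertools import permutations


def dot(u, v):
    """Dot-product similarity between two equal-length feature vectors."""
    return sum(x * y for x, y in zip(u, v))


def triplets(labels):
    """Yield index triplets (a, b, c): a and b share a class, c does not."""
    for a, b in permutations(range(len(labels)), 2):
        if labels[a] != labels[b]:
            continue
        for c, label_c in enumerate(labels):
            if label_c != labels[a]:
                yield a, b, c


def violation_rate(features, labels):
    """Fraction of triplets for which sim(a, b) > sim(a, c) does NOT hold."""
    total = violated = 0
    for a, b, c in triplets(labels):
        total += 1
        if dot(features[a], features[b]) <= dot(features[a], features[c]):
            violated += 1
    return violated / total


# Toy data (made up): two trials for each of two classes.
features = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
labels = ["A", "A", "B", "B"]

n_triplets = len(list(triplets(labels)))
rate = violation_rate(features, labels)
print(n_triplets, rate)
```

Even four trials already yield eight ordered triplets, which hints at why a small number of trials can produce a large number of training constraints.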
Similarity-Constraint Encoder (virtual network structure)
stages: Input Triplet -> Feature Extraction (signal filter) -> Pairwise Similarity (dot product) -> Prediction (probabilities)
  Reference Input    -> Encoder
  Paired Input Trial -> Encoder -> Similarity -> Softmax
  Other Input Trial  -> Encoder -> Similarity -> Softmax
(encoders share weights)
minimize constraint violations
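A minimal sketch of the forward pass through this structure, assuming the encoder is a single spatial filter (one weight per EEG channel) shared across all three inputs, and assuming a negative log-likelihood loss on the softmax output. The function names, the loss choice, and the toy trials are illustrative assumptions, not the author's implementation.

```python
import math


def encode(trial, w):
    """Apply one spatial filter w (one weight per channel) to a trial.

    trial is a list of channels, each a list of samples; the result is a
    single filtered time series.
    """
    n_samples = len(trial[0])
    return [sum(w[ch] * trial[ch][t] for ch in range(len(w)))
            for t in range(n_samples)]


def dot(u, v):
    """Dot-product similarity between two encoded time series."""
    return sum(x * y for x, y in zip(u, v))


def triplet_probs(ref, paired, other, w):
    """Softmax over [sim(ref, paired), sim(ref, other)], shared encoder w."""
    r, p, o = (encode(x, w) for x in (ref, paired, other))
    sims = [dot(r, p), dot(r, o)]
    m = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]


def triplet_loss(ref, paired, other, w):
    """Assumed loss: negative log-probability that the paired trial wins."""
    return -math.log(triplet_probs(ref, paired, other, w)[0])


# Toy trials (made up): 2 channels x 3 samples. Channel 0 carries the
# stimulus-related pattern, channel 1 is identical noise in every trial.
ref = [[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
paired = [[1.0, 0.0, 1.0], [5.0, 5.0, 5.0]]
other = [[0.0, 1.0, 0.0], [5.0, 5.0, 5.0]]

good_probs = triplet_probs(ref, paired, other, [1.0, 0.0])  # informative channel
bad_probs = triplet_probs(ref, paired, other, [0.0, 1.0])   # noise channel only
print(good_probs, bad_probs)
```

A filter that weights the informative channel satisfies the constraint with high probability, while a filter that only passes the noise channel leaves the two similarities tied. Minimizing the loss over many triplets would push the weights toward the former.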
Stimulus Identification
12-class single-trial classification
Nested Cross-Validation

9-fold subject cross-validation
  train on data from 8 subjects (8x5x12=480 trials), test on remaining subject (1x5x12=60 trials)

Pre-Training: 5-fold trial block cross-validation
  8x4x12=384 training trials from 4 trial blocks, 8x1x12=96 validation trials from remaining trial block
  train on 50688 triplets from 384 training trials
  select model (early stopping) based on 21120 validation triplets (a,b,c) with a from 96 validation trials and b,c from 480 training and validation trials
  encoder layer (L1) = average over folds

Supervised Training: 5-fold trial block cross-validation
  same training/validation splits as in pre-training phase
  SVC: select best value for C (grid search) based on highest mean validation accuracy; train with selected C on 480 training trials
  Neural Network: select fold model (early stopping) based on highest validation accuracy; classifier layer (L2) = average over folds
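The trial and triplet counts on this slide can be reproduced with a short sketch. It assumes that triplets are formed within a subject only and that (a, b) pairs are ordered; under those assumptions the enumeration matches the 50688 and 21120 figures quoted above. Variable and function names are illustrative.

```python
from itertools import product

N_STIMULI = 12
N_TRAIN_SUBJECTS = 8  # 9 subjects in total, 1 held out per outer fold


def trials(blocks):
    """All (block, stimulus) trials of one subject for the given blocks."""
    return [(b, s) for b, s in product(blocks, range(N_STIMULI))]


def count_triplets(anchors, candidates):
    """Count ordered within-subject triplets (a, b, c).

    a comes from `anchors`; b and c come from `candidates`; b is a different
    trial of the same stimulus as a; c is a trial of any other stimulus.
    """
    n = 0
    for a in anchors:
        same = sum(1 for t in candidates if t[1] == a[1] and t != a)
        other = sum(1 for t in candidates if t[1] != a[1])
        n += same * other
    return n


# One inner fold: 4 trial blocks for training, the remaining block for validation.
train = trials([0, 1, 2, 3])
val = trials([4])

n_train_trials = N_TRAIN_SUBJECTS * len(train)
n_val_trials = N_TRAIN_SUBJECTS * len(val)
n_train_triplets = N_TRAIN_SUBJECTS * count_triplets(train, train)
n_val_triplets = N_TRAIN_SUBJECTS * count_triplets(val, train + val)

print(n_train_trials, n_val_trials)      # 384 96
print(n_train_triplets, n_val_triplets)  # 50688 21120
```

Per subject, each training trial has 3 same-stimulus partners and 44 other-stimulus trials (48 x 3 x 44 = 6336), and each validation trial has 4 partners and 55 other-stimulus trials (12 x 4 x 55 = 2640).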
Resulting Spatial Filter
  only use within-subject trial triplets
  train on 8 of 9 subjects
  9 versions (just 1 filter per encoder!)
Stimulus Classification (9-fold cross-subject validation)

Classifier      Features                  Accuracy
SVC             raw EEG                   18.52%
SVC             raw EEG channel mean      12.41%
End-to-end NN   raw EEG                   16.30%
SVC             12-class encoder output   27.22%
Neural Net      12-class encoder output   26.67%

significant improvement over baseline (McNemar's test with n=540, p < 0.0002)
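For readers unfamiliar with McNemar's test, here is a minimal sketch of the exact two-sided variant. The slide does not say which variant was used, and it does not report the discordant counts, so the values of b and c below are hypothetical. Only the totals (147 and 100 correct out of 540) follow from the reported accuracies.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts.

    b: trials only classifier 1 classified correctly.
    c: trials only classifier 2 classified correctly.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


N_TRIALS = 540  # 9 test subjects x 60 trials

# Correct-trial counts implied by the reported accuracies.
baseline_correct = round(0.1852 * N_TRIALS)  # SVC on raw EEG
encoder_correct = round(0.2722 * N_TRIALS)   # SVC on 12-class encoder output

# HYPOTHETICAL discordant counts: only their difference (47) follows
# from the slide; the actual split is not reported.
b, c = 30, 77
assert c - b == encoder_correct - baseline_correct

p_value = mcnemar_exact(b, c)
print(baseline_correct, encoder_correct, p_value)
```

The test only looks at trials where the two classifiers disagree, which is why the accuracies alone do not determine the p-value.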
Stimulus Classification (9-fold cross-subject validation)
[Figure; class labels: Chim Chim Cheree (lyrics), Take Me Out to the Ballgame (lyrics), Jingle Bells (lyrics), Mary Had a Little Lamb (lyrics), Chim Chim Cheree, Take Me Out to the Ballgame, Jingle Bells, Mary Had a Little Lamb, Emperor Waltz, Hedwig's Theme (Harry Potter), Imperial March (Star Wars Theme), Eine Kleine Nachtmusik]
Mean NN Parameters
  very simple model
  similar patterns for lyrics / non-lyrics pairs
Classifying Imagination
using the same pre-training technique:
  hardly above random accuracy
  most likely due to poor timing / sync
using the same pre-trained filter:
  same problem: hard to learn temporal patterns
=> experiment redesign / different encoder
Tempo Extraction [ISMIR 16]
Tempo Extraction [ISMIR 16]
[Figure: panels (a), (b): audio, tempo 159 BPM; panels (c), (d): EEG, tempo 158 BPM; axes: time (seconds), tempo (BPM)]
Tempo error (%) by number of peaks n and error tolerance δ (BPM)
(a) Single-trial, (b) Fusion I, (c) Fusion II

n  δ   (a)  (b)  (c)
1  0   98   97   83
1  3   84   80   58
1  5   78   75   50
1  7   75   72   42
2  0   96   97   83
2  3   79   67   42
2  5   71   57   33
2  7   65   52   25
3  0   96   97   83
3  3   73   60   42
3  5   62   47   25
3  7   54   40   25

[Figures: absolute BPM error by stimulus ID and participant ID]
Meter Classification
3/4 vs. 4/4
Meter Classification (9-fold cross-subject validation)

Classifier      Features                         Accuracy
SVC             raw EEG                          62.04%
SVC             raw EEG channel mean             58.52%
End-to-end NN   raw EEG                          60.56%
Dummy           output of 12-class classifier    59.63%
SVC             12-class encoder output          69.44%
Neural Net      12-class encoder output          67.77%
SVC             meter-class encoder output       60.19%
Neural Net      meter-class encoder output       58.88%
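One plausible reading of the "Dummy" row is that the predicted stimulus from the 12-class classifier is simply mapped to that stimulus's meter. The slide does not spell this out, so the sketch below is an assumption; the meter table comes from the stimulus list, while the true and predicted sequences are made up.

```python
# Meter per piece, taken from the stimulus list; versions with and without
# lyrics of the same song share one meter.
METER = {
    "Chim Chim Cheree": "3/4",
    "Take Me Out to the Ballgame": "3/4",
    "Jingle Bells": "4/4",
    "Mary Had a Little Lamb": "4/4",
    "Emperor Waltz": "3/4",
    "Harry Potter Theme": "3/4",
    "Imperial March": "4/4",
    "Eine Kleine Nachtmusik": "4/4",
}


def accuracy(true, predicted):
    """Fraction of positions where the two label sequences agree."""
    return sum(t == p for t, p in zip(true, predicted)) / len(true)


def meter_accuracy(true_pieces, predicted_pieces):
    """Meter accuracy obtained by mapping predicted pieces to their meter."""
    return accuracy([METER[t] for t in true_pieces],
                    [METER[p] for p in predicted_pieces])


# Made-up predictions for three trials.
true_pieces = ["Jingle Bells", "Emperor Waltz", "Imperial March"]
predicted_pieces = ["Mary Had a Little Lamb", "Jingle Bells", "Imperial March"]

piece_acc = accuracy(true_pieces, predicted_pieces)
meter_acc = meter_accuracy(true_pieces, predicted_pieces)
print(piece_acc, meter_acc)
```

Under this reading, a wrong stimulus prediction still counts as correct whenever the confused pieces share a meter, so the derived meter accuracy can sit well above the 12-class accuracy.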
Meter Classification (9-fold cross-subject validation using spatial filter from stimulus recognition)
[Figure panels: spatial filter; SVC confusion; NN confusion; NN temporal patterns for 3/4 and 4/4 over time (samples)]
Group Classification
lyrics / non-lyrics / instrumental
Group Classification (9-fold cross-subject validation)

Classifier      Features                         Accuracy
SVC             raw EEG                          40.37%
SVC             raw EEG channel mean             38.70%
End-to-end NN   raw EEG                          37.40%
Dummy           output of 12-class classifier    38.89%
SVC             12-class encoder output          48.88%
Neural Net      12-class encoder output          48.88%
SVC             group-class encoder output       35.37%
Neural Net      group-class encoder output       34.63%
Group Classification (9-fold cross-subject validation using spatial filter from stimulus recognition)
[Figure panels: spatial filter; SVC confusion; NN confusion; NN temporal patterns (legend: 0x, 1x, 2x) over time (samples)]
Conclusions
MIIR Questions
  audio reconstruction: failed (non-sparsity)
  stimulus identification
  beat and tempo tracking
  meter classification
  lyrics / non-lyrics / instrumental classification
Proposed MIIR Approach
  for different music features, attempt classification / regression (derived from typical MIR tasks)
  use similarity-constraint encoder for contrasting, i.e. learn features (from data) that are most different for the classes
  hypothesis-driven encoder design (assumptions about brain activity / features)
  limits: number of trials; subject / stimulus bias
New Questions
1. How can the spatial filter be interpreted?
   recall: it produces distinguishable waveforms
   forward modeling (regression)
2. Which cognitive process results in the prominent signal peak at the 3rd downbeat?
=> learn more about music cognition
Thank You!
Avital Sternin, Jessica A. Grahn, Adrian M. Owen, Thomas Prätzlich, Meinard Müller
contact: sstober@uni-potsdam.de, www.uni-potsdam.de/mlcog/
code: https://github.com/sstober/deepthought (update coming!)
dataset: https://github.com/sstober/openmiir