1 Teasing the Music out of Digital Data Matthias Mauch November, 2012
About me: from Unna; Diplom in maths at Uni Rostock (2005); PhD at Queen Mary: Automatic Chord Transcription from Audio Using Computational Models of Musical Context (2010); since then: AIST, Japan and Last.fm; now: Research Fellow, Lecturer at Queen Mary, University of London; my website: http://matthiasmauch.net
Centre for Digital Music: C4DM is part of the School of Electronic Engineering and Computer Science at Queen Mary, University of London; ~10 years old (founded in 2003); led by Mark Plumbley; more than 50 full-time members: academics (professors, lecturers), research staff, research students, guests
Areas in the C4DM: Audio Engineering: auto-mixing, feedback elimination, ... (Josh Reiss); Interactional Sound & Music: interfaces for interaction with music, ... (Nick Bryan-Kinns); Machine Listening: sparse models of audio, object coding, non-musical/speech sound classification, ... (Mark Plumbley); Music Informatics (my area): automatic transcription, music classification and retrieval (by genre, mood, similarity), segmentation, ... (Simon Dixon); Music Cognition: models of music in human brains, ... (Geraint Wiggins, Marcus Pearce); New Research Areas include performance studies and augmented musical instruments, ... (Elaine Chew, Andrew McPherson)
Music Informatics: my area, led by Simon Dixon. Harmony analysis: automatic chord transcription, chord progressions, key detection; transcription: multiple fundamental frequency estimation, semi-automatic techniques; music classification: genre classification, mood classification; other stuff: analysis of violoncello timbre in recordings, automatic classification of harpsichord temperament, beat tracking, drum patterns, ...
My work: PhD: audio chord transcription; post-doc: lyrics-to-audio alignment; Songle: chord/key/beat tracking; Research Fellow at Last.fm: Driver's Seat; harpsichord tuning estimation; DarwinTunes: analysis of musical evolution
Audio Chord Transcription I. A dynamic Bayesian network (DBN) models musical context [1][2]: per beat i, hidden nodes for metric position M_i, key K_i, chord C_i and bass pitch class B_i; observed nodes for bass chroma X^bs_i and treble chroma X^tr_i. 2012 state-of-the-art adaptation: Ni et al. [3]. [Bar chart: mean overlap rank for the plain, full M, full MB and full MBK context models and Weller et al.; the full context model gives a significant improvement in chord transcription.]
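The decoding idea behind the chord model can be sketched in miniature. This is not the DBN of [1] (which jointly models metric position, key, chord and bass); it is a bare-bones HMM-style illustration, with hypothetical binary chord templates, of how a chroma-to-chord emission score combines with a sticky transition prior via Viterbi decoding:

```python
# Minimal sketch of HMM-style chord decoding from chroma vectors.
# NOT the full DBN of [1]; templates and probabilities are made up.
import math

CHORDS = {  # hypothetical binary chord templates over 12 pitch classes
    "C:maj": [1,0,0,0,1,0,0,1,0,0,0,0],
    "A:min": [1,0,0,0,1,0,0,0,0,1,0,0],
    "G:maj": [0,0,1,0,0,0,0,1,0,0,0,1],
}

def emission(chroma, template):
    """Log-score: dot product of observed chroma with a chord template."""
    dot = sum(c * t for c, t in zip(chroma, template))
    return math.log(dot + 1e-6)

def viterbi_chords(chromagram, self_prob=0.8):
    """Most likely chord per frame under a sticky transition prior."""
    names = list(CHORDS)
    n = len(names)
    trans = [[math.log(self_prob) if i == j
              else math.log((1 - self_prob) / (n - 1))
              for j in range(n)] for i in range(n)]
    score = [emission(chromagram[0], CHORDS[c]) for c in names]
    back = []
    for frame in chromagram[1:]:
        new, ptr = [], []
        for j, c in enumerate(names):
            best_i = max(range(n), key=lambda i: score[i] + trans[i][j])
            new.append(score[best_i] + trans[best_i][j]
                       + emission(frame, CHORDS[c]))
            ptr.append(best_i)
        score, back = new, back + [ptr]
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(back):          # backtrack
        state = ptr[state]
        path.append(state)
    return [names[s] for s in reversed(path)]

# toy chromagram: two C-major frames, then three G-major frames
frames = ([[1,0,0,0,1,0,0,1,0,0,0,0]] * 2
          + [[0,0,1,0,0,0,0,1,0,0,0,1]] * 3)
print(viterbi_chords(frames))  # → C:maj, C:maj, G:maj, G:maj, G:maj
```

The high self-transition probability is what keeps the decoded chord stable between genuine chord changes.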
Audio Chord Transcription II. Averaging features across repeated song segments [4]: non-systematic noise is attenuated, giving better results. [Figure: automatic segmentation of a song into parts (A, B, ...), with chord correctness over time for the method using automatic segmentation vs. the baseline method. Histogram (Figure 6.7): song-wise improvement in relative correct overlap (RCO); the accuracy of most songs improves.]
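The segment-averaging step can be sketched as follows, assuming pre-aligned, equal-length repeats (the real system in [4] time-aligns segments first) and made-up two-dimensional "chroma" values:

```python
# Sketch of the idea in [4]: if structural analysis says two segments
# are repeats (e.g. two verses), average their frame-wise features so
# that non-systematic noise cancels while shared harmonic content
# remains. Segment boundaries and feature values are made up.

def average_repeats(frames, repeat_groups):
    """Replace each frame by the mean over corresponding frames of all
    segments in its repeat group; segments in a group are given as
    (start, end) index pairs of equal length."""
    out = list(frames)
    for group in repeat_groups:            # e.g. [(0, 2), (2, 4)]
        length = group[0][1] - group[0][0]
        for offset in range(length):
            vals = [frames[start + offset] for start, _ in group]
            mean = [sum(col) / len(vals) for col in zip(*vals)]
            for start, _ in group:
                out[start + offset] = mean
    return out

# two "verses" with the same underlying content but different noise
noisy_a = [[1.2, 0.1], [0.1, 0.9]]
noisy_b = [[0.8, -0.1], [-0.1, 1.1]]
smoothed = average_repeats(noisy_a + noisy_b, [[(0, 2), (2, 4)]])
print(smoothed[0])  # → [1.0, 0.0]: the noise cancels
```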
Chordino & NNLS Chroma. NNLS Chroma [5] is a Vamp plugin (e.g. for Sonic Visualiser): download: http://isophonics.net/nnlschroma; source: http://code.soundsoftware.ac.uk/projects/nnls-chroma; contains Chordino, a basic chord estimator.
SongPrompter input format: the first verse gives all lyrics and chords; subsequent verses give only lyrics (chords are omitted); a blank line separates song segments; a heading defines the segment type.

Verse:
Bm G D A
Once you were my love, now just a friend,
Bm G D A
What a cruel thing to pretend.
Bm G D A
A mistake I made, there was a price to pay.
Bm G D A
In tears you walked away,

Verse:
When I see you hand in hand with some other
I slowly go insane.
Memories of the way we used to be...
Oh God, please stop the pain.

Chorus:
D G Em A
Oh, once in a life time
D/F# G A
Nothing can last forever
D/F# G
I know it's not too late
A7 F#/A#
Would you let this be our fate?
Bm G Asus4
I know you'd be right but please stay
A
Don't walk away

Instrumental:
Bm G D A Bm G A

Chorus:
Oh, once in a life time
Nothing can last forever
I know it's not too late
Would you let this be our fate?
I know you'd be right but please stay.
Don't walk away....
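A minimal parser for this sheet convention might look like the sketch below. The chord-symbol regex and the output data structure are my own illustrative assumptions, not SongPrompter's actual parser:

```python
# Parse the song-sheet convention: blank lines separate segments, a
# line ending in ":" names the segment type, and a line consisting
# only of chord symbols attaches to the lyric line that follows it.
# The chord test is a crude heuristic for illustration only.
import re

CHORD = re.compile(r"^[A-G][#b]?(m|maj|min|sus|dim|aug|\d|/[A-G][#b]?)*\d*$")

def parse_sheet(text):
    segments, current, pending_chords = [], None, None
    for line in text.splitlines():
        line = line.strip()
        if not line:                       # blank line: close the segment
            current = None
            continue
        if current is None:
            current = {"type": None, "lines": []}
            segments.append(current)
        if line.endswith(":"):             # heading: segment type
            current["type"] = line[:-1]
        elif all(CHORD.match(tok) for tok in line.split()):
            pending_chords = line.split()  # chord line: hold for next lyric
        else:
            current["lines"].append((pending_chords, line))
            pending_chords = None
    return segments

sheet = """Verse:
Bm G D A
Once you were my love, now just a friend,

Chorus:
D G Em A
Oh, once in a life time
"""
parsed = parse_sheet(sheet)
print(parsed[0]["type"], parsed[0]["lines"][0])
```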
SongPrompter builds on [6] (Mauch, Fujihara & Goto; National Institute of Advanced Industrial Science and Technology (AIST), Japan). Figure 1 of that paper: integrating chord information in the lyrics-to-audio alignment process (schematic illustration); chords printed black represent chord changes, grey chords are continued from a prior chord change.
SongPrompter: automatic alignment works best with speech and chord features [6]; visual display from automatic alignment: lyrics, segmentation and chords; audio playback: original audio, plus auto-extracted bass and drum track. "Karaoke for guitarists!"
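The alignment in [6] is HMM-based, over phoneme and chord states; as a generic stand-in for the idea of snapping one timeline onto another, here is a minimal dynamic-time-warping (DTW) sketch with made-up one-dimensional features:

```python
# Minimal DTW: align two feature sequences monotonically so that the
# summed frame-to-frame distance along the path is minimal. This is a
# generic alignment sketch, not the HMM of [6].

def dtw_path(a, b, dist=lambda x, y: abs(x - y)):
    """Return an optimal monotonic alignment path between a and b."""
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist(a[i-1], b[j-1]) + min(
                cost[i-1][j], cost[i][j-1], cost[i-1][j-1])
    path, i, j = [], n, m              # backtrack from the end
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min([(i-1, j), (i, j-1), (i-1, j-1)],
                   key=lambda p: cost[p[0]][p[1]])
    return list(reversed(path))

# toy example: the second sequence is a time-stretched copy of the first
print(dtw_path([1, 2, 3], [1, 1, 2, 3]))
# → [(0, 0), (0, 1), (1, 2), (2, 3)]
```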
SongPrompter demo
Songle Web Service
Songle.jp web service [7] (live and online): adding interaction; engaging user experience; insights through automatic annotations; anyone can contribute: it's social! Use for MIR research: crowd-sourcing more training data, exposure to a broader audience.
Driver's Seat. Last.fm already has genre tags and similarity; we want a complement: intuitively understandable audio features: harmonic creativity (structural change [8]), noisiness, energy, rhythmic regularity, ... Spotify app based on the Last.fm audio API.
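As a loose sketch of the structural-change idea (not the exact measure of [8]), one can score each frame by how much the feature summary after it differs from the summary before it; pieces whose harmony keeps moving score high at many positions:

```python
# Toy "structural change" score: Euclidean distance between the mean
# feature vector of the preceding and following windows at each frame.
# A loose illustration, not the measure defined in [8].

def structural_change(frames, window=2):
    """Per-frame change score over a list of feature vectors."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    scores = []
    for t in range(window, len(frames) - window):
        past = mean(frames[t - window:t])
        future = mean(frames[t:t + window])
        scores.append(sum((p - f) ** 2
                          for p, f in zip(past, future)) ** 0.5)
    return scores

# features constant, then jumping: the score peaks at the boundary
frames = [[0.0]] * 4 + [[1.0]] * 4
print(structural_change(frames))  # → [0.0, 0.5, 1.0, 0.5]
```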
Driver's Seat. [Architecture diagram: Audio → Audio Feature Extraction API; Spotify ID API; Spotify Apps.]
Driver's Seat
DarwinTunes. Project by Bob MacCallum and Armand Leroi at Imperial College; paper: [9]. Genetic algorithms evolve short musical loops: phenotype production & rating → selection → reproduction, recombination & mutation (generation G_n → G_{n+1}, population 100). 1. The selection process is web-based and crowd-sourced (>6000 unique voters). 2. Evolutionary analysis based on fitness (votes, mean rating) and phenotype (sound surface). Sound surface: a scientific application of music informatics. [Figure: mean rating rising over generations.]
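The DarwinTunes loop can be caricatured in a few lines: a toy genetic algorithm with the crowd-sourced listener ratings replaced by a hypothetical numeric fitness, showing selection, one-point recombination and mutation raising fitness over generations. Everything here (genome encoding, fitness function, rates) is an illustrative stand-in for [9]:

```python
# Toy genetic algorithm: genomes (lists of numbers standing in for
# musical loops) are rated, the fittest reproduce with recombination
# and mutation, and the best fitness rises over generations.
import random

random.seed(1)                             # reproducible toy run
TARGET = [5, 5, 5, 5]                      # hypothetical "ideal loop"

def fitness(genome):
    # 0 is perfect; more negative = further from the target
    return -sum(abs(g - t) for g, t in zip(genome, TARGET))

def breed(mum, dad, mutation_rate=0.1):
    cut = random.randrange(len(mum))       # one-point recombination
    child = mum[:cut] + dad[cut:]
    return [g + random.choice([-1, 1]) if random.random() < mutation_rate
            else g for g in child]         # point mutation

def evolve(pop_size=20, generations=30):
    pop = [[random.randrange(10) for _ in range(4)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    initial_fitness = fitness(best)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]      # truncation selection
        pop = [best] + [breed(random.choice(parents),
                              random.choice(parents))
                        for _ in range(pop_size - 1)]
        best = max(pop, key=fitness)       # elitism: never lose the best
    return initial_fitness, fitness(best)

first, last = evolve()
print(first, "->", last)                   # best fitness rises toward 0
```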
DarwinTunes results. [Figure: Chordino log-likelihood (A) and Rhythmic Complexity (B) over ~2700 generations.] Both measures indicate a drastic rise and subsequent stagnation; the plateau is best explained by fragile features: despite the existence of better tunes, transmission imposes a limit.
Zukunftsmusik (future work). Drum transcription: improve drum transcription by language modelling from a large corpus of symbolic drum patterns. Singing research: make a user interface, Tony, for the quick and simple annotation of pitches in monophonic audio. How do singers correct pitch errors? Do we have a background tuning process in our heads? Collaborate with ethnomusicologists, musicians, psychologists, ...
References [1] Mauch, M., & Dixon, S. (2010). Simultaneous Estimation of Chords and Musical Context from Audio. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1280-1289. [2] Mauch, M. (2010). Automatic Chord Transcription from Audio Using Computational Models of Musical Context. Queen Mary University of London. [3] Ni, Y., McVicar, M., Santos-Rodriguez, R., & De Bie, T. (2012). An end-to-end machine learning system for harmonic analysis of music. IEEE Transactions on Audio, Speech, and Language Processing, in print. [4] Mauch, M., Noland, K. C., & Dixon, S. (2009). Using Musical Structure to Enhance Automatic Chord Transcription. Proceedings of the 10th International Conference on Music Information Retrieval (ISMIR 2009). [5] Mauch, M., & Dixon, S. (2010). Approximate Note Transcription for the Improved Identification of Difficult Chords. Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR 2010). [6] Mauch, M., Fujihara, H. & Goto, M. (2012). Integrating Additional Chord Information Into HMM-Based Lyrics-to-Audio Alignment. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 200-210. [7] Goto, M., Yoshii, K., Fujihara, H., Mauch, M., & Nakano, T. (2011). Songle: A Web Service for Active Music Listening Improved by User Contributions. Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011) (pp. 311-316). [8] Mauch, M., & Levy, M. (2011). Structural Change on Multiple Time Scales as a Correlate of Musical Complexity. Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011) (pp. 489-494). [9] MacCallum, B., Mauch, M., Burt, A., & Leroi, A. M. (2012). Evolution of Music by Public Choice. Proceedings of the National Academy of Sciences of the United States of America, 109(30), 12081-12086.