CTP431: Music and Audio Computing
Music Information Retrieval
Graduate School of Culture Technology, KAIST
Juhan Nam

Introduction
- Instrument: Piano
- Genre: Classical
- Composer: Chopin
- Key: E minor
- Mood: Melancholy, sad
- Songs with a similar melody
  - ELO, "After All"
  - Radiohead, "Exit Music"
- Can you transcribe the song into a music score?

Information in Music
- Factual information: track, artist, year, composer
- Musical information
  - Music score: instrument, notes, meter, expressions
  - Melody, rhythm, chords, structure
- Semantic information: genre, mood, text descriptions

Music Understanding by Humans
- Figure source: http://www.slideshare.net/daritsetseg/brainstem-auditory-evoked-responses-baer-or-abr-45762118

Music Understanding by Computer
- Music Information Retrieval (MIR): an area of research that aims to infer various types of information from music using computers

Applications of MIR
- Music listening: music identification, search, and recommendation
- Music performance: interactive music performance, musical instrument learning
- Music composition: automatic composition and arrangement
- Entertainment: singing evaluation, games
- Sound production: sound sample search in sound libraries, automatic segmentation, and digital audio effects

Background
- Scale and diversity of music content
  - Commercial music tracks
    - Spotify: 30M+ songs (2015)
    - Bugs Music: 10M+ songs (2017)
  - User content
    - YouTube: 300+ hours of video uploaded per minute (2015)
    - SoundCloud: 12+ hours of audio uploaded per minute (2014)
- User data: profile, play history, ratings
  - Spotify: 24M+ active users (as of Jan. 2014)
  - YouTube: 1B+ unique users visit each month (as of Dec. 2014)
- All of this music content is readily accessible.
  - How can I find music that suits my taste?
  - Can we have a "Google for music"?

Music Identification
- Query by music
  - Search for the single unique song identified by the query
  - Audio fingerprinting
- Example: Shazam
- Figure: Audio Fingerprinting (http://labrosa.ee.columbia.edu/matlab/fingerprint/)

Music Identification
- Query by humming
  - Hum a melody and find the closest matches
  - Melody-based matching (melody extraction)
- Example: SoundHound

Music Search and Recommendation
- Music recommendation
  - Playlist generation: personalized internet radio
  - Matching songs to users
    - Song information: genre, year, artist, audio
    - User information: profile, play history, rating, context (place)
- A music service item in industry: Google, Apple, Pandora, Spotify, Melon, Bugs, iTunes Music
- Example: Pandora

Current Approaches
- Manual curation
- Human expert analysis
- Collaborative filtering
- Content-based analysis (by computers)

Manual Curation
- Playlist generation by music experts (or users)
  - Traditional: AM/FM radio
  - The majority of current music services are based on this approach
- Advantages
  - Effective for usage-based music services (workout, study, driving, or prenatal education)
  - Good for music discovery
  - Often comes with storytelling
- Limitations
  - No personalization
  - Not scalable
- Figure source: www.soribada.com

Human Expert Analysis
- Pandora: Music Genome Project (1999)
  - Musicologists analyze each song for about 450 musical attributes in various categories
  - Big success as a music service
- Advantages
  - High-quality analysis
  - Good for music discovery
- Limitations
  - Expensive: it takes 20-30 minutes to analyze one song
  - Not scalable: only for commercial tracks?

Collaborative Filtering (CF)
- Basic idea
  - Person A: I like songs A, B, C, and D.
  - Person B: I like songs A, B, C, and E.
  - Person A: Really? You should check out song D.
  - Person B: Wow, you should also check out song E.
- Formulation: a matrix factorization (or matrix completion) problem
  - Song preference: $p_{us} = x_u^T y_s$
  - User similarity: $q_{u_1 u_2} = x_{u_1}^T x_{u_2}$
  - Song similarity: $r_{s_1 s_2} = y_{s_1}^T y_{s_2}$
  - Here $x_u$ is the latent vector of user $u$ (e.g. Juhan) and $y_s$ is the latent vector of song $s$ (e.g. "Gangnam Style")
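
The three quantities on this slide are all inner products of latent vectors. The following is a minimal sketch of that idea, assuming the latent vectors have already been learned by some matrix factorization step (not shown). The user names, song names, vector values, and the helper `recommend` are hypothetical and chosen only for illustration.

```python
# Hypothetical 2-dimensional latent vectors. A real system would learn these
# by factorizing the user-song play (or rating) matrix.
user_vectors = {
    "user_a": [0.9, 0.1],
    "user_b": [0.8, 0.3],
}
song_vectors = {
    "song_d": [1.0, 0.0],
    "song_e": [0.7, 0.6],
    "song_f": [0.0, 1.0],
}


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def song_preference(user, song):
    """p_us = x_u^T y_s: how much `user` is predicted to like `song`."""
    return dot(user_vectors[user], song_vectors[song])


def user_similarity(u1, u2):
    """q_u1u2 = x_u1^T x_u2."""
    return dot(user_vectors[u1], user_vectors[u2])


def song_similarity(s1, s2):
    """r_s1s2 = y_s1^T y_s2."""
    return dot(song_vectors[s1], song_vectors[s2])


def recommend(user, candidates):
    """Rank candidate songs by predicted preference, highest first."""
    return sorted(candidates, key=lambda s: song_preference(user, s), reverse=True)


ranking = recommend("user_a", ["song_f", "song_e", "song_d"])
print(ranking)
```

Because every score is an inner product, the same vectors serve recommendation, user-to-user similarity, and song-to-song similarity.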

Collaborative Filtering
- Advantages
  - Captures the semantics of music from the human perspective
  - Enables personalized recommendation (by nature)
- Limitations
  - The cold-start problem: what if a song has never been played by anyone?
  - Popularity bias: likely to recommend (already) well-known songs, or songs from the same musician or album

Collaborative Filtering
- Bad examples: can you find songs similar to this musician's? [Oord et al., 2013]

Content-Based Analysis
- An intelligent approach that makes computers listen to music and predict descriptive words from audio tracks
  - Tags: genre, mood, instrument, voice quality, usage
  - Features: spectrogram, MFCC
  - Algorithms: GMM, SVM, neural networks
- Pipeline: audio files -> audio features -> algorithms

Text-based Music Retrieval by Auto-tagging
- Sort songs by the probability of the query tag and choose the top-n songs
  - Like a text-based Google search
- Query word: "Female Lead Vocals"; top 5 ranked songs:
  - Norah Jones, "Don't Know Why"
  - Dido, "Here with Me"
  - Sheryl Crow, "I Shall Believe"
  - No Doubt, "Simple Kind of Life"
  - Carpenters, "Rainy Days and Mondays"
- We can also compute similarity between songs using the estimated tag probabilities
  - E.g. cosine distance between two tag probability vectors
  - Applicable to query by audio
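
The two retrieval operations on this slide (ranking by a query tag, and comparing songs by cosine distance between tag probability vectors) can be sketched as follows. This is only an illustration: the song IDs, the tag vocabulary, and the probability values are made up, standing in for the output of an auto-tagging model.

```python
import math

# Hypothetical auto-tagger output: song -> {tag: estimated probability}.
tag_probs = {
    "song_1": {"female lead vocals": 0.9, "piano": 0.6, "rock": 0.1},
    "song_2": {"female lead vocals": 0.8, "piano": 0.1, "rock": 0.7},
    "song_3": {"female lead vocals": 0.1, "piano": 0.2, "rock": 0.9},
}


def top_n(query_tag, n):
    """Text-based retrieval: sort songs by the query tag's probability."""
    ranked = sorted(
        tag_probs,
        key=lambda song: tag_probs[song].get(query_tag, 0.0),
        reverse=True,
    )
    return ranked[:n]


def cosine_distance(song_a, song_b):
    """1 - cosine similarity between two tag probability vectors."""
    a, b = tag_probs[song_a], tag_probs[song_b]
    tags = sorted(set(a) | set(b))
    va = [a.get(t, 0.0) for t in tags]
    vb = [b.get(t, 0.0) for t in tags]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return 1.0 - dot / norm


def most_similar(query_song):
    """Query by audio: the other song closest to `query_song` in tag space."""
    others = [s for s in tag_probs if s != query_song]
    return min(others, key=lambda s: cosine_distance(query_song, s))


top_songs = top_n("female lead vocals", 2)
closest = most_similar("song_1")
print(top_songs, closest)
```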

Demo: Music Galaxy Hitchhiker
- Figure (b): "Search by Song" mode with highlighted search results

Content-based Music Recommendation
- Blending audio and user data
  - Replace the text-based tags with the latent vector of a song
- Figure labels: user, song; the latent vector of "Gangnam Style" (matrix factorization from collaborative filtering); the audio track of "Gangnam Style" [Oord et al., 2013]

Music Retrieval Results [Oord et al., 2013]
- Collaborative filtering only
- Collaborative filtering + audio content

Content-Based Analysis
- Advantages
  - Free of the cold-start problem and popularity bias
  - Highly scalable with high-performance computing
  - Works for music in other media or user content as well
  - Can be combined with other approaches
- Limitations
  - Social context is also important: indie, idol, affiliation
  - Does not account for music quality (e.g. level of performance), especially for user content

Automatic Music Transcription (AMT)
- Predict score information from audio
  - Note information: note onset, duration, velocity
  - Rhythm: tempo, beat, downbeat
  - Chord
  - Structure

Zenph's Re-performance

Entertainment / Education
- Yousician

Score-Audio Alignment
- Temporally align audio and score
  - Dynamic time warping, using AMT results as audio features
- Applications
  - Score following
  - Automatic page turning
  - Auto-accompaniment
  - Performance analysis
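
Dynamic time warping (DTW) finds the lowest-cost monotonic alignment between two sequences. Below is a minimal sketch of the classic algorithm. As a stand-in for "AMT results as audio features", it assumes each audio frame has already been reduced to a single MIDI pitch number; the pitch sequences and the absolute-difference cost are illustrative choices, not the method used in the systems shown in this lecture.

```python
def dtw(score_feats, audio_feats):
    """Align two 1-D feature sequences; return (total cost, warping path).

    The path is a list of (score_index, audio_index) pairs.
    """
    n, m = len(score_feats), len(audio_feats)
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i score items
    # with the first j audio frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(score_feats[i - 1] - audio_feats[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance both
                cost[i - 1][j],      # advance score only
                cost[i][j - 1],      # advance audio only
            )

    # Backtrack from the end to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


# Hypothetical data: three score notes, five audio frames (MIDI pitch per frame).
score_pitches = [60, 62, 64]
audio_pitches = [60, 60, 62, 64, 64]
total_cost, path = dtw(score_pitches, audio_pitches)
print(total_cost, path)
```

The path tells a score follower which score note each audio frame belongs to, which is the information needed for page turning or accompaniment.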

Automatic Page Turner (JKU, Austria)

The Piano Music Companion (JKU, Austria)

Sonation's Cadenza

Music Production https://www.youtube.com/watch?v=rmt6mdod3uc

Music Production
- Adaptive audio effects: automatic effect control
  - Loudness: compressor
  - Pitch: pitch correction (e.g. Auto-Tune), harmonizer
  - Timbre: genre-based automatic EQ
- Example: Antares Auto-Tune
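
To make the idea of pitch correction concrete, here is a minimal sketch of its core step: snapping a detected frequency to the nearest equal-tempered semitone. It assumes A4 = 440 Hz tuning and a chromatic target scale, and it is not a description of how Antares Auto-Tune works internally. The function name `nearest_semitone_hz` is hypothetical.

```python
import math

A4_HZ = 440.0   # assumed reference tuning
A4_MIDI = 69    # MIDI note number of A4


def nearest_semitone_hz(freq_hz):
    """Return the frequency of the equal-tempered semitone closest to freq_hz."""
    midi = A4_MIDI + 12 * math.log2(freq_hz / A4_HZ)  # fractional MIDI pitch
    snapped = round(midi)                              # nearest semitone
    return A4_HZ * 2 ** ((snapped - A4_MIDI) / 12)


# A slightly sharp A4 and a slightly flat C4 (hypothetical detected pitches).
corrected_a4 = nearest_semitone_hz(445.0)
corrected_c4 = nearest_semitone_hz(260.0)
print(corrected_a4, corrected_c4)
```

A complete effect would also need a pitch detector and a pitch shifter; the ratio between the corrected and detected frequency is the amount the shifter has to apply.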

Music Production
- Singing expression transfer
  - Given two renditions of the same piece of music, transfer singing expressions from one voice to the other
  - Expressions: note timing, pitch, dynamics

Singing Expression Transfer
- Figure: block diagram with inputs "target singing voice" and "source singing voice" and output "modified singing voice"
  - Temporal alignment: feature extraction -> DTW -> smoothing; stretching ratio -> smoothed stretching ratio -> time-scale modification
  - Pitch alignment: HPSS -> harmonic signal -> pitch detector; pitch ratio -> pitch shifting
  - Dynamics alignment: envelope detector; gain ratio -> gain
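
The pitch ratio and gain ratio in this diagram can be read as frame-wise ratios between the two voices. The sketch below assumes the two voices are already time-aligned frame by frame, that the ratio is taken as target over source (so that applying it to the source moves it toward the target), and that silent or unvoiced frames (value 0) are left unchanged. All values and the helper `frame_ratios` are hypothetical.

```python
def frame_ratios(target, source, default=1.0):
    """Per-frame target/source ratio.

    Frames where either value is not positive (e.g. unvoiced frames with
    f0 = 0) get `default`, meaning "leave the source unchanged".
    """
    return [
        t / s if t > 0 and s > 0 else default
        for t, s in zip(target, source)
    ]


# Hypothetical, already time-aligned frames.
target_f0 = [440.0, 330.0, 0.0]     # Hz; 0.0 marks an unvoiced frame
source_f0 = [220.0, 220.0, 220.0]
target_env = [0.5, 0.2, 0.1]        # amplitude envelope
source_env = [0.25, 0.4, 0.1]

pitch_ratio = frame_ratios(target_f0, source_f0)   # input to pitch shifting
gain_ratio = frame_ratios(target_env, source_env)  # input to the gain stage
print(pitch_ratio, gain_ratio)
```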

Singing Expression Transfer: Demo Examples
- Audio clips: source, target, all-modified source
- Songs: 벚꽃엔딩 (Cherry Blossom Ending), Let It Go, 취중진담 (Drunken Truth)

Music Production
- Sound sample search
  - Imagine Research's MediaMind: search for sound effect samples for media production (e.g. film, drama)
  - iZotope's BreakTweaker: search for drum sounds with similar timbre

Automatic Music Composition
- Algorithmic composition: an area of generative art
- Types of algorithms
  - Generative grammar
  - Transition network
  - Markov model
  - Genetic algorithms
  - Neural networks
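
As one concrete instance of the algorithm types listed here, a first-order Markov model learns which note tends to follow which, then samples a new melody from those transitions. The sketch below is a toy illustration with a made-up training melody and hypothetical function names. It is not the method of any particular system mentioned in this lecture.

```python
import random
from collections import defaultdict


def train_transitions(melody):
    """Collect, for each note, the list of notes that followed it."""
    transitions = defaultdict(list)
    for current, following in zip(melody, melody[1:]):
        transitions[current].append(following)
    return transitions


def generate(transitions, start, length, rng):
    """Sample a melody by repeatedly choosing a successor of the last note."""
    melody = [start]
    while len(melody) < length:
        choices = transitions.get(melody[-1])
        if not choices:  # dead end: the note was never followed by anything
            break
        melody.append(rng.choice(choices))
    return melody


# Hypothetical training melody (note names only, no rhythm).
training_melody = ["C", "D", "E", "C", "E", "G", "E", "D", "C"]
table = train_transitions(training_melody)
new_melody = generate(table, "C", 8, random.Random(0))
print(new_melody)
```

Repeated successors in the lists act as transition probabilities, so frequent note pairs in the training melody are sampled more often.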

Automatic Music Composition
- David Cope's EMI (Experiments in Musical Intelligence) (1980s)
  - Based on style imitation
  - Augmented transition networks

Recent Work: Automatic Music Composition
- Flow Machines: style imitation based on Markov models (http://www.flow-machines.com/)
- Magenta: a Python library based on deep neural networks (TensorFlow) (https://magenta.tensorflow.org/welcome-to-magenta)

"Daddy's Car": Sony CSL Lab's Flow Machines

Automatic Music Composition
- Background music generation: www.jukedeck.com

Automatic Music Arrangement
- Cool Jamm (쿨잼): Hum On

Musical Process and Data
- Figure: diagram of musical processes and data (legend: data, process). Labels: musical knowledge base, composer, symbolic representation, performer, temporal control, instrument, source sound, room, sound field, listener (perception, cognition), physical knowledge base

Music Technology: The Present
- Figure: the musical process and data diagram, with the same labels as in "Musical Process and Data"

Music Technology: The Future
- Figure: the musical process and data diagram, with the same labels as in "Musical Process and Data"