Production
  Old School: Professional Studio
  New School: Personal Studio
Distribution
  Old School: Large Scale, Physical, Cumbersome
  New School: Small Scale, Virtual, Portable
Promotion
  Old School: Critics, Radio DJs
  New School: Social Networks, Personalized Internet Radio
Age of Music Proliferation
  Producers: 5M Artists, 150M Songs, 27K Record Labels
  Consumers: 140M iPods, 50M Customers, 31% Americans
  Semantic Music Discovery Engine (shown between producers and consumers)
Talk Outline
  Age of Music Proliferation - Sec. 1.1
  Music Search & Discovery - Sec. 1.2
  Semantic Music Discovery Engine - Sec. 1.3
  Collecting Music Information - Ch. 3, 4
  Autotagging System - Ch. 2
  CAL Music Discovery Engine - Sec. 1.4
  Concluding Remarks - Ch. 5
Music Search
  Search - retrieving specific audio content
  Common Paradigms:
  1. Query-by-Metadata
  2. Query-by-Performance
  3. Query-by-Fingerprint
Music Discovery
  Discovery - finding new music or relationships
  Common Paradigms:
  1. Recommendation-by-Popularity
  2. Browse-by-Genre
  3. Query-by-Similarity (Acoustic, Social, Semantic)
  4. Query-by-Description
Semantic Music Discovery Engine
  Index music with tags so that it can be retrieved using a semantic description
  Tag - a short text-based token
    e.g., "mellow", "classic rock", "acoustic slide guitar"
    real-valued weight = strength of association
  Semantic - use meaningful words to describe music
    e.g., "mellow classic rock that sounds like the Beatles and features an acoustic slide guitar"
    akin to Internet search engines
Semantic Music Discovery Engine
  [Architecture diagram with three stages: Collection, Extraction, Discovery]
  Data Sources: Artists & Record Labels; Human Annotation (Surveys, Annotation Games, Internet Music Sites)
  Collected data: Audio Tracks, Metadata, Tags, Web-documents
  Analytic Systems (Automatic Annotation): Music Processing System (Audio Characteristics), Autotagging System (Autotags), Text-mining System
  Music Information Index
  Discovery Engine: Search Engine, Internet Radio, Social Network
Semantic Music Discovery Engine
  [Architecture diagram, partial view: Collection, Extraction, Discovery]
  Data Sources: Artists & Record Labels; Human Annotation
  Collected data: Audio Tracks, Metadata
Music
  Last.fm - 150M songs by 16M artists
  CAL500 - 500 songs by 500 artists
  Long Tail Economics - Chris Anderson (2004)
    [Plot: popularity versus songs. Short Tail - Popular; Long Tail - Obscure]
  Cold Start Problem - Songs in the long tail are not annotated and thus cannot be discovered.
Metadata
  Factual information about music
    song, album, artist, record label
    year, biographical, charts
  Heterogeneous data
    strings, numbers, images, graphs
Metadata
  Example page: http://www.allmusic.com/cg/amg.dll?p=amg&sql=11:difrxqr5ldje
Semantic Music Discovery Engine
  [Architecture diagram, partial view: Collection, Extraction, Discovery]
  Data Sources: Artists & Record Labels; Human Annotation
  Collected data: Audio Tracks, Metadata
  Analytic Systems (Automatic Annotation): Music Processing System (Audio Characteristics)
Music Processing Systems
  Information extracted from the audio signal
    Acoustic - noise, roughness
    Rhythmic - tempo, patterns
    Harmonic - key, major/minor
    Structural - chorus locations
Semantic Music Discovery Engine
  [Architecture diagram, partial view: Collection, Extraction, Discovery]
  Data Sources: Artists & Record Labels; Human Annotation (Surveys, Annotation Games, Internet Music Sites)
  Collected data: Audio Tracks, Metadata, Tags
  Music Processing System (Audio Characteristics)
Surveys: Pandora Music Genome Project
  400 Objective Genes
  50 trained music experts
  750,000 songs annotated
Surveys: CAL500 Survey
  174-tag vocab - genre, emotion, etc.
  Paid 55 undergrads to annotate music for 120 hours
  500 songs, each annotated by at least 3 people
Human Annotations: Conducting a survey
  Pros: Reliable, Precise, Tailored to Application
  Cons: Expensive, Laborious, Not Scalable
Annotation Games
  Human-Computation
    Web-based, multi-player game with real-time interaction
    Players contribute useful annotations through game play
  ESPGame for images [Von Ahn]
  Listen Game for songs
Listen Game
Human Annotation
  Survey
    Pros: Reliable, Precise, Tailored to Application
    Cons: Expensive, Laborious, Not Scalable
  Annotation Game
    Pros: Cheap, Scalable, Precise, Personalized
    Cons: Need to create a viral user experience
Music Web Sites
  1. Social Tagging Site
     Users annotate music with tags
     Last.fm - 960K distinct tags
     Example page: http://www.last.fm/music/redhotchilipeppers/_/giveitaway
Music Web Sites
  2. Collecting Web Documents
     Song & Album Reviews
     Artist Biographies
     Music Blogs, Discussion Boards
     Sources: Allmusic, Rolling Stone, Amazon, Mog
Web Documents
  Genres: Funk (3), Funk-metal, Funk-rock, Pop, Rap
  Vocals: Nasal, Staccato, Enunciation, Distinctive vocals
  Instruments: Guitar, Bass, Jew's-harp
  Adjective: Hard-rocking (2), Noisy, Scratchy, Sliding, Positive vibes
Collecting an Annotated Music Corpus
  Survey
    Pros: Reliable, Precise, Tailored to Application
    Cons: Expensive, Laborious, Not Scalable
  Annotation Game
    Pros: Cheap, Scalable, Precise, Personalized
    Cons: Need to create a viral user experience
  Music Web Sites
    Pros: Cheap, Annotations for short-tail
    Cons: Noisy, long-tail is poorly represented
Semantic Music Discovery Engine
  [Architecture diagram, partial view: Collection, Extraction, Discovery]
  Data Sources: Artists & Record Labels; Human Annotation (Surveys, Annotation Games, Internet Music Sites)
  Collected data: Audio Tracks, Metadata, Tags
  Analytic Systems (Automatic Annotation): Music Processing System (Audio Characteristics), Autotagging System (Autotags)
Autotagging System
  Our goal is to build a system that can
  1. Annotate a song with meaningful tags
  2. Retrieve songs given a text-based query
  Example: Frank Sinatra - Fly Me to the Moon
    Annotation (song to tags) and Retrieval (tags to song): Jazz, Male Vocals, Sad, Slow Tempo
  Plan: Learn a probabilistic model that captures a relationship between audio content and tags.
System Overview
  [Block diagram with four stages: Data, Representation, Modeling, Evaluation]
  Data: Training Data, Vocabulary, Novel Song, Text Query
  Representation: Annotation Vectors, Audio Feature Extraction
  Modeling: Parametric Model, Parameter Estimation, Inference
  Evaluation: annotation (Music Review), retrieval
Semantic Representation
  Choose a vocabulary of musically relevant tags
    Instruments, Genre, Emotion, Vocal, Usages
  Annotations are converted to a real-valued vector
    Semantic association between a tag and a song
  Example: Frank Sinatra's Fly Me to the Moon
    Vocab = {funk, jazz, guitar, sad, female vocals}
    y = [0/4, 3/4, 4/4, 2/4, 0/4]
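The conversion from annotations to a real-valued vector can be sketched as follows. This is a minimal illustration, assuming each weight is the fraction of annotators who applied the tag, which matches the x/4 entries in the example. The function name `annotation_vector` and the four annotator responses are hypothetical, invented only to reproduce the example vector.

```python
def annotation_vector(vocab, annotations):
    """Fraction of annotators who applied each vocabulary tag to one song."""
    n = len(annotations)
    return [sum(tag in a for a in annotations) / n for tag in vocab]

vocab = ["funk", "jazz", "guitar", "sad", "female vocals"]

# Hypothetical responses from four annotators (not taken from the CAL500 data)
annotations = [
    {"jazz", "guitar", "sad"},
    {"jazz", "guitar", "sad"},
    {"jazz", "guitar"},
    {"guitar"},
]

y = annotation_vector(vocab, annotations)
print(y)  # [0.0, 0.75, 1.0, 0.5, 0.0]
```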
Acoustic Representation
  Each song is represented as a bag-of-feature-vectors
    Pass a short time window over the audio signal
    Extract a feature vector for each short-time audio segment
    Ignore temporal relationships of the time series
  X = {x_1, x_2, x_3, ..., x_T}
Audio Features
  We calculate MFCC+Delta feature vectors
  Mel-frequency Cepstral Coefficients (MFCC)
    Low-dimensional representation of the short-term spectrum
    Popular for representing speech, music, and sound effects
  Instantaneous derivatives (deltas) encode short-time temporal info
  5,200 39-dimensional vectors per minute
  Numerous other audio representations
    Spectral features, modulation spectra, chromagrams, etc.
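A rough sketch of how delta features extend a bag of MFCC vectors. It assumes the 39 dimensions are 13 MFCCs plus 13 first and 13 second time differences, a common layout that the slide does not spell out. The MFCC matrix is random stand-in data, and `add_deltas` is a hypothetical helper, not the thesis code.

```python
import numpy as np

def add_deltas(mfcc):
    """Stack MFCCs with first and second time differences: (T, 13) -> (T, 39)."""
    delta = np.gradient(mfcc, axis=0)    # first derivative along the time axis
    delta2 = np.gradient(delta, axis=0)  # second derivative along the time axis
    return np.hstack([mfcc, delta, delta2])

rng = np.random.default_rng(0)
mfcc = rng.normal(size=(5200, 13))  # stand-in for about one minute of MFCC frames
X = add_deltas(mfcc)                # bag of feature vectors, one row per frame
print(X.shape)                      # (5200, 39)
```

Because the deltas are computed before the frames are pooled, each row keeps a little local temporal context even though the bag itself ignores frame order.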
Statistical Model
  Supervised Multi-class Labeling (SML) model
    One Gaussian Mixture Model (GMM) per tag - p(x | t)
    Key Idea: the GMM for a tag is trained with the songs associated with that tag
  Notes:
    Developed for image annotation [Carneiro & Vasconcelos 05]
    Scalable and Parallelizable
    Modified for real-valued weights rather than binary labels
    Extended formulation to handle multi-tag queries
Modeling a Song
  [Diagram: Bag of MFCC vectors, fitted with EM]
  Algorithm:
  1. Segment audio signals
  2. Extract short-time feature vectors
  3. Estimate GMM with EM algorithm
Modeling a Tag
  Algorithm:
  1. Identify songs associated with tag t
  2. Estimate a song GMM for each song - p(x | s)
  3. Use the Mixture Hierarchies EM algorithm [Vasconcelos01]
     Learn a mixture of mixture components
  [Diagram: song models for the tag "romantic" combined by Standard EM or Mixture Hierarchies EM into the Tag Model p(x | t)]
  Benefits:
    Computationally efficient for parameter estimation and inference
    Smoothed song representation gives a better density estimate
Annotation
  Given a novel song X = {x_1, ..., x_T}, calculate P(t | X), assuming:
  1. Uniform tag prior
  2. Vectors are conditionally independent given a tag
  3. Geometric average of likelihoods
  4. Tags are mutually exclusive and exhaustive
  Semantic Multinomial: P(t | X) is a multinomial distribution over the tag vocabulary
  Annotation: peaks of the multinomial
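Under the four assumptions listed, one consistent reading of the posterior is P(t | X) = (prod_i p(x_i | t))^(1/T) / sum_v (prod_i p(x_i | v))^(1/T), where v ranges over the vocabulary. The sketch below computes this in log space. It is an illustration under that reading, not the thesis implementation. The per-frame likelihoods are toy numbers standing in for tag GMM outputs, and `semantic_multinomial` is a hypothetical name.

```python
import numpy as np

def semantic_multinomial(loglik):
    """loglik[i, t] = log p(x_i | t) for frame i and tag t; returns P(t | X)."""
    avg = loglik.mean(axis=0)  # log of the geometric average over the T frames
    avg = avg - avg.max()      # shift for numerical stability before exponentiating
    p = np.exp(avg)
    return p / p.sum()         # uniform prior, so normalising over tags is enough

# Toy likelihoods for 2 frames and 3 tags (stand-ins for tag GMM outputs)
loglik = np.log(np.array([[0.2, 0.1, 0.1],
                          [0.8, 0.1, 0.1]]))
p = semantic_multinomial(loglik)
print(p)           # approximately [0.667, 0.167, 0.167]
print(p.argmax())  # 0, the peak of the multinomial, i.e. the tag used to annotate
```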
Annotation
  Semantic Multinomial for Give it Away by the Red Hot Chili Peppers
  [Plot: P(t | X) over the tag vocabulary]
Annotation: Automatic Music Reviews
  Dr. Dre (feat. Snoop Dogg) - Nuthin' but a 'G' thang
    This is a dance poppy, hip-hop song that is arousing and exciting. It features drum machine, backing vocals, male vocal, a nice acoustic guitar solo, and rapping, strong vocals. It is a song that is very danceable and with a heavy beat that you might like listen to while at a party.
  Frank Sinatra - Fly me to the moon
    This is a jazzy, singer / songwriter song that is calming and sad. It features acoustic guitar, piano, saxophone, a nice male vocal solo, and emotional, high-pitched vocals. It is a song with a light beat and a slow tempo that you might like listen to while hanging with friends.
Retrieval
  1. Annotate each song in the corpus with a semantic multinomial p
     p = {P(t_1 | X), ..., P(t_V | X)}
  2. Given a text-based query, construct a query multinomial q
     q_i = 1/|t|, if tag t_i appears in the query string (|t| = number of vocabulary tags in the query)
     q_i = 0, otherwise
  3. Rank all songs by the Kullback-Leibler (KL) divergence
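The three retrieval steps can be sketched as below. The direction KL(q || p) is an assumption: the slide only names the divergence, and this direction stays finite when q contains zeros. The song names, multinomials, and function names are hypothetical toy values.

```python
import numpy as np

def query_multinomial(query_tags, vocab):
    """Uniform mass on the vocabulary tags that appear in the query, zero elsewhere."""
    q = np.array([1.0 if t in query_tags else 0.0 for t in vocab])
    return q / q.sum()

def kl_divergence(q, p):
    """KL(q || p), using the convention 0 * log(0 / p) = 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def rank_songs(q, songs):
    """Song names ordered from smallest to largest divergence from the query."""
    return sorted(songs, key=lambda name: kl_divergence(q, songs[name]))

vocab = ["tender", "pop", "female vocals", "rock"]
# Toy semantic multinomials for three hypothetical songs
songs = {
    "A": np.array([0.3, 0.3, 0.3, 0.1]),
    "B": np.array([0.1, 0.1, 0.1, 0.7]),
    "C": np.array([0.5, 0.2, 0.2, 0.1]),
}
q = query_multinomial({"tender", "pop", "female vocals"}, vocab)
ranking = rank_songs(q, songs)
print(ranking)  # ['A', 'C', 'B']
```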
Retrieval
  Query: "a tender pop song with female vocals"
  Query Multinomial: tender, pop, female vocals (0.33 each)
  Retrieved songs (value shown beside each result: 0.024):
  1. Shakira - The One
  2. Alicia Keys - Fallin'
  3. Evanescence - My Immortal
Retrieval
  Query: Tender
    Retrieved Songs: Crosby, Stills and Nash - Guinevere; Jewel - Enter from the East; Art Tatum - Willow Weep for Me
  Query: Female Vocals
    Retrieved Songs: Alicia Keys - Fallin'; Shakira - The One; Junior Murvin - Police and Thieves
  Query: Tender AND Female Vocals
    Retrieved Songs: Jewel - Enter from the East; Evanescence - My Immortal; Cowboy Junkies - Postcard Blues
Semantic Music Discovery Engine
  [Architecture diagram, partial view: Collection, Extraction, Discovery]
  Data Sources: Artists & Record Labels; Human Annotation (Surveys, Annotation Games, Internet Music Sites)
  Collected data: Audio Tracks, Metadata, Tags, Web-documents
  Analytic Systems (Automatic Annotation): Music Processing System (Audio Characteristics), Autotagging System (Autotags), Text-mining System
Text-mining System
  Relevance Scoring [Knees 08]
    site-specific queries: Amazon, AMG, Billboards, etc.
    weight-based approach
  Step 1: Collect Corpus
    For each song, use a search engine to retrieve web pages:
      site:<website> "<artist>" music
      site:<website> "<artist>" "<album>" music review
      site:<website> "<artist>" "<song>" music review
    Maintain I_{s,d} = mapping of songs to documents
Text-mining System
  Step 2: Autotag songs
    For each tag t:
    1. Query the corpus with tag t to find relevant documents
       w_{t,d} = relevance score for document d
    2. For each song s, sum the relevance scores of documents that are related to song s
       w_{s,t} = Σ_d I_{s,d} w_{t,d}
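Summing over documents for every song and tag at once is a matrix product. The sketch below assumes I_{s,d} is a 0/1 indicator and uses toy relevance scores. The array names `I`, `W_td`, and `W_st` are illustrative only.

```python
import numpy as np

# I[s, d] = 1 if document d was retrieved for song s (toy mapping: 2 songs x 3 documents)
I = np.array([[1, 1, 0],
              [0, 1, 1]])

# W_td[t, d] = relevance score of document d for tag t (toy scores: 2 tags x 3 documents)
W_td = np.array([[0.9, 0.1, 0.0],
                 [0.0, 0.4, 0.8]])

# W_st[s, t] = sum over d of I[s, d] * W_td[t, d]
W_st = I @ W_td.T
print(W_st)  # approximately [[1.0, 0.4], [0.1, 1.2]]
```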
Semantic Music Discovery Engine
  [Architecture diagram, partial view: Collection, Extraction, Discovery]
  Data Sources: Artists & Record Labels; Human Annotation (Surveys, Annotation Games, Internet Music Sites)
  Collected data: Audio Tracks, Metadata, Tags, Web-documents
  Analytic Systems (Automatic Annotation): Music Processing System (Audio Characteristics), Autotagging System (Autotags), Text-mining System
  Music Information Index
Comparing Tags
  Ground truth
    CAL500 - binary labeling of song-tag pairs
    Long Tail - subset of 87 obscure songs
  Approaches
  1. Social Tags - Last.fm
  2. Annotation Game - Listen Game
  3. Web Autotags - Site-specific relevance scoring
  4. Audio Autotags - SML model w/ MFCCs
Comparing Tags
  For each approach:
    For each tag:
    1. Rank songs
    2. Calculate Area under the ROC curve (AROC)
       0.5 = random ranking (Bad)
       1.0 = perfect ranking (Good)
    Calculate mean AROC
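For a ranked list with binary relevance labels and no ties, AROC equals the fraction of (relevant, irrelevant) song pairs in which the relevant song is ranked higher. The sketch below uses that pairwise form. The function name `aroc` and the example rankings are illustrative assumptions.

```python
def aroc(ranked_labels):
    """Area under the ROC curve for labels listed in rank order (1 = relevant)."""
    pos = sum(ranked_labels)
    neg = len(ranked_labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("AROC needs at least one relevant and one irrelevant song")
    correct = 0   # pairs where the relevant song outranks the irrelevant one
    seen_pos = 0  # relevant songs encountered so far while walking down the ranking
    for label in ranked_labels:
        if label:
            seen_pos += 1
        else:
            correct += seen_pos
    return correct / (pos * neg)

# Five songs ranked for one tag, with the relevant songs at ranks 1 and 3
print(aroc([1, 0, 1, 0, 0]))  # 0.8333..., i.e. 5/6

# Mean AROC over several tags is the plain average of the per-tag values
per_tag = [aroc([1, 0, 1, 0, 0]), aroc([1, 1, 0, 0]), aroc([0, 1, 0, 1])]
mean_aroc = sum(per_tag) / len(per_tag)
```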
Comparing Tags
  Approach         Songs       AROC
  Social Tags      CAL500      0.62
                   Long Tail   0.54
  Game             CAL500      0.65
                   Long Tail   *
  Web Autotags     CAL500      0.66
                   Long Tail   0.56
  Audio Autotags   CAL500      0.69
                   Long Tail   0.70
Combining Tags
  Approaches
  1. Autotagging - single best approach
  2. Best Rank Interleaving
  3. Isotonic Regression - [Zadrozny 02]
  4. RankBoost - [Freund03]
Combining Tags
  Approach                 AROC
  Audio Autotags           0.69
  Best Rank Interleaving   0.74
  Isotonic Regression      0.75
  RankBoost                0.75
Semantic Music Discovery Engine
  [Architecture diagram, full view: Collection, Extraction, Discovery]
  Data Sources: Artists & Record Labels; Human Annotation (Surveys, Annotation Games, Internet Music Sites)
  Collected data: Audio Tracks, Metadata, Tags, Web-documents
  Analytic Systems (Automatic Annotation): Music Processing System (Audio Characteristics), Autotagging System (Autotags), Text-mining System
  Music Information Index
  Discovery Engine: Search Engine, Internet Radio, Social Network
CAL Music Discovery Engine
Research Challenges: What's on tap
  1. Explore music similarity with semantics
  2. Explore discriminative approaches [Eck 07]
  3. Combine heterogeneous data sources
     Game Data, Social Networks, Web Documents, Popularity Info
  4. Focus on person rather than population
     Demographic and Psychographic Groups
     Individuals
     Emotional states of an Individual
References
  Semantic Annotation and Retrieval [IEEE TASLP 08, SIGIR 07, ISMIR 08?]
  Music Annotation Games [ISMIR 07a]
  Related:
    Query-by-Semantic-Similarity [ICASSP 07, MIREX 07]
    Tag Vocabulary Selection with Sparse CCA [ISMIR 07b]
    Supervised Music Boundary Detection [ISMIR 07c]
  Work-in-Progress:
  1. Combining Tags from Multiple Sources
     Rank Aggregation, Kernel Combination [ISMIR 08?]
  2. Music Similarity with Semantics
  3. (More Social) Music Annotation Games
Thanks
  Gert, Charles, Lawrence, Shlomo, Serge, Sanjoy - Advice and perspective
  Gary Cottrell, Virginia de Sa, IGERT - Enabling creative and interdisciplinary pursuits
  Damien O'Malley, Aron Tremble, VLC - Thinking beyond the walls of academia
  Luke Barrington, Antoni Chan, David Torres - Friends and collaborators
"Talking about music is like dancing about architecture - it's a really stupid thing to want to do" - Elvis Costello and others
  Douglas Turnbull
  Computer Audition Laboratory, UC San Diego
  dturnbul@cs.ucsd.edu
  cs.ucsd.edu/~dturnbul
Design and Development of a Semantic Music Discovery Engine
  Douglas Turnbull
  Ph.D. Thesis Defense
  University of California, San Diego
  Committee: Gert Lanckriet, Charles Elkan, Lawrence Saul, Shlomo Dubnov, Serge Belongie, Sanjoy Dasgupta
  May 7, 2008
The Age of Music Proliferation
  Production: 5M artist pages; 150M distinct songs
  Distribution: 1.5M simultaneous P2P users (Feb 01); 27K record labels; 4B songs to 50M customers
  Consumption: 11M Internet radio users; 110M iPods sold
Quantifying Retrieval
  Rank order test set songs
    KL between a query multinomial and semantic multinomials
    1-, 2-, 3-word queries with 5 or more examples
  Metric: Area under the ROC Curve (AROC)
  Example: Rank by "Romantic" (R = relevant, - = not relevant)
    Rank   Label   TP    FP
    1      R       1/2   0
    2      -       1/2   1/3
    3      R       1     1/3
    4      -       1     2/3
    5      -       1     1
  [ROC plot: True Positive Rate versus False Positive Rate, both from 0 to 1; AROC = 5/6]
  Mean AROC is the average AROC over a large number of queries.
Comparing Tags
  Approach                  Songs       Density   AROC
  Ground Truth (CAL500)     All         0.15      1.00
                            Long-Tail   0.15      1.00
  Social Tags (Last.fm)     All         0.23      0.62
                            Long-Tail   0.03      0.54
  Game (Listen Game)        All         0.37      0.65
                            Long-Tail   *         *
  Web Autotags              All         0.67      0.66
                            Long-Tail   0.25      0.56
  Audio Autotags            All         1.00      0.69
                            Long-Tail   1.00      0.70
Music & Technology
  Technology is changing how music is produced, distributed, promoted, and consumed.