
Transcription:

Production Old School: Professional Studio. New School: Personal Studio. 1

Distribution Old School: Large Scale, Physical, Cumbersome. New School: Small Scale, Virtual, Portable. 2

Promotion Old School: Critics, Radio DJs. New School: Social Networks, Personalized Internet Radio. 3

Age of Music Proliferation [diagram: the Semantic Music Discovery Engine connects producers and consumers] Producers: 5M Artists, 150M Songs, 27K Record Labels. Consumers: 140M iPods, 50M Customers, 31% of Americans. 4

Talk Outline Age of Music Proliferation - Sec. 1.1 Music Search & Discovery - Sec. 1.2 Semantic Music Discovery Engine - Sec. 1.3 Collecting Music Information - Ch. 3, 4 Autotagging System - Ch. 2 CAL Music Discovery Engine - Sec. 1.4 Concluding Remarks - Ch. 5 5

Music Search Search - retrieving specific audio content Common Paradigms: 1. Query-by-Metadata 2. Query-by-Performance 3. Query-by-Fingerprint 6

Music Discovery Discovery - finding new music or relationships Common Paradigms: 1. Recommendation-by-Popularity 2. Browse-by-Genre 3. Query-by-Similarity Acoustic Social Semantic 4. Query-by-Description 7

Semantic Music Discovery Engine Index music with tags so that it can be retrieved using a semantic description. Tag - a short text-based token (mellow, classic rock, acoustic slide guitar) with a real-valued weight giving the strength of association. Semantic - use meaningful words to describe music ("mellow classic rock that sounds like the Beatles and features an acoustic slide guitar"), akin to Internet search engines. 8

Semantic Music Discovery Engine [system diagram: Collection - data sources (Artists & Record Labels, Surveys, Annotation Games, Internet Music Sites) supply Audio Tracks, Metadata, Tags, and Web-documents; Extraction - the Music Processing System, Autotagging System, and Text-mining System (analytic systems, automatic annotation), together with human annotation, produce Audio Characteristics, Autotags, and tags; everything feeds the Music Information Index; Discovery - the index powers a Search Engine, Internet Radio, and a Social Network] 9

Semantic Music Discovery Engine [diagram, Collection stage highlighted: Artists & Record Labels supply Audio Tracks and Metadata (human annotation)] 10

Music Last.fm - 150M songs by 16M artists. CAL500 - 500 songs by 500 artists. Long Tail Economics - Chris Anderson (2004) [curve: Popularity vs. Songs; Short Tail - Popular, Long Tail - Obscure]. Cold Start Problem - Songs in the long tail are not annotated and thus cannot be discovered. 11

Metadata Factual information about music: song, album, artist, record label, year, biographical info, charts. Heterogeneous data: strings, numbers, images, graphs. 12

Metadata 13 http://www.allmusic.com/cg/amg.dll?p=amg&sql=11:difrxqr5ldje

Semantic Music Discovery Engine [diagram, Extraction stage highlighted: the Music Processing System (automatic annotation) derives Audio Characteristics from the Audio Tracks] 14

Music Processing Systems Information extracted from audio signal Acoustic - noise, roughness Rhythmic - tempo, patterns Harmonic - key, major/minor Structural - chorus locations 15
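These characteristics can be estimated directly from the waveform. As a rough illustration only (not part of the thesis), a couple of them can be computed with the open-source librosa library; the file name and parameter choices below are placeholders.

```python
import librosa
import numpy as np

# Decode to a mono waveform (the file name is a placeholder)
y, sr = librosa.load("song.mp3", sr=22050)

# Rhythmic: global tempo estimate via onset strength and beat tracking
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

# Harmonic: average chroma vector; its peak is a crude stand-in for the tonic
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
pitch_classes = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
tonic_guess = pitch_classes[int(np.argmax(chroma.mean(axis=1)))]

print("tempo estimate (BPM):", tempo)
print("most prominent pitch class:", tonic_guess)
```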

Semantic Music Discovery Engine [diagram, Collection stage highlighted: Tags gathered from Surveys, Annotation Games, and Internet Music Sites (human annotation)] 16

Surveys Pandora Music Genome Project 400 Objective Genes 50 trained music experts 750,000 songs annotated 17

Surveys CAL500 Survey 174-tag vocab - genre, emotion, ... Paid 55 undergrads to annotate music for 120 hours. 500 songs, each annotated by 3 people. 18

Human Annotations Conducting a survey Pros: Reliable, Precise, Tailored to Application. Cons: Expensive, Laborious, Not Scalable. 19

Annotation Games Human-Computation Web-based, multi-player game with real-time interaction. Players contribute useful annotations through game play. ESP Game for images [Von Ahn], Listen Game for songs. 20

Listen Game 21

Human Annotation Survey Pros: Reliable, Precise, Tailored to Application. Cons: Expensive, Laborious, Not Scalable. Annotation Game Pros: Cheap, Scalable, Precise, Personalized. Cons: Need to create a viral user experience. 22

Music Web Sites 1. Social Tagging Site Users annotate music with tags Last.fm - 960K distinct tags 23 http://www.last.fm/music/redhotchilipeppers/_/giveitaway

Music Web Sites 2. Collecting Web Documents Song & Album Reviews Artist Biographies Music Blogs, Discussion Boards Allmusic, Rolling Stone, Amazon, Mog 24

Web Documents Genres: Funk (3), Funk-metal, Funk-rock, Pop, Rap. Vocals: Nasal, Staccato Enunciation, Distinctive vocals. Instruments: Guitar, Bass, Jew's-harp. Adjectives: Hard-rocking (2), Noisy, Scratchy, Sliding, Positive vibes. 25

Collecting an Annotated Music Corpus Survey Pros: Reliable, Precise, Tailored to Application. Cons: Expensive, Laborious, Not Scalable. Annotation Game Pros: Cheap, Scalable, Precise, Personalized. Cons: Need to create a viral user experience. Music Web Sites Pros: Cheap, Annotations for the short tail. Cons: Noisy, long tail is poorly represented. 26

Semantic Music Discovery Engine [diagram, Extraction stage highlighted: the Autotagging System (automatic annotation) produces Autotags from the Audio Tracks] 27

Autotagging System Our goal is to build a system that can 1. Annotate a song with meaningful tags 2. Retrieve songs given a text-based query [example: Frank Sinatra - Fly Me to the Moon linked in both directions (annotation and retrieval) to Jazz, Male Vocals, Sad, Slow Tempo] Plan: Learn a probabilistic model that captures the relationship between audio content and tags. 28

System Overview [diagram: Data Representation - training data yields a vocabulary and annotation vectors, and audio feature extraction yields feature vectors; Modeling - a parametric model per tag is fit by parameter estimation; Evaluation - inference on a novel song produces an annotation / music review, and a text query drives retrieval] 29

Semantic Representation Choose a vocabulary of musically relevant tags: Instruments, Genre, Emotion, Vocal, Usages. Annotations are converted to a real-valued vector - the semantic association between a tag and a song. Example: Frank Sinatra's Fly Me to the Moon Vocab = {funk, jazz, guitar, sad, female vocals} y = [0/4, 3/4, 4/4, 2/4, 0/4] 30
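For concreteness, a tiny sketch (my own construction, not thesis code) of how a vector like y arises: each entry is the fraction of annotators who applied the tag to the song, so four hypothetical annotators who mostly agree on "jazz" and "guitar" reproduce the y shown on the slide.

```python
vocab = ["funk", "jazz", "guitar", "sad", "female vocals"]

# Hypothetical responses from 4 annotators for "Fly Me to the Moon"
annotators = [
    {"jazz", "guitar", "sad"},
    {"jazz", "guitar"},
    {"jazz", "guitar", "sad"},
    {"guitar"},
]

# Fraction of annotators who applied each tag
y = [sum(tag in a for a in annotators) / len(annotators) for tag in vocab]
print(y)  # [0.0, 0.75, 1.0, 0.5, 0.0]  i.e. [0/4, 3/4, 4/4, 2/4, 0/4]
```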

Acoustic Representation Each song is represented as a bag-of-feature-vectors: Pass a short time window over the audio signal. Extract a feature vector for each short-time audio segment. Ignore temporal relationships of the time series. X = {x_1, x_2, x_3, ..., x_T} 31

Audio Features We calculate MFCC+Delta feature vectors. Mel-frequency Cepstral Coefficients (MFCC): a low-dimensional representation of the short-term spectrum, popular for representing speech, music, and sound effects. Instantaneous derivatives (deltas) encode short-time temporal info. 5,200 39-dimensional vectors per minute. Numerous other audio representations exist: spectral features, modulation spectra, chromagrams, ... 32
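A hedged sketch of this feature extraction step using librosa (the thesis used its own implementation; the window and hop sizes here are assumptions chosen so the frame rate lands near the 5,200 vectors per minute quoted above).

```python
import librosa
import numpy as np

y, sr = librosa.load("song.mp3", sr=22050)  # placeholder file name

# 13 MFCCs from ~23 ms windows with ~11.5 ms hop (half-overlapping)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=512, hop_length=256)
delta = librosa.feature.delta(mfcc)            # first instantaneous derivative
delta2 = librosa.feature.delta(mfcc, order=2)  # second derivative

# Bag of 39-dimensional feature vectors: one row per short-time frame
X = np.vstack([mfcc, delta, delta2]).T         # shape: (num_frames, 39)
print(X.shape, "frames per minute ≈", int(60 * sr / 256))
```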

Statistical Model Supervised Multi-class Labeling (SML) model: one Gaussian Mixture Model (GMM) per tag - p(x|t). Key Idea: each tag's GMM is trained with the songs associated with that tag. Notes: Developed for image annotation [Carneiro & Vasconcelos 05]. Scalable and parallelizable. Modified for real-valued weights rather than binary labels. Extended formulation to handle multi-tag queries. 33

Modeling a Song [diagram: audio signal → bag of MFCC vectors → EM → song GMM] Algorithm: 1. Segment the audio signal 2. Extract short-time feature vectors 3. Estimate a GMM with the EM algorithm 34
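A minimal sketch of step 3 with scikit-learn's EM implementation (my illustration, not the thesis code); the feature matrix is random stand-in data in place of a real song's MFCC+delta vectors, and the number of mixture components is an arbitrary choice.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in for a song's bag of 39-dimensional MFCC+delta vectors
X = np.random.randn(2000, 39)

# Diagonal-covariance GMM p(x|s) fit by EM
song_gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
song_gmm.fit(X)

print("average per-frame log-likelihood:", song_gmm.score(X))
```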

Modeling a Tag Algorithm: 1. Identify songs associated with tag t 2. Estimate a song GMM for each song - p(x|s) 3. Use the Mixture Hierarchies EM algorithm [Vasconcelos 01] to learn a mixture of the mixture components [diagram: the song GMMs for 'romantic' songs are combined into the tag model p(x|t), via Standard EM or Mixture Hierarchies EM] Benefits: Computationally efficient for parameter estimation and inference. Smoothed song representation gives a better density estimate. 35

Annotation Given a novel song X = {x_1, ..., x_T}, calculate the posterior P(t|X) ∝ P(X|t) P(t), assuming: 1. Uniform tag prior 2. Vectors are conditionally independent given a tag 3. Geometric average of likelihoods 4. Tags are mutually exclusive and exhaustive Semantic Multinomial: the P(t|X)'s form a multinomial distribution over the tag vocabulary Annotation: peaks of the multinomial 36
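Under those four assumptions, annotation reduces to a simple per-tag computation; below is a sketch of it (my code, with a hypothetical tag_gmms dictionary mapping each tag to a fitted GaussianMixture p(x|t)). The uniform prior cancels, and the geometric average of frame likelihoods is just the mean per-frame log-likelihood.

```python
import numpy as np

def semantic_multinomial(X, tag_gmms):
    """P(t|X) over the tag vocabulary for a song's bag of feature vectors X."""
    tags = list(tag_gmms)
    # score() is the mean per-frame log-likelihood = log of the geometric average
    log_lik = np.array([tag_gmms[t].score(X) for t in tags])
    # Uniform prior; normalize over the (mutually exclusive, exhaustive) tags
    log_post = log_lik - log_lik.max()
    post = np.exp(log_post)
    return dict(zip(tags, post / post.sum()))

# Annotation: keep the tags at the peaks of the multinomial, e.g.
# p = semantic_multinomial(X, tag_gmms)
# top_tags = sorted(p, key=p.get, reverse=True)[:10]
```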

Annotation Semantic multinomial P(t|X) for "Give It Away" by the Red Hot Chili Peppers [bar chart over the tag vocabulary] 37

Annotation: Automatic Music Reviews Dr. Dre (feat. Snoop Dogg) - Nuthin' but a 'G' thang This is a dance poppy, hip-hop song that is arousing and exciting. It features drum machine, backing vocals, male vocal, a nice acoustic guitar solo, and rapping, strong vocals. It is a song that is very danceable and with a heavy beat that you might like listen to while at a party. Frank Sinatra - Fly me to the moon This is a jazzy, singer / songwriter song that is calming and sad. It features acoustic guitar, piano, saxophone, a nice male vocal solo, and emotional, high-pitched vocals. It is a song with a light beat and a slow tempo that you might like listen to while hanging with friends. 38

Retrieval 1. Annotate each song in the corpus with a semantic multinomial p = (P(t_1|X), ..., P(t_V|X)) 2. Given a text-based query, construct a query multinomial q: q_i = 1/(number of query tags) if tag t_i appears in the query string, q_i = 0 otherwise 3. Rank all songs by the Kullback-Leibler (KL) divergence between q and p 39
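A hedged sketch of this retrieval procedure (my own code; the variable names, the smoothing epsilon, and the song_multinomials structure mapping each song to its semantic multinomial are assumptions).

```python
import numpy as np

def query_multinomial(query_tags, vocab):
    q = np.array([1.0 if t in query_tags else 0.0 for t in vocab])
    return q / q.sum()          # 1/(number of query tags) on each query tag

def kl_divergence(q, p, eps=1e-12):
    p = np.clip(p, eps, None)   # avoid log(0); eps is an assumption
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def retrieve(query_tags, vocab, song_multinomials, k=10):
    q = query_multinomial(query_tags, vocab)
    ranked = sorted(song_multinomials.items(),
                    key=lambda sp: kl_divergence(q, np.asarray(sp[1])))
    return ranked[:k]           # smallest KL divergence = best match
```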

Retrieval Query: "a tender pop song with female vocals" [query multinomial: 0.33 each on tender, pop, female vocals] Top results: 1. Shakira - The One (0.024) 2. Alicia Keys - Fallin' (0.024) 3. Evanescence - My Immortal (0.024) 40

Retrieval Query → Retrieved Songs: "Tender" → Crosby, Stills and Nash - Guinevere; Jewel - Enter from the East; Art Tatum - Willow Weep for Me. "Female Vocals" → Alicia Keys - Fallin; Shakira - The One; Junior Murvin - Police and Thieves. "Tender AND Female Vocals" → Jewel - Enter from the East; Evanescence - My Immortal; Cowboy Junkies - Postcard Blues. 41

Semantic Music Discovery Engine [diagram, Extraction stage highlighted: the Text-mining System (automatic annotation) derives tags from Web-documents] 42

Text-mining System Relevance Scoring [Knees 08]: site-specific queries (Amazon, AMG, Billboards, etc.), weight-based approach. Step 1: Collect Corpus. For each song, use a search engine to retrieve web pages: "site:<website> <artist> music", "site:<website> <artist> <album> music review", "site:<website> <artist> <song> music review". Maintain I_{s,d} = mapping of songs to documents. 43

Text-mining System Step 2: Autotag songs. For each tag t: 1. Query the corpus with tag t to find relevant documents; w_{t,d} = relevance score for document d. 2. For each song s, sum the relevance scores of the documents related to song s: w_{s,t} = Σ_d I_{s,d} w_{t,d} 44
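A small sketch of this aggregation step (notation follows the slide, but the data structures and example values are hypothetical): I maps each song s to the set of documents retrieved for it, and w_td[t][d] is the relevance score of document d for tag t.

```python
from collections import defaultdict

def autotag_scores(I, w_td):
    """Return w[s][t] = sum of w_td[t][d] over documents d associated with song s."""
    w = defaultdict(dict)
    for s, docs in I.items():
        for t, doc_scores in w_td.items():
            w[s][t] = sum(doc_scores.get(d, 0.0) for d in docs)
    return w

# Example: songs mapped to retrieved pages, and tag relevance per page
I = {"song_a": {"doc1", "doc2"}, "song_b": {"doc2", "doc3"}}
w_td = {"romantic": {"doc1": 0.8, "doc3": 0.4}, "funky": {"doc2": 1.2}}
print(autotag_scores(I, w_td))  # song_a: romantic 0.8, funky 1.2; song_b: romantic 0.4, funky 1.2
```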

Semantic Music Discovery Engine [diagram: all human and automatic annotation sources now feed the Music Information Index] 45

Comparing Tags Groundtruth CAL500 - binary labeling of song-tag pairs Long Tail - subset of 87 obscure songs Approaches 1. Social Tags - Last.fm 2. Annotation Game - Listen Game 3. Web Autotags - Site-specific relevance scoring 4. Audio Autotags - SML model w/ MFCCs 46

Comparing Tags For each approach, for each tag: 1. Rank songs 2. Calculate the Area under the ROC curve (AROC): 0.5 = random ranking (bad), 1.0 = perfect ranking (good). Then calculate the mean AROC. 47

Comparing Tags (mean AROC)
Approach | Songs | AROC
Social Tags | CAL500 | 0.62
Social Tags | Long Tail | 0.54
Game | CAL500 | 0.65
Game | Long Tail | *
Web Autotags | CAL500 | 0.66
Web Autotags | Long Tail | 0.56
Audio Autotags | CAL500 | 0.69
Audio Autotags | Long Tail | 0.70
48

Combining Tags Approaches: 1. Autotagging - the single best individual approach (baseline) 2. Best Rank Interleaving 3. Isotonic Regression [Zadrozny 02] 4. RankBoost [Freund 03] 49
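As a rough illustration only (the exact combination schemes are detailed in the thesis), here is one plausible reading of "Best Rank Interleaving": score each song by the best rank it attains in any single-source ranking, then re-sort. The song names are made up.

```python
def best_rank_interleave(rankings):
    """rankings: list of lists of songs, each ordered best-first."""
    best = {}
    for ranking in rankings:
        for rank, song in enumerate(ranking):
            best[song] = min(best.get(song, rank), rank)
    return sorted(best, key=best.get)

audio = ["song_a", "song_b", "song_c"]
web = ["song_c", "song_a", "song_d"]
print(best_rank_interleave([audio, web]))  # ['song_a', 'song_c', 'song_b', 'song_d']
```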

Combining Tags (mean AROC)
Approach | AROC
Audio Autotags | 0.69
Best Rank Interleaving | 0.74
Isotonic Regression | 0.75
RankBoost | 0.75
50

Semantic Music Discovery Engine [full system diagram: the Music Information Index powers the Discovery layer - Search Engine, Internet Radio, Social Network] 51

CAL Music Discovery Engine 52

CAL Music Discovery Engine 53

Research Challenges What's on tap: 1. Explore music similarity with semantics 2. Explore discriminative approaches [Eck 07] 3. Combine heterogeneous data sources: game data, social networks, web documents, popularity info 4. Focus on the person rather than the population: demographic and psychographic groups, individuals, emotional states of an individual 54

References Semantic Annotation and Retrieval [IEEE TASLP 08, SIGIR 07, ISMIR 08?] Music Annotation Games [ISMIR 07a] Related: Query-by-Semantic-Similarity [ICASSP 07, MIREX 07] Tag Vocabulary Selection with Sparse CCA [ISMIR 07b] Supervised Music Boundary Detection [ISMIR 07c] Work-in-Progress: 1. Combining Tags from Multiple Sources - Rank Aggregation, Kernel Combination [ISMIR 08?] 2. Music Similarity with Semantics 3. (More Social) Music Annotation Games 55

Thanks Gert, Charles, Lawrence, Shlomo, Serge, Sanjoy - advice and perspective. Gary Cottrell, Virginia de Sa, IGERT - enabling creative and interdisciplinary pursuits. Damien O'Malley, Aron Tremble, VLC - thinking beyond the walls of academia. Luke Barrington, Antoni Chan, David Torres - friends and collaborators. 56

"Talking about music is like dancing about architecture - it's a really stupid thing to want to do." - Elvis Costello and others Douglas Turnbull Computer Audition Laboratory UC San Diego dturnbul@cs.ucsd.edu cs.ucsd.edu/~dturnbul 57

Design and Development of a Semantic Music Discovery Engine Douglas Turnbull Ph.D. Thesis Defense University of California, San Diego Committee: Gert Lanckriet, Charles Elkan, Lawrence Saul, Shlomo Dubnov, Serge Belongie, Sanjoy Dasgupta May 7, 2008 58

The Age of Music Proliferation Production: 5M artist pages, 150M distinct songs. Distribution: 1.5M simultaneous P2P users (Feb 01), 27K record labels, 4B songs to 50M customers. Consumption: 11M Internet radio users, 110M iPods sold. 59

Quantifying Retrieval Rank order the test set songs by the KL divergence between a query multinomial and the semantic multinomials. 1-, 2-, and 3-word queries with 5 or more examples. Metric: Area under the ROC Curve (AROC).
Worked example - ranking by "Romantic":
Rank | Label | TP rate | FP rate
1 | R | 1/2 | 0
2 | - | 1/2 | 1/3
3 | R | 1 | 1/3
4 | - | 1 | 2/3
5 | - | 1 | 1
AROC = 5/6 (area under the True Positive Rate vs. False Positive Rate curve). Mean AROC is the average AROC over a large number of queries. 60
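As a quick sanity check (my own snippet, not from the slides), scikit-learn's roc_auc_score reproduces the 5/6 from the worked example when the relevant songs sit at ranks 1 and 3 of 5.

```python
from sklearn.metrics import roc_auc_score

# Five test songs ranked 1..5 for the tag "Romantic"; ranks 1 and 3 are relevant
labels = [1, 0, 1, 0, 0]              # ground-truth relevance in rank order
scores = [5, 4, 3, 2, 1]              # higher score = ranked earlier

print(roc_auc_score(labels, scores))  # 0.8333... = 5/6
```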

Comparing Tags (tag density and mean AROC)
Approach | Songs | Density | AROC
Ground Truth (CAL500) | All | 0.15 | 1.00
Ground Truth (CAL500) | Long-Tail | 0.15 | 1.00
Social Tags (Last.fm) | All | 0.23 | 0.62
Social Tags (Last.fm) | Long-Tail | 0.03 | 0.54
Game (Listen Game) | All | 0.37 | 0.65
Game (Listen Game) | Long-Tail | * | *
Web Autotags | All | 0.67 | 0.66
Web Autotags | Long-Tail | 0.25 | 0.56
Audio Autotags | All | 1.00 | 0.69
Audio Autotags | Long-Tail | 1.00 | 0.70
61

Music & Technology Technology is changing how music is produced, distributed, promoted and consumed. 62