Content-based music retrieval

Music information retrieval (MIR) is currently an active research area; see the proceedings of the ISMIR conference and the annual MIREX evaluations. The background for this growth: we have huge amounts of music available and easily accessible over the Internet.

Outline:
- Music similarity
- Music classification
- Transcription / separation oriented approaches
- Lyrics and paralinguistic information
- Efficient indexing techniques

[Screenshots: iTunes, Spotify]

[Screenshots: YouTube, Last.fm]

Plenty of music: consequences

Traditional ways of finding music are no longer sufficient:
- we cannot browse through all the music we would potentially like
- record companies and radio stations are no longer critical gatekeepers in music distribution

Relying just on popularity statistics is not effective either:
- music tastes are so different that averaging opinions does not produce precise information for an individual
- the UK Singles Chart and similar sales statistics work badly as a guide for the consumer

Modern ways of searching music: two complementary approaches.

1. Collaborative filtering, based on a (users x items) matrix of music likings and usage metadata: entry (i, j) of the matrix is the rating (or usage) of piece j by user i, row i is the profile of user i, and column j is the profile of piece j. Recommend music by comparing user profiles and predicting likings for new pieces, or measure the similarity of music pieces (acoustics, usage, etc.) based on their piece profiles. A sketch follows below.

2. Content-based retrieval, the topic of this talk: based either on automatic signal analysis or on collaborative tagging by users.

Old ways of discovering music are still relevant too (though ineffective): talking to friends, relying on experts (e.g. listening to an FM radio station you like).
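As a point of comparison for the content-based methods that follow, here is a minimal sketch of user-based collaborative filtering on the (users x items) matrix just described. The matrix layout and function names are illustrative assumptions, not from the slides.

```python
# Minimal sketch of user-based collaborative filtering (assumed layout:
# R[i, j] = rating of piece j by user i, 0 meaning "not rated").
import numpy as np

def predict_rating(R, user, item):
    """Predict R[user, item] as a similarity-weighted average over
    the other users who have rated the item."""
    raters = np.where(R[:, item] > 0)[0]
    raters = raters[raters != user]
    if raters.size == 0:
        return 0.0
    # Cosine similarity between the query user's profile and each rater's.
    u = R[user]
    sims = np.array([
        (u @ R[v]) / (np.linalg.norm(u) * np.linalg.norm(R[v]) + 1e-9)
        for v in raters
    ])
    return float(sims @ R[raters, item] / (np.abs(sims).sum() + 1e-9))
```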

Audio-based MIR is needed

Collaborative filtering (CF) does not solve it all:
- CF does not allow separating the various dimensions of music similarity; these are all mixed together in the piece profile
- CF alone cannot deal with items that are new or do not have many listeners

Audio-based MIR addresses the above problems:
- it enables truly musical queries with specific musical criteria, such as requesting pieces with certain vocal characteristics or a slow tempo
- it can be employed even on media libraries that do not have any audience of listeners
- it also enables musically interesting listening UIs that encourage music understanding

On the other hand, audio-based MIR alone cannot measure aspects like quality, usage, or culture. The two approaches are complementary.

Manual tagging as an alternative?

Music annotation by human experts is costly and limits the coverage; it is also not easy to integrate into the music production process, since music making is anarchistic. Pandora.com is an audio-based MIR service (US only) based on expert tagging [www.pandora.com]. Collaborative tagging by music service users (for example Last.fm) is effective for items that are sufficiently popular. Tagging games can achieve better coverage, but (currently) have fewer users [www.majorminer.org].

MIR user interfaces

Non-speech audio is difficult to describe with words, so expressing a music query is hard. Music similarity estimation enables query by example and browsing by artist similarity. Query mechanisms:
- query by example
- browse by similarity (e.g. the Music Rainbow interface [Pampalk-Goto-2006])
- query by humming or tapping
- tempo
- lyrics
- music categories (genre, mood, tags)

The UI should be tailored to match user abilities. Presenting (multiple) retrieval results is also challenging: thumbnail extraction (chorus detection etc.); spatialisation for simultaneous presentation has been tried as well.

Music similarity: widely used acoustic features

- Mel-frequency cepstral coefficients (MFCCs): spectral features capturing timbre/instrumentation
- chroma [Bartsch-2001]: collapse the spectral content into one octave and use 12 bins for the total spectral energy on each pitch class (C, C#, D, ..., B); captures harmonic content
- rhythmogram (or fluctuation patterns): cosine transform in blocks that extend in the time direction; captures rhythm

For a more comprehensive list, see e.g. [Peeters-2004]. A feature-extraction sketch follows below.
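A minimal sketch of extracting the three feature types listed above, assuming the librosa library (the slides do not name a toolkit; the tempogram stands in for the rhythmogram / fluctuation patterns):

```python
# Minimal feature-extraction sketch; librosa is an assumption.
import librosa
import numpy as np

def extract_features(path):
    y, sr = librosa.load(path, sr=22050, mono=True)
    # Timbre/instrumentation: 13 MFCCs per frame.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    # Harmonic content: 12-bin chroma (energy per pitch class C..B).
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    # Rhythm: tempogram, a rhythmogram-like periodicity representation.
    rhythm = librosa.feature.tempogram(y=y, sr=sr)
    return mfcc, chroma, rhythm
```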

Features define similarity

Similarity as such is not well-defined. For example: is "Bohemian Rhapsody" by Queen more similar to (a) "Bohemian Rhapsody" by the London Symphony Orchestra, or (b) "Killer Queen" by Queen? The riddle is solved by the choice of acoustic features: with chroma, (a) is more similar (same composition); with MFCCs, (b) is more similar (same instrumentation). The user may therefore wish to specify the features when doing query by example. Narrowing down on a perfect piece by using multiple examples becomes possible when huge amounts of music are available.

Similarity measures between audio clips

The similarity between two audio signals is typically calculated from the statistics of the extracted features. The bag-of-features approach collapses all temporal structure in the data. Traditional distance measures utilize the means and covariances of the features, for example the Mahalanobis distance between clips f and g:

$D_{\mathrm{Mah}}(f, g) = (\mu_f - \mu_g)^T \Sigma^{-1} (\mu_f - \mu_g)$

where $\mu_g$ is the mean of the features in clip g and $\Sigma$ is the covariance of all features.
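A minimal sketch of this clip-level Mahalanobis distance; the pooled-covariance input and function names are assumptions:

```python
# Minimal sketch of the Mahalanobis clip distance defined above.
# Each clip is summarised by the mean of its frame-level feature vectors;
# pooled_cov is the covariance of all frames pooled over the collection.
import numpy as np

def mahalanobis_distance(feats_f, feats_g, pooled_cov):
    """feats_*: (n_frames, n_dims) feature matrices of the two clips."""
    diff = feats_f.mean(axis=0) - feats_g.mean(axis=0)
    cov_inv = np.linalg.inv(pooled_cov)
    return float(diff @ cov_inv @ diff)
```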

Similarity measures between audio clips (continued)

The cross-likelihood ratio test is a somewhat more sophisticated distance measure:

$d(A, B) = \log \frac{p(A \mid \Theta_A)\, p(B \mid \Theta_B)}{p(A \mid \Theta_B)\, p(B \mid \Theta_A)}$

where $p(A \mid \Theta_B) = p(a_1, a_2, \ldots, a_{T_A} \mid \Theta_B)$ is the probability of the feature vectors extracted from clip A given the model $\Theta_B$ trained for clip B (a GMM, an HMM, or some other model). This is accurate and well motivated, but it requires going through the feature vectors and is therefore computationally inefficient. A sketch follows at the end of this group of slides.

Music similarity: some evaluation results

Results from the MIREX 2007 evaluation [Downie-ismir-2007], organised by the IMIRSEL group at the University of Illinois. Task: repeat query-by-example 100 times, returning the 5 closest pieces from among 7000 songs; human listeners rated the relevance of the returned pieces.

Music similarity: techniques used (reference material, do not memorize)

- Pohle-Schnitzer: a modification of [Pampalk-ismir-2006]. Features: MFCCs, fluctuation patterns (0-10 Hz at several frequency bands), gravity (slow or fast), and bass extracted from the fluctuation patterns. The features are averaged and normalised over the piece, yielding one feature vector per song; the cosine distance between the feature vectors of two pieces serves as the distance measure.
- Tzanetakis: several spectral features (incl. MFCCs); two-step mean and standard deviation calculation of the framewise features; normalization; Euclidean distance.
- Barrington-Turnbull et al.: map audio tracks into a semantic feature space, one feature vector per song. The resulting feature vector is a 146-dimensional vector of the posterior probabilities of certain concepts occurring given the audio features; the concepts include words characterizing genre, instrumentation, vocals, emotion, rhythm, usage, etc. Similarity is measured with the KL divergence between two feature vectors.

pdf-based similarity measures

Idea: measure similarity as a distance between the probability density functions (pdfs) of the features, for example the Euclidean distance or the Kullback-Leibler (KL) divergence. Each song is represented by its pdf of features instead of just one feature vector. This is more flexible and accurate than using only the means and covariances of the features, and there is no need to go through the feature vectors once the models have been trained; the computational complexity is, however, higher than with one feature vector per song.
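A minimal sketch of the cross-likelihood ratio above, representing each clip by a GMM over its frame-level features; scikit-learn and the component count are assumptions:

```python
# Minimal sketch of the cross-likelihood ratio distance described above.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_clip_model(feats, n_components=8):
    """feats: (n_frames, n_dims) frame-level features of one clip."""
    return GaussianMixture(n_components=n_components).fit(feats)

def cross_likelihood_distance(feats_a, feats_b):
    gm_a, gm_b = fit_clip_model(feats_a), fit_clip_model(feats_b)
    # score() is the mean log-likelihood per frame; multiply by frame count
    # to get the total log-likelihood of the clip under a model.
    ll = lambda gm, X: gm.score(X) * len(X)
    # Larger value = each clip fits its own model much better = more dissimilar.
    return (ll(gm_a, feats_a) + ll(gm_b, feats_b)
            - ll(gm_b, feats_a) - ll(gm_a, feats_b))
```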

Using temporal sequences for similarity

The above methods collapse the time structure of the feature sequences. Using temporal sequences for similarity requires time alignment, which is non-trivial: tempo differences, different numbers of sectional parts, etc. Beat-synchronous feature extraction compensates for the tempo differences: track the beat of each song and extract one feature vector per inter-beat interval. This has been used previously in cover song detection with beat-synchronised chroma features (e.g. [Ellis-2006]) and in the analysis of the sectional form (verse, chorus, ...) of a piece (e.g. [Paulus-dafx-2008]).

Cover song identification: example methods (reference material, do not memorize)

- Serrà and Gómez: extract a sequence of tonal descriptors (harmonic pitch class profiles), compute a similarity matrix between the two pieces, and use dynamic programming to align the pieces in time and to obtain the similarity.
- Ellis and Cotton: beat-synchronized chroma features; cross-correlation of the feature sequences of the two pieces (a sketch in this spirit follows below). Evaluated in MIREX 2007 [Downie-ismir-2007].

Music classification

Music can be classified according to genre, mood, etc. This is the classical train/test supervised classification scenario: at training time, audio -> feature extraction -> model training -> models; at test time, feature extraction -> classify against the models -> recognition result.

Classification and identification tasks

Results from MIREX 2007 [Downie-ismir-2007], organised by the IMIRSEL group at the University of Illinois, cover classifying music into categories:
- genre: rock, hip hop, jazz, classical, ... (here 10 genres)
- mood: aggressive, passionate, humorous, cheerful, ... (5 moods)
- artist identification (here 102 artists)
- classical composer identification (here 11 composers)
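A minimal sketch in the spirit of the Ellis-Cotton method above: beat-synchronous chroma compared by normalised cross-correlation over beat lags. librosa is an assumption, and a full implementation would also search the 12 chroma rotations to handle key transposition:

```python
# Minimal cover-song sketch: beat-synchronous chroma + cross-correlation.
import librosa
import numpy as np

def beat_chroma(path):
    y, sr = librosa.load(path, sr=22050, mono=True)
    _, beats = librosa.beat.beat_track(y=y, sr=sr)
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
    # Average the chroma frames within each inter-beat interval.
    return librosa.util.sync(chroma, beats, aggregate=np.mean)

def cover_similarity(path_a, path_b):
    a, b = beat_chroma(path_a), beat_chroma(path_b)
    n = min(a.shape[1], b.shape[1])
    best = 0.0
    # Normalised cross-correlation over beat lags.
    for lag in range(-n // 2, n // 2):
        a_seg = a[:, max(0, lag):n + min(0, lag)]
        b_seg = b[:, max(0, -lag):n + min(0, -lag)]
        m = min(a_seg.shape[1], b_seg.shape[1])
        if m == 0:
            continue
        num = np.sum(a_seg[:, :m] * b_seg[:, :m])
        den = (np.linalg.norm(a_seg[:, :m])
               * np.linalg.norm(b_seg[:, :m]) + 1e-9)
        best = max(best, num / den)
    return best
```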

Artist vs. singer: task definition

Do we want to recognize the singer (the person) or the artist name? For example:
- singer Tarja Turunen, artist Nightwish
- singer Anette Olzon, artist Nightwish
- singer Tarja Turunen, artist Beto Vázquez Infinity

Classification and identification tasks (reference material, do not memorize)

Technically, the leading classification / identification systems are surprisingly similar. Comparing IMIRSEL, Mandel-Ellis, Tzanetakis, and Guaus-Herrera:
- all use a single feature vector per audio clip
- all obtain that vector by calculating statistics of framewise features over the clip (mean, std, covariances, ...); the frame-level features and the statistical measures do vary from system to system
- all found the support vector machine (SVM) classifier to perform best

Participants did not vary their systems much between the different tasks. The convergence of the techniques does not mean we are done; there is a "glass ceiling", partly due to the fuzziness of the ground truth (genre, mood). A classifier sketch in this common style follows below.

Transcription / separation approaches

Approaches in which some musically meaningful part of the signal, such as the melody line, is extracted and analyzed.
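A minimal sketch of the shared recipe above (one clip-level statistics vector plus an SVM), assuming scikit-learn; the actual MIREX systems differ in their frame-level features and statistics:

```python
# Minimal sketch: clip statistics vector + SVM classifier.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def clip_vector(framewise_feats):
    """framewise_feats: (n_frames, n_dims) -> per-dimension mean and std."""
    return np.concatenate([framewise_feats.mean(axis=0),
                           framewise_feats.std(axis=0)])

def train_genre_classifier(X, y):
    """X: list of framewise feature matrices, y: one label per clip."""
    clip_vecs = np.stack([clip_vector(f) for f in X])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return clf.fit(clip_vecs, y)
```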

Music transcription

Transcription targets:
- melody [Goto; Paiva; Ellis-Poliner; Dressler; Ryynänen]
- bass line [Goto; Hainsworth; Ryynänen]
- drums [Paulus; Yoshii; FitzGerald]
- chords [Sheh-Ellis; Bello-Pickens; Harte; Lee]
- key/mode (e.g. [Gomez-phd])
- tempo, meter
- instrument recognition

Separation targets include vocals and drums. See also the polyphonic transcription task of MIREX 2007 [Downie-ismir-2007].

Query by humming

Consists of two main steps:
1. transcribing a hummed or sung query into a suitable higher-level representation
2. matching that representation against a large database of known reference items

Some QBH services are already available, for example SoundHound and Musipedia.

Query by humming of audio: an example method [Ryynänen-icassp-2008]:
- preprocessing: extract the melodies automatically from the music pieces
- transcribe the query
- match by the Euclidean distance between the two melodic contours (allowing time scaling)
- efficient indexing using locality-sensitive hashing

(The slide included demos: three example queries A-C, each with its top-3 retrieval results.)

Query by chord sequence

Query by example, determining similarity from transcribed chords, on a database of 1294 music pieces:
- preprocessing: transcribe the chords (24 triads) from all pieces [Ryynänen-2008], resample to get beat-synchronous chord sequences, and normalise the key
- query: let the user select a 10-second segment from an arbitrary song
- retrieve the segment starting at beat $t_0$ of song $i$ whose chord sequence is most similar to the query:

$(\hat{i}, \hat{t}_0) = \operatorname*{arg\,min}_{i,\, t_0} \sum_{t=1}^{T} d\left(q_t, s^{i}_{t_0 + t}\right)^2$

where $q_t$ is the chord of the query signal at beat $t$, $s^{i}_{t}$ is the chord of target signal $i$ at beat $t$, and the distance $d(x, y)$ between chords $x$ and $y$ is the Euclidean distance in the chord space of [Krumhansl-90] after key normalisation. A matching sketch follows below.
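A minimal sketch of the beat-synchronous chord-sequence search above. The chords are assumed to be already mapped to points in a key-normalised chord space (e.g. Krumhansl's); the embedding and the chord transcriber are assumptions, not given in the slides:

```python
# Minimal sketch of the arg-min chord-sequence search defined above.
import numpy as np

def best_match(query, targets):
    """query: (T, d) chord-space vectors of the query segment.
    targets: list of (L_i, d) chord-space sequences, key-normalised.
    Returns (song_index, start_beat) minimising the summed squared
    chord-space distance over the T query beats."""
    T = len(query)
    best = (None, None, np.inf)
    for i, song in enumerate(targets):
        for t0 in range(len(song) - T + 1):
            cost = np.sum((song[t0:t0 + T] - query) ** 2)
            if cost < best[2]:
                best = (i, t0, cost)
    return best[:2]
```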

Rhythmic similarity

Approaches:
- rhythmogram features + a distance measure that collapses the time structure [Dixon-03], [Paulus-dafx-2008]
- framewise features + a distance that aligns the sequences with dynamic time warping [Paulus-02]
- transcribed drums + a similarity measure (e.g. Eigenrhythms [Ellis-04])
- beat-synchronised features + a distance measure comparing the feature sequences
- classification into rhythmic categories [Kapur-2004]

Musipedia also allows query by tapping (using the keyboard): www.musipedia.org (MIDI).

Lyrics: what is this song about?

Goal: recognize the words from a song (slide example: the lyric "Miranda in the morning takes her eggs sunny side up").

Indexing techniques

Locality-sensitive hashing (LSH) is a computationally efficient indexing technique for searching nearest neighbours (1) in large databases and (2) in high-dimensional feature spaces [Datar-2004]. The idea: project the data points onto random lines and subdivide the lines into hash buckets, so that nearby points tend to share buckets. A sketch follows below.

Indexing-based audio analysis?

Increased memory capacity and powerful indexing techniques allow storing examples as an alternative to training a (statistical) model. Imagine a situation where we had indexed, say, 10^10 audio signals on a server and, given a query example, could retrieve the perceptually most similar clips in an instant. Provided that some contextual information about the stored signals were available too (which is realistic if the data is collected with mobile devices), this would result in a huge machine hearing system.
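A minimal sketch of the random-projection LSH scheme described above, in the style of [Datar-2004]; the parameter choices (number of projections, bucket width) are illustrative assumptions:

```python
# Minimal LSH sketch: project onto random lines, quantise into buckets.
from collections import defaultdict
import numpy as np

class LSHIndex:
    def __init__(self, dim, n_projections=16, bucket_width=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.lines = rng.normal(size=(n_projections, dim))
        self.offsets = rng.uniform(0, bucket_width, size=n_projections)
        self.w = bucket_width
        self.buckets = defaultdict(list)

    def _key(self, x):
        # Quantised positions of x along each random line form the hash key.
        return tuple(np.floor((self.lines @ x + self.offsets) / self.w)
                     .astype(int))

    def add(self, item_id, x):
        self.buckets[self._key(x)].append(item_id)

    def query(self, x):
        # Candidate near neighbours: items that landed in the same bucket.
        return self.buckets.get(self._key(x), [])
```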

Toolboxes

See the "Tools we use" page edited by Paul Lamere: http://www.music-ir.org/evaluation/tools.html

Conclusions

Music in large quantities functions as a resource that is potentially useful for many purposes: to pass time, to help concentrate at work, to improve physical exercise, to create a suitable atmosphere in a social situation, etc. With the help of proper MIR tools, the large supply of music meets the even greater demand.