Experimenting with Musically Motivated Convolutional Neural Networks


Experimenting with Musically Motivated Convolutional Neural Networks
Jordi Pons 1, Thomas Lidy 2 and Xavier Serra 1
1 Music Technology Group, Universitat Pompeu Fabra, Barcelona
2 Institute of Software Technology and Interactive Systems, TU Wien
January 23, 2017

Outline
Motivation

Input representation: why log-mel spectrograms for CNNs? Input spectrograms let the CNN learn interpretable filters in time and frequency. Patches span N = 80 frames (1.88 seconds).
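For illustration, here is a minimal sketch of how such log-mel patches could be computed. librosa and the 40-band mel resolution are assumptions on our part (the slide only states N = 80 frames, 1.88 seconds; M = 40 bands is inferred from the (40,1) frequency filters on later slides), not the authors' exact pipeline.

```python
# Minimal sketch: log-mel spectrogram patches as CNN input.
# Assumed: librosa; M = 40 mel bands; N = 80 frames (~1.88 s).
import librosa
import numpy as np

def log_mel_patches(path, n_mels=40, patch_frames=80):
    y, sr = librosa.load(path, sr=None)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)  # log compression
    # Chop the spectrogram into fixed-size (n_mels x patch_frames) patches.
    n_patches = log_mel.shape[1] // patch_frames
    return np.stack([log_mel[:, i * patch_frames:(i + 1) * patch_frames]
                     for i in range(n_patches)])
```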

Squared/rectangular filters: inertia from computer vision. Are these efficiently representing the relevant local stationarities in music data?

CNNs modeling music data? Three filter shapes are discussed in the following:
1. Squared/rectangular filters
2. Temporal filters
3. Frequency filters
Which musical concepts can these filters model?

Temporal filters (1-by-n): setting m = 1 yields no frequency features, only temporal cues. Filters can learn musical concepts at different time scales depending on how n is set, e.g.:
- Onsets, attack-sustain-release: n ≪ N.
- BPM and rhythm patterns: n ≈ N.

Frequency filters (m-by-1): setting n = 1 yields frequency features but no temporal cues. Filters can learn different aspects depending on how m is set, e.g.:
- Timbre + note: m = M. Similar to NMF!
- Timbre: m < M.

Squared/rectangular filters (m-by-n): learning time and frequency features at the same time. Filters can learn different aspects depending on how m and n are set, e.g.:
- Bass or kick modeling: m ≪ M and n ≪ N. Represented by a sub-band for a short time.
- Cymbal or snare-drum modeling: m = M and n ≪ N. Broad in frequency with a fixed decay time.
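To make the m-by-n filter notation concrete, here is a sketch of the three shapes as 2D convolutions. PyTorch is an illustrative choice on our part (the deck does not name a framework), and the 1 x 40 x 80 input size is assumed from the slides.

```python
import torch
import torch.nn as nn

M, N = 40, 80                # mel bands x time frames (assumed)
x = torch.randn(1, 1, M, N)  # one log-mel patch

# Temporal filter (1-by-n): m = 1, no frequency context, only temporal cues.
temporal = nn.Conv2d(1, 32, kernel_size=(1, 60))
# Frequency filter (m-by-1): n = 1, no temporal context, only frequency cues.
frequency = nn.Conv2d(1, 32, kernel_size=(32, 1))
# Squared/rectangular filter (m-by-n): joint time-frequency features.
rectangular = nn.Conv2d(1, 32, kernel_size=(12, 8))

print(temporal(x).shape)     # torch.Size([1, 32, 40, 21])
print(frequency(x).shape)    # torch.Size([1, 32, 9, 80])
print(rectangular(x).shape)  # torch.Size([1, 32, 29, 73])
```

Note that a frequency filter with m = 32 < M can still convolve vertically across bands, which is what the pitch-invariance discussion below refers to.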

Architectures
[architecture diagrams; the joint Time-Frequency architecture combines the temporal and frequency branches]
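As a rough sketch of what the joint Time-Frequency architecture could look like, the model below runs a temporal branch and a frequency branch in parallel and concatenates their pooled features before the classifier. Layer widths and pooling are our assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TimeFrequencyCNN(nn.Module):
    """Sketch of the joint architecture: a temporal branch (1x60 filters)
    and a frequency branch (32x1 filters) feeding a shared classifier.
    Layer sizes are illustrative, not the paper's exact configuration."""

    def __init__(self, n_classes=8):
        super().__init__()
        self.time_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 60)), nn.ReLU(),
            nn.AdaptiveMaxPool2d(1))  # -> (16, 1, 1)
        self.freq_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(32, 1)), nn.ReLU(),
            nn.AdaptiveMaxPool2d(1))  # -> (16, 1, 1)
        self.classifier = nn.Linear(32, n_classes)  # 8 Ballroom classes

    def forward(self, x):
        t = self.time_branch(x).flatten(1)  # (batch, 16)
        f = self.freq_branch(x).flatten(1)  # (batch, 16)
        return self.classifier(torch.cat([t, f], dim=1))

logits = TimeFrequencyCNN()(torch.randn(4, 1, 40, 80))  # -> (4, 8)
```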

Black-box and Time architectures

Architecture         Filter shape (m,n)   # params    Accuracy (mean ± std, 10-fold CV)
Black-box            (12,8)               3,275,312   87.25 ± 3.39 %
Time                 (1,60)               7,336       81.79 ± 4.72 %
Frequency            (32,1)               3,368       59.59 ± 5.82 %
Frequency            (36,1)               2,472       57.88 ± 5.38 %
Frequency            (40,1)               1,576       52.43 ± 5.63 %
Time-Frequency       (1,60)-(32,1)        196,816     86.54 ± 4.29 %
Time-FrequencyInit   (1,60)-(32,1)        196,816     87.68 ± 4.44 %

1. Genre classification task with the Ballroom dataset. Reference results: 93.12% (Marchand et al.) using time and frequency cues; 82.3% (Gouyon et al.) using only time cues; 15.9% when predicting the most probable class.
2. The Black-box and Time-Frequency architectures achieve inferior results to the state of the art.
3. The Time architecture achieves results equivalent to its baseline.
4. The Frequency architectures outperform the random baseline: frequency features are more relevant than expected.

Pitch invariance experiment (see the results table above)

1. Designing the filters such that they can convolve in frequency (m < M) helps predict the Ballroom classes. Is this because the filters are pitch invariant, or because the network is more expressive?

Time-Frequency experiment (see the results table above)

1. Pre-initializing the weights is beneficial.
2. With a much less expressive network, Time-FrequencyInit, we obtain accuracy similar to Black-box: we propose an efficient way of representing music data.
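One plausible reading of this pre-initialization, sketched below with the TimeFrequencyCNN class from the architecture sketch above: copy the convolutional weights of the separately trained Time and Frequency models into the corresponding branches of the joint network, then fine-tune end-to-end. The exact procedure is our assumption, not a statement of the authors' code.

```python
import torch.nn as nn

# Stand-ins for the conv layers of separately trained Time and Frequency models.
trained_time_conv = nn.Conv2d(1, 16, kernel_size=(1, 60))
trained_freq_conv = nn.Conv2d(1, 16, kernel_size=(32, 1))

joint = TimeFrequencyCNN()  # from the architecture sketch above
joint.time_branch[0].load_state_dict(trained_time_conv.state_dict())
joint.freq_branch[0].load_state_dict(trained_freq_conv.state_dict())
# ...then fine-tune `joint` end-to-end on the Ballroom classes.
```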

We have discussed how several CNN filter shapes can model musical aspects. We have proposed some musically motivated deep learning architectures and shown that they can achieve competitive results in predicting the Ballroom dataset classes. Two outcomes: understanding what the architectures are learning, and an efficient way of representing musical concepts.

Thanks! To reproduce our research, the code is available at github.com/jordipons/cbmi2016/

Ballroom dataset: 698 songs, each 30 seconds long, in 8 music genres: cha-cha-cha, jive, quickstep, rumba, samba, tango, Viennese waltz and slow waltz.