Deep learning for music data processing


Deep learning for music data processing. A personal (re)view of the state-of-the-art. Jordi Pons, www.jordipons.me. Music Technology Group, DTIC, Universitat Pompeu Fabra, Barcelona. 31st January 2017.

What problems do we care about in music technology research? (Automatically) cataloging large-scale music collections. Music recommendation. Similarity, e.g. Shazam. Synthesis: instruments, singing voice... Some of these can be approached with deep learning.

Why might deep learning be useful for music data processing? Music is hierarchical in frequency (note, chord) and in time (onset, rhythm), and deep learning naturally allows for this representation. Contextual analysis: short time-scale features with CNNs (e.g. notes, chords); long time-scale features with RNNs (e.g. structure). Unsupervised learning: the potential of learning from any audio! Time/frequency-invariant operations: max-pooling. Any input: spectrograms, MFCCs, self-similarity matrices, video, text.
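As a concrete illustration of the "any input" point above, here is a minimal numpy sketch of the most common CNN input mentioned in the slides, a magnitude spectrogram. The function name and parameter values are illustrative, not from the talk.

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=256):
    """Magnitude STFT spectrogram: the typical 2-D input for audio CNNs."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Keep only the non-redundant half of the FFT (real-valued input).
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq bins, time frames)

# A 440 Hz tone sampled at 16 kHz: energy concentrates near bin 440/16000*512 ~ 14.
sr = 16000
t = np.arange(sr) / sr
S = spectrogram(np.sin(2 * np.pi * 440 * t))
```

The resulting (frequency x time) matrix is what the filter-shape discussion later in the talk operates on.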

Acronyms: MLP: multilayer perceptron, a feed-forward neural network. RNN: recurrent neural network. LSTM: long short-term memory. CNN: convolutional neural network. Assumed notion of deep learning: a model is deep when several non-linearities are applied to the input. The parameters of the network are learnt, typically via back-propagation.
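The "several non-linearities" notion can be sketched with a forward pass through a small MLP in numpy. Dimensions and initialization are arbitrary placeholders; training by back-propagation is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def mlp_forward(x, weights):
    """A network is 'deep' when several non-linearities are stacked:
    each hidden layer applies an affine map followed by a non-linearity."""
    h = x
    for W, b in weights[:-1]:
        h = relu(h @ W + b)
    W, b = weights[-1]
    return h @ W + b  # linear output layer (e.g. logits for a classifier)

# 3-layer MLP: 128-d input (e.g. a spectrogram frame) -> 64 -> 32 -> 10 classes.
dims = [128, 64, 32, 10]
weights = [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
           for i, o in zip(dims[:-1], dims[1:])]
logits = mlp_forward(rng.standard_normal(128), weights)
```

In practice the weights would be fitted with back-propagation rather than drawn at random.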

Chronology: the big picture



Used for: Classification (genre, artist, singing-voice detection, music/speech): Pons et al., Lidy et al. Auto-tagging: Dieleman et al., Choi et al. Key estimation: Humphrey et al., Korzeniowski et al. Feature extraction (unsupervised): Hamel et al., Lee et al. Music similarity estimation: Schlüter et al. Music recommendation: Aäron van den Oord et al. Onset/boundary detection: Böck et al., Durand et al. Source separation: Huang et al., Miron et al. Singing voice synthesis: Blaauw et al.

Chronology: the big picture

LSTMs for automatic music composition with symbolic data. Eck and Schmidhuber, Learning the Long-Term Structure of the Blues, ICANN '02: "...compositions are quite pleasant". Some examples of music composed by LSTMs: (1) Bob Sturm plays The Mal's Copporim. (2) LSTMetallica: drums from Metallica (Choi et al.). (3) LSTM Realbook: generation of jazz chord progressions.
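The gating mechanism that lets an LSTM keep a musical form coherent over long note sequences can be sketched as a single numpy step; the sizes and the 12-d input (e.g. a pitch-class vector) are assumptions for illustration, not the exact setup of Eck and Schmidhuber.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step. Gates let the cell state c carry information across
    many time steps, which is what makes long-term musical structure
    learnable compared to a plain RNN."""
    z = np.concatenate([x, h]) @ W + b                # all four gates at once
    i, f, o, g = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # forget old, write new
    h_new = sigmoid(o) * np.tanh(c_new)               # exposed hidden state
    return h_new, c_new

rng = np.random.default_rng(0)
n_in, n_hid = 12, 8
W = rng.standard_normal((n_in + n_hid, 4 * n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h = c = np.zeros(n_hid)
for x in rng.standard_normal((16, n_in)):  # unroll over a 16-step sequence
    h, c = lstm_step(x, h, c, W, b)
```

A composition system would add an output layer predicting the next note and sample from it autoregressively.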

CNNs: interpretation and filter shapes discussion. S. Dieleman, http://benanne.github.io/2014/08/05/spotify-cnns.html — content-based music recommendation at Spotify. The CNN is learning hierarchical (music) features: Layer 1: vibrato, vocal thirds, bass drums, A/Bb pitch, A/Am chords. Layer 3: Christian rock, Chinese pop, 8-bit, multimodal.

Lee et al., Unsupervised Feature Learning for Audio Classification Using Convolutional Deep Belief Networks, NIPS '09. Visualization of some randomly selected first-layer convolutional filters trained on music.

Lee et al., Unsupervised Feature Learning for Audio Classification Using Convolutional Deep Belief Networks, NIPS '09. Visualization of four different phonemes and their corresponding first-layer convolutional filters trained on speech.

Choi et al., Explaining Deep CNNs on Music Classification, arXiv:1607.02444. Figure: filters of the first CNN layer trained for genre classification. Layer 1: onsets. Layer 2: onsets, bass, harmonics, melody. Layer 3: onsets, melody, kick, percussion. Layer 4: harmonic structures, notes, vertical/horizontal lines. Layer 5: textures, harmo-rhythmic patterns. 3x3 filters limit the representational power of the first layer! Does it then make sense to use computer vision architectures, as in: Hershey et al., CNN Architectures for Large-Scale Audio Classification, ICASSP '17?

Pons et al., Experimenting with Musically Motivated CNNs, CBMI '16. Squared/rectangular filters (m-by-n, with m < M and n < N): kick drums, notes. Temporal filters (1-by-n): onsets, patterns... very efficient! Frequency filters (m-by-1): timbre, chords... interpretable!
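The effect of these filter shapes on an M-by-N spectrogram can be seen from the output sizes of a "valid" 2-D convolution; the numbers below match the 12x8, 1x80 and 40x1 shapes mentioned later in the talk, on an assumed 40-bin by 80-frame input.

```python
import numpy as np

def conv2d_valid(S, kernel):
    """'Valid' 2-D cross-correlation of a spectrogram S (freq x time)."""
    kf, kt = kernel.shape
    out = np.empty((S.shape[0] - kf + 1, S.shape[1] - kt + 1))
    for f in range(out.shape[0]):
        for t in range(out.shape[1]):
            out[f, t] = np.sum(S[f:f + kf, t:t + kt] * kernel)
    return out

rng = np.random.default_rng(0)
S = rng.random((40, 80))  # 40 frequency bins x 80 time frames

# The three filter families (freq x time):
squared  = conv2d_valid(S, rng.random((12, 8)))   # local time-frequency patterns
temporal = conv2d_valid(S, rng.random((1, 80)))   # spans all time: onsets, rhythm
freq     = conv2d_valid(S, rng.random((40, 1)))   # spans all freq: timbre, chords
```

A temporal filter collapses the time axis to a single column per band, and a frequency filter collapses the frequency axis to a single row per frame, which is what makes them efficient and interpretable respectively.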

Pons et al., Experimenting with Musically Motivated CNNs, CBMI '16. Pons & Serra, Designing Efficient Architectures for Modeling Temporal Features with CNNs, ICASSP '17.

In collaboration with Thomas Lidy: CNNs with 12x8, 1x80 and 40x1 filters (figure colormap: white > black).

Source Separation

Po-Sen Huang et al., Singing-Voice Separation from Monaural Recordings Using Deep Recurrent Neural Networks, ISMIR '14. Three deep layers (the second one recurrent), estimating two sources simultaneously. Joint modelling of the DRNN and a time-frequency mask, with a discriminative cost.
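The masking step jointly modelled with the network can be sketched as Wiener-style soft masking: the two source estimates are normalized into masks that sum to one and applied to the mixture. This is a generic illustration of time-frequency masking, not Huang et al.'s exact formulation, and the network outputs are stood in for by random arrays.

```python
import numpy as np

def soft_mask_separate(mix_mag, est_voice, est_accomp, eps=1e-8):
    """Soft time-frequency masking: turn two source estimates into masks
    that sum to one, then apply them to the mixture magnitude. This
    guarantees the separated magnitudes add back up to the mixture."""
    denom = est_voice + est_accomp + eps
    voice = mix_mag * est_voice / denom
    accomp = mix_mag * est_accomp / denom
    return voice, accomp

rng = np.random.default_rng(0)
mix = rng.random((257, 100))              # mixture magnitude spectrogram
v_hat, a_hat = rng.random((2, 257, 100))  # stand-ins for network outputs
voice, accomp = soft_mask_separate(mix, v_hat, a_hat)
```

The discriminative cost mentioned above additionally penalizes each estimate for resembling the *other* source, on top of the usual reconstruction error.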

Chandna et al., Monoaural Audio Source Separation Using Deep Convolutional Neural Networks, LVA/ICA '17. Presented at the Signal Separation Evaluation Campaign 2017.

End-to-end learning. S. Dieleman and B. Schrauwen, End-to-End Learning for Music Audio, ICASSP '14. The network learns frequency-selective filters similar to a mel filter bank.
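For reference, the hand-crafted front end that those learned filters resemble can be built in a few lines; this is a standard triangular mel filter bank construction (parameter values are illustrative), not code from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=40, n_fft=512, sr=16000):
    """Triangular mel filters over the rfft bins: the hand-crafted,
    frequency-selective front end that end-to-end models rediscover."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)  # rising edge
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)  # falling edge
    return fb

fb = mel_filterbank()  # multiply a magnitude spectrogram by fb to get mel bands
```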

Aäron van den Oord et al., WaveNet: A Generative Model for Raw Audio, arXiv:1609.03499 (2016). A generative model for speech and music audio.
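WaveNet's building block is a causal dilated 1-D convolution, which can be sketched in numpy; stacking these with exponentially growing dilations yields the long receptive fields needed for raw audio. This is a toy single-layer illustration, not the full architecture.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution with dilation: the output at time t only
    sees x[t], x[t-d], x[t-2d], ... so no future samples leak in."""
    pad = dilation * (len(w) - 1)
    xp = np.concatenate([np.zeros(pad), x])  # left-pad for causality
    return sum(w[k] * xp[pad - k * dilation : pad - k * dilation + len(x)]
               for k in range(len(w)))

x = np.arange(8, dtype=float)
w = np.array([1.0, -1.0])  # simple difference filter, two taps
y = causal_dilated_conv(x, w, dilation=2)  # y[t] = x[t] - x[t-2]
```

Because the model is autoregressive, generation feeds each predicted sample back in as input to produce the next one.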

Chronology: the big picture

Limitations the academic music technology community faces when approaching its problems with deep learning: lack of annotated data; lack of hardware (GPUs); expertise goes to the industry.

Trends for solving the issue of annotated data: collaborative efforts for jointly annotating music data; artificial augmentation of the annotated data.

Trends for solving hardware limitations: researchers avoid end-to-end learning approaches by inputting hand-crafted features to deep networks, by using non-deep-learning classifiers/models stacked on top of deep learning feature extractors, or by constraining the solution space with prior information about the nature of music or human audio perception.

References @ jordipons.me/lack-of-annotated-music-data-restrict-the-solution-space/
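The data-augmentation trend above can be illustrated with a toy numpy time-stretch by linear resampling, generating extra training variants from one clip. Real pipelines use phase-vocoder stretching, pitch shifting, and noise addition; this sketch (function name and rates are placeholders) only shows the idea.

```python
import numpy as np

def time_stretch(signal, rate):
    """Naive augmentation by linear resampling: rate > 1 shortens the
    clip, rate < 1 lengthens it. Toy illustration only; it also shifts
    pitch, unlike a proper phase-vocoder time stretch."""
    n_out = int(len(signal) / rate)
    idx = np.linspace(0, len(signal) - 1, n_out)
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, len(signal) - 1)
    frac = idx - lo
    return (1 - frac) * signal[lo] + frac * signal[hi]

x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # one training clip
variants = [time_stretch(x, r) for r in (0.9, 1.0, 1.1)]  # three for training
```

Each annotated clip thus yields several labeled examples, partially compensating for the scarcity of annotated music data.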

Imaginable research directions? End-to-end learning from raw audio: Aytar et al., SoundNet: Learning Sound Representations from Unlabeled Video, NIPS '16. Multimodal deep processing: Slizovskaia et al., Automatic Musical Instrument Recognition in Audiovisual Recordings by Combining Image and Audio Classification Strategies, SMC '16. Unsupervised learning, such as generative models: Aäron van den Oord et al., WaveNet: A Generative Model for Raw Audio, arXiv:1609.03499 (2016). Efficient learning of long-term dependencies: Eck and Schmidhuber, Learning the Long-Term Structure of the Blues, ICANN '02. Understanding which features are learnt: Pons et al., Experimenting with Musically Motivated Convolutional NNs, CBMI '16.

Thanks! :)