Audio: Generation & Extraction. Charu Jaiswal


Music Composition: which approach?
- Feed-forward NNs can't store information about the past (or keep track of position in a song)
- RNNs as single-step predictors struggle with composition, too
- Vanishing/exploding gradients: the error signal either decays or grows exponentially as it flows back through time
- So the network can't deal with long-term dependencies
- But music is all about long-term dependencies!

Music
- Long-term dependencies define style: structure spanning bars and notes contributes to metrical and phrasal structure
- How do we introduce structure at multiple levels?
- Eck and Schmidhuber → LSTM

Why LSTM?
- Designed to maintain constant error flow through time and protect it from perturbation
- Uses linear units to overcome the decay problems of plain RNNs
- Input gate: protects the cell's memory from perturbation by irrelevant inputs
- Output gate: protects other units from perturbation by irrelevant memory contents
- Forget gate: resets a memory cell when its content is obsolete
(Hochreiter & Schmidhuber, 1997)
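The gating described above can be sketched as a single numpy time step. This is an illustrative textbook LSTM cell, not the paper's exact architecture; the parameter stacking order and sizes are assumptions for the sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold stacked parameters for the
    input (i), forget (f), output (o) gates and the candidate (g)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # (4*H,) pre-activations
    i = sigmoid(z[0:H])                 # input gate: shields the cell from irrelevant inputs
    f = sigmoid(z[H:2*H])               # forget gate: resets the cell when content is obsolete
    o = sigmoid(z[2*H:3*H])             # output gate: shields other units from irrelevant memory
    g = np.tanh(z[3*H:4*H])             # candidate cell content
    c = f * c_prev + i * g              # linear cell update -> near-constant error flow
    h = o * np.tanh(c)
    return h, c

# Toy usage: 3-dim input, 2 memory cells, random parameters
rng = np.random.default_rng(0)
Hdim, Xdim = 2, 3
W = rng.normal(size=(4 * Hdim, Xdim))
U = rng.normal(size=(4 * Hdim, Hdim))
b = np.zeros(4 * Hdim)
h, c = np.zeros(Hdim), np.zeros(Hdim)
for t in range(5):
    h, c = lstm_step(rng.normal(size=Xdim), h, c, W, U, b)
```

The key line is the cell update `c = f * c_prev + i * g`: because it is linear in `c_prev`, gradients flow through it without the repeated squashing that makes plain RNN error signals vanish.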

Data Representation
- Chords: quarter notes only, no rests
- Notes: training melodies written by Eck
- Dataset of 4096 segments
(Eck and Schmidhuber, 2002)

Experiment 1: Learning Chords
- Objective: show that LSTM can learn and represent chord structure in the absence of melody
- Network: 4 cell blocks with 2 cells each, fully connected to each other and to the input; the output layer is fully connected to all cells and to the input layer
- Training & testing: predict the probability of each note being on or off; feed the network's predictions back in for ensuing time steps, using a decision threshold
- Caveat: the outputs are treated as statistically independent, which is untrue (Issue #1)
- Result: generated chord sequences
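The generate-by-feedback loop above can be sketched in a few lines. Here `predict_next` is a hypothetical stand-in for the trained LSTM (the real model would carry recurrent state); the point is the thresholding of independent per-note probabilities and feeding the result back in:

```python
import numpy as np

def predict_next(note_vec, rng):
    """Hypothetical stand-in for the trained network: maps the current
    binary note vector to per-note 'on' probabilities for the next step."""
    logits = rng.normal(size=note_vec.shape) + 0.5 * note_vec
    return 1.0 / (1.0 + np.exp(-logits))

def generate(seed_notes, steps, threshold=0.5, seed=0):
    """Generate by thresholding predictions and feeding them back in,
    treating each note's on/off probability as independent (the paper's
    acknowledged caveat)."""
    rng = np.random.default_rng(seed)
    notes = np.asarray(seed_notes, dtype=float)
    sequence = [notes.copy()]
    for _ in range(steps):
        p = predict_next(notes, rng)
        notes = (p >= threshold).astype(float)   # decision threshold
        sequence.append(notes.copy())
    return np.stack(sequence)

seq = generate(seed_notes=[1, 0, 0, 1, 0], steps=8)
```

Treating the outputs as independent Bernoulli variables is exactly Issue #1: jointly unlikely note combinations can still be emitted because each note is thresholded in isolation.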

Experiment 2: Learning Melody and Chords
- Can LSTM learn chord and melody structure, and use these structures for composition?
- Network: differs from Experiment 1 in that chord cell blocks have recurrent connections to themselves and to the melody blocks, while melody cell blocks are recurrently connected only to the melody
- Training: predict the probability of each note being on or off

Sample composition
- Training set: http://people.idsia.ch/~juergen/blues/train.32.mp3
- Chord + melody sample: http://people.idsia.ch/~juergen/blues/lstm_0224_1510.32.mp3

Issues
- No objective way to judge the quality of compositions
- Repetition and similarity to the training set
- Notes treated as statistically independent
- Limited to quarter notes, with no rests
- Uses a symbolic representation (modified sheet notation) → how could it handle real-time performance data (MIDI or audio)? That would allow interaction (live improvisation)

Audio Extraction (source separation)
- How do we separate sources?
- Engineering approach: decompose the mixed audio signal into a spectrogram and assign each time-frequency element to a source
- Ideal binary mask: each element is attributed to the source with the largest magnitude in that source's spectrogram
- This mask is then used to estimate the reference separation
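The ideal binary mask is simple to state in code: compare oracle source spectrograms element-wise and keep the winner. A minimal numpy sketch (toy 2×2 magnitude spectrograms; a real pipeline would build these with an STFT and invert the masked mixture back to audio):

```python
import numpy as np

def ideal_binary_mask(S_vocal, S_accomp):
    """Ideal binary mask: each time-frequency element is assigned to the
    source with the larger magnitude in its (oracle) source spectrogram."""
    return (np.abs(S_vocal) > np.abs(S_accomp)).astype(float)

def apply_mask(S_mix, mask):
    # Masked mixture spectrogram; an inverse STFT would give the audio estimate.
    return S_mix * mask

# Toy magnitude spectrograms (frequency x time)
Sv = np.array([[3.0, 0.1],
               [0.2, 2.0]])
Sa = np.array([[1.0, 0.5],
               [0.8, 0.3]])
M = ideal_binary_mask(Sv, Sa)
# M == [[1, 0], [0, 1]]: bins where the vocal dominates are kept
```

Because the mask requires the true source spectrograms, it is an oracle: it defines the reference (upper-bound) separation that learned systems are measured against.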

DNN Approach
- Dataset: 63 pop songs (50 for training)
- Binary mask computed by comparing the magnitudes of the vocal and non-vocal spectrograms and setting a mask element to 1 where the vocal magnitude is greater

DNN
- Trained a feed-forward DNN to predict binary masks separating the vocal and non-vocal signals of a song
- Each spectrogram window was unpacked into a vector
- Probabilistic binary mask: testing used a sliding window, and the model's output gave binary-mask predictions in sliding-window format
- A confidence threshold (alpha) converts these probabilities into the binary vocal mask Mv
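The alpha step reduces to one comparison: the network's probabilistic mask is binarized at the chosen confidence level. A small sketch (the array values here are made up for illustration):

```python
import numpy as np

def binarize_mask(P_vocal, alpha):
    """Turn a probabilistic vocal mask into the binary mask Mv.
    Higher alpha keeps only confident vocal bins (less interference
    leaking in, but more artefacts from dropped vocal energy);
    lower alpha is more permissive."""
    return (P_vocal >= alpha).astype(float)

# Toy probabilistic mask from the sliding-window predictions
P = np.array([[0.9, 0.4],
              [0.6, 0.1]])
assert np.array_equal(binarize_mask(P, 0.5), [[1.0, 0.0], [1.0, 0.0]])
assert np.array_equal(binarize_mask(P, 0.8), [[1.0, 0.0], [0.0, 0.0]])
```

Sweeping alpha is what produces the quality-vs-threshold curves on the next slide: each alpha trades interference suppression against artefacts.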

Separation of sources using DNN

Separation quality as a function of alpha
- SIR (red) = signal-to-interference ratio
- SDR (green) = signal-to-distortion ratio
- SAR (blue) = signal-to-artefact ratio
- SIR and SAR can be interpreted as energetic equivalents of the positive hit rate (SIR) and false positive rate (SAR)
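For intuition, a simplified SDR can be computed as target energy over error energy in dB. This is a sketch, not the full BSS Eval definition, which additionally projects the error onto source subspaces to split it into interference (SIR) and artefact (SAR) components; the toy signals below are made up:

```python
import numpy as np

def sdr_db(reference, estimate):
    """Simplified signal-to-distortion ratio in dB: energy of the
    reference over energy of everything the estimate gets wrong."""
    err = estimate - reference
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(err**2))

# Toy example: a sine 'source' recovered with a small additive residue
t = np.linspace(0, 20, 1000)
s = np.sin(t)
estimate = s + 0.1 * np.cos(1.75 * t)   # ~10% amplitude residual error
quality = sdr_db(s, estimate)            # roughly 20 dB for this residue
```

Higher is better for all three metrics; the alpha sweep moves the operating point along the SIR/SAR trade-off rather than improving everything at once.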

Like-to-like Comparison
- Plots mean SAR as a function of mean SIR for both models
- The DNN provides ~3 dB better mean SAR for a given SIR, ~5 dB for vocal signals, and only a small advantage for non-vocal signals
- The DNN seems to have biased its learning toward correct positive identification of vocal sounds

Critique of Paper + Next Steps
- The DNN biased its learning toward correct positive identification of vocal sounds
- Only a small advantage to using the DNN over the traditional approach
- Next step: expand the dataset