Joint bottom-up/top-down machine learning structures to simulate human audition and musical creativity

Joint bottom-up/top-down machine learning structures to simulate human audition and musical creativity. Jonas Braasch, Director of Operations and Professor, School of Architecture, Rensselaer Polytechnic Institute, Troy, NY.

CAIRA, the Creative Artificially Intuitive and Reasoning Agent. Pauline Oliveros, Selmer Bringsjord. (Diagram: a goal-driven, top-down path runs from Cognition through deductive Rational Thinking; a process-driven, bottom-up path runs from Sensation through inductive Intuitive Thinking.)

CAIRA (started 2008), the Creative Artificially Intuitive and Reasoning Agent. J. Braasch, S. Bringsjord, P. Oliveros, D. Van Nort. The agent uses:
- Auditory Scene Analysis algorithms to extract low-level acoustic features
- HMM-based machine listening tools for texture analysis
- Genetic Algorithms for the creation of new material
- Logic-Based Reasoning to make cognitive decisions
Acoustic input: dummy head. Braasch, Bringsjord, Oliveros, Van Nort.

Layered architecture: Sensation feeds bottom-up into Cognition, which acts back top-down. Low level: auditory sensing and auditory signal processing. Mid level: HMMs, EMD, and neural networks. High level: first- and higher-order logic and deep neural networks.

Duplex Pitch Perception Model. J. Braasch, D. Dahlbom. The input signal is split by a bandpass filter bank, and an autocorrelation function is computed for each band (pitch analysis model after Zigmond et al., 1999). Legend: + strong pitch cue/within band; x strong pitch cue/outside band; o weak pitch cue/within band; * weak pitch cue/outside band. Example: major chord built from 1/f tone complexes (+880 Hz tone). Braasch et al., Springer 2017.
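
A minimal sketch of this filter-bank/autocorrelation front end, assuming a Butterworth band split and normalized autocorrelation; the band edges and the peak-picking rule are illustrative, not the published model:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_pitch_cues(x, fs, bands):
    """Split a signal into bandpass channels and autocorrelate each one.

    bands: list of (f_lo, f_hi) edges in Hz; illustrative values, not the
    band layout of the published model.
    """
    cues = []
    lo, hi = int(fs / 2000), int(fs / 50)  # search lags for 50-2000 Hz
    for f_lo, f_hi in bands:
        sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
        y = sosfiltfilt(sos, x)
        ac = np.correlate(y, y, mode="full")[len(y) - 1:]  # lags >= 0
        ac = ac / (ac[0] + 1e-12)                          # normalize
        lag = lo + int(np.argmax(ac[lo:hi]))
        # Peak height separates strong from weak pitch cues per band.
        cues.append((f_lo, f_hi, fs / lag, ac[lag]))
    return cues
```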

Binaurally Integrated Cross-correlation/Autocorrelation Mechanism (BICAM). Motivation and architecture: the left-ear and right-ear signals each pass through a bandpass filter bank, and a binaural activity pattern is built from the autocorrelation (AC_L, AC_R) and cross-correlation (CC_LR, CC_RL) functions. Step 1: calculate the two autocorrelation functions R_xx(m) and R_yy(m). Step 2: calculate the cross-correlation function R_xy(m). Step 3: cross-correlate both functions in a second layer (a window moved by k_d). Step 4: replace the cross-correlation function with the autocorrelation function. This separates the direct-sound ITD (ITD_D) from the reflection ITDs (ITD_R1, ITD_R2); panels A and B contrast the no-ITD case with a case containing reflections. Patent: US10068586B2; JASA 2016.
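
A numerical sketch of the four BICAM steps for one filter-bank channel, using plain numpy correlations; windowing and the per-band processing of the patented method are simplified away:

```python
import numpy as np

def bicam_channel(left, right):
    """Sketch of the four BICAM steps for one bandpass channel.

    Simplified: full-length correlations and no windowing; the patented
    method applies these steps within each filter-bank band.
    """
    # Step 1: autocorrelation of each ear signal.
    r_xx = np.correlate(left, left, mode="full")
    r_yy = np.correlate(right, right, mode="full")
    # Step 2: cross-correlation between the two ear signals.
    r_xy = np.correlate(left, right, mode="full")
    # Step 3: second-layer cross-correlation between the cross- and
    # autocorrelation functions; its peak shift k_d estimates the
    # direct-sound ITD separately from the reflection ITDs.
    layer2 = np.correlate(r_xy, r_xx, mode="full")
    k_d = int(np.argmax(layer2)) - (len(r_xx) - 1)
    # Step 4: replace the cross-correlation with the autocorrelation,
    # shifted by k_d, to form the binaural activity pattern.
    pattern = np.roll(r_xx, k_d)
    return k_d, pattern
```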

BICAM Localization Results using Deep Neural Networks. N. Deshpande, J. Braasch. A classifier is trained on binaural activity patterns (maps) computed from reverberated speech.

Task               Training set   Validation   Testing set
Dir Lat (only)     98.2%          98.98%       98.6%
Dir Lat (w/ refl)  94.0%          92.9%        92.4%
Time Delay         98.6%          98.4%        98.0%
Ref Lat            85.9%          85.9%        84.6%

Methods: reverberated-speech data set; BICAM output as input features; Apple TuriCreate API; 5 lateral positions, 4 delay conditions.
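
A hedged sketch of training such a classifier on exported binaural activity maps with TuriCreate; the directory layout, label derivation, and ResNet backbone are assumptions, since the slide names only the API:

```python
import turicreate as tc

# Hypothetical layout: images/<lateral_position>/<clip>.png, one folder
# per lateral-position class.
data = tc.image_analysis.load_images("images/", with_path=True)
data["label"] = data["path"].apply(lambda p: p.split("/")[-2])

train, test = data.random_split(0.8)
# Transfer-learning image classifier; the backbone choice is an assumption.
model = tc.image_classifier.create(train, target="label", model="resnet-50")
print(model.evaluate(test)["accuracy"])
```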

Sonic Gesture Recognizer (FILTER): system overview for audio, comprising feature extraction, recognition, and mapping to the genetic-algorithm process. State sequences are sampled by the HMM recognizer, with all eight features combined to form one state. D. Van Nort, J. Braasch, P. Oliveros (2009), SMC.
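
A minimal sketch of sampling state sequences from a trained HMM's Markov chain with numpy; the eight-feature state encoding and the trained FILTER parameters are not reproduced, so the toy matrices below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_states(start_probs, trans, n_steps):
    """Sample a state sequence from an HMM's Markov chain.

    Each state stands in for one combination of the eight gesture
    features; trans is a row-stochastic transition matrix.
    """
    states = [rng.choice(len(start_probs), p=start_probs)]
    for _ in range(n_steps - 1):
        states.append(rng.choice(len(start_probs), p=trans[states[-1]]))
    return states

# Toy 3-state example; a trained recognizer would supply pi and A.
pi = np.array([0.6, 0.3, 0.1])
A = np.array([[0.8, 0.15, 0.05],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
print(sample_states(pi, A, 10))
```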

Computation of a tension arc. J. Braasch, S. Bringsjord, P. Oliveros, D. Van Nort. Task: the agent determines the solo/tutti constellation in a free-music ensemble (Musician A, Musician B, CAIRA). It focuses on tension data, which correlate strongly with loudness. The tension arc is computed as T = L + 0.5((1 - b)R + bI + O), with L the loudness, R the roughness, I the information rate, and O the onset rate. All parameters L, R, I, O are normalized between 0 and 1, and the exponential relationships between the input parameters and T are also factored into these variables. Correlations: loudness alone 0.906; loudness and roughness combined 0.915. Patent: US10032443B2; Braasch et al., Springer 2017.
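
The formula translates directly into code; a sketch assuming the inputs are already normalized to [0, 1] with the exponential relationships folded in, as stated above (the balance weight b is left as a free parameter):

```python
def tension_arc(loudness, roughness, info_rate, onset_rate, b=0.5):
    """Tension arc T = L + 0.5 * ((1 - b) * R + b * I + O).

    All inputs are assumed normalized to [0, 1], with the exponential
    input/tension relationships already folded into the variables.
    b balances roughness against information rate.
    """
    return loudness + 0.5 * ((1 - b) * roughness + b * info_rate + onset_rate)

# Example: moderately loud, rough playing with dense onsets.
print(tension_arc(0.7, 0.5, 0.4, 0.6))
```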

CAIRA's first-order logic. J. Braasch, S. Bringsjord, P. Oliveros, D. Van Nort.
- The agent needs to know about dynamic levels.
- We need at least two musicians to form an ensemble.
- At a given time slot, we have either a solo or an ensemble part.
- If we have only one musician, he/she must play a solo.
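
A toy encoding of the ensemble rules as Python predicates, standing in for the first-order prover the slide implies; names and the silence case are illustrative:

```python
def classify_timeslot(active_musicians):
    """Toy version of the ensemble rules above.

    - at least two musicians form an ensemble
    - each time slot is either a solo or an ensemble part
    - a single active musician must be playing a solo
    """
    if len(active_musicians) == 0:
        return "silence"  # not covered by the slide's rules
    if len(active_musicians) == 1:
        return f"solo: {active_musicians[0]}"
    return "ensemble"

print(classify_timeslot(["Musician A"]))                # solo
print(classify_timeslot(["Musician A", "Musician B"]))  # ensemble
```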

MIKA Agent (RPI/IBM AIRC project). C. Bahn, J. Braasch, M. Goodheart, M. Simoni, N. Keil, J. Stewart, J. Yang.

Input model analysis, by layer:
- Cognitive functions (top-down logic reasoning using a rule-based internal world representation): first-order logical reasoning based on music theory; manual and semi-automated ontology creation.
- Mid-level learning functions (bottom-up driven statistical learning approaches): Bayesian networks, deep learning networks, Hidden Markov Models (HMM).
- Auditory periphery and mid-brain: pitch extraction, on/offset detection, timbre analysis, beat analysis, loudness estimation, polyphonic analysis.
- Acoustic sensing: binaural manikin, nearfield microphone.

Output (knowledge acquisition and creation): understanding the development of theory and performance practice over time.
- Past analysis: analysis of theoretical works; analysis of interpretations of selected jazz standards over time (Back Home Again in Indiana, Summertime, Stella by Starlight, Donna Lee); analysis of leadsheets and transcriptions from a MIDI file database; recording analysis from a database spanning 1900-2000 (Blues, Ragtime, Swing, Bebop, Cool, Hardbop, Fusion, Avantgarde).
- Future projection: extension of the theoretical corpus; new forms of compositions and performance practice; new music generation (Future Jazz).

First neural-network results: piano-roll display (pitch over time) of a 12-bar blues, with in-scale notes marked green and out-of-scale notes marked red.
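
A minimal sketch of the green/red in-scale coloring from those first results, assuming MIDI note numbers and a blues scale as the reference; the actual network and scale definition are not given on the slide:

```python
# Classify piano-roll notes as in or out of a reference scale, as in the
# green/red piano-roll display; the blues-scale choice is an assumption.
BLUES_SCALE = {0, 3, 5, 6, 7, 10}  # pitch classes relative to the tonic

def color_notes(midi_notes, tonic=60):
    """Return (note, 'green'|'red') pairs for a 12-bar-blues reference."""
    return [(n, "green" if (n - tonic) % 12 in BLUES_SCALE else "red")
            for n in midi_notes]

print(color_notes([60, 63, 64, 67, 70]))  # 64 (major third) falls outside
```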

Environment monitoring. M. Morgan, J. Braasch. Target sound classes: American crow calls, red-winged blackbird calls, siren, airplane. Models: a multilayer perceptron (MLP) and convolutional neural networks (CNN), using TensorFlow's Inception-v3 pretrained on ImageNet and retrained for 4000 training steps.
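
A hedged Keras sketch of this transfer-learning setup, an ImageNet-pretrained Inception-v3 with a new head over spectrogram images; the image pipeline and class folder layout are assumptions, and the original used TensorFlow's retraining workflow rather than Keras:

```python
import tensorflow as tf

NUM_CLASSES = 4  # crow calls, blackbird calls, siren, airplane

# ImageNet-pretrained Inception-v3 as a frozen feature extractor.
base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", pooling="avg",
    input_shape=(299, 299, 3))
base.trainable = False

model = tf.keras.Sequential([
    # Scale raw pixel values into Inception-v3's expected input range.
    tf.keras.layers.Lambda(tf.keras.applications.inception_v3.preprocess_input),
    base,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical layout: spectrograms/<class_name>/*.png, one folder per class.
train = tf.keras.utils.image_dataset_from_directory(
    "spectrograms/", image_size=(299, 299), batch_size=32)
model.fit(train, epochs=5)  # the slide reports 4000 training steps
```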

Training data collection. J. Braasch, N. Keil.
- Synthetic data generation using synthetic sound fields and/or MIDI orchestration
- Interactive user tracking in virtual reality at the CRAIVE-Lab
- 24/7 spatial A/V recordings: A/V recording at Lake George (Jefferson Project), CISL infrastructure
AIRCC project team: J. Braasch, B. Chang, J. Goebel, R. Radke, Q. Ji. CISL project team: Mallory Morgan (RPI), Vincent W. Moriarty (IBM), Jonas Braasch (RPI); CISL and JP teams.

Next Directions
- Combination of (1) signal processing, (2) statistical models, and (3) neural networks and logic
- Automatic creation of ontologies
- Automatic creation of acoustic scene databases
- Rule discovery in systems where rule violations are (i) allowed, (ii) strategically used, and (iii) changed over time
- Multi-modal analysis