Eric Battenberg and David Wessel
Universal Parallel Computing Research Center
The Center for New Music and Audio Technologies
University of California, Berkeley
Microsoft Parallel Applications Workshop, 28-29 May 2009

Range of Apps
Hundreds of apps and plug-ins:
- Performance/Composition
- Music Information Retrieval
- Hearing Augmentation for Music
- 3D Sound: Speaker/Microphone Arrays

In this talk
- Background on music applications
- Insights into music and parallel computing
- Organizing apps with parallel design patterns
- Case study: parallelizing drum track extraction with OpenMP and CUDA
- Brainstorm: the future of performance and retrieval

Music Performance and Composition
Novel musical interfaces allow for accessible and interesting performances.
- Multi-Touch Array: designed by David Wessel, Adrian Freed, Rimas Avizienis, and Matthew Wright
- Tablo: designed by Adrian Freed
- Reactable: designed by Sergi Jordà, Marcos Alonso, Martin Kaltenbrunner, and Günter Geiger

Music Performance and Composition
It is becoming common for amateur musicians to create professional-quality music in a home studio or Digital Audio Workstation (DAW).
DAW = Personal computer + Sound card/mixer + Audio editing software

Music Performance and Composition
The power of audio editing/processing software lies in its extensibility via plug-ins.
In an audio processing chain, plug-ins can be composed in a task-parallel manner. When composed:
- Are they thread safe?
- Will they cause catastrophic performance conflicts?
- Will they appropriately share hardware resources with other programs?
(Figure: audio plug-ins)

Partitioning Hardware Resources
What do we need from the OS?
- Tessellation: low-level resource allocation
- For music, we also need timing/deadline guarantees for real-time performance/processing.
What do we do with the allocated resources?
- Naïve composition of computational kernels can destroy performance.
- Lithe: Second-level application-aware low-level resource partitioning.

Music is inherently very parallel
- Multiple tracks, lines, voices, parts, channels, etc.
- But audio synchronization and timing are very important in parallel music apps.

Audio Synchronization/Timing
- The ear is very sensitive to timing.
- If tasks are processed on separate cores, delays can be introduced.
- If these delays are not compensated for, the sound quality can be adversely affected.
Examples:
- Musical piece played without any delay
- Same piece with a copy added that is delayed by 1 ms. We get a combing effect in the frequency domain.
(Figure: frequency response due to adding a copy delayed by 1 ms. Magnitude response versus frequency [Hz], 0 to 2 x 10^4, for the "No delay" and "1 ms delay" cases.)
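The combing effect can be checked numerically. The sketch below assumes an idealized model that the slide does not spell out: the delayed copy has the same gain as the original, so the combined response is H(f) = 1 + exp(-j 2 pi f tau) with tau = 1 ms. The function names and the dB floor are hypothetical choices for illustration.

```python
import cmath
import math


def comb_magnitude_db(freq_hz, delay_s=1e-3, floor_db=-120.0):
    """Magnitude (in dB) of a signal summed with an equal-gain copy delayed by delay_s.

    Assumed model: H(f) = 1 + exp(-j*2*pi*f*delay_s).
    """
    h = 1 + cmath.exp(-2j * math.pi * freq_hz * delay_s)
    magnitude = abs(h)
    if magnitude < 1e-6:
        # Complete cancellation: clamp instead of taking log10(0).
        return floor_db
    return 20 * math.log10(magnitude)


def notch_frequencies(delay_s=1e-3, max_hz=20000.0):
    """Frequencies (2k+1)/(2*delay_s) where the two copies are out of phase and cancel."""
    notches = []
    k = 0
    while (2 * k + 1) / (2 * delay_s) <= max_hz:
        notches.append((2 * k + 1) / (2 * delay_s))
        k += 1
    return notches


if __name__ == "__main__":
    for f in (0.0, 250.0, 500.0, 1000.0, 1500.0):
        print(f"{f:7.1f} Hz: {comb_magnitude_db(f):8.2f} dB")
    print("notches below 20 kHz:", len(notch_frequencies()))
```

Under this model a 1 ms delay places the first notch near 500 Hz and repeats it every 1 kHz, which is why even a small uncompensated delay between cores is audible.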

Open Sound Control (OSC): a way to achieve synchronization
Communication protocol to share musical data over a network.
- Symbolic and high-resolution numeric argument data
- Pattern matching language to specify multiple recipients of a single message
- High-resolution time tags for sub-sample accurate synchronization
- "Bundles" of messages whose effects must occur simultaneously (atomic updates)
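To make time tags and bundles concrete, here is a minimal sketch that packs two messages into one time-tagged bundle following the OSC 1.0 binary layout (null-terminated strings padded to 4 bytes, big-endian numbers, a 64-bit NTP-style time tag). The addresses such as `/voice/1/gain` are hypothetical, only float32 arguments are handled, and this is an illustration rather than a complete OSC implementation.

```python
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def osc_string(text):
    """ASCII string, null-terminated and zero-padded to a multiple of 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address, *floats):
    """OSC message carrying only float32 arguments (type tag 'f')."""
    type_tags = "," + "f" * len(floats)
    payload = b"".join(struct.pack(">f", value) for value in floats)
    return osc_string(address) + osc_string(type_tags) + payload


def osc_time_tag(unix_seconds):
    """64-bit time tag: 32-bit seconds since 1900 plus a 32-bit binary fraction."""
    seconds = int(unix_seconds)
    fraction = int((unix_seconds - seconds) * 2**32)
    return struct.pack(">II", seconds + NTP_EPOCH_OFFSET, fraction)


def osc_bundle(unix_seconds, messages):
    """Bundle: '#bundle', a time tag, then each element prefixed by its int32 size."""
    body = b"".join(struct.pack(">i", len(m)) + m for m in messages)
    return osc_string("#bundle") + osc_time_tag(unix_seconds) + body


if __name__ == "__main__":
    # Two gain changes that a receiver should apply at the same instant.
    bundle = osc_bundle(
        1243468800.5,
        [osc_message("/voice/1/gain", 0.5), osc_message("/voice/2/gain", 0.25)],
    )
    print(len(bundle), bundle[:16].hex())
```

Because both messages share one time tag, a receiver can apply them as a single atomic update, regardless of when the packet arrives.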

MIR Apps
Music Information Retrieval, Machine Listening, Music Understanding
- Transcription: automatically generate a score or tablature from audio.
- Source separation: isolate certain instruments (including the singer).
- Similarity, playlist creation, content discovery: automatically generate a playlist to fit a mood or based on song similarity.
- Artist, genre, mood classification or quantification: help organize a music archive.
- Score following, lyrics sync, beat tracking: useful for DJs, karaoke, music education, and automated accompaniment.
- Song segmentation: partition a song into discrete passages (verse, chorus, bridge) for individual analysis.
The hope is that someday you will be able to query for music like this: "I like the drummer but can't stand the singer. Find me something in the same genre with drumming like this but with a singer that sounds more like John Lennon."

Case Study: Drum Track Extraction
An example of source separation where the drum track is isolated. Useful in drum transcription, beat tracking, and rhythm analysis.
- Audio spectrogram is factorized into components using Non-negative Matrix Factorization (NMF).
- Components are classified using a Support Vector Machine (SVM).
- Percussive components are used to synthesize an audio drum track.
- NMF step is most computationally intensive: 80% of time in Matlab (18.5 sec of 23.1 sec total for 20 sec of audio).
- We will parallelize NMF using OpenMP (for multi-core) and CUDA (for GPUs).
(Block diagram: Input audio -> Spectral Feature Extraction -> Spectrogram -> NMF -> Time/frequency components -> Component Feature Extraction -> Percussive features -> SVM Classifier -> Percussive components -> Audio Resynthesis -> Drum track)

Case Study: Drum Track Extraction
Audio examples 1-3, each played as the original and as the extracted drum track (listen for drums in original).

Case Study: Drum Track Extraction
- Use Non-negative Matrix Factorization to separate an audio spectrogram into sources (X ≈ W*H).
- Here we see a spectrogram surrounded by its time (H) and frequency (W) component matrices (3 sources).
- The time components in H are aligned with the corresponding drum score.

Case Study: Drum Track Extraction
- NMF is posed as an optimization problem.
- A cost function that works well for music is similar to the Kullback-Leibler divergence.
- The cost is minimized with multiplicative gradient-based updates.
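The slide's equations are not preserved in this transcript. As a hedged stand-in, the sketch below uses the standard Lee-Seung multiplicative updates for the generalized Kullback-Leibler divergence D(X || WH) = sum(X log(X / WH) - X + WH), which fit the description here and the per-iteration operations the talk lists (matrix multiplies, element-wise divides and multiplies, and sums). It is pure Python for small matrices only; the function names, random initialization, and EPS value are assumptions, not the authors' implementation.

```python
import math
import random

EPS = 1e-9  # small constant added to divisors to prevent divide-by-zero


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def kl_cost(x, wh):
    """Generalized KL divergence between X and its approximation WH."""
    total = 0.0
    for x_row, a_row in zip(x, wh):
        for xv, av in zip(x_row, a_row):
            total += xv * math.log((xv + EPS) / (av + EPS)) - xv + av
    return total


def ratio_matrix(x, wh):
    """Element-wise X ./ (WH + EPS)."""
    return [[xv / (av + EPS) for xv, av in zip(xr, ar)] for xr, ar in zip(x, wh)]


def nmf(x, rank, iterations=200, seed=0):
    """Factor non-negative X (freq x time) into W (freq x rank) and H (rank x time)."""
    rng = random.Random(seed)
    n_freq, n_time = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_freq)]
    h = [[rng.random() + 0.1 for _ in range(n_time)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H .* (W^T (X ./ WH)) ./ (column sums of W)
        numer = matmul(transpose(w), ratio_matrix(x, matmul(w, h)))
        w_sums = [sum(col) for col in zip(*w)]
        h = [[hv * nv / (w_sums[k] + EPS) for hv, nv in zip(h[k], numer[k])]
             for k in range(rank)]
        # W <- W .* ((X ./ WH) H^T) ./ (row sums of H)
        numer = matmul(ratio_matrix(x, matmul(w, h)), transpose(h))
        h_sums = [sum(row) for row in h]
        w = [[wv * nv / (h_sums[k] + EPS)
              for k, (wv, nv) in enumerate(zip(w_row, n_row))]
             for w_row, n_row in zip(w, numer)]
    return w, h


if __name__ == "__main__":
    x = matmul([[1, 0], [0, 2], [3, 1]], [[1, 0, 2, 1], [0, 1, 1, 3]])
    w, h = nmf(x, rank=2)
    print("final cost:", kl_cost(x, matmul(w, h)))
```

Each iteration in this sketch performs four matrix products, two for the current W*H and two for the numerators. Every factor stays non-negative because the updates only multiply and divide non-negative values.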

Case Study: Drum Track Extraction
For [512 x 30 x 3445] NMF: 512 frequency components, 30 sources, 3445 time frames (~20 sec).
For each iteration we have:
- 423 Mflops of SGEMMs (Single-precision General Matrix Multiply)
- 3.6 Mflops of element-divides (slow)
- 0.1 Mflops of element-multiplies
- 0.1 Mflops of sums (requires communication)
Also:
- Add a small constant to divisor matrices to prevent divide-by-zero (add EPS, 3.6 Mflops).
- Compute the log-based cost function every 25 iterations to check for convergence.
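These per-iteration figures can be reproduced with a short calculation. The accounting below is an assumption, not something the slide states: four SGEMMs of size 512 x 30 x 3445 at 2 flops per multiply-add, element-wise divides over two X-sized matrices plus W and H, and multiplies and sums over W and H. The function name is hypothetical.

```python
def nmf_flops_per_iteration(n_freq=512, n_sources=30, n_frames=3445):
    """Approximate flop counts for one NMF iteration under the assumed accounting."""
    x_size = n_freq * n_frames      # entries in the spectrogram X (and in W*H)
    w_size = n_freq * n_sources     # entries in W
    h_size = n_sources * n_frames   # entries in H
    one_sgemm = 2 * n_freq * n_sources * n_frames  # one multiply and one add per term
    return {
        "sgemm": 4 * one_sgemm,                    # W*H twice, plus two numerator products
        "divide": 2 * x_size + w_size + h_size,    # X ./ (W*H) twice, then the W and H updates
        "multiply": w_size + h_size,               # element-wise scaling of W and H
        "sum": w_size + h_size,                    # column sums of W and row sums of H
        "add_eps": 2 * x_size + w_size + h_size,   # EPS added to every divisor
    }


if __name__ == "__main__":
    for name, flops in nmf_flops_per_iteration().items():
        print(f"{name:9s} {flops / 1e6:8.1f} Mflops")
```

With these assumptions the totals come to about 423.3, 3.6, 0.1, 0.1, and 3.6 Mflops, matching the slide and showing that the SGEMMs account for nearly all of the arithmetic.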

Organizing Parallel Apps
- How can we organize the design of our applications?
- How can we best communicate our development process and computing demands to other applications experts?

Parallel Design Patterns
- Application developers are starting to adopt HPC jargon, since science has been using parallel computing for decades.
- The Par Lab, led by Tim Mattson and Kurt Keutzer, is developing a parallel pattern language, OPL.
- OPL is hierarchical: higher-level patterns rely on the details contained in lower-level patterns.
Purpose of a parallel pattern language:
- Education about best practices
- Common terminology
- Guides the design process

Parallel Design Patterns
Example design pattern decomposition for the CUDA implementation of NMF.
- The pattern language helps us organize our code.
- Each design pattern is described in a document, outlining best practices and giving pointers to helpful resources.
(Figure: the NMF data flow, with W, H, and X feeding SGEMMs, column sums, and element-wise divide, add, and multiply, organized as a Pipe-and-Filter of SGEMMs, Map-Reduce sums, and element-wise arithmetic. The stages are decomposed through the patterns Dense Linear Algebra, Graph Algorithms, Data Parallel, Geometric Decomposition, Recursive Splitting, Distributed Array, SPMD, Strict Data Parallel, Coll. Sync, and SIMD.)

OpenMP (the easy stuff)
Data-parallel for loop:
- To be used for element-wise arithmetic
- Create a team of nt threads to do independent chunks of work
Reduction:
- For sums
- Create a team of nt threads to compute partial sums
- Then add the partial sums to the final variable s
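The slide's OpenMP code is not preserved in this transcript. The sketch below is a Python analogue of the two patterns, not OpenMP itself: a team of nt workers handles contiguous chunks of an element-wise loop, and a reduction computes per-worker partial sums before adding them into a final variable s. All names are hypothetical, and because of CPython's global interpreter lock this shows the structure of the computation rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, nt):
    """Split range(n) into nt contiguous chunks of near-equal size."""
    base, extra = divmod(n, nt)
    bounds = []
    start = 0
    for t in range(nt):
        stop = start + base + (1 if t < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_multiply(a, b, nt=4):
    """Data-parallel loop: each worker writes a disjoint slice of the output."""
    out = [0.0] * len(a)

    def work(bound):
        start, stop = bound
        for i in range(start, stop):
            out[i] = a[i] * b[i]

    with ThreadPoolExecutor(max_workers=nt) as pool:
        list(pool.map(work, chunk_bounds(len(a), nt)))
    return out


def parallel_sum(values, nt=4):
    """Reduction: workers compute partial sums, which are then added into s."""

    def partial(bound):
        start, stop = bound
        return sum(values[start:stop])

    with ThreadPoolExecutor(max_workers=nt) as pool:
        partials = list(pool.map(partial, chunk_bounds(len(values), nt)))
    s = 0.0
    for p in partials:
        s += p
    return s


if __name__ == "__main__":
    print(parallel_multiply([1, 2, 3, 4, 5], [2, 2, 2, 2, 2], nt=2))
    print(parallel_sum(list(range(101))))
```

The element-wise loop needs no communication because the output slices are disjoint. The reduction needs one combining step at the end, which is the "requires communication" cost noted for sums.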

OpenMP (the easy stuff)
- We use MKL for SGEMMs.
- Use OpenMP for other routines.
Performance scaling on dual-socket Core i7 920:
- SGEMMs show the most significant speedup (highest work-to-communication ratio).
- Non-linear speedup suggests this won't scale well to more cores using this architecture and programming model.
- However, >7x speedup compared to Matlab and >4x speedup compared to sequential C.

CUDA (some harder stuff)
- CUDA is used to program Nvidia GPUs for general computation.
- GPU code is executed by many threads independently in an SPMD manner.
- Threads grouped into a thread block can share memory.
- Threads are physically executed in groups of 32, called warps. If all threads within a warp do the same thing, we get SIMD.
Example: a kernel definition and invocation for vector addition.
- The kernel is invoked with B blocks of N threads.
- Each thread operates on one element of each array.
- The element index is computed from the thread ID, block ID, and block size corresponding to the running thread.
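The indexing rule can be illustrated without a GPU. The sketch below is a serial Python emulation, not CUDA code: `launch` walks every (block, thread) pair that a launch of B blocks of N threads would create, and the kernel computes its element index as block ID times block size plus thread ID, which corresponds to `blockIdx.x * blockDim.x + threadIdx.x` in CUDA. The function names and the bounds guard are assumptions added for illustration.

```python
def vector_add_kernel(block_idx, thread_idx, block_dim, a, b, c):
    """Body executed by one emulated thread: add one element of a and b into c."""
    i = block_idx * block_dim + thread_idx
    if i < len(c):  # guard: threads in the last block may fall past the end
        c[i] = a[i] + b[i]


def launch(kernel, num_blocks, block_dim, *args):
    """Serially emulate a launch of num_blocks blocks with block_dim threads each."""
    for block_idx in range(num_blocks):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


def vector_add(a, b, block_dim=32):
    """Add two equal-length vectors, using one emulated thread per element."""
    n = len(a)
    num_blocks = (n + block_dim - 1) // block_dim  # round up to cover all elements
    c = [0.0] * n
    launch(vector_add_kernel, num_blocks, block_dim, a, b, c)
    return c


if __name__ == "__main__":
    print(vector_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], block_dim=2))
```

On a real GPU the two loops in `launch` disappear: all (block, thread) pairs run concurrently, with threads of a block scheduled in warps of 32.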

CUDA (some harder stuff)
NMF implementation in CUDA:
- SGEMMs use CUBLAS 2.1, which achieves 60% of peak (373 GFLOPS on GTX 280).
- Padding matrices to multiples of 32 reduces SGEMM running time by 26%.
- Element-wise arithmetic is similar to the example code.
- Reductions (sums) are a lot harder in CUDA than in OpenMP.
Reduction optimizations, in order of increasing optimization:
- Use the optimizations covered in the CUDA SDK for shared-memory reduction.
- Reorganize binary tree traversal.
- Loop unrolling, multiple reads per thread.
- Run the 30 sums concurrently. An important optimization.
- 57x speedup overall.
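Two of these ideas can be sketched in plain Python under stated assumptions. `round_up` and `pad_matrix` show padding dimensions to the next multiple of 32; zero fill is an assumption, since the slide does not say which fill value was used. `tree_sum` shows a binary-tree reduction with sequential addressing, the access pattern used by shared-memory reductions, emulated serially. None of this is the authors' CUDA code.

```python
def round_up(n, multiple=32):
    """Smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def pad_matrix(m, multiple=32, fill=0.0):
    """Pad a list-of-rows matrix so both dimensions are multiples of `multiple`."""
    rows, cols = len(m), len(m[0])
    padded_rows, padded_cols = round_up(rows, multiple), round_up(cols, multiple)
    padded = [list(row) + [fill] * (padded_cols - cols) for row in m]
    padded += [[fill] * padded_cols for _ in range(padded_rows - rows)]
    return padded


def tree_sum(values):
    """Binary-tree reduction: at each step, element i absorbs element i + stride."""
    data = list(values)
    n = 1
    while n < len(data):
        n *= 2
    data += [0.0] * (n - len(data))  # pad to a power of two with the additive identity
    stride = n // 2
    while stride > 0:
        for i in range(stride):  # on a GPU, each i would be handled by one thread
            data[i] += data[i + stride]
        stride //= 2
    return data[0]


if __name__ == "__main__":
    print([round_up(d) for d in (512, 30, 3445)])
    print(tree_sum(range(1, 11)))
```

For the problem size in this talk, 512 is already a multiple of 32, while 30 sources pad to 32 and 3445 frames pad to 3456. The tree reduction finishes in log2(n) steps instead of n - 1 sequential additions.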

CUDA vs. OpenMP
- CUDA achieves much higher performance on current GPUs for highly data-parallel computations (>30x speedup compared to Matlab, 4x faster than OpenMP+Nehalem).
- OpenMP can achieve multi-core speedup on data-parallel computations with very little programmer effort.
- If inter-thread communication is required, things become much more difficult: OpenMP gets harder, and CUDA gets a lot harder.
- For music application developers, CUDA is only feasible for computational kernels that require very high performance.
- What about the latency of going to the GPU and back?
- We will be releasing Python modules based on these implementations. They can be used for general NMF as well.

An idea for the future: Analysis/Performance Hybrid
Combine MIR analysis on a database of music in the cloud with audio synthesis techniques to create custom music controlled by gestural processing and personal preferences.
- Automatic mash-ups/remixes
- Gestural music selection (e.g. at a party)
- As little or as much interaction as desired
- Can be used in music performance or just for interactive listening

Brainstorm: Interactive Musical Experience
(Block diagram with components: Audio Database; Personal Preference + Collaborative Filtering; Music Information Retrieval; Controller; Audio Synthesis/Playback; Multi-touch interface; User Input; Sensors + Gestural Processing)

Wrap
- There are tons of music applications, for both music fans and musicians.
- Parallel computing enables new music applications, but synchronization and real-time are important.
- Parallel design patterns are useful for communicating ideas and organizing code.
Questions?