Musical Hit Detection

CS 229 Project Milestone Report

Eleanor Crane, Sarah Houts, Kiran Murthy

December 12, 2008

1 Problem Statement

Musical visualizers are programs that process audio input in order to produce aesthetically pleasing, audio-synchronized graphics. In popular music, instrumentation changes known as "hits" are an important indicator of changes in the music's mood. Ideally, a visualizer should respond to a hit by changing the mood of the displayed graphics to match the music. This project focuses on using machine learning techniques to detect hits, and therefore mood changes, in a song.

2 Approach

2.1 Data Collection

Our approach uses supervised learning to train a hit detection algorithm. Supervised learning is a natural fit because it is easy for a human operator to label hits within a song. Additionally, a hit detector should operate on only a small segment of music data near the current playback location, which facilitates hit detection in streaming music; the learning algorithm therefore only takes into account music data in the vicinity of hits.

We start by selecting one set of songs containing strong mood changes and another set of songs without mood changes. As each song plays, a human operator marks hit/no-hit locations in the song. At each specified mark, two 5-second musical clips are extracted from the song: one clip ending at the mark (the pre-mark clip) and one clip beginning at the mark (the post-mark clip). These musical clips and their associated hit/no-hit labels are imported into MATLAB, where they are fed into a supervised learning algorithm.
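As a concrete illustration of the clip extraction step, here is a minimal Python sketch (the report's pipeline is in MATLAB; the function and variable names here are illustrative, not the authors'):

```python
import numpy as np

def extract_clips(samples, fs, mark_sec, clip_sec=5.0):
    """Cut the pre-mark and post-mark clips around a labeled mark.

    samples  -- 1-D array of mono audio samples
    fs       -- sample rate in Hz
    mark_sec -- operator-labeled hit/no-hit time in seconds
    clip_sec -- clip length in seconds (the report uses 5-second clips)
    """
    mark = int(round(mark_sec * fs))
    n = int(round(clip_sec * fs))
    pre = samples[max(0, mark - n):mark]   # clip ending at the mark
    post = samples[mark:mark + n]          # clip beginning at the mark
    return pre, post

# Toy usage: 30 s of noise at 8 kHz, mark at t = 12 s.
fs = 8000
audio = np.random.randn(30 * fs)
pre, post = extract_clips(audio, fs, 12.0)
```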

A GUI assists the marking of hit/no-hit locations in songs; Figure 1 shows a screenshot.

Figure 1: Music hit/no-hit marking GUI

Dubbed "mampe", the GUI allows the user to load mp3 songs and choose a data file in which hit/no-hit labels and clip file names are stored. The GUI also lets the user play the song and, as the song comes across a hit/no-hit, click a button to automatically save the pre-mark/post-mark clips.

2.2 Feature Selection

Changes in song mood generally correspond to changes in:

- Beat frequency
- Beat amplitude
- Instrumentation (the current set of instruments playing)

From the pre- and post-mark clips, a feature describing the hit/no-hit status of the clips is calculated. Musical mood changes correspond to changes in music amplitude and instrumentation. While a change in amplitude can be assessed in the time domain, a change in instrumentation is expressed much more clearly in the frequency domain. To capture changes in amplitude and instrumentation with as few calculations as possible, the Power Spectral Densities (PSDs) of the pre- and post-mark clips are used to construct the feature vector. As Figure 2 shows, the absolute value of the difference of the 129-point pre- and post-mark PSDs serves as our feature vector.
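The PSD-difference feature can be sketched in Python using Welch's method. This is an illustration only: the authors worked in MATLAB, and the segment length of 256 (which yields 129 PSD bins) is an assumption chosen to match the 129-point PSDs mentioned above.

```python
import numpy as np
from scipy.signal import welch

def psd_feature(pre_clip, post_clip, fs, nperseg=256):
    """Feature = |PSD(post) - PSD(pre)|; nperseg=256 gives 129 frequency bins."""
    _, psd_pre = welch(pre_clip, fs=fs, nperseg=nperseg)
    _, psd_post = welch(post_clip, fs=fs, nperseg=nperseg)
    return np.abs(psd_post - psd_pre)

# Toy pre/post clips: the "hit" adds energy and a new partial,
# so the difference shows up in both amplitude and spectrum.
fs = 8000
t = np.arange(5 * fs) / fs
pre_clip = 0.1 * np.sin(2 * np.pi * 220 * t)
post_clip = np.sin(2 * np.pi * 220 * t) + np.sin(2 * np.pi * 880 * t)
feature = psd_feature(pre_clip, post_clip, fs)
```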

Figure 2: Feature creation algorithm

It should be noted that deriving the feature from the PSD yields better performance than deriving it from the Fast Fourier Transform (FFT). In particular, Figure 3 shows that the PSD feature's learning curve exhibits decreasing test-set error as more training samples are added, while the FFT feature's learning curve does not. This result implies that the PSD contains information more relevant to hits than the FFT.

Figure 3: Learning curve with FFT-based feature (left) and PSD-based feature (right)

A likely explanation for the improvement is that the differenced PSD better captures changes in musical intensity (waveform energy) than the FFT, and sudden changes in musical intensity are highly indicative of hits.

2.3 Training the Classifier

The hit detector takes the form of a Support Vector Machine (SVM), trained using a regularized, kernelized SMO algorithm. To test the trained SVM, hold-out cross-validation was used: 20% of the total data set was randomly held out of the training data in order to calculate test error.
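A scikit-learn sketch of this training setup (a Gaussian/RBF-kernel SVM with a 20% hold-out split) on synthetic stand-in features; this is an illustration, not the authors' implementation, and the labels below are generated artificially:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 129))                # stand-in PSD-difference features
y = (X[:, :10].sum(axis=1) > 0).astype(int)    # synthetic hit/no-hit labels

# Hold-out cross-validation: 20% of the data reserved for test error.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf", C=1.0)                 # regularized, Gaussian-kernel SVM
clf.fit(X_tr, y_tr)

train_err = 1.0 - clf.score(X_tr, y_tr)
test_err = 1.0 - clf.score(X_te, y_te)
margins = clf.decision_function(X_te)          # functional margins (hit confidence)
```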

Kernelization was chosen after our initial SVM training with non-kernelized features yielded 30% training and test-set error, an unacceptable level of performance. Since the training and test-set errors were similar, performance improvements could only come from reducing bias, so kernelization was used to increase the dimensionality of our feature space. Several kernels were evaluated; a Gaussian kernel gave the best performance, with 22% training and test-set error. This performance was adequate to reliably detect hits.

2.4 Classification

Given a pair of contiguous pre-mark and post-mark sound clips, the classifier labels the mark as a hit when the kernelized classification expression is non-negative:

\sum_{i=1}^{m} \alpha_i y^{(i)} K(x^{(i)}, x) + b \ge 0

During real-time song playback, the hit detector periodically computes the feature vector from pre- and post-mark clips relative to the current playback location, then uses the kernelized SVM classifier to decide whether that point in the song is a hit or a non-hit. The confidence of a hit can additionally be judged from the functional margin of the real-time feature.

3 Results

To test the hit detection algorithm, we developed a MATLAB GUI that displays hit information via a simple visualizer while playing test songs, shown in Figure 4. The visualizer shows the low-frequency portion of the FFT of the song data for the next second. The colors correspond to the hit or non-hit label the algorithm has assigned to that moment in the song: a non-hit is displayed in green and blue shades, while a hit is shown in a shade between yellow and red, moving from yellow toward red with increasing confidence.

The online classifier correctly identified hits in many popular-music songs. However, it also produced erroneous hits/no-hits, leading to two key observations:

1. Though musical hits are instantaneous events, they are frequently preceded by build-ups, such as drum rolls or guitar riffs. As a result, the algorithm detects hits during this build-up period with increasing confidence. For a brief period after a hit, the change also remains apparent in the feature vector, causing a taper-off from a hit classification back down to a non-hit classification in the new section after the hit.

Figure 4: Hit Display GUI

2. A high-volume vocal track without any other change in the underlying beat or instrumentation frequently resulted in false hit identifications. This may stem from a lack of vocals in the pre- and post-mark clips in our training data.

As MATLAB is not a real-time operating environment, we were not able to identify hits in real time. Instead, we pre-computed a hit classification for each second of the song to demonstrate how well the algorithm works while the song plays. A future version of the classifier should include a multi-threaded process, which could easily perform the same classification in real time.

4 Future Work

There are a number of areas that could be investigated to further improve hit classification. As previously mentioned, hits are sometimes preceded by a crescendo in the music, which can decrease the probability of a correct detection. This build-up error could be mitigated by inserting a gap between the pre-mark and post-mark clips and increasing the clip length; this would make the pre- and post-mark clips more distinct, allowing better classification.

To integrate hit detection with a visualizer, the hit detector's functional margin can be fused with other musical property sensors, such as beat detection for detecting tempo changes. For actual usage, this algorithm would need to be recast in C or C++ to enable real-time hit detection and appropriate visualization.
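As a closing illustration, the kernelized decision rule from Section 2.4 can be evaluated directly. The support vectors, multipliers, bias, and gamma below are purely hypothetical; a real model would obtain them from SMO training.

```python
import numpy as np

def gaussian_kernel(a, b, gamma=0.01):
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def classify(x, support_x, alpha, y, b, gamma=0.01):
    """Hit iff sum_i alpha_i * y_i * K(x_i, x) + b >= 0.

    The sum itself is the functional margin, usable as a confidence value.
    """
    margin = sum(a_i * y_i * gaussian_kernel(x_i, x, gamma)
                 for a_i, y_i, x_i in zip(alpha, y, support_x)) + b
    return margin >= 0, margin

# Tiny hypothetical model with two support vectors.
support = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
alphas = [1.0, 1.0]
labels = [1, -1]
is_hit, margin = classify(np.array([0.9, 0.1]), support, alphas, labels, b=0.0)
```

Here the query point lies close to the positive support vector, so the margin comes out positive and the point is labeled a hit.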