Automatic Music Transcription: The Use of a Fourier Transform to Analyze Waveform Data

Jake Shankman
Computer Systems Research, TJHSST
Dr. Torbert
29 May 2013

Table of Contents

Abstract
Background
Materials Used
My Approach
    Phase 1 Data Extraction
    Phase 2 Normalization
    Phase 3 FFT
    Phase 4 Pitch Mapping
    Phase 5 Rhythm Tracking
    Phase 6 Transcription
Results
Conclusions and Analysis
References

Abstract:

The concept of Automatic Music Transcription, or the creation of sheet music by a computer program, is an idea that has been around since at least the mid 70s (Galler and Piszczalski). For a computer to perform such a task, it must take a musical input from a sound file, usually of the .wav variety (from MIDI input), and perform an analysis of frequency and duration. Although this topic has been studied for over 30 years (Galler and Piszczalski), a general solution has yet to be found, and it remains an active area of research today (Lu).

A very popular method for determining the pitches comprising a sound file is to apply a Fast Fourier Transform (FFT). The FFT is most commonly chosen because of its speed advantage over a direct computation of the Discrete Fourier Transform. Both take waveform data in the time domain and convert it to a frequency domain that is suitable for music transcription. In essence, the FFT acts like a tuner on an individual time sample; pitch is returned at that specific time input. Because it is a very well defined function, the FFT has become a standard tool in attempts to perform Automatic Music Transcription.

Running an FFT alone will not necessarily map to pitch. To do so, every element returned from the FFT is linked directly to a known pitch value stored in a frequency file. Using Python, this frequency file is read into a dictionary. Then, an array of equal size to the FFT result is created whose elements are set to the pitch from the frequency file closest to the corresponding coefficient in the FFT output. The result images show the output of running .wav files through my program, amt.py. Engraving was done by Lilypond and represents standard sheet music in the treble clef.

Background:

The concept of Automatic Music Transcription, or the creation of sheet music by a computer program, is an idea that has been around since at least the mid 70s (Galler and Piszczalski). For a computer to perform such a task, it must take a musical input from a sound file, usually of the .wav variety (from MIDI input), and perform an analysis of frequency and duration. Although this topic has been studied for over 30 years (Galler and Piszczalski), a general solution has yet to be found, and it remains an active area of research today (Lu).

One possible function to employ in determining the pitch of a note is the autocorrelation function; this is a summation over portions of a note that takes into account the periodicity of the note and the lag of the system (Bello, Monti and Sandler). Bello, Monti and Sandler used this function in their own experimentation and achieved rather successful results.

Another approach to this problem is the use of a genetic algorithm. In his 2006 work, Lu analyzed simple musical patterns with a genetic algorithm and achieved high accuracy. A possible implementation is as follows. Initially, a base population of notes is bred. Using fitness functions, which measure how close the candidate notes are to the actual sound file, portions of the base population are removed. From there, the population breeds with itself to produce fitter offspring. Additionally, each offspring has the potential to undergo a mutation, completely changing its note structure. Ultimately, through this process of selection, the correct musical notation is converged upon (Lu).

A final, and very popular, method for determining the pitches in a sound file is to apply a Fast Fourier Transform (FFT). The FFT is chosen because of its speed advantage over a direct computation of the Discrete Fourier Transform. Both take waveform data in the time domain and convert it to a frequency domain that is suitable for music transcription. In essence, the FFT acts like a tuner on an individual time sample; pitch is returned at that specific time input. Because it is a very well defined function, the FFT has become a standard tool in attempts at Automatic Music Transcription.
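To make the autocorrelation idea concrete, here is a minimal sketch (not the system described by Bello, Monti and Sandler) that estimates the fundamental frequency of a short mono chunk by finding the lag at which the signal best correlates with itself; the file name and the frequency search range are assumptions for illustration.

```python
import numpy as np
from scipy.io import wavfile

def autocorrelation_pitch(samples, rate, fmin=50.0, fmax=2000.0):
    """Estimate the fundamental frequency of a mono chunk by locating
    the lag with the strongest autocorrelation."""
    samples = samples.astype(np.float64)
    samples -= samples.mean()                    # remove DC offset
    corr = np.correlate(samples, samples, mode='full')
    corr = corr[len(corr) // 2:]                 # keep non-negative lags only
    lag_min = int(rate / fmax)                   # shortest period considered
    lag_max = int(rate / fmin)                   # longest period considered
    best_lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return rate / best_lag                       # period in samples -> Hz

# Example usage on a short excerpt (the file name is hypothetical):
rate, data = wavfile.read('single_note.wav')
if data.ndim > 1:                                # mix stereo down to mono
    data = data.mean(axis=1)
print(autocorrelation_pitch(data[:4096], rate))
```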
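The genetic-algorithm approach can likewise be illustrated with a toy sketch. This is not Lu's system; it simply evolves candidate sequences of MIDI pitch numbers toward a known target sequence to show the select/breed/mutate loop, and every parameter here is illustrative.

```python
import random

TARGET = [60, 62, 64, 65, 67]          # toy "correct" transcription (MIDI numbers)
POP_SIZE, GENERATIONS, MUTATE_P = 50, 200, 0.1

def fitness(candidate):
    """Count how many notes match the target; higher is fitter."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def breed(a, b):
    """Single-point crossover of two parent note sequences."""
    cut = random.randrange(1, len(TARGET))
    return a[:cut] + b[cut:]

def mutate(candidate):
    """Occasionally replace a note with a random pitch."""
    return [random.randint(48, 84) if random.random() < MUTATE_P else n
            for n in candidate]

population = [[random.randint(48, 84) for _ in TARGET] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    survivors = population[:POP_SIZE // 2]       # selection of the fittest half
    children = [mutate(breed(random.choice(survivors), random.choice(survivors)))
                for _ in range(POP_SIZE - len(survivors))]
    population = survivors + children

print(max(population, key=fitness))              # best candidate found
```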

Materials Used:

Throughout the course of this research project, certain software libraries and computing setups were used. In order to best replicate the results of this research, one should try to create a setup containing the following components. Unless noted, the software should be of the latest release.

- Python (preferably 2.7 or higher)
- Gentoo Linux, or an appropriate OS
- Audacity
- Scipy
- Numpy
- Imri Goldberg's fft_utils.py
- Imri Goldberg's pytuner.py
- Lilypond
- Music21
- frequency.txt: a file containing all pitch values from 0 to D#8
- Various .wav files

The format one chooses to save the result output and the code editor used are irrelevant to the results of this research. That option is left to personal preference, along with a suitable image viewer for analyzing results.

My Approach:

To best tackle the problem of Automatic Music Transcription, I have chosen to divide my program into several phases. By splitting the program into smaller pieces, or phases, debugging, testing and modification become much simpler. Thus, I addressed the task of music transcription in phases dedicated to data extraction, normalization, mapping pitch to frequency through an FFT, rhythm analysis, and transcription, or output, of my results to a sheet music image file. The following sections provide detailed explanations of my process.

Phase 1 Data Extraction:

In order to perform Automatic Music Transcription, it is necessary to have audio data to transcribe. Due to their widespread popularity, smaller file size, universal support across operating systems and ease of availability, the audio files used in this project are in the .wav file format. Files like these contain the waveform data necessary to perform manipulation for transcription.

After choosing the .wav audio file to transcribe, it is necessary to make sure it is free of metadata. This requirement is due to the nature of my implementation of data extraction. To clear the metadata, one can simply open a program like Audacity, load the .wav file and edit the metadata through the file menu's Open Metadata Editor command. Clicking that option opens a GUI where the user must manually delete all metadata. Once that is done, save the cleaned file so its data is ready to be extracted.

Now that the metadata has been cleared away, data can be extracted. The tool used to do this is Scipy's own waveform data extractor, scipy.io.wavfile.read(). The sampling rate and an array of waveform data are returned, in that order.

Phase 2 Normalization:

Before performing a Fast Fourier Transform on the returned waveform data, it is necessary to normalize said data using a window. This is because the FFT assumes the waveform to be a continuous data set due to the sinusoidal nature of the function. Without any normalization, or process by which the data is made continuous, an anomaly called spectral leakage will occur. Spectral leakage spills audio data into other bins, creating an excess amount of noise; the FFT becomes muddied by this noise and the results are far less likely to be usable. Normalization of my data set is done using Scipy's Hann function, scipy.signal.hann(). The data set becomes windowed and appears continuous to the FFT, making for better accuracy in pitch estimation. A short sketch of these two phases appears below.
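A minimal sketch of Phases 1 and 2, assuming a metadata-free .wav file (the file name matches the recording used in the Results section):

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import hann

# Phase 1: extract the sampling rate and the raw waveform samples.
rate, samples = wavfile.read('couchplayin2.wav')
if samples.ndim > 1:                     # collapse stereo to mono if necessary
    samples = samples.mean(axis=1)
samples = samples.astype(np.float64)

# Phase 2: apply a Hann window so the data appears continuous to the FFT,
# reducing spectral leakage.
windowed = samples * hann(len(samples))
```

In practice the window is applied to each analysis chunk rather than the whole file, as shown in the Phase 3 sketch that follows.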

Phase 3 FFT:

After normalizing the waveform audio data, it is possible to extract pitch by performing an FFT. This function takes input in the time domain and outputs coefficients in the frequency domain. The FFT is a summation function performed by manipulating sine and cosine expansions. As previously mentioned, the input data from the .wav file is in the time domain. Running it through the FFT returns an array whose indices and coefficients correspond to the pitch content at a given time. Essentially, this maps waveform audio data to a discernible frequency, otherwise known as pitch.

Due to issues with noise, only the peak value of each index yields the appropriate pitch. Additionally, to ensure greater accuracy, the FFT is performed on smaller subdivisions of the sound file; these chunks allow the algorithm to pick out small pieces and determine pitch values that would otherwise be missed. Chunks are necessary for this mapping because otherwise the FFT would only return one pitch value; the more chunks there are, the more data is mapped to pitch. A sketch of this chunked analysis appears at the end of the next phase.

Phase 4 Pitch Mapping:

Running an FFT alone will not necessarily map to pitch. In order to do so, every element returned from the FFT is linked directly to a known pitch value, in the form of the previously mentioned frequency file. Using Python, this frequency file is read into a dictionary. Then, an array of equal size to the FFT result is created whose elements are set to the pitch from the frequency file closest to the corresponding coefficient in the FFT output. Processing the data like this prepares it for the next step in the transcription process. At this point, pitch detection is done. Each index represents a point in the audio file and contains the corresponding pitch, as given by the FFT. Now all that is left to do is determine rhythm and transcribe.
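A minimal sketch combining Phases 3 and 4, assuming the samples and rate extracted above, a chunk size of 4096 samples, and a frequency.txt laid out as one pitch-name/frequency pair per line; the chunk size and file layout are assumptions and would need to match the actual frequency file.

```python
import numpy as np
from scipy.signal import hann

CHUNK = 4096                                     # samples per analysis chunk (assumed)

def load_frequency_table(path='frequency.txt'):
    """Read the pitch table into a dict of {frequency_in_hz: pitch_name},
    assuming lines of the form 'A4 440.00'."""
    table = {}
    with open(path) as f:
        for line in f:
            name, hz = line.split()
            table[float(hz)] = name
    return table

def nearest_pitch(freq, table):
    """Map an FFT peak frequency to the closest known pitch name."""
    return table[min(table, key=lambda hz: abs(hz - freq))]

def transcribe_pitches(samples, rate, table):
    """Phase 3: FFT each windowed chunk and take its peak bin.
    Phase 4: convert each peak frequency to the nearest pitch name."""
    pitches = []
    window = hann(CHUNK)
    for start in range(0, len(samples) - CHUNK, CHUNK):
        chunk = samples[start:start + CHUNK] * window
        spectrum = np.abs(np.fft.rfft(chunk))
        peak_bin = int(np.argmax(spectrum[1:])) + 1    # skip the DC bin
        peak_freq = peak_bin * rate / CHUNK            # bin index -> Hz
        pitches.append(nearest_pitch(peak_freq, table))
    return pitches

# Example usage, continuing from the extraction sketch above:
table = load_frequency_table()
pitches = transcribe_pitches(samples, rate, table)
```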

Phase 5 Rhythm Tracking:

Rhythm tracking is no easy task. Performing it requires recursively running through the mapped pitch data, applying Bayesian networks and constantly checking back on the result. With that in mind, rhythm tracking allows us to determine note length, tempo, rests and the other facets needed to appropriately place our sheet music on the staff.

Phase 6 Transcription:

With rhythm and pitch both mapped for each note, it is now time to transcribe the audio file to digital sheet music. Music21, a free toolkit from MIT, easily allows us to do this. All that must be done is open a stream object from Music21 in Python and then append all of our notes; each index in the pitch and rhythm arrays corresponds to a note. To append, the arrays are looped over at each index, creating a note whose pitch and duration take the values at the corresponding pitch and rhythm indices. Upon completion of the loop, a virtual staff exists containing notes derived from the initial waveform audio file. Music21 is unable to display sheet music on its own, so the program enlists the help of Lilypond to engrave the stream as a digital image. By calling the show('lily') method, Music21 completes the transcription process; data gathered by manipulating audio is finally put down in a standard notation that a musician can read.
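A minimal sketch of this transcription phase, assuming parallel lists of pitch names and quarter-note durations produced by the earlier phases (the example values below are placeholders), with Lilypond installed so the final call can render an image:

```python
from music21 import stream, note

pitches = ['C4', 'E4', 'G4', 'C5']      # output of the pitch-mapping phase (example values)
durations = [1.0, 1.0, 0.5, 1.5]        # note lengths in quarter notes (example values)

score = stream.Stream()
for pitch_name, length in zip(pitches, durations):
    n = note.Note(pitch_name)           # create a note at the detected pitch
    n.quarterLength = length            # set its duration from the rhythm data
    score.append(n)

score.show('lily')                      # engrave the stream as an image via Lilypond
```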

Results:

The following images show the results of running .wav files through my program, amt.py. Engraving was done by Lilypond and represents standard sheet music in the treble clef.

Illustration 1: Results from couchplayin2.wav, 19 Feb 2013

Illustration 2: 1st Result from couchplayin2.wav, 28 Feb 2013

Illustration 3: 2nd Result from couchplayin2.wav, 28 Feb 2013

Illustration 4: 3rd Result from couchplayin2.wav, 28 Feb 2013

Illustration 5: 4th Result from couchplayin2.wav, 28 Feb 2013

Illustration 6: 5th Result from couchplayin2.wav, 28 Feb 2013

Conclusions and Analysis:

An analysis of the output data shown in the preceding section has interesting implications for my automatic music transcription program. Clearly, in its current state, my program is incapable of accurately producing tempo and rhythm (it displays all notes as half notes). This was more a design choice than an experimental or programming error; there was not enough time to attempt both pitch detection and rhythm tracking, so the more important of the two tasks was chosen.

In regards to the accuracy of my pitch-detection algorithm, the results obtained are not favorable. With a basic knowledge of music, it is very clear that all six trials of couchplayin2.wav do not map accurately to what is being played. Reading the sheet music while listening to the file clearly indicates that the frequencies presented don't match. Unfortunately, without knowing the exact notes played in that audio file, it is impossible for me to determine the degree of error in my code.

When using self-recorded audio files of known note frequency, I am still unable to obtain any useful results. While I do know the notes being played in these files, my program returns a blank piece of sheet music for all of these alternative trials. This is most likely due to noise interference from a combination of the camera, recording software and recording environment. Although these sound files would be useful in determining the accuracy of my program, they ultimately have little use due to noise-processing issues.

One interesting discovery comes from the various couchplayin2.wav results. The different trials each featured a different manipulation of two parameters in the algorithm: noise and time duration. By manipulating the threshold for noise detection, I am able to determine which frequencies are phased in and out of my sheet music. Meanwhile, changing the length of the sample given to the FFT manipulates these same frequencies due to the sinusoidal windowing. With each successive trial, duration and the noise threshold were lowered; ironically, this produced results that are plausibly more in line with what the actual sheet music for couchplayin2.wav would look like. This leads me to the conclusion that noise and duration do have an impact on music transcription, but to an unknown degree; more experimentation must be done to determine their exact effect.

Finally, another unique piece of information was gathered. More trials were run on couchplayin2.wav than are reported in the results section. However, all of these unreported datasets were run using the same noise and time parameters as the results displayed above. Every time the code was run on this file under the same conditions, the exact same result was returned. Consistency like this is important and leads me to better trust the validity of my results.

As previously mentioned, more trials must be run using my code. These trials will consist of manipulations of both time and noise, to determine the optimal levels for a given sound file, as well as runs on various other audio files to determine whether my program has universal applications. After that, I will be able to take the current state of my program from a pseudo-tuner to transcription software, complete with a more advanced graphical user interface.

References:

Bacon, R. A., Carter, N. P., and Messenger, T. "The Acquisition, Representation and Reconstruction of Printed Music by Computer: A Review." Computers and the Humanities, Vol. 22, No. 2 (1988): 117-136. JSTOR. Web. 9 March 2012.

Bello, Juan Pablo, Monti, Giuliano, and Sandler, Mark. "Techniques for Automatic Music Transcription." King's College London. Web. 9 March 2012.

Cemgil, Ali Taylan. "Bayesian Music Transcription." PDF file. 14 Sept 2004. Web. 12 Nov 2012.

Cheng, Xiaowen, Hart, Jarod V., and Walker, James S. "Time-frequency Analysis of Musical Rhythm." PDF file. Web. 2 Jan 2013.

Galler, Bernard A., and Piszczalski, Martin. "Automatic Music Transcription." Computer Music Journal, Vol. 1, No. 4 (1977): 24-31. JSTOR. Web. 9 March 2012.

Galler, Bernard A., and Piszczalski, Martin. "Computer Analysis and Transcription of Performed Music: A Project Report." Computers and the Humanities, Vol. 13, No. 3 (1979): 195-206. JSTOR. Web. 9 March 2012.

Glover, John, Lazzarini, Victor, and Timoney, Joseph. "Python for Audio Signal Processing." The Sound and Digital Music Research Group, National University of Ireland. PDF file. Web. 8 Jan 2013.

Goldberg, Imri. base_tools.py. 2007. Python file.

Goldberg, Imri. fft_utils.py. 2007. Python file.

Goldberg, Imri. pytuner.py. 2007. Python file.

Klapuri, Anssi. "Automatic Music Transcription." Institute of Signal Processing, Tampere University of Technology. PDF file. Web. 12 Nov 2012.

LDS Dactron. "Understanding FFT Windows." 2003. PDF file. Web. 22 Oct 2012.

Lomont, Chris. "The Fast Fourier Transform." lomont.org. Jan 2010. Web. 11 Sept 2012.

Lu, David. "Automatic Music Transcription Using Genetic Algorithms and Electronic Synthesis." 25 April 2006. Web. 9 March 2012.

"Performing FFT Spectrum Analysis." Avant!. PDF file. Web. 8 Jan 2013.

Raphael, Christopher. "Automated Rhythm Transcription." Department of Mathematics and Statistics, University of Massachusetts, Amherst. PDF file. 21 May 2001. Web. 12 Nov 2012.

Sek, Michael. "Frequency Analysis: Fast Fourier Transform (FFT)." Victoria University. PDF file. Web. 9 Oct 2012.

Takeda, Haruto, Nishimoto, Takuya, and Sagayama, Shigeki. "Rhythm and Tempo Analysis Towards Automatic Music Transcription." Graduate School of Science and Technology, University of Tokyo. PDF file. Web. 2 Jan 2013.

Wellhausen, Jens. "Towards Automatic Music Transcription: Extraction of MIDI-Data out of Polyphonic Piano Music." Aachen University, Institute of Communications Engineering. Web. 9 March 2012.