Robert Alexandru Dobre, Cristian Negrescu


ECAI 2016 - International Conference, 8th Edition: Electronics, Computers and Artificial Intelligence, 30 June - 02 July 2016, Ploiesti, Romania

Automatic Music Transcription Software Based on Constant Q Transform

Telecommunications Department, Politehnica University of Bucharest, Bucharest, Romania
rdobre@elcom.pub.ro

Abstract: The paper presents automatic music transcription software that uses the constant Q transform as its time-frequency analysis tool, showing why this transform is better suited than the discrete Fourier transform for this application. The software can transcribe melodic structures played in any musical key by any equal-tempered instrument. The input of the software is an audio file containing the recording to be transcribed, and the output is the musical score in standard notation as a PDF file. MIDI (Musical Instrument Digital Interface) files can also be exported, which is helpful in music production processes.

Keywords: music transcription; constant Q transform; music production

I. INTRODUCTION

Music is stored in documents in the form of scores or sheets. A musical score is a graphical representation of a musical work using standard symbols. These symbols correspond to the characteristics of each sound, the most important being its pitch (directly linked to its frequency), its duration, and its temporal placement in the song. The symbol that stores the pitch and duration of a sound is called a musical note. Songs consist of two parts: a harmonic part and a melodic part. The harmonic structure is represented by the chords that form the accompaniment for the melody. The melodic structure is the melody itself, the part that listeners remember. Being iconic for a musical work, the melodic structure is of great importance, and it is the one targeted by this paper.
Music transcription [1][2] is the process of obtaining the musical score of a melody starting from a recording of it. Traditionally this is done by people who have absolute pitch and vast musical experience. Absolute pitch is the ability, found in relatively few people, to recognize the pitch of a sound without a reference. Musical experience is needed to recognize the rhythmic properties of the song and to accurately determine the duration and placement of the notes. Even for these gifted people, music transcription remains a time-consuming process. Parts must be played repeatedly, live or from a recording. Music scores are usually written by hand and then entered note by note into computer software to obtain a typography-ready version, just as a manuscript is turned into a book using word processors. The paper presents an algorithm that can do all this time-consuming work, delivering directly the final, ready-to-print version of the musical score starting from a recording, which any songwriter can easily make since most of today's multimedia devices (smartphones, music players, etc.) have a record function. The algorithm runs without assistance [3]. MIDI files can also be exported by the developed software, which comes in handy if the melody is to become part of a production that has already started. Since many multimedia productions are made by teams that work independently and far from one another, possibly with different software, MIDI, a format accepted by most multimedia tools, provides an easy way to digitally represent and transfer ideas. MIDI can also be visualized using programs called sequencers, and it may be preferred over a standard musical score in some modern styles of music. The paper is structured in four sections, starting with this introduction.
Section II details the constant Q transform [4] and its advantages over the discrete Fourier transform (DFT) in this particular application, Section III describes the actual music transcription algorithm, and Section IV presents results obtained using the software.

II. THE CONSTANT Q TRANSFORM

Since a musical score contains information about the pitch (frequency) and duration of the notes, it is, at first glance, very similar to a spectrogram. A spectrogram is a three-dimensional representation of a signal in which time and frequency are two orthogonal axes and the third dimension, the magnitude of the spectral components, is illustrated using different colors. These similarities suggest that a music transcription system can be built starting from a well-chosen spectrogram. To obtain a spectrogram, a transformation must be applied to the signal to determine its spectral content. The most intensively used frequency analysis tool is the discrete Fourier transform [5]. To show why this is not the optimal tool for spectral analysis of musical signals, the frequencies of musical notes in equal-tempered keys must be discussed. There are 12 musical notes per octave. How frequencies are allocated to each note depends on how the instrument is tuned. Most of today's instruments are equal tempered, meaning that the frequencies of the notes form a geometric progression. The ratio between the frequencies of two notes placed an octave apart is 2. Combining this with the number of notes per octave and the geometric progression constraint, the common ratio is:

r = 2^{1/12}    (1)

The DFT of a signal is computed using the following formula:

X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N}    (2)

where N is the length of the sequence, x(n) is the time-domain signal, and X(k) is its DFT. N complex spectral coefficients, called bins, are obtained. Given the sampling frequency f_s, the frequency of each coefficient is:

f_k = k f_s / N    (3)

It can be observed that the frequency domain is uniformly split into N equal intervals and the spectral resolution of this analysis is constant. This kind of analysis is equivalent to a bank of N band-pass filters with the central frequencies given by (3). For the resolution to remain constant, the quality factor (Q) must differ from filter to filter. For the filter centered on frequency f_k, with bandwidth \Delta f = f_s / N, the quality factor is:

Q_k = f_k / \Delta f = (k f_s / N) / (f_s / N) = k    (4)

This analysis is not well suited to signals whose spectral components lie at frequencies that follow a geometric progression, like musical signals. A time-frequency analysis tool that mimics human hearing, with better spectral resolution at low frequencies than at high frequencies, is more efficient in this case. In the terms above, this means a transformation equivalent to a filter bank of band-pass filters sharing the same quality factor. The bandwidths of the filters are then equal on a logarithmic frequency scale, just as the human hearing model suggests.
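To make the mismatch concrete, the short Python sketch below (the paper's implementation is in Matlab; the sampling rate and DFT length here are arbitrary example values) contrasts the constant spacing of the DFT bins from (3) with the geometric spacing of equal-tempered note frequencies:

```python
import numpy as np

# Equal-tempered note frequencies form a geometric progression with
# common ratio 2**(1/12); A4 = 440 Hz is the usual reference pitch.
def note_freq(semitones_from_a4):
    return 440.0 * 2.0 ** (semitones_from_a4 / 12.0)

fs = 8000          # sampling frequency in Hz (example value)
N = 1024           # DFT length (example value)
dft_bin_freqs = np.arange(N) * fs / N   # f_k = k * fs / N, as in (3)

# The DFT grid is linear: bin spacing is constant at fs/N...
spacing = np.diff(dft_bin_freqs[:4])
assert np.allclose(spacing, fs / N)

# ...while note frequencies are geometric: one octave always doubles
# the frequency, so low notes fall between DFT bins while high notes
# are over-resolved.
assert np.isclose(note_freq(12) / note_freq(0), 2.0)
```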
The central frequencies of the filters must follow the rule:

f_k = f_{min} \cdot 2^{k/b}, k = 0, 1, ...    (5)

where b is the number of filters per octave and f_{min} is the frequency of the first filter (the lowest frequency that is analyzed). To cover the entire frequency domain, the bandwidth of the filter centered at f_k, denoted \Delta f_k, can be computed using:

\Delta f_k = f_{k+1} - f_k = f_k (2^{1/b} - 1)    (6)

It can easily be shown that such a transformation uses the same quality factor for every filter:

Q = f_k / \Delta f_k = 1 / (2^{1/b} - 1)    (7)

By choosing b = 12 and a suitable lowest frequency (depending on the analyzed musical instrument), each frequency bin is placed exactly at the frequency of a note, so the analysis tool is perfectly adapted to the content to be analyzed. The filter analogy is useful for understanding the advantages of the constant Q transform over the DFT in this particular situation. The expression for actually computing the constant Q transform is obtained by modifying (2) so as to impose constant quality factor filters. Since the spectral resolution differs from bin to bin, the temporal resolution also differs, meaning that a different number of samples N_k is used for the computation of each bin, and a normalization factor is needed. This way:

N_k = \lceil Q f_s / f_k \rceil    (8)

X_{cq}(k) = (1 / N_k) \sum_{n=0}^{N_k - 1} w(n, k) x(n) e^{-j 2\pi Q n / N_k}    (9)

where w(n, k) is a window function. The algorithm to determine the constant Q transform of a signal is:
- Choose the lowest frequency used in the analysis, f_{min}, based on the musical instrument that plays the melody.
- Choose the number of octaves N_{oct} of interest (the frequency range of the instrument).
- Choose the number of bins per octave, b (12 gives one bin per musical note).
- Determine the number of bins to be calculated, K = \lceil b \cdot N_{oct} \rceil, where \lceil \cdot \rceil denotes rounding up to the next integer.
- Compute the quality factor Q using (7), the window lengths N_k using (8), and then X_{cq}(k) using (9) for k < K.
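The steps above can be sketched directly in Python (rather than the authors' Matlab). The Hamming window, sampling rate, and test tone are assumptions for illustration, and no efficiency tricks such as precomputed spectral kernels are used:

```python
import numpy as np

def constant_q_transform(x, fs, f_min, n_octaves, b=12):
    """Naive constant Q transform following the steps in the text.
    x: input signal, fs: sampling frequency in Hz, f_min: lowest
    analyzed frequency, n_octaves: analyzed range, b: bins/octave."""
    Q = 1.0 / (2.0 ** (1.0 / b) - 1.0)     # same Q for every bin
    K = int(np.ceil(b * n_octaves))        # total number of bins
    cq = np.zeros(K, dtype=complex)
    for k in range(K):
        f_k = f_min * 2.0 ** (k / b)       # geometric bin centers
        N_k = int(np.ceil(Q * fs / f_k))   # per-bin window length
        n = np.arange(N_k)
        w = np.hamming(N_k)                # window choice is assumed
        # Normalized, windowed DFT-like sum at constant quality factor.
        # All windows start at the signal beginning here (no hopping).
        cq[k] = np.sum(w * x[:N_k] * np.exp(-2j * np.pi * Q * n / N_k)) / N_k
    return cq

# Quick check: a pure tone at a bin center peaks at that bin.
fs = 8000
t = np.arange(fs) / fs
f0 = 110 * 2 ** (7 / 12)                   # bin k = 7 for f_min = 110 Hz
x = np.sin(2 * np.pi * f0 * t)
spec = np.abs(constant_q_transform(x, fs, f_min=110, n_octaves=3))
assert spec.argmax() == 7
```

A practical implementation would hop this transform along the signal to build the constant Q spectrogram used in Section III.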

Figure 1. DFT of a signal composed of three sine waves with equal amplitudes and frequencies an octave apart.

Figure 2. Constant Q transform of a signal composed of three sine waves with equal amplitudes and frequencies an octave apart.

The difference between the DFT and the constant Q transform of a sum of sine waves with equal amplitudes and frequencies forming a geometric progression with common ratio 2 can be observed in Fig. 1 and Fig. 2. The constant Q transform determines the amplitudes correctly because the frequencies of the components are perfectly aligned with the corresponding frequency bins, whereas in the DFT case some components fall between bins and their amplitudes are not correctly determined. The constant Q transform also has other applications in music [6].

III. THE MUSIC TRANSCRIPTION ALGORITHM

The block diagram of the automatic music transcription algorithm is depicted in Fig. 3. It was implemented in Matlab. The input file containing the recording of the melody is loaded into the program, and the constant Q spectrogram of the signal is computed using 24 bins per octave; it serves as the base for all remaining steps. The resolution is half a semitone, the semitone being the frequency difference between two consecutive notes. The user must set only the rhythm information: the tempo in beats per minute (BPM, usually quarter notes per minute). This is easy to determine manually but very hard to determine automatically from a melodic structure alone.

Figure 3. The block diagram of the automatic music transcription algorithm.

The remaining operations are done on frames, not on the whole signal. The temporal length of each frame depends on the parameters of the spectral analysis tool, and overlapping frames can be used. For each frame, if its energy E is greater than a threshold value E_t, the candidate peaks are determined.
These must have an amplitude greater than 75% of an average maximum peak value computed from the greatest peaks of the 10 previous and 10 following frames. The candidates must also be spaced at least one semitone apart. From the candidates, the peak that gives the pitch of the note is selected using a recursive rule: the candidate peak placed at the lowest frequency is called the main candidate, and if another peak placed an octave below the main candidate's frequency has an amplitude greater than 10% of the main candidate's amplitude, that peak becomes the new main candidate. This repeats until the main candidate is no longer replaced. The position of the final peak (its bin number) then gives the frequency of the note via (5). An entry containing the temporal information (given by the frame number) and the frequency of the note is added to a time-frequency table. Since the durations of the notes can only take certain values depending on the BPM, corrections can
be made if errors occur in the phase in which the time-frequency table is determined. In this way, very short notes (shorter than 1/32 of a quarter note) are considered errors and are aligned with their neighbors. The mechanism can be observed in Fig. 4.

Figure 4. Highlight of the frequency determination error correction and of the duration measurement.

Based on the aforementioned table, each frequency is converted into a note name (Do, Re, Mi, etc.). The duration of each note in seconds, denoted \Delta t in Fig. 4, is measured and then, using the BPM information, converted into a musical duration (whole, half, quarter, eighth, etc.) using the formula:

D = 240 / (BPM \cdot \Delta t)    (10)

A duration D equal to 1 means a whole note, 2 a half note, 4 a quarter note, and so on. Finally, a free external program (LilyPond) is called which, based on the table now containing musical durations and pitches, generates the PDF file representing the musical score. A free Matlab MIDI library [7] can be used to also write the MIDI files.

IV. RESULTS

Tests were done using piano, violin, and guitar recordings. The score obtained for the well-known Pink Panther jazz theme played on a synthesized piano is presented in Fig. 5 exactly as produced by the proposed software.

Figure 5. The musical score obtained using the proposed software for the Pink Panther jazz theme.

The time-frequency representation used to generate the musical score in Fig. 5 is illustrated in Fig. 6. It can be observed that the algorithm successfully corrected the frequency detection error highlighted in Fig. 4.

Figure 6. The final, corrected representation of the time-frequency table used to generate the musical score for the Pink Panther jazz theme.

The algorithm gives good results even for fast musical phrases. The score presented in Fig. 7 was obtained for fast violin arpeggios.

Figure 7. The score obtained for fast violin arpeggios.
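Two of the per-note decisions described in Section III can be sketched as follows, in Python rather than the authors' Matlab. The 75% candidate threshold, the 10% octave rule, and the duration relation D = 240 / (BPM * dt) follow the text; the function names, data layout, frequency tolerance, and rounding to the nearest power of two are assumptions:

```python
import math

def select_pitch_peak(peaks, amp_threshold, ratio=0.10, tol=0.03):
    """peaks: {frequency_hz: amplitude} for one frame's spectral peaks.
    Candidates pass the amplitude threshold (the 75%-of-average-maximum
    rule in the text); the main candidate starts at the lowest-frequency
    candidate and descends an octave while some peak near half its
    frequency holds more than `ratio` (10%) of its amplitude.
    `tol` is a relative frequency tolerance (an assumption)."""
    main_f = min(f for f, a in peaks.items() if a >= amp_threshold)
    while True:
        below = [f for f, a in peaks.items()
                 if abs(f - main_f / 2.0) <= tol * main_f
                 and a > ratio * peaks[main_f]]
        if not below:
            return main_f          # no stronger-enough peak an octave down
        main_f = below[0]          # octave error: descend and re-check

def musical_duration(dt_seconds, bpm):
    """D = 240 / (BPM * dt): 1 = whole, 2 = half, 4 = quarter, 8 = eighth.
    Rounding to the nearest power of two is an assumption; the paper
    states only the conversion formula."""
    exact = 240.0 / (bpm * dt_seconds)
    return 2 ** round(math.log2(exact))

# An octave-error case: 220 Hz is the lowest candidate, but the weak
# 110 Hz peak (20% of its amplitude) wins as the true fundamental.
assert select_pitch_peak({110.0: 0.2, 220.0: 1.0, 440.0: 0.9},
                         amp_threshold=0.75) == 110.0
```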
The representation of the time-frequency table associated with this experiment is depicted in Fig. 8. The proposed software gives good results on the simple melodic structures that form most of today's music. For more complex melodies containing more than one note at a time, multiple fundamental frequency estimators must be included.

Figure 8. The representation of the time-frequency table used to generate the musical score for the violin melody.

REFERENCES

[1] E. Benetos, S. Dixon, D. Giannoulis, H. Kirchhoff, and A. Klapuri, "Automatic music transcription: challenges and future directions," Journal of Intelligent Information Systems, vol. 41, pp. 407-434, July 2013.
[2] A. Klapuri and M. Davy, Signal Processing Methods for Music Transcription, Springer Science & Business Media, 2007.
[3] S. Sigtia, E. Benetos, N. Boulanger-Lewandowski, T. Weyde, A. S. d'Avila Garcez, and S. Dixon, "A hybrid recurrent neural network for music transcription," 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2061-2065, 2015.
[4] C. Schörkhuber and A. Klapuri, "Constant-Q transform toolbox for music processing," 7th Sound and Music Computing Conference, 2010.
[5] C. Marghescu and A. Drumea, "Modelling and simulation of energy harvesting with solar cells," Proc. of Advanced Topics in Optoelectronics, Microelectronics, and Nanotechnologies 2014, pp. 92582L-92582L8, February 2015.
[6] C. Schörkhuber, A. Klapuri, and A. Sontacchi, "Audio pitch shifting using the constant-Q transform," Journal of the Audio Engineering Society, vol. 61, no. 7/8, pp. 562-572, July 2013.
[7] T. Eerola and P. Toiviainen, MIDI Toolbox: MATLAB Tools for Music Research, 2004.
