A Variable Resolution transform for music analysis

A Variable Resolution transform for music analysis

Aliaksandr Paradzinets and Liming Chen

A Research Report, Lab. LIRIS, Ecole Centrale de Lyon, Ecully, June 2009

A Variable Resolution transform for music analysis

Aliaksandr Paradzinets and Liming Chen

Laboratoire d'Informatique en Images et Systèmes d'information (LIRIS), Département MI, Ecole Centrale de Lyon, University of Lyon, 36 avenue Guy de Collongue, Ecully Cedex, France; {aliaksandr.paradzinets; liming.chen}@ec-lyon.fr

Abstract: This paper presents a novel music representation using a Variable Resolution Transform (VRT) which is particularly well adapted for music audio analysis. The VRT is inspired by the continuous wavelet transform, but applies a different wavelet function at each scale. This gives the transform the flexibility to follow the logarithmic scale of musical note frequencies while maintaining good time and frequency resolution. As an example of application of this novel VRT, a multiple F0 detection algorithm is presented and evaluated, showing convincing results. Furthermore, a direct comparison with the FFT applied within the same algorithm is also provided.

Index Terms: music representation, music analysis, variable resolution transform, multiple fundamental frequency estimation

I. INTRODUCTION

As a major entertainment product, there is a huge amount of digital musical content produced, broadcast, distributed and exchanged. Consequently there is a rising demand for better ways of cataloging, annotating and accessing these musical data. This in turn has motivated intensive research activity in music analysis, content-based music retrieval, etc. The primary stage in any kind of audio signal processing is an effective audio signal representation. While there exist some algorithms performing music data analysis in the time domain, for example some beat detection algorithms, the majority of music processing algorithms perform their computation in the frequency domain, or in a time-frequency representation, to be exact. The performance of all further processing steps is therefore strictly dependent on the initial data representation.
As compared to a vocal signal, a music signal is likely to be more stationary and owns some very specific properties in terms of musical tones, intervals, chords, instruments, melodic lines, rhythms, etc. [1]. While many effective and high-performance music information retrieval (MIR) algorithms have been proposed [2-9], most of

these works unfortunately tend to consider a music signal as a vocal one and make use of MFCC-based features, which are primarily designed for speech signal processing. Mel Frequency Cepstrum Coefficients (MFCC) were introduced in the 1960s and have been used since then for speech signal processing. The MFCC computation averages the spectrum in sub-bands and provides average spectrum characteristics. While MFCCs tend to capture the global timbre of a music signal and are claimed to be of use in music information retrieval [10; 11], they cannot characterize the aforementioned music properties as needed for perceptual understanding by human beings and quickly find their limits [12]. Recent works suggest combining spectral similarity descriptors with high-level analysis in order to overcome the existing ceiling [13]. The Fast Fourier Transform and the Short-Time Fourier Transform have been the traditional techniques in audio signal processing. This classical approach is very powerful and widely used owing to its great advantage of rapidity. However, a special feature of musical signals is the exponential law of note frequencies. The frequency and time resolution of the FFT is linear and constant across the frequency scale, while human perception of sound is logarithmic according to the Weber-Fechner law (including loudness and pitch perception). Indeed, as is well known, the frequencies of notes in the equally-tempered tuning system in music follow an exponential law (with each semitone the frequency is multiplied by a factor of 2^(1/12)). If we consider the frequency range covered by each octave, this range grows as the octave number increases. Thus, to cover a wide range of octaves with a fine frequency grid, large windows are necessary in the case of the FFT; this affects the time resolution of the analysis. On the contrary, the use of small windows makes resolving the frequencies of neighboring notes in low octaves almost impossible.
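This mismatch can be made concrete with a small sketch (in Python; the 44.1 kHz sample rate and 4096-sample window are our own illustrative choices, not values from the paper) comparing the spacing of adjacent equal-tempered notes with the constant bin width of an FFT:

```python
import math

# Illustration: adjacent equal-tempered notes in a low octave are
# closer together than one FFT bin, while high-octave notes are far
# wider apart than one bin.

def note_freq(midi_note: int) -> float:
    """Equal temperament: each semitone multiplies f by 2**(1/12)."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)

SAMPLE_RATE = 44100
WINDOW = 4096                      # a fairly large FFT window
bin_width = SAMPLE_RATE / WINDOW   # constant resolution, ~10.8 Hz

# spacing E1 -> F1 (MIDI 28, 29) vs B5 -> C6 (MIDI 83, 84)
low_gap = note_freq(29) - note_freq(28)    # ~2.45 Hz, below bin width
high_gap = note_freq(84) - note_freq(83)   # ~58.7 Hz, well above it

print(f"bin: {bin_width:.2f} Hz, low gap: {low_gap:.2f} Hz, "
      f"high gap: {high_gap:.2f} Hz")
```

Even with this large window, the low-octave semitone falls below one bin width, so neighboring bass notes cannot be separated.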
The ability to catch all octaves in music with the same frequency resolution is essential for music signal analysis, in particular for the construction of melodic similarity features. In this paper, we propose a new music signal analysis technique, the variable resolution transform (VRT), particularly suited to music signals. Our VRT is inspired by the Continuous Wavelet Transform (CWT) [14], which was specifically designed to overcome the limited time-frequency localization of the Fourier transform for non-stationary signals. Unlike the classical FFT, our VRT exhibits properties similar to the CWT, i.e. a variable time-frequency resolution grid with high frequency resolution and low time resolution in the low-frequency area, and high temporal resolution with low frequency resolution at the other end of the frequency axis, thus behaving like the human ear, which exhibits similar time-frequency resolution characteristics [15]. The remainder of this paper is organized as follows. Section II overviews related music signal representations.

Our variable resolution transform is then introduced in Section III. The experiments and the results are discussed in Section IV. Finally, we conclude our work in Section V.

II. RELATED WORKS

There are plenty of works in the literature dedicated to musical signal analysis. In this section, we first compare the popular FFT with the wavelet transform on the basis of desirable properties for music signal analysis, and then overview some other transforms and filter banks proposed in the literature.

A. Time-frequency transforms: FFT vs WT

The common approach is the use of the FFT (Fast Fourier Transform), which has become a de facto standard in the music information retrieval community. The use of the FFT seems straightforward in this field, and the relevance of its application to music signal analysis is almost never motivated. There are some works in music information retrieval attempting to make use of the wavelet transform as a novel and powerful tool in musical signal analysis; however, this direction is not yet well explored. [8] proposes to rely on the discrete wavelet transform for beat detection. The discrete packet wavelet transform is studied in [15] to build time and frequency features for music genre classification. In [16], wavelets are also used for automatic pitch detection. As is well known, the Fourier transform provides a spectral representation of a periodic signal as a sum of a series of sines and cosines. While the Fourier transform gives an insight into the spectral properties of a signal, its major disadvantage is that the decomposition of a signal by the Fourier transform has infinite frequency resolution and no time resolution. This means that we are able to determine all frequencies present in the signal, but without any knowledge of when they are present. This drawback makes the Fourier transform perfect for analyzing stationary signals but unsuitable for irregular signals whose characteristics change in time.
To overcome this problem, several solutions have been proposed in order to represent the signal in both the time and frequency domains. One of these techniques is the windowed Fourier transform, or short-time Fourier transform. The idea is to bring time localization into the classic Fourier transform by multiplying the signal with an analyzing window. The problem here is that the short-time discrete Fourier transform has a fixed resolution: the width of the windowing function is a trade-off between good frequency resolution and good time resolution. A shorter window gives lower frequency resolution but higher time resolution, while a larger window gives higher frequency resolution but lower time resolution. This phenomenon is related to Heisenberg's uncertainty principle, which says that

Δt ~ 1 / Δf   (1)

where Δt is the time resolution step and Δf is the frequency resolution step. Remember that in our work the main goal is music analysis. In this respect, we consider a music-related example which illustrates the specificities of musical signals. As is known, the frequencies of notes in the equally-tempered tuning system of western music follow a logarithmic law, i.e. adding a certain interval (in semitones) corresponds to multiplying a frequency by a given factor. For an equally-tempered tuning system, a semitone is defined by a frequency ratio of 2^(1/12). So, the interval in semitones between two frequencies f1 and f2 is

n = 12 log2(f2 / f1)   (2)

If we consider the frequency range covered by each octave, it grows as the octave number increases. Thus, applying the Fast Fourier Transform we either lose the resolution of notes in low octaves (Figure 1) or we are unable to distinguish high-frequency events which are close in time and have short durations.

Figure 1. Mismatch of note frequencies and the frequency resolution of the FFT.

A time-frequency representation which can overcome the resolution issues of the Fourier transform is the wavelet transform. Wavelets (literally "small waves") are a relatively recent instrument in modern mathematics. Introduced about 20 years ago, wavelets have made a revolution in the theory and practice of non-stationary signal analysis [14; 17]. Wavelets first appeared in the literature in the works of Grossmann and Morlet [18]. Some of the ideas behind wavelets existed long before: in 1910 Haar published a work on a system of locally-defined basis functions, now called Haar wavelets. Nowadays wavelets are widely used in many fields of signal analysis, ranging from image processing to the analysis and synthesis of speech, medical data and music [16; 19]. The continuous wavelet transform of a function f(t) ∈ L2(R) is defined as follows:

W(a, b) = (1/√a) ∫ f(t) ψ*((t − b) / a) dt   (3)

where a, b ∈ R, a ≠ 0. In equation (3), ψ(t) is called the basic wavelet or mother wavelet function (* stands for complex conjugation). The parameter a is called the wavelet scale; it can be considered analogous to frequency in the Fourier transform. The parameter b is the localization, or shift; it has no correspondence in the Fourier transform. One important point is that the wavelet transform does not have a single set of basis functions like the Fourier transform. Instead, the wavelet transform utilizes an infinite set of possible basis functions. Thus, it has access to a wide range of information, including the information which can be obtained by other time-frequency methods such as the Fourier transform. As explained in the brief introduction on music signals, a music excerpt can be considered as a sequence of note (pitch) events lasting certain times (durations). Apart from beat events, singing voice and vibrating or sweeping instruments, the signal between two note events can be assumed to be quasi-stationary. The duration of a note varies according to the main tempo of the piece, the type of music and the type of melodic component the note represents. Fast or short notes are usually found in melodic lines in the high-frequency area, while slow or long notes are usually found in bass lines, with rare exceptions. Let us consider the following example in order to see the difference between the Fourier transform and the wavelet transform. We construct a test signal containing two notes, E1 and A1, playing simultaneously during the whole period of time (1 second). These two notes can represent a bass line which, as is well known, does not change quickly in time. At the same time, we add 4 successive B5 notes with small intervals between them (around 1/16 s). These notes can theoretically be notes of the main melody line. Let us now look at the Fourier spectrogram of the test signal with a small analyzing window.

Figure 2. Small-windowed Fourier transform (512 samples) of the test signal containing notes E1 and A1 at the bottom and 4 repeating B5 notes at the top.
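The test signal can be reconstructed along these lines (a sketch in Python; the 16 kHz sample rate, the exact onset times and the equal-tempered note frequencies are our assumptions based on the description above):

```python
import math

# Our reconstruction of the test signal: bass notes E1 and A1
# sustained for 1 s, plus four short B5 bursts of ~1/16 s each.

SR = 16000                            # sampling rate (assumed)
E1, A1, B5 = 41.2, 55.0, 987.8        # note frequencies in Hz

def tone(freq, start, dur, length=SR):
    """A sine tone of the given frequency, placed at [start, start+dur) s."""
    out = [0.0] * length
    for t in range(int(start * SR), min(int((start + dur) * SR), length)):
        out[t] = math.sin(2 * math.pi * freq * t / SR)
    return out

signal = [0.0] * SR
for comp in (tone(E1, 0.0, 1.0), tone(A1, 0.0, 1.0)):
    signal = [s + c for s, c in zip(signal, comp)]
for k in range(4):                    # four B5 notes with ~1/16 s gaps
    burst = tone(B5, 0.125 * k + 0.06, 1 / 16)
    signal = [s + c for s, c in zip(signal, burst)]
print(len(signal))
```

Feeding this signal to an STFT with 512- and 1024-sample windows reproduces the trade-off discussed next.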

As we can see from Figure 2, while the high-octave notes can be resolved in time, the two bass notes are irresolvable in the frequency domain. Now we increase the size of the window in the Fourier transform. Figure 3 illustrates the resulting spectrogram.

Figure 3. Large-windowed Fourier transform (1024 samples) of the test signal containing notes E1 and A1 at the bottom and 4 repeating B5 notes at the top.

As we can see, the two lines at the bottom of the spectrogram are now clearly distinguishable, while the time resolution of the high-octave notes has been lost. Finally, we apply the wavelet transform to the test signal. Figure 4 shows the Morlet-based wavelet spectrogram of our test signal.

Figure 4. Wavelet transform (Morlet) of the test signal containing notes E1 and A1 at the bottom and 4 repeating B5 notes at the top.

Of course, the given example is quite artificial; however, it explains well our motivation for a wavelet-like time-frequency representation of a signal. It is also known that the human ear exhibits time-frequency characteristics closer to those of the wavelet transform [20].

B. Other transforms and filter banks

The idea of adapting the time/frequency scale of a Fourier-related transform to musical applications is not completely novel. A technique called the Constant Q Transform [21] is related to the Fourier transform and is used to transform a data series to the frequency domain. Like the Fourier transform, the constant Q transform is a bank of filters, but contrary to the Fourier transform it has geometrically spaced center frequencies f_k = f_0 · 2^(k/b) (k = 0, 1, ...), where b is the number of filters per octave. In addition, it has a constant frequency-to-resolution ratio Q = f_k / Δf_k = 1 / (2^(1/b) − 1). Choosing f_0 and b appropriately makes the central frequencies correspond to the frequencies of notes. In general, the transform is well suited to musical data (see e.g. [22]; in [23] it was successfully used for recognizing instruments), and this can be seen in some of its advantages compared to the Fast Fourier Transform. As the output of the transform is effectively amplitude/phase against log frequency, fewer spectral bins are required to cover a given range effectively, and this proves useful when frequencies span several octaves. The downside is a reduction in frequency resolution for the higher frequency bins. Besides the constant Q transform there is a bounded version of it (BQT) which uses quasi-linear frequency sampling, where the frequency sampling remains linear within separate octaves. This kind of modification allows the construction of medium-complexity computation schemes in comparison to the standard CQT. However, making the frequency sampling quasi-linear (within separate octaves) renders the finding of harmonic structure a much more complex task. Fast Filter Banks are designed to deliver higher frequency selectivity while maintaining low computational complexity; this kind of filter bank inherits all the disadvantages of the FFT in music analysis applications.
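The geometric spacing of the constant Q transform described above can be sketched as follows (Python; f_0 = 55 Hz and the two-octave range are our own illustrative choices):

```python
import math

# Constant Q Transform geometry: center frequencies f_k = f0 * 2**(k/b)
# and a quality factor Q = f_k / bandwidth_k that is the same for
# every filter in the bank.

b = 12                       # filters per octave (one per semitone)
f0 = 55.0                    # lowest center frequency (A1, assumed)
Q = 1 / (2 ** (1 / b) - 1)   # ~16.8 for b = 12

centers = [f0 * 2 ** (k / b) for k in range(25)]   # two octaves
bandwidths = [f / Q for f in centers]

# bin 12 sits exactly one octave above bin 0
print(round(centers[12] / f0, 6), round(Q, 1))
```

With b = 12, each filter lands on one equal-tempered semitone, which is why the CQT output maps so directly onto the note scale.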
More advanced techniques, described for example in [24], are medium-complexity methods which aim to overcome the disadvantages of the FFT and try to follow the note-system frequency sampling. However, octave-linear frequency sampling keeps the same disadvantage as in the case of bounded Q transforms.

III. VARIABLE RESOLUTION TRANSFORM

Our Variable Resolution Transform (VRT) is first derived from the classic definition of the Continuous Wavelet Transform (CWT) in order to enable a variable time-frequency coverage which should fit music signal analysis better. The consideration of specific properties of music signals finally leads us to change the mother function as

well, and thus our VRT is not a true CWT but a filter bank. We start the construction of our VR Transform from the Continuous Wavelet Transform defined by (3). Thus, we define our mother function as follows:

ψ(t) = H(t, l) · e^(j2πt)   (4)

where H(t, l) is the Hann window function of length l, with l ∈ Z, as defined by (5). In our case l will correspond to window lengths of the order of milliseconds. Notice that using different length values l amounts to changing the mother wavelet function ψ.

H(t, l) = 1/2 + (1/2) cos(2πt / l)   (5)

Once the length l is fixed, function (4) becomes very similar to a Morlet wavelet. It is an oscillating function: a flat wave modulated by a Hann window. The parameter l defines the number of periods present in the wave. Figure 5 illustrates such a function with l = 20 waves.

Figure 5. Our mother wavelet function: a flat wave modulated by a Hann window with l = 20.

According to the definition of the function (since l < ∞), we can write

∫ |ψ(t)| dt < ∞  and  ∫ |ψ(t)|² dt < ∞   (6)

The function oscillates symmetrically around zero, hence

∫ ψ(t) dt ≈ 0   (7)

Using (3), we write a discrete version of the transform for a sampled signal between the time instants t − l/2 and t + l/2. Applying the wavelet transform to the signal, we are interested in the spectrum magnitude

W(a, b) = (1/a) · √( [ Σ_{t = −la/2}^{la/2} s[t + b] · H(t/a, l) · cos(2πt/a) ]² + [ Σ_{t = −la/2}^{la/2} s[t + b] · H(t/a, l) · sin(2πt/a) ]² )   (8)

Here W(a, b) is the magnitude of the spectral component of the signal s[t] at time instant b and wavelet scale a. The value of W(a, b) can be obtained for any a and b, provided that b does not exceed the length of the signal. Equation (8) thus defines a Continuous Wavelet Transform for a discrete (time-sampled) signal. The scale a of the wavelet can be expressed in terms of the central frequency corresponding to it, since our mother function is a unit oscillation:

f = f_S / a   (9)

where f_S is the sampling frequency of the signal. A higher value of a stands for a lower central frequency.

A. Logarithmic frequency sampling

First of all, the sampling of the scale axis is chosen to be logarithmic in terms of frequency. This means that each musical octave, or each note, will have an equal number of spectral samples. Such a choice is explained by the properties of a music signal, whose note frequencies are known to follow a logarithmic law (following human perception). Logarithmic frequency sampling also simplifies harmonic structure analysis and economizes the amount of data necessary to cover the musical tuning system effectively. A voiced signal with a single pitch is in the general case represented by its fundamental frequency F0 and its partials (harmonics), with frequencies equal to the fundamental frequency multiplied by the partial number. Hence the distances between the partials (harmonic components) and F0 (the basic frequency) on a logarithmic frequency scale are constant, independently of F0. Such a harmonic structure looks like a fence, as depicted in Figure 6.

Figure 6. Harmonic structure on a logarithmic frequency scale.
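The mother function (4)-(5) and the discrete magnitude (8), together with the scale-frequency relation (9), can be sketched as follows (Python; the 1 kHz test tone and the 16 kHz sample rate are our own illustration, and the summation limits follow our reading of (8)):

```python
import math

# hann implements (5), w_mag implements (8): the magnitude of one
# spectral sample at scale a and time shift b.

def hann(x: float, l: float) -> float:
    """Equation (5), supported on [-l/2, l/2]."""
    return 0.5 + 0.5 * math.cos(2 * math.pi * x / l) if abs(x) <= l / 2 else 0.0

def w_mag(s, a: float, b: int, l: float = 20.0) -> float:
    half = int(l * a / 2)
    re = im = 0.0
    for t in range(-half, half + 1):
        if 0 <= t + b < len(s):
            win = hann(t / a, l)          # window stretched by the scale
            re += s[t + b] * win * math.cos(2 * math.pi * t / a)
            im += s[t + b] * win * math.sin(2 * math.pi * t / a)
    return math.sqrt(re * re + im * im) / a

# By (9), a tone of frequency f responds most at scale a = f_S / f:
SR = 16000
tone = [math.sin(2 * math.pi * 1000 * t / SR) for t in range(SR)]
resp = {a: w_mag(tone, a, b=8000) for a in (8, 16, 32)}
print(max(resp, key=resp.get))            # a = 16 = 16000 / 1000
```

The response peaks at the scale matching the tone, which is the behavior the logarithmic scale sampling below builds on.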

In order to cover the frequency axis from f_min to f_max with N frequency samples following a logarithmic law, we define a discrete function a(n), which denotes the scale of the wavelet, where n stands for the wavelet bin number ranging over the interval 0..N−1:

a(n) = (f_S / f_min) · e^(−nC)   (10)

Now the transform (8), sampled in both directions, gives

W(n, b) = (f_min e^(nC) / f_S) · | Σ_t s[t + b] · H(t · f_min e^(nC) / f_S, l) · e^(−j2πt · f_min e^(nC) / f_S) |   (11)

where the constant C = (1/N) ln(f_max / f_min). Expression (11) is the basic expression for obtaining an N-bin spectrogram of the signal at time instant b: it provides N values for each instant of time, N being the number of frequency samples. Expression (11) is still a sampled version of the Continuous Wavelet Transform where the sampling of the scale axis has been chosen logarithmic with N samples. The dependency of frequency on the bin number has the following form (with f_min = 50, f_max = 8000, N = 1000):

f(n) = f_min · e^(nC) = f_min · (f_max / f_min)^(n/N)   (12)

In order to depict the time/frequency properties of music signals by this discretized wavelet transform with a fixed length value (l = 20), let us consider wavelet spectrograms of several test signals. Figure 7 shows the wavelet spectrogram W(n, b) of a piano recording. One can observe single notes on the left and chords on the right. The fundamental frequency (F0) and its harmonics can be observed in the spectrum of each note. As we can see from Figure 7, up to 5 harmonics are resolvable. Harmonics above the 5th become indistinguishable, especially in the case of chords, where the number of simultaneously present frequency components is higher.
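The logarithmic sampling of equations (10) and (12) can be sketched as follows (Python; f_S = 16000 Hz is our assumption, the other constants are the ones stated above):

```python
import math

# Logarithmic scale axis: frequency f(n) per (12) and wavelet scale
# a(n) per (10), with f_min = 50 Hz, f_max = 8000 Hz, N = 1000.

F_MIN, F_MAX, N = 50.0, 8000.0, 1000
F_S = 16000.0                        # sampling frequency (assumed)
C = math.log(F_MAX / F_MIN) / N      # the constant in (11)

def freq(n: int) -> float:
    """Equation (12): f(n) = f_min * e**(n*C)."""
    return F_MIN * math.exp(n * C)

def scale(n: int) -> float:
    """Equation (10): a(n) = f_S / f(n)."""
    return F_S / freq(n)

# every bin covers the same musical interval: f(n+1)/f(n) is constant
print(round(freq(0), 1), round(freq(N), 1), round(freq(1) / freq(0), 6))
```

The constant ratio between adjacent bins is exactly what makes the harmonic "fence" of Figure 6 position-invariant along the bin axis.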

Figure 7. Wavelet spectrogram of a piano recording (wavelet (4)), with bin number n (frequency) on the vertical axis and time on the horizontal axis; single notes on the left and chords on the right, with a spectral profile (F0, 2·F0, 3·F0) shown at the cursor. Up to 5 harmonics are resolvable; harmonics above the 5th become indistinguishable, especially in the case of chords, where the number of simultaneous frequency components is higher.

Good time resolution is important in tasks such as beat or onset detection in music signal analysis. The next example serves to illustrate the time resolution properties of the Variable Resolution Transform we are developing. In this example we examine a signal with a series of delta-pulses (Dirac impulses), as illustrated in Figure 8, which shows a wavelet spectrogram of 5 delta-pulses (1 on the left, 2 in the middle and 2 on the right). As we can see from this figure, the delta-pulses are still distinguishable even if the distance between them is only 8 ms (right case). In the case of the FFT one would need a 64-sample window in order to obtain such time resolution.

Figure 8. Wavelet transform of a signal containing 5 delta-pulses. The distance between the two pulses on the right is only 8 ms.

A quite straightforward listening experiment that we have carried out reveals that the human auditory system is capable of distinguishing delta-pulses when the distance between them is around 10 ms. On the other hand, the human auditory system is also able to distinguish very close frequencies: 4 Hz on average, and down to 0.1 Hz.

B. Varying the mother function

However, music analysis requires good frequency resolution as well. As we can see from the spectrogram in Figure 7, neither high-order partials nor close notes are resolvable, because the spectral localization of the wavelet used is too wide. Increasing the length parameter l of the Hann window in (4) or (11) would render our wavelet transform unusable in the low-frequency area, since the time resolution there would grow exponentially. Thus, we propose in this work to make l a dynamic parameter, with the possibility of adjusting its behavior across the scale axis. For this purpose we use the following law for the parameter l in (11), instead of applying the scale a(n) to the parameter t in H(t, l):

l(n) = L · (1 − k1 · n/N) · e^(−k2 · n/N)   (13)

where L is the initial window size and k1, k2 are adjustable parameters. The transform (11) becomes:

W(n, b) = (f_min e^(nC) / f_S) · | Σ_t s[t + b] · H(t, l(n)) · e^(−j2πt · f_min e^(nC) / f_S) |   (14)

Expression (13) allows the effective wavelet width to vary in different ways, from linear to completely exponential, the latter following the original transform definition. When L = f_S / f_min, k1 = 0 and k2 = C·N, (14) is equivalent to (11).

Figure 9. Various l(n), depending on the parameters: from linear (left) to exponential (right).
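The window-length law (13), as we read it (the printed formula is ambiguous, so this sketch is an interpretation), can be checked against the stated equivalence with (11):

```python
import math

# Sketch of (13): l(n) = L * (1 - k1*n/N) * exp(-k2*n/N), which
# interpolates between a linear (k2 = 0) and a purely exponential
# (k1 = 0) decrease of the window length across the N bins.

F_MIN, F_MAX, N = 50.0, 8000.0, 1000
F_S = 16000.0                                  # assumed sampling rate
C = math.log(F_MAX / F_MIN) / N                # constant from (11)

def l_of_n(n: int, L: float, k1: float, k2: float) -> float:
    return L * (1 - k1 * n / N) * math.exp(-k2 * n / N)

def a_of_n(n: int) -> float:
    """Scale a(n) of the original transform (10)."""
    return (F_S / F_MIN) * math.exp(-n * C)

# With L = f_S/f_min, k1 = 0 and k2 = C*N, the law reduces to the
# pure exponential of (11): l(n) coincides with the scale a(n).
L0 = F_S / F_MIN
print(all(abs(l_of_n(n, L0, 0.0, C * N) - a_of_n(n)) < 1e-9
          for n in range(0, N, 100)))
```

Nonzero k1 and smaller k2 slow the shrinkage of the window at high n, which is precisely what buys frequency resolution for high-order partials.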

In this way we are now able to control the time resolution behavior of our transform. In fact, such a transform is no longer a wavelet transform, since the mother function changes across the scale axis. For this reason we call the resulting transform a variable resolution transform (VRT). It can also be referred to as a custom filter bank. As the effective mother-function width (number of wave periods) grows at high frequencies relative to the original mother function, the spectral line width becomes narrower, and hence the transform is able to resolve the harmonic components (partials) of the signal. An example spectrogram computed with the new variable resolution transform is depicted in Figure 10.

Figure 10. VRT spectrogram of the piano recording used in the previous experiment, with bin number n (frequency) on the vertical axis and time on the horizontal axis; single notes on the left and chords on the right. Fundamental frequencies and partials (F0, 2·F0, 3·F0) are distinguishable (k1 = 0.8, k2 = 2.1).

C. Properties of the VR transform

A music signal between 50 and 8000 Hz spans approximately 8 octaves. Each octave consists of 12 notes, leading to a total number of notes of around 100. A filter bank with 100 filters would thus be enough to cover such an octave range. In reality, the frequencies of notes may differ from the theoretical note frequencies of equal-tempered tuning because of recording and other conditions. Therefore, for the music signal analysis considered here, we work with a spectrogram size of 1024 bins, roughly 10 times the amount strictly necessary, which covers the note scale with about 10 bins per note. Timbre is one of the major properties of a music signal, along with melody and rhythm. Let us now consider the structure of the partials of a harmonic signal (its harmonic structure). In Figure 6 we have depicted an approximate view of such a structure on a logarithmic frequency scale. According to the definition of the function f(n) (12), the distance

between partial i and partial j, in terms of the number of bins, is independent of the absolute fundamental frequency value. Indeed, according to (12),

n(f_j) − n(f_i) = (1/C) ln(f_j / f_min) − (1/C) ln(f_i / f_min)

and taking into account f_i = i·F0 and f_j = j·F0 we obtain:

n(f_j) − n(f_i) = (1/C)·(ln F0 + ln j − ln f_min) − (1/C)·(ln F0 + ln i − ln f_min) = (1/C) ln(j / i)

Accurate harmonic analysis of a music signal implies that the frequency resolution, in terms of spectrogram bin numbers and expressed by the spectral dispersion, should always remain below the distance between the neighboring components under consideration. Since the total width of a 20-partial harmonic structure is a constant of around 600 points in terms of the number of bins (n(f_20) − n(f_1)), we can establish that the frequency resolution of the obtained transform is sufficient to resolve the high-order partials we are interested in at all positions of the VRT spectrogram, especially for low-octave notes. This means that a 20-partial harmonic structure starting from the beginning of the spectrogram will always lie above the dispersion curve. If we now consider the time resolution of the transform, we must recall Figure 9, where various dependencies of the effective filter width were given. If we define the maximum effective window size to be 180 ms (recall our musical signal properties), we obtain the time resolution grid illustrated in Figure 11.

Figure 11. Time resolution dependency of the VR transform with k1 = 0.8, k2 = 2.1.

D. Discussion

As we can see, our Variable Resolution Transform is derived from the classic definition of the Continuous Wavelet Transform [25; 26]. However, our VRT is not a CWT, even though they have many similarities. The main difference between the VRT and the CWT resides in the frequency axis sampling, as well as in the mother wavelet

function, which in the case of the VRT changes its form across the scale (or frequency) axis in order to have enough resolution for high-order frequency partials. With this last property it is not a wavelet transform anymore, because in a true wavelet transform the mother function is only scaled and shifted, producing a discrete tiling of the time-frequency space in the case of the DWT or an infinite coverage in the case of the CWT. Our VRT can also be referred to as a specially crafted filter bank. The major differences between our VRT and a wavelet transform are:

- no 100% space tiling;
- no 100% signal reconstruction (depending on the parameters);
- the mother function changes.

The major similarities between our VRT and a wavelet transform are the following:

- they are based on a specially sampled version of the CWT;
- with certain parameters they can provide 100% signal reconstruction;
- low time resolution and high frequency resolution in the low-frequency area, and high time resolution with low frequency resolution in the high-frequency area.

IV. APPLICATIONS: MULTIPLE F0 ESTIMATION

A music signal is generally a composite signal blended from several instruments and/or voices, and thus has multiple fundamental frequencies. Accurate estimation of these multiple F0s can greatly contribute to further music signal processing, and it is an important scientific issue in the field. As the estimation of multiple F0s mostly requires signal processing in the frequency domain, this problem is a very good illustration highlighting the properties of our VRT. Early works on automatic pitch detection were developed for speech signals (see e.g. [27; 28]). Much of the literature nowadays treats the monophonic case (only one F0 present and detected) of fundamental frequency estimation. There are also works studying the polyphonic case of music signals. However, in most of these works the polyphonic music signal is usually considered with a number of restrictions, such as on the number of notes played simultaneously, or with some hypotheses about the instruments involved.
The work [29] presents a pitch detection technique using separate time-frequency windows. Both the monophonic and the two-voice polyphonic cases are studied. Multiple-pitch estimation in the polyphonic single-instrument case is described in [30], where the authors propose to apply a comb filter mapping the linear frequency scale of the FFT onto the logarithmic scale of note frequencies. As the method is FFT-based, the technique inherits the drawbacks of the FFT for

music signal analysis highlighted in Section III, namely the requirement for large FFT analysis windows, leading to low time resolution. An advanced F0 detection algorithm is presented in [31], based on finding the frequencies which maximize an F0 probability density function. The algorithm is claimed to work in the general case and has been tested on CD recordings. We can also mention many other recent works on multiple fundamental frequency estimation, for instance those in [32; 33]. Both of these works are probabilistic methods. The first uses a probabilistic HMM-based approach taking into account some a priori musical knowledge such as tonality. Variable results, from 50% to 92% recognition rates for different instruments on MIDI-synthesized sequences, are reported. The second algorithm is evaluated on synthetic samples where each file contains only one combination of notes (1 note or 1 chord). It is not evident how to compare these different multiple F0 estimation algorithms, as the assumptions or models on the polyphonic music signal are often not explicitly stated. On the other hand, there is no single evident way of performing multiple F0 detection. Some algorithms are strong in noisy environments; some require a priori training; others are able to detect inharmonic tones, etc. The most popular approach to F0 estimation is harmonic pattern matching in the frequency domain. Our multiple-F0 estimation algorithm makes use of this basic idea. It is illustrated in this paper as an example which relies on our VRT specifically designed for music signal analysis.

A. VRT-based multiple F0 estimation

The basic principle of the F0 estimation algorithm consists of modeling our VRT spectrum with harmonic models. Real musical instruments are known to have inharmonic components in their spectra [34]. This means that the frequency of the n-th partial may not be strictly equal to F0 · n.
The algorithm we describe does not take such inharmonic components into account, but it tolerates some displacement of the partials in a natural way. A typical flat harmonic structure used to model the spectrum is depicted in Figure 12.

Figure 12. Harmonic structure.

This "fence" is a vertical cut of a VRT spectrogram calculated from a synthetic signal representing an ideal harmonic instrument. The width of the peaks and the spacing between them are variable because the VR transform has a

logarithmic frequency scale. In the next step, these models are used to approximate the spectrum of the signal being analyzed in order to obtain a list of F0 candidates.

Figure 13. Matching of harmonic models to the spectrum.

During every iteration of the algorithm, such a harmonic fence is shifted along the frequency axis of the spectrogram and matched with it at each starting point. The matching of the harmonic model is done as follows. At every harmonic, the amplitude a_i is taken from the value of the spectrogram at the frequency of the i-th harmonic. As the frequencies of the harmonics do not necessarily have integer ratios to the fundamental frequency, we take the maximum amplitude in a close neighborhood, as explained in Figure 14.

Figure 14. Procedure for extracting the vector of harmonic amplitudes a_1, ..., a_n: maximal values of the spectrum inside tolerance windows.

This procedure forms a function A(f), which is the norm of the vector a for frequency f. The value of the frequency for which the function A takes its maximum value is considered an F0 candidate.
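A minimal sketch (our own Python interpretation, with hypothetical names and a toy spectrum) of the extraction and scoring step of Figures 13-14: for each harmonic of a candidate F0, the maximum spectrogram value inside a small tolerance window around the expected bin is taken, and the candidate is scored by the norm A(f):

```python
import math

# Toy version of the harmonic-model matching: bin_of maps a frequency
# to its logarithmic bin (inverse of (12)), harmonic_amplitudes picks
# the per-harmonic maxima inside tolerance windows, score computes A(f).

F_MIN, F_MAX, N = 50.0, 8000.0, 1000
C = math.log(F_MAX / F_MIN) / N

def bin_of(f: float) -> int:
    return int(round(math.log(f / F_MIN) / C))

def harmonic_amplitudes(spectrum, f0, n_harm=5, tol=3):
    amps = []
    for i in range(1, n_harm + 1):
        n = bin_of(i * f0)
        if n >= len(spectrum):
            break
        lo, hi = max(0, n - tol), min(len(spectrum), n + tol + 1)
        amps.append(max(spectrum[lo:hi]))   # tolerate slight inharmonicity
    return amps

def score(spectrum, f0):
    a = harmonic_amplitudes(spectrum, f0)
    return math.sqrt(sum(x * x for x in a))  # A(f): norm of the vector a

# toy spectrum: unit peaks at the first five harmonics of 110 Hz
spec = [0.0] * N
for i in range(1, 6):
    spec[bin_of(110.0 * i)] = 1.0
print(score(spec, 110.0) > score(spec, 130.0))   # True: 110 Hz wins
```

In the full algorithm this scoring runs at every starting point of the fence, and the maximizing frequency becomes the F0 candidate for the iteration.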

Further, the obtained f0 candidate and the corresponding vector a of harmonic amplitudes are transformed into a spectrum slice like the one in Figure 12. The shape of the peaks is taken from the shape of the VRT spectrum of a sine wave of the corresponding frequency. This slice is then subtracted from the spectrum under study. The iterative process is repeated either until the current value of the harmonic structure A(f) becomes inferior to a certain threshold or until the maximum number of iterations has been reached. We limit the maximum number of iterations to 4, and therefore the maximum number of notes that can be simultaneously detected is 4. As observed in preliminary experiments, increasing the number of simultaneously detected notes does not improve the f0 detection performance significantly for high-polyphonic music, because after the 3rd or 4th iteration the residue of the spectrum is already quite noisy, as almost all harmonic components have already been subtracted from it due to harmonic overlaps. The procedure of note extraction is applied every 25 ms to the input signal sampled at 16 kHz, 16 bits. Hence, for the shortest notes with duration around ms we obtain note candidates at least twice, in order to be able to apply filtering techniques. Every slice produces a certain number of f0 candidates; these candidates are then filtered in time in order to remove noise and unreliable notes. The time filtering method used is nearest-neighbor interframe filtering: 3 successive frames are taken and the f0 candidates in the middle frame are changed according to the f0 candidates in the two neighboring frames. This filter removes noisy (falsely detected) f0 candidates as well as holes in notes caused by misdetection.

B. Experimental evaluation

The easiest way to make basic evaluation experiments in automated music transcription is to use MIDI files (plenty of them can be freely found on the Internet) rendered into waves as input data. The MIDI events themselves serve as the ground truth.
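Returning to the nearest-neighbor interframe filtering described above, it can be sketched as follows. The exact rule is not spelled out in the text, so this is one plausible interpretation: a candidate in the middle frame survives only if a neighboring frame supports it, and a one-frame hole is refilled when both neighbors agree.

```python
def interframe_filter(frames):
    """Nearest-neighbour interframe filtering of f0 candidates.

    `frames` is a list of sets of note numbers, one set per 25 ms
    slice.  A note in the middle frame is kept only if it also appears
    in at least one neighbouring frame (removing isolated false
    detections); a note present in both neighbours is re-inserted
    (filling one-frame holes caused by misdetection)."""
    out = [set(f) for f in frames]
    for t in range(1, len(frames) - 1):
        prev, cur, nxt = frames[t - 1], frames[t], frames[t + 1]
        out[t] = {n for n in cur if n in prev or n in nxt} | (prev & nxt)
    return out
```

For example, `interframe_filter([{60}, {60, 72}, {60}])` drops the isolated candidate 72, while `interframe_filter([{60}, set(), {60}])` restores the one-frame hole in note 60.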
However, real-life results must be obtained from recorded music with real instruments, transcribed by trained music specialists. In our work we used wave files synthesized from MIDI using the hardware wavetable synthesis of a Creative SB Audigy2 soundcard with a high-quality 140 MB SoundFont bank, Fluid_R3, freely available on the Internet. In such wavetable synthesis banks, all instruments are sampled at good sampling rates from real ones: the majority of the pitches producible by an instrument are recorded as sampled (wave) blocks and stored in the SoundFont. In the SoundFont we used, the acoustic grand piano, for example, is sampled every four notes from a real acoustic grand piano. Waves for notes which lie between these reference notes are obtained by resampling the waves of the closest reference notes. Therefore, a signal generated using such wavetable synthesis can be considered as a real instrument signal

recorded under ideal conditions. A polyphonic piece is then an ideal linear mixture of real instruments. To make the recording conditions closer to reality, in some tests we played the signal over speakers and recorded it with a microphone.

Recall and Precision measures are used to measure the performance of the note detection. Recall is defined as:

Recall = (number of correct notes detected) / (actual number of notes) (15)

Precision is defined as follows:

Precision = (number of correct notes detected) / (number of all notes detected) (16)

For the overall measure of the transcription performance, the following F1 measure is used:

F1 = 2 * (Recall * Precision) / (Recall + Precision) (17)

All falsely detected notes also include those with octave errors. For some tasks of music indexing, for instance tonality determination, what is important is the note basis and not the octave number. For this reason, the performance of note detection without taking octave errors into account is estimated as well. Our test dataset consists of 10 MIDI files of classical and pop compositions containing 200 to 3000 notes. Some other test sequences were directly played using the keyboard. The following tables (Table 1 - Table 4) display precision results of our multiple pitch detection. The "Per. Oct" column stands for the performance of note detection not taking into account the notes' octaves (just the basic note is important). The polyphony column indicates the maximum and the average number of simultaneously sounding notes found in the piece.

Table 1. Note detection performance in the monophonic case. Sequences are played manually using the keyboard. Columns: Name, number of notes, Polyphony (max / avg), Recall, Prec, F1, Per. Oct F1. Rows: Piano Manual, Violin Manual.

Table 2. Note detection performance in the polyphonic case. Sequences of chords are played manually using the keyboard. Same columns. Rows: Piano Manual, Piano Manual, Flute Manual.

Table 3. Note detection performance in the polyphonic case.
Classical music titles (single- and multi-instrument, no percussion).

Same columns as Table 1. Rows: Fur_Elize, Fur_Elize w/ microphone, Tchaikovski, Tchaikovski, Bach, Bach, Bach Fugue, Vivaldi Mandolin Concerto.

Table 4. Note detection performance in the polyphonic case. Popular and other music (multi-instrument with percussion). Same columns. Rows: K. Minogue, Madonna, Soundtrack, Godfather.

As we can see from these tables, our algorithm performs quite well in the monophonic case. Good results are also obtained in the polyphonic case with classical music having a low average level of polyphony (number of notes played simultaneously). More complex musical compositions, which include percussion instruments and have a high polyphony rate, produced lower recognition rates. In our note detection algorithm, we have limited the maximal detectable polyphony to 4, while the maximal and average polyphony in the case of popular and other music are 10 and 4.7 respectively. The octave precision, however, stays high (Per. Oct F1 field).

For comparison purposes, we also implemented our note detection algorithm based on the FFT with different window sizes instead of our VRT. We carried out an experiment with a set of polyphonic classical compositions (~1000 notes) using this FFT-based note detection algorithm. Table 5 and Figure 15 summarize the experimental results.

Table 5. Comparison of transcription performance based on different time-frequency transforms (the FFT with various window sizes versus the VRT). Columns: Transform (FFT, FFT, FFT, VRT), FFT size or number of VRT frequency samples, Result (F1).
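As a concrete reading of equations (15)-(17), the scores reported in the tables above can be computed as follows. The note representation (pairs of onset frame and MIDI pitch) is our assumption for the sketch; the paper does not specify how a detection is matched to the ground truth.

```python
def transcription_scores(detected, ground_truth):
    """Recall, Precision and F1 as in equations (15)-(17).

    `detected` and `ground_truth` are lists of (onset_frame, midi_note)
    pairs; a detected note counts as correct when the same pair occurs
    in the ground truth."""
    correct = len(set(detected) & set(ground_truth))
    recall = correct / len(ground_truth) if ground_truth else 0.0
    precision = correct / len(detected) if detected else 0.0
    f1 = (2 * recall * precision / (recall + precision)
          if recall + precision else 0.0)
    return recall, precision, f1
```

The "Per. Oct" variant would compare pitch classes (`midi_note % 12`) instead of absolute MIDI notes, so that octave errors are not penalized.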

Figure 15. Note detection performance (F1, %) of the FFT-based and VRT-based variants as a function of the number of frequency samples.

A further increase of the FFT window size lowers the time resolution down to seconds, so that note changes quicker than 0.5 seconds cannot be resolved anymore. These experimental results show the advantage of our VRT: even in this simple use it performs multiple note detection quite well in the case of a low average polyphony rate.

V. CONCLUSION

In this paper we have introduced the Variable Resolution Transform, a novel signal processing technique specifically designed for music signal analysis. A music signal is characterized by four major properties: melody, harmony, rhythm and timbre. The classic Fast Fourier Transform, a de-facto standard in music signal analysis in the current literature, has as its main drawback a uniform time-frequency scale, which makes it impossible to perform efficient spectrum analysis together with good time resolution. The wavelet transform overcomes this limit by varying the scale of the mother-wavelet function and, hence, the effective window size. This kind of transform keeps frequency details in the low-frequency area of the spectrum as well as time localization information about quickly changing high-frequency components. However, the dramatic decrease of frequency resolution of the basic wavelet transform in the high-frequency area leads to confusion among high-order harmonic components, where a sufficient resolution is necessary for the analysis of the harmonic properties of a music signal. We have thus introduced our Variable Resolution Transform with a varying mother function. The law of variation is controlled by two parameters, linearity and exponentiality, which can be carefully chosen in order to adjust the frequency-time resolution grid of the VRT. Hence, our VRT combines the advantages of the classic continuous wavelet transform and of the windowed or short-time Fourier transform.
As an example of a direct VRT application, we have presented a VRT-based multiple-f0 estimation algorithm characterized by its simplicity, rapidity and high temporal resolution as opposed to FFT-based methods. It performs quite well in the detection of multiple pitches with non-integer ratios. However, like other similar

algorithms, our VRT-based multiple-f0 estimation algorithm does not solve the following problem: two notes at a distance of an octave can hardly be separated, because the second note does not bring any new harmonics into the spectrum but rather changes the amplitudes of the existing harmonics of the lower note; some knowledge of the instruments involved in the piece, or instrument recognition techniques and multi-channel source separation, is necessary to resolve this problem.

Our note detection mechanism was evaluated in its direct application, musical transcription from the signal. In this evaluation, the ground truth data was taken from note score (MIDI) files. These files, from various genres (mostly classical), were rendered into waves using high-quality wavetable synthesis. The resulting wave files were passed as input to the transcription algorithm. The results of the transcription and the ground-truth data were compared and a performance measure was calculated. Compared to the FFT, the VRT used in the described f0 estimation algorithm gives much higher results together with excellent time resolution. As a major drawback of the VRT, its considerable computational complexity could be mentioned. Nevertheless, it does not hamper real-time audio processing every 25 ms. We have also applied the VRT to the extraction of other music features, including timbre and tempo estimation for music similarity-based retrieval [25; 26]. In all these problems, the VRT has shown interesting properties for music signal analysis [thesis].

VI. REFERENCES

[1] Tanguiane A.S. Artificial perception and music recognition (Lecture Notes in Computer Science). Springer, October.
[2] Casagrande N., Eck D., Kegl B. Frame-level audio feature extraction using AdaBoost. Proceedings of the ISMIR International Conference on Music Information Retrieval (London) (2005).
[3] Logan B., Salomon A. A music similarity function based on signal analysis.
In Proceedings of the IEEE International Conference on Multimedia and Expo, ICME'01 (2001).
[4] Mandel M., Ellis D. Song-level features and support vector machines for music classification. Proceedings of the ISMIR International Conference on Music Information Retrieval (London) (2005).
[5] McKinney M.F., Breebaart J. Features for audio and music classification. Proceedings of the ISMIR International Conference on Music (2003).
[6] Meng A., Shawe-Taylor J. An investigation of feature models for music genre classification using the

support vector classifier. Proceedings of the ISMIR International Conference on Music Information Retrieval (London) (2005).
[7] Scaringella N., Zoia G. On the modeling of time information for automatic genre recognition systems in audio signals. Proceedings of the ISMIR International Conference on Music Information Retrieval (London) (2005).
[8] Tzanetakis G., Cook P. Automatic musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing 10 (2002), no. 5.
[9] West K., Cox S. Features and classifiers for the automatic classification of musical audio signals. Proceedings of the ISMIR International Conference on Music Information Retrieval (Barcelona, Spain) (2004).
[10] Foote J.T. Content-based retrieval of music and audio. Proceedings of SPIE Multimedia Storage and Archiving Systems II (Bellingham, WA), vol. 3229, SPIE (1997).
[11] Logan B. Mel frequency cepstral coefficients for music modeling. Proceedings of the ISMIR International Symposium on Music Information Retrieval (Plymouth, MA) (2000).
[12] Aucouturier J.J., Pachet F. Timbre similarity: how high is the sky? In JNRSAS (2004).
[13] Pampalk E. Computational models of music similarity and their application in music information retrieval. PhD thesis, Technischen Universitaet Wien, Fakultaet fuer Informatik.
[14] Kronland-Martinet R., Morlet J., Grossmann A. Analysis of sound patterns through wavelet transform. International Journal of Pattern Recognition and Artificial Intelligence, Vol. 1(2) (1987).
[15] Grimaldi M., Kokaram A., Cunningham P. Classifying music by genre using the wavelet packet transform and a round-robin ensemble. (2002).
[16] Kadambe S., Boudreaux-Bartels G.F. Application of the wavelet transform for pitch detection of speech signals. IEEE Transactions on Information Theory (1992) 38, no. 2.
[17] Mallat S.G. A wavelet tour of signal processing. Academic Press.
[18] Grossmann A., Morlet J. Decomposition of Hardy functions into square integrable wavelets of constant shape. SIAM J. Math. Anal. (1984) 15.
[19] Lang W.C., Forinash K.
Time-frequency analysis with the continuous wavelet transform. Am. J. Phys. (1998) 66(9).
[20] Tzanetakis G., Essl G., Cook P. Audio analysis using the discrete wavelet transform. WSES Int. Conf. Acoustics and Music: Theory 2001 and Applications (AMTA), Skiathos, Greece (2001).

[21] Brown J.C. Calculation of a constant Q spectral transform. J. Acoust. Soc. Am. (1991) 89(1).
[22] Nawab S.H., Ayyash S.H., Wotiz R. Identification of musical chords using constant-Q spectra. In Proc. ICASSP (2001).
[23] Essid S. Classification automatique des signaux audio-fréquences : reconnaissance des instruments de musique. PhD thesis, Informatique, Télécommunications et Électronique, ENST.
[24] Diniz F.C.C.B., Kothe I., Netto S.L., Biscainho L.P. High-selectivity filter banks for spectral analysis of music signals. EURASIP Journal on Advances in Signal Processing (2007).
[25] Paradzinets A., Harb H., Chen L. Use of continuous wavelet-like transform in automated music transcription. Proceedings of EUSIPCO (2006).
[26] Paradzinets A., Kotov O., Harb H., Chen L. Continuous wavelet-like transform based music similarity features for intelligent music navigation. In Proceedings of CBMI (2007).
[27] Abe T. et al. Robust pitch estimation with harmonics enhancement in noisy environments based on instantaneous frequency. In Proceedings of ICSLP'96 (1996).
[28] Hu J., Sheng Xu, Chen J. A modified pitch detection algorithm. IEEE Communications Letters (2001), Vol. 5, No. 2.
[29] Klapuri A. Pitch estimation using multiple independent time-frequency windows. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (1999).
[30] Lao W., Tan E.T., Kam A.H. Computationally inexpensive and effective scheme for automatic transcription of polyphonic music. Proceedings of ICME (2004).
[31] Goto M. A predominant-F0 estimation method for CD recordings: MAP estimation using EM algorithm for adaptive tone models. In Proceedings of ICASSP (2001).
[32] Li Y., Wang D. Pitch detection in polyphonic music using instrument tone models. In Proceedings of ICASSP (2007).
[33] Yeh C., Roebel A., Rodet X. Multiple fundamental frequency estimation of polyphonic music signals. In Proc. IEEE ICASSP (2005).
[34] Klapuri A. Signal processing methods for the automatic transcription of music.
PhD thesis, Tampere University of Technology, 2004.


More information

ON FINDING MELODIC LINES IN AUDIO RECORDINGS. Matija Marolt

ON FINDING MELODIC LINES IN AUDIO RECORDINGS. Matija Marolt ON FINDING MELODIC LINES IN AUDIO RECORDINGS Matija Marolt Faculty of Computer and Information Science University of Ljubljana, Slovenia matija.marolt@fri.uni-lj.si ABSTRACT The paper presents our approach

More information

Spectrum Analyser Basics

Spectrum Analyser Basics Hands-On Learning Spectrum Analyser Basics Peter D. Hiscocks Syscomp Electronic Design Limited Email: phiscock@ee.ryerson.ca June 28, 2014 Introduction Figure 1: GUI Startup Screen In a previous exercise,

More information

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University Week 14 Query-by-Humming and Music Fingerprinting Roger B. Dannenberg Professor of Computer Science, Art and Music Overview n Melody-Based Retrieval n Audio-Score Alignment n Music Fingerprinting 2 Metadata-based

More information

NOTE-LEVEL MUSIC TRANSCRIPTION BY MAXIMUM LIKELIHOOD SAMPLING

NOTE-LEVEL MUSIC TRANSCRIPTION BY MAXIMUM LIKELIHOOD SAMPLING NOTE-LEVEL MUSIC TRANSCRIPTION BY MAXIMUM LIKELIHOOD SAMPLING Zhiyao Duan University of Rochester Dept. Electrical and Computer Engineering zhiyao.duan@rochester.edu David Temperley University of Rochester

More information

Supervised Learning in Genre Classification

Supervised Learning in Genre Classification Supervised Learning in Genre Classification Introduction & Motivation Mohit Rajani and Luke Ekkizogloy {i.mohit,luke.ekkizogloy}@gmail.com Stanford University, CS229: Machine Learning, 2009 Now that music

More information

Multiple instrument tracking based on reconstruction error, pitch continuity and instrument activity

Multiple instrument tracking based on reconstruction error, pitch continuity and instrument activity Multiple instrument tracking based on reconstruction error, pitch continuity and instrument activity Holger Kirchhoff 1, Simon Dixon 1, and Anssi Klapuri 2 1 Centre for Digital Music, Queen Mary University

More information

Chord Classification of an Audio Signal using Artificial Neural Network

Chord Classification of an Audio Signal using Artificial Neural Network Chord Classification of an Audio Signal using Artificial Neural Network Ronesh Shrestha Student, Department of Electrical and Electronic Engineering, Kathmandu University, Dhulikhel, Nepal ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

Methods for the automatic structural analysis of music. Jordan B. L. Smith CIRMMT Workshop on Structural Analysis of Music 26 March 2010

Methods for the automatic structural analysis of music. Jordan B. L. Smith CIRMMT Workshop on Structural Analysis of Music 26 March 2010 1 Methods for the automatic structural analysis of music Jordan B. L. Smith CIRMMT Workshop on Structural Analysis of Music 26 March 2010 2 The problem Going from sound to structure 2 The problem Going

More information

Pitch. The perceptual correlate of frequency: the perceptual dimension along which sounds can be ordered from low to high.

Pitch. The perceptual correlate of frequency: the perceptual dimension along which sounds can be ordered from low to high. Pitch The perceptual correlate of frequency: the perceptual dimension along which sounds can be ordered from low to high. 1 The bottom line Pitch perception involves the integration of spectral (place)

More information

Soundprism: An Online System for Score-Informed Source Separation of Music Audio Zhiyao Duan, Student Member, IEEE, and Bryan Pardo, Member, IEEE

Soundprism: An Online System for Score-Informed Source Separation of Music Audio Zhiyao Duan, Student Member, IEEE, and Bryan Pardo, Member, IEEE IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 5, NO. 6, OCTOBER 2011 1205 Soundprism: An Online System for Score-Informed Source Separation of Music Audio Zhiyao Duan, Student Member, IEEE,

More information

Analysis, Synthesis, and Perception of Musical Sounds

Analysis, Synthesis, and Perception of Musical Sounds Analysis, Synthesis, and Perception of Musical Sounds The Sound of Music James W. Beauchamp Editor University of Illinois at Urbana, USA 4y Springer Contents Preface Acknowledgments vii xv 1. Analysis

More information

Voice & Music Pattern Extraction: A Review

Voice & Music Pattern Extraction: A Review Voice & Music Pattern Extraction: A Review 1 Pooja Gautam 1 and B S Kaushik 2 Electronics & Telecommunication Department RCET, Bhilai, Bhilai (C.G.) India pooja0309pari@gmail.com 2 Electrical & Instrumentation

More information

Book: Fundamentals of Music Processing. Audio Features. Book: Fundamentals of Music Processing. Book: Fundamentals of Music Processing

Book: Fundamentals of Music Processing. Audio Features. Book: Fundamentals of Music Processing. Book: Fundamentals of Music Processing Book: Fundamentals of Music Processing Lecture Music Processing Audio Features Meinard Müller International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de Meinard Müller Fundamentals

More information

HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH

HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH Proc. of the th Int. Conference on Digital Audio Effects (DAFx-), Hamburg, Germany, September -8, HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH George Tzanetakis, Georg Essl Computer

More information

Week 14 Music Understanding and Classification

Week 14 Music Understanding and Classification Week 14 Music Understanding and Classification Roger B. Dannenberg Professor of Computer Science, Music & Art Overview n Music Style Classification n What s a classifier? n Naïve Bayesian Classifiers n

More information

Krzysztof Rychlicki-Kicior, Bartlomiej Stasiak and Mykhaylo Yatsymirskyy Lodz University of Technology

Krzysztof Rychlicki-Kicior, Bartlomiej Stasiak and Mykhaylo Yatsymirskyy Lodz University of Technology Krzysztof Rychlicki-Kicior, Bartlomiej Stasiak and Mykhaylo Yatsymirskyy Lodz University of Technology 26.01.2015 Multipitch estimation obtains frequencies of sounds from a polyphonic audio signal Number

More information

MUSICAL INSTRUMENT RECOGNITION WITH WAVELET ENVELOPES

MUSICAL INSTRUMENT RECOGNITION WITH WAVELET ENVELOPES MUSICAL INSTRUMENT RECOGNITION WITH WAVELET ENVELOPES PACS: 43.60.Lq Hacihabiboglu, Huseyin 1,2 ; Canagarajah C. Nishan 2 1 Sonic Arts Research Centre (SARC) School of Computer Science Queen s University

More information

APPLICATIONS OF A SEMI-AUTOMATIC MELODY EXTRACTION INTERFACE FOR INDIAN MUSIC

APPLICATIONS OF A SEMI-AUTOMATIC MELODY EXTRACTION INTERFACE FOR INDIAN MUSIC APPLICATIONS OF A SEMI-AUTOMATIC MELODY EXTRACTION INTERFACE FOR INDIAN MUSIC Vishweshwara Rao, Sachin Pant, Madhumita Bhaskar and Preeti Rao Department of Electrical Engineering, IIT Bombay {vishu, sachinp,

More information

Lecture 9 Source Separation

Lecture 9 Source Separation 10420CS 573100 音樂資訊檢索 Music Information Retrieval Lecture 9 Source Separation Yi-Hsuan Yang Ph.D. http://www.citi.sinica.edu.tw/pages/yang/ yang@citi.sinica.edu.tw Music & Audio Computing Lab, Research

More information

HUMANS have a remarkable ability to recognize objects

HUMANS have a remarkable ability to recognize objects IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 9, SEPTEMBER 2013 1805 Musical Instrument Recognition in Polyphonic Audio Using Missing Feature Approach Dimitrios Giannoulis,

More information

SINGING PITCH EXTRACTION BY VOICE VIBRATO/TREMOLO ESTIMATION AND INSTRUMENT PARTIAL DELETION

SINGING PITCH EXTRACTION BY VOICE VIBRATO/TREMOLO ESTIMATION AND INSTRUMENT PARTIAL DELETION th International Society for Music Information Retrieval Conference (ISMIR ) SINGING PITCH EXTRACTION BY VOICE VIBRATO/TREMOLO ESTIMATION AND INSTRUMENT PARTIAL DELETION Chao-Ling Hsu Jyh-Shing Roger Jang

More information

IMPROVING GENRE CLASSIFICATION BY COMBINATION OF AUDIO AND SYMBOLIC DESCRIPTORS USING A TRANSCRIPTION SYSTEM

IMPROVING GENRE CLASSIFICATION BY COMBINATION OF AUDIO AND SYMBOLIC DESCRIPTORS USING A TRANSCRIPTION SYSTEM IMPROVING GENRE CLASSIFICATION BY COMBINATION OF AUDIO AND SYMBOLIC DESCRIPTORS USING A TRANSCRIPTION SYSTEM Thomas Lidy, Andreas Rauber Vienna University of Technology, Austria Department of Software

More information

DAY 1. Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval

DAY 1. Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval DAY 1 Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval Jay LeBoeuf Imagine Research jay{at}imagine-research.com Rebecca

More information

Topic 4. Single Pitch Detection

Topic 4. Single Pitch Detection Topic 4 Single Pitch Detection What is pitch? A perceptual attribute, so subjective Only defined for (quasi) harmonic sounds Harmonic sounds are periodic, and the period is 1/F0. Can be reliably matched

More information

Lecture 10 Harmonic/Percussive Separation

Lecture 10 Harmonic/Percussive Separation 10420CS 573100 音樂資訊檢索 Music Information Retrieval Lecture 10 Harmonic/Percussive Separation Yi-Hsuan Yang Ph.D. http://www.citi.sinica.edu.tw/pages/yang/ yang@citi.sinica.edu.tw Music & Audio Computing

More information

OCTAVE C 3 D 3 E 3 F 3 G 3 A 3 B 3 C 4 D 4 E 4 F 4 G 4 A 4 B 4 C 5 D 5 E 5 F 5 G 5 A 5 B 5. Middle-C A-440

OCTAVE C 3 D 3 E 3 F 3 G 3 A 3 B 3 C 4 D 4 E 4 F 4 G 4 A 4 B 4 C 5 D 5 E 5 F 5 G 5 A 5 B 5. Middle-C A-440 DSP First Laboratory Exercise # Synthesis of Sinusoidal Signals This lab includes a project on music synthesis with sinusoids. One of several candidate songs can be selected when doing the synthesis program.

More information

Automatic Piano Music Transcription

Automatic Piano Music Transcription Automatic Piano Music Transcription Jianyu Fan Qiuhan Wang Xin Li Jianyu.Fan.Gr@dartmouth.edu Qiuhan.Wang.Gr@dartmouth.edu Xi.Li.Gr@dartmouth.edu 1. Introduction Writing down the score while listening

More information

Topic 11. Score-Informed Source Separation. (chroma slides adapted from Meinard Mueller)

Topic 11. Score-Informed Source Separation. (chroma slides adapted from Meinard Mueller) Topic 11 Score-Informed Source Separation (chroma slides adapted from Meinard Mueller) Why Score-informed Source Separation? Audio source separation is useful Music transcription, remixing, search Non-satisfying

More information

Application Of Missing Feature Theory To The Recognition Of Musical Instruments In Polyphonic Audio

Application Of Missing Feature Theory To The Recognition Of Musical Instruments In Polyphonic Audio Application Of Missing Feature Theory To The Recognition Of Musical Instruments In Polyphonic Audio Jana Eggink and Guy J. Brown Department of Computer Science, University of Sheffield Regent Court, 11

More information

Automatic Construction of Synthetic Musical Instruments and Performers

Automatic Construction of Synthetic Musical Instruments and Performers Ph.D. Thesis Proposal Automatic Construction of Synthetic Musical Instruments and Performers Ning Hu Carnegie Mellon University Thesis Committee Roger B. Dannenberg, Chair Michael S. Lewicki Richard M.

More information

Musical Sound: A Mathematical Approach to Timbre

Musical Sound: A Mathematical Approach to Timbre Sacred Heart University DigitalCommons@SHU Writing Across the Curriculum Writing Across the Curriculum (WAC) Fall 2016 Musical Sound: A Mathematical Approach to Timbre Timothy Weiss (Class of 2016) Sacred

More information

Violin Timbre Space Features

Violin Timbre Space Features Violin Timbre Space Features J. A. Charles φ, D. Fitzgerald*, E. Coyle φ φ School of Control Systems and Electrical Engineering, Dublin Institute of Technology, IRELAND E-mail: φ jane.charles@dit.ie Eugene.Coyle@dit.ie

More information

Music Alignment and Applications. Introduction

Music Alignment and Applications. Introduction Music Alignment and Applications Roger B. Dannenberg Schools of Computer Science, Art, and Music Introduction Music information comes in many forms Digital Audio Multi-track Audio Music Notation MIDI Structured

More information

Using the new psychoacoustic tonality analyses Tonality (Hearing Model) 1

Using the new psychoacoustic tonality analyses Tonality (Hearing Model) 1 02/18 Using the new psychoacoustic tonality analyses 1 As of ArtemiS SUITE 9.2, a very important new fully psychoacoustic approach to the measurement of tonalities is now available., based on the Hearing

More information

Simple Harmonic Motion: What is a Sound Spectrum?

Simple Harmonic Motion: What is a Sound Spectrum? Simple Harmonic Motion: What is a Sound Spectrum? A sound spectrum displays the different frequencies present in a sound. Most sounds are made up of a complicated mixture of vibrations. (There is an introduction

More information

Tempo and Beat Tracking

Tempo and Beat Tracking Tutorial Automatisierte Methoden der Musikverarbeitung 47. Jahrestagung der Gesellschaft für Informatik Tempo and Beat Tracking Meinard Müller, Christof Weiss, Stefan Balke International Audio Laboratories

More information

Music Database Retrieval Based on Spectral Similarity

Music Database Retrieval Based on Spectral Similarity Music Database Retrieval Based on Spectral Similarity Cheng Yang Department of Computer Science Stanford University yangc@cs.stanford.edu Abstract We present an efficient algorithm to retrieve similar

More information

Melody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng

Melody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng Melody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng Introduction In this project we were interested in extracting the melody from generic audio files. Due to the

More information

Music Complexity Descriptors. Matt Stabile June 6 th, 2008

Music Complexity Descriptors. Matt Stabile June 6 th, 2008 Music Complexity Descriptors Matt Stabile June 6 th, 2008 Musical Complexity as a Semantic Descriptor Modern digital audio collections need new criteria for categorization and searching. Applicable to:

More information

MUSIC TRANSCRIPTION USING INSTRUMENT MODEL

MUSIC TRANSCRIPTION USING INSTRUMENT MODEL MUSIC TRANSCRIPTION USING INSTRUMENT MODEL YIN JUN (MSc. NUS) A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF COMPUTER SCIENCE DEPARTMENT OF SCHOOL OF COMPUTING NATIONAL UNIVERSITY OF SINGAPORE 4 Acknowledgements

More information

POLYPHONIC TRANSCRIPTION BASED ON TEMPORAL EVOLUTION OF SPECTRAL SIMILARITY OF GAUSSIAN MIXTURE MODELS

POLYPHONIC TRANSCRIPTION BASED ON TEMPORAL EVOLUTION OF SPECTRAL SIMILARITY OF GAUSSIAN MIXTURE MODELS 17th European Signal Processing Conference (EUSIPCO 29) Glasgow, Scotland, August 24-28, 29 POLYPHOIC TRASCRIPTIO BASED O TEMPORAL EVOLUTIO OF SPECTRAL SIMILARITY OF GAUSSIA MIXTURE MODELS F.J. Cañadas-Quesada,

More information

Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting

Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting Dalwon Jang 1, Seungjae Lee 2, Jun Seok Lee 2, Minho Jin 1, Jin S. Seo 2, Sunil Lee 1 and Chang D. Yoo 1 1 Korea Advanced

More information

International Journal of Advance Engineering and Research Development MUSICAL INSTRUMENT IDENTIFICATION AND STATUS FINDING WITH MFCC

International Journal of Advance Engineering and Research Development MUSICAL INSTRUMENT IDENTIFICATION AND STATUS FINDING WITH MFCC Scientific Journal of Impact Factor (SJIF): 5.71 International Journal of Advance Engineering and Research Development Volume 5, Issue 04, April -2018 e-issn (O): 2348-4470 p-issn (P): 2348-6406 MUSICAL

More information

Automatic characterization of ornamentation from bassoon recordings for expressive synthesis

Automatic characterization of ornamentation from bassoon recordings for expressive synthesis Automatic characterization of ornamentation from bassoon recordings for expressive synthesis Montserrat Puiggròs, Emilia Gómez, Rafael Ramírez, Xavier Serra Music technology Group Universitat Pompeu Fabra

More information

Singer Traits Identification using Deep Neural Network

Singer Traits Identification using Deep Neural Network Singer Traits Identification using Deep Neural Network Zhengshan Shi Center for Computer Research in Music and Acoustics Stanford University kittyshi@stanford.edu Abstract The author investigates automatic

More information

Statistical Modeling and Retrieval of Polyphonic Music

Statistical Modeling and Retrieval of Polyphonic Music Statistical Modeling and Retrieval of Polyphonic Music Erdem Unal Panayiotis G. Georgiou and Shrikanth S. Narayanan Speech Analysis and Interpretation Laboratory University of Southern California Los Angeles,

More information

We realize that this is really small, if we consider that the atmospheric pressure 2 is

We realize that this is really small, if we consider that the atmospheric pressure 2 is PART 2 Sound Pressure Sound Pressure Levels (SPLs) Sound consists of pressure waves. Thus, a way to quantify sound is to state the amount of pressure 1 it exertsrelatively to a pressure level of reference.

More information

Investigation of Digital Signal Processing of High-speed DACs Signals for Settling Time Testing

Investigation of Digital Signal Processing of High-speed DACs Signals for Settling Time Testing Universal Journal of Electrical and Electronic Engineering 4(2): 67-72, 2016 DOI: 10.13189/ujeee.2016.040204 http://www.hrpub.org Investigation of Digital Signal Processing of High-speed DACs Signals for

More information