Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips


IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. XX, NO. YY

Tomohiko Nakamura, Student Member, IEEE, Eita Nakamura, Member, IEEE, Shigeki Sagayama, Member, IEEE

arXiv: v1 [cs.SD] 24 Dec 2015

Abstract: This paper discusses real-time alignment of audio signals of music performance to the corresponding score (a.k.a. score following) that can handle tempo changes, errors and arbitrary repeats and/or skips (repeats/skips) in performances. This type of score following is particularly useful in automatic accompaniment for practices and rehearsals, where errors and repeats/skips are often made. Simple extensions of the algorithms previously proposed in the literature are not applicable in these situations for scores of practical length due to the problem of large computational complexity. To cope with this problem, we present two hidden Markov models of monophonic performance with errors and arbitrary repeats/skips, and derive efficient score-following algorithms with an assumption that the prior probability distributions of score positions before and after repeats/skips are independent of each other. We confirmed real-time operation of the algorithms with music scores of practical length (around notes) on a modern laptop and their tracking ability to the input performance within 0.7 s on average after repeats/skips in clarinet performance data. Further improvements and extension for polyphonic signals are also discussed.

Keywords: Score following, audio-to-score alignment, arbitrary repeats and skips, fast Viterbi algorithm, hidden Markov model, music signal processing

I. INTRODUCTION

Real-time alignment of an audio signal of a music performance to a given score, also known as score following, has been gathering attention since its first appearance in 1984 [1], [2].
Score following is a basic technique for real-time musical applications such as automatic accompaniment, automatic score page-turning [3] and automatic captioning for music videos. The technique is particularly essential for automatic accompaniment, which synchronizes an accompaniment to a performer on the fly, referring to performance and accompaniment scores. Automatic accompaniment enables live performance of ensemble music by one or a few performers. Many studies of score following have been carried out (see [4] for a review and [5]-[13] for recent progress).

Citation information: DOI /TASLP , IEEE/ACM Transactions on Audio, Speech, and Language Processing. (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See standards/publications/rights/index.html for more information.
T. Nakamura is with the Department of Information Physics and Computing, Graduate School of Information Science and Technology, the University of Tokyo, Tokyo, Japan (Tomohiko Nakamura@ipc.i.u-tokyo.ac.jp).
E. Nakamura is with the Graduate School of Informatics, Kyoto University, Kyoto, Japan (enakamura@am.kuis.kyoto-u.ac.jp).
S. Sagayama is a Professor Emeritus of the University of Tokyo, Tokyo, Japan and currently with the School of Interdisciplinary Mathematical Sciences, Meiji University, Tokyo, Japan (sagayama@meiji.ac.jp).

Automatic accompaniment is particularly useful for practices, rehearsals and personal enjoyment of ensemble music. In these situations, performers often make errors. Moreover, performers may want to start playing from the middle of a score and generally make repeats and/or skips (repeats/skips). Since errors and repeats/skips are hard to predict, a score-following algorithm capable of handling arbitrary errors and repeats/skips is necessary to realize an automatic accompaniment system effective in those situations. Our aim is to develop such an algorithm.

Treatment of errors in score following is discussed in some studies [4], [5], [13], [14].
However, a detailed discussion and a systematic evaluation of the effectiveness of the methods for audio score following have not been given in the literature. Score-following algorithms that can follow repeats/skips have been proposed in [5], [11], [15]. The targets of these algorithms are predetermined repeats/skips from and to specific score positions, and treatment of arbitrary repeats/skips is neither discussed nor guaranteed. In fact, as we will show in this paper, simple extensions of these algorithms have the problem of large computational cost and cannot work in real time for long scores of practical length. Unless the problem is solved, score-following systems can only work with scores of very short length, or we must give up following arbitrary repeats/skips as most of the current systems do; both options sacrifice the vast potential application of score following. Therefore, it is essential to reduce the computational complexity to follow arbitrary repeats/skips.

The authors have presented a new type of hidden Markov model (HMM) that describes musical instrument digital interface (MIDI) performances with errors and arbitrary repeats/skips, and derived a computationally efficient algorithm for the HMM [13]. It reduces the computational complexity with an assumption that simplifies the probability distribution of score positions before and after repeats/skips. While a similar model would be applicable to the audio case, further discussions are required since audio inputs (frame-wise discrete in time and continuous in features) significantly differ from MIDI inputs (continuous in time and discrete in pitches) in nature.

The main contribution of this paper is to present real-time algorithms that can follow monophonic audio performances containing arbitrary repeats/skips and errors. Although monophonic score following has been addressed since [1], [2],
arbitrary repeats/skips have never been discussed despite the practical importance of their treatment, as mentioned above. Because polyphonic score following is still an active field of research and the extension of the present method to polyphonic performances raises many additional issues discussed in Sec. V, we confine ourselves to monophonic performances.

We develop a model of music performances containing errors and arbitrary repeats/skips with an HMM. We first discuss how various types of errors can be incorporated into the model (Sec. II). Next, we extend the model to incorporate arbitrary repeats/skips. In order to solve the problem of large computational cost for following arbitrary repeats/skips, two HMMs with refined topologies are presented. We derive efficient score-following algorithms with reduced computational complexity based on both HMMs (Sec. III). We demonstrate that both algorithms can work in real time with scores of practical length on a modern laptop computer and are effective in following performances with errors and arbitrary repeats/skips through evaluations using clarinet performances during practice (Sec. IV). We discuss possible improvements and extensions of the proposed algorithms for polyphonic inputs (Sec. V). Part of this study (Sec. III and a part of Sec. IV) was reported in our previous conference paper [12].

II. SCORE FOLLOWING FOR PERFORMANCES WITH ERRORS

A. Variety in Audio Performance and Statistical Approach

Score following is generally challenging since audio signals of music performances widely vary even if the same score is used. Four typical sources of variety in monophonic audio performance are listed below.

(a) Acoustic variations: Spectral features of audio performances depend on musical instruments and are not stationary. In addition, audio performances usually include noise caused by the surrounding environment and musical instruments (e.g.
resonance, background noise, breath noise and other acoustics).
(b) Temporal fluctuations: The tempo of the performance and onset times and durations of performed notes deviate from those indicated in scores due to performers' skills, physical limitations of musical instruments and musical expressions. For example, performances during practice are often rendered in slow tempo to avoid errors.
(c) Performance errors: Performers may make errors due to lack of performance skills or mis-readings of the score. Errors are categorized into pitch errors (substitution errors), dropping notes (deletion errors) and adding extra notes (insertion errors) [1]. Besides, performers may make pauses between notes, for example, to turn a page of the score and to check the next note.
(d) Repeats/skips: Performers may repeat and/or skip phrases, in particular during practice. Furthermore, performers generally add or delete a repeated section.

These four sources of variety in monophonic audio performance make score following difficult and motivate us to study it. In particular, it is essential to adapt automatic accompaniment systems to the variety in order to keep synchronization to live performances. Although it is out of the scope of this paper, there are other sources of variety in music performance such as ornaments [6], [13], [16], [17] and improvisation [18], [19].

Fig. 1. A hierarchical hidden Markov model with two levels that describes a music performance with deletion, insertion and substitution errors. See text.

Recent score-following systems commonly use probabilistic models such as HMMs to capture the variety of audio performances, and their effectiveness has been well confirmed [4] (and references in the Introduction). They are particularly advantageous for capturing continuous variations of audio features and for handling errors, which are hard to predict. Therefore, we take the statistical approach in this study.

B. Performance HMM

We represent the performance score with $N$ musical events, each of which is a note or a rest. A performer reads the score from event to event and keeps making a sound corresponding to an event. This process of performance can be modeled with a hierarchical HMM with two levels [20], [21], which we call the performance HMM. The top level describes the progression of performed events, and the bottom level expresses the temporal structure of the audio signal in a performed event.

Events correspond to states (top states) of the top-level HMM (top HMM), and the performance is described as transitions between the top states. Let $z^{(\mathrm{top})}_t = 0, \dots, N-1$ denote the random variable describing the top state at the $t$-th frame ($t = 0, \dots, T-1$), and let $i$ and $j$ label a top state. The top HMM is parameterized by state transition probabilities $a_{j,i}$ and initial probabilities $\pi_i$:
\[ a_{j,i} := P(z^{(\mathrm{top})}_t = i \mid z^{(\mathrm{top})}_{t-1} = j), \tag{1} \]
\[ \pi_i := P(z^{(\mathrm{top})}_0 = i), \tag{2} \]
which satisfy $\sum_{i=0}^{N-1} \pi_i = 1$ and $\sum_{i=0}^{N-1} a_{j,i} = 1$ for all $j$.

Each top state is itself an HMM (bottom HMM), whose states (bottom states) correspond to subevents in an event, for
example, sustain of an instrumental sound, pauses between notes, etc. Let $L$ denote the number of bottom states in the top state, $z^{(\mathrm{bot})}_t = 0, \dots, L-1$ denote the random variable describing the bottom state at the $t$-th frame, and let $l$ and $l'$ label a bottom state. The state transitions of the bottom HMM are characterized by three kinds of probabilities. The initial probability $\pi^{(i)}_l$ describes the probability of a transition to bottom state $l$ when top state $i$ is entered, the exiting probability $e^{(i)}_l$ describes the probability of exiting top state $i$ from bottom state $l$, and the transition probability $a^{(i)}_{l',l} := P(z^{(\mathrm{bot})}_t = l \mid z^{(\mathrm{bot})}_{t-1} = l')$ represents the transition from bottom state $l'$ to bottom state $l$ in top state $i$. These probabilities satisfy $\sum_{l=0}^{L-1} \pi^{(i)}_l = 1$ and $\sum_{l=0}^{L-1} a^{(i)}_{l',l} + e^{(i)}_{l'} = 1$ for all $l'$ and $i$.

Thus, the performance is modeled as a sequence of $T$ pairs of random variables $\{(z^{(\mathrm{top})}_t, z^{(\mathrm{bot})}_t)\}_{t=0}^{T-1}$ (Fig. 1). For example, if the pair $z_t := (z^{(\mathrm{top})}_t, z^{(\mathrm{bot})}_t)$ equals $(i, l)$, the score position at frame $t$ is at bottom state $l$ of top state $i$.

Observed audio features are described as being stochastically generated from a bottom state. Given an audio feature $y_t := [y_{t,0}, y_{t,1}, \dots, y_{t,D-1}]$ at frame $t$ as a $D$-dimensional real vector, the emission probability of state $(i, l)$ is defined as
\[ b^{(i)}_l(y_t) := P(y_t \mid z_t = (i, l)). \tag{3} \]

C. Emission Probability and Substitution Error

From here to Sec. II-E, we consider the performance HMM with $L = 1$ for simplicity, but the case of $L > 1$ can be treated similarly.

To extract pitch information from the input signal, we need a suitable feature representation. In the comparison of some audio features in [7], [22], the magnitude of a constant-Q transform (CQT) [23] with a quality factor set to one semitone yielded the best result of score following for monophonic audio input.
Furthermore, normalizing magnitudes of CQTs such that $\sum_{d=0}^{D-1} y_{t,d} = 1$ makes them insusceptible to dynamic variations. Although one may think that the normalization makes it difficult to discriminate pauses from notes, the difference in spectral shape between pauses and notes can help the discrimination: the CQT of a pitched sound has clear peaks at its fundamental frequency and harmonics, whereas the CQTs at pauses are relatively flat. We use normalized magnitudes of CQTs (normalized CQTs) as audio features.

Let $k$ be the pitch index and $K$ be the set of possible pitches. For convenience, we indicate the pitches A0 to C8 in the range of a standard piano as $k = 21$ to $k = 108$ and silence as $k = -1$, so that $K = \{21, 22, \dots, 108\} \cup \{-1\}$. We assume that normalized CQTs corresponding to pitch $k$ follow a $D$-dimensional normal distribution with mean $\mu_k$ and covariance matrix $\Sigma_k$, denoted by $\mathcal{N}(y_t \mid \mu_k, \Sigma_k)$. The emission probability $b^{(i)}_0(y_t)$ of bottom state 0 of top state $i$ is given as
\[ b^{(i)}_0(y_t) = \sum_{k \in K} w^{(i)}_{k,0} \, \mathcal{N}(y_t \mid \mu_k, \Sigma_k). \tag{4} \]
Here $w^{(i)}_{k,0} \in [0, 1]$ is a mixture weight of pitch $k$ of bottom state 0 of top state $i$, which satisfies $\sum_{k \in K} w^{(i)}_{k,0} = 1$ for all $i$.

When substitution errors are not made, $w^{(i)}_{k,0} = 0$ unless $k = p_i$, where $p_i \in K$ denotes the pitch of event $i$ ($p_i = -1$ for a rest). On the other hand, to describe a performance with substitution errors, we have small positive values of $w^{(i)}_{k,0}$ for $k \neq p_i$ since a substitution error is represented by an emission of an audio feature with an incorrect pitch.

Fig. 2. A pause between notes is described with the pause state (gray disk) which emits audio features corresponding to silence.

D. Transition Probability and Deletion and Insertion Errors

Transition probabilities in the top level $a_{j,i}$ represent the frequency of the transitions between the events. If performances do not contain insertion and deletion errors, $a_{j,i} = 0$ unless $i = j + 1$.
We can express an insertion error and a deletion error with a self transition and a transition to the second next top state, which correspond to $a_{j,j}$ and $a_{j,j+2}$, respectively.

The self-transition probability $a^{(i)}_{0,0}$ of bottom state 0 of top state $i$ describes the expected duration of the corresponding event $d_i$, which is computed as a product of the note value of the event and the score-notated tempo:
\[ d_i = \sum_{k=1}^{\infty} k \, (a^{(i)}_{0,0})^{k-1} (1 - a^{(i)}_{0,0}) = \frac{1}{1 - a^{(i)}_{0,0}}. \tag{5} \]
If $d_i$ is shorter than a processing time interval, we put $a^{(i)}_{0,0} = 0$. This probabilistic representation of the event duration describes the temporal fluctuations of music performance.

E. Pauses between Notes

Pauses between notes can be introduced into the performance HMM by adding an extra bottom state with index 1, which we call a pause state (Fig. 2). The occurrence of the pause is expressed as a transition to the pause state, which corresponds to $a^{(i)}_{0,1}$. The duration of the extra pause is represented by the self-transition probability of the pause state $a^{(i)}_{1,1}$, which can be set similarly to Eq. (5). We put $a^{(i)}_{1,0} = 0$ and $\pi^{(i)}_1 = 0$ for all $i$. We assume that $b^{(i)}_1(y_t) = \mathcal{N}(y_t \mid \mu_{-1}, \Sigma_{-1})$.

F. Estimation of Score Positions

For the convenience of estimating score positions, we convert the performance HMM into an equivalent standard HMM. Its state corresponds to a bottom state of the performance HMM and is labeled with $(i, l)$. The standard HMM is
parameterized by emission probabilities $\tilde{b}_{(i,l)}(y_t)$, initial probabilities $\tilde{\pi}_{(i,l)}$, and transition probabilities $\tilde{a}_{(j,l'),(i,l)}$, defined by $\tilde{b}_{(i,l)}(y_t) := b^{(i)}_l(y_t)$, $\tilde{\pi}_{(i,l)} := \pi_i \pi^{(i)}_l$, and
\[ \tilde{a}_{(j,l'),(i,l)} := \begin{cases} a^{(i)}_{l',l} + e^{(i)}_{l'} a_{i,i} \pi^{(i)}_l & (i = j), \\ e^{(j)}_{l'} a_{j,i} \pi^{(i)}_l & (i \neq j). \end{cases} \tag{6} \]

Given observed normalized CQTs up to the $t$-th frame $y_{0:t} = \{y_\tau\}_{\tau=0}^{t}$, the score position at frame $t$ is estimated with the standard HMM by solving
\[ \operatorname*{arg\,max}_{z_t} P(z_t \mid y_{0:t}) = \operatorname*{arg\,max}_{z_t} P(y_{0:t}, z_t), \tag{7} \]
where
\[ P(y_{0:t}, z_t) = \sum_{z_{0:t-1}} \left( \prod_{\tau=1}^{t} \tilde{b}_{z_\tau}(y_\tau) \, \tilde{a}_{z_{\tau-1}, z_\tau} \right) \tilde{b}_{z_0}(y_0) \, \tilde{\pi}_{z_0}. \tag{8} \]
Here $z_{0:t-1}$ denotes $\{z_\tau\}_{\tau=0}^{t-1}$. Eq. (7) is derived from Bayes' theorem. This maximization problem can be solved efficiently with the forward algorithm. It computes the forward variable $\alpha_{t,z_t} := P(y_{0:t}, z_t)$ in a recursive manner:
\[ \alpha_{t,(i,l)} = \begin{cases} \tilde{b}_{(i,l)}(y_t) \displaystyle\sum_{j=0}^{N-1} \sum_{l'=0}^{L-1} \alpha_{t-1,(j,l')} \, \tilde{a}_{(j,l'),(i,l)} & (t \geq 1), \\ \tilde{b}_{(i,l)}(y_0) \, \tilde{\pi}_{(i,l)} & (t = 0). \end{cases} \tag{9} \]
Since $\tilde{a}_{(j,l'),(i,l)} = 0$ unless $0 \leq i - j \leq 2$, the complexity of computing $\alpha_{t,(i,l)}$ is of $O(LN)$ at each time step.

III. INCORPORATING ARBITRARY REPEATS/SKIPS AND FAST SCORE-FOLLOWING ALGORITHMS

A. Incorporating Arbitrary Repeats/Skips and Computational Complexity for Inference

So far, the top HMM is left-to-right and its states are connected only to their neighboring states. However, all top states must be connected to describe arbitrary repeats/skips, i.e. $a_{j,i} > 0$ for all $j$ and $i$. The model is a generalization of the performance models in previous studies [5], [11], [15].

Assuming $L = 1$ for simplicity and dropping the bottom-state subscripts from the parameters of the standard HMM and the forward variables as $\tilde{a}_{j,i} := \tilde{a}_{(j,0),(i,0)}$, $\tilde{b}_i(y_t) := \tilde{b}_{(i,0)}(y_t)$, $\tilde{\pi}_i := \tilde{\pi}_{(i,0)}$ and $\alpha_{t,i} := \alpha_{t,(i,0)}$, Eq. (9) can be rewritten as
\[ \alpha_{t,i} = \begin{cases} \tilde{b}_i(y_t) \displaystyle\sum_{j=0}^{N-1} \alpha_{t-1,j} \, \tilde{a}_{j,i} & (t \geq 1), \\ \tilde{b}_i(y_0) \, \tilde{\pi}_i & (t = 0). \end{cases} \tag{10} \]
Eq. (10) for $t \geq 1$ contains a summation over $N$ states for each $i$, and the complexity is of $O(N^2)$.
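As a rough illustration of where the $O(N^2)$ cost of Eq. (10) comes from, the baseline update can be sketched in a few lines of Python. This is only a minimal sketch under the $L = 1$ simplification: the function names, the dense list-of-lists transition table `trans` and the toy numbers are assumptions made for this example and are not taken from the paper's C++ implementation.

```python
def forward_step_baseline(alpha_prev, trans, emis):
    """One frame of Eq. (10) for t >= 1.

    alpha_prev[j] : forward variable alpha_{t-1,j}
    trans[j][i]   : transition probability a~_{j,i} (dense N x N table)
    emis[i]       : emission likelihood b~_i(y_t) of the current frame
    """
    n = len(alpha_prev)
    alpha = [0.0] * n
    for i in range(n):
        total = 0.0
        for j in range(n):  # N terms for each of the N states: O(N^2) per frame
            total += alpha_prev[j] * trans[j][i]
        alpha[i] = emis[i] * total
    return alpha


def estimate_position(alpha):
    """Eq. (7): the estimated score position maximizes the forward variable."""
    return max(range(len(alpha)), key=lambda i: alpha[i])


# Toy example with N = 3 events (all numbers are made up for illustration).
trans = [[0.5, 0.5, 0.0],
         [0.0, 0.5, 0.5],
         [0.2, 0.0, 0.8]]
alpha = forward_step_baseline([1.0, 0.0, 0.0], trans, [1.0, 2.0, 1.0])
print(alpha, estimate_position(alpha))
```

In a long-running follower the forward variables would normally be rescaled at every frame to avoid numerical underflow; that step is omitted here because it does not change the arg max of Eq. (7).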
As we will experimentally show in Sec. IV-A, this complexity is too large to run in real time with scores of practical length on a modern laptop. Therefore, it is crucial to reduce the complexity. It is noteworthy that a similarly large complexity can emerge even if only specific repeats/skips are allowed (e.g. transitions between the first notes of bars in a score), since the number of such specific transitions often increases in proportion to $N$.

One may think that pruning techniques can be used to reduce the computational complexity. However, pruning is ineffective here since repeats/skips seldom occur, and it is necessary to take all transitions into account.

Computing all transitions is also beneficial in following performances without repeats/skips. When an estimation error of score position occurs, a score follower may fail to track the performance and become lost. It often happens that a score follower with a pruning technique (e.g. with a limited search window) cannot recover from being lost. By contrast, if a score follower searches all transitions, it can find the correct score position again after a while if the performer continues the performance.

B. Reduction of Computational Complexity by Factorizing Probabilities of Repeats/Skips

One method to reduce the computational complexity while computing all transitions is to introduce some constraints on the transition probabilities. In [13], reduction of the computational complexity is achieved with an assumption that the probability of score positions where performers stop before repeats/skips (stop positions) is the same regardless of where they resume performing after repeats/skips (resumption positions). We shall introduce this assumption to the performance HMM. The transition probability of a repeat/skip from event $j$ to event $i$ is then written as a product of two probabilities $s_j$ and $r_i$: $s_j$ is the probability of stopping at event $j$ before a repeat/skip, and $r_i$ is the probability of resuming a performance at event $i$ after a repeat/skip.
The transition probability of the top HMM is then written as
\[ a_{j,i} = a^{(\mathrm{nbh})}_{j,i} + s_j r_i, \tag{11} \]
where $a^{(\mathrm{nbh})}_{j,i}$ is a band matrix satisfying $a^{(\mathrm{nbh})}_{j,i} = 0$ unless $0 \leq i - j \leq 2$. The parameter $a^{(\mathrm{nbh})}_{j,i}$ characterizes transitions within neighboring states and is determined according to the normalization constraint of $a_{j,i}$, which is written as $1 = \sum_i a_{j,i} = \sum_i a^{(\mathrm{nbh})}_{j,i} + s_j \sum_i r_i$ for all $j$. Without loss of generality, we can assume $\sum_i r_i = 1$ and then we have $\sum_i a^{(\mathrm{nbh})}_{j,i} = 1 - s_j$.

Let us denote the set of neighboring states of top state $i$ by $\mathrm{nbh}(i) := \{j;\ j = 0, \dots, N-1,\ 0 \leq i - j \leq 2\}$. The transition probability of the standard HMM $\tilde{a}_{j,i}$ for $j \notin \mathrm{nbh}(i)$ is written as
\[ \tilde{a}_{j,i} = e^{(j)}_0 s_j r_i \pi^{(i)}_0. \tag{12} \]
With Eqs. (12) and (10), we have
\[ \alpha_{t,i} = \tilde{b}_i(y_t) \left\{ \sum_{j \in \mathrm{nbh}(i)} \alpha_{t-1,j} \, \tilde{a}_{j,i} + r_i \pi^{(i)}_0 \left( \sum_{j=0}^{N-1} \alpha_{t-1,j} \, e^{(j)}_0 s_j - \sum_{j \in \mathrm{nbh}(i)} \alpha_{t-1,j} \, e^{(j)}_0 s_j \right) \right\}. \tag{13} \]
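The factorized update of Eq. (13) can be checked numerically against the dense update of Eq. (10). The following Python sketch does this for $L = 1$ (so that $\pi^{(i)}_0 = 1$ and $e^{(j)}_0 = 1 - a^{(j)}_{0,0}$). The data layout, in which `a_nbh[j]` holds the three band entries $a^{(\mathrm{nbh})}_{j,j}$, $a^{(\mathrm{nbh})}_{j,j+1}$, $a^{(\mathrm{nbh})}_{j,j+2}$, together with all function names and toy parameter values, is an assumption of this example rather than the authors' implementation.

```python
def std_transition(j, i, a_nbh, s, r, a00):
    """a~_{j,i} for L = 1, built from Eqs. (6) and (11) with pi_0^(i) = 1."""
    band = a_nbh[j][i - j] if 0 <= i - j <= 2 else 0.0
    top = band + s[j] * r[i]            # Eq. (11): a_{j,i}
    value = (1.0 - a00[j]) * top        # e_0^(j) * a_{j,i}
    if i == j:
        value += a00[i]                 # staying inside the same event
    return value


def forward_step_dense(alpha_prev, emis, a_nbh, s, r, a00):
    """Reference O(N^2) update of Eq. (10)."""
    n = len(alpha_prev)
    return [emis[i] * sum(alpha_prev[j] * std_transition(j, i, a_nbh, s, r, a00)
                          for j in range(n))
            for i in range(n)]


def forward_step_factorized(alpha_prev, emis, a_nbh, s, r, a00):
    """O(N) update of Eq. (13)."""
    n = len(alpha_prev)
    stop = [alpha_prev[j] * (1.0 - a00[j]) * s[j] for j in range(n)]
    stop_total = sum(stop)              # independent of i: computed once per frame
    alpha = []
    for i in range(n):
        nbh = range(max(0, i - 2), i + 1)   # nbh(i) = {j : 0 <= i - j <= 2}
        local = sum(alpha_prev[j] * std_transition(j, i, a_nbh, s, r, a00)
                    for j in nbh)
        far = r[i] * (stop_total - sum(stop[j] for j in nbh))
        alpha.append(emis[i] * (local + far))
    return alpha


# Toy parameters for N = 5 events (made-up values; each band row sums to 1 - s_j).
a_nbh = [[0.10, 0.80, 0.05] for _ in range(5)]
s = [0.05] * 5
r = [0.2] * 5
a00 = [0.5, 0.6, 0.7, 0.4, 0.3]
alpha_prev = [0.10, 0.20, 0.30, 0.25, 0.15]
emis = [1.0, 0.5, 2.0, 1.5, 0.7]

fast = forward_step_factorized(alpha_prev, emis, a_nbh, s, r, a00)
dense = forward_step_dense(alpha_prev, emis, a_nbh, s, r, a00)
print(max(abs(x - y) for x, y in zip(fast, dense)))
```

The only quantity shared across all $i$ is `stop_total`; every other sum runs over at most three neighbors, which is what brings the per-frame cost down from $O(N^2)$ to $O(N)$.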

Since the first summation in the parentheses of the second term is independent of $i$, it is sufficient to calculate it once at each time step. This term and the rest of Eq. (13) are of $O(N)$, and hence the total computational complexity is $O(N)$. The space complexity is also reduced: the transition probability matrix in the top level is now parameterized by $4(N-1)$ parameters ($s_j$, $r_i$ and $a^{(\mathrm{nbh})}_{j,i}$), whereas it originally has $N(N-1)$ parameters.

This result can be generalized for the performance HMM with $L > 1$. The standard HMM has $LN$ states and updating $\alpha_{t,(i,l)}$ at each time step is of $O((LN)^2)$ according to Eq. (9). If we introduce the above assumption, the transition probability of the standard HMM $\tilde{a}_{(j,l'),(i,l)}$ can also be divided into a component dependent only on $i, l$ and a component dependent only on $j, l'$. Therefore, the total computational complexity is reduced to $O(LN)$ (see Appendix B for details). Importantly, this reduction method can be used regardless of the topology of the bottom HMMs, and it is compatible with the pause states and applicable to performance HMMs with more complex structure of bottom HMMs (e.g. [6], [20], [24], [25]).

A similar reduction method is valid for the Viterbi algorithm and the backward algorithm. The method can be applied to any HMM and to similar dynamic programming techniques as well, and it can be useful for applications other than score following (e.g. timbre editing of music signals [26]).

Fig. 3. A repeat/skip can be described with two-step transitions via the break state representing silent breaks.

C. Explicit Description of Silent Breaks at Repeats/Skips

We can achieve a similar reduction of the computational complexity by using another assumption on arbitrary repeats/skips. Performers frequently make silent breaks at repeats/skips to get ready for resuming the performance. In fact, 59 of 63 repeats/skips were accompanied by breaks longer than 500 ms in the actual performances used in Sec. IV-C1.
Let us represent the silent breaks by introducing an additional state (the break state) as the $N$-th top state. The duration of the breaks is described with the self-transition probability of the bottom state of the break state $a^{(N)}_{0,0}$, and its value is determined similarly to Eq. (5). Repeats/skips are represented as two-step transitions via the break state (Fig. 3). Stopping (resuming) a performance is expressed as transitions to (from) the break state whose probability is denoted by $s_j$ ($r_i$, respectively).

We note that the top states excluding the break state are connected only to neighboring top states, and thus $\tilde{a}_{j,i} = 0$ if $j \notin \mathrm{nbh}(i)$ for all $i, j \neq N$. On the other hand, the break state is connected to all top states except itself, i.e. we put $a_{N,N} = 0$ in the top level. The transition probability of the standard HMM from or to the break state is written as
\[ \tilde{a}_{j,N} = e^{(j)}_0 s_j \pi^{(N)}_0 \quad (j \neq N), \tag{14} \]
\[ \tilde{a}_{N,i} = \begin{cases} e^{(N)}_0 r_i \pi^{(i)}_0 & (i \neq N), \\ a^{(N)}_{0,0} & (i = N), \end{cases} \tag{15} \]
where $e^{(N)}_0$ $(= 1 - a^{(N)}_{0,0})$ and $\pi^{(N)}_0$ $(= 1)$ denote the exiting probability and the initial probability of state $(N, 0)$. For this model, Eq. (10) for $t \geq 1$ can be written as
\[ \alpha_{t,i} = \begin{cases} \tilde{b}_i(y_t) \left( \displaystyle\sum_{j \in \mathrm{nbh}(i)} \alpha_{t-1,j} \, \tilde{a}_{j,i} + \alpha_{t-1,N} \, \tilde{a}_{N,i} \right) & (i \neq N), \\ \tilde{b}_N(y_t) \displaystyle\sum_{j=0}^{N-1} \alpha_{t-1,j} \, \tilde{a}_{j,N} & (i = N). \end{cases} \tag{16} \]
We see that updating $\alpha_{t,i}$ involves summation of at most four terms for each $i \neq N$ and $N$ terms for $i = N$. The total complexity is thus $O(N)$ for each time step. This reduction method can also be extended to the case of $L > 1$ (see Appendix C).

It is noteworthy that the performance HMM with the break state is related to the performance HMM presented in Sec. III-B. If we assume that transitions go through the break state in no time, the two-step transition from top state $j$ to top state $i$ via the break state is reduced to the direct transition from top state $j$ to top state $i$, and its probability is written as a product of $s_j$ and $r_i$. In other words, the difference between these models is whether breaks are explicitly described.
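The break-state update can be sketched along the same lines. The Python code below is a minimal sketch for $L = 1$, with the break state stored at index $N$ of the forward vector. The function name, the banded layout `a_top[j] = [a_{j,j}, a_{j,j+1}, a_{j,j+2}]` and the toy values are assumptions of this example. One further assumption should be flagged clearly: besides the $N$ terms written in Eq. (16) for $i = N$, the sketch also adds the self-transition $\alpha_{t-1,N}\,\tilde{a}_{N,N}$ with $\tilde{a}_{N,N} = a^{(N)}_{0,0}$ from Eq. (15), so that probability mass can remain in the break state for more than one frame.

```python
def forward_step_break(alpha_prev, emis, a_top, s, r, a00):
    """One forward step for the break-state model (Eqs. (14)-(16)), L = 1.

    alpha_prev, emis and a00 have length N + 1; index N is the break state.
    a_top[j] = [a_{j,j}, a_{j,j+1}, a_{j,j+2}] are the banded top-level
    transitions of the N score events; s[j] and r[i] have length N.
    """
    n = len(s)                                 # number of score events N
    exit_p = [1.0 - a for a in a00]            # e_0^(j) = 1 - a_00^(j)
    alpha = [0.0] * (n + 1)
    from_break = alpha_prev[n] * exit_p[n]     # alpha_{t-1,N} * e_0^(N), shared by all i
    for i in range(n):
        total = from_break * r[i]              # Eq. (15), i != N
        for j in range(max(0, i - 2), i + 1):  # nbh(i): at most three terms
            a_ji = exit_p[j] * a_top[j][i - j] # Eq. (6)
            if i == j:
                a_ji += a00[i]                 # staying inside the same event
            total += alpha_prev[j] * a_ji
        alpha[i] = emis[i] * total
    # i = N: the N stopping terms of Eq. (14) ...
    into_break = sum(alpha_prev[j] * exit_p[j] * s[j] for j in range(n))
    # ... plus the break-state self-transition of Eq. (15) (assumption, see lead-in).
    into_break += alpha_prev[n] * a00[n]
    alpha[n] = emis[n] * into_break
    return alpha


# Toy parameters for N = 4 events plus the break state (made-up values).
a_top = [[0.1, 0.7, 0.1] for _ in range(4)]    # each row sums to 1 - s_j
s = [0.1] * 4
r = [0.25] * 4
a00 = [0.5, 0.5, 0.5, 0.5, 0.9]                # last entry: break state
emis = [1.0, 2.0, 0.5, 1.0, 0.2]
print(forward_step_break([1.0, 0.0, 0.0, 0.0, 0.0], emis, a_top, s, r, a00))
```

Each score event receives at most four contributions (three neighbors and the break state) and the break state receives one pass over the events, so the per-frame cost stays linear in $N$.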
Since it is difficult to quantify its effect on the performance of score following analytically, we will evaluate the effect through an experiment in Sec. IV-C2.

IV. EXPERIMENTAL EVALUATION OF THE PROPOSED SCORE-FOLLOWING ALGORITHMS

A. Processing Time

We measured processing times in order to evaluate the reduction of the computational complexity with the proposed algorithms. The processing time depends on the number of events $N$ and virtually not on other score content and signal content. We used synthetic scores with 10 to $10^6$ events (practical scores contain $O(10^3)$ to $O(10^4)$ notes; for instance, there are around 2200 events in the clarinet part of the first movement of Mozart's Clarinet Quintet) and a random signal of two seconds length with a sampling rate of 16 kHz as an audio input. Normalized CQTs were computed with a frame length of 128 ms and a hopsize of 20 ms. Their center frequencies ranged from 55 to 7040 Hz at a semitone interval, and the quality factor was set to 16, which approximately corresponds to one semitone. Algorithms were implemented in C++ on a computer with a 3.30 GHz CPU (Intel(R) Core(TM) i CPU) and 8 GB memory running Debian.

Processing times averaged over 100 frames with standard errors are shown in Fig. 4 for the algorithms proposed in Sec. III-C (break algorithm) with and without the pause states ($L = 2$ and $L = 1$) and the algorithm that calculates $\alpha_{t,i}$
according to Eq. (10) (baseline algorithm). (The results for the algorithm proposed in Sec. III-B (no-break algorithm) did not significantly differ from the results for the break algorithms.) It can be confirmed that the average processing times increased asymptotically in proportion to $N^2$ ($N$) with the baseline algorithm (the break algorithms, respectively). The result shows that the proposed algorithms significantly suppress the increase of processing times. The processing times for $N \geq 1000$ were larger than the hopsize with the baseline algorithm, and the algorithm can work in real time only with scores of up to $O(10^2)$ events, which is the size of short music pieces. By contrast, the average processing times were below the hopsize for $N \leq$ ($N \leq 50000$) with the break algorithm with (without, respectively) the pause states. Therefore, the proposed algorithms with and without the pause states can work in real time with scores of up to $O(10^3)$ events and $O(10^4)$ events, respectively. Note that processing times depend on the computing power, but their relative values remain almost the same and the proposed algorithms are always effective in reducing the computational complexity.

Fig. 4. Average processing times [s] with standard errors with respect to the number of events. "W/ pause" and "W/o pause" represent the break algorithm with and without the pause states, respectively. "Baseline" represents a simple extension of the algorithms proposed in previous studies [5], [11], [15].

B. Score-Following Accuracy for Performances with Errors

1) Data Preparation: To evaluate the score-following accuracy for performances with errors, we conducted an experiment using the Bach10 dataset [27]. It consists of audio recordings of ten four-part chorales by J. S. Bach.
The soprano, alto, tenor and bass parts of each piece were separately recorded and performed by the violin, clarinet, saxophone and bassoon, respectively. Their durations ranged from 25 to 41 seconds. Since the performances did not contain errors, we simulated errors by randomly inserting, dropping and substituting notes in each score, which correspond to deletion, insertion and substitution errors in the performance, respectively. Their probability values were obtained from the MIDI piano performances during practice in [13]: for deletion errors and for insertion errors. For simplicity, substitution errors were restricted to three types typical in clarinet performances, namely errors of a semitone, a whole tone and a perfect 12th. The first two errors are often caused by fingering errors and mis-readings of the score, and the last error is caused by overblowing on a clarinet. The probability values of the three pitch errors were , and in the simulation, where the probability of the perfect 12th pitch error was substituted by that of the octave pitch errors obtained in [13].

2) Experimental Conditions: We conducted a preliminary experiment and set the parameters for performance errors as follows: $a_{i,i+2} =$ for deletion errors, $a_{i,i} = 0$ for insertion errors, and $a^{(i)}_{1,1} =$ and $a^{(i)}_{0,1} =$ for pauses between notes. Although the mixture weight $w^{(i)}_{k,0}$ can in principle be learned from audio signals for each $k$ and $i$, it is difficult to obtain the weights independently because a sufficiently large amount of performance data is not available. To reduce the number of parameters, we considered only the three most important substitution errors described in the previous section. The mixture weights $w^{(i)}_{k,0}$ for the errors were designed in proportion to their frequencies used in the simulation:
\[ w^{(i)}_{k,0} = \begin{cases} 1 - C & (k = p_i), \\ \underline{\hspace{1.5em}}\,C & (k = p_i \pm 1), \\ \underline{\hspace{1.5em}}\,C & (k = p_i \pm 2), \\ \underline{\hspace{1.5em}}\,C & (k = p_i \pm 19), \\ 0 & (\text{otherwise}) \end{cases} \tag{17} \]
for all $p_i \neq -1$, where $C$ is the probability of pitch errors.
The value of $C$ was optimized in a preliminary experiment and we set $C =$ . For $p_i = -1$, we put $w^{(i)}_{k,0} = 0$ unless $k = -1$. The probabilities of stopping and resuming a performance $s_j$, $r_i$ were set uniformly in $i$, $j$: $s_0 = s_1 = \dots = s_{N-1} = x$ for some positive $x$ and $r_0 = r_1 = \dots = r_{N-1} = 1/N$. Since the value of $a^{(N)}_{0,0}$ did not significantly change the result in a preliminary experiment, we fixed $a^{(N)}_{0,0} =$ .

The accuracy of score following generally depends on the parameters of the emission probabilities. It has been reported that learning them from audio performances improves the accuracy [10], [22], and thus we learned the parameters $\mu_k$ and $\Sigma_k$ from audio signals. The parameters can be learned for each musical instrument if the necessary data is available, which yields a detailed model for a specific instrument. Alternatively, we can use a set of data consisting of several musical instruments to form a general model that can be applied to a wider class of instruments. Such a learning method is applicable to any instrument in principle, and it can be even more effective for musical instruments with complex signals, for which physical modeling or manual spectrum-template construction is more difficult. In general, there is a tradeoff between the generalization capability and the adaptation ability. Here, we learned the parameters with performance data of several musical instruments and used them to measure the accuracy of score following. The learning data consisted of performances played by the violin and clarinet in the RWC musical instrument database [28]. To reduce overfitting, we assumed that $\Sigma_k$ is diagonal and
introduced a lower bound, or a flooring value $F$, on the diagonal elements of $\Sigma_k$. The introduction of $F$ is called the flooring method and is generally used for speech recognition (e.g. see [29]). We conducted a preliminary experiment and found the optimal $F =$ . The initial probabilities were set as $\pi_i = 0$ for $i \neq 0$ and $\pi_0 = 1$.

Fig. 5. Average piecewise precision rates and standard errors with respect to $s_j$ for audio performances obtained by simulating errors. The break algorithm ("Break") and the no-break algorithm ("No-Break") without the pause states are compared to Antescofo [6].

We compared the proposed algorithms with Antescofo [6], which is one of the best-known score-following systems, applied to various musical pieces and used in the most severe artistic situations. Antescofo was not developed to cope with repeats/skips in monophonic performances, and has no special treatment for repeats/skips. It had the best accuracy in the music information retrieval evaluation exchange (MIREX 2006) [30], which is the most famous evaluation contest in this field. Since Antescofo ended score following when the last note in the score was estimated, estimated score positions were assumed to be the last note from the time when Antescofo ended score following.

The overall accuracy of score following was measured by the piecewise precision rate (PPR), defined as the piecewise average rate of onsets correctly detected within $\Delta$ ms error. The PPR has been used with $\Delta = 2000$ ms in MIREX [30], [31].

3) Results: Tab. I summarizes average PPRs and standard errors with $\Delta = 300$ ms for every musical instrument. The results for the no-break algorithm did not significantly differ from the results for the break algorithm when $s_j = 0.0$.
We found that the proposed algorithms provided similar accuracies for the saxophone and bassoon data, which were not contained in the learning data, compared to the clarinet and violin data. The PPRs obtained with the proposed algorithms were similar to those obtained with Antescofo in all data.

TABLE I. AVERAGE PIECEWISE PRECISION RATES AND STANDARD ERRORS FOR VIOLIN, CLARINET, SAXOPHONE AND BASSOON PERFORMANCES WITH ERRORS. "PROPOSED ($s_j = 0$)" ("ANTESCOFO") DENOTES THE BREAK ALGORITHM WITH $s_j = 0$ (ANTESCOFO [6], RESPECTIVELY).

Musical instrument    Proposed (s_j = 0)    Antescofo
Violin                0.72 ±                ± 0.06
Clarinet              0.61 ±                ± 0.08
Saxophone             0.63 ±                ± 0.06
Bassoon               0.76 ±                ± 0.04

TABLE II. THE NUMBER OF ERRORS AND REPEATS/SKIPS IN THE USED CLARINET PERFORMANCES.

         Pauses between notes    Deletion error    Insertion error    Substitution error    Repeats/skips
Count

Fig. 5 illustrates average PPRs and standard errors with $\Delta = 300$ ms. As described in Sec. III-A, computing all transitions helps the score follower recover from being lost. This benefit is confirmed by the fact that the proposed algorithms with $s_j =$ provided around 0.05 higher accuracy than Antescofo, which searches only local transitions. On the other hand, values of $s_j$ larger than caused frequent overdetection of repeats/skips, and the accuracy became lower than for $s_j = 0$. A similar tendency was observed in PPR with $\Delta = 500$ and 2000 ms.

Large values of $s_j$ deteriorated the score-following accuracy of the present algorithms, as shown in Fig. 5. This is because the larger $s_j$ is, the more frequently the algorithms may misestimate insertion/deletion/substitution errors as repeats/skips. We indeed confirmed that the number of misdetected repeats/skips increased with larger $s_j$. There was around 0.1 difference in PPR between the algorithms when $s_j$ is large.
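The PPR values reported above can be computed along the following lines. This is a minimal sketch in Python under stated assumptions: each piece is given as a pair of reference and estimated onset times in seconds, an onset that the follower never reported is marked `None`, and the per-piece rates are averaged with equal weight. The function names and toy data are hypothetical, and the official MIREX scoring may differ in details.

```python
def precision_rate(ref_onsets, est_onsets, tol_ms):
    """Fraction of reference onsets whose estimate lies within tol_ms.

    est_onsets[k] is the estimated time (s) of the k-th score onset,
    or None if the follower never reported it.
    """
    tol = tol_ms / 1000.0
    hits = sum(
        1
        for ref, est in zip(ref_onsets, est_onsets)
        if est is not None and abs(est - ref) <= tol
    )
    return hits / len(ref_onsets)


def piecewise_precision_rate(pieces, tol_ms):
    """Average the per-piece precision rates over all pieces."""
    rates = [precision_rate(ref, est, tol_ms) for ref, est in pieces]
    return sum(rates) / len(rates)


# Two toy pieces with made-up onset times in seconds.
pieces = [
    ([0.0, 1.0, 2.0, 3.0], [0.1, 1.5, 2.2, None]),  # 2 of 4 within 300 ms
    ([0.0, 0.5], [0.05, 0.6]),                      # 2 of 2 within 300 ms
]
print(piecewise_precision_rate(pieces, tol_ms=300))  # 0.75
```

Raising the tolerance to 2000 ms turns the 0.5 s miss in the first piece into a hit, which illustrates why the rates depend on the chosen $\Delta$.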
We found that the total number of misdetected repeats/skips by the no-break algorithm was around 1.2 times larger than that of the break algorithm for $s_j$ . Since the break algorithm assumes that repeats/skips always accompany breaks, and the simulated errors did not accompany pauses, the results suggest that the explicit description of the breaks reduced misestimations of the errors as repeats/skips.

C. Score-Following Accuracy for Performances with Errors and Repeats/Skips

1) Performance Data During Practice: We collected 16 audio recordings of clarinet performances with durations ranging from 31 to 213 s (28 min 48 s in total). We asked an amateur clarinetist to freely practice seven music pieces comprising classical and popular music pieces and nursery rhymes, partially taken from the RWC music database [28]. His performances were recorded with a vibration microphone attached to the clarinet. The performances were aligned to the notes in the scores by one of the authors. The total number of performed notes was 2672, and Tab. II lists the counts of errors and repeats/skips. Tab. III summarizes differences in score times before and after repeats/skips in the performance data, and we see that they contain repeats/skips between remote score positions. Here, only breaks and pauses between notes longer than 500 ms were counted, since it is difficult to accurately annotate offsets of performed notes and short silent breaks and pauses between notes. All transitions with $j \notin \mathrm{nbh}(i)$ were counted as repeats/skips, where $i$ and $j$ denote stop and resumption positions.

TABLE III. STATISTICS OF DIFFERENCES IN SCORE TIMES BEFORE AND AFTER REPEATS/SKIPS IN THE PERFORMANCE DATA. QU. IS AN ABBREVIATION FOR QUARTILE.

Score time    Min.    1st Qu.    Median    Mean    3rd Qu.    Max.
In second
In event

2) Results: The parameters were the same as in Sec. IV-B2. To measure how well the algorithms followed repeats/skips, we calculated a detection rate of repeats/skips and the time interval between a repeat/skip and its detection, which we call the following time. A repeat/skip was defined to be detected if there was a correctly estimated frame before the next repeat/skip or the end of the audio recording.

Fig. 6. Average piecewise precision rates with standard errors with respect to $s_j$. The algorithms are the same as in Fig. 5.

Fig. 6 illustrates average PPRs with standard errors for $\Delta = 300$ ms. Both proposed algorithms outperformed Antescofo at all values of $s_i$, clearly showing that the proposed algorithms are effective in following performances with errors and repeats/skips. A similar tendency was observed in PPR with $\Delta = 100$, 500 and 2000 ms. We also measured the effect of adding the pause states in the proposed algorithms with $s_i =$ , and found that it increased PPRs by 0.05 on average.

Tab. IV summarizes the detection rates of repeats/skips, and Fig. 7 illustrates averages of following times over all detected repeats/skips (average following times) and standard errors in seconds. Since the standard error for Antescofo was too large to display in the figure, only the average value is shown. Both proposed algorithms clearly outperformed Antescofo in the detection rate and the following time. For example, compared to Antescofo, both proposed algorithms with $s_j =$ detected 14 times more repeats/skips and caught up with them 20 times faster. These results show that the proposed models are effective for repeats/skips.
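The detection criterion and the following time defined above can be sketched as follows. This Python example rests on assumptions that the text does not fix: positions are compared frame by frame, a frame counts as correctly estimated when the estimated score position equals the annotated one exactly, and the following time is measured to the first such frame. All names and data are hypothetical.

```python
def following_times(frame_times, est_pos, true_pos, jump_times):
    """Return one following time (s) per repeat/skip, or None if undetected.

    A repeat/skip at time T counts as detected when some frame in
    [T, next repeat/skip or end of recording) has est_pos == true_pos.
    """
    results = []
    bounds = list(jump_times[1:]) + [float("inf")]
    for start, end in zip(jump_times, bounds):
        hit = None
        for t, est, ref in zip(frame_times, est_pos, true_pos):
            if start <= t < end and est == ref:
                hit = t - start  # time from the jump to the first correct frame
                break
        results.append(hit)
    return results


# Toy recording: the player jumps from event 1 to 5 at 1.0 s and from 7 to 2 at 2.5 s.
frame_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
true_pos = [0, 1, 5, 6, 7, 2, 3, 4]
est_pos = [0, 1, 2, 6, 7, 8, 8, 8]  # the follower never recovers after 2.5 s

times = following_times(frame_times, est_pos, true_pos, [1.0, 2.5])
detected = [t for t in times if t is not None]
detection_rate = len(detected) / len(times)
print(times, detection_rate)  # [0.5, None] 0.5
```

Averaging `detected` gives the average following time over detected repeats/skips, the quantity plotted against $s_j$ in the figures.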
The break algorithm (the no-break algorithm) with $s_j =$ detected 56 (57) repeats/skips, but failed to detect seven (six, respectively) repeats/skips. These failures were caused by the existence of sections and phrases similar to each other in the scores (e.g., choruses in popular music) and considerably short performances between repeats/skips. For example, nine performances between repeats/skips were shorter than five seconds. Most of the repeats/skips accompanied silent breaks, but the break algorithm provided similar results to the no-break algorithm. This is because the top states associated with rests can play the same role as the break state, since these top states were connected to all top states.

TABLE IV. DETECTION RATES OF REPEATS/SKIPS FOR VARYING $s_j$. THE ALGORITHMS ARE THE SAME AS IN FIG. 5.

s_j    Break    No-Break    Antescofo
       /63      60/
       /63      59/
       /63      60/
       /63      59/
       /63      57/
       /63      55/
       /63      55/
       /63      43/
       /63      13/63       4/63

Fig. 7. Average following time and standard error for varying $s_j$. The algorithms are the same as in Fig. 5. For Antescofo, only the average following time is shown.

Furthermore, we measured following times and detection rates for performances played by other musical instruments. The audio recordings in the Bach10 dataset did not contain repeats/skips, so we synthesized performances containing repeats/skips by randomly jumping between breaks in each recording with a probability of 0.1 and inserting silent breaks at repeats/skips. The durations of the breaks were sampled uniformly from 0.5 to 30 seconds, and each synthesized performance was forced to contain at least one repeat/skip. After the synthesis, errors were simulated in the same way as in Sec. IV-B1. Tab. V summarizes detection rates of repeats/skips for every musical instrument. The proposed algorithms with $s_j =$ outperformed Antescofo in the detection rate, and we found a similar tendency in the PPR and the following time, as shown in Fig. 8 (a) and (b), respectively. These results show that the proposed algorithms are also effective in following performances with errors and repeats/skips for various musical instruments.

Fig. 8. (a) Average piecewise precision rates and (b) average following times with respect to $s_j$ for audio performances with simulated errors and repeats/skips. The algorithms are the same as in Fig. 5, and only the average following time is shown for Antescofo in the right panel.

TABLE V. DETECTION RATES OF REPEATS/SKIPS FOR VIOLIN, CLARINET, SAXOPHONE AND BASSOON DATA WITH SIMULATED ERRORS AND REPEATS/SKIPS. THE ALGORITHMS ARE THE SAME AS IN FIG. 5, AND $s_j =$ WAS USED IN BOTH PROPOSED ALGORITHMS.

Musical instrument    Break    No-Break    Antescofo
Violin                13/13    13/13       2/13
Clarinet              11/11    10/11       4/11
Saxophone             11/12    11/12       2/12
Bassoon               10/10    10/10       0/10

A demonstration video of an automatic accompaniment system using the break algorithm without the pause states is available on YouTube [32]. In the video, the break algorithm successfully follows the performances during practice and catches up with the performances after repeats/skips within a few seconds.

V. DISCUSSIONS

A. Improvement of the Proposed Algorithms

We now discuss possible extensions of the proposed algorithms. The stop and resumption positions are not completely random, and their distributions have certain tendencies in actual performances [13]. For example, performers frequently resume from the first beats of bars and the beginnings of phrases, which reflects the performers' understanding of musical structures. These tendencies can be incorporated in $s_j$ and $r_i$ in our performance HMMs, and the accuracy and following times of the proposed algorithms would improve [13].
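One simple way to encode such a tendency is to give events at the first beats of bars more resumption probability than other events, and then normalize. The sketch below is only an illustration in Python: the function `resumption_prior`, the weight `bar_weight` and the bar positions are hypothetical and are not taken from [13] or from the experiments, which used the uniform setting $r_i = 1/N$.

```python
def resumption_prior(num_events, bar_starts, bar_weight=4.0):
    """Return r_0..r_{N-1}; bar-start events get bar_weight times the mass."""
    starts = set(bar_starts)
    raw = [bar_weight if i in starts else 1.0 for i in range(num_events)]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical score with 8 events and bars starting at events 0 and 4.
r = resumption_prior(num_events=8, bar_starts=[0, 4], bar_weight=4.0)
print(r[0] / r[1])  # close to 4: bar starts are four times as likely

# bar_weight = 1 recovers the uniform setting r_i = 1/N.
uniform = resumption_prior(num_events=8, bar_starts=[0, 4], bar_weight=1.0)
print(uniform[0])  # 0.125
```

Because the fast update only requires that the repeat/skip probability factorizes into $s_j$ and $r_i$, replacing the uniform $r_i$ by such a non-uniform prior does not change the computational cost.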
Another method to improve the proposed algorithms is to refine the model of the durations of performed events. For this purpose, we can assign multiple bottom states to model the duration [20], [24], [25] or explicitly introduce its probability distribution [6]. This refinement is compatible with the proposed methods for reducing the computational cost, since they can be used regardless of the topology of the bottom HMMs.

The proposed algorithms successfully followed clarinet performances despite tempo changes in the experiment and the demonstration video in Sec. IV-C. However, the accuracy may deteriorate for performances with large tempo changes. To suppress the deterioration, it would be effective to adequately change $d_i$ on the fly, referring to estimated tempos.

B. Extension to Polyphonic Music

Although we have confined ourselves to monophonic performances, let us briefly discuss the polyphonic case. We can construct a performance HMM for polyphonic scores similarly to the monophonic case. By associating top states with musical events (chords, notes and rests) in a polyphonic score, the top HMM can be used without any change, and insertions and deletions of chords, pauses between chords and repeats/skips can be incorporated in the same way. Importantly, the present methods for reducing the computational complexity can be applied to the polyphonic case, since they are independent of the details of the bottom HMMs.

On the other hand, we need to extend the bottom HMMs to include chords. In particular, errors may occur at every note in a chord, and there is a combinatorially large number of possible forms of errors for a large chord. Although we could in principle prepare spectral templates for all possible forms of played chords and use a mixture distribution similarly to Eq. (4), this requires a large computational cost in estimating score positions.
However, the influence of note-wise errors on spectral differences is generally less significant for a large chord, and a bold approximation of neglecting note-wise errors would work relatively well for such a case, which can serve as a practical method to avoid the large computational cost.

There are other issues for polyphonic performances. For example, notes in a chord are indicated to be performed simultaneously in the score, but they can actually be performed at different times. Also, the relative energy of notes in a chord depends on the performer. Their treatment requires additional discussions and experiments, and the extension to polyphonic performances is now under investigation.

VI. CONCLUSION

We discussed score following of monophonic music performances with errors and arbitrary repeats/skips by constructing a stochastic model of music performance. We incorporated possible errors in audio performances into the model. In order to solve the problem of large computational cost for following arbitrary repeats/skips, we presented two HMMs that describe the probability of repeats/skips with a probability of stop positions and a probability of resumption positions, and derived computationally efficient algorithms. We demonstrated real-time operation of the algorithms with scores of practical length ($O(10^3)$ to $O(10^4)$ events). Experimental evaluations using clarinet performance data showed that the algorithms outperformed Antescofo in the accuracy of score following and in the ability to track repeats/skips. In addition, we briefly discussed methods to improve the proposed algorithms and to extend them to polyphonic inputs.

ACKNOWLEDGEMENTS

We thank Yuu Mizuno and Kosuke Suzuki for participating in the early stage of this work, Naoya Ito for playing the clarinet, and Hirokazu Kameoka for useful discussions. This research was supported in part by JSPS Research Fellowships for Young Scientists No. 15J0992 (T. N.), and JSPS Grant-in-Aid No. 15K16054 (E. N.) and No. (S. S.).

REFERENCES

[1] R. B. Dannenberg, "An on-line algorithm for real-time accompaniment," in Proc. Int. Computer Music Conf.
[2] B. Vercoe, "The synthetic performer in the context of live performance," in Proc. Int. Computer Music Conf.
[3] A. Arzt, G. Widmer, and S. Dixon, "Automatic page turning for musicians via real-time machine listening," in Proc. European Conf. Artificial Intelligence.
[4] N. Orio, S. Lemouton, D. Schwarz, and N. Schnell, "Score following: State of the art and new developments," in Proc. New Interfaces for Musical Expression.
[5] B. Pardo and W. Birmingham, "Modeling form for on-line following of musical performances," in Proc. AAAI, vol. 2.
[6] A. Cont, "A coupled duration-focused architecture for real-time music-to-score alignment," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, June.
[7] C. Joder, S. Essid, and G. Richard, "A comparative study of tonal acoustic features for a symbolic level music-to-score alignment," in Proc. IEEE Workshop Applications Signal Process. Audio Acoust.
[8] Z. Duan and B. Pardo, "A state space model for online polyphonic audio-score alignment," in Proc. Int. Conf. Acoust. Speech Signal Process.
[9] T. Otsuka, K. Nakadai, T. Takahashi, T. Ogata, and H. G. Okuno, "Real-time audio-to-score alignment using particle filter for coplayer music robots," EURASIP J. Applied Signal Process., vol. 2011, pp. 1-13.
[10] C. Joder, S. Essid, and G. Richard, "A conditional random field framework for robust and scalable audio-to-score matching," IEEE Trans. Acoust., Speech, and Language Process., vol. 19, no. 8.
[11] N. Montecchio and A. Cont, "A unified approach to real time audio-to-score and audio-to-audio alignment using sequential Montecarlo inference techniques," in Proc. Int. Conf. Acoust. Speech Signal Process.
[12] T. Nakamura, E. Nakamura, and S. Sagayama, "Acoustic score following to musical performance with errors and arbitrary repeats and skips for automatic accompaniment," in Proc. Sound and Music Computing Conf., Aug.
[13] E. Nakamura, T. Nakamura, Y. Saito, N. Ono, and S. Sagayama, "Outer-product hidden Markov model and polyphonic MIDI score following," J. New Music Res., vol. 43, no. 2.
[14] D. Schwarz, N. Orio, and N. Schnell, "Robust polyphonic MIDI score following with hidden Markov models," in Proc. Int. Computer Music Conf.
[15] C. Oshima, K. Nishimoto, and M. Suzuki, "A piano duo performance support system to motivate children's practice at home," Trans. Info. Process. Soc. Japan, vol. 46, no. 1, in Japanese.
[16] E. Nakamura, Y. Saito, N. Ono, and S. Sagayama, "Merged-output hidden Markov model for score following of MIDI performance with ornaments, desynchronized voices, repeats and skips," in Proc. Joint Conf. of 40th Int. Computer Music Conf. and 11th Sound and Music Computing Conf.
[17] E. Nakamura, N. Ono, S. Sagayama, and K. Watanabe, "A stochastic temporal model of polyphonic MIDI performance with ornaments," in preparation. [arXiv: ].
[18] C. Fremerey, M. Müller, and M. Clausen, "Handling repeats and jumps in score-performance synchronization," in Proc. Int. Symposium Music Info. Retrieval.
[19] Z. Duan and B. Pardo, "Aligning semi-improvised music audio with its lead sheet," in Proc. Int. Symposium Music Info. Retrieval.
[20] N. Orio and F. Déchelle, "Score following using spectral analysis and hidden Markov models," in Proc. Int. Computer Music Conf., vol. 1001.
[21] A. Cont, "Realtime audio to score alignment for polyphonic music instruments, using sparse non-negative constraints and hierarchical HMMs," in Proc. Int. Conf. Acoust. Speech Signal Process., vol. 5.
[22] C. Joder, S. Essid, and G. Richard, "Learning optimal features for polyphonic audio-to-score alignment," IEEE Trans. Acoust., Speech, and Language Process., vol. 21, Oct.
[23] J. Brown and M. Puckette, "An efficient algorithm for the calculation of a constant Q transform," J. Acoust. Soc. Am., vol. 92.
[24] P. Cano, A. Loscos, and J. Bonada, "Score-performance matching using HMMs," in Proc. Int. Computer Music Conf.
[25] C. Raphael, "Automatic segmentation of acoustic musical signals using hidden Markov models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 4.
[26] T. Nakamura, H. Kameoka, K. Yoshii, and M. Goto, "Timbre replacement of harmonic and drum components for music audio signals," in Proc. Int. Conf. Acoust. Speech Signal Process.
[27] Z. Duan and B. Pardo, "Soundprism: An online system for score-informed source separation of music audio," IEEE J. Sel. Topics Signal Process., vol. 5, no. 6.
[28] M. Goto, "Development of the RWC Music Database," in Proc. Int. Congress Acoust., vol. 1, 2004.

[29] "HTK Speech Recognition Toolkit." [Online; accessed 11-February-2015].
[30] "MIREX HOME - MIREX Wiki." [Online; accessed 11-February-2015].
[31] A. Cont, D. Schwarz, N. Schnell, and C. Raphael, "Evaluation of real-time audio-to-score alignment," in Proc. Int. Symposium Music Info. Retrieval.
[32] S. Sagayama, T. Nakamura, E. Nakamura, Y. Saito, H. Kameoka, and N. Ono, "Automatic music accompaniment allowing errors and arbitrary repeats and jumps," in Proc. Meetings on Acoustics, vol. 21, pp. 1-11, Acoustical Society of America.

APPENDIX

A. List of Important Parameters

Important parameters of the proposed models are listed in Tab. VI.

TABLE VI. IMPORTANT PARAMETERS AND THEIR MEANINGS OF THE PROPOSED MODELS.

Mathematical notation              Meaning
$a_{i,j}$                          Transition probability of the top HMM
$\pi_i$                            Initial probability of the top HMM
$a^{(i)}_{l,l'}$                   Transition probability of the $i$-th bottom HMM
$\pi^{(i)}_{l}$                    Initial probability of the $i$-th bottom HMM
$e^{(i)}_{l}$                      Exiting probability of the $i$-th bottom HMM
$b^{(i)}_{l}(y_t)$                 Emission probability of state $(i, l)$ for observation $y_t$
$\tilde{a}_{(i,l),(j,l')}$         Transition probability of the standard HMM obtained by flattening the two-level HMM
$\tilde{\pi}_{(i,l)}$              Initial probability of the standard HMM obtained by flattening the two-level HMM
$\tilde{b}_{(i,l)}(y_t)$           Emission probability of the states of the standard HMM obtained by flattening the two-level HMM
$s_j$                              The probability of stopping at event $j$ before a repeat/skip
$r_i$                              The probability of resuming a performance at event $i$ after a repeat/skip

B. Derivation of the No-Break Algorithm for $L > 1$

We now derive an efficient algorithm for computing $\alpha_{t,(i,l)}$ for the performance HMM without the break state in the case of $L > 1$. Assuming that the transition probability of repeats/skips is described as a product of $s_j$ and $r_i$, the transition probability of the standard HMM $\tilde{a}_{(j,l'),(i,l)}$ for $j \notin \mathrm{nbh}(i)$ can be written as

$$\tilde{a}_{(j,l'),(i,l)} = e^{(j)}_{l'} \, s_j \, r_i \, \pi^{(i)}_{l}, \tag{18}$$

and Eq. (9) for $t \geq 1$ is rewritten as

$$\alpha_{t,(i,l)} = \tilde{b}_{(i,l)}(y_t) \Bigg( \sum_{j \in \mathrm{nbh}(i)} \sum_{l'=0}^{L-1} \alpha_{t-1,(j,l')} \, \tilde{a}_{(j,l'),(i,l)} + r_i \, \pi^{(i)}_{l} \sum_{j \notin \mathrm{nbh}(i)} \sum_{l'=0}^{L-1} \alpha_{t-1,(j,l')} \, e^{(j)}_{l'} \, s_j \Bigg). \tag{19}$$

The first summation in the parentheses of Eq. (19) is of $O(L)$. The second summation can be converted into

$$\sum_{j \notin \mathrm{nbh}(i)} \sum_{l'=0}^{L-1} \alpha_{t-1,(j,l')} \, e^{(j)}_{l'} \, s_j = \sum_{j=0}^{N-1} \sum_{l'=0}^{L-1} \alpha_{t-1,(j,l')} \, e^{(j)}_{l'} \, s_j - \sum_{j \in \mathrm{nbh}(i)} \sum_{l'=0}^{L-1} \alpha_{t-1,(j,l')} \, e^{(j)}_{l'} \, s_j. \tag{20}$$

The first summation on the right-hand side of Eq. (20) is independent of $i$, and thus it is sufficient to compute it once at each time step. Hence, the total computational complexity at each time step is of $O(LN)$.

C. Derivation of the Break Algorithm for $L > 1$

Let us consider the performance HMM with the break state and with $L$ bottom states in each top state. In the same way as in Sec. III-C, silent breaks at repeats/skips can be introduced as top state $N$ (the break state), and arbitrary repeats/skips are described with two-step transitions via the break state. Since the transition probability of the standard HMM $\tilde{a}_{(j,l'),(i,l)}$ is zero unless $j \in \mathrm{nbh}(i) \cup \{N\}$, Eq. (9) for $t \geq 1$ and $i \neq N$ can be rewritten as

$$\alpha_{t,(i,l)} = \tilde{b}_{(i,l)}(y_t) \Bigg( \sum_{j \in \mathrm{nbh}(i)} \sum_{l'=0}^{L-1} \alpha_{t-1,(j,l')} \, \tilde{a}_{(j,l'),(i,l)} + \sum_{l'=0}^{L-1} \alpha_{t-1,(N,l')} \, \tilde{a}_{(N,l'),(i,l)} \Bigg). \tag{21}$$

The second term in the parentheses of Eq. (21) for each $i \neq N$ is of a constant computational complexity. On the other hand, Eq. (9) for $t \geq 1$ and $i = N$ is converted into

$$\alpha_{t,(N,l)} = \tilde{b}_{(N,l)}(y_t) \sum_{j=0}^{N-1} \sum_{l'=0}^{L-1} \alpha_{t-1,(j,l')} \, \tilde{a}_{(j,l'),(N,l)}. \tag{22}$$

This computation is of $O(LN)$, and hence the total computational complexity is of $O(LN)$ at each time step.
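The no-break update in Eqs. (19) and (20) can be checked numerically with a small sketch. The Python code below compares a direct evaluation of Eq. (19), which visits every predecessor state for every target state, with the version that computes the $i$-independent sum of Eq. (20) once per time step. Several ingredients are assumptions made only for this illustration: the neighbourhood `nbh` is taken to be top states within one step of $i$, `a_near` is an arbitrary stand-in for the local transition probabilities, and the toy numbers are not a normalized model.

```python
def nbh(i, n_top):
    """Hypothetical neighbourhood: top states within one step of i."""
    return [j for j in (i - 1, i, i + 1) if 0 <= j < n_top]


def a_near(j, lp, i, l):
    """Toy transition weight for j in nbh(i); stands in for the model's values."""
    return 0.1 / (1 + abs(i - j) + l + lp)


def forward_step_naive(alpha_prev, b, e, s, r, pi):
    """Direct evaluation of Eq. (19): every (j, l') is visited for every (i, l)."""
    n_top, n_bot = len(alpha_prev), len(alpha_prev[0])
    alpha = [[0.0] * n_bot for _ in range(n_top)]
    for i in range(n_top):
        near_set = set(nbh(i, n_top))
        for l in range(n_bot):
            acc = 0.0
            for j in range(n_top):
                for lp in range(n_bot):
                    if j in near_set:
                        w = a_near(j, lp, i, l)
                    else:  # repeat/skip transition, Eq. (18)
                        w = e[j][lp] * s[j] * r[i] * pi[i][l]
                    acc += alpha_prev[j][lp] * w
            alpha[i][l] = b[i][l] * acc
    return alpha


def forward_step_fast(alpha_prev, b, e, s, r, pi):
    """Same update using Eq. (20): the sum over all j is computed once."""
    n_top, n_bot = len(alpha_prev), len(alpha_prev[0])
    exit_mass = [
        s[j] * sum(alpha_prev[j][lp] * e[j][lp] for lp in range(n_bot))
        for j in range(n_top)
    ]
    total = sum(exit_mass)  # first sum on the right of Eq. (20), independent of i
    alpha = [[0.0] * n_bot for _ in range(n_top)]
    for i in range(n_top):
        near_set = nbh(i, n_top)
        far = total - sum(exit_mass[j] for j in near_set)  # Eq. (20)
        for l in range(n_bot):
            near = sum(
                alpha_prev[j][lp] * a_near(j, lp, i, l)
                for j in near_set
                for lp in range(n_bot)
            )
            alpha[i][l] = b[i][l] * (near + r[i] * pi[i][l] * far)
    return alpha


# Toy inputs: N = 5 top states, L = 2 bottom states.
N, L = 5, 2
alpha_prev = [[0.02 * (i + 1) + 0.01 * l for l in range(L)] for i in range(N)]
b = [[0.5 + 0.1 * ((i + l) % 3) for l in range(L)] for i in range(N)]
e = [[0.3, 0.6] for _ in range(N)]
s = [0.01] * N
r = [1.0 / N] * N
pi = [[0.7, 0.3] for _ in range(N)]

fast = forward_step_fast(alpha_prev, b, e, s, r, pi)
naive = forward_step_naive(alpha_prev, b, e, s, r, pi)
max_diff = max(abs(fast[i][l] - naive[i][l]) for i in range(N) for l in range(L))
print(max_diff < 1e-12)  # True
```

The cost of the naive step grows quadratically with the number of top states, whereas the fast step grows linearly for a fixed neighbourhood size. Because Eq. (20) is a subtraction, a practical implementation may want to guard against small negative values caused by rounding when the far-state mass is tiny.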

Tomohiko Nakamura received his B.E. and M.S. degrees from the University of Tokyo, Japan, in 2011 and 2013, respectively. He is currently a Ph.D. student at the University of Tokyo and a research fellow of the Japan Society for the Promotion of Science (JSPS). His research interests involve audio signal processing and statistical machine learning. He received the International Award from the Society of Instrument and Control Engineers (SICE) Annual Conference 2011, the SICE Best Paper Award (Takeda Award) in 2015, and the Yamashita SIG Research Award from the Information Processing Society of Japan (IPSJ) in .

Eita Nakamura received a Ph.D. in physics from the University of Tokyo in . After having been a post-doctoral researcher at the National Institute of Informatics and Meiji University, he is currently a post-doctoral researcher at the Speech and Audio Processing Group at Kyoto University. His research interests include music information processing and statistical machine learning.

Shigeki Sagayama received the B.E., M.S., and Ph.D. degrees from the University of Tokyo, Tokyo, Japan, in 1972, 1974, and 1998, respectively, all in mathematical engineering and information physics. He joined Nippon Telegraph and Telephone Public Corporation (currently, NTT) in 1974 and started his career in speech analysis, synthesis, and recognition at NTT Labs in Musashino, Japan. From 1990, he was Head of the Speech Processing Department, ATR Interpreting Telephony Laboratories, Kyoto, Japan, where he was in charge of an automatic speech translation project. In 1993, he was responsible for speech recognition, synthesis, and dialog systems at NTT Human Interface Laboratories, Yokosuka, Japan. In 1998, he became a Professor of the Graduate School of Information Science, Japan Advanced Institute of Science and Technology (JAIST), Ishikawa. In 2000, he was appointed Professor at the Graduate School of Information Science and Technology (formerly, Graduate School of Engineering), the University of Tokyo. After his retirement from the University of Tokyo, he has been a Professor at Meiji University since . His major research interests include the processing and recognition of speech, music, acoustic signals, handwriting, and images. He was the leader of the anthropomorphic spoken dialog agent project (Galatea Project) from 2000 to . Prof. Sagayama received the National Invention Award from the Institute of Invention of Japan in 1991, the Director General's Award for Research Achievement from the Science and Technology Agency of Japan in 1996, and other academic awards including Paper Awards from the Institute of Electronics, Information and Communications Engineers, Japan (IEICEJ) in 1996 and from the Information Processing Society of Japan (IPSJ) in . He is a member of the Acoustical Society of Japan, IEICEJ, and IPSJ.

Drum Transcription in the presence of pitched instruments using Prior Subspace Analysis

Drum Transcription in the presence of pitched instruments using Prior Subspace Analysis ISSC 2003, Limerick. Juy -2 Drum Transcription in the presence of pitched instruments using Prior Subspace Anaysis Derry FitzGerad φ, Bob Lawor*, and Eugene Coye φ φ Music Technoogy Centre, Dubin Institute

More information

Prior Subspace Analysis for Drum Transcription

Prior Subspace Analysis for Drum Transcription Audio Engineering Society Convention Paper Presented at the 4th Convention 23 March 22 25 Amsterdam, he Netherands his convention paper has been reproduced from the author's advance manuscript, without

More information

NCH Software VideoPad Video Editor

NCH Software VideoPad Video Editor NCH Software VideoPad Video Editor This user guide has been created for use with VideoPad Video Editor Version 4.xx NCH Software Technica Support If you have difficuties using VideoPad Video Editor pease

More information

Using wordless picture books in schools and libraries. Ideas for using wordless picture books in reading, writing and speaking activities

Using wordless picture books in schools and libraries. Ideas for using wordless picture books in reading, writing and speaking activities CfE eves Eary to Fourth (Ages 3-16) Using wordess picture books in schoos and ibraries Ideas for using wordess picture books in reading, writing and speaking activities Resource created by Scottish Book

More information

Operation Guide

Operation Guide MO0503-EA Getting Acquainted Congratuations upon your seection of this CASIO watch. To get the most out of your purchase, be sure to read this manua carefuy and keep it on hand for ater reference when

More information

Remarks on The Logistic Lattice in Random Number Generation. Neal R. Wagner

Remarks on The Logistic Lattice in Random Number Generation. Neal R. Wagner Remarks on The Logistic Lattice in Random Number Generation Nea R. Wagner 1. Introduction Pease refer to the quoted artice before reading these remarks. I have aways been fond of this particuar random

More information

Topology of Musical Data

Topology of Musical Data Topoogy of Musica Data Wiiam A. Sethares Department of Eectrica and Computer Engineering, University of Wisconsin, Madison, USA, sethares@ece.wisc.edu November 27, 2010 Abstract Techniques for discovering

More information

Merged-Output Hidden Markov Model for Score Following of MIDI Performance with Ornaments, Desynchronized Voices, Repeats and Skips

Merged-Output Hidden Markov Model for Score Following of MIDI Performance with Ornaments, Desynchronized Voices, Repeats and Skips Merged-Output Hidden Markov Model for Score Following of MIDI Performance with Ornaments, Desynchronized Voices, Repeats and Skips Eita Nakamura National Institute of Informatics 2-1-2 Hitotsubashi, Chiyoda-ku,

More information

Running a shared reading project. A scheme of activities to help older children share picture books with younger ones

Running a shared reading project. A scheme of activities to help older children share picture books with younger ones CFE Leves Eary Senior phase (Ages 3-16) Running a shared reading project A scheme of activities to hep oder chidren share picture books with younger ones Resource created by Scottish Book Trust Contents

More information

Autoregressive hidden semi-markov model of symbolic music performance for score following

Autoregressive hidden semi-markov model of symbolic music performance for score following Autoregressive hidden semi-markov model of symbolic music performance for score following Eita Nakamura, Philippe Cuvillier, Arshia Cont, Nobutaka Ono, Shigeki Sagayama To cite this version: Eita Nakamura,

More information

EDT/Collect for DigitalMicrograph

EDT/Collect for DigitalMicrograph May 2016 (Provisiona) EDT/Coect for DigitaMicrograph Data Coection for Eectron Diffraction Tomography EDT/Coect Manua 1.0 HREM Research Inc. Introduction The EDT/Coect software has been deveoped by HREM

More information

Operation Guide 3197

Operation Guide 3197 MO1004-EA Getting Acquainted Congratuations upon your seection of this CASIO watch. To get the most out of your purchase, be sure to read this manua carefuy. Keep the watch exposed to bright ight The eectricity

More information

Operation Guide 5200

Operation Guide 5200 MO1103-EA Getting Acquainted ongratuations upon your seection of this ASIO watch. To get the most out of your purchase, be sure to read this manua carefuy. Be sure to keep a user documentation handy for

More information

Energy meter MRE-44S. MRE-44S/DC24V energy meter

Energy meter MRE-44S. MRE-44S/DC24V energy meter MRE-44S MRE-44S/DC24V energy meter Comprehensive consumption data anaysis in rea time High resoution and accuracy (cass 0.) even in harmonicay distorted grids Aso anayses harmonics (optiona, up to 50 Hz)

More information

Diploma Syllabus. Music Performance from 2005

Diploma Syllabus. Music Performance from 2005 Dipoma Syabus Music Performance from 2005 SPECIAL NOTICES This Music Performance Dipoma Syabus from 2005 is a revised version of the Performing sections of the Dipoma Syabus from 2000. It is vaid wordwide

More information

Operation Guide 4717

Operation Guide 4717 MO0812-EB Getting Acquainted Congratuations upon your seection of this CASIO watch. To get the most out of your purchase, be sure to read this manua carefuy. This watch does not have a Time Zone that corresponds

More information

Topic 10. Multi-pitch Analysis

Topic 10. Multi-pitch Analysis Topic 10 Multi-pitch Analysis What is pitch? Common elements of music are pitch, rhythm, dynamics, and the sonic qualities of timbre and texture. An auditory perceptual attribute in terms of which sounds

More information

Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes
