This is a repository copy of A New Method of Onset and Offset Detection in Ensemble Singing.

White Rose Research Online URL for this paper:

Version: Published Version

Article: D'Amario, Sara, Daffern, Helena and Bailes, Freya (2018) A New Method of Onset and Offset Detection in Ensemble Singing. Logopedics Phoniatrics Vocology.

Reuse: This article is distributed under the terms of the Creative Commons Attribution (CC BY) licence. This licence allows you to distribute, remix, tweak, and build upon the work, even commercially, as long as you credit the authors for the original work. More information and the full terms of the licence here:

Takedown: If you consider content in White Rose Research Online to be in breach of UK law, please notify us by emailing eprints@whiterose.ac.uk, including the URL of the record and the reason for the withdrawal request.

Logopedics Phoniatrics Vocology

A new method of onset and offset detection in ensemble singing
Sara D'Amario, Helena Daffern & Freya Bailes

To cite this article: Sara D'Amario, Helena Daffern & Freya Bailes (2018): A new method of onset and offset detection in ensemble singing, Logopedics Phoniatrics Vocology, DOI:

© 2018 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. Published online: 27 Mar 2018.

LOGOPEDICS PHONIATRICS VOCOLOGY
RESEARCH ARTICLE

A new method of onset and offset detection in ensemble singing

Sara D'Amario (a), Helena Daffern (a) and Freya Bailes (b)
(a) Department of Electronic Engineering, University of York, York, UK; (b) School of Music, University of Leeds, Leeds, UK

ABSTRACT
This paper presents a novel method combining electrolaryngography and acoustic analysis to detect the onset and offset of phonation, as well as the beginning and ending of notes within a sung legato phrase, through the application of a peak-picking algorithm, TIMEX. The evaluation of the method, applied to a set of singing duo recordings, shows an overall performance of 78% within a tolerance window of 50 ms compared with manual annotations performed by three experts. These results seem very promising in light of the state-of-the-art techniques presented at MIREX in 2016, which yielded an overall performance of around 60%. The new method was applied in a pilot study with two duets to analyse synchronization between singers during ensemble performances. Results from this investigation demonstrate bidirectional temporal adaptations between performers, and suggest that the precision and consistency of synchronization, and the tendency to precede or lag a co-performer, might be affected by visual contact between singers and by leader-follower relationships. The outcomes of this paper promise to be beneficial for future investigations of synchronization in singing ensembles.

ARTICLE HISTORY Received 28 June 2017; Revised 8 February 2018; Accepted 12 March 2018

KEYWORDS Interpersonal interaction; offset detection; onset detection; singing ensemble; synchronization

Introduction

Accurate analysis of sound, typically musical tones, as performed by an individual is fundamental to the investigation of performed musical characteristics such as tempo, rhythm and pitch structure. The analysis of singing ensemble recordings represents a major challenge in this respect, due to the difficulties of: (i) separating individual voices within polyphonic recordings to evaluate the contribution of each singer and (ii) identifying tone onsets and offsets. Whilst onsets and offsets are often clearly distinguishable for percussive sounds, in singing they vary according to vibrato, vocal fluctuations, timbral characteristics and onset envelopes, especially within a legato phrase where consonants are absent. Currently, there are no robust methods to identify the onsets and offsets of individual voices, particularly in the context of ensemble singing. A protocol for onset-offset detection in singing ensemble recordings would be useful for a range of aspects of music performance analysis and audio signal processing, such as music information retrieval, transcription applications, and the evaluation of synchronization between musicians during singing ensemble performances. The use of close-proximity microphones, although capturing the data of the individual singers, does not eliminate bleed from other performers (1), and makes the isolation of individual notes, and therefore of onsets and offsets, difficult. Recent studies conducted by David Howard analysed tuning in two different SATB ensembles: the complexities of polyphonic analysis associated with audio recordings (2-4) were avoided by applying acoustic analysis in conjunction with electrolaryngography (Lx) to extract f_o estimates from vocal fold contact information.
Electrolaryngography and electroglottography (EGG), two non-invasive techniques that assess vocal fold vibration in vivo through electrodes placed externally on either side of the neck at the level of the larynx, allow the measurement of performance data in solo and ensemble performances and are often employed in singing research (for a recent review, see (5)). However, the use of Lx/EGG for the temporal analysis of onsets and offsets, to assess synchronization between singers during vocal ensemble performances, has yet to be evaluated. Several approaches have been suggested for note-onset detection (for a review, see (6)). Some studies have focused on spectral features of the signal (7), combined phase and energy information (8), analysed phase deviations across the frequency domain (9), or considered the change of energy in frequency sub-bands (10); others are based on probabilistic methods such as hidden Markov models (11), on the fundamental frequency contour and sound level envelope (12), or on time- and frequency-domain features (13). The selection and reliability of the algorithms mentioned above depend strongly on the type and quality of the audio signal; for example, time-domain methods perform relatively well if the signal is very percussive, as in piano or drum recordings. It is noteworthy that existing algorithms perform less well on singing than on other classes, such as solo brass, wind instruments and polyphonic pitched instruments. In the Music Information Retrieval Evaluation eXchange (MIREX 2016), the best-performing algorithm for onset detection in the solo singing voice achieved an F-measure, a metric of overall performance, of 61.7%.

By contrast, the best-performing algorithms for drums, plucked strings, brass and wind instruments achieved F-measures of 93%, 92%, 91% and 78%, respectively. Toh et al. (14) implemented a system for the analysis of the solo singing voice that accurately identified 85% of onsets within 50 ms of the ground truth, i.e. the manually annotated values of the same recordings. However, this is not precise enough for the analysis of the highly accurate coordination found in professional music ensembles, known to be in the order of tens of milliseconds (15,16). In summary, automated onset detection of non-percussive performances, such as singing ensemble performances, from audio recordings remains a challenge and is still under development. A robust algorithm able to automatically extract timing information in such performances would be highly beneficial for the investigation of synchronization between members of a singing ensemble.

This paper addresses the complexities of analysing onset and offset timings in polyphonic singing recordings through a case study considering synchronisation in singing ensemble performances. A novel method to investigate temporal coordination in singing ensembles is developed and tested, based on the combined application of electrolaryngography and acoustic analysis, and on a new automated algorithm, termed TIMEX, that automatically extracts timing in monaural singing performances. The effectiveness of this new method for the analysis of synchronization in singing ensembles was tested in a pilot study. A secondary aim of the pilot study was to investigate the importance of visual cues and leader-follower relationships for singers' synchronization during vocal ensemble performances, with the central question: do the presence/absence of visual contact (VC) between musicians and the instruction to act as leader or follower affect synchronization between singers in vocal duos?

Synchronization between musicians is maintained through iterative temporal adjustments, which might relate to expressive interpretations or to noise in cognitive-motor processes. Research suggests that synchronization in small ensembles (17,18) might be affected by VC between musicians when auditory feedback is limited or musical timing is irregular, and by leader-follower relationships between members of a musical ensemble. However, how synchronization evolves during vocal ensemble performances in relation to these factors still needs to be fully understood. Based on previous evidence, it was hypothesized that the combination of electrolaryngography and acoustic analysis is a valuable tool for the analysis of synchronization in singing ensembles by tracking the f_o profile, as this combination proved successful in studies analysing intonation in SATB quartets from f_o estimates (2-4). It was also conjectured that the leader's onsets might tend to precede those of the follower, as found by (17) in piano duos. Finally, it was hypothesized that singers do not significantly rely on VC to synchronize their actions temporally with their co-performers' actions during the ensemble performance of regular rhythms, as found by (19) in piano duos.

The remainder of this paper is organized in four sections. First, an overview and evaluation of the novel onset/offset detection method is presented (see section "TIMEX").
A case study of synchronization between singers in two vocal duos, based on the application of the new protocol, is then described (see section "Case study of synchronization in singing ensembles"). Finally, results of the algorithm's evaluation and of the case study are discussed and conclusions presented.

TIMEX: an algorithm for the automatic detection of note onsets and offsets

The purpose of this section is to first describe (see section "Algorithm specification") and then test (see section "Algorithm evaluation") a novel algorithm developed to automatically extract temporal information relating to the notes within a legato phrase sung on any vowel. The input for the algorithm is the f_o profile extracted from monaural audio recordings of a singing ensemble obtained using Lx and a head-mounted microphone.

Algorithm specification

When singers perform legato, there are no silences between the notes within a phrase: phonation continues until the next rest/breath, effectively creating a portamento between notes. In developing the algorithm, it was therefore necessary to set criteria with which to analyse the beginning and ending of each note within the piece. This resulted in four categories being defined to denote the true beginning and ending of the scored notes. These are shown in Figure 1 and defined as:

- Onset (ON): beginning of phonation after a silence.
- Note ending (NE): peak/trough in f_o during phonation within a legato phrase that is atypical of a vibrato cycle's characteristics in extent and frequency, calculated as between 80 and 120 cents and between 2 and 9 Hz, respectively, and refined for each singer.
- Note beginning (NB): peak/trough in f_o during phonation that exceeds the maximum vibrato extent and is below the vibrato frequency, following a note ending.
- Offset (OF): ending of phonation followed by a silence.

In order to automate the extraction of the above categories, the following definitions were formulated and parameter values inputted. The values were determined manually by testing on several recordings and can be modified by the user.

- Break: a sequence of one or more points where the Lx signal is null.
- Rest: a sequence of a minimum number of consecutive points where the Lx signal is null. The minimum number of points required to classify a break as a rest is arbitrarily defined; for this specific set of recordings, it was set to a time window of 300 ms, corresponding to a quaver rest at 100 beats per minute (BPM).
- Phrase: a section of the Lx recording comprised between an onset and the following offset.
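To make these definitions concrete, the break/rest classification could be sketched as follows. This is an illustrative Python sketch, not the authors' implementation; it assumes the Lx f_o track is a list of values in Hz at a fixed time step, with 0.0 wherever the signal is null.

```python
def find_rests(fo, time_step_ms=1.0, rest_ms=300.0):
    """Return (start, end) index spans of null f_o long enough to count as rests.

    rest_ms defaults to 300 ms, i.e. a quaver rest at 100 BPM, as in the text.
    Shorter null spans are breaks but not rests.
    """
    min_points = int(round(rest_ms / time_step_ms))
    rests, start = [], None
    for i, f in enumerate(fo):
        if f == 0.0:
            if start is None:
                start = i                      # a break begins here
        else:
            if start is not None and i - start >= min_points:
                rests.append((start, i))       # break was long enough: a rest
            start = None
    if start is not None and len(fo) - start >= min_points:
        rests.append((start, len(fo)))         # trailing rest
    return rests
```

With the settings used in this study (1 ms time step, 300 ms threshold), any null span of at least 300 consecutive points would be classified as a rest.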

Figure 1. The f_o profile of measures 1-3 of the raw Lx and audio signals from an upper voice performance of the two-part piece composed for this study (see section "Stimulus material"), showing: (i) on the top panel, the Lx recording with the four sets of categories identified for each note within a legato phrase (i.e. onset, note beginning, note ending and offset), a local peak and the phrases; (ii) on the bottom, the audio recording, with the ON and OF fluctuation ranges and the break range.

- Fluctuation: the difference in frequency between two Lx or AUDIO points; the fluctuation can be linear or logarithmic, depending on how it is measured. For these recordings, it was set to 80 cents.
- Local max: a point where the Lx/AUDIO value is higher than the Lx/AUDIO values at the previous and the following points.
- Local min: a point where the Lx/AUDIO value is lower than the Lx/AUDIO values at the previous and the following points.
- Onset/offset fluctuation range: the range of points after an onset or before an offset where the singer's voice typically oscillates; local max/min points are ignored within this range, because they are not aligned with note changes but are the result of vibrato. Its duration is arbitrarily defined; a value of 300 ms was used, as appropriate for this set of recordings.
- Vibrato frequency threshold: the minimum frequency of oscillation of the Lx or AUDIO signal that classifies a segment as vibrato, and is therefore not associated with a true note change in the score. For these recordings, it was set to 5 Hz.
- Local peak: a point with a positive Lx value that falls in the middle of a prescribed temporal window in which at least one point with null Lx frequency exists both before and after it. The temporal window used for this check is arbitrarily defined; a time span of 500 ms centred around the point in question was used with satisfactory results in this project.
- Spiking range: a range of points immediately before an onset or after an offset where the Lx signal artificially spikes relative to the corresponding AUDIO signal. The width of this range is arbitrarily defined; given the steepness of the spikes, a value of just 10 ms proved sufficient to isolate the spikes.

TIMEX detects and extracts ON, NB, NE and OF, ensuring consistency of the analysis, through the following steps, as shown in Figure 2.

Step 1: removal of Lx readings in the spiking range. The first operation performed on the raw Lx data is to remove all the positive Lx readings within the spiking range (adjacent to the breaks), replacing them with null values. This step is executed to prevent the artificial spikes from leading to a skewed and distorted reconstruction of the Lx signal from the AUDIO signal (the reconstruction procedure is explained in Step 2).
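Step 1 amounts to clearing a fixed window on either side of every break. A minimal sketch, using the 10 ms spiking range from the definitions above (illustrative names, not the authors' code):

```python
def remove_spiking_ranges(fo_lx, time_step_ms=1.0, spiking_ms=10.0):
    """Replace positive Lx readings that fall within the spiking range
    adjacent to a break with null values, so the artificial spikes cannot
    distort the Step 2 reconstruction."""
    n = int(round(spiking_ms / time_step_ms))
    out = list(fo_lx)
    for i, f in enumerate(fo_lx):
        if f == 0.0:                           # a break point
            lo, hi = max(0, i - n), min(len(out), i + n + 1)
            for j in range(lo, hi):
                out[j] = 0.0                   # null the adjacent spiking range
    return out
```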

Figure 2. Algorithm flowchart.

Step 2: reconstruction of the missing Lx signal from the AUDIO signal. If the Lx signal is weak, the algorithm reconstructs the signal from the audio recording. This is achieved through a normalization procedure designed to make the reconstructed Lx signal follow the same shape as the AUDIO signal. The audio signal is scaled to match the original Lx values at the edges of the interval where the Lx signal is missing, thereby avoiding artificial max/min points being generated at the edges; from here on, the original Lx signal refers to the signal after the Lx readings in the spiking range have been removed, as per Step 1. The following nomenclature is used:

- t_0, t_1: time instants at the boundaries of the range where the original Lx signal is missing or weak, and the audio signal is at least partially available.
- f_o_Lx_0, f_o_Lx_1: the values of the original Lx signal at t_0 and t_1; both are positive by definition of how t_0 and t_1 are selected.
- f_o_AUDIO_0, f_o_AUDIO_1: the values of the AUDIO signal at t_0 and t_1; if one of them is zero, it is calculated as the other multiplied by the ratio between f_o_Lx at that point and f_o_Lx at the other end, while if both are zero, reconstruction is not attempted for this interval.
- f_o_Lx_L(t), f_o_AUDIO_L(t): the values of the linearized Lx and AUDIO signals, respectively, at time t, with t falling between t_0 and t_1; these are linearized as falling on the straight lines connecting f_o_Lx_0 to f_o_Lx_1, and f_o_AUDIO_0 to f_o_AUDIO_1, respectively.
- f_o_Lx(t), f_o_AUDIO(t): the values of the original Lx signal and the AUDIO signal, respectively, at time t, with t falling between t_0 and t_1.

The linearized Lx and AUDIO values are first computed as follows:

f_o_Lx_L(t) = f_o_Lx_0 + (f_o_Lx_1 - f_o_Lx_0) · (t - t_0) / (t_1 - t_0)    (1)

f_o_AUDIO_L(t) = f_o_AUDIO_0 + (f_o_AUDIO_1 - f_o_AUDIO_0) · (t - t_0) / (t_1 - t_0)    (2)

Then, if f_o_AUDIO(t) = 0, f_o_Lx(t) = 0 (reconstruction is not possible at a point where even the microphone reading is unavailable); otherwise, f_o_Lx(t) is reconstructed as

f_o_Lx(t) = f_o_AUDIO(t) · f_o_Lx_L(t) / f_o_AUDIO_L(t)    (3)

The result of this reconstruction is that the Lx signal follows the shape of the AUDIO signal in the areas where the raw signal is not available, remaining continuous with the original values where present, as shown in the example of Figure 3.
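A sketch of this interpolation, implementing equations (1)-(3) under the edge-handling rules just stated (illustrative names, not the authors' code):

```python
def reconstruct_lx(t, fo_lx, fo_audio, i0, i1):
    """Rebuild Lx values strictly between indices i0 and i1, where the Lx
    track is missing but positive at both edges, per equations (1)-(3)."""
    t0, t1 = t[i0], t[i1]
    lx0, lx1 = fo_lx[i0], fo_lx[i1]
    au0, au1 = fo_audio[i0], fo_audio[i1]
    if au0 == 0.0 and au1 == 0.0:
        return list(fo_lx)                     # reconstruction not attempted
    # A zero AUDIO edge is inferred from the other edge via the Lx ratio.
    if au0 == 0.0:
        au0 = au1 * lx0 / lx1
    elif au1 == 0.0:
        au1 = au0 * lx1 / lx0
    out = list(fo_lx)
    for i in range(i0 + 1, i1):
        frac = (t[i] - t0) / (t1 - t0)
        lx_lin = lx0 + (lx1 - lx0) * frac      # equation (1)
        au_lin = au0 + (au1 - au0) * frac      # equation (2)
        if fo_audio[i] == 0.0:
            out[i] = 0.0                       # no microphone reading either
        else:
            out[i] = fo_audio[i] * lx_lin / au_lin  # equation (3)
    return out
```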

Figure 3. Excerpt of the Lx and AUDIO signals from a recording of the upper voice performance, showing the reconstruction of the f_o_Lx signal from the f_o_AUDIO signal in the temporal interval t_0 to t_1, in which the Lx signal was missing. The Lx signal was reconstructed (see f_o_Lx_Reconstructed) based on the linearized Lx and AUDIO signals (see f_o_Lx_Linearized and f_o_AUDIO_Linearized, respectively).

Step 3: removal of Lx local peaks. After the Lx signal has been reconstructed, any remaining local peaks are identified, based on the selected range (see definition above), and removed. The purpose is to eliminate spurious readings that are sometimes produced by the Lx sensor; these typically occur in a narrow time range and can be identified via a proper selection of the local peak range. Removing the peaks after the signal has been reconstructed from the AUDIO data, where possible, allows the maximum amount of Lx data to be retained. The resulting Lx signal left after the removal of the local spikes is defined as the reconstructed Lx signal.

Step 4: identification of onsets, offsets, note beginnings and note endings. Once the Lx signal has been reconstructed, it is processed to extract the onsets and offsets of phonation and the local max/min points during phonation. Local max/min points are then retained only if all of the following conditions are satisfied:

1.1. The point is not too close to the adjacent local max/mins. Points that are too close to each other are removed, to avoid retaining small steps within an ascending or descending section as note beginnings or note endings when they are just fluctuations of the singer's voice that sometimes occur within a note change. A value of just 10 ms is sufficient to discriminate those points from the max/mins to be retained.

1.2. The point does not fall within the onset or the offset fluctuation range.

1.3. Either of the following two conditions is satisfied:

- The logarithmic fluctuation, measured in cents, of the current point from the previous onset or max/min, or to the next max/min, is greater than a prescribed threshold. The distance in cents between two points at frequencies f_1 and f_2 is defined as in (4):

c(f_1, f_2) = 3986.3137 · log10( max(f_1, f_2) / min(f_1, f_2) )    (4)

- The frequency of oscillation of the point, relative to the previous and the next point, is lower than the vibrato frequency threshold. This condition is applied to disregard any max/mins that are the result of the singer's vibrato, without having to set a threshold for the logarithmic fluctuation that is so high it would discard valid note beginnings or endings for semitones. The vibration frequency (vf_n) of the point is defined as the lowest of the oscillation frequencies relative to the previous and the next max/min, as shown in Figure 4:

vf_n = 1 / max(t_n - t_{n-1}, t_{n+1} - t_n)    (5)

Figure 4. Example of the vibration frequency computed across a full cycle, extracted from an audio clip of the upper voice used for the study.

The ability to manually tweak the results after visual validation is provided, to ensure that all and only the relevant max/min points are retained as note beginnings/endings.
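The two retention checks of condition 1.3 reduce to a pair of small numeric tests. A sketch with the thresholds quoted above (80 cents, 5 Hz) as defaults; the function names are illustrative, and the proximity and fluctuation-range checks of conditions 1.1 and 1.2 are omitted:

```python
import math

def cents(f1, f2):
    """Distance in cents between two frequencies, as in equation (4)."""
    return 3986.3137 * math.log10(max(f1, f2) / min(f1, f2))

def vibration_frequency(t_prev, t_n, t_next):
    """Lowest oscillation frequency of a local max/min relative to its
    neighbours, as in equation (5); times in seconds, result in Hz."""
    return 1.0 / max(t_n - t_prev, t_next - t_n)

def keep_point(f_prev, f_n, f_next, t_prev, t_n, t_next,
               fluctuation_cents=80.0, vibrato_hz=5.0):
    """Condition 1.3: keep a max/min if it moves far enough in pitch OR
    oscillates more slowly than the vibrato frequency threshold."""
    big_jump = (cents(f_prev, f_n) > fluctuation_cents
                or cents(f_n, f_next) > fluctuation_cents)
    slow = vibration_frequency(t_prev, t_n, t_next) < vibrato_hz
    return big_jump or slow
```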
Algorithm evaluation

Testing TIMEX on a set of singing performances

The effectiveness of the algorithm was tested on 28 Lx recordings of a two-part piece composed by the first author for the following case study, as shown in Figure 5, and performed by two singing duos (see section "Participants" for more details). The data collected include 728 note beginnings, 728 note endings, 112 onsets and 112 offsets, for a total of 1680 timing extractions. Each audio file was approximately 25 s long, and the total length of the audio clips was about 10 minutes, which is much longer than the singing recordings used in the Music Information Retrieval Evaluation eXchange (MIREX 2016) onset detection task. Recordings were manually cross-annotated by three experts, external to this investigation, who marked the beginning and ending of each note using Praat software (20,21). The experts used the same software setup, displaying a spectrogram and a waveform with a fixed time window, and a tier for hand annotations; this display setup also gave the experts the chance to listen to the recordings. Markings were applied to monaural recordings of the two-part performances, sampled at 48 kHz and post-processed with a time step of 1 ms.

Figure 5. Duet exercise composed for the study, showing the notes chosen for the analysis of the synchronization and the four sets of time categories (ON: onset; NB: note beginning; NE: note ending; OF: offset). All notes were used for the evaluation of the reliability of TIMEX.

This time step setting was chosen to allow the detection of small asynchronies in the order of tens of milliseconds, such as those found in the literature on music ensemble performance. The evaluation procedure followed that described in MIREX 2016 for onset detection. A tolerance value was set to ±50 ms and the detected times were compared with ground-truth values manually identified by the experts. This is a standard procedure for the evaluation of onset detection algorithms, although the comparison of algorithm-detected values with manually detected values, commonly referred to as ground truth, remains somewhat ambiguous and subjective, as there can be no true objective value. A large time displacement of 50 ms is a well-known criterion in the field of onset detection that takes into account the inaccuracy of the hand-labelling process (6). In addition, a small time window of 10 ms was also chosen, to detect the small asynchronies found in the synchronization of professional ensemble performances. The mean of the standard deviations of the manual annotations computed across the three experts was 59 ms.

For a given ground-truth onset time, any extracted value falling within the tolerance time window of 10 or 50 ms was considered a correct detection (CD). If the algorithm detected no value within the time window, the detection of that ground-truth time was reported as a false negative (FN). Detections outside all the tolerance windows were counted as false positives (FPs). The performance of the detection method was evaluated based on the three measures commonly used in the field of onset detection: Precision (P), Recall (R) and F-measure (F). Precision measures the probability that a detected value is a true value, thus indicating how much noise the algorithm produces. Recall indicates the probability that a true value is identified, thereby measuring how much of the ground truth the algorithm identifies. The F-measure represents the overall performance, calculated as the harmonic mean of Precision and Recall. The measures are computed as follows:

P = N_cd / (N_cd + N_fp)    (6)

R = N_cd / (N_cd + N_fn)    (7)

F = 2PR / (P + R)    (8)

where N_cd is the number of correct values detected by the algorithm, N_fp is the number of false values detected, and N_fn is the number of missed values. As files were cross-annotated by three experts, the mean Precision and Recall rates were defined by averaging the Precision and Recall rates computed for each annotation. The overall results are reported in Table 1.

Table 1. Performance of TIMEX.
Tolerance    Precision    Recall    F-measure
50 ms        65%          97%       78%
10 ms        23%          89%       36%

TIMEX achieved higher results on all measures than the best-performing algorithms for the singing voice from MIREX 2016 (22) with the same threshold of 50 ms, although based on a different data set and extracting different timing categories (onsets in MIREX versus onsets/offsets/beginnings/endings for TIMEX). The full data set of detection errors was scrutinized to investigate how FP and FN errors were distributed across performers and over the duration of the pieces.
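Under these definitions, the scoring procedure can be sketched as follows; this is an illustration of MIREX-style matching, not the evaluation code actually used, and the greedy one-to-one matching policy is an assumption:

```python
def evaluate(detected, ground_truth, tol_ms=50.0):
    """Match each ground-truth time to at most one detection within
    +/- tol_ms, then compute Precision, Recall and F-measure
    (equations (6)-(8))."""
    detected = sorted(detected)
    used = [False] * len(detected)
    n_cd = 0
    for g in sorted(ground_truth):
        # closest unused detection within the tolerance window
        best, best_d = None, tol_ms
        for i, d in enumerate(detected):
            if not used[i] and abs(d - g) <= best_d:
                best, best_d = i, abs(d - g)
        if best is not None:
            used[best] = True
            n_cd += 1                          # correct detection
    n_fn = len(ground_truth) - n_cd            # missed ground-truth events
    n_fp = len(detected) - n_cd                # spurious detections
    p = n_cd / (n_cd + n_fp) if detected else 0.0
    r = n_cd / (n_cd + n_fn) if ground_truth else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f
```

Running the same matching with tol_ms=10.0 reproduces the stricter criterion used for the 10 ms row of Table 1.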
As shown in Table 2, the detection errors, computed with a tolerance level set at 10 ms, varied across the four performers: the total number of FNs found for singer 2 performing the upper voice was approximately half that of singer 1 performing the same piece, and the total number of FPs for singer 4 performing the lower voice was lower than that found for singer 3 performing the lower part.

These results suggest that the singers' individual techniques might affect the performance of the algorithm. As shown in Figure 6, the total number of FPs was distributed similarly across the course of the piece. However, FNs were more likely to occur when the note being analysed was a semitone from the previous note (as found for notes 1-2 and 6-7 of the upper voice, and notes 4-5 and 16-18 of the lower voice, among others) or for intervals greater than a 3rd (as found for certain notes of both the upper and lower voices).

Table 2. Distribution of detection errors across performers.
Performer's part    False negatives    False positives
S1 upper voice
S2 upper voice
S3 lower voice
S4 lower voice
(Cell values were not preserved in this copy.) False negatives and false positives were averaged across performances; tolerance level set at 10 ms.

Figure 6. Distribution of percentage detection errors computed at the beginning and ending of each note across the course of the piece.

Evaluating the algorithm's reconstruction process

The algorithm's reconstruction process was evaluated with respect to: (i) the reliability of the Lx signal, as indexed by measurement of the continuous/discontinuous parts of the Lx signal, and (ii) the performance of the reconstruction process. Onset/offset detection based on the AUDIO recording is not fully reliable in the case of singing ensemble recordings; quantifying the percentage of times this fallback was used is therefore important for testing the reliability of the protocol. The analysis of the quality of the Lx signal was conducted on the full set of recordings collected for the following case study, comprising 96 recordings of the upper voice and 96 recordings of the lower voice of a duet piece composed for the experiment. Sections of the Lx signal associated with rests in the score were not scrutinized, as the Lx signal is expected to be null in the absence of phonation. Results show that the Lx signal was unusable for 0.7% of the recordings; the algorithm's reliance on the AUDIO signal was therefore limited to 0.7% of the full set of recordings. Analysis also showed that the discontinuous Lx segments were on average 31 ms long (SD 18 ms).

A subset of 40 discontinuous Lx segments averaging 30 ms in length was used to assess the precision of the reconstruction method, by comparing the reconstructed Lx signal with the corresponding raw Lx signal. The Lx values in these segments were first deleted, the reconstruction process was then run on the Lx and AUDIO signals, and the raw values were finally compared with the reconstructed ones. Results show an average margin of error of 0.034%; the margin of error (E) was first computed for each data point as

E = |V_raw - V_rec| / V_raw    (9)

and then averaged across the entire sample, where V_raw is the raw value extracted from the Lx signal and V_rec is the value reconstructed by the algorithm based on the shape of the AUDIO signal.
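Assuming E is the absolute relative difference (consistent with the reported 0.034% average), the per-point computation is a one-liner; a hypothetical worked example:

```python
def margin_of_error(v_raw, v_rec):
    """Per-point reconstruction error, as in equation (9)."""
    return abs(v_raw - v_rec) / v_raw

# Hypothetical values: a raw Lx reading of 220.00 Hz reconstructed as
# 220.07 Hz gives an error of about 0.032%, close to the reported average.
print(margin_of_error(220.00, 220.07))   # -> 0.000318...
```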

Case study of synchronization in singing ensembles

The following case study aims to test the overall protocol, featuring the application of TIMEX to Lx and audio recordings, by analysing the effect of VC and of the instruction to act as leader or follower on synchronization between singers during singing duo performances. This study serves as a pilot for a subsequent experiment with a larger sample of duos.

Methods

Participants
Four undergraduate singing students (three females and one male) were recruited from the Department of Music at the University of York. Singers had at least 7 years' experience performing in a singing ensemble (mean 9.3 years, SD 2.1), but had not sung together prior to the experiment. They reported having normal hearing and not having absolute pitch.

Stimulus material
A vocal duet exercise was composed for this study, featuring a mostly homophonic texture to allow investigation of synchronization per note, as shown in Figure 5. The upper voice has a range of a 7th, whilst the lower voice has a range of a 5th; the upper voice features a higher tessitura than the lower voice.

Apparatus
Participants were invited to sing in a recording studio at the University of York, treated with absorptive acoustic material. Singers wore head-mounted close-proximity microphones (DPA 4065), placed on the cheek at approximately 2 cm from the lips, and electrolaryngograph electrodes (Lx, from Laryngograph Ltd) placed on the neck on either side of the thyroid cartilage. One stereo condenser microphone (Rode NT4) was placed at equal distance in front of the singers, at approximately 1.5 m from the lips. The five outputs (2 Lx, 2 head-mounted microphones, 1 stereo microphone) were connected to a multichannel hard disk recorder (Tascam DR680) and recorded at a sampling frequency of 48 kHz and 24-bit depth.

Design
The study used a within-subject design in which participants were asked to sing the piece in the following four conditions, applied in a randomised order:

- VC_UpperVoiceL: with VC, upper voice designated leader and lower voice follower
- VC_UpperVoiceF: with VC, upper voice designated follower and lower voice leader
- NVC_UpperVoiceL: without VC, upper voice designated leader and lower voice follower
- NVC_UpperVoiceF: without VC, upper voice designated follower and lower voice leader

Each condition was presented three times, resulting in 12 takes; each take consisted of four repeated performances of the piece, giving a 4 (conditions) × 3 (takes per condition) × 4 (repeated performances within each take) design, featuring a total of 48 repetitions of the piece per duet.

Procedure
Singers received the stimulus material prior to the experiment, to practise the piece. On the day of the experiment, participants were first asked to fill in a background questionnaire and consent form. Then, head-mounted microphones and Lx electrodes were placed on each singer and adjusted. The correct placement of the Lx electrodes was verified by checking the signal on the visual display and listening over headphones. The microphones were adjusted for the sound pressure level of each participant to avoid clipping.
Singers were invited to familiarize themselves with the piece for 10 minutes, singing together from the score on the vowel /i/, having first listened for 10 seconds to a metronome set at 100 BPM. If singers were able to perform the piece without errors, the four conditions and the associated 12 takes were then presented; otherwise, they were allowed to practise the piece for 10 more minutes and the test was then repeated. Once the musicians passed the performance test without errors with the score, each singer was assigned the role of leader or follower; these roles were then reversed according to the UpperVoiceL and UpperVoiceF conditions. Signs labelled "leader" and "follower" were placed on the floor in front of the participants, to remind them of their roles. Each singer only had one assigned part/musical voice. Singers were invited to face each other at a distance of 1.5 m in the visual contact condition and to face away from each other at the same distance in the non-visual contact (NVC) condition. Singers were not aware of the purpose of the study. The 12 takes were recorded singing from memory, with short breaks between takes. The experiment lasted approximately one hour. Ethical approval for the study was obtained from the Physical Sciences Ethics Committee (PSEC) at the University of York (UK).

Analysis
For each recorded performance, two sets of data, comprising the audio waveform from the microphones and the Lx waveform, were imported into Praat as .wav files, and f_o was extracted with a time step of 1 ms. These data were imported into Microsoft Excel 2016 in the form of a tabular list of data points, including the f_o in Hertz and the corresponding timestamp. Asynchronies were then calculated to measure the phase synchrony between singers for NB, NE, ON and OF of the selected notes, as shown in Figure 5.
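A minimal sketch of this asynchrony bookkeeping, together with the MAD-based outlier screen described later in this section (illustrative names, not the authors' code; per-event times are assumed to be already paired across the two singers):

```python
import statistics

def signed_asynchronies(leader_times, follower_times):
    """Phase asynchrony per event (leader minus follower), in ms; negative
    values mean the leader preceded the follower."""
    return [l - f for l, f in zip(leader_times, follower_times)]

def drop_mad_outliers(values, k=2.5):
    """Exclude values more than k median-absolute-deviations from the
    median, as described in the Analysis section."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return list(values)
    return [v for v in values if abs(v - med) / mad <= k]
```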

Table 3. Summary of the mean and median values per condition, showing the differences across conditions and the levels of p values for the significant effects (*p < .05; **p < .01). Rows: ON, NB, NE and OF, each with Precision (M), Consistency (SD), Consistency (CV) and Tendency to lead (median signed); columns: Duo 1 and Duo 2 under VC/NVC and under UpperVoiceL/UpperVoiceF. (Cell values were not preserved in this copy.) Mean, SD and median asynchronies are expressed in ms, whilst CV values are dimensionless.

Those notes were chosen as being relevant to synchronization. The phase asynchrony was computed by subtracting the follower's timestamp values from the leader's (leader minus follower) for NB, NE, ON and OF of the selected notes. Negative values show that the leader preceded the follower, while positive values indicate that the follower was ahead of the leader. The detection of ON, NB, NE and OF was automated through the application of TIMEX, and the resulting timestamp data obtained from the note detection algorithm were then analysed in SPSS (SPSS 24, IBM, Armonk, NY). This event detection was visually validated for the entire data set by the first author (SD). In addition, occasional pitch errors due to a musician singing a wrong note were investigated by comparing the f_o values and the audio recording with the notated score. Takes in which a pitch error occurred were excluded from the analysis; the overall error rate was less than 1%. Outliers were identified based on the MAD (median absolute deviation), and asynchronies that fell more than 2.5 absolute deviations from the median were excluded. This approach is among the most robust methods of detecting outliers when the distribution is not normal and outliers are present (23), as was the case here.

Results

The following sections present the results of four sets of analyses that were run to measure the effect of VC and leader-follower relationships on interpersonal synchronization. The first set measures the precision of interpersonal synchronization, as indexed by the mean of absolute asynchronies. The second and third sets investigate the variability of interpersonal synchronization, as indexed by the standard deviation (SD) and the coefficient of variation (CV) of the absolute asynchronies. The fourth set focuses on the tendency to precede or lag a co-performer, as indexed by the median (Mdn) of signed asynchronies. Each set of analyses was run on ON, NB, NE and OF across each duo/performance in the VC, NVC, UpperVoiceL and UpperVoiceF conditions. Each set includes descriptive analyses and paired tests, comprising dependent paired t-tests and Wilcoxon signed-rank tests; t-tests were chosen to analyse differences between means within the absolute asynchrony data, whilst Wilcoxon tests were selected to assess median differences across signed asynchronies. These statistical tests were run for each condition. Results, using Bonferroni correction for multiple comparisons, are summarized in Table 3.

Visual contact

Duo 1. Mean, SD and CV of absolute asynchronies and median of signed asynchronies for duo 1, calculated for ON, NB, NE and OF during VC and NVC, are shown in Figure 7.
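The battery of paired tests just described might be reproduced along the following lines, sketched here with SciPy rather than SPSS; the input values are purely illustrative:

```python
import math
from scipy import stats

# Illustrative per-performance absolute NB asynchronies (ms), one value per
# performance in each condition; real values would come from TIMEX output.
vc  = [28.1, 35.4, 22.9, 30.2, 27.5, 33.0, 29.8, 31.6]
nvc = [24.3, 29.8, 21.1, 26.7, 25.0, 28.9, 27.2, 26.4]

# Dependent t-test on absolute asynchronies (precision/consistency measures).
t, p = stats.ttest_rel(vc, nvc)
df = len(vc) - 1
r = math.sqrt(t**2 / (t**2 + df))          # effect size r, as reported below

# Wilcoxon signed-rank test on signed asynchronies (tendency to precede).
signed_vc  = [-12.0, -8.5, -15.2, -6.3, -10.1, -9.4, -11.8, -7.7]
signed_nvc = [-4.2, -6.8, -3.9, -7.5, -5.1, -6.0, -4.8, -5.6]
T, p_w = stats.wilcoxon(signed_vc, signed_nvc)

# Bonferroni correction across the four event categories (ON, NB, NE, OF).
alpha = 0.05 / 4
print(f"t({df}) = {t:.2f}, p = {p:.3f}, r = {r:.2f}; Wilcoxon T = {T}, p = {p_w:.3f}")
```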
Results from the paired-sample tests showed a significant effect of the presence of VC on the NB standard-deviation asynchronies, t(23) = 2.43, p = .023, r = .45. As can be seen in Figure 7(B), consistency of synchronization was found to increase significantly in the NVC condition for NB standard-deviation asynchronies, compared with the VC condition. No significant effect was found for the remaining paired-sample tests conducted for duo 1.

Duo 2. Median signed asynchronies, and mean, SD and CV of absolute asynchronies for duo 2 are shown in Figure 8. Paired-sample tests were run as for duo 1. The t-test on the mean NB asynchronies highlighted a significant effect of the presence/absence of VC, t(23) = 2.86, p = .018, r = .51, showing that precision improved in NVC. No significant effect was found for the remaining paired-sample tests.

Figure 7. Interpersonal synchronization of duo 1 with visual contact (VC) and without visual contact (NVC) between singers, as indexed by the mean (A), standard deviation (B), coefficient of variation (CV) of absolute asynchronies (C) and median of signed asynchronies (D), calculated across ON, NB, NE and OF. Error bars represent the standard error of the mean for precision and consistency, and the interquartile range for the tendency to precede. Smaller values for the precision and consistency of asynchronies indicate an increase in coordination, whilst negative values for the tendency to precede mean that the designated leader is ahead of the follower. *p < .05.

Leader-follower relationships

Duo 1. Mean, SD and CV of absolute asynchronies, and median signed asynchronies, averaged across the 48 performances in the UpperVoiceL and UpperVoiceF conditions for duo 1, are shown in Figure 9. Paired-sample t-tests yielded a significant effect of the instruction to act as leader or follower on both measures of consistency for NB: SD asynchronies, t(23) = 2.48, p = .021, r = .46, and CV asynchronies, t(23) = 2.60, p = .016, r = .48. Consistency of NB synchronization was significantly better when the upper voice was instructed to follow rather than to lead, as shown in Figure 9(B,C). Wilcoxon tests revealed a significant main effect of the leader-follower instruction on the degree of preceding for ON median asynchronies, T = 60, p = .010, and NB median asynchronies, T = 71, p = .024. One-sample t-tests conducted on ON and NB for each condition showed that: (i) ON median asynchronies when the upper voice was instructed to follow were significantly different from 0, t(23) = 3.208, p = .004, r = .56; (ii) NB median values when the upper voice was instructed to lead were significantly different from 0, t(23) = 6.287, p < .001, r = .80; and (iii) NB median values when the upper voice was instructed to follow were also significantly different from 0 (p < .001, r = .92). These results demonstrate that when either voice was instructed to lead, the designated leader significantly tended to precede the designated follower at NB. However, when the upper voice was instructed to follow, the designated follower (i.e. the upper voice) significantly tended to precede at ON.

Duo 2. Median signed asynchronies, and mean, SD and CV of absolute asynchronies computed for duo 2 in the UpperVoiceL and UpperVoiceF conditions are shown in Figure 10. Paired-sample tests were calculated as for duo 1. A significant effect of the leader-follower instruction was found on the consistency of NB synchronization.

Figure 8. Interpersonal synchronization of duo 2 with visual contact (VC) and without visual contact (NVC) between singers, as indexed by the mean (A), standard deviation (B), coefficient of variation (CV) of absolute asynchronies (C) and median of signed asynchronies (D), calculated across ON, NB, NE and OF. Error bars represent the standard error of the mean for precision and consistency, and the interquartile range for the tendency to precede. Smaller values for the precision and consistency of asynchronies indicate an increase in coordination, whilst negative values for the tendency to precede mean that the designated leader is ahead of the follower. *p < .05.

This effect was indexed by: (i) SD asynchronies, t(23) = 4.40, p = .0002, r = .8; and (ii) CV asynchronies, t(23) = 2.65, p = .014, r = .48. Consistency of NB synchronization was better when the upper voice was instructed to lead and the lower voice to follow. Finally, as shown in Figure 10(D), Wilcoxon tests revealed a significant effect of the leader-follower instruction on the degree of preceding/lagging for: (i) median NB asynchronies, T = 38.5, p = .001; (ii) median NE asynchronies, T = 33, p = .001; and (iii) median OF asynchronies, T = 42, p = .002. One-sample t-tests on median ON, NB, NE and OF were conducted as for duo 1, to observe whether the tendency to precede/lag was significant in each condition. Results showed that: (i) NB asynchronies were significantly different from 0 when the upper voice was instructed to lead, t(23) = 3.564, p = .002, r = .60, and to follow, t(23) = 2.718, p = .012, r = .49; (ii) NE values were significantly different from 0 when the upper voice was instructed to lead, t(23) = 2.845, p = .009, r = .51, and also to follow, t(23) = 3.144, p = .005, r = .55; and (iii) OF asynchronies were significantly different from 0 when the upper voice was instructed to lead, t(23) = 4.695, p = .00009, r = .70. These results demonstrate that when either voice was instructed to lead, the upper voice significantly tended to precede the lower voice at NB and NE. However, when the upper voice was instructed to lead, the designated leader tended to lag at OF. These results show a complex pattern of leader and follower relationships, rather than a clear separation of roles, which seems to be independent of the researcher's instruction to lead or follow.

Piece learning effects

Prior to investigating the effect of VC and leader-follower relationships, the data were examined for evidence of changes in interpersonal synchrony across the course of the 48 repeated performances. The learning effect was investigated by averaging the asynchronies for each performance and for each synchronization measure (i.e. precision, consistency and tendency to precede).

Figure 9. Interpersonal synchronization for duo 1 with the upper voice as the designated leader (UpperVoiceL) or follower (UpperVoiceF), as indexed by the mean (A), standard deviation (B), coefficient of variation (CV) of absolute asynchronies (C) and median of signed asynchronies (D), calculated across ON, NB, NE and OF. Error bars indicate the standard error of the mean for precision and consistency, and the interquartile range for the tendency to precede. Smaller values for the precision and consistency of asynchronies indicate an increase in coordination, whilst negative values for the tendency to precede mean that the designated leader is ahead of the follower. *p < .05.

Results showed no discernible learning effects for duo 1 or duo 2, as shown in Figures 11 and 12.

Discussion

The aim of the study was to describe and test a novel algorithm, TIMEX, that extracts onsets and offsets of phonation, and note beginnings and endings, from monaural recordings of ensemble singing. The algorithm presented in this paper is based on the fundamental frequency profile. It was developed on the basis of a purely mathematical definition of a local max/min, with the addition of a series of rules to ignore points that the definition would retain but that do not represent a change of note in the score being performed. The rules were conceived in response to the issues encountered during the first processing attempts, such as local spikes, vibrato, Lx signal interruptions and the onset/offset fluctuation range. Each of these rules is associated with a threshold parameter to enforce the rule, which was tuned by trial and error to provide the most accurate results, comparing the output of the algorithm for the selected recording against the score that was performed. When testing the algorithm, and in the case study presented, the same parameters were used for the four semi-professional performers involved and for the upper and lower voice parts. The fluctuation threshold and the vibrato frequency threshold can be expected to differ for opera singers, who might exhibit a larger vibrato extent. Optimal values for the rest, fluctuation and spiking ranges are expected to vary across pieces, especially if the tempo and the duration of rests and notes at the beginnings and ends of phrases (and therefore the onsets and offsets of phonation) are very different from the two-part piece used for this set of recordings.

The evaluation of TIMEX in the present study showed an overall F-measure of 78% within a tolerance window of 50 ms, which seems very promising in light of the state-of-the-art techniques presented at MIREX in 2016, which yielded F-measures of around 60%.

Figure 10. Interpersonal synchronization for duo 2 with the upper voice as the designated leader (UpperVoiceL) or follower (UpperVoiceF), as indexed by the mean (A), standard deviation (B), coefficient of variation (CV) of absolute asynchronies (C) and median of signed asynchronies (D), calculated across ON, NB, NE and OF. Error bars indicate the standard error of the mean for precision and consistency, and the interquartile range for the tendency to precede. Smaller values for the precision and consistency of asynchronies indicate an increase in coordination, whilst negative values for the tendency to precede mean that the designated leader is ahead of the follower. *p < .05; **p < .01.

Direct comparisons with other methods cannot be made unless the same data set is used; comparative evaluations are planned for the future. Other avenues of research should take into account issues relating to the small fluctuations within onsets; TIMEX limits detection to local max/min points, whilst the ground truth also considers the steepness of the f_o profile, detecting onsets based on the rate of change of the curve. This could be addressed by developing the algorithm further, using the second derivative of the waveform in addition to the max/min points. A future direction of this research should also consider the analysis of the singing voice with lyrics. It is reasonable to expect that this algorithm would work well with percussive instruments, although they would probably require different thresholds for the same rules. Whilst the issues of singing onset detection cannot be considered solved by this system, its potential is promising.

Furthermore, this study described and tested a new protocol for the analysis of synchronization in singing ensembles, based on the combined application of electrolaryngography and acoustic analysis, and on the TIMEX algorithm. The use of electrolaryngography allowed the identification of the contribution of individual voices, avoiding the complications of polyphonic recordings. This set-up was very successful: the signal failed on only 0.7% of the entire set of recordings, during which the analysis had to rely on the acoustic signal, which could potentially suffer from audio bleed from the other singers. In order to ensure accurate and reliable recordings of vocal fold vibration in the Lx signal, proper placement of the electrodes is fundamental. The electrodes should be placed in the thyroid region behind the vocal folds, in the middle of each thyroid lamina (24). Furthermore, consideration should be given to the fact that the Lx signal may be too weak or noisy to be reliable for certain populations, including children (25) and sopranos (26), and when a thick layer of subcutaneous tissue is present in the neck (27,24).

Finally, the role of VC and leader-follower relationships was investigated in the two singing duets. Synchronisation was assessed by analysing timings between singers in each duo, as indexed by ON, NB, NE and OF asynchronies


More information

Video-based Vibrato Detection and Analysis for Polyphonic String Music

Video-based Vibrato Detection and Analysis for Polyphonic String Music Video-based Vibrato Detection and Analysis for Polyphonic String Music Bochen Li, Karthik Dinesh, Gaurav Sharma, Zhiyao Duan Audio Information Research Lab University of Rochester The 18 th International

More information

Music Radar: A Web-based Query by Humming System

Music Radar: A Web-based Query by Humming System Music Radar: A Web-based Query by Humming System Lianjie Cao, Peng Hao, Chunmeng Zhou Computer Science Department, Purdue University, 305 N. University Street West Lafayette, IN 47907-2107 {cao62, pengh,

More information

Transcription of the Singing Melody in Polyphonic Music

Transcription of the Singing Melody in Polyphonic Music Transcription of the Singing Melody in Polyphonic Music Matti Ryynänen and Anssi Klapuri Institute of Signal Processing, Tampere University Of Technology P.O.Box 553, FI-33101 Tampere, Finland {matti.ryynanen,

More information

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY Eugene Mikyung Kim Department of Music Technology, Korea National University of Arts eugene@u.northwestern.edu ABSTRACT

More information

Quarterly Progress and Status Report. Replicability and accuracy of pitch patterns in professional singers

Quarterly Progress and Status Report. Replicability and accuracy of pitch patterns in professional singers Dept. for Speech, Music and Hearing Quarterly Progress and Status Report Replicability and accuracy of pitch patterns in professional singers Sundberg, J. and Prame, E. and Iwarsson, J. journal: STL-QPSR

More information

Music Representations

Music Representations Lecture Music Processing Music Representations Meinard Müller International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de Book: Fundamentals of Music Processing Meinard Müller Fundamentals

More information

Query By Humming: Finding Songs in a Polyphonic Database

Query By Humming: Finding Songs in a Polyphonic Database Query By Humming: Finding Songs in a Polyphonic Database John Duchi Computer Science Department Stanford University jduchi@stanford.edu Benjamin Phipps Computer Science Department Stanford University bphipps@stanford.edu

More information

TOWARDS IMPROVING ONSET DETECTION ACCURACY IN NON- PERCUSSIVE SOUNDS USING MULTIMODAL FUSION

TOWARDS IMPROVING ONSET DETECTION ACCURACY IN NON- PERCUSSIVE SOUNDS USING MULTIMODAL FUSION TOWARDS IMPROVING ONSET DETECTION ACCURACY IN NON- PERCUSSIVE SOUNDS USING MULTIMODAL FUSION Jordan Hochenbaum 1,2 New Zealand School of Music 1 PO Box 2332 Wellington 6140, New Zealand hochenjord@myvuw.ac.nz

More information

Melody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng

Melody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng Melody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng Introduction In this project we were interested in extracting the melody from generic audio files. Due to the

More information

Automatic Laughter Detection

Automatic Laughter Detection Automatic Laughter Detection Mary Knox 1803707 knoxm@eecs.berkeley.edu December 1, 006 Abstract We built a system to automatically detect laughter from acoustic features of audio. To implement the system,

More information

Pitch Perception and Grouping. HST.723 Neural Coding and Perception of Sound

Pitch Perception and Grouping. HST.723 Neural Coding and Perception of Sound Pitch Perception and Grouping HST.723 Neural Coding and Perception of Sound Pitch Perception. I. Pure Tones The pitch of a pure tone is strongly related to the tone s frequency, although there are small

More information

Topics in Computer Music Instrument Identification. Ioanna Karydi

Topics in Computer Music Instrument Identification. Ioanna Karydi Topics in Computer Music Instrument Identification Ioanna Karydi Presentation overview What is instrument identification? Sound attributes & Timbre Human performance The ideal algorithm Selected approaches

More information

Voice source and acoustic measures of girls singing classical and contemporary commercial styles

Voice source and acoustic measures of girls singing classical and contemporary commercial styles International Symposium on Performance Science ISBN 978-90-9022484-8 The Author 2007, Published by the AEC All rights reserved Voice source and acoustic measures of girls singing classical and contemporary

More information

Contest and Judging Manual

Contest and Judging Manual Contest and Judging Manual Published by the A Cappella Education Association Current revisions to this document are online at www.acappellaeducators.com April 2018 2 Table of Contents Adjudication Practices...

More information

Getting Started. Connect green audio output of SpikerBox/SpikerShield using green cable to your headphones input on iphone/ipad.

Getting Started. Connect green audio output of SpikerBox/SpikerShield using green cable to your headphones input on iphone/ipad. Getting Started First thing you should do is to connect your iphone or ipad to SpikerBox with a green smartphone cable. Green cable comes with designators on each end of the cable ( Smartphone and SpikerBox

More information

APP USE USER MANUAL 2017 VERSION BASED ON WAVE TRACKING TECHNIQUE

APP USE USER MANUAL 2017 VERSION BASED ON WAVE TRACKING TECHNIQUE APP USE USER MANUAL 2017 VERSION BASED ON WAVE TRACKING TECHNIQUE All rights reserved All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in

More information

Improving Frame Based Automatic Laughter Detection

Improving Frame Based Automatic Laughter Detection Improving Frame Based Automatic Laughter Detection Mary Knox EE225D Class Project knoxm@eecs.berkeley.edu December 13, 2007 Abstract Laughter recognition is an underexplored area of research. My goal for

More information

Automatic music transcription

Automatic music transcription Music transcription 1 Music transcription 2 Automatic music transcription Sources: * Klapuri, Introduction to music transcription, 2006. www.cs.tut.fi/sgn/arg/klap/amt-intro.pdf * Klapuri, Eronen, Astola:

More information

Audio-Based Video Editing with Two-Channel Microphone

Audio-Based Video Editing with Two-Channel Microphone Audio-Based Video Editing with Two-Channel Microphone Tetsuya Takiguchi Organization of Advanced Science and Technology Kobe University, Japan takigu@kobe-u.ac.jp Yasuo Ariki Organization of Advanced Science

More information

AN ALGORITHM FOR LOCATING FUNDAMENTAL FREQUENCY (F0) MARKERS IN SPEECH

AN ALGORITHM FOR LOCATING FUNDAMENTAL FREQUENCY (F0) MARKERS IN SPEECH AN ALGORITHM FOR LOCATING FUNDAMENTAL FREQUENCY (F0) MARKERS IN SPEECH by Princy Dikshit B.E (C.S) July 2000, Mangalore University, India A Thesis Submitted to the Faculty of Old Dominion University in

More information

Topic 10. Multi-pitch Analysis

Topic 10. Multi-pitch Analysis Topic 10 Multi-pitch Analysis What is pitch? Common elements of music are pitch, rhythm, dynamics, and the sonic qualities of timbre and texture. An auditory perceptual attribute in terms of which sounds

More information

TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC

TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC G.TZANETAKIS, N.HU, AND R.B. DANNENBERG Computer Science Department, Carnegie Mellon University 5000 Forbes Avenue, Pittsburgh, PA 15213, USA E-mail: gtzan@cs.cmu.edu

More information

DAY 1. Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval

DAY 1. Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval DAY 1 Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval Jay LeBoeuf Imagine Research jay{at}imagine-research.com Rebecca

More information

Case Study Monitoring for Reliability

Case Study Monitoring for Reliability 1566 La Pradera Dr Campbell, CA 95008 www.videoclarity.com 408-379-6952 Case Study Monitoring for Reliability Video Clarity, Inc. Version 1.0 A Video Clarity Case Study page 1 of 10 Digital video is everywhere.

More information

Automatic Construction of Synthetic Musical Instruments and Performers

Automatic Construction of Synthetic Musical Instruments and Performers Ph.D. Thesis Proposal Automatic Construction of Synthetic Musical Instruments and Performers Ning Hu Carnegie Mellon University Thesis Committee Roger B. Dannenberg, Chair Michael S. Lewicki Richard M.

More information

AUD 6306 Speech Science

AUD 6306 Speech Science AUD 3 Speech Science Dr. Peter Assmann Spring semester 2 Role of Pitch Information Pitch contour is the primary cue for tone recognition Tonal languages rely on pitch level and differences to convey lexical

More information

Multiple instrument tracking based on reconstruction error, pitch continuity and instrument activity

Multiple instrument tracking based on reconstruction error, pitch continuity and instrument activity Multiple instrument tracking based on reconstruction error, pitch continuity and instrument activity Holger Kirchhoff 1, Simon Dixon 1, and Anssi Klapuri 2 1 Centre for Digital Music, Queen Mary University

More information

EXPLORING THE USE OF ENF FOR MULTIMEDIA SYNCHRONIZATION

EXPLORING THE USE OF ENF FOR MULTIMEDIA SYNCHRONIZATION EXPLORING THE USE OF ENF FOR MULTIMEDIA SYNCHRONIZATION Hui Su, Adi Hajj-Ahmad, Min Wu, and Douglas W. Oard {hsu, adiha, minwu, oard}@umd.edu University of Maryland, College Park ABSTRACT The electric

More information

A Bayesian Network for Real-Time Musical Accompaniment

A Bayesian Network for Real-Time Musical Accompaniment A Bayesian Network for Real-Time Musical Accompaniment Christopher Raphael Department of Mathematics and Statistics, University of Massachusetts at Amherst, Amherst, MA 01003-4515, raphael~math.umass.edu

More information

2 2. Melody description The MPEG-7 standard distinguishes three types of attributes related to melody: the fundamental frequency LLD associated to a t

2 2. Melody description The MPEG-7 standard distinguishes three types of attributes related to melody: the fundamental frequency LLD associated to a t MPEG-7 FOR CONTENT-BASED MUSIC PROCESSING Λ Emilia GÓMEZ, Fabien GOUYON, Perfecto HERRERA and Xavier AMATRIAIN Music Technology Group, Universitat Pompeu Fabra, Barcelona, SPAIN http://www.iua.upf.es/mtg

More information

Zooming into saxophone performance: Tongue and finger coordination

Zooming into saxophone performance: Tongue and finger coordination International Symposium on Performance Science ISBN 978-2-9601378-0-4 The Author 2013, Published by the AEC All rights reserved Zooming into saxophone performance: Tongue and finger coordination Alex Hofmann

More information

Multidimensional analysis of interdependence in a string quartet

Multidimensional analysis of interdependence in a string quartet International Symposium on Performance Science The Author 2013 ISBN tbc All rights reserved Multidimensional analysis of interdependence in a string quartet Panos Papiotis 1, Marco Marchini 1, and Esteban

More information

Acoustic and musical foundations of the speech/song illusion

Acoustic and musical foundations of the speech/song illusion Acoustic and musical foundations of the speech/song illusion Adam Tierney, *1 Aniruddh Patel #2, Mara Breen^3 * Department of Psychological Sciences, Birkbeck, University of London, United Kingdom # Department

More information

AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC

AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC A Thesis Presented to The Academic Faculty by Xiang Cao In Partial Fulfillment of the Requirements for the Degree Master of Science

More information

Keywords Separation of sound, percussive instruments, non-percussive instruments, flexible audio source separation toolbox

Keywords Separation of sound, percussive instruments, non-percussive instruments, flexible audio source separation toolbox Volume 4, Issue 4, April 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Investigation

More information

NAA ENHANCING THE QUALITY OF MARKING PROJECT: THE EFFECT OF SAMPLE SIZE ON INCREASED PRECISION IN DETECTING ERRANT MARKING

NAA ENHANCING THE QUALITY OF MARKING PROJECT: THE EFFECT OF SAMPLE SIZE ON INCREASED PRECISION IN DETECTING ERRANT MARKING NAA ENHANCING THE QUALITY OF MARKING PROJECT: THE EFFECT OF SAMPLE SIZE ON INCREASED PRECISION IN DETECTING ERRANT MARKING Mudhaffar Al-Bayatti and Ben Jones February 00 This report was commissioned by

More information

Instructions to Authors

Instructions to Authors Instructions to Authors European Journal of Psychological Assessment Hogrefe Publishing GmbH Merkelstr. 3 37085 Göttingen Germany Tel. +49 551 999 50 0 Fax +49 551 999 50 111 publishing@hogrefe.com www.hogrefe.com

More information

CSC475 Music Information Retrieval

CSC475 Music Information Retrieval CSC475 Music Information Retrieval Monophonic pitch extraction George Tzanetakis University of Victoria 2014 G. Tzanetakis 1 / 32 Table of Contents I 1 Motivation and Terminology 2 Psychacoustics 3 F0

More information

Estimating the Time to Reach a Target Frequency in Singing

Estimating the Time to Reach a Target Frequency in Singing THE NEUROSCIENCES AND MUSIC III: DISORDERS AND PLASTICITY Estimating the Time to Reach a Target Frequency in Singing Sean Hutchins a and David Campbell b a Department of Psychology, McGill University,

More information

Analysis of local and global timing and pitch change in ordinary

Analysis of local and global timing and pitch change in ordinary Alma Mater Studiorum University of Bologna, August -6 6 Analysis of local and global timing and pitch change in ordinary melodies Roger Watt Dept. of Psychology, University of Stirling, Scotland r.j.watt@stirling.ac.uk

More information

How do we perceive vocal pitch accuracy during singing? Pauline Larrouy-Maestri & Peter Q Pfordresher

How do we perceive vocal pitch accuracy during singing? Pauline Larrouy-Maestri & Peter Q Pfordresher How do we perceive vocal pitch accuracy during singing? Pauline Larrouy-Maestri & Peter Q Pfordresher March 3rd 2014 In tune? 2 In tune? 3 Singing (a melody) Definition è Perception of musical errors Between

More information

MAutoPitch. Presets button. Left arrow button. Right arrow button. Randomize button. Save button. Panic button. Settings button

MAutoPitch. Presets button. Left arrow button. Right arrow button. Randomize button. Save button. Panic button. Settings button MAutoPitch Presets button Presets button shows a window with all available presets. A preset can be loaded from the preset window by double-clicking on it, using the arrow buttons or by using a combination

More information

A prototype system for rule-based expressive modifications of audio recordings

A prototype system for rule-based expressive modifications of audio recordings International Symposium on Performance Science ISBN 0-00-000000-0 / 000-0-00-000000-0 The Author 2007, Published by the AEC All rights reserved A prototype system for rule-based expressive modifications

More information

Speech and Speaker Recognition for the Command of an Industrial Robot

Speech and Speaker Recognition for the Command of an Industrial Robot Speech and Speaker Recognition for the Command of an Industrial Robot CLAUDIA MOISA*, HELGA SILAGHI*, ANDREI SILAGHI** *Dept. of Electric Drives and Automation University of Oradea University Street, nr.

More information

Introduction to Performance Fundamentals

Introduction to Performance Fundamentals Introduction to Performance Fundamentals Produce a characteristic vocal tone? Demonstrate appropriate posture and breathing techniques? Read basic notation? Demonstrate pitch discrimination? Demonstrate

More information

WHAT MAKES FOR A HIT POP SONG? WHAT MAKES FOR A POP SONG?

WHAT MAKES FOR A HIT POP SONG? WHAT MAKES FOR A POP SONG? WHAT MAKES FOR A HIT POP SONG? WHAT MAKES FOR A POP SONG? NICHOLAS BORG AND GEORGE HOKKANEN Abstract. The possibility of a hit song prediction algorithm is both academically interesting and industry motivated.

More information

ANALYSING DIFFERENCES BETWEEN THE INPUT IMPEDANCES OF FIVE CLARINETS OF DIFFERENT MAKES

ANALYSING DIFFERENCES BETWEEN THE INPUT IMPEDANCES OF FIVE CLARINETS OF DIFFERENT MAKES ANALYSING DIFFERENCES BETWEEN THE INPUT IMPEDANCES OF FIVE CLARINETS OF DIFFERENT MAKES P Kowal Acoustics Research Group, Open University D Sharp Acoustics Research Group, Open University S Taherzadeh

More information

Hidden Markov Model based dance recognition

Hidden Markov Model based dance recognition Hidden Markov Model based dance recognition Dragutin Hrenek, Nenad Mikša, Robert Perica, Pavle Prentašić and Boris Trubić University of Zagreb, Faculty of Electrical Engineering and Computing Unska 3,

More information

Singing accuracy, listeners tolerance, and pitch analysis

Singing accuracy, listeners tolerance, and pitch analysis Singing accuracy, listeners tolerance, and pitch analysis Pauline Larrouy-Maestri Pauline.Larrouy-Maestri@aesthetics.mpg.de Johanna Devaney Devaney.12@osu.edu Musical errors Contour error Interval error

More information

The Measurement Tools and What They Do

The Measurement Tools and What They Do 2 The Measurement Tools The Measurement Tools and What They Do JITTERWIZARD The JitterWizard is a unique capability of the JitterPro package that performs the requisite scope setup chores while simplifying

More information

Controlling Musical Tempo from Dance Movement in Real-Time: A Possible Approach

Controlling Musical Tempo from Dance Movement in Real-Time: A Possible Approach Controlling Musical Tempo from Dance Movement in Real-Time: A Possible Approach Carlos Guedes New York University email: carlos.guedes@nyu.edu Abstract In this paper, I present a possible approach for

More information

PHYSICS OF MUSIC. 1.) Charles Taylor, Exploring Music (Music Library ML3805 T )

PHYSICS OF MUSIC. 1.) Charles Taylor, Exploring Music (Music Library ML3805 T ) REFERENCES: 1.) Charles Taylor, Exploring Music (Music Library ML3805 T225 1992) 2.) Juan Roederer, Physics and Psychophysics of Music (Music Library ML3805 R74 1995) 3.) Physics of Sound, writeup in this

More information

MELODIC AND RHYTHMIC CONTRASTS IN EMOTIONAL SPEECH AND MUSIC

MELODIC AND RHYTHMIC CONTRASTS IN EMOTIONAL SPEECH AND MUSIC MELODIC AND RHYTHMIC CONTRASTS IN EMOTIONAL SPEECH AND MUSIC Lena Quinto, William Forde Thompson, Felicity Louise Keating Psychology, Macquarie University, Australia lena.quinto@mq.edu.au Abstract Many

More information

Predicting Variation of Folk Songs: A Corpus Analysis Study on the Memorability of Melodies Janssen, B.D.; Burgoyne, J.A.; Honing, H.J.

Predicting Variation of Folk Songs: A Corpus Analysis Study on the Memorability of Melodies Janssen, B.D.; Burgoyne, J.A.; Honing, H.J. UvA-DARE (Digital Academic Repository) Predicting Variation of Folk Songs: A Corpus Analysis Study on the Memorability of Melodies Janssen, B.D.; Burgoyne, J.A.; Honing, H.J. Published in: Frontiers in

More information

SINGING PITCH EXTRACTION BY VOICE VIBRATO/TREMOLO ESTIMATION AND INSTRUMENT PARTIAL DELETION

SINGING PITCH EXTRACTION BY VOICE VIBRATO/TREMOLO ESTIMATION AND INSTRUMENT PARTIAL DELETION th International Society for Music Information Retrieval Conference (ISMIR ) SINGING PITCH EXTRACTION BY VOICE VIBRATO/TREMOLO ESTIMATION AND INSTRUMENT PARTIAL DELETION Chao-Ling Hsu Jyh-Shing Roger Jang

More information

Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting

Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting Dalwon Jang 1, Seungjae Lee 2, Jun Seok Lee 2, Minho Jin 1, Jin S. Seo 2, Sunil Lee 1 and Chang D. Yoo 1 1 Korea Advanced

More information

Using the new psychoacoustic tonality analyses Tonality (Hearing Model) 1

Using the new psychoacoustic tonality analyses Tonality (Hearing Model) 1 02/18 Using the new psychoacoustic tonality analyses 1 As of ArtemiS SUITE 9.2, a very important new fully psychoacoustic approach to the measurement of tonalities is now available., based on the Hearing

More information

Supervised Learning in Genre Classification

Supervised Learning in Genre Classification Supervised Learning in Genre Classification Introduction & Motivation Mohit Rajani and Luke Ekkizogloy {i.mohit,luke.ekkizogloy}@gmail.com Stanford University, CS229: Machine Learning, 2009 Now that music

More information

Topic 11. Score-Informed Source Separation. (chroma slides adapted from Meinard Mueller)

Topic 11. Score-Informed Source Separation. (chroma slides adapted from Meinard Mueller) Topic 11 Score-Informed Source Separation (chroma slides adapted from Meinard Mueller) Why Score-informed Source Separation? Audio source separation is useful Music transcription, remixing, search Non-satisfying

More information

The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng

The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng S. Zhu, P. Ji, W. Kuang and J. Yang Institute of Acoustics, CAS, O.21, Bei-Si-huan-Xi Road, 100190 Beijing,

More information

Music Information Retrieval Using Audio Input

Music Information Retrieval Using Audio Input Music Information Retrieval Using Audio Input Lloyd A. Smith, Rodger J. McNab and Ian H. Witten Department of Computer Science University of Waikato Private Bag 35 Hamilton, New Zealand {las, rjmcnab,

More information

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University Week 14 Query-by-Humming and Music Fingerprinting Roger B. Dannenberg Professor of Computer Science, Art and Music Overview n Melody-Based Retrieval n Audio-Score Alignment n Music Fingerprinting 2 Metadata-based

More information

Book: Fundamentals of Music Processing. Audio Features. Book: Fundamentals of Music Processing. Book: Fundamentals of Music Processing

Book: Fundamentals of Music Processing. Audio Features. Book: Fundamentals of Music Processing. Book: Fundamentals of Music Processing Book: Fundamentals of Music Processing Lecture Music Processing Audio Features Meinard Müller International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de Meinard Müller Fundamentals

More information

Augmentation Matrix: A Music System Derived from the Proportions of the Harmonic Series

Augmentation Matrix: A Music System Derived from the Proportions of the Harmonic Series -1- Augmentation Matrix: A Music System Derived from the Proportions of the Harmonic Series JERICA OBLAK, Ph. D. Composer/Music Theorist 1382 1 st Ave. New York, NY 10021 USA Abstract: - The proportional

More information

Interface Practices Subcommittee SCTE STANDARD SCTE Measurement Procedure for Noise Power Ratio

Interface Practices Subcommittee SCTE STANDARD SCTE Measurement Procedure for Noise Power Ratio Interface Practices Subcommittee SCTE STANDARD SCTE 119 2018 Measurement Procedure for Noise Power Ratio NOTICE The Society of Cable Telecommunications Engineers (SCTE) / International Society of Broadband

More information

Computational Modelling of Harmony

Computational Modelling of Harmony Computational Modelling of Harmony Simon Dixon Centre for Digital Music, Queen Mary University of London, Mile End Rd, London E1 4NS, UK simon.dixon@elec.qmul.ac.uk http://www.elec.qmul.ac.uk/people/simond

More information

Soundprism: An Online System for Score-Informed Source Separation of Music Audio Zhiyao Duan, Student Member, IEEE, and Bryan Pardo, Member, IEEE

Soundprism: An Online System for Score-Informed Source Separation of Music Audio Zhiyao Duan, Student Member, IEEE, and Bryan Pardo, Member, IEEE IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 5, NO. 6, OCTOBER 2011 1205 Soundprism: An Online System for Score-Informed Source Separation of Music Audio Zhiyao Duan, Student Member, IEEE,

More information

AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION

AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION Halfdan Rump, Shigeki Miyabe, Emiru Tsunoo, Nobukata Ono, Shigeki Sagama The University of Tokyo, Graduate

More information

y POWER USER MUSIC PRODUCTION and PERFORMANCE With the MOTIF ES Mastering the Sample SLICE function

y POWER USER MUSIC PRODUCTION and PERFORMANCE With the MOTIF ES Mastering the Sample SLICE function y POWER USER MUSIC PRODUCTION and PERFORMANCE With the MOTIF ES Mastering the Sample SLICE function Phil Clendeninn Senior Product Specialist Technology Products Yamaha Corporation of America Working with

More information

A Beat Tracking System for Audio Signals

A Beat Tracking System for Audio Signals A Beat Tracking System for Audio Signals Simon Dixon Austrian Research Institute for Artificial Intelligence, Schottengasse 3, A-1010 Vienna, Austria. simon@ai.univie.ac.at April 7, 2000 Abstract We present

More information

Timbre blending of wind instruments: acoustics and perception

Timbre blending of wind instruments: acoustics and perception Timbre blending of wind instruments: acoustics and perception Sven-Amin Lembke CIRMMT / Music Technology Schulich School of Music, McGill University sven-amin.lembke@mail.mcgill.ca ABSTRACT The acoustical

More information

ACCURATE ANALYSIS AND VISUAL FEEDBACK OF VIBRATO IN SINGING. University of Porto - Faculty of Engineering -DEEC Porto, Portugal

ACCURATE ANALYSIS AND VISUAL FEEDBACK OF VIBRATO IN SINGING. University of Porto - Faculty of Engineering -DEEC Porto, Portugal ACCURATE ANALYSIS AND VISUAL FEEDBACK OF VIBRATO IN SINGING José Ventura, Ricardo Sousa and Aníbal Ferreira University of Porto - Faculty of Engineering -DEEC Porto, Portugal ABSTRACT Vibrato is a frequency

More information

Music BCI ( )

Music BCI ( ) Music BCI (006-2015) Matthias Treder, Benjamin Blankertz Technische Universität Berlin, Berlin, Germany September 5, 2016 1 Introduction We investigated the suitability of musical stimuli for use in a

More information