GMM-based Synchronization Rules for HMM-based Audio-Visual Laughter Synthesis


2015 International Conference on Affective Computing and Intelligent Interaction (ACII)

Hüseyin Çakmak, UMONS, Place du Parc 20, 7000 Mons
Kévin El Haddad, UMONS, Place du Parc 20, 7000 Mons
Thierry Dutoit, UMONS, Place du Parc 20, 7000 Mons

Abstract—In this paper we propose synchronization rules between acoustic and visual laughter synthesis systems. Previous works have addressed acoustic and visual laughter synthesis separately, following an HMM-based approach. The need for synchronization rules comes from the constraint that, in laughter, HMM-based synthesis cannot be performed using a unified system where common transcriptions may be used, as has been shown to be the case for audio-visual speech synthesis. Therefore, acoustic and visual models are trained independently, without any synchronization constraints. In this work, we propose rules derived from the analysis of audio and visual laughter transcriptions in order to be able to generate visual laughter transcriptions corresponding to audio laughter data.

Keywords—audio-visual; synchronization; laughter; synthesis

I. INTRODUCTION

Among the features of human interactions, laughter is one of the most significant. It is a way to express our emotions and may even be a response in some interactions. The last decades witnessed considerable progress in speech processing and affective computing. Also, human-machine interactions are becoming more and more present in our daily lives. So, considering the importance of laughter in our daily communications, this non-verbal communicative signal can and should be successfully detected, analyzed and produced by machines. This work focuses on laughter production and more specifically on the synchronization between audio and visual laughter in the framework of HMM-based audio-visual laughter synthesis.

Acoustic synthesis of laughter using Hidden Markov Models (HMMs) has already been addressed in a previous work [1]. To characterize the acoustic laughter, phonemic transcriptions were used and the results outperformed the state of the art. Extensions of the latter work were made to perform automatic phonemic transcription [2] and to integrate arousal in the system [3].

The goal of audio-visual laughter synthesis is to generate an audio waveform of laughter as well as its corresponding facial animation. In statistical data-driven audio-visual speech synthesis, it is common that acoustic and visual models are trained separately [4], [5], [6], [7]. The training also sometimes includes an additional explicit time-difference model for synchronization purposes [8], [9]. In 2014, a visual laughter synthesis system was also proposed and is the basis of the visual laughter synthesis in this work [10]. Indeed, the authors showed in that work that a separate segmentation of the laughter is needed to correctly model the visual trajectories, meaning that phonemic transcriptions are not suited to describe the visual cues for laughter, as has been shown to be feasible for speech [8], [11], [12], [13]. Further developments have shown that the head motion should be modeled separately as well [14]. Modeling audio, facial data and head data independently means having specific transcriptions for each, and thus the need for synchronization arises. In [10], the synchronization between modalities was guaranteed by imposing synthesized durations to be the same as in the database, in which the transcriptions are synchronous in the first place.
To bring this to the next level and to be able to synthesize audio-visual laughter with any wanted duration, a method was proposed in [15] to model the relationships between transcriptions. An improved method for synchronization between audio and visual transcriptions is proposed in this paper. The basic principle underlying the proposed method is a Gaussian Mixture Model (GMM)-based mapping used to generate the time delays between the beginning (ending) of the audio and the beginning (ending) of the visual laughter. First, a silence removal method is used to estimate at what times the laugh begins and ends in the given audio file. Then, a GMM [16] is trained on features extracted from the audio. It is then used to generate the time delays to add to the audio laughter limits in order to obtain the visual laughter limits. Once these limits are set, visual transcriptions may be built to feed an HMM-based visual laughter synthesizer, which will produce visual trajectories synchronous with the given laughter audio file. Two improvements are introduced in comparison to the previous work: i) the phonemic transcriptions are not needed anymore, since the method relies only on the input audio file; ii) the accuracy of the predicted delays has been improved, as shown by the RMSE comparison.

The paper is organized as follows: Section II gives a brief overview of the database used in this work, Section III explains the audio and visual laughter synthesis methods in the frame of which this work is taking place, Section IV explains the method previously proposed for synchronization between acoustic and visual synthesis, Section V describes the new method proposed in this paper, Section VI describes the evaluation and Section VII concludes and gives an overview of future work.

Figure 1. Data recording pipeline.
Figure 2. Overview of the pipeline for HMM-based audio-visual laughter synthesis.

II. THE AVLASYN DATABASE

The AVLASYN database [17] used in this work is a synchronous audio-visual laughter database designed for laughter synthesis. The corpus contains data from one male subject recorded using professional audio equipment and a marker-based motion capture system. Figure 1 gives an overview of the recording pipeline. The database contains laughter-segmented audio files in WAV format and the corresponding motion data in the Biovision Hierarchy (BVH) format. A first segmentation was done to get files containing only laughter, then these files were phonemically annotated. Please refer to [18] for more information on transcriptions. The laughs were triggered by watching videos found on the web. The subject was free to watch whatever he wanted. A total amount of 125 minutes was watched by the subject to build this corpus. This led to roughly 48 minutes of visual laughter and 13 minutes of audible laughter. This work uses a subset of the AVLASYN database. In this work, only the most common laugh pattern (i.e. a silence phase followed by a laugh and possibly an inhalation, and finishing with a silence) is considered [19].

III. HMM-BASED LAUGHTER SYNTHESIS

Details on the models can be found in [1] for the acoustic models and in [10] for the visual models. A brief overview is given below. The HMM-based trajectories were synthesized using the HMM-based Speech Synthesis System (HTS) [20]. Figure 2 gives the general pipeline followed to build the models. The main steps that must be introduced to understand the remainder of this paper are:
1) Features are extracted from the audio, the face movements and the head movements ([10], [14]).
2) These features are modeled independently with their respective transcriptions.
3) Once the models are trained for each modality, trajectories are synthesized. For audio synthesis, the duration of each phone may either be estimated by the system or may be imposed. For visual synthesis, durations are imposed from rules based on acoustic features so as to be synchronized with the audio (cf. Sections IV and V).

The reference visual transcriptions are built from an automated Gaussian Mixture Model (GMM)-based segmentation system detailed in [10], [14]. The aim of the present work is to be able to generate new such visual transcriptions that are in synchronization with a corresponding audio laughter file. This would allow synthesizing a visual laughter animation, from existing HMM models, that is consistent with a given audio laughter. Since the synchronization method presented in this work does not rely on the synthesis method itself, we focus on visual transcription generation starting from a given audio file in the remainder of this paper.

IV. PREVIOUS ATTEMPT

In the HMM-based synthesis framework, the first stage is the training of the models. The training stage strongly relies on the provided annotations. It is thus important that the annotations correctly represent the data that the HMMs will model. In the case of acoustic data, the annotations are phonemic transcriptions.
An ideal case would have been that these phonemic transcriptions could be used as annotations for the visual data as well. This was successfully applied to audio-visual speech synthesis in [11], [12], [13], [8]. However, due to the fact that laughter, contrary to speech, is an inarticulate utterance [21], the correlation between the produced sounds and the facial expressions, in particular the mouth shape, is much lower, which makes it impossible to use the same annotations for both modalities. This is why separate annotations were necessary for visual data training [10]. Instead of using phonemic classes, three specific classes related to the deformations of the face were used. A subsequent study has shown that a third modality, the head motion, should be considered independently in order to better model the head shaking motion.

Figure 3. Schematic representation of the different transcriptions (audio, face, head) during laughter.

This approach performed better in perception tests [14]. Finally, three different modalities (audio, facial and head movements), all related to the same phenomenon (laughter), are used. Each modality has its own transcriptions, and therefore synchronization rules between these modalities are necessary, since they are trained independently and nothing ensures synchronization at the synthesis step.

The transcriptions for the audio modality consist of several successive phones such as fricatives, different vowels, nasal sounds, nareal fricatives, silence and inhalation (cf. [17] for more details). The most common phonemic sequence for audio laughter is similar to: silence-h-a-h-a-h-a-inhalation-silence. In the case of facial data, three classes are used in the visual transcriptions: a neutral class, a laughter class and a Semi-smile class. The latter is a facial expression between no expression at all and a slight smile (cf. [10] for more details). The majority of the laughs in the database are a succession of the first two classes in the following order: Neutral-Laughter-Neutral. Finally, the head motion transcriptions are the result of a sub-segmentation of the facial laughter class defined above. Each occurrence of one class during a laughter sequence represents one period of the head oscillation that occurs in laughter (cf. [14] for more details).

Figure 3 gives a schematic overview of the different transcriptions. As we can see, the beginning of the audio laughter (end of the silence) is not exactly aligned with the beginning of the visual laughter (end of the neutral face). Similarly, the visual laughter class ends some time after the last audible contribution. This shows that visual laughter is temporally wider than acoustic voiced laughter. Figure 3 also shows head oscillations with red circles. As may be seen, the head motion transcriptions are defined such that the Neutral class remains the same as in the facial transcriptions and the laughter class is sub-segmented into oscillation periods.

The previous attempt to build a synchronization method was to study the relation between the audio and facial data transcriptions. The aim was to derive rules from the study of the transcriptions and to later use these rules to produce facial transcriptions corresponding to phonemic transcriptions. To model the relationship between audio and visual transcriptions, the time delay between the end of the initial silence in the phonemic transcriptions and the end of the neutral expression in the visual transcriptions was calculated. The time shift at the end of the laughter between the audio and visual modalities was calculated as well. These time delays were modeled using kernel density estimation, which is a non-parametric method to estimate the probability density function of a random variable [22], [23]. This fitting process was done for three different cases for the beginning delay and three different cases for the ending delay.
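To make this previous approach concrete, the following is a minimal sketch of such a kernel-density model of the delays using scipy; the delay values and the sign convention are illustrative assumptions, not numbers taken from [15].

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical beginning delays (in seconds) for one of the six cases of Table I,
# e.g. laughs whose phonemic transcription starts with silence-nasal. The sign
# convention assumed here is "visual limit = audio limit + delay", so negative
# values mean that the visual laughter starts before the audio laughter.
delays_start_silence_nasal = np.array([-0.12, -0.25, -0.08, -0.31, -0.18, -0.22])

# Non-parametric estimate of the probability density function of the delays.
kde = gaussian_kde(delays_start_silence_nasal)

# At synthesis time, draw one delay for a new laugh belonging to this case and
# shift the detected audio onset by it to obtain the visual laughter onset.
sampled_delay = kde.resample(size=1)[0, 0]
t_audio_start = 1.50                      # hypothetical audio laughter onset (s)
t_visual_start = t_audio_start + sampled_delay
print(f"sampled delay: {sampled_delay:+.3f} s -> visual onset: {t_visual_start:.3f} s")
```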
Table I gives an overview of the probability density functions used in [15].

Table I. The six PDFs used to model the delays between audio and visual transcriptions. Different PDFs were built based on characteristics of the acoustic transcriptions (nasal = voiced nasal sound like "n"; nf = nareal fricative, corresponding to an unvoiced air expulsion from the nose; fricative_h_i = the "h" sound occurring when inhaling with the mouth; nf_i = the sound occurring when inhaling with the nose).
  Δ_AV,start : silence-nasal / silence-nf / anything else
  Δ_AV,end   : ending inhalation is fricative_h_i / ending inhalation is nf_i / anything else

In that work, as in the present one, the facial transcriptions are assumed to be a sequence of type Neutral-Laughter-Neutral, which is the most common sequence (±80% of the database). Once the facial transcriptions are generated, head motion transcriptions are generated from these facial transcriptions. Finally, the generated visual transcriptions are used as input to their respective HMM models for synthesis and trajectories are produced. The synthesized visual data for face and head are then merged and transformed appropriately [14] before application on a 3D face. Finally, video animations are produced with the corresponding audio data.
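As an illustration of how generated delays can be turned into a Neutral-Laughter-Neutral facial transcription, here is a minimal sketch; the function name, label strings and time values are hypothetical placeholders, not the authors' label format.

```python
from typing import List, Tuple

def build_facial_transcription(t_audio_start: float, t_audio_end: float,
                               delta_start: float, delta_end: float,
                               total_duration: float) -> List[Tuple[float, float, str]]:
    """Return a Neutral-Laughter-Neutral segment list (start, end, label), in seconds.

    Assumes the delays are expressed so that adding them to the audio laughter
    limits yields the visual laughter limits, as described in the paper.
    """
    t_visual_start = max(0.0, t_audio_start + delta_start)
    t_visual_end = min(total_duration, t_audio_end + delta_end)
    return [
        (0.0, t_visual_start, "Neutral"),
        (t_visual_start, t_visual_end, "Laughter"),
        (t_visual_end, total_duration, "Neutral"),
    ]

# Hypothetical laugh: audio laughter detected between 0.80 s and 3.20 s in a
# 4.00 s file, with predicted delays of -0.15 s (visual starts earlier) and +0.40 s.
for start, end, label in build_facial_transcription(0.80, 3.20, -0.15, 0.40, 4.00):
    print(f"{start:.2f}\t{end:.2f}\t{label}")
```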

V. PROPOSED GMM-BASED MAPPING METHOD

The present work also aims at estimating the temporal boundaries of visual laughter to feed an HMM-based visual laughter synthesizer, as explained above. Compared to the previous method detailed in Section IV, two improvements are targeted:
1) To remove the need for phonemic transcriptions as an input to the synchronization system. The aim is to be able to work directly with the audio file using an automated process, rather than using manually annotated phonemic transcriptions which may not always be available.
2) To improve the accuracy of the estimated time delays by using a GMM-mapping approach based on acoustic features directly extracted from the input audio laughter file.

These improvements would bring us one step further towards the synchronization of a laughter animation with any given audio laughter as input. Figure 4 gives an overview of the proposed method.

A. Defining references from audio

In order to obtain the transcriptions automatically, a simple silence removal method implemented in Matlab [24] was used to discriminate the silent parts from the laughter parts in a given audio file. Originally intended for audio signals containing speech, this method proved to be efficient for laughter detection as well. Note that this is not a laughter recognition process, since our audio files contain only laughter or silence, as mentioned earlier. Audio files are divided into non-overlapping frames of 5 ms each. Two features are then extracted from each frame: the signal energy and the spectral centroid. Threshold values are then determined for each of these features and used to discriminate the laughter segments from the silence ones, as described in [24]. As the method may give several laughter detections in the same laughter segment, a post-processing step is applied to merge the possibly overlapping laughter segments detected, so that each audio file contains a single audio laughter segment, as the input files are assumed to.

To evaluate this method, the beginning and ending times of the laughter were compared to their corresponding values in the manual transcriptions. The comparison was done via a simple Root Mean Square Error (RMSE) estimation. For the beginning and ending times, we obtained RMSE values of … sec and … sec respectively. These results suggest that the method used is accurate enough for the purpose of this work. Indeed, rather than finding exactly the same boundaries as in the manual transcriptions, what is needed here is a consistent way of determining the positions of the beginning and ending times of the audio laughter. From these boundaries, the time delays between audio and visual laughter will be calculated and models will be built as explained below.
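For illustration, a minimal energy/spectral-centroid thresholding of this kind could look as follows; this is a sketch under simplifying assumptions, not the Matlab implementation of [24], and the frame size, thresholds and merging step are illustrative.

```python
import numpy as np

def laughter_boundaries(signal, fs, frame_len_s=0.005):
    """Return (t_start, t_end) in seconds of the single detected laughter segment.

    Frames whose energy and spectral centroid both exceed simple thresholds are
    marked as laughter; all detections are then merged into one segment, as in
    the post-processing step described above.
    """
    n = int(frame_len_s * fs)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    energy, centroid = [], []
    for i in range(0, len(signal) - n + 1, n):
        frame = signal[i:i + n]
        energy.append(np.mean(frame ** 2))
        spectrum = np.abs(np.fft.rfft(frame))
        centroid.append(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))
    energy, centroid = np.array(energy), np.array(centroid)
    # Illustrative thresholds; [24] estimates them from the feature histograms.
    keep = (energy > 0.5 * energy.mean()) & (centroid > 0.5 * centroid.mean())
    idx = np.flatnonzero(keep)
    if idx.size == 0:
        return None
    return idx[0] * frame_len_s, (idx[-1] + 1) * frame_len_s

# Example: a 4 s recording with a louder "laughter" burst between 1 s and 3 s.
fs = 16000
rng = np.random.default_rng(0)
sig = 0.01 * rng.standard_normal(4 * fs)
sig[1 * fs:3 * fs] += 0.5 * rng.standard_normal(2 * fs)
print(laughter_boundaries(sig, fs))
```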
B. GMM-based mapping to determine the time delays between audio and visual laughter

Once the beginning time t_A,start and ending time t_A,end of the audio laughter are determined using the method explained in the previous section, we can calculate the time delays Δ_AV,start and Δ_AV,end between these estimated audio limits and the visual laughter limits taken from the reference visual transcriptions. A temporal schematic representation of the time delays is given in Figure 5.

Figure 5. Schematic representation of the different times and delays (t_V,start, t_A,start, Δ_AV,start, t_A,end, Δ_AV,end, t_V,end).

1) Features used for GMM modeling: A set of features is extracted from the audio files. The first four features are scalar values, while the rest are curves extracted using a frame length of 25 ms and a frame shift of 10 ms. The considered features are given in Table II.

Table II. List of the features considered in GMM modeling.
  Scalar features:     RMS value of a curve derived from the spectrogram; Utterance length; Variance; duration = t_A,end − t_A,start
  Continuous features: Zero Crossing Rate; Energy; Energy Entropy; Spectral Centroid; Spectral Entropy; Spectral Flux; Spectral Rolloff; 13 MFCCs; Fundamental Frequency F0; Chroma Vector; Spectral Zone

To reduce the number of dimensions of each of these features, their histogram is calculated by imposing the number and the centers of the bins for each feature. This makes it possible to produce 3-dimensional feature vectors for each of the features listed above. RMS values are also included.

Pearson's correlation coefficients are then calculated between all the features and the values of the delays Δ_AV,start and Δ_AV,end for each file. The most correlated features are kept for GMM modeling; they are summarized below:
  - duration = t_A,end − t_A,start
  - RMS energy
  - 1st histogram bin of the Spectral Centroid
  - 3rd histogram bin of MFCC 1
  - 3rd histogram bin of MFCC 7
  - 1st histogram bin of MFCC 11
  - 3rd histogram bin of MFCC 13
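A small sketch of this histogram reduction and Pearson-based selection, under the assumption that the frame-level feature curves have already been computed; the bin edges and data are illustrative, not the values used by the authors.

```python
import numpy as np
from scipy.stats import pearsonr

def histogram_feature(curve, bin_edges):
    """Reduce a variable-length frame-level feature curve to a 3-bin histogram."""
    hist, _ = np.histogram(curve, bins=bin_edges)
    return hist / max(len(curve), 1)   # normalize so files of different lengths compare

# Hypothetical data: one spectral-centroid curve per laugh plus the measured
# beginning delays (in seconds). Bin edges are imposed, as described above.
rng = np.random.default_rng(1)
curves = [rng.normal(2000.0, 300.0, size=rng.integers(80, 200)) for _ in range(30)]
delta_start = rng.normal(-0.2, 0.05, size=30)

bin_edges = np.array([0.0, 1500.0, 2500.0, 8000.0])     # 3 bins (illustrative)
X = np.stack([histogram_feature(c, bin_edges) for c in curves])

# Keep the histogram bins that correlate most with the delay to be predicted.
for b in range(X.shape[1]):
    r, _ = pearsonr(X[:, b], delta_start)
    print(f"bin {b + 1}: Pearson r = {r:+.2f}")
```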

Figure 4. Overview of the proposed method for time delay prediction (defining references from audio by detecting t_A,start and t_A,end in the input WAV files, getting the time delays Δ_AV,start and Δ_AV,end from the database for training, feature extraction, GMM training, and GMM-based mapping from a new source feature vector to the output mapped time delays).

2) GMM-based mapping: We investigate the use of the GMM mapping framework proposed in 1996 by Stylianou [25] for voice conversion. The implementation used here is the one of Kain [26], also used in recent work such as [27]. The implementation is based on the joint probability density of source and target vectors p(z) = p(x, y), with:

$$ Z = \begin{bmatrix} X \\ Y \end{bmatrix} \qquad (1) $$

$$ Z = \begin{bmatrix} x_{11} & \cdots & x_{1 d_x} & y_{11} & \cdots & y_{1 d_y} \\ \vdots & & \vdots & \vdots & & \vdots \\ x_{N1} & \cdots & x_{N d_x} & y_{N1} & \cdots & y_{N d_y} \end{bmatrix} \qquad (2) $$

where X and Y are the sequences of source and target vectors (N observations each) and d_x, d_y are the dimensions of the source and target vectors. In the present work, d_x is equal to 7 (see the kept features above) and d_y is equal to 2, corresponding to Δ_AV,start and Δ_AV,end, which are the values that we want to predict from the set of 7 source features.

The mapping function that estimates the target vector ŷ_t starting from a source vector x_t at time t is formulated as

$$ \hat{y}_t = F(x_t) = \sum_{m=1}^{M} \left( W_m x_t + b_m \right) P(c_m \mid x_t) \qquad (3) $$

where W_m is the transformation matrix and b_m the bias vector related to the m-th component of the model; they are defined as

$$ W_m = \Sigma_m^{YX} \left( \Sigma_m^{XX} \right)^{-1} \qquad (4) $$

$$ b_m = \mu_m^{Y} - W_m \mu_m^{X} \qquad (5) $$

and where

$$ \Sigma_m = \begin{bmatrix} \Sigma_m^{XX} & \Sigma_m^{XY} \\ \Sigma_m^{YX} & \Sigma_m^{YY} \end{bmatrix} \qquad (6) $$

$$ \mu_m = \begin{bmatrix} \mu_m^{X} \\ \mu_m^{Y} \end{bmatrix} \qquad (7) $$

P(c_m | x_t) is the probability that the source vector is related to the m-th component. This probability is defined as

$$ P(c_m \mid x_t) = \frac{\alpha_m \, \mathcal{N}\!\left( x_t ; \mu_m^{X}, \Sigma_m^{XX} \right)}{\sum_{p=1}^{M} \alpha_p \, \mathcal{N}\!\left( x_t ; \mu_p^{X}, \Sigma_p^{XX} \right)} \qquad (8) $$

where N(x; μ, Σ) is a Gaussian distribution with mean μ and covariance matrix Σ, and α_m is the weight of the considered component. The Gaussian distributions are trained using the iterative Expectation-Maximization (EM) algorithm. Since the observations in this work are the laughter utterance files (from which a finite number of features, 7, are extracted), the data available for training the GMMs is quite limited. Therefore, in order to limit the number of parameters that need to be estimated in the training stage, we use the simplest configuration: a GMM with 1 component, 9 dimensions (source + target) and a full covariance matrix is trained.
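To illustrate the mapping with the single-component configuration used here, the following sketch trains the joint model and applies equations (3)-(5); with M = 1 the posterior of equation (8) is always 1, so the mapping reduces to a single linear transform. The data are synthetic and this is not the Kain implementation [26] actually used.

```python
import numpy as np

# Synthetic stand-in for the corpus: 30 laughs, 7 acoustic source features each,
# and the two measured delays [delta_AV_start, delta_AV_end] in seconds.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 7))
true_W = rng.normal(scale=0.05, size=(2, 7))
Y = X @ true_W.T + rng.normal(scale=0.01, size=(30, 2))

# Training: with a single full-covariance component, the EM fit reduces to the
# sample mean and covariance of the stacked joint vectors z = [x; y].
Z = np.hstack([X, Y])
mu = Z.mean(axis=0)
Sigma = np.cov(Z, rowvar=False)
dx = X.shape[1]
mu_x, mu_y = mu[:dx], mu[dx:]
Sigma_xx = Sigma[:dx, :dx]
Sigma_yx = Sigma[dx:, :dx]

# Mapping, equations (4)-(5): W = Sigma_yx Sigma_xx^-1, b = mu_y - W mu_x.
W = Sigma_yx @ np.linalg.inv(Sigma_xx)
b = mu_y - W @ mu_x

def predict_delays(x_new):
    """Equation (3) with M = 1: the posterior P(c|x) is 1, so y_hat = W x + b."""
    return W @ x_new + b

print(predict_delays(rng.normal(size=7)))
```

A leave-one-out loop around the training step (refitting mu and Sigma without the file being predicted) would reproduce the evaluation protocol described in the next section.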
VI. RMSE-BASED EVALUATION

The first synchronization method (Method 1), presented in Section IV, was evaluated in a previous study through perception tests. The results showed that the obtained synchronization was perceived as slightly less accurate than the case where the original visual transcriptions were used. To compare the method proposed in this work (Method 2) with Method 1, we calculated the Root Mean Square Error (RMSE) between the generated visual transcriptions and the original reference visual transcriptions for both methods. Since random processes are part of the methods, the accuracy and therefore the RMSE may change between two different applications of the methods. To alleviate this, we ran both methods on all the available files 100 times and calculated the mean RMSE for each method and for each delay to estimate (Δ_AV,start and Δ_AV,end). In the case of Method 2, which includes data-driven training, GMMs were trained for each file following a leave-one-out protocol, meaning that every time a target vector had to be estimated, the corresponding source vector was not included in the training while all the other observations in the available data were included. Table III gives the results.

Table III. Mean RMSE values with their standard errors for each method and each predicted value.
                               Method 1            Method 2
  Mean RMSE Δ_AV,start (sec)   … (std err. = …)    … (0.0014)
  Mean RMSE Δ_AV,end (sec)     … (0.0156)          …* (0.0059)

A Tukey Honest Significant Difference (HSD) test with a significance level of 95% was performed on the means obtained for each method. There is no significant difference between the RMSE means of the two methods in the case of Δ_AV,start (p-value = 0.85), while there is a strong difference between the means of the two methods in the case of Δ_AV,end (p-value < 0.01). These results tend to show that the proposed method performs as well as the previous one for the estimation of the time delay between audio and visual laughter at the beginning (Δ_AV,start) and performs better for the estimation of the delay at the end (Δ_AV,end).

VII. CONCLUSIONS AND FUTURE WORK

In this paper, we have proposed a synchronization method based on the generation of visual transcriptions intended to be used as input to an HMM-based visual laughter synthesizer. Compared to the previous method, two improvements were introduced. Firstly, the phonemic transcriptions, which may not always be available, are no longer needed as input, as was the case in the previous method; the proposed method only uses an audio laughter file as input. Secondly, mean RMSE calculations between generated and original visual transcriptions have been conducted for both the previous and the proposed methods. The results showed that the proposed method performs better than the previous one for the prediction of the delay at the end of the laughter and as well as it for the prediction of the delay at the beginning.

Future work includes a deeper evaluation of how the accuracy improvement impacts the perception of the rendered animations. The study of the extrapolation and use of the proposed method to synthesize visual animation from audio laughter files belonging to several different persons is also an interesting line to follow. Further accuracy improvements may possibly be achieved by adding more supra-segmental features such as the number of vowels and fricatives in the file. This would require the development of a robust automatic laughter phoneme recognition system.

ACKNOWLEDGMENT

H. Çakmak receives a Ph.D. grant from the Fonds de la Recherche pour l'Industrie et l'Agriculture (F.R.I.A.), Belgium.

REFERENCES

[1] J. Urbain, H. Çakmak, and T. Dutoit, "Evaluation of HMM-based laughter synthesis," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 2013.
[2] J. Urbain, H. Çakmak, and T. Dutoit, "Automatic phonetic transcription of laughter and its application to laughter synthesis," in Proceedings of the 5th biannual Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), Geneva, Switzerland, 2-5 September 2013.
[3] J. Urbain, H. Çakmak, A. Charlier, M. Denti, T. Dutoit, and S. Dupont, "Arousal-driven synthesis of laughter," IEEE Journal of Selected Topics in Signal Processing, vol. 8.
[4] G. Hofer, J. Yamagishi, and H. Shimodaira, "Speech-driven lip motion generation with a trajectory HMM."
[5] S. Sako, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "HMM-based text-to-audio-visual speech synthesis," in ICSLP.
[6] L. Wang, Y. Wu, X. Zhuang, and F. Soong, "Synthesizing visual speech trajectory with minimum generation error," in Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on. IEEE, 2011.
[7] G. Hofer and K. Richmond, "Comparison of HMM and TD methods for lip synchronisation."
[8] O. Govokhina, G. Bailly, G. Breton, et al., "Learning optimal audiovisual phasing for an HMM-based control model for facial animation," in 6th ISCA Workshop on Speech Synthesis (SSW6).
[9] G. Bailly, O. Govokhina, F. Elisei, and G. Breton, "Lip-synching using speaker-specific articulation, shape and appearance models," EURASIP Journal on Audio, Speech, and Music Processing, vol. 2009.
[10] H. Çakmak, J. Urbain, J. Tilmanne, and T. Dutoit, "Evaluation of HMM-based visual laughter synthesis," in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, 2014.
[11] D. Schabus, M. Pucher, and G. Hofer, "Joint audiovisual hidden semi-Markov model-based speech synthesis," IEEE Journal of Selected Topics in Signal Processing.
[12] T. Masuko, T. Kobayashi, M. Tamura, J. Masubuchi, and K. Tokuda, "Text-to-visual speech synthesis based on parameter generation from HMM," in Acoustics, Speech and Signal Processing, Proceedings of the 1998 IEEE International Conference on, vol. 6, 1998.
[13] M. Tamura, T. Masuko, T. Kobayashi, and K. Tokuda, "Visual speech synthesis based on parameter generation from HMM: Speech-driven and text-and-speech-driven approaches," in AVSP'98, Int. Conf. on Auditory-Visual Speech Processing.
[14] H. Çakmak, J. Urbain, and T. Dutoit, "HMM-based synthesis of laughter facial expression," Journal on Multimodal User Interfaces (JMUI), 2015 [Submitted].
[15] H. Çakmak, J. Urbain, and T. Dutoit, "Synchronization rules for HMM-based audio-visual laughter synthesis," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
[16] G. McLachlan and D. Peel, Finite Mixture Models.
[17] H. Çakmak, J. Urbain, and T. Dutoit, "The AV-LASYN database: A synchronous corpus of audio and 3D facial marker data for audio-visual laughter synthesis," in Proc. of the 9th Int. Conf. on Language Resources and Evaluation (LREC'14), 2014.
[18] J. Urbain and T. Dutoit, "A phonetic analysis of natural laughter, for use in automatic laughter processing systems," in ACII 2011, 2011.
[19] W. Ruch and P. Ekman, "The expressive pattern of laughter," in Emotion, Qualia, and Consciousness.
[20] H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. Black, and K. Tokuda, "The HMM-based speech synthesis system (HTS) version 2.0," in Proc. of the Sixth ISCA Workshop on Speech Synthesis.
[21] W. Ruch and P. Ekman, "The expressive pattern of laughter," in Emotion, Qualia and Consciousness, A. Kaszniak, Ed. World Scientific Publishers.
[22] A. W. Bowman and A. Azzalini, Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations. OUP Oxford.
[23] N. L. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate Distributions, vol. 2, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Wiley & Sons.
[24] T. Giannakopoulos, "A method for silence removal and segmentation of speech signals, implemented in Matlab," University of Athens, Athens.
[25] I. Stylianou, "Harmonic plus noise models for speech, combined with statistical methods, for speech and speaker modification," Ph.D. thesis, Ecole Nationale Supérieure des Télécommunications.
[26] A. B. Kain, "High resolution voice transformation," Ph.D. thesis, Oregon Health & Science University.
[27] T. Hueber, E.-L. Benaroya, B. Denby, and G. Chollet, "Statistical mapping between articulatory and acoustic data for an ultrasound-based silent speech interface," in INTERSPEECH, 2011.


More information

Bertsokantari: a TTS based singing synthesis system

Bertsokantari: a TTS based singing synthesis system INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Bertsokantari: a TTS based singing synthesis system Eder del Blanco 1, Inma Hernaez 1, Eva Navas 1, Xabier Sarasola 1, Daniel Erro 1,2 1 AHOLAB

More information

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /ISCAS.2005.

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /ISCAS.2005. Wang, D., Canagarajah, CN., & Bull, DR. (2005). S frame design for multiple description video coding. In IEEE International Symposium on Circuits and Systems (ISCAS) Kobe, Japan (Vol. 3, pp. 19 - ). Institute

More information

Type-2 Fuzzy Logic Sensor Fusion for Fire Detection Robots

Type-2 Fuzzy Logic Sensor Fusion for Fire Detection Robots Proceedings of the 2 nd International Conference of Control, Dynamic Systems, and Robotics Ottawa, Ontario, Canada, May 7 8, 2015 Paper No. 187 Type-2 Fuzzy Logic Sensor Fusion for Fire Detection Robots

More information

VOCALISTENER: A SINGING-TO-SINGING SYNTHESIS SYSTEM BASED ON ITERATIVE PARAMETER ESTIMATION

VOCALISTENER: A SINGING-TO-SINGING SYNTHESIS SYSTEM BASED ON ITERATIVE PARAMETER ESTIMATION VOCALISTENER: A SINGING-TO-SINGING SYNTHESIS SYSTEM BASED ON ITERATIVE PARAMETER ESTIMATION Tomoyasu Nakano Masataka Goto National Institute of Advanced Industrial Science and Technology (AIST), Japan

More information

MUSIC TONALITY FEATURES FOR SPEECH/MUSIC DISCRIMINATION. Gregory Sell and Pascal Clark

MUSIC TONALITY FEATURES FOR SPEECH/MUSIC DISCRIMINATION. Gregory Sell and Pascal Clark 214 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) MUSIC TONALITY FEATURES FOR SPEECH/MUSIC DISCRIMINATION Gregory Sell and Pascal Clark Human Language Technology Center

More information

HT300 PLUS. SIM2 Multimedia is certified

HT300 PLUS. SIM2 Multimedia is certified Headquarters: SIM2 MULTIMEDIA S.p.A. Viale Lino Zanussi, 11 33170 Pordenone Italy Tel. +39.0434.383256 Telefax +39.0434.383260 Eail: info@si2.it Web site: www.si2.co USA: SIM2 SELECO USA INC. 10108 USA

More information

Hidden Markov Model based dance recognition

Hidden Markov Model based dance recognition Hidden Markov Model based dance recognition Dragutin Hrenek, Nenad Mikša, Robert Perica, Pavle Prentašić and Boris Trubić University of Zagreb, Faculty of Electrical Engineering and Computing Unska 3,

More information

Multimodal Analysis of laughter for an Interactive System

Multimodal Analysis of laughter for an Interactive System Multimodal Analysis of laughter for an Interactive System Jérôme Urbain 1, Radoslaw Niewiadomski 2, Maurizio Mancini 3, Harry Griffin 4, Hüseyin Çakmak 1, Laurent Ach 5, Gualtiero Volpe 3 1 Université

More information

Musical Entrainment Subsumes Bodily Gestures Its Definition Needs a Spatiotemporal Dimension

Musical Entrainment Subsumes Bodily Gestures Its Definition Needs a Spatiotemporal Dimension Musical Entrainment Subsumes Bodily Gestures Its Definition Needs a Spatiotemporal Dimension MARC LEMAN Ghent University, IPEM Department of Musicology ABSTRACT: In his paper What is entrainment? Definition

More information

Machine Learning Term Project Write-up Creating Models of Performers of Chopin Mazurkas

Machine Learning Term Project Write-up Creating Models of Performers of Chopin Mazurkas Machine Learning Term Project Write-up Creating Models of Performers of Chopin Mazurkas Marcello Herreshoff In collaboration with Craig Sapp (craig@ccrma.stanford.edu) 1 Motivation We want to generative

More information

WAKE-UP-WORD SPOTTING FOR MOBILE SYSTEMS. A. Zehetner, M. Hagmüller, and F. Pernkopf

WAKE-UP-WORD SPOTTING FOR MOBILE SYSTEMS. A. Zehetner, M. Hagmüller, and F. Pernkopf WAKE-UP-WORD SPOTTING FOR MOBILE SYSTEMS A. Zehetner, M. Hagmüller, and F. Pernkopf Graz University of Technology Signal Processing and Speech Communication Laboratory, Austria ABSTRACT Wake-up-word (WUW)

More information

Classification of Timbre Similarity

Classification of Timbre Similarity Classification of Timbre Similarity Corey Kereliuk McGill University March 15, 2007 1 / 16 1 Definition of Timbre What Timbre is Not What Timbre is A 2-dimensional Timbre Space 2 3 Considerations Common

More information

Detecting Musical Key with Supervised Learning

Detecting Musical Key with Supervised Learning Detecting Musical Key with Supervised Learning Robert Mahieu Department of Electrical Engineering Stanford University rmahieu@stanford.edu Abstract This paper proposes and tests performance of two different

More information

WHAT MAKES FOR A HIT POP SONG? WHAT MAKES FOR A POP SONG?

WHAT MAKES FOR A HIT POP SONG? WHAT MAKES FOR A POP SONG? WHAT MAKES FOR A HIT POP SONG? WHAT MAKES FOR A POP SONG? NICHOLAS BORG AND GEORGE HOKKANEN Abstract. The possibility of a hit song prediction algorithm is both academically interesting and industry motivated.

More information