Improving Frame Based Automatic Laughter Detection


Mary Knox
EE225D Class Project
December 13, 2007

Abstract

Laughter recognition is an underexplored area of research. My goal for this project was to improve upon my previous work to automatically detect laughter on a frame-by-frame basis. My previous system (baseline system) detected laughter based on short-term features including MFCCs, pitch, and energy. In this project, I have explored the utility of additional features (phone and prosodic) both by themselves and in combination with the baseline system. I improved the baseline system by 0.1% absolute and achieved an equal error rate (EER) of 7.9% for laughter detection on the ICSI Meetings database.

1 Introduction

Audio communication contains a wealth of information in addition to spoken words. Specifically, laughter provides cues regarding the emotional state of the speaker [1], topic changes in the conversation [2], and the speaker's identity. Accurate laughter detection could be useful in a variety of applications. A laughter detector incorporated with a digital camera could be used to identify an opportune time to take a picture [3]. Laughter could be useful in a video search of humorous clips [4]. In speech recognition, identifying laughter could decrease word error rate by identifying non-speech sounds [2]. The overall goal of this study is to use laughter for speaker recognition, as my intuition is that many individuals have their own distinct laugh. To be able to explore the utility of laughter segments for speaker recognition, however, we first need to build a robust system to detect laughter, which is the focus of this paper.

Previous work has studied the acoustics of laughter [5, 6, 7]. Many agree that laughter has a breathy consonant-vowel structure [5, 8]. Some have made generalizations about laughter, such as Provine, who concluded that laughter is usually a series of short syllables repeated approximately every 210 ms [7]. Yet, others have found laughter to be highly variable [8] and thus difficult to stereotype [6]. These conclusions lead me to believe that automatic laughter detection is not a simple task.

The most relevant previous work on automatic laughter detection has been that of Kennedy and Ellis [2], Truong and van Leeuwen [1], and Knox and Mirghafori [9] (my previous work).

However, the experimental setups and objectives of these works differ from this study, with the exception of my previous work.

Kennedy and Ellis [2] studied the detection of overlapped (multiple speaker) laughter in the Meetings domain. They split the data into non-overlapping one second segments, which were then classified based on whether or not multiple speakers laughed. They used support vector machines (SVMs) trained on four features: MFCCs, delta MFCCs, modulation spectrum, and spatial cues. They achieved a true positive rate of 87%.

Truong and van Leeuwen [1] classified presegmented ICSI Meetings data as laughter or speech. The segments were determined prior to training and testing their system and had variable time durations. The average durations of the laughter and speech segments were 2.21 and 2.02 seconds, respectively. They used Gaussian mixture models trained with perceptual linear prediction (PLP) features, pitch and energy, pitch and voicing, and modulation spectrum. They built models for each of the feature sets. The model trained with PLP features performed the best at 13.4% EER (each segment, which had varying duration, was equally weighted in the EER computation) for a data set similar to the one used in this study.

In my previous work [9], I did laughter detection at the frame level. A neural network was trained on each of the following features: MFCCs, energy (RMS), fundamental frequency (F0), and the largest cross correlation value used to compute the fundamental frequency (AC PEAK). The score level combination of MFCC and AC PEAK features had the lowest EER at 8.0% (each frame, or time unit, was equally weighted in the EER computation) for the same dataset used in this study.

Current state-of-the-art speech recognition systems also detect laughter. However, the threshold used is such that most occurrences of laughter are not identified. For example, SRI's speech recognizer was run on the same data used in this study and achieved a false acceptance rate of 0.1% and a false rejection rate of 78.2% for laughter. Since laughter is a somewhat rare event, occurring only 6.3% of the time in the dataset used in this study, it is important to identify as many laughter segments as possible in order to have data to use in a speaker recognition system. That being the case, the SRI speech recognizer would not be very useful for this task. Furthermore, the systems which used 1 second (or greater) segments or presegmented data would not be able to precisely detect laughter segments. Thus, for this project I expanded upon my previous system, which performed laughter detection for each frame, by including phone and prosodic features.

The outline for this report is as follows: in Section 2 I describe the data used in this study, in Section 3 I further describe my previous work, in Section 4 I explain the current system and results, in Section 5 I discuss the results, and in Section 6 I provide my conclusions and ideas for future work.

2 Data

I trained and tested the detector on the ICSI Meeting Recorder Corpus [10], a hand-transcribed corpus of multi-party meeting recordings, in which each of the speakers was recorded on a close-talking microphone (which is the data used in this study) as well as distant microphones. The full text was transcribed in addition to non-lexical events (including coughs, lip smacks, mic noise, and, most importantly, laughter).

There were a total of 75 meetings in this corpus. In order to compare my results to the work done by Kennedy and Ellis [2] and Truong and van Leeuwen [1], I used the same training and testing sets, which were from the Bmr subset of the corpus. This subset contains 29 meetings. The first 26 were used for training and the last 3 were used to test the detector.

I trained and tested only on data which was hand transcribed to be either laughter or non-laughter. Laughter-colored speech, that is, cases in which the hand-transcribed documentation had both speech and laughter listed under a single start and end time, was disregarded since I would not specifically know which time interval(s) contained laughter. Also, if the transcription did not include information for a period of time for a channel, that audio was excluded. This exclusion reduced training and testing on cross-talk and allowed me to train and test on channels only when they were in use. Ideally, an automatic silence detector would be employed in this step instead of relying on the transcripts. As a note, unlike Truong and van Leeuwen, I included audio that contained non-lexical vocalized sounds other than laughter.

Figure 1 shows the histogram of the laughter durations. The average laugh duration was seconds with a standard deviation of seconds.

Figure 1: Histogram of laugh duration for the Bmr subset of the ICSI Meeting Recorder Corpus.

3 Previous Work

3.1 Previous System

My previous system trained a neural network with MFCC, pitch, and energy features to detect whether a given frame contained laughter.

Features

Mel Frequency Cepstral Coefficients (MFCCs): MFCCs were used to capture the spectral features of laughter and non-laughter. The first order regression coefficients of the MFCCs (delta MFCCs) and the second order regression coefficients (delta-delta MFCCs) were also computed and used as features for the neural network. I used the first 12 MFCCs as well as the 0th coefficient, which were computed over a 25 ms window with a 10 ms forward shift, as features for the neural network. MFCC features were extracted using the Hidden Markov Model Toolkit (HTK) [11]. For each frame I computed 13 MFCCs, 13 delta MFCCs, and 13 delta-delta MFCCs.
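
As a rough illustration of how this 39-dimensional frame-level representation can be obtained, the sketch below computes 13 MFCCs plus delta and delta-delta coefficients over 25 ms windows with a 10 ms shift. The report extracted these features with HTK; the use of librosa, the sampling rate, and the file path here are illustrative assumptions.

```python
# Minimal sketch (not the original HTK pipeline): 13 MFCCs plus delta and
# delta-delta coefficients over 25 ms windows with a 10 ms shift, giving a
# 39-dimensional vector per frame. librosa and the file path are assumptions.
import numpy as np
import librosa

def mfcc_39(wav_path, sr=16000):
    y, sr = librosa.load(wav_path, sr=sr)
    win = int(0.025 * sr)   # 25 ms analysis window
    hop = int(0.010 * sr)   # 10 ms forward shift
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=win, win_length=win, hop_length=hop)
    d1 = librosa.feature.delta(mfcc, order=1)  # first order regression coeffs
    d2 = librosa.feature.delta(mfcc, order=2)  # second order regression coeffs
    return np.vstack([mfcc, d1, d2]).T         # shape: (num_frames, 39)
```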

Pitch and energy: Studies in the acoustics of laughter [5, 6] and in automatic laughter detection [1] investigated the pitch and energy of laughter as potentially important features. I used the ESPS pitch tracker get_f0 [12] to extract the fundamental frequency (F0), local root mean squared energy (RMS), and the highest normalized cross correlation value found to determine F0 (AC PEAK) for each frame. The delta and delta-delta coefficients were computed for each of these features as well.

Learning Method

I did frame-wise laughter detection. Since the frames were short in duration (10 ms) and each laughter segment was on average seconds in this data set, I decided it would be best to use a large context window of features as inputs in the neural network. A neural network with one hidden layer was trained using QuickNet [13]. The input to the neural network was a window of feature frames, where the center frame was the target frame. I used the softmax activation function to compute the probability that the frame was laughter.

To prevent over-fitting, the data used to train the neural network was split into two groups: training (the first 21 Bmr meetings) and cross validation (the last 5 meetings from the original training set). The neural network weights were updated based on the training data via the back-propagation algorithm, and then the cross validation data was scored after every training epoch, resulting in the cross validation frame accuracy (CVFA). Training was concluded once the CVFA increased by less than 0.5% for a second time.

3.2 Previous experiments and results

Parameter settings

I first needed to determine the input window size and the number of hidden units in the neural network. Empirically, I found that a context window of 75 consecutive frames (0.75 seconds) worked well. To make the classification of laughter based on the middle frame, I set the offset to 37 frames. In other words, the inputs to the neural network were the features from the frame to be classified and the 37 frames before and after this frame. Figure 2 shows the windowing technique.

I also had to determine the number of hidden units. MFCCs were the most valuable features for Kennedy and Ellis [2] and I suspected my system would have similar results. Thus, I used the MFCCs as the input features and modified the number of hidden units while keeping all other parameters the same. Based on the accuracy on the cross validation set, I saw that 200 hidden units performed best. Similarly, I varied the number of hidden units using F0 features. The CVFA was approximately the same for a range of hidden units, but the system with 200 was marginally better than the rest.
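
The frame windowing and the single-hidden-layer network can be sketched as follows. The report trained the network with QuickNet; the sklearn classifier, the edge padding at the ends of a recording, and the random placeholder data are assumptions made for illustration.

```python
# Sketch of the 75-frame context window (target frame plus 37 frames on each
# side) feeding a network with one hidden layer of 200 units. sklearn is an
# illustrative stand-in for QuickNet; the data below is a random placeholder.
import numpy as np
from sklearn.neural_network import MLPClassifier

def stack_context(feats, radius=37):
    """Stack each frame with the radius frames before and after it."""
    n, _ = feats.shape
    padded = np.pad(feats, ((radius, radius), (0, 0)), mode='edge')
    return np.stack([padded[i:i + 2 * radius + 1].ravel() for i in range(n)])

feats = np.random.randn(500, 39)          # placeholder per-frame features
labels = np.random.randint(0, 2, 500)     # placeholder laughter labels (0/1)
X = stack_context(feats)                  # shape: (500, 75 * 39)

net = MLPClassifier(hidden_layer_sizes=(200,), early_stopping=True)
net.fit(X, labels)
laugh_prob = net.predict_proba(X)[:, 1]   # per-frame laughter probability
```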

Figure 2: For each frame being evaluated, features from a window of 75 frames are input to the neural network.

Systems

The neural networks were first separately trained on the four classes of features: MFCCs, F0, RMS, and AC PEAK. The EERs for each of the classes are shown in Table 1. Each column lists the EER for a neural network trained with the feature itself, the deltas, the delta-deltas, and the feature level combination of the feature, delta, and delta-delta (the All system).

I combined the All systems on the score level to improve my results using another neural network, this time using a smaller window size and fewer hidden units. Since each of the inputs was the probability of laughter for each frame, I shortened the input window size of the combiner neural network from 75 frames to 9 and reduced the number of hidden units to 2. Since the system using MFCC features had the lowest EER, I combined MFCCs with each of the other classes of features. Table 2 shows that after combining, the MFCC+AC PEAK system (baseline system) performed the best. I then combined MFCC, AC PEAK, and RMS features. Finally, I combined all of the systems and computed the EER.

Table 1: Equal Error Rates (%).

                    MFCCs    F0    RMS    AC PEAK
    Feature
    Delta
    Delta-Delta
    All

Table 2: Equal Error Rates for Combined Systems (%).

                              EER
    MFCC+F0
    MFCC+RMS                  8.92
    MFCC+AC PEAK              7.97
    MFCC+AC PEAK+RMS          8.26
    MFCC+AC PEAK+RMS+F0
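
A minimal sketch of the score-level combination described above: the per-frame laughter probabilities from two single-feature systems are stacked over a 9-frame window and fed to a small combiner network with 2 hidden units. Again, sklearn and the random placeholder scores are assumptions, not the original QuickNet setup.

```python
# Score-level combination sketch: window the per-frame probabilities of two
# systems (9 frames) and train a tiny combiner network with 2 hidden units.
import numpy as np
from sklearn.neural_network import MLPClassifier

def window_scores(scores, radius=4):
    """Stack each frame's scores with those of the 4 frames on either side."""
    padded = np.pad(scores, ((radius, radius), (0, 0)), mode='edge')
    return np.stack([padded[i:i + 2 * radius + 1].ravel()
                     for i in range(len(scores))])

p_mfcc = np.random.rand(500, 1)           # placeholder MFCC-system scores
p_acpeak = np.random.rand(500, 1)         # placeholder AC PEAK-system scores
labels = np.random.randint(0, 2, 500)     # placeholder laughter labels

X = window_scores(np.hstack([p_mfcc, p_acpeak]))   # shape: (500, 9 * 2)
combiner = MLPClassifier(hidden_layer_sizes=(2,))
combiner.fit(X, labels)
combined_prob = combiner.predict_proba(X)[:, 1]
```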

4 Current Work

While the results from the previous work were good, I hoped to improve upon the baseline system by including more features.

4.1 Current System

Additional Features

Phones: Laughter has a repeated consonant-vowel structure [5, 8]. Thus, phone sequences seemed to be a good feature to use to identify laughter. I used the SRI phone recognizer, Decipher [14], in order to extract the phones; however, Decipher annotates nonstandard phones, including laughter. Although this was not the original information I intended to extract, it seemed plausible for Decipher's laughter detector to improve the baseline results.

Prosodic: My previous system only included short-term features. However, laughter is different from most speech sounds because it repeats approximately every 210 ms [7]. Since prosodic features are extracted over longer intervals of time, they may distinguish laughter from non-laughter. I used 21 prosodic features, which were statistics (including min, max, mean, standard deviation, etc.) on pitch, energy, long-term average spectrum, and noise-to-harmonic ratio.

Learning Methods

For the phone features, I trained a neural network with a 46 dimensional feature vector (1 dimension for each possible phone) for each frame. Each feature vector had only 1 nonzero value, which corresponded to the phone for that frame. I again used features over a context window as the input to the neural network and trained the neural network using the same training and cross validation sets as before in order to prevent over-fitting.

Prosodic features were extracted for each segment as documented in the hand-transcriptions. Figure 3 is an example hand-transcription which shows each segment marked with a start and end time. Segmenting based on the transcript guaranteed that non-laughter segments were separated from laughter segments. In the future, it would be better to have an automatic segmenter. However, since this is my first experimentation with prosodic features, I wanted to easily decipher which features perform well for the task of laughter detection. Since the prosodic features were computed for the entire segment, I used an SVM to build a model to detect laughter.

Figure 3: Example hand-transcription.
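
The 46-dimensional one-hot phone encoding described above can be sketched as follows; the toy phone inventory and the frame-level phone labels are assumed inputs (the report obtained frame-level phones from SRI's Decipher recognizer).

```python
# One-hot phone encoding sketch: one dimension per phone in the inventory,
# with a single nonzero entry per frame. The 3-phone inventory here is a toy
# example; the actual system used 46 phones from the Decipher recognizer.
import numpy as np

def one_hot_phones(frame_phones, phone_inventory):
    """Map each frame's phone label to a one-hot vector."""
    index = {p: i for i, p in enumerate(phone_inventory)}
    X = np.zeros((len(frame_phones), len(phone_inventory)))
    for t, phone in enumerate(frame_phones):
        X[t, index[phone]] = 1.0
    return X

print(one_hot_phones(['ah', 'hh', 'ah', 'laugh'], ['ah', 'hh', 'laugh']))
```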

4.2 Current experiments and results

Parameter settings

Many of the parameters used in the neural network trained on phone features were chosen to be compatible with the previous system. For example, I again used a window size of 75 frames. This was done in order to compare score level combinations of the phone and baseline systems with feature level combinations, which will be computed in the future. Based on the CVFA score on the cross validation set, I chose the number of hidden units to be 9.

For the prosodic features, I set the cost-factor (the amount by which training errors on positive examples outweigh training errors on negative examples) to 10 when building the SVM models. This was done because there were approximately 10 non-laughter segments for every laughter segment.

System Results

I first trained systems on phone features and prosodic features alone. The neural network trained on phone features achieved an EER of 18.45%, as shown in Table 4. Using the training set, SVM models were trained using all of the statistics for each class of prosodic features: pitch, energy, long-term average spectrum (LTAS), and the noise-to-harmonic ratio (N2H). The EER was computed using the cross validation set (each segment, which had varying duration, was equally weighted in the EER computation). As shown in Table 3, the energy features performed the best. I then combined the energy features with each of the other classes of features. The system trained with energy, noise-to-harmonic ratio, and pitch features was the best prosodic system. I then computed the EER where each frame was weighted equally; it was 50%. This result is shown in Table 4.

Table 3: Equal Error Rates for Prosodic Features (%) (each segment was equally weighted).

                              EER
    PITCH
    ENERGY
    LTAS
    N2H
    ENERGY+PITCH
    ENERGY+LTAS
    ENERGY+N2H
    ENERGY+N2H+PITCH
    ENERGY+N2H+LTAS
    ENERGY+N2H+PITCH+LTAS

I then performed score level combinations of the phone, prosodic, and baseline systems using the same neural network parameters as the previous system's score level combinations. While the score level combination of the baseline system and the phone system improved to 7.9%, the combination of the baseline and prosodic systems degraded to 9.8% EER. When all three systems were combined, the EER was 8.75%.
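
For reference, a minimal sketch of the frame-weighted EER computation used for the results above is shown below; the segment-weighted variant simply scores one value per segment instead of one per frame. sklearn's ROC utilities and the random placeholder scores are assumptions, not the report's original scoring code.

```python
# Frame-weighted EER sketch: find the operating point where the false
# acceptance rate equals the false rejection rate over all frames.
import numpy as np
from sklearn.metrics import roc_curve

def frame_eer(labels, scores):
    fpr, tpr, _ = roc_curve(labels, scores)   # one point per threshold
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))     # threshold where FAR ~= FRR
    return (fpr[idx] + fnr[idx]) / 2.0

labels = np.random.randint(0, 2, 1000)        # placeholder per-frame labels
scores = np.random.rand(1000)                 # placeholder laughter scores
print('EER: %.3f' % frame_eer(labels, scores))
```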

Table 4: Equal Error Rates (%).

                                  EER
    PHONES                        18.45
    PROSODIC                      50
    PHONES+PROSODIC
    BASELINE+PHONES               7.87
    BASELINE+PROSODIC             9.80
    BASELINE+PHONES+PROSODIC      8.75

As shown in Table 4, the combination of the baseline and phone systems had the best result.

5 Discussion

From Table 4, it is clear that the prosodic system does not score as well when each frame is weighted equally. It may be beneficial to either change the training method for the prosodic system from the segment level to the frame level, perhaps using a neural network, or modify the method of combining the prosodic system with the other systems to make use of the prosodic system's results when each segment is weighted equally. Since the prosodic system did not score well on its own, it was not surprising that the resulting systems were worse when it was combined with the baseline system and the baseline+phone system. However, when the prosodic system was combined with the phone system, the EER improved. Despite this improvement, the score was still worse than the baseline system.

Taking advantage of Decipher's phone recognizer proved beneficial. The EER for the combined baseline and phone system improved to 7.87%, which is currently the best EER for laughter detection at the frame level.

6 Conclusion and future work

In conclusion, I have improved upon my baseline results by including additional features. The phone system improved the baseline system by 0.1% absolute. Although prosodic features did not improve the baseline system, I think that there are many other prosodic features which will improve laughter detection.

In the future, I plan on scoring feature level combinations of the systems using a comparable number of hidden units. Also, many other prosodic features have already been extracted and may be beneficial for this study. After determining which prosodic features work well, I will use a moving window technique to extract the features so that this laughter detector can be run on audio that may not have hand-transcriptions. I also plan on experimenting with using a neural network to train the prosodic system instead of an SVM.
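
The moving-window extraction mentioned above could look roughly like the following sketch, where simple statistics of a frame-level pitch track are computed over a sliding window so that no hand-transcribed segment boundaries are needed. The window and hop sizes and the placeholder pitch track are illustrative assumptions.

```python
# Moving-window prosodic statistics sketch: per-window min, max, mean, and
# standard deviation of a frame-level track. Window/hop sizes are assumptions.
import numpy as np

def windowed_stats(track, win=100, hop=10):
    """Return per-window [min, max, mean, std] of a frame-level track."""
    out = []
    for start in range(0, len(track) - win + 1, hop):
        chunk = track[start:start + win]
        out.append([chunk.min(), chunk.max(), chunk.mean(), chunk.std()])
    return np.array(out)

pitch = np.abs(np.random.randn(1000)) * 100   # placeholder F0 track (frames)
print(windowed_stats(pitch).shape)            # (number of windows, 4)
```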

7 Acknowledgments

I would like to thank Christian Mueller for all of his help in extracting the prosodic features and Nikki Mirghafori for her guidance throughout this study.

References

[1] Truong, K.P. and van Leeuwen, D.A., "Automatic detection of laughter," in Proceedings of Interspeech, Lisbon, Portugal.

[2] Kennedy, L. and Ellis, D., "Laughter detection in meetings," NIST ICASSP 2004 Meeting Recognition Workshop, Montreal.

[3] A. Carter, "Automatic acoustic laughter detection," Master's Thesis, Keele University.

[4] Cai, R., Lu, L., Zhang, H.-J., and Cai, L.-H., "Highlight sound effects detection in audio stream," in Proc. International Conference on Multimedia and Expo, Baltimore.

[5] C. Bickley and S. Hunnicutt, "Acoustic analysis of laughter," in Proc. ICSLP, Banff, Canada.

[6] Bachorowski, J., Smoski, M., and Owren, M., "The acoustic features of human laughter," Journal of the Acoustical Society of America.

[7] R. Provine, "Laughter," American Scientist, January-February.

[8] J. Trouvain, "Segmenting phonetic units in laughter," in Proc. ICPhS.

[9] M. Knox and N. Mirghafori, "Automatic laughter detection using neural networks," unpublished.

[10] Janin, A., Baron, D., Edwards, J., Ellis, D., Gelbart, D., Morgan, N., Peskin, B., Pfau, T., Shriberg, E., Stolcke, A., and Wooters, C., "The ICSI meeting corpus," ICASSP, Hong Kong, April.

[11] Hidden Markov Model Toolkit (HTK).

[12] Entropic Research Laboratory, Washington, D.C., "ESPS Version 5.0 Programs Manual," August.

[13] QuickNet.

[14] Cohen, M., Murveit, H., Bernstein, J., Price, P., and Weintraub, M., "The DECIPHER speech recognition system," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Albuquerque.
