A natural acoustic front-end for Interactive TV in the EU-Project DICIT


L. Marquardt a, P. Svaizer b, E. Mabande a, A. Brutti b, C. Zieger b, M. Omologo b, and W. Kellermann a

a Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Cauerstr. 7, 91058 Erlangen, Germany
b Fondazione Bruno Kessler - irst, Via Sommarive 18, 38100 Trento, Italy
E-mail addresses: {marquardt,mabande,wk}@lnt.de (a), {svaizer,brutti,zieger,omologo}@fbk.eu (b)

Abstract

Distant-talking Interfaces for Control of Interactive TV (DICIT) is a European Union-funded project whose main objective is to integrate distant-talking voice interaction as a complementary modality to the use of a remote control in interactive TV systems. Hands-free and seamless control enables natural user-system interaction and provides a suitable means to greatly ease information retrieval. In the given living-room scenario, the system recognizes commands spoken by multiple and possibly moving users, even in the presence of background noise and TV surround audio. This paper focuses on the multichannel acoustic front-end (MCAF) processing for acoustic scene interpretation, which is based on the combination of multichannel acoustic echo cancellation, blind source separation, beamforming, acoustic event classification, and multiple-speaker localization. The fully functional DICIT prototype consists of the MCAF, automatic speech recognition, natural language understanding, mixed-initiative dialogue, and a satellite connection.

1. Introduction

The goal of DICIT [1] is to provide a user-friendly multimodal interface that allows voice-based access to a virtual smart assistant for interacting with TV-related digital devices and infotainment services, such as digital TV, Hi-Fi audio devices, etc., in a typical living room.
Multiple and possibly moving users can use their voice to control the TV, e.g., requesting information about an upcoming program and scheduling its recording, without the need for any hand-held or head-mounted gear. This scenario requires real-time-capable acoustic signal processing techniques which compensate for the impairment of the desired speech signals by acoustic echoes from the loudspeakers, local interferers, ambient noise, and reverberation. Accordingly, one of the key components for the prototypes developed within the DICIT project is the combination of state-of-the-art multichannel acoustic echo cancellation (MC-AEC), beamforming (BF), blind source separation (BSS), smart speech filtering (SSF) based on acoustic event detection and classification, and multiple source localization (SLOC) techniques. (This work was partially supported by the European Commission within the DICIT project under contract number FP6 IST-034624.)

The subsequent sections of this paper are structured as follows: In Sect. 2 we describe the general architecture of the overall DICIT system. The acoustic front-end, as a crucial building block of the DICIT system, is presented in Sect. 3: We first describe the currently fully integrated front-end, which is based on MC-AEC, BF, SLOC, and SSF (see also the video at [1]). An alternative approach under development, featuring BSS, MC-AEC, and SSF, is presented next. Conclusions and an outlook on next steps and further possible improvements are given in Sect. 4.

2. The DICIT System

In the following, we first describe the architecture of the overall DICIT system, outline the functionality of its most important components, and briefly describe the hardware used.

2.1. System architecture

The main building blocks of the DICIT system are the signal acquisition and playback hardware, the acoustic front-end processing, the automatic speech recognition (ASR) and natural language understanding (NLU) unit, and the actual dialogue manager (DM), as depicted in Fig. 1.

Figure 1. DICIT architecture

The first block comprises the hardware for signal acquisition and reproduction, as detailed in the upper part of Fig. 2. Its main components are the 13-channel microphone array and a multichannel loudspeaker system, capturing the acoustic signals from the environment and playing back the digitally mixed outputs of the TV and the dialogue system in stereo format, respectively. Note that the TV system comprises a remote control device as well as a set-top box (STB) platform providing access to on-air satellite signals.

The acoustic front-end processing, which will be described in detail in Sect. 3, extracts the desired speech from the microphone signals and passes it to the subsequent ASR. Given the state of the art in robust speech recognition, it is still crucial for the targeted environment to remove, to the greatest possible extent, any signal impairments due to reverberation, background noise, interferers, and acoustic feedback from the loudspeakers to the microphones, and to forward only those signal segments to the ASR that can reliably be classified as user speech. Continuous speech recognition technology in DICIT is based on IBM Embedded ViaVoice (EVV) [2]. Acoustic models have been trained to optimize the recognition performance for the distant-talking voice characteristics as well as the typically noisy and reverberant conditions of the addressed scenario. The ASR output is interpreted by an NLU unit which employs a statistical model called the Multi-level Action Classifier [2]. This processing chain has been optimized for the English, German, and Italian languages to enable multilinguality as an additional feature of the system. The DM finally manages all interactions with user input and system output and interfaces to external data and devices. Depending on the NLU output or remote control input, the DM is primarily responsible for information retrieval from the electronic program guide (EPG) and for controlling the TV/STB system. Feedback to the user is given acoustically via speech generation and visually via the screen.

2.2. Hardware setup

Apart from the microphone array and the loudspeakers, the hardware setup consists of AD/DA converters, preamplifiers, the STB, and two PCs; the use of two PCs was dictated by the need for two different operating systems. The first, Linux-based PC is equipped with a multichannel digital soundcard and hosts the acoustic front-end processing modules.
The second, Windows-based PC hosts the ASR, the NLU, and the DM; communication between the two PCs is established via a standard TCP-based internet protocol. The video signal from the STB is displayed via an LCD screen or video projector.

3. Acoustic front-end

The acoustic front-end supports different combinations of signal processing components. Its configuration depends primarily on computational constraints and the requirements of the specific scenario. The following subsections describe two practically relevant architectures, both featuring MC-AEC but differing with respect to the employed spatial processing and source localization techniques. While the first configuration is part of the current DICIT prototype and uses beamforming and traditional correlation-based source localization, the BSS-based architecture, which aims at extended functionality and reduces the number of microphones, is currently being integrated.

3.1. BF-/SLOC-based front-end

The DICIT prototype is based on an acoustic front-end which efficiently combines stereo acoustic echo cancellation (SAEC), BF, SLOC, and SSF. The front-end and its connection to the signal acquisition and playback stage are depicted in Fig. 2.

Figure 2. Acoustic front-end (based on BF and SLOC)

We first consider the structure of the entire BF-/SLOC-based front-end before describing its individual components in more detail. While BF extracts the speech signal originating from the desired look direction with minimum distortion and suppresses unwanted noise and interference [3], AEC compensates for the acoustic coupling between loudspeakers and sensors [4]. Since the scenario implies an almost unconstrained and possibly time-varying user position, a correspondingly adaptive BF structure was employed. Its combination with the SAEC structure was guided by the principles laid out in [5]: Since applying SAEC to all 13 microphone signals is computationally too expensive, the SAEC was placed behind the BF structure.
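The computational motivation for this ordering can be illustrated with a back-of-the-envelope operation count. The sketch below assumes time-domain multiply-accumulate costs and an illustrative echo-canceller length; only the microphone count, the stereo reference count, and the 512-tap beamformer length come from the paper.

```python
# Rough multiply-accumulate count per output sample for the two orderings.
# AEC_TAPS is an illustrative assumption, not a value from the paper.
N_MICS = 13        # microphones in the DICIT array
N_REF = 2          # stereo reference channels for the SAEC
AEC_TAPS = 4096    # echo-canceller length per reference channel (assumed)
BF_TAPS = 512      # beamformer FIR length per microphone (Sect. 3.1)

# AEC on every microphone, then beamforming:
aec_first = N_MICS * N_REF * AEC_TAPS + N_MICS * BF_TAPS
# Beamforming first, then a single SAEC on the beamformer output:
bf_first = N_MICS * BF_TAPS + N_REF * AEC_TAPS

print(aec_first, bf_first, round(aec_first / bf_first, 1))
```

Under these assumptions, running the SAEC behind the beamformer is several times cheaper than cancelling echoes on every microphone channel, which is why only the spatial filtering is replicated per beam.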
A set of five data-independent beamformers is computed in parallel; they cover the possible speaker positions, and moving users are tracked by switching between beams. Thereby, the AECs do not need to track time-varying beamformers. Instead of placing one SAEC behind each beamformer output, only one SAEC is computed, for the beam covering the source of interest. Assuming that beam-switches occur infrequently, the necessary readaptation of the SAEC filter coefficients is acceptable. The reuse of AEC filter coefficients determined for previously selected beamformers further reduces the impact of occasional beam-switches. The selection of the beamformer output to be passed to the SAEC is made by the source localization. As the SLOC needs to use microphone signals which still contain acoustic echoes of the TV audio signals, a-priori knowledge of the loudspeaker positions has to be exploited to exclude the TV loudspeakers as sources of interest. Finally, the SSF module analyzes the output of the SAEC in order to detect speech segments from the user. For a robust system it is crucial that only the desired speech segments, and no nonstationary noise or echo residuals, are passed to the ASR; the corresponding decision is supported by the SLOC information. As an example, Fig. 3 shows the effect of the front-end processing for a recording containing five control utterances ("ok", "set volume to seven", "CNN", "set volume to five", and "show me the EPG") spoken at a distance of 2.5 meters from the microphone array in broadside direction, in a room with a reverberation time of 300 msec, a background noise level of 36 dB SPL, and real TV audio output. The upper subplot shows a single microphone input, while the lower plot depicts the AEC output together with the correct segmentation by the SSF unit. The cancellation of the TV loudspeaker echoes is characterized by a mean echo return loss enhancement (ERLE) of 28 dB calculated over the last five seconds. (The delay between the microphone input and the AEC output is 2 msec.)

Figure 3. Acoustic front-end processing: microphone signal (top) and segmented signal after BF, AEC, and SSF (bottom), over time t [s]

The following paragraphs outline the algorithms that have been chosen and adapted for the described scenario.

Beamforming.
To account for the wideband nature of speech and ensure good spatial selectivity, a nested-array-based BF design was chosen [6], using 13 microphones to form four subarrays, one of which uses seven microphones and three of which use five microphones each, with spacings of 0.32 m, 0.16 m, 0.08 m, and 0.04 m, respectively. These subarrays operate in the frequency bands of 100-900 Hz, 900-1800 Hz, 1800-3600 Hz, and 3600-8000 Hz, respectively. In the acoustic front-end, the BF module consists of a filter-and-sum beamformer (FSB) and five steering units (SU). An FSB based on a Dolph-Chebyshev design (FSB-DC) [7] with FIR filters of length 512 taps was selected here for its good spatial selectivity and its robustness to sensor calibration errors. The steering units consist of sets of fractional delay filters [8] which steer the beam to the five predefined look directions. They are inserted after the FSB filtering of the individual channels. Thereby, the FSB filtering of the microphone signals is required only once for all beams, and only the delaying and summation of the microphone channels has to be carried out per beam.

Multi-channel Acoustic Echo Cancellation. The algorithm employed in the current acoustic front-end is based on the generalized frequency-domain adaptive filtering (GFDAF) paradigm [9]. Exploiting the computational efficiency of the FFT to minimize computational load, it also accounts for the cross-correlations among the different reproduction channels to accelerate the convergence of the filters and, consequently, achieves more efficient echo suppression. This is crucial in the given scenario, as user movements have to be expected, which in turn imply rapid changes of the impulse responses of the loudspeaker-enclosure-microphone (LEM) system that has to be identified by the adaptive filters.
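The full GFDAF update is beyond a short example, but the basic single-channel, overlap-save frequency-domain adaptive filter it generalizes can be sketched as follows. This is a hedged illustration only: one reference channel, no cross-channel terms, and illustrative parameter values.

```python
import numpy as np

def fdaf_aec(x, d, L=64, mu=0.5):
    """Single-channel overlap-save frequency-domain adaptive filter:
    a much-simplified sketch of the idea behind GFDAF [9] (one reference
    channel, no cross-channel terms). x: loudspeaker signal, d: microphone
    signal, L: adaptive filter length. Returns the residual-echo signal."""
    W = np.zeros(2 * L, dtype=complex)                 # weights, DFT domain
    err = np.zeros(len(d))
    for b in range(1, len(x) // L):
        X = np.fft.fft(x[(b - 1) * L : (b + 1) * L])   # two input blocks
        y = np.real(np.fft.ifft(X * W))[L:]            # valid (linear) part
        e = d[b * L : (b + 1) * L] - y                 # echo-cancelled block
        E = np.fft.fft(np.r_[np.zeros(L), e])
        G = np.real(np.fft.ifft(np.conj(X) * E / (np.abs(X) ** 2 + 1e-6)))
        G[L:] = 0.0                                    # gradient constraint
        W += mu * np.fft.fft(G)                        # normalized update
        err[b * L : (b + 1) * L] = e
    return err
```

Block processing via the FFT is what keeps the per-sample cost low; GFDAF additionally normalizes the update with the full cross-power spectral density matrix of the reproduction channels, which is what accelerates convergence for correlated stereo references.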
Since the stereo channels of the TV audio are usually very similar and therefore not only highly auto-correlated but also often strongly cross-correlated, a preceding channel decorrelation (see Fig. 2) allows a further acceleration of the filter convergence. While breaking up the inter-channel correlation, the introduced signal manipulations must not cause audible artifacts. For the discussed acoustic front-end, the phase modulation-based approach according to [10] has been implemented, which reconciles the requirements of low complexity and convergence support with the demand of not impairing subjective audio quality, especially the spatial image of the reproduced sound. Due to the combination of a single AEC with the switched beamformer described above, the AEC sees a different acoustic echo path after each beam-switch. To avoid readapting the AEC filters from non-matching coefficients, the filter coefficients that were identified during the previous use of the respective beam are used as a starting point for readaptation [5]. In the given scenario this proves to be very efficient, as underlined by Fig. 4, where the ERLE is compared for adaptation with (right) and without (left) coefficient buffering following a beam-switch of the DICIT beamformer at t=2s, given continuous TV audio output.

Figure 4. Effect of beam-switching without and with coefficient buffering (instantaneous ERLE [dB] over time t [s])

Source Localization. Acoustic maps, computed on a grid of points in an enclosure, express the plausibility of sound being generated at those points and hence represent a valid solution to the SLOC problem. In particular, the global coherence field (GCF) [11], also known as SRP-PHAT [3], combines the information obtained through a generalized cross-correlation phase transform (GCC-PHAT) [12] analysis at different microphone pairs. Given a GCF map, the SLOC problem can be addressed by picking the peaks appearing at the spatial points corresponding to active acoustic sources. In DICIT, the subarray consisting of seven microphones at 0.32 m spacing is used for the GCF computation, as it guarantees good performance at a reasonable computational cost. In order to avoid beam-switching during silence and to reduce the impact of false beam-switches due to faulty localization estimates, the SLOC module provides the BF with a new position estimate only if the map peak is above a given fixed threshold. In fact, the amplitude of the peak is correlated with the relevance of acoustic activity and can therefore act as an embedded acoustic activity detector. If the map peak is below the chosen threshold, the previous position is kept. Besides robustness, promptness is a crucial requirement for the module, so that the system can quickly steer the beam toward the speaker as soon as he/she starts speaking. A memoryless localization is therefore employed in combination with a post-processing step whose goal is to suppress outliers, i.e., isolated estimates located far away from the current speaker area. As the SLOC module operates on microphone signals still containing the TV echoes, estimating the position of the user requires suppression of the loudspeaker signals. In DICIT, the loudspeaker contributions are removed at the GCC-PHAT level by exploiting the knowledge of their positions relative to the microphone array.
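As an illustration, a minimal GCC-PHAT TDOA estimator for one microphone pair can be sketched as follows; a full GCF map would sum such whitened correlations, evaluated at the lags implied by each candidate grid point, over all microphone pairs. Parameter values are illustrative.

```python
import numpy as np

def gcc_phat_tdoa(x1, x2, fs, max_tau):
    """GCC-PHAT [12] for one microphone pair: whiten the cross-spectrum so
    only phase (i.e., delay) information remains, then pick the peak lag.
    Returns the delay of x2 relative to x1 in seconds (positive: x2 lags)."""
    n = 2 * len(x1)
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cs = X2 * np.conj(X1)
    cc = np.fft.irfft(cs / (np.abs(cs) + 1e-12), n)   # phase transform
    max_shift = int(max_tau * fs)
    cc = np.r_[cc[-max_shift:], cc[: max_shift + 1]]  # lags -max..+max
    return (np.argmax(cc) - max_shift) / fs
```

Restricting the search to physically plausible lags (`max_tau`, set by the microphone spacing and the speed of sound) is also what makes the peak-amplitude threshold described above meaningful as an activity detector.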
The approach is derived from the multiple-source localization approach in [13], treating the single user plus the TV loudspeakers as multiple simultaneously active sources. Fig. 5 shows an example of a GCF map before (left) and after (right) the removal of the loudspeaker contributions (bright colors represent high values; the stereo loudspeakers and the DICIT array are schematically depicted on the right side of each plot). Only after the de-emphasis of the loudspeakers does the user position (indicated by the circle) correspond to the highest-activity region, as visible in the right plot. Experiments conducted on Wizard-of-Oz data collected in reverberant rooms [14] show that the SLOC module estimates the source position with an RMS error of 7.5 degrees.

Figure 5. GCF map before and after the removal of the loudspeaker contributions

Smart Speech Filtering. After the signal processing by MC-AEC, the sound produced by the TV has been almost completely cancelled from the beamformer output; user commands can therefore be detected on the basis of the dynamics of the resulting signal. Constraints are applied concerning the minimum duration of utterances and the maximum duration of pauses between words in order to isolate potentially relevant signal segments. Additionally, only signal segments exhibiting sufficient spatial coherence at the microphones, i.e., plausibly produced by a speaker in an area in front of and oriented towards the TV, are retained. Thus, speakers in other areas or not addressing DICIT can be ignored. SLOC information is exploited at this stage in order to take into account both the speaker's position and likely orientation [15].

3.2. BSS-based front-end

The BSS-based front-end described in the following represents an alternative to the front-end presented in the previous section and is currently being integrated into the overall system. Fig. 6 shows the corresponding block diagram.

Figure 6.
Acoustic front-end (based on BSS)

Since BSS can be interpreted as a set of adaptive null-beamformers, it replaces the functionality of the data-independent beamformers and the source localization of the first approach. One major advantage of the BSS-based front-end is the reduced number of microphones: the envisaged BSS-based front-end needs only two sensors, which is of great importance with respect to overall system complexity, user acceptance, and cost. A second benefit is that, in contrast to the prototype based on the front-end described in Sect. 3.1, which can currently extract only one active user, BSS using two sensor signals is also able to extract two simultaneously speaking users. In any case, two streams of data will be delivered to the following SSF module, carrying the following signals:

- If no user is active, two zero-valued signals arrive at the SSF component.
- If one user is active, his or her signal will appear in one SSF input and will be attenuated in the other SSF input.
- If two users are simultaneously active, each SSF input will be dominated by one user signal.

BSS can be combined with AEC in two different ways: the AEC can be performed directly on the microphone inputs, or it can be applied at a later stage, to the BSS outputs. Taking into account the considerations described in [5, 16], we concentrated on the AEC-first alternative, as shown in Fig. 6. The SLOC module depicted in Fig. 6 represents an additional source of information; it might supplement the BSS-inherent source localization and thus also help to improve the decisions to be made by the SSF. The SSF first processes the two input streams provided by BSS in order to detect speech segments and reject any non-speech event by means of an acoustic event classifier. Moreover, because the SSF here has to work on more than one input stream, two simultaneously active speakers will likely create two streams with valid speech segments. Therefore it must be decided which speech signal to pass to the ASR and which one to reject. This decision can be based on speaker identification.
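A toy version of this two-stream decision could look as follows. The energy threshold and the dominance rule are placeholder assumptions standing in for the actual acoustic event classification and speaker identification described below.

```python
import numpy as np

def select_stream(s1, s2, p_thresh=1e-3):
    """Decide which (if any) BSS output stream to forward to the ASR.
    Placeholder logic: short-term power stands in for the SSF's acoustic
    event classifier, and dominance for its speaker-identification step."""
    p1, p2 = float(np.mean(s1 ** 2)), float(np.mean(s2 ** 2))
    if p1 < p_thresh and p2 < p_thresh:
        return None                    # no user active: forward nothing
    if p1 >= p_thresh and p2 >= p_thresh:
        return 0 if p1 >= p2 else 1    # two users: keep the dominant stream
    return 0 if p1 >= p_thresh else 1  # one user: keep the active stream
```

The three branches mirror the three cases listed above (no user, one user, two simultaneous users); in the real SSF, the final branch would be resolved by speaker identification rather than raw signal power.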
The algorithms for MC-AEC and the related signal decorrelation are the same as in the preceding Sect. 3.1. The following paragraphs introduce the components which are only used within the BSS-based architecture.

Blind Source Separation. The extraction of up to two simultaneously active sources with two microphones corresponds to the overdetermined or the determined BSS case, respectively. Approaches based on independent component analysis (ICA) are well suited for both cases, requiring merely the assumption of statistical independence of the original source signals. Here, we consider a broadband BSS approach based on the TRINICON framework [17]. For the development of the BSS-based front-end, we implemented an efficient second-order-statistics (SOS) version of the TRINICON update rule [18]. While BSS recovers the original source signals from a (possibly reverberant) sound mixture without a-priori knowledge about the locations of the sources, the BSS demixing filters also contain information on the source locations. One way to retrieve this localization information has been presented in [19]: it relies on the ability of a broadband BSS algorithm to perform blind adaptive identification of the acoustic environment for two microphone channels. Thus, two time-differences-of-arrival (TDOAs) can be extracted by identifying the highest peaks in the BSS filters, which correspond to the direct paths.

Acoustic Event Classification and Speaker Identification for SSF. In the foreseen scenario, a classification step may be necessary to discriminate actual speech segments from other interfering events (phone ringing, sneezing, laughing, etc.). The foreseen acoustic event classification (AECL) is based on a set of mel-frequency cepstral coefficients (MFCCs) as acoustic signal features. A score is computed by comparing the observed feature vectors with Gaussian mixture models (GMMs), trained on examples of the considered acoustic events.
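The GMM scoring just described can be sketched with a diagonal-covariance mixture evaluated on a matrix of MFCC frames. The model parameters below are toy values; a real system would train them on labelled event data (e.g., via EM).

```python
import numpy as np

def avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of feature frames X (T x D) under a
    diagonal-covariance GMM with the given component parameters."""
    comp = []
    for w, m, v in zip(weights, means, variances):
        comp.append(np.log(w)
                    - 0.5 * np.sum(np.log(2.0 * np.pi * v))
                    - 0.5 * np.sum((X - m) ** 2 / v, axis=1))
    ll = np.stack(comp, axis=1)          # T x K per-component log terms
    mx = ll.max(axis=1)                  # log-sum-exp over the K components
    return float(np.mean(mx + np.log(np.exp(ll - mx[:, None]).sum(axis=1))))

def classify_event(X, models):
    """models: {event label: (weights, means, variances)} -> best label."""
    return max(models, key=lambda label: avg_loglik(X, *models[label]))
```

Averaging the log-likelihood over frames makes the score independent of segment length, so events of different durations can be compared on the same scale.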
The best match in terms of average likelihood provides the classification of the signal segment [20]. Moreover, when the classified event is speech, it may be necessary to determine the speaker identity as well. In this case a speaker identification (SID) capability must be introduced in the SSF, consisting of the two steps of feature extraction and score computation. The acoustic features are again MFCCs, while the scoring is accomplished by combining the results of two sub-systems implementing a GMM and a support vector machine (SVM), respectively [21]. In the GMM-based sub-system, speaker-dependent models are obtained through maximum a posteriori (MAP) adaptation of the mean vectors, starting from a universal background model (UBM) that represents the background speaker population. In the SVM-based sub-system, elements belonging to non-linearly separable classes are discriminated by a binary classifier operating with non-linear kernel functions. When more than one speaker is active, the performance of SID is strongly related to the amount of residual interfering speech that may be present at the BSS output. The effect of BSS on SID performance will be investigated further.

4. Conclusions and outlook

In this paper we presented the multichannel acoustic front-end of an already fully functional prototype for interactive TV, which has been developed within the EU-funded project DICIT. We also introduced an alternative architecture based on BSS, which extends the functionality of the BF-/SLOC-based approach to multi-user scenarios. The BF-/SLOC-based front-end supports one user whose movements can be tracked fast enough by the SLOC module, so that the combination of BF and AEC guarantees good signal quality for the desired speech. The SSF can thus pass the user commands to the subsequent ASR while rejecting undesired residual disturbances. The front-end architecture accounts for computational constraints with an efficient combination of a switched beamformer and AEC. As illustrated by the experimental results, AEC filter coefficient buffering has proven a simple but effective strategy to improve AEC performance in the case of beam-switches. During the following months, the prototype's performance will be evaluated by 18 test subjects. As a next step, and to overcome the limitation to single-user scenarios, BSS in conjunction with AEC will be used for both extraction and localization of multiple users. This also allows a drastic reduction of the number of necessary microphones. For the BSS-based approach, an adequate SSF module including speaker identification capabilities will be developed before a first comparative evaluation of both acoustic front-ends. In general, the short utterances that characterize the dialogue pose a persistent challenge: to further optimize the convergence speed of the adaptive filtering algorithms involved in AEC and BSS, and to find decision criteria for beam-switching, localization, and SSF which provide maximum reliability with a minimum amount of observation data.

References

[1] The DICIT project, http://dicit.itc.it
[2] J. Huang, M. Epstein, and M. Matassoni. Effective acoustic adaptation for a distant-talking interactive TV system. Proc. Interspeech 2008, Brisbane, Australia, September 2008.
[3] M. S. Brandstein and D. B. Ward, Eds. Microphone Arrays: Signal Processing Techniques and Applications. Springer, Berlin, 2001.
[4] E. Haensler and G. Schmidt. Acoustic Echo and Noise Control: A Practical Approach. Wiley, New York, 2004.
[5] W. Kellermann. Strategies for combining acoustic echo cancellation and adaptive beamforming microphone arrays. Proc. ICASSP 1997, Munich, Germany, April 1997.
[6] J. L. Flanagan, J. D. Johnson, R. Zahn, and G. W. Elko. Computer-steered microphone arrays for sound transduction in large rooms. JASA, 78(5), November 1985.
[7] W. Herbordt. Sound Capture for Human/Machine Interfaces. Springer, Berlin, 2005.
[8] T. I. Laakso, V. Välimäki, M. Karjalainen, and U. K. Laine. Splitting the unit delay - tools for fractional delay filter design. IEEE Signal Processing Magazine, 13(1):30-60, January 1996.
[9] H. Buchner, J. Benesty, and W. Kellermann. Generalized multichannel frequency-domain adaptive filtering: efficient realization and application to hands-free speech communication. Signal Processing, 85(3):549-570, March 2005.
[10] J. Herre, H. Buchner, and W. Kellermann. Acoustic echo cancellation for surround sound using perceptually motivated convergence enhancement. Proc. ICASSP 2007, Honolulu, Hawaii, April 2007.
[11] R. De Mori. Spoken Dialogue with Computers. Academic Press, London, 1998.
[12] C. Knapp and G. Carter. The generalized correlation method for estimation of time delay. IEEE Trans. ASSP, 24(4), 1976.
[13] A. Brutti, M. Omologo, and P. Svaizer. Localization of multiple speakers based on a two-step acoustic map analysis. Proc. ICASSP 2008, Las Vegas, USA, April 2008.
[14] A. Brutti, L. Cristoforetti, W. Kellermann, L. Marquardt, and M. Omologo. WOZ acoustic data collection for interactive TV. Proc. LREC 2008, Marrakech, Morocco, May 2008.
[15] A. Brutti, M. Omologo, and P. Svaizer. Speaker localization based on oriented global coherence field. Proc. Interspeech 2006, Pittsburgh, USA, September 2006.
[16] A. Lombard, K. Reindl, and W. Kellermann. Combination of adaptive feedback cancellation and binaural adaptive filtering in hearing aids. Accepted in EURASIP Journal on Advances in Signal Processing, 2009.
[17] H. Buchner, R. Aichner, and W. Kellermann. TRINICON: A versatile framework for multichannel blind signal processing. Proc. ICASSP 2004, Montreal, Canada, May 2004.
[18] H. Buchner, R. Aichner, and W. Kellermann. A generalization of blind source separation algorithms for convolutive mixtures based on second-order statistics. IEEE Trans. Speech and Audio Processing, 13(1):120-134, January 2005.
[19] H. Buchner, R. Aichner, J. Stenglein, H. Teutsch, and W. Kellermann. Simultaneous localization of multiple sound sources using blind adaptive MIMO filtering. Proc. ICASSP 2005, Philadelphia, USA, March 2005.
[20] C. Zieger and M. Omologo. Acoustic event classification using a distributed microphone network with a GMM/SVM combined algorithm. Proc. Interspeech 2008, Brisbane, Australia, September 2008.
[21] C. Zieger and M. Omologo. Combination of clean and contaminated GMM/SVM for far-field text-independent speaker verification. Proc. Interspeech 2008, Brisbane, Australia, September 2008.

WOZ Acoustic Data Collection For Interactive TV

WOZ Acoustic Data Collection For Interactive TV WOZ Acoustic Data Collection For Interactive TV A. Brutti*, L. Cristoforetti*, W. Kellermann+, L. Marquardt+, M. Omologo* * Fondazione Bruno Kessler (FBK) - irst Via Sommarive 18, 38050 Povo (TN), ITALY

More information

FP6 IST

FP6 IST FP6 IST-034624 http://dicit.itc.it Deliverable 2.4 Hardware and Software Architecture for the Final STB Prototype Lead Authors Rajesh Balchandran Martin Labsky Affiliation IBM Research Date: August 20,

More information

DESIGNING OPTIMIZED MICROPHONE BEAMFORMERS

DESIGNING OPTIMIZED MICROPHONE BEAMFORMERS 3235 Kifer Rd. Suite 100 Santa Clara, CA 95051 www.dspconcepts.com DESIGNING OPTIMIZED MICROPHONE BEAMFORMERS Our previous paper, Fundamentals of Voice UI, explained the algorithms and processes required

More information

TEPZZ A_T EP A1 (19) (11) EP A1 (12) EUROPEAN PATENT APPLICATION. (51) Int Cl.: H04S 7/00 ( ) H04R 25/00 (2006.

TEPZZ A_T EP A1 (19) (11) EP A1 (12) EUROPEAN PATENT APPLICATION. (51) Int Cl.: H04S 7/00 ( ) H04R 25/00 (2006. (19) TEPZZ 94 98 A_T (11) EP 2 942 982 A1 (12) EUROPEAN PATENT APPLICATION (43) Date of publication: 11.11. Bulletin /46 (1) Int Cl.: H04S 7/00 (06.01) H04R /00 (06.01) (21) Application number: 141838.7

More information

TEPZZ 94 98_A_T EP A1 (19) (11) EP A1 (12) EUROPEAN PATENT APPLICATION. (43) Date of publication: Bulletin 2015/46

TEPZZ 94 98_A_T EP A1 (19) (11) EP A1 (12) EUROPEAN PATENT APPLICATION. (43) Date of publication: Bulletin 2015/46 (19) TEPZZ 94 98_A_T (11) EP 2 942 981 A1 (12) EUROPEAN PATENT APPLICATION (43) Date of publication: 11.11.1 Bulletin 1/46 (1) Int Cl.: H04S 7/00 (06.01) H04R /00 (06.01) (21) Application number: 1418384.0

More information

Automatic Laughter Detection

Automatic Laughter Detection Automatic Laughter Detection Mary Knox Final Project (EECS 94) knoxm@eecs.berkeley.edu December 1, 006 1 Introduction Laughter is a powerful cue in communication. It communicates to listeners the emotional

More information

FP6 IST

FP6 IST FP6 IST-034624 http://dicit.itc.it Deliverable 2.1 DICIT Architecture Tools, Standards, Hardware and Software for the First Prototypes Authors: Gregg Daggett Affiliations: IBM Date: 5-Oct-2007 Document

More information
