A natural acoustic front-end for Interactive TV in the EU-Project DICIT
L. Marquardt a, P. Svaizer b, E. Mabande a, A. Brutti b, C. Zieger b, M. Omologo b, and W. Kellermann a

a Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Cauerstr. 7, 91058 Erlangen, Germany
b Fondazione Bruno Kessler - irst, Via Sommarive 18, 38100 Trento, Italy
E-mail addresses: {marquardt,mabande,wk}@lnt.de a, {svaizer,brutti,zieger,omologo}@fbk.eu b

Abstract

Distant-talking Interfaces for Control of Interactive TV (DICIT) is a European Union-funded project whose main objective is to integrate distant-talking voice interaction as a complementary modality to the use of a remote control in interactive TV systems. Hands-free and seamless control enables natural user-system interaction and greatly eases information retrieval. In the given living-room scenario the system recognizes commands spoken by multiple, possibly moving users, even in the presence of background noise and TV surround audio. This paper focuses on the multichannel acoustic front-end (MCAF) processing for acoustic scene interpretation, which is based on the combination of multichannel acoustic echo cancellation, blind source separation, beamforming, acoustic event classification, and multiple speaker localization. The fully functional DICIT prototype consists of the MCAF, automatic speech recognition, natural language understanding, mixed-initiative dialogue, and a satellite connection.

1. Introduction

The goal of DICIT [1] is to provide a user-friendly multimodal interface that allows voice-based access to a virtual smart assistant for interacting with TV-related digital devices and infotainment services, such as digital TV, Hi-Fi audio devices, etc., in a typical living room.
Multiple and possibly moving users can use their voice to control the TV, e.g., requesting information about an upcoming program and scheduling its recording, without the need for any hand-held or head-mounted gear. This scenario requires real-time-capable acoustic signal processing techniques which compensate for the impairment of the desired speech signals by acoustic echoes from the loudspeakers, local interferers, ambient noise, and reverberation. Accordingly, one of the key components of the prototypes developed within the DICIT project is the combination of state-of-the-art multichannel acoustic echo cancellation (MC-AEC), beamforming (BF), blind source separation (BSS), smart speech filtering (SSF) based on acoustic event detection and classification, and multiple source localization (SLOC) techniques. (This work was partially supported by the European Commission within the DICIT project under contract number FP6 IST-034624.)

The subsequent sections of this paper are structured as follows: In Sect. 2 we describe the general architecture of the overall DICIT system. The acoustic front-end, as a crucial building block of the DICIT system, is presented in Sect. 3: We first describe the currently fully integrated front-end, which is based on MC-AEC, BF, SLOC and SSF (see also the video at [1]). An alternative approach under development, featuring BSS, MC-AEC and SSF, is presented next. Conclusions and an outlook on next steps and further possible improvements are given in Sect. 4.

2. The DICIT System

In the following, we first describe the architecture of the overall DICIT system, outline the functionality of its most important components, and briefly describe the hardware used.

2.1. System architecture

The main building blocks of the DICIT system are the signal acquisition and playback hardware, the acoustic front-end processing, the automatic speech recognition (ASR) and natural language understanding (NLU) unit, and the actual dialogue manager (DM), as depicted in Fig. 1.

Figure 1.
DICIT architecture.

The first block comprises the hardware for signal acquisition and reproduction, as detailed in the upper part of Fig. 2. Its main components are the 13-channel microphone array, capturing the acoustic signals from the environment, and a multichannel loudspeaker system, playing back the digitally mixed outputs of the TV and the dialogue system in stereo format. Note that the TV system comprises a remote control device as well as a set-top box (STB) platform providing access to on-air satellite signals.
The acoustic front-end processing, which is described in detail in Sect. 3, extracts the desired speech from the microphone signals and passes it to the subsequent ASR. Given the state of the art in robust speech recognition, it is still crucial for the targeted environment to remove, to the greatest possible extent, any signal impairments due to reverberation, background noise, interferers, and acoustic feedback from the loudspeakers to the microphones, and to forward only those signal segments to the ASR that can reliably be classified as user speech. Continuous speech recognition technology in DICIT is based on IBM Embedded ViaVoice (EVV) [2]. Acoustic models have been trained to optimize the recognition performance given the distant-talking voice characteristics as well as the typically noisy and reverberant conditions of the addressed scenario. The ASR output is interpreted by an NLU unit which employs a statistical model called the Multi-level Action Classifier [2]. This processing chain has been optimized for the English, German, and Italian languages to enable multilinguality as an additional feature of the system. The DM finally manages all interactions between user input and system output and interfaces to external data and devices. Depending on the NLU output or remote control input, the DM is primarily responsible for information retrieval from the electronic program guide (EPG) and for controlling the TV/STB system. Feedback to the user is given acoustically via speech generation and visually via the screen.

2.2. Hardware setup

Apart from the microphone array and the loudspeakers, the hardware setup consists of AD/DA converters, preamplifiers, the STB, and two PCs; the use of two PCs was dictated by the need for two different operating systems. The first, Linux-based PC is equipped with a multichannel digital soundcard and hosts the acoustic front-end processing modules.
The second, Windows-based PC hosts the ASR, the NLU, and the DM; communication between the two PCs is established via a TCP-based standard internet protocol. The video signal from the STB is displayed via an LCD screen or a video projector.

3. Acoustic front-end

The acoustic front-end foresees different combinations of signal processing components. Its configuration depends primarily on computational constraints and the requirements of the specific scenario. The following subsections describe two practically relevant architectures, both featuring MC-AEC but differing with respect to the employed spatial processing and source localization techniques. While the first configuration is part of the current DICIT prototype and uses beamforming and traditional correlation-based source localization, the BSS-based architecture, which aims at extended functionality and reduces the number of microphones, is currently being integrated.

3.1. BF-/SLOC-based front-end

The DICIT prototype is based on an acoustic front-end which efficiently combines stereo acoustic echo cancellation (SAEC), BF, SLOC, and SSF. The front-end and its connection to the signal acquisition and playback stage are depicted in Fig. 2.

Figure 2. Acoustic front-end (based on BF and SLOC).

We first consider the structure of the entire BF-/SLOC-based front-end before its individual components are described in more detail. While the BF extracts the speech signal originating from the desired look direction with minimum distortion and suppresses unwanted noise and interference [3], the AEC compensates for the acoustic coupling between loudspeakers and sensors [4]. Since the scenario implies an almost unconstrained and possibly time-varying user position, a correspondingly adaptive BF structure was employed. Its combination with the SAEC structure was guided by the principles laid out in [5]: Since applying SAEC to all 13 microphone signals is computationally too expensive, the SAEC was placed behind the BF structure.
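The computational argument for placing echo cancellation behind the beamformer can be illustrated with a toy sketch: a single adaptive filter operates on the one beamformed output rather than on all 13 microphone channels. For simplicity, a time-domain NLMS canceller with a mono reference stands in for the stereo frequency-domain AEC actually used in DICIT; all signals and parameters below are illustrative.

```python
import numpy as np

def nlms_aec(far_end, mic, filt_len=64, mu=0.5, eps=1e-8):
    """Cancel loudspeaker echo from a (beamformed) microphone signal.

    far_end : reference signal driving the loudspeaker
    mic     : beamformer output containing the acoustic echo
    Returns the error (echo-cancelled) signal and the filter estimate.
    """
    h = np.zeros(filt_len)                    # adaptive echo-path estimate
    err = np.zeros(len(mic))
    for n in range(filt_len, len(mic)):
        x = far_end[n - filt_len + 1:n + 1][::-1]   # newest sample first
        err[n] = mic[n] - h @ x                     # subtract predicted echo
        h += mu * err[n] * x / (x @ x + eps)        # normalized LMS update
    return err, h

# Toy demo: the mic picks up a delayed, attenuated copy of the TV signal.
rng = np.random.default_rng(0)
tv = rng.standard_normal(20000)
echo_path = np.zeros(16)
echo_path[5] = 0.7                            # 5-sample delay, gain 0.7
mic = np.convolve(tv, echo_path)[:len(tv)]
e, h = nlms_aec(tv, mic)
```

With 13 microphones, the AEC-first ordering would require 13 such filters running in parallel; behind the beamformer, one suffices.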
A set of five data-independent beamformers, covering the possible speaker positions, is computed in parallel; moving users are tracked by switching between beams. Thereby, the AECs do not need to track time-varying beamformers. Instead of placing one SAEC behind each beamformer output, only one SAEC is computed: the one for the beam covering the source of interest. Assuming that beam-switches occur infrequently, the necessary readaptation of the SAEC filter coefficients is
acceptable. The reuse of AEC filter coefficients determined for previously selected beamformers further reduces the impact of occasional beam-switches. The selection of the beamformer output to be passed to the SAEC is made by the source localization. As the SLOC has to use microphone signals which still contain acoustic echoes of the TV audio signals, a priori knowledge of the loudspeaker positions has to be exploited to exclude the TV loudspeakers as sources of interest. Finally, the SSF module analyzes the output of the SAEC in order to detect speech segments from the user. For a robust system it is crucial that only the desired speech segments, and no nonstationary noise or echo residuals, are passed to the ASR; the corresponding decision is supported by the SLOC information. As an example, Fig. 3 shows the effect of the front-end processing for a recording containing five control utterances ("ok", "set volume to seven", "CNN", "set volume to five", and "show me the EPG") by a speaker at a distance of 2.5 meters from the microphone array in broadside direction, in a room with a reverberation time of 300 msec, a background noise level of 36 dB SPL, and real TV audio output. The upper subplot shows a single microphone input, while the lower plot depicts the AEC output together with the correct segmentation by the SSF unit. The cancellation of TV loudspeaker echoes is characterized by a mean echo return loss enhancement (ERLE) of 28 dB calculated over the last five seconds. (The delay between microphone input and AEC output is 200 msec.)

Figure 3. Acoustic front-end processing (top: microphone signal; bottom: segmented signal after BF, AEC, and SSF; horizontal axis: t [s]).

The following paragraphs outline the algorithms that have been chosen and adapted for the described scenario.

Beamforming.
To account for the wideband nature of speech and to ensure good spatial selectivity, a nested-array-based BF design was chosen [6], using 13 microphones for four subarrays, one of which uses seven microphones and three of which use five microphones each, with spacings of 0.32 m, 0.16 m, 0.08 m and 0.04 m, respectively. These subarrays operate in the frequency bands of 100-900 Hz, 900-1800 Hz, 1800-3600 Hz, and 3600-8000 Hz, respectively. In the acoustic front-end, the BF module consists of a filter-and-sum beamformer (FSB) and five steering units (SU). An FSB based on a Dolph-Chebyshev design (FSB-DC) [7] with FIR filters of length 512 taps was selected here for its good spatial selectivity and its robustness to sensor calibration errors. The steering units consist of sets of fractional delay filters [8] which steer the beam to the five predefined look directions. They are inserted after the FSB filtering of the individual channels. Thereby, the FSB filtering of the microphone signals is required only once for all beams, and only the delaying and summation of the channels has to be carried out for each beam.

Multi-channel Acoustic Echo Cancellation.

The algorithm employed in the current acoustic front-end is based on the generalized frequency-domain adaptive filtering (GFDAF) paradigm [9]. Exploiting the computational efficiency of the FFT to minimize computational load, it also accounts for the cross-correlations among the different reproduction channels to accelerate the convergence of the filters and, consequently, achieves more efficient echo suppression. This is crucial in the given scenario, as user movements have to be expected, which in turn imply rapid changes of the impulse responses of the loudspeaker-enclosure-microphone (LEM) system that has to be identified by the adaptive filters.
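The steering units described in the beamforming paragraph above are built from fractional delay filters [8]. As a sketch only (the windowed-sinc design, filter length, and delay values below are illustrative choices, not the DICIT design), a fractional delay filter can be implemented as:

```python
import numpy as np

def frac_delay_filter(delay, length=33):
    """FIR approximation of a (possibly non-integer) sample delay:
    a shifted sinc tapered by a Hamming window, normalized to unity
    DC gain. 'delay' should lie near the filter centre."""
    n = np.arange(length)
    h = np.sinc(n - delay) * np.hamming(length)
    return h / h.sum()

# Steer one channel: delay a sinusoid by 3.5 samples.
sig = np.sin(2 * np.pi * 0.05 * np.arange(200))
h = frac_delay_filter(16 + 3.5)                  # filter centre (16) + 3.5
delayed = np.convolve(sig, h)[16:16 + len(sig)]  # compensate the centre delay
```

Applying one such filter per channel, with per-channel delays chosen for a look direction, and summing the results yields one steered beam.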
Since the stereo channels of the TV audio are usually very similar, and therefore not only highly auto-correlated but often also strongly cross-correlated, a preceding channel decorrelation (see Fig. 2) allows a further acceleration of the filter convergence. Apart from breaking up the inter-channel correlation, it is required that the introduced signal manipulations do not cause audible artifacts. For the discussed acoustic front-end, the phase modulation-based approach of [10] has been implemented, which reconciles the requirements of low complexity and convergence support with the demand of not impairing the subjective audio quality, especially the spatial image of the reproduced sound. Due to the combination of a single AEC with the switched beamformer described above, the AEC sees a different acoustic echo path after each beam-switch. To avoid having to readapt the AEC filters starting from non-matching coefficients, the filter coefficients identified during the previous use of the respective beam can be used as a starting point for readaptation [5]. In the given scenario this proves very efficient, as underlined by Fig. 4, where the ERLE is compared for adaptation with (right) and without (left) coefficient buffering following a beam-switch of the DICIT beamformer at t = 2 s, given continuous TV audio output.

Source Localization.

Acoustic maps, computed on a grid of points in an enclosure, express the plausibility of sound being generated at those points and hence represent a valid
solution to the SLOC problem.

Figure 4. Effect of beam-switching without and with coefficient buffering (instantaneous ERLE [dB] over t [s]).

In particular, the global coherence field (GCF) [11], also known as SRP-PHAT [3], combines the information obtained through a generalized cross-correlation phase transform (GCC-PHAT) [12] analysis at different microphone pairs. Given a GCF map, the SLOC problem can be addressed by picking the peaks appearing at the spatial points corresponding to active acoustic sources. In DICIT, the subarray consisting of seven microphones with 0.32 m spacing is used for the GCF computation, as it guarantees good performance at a reasonable computational cost. In order to avoid beam-switching during silence phases and to reduce the impact of false beam-switches due to faulty localization estimates, the SLOC module provides the BF with a new position estimate only if the map peak is above a given fixed threshold. In fact, the amplitude of the peak is correlated with the relevance of acoustic activity and can therefore act as an embedded acoustic activity detector. If the map peak is below the chosen threshold, the previous position is kept. Besides robustness, promptness is a crucial requirement for the module, so that the system can quickly steer the beam toward the speaker as soon as he/she starts speaking. A memoryless localization is therefore employed, in combination with a post-processing step whose goal is to suppress outliers, i.e., isolated estimates located far away from the current speaker area. As the SLOC module operates on the microphone signals still containing the TV echoes, estimating the position of the user requires suppression of the loudspeaker signals. In DICIT, the loudspeaker contributions are removed at the GCC-PHAT level by exploiting the knowledge of their positions relative to the microphone array.
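The GCC-PHAT analysis underlying the GCF map can be sketched for a single microphone pair as follows; a full GCF/SRP-PHAT implementation would evaluate such whitened correlations for many pairs and accumulate them over a spatial grid. The sampling rate, signals, and search range below are illustrative.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """GCC-PHAT between two microphone signals.

    Returns the estimated time difference of arrival (TDOA) in seconds
    and the whitened cross-correlation function around lag zero.
    """
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    cs = X1 * np.conj(X2)
    cc = np.fft.irfft(cs / (np.abs(cs) + 1e-12), n)  # phase transform
    max_shift = n // 2 if max_tau is None else int(fs * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / fs
    return tau, cc

# Toy demo: mic 2 receives the source 8 samples later than mic 1.
rng = np.random.default_rng(1)
s = rng.standard_normal(4096)
fs = 16000
x1 = s
x2 = np.concatenate((np.zeros(8), s[:-8]))
tau, _ = gcc_phat(x1, x2, fs, max_tau=0.001)
# tau is close to -8/fs (x1 leads x2)
```

The PHAT weighting discards magnitude information and keeps only phase, which is what makes the correlation peak sharp in reverberant conditions.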
The approach is derived from the multiple-source localization approach in [13], treating the single user plus the TV loudspeakers as multiple simultaneously active sources. Fig. 5 shows an example of a GCF map before (left) and after (right) the removal of the loudspeaker contributions (bright colors represent high values; the stereo loudspeakers and the DICIT array are schematically depicted on the right side of each plot). Only after the de-emphasis of the loudspeakers does the user position (indicated by the circle) correspond to the highest-activity region, as visible in the right plot. Experiments conducted on Wizard-of-Oz data collected in reverberant rooms [14] show that the SLOC module estimates the source position with an RMS error of 7.5 degrees.

Figure 5. GCF map before and after the removal of the loudspeaker contributions.

Smart Speech Filtering.

After the signal processing by MC-AEC, the sound produced by the TV has been almost completely cancelled from the beamformer output; user commands can therefore be detected on the basis of the dynamics of the resulting signal. Constraints are applied concerning the minimum duration of utterances and the maximum duration of pauses between words in order to isolate potentially relevant signal segments. Additionally, only signal segments exhibiting sufficient spatial coherence at the microphones, i.e., plausibly produced by a speaker in an area in front of and oriented towards the TV, are retained. Thus, speakers in other areas or not addressing DICIT can be ignored. SLOC information is exploited at this stage in order to take into account both the speaker's position and likely orientation [15].

3.2. BSS-based front-end

The BSS-based front-end described in the following represents an alternative to the front-end presented in the previous section and is currently being integrated into the overall system. Fig. 6 shows the corresponding block diagram.

Figure 6.
Acoustic front-end (based on BSS).

Since BSS can be interpreted as a set of adaptive null-beamformers, it replaces the functionality of the data-independent beamformers and the source localization of the first approach. One major advantage of the BSS-based
front-end is the reduction of the number of microphones: only two sensors are needed, which is expected to be of great importance with respect to overall system complexity, user acceptance, and cost. A second benefit is that, in contrast to the prototype based on the front-end described in Sect. 3.1, which can currently extract only one active user, BSS using two sensor signals is also able to extract two simultaneously speaking users. In any case, two streams of data are delivered to the following SSF module, carrying the following signals: if no user is active, two zero-valued signals arrive at the SSF component; if one user is active, his or her signal appears in one SSF input and is attenuated in the other; if two users are simultaneously active, each SSF input is dominated by one user signal. BSS can be combined with AEC in two different ways: the AEC can be performed directly on the microphone inputs, or it can be applied at a later stage, to the BSS outputs. Taking into account the considerations described in [5, 16], we concentrated on the AEC-first alternative, as shown in Fig. 6. The SLOC module depicted in Fig. 6 represents an additional source of information; it may supplement the BSS-inherent source localization and thus also help to improve the decisions to be made by the SSF. The SSF first processes the two input streams provided by BSS in order to detect speech segments and reject any non-speech event by means of an acoustic event classifier. Moreover, because the SSF here has to work on more than one input stream, two simultaneously active speakers are likely to create two streams with valid speech segments. Therefore it must be decided which speech signal to pass to the ASR and which one to reject. This decision can be based on speaker identification.
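The three stream conditions listed above suggest a simple per-frame routing rule. The sketch below uses a plain energy detector with an illustrative threshold and frame size; the actual SSF additionally applies duration constraints, event classification, and speaker identification.

```python
import numpy as np

def route_streams(stream_a, stream_b, frame=256, thresh=1e-3):
    """Classify each frame of the two BSS outputs.

    Returns one label per frame:
      'none' - no user active (both outputs near zero)
      'a'/'b' - a single user, dominant in one output
      'both' - two simultaneous users, one per output
    """
    labels = []
    for i in range(0, min(len(stream_a), len(stream_b)) - frame + 1, frame):
        ea = np.mean(stream_a[i:i + frame] ** 2)   # frame energies
        eb = np.mean(stream_b[i:i + frame] ** 2)
        act_a, act_b = ea > thresh, eb > thresh
        labels.append('both' if act_a and act_b
                      else 'a' if act_a else 'b' if act_b else 'none')
    return labels

# Toy demo: user A speaks alone in the first half, both users in the second.
rng = np.random.default_rng(2)
n = 2048
a = np.concatenate((0.5 * rng.standard_normal(n), 0.5 * rng.standard_normal(n)))
b = np.concatenate((np.zeros(n), 0.5 * rng.standard_normal(n)))
labels = route_streams(a, b)
```

Frames labelled 'both' are the case where the subsequent speaker identification has to pick which stream to forward to the ASR.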
The algorithms for MC-AEC and the related signal decorrelation are the same as in Sect. 3.1. The following paragraphs introduce the components which are used only within the BSS-based architecture.

Blind Source Separation.

The extraction of up to two simultaneously active sources with two microphones corresponds to the overdetermined or the determined BSS case, respectively. Approaches based on independent component analysis (ICA) are well suited for both cases, requiring merely the assumption of statistical independence of the original source signals. Here, we consider a broadband BSS approach based on the TRINICON framework [17]. For the development of the BSS-based front-end, we implemented an efficient second-order-statistics (SOS) version of the TRINICON update rule [18]. While BSS recovers the original source signals from a (possibly reverberant) sound mixture without a priori knowledge about the locations of the sources, the BSS demixing filters also contain information on the source locations. One way to retrieve this localization information has been presented in [19]. It relies on the ability of a broadband BSS algorithm to perform blind adaptive identification of the acoustic environment for two microphone channels. Thus, two time-differences-of-arrival (TDOAs) can be extracted by identifying the highest peaks in the BSS filters, corresponding to the direct paths.

Acoustic Event Classification and Speaker Identification for SSF.

In the foreseen scenario a classification step may be necessary to discriminate actual speech segments from other interfering events (phone ringing, sneezing, laughing, etc.). The foreseen acoustic event classification (AECL) is based on a set of mel-frequency cepstral coefficients (MFCCs) as acoustic signal features. A score is computed by comparing the observed feature vectors with Gaussian mixture models (GMMs) trained on examples of the considered acoustic events.
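The GMM scoring step can be sketched as follows; the two single-component "models" and the 2-D features below are illustrative stand-ins for trained MFCC-based event models.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of feature frames under a
    diagonal-covariance Gaussian mixture model."""
    frames = np.atleast_2d(frames)                       # (T, D)
    ll = np.empty((frames.shape[0], len(weights)))
    for k, (w, m, v) in enumerate(zip(weights, means, variances)):
        diff = frames - m
        ll[:, k] = (np.log(w)
                    - 0.5 * np.sum(np.log(2 * np.pi * v))
                    - 0.5 * np.sum(diff ** 2 / v, axis=1))
    # log-sum-exp over mixture components, then average over frames
    mx = ll.max(axis=1, keepdims=True)
    return float(np.mean(mx[:, 0] + np.log(np.sum(np.exp(ll - mx), axis=1))))

def classify(frames, models):
    """Pick the event model with the highest average likelihood."""
    return max(models, key=lambda name: gmm_loglik(frames, *models[name]))

# Two toy single-component "models" in a 2-D feature space.
models = {
    "speech": ([1.0], np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]])),
    "phone":  ([1.0], np.array([[4.0, 4.0]]), np.array([[1.0, 1.0]])),
}
rng = np.random.default_rng(3)
obs = rng.standard_normal((50, 2))   # frames drawn near the "speech" model
label = classify(obs, models)
```

Real AECL models would have many mixture components per event class and operate on MFCC vectors, but the scoring and argmax decision are the same.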
The best match in terms of average likelihood provides the classification of the signal segment [20]. Moreover, when the classified event is speech, it may be necessary to classify the speaker identity as well. In this case a speaker identification (SID) capability must be introduced into the SSF, consisting of the two steps of feature extraction and score computation. The acoustic features are again MFCCs, while the scoring is accomplished by combining the results of two sub-systems implementing a GMM and a support vector machine (SVM), respectively [21]. In the GMM-based sub-system, speaker-dependent models are obtained through maximum a posteriori (MAP) adaptation of the mean vectors, starting from a universal background model (UBM) that represents the background speaker population. In the SVM-based sub-system, elements belonging to non-linearly separable classes are discriminated on the basis of a binary classification operated by non-linear kernel functions. When more than one speaker is active, the performance of SID is strongly related to the amount of residual interfering speech that may be present at the BSS output. The effect of BSS on SID performance will be investigated further.

4. Conclusions and outlook

In this paper we presented the multichannel acoustic front-end of an already fully functional prototype for interactive TV, which has been developed within the EU-funded project DICIT. We also introduced an alternative architecture based on BSS, which extends the functionality of the BF-/SLOC-based approach to multi-user scenarios. The BF-/SLOC-based front-end supports one user whose movements can be tracked fast enough by the SLOC module, so that the combination of BF and AEC guarantees good signal quality of the desired speech. The SSF can thus pass the user commands to the subsequent ASR while rejecting undesired residual disturbances. The front-end architecture accounts for computational constraints with an efficient
combination of a switched beamformer and AEC. As illustrated by the experimental results, AEC filter coefficient buffering has proven a simple but effective strategy to improve AEC performance in the case of beam-switches. Over the following months, the prototype's performance will be evaluated by 18 test subjects. As a next step, and to overcome the limitation to single-user scenarios, BSS in conjunction with AEC will be used for both extraction and localization of multiple users. This also allows a drastic reduction of the number of necessary microphones. For the BSS-based approach, an adequate SSF module including speaker identification capabilities will be developed before a first comparative evaluation of both acoustic front-ends. In general, the short utterances that characterize the dialogue pose a persisting challenge: to further optimize the convergence speed of the adaptive filtering algorithms involved in AEC and BSS, and to find decision criteria for beam-switching, localization, and SSF which provide maximum reliability with a minimum amount of observation data.

References

[1] DICIT project website: http://dicit.itc.it
[2] J. Huang, M. Epstein, and M. Matassoni. Effective acoustic adaptation for a distant-talking interactive TV system. Proc. Interspeech 2008, Brisbane, Australia, September 2008.
[3] M. S. Brandstein and D. B. Ward (Eds.). Microphone Arrays: Signal Processing Techniques and Applications. Springer, Berlin, 2001.
[4] E. Haensler and G. Schmidt. Acoustic Echo and Noise Control: A Practical Approach. Wiley, New York, 2004.
[5] W. Kellermann. Strategies for combining acoustic echo cancellation and adaptive beamforming microphone arrays. Proc. ICASSP 1997, Munich, Germany, April 1997.
[6] J. L. Flanagan, J. D. Johnson, R. Zahn, and G. W. Elko. Computer-steered microphone arrays for sound transduction in large rooms. Journal of the Acoustical Society of America, 78(5), November 1985.
[7] W. Herbordt. Sound Capture for Human/Machine Interfaces. Springer, Berlin, 2005.
[8] T. I. Laakso, V. Välimäki, M. Karjalainen, and U. K. Laine. Splitting the unit delay - tools for fractional delay filter design. IEEE Signal Processing Magazine, 13(1):30-60, January 1996.
[9] H. Buchner, J. Benesty, and W. Kellermann. Generalized multichannel frequency-domain adaptive filtering: efficient realization and application to hands-free speech communication. Signal Processing, 85(3):549-570, March 2005.
[10] J. Herre, H. Buchner, and W. Kellermann. Acoustic echo cancellation for surround sound using perceptually motivated convergence enhancement. Proc. ICASSP 2007, Honolulu, Hawaii, April 2007.
[11] R. De Mori. Spoken Dialogue with Computers. Academic Press, London, 1998.
[12] C. Knapp and G. Carter. The generalized correlation method for estimation of time delay. IEEE Trans. Acoustics, Speech, and Signal Processing, 24(4), 1976.
[13] A. Brutti, M. Omologo, and P. Svaizer. Localization of multiple speakers based on a two step acoustic map analysis. Proc. ICASSP 2008, Las Vegas, USA, April 2008.
[14] A. Brutti, L. Cristoforetti, W. Kellermann, L. Marquardt, and M. Omologo. WOZ acoustic data collection for interactive TV. Proc. LREC 2008, Marrakech, Morocco, May 2008.
[15] A. Brutti, M. Omologo, and P. Svaizer. Speaker localization based on oriented global coherence field. Proc. Interspeech 2006, Pittsburgh, USA, September 2006.
[16] A. Lombard, K. Reindl, and W. Kellermann. Combination of adaptive feedback cancellation and binaural adaptive filtering in hearing aids. Accepted in EURASIP Journal on Advances in Signal Processing, 2009.
[17] H. Buchner, R. Aichner, and W. Kellermann. TRINICON: A versatile framework for multichannel blind signal processing. Proc. ICASSP 2004, Montreal, Canada, May 2004.
[18] H. Buchner, R. Aichner, and W. Kellermann. A generalization of blind source separation algorithms for convolutive mixtures based on second-order statistics. IEEE Trans. Speech and Audio Processing, 13(1):120-134, January 2005.
[19] H. Buchner, R. Aichner, J. Stenglein, H. Teutsch, and W. Kellermann. Simultaneous localization of multiple sound sources using blind adaptive MIMO filtering. Proc. ICASSP 2005, Philadelphia, USA, March 2005.
[20] C. Zieger and M. Omologo. Acoustic event classification using a distributed microphone network with a GMM/SVM combined algorithm. Proc. Interspeech 2008, Brisbane, Australia, September 2008.
[21] C. Zieger and M. Omologo. Combination of clean and contaminated GMM/SVM for far-field text-independent speaker verification. Proc. Interspeech 2008, Brisbane, Australia, September 2008.
More informationHow to Obtain a Good Stereo Sound Stage in Cars
Page 1 How to Obtain a Good Stereo Sound Stage in Cars Author: Lars-Johan Brännmark, Chief Scientist, Dirac Research First Published: November 2017 Latest Update: November 2017 Designing a sound system
More informationONE SENSOR MICROPHONE ARRAY APPLICATION IN SOURCE LOCALIZATION. Hsin-Chu, Taiwan
ICSV14 Cairns Australia 9-12 July, 2007 ONE SENSOR MICROPHONE ARRAY APPLICATION IN SOURCE LOCALIZATION Percy F. Wang 1 and Mingsian R. Bai 2 1 Southern Research Institute/University of Alabama at Birmingham
More informationAutomatic Laughter Detection
Automatic Laughter Detection Mary Knox 1803707 knoxm@eecs.berkeley.edu December 1, 006 Abstract We built a system to automatically detect laughter from acoustic features of audio. To implement the system,
More informationThe Distortion Magnifier
The Distortion Magnifier Bob Cordell January 13, 2008 Updated March 20, 2009 The Distortion magnifier described here provides ways of measuring very low levels of THD and IM distortions. These techniques
More informationDigital Signal Processing Detailed Course Outline
Digital Signal Processing Detailed Course Outline Lesson 1 - Overview Many digital signal processing algorithms emulate analog processes that have been around for decades. Other signal processes are only
More informationComparison Parameters and Speaker Similarity Coincidence Criteria:
Comparison Parameters and Speaker Similarity Coincidence Criteria: The Easy Voice system uses two interrelating parameters of comparison (first and second error types). False Rejection, FR is a probability
More informationSpeech Recognition and Signal Processing for Broadcast News Transcription
2.2.1 Speech Recognition and Signal Processing for Broadcast News Transcription Continued research and development of a broadcast news speech transcription system has been promoted. Universities and researchers
More informationADAPTIVE DIFFERENTIAL MICROPHONE ARRAYS USED AS A FRONT-END FOR AN AUTOMATIC SPEECH RECOGNITION SYSTEM
ADAPTIVE DIFFERENTIAL MICROPHONE ARRAYS USED AS A FRONT-END FOR AN AUTOMATIC SPEECH RECOGNITION SYSTEM Elmar Messner, Hannes Pessentheiner, Juan A. Morales-Cordovilla, Martin Hagmüller Signal Processing
More informationTechniques for Extending Real-Time Oscilloscope Bandwidth
Techniques for Extending Real-Time Oscilloscope Bandwidth Over the past decade, data communication rates have increased by a factor well over 10X. Data rates that were once 1Gb/sec and below are now routinely
More informationNews from Rohde&Schwarz Number 195 (2008/I)
BROADCASTING TV analyzers 45120-2 48 R&S ETL TV Analyzer The all-purpose instrument for all major digital and analog TV standards Transmitter production, installation, and service require measuring equipment
More informationECE 4220 Real Time Embedded Systems Final Project Spectrum Analyzer
ECE 4220 Real Time Embedded Systems Final Project Spectrum Analyzer by: Matt Mazzola 12222670 Abstract The design of a spectrum analyzer on an embedded device is presented. The device achieves minimum
More informationFigure 1: Feature Vector Sequence Generator block diagram.
1 Introduction Figure 1: Feature Vector Sequence Generator block diagram. We propose designing a simple isolated word speech recognition system in Verilog. Our design is naturally divided into two modules.
More informationinter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering August 2000, Nice, FRANCE
Copyright SFA - InterNoise 2000 1 inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering 27-30 August 2000, Nice, FRANCE I-INCE Classification: 5.3 ACTIVE NOISE CONTROL
More informationFREE TV AUSTRALIA OPERATIONAL PRACTICE OP- 59 Measurement and Management of Loudness in Soundtracks for Television Broadcasting
Page 1 of 10 1. SCOPE This Operational Practice is recommended by Free TV Australia and refers to the measurement of audio loudness as distinct from audio level. It sets out guidelines for measuring and
More informationAcoustic Measurements Using Common Computer Accessories: Do Try This at Home. Dale H. Litwhiler, Terrance D. Lovell
Abstract Acoustic Measurements Using Common Computer Accessories: Do Try This at Home Dale H. Litwhiler, Terrance D. Lovell Penn State Berks-LehighValley College This paper presents some simple techniques
More informationHEAD. HEAD VISOR (Code 7500ff) Overview. Features. System for online localization of sound sources in real time
HEAD Ebertstraße 30a 52134 Herzogenrath Tel.: +49 2407 577-0 Fax: +49 2407 577-99 email: info@head-acoustics.de Web: www.head-acoustics.de Data Datenblatt Sheet HEAD VISOR (Code 7500ff) System for online
More information19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007
19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 AN HMM BASED INVESTIGATION OF DIFFERENCES BETWEEN MUSICAL INSTRUMENTS OF THE SAME TYPE PACS: 43.75.-z Eichner, Matthias; Wolff, Matthias;
More informationSupervision of Analogue Signal Paths in Legacy Media Migration Processes using Digital Signal Processing
Welcome Supervision of Analogue Signal Paths in Legacy Media Migration Processes using Digital Signal Processing Jörg Houpert Cube-Tec International Oslo, Norway 4th May, 2010 Joint Technical Symposium
More informationDesign and Implementation of Partial Reconfigurable Fir Filter Using Distributed Arithmetic Architecture
Design and Implementation of Partial Reconfigurable Fir Filter Using Distributed Arithmetic Architecture Vinaykumar Bagali 1, Deepika S Karishankari 2 1 Asst Prof, Electrical and Electronics Dept, BLDEA
More informationJournal of Theoretical and Applied Information Technology 20 th July Vol. 65 No JATIT & LLS. All rights reserved.
MODELING AND REAL-TIME DSK C6713 IMPLEMENTATION OF NORMALIZED LEAST MEAN SQUARE (NLMS) ADAPTIVE ALGORITHM FOR ACOUSTIC NOISE CANCELLATION (ANC) IN VOICE COMMUNICATIONS 1 AZEDDINE WAHBI, 2 AHMED ROUKHE,
More informationUNIVERSAL SPATIAL UP-SCALER WITH NONLINEAR EDGE ENHANCEMENT
UNIVERSAL SPATIAL UP-SCALER WITH NONLINEAR EDGE ENHANCEMENT Stefan Schiemenz, Christian Hentschel Brandenburg University of Technology, Cottbus, Germany ABSTRACT Spatial image resizing is an important
More informationOBJECT-AUDIO CAPTURE SYSTEM FOR SPORTS BROADCAST
OBJECT-AUDIO CAPTURE SYSTEM FOR SPORTS BROADCAST Dr.-Ing. Renato S. Pellegrini Dr.- Ing. Alexander Krüger Véronique Larcher Ph. D. ABSTRACT Sennheiser AMBEO, Switzerland Object-audio workflows for traditional
More informationIP Telephony and Some Factors that Influence Speech Quality
IP Telephony and Some Factors that Influence Speech Quality Hans W. Gierlich Vice President HEAD acoustics GmbH Introduction This paper examines speech quality and Internet protocol (IP) telephony. Voice
More informationInvestigation of Digital Signal Processing of High-speed DACs Signals for Settling Time Testing
Universal Journal of Electrical and Electronic Engineering 4(2): 67-72, 2016 DOI: 10.13189/ujeee.2016.040204 http://www.hrpub.org Investigation of Digital Signal Processing of High-speed DACs Signals for
More informationDetection and demodulation of non-cooperative burst signal Feng Yue 1, Wu Guangzhi 1, Tao Min 1
International Conference on Applied Science and Engineering Innovation (ASEI 2015) Detection and demodulation of non-cooperative burst signal Feng Yue 1, Wu Guangzhi 1, Tao Min 1 1 China Satellite Maritime
More informationMusic Source Separation
Music Source Separation Hao-Wei Tseng Electrical and Engineering System University of Michigan Ann Arbor, Michigan Email: blakesen@umich.edu Abstract In popular music, a cover version or cover song, or
More information2. AN INTROSPECTION OF THE MORPHING PROCESS
1. INTRODUCTION Voice morphing means the transition of one speech signal into another. Like image morphing, speech morphing aims to preserve the shared characteristics of the starting and final signals,
More informationInstrument Recognition in Polyphonic Mixtures Using Spectral Envelopes
Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes hello Jay Biernat Third author University of Rochester University of Rochester Affiliation3 words jbiernat@ur.rochester.edu author3@ismir.edu
More informationEnhancing Music Maps
Enhancing Music Maps Jakob Frank Vienna University of Technology, Vienna, Austria http://www.ifs.tuwien.ac.at/mir frank@ifs.tuwien.ac.at Abstract. Private as well as commercial music collections keep growing
More informationCHARACTERIZATION OF END-TO-END DELAYS IN HEAD-MOUNTED DISPLAY SYSTEMS
CHARACTERIZATION OF END-TO-END S IN HEAD-MOUNTED DISPLAY SYSTEMS Mark R. Mine University of North Carolina at Chapel Hill 3/23/93 1. 0 INTRODUCTION This technical report presents the results of measurements
More informationPOST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS
POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS Andrew N. Robertson, Mark D. Plumbley Centre for Digital Music
More informationPiotr KLECZKOWSKI, Magdalena PLEWA, Grzegorz PYDA
ARCHIVES OF ACOUSTICS 33, 4 (Supplement), 147 152 (2008) LOCALIZATION OF A SOUND SOURCE IN DOUBLE MS RECORDINGS Piotr KLECZKOWSKI, Magdalena PLEWA, Grzegorz PYDA AGH University od Science and Technology
More informationOBJECTIVE EVALUATION OF A MELODY EXTRACTOR FOR NORTH INDIAN CLASSICAL VOCAL PERFORMANCES
OBJECTIVE EVALUATION OF A MELODY EXTRACTOR FOR NORTH INDIAN CLASSICAL VOCAL PERFORMANCES Vishweshwara Rao and Preeti Rao Digital Audio Processing Lab, Electrical Engineering Department, IIT-Bombay, Powai,
More informationWhite Paper. Uniform Luminance Technology. What s inside? What is non-uniformity and noise in LCDs? Why is it a problem? How is it solved?
White Paper Uniform Luminance Technology What s inside? What is non-uniformity and noise in LCDs? Why is it a problem? How is it solved? Tom Kimpe Manager Technology & Innovation Group Barco Medical Imaging
More informationVoice Controlled Car System
Voice Controlled Car System 6.111 Project Proposal Ekin Karasan & Driss Hafdi November 3, 2016 1. Overview Voice controlled car systems have been very important in providing the ability to drivers to adjust
More informationinter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering August 2000, Nice, FRANCE
Copyright SFA - InterNoise 2000 1 inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering 27-30 August 2000, Nice, FRANCE I-INCE Classification: 7.9 THE FUTURE OF SOUND
More informationCalibrate, Characterize and Emulate Systems Using RFXpress in AWG Series
Calibrate, Characterize and Emulate Systems Using RFXpress in AWG Series Introduction System designers and device manufacturers so long have been using one set of instruments for creating digitally modulated
More informationOVERVIEW. YAMAHA Electronics Corp., USA 6660 Orangethorpe Avenue
OVERVIEW With decades of experience in home audio, pro audio and various sound technologies for the music industry, Yamaha s entry into audio systems for conferencing is an easy and natural evolution.
More informationJoint Optimization of Source-Channel Video Coding Using the H.264/AVC encoder and FEC Codes. Digital Signal and Image Processing Lab
Joint Optimization of Source-Channel Video Coding Using the H.264/AVC encoder and FEC Codes Digital Signal and Image Processing Lab Simone Milani Ph.D. student simone.milani@dei.unipd.it, Summer School
More informationSkip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video
Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video Mohamed Hassan, Taha Landolsi, Husameldin Mukhtar, and Tamer Shanableh College of Engineering American
More informationEfficient Computer-Aided Pitch Track and Note Estimation for Scientific Applications. Matthias Mauch Chris Cannam György Fazekas
Efficient Computer-Aided Pitch Track and Note Estimation for Scientific Applications Matthias Mauch Chris Cannam György Fazekas! 1 Matthias Mauch, Chris Cannam, George Fazekas Problem Intonation in Unaccompanied
More informationhit), and assume that longer incidental sounds (forest noise, water, wind noise) resemble a Gaussian noise distribution.
CS 229 FINAL PROJECT A SOUNDHOUND FOR THE SOUNDS OF HOUNDS WEAKLY SUPERVISED MODELING OF ANIMAL SOUNDS ROBERT COLCORD, ETHAN GELLER, MATTHEW HORTON Abstract: We propose a hybrid approach to generating
More informationWhite Paper Measuring and Optimizing Sound Systems: An introduction to JBL Smaart
White Paper Measuring and Optimizing Sound Systems: An introduction to JBL Smaart by Sam Berkow & Alexander Yuill-Thornton II JBL Smaart is a general purpose acoustic measurement and sound system optimization
More informationVoice & Music Pattern Extraction: A Review
Voice & Music Pattern Extraction: A Review 1 Pooja Gautam 1 and B S Kaushik 2 Electronics & Telecommunication Department RCET, Bhilai, Bhilai (C.G.) India pooja0309pari@gmail.com 2 Electrical & Instrumentation
More informationChapter 6: Real-Time Image Formation
Chapter 6: Real-Time Image Formation digital transmit beamformer DAC high voltage amplifier keyboard system control beamformer control T/R switch array body display B, M, Doppler image processing digital
More informationTorsional vibration analysis in ArtemiS SUITE 1
02/18 in ArtemiS SUITE 1 Introduction 1 Revolution speed information as a separate analog channel 1 Revolution speed information as a digital pulse channel 2 Proceeding and general notes 3 Application
More informationAUDIOVISUAL COMMUNICATION
AUDIOVISUAL COMMUNICATION Laboratory Session: Recommendation ITU-T H.261 Fernando Pereira The objective of this lab session about Recommendation ITU-T H.261 is to get the students familiar with many aspects
More informationAutomatic Identification of Instrument Type in Music Signal using Wavelet and MFCC
Automatic Identification of Instrument Type in Music Signal using Wavelet and MFCC Arijit Ghosal, Rudrasis Chakraborty, Bibhas Chandra Dhara +, and Sanjoy Kumar Saha! * CSE Dept., Institute of Technology
More informationDESIGN OF A MEASUREMENT PLATFORM FOR COMMUNICATIONS SYSTEMS
DESIGN OF A MEASUREMENT PLATFORM FOR COMMUNICATIONS SYSTEMS P. Th. Savvopoulos. PhD., A. Apostolopoulos, L. Dimitrov 3 Department of Electrical and Computer Engineering, University of Patras, 65 Patras,
More informationEXPLORING THE USE OF ENF FOR MULTIMEDIA SYNCHRONIZATION
EXPLORING THE USE OF ENF FOR MULTIMEDIA SYNCHRONIZATION Hui Su, Adi Hajj-Ahmad, Min Wu, and Douglas W. Oard {hsu, adiha, minwu, oard}@umd.edu University of Maryland, College Park ABSTRACT The electric
More informationPEP-II longitudinal feedback and the low groupdelay. Dmitry Teytelman
PEP-II longitudinal feedback and the low groupdelay woofer Dmitry Teytelman 1 Outline I. PEP-II longitudinal feedback and the woofer channel II. Low group-delay woofer topology III. Why do we need a separate
More informationGYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)
GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) (1) Stanford University (2) National Research and Simulation Center, Rafael Ltd. 0 MICROPHONE
More informationSummary of Speech Technology and Market Opportunities in the TV and Set-top Box Markets: hands-free remote control systems
Summary of Speech Technology and Market Opportunities in the TV and Set-top Box Markets: hands-free remote control systems DICIT Consortium 1 (IBM (Praha - Czech Republic, T.J Watson Research Center -
More informationAutomatic Rhythmic Notation from Single Voice Audio Sources
Automatic Rhythmic Notation from Single Voice Audio Sources Jack O Reilly, Shashwat Udit Introduction In this project we used machine learning technique to make estimations of rhythmic notation of a sung
More informationMUSI-6201 Computational Music Analysis
MUSI-6201 Computational Music Analysis Part 9.1: Genre Classification alexander lerch November 4, 2015 temporal analysis overview text book Chapter 8: Musical Genre, Similarity, and Mood (pp. 151 155)
More informationDISTRIBUTION STATEMENT A 7001Ö
Serial Number 09/678.881 Filing Date 4 October 2000 Inventor Robert C. Higgins NOTICE The above identified patent application is available for licensing. Requests for information should be addressed to:
More informationMPEG-7 AUDIO SPECTRUM BASIS AS A SIGNATURE OF VIOLIN SOUND
MPEG-7 AUDIO SPECTRUM BASIS AS A SIGNATURE OF VIOLIN SOUND Aleksander Kaminiarz, Ewa Łukasik Institute of Computing Science, Poznań University of Technology. Piotrowo 2, 60-965 Poznań, Poland e-mail: Ewa.Lukasik@cs.put.poznan.pl
More informationOPTICAL POWER METER WITH SMART DETECTOR HEAD
OPTICAL POWER METER WITH SMART DETECTOR HEAD Features Fast response (over 1000 readouts/s) Wavelengths: 440 to 900 nm for visible (VIS) and 800 to 1700 nm for infrared (IR) NIST traceable Built-in attenuator
More informationA New "Duration-Adapted TR" Waveform Capture Method Eliminates Severe Limitations
31 st Conference of the European Working Group on Acoustic Emission (EWGAE) Th.3.B.4 More Info at Open Access Database www.ndt.net/?id=17567 A New "Duration-Adapted TR" Waveform Capture Method Eliminates
More informationCURRICULUM VITAE John Usher
CURRICULUM VITAE John Usher John_Usher-AT-me.com Education: Ph.D. Audio upmixing signal processing and sound quality evaluation. 2006. McGill University, Montreal, Canada. Dean s Honours List Recommendation.
More informationAR SWORD Digital Receiver EXciter (DREX)
Typical Applications Applied Radar, Inc. Radar Pulse-Doppler processing General purpose waveform generation and collection Multi-channel digital beamforming Military applications SIGINT/ELINT MIMO and
More information6.UAP Project. FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System. Daryl Neubieser. May 12, 2016
6.UAP Project FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System Daryl Neubieser May 12, 2016 Abstract: This paper describes my implementation of a variable-speed accompaniment system that
More informationInternational Journal of Engineering Research-Online A Peer Reviewed International Journal
RESEARCH ARTICLE ISSN: 2321-7758 VLSI IMPLEMENTATION OF SERIES INTEGRATOR COMPOSITE FILTERS FOR SIGNAL PROCESSING MURALI KRISHNA BATHULA Research scholar, ECE Department, UCEK, JNTU Kakinada ABSTRACT The
More informationReconfigurable Neural Net Chip with 32K Connections
Reconfigurable Neural Net Chip with 32K Connections H.P. Graf, R. Janow, D. Henderson, and R. Lee AT&T Bell Laboratories, Room 4G320, Holmdel, NJ 07733 Abstract We describe a CMOS neural net chip with
More informationPulseCounter Neutron & Gamma Spectrometry Software Manual
PulseCounter Neutron & Gamma Spectrometry Software Manual MAXIMUS ENERGY CORPORATION Written by Dr. Max I. Fomitchev-Zamilov Web: maximus.energy TABLE OF CONTENTS 0. GENERAL INFORMATION 1. DEFAULT SCREEN
More informationStandard Definition. Commercial File Delivery. Technical Specifications
Standard Definition Commercial File Delivery Technical Specifications (NTSC) May 2015 This document provides technical specifications for those producing standard definition interstitial content (commercial
More informationMUSICAL INSTRUMENT RECOGNITION WITH WAVELET ENVELOPES
MUSICAL INSTRUMENT RECOGNITION WITH WAVELET ENVELOPES PACS: 43.60.Lq Hacihabiboglu, Huseyin 1,2 ; Canagarajah C. Nishan 2 1 Sonic Arts Research Centre (SARC) School of Computer Science Queen s University
More informationArea-Efficient Decimation Filter with 50/60 Hz Power-Line Noise Suppression for ΔΣ A/D Converters
SICE Journal of Control, Measurement, and System Integration, Vol. 10, No. 3, pp. 165 169, May 2017 Special Issue on SICE Annual Conference 2016 Area-Efficient Decimation Filter with 50/60 Hz Power-Line
More informationTHE importance of music content analysis for musical
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 1, JANUARY 2007 333 Drum Sound Recognition for Polyphonic Audio Signals by Adaptation and Matching of Spectrogram Templates With
More informationWhat s New in Raven May 2006 This document briefly summarizes the new features that have been added to Raven since the release of Raven
What s New in Raven 1.3 16 May 2006 This document briefly summarizes the new features that have been added to Raven since the release of Raven 1.2.1. Extensible multi-channel audio input device support
More informationHow to use the DC Live/Forensics Dynamic Spectral Subtraction (DSS ) Filter
How to use the DC Live/Forensics Dynamic Spectral Subtraction (DSS ) Filter Overview The new DSS feature in the DC Live/Forensics software is a unique and powerful tool capable of recovering speech from
More informationIntroduction. Edge Enhancement (SEE( Advantages of Scalable SEE) Lijun Yin. Scalable Enhancement and Optimization. Case Study:
Case Study: Scalable Edge Enhancement Introduction Edge enhancement is a post processing for displaying radiologic images on the monitor to achieve as good visual quality as the film printing does. Edges
More informationCM3106 Solutions. Do not turn this page over until instructed to do so by the Senior Invigilator.
CARDIFF UNIVERSITY EXAMINATION PAPER Academic Year: 2013/2014 Examination Period: Examination Paper Number: Examination Paper Title: Duration: Autumn CM3106 Solutions Multimedia 2 hours Do not turn this
More informationA. Ideal Ratio Mask If there is no RIR, the IRM for time frame t and frequency f can be expressed as [17]: ( IRM(t, f) =
1 Two-Stage Monaural Source Separation in Reverberant Room Environments using Deep Neural Networks Yang Sun, Student Member, IEEE, Wenwu Wang, Senior Member, IEEE, Jonathon Chambers, Fellow, IEEE, and
More informationMajor Differences Between the DT9847 Series Modules
DT9847 Series Dynamic Signal Analyzer for USB With Low THD and Wide Dynamic Range The DT9847 Series are high-accuracy, dynamic signal acquisition modules designed for sound and vibration applications.
More informationOn the Characterization of Distributed Virtual Environment Systems
On the Characterization of Distributed Virtual Environment Systems P. Morillo, J. M. Orduña, M. Fernández and J. Duato Departamento de Informática. Universidad de Valencia. SPAIN DISCA. Universidad Politécnica
More informationImplementation of a turbo codes test bed in the Simulink environment
University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2005 Implementation of a turbo codes test bed in the Simulink environment
More informationDigital Signal Processing. Prof. Dietrich Klakow Rahil Mahdian
Digital Signal Processing Prof. Dietrich Klakow Rahil Mahdian Language Teaching: English Questions: English (or German) Slides: English Tutorials: one English and one German group Exercise sheets: most
More informationAppendix D. UW DigiScope User s Manual. Willis J. Tompkins and Annie Foong
Appendix D UW DigiScope User s Manual Willis J. Tompkins and Annie Foong UW DigiScope is a program that gives the user a range of basic functions typical of a digital oscilloscope. Included are such features
More information