The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings


Joachim Thiemann, Nobutaka Ito, Emmanuel Vincent

To cite this version: Joachim Thiemann, Nobutaka Ito, Emmanuel Vincent. The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings. 21st International Congress on Acoustics, Jun 2013, Montreal, Canada. 2013, <10.5281/zenodo.1227120>. <hal-007967>

HAL Id: hal-007967
https://hal.inria.fr/hal-007967
Submitted on 8 Mar 2013

BACKGROUND

In audio recordings made outside controlled studio setups, acoustic background noise is a fact of life. As a result, there is continued interest in developing methods to separate a sound of interest from that background noise [1, 2, 3]. Typically, algorithms that separate, reduce, or remove acoustic background noise are developed and evaluated in controlled or simulated environments [4]. Such artificial setups generally contain few noise sources. They are a poor substitute for real acoustic background noise and do not sound natural to casual listeners evaluating the performance of the processing techniques being developed.

The solution, then, is to record noise in various environments, chosen according to the targeted use of the algorithm in question. If these recordings are made available to the research community, they can serve as reference points for researchers to compare each other's work. Several real-world noise databases now exist, for example the AURORA-2 corpus [5], the CHiME background noise data [6], and the NOISEX-92 database [7]. Unfortunately, these databases provide only a very limited variety of environments, are limited to at most 2 channels and, with the exception of CHiME, are not free.

In current research projects, we are investigating the use of source separation algorithms and beamforming techniques for signal enhancement and acoustic noise suppression/removal where the signal is captured using a multi-microphone array. Since the above-mentioned databases do not provide more than two channels, we decided to create our own set of recordings and make these available under a Creative Commons Attribution-ShareAlike 3.0 Unported [8] license for general distribution. These recordings can be found at http://www.irisa.fr/metiss/demand/.
PHYSICAL CHARACTERISTICS OF THE MICROPHONE ARRAY

For the recordings, we built a planar array of 16 microphones supported by a structure of metal rods with cross-braces to avoid deformation. The 16 microphones were arranged in 4 staggered rows, with 5 cm spacing of each microphone from its immediate neighbours. With this arrangement, subsets of the array can also be used as smaller linear arrays (in three directions) and smaller crystal arrays [9]. In all recordings, the plane of the array was parallel to the ground. The array was mounted on a standard microphone tripod at a height of 1.5 m. The tolerances of the construction were such that the actual locations of the microphones were within 2 mm of the design. Figure 1 presents the schematic of the physical design and shows a photograph of the actual array.

FIGURE 1: (a) Diagram of microphone array layout. Black dots indicate microphone locations with channel numbers. Grey dots show connection bolts. (b) Photograph of microphone array.

MICROPHONES, AMPLIFIER, AND A/D CONVERTER

The array used 16 Sony ECM-C10 omnidirectional electret condenser microphones. They were connected to an Inrevium / Tokyo Electron Device TD-BD-16ADUSB USB soundcard, which internally used Asahi Kasei AK4563A 16-bit A/D converters with internal preamps. The soundcard was connected to laptops running either Microsoft Windows or the Linux operating system. The choice of operating system did not affect the recordings.

DATABASE DESIGN

The database of recordings was designed to consist of six broad categories, with three environments being recorded within each category. Four of these categories consisted of enclosed spaces, with the remaining two containing recordings done outdoors. The indoor environments were classified as Domestic, Office, Public, and Transportation; the open air environments were Nature and Street. This emphasis on indoor environments reflected the fact that the microphone array used for the recordings was not well suited to outdoor use, especially in less than ideal weather conditions (the array had no protection against rain or wind). Descriptions of the categories and the recordings within each category are given in Table 1.

TABLE 1: Noise database structure: categories and recordings in each category.

Category        Environment  Description
Domestic        DKITCHEN     inside a kitchen during the preparation of food
                DLIVINGR     inside a living room
                DWASHING     domestic washroom with washing machine running
Office          OHALLWAY     a hallway inside an office building with occasional traffic
                OMEETING     a meeting room while the microphone array is discussed
                OOFFICE      a small office with three people using computers
Public          PCAFETER     a busy office cafeteria
                PRESTO       a university restaurant at lunchtime
                PSTATION     the main transfer area of a busy subway station
Transportation  TBUS         a public transit bus
                TCAR         a private passenger vehicle
                TMETRO       a subway
Nature          NFIELD       a sports field with activity nearby
                NPARK        a well-visited city park
                NRIVER       a creek of running water
Street          SCAFE        the terrace of a cafe at a public square
                SPSQUARE     a public town square with many tourists
                STRAFFIC     a busy traffic intersection

All recordings were made in Rennes (France) and its immediate vicinity, in the period between May and August 2012. Three environments were omitted from the initial release of the database: DLIVINGR, NRIVER, and SCAFE. These will be added to the database when conditions are suitable to perform more field recordings.

PROPERTIES OF THE SOUND RECORDINGS

The recordings were captured at a sampling rate of 48 kHz and with a target length of 5 minutes (300 s). Actual audio capture time was somewhat longer, which allowed us to remove set-up noises and other artefacts by trimming. However, the recordings were not spliced, so each one is a single uninterrupted, contiguous time segment.

The recorded signals were not subject to any gain normalization. Therefore, the original noise power in each environment was preserved. Given the size of the microphone array compared to the distances of the noise sources in each environment, we expected that the overall level of sound at each microphone would be roughly equal, barring occlusion effects from the support structure. However, the microphones of the array were electret microphones and contained internal preamplifiers. They were not calibrated with respect to each other, and so gain variations were expected (the data sheet for the microphones specifies a tolerance in sensitivity of 3.5 dB [10]). Table 2 shows the calibration data obtained by placing a 01dB Cal21 calibrator at each microphone and recording the peak amplitude relative to full scale with a 1 kHz 94 dB SPL sinusoidal signal present at each microphone.

TABLE 2: Calibration data for all channels of the array using a 1 kHz 94 dB SPL signal at each microphone. Values are given in dB (peak) relative to the full scale of the sample data type.

Channel   1      2      3      4      5      6      7      8
dBFS     -23.4  -23.4  -22.9  -24.2  -22.5  -23.2  -23.6  -23.0

Channel   9      10     11     12     13     14     15     16
dBFS     -22.4  -25.4  -24.5  -22.9  -23.7  -23.0  -23.1  -23.3

Figure 2 shows the loudness profiles for each of the recordings in dB SPL (A-weighted) at microphone one. The loudness was calculated using the slow response (window size of about 1 s).
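As a rough illustration of how the calibration values in Table 2 might be used, the following Python sketch converts an RMS level measured on a given channel into an approximate sound pressure level and derives per-channel gain offsets. It is a sketch under stated assumptions, not a procedure described by the authors: it assumes that Table 2 reports the peak of the calibrator sinusoid (as the caption states), that the RMS level to convert is expressed in dB relative to the same full-scale amplitude, and that the response is flat enough for a single 1 kHz calibration point to be meaningful. The function names are hypothetical.

```python
import math

# Peak level (dBFS) recorded on each channel for a 1 kHz, 94 dB SPL tone (Table 2).
CAL_PEAK_DBFS = {
    1: -23.4, 2: -23.4, 3: -22.9, 4: -24.2, 5: -22.5, 6: -23.2, 7: -23.6, 8: -23.0,
    9: -22.4, 10: -25.4, 11: -24.5, 12: -22.9, 13: -23.7, 14: -23.0, 15: -23.1, 16: -23.3,
}
CAL_SPL_DB = 94.0

# A sinusoid's peak lies 20*log10(sqrt(2)), about 3.01 dB, above its RMS value.
SINE_CREST_DB = 20.0 * math.log10(math.sqrt(2.0))


def rms_dbfs_to_spl(rms_dbfs, channel):
    """Approximate dB SPL for an RMS level given in dB relative to full scale.

    Assumption: the same full-scale reference is used as in Table 2.
    """
    cal_rms_dbfs = CAL_PEAK_DBFS[channel] - SINE_CREST_DB
    return CAL_SPL_DB + (rms_dbfs - cal_rms_dbfs)


def gain_offset_db(channel, reference=1):
    """Gain of `channel` relative to `reference`, in dB, from the calibration data."""
    return CAL_PEAK_DBFS[channel] - CAL_PEAK_DBFS[reference]


# Largest gain difference between any two channels according to Table 2.
spread_db = max(CAL_PEAK_DBFS.values()) - min(CAL_PEAK_DBFS.values())
```

With the Table 2 values, the spread between the most and least sensitive channels (channel 9 and channel 10) is 3.0 dB. Subtracting `gain_offset_db(channel)` from each channel's level is one simple way to equalize the channels before comparing them.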
The database contains recordings that are very even in terms of loudness (PRESTO, PSTATION, NFIELD) and others that vary by almost 30 dB (DKITCHEN, TMETRO, STRAFFIC). Note that in one recording (TBUS), the signal was slightly clipped in some channels but, since this was mostly due to low-frequency vibration from the ground, it did not show up in the loudness profile.

CONCLUSION

Freely available noise recordings are a valuable resource for research on audio processing algorithms. The DEMAND database is a free, Creative Commons licensed set of 16-channel noise recordings; to the best of our knowledge, no comparable set is available from other sources. We hope that this database will prove useful to students and experienced researchers.

ACKNOWLEDGMENTS

This work was supported by INRIA under the Associate Team Program VERSAMUS (http://versamus.inria.fr). The authors would also like to thank Dan Freed of EarLens Corp. for advice regarding the calibration.

FIGURE 2: Loudness profiles of recordings, in dB SPL (A-weighted), measured at channel one. The horizontal axis shows the time in seconds. Panels: DKITCHEN, DWASHING, OHALLWAY, OMEETING, PRESTO, TCAR, NPARK, OOFFICE, PSTATION, TMETRO, SPSQUARE, PCAFETER, TBUS, NFIELD, STRAFFIC.
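A level profile of the kind plotted in Figure 2 could be approximated along the following lines. This is a minimal Python sketch and not the authors' procedure: it uses non-overlapping 1 s blocks as a stand-in for the slow time weighting, omits A-weighting entirely, and assumes samples normalized to the range [-1, 1]. The function names and the synthetic test signal are illustrative only.

```python
import math


def block_levels_dbfs(samples, sample_rate=48000, window_s=1.0):
    """RMS level, in dB relative to full scale, of consecutive non-overlapping windows.

    Assumption: `samples` is a sequence of floats in [-1, 1]; a trailing
    partial window is discarded.
    """
    n = int(round(sample_rate * window_s))
    levels = []
    for start in range(0, len(samples) - n + 1, n):
        block = samples[start:start + n]
        mean_square = sum(x * x for x in block) / n
        # The floor avoids log10(0) on digital silence.
        levels.append(10.0 * math.log10(max(mean_square, 1e-20)))
    return levels


def level_range_db(levels):
    """Difference between the loudest and quietest window, in dB."""
    return max(levels) - min(levels)


# Synthetic example: 1 s of a 1 kHz tone followed by 1 s of the same tone 20 dB quieter.
fs = 48000
loud = [0.1 * math.sin(2.0 * math.pi * 1000.0 * i / fs) for i in range(fs)]
quiet = [0.1 * x for x in loud]
levels = block_levels_dbfs(loud + quiet, fs)
variation_db = level_range_db(levels)
```

Applied to one channel of a recording, `level_range_db` gives a single number for how much the level varies over time, which is the property used to distinguish the even recordings from those that vary by almost 30 dB. The resulting values are unweighted and relative to full scale, so they are not directly comparable with the A-weighted dB SPL values in Figure 2.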

REFERENCES

[1] M. S. Brandstein and D. B. Ward, eds., Microphone Arrays: Signal Processing Techniques and Applications (Springer) (2001).
[2] S. Makino, T.-W. Lee, and H. Sawada, eds., Blind Speech Separation (Springer) (2007).
[3] E. Vincent and Y. Deville, Audio applications, in Handbook of Blind Source Separation, Independent Component Analysis and Applications, edited by P. Comon and C. Jutten, 779-819 (Academic Press) (2010), URL http://hal.inria.fr/inria-00544027.
[4] E. Vincent, S. Araki, F. J. Theis, G. Nolte, P. Bofill, H. Sawada, A. Ozerov, B. V. Gowreesunker, D. Lutter, and N. Q. K. Duong, The Signal Separation Evaluation Campaign (2007-2010): Achievements and Remaining Challenges, Signal Processing 92, 1928-1936 (2012), URL http://hal.inria.fr/inria-00630985.
[5] European Language Resources Association, Aurora project database 2.0 (2008), URL http://catalog.elra.info/product_info.php?products_id=693.
[6] J. Barker, E. Vincent, N. Ma, H. Christensen, and P. Green, The PASCAL CHiME speech separation and recognition challenge, Computer Speech and Language, in press (2012), URL http://spandh.dcs.shef.ac.uk/projects/chime/pcc/datasets.html.
[7] Noisex-92 database (1992), URL http://www.speech.cs.cmu.edu/comp.speech/section1/data/noisex.html.
[8] Creative Commons, Creative Commons Attribution-ShareAlike 3.0 Unported (2013), URL http://creativecommons.org/licenses/by-sa/3.0/deed.en_ca.
[9] N. Ito, H. Shimizu, N. Ono, and S. Sagayama, Diffuse noise suppression using crystal-shaped microphone arrays, IEEE Transactions on Audio, Speech, and Language Processing 19, 2101-2110 (2011).
[10] Sony Corporation, ECM-C115/C10/CS10 Electret Condenser Microphone (2004).