Capturing Sound by Light: Towards Massive Channel Audio Sensing via LEDs and Video Cameras

Feature Articles: New Developments in Communication Science

Gabriel Pablo Nava, Yoshifumi Shiraki, Hoang Duy Nguyen, Yutaka Kamamoto, Takashi G. Sato, Noboru Harada, and Takehiro Moriya

Abstract

We envision the future of sound sensing as large acoustic sensor networks present in wide spaces, providing highly accurate noise cancellation and ultra-realistic environmental sound recording. To achieve this, we developed a real-time system capable of recording the audio signals of large microphone arrays by exploiting the parallel data transmission offered by free-space optical communication technology based on light-emitting diodes (LEDs) and a high-speed camera. Typical audio capturing technologies face limitations in complexity, bandwidth, and cost of deployment when aiming for large scalability. In this article, we introduce a prototype that can be easily scaled up to 210 audio channels, which is the world's first and largest real-time optical-wireless sound acquisition system to date.

Keywords: microphone array, free-space optical communication, beamforming

1. Introduction

Imagine the TV broadcast of a live event taking place in a noisy, wide open environment. At the user end, it is often desired to have not only high quality image reproduction but also highly realistic sound that gives a clear impression of the event [1]. This can be achieved by the use of microphone arrays. According to the theory of sensor array signal processing [2], it is possible to listen to a particular sound from a desired location, and also to suppress the noise from the surroundings, by properly aligning and mixing the audio signals recorded by a microphone array. The theory also indicates that large microphone arrays produce remarkable sound enhancement, and examples have been demonstrated with large arrays of microphones [3]. However, to accurately record a three-dimensional sound field containing frequencies of up to 4 kHz and impinging from every direction on a 2-m² wall of a room, an array of about 250 microphones would be needed. Moreover, if the space under consideration is larger, for example a concert hall, tens of thousands of microphones might be necessary.

Unfortunately, typical wired microphones and audio recording hardware have limitations in terms of complexity and cost of deployment when the objective is large scalability. Furthermore, the use of multiple wireless microphones is constrained by radio frequency (RF) bandwidth issues. To overcome these difficulties, we developed a prototype that allows the simultaneous capture of multichannel audio signals from a large number of microphones (currently up to 210). In contrast with existing RF wireless audio interfaces, the proposed system relies on free-space optical transmission of digital signals. Such technology allows the parallel transmission of multiple data channels, each with full bandwidth capacity regardless of the number of channels transmitted.
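As a rough way to see where such microphone counts come from (a back-of-the-envelope sketch assuming half-wavelength spatial sampling, not a calculation taken from the article), the microphone spacing d must satisfy the spatial Nyquist condition at the highest frequency of interest f_max, so the count N for a surface of area A grows quadratically with frequency:

```latex
d \le \frac{c}{2 f_{\max}}, \qquad
N \approx \frac{A}{d^{2}} = A \left(\frac{2 f_{\max}}{c}\right)^{2},
\qquad c \approx 343~\text{m/s}
```

Doubling either the linear dimensions of the surface or the bandwidth of interest quadruples N, which is why deployment cost, rather than the microphones themselves, quickly becomes the bottleneck.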

[Fig. 1. Architecture of the multichannel audio acquisition system: a sound source and surrounding noise are picked up by 210 microphones; each sensor performs A/D (analog-to-digital) conversion and emits optical signals (16-bit PCM: pulse code modulation) that a lens focuses onto an imaging sensor running at 16,000 frames/s; image processing, decoding, and FIR (finite impulse response) filtering run in real time on a general-purpose graphics processing unit (GPGPU) to yield the audio signal of the sound source.]

2. System description

Our system is composed of three main parts: 1) optical wireless acoustic sensors (OWASs), 2) a high-speed camera, and 3) a parallel processing server. The overall architecture of the system is illustrated in Fig. 1.

2.1 OWAS device

An OWAS device is shown in Fig. 2. The microphone picks up samples of the acoustic waves at a rate of 16 kHz and outputs a delta-sigma modulated digital stream. A microcontroller then converts that serial data into binary symbols of 16-bit pulse code modulation (PCM). The PCM symbols are used to light up an array of 16 light-emitting diodes (LEDs); an LED in the ON or OFF state represents a binary 1 or 0, respectively (a code sketch of this mapping is given at the end of this section).

Because the camera can observe several OWASs simultaneously, the sound field can be sensed with a large array of OWASs such as the one shown in Fig. 3, where 210 OWAS devices have been arranged in a 15 × 14-node grid. With this array, our current experimental setup can acquire signals from 210 OWAS devices, the maximum allowed by the image size of the camera. Each OWAS device is also equipped with an infrared photosensor that enables it to receive the master clock signal emitted by the pulse generator shown in Fig. 4; synchronization between the OWAS devices and the high-speed camera is therefore maintained through this master clock.

[Fig. 2. Photo of an OWAS device (about 3 cm across): microphone, LEDs, and infrared photosensor on the front, microcontroller on the back side.]
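Note how the numbers in this section line up: 16 LEDs refreshed at the camera's 16,000 frames/s carry 16 bits × 16,000 samples/s, i.e. each camera frame captures exactly one 16-bit PCM sample per device. The sketch below illustrates that sample-to-LED mapping (our illustration of the idea; the bit ordering, two's-complement format, and function names are assumptions, not details from the article):

```python
import numpy as np

def pcm16_to_led_states(sample: int) -> np.ndarray:
    """Map one signed 16-bit PCM sample to ON/OFF states of 16 LEDs.

    Bit i of the two's-complement representation drives LED i
    (1 = ON, 0 = OFF), so one camera frame carries one full sample.
    """
    word = np.uint16(sample & 0xFFFF)              # two's-complement pattern
    return ((word >> np.arange(16, dtype=np.uint16)) & 1).astype(bool)

def led_states_to_pcm16(states: np.ndarray) -> int:
    """Inverse mapping, performed at the receiver after thresholding."""
    word = int(np.sum(states.astype(np.uint16) << np.arange(16)))
    return word - 0x10000 if word & 0x8000 else word  # undo two's complement

# Round trip: a 1-kHz tone sampled at 16 kHz survives encode/decode exactly.
t = np.arange(16) / 16000.0
samples = (np.sin(2 * np.pi * 1000.0 * t) * 32767).astype(np.int16)
decoded = [led_states_to_pcm16(pcm16_to_led_states(int(s))) for s in samples]
assert np.array_equal(samples, np.array(decoded, dtype=np.int16))
```

Per device this amounts to a modest 256 kbit/s; the point of the optical approach is that the camera receives hundreds of such streams in parallel without any shared-bandwidth penalty.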

[Fig. 3. Array of 210 acoustic sensors (15 × 14), with one OWAS device indicated.]

[Fig. 4. Infrared transmitter (a pulse generator driving infrared LEDs) for camera-sensor synchronization.]

2.2 High-speed camera

The imaging sensor of the high-speed camera observes the optical signals from the OWASs and records them as intensity images at a rate of 16,000 frames per second (fps). An example of the actual images captured by the camera is shown in Fig. 5. To transfer the image data from the camera to the processing server, the camera is connected to a frame grabber card installed on the PCI Express (Peripheral Component Interconnect Express) bus of the server; the resulting flow of the image data is shown in Fig. 6.

2.3 Parallel processing server

The server is equipped with dual CPUs (central processing units) and a standard general-purpose graphics processing unit (GPGPU), which provides enough massively parallel computing power to process the 16,000 images per second and 210 audio channels at a real-time factor of 0.75. The process of decoding the audio signals from the images starts with the detection of the LEDs in the images. Several image processing algorithms have been proposed to accomplish this [4], and it was suggested that the optical transmission channel can be modeled as a MIMO (multiple-input multiple-output) port, as shown in Fig. 7. With this model, the pixels of the images can be organized into clusters C by analyzing the spatiotemporal correlation of their intensity signals s across a block of captured frames, as shown in Fig. 8. Each cluster of pixels represents a detected LED in the images. Once the pixels for each LED have been identified, their intensity signals are optimally thresholded to convert them back into binary symbols (see Fig. 7). Finally, the binary data are decoded to obtain the originally transmitted audio signals of all the OWAS devices.

These audio signals are further processed with digital filters and mixed down to produce a single output channel (see Fig. 1). This multichannel signal processing, often known as beamforming [2], enhances the sound arriving from the desired direction (with respect to the OWAS array) while reducing the noise from other directions in the output audio signal. In other words, the OWAS array can be acoustically focused in any desired direction. In our preliminary experiments, we have been able to focus on the sound from targets placed as far as 10 m away from the OWAS array while suppressing the noise from the surroundings. An online video demonstrating the experimental setup is available [6]. Furthermore, we have also achieved the optical transmission of multichannel signals at a distance of 3 m from the receiver camera [5].

Our system is also highly scalable. Our numerical simulations indicated that our algorithms can receive and decode the optical signals of as many as 2,000 OWAS devices simultaneously on a single GPGPU card while maintaining real-time processing, as can be seen in Fig. 9.
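To make the clustering and thresholding step concrete, the sketch below decodes LED bit streams from a block of frames. It is a simplified stand-in for the detection algorithms of [4], not the actual implementation: it groups pixels by intensity activity and spatial connectedness (using scipy) instead of the full spatiotemporal correlation analysis, and all names and thresholds are our assumptions:

```python
import numpy as np
from scipy.ndimage import label  # connected-component labeling

def decode_block(frames: np.ndarray, activity_thresh: float = 10.0) -> np.ndarray:
    """Recover one binary symbol stream per detected LED.

    frames: (T, H, W) intensity images. Pixels lit by the same LED switch
    in unison, so here we simply mark pixels whose intensity varies over
    the block, group spatially connected ones into clusters (one cluster
    per LED), and threshold each cluster's mean signal into bits.
    """
    T = frames.shape[0]
    sig = frames.reshape(T, -1).astype(np.float64)     # per-pixel time series
    varying = (sig.max(axis=0) - sig.min(axis=0)) > activity_thresh
    clusters, n_leds = label(varying.reshape(frames.shape[1:]))
    bits = np.zeros((n_leds, T), dtype=np.uint8)
    for led in range(1, n_leds + 1):
        member = (clusters == led).ravel()             # pixels of this LED
        mean_sig = sig[:, member].mean(axis=1)         # cluster intensity
        threshold = 0.5 * (mean_sig.max() + mean_sig.min())
        bits[led - 1] = (mean_sig > threshold).astype(np.uint8)
    return bits                                        # (n_leds, T) symbols

# Synthetic check: two 3x3-pixel "LEDs" blinking known patterns over 8 frames.
rng = np.random.default_rng(0)
patterns = np.array([[1, 0, 1, 1, 0, 0, 1, 0],
                     [0, 1, 1, 0, 1, 0, 0, 1]], dtype=np.uint8)
frames = rng.normal(20.0, 1.0, size=(8, 16, 16))       # dark, noisy background
frames[:, 2:5, 2:5] += 200.0 * patterns[0][:, None, None]
frames[:, 2:5, 10:13] += 200.0 * patterns[1][:, None, None]
assert np.array_equal(decode_block(frames), patterns)
```

The recovered symbol streams would then be regrouped into 16-bit PCM words per device, inverting the encoding sketched in Section 2.1.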

[Fig. 5. Images streamed from the high-speed camera: consecutive frames (frames 1-4, one every 62.5 µs) in which each OWAS device occupies a patch of roughly 8 pixels across.]

[Fig. 6. Data flow from the camera to the GPGPU: high-speed camera → CoaXPress link (max. speed 25 Gbit/s) → frame grabber → PCI Express 2.0 ×16 (max. speed 32 Gbit/s) → CPU memory → GPGPU memory. CPU: central processing unit.]

[Fig. 7. The optical transmission channel modeled as a MIMO port: the LEDs emit optical signals; the sensor pixels record intensity signals; pixel clusters are thresholded to turn the received data back into digital data.]
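The beamforming stage of Section 2.3 can be pictured with a plain delay-and-sum beamformer, sketched below under assumed parameters (the 5-cm grid spacing, integer-sample delays, and all names are ours; the FIR filters of Fig. 1 would realize fractional delays and more sophisticated weights [2]):

```python
import numpy as np

C = 343.0   # speed of sound (m/s)
FS = 16000  # sample rate (Hz)

def delay_and_sum(signals: np.ndarray, mic_xyz: np.ndarray,
                  focus_xyz: np.ndarray) -> np.ndarray:
    """Acoustically focus an array on a point by delay-and-sum.

    signals: (M, N) audio from M microphones; mic_xyz: (M, 3) positions;
    focus_xyz: (3,) target point. Each channel is advanced so that sound
    radiated from the focus arrives time-aligned on every channel, then
    the channels are averaged: the target adds coherently while noise
    from other directions does not.
    """
    dists = np.linalg.norm(mic_xyz - focus_xyz, axis=1)
    delays = np.round((dists - dists.min()) / C * FS).astype(int)
    M, N = signals.shape
    out = np.zeros(N)
    for m in range(M):
        out[: N - delays[m]] += signals[m, delays[m]:]  # undo propagation lag
    return out / M

# Example: a 15 x 14 grid with 5-cm spacing focused on a point 10 m away,
# off to one side; the target tone is buried in uncorrelated sensor noise.
xs, ys = np.meshgrid(np.arange(15) * 0.05, np.arange(14) * 0.05)
mics = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)])
focus = np.array([5.0, 0.0, 10.0])

t = np.arange(1600) / FS
d = np.linalg.norm(mics - focus, axis=1)
lags = np.round((d - d.min()) / C * FS).astype(int)
rng = np.random.default_rng(1)
sigs = np.stack([np.roll(np.sin(2 * np.pi * 1000.0 * t), k) for k in lags])
sigs += rng.normal(0.0, 1.0, sigs.shape)       # 0-dB SNR on every channel
enhanced = delay_and_sum(sigs, mics, focus)    # noise drops by ~1/sqrt(210)
```

Steering is purely computational: pointing the array at a different spot only changes focus_xyz, with no physical movement of the sensors.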

[Fig. 8. Pixels clustered according to their spatiotemporal correlation: over a block of frames (frames 1-3), the pixel intensity signals s1-s12 group into clusters C1 = {s1, s2, s3, s4}, C2 = {s5, s6, s7, s8}, and C3 = {s9, s10, s11, s12}; each cluster C represents a detected LED on the images.]

[Fig. 9. System scalability in terms of parallel processing power: processing time versus number of audio channels (up to several thousand) for image sizes of 0.3 megapixels (VGA), 0.92 megapixels (HD), and 2.1 megapixels (full HD), with real-time operation maintained (real-time factor of 0.93) for blocks of audio at 16 kHz.]

3. Future work

The current limitation we face in expanding our prototype involves the high-speed camera: the resolution of existing commercial cameras must be reduced considerably in order to achieve high frame rates (tens of thousands of fps). Nevertheless, the accelerated advances in imaging sensor and parallel computing technologies motivate us to carry out further development of our prototype. In the near future, we expect to build OWAS arrays over large spaces such as stadiums or concert halls, and we may achieve real-time position tracking of portable/wearable OWAS devices. Such progress will pave the way for novel applications and high quality services.

References

[1] S. Koyama, Y. Hiwasaki, K. Furuya, and Y. Haneda, "Inverse Wave Propagation for Reproducing Virtual Sources in Front of Loudspeaker Array," Proc. of the European Signal Processing Conference (EUSIPCO) 2011, pp. 322-326, Barcelona, Spain, 2011.
[2] H. L. Van Trees, "Optimum Array Processing: Part IV of Detection, Estimation, and Modulation Theory," Wiley-Interscience, New York, 2002.
[3] K. Niwa, Y. Hioka, S. Sakauchi, K. Furuya, and Y. Haneda, "Sharp Directive Beamforming Using Microphone Array and Planar Reflector," Acoustical Science and Technology, Vol. 34, No. 4, pp. 253-262, 2013.
[4] G. P. Nava, Y. Kamamoto, T. G. Sato, Y. Shiraki, N. Harada, and T. Moriya, "Image Processing Techniques for High Speed Camera-based Free-field Optical Communication," Proc. of the IEEE International Conference on Signal and Image Processing Applications (ICSIPA) 2013, pp. 384-389, Melaka, Malaysia, October 2013.
[5] G. P. Nava, Y. Kamamoto, T. G. Sato, Y. Shiraki, N. Harada, and T. Moriya, "Simultaneous Acquisition of Massive Number of Audio Channels through Optical Means," Proc. of the 135th Audio Engineering Society (AES) Convention, New York, USA, October 2013.
[6] Demo video (media player required): mms://csflash.kecl.ntt.co.jp/cslab/mrl/pablo/vasdemo.wmv

Gabriel Pablo Nava
He completed his B.E. studies in Mexico City at the Instituto Politécnico Nacional in 1999. He received his M.S. and Ph.D. from the Department of Information Science and Technology of the University of Tokyo in 2004 and 2007, respectively. He also held a postdoctoral position at the University of Tokyo before joining NTT Communication Science Laboratories in April 2008. He is currently conducting research and development on room acoustics, multichannel audio, microphone array signal processing, and digital image processing for next-generation ultra-realistic teleconferencing systems.

Yoshifumi Shiraki
He received his B.A. in natural science from International Christian University, Tokyo, in 2008 and his M.S. in computational intelligence and systems science from Tokyo Institute of Technology in 2010. Since joining NTT Communication Science Laboratories in 2010, he has been studying signal processing, particularly distributed compressed sensing. He is a member of the Institute of Electronics, Information and Communication Engineers (IEICE) and the Institute of Electrical and Electronics Engineers (IEEE).

Hoang Duy Nguyen
Intern, Moriya Research Laboratory, NTT Communication Science Laboratories.
Mr. Nguyen was born in Brussels, Belgium, in 1990. He graduated summa cum laude with an M.S. in electronics and telecommunications engineering from the University of Brussels, Belgium, in 2013. From 2012 to 2013 he worked as a research engineer in the Smart Systems and Energy Technology (SSET) department of the Interuniversity Micro-Electronics Centre (IMEC), Leuven, Belgium. Since 2014 he has been in the Special Research Group of NTT Communication Science Laboratories, Atsugi City, Japan, where he carries out experimental research and development involving free-field optical communication. His research interests include MIMO detection algorithms and implementation aspects of acoustic beamformers on GPGPUs.

Yutaka Kamamoto
He received his B.S. in applied physics and physico-informatics from Keio University, Kanagawa, in 2003 and his M.S. and Ph.D. in information physics and computing from the University of Tokyo in 2005 and 2012, respectively. Since joining NTT Communication Science Laboratories in 2005, he has been studying signal processing and information theory, particularly lossless coding of time-domain signals. He also joined NTT Network Innovation Laboratories, where he worked on the development of the audio-visual codec for other digital stuff (ODS) from 2009 to 2011. He has contributed to the standardization of coding schemes for MPEG-4 Audio Lossless Coding (ALS) and ITU-T Recommendation G.711.0 (lossless compression of G.711 pulse code modulation). He received the Telecom System Student Award from the Telecommunications Advancement Foundation (TAF) in 2006, the IPSJ Best Paper Award from the Information Processing Society of Japan (IPSJ) in 2006, the Telecom System Encouragement Award from TAF in 2007, and the Awaya Prize Young Researcher's Award from the Acoustical Society of Japan (ASJ) in 2010. He is a member of IPSJ, ASJ, IEICE, and IEEE.

Takashi G. Sato
He received his M.S. and Ph.D. in information science and technology from the University of Tokyo in 2005 and 2008, respectively. He is currently a researcher at NTT Communication Science Laboratories. His research interests include bioengineering and tactile and audio interfaces using psychophysiological and neural measurement as feedback information. He is a member of IEEE.
Noboru Harada
Senior Research Scientist, Moriya Research Laboratory, NTT Communication Science Laboratories.
He received his B.S. and M.S. from the Department of Computer Science and Systems Engineering of Kyushu Institute of Technology in 1995 and 1997, respectively. He joined NTT in 1997. His main research areas are lossless audio coding, high-efficiency coding of speech and audio, and their applications. He was also with NTT Network Innovation Laboratories, where he worked on the development of the audio-visual codec for other digital stuff (ODS) from 2009 to 2011. He is an editor of ISO/IEC 23000-6:2009 Professional Archival Application Format, ISO/IEC 14496-5:2001/Amd.10:2007 reference software for MPEG-4 ALS, and ITU-T G.711.0. He is a member of IEICE, ASJ, the Audio Engineering Society (AES), and IEEE.

Takehiro Moriya
NTT Fellow, Moriya Research Laboratory, NTT Communication Science Laboratories.
He received his B.S., M.S., and Ph.D. in mathematical engineering and instrumentation physics from the University of Tokyo in 1978, 1980, and 1989, respectively. Since joining NTT laboratories in 1980, he has been engaged in research on medium- to low-bit-rate speech and audio coding. In 1989, he worked at AT&T Bell Laboratories, NJ, USA, as a Visiting Researcher. Since 1990, he has contributed to the standardization of coding schemes for the Japanese public digital cellular system, ITU-T, ISO/IEC MPEG, and 3GPP. He is a member of the Senior Editorial Board of the IEEE Journal of Selected Topics in Signal Processing. He is a Fellow of IEEE and a member of IPSJ, IEICE, AES, and ASJ.