Seeing Using Sound. By: Clayton Shepard Richard Hall Jared Flatow



Online: <http://cnx.org/content/col10319/1.2/>

Connexions, Rice University, Houston, Texas

This selection and arrangement of content as a collection is copyrighted by Clayton Shepard, Richard Hall, Jared Flatow. It is licensed under the Creative Commons Attribution 2.0 license (http://creativecommons.org/licenses/by/2.0/).

Collection structure revised: December 15, 2005
PDF generated: October 25, 2012

For copyright and attribution information for the modules contained in this collection, see the Attributions section.

Table of Contents

1 Introduction and Background for Seeing with Sound
2 Seeing using Sound - Design Overview
3 Canny Edge Detection
4 Seeing using Sound's Mapping Algorithm
5 Demonstrations of Seeing using Sound
6 Final Remarks on Seeing using Sound
Index
Attributions


Chapter 1 Introduction and Background for Seeing with Sound [1]

1.1 Introduction

Seeing with Sound is our attempt to meaningfully transform an image into sound. The motivation behind it is simple: to convey visual information to blind people using their sense of hearing. We believe that, in time, the human brain can adapt to the sounds, making it a useful and worthwhile system.

1.2 Background and Problems

In researching this project, we found one marketed product online, The vOICe [2], that did just what we set out to do. However, we believe that The vOICe [3] is not optimal, and we have a few improvements in mind. One idea is to make the center of the image the focus of the final sound. We feel that the center of an image contains the most important information, and that it gets lost in the left-to-right sweeping of The vOICe [4]. Also, some images are far too "busy" for its technique. We believe the images need to be simplified so that only the most important information is conveyed in the sounds.

[1] This content is available online at <http://cnx.org/content/m13222/1.1/>.
[2] http://www.visualprosthesis.com/javoice.htm
[3] http://www.visualprosthesis.com/javoice.htm
[4] http://www.visualprosthesis.com/javoice.htm


Chapter 2 Seeing using Sound - Design Overview [1]

2.1 Input Filtering

The first step in our process is to filter the input image. This helps solve the "busy" sound problem of The vOICe [2]. We decided to first smooth the image with a low-pass filter, leaving only the most prominent features of the image behind. We then wanted to filter the result with an edge detector, which is essentially a high-pass filter of some sort. We chose a Canny filter for the edge detection. The advantage of an edge detector is that it simplifies the image while highlighting its most structurally significant components. This is especially applicable to a system for the blind, because the structural features of an image are the most important ones for finding your way around a room.

2.2 The Mapping Process

Simply put, the mapping process is the actual transformation from visual information to sound. This block takes the data from the filtered input and produces a sequence of notes representing the image. Mapping images to sound is a matter of interpretation; there is no known "optimal" mapping for the human brain. Thus, we simply chose an interpretation that made sense to us. First of all, it seemed clear to us that the most intuitive use of frequency would be to correlate it with the relative vertical position of an edge in the picture. That is, higher frequencies should correspond to edges that are higher in the image, and lower frequencies to edges that are lower. The only other idea that we wanted to stick to was making the center of the image the focus of attention. For a complete description of this component, see the chapter on the mapping algorithm (Chapter 4).

[1] This content is available online at <http://cnx.org/content/m13224/1.1/>.
[2] http://www.visualprosthesis.com/javoice.htm
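The smoothing step of Section 2.1 can be illustrated with a small sketch. The text does not say which low-pass filter was used, so the box (mean) filter below is an assumption chosen for simplicity, and the name `box_blur` and the list-of-rows image format are hypothetical. It shows how an isolated bright speck, the kind of fine detail that makes a sound "busy", is strongly attenuated by smoothing.

```python
def box_blur(image, radius=1):
    """Low-pass filter: replace each pixel with the mean of its neighborhood.

    `image` is a list of rows of grayscale values. Pixels outside the image
    are ignored, so border pixels average over fewer neighbors.
    (Assumed filter; the source only says "a low pass filter".)
    """
    height, width = len(image), len(image[0])
    blurred = []
    for r in range(height):
        row = []
        for c in range(width):
            total, count = 0.0, 0
            for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                    total += image[rr][cc]
                    count += 1
            row.append(total / count)
        blurred.append(row)
    return blurred


# A 5x5 dark image with a single bright speck in the middle.
speck = [[0] * 5 for _ in range(5)]
speck[2][2] = 90
smoothed = box_blur(speck)
print(smoothed[2][2])  # the speck's peak drops from 90 to 10.0
```

A larger `radius` removes more detail; in practice it would have to be tuned so that the prominent structure of a room survives while small clutter does not.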


Chapter 3 Canny Edge Detection [1]

3.1 Introduction to Edge Detection

Edge detection is the process of finding sharp contrasts in intensity in an image. This process significantly reduces the amount of data in the image while preserving its most important structural features. Canny edge detection is often regarded as the ideal edge detection algorithm for images that are corrupted with white noise. For a more in-depth introduction, see the Canny Edge Detection Tutorial [2].

3.2 Canny Edge Detection and Seeing Using Sound

The Canny edge detector worked very well for Seeing Using Sound. We used a Matlab implementation of the Canny edge detector, which can be found at http://ai.stanford.edu/~mitul/cs223b/canny.m [3]. Here is an example of the results of filtering an image with a Canny edge detector:

Figure 3.1: Before Edge Detection

[1] This content is available online at <http://cnx.org/content/m13218/1.2/>.
[2] http://www.pages.drexel.edu/~weg22/can_tut.html
[3] http://ai.stanford.edu/~mitul/cs223b/canny.m

Figure 3.2: After Edge Detection
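To make the data reduction concrete, here is a minimal sketch of edge detection by thresholding the gradient magnitude. This is a deliberately simplified stand-in, not the Canny detector and not the Matlab implementation the project used: a full Canny detector also applies Gaussian smoothing, non-maximum suppression, and hysteresis thresholding. The name `edge_map` and the threshold value are assumptions for illustration.

```python
def edge_map(image, threshold):
    """Mark pixels whose intensity gradient magnitude exceeds `threshold`.

    Uses central differences; border pixels are left as non-edges.
    Returns a list of rows of 0/1 values.
    (Simplified stand-in for Canny: no smoothing, thinning, or hysteresis.)
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # horizontal change
            gy = image[r + 1][c] - image[r - 1][c]  # vertical change
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[r][c] = 1
    return edges


# 6x6 image: dark left half, bright right half, i.e. one vertical step edge.
step = [[0, 0, 0, 100, 100, 100] for _ in range(6)]
edges = edge_map(step, threshold=50)
edge_count = sum(sum(row) for row in edges)
print(edge_count)  # 8 of the 36 pixels are marked as edges
```

Only the pixels next to the intensity step are kept; the flat regions on either side vanish. This simple detector marks the step two pixels wide, whereas Canny's non-maximum suppression would thin it to a single line.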

Chapter 4 Seeing using Sound's Mapping Algorithm [1]

The mapping algorithm is the piece of the system that takes in an edge-detected image and produces a sound clip representing the image. The mapping as we implemented it takes three steps:

- Vertical Mapping
- Horizontal Mapping
- Color Mapping

Figure 4.1: Illustration of our mapping algorithm

[1] This content is available online at <http://cnx.org/content/m13226/1.3/>.

4.1 Vertical Mapping

The first step of the algorithm is to map the vertical axis of the image to the frequency content of the output sound at a given time. We implemented this by having the relative pitch of the output at that time correspond to the rows in each column that contain an edge. Basically, the higher the note you hear, the higher the edge is in your field of vision, and the lower the note, the lower it is in your field of vision.

4.2 Horizontal Mapping

Next, we need some way of mapping the horizontal axis to the output sound. We chose to implement this by having our system "sweep" the image from the outside in over time (see Figure 4.1). The reasoning behind this is that the focus of the final sound should be the center of the field of vision, so we have everything meeting in the middle. This means that each image takes some period of time to be "displayed" as sound. The period begins at some time t0, and, with stereo sound, the left and right channels start sounding notes corresponding to edges on each side of the image, finally meeting in the middle at some time tf.

4.3 Color Mapping

Using scales instead of continuous frequencies for the notes gives us some extra information to work with. We decided to also try to incorporate the color of the original image at each edge point. We were able to do this by letting the brightness of the color determine the brightness of the scale that we use. For example, major scales sound much brighter than minor scales, so bright colors correspond to major scales, and darker ones correspond to minor scales. This effect is difficult to perceive for those who are not musically trained, but we believe that the brain can adapt to this pattern regardless of whether or not the user truly understands the mapping.
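The three mapping steps above can be sketched together in a few lines. This is an illustrative sketch, not the authors' implementation: the base pitch `BASE_HZ`, the use of equal-tempered major and natural minor scales, the per-pixel (rather than per-block) resolution, the brightness cutoff, and all function names are assumptions.

```python
MAJOR = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets of a major scale
MINOR = [0, 2, 3, 5, 7, 8, 10]   # semitone offsets of a natural minor scale
BASE_HZ = 220.0                  # assumed pitch of the bottom image row


def note_frequency(row, height, bright):
    """Vertical + color mapping: higher rows give higher scale notes;
    bright pixels use the major scale, dark pixels the minor scale."""
    degree = height - 1 - row            # 0 for the bottom row
    scale = MAJOR if bright else MINOR
    semitones = 12 * (degree // 7) + scale[degree % 7]
    return BASE_HZ * 2 ** (semitones / 12)


def map_image(edges, brightness, bright_cutoff=0.5):
    """Horizontal mapping: sweep from the outside in.

    Returns one (left_notes, right_notes) pair per time step. At step k the
    left channel plays column k and the right channel plays the mirrored
    column, so the two channels meet in the middle of the image.
    """
    height, width = len(edges), len(edges[0])

    def column_notes(col):
        return [note_frequency(r, height, brightness[r][col] >= bright_cutoff)
                for r in range(height) if edges[r][col]]

    return [(column_notes(k), column_notes(width - 1 - k))
            for k in range((width + 1) // 2)]


# 3x4 edge image: an edge at the top-left and one at the bottom-right.
edges = [[1, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 1]]
brightness = [[1.0] * 4 for _ in range(3)]   # everything bright -> major
steps = map_image(edges, brightness)
```

Here `steps[0]` holds a high note in the left channel (top-left edge) and the base note in the right channel (bottom-right edge), and `steps[1]` is silent because the two middle columns contain no edges. Turning the note lists into audio samples is left out of the sketch.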

Chapter 5 Demonstrations of Seeing using Sound [1]

For each example, right click on the link to the corresponding sound and go to "Save Link Target As..." to download and play it.

5.1 Examples

Identity Matrix
Figure 5.1: Our Simplest Example - Listen [2]

[1] This content is available online at <http://cnx.org/content/m13219/1.2/>.

X Matrix
Figure 5.2: Listen [3]

Edge Detected Heart
Figure 5.3: Listen [4]

[2] http://cnx.org/content/m13219/latest/identity.au
[3] http://cnx.org/content/m13219/latest/crisscross.au
[4] http://cnx.org/content/m13219/latest/heart.au

Front Door Repeated
Figure 5.4: Our Hardest Example - Not for beginners! - Listen [5]

[5] http://cnx.org/content/m13219/latest/cdoor.au
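As a worked example of what to listen for, the sketch below traces the outside-in sweep over the identity-matrix image. It assumes the image is a bright diagonal running from top-left to bottom-right, that it is mapped directly without further filtering, and that pitch rises with height in the image as described in Chapter 4; the function name `sweep_identity` is hypothetical.

```python
def sweep_identity(n):
    """Outside-in sweep of an n x n identity-matrix image.

    Returns one (left_height, right_height) pair per time step, where each
    height counts rows above the bottom of the image (larger = higher pitch).
    """
    steps = []
    for k in range((n + 1) // 2):
        left_col, right_col = k, n - 1 - k
        # In an identity matrix the only set pixel in column c is at row c.
        left_height = n - 1 - left_col
        right_height = n - 1 - right_col
        steps.append((left_height, right_height))
    return steps


steps = sweep_identity(8)
print(steps)  # [(7, 0), (6, 1), (5, 2), (4, 3)]
```

Under these assumptions, one would expect the left channel to start high and fall while the right channel starts low and rises, with the two converging in pitch as the sweep reaches the center.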


Chapter 6 Final Remarks on Seeing using Sound [1]

6.1 Future Considerations and Conclusions

There are many ways to improve upon our approach. One way to significantly improve left/right positioning is to have the left and right scales played by different instruments. One way to improve resolution would be to have neighboring blocks compare data, so that an edge spanning many blocks does not sound like a cacophony. Filters other than edge detectors could be applied to determine other features of the image, such as color gradients or the elements in the foreground. This information could be encoded into different elements of the basis scale, or could even change the scale to a different, perhaps acyclic, pattern. One way to go about this might be to look at existing photo processing filters (e.g. in Photoshop) and use those for inspiration.

6.2 Contact Information of Group Members

Flatow, Jared: jmizz @ rice dot edu
Hall, Richard: rlhall @ rice dot edu
Shepard, Clay: cwshep @ rice dot edu

[1] This content is available online at <http://cnx.org/content/m13220/1.1/>.

Index of Keywords and Terms

Keywords are listed by the section with that keyword (page numbers are in parentheses). Keywords do not necessarily appear in the text of the page. They are merely associated with that section. Ex. apples, § 1.1 (1)
Terms are referenced by the page they appear on. Ex. apples, 1

C
Canny, § 3 (5)

E
Edge Detection, § 3 (5)

Attributions

Collection: Seeing Using Sound
Edited by: Clayton Shepard, Richard Hall, Jared Flatow
URL: http://cnx.org/content/col10319/1.2/
License: http://creativecommons.org/licenses/by/2.0/

Module: "Introduction and Background for Seeing with Sound"
By: Richard Hall, Jared Flatow
URL: http://cnx.org/content/m13222/1.1/
Page: 1
Copyright: Richard Hall, Jared Flatow
License: http://creativecommons.org/licenses/by/2.0/

Module: "Seeing using Sound - Design Overview"
By: Richard Hall, Jared Flatow
URL: http://cnx.org/content/m13224/1.1/
Page: 3
Copyright: Richard Hall, Jared Flatow
License: http://creativecommons.org/licenses/by/2.0/

Module: "Canny Edge Detection"
By: Richard Hall, Jared Flatow
URL: http://cnx.org/content/m13218/1.2/
Pages: 5-6
Copyright: Richard Hall, Jared Flatow
License: http://creativecommons.org/licenses/by/2.0/

Module: "Seeing using Sound's Mapping Algorithm"
By: Richard Hall, Jared Flatow
URL: http://cnx.org/content/m13226/1.3/
Pages: 7-8
Copyright: Richard Hall, Jared Flatow
License: http://creativecommons.org/licenses/by/2.0/

Module: "Demonstrations of Seeing using Sound"
By: Richard Hall, Jared Flatow
URL: http://cnx.org/content/m13219/1.2/
Pages: 9-11
Copyright: Richard Hall, Jared Flatow
License: http://creativecommons.org/licenses/by/2.0/

Module: "Final Remarks on Seeing using Sound"
By: Richard Hall, Jared Flatow
URL: http://cnx.org/content/m13220/1.1/
Page: 13
Copyright: Richard Hall, Jared Flatow
License: http://creativecommons.org/licenses/by/2.0/

Seeing Using Sound

Elec 301 Project - Fall 2005. Seeing using Sound transforms images into sound to aid blind people.

About Connexions

Since 1999, Connexions has been pioneering a global system where anyone can create course materials and make them fully accessible and easily reusable free of charge. We are a Web-based authoring, teaching and learning environment open to anyone interested in education, including students, teachers, professors and lifelong learners. We connect ideas and facilitate educational communities.

Connexions's modular, interactive courses are in use worldwide by universities, community colleges, K-12 schools, distance learners, and lifelong learners. Connexions materials are in many languages, including English, Spanish, Chinese, Japanese, Italian, Vietnamese, French, Portuguese, and Thai. Connexions is part of an exciting new information distribution system that allows for Print on Demand Books. Connexions has partnered with innovative on-demand publisher QOOP to accelerate the delivery of printed course materials and textbooks into classrooms worldwide at lower prices than traditional academic publishers.