Music Scope Headphones: Natural User Interface for Selection of Music


Masatoshi Hamanaka
PRESTO, Japan Science and Technology Agency
A.I.S.T. Mbox 604, 1-1-1 Umezono, Tsukuba, Ibaraki, 305-8568 Japan
m.hamanaka@aist.go.jp

Seunghee Lee
University of Tsukuba
Tennoudai 1-1-1, Tsukuba, Ibaraki, 305-8574, Japan
lee@kansei.tsukuba.ac.jp

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. © 2006 University of Victoria

Abstract

This paper describes a novel audio-only interface for selecting music, which enables us to select songs without having to click a mouse. With previous music players and normal headphones, we can hear only one song at a time, so we have to play pieces individually to select the one we want to hear from numerous new music files, which involves a large number of mouse operations. The main advantage of our headphones is that they detect natural movements, such as movements of the head or hand, while users are listening to music, so that users can focus on a particular musical source that they want to hear. By moving their head left or right, listeners can hear the source from a frontal position, as the digital compass detects the change in the direction they are facing. When listeners look up or down, the tilt sensor detects the change in the face's angle of elevation, and they can better hear a source that is allocated to a more distant or closer position. By putting their hand behind their ear, listeners can adjust the focus sensor on the headphones to focus on a particular musical source that they want to hear.

Keywords: Headphones, music interface, digital compass, tilt sensor, infrared distance sensor.

1. Introduction

Although we have recently been able to download a huge number of songs through Internet music delivery services, users listen to only a small number of them because opportunities to find unfamiliar musical pieces in the collection are limited. Our goal was to construct a system that would enable people to easily select musical sources that they had an affinity for from many unknown ones.

Previous music retrieval methods that use queries, such as similarity-based searching [1-3], text-based searching [4], or collaborative-filtering-based searching [5, 6], are useful for narrowing the number of musical pieces, but after the list of rankings has been provided we have to listen to songs one by one, because no consideration has been given to finding songs one has an affinity for from the list. Musicream [7], on the other hand, makes it possible to interact with many music collections by applying operations and providing functions for the order of play. Papipuun [8] and SmartMusicKIOSK [9] provide a music summary and allow quick listening in a manner similar to a stylus skipping on a scratched record. All these systems [7-9] enable us to save time by previewing songs from a list of rankings acquired from the results of music retrieval. However, these systems also force us to listen to songs one by one and involve many mouse operations.

In contrast, our system, called Music Scope Headphones, makes it possible to select a musical source from the many available without the need for mouse clicks or other visual manipulations, by detecting natural movements while users are listening to music and focusing on the particular musical source that they want to hear. The Music Scope Headphones provide a novel music selection interface that offers the following three functions.

1. Scoping function: enables us to scope many musical sources allocated in 2-dimensional space by moving our heads left or right or by looking up or down. The function lets us survey many songs at once and save time in previewing them.
2. Focusing function: highlights a particular musical source that users want to hear when they place their hand behind an ear. This function enables us to narrow the area in which sources are audible in 2-dimensional space, as if controlling the directivity of a microphone.
3. Switching function: seamlessly changes musical sources in 2-dimensional space through users' gestures, such as nodding or shaking their heads, turning them around, or leaning them to one side. For example, when users turn their head around, the next 10 musical sources in the order on the list acquired from a music retrieval system are allocated in 2-dimensional space.

We mounted three sensors on the headphones, i.e., a digital compass, a tilt sensor, and a focus sensor, which detect natural movements, such as that of the head or the placement of a hand behind an ear. This allowed us to use the three functions without the need for a display or a computer mouse. Users are freed from mouse operations and can select music much more actively.
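The switching function can be pictured as a small state holder over the ranked list. The following is a minimal Python sketch, not the authors' implementation (the paper's software runs on Max/MSP). The class name `SongSelector`, its method names, and the rule that turning around on the last page does nothing are assumptions made for illustration. The nod and shake behavior follows the description of the link for the switching function in Section 2.3, and treating "one song left" as the part scoping mode ignores the paper's extra condition that separately recorded tracks must be available.

```python
PAGE_SIZE = 10  # the paper allocates 10 songs in 2-dimensional space at a time


class SongSelector:
    """Hypothetical state holder for the gesture-driven switching function."""

    def __init__(self, ranked_songs):
        self.ranked = list(ranked_songs)  # ordered list from a retrieval system
        self.offset = 0                   # index of the first song on the page
        self.history = []                 # earlier candidate sets, for undo
        self.current = self.ranked[:PAGE_SIZE]

    @property
    def mode(self):
        # Assumption: a single remaining song means part scoping mode.
        return "part_scoping" if len(self.current) == 1 else "song_selecting"

    def turn_around(self):
        """Turning the head around: allocate the next 10 songs from the list."""
        if self.offset + PAGE_SIZE < len(self.ranked):  # assumption: no wrap
            self.offset += PAGE_SIZE
            self.history.clear()
            self.current = self.ranked[self.offset:self.offset + PAGE_SIZE]

    def nod(self, focused):
        """Nodding: keep the focused songs and discard the unfocused ones."""
        kept = [song for song in self.current if song in focused]
        if kept:
            self.history.append(self.current)
            self.current = kept

    def shake(self):
        """Shaking the head: return to the previous set of songs."""
        if self.history:
            self.current = self.history.pop()
```

A session would call `nod` with the songs currently inside the focus area, repeat until one song remains, and call `shake` to step back. The history stack is one simple way to support repeated narrowing followed by undo.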

Previously reported headphones with sensors to detect the direction users were facing or the location of the head could improve the sense of musical presence and create a realistic impression, but could not highlight parts according to users' wishes [10-12]. With these headphones it was difficult to hear a particular musical source clearly among many other sources, including some that users may have preferred not to hear. There are music spatialization systems [13, 14] that allow users to control the localization of each part in real time through a graphical interface. However, it is difficult to control each musical source's location through this interface.

This paper is organized as follows. Section 2 explains the functions of the headphones. Section 3 describes system processing, and Section 4 discusses the implementation. Sections 5 and 6 present the experimental results and the conclusion.

2. Music Scope Headphones

We constructed the Music Scope Headphones, which enable us to save time in previewing songs, based on the following three policies.

Reduced mouse operations
When selecting music with a computer, we generally have to play songs individually with many mouse operations, and these interruptions break the process of listening just as if a telephone were ringing. To solve this, we propose operations without mouse clicks, achieved by detecting natural movements while listening to music and using these to control the computer.

Easy preview of many songs
We wanted to increase the number of opportunities for encountering unfamiliar musical pieces in collections. However, the number of songs that can be previewed within a fixed amount of time is limited, because the larger the number of songs to be previewed, the shorter the time available to listen to each song. To solve this, we propose a novel way of selecting music by playing many musical sources at the same time.

No computer display
We wanted the system to be usable anywhere and at any time, such as at work or when riding or walking, without the need to see a computer display. We attempted to construct a system to investigate whether it was possible to select and manipulate songs without a display.

The Music Scope Headphones let users control an audio mixer through natural movements, and thus enable them to select a musical source that they want to listen to from numerous sound sources. We will now explain the problems and solutions we encountered with the Music Scope Headphones based on these policies.

2.1 How musical sources are scoped

To scope a particular musical source that the user temporarily wants to hear, it must be differentiated from the other musical sources. We automatically adjusted each musical source's volume and panpot so that it could be distinguished from other musical sources. That is, we increased that source's volume and turned its panpot to the center, while decreasing the volume of the other sources and turning their panpots left or right. In this way, a user can easily scope a particular musical source.

2.2 How motion is detected

Natural movements must be detected while the user is listening to music so that the audio mixer can be controlled through them. To enable this, we mounted the digital compass and the tilt sensor on top of the headband to detect the direction the user was facing and the face's angle of elevation. We also mounted the focus sensor on the outside of the right speaker to detect the distance from the hand to the ear (Figure 1). We prepared three focus sensor prototypes and evaluated how practical they were in an experiment.

Figure 1. Three sensors mounted to headphone: (a) digital compass, (b) tilt sensor, (c) focus sensor (infrared distance sensor).

2.3 How function and motion are linked

How usable the Music Scope Headphones are depends on the quality of the links between the functions and the users' natural movements while they are listening to music. Let us imagine the following scenario. We receive several session recordings from a childhood friend. It sounds like the friend is playing a saxophone on one of these recordings. In such a case, we would ordinarily search for songs with a saxophone part, and we then might want to hear the saxophone playing more clearly. We used the three links that follow to achieve this.

Link for scoping function
When users move their head left (right), the musical source normally heard from the left (right) side can be heard from the frontal position, as the digital compass detects the change in the direction they are facing. This allows users, through natural movements, to scope the musical source they want to hear most clearly and hear it from the front. When there are several musical sources at the front, users might not be able to hear the desired source clearly even after turning their head left or right to hear it from the front. In such a case, they can change the mix by moving their head up or down; the tilt sensor will detect the change in the face's angle of elevation. By looking up or down, users can raise the volume of the sources allocated farther away or nearer. Here, we changed each source's position in 2-dimensional space, as can be seen in the graphical user interface in Figure 2. The circle at the center indicates the position of the user's avatar and his/her head direction, and the circled numbers around the avatar indicate the positions of the sources. We also had several preset allocations for musical sources, and these were easy to change by putting one's head to one side (Figure 3).

Figure 2. GUI for locating positions of parts. The numbered circles (1 to 10) mark the positions of the musical sources around the avatar (currently looking forward).

Figure 3. Presets for allocation: (a) Circle, (b) Star, (c) Band, (d) Orchestra.

Link for focusing function
The focus sensor is used to detect the motion of users putting their hand behind an ear while they are listening to sound coming from the frontal position. The distance between the hand and ear determines the area in which sources are audible. For example, when users place their hand close to their ear, they can only hear the sources from the frontal position. When they remove their hand, they can hear all the sources except those behind them. When they put their hand in the middle position, they can hear the sources located in the front half. By adjusting the distance between their hand and ear in this way, they can control the focus level and highlight the source of interest.

Link for switching function
The system has two modes, a song selecting mode and a part scoping mode. In the song selecting mode, we can scope and preview 10 songs, taken in order from the list produced by a music retrieval system and allocated in 2-dimensional space. When converging on several songs using the focusing function in the song selecting mode, users can keep the focused songs and discard the unfocused songs by nodding their head (Figure 4 (a)). If they want more convergence, they only need to adjust the focus level and nod their head again. Conversely, users can defocus by shaking their head and return to the previous situation (Figure 4 (b)). When they have selected only one song and have a sound source where the tracks for each part have been recorded separately, the system changes to the part scoping mode. With this mode the Music Scope Headphones provide novel entertainment, in which users can "scope" onto the part they want to hear more clearly. They can return to the song selecting mode by shaking their head. Users can change the preset allocation by putting their head to one side (Figure 4 (c)) during the song selecting mode or the part scoping mode. When users turn their head around in the song selecting mode, the songs allocated in 2-dimensional space change to the next 10 songs from the list (Figure 4 (d)).

Figure 4. Link for switching function: (a) select the focused songs; (b) return to the previous situation; (c) change the allocation of the musical sources by presets; (d) change to the next 10 songs.

3. Processing

This section describes the processing flow for the system. We mainly describe sound processing and have omitted explanations for detecting gestures (nodding, shaking, putting the head to one side, and turning it around) because of word limitations. In the following, we use θ (−π ≤ θ < π) as the facing direction detected by the digital compass, φ (−π ≤ φ < π) as the face's angle of elevation detected by the tilt sensor, and δ (0 ≤ δ ≤ 1) as the distance between the hand and the ear detected by the focus sensor (Figure 5). We use radians as angle units and set the starting direction and angle of elevation to zero. We normalized δ from 0 to 1; the focus sensor could detect a distance from 0 to 3 cm. When the distance was 0 cm, δ was output as 0, and when the distance was 3 cm, δ was output as 1. When the distance was between 0 and 3 cm, δ ranged from 0 to 1.

Figure 5. Three sensors mounted to headphone (axes z and y; quantities θ, φ, and δ).

Pretreatment
We prepared sound source $S_n$ by recording a separate track for each part $n$ and allocating a position on the graphical user interface to each part (Figure 2). Here, $l_n$ ($0 \le l_n \le 1$) indicates the distance from the avatar to each part and $\theta_n$ indicates the direction of each part. We normalized $l_n$ so that the most distant part would have a value of 1.

Step 1
$h^{\phi}_n$ ($0 \le h^{\phi}_n \le 1$) was calculated as the amplification rate for each part, $n$, which changes depending on the angle of elevation, φ.
We used the following formula so that when users looked up (down), the volumes of parts located far from (near to) their position would increase.

$$h^{\phi}_n = \begin{cases} 0 & \tilde{h}^{\phi}_n < 0 \\ \tilde{h}^{\phi}_n & 0 \le \tilde{h}^{\phi}_n \le 1 \\ 1 & 1 < \tilde{h}^{\phi}_n \end{cases} \tag{1}$$

where

$$\tilde{h}^{\phi}_n = 1 + l_n \sin\phi - \frac{1}{m} \sum_{k=1}^{m} l_k \sin\phi, \qquad m\text{: number of parts.}$$

When we allocated the positions for all parts as in Figure 6 (a), the mixing console was as in Figure 6 (b) when φ was zero. When φ was negative, the mixing console was as in Figure 6 (c), indicating that the volume of parts located near to (far from) the avatar was increased (decreased).

Figure 6. Angle of elevation φ and mixing console: (a) position of musical sources; (b) looking horizontally; (c) looking down.

Step 2
$h^{\delta}_n$ was calculated as the amplification rate for all parts, which changes according to the distance between the hand and ear, δ. Here, $|a|$ indicates the absolute value of $a$, and $\theta'_n$ ($-\pi \le \theta'_n < \pi$) indicates the angle between θ and $\theta_n$.

$$h^{\delta}_n = \begin{cases} 1 & \pi\delta \ge |\theta'_n| \\ 0 & \pi\delta < |\theta'_n| \end{cases} \tag{2}$$

For example, when θ = π/3 and δ = 0.5, $h^{\delta}_n = 0$ corresponds to the parts located behind the user and $h^{\delta}_n = 1$ corresponds to the parts in front of the user (Figure 7). In this way, we can eliminate parts the user does not want to hear.

Figure 7. Distance from hand to ear δ and $h^{\delta}_n$.

Step 3
$h^{\theta}_n$ ($0 \le h^{\theta}_n \le 1$) is calculated as the amplification rate for all parts, which changes according to the direction. The $h^{\theta}_n$ output has a large value when the part is located in front of the user and becomes smaller when the part is located in another direction.

$$h^{\theta}_n = \begin{cases} 0 & \tilde{h}^{\theta}_n < 0 \\ \tilde{h}^{\theta}_n & 0 \le \tilde{h}^{\theta}_n \end{cases} \tag{3}$$

where

$$\tilde{h}^{\theta}_n = \begin{cases} 0 & \delta = 0 \\ 1 - \dfrac{\alpha\,|\theta'_n|}{\pi\delta} & \delta > 0. \end{cases}$$

When we allocated the positions of all parts as in Figure 2, the mixing console was as in Figure 8 (a) when users were looking left, as in Figure 8 (b) when they were looking straight ahead, and as in Figure 8 (c) when they were looking right.

Figure 8. Direction θ and mixing console: (a) looking left; (b) looking at center; (c) looking right.

We used an adjustable parameter, α (0 ≤ α < 1), to decrease the amplification rate when users placed their hand on their ear and δ < 1. When we allocated positions for all parts as in Figure 2, the mixing console was as in Figure 9 (a) when users moved their hand away from their ear, and as in Figure 9 (b) when they moved their hand toward their ear.
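Steps 1 to 3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' Max/MSP patch: all function names are hypothetical, the relative angle is taken as the part direction minus the facing direction wrapped into [−π, π), and the subtractions in equations (1) and (3) are read from the surrounding description (all parts keep full volume when φ = 0, and the rate falls off away from the front).

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def elevation_gains(distances, phi):
    """Step 1, equation (1): one rate per part, clipped to [0, 1].

    distances: normalized distances l_n (0..1) from the avatar to each part.
    phi: the face's angle of elevation in radians.
    """
    m = len(distances)
    mean_term = sum(l * math.sin(phi) for l in distances) / m
    return [min(1.0, max(0.0, 1.0 + l * math.sin(phi) - mean_term))
            for l in distances]


def focus_gate(rel_angle, delta):
    """Step 2, equation (2): 1 inside the audible sector, 0 outside."""
    return 1.0 if math.pi * delta >= abs(rel_angle) else 0.0


def direction_gain(rel_angle, delta, alpha):
    """Step 3, equation (3): largest in front, never negative."""
    if delta == 0:
        return 0.0
    return max(0.0, 1.0 - alpha * abs(rel_angle) / (math.pi * delta))


def part_gains(distances, directions, theta, phi, delta, alpha):
    """Product of the three rates for every part (input to Steps 4 and 5)."""
    gains = []
    for h_phi, theta_n in zip(elevation_gains(distances, phi), directions):
        rel = wrap_angle(theta_n - theta)  # assumed sign convention
        gains.append(h_phi * focus_gate(rel, delta)
                     * direction_gain(rel, delta, alpha))
    return gains
```

For instance, `part_gains([0.2, 1.0], [0.0, math.pi], theta=0.0, phi=0.0, delta=0.5, alpha=0.5)` keeps the part straight ahead at full volume and silences the part directly behind the user, because the latter falls outside the sector |θ'| ≤ πδ.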
Figure 9. Decreasing amplification rate while α > 0: (a) removing hand from ear (δ = 0), smooth distribution; (b) hand approaching ear (δ > 0), sharp distribution.

Step 4
$p_n$ ($0 \le p_n < 1$) is calculated as the left/right volume ratio depending on direction θ. Here, $p_n = 0$ indicates that the ratio is 0:1 and $p_n = 0.5$ indicates that it is 1:1. We used an adjustable parameter, β, to change the left/right ratio when users put their hand to their ear and δ < 1. When β > 0 and δ < 1, the panpots of the parts move to the back except for the part in the frontal position, and users can hear music as if focusing on the front part.

$$p_n = \frac{1}{2} + \frac{\beta\,\theta'_n}{\pi\delta} \tag{4}$$

where $\theta'_n$ is, as before, the angle between θ and $\theta_n$.

Step 5
The amplification rates acquired in Steps 1 to 4 are multiplied, and then the sound is output by summing up the sounds of all parts.

Right-side output:

$$S_{\mathrm{Right}} = \sum_{n} S_n\, h^{\phi}_n\, h^{\delta}_n\, h^{\theta}_n\, p_n \tag{5}$$

Left-side output:

$$S_{\mathrm{Left}} = \sum_{n} S_n\, h^{\phi}_n\, h^{\delta}_n\, h^{\theta}_n\, (1 - p_n) \tag{6}$$

4. Implementation

We presented the processing flow for the software in the previous section. The software runs on Max/MSP [16]. Here, we describe the implementation of the hardware. We implemented the headphones with the two policies that follow so that everyone can easily use them.

Lightweight yet strong.
Easily connected to a computer.

Headphones
We selected headphones (Sennheiser: HD212Pro) that had adjusters inside the headband, because we could mount the focus sensor outside the right headband, and it would therefore work stably even if the speakers were moved (Figure 1).

Sensors
We mounted the attitude detection module (Aichi Micro Intelligent: AMI302-ATD), which consists of the digital compass (MI sensor) and the tilt sensor, on top of the headband (Figure 1 (a) and (b)). Generally, the larger the angle of elevation, the larger the margin of error in the digital compass, because it detects the direction of the magnetic line of force. The main advantage of using the module was that the tilt sensor could correct the output of the digital compass. The detection resolution of the module was 2 degrees. We also mounted the focus sensor to the right of the headband. We compared three focus-sensor prototypes in the experiments with musical novices, which are described below, and we selected the infrared distance sensor (Sharp: GP2S40J) (Figure 1 (c)). The infrared distance sensor consists of an infrared emitter and an infrared receiver, and measures the distance from the sensor to an object by receiving the reflected infrared light. We prepared a circuit for mounting the infrared distance sensor, and we added a semi-variable resistor so that the sensor could detect distances from 0 to 3 cm.

Protectors
We made protectors for the sensor out of acrylic resin (Figure 1). We tested several colors for the resin and selected a light pink because it was affected least by sunlight from the windows and it widened the detection range of the sensor.

Circuit
We integrated the information from the sensors by using a microcomputer (Renesas: R8C/15) mounted inside the headphone speaker housing, which output a serial signal. We could therefore reduce the number of cores in the cable from the headphones to the computer. The microcomputer sent a signal with the output information from the sensors every 120 ms.
USB conversion
We used a USB converter (Silicon Labs.: CP2102), which converted the serial signal to USB. It was mounted in the middle of the cable from the headphones to enable easy connection to the computer. We mounted LEDs on all sensors to indicate whether they were connected to the computer. If there was a connection they blinked quickly; if there was no connection they blinked slowly, and we had to reconnect the USB cable.

Power supply
All the sensors and the microcomputer worked on the bus current of the USB, which simplified the connection of the headphones. All we needed were the headphones and the computer.

5. Experimental Results

We designed the Music Scope Headphones not only to enable a particular song to be selected from an ordered list acquired from music retrieval but also to highlight a particular instrument in the selected song that a user may want to hear more clearly. The system accepts both audio files and MIDI files. In the experiments, we used the RWC music database, which contains raw audio data before mix-down [15].

5.1 Evaluation of usability of focus sensors

Here, we discuss our evaluation of how usable the three focus-sensor prototypes were. They were (a) a variable resistor, (b) a bend sensor on a plastic lever, and (c) an infrared distance sensor (Figure 10). All headphone sets used the same digital compass and tilt sensor. We asked three musical novices to find a particular instrument, which we specified randomly, while listening to a song using the headphones. We used the song RWC-MDB-J-2001 No. 38 [15], which was played by 10 instruments located around the avatar as in Figure 2. The subjects already knew the sound of each instrument and were allowed to use all the headphones several times before the experiment to familiarize themselves with their operation. The adjustable parameters α and β described in Section 3 were tuned by the subjects as they wanted. One trial of the experiment proceeded as follows.

(1) Before the song was started, we specified an instrument to the subjects.
(2) We started the song at a midpoint randomly selected for the specified instrument.
(3) We measured the time the subjects needed to find the instrument.

The locations of all instruments were randomly changed at every trial. The musical novices changed their headphones after every 10 trials.

Figure 10. Three types of focus sensors: (a) variable resistor; (b) bend sensor; (c) infrared distance sensor.

Table 1 lists the average results from 100 trials. While the bend sensor was no less accurate than the variable resistor or the infrared sensor, it was attached to a plastic lever, which made it difficult to control precisely. Subjects A and C could find an instrument more quickly when using the infrared focus sensor. Subject B, on the other hand, could find an instrument more quickly when using the variable resistor. We selected the infrared distance sensor because the average time for the three subjects was the shortest.

Table 1. Comparison of three kinds of focus sensors.

| Subject   | Variable resistor | Bend sensor | Infrared distance sensor |
|-----------|-------------------|-------------|--------------------------|
| Subject A | 1.84 sec.         | 1.28 sec.   | 1.12 sec.                |
| Subject B | 0.72 sec.         | 1.04 sec.   | 0.84 sec.                |
| Subject C | 1.02 sec.         | 2.01 sec.   | 0.74 sec.                |
| Average   | 1.19 sec.         | 1.44 sec.   | 0.90 sec.                |
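The averages in Table 1, and the choice of sensor that follows from them, can be recomputed from the per-subject times. The short Python check below uses only the values printed in the table; the variable names are illustrative.

```python
from statistics import mean

# Per-subject search times in seconds (Subjects A, B, C), copied from Table 1.
times = {
    "variable resistor": [1.84, 0.72, 1.02],
    "bend sensor": [1.28, 1.04, 2.01],
    "infrared distance sensor": [1.12, 0.84, 0.74],
}

# Average over the three subjects, rounded to two decimals as in the table.
averages = {name: round(mean(values), 2) for name, values in times.items()}

# The sensor with the shortest average time.
best = min(averages, key=averages.get)

print(averages)
print(best)
```

The recomputed averages match the table's last row, and the minimum falls on the infrared distance sensor, even though Subject B was individually fastest with the variable resistor.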

5.2 Evaluations of usability for selecting songs

We evaluated whether users could select a song by using the Music Scope Headphones. We asked three musical novices to find a song with a soprano saxophone from the RWC-MDB-J-2001 database [15]. It had fifty jazz songs, and only one song had a soprano saxophone part. We measured the time each subject needed to find the soprano saxophone. The subjects had not heard the songs in the database before the experiment, except for RWC-MDB-J-2001 No. 38, which we had used in the experiment in Section 5.1. After measuring the time using the Music Scope Headphones, we measured the time for the same trial using Windows Media Player, which is a standard music player pre-installed in Windows XP. The time for each subject to find the musical instrument was measured only once for the Music Scope Headphones and once for Windows Media Player.

Table 2 lists the results obtained with the Music Scope Headphones and Windows Media Player. All the subjects could find the song more quickly when using our Music Scope Headphones, even though the Windows Media Player trial had the advantage that the subjects may have memorized the songs during the first trial with the Music Scope Headphones. Our experiment thus indicated that the Music Scope Headphones were superior for previewing songs from an ordered list.

Table 2. Comparison of our system and standard music player.

| Subject   | Music Scope Headphones | Windows Media Player |
|-----------|------------------------|----------------------|
| Subject A | 224 sec.               | 845 sec.             |
| Subject B | 423 sec.               | 1145 sec.            |
| Subject C | 642 sec.               | 751 sec.             |
| Average   | 429 sec.               | 914 sec.             |

6. Conclusion

The Music Scope Headphones enabled wearers to control an audio mixer through natural movements, which enabled them not only to select a song from an ordered list acquired from music retrieval but also to highlight a particular instrument in the selected song that they wanted to hear more clearly. Three sensors were mounted on the headphones for detecting natural movements: a digital compass, a tilt sensor, and a focus sensor. This freed users from mouse operations so they could select music much more actively.
We tested how usable three kinds of focus sensors were and found, from the average time it took three subjects to locate an instrument, that an infrared distance sensor was better than either a variable resistor or a bend sensor. We also tested how efficient the headphones were in selecting songs, and the results revealed that they performed better than the standard Windows Media Player in selecting a particular song from fifty.

We are now developing other applications for the headphones. Figure 11 shows one in which the brightness of the lights at the music stands is controlled according to the sound level. This allows the user to experience all sound levels visually as well as aurally. This should help musical novices who do not know what individual instruments sound like to learn the relationship between these and the entire piece. The video is available at http://staff.aist.go.jp/m.hamanaka/video/. We plan to use these headphones with music retrieval based on voice recognition to construct a system in which a display and a mouse are unnecessary.

Figure 11. Lighting depending on sound levels at music stands.

References

[1] G. Tzanetakis and P. Cook. Musical genre classification of audio signals. IEEE Trans. on Speech and Audio Proc., 10(5): 293-302, 2002.
[2] F. Vignoli and S. Pauws. A music retrieval system based on user-driven similarity and its evaluation. In Proc. of ISMIR 2005, pp. 272-279, 2005.
[3] E. Pampalk. A MATLAB toolbox to compute music similarity from audio. In Proc. of ISMIR 2004, pp. 254-257, 2004.
[4] T. Sodring and A. F. Smeaton. Evaluating a music information retrieval system - TREC style. In Proc. of ISMIR 2002, pp. 71-78, 2002.
[5] W. W. Cohen and W. Fan. Web-collaborative filtering: Recommending music by crawling the Web. WWW9/Computer Networks, 33(1-6): 685-698, 2000.
[6] A. Uitdenbogerd and R. van Schyndel. A review of factors affecting music recommender success. In Proc. of ISMIR 2002, pp. 204-208, 2002.
[7] M. Goto and T. Goto. Musicream: New Music Playback Interface for Streaming, Sticking, Sorting, and Recalling Musical Pieces. In Proc. of ISMIR 2005, pp. 404-411, 2005.
[8] K. Hirata and S. Matsuda. Interactive Music Summarization Based on GTTM. In Proc. of ISMIR 2002, pp. 86-93, 2002.
[9] M. Goto. SmartMusicKIOSK: Music Listening Station with Chorus-search Function. In Proc. of UIST 2003, pp. 31-40, 2003.
[10] O. Warusfel and G. Eckel. LISTEN - Augmenting Everyday Environments through Interactive Soundscapes. In Proc. of IEEE VR2004, pp. 268-275, 2004.
[11] J. Wu, C. Duh, M. Ouhyoung, and J. Wu. Head Motion and Latency Compensation on Localization of 3D Sound in Virtual Reality. In Proc. of ACM VRCIA1997, pp. 15-20, 1997.
[12] C. Goudeseune and H. Kaczmarski. Composing Outdoor Augmented-reality Sound Environments. In Proc. of ICMC 2001, pp. 83-86, 2001.
[13] F. Pachet and O. Delerue. A Mixed 2D/3D Interface for Music Spatialization. In Proc. of ICVW1998, pp. 298-307, 1998.
[14] F. Pachet and O. Delerue. On-the-Fly Multi-track Mixing. In Proc. of AES2000, 2000.
[15] M. Goto, H. Hashiguchi, T. Nishimura, and R. Oka. RWC Music Database: Popular, Classical, and Jazz Music Databases. In Proc. of ISMIR 2002, pp. 287-288, 2002.
[16] cycling74. http://www.cycling74.com/products/maxmsp/, 2006.