Guidelines for auditory interface design: an empirical investigation


Loughborough University Institutional Repository

Guidelines for auditory interface design: an empirical investigation

This item was submitted to Loughborough University's Institutional Repository by the/an author.

Additional Information: A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy at Loughborough University.

Publisher: © Dimitrios Ioanni Rigas

Rights: This work is made available according to the conditions of the Creative Commons Attribution-NonCommercial-NoDerivatives 2.5 Generic (CC BY-NC-ND 2.5) licence.

Please cite the published version.




Guidelines for Auditory Interface Design: An Empirical Investigation

by

Dimitrios Ioanni Rigas

Doctoral Thesis

Submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy of Loughborough University

29th November 1996

© Dimitrios Ioanni Rigas 1996


7 " the world little knows how many of the thoughts and theories which have passed through the mind of a scientific investigator, have been crushed in silence and secrecy by his own severe criticism and adverse examination; that in the most successful instances not a tenth of the suggestions, the hopes, the wishes, the preliminary conclusions have been realised...!! (Modern Culture, Edited by Youmans, p. 272, Macmillan and Co.) To Afy Parents,. ',..,'... '" ~.". ~." ~ ',., /. ~... :; ~~. ~..) ~~,,... " \,..... ~ I.. " '.,'..... ". t i., :;<:~.::j :.ijt~~;, ':' 1,..'" --...,. "; ",,:,::,'1.. ~.,...."....,.... ~.', I. ~:. \. ::..,...,'.". :...:..:... ''"''... ",I... ~... '. ~"" "Suppose that every tool we had could perform its task, either at our bidding or itself perceiving the need,... then master-craftsmen would have no need of servants nor masters of slaves... " (Aristotle, The Politics, Sinclair and Saunders, Penguin 1982, p.65).

Abstract

This thesis examines the use of music for communicating information on the Human-Computer Interface. The current use of auditory signals (including the limited use of music) is reviewed and the perceptual advantages and drawbacks inherent in the use of music are identified. A set of exploratory experiments was carried out to determine how pitch, rhythm, timbre, and stereophony can be used to communicate interface information to users with average musical ability. The experiments conclude that a sequence of notes of rising pitch can be used to give an indication of length by using either the Diatonic or Chromatic scale. No cultural differences were observed within the sample of subjects used. Experiments with timbre have shown that there is a small number of instruments which subjects can recall, and that timbre can be used to distinguish different types of information provided the instruments are taken from different musical families. Two further experiments were carried out to examine whether music can be used to lessen information requirements on the visual channel and to probe the use of rhythmic elements in algorithmic audiolisation. The results of these experiments were then used to design an experimental prototype tool for communicating graphical information to blind users (the AudioGraph). Further experiments were used to verify the approach taken - that of using pitch sequences to communicate co-ordinate positions, cursor movement, graphical shapes and dimensions, control actions, and graphical area scanning. The tool was then used extensively by blind users from the Royal National Institute for the Blind, who performed graphical operations with it and provided feedback on its usefulness. Overall, blind users were able to carry out a number of graphical operations successfully using music alone. The experience gained from the earlier experimentation and the AudioGraph has resulted in a proposal of a three-level approach to musical message design.

Acknowledgements

I would especially like to express my sincere gratitude to my supervisor, Professor J. L. Alty, and to my director of research, Professor E. Edmonds. Many thanks also go to the Department of Computer Studies at Loughborough University and the Royal National Institute for the Blind. I wish to acknowledge and particularly thank my parents for their constant support and encouragement.

Declaration

Entered words that we have reason to believe constitute trademarks have been designated as such. However, neither the presence nor absence of such designation should be regarded as affecting the legal status of any trademark.

Contents

1 Introduction to the Thesis
  1.1 Introduction
  1.2 The Need for Research into use of the Auditory Channel in Interface Design
  1.3 The Spectrum of Potential Users who could Benefit from the Use of the Audio Channel in Interfaces
  1.4 Thesis Methodology
  1.5 Thesis Contribution
  1.6 A General Outline to the Thesis

2 Auditory Human-Computer Interaction
  2.1 Introduction
  2.2 Aspects of Acoustics and Perception
    2.2.1 Perception of Frequency
    2.2.2 Perception of Tone Combinations
    2.2.3 Perception of Space
  2.3 Listening - What Does it Involve?
  2.4 Perceptual Aspects of Music
    2.4.1 Music and Language
    2.4.2 Perception of Timbre
    2.4.3 Interpreting Music
    2.4.4 Memory of Individual Notes
    2.4.5 Memory in Extended Music
    2.4.6 Studying Perception of Continuous Music
  2.5 Enabling Technology for Audio Systems
  2.6 The Application of Audio to Interface Design
    Auditory Icons, Earcons and Guidelines
    Auditory Applications in Interfaces: SonicFinder; Soundtrack; The Auditory SharedARK; Audio Projection in Window Space Fields; Sound-Graphs; Auditory-Enhanced Scrollbar; Other Auditory Applications; Communicating Hierarchical Menus
    Auditory Software Environments: LogoMedia; WinProcne/HARP; InfoSound
  2.7 Visually Impaired Computer Users
    Output Media of Non Visual Display
    Human Factors
  2.8 Summary

3 Communicating Using Music
  3.1 Introduction
  3.2 Research Approach and Tools Used
    Musical Structure and Understanding
    Tools Used
    Subjects and Feedback
  3.3 Ascending Pitch Note Sequence Experiments
    One-Sequence
    Two-Sequences
    Discussion
  3.4 Musical Instruments Experiments
    Discussion
  3.5 Stereo Perception
  3.6 Communicating with Music
  3.7 Reducing Visual Complexity
    The PCTE Object Management System
    Information to be Communicated and Mapping
    Single Communication of Information
    Multiple Communication of Information
    Discussion
  3.8 Supporting Algorithmic Auralisation
    3.8.1 Sorting Algorithm
    3.8.2 Objectives and Musical Mapping
    3.8.3 Continuous Musical Messages Experiment
    3.8.4 Discussion
  3.9 Overall Discussion
  3.10 Summary

4 AudioGraph: Experiments in the Graphical Domain
  4.1 Introduction
  4.2 Research Objectives
  4.3 AudioGraph: An Experimental Framework
    System Architecture
    Overall Description of Functionality
    User Interface Organisation and Strategy
    Graphical Drawing Area
    User Control Panel Area
    User Interface Input
    Musical Presentation of Single Graphical Objects
    Scanning the Graphical Drawing Area
    Local Scanning from the Auditory Cursor
  4.4 Experiments with Musical Mappings
    Location Experiments: Sequence of Notes Experiment; Reference-Actual Notes Experiment; Actual-Notes Experiment; Discussion of Location Experiments
    Navigation of the Cursor
    Communication of Graphical Shapes
    The Graphical Size Experiments
    The Cursor's Local Scanning Experiments: Identifying Position of the Cursor; Identifying Objects Around the Cursor
    Experiments with Editing
  4.5 Experiments in Perception of Diagrams
    Arbitrarily Arranged Objects
    Meaningfully Arranged Objects
    Interpretation under Different Perceptual Contexts
    Categorising Diagrams According to Content
    Feedback from Structured Interviews
  4.6 Critical Assessment from RNIB
  4.7 Overall Discussion
    The Basic Design of the Prototype
    Moving the Cursor to a Particular Location
    Navigating with the Cursor
    Move Successfully between Graphics and Control Mode
    Understand Editing Operations
    Recognise a Graphical Object
    Carry out a Set of Actions in Sequence
    Select a Shape and then Alter its Size
    Select a Control Action and Provide a Parameter
    Scanning the Space
    Understand a Complex Collection of Shapes
    Users' Views of the Utility of the Approach
    Overall Assessment
  4.8 Conclusion
  4.9 Summary

5 Empirical Guidelines and Final Comments
  5.1 Introduction
  5.2 Designing an Interface Using Music
  5.3 First Level: Designing Musical Mappings
    One-Meaning Messages
    Multiple-Meaning Messages
    Pitch Usage
    Rhythm Usage
    Semantic Tunes
    Musical Instrument Usage
    Stereophony
  5.4 Second Level: Creating Perceptual Context
  5.5 Third Level: Reasoning and Semantic Coding
    Application Dependent User Lexicon
  5.6 Future Work
    AudioGraph
    Introduction of Auditory-Musical Standards
    A Generic Musical Library
    Software Debugging
  5.7 Epilogue

A Tools Used
  A.1 MIDI
  A.2 Sound Blaster Kit

B Musical Test

C Raw Data and Analysis
  C.1 Perceiving Sequences
    C.1.1 Diatonic (Major) Scale: One-Sequence; Two-Sequences
    C.1.2 Chromatic Scale: One-Sequence; Two-Sequences
  C.2 Instrument Recognition Experiments
  C.3 AudioGraph
    C.3.1 Sequences of Notes Experiment
    C.3.2 Reference-Actual Notes Experiment
    C.3.3 Actual-Notes Experiment
    C.3.4 Dimensions of Objects

Chapter 1
Introduction to the Thesis

1.1 Introduction

This thesis investigates the use of music as a communication metaphor in user interfaces. The emphasis is on examining how structured music might be used in an interface, not so much on making better tools and interfaces for blind people. The term music refers to musical elements and structures, such as pitch, rhythm, melody, and harmonic sequences produced by musical instruments or computational equivalents. Although music is a rich medium containing numerous structures introduced by musicians over many years of human evolution, and we live in an age where multimedia systems are fully capable of producing musical sounds relatively easily and effortlessly, the use of music in interfaces (not only with computers, but also with other machines) is currently at a relatively low level. Music (and indeed the auditory channel as a whole) has been neglected in the development of user interfaces, possibly because very little is known about how humans understand and process music. It is not intuitively obvious how to use musical structures in interface design.

Current user interfaces focus heavily on visual interaction. The consequence of this is that user interfaces have become more and more visually crowded as the user's needs for communication with the computer increase. In addition, the development of complex visual user interfaces creates difficulties for computer users with special needs (such as the blind), who usually try to bridge this gap with the use of synthesised speech. Although the use of speech appears to offer a way of interpreting visual user interfaces to blind users, its usefulness decreases as the complexity of the visual interface increases. Furthermore, there are applications where communication by speech is not always satisfactory.

There are problems, for example, in using speech to describe a diagram of reasonable complexity or to express a complicated mathematical formula. Music may well provide alternative ways of conveying information to the user. This thesis examines this hypothesis by identifying some common musical structures and experimenting with these structures in a variety of circumstances with blind and non-blind users. The results are used as a basis for producing a set of general-purpose guidelines to assist interface designers using music either as an output medium or in conjunction with other audio and visual media.

1.2 The Need for Research into use of the Auditory Channel in Interface Design

It is important that the auditory channel should be further investigated for the following reasons:

- The auditory channel has been somewhat neglected in the area of user interface design. This is despite the fact that auditory interaction is one of the primary forms of human interaction.
- Music has a number of powerful properties, such as pitch, rhythm and melody, which ought to be able to convey rich messages from software components to the user.
- Music, as well as other forms of audio, is of particular value when the user cannot be disturbed visually.
- The visual channels are becoming very crowded, especially in circumstances where additional visual means are not particularly useful. For instance, current monitors are often very overcrowded, yet designers still try to present more information visually.
- When output is directed to users who do not have constant visual contact with the VDU (Video Display Unit) screen, an alert or interrupt is required.
- This over-emphasis on visual communication presents serious interface difficulties for visually impaired users.

1.3 The Spectrum of Potential Users who could Benefit from the Use of the Audio Channel in Interfaces

In the current information age, more and more people, with diverse backgrounds and experience, use computers as part of their daily activities, both in their work and home environments. Music is also an integral part of most people's daily lives. Research in the area of using auditory-musical stimuli in HCI may well benefit a large proportion of computer users. The audio channel (and music in particular) could benefit those users with special HCI needs (for example, the visually impaired). The use of music as a communication metaphor could assist in a number of interface situations, including the following:

1. Reducing the complexity of visually crowded user screens by presenting some information using music.
2. Audiolisation of the internal execution of algorithms. This has particular implications for understanding and debugging programs by listening to them (a sketch follows at the end of this section).
3. Presenting information of a graphical nature to blind users, who usually interact with computers using speech.

However, one must also note that the acoustical channel as an interaction medium has a number of characteristics which impose limitations on the nature of acoustical interaction with humans. Two of these limitations are that an acoustical signal is an 'object in time' and that there are certain limits to the speed of human processing of acoustical information, just as there are limits involved in reading text.
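To make point 2 concrete, the following is a minimal sketch of algorithm audiolisation, assuming a hypothetical play_note output routine (a real system would send MIDI note events to a synthesiser); it illustrates the idea only and is not the mechanism used in this thesis.

    # Minimal sketch of algorithm audiolisation: every comparison made by a
    # bubble sort is sonified by playing the two compared values as pitches.
    # 'play_note' is a hypothetical stub standing in for real MIDI output.

    def play_note(midi_note, duration_ms=120):
        print(f"note {midi_note} for {duration_ms} ms")  # stub: print, not play

    def value_to_pitch(value, low=36, high=96):
        # Map a data value in [0, 100] onto a MIDI pitch range.
        return low + round((high - low) * value / 100)

    def audiolised_bubble_sort(data):
        items = list(data)
        for i in range(len(items)):
            for j in range(len(items) - 1 - i):
                # Sonify the comparison: the listener hears disorder as
                # falling pitch pairs, which disappear as the data sorts.
                play_note(value_to_pitch(items[j]))
                play_note(value_to_pitch(items[j + 1]))
                if items[j] > items[j + 1]:
                    items[j], items[j + 1] = items[j + 1], items[j]
        return items

    audiolised_bubble_sort([42, 7, 93, 18, 66])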

1.4 Thesis Methodology

The methodology adopted by this thesis is:

1. Literature survey.
(a) A study of current and previous applications of audio and music in HCI.
(b) A brief discussion of music and how it has evolved in human evolution.
(c) A study of perception, memory and other cognitive aspects of acoustical input processing, especially music.

2. Structured laboratory experiments. These experiments allow the collection of data for statistical analysis and the subsequent identification of perceptual patterns, as well as the capabilities and limitations of using one or another form of musical message to convey meaning. Laboratory experiments have been used extensively in this thesis to obtain a general view of how non-musically educated people process musical stimuli.

3. User post-experimental interviews and questionnaires. These post-evaluation procedures probe user attitudes, such as views on, and difficulties experienced with, musical interaction.

4. An integrated study of all the results generated. This study takes into consideration all the experimental findings by comparing similarities and differences in the results, finding musical patterns in a number of different problem domains which have shown consistent user perception and understanding. This study also provides a set of experimentally derived guidelines and mechanisms for developing musical messages as part of an HCI system.

1.5 Thesis Contribution

Acoustical information (including music) has been an integral part of sensory input for most of human evolution. The thesis demonstrates that an average, non-musically educated listener can extract information from musical messages. It shows that appropriate musical messages alone can be used as an interface mechanism to communicate aspects of the operation of a simple algorithm, to enhance visual interaction, or to understand graphical information. Inevitably, certain limitations exist for acoustical information. It is applicable in a number of different interface situations, but it cannot always substitute for the visual or other channels. Musical stimuli can be used when acoustical interaction matches the user's perceptual needs in the problem domain. Music can be used in multimedia systems as an additional medium, but there are clearly situations where it fails to offer the same degree of perceptual detail as the visual medium.

Musical interfaces, when combined with other non-visual interaction mechanisms, offer a promising alternative for blind computer users, who currently have considerable difficulty in using computer systems with their emphasis on visual interfaces. Speech, which has been used quite extensively in blind user interaction, fails in certain circumstances to offer a totally satisfactory communication mechanism. For example, reading co-ordinates to blind users helps them to identify the location of a graphical object accurately, but as the number of locations increases, the blind user fails to remember the information provided. Some of the experimental results in this thesis suggest that when speech is integrated with musical stimuli, more complex interactions can be supported. Musical messages have been found useful for conveying information to blind users such as:

- Location within space, and graphical objects (e.g., circles, squares).
- Editing operations such as contracting and expanding.
- Simple diagrams showing something meaningful.

Experiments under the AudioGraph experimental framework have shown that relative location, using musical representations for co-ordinates within a particular space, is perceived by blind users with a reasonable degree of accuracy. Absolute location within the graphical drawing area of the AudioGraph was not possible, because people cannot usually perceive absolute pitch. However, with the aid of sequences of notes ascending or descending in pitch, blind users perceive location by comparing relative differences of pitch, with the additional help of stereo. A sketch of this style of mapping follows below.

Finally, from all the experimentation, a number of lessons were learned about using music to communicate in different interface situations. This, along with the experimental data, enabled us to produce a set of guidelines for using music in interfaces.
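The following sketch illustrates the style of mapping described above: the x co-ordinate rendered as stereo pan and the y co-ordinate as the length of a rising run of notes, so that location is judged from relative pitch and balance. The function names and parameter ranges are illustrative assumptions, not the AudioGraph implementation.

    # Hypothetical sketch of communicating an (x, y) location musically.
    # x is mapped to stereo pan, y to the length of a rising note sequence.

    def location_to_events(x, y, width=100, height=100):
        pan = x / width                    # 0.0 = far left, 1.0 = far right
        steps = 1 + round(7 * y / height)  # larger y -> longer rising run
        base = 48                          # starting MIDI note (C3)
        return [(base + 2 * i, pan) for i in range(steps)]

    # Each tuple is (MIDI note, pan); a real system would send these to a
    # stereo MIDI device rather than printing them.
    for note, pan in location_to_events(25, 50):
        print(f"note {note}, pan {pan:.2f}")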

1.6 A General Outline to the Thesis

This research work is presented in a number of chapters and appendices so that the documentation is readable and easy to comprehend. There are five chapters and three appendices. A short description of the chapters and appendices follows:

Chapter 1: Introduction to the Thesis provides a brief introduction to the research work carried out and positions the thesis within the general context of HCI research.

Chapter 2: Auditory Human-Computer Interaction introduces and reviews the use of audio, outlines possible applications of audio and music in HCI, and examines other aspects related to auditory-musical interfaces which place the research work into context.

Chapter 3: Communicating Using Music documents a set of experiments focusing on the listener's general perception of musical structures such as pitch, rhythm, timbre and stereophony. The possibility of using music to reduce visually crowded displays is also empirically investigated. The chapter also looks into the auralisation of aspects of a sorting algorithm using musical structures.

Chapter 4: AudioGraph: Experiments in the Graphical Domain discusses experiments performed under an experimental framework prototype (AudioGraph) designed to assist blind users in handling information of a graphical nature.

Chapter 5: Empirical Guidelines and Final Comments suggests how music can be used in auditory interface design through the development of experimentally derived guidelines showing how some musical structures can be used as communication metaphors in user interfaces.

In addition, three appendices are provided:

Appendix A: Tools Used. This appendix contains information about the MIDI and Sound Blaster kit tools.

Appendix B: Musical Test. The main questionnaire used for identifying musical knowledge in subjects.

Appendix C: Raw Experimental and Processed Data. This appendix presents the raw experimental data.

Chapter 2
Auditory Human-Computer Interaction

2.1 Introduction

This chapter provides an overall review of the existing experimental literature relevant to the application of music in user interfaces. More specifically, the chapter reviews fundamental research work, including the following:

1. Aspects of acoustics and psychoacoustics - a discussion of the perceptual concepts of acoustical signals, memory aspects of tone combinations and musical properties.
2. General concepts of Human-Computer Interaction (HCI) - a brief discussion of the different types of interfaces, perceptual and sensory concepts and current research development in auditory and musical HCI.
3. Technological concepts used to develop auditory interfaces and the use of the MIDI (Musical Instrument Digital Interface) approach.
4. Auditory interfaces and experiments using sound as a communication metaphor in HCI.

The process of interaction which takes place between a human and a computer has been termed the Human-Computer Interface, the Man-Machine Interface, the Human-Systems Interface, Computer-Human Interaction and Human-Computer Interaction.

However, the term HCI (Human-Computer Interaction) will be the common abbreviation used throughout this thesis. The main goal of HCI is the production of an interface system which facilitates an effective, efficient, comprehensible, consistent and usable means of manipulating and operating computer resources, as well as their underlying software, from a human perspective. Major concerns [1] in HCI include the production of interfaces which:

- Provide clear and abstract user operations.
- Support the user to a maximum extent in convenient usage of the interface.
- Represent data and information in a relatively easy manner.

For a number of years, researchers have attempted to establish effective interaction mechanisms between tools, machines and humans. Early in the history of computing, the command line interface was developed, which provided a primitive means of interfacing between a human being and a computer in a visual manner. A further development of command line interaction was the introduction of batch languages, which simply incorporated the idea of a collection of user commands written in a file which could then be executed. The Unix operating system, although primarily a command driven system, employed the batch language concept at a high level by offering a script facility for executing a collection of commands in order to accomplish a given task. Programming languages provide another form of interaction, which allows a user to interact with the underlying computer's operating system via a program. A large number of programming languages exist for accomplishing different tasks. Loy and Abbott [2] have remarked that a programming language can be viewed as an interface between the problem space and the solution space, as well as a mechanism for providing an appropriate level of knowledge abstraction which may contribute towards the effectiveness of the interface. A number of domain-specific interfaces have thus been developed for use by scientists, designers and artists, musicians, medical workers, programmers and others.

Human-computer interaction has to take place through the various senses. Historically, Aristotle identified five senses - hearing (audition), vision, smell (olfaction), taste (gustation) and touch (cutaneous sensitivity). Beyond these five basic senses, there are a number of others, such as the static senses (equilibrium) and the muscle senses (kinesthesis). In recent years, a set of organic senses has also been recognised, covering ill-defined experiences such as thirst and hunger [3]. The hearing sense is regarded as one of the most important senses after vision (which is normally regarded as occupying the leading position).

Hearing depends upon sound waves or vibration, whereas vision depends upon light waves. Sound is produced by variations in air pressure.

2.2 Aspects of Acoustics and Perception

Living systems interact with each other as well as with their environment. These interactions are based on processes which include pattern recognition and information processing. The brain is capable of performing complicated and fundamental processes with input directed from the senses. Higher level functions (in the brain) can recall stored information and other presentations obtained through the senses. They can analyse, manipulate and store this information. When no external input is involved, this process of the brain is defined as 'thinking' [4]. There are a number of physical and biological concepts involved in communicating using sound, as shown in figure 2.1.

[Figure 2.1: A representation of some of the physical and biological systems and functions relevant to music, adapted from [4]. The diagram traces the chain from the source (energy supply, excitation mechanism and vibrating element determining the fundamental tone characteristics, converted into air pressure oscillations), through the medium and its boundaries (sound propagation, reflection, absorption and reverberation), to the receptor (conversion into neural oscillations; processing, imaging, identification, storage and transfer to other brain centres).]

The functions, as explained in Roederer [4], which are relevant to music are:

1. Source. The elements involved in the source are:
- Primary excitation mechanism. This is the primary energy source.
- Fundamental vibrating element. This determines the actual pitch of the tone and controls the excitation mechanism.
- Resonator. This converts the oscillations of the vibrating element and provides its final timbre.

2. Medium. This transmits the sound between the source and receptor. The boundaries of the medium, such as walls and floors, must also be considered, because these affect sound propagation by reflection and absorption.

3. Receptor. This involves the following:
- Eardrum. This detects the sound waves and pressures reaching the ear and converts them into mechanical vibrations which are directed to the inner ear.
- Auditory nervous system. This transmits the neural signals to the brain for processing.

Figure 2.2 shows Roederer's (1987) [5] stages of object perception. External stimuli are processed through various stages in the sensory systems. For the visual sense, there are 'objects in space', but in an acoustical space the 'objects are in time', because of the temporal nature of sound waves. An external acoustical object in time first needs to be converted into a neural signal. This is then passed to the primary cortical receiving areas, which are situated in the frontal lobes. Then, it is communicated to the rest of the brain. Although figure 2.2 describes the visual sense, it can be thought of as describing the auditory sense as well.

The human brain collects and stores information throughout its lifetime. Roederer (1994) remarks that 'the act of remembering', or memory recall of a sensory event, consists of the re-elicitation or 'replay' of the particular distribution of signals which was specific to the original sensory event [4]. Pribram (1977) has attempted to describe the biological representation of remembered events in the distributed memory of the brain [6]. This representation indicates the absence of 'photographic' coding and 'imaging of environmental scenes' (Kohonen (1988)) [7]. Photographic coding defines an absolute point-to-point correspondence between the object presented and the neural representation in the brain.

[Figure 2.2: The stages involved in object perception [5]. Environmental signals pass from the peripheral stimulus to the primary areas (feature detection), the secondary areas (feature integration), the associated areas ('sameness' signal, recognition), and the frontal lobes (analysis), with input from the other senses feeding into the later stages.]

An alternative representation scheme - hologic coding - defines a complex mapping between characteristics or points of the 'object' and the neural representation. A characteristic of this hologic mode of storage is a process called associative recall (Kohonen, 1988) [7]. This process enables the replay of a particular neural activity (which represents a particular object) from a cue other than the original one. It is on this principle that optical illusions are based.

The human brain is symmetrically divided into two hemispheres (see figure 2.3). The left cortex is connected to the right side of the body and vice versa. There are 200 million fibres (the corpus callosum) connecting the two hemispheres. In the process of evolution of the human brain, each hemisphere has become specialised for particular functions. Analytical and sequential functions of language have become the duty of the left, dominant hemisphere, and the right hemisphere has taken over the more holistic functions, such as spatial integration or the synthesis of (instantaneous) patterns of neural activity [9].

[Figure 2.3: The separation between the left and right hemispheres of the brain for auditory tasks (adapted from [8]). Left hemisphere: stop consonants; phonological attributes; comprehension of speech; propositional speech; analysis of nonsensical speech sounds; spoken text (verbal content); rhythm; short-term melodic sound sequences; verbal memory. Right hemisphere: steady vowels; stereotype attributes (rhythm in poetry); intonation of speech; environmental and animal sounds; emotional content of speech; pitch, timbre, tonality and harmony; sung text (musical and phonetic content); holistic melody; tonal memory.]

An overall discussion of these concepts can be found in [4]. Pitch and tonality in music are handled by the right hemisphere, whereas speech comprehension and production, as well as musical rhythm, are handled by the left hemisphere [10]. Figure 2.3 gives an overall presentation of the auditory tasks handled by the left and right hemispheres of the brain.

There are a number of factors which contribute to perceiving, recognising and interpreting auditory stimuli - both perceptual and physical factors are involved [11]. Similarity and dissimilarity, proximity and good continuation are perceptual factors found not only in the auditory sense but in the visual sense as well. Sound location, frequency, rhythm, scales and keys are examples of the physical contributing factors [12, 13]. The use of rhythm, for example, contributes very significantly to the recognition of auditory stimuli [14]. Consider the following list, which shows several characteristics of sound:

- Physical sound stimulus (e.g., frequency, amplitude, complexity, resonance, phase).
- Perception of intensity (e.g., intensity discrimination, loudness, frequency).
- Perception of frequency (e.g., frequency discrimination, pitch, intensity).
- Perception of space (e.g., monaural cues, binaural cues, echolocation).
- Perception of tone combinations (e.g., beats, combination tones, masking).
- Perception of music (e.g., octaves, notation scales, absolute or perfect pitch, pitch sequences, temporal organisation, chromesthesia).

In the following sections, some of these concepts are briefly described. A sound can be described as a pattern of successive pressure disturbances propagating through a particular molecular medium. For example, the medium could be gaseous, liquid or solid. It must be made clear that sounds do not exist in the absence of a medium. The main properties of sound waves are characterised by their variation in frequency, amplitude (or intensity), complexity and phase. By convention in acoustics, sounds are characterised by frequency. Frequency is a measure of the number of cycles or pressure changes completed in a second - in other words, the rapidity of pressure changes. Amplitude refers to the extent of displacement of the vibrating particles in either direction from the position of rest.

The complexity of sounds arises because vibrating bodies do not usually vibrate at a single frequency, but rather with a collection of frequencies. Finally, phase refers to the part of the cycle that the sound wave has reached at a given point in time.

2.2.1 Perception of Frequency

There are complex relationships between the detection of frequency and sound levels by the human ear. Figure 2.4 shows the audibility areas for both speech and music (which differ considerably). The human ear can accept a frequency range from 20Hz to 15kHz and is capable of distinguishing frequency changes of as little as 1.5Hz at low frequencies, but performance is significantly reduced at higher frequencies. Harris (1952) reports that a human being can detect changes in frequency of about 3Hz for frequencies up to about 1000Hz. For frequencies between 1000Hz and 10,000Hz, the frequency discriminability can be specified as a constant proportion [16]. For example, at 10,000Hz, a 40Hz change is required in order for one to detect the change. The ear also performs a filtering process which allows characteristic sounds or noises to be distinguished from background noises [3].

Hearing characteristics include pitch, loudness and timbre. Pitch corresponds to the sound's frequency: low frequencies produce a low pitch and high frequencies produce a high pitch. Loudness is proportional to the amplitude of the sound. Timbre denotes the special set of characteristics associated with a particular creating instrument; different instruments produce different timbres.

Pitch refers to how high or low a sound appears to be. Usually, it is determined by the frequency of the tone reaching the ears. A typical analogy is that high-pitch sounds are heard from high-frequency tones and low-pitch sounds derive from low-frequency tones. However, it must be noted that there is no precise correspondence. The mel is an arbitrary unit which scales the dimension of pitch. Intensity also has an effect on the perceived pitch of tones. Stevens (1935) determined the effect of intensity on the pitch of tones for a number of frequencies between 150Hz and 12,000Hz [17]. In brief, for frequencies above 3,000Hz, a constant pitch is maintained by increasing intensity. Finally, for frequencies ranging from 1000Hz to 2000Hz, the effect of intensity is minimal [18].
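The frequency-discrimination figures quoted in this section can be summarised in a small function. The thresholds are those reported above (about 3Hz below 1000Hz, and a roughly constant fraction between 1000Hz and 10,000Hz, since 40Hz at 10,000Hz is 0.4 per cent); this is only a restatement of the quoted values, not a psychoacoustic model.

    def frequency_jnd(frequency_hz):
        # Approximate just-noticeable frequency change, per the figures above.
        if frequency_hz <= 1000:
            return 3.0                 # about 3 Hz below 1000 Hz
        return 0.004 * frequency_hz    # constant fraction: 40 Hz at 10,000 Hz

    print(frequency_jnd(500))    # 3.0
    print(frequency_jnd(10000))  # 40.0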

[Figure 2.4: A diagram showing the audibility area of music and speech, taken from [15]. Sound level in dB is plotted against frequency in Hz; the dotted areas show the sound levels usually associated with speech and music, bounded below by the threshold of hearing and above by the threshold of pain. The solid lines indicate the levels of power required to achieve equal perceived loudness.]

2.2.2 Perception of Tone Combinations

An orange light is produced when a red and a yellow light are combined, but what really happens when two tones are combined? One of the major contributing factors towards the answer depends on the similarity of the two tones. When two tones of similar frequency sound together, a third, discrete tone results; the listener cannot hear the two initial tones. The quality of this third tone depends on the difference in frequency between the two original tones. Some combinations of tones (more can be found in [3]) are:

1. A difference in frequency of less than 6Hz produces a single tone with a variation in loudness.
2. A difference in frequency of 6Hz to 24Hz produces a single tone with a series of distinct impulses equal to the difference in frequency per second. For example, two combined tones of 400Hz and 420Hz will produce twenty (20) impulses per second.
3. A difference in frequency of 25Hz up to about ten per cent (10%) of the frequency will not produce distinct impulses.

The changes in loudness which derive from the first two of the three cases examined above are called beats. However, to answer the original question fully, the possibility of the two combined tones being substantially different should also be examined. If the two combined notes differ by more than ten per cent (10%), then two distinct notes can be heard. Consonance and dissonance are two terms used to refer to the various combinations of tones. Consonance is a combination of tones which produces a pleasant result, and dissonance is a combination of tones which produces an unpleasant result [19]. For further reading and a detailed discussion of various combinations of tones, see [20].

In visual circumstances, one strong visual stimulus (e.g., a light) prevents the perception of a weaker visual stimulus. In the same way, in auditory circumstances one tone may mask another tone. Masking usually happens when one tone is very intense and the other tone is very weak (i.e., the louder sound will mask the softer sound). Frequency is another factor which has an important role in masking. When a wide range of frequencies is mixed, white noise is produced, which is in fact one of the most effective masking noises. For example, white noise can be heard when an FM radio is tuned in between stations. Further reading on masking can be pursued in [21, 22].
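The cases above can be restated as a small decision procedure. This is a direct transcription of the rules listed in this section, not a model of auditory perception.

    def combine_tones(f1_hz, f2_hz):
        # Classify what a listener hears for two simultaneous tones,
        # following the cases described in the text.
        low, high = sorted((f1_hz, f2_hz))
        diff = high - low
        if diff > 0.10 * low:
            return "two distinct tones"
        if diff < 6:
            return "one tone with a variation in loudness"
        if diff <= 24:
            return f"one tone with {diff:.0f} beats per second"
        return "one tone, no distinct impulses"

    print(combine_tones(400, 420))  # one tone with 20 beats per second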

2.2.3 Perception of Space

One of the functions of the human auditory system is to localise sounds in space. The auditory system achieves this by processing both the relative distance and the direction of the sound stimuli. It has been observed that monaural cues (i.e., one ear) can identify relative distance [23]. An important aspect of distance identification is the intensity or loudness of the sound. The louder the sound, the closer it is perceived to be. If two sounds are heard simultaneously, the louder is perceived to be closer. By altering the intensity (i.e., loudness), the perception changes in the following way:

1. A sound that grows louder gradually is perceived to be approaching.
2. A sound that grows softer gradually is perceived to be receding.

In addition, a change in the distance of a moving object can be perceived from a shift in the frequency and pitch of the sound source relative to the listener. This is called the Doppler shift (the term derives from Christian Doppler, a nineteenth-century Austrian physicist who discovered it). This effect is often heard in the street when a vehicle passes, for example an ambulance with its siren on. In the Doppler shift, the sound waves do not share a common centre, because the sound source is moving. Thus, a stationary listener perceives an increase in pitch because the frequency of the sound waves passing that particular point increases. The frequency increases because there is a lessening of the distance between the waves transmitted by an approaching sound source. Conversely, there is a lengthening of the distance between waves (and hence a decrease in frequency) when the source is receding.

The discussion on localisation has, so far, only considered monaural cues. When binaural cues (i.e., both ears) are used, the localisation of a particular sound becomes even more precise, because this process also utilises the relative stimulation of the two ears. In other words, the auditory system makes use of the physical differences in stimulation that arise between the two ears due to their separation in space. Binaural hearing is the basis for stereophonic listening. It is produced by dichotic stimulation and results in the experience of an aural space. Early techniques of stereophonic recording involved the use of two microphones placed at different locations to record a particular sound, so that the difference in sounds perceived by the two ears was simulated. Modern techniques involve the use of more than two microphones.
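The pitch shift described above is quantified by the standard textbook Doppler formula for a stationary listener. The siren frequency and vehicle speed below are illustrative values, not figures from the thesis.

    def doppler_frequency(source_hz, source_speed_ms, sound_speed_ms=343.0):
        # Frequency heard by a stationary listener; positive speed means
        # the source is approaching.
        return source_hz * sound_speed_ms / (sound_speed_ms - source_speed_ms)

    print(doppler_frequency(700, 20))    # approaching: about 743 Hz
    print(doppler_frequency(700, -20))   # receding: about 661 Hz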

2.3 Listening - What Does it Involve?

Listening [24] is regarded as a complex process involving four elements:

1. Hearing. This is the physiological process of receiving acoustic stimuli or signals. Hearing is fundamentally important in listening, because any form of listening requires a good hearing capability.
2. Attention. The attention and conscious awareness of the listener are required in order to attend to a certain message.
3. Understanding. The interpretation and assignment of a meaning to the message or signal received and attended to.
4. Remembering. The process of storing the acoustical information received for later retrieval. It involves two types of memory: short-term memory (STM) and long-term memory (LTM).

There are a number of axioms [24] for listening which, although they apply to conversational listening, are also directly relevant to listening to music. These are:

1. Listening is a mental operation (hearing is physical!).
2. Listening is active. It involves several intellectual operations. A person needs to be alert when listening.
3. Listening is learned. It can be learned and it improves with training.
4. Listening is complex. It involves, as noted above, hearing, attention, understanding and remembering.
5. Perceptive listeners must be trained. No matter how much a person wishes to listen, they can only do so to the level to which they have been trained.
6. Listeners share responsibility for communication success. Listeners have to exercise their minds and sometimes do some fast mental manoeuvring to understand a message (this applies particularly to conversation).

7. Listening is as vital a communication skill as reading.
8. Listening is crucial to all communication. Listening is a major part of verbal communication and without it communication itself cannot exist.

Further reading on the stages involved in listening as described above can be found in [24].

2.4 Perceptual Aspects of Music

Music itself is a succession of sounds that vary in frequency, intensity, complexity and duration. It is a very complex kind of acoustic information. In order to discuss music, some definitions must be considered. In music, the term scale refers to an ordered series of notes (ascending or descending, and usually spanning an octave) which offers a foundation for musical composition. For example, two possible scale arrangements of an octave in the diatonic scale - C major and A minor - are shown in figure 2.5.

[Figure 2.5: Some musical scales - the notes of C major and of the minor scale based on A.]

In fact there are many possible scales - Major, Minor (both Harmonic and Melodic), Whole Tone, as well as a set of modes (two of which are scales), and the chromatic scale (all notes). There are also a number of harmonic relationships in music; in the diatonic scale, each letter stands for a particular musical function, as shown in the following list: C Tonic, D Supertonic, E Mediant, F Subdominant, G Dominant, A Submediant, B Leading Note, C Tonic. The intervals between notes are described as follows:

C-C: Unison
C-C sharp: Augmented unison
C-D flat: Minor second
C-D: Major second
C-D sharp: Augmented second
C-E flat: Minor third
C-E: Major third
C-F: Perfect fourth
C-F sharp: Augmented fourth
C-G flat: Diminished fifth
C-G: Perfect fifth
C-G sharp: Augmented fifth
C-A flat: Minor sixth
C-A: Major sixth
C-A sharp: Augmented sixth
C-B flat: Minor seventh
C-B: Major seventh
C-C': Octave

Further details and discussion on the above elements can be found in [25].
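Since pitch sequences built on these scales recur throughout the thesis, the sketch below shows how diatonic (major) and chromatic scale steps map onto MIDI note numbers. The semitone offsets are standard music theory; the code itself is illustrative and not part of the thesis software.

    MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets of the major scale
    CHROMATIC_SCALE = list(range(12))      # all twelve semitones

    def scale_notes(tonic_midi, offsets, octaves=1):
        # Ascending MIDI note numbers for a scale starting at tonic_midi.
        notes = [tonic_midi + 12 * o + step
                 for o in range(octaves) for step in offsets]
        return notes + [tonic_midi + 12 * octaves]  # close on the octave

    print(scale_notes(60, MAJOR_SCALE))  # C major from middle C (MIDI 60)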

Historically, studies in musical perception can be traced back to 1879, in the psychological institute at the University of Leipzig, where Wilhelm Wundt performed a number of measurements using auditory stimuli. In 1883, Stumpf [26] carried out a study of tones with musicians and non-musicians. A first complete publication focusing on the psychology and perception of music appeared in 1895 by Theodor Billroth [27]. Thirty-one years later, another psychologist, Johannes von Kries (1926), wrote a similar book about auditory perception. While this research activity was in progress in Europe, E. W. Scripture (around 1890) set up a laboratory at Yale University in the United States. He based his research largely on the direction of Wundt. Scripture carried out a number of experiments on vision, hearing and other senses. It was in this laboratory that the first measurements in pitch discrimination were performed. Hughes, a successor of Scripture, compared musical scores with an outside criterion of musical ability [28]. In addition, Carl Emil Seashore can be considered one of the most important pioneers at Yale. His research work included the invention of the voice tonoscope (which offers a visual picture of a tone, so that singers can see the sounds they are producing) and the audiometer (an instrument which measures the threshold of hearing for the intensity of sounds at various frequencies). Seashore published, among others, The Psychology of Musical Talent (1919) and The Psychology of Music (1938) [29].

Some individuals have the ability to identify and reproduce isolated musical notes without help from a musical reference. This ability is called absolute or perfect pitch, and it is generally reported as being rare. However, systematic training can enhance a person's skill in pitch recognition [30].

Chromesthesia (also known as colour hearing, and sometimes spelled chromaesthesia) is a form of synaesthesia. Synaesthesia is the phenomenon in which stimulation of one sensory modality almost simultaneously evokes an experience in a different sensory domain [31, 32]. Thus, in chromesthesia, sounds are not only an aural stimulus but also result in colour sensations. As early as 1914, Langfeld [33] found that treble notes offer stable sensations of light colours and bass notes those of dark colours (the composer Messiaen had this ability).

Music has been characterised as designed uncertainty [34]. It is certainly a complex ordered pattern with properties commonly found in most aesthetic experiences, such as emotion, tension, change, uncertainty or even surprise. It is worth considering the proposal by Roederer (1994) that, in the course of evolution, the attainment of music perception occurred as an incidental consequence of the tremendous complexity demands placed on the auditory system, to serve, first, as a distance and locus detector and, later, as a communication system [4].

2.4.1 Music and Language

Philosophers, scholars and musicians have addressed the relationship between music and language [35, 36, 37]. In brief, there are a number of similarities between language and music:

- An inherent structure that evolves over a temporal continuum.
- Only the human species has the capability of acquiring full linguistic and musical competence.
- Both language and music are capable of creating an infinite number of novel sequences and combinations.
- Both have meaning for the listener.
- Both are innate expressions of human capacities.
- Both share several significant cognitive characteristics.
- Both involve the meaningful use of sound patterns.
- Both are modes of communication.

- Both have universal features across different cultural forms of natural language and music.

Further reading on the similarities between music and language can be found in [38, 39, 40].

It has been argued that music has meaning, symbolic content, semantics or some 'significance'; that music expresses the inner nature of metaphysical will or metaphorically exemplifies properties such as fragility and heroism [41]. Roger Scruton (1987) remarks the following [42]:

...to understand musical meaning... is to understand how the cultivated ear can discern, in what it hears, the occasions for sympathy. I do not know how this happens; but that it happens is one of the given facts of musical culture... Let us consider an example. In the slow movement of Schubert's G Major Quartet, D.887, there is a tremolando passage of the kind that you would describe as foreboding. Suddenly there shoots up from the murmuring sea of anxiety a single terrified gesture, a gesture of utter hopelessness and horror... No one can listen to this passage without instantly sensing the object of this terror - without knowing, in some way, that death itself has risen on that unseen horizon... In such instances we are being led by the ears towards a knowledge of the human heart.

On the other hand, researchers have also argued that musical meaning has little resemblance to meaning in the natural language sense. Lerdahl and Jackendoff (1983) express a negative view about the linguistic meaning of music [43]:

Many previous applications of linguistic methodology to music have foundered because they attempt a literal translation of some aspect of linguistic theory into musical terms - for instance, by looking for (a)musical... semantics... But this is an old and largely futile game... Whatever music may 'mean', it is in no sense comparable to linguistic meaning.

Furthermore, Peter Kivy (1990) contends that, however expressive music can be, it does not carry meaning in the same way that language carries meaning. He specifically remarks [44]:

Unlike random noise or even ordered, periodic sound, music is quasi-syntactical; and where we have something like syntax, of course, we have one of the necessary properties of language. That is why music so often gives the strong impression of being meaningful... But although musical meaning may exist as a theory, it does not exist as a reality of listening... It seems wonderful to me, and mysterious, that people sit for protracted periods of time doing nothing but listening to meaningless - yes, meaningless - strings of sounds.

2.4.2 Perception of Timbre

Experiments with trained musicians who were asked to rate the similarity of musical sounds produced by different musical instruments [45] showed that there were three families (with subfamilies) of instrument similarity.

[Figure 2.6: A three dimensional representation of similarities and differences of musical instruments [45].]

Participants in these experiments rated pairs of sounds from very dissimilar (1-10), through dissimilar (11-20), to very similar (21-30). A technique called multidimensional scaling produced a three dimensional representation, as shown in figure 2.6.

The squares and cubes represent particular instruments, and similarity is shown by the distance between them: a small distance shows a great degree of similarity. The three families identified, with some of their subfamilies, were:

1. Family one: E-flat clarinet (C1), soprano saxophone (X1), bass clarinet (C2), and English horn (EH).
2. Family two: oboe (O1) and muted trombone (TM).
3. Family three: bassoon (BN), French horn (FH), cello, trumpet (TP), and flute (F1).

2.4.3 Interpreting Music

One of the most fundamental questions that one can ask is: what natural mechanisms and propensities of the human auditory system determine the way in which we hear musical sounds to be grouped? Musical sounds do not stand in isolation; they stand in significant relation to each other. Musical perception begins when listeners notice note relationships and start grouping them. In visual perception, there are a number of grouping tendencies according to the Gestalt principles of perception, some of which are shown in figure 2.7 [46]. Human adults, children and animals are all subject to the operation of these principles [47].

Both pitch and location can be used to group musical sounds. Researchers have argued, however, that location is a less important grouping factor than pitch [48, 49, 50, 51]. Some of the reasons for this are:

- Sounds with transients (e.g., clicks and chirps) can be localised by listeners more accurately than steady tones.
- Echoes and reverberations can cause sounds to arrive at the ear from directions other than that of the source.
- The human auditory system fails to distinguish between a single sound coming from directly in front of the head and two identical sounds coming from equidistant sources on either side.

Grouping by pitch is grounded in evidence found in the experimental literature.

[Figure 2.7: Some Gestalt principles: box 1 represents the principle of proximity, box 2 good continuation, and box 3 the similarity principle.]

Deutsch (1975) played two simultaneous tone sequences through headphones to listeners. One sequence was directed to the right ear and the other to the left ear. Subjects reported that all the high tones were heard from the right headphone and all the low tones from the left headphone. This is called the scale illusion, because listeners hear the tones as two smooth scale passages rather than as the two angular melodic contours which were actually present [52], as shown in figure 2.8. Furthermore, Butler (1979) demonstrated that the pitch grouping phenomenon is very robust even when real instrumental sounds are used with spatially separated loudspeakers, as opposed to Deutsch, who used pure tones and headphones. It is also reported that most listeners grouped by pitch even when the notes from one speaker had a distinctive timbre [53].

In the discussion so far, pitch grouping has been demonstrated using two distinct sound sources producing simultaneous sounds. However, it is also possible for a single sound source to be heard as two independent sound sources. This phenomenon is called pitch streaming. Pitch separation and the speed of the component notes are two contributing factors towards whether one or two pitch streams are heard.

[Figure 2.8: The scale illusion, taken from [52]: the stimuli presented to each ear, and the smooth scale passages perceived on the right and on the left.]

In a sequence of two notes alternated at a rate of ten per second, the following cases exist [54]:

1. When the interval is less than about one seventh of an octave, one single pitch stream is heard.
2. When the interval is greater than about one seventh of an octave, two independent pitch streams are heard.

Further experiments [55] have shown that if the rate of alternation is reduced below six notes per second, it becomes possible to hear wider intervals, up to an octave, as fused into a single stream. Intervals of a tone or less are heard as a single pitch stream when repeatedly alternated, regardless of how slowly they are performed.
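A rough transcription of these streaming findings into a rule of thumb follows. The reported conditions overlap slightly, so the ordering of the tests is an assumption; this is not a perceptual model.

    def pitch_streams(interval_semitones, notes_per_second):
        # How many pitch streams an alternating pair of notes is heard as,
        # following the findings summarised above.
        seventh_of_octave = 12 / 7              # about 1.7 semitones
        if notes_per_second >= 10:
            return 1 if interval_semitones < seventh_of_octave else 2
        if notes_per_second < 6:
            # Below six notes per second, intervals up to an octave fuse.
            return 1 if interval_semitones <= 12 else 2
        # Otherwise, intervals of a tone or less fuse regardless of speed.
        return 1 if interval_semitones <= 2 else 2

    print(pitch_streams(7, 10))  # 2 streams: a fifth at ten notes per second
    print(pitch_streams(7, 4))   # 1 stream: the same fifth, played slowly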

But what really happens when melodies are heard? Dowling [56] defines a melody as a sequence of single pitches organised as an aesthetic musical whole. The perception of a melody is influenced by contour, timbre, rhythm, intensity and tempo [50, 57]. It has been shown experimentally that when listeners are exposed to sequences of tones with a wide variation of frequencies, they organise the stimuli into narrower-range (high and low) streams which offer a psychological coherence [58, 11]. Experiments [59] with real melodies (e.g., 'Happy Birthday') were constructed in such a way that note 1 of a tune A was played, followed by note 1 of a tune B, and so on alternately. The sequences were played at a speed of eight notes per second (i.e., four notes of each melody per second). The results of these experiments showed that it was almost impossible to recognise overlapping melodies, because the melodies merged into one single unrecognisable sequence of notes. However, when the melodies did not overlap in pitch, they were easily recognised. According to Dowling (1973), who constructed these experiments, the only way in which melodies with overlapping pitch could be recognised was when listeners were asked to search actively for a particular known melody (e.g., 'Happy Birthday'). In circumstances where melodies overlap in pitch, the listener has to concentrate and perceive the melodies with considerable effort, as opposed to melodies not overlapping in pitch, where the listener is passively aware of each of the melodies. It was also found that unfamiliar melodic patterns learned by listeners during the experiments could not be recognised so easily when interleaved with other melodies, even when the pitch separation was an octave. Therefore, one can say that pitch streaming is not invariant, but can be helped or hindered by acquired knowledge about music.

Other experiments with melodies [60] investigated the role of the melodic contour in remembering and recognising melodies. One of the major hypotheses in these experiments was that if the interval sizes were altered but the contour remained the same, then listeners would still recognise the melodies. In order to test this, five-note melodies were played at a speed of six notes per second. The notes were randomly selected; successive notes rose or fell by one, two or three semitones. The starting point was always middle C (the number 60 in MIDI notation, or about 260Hz). Participants listened to a generated melody, a two-second pause followed, and then a second, comparison melody was heard. The participants had to answer on a scale of four options ranging from 'I am sure that the melody is the same' to 'I am sure that the melody is different'.
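The stimulus construction described above can be illustrated with a short sketch; the exact procedure of the original study [60] may have differed, so this is only an approximation of the description given.

    import random

    def contour_melody(length=5, start=60, max_step=3):
        # Random melody in the style described above: start at middle C
        # (MIDI 60); successive notes rise or fall by 1, 2 or 3 semitones.
        notes = [start]
        for _ in range(length - 1):
            step = random.choice([-1, 1]) * random.randint(1, max_step)
            notes.append(notes[-1] + step)
        return notes

    print(contour_melody())  # e.g. [60, 62, 59, 61, 58]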

found with shorter tunes of three notes [64]. Further experiments [65] have shown that contour is significant in remembering and perceiving melodies, and that both the key and the tonality of a melody contribute psychologically to the perception process. In experiments performed with two melodies sharing the same contour and scale, participants appeared to be easily confused regardless of any difference in the intervals. On the other hand, when the scales were changed and the key distance widened around the circle of fifths, participants showed less confusion [66]. Watkins [67] has shown that key is an issue: participants from a Western musical background were more capable of recognising melodies using pitches within the diatonic scale. In the same experiments [67], it was also shown that melodies were recognised more easily when their pitches were drawn from keys close together in the circle of fifths. When the rhythm of a melody is changed, the melody alters and is less easy to recognise [68]. Other research studies [69] argue that intervals in melodies are heard as positions within the scale and not as pure intervals. Dowling [70] also suggests that, from a psychological point of view, listeners perceive a set of pitches as opposed to a set of intervals. The fact that listeners were able to recognise melodies even when their intervals were widened into different octaves while the pitch classes remained the same [71, 59] indicates that the psychological representation in listeners is constructed in terms of pitches and not intervals.

The question as to whether trained musicians perform better than untrained musicians in the perception of musical stimuli has been discussed. Wolpert [72] argues that untrained musicians do not interpret musical stimuli in the same way as trained musicians. He found that musicians and non-musicians follow different sets of rules in interpreting music. In matching excerpts, musicians used melody and correct harmonic accompaniment as the major criteria (as opposed to instrumentation); non-musicians did not use the same rules. Any differences in rhythmic processing between musicians and non-musicians have been argued to be due only to individual differences [73]. In pitch-recognition tasks using earcons, it was observed that musicians performed better when only pitch was involved. However, no differences were reported when earcons were played on different instruments with different rhythms [74].

Memory of Individual Notes

A series of experiments has been performed to determine memory for individual notes [75, 76, 77, 78]. In these experiments, listeners had to hear two notes separated

by a five-second interval. The pitch of the notes was the same for half of the trials and differed by a semitone for the other half. The subjects were asked to judge whether or not the notes had the same pitch. In one set of the experiments, the five-second interval was silent. Deutsch reports that most of the listeners' judgements about the pitch of the notes were 100 per cent accurate. In another set of experiments, the interval was filled by spoken numbers, either to be recalled or ignored. In both cases, the responses of most of the subjects were 100 per cent accurate according to Deutsch. In further experimentation, a number of randomly chosen notes were placed between the two test notes. These notes were drawn from the same octave as the test notes. Subjects were asked to ignore the intervening notes between the two test notes. In these experiments, only 68 per cent of the listeners were able to judge accurately. Therefore, Deutsch argues that intervening notes have a disruptive effect on memory for the pitch of an earlier note. The disruptive effect, according to Deutsch's experiments, is even greater when the intervening notes are close in pitch to the test notes, and it persists even if the intervening notes are shifted up or down an octave. However, Sloboda (1985) remarks the following about Deutsch's experimental results [40]:

At first sight, Deutsch's results suggest a very gloomy conclusion about musical memory. Memory for individual pitches seems incredibly poor, if it cannot survive a few succeeding notes. How is it possible to remember notes across structures of symphonic proportions, containing tens of thousands of notes? The general answer to this problem would seem to lie in the opportunities which most music affords for listeners to classify and organise what they hear. Deutsch's sequences were atypical in two respects. They did not confine themselves to the intervals of a common scale (using fractions of a semitone in some instances), and their notes were randomly chosen so that they were not designed to form common musical patterns within the scale framework.

Memory in Extended Music

What really happens when a person listens to an extended passage of music? The answer to this question requires the analysis of two major subquestions:

How does a listener segment, 'break up' or separate a piece of extended music into small groups of sequences?

How does a listener remember these sequences?

At one level, one must consider the physical characteristics of music, such as timbre or pauses, which do suggest segmentation to the listener. Tan, Aiello and Bever (1981) experimented with equal-duration note sequences which contained two melodic phrases, each of which ended with a melodic cadence. The melodies were played to subjects, who were then asked to judge whether or not particular two-note probes were present in the melodies. There were three forms of note probe: a pair of notes ending the first phrase, a pair of notes beginning the first phrase, and a pair of notes 'straddling' the phrase boundary. According to their findings, subjects recognised more of the first two types of probe than the last one. They were more able to form accessible memory representations of intervals within probes defined by cadences than they were for notes equally close in time but coming from two different phrases [79].

The answer to the second subquestion involves the human memory capacity for holding segments. In human memory, as more items are added, memory for other items is lost [80]. One way in which this problem can be addressed is by associating different items together. In musical compositions, these associations are present in terms of patterning and structuring. There is evidence that people remember musical extracts best if they are labelled with concrete, representational titles as opposed to abstract, conceptual ones. These titles enable the construction of some sort of story in human memory, which in turn is associated with particular segments of music [81].

Studying Perception of Continuous Music

How can listeners' perception of whole musical pieces be studied experimentally? Although this thesis is not concerned with the perception of listeners exposed to whole musical pieces, a number of concepts raised below need to be considered in order to broaden the horizon of an empirical investigation of musical perception involving structured messages in an interface. The scientific areas of psychoacoustics and psycholinguistics have influenced experiments focusing on the perception of music [57, 40, 82]. In psychoacoustics, experiments are performed using the scientific method of measuring the dependent variable (DV) while changing and manipulating the independent variable (IV) of the musical stimuli. As far as the scientific method is concerned, this is fine. However, the musical stimuli may not satisfy the aesthetic qualities of music. The concern raised is best described in [83]:

There are certain obvious advantages in this very controlled kind of approach, and it has proved extremely powerful and productive for advancing our understanding of tonal and metric hierarchies. However, it has left untouched a range of issues concerned with listeners' understanding of more extended and elaborate structures in which a considerable degree of interaction between different parameters can be expected.

The above quotation appears to have support from other researchers too. In particular, Deliege and Ahmadi [84] have remarked upon the same concept:

The usual practice, in our field, as in any scientific discipline, is to isolate the variable that one wishes to study and to incorporate it in a series of brief and repetitive sound frequencies (that are called musical), constructed by the psychologist for the needs of the experiment, in order to be able to identify it afterward, in appropriate manner, in the statistical analysis of the data. Unfortunately, many studies in the field of psychology of music scarcely achieve their aims because a musical objective is being sought through the use of material that is both too simple and too trivial.

In order to use music in interfaces (either for blind users or in a multimedia system), one must certainly look into the existing research on musical perception, primarily in the area of psychology. Although it might be argued that no one would wish to use musical compositions in an interface, there is very little knowledge of the mental interactions occurring in the listener when exposed to genuine compositions (produced by talented musicians) or artificially created ones (not necessarily produced by talented musicians but, nevertheless, following musical rules). Researchers have addressed the need for research to understand more about the not-so-obvious and measurable variables which are created and take values during exposure to continuous music [40, 85, 86, 83]. In particular, Sloboda [40] says:

... when I go to a symphony concert or listen to a gramophone record, there may well be a lot of 'mental' activity, but there is not necessarily any observable physical activity. The principal end product of my listening activity is a series of fleeting, largely incommunicable mental images, feelings, memories, and anticipations.

However, some experiments using musical compositions as the entire stimulus do appear in the experimental literature. In one experiment, Pollard-Gott examined the possibility of participants focusing on particular musical themes when exposed to repeated listenings of the musical stimuli. Musically trained and untrained listeners were asked to rate the similarity between two short musical passages (taken from Franz Liszt's Piano Sonata in B minor).6 Participants had to judge the similarity of pairs of passages from the sonata on a 1-to-11 point scale, where 1 stood for extremely dissimilar and 11 for extremely similar. Results showed that participants improved in rating the similarities between themes, and the musically trained subjects appeared to perceive the themes more quickly than the non-musically trained [87]. Other experiments with musically trained subjects have shown accurate judgement of the duration of excerpts of musical pieces [83]. In other studies [84], important similarities in the perception of musical compositions were found to exist between musicians and non-musicians. Further experiments with musical compositions can be found in [88, 89, 90, 91].

2.5 Enabling Technology for Audio Systems

A multi-media system can be defined as a collection of appropriate hardware and software which utilises several media, such as text, sound, images and video, for communicating with users. One of the purposes of multi-media systems is to provide an effective user interface using a collection of different media. Figure 2.9 shows an overview of the components involved in a multi-media system. Full multi-media systems consist of specialised input, processing, storage and output devices. For audio, multi-media components include the audio card and MIDI (Musical Instrument Digital Interface) devices. Music can be output using these devices. Technically, the composition and output of many musical structures is possible, though there is a general lack of use of music as a communication metaphor. Although music has been intimately associated with human life throughout the ages, its purpose has been mostly entertainment. Humans appear to appreciate and somehow understand music, and there is particular evidence, as discussed earlier, that the human brain allocates particular cerebral areas to musical processing (see section 2.2).

6This sonata has one movement with two principal and three minor themes.

Figure 2.9: A typical multi-media system.

The Musical Instrument Digital Interface (MIDI) [92, 93] enables a communication link to be set up between computers and musical devices by offering a specification of a software language and a hardware interconnection scheme. MIDI was launched in 1983. Simplicity and low cost were the two major contributing factors towards the use of the MIDI standard and its successful adoption. Briefly, in technical terms, MIDI communication is organised into 10-bit messages which are transmitted at a fixed rate of 31.25 kilobits per second amongst different computers, instruments, sequencers and other devices. A number of MIDI-controllable hardware devices have been built, ranging from sound mixers to multi-timbral instruments. Besides the success and the wide acceptance of MIDI, a number of limitations and difficulties have been suggested:

1. Poor provision for microtonal music, timbre control and digital audio processing.

2. Low data rates, when one compares the 10-Mbit/sec communication rate of digital devices with the 31.25-kbit/sec rate of MIDI.

3. The one-way communication link scenario from a master device to a slave device. Two-way communication among several devices is difficult to achieve and, even when it is possible, is not very convenient.

Despite the above limitations, MIDI was a significant step in allowing the manipulation and handling of several devices such as keyboards, drum kits, controllers, synthesisers, sequencers, signal processors and mixers. In the following sections, a more comprehensive review of MIDI is offered, covering some aspects in more detail.

A MIDI message consists of a start bit, which is used to alert the device, and a stop bit. The device receiving a MIDI message first reads the 'start' and 'stop' bits and then decodes the remaining eight-bit message byte. In brief, there are two main types of byte: a status byte, which starts with 1, and a data byte, which starts with 0. Typical MIDI messages, illustrated in the sketch below, can be:

Program change. This message can be sent, for example, to assign an instrument to a MIDI channel.

Note on. This message activates a note via a particular MIDI channel using the current instrument assigned to the channel.

Note off. This message stops the output of a note on a particular MIDI channel after the note has been activated with a note-on command.

Control change. This message is used to refine a note being output by adding nuance to it, for example, soft pedal, sustain pedal, vibrato or tremolo.

In addition, there are other messages which can be sent in order to control musical output, such as after-touch and pitch bend, which are mechanisms that allow a sound to be manipulated after it has been initiated and that alter the pitches of notes on particular MIDI channels. A synthesiser can be configured (through MIDI commands) to operate in the following modes:

1. Omni on, poly. All channels are entitled to send information. More than one note can be played at any time.

2. Omni on, mono. All channels are entitled to send information. One note is played at a time.

3. Omni off, poly. Only a specified number of channels can send information. When different timbres are assigned to each channel, more than one note, in the channels' specified timbres, can be played at a time.

4. Omni off, mono. Only a specified number of channels can send information, and one note at a time.

Further details of the MIDI standard can be found in [94, 95, 96, 97, 98, 99].
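To make the byte layout concrete, the following minimal Python sketch (not from the thesis) constructs the message types listed above. The helper names are illustrative assumptions; the status values (0x80, 0x90, 0xB0, 0xC0) and the 0-127 range of data bytes are part of the MIDI standard.

```python
# Minimal sketch of MIDI message construction. Status bytes have their top
# bit set; data bytes must stay in the range 0-127 (top bit clear).

def note_on(channel: int, note: int, velocity: int = 64) -> bytes:
    # Status 0x90 ORed with the channel number (0-15), then note and velocity.
    return bytes([0x90 | channel, note & 0x7F, velocity & 0x7F])

def note_off(channel: int, note: int) -> bytes:
    # Status 0x80 stops a note previously started with note_on.
    return bytes([0x80 | channel, note & 0x7F, 0])

def program_change(channel: int, program: int) -> bytes:
    # Status 0xC0 assigns an instrument (program 0-127) to the channel.
    return bytes([0xC0 | channel, program & 0x7F])

def control_change(channel: int, controller: int, value: int) -> bytes:
    # Status 0xB0 refines the output; e.g., controller 64 is the sustain pedal.
    return bytes([0xB0 | channel, controller & 0x7F, value & 0x7F])

# Example: assign a violin (General MIDI program 40, 0-based) to channel 0
# and sound middle C (MIDI note 60).
stream = program_change(0, 40) + note_on(0, 60) + note_off(0, 60)
```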

2.6 The Application of Audio to Interface Design

2.6.1 Auditory Icons, Earcons and Guidelines

One of the first attempts to use auditory icons in user interfaces was by Gaver [100, 101], who used everyday natural sounds to communicate actions on the user interface. This research work led to the development of the SonicFinder (reviewed in section 2.6.2). Some earlier work [102] had examined the capability of people to recognise everyday sounds (e.g., tearing paper) and reported that 95% of the participants were in a position to match sounds with their sources, especially when the sound derived from one source. When more than one sound source produced similar sounds, confusion occurred (e.g., hammering with walking). The idea was that people listened to the sources of the sounds and not to the pitch or timbre [103], and it is remarked that:

Identification of sound sources, and the behaviour of those sources, is the primary task of the auditory system.

Sound sources can be used to communicate information about the environment [104], such as physical events (e.g., a bottle breaks, and smashes or bounces when dropped on the floor), events in space (e.g., an ambulance's siren approaching), dynamic changes (e.g., when liquid is poured into a glass, overflow can be detected), abnormal structures (e.g., a car engine with a fault does not sound as it would normally sound without the fault), and invisible changes (e.g., a hollow space in a wall can be identified by tapping). There is also other research work [105, 106] which looks into environmental sounds.

An investigation into the use of sound for communication from the perspective of using musical stimuli resulted in the introduction of earcons [107, 108, 109]. According to Blattner [107], there are one-element earcons (or motifs), compound, inherited and transformed earcons.

One-element earcons have a number of basic properties, or building blocks, such as rhythm, timbre, register and dynamics. An earcon of this type can be used to represent a system message in one of the windows displayed on the computer screen. The particular window in which the message is displayed can be identified by the earcon's register and dynamics. This type of earcon is highly appropriate for single basic messages, such as simple error messages and other straightforward system

information, because one-element earcons are fairly simple. Examples of one-element earcons are: a digitised sound, a sound created by a synthesiser, a single note, a motif, or even a single pitch. Rhythm is important because it characterises a sound [14]. A rhythm change will alter the perception of a motif and is thus regarded as a very important characteristic [107]. Pitch can be used to build a combination of different motifs, given that a number of octaves are available for use in Western music. However, pitches should be selected from one octave for better perception, and a random combination of pitches should not be used [108, 109]. The use of different timbres can make motifs sound different, while the register denotes the position of the motif in a musical scale, so a change of register changes the motif and the meaning perceived. One example of dynamics is acoustic volume: by increasing or decreasing the volume, a motif can communicate information about, say, the varying size of a window.

Compound earcons are earcons made up from two or more motifs or one-element earcons. A number of general rules have been proposed for the construction of motifs [110]:

Their length should be kept to three or four notes so that they can be remembered easily and will not take a long time to listen to when combined.

Rhythms should be created with no more than seven time divisions.

Notes should be taken from eight octaves of twelve notes.

Semitone gaps can create incorrect melodic implications and, thus, should not be used.

Sine, square, sawtooth and triangular waves should be used for timbre.

Low, medium and high registers should be used.

Loud, medium, soft, soft-to-loud and loud-to-soft dynamics should be used.

Earcons should communicate one meaning only, be brief and simple, and be constructed in such a way that they are easy to remember, understand and perceive, using the following guidelines [110]:

Repetition - repeating a motif unchanged.

Variation - change one or more of the motif's properties in relation to the preceding one(s).

Contrast (pitch or rhythm) - can be used to contrast with previous motifs.

There are three major properties in the construction of compound earcons: combining, inheriting and transforming [107]. Using the earcon technique, the actions and tasks of an interface are broken down into smaller ones, and compound earcons are then assigned to facilitate communication. Thus, it is claimed, any event in the interface can be represented by a set of combined earcons. In a hierarchical approach, a tree of earcons is created and successor nodes (earcons) inherit properties from predecessors. Figure 2.10 shows compound earcons, and figure 2.11 shows shortened earcons for expert users, both taken from [107]. A sketch of the combining principle follows figure 2.10.

Figure 2.10: Examples of compound earcons ('create', 'destroy', 'file' and 'string' motifs combined into 'create file' and 'destroy string') [107].
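The following minimal Python sketch (an illustration, not Blattner's implementation) shows the combining principle of figure 2.10: one-element motifs are concatenated to form compound earcons. The motif contents are hypothetical; the 0.1-second gap between combined earcons follows the guideline quoted later in this section.

```python
# Illustrative sketch of combining one-element earcons (motifs) into
# compound earcons. A motif is a list of (MIDI note, duration in seconds)
# pairs; a None note stands for a rest.

CREATE  = [(60, 0.25), (64, 0.25)]   # hypothetical 'create' motif
DESTROY = [(64, 0.25), (60, 0.25)]   # hypothetical 'destroy' motif
FILE    = [(67, 0.50)]               # hypothetical 'file' motif
STRING  = [(67, 0.25), (67, 0.25)]   # hypothetical 'string' motif

def combine(*motifs, gap=0.1):
    """Concatenate motifs, separating them with a short rest."""
    compound = []
    for motif in motifs:
        if compound:
            compound.append((None, gap))   # rest between component motifs
        compound.extend(motif)
    return compound

create_file = combine(CREATE, FILE)        # compound earcon for 'create file'
destroy_string = combine(DESTROY, STRING)  # compound earcon for 'destroy string'
```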

Figure 2.11: Examples of earcons not containing all elements in the hierarchy, for expert users [107].

A more specific set of guidelines for the creation of earcons comes from Brewster [74]. These guidelines, which also appear revised in [111], are:

Timbre. Musical instruments should be used, with multiple harmonics suggested to avoid masking. Similar instruments (e.g., violin 1, violin 2, violin 3) should be avoided. He remarks that "... instruments that sound different in real life may not when played on a synthesiser, so care should be taken when choosing timbres. Using multiple timbres per earcon may confer advantages when using compound earcons. Using the same timbres for similar things and different timbres for other things helps with differentiation of sounds when playing in parallel" [111].

Register. Pitch and register should not be used for absolute judgements. Better performance can be achieved when register is combined with another parameter. Large differences among earcons must be enforced if register alone is to be used, but maximum effectiveness may not result. He remarks that "Two or three octaves difference give better recognition. Much smaller differences can be used if relative judgements are to be made" [111].

Pitch, rhythm or another parameter. These are required with complex intra-earcon pitch structures in order to be effective and differentiable by the listener. The range of pitch used should lie between approximately 125Hz-150Hz and 5kHz (no higher than about four octaves above middle C and no lower than about the octave below it). In addition, the ranges of the instruments need to be considered, because different instruments fall into different ranges of pitch.

Rhythm and duration. The rhythms used must be as distinct from each other as possible. Very short notes should be avoided because of the danger of their not being noticed, although in short earcons (one or two notes) lengths of 0.03 sec are suggested. It is also remarked that earcons should be as short as possible so that the rest of the user-interface interaction can proceed in parallel with the sound. Presentation can be accelerated by playing two earcons together [112]. Six notes per second in each earcon are usable. Rhythmic patterns with the first note louder and the last note longer will sound as complete units [113].

Intensity. Intensity should not be used alone to differentiate between earcons, because listeners are not good at absolute intensity judgements. Volume must be controllable by the user, but all earcons must be held within a narrow intensity range so that when the volume is low no sound will be lost, and when the volume is high no part of the earcon will be too loud and thus create annoyance to the listener. Brewster remarks [74]: "One of the main concerns of potential users of auditory interfaces is annoyance due to sound pollution. If intensity is controlled in the ways suggested here then these problems will be greatly reduced."

Spatial location. Use stereo positioning or special hardware to utilise three full dimensions, enabling the user to differentiate earcons in parallel (or in series when differentiation of a family of earcons is required).

To make earcons attention-grabbing, Brewster suggests the use of intensity, although he acknowledges possible user annoyance; he also suggests achieving this with rhythm and pitch. Compound earcons should have a space of 0.1 seconds between component earcons. Examples can be found in [111], and for further reading consult [74, 114].

One problem with earcons is memorability. Random notes are remembered less well than structured sequences of notes [14]. Differences between two groups of notes, where each group has notes taken from a different scale, can be detected more reliably [115]. When only one octave is used, problems created by the perception of pitch can be minimised [12]. Spatial localisation in positioning earcons also helps in grouping earcons from different sources [116]. A minimal sketch of an earcon family designed along these guidelines is given below.
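The following Python sketch is an assumed illustration, not taken from Brewster: each earcon in the family uses a timbre from a different instrument family, registers set a couple of octaves apart within the suggested pitch range, and a distinct rhythm. The General MIDI program numbers (0-based: 19 = church organ, 40 = violin, 56 = trumpet) are real; the event names and motif contents are hypothetical.

```python
# Illustrative earcon family following the guidelines above: distinct
# timbres from different instrument families, registers set well apart,
# and distinct rhythms. Durations are in seconds.

EARCON_FAMILY = {
    # event: (GM program, base MIDI note, rhythm as a list of durations)
    "error":   (56, 84, [0.25, 0.25, 0.50]),  # trumpet, high register (C6)
    "warning": (40, 60, [0.50, 0.25]),        # violin, two octaves lower (C4)
    "status":  (19, 48, [0.25, 0.25, 0.25]),  # organ, lower still (C3)
}

def render(event):
    """Expand an earcon into (MIDI note, duration) pairs on a rising contour."""
    program, base, rhythm = EARCON_FAMILY[event]
    notes = [(base + 2 * i, duration) for i, duration in enumerate(rhythm)]
    return program, notes
```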

2.6.2 Auditory Applications in Interfaces

In this section, a number of auditory interface applications incorporating speech, sound and (in some cases) vision, for blind, partially sighted or sighted users, are reviewed.

First of all, one must consider the limitations imposed upon user interaction with interfaces using sound. Sound is regarded as having low resolution, as opposed to vision, which provides high resolution. It also has low orthogonality, which means that a change in one or more parameters of a sound will have a side effect on other parameters characterising the sound. Sound also cannot easily convey absolute information, which means that relative and abstract information can be conveyed more easily than absolute and exact data [117]. Although the use of speech in interfaces may have an interfering effect on the short-term memory of listeners [118], the same is not true of non-speech sound [119]. The serial presentation of sound also places a load on memory [118], but there is evidence that sounds can be presented in parallel [120, 121, 74, 122]. Another limitation is the difficulty of associating a sound with the element(s) it represents [118].

SonicFinder

The SonicFinder7 is an interface which uses auditory icons and graphical feedback [101]. The interface itself aims to achieve, explore and test the following:

1. The introduction of everyday sounds which map naturally and meaningfully to various events within the HCI aspects of the interface.

2. The employment and evaluation of auditory icons, to establish whether users find them useful during the operation of the interface.

3. The identification of circumstances under which sound can be regarded as a particularly appropriate medium, as opposed to visual graphics.

The interface was incorporated on top of the Apple Macintosh interface and aimed to extend the auditory interface dimension beyond the direct visual one. For example, operations such as selecting, copying or removing a file are associated with particular sounds. An action, or a set of actions, is represented by auditory icons in such a way that each sound implies the task being performed. The mapping is shown in figure 2.12.

7SonicFinder is a trademark of Apple Computer, Inc.

Figure 2.12: The mapping correspondence between the system's events (selection, opening, dragging, copying, window operations and the trashcan) and auditory icons (hitting, whooshing, scraping and pouring sounds, among others), adapted from Gaver [101].

All sounds are used redundantly with the visual stimuli of the Finder, as shown in figure 2.12. Gaver suggests that natural sounds can be used in interfaces to represent actions and tasks, similar to the manner in which natural sounds offer information about the environment. However, there is a difficulty in finding sounds which map to user-interface actions, because such sounds do not, as such, exist in the real environment. Gaver therefore also suggests the use of sound effects for those user-interface tasks which have no direct correspondence with sounds in the real environment. For example, files could have a wooden sound; folders, a paper-like sound; applications, a metal sound; copying could sound like pouring liquid into a receptacle; and deletion could sound like smashing plates. However, it must be noted that there are circumstances where there is no direct equivalent real sound for the task in question (see also the early part of section 2.6.1).

Soundtrack

The Soundtrack system, developed by Edwards [123, 124], is an auditory word processor using synthesised speech and musical tones (in the form of sine waves) for visually disabled users. The system runs on an Apple Macintosh. There are two underlying principles behind the Soundtrack system: one is that visual feedback

should be replaced by auditory feedback, and the other is the constraining of the resultant interface. Blind users of the system interact with an auditory screen which consists of auditory objects. In Soundtrack, there are three basic elements which define an auditory object: a musical tone, a name and an action. The main screen of the system, as shown in figure 2.13, is divided into eight different cells, each of which is associated with auditory tones and synthesised speech.

Figure 2.13: The division of the main screen in the Soundtrack word processor (File Menu, Edit Menu, Sound Menu, Format Menu, Alert, Dialogue, Document 1, Document 2).

Every cell produces a particular tone when the cursor is located within its area. With the aid of all these different tones, a user can navigate the system without needing visual contact with the word processor. The tones increase in pitch from left to right, and sounds are heard only on entry to an auditory window. A single mouse click causes the interface to speak the name associated with that cell. Thus, a user can easily identify the screen position and current state within the Soundtrack system. A double mouse click causes the cell to produce its associated submenu. All submenu options are also associated with tones: the tone lowers in pitch when moving down the menu and rises when moving up. In the same way as before, a single mouse click causes the cell to speak its name and a double click executes the submenu option. There was also speech feedback of input.

Soundtrack thus introduces a novel form of interaction between itself and a blind user. In its evaluation phase, the major aspects studied and examined were the nature and behaviour of the auditory interaction and the ease of use of the system. Two major approaches were followed in evaluating Soundtrack in terms of ease of use and usability:

1. Subjects were monitored while they were completing complex and sophisticated tasks.

2. Subjects were asked to state their opinions and views about the system in a questionnaire.

The recorded data was used to implement a number of modifications so that the auditory interface could be enhanced and the functionality of the word processor increased. In brief, the major problems encountered were:

1. Human memory was the greatest problem: subjects had to remember spatial information about the location of data on the screen.

2. The complexity of the Soundtrack interaction was increased as a result of the two-level mode of operation.

In addition, as Edwards remarks, the introduction of pitch to encode spatial information was not very helpful to the majority of the subjects. In contrast, a subject with a musical ear was able to benefit more from the pitch of the tones than a non-musical subject. Clearly, Soundtrack demonstrated that a WIMP-style interface can be designed and implemented in a way which can benefit the visually impaired population. The development and evaluation of the system brought to light the difficulties involved in constructing audio interfaces and in interacting with blind users via auditory interfaces, together with the capabilities and limitations of humans in relation to auditory interfaces, as well as its success in making positive steps towards enhancing the auditory HCI channel. Soundtrack proved that a fully auditory interface could be developed and showed that interaction with it was possible. Thus, it demonstrates that it is possible to create an auditory interface as an alternative to a visual interface.

The Auditory SharedARK

Gaver and Smith (1990) investigated the potential applicability, and the various problems involved, in applying auditory icons on a large scale to multiprocessing and collaborative environments [125]. The experiments applying auditory icons were carried out in a shared virtual environment called SharedARK [126]. SharedARK is itself a collaborative version of the Alternate Reality Kit (ARK) [34]. The system offers a virtual physics laboratory for distance education. Some of the facilities offered allow users to interact simultaneously with objects in an

environment that extends beyond their screen view. Smith (1989) remarks that the user interface of the underlying system was easy for novice users to learn [90]. However, a number of problems with the user interface were also reported. Some of these problems are also present in other similar direct-manipulation, multiprocessing systems:

1. Confirming user-initiated actions.

2. Providing enough user feedback about the system's processes and states.

3. Providing sufficient navigational information.

4. Informing users about the existence and activity of other users who may be using the system.

Gaver and Smith applied auditory icons using mainly pre-recorded environmental everyday sounds, as opposed to synthesised or musical-instrument sounds. They chose everyday sounds because it was considered that actions and tasks in a virtual computer world should sound like things in the everyday world to which humans are accustomed. A strategy of selecting appropriate sounds, such as taps, clicks and scrapes, was taken for confirmatory sounds. In experimenting with these sounds in SharedARK, it was found that they provided immediate, intuitive and engrossing feedback for the user. It was realised that sounds could be used to reflect the state of the system and the activity of ongoing processes in a very helpful way. This is even more useful when the visual indicators are not visible and the screen is overcrowded. In addition, Gaver and Smith remark that hearing the state of the system and its processing is often more efficient than relying on visual graphical displays. Overall, in the SharedARK experiments, it was found that there was potential for employing auditory icons to offer better user feedback in multitasking systems. In designing auditory icons for SharedARK [125], however, problems were encountered, such as:

1. The mapping of everyday sounds onto the information which needed to be conveyed.

2. Designing non-confusable sounds.

Paterson (1989) has also addressed similar problems in designing auditory cues [127].

A number of properties and benefits of auditory icons were recognised in giving an auditory dimension to SharedARK. For example:

There is a wide range of functions that auditory icons can perform.

Auditory icons provide more noticeable feedback than visual cues.

Audio Projection in Window Space Fields

A system called 'audio windows' [128] demonstrates an interface which combines spatial sound presentation and gestural input to serve as a teleconferencing system. The spatial sound system employed is based on projecting a sound into space; by manipulating the sound sources, virtual positions are achieved. The principle is to have multiple sound sources which are heard by the listener simultaneously (all sources coexist). There are a number of parameters which can be controlled by the user so that different listening sensations can be achieved (e.g., walking around the conference room or changing position with regard to other speakers). Input control is performed via gestural recognition, achieved by employing a DataGlove. The DataGlove's movements are interpreted and recognised by the system, which employs posture recognition with the VPL-supplied8 gesture editors and an arm-interpretation component. Immediate confirmatory feedback is also provided for the various operations. Some of the recognisable gestures are indicating and pointing tasks, grasping for repositioning, releasing, and others such as 'help' or 'stop'. A graphical illustration of the architecture of the system is shown in figure 2.14. The system can be used either by blind or by sighted users, due to the fact that the input is gestural (no keyboard or mouse is required) and the output is auditory (no visual display is required). Finally, as Cohen and Ludwig remark, this prototype provides a test bed for exploring the immediate potential of the emerging technology's application to teleconferencing and for researching the relevant human-factors issues. Further reading can be found in [129, 130, 131, 132].

Sound-Graphs

The Sound-Graphs [133] system is an auditory interface for blind users which uses speech and sound (a visual display also exists for the benefit of partially sighted

8VPL stands for Visual Programming Language, used for the DataGlove model 2.

Figure 2.14: The architecture of 'audio windows' (DataGlove and tracker input, a MicroVAX host, a Crystal River Convolvotron for sound spatialisation, and audio matrix, sampler, equaliser and headphone output hardware) [128].

users) enabling users to view and create graphs. The shapes of different graphs are conveyed to the user either as a whole, continuous graph or in an interactive manner for those areas of the graph in which the user is most interested. The user can control the output of sound by moving the cursor forwards and backwards. Coordinates of the graph are also given using speech, together with a facility for magnifying areas of the graph for more detailed study. There are two ways in which graph creation can be carried out by the user. One is to apply a formula which creates the graph, provided that all the necessary information is known. The other is to draw the graph while receiving interactive feedback on what is drawn. The author has seen demonstrations of the Sound-Graphs system at the Royal National Institute for the Blind at Loughborough, where blind users of the A-Level Mathematics course were using it. The package allows users to import third-party data, such as text and coordinates, for the graphs.

Auditory-Enhanced Scrollbar

An auditory-enhanced scrollbar, shown in figure 2.15, was developed to investigate problems associated with scrollbars, such as 'kangarooing' with the thumb wheel and

losing track of the position within a particular document. Various sounds from an electric organ were used for user feedback (from B1 to C4, in the C major scale, for scrolling down). A low tone was used to offer status information and increased in volume (for two beeps of 9/60 sec) to communicate a page boundary. (A detailed discussion of the earcons used can be found in [74], together with other experiments with earcons whose results, in the form of guidelines, were reviewed in section 2.6.1.)

Figure 2.15: The auditory-enhanced scrollbar, taken from [122].

Results of the auditory scrollbar experiment showed that it can assist in performance time and error recovery. Mental workload was also reduced, and subjects preferred it to a visual scrollbar.

Other Auditory Applications

Pitt and Edwards [134] have examined the localisation and spatial navigation of blind users in an auditory display. Users, with the aid of sound, can understand and navigate within an auditory display and select objects. Some of the characteristics of the system are given below:

Concurrent use of speech - different timbres and pitches were used to offer feedback to the user for different positions in the display, as well as offering

directional information.

Stereo - intensity differences between the user's ears were used, together with other information about the proximity of the target object and its position within the display (i.e., the closer the user moved towards the target, the louder the sound of the target became).

The results of this work indicate that, with a reasonable degree of speed and accuracy, blind users were able to find up to eight targets on the screen. This is also a good example of avoiding the use of specialised and expensive hardware in offering spatial information to blind users.

Another research project, PC-ACCESS, aimed to provide visually impaired users with ergonomic access to graphical interfaces [135]. The project was developed in the light of the research findings of Audiocone [136] and Pantograph [137], which had the purpose of offering blind users multi-modal access to graphical interfaces. Two major research paths were followed which were, in principle, based on the idea of replacing visual iconography with an adapted iconography, while direct-manipulation mechanisms were maintained, using the auditory and tactile senses to represent visual information. The two versions of the PC-ACCESS interface for Windows 3.11 were a spatial-sound iconography, which uses a drawing pad and the mouse, and an auditory-haptic iconography. The spatial-sound version incorporates a commercial drawing pad onto which the screen is projected. The cursor position on the screen is followed on the drawing pad with tactile information. As the mouse moves around to different positions on the screen, the objects underneath the cursor produce an appropriate sound. This is similar to the GUIB system [138], where graphical elements are represented using sound (as in the Mercator project). Text is represented by synthesised voice or braille. The blind user receives sound feedback for actions such as opening a window or calling up a dialogue.

Other recent research projects, such as TDraw and TRender [139], aim to provide blind users with access to graphics. The tactile drawing system consists of swell paper, a special thermo-pen used to draw on this paper, a digitiser which monitors the drawing process, a speech input system, and a control program for the separate modules. A diagram created by a blind user is placed on the digitiser tablet. Names can be assigned to the various objects drawn by the user, and several inputs can be performed using speech. Thus, users do not have to use their hands, which can remain in constant contact with the drawing. The tactile renderer takes a spatial model as input, which is converted into a drawing. There are a number of steps involved in the rendering process. Firstly,

unnecessary details and other confusing aspects are removed and the drawing is simplified in three dimensions. Then, with the aid of TDraw-defined rules [140], the model is further processed and transformed into two dimensions.

Communicating Hierarchical Menus

The employment of earcons to provide navigational cues in a 25-node menu hierarchy was investigated in an experiment [141]. The design of the earcons was based on the guidelines in [111] (reviewed in section 2.6.1).

Figure 2.16: The hierarchy of nodes used in Brewster's experiment (taken from [111]).

As can be seen in figure 2.16, the menu hierarchy has four levels. In level one (the main menu), a constant D3 note at the centre of the stereophonic position (left and right; 60 in MIDI terms) was used. In level two (e.g., applications, word processing), a second note, in addition to the first-level note, was used to communicate the submenu, using a different register and stereophonic position. The submenus applications, word processing, experiments and games were assigned C4, C3, C2 and C1, using electric organ, violin, drum and trumpet, in the stereophonic positions of far left, centre left, centre right and far right, respectively. For example, 'applications' was assigned C4 produced by an electric organ in the far-left stereophonic position, and 'word processing' was assigned C3 produced by a violin in the centre left. The note carried forward from level one changed instrument from flute to the particular instrument of the submenu (e.g., to electric organ, in the far-left stereophonic position, for 'applications'). In level three (e.g., graphics, letters, earcons), different rhythms were employed for each left, centre and right node; the register, timbre and stereophonic location were carried forward from level two, depending on the submenu. The rhythms repeated once every 2.5 seconds. In level four, rhythms were carried forward from the previous level, but the repetition was now every second. A sketch of this inheritance scheme is given below.
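The following minimal Python sketch (an illustration; Brewster's implementation is not given in [141]) shows the inheritance scheme just described: each level copies the earcon of its parent node and adds or overrides one property.

```python
# Illustrative sketch of hierarchical earcon inheritance: each level copies
# its parent's earcon and adds or overrides properties, as described above.

def level1_earcon():
    # Main menu: a constant note at the centre of the stereo field.
    return {"note": "D3", "timbre": "flute", "pan": "centre",
            "rhythm": None, "repeat_every": None}

LEVEL2 = {  # submenu: (added note, timbre, stereo position)
    "applications":    ("C4", "electric organ", "far left"),
    "word processing": ("C3", "violin",         "centre left"),
    "experiments":     ("C2", "drum",           "centre right"),
    "games":           ("C1", "trumpet",        "far right"),
}

def level2_earcon(parent, submenu):
    note, timbre, pan = LEVEL2[submenu]
    child = dict(parent)                # inherit from level one, then override
    child.update({"note2": note, "timbre": timbre, "pan": pan})
    return child

def level3_earcon(parent, rhythm):
    child = dict(parent)                # register, timbre and pan inherited
    child.update({"rhythm": rhythm, "repeat_every": 2.5})
    return child

def level4_earcon(parent):
    child = dict(parent)                # same rhythm, repeated faster
    child["repeat_every"] = 1.0
    return child
```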

Results indicated an accuracy of 81.5% in enabling listeners to identify their position within the hierarchical menu. Listeners were also tested in identifying the position within the menu hierarchy of two previously unheard earcons (nodes A and B in figure 2.16), constructed using the same rules as the others (these two new earcons bring the total number of nodes to 27). An accuracy of 91.5% is reported for this task.

Auditory Software Environments

The major task of sound in software or programming environments is to use audio, in the form of pre-recorded synthesised or natural sounds (i.e., auditory icons and earcons), musical tones and other sound effects, to represent, analyse, understand and explore the internal structure, data flow, control, behaviour and other aspects of a program. The employment of sound in programming environments, or in some of their underlying interfaces, is sometimes also called program auralisation.

LogoMedia

The LogoMedia programming environment [142] can be thought of as a software engineering environment in which the programmer is supported by information in audio form during program creation, execution and review. For example, some features of the LogoMedia programming environment are the following:

In program execution, the relative values of variables are indicated and represented by tones and pitches.

In program creation, opening and closing parentheses are associated with a number of different tones.

In viewing a program, special sounds are associated with particular segments of a program.

The LogoMedia environment had its origins in earlier experimental work which produced a prototype system called LogoMotion [143], which extended the Logo language for animating procedures in programs. Using the LogoMedia system, software engineers can design special auralisations by placing software probes in their programs. There are two main types of probe in LogoMedia - the control probe and the data probe. Control probes are mainly used for monitoring a program's control flow. Particular sections of Logo software can be associated with particular program

sections prior to execution, for triggering sound commands during execution. Data probes are used for monitoring data flow and can be associated with arbitrary Logo expressions; changes to these expressions trigger sound commands. Several synthesised musical instruments can be used and controlled by the program, sound samples can be played back, and pitch and volume levels can be adjusted. A graphical user interface is provided in the LogoMedia environment so that a user can specify control and data probes graphically [144, 145].

WinProcne/HARP

Experimental research work has also been directed towards the production of software engineering environments for the processing of sound and music. Early work developed from the low-level manipulation of music scores and composition algorithms [2] to software systems for sound synthesis such as Music V [146] and Music [147]. A multi-paradigm software environment for the real-time processing of sound, music and multimedia has been introduced by Camurri, Innocenti and Massucco [148]. It involves a software architecture for the representation and real-time processing of sound, music and multimedia using artificial intelligence techniques. The resulting system was called WinProcne/HARP. The HARP (Hybrid Action Representation and Planning) system is based upon the WinProcne (WINdows PROlog tool Combining logic and semantic NEts) [149, 150, 151] system and offers facilities for the storing and processing of music and sound, as well as for carrying out plans for manipulating such data. In technical terms, there are two formalisms in the underlying knowledge base of the system - the analogical and symbolic levels of representation. The analogical level is a low-level sound representation with all its associated data; it is based on the metaphor of a mental model [152]. The symbolic level of representation has a declarative symbolic environment and a multiple-inheritance semantic network based on KL-ONE [153, 154], with a number of additions such as temporal primitives and typing mechanisms. The main architecture of the system is shown in figure 2.17. Finally, a number of applications have been developed using the WinProcne/HARP software environment, such as multimedia and composition projects [155] and the modelling and simulation of advanced robotic problems [149].

Figure 2.17: The main architecture of the WinProcne/HARP system (a symbolic level, comprising the WinProcne symbolic database, T-Box, A-Box and Prolog engine, connected through an interface to an analogical level of experts and an analogical knowledge base).

InfoSound

InfoSound [156] is a prototype system which presents application program events that are difficult to present visually, using music and special sound effects. This is achieved by allowing the user to create, store and associate musical sequences and sound effects with application events, and by playing the music and the sound effects during program execution. The research work is a continuation of previous work on using sound to represent numerical data, and on providing cues about events and options in computer application environments. It continues work carried out by Bly [157] on representing multivariate data using musical sound, by Mezrich [158] on representing multivariate time series using musical sound, and by Morrison and Lunney [159] on representing chemical spectra data for visually impaired users using musical chords. The InfoSound system offers a number of facilities and mechanisms for the design of music (i.e., auditory icons and earcons) and everyday sounds, which can themselves be associated with program events and heard during program execution in a parallel or sequential application mode. InfoSound is an interface toolkit, not an environment in itself. The toolkit is part of the IC* project [160], which is an environment for the design and development of sophisticated software systems such as telephone networks. The system's underlying architecture, as shown in figure 2.18, has the following six components:

1. Sound composition system. This facilitates the creation, storage and playback of musical sequences, motifs and other sounds. The

Musical Instrument Digital Interface (MIDI) is used for the composition of music.

2. Sound storage system. This is a library system which stores musical sequences and sounds in MIDI format. It also facilitates the retrieval of sound data for editing or playback.

3. Playback system. This plays musical compositions and other sounds. Audio properties such as duration, song orchestration, amplitude, frequency, stereo and panning are also controlled through this system.

4. Application program interface.

5. Sound generation system. This converts MIDI data into real sounds. This is achieved by transmitting MIDI data from the MIDI interface card to a multi-timbral synthesiser and an electronic sampler, which generate the sounds.

6. Sound amplification system.

Figure 2.18: The architecture of the InfoSound system (users interact with the sound composition, storage and playback systems, which drive the sound generation and amplification systems, with an application program interface to applications).

Some applications have been developed using InfoSound, such as a telephone network service simulation and a parallel computation simulation. In the telephone network service simulation, the following sound sequences and speech were used:

Telephone ringing.

Telephone dial tone.

Touch-tone dialling.

Telephone busy signal.

Telephone receiver hang-up.

Telephone receiver pick-up.

In the parallel computation simulation, six processors were used, each of which was assigned to calculate and present one side of a rotating cube. A musical sequence was associated with each of the six cube sides:

Percussion sequence.

Bass line.

Piano melody.

Flute.

Violin.

Voice.

During execution, when the six processors running in parallel were synchronised, the music was synchronised and the whole sound became a six-part harmony. In contrast, when the processors were running asynchronously, one could hear that the sound was asynchronous as well as cacophonous. The application also combined graphics with the musical sounds, and it was found that music provided a natural representation of parallel computation concepts.

2.7 Visually Impaired Computer Users

As Martin Luther King remarked, we are all created equal, but we are not all created the same. Different human beings therefore have different needs, and these needs become more evident when using software under a particular user interface. There are a number of special-needs groups which can be identified among the general user population. Some of these user groups are:

Speech and hearing impaired.

Learning impaired.

Physically impaired.

Visually impaired.

Visually and hearing impaired.

For all these people, researchers have developed a number of techniques, methodologies and special peripheral devices which assist them in interacting with a computer or a computer-oriented information system. Users with speech and hearing impairments can be assisted by synthetic speech, text-based communicators and conferencing systems. In addition, there have also been attempts to establish turn-taking protocols for communication purposes. Users with learning problems (e.g., dyslexia) may need to use speech input and output as opposed to reading and writing; alternatively, in less severe cases, special-purpose spelling checkers can be used. Physically disabled users can use speech input and output or an eye-gaze system. In more severe cases, such as those where head movement is not possible, gesture and movement tracking can be employed (see [161] for an overall account of interfaces accommodating the special needs of users). Finally, visually impaired users can use screen readers which use synthesised speech, braille output devices and a number of interface systems which employ audio, such as those discussed earlier in this chapter. It is beyond the scope of this review to discuss in detail the needs, and the relevant technology and research supporting those needs, in terms of HCI for all these user groups. Thus, in the next sections, we concentrate our analysis and detailed discussion on the group of blind and visually impaired users, as well as sighted users.

A number of experiments were conducted as early as the 1940s to determine what blind human beings are capable of perceiving about the outside world without, of course, using the visual sense. Supa, Cotzin and Dallenbach (1944) [162] experimented with the effectiveness of auditory cues for obstacle perception. The results of their experiments showed that blind subjects can realise and perceive obstacles on the basis of sound information alone. Almost two decades later, Kellogg (1962) [163] found that blind subjects were able to derive the distance of obstacles using self-produced sounds, and also to distinguish surfaces such as wood, metal and glass. The overall conclusion made by Kellogg was that blind human beings are in a position to detect echoes reflected from objects, which in themselves provide sufficient information for object detection and avoidance. Cotzin and Dallenbach (1950) [164] noted that changes in pitch, as opposed to changes in

Cotzin and Dallenbach (1950) [164] noted that changes in pitch, as opposed to changes in the loudness of the echoes, were the primary cues for auditory localisation and perception of obstacles. In object localisation, Rice (1967) [165] found that high-frequency sounds were more effective. In addition, it is noted by Verraart (1987) and Hollins (1988) [166, 167] that the performance of blind human beings in spatial localisation is also affected by the age of blindness onset and the time since it occurred. People who had sufficient visual experience prior to losing their sight are more effective in object localisation and orientation than those who were born blind or lost their sight at a very early age, mainly because the latter lack sufficient visual experience. Blind people with insufficient visual experience are also not as effective in judging the distance of sounds as others with more visual experience.

Any sort of audio interface application will benefit the computer user population, although the degree of benefit that users derive from audio applications is difficult to measure because it depends mainly on a trade-off between the particular user needs and the application itself. Audio interfaces benefit the blind user population by allowing them, through auditory interfaces, to have access to computer system applications. Entirely or partly auditory user interfaces may well enlarge the degree of benefit that a blind user can gain, and musical interfaces may also benefit non-blind users in the way they interact with computer systems.

2.7.1 Output Media of Non-Visual Display

Vanderheiden (1989) [168] reviews non-visual display techniques for icon-based systems. In summary, these are:

- Running text.
- Braille (single character, cell line, or full page).
- Speech.
- Tactile image (Optacon).
- Morse code (auditory and tactile).
- Text attributes.
  - With speech:
    * Announcement.
    * Background tone (presence, pitch, volume, apparent direction).

    * Sound (presence, type, pitch, volume, apparent direction, short or continuous).
    * Speech attribute (timbre, voice, volume, pitch).
    * Environment (echo, tremolo).
  - With braille:
    * Frequency of pin vibration.
    * Slow pulse of pins.
    * Extra pins at top or bottom of text.
    * Separate tactile or electrocutaneous stimulator.
- Spatially related (text) information.
  - Apparent source of speech.
  - Pitch.
  - Haptic sense (with haptic tablet or joystick).
  - Haptic with tone.
  - Tactile tablet (full or virtual; with tones, speech).
  - Grids.
  - Temporary ridges.
  - Lockable tracking mechanism.
  - Driven cursor (puck, mouse, joystick).
- Pick from lists.
- Direct request (voice or keyboard).
- Best match(es).
- High or low speed scan.
  - Direct control scan (slider with indents or puck with ridges between choices).
  - Search or read as block of text.
- Interruption or alert.
  - Concurrent sound (beep, tone, or sound).
  - Speech announcement.

  - Tactile stimulator (vibrator or electrocutaneous).
  - Olfactory.
- Directing.
  - Speech.
  - Tone and echo location.
  - Pitch and repetition rate of beeping tone.
  - Pitch and timbre.
  - Virtual source.
  - Braille.
  - Tactile direction indication.
  - Haptic tablet with auditory or tactile cross-hair (absolute position).
  - Driven mouse or puck (absolute position).
- Image-icon.
  - Recognised and announced (by number or name).
- Image-stereotypic.
  - Image interpretation with verbal output.
- Image-pictographic.
  - Pressure-sensitive full tactile tablet (with tones, speech).
  - Virtual tactile tablet (with tones, speech).
- Driven puck, graphics or tactile tablet (using guided or constrained movement).
- Object dropout.
- Zoomed images.
  - Image interpretation.

A diagram reader program for the blind [169] (called AUDIOGRAF) has been developed. It enables blind and visually impaired people to read diagrams with the aid of a touch panel and an auditory display. In AUDIOGRAF, a model of audio-tactile exploration has been chosen.

The AudioGraph program documented in chapter 4 of this thesis aims to investigate how structured music can be utilised in this particular problem domain, and not so much to provide a complete interface for blind users. Further reading about user interfaces and blind users can be pursued in [161].

2.7.2 Human Factors

Griffith (1990) [170] remarks the following:

Computers provide a vehicle for integrating people who are blind or visually impaired into the mainstream. Unless a number of human factors issues are successfully addressed, however, computer use by people with vision impairments will remain low.

Aaronson and Gabias (1987) [171] describe seven classes of human factors problems that may occur between blind or visually impaired users and special-purpose input-output devices which, in brief, are:

1. Limited product usability. Some input-output devices are incompatible with certain types of computer, and they may also interfere with other input-output devices when employed with them on the same system.

2. Unreliable and incomplete information. Specialised input-output devices may not be able to accommodate all video information, either because of buffering problems or because of inappropriate user reaction when using them.

3. Limited perceptibility. This issue refers to problems such as the lack of sufficient resolution in visual enlargement devices or difficulties in understanding synthesised speech.

4. Overstressed human memory capacity. Blind and visually impaired users need to remember more and more information as they develop their use of a system, in order to use it efficiently and effectively as well as to maintain control of it.

5. Disruption of motor tasks. This problem usually occurs when function keys are not located in optimal and convenient keyboard positions, so that when a user attempts to use these keys the user's fingers are displaced from the home-key positions. Returning to the home keys afterwards may result in a series of further problems.

6. Unreliable or slow feedback to the user. A blind or visually impaired user may have to follow a sequence of steps in order to access and understand a warning or an error message which appears on the monitor.

7. Slower information processing. This problem results from the human factors problems mentioned above and others, all of which contribute to some extent towards slower information processing.

2.8 Summary

A number of aspects of the use of music as a communication metaphor have been reviewed in this chapter. In particular, these were aspects of acoustics and perception, guidelines for interface design using auditory icons and earcons, and a number of auditory interface applications.

Chapter 3

Communicating Using Music

3.1 Introduction

This chapter documents a set of initial experiments carried out in order to obtain some understanding of how an average person (i.e., one who is not musically educated) perceives sequences of notes of rising pitch, timbre and stereophony. The results, taken together with existing guidelines on the use of sound in interfaces, are used as the basis for the design of some experiments which use music to communicate some contents of a complex software engineering database and aspects of the execution of a sorting algorithm. The empirical evaluation of these interface situations is expected to provide a preliminary understanding of how to use music in interface design in different domains.

3.2 Research Approach and Tools Used

3.2.1 Musical Structure and Understanding

There are many different types of musical structures which can be used to communicate information. At the basic level there are single notes (or a short series of single notes). Consider a bell sound output by a computer, an ambulance siren or even a conventional doorbell. All these sounds inform about a particular event. They have a very simple musical structure. If more complex musical structures are used, then more information can be communicated. For example, if the doorbell was pressed in the sequence 3 times : 2 times : 3 times, then the identity of a visitor might be communicated.

In user interfaces a single note can only communicate a simple event - say, success or failure: something has happened. To communicate more information we need to take advantage of higher structures in music, which involve a number of properties such as pitch, timbre, rhythm and harmony. Earcons are examples of musical structures which communicate information, and there are experimentally derived guidelines available for using such structures effectively (see section 2.6.1). However, there are higher levels of musical structure available than those provided by Earcons. At these levels music is characterised by structures such as Major and Minor scales, tunes, complex rhythms, timbre combinations and harmony. Most of these structures can be used both in works for large sets of instruments and in works for solo instruments. The more complex the music becomes, the more emphasis on harmony and timing is usually required to hold the work together. Finally, when one listens to a concert, the interpreter adds another dimension to the experience. Human beings can comprehend these exceedingly complex levels of musical communication and can even comment on and compare different performances. However, there is a lack of understanding as to how much information listeners with no special musical ability can comprehend. The important research question is: to what level of complexity can we go in using music in HCI? We require effective mappings between domain actions and musical structures, and there is a lack of guidelines or evidence to help us in this task. Examples of relevant questions include:

- Can users distinguish note sequences across octaves?
- At what level can users comprehend rhythms and tunes?
- How well can users identify and distinguish different timbres?
- How successful is stereophony?
- How do we use these structures as a communication metaphor for improving the usability and effectiveness of interfaces?

Some musical structures may be usable with no training - for example, a klaxon to indicate an error, a mapping into rising notes to represent data, or the use of a different timbre to indicate a particular program module. We will call these self-explanatory messages. Other messages will require some learning (for example, many of the Earcons which have been proposed will have to be learned). We will call these trained messages. Interfaces will need to use both types, but a preponderance of the first type will make the interface easier to use.
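The doorbell illustration above can be made concrete. The fragment below is a minimal sketch, written in present-day Python using the mido MIDI library rather than the Turbo Pascal and Sound Blaster toolchain employed in this thesis; the note number, velocity and timings are arbitrary choices for the illustration.

    import time
    import mido

    def ring(port, counts, note=76, velocity=90):
        """Play a grouped 'doorbell' pattern, e.g. counts=(3, 2, 3).
        The grouping alone carries the message: a listener needs no musical
        training to tell 3-2-3 from, say, 2-2-2."""
        for group in counts:
            for _ in range(group):
                port.send(mido.Message('note_on', note=note, velocity=velocity))
                time.sleep(0.12)                # note length
                port.send(mido.Message('note_off', note=note))
                time.sleep(0.25)                # gap within a group
            time.sleep(0.8)                     # longer gap between groups

    with mido.open_output() as port:            # default system synthesiser
        ring(port, (3, 2, 3))                   # the visitor's 'identity'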

To examine these questions, an experimental programme has been carried out. Firstly, a number of experiments have been performed using basic structures involving pitch, timbre and stereophony, in an attempt to obtain a view of the perception of musically untrained listeners.

Secondly, music, using the auditory channel, may be able to enhance or refine messages received via the visual medium. An experiment has therefore been carried out to probe applications where the visual information is complex and might benefit from the addition of music. Because of its familiarity to the author, the Object Management System (OMS) of the Portable Common Tool Environment (PCTE) was chosen as a problem domain in which to investigate the possibility of combining music with visually perceived stimuli in order to improve overall understanding by the user. OMS is an arbitrarily inter-connected graph system supporting an object-oriented database. Frequent browsing and referral to the database is needed by software engineers in order to carry out essential software engineering tasks. All tools in the OMS set attempt to offer a browsing facility for this special-purpose database, and currently use visual techniques, via filtering, to produce usable displays. However, in spite of these efforts, it is still questionable whether a comprehensive facility is actually offered. A considerable amount of database information which, in certain circumstances, might be of particular importance for a software engineer, is hidden. The experimentation has been limited to using OMS in ad-hoc experiments to provide a general understanding of the cognitive issues involved in using music in these circumstances, and to raise issues involved in musical message construction and design.

Thirdly, parts of the execution of the Bubble Sort algorithm have been auralised with music utilising pitch, triads, rhythm, timbre, and stereophony. The music used in the sort algorithm (in contrast to the above) uses continuous changes of musical structure to communicate the state of the list and the re-arrangement which occurs. The experiment using the Bubble Sort algorithm is used as a means of verifying and understanding how listeners perceive and process the following:

- Ordered and non-ordered pitch ranges.
- Rhythm in combination with pitch.
- Temporal arrangements and pitch comparisons between one or two instruments.
- The development of a pattern of what the algorithm does without the listeners knowing its detailed processing.
- The abstract development of mental models of current list states.
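As a concrete illustration of what an auralisation of this kind can look like, the sketch below sounds each comparison made by the Bubble Sort, mapping list values onto pitch and marking each swap by replaying the pair in its new, rising order. It is a hypothetical Python/mido rendering; the base note, pitch step and tempo are illustrative choices, not the mapping evaluated later in this chapter.

    import time
    import mido

    def auralise_bubble_sort(data, port, base=48, step=2, beat=0.2):
        """Bubble sort that sounds every comparison; higher values map to
        higher pitches, and a swap is echoed by the re-ordered pair."""
        def sound(value, duration):
            note = base + step * value
            port.send(mido.Message('note_on', note=note, velocity=80))
            time.sleep(duration)
            port.send(mido.Message('note_off', note=note))

        a = list(data)
        for i in range(len(a) - 1):
            for j in range(len(a) - 1 - i):
                sound(a[j], beat)                   # the compared pair...
                sound(a[j + 1], beat)
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
                    sound(a[j], beat / 2)           # ...echoed quickly if swapped
                    sound(a[j + 1], beat / 2)
            time.sleep(beat)                        # a rest marks the end of a pass
        return a

    with mido.open_output() as port:
        auralise_bubble_sort([5, 1, 4, 2, 8], port)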

The results from experiments of this nature will enhance our understanding of how users mentally react to, and process, continuous musical stimuli. The experiments are not trying to identify the best possible mapping for the Bubble Sort algorithm (since, as with visualisation, auralisation may be implemented in many ways) but are endeavouring to identify sets of perceivable and non-perceivable musical structures. It is likely that musical structures will be interpreted, not in a monolithic way, but in relation to the context in which they are heard. If it can be shown that music can be used to communicate the information generated by the Bubble Sort algorithm, then it might be expected that music could also communicate other sorts of information. Other example domains might include:

1. Comparing the execution of two algorithms or modules which, in principle, accomplish the same task but have been implemented in two different ways. For example, the final sorted list produced by two types of sorting algorithm will be the same regardless of the sorting methodology followed, provided that both algorithms used the same list of numbers and sorted in the same, ascending or descending, order.

2. Debugging a program or system, especially in circumstances where logical errors need to be identified and corrected.

3. Comparing parallel program output for similarities or differences.

4. Assisting blind users in monitoring the execution of their programs through auralisation (providing them with an alternative to visualisation).

Fourthly, one of the potential domains which may be important in the use of musical structures as a communication metaphor is blind user human-computer interaction. Most interaction in user interfaces today is biased towards the visual channel, with only occasional use of sound. This situation places blind users wishing to utilise computer technology in a disadvantageous position. Although some use has been made of spoken messages, music may provide additional advantages, particularly in the manipulation of spatial data. This thesis therefore explores this possibility by investigating the use of musical structures to represent graphical information such as diagrams. The aspects which will be examined include:

1. The construction of an overall interface mechanism utilising music and some speech (only the musical aspect is investigated).

2. Development of a graphical drawing area within which geometrical objects are accommodated using music.

3. Musical support for geometrical shapes.

4. Musical support for cursor movement and location.

5. Describing, via music, the movement of geometrical objects and editing operations upon them.

6. Using music to obtain an understanding of arrangements of graphical objects within the graphical drawing area and the inter-relationships amongst them (e.g. three lines or circles close to each other).

7. Using music to assist in navigation of the cursor in a user pre-determined direction according to the available space or other graphical contents within the graphical drawing area.

The research therefore has involved the construction of an experimental framework (AudioGraph) to investigate the possibility of using music to assist blind computer users in handling graphical information. Although most of the above concepts will be communicated using music, some speech support will also be provided when exclusively requested by the user or when text needs to be communicated.

One primary research objective involves the exploration and subsequent identification of musical mechanisms for reading and locating particular graphical objects on the screen. This needs to be accomplished before investigating facilities for editing or modification. Pure auralisation of a diagram also requires auditory perception of the space within which the diagram is located. A second objective focuses on the development of auditory means by which a user can understand graphical information (e.g., squares, circles). Reading a diagram using music is a difficult task because diagrams may be exceedingly complex. Usually, they incorporate graphical objects (e.g., circles, squares), text, colour and other attributes such as reference viewing points. The complexity of a diagram may affect the user's understanding of it after it has been rendered in sound by an auditory algorithm. By breaking the task of auditory reading into sub-processes, the complexity may be reduced. Major sub-tasks include:

1. Establish an auditory space.

2. Develop mechanisms for auditory location positioning within the space.

3. Develop auditory identification techniques for graphical objects and their size.

4. Develop mechanisms for communicating the distance of graphical objects in relation to the cursor.

5. Investigate the presentation order of auralised graphical objects.

In graphics visualisation, a programmer plots graphics within a particular visual space which can be either the whole VDU screen or a subsection of it. Likewise, in attempting to audiolise graphics, an auditory space field must be defined. The auditory space may well vary in size in the same manner as visual spaces, according to application needs. One of the questions regarding auditory field size is: what are the boundaries of the range? In other words, how small or large can an auditory space be? In visual graphics, the standard VDU offers a maximum space for simultaneous visual graphical presentation, namely the full screen size. What is the equivalent in an auditory field? The major research questions here are:

1. How can one represent a musical space varying in terms of size?

2. What sort of music can be used for a representation of this sort?

3. Can a partial view of the space be communicated?

Obviously, the above questions are inter-related, because the answer to the first may have an effect on the answer to the second and vice versa. Once an auditory space has been established, the next research concept which needs to be addressed is how one can perceive a particular location within it, and how precise this perception should be. Successful mechanisms for abstract location within an auditory space, with reasonable precision, will lead to the introduction of a musical cursor and musical feedback to the user about the cursor position within the space. Such auditory algorithms may produce results with reasonable perception and precision, and be able to cope with reasonable complexity. This thesis aims to, at least partially, investigate these interface situations by probing with different musical structures.

Thus, the research work in this thesis has been divided into four sets of experiments, in which the results of one set contribute as input to the others. The objectives of the four sets of experiments are:

First Set: General experimentation concerning the perception of musical structures. This will offer an overall understanding with regard to the perceptual capabilities and limitations of listeners when exposed to extended musical structures. There are three experiments:

Figure 3.1: The Stages Involved in the Experiments. (The figure shows the empirical investigation as a sequence of stages: general perception experiments with some musical structures (chapter 3); experiments with PCTE OMS-based data and the reduction of visual complexity (chapter 3); experiments in communicating the execution of a sorting algorithm (chapter 3); experiments in the graphical problem domain with the AudioGraph experimental framework (chapter 4); and experimentally derived general-purpose guidelines for auditory musical interface design (chapter 5).)

- Perception of Rising Pitch (Section 3.3).
- Identification of Musical Instruments (Section 3.4).
- Stereo Perception (Section 3.5).

Second Set: Experiments in using music to complement information received through the visual channel. This will offer an understanding of how music can complement visual stimuli, and will establish the possibility that music may have a direct application in overcrowded visual displays (Section 3.7).

Third Set: An experiment in using music to communicate aspects of the execution of a sorting algorithm. This will help to understand further how musical structures in continuous musical messages are perceived, and it will offer an overall indication of how music can be used to communicate the current list states and their progress as they are sorted by the algorithm (Section 3.8).

Fourth Set: Experiments in using music to convey information of a graphical nature for blind computer users. This set of experiments must take into consideration the results and experience gained from the previous three sets, and validate the use of music in the challenging problem domain of blind user interaction (Chapter 4).

Figure 3.1 shows the stages and the relationships among those stages.

3.2.2 Tools Used

The experiments with music in this thesis are implemented on an IBM-compatible computer equipped with a Sound Blaster Card, using Turbo Pascal. The Sound Blaster Software Development Kit (SBK) supports the production of third-party software for a wide range of Sound Blaster Cards¹ [172, 173, 174, 175, 176, 177]. Figures 3.2 and 3.3 show the relationships between the applications and the SBK tools.

3.2.3 Subjects and Feedback

Two types of participants will be involved: non-blind and blind users. Although one of the goals is to examine auditory screen perception by blind users, it is also hoped that music may be able to improve interfaces for non-blind users as well.

¹The SBK and Sound Blaster Card(s) are trademarks of Creative Technology Ltd.

Figure 3.2: The Relationships among application, SBK, Audio Card and Hardware [172]. (The figure shows four layers: the application level; high-level drivers (AUXDRV digitised sound driver, CTMMSYS driver); device-level drivers (CPS driver, SOUND driver); and the hardware level.)

Figure 3.3: The Relationships between application and SBK Loadable Drivers [172]. (The figure shows the application level driving the CTMIDIDRV driver, which in turn drives the Sound Blaster music synthesizer and MIDI port, and thence external MIDI devices.)

It is, of course, pre-supposed in these experiments that non-blind and partially sighted users will not be able to see the VDU (Video Display Unit). However, further work where users are able to see the VDU would be useful. User feedback will be gathered before, during and after the experiments are performed. On completion of an experiment, subjects will be asked to answer a questionnaire. In some circumstances, subjects will be interviewed in order to receive further feedback. The musical knowledge of all subjects will be examined via a questionnaire (see appendix B). The questionnaires determining the musical knowledge of the subjects will be examined, and the decision about their musical ability will be made subjectively.

3.3 Ascending Pitch Note Sequence Experiments

The pitch of a note depends upon its frequency, which is the number of vibrations per second. The higher the frequency, the higher the pitch of a note, and vice versa (see Chapter 2). One of the most fundamental things that one needs to know, in this particular context of investigation, is the way in which pitch can be used in a musical HCI message in order to deliver an intended message to a user. The questions to be answered in this experiment are the following:

- Can listeners perceive an abstract concept of 'length' by hearing a sequence of musical notes (from a particular scale) presented in ascending order? If so, do the musical scale (diatonic or chromatic) and the instruments used affect the perception?
- Can two sequences of notes be perceived as well as one? (i.e. can a user perceive two length values from two succeeding sequences?)
- Does cultural background affect the above results?

The general procedure for the experiments was the following:

- Participants listened to a scale sequence of 40 notes and were asked to associate the notes of the sequence with linearly increasing numbers ranging from 1 to 40, where the first note was 1, the second 2, and so on until the fortieth note, which was 40. The scale sequence of notes was presented to participants five times for familiarisation purposes.

- A subset of the ascending sequence of notes, beginning at the same root position, was played, and participants had to state a number between 1 and 40 which they thought represented the total number of notes played in the subset sequence.

This same procedure was used with two musical instruments (piano and organ), with the chromatic and diatonic scales, and with either one or two subsets of sequences of notes presented. The cultural background of the participants was also taken into account. Thus the experiments involved:

1. One sequence of notes (see section 3.3.1):

   (a) In the Diatonic scale:
       - Using piano with a group of 8 British origin subjects.
       - Using piano with a group of 8 International origin subjects.

   (b) In the Chromatic scale:
       - Using piano with a group of 8 British origin subjects.

2. Two sequences of notes, communicated one after the other with a three-second pause in between (see section 3.3.2):

   (a) In the Diatonic scale:
       - Using piano with a group of 8 British origin subjects for both sequences.
       - Using piano with a group of 8 International origin subjects for both sequences.
       - Using piano (for the first) and organ (for the second) with a group of 8 British origin subjects.
       - Using piano (for the first) and organ (for the second) with a group of 8 International origin subjects.

   (b) In the Chromatic scale:
       - Using piano with a group of 13 British origin subjects for both sequences.
       - Using piano (for the first) and organ (for the second) with a group of 13 British origin subjects.
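The stimuli for these experiments are straightforward to reproduce. The following is a minimal sketch of the stimulus generation, written in present-day Python with the mido MIDI library as a stand-in for the Roland MT-32/Sound Blaster set-up actually used; the note timings are those reported in section 3.3.1.

    import time
    import mido

    MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]   # tone/semitone pattern of the diatonic (major) scale

    def ascending_sequence(length, scale='chromatic', root=40):   # MIDI 40 = E2
        """Return `length` ascending MIDI note numbers starting from `root`."""
        notes, current = [], root
        for i in range(length):
            notes.append(current)
            current += 1 if scale == 'chromatic' else MAJOR_STEPS[i % 7]
        return notes

    def play(port, notes, duration=0.15, gap=0.3):
        for n in notes:
            port.send(mido.Message('note_on', note=n, velocity=80))
            time.sleep(duration)
            port.send(mido.Message('note_off', note=n))
            time.sleep(gap)

    with mido.open_output() as port:
        play(port, ascending_sequence(40))       # the full reference scale
        time.sleep(2.0)
        play(port, ascending_sequence(17))       # a subset communicating the value 17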

All subjects were undergraduate students of Loughborough University. The international subjects were from Europe (1 from France, 1 from Germany, 1 from Spain, 1 from Belgium), Asia (1 from Hong Kong, 1 from Malaysia, 1 from Singapore), and from the USA (1 from Arizona State). None of the subjects had any knowledge of music. This was elicited from a musical questionnaire they were requested to answer. In the following sections, the results of these experiments are presented with a discussion about their potential application in interfaces using non-speech audio. The results are analysed to determine how successfully the representation can communicate an abstract value to the listener. This value, of course, could be a number, a length, a distance, or a value comparison.

3.3.1 One-Sequence

In the one-sequence experiments, the 40-note sequence started at E2 and rose to A5(flat) in the chromatic scale (and similarly, in equivalent notes, following the diatonic scale), using the piano from the Roland MT-32 multiple timbre synthesiser. The notes had a 0.3 second time delay between them. Their duration was 0.15 second each. Subsets of the original sequence of notes represented a particular value within the range 1 to n (n<40). Subjects were asked to identify the number (within 1 to 40) which they believed best corresponded to the subset sequence of notes being played. The subset sequence of notes was played in the same manner as the original sequence. The scale sequence of notes was played first, and then the subset sequence was played three times for each of the values chosen to be tested. Sequences were played in a random order for all groups that participated in the one-sequence experiments, as discussed in section 3.3.

An overall analysis of the error in subjects' perception of the one-sequence results is shown in table 3.1. These results show that accurate perception of the exact number communicated is low, as, of course, one might have expected. The range of error in subset perception (including the extreme values) does not typically pass beyond ±5. However, approximately 60% to 70% of the measurements (varying from group to group and from scale to scale) demonstrated an overall error within ±2. As one can observe, the high errors exhibit very low percentages. The mean, mode and median for table 3.1 are very similar (see Appendix C). Positive correlations were found between all three sets of data (as shown in table 3.2).

Error   Diatonic British   Diatonic Internat.   Chromatic British
 +5     1 (1.1)            1 (1.1)              -
 +4     1 (1.1)            2 (2.2)              3 (3.4)
 +3     3 (3.4)            5 (5.6)              2 (2.2)
 +2     7 (7.9)            13 (14.7)            6 (6.8)
 +1     6 (6.8)            10 (11.3)            9 (10.2)
  0     29 (32.9)          24 (27.2)            31 (35.2)
 -1     12 (13.6)          4 (4.5)              12 (13.6)
 -2     11 (12.5)          8 (9)                12 (13.6)
 -3     6 (6.8)            5 (5.6)              6 (6.8)
 -4     4 (4.5)            6 (6.8)              3 (3.4)
 -5     2 (2.2)            6 (6.8)              3 (3.4)
 -6     2 (2.2)            -                    -
 -7     3 (3.4)            2 (2.2)              1 (1.1)
 -8     2 (2.2)            -                    -

Table 3.1: The overall error, expressed as frequency (%), from the one-sequence experiments in the Diatonic scale for the British and International groups and in the Chromatic scale for the British group, using piano (for all three groups).

In the diatonic scale, the correlation found for the British origin group in one-sequence was relatively high (Pearson's r=0.9973, df=9), significantly exceeding the critical value. No significant difference in correlation was observed for the one-sequence perception of the international origin group in the diatonic scale (Pearson's r=0.9959, df=9), which also exceeded the critical value. The change of scale to Chromatic did not generate a significant difference in subjects' perception (Pearson's r=0.9983, df=9).

Scale       Voice(s)   Group           Pearson's r
Diatonic    Piano      British         0.9973
Diatonic    Piano      International   0.9959
Chromatic   Piano      British         0.9983

Table 3.2: The correlations found against the mean of the perceived values in the Diatonic and Chromatic scales, with df=9, for the one-sequence experiments.

Further statistical analysis of the results indicates that there is no significant difference between British and International participants.

Figure 3.4: A graphical representation showing the frequency of error in the stimuli presented, using the diatonic scale for the British and International groups and the chromatic scale with the British group, in the one-sequence experiment. (Horizontal axis: error in subjects' perception.)

In the one-sequence-of-notes experiment using the diatonic scale, the results showed no statistically significant difference (related t=0.841, df=10, critical value at 0.05 or 5%) under a one-tailed hypothesis. Thus, the null hypothesis of no significant difference is retained, showing that both groups (British and International) derive from the same population. The above results fall in the 95% area of the probability curve.

The results in table 3.1 indicate an overall accuracy. Another important property to examine is whether the accuracy changes over the range of the length. Figure 3.4 shows the frequency of error across the range of the length for the experiments performed with the diatonic scale (British and International subjects) and with the chromatic scale (British subjects). It can be seen that the distribution of frequencies is small for small numbers (e.g., 2, 4, 8, 12) and higher for greater numbers (e.g., 18, 22, 26, 30), in both Diatonic and Chromatic scales and with both British and International subjects. It can also be seen that the extreme values are -7 and +5.

However, the lower and upper limits must also be considered. Subjects knew that the range of the values they had to choose from was between 1 and 40. Thus, no subject could have chosen a number smaller than 1 or greater than 40. The shaded areas in figure 3.4 show these limits. With this in mind, figure 3.4 indicates that sequences communicating the numbers 2 and 4 have a maximum error of ±1, 8 has a maximum error of ±2, and 12 has a range of error between +3 and -4. The rest of the results using one sequence were very similar to the ones shown in figure 3.4.

Figure 3.5: A graphical representation showing the mean and standard deviation of the stimuli presented, using the diatonic scale for the British and International groups and the chromatic scale with the British group, in the one-sequence experiment.

The mean perception, together with its standard deviation over the range of the length, is shown in figure 3.5. It can again be observed in this figure that small numbers have a mean very close to the stimuli presented and have small standard deviations. On the other hand, large numbers have larger standard deviations. However, the standard deviation seems to stabilise. The linear positive correlation is also intuitively obvious. Sequences were presented in a random order.

Thus, if a learning effect had been encountered, the groups would have reflected this by showing poorer results for the first sequences heard and becoming better as more and more sequences were heard. Given that the frequencies for individual stimuli appear to be similar, there is no evidence to support the hypothesis that a significant learning effect has taken place. The differences in the mean perception among the three different groups are small and, as shown above (from the t-test), statistically insignificant. Again, the lower and upper limits of error (as discussed for figure 3.4) must also be considered in viewing this figure. This could, of course, be an explanation for the smaller deviations at the two ends of the scale. However, small numbers (e.g., 2, 3, 4, 5) typically exhibit smaller standard deviation values when compared with large numbers (e.g., 37, 38, 39, 40). Thus one may assume that for small numbers subconscious or conscious counting takes place, and it is for this reason that subjects' perception is more accurate. Clearly, for large numbers (e.g., 40) any form of counting is difficult, and thus larger standard deviations are observed.

3.3.2 Two-Sequences

The results of the two sequences (i.e., communicating two numbers at a time) are similar to the results observed in the one-sequence experiments discussed in section 3.3.1. The overall error across the scale is shown in table 3.3. Figure 3.6 shows the frequency of error in the first sequence of the experiments with two sequences in the Diatonic (British and International) and Chromatic (British) scales using piano. Figure 3.7 shows perception for the second sequence in the same experiments. All other results obtained were very similar to the ones presented in these figures (see Appendix C). It can be seen in both figures that the spread of the frequency distribution is narrower for small numbers (e.g., 3, 6, 9) and wider for larger numbers (e.g., 21, 24, 27). The shaded areas show the lower and upper limits of error, and thus of the distribution, similarly to the shaded areas in figure 3.4 of section 3.3.1. It can be observed that ±2 is a typical error for the numbers 3, 6, 9, 12, and 15. However, the error increases to typically ±5 for the rest of the numbers (see figure 3.6). Similar observations can be made for the second sequence, shown in figure 3.7. The means and standard deviations for the first and second sequences are plotted in figures 3.8 and 3.9, using piano for the first sequence and organ for the second sequence, for all groups presented. It can be clearly seen that the response to the second sequence is similar to the response to the first sequence.
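A two-sequence trial can be scripted in the same way as the one-sequence stimuli: two ascending subsets from the same root, separated by the three-second pause, with an optional timbre change for the second subset. Again this is a hedged Python/mido sketch; General MIDI programme 0 (piano) and 19 (church organ) stand in for the synthesiser voices actually used.

    import time
    import mido

    def play_subset(port, length, program, root=40, duration=0.15, gap=0.3):
        """Play an ascending chromatic subset of `length` notes on `program`."""
        port.send(mido.Message('program_change', program=program))
        for n in range(root, root + length):
            port.send(mido.Message('note_on', note=n, velocity=80))
            time.sleep(duration)
            port.send(mido.Message('note_off', note=n))
            time.sleep(gap)

    with mido.open_output() as port:
        play_subset(port, 12, program=0)     # first value, on piano
        time.sleep(3.0)                      # the three-second pause
        play_subset(port, 27, program=19)    # second value, on organ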

Figure 3.6: The frequency of error in perception of the first sequence in the two-sequences experiments using piano. The results shown are from the Diatonic scale with British and International subjects, and the Chromatic scale with British subjects.

Figure 3.7: The frequency of error in perception of the second sequence in the two-sequences experiments using piano. The results shown are from the Diatonic scale with British and International subjects, and the Chromatic scale with British subjects.

Columns: (1) British, 1st seq., piano; (2) British, 2nd seq., piano; (3) British, 1st seq., piano; (4) British, 2nd seq., organ; (5) International, 1st seq., piano; (6) International, 2nd seq., piano; (7) International, 1st seq., piano; (8) International, 2nd seq., organ. Entries are frequency (%).

Error   (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
 +9     1 (1.3)    1 (1.3)    -          -          -          -          -          -
 +8     -          -          -          -          1 (1.3)    1 (1.3)    1 (1.3)    1 (1.3)
 +7     -          -          -          -          -          1 (1.3)    1 (1.3)    1 (1.3)
 +6     1 (1.3)    1 (1.3)    1 (1.3)    -          1 (1.3)    2 (2.2)    1 (1.3)    3 (3.4)
 +5     1 (1.3)    1 (1.3)    2 (2.2)    2 (2.2)    2 (2.2)    3 (3.4)    4 (4.5)    1 (1.3)
 +4     4 (4.5)    3 (3.4)    3 (3.4)    3 (3.4)    1 (1.3)    3 (3.4)    2 (2.2)    3 (3.4)
 +3     3 (3.4)    4 (4.5)    5 (5.6)    4 (4.5)    3 (3.4)    2 (2.2)    3 (3.4)    3 (3.4)
 +2     3 (3.4)    8 (9)      5 (5.6)    9 (10.2)   8 (9)      10 (11.3)  5 (5.6)    7 (7.9)
 +1     10 (11.3)  6 (6.8)    9 (10.2)   8 (9)      15 (17)    8 (9)      11 (12.5)  10 (11.3)
  0     23 (26.1)  28 (31.8)  26 (29.5)  31 (35.2)  21 (23.8)  28 (31.8)  24 (27.2)  29 (32.9)
 -1     13 (14.7)  10 (11.3)  16 (18.1)  13 (14.7)  9 (10.2)   8 (9)      12 (13.6)  5 (5.6)
 -2     9 (10.2)   12 (13.6)  8 (9)      7 (7.9)    12 (13.6)  9 (10.2)   11 (12.5)  11 (12.5)
 -3     8 (9)      3 (3.4)    5 (5.6)    6 (6.8)    5 (5.6)    7 (7.9)    4 (4.5)    5 (5.6)
 -4     4 (4.5)    3 (3.4)    4 (4.5)    3 (3.4)    4 (4.5)    2 (2.2)    4 (4.5)    5 (5.6)
 -5     4 (4.5)    3 (3.4)    2 (2.2)    1 (1.3)    1 (1.3)    3 (3.4)    2 (2.2)    3 (3.4)
 -6     2 (2.2)    3 (3.4)    1 (1.3)    1 (1.3)    2 (2.2)    -          1 (1.3)    -
 -7     1 (1.3)    1 (1.3)    1 (1.3)    -          3 (3.4)    1 (1.3)    1 (1.3)    1 (1.3)
 -8     1 (1.3)    1 (1.3)    -          -          1 (1.3)    -          1 (1.3)    -

Table 3.3: The frequency of errors and percentages of the results with two sequences of notes.

Once again, standard deviations are smaller for small numbers but higher for larger numbers. The upper and lower limits must be considered in viewing these figures, too. It does not appear from the data in these figures that there is any significant difference between the Diatonic and Chromatic scales in the perception of the sequences. The change of musical scale, from diatonic to chromatic, did not generate significant differences in the correlations. In particular, the correlation found in one-sequence using the chromatic scale and piano (Pearson's r=0.9983, df=9) was not different from the association found in the first sequence of the two-sequences group (r=0.9987), and both of them are very similar to the corresponding correlations found in the diatonic scale, as can be seen in tables C.3 and C.4.

93 " " ", I: A- j: 1. ".-- Ili.".." Brilllh..,., Hl--._-.. i...-- CIton>tIIo llritish Figure 3.8: A graphical representation showing the mean and ± the standard deviation of the stimuli presented using diatonic scale for British and International groups and Chromatic scale with British in the first sequence. 11 ~~ li1 d if I n1 1 "" I.'.. ~"~n"nl.,,""~a~n~n a~.nnn~ nm r--j)j~ Ibh ",~ Koyo: IJI- DI~ hernlliciiiii ~Owmol!tllri"'h Figure 3.9: A graphical representation showing the mean and ± standard deviation of the stimuli presented using diatonic scale for British and International groups and Chromatic scale with British group in the second sequence. 79

Sequence(s)   Voice(s)       Group           Pearson's r
One           Piano          British         0.9973
One           Piano          International   0.9959
Two: first    Piano, Piano   British
Two: second   Piano, Piano   British
Two: first    Piano, Piano   International
Two: second   Piano, Piano   International
Two: first    Piano, Organ   British
Two: second   Piano, Organ   British
Two: first    Piano, Organ   International
Two: second   Piano, Organ   International

Table 3.4: The correlations found against the mean of the perceived values in the diatonic scale, with df=9.

Sequence(s)   Voice(s)       Group     Pearson's r
One           Piano          British   0.9983
Two: first    Piano, Piano   British   0.9987
Two: second   Piano, Piano   British   0.8089
Two: first    Piano, Organ   British
Two: second   Piano, Organ   British

Table 3.5: The correlations found against the mean of the perceived values in the chromatic scale, with df=9.

However, in the second sequence (of the two sequences using piano), a smaller correlation was found (r=0.8089). Similarly, strong associations between stimuli and perception were found for both sequences even when one of the instruments was changed in the two sequences (a different group of participants, with piano for the first sequence and organ for the second; df=9).

Similarly to the experiments with one sequence, no statistically significant differences were found between the British and International groups in the perception of the first of the two sequences (t=0.615, df=10, critical value at 0.05 or 5%), or of the second sequence (t=1.496, df=10, critical value at 0.05 or 5%), using piano. No difference was demonstrated in the two sequences using piano and organ either (t=0.288 for piano, t=1.443 for organ, both with df=10 at 0.05 or 5%). Further statistical tests performed to identify any differences in the perception of note sequences between the diatonic and chromatic scales also showed that there is no statistically significant difference. More specifically, no significant differences were found between the diatonic and chromatic musical scales in one sequence using piano (t=0.029, df=20, critical value at 0.05 or 5%), in two sequences using piano (t=0.012 for the first and t=0.093 for the second sequence, df=31, critical value at 0.05 or 5%), and also in two sequences using piano and organ (t=0.016 for the piano and t=0.172 for the organ sequences, df=31, critical value at 0.05 or 5%).
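For reference, the two statistics used throughout this section are easy to recompute. The sketch below applies Pearson's r to (stimulus, mean perceived value) pairs and a related (paired) t-test to two groups' mean errors, using SciPy; the data values are invented for illustration and are not the thesis measurements.

    from scipy.stats import pearsonr, ttest_rel

    # Hypothetical stimuli and one group's mean perceived values (11 stimuli,
    # so df = 11 - 2 = 9 for the correlation, matching the tables above).
    stimuli        = [2, 4, 8, 12, 18, 22, 26, 30, 34, 38, 40]
    mean_perceived = [2.1, 4.0, 8.3, 11.6, 17.2, 21.5, 25.1, 29.4, 33.0, 36.8, 38.9]

    r, p = pearsonr(stimuli, mean_perceived)
    print(f"Pearson's r = {r:.4f} (df = {len(stimuli) - 2}, p = {p:.4g})")

    # Hypothetical per-stimulus mean errors for two groups; a related t-test
    # over 11 paired observations gives df = 10, as reported in the text.
    british       = [0.2, -0.4, 0.5, -1.0, 0.8, -0.3, 1.1, -0.6, 0.4, -0.2, 0.1]
    international = [0.3, -0.5, 0.9, -0.8, 1.2, -0.1, 0.7, -0.9, 0.6, -0.4, 0.2]

    t, p = ttest_rel(british, international)
    print(f"related t = {t:.3f} (df = {len(british) - 1}, p = {p:.4g})")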

Figure 3.10: An overall graphical representation of the percentages of error in the stimuli presented to subjects, irrespective of scale or timbre. (Horizontal axis: percentage of error in subjects' perception.)

Given that no significant variation was found among the various groups used under different conditions, figure 3.10 shows the overall percentages of the error frequency in the stimuli presented to all groups. It can be seen that the spread is smaller (and near 0 error) for small numbers, and becomes wider for larger numbers. Typically, 50% to 70% of the data is within ±2.

3.3.3 Discussion

There are some general comments which one can make. The results showed a typical spread within a range of ±5. However, small numbers (e.g., 3, 5, 7) showed a lower error range but, at the same time, they were easier to count. Considering the results, an obvious question concerns their usefulness and their applicability for communicating using music. First of all, one must consider the objective of these experiments, which is to investigate whether simple note sequences taken from the musical scales (i.e., chromatic or diatonic) can communicate an approximate value (e.g., a number, length or distance) to a listener. Why do we use the term approximate? It is used because the interest here is not in the exact value. If an exact value is required, then there is no obvious reason why speech should not be used, or why the notes should not be structured in such a way that they can be counted. However, there are a number of circumstances where exact values may not be the required information.

Humans usually 'subjectively appreciate' in the process of forming their knowledge. For example, we do not go around constantly checking our watches, but we do have an approximate appreciation of the elapsed time which has occurred. When a train passes through a station at speed, we obtain a 'subjective value' of its length. Usually, in none of these cases do people actually measure. When, for example, we view a diagram visually, we do not take a ruler and start accurately measuring the dimensions of the objects involved. We appreciate dimensions and compare them with the dimensions of the other objects involved in the diagram.

The results discussed above show that ascending pitch can be used as a metaphor to communicate approximate information about numbers, lengths and distances, and can therefore be applied in communicating graphical information about approximate positions within a two-dimensional space. This provides some basis for experimenting with an audio graphical co-ordinate system, but also for using (perhaps smaller) sequences to communicate other information in interfaces.

There is an obvious criticism of these experiments. One can argue that the strong correlations obtained arose just because the participants in the experiments counted the notes. Although there was an instruction to the participants not to count, and the sequences of notes were rapid enough to make counting difficult, there is no proof that some counting did not take place. However, by considering the results, and especially the error distributions, one can clearly comment that if participants were counting, then their counting was not always very accurate; and in any event, counting up to 40 notes at a frequency of two per second is simply not possible. The presentation frequency has to be this rapid, or else the time taken to communicate a length would be unacceptably long.

3.4 Musical Instruments Experiments

Another important research question for a designer who wishes to use music in an interface is the choice of musical voices or instruments to produce the music. Using a number of musical instruments is obviously advantageous for communication purposes, but what capabilities does the average listener have? Can he or she tell a bassoon from a piano, or a French horn from an organ? It is likely that complex problem domains will require the designer to use distinct voices (different musical instruments). More specifically, we need to know:

- Are there any musical instruments which non-musically-educated listeners can identify by name when heard?

- If so, how large a set of musical instruments can a designer use and be reasonably confident that participants will perceive them?

It must be noted here that there is a fundamental difference between memory recall and recognition. This thesis refers to the process of memory recall as the ability of a user to draw from memory the name of the instrument being heard. The process of recognition, in contrast, refers to a user's ability to recognise the musical instrument being heard from a set of musical instruments whose names are provided to the listener. In order to investigate this, the following was done:

1. An initial survey asked 100 participants to specify which instruments they thought they could recognise if heard.

2. A recall experiment with 23 instruments and no training determined the success rate of people recalling from memory the name of the instrument being heard.

3. A recognition experiment with five instruments and no training determined the success rate of people recognising an instrument from a set of names provided to the participants.

4. A recall experiment aided by an instrument dictionary determined whether such an aid enabled users to perform successful mappings.

We first carried out a survey of 100 people, who were asked the following question: Which musical instruments do you think you will be able to confidently recognise by name if you hear them? The results of the survey, shown in table 3.6, show how many people were confident that they could recognise certain instruments (participants were able to consult a list of instruments). The results of this survey, as shown in table 3.6, are no more than an indication of what people believe that they will recognise. People were not tested with the musical stimuli of each instrument, and thus some experimentation to explore this topic further was required. However, another way to look at the results of this survey is that they indicate the relative popularity of the musical instruments shown. Although one cannot argue that people will definitely recognise these instruments, it is apparent that the ones at the top of the list must be the most popular and widely heard ones.

Instrument     % confidence     Instrument     % confidence
Guitar (A)     96               Xylophone      30
Piano          94               Trombone       30
Timpani        83               Pan Pipes      27
Violin         82               Tuba           20
Saxophone      76               Oboe           17
Flute          55               Piccolo        16
Harp           53               Fr. Horn       13
Trumpet        42               Bassoon        10
Castanets      40               Celesta         3
Cello          37               Contrabass      3
Harmonica      37               Harpsichord     2
Organ          35               Mandolin        1
Clarinet       34               Engl. Horn      1

Table 3.6: The results of the instrument survey, in percentages.

One might assume that the success rate of instrument recognition by non-musically-trained listeners should be associated with the instruments' popularity. We then constructed a timbre recall experiment in which 16 subjects were asked to recall from their own knowledge the name of an instrument being played. A short tune (8 notes) was played using the normal range of each instrument, and the participants were given 20 seconds to consider their answer. They then wrote down on an answer sheet the instrument they thought they had just heard. The Roland MT-32 multiple timbre synthesiser was used, driven from the Sound Blaster Card, SBK and MIDI. The results, as shown in figure 3.11, demonstrated that the successful recall rate was not uniform, and certainly not as good as subjects had predicted in the previous experiment. Some instruments were almost uniquely recognised, for example xylophone, drums and organ. Others had reasonably good recognition rates (up to 50%). Note that there are major differences between what subjects thought they could recognise (see the previous table) and what they actually recognised. If we redraw the previous figure, grouping instruments together into their recognised families, we get table 3.7.

Figure 3.11: Results of the recall experiment with instruments. Numbers in boxes show the frequency of recalls. For example, when the piano was heard, 8 subjects recalled it successfully as a piano, 1 subject recalled it as a harp, 2 subjects recalled it as a guitar, 2 subjects recalled it as an organ, 2 subjects recalled it as a celesta, and 1 subject recalled it as a tuba.

Table 3.7: The results of the recall experiment organised into the families piano, organ, wind, woodwind, drums and strings (a confusion matrix with these families as both rows and columns). The numbers show the frequency of recalls.

The families used were:

- Piano: piano, harp, guitar, celesta, and xylophone (though musically this is considered to be percussion).
- Organ: organ and harmonica.
- Wind: trumpet, French horn, tuba, trombone, and saxophone.
- Woodwind: clarinet, English horn, pan pipes, piccolo, oboe, bassoon, and flute.
- Strings: violin, cello, and bass.
- Drums: drums.

Therefore the results shown in figure 3.11 can be presented in terms of the above families. This is shown in figure 3.12.

Figure 3.12: Results of the recall experiment with instruments grouped into the families piano, organ, wind, woodwind, strings, and drums. Numbers in boxes show the frequency of recalls.

These results (which basically follow music orchestration rules) indicate that an experimenter should choose instruments from different families to avoid confusion. Piano, organ, xylophone and drums seem to be particularly well distinguished. In another experiment, recognition of musical instruments from a small set of instruments was tested. In this experiment, subjects heard a short tune from an instrument and then had to choose the one being heard from a list of five (in the style of a multiple choice question).
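The 'choose from different families' guideline is easy to operationalise in an interface. The sketch below pairs each of the six families with one General MIDI programme and hands out mutually distinct timbres on demand; the specific programme numbers are illustrative choices, not assignments made in the thesis (and General MIDI percussion is addressed via channel 10 rather than a programme slot).

    # One representative General MIDI programme (0-indexed) per family.
    FAMILY_PROGRAMME = {
        'piano':    0,     # Acoustic Grand Piano
        'organ':    19,    # Church Organ
        'wind':     56,    # Trumpet
        'woodwind': 71,    # Clarinet
        'strings':  40,    # Violin
        'drums':    None,  # GM percussion lives on MIDI channel 10, not a programme
    }

    def distinct_timbres(n):
        """Pick up to six mutually distinguishable timbres, one per family."""
        if n > len(FAMILY_PROGRAMME):
            raise ValueError('more information types than reliably distinct timbres')
        return dict(list(FAMILY_PROGRAMME.items())[:n])

    # e.g. three information types -> piano, organ and trumpet
    print(distinct_timbres(3))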

Figure 3.13: The results of the recognition experiment.

The successful recognition rates, in percentages, are graphically shown in figure 3.13. They were 81.2% for piano, 87.5% for guitar, 100% for drums, 87.5% for violin, 93.7% for saxophone, 87.5% for flute, and 31.2% for harp. These experiments showed that recognition from a small set of possibilities (as opposed to recall) has higher success rates (the poor performance of the harp is difficult to explain, but one must take into consideration that the quality of the synthesiser's output may play an important role in the recognition of an instrument). The instruments chosen were one from each family, as specified above, except that two were from the piano family. These were piano and harp which, as can be seen in figure 3.13, confused the subjects.

In a further experiment, 10 participants were presented with a dictionary of instruments. The dictionary had entries of the following form for 10 instruments: This is <the name of instrument> (a short tune played). The dictionary was presented five times as described above. On completion of the dictionary being heard, participants had to write the name of the instrument for every succeeding tune heard. The instruments chosen to be tested were piccolo, flute, pan pipes, clarinet, tuba, harmonica, trumpet, cello, celeste and violin. Some of the instruments had been selected because of their similarity, and some because of their distinctness, in order to examine the capability of participants in making successful comparisons and matchings after a short conditioning process. The results of this experiment are shown in figure 3.14. The percentages of successful recalls were 70% for piccolo, 60% for flute, 100% for pan pipes, 100% for clarinet, 80% for

tuba, 60% for harmonica, 70% for trumpet, 60% for cello, 100% for celeste, and 90% for violin.

Figure 3.14: The results of recognising instruments with the aid of an instrument dictionary.

3.4.1 Discussion

This investigation into the recognition and recall of musical instruments has shown the following:

1. When a considerable number (say 10) of musical instruments are used, even if they are distinctive, listeners do not recall the names of all the instruments. Recall is poor. For example, if 10 instruments are used to communicate 10 different types of information, then the risk that the user will confuse instruments is very high. It was demonstrated that it is very difficult for users to make decisions recalling instrument names when they have to choose from a large population of instruments.

2. When using a smaller number of instruments (say 5 or fewer), the success rate in recognition experiments is greatly improved. The listener can recognise the instruments better.

We have found that if instruments are chosen from the six (6) families created from our experimental data, then recognition or recall will be higher.

Instruments from the same family (the families introduced earlier) must be avoided, although if training is offered to the listener then some instruments from the same family may be recognised.

It was also shown that an introductory demonstration of a musical instrument's timbre contributes very significantly to the successful recall of musical instruments. The dictionary of musical instruments was found to be an important aid in assisting listeners to make correct recalls. In addition, there are a number of other advantages when one uses a dictionary of instruments:

1. The listeners use the dictionary of instruments as a reference basis for comparison and recall, as opposed to relying on their memory of previous experience with timbres.

2. The dictionary of instruments uses the same synthesiser (internal or external) and, thus, any distortion in a musical instrument is reflected uniformly in the dictionary as well as in the musical output.

3. The need for the instrument dictionary, although necessary at the beginning, gradually diminishes as the listener becomes more experienced.

The use of a dictionary is suggested for circumstances where a large number of instruments needs to be used, or when instruments belong to the same family and may sound alike to an untrained ear.

3.5 Stereo Perception

Experimentation with panning, which is a physical characteristic of sound presentation to the ears, was also pursued in the interest of identifying the accuracy of spatial location perception, not only in the left, right and middle panning positions of the speakers, but also in intermediate ones. Panning is important because it may enhance musical messages by making them more distinguishable and clear, as well as providing an additional property for mapping. Panning can be presented either through headphones or speakers². The experiments with panning described below used speakers. It is assumed that any panning characteristics perceived from speakers will also be perceived equally well, if not better, using headphones.

²One must consider that speakers are usually located on the left and right of the VDU on a multi-media computer workstation. The physical position of a computer user is in the centre and, thus, there is a technical precondition to investigate panning.

Twelve participants were involved in this experiment. Two speakers were used, each located 50 centimetres (cm) away from the participant (the experiment was carried out with one participant at a time), as shown in figure 3.15.

Figure 3.15: The way in which participants were tested in the spatial location experiment.

In MIDI, the distance from one speaker to the other is 120. This means that 60 produces an equal output from the two speakers. Any other number between 0 and 120 will balance the output of the speakers appropriately. The positions examined in the experiment were 120, 0, 60, 30, 90, 15 and 105. Participants were asked to mark a position between 0 and 120. Answers were accepted as correct when they were within plus or minus 5 of the correct answer. The subjects had to make a decision about the spatial location in the absence of any reference or comparison point. Thus, no continuous musical stimulus was used to designate different positions.

The results showed that the left, right and middle spatial locations were successfully recalled by all the participants. The percentages of success for the rest of the spatial locations were 58% for position 30, 66% for 90, 25% for 15 and 33% for position 105. In conclusion, this demonstrates that one can very confidently use right, left and middle as spatial location indicators in an interface, without having to provide any reference point against which the listener has to compare the presented spatial location. Thus one could use these three major channels of physical separation, using the right speaker to communicate information of one sort, the left speaker to communicate another, and the middle speaker for yet other information. Other spatial locations are not so confidently perceived.
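The pan arithmetic used in this experiment is simple enough to state as code. The sketch below (Pascal again, with illustrative function names) assumes a simple linear balance between the two speakers, which is an idealisation of what a synthesiser actually does, and also shows the plus-or-minus-5 scoring rule applied to the participants' answers:

function RightLevel(pan: integer): real;
begin
   { Fraction of the output sent to the right speaker, assuming a
     linear balance over the 0..120 MIDI pan range (0 = fully left,
     60 = equal output, 120 = fully right). }
   RightLevel := pan / 120.0
end;

function LeftLevel(pan: integer): real;
begin
   LeftLevel := 1.0 - pan / 120.0
end;

function AnswerCorrect(presented, answered: integer): boolean;
begin
   { An answer was accepted when within plus or minus 5 of the
     presented position. }
   AnswerCorrect := abs(presented - answered) <= 5
end;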

Figure 3.16: Results of perceiving spatial location (positions of stereophony in MIDI terms against successful recall rates in percentages).

3.6 Communicating with Music

The results of the previous sections and the usage of the existing guidelines (although these may not be strictly followed, in order to experimentally probe new structures and techniques) can now be used to map real problem domains. Real problem domain experimentation is necessary for an empirical investigation of this sort, since the perception of music is not simply the summed perception of its individual elements (the experiments conducted so far hardly use real music in terms of continuous stimuli, such as we will see applied in the communication of some aspects of the execution of the Bubble Sort, section 3.8). More experimentation with real problem domains, as opposed to ad-hoc experiments, will help to further establish and enhance our understanding of how to use music in interfaces.

There are also a number of concepts which need to be raised and at least partially researched in this thesis. Firstly, there is a need for research in defining the perceptual musical building blocks (meaningful objects) which are easily understood in the context of a user interface or other software, and rules for combining them with each other, not only so that they make sense to the user from the perceptual point of view but also so that they best represent the properties and the underlying meaning which need to be exposed from the problem domain. If the mapping between the problem domain and the musical representation is musically meaningful then the meaning behind a musical message may have a greater chance of being commonly understood and interpreted.

Secondly, in order to audiolise or musicalise an algorithm or a user interface software component, the operation and functionality of the problem domain must be understood. The underlying functional concepts of the problem domain need to be examined in detail and decomposed into smaller units until an overall list of primitive simple functions is completed. By careful examination of this functional list, one and only one meaning associated with each subdivided concept, action or task can be identified. This technique can be thought of as similar to the well known structured programming techniques using top-down or bottom-up methodologies, or to more recent object-oriented methodologies in software engineering. When a designer has done this and has successfully decomposed the problem domain, the different events or characteristics of the problem domain need to be matched with musical elements and structures. Of course, not everything can be matched in music or audio. This is because different media serve different purposes. It is beyond the research scope of this thesis to discuss the suitability of different media for different problem domains. Thus, the objective here is to match a problem domain with music as far as possible and then investigate its applicability.

The matching of algorithmic events with music must follow certain perceptual rules in the usage of musical events. An algorithmic event would, for example, be the execution of a single statement. A musical event might be, for example, the sound of three consecutive notes, or even a single note in one of the musical scales (i.e., chromatic or diatonic). Single events in isolation, either in user interfaces or in the music domain, may not necessarily make sense. Usually, events make sense only when they are considered in relation to each other. For example, when a user interface opens a window with no text in it, this does not make sense unless the user has made a prior explicit request for it. In the same manner, when a musical note is heard in isolation, it does not make perceptual sense unless a user or listener expects it, or by interpreting it receives new information which justifies the presentation of that particular musical stimulus.

The approach to matching the problem domain (events, characteristics) with music can be attacked from different angles:

- From the user perspective. The identification of events which are required to be understood by the user.

- From the problem domain perspective. The identification of the conceptual properties which need to be exported and then matched with the music.

- From the communication media perspective. This involves consideration of the best medium to communicate the information.

Theoretically, any matching process will have to consult all the perspectives mentioned above in order to be a successful metaphor which delivers its intended message from the problem domain, through the commonly understood semantics and structures of music, to the perceptual expectations of the user. If a matching does this then it is a successful one. The next sections exercise these issues by examining the practical application of music to communicate some types of contents of a software engineering database, and aspects of a sorting algorithm (the Bubble Sort algorithm).

3.7 Reducing Visual Complexity

One application of music is its employment in complex and overcrowded visual presentations. This is particularly important when the display 'space' available to represent concepts of the problem domain is limited. In these circumstances, music can be considered as an alternative way of conveying some information to the user, and thus making the visual display more readable. If two channels are used to convey information to the user then the information passed through each channel needs to be distributed appropriately. One way to produce a distribution is to ask the following question: "What are the groups or bits of information which, when used in a visual presentation, produce a complex and overcrowded display?" The answer to this question will obviously depend on the particular information characteristics of the application in question. This process of reducing information from the visual presentation will accumulate information filtered from the visual presentation (at least for each particular display instance). There are a number of ways in which one can deal with the filtered information. These are:

1. Hide the information from the user by keeping the user unaware of its existence or its content.

2. Present the information to the user in another visual instance, at the cost of hiding some other displayed information and of more effort to access it. The user will have to relate information presented in one visual instance with information presented in another, especially when explicit or implicit relationships exist between different sets of data.

3. Allow the user to make a selection of what is going to be filtered out from the visual presentation. This presupposes that the user has a sufficient degree of awareness of the relative importance of the information which is going to be presented.

4. Present the filtered-out information in another medium. This extra medium will not present information redundantly, but only the information which is filtered out from the visual presentation. In this way, the user will have an alternative source for the content and quantity of the filtered information.

One example of a heavily crowded information display system is the PCTE (Portable Common Tool Environment) system. The author of this thesis has personal research experience in investigating possible ways of eliminating and filtering visual information from this system which, if presented unfettered, would be unreadable except to the really experienced user. The problem domain of that research was to produce a readable browsing facility for the contents of the OMS (Object Management System) of PCTE. This research led to an MPhil award from the University of Wales, Aberystwyth [178]. In the next sections, the software engineering application of PCTE will be briefly reviewed and the problems of its overcrowded visual displays will be highlighted and discussed, raising questions of how an alternative auditory-musical output can help in presenting some information to the user.

The PCTE Object Management System

The Portable Common Tool Environment (PCTE) is an attempt to provide a large software system which will enable the development of state-of-the-art Software Engineering Environments (SEEs) [179, 180]. The environment was developed within the software engineering activities of the first phase of the ESPRIT software technology programme. Six European companies were involved. These were BULL (France), GEC and ICL (United Kingdom), NIXDORF and SIEMENS (Germany), and OLIVETTI (Italy)³.

³For illustration purposes, the original problem domain of PCTE version 1.5 is considered, but further versions have since been introduced (ECMA PCTE [181, 182]).

PCTE is structured into the following three components:

1. An Object Management System (OMS), which provides data storage services.

2. The User Interface, which provides a powerful windowing system.

3. A Distribution System, which allows transparent distribution of facilities and resources across workstations.

The reader need only know about the Object Management System. OMS manages the information of PCTE using an E-R (Entity-Relationship) model. It offers uniform management of objects corresponding to a number of different levels of abstraction. One of the most powerful facilities offered by OMS is a set of mechanisms which allows users, and PCTE itself, to designate objects and storage in a manner where all their properties, as well as their interrelationships, are meaningfully managed. Every object has a set of attributes and a set of relationships connecting it to other objects within the information base. In short, the functionality of OMS covers the handling of unstructured data as a uniform graph of objects and links (i.e., relationships with other objects), the evolution of a database schema, a set of integrity constraints, and version management facilities.

Every information object in OMS has an object type and a definition of properties. There are some common characteristics of OMS objects which belong to the same type. Some of these are:

1. Attributes. These are qualifying objects associated with an object or link. They determine the destination object of a link. They are defined by a name, a value type or an initial value. There are also system attributes, which are:

(a) Owner, which specifies the user identification of the process containing this object.

(b) Group, which specifies the group identification of the process containing this object.

(c) Mode, which specifies a sequence of bits giving the user read, write or execute access permissions on the object.

2. Links. These are all the object relationships. A link connects an origin object to a destination object. Any object can have up to thirty-two (32) links to other objects across OMS. An object link is characterised by the following:

(a) Name, which is a string defining the link.

(b) Cardinality degree.

(c) Set of originating objects.

(d) Attribute types.

(e) Stability property, which determines whether the destination object can be updated or deleted.

(f) Category. There are composition, reference and implicit categories:

i. Composition. This category is assigned to a link type if an object must be created with a link starting from an origin object and leading to a new object (a one-way link from the object).

ii. Reference. This category is assigned to a link type if a pre-existing object must be referenced with a link of this type by another object (a one-way link to the object).

iii. Implicit. This category is assigned to a link type if a reverse link of a relationship type must be created (a two-way link).

The above elements are a subset of the types found in OMS, and some of them will be used in the experiment documented in this section. In an attempt to browse this database, we can get reasonably readable displays, such as the ones shown in figures 3.17 and 3.18, and complex (or to a certain extent unreadable) displays, such as the one shown in figure 3.19. However, for a reasonable task (remember this is a software engineering database), quite a few OMS objects and links are required to be viewed at the same instant. The displays shown in the figures illustrate that the greater the number of objects and links, the greater the presented complexity and, subsequently, the incomprehensibility.

PCTE and OMS are mainly used as a basis for the development of tools and in the production line of systems in software engineering environments. The objective of the experiment is to see if software engineering researchers can be helped by the use of music as a communication metaphor in displaying OMS contents. If the experimentation in using music to represent certain types of OMS contents is successful and, thus, the use of music in this problem domain is found to be feasible, the following advantages will result:

1. A way of reducing the complexity of the visual display.

2. Information which is filtered and not presented to visual browsers of OMS will not be lost or missed, because it can be auralised musically.

3. Users can be visually free and still make searches in which information is presented abstractly using music. Visual attention can be utilised when the required OMS objects are found.

Figure 3.17: An overview of the tool running [178].

Figure 3.18: Readable displays of PCTE OMS contents.

Figure 3.19: A very complex, and to a certain extent unreadable, detail display of PCTE OMS contents.

4. Quicker interaction will be possible, due to the fact that different information is presented using the visual and audio channels simultaneously. It will take longer to retrieve the same information using only the visual channel, because that will require a number of display instances in the browsing task.

5. In the event of user confusion, the user can double-check and confirm visually what is being heard.

Usually, music presented in such domains will play an auxiliary role to the visual presentation. The purpose of the musical messages is to communicate some information which otherwise would not have been presented to the user, because of the filtering needed to keep the visual presentation comprehensible. However, in the experiment documented below, music alone is used to communicate PCTE OMS related information.

Information to be Communicated and Mapping

A brief introduction showing some of the PCTE OMS complexity was given above. In this section, the experimental objectives, the sample data to be used, and the design approaches of the musical mappings are discussed. The aim of this experiment is to investigate the perceptual effectiveness of different forms of musical mappings and to learn lessons. In this particular experiment, the possibility of using music to communicate PCTE OMS related contents (at least at a preliminary level) is also addressed.

The object types chosen for musical representation were file, pipe and queue. Three subdivisions of the type file were also chosen: the subtypes source, compiled and executable. In addition, different numbers of links (maximum 32) associated with each of the above object types or subtypes (for the file type) were chosen to be communicated (the exact numbers are given below). Link types to be communicated were one-to-one, one-to-many, and many-to-many, as well as composition, implicit and reference types. Questions which can be asked about the design of the musical mapping of the above data are:

1. What are the smallest possible meaningful entities which can be identified in this data?

2. What are the subtypes within the types of data which need to be communicated?

3. Are there any relationships between the different types of information which need to be communicated?

There are a number of issues which need to be considered in the design of the musical mappings to communicate the above data. One way is to assume that only one piece of information will be communicated at any given time. For example, it could be an object type, a link type or a number of links associated with an object. This means that the listener will have to listen to each musical message individually and then associate an object type communicated by one message with its corresponding number of links communicated by another message. For example, supposing an object of type queue with (say) 27 links needs to be communicated, then two musical messages will communicate it, one message for the object type and another message for the associated number of links.

However, another way the designer can approach the musical mapping is to integrate the communication of the object type with its corresponding number of links in one musical message. Integration does not mean playing the two musical messages one after another in sequence, but designing the musical mapping in such a way that one can interpret the object type and the number of associated links from the same musical message. This means that the musical message will simultaneously communicate the object type and its corresponding number of links. The listener will be involved in understanding, interpreting and reasoning about the object type and links simultaneously. This method, if applicable, will require less listening time, because there is only one message to be played as opposed to two. It also involves active listening, because the user (listener) does not passively decode information communicated using music but reasons about the information heard.

Considering the above points, the musical mappings chosen for the investigation purposes of this experiment will be both single communication (e.g., one object type or link type at a time) and integrated (object and link types and an abstract number of links), in order to allow both to be investigated. Therefore, the object type, its corresponding number of links and their types (i.e., composition, reference and implicit) will be communicated in one single message. The link types one-to-one, one-to-many, and many-to-many will be communicated in single messages. The principles underlying the design of the musical messages used to communicate this information will vary for investigation purposes.

The object of type file is generically mapped with 32 notes (equal to the maximum number of links), which are played with accelerating timing and ascending pitch in rhythmic groups of two notes.

The time delay among the groups of notes decreases gradually: the delay is initially 2 seconds, and it is divided by 1, 2, and so on up to 16 prior to the output of each successive group of two notes. There are three pitch positions from which the notes start. These are the low pitch position (starting at MIDI notation 40, or E2) for the subtype source, the middle position (starting at MIDI notation 52, or E3) for the subtype compiled, and the high position (starting at MIDI notation 64, or E4) for the subtype executable. All notes follow the chromatic scale intervals and their stereophonic position is the middle (i.e., equal output from both speakers, MIDI notation 60).

The object of type pipe is mapped into two groups of notes (the second group being a reversed repetition) separated by a pause. The first group starts in the left speaker and then gradually (in terms of stereophony) moves to the middle (MIDI notation 60). Up to 32 notes can be used, depending on the number of links associated with the pipe object. The second group is a repetition of the first set of notes but in reverse order, starting from the middle (MIDI notation 60) and gradually moving to the right speaker as the notes are played in reverse. For example, if an object has 16 links, the two groups will consist of 16 notes each. The 16 notes will be distributed evenly from the left speaker to the middle, and again in reverse pitch from the middle to the right speaker. The listener therefore hears an inverted 'V' shape, with the number of links being communicated through the number of notes played from left to middle or middle to right. If there were 30 links, the MIDI distance would be 2 (since 60 divided by 30 links results in a stereophonic step of 2). If the number of links is 31 or 32, the MIDI division is as for 30 links and the extra 1 or 2 links are played in the centre. There is a short pause between the groups to assist listeners in differentiating them. Stereophony is used to give the impression of movement. C3 (MIDI notation 48) is the starting pitch, following the chromatic scale.

The object of type queue is mapped using consecutive positions of stereophony at a spacing which depends on the number of links (communicated by the number of notes). In this case, the maximum stereophony (i.e., 120) is divided by the number of links and truncated. The resulting integer is the MIDI stereo spacing between notes. If there are 32 notes (i.e., 32 links) then the stereophony positions will increase by 3 in MIDI notation. The general structures are shown in figure 3.20. The message starts from the left and gradually moves to the right: an initial note (C3, or MIDI notation 48) is first heard in the left speaker (MIDI notation 0) and then gradually moves towards the right speaker using the stereophony positions in the way described above, as it gradually ascends in pitch in the chromatic scale.

Figure 3.20: The musical mapping used in the experiment to communicate object types.

When it reaches the right speaker, it returns to the left speaker using the same positions of stereophony and notes, but in reverse order.

The cardinality type of a link is communicated with stereophony, using left and right, as shown in figure 3.21. The one-to-one relationship is communicated with three notes first heard in the left speaker and then moving to the right speaker (the notes used are C4, D4 and E4, or 60, 62 and 64 in MIDI notation). Similarly, the one-to-many is communicated with the same musical message on the left, and an additional triplet of notes (C5, D5 and E5, or 72, 74 and 76 in MIDI notation) is heard, after a 0.5 second delay, from the right speaker. Thus, the output is one triplet of notes in the left speaker and two triplets in the right speaker. The many-to-many relationship is balanced, with two triplets of notes (the same notes as those mentioned above) from each speaker.

The musical messages communicating the objects pipe and queue have been designed in a way in which the listener has two opportunities to determine the number of links communicated. This particular arrangement of notes was chosen because it was shown to be effective in trials. Using the musical mappings described above, the following are investigated:

1. Single communication of a single piece of information at one time. Object types such as source, compiled, executable, pipe and queue, and link types such as one-to-one, one-to-many and many-to-many, will be communicated individually.

2. Multiple communication of several pieces of information at one time. Object types and link types (i.e., composition, reference and implicit) will be communicated in a single message.

Figure 3.21: The musical mapping used in the experiment to communicate cardinality, reference, composition and implicit link types.

Single Communication of Information

Twelve subjects, students of Loughborough University, participated in the experiment. Subjects were final year undergraduate and MSc postgraduate students (10 male and 2 female). All of them were assessed via a questionnaire as not having musical knowledge. Subjects were introduced to the concepts of the PCTE OMS system and the nature of the concepts which were to be communicated using music in the experiment, as well as the design of the musical mappings and the rules used. Subjects were played five examples of each musical message. These enabled them to listen to the musical mappings used. The musical messages communicated one piece of information at a time. This could be either an object type from file (source, compiled or executable), queue and pipe, or a link type from one-to-one, one-to-many and many-to-many. A random order consisting of one example from each type of information was tested. Subjects heard each musical message three times before they gave their answer.

The results in object recognition are shown in figure 3.22. The successful recognition rates were 75% (or 9 subjects) for File Source, 66.6% (or 8 subjects) for File Compiled, 58.3% (or 7 subjects) for File Executable, 83.3% (or 10 subjects) for Pipe, and 91.6% (or 11 subjects) for Queue. However, all of our subjects successfully recognised the groups of cardinality (i.e., one-to-one, one-to-many, and many-to-many) communicated using stereophony.
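For concreteness, the sketch below summarises two of the mappings described above in Pascal. PlayNote and Pause are stubs standing in for real MIDI output; the details of an actual synthesiser interface are not part of the original design.

procedure PlayNote(pitch, pan: integer);
begin
   writeln('note ', pitch, ' at pan ', pan)   { stub for a real MIDI call }
end;

procedure Pause(ms: integer);
begin
   writeln('pause ', ms, ' ms')               { stub for a real delay }
end;

procedure PlayFileMessage(basePitch: integer);
{ basePitch is 40 (E2) for source, 52 (E3) for compiled and
  64 (E4) for executable. }
var
   pair, pitch: integer;
begin
   pitch := basePitch;
   for pair := 1 to 16 do                     { 16 pairs = 32 notes }
   begin
      Pause(2000 div pair);                   { 2 s delay, shrinking pair by pair }
      PlayNote(pitch, 60);                    { both notes of the pair are }
      PlayNote(pitch + 1, 60);                { heard from the centre }
      pitch := pitch + 2                      { rising chromatic pitch }
   end
end;

function QueuePanStep(links: integer): integer;
begin
   { MIDI pan spacing between successive notes of a queue message:
     the full stereo width (120) divided by the number of links and
     truncated, e.g. 32 links give a step of 3. }
   QueuePanStep := 120 div links
end;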

Figure 3.22: The successful recall rates in communicating objects, shown in percentages.

Most of the musical messages (excluding the object type file and its subtypes) utilised stereophony and rhythm in communicating information. The use of stereophony as a physical cue assisted in structuring the musical presentation in ways in which, for example, the link type cardinality can be communicated as an obvious metaphor. In the absence of stereophony, the musical message communicating the link type cardinality would have lost its 'metaphorical' nature. It is important to note that although the perception of stereophony was shown to be difficult for some positions (see section 3.5), the gradual stereophonic movement through intermediate positions from left to right and vice versa was perceived. In short post-experimental interviews, subjects remarked that there was no problem in recognising the object type file, but there was for its subtypes. They were in a position to identify the difference in pitch, but they did not remember which pitch (i.e., low, medium, or high) was communicating the subtypes source, compiled, and executable.

Multiple Communication of Information

This second stage of experimentation examined the possibility of communicating more than one piece of information at a time. By altering the musical mapping discussed above, object type, total number of links and link type (i.e., reference, implicit, and composition) were communicated using a single musical message.

The same twelve subjects continued the experiment into the second stage with additional musical mappings (compared with those of the first stage) communicating multiple information in one integrated message. The link types reference, implicit and composition are represented using a sequence of notes equal to the total number of links of all types. In this sequence, all notes communicating a reference link are played by a piano, all notes communicating an implicit link are played by an organ, and all composition links are played by a tuba (these three instruments were chosen from different families). Subjects were introduced to the design and rules of these multiple meaning musical mappings, and five examples of each musical message were demonstrated with comments and remarks as to how they should be interpreted. This enabled subjects to understand the rationale behind the design of the musical mappings used. The information communicated, in random order, was the following:

1. Source (as before, all notes are played from the centre, starting from pitch MIDI notation 40, or E2). Seven musical messages of type source were played, carrying 1, 3, 7, 13, 20 and 22 links, and one larger example with 32 links, each communicated together with its breakdown into 'reference', 'implicit' and 'composition' links (for example, the first message carried a single 'implicit' link, and the largest carried 15 'reference', 10 'implicit' and 7 'composition' links). The link subtypes (i.e., reference, implicit, and composition) were differentiated using a piano (for type reference), an organ (for type implicit) and a tuba (for type composition). The order was random; this means link types were communicated as they were encountered, using the different timbres.

2. Compiled (as before, all notes are played from the centre, starting from pitch MIDI notation 52, or E3). Seven musical messages of type compiled were played, with the following number and type of links in each example:

- 1 link (1 'reference'),
- 2 links (1 'reference', 1 'implicit'),
- 10 links (3 'reference', 5 'implicit' and 2 'composition'),
- 15 links (10 'reference', 3 'implicit' and 2 'composition'),
- 18 links (8 'reference', 7 'implicit' and 3 'composition'),
- 25 links (1 'reference', 9 'implicit' and 15 'composition'), and
- 30 links (12 'reference', 14 'implicit' and 4 'composition').

Similarly to the above, the link subtypes (i.e., reference, implicit, and composition) were differentiated using piano (for type reference), organ (for type implicit) and tuba (for type composition), but in this design the link types were grouped in a way where all reference types always came first using piano, all implicit types second using organ, and all composition types third using tuba.

3. Executable (as before, all notes are played from the centre, starting from pitch MIDI notation 64, or E4, but testing for the number of links only), with 1, 4, 8, 14, 16, 24, and 26 links.

For the objects queue and pipe, identical messages were used as before, but subjects were told how the messages represented the number of links. The messages presented were:

1. Queue with 1, 5, 9, 11, 17, 21, and 32 links, using a piano. Thus, this message communicates only the object type and the abstract number of links associated with it. This message was designed to investigate the possible effects on the perception of listeners when the number of links is repeated in the same message, while at the same time this repetition helps the listener in identifying the object type (in this case queue).

2. Pipe with 1, 4, 6, 15, 19, 23, and 31 links, using an organ. Thus, this message communicates only the object type and the abstract number of links associated with it. This message was, again, designed differently from the queue type in order to investigate the possible effects on the perception of listeners when the number of links is repeated in the same message, while at the same time this repetition helps the listener in identifying the object type (in this case pipe).

Subjects were requested to rest their pens on their answering sheets (this was also enforced by the experimenter) whilst the musical messages were presented, each three times.

On completion of the presentation of each group of three identical stimuli, subjects were given 30 seconds to answer the questions asked. The questions were:

1. What was the type of the object communicated?

2. How many links in total were associated with this type of object? Subjects were offered the following categories of numbers of links to choose from: 1-5, 6-10, 11-15, 16-20, 21-25, and 26-32.

3. How many links were of type reference, implicit or composition? Subjects were again offered the categories 1-5, 6-10, 11-15, 16-20, 21-25, and 26-32 to choose from. (This question was not asked for the messages of the executable, pipe or queue object types.)

Then, the next three stimuli were presented, and so on. This provision (resting their pens) in the administration of the experiments ensured that subjects did not employ any visual aid in order to answer the questions asked. In addition, this procedure, most importantly, tested their ability to comprehend the musical stimuli as a whole and their capability of maintaining the information perceived in their short term memory. This presentation style was chosen because, if real software engineers were to listen to music communicating PCTE related information (and not under laboratory conditions), their ability to interpret and remember information communicated through music, in the absence of any visual aid assisting the musical interpretation, would be crucial. Thus, an experiment requiring participants to listen to one single message and write their perception on their answering sheet, or to make notes on their answering sheet when more than one piece of information is communicated, would produce a misleading result.

The musical messages were presented in a random order. All subjects successfully identified the file, pipe, and queue object types. However, the recognition of the subdivisions source, compiled, and executable of the object type file demonstrated a difficulty similar to the one observed in figure 3.22. The successful recognition for type source was 91%, for type compiled 75%, and for type executable 83%. This time, of course, subjects had more experience with the musical messages. Table 3.8 shows the subjects' perception of the overall number of links, and of the links of types reference, implicit and composition, as they were perceived from the File Source message.

Table 3.8: The results in terms of frequency of subjects assigning the number of links they perceived from the message file-source into the categories 1-5, 6-10, 11-15, 16-20, 21-25, and 26-32, for the overall number of links and for link types reference, implicit, and composition communicated ungrouped (abbreviated as r, i and c).

Table 3.9 shows the subjects' perception of the overall number of links, and of the links of types reference, implicit and composition, as they were perceived from the File Compiled message. It can be seen that the perception of subjects from the ungrouped notes shown in table 3.8 was poorer compared with the results in table 3.9. This difference in the results can be attributed to the ordering of link types. In post-experimental interviews, subjects remarked that the problem was not to distinguish one instrument from another, and thus one link type from another. The difficulty was in the capability of keeping three abstract counters of how many links of each type were communicated across the sequence. Therefore, the random output of link types differentiated by different timbres (piano, organ, and tuba) did not appear to help the listener to perceive the numbers of the different types of links communicated. However, the results in table 3.8 clearly suggest that subjects were aware that different types of links were being communicated.

Table 3.9: The results in terms of frequency of subjects assigning the number of links they perceived from the message file-compiled into the categories 1-5, 6-10, 11-15, 16-20, 21-25, and 26-32, for the overall number of links and for link types reference, implicit, and composition communicated grouped (abbreviated as r, i and c), in a way where reference links came first, implicit second, and composition third.

The results in table 3.9 show what happened when link types were communicated as groups (i.e., reference first in one group using piano, implicit second in another group using organ, and composition in a further group using tuba). This grouping clearly showed an improvement in the perception of subjects. The results of the perception of subjects in terms of categorising the number of links heard into groups from the message File Executable are shown in table 3.10. In this case link subtypes were not communicated. Table 3.11 shows the results from the messages queue and pipe. It can be seen that small numbers of links were perceived better regardless of the structure of the musical message. The musical design of the pipe type was different from the design of the queue type. The two designs showed no significant differences in the listeners' perception of the number of links associated with each type.
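The grouped design which produced the better results of table 3.9 can be sketched in the same style (PlayNote as before; SetTimbre is a stub standing in for a MIDI program change, and the steadily rising chromatic pitch is an assumption carried over from the file messages):

procedure SetTimbre(name: string);
begin
   writeln('timbre: ', name)                  { stub for a MIDI program change }
end;

procedure PlayGroupedLinks(basePitch, refs, imps, comps: integer);
var
   i, pitch: integer;
begin
   pitch := basePitch;
   SetTimbre('piano');                        { all reference links first }
   for i := 1 to refs do
   begin
      PlayNote(pitch, 60); pitch := pitch + 1
   end;
   SetTimbre('organ');                        { then all implicit links }
   for i := 1 to imps do
   begin
      PlayNote(pitch, 60); pitch := pitch + 1
   end;
   SetTimbre('tuba');                         { finally all composition links }
   for i := 1 to comps do
   begin
      PlayNote(pitch, 60); pitch := pitch + 1
   end
end;

The point of the grouping is visible in the structure of the code: the listener needs to maintain only one running count at a time, which matches the interview remark that keeping three interleaved counters was the real difficulty.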

Table 3.10: The results in terms of frequency of subjects assigning the number of links they perceived from the message file-executable into the categories 1-5, 6-10, 11-15, 16-20, 21-25, and 26-32.

Table 3.11: The results in terms of frequency of subjects assigning the number of links they perceived from musical messages communicating queue and pipe into the categories 1-5, 6-10, 11-15, 16-20, 21-25, and 26-32.

Overall, the results indicate that subjects were in a position to interpret the type of the object and the number of links exported from it, and there is evidence to suggest that link types such as composition, implicit and reference can also be communicated in one single, multiple meaning, musical message.

Discussion

On one level, the OMS experiments have shown that music has potential for conveying information to users about concepts commonly found in the PCTE OMS (and in other databases as well). The mappings used, although they may not be the only ones (or even the best), show that music is to a certain extent reliable in communicating some aspects which could then be removed from the visual presentation, reducing the complexity of the display.

It is important, however, to realise that the results of these experiments must not be interpreted only with regard to the OMS; they also provide further evidence on how to design musical messages. A common characteristic in these experiments was the use of structure in the construction of integrated musical messages. The participants showed they could process such structured music in a satisfactory manner. Therefore, one particular aspect which needs to be considered in musical mappings is that music must be used sparingly, and not excessively, in representing a particular concept. One must always remember that in producing a musical message which maps a particular problem domain property, the user must be able to identify not only the meaning of the whole musical message but also the individual musical notes, and must be able to reason why one note, timbre or rhythm, or another, is being heard. It is only when the user can reason about what is being heard in terms of musical output that the musical message can be understood.

It was also elicited from post-experimental interviews that the subjects' knowledge of the domain of what was musically conveyed was also important. Subjects who understood what information to anticipate via the musical channel were observed to be more alert and found the musical messages more comprehensible in that particular context. Therefore, an important factor in the musical perception of listeners is their degree of understanding of the general context under which musical stimuli are presented. This positively contributes towards the perception of the listeners. The more listeners are aware of the context, the more easily they will understand the musical messages.

However, at another level, that of musical message design, the experiments have shown that groups of rhythmic notes and stereophony are effective in communicating information, both at a single level where one piece of information is communicated at a time and at a multiple level where several pieces of information are communicated at a time within one musical message. Stereophony was shown to be effective because it was used as a physical cue for the communication of the object types queue and pipe and the link types one-to-one, one-to-many and many-to-many. The way that stereophony was used in these musical mappings did not require the listener to identify exact (or near exact) positions of stereophony for each individual note. Instead, listeners were required to identify a movement in stereophony (from left to right or otherwise) as a whole at the end of the presentation of the musical message.

It is interesting to note that the positions of stereophony which exhibited low perceptual success (see section 3.5) were shown in this experiment to be perceived as part of an overall movement in stereophony (for example, the stereophonic positions used in the musical design of the object type queue).

In conclusion, the experiment has shown that music can viably be used as a communication metaphor in the PCTE OMS to convey approximate information. The experiment has also shown that it is possible to communicate more than one piece of related information in one serial message. In order to ensure that listeners understand a musical message, they either have to be specifically trained with abstract concepts or the designer must invoke some metaphorical association in the listener's mind. A classical example of a metaphorical association is the connection of pitch and number. In the experiments with PCTE, those objects which invoked strong metaphorical associations were understood best. This is illustrated, for example, in the messages which represented cardinality. Users rapidly achieved 100% understanding of these messages. This is because of a strong metaphorical relationship in each representation (i.e., one-to-one, one-to-many, and many-to-many). In contrast, the metaphorical association of the file type with its subtypes was weak. This issue will be discussed further in a later chapter.

3.8 Supporting Algorithmic Auralisation

The experimental objective in this section is to investigate musical information processing from a musical mapping which has been designed to communicate some aspects of the Bubble Sort algorithm. The musical structures investigated are the perception of variable rhythmic patterns (within a 13 note sequence in the Diatonic scale), the use of triads whose root position depends on the location within the sequence at which the triad is heard, and the use of stereophony as a physical cue to assist listeners in differentiating one musical message from another. For the purpose of experimentation, the Bubble Sort algorithm is briefly reviewed (section 3.8.1), the design of the musical mapping is discussed (section 3.8.2), and results are reported and analysed (sections 3.8.3 and 3.8.4).

3.8.1 Sorting Algorithm

The enhancement of the Bubble Sort algorithm with music (in parallel with visual stimuli) was first introduced in the ZEUS algorithm animation system [183]. In that work [183], musical instruments were used to communicate the movement of the elements within the list as they were sorted. The audio output supported the visual stimuli.

In our experiment, we will use music exclusively to communicate some of the algorithm's actions and the state of the list as it is sorted. In the following, a brief review of the algorithm is given.

The principle behind the exchange or Bubble⁴ Sort algorithm is the swap or exchange of adjacent pairs of list elements. The algorithm passes a number of times over the list's element data. When an adjacent pair of elements is not in order, according to an ascending or descending sequence requirement, the two elements are exchanged. The sorting algorithm terminates when no exchange of elements takes place during a completed pass over the elements of the list.

⁴This sorting algorithm is usually referred to as Bubble Sort because small elements appear to move slowly, or bubble, towards the top of the list as in a liquid.

The exchange sort algorithm (in Pascal format) is given below:

procedure Exchange_Sort(var list: list_type; n: integer);
var
   j, k, temp : integer;
   sorted     : boolean;
begin
   k := n;
   sorted := false;
   while (k > 1) and (not sorted) do
   begin
      { A pass over the list; the algorithm stops when a
        complete pass makes no exchange. }
      sorted := true;
      for j := 1 to k - 1 do
         if list[j] > list[j+1] then
         begin
            temp := list[j];
            list[j] := list[j+1];
            list[j+1] := temp;
            sorted := false
         end
   end
end;

3.8.2 Objectives and Musical Mapping

The objective in this experiment was to investigate the effectiveness of variable rhythmic patterns (groups), occurring in several parts of a 13 note sequence, in attracting the listener's attention to particular parts of the sequence. These rhythmic patterns may also assist listeners in remembering several locations within the sequence, because of the rhythm. Another musical aspect investigated was the possibility of attracting listeners' attention with major triads (i.e., three notes arranged in a particular manner). Musicians use triads extensively in their compositions. Other objectives included the investigation of using stereophony and different timbres in order to assist listeners to differentiate one musical message from another. The short-term memory of subjects in remembering the location of rhythmic and non-rhythmic patterns or triads within the sequence is also tested, together with the degree to which the subjects could perceive the totality of the continuous musical messages used. For example, consider the following cases:

Locations:  L1   L2   L3  L4   L5   L6   L7   L8   L9   L10  L11  L12  L13
Example 1:  N1   N7  (N2  N2   N3)  N8   N10  N12 (N4   N5   N6   N6)  N9
Example 2: (N1   N2)  N7  N6  (N10  N11  N12) N15  N8  (N4   N5) (N9   N10)

The letter 'L' stands for the location within the current sequence and the letter 'N' stands for the notes communicating elements in the list (i.e., note N6 communicates the value 6 on the Diatonic scale and is therefore equal in pitch to any other N6 note found elsewhere within the sequence). For simplicity, numbers are used rather than musical notation. Rhythm is introduced in the following way. If adjacent locations (L1, L2, etc.) contain notes which are sequentially rising or equal (i.e., sorted) then they are played faster, in a rhythmic pattern. So, for example, in the two examples given above, the notes in parentheses are played with a quicker rhythm. As the list becomes properly sorted, the grouping of rhythmic elements will change. For example, after two passes the list will be:

(N1, N2, N2, N3) (N7, N8) (N4, N5, N6, N6) (N9, N10) N12

In musical notation, figure 3.23 shows example 1 as it is numerically represented above. One can see that sections of the list are sorted.

The Bubble Sort algorithm offers a way of investigating these objectives. Therefore, the information which needs to be communicated in the Bubble Sort algorithm [184] involves the current state of the list, progression through the list, swapping, and the conclusion of the sort. These are implemented as follows:

1. Current state of the list. The numbers within the list are converted into rising pitch, starting from middle C in the diatonic scale, using a Celeste.

Figure 3.23: A musical representation of example 1, initially and after two passes.

Variable rhythmic groups of notes are used to communicate groups of ordered elements within the list. The musical mapping used allows rhythmic patterns to be created when the same or orderly ascending numbers are encountered among the 13 elements of the list, communicated with a 13 note sequence.

2. The progress through the list and swaps among elements within the list. Progress through the list itself is heard in the same manner as the current state of the list, using a harp for contrast, and the swaps are communicated using a major triad (a triple built on the lower note of the list element pair). The greater element of the pair which is to be swapped is played first, then the smaller, and finally the higher element again, as a trumpet triad.

During all instances of communicating this information in the Bubble Sort algorithm, stereophony is used as shown in figure 3.25. The major triads were communicated from the left speaker, interrupting the sequence of notes played by a harp in the right speaker. Thus the major triads, apart from interrupting the sequence of notes musically, also create a contrast in stereophony, because the listener's attention to the right speaker is interrupted by the major triad heard from the left speaker. The current state of the list, using the sequence of notes with variable rhythmic groups, was always communicated from the centre (MIDI notation 60). The overall design principles for the mapping are shown in figure 3.24.

The experiment seeks to investigate and address answers to the following:

- Can listeners recognise the variable rhythmic groups within the sequence of notes and thus determine the position of adjacent and equal elements (e.g., numbers) of a list?

- Can listeners recognise parts of the list as being ordered and some other parts as not being ordered, as this is projected from the rhythmic patterns?

Figure 3.24: The information communicated using music (a trumpet for the swaps and changes in the list, a celeste for the current state of the list, and a harp for the progress through the list).

Figure 3.25: The integration of musical structures in order to communicate the current state of the list and the progress within the list, using two messages of continuous music.

- Can listeners understand the swapping of elements using major triads, interrupting the sequence of notes at locations where a swap occurs, as the swaps are performed by the algorithm, and determine the positions where swaps occurred within the list?

- To what extent and accuracy can users follow the execution of the algorithm, and thus the variable rhythmic groups as well as the major triads interrupting the sequence?

3.8.3 Continuous Musical Messages Experiment

Ten subjects (undergraduate students) participated in this experiment. Subjects were not told that this was the Bubble Sort algorithm. It was simply explained that they were hearing an algorithm for rearranging numbers, which were communicated in sequences of notes with variable rhythmic groups representing ordered or similar elements. In addition, the principles and rules used in the mapping were explained, along with examples of the continuous musical messages communicating the current state of, and progress within, the list (5 of each). Two continuous musical messages were presented, one communicating the current state of the list using the variable rhythmic groups, and the other the progress within the list using the major triads interrupting the sequence of notes. Each complete pass of the Bubble Sort through the list was communicated by one continuous musical message repeated three times. Then a pause of two minutes was given so that subjects had time to reason about the musical stimuli heard and answer the questions. Subjects had to answer the following questions after each pass of the Bubble Sort:

Indicate which parts of the sequence of notes are in order with each other, or equal. (Mark equal elements with a line.) (The answer sheet provided a row of 13 numbered boxes.)

Mark the appropriate position(s) where changes (swaps) occurred. (The answer sheet again provided a row of 13 numbered boxes.)
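For reference while reading the results which follow, the current-state message just tested can be generated by a short routine of the following form (Pascal; PlayNote is the usual stub, the note durations are illustrative, and 'sequentially rising' is interpreted as rising by one scale value, which matches the worked examples given earlier):

const
   MiddleC = 60;                              { MIDI notation }
   { Semitone offsets of the C major (diatonic) scale degrees. }
   Degree: array[0..6] of integer = (0, 2, 4, 5, 7, 9, 11);

type
   list_type = array[1..13] of integer;       { as assumed for Exchange_Sort }

function DiatonicPitch(value: integer): integer;
begin
   { Value 1 maps to middle C, 2 to D, ..., 8 to the C an octave higher. }
   DiatonicPitch := MiddleC + 12 * ((value - 1) div 7)
                            + Degree[(value - 1) mod 7]
end;

procedure PlayCurrentState(var list: list_type; n: integer);
var
   j: integer;
   quick: boolean;
begin
   for j := 1 to n do
   begin
      { A note takes the quicker rhythm when it belongs to an ordered
        (equal or rising-by-one) adjacent pair. }
      quick := false;
      if j > 1 then
         if (list[j] = list[j-1]) or (list[j] = list[j-1] + 1) then
            quick := true;
      if j < n then
         if (list[j+1] = list[j]) or (list[j+1] = list[j] + 1) then
            quick := true;
      if quick then
         PlayNote(DiatonicPitch(list[j]), 60)    { rhythmic (quicker) note }
      else
      begin
         PlayNote(DiatonicPitch(list[j]), 60);   { default rhythm }
         Pause(250)                              { illustrative extra delay }
      end
   end
end;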

Each pass was played three times. The subjects were advised to use the first time to familiarise themselves with the list, the second to identify the parts of the list which were in order in the current-state message, or the positions at which swaps occurred in the progress message, and, finally, to use the third time to confirm or correct their answers. The derived results were analysed in three ways:

1. Can listeners locate positions within a 13 note sequence with the aid of variable rhythmic patterns?

2. Can listeners interpret same-pitch notes within a variable rhythmic pattern?

3. Can listeners locate positions within a 13 note sequence with the aid of triads which interrupt the presentation of the sequence?

A detailed account of the elements perceived correctly and wrongly, along with a learning effect, is shown in figure 3.26 for the state of the current list and in figure 3.27 for the swaps which occurred in the progress list.

Firstly, this experiment examined the capability of listeners to identify locations within a 13 note sequence communicated using variable note length rhythms (i.e., of length 2, 3, 4 notes and so on, depending on whether the list elements were orderly ascending or equal in value). Figure 3.26 shows the frequency of subjects identifying the correct locations within the sequence where the rhythmic patterns were attracting attention. By observing the passes, it can be seen that a learning effect seems to have taken place, with some perceptions of correct locations within the sequence reaching up to 100%. However, as one would expect with a sort, rhythmic patterns become longer and consecutive (i.e., one rhythmic group merges with another). A general observation which can be made about this data is that listeners can more easily perceive long rhythmic patterns (3 notes or more) or consecutive rhythmic patterns of equal or different length. It becomes easier to perceive rhythmic patterns as they increase in length, because the rhythmic patterns become more noticeable. Similarly for consecutive rhythmic patterns, listeners remembered their locations more successfully. Subjects were able to construct ratios either of the rhythmic groups or of the non-rhythmic ones, depending on whichever the listener subjectively judged to be easier. For example, in pass one (see figure 3.27), the ratio is 3 notes initially for the non-rhythmic patterns, followed by 7 notes for the rhythmic patterns and 3 notes for the final non-rhythmic pattern (the term non-rhythmic pattern(s) means the default rhythm).

Secondly, this experiment examined the capability of listeners to identify same-pitch and ascending pitch notes within a rhythmic group.

133 "7 5:,, :., 8:$, ',e 10:_, 2:. 9 9,. 10: : 2,,,, I I, I 7. 9:0 1O:@ 10:0 10:0 7 $ 10:. 10:0 10:. 10:. I I I I 7 e 10:0 1O:@ 10:. 10:. 3 G 9:. la:. 10:. 10:. I, I I 2'. 10:. 9:0 10:. B 8 " 10: m 8:" 2: 1 4:$, 8:., 10:., '.::l 7 lo:e 10:. 8:. 7 2:0 9:. ] 6 " 10:0 10:0 9:0 7 G 9,, :. 10:. 10:0 9,. ID:e 10: 2: I I I I I I, :, 4 0 7:_ 10:8 10:_ to:o 10:0 10:0 9: 2: 3 2:. 10:. 10:. 10:. 10:0 10:. 10:_ 9: 2 :.. 10:. 10:. 10:. 10:. 10:. I I I I I I I I I I I, 10:0 10:., 10:. 10:., 10:. 10:0, 10:.,, 8:@ 10: 10: G 10: 10:. 10: 10:_ 10: 10:. 10: 10:9 la: 10:. 10: 9:_ 10:0 10: 1:0 10:. 10: 1:_ 101 :. 10: 10\. 10:, Figure 3.26: The distribution of frequency in the perception of variable rhythmic patterns in a sequence of 13 notes in the Diatonic scale as it was generated from the sorting algorithm in the communication of the current state of the list. The shaded circles indicate the locations with rhythm in the 13 note sequence. The numbers next to the shaded circles show the frequency of subjects who successfully identified the rhythm. number of subjects who successfully identified same-pitch notes within variablelength rhythmic groups as generated from the sorting algorithm. Possible learning effects may have also occurred. It can be seen that percentages are smaller in identifying same pitch notes in short rhythmic groups (see passes 3, and 4). However, in longer rhythmic groups percentages are higher. Assuming that this is not a learning effect then one can remark that it is easier to perceive pitch variation in long than short rhythmic groups. Thirdly, this experiment examined the possibility that subjects could identify positions within a 13 notes sequence where re-arrangements (swaps) occurred. Figure 3.27 indicates that it was easier for listeners to perceive the location of swaps (using triads) communicated one after the other in the sequence. On the other hand, when the location of only one swap was required to be perceived within the sequence then it was more difficult (see passes 1 and 12). An interesting learning effect was that at the end of the experiment, all subjects were able to distinguish the two different continuous musical messages used to communicate the state and the progress within the list without the aid of stereophony. They had learned the messages, so stereophony was not any longer required. This can be very useful in interfaces using music. Stereophony can be used to train listeners at the beginning then it can be re-used for other musical messages once 119

[Table 3.12 appeared here: for each pass, the percentage of subjects identifying the same-pitch notes, together with the length of the rhythmic group concerned (groups of 2, 3, 4, 5, 7, 8, 9, and 15 notes).]

Table 3.12: Percentages of the accuracy rate in the capability of subjects to identify same-pitch notes within rhythmic groups of variable length.

[Figure 3.27 appeared here]

Figure 3.27: The distribution of frequency in the perception of triads communicating location in a sequence of 15 notes in the Diatonic scale, as generated from the sorting algorithm in the communication of the progress (swaps) in the list.

Discussion

Overall, this experiment, using the Bubble Sort algorithm as a vehicle for investigation, has shown two ways (one in the current state of the list and another in the progress of the list) in which the attention of the listener can be attracted to particular but, still, abstract parts (locations) within a sequence of notes. There is evidence to suggest that the rhythmic groups assisted listeners to perceive the same-pitch notes within those groups. This is particularly interesting if one considers that the time taken for the whole 13-note sequence with the rhythmic patterns was under 20 seconds, which is quite fast. One approach was to use variable rhythmic patterns, and the other was to use major triads which interrupted the output of the sequence of notes to communicate re-arrangement and abstract location within the sequence. The results from these experiments suggest that subjects were in a position to understand the state of the current list and the changes which occurred in the progress list. Although it is not known whether subjects would have recognised the 'celeste' as such, it was the only instrument heard in the middle (stereophonic position 60 in MIDI), and subjects recognised it as the instrument communicating the current state of the list.

The results of the experiments also demonstrate that musical structures can convey not just one single message but a whole set of information (e.g., the order of elements in any current state of a list) with one continuous musical message. This strengthens the possibility of using music to communicate the execution of algorithms, and it suggests that combining different musical structures helps in communicating large sets of information. The experiment shows that musical structures communicated the following with reasonable success:

1. Elements in rhythmic groups and their relationship within the unordered groups.

2. The location of elements in the list where re-arrangements (swaps) took place.

The musical design involved:

- Continuous musical messages to communicate the information involved at each cycle of the loop. The information communicated by each musical message was a set of data linked and interrelated with each other (ordered elements with non-ordered elements in the current list).

- A combination of musical structures, by using ascending pitch and rhythm in the communication of the current list, and ascending pitch, triads (based on the pitch at which the swap took place), and rhythm for the progress within the list.

- Different instruments for the output of each message. In addition, the message used to communicate the progress in the list used two different instruments to ensure that the swaps were disambiguated not only with triads and rhythm but also with timbre.

- Stereophony to provide further physical cues for disambiguation, instead of monophonic or equal stereophonic output.

Another combination of musical structures can be seen in the music used to communicate the swaps of elements occurring within the list. In this case, a short tune (a major triad, 3 notes) with its rhythm was combined with the pitch of the list element to be swapped. Thus, the tune communicating a swap of two elements within the list used the pitch of the list element. So, although the location at which a swap occurred within the list was communicated using notes from the Diatonic scale, the location was also communicated to subjects by the pitch of the major triad. This, again, underlines the importance of combining musical structures in ways which make sense musically. This assists in communicating information.
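To make the mapping concrete, the sketch below shows one way the design just described could be realised. It is an illustrative reconstruction rather than the implementation used in the experiments: the diatonic mapping, the note durations, the rule for forming rhythmic groups, and the triad spelling (root, +4, +7 semitones) are all assumptions chosen for the example.

```python
# Illustrative sketch of the two continuous musical messages for one
# Bubble Sort pass; the mapping details are assumptions for the example.
C_MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]            # diatonic scale offsets

def diatonic_pitch(value, base=60):
    """Map a list element (1, 2, 3, ...) onto the diatonic scale above C4."""
    octave, degree = divmod(value - 1, 7)
    return base + 12 * octave + C_MAJOR_STEPS[degree]

def state_message(data):
    """Current-state message: elements ordered with (or equal to) a
    neighbour form rhythmic groups of shorter notes; the rest use the
    default rhythm."""
    events = []
    for i, value in enumerate(data):
        grouped = (i > 0 and data[i - 1] <= value) or \
                  (i + 1 < len(data) and value <= data[i + 1])
        events.append((diatonic_pitch(value), 0.2 if grouped else 0.4))
    return events                                  # (pitch, duration) pairs

def progress_message(data):
    """Progress message: one pass of the sort; a major triad built on the
    swapped element's pitch interrupts the sequence at each swap."""
    data, events = list(data), []
    for i in range(len(data) - 1):
        events.append((diatonic_pitch(data[i]), 0.4))
        if data[i] > data[i + 1]:
            data[i], data[i + 1] = data[i + 1], data[i]          # the swap
            root = diatonic_pitch(data[i])
            events += [(root + step, 0.15) for step in (0, 4, 7)]  # triad
    events.append((diatonic_pitch(data[-1]), 0.4))
    return events

print(progress_message([3, 1, 4, 2]))              # one audible pass
```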

3.9 Overall Discussion

The experiments documented in this chapter employed musical elements and structures such as pitch, timbre, rhythm and stereophony to communicate information. The experiments with sequences of ascending-pitch notes were shown to communicate approximate numerical values, which can be used as metaphors for length or distance. The recall experiments with the instruments underlined the failure of listeners to correctly name the majority of instruments heard. However, some instruments were successfully recalled by most of our subjects (e.g., piano, organ) in the absence of any training. It was also seen that there were a number of instruments which could be recognised once the listeners were aware of the names of the instruments which were going to be heard. An instrument dictionary was shown to assist listeners in recognising instruments even if they were similar.

Listeners can process integrated musical structures (e.g., pitch, rhythm), as the results suggest. The results from the continuous musical messages generated from the Bubble Sort algorithm suggest that listeners are in a position to follow stereophony, timbre, pitch changes and rhythm in a continuous musical message. It has also been shown that there are at least two ways in which a designer can utilise stereophony. One way, as seen in the experiment with PCTE OMS related contents, is to use spatial location as a physical cue which helps the listener to make an easier interpretation of the semantic content of the musical message (i.e., what does the music attempt to communicate? a queue?). Another way, as seen in the experiment with the sorting algorithm, is to use stereophony to allow the user to differentiate different types of musical messages (e.g., the current list or the progress within the list). To state the obvious, the latter was important in the Bubble Sort experiment because the listener needed to decide the type of information being communicated via the musical message before the actual content of the music was interpreted.

How important is stereophony? The answer to this question depends on the listener's capability of identifying the syntax of the musical messages and, thus, the type of information communicated. As discussed in section 3.8.3, subjects were asked at the end of the Bubble Sort experiment to identify the type of information communicated by the musical messages without the aid of stereophony. That is, subjects were asked to describe what type of information each musical message communicated. All subjects successfully recalled the types of information (i.e., current state and progress of the list) without the aid of stereophony. Certainly, this was a result of their previous experience with the musical messages of the Bubble Sort. But what does this mean? Stereophony is a useful aid for differentiating musical messages which are unknown to the listener. However, as the listener becomes familiar with the syntax of the musical messages (e.g., ascending pitch in a musical scale, or rhythm and triads), the listener's need for stereophony to differentiate them diminishes. This is because once the listener is familiar with the musical messages, stereophonic cues for differentiating them are no longer vital. How long will it take for a listener to become familiar with the musical messages? In the Bubble Sort, a short exposure of the listener to the musical messages was enough. This is particularly important in auditory interfaces using a large number of message types. A designer can utilise stereophony for a small number of musical messages (say five or six), but it is not clear how, for instance, twenty different types of musical messages could be differentiated using stereophony. In addition, if stereophony is used to differentiate musical messages, then the presentation of the content of the musical messages (such as those used in the PCTE OMS related experiments, which use stereophony to enhance the musical message itself) takes very little advantage of stereophony, if any at all. The experiment with the PCTE OMS related contents used stereophony to help the listener to interpret the content of the musical message and, as was shown, it had a positive effect. Furthermore, it was seen in the experiment using stereophony (see section 3.5) that the number of stereophonic positions successfully identified by listeners using speakers was limited.

The ideal scenario would be to utilise stereophony not only to differentiate types of musical messages but also to enhance the actual content of the musical message itself. We believe that this can be achieved by training or conditioning the listeners in stages. At each stage, stereophony will be used to help the users to familiarise themselves with the different types of messages. Once those messages are known by the listeners, stereophony is free to be re-used in the next stage for other new messages.
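In MIDI terms, such staged re-use of stereophony amounts to little more than reassigning a pan position per message channel. The fragment below is a minimal sketch of this idea using the mido library; the output port, channels, pitches and pan values are illustrative assumptions, not those of the experimental set-up.

```python
# Minimal sketch: two message types separated by stereo position via
# MIDI pan (controller 10: 0 = hard left, 64 = centre, 127 = hard right).
import time
import mido

out = mido.open_output()                       # default MIDI output port

def play_stream(notes, channel, pan, dur=0.3):
    out.send(mido.Message('control_change', channel=channel,
                          control=10, value=pan))
    for pitch in notes:
        out.send(mido.Message('note_on', channel=channel,
                              note=pitch, velocity=90))
        time.sleep(dur)
        out.send(mido.Message('note_off', channel=channel, note=pitch))

play_stream([60, 62, 64], channel=0, pan=10)   # e.g. current-state stream, left
play_stream([72, 74, 76], channel=1, pan=110)  # e.g. progress stream, right
```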

Figure 3.28 indicates what might be happening in communication with music. Musical relationships and other rules imposed by the designer are used to represent relationships within the problem domain being communicated. From the listener's viewpoint, there are three possibilities. One possibility is that listeners will understand the message, limited only by other non-musical variables (e.g., concentration level, interest in interpreting, perceptual context). The second possibility is that the music does communicate various aspects but not all of them are perceived by the listener. Finally, the third possibility is that the designer fails to construct musical mappings which can be perceived by listeners.

[Figure 3.28 appeared here]

Figure 3.28: The mapping correspondence between problem domain events and musical events.

One must, however, note that the experiments have also addressed issues such as the structure, classification, and presentation of musical messages. The issue of musical message structure refers to the way in which musical messages, ranging from the single and simple to the multiple and complex, are constructed. This structuring format must be applied uniformly so that musical messages within a particular problem domain are interpreted uniformly. For example, in the Bubble Sort algorithm, musical relationships of pitch were uniformly used to communicate arrangements of the elements of a list. The music heard made sense, musically, to those hearing it. The issue of classification refers to the grouping of musical messages. Musical messages of the same classification (or type) must share similar structural and design properties. This can be thought of as analogous to visual interfaces, where visual user feedback can be classified within a number of different categories, each of which designates a particular type of message. In interfaces using music, this helps, just as in visual interfaces, to disambiguate one message from another. The issue of presentation refers to the order in which musical messages are presented so that the listener can make sense of them and interpret them appropriately. When there are only a small number of musical messages, presentation may not have a primary role, but with large numbers of messages the order must be taken into account.

There are also other issues which concern communication. One of them is the development process (lifecycle) of a musical message. This refers to the various stages involved in the lifecycle of a musical message, starting from its initial design through to implementation and user evaluation, as well as message alteration and future modification to accommodate the changing circumstances and needs of a dynamic user interaction. The final form of the music used to communicate information in the experiments described earlier derived from numerous informal trials with listeners (although this is not as such reflected in this documentation). Those trials took into account the opinions of listeners and contributed very importantly to the design of the musical messages. A number of mappings were initially designed and then discarded because they did not appear during trials to communicate the intended message successfully. Beyond academia, designers who wish to use music in software need some practical assurance that messages designed using the guidelines will work. The steps taken to ensure this involve the process of development, or the lifecycle, of a musical message.

The musical properties and structures used so far have shown a way of using music to communicate with users (listeners) in two problem domains. Thus, there is a high probability that they will also be useful for representing properties of other problem domains, regardless of how different or similar these domains may be.

One must also focus on what the term 'making sense of the musical message' really means. As observed in the investigation, a listener makes sense of a musical message only when the message, in the context in which it is being heard, is understood by being compared with previously stored templates of musical knowledge or other messages, placed into context with the rest of the information of which the listener is aware, interpreted, and stored in short-term memory for future reference. Repetition of the same musical message and frequent use will enable the listener to acquire knowledge of the musical message and store it in long-term memory. In an interactive context, it is quite usual that a number of interactions involving music will be required from the user to complete a task of reasonable size. If a small error is introduced in any of the stand-alone interactions which form the activities of a user's task, then when all these errors (the user's miscomprehensions) are added up, the user's task will inevitably be liable to the overall error accumulated from every single interaction.

In order to study all these issues, as well as to examine further the applicability and appropriateness of using the musical properties of pitch, rhythm, and timbre, an interactive application which will facilitate experimentation and observation is necessary. In the next chapter, an interactive experimental framework for graphical understanding and drawing, the AudioGraph, for blind users will be discussed.

Summary

This chapter started with an introduction of the objectives that this thesis will, at least partially, investigate. It also set out our probing experimental approach in different interface situations, such as using music to communicate some information of PCTE OMS type-related data, to communicate some aspects of the execution of the Bubble Sort algorithm, and the possible use of pitch sequences, rhythm, timbre, and stereophony. Firstly in this chapter, some ad-hoc experiments using sequences of rising pitch, stereophony, and timbre were reported. Secondly, experiments with PCTE OMS type-related (sample) data showed that music could assist on-line visual browsing by communicating aurally some visually hidden information. It was also shown that musical design can be approached in a way in which the musical messages deliver more than one piece of information to the user. Thirdly, experiments with continuous musical stimuli as generated from the sorting algorithm showed that a combination of musical structures (pitch and rhythm) helps in communicating a single pass of an algorithm in one continuous message.

The results in this chapter indicate that music can be used to communicate messages of an intuitive nature. For example, subjects were not trained to recognise the passes of the Bubble Sort, apart from being presented with a few examples to demonstrate the nature of the musical design and explain the rules of the musical presentation, yet they were able to identify crucial aspects of the sorting process.

Chapter 4

AudioGraph: Experiments in the Graphical Domain

4.1 Introduction

This chapter discusses some findings from experiments using music as a communication metaphor in the problem domain of graphical presentation for blind users. Firstly, the research objectives in this particular problem domain are reviewed in the context of the overall objectives of the thesis. Secondly, the chapter focuses on reporting, discussing and interpreting the experiments and experimental findings of the work carried out in this problem domain.

An experimental framework program, AudioGraph, has been developed in the light of the research work previously discussed, as an attempt to investigate musical information processing further. Although AudioGraph can be viewed as a prototype tool, its declared aim was to investigate musical information processing in the graphical domain. The AudioGraph program implements some musical mappings which could assist blind users in reading graphical information through the use of music. The research work documented here focuses on the usage of music as a communication metaphor (rather than speech). Experiments under the AudioGraph program had two phases, although no differentiation between these phases is reflected in this chapter. For information purposes, these phases were:

1. The first phase concentrated on the development of the AudioGraph experimental framework and investigated the overall feasibility and applicability of music in such a complex problem domain as graphical presentation. Particular research emphasis was placed on the overall representation of the graphical drawing area and the perception of location within such an area using music. The description of graphical objects using music was also investigated.

2. The second phase of the experiments included the investigation of other issues and extra experimental mappings. Additional functionality was added incrementally onto the existing framework. Investigations were pursued into comprehensive presentation and editing operations on graphical objects, and methods of representing whole sets of graphical objects.

Finally, the chapter concludes with a discussion of the overall results, an interpretation of the experiments, and the applicability and appropriateness of music to communicate graphical information to blind users.

4.2 Research Objectives

The first objective was to test the overall feasibility and viability of the experimental research directions pursued. Secondly, an overall framework was to be established on which further research study could be based. Thirdly, the research work was pursued with the intention of enlarging the overall understanding of the communication of graphical information using music. It is expected that the results of this research will contribute to assisting blind users in interacting with computer-generated graphical information but, before this happens, an investigation is required to determine the methods and structures needed so that fundamental properties of the graphical domain, such as position within an area or the shape of a graphical object, can be communicated with the aid of music.

The achievement of these objectives requires the construction of an experimental framework tool, not with the objective of serving as a complete tool ready to be used by blind users, but with the intention of using it as a framework for experimenting with and evaluating musical mappings, along with the measurement and observation of musical information processing in this particular problem domain. The creation of the AudioGraph experimental framework will help in establishing a viewpoint of the potential capabilities and limitations in using music to communicate information of a graphical nature. It is understood, of course, that future mappings will necessarily involve music and speech, and possibly other modes as well, in a balanced manner. This balance will be primarily dictated by the perceptual needs and capabilities of blind users in processing music and speech. Here, the musical aspect is investigated.

4.3 AudioGraph: An Experimental Framework

The AudioGraph framework implements musical mappings in order to examine if these mappings could assist blind users to understand graphical objects in a two-dimensional space. The system was designed to use music to communicate a space containing graphical objects (e.g., squares or circles), provide an auditory cursor to move around the graphical drawing area, locate objects, create objects, edit objects, and select menu options to control the tool (some Function keys have also been used). Facilities also exist for scanning the space. In designing the framework, the following issues needed to be considered, not only for the current experimentation but also for future experimentation:

- The mode of interaction which the AudioGraph uses for exploratory investigation purposes in employing music as a spatial communication metaphor.

- The architecture of AudioGraph in terms of its software and hardware.

- The design principles of AudioGraph and its object-oriented design for musical presentation, which offers the provision for expansion and alteration with minimum software effort for incremental building and experimentation.

- The overall description of AudioGraph's musical functionality and its input interface mechanisms.

- The musical mappings in AudioGraph for the graphical drawing area and the specification of relative location within it.

- The implementation of the auditory cursor, graphical drawing objects (e.g., a circle), and presentation techniques for a set of graphical objects.

- The usage of speech for further evaluation and its potential integration with music. However, one must note that the experiments, at this stage, were performed with music only.

System Architecture

The AudioGraph was developed using an object-oriented methodology in Pascal. The architecture involved:

- An object-oriented mechanism to allow a designer to try out many musical mappings in such a way that properties of music can be implemented easily, in the absence of too many constraints from the software design (i.e., provisions are taken to allow one to change the musical mappings, the timbre used, and the presentation speed or order in an easy manner).

- A visual presentation facility for the experimenter (as well as for future experiments which may require integration of visual and audio media to accommodate the needs of partially sighted people).

- A system which makes use of both an internal and an external synthesiser with optional amplification, and speakers (an external synthesiser is used).

Figure 4.1 shows the different software components and the hardware specification of the AudioGraph experimental framework program. These are the input manager, the graphical drawing manager, the auditory manager, and the low- and high-level MIDI libraries. The input manager handles all the input to the AudioGraph and offers design provisions for expansion or alteration of its functionality. The keyboard is the device which is currently used for all input to the tool, since blind users in higher education are usually familiar with keyboards. The graphical drawing manager produces all the visual output of the AudioGraph, interacts with the input manager, and informs the auditory manager of all user input. The auditory manager is responsible for all the musical and speech output of the tool by making calls to the high-level MIDI library which, in turn, after appropriate processing, makes further calls to the Sound Blaster Kit (SBK, a trademark of Creative Labs) low-level MIDI libraries. The auditory manager synchronises its output with the graphical drawing manager so that musical and visual presentation is co-ordinated. This co-ordination will be useful for experiments with partially or severely visually impaired users. One must note that the visual display of the AudioGraph, in the experiments conducted, was for the use of the experimenter only, and that none of the subjects who participated in the experiments was in a position to make use of any visual presentations.

[Figure 4.1 appeared here]

Figure 4.1: An Overall Presentation of the AudioGraph Software and Hardware Architecture.

Overall Description of Functionality

In this section, the overall experimental functionality of the AudioGraph tool is described. Some of the concepts to which particular significance is given are:

- The auditory-musical output that the tool offers for mapping and presenting graphical information with a number of different properties of music, such as pitch, rhythm, and tunes.

- The user interface acoustical interaction mechanisms under which a blind user can operate the tool.

- The screen layout, which allows a user to navigate around several locations by using the cursor via the arrow keys of the keyboard.

The AudioGraph prototype communicates the following to the user:

1. The relative position, within a musically defined graphical drawing area, of a point, graphical drawing objects, and the auditory cursor.

2. The shape of graphical drawing objects, using a semantic interpretation process (i.e., representationally), as opposed to associative symbolic recall, using musical messages which present the shape of the graphical drawing object in question according to the musical properties defining the graphical drawing area.

3. The meaning of musically defined editing operations, such as contract or expand, and the musical presentation of filenames when loading and saving graphical information.

4. The editing operations performed upon graphical drawing objects.

5. The musical presentation of objects using a number of different orders (objects to be presented first, second, and so on) in order to assist users to understand different arrangements of objects within the graphical drawing area.

6. The abstract location of graphical objects and available space in the surrounding and more distant areas from a variable position of the auditory cursor within the graphical drawing area, so that the user can plan the movement of the cursor with prior knowledge of the location of other graphical objects.

Further functionality and extensions of the musical mapping can be added.

User Interface Organisation and Strategy

The visual layout, as well as the auditory interaction, of the AudioGraph is divided into two main parts. The overall layout is shown in Figure 4.2. These parts are:

1. The graphical drawing area. This is the area where the diagram is drawn in the visual sense. It is also mapped musically into two aural dimensions.

2. The user control panel area. This is the area where all the user controls of the tool are located. They are presented visually, musically and verbally (using a speech synthesiser).

[Figure 4.2 appeared here]

Figure 4.2: A Visual Presentation of the AudioGraph Auditory Interface.

Graphical Drawing Area

The fundamental problem was the design of the musical representation for the auditory space within which the graphical objects are accommodated. In order to fully understand the concept of this space, it must be defined. The space is viewed as a definite area with beginning and ending boundaries for its vertical and horizontal dimensions (variable sizes could have been used, but the experimentation here uses one fixed size). This, of course, defines a two-dimensional space. The tool has design provisions to accommodate an extra dimension (i.e., three dimensions) with variable sizes, but a two-dimensional space is used in the experiments.

The graphical drawing area is the area within which the user draws and further edits particular graphical objects (e.g., lines, squares, circles), and it is defined using music to represent relative co-ordinate locations. The horizontal and vertical dimensions are represented musically using two different instruments, as shown in figure 4.3. The length of the horizontal and vertical dimensions is represented by a pitch sequence using the chromatic musical scale. The horizontal axis is represented by a piano. The pitch increases from low to high, the leftmost location being represented by the lowest pitch and the rightmost location by the highest pitch of the selected forty-note pitch range (starting from E2). The vertical axis is represented by an organ. The pitch increases from the bottom part of the graphical drawing area to the top (again a forty-note range). The horizontal location is always presented first, followed by the vertical one. Additional cues are given to assist the user: a short rhythmic sequence is used to separate the horizontal (X) and vertical (Y) co-ordinates. Piano and organ were chosen because they had a good recognition rate in the experiments with instruments (see section 3.4) and they have a long pitch range.

[Figure 4.3 appeared here]

Figure 4.3: The Musically Defined Graphical Drawing Area.

The cursor location, within the graphical drawing area, is represented by a pair of notes which designate the horizontal and vertical dimensions in the appropriate pitch according to the particular location. This pair of notes is also used to provide user feedback for the descriptions of the graphical objects' shapes. In the mapping of the auditory cursor, the pair of notes offers immediate feedback to the user with regard to the location of the cursor and its movement within the graphical drawing area. The horizontal location is always presented first, followed by the vertical one (to be consistent with all other presentations).
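As a minimal sketch of this mapping (assuming the MIDI convention in which C4 = 60, so E2 = 40), each co-ordinate in the 40-position grid maps to one semitone:

```python
# Sketch of the co-ordinate-to-pitch mapping described above. MIDI note
# numbers assume the C4 = 60 convention, under which E2 = 40.
E2 = 40                                    # lowest note of the 40-note range

def coord_to_pitch(coord):
    """Map a grid co-ordinate in 1..40 onto the chromatic scale from E2."""
    assert 1 <= coord <= 40
    return E2 + (coord - 1)                # one semitone per grid position

def cursor_feedback(x, y):
    """The note pair played at each cursor move: horizontal location first
    (piano timbre), then vertical (organ timbre)."""
    return [('piano', coord_to_pitch(x)), ('organ', coord_to_pitch(y))]

print(cursor_feedback(1, 40))              # top-left corner: E2, then G5
```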

When the cursor reaches the boundaries of the graphical drawing area, a rhythmic signal of notes informs the user that there is no further space for the cursor to move into. This end-of-zone boundary signal is played using a percussive instrument. The tool does not allow the cursor to move any further once a boundary side has been reached. If the user tries to move the cursor beyond the boundary, the boundary musical signal is played repeatedly, following the number of times that the particular arrow key has been pressed. The auditory cursor is relocated from the graphical drawing area to the user control panel when the user presses the 'space' key, and vice versa.

User Control Panel Area

The user control panel area offers a number of options under which a user can operate the framework. When the cursor has left the graphical drawing area (by pressing the 'space' key), a synthesised speech message informs the user that the graphical drawing area has been vacated and the control panel area has just been entered. The control panel area, shown in figure 4.2, is divided into smaller subareas whose visual displays are shown as squares. Each of these subareas represents a particular option. The navigation of the cursor within the control panel area is performed in exactly the same manner as in the graphical drawing area, that is, using the arrow keys. The cursor moves from one subarea to another within the user control panel area as the arrow keys are pressed. On pressing the 'space' key, the cursor returns to the position in the graphical drawing area where it was previously located.

The philosophy behind the auditory interaction process between the user and the tool is based on the following principle: the user presses the arrow keys of the computer keyboard (i.e., left, right, up, and down) to navigate the cursor around the graphical drawing area and the user interface control panel. Any user action, task or option can be confirmed by the user when any key apart from the arrow keys of the keyboard is double pressed. A single press of any key apart from the arrow keys will cause the tool to take no further action.

The subareas associated with particular options are:

1. Circle subarea. When the cursor is located within this subarea, a circle graphical object can be selected (using a double key press).

2. Rectangle subarea. When the cursor is located within this subarea, a rectangle graphical object can be selected.

3. Line subarea. When the cursor is located within this subarea, a line graphical object can be selected.

4. Square subarea. When the cursor is located within this subarea, a square graphical object can be selected.

5. Undo subarea. When the cursor is located within this subarea, the last operation performed can be cancelled.

6. Read subarea. When the cursor is located within this subarea, all objects in the graphical area are communicated using music in the order they were drawn by the user.

7. Drag subarea. When the cursor is located within this subarea, a graphical object already selected by the user from the graphical drawing area can be dragged (i.e., moved from its existing location to a user-selected one).

8. Contract subarea. When the cursor is located within this subarea, a graphical object already selected by the user from the graphical drawing area can be decreased to a user-defined size.

9. Expand subarea. When the cursor is located within this subarea, a graphical object already selected by the user in the graphical drawing area can be increased to a user-defined size.

10. Load subarea. When the cursor is located within this subarea, an existing file containing a particular diagram can be loaded.

11. Save subarea. When the cursor is located within this subarea, the current diagram can be saved into a user-selected file.

12. Clear subarea. When the cursor is located within this subarea, all graphical objects can be cleared from the graphical drawing area and permanently deleted.

13. Scan subarea. When the cursor is located within this subarea, the objects within the graphical area are communicated using music, scanning from left to right and top to bottom.

14. Centre scan subarea. When the cursor is located within this subarea, the objects within the graphical area are communicated using music, starting from the centre of the graphical area.

15. Ascending scan subarea. When the cursor is located within this subarea, the graphical objects within the graphical drawing area are communicated using music, starting from the smallest and proceeding to the next bigger object.

16. Descending scan subarea. When the cursor is located within this subarea, the objects within the graphical area are read auditorily, starting from the biggest and proceeding to the next smaller object.

17. Selected region subarea. When the cursor is located within this subarea, the graphical objects within a user-selected graphical area (a sub-part of the whole graphical area) are communicated using music.

18. Quit subarea. When the cursor is located within this subarea, the user can exit the AudioGraph.

19. Reminder subarea. When the cursor is located within this subarea, an option can be selected that reminds the user of the most recent graphical object shape which has been selected (providing that such a selection has been made).

To select any of the above options, any key (apart from the 'space', 'arrow', and 'F' keys) must be pressed twice. The user can select a file in order to load or save a diagram with the following interaction mechanism: when the load or save option is selected and confirmed, the relevant files are presented to the user as a sequence of different musical melodies (short tunes), in a serial manner, separated by short time intervals. The user can select any of these files, once the file's associated melody has been played, by double pressing any key apart from the arrow keys. The user selection and confirmation must take place in the time interval offered by the tool immediately after a particular musical melody has been presented. If no user confirmation is given during the time interval, the tool proceeds with the musical presentation of the next file, and so on. If no user selection and confirmation has been made by the time all files have been presented, the tool takes no further action.
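The file-selection dialogue can be summarised as a simple timed loop. The sketch below is an illustrative reconstruction of that behaviour; the helper functions standing in for melody playback and keyboard polling, and the file names, are hypothetical rather than taken from the tool.

```python
import time

def play_melody(melody):
    print("playing:", melody)              # stand-in for the MIDI output

def key_pressed_within(seconds):
    time.sleep(seconds)                    # stand-in for the keyboard poll
    return False

def select_file(files, interval=2.0):
    """Play each file's melody in turn; a double key press within the
    interval selects that file, otherwise the next melody is played."""
    for name, melody in files:
        play_melody(melody)
        if key_pressed_within(interval) and key_pressed_within(0.5):
            return name                    # user confirmed this file
    return None                            # all presented: take no action

print(select_file([('house.dia', [60, 64, 67]), ('map.dia', [62, 65, 69])]))
```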

User Interface Input

Another important issue in the design of the framework was the selection of an input mechanism which would enable blind users to navigate the auditory cursor of the experimental tool, especially in the graphical drawing area. Interviews with blind users indicated that the input mechanism must provide users with facilities for:

1. Navigating the auditory cursor in a convenient manner.

2. Moving the auditory cursor in a structured way, characterised by distance steps known to the user.

3. Facilitating convenient and easy search with the auditory cursor for locating and exploring the graphical drawing objects accommodated within the musically defined graphical drawing area, which will also enable editing operations on graphical objects to be performed without confusion.

4. Relocating the cursor to its initial position before an operation occurred, so that users can manually perform an operation similar to an 'undo'. This is particularly important when the cursor has been moved wrongly to some other close location.

5. Selecting options exported by the User Control Panel.

The candidates for the input mechanism used in the AudioGraph experimental framework were the mouse and the arrow keys of the keyboard. Feedback from structured interviews with eight blind users reinforced the view that the mouse was not practicable or appropriate for this particular graphical context application. All the blind users were students from the RNIB. They commonly used software packages which utilised synthesised speech output, and some of them also had programming experience. All of them were used to using the keyboard. A number of the blind users quoted examples drawn from their own experience with computing and information technology which strongly suggested a number of potential weaknesses of mouse input mechanisms in a graphical context where detailed, structured, and user-predictable movement was required. On these grounds, the use of arrow keys was preferred. Figure 4.4 shows the possible movements of the cursor in the control panel. The idea of using this style of boxes as options was initially seen in the Soundtrack auditory word processor [123].

[Figure 4.4 appeared here]

Figure 4.4: Possible Movements with the Arrow Keys in the User Control Panel of the AudioGraph.

Musical Presentation of Single Graphical Objects

The auditory representation of each graphical object (e.g., circle, square) involves a particular musical sequence. Firstly, the co-ordinates of the starting reference point of an object are sounded by using the sequential pitch method. The reference points depend upon the object and are: the centre of a circle; the left top corner of a square or rectangle; and the left or top starting point of a horizontal or vertical line. Secondly, the actual shape of the object is presented, in a clockwise sequence, by playing consecutive pairs of notes which correspond to the co-ordinates of the outline of the object within the musically defined graphical drawing area, for the horizontal and vertical dimensions, scanning the locations occupied by the shape as illustrated in figure 4.5. For a circle, the radius sounds twice and then its shape is presented; the describing instruments were clavi and celeste. For a rectangle or square, the horizontal and vertical sides are represented by pairs of notes which sound in accordion and harpsichord for the respective co-ordinates. Similarly, for a horizontal or vertical line, the instruments used were brass and celeste. Consistency of presentation was maintained by communicating horizontal co-ordinates first and vertical co-ordinates second for all objects, using the musical definition of the graphical area. The change of instruments was expected to communicate simply a change in graphical object communication.
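The outline presentation can be illustrated with a small sketch. It reuses the coord_to_pitch mapping from the earlier fragment and assumes, for the example, a step of one grid cell along the outline; neither the step size nor the helper names are taken from the thesis implementation.

```python
coord_to_pitch = lambda c: 40 + (c - 1)    # chromatic mapping from E2, as before

def square_outline(left, top, side):
    """Clockwise outline co-ordinates, starting from the top-left corner
    (the square's reference point); y grows upwards, as in the drawing area."""
    right, bottom = left + side, top - side
    pts  = [(x, top) for x in range(left, right + 1)]            # top edge
    pts += [(right, y) for y in range(top - 1, bottom - 1, -1)]  # right edge
    pts += [(x, bottom) for x in range(right - 1, left - 1, -1)] # bottom edge
    pts += [(left, y) for y in range(bottom + 1, top)]           # left edge
    return pts

def present(outline):
    # Each outline co-ordinate becomes a note pair: horizontal pitch first,
    # then vertical, consistent with all other presentations.
    return [(coord_to_pitch(x), coord_to_pitch(y)) for x, y in outline]

print(present(square_outline(left=5, top=10, side=2))[:4])
```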

[Figure 4.5 appeared here]

Figure 4.5: The sequence of the graphical object's musical presentation.

As a result of the experiments performed, the instruments were changed to piano and organ for communicating horizontal and vertical co-ordinate locations in all graphical objects.

Scanning the Graphical Drawing Area

It is important for a blind user to be able to work not only with one graphical object in isolation, but with a number simultaneously. Given that most diagrams will contain a varied number of different graphical objects, a number of scanning techniques (providing an equivalent of visual scanning) will be required. Three different presentation techniques have been developed and investigated, which aim to offer a blind user the following:

- An abstract idea of the overall arrangement of the graphical drawing objects.

- A mental model which describes the diagram in a reasonable degree of detail.

The top-down-left-right scanning method reads the graphical area, musically, from the top-left corner to the bottom-right corner.

This technique was regarded at the beginning to be the simplest way of reading a diagram and was thought to be one of the most natural ways of scanning. However, it was realised that more sophisticated methods would also be needed.

[Figure 4.6 appeared here]

Figure 4.6: A Diagrammatic Approach to the Sequence of the Musically Presented Graphical Objects in Top-Down Scanning.

A typical order of presenting graphical objects musically using this technique is shown in figure 4.6. The basic scanning principle defining the presentation order is: start the scanning from the top-left corner of the graphical drawing area and proceed horizontally, advancing a variable step vertically. The first 'reference point' of a graphical object encountered is presented first. The priority is determined by the location of the reference point of each object. Figure 4.6 illustrates the order in which the graphical objects A, B, C, D, E, F, G, H, I, J, K and L will be presented musically in the top-down order. The co-ordinates of the reference point of a graphical object determine the order of presentation, not the object's size. For this reason, although circle F is encountered prior to rectangle E, the rectangle is presented first and then the circle, because the reference point of the circle is encountered after the reference point of the rectangle.
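Reduced to code, this ordering (and the centre-based ordering introduced next) is simply a sort of the objects by a key derived from their reference points. The comparison keys below are an illustrative reading of the two techniques; the 40x40 grid, the centre co-ordinates, and the dictionary representation of an object are assumptions of the example.

```python
# Each object is reduced to its reference point: the centre of a circle,
# or the top-left corner of a square or rectangle.
def top_down_order(objects):
    # Rows from the top (high y) downwards; left to right within a row.
    return sorted(objects, key=lambda o: (-o['ref'][1], o['ref'][0]))

def centre_order(objects, centre=(20, 20)):
    # Present first the object whose reference point is nearest the centre.
    cx, cy = centre
    return sorted(objects,
                  key=lambda o: (o['ref'][0] - cx) ** 2 + (o['ref'][1] - cy) ** 2)

shapes = [{'name': 'A', 'ref': (3, 38)}, {'name': 'F', 'ref': (19, 21)}]
print([s['name'] for s in centre_order(shapes)])   # F first: nearest the centre
```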

Another ordering technique was developed which uses the centre of the graphical drawing area as the reference point in scanning the objects in the area. The centre scan technique starts, by definition, from the centre of the horizontal and vertical axes of the graphical drawing area and expands in a navigating course, choosing next the object whose reference point (i.e., the centre of a circle, or the top-left corner of a square or rectangle) is nearest to the centre of the display.

[Figure 4.7 appeared here]

Figure 4.7: A Diagrammatic Approach to the Sequence of the Musically Presented Graphical Objects in Centre Scanning.

Figure 4.7 shows visually the order of the musical presentation of the graphical objects: F, E, G, J, H, C, D, I, K, L, B and A. It was found (see section 4.4) that the central scanning technique helped blind users due to the following characteristics:

- It assists the user in the building of a mental model of the graphical data by starting from the centre of the graphical drawing area and, if that is empty, from the graphical object nearest to the middle of both axes.

- The user's mental model of the graphical data expands in a circular manner, which is more comprehensible for certain graphical arrangements.

- It supports the incremental building of the user's mental model as the diagram expands.

- It provides a more accurate relative position of graphical drawings, especially in unbalanced graphical arrangements. The mental model is built with reference to the graphical drawing area's boundaries and the centre of the area.

In addition, a facility for scanning a selected region of the graphical drawing area was also developed, using the top-down-left-right scanning principle. This facility allows the user to select a particular part of the graphical drawing area and listen to its graphical content.

In these attempts to 'musicalise' the overall positions of graphical objects in the graphical drawing area, a further concept became evident during the evaluation of the previous scanning techniques: the size of a graphical object in relation to others. The informal evaluation of the previous scanning techniques in building mental models revealed that users had difficulty in remembering the size of graphical objects. One-to-one interviews and questionnaires with blind people clearly showed that a more memorable mental model was likely to be constructed if the scanning organisation started from the smallest object and moved progressively to the biggest graphical object, and vice versa. In this way, the mental model follows size indications. The blind user can start building the model from the smallest or biggest object, in ascending or descending order correspondingly. We call these scanning techniques ascending size and descending size scans.

Local Scanning from the Auditory Cursor

In addition to the various scanning techniques discussed above, some further scanning mechanisms using music were developed. Preliminary interviews and trials indicated that blind users needed a set of facilities which allowed them to understand the immediate and more distant space around the cursor within the graphical drawing area, from a variable position of the auditory cursor. This meant that, as the user moved the auditory cursor within the graphical drawing area of the AudioGraph, mechanisms were needed to inform blind users of the neighbouring graphical objects or empty space. More specifically, users indicated that they needed the following information from the current position of the auditory cursor:

1. The distance from other graphical objects.

2. The space around the cursor's location available for drawing more objects or for other operations.

3. The shape (e.g., circle, square) of the surrounding graphical objects and all other information related to these objects (e.g., size).

4. The graphical objects located further away and their relative distances from the auditory cursor.

In order to address these particular requirements of blind users, a preliminary study was conducted. A new scanning technique was introduced for scanning the immediate and distant graphical objects in the graphical drawing area from any current position of the auditory cursor. We call this star local scanning. This immediate (and distant) scanning technique reads in a star structure, where beams of rising pitch scan radii of the graphical drawing area.

[Figure 4.8 appeared here]

Figure 4.8: A graphical representation of the principles involved in the star local scanning.

As figure 4.8 shows, the star technique allows the user to understand the area surrounding the cursor position by communicating a sequence of notes which stops when the first object is encountered. The sequence is in a clockwise order, and users were told that the scanning followed a sequence similar to the clock hours, passing through 12, 2, 3, 4, 6, 8, 9, and 11 o'clock and excluding the remaining hours (1, 5, 7, and 10 o'clock). The remaining hours can be added if the idea shown is successful in user experimentation. However, one must note that the objective is to communicate relative distance from the cursor to the boundaries of the graphical area. In addition, the pitch variation (ascending or descending) is another cue to assist with the perception of direction. When an object is encountered, the sequence of notes communicates the distance of the object from the cursor. With the help of the Function keys, the user can play the shape of the object. Once the shape of the object has been played, the sequence of notes continues until it encounters another object or terminates. During the development of this scanning technique, other methods were considered, but this one was followed because it showed a higher level of success in the informal trials.
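One beam of such a scan can be sketched as follows. The direction table, grid size and pitch choices are assumptions made for the example rather than details taken from the tool.

```python
# One beam of the star local scan: step outwards from the cursor along a
# clock direction, one rising-pitch note per empty cell, stopping at the
# first occupied cell (so the note count conveys its distance).
DIRECTIONS = {12: (0, 1), 2: (1, 1), 3: (1, 0), 4: (1, -1),
              6: (0, -1), 8: (-1, -1), 9: (-1, 0), 11: (-1, 1)}

def beam(cursor, occupied, hour, size=40, base_pitch=40):
    dx, dy = DIRECTIONS[hour]
    x, y = cursor
    notes = []
    while True:
        x, y = x + dx, y + dy
        if not (1 <= x <= size and 1 <= y <= size):
            return notes, None                  # beam reached the boundary
        notes.append(base_pitch + len(notes))   # rising chromatic pitch
        if (x, y) in occupied:
            return notes, (x, y)                # first object encountered

print(beam((20, 20), {(25, 20)}, hour=3))       # five notes, object at (25, 20)
```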

4.4 Experiments with Musical Mappings

This section documents the experiments and results obtained with some of the musical mappings exported by the AudioGraph experimental framework. It evaluates the musical mappings (discussed previously in section 4.3) used to provide abstract location within the graphical area, descriptions of graphical objects and of the editing operations performed upon them, and auditory cursor perception and navigation, as well as mechanisms for scanning sets of graphical objects (forming diagrams). Apart from the experimental results and observations, user feedback was also elicited using questionnaires and interviews. All the subjects who participated in the experiments had to answer a set of questions, and subjects who wished to make further comments were also interviewed in longer sessions. In addition, some preliminary experimental trials were conducted. These trials were used to test research ideas before a more carefully designed musical mapping was implemented, on which formal experimentation was carried out. These trials probed the potential merit of a particular research idea. Subjects participated in these trials in an informal way, and some unstructured feedback was received. Any observations made and recorded during the informal experimental trials informed the overall design of the formal experiments.

The subjects who participated in the experiments were volunteers from the RNIB school (The Royal National Institute for the Blind, Loughborough), from Loughborough College, from Loughborough University, and from blind charities operating in Leicestershire. None of the subjects who participated declared themselves to be professional musicians. Three of the subjects did declare an amateur interest in music, but their musical knowledge, like that of the other subjects (who did not declare knowledge of music), as elicited from the questionnaires, was not significant. The subjects were an opportunistic sample. Most of them (90%) were involved in higher education and had experience with computers using speech. Not all participants completed all the experiments, due to time demands. There were also sessions (without experimental procedures) where subjects were left to operate the AudioGraph freely. During these sessions, however, observations were made and noted by the experimenter.

Location Experiments

Using a rising pitch on the piano to map horizontal co-ordinates and a rising pitch on the organ to map vertical co-ordinates, experiments to communicate graphical locations were performed in the following situations:

1. Using a sequence of notes starting from the beginning of the scale and finishing at the actual note corresponding to the location. We call this the sequence of notes experiment (described below).

2. Using two notes for each co-ordinate: the starting note of the scale and one corresponding to the particular horizontal location (likewise for the vertical). Here it was investigated whether subjects could estimate the interval of the distance between the starting note and the ending note corresponding to the location communicated. Again, the reference scale was presented. We call this the reference-actual notes experiment (described below).

3. Using two notes, one for the horizontal location and one for the vertical location. The reference scales were also presented, to enable subjects to compare the presented note with its position within the scale. We call this the actual-notes experiment (described below).

The three experiments were carried out in sequence by the subjects. Therefore, learning effects might be expected.
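The three stimulus formats differ only in how much of the reference scale is replayed for a given co-ordinate. A small sketch makes the contrast explicit, reusing the chromatic mapping from E2 (MIDI 40) assumed earlier:

```python
def sequence_of_notes(coord):
    return [40 + i for i in range(coord)]      # root up to the location note

def reference_actual(coord):
    return [40, 40 + coord - 1]                # root note, then location note

def actual_note(coord):
    return [40 + coord - 1]                    # the location note alone

for fmt in (sequence_of_notes, reference_actual, actual_note):
    print(fmt.__name__, fmt(5))                # stimuli for co-ordinate 5
```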

162 Sequence of Notes Experiment In the sequence of notes experiment, 12 subjects participated. Subjects heard the scale of notes corresponding to the numbers from 1 to 40 for five times initially on the chromatic scale. Then they heard two subsets of notes taken from the same reference sequence of notes starting always from the root position and finishing at some intermediate position of the reference sequence. The first sequence communicated the horizontal axis location using a piano and the second the vertical axis location using an organ. Subjects heard (for each observation) the initial reference sequence and then three identical samples of the two subset sequences indicating the graphical location co-ordinates. The locations were presented to each subject at a random order. Subjects spoke the perceived co-ordinates which were recorded by the experimenter. The overall results are shown in table 4.1. The total number of measurements was 624 (312 for horizontal locations and another 312 for vertical locations ) out of which 139 were correct, 215 had an error of ±1, 112 had an error of ±2, 72 had an error of ±3, 38 had an error of ±4, 16 had an error of ±5, and the remaining 32 measurements had an error of more than ±5. Error 0 ±1 ±2 ±3 ±4 ±5 > ±5 Out of 624 measurements Distribution in % Relative frequency Probability on 1 measurement Table 4.1: An Overall Representation of the Relative Frequency of the Perceptual Error as Demonstrated by the Sample in the sequence of notes experiment. It can be seen that 74.6% of the measurements were within an error of ±2 and 86.1% within ±3. Figure 4.9 shows the mean of subjects perception plotted against the communicated locations for horizontal and vertical axes. A positive and linear relationship between the means and the stimuli presented can be identified (r=0.99) for vertical and horizontal locations. A more detailed representation of the data can be seen in figure 4.10 where the perceived frequency of horizontal (in the first diagram) and vertical (in the second diagram) locations plotted against the communicated ones are shown. The spread of error is smaller for short sequences (e.g., 2 or 5) and larger for longer sequences. Experiments with sequences of notes documented in section

showed no perceptual difference between the first and the second subsets of the sequences regardless of the timbre used. Similarly, no difference can be seen between the first and second subsets of sequences communicating horizontal and vertical locations (see figure 4.10).

Figure 4.9: A Scatter Plot of the Presented Horizontal and Vertical Locations with the Participants' Perceived Mean in the sequence of notes experiment. Statistical values of correlations in horizontal co-ordinates were r=0.99, intercept=0.07, slope=1.003 and in vertical co-ordinates were r=0.99, intercept=0.20, slope=1.007.

Figure 4.10: Perceived frequency of horizontal and vertical co-ordinates in the Sequence of Notes Experiment (n=12).

Reference-Actual Notes Experiment

Another experiment was then performed with the same 12 subjects, but only after they had received at least thirty minutes of training with the musical representation of the horizontal and vertical co-ordinates (by using the AudioGraph on their own). The procedure remained the same as previously discussed, but this time subjects heard the first note of the reference sequence (communicating the beginning of the scale) and a second note (communicating the finishing position within the scale). Thus, subjects heard two notes communicating the horizontal position and two notes communicating the vertical one. Co-ordinate locations were again presented in a random order for each subject.

Error                       0      ±1     ±2     ±3     ±4     ±5     >±5
Out of 624 measurements     35     79     96     99     68     61     186
Distribution in %           5.6    12.7   15.4   15.9   10.9   9.8    29.8
Relative frequency          0.06   0.13   0.15   0.16   0.11   0.10   0.30

Table 4.2: An Overall Representation of the Relative Frequency of the Perceptual Error as Demonstrated by the Sample in the reference-actual notes experiment.

Figure 4.11: A Scatter Plot of the Presented Horizontal and Vertical Locations with the Participants' Perceived Mean in the reference-actual notes experiment. Statistical values of correlations in horizontal co-ordinates were r=0.97, intercept=3.65, slope=0.83 and in vertical co-ordinates were r=0.98, intercept=3.37, slope=0.85.

The overall error in the subjects' perception of the locations communicated is shown

in table 4.2. The total number of measurements was, again, 624 (312 for horizontal locations and another 312 for vertical locations), out of which 35 were correct, 79 had an error of ±1, 96 had an error of ±2, 99 had an error of ±3, 68 had an error of ±4, 61 had an error of ±5, and the remaining 186 measurements had an error of more than ±5. When the distribution of error is compared with the distribution observed in the sequence of notes experiment, shown earlier (see figures 4.9, 4.10 and table 4.1[3]), the 86.1% of perceived values within ±3 in the sequence of notes experiment has dropped to 49.5% in the reference-actual notes experiment. A positive linear relationship still derives when the perceived means are correlated against the communicated locations (r=0.9), as shown in figure 4.11. However, a better representation of the data is shown in figure 4.12, where the spread of the frequency of the perceived values for horizontal (first diagram) and vertical (second diagram) co-ordinates is plotted against the communicated co-ordinates. It can be seen that the spread of perceived values is considerably larger in the perception of both horizontal and vertical co-ordinates compared with the sequence of notes experiment.

[3] The error distribution was 139, 215, 112, 72, 38, 16, and 32 measurements for errors 0, ±1, ±2, ±3, ±4, ±5, and >±5, respectively.

167 " K " 1" I 2'.'.71 'WUUUWUI'~""~Unn~~U~D».nn»~B.~.W... " u.. " " 1 2' WUUUK15"flu"~nnuuu~~~~»nn»M»~~~~~ "'""' Figure 4.12: Perceived frequency of horizontal and vertical co-ordinates in Reference Actual Notes Experiment (n=12). 153

Actual-Notes Experiment

A further experiment was performed with the same subjects, who were now more experienced with the reference scales used. The procedure remained the same, but this time only two notes were used to communicate the horizontal and vertical co-ordinates. Thus, subjects heard the reference sequence of notes five times, and then two notes representing the horizontal and vertical co-ordinates. Table 4.3 shows the error in perception which occurred with these trained and experienced listeners.

Error                       0      ±1     ±2     ±3     ±4     ±5     >±5

Table 4.3: An Overall Representation of the Relative Frequency of the Perceptual Error as Demonstrated by the Sample in the Actual-Notes Experiment (out of 624 measurements; distribution in %; relative frequency; probability on 1 measurement).

Overall, approximately one third (32%) of the measurements demonstrated an error equal to or greater than ±5, and two thirds (68%) of the measurements were within ±4. These results are very similar to those demonstrated in the reference-actual notes experiment. They show that, in the absence of a reference note, subjects did not perceive locations any worse. Naturally, subjects had some experience with the scale used from their previous participation in the sequence of notes and reference-actual notes experiments. A positive linear relationship still derives when the perceived means are correlated against the communicated locations (r=0.9), as shown in figure 4.13. Admittedly, subjects were experienced with the reference scales; however, this experiment differs from learning absolute pitch because subjects could also hear the scale from which the note was taken.

Figure 4.13: A Scatter Plot of the Presented Horizontal and Vertical Locations with the Participants' Perceived Mean in the actual-notes experiment. Statistical values of correlations in horizontal co-ordinates were r=0.98, intercept=3.19, slope=0.89 and in vertical co-ordinates were r=0.98, intercept=3.52.

Figure 4.14: Perceived frequency of horizontal and vertical co-ordinates in the Actual-Notes Experiment (n=12).

Discussion of Location Experiments

Figure 4.15 shows the subjects' mean of perception and standard deviation for the horizontal and vertical communicated locations in the sequence of notes, reference-actual notes and actual-notes experiments. The sequence of notes presentation showed better location determination than the other two presentations (F=14.72, critical value 4.79 at 1%). Results from the sequence of notes experiment were different from the results in the actual-notes experiment (t=2.03, critical value at 0.05). The performance of the sequence of notes presentation was even more impressive because subjects did this experiment first and therefore had minimum learning. But is this 'best' method good enough to communicate abstract location with reasonable accuracy?

An advantage of using actual notes (say) for the cursor, as opposed to reference-actual notes, to communicate an abstract location within the graphical area is that it takes less time: two notes are heard, one for the horizontal co-ordinate and one for the vertical co-ordinate, as opposed to four notes, two for each co-ordinate.

An examination of the performance of the sequence of notes experiments over the graphical area gives a pointer as to how location communication might be improved. It can be seen that for short sequences the accuracy is very good indeed (which probably means that the subjects are counting to some extent). Once the sequence exceeds about ten notes, counting becomes impossible and the subjects have to make an estimate based on pitch and time, so the accuracy decreases. This variation suggests a more accurate approach: that of using a combination of counting and pitch within a rising pitch metaphor. As the notes rise up the scale, they are grouped into sets of ten notes, with pauses between each group. The last set is usually incomplete (and is always smaller than or equal to ten). Thus 36 would be heard as:

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnn

The listener need only count the first three groups and concentrate on working out the actual number in the final incomplete group. In this way, counting is used for the first three groups and pitch for the last. In theory, one could simply use notes of the same pitch for the complete groups and only use rising pitch for the final incomplete group. However, there is another advantage in using note sequences with rising pitch as opposed to a single pitch: the variable pitch metaphor might also be used to convey other information, including shapes such as a circle or rectangle, within the same audio space.
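As an illustration of this grouped counting scheme, the following sketch chunks a rising sequence into groups of ten, reproducing the '36' example above; the function name and the textual rendering of notes as 'n' are assumptions.

    def grouped_sequence(coord, group=10):
        """Split the notes for 1..coord into groups of `group` (last one partial)."""
        notes = list(range(1, coord + 1))
        return [notes[i:i + group] for i in range(0, len(notes), group)]

    for chunk in grouped_sequence(36):          # three full groups, then six notes
        print(" ".join("n" for _ in chunk))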

Figure 4.15: An Overall Presentation showing the perceived mean and standard deviation of horizontal and vertical locations in the sequence of notes, reference-actual and actual-notes experiments.

4.4.2 Navigation of the Cursor

The navigation of the cursor using the arrow keys within the graphical area was investigated with 12 blind subjects. A raised paper indicating the graphical area, divided into 64 clearly marked small boxes, was presented to subjects. Subjects were then requested to move the cursor into various boxes. The position of the cursor was communicated with a pair of notes, and subjects used the star local scanning (see section 4.3.9) for an appreciation of the distance between the cursor and the edges of the graphical area. The first note communicated the horizontal location using a piano and the second the vertical location using an organ. Whenever the cursor encountered any of the edges of the graphical area, the cursor was relocated to a random location. This aimed to prevent subjects from counting steps from the edges of the graphical area (a sketch of this rule is given below). All subjects had some experience because they had previously participated in the location experiments. The boxes tested (see figure 4.17) were 1, 3, 6, 8, 10, 12, 13, 15, 17, 19, 22, 24, 26, 28, 29, 31, 34, 36, 37, 39, 41, 43, 46, 48, 50, 52, 53, 55, 57, 59, 62, and 64. Subjects were requested to move the cursor to each of the above boxes in a random order.

The measurements for both horizontal and vertical locations totalled 768, of which 384 were for horizontal locations and another 384 for vertical locations of the cursor. The number of times the cursor was in an incorrect box horizontally was 61, vertically 46, and for both in the same pair of measurements 17 (124 in total), out of the total number of 768 measurements. In other words, 260 measurements were in the correct box both vertically and horizontally, 323 in the correct box vertically, and 338 in the correct box horizontally.

Figure 4.16: The frequency distribution of correct and incorrect measurements in cursor navigation within the graphical drawing area.

It can be seen in figure 4.17 that the outer boxes (e.g., 1, 8, 57, 64) demonstrated less error in cursor location than the inner boxes (e.g., 19, 28, 37, 39). This shows that the centre of the graphical area is more problematic compared with the outer parts. The star scan used to offer additional navigational cues used sequences of notes to communicate the distance between the cursor and an edge of the graphical area.
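A minimal sketch of the movement rule used here: each arrow key moves the cursor one unit, and any collision with an edge of the graphical area relocates the cursor randomly, defeating step counting from the edges. The 40-unit area size and the key names are assumptions.

    import random

    SIZE = 40
    STEPS = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}

    def move(pos, key):
        """Apply one arrow-key step; relocate randomly when an edge is met."""
        x, y = pos[0] + STEPS[key][0], pos[1] + STEPS[key][1]
        if not (1 <= x <= SIZE and 1 <= y <= SIZE):     # edge encountered
            return (random.randint(1, SIZE), random.randint(1, SIZE))
        return (x, y)     # the new position is then sounded as a piano/organ pair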

Figure 4.17: Frequency of error horizontally, vertically and both in locating the cursor in various boxes. The shaded area shows the box number and the other three numbers (starting from bottom left) show the frequency (n=384) of the cursor being vertically out of the box co-ordinates, horizontally (top right number) and for both co-ordinates (diagonally to the shaded box number).

It was seen in the experiments with sequences of notes (see section 4.4.1) that long sequences demonstrate wider error than short sequences. Therefore, when the cursor was located near the boundaries of the graphical area, the short sequence of notes (star local scanning) communicating the current location of the cursor was perceived better than when the cursor was in the inner parts of the graphical area. The small frequencies of error also add further evidence that the typical error in perceiving sequences of notes was not more than ±5, as was also seen in section 4.4.1.

The results, shown in percentages in figure 4.18, suggest that movement of the cursor to the desired position (within the specified box) had a success rate of 62% for both co-ordinates. For 83% of the time, the position of the cursor was within the target box in horizontal co-ordinates only (this figure includes the correct horizontal co-ordinates from the 62%). The proportion of times the cursor was within the box in vertical co-ordinates (again including the correct vertical co-ordinates from the 62%) was 79%.

The contributing variables were the communication of co-ordinate locations using the actual-notes method and the star local scanning communicating the distance between the cursor and the boundaries of the graphical area.

Figure 4.18: Accuracy (in percentages) of cursor positioning in boxes.

The results obtained for the cursor cannot be fully attributed to the communication of co-ordinate locations using the actual-notes method. It was seen in section 4.4.1 that the results obtained using this method alone were not as good. Therefore, the star scan contributed at least equally (if not more) to the subjects' perception of the current co-ordinate location of the cursor. The results are also partly attributable to learning, because the subjects had previously been exposed to the location experiments and had thus developed some skill in interpreting the scales used. However, total exposure to these musical stimuli did not exceed two to three hours. It is also important to note that some counting may have taken place, consciously or subconsciously, in participating subjects, although effort was directed at preventing it. Even so, there is clear evidence that music, as used in this experiment, has potential for providing navigational cues in a two-dimensional space.

4.4.3 Communication of Graphical Shapes

In evaluating the use of music in user identification and recognition of graphical drawing objects, there are two major questions which need to be answered, based on the musical mapping of the graphical area:

1. Can a tonal sequence representing the horizontal and vertical co-ordinate outlines of a graphical object (e.g., a circle) in the drawing area be recognisable to listeners without significant training?

2. How accurately can listeners perceive graphical objects after a short period of training?

In order to investigate these two questions, two experiments were conducted. In the first experiment, blind subjects were asked to listen to sequences of notes produced by different instruments communicating horizontal and vertical graphical dimensions. However, subjects were not requested to associate instruments with graphical shapes. The graphical mapping used was expected to allow the user to interpret the shape rather than to associate it with instruments. For example, we wanted to investigate whether subjects could listen to the outline of (say) a rectangle and, by understanding the dimensions of each side, conclude that it was a rectangle. The different instruments used, however, were expected to help by communicating a change of direction (e.g., at the corners of the square). The music heard followed the principles of the musical mapping of the graphical drawing area, using consecutive pairs of notes in which the first note communicated horizontal locations and the second note vertical locations (a sketch of this clockwise outline presentation is given below). Graphical objects were presented in a clockwise order, as discussed earlier. None of the subjects had previously heard the music communicating graphical shapes, but they all had experience with the musical mapping of the graphical area.

In the first experiment (no training), 18 subjects were requested to listen to each graphical shape four times before they attempted to identify the graphical object. The graphical objects tested were presented in a random order for each subject (one subject participating at a time). The graphical shapes presented had approximately similar dimensions: ten units of the graphical area for the radius of the circle, ten units for each side of the square, ten by five units for the sides of the rectangle, and ten units for the vertical and horizontal lines.

In the second experiment, the same subjects were offered a short training (the sound of each object was played five times). Subjects were then presented with random graphical objects of variable size and from different locations of the graphical drawing area (so, although the presentation principles were the same, the actual notes were different because the objects were at different locations in the graphical area). Subjects heard each object three times before they gave an answer. The results of these two experiments are shown in figure 4.19.

Finally, in the third experiment, 5 subjects from the group which participated in the experiments above were presented again with the graphical objects, but this time using the piano for the horizontal co-ordinates and the organ for the vertical co-ordinates. Again, objects were presented in a random order and they had similar dimensions.
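The sketch below illustrates the clockwise pair-of-notes outline presentation described above, for a rectangle. The one-unit sampling step, the corner handling and the co-ordinate conventions are assumptions made purely for illustration.

    def rectangle_outline(x, y, w, h):
        """Clockwise trace of a rectangle's outline, one unit per point."""
        top    = [(i, y) for i in range(x, x + w + 1)]
        right  = [(x + w, j) for j in range(y + 1, y + h + 1)]
        bottom = [(i, y + h) for i in range(x + w - 1, x - 1, -1)]
        left   = [(x, j) for j in range(y + h - 1, y, -1)]
        return top + right + bottom + left

    # Each outline point is sounded as a pair of notes: the first for the
    # horizontal co-ordinate, the second for the vertical co-ordinate.
    pairs = [(("piano", px), ("organ", py))
             for px, py in rectangle_outline(5, 5, 10, 5)]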

177 ." 100 ~ 90 8: 80 l! 70.~ 60.~ 50.. ~ 0 40 ~ ~ 30 i! 20 Circle Rectangle Square Horizontal Vertical Graphical Objects Line line : No training. rust time heard by participants : After short period of training Figure 4.19: A Graph Presentation of the participants perception of Graphical objects. Their recall rate reached 100% for the circle, vertical and horizontal line and 80% for the rectangle and square. This was because subjects were more experienced, but, most importantly, they were familiar with the piano communicating horizontal co-ordinates and organ communicating vertical co-ordinates from their participation in previous experiments. Thus, although the idea of using different timbre to signal simply a change in shape or direction appeared to be good initially, the use of the original instruments (piano and organ) helped them to realise easily when a horizontal or vertical shape was communicated. Subjects reported that they had a difficulty in working out which notes came first (produced from one timbre) and which notes came second (produced by another timbre) in a series of pairs of notes. They were indeed more familiar with the general principle that piano always communicates horizontal coordinates and organ always communicates vertical co-ordinates. In fact, the use of different timbres (apart from piano and organ which are the two major timbres used) for communication of shapes representationally violated the consistency of the musical mapping in the graphical area. However, the problem involved here is that when only piano and organ are exclusively used in a continuous (with pauses) message communicating co-ordinates, shape and size of a graphical object then it will 163

be difficult for the listener to quickly pick out the different information communicated at the various parts of the musical message.

It can be seen that the musical representation of the shape of a graphical object, using the corresponding notes of the object's outline, was understood reasonably well by blind users. Some graphical objects were understood immediately by some blind users the first time they were heard, without any prior training. Once a short training session was given (i.e., users heard the musical sound of a shape five times), the identification of graphical objects was much easier and more confident, even on the first presentation. It must be noted that subjects identified the shape of the graphical objects (e.g., circle, square) by following the pairs of notes (one for the horizontal location and one for the vertical location). In this way, listeners gradually created a 'mental image' of the shape communicated. For example, when the second note of the pair (communicating a point) was unchanged while the first note was rising, in a series of pairs heard, subjects realised that a horizontal movement was being communicated. Similarly, when the first note of the pair was unchanged in pitch and the second note was falling, a vertical (downward) movement was being communicated. This was particularly important in assisting listeners to decide shape, usually for rectangles, squares, and horizontal and vertical lines. The circle produced a distinct rhythm which appeared to be a stronger cue in subjects' perception (subjects were observed to interpret it as a circle before the communication of the circle shape was complete). On the other hand, subjects were observed to wait to hear a whole shape and then, by comparing the sides of (say) a rectangle, conclude that it was a rectangle.

4.4.4 The Graphical Size Experiments

In this section the perception of the dimensions of graphical objects is examined. An experiment with 5 blind subjects was performed. Subjects were first given an explanation of the principles of presentation (see section 4.3.7) of each graphical object (e.g., circle, square). Graphical objects of different dimensions were communicated to the subjects: six rectangles, seven squares, four circles, seven horizontal lines and seven vertical lines. The presentation order of the graphical objects was random for each subject. Subjects were presented with an object and were requested (after listening to it five times) to estimate its dimensions abstractly. They had to estimate the length of the radius for a circle, the length of a horizontal or vertical line, the length of one side of a square, and the lengths of the two sides of a rectangle.

This involved subjects first recognising the object (e.g., a circle) and then estimating its dimensions (the radius for a circle). The describing instrument was the piano for horizontal locations and the organ for vertical locations, because, in the light of the experimentation in section 4.4.3, subjects demonstrated better accuracy when the original instruments of the graphical drawing area were used.

Figure 4.20: A graphical representation of the standard deviation produced from subjects' interpretation of the dimensions of the objects.

The mean and standard deviation of subjects' interpretation of the circle's radius and of a side of a square are graphically shown in figure 4.20. The rest, with individual subjects' perception, can be seen in table 4.4.

Table 4.4: The means and standard deviations (sigma, n-1) of subjects' perception of graphical object dimensions. Horizontal length is shown first and vertical length second for the rectangle; the length for the square, horizontal line and vertical line; and the radius for the circle.

It can be seen that the value of the standard deviations does not exceed 6

significantly. A typical value of the standard deviation is 3. Graphical objects of large size produced the highest values of standard deviation and, as can be seen, objects smaller in size demonstrate a smaller deviation value. Figure 4.20 plots graphically the dimensions of squares, rectangles (one side) and circles against the perceived values. The standard deviations for those objects are also shown. The results produced positive significant correlations for the size of the rectangles (side 1: Pearson's r=0.9878, side 2: r=0.9788, df=4, one-tail test at 0.05), squares (r=0.9971, df=5), circles (r=0.9960, df=2), horizontal lines (r=0.9983, df=5), and vertical lines (r=0.9983, df=5). It can be seen that for the circle and one side of the rectangle the statistical values (r) only just exceed the critical values for statistical significance. Subjects were able to ascertain the dimensions of objects to a reasonable accuracy (typically to within 10%). The results were very similar for almost all objects examined. This shows the potential of music to communicate abstractly the dimensions of a graphical object.

4.4.5 The Cursor's Local Scanning Experiments

The star local scanning had an informal 'trial and error' evaluation before it was finalised in its present state. The primary objective of this local space scanning technique is not to communicate the entire diagram using music, but to inform the user about the available space within the graphical area and the objects located close to the current location of the cursor. It was also designed to help the user to control cursor navigation and to estimate the distance which needed to be covered by the cursor to reach a particular object (see section 4.3.9).

Identifying Position of the Cursor

In order to evaluate this technique, a set of objects on raised paper was presented to 14 blind users. Four positions of the cursor were predetermined, as shown in figure 4.21. The subjects did not know the positions of the cursor shown in the figure, but they were allowed to study the arrangement of the objects so that they could determine where the cursor was located when the local scan was heard.
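Before turning to the results, the principle behind the star scan can be sketched as follows: beams are cast from the cursor in a clockwise order of directions, and each beam is rendered as a rising sequence whose length reflects the free distance before an object or an edge of the area is met. The use of eight fixed directions, the one-unit step and the returned values are assumptions about the actual AudioGraph scan, which used selected 'clock hours'.

    SIZE = 40
    DIRECTIONS = [(0, -1), (1, -1), (1, 0), (1, 1),     # 12, 1-2, 3, 4-5 o'clock
                  (0, 1), (-1, 1), (-1, 0), (-1, -1)]   # 6, 7-8, 9, 10-11 o'clock

    def star_scan(cursor, objects):
        """Yield, per direction, the free steps and what stopped the beam."""
        for dx, dy in DIRECTIONS:
            x, y, steps = cursor[0], cursor[1], 0
            while True:
                x, y = x + dx, y + dy
                if not (1 <= x <= SIZE and 1 <= y <= SIZE):
                    yield steps, "edge"       # rendered as `steps` rising notes
                    break
                if (x, y) in objects:
                    yield steps, "object"     # a distinct tone marks the object
                    break
                steps += 1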

Subjects were requested to identify the box number within which they thought the cursor was located. The positions of the cursor from which the star scan was initiated were presented to each subject in a random order. The local scan was played up to five times for each cursor position.

Figure 4.21: The raised paper presented to subjects, showing the graphical objects and lines, the separation of the graphical area into numbered boxes, and the cursor positions (unseen by subjects). The positions 1, 2, 3, and 4 of the cursor were not included in the raised paper presented; they are shown in the figure in order to give the reader an understanding of the positions of the cursor from which the star local scan was heard.

The results of this experiment are shown below:

Cursor position 1 (actual box number was 19): 11 subjects (or 78.5%) perceived box number 19, 3 subjects (or 21.4%) perceived box number 20.

Cursor position 2 (actual box number was 44): 10 subjects (or 71.4%) perceived box number 44, 2 subjects (or 14.2%) perceived box number 36, 1 subject (or 7.1%) perceived box number 37, and 1 subject (or 7.1%) perceived box number 43.

Cursor position 3 (actual box number was 55): 7 subjects (or 50%) perceived box number 55, 3 subjects (or 21.4%) perceived box number 47, 2 subjects (or 14.2%) perceived box number 56, and 1 subject (or 7.1%) perceived box number 54.

Cursor position 4 (actual box number was 14): 6 subjects (or 42.8%) perceived box number 14, 5 subjects (or 35.7%) perceived box number 22, 1 subject (or

7.1%) perceived box number 30, and 1 subject (or 7.1%) perceived box number 15.

The results show that the star local scan communicated information which enabled most of the subjects to identify either the correct box within which the cursor was located or a nearby one. A number of observations were made during experimentation. First, intensive concentration was required by subjects, who found it hard to follow the selected clock hours. In the light of knowledge gained in user interviewing, four beams would have been simpler and would thus have required less concentration. Second, when subjects lost track of the scan angles, the information communicated was meaningless. Again, it can be seen here that the metaphor used was sequences of notes offering abstract information about the available space and the objects encountered around the cursor.

Identifying Objects Around the Cursor

The objective of this experiment was to identify whether subjects could perceive the relative distance of objects from the current location of the cursor and also from the boundaries of the graphical area. Six subjects were used for this experiment. Subjects had some experience because they had been exposed to these scans in the experiments of the previous section. Figure 4.22 shows the locations of the graphical objects, which were not shown to the subjects, and the locations of the cursor, which were presented on raised paper to the subjects so that they could identify the box numbers around the cursor in which they thought graphical objects were accommodated. This experiment aimed to identify whether subjects could perceive gross positions within the graphical drawing area, not the difference between (say) boxes 2 and 3.

The results of this experiment using the star local scan are shown in tables 4.5 and 4.6. As can be seen in table 4.5, all subjects perceived that the rectangle 'R1' was near box 3, 66% perceived the square 'S2' to be around box 8 and 33% around box 7, 50% perceived the rectangle 'R2' to be around box 7 and the remaining 50% perceived it in box 11, and 83% perceived square 'S1' to be around box 5. As can be seen in table 4.6, 66% perceived rectangle 'R2' at around box 7 and 33% at around box 11, all subjects perceived rectangle 'R1' to be at around box 3, 83% perceived circle 'C2' at around box 12, 66% perceived circle 'C1' at around box 15, 83% perceived rectangle 'R3' to be at around box 9, and 66% perceived square 'S1' to be at around box

5.

Figure 4.22: The graphical objects requested to be identified from positions 1 and 2 of the cursor. The raised paper presented to subjects showed the separation of the graphical area into numbered boxes and the cursor positions; the graphical objects and scan paths were unseen by subjects.

Table 4.5: The box numbers which subjects thought contained the graphical objects encountered by the star local scan from cursor position '1' (see figure 4.22).

Table 4.6: The box numbers which subjects thought contained the graphical objects encountered by the star local scan from cursor position '2' (see figure 4.22).

The star scan, which relies on a clockwise sequence of 'clock hours' to denote direction and a sequence of notes to project distance, assisted users in understanding the position of objects in relation to the cursor, but not the distance between the objects themselves. The results indicate that the principles used in this local scan help users to understand the position of the cursor in relation to graphical objects and vice versa. However, it required concentration and quite a few repetitions (a maximum of five in the experiment) to be heard. Again, if the subjects did not properly understand the order of presentation, then the information was not meaningful.

4.4.6 Experiments with Editing

The AudioGraph offers a number of earcons to communicate editing features such as undo, expand or contract. These earcons are heard from the user control panel when the cursor is within the option's box. In the first experiment, 10 blind subjects were presented with one musical representation of an editing operation at a time and were asked, without having any specific prior knowledge of the sound of the earcon, to interpret the musical message. The earcons used were contract, expand, and undo. The duration of each note in the earcons is 0.15 seconds and the total time was 3 seconds. They were constructed using the major scale (starting from middle C)[4] in the following way:

Expand:
N1 (0.5 second delay)
N1 N2 (0.5 second delay)
N1 N2 N3 (0.5 second delay)
N1 N2 N3 N4

Contract:
N1 N2 N3 N4 (0.5 second delay)
N1 N2 N3 (0.5 second delay)
N1 N2 (0.5 second delay)
N1

Undo:
N1 N9 N3 N4 (0.5 second delay)
N1 N2 N3 N4

[4] It must be noted that it is primarily the rhythm that communicates information in these earcons.
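The structure of these three earcons can be rendered directly from the specification above. In the sketch below, the note length (0.15 s), the inter-note pause (0.3 s), the inter-group delay (0.5 s) and the scale degrees follow the text; the MIDI numbering and the event-list output format are assumptions.

    MAJOR = [0, 2, 4, 5, 7, 9, 11, 12, 14]   # semitone offsets for degrees N1..N9
    MIDDLE_C = 60

    def degree(n):
        return MIDDLE_C + MAJOR[n - 1]

    EARCONS = {
        "expand":   [[1], [1, 2], [1, 2, 3], [1, 2, 3, 4]],
        "contract": [[1, 2, 3, 4], [1, 2, 3], [1, 2], [1]],
        "undo":     [[1, 9, 3, 4], [1, 2, 3, 4]],   # N9 breaks continuity; then corrected
    }

    def render(name, note_len=0.15, pause=0.3, group_delay=0.5):
        """Return (midi_note, start_time) events for one earcon."""
        events, t = [], 0.0
        for group in EARCONS[name]:
            for n in group:
                events.append((degree(n), round(t, 2)))
                t += note_len + pause
            t += group_delay
        return events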

A short pause of 0.3 seconds was left between the notes. In expand, the rhythm expands from one note, to two notes, to three notes, and finally to four notes. Contract is the opposite: the rhythm becomes smaller, starting with four notes and decreasing to three notes, to two notes, and finally to one note. In undo, a first sequence of four notes in ascending pitch is played, but the second note is out of pitch continuity compared with the rest of the notes. The second sequence corrects the second note into a note whose pitch does not interrupt the good-continuity principle of ascending pitch. More specifically, the editing operations were evaluated in the following manner:

1. No training was offered to the subjects before the experiment. They were only told that they were going to listen to three musical editing messages, each representing an operation to change the state of a graphical object, but without presenting any target object. Subjects were told that the operations could be, for example, expand, contract, or undo.

2. The participants heard for the first time the melodies representing expand, contract and undo (five times for each one) in the following manner:

(a) Each subject listened in a random order to the earcons expand, contract, and undo.

(b) Once they had heard all three earcons, subjects were requested to assign meaning to each of them, in terms of identifying which one communicated expand, contract, or undo.

3. The subjects also had to explain, in each of their answers, the reason(s) why they had selected one editing operation as opposed to another.

The results are shown in figure 4.23. As can be seen, the earcons expand and contract were interpreted reasonably well although it was the first time they had been heard. The undo, though, did not have the same success as the other two. Successful interpretation was 80% for expand, 50% for contract and 10% for undo.

It was expected that short training would produce better results in terms of accurate perception. For this reason, a further experiment was conducted with the intention of examining whether this was true. The same subjects were trained by announcing the editing operation and playing the corresponding musical message immediately afterwards five times. This was repeated for the other two operations. Then the ten subjects were played random examples of the earcons once and they

were asked to identify the particular editing operation from among the three. The results, as shown in figure 4.23, reached 100% for expand, contract, and undo.

Figure 4.23: A Graph Presentation of the subjects' perception of editing operations (no training, first time heard by participants; after a short period of training).

It can be seen that, without prior knowledge, the expand operation was understood intuitively because it has a metaphorical nature. This should be contrasted with the undo operation, which was not understood by 90% of the participants. It appears that expanding or reducing the number of notes, to communicate the operations expand and contract, was a closer mapping than the one used for undo, where an erroneous sequence (erroneous because the second note in the sequence of four notes was not in consecutive pitch order with the rest of the notes) was sounded and a second sequence corrected the error. In the latter case, repetition represents correction rather than progress. Even a small amount of training made a dramatic difference to the recognition rate in all three cases: the recognition rate rose to 100% in every case. Here, what mattered was not the intuitive nature of the signals but rather the recognisably different musical structures. This implies that blind users ought to be able to understand the editing earcons used in the AudioGraph with minimum training.

Another experimental objective was to test editing tasks using real objects within the graphical drawing area, as well as using the cursor to select editing options from the

control panel and apply them to objects. Thus, the following tasks were tested with 10 blind subjects:

1. Create an object (e.g., a circle) and place it in a particular position in the graphical drawing area.

2. Move, expand and contract graphical objects in the graphical drawing area.

3. Save a set of objects in a file, communicated musically, using the save file option in the control panel.

4. Load a file and perform editing operations upon the objects.

These tasks, in the context of the AudioGraph, involved the user in relocating the auditory cursor from the graphical drawing area to the control panel (by pressing the space bar once) and vice versa. The general principles involved are:

When the outline of an object in the graphical area is under the cursor, a single note (reasonably loud so it is not missed) communicates it. At this stage, the user knows that an object has been encountered.

The user can hear the shape of the graphical object by pressing an F key (F2).

The object can be left as the cursor continues its movement, or it can be selected by pressing any key (apart from the F keys, space bar, and arrow keys) twice.

If an object is selected, the user can relocate the cursor to the control panel, where options such as expand, contract or move can be selected.

As objects are manipulated (moved, expanded or contracted), another note (reasonably loud so it is not missed) communicates to the user the number of arrow keystrokes pressed.

On completion, an object can be released by again pressing any key (apart from the F keys, space bar, and arrow keys) twice.

The cursor always returns to the position it had before the graphical drawing area was left when relocation between the graphical area and the control panel takes place.

Under these conditions, 10 subjects were asked to create, expand, contract and move objects as shown in figure 4.24. Subjects were first requested to create the graphical

objects shown in figure 4.24. These objects were presented to the subjects on raised paper.

Figure 4.24: A graphical illustration of the create, move, expand and contract tasks requested of the subjects in the experiment.

Subjects were requested to expand circle 'C' by five steps (each step was equivalent to a single press of an arrow key), square 'E' by two steps, rectangle 'A' from its right side by five steps (the right arrow key had to be pressed), and lines 'B' (from the bottom end) and 'D' (from the right end) by three steps. On completion of the above, subjects were requested to move objects around the graphical area (see figure 4.24). Rectangle 'A' was to be moved to the location of square 'E' and square 'E' to the location of rectangle 'A'; likewise for the lines, as shown in the figure. The subjects were then requested to save the graphical objects using the file save option of the AudioGraph and then load the content back in. The next step was to request subjects to contract and move all objects back to their original dimensions and positions respectively. Subjects were observed performing this task-oriented experiment by keeping a log of the success or failure of, and the attempts made for, each action.

All subjects performed the task with reasonable accuracy. Due to counting, graphical objects were typically contracted and expanded correctly. All subjects were observed to use the boundaries of the graphical area as a reference point. Two subjects did not expand or contract correctly because they lost track of counting and did not have any reference point for the number of steps they had already expanded or contracted. This strongly suggests that an undo operation, not

only for the last action but for the last few actions (user controllable), is necessary. The nature of this operation should be to return an object to one of its previous states, or to the size it had before it was manipulated. The actual task itself was not too complex; it shows, though, that simple interaction in the AudioGraph was not difficult, but it did require concentration. General observations made were that subjects:

1. Used the distinctive tune communicating that an edge had been encountered for reference purposes.

2. Used the earcons of contract and undo as reference points when they wanted to select expand, and vice versa, in the control panel.

3. Counted to keep track of the steps for the expansion, contraction or movement of an object.

4. Forgot to select objects prior to the selection of an editing operation.

5. Listened to the graphical shape outline to find out the abstract location of the object within the graphical area when they had lost track of counting and position in the graphical area.

Another important general observation was that editing would have been easier when a collection of objects had to be expanded, contracted or moved. The user should be able to pre-select an operation (e.g., expand) and then expand objects in the graphical area as they are encountered by the cursor, rather than selecting a particular object first and then editing it individually. However, the automatic transfer of the cursor to the object in the graphical area, after an editing operation had been selected from the control panel, was observed to increase subjects' confidence that they were editing the right object. Finally, it must be noted that subjects relied on counting quite often (which is fine for blind user interaction if they are comfortable with counting); but for the purposes of our investigation into musical information processing, experiments measuring the perception of a set of objects (forming diagrams) communicated using music were necessary.

4.4.7 Experiments in Perception of Diagrams

In earlier sections (4.4.3 and 4.4.4), messages derived from the musical mapping of the graphical area to communicate co-ordinates, graphical objects and their

dimensions have been examined. The objective of the experiments in this section was to examine the capability of listeners to perceive and interpret a set of graphical objects forming a diagram. Therefore, experiments were performed in which:

Arbitrarily arranged graphical objects were communicated using the top left to bottom right, centre and ascending scanning orders (a sketch of these orders is given below, after the procedure).

Meaningfully arranged graphical objects (at least for the visual sense) were communicated in the absence and in the presence of a perceptual context (created by offering a 'hint' to subjects) or expectation.

Meaningfully arranged graphical objects were communicated to subjects who had different perceptual contexts, by offering different 'hints' to each of them.

Meaningfully arranged graphical objects (forming diagrams) were communicated to subjects who were requested to categorise them according to their interpretation of the meaning of the communicated diagrams.

All the above experiments aim to identify contributing factors in the interpretation of diagrams apart from the musical stimuli themselves. In particular, the effects of context and expectation are examined.

Arbitrarily Arranged Objects

An experiment was performed to investigate the perception and interpretation of subjects who were presented with a set of arbitrarily arranged objects. The two sets of graphical objects which were communicated are shown in the top left corners of figures 4.25 and 4.26. The top-down, centre and ascending scanning presentation orders were used. Six blind subjects participated in total, of which three were presented with one set of objects (one scanning order each) and the other three with the other set of objects (again one scanning order each). Subjects were requested to:

1. Listen to the musically scanned diagram 3 times. The ability to pause and resume was offered.

2. Draw the perceived graphical information on the raised paper provided.
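The scanning orders themselves are defined in section 4.3. Purely for illustration, the sketch below shows one plausible reading of the three orders; the interpretations used (raster order for top left to bottom right, distance from the area centre for the centre scan, and ascending object size for the ascending scan) are assumptions, as are the object representation and the area size.

    import math

    SIZE = 40   # assumed side of the graphical area

    def top_left_bottom_right(objs):
        return sorted(objs, key=lambda o: (o["y"], o["x"]))          # raster order

    def centre_scan(objs, cx=SIZE / 2, cy=SIZE / 2):
        return sorted(objs, key=lambda o: math.hypot(o["x"] - cx, o["y"] - cy))

    def ascending_scan(objs):
        return sorted(objs, key=lambda o: o["size"])                 # smallest first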

In asking a blind user to reproduce a diagram, one must consider the probability that errors might be introduced during the drawing exercise itself. A raised paper grid was provided with clear horizontal and vertical lines which identified the various positions within the grid. The raised paper was divided into 64 small boxes. It should again be mentioned here that the objective of these experiments is to test relative location reproduction and not absolute location; absolute location of co-ordinates can easily be achieved using speech. The results of perception with the three scanning methods are shown in figures 4.25 and 4.26. It is important to note that there are serious problems with the evaluation of this experiment. The problems stem from the difficulty subjects faced in drawing objects over a raised paper grid: the actual drawn lines include this error as well as the errors of perception.

Figure 4.25: The drawings subjects produced from the various scannings (original, top-down, centre, and ascending).

On the other hand, one must note that the results of the grouped sequences of notes used to communicate particular locations within the graphical area (see section 4.4.1) and the results of perceived dimensions of shapes (see section 4.4.4) indicated that the subjects' interpretation of the music was better than what we see here. The obvious expectation was to observe similar results in the experiments with scannings. Even when some error, possibly introduced by subjects during drawing, is allowed for, these results are still not as good as might have been expected in the light of the previous experiments. For example, consider the two circles in the ascending

scan of figure 4.26. The ascending scan, along with the music describing the object, clearly communicated that one circle is bigger than the other, but both circles have either been perceived or drawn incorrectly.

Figure 4.26: The drawings subjects produced from the various scannings (original, top-down, centre, and ascending).

Why is there poorer perception? The experiments in location (section 4.4.1) and graphical size (section 4.4.4) measured the perception of one location or one graphical object at a time. The experiments with scannings (as reported above) measure the perception of a set of objects arranged in a non-meaningful manner. There are clearly greater memory demands in listening to and reproducing (in drawing) a set of arbitrarily arranged objects than one object in isolation. The fact that the arrangement of the objects did not represent anything meaningful considerably overloaded the memory of the subjects. The above results imply that when no drawing activity is involved, and a meaningful arrangement of objects (e.g., the shape of the letter 'E') is being communicated, then the memory load ought to be reduced and the results improved. This is examined in the next experiment.

Meaningfully Arranged Objects

The objective of these experiments was to investigate subjects' perception when a meaningful (at least visually) arrangement of objects was communicated using

music. The experiment had two phases:

1. No initial knowledge of the concepts[5] was communicated.

2. An expectation about the concept of the diagram was offered in the form of a 'hint' (e.g., type of vehicle or data presentation method).

The first case is rare for blind people but is possible in certain circumstances. For example, it could be a university campus map, or a town map showing construction work on roads which needs to be avoided by blind people. The second case is common: almost all diagrams or figures have a short note or description about what is presented (e.g., a top-down representation of a computerised theatre booking system or a graph presentation of research findings). The hint offered in the experiment here substitutes for the description of the figure in real life.

In order to investigate whether an expectation within a perceptual context would assist listeners in assigning meaning to a diagram communicated using music, an experiment with two groups (control and experimental) was performed. The control group of six blind subjects was presented with the diagrams shown in figure 4.27. No semantic guidance or hint was offered, so that subjects did not have an expectation or perceptual context. All subjects were between 21 and 39 years of age. All had visual experience and they were speech users. They had no experience with braille.

Figure 4.27: Diagrams presented in the control and experimental groups (Diagram 1, Diagram 2, Diagram 3, Diagram 4).

The order of presentation of the diagrams was (see figure 4.27):

Subject 1: Diagram 1, Diagram 2, Diagram 3, Diagram 4.
Subject 2: Diagram 4, Diagram 3, Diagram 2, Diagram 1.

[5] The term concept means a general idea of the content of a diagram, for example 'a data representation', 'type of vehicle', or 'letter of the alphabet'.

Subject 3: Diagram 2, Diagram 4, Diagram 1, Diagram 3.
Subject 4: Diagram 3, Diagram 1, Diagram 4, Diagram 2.
Subject 5: Diagram 3, Diagram 2, Diagram 1, Diagram 4.
Subject 6: Diagram 4, Diagram 1, Diagram 3, Diagram 2.

The experimental group, also with six blind subjects, was presented with the diagrams shown in figure 4.27 in the same presentation order as described above. However, this group was offered semantic guidance to create an expectation or perceptual context in the subjects. The descriptions offered prior to the communication of each diagram (see figure 4.27) were:

1. 'Type of vehicle' for diagram 1.
2. 'Method of data representation' for diagram 2.
3. 'Number' for diagram 3.
4. 'Letter of the alphabet' for diagram 4.

The centre scan was used for both groups. The results are shown in table 4.7.

              CAR        GRAPH     NUMBER '3'   LETTER 'E'
No Context    0 (0%)     0 (0%)    2 (33%)      3 (50%)
Context       6 (100%)   4 (66%)   6 (100%)     6 (100%)

Table 4.7: The results of the two groups.

In the control group (no semantic guidance), 3 subjects (50%) interpreted diagram 4 as a shape which resembles the letter 'E', 2 subjects (33%) interpreted diagram 3 as a shape which resembles the number '3', and the rest of the subjects failed to assign any meaning to any of the diagrams. However, all of them identified the individual graphical objects and their abstract dimensions, but did not assign any meaning to the whole diagram. On the other hand, the results of the experimental group (semantic guidance) were significantly better. All six subjects assigned the meaning 'car with two wheels' when diagram 1 was communicated, the shape '3' for diagram 3 and the letter 'E' for diagram 4, and 4 subjects interpreted diagram 2 as a 'graph representation' (see the diagrams in figure 4.27). One must note, however, that subjects did not say exactly 'car with two wheels'; they offered descriptions which matched that meaning. These results underline that in the absence of

a semantic guidance (an expectation within a particular perceptual context), subjects had major difficulty in assigning a meaning (e.g., the shape of a letter) to a diagram communicated using the musical mapping of the AudioGraph. Subjects interpreted the music of individual objects but could not (see table 4.7) interpret the meaning of all the objects together. However, the creation of a perceptual context with a rather narrow expectation (e.g., a number) assisted subjects in assigning a meaning to a set of objects by using their expectation to bridge perceptual gaps (e.g., distances between objects not properly interpreted).

Interpretation under Different Perceptual Contexts

Assuming that the listener's perceptual context has a direct influence upon the interpretation of the music used to communicate the diagrams, can one conclude that a change in the listener's perceptual context will cause a different interpretation of the music (associated with the same diagram), even if it is exactly the same music or diagram? In order to investigate this, a further experiment was performed. Five blind subjects (from those who were in the control group in the previous experiment) were presented with the diagrams shown in figure 4.28.

Figure 4.28: The diagrams presented (Diagram 1, Diagram 2, Diagram 3, Diagram 4).

The subjects had experience with the AudioGraph musical output but not with these particular drawings. The presentation order of the diagrams and the 'hints' offered to each of the subjects are given below.

Subject   Presentation Order   Semantic Guidance 'Hint'
S1        1, 2, 3, 4           No 'hint'
S2        4, 3, 2, 1           Upper Case Letter
S3        2, 4, 1, 3           Upper Case Letter (Right or Rotated Angle)
S4        3, 1, 4, 2           Number
S5        4, 1, 3, 2           Number (Right or Rotated Angle)

The results of this experiment are shown below (the entries are the interpretations recorded for Subjects 1 to 5, in order, where legible; '-' denotes no meaningful interpretation):

Diagram 1: -, E, E, -, 3
Diagram 2: E, 3, 3
Diagram 3: E, -, 3
Diagram 4: E, -, 3

The results indicate that graphical information communicated using the musical mapping of the graphical area is interpreted by subjects as a random combination of objects in the absence of a perceptual context or expectation. However, in the presence of an expectation, the graphical information communicated is interpreted as a meaningful shape. The results also show that the interpretation is influenced by the subject's expectation. This means that perceptual context or expectation has a direct and contributing role in the interpretation of the music used to communicate the graphical objects. Thus, the design of a musical mapping is one variable which contributes to the listener's perception and interpretation; but, simultaneously, the creation of an appropriate (in terms of the communicated meaning) perceptual context in the listener is another contributing variable. As can be seen, the absence or inappropriate creation of a perceptual context will result in a lack of meaningful interpretation by the listener, although individual musical mappings (e.g., a graphical object) are understood.

Categorising Diagrams According to Content

This experiment aimed to investigate the perception of changes occurring in a diagram which subjects had previously studied (for 5 minutes) on raised paper. The study of the raised paper was intended to assist blind subjects in understanding the content of the diagram. This situation is expected to be the most common one encountered in real life. Almost everyone (and especially all of the blind subjects who participated in our evaluation, who acquired blindness at some point in their life), when drawing a diagram either on paper or on the computer, has at least some partial expectation of the look of the final drawing. When a blind user is engaged in drawing a diagram and listens to it (say a partly complete one, for instance), he then uses his expectation as a reference point for comparison with the actual drawn diagram communicated using music. In order to address this objective, an experiment with six subjects was performed.

A set of diagrams, shown in figure 4.29, was communicated to the subjects. None of the subjects declared themselves to be tactile users.

Figure 4.29: The initial and subsequent arrangement of graphical objects in the evaluation (three diagrams: a type of vehicle, a form of data presentation, and a letter or digit, each with Variations A to D).

The presentation order was the following (one reproducible way of generating such per-subject orders is sketched after the category list below):

Subject 1: 2D, 1B, 3B, 2A, 2C, 3C, 3D, 1C, 3A, 1D, 1A, 2B.
Subject 2: 1B, 3C, 1D, 3B, 2A, 3D, 2C, 1A, 1C, 3A, 2B, 2D.
Subject 3: 3B, 1D, 1A, 3C, 1B, 3A, 2B, 1C, 2D, 2C, 2A, 3D.
Subject 4: 1D, 2C, 3A, 1B, 3C, 2A, 3B, 2D, 1C, 1A, 3D, 2B.
Subject 5: 2A, 1D, 3C, 3A, 1A, 1C, 3D, 2C, 3B, 2D, 2B, 1B.
Subject 6: 2C, 1A, 1C, 3C, 2A, 1B, 2D, 1D, 3D, 2B, 3B, 3A.

Apart from the initial 'hint', no further semantic guidance was offered. Subjects were requested to classify each diagram heard (at the end of its communication) into one of four categories, according to their interpretation of the diagram's meaning. With reference to figure 4.29, the categories used were:

Category 1: Diagram 1. Perceptual Context: Type of Vehicle.
Category 2: Diagram 2. Perceptual Context: Method of Data Representation.
Category 3: Diagram 3. Perceptual Context: Letter or Number.
Category 4: Diagram cannot be categorised in 1, 2, or 3.
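The thesis does not state how these twelve-item orders were produced; the sketch below shows one reproducible way of generating a different random permutation of the twelve diagram labels for each subject (illustrative only, not the procedure actually used).

```python
# Illustrative only: generate a reproducible random presentation order of
# the twelve diagram labels (1A..3D) for each subject.
import random

DIAGRAMS = [f"{d}{v}" for d in "123" for v in "ABCD"]  # '1A', '1B', ..., '3D'

def presentation_order(seed):
    """Return a random permutation of the diagrams, fixed by `seed`."""
    rng = random.Random(seed)
    order = DIAGRAMS[:]  # copy so the master list is untouched
    rng.shuffle(order)
    return order

for subject in range(1, 7):
    print(f"Subject {subject}:", ", ".join(presentation_order(subject)))
```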

The results of the individual subjects are shown below:

Subject   CATEGORY 1        CATEGORY 2        CATEGORY 3      CATEGORY 4
S1        1C, 1A            2D, 2A, 2B        3B, 3C, 3A      1B, 2C, 3D, 1D
S2        1B, 1A            3D, 2B, 2D, 2A    3A              3C, 1D, 3B, 2C, 1C
S3        1D, 1A, 1B, 1C    2B, 2D, 2A        3B, 3A, 3D      3C, 2C
S4        1D, 1B, 1C, 1A    2A, 2D, 2B        3A, 3C, 3B      3D, 2C
S5        1D, 1A, 1C        2A, 2B, 2D        3C, 3B, 3A      3D, 2C, 1B
S6        1A                2A, 2B, 2D        3B, 3A          2C, 1C, 3C, 1B, 1D, 3D

The results can also be presented more clearly in the table below, which gives the number of subjects who placed each diagram in each category.

Diagram   CATEGORY 1   CATEGORY 2   CATEGORY 3   CATEGORY 4
1A        6            -            -            -
1B        3            -            -            3
1C        4            -            -            2
1D        3            -            -            3
2A        -            6            -            -
2B        -            6            -            -
2C        -            -            -            6
2D        -            6            -            -
3A        -            -            6            -
3B        -            -            5            1
3C        -            -            3            3
3D        -            1            1            4

It can be seen that diagrams such as 1B, 1C, 1D, 3C and 3D confused subjects' interpretation: around 50% of the subjects felt these should be placed in category 4. It can also be seen that diagram 2C was assigned to category 4 by all of the subjects. The results show that, for graphical objects resembling well-known entities, it is possible for users not only to understand the initial arrangement of graphical objects but also to follow subsequent changes and variations within it. To repeat, it is particularly important to remark that in real life the user will have at least some partial idea of the content of a third-party diagram. The possibility of a blind user listening to a diagram with no prior knowledge or verbal explanation is rather rare in the context of the situations in which this system is intended to be used; more often, one can argue, blind users will create diagrams themselves. The fact that subjects realised that the number '3' was altered to a letter 'E', for instance, shows that subjects' listening was active in terms of comparing their previous knowledge of the whole shape with the new one presented.
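The summary table above is a straightforward tally of the per-subject table. For clarity, the derivation can be expressed in a few lines of code (an illustrative recomputation, not part of the original analysis; only two subjects are entered here to keep the sketch short):

```python
# Recompute the summary table by tallying the per-subject classifications.
from collections import Counter

classifications = {
    "S1": {1: ["1C", "1A"], 2: ["2D", "2A", "2B"], 3: ["3B", "3C", "3A"],
           4: ["1B", "2C", "3D", "1D"]},
    "S2": {1: ["1B", "1A"], 2: ["3D", "2B", "2D", "2A"], 3: ["3A"],
           4: ["3C", "1D", "3B", "2C", "1C"]},
    # ... subjects S3 to S6 are entered the same way ...
}

counts = Counter()  # (diagram, category) -> number of subjects
for per_subject in classifications.values():
    for category, diagrams in per_subject.items():
        for diagram in diagrams:
            counts[(diagram, category)] += 1

for diagram in sorted({d for d, _ in counts}):
    row = [counts.get((diagram, c), 0) for c in (1, 2, 3, 4)]
    print(diagram, row)
```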

Subjects reported in interviews, as one may have expected, that 'remembering objects is not difficult because of the resemblance to a well-known shape'. For example, the number '3' constructed with rectangles, or a peculiar car with a circle and a small box as its main body and two circles for its wheels, had such resemblances. Subjects had an expectation of what was to be presented once they had established a knowledge of the whole arrangement of objects in question. When an expectation (say, for a rectangle) was not fulfilled, the newly presented object allowed them to create a new mental model and imagine what the new whole shape might be. It is important to note that when users listen to diagrams they have created themselves, it is certainly anticipated that an expectation of the desired arrangement of the objects will exist in their imagination. Thus, while listening, an active comparison will take place between the pursued arrangement of the objects and the one created and communicated with music.

Finally, the comment which needs to be made here is that memorability increased because subjects were remembering the shape (e.g., the letter 'E') and possible variations, and not because of the music itself. Music was simply interpreted in relation to their expectation. Although one object presented musically can be memorable, a set of arbitrarily arranged objects is not, unless the listener can associate the presented shapes with some well-known shape or entity (e.g., a car with three wheels). However, in order for the user to make an association, a hint which creates a user expectation is essential. The listener's expectation (from the hint provided) has been shown to play a key role in interpreting musical messages. The phenomenon seen here is a generic property of music: a musical message will be interpreted in one way under one particular expectation (created from one hint offered to the user), but the same musical message will be interpreted in another way under the influence of some other expectation (created from some other hint). This was also partially observed in the earlier experiments with sample data, where the same musical structures were used to communicate different types of data under different user expectations.

4.5 Feedback from Structured Interviews

First, one must state that all the people who participated in our experiments and in these structured interviews had responded to RNIB calls for voluntary participation. They were well aware of the use of music as a communication metaphor and, perhaps, those opposed to such an idea did not come forward.

One might argue that the samples used were not only opportunistic but also in favour of AudioGraph's objectives. The subjects were interested to try out such a program and, at the same time, to see whether music had something to offer them. They wanted to find out how difficult it was to use music, and whether they could use it for their academic work (90% were in higher education). All of them were speech users and, judging from the questionnaires, did not have any particular musical knowledge. They were very concerned about the limited access they had to computers with visual interfaces; some were studying computing. They were also concerned with finding new ways of communicating. They were willing to spend much (unpaid) time in tedious and tiring experimental procedures (let alone the experimental trials and the free sessions with task-oriented activity), and most of them did so with great patience. The author is grateful to all of them.

Twenty blind people were interviewed. Eighteen of them had participated in one or another experiment, and the rest had been offered demonstrations of the musical mappings.

1. How would you rate, overall, the following in terms of your understanding?⁶

(a) Overall musical representation of the AudioGraph.
(b) Graphical drawing area representation.
(c) Sequential pitch for distance identification.
(d) User control panel.
(e) Earcons for communication of expand, contract, and undo.
(f) Auditory cursor.
(g) Graphical objects' auditory-musical presentation.
(h) Scanning from cursor.
(i) Scannings of diagrams.

The answers to the above questions are shown below:

⁶ The scale used in this question was: Very Poor - Poor - Average - Good - Very Good.

Questions                 V. Poor   Poor      Average    Good       V. Good
(a) Overall               -         2 (10%)   6 (30%)    8 (40%)    4 (20%)
(b) Graphical area        -         1 (5%)    2 (10%)    14 (70%)   3 (15%)
(c) Sequential pitch      -         1 (5%)    12 (60%)   5 (25%)    2 (10%)
(d) Control panel         -         -         3 (15%)    15 (75%)   2 (10%)
(e) Expand, contract      -         -         -          2 (10%)    18 (90%)
(f) Auditory cursor       -         -         2 (10%)    17 (85%)   1 (5%)
(g) Graphical objects     1 (5%)    2 (10%)   5 (25%)    9 (45%)    3 (15%)
(h) Local scanning        -         -         11 (55%)   6 (30%)    3 (15%)
(i) Diagram scannings     1 (5%)    2 (10%)   6 (30%)    7 (35%)    4 (20%)

As can be seen in the above table, most of the replies (around 80%) fall in the average, good, and very good categories. However, there are also replies (around 15% to 20%) which fall within the poor and very poor categories. There were also open-ended questions, so that we could find out more about the subjects' general views of the musical mappings used and of music as a communication metaphor.

1. Did you generally have difficulty in understanding the auditory messages in the AudioGraph? (This question was expected to be answered with 'yes' or 'no'; comments, further questioning and discussion took place in investigating subjects' views.)

The general reply was 'no' (80%). Most of them remarked that some of the music was presented quickly and required intensive concentration but that, as they gained experience, less concentration was required. In addition, most of the subjects said that the musical stimuli were somewhat 'strange' initially, but not once they understood their communicated meaning.

If so, what do you believe were the reasons for the difficulty in understanding the auditory messages (e.g., speed, structure of message)?

Replies to this question pointed out that music can easily be misinterpreted and that it requires concentration which, in some cases, needs to be undivided, along with remembering information communicated via music earlier (e.g., diagrams).

If so, do you think that these difficulties reduced as you became more familiar with the tool and listened to the auditory messages more times?

Most of the people replied that the initial difficulty did not persist as they became familiar with the musical mappings. Also, all of them seemed to agree that it did not take too long to familiarise themselves with the musical mappings. However, they did not seem to believe that the concentration and memory demands diminished significantly.

2. Would you consider a touch-sensitive screen suitable for your needs in the following areas?

(a) General interaction with the computer?
(b) Graphical drawing application?
(c) Understanding the visual layout of the screen with multiple windows?

The use of such a device was welcomed by all the people interviewed, especially for drawing. Obviously, this was a hypothetical question and, as none of them had actually used such a device interactively before, all of them appeared to agree that it could help.

3. From your own personal point of view, and after all the experience you have had, do you believe that music can be used in receiving feedback from a graphical drawing tool?

All of them answered 'yes', including the ones that were sceptical about using music. They all appeared to welcome music, but with speech as well. No opinion was in favour of music on its own.

4. Do you believe that, with appropriate training, you could use a tool similar to the AudioGraph to read and create your own diagrams by yourself?

The general reply to this question was that, on top of speech feedback, they would also welcome some music. In this way, their experience in using such a tool could indicate to them which is more useful and under what circumstances. 20% replied that they would not wish to use music (in a similar way to which they had never learned Braille). This indicates that the preferences of users play an important role in introducing new communication metaphors.

5. In what other applications do you believe basic musical properties can be used?

Replies to this question included document structure and programming environments (from the ones taking computing courses). The general feeling was that most of their interaction with computers was via speech and that, perhaps, some feedback in addition to speech might provide alternative ways of data representation to satisfy particular needs and preferences.

6. Have you any more comments to make?

Ideas put forward included user customisation, or even allowing the user to decide the presentation order when scanning the graphical objects. Other comments included that, although counting was involved at one instance or another, playing a rectangle and getting a feeling of its dimensions and relative location was a completely different representation compared to listening to numbers (i.e., using speech). Some others remarked that precision drawing, or the understanding of fine detail, would certainly require the use of speech, but that for other, rather casual, work music could be used just as well. They all seemed to conclude that using speech to understand the arrangement of objects will inevitably leave them with a general idea of the location of an object (say, ten minutes later) rather than a memory of the actual co-ordinates, and that music does that just as well (memorability problems). Others (on a rather humorous note) said that they would do anything to take the cacophonous sound of synthesised voice away from their ears! This is a problem when good speech synthesisers are not used. Finally, when people were asked whether this research should continue to investigate music as a communication metaphor, all of them answered 'yes' (including the ones who appeared to the author to be somewhat 'sceptical'). Other comments of the interviewed people which were similar to experimental observations are discussed in a later section.

4.6 Critical Assessment from RNIB

Feedback for the AudioGraph experimental framework tool was also received from a mathematics tutor at RNIB with long experience of teaching blind students, who had an understanding of the problems involved in blind users' graphical information processing. The report stated the following:

204 "Based on the research so far, your program could have the potential of allowing visually impaired people the capability of drawing their own diagrams. The grid allows users to know where they are in relation to other points. For precise locations the user would require a command for the x and y co-ordinates and also a command to move to a particular co-ordinate. The control panel/grid (i.e., graphical drawing area) has now been improved so that the user can toggle between them. Choosing lines, circles, etc., are currently done from the menu, a direct command whilst within the grid would be helpful. Currently the object is selected and dragged into position. It would be useful if the object could be placed at the current grid position. A default size is currently drawn on screen and if an enlargement is made the control panel is required. It would be more useful if the size could be chosen along with the object, for example: Circle centred on the current grid position with a default radius (spoken to the user). To change the radius, it could be entered via the keyboard directly or by using the right or left arrow key for increasing and decreasing the radius. Again, it should be spoken.. Rectangle with the left top corner at the current grid position. The user should then be able to specify the length and width of the rectangle or move with the arrow keys to bottom right hand corner. Lines should begin at the current grid position and the arrow keys or direct input used to determine the end of the line. The user may want to draw a scale diagram. The grid could be given a particular dimension so that 1 grid unit equals 1 cm, etc. The user could work in centimetres and the program would convert to the relevant grid units. As the user moves around the grid and encounters the edge of an object, the user must be able to get some status information such as the current position and the current object's details (e.g., circle centre in x and y, radius r). Another useful piece of information would be if you were within an object and, if so, which object(s) and what their details are. All objects should be labelled from the software point of view so that any object could be selected and modified or deleted. Each diagram needs to be saved and to be printed. For those with some sight, the grid may need to be bigger. The grid and control panel could be in a two separate displays enabling the full screen to be used. The colours of the grid and 190
