MINIDISC RECORDERS VERSUS AUDIOCASSETTE RECORDERS: A PERFORMANCE COMPARISON


Bioacoustics
The International Journal of Animal Sound and its Recording, 2005, Vol. 15, pp. 000-000
0952-4622/05 $10
© 2005 AB Academic Publishers

DAVID M. LOGUE*, DAVID E. GAMMON AND MYRON C. BAKER
Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.

ABSTRACT

MiniDisc (MD) digital audio recorders have the potential to benefit bioacoustics research, but concerns about the ATRAC (Adaptive Transform Acoustic Coding) compression method employed by MD recorders have prevented their widespread acceptance in the research community. We compared the performance of MD recorders with that of professional grade audiocassette recorders. Test sounds were synthesized or recorded directly onto a computer hard drive and then transferred to each of two MD recorders and three cassette recorders. The sounds were then transferred back to a computer and compared to the original versions to quantify degradation caused by the recorders.

MD recorders proved superior to cassette recorders in the accurate reproduction of mean frequency and in the reproduction of low amplitude signals when a high amplitude signal occurred at a nearby frequency. Unlike audiocassette recorders, MDs did not generate artefactual variance in signal frequency and amplitude. The new MD recorder used in our study consistently outperformed all other units in the ability to reproduce natural sounds, as quantified by two automated sound comparison techniques.

We found, however, that MD recorders introduced acoustic artefacts after the rapid offset of signals. Artefact duration was not affected by signal duration, resulting in a positive relationship between signal duration and signal-to-noise ratio. The artefacts' periodicity, duration, and amplitude depended on the frequency of the signal; high-frequency signals produced more periodic, shorter, and quieter artefacts than did low frequency signals. Recording amplitude had little to no effect on signal-to-noise ratio. Cassette recorders introduced non-periodic offset artefacts that were similar to the artefacts introduced by MD recorders after low frequency signals.

We conclude that researchers should base their choice of a recording device on the types of sounds they intend to record and the relative importance of accurate reproduction of sound offset versus other aspects of recording fidelity. Overall, however, we see no compelling reason to avoid MD recorders for most field recording and playback applications, and we suggest that the study of bioacoustics stands to benefit from the many practical advantages and novel research methods afforded by this technology.

Keywords: ATRAC compression, audiocassette recorder, MiniDisc, recording fidelity, sound recording equipment.

* Correspondence: D. M. Logue, Department of Biology, Colorado State University, Fort Collins, CO 80523-1878, USA. Email: dlogue@lamar.colostate.edu

INTRODUCTION

This paper addresses the suitability of MiniDisc (MD) digital audio recorders for bioacoustics research. MD technology was developed in the 1980s as a way to store long blocks of digitised sound on a compact, skip-free medium. This was accomplished through ATRAC (adaptive transform acoustic coding) compression, an audio coding system that compresses compact disc audio to approximately 1/5 of the original size by allocating different amounts of memory (and conversely, accepting different degrees of quantisation noise) to each of 52 frequency sub-bands (Tsutui et al. 1992). The allocation is based on human psychoacoustics, with the most audible sub-bands recorded with the highest fidelity (Tsutui et al. 1992). ATRAC compression results in little or no loss in sound quality to human listeners (Tsutui et al. 1992).

Despite several practical advantages of MD recorders over cassette recorders (see Discussion), MD recorders are rarely used in bioacoustical research. The primary source of reluctance is undoubtedly the Cornell Laboratory of Ornithology's (hereafter CLO) formerly published claim that MD recorders are unacceptable for natural sound recordings (Macaulay 2001, as of March 16, 2004). The CLO web page stated that sounds containing rapid amplitude modulation (e.g. the trills produced by certain anurans and emberizid sparrows) were distorted through ATRAC compression in a manner that certain animals, but not humans, may perceive. It went on to say that ATRAC compression could mask quiet sounds when another, louder sound occurs at a nearby frequency. Humans cannot perceive the lower amplitude sounds, but it is unknown whether animals can hear them.

We chose to revisit the question of whether MD recorders are suitable for bioacoustical research because MD technology has progressed substantially since the original CLO tests. Recent models promise superior sound quality, with less noise and increased frequency response relative to the machines tested by the CLO (Sony customer service, pers. comm.; Paprotka 1997). Furthermore, statements about the absolute quality of MD recordings may not be as useful as comparisons between MD recorders and their most widely used alternative, cassette recorders.

We used bioacoustical techniques to quantify and compare various types of distortion introduced by MD and cassette tape recordings. First, we tested all units for artefactual variations in frequency and amplitude, because all tape recorders are known to distort sound in these ways (Wickstrom 1982; Kroodsma et al. 1996; Jiang et al. 1998). Second, we compared the abilities of the two kinds of recorders to reproduce natural sounds accurately. Third, we tested the CLO claims that MDs distort trills and fail to reproduce low amplitude signals when a high amplitude signal occurs at a nearby frequency. Finally, we discuss the practical advantages and disadvantages of the two kinds of recorders and offer recommendations for their use. We focus on bird vocalizations since that is our area of expertise, although inferences may apply to other natural sounds with similar characteristics.

METHODS

Our basic methodology can be summarized in four steps: 1) synthesize or record a sound directly onto a computer, 2) transfer the sound to all the recording devices to be tested, 3) transfer the sound back to a computer, and 4) compare the re-recorded versions of the sound to the original recordings by quantifying the loss of quality caused by each recording device.

Sound generation and recording

Synthetic sounds were designed to reveal various types of recording distortion. We used the programs Multi-Speech (Kay Elemetrics Corp., Model 3700, Version 2.3, Lincoln Park, NJ) and Goldwave (Version 4.26, http://www.goldwave.com/, St. John's, Newfoundland) to synthesize three waveforms (Figure 1a-c): (a) a sine wave at 2000 Hz, (b) a sine wave that increased in frequency linearly from 20 to 15,000 Hz over 10 s, with an echo delayed by 0.10 s at 1% of the signal volume, and (c) a white noise trill with 10 notes at each of the durations 0.1, 0.05, and 0.03 s. All synthetic sounds were generated at a sampling rate of 44,100 samples per second with 16-bit accuracy and saved as wave files.

Natural sounds were recorded with a Sennheiser microphone (ME62) connected to the microphone input of a laptop computer (Gateway 450SX4 with ESS Allegro PC1 Audio sound card; the same machine was used for all sound transfer). The sounds were sampled at 44,100 Hz and saved as wave files using the program Syrinx (John Burt, http://syrinxpc.com/).
We used four high-quality, unfiltered recordings: (Figure 1d) a "jay" call from a Blue Jay Cyanocitta cristata, (Figure 1e) a "fee-bee" song from a Black-capped Chickadee Poecile atricapillus (Gammon & Baker 2004), and two short calls from a captive male Nanday Conure Aratinga nenday that we call a "scream" and a "grunt", respectively (Figure 1f-g). Sounds d and e were recorded from a distance of approximately 20 m in a mixed agricultural/riparian area near Fort Collins, Colorado. A 60 cm Telinga Pro-Universal parabola was used for these recordings. Sounds f and g were recorded in an anechoic chamber (interior dimensions 80 × 60 × 60 cm, Industrial Acoustics Co.).
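The three synthetic test sounds can be approximated with a few lines of code. The following is a minimal sketch in Python, not the procedure used in Multi-Speech or Goldwave. It assumes that the 1% echo level refers to amplitude, that the silent gap after each trill note equals the note length, and that uniform random samples are an adequate stand-in for white noise; all function names are hypothetical.

```python
import math
import random

SAMPLE_RATE = 44_100  # samples per second, as stated in the text


def sine(freq_hz, duration_s, amplitude=1.0):
    """Constant-frequency sine wave as a list of floats."""
    n = round(duration_s * SAMPLE_RATE)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def linear_sweep(f_start_hz, f_end_hz, duration_s, amplitude=1.0):
    """Sine whose instantaneous frequency rises linearly from f_start to f_end."""
    n = round(duration_s * SAMPLE_RATE)
    rate = (f_end_hz - f_start_hz) / duration_s  # Hz per second
    out = []
    for i in range(n):
        t = i / SAMPLE_RATE
        # Phase is the time integral of the frequency f_start + rate * t.
        phase = 2 * math.pi * (f_start_hz * t + 0.5 * rate * t * t)
        out.append(amplitude * math.sin(phase))
    return out


def add_echo(signal, delay_s, gain):
    """Mix in a delayed, attenuated copy (output is longer by the delay)."""
    delay = round(delay_s * SAMPLE_RATE)
    out = list(signal) + [0.0] * delay
    for i, x in enumerate(signal):
        out[i + delay] += gain * x
    return out


def noise_trill(note_durations_s, notes_per_duration, rng):
    """Noise notes, each followed by silence of equal length (an assumption)."""
    out = []
    for dur in note_durations_s:
        n = round(dur * SAMPLE_RATE)
        for _ in range(notes_per_duration):
            out.extend(rng.uniform(-1.0, 1.0) for _ in range(n))
            out.extend([0.0] * n)
    return out


def to_int16(signal):
    """Quantise to 16-bit integers, normalising the peak to full scale."""
    peak = max(abs(x) for x in signal) or 1.0
    return [round(32767 * x / peak) for x in signal]


sound_a = sine(2000, 2.0)                                    # cf. Figure 1a
sound_b = add_echo(linear_sweep(20, 15_000, 10.0), 0.10, 0.01)  # cf. Figure 1b
sound_c = noise_trill([0.1, 0.05, 0.03], 10, random.Random(0))  # cf. Figure 1c
```

The abrupt note boundaries in `noise_trill` are deliberate: the sharp onsets and offsets are what expose the temporal artefacts examined later.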

Figure 1. Spectrograms of the three synthetic sounds (a-c) and four natural sounds (d-g) used to test recording fidelity. We generated (a) a sine wave at 2000 Hz, (b) a sine wave that increases in frequency with an echo at 1% of the signal volume, and (c) an accelerating white noise trill. We recorded (d) the "jay" call of a Blue Jay, (e) a "fee-bee" song from a Black-capped Chickadee, and (f) "scream" and (g) "grunt" calls from a Nanday Conure. All spectrograms use 1024 point FFT size, Blackman window. Note variation in time and frequency scales.

Sound transfer

All sounds a-g were transferred from the computer to five different recording devices: a new (unused) Marantz PMD-222 cassette recorder (NMAT), a two-year-old Marantz PMD-222 (OMAT), a nine-year-old Sony TCM-5000EV cassette recorder (OSAT), a new (unused) Sony MZ-N10 MD recorder (NMD), and a one-year-old Sony MZ-N1 MD recorder (OMD). We used both old and new machines to reflect the range of equipment that might be available to working recordists. The computer, cables, and recording devices were impedance-matched.

Both MD units use ATRAC Type-R digital signal processing, though because the NMD also offers the option of recording at a higher level of compression, it is labelled Type-S. Specific advantages of the newer machines relative to the machines tested by CLO, which used ATRAC Version 2, include a broader frequency response, higher acoustic resolution (20-24 bits versus 16 bits for ATRAC 2) and more accurate bit allocation (Paprotka 1997, www.minidisc.org). Other than the two new machines, we have no records of the amount of use of the recorders, although all had received regular use. All used recorders were cleaned thoroughly prior to these tests, and the NMAT had been calibrated since its last use. The cassette recordings were made on new TDK Type II high bias cassette tapes, and the MD recordings were made on new Memorex MiniDiscs.

A 60 cm Radio Shack cable ran from the headphone out jack on the computer to the line in jack on each device. Even though all of the recorders were set to automatic gain control, it was necessary to adjust the computer's output levels to avoid clipping. For each cassette recorder, we played sound a (Figure 1a) and adjusted the computer's output level until the maximum volume registered by the VU meter on the cassette recorder was slightly below the red zone. MD meters are digital, and we adjusted the output of the computer signal until each MD meter displayed a maximum of four bars (as recommended by http://www.minidisc.org).

Sounds were then recorded from the devices back to the computer. A 60 cm Radio Shack cable ran from the line out jack of the device to the line in jack on the computer. We played sound a and adjusted the volume on each device to the highest level possible without overloading Syrinx (as evidenced by red volume readings). Each sound was recorded to its own wave file at a sampling rate of 44,100 Hz.
Comparing original and re-recorded sounds

Frequency and amplitude fidelity

Sound a (sine 2000 Hz) was chosen for this analysis because the energy in this signal is concentrated near the frequency of peak sensitivity for the avian ear (e.g. Dooling et al. 1978). We used Syrinx to trim each version of sound a to 2 s. We then ran a custom computer program that measured the peak-to-peak period (measured in points, or digital sampling units) and the peak amplitude of each wave. This system of measurement is a maximally accurate way of measuring period in a digitised waveform. To first order, the relationship between period and frequency is linear (frequency = sampling rate/period). Variation in the digital sampling rate is negligible, allowing us to use period as a proxy for frequency throughout this analysis. The sample rate we employed (44,100 Hz) was not evenly divisible by the signal frequency, resulting in cyclical variation in period for the original signal (nineteen 22-point periods followed by one 23-point period, repeating). We analysed the central 3960 cycles (1.98 s in the original signal) from each signal because the beginning and end of some signals contained extreme values that we attribute to computer software trimming artefacts, and because we wanted an even number of cycles for the original signal. To compare variances in amplitudes while controlling for mean amplitude, we standardized mean amplitudes to 10,000.

Due to period cycling, samples were not normally distributed, necessitating non-parametric techniques. We used Statistica (version 5.1, StatSoft Inc. 1998) to run Mann-Whitney U-tests for differences in median period, and Brown-Forsythe tests for homogeneity of variance in both periods and standardized amplitude. In all cases, data from each re-recorded signal were compared to data from the original signal. We used the Bonferroni adjustment for multiple comparisons to arrive at α = 0.01.

Reproduction of natural sounds

This analysis relied on two techniques for the comparison of complex sounds (Baker & Logue 2003) to test for overall recording fidelity: the accuracy measure of Sound Analysis (Version 3.21; Tchernichovski et al. 2000; Tchernichovski & Mitra 2001) and the spectrogram cross-correlation (SPCC) of Canary (Version 1.2; Charif et al. 1995). Sound Analysis uses multi-taper spectral analysis to represent sounds as sharply defined contours on a time-frequency display. The program divides the sounds into small segments and derives values of four biologically relevant features for each segment (Fee et al. 1998; Tchernichovski et al. 2000; Baker et al. 2003; Baker & Logue 2003).
The four features are: Wiener entropy (randomness of sound), spectral continuity (the continuity of element contours over time), pitch (the fundamental frequency), and frequency modulation (the slope of the contour traces). One of us conducted pair-wise comparisons between original and re-recorded versions of sounds d-g (natural sounds), as well as comparisons of each original recording versus itself. We selected the "overall" comparison rather than "chunks", because the former forces syllable order to be conserved. These analyses generated accuracy scores representing the mean similarity (in terms of the four features) of the segments between the two sounds being compared. Accuracy scores range from 0 to 100, where 100 represents maximum similarity.

Using the SPCC function of Canary, we compared the re-recorded versions of sounds d-g to the original recordings. SPCC quantifies the similarity of two sounds by generating normalized spectrograms of the sounds. The two spectrograms are then compared in successive steps of overlap along the time axis. The similarity score that is reported represents the overlap of the two spectrograms that results in the maximum correlation coefficient. We used batch processing to compare re-recordings with original recordings and to compare each original recording to itself (SPCC settings were: frame length 512 points, time grid resolution 2.9 ms with 75% overlap, FFT size 1024, Hamming window, and clipping level 80 dB). We chose the Hamming window function because it achieves good sideband suppression while maintaining a fairly small filter bandwidth (Charif et al. 1995), an appropriate trade-off when comparing natural sounds. SPCC generates similarity scores ranging from 0 to 1, where 1 represents maximum similarity.

Masking of low amplitude sounds

We used Multi-Speech to generate FFT spectra (2048 pt) at the point in each version of sound b (slide with echo) where the high amplitude signal reaches 2 kHz. Syrinx was used to make spectrograms of each version of sound b (1024 point FFT size, Blackman window). For this analysis it was important that sidebands were maximally suppressed, to allow us to distinguish between the two signals and accurately assess noise levels. We chose the Blackman window function because it emphasizes sideband suppression at the allowable expense of increased filter bandwidth. Spectra and spectrograms were examined visually.

Temporal onset and offset

We visualized the waveforms of each version of sound c (accelerating white noise trill) in Syrinx. Upon finding artefactual noise in the MD recordings, we measured the duration of the artefacts in Syrinx using the onscreen cursors, which allowed a temporal resolution of 0.1 ms. Any visible perturbation in sound pressure level preceding or following a trill element was included in the measure of artefact duration. We measured the amplitudes (RMS) of trill elements and their associated artefacts in Multi-Speech (window size = 32 points).
Some of the signal-onset artefacts (17/60) were shorter than the minimum window size (0.7 ms) for determining amplitude, and so were scored as missing values with regard to amplitude. We derived the energy for each artefact and trill element by multiplying the square root of the amplitude by the duration. We then divided the energy in each trill element by the energy in the associated artefact to arrive at the signal-to-noise ratio. We used SPSS 11.0 for Windows (SPSS Inc. 2001) to construct two-way factorial ANOVA models testing for the effects of the recording unit (NMD vs OMD) and the duration of the notes on the signal-to-noise ratio and the duration of the offset artefacts.
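The peak-to-peak period measurement described under "Frequency and amplitude fidelity" can be illustrated on an ideal tone. The sketch below is in Python and is not the custom program used in the study, which is not described in detail; it assumes a noise-free 2000 Hz sine and a simple local-maximum peak detector, and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pvariance

SAMPLE_RATE = 44_100  # points per second
TONE_HZ = 2_000


def peak_indices(samples):
    """Indices of positive local maxima (one per cycle for a clean sine)."""
    return [i for i in range(1, len(samples) - 1)
            if samples[i] > 0 and samples[i - 1] < samples[i] >= samples[i + 1]]


def peak_to_peak_periods(samples):
    """Intervals, in sample points, between successive positive peaks."""
    peaks = peak_indices(samples)
    return [b - a for a, b in zip(peaks, peaks[1:])]


def central(values, count):
    """The middle `count` values, trimming roughly equally from both ends."""
    start = (len(values) - count) // 2
    return values[start:start + count]


# Two seconds of an ideal 2000 Hz tone, then the central 3960 cycles.
tone = [math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE)
        for i in range(2 * SAMPLE_RATE)]
periods = central(peak_to_peak_periods(tone), 3960)

counts = Counter(periods)               # how many 22- and 23-point periods
mean_period = mean(periods)             # points
period_variance = pvariance(periods)    # points squared
estimated_hz = SAMPLE_RATE / mean_period

print(dict(counts), mean_period, period_variance, estimated_hz)
```

For the ideal tone this yields 3762 periods of 22 points and 198 periods of 23 points, that is, one 23-point period in every 20, with a mean of 22.05 points and a variance of 0.0475. A recorder that adds flutter would widen this distribution, which is what the variance comparison is designed to detect.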

The results of this analysis prompted further questions about the effect of signal frequency and amplitude on the generation of artefacts. To test for a relationship between signal-offset artefacts and fundamental frequency, we used Multi-Speech to generate a sine wave trill consisting of clusters of four 0.05 s pulses at each of the frequencies 75, 150, 300, 600, 1200, 2400, 4800, 7200, and 9600 Hz (Figure 2). Pulses within a cluster were separated by 0.05 s, and clusters were separated by 0.25 s. This trill was recorded onto the NMD recorder and transferred back to the computer. One of us employed the methods described above to measure the duration of the artefacts and the signal-to-noise ratio for each trill element.

We examined the influence of recording level on the production of artefacts by recording 10 white noise pulses (0.03 s each, separated by 0.03 s of silence) to the OMD machine under four conditions: 1) the MD recorder's automatic gain control, and manual gain control at recording levels of 2) 10/30, 3) 20/30, and 4) 30/30. We then derived signal-to-noise ratios. We wanted to know how signal amplitude affects artefact amplitude, so we regressed log10-transformed artefact amplitudes on log10-transformed signal amplitudes. An ANOVA model was used to compare signal-to-noise ratios across treatments.

Figure 2. A spectrogram of the pulsed pure-tone trill used to examine the effects of signal frequency on the amplitude and duration of MiniDisc-generated artefacts. This spectrogram was generated in Syrinx (window size = 1024 points, Blackman window).
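The timing of the pulsed pure-tone trill can be laid out explicitly, which also gives the exact sample at which each pulse ends and an offset artefact would be expected to begin. This is a minimal sketch in Python rather than the Multi-Speech procedure; it assumes a 44,100 Hz sampling rate, pulses that start at zero phase, and no fade at the pulse edges, and the names are hypothetical.

```python
import math

SAMPLE_RATE = 44_100
FREQUENCIES_HZ = [75, 150, 300, 600, 1200, 2400, 4800, 7200, 9600]
PULSE_S = 0.05          # pulse duration
PULSE_GAP_S = 0.05      # silence between pulses within a cluster
CLUSTER_GAP_S = 0.25    # silence between clusters
PULSES_PER_CLUSTER = 4


def n_samples(seconds):
    """Convert a duration in seconds to a whole number of sample points."""
    return round(seconds * SAMPLE_RATE)


def tone_pulse(freq_hz, seconds):
    """Sine pulse with abrupt onset and offset, starting at zero phase."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples(seconds))]


def pulse_schedule():
    """(onset_sample, offset_sample, freq_hz) for every pulse in the trill."""
    schedule, cursor = [], 0
    for c, freq in enumerate(FREQUENCIES_HZ):
        if c > 0:
            cursor += n_samples(CLUSTER_GAP_S)
        for p in range(PULSES_PER_CLUSTER):
            if p > 0:
                cursor += n_samples(PULSE_GAP_S)
            end = cursor + n_samples(PULSE_S)
            schedule.append((cursor, end, freq))
            cursor = end
    return schedule


def render(schedule):
    """Write each pulse into an otherwise silent buffer."""
    out = [0.0] * schedule[-1][1]
    for start, end, freq in schedule:
        out[start:end] = tone_pulse(freq, PULSE_S)
    return out


schedule = pulse_schedule()
trill = render(schedule)
```

Because the gaps in the original are true digital silence, any non-zero samples found after an `offset_sample` in a re-recorded version can be attributed to the recorder and the transfer chain.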

9 RESULTS Frequency and amplitude fidelity The MD recorders reproduced the period, and thus the fundamental frequency of the 2,000 Hz sine wave accurately, whereas the cassette recorders altered the mean period by 0.03-0.28 points (Table 1), which is equivalent to absolute deviations in frequency of 2.4-25.6 Hz. The MD recorders did not introduce any detectable variation in frequency over time, but all three cassette recorders introduced substantial variation in fundamental frequency, or flutter (Table 1). Additionally, the cassette recorders introduced detectable variation in amplitude, while the MD recorders did not (Table 2). Reproduction of natural sounds When comparing original sounds against themselves, both Sound Analysis and SPCC delivered maximum or near maximum similarity scores (Table 3). According to both methods of analysis, the NMD unit reproduced all sounds with higher fidelity than any of the other units (Table 3). The OMD ranked second in terms of average similarity score according to the Sound Analysis accuracy measures, but it ranked fourth (behind the OMAT and NMAT) according to the SPCC TABLE 1 A comparison of mean and variance of wave period reproduction from five recording devices. The original synthesized waveform contains 1.98 s of sine wave at 2,000 Hz. Waveforms contained 44,100 points per second, resulting in an expected period of 22.05 points. Significant results for the Mann-Whitney U-Test signify departure from the median period of the original waveform. Significant results for the Brown-Forsythe test signify departure from the variance in period relative to the original waveform. Values in bold face are statistically significant at the α=0.01 level. MD recorders accurately reproduced mean period and period variance, and cassette recorders did not. See text for abbreviations. 
Recorder Mean Variance in Mann-Whitney U-Test Brown-Forsythe Test Period Period N 1 = N 2 = 3960 df = 1, 7918 (points) U p adj F p Original 22.050 0.0475 NMD 22.050 0.0475 7,841,664 0.982 0 0.999 OMD 22.050 0.0475 7,841,664 0.982 0 0.999 NMAT 21.772 0.272 5,823,364 <0.0001 955.61 <0.0001 OMAT 22.104 0.153 7,403,782 <0.0001 276.33 <0.0001 OSAT 22.023 0.227 7668936 0.0047 551.39 <0.0001

10 TABLE 2 A comparison of the variance in amplitude reproduction from five recording devices. Standardized amplitudes were measured from all peaks in the original synthesized signal and from each re-recorded signal. Significant results for the Brown-Forsythe test signify departure from the variance in amplitude relative to the original waveform. Values in boldface are statistically significant at the α=0.01 level. MD recorders reproduced variance in amplitude accurately and audiocassette players did not. See text for abbreviations. Amplitude measurements derived directly from uncalibrated signals are without acoustical units. Amplitude data were standardized so all averages would equal 10,000. Recorder Mean Amplitude Variance in Amplitude Brown-Forsythe Test df = 1, 7918 F P Original 10,000 60,560 NMD 10,000 59,346 0.70 0.40 OMD 10,000 58,865 1.33 0.25 NMAT 10,000 150,847 76.24 <0.0001 OMAT 10,000 171,346 765.87 <0.0001 OSAT 10,000 100,303 153 <0.0001 similarity measures. The OSAT delivered the poorest average fidelity according to both methods of analysis. Masking of low amplitude signals The low amplitude signal in sound b (slide with echo) was clearly visible in both MD recordings, and was represented at appropriate amplitude relative to the high amplitude signal (Figure 3). In contrast, the low amplitude signal was partially masked by noise in the cassette tape recordings (Figure 3). To ensure that this result was not caused by inappropriate volume settings during sound transfer, we produced several recordings across a range of volume settings on both the computer and the NMAT recorder. These recordings all contained substantial noise masking the low amplitude signal. Temporal onset and offset Examination of the white noise trill recordings revealed that both MD recorders introduced artefacts immediately before and after the trill elements (Figure 4). 
Artefacts were found with nearly every note, included a broad range of frequencies, and were variable in amplitude and duration. The signal-onset artefacts averaged 1.4 ± 0.9 ms (error terms represent standard deviations). Those that were long enough to obtain amplitude measurements (see Methods) averaged 20.2 ± 6.7 dB RMS, resulting in a mean signal-to-noise ratio of 51.9 ± 29.6 dB. Because the artefacts prior to signal onset were very brief and contained little energy, we do not examine them further, but focus instead on the more substantial signal-offset artefacts.

The duration of the offset artefacts was greater in the NMD machine than in the OMD machine (mean ± SD: NMD: 8.9 ± 1.6 ms; OMD: 5.6 ± 2.7 ms), but artefact duration was not affected by the trill element duration (two-way ANOVA; machine: F(1,54) = 31.23, P < 0.001; note duration: F(2,54) = 0.045, P = 0.96; machine × note duration interaction: F(2,54) = 0.20, P = 0.82). The signal-to-noise ratio varied between the two machines (mean ± SD: NMD: 9.3 ± 5.1 dB; OMD: 19.4 ± 18.3 dB) and was lower for shorter trill elements (mean ± SD: 0.1 s element: 24.7 ± 11.7 dB; 0.05 s element: 11.7 ± 7.7 dB; 0.03 s element: 6.8 ± 4.0 dB; two-way ANOVA; machine: F(1,54) = 12.77, P < 0.001; note duration: F(2,54) = 14.14, P < 0.001; machine × note duration interaction: F(2,54) = 2.12, P = 0.13).

Cassette recorders did not produce periodic artefacts, but they did produce a spurious increase in pressure after each syllable termination that decreased monotonically to the zero level (Figure 4). These monotonic declines appear as low-frequency noise on a spectrogram.

TABLE 3
Measures of similarity between re-recorded sounds and original recordings.

Sound Analysis
Recorder   Scream   Jay     Fee-Bee   Grunt   Average   Rank
Original   100      100     99.99     100     100
NMD        97.38    94.35   99.30     96.82   96.96     1
OMD        96.88    93.60   98.48     91.09   95.01     2
NMAT       94.94    86.10   94.11     87.7    90.71     3
OMAT       85.82    93.06   95.98     84.63   89.87     4
OSAT       94.62    94.72   88.10     75.44   88.22     5

SPCC
Recorder   Scream   Jay     Fee-Bee   Grunt   Average   Rank
Original   1        1       1         1       1
NMD        0.982    0.984   0.978     0.965   0.98      1
OMD        0.887    0.919   0.899     0.785   0.87      4
NMAT       0.91     0.868   0.818     0.942   0.88      3
OMAT       0.907    0.895   0.862     0.921   0.90      2
OSAT       0.878    0.885   0.845     0.751   0.84      5
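The Average and Rank columns of Table 3 can be recomputed from the per-sound scores as a quick consistency check. The sketch below is illustrative only and is not part of the original analysis: the dictionary and function names are hypothetical, and it assumes that Average is the unweighted mean of the four sounds and that Rank orders the recorders from highest to lowest mean.

```python
# Per-sound similarity scores transcribed from Table 3
# (column order: Scream, Jay, Fee-Bee, Grunt).
SOUND_ANALYSIS = {
    "NMD": [97.38, 94.35, 99.30, 96.82],
    "OMD": [96.88, 93.60, 98.48, 91.09],
    "NMAT": [94.94, 86.10, 94.11, 87.7],
    "OMAT": [85.82, 93.06, 95.98, 84.63],
    "OSAT": [94.62, 94.72, 88.10, 75.44],
}

SPCC = {
    "NMD": [0.982, 0.984, 0.978, 0.965],
    "OMD": [0.887, 0.919, 0.899, 0.785],
    "NMAT": [0.91, 0.868, 0.818, 0.942],
    "OMAT": [0.907, 0.895, 0.862, 0.921],
    "OSAT": [0.878, 0.885, 0.845, 0.751],
}


def mean_scores(scores):
    """Return the unweighted mean score for each recorder (assumed definition)."""
    return {unit: sum(values) / len(values) for unit, values in scores.items()}


def rank_units(scores):
    """Rank recorders from highest (1) to lowest mean similarity."""
    means = mean_scores(scores)
    ordered = sorted(means, key=means.get, reverse=True)
    return {unit: position + 1 for position, unit in enumerate(ordered)}


if __name__ == "__main__":
    for label, table in (("Sound Analysis", SOUND_ANALYSIS), ("SPCC", SPCC)):
        means = mean_scores(table)
        ranks = rank_units(table)
        print(label)
        for unit in table:
            print(f"  {unit:5s} mean = {means[unit]:.2f}  rank = {ranks[unit]}")
```

Under these assumptions the recomputed means and ranks agree with the tabulated values. The two measures agree on the best (NMD) and worst (OSAT) recorders but disagree on the position of the OMD unit.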

Figure 3. Spectrograms (window size = 1024 points, Blackman window) and associated FFT power spectra (window size = 2048 points, Blackman window) for three versions (Original, NMD and NMAT) of the slide with echo sound. The maximize function in Syrinx was employed to standardize signal amplitudes. Arrows on the spectrograms indicate where the associated power spectrum was taken, and arrows on the spectra indicate where the high and low amplitude signals should appear.

The NMD recording of a trill composed of pure tones (sine waves) of varying frequency revealed that for tones <1200 Hz, the offset artefacts approached a monotonic return to the zero level, with little periodicity (Figure 4). These artefacts are similar to the offset artefact introduced by cassette recorders in response to white noise (compare Figure 4b and 4d). Artefacts following low frequency (pitch) trill elements were longer and of greater relative amplitude than those following high frequency elements (Figure 5). The 1200 Hz tones were followed by semi-periodic artefacts (Figure 4e). Above 1200 Hz the artefacts were periodic, relatively short (average = 8.0 ms), and quiet (Figure 5; average signal-to-noise ratio = 10.9 dB).

Signal amplitude had a strong and direct effect on artefact amplitude (linear regression; R² = 0.76, F(1,28) = 90.86, P < 0.0001). All four treatments produced signal-offset artefacts, but recording level did not have a significant effect on signal-to-noise ratio (mean ± SD: automatic gain control: 5.4 ± 1.9 dB; manual gain control 10/30: 5.8 ± 3.2 dB; 20/30: 4.7 ± 0.95 dB; 30/30: 4.3 ± 0.64 dB; ANOVA, F(3,36) = 1.26, P = 0.30). The automatic gain control setting worked well, producing a higher mean signal-to-noise ratio than either the medium or high recording levels.

Figure 4. Oscillograms of single notes from white noise trills (a-c) and the tonal trill (d-f). In (a) the original version of the white noise trill, the sound begins and ends abruptly, whereas in (b) cassette recordings like this note from the NMAT, there is a non-periodic return to zero sound pressure after the end of the note. In (c) MD recordings, like this one from the NMD recorder, periodic artefacts are introduced at the beginning and end of the note. The NMD recorder introduces non-periodic offset artefacts after pure tones of low frequency (d, 150 Hz), semi-periodic artefacts after pure tones in the middle of the test range (e, 1200 Hz), and periodic artefacts at the upper end of the test range (f, 9600 Hz). The horizontal bars represent 0.05 s, and the arrows point to artefacts introduced by the recording equipment.

Figure 5. The duration of offset artefacts, and the signal-to-noise ratio from MD recordings of tonal bursts of sound of varying frequency. The x-axis is in log scale. See text and Figure 4 for further characterization of artefacts.

DISCUSSION

Both cassette recorders and MD recorders distort sounds in ways that may influence bioacoustics research. In most respects, MD recorders reproduce sounds as well as or better than professional grade cassette recorders. Compared to cassette recorders, MD recorders introduce less variation in amplitude and frequency, reproduce fundamental frequencies with greater accuracy, and add much less noise. According to both measures of sound similarity, the NMD recorder reproduced all four natural sounds at higher fidelity than any of the other units. According to the similarity scores generated by Sound Analysis, the OMD recorder ranked second, outperforming all of the cassette tape recorders. Sound Analysis is highly effective at characterizing similarity among harmonically rich natural sounds, like three of the four sounds used in this study (Baker & Logue 2003). According to SPCC analysis, the two Marantz cassette recorders outperformed the OMD unit. With our small sample of recorders, it is impossible to determine whether this discrepancy between the two MD recorders is attributable to differential wear or to some other attribute of the machines.

We found mixed support for the CLO's claims regarding MD recording quality. On the one hand, we reject the claim that MDs fail to reproduce low amplitude signals when a high amplitude signal occurs at a nearby frequency (Macaulay 2001). To the contrary, we found MD recorders to be effective at reproducing both signals. In the cassette recordings, however, noise partially obscured the quiet signal (Figure 3). We conclude that a bioacoustician who is naïve to the presence of a quiet signal among louder signals would be more likely to detect the quiet signal when using MD rather than audiocassette. On the other hand, we found support for the CLO claim that MDs distort signals with rapid amplitude modulation.
Specifically, we found that MD recorders generate artefactual variation in sound pressure levels prior to rapid signal onset and following rapid signal offset, although artefact amplitude was always much lower than signal amplitude. Our tests suggest that signal-offset artefacts are more likely to be of concern to bioacousticians because they contain much more energy than signal-onset artefacts. The duration of the recorded signal does not affect the duration of offset artefacts, resulting in a substantially lower signal-to-noise ratio for shorter signals. Based on this finding, we conclude that MD performance is weakest for very brief sounds (<0.05 s). The frequency of the recorded signal affects both the duration and the amplitude of the offset artefacts. At low frequencies these artefacts are non-periodic, and thus unlikely to be perceived as a continuation of the signal by either researchers or test subjects. Cassette recorders introduce similar non-periodic artefacts at the rapid offset of sounds. Recordists should be aware that these non-periodic artefacts, for both types of equipment, appear on spectrograms as low-frequency noise. The offset artefacts introduced by MD become shorter, quieter, and more periodic as the frequency of the signal increases.

Varying the recording level on the MD recorder did not significantly affect the signal-to-noise ratio, though there was a non-significant trend for higher signal-to-noise ratios in the quieter signals. The automatic gain control worked well, resulting in a relatively high signal-to-noise ratio. In our experience in the field, automatic gain control results in high-quality recordings, even when recording loud sounds. We conclude that automatic gain control may be employed without sacrificing signal-to-noise ratio.

Because of their low amplitude, the temporal artefacts introduced by both types of recording equipment are unlikely to affect researchers' ability to make fine-scale temporal measurements, but they may be perceptible to animals in playback experiments. The problem of signal-onset and -offset artefacts may become irrelevant with the availability of the Hi-MD MiniDisc format. Introduced by Sony in mid-2004 (subsequent to our tests), Hi-MD allows users to record and play back uncompressed audio at 44,100 samples per second.
If, as appears to be likely, ATRAC compression is responsible for the temporal artefacts discussed in this manuscript, Hi-MD recorders will be an attractive option for a wide range of bioacoustics research.

Many characteristics of MD are typical of digital recording devices in general. Compared to analogue recording, digital recording offers very little noise, absence of flutter and wow (rapid and slow variation in frequency, respectively), accurate frequency representation, and little or no degradation upon repeated playback or recording. With digital recordings, however, sample rates are fixed upon recording, potentially compromising the ability to compare signals with independently digitised signals at a later stage.

We have only limited knowledge of how animals may perceive recordings from MD and cassette recorders. Several types of distortion introduced by cassette recorders, but not MD recorders, are likely to be perceived by animals. For example, a number of songbird species, e.g. Field Sparrow Spizella pusilla (Nelson 1988), White-throated Sparrow Zonotrichia albicollis (Hurly et al. 1990), Carolina Chickadee Poecile carolinensis (Lohr et al. 1991), Veery Catharus fuscescens (Weary et al. 1991), and Black-capped Chickadee (Weisman & Ratcliffe 1989), perceive changes in absolute and relative fundamental frequency. Fluctuations in amplitude, like those generated by the audiocassette recorders, may play an important role in sexual selection in birds (Forstmeier et al. 2002). We do not know whether animals can perceive the artefacts introduced by MD during playback experiments, but in psychophysical experiments that use continuous sounds interrupted by silent gaps of variable duration, some birds (e.g. Dooling et al. 1978) and mammals (Giraudi et al. 1980) can detect gaps of substantially shorter durations than these artefacts.

Psychophysical experiments inform us of the capabilities of animals to distinguish among various features of sound signals. Often, such studies reveal capacity for fine-scale discriminations (e.g. Dooling 1982; Lohr & Dooling 1998; Dent et al. 2000), but these results must be considered in light of mitigating factors. For example, the well-known just noticeable difference (JND) between two sounds that can be revealed in psychophysical experimentation may tell us little about the just meaningful difference (JMD) to the animals themselves (Nelson & Marler 1990). In field recordings or playbacks of natural sounds, the environmental noise and degradation of the sound between source and receiver would likely overwhelm many of the fine-scale artefacts we have illustrated in our comparisons among recording devices. A study examining animal perceptual discrimination capacities for the artefacts introduced by different recording devices could benefit the field of bioacoustics.

The signal-offset artefacts generated by MDs are likely to cause problems in studies of signal degradation. These artefacts would be expected to increase the appearance of energy in a signal's reverberation tail, an important cue for sound localization (Holland et al. 2001).
Researchers should therefore avoid MD players for studies of sound degradation. Since some types of sound degradation are known to affect birds' strength of response to playback, a highly conservative approach would be to avoid MD players for playback experiments. We do not advocate this approach for the following reasons: (1) the MD artefacts are quiet and brief and would be overwhelmed by environmental noise under most field conditions, (2) audiocassettes, the most common alternative to MD, also distort sounds in ways that may affect test subjects, and (3) playback studies using MD recorders have revealed strong, consistent, and predicted behavioural differences between experimental groups (e.g. Geberzahn & Hultsch 2003; Logue & Gammon 2004).

MiniDisc recorders offer several practical advantages over cassette recorders. MD recorders are typically less expensive than cassette recorders. While most cassette recorders are relatively bulky and must be carried over the shoulder, most MD recorders can fit into a shirt pocket. Extreme temperatures and low battery power can alter tape speed and consequently signal frequency in cassette recordings, but they have no effect on MD recordings. Since the laser head used to read and write MD recordings does not actually touch the disc itself, there is less cleaning and maintenance involved, and MD signal quality does not degrade with many uses or in extreme temperatures. We have played a single MD stimulus thousands of times with no perceptible change in signal quality. We have used MD to record and broadcast sounds in a variety of environmental conditions, from winter in Colorado to the wet season in Panama.

Random access to discrete tracks allows easy record keeping and quick access to sounds. Tracks can be labelled with alphanumeric characters and (in some models) an automatic time/date stamp. High-end MD recorders can record to a buffer, allowing the recordist to capture sounds that occurred a few seconds before releasing the pause button.

MD technology allows research techniques that are not available to cassette users. For example, one can record a sound stimulus followed by the desired duration of blank space, and set the machine on one-track-repeat mode to emulate a tape loop of any duration, and the signal never degrades. These units also work well in interactive playback experiments, wherein the experimenter controls the timing of the playback stimulus (e.g. Logue submitted). MiniDisc recorders record in stereo, and so can be used for stereo playbacks (Logue & Gammon 2004). Unlike stereo cassette tapes, however, MD recorders do not cross-talk from one channel onto the other.

The sound editing features of MD recorders allow researchers to add or delete track marks at specific points during sound recordings, as well as move or delete the tracks themselves. Using these editing capabilities, we have recorded several seconds of birdsong, isolated a desired sequence, and created a novel stimulus while in the field.
The entire process, from the bird singing until it hears its own song played back, takes less than two minutes.

Researchers who are interested in MD recorders should consider the practical limitations of these devices. In contrast with most professional grade cassette recorders, most MD recorders do not have a built-in speaker. MD recorders are digital, and thus moisture sensitive. However, one of us used two MD recorders (Sony MZ-N1) for four months of fieldwork in a Panamanian moist forest without any moisture-related problems. Another MD recorder (Sony MZ-R700), having spent several minutes underwater, was opened, dried under a fan, and regained full function.

In conclusion, researchers who require temporally precise recordings of signals undergoing rapid amplitude modulation should consider avoiding MD recorders. Those who need accurate reproduction of signal frequency with minimum introduction of noise, frequency modulation, and amplitude modulation may safely consider MD recorders and the novel research opportunities they afford. For much of the routine descriptive work in birdsong research, we see no compelling reason to avoid MD recorders.

ACKNOWLEDGEMENTS

Thanks to J. J. Price and W. A. Mackin for encouraging us to write this paper, and to S. Martin for the extended use of his cassette recorder. We thank John Logue for designing the program to automatically measure period and amplitude in the time domain. Lorax the conure and his owner, Mala Sawhney, were generous with their voice and time respectively. Our thanks go out to the Belinski family for allowing us to record birds on their ranch. The comments of two anonymous referees guided us to improve this manuscript. The authors are not affiliated with any of the brands tested in this project; we have not received compensation or the promise of compensation from any commercial interest. MCB was supported by an NSF Grant (IBN-0090400).

REFERENCES

Baker, M. C., Baker, M. S. A. & Gammon, D. E. (2003). Vocal ontogeny of nestling and fledgling black-capped chickadees Poecile atricapilla in natural populations. Bioacoustics, 13, 265-296.
Baker, M. C. & Logue, D. M. (2003). Population differentiation in a complex bird sound: A comparison of three bioacoustical analysis procedures. Ethology, 109, 223-242.
Bradbury, J. W. & Vehrencamp, S. L. (1998). Principles of Animal Communication. Sunderland, MA: Sinauer Associates.
Charif, R. A., Mitchell, S. & Clark, C. W. (1995). Canary 1.2 User's Manual. Ithaca, NY: Cornell Laboratory of Ornithology.
Dent, M. L., Dooling, R. J. & Pierce, A. S. (2000). Frequency discrimination in budgerigars (Melopsittacus undulatus): effects of tone duration and tonal context. J. Acoust. Soc. Am., 107, 2657-2664.
Dooling, R. J. (1982). Auditory perception in birds. In Acoustic Communication in Birds (Ed. by D. E. Kroodsma & E. H. Miller), Vol. 1, pp. 95-130. London: Academic Press.
Dooling, R. J., Zoloth, S. R. & Baylis, J. R. (1978). Auditory sensitivity, equal loudness, temporal resolving power, and vocalizations in the house finch (Carpodacus mexicanus). J. Comp. and Phys. Psychol., 92, 867-876.
Fee, M. S., Shraiman, B., Pesaran, B. & Mitra, P. P. (1998). The role of nonlinear dynamics of the syrinx in the vocalizations of a songbird. Nature, 395, 67-71.
Forstmeier, W., Kempenaers, B., Meyer, A. & Leisler, B. (2002). A novel song parameter correlates with extra-pair paternity and reflects male longevity. Proc. R. Soc. Lond. B, 269, 1479-1485.
Gammon, D. E. & Baker, M. C. (2004). Song repertoire evolution and acoustic divergence in a population of black-capped chickadees. Anim. Behav., 68, 903-913.
Geberzahn, N. & Hultsch, H. (2003). Long-time storage of song types in birds: evidence from interactive playbacks. Proc. R. Soc. Lond. B, 270, 1085-1090.

Giraudi, D., Salvi, R., Henderson, D. & Hamernik, R. (1980). Gap detection by the chinchilla. J. Acoust. Soc. Am., 68, 802-806.
Holland, J., Dabelsteen, T., Pedersen, S. B. & López Paris, A. (2001). Potential ranging cues contained within the energetic pauses of transmitted wren song. Bioacoustics, 12, 3-20.
Hurly, T. A., Ratcliffe, L. & Weisman, R. (1990). Relative pitch recognition in white-throated sparrows, Zonotrichia albicollis. Anim. Behav., 40, 176-181.
Jiang, J., Lin, E. & Hanson, D. G. (1998). Effect of tape recording on perturbation measures. JSLHR, 41, 1031-1041.
Kroodsma, D. E., Budney, G. F., Grotke, R. W., Vielliard, J. M. E., Gaunt, S. L. L., Ranft, R. & Veprintseva, O. D. (1996). Natural sound archives: Guidance for recordists and a request for cooperation. In Ecology and Evolution of Acoustic Communication in Birds (Ed. by D. E. Kroodsma & E. H. Miller), pp. 474-486. Ithaca, NY: Cornell University Press.
Logue, D. M. (submitted). The duet code of the female black-bellied wren, Thryothorus fasciatoventris.
Logue, D. M. & Gammon, D. E. (2004). Duet song and sex roles during territory defense in a tropical bird species: the black-bellied wren, Thryothorus fasciatoventris. Anim. Behav., 68, 721-731.
Lohr, B. & Dooling, R. J. (1998). Detection of changes in timbre and harmonicity in complex sounds by zebra finches (Taeniopygia guttata) and budgerigars (Melopsittacus undulatus). J. Comp. Psychol., 112, 36-47.
Lohr, B., Nowicki, S. & Weisman, R. (1991). Pitch production in Carolina Chickadee songs. Condor, 93, 197-199.
Macaulay Library of Natural Sounds, Cornell Laboratory of Ornithology. (2001). http://birds.cornell.edu/lns/recordingnature/recordingnature_index.html.
Nelson, D. A. (1988). Feature weighting in species song recognition by the field sparrow (Spizella pusilla). Behaviour, 106, 158-182.
Nelson, D. A. & Marler, P. (1990). The perception of birdsong and an ecological concept of signal space. In Comparative Perception and Complex Signals (Ed. by W. C. Stebbins & M. A. Berkley), Vol. 2, pp. 444-478. New York: Wiley.
Paprotka, R. (1997). Top MD Recorder, Sony MDS-JA50ES. German Stereo Magazine, April, pp. 20-23. www.minidisc.org/ja50es_review.html.
SPSS, Inc. (2001). SPSS 11.0 for Windows. Chicago: SPSS Inc.
StatSoft, Inc. (1998). STATISTICA for Windows [Computer program manual]. Tulsa: StatSoft, Inc.
Tchernichovski, O., Nottebohm, F., Ho, C. E., Pesaran, B. & Mitra, P. P. (2000). A procedure for an automated measurement of song similarity. Anim. Behav., 59, 1167-1176.
Tchernichovski, O. & Mitra, P. P. (2001). Sound analysis user manual, version 2. http://TalkBank.org/animal/sa.html.
Tsutsui, K., Suzuki, H., Shimoyoshi, O., Sonohara, M., Akagiri, K. & Heddle, R. M. (1992). ATRAC: Adaptive Transform Acoustic Coding for MiniDisc. Reprinted from the 93rd Audio Engineering Society Convention in San Francisco, 1992 October 1-4. http://www.minidisc.org/aes_atrac.html
Weary, D. M., Weisman, R. G., Lemon, R. E., Chin, T. & Mongrain, J. (1991). Use of the relative frequency of notes by Veeries in song recognition and production. Auk, 108, 977-981.
Weisman, E. & Ratcliffe, L. (1989). Absolute and relative pitch processing in black-capped chickadees, Parus atricapillus. Anim. Behav., 38, 685-692.
Wickstrom, D. C. (1982). Factors to consider in recording avian sounds. In Acoustic Communication in Birds (Ed. by D. E. Kroodsma & E. H. Miller), Vol. 1, pp. 1-52. New York: Academic Press.

Received 1 April 2004, revised 16 October 2004 and accepted 18 October 2004.