SUPPLEMENTARY INFORMATION


doi: 10.1038/nature06910

SUPPLEMENTAL METHODS

Chronically Implanted Electrode Arrays

Warp16 electrode arrays (Neuralynx Inc., Bozeman MT) were used for these recordings. These arrays consist of a 4x4 array of stainless steel guide tubes, each of which contains a single sharp microelectrode (Fig. S1, a-b). Electrodes are individually vertically moveable to a resolution of 1-2 µm using an external, removable pushing device (Neuralynx), though only in a downward direction. The electrodes were made of either tungsten (FHC, Bowdoinham ME) or platinum-iridium (Microprobes, Gaithersburg MD), were insulated with epoxy or parylene, had tip exposures of 3 µm, impedances of 2-4 MΩ, and shaft diameters of 75 µm in order to fit into the guide tubes. A printed circuit board connects the individual guide tubes to a connector block for headstage-tether attachment.

Before the implantation of the arrays, animals were first implanted with a dental-acrylic headcap using aseptic technique. Creation of the headcap followed standard procedures developed in our lab for marmosets 3. After an appropriate recovery time, the auditory cortex (AC) on one hemisphere was located using standard single-electrode recording techniques. The center frequencies and tone/noise preferences of observed neurons were used to localize the position within the AC and to serve as a reference location for the subsequent targeting of the array implantation. A removable protective cap was custom made to protect the array between experimental sessions. The left hemisphere was typically implanted first, followed a few weeks to months later by an implant in the right hemisphere, after which both arrays were recorded simultaneously. Later histologic examination showed all four arrays to span primary AC as well as lateral belt and parabelt regions.
Neural Recordings

Neural signals were passed through a unity-gain preamplifier headstage (HS6, Neuralynx) attached to the electrode array, then amplified (2,000x) and band-pass filtered from 0.3 to 6 kHz (Lynx-8, Neuralynx) before being digitized at 20 kHz onto a 64-channel data-acquisition card (PCI-6071E, National Instruments). Neural signals were observed on-line in order to guide electrode movement and optimize signal quality. During any given experimental session, two electrode channels were monitored, including on-line spike sorting (MSD, Alpha-Omega Engineering, Nazareth, Israel), in order to guide auditory stimulus selection. Neural signals sampled on different days were treated as individual units even when the electrodes were not moved.

Vocal Recordings

Vocalization recordings were obtained using a directional microphone (AKG C1000S) placed ~20 cm in front of the animals, amplified (Symetrix SX202) and low-pass filtered to prevent aliasing (Frequency Devices, 24 kHz, 8-pole Butterworth). Vocal signals were digitized at a 50 kHz sampling rate (National Instruments PCI-6052E) and synchronized with the neural recordings. Vocalizations were later extracted from the recorded signals and manually classified into established marmoset call types 31,32 based on their spectrograms. Only three of the major vocalization types were included for analysis: phees, trilphees, and trills.

www.nature.com/nature 1
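Call classification was done by eye from spectrograms. As a rough illustration of the underlying computation, here is a minimal numpy-only short-time Fourier transform sketch; the function name and window parameters are our own, not from the paper, and a 50 kHz sampling rate is assumed for illustration. With a 1024-point window this gives ~49 Hz frequency resolution, ample for separating marmoset call types, whose dominant frequencies lie in the several-kHz range.

```python
import numpy as np

def spectrogram(x, fs, nfft=1024, hop=256):
    """Magnitude spectrogram via a Hann-windowed short-time FFT.

    Returns (freqs_hz, times_s, mag) for a mono signal x sampled at fs Hz.
    """
    win = np.hanning(nfft)
    n_frames = 1 + (len(x) - nfft) // hop
    frames = np.stack([x[i * hop:i * hop + nfft] * win for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))       # one row of spectrum per frame
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + nfft / 2) / fs
    return freqs, times, mag

# Example: a 7 kHz tone (roughly a phee fundamental) sampled at 50 kHz.
fs = 50_000
t = np.arange(int(0.1 * fs)) / fs
x = np.sin(2 * np.pi * 7_000 * t)
freqs, times, mag = spectrogram(x, fs)
peak_hz = freqs[mag.mean(axis=0).argmax()]          # dominant frequency, ~7 kHz
```

In practice one would inspect the full time-frequency image (frequency contour, duration, phrase structure) rather than a single peak, which is what distinguishes phees, trilphees, and trills.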

Experimental Protocol

Experimental sessions typically began with presentation of acoustic stimuli to characterize the auditory tuning of neurons. Animals were seated in a custom primate chair within a soundproof chamber (Industrial Acoustics, Bronx NY). The acoustic stimuli were digitally generated at a 50 or 100 kHz sampling rate and delivered using TDT hardware (Tucker-Davis Technologies System II) in free field through a speaker located 1 m in front of the animal (B&W DM601). Animals were restrained during all auditory stimulation to eliminate the amplitude and binaural variability that would occur with an unrestrained animal. The stimuli included both tone- and noise-based sounds to assess frequency tuning and rate-level responses. Neuronal center frequencies (CFs) were determined by the pure tone or band-pass noise stimulus with the highest firing rate response. Animals were also presented with multiple recorded vocalizations at different sound levels, including samples of an animal's own vocalizations (previously recorded from that animal) and conspecific vocalization samples (from other animals in the marmoset colony).

After auditory testing, vocal experiments were performed in one of two settings. The first was an antiphonal calling experiment 24, in which an animal vocalized interactively with recorded vocalizations from a conspecific animal. These experiments were conducted with the animal in the soundproof chamber, with the door ajar, and recorded vocalizations played from a speaker out of the animal's sight. During these experiments, the animals produced almost exclusively isolation (phee) calls. The second, more commonly used, setting was recordings carried out in the marmoset colony. The subject animal was placed within a three-walled sound attenuation booth within the colony, allowing free visual and vocal interaction with the rest of the animals in the colony.
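The CF rule described above is simple enough to state as code: the CF is the probe frequency evoking the maximum firing-rate response. A sketch with hypothetical tuning-curve numbers (the function name and the data are ours, not the paper's):

```python
import numpy as np

def center_frequency(freqs_khz, rates):
    """Return the probe frequency (kHz) that evoked the highest firing rate."""
    freqs_khz = np.asarray(freqs_khz, dtype=float)
    rates = np.asarray(rates, dtype=float)
    return float(freqs_khz[rates.argmax()])

# Hypothetical single-neuron tuning data: rate (spikes/s) per probe frequency.
probe_khz = [1, 2, 4, 7, 9, 12, 16]
rates = [5.1, 6.0, 11.2, 24.8, 18.3, 7.7, 4.9]
cf = center_frequency(probe_khz, rates)   # -> 7.0 kHz
```

The same argmax rule applies whether the probes are pure tones or band-pass noise bursts, as in the text.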
Multiple microphones were used to monitor both vocalizations produced by the experimental subject and sounds from the rest of the colony. In this setting, the two marmosets used in this study produced a more diverse repertoire of vocalizations, including both isolation and social calls. In both experimental settings, the animals were either seated in the primate chair or free-roaming. The former involved keeping a subject in the custom primate chair after auditory experiments, but releasing the head. Free-roaming experiments used a small cage in which the animal was allowed to move freely without restraint; tether wires connected the electrode arrays to amplifier hardware located outside the cage. All experiments were conducted under guidelines and protocols approved by the Johns Hopkins University Animal Care and Use Committee.

Altered Feedback Experiments

Auditory feedback of produced vocalizations was altered in real-time by passing signals recorded from a microphone in front of the experimental subject through a vocal effects processor (Yamaha SPX 2000) that applied a ±2 semitone (ST) frequency shift (Fig. S1c). One ST is 1/12th of an octave. For example, a +2 ST shift of a vocalization centered at 7 kHz resulted in feedback centered at 7.86 kHz. This shift magnitude was chosen because it is within the ethologic range of normal marmoset vocal variation 31,32, yet large enough to produce a noticeable difference in the vocalizations. The vocal effects processor introduced a small delay (15 ms) in altered feedback and a small amount of envelope fluctuation that was small compared to the absolute vocal amplitude. The intensity of shifted vocalization feedback was calibrated (PA4, Tucker-Davis Technologies Inc.) to ~10 dB SPL above the intensity of direct, air-conducted feedback. This increase in altered feedback intensity was necessary because of the

inability to block the direct feedback. The intensity of altered feedback, relative to direct (air-conducted) feedback, was calibrated by recording vocalizations of a restrained animal using two microphones, one in the typical recording position in front of the animal and the second adjacent to the ear, and comparing the relative amplitudes. Altered feedback signals were presented back to the animal through a pair of earbud-style headphones (Sony MDR-E828LP) modified to attach to the animals' headcaps (Fig. S1d). Past work involving altered feedback has been limited by animals' unwillingness to wear headphones, a problem eliminated by attaching the headphones in this fashion. Many headphones were tested for spectral flatness, and these were found to be the best, with distortions <5 dB over the frequency range of interest.

There were two potential limitations of this feedback manipulation system. First, direct feedback could not be eliminated, because animals generally stopped vocalizing when they could no longer hear auditory inputs from the rest of the marmoset colony. Second, the headphones could not be driven louder than 85 dB SPL. Because of this limitation, the effect of altered feedback might be underestimated in situations where an animal's vocalizations were louder than the altered feedback.

A typical feedback-alteration experiment involved three procedures: 1) an hour of recording baseline (un-altered) vocalizations together with neural responses, 2) an hour of recording with frequency-shifted feedback, and 3) half an hour of recording with amplified, but not frequency-shifted, feedback as a control. Testing more than one frequency shift per session was generally not possible because of the time required to obtain sufficient numbers of vocalizations in each condition.
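The semitone arithmetic used throughout these experiments can be made explicit: a shift of n ST multiplies frequency by 2^(n/12), so the ±2 ST manipulations scale every vocal frequency by 2^(±1/6) ≈ 1.122 or 0.891. A minimal sketch (function name ours) reproducing the 7 kHz example from the Methods:

```python
def shift_hz(f_hz, semitones):
    """Frequency after a shift of `semitones`; 1 ST = 1/12 octave."""
    return f_hz * 2.0 ** (semitones / 12.0)

up = shift_hz(7_000, +2)    # ~7857 Hz, i.e. the ~7.86 kHz quoted in the Methods
down = shift_hz(7_000, -2)  # ~6236 Hz
```

Because the factor is multiplicative, the same ±2 ST shift moves a vocalization's harmonics proportionally, preserving its harmonic structure.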
The direction of frequency shift for a given session was chosen randomly, without respect to the frequency tuning of the individual neurons studied, because multiple different neurons, covering the entire hearing range, were recorded simultaneously. The order of feedback blocks was changed during some sessions to eliminate temporal interactions.

Additional auditory stimuli were tested to serve as controls for the feedback alterations. These auditory control stimuli consisted of pre-recorded vocalizations from the experimental subject and frequency-shifted variations of these vocalizations. Frequency-shifted stimuli were created by passing recorded vocalizations through the same vocal effects processor used during vocal production experiments. Because animals vocalizing during altered feedback conditions simultaneously heard both frequency-shifted and direct feedback, frequency-shifted controls consisted of vocal stimuli added to themselves with the appropriate frequency shift, relative amplitude (+10 dB SPL), and altered feedback delay (10 ms). This combination of shifted and un-shifted inputs best reflected the actual acoustic inputs during vocal production. Both normal and frequency-shifted vocal stimuli were presented at multiple sound levels to simulate variations in the loudness of produced vocalizations. One potential difference between vocal and auditory effects of the frequency shift stems from the use of a room speaker for auditory testing and headphones for vocal experiments. However, on several occasions auditory stimulus responses were compared for both speaker and headphones and were not qualitatively different.

Data Analysis

Responses to individual vocalizations were calculated by comparing the firing rates before and during self-initiated vocalizations.
Pre-vocal activity was assessed by randomly selecting 10 time bins, with lengths matching the vocal duration, from the 4 seconds preceding vocal onset, and finding the average firing rate of the 5 bins with the quietest microphone signal (acoustic background). A window of 500 ms immediately before vocal onset was excluded from this calculation because of our previous work 12 indicating pre-vocal suppression (median

duration 240 ms). The response to each vocalization was individually quantified using the vocal RMI (see Methods). Vocalization responses that failed to elicit at least 3 spikes before or during the vocal period were excluded from analysis. The overall response of a neuron to vocalizations was assessed by averaging the RMI from multiple vocalization responses.

Additional comparisons of feedback effects on suppressed (response modulation index, RMI ≤ -0.2) and excited (RMI ≥ 0.2) neural populations were made by calculating post-stimulus time histograms (PSTHs). PSTHs were calculated by averaging neural responses to vocalizations, aligned by the onset of each vocalization, using 25 ms binwidths. Individual PSTHs were calculated for both suppressed and excited neural populations and for both baseline and altered-feedback conditions in each neural population. These comparisons were meant to be purely visual and qualitative, and error bars are not shown, though they are small because of the large number of vocal samples. PSTHs were similarly calculated for individual neurons, for display purposes only, using 50 ms binwidths.

The auditory effects of feedback alterations were measured by calculating the spontaneous-subtracted firing rate in response to recorded vocal stimuli presented while the animal was in the sound chamber. The spontaneous rate during auditory stimulation was calculated from the 500 ms preceding each stimulus, in contrast to the more elaborate methods needed for vocal production because of background noise. These auditory responses were calculated for both normal and frequency-shifted vocal samples and compared to feedback effects during vocalization. Although auditory stimuli were presented at multiple sound levels (0-90 dB SPL), only those that overlapped the intensity of the produced vocalizations were included in the analysis.
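The pre-vocal baseline and RMI computations described above can be sketched as follows. The quietest-bin selection follows the text; the RMI itself is the normalized rate difference defined in the main-text Methods, (R_vocal - R_prevocal) / (R_vocal + R_prevocal), which we restate here from memory, so treat its exact form as an assumption. Function names are our own.

```python
import numpy as np

def prevocal_rate(spike_times, mic_rms, bin_starts, bin_len):
    """Mean firing rate over the 5 candidate bins with the quietest
    microphone signal (acoustic background), per the selection rule above.

    spike_times : spike times (s) in the pre-vocal window
    mic_rms     : microphone RMS for each candidate bin
    bin_starts  : start time (s) of each candidate bin
    bin_len     : bin length (s), matched to the vocal duration
    """
    quietest = np.argsort(mic_rms)[:5]
    rates = []
    for i in quietest:
        t0 = bin_starts[i]
        n = np.sum((spike_times >= t0) & (spike_times < t0 + bin_len))
        rates.append(n / bin_len)
    return float(np.mean(rates))

def rmi(vocal_rate, prevocal):
    """Response modulation index: normalized rate change during vocalization."""
    return (vocal_rate - prevocal) / (vocal_rate + prevocal)

# Worked example: candidate bin i (1 s long, starting at t = i) contains i
# spikes; the 5 quietest bins are 0-4, so the baseline is (0+1+2+3+4)/5 = 2 spk/s.
spikes = np.concatenate([i + np.linspace(0.1, 0.9, i) for i in range(10)])
baseline = prevocal_rate(spikes, np.arange(10.0), np.arange(10.0), 1.0)  # -> 2.0
index = rmi(6.0, baseline)                                               # -> 0.5
```

Under this convention RMI runs from -1 (complete suppression) to +1 (pure excitation), matching the -0.2/+0.2 population split used above.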
In addition to the calculation of the sensitivity index (SI, see Methods), neuronal sensitivity to altered feedback was evaluated using a d' measure comparing normal and shifted feedback firing rates. These were calculated for both vocal and auditory responses, and compared to evaluate sensitivity increases during vocalization. D' measures, which take into account both average feedback changes and response variability, slightly underestimate the actual differences because the acoustic variability of vocalizations increases response distribution widths during vocalization, unlike auditory playback, where stimuli were fixed and varied only in the introduced frequency shift. D' is, nonetheless, a useful adjunct for quantifying sensitivity increases.

All quantitative analyses were performed on firing rate-based data. Analyses in the temporal dimension, which might have the potential to pick up more subtle changes, were not performed because of the variability from one vocalization to another. A rigorous analysis of temporal response patterns requires repetitive stimulation with an identical, or nearly identical, stimulus. Analysis based on firing rate data is far more conservative because individual vocal variations can be averaged out.

www.nature.com/nature 4
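The supplement does not spell out the exact d' formula used; a standard choice, and the one sketched here as an assumption, divides the mean rate difference by the RMS of the two standard deviations:

```python
import numpy as np

def dprime(rates_a, rates_b):
    """d' between two firing-rate samples: mean difference scaled by the
    root-mean-square of the two (sample) standard deviations."""
    a = np.asarray(rates_a, dtype=float)
    b = np.asarray(rates_b, dtype=float)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return float(abs(a.mean() - b.mean()) / pooled_sd)

# Hypothetical normal-feedback vs. shifted-feedback rates (spikes/s):
d = dprime([-1.0, 0.0, 1.0], [1.0, 2.0, 3.0])   # -> 2.0
```

As the text notes, trial-to-trial acoustic variability of natural vocalizations inflates the denominator during vocal production, so this measure is conservative relative to fixed playback stimuli.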

SUPPLEMENTAL DISCUSSION

Possible Origins of Vocal Suppression and Feedback Monitoring

The neural circuits mediating vocal suppression, and the resulting increase in feedback sensitivity, in the primate brain are not yet known. It is likely that the suppression is the result of an internal modulatory signal originating in vocal control centres. Such internal signals are common in many sensory-motor systems and are sometimes termed corollary discharges or efference copies 2. The locations of the vocal centres responsible for the corollary discharge inhibiting the AC are unclear, particularly in non-human primates. It is also possible that brainstem and peripheral auditory mechanisms contribute. Brainstem mechanisms have previously been implicated in the detection of vocal feedback errors, including GABAergic inhibition in non-auditory brainstem areas of echolocating bats 33, though these areas are outside the primary ascending auditory pathway and the neuronal mechanisms related to feedback coding have not been studied. While we cannot exclude inheritance of some vocal response properties from sub-cortical auditory structures, our previous work has provided evidence to suggest that vocal suppression observed in the AC cannot be entirely accounted for by sub-cortical contributions 123. It is likely that multiple levels of feedback monitoring operate simultaneously during vocalization, involving both cortical and sub-cortical structures.

Additional Considerations when Comparing Vocalization and Playback

Several possible confounds to the comparison of vocal and auditory effects of altered feedback need to be mentioned for completeness: bone conduction and middle-ear muscle reflexes. During vocalization, vocal sounds are transmitted via the skull bones to the cochlea.
It is unlikely, however, that these significantly contribute to the observed effects: bone conduction is primarily low-pass, transmitting acoustic energy primarily below 2 kHz 34, whereas marmoset vocalizations are almost entirely above 4 kHz. The middle-ear reflex, elicited during vocalization, acts to attenuate both bone- and air-conducted inputs, but primarily affects the low frequencies found in bone conduction 35, and probably does not significantly affect high-frequency marmoset vocalizations. Furthermore, because bone conduction is equal during both normal and altered feedback, its effects should be removed by the altered-baseline subtraction used in the data analysis.

Another potential confound involves possible differences in the background acoustic environment. Although there were large differences in background noise between vocal experiments in the marmoset colony and playback in the soundproof chamber, many of the vocal experiments were conducted in the same sound chamber using an antiphonal calling paradigm 24. The only environmental differences between playback and vocalization in this paradigm were the open chamber door (to present a stimulus out of line-of-sight) and the relaxed restraint conditions (removal of head restraint or free-roaming). There may also be a concern that the continuous frequency shift would change the acoustic background heard by the animal. Because a directional microphone pointed at the experimental animal was used for the frequency shifts, the pickup of other sounds (e.g. background noise and other animals' vocalizations) was reduced by ~20 dB. A shifted version of other sounds would therefore have been much softer than the directly heard sound.

A final possible confound to the vocalization-playback comparison is differences in the attentional state of the animal. Because of the intrinsic variability of vocal behaviour, and the

need to perform many experiments in the marmoset colony, fluctuations in an animal's behavioural state remain a possibility. This was less likely the case for antiphonal experiments in the soundproof chamber, but differences may have existed between the chamber and the marmoset colony. There may also have been fluctuations in behavioural state during and between the blocked feedback conditions. These factors are unlikely to account for the reported observations, but also cannot be ignored.

Additional References

31. Pistorio, A. L., Vintch, B. & Wang, X. Acoustical analysis of vocal development in a New World primate, the common marmoset (Callithrix jacchus). J. Acoust. Soc. Am. 120, 1655-1670 (2006).

32. DiMattina, C. & Wang, X. Virtual vocalization stimuli for investigating neural representations of species-specific vocalizations. J. Neurophysiol. 95, 1244-1262 (2006).

33. Smotherman, M., Zhang, S. & Metzner, W. A neural basis for auditory feedback control of vocal pitch. J. Neurosci. 23, 1464-1477 (2003).

34. Tonndorf, J. in Handbook of Sensory Physiology (eds Keidel, W. D. & Neff, W. D.) 37-84 (Springer-Verlag, New York, 1976).

35. Irvine, D. R. & Wester, K. G. Middle ear muscle effects on cochlear responses to bone-conducted sound. Acta Physiol. Scand. 91, 482-496 (1974).

Figure S1. Illustration of experimental methods. a, Photograph of the Warp16 implantable 16-electrode array (Neuralynx Inc.). The bottom of the array is towards the right. b, Schematic of the Warp16 array. The array consists of a 4x4 matrix of stainless steel guide tubes, each of which holds a single metal microelectrode. A 3º bend in the tail of each electrode maintains mechanical and electrical contact with the guide tube. A small layer of silastic forms a tight seal around the electrodes as they penetrate into the brain, increasing stability and preventing CSF leaks. c, Illustration of the feedback alteration system. A microphone relays vocal signals to a vocal effects processor ('DSP Box', Yamaha SPX 2000), whose output is then attenuated ('PA4', Tucker-Davis Technologies) and presented to the animal through headphones at 10 dB louder than direct, air-conducted feedback. d, Picture of the headphones used for feedback manipulation. A pair of earbud-style headphones was modified to be mounted to the headcap. See the Methods section for additional details.
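The acoustic consequence of the feedback path in Fig. S1c can be modeled numerically: what the animal hears is the direct air-conducted sound plus a frequency-shifted copy that is somewhat louder (calibrated via the attenuator) and delayed by the processor latency. The pitch shift itself was performed in hardware, so this sketch takes the shifted waveform as a given input; the function and parameter names are hypothetical.

```python
import numpy as np

def mix_feedback(direct, shifted, fs, rel_db=10.0, delay_ms=15.0):
    """Sum direct feedback with a frequency-shifted copy that is `rel_db` dB
    more intense and delayed by `delay_ms` (the effects-processor latency).
    `shifted` is assumed to be produced upstream by the pitch shifter."""
    gain = 10.0 ** (rel_db / 20.0)              # +10 dB -> ~3.16x in amplitude
    delay = int(round(delay_ms * 1e-3 * fs))
    out = np.zeros(max(len(direct), len(shifted) + delay))
    out[:len(direct)] += direct
    out[delay:delay + len(shifted)] += gain * shifted
    return out

# Impulse check: the shifted copy lands `delay` samples later, ~3.16x larger.
fs = 50_000
d = np.zeros(100); d[0] = 1.0
s = np.zeros(100); s[0] = 1.0
y = mix_feedback(d, s, fs, rel_db=10.0, delay_ms=1.0)
```

The same routine, applied to a pre-recorded call and its shifted version, yields the playback control stimuli described in the Methods (vocal stimuli added to themselves with the appropriate shift, relative amplitude, and delay).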

[Figure S2 panels: population PSTHs and sample spectrograms; a, phee (N = 212 neurons); b, trill (N = 179 neurons).]

Figure S2. Comparison of average population responses between normal and altered feedback conditions. Peri-stimulus time histograms (PSTHs, 25 ms binwidth) were computed from responses of all recorded AC neurons, including both suppressed and excited neurons. Frequency-shifted vocalizations (red) elicited higher firing rates than baseline, normal vocal production (blue) for both phee (a) and trill (b) vocalizations. The overall bias towards increased neural activity was due to the increased activity of the more prevalent suppressed neurons, despite a slight decrease in activity of excited neurons (see Fig. 2). Sample spectrograms are shown for single- and multi-phrase phee (a, top) and trill vocalizations (b, top). Scale bars indicate relative vocal durations. A distribution of the times of phrase offset for phee vocalizations is shown (a, bottom); the featured peaks correspond to the transient increases in neural activity seen in the PSTH (middle).

[Figure S3 panel: RMI difference (altered - baseline) versus baseline RMI; +2 ST, N = 143 neurons; -2 ST, N = 121 neurons.]

Figure S3. Population scatter plot comparing the RMI difference between altered feedback and baseline conditions. Fig. 2c is based on the detailed data shown in this plot. Suppressed neurons (RMI < 0) showed the greatest variability of feedback effects, some increasing and others decreasing their responses. Excited neurons (RMI > 0) showed both less variability and smaller feedback increases. Positive (orange) and negative (green) frequency shifts are indicated by color. Points are shown for each vocalization type in each neuron (phee N = 197, trilphee N = 162, trill N = 107).

[Figure S4 panels: RMI (altered) versus RMI (baseline); a, positive (r = 0.52, N = 146) and negative (r = 0.59, N = 121) frequency shifts; b, phee (N = 192), trilphee (N = 162), and trill (N = 107).]

Figure S4. Comparison of altered feedback effects between frequency shift directions and vocalization types. a, Individual scatter plots compare positive (+2 ST, left) and negative (-2 ST, right) shift directions. Correlation coefficients (r = 0.52 and 0.59, respectively) were statistically significant (p < 0.001), but not significantly different from one another (p > 0.05). b, Results are plotted individually for phees (left, r = 0.70), trilphees (middle, r = 0.48), and trills (right, r = 0.39). N indicates the number of neurons in each plot.

[Figure S5 panel: distributions of RMI difference (altered - baseline) for animals M49p (N = 61 neurons) and M49r (N = 179 neurons).]

Figure S5. Distributions of altered feedback effects in two animals. Population distributions, analogous to Fig. 3b, are shown individually for the two animals (M49p and M49r). The animal-specific RMI differences were 0.08 ± 0.25 (mean ± s.d.) and 0.04 ± 0.19, respectively. These population differences between the two animals were not statistically significant (p > 0.05, rank-sum).

[Figure S6 panels: a, raster plots of responses to normal and +2 ST frequency-shifted trills across sound levels; b, rate-level curves; c, auditory RMI distributions (medians 0.2 and 0.46).]

Figure S6. Example of auditory responses to the playback of vocalization stimuli. A sample neuron is shown to illustrate the process of presenting auditory stimuli and calculating the effects of frequency shifts on the auditory responses. a, Raster plots showing auditory responses of a sample neuron at different sound pressure levels (SPLs) to the playback of a normal trill vocalization (top) and a frequency-shifted trill vocalization (+2 ST, bottom). b, Firing rate versus sound level curves calculated from the data shown in a. c, Distributions of auditory RMI in response to normal (blue) and frequency-shifted (red) playback vocalizations. There was a significant increase in auditory responses with the +2 ST frequency shift (auditory RMI difference +0.26, p < 0.05, rank-sum).

[Figure S7: scatter plots of RMI difference versus unit CF (kHz) for each vocalization type; dashed lines mark the mean vocal frequency and its harmonics, and shading marks the ±2 ST range.]

Figure S7. Frequency-shifted feedback responses and frequency tuning. Each point compares a neuron's RMI difference (altered − baseline) with its center frequency (CF), measured with either tones or band-pass noise. Left- and right-pointing symbols correspond to negative (-2 ST) and positive (+2 ST) frequency-shift directions. Vocalization types are plotted individually: phee (a, N=192 neurons), trillphee (b, N=162 neurons), and trill (c, N=170 neurons). The mean vocal frequency and its harmonics are indicated (dashed lines), as is the frequency range corresponding to ±2 ST (shaded area). A simple model based on auditory receptive fields would predict a clear CF dependence: a neuron with a CF below the normal vocal frequencies should show a positive RMI difference (increased activity) for a -2 ST shift but a negative RMI difference for a +2 ST shift. This was not the case; no clear pattern is evident in the data. Interestingly, many neurons with CFs in the 1-2 kHz region exhibited feedback-related changes despite being well outside the vocal frequency range.
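The ±2 ST feedback manipulations referenced throughout correspond to fixed frequency ratios: a semitone is a factor of 2^(1/12). A short conversion helper (illustrative; not from the original analysis) makes the shaded ±2 ST range above concrete:

```python
def semitone_ratio(st):
    """Multiplicative frequency ratio for a shift of `st` semitones.
    An n-semitone shift scales every frequency component by 2**(n/12)."""
    return 2.0 ** (st / 12.0)

# the +2 ST and -2 ST shifts used in these experiments
up, down = semitone_ratio(2), semitone_ratio(-2)

# e.g. a 7 kHz trill fundamental lands near 7.86 kHz (+2 ST)
# or 6.24 kHz (-2 ST) under altered feedback
shifted_up, shifted_down = 7.0 * up, 7.0 * down
```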

[Figure S8: PSTHs of firing rate versus time relative to vocal onset (s) for baseline, -2 ST, and +2 ST conditions in three sample neurons (CFs of 4 kHz, 1.5 kHz, and 9.1 kHz).]

Figure S8. Sample responses of neurons tested with multiple frequency shifts. In some sessions, both positive and negative frequency shifts were tested on the same neurons. a, A sample PSTH (25 ms binwidth) for a neuron weakly suppressed by baseline vocalizations (blue), in which the firing rate increased for negative (green), but not positive (orange), shifts. Because this neuron's CF was 4 kHz, these responses could be explained by the shift moving the vocalization acoustics towards the neuron's frequency receptive field. b, A sample neuron excited by baseline vocalization in which -2 ST shifts had little effect but +2 ST shifts slightly reduced the firing rate. This was surprising given that the neuron's CF was 1.5 kHz, well away from the vocal acoustic frequencies. c, A third sample neuron, weakly excited by normal vocalization, that exhibited increased activity during -2 ST, but not +2 ST, frequency shifts. This neuron had a CF of 9.1 kHz and should have exhibited increased firing for the positive frequency shift if altered-feedback effects were determined simply by auditory tuning.
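The PSTHs above use a 25 ms binwidth over a window spanning vocal onset. A minimal sketch of such a computation (the `psth` helper and its arguments are illustrative, not the original analysis code):

```python
import numpy as np

def psth(spike_times, trial_count, t_start=-3.0, t_stop=5.0, binwidth=0.025):
    """Peri-stimulus time histogram in spikes/s with a 25 ms binwidth,
    as in Figure S8. `spike_times` are spike times (s) pooled across
    trials, relative to vocal onset."""
    edges = np.arange(t_start, t_stop + binwidth, binwidth)
    counts, _ = np.histogram(spike_times, bins=edges)
    # convert counts to an average firing rate per bin
    rate = counts / (trial_count * binwidth)
    return edges[:-1], rate

# 10 spikes just after vocal onset, pooled over 2 trials
edges, rate = psth([0.01] * 10, trial_count=2)
```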

[Figure S9: a, scatter plot of auditory versus vocal d' (altered − baseline); b, histogram of the d' change across neurons; c, mean d' change versus baseline RMI for suppressed and excited neurons.]

Figure S9. Comparison of frequency-shift effects between vocalization and auditory conditions using discriminability measures. a, The d' measure, indicating the discriminability between normal and frequency-shifted vocalizations, is shown for both vocalization and auditory playback (N=290 neurons). Values for individual neurons are small, but clearly increased during vocalization compared to auditory playback. b, The d' change, comparing vocal and auditory discriminability, is shown. This distribution (mean±s.d. = 0.29±0.58) is shifted towards positive values (p<0.01, sign-rank test), indicating an increase in discriminability during vocalization analogous to the positive sensitivity change indicated by the sensitivity index (SI) in Fig. 4c. c, The dependence of this d' increase on baseline vocalization-induced modulation (RMI) is nearly identical to that of the SI in Fig. 4d, indicating that it is the suppressed neurons that exhibit increased sensitivity to feedback during vocalization. Error bars, bootstrapped 95% confidence intervals (* p<0.01, sign-rank test). The d' values for individual neurons are relatively low, even during vocalization, likely because acoustic variability from one vocalization to another introduces additional variability into the neural responses and thus decreases d'. When the feedback responses are constrained to include only vocalizations whose acoustics match those of the baseline vocalizations, reducing vocal variability, the mean d' change between vocal and auditory conditions increases from 0.29 to 0.36 (p<0.01, rank-sum test). The reported d' values thus represent the minimum, rather than the optimal, discrimination of altered feedback.
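Under a standard definition (not necessarily the exact computation used here), d' is the absolute difference of the two response means divided by the root of their average variance, so that larger values mean the normal and shifted conditions are easier to tell apart. A minimal sketch:

```python
import numpy as np

def dprime(rates_a, rates_b):
    """Discriminability index between two sets of trial firing rates:
    |mean_a - mean_b| / sqrt((var_a + var_b) / 2). Assumes the standard
    equal-weight pooled-variance form; the paper's exact computation
    may differ."""
    a = np.asarray(rates_a, float)
    b = np.asarray(rates_b, float)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    if pooled_sd == 0:
        return 0.0  # identical constant responses: no discriminability
    return abs(a.mean() - b.mean()) / pooled_sd

# trial firing rates (spk/s) under normal vs frequency-shifted feedback
d = dprime([10.0, 12.0, 11.0], [16.0, 18.0, 17.0])
```

Note how trial-to-trial variability enters the denominator: the more variable the vocal acoustics (and hence the responses), the lower d' for a given mean difference, which is the point made in the legend above.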

[Figure S10: population-average PSTHs of firing rate (spikes/s) versus time relative to vocal onset (s) for baseline and altered playback, shown separately for suppressed (N=146) and excited (N=28) neurons.]

Figure S10. Frequency-shift effects during auditory playback in suppressed and excited neural populations. PSTHs compare population-average responses to auditory stimuli, both baseline (blue) and frequency-shifted ('altered', red). Firing rates are only slightly increased for altered stimuli in neurons suppressed during vocal production (a). Greater differences are evident in neurons excited during vocalization (b). These differences contrast with the feedback effects observed during vocalization, where suppressed neurons exhibit large increases in neural activity during altered feedback (see Fig. 2a) and excited neurons exhibit only small feedback changes (Fig. 2b). These results support a sensitivity increase during vocalization that is present for suppressed, but not excited, neurons.