Emergence of invariant representation of vocalizations in the auditory cortex

J Neurophysiol 114: 2726–2740, 2015. First published August 26, 2015; doi:10.1152/jn.00095.2015.

Emergence of invariant representation of vocalizations in the auditory cortex

Isaac M. Carruthers,1,2 Diego A. Laplagne,3 Andrew Jaegle,1,4 John J. Briguglio,1,2 Laetitia Mwilambwe-Tshilobo,1 Ryan G. Natan,1,3 and Maria N. Geffen1,2,4

1Department of Otorhinolaryngology and Head and Neck Surgery, University of Pennsylvania, Philadelphia, Pennsylvania; 2Graduate Group in Physics, University of Pennsylvania, Philadelphia, Pennsylvania; 3Brain Institute, Federal University of Rio Grande do Norte, Natal, Brazil; and 4Graduate Group in Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania

Submitted 28 January 2015; accepted in final form 25 August 2015

Address for reprint requests and other correspondence: M. N. Geffen, Dept. of Otorhinolaryngology and Head and Neck Surgery, Univ. of Pennsylvania Perelman School of Medicine, 5 Ravdin, 3400 Spruce St., Philadelphia, PA 19104 (e-mail: mgeffen@med.upenn.edu).

Carruthers IM, Laplagne DA, Jaegle A, Briguglio JJ, Mwilambwe-Tshilobo L, Natan RG, Geffen MN. Emergence of invariant representation of vocalizations in the auditory cortex. J Neurophysiol 114: 2726–2740, 2015. First published August 26, 2015; doi:10.1152/jn.00095.2015. An essential task of the auditory system is to discriminate between different communication signals, such as vocalizations. In everyday acoustic environments, the auditory system needs to be capable of performing the discrimination under different acoustic distortions of vocalizations. To achieve this, the auditory system is thought to build a representation of vocalizations that is invariant to their basic acoustic transformations. The mechanism by which neuronal populations create such an invariant representation within the auditory cortex is only beginning to be understood. We recorded the responses of populations of neurons in the primary and nonprimary auditory cortex of rats to original and acoustically distorted vocalizations. We found that populations of neurons in the nonprimary auditory cortex exhibited greater invariance in encoding vocalizations over acoustic transformations than neuronal populations in the primary auditory cortex. These findings are consistent with the hypothesis that invariant representations are created gradually through hierarchical transformation within the auditory pathway.

auditory cortex; hierarchical coding; invariance; processing; vocalizations

IN EVERYDAY acoustic environments, communication signals are subjected to acoustic transformations. For example, a word may be pronounced slowly or quickly, or by different speakers. These transformations can include shifts in spectral content, variations in frequency modulation, and temporal distortions. Yet the auditory system needs to preserve the ability to distinguish between different words or vocalizations under many acoustic transformations, forming an invariant or tolerant representation (Sharpee et al. 2011). Presently, little is understood about how the auditory system creates a representation of communication signals that is invariant to acoustic distortions. It has been proposed that within the auditory processing pathway, invariance emerges in a hierarchical fashion, with higher auditory areas exhibiting progressively more tolerant representations of complex sounds. The auditory cortex (AC) is an essential brain area for encoding behaviorally important acoustic signals (Aizenberg and Geffen 2013; Engineer et al. 2008; Fritz et al. 2010; Galindo-Leon et al. 2009; Recanzone and Cohen 2010; Schnupp et al. 2006; Wang et al. 1995). Up to and within the primary auditory cortex (A1), the representations of auditory stimuli are hypothesized to support an increase in invariance.
Whereas neurons in input layers of A1 preferentially respond to specific features of acoustic stimuli, neurons in the output layers become more selective to combinations of stimulus features (Atencio et al. 2009; Sharpee et al. 2011). In the visual pathway, recent studies suggest a similar organizing principle (DiCarlo and Cox 2007), such that populations of neurons in higher visual areas exhibit greater tolerance to visual stimulus transformations than neurons in the lower visual areas (Rust and DiCarlo 2010; Rust and DiCarlo 2012). Here, we tested whether populations of neurons beyond A1, in a nonprimary auditory cortex, support a similar increase in invariant representation. We focused on the transformation between A1 and one of its downstream targets in the rat, the suprarhinal auditory field (SRAF) (Arnault and Roger 1990; Polley et al. 2007; Profant et al. 2013; Romanski and LeDoux 1993b). A1 receives projections directly from the lemniscal thalamus into the granular layers (Kimura et al. 2003; Polley et al. 2007; Roger and Arnault 1989; Romanski and LeDoux 1993b; Storace et al. 2010; Winer et al. 1999) and sends extensive convergent projections to SRAF (Covic and Sherman 2011; Winer and Schreiner 2010). Neurons in A1 exhibit short-latency, short time-to-peak responses to tones (Polley et al. 2007; Profant et al. 2013; Rutkowski et al. 2003; Sally and Kelly 1988). By contrast, neurons in SRAF exhibit delayed response latencies, longer time to peak in response to tones, spectrally broader receptive fields and lower spike rates in responses to noise than neurons in A1 (Arnault and Roger 1990; LeDoux et al. 1991; Polley et al. 2007; Romanski and LeDoux 1993a), consistent with responses in nonprimary AC in other species (Carrasco and Lomber 2011; Kaas and Hackett 1998; Kikuchi et al. 2010; Kusmierek and Rauschecker 2009; Lakatos et al. 2005; Petkov et al. 2006; Rauschecker and Tian 2004; Rauschecker et al. 1995). These properties also suggest an increase in tuning specificity from A1 to SRAF, which is consistent with the hierarchical coding hypothesis.

Rats use ultrasonic vocalizations (USVs) for communication (Knutson et al. 2002; Portfors 2007; Sewell 1970; Takahashi et al. 2010). Like mouse USVs (Galindo-Leon et al. 2009; Liu and Schreiner 2007; Marlin et al. 2015; Portfors 2007), male USVs evoke temporally precise and predictable patterns of activity across A1 (Carruthers et al. 2013), thereby providing us an ideal set of stimuli with which to probe invariance to acoustic transformations in the auditory cortex.

The USVs used in this study are part of the more general class of high-frequency USVs, which are produced during positive social, sexual, and emotional situations (Barfield et al. 1979; Bialy et al. 2000; Brudzynski and Pniak 2002; Burgdorf et al. 2000; Burgdorf et al. 2008; Knutson et al. 1998, 2002; McIntosh et al. 1978; Parrott 1976; Sales 1972; Wohr et al. 2008). The specific USVs were recorded during friendly male adolescent play (Carruthers et al. 2013; Sirotin et al. 2014; Wright et al. 2010). Responses of neurons in A1 to USVs can be predicted based on a linear non-linear model that takes as an input two time-varying parameters of the acoustic waveform of USVs: the frequency- and temporal-modulation of the dominant spectral component (Carruthers et al. 2013). Therefore, we used these sound parameters as the basic acoustic dimensions along which the stimuli were distorted.

At the level of neuronal population responses to USVs, response invariance can be characterized by measuring the changes in neurometric discriminability between USVs as a function of the presence of acoustic distortions. Neurometric discriminability is a measure of how well an observer can discriminate between stimuli based on the recorded neuronal signals (Bizley et al. 2009; Gai and Carney 2008; Schneider and Woolley 2010). Because this measure quantifies available information, which is a normalized quantity, it allows us to compare the expected effects across two different neuronal populations in different anatomical areas. If the representation in a brain area is invariant, discriminability between USVs is expected to show little degradation in response to acoustic distortions. On the other hand, if the neuronal representation is based largely on direct encoding of acoustic features, rather than encoding of the vocalization identity, the neurometric discriminability will be degraded with changes in the acoustic features of the USVs.

Here, we recorded the responses of populations of neurons in A1 and SRAF to original and acoustically distorted USVs, and tested how acoustic distortion of USVs affected the ability of neuronal populations to discriminate between different instances of USVs. We found that neuronal populations in SRAF exhibit greater generalization over acoustic distortions of vocalizations than neuronal populations in A1.

METHODS

Animals. All procedures were approved by the Institutional Animal Care and Use Committee of the University of Pennsylvania. Subjects in all experiments were adult male Long Evans rats, 12–16 wk of age. Rats were housed in a temperature- and humidity-controlled vivarium on a reversed 24-h light-dark cycle with food and water provided ad libitum.

Stimuli. The original vocalizations were extracted from a recording of an adult male Long Evans rat interacting with a conspecific male in a custom-built social arena (Fig. 1A). As described previously (Sirotin et al. 2014), the arena is split in half and kept in the dark, such that the two rats can hear and smell each other and their vocalizations can be unambiguously assigned to the emitting subject. In these sessions, rats emitted high rates of calls from the 50-kHz family and none of the 22-kHz type, suggesting interactions were positive in nature (Brudzynski 2009). Recordings were made using condenser microphones with nearly flat frequency response from 10 to 150 kHz (CM16/CMPA-5V, Avisoft Bioacoustics) digitized with a data acquisition board at 300 kHz sampling frequency (PCIe-6259 DAQ with BNC-2110 connector, National Instruments). We selected eight representative USVs with distinct spectrotemporal properties (Figs. 1 and 2) (Carruthers et al. 2013) from the 6,865 tones emitted by one of the rats. We contrasted mean frequency and frequency bandwidth of the selected calls with that of the whole repertoire from the same rat (Fig. 2B).
We calculated vocalization center frequency as the mean of the fundamental frequency and bandwidth as the root mean square of the mean-subtracted fundamental frequency of each USV. We denoised and parameterized USVs following methods published previously by our group (Carruthers et al. 2013). Briefly, we constructed a noiseless version of the vocalizations using an automated procedure. We computed the noiseless signal as a frequency- and amplitude-modulated tone, such that at any time, the frequency, f(t), and amplitude, a(t), of that tone were matched to the peak amplitude and frequency of the recorded USV at all times, using the relation

x(t) = a(t) sin(2π ∫ f(τ) dτ),

where the integral runs from 0 to t. We constructed the acoustic distortions of the 8 selected vocalizations along the dimensions that are essential for their encoding in the auditory pathway (Fig. 1B). For each of these 8 original vocalizations we generated eight different transformed versions, amounting to 9 versions (referred to as transformation conditions) of each vocalization. We then generated the stimulus sequences by concatenating the vocalizations, padding them with silence such that they were presented at a rate of 2.5 Hz.

Stimulus transformations. The 8 transformations applied to each vocalization, with f̄ denoting the mean (center) frequency of the vocalization, were temporal compression (designated T-, transformed by scaling the length by a factor of 0.75: x(t) = a(t/0.75) sin(2π ∫ f(τ/0.75) dτ)), temporal dilation (T+, length scaled by 1.25: x(t) = a(t/1.25) sin(2π ∫ f(τ/1.25) dτ)), spectral compression (FM-, bandwidth scaled by 0.75: x(t) = a(t) sin(2π ∫ [f̄ + 0.75(f(τ) - f̄)] dτ)), spectral dilation (FM+, bandwidth scaled by 1.25: x(t) = a(t) sin(2π ∫ [f̄ + 1.25(f(τ) - f̄)] dτ)), spectrotemporal compression (T-/FM-, length and bandwidth scaled by 0.75: x(t) = a(t/0.75) sin(2π ∫ [f̄ + 0.75(f(τ/0.75) - f̄)] dτ)), spectrotemporal dilation (T+/FM+, length and bandwidth scaled by 1.25: x(t) = a(t/1.25) sin(2π ∫ [f̄ + 1.25(f(τ/1.25) - f̄)] dτ)), center-frequency increase (CF+, frequency + 7.9 kHz: x(t) = a(t) sin(2π ∫ [f(τ) + 7.9 kHz] dτ)), and center-frequency decrease (CF-, frequency - 7.9 kHz: x(t) = a(t) sin(2π ∫ [f(τ) - 7.9 kHz] dτ)). Spectrograms of denoised vocalizations are shown in Fig. 1A. Spectrograms of transformations of one of the vocalizations are shown in Fig. 1B.
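As a concrete illustration of this parameterization and of the transformations listed above, the sketch below synthesizes a tone from sampled frequency and amplitude traces and applies the temporal, bandwidth, and center-frequency distortions. It is a minimal Python/NumPy reimplementation for illustration only (not the authors' original MATLAB code); the sample rate and the toy traces are assumptions.

```python
import numpy as np

FS = 400_000  # playback sample rate in Hz; assumed for illustration

def synthesize(f, a, fs=FS):
    """Render x(t) = a(t) * sin(2*pi * integral of f(tau) dtau) from sampled traces."""
    phase = 2 * np.pi * np.cumsum(f) / fs      # discrete approximation of the running integral
    return a * np.sin(phase)

def transform(f, a, time_scale=1.0, bw_scale=1.0, cf_shift_hz=0.0, fs=FS):
    """Apply the T (duration), FM (bandwidth), and CF (center-frequency) distortions."""
    n_out = int(round(len(f) * time_scale))     # stretch or compress the duration
    t_orig = np.linspace(0.0, 1.0, len(f))
    t_new = np.linspace(0.0, 1.0, n_out)
    f_t = np.interp(t_new, t_orig, f)           # resampled frequency trace f(tau/scale)
    a_t = np.interp(t_new, t_orig, a)           # resampled amplitude trace a(t/scale)
    f_mean = f.mean()                            # center frequency f-bar
    f_t = f_mean + bw_scale * (f_t - f_mean)     # scale bandwidth around the mean
    f_t = f_t + cf_shift_hz                      # shift center frequency
    return synthesize(f_t, a_t, fs)

# Example: a T+/FM+ version (length and bandwidth scaled by 1.25) of a toy 50-ms sweep
f_trace = np.linspace(55_000, 65_000, FS // 20)   # hypothetical frequency trace (Hz)
a_trace = np.hanning(len(f_trace))                # hypothetical amplitude envelope
x_dilated = transform(f_trace, a_trace, time_scale=1.25, bw_scale=1.25)
```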

[Fig. 1 (spectrograms of vocalizations 1–8 and of the transformations of one vocalization; frequency in kHz vs. time in ms, SPL in dB color scale) appears here; its caption is given below.]

Microdrive implantation. Rats were anesthetized with an intraperitoneal injection of a mixture of ketamine (60 mg/kg body wt) and dexmedetomidine (0.25 mg/kg). Buprenorphine (0.1 mg/kg) was administered as an operative analgesic with ketoprofen (5 mg/kg) as postoperative analgesic. A small craniotomy was performed over A1 or SRAF. Eight independently movable tetrodes housed in a microdrive (6 for recordings and 2 used as a reference) were implanted in A1 (targeting layer 2/3), SRAF (targeting layer 2/3) or both as previously described (Carruthers et al. 2013; Otazu et al. 2009). The microdrive was secured to the skull using dental cement and acrylic. The tetrodes' initial lengths were adjusted to target A1 or SRAF during implantation, and were furthermore advanced by up to 2 mm (in 40-μm increments, once per recording session) once the tetrode was implanted. A1 and SRAF were reached by tetrodes implanted at the same angle (vertically) through a single craniotomy window (on the top of the skull) by advancing the tetrodes to different depths on the basis of their stereotaxic coordinates (Paxinos and Watson 1986; Polley et al. 2007). At the endpoint of the experiment a small lesion was made at the electrode tip by passing a short current (10 μA, 10 s) between electrodes within the same tetrode. The brain areas from which the recordings were made were identified through histological reconstruction of the electrode tracks. Limits of brain areas were taken from Paxinos and Watson (1986) and Polley et al. (2007).

Stimulus presentation. The rat was placed on the floor of a custom-built behavioral chamber, housed inside a large double-walled acoustic isolation booth (Industrial Acoustics). The acoustical stimulus was delivered using an electrostatic speaker (MF-1, Tucker-Davis Technologies) positioned directly above the subject. All stimuli were controlled using custom-built software (MathWorks), a high-speed digital-to-analog card (National Instruments) and an amplifier (TDT). The speaker output was calibrated using a 1/4-in. free-field microphone (Bruel and Kjaer, type 4939) at the approximate location of the animal's head. The input to the speaker was compensated to ensure that pure tones between 0.4 and 80 kHz could be output at a volume of 70 dB to within a margin of at most 3 dB. Spectral and temporal distortion products as well as environmental reverberation products were 50 dB below the mean sound pressure level relative to 20 μPa (SPL) of all stimuli, including USVs (Carruthers et al. 2013). Unless otherwise mentioned, all stimuli were presented at 65 dB SPL, 32-bit depth and 400 kHz sample rate.

Electrophysiological recording. The electrodes were connected to the recording apparatus (Neuralynx Digital Lynx) via a thin cable. The position of each tetrode was advanced by at least 40 μm between sessions to avoid repeated recording from the same units. Tetrode position was noted to 20 μm precision. Electrophysiological data from 24 channels were filtered between 600 and 6,000 Hz (to obtain spike responses), digitized at 32 kHz, and stored for offline analysis. Single and multi-unit waveform clusters were isolated using commercial software (Plexon Spike Sorter) using previously described criteria (Carruthers et al. 2013).

Unit selection and firing-rate matching. To be included in analysis, a unit had to meet the following conditions: 1) its firing rate averaged at least 0.1 Hz during stimulus presentation, and 2) its spike count contained at least 0.78 bits/s of information about the vocalization identity during the presentation of at least one vocalization under one of the transformation conditions. We set this threshold to match the elbow in the histogram of the distribution of information rates for all recorded units that passed the firing rate threshold (see Fig. 5A, inset). We validated this threshold with visual inspection of vocalization response post-stimulus time histograms for units around the threshold. We estimated the information rate for each neuron by fitting a Poisson distribution to the distribution of spike counts evoked by each vocalization. We then computed the entropy of this set of 8 distributions, and subtracted this value from the prior entropy of 3 bits. Entropy was defined as

H(S|R) = Σ_r p(r) H(S|R = r) = -Σ_{r,s} p(r,s) log₂ p(s|r).

We defined P_s(r) = (λ_s^r / r!) e^(-λ_s), the Poisson likelihood of detecting r spikes in response to stimulus s, where λ_s is the mean number of spikes detected from a neuron in response to stimulus s. The entropy was then computed as

H(S|R) = -(1/N) Σ_{r,s} P_s(r) log₂ [ P_s(r) / Σ_{s'} P_{s'}(r) ],

where N is the number of stimuli. We performed this computation separately for each transformation condition.
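The Poisson-based information estimate above can be written compactly. The following sketch (Python/NumPy/SciPy, an illustrative reimplementation rather than the authors' code; the cutoff r_max and the example counts are assumptions) computes H(S|R) from the mean spike counts per vocalization and subtracts it from the 3-bit prior.

```python
import numpy as np
from scipy.stats import poisson

def vocalization_information_bits(mean_counts, r_max=50):
    """Estimate I(S;R) = H(S) - H(S|R) per presentation, assuming Poisson spike counts.

    mean_counts: mean spike count evoked by each vocalization (N stimuli, N = 8 here).
    The study reports the corresponding rate in bits/s.
    """
    lam = np.asarray(mean_counts, dtype=float)
    n_stim = len(lam)
    r = np.arange(r_max + 1)
    # P_s(r): Poisson likelihood of r spikes given stimulus s, shape (N, r_max + 1)
    p_r_given_s = poisson.pmf(r[None, :], lam[:, None])
    p_r_total = p_r_given_s.sum(axis=0)          # sum over s' of P_s'(r)
    with np.errstate(divide="ignore", invalid="ignore"):
        log_term = np.where(p_r_given_s > 0, np.log2(p_r_given_s / p_r_total), 0.0)
    h_s_given_r = -(1.0 / n_stim) * np.sum(p_r_given_s * log_term)
    return np.log2(n_stim) - h_s_given_r         # prior entropy (3 bits) minus H(S|R)

# Example: a unit with distinct mean counts across the 8 vocalizations
print(vocalization_information_bits([0.5, 1.0, 2.0, 4.0, 0.2, 3.0, 1.5, 0.8]))
```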
In order to remove a potential source of bias due to different firing rate statistics in A1 and SRAF, we restricted all analyses to the subset of A1 units whose average firing rates most closely matched the selected SRAF units. We performed this restriction by recursively including the pair of units from the two areas with the most similar firing rates.

Response sparseness. To examine vocalization selectivity of recorded units, sparseness of vocalization responses was computed as

Sparseness = 1 - [ (Σ_{i=1..n} FR_i / n)² / (Σ_{i=1..n} FR_i² / n) ],

where FR_i is the firing rate to vocalization i after the minimum firing rate in response to vocalizations was subtracted, and n is the number of vocalizations included (which was 8). This value was computed separately for each recorded unit for each vocalization transformation, and then averaged over all transformations for recorded units from either A1 or SRAF.

Population response vector. The population response on each trial was represented as a vector, such that each element corresponded to responses of a unit to a particular presentation of a particular vocalization. Bin size for the spike count was selected by cross-validation (Hung et al. 2005; Rust and DiCarlo 2010); we tested classifiers using data binned at 50, 74, 100, and 150 ms. We found the highest performance in both A1 and SRAF when using a single bin 74 ms wide from vocalization onset, and we used this bin size for the remainder of the analyses. As each transformation of each vocalization was presented 100 times in each recording session, the analysis yielded a 100 × N matrix of responses for each of the 72 vocalization/transformations (8 vocalizations and 9 transformation conditions), where N was the number of units under analysis. The response of each unit was represented as an average of spike counts from 10 randomly selected trials. This pooling was performed after the segregation of vectors into training and validation data, such that the spike counts used to produce the training data did not overlap with those used to produce the validation data.

Linear support vector machine (SVM) classifier. We used the support vector machine package libsvm (Chang and Lin 2011), as distributed by the scikit-learn project, version 0.15 (Pedregosa et al. 2011), to classify population response vectors. We used a linear kernel (resulting in decision boundaries defined by convex sets in the vector space of population spiking responses) and a soft-margin parameter of 1 (selected by cross-validation to maximize raw performance scores).

Classification procedure. For each classification task, a set of randomly selected N units (unless otherwise noted, we used N = 60) was used to construct the population response vector as described above, dividing the data into training and validation sets. For each vocalization, 8 vectors were used to train and 2 to validate per-transformation and within-transformation classification (see Across-transformation performance below). In order to divide the data evenly among the nine transformations, 81 vectors were used to train and 18 to validate in all-transformation classification. We used the vectors in the training dataset to fit a classifier, and then tested the ability of the resulting classifier to determine which of the vocalizations evoked each of the vectors in the validation dataset.

Bootstrapping. The entire classification procedure was repeated 100 times for each task, each time on a different randomly selected population of units, and each time using a different randomly selected set of trials for validation.
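For reference, the sparseness index defined above (Response sparseness) reduces to a few lines of code. This sketch (Python/NumPy, illustrative only; the example rates are made up) computes it for one unit's firing rates across the 8 vocalizations under a single transformation condition.

```python
import numpy as np

def response_sparseness(firing_rates):
    """Sparseness of a unit's responses across vocalizations (0 = uniform, 1 = highly selective)."""
    fr = np.asarray(firing_rates, dtype=float)
    fr = fr - fr.min()                     # subtract the minimum rate, as in the text
    n = fr.size                            # number of vocalizations (8)
    denom = np.sum(fr ** 2) / n
    if denom == 0:
        return 0.0                         # unit responded equally (or not at all) to every vocalization
    return 1.0 - (np.sum(fr) / n) ** 2 / denom

# Example: a unit responding mostly to one vocalization gives a value close to 1
print(response_sparseness([12.0, 0.5, 0.3, 0.2, 0.4, 0.1, 0.6, 0.2]))
```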
Fig. 1. Spectrograms of vocalizations and transformations used as acoustic stimuli in the experiments. A: the eight different original vocalizations selected from recordings, after de-noising. B: one original vocalization (center), as well as the 8 different transformations of that vocalization presented in the experiment. From top left to bottom right: T+: temporally stretched by factor of 1.25; CF+: center frequency shifted up by 7.9 kHz; T-: temporally compressed by factor of 0.75; FM-: frequency modulation scaled by a factor of 0.75; Original: denoised original vocalization; FM+: frequency modulation scaled by a factor of 1.25; T-/FM-: temporally compressed and frequency modulation scaled by a factor of 0.75; CF-: center frequency shifted down by 7.9 kHz; T+/FM+: temporally stretched and frequency modulation scaled by a factor of 1.25.

Fig. 2. Statistical characterization of vocalizations. A: spectrotemporal modulation spectrum for the 8 vocalizations (frequency modulation in cycles/kHz vs. temporal modulation in cycles/ms). B: distribution of center frequency and bandwidth (kHz) for all recorded vocalizations. Eight vocalizations used in the study are indicated by red dots with corresponding numbers.

Mode of classification. Classification was performed in one of two modes. In the pairwise mode, we trained a separate binary classifier for each possible pair of vocalizations, and classified which of the two vocalizations evoked each vector. In one-vs.-all mode, we trained an 8-way classifier on responses to all vocalizations at once, and classified which of the eight vocalizations was most likely to evoke each response vector (Chang and Lin 2011; Pedregosa et al. 2011). This was implemented by computing all pairwise classifications followed by a voting procedure. We recorded the results of each classification, and computed the performance of the classifier as the fraction of response vectors that it classified correctly.

As there were 8 vocalizations, performance was compared to the chance value of 0.125 in one-vs.-all mode and to 0.5 in pairwise mode.

Across-transformation performance. We trained and tested classifiers on vectors drawn from a subset of different transformation conditions. We chose the subset of transformations in two different ways: When testing per-transformation performance, we trained and tested on vectors drawn from presentations of one transformation and from the original vocalizations. When testing all-transformation performance, we trained and tested on vectors drawn from all 9 transformation conditions.

Within-transformation performance. For each subset of transformations on which we tested across-transformation performance, we also trained and tested classifiers on responses under each individual transformation condition. We refer to the performance of these classifiers, averaged over the transformation conditions, as the within-transformation performance.

Generalization penalty. In order to evaluate how tolerant neural codes are to stimulus transformation, we compared the performance on generalization tasks with the performance on the corresponding within-transformation tasks. We defined the generalization penalty as the difference between the within- and across-transformation performance.
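The within- vs. across-transformation comparison can be sketched with scikit-learn's linear SVM (the study used libsvm via scikit-learn 0.15 with a linear kernel and soft margin of 1). The helper names, the simple 80/20 random split, and the toy data below are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.svm import SVC

def classifier_accuracy(x_train, y_train, x_test, y_test):
    """Fit a linear soft-margin SVM (C = 1) and return validation accuracy."""
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(x_train, y_train)
    return clf.score(x_test, y_test)

def generalization_penalty(vectors, labels, conditions, rng):
    """Within-transformation minus across-transformation accuracy for one ensemble.

    vectors: (n_vectors, n_units) pooled response vectors
    labels: vocalization identity per vector
    conditions: transformation condition per vector
    """
    def split_and_score(mask):
        idx = np.flatnonzero(mask)
        rng.shuffle(idx)
        n_train = int(0.8 * len(idx))               # illustrative 80/20 split
        tr, te = idx[:n_train], idx[n_train:]
        return classifier_accuracy(vectors[tr], labels[tr], vectors[te], labels[te])

    # Within-transformation: train and test inside each condition, then average
    within = np.mean([split_and_score(conditions == c) for c in np.unique(conditions)])
    # Across-transformation: train and test on all conditions pooled together
    across = split_and_score(np.ones(len(labels), dtype=bool))
    return within - across

# Toy demo: 8 vocalizations x 9 conditions x 10 vectors each, 60 "units" with label-dependent rates
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(8), 90)
conditions = np.tile(np.repeat(np.arange(9), 10), 8)
vectors = rng.poisson(2.0, size=(labels.size, 60)) + labels[:, None]
print(generalization_penalty(vectors, labels, conditions, rng))
```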
3 responded significanly o all of he original vocalizaions excep vocalizaions 4, 6, and 7 (row 1). Meanwhile, he represenaive SRAF uni in Fig. 4 responded significanly o all of he original vocalizaions excep vocalizaion 6 (row 1). Noe ha he A1 uni s response o vocalizaion 5 varies significanly in boh size and emporal srucure when he vocalizaion is ransformed. Meanwhile, he SRAF uni s response o he same vocalizaion is consisen regardless of which ransformaion of he vocalizaion is played. In his insance, he seleced SRAF uni exhibis greaer invariance o ransformaions of vocalizaion 5 han he seleced A1 uni. To compare he responses of populaions of unis in A1 and SRAF and o ensure ha he effecs ha we observe are no due simply o increased informaion capaciy of neurons ha fire a higher firing raes, we seleced subpopulaions of unis ha were mached for firing rae disribuion (Rus and Dicarlo 21; Ulanovsky e al. 24) (Fig. 5A). We hen compared he uning properies of unis from he wo brain areas, as measured by he pure-one frequency ha evoked he highes firing rae from he unis. We found no difference in he disribuion of bes frequencies beween he wo populaions (Kolmogorov- Smirnov es, P.66) (Fig. 5B). We compared he amoun of informaion ransmied abou a vocalizaion s ideniy by he spike couns of unis in each brain area, and again found no significan difference (Fig. 5C, Kolmogorov-Smirnov es, P.42). Furhermore, we compued sparseness of responses of A1 and SRAF unis o vocalizaions, which is a measure of neuronal seleciviy o vocalizaions. A sparseness value of 1 indicaes ha he uni responds differenly o a single vocalizaion han o all ohers, whereas a sparseness value of indicaes ha he uni responds equally o all vocalizaions. The mean sparseness values for responses were.354 for A1, and.376 for SRAF (Fig. 5D), bu his difference was no significan (Kolmogorov-Smirnov es, P.84). These analyses demonsrae ha he seleced neuronal populaions in A1 and SRAF were similarly selecive o vocalizaions. Neuronal populaions in A1 and SRAF exhibied similar performance in heir abiliy o classify responses o differen vocalizaions. We rained classifiers o disinguish beween original vocalizaions on he basis of neuronal responses, and we measured he resuling performances. To ensure ha he resuls were no skewed by a paricular vocalizaion, we compued he classificaion eiher for responses o each pair of vocalizaions (pairwise performance), or for responses o all 8 vocalizaions simulaneously (8-way performance). We found a small bu significan difference beween he average performance of hose classifiers rained and esed on A1 responses and hose rained and esed on he SRAF responses (Fig. 5, E and F), bu he resuls were mixed. Pairwise classificaions performed on populaions of A1 unis were 88.% correc, and on populaions of SRAF unis, 88.5% correc (Kolmogorov- Smirnov es, P.13). On he oher hand, 8-way classificaions performed on populaions of 6 A1 unis were 61% correc, and on SRAF unis were 59% correc (Kolmogorov- Smirnov es, P 7.7e 11). Figure 5, G and H, shows he classificaion performance broken down by vocalizaion for pairwise classificaion for A1 (Fig. 5G) and SRAF (Fig. 5H). There is high variabiliy in performance beween vocalizaion pairs for eiher brain area. However, he performance levels are Downloaded from hp://jn.physiology.org/ by 1.22.32.247 on November 24, 217 J Neurophysiol doi:1.1152/jn.95.215 www.jn.org

Fig. 3. Peristimulus-time raster plots (above) and histograms (below) of an exemplar A1 unit showing selective responses to vocalization stimuli. Each column corresponds to one original vocalization, and every two rows to one transformation of that vocalization. Histograms were first computed for 1-ms time bins, and then smoothed with an 11-ms Hanning window.

Fig. 4. Peristimulus-time raster plots (above) and histograms (below) of an exemplar SRAF unit showing selective responses to vocalization stimuli. Each column corresponds to one original vocalization, and every two rows to one transformation of that vocalization. Histograms were first computed for 1-ms time bins, and then smoothed with an 11-ms Hanning window.

Fig. 5. Ensembles of A1 and suprarhinal auditory field (SRAF) units under study are similar in responses and overall classification performance. A: cumulative distributions for average firing rate of units during stimulus presentation. Distribution of SRAF units shown in red, A1 units shown in faint blue, and the subset of A1 units matched to the SRAF units shown in blue. Inset: distribution of information rates for all recorded units that passed the minimum firing rate criterion. The threshold for information rate (0.78 bits/s) in response to vocalizations under at least one transformation is marked by a vertical black line. B: box plot showing the distribution of frequency tunings of the units selected from A1 and from SRAF. The boxes show the extent of the central 50% of the data, with the horizontal bar showing the median frequency. C: histogram of the information contained in the spike counts of units from A1 and SRAF about each vocalization. Dashed lines mark the mean values. D: histogram of sparseness (with respect to vocalization identity) of responses of units from A1 and SRAF. Dashed lines mark the mean values. E: classification accuracy of support vector machine (SVM) classifier distinguishing between two vocalizations (pairwise mode). Faded colors show performance for the pair of vocalizations with the highest performance for each brain area, and saturated colors show average performance across pairs. F: classification accuracy of SVM classifier distinguishing between all vocalizations (8-way mode). Faded colors show performance for the vocalization with the highest performance for each brain area, and saturated colors show average performance across all vocalizations. G: average performance of pairwise classification for each vocalization for neuronal populations in A1. H: average performance of pairwise classification for each vocalization for neuronal populations in SRAF.

To test whether neuronal populations exhibited invariance to transformations in classifying vocalizations, we measured whether the ability of neuronal populations to classify vocalizations was reduced when vocalizations were distorted acoustically. Therefore, we trained and tested classifiers for vocalizations based on population neuronal responses and compared their performance under within-transformation and across-transformation conditions (Fig. 6A).
In the within-transformation condition, the classifiers were trained and tested to discriminate responses to vocalizations under a single transformation. In the across-transformation condition, the classifier was trained and tested in discriminating responses to vocalizations in original form and one or all transformations.

The difference between within-transformation and across-transformation classifier performance was termed the generalization penalty. If the neuronal population exhibited low invariance, we expected the across-transformation performance to be lower than within-transformation performance and the generalization penalty to be high (Fig. 6A, top). If the neuronal population exhibited high invariance, we expected the across-transformation performance to be equal to within-transformation performance and the generalization penalty to be low (Fig. 6A, bottom). To ensure that responses to a select transformation were not skewing the results, we computed across-transformation performance both for each of the transformations and for all transformations. In the per-transformation condition, the classifier was trained and tested in discriminating responses to vocalizations in original form and under one other transformation. In the all-transformation condition, the classifier was trained and tested in discrimination of responses to vocalizations in original form and under all 8 transformations simultaneously.

Neuronal populations in A1 exhibited greater reduction in performance on the across-transformation condition compared to the within-transformation condition than neuronal populations in SRAF.

Fig. 6. Classifier performance on within-transformation and across-transformation conditions. A: schematic diagram of neuronal responses to 2 original (USV1, USV2) and transformed (USV1*, USV2*) vocalizations. Each dot denotes a population response vector projected in a low-dimensional subspace. Left: within-transformation classification: classifier is trained and tested to classify responses to vocalizations for a single transformation. Within-transformation discriminability is high for both original and transformed vocalizations by populations of neurons in either A1 (top) or SRAF (bottom). Right: generalization classification: classifier is trained and tested to classify responses to vocalizations for original and transformed vocalizations simultaneously. Predictions of the hierarchical coding model: across-transformation classification performance is low for A1 and high for SRAF, reflecting an increase in invariance from A1 to SRAF. B and C: performance when discriminating each vocalization from one other vocalization (pairwise classification). D and E: performance when discriminating each vocalization from all others (8-way classification). B and D: performance when generalization is performed across the original vocalizations and one transformation at a time (per-transformation). C and E: performance when generalization is performed across all eight transformations and the originals at once (all-transformation).

Figures 6 and 7 present the comparison between across-transformation performance and within-transformation performance for each of the different conditions. Note that the different conditions result in very different numbers of data points: the per-transformation conditions have 8 times as many data points as the all-transformation conditions, as the former yields a separate data point for each transformation. Similarly, the pairwise conditions yield 28 times as many data points as the 8-way conditions (one for each unique pair drawn from the 8 vocalizations). As expected, for both A1 and SRAF, the classification performance was higher for the within-transformation than the across-transformation condition (Fig. 6, B–E). However, the difference in performance between within-transformation and across-transformation conditions was higher in A1 than in SRAF: SRAF populations suffered a smaller generalization penalty under all conditions tested (Fig. 7), indicating that neuronal ensembles in SRAF exhibited greater generalization than in A1. This effect was present under both pairwise (Fig. 6, B and C, and Fig. 7, A and B) and 8-way classification (Fig. 6, D and E, and Fig. 7, C and D), and for generalization in per-transformation (Fig. 6, B and D, and Fig. 7, A and D; pairwise classification, P = 0.028; 8-way classification, P = 1.9e-4; Wilcoxon paired sign-rank test; 60 units in each ensemble tested) and all-transformation mode (Fig. 6, C and E, and Fig. 7, B and E; pairwise classification, P = 1.4e-5; 8-way classification, P = 0.025; Wilcoxon paired sign-rank test; 60 units in each ensemble tested).

Fig. 7. Generalization penalty (difference between within-transformation performance and across-transformation performance) is higher for A1 ensembles than for SRAF ensembles. Each dot corresponds to average classifier performance for a specific vocalization/transformation combination. Conditions in which SRAF units show smaller penalty than A1 units are connected with cyan lines; conditions in which SRAF units show more penalty are connected by yellow lines. Mean penalty values for each brain area are marked with black arrows. A and B: generalization penalty when discriminating each vocalization from one other vocalization (pairwise classification). D and E: generalization penalty when discriminating each vocalization from all others (8-way classification). A and D: generalization penalty when generalization is performed only across the original vocalizations and one transformation at a time (per-transformation generalization). B and E: generalization penalty when generalization is performed across all eight transformations and the originals at once (all-transformation generalization). C and F: generalization penalty as function of the number of cells in ensemble. C: pairwise classification across all eight transformations, as in B. F: 8-way classification across all eight transformations, as in E. *P < 0.05; ***P < 0.001.

The greater generalization penalty for A1 as compared to SRAF was preserved for increasing numbers of neurons in the ensemble, as the discrimination performance improved and the relative difference between across- and within-transformation performance increased (Fig. 7, C and F). Taken together, we find that populations of SRAF units are better able to generalize across acoustic transformations of stimuli than populations of A1 units, as characterized by linear encoding of stimulus identity. These results suggest that populations of SRAF neurons are more invariant to transformations of auditory objects than populations of A1 neurons.

DISCUSSION

Our goal was to test whether and how populations of neurons in the auditory cortex represented vocalizations in an invariant fashion. We tested whether neurons in the nonprimary area SRAF exhibit greater invariance to simple acoustic transformations than do neurons in A1. To estimate invariance in neuronal encoding of vocalizations, we computed the difference in the ability of neuronal population codes to classify vocalizations between different types following acoustic distortions of vocalizations (Fig. 1). We found that, while neuronal populations in A1 and SRAF exhibited similar selectivity to vocalizations (Figs. 3–5), neuronal populations in SRAF exhibited higher invariance to acoustic transformations of vocalizations than in A1, as measured by a lower generalization penalty (Figs. 6 and 7). These results are consistent with the hypothesis that invariance arises gradually within the auditory pathway, with higher auditory areas exhibiting progressively higher invariance toward basic transformations of acoustic signals. An invariant representation at the level of population neuronal ensemble activity supports the ability to discriminate between behaviorally important sounds (such as vocalizations and speech) despite speaker variability and environmental changes.

We recently found that rat ultrasonic vocalizations can be parameterized as amplitude- and frequency-modulated tones, similar to whistles (Carruthers et al. 2013). Units in the auditory cortex exhibited selective responses to subsets of the vocalizations, and a model that relies on the amplitude- and frequency-modulation time course of the vocalizations could predict the responses to novel vocalizations. These results point to amplitude and frequency modulations as essential acoustic dimensions for encoding of ultrasonic vocalizations. Therefore, in this study, we tested four types of acoustic distortions based on basic transformations of these dimensions: temporal dilation, frequency shift, frequency modulation scaling, and combined temporal dilation and frequency modulation scaling. These transformations likely carry behavioral significance and might be encountered when a speaker's voice is temporally dilated, or be characteristic of different speakers (Fitch et al. 1997). While there is limited evidence that such transformations are typical in vocalizations emitted by rats, preliminary analysis of rat vocalizations revealed a large range of variability in these parameters across vocalizations.

Neurons throughout the auditory pathway have been shown to exhibit selective responses to vocalizations.
In response to ultrasonic vocalizations, neurons in the auditory midbrain exhibit a mix of selective and nonselective responses in rodents (Holmstrom et al. 2010; Pincherli Castellanos et al. 2007). At the level of A1, neurons across species respond strongly to conspecific vocalizations (Gehr et al. 2000; Glass and Wollberg 1983; Huetz et al. 2009; Medvedev and Kanwal 2004; Pelleg-Toiba and Wollberg 1991; Wallace et al. 2005; Wang et al. 1995). The specialization of neuronal responses for the natural statistics of vocalization has been under debate (Huetz et al. 2009; Wang et al. 1995). The avian auditory system exhibits strong specialization for natural sounds and conspecific vocalizations (Schneider and Woolley 2010; Woolley et al. 2005), and a similar hierarchical transformation has been observed between primary and secondary cortical analogs (Elie and Theunissen 2015). In rodents, specialized responses to USVs in A1 are likely context-dependent (Galindo-Leon et al. 2009; Liu et al. 2006; Liu and Schreiner 2007; Marlin et al. 2015). Therefore, extending our study to be able to manipulate the behavioral meaning of the vocalizations through training will greatly enrich our understanding of how the transformation that we observe contributes to auditory behavioral performance.

A1 neurons adapt to the statistical structure of the acoustic stimulus (Asari and Zador 2009; Blake and Merzenich 2002; Kvale and Schreiner 2004; Natan et al. 2015; Rabinowitz et al. 2013; Rabinowitz et al. 2011). The amplitude of frequency shift and the frequency modulation scaling coefficient were chosen on the basis of the range of the statistics of ultrasonic vocalizations that we recorded (Carruthers et al. 2013). These manipulations were designed to keep the statistics of the acoustic stimulus within the range of original vocalizations, in order to best drive responses in A1. Psychophysical studies in humans found that speech comprehension is preserved over temporal dilations up to a factor of 2 (Beasley et al. 1980; Dupoux and Green 1997; Foulke and Sticht 1969). Here, we used a scaling factor of 1.25 or 0.75, similar to previous electrophysiological studies (Gehr et al. 2000; Wang et al. 1995), and also falling within the statistical range of the recorded vocalizations. Furthermore, we included a stimulus in which frequency modulation scaling was combined with temporal dilation. This transformation was designed in order to preserve the velocity of frequency modulation from the original stimulus. The observed results exhibit robustness to the type of transformation that was applied to the stimulus, and are therefore likely generalizable to transformations of other acoustic features.

In order to quantify the invariance of population neuronal codes, we used the performance of automated classifiers as a lower bound for the information available in the population responses to original and transformed vocalizations. To assay generalization performance, we computed the difference between classifier performance on within- and across-transformation conditions. We expected this difference to be small for populations of neurons that generalized, and large for populations of neurons that did not exhibit generalization (Fig. 6A). Computing this measure was particularly important, as populations of A1 and SRAF neurons exhibited a great degree of variability in classification performance for both within- and across-transformation classification (Fig. 6, B–E). This variability is consistent with the known details about heterogeneity in neuronal cell types and connectivity in the mammalian cortex (Kanold et al. 2014).

Therefore, measuring the relative improvement in classification performance using the generalization penalty overcomes the limits of heterogeneity in performance.

In order to probe the transformation of representations from one brain area to the next, we decided to limit the classifiers to information that could be linearly decoded from population responses. For this reason, we chose to use linear support vector machines (SVMs, see METHODS) as classifiers. SVMs are designed to find robust linear boundaries between classes of vectors in a high-dimensional space. When trained on two sets of vectors, an SVM finds a hyperplane (a flat, infinite boundary) that provides the best separation between the two sets: a hyperplane that divides the space in two, assigning every vector on one side to the first set, and everything on the other side to the second. In this case, finding the best separation means a trade-off between having as many of the training vectors as possible be on the correct side, and giving the separating hyperplane as large of a margin (the distance between the hyperplane and the closest correctly classified vectors) as possible (Dayan and Abbott 2005; Vapnik 2000). The result is generally a robust, accurate decision boundary that can be used to classify a vector into one of the two sets. A linear classification can be viewed as a weighted summation of inputs, followed by a thresholding operation, a combination of actions that is understood to be one of the most fundamental computations performed by neurons in the brain (Abbott 1994; deCharms and Zador 2000). Therefore, examination of information via linear classifiers places a lower bound on the level of classification that could be accomplished during the next stage of neural processing.

Several mechanisms could potentially explain the increase in invariance we observe between A1 and SRAF. As previously suggested, cortical microcircuits in A1 can transform incoming responses into a more feature-invariant form (Atencio et al. 2009). By integrating over neurons with different tuning properties, higher level neurons can develop tuning to more specific conjunctions of features (becoming more selective), while exhibiting invariance to basic transformations. Alternatively, higher auditory brain areas may be better able to adapt to the basic statistical features of auditory stimuli, such that the neuronal responses would be sensitive to patterns of spectrotemporal modulation regardless of basic acoustic transformations. At the level of the midbrain, adaptation to the stimulus variance allows for invariant encoding of stimulus amplitude fluctuations (Rabinowitz et al. 2013). In the mouse inferior colliculus, neurons exhibit heterogeneous responses to ultrasonic vocalizations and their acoustically distorted versions (Holmstrom et al. 2010). At higher processing stages, as auditory processing becomes progressively multidimensional (Sharpee et al. 2011), adaptation could produce a neural code that could be more robustly decoded across stimulus transformations. More complex population codes may provide a greater amount of information in the brain (Averbeck et al. 2006; Averbeck and Lee 2004; Cohen and Kohn 2011). Extensions to the present study could be used to distinguish between invariance due to statistical adaptation and invariance due to feature independence in neural responses.

While our results support a hierarchical coding model for the representation of vocalizations across different stages of the auditory cortex, the observed changes may originate at the subcortical level, e.g., in the inferior colliculus (Holmstrom et al. 2010) or in differential thalamocortical inputs (Covic and Sherman 2011), and may already be encoded within specific groups of neurons or within different cortical layers within the primary auditory cortex.
Further investigation, including more selective recording and targeting of specific cell types, is required to pinpoint whether the transformation occurs throughout the pathway or within the canonical cortical circuit.

ACKNOWLEDGMENTS

We thank Drs. Y. Cohen, S. Eliades, M. Pagan, and N. Rust and members of the Geffen laboratory for helpful discussions on analysis, and L. Liu, L. Cheung, A. Davis, A. Nguyen, A. Chen, and D. Mohabir for technical assistance with experiments.

GRANTS

This work was supported by NIDCD Grants R03-DC-013660 and R01-DC-014479, a Klingenstein Foundation Award in Neurosciences, a Burroughs Wellcome Fund Career Award at the Scientific Interface, a Human Frontiers in Science Foundation Young Investigator Award, and a Pennsylvania Lions Club Hearing Research Fellowship to M. N. Geffen. I. M. Carruthers and A. Jaegle were supported by the Cognition and Perception IGERT training grant. A. Jaegle was also partially supported by the Hearst Foundation Fellowship. J. J. Briguglio was partially supported by NSF PHY-1058202 and US-Israel BSF 2011058.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

AUTHOR CONTRIBUTIONS

Author contributions: I.M.C., D.A.L., A.J., and M.N.G. conception and design of research; I.M.C., D.A.L., A.J., L.M.-T., R.G.N., and M.N.G. performed experiments; I.M.C., D.A.L., A.J., J.J.B., L.M.-T., R.G.N., and M.N.G. analyzed data; I.M.C., D.A.L., and M.N.G. interpreted results of experiments; I.M.C., D.A.L., J.J.B., and M.N.G. prepared figures; I.M.C. and M.N.G. drafted manuscript; I.M.C., D.A.L., A.J., J.J.B., L.M.-T., R.G.N., and M.N.G. approved final version of manuscript; D.A.L., A.J., J.J.B., R.G.N., and M.N.G. edited and revised manuscript.

REFERENCES

Abbott LF. Decoding neuronal firing and modelling neural networks. Q Rev Biophys 27: 291–331, 1994.
Aizenberg M, Geffen MN. Bidirectional effects of auditory aversive learning on sensory acuity are mediated by the auditory cortex. Nat Neurosci 16: 994–996, 2013.
Arnault P, Roger M. Ventral temporal cortex in the rat: connections of secondary auditory areas Te2 and Te3. J Comp Neurol 302: 110–123, 1990.
Asari H, Zador A. Long-lasting context dependence constrains neural encoding models in rodent auditory cortex. J Neurophysiol 102: 2638–2656, 2009.
Atencio C, Sharpee T, Schreiner C. Hierarchical computation in the canonical auditory cortical circuit. Proc Natl Acad Sci USA 106: 21894–21899, 2009.
Averbeck BB, Latham PE, Pouget A. Neural correlations, population coding and computation. Nat Rev Neurosci 7: 358–366, 2006.
Averbeck BB, Lee D. Coding and transmission of information by neural ensembles. Trends Neurosci 27: 225–230, 2004.
Barfield RJ, Auerbach P, Geyer LA, McIntosh TK. Ultrasonic vocalizations in rat sexual behavior. Am Zool 19: 469–480, 1979.
Beasley DS, Bratt GW, Rintelmann WF. Intelligibility of time-compressed sentential stimuli. J Speech Hear Res 23: 722–731, 1980.
Bialy M, Rydz M, Kaczmarek L. Precontact 50-kHz vocalizations in male rats during acquisition of sexual experience. Behav Neurosci 114: 983–990, 2000.
Bizley JK, Walker KM, Silverman BW, King AJ, Schnupp JW. Interdependent encoding of pitch, timbre, and spatial location in auditory cortex. J Neurosci 29: 2064–2075, 2009.