Temporal Envelope and Periodicity Cues on Musical Pitch Discrimination with Acoustic Simulation of Cochlear Implant

Lichuan Ping 1,2, Meng Yuan 1, Qinglin Meng 1,2 and Haihong Feng 1
1 Shanghai Acoustics Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Shanghai, 200032, China
2 Graduate University of Chinese Academy of Sciences, Beijing, 100190, China
pinglichuan@gmail.com
978-1-4244-5858-5/10/$26.00 © 2010 IEEE, ICALIP2010

Abstract

The present study investigated musical pitch discrimination with an acoustic simulation of a cochlear implant. A 4-channel noise-excited vocoder was used to simulate the signal processing of cochlear implants. Eight normal-hearing subjects participated in this study. A psychoacoustic pitch-direction discrimination test was carried out to examine the ability to detect the direction of a pitch change. Stimuli from four instruments (clarinet, trumpet, piano and violin) were used. Experimental results showed that temporal envelope and periodicity information was important for musical pitch discrimination when the spectral information was limited.

1. Introduction

The cochlear implant (CI) has been widely used to restore hearing in patients with severe to profound hearing loss. Although good speech understanding in quiet has been achieved with state-of-the-art CI devices, most CI users still have difficulty understanding speech in noise and perceiving tones and music [1]. Many CI users describe music as an unpleasant and noisy sound.

Pitch and timbre are important features of music. The discrimination of pitch-change direction is useful for music perception. It reflects the ability to determine whether the pitch is moving higher or lower. The perception of timbre is important for distinguishing different types of sound production, such as voices or musical instruments. Previous studies demonstrated that the identification of pitch-change direction and the recognition of instrument timbre were much less accurate in CI users than in normal-hearing (NH) listeners [2-5].

Gfeller et al. tested 49 CI subjects with synthesized piano tones. They found that the difference limens (DLs) of pitch-direction discrimination (PDD) ranged from 1 semitone to 2 octaves (24 semitones), with a mean of 7.56 semitones. In a recent study by Kang, 42 CI users were tested with synthesized complex tones [2]. The DLs ranged from 1 to 8 semitones, with a mean of 3 semitones. In contrast, the DL of PDD was about 1 semitone for NH listeners [3-4]. As for timbre, Gfeller et al. tested CI users with an open-set timbre recognition task. Their results showed that recognition performance was 47% correct [5]. In Kang's study, timbre recognition was about 45.3% correct in a closed-set task [4].

Drennan and Rubinstein suggested that the poor pitch perception and timbre identification of CI users was probably caused by the limited spectral and temporal information delivered by the CI system [6]. In current commercial CI devices, the signal processing strategies are all based on the vocoder-centric method [7]. In this method, the broad-band signal is filtered into several frequency bands (depending on the number of electrodes). The slowly varying envelope of the band-limited signal is extracted from each band and is used to modulate a fixed-rate pulse carrier. In such vocoder-centric strategies, spectral resolution is limited by the number of available electrodes (12-22). The temporal resolution is limited by the cutoff frequency (typically 200 or 400 Hz) of the envelope extracted in each band.

Rosen proposed dividing temporal information into three categories according to the rate of amplitude fluctuation: temporal envelope (below 50 Hz), periodicity (50 to 500 Hz), and fine structure (above 500 Hz) [8]. In other studies, the temporal cues have been roughly divided into two parts: the temporal envelope and periodicity component (TEPC, 2 to 500 Hz) and the temporal fine structure component (TFSC, 500 to 10000 Hz) [9].
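The rate ranges quoted above can be written down as a small lookup. This is only an illustrative sketch: the function names are hypothetical, and assigning exactly 50 Hz and 500 Hz to the periodicity range is an assumption, since the text does not say which category the boundary values belong to.

```python
def temporal_category(rate_hz: float) -> str:
    """Name the three-way category for an amplitude-fluctuation rate (Hz).

    Ranges follow the text: envelope below 50 Hz, periodicity 50 to 500 Hz,
    fine structure above 500 Hz. Boundary handling is an assumption.
    """
    if rate_hz < 0:
        raise ValueError("rate must be non-negative")
    if rate_hz < 50.0:
        return "envelope"
    if rate_hz <= 500.0:
        return "periodicity"
    return "fine structure"


def in_tepc(rate_hz: float) -> bool:
    """True if the rate lies in the combined TEPC range (2 to 500 Hz)."""
    return 2.0 <= rate_hz <= 500.0
```

Under these ranges, both reference F0s used later in the paper (77.78 Hz and 311.13 Hz) fall in the periodicity category and therefore inside the TEPC.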

Many studies have investigated the importance of temporal information for speech recognition [8-12]. It was shown that TEPC was important for tonal and non-tonal language perception when the spectral information was limited [10]. However, to our knowledge, no study has investigated the effect of TEPC on pitch or timbre perception.

The aim of this study was to investigate the effect of TEPC on musical pitch perception. A noise-excited vocoder was used to simulate the standard CI processing strategy, the Continuous Interleaved Sampling (CIS) strategy [12]. A psychoacoustic experiment was conducted with NH subjects. DLs for stimuli from different instruments and base frequencies were compared, and the effect of TEPC on musical pitch discrimination was analyzed.

2. Experiment Design

2.1 Subjects

Eight subjects aged 22 to 28 years (mean = 25.875 years; SD = 2.5 years) participated in the experiment. None of them had a history of hearing disorders. Their audiometric thresholds were better than 20 dB HL at octave frequencies from 125 to 8000 Hz in both ears.

2.2 Test materials

The test stimuli came from four instruments (clarinet, trumpet, piano and violin), representing different instrumental families. They were synthesized with a professional MIDI keyboard. Each stimulus was 350 ms long, with 25 ms on- and off-ramps. Before signal processing, the intensity of all stimuli was balanced by root-mean-square (RMS) equalization.

2.3 Signal processing

A 4-channel noise-excited vocoder was used to simulate the CI sound processing [9-11]. As shown in Figure 2, the original stimuli were first pre-amplified for spectral equalization by a 1st-order high-pass Butterworth filter with a cutoff frequency of 1 kHz. The signal was then band-pass filtered into 4 contiguous frequency channels (4th-order Butterworth filters) between 125 and 4000 Hz. The filters were designed using the Greenwood function [12]. To avoid differences in group delay between filters, zero-phase digital filtering was performed. The TEPCs were extracted by half-wave rectification and low-pass filtering (4th-order Butterworth filter) at 500 Hz. The TEPCs were then used to amplitude-modulate independent white-noise carriers. Finally, the acoustic stimuli were generated by combining the modulated signals from all bands. The acoustic simulation was programmed in MATLAB.

2.4 Procedure

The PDD test was implemented as a two-alternative forced-choice (2AFC) task with 3-up, 1-down adaptive tracking [13]. In each presentation, one tone with the reference F0 (base frequency) and one target tone with a higher F0 were played in random order. The F0 difference between the reference tone and the target tone was determined by the adaptive step; the minimum adaptive step was 1 semitone. The DL for each reference tone was calculated as the mean of the last 6 reversals (out of 12 reversals in total) in each run.

Two tones (D#2 at 77.78 Hz and D#4 at 311.13 Hz) were used as reference tones to test the effect of base frequency. Four instruments (trumpet, clarinet, violin and piano) were chosen to test the effect of timbre. The stimuli were played to the listeners through Sennheiser HD650 headphones. The presentation levels were randomized within a range of 65 to 70 dB SPL to minimize the effect of loudness cues on pitch discrimination. The experiment was carried out in a sound-proof booth, using a graphical user interface (GUI) on a laptop. The test conditions comprised base frequency and timbre. The test order of all conditions was randomized across subjects.

3. Result

For each condition, the DLs of the PDD test were evaluated over all subjects. The best score that could be achieved was one semitone, since this was the minimum adaptive step. Figure 3 shows the DLs of the PDD test (the black line in each box represents the median DL for each base frequency and timbre). The results for different timbres are displayed separately. The mean DL for the piano at D#2 was consistently higher than in the other conditions.
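The signal processing described in Section 2.3 can be sketched in Python with SciPy (the original simulation was written in MATLAB and is not reproduced here). Several details are assumptions rather than facts from the paper: the Greenwood constants `A = 165.4` and `k = 1` (chosen because they give a third channel near 934 to 1975 Hz, the band quoted in the captions of Figures 4 and 5), the reading of "4th-order" as the `N` argument of `butter`, the single-pass pre-emphasis filter, and the optional `refilter` step, which the paper does not mention. The function and parameter names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfilt, sosfiltfilt


def greenwood_edges(f_lo=125.0, f_hi=4000.0, n_channels=4, A=165.4, k=1.0):
    """Band edges equally spaced in cochlear position (Greenwood map).

    Greenwood: F = A * (10**(a*x) - k), so a*x = log10(F/A + k).
    Equal steps in x are equal steps in log10(F/A + k); 'a' cancels out.
    A and k are assumed values, not taken from the paper.
    """
    pos = np.linspace(np.log10(f_lo / A + k), np.log10(f_hi / A + k),
                      n_channels + 1)
    return A * (10.0 ** pos - k)


def noise_vocoder(x, fs, n_channels=4, env_cutoff=500.0, refilter=False, seed=0):
    """Noise-excited vocoder sketch: band-pass, rectify, low-pass, modulate noise."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)

    # 1st-order high-pass at 1 kHz for spectral equalization (single pass; assumed).
    pre = butter(1, 1000.0, btype="highpass", fs=fs, output="sos")
    x = sosfilt(pre, x)

    # Envelope smoother: Butterworth low-pass at the TEPC cutoff (500 Hz).
    lowpass = butter(4, env_cutoff, btype="lowpass", fs=fs, output="sos")

    edges = greenwood_edges(n_channels=n_channels)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        bandpass = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        # Forward-backward filtering gives zero phase, so all bands stay aligned.
        band = sosfiltfilt(bandpass, x)
        # Half-wave rectification followed by low-pass filtering yields the TEPC.
        env = sosfiltfilt(lowpass, np.maximum(band, 0.0))
        env = np.maximum(env, 0.0)  # clip small negative ripple from the filter
        # Independent white-noise carrier for each band.
        modulated = env * rng.standard_normal(x.size)
        if refilter:
            # Optional band-limiting of the modulated noise (not in the paper).
            modulated = sosfiltfilt(bandpass, modulated)
        out += modulated
    return out
```

For a 350 ms stimulus sampled at 16 kHz, `noise_vocoder(x, 16000)` returns an array of the same length. Note that zero-phase filtering applies each filter twice, so the effective attenuation is steeper than the nominal order suggests.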

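The adaptive procedure of Section 2.4 can be sketched as follows. The paper gives only the rule name, the 1-semitone minimum step and the reversal counts, so the rest is assumed: the rule is read here as "three consecutive correct responses make the task harder, one error makes it easier", the starting difference is 12 semitones, and the step is fixed at 1 semitone. The `respond` callback and all function names are hypothetical.

```python
def target_f0(ref_hz, semitones):
    """F0 of a tone lying `semitones` above the reference (equal temperament)."""
    return ref_hz * 2.0 ** (semitones / 12.0)


def run_staircase(respond, start=12, step=1, floor=1,
                  n_reversals=12, n_average=6, max_trials=1000):
    """Adaptive track over the F0 difference, in semitones.

    respond(delta) must return True when the listener labels the pitch
    direction correctly for a difference of `delta` semitones.
    Returns the mean of the last `n_average` reversal levels.
    """
    delta = start
    streak = 0        # consecutive correct responses at the current level
    direction = 0     # -1: difference shrinking, +1: growing, 0: no move yet
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(delta):
            streak += 1
            if streak < 3:
                continue          # need three in a row before getting harder
            streak = 0
            move = -1
        else:
            streak = 0
            move = +1
        if direction != 0 and move != direction:
            reversals.append(delta)   # the track changed direction here
        direction = move
        delta = max(floor, delta + move * step)
    if len(reversals) < n_reversals:
        # For example, a listener who is always correct stays at the floor.
        raise RuntimeError("not enough reversals within max_trials")
    return sum(reversals[-n_average:]) / n_average
```

With an idealized listener who is correct whenever the difference is at least 3 semitones, `run_staircase(lambda d: d >= 3)` alternates between 2 and 3 semitones and returns 2.5. In a real session, `target_f0(77.78, delta)` would give the F0 of the target tone for the D#2 reference.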
Figure 2. Block diagram of the acoustic simulation of CI sound processing.

Figure 3. PDD test results of NH subjects listening to the acoustic simulation.

A two-way ANOVA was carried out with base frequency (D#2 and D#4) and timbre (trumpet, piano, violin and clarinet) as factors. The analysis revealed significant main effects of both factors (p<0.05). The interaction between the two factors was also significant (p<0.05). Post-hoc comparison revealed that only the piano at D#2 was significantly different from the other conditions (p<0.05). In the control session, the DLs for NH subjects were at the lower boundary (1 semitone).

4. Discussion

Acoustically, pitch can be measured in terms of F0, which quantifies the periodicity of the signal. Perceiving the pitch of a complex tone requires the ability to extract F0-related information, including F0 and its harmonics. Pitch perception is thought to combine two mechanisms, described by place theory and temporal theory. According to place theory, pitch can be perceived from the resolved F0 and low-order harmonics in the spectrum. According to temporal theory, pitch can also be perceived from the amplitude fluctuation of the unresolved high-order harmonics [14].

In this study, the spectral resolution was limited by the number of channels in the noise-excited vocoder (4 channels), which limited the pitch information delivered by resolved harmonics. The temporal resolution was limited by the cutoff frequency of the low-pass filter (500 Hz), which retains the pitch-related periodicity in the temporal fluctuations. This design was used to investigate the effect of the TEPC on musical pitch perception. The TEPCs were expected to encompass F0 and some low-order harmonics. In the present study, the piano at D#2 was significantly different from the other conditions. This indicates that the TEPCs in this particular condition did not contain enough useful information for pitch perception.
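As a rough check on which F0-related components could survive the 500 Hz envelope cutoff, the harmonics of each reference F0 below that cutoff can be counted. This is a simplified sketch: it treats the cutoff as a brick wall, whereas a Butterworth filter rolls off gradually, and the function name is hypothetical.

```python
def harmonics_below(f0_hz, cutoff_hz=500.0):
    """Harmonic frequencies (Hz) of f0 that do not exceed the cutoff."""
    n = int(cutoff_hz // f0_hz)   # highest harmonic number at or below the cutoff
    return [round(h * f0_hz, 2) for h in range(1, n + 1)]


d_sharp_2 = harmonics_below(77.78)    # six components, 77.78 Hz up to 466.68 Hz
d_sharp_4 = harmonics_below(311.13)   # only the fundamental, 311.13 Hz
```

For D#2 the first six harmonics lie below 500 Hz, while for D#4 only F0 itself does, because the second harmonic (622.26 Hz) is already above the cutoff.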

Figure 4. The TEPCs (left) from the third channel (934.4 Hz to 1975 Hz) and their corresponding spectra (right) for clarinet, piano, trumpet and violin with the lower F0 (D#2, 77.78 Hz).

Figure 5. The TEPCs (left) from the third channel (934.4 Hz to 1975 Hz) and their corresponding spectra (right) for clarinet, piano, trumpet and violin with the higher F0 (D#4, 311.13 Hz).

Figure 4 and Figure 5 show the TEPCs and their corresponding spectra from the third channel (934.4 Hz to 1975 Hz). The four instruments (clarinet, piano, trumpet and violin) with the lower F0 (D#2) are presented in Panels (A), (B), (C) and (D) of Figure 4. The TEPC of the piano with the lower F0 was highly irregular compared with the other TEPCs. The periodicity in the TEPCs reflects the F0 information of the stimuli. The F0 (D#2 at 77.78 Hz) and its low-order harmonics could be easily distinguished in the spectra of the clarinet, trumpet and violin. However, no F0-related information was found in the piano spectrum. Figure 5 presents the four instruments (clarinet, piano, trumpet and violin) with the higher F0 (D#4) in Panels (A), (B), (C) and (D). The F0 (D#4, 311.13 Hz) could be easily distinguished in the spectra of the clarinet, trumpet, violin and piano.

The contribution of TEPC to speech recognition has been widely studied using acoustic simulation with NH subjects. Shannon found that NH subjects could achieve good English recognition scores in quiet with limited temporal envelope information (below 50 Hz) [15]. Xu showed that Mandarin tone recognition improved noticeably when the cutoff frequency of the temporal envelope was increased from 50 to 500 Hz [16]. Generally speaking, TEPC can provide F0-related information for good speech recognition in quiet. However, music is far more complex than speech and requires much finer pitch perception. Although the current study found relatively good musical pitch discrimination with temporal information in most cases, this limited temporal cue may not be adequate for music perception, which is more complex than musical pitch discrimination. Further investigation of the contribution of other important cues is needed.

5. Conclusion

In this study, the effect of TEPC on musical pitch discrimination was investigated with an acoustic simulation of a cochlear implant. The results showed that (i) the characteristics of the instruments influenced pitch discrimination, and (ii) TEPC was important for musical pitch discrimination when the spectral cues were limited. This indicates that the limited spectral and temporal resolution provided by current CIs may not support accurate musical pitch perception. A new design of the CI system may be required.

Acknowledgements

We are grateful to all the NH listeners for participating in this study. This work has been supported in part by the Chinese Academy of Sciences Pilot Project of the Knowledge Innovation Program (KGCX2-YX-607) and Key Projects in the National Science & Technology Pillar Program in the Eleventh Five-year Plan Period (2008BAI50B08).

References
[1] Zeng, F. G., Rebscher, S., Harrison, W., Sun, X., Feng, H., Cochlear Implants: System Design, Integration, and Evaluation, IEEE Reviews in Biomedical Engineering, vol. 1, 2008, pp. 115-142.
[2] Gfeller, K., Turner, C., Mehr, M., Woodworth, G., Fearn, R., Knutson, J. F., Witt, S., Stordahl, J., Recognition of familiar melodies by adult cochlear implant recipients and normal-hearing adults, Cochlear Implants Int., vol. 3(1), 2002, pp. 29-53.
[3] Kang, R. S., Nimmons, G. L., Drennan, W. R., et al., Development and validation of the University of Washington Clinical Assessment of Music Perception test, Ear Hear., vol. 30(4), 2009, pp. 411-418.
[4] Gfeller, K., Witt, S., Woodworth, G., Mehr, M. A., Knutson, J., Effects of frequency, instrumental family, and cochlear implant type on timbre recognition and appraisal, Ann. Otol. Rhinol. Laryngol., vol. 111, 2002, pp. 349-356.
[5] Drennan, W. R., and Rubinstein, J. T., Music perception in cochlear implant users and its relationship with psychophysical capabilities, J. Rehabil. Res. Dev., vol. 45(5), 2008, pp. 779-789.
[6] Loizou, P., Speech processing in vocoder-centric cochlear implants, Adv. Otorhinolaryngol., Basel, Karger, vol. 64, 2006, pp. 109-143.
[7] Rosen, S., Temporal information in speech: Acoustic, auditory and linguistic aspects, Philos. Trans. R. Soc. London, Ser. B, vol. 336, 1992, pp. 367-373.
[8] Kong, Y.-Y., and Zeng, F.-G., Temporal and spectral cues in Mandarin tone recognition, J. Acoust. Soc. Am., vol. 120, 2006, pp. 2830-2840.
[9] Yuan, M., et al., Cantonese tone recognition with enhanced temporal periodicity cues, J. Acoust. Soc. Am., vol. 126(1), 2009, pp. 327-337.
[10] Green, T., Faulkner, A., and Rosen, S., Spectral and temporal cues to pitch in noise-excited vocoder simulations of continuous-interleaved sampling cochlear implants, J. Acoust. Soc. Am., vol. 112, 2002, pp. 2155-2164.
[11] Green, T., Faulkner, A., and Rosen, S., Enhancing temporal cues to voice pitch in continuous interleaved sampling cochlear implants, J. Acoust. Soc. Am., vol. 116, 2004, pp. 2298-2310.
[12] Greenwood, D. D., A cochlear frequency-position function for several species - 29 years later, J. Acoust. Soc. Am., vol. 87(6), 1990, pp. 2592-2605.
[13] Levitt, H., Transformed up-down methods in psychoacoustics, J. Acoust. Soc. Am., vol. 49, 1971, pp. 467-477.
[14] Plack, C. J., Oxenham, A. J., Fay, R. R., Popper, A. N., Pitch: Neural coding and perception, New York: Springer, 2005, pp. 7-55.
[15] Shannon, R. V., Zeng, F.-G., Kamath, V., Wygonski, J., Ekelid, M., Speech recognition with primarily temporal cues, Science, vol. 270, 1995, pp. 303-304.
[16] Xu, L., Tsai, Y., and Pfingst, B. E., Features of stimulation affecting tonal-speech perception: Implications for cochlear prostheses, J. Acoust. Soc. Am., vol. 112, 2002, pp. 247-258.