Proceedings of Meetings on Acoustics
Volume 19, 2013
http://acousticalsociety.org/

ICA 2013 Montreal
Montreal, Canada
2-7 June 2013

Musical Acoustics
Session 3pMU: Perception and Orchestration Practice

3pMU4. Predicting blend between orchestral timbres using generalized spectral-envelope descriptions

Sven-Amin Lembke*, Eugene Narmour and Stephen McAdams

*Corresponding author's address: Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT), Schulich School of Music, McGill University, Montréal, H3A 1E3, Québec, Canada, sven-amin.lembke@mail.mcgill.ca

Composers rely on implicit knowledge of instrument timbres to achieve certain effects in orchestration. In the context of perceptual blending between orchestral timbres, holistic acoustical descriptions of instrument-specific traits can assist in the selection of suitable instrument combinations. The chosen mode of description utilizes spectral-envelope estimates that are acquired as pitch-invariant descriptions of instruments at different dynamic markings. Prominent local spectral-envelope traits, such as spectral maxima or formants, have been shown to influence timbre blending, involving frequency relationships between local spectral features, their prominence as formants, and constraints imposed by the human auditory system. We present computational approaches to predict timbre blend that are based on these factors and explain around 85% of the variance in behavioral timbre-blend data. Multiple linear regression is employed in modeling a range of behavioral data acquired in different experimental investigations. These include parametric investigations of formant frequency and magnitude relationships as well as arbitrary combinations of recorded instrument audio samples in dyads or triads. The cataloguing of generalized acoustical descriptions of instruments and associated timbre-blend predictions for various instrument combinations could serve as a valuable aid to orchestration practice in the future.

Published by the Acoustical Society of America through the American Institute of Physics
© 2013 Acoustical Society of America [DOI: 10.1121/1.4800054]
Received 28 Jan 2013; published 2 Jun 2013
Proceedings of Meetings on Acoustics, Vol. 19, 035053 (2013)

INTRODUCTION

When orchestrators seek blended timbres of simultaneously sounding instruments, they rely on experimentation, prior experience, or examples from the musical repertoire. Moreover, suitable instrument combinations for blend are widely discussed in orchestration treatises, which are themselves based on subjective observations made by their authors [1, 2, 3]. Therefore, an acoustical description of instrument-specific traits across extended pitch ranges could present a valuable tool to orchestrators in allowing objective predictions of blend between arbitrary instrument combinations.

Important perceptual cues for blend are known to be based on note-onset synchrony, partial-tone harmonicity, and spectral features [4, 5]. The first two factors mainly involve requirements that have to be fulfilled by the musical composition itself and also demand its precise execution during musical performance. In contrast, an orchestrator's choice of blending instruments is more likely motivated by spectral features of particular instruments.

Spectral-Envelope Description

Spectral-envelope representations aid the identification and description of prominent spectral features which characterize individual instruments and could serve as instrument-specific traits. In the context of orchestral wind instruments, previous studies have suggested the perceptual relevance of pitch-invariant spectral traits that characterize their timbre. Stable local spectral maxima across a wide pitch range, termed formants by analogy with the human voice, have been reported for these instruments [6, 7]. Furthermore, frequency alignment of formants between instruments has been argued to contribute to the percept of blend [8]. Certain aspects of this hypothesis have been replicated in perceptual investigations, showing that relative frequency location and magnitude difference of main formants are critical to blend [9].

Pitch-invariant spectral traits such as formants can be identified through an empirical spectral-envelope estimation method. Spectral envelopes are estimated by applying a curve-fitting procedure to composite distributions of partial tones compiled across the entire pitch range of instruments. Figure 1 shows such an estimate for the bassoon, exhibiting a prominent main formant at 500 Hz.

FIGURE 1: Empirical spectral-envelope estimate for bassoon at mf, derived from a composite distribution of partial tones across the instrument's entire pitch range. (Axes: frequency in Hz, 0 to 5000; power spectral density in dB. Plotted: composite partial-tone distribution and spectral-envelope estimate.)

In most cases, spectral-envelope shape varies as a function of the dynamic marking, and as a result, spectral-envelope descriptions should be assessed separately for different dynamics. However, as shown in Figure 2, the frequency location and shape of the main formant of the bassoon still appear to be quite robust to changes in dynamics. This points to a potential utility of main formants serving as stable perceptual cues, which remain largely unaffected by musical performance. In summary, it can be reasonably assumed that such generalized, instrument-specific spectral-envelope descriptions could represent reliable predictors of blend.

FIGURE 2: Temporal spectral-envelope evolution of bassoon playing crescendo-decrescendo on pitch G3, computed with True-Envelope estimation [10].

MODELLING TIMBRE BLEND BASED ON SPECTRAL FEATURES

Computational models may be used as objective tools to predict timbre blend between arbitrary instrument combinations. Linear correlation can associate behavioral blend measures with single acoustical features [5, 11], but it does not assess whether a combination of descriptors could model the behavioral data. Modelling the data on multiple descriptor variables would furthermore assess the relative contributions of different acoustical features to blend. Past attempts utilizing stepwise-regression models have succeeded in explaining up to 63% of the variance in behavioral blend ratings [5].

This investigation considers the multivariate option by employing linear multiple regression with a stepwise iteration scheme. A number of spectral-envelope features are tested as potential regressors. These comprise global descriptors of spectral-envelope traits, such as spectral centroid and spectral slope, as well as local spectral-envelope descriptors characterizing formant frequency location and magnitude. For example, the formant descriptors include the frequency at which the formant maximum is located as well as frequency bounds below and above the maximum at which the magnitude has decreased by 3 dB or 6 dB.

Modelled Data Sets

To attain greater generalizability, timbre-blend predictions are assessed across three independent data sets of behavioral blend ratings, denoted A, B and C. The three sets differ with regard to the behavioral rating methods and the utilized stimuli (see Table 1).
Set B involves a single rating per experimental trial, employing the entire range of the rating scale on a global level, i.e., across all trials. In contrast, sets A and C are based on trials involving multiple stimuli and a corresponding number of ratings, with participants asked to employ the entire scale range on a local level, i.e., relative to the stimuli presented within a given trial. In addition, set A stems from dyads between synthesized analogues of particular instruments and their audio-sample counterparts, whereas sets B and C are based on arbitrary combinations between sampled instruments.
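
The spectral descriptors named in the modelling section (spectral centroid, formant maximum, and the frequency bounds at which the magnitude has dropped by 3 dB or 6 dB) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that an envelope is available as levels in dB on a frequency grid, that the main formant is the global maximum of that envelope, that the centroid is weighted by linear power, and that bound frequencies are found by linear interpolation. The function names `spectral_centroid` and `formant_descriptors` and the toy envelope are hypothetical.

```python
def spectral_centroid(freqs_hz, env_db):
    """Power-weighted mean frequency of an envelope given in dB (assumed weighting)."""
    weights = [10.0 ** (level / 10.0) for level in env_db]  # dB -> linear power
    return sum(f * w for f, w in zip(freqs_hz, weights)) / sum(weights)


def _crossing(f0, l0, f1, l1, target):
    """Linearly interpolate the frequency at which the level equals target."""
    return f0 + (f1 - f0) * (target - l0) / (l1 - l0)


def formant_descriptors(freqs_hz, env_db, drop_db=3.0):
    """Return (f_max, f_lower, f_upper) for the main formant.

    Assumption: the main formant is the global maximum of the envelope.
    A bound is None if the envelope never falls by drop_db on that side.
    """
    peak = max(range(len(env_db)), key=lambda i: env_db[i])
    target = env_db[peak] - drop_db

    lower = None
    for i in range(peak, 0, -1):  # walk towards lower frequencies
        if env_db[i - 1] <= target:
            lower = _crossing(freqs_hz[i], env_db[i],
                              freqs_hz[i - 1], env_db[i - 1], target)
            break

    upper = None
    for i in range(peak, len(env_db) - 1):  # walk towards higher frequencies
        if env_db[i + 1] <= target:
            upper = _crossing(freqs_hz[i], env_db[i],
                              freqs_hz[i + 1], env_db[i + 1], target)
            break

    return freqs_hz[peak], lower, upper


# Toy envelope (illustrative only): a triangular formant peaking at 500 Hz
# and falling by 3 dB per 100 Hz on either side.
freqs = [100.0 * i for i in range(11)]  # 0 ... 1000 Hz
env = [-3.0 * abs(f - 500.0) / 100.0 for f in freqs]

f_max, f_lo3, f_hi3 = formant_descriptors(freqs, env, drop_db=3.0)
_, f_lo6, f_hi6 = formant_descriptors(freqs, env, drop_db=6.0)
centroid = spectral_centroid(freqs, env)
print(f_max, (f_lo3, f_hi3), (f_lo6, f_hi6), centroid)
```

For a pair of instruments, descriptors of this kind would be computed per envelope and then related to each other, for example as the absolute difference between the two spectral centroids.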

TABLE 1: Behavioral data sets used for regression models and their differences in rating method and stimulus type.

Set   Rating scale   Rated stimuli per trial   Stimuli
A     local          4                         dyads, wind instruments
B     global         1                         dyads, wind instruments
C     local          20                        triads, wind and string instruments

Preliminary Results

Only the models for data set A have been explored so far, as data sets B and C are still being acquired. Therefore, only preliminary results can be reported at this stage. Data set A originates from a parametric investigation of relative frequency and magnitude relationships between the main formants of a variable, synthesized sound and its sampled counterpart [9]. The investigated instruments are flute, oboe, B♭ clarinet, bassoon, C trumpet and French horn.

Identical multiple regression solutions are obtained for two instrument subsets, based on a total of 176 cases. Both models explain around 85% of the variance in data set A [instrument subset 1: $R^2_{\mathrm{adj}} = .86$, $F(3, 116) = 250.42$, $p < .0001$; subset 2: $R^2_{\mathrm{adj}} = .86$, $F(3, 52) = 117.62$, $p < .0001$]. The models rely on two spectral regressors: 1) a formant-based descriptor relating spectral-envelope magnitude differences between the upper 3 dB frequency bounds, and 2) the absolute difference in spectral centroid. Notably, the local spectral-envelope descriptor makes a much stronger contribution than the global descriptor based on the spectral centroid, with the standardized beta coefficients for the former being about five times larger.

CONCLUSION

Generalized, instrument-specific spectral-envelope descriptions can be shown to predict behavioral timbre blend to a promisingly high degree. The exploration of regression models on the remaining data sets will clarify the preliminary trends as well as expand the blend-prediction scenarios to arbitrary instrument combinations in dyads and triads, in the latter case even involving string instruments.
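
As a cross-check of the fit statistics reported under Preliminary Results, the sketch below recovers the explained variance and the number of cases from the reported F statistics. It is not part of the original analysis. It relies on the standard identities for ordinary least-squares regression with an intercept, and it takes the reported degrees of freedom (3 and 116, 3 and 52) at face value. The function names are hypothetical.

```python
def r2_from_f(f_value, df_model, df_resid):
    """Recover R^2 from an overall F statistic: R^2 = F*df1 / (F*df1 + df2)."""
    return f_value * df_model / (f_value * df_model + df_resid)


def adjusted_r2(r2, n_cases, n_regressors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_cases - 1) / (n_cases - n_regressors - 1)


def cases_from_df(df_model, df_resid):
    """Number of cases implied by the degrees of freedom, assuming an intercept."""
    return df_model + df_resid + 1


# Reported statistics as (F, model df, residual df) for the two instrument subsets.
reported = [(250.42, 3, 116), (117.62, 3, 52)]

total_cases = 0
adjusted = []
for f_value, df_model, df_resid in reported:
    n = cases_from_df(df_model, df_resid)
    r2 = r2_from_f(f_value, df_model, df_resid)
    adjusted.append(adjusted_r2(r2, n, df_model))
    total_cases += n

print([round(a, 2) for a in adjusted], total_cases)
```

Under these assumptions both subsets give an adjusted $R^2$ that rounds to .86, and the implied subset sizes add up to the 176 cases stated in the text.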

The joint evaluation of prediction models for all three data sets will allow more generalizable prediction approaches based on spectral features to be derived. This will also involve the consideration of auditory-model representations. Ultimately, the prediction models are intended to contribute to establishing a generalized perceptual theory of blend with respect to spectral features. At the same time, this work will motivate the cataloguing of holistic acoustical descriptions of instruments that allow timbre-blend predictions to be made for arbitrary instrument combinations, which could serve as a valuable aid to orchestration practice.

ACKNOWLEDGMENTS

The authors would like to thank Bennett Smith for his assistance in the setup of perceptual testing hardware in general and for programming the software interfaces to acquire the behavioral data sets B and C. We would also like to thank Kyra Parker and Emma Kast for their assistance in running the behavioral experiments leading to data sets B and C. This work was supported by a Schulich School of Music scholarship to SAL and grants from the Canadian Natural Sciences and Engineering Research Council and the Canada Research Chairs program to SM.

REFERENCES

[1] N. Rimsky-Korsakov, Principles of orchestration (Dover Publications, New York) (1964).
[2] C. Koechlin, Traité de l'orchestration : en quatre volumes (M. Eschig, Paris) (1954).
[3] C. Reuter, Klangfarbe und Instrumentation: Geschichte - Ursachen - Wirkung, Systemische Musikwissenschaft (Peter Lang, Frankfurt am Main) (2002).
[4] G. J. Sandell, Concurrent timbres in orchestration: a perceptual study of factors determining blend (Northwestern University) (1991).
[5] G. J. Sandell, Roles for Spectral Centroid and Other Factors in Determining "Blended" Instrument Pairings in Orchestration, Music Perception 13, 209–246 (1995).
[6] K. E. Schumann, Physik der Klangfarben - Vol. 2, professorial dissertation, Universität Berlin, Berlin (1929).
[7] D. Luce and J. Clark, Physical Correlates of Brass-Instrument Tones, The Journal of the Acoustical Society of America 42, 1232–1243 (1967).
[8] C. Reuter, Die auditive Diskrimination von Orchesterinstrumenten - Verschmelzung und Heraushörbarkeit von Instrumentalklangfarben im Ensemblespiel (Peter Lang, Frankfurt am Main) (1996).
[9] S.-A. Lembke and S. McAdams, Timbre blending of wind instruments: acoustics and perception, in Proc. 5th International Conference of Students of Systematic Musicology / SysMus12, 1–5 (Montreal, Canada) (2012).
[10] F. Villavicencio, A. Röbel, and X. Rodet, Improving LPC Spectral Envelope Extraction Of Voiced Speech By True-Envelope Estimation, in 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, I-869–I-872 (2006).
[11] D. Tardieu and S. McAdams, Perception of dyads of impulsive and sustained instrument sounds, Music Perception 30, 117–128 (2012).