Semantic description of timbral transformations in music production


Semantic Description of Timbral Transformations in Music Production
Stables, R.; De Man, B.; Enderby, S.; Reiss, J. D.; Fazekas, G.; Wilmering, T. (2016). Copyright held by the owner/author(s).

This is a pre-copyedited, author-produced version of an article accepted for publication in MM '16: Proceedings of the 2016 ACM Conference on Multimedia, following peer review. The version of record is available at http://dl.acm.org/citation.cfm?id=2967238. For additional information about this publication, see http://qmro.qmul.ac.uk/xmlui/handle/123456789/22150.

Information about this research object was correct at the time of download; we occasionally make corrections to records, so please check the published record when citing. For more information, contact scholarlycommunications@qmul.ac.uk.

Semantic Description of Timbral Transformations in Music Production

Ryan Stables, Digital Media Technology Lab, Birmingham City University, Birmingham, UK (ryan.stables@bcu.ac.uk)
Sean Enderby, Digital Media Technology Lab, Birmingham City University, Birmingham, UK (sean.enderby@mail.bcu.ac.uk)
Brecht De Man, Queen Mary University of London, London, UK (b.deman@qmul.ac.uk)
Joshua D. Reiss, Queen Mary University of London, London, UK (joshua.reiss@qmul.ac.uk)
György Fazekas, Queen Mary University of London, London, UK (g.fazekas@qmul.ac.uk)
Thomas Wilmering, Queen Mary University of London, London, UK (t.wilmering@qmul.ac.uk)

ABSTRACT
In music production, descriptive terminology is used to define perceived sound transformations. By understanding the underlying statistical features associated with these descriptions, we can aid the retrieval of contextually relevant processing parameters using natural language, and create intelligent systems capable of assisting in audio engineering. In this study, we present an analysis of a dataset containing descriptive terms gathered using a series of processing modules embedded within a Digital Audio Workstation. By applying hierarchical clustering to the audio feature space, we show that similarity in term representations exists within and between transformation classes. Furthermore, the organisation of terms in low-dimensional timbre space can be explained using perceptual concepts such as size and dissonance. We conclude by performing Latent Semantic Indexing to show that similar groupings exist based on term frequency.

CCS Concepts
• Information systems → Information systems applications; Multimedia information systems; Multimedia databases

Keywords
Semantic Audio, Timbre, Music Production, Hierarchical Clustering, Dimensionality Reduction

MM '16, October 15-19, 2016, Amsterdam, Netherlands. © 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM. ISBN 978-1-4503-3603-1/16/10. DOI: http://dx.doi.org/10.1145/2964284.2967238

1. INTRODUCTION
Musical timbre refers to the properties of a sound, other than loudness and pitch, which allow it to be distinguished from other sounds [8]. Loudness and pitch can easily be measured in low-dimensional space, allowing sounds to be ordered from quiet to loud or low to high in frequency, whereas timbre is a more complex property of sound, requiring multiple dimensions [11]. To characterise perceptual attributes of musical timbre, listeners often use semantic descriptors such as bright, rough or sharp to describe these latent dimensions [5]. A widely cited definition of timbre [1] shows it is determined by a range of low-level features of an audio signal, where both the spectral content and the temporal characteristics affect the perceived timbre of a sound. Signal analysis techniques can be used to extract information about these elements of a signal.
The contribution of these low-level features to perceived timbre is often the focus of academic research, whereby dimensionality reduction techniques allow for the organisation of terms in an underlying subspace, with the intention of discovering some perceptually relevant representation of the data [2, 4, 6, 17]. In music production this is of particular interest, as it can allow for the manipulation of audio processing modules comprising multiple parameters via intuitive, low-dimensional controls [3, 12, 14, 15].

In this paper we report our findings from the Semantic Audio Feature Extraction (SAFE) project [13], and show that semantic descriptions of musical timbre can be grouped using both parameter and feature space representations, and can exhibit timbral similarities within and across audio processing types. We investigate the use of timbral descriptors to aid the retrieval of contextually relevant processing parameters given natural language descriptions of audio transformations. This allows for the development of intuitive and assistive music production interfaces based on descriptive cues.

2. SAFE
The Semantic Audio Feature Extraction (SAFE) plug-ins provide music producers with a platform to describe timbral transformations in a Digital Audio Workstation (DAW) using natural language [13]. The plug-ins (referred to herein as transform classes) consist of a five-band parametric equaliser, a dynamic range compressor, an amplitude distortion and a reverb effect. (Plug-ins and datasets are available at semanticaudio.co.uk.) When a timbral transformation is recorded, the system extracts: the descriptive terminology relating to the transform; a large set of temporal, spectral and abstracted audio features, taken across a number of frames of the audio signal both before and after processing (see [9] for a full list); the name and parameter settings of the audio effect; and a list of additional user data such as age, location, production experience, genre and instrument. This information is stored in an RDF triple store using an empirically designed ontology.
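To make the data pipeline concrete, here is a minimal sketch of the kind of before/after feature logging described above, using librosa as a stand-in for the plug-ins' feature extractors. The function names, the dictionary schema and the three features shown are illustrative assumptions, not the SAFE API; the real system extracts a much larger feature set [9] and stores entries in an RDF triple store.

```python
import numpy as np
import librosa

def timbral_snapshot(y, sr):
    # Three example spectral features, averaged over frames;
    # the SAFE plug-ins log many more [9].
    return {
        "centroid": float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))),
        "rolloff": float(np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr))),
        "flatness": float(np.mean(librosa.feature.spectral_flatness(y=y))),
    }

def log_transform(y_in, y_out, sr, term, params):
    # One hypothetical dataset entry: the descriptor, the effect's
    # parameter settings, and feature snapshots before and after processing.
    return {
        "term": term,
        "parameters": params,
        "features_before": timbral_snapshot(y_in, sr),
        "features_after": timbral_snapshot(y_out, sr),
    }
```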

2.1 Dataset
The dataset used for this study comprises 2694 transforms, split into four groups according to their transform class: 454 were applied using a compressor, 303 using distortion, 1679 using an equaliser, and 258 using a reverb. The transforms were described using 618 unique terms from 263 unique users (an average of 2.35 terms per user), all of whom were music producers who participated by using the SAFE plug-ins within their workflow.

We measure the confidence of a descriptor using its variance in feature space, where the features are first mapped to a 6-dimensional space using Principal Component Analysis (PCA) in order to remove redundancy whilst retaining 95% of the variance:

c = \frac{1}{NM} \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} \left( PC_n(m) - \mu_n \right)^2    (1)

where PC_n(m) is the value of the n-th principal component for the m-th of the descriptor's M entries, and \mu_n is its mean. To further identify the popularity of a descriptor, we weight the output of Eq. (1) with a coefficient representing the term as a proportion of the dataset:

p = c \, \ln\!\left( \frac{n(d)}{\sum_{d=0}^{D-1} n(d)} \right)    (2)

where n(d) is the number of entries for a descriptor d and D is the number of unique descriptors. Finally, we evaluate the extent to which the descriptor generalises across transform classes (generality) by finding the weighted mean of the term's sorted distribution; this is equivalent to finding the centroid of the density function across transform classes:

g = \frac{2}{K-1} \sum_{k=0}^{K-1} k \, \mathrm{sort}(x(d))_k    (3)

where sort(·) arranges the distribution in descending order, so that a term used in a single class scores 0 and a term used uniformly across all classes scores 1, and where the distribution of the term d is calculated as a proportion of the transform class k to which it belongs:

x(d)_k = \frac{n_d(k) \, N(k)^{-1}}{\sum_{k=0}^{K-1} n_d(k) \, N(k)^{-1}}    (4)

Here, N(k) is the total number of entries in class k and n_d(k) is the number of occurrences of descriptor d in class k. Using these metrics, the database is sorted and the top 10 descriptors under each measure are shown in Table 1. Similarly, Table 2 shows the most commonly used descriptors for each individual transform class. To group terms with shared meanings but variable suffixes, stemming is applied using a Porter stemmer [10]; this allows the unification of terms such as warm, warmer and warmth into a parent category (warm).

N  term     n     term        c      term    p       term    g
0  warm     193   boxed       0.250  warm    0.0019  sharp   0.828
1  bright   153   splash      0.250  bright  0.0014  deep    0.819
2  punch    34    wholesome   0.250  crunch  0.0006  boom    0.809
3  air      31    pumping     0.247  room    0.0005  thick   0.806
4  crunch   29    rounded     0.247  fuzz    0.0004  piano   0.696
5  room     28    sparkle     0.247  crisp   0.0004  strong  0.596
6  smooth   22    atmosphere  0.244  clear   0.0004  soft    0.575
7  vocal    21    balanced    0.244  cut     0.0004  bass    0.555
8  clear    20    bass        0.244  bass    0.0004  gentle  0.525
9  fuzz     19    basic       0.244  low     0.0004  tin     0.483

Table 1: The highest ranking terms using the number-of-instances (n), confidence (c), popularity (p) and generality (g) measures.
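As a rough translation of Eqs. (1)-(4) into code, the NumPy sketch below computes the three metrics for a single descriptor. It assumes the descriptor's entries have already been projected into the 6-dimensional PCA space, and it follows the equations as reconstructed above; all function and argument names are our own rather than the SAFE codebase's.

```python
import numpy as np

def confidence(pc):
    # Eq. (1): pc has shape (M, N), i.e. M entries described by
    # N retained principal components.
    return np.mean((pc - pc.mean(axis=0)) ** 2)

def popularity(c, n_d, all_counts):
    # Eq. (2): weight confidence by the log of the term's share
    # of all descriptor entries n(d) in the dataset.
    return c * np.log(n_d / np.sum(all_counts))

def class_distribution(n_dk, N_k):
    # Eq. (4): per-class usage of the descriptor, normalised by
    # class size and rescaled to a density. n_dk, N_k: length-K arrays.
    x = n_dk / N_k
    return x / x.sum()

def generality(x_d):
    # Eq. (3): centroid of the descending-sorted distribution; a
    # single-class term scores 0, a uniformly used term scores 1.
    K = len(x_d)
    return (2.0 / (K - 1)) * np.sum(np.arange(K) * np.sort(x_d)[::-1])
```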
Compressor        Distortion        EQ             Reverb
27 : punch        23 : crunch       440 : warm     30 : room
17 : smooth       20 : warm         424 : bright   13 : air
15 : sofa          6 : fuzz          16 : air      11 : big
14 : vocal         6 : destroyed     16 : clear    10 : subtle
12 : nice          5 : cream         12 : thin      9 : hall
 9 : controlled    5 : death         11 : clean     9 : small
 9 : together      5 : bass          11 : crisp     8 : dream
 9 : crushed       5 : clip          10 : bass      7 : damp
 8 : warm          5 : decimated      9 : boom      7 : drum
 7 : comp          5 : distorted      9 : cut       6 : close

Table 2: The first ten descriptors per class, ranked by number of entries.

3. WITHIN-CLASS SIMILARITY
To find term similarities within transform classes, hierarchical clustering is applied to differences (processed vs. unprocessed) in timbre space. To do this, the mean of the audio feature vectors for each unique descriptor is computed and PCA is applied, reducing the number of dimensions whilst preserving 95% of the variance. Terms with fewer than 8 entries are omitted for readability, and distances between data points are calculated using Ward distance [16]; the results are shown in Figure 1. In each transform class, the clusters are intended to retain perceived latent groupings, based on underlying semantic representations.

From the term clusters, distances between groups of semantically similar timbral descriptions emerge. Among the compressor terms, groups tend to correlate with the extent to which gain reduction is applied to the signal: loud, fat and squashed generally refer to extreme compression, whereas subtle, gentle and soft tend to describe minor adjustments to the amplitude envelope. Distortion features tend to group based on the perceived dissonance of the transform, with terms such as fuzz and harsh clearly separated from subtle, rasp and growl. Equalisation comprises a wide selection of description categories, although terms that refer to specific regions of spectral energy, such as bass, mid and full, tend to fall into separate partitions. Reverb terms tend to group based on size and on descriptions of acoustic spaces: hall and room, for example, exhibit similar feature spaces, while terms such as soft, damp and natural fall into the same group.

[Figure 1: Dendrograms showing clustering based on feature-space distances for each transform class (panels: Compressor, Distortion, EQ, Reverb).]
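A compact version of this clustering procedure might look as follows, assuming a dict mapping each descriptor to an array of per-entry feature differences (processed minus unprocessed); scipy and scikit-learn stand in for whatever tooling the authors actually used.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.decomposition import PCA

def term_dendrogram(term_feature_diffs, min_entries=8):
    """term_feature_diffs maps a descriptor to an (entries x features)
    array of feature differences; the name and layout are illustrative."""
    # Drop terms with fewer than 8 entries, as in the paper.
    terms = [t for t, v in term_feature_diffs.items() if len(v) >= min_entries]
    means = np.array([term_feature_diffs[t].mean(axis=0) for t in terms])

    # Reduce dimensionality while keeping 95% of the variance.
    reduced = PCA(n_components=0.95).fit_transform(means)

    # Ward linkage merges, at each step, the pair of clusters giving
    # the smallest increase in within-cluster variance.
    Z = linkage(reduced, method="ward")
    return dendrogram(Z, labels=terms, no_plot=True)
```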

3.1 Parameter Space Representation
To illustrate the relevance of the within-class feature groups found using the hierarchical clustering algorithm, we can show that terms within clusters maintain similar characteristics in their parameter spaces. To demonstrate this, Figure 2 shows curves corresponding to two groups of descriptors taken from opposing clusters in the equaliser's feature space: cluster 2 (warm, bass, boom, box and vocal) and cluster 8 (thin, clean, cut, click and tin). Curves in cluster 2 generally exhibit a boost around 500 Hz with a high-frequency roll-off, whereas terms in cluster 8 exhibit a boost in high-frequency energy centred around 5 kHz.

[Figure 2: Equalisation curves (magnitude against frequency in Hz) for two clusters of terms in the dataset: (a) warm, bass, boom, box and vocal; (b) thin, clean, cut, click and tin.]

To further evaluate the organisation of terms based on their position in a parameter space, we use PCA to reduce the dimensionality of each space and overlay the parameter vectors. Figure 3 shows this for the distortion and reverb classes: in Figure 3(a) the bias parameter is highly correlated with PC2, which tends to organise descriptors based on dissonance, while in Figure 3(b) the mix and gain parameters of the reverb class correlate with PC2, which tends to retain variance using size-based descriptors. These exhibit cross-correlation values of 0.68 and 0.81 respectively.

[Figure 3: Biplots of the distortion and reverb classes, showing terms mapped onto two dimensions (PC1, PC2) with overlaid parameter vectors.]
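The biplot construction can be sketched as below: per-term parameter settings are projected onto two principal components, and each raw parameter is correlated with the component scores to obtain the overlaid vectors. The data layout and names are assumptions; only the use of PCA and cross-correlation is taken from the text.

```python
import numpy as np
from sklearn.decomposition import PCA

def biplot_vectors(P, param_names):
    """P: (terms x parameters) array of mean parameter settings per term.
    Returns the 2-D term positions and, for each parameter, its
    correlation with PC1 and PC2 (the arrow directions in Figure 3)."""
    scores = PCA(n_components=2).fit_transform(P)
    arrows = {
        name: [float(np.corrcoef(P[:, j], scores[:, i])[0, 1]) for i in (0, 1)]
        for j, name in enumerate(param_names)
    }
    return scores, arrows
```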
4. INTER-TRANSFORM SIMILARITY
To investigate between-class similarities, we perform hierarchical clustering on the dataset, where transforms are grouped by unique terms and separated by transform class. Here, the organisation of terms into clusters is highly correlated with their organisation into transform classes: across the 8 data partitions, the mean rank-order generality is 0.23, with a mean of 2.4 unique class labels per group. To identify transform-agnostic descriptors, i.e. those with similar between-class transformations, we select the 10 terms with the highest generality scores (defined in Table 1) and measure the variance across their transformations in reduced-dimensionality space. All of these terms had entries in all 4 transform classes and at least 10 entries overall. Ranked by between-class agreement, they are: 1. piano (0.001), 2. sharp (0.012), 3. soft (0.013), 4. thick (0.018), 5. tin (0.021), 6. deep (0.022), 7. bass (0.033), 8. gentle (0.039), 9. strong (0.050), 10. boom (0.058).

4.1 Term Frequency Analysis
We can also measure term similarity independently of timbral or parameter space representations, using a term's association with a given transform class. Here, we use term frequencies to define distributions across classes, resulting in four-dimensional vectors; e.g. t = [0.0, 0.5, 0.5, 0.0] has equal association with the distortion and equaliser classes, but no entries in the compressor or reverb classes. We then represent these using a Vector Space Model (VSM), and measure the similarity between any two terms (t_1, t_2) using the cosine distance:

\mathrm{sim}(t_1, t_2) = \frac{t_1 \cdot t_2}{\|t_1\| \, \|t_2\|} = \frac{\sum_{i=1}^{N} t_{1,i} \, t_{2,i}}{\sqrt{\sum_{i=1}^{N} t_{1,i}^2} \sqrt{\sum_{i=1}^{N} t_{2,i}^2}}    (5)

In order to better capture the true semantic relations of the terms and the transforms they are associated with, we apply Latent Semantic Indexing (LSI) [7]: the term-transform space is reduced from rank four to rank three by taking the singular value decomposition of the N_{terms} × 4 occurrence matrix, M = U \Sigma V^{\top}, setting the smallest singular value to zero, and reconstructing the matrix as \hat{M} = U \hat{\Sigma} V^{\top}. This process eliminates noise caused by differences in word usage, for instance due to synonymy and polysemy, whereas the latent semantic relationships between terms and effects are preserved.

Figure 4 shows the resulting pairwise similarities of the high-generality terms used in Section 4. The most similar pairs are bass and strong, deep and sharp, and boom and thick (all 0.99). Conversely, we can consider the similarity of the transform types themselves, based on their descriptive attributes, by transposing the occurrence matrix in the VSM. This is also illustrated in Figure 4, in which the terms used to describe equalisation transforms prove similar to those associated with distortion (0.95), while the equalisation and compression vocabularies are disjunct (0.641).

[Figure 4: Vector-space similarity with respect to (a) the high-generality terms and (b) the transform classes.]
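A minimal NumPy sketch of Eq. (5) and the LSI step follows: build the rank-3 approximation of the term-by-class occurrence matrix by zeroing its smallest singular value, then compare rows (terms), or columns (classes, via the transpose), with cosine similarity. The toy matrix is invented for illustration.

```python
import numpy as np

def cosine_sim(t1, t2):
    # Eq. (5): cosine similarity between two class-frequency vectors.
    return np.dot(t1, t2) / (np.linalg.norm(t1) * np.linalg.norm(t2))

def lsi_similarities(M):
    # SVD of the (n_terms x 4) occurrence matrix; zeroing the smallest
    # singular value reduces it from rank four to rank three.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s[-1] = 0.0
    M_hat = (U * s) @ Vt  # rank-3 reconstruction
    unit = M_hat / np.linalg.norm(M_hat, axis=1, keepdims=True)
    return unit @ unit.T  # pairwise term-term similarities

# Invented term-frequency rows (columns: Comp, Dist, EQ, Reverb).
M = np.array([[0.00, 0.50, 0.50, 0.00],
              [0.10, 0.40, 0.40, 0.10],
              [0.70, 0.05, 0.15, 0.10],
              [0.25, 0.25, 0.25, 0.25]])
print(np.round(lsi_similarities(M), 3))
```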

5. DISCUSSION/CONCLUSION
We have illustrated within- and between-class groupings of semantic descriptions of sound transformations taken from processing modules in a DAW. We showed that the groups represent meaningful subsets of entries by evaluating correlation in their parameter spaces, and that the parameters of each processing module can be used to organise terms in a similar fashion. To evaluate between-transform similarity, we demonstrated that transforms tend to form discrete clusters, and that terms such as piano, sharp, soft, thick and tin have similar representations across a range of processing types. Finally, we measured the similarity of effects and terms based on their vector-space representations. This shows that equalisation and distortion share a common vocabulary of terms, whilst reverb and distortion have dissimilar description schemata.

The results are encouraging and show that timbre descriptors cluster in meaningful ways in the context of audio transformations. The findings thus provide useful insight into how to create semantic descriptor spaces for audio effects.

6. REFERENCES
[1] American Standards Association. American standard acoustical terminology (including mechanical shock and vibration). Technical report, 1960.
[2] A. Caclin, S. McAdams, B. Smith, and S. Winsberg. Acoustic correlates of timbre space dimensions: A confirmatory study using synthetic tones. The Journal of the Acoustical Society of America, 118(1):471-482, 2005.
[3] M. B. Cartwright and B. Pardo. Social-EQ: Crowdsourcing an equalization descriptor map. In 14th International Society for Music Information Retrieval Conference (ISMIR), 2013.
[4] J. Grey. Multidimensional perceptual scaling of musical timbres. The Journal of the Acoustical Society of America, 61(5):1270-1277, 1977.
[5] D. Howard and J. Angus. Acoustics and Psychoacoustics. Focal Press, 4th edition, 2009.
[6] R. Kendall and E. Carterette. Verbal attributes of simultaneous wind instrument timbres: I. von Bismarck's adjectives. Music Perception: An Interdisciplinary Journal, 10(4):445-467, 1993.
[7] T. A. Letsche and M. W. Berry. Large-scale information retrieval with latent semantic indexing. Information Sciences, 100(1):105-137, 1997.
[8] M. Mathews. Introduction to timbre. In P. Cook, editor, Music, Cognition, and Computerized Sound: An Introduction to Psychoacoustics, chapter 7. MIT Press, 1999.
[9] G. Peeters. A large set of audio features for sound description (similarity and classification) in the CUIDADO project. Technical report, IRCAM, 2004.
[10] M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130-137, 1980.
[11] T. Rossing, R. Moore, and P. Wheeler. The Science of Sound. Addison-Wesley, 3rd edition, 2002.
[12] P. Seetharaman and B. Pardo. Socialreverb: Crowdsourcing a reverberation descriptor map. In ACM International Conference on Multimedia, November 2014.
[13] R. Stables, S. Enderby, B. De Man, G. Fazekas, and J. Reiss. SAFE: A system for the extraction and retrieval of semantic audio descriptors. In 15th International Society for Music Information Retrieval Conference (ISMIR), 2014.
[14] S. Stasis, R. Stables, and J. Hockman. A model for adaptive reduced-dimensionality equalisation. In 18th International Conference on Digital Audio Effects (DAFx-15), Trondheim, Norway, 2015.
[15] S. Stasis, R. Stables, and J. Hockman. Semantically controlled adaptive equalisation in reduced dimensionality parameter space. Applied Sciences, 6(4):116, 2016.
[16] J. H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236-244, 1963.
[17] A. Zacharakis, K. Pastiadis, J. D. Reiss, and G. Papadelis. Analysis of musical timbre semantics through metric and non-metric data reduction techniques. In 12th International Conference on Music Perception and Cognition (ICMPC), pages 1177-1182, 2012.