Sound Recording Techniques. MediaCity, Salford, Wednesday 26th March 2014


www.goodrecording.net Perception and automated assessment of recorded audio quality, focussing on user generated content.

How distortion affects the perceived quality of music: Psychoacoustic experiments
Iain Jackson, Bruno M. Fazenda, Trevor J. Cox, Paul Kendrick, Francis F. Li, Stephen Groves-Kirkby, & Alex Wilson
Acoustics Research Centre, University of Salford

How does clipping affect the perception of quality in music?
Are hard clipping and soft clipping perceived differently in terms of quality?
How well does HASQI predict subjective quality ratings of clipped music?
How robust is HASQI across different styles of music?

What is HASQI?
Hearing Aid Speech Quality Index (Kates & Arehart, 2010).
Models the effect of degradation on quality.
Measures the combined effect of noise, nonlinear distortion, and linear filters.
For both normal-hearing and hearing-impaired listeners.
Good performance for speech signals (Kressner et al., 2013).
What happens when applied to music?

Arehart, Kates & Anderson (2011)
Wide variety of degradation/processing: additive noise, peak clipping, amplitude quantisation, compression, compression + babble, spectral subtraction, high-pass filter, low-pass filter, bandpass filter, positive spectral tilt, negative spectral tilt, single resonance peak, multiple peaks, stationary noise... a total of 112 conditions.
But only 3 samples of music: Haydn, jazz, vocalise.
Quality ratings were reasonably well predicted by HASQI, but were also significantly affected by genre of music.

Experiment 1: The effect of hard clipping on perceptions of quality
In contrast to previous work, we assess the effect of a single type of processing, hard clipping, against a comprehensive range of musical styles.

Sample Selection
Aim: Select a representative sample of as wide a range of musical styles as possible.
Guided by previous work (Rentfrow & Gosling, 2003). 25 prototype songs from each of 14 genres: classical, jazz, blues, folk, alternative, rock, heavy metal, country, pop, religious, rap/hip-hop, soul, funk, and electronica/dance.
Final sample library of 140 songs. We obtained CD copies of 117 songs on the list.
How to scale down to a manageable number of songs for the test? Sort and cluster by timbre.

Sample Selection Why select by timbre, not genre?

Genre: intuitively useful but lacking in objectivity.
Timbre: apply objective methods to compare songs.
Samples clustered using a modified version of the technique used by Aucouturier and Pachet (2002). A Gaussian Mixture Model (GMM) is fitted to the Mel Frequency Cepstrum Coefficients (MFCC) for 3 sections of each song, which are then clustered by similarity.
The total number of clusters is an emergent feature: in this case it was found to be 6.

The test set
From each of our 6 timbre clusters we draw two samples. One cluster, number 4, however, contains only one sample. Additionally, we include the three samples used by Arehart et al. (2011) in their previous assessment of HASQI and music (jazz, Haydn, vocalise). Thus the final test set consists of 14 samples.

Table 1. The 14 songs the final test samples were taken from, by cluster number.

Cluster  Song Name                                                                   Artist/Composer
1        Riverboat Set: Denis Dillon's Square Dance Polka, Dancing on the Riverboat  John Whelan
1        Crazy Train                                                                 Ozzy Osbourne
1        Haydn *                                                                     *
2        Ave Maria                                                                   Franz Schubert
2        Packin' Truck                                                               Leadbelly
2        vocalise *                                                                  Tierney Sutton
3        Kalifornia                                                                  Fatboy Slim
3        Brown Sugar                                                                 The Rolling Stones
4        The Four Seasons: Spring                                                    Antonio Vivaldi
5        For What It's Worth                                                         Buffalo Springfield
5        The Girl From Ipanema                                                       Stan Getz
6        Spoonful                                                                    Howlin' Wolf
6        Nobody Loves Me But My Mother                                               B.B. King
6        jazz *                                                                      *

Method

Distortion of samples
HASQI is continuous between 0 and 1. HASQI values were used to estimate discrete levels.
10 levels per song sample: 9 levels of distortion, spread at equal intervals over the full range of (available) HASQI values, plus the original, clean sample.
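As a rough illustration of the level spacing described above, the following Python sketch computes ten target HASQI values for one song: the clean sample plus nine equally spaced distortion levels. The function name, the example minimum of 0.1, and the assumption that the steps run from 1.0 (clean) down to the lowest available HASQI value are illustrative only; the slides do not give the exact procedure, and finding the clipping threshold that yields each target would additionally need a HASQI implementation.

```python
def distortion_targets(min_hasqi, n_levels=9):
    """Return target HASQI values: clean (1.0) followed by n_levels equal steps down to min_hasqi."""
    if not 0.0 <= min_hasqi < 1.0:
        raise ValueError("min_hasqi must lie in [0, 1)")
    step = (1.0 - min_hasqi) / n_levels
    # Level 0 is the clean sample; level n_levels is the most distorted one.
    return [1.0 - k * step for k in range(n_levels + 1)]


# Hypothetical song whose most heavily clipped version reaches HASQI = 0.1.
targets = distortion_targets(0.1)
for level, hasqi in enumerate(targets):
    print(f"level {level}: target HASQI = {hasqi:.2f}, distortion (1-HASQI) = {1 - hasqi:.2f}")
```

Because the available HASQI range differs between songs, the same level number would correspond to a different clipping threshold for each song under this scheme.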

[Figure: Relationship between HASQI values and threshold, shown for Crazy Train and For What It's Worth. Horizontal axis: Distortion level (1-HASQI), 0 to 1. Vertical axis: Threshold (% of peak level), 0 to 100.]

[Table 1 repeated, with example audio samples for each song at Clean, Medium and High distortion levels.]

Broadly reproduced the method used by Arehart et al. (2011).
30 participants. Mean age 23.7 years (SD: 4.7 years). No reported hearing impairments.
Sounds presented in stereo over headphones (Sennheiser HD 650) at 72 dB (linear).
140 trials: 14 songs x 10 processing conditions, as 7-second samples in randomised presentation order.
Ratings of overall quality, given on a slider labelled Bad and Excellent at either end (output: 0-100).

Results

Figure 1. Mean quality ratings of each cluster, as a function of distortion level. (Error bars show 95% CIs.)

Differences in quality between timbre clusters?
Repeated-measures ANOVA. Independent variables: level of distortion, cluster. Dependent variable: mean quality ratings.
Significant main effect for distortion level (F(4.97, 144.26) = 458.38, p < .01, ηp² = .94).
Significant main effect for cluster (F(2.33, 67.48) = 42.43, p < .01, ηp² = .59).
Significant interaction of cluster x distortion level (F(11.91, 345.41) = 6.98, p < .01, ηp² = .19).
Each successive level of distortion is associated with a significant decrease in quality ratings, but the rate of degradation is not perceived equally across all timbres.
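The effect sizes reported above can be cross-checked from the F values and degrees of freedom using the standard identity ηp² = (F x df_effect) / (F x df_effect + df_error). The short Python sketch below applies it to the three reported effects; the function and variable names are illustrative and not taken from the slides.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) as reported for Experiment 1.
effects = {
    "distortion level": (458.38, 4.97, 144.26),
    "cluster": (42.43, 2.33, 67.48),
    "cluster x distortion level": (6.98, 11.91, 345.41),
}
for name, (f_value, df_effect, df_error) in effects.items():
    print(f"{name}: partial eta squared = {partial_eta_squared(f_value, df_effect, df_error):.2f}")
```

The values obtained this way agree with the reported .94, .59 and .19 to two decimal places.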

Table 2. Clusters grouped according to (between group) significantly different quality ratings.
[Table body: the 14 songs of Table 1 rearranged by cluster group; the grouping is not recoverable from this transcription.]

Results HASQI performance

Table 3. Correlation coefficients for quality ratings and values predicted by HASQI for each timbre cluster.

Cluster    Quality
1          .828
2          .689
3          .693
4          .671
5          .801
6          .755
Mean (SD)  .732 (.065)

HASQI performance: for speech = .942 (Kates & Arehart, 2010); for music = .838 (range = .770 to .849; Arehart et al., 2011).
Rnonlin performance: for music = .95 (1 music sample, 10 participants; Moore et al., 2004).
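For readers unfamiliar with how such a coefficient is obtained, the sketch below computes a Pearson correlation between predicted and rated quality in plain Python. It assumes Pearson's r is the coefficient in question, which the slides do not state, and the data are invented purely for illustration; they are not the study's results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: HASQI predictions and mean slider ratings (0-100) for ten conditions.
predicted = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
ratings = [85, 80, 70, 66, 55, 50, 38, 30, 25, 12]
print(f"r = {pearson_r(predicted, ratings):.3f}")
```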

Conclusions
How robust is the HASQI model over a comprehensive range of musical styles?
The performance of HASQI was found to be (a little) less accurate than previous work suggests. Overall correlation of predicted vs actual quality ratings = .73 (compared to the equivalent value of .84 in Arehart et al.).
Predictive accuracy of HASQI can be improved by factoring in timbral features of samples.

Experiment 2: The effect of hard versus soft clipping on perceptions of quality

Hard versus soft clipping Partial replication of Experiment 1. Both hard and soft clipping processing conditions included in test set. Equivalent to distortion levels 1 to 5 from Experiment 1 (as opposed to levels 1 to 9 considered in Experiment 1). Samples (original, clean files), experimental set-up, procedure, and number of participants all as per Experiment 1.
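To make the distinction between the two processing conditions concrete, the following Python sketch applies a hard clipper and a soft clipper to one cycle of a sine wave at a threshold of 50% of peak level. The slides do not say which soft-clipping curve was used; the tanh saturation here is an assumption chosen for illustration, as are the function names and the test signal.

```python
import math


def hard_clip(samples, threshold):
    """Limit every sample to the range [-threshold, +threshold]."""
    return [max(-threshold, min(threshold, s)) for s in samples]


def soft_clip(samples, threshold):
    """Smoothly saturate towards +/-threshold using tanh (an assumed curve)."""
    return [threshold * math.tanh(s / threshold) for s in samples]


# One cycle of a full-scale sine wave, clipped at 50% of peak level.
peak = 1.0
signal = [peak * math.sin(2 * math.pi * n / 64) for n in range(64)]
threshold = 0.5 * peak
hard = hard_clip(signal, threshold)
soft = soft_clip(signal, threshold)
print(f"hard-clipped peak: {max(hard):.3f}, soft-clipped peak: {max(soft):.3f}")
```

The hard clipper flattens the waveform exactly at the threshold, whereas the soft clipper rounds the peaks and only approaches the threshold asymptotically.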

[Figure: Hard Clipping Thresholds. Horizontal axis: Distortion level (1-HASQI), 0 to 0.6. Vertical axis: Threshold (% of peak level), 0 to 70.]

[Figure: Soft Clipping Thresholds. Horizontal axis: Distortion level (1-HASQI), 0 to 0.6. Vertical axis: Threshold (% of peak level), 0 to 70.]

Hard versus soft clipping
Table 4. Comparison examples of hard and soft clipping at equivalent HASQI levels.
[Audio examples: Ave Maria (Franz Schubert) and Packin' Truck (Leadbelly), each with hard and soft clipping, at Clean, Low and Medium distortion levels.]

Hard versus soft clipping
Figure 4. Mean quality ratings for hard (left) and soft (right) distortion conditions, shown by cluster. Error bars show 95% CIs.
[Axes in both panels: Distortion Level (0 is clean, 9 is most distorted) against mean quality rating (0 to 100), with one line per cluster (Cluster 1 to Cluster 6).]

Hard versus soft clipping Across all samples, no significant difference between ratings for hard and soft clipping. HASQI performance is unaffected by type of distortion.

Experiment 3 Descriptions of quality attributes in different distortion categories

Digital audio sample statistics
Since digital audio is encoded as discrete samples of the audio waveform, much can be said about a recording by the statistical properties of these samples. The Probability Mass Function (PMF) can show the presence of distortion in mastered audio. Consider three categories:
1. The clean distribution, where there is no clipping and a wide dynamic range.
2. Audio with hard-clipping will feature a PMF with high values at its extreme values, where the maximum amplitude has been reached.
3. Where softer distortions are used, there is not one single large value at extremes but more gentle bumps in the nearby regions.
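The first two categories above can be illustrated with a short Python sketch that builds the PMF of integer sample values and measures how much probability mass sits at the two extreme values. The signal is synthetic Gaussian noise rather than mastered music, and the helper names, the clipping level and the sample count are all assumptions made for the example.

```python
import random
from collections import Counter


def sample_pmf(samples):
    """Probability mass function of integer sample values, as {value: probability}."""
    counts = Counter(samples)
    total = len(samples)
    return {value: count / total for value, count in sorted(counts.items())}


def extreme_mass(pmf):
    """Total probability at the lowest and highest sample values present."""
    lowest, highest = min(pmf), max(pmf)
    return pmf[lowest] + (pmf[highest] if highest != lowest else 0.0)


# Synthetic "clean" signal: Gaussian noise quantised to integers (not real audio).
rng = random.Random(0)
clean = [round(rng.gauss(0.0, 1.0) * 1000) for _ in range(5000)]

# Hard-clip the same signal at +/-1000 (an arbitrary level chosen for the demo).
clipped = [max(-1000, min(1000, s)) for s in clean]

print(f"mass at extremes, clean:   {extreme_mass(sample_pmf(clean)):.4f}")
print(f"mass at extremes, clipped: {extreme_mass(sample_pmf(clipped)):.4f}")
```

In the clipped signal a large share of samples piles up at the two limiting values, which is the PMF signature described in category 2.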

Subjective Test (Wilson & Fazenda, submitted)
63 samples of music, containing a mix of clean, hard-clipping distortion and soft distortions.
22 participants gave quality ratings for each sample on a 5-point scale and also provided 2 descriptors.
Ratings for clean samples were significantly higher than for the two distorted categories. The two distortion categories did not significantly differ between themselves (F(1, 2) = 5.72, p < 0.001, η² = 0.008).

Verbal descriptions of distortion categories
As well as a rating out of 5, participants were also asked to provide two words which described the attributes on which quality was assessed. For example:
"I gave this sample 5 stars because it was clear and full."
"I gave this sample 1 star because it was distorted and dull."
[Figure: Word-clouds of the most common attributes associated with (a) clean samples, (b) hard clipped samples, (c) soft distortion samples.]

Verbal descriptions of distortion categories
Table shows the five most commonly used descriptor words and their absolute frequencies for each of the clean, hard-clipped and soft distortion categories. Chi-square analysis shows that there is significant variation in the distribution of words used to describe each of the three categories (χ²(8, N = 547) = 33.28, p < 0.001). Bold frequencies in the table indicate values significantly greater (>) or less than (<) the expected counts of the null hypothesis.
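The chi-square test above compares observed word counts with the counts expected if word choice were independent of category. The Python sketch below shows the calculation for a 3 x 5 table, which gives (3 - 1) x (5 - 1) = 8 degrees of freedom, matching the reported test. The counts are hypothetical, since the original table is not reproduced in this transcript, and the function name is illustrative.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = clean, hard-clipped, soft distortion;
# columns = the five most common descriptor words.
counts = [
    [30, 25, 20, 10, 15],
    [20, 30, 25, 20, 10],
    [15, 20, 35, 25, 10],
]
stat, dof = chi_square(counts)
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

Cells whose observed count departs strongly from the expected count are the ones a table like the one described would mark as significantly greater or less than expected.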

Verbal descriptions of distortion categories
"Distorted" is used less than expected by chance to describe samples in the clean category. The opposite is true for both other categories, the hard and soft clipped distortion samples.
Samples in the soft category are more frequently described as "Distorted" than those in the hard category. This suggests that small amounts of hard-clipping can go unnoticed.
"Punchy" is used less often when describing the soft distortions, compared to hard-clipping. This may be due to the lesser influence of inter-sample peaks in soft distortions compared to hard-clipping.
"Harsh" was not associated with either of the distortion categories but does appear more often than expected by chance among the words describing the clean samples.

Conclusions Overall, HASQI found to predict degradation in music quality reasonably well. Performance across hard and soft clipping is very good. Limitation of HASQI for music - not developed for stereo. Model does not account for stereo width and panning.

References
K.H. Arehart, J.M. Kates and M.C. Anderson. Effects of noise, nonlinear processing, and linear filtering on perceived music quality. Int. J. Audiol. 50(3): 177-190. (2011).
J.J. Aucouturier and F. Pachet. Music similarity measures: What's the use? Proc. ISMIR. (2002).
J.M. Kates and K.H. Arehart. The Hearing-Aid Speech Quality Index (HASQI). J. Audio Eng. Soc. 58(5): 363-381. (2010).
A. Kressner, D. Anderson and C. Rozell. Evaluating the generalization of the Hearing Aid Speech Quality Index (HASQI). IEEE Trans. Audio. Speech. Lang. Processing. 21(2): 407-415. (2013).
B.C.J. Moore, C.-T. Tan, N. Zacharov and V.-V. Mattila. Measuring and predicting the perceived quality of music and speech subjected to combined linear and nonlinear distortion. J. Audio Eng. Soc. 52(12): 1228-1244. (2004).
P.J. Rentfrow and S.D. Gosling. The Do Re Mi's of everyday life: The structure and personality correlates of music preferences. J. Pers. Soc. Psychol. 84(6): 1236-1256. (2003).
A. Wilson and B.M. Fazenda. Sonic character: Categorisation of distortion profiles in relation to audio quality of music recordings. Submitted to 17th Int. Conference on Digital Audio Effects (DAFx-14).