Loudness of transmitted speech signals for SWB and FB applications
Challenges, auditory evaluation and proposals for handset and hands-free scenarios
Jan Reimes, HEAD acoustics GmbH
Sophia Antipolis, 2017-05-10
Introduction (1/2)
Loudness of the received speech signal: the simplest but one of the most important quality parameters of a communication device!
Too loud: annoying, may cause hearing damage!
Too quiet: impairs intelligibility (and other aspects of conversational quality)
Several measurement standards provide requirements for a comfortable listening level
Loudness == Level? Psycho-acoustics!
Introduction (2/2)
For NB and WB terminals, so-called loudness ratings (LR) are used to evaluate transmission characteristics (e.g. ITU-T P.79)
Basic concept of LR: calculate the attenuation (in dB) needed to achieve the same perceived loudness as the intermediate reference system (IRS)
[Figure: level L/dB vs. frequency f/Hz (300 Hz to 4 kHz) for the reference system R(f) and the device under test H(f)]
$D(f) = H(f) - R(f)$ (level difference in dB) and $\mathrm{LR} \sim \sum_f w(f)\,D(f)$
Weighted sum of the transfer-function difference provides the attenuation versus IRS
Technical measure; no information about absolute loudness
Addresses mainly linear distortions
Method not (yet?) defined for SWB/FB applications
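The weighted-sum idea can be sketched in a few lines. This is a minimal illustration of the simplified relation on the slide only, assuming per-band levels in dB and hypothetical per-band weights w(f); the function name `weighted_level_difference` is made up for this example, and the actual ITU-T P.79 calculation uses its own band weights and a non-linear summation rather than this plain weighted mean.

```python
import numpy as np

def weighted_level_difference(h_db, r_db, weights):
    """Weighted sum of D(f) = H(f) - R(f), all levels in dB (illustrative only).

    h_db    -- levels of the device under test per frequency band (dB)
    r_db    -- levels of the reference system (IRS) per band (dB)
    weights -- hypothetical per-band weights w(f); normalized to sum to 1
    """
    h = np.asarray(h_db, dtype=float)
    r = np.asarray(r_db, dtype=float)
    w = np.asarray(weights, dtype=float)
    d = h - r                 # level difference D(f) per band
    w = w / w.sum()           # normalize so a constant offset maps to itself
    return float(np.sum(w * d))

# Hypothetical bands at 300, 500, 2000 and 4000 Hz; DUT 3 dB above the IRS in every band
print(weighted_level_difference([12, 11, 10, 9], [9, 8, 7, 6], [1, 2, 2, 1]))
```

Normalizing the weights makes a frequency-independent offset between device and reference come out as exactly that offset, which is the intuition behind "attenuation versus IRS".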
Recent work on loudness
Standardization: ITU-T SG12 / Q5 launched a new work item, P.Loudness
Goal: evaluate and/or modify existing loudness models originating from the psycho-acoustic domain
Several standardized models already exist:
  Zwicker approach (DIN 45631/A1, ISO 532-1)
  Moore/Glasberg approach (ANSI S3.4-2007, ISO 532-2)
A release candidate of the P.Loudness model is currently available
  Based on very basic auditory experiments, no real terminals
  Loudness model is based on stationary loudness (ANSI S3.4)
  Modifications are fitted to auditory results
  Two modes for handset/hands-free are required
  Not applicable to artificial head recordings (handset)
  No binaural aspects considered
Auditory evaluation (1/3)
Large test corpus based on binaural recordings of terminals (3G, 4G, VoIP) and realistic simulations (compression, codecs, loudspeaker distortions)
[Figure panels: Stimulus; Binaural recording]
Source material: 8 German test sentences (ITU-T P.501)
Bandwidth from NB (up to 3.4 kHz) to FB (up to 20 kHz)
Level range between 40 and 90 dB SPL
52 conditions per mode (handset and hands-free mode), 4 sentences each: 208 test stimuli per mode
Auditory evaluation (2/3)
Absolute / categorical loudness assessment on a 25-point scale
  7 anchor definitions for better orientation
  Already used in previous studies
20 normal-hearing test subjects per mode
Hearing-adequate playback of binaural recordings in a listening lab
Auditory evaluation (3/3)
Prior to the evaluation: determination of individual loudness functions per test subject with a reference sound
Principle of the reference sound: should cause a loudness excitation similar to that of speech, but be independent of language, content, talker, …
Three different reference sounds were evaluated:
  1 kHz sine tone (refers to the definition of sone/phon)
  1 Bark noise at 1 kHz (used in initial P.Loudness experiments)
  3 Bark noise at 1 kHz (less tonal, "smooth")
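The three reference sounds could be generated roughly as sketched below. This rests on several assumptions not stated in the slides: Bark bandwidths follow Traunmüller's analytical Bark approximation, the noise bands are centred on 1 kHz on the Bark scale, band-limiting is done by zeroing FFT bins of a random spectrum, and the signal is interpreted as sound pressure in pascal for the dB SPL scaling. All function names are hypothetical.

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure in Pa (0 dB SPL)

def hz_to_bark(f):
    # Traunmüller (1990) analytical approximation of the Bark scale
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    # Exact inverse of the approximation above
    return 1960.0 * (z + 0.53) / (26.28 - z)

def bark_band_edges(fc, width_bark):
    # Band of the given width (in Bark), centred on fc on the Bark scale
    zc = hz_to_bark(fc)
    return bark_to_hz(zc - width_bark / 2.0), bark_to_hz(zc + width_bark / 2.0)

def scale_to_spl(x, level_db):
    # Scale a signal (interpreted as pressure in Pa) to the given RMS level in dB SPL
    target_rms = P_REF * 10.0 ** (level_db / 20.0)
    return x * (target_rms / np.sqrt(np.mean(x ** 2)))

def sine_1k(fs, duration, level_db):
    # 1 kHz sine tone at the requested level
    t = np.arange(int(round(fs * duration))) / fs
    return scale_to_spl(np.sin(2.0 * np.pi * 1000.0 * t), level_db)

def bark_noise_1k(fs, duration, width_bark, level_db, seed=0):
    # Band-limited Gaussian noise: random complex spectrum, bins outside the band zeroed
    rng = np.random.default_rng(seed)
    n = int(round(fs * duration))
    spec = rng.standard_normal(n // 2 + 1) + 1j * rng.standard_normal(n // 2 + 1)
    f_lo, f_hi = bark_band_edges(1000.0, width_bark)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return scale_to_spl(np.fft.irfft(spec, n=n), level_db)

fs = 48000
print(bark_band_edges(1000.0, 1.0))  # edges of the 1 Bark band around 1 kHz (Hz)
print(bark_band_edges(1000.0, 3.0))  # edges of the 3 Bark band around 1 kHz (Hz)
noise_3bark = bark_noise_1k(fs, 1.0, 3.0, 75.0)
```

With this approximation the 1 Bark band spans roughly 920 Hz to 1090 Hz; other Bark definitions (e.g. Zwicker's tabulated critical bands) give slightly different edges.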
Results of loudness models (1/5)
Several state-of-the-art loudness models are evaluated:
  Zwicker: ISO 532-1
  Moore/Glasberg: ANSI S3.4 (stationary) and time-varying loudness (TVL) models, versions 2002 & 2016, with long-term (LT) / short-term (ST) smoothing
  P.Loudness candidate (stationary)
Non-stationary models provide a loudness vs. time curve; several single-value calculations are possible:
  Average
  N5 percentile (peak-oriented)
  LL(p) (used in recent work)
Auditory results of the test stimuli provide values on a point scale
Comparison to loudness models?
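The Average and N5 single values can be derived from a model's loudness vs. time curve as sketched below, assuming the curve is available as an array of loudness values in sone at equidistant time steps. N5 is the loudness exceeded in 5 % of the time, i.e. the 95th percentile of the curve. LL(p) is not reproduced here because its definition is not given in the slides; the function name is hypothetical.

```python
import numpy as np

def single_values(loudness_sone):
    """Single-value aggregations of a loudness vs. time curve (sone)."""
    n = np.asarray(loudness_sone, dtype=float)
    return {
        "average": float(np.mean(n)),         # arithmetic mean over time
        "N5": float(np.percentile(n, 95.0)),  # loudness exceeded 5 % of the time
    }

# Hypothetical curve: mostly moderate loudness with a short loud segment
curve = np.concatenate([np.full(90, 4.0), np.full(10, 12.0)])
print(single_values(curve))
```

The example shows why N5 is called peak-oriented: the short loud segment dominates N5, while it only shifts the average slightly.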
Results of loudness models (2/5)
Proposed procedure for comparison between loudness models (results in phon/sone) and auditory test results (in points):
Select reference signal (sine, 1 Bark noise, 3 Bark noise, …)
Calculate the inverse of the loudness functions (mapping function)
Transform auditory results in points to level in dB ERL
Example: 15.0 points in the listening test correspond to 75 dB ERL (same loudness as the 3 Bark noise reference signal at 75 dB SPL)
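Transforming points to dB ERL amounts to inverting a monotonic loudness function by interpolation. The sketch below assumes a loudness function tabulated at a few reference-sound levels and uses linear interpolation; the table values and the function name `points_to_erl` are hypothetical, chosen only so that 15.0 points maps to 75 dB as in the example on the slide.

```python
import numpy as np

# Hypothetical individual loudness function for one reference sound:
# reference-sound level (dB SPL) -> rating on the 25-point scale
LF_LEVELS_DB = np.array([40.0, 50.0, 60.0, 70.0, 75.0, 80.0, 90.0])
LF_POINTS = np.array([2.0, 5.0, 9.0, 13.0, 15.0, 17.5, 22.0])

def points_to_erl(points, lf_levels=LF_LEVELS_DB, lf_points=LF_POINTS):
    """Invert the loudness function: rating in points -> level in dB ERL."""
    if np.any(np.diff(lf_points) <= 0):
        raise ValueError("loudness function must be strictly increasing")
    # np.interp needs increasing x values, so the points act as x and the levels as y;
    # ratings outside the tabulated range are clamped to the end levels
    return np.interp(points, lf_points, lf_levels)

print(points_to_erl(15.0))
print(points_to_erl([9.0, 11.0]))
```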
Results of loudness models (2/5)
Proposed procedure for comparison between loudness models (results in phon/sone) and auditory test results (in points):
Select loudness model and single-value aggregation
Calculate the loudness (in sone or phon) of the selected reference sound over a certain level range (e.g. from 40 to 90 dB SPL)
Calculate the mapping function between sone/phon and level
Run the loudness model on the signal under test
Transform the output from sone/phon to level in dB ERL with the previously determined mapping function
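A minimal sketch of the mapping step, assuming the 1 kHz sine as reference sound. Instead of running an actual loudness model on the reference signal, it uses the sone/phon definition as a stand-in (a 1 kHz tone at L dB SPL has a loudness level of L phon, and N = 2^((L - 40)/10) sone above 40 phon); with a real model, `reference_loudness_sone` would be replaced by model runs at each level. All function names are hypothetical.

```python
import numpy as np

def reference_loudness_sone(levels_db):
    # Stand-in for a loudness model run on the 1 kHz sine reference:
    # L dB SPL -> L phon -> 2 ** ((L - 40) / 10) sone (valid for L >= 40 phon)
    return 2.0 ** ((np.asarray(levels_db, dtype=float) - 40.0) / 10.0)

def build_mapping(levels_db=None):
    # Loudness of the reference sound over the level range (default 40...90 dB SPL, 1 dB steps)
    if levels_db is None:
        levels_db = np.arange(40.0, 91.0, 1.0)
    levels = np.asarray(levels_db, dtype=float)
    return reference_loudness_sone(levels), levels

def sone_to_erl(n_sone, mapping):
    # Map the model output (sone) for the signal under test to a level in dB ERL
    sone, levels = mapping
    return np.interp(n_sone, sone, levels)

mapping = build_mapping()
print(sone_to_erl([4.0, 16.0], mapping))  # model outputs for two hypothetical signals
```

Between grid points the linear interpolation in sone introduces a small error; a finer level grid or interpolating over log(sone) would reduce it.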
Results of loudness models (3/5)
Large number of possible combinations (models, single values, reference signals)
Evaluation of prediction performance by RMSE*
  Considering the uncertainty of the auditory data
Baseline performance: auditory results vs. active speech level (ASL) according to ITU-T P.56; loudness models should perform better!
[Figure panels: ASL/Sinus (HS); ASL/Sinus (HF)]
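Assuming that RMSE* here refers to the epsilon-insensitive RMSE of ITU-T P.1401 (the slides do not spell out the definition), prediction errors that fall within the 95 % confidence interval of the auditory result are not counted. A sketch with all values in dB ERL; the degrees-of-freedom parameter `d` depends on the mapping used and is left as an input, and the function name is hypothetical.

```python
import numpy as np

def rmse_star(auditory, predicted, ci95, d=1):
    """Epsilon-insensitive RMSE: errors inside the 95 % CI of the auditory data count as zero.

    auditory  -- auditory results per condition (here in dB ERL)
    predicted -- model predictions per condition (dB ERL)
    ci95      -- half-width of the 95 % confidence interval per condition
    d         -- degrees of freedom subtracted from the number of conditions
    """
    a = np.asarray(auditory, dtype=float)
    p = np.asarray(predicted, dtype=float)
    ci = np.asarray(ci95, dtype=float)
    if a.size <= d:
        raise ValueError("need more conditions than degrees of freedom")
    p_err = np.maximum(0.0, np.abs(a - p) - ci)  # error remaining outside the CI
    return float(np.sqrt(np.sum(p_err ** 2) / (a.size - d)))

# Hypothetical data for three conditions
print(rmse_star([70.0, 75.0, 80.0], [71.0, 78.0, 80.0], [2.0, 1.0, 1.0]))
```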
Results of loudness models (4/5)
Selected results per loudness model, handset mode
[Figure panels: ISO 532-1/Avg./Sinus; TVL2016-LT/Avg./3 Bark; P.Loudness/3 Bark; TVL2002-ST/Avg./1 Bark]
Results of loudness models (5/5)
Selected results per loudness model, hands-free mode
[Figure panels: ISO 532-1/Avg./Sinus; TVL2016-LT/LL(p)/3 Bark; P.Loudness/3 Bark; TVL2002-LT/N5/3 Bark]
Summary & Conclusions
Loudness assessment is a challenging task!
SWB/FB terminals are commercially available, but currently no instrumental loudness assessment test methods are available
A large auditory database was built and listening tests were conducted
  Considering state-of-the-art terminals and realistic simulations
Evaluation of loudness models: "no clear winner"
  ISO 532-1 very accurate for HS & HF (single model for both)
  TVL2016-LT slightly worse, but considers binaural inhibition
  P.Loudness candidate also performs adequately, but…
New loudness model not necessarily needed?
Finalize P.Loudness work item in standardization
Specify application of loudness models in measurement standards
Jan Reimes
Research & Standardization
HEAD acoustics
info@head-acoustics.de
www.head-acoustics.de
Copyright HEAD acoustics GmbH