Overview of ITU-R BS.1534 (The MUSHRA Method)


Dr. Gilbert Soulodre
Advanced Audio Systems, Communications Research Centre, Ottawa, Canada
gilbert.soulodre@crc.ca

Recommendation ITU-R BS.1534: Method for the subjective assessment of intermediate quality level of coding systems

Quick Intro
MUSHRA - MUlti-Stimulus Hidden Reference and Anchor
- Multi-Stimulus: Listeners have instant random access to each of the test items and the reference signal.
- Hidden Reference: One of the test items is a copy of the reference signal.
- Anchor: One of the test items must be a version of the reference signal low-pass filtered at 3.5 kHz.
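As a rough illustration of the anchor item, the sketch below low-pass filters a signal at 3.5 kHz with a Hamming-windowed sinc FIR filter in plain Python. The 48 kHz sample rate, the filter length, the window choice and all function names are assumptions made for this example; the slides do not say how the anchor filter should be designed.

```python
import math

def lowpass_fir(cutoff_hz, fs_hz, num_taps=201):
    """Design a Hamming-windowed sinc low-pass filter (illustrative design)."""
    fc = cutoff_hz / fs_hz            # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response, with the x == 0 limit handled explicitly.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]   # normalize to unity gain at DC

def apply_fir(signal, taps):
    """Direct convolution; the output has the same length as the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if i - k >= 0:
                acc += t * signal[i - k]
        out.append(acc)
    return out

def make_anchor(reference, fs_hz=48000, cutoff_hz=3500.0):
    """Return a band-limited copy of the reference (assumed 48 kHz mono samples)."""
    return apply_fir(reference, lowpass_fir(cutoff_hz, fs_hz))

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

With these assumed settings, a 1 kHz tone passes essentially unchanged while a 10 kHz tone is strongly attenuated, which is the kind of band-limiting the anchor is meant to exhibit.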

Background (yet another standard...)
- ITU-R BS.1116 was developed to assess the performance of high-quality perceptual audio codecs. Artifacts were expected to be small and difficult to hear.
- Internet-quality codecs create clearly audible impairments. A new method for evaluating their performance in a rigorous fashion was required.
- ITU-R BS.1534 was developed based on a multi-stimulus method devised by Soulodre for comparing signals and systems with clearly audible differences.

BS.1534 versus BS.1116
Both methods try to estimate worst-case performance.
BS.1534 tries to keep the parts of BS.1116 that are most effective, while dealing with clearly audible differences.
BS.1534 is the same as BS.1116 for
- listening environment (noise and reverberation)
- reproduction system
- training of listeners
- selection and use of critical audio material
- high-quality reference signal

Main Differences
BS.1116                                             | BS.1534
Double-blind triple-stimulus with hidden reference  | Multi-stimulus with hidden reference
5-point impairment scale                            | Continuous quality scale
Detection and grading process                       | Sorting and grading process

TEST: Rank According to Size

Results
Item         | Rank | Diameter
Golf ball    | 1    | 4.3 cm
Baseball     | 2    | 7.3 cm
Soccer ball  | 3    | 22 cm
Basketball   | 4    | 24 cm
The Moon     | 5    | 3.5 million m
- Ranking does not provide any information about the size of the relative differences.
- Want to get as much information from subjects as possible.
- Also, what if the actual Moon was used as the reference? Would differences between the balls be noticeable?
- Consider how you just performed the ranking task.
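The point about ranking can be made concrete with a small sketch. It takes the four ball diameters from the slide (in cm), ranks them, and then computes the size ratio between neighbours in the ranking. The function names are invented for this illustration.

```python
# Ball diameters in cm, as listed on the slide.
diameters_cm = {
    "Golf ball": 4.3,
    "Baseball": 7.3,
    "Soccer ball": 22.0,
    "Basketball": 24.0,
}

def rank_items(sizes):
    """Rank items from smallest (1) to largest; only the order is kept."""
    ordered = sorted(sizes, key=sizes.get)
    return {name: i + 1 for i, name in enumerate(ordered)}

def adjacent_ratios(sizes):
    """Size ratio between each item and the next larger one."""
    ordered = sorted(sizes, key=sizes.get)
    return [(a, b, sizes[b] / sizes[a]) for a, b in zip(ordered, ordered[1:])]

ranks = rank_items(diameters_cm)
ratios = adjacent_ratios(diameters_cm)
```

The ranks are evenly spaced (1, 2, 3, 4), yet the ratios between neighbours differ widely: the soccer ball is about three times the size of the baseball, while the basketball is only about 9% larger than the soccer ball. A rank-only result discards that information, which is why a graded scale is preferred.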

Choice of Reference Signal
- Subjects need a reference in order to know the best-case or benchmark performance.
- The choice of reference signal is critical to the outcome of the experiment.
- When evaluating impairments (distortions) it is important to use a high-quality reference signal.
- Using a degraded reference (e.g. bandlimited) will introduce biases.
- The difficulty for the listeners is to compare apples and oranges.

Perceptual Distance
- Large perceptual distance between the reference and the test items.
- Small perceptual distance between the various test items.
[Figure: quality axis marking the reference (Ref) and test items A and B, with the distances d_A, d_B and d_AB.]
Solution: Double-blind Multi-Stimulus Method

Controls for the multi-stimulus method
- Provides subjects with random access to the test items.
- Subjects tend to rank (sort) and then grade the test items.
- Get benefits of both paired-comparisons and grading!!

Evaluating Methodologies
- In a BS.1116 test the test items are presented to the subjects sequentially.
- In a BS.1534 test the subjects have random access to the test items.
- A formal subjective test was conducted to compare the performance of the two methods and to evaluate the consistency of grades given by subjects. It also evaluated the degree of resolution provided.
- Highly controlled impairments were applied to a source signal.
- Broadband random noise (white) was increased systematically in 2 dB increments (9 levels of noise).
- The systematic increase of noise level allows the relative qualities of the signals to be measured objectively and indisputably.
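A minimal sketch of how such a stimulus set could be generated: nine versions of a source signal with white noise added at relative levels of 0, 2, ..., 16 dB. The Gaussian noise distribution, the base noise RMS, the seed and the function names are assumptions for illustration; the slides only state that white noise was raised in 2 dB steps over 9 levels.

```python
import math
import random

def noise_gains(num_levels=9, step_db=2.0):
    """Linear amplitude gains for relative noise levels 0, 2, ..., 16 dB."""
    return [10 ** (i * step_db / 20.0) for i in range(num_levels)]

def add_white_noise(signal, base_noise_rms, gain, rng):
    """Add Gaussian white noise whose RMS is base_noise_rms * gain (assumed model)."""
    sigma = base_noise_rms * gain
    return [s + rng.gauss(0.0, sigma) for s in signal]

def make_stimuli(source, base_noise_rms=0.001, seed=0):
    """Return one noisy copy of the source per noise level, quietest first."""
    rng = random.Random(seed)
    return [add_white_noise(source, base_noise_rms, g, rng) for g in noise_gains()]
```

Because each level differs from the previous one by a fixed, known amount, the true ordering of the items is known in advance, so any non-monotonic grades can be attributed to the test method or the subject rather than to the stimuli.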

Results of Noise Impairment Tests
[Figure: mean subjective grade (1 to 5) versus relative noise level (0 to 16 dB) for Random Access and Sequential Access. Grade axis labels: Imperceptible, Perceptible but not annoying, Slightly annoying, Annoying, Very annoying, Unacceptable.]
- 5 subjects performed the test using both methods.
- Button assignment randomized for each subject.
- Both methods give a monotonic decrease in grades with increasing noise level.

Results of Noise Impairment Tests
[Figure: two panels, Sequential Access and Random Access, each showing mean subjective grade (1 to 5) versus relative noise level (0 to 16 dB) with error bars.]
- Error bars indicate critical differences.
- The Random Access method gives finer resolution: its error bars are half the size of those for the Sequential Access method.

Subject Consistency
[Figure: two panels, Sequential Access and Random Access, each showing subjective grade (1 to 5) versus relative noise level (0 to 16 dB).]
- Sequential Access Method - grades are not monotonic.
- Random Access Method - grades decrease monotonically with increasing noise levels.
- Random access method provides greater consistency.
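The consistency criterion used here can be expressed as a simple check: given one subject's grades ordered by increasing noise level, do the grades ever go up? The sketch below assumes that ordering. The function names and the two grade lists are invented for illustration and are not data from the study.

```python
def is_monotonic_decreasing(grades, strict=False):
    """True if grades never rise as the noise level increases."""
    pairs = list(zip(grades, grades[1:]))
    if strict:
        return all(a > b for a, b in pairs)
    return all(a >= b for a, b in pairs)

def count_reversals(grades):
    """Number of adjacent steps where a noisier item got a higher grade."""
    return sum(1 for a, b in zip(grades, grades[1:]) if b > a)

# Hypothetical grades for 9 noise levels (made up, not measured).
example_sequential = [4.8, 4.5, 4.6, 3.9, 4.0, 3.2, 2.5, 2.6, 1.8]
example_random = [4.9, 4.6, 4.2, 3.8, 3.3, 2.9, 2.4, 1.9, 1.5]
```

Counting reversals rather than only testing for monotonicity gives a graded measure of inconsistency that can be compared across subjects and methods.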

Anchors
- A true BS.1534 test requires that a 3.5 kHz low-pass filtered version of the reference signal be included as a test item.
- Additional anchors can be included.
- Intended to allow the results from different experiments to be scaled and compared. A bad idea!
- Scaling between experiments introduces bias.
- Limits flexibility in experimental design.
- Also, the anchors probably introduce bias by drawing the subject's attention to a specific type of impairment (bandlimiting).
- Anchors make no sense when testing systems where bandlimiting is not an issue.

Scales
[Figure: the two grading scales side by side.]
- BS.1116: grades from 5.0 to 1.0 in steps of 0.1, labelled Imperceptible, Perceptible but Not Annoying, Slightly Annoying, Annoying, Very Annoying.
- BS.1534: continuous scale labelled Excellent, Good, Fair, Poor, Bad.
BS.1534 uses a relative scale so grades depend on context. Cannot compare results between experiments.

Taking Care of Details
It is very important to take care of the details when conducting a subjective test:
a) conduct a pilot test
b) randomize the buttons and trials for each subject
c) training must be done properly
   - subjects must hear ALL of the test sequences,
   - subjects should hear the full range of impairments,
   - there should be no surprises during the blind rating phase.
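Randomizing the buttons and trials for each subject can be sketched as follows. The function name, the lettered button labels and the seeding scheme are assumptions made for this example; the slides only say that both the button assignment and the trial order should be randomized per subject.

```python
import random

def randomize_session(items, num_trials, subject_id, seed=0):
    """Return a per-subject trial order with a fresh button mapping per trial.

    items: names of the test items, including the hidden reference.
    Seeding with the subject id keeps each subject's session reproducible.
    """
    rng = random.Random(f"{seed}-{subject_id}")
    trial_order = list(range(num_trials))
    rng.shuffle(trial_order)
    session = []
    for trial in trial_order:
        shuffled = list(items)
        rng.shuffle(shuffled)
        # Buttons are labelled A, B, C, ... so labels reveal nothing about the item.
        buttons = {chr(ord("A") + i): item for i, item in enumerate(shuffled)}
        session.append((trial, buttons))
    return session

# Hypothetical item names for illustration.
example_items = ["hidden_reference", "codec_1", "codec_2", "codec_3"]
example_session = randomize_session(example_items, num_trials=6, subject_id="s01")
```

Storing the mapping alongside each trial makes it possible to decode the subject's button-labelled grades back to the actual test items after the blind rating phase.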

Why not always use BS.1534?
- The multi-stimulus method of BS.1534 has many advantages when conducting subjective tests.
- It tends to provide more consistent inter-subject data due to the sorting and grading process.
- So why not use the multi-stimulus method for all subjective tests (instead of BS.1116)?
ANSWER: With the multi-stimulus method, you can't tell if a subject is guessing!
- Need to use BS.1116 if differences between the reference and the test items are hard to hear.

When to use the multi-stimulus method?
- When the differences between the reference signal and the test items are clearly audible.
- Not limited to clearly audible impairments.
- When comparing systems/sounds that are very different from each other (i.e. when comparing apples and oranges).
- We've used it to evaluate inverse filtering algorithms, and the perception of envelopment in multichannel surround.
- You may already be using it (solo buttons on a mixer)!
- Choose a scale that makes sense for your test.
- Leave out the anchors.