THE MPEG-H TV AUDIO SYSTEM

Use Cases and Workflows

This whitepaper was produced in collaboration with Fraunhofer IIS.

MEDIA SOLUTIONS FRAUNHOFER IIS THE MPEG-H TV AUDIO SYSTEM

INTRODUCTION

This document describes common use cases for the MPEG-H next generation audio codec. It is important to understand that the MPEG-H TV Audio System is not just a single audio codec like, for example, Advanced Audio Coding (AAC), but a complete audio delivery system from capture to the end user.

Next Generation Audio (NGA) codecs exploit the fact that today's audio decoders can handle more complex operations than were previously possible. This allows far greater reliance on the decoder to render the audio for the specific reproduction system in use, and enables the user to personalize the audio experience.

In a traditional audio decoder, a 5.1 AAC stream is decoded to six channels, each of which is simply fed via amplification to the corresponding speaker. In this model, the broadcaster maintains control over the end user experience by encoding multiple streams, each with a complete mix, for every coding mode (one 5.1 and one stereo, for example) and/or language. This requires a lot of bandwidth and is inefficient, since much of the audio mix is common to all streams.

NGA codecs can carry channel-based data in the same way as existing codecs, but they also allow the carriage of objects and, in the case of MPEG-H, Higher Order Ambisonics (HOA). This means that, for example, a channel-based 5.1 bed can be encoded separately from user-selectable objects such as monaural language tracks. This is much more efficient than multiple 5.1 encodes.

The key features of MPEG-H Audio which set it apart from previous generations of audio codecs are:

OBJECTS

The ability to transmit specific elements of the audio mix separately, and to allow the user to switch between equivalent elements (such as different languages) and to alter these elements in volume and position within the limits defined by the broadcaster.

Objects allow:
1) Personalization: the ability of the end user to select components such as biased commentary or languages
2) Dialogue Enhancement: the ability to change the volume of the dialogue in relation to the ambience
3) Advanced Accessibility features: Audio Description services can be provided very efficiently together with multiple languages, while still enabling Dialogue Enhancement

HIGHER ORDER AMBISONICS

The ability to transmit a sound field based on a mathematical description of that sound field. This format allows easy manipulation of immersive sound on the receiver side and is currently the most favored format for VR and AR applications.

RENDERING AND ADVANCED LOUDNESS AND DRC CAPABILITIES

The ability of the decoder / renderer to make best use of the reproduction resources (speaker / headphone configuration) and adapt the audio to the reproduction device (TV speaker, AVR / soundbar or mobile device).
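The efficiency argument made in the introduction can be illustrated by counting coded audio signals. The sketch below is only an illustration: the function names are hypothetical, and a signal count is a rough proxy for bandwidth, not a bitrate figure (real bitrates depend on the encoder settings, which this document does not specify).

```python
# Illustrative only: count the coded audio signals needed to offer a 5.1
# programme in several languages. Signal count is a rough proxy for
# bandwidth; it is not a bitrate calculation.

BED_CHANNELS_5_1 = 6  # L, R, C, LFE, Ls, Rs


def signals_full_mixes(num_languages: int) -> int:
    """Traditional approach: one complete 5.1 mix per language."""
    return BED_CHANNELS_5_1 * num_languages


def signals_bed_plus_objects(num_languages: int) -> int:
    """Object-based approach: one shared 5.1 bed plus one mono
    dialogue object per language."""
    return BED_CHANNELS_5_1 + num_languages


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, signals_full_mixes(n), signals_bed_plus_objects(n))
```

With a single language the object-based layout carries one signal more than a plain 5.1 mix; the saving appears as soon as a second language is added, and grows with each further language.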

THE AUTHORING UNIT (AU)

New ways of producing and defining content are required to support these complex but efficient ways of processing audio. Besides the audio data, information about the properties of each audio element and its relationship to other elements is required. All of this additional information is conveyed as metadata, and for the MPEG-H TV Audio System it is defined and handled by the newly introduced concept of an authoring unit (AU).

The AU is used as part of the mixing process and allows the engineer to define how all the individual captures will form the final presentation, including the personalization options available to the end user. The AU generates what is called a scene description, which defines the role and properties of each audio element in the mix. An audio element can be a channel, an object or an ambisonic representation of the audio. These elements can be mixed together according to the content creator's preference.

By way of a simple example, the AU might be used to associate a set of mixed audio channels with a channel-based 5.1 bed, plus two mono language tracks in the form of objects. The objects are defined by a set of metadata including their location, their default volume and the type of audio. Additionally, the content creator can allow the end user to alter the position and the volume within predefined limits.

The output of the authoring unit is the uncompressed pulse code modulation (PCM) audio tracks in combination with metadata containing the scene description. In today's infrastructure the physical output of the AU is usually SDI. The metadata defined in the AU is modulated onto a PCM Control Track, which is commonly carried on the 16th SDI audio channel alongside the uncompressed audio. This combination allows robust and secure transport of the metadata.
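As a rough mental model, the scene description from the simple example above (a 5.1 bed plus two mono language objects) could be represented as follows. This is a minimal sketch, not the MPEG-H metadata format: every class name, field name and gain value here is a hypothetical choice made for illustration, and position metadata is left out for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class ElementKind(Enum):
    CHANNEL_BED = "channel_bed"
    OBJECT = "object"
    HOA = "hoa"


@dataclass
class AudioElement:
    """One element of the mix (hypothetical model, not the real format)."""
    label: str
    kind: ElementKind
    num_signals: int
    default_gain_db: float = 0.0
    # (min, max) gain the end user may select; None means not adjustable.
    user_gain_range_db: Optional[Tuple[float, float]] = None

    def apply_user_gain(self, requested_db: float) -> float:
        """Clamp a user request to the limits set by the content creator."""
        if self.user_gain_range_db is None:
            return self.default_gain_db
        low, high = self.user_gain_range_db
        return max(low, min(high, requested_db))


@dataclass
class SceneDescription:
    elements: List[AudioElement] = field(default_factory=list)

    def total_signals(self) -> int:
        return sum(e.num_signals for e in self.elements)


# The gain limits below are made-up example values.
scene = SceneDescription([
    AudioElement("5.1 bed", ElementKind.CHANNEL_BED, 6),
    AudioElement("English dialogue", ElementKind.OBJECT, 1,
                 user_gain_range_db=(-6.0, 9.0)),
    AudioElement("Korean dialogue", ElementKind.OBJECT, 1,
                 user_gain_range_db=(-6.0, 9.0)),
])
```

The clamp in `apply_user_gain` reflects the idea that the end user may only adjust an element within the limits predefined by the content creator; an element without a range simply keeps its default.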
ENCODING AND DECODING

The purpose of a contribution encoder is to provide compression which is relatively transparent and allows downstream manipulation of the signals without audible artifacts. Historically, contribution encoding of audio has used either standard codecs at higher bitrates, in some cases with proprietary alignment (such as Ericsson's Phase Aligned Audio), or proprietary codecs such as Dolby E. The MPEG-H TV Audio System introduces a contribution format which, like Dolby E, is designed to replace SDI channel-based transport where compression and secure transport of metadata are required.

The input to the contribution encoder will be 16 channels of PCM audio over SDI, where the 16th channel may be dedicated to the control track output from an external AU. In use cases where the AU is upstream of the contribution encoder, the control track is demodulated by the encoder and the metadata is embedded and carried in the bitstream. Where no AU is present upstream, the encoder must act as a rudimentary AU in defining the minimum required metadata, which includes the channel configuration, the object and loudness metadata.

The audio channels are carried as independent, full-bandwidth encoded signals, i.e. the compression exploits neither inter-channel redundancy nor reduced bandwidth (in the case of low-frequency effects (LFE) channels, for example).

It should be noted that MPEG-H encoded contribution bitstreams are not decodable by consumer MPEG-H decoder / renderers such as those found in set-top boxes. An MPEG-H contribution decoder is required. The purpose of the contribution decoder is to decode the audio streams to PCM and re-modulate the metadata defined in the AU or encoder onto the control track. These are then output over SDI for further production, archiving or passing directly to the emission encoder.
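The two input situations for the contribution encoder (AU upstream, or no AU upstream) can be sketched as a small decision function. The names below are hypothetical and do not correspond to any real encoder API; in particular, treating all 16 channels as audio when no control track is present is an assumption of this sketch, not something the text states.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SDI_AUDIO_CHANNELS = 16
CONTROL_TRACK_CHANNEL = 16  # 1-based; the channel the text says may carry the control track

# Minimum metadata the encoder must define when acting as a rudimentary AU.
MINIMUM_METADATA_KEYS = ("channel_configuration", "object_metadata", "loudness_metadata")


@dataclass
class InputPlan:
    audio_channels: List[int]
    control_track_channel: Optional[int]
    metadata_source: str  # "control_track" or "encoder_defaults"


def plan_contribution_input(au_upstream: bool) -> InputPlan:
    """Decide how the 16 SDI audio channels are interpreted (hypothetical)."""
    all_channels = list(range(1, SDI_AUDIO_CHANNELS + 1))
    if au_upstream:
        # Channel 16 carries the control track; the rest is audio.
        audio = [ch for ch in all_channels if ch != CONTROL_TRACK_CHANNEL]
        return InputPlan(audio, CONTROL_TRACK_CHANNEL, "control_track")
    # Assumption: with no control track, every channel may carry audio and
    # the metadata comes from the encoder's own configuration.
    return InputPlan(all_channels, None, "encoder_defaults")


def missing_minimum_metadata(metadata: Dict[str, object]) -> List[str]:
    """List the minimum metadata items the encoder configuration lacks."""
    return [key for key in MINIMUM_METADATA_KEYS if key not in metadata]
```

A configuration tool could call `missing_minimum_metadata` before starting an encode without an upstream AU, and refuse to start while the returned list is non-empty.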

The emission encoder is responsible for defining the bitstream which will be decoded by the end user device, such as a set-top box or smartphone, including any channels, objects, HOA content and personalization options. This means that all the metadata associated with these components must be made available to it. With the exception of a few predefined legacy configurations, such as stereo or 5.1 channel-based setups, the emission encoder must receive all metadata together with the PCM audio from an AU.

OPERATING MODES

Here we define some common use cases of MPEG-H which place different requirements on the encode workflow.

LIVE MIX

This use case covers any case where the final mix is defined at source. A typical example would be a live sports or news event where the audio mix is defined in the outside broadcast (OB) truck. Here, the AU will be in the OB truck and will produce a control track in real time. The contribution encoder will encode the audio components and demodulate the metadata from the control track for carriage in the bitstream. A contribution decoder at a production site, for example, will decode the bitstream and present PCM audio components and a control track over SDI. Any intermediate mixing at the production facility would require an AU to re-write the control track. There may be one or more distribution hops, all using MPEG-H contribution encoders, enabling the preservation of the control track. At the final emission encoder, the control track and PCM component data are used to define the MPEG-H bitstream for consumption by the end user device.

[Figure: live mix workflow. Recoverable labels: AUDIO INPUTS, AUTHORING UNIT, SDI BASEBAND AUDIO AND CONTROL TRACK (two hops), CONSUMER.]
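The ordering rules of the live mix chain can be checked with a small state machine. The sketch below is a hypothetical planning aid, not part of any MPEG-H tool: the stage names are invented, and the rule that every encoder needs an up-to-date control track applies to this live mix use case only.

```python
from typing import List

# Stages that operate on baseband SDI rather than on a compressed bitstream.
BASEBAND_ONLY = {"authoring_unit", "mix", "contribution_encoder", "emission_encoder"}


def validate_live_chain(stages: List[str]) -> List[str]:
    """Return a list of problems found in a live-mix signal chain."""
    problems: List[str] = []
    control_track_valid = False  # has an AU written the current control track?
    compressed = False           # is the signal currently a contribution bitstream?

    for index, stage in enumerate(stages):
        if stage in BASEBAND_ONLY and compressed:
            problems.append(f"stage {index} ({stage}) needs a contribution decoder before it")
            continue
        if stage == "authoring_unit":
            control_track_valid = True
        elif stage == "mix":
            # The mix changed, so the existing metadata no longer describes it.
            control_track_valid = False
        elif stage in ("contribution_encoder", "emission_encoder"):
            if not control_track_valid:
                problems.append(f"stage {index} ({stage}) has no up-to-date control track")
            if stage == "contribution_encoder":
                compressed = True
        elif stage == "contribution_decoder":
            if not compressed:
                problems.append(f"stage {index} (contribution_decoder) has nothing to decode")
            compressed = False
        else:
            problems.append(f"stage {index} has unknown type {stage!r}")

    if not stages or stages[-1] != "emission_encoder":
        problems.append("chain must end with an emission encoder")
    return problems
```

For example, a chain of authoring unit, contribution encoder, contribution decoder and emission encoder passes, while inserting a `"mix"` stage after the decoder without a following `"authoring_unit"` is reported, matching the requirement that intermediate mixing needs an AU to re-write the control track.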

LIVE, MULTIPLE MONO

In this use case the acquired audio is not mixed at source. Instead, it is first compressed using a contribution encoder and delivered to a production facility where the mixing and authoring are performed. The first contribution encoder must therefore define the mandatory metadata associated with the audio being carried, in the knowledge that it will be used as input to a downstream authoring unit. The configuration of the encoder is limited to the transport of a number (N) of mono channels, each associated with a channel of the SDI input. The encoder GUI or API is used to define the number of mono tracks. The downstream AU is then used to define the metadata and modulate the control track, which acts as input to either subsequent distribution hops or an emission encoder.

[Figure: live, multiple mono workflow. Recoverable labels: AUDIO INPUTS, N MONO CONFIGURATION, SDI BASEBAND AUDIO AND BASIC CONTROL TRACK, AUTHORING UNIT, SDI BASEBAND AUDIO AND CONTROL TRACK, CONSUMER.]

DISTRIBUTION OF LEGACY, CHANNEL BASED CONTENT

This use case applies mostly to the carriage of legacy, channel-based content using MPEG-H contribution encoders. Where a number of traditional, channel-based audio presentations are to be broadcast with a particular service (for example, a 5.1 English language presentation and a stereo Korean language presentation), the contribution encoder can be used to define the metadata. The encoder GUI or API is used to assign SDI input channels to one or more channel groups, each of which has an associated loudness. This data is then used to set the metadata carried in the MPEG-H bitstream which will, after decode, appear in the control track. In this way, the unit is effectively acting as a basic authoring unit.

[Figure: legacy channel-based workflow. Recoverable labels: AUDIO INPUTS, CHANNEL GROUP INPUT, SDI BASEBAND AUDIO AND CONTROL TRACK, CONSUMER.]
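The channel group assignment described for legacy content can be pictured as a mapping from SDI input channels to groups, each with a loudness value. The following is a minimal sketch under stated assumptions: the class, the layout table, the loudness unit and the example values are all hypothetical and are not taken from any encoder GUI or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed layout sizes for this sketch only.
LAYOUT_SIZES = {"mono": 1, "stereo": 2, "5.1": 6}


@dataclass(frozen=True)
class ChannelGroup:
    label: str
    layout: str                    # key into LAYOUT_SIZES
    sdi_channels: Tuple[int, ...]  # 1-based SDI input channels
    loudness: float                # programme loudness; unit assumed to be LKFS


def validate_groups(groups: List[ChannelGroup], max_channel: int = 16) -> List[str]:
    """Check layout sizes, channel range and overlaps between groups."""
    errors: List[str] = []
    used: Dict[int, str] = {}
    for group in groups:
        expected = LAYOUT_SIZES.get(group.layout)
        if expected is None:
            errors.append(f"{group.label}: unknown layout {group.layout!r}")
        elif len(group.sdi_channels) != expected:
            errors.append(f"{group.label}: {group.layout} needs {expected} channels")
        for channel in group.sdi_channels:
            if not 1 <= channel <= max_channel:
                errors.append(f"{group.label}: channel {channel} out of range")
            elif channel in used:
                errors.append(f"{group.label}: channel {channel} already used by {used[channel]}")
            else:
                used[channel] = group.label
    return errors


# Example from the text: a 5.1 English and a stereo Korean presentation.
# The loudness values are placeholders, not recommendations.
groups = [
    ChannelGroup("English 5.1", "5.1", (1, 2, 3, 4, 5, 6), -24.0),
    ChannelGroup("Korean stereo", "stereo", (7, 8), -24.0),
]
```

Calling `validate_groups(groups)` on the example returns an empty list; moving the Korean group to channels 6 and 7 would report the clash on channel 6.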

OFF-LINE AUTHORING

In certain circumstances, usually where the audio inputs are known to be fixed and stable, it may be desirable to use an authoring unit to define a set of channels, objects and personalization features off-line. With this feature, broadcasters can generate a set of presets for typical use cases, and the on-site crew only needs to load the right preset. This reduces the time needed to set up the transmission and reduces the risk of errors in the metadata.

The configuration and metadata, which in the previous examples are output as a control track within the SDI signal, are instead output as a configuration file. This file can then be uploaded to the contribution encoder via the GUI or API and will be applied to the audio input to the encoder via SDI. The output of the contribution encoder will be an MPEG-H bitstream with the metadata encoded as per the configuration file. Downstream contribution decodes and encodes and the emission encode will follow as per previous use cases.

[Figure: off-line authoring workflow. Recoverable labels: Audio Inputs, AUTHORING UNIT, Config File, Contribution, SDI Baseband Audio and Control Track, Emission, CONSUMER.]
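The preset workflow can be sketched as saving and loading a configuration file. The document does not specify the file format, so the JSON layout, the key names and the function names below are purely hypothetical stand-ins chosen to show the idea of validating a preset before it is applied.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

# Hypothetical required sections of a preset; not the real file format.
REQUIRED_KEYS = ("channel_configuration", "objects", "loudness")


def save_preset(path: Path, preset: Dict[str, Any]) -> None:
    """Write a preset to disk as JSON (format assumed for illustration)."""
    path.write_text(json.dumps(preset, indent=2), encoding="utf-8")


def load_preset(path: Path) -> Dict[str, Any]:
    """Read a preset and reject it if a required section is missing."""
    preset = json.loads(path.read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in preset]
    if missing:
        raise ValueError(f"preset is missing sections: {missing}")
    return preset


if __name__ == "__main__":
    example = {
        "channel_configuration": "5.1",
        "objects": [{"label": "English dialogue"}, {"label": "Korean dialogue"}],
        "loudness": {"programme": -24.0},  # placeholder value
    }
    with tempfile.TemporaryDirectory() as folder:
        preset_path = Path(folder) / "sports_preset.json"
        save_preset(preset_path, example)
        print(load_preset(preset_path) == example)
```

Rejecting an incomplete preset at load time supports the stated goal of reducing metadata errors on site, since the crew only selects a file that was checked in advance.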

CONCLUSION

This document has introduced the MPEG-H TV Audio System and, in so doing, has described both the advantages of its use and the differences compared to existing audio codecs. The use of objects and Higher Order Ambisonics not only enhances the immersive audio experience by ensuring that a single stream renders optimally on any reproduction system, but also allows user personalization of features such as commentary and crowd noise. This object-based delivery of audio also offers significant bandwidth savings, particularly when delivering content in multiple languages.