FascinatE Newsletter


IBC Special Issue, September 2011
FascinatE http://www.fascinate-project.eu/

Inside this issue:
- Welcome from the project coordinator
- Live ultra high resolution panoramic video
- Person tracking and production scripting
- Format agnostic 3D audio system
- FascinatE rendering node with ROI zoom and gesture control
- FascinatE network and delivery
- ALEXA M - the new capture device for the OmniCam
- Use of broadcast cameras to support the OmniCam

Welcome from the Project Coordinator

I am happy to introduce you to FascinatE. Since February 2010 our consortium, consisting of eleven partners spread over Europe, has been working to implement our idea of the ultra high resolution interactive television service of the future. A full system comprising appropriate capturing and analysis technology, networking components and various terminal devices is being implemented.

The capturing side uses as its base an ultra high-definition panorama, augmented with additional cameras as well as 3D and ambient audio. This information is compiled into a layered scene representation together with metadata. The networking components will be able to interpret the layered scene representation and will adapt the content depending on the type of service or the capabilities of the target device. With respect to terminal devices, the whole range is covered: from high resolution, immersive displays for a bigger audience, through home environments with TV sets, down to mobile devices used by individuals. Interaction methods tailored to the device, e.g. hand gestures for big devices and touch gestures for the smaller ones, are being investigated.

During our first test shoot we had the opportunity to capture a full range of content at the English Premier League soccer game Chelsea vs. Wolverhampton Wanderers (see the article in the first issue of this newsletter). We will showcase our results at different stages of the project; the first demonstrations will be given here at IBC 2011. To be kept up to date on the developments of the project, please visit www.fascinate-project.eu or follow us on Twitter: @Fascinate_Prjct.

Georg Thallinger, Project Coordinator, The FascinatE Consortium

Special points of interest:
- Demonstrations highlighting key progress in the project
- Real time stitching of OmniCam high resolution panoramic video
- Object based 3D audio system
- Gesture control of a user defined scene
- Networks for real time video navigation

The FascinatE consortium at our kick-off meeting in Graz, February 2010

Live Ultra-High Resolution Panoramic Video

In the advanced format-agnostic production framework developed by FascinatE, real-time acquisition of ultra-high definition panoramic video is essential, because it enables either the production side or the end user to select interesting viewing directions independently of what the camera operator shot. The omni-directional camera developed by Fraunhofer HHI uses six HD cameras mounted on a mirror rig to capture an ultra-high definition 180-degree panoramic video with a resolution of 7000 x 1920 pixels.

The major challenge here is blending and stitching these six camera views to form one single panoramic image. Even in panoramic video, the viewer is quite sensitive to incorrect stitching and blending when moving objects pass the border between views. Hence, a set of geometrical and photometric corrections has to be applied to each camera view, along with careful blending of the image borders. Fraunhofer HHI is presenting its Realtime Stitching Engine, capturing six HD cameras at 25 fps and stitching them together into a panoramic video at 7k x 2k pixel resolution. The real-time panoramic video can be observed on a large screen next to the omni-directional camera.

Figure 1. 2D OmniCam

3D OmniCam

The Fraunhofer Heinrich Hertz Institute (HHI) has developed a scalable, mirror-based, multi-camera rig that can be used for capturing immersive high-resolution 3D video panoramas. With its special mechanical and optical features it enables an optimal arrangement of multiple HD stereo cameras that solves the fundamental dilemma between parallax-free stitching of video panoramas on the one hand and the parallax needed for 3D stereo reproduction on the other. The rig is scalable in increments of 24 degrees and supports acquisition of live 3D video panoramas of up to 360 degrees with a maximum resolution of about 15,000 x 2,000 pixels for each stereo view. The prototype of this new 3D camera can be seen at the FascinatE booth.

Figure 2. 3D OmniCam
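The newsletter does not detail how the Realtime Stitching Engine blends the seams. Purely as an illustration of the "careful blending of image borders" described above, here is a minimal sketch of a feathered cross-fade between two already-rectified neighbouring tiles; the tile size and overlap width are assumptions, not HHI's actual parameters:

# Minimal sketch: feathered (linear cross-fade) blending of two
# horizontally adjacent, already geometrically rectified tiles.
# Tile geometry and overlap width are illustrative assumptions.
import numpy as np

def feather_blend(left: np.ndarray, right: np.ndarray, overlap: int) -> np.ndarray:
    """Blend two H x W x 3 float images that share `overlap` columns."""
    alpha = np.linspace(1.0, 0.0, overlap)[None, :, None]  # 1 -> 0 across the seam
    seam = alpha * left[:, -overlap:] + (1.0 - alpha) * right[:, :overlap]
    return np.concatenate([left[:, :-overlap], seam, right[:, overlap:]], axis=1)

# Example: two 1080 x 1920 tiles with an assumed 64-pixel overlap yield one
# 1080 x 3776 strip; chaining this across six tiles approaches the 7k panorama.
a = np.random.rand(1080, 1920, 3)
b = np.random.rand(1080, 1920, 3)
pano_strip = feather_blend(a, b, overlap=64)
print(pano_strip.shape)  # (1080, 3776, 3)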
Person Tracking & Production Scripting

Person Tracking

Joanneum Research presents a demo for person detection and tracking in high-resolution panoramic video streams, obtained from a panoramic camera stitching video streams from six HD tiles. The tracking algorithm has to detect and track persons over six static and rectified HD image sequences from the OmniCam. Instead of using the ultra-high definition image, each video tile is analysed separately by different workstations to enable real-time analysis. The AV content analysis uses a CUDA-accelerated feature point tracker, a blob detector, and a CUDA HOG person detector, which are used for region tracking in each of the tiles before fusing the results for the entire panorama. The results of the person and blob detectors for each image of the different image sequences yield the regions of detected persons for further processing. Furthermore, person IDs are linked to the appropriate combined regions with their corresponding feature points. The tracking system is demonstrated on a single PC with an appropriate graphics board, processing a full HD stream. Results are shown in Figure 3.

Figure 3. The left image shows detected person regions and tracked feature points. The resulting tracked persons with their IDs are shown on the right.

Production Scripting Engine

The Production Scripting Engine (PSE) is responsible for decision making on content selection. Its key feature is to automatically select a suitable area within the OmniCam panorama image, in addition to cuts between different broadcast cameras. Selection behaviour is based on pragmatic rules (cover the most interesting actions) and cinematographic rules (ensure basic aesthetic principles). In some cases this is not fully automatic but involves a human in the loop: a production team member deciding between prepared options. The PSE is a distributed component with at least one instance at the production site and one at the terminal end. The output of the PSE is called a script, which consists of a combination of content selection options and decisions, renderer instructions, user interface options etc. Scripts are passed to subsequent PSE components from the production site towards the terminal, where final instructions are given to a device-specific renderer.
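The script format itself is not specified in this newsletter. Purely as an illustration of the kind of information a script might carry (content selection options, a decision, renderer instructions), here is a hypothetical Python sketch; every field name is invented, not the project's actual schema:

# Hypothetical sketch of what a PSE "script" might carry.
from dataclasses import dataclass, field

@dataclass
class ShotOption:
    source: str          # e.g. "omnicam" or a broadcast camera ID (assumed names)
    roi: tuple           # (x, y, width, height) within the panorama
    priority: float      # pragmatic score: how interesting the covered action is

@dataclass
class Script:
    timestamp: float
    options: list = field(default_factory=list)   # candidate framings
    decision: int = 0                              # index chosen by rule or by a human
    renderer_hints: dict = field(default_factory=dict)

script = Script(
    timestamp=12.48,
    options=[ShotOption("omnicam", (2400, 300, 1280, 720), 0.9),
             ShotOption("cam2", (0, 0, 1920, 1080), 0.7)],
    renderer_hints={"transition": "cut"},
)
print(script.options[script.decision].source)  # "omnicam"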

Format Agnostic 3D Audio System

Creating a format agnostic interactive broadcast experience poses some interesting challenges to the partners from Technicolor and the University of Salford, who are responsible for the audio aspects of FascinatE. Of chief importance is the need to record the given audio scene in such a way that the content can be rendered on any reproduction system at the user end and can update depending on the dynamic viewing point. This demands a paradigm shift from how audio has traditionally been recorded for broadcast. Instead of broadcasting to match a specific hardware set-up such as stereo, 5.1, 7.1 etc., we adopt an object orientated approach which can be reproduced on any system.

The audio scene is considered to be made up of a set of audio objects (point sources with a specific location) and an ambient sound field contribution. The challenge at the recording side therefore is to record the sound field as well as the content and location of the audio objects in the scene. This often involves using different or adapted recording techniques compared to what is considered standard practice in the broadcast industry. Ideally each sound source would be individually close-miked and tracked in space; however, in many cases (such as the first FascinatE test shoot at a football match) this is not possible, and the content and position of the audio objects need to be derived by processing the signals from several microphones near the sources. The ambient sound field can also be recorded in such a way that it can be updated to match a given viewing position, for example using ambisonic microphones such as the Eigenmike or the SoundField microphone, which record the three-dimensional sound field at a given point.

With audio objects and sound field accurately recorded, it is possible to encode these sources in various sound field representations such as high order ambisonics (HOA) or wave field synthesis (WFS), which can in turn be decoded for any reproduction system, from stereo (e.g. on mobile devices) to true 3D sound using HOA with height in large public installations. As the user pans around the visual scene it is possible to both rotate and translate this sound field to match the new viewing position based on camera pan and zoom.

Figure 4. The Eigenmike is used by FascinatE for high order ambisonics

On the rendering side, it is important that the audio updates accurately with the updating view and that it matches the user preferences based on a combination of production choices and user input. FascinatE bridges the gap between passive viewer and active participant scenarios: current television broadcasts could be considered as passive viewing, where the audio remains stationary regardless of the camera position; conversely, active participant viewing is more akin to a video game scenario, where the audio updates completely with the viewing position. Of interest for FascinatE is which of these viewing paradigms the user subscribes to when navigating round the scene. Future work will therefore be centred not only on recording the audio scene such that the content is format agnostic, but also on determining how best to render the audio to match user preferences.

Figure 5. System diagram for FascinatE audio
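To make the rotation step concrete: for first-order ambisonics, rotating the sound field to follow a camera pan reduces to a small matrix applied across channels. The sketch below assumes B-format channels ordered (W, X, Y, Z) and a yaw-only pan; the actual FascinatE system works with higher orders, and sign conventions differ between ambisonic toolchains:

# Minimal sketch: rotating a first-order (B-format) ambisonic signal
# to follow a camera pan. First order is shown only to keep the
# rotation matrix small; channel order and signs are assumptions.
import numpy as np

def rotate_bformat_yaw(wxyz: np.ndarray, yaw_rad: float) -> np.ndarray:
    """wxyz: 4 x N block of B-format samples; returns the panned field."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    w, x, y, z = wxyz
    return np.vstack([w,               # omnidirectional component: unchanged
                      c * x + s * y,   # new front axis
                      -s * x + c * y,  # new left axis
                      z])              # height component: unchanged by yaw

# Example: pan the view 30 degrees; sources appear to rotate the other way.
field_in = np.random.randn(4, 48000)
field_out = rotate_bformat_yaw(field_in, np.deg2rad(30.0))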

FascinatE Rendering Node with ROI Zoom and Gesture Control

A first terminal prototype, shown at the FascinatE stand at IBC 2011, demonstrates the capability of navigating within content captured by FascinatE sensors, such as panoramic videos. The demonstrator employs gesture recognition to simplify the interaction between the terminal and the end user. It is focused on a home scenario where the end user interacts with the rendered content on a high-definition TV set. The Universitat Politècnica de Catalunya (UPC) developed for this purpose a fast and robust head and hand tracking algorithm using depth information from a range sensor, allowing interactive and immersive applications. This functionality is used to control a real-time rendering platform developed by Technicolor. This platform is configurable by scripts and provides virtual camera navigation with pan, tilt and zoom commands.

In order to interpret user gestures as a means to navigate within a panorama, hands and heads are tracked by exploiting depth estimation. This process includes modelling templates for heads and calculating an elliptical matching score. The template is resized depending on the distance at which the person is placed. For a given search zone, the matching score provides head position probabilities and confidence values for position estimates. For tracking the hands to understand the performed gestures, a workspace is defined as a 3D box placed in relation to the detected head position. Within this 3D box, hands are detected by merging and filtering samples with similar size and depth information. Finally, an empirical law relating the area of a surface in the image with its real-world counterpart is obtained. A distinction between open and closed hands is made by segmenting the area of the detected hand. An example of all these steps is shown in Figure 6.

Figure 6: Head position estimation example
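As a rough sketch of the hand-detection step only (the elliptical head template matching and the open/closed-hand classification are omitted), one can threshold the depth map inside the workspace box in front of the detected head and keep the largest blob. All thresholds and the box size below are illustrative assumptions, not the UPC tracker's values:

# Simplified sketch of depth-based hand detection in a workspace box.
import numpy as np
from scipy import ndimage

def detect_hand(depth_m: np.ndarray, head_depth_m: float,
                near: float = 0.15, far: float = 0.60):
    """Return (row, col) centroid of the hand blob, or None."""
    # Keep only pixels inside the assumed workspace box in front of the head.
    mask = (depth_m > head_depth_m - far) & (depth_m < head_depth_m - near)
    labels, n = ndimage.label(mask)
    if n == 0:
        return None
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    hand = labels == (np.argmax(sizes) + 1)   # largest connected blob
    return ndimage.center_of_mass(hand)

depth = np.full((480, 640), 2.0)     # synthetic frame: head about 2 m away
depth[200:260, 300:360] = 1.6        # a hand 0.4 m in front of the head
print(detect_hand(depth, head_depth_m=2.0))  # approx. (229.5, 329.5)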
The variety of end terminals available nowadays requires a format agnostic production to prepare the content best suited to all of them. FascinatE terminals and services will supply an interactive, personalized visual perspective to enrich the user experience. Content navigation such as pan, tilt and zoom allows the user a truly immersive experience beyond simple channel switching. The scalable architecture of the rendering platform developed for FascinatE allows applications on different target terminals such as home theatres or smart phones. An applied XML-based scripting mechanism controls and scales the visual rendering performed on camera clusters offering multiple regions of interest (see Figure 7). This supports automation of workflows and optimization of delivery channels. The visual rendering of such layered scenes into personalized perspectives on end-user screens is performed by transformation from the circular panorama onto flat surfaces (see Figure 8). Additional effort is spent on placing graphical elements for user information in relation to the selected region of interest and the display surface used for presentation.

Figure 7: Multiple regions of interest in the panorama
Figure 8: Rendering of personalized perspectives by transformation from the circular panorama onto flat surfaces

In conclusion, the demonstrator presented at IBC 2011 is able to perform fast (68 fps) and robust hand and head tracking with an error of less than 6 cm. The resulting smooth hand trajectories can be used for further gesture classification and analysis. This technology is applied to a real-time capable terminal platform for pan, tilt and zoom navigation within a panoramic scene. Easy personalization by gestures is complemented by scripting support offering perspective options such as prepared regions of interest.
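The "transformation from the circular panorama onto flat surfaces" mentioned above can be illustrated with a small re-projection sketch. It assumes a cylindrical 180-degree panorama and a pinhole output view; the mapping details and all numbers are assumptions rather than the Technicolor renderer's internals:

# Sketch: rendering a flat, pinhole-style view from a cylindrical panorama.
import numpy as np

def perspective_from_cylinder(pano: np.ndarray, yaw_deg: float,
                              fov_deg: float, out_w: int, out_h: int):
    ph, pw, _ = pano.shape
    f = (out_w / 2) / np.tan(np.deg2rad(fov_deg) / 2)   # pinhole focal length
    xs = np.arange(out_w) - out_w / 2
    ys = np.arange(out_h) - out_h / 2
    # Horizontal: angle of each output column on the cylinder = yaw + atan(x / f).
    theta = np.deg2rad(yaw_deg) + np.arctan2(xs, f)
    u = (theta / np.deg2rad(180.0) + 0.5) * (pw - 1)    # panorama column
    # Vertical: project each ray onto the cylinder wall.
    v = ys[:, None] / np.sqrt(xs[None, :] ** 2 + f ** 2) * f + ph / 2
    ui = np.clip(u, 0, pw - 1).astype(int)
    vi = np.clip(v, 0, ph - 1).astype(int)
    return pano[vi, np.broadcast_to(ui, (out_h, out_w))]

pano = np.zeros((2000, 7000, 3), dtype=np.uint8)        # stand-in 7k x 2k panorama
view = perspective_from_cylinder(pano, yaw_deg=10.0, fov_deg=60.0,
                                 out_w=1280, out_h=720)
print(view.shape)  # (720, 1280, 3)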

FascinatE Network and Delivery

The transmission of the FascinatE layered scene representation represents a major challenge for a delivery network, essentially in terms of bandwidth and processing requirements. As an example, the live delivery of the current FascinatE audio/video test material requires an uncompressed data rate of around 16 Gbps. FascinatE aims to deliver immersive video services to a large range of terminals, from high-end audio/video set-ups with fibre connectivity to low-powered mobile devices. To deliver an immersive and interactive media experience to any device in a scalable manner, the project has so far focused on the development of audio/video proxies. Their role is to perform some of the media processing tasks on behalf of a terminal, so as to reduce the processing and bandwidth requirements for the terminal hardware. The two following prototypes are demonstrated at IBC 2011.

1. Network Proxy for Real-Time Video Navigation

We focus in this demonstrator on a specific case where the proxy is able to process in real time end-user requests for navigating a very high resolution video panorama (Figure 9). The proxy has access to a 7k x 2k video panorama and sends a reframed and compressed video to each client device at an appropriate resolution and bitrate. The end-user can directly navigate the 7k panoramic video using a tablet or a mobile phone equipped with a touchscreen. The user commands are translated into a stream of 2D translation and zooming commands that are sent upstream to the proxy. The corresponding reframing, rescaling and coding processes are executed in real time for each client device. The proxy then delivers a compressed video stream containing the requested views, which only requires a standard decoding step before display. With this approach, ultra-HD content can be watched interactively in a natural manner, even on a low-power, small-display device.

Figure 9. Overview of the Network Proxy for Real-Time Video Navigation

2. Spatial Segmented Delivery of Immersive Media

Spatial segmentation is used as a method to efficiently deliver parts of the 7k x 2k video panorama to devices which are not capable of displaying the entire resolution at once, such as smartphones and tablets. The general concept behind spatial segmentation is to spatially split each video frame into several tiles. The video frames corresponding to the various tiles are encoded independently and stored separately as new video streams, or spatial segments. We focus in this demonstrator on the case where the A/V proxy only requests a subset of segments, based on the ROI selected by the user(s), and performs the spatial segment re-assembly. In the prototype, spatial segments are transported using a protocol similar to HTTP adaptive streaming and are then reassembled by the proxy. The navigation can be controlled on a mobile device such as a tablet or smart phone. Functionality is further increased by using multiple resolution layers, which allow for smoother zooming (Figure 10).

Figure 10. Spatial segmented delivery of immersive media
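A minimal sketch of the proxy's segment-selection step: given the ROI chosen by the user, determine which independently encoded tiles must be fetched and re-assembled. The 10 x 4 tile grid is an assumption for illustration; the newsletter does not state the grid used:

# Sketch of proxy-side tile selection inside the 7000 x 2000 panorama.
TILE_W, TILE_H = 700, 500            # assumed grid: 10 columns x 4 rows

def tiles_for_roi(x: int, y: int, w: int, h: int) -> list:
    """Return (col, row) indices of every tile the ROI overlaps."""
    c0, c1 = x // TILE_W, (x + w - 1) // TILE_W
    r0, r1 = y // TILE_H, (y + h - 1) // TILE_H
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

# A 1280 x 720 ROI starting at (3000, 600) touches 3 x 2 = 6 tiles, so the
# proxy fetches 6 segment streams instead of the full panorama.
print(tiles_for_roi(3000, 600, 1280, 720))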

ALEXA M - the new capture device for the OmniCam

The Alexa camera has gained an overwhelming reputation in motion picture and broadcast productions for its outstanding quality and ease of operation. The OmniCam would greatly benefit from the quality of this device, but a much smaller form factor is required. Hence weight and size of the Alexa camera were reduced. At IBC the first working prototypes of the Alexa M will be presented. In this modular camera version the sensor head is separated from the electronics back end, connected through an ultra-fast fibre connection. The sensor head will be integrated into the next generation of the OmniCam for even more fascinating and immersive panoramic pictures.

Figure 11. The Arri Alexa M

A 3D model of the test bed scenario

3D laser scanning has proven its usefulness in many applications like civil engineering, architecture and archaeology. In the FascinatE system it is essential to know the precise 3D coordinates of all cameras and microphones. When the viewer of a FascinatE event selects an area of interest within the scenario, the displayed images and the presented audio signals should both focus on the same spot or area. A 3D laser scanner not only measures those coordinates, but also generates a static 3D model of the whole scenario, enabling easier camera calibration and matching.

Figure 12. Laser-scanned model of Stamford Bridge stadium

Use of broadcast cameras to support the OmniCam

It is impractical to obtain a tightly-zoomed high-definition close-up simply by selecting a window from the OmniCam image. For example, to obtain the same resolution as is available from an HD broadcast camera with a horizontal lens angle of 5 degrees, the 180-degree panorama would need a horizontal resolution of about 70k pixels (36 times that of an HD image). It therefore makes sense to use conventional broadcast cameras to provide close-ups of key areas of the scene, as used in conventional coverage. To allow the user to zoom smoothly from a wide shot from the OmniCam into a region of interest covered by a broadcast camera, it is necessary to ensure that the images from the two cameras can be matched, both spatially and in terms of colorimetry.

Figure 13 compares an image from an HD broadcast camera (right) to the corresponding portion of the OmniCam image (left), after background alignment and colour matching, showing the potential gain in resolution (BBC).
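For reference, the 70k-pixel estimate above follows directly from the ratio of viewing angles:

\frac{180^\circ}{5^\circ} = 36
\qquad\Rightarrow\qquad
36 \times 1920\,\text{px} \approx 69\,000\,\text{px} \approx 70\mathrm{k}.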

The FascinatE Project and Consortium

FascinatE is an EU-funded project involving a group of 11 partners from across Europe. FascinatE stands for Format-Agnostic SCript-based INterAcTive Experience. The project is looking at broadcasting live events in a way that gives viewers a more interactive experience, no matter what device they are using to view the broadcast. The FascinatE project is developing a system to allow end-users to interactively view and navigate around an ultra-high resolution video panorama showing a live event, with the accompanying audio automatically changing to match the selected view. The output will be adapted to their particular kind of device, covering anything from a mobile handset to an immersive panoramic display.

At the production side, this requires the development of new audio and video capture systems, and scripting systems to control the shot-framing options presented to the viewer. Intelligent networks with processing components will be needed to repurpose the content to suit different device types and framing selections, and user terminals supporting innovative interaction methods will be needed to allow viewers to control and display the content.

Contact Details and Project Office

Editor:
Georg Thallinger
JOANNEUM RESEARCH, Graz, Austria
georg.thallinger@joanneum.at

Ben Shirley
University of Salford, Salford, UK
b.g.shirley@salford.ac.uk

FascinatE is funded by the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 248138. © the FascinatE consortium