MDPI Film Processing Harder, Better, Faster, Stronger. Brian Wheeler, Library Technologies Digital Library Brown Bag Series #dlbb April 18, 2018


Definitions (in no particular order)
- 1 Petabyte = 1,000 Terabytes = 1,000,000 Gigabytes = 10^15 bytes
- Scholarly Data Archive (SDA): IU's tape-based storage system
- High Performance Storage System (HPSS): the software under SDA
- Transcode: convert from one format to another (.wav -> .mp3)
- Package or Object: all of the digital files for a single physical object
- Master: a file made by digitizing the physical media
- Derivative: a file created by transcoding another (e.g., a thumbnail)
- Tarball: a file made with the tar utility which combines multiple files into one (similar to a zip file, but with no compression)
- Me, I, We: may refer to the software and not me personally
Image captions: A petabyte is equal to 15,625 64G USB flash drives. A view inside the SDA tape library. Core of HAL 9000.

MDPI review
- Media Digitization and Preservation Initiative
- Announced October 2013 by President McRobbie
- Digitize and preserve rare and unique time-based media in the university collections by 2020
- Around 280,000 to 325,000 A/V items identified for digitization
- Film designated as Phase II
- Partnership with Memnon Archiving Services (a division of Sony)
  - Memnon will digitize the bulk of the content
  - IU Digitization Studios will handle rare, unique, or fragile objects

MDPI timeline
- 2013
  - October: Project announcement by President McRobbie
- 2015
  - June: First production audio batches processed successfully
  - November: First production video batches processed successfully
- 2016
  - February: Objects delivered to Dark Avalon for collection managers
  - Second half: Investigation into Phase II (Film) began
- 2017
  - November: First production film batches processed successfully

MDPI object overall workflow
- Physical Object: Selected for digitization -> POD data entry -> Shipped from unit to IC -> Digitization -> Shipped back to unit
- Digital Object: Files on SDA -> Auto QC and Transcoding -> Manual QC -> Distributed to Dark Avalon -> Distributed to MCO
- Diagram label: Post-digitization processing

Post-digitization processing (A/V and Film use same workflow)

Post-digitization processing summary
Each digital object must be:
- Verified
  - Valid barcode?
  - Correct files from digitizer?
  - Stored correctly on tape?
- Processed
  - Auto QC'd
  - Derivatives created
  - Metadata gathered
- Quality Checked by Humans
  - Subjective issues (color, sound, etc.)
- Distributed
  - All passed objects are sent to Dark Avalon for collection managers
  - Will distribute to external users at some point in the future

A/V & Film processing requirements
A/V
- ~300 hours of content per day
- >15 different digitization packages
- 10% human QC
- Digitization 5 days per week
Film
- 16 hours of content per day
- 1 digitization package format
- 100% human QC
- Digitization 6 days per week
- Higher quality derivatives
Film should be easy!

Harder, Better, Faster, Stronger
Film is Harder than A/V. The solution is to do things:
- Better: Re-organize existing solutions
- Faster: Implement faster methods or solutions
- Stronger: Throw hardware at the problem or make it more robust

An hour of Film is huge (Harder)
Archival sizes for 1 hour of:
- Audio: 4G
- NTSC Video: 64G
- 2K Scanned Film: 1500G
- 4K Scanned Film: 6000G
[Chart: GB/Hour for Audio, Video, 2K Film, 4K Film]

...so a day's transfer is also huge.
- 16 hours of film per day
  - 95% 2K Scan => 22.8T
  - 5% 4K Scan => 4.8T
  - 27.6T per day
- In addition to 8-12T for A/V
[Chart: "An Actual Week", daily A/V, Film, and Total volume, 4-Apr through 10-Apr]
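The daily film figures follow from the hourly archival sizes on the previous slide (1500G per hour for 2K, 6000G per hour for 4K). A minimal sketch of that arithmetic, where the function and variable names are illustrative and not part of the MDPI software:

```python
# Hourly archival sizes from the slides, in terabytes per hour of content.
TB_PER_HOUR = {"2K": 1.5, "4K": 6.0}

def daily_film_volume_tb(hours_per_day, share_by_scan):
    """Return TB per day for each scan type, given the mix of scan resolutions."""
    return {
        scan: hours_per_day * share * TB_PER_HOUR[scan]
        for scan, share in share_by_scan.items()
    }

# 16 hours of film per day, 95% scanned at 2K and 5% at 4K.
volumes = daily_film_volume_tb(16, {"2K": 0.95, "4K": 0.05})
total = sum(volumes.values())
print({scan: round(tb, 1) for scan, tb in volumes.items()}, round(total, 1))
```

The result is 22.8T for 2K plus 4.8T for 4K, or 27.6T per day.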

Which means it must be fast!
- There's only 24 hours per day to handle transfer, transcode, and storage of new content
- At theoretical peak, 10GbE will handle the rate handily
- BUT, theoretical peak is rarely achieved:
  - SDA transfer rates are closer to Gigabit Ethernet
  - Lots of idle time waiting for tape migration
  - Memnon doesn't hit peak for upload
[Chart: Transfer Time in Hours for AV, Film, and Total over Gigabit Ethernet and 10 Gigabit Ethernet]
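To see why link speed matters, the daily volumes can be converted into transfer hours at line rate. This is a rough sketch under stated assumptions: decimal terabytes, the upper A/V figure of 12T per day, and no protocol overhead. The `efficiency` parameter is a hypothetical knob for modelling the gap between theoretical and achieved throughput.

```python
def transfer_hours(terabytes, link_gbps, efficiency=1.0):
    """Hours needed to move `terabytes` over a link of `link_gbps` gigabits/second."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

# Daily volumes in TB: A/V upper bound, film, and both together.
for label, tb in [("A/V", 12.0), ("Film", 27.6), ("Total", 39.6)]:
    print(label,
          "GbE:", round(transfer_hours(tb, 1), 1), "h,",
          "10GbE:", round(transfer_hours(tb, 10), 1), "h")
```

Under these assumptions a full day of film alone needs about 61 hours at Gigabit Ethernet speeds, which cannot fit in a 24-hour day, while 10 Gigabit Ethernet at peak needs about 6 hours.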

Network upgrades for Film (Faster)
- Memnon added an additional 20Gbps uplink to Campus Network
- Film-related servers are in a different rack than AV
- A second SDA-only 10Gbps network link added to all Transcoders and QC machines
- Bottom line: IU Transcoders and QC machines can handle full speed transfers to/from SDA AND full speed transfers to/from workstations in the IC

Revise transfer windows (Better)
- 3am to 9am is A/V transfers
- 9am to 8pm is Film transfers
- 8pm to 3am is idle/overflow
  - 7 hours of room to grow
- Possible because
  - Improved network topology
  - Memnon transfer optimization
[Chart: Time per Day: A/V (03:00-09:00), Film (09:00-20:00), Idle (20:00-03:00)]

Tape validation data flow (Harder)
- Current HPSS doesn't validate internal copies
  - Data corruption is possible!
- Normal flow
  - Digitizer -> SDA Cache: New objects are loaded into the SDA disk cache
  - SDA Cache -> Tape: Data is migrated from cache to tape
  - SDA Cache: The SDA disk cache is purged
  - Tape -> SDA Cache: The data is staged from tape back into the disk cache
  - SDA Cache -> Transcoder: Data sent from cache to transcoders
- Time consuming
  - For A/V we can do this with 100% of the content
  - Film takes hours to write to tape, and hours to recall

Reduced validation for Film (Faster)
- Reduced validation
  - Wait for a tape copy to be made
  - Send the object from SDA cache to the transcoder
  - Film objects ending with an even digit use this method
  - Can start transcoding hours earlier
  - Allows transcoders to keep up with daily uploads
- Compatible with HPSS's End-to-End Data Integrity
  - Enables validation on all data moves within HPSS
  - Coming with the SDA upgrade this Summer
  - When implemented, ALL objects will use this method
[Diagram: Digitizer, SDA Cache, Transcoder, Tape]

Tapes are a sequential medium (Harder)
- Data can only be written to the end of the tape
- If there are requests to read and write a single tape:
  - Fast-forward to the end of the tape
  - Write the data
  - Rewind to the location of the desired data
  - Read the data
- This is called shoe shining
- Film must be read from tape while A/V are uploaded (and the reverse)
Image caption: SDA uses IBM 3592 JD tapes. Each tape can store 10TB and contains 3527ft of tape.

New tape pool for Film masters (Stronger)
- Three different tape pools
  - Film Masters
  - A/V Masters
  - Film & A/V Derivatives
- Efficiency through scheduling
  - Transcode after uploading
  - A/V & Film upload at different times
- Distribution happens later
  - Not a real-time operation
  - At that point the tapes may be full

Operation        A/V Masters   Film Masters   Derivatives
A/V upload       Write
A/V transcode    Read                         Write
Film upload                    Write
Film transcode                 Read           Write
Distribute                                    Read

Preservation master file is simple (Harder)
The preservation format is a tarball consisting of:
- A few metadata files
  - A file manifest with checksums
  - Descriptive and technical metadata
- A WAV file for the soundtrack
  - May be absent if it is a silent film
- A DPX image for every frame in the film
  - At 24 FPS, 1440 files per minute of film
But:
- Uncompressed images, ~13M per frame (2K), ~52M (4K)
[Diagram: tarball layout with Metadata, Audio, Frames]
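The per-frame figures can be approximated from the frame dimensions. The sketch below assumes 10-bit RGB samples packed three to a 32-bit word (a common DPX packing), a 2K overscan frame of 2048x1556, and a 4K frame of twice those dimensions. The slides do not state the packing or the 4K height, so treat both as assumptions; file headers are ignored.

```python
def dpx_frame_bytes(width, height, header_bytes=0):
    """Approximate size of one uncompressed DPX frame.

    Assumption: three 10-bit samples (R, G, B) share one 32-bit word,
    so each pixel costs 4 bytes.
    """
    return width * height * 4 + header_bytes

frames_per_minute = 24 * 60               # 24 FPS -> 1440 files per minute
two_k = dpx_frame_bytes(2048, 1556)       # assumed 2K overscan frame
four_k = dpx_frame_bytes(4096, 3112)      # assumed 4K frame (2x each dimension)

print(frames_per_minute, round(two_k / 1e6, 1), round(four_k / 1e6, 1))
```

This gives roughly 12.7M and 51M per frame, close to the ~13M and ~52M quoted on the slide.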

Auto QC is hard to do on tarballs
Automated QC on a Preservation tarball needs to:
- Verify all payload files are present and have the correct checksums
- Make sure all DPX files have the same size
- Spot check a percentage of the DPX files for correct metadata
- Check the WAV file for correct structure and format
- Check the Metadata for completeness and correctness
The tar format makes this hard:
- Each file consists of a header followed by data
- The files are written sequentially
- Finding a file means reading from the beginning until the file is found
- Extracting the whole tarball takes longer than watching the film
[Diagram: tarball layout with Metadata, Audio, Frames]

Tarball index for quick retrieval (Faster)
- Create an index
  - Read the tarball from end to end, reading headers, but skipping data
  - Store the header metadata and the offset/length of the data
- Cost of reading the tarball to find a file is paid ONCE, rather than for every file extraction
- Faster than extracting data since
  - No disk is allocated
  - Data isn't copied
- The index allows
  - Fast access to a file's data within the tarball
  - Quick file-metadata actions (e.g., checking if all files are there, size, etc.)
[Diagram: Frame 3232]
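The indexing idea can be sketched with Python's standard `tarfile` module, which records each member's data offset while walking the headers. This is an illustration of the technique, not the MDPI implementation; the file names and the tiny demo tarball are made up.

```python
import io
import os
import tarfile
import tempfile

def build_index(tar_path):
    """Walk the tar headers once and map member name -> (data offset, size)."""
    index = {}
    with tarfile.open(tar_path, "r:") as tar:      # "r:" = uncompressed tar only
        for member in tar:                          # reads headers, skips over data
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index

def read_member(tar_path, index, name):
    """Fetch one file's bytes by seeking straight to its recorded offset."""
    offset, size = index[name]
    with open(tar_path, "rb") as fh:
        fh.seek(offset)
        return fh.read(size)

# Demo: build a tiny tarball with a manifest and one fake frame.
workdir = tempfile.mkdtemp()
tar_path = os.path.join(workdir, "object.tar")
with tarfile.open(tar_path, "w") as tar:
    for name, payload in [("manifest.txt", b"demo"),
                          ("frames/0000001.dpx", b"\x00" * 2048)]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

index = build_index(tar_path)
frame = read_member(tar_path, index, "frames/0000001.dpx")
print(sorted(index), len(frame))
```

Once the index exists, presence and size checks need no further reads of the tarball, and any single frame can be fetched with one seek.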

Multithreading automatic QC on preservation master (Stronger)
- Creates the tarball index
- Verify that all of the expected file names are there
- Verify the metadata files
- Verify the manifest (72 checksum threads concurrently)
  - Files aren't extracted: checksum computed in memory
- Check DPX metadata on sample set (72 frames concurrently)
  - Less than 1% are pulled, but pulled from all over the film
  - Verifies frame format, position in film, etc.
- Validation usually takes 25-50% of the film's runtime
[Diagram: Thread 1 through Thread 7 working in parallel]
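A minimal sketch of concurrent manifest verification without extraction, using a thread pool that hashes each member directly from its offset in the tarball. The checksum algorithm (SHA-256 here), the manifest layout (a name-to-digest mapping), and all names are assumptions for illustration; the slides do not specify them.

```python
import hashlib
import io
import os
import tarfile
import tempfile
from concurrent.futures import ThreadPoolExecutor

def member_digest(tar_path, offset, size, chunk=1 << 20):
    """Hash one member in memory by reading its byte range from the tarball."""
    digest = hashlib.sha256()          # assumption: the real manifest may use another algorithm
    with open(tar_path, "rb") as fh:
        fh.seek(offset)
        remaining = size
        while remaining > 0:
            block = fh.read(min(chunk, remaining))
            if not block:
                break
            digest.update(block)
            remaining -= len(block)
    return digest.hexdigest()

def verify_manifest(tar_path, manifest, workers=72):
    """Return (missing names, names whose checksum does not match)."""
    with tarfile.open(tar_path, "r:") as tar:
        index = {m.name: (m.offset_data, m.size) for m in tar if m.isfile()}
    missing = sorted(set(manifest) - set(index))
    names = [n for n in manifest if n in index]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(lambda n: member_digest(tar_path, *index[n]), names)
        mismatched = sorted(n for n, d in zip(names, digests) if d != manifest[n])
    return missing, mismatched

# Demo: three fake frames, one corrupted manifest entry, one missing file.
workdir = tempfile.mkdtemp()
tar_path = os.path.join(workdir, "object.tar")
payloads = {f"frames/{i:07d}.dpx": bytes([i]) * 4096 for i in range(1, 4)}
with tarfile.open(tar_path, "w") as tar:
    for name, data in payloads.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

manifest = {n: hashlib.sha256(d).hexdigest() for n, d in payloads.items()}
manifest["frames/0000002.dpx"] = "0" * 64   # simulate a checksum mismatch
manifest["frames/0000004.dpx"] = "0" * 64   # simulate a file absent from the tarball
missing, mismatched = verify_manifest(tar_path, manifest, workers=4)
print(missing, mismatched)
```

Each worker opens its own file handle, so the threads never contend over a shared seek position.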

Film has many variations (Harder)
Aspects of the digitized file impact how derivatives should be made:
- Scanning Resolution: 2K or 4K
- Display aspect ratios: overscan & cropped
- Pixel format: Linear or Logarithmic representation
- Audio: Silent, Mono, or Stereo
- Frame rate: 24, 18, other?
- Anamorphic?
- Warp gate used?
- Film gauge: 8mm, 16mm, etc.
Too many combinations!
Image captions: The cropped and color corrected version. An overscan frame: perforations are on the left, the soundtrack is on the right, and portions of the previous and next frames are visible.

Parameterized configuration (Stronger)
- Barcode XML file parameters read by configuration code
  - Extracted directly into variables
  - Converted into other variables
- These variables are used by the automated QC

XML -> QC Variable
<ScanningResolution>2k</ScanningResolution> -> Width=2048
<SampleEncoding>Linear 10 bit</SampleEncoding> -> DPXSampleDepth=10, DPXColorSpace=RGB
<OverscanAspectRatio>1.316:1</OverscanAspectRatio> -> Height=1556
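The mapping from XML parameters to QC variables might look like the sketch below. The wrapper element `FilmParameters`, the 4K width of 4096, the rounding rule, and the fixed RGB color space are assumptions made for illustration; only the three element names and the resulting values come from the slide.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: scan widths per resolution label (only 2k -> 2048 is on the slide).
SCAN_WIDTHS = {"2k": 2048, "4k": 4096}

def qc_variables(xml_text):
    """Derive the automated-QC variables from a barcode's XML parameters."""
    root = ET.fromstring(xml_text)
    width = SCAN_WIDTHS[root.findtext("ScanningResolution").strip().lower()]
    depth = int(re.search(r"(\d+)\s*bit", root.findtext("SampleEncoding")).group(1))
    ratio_w, ratio_h = (float(p) for p in root.findtext("OverscanAspectRatio").split(":"))
    height = round(width / (ratio_w / ratio_h))   # assumption: round to nearest pixel
    return {
        "Width": width,
        "Height": height,
        "DPXSampleDepth": depth,
        "DPXColorSpace": "RGB",                   # assumption: constant for these scans
    }

sample = """<FilmParameters>
  <ScanningResolution>2k</ScanningResolution>
  <SampleEncoding>Linear 10 bit</SampleEncoding>
  <OverscanAspectRatio>1.316:1</OverscanAspectRatio>
</FilmParameters>"""

variables = qc_variables(sample)
print(variables)
```

For the sample above this yields Width=2048, Height=1556, and DPXSampleDepth=10, matching the slide.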

Processing time varies greatly (Harder)
- Different types of objects process at different rates
  - From fastest to slowest: Audio, non-VHS Video, VHS Video, Film
- The duration makes a huge difference
  - A wax cylinder is much shorter than a 2 hour DAT
  - A commercial VHS tape is 2 hours, many home-made ones are 6 hours
  - Films vary from 5 minutes to 50 minutes
- Each transcoder will load up objects until all CPUs are allocated
- Problem: Mixing short and long objects ties up the whole machine
- Transfer times can cause the rates to vary wildly

Machine queue scheduling
Originally, each transcoding machine had a single queue that could accommodate 3 objects concurrently. It had to wait until the longest object was done before starting the next ones.
[Diagram: Xcode-05 timeline with labels Start time, First re-queue, Second re-queue, Finished, Idle CPUs!]

Lane-based Queues (Better)
- Each machine now has multiple lanes that are queued independently
- Lane-based queues have been added to all transcoders, so A/V can also take advantage of them
- For VHS this has been a boon because a 6-hour tape will not clog up the system
[Diagram: lanes Xcode-05_A, Xcode-05_B, Xcode-05_C with independent re-queues]
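The difference between the original single queue and independent lanes can be shown with a toy model. The job durations are invented, and real transcodes also compete for CPUs and transfer bandwidth, so this is only a simplified sketch of the scheduling idea.

```python
import heapq

def batch_makespan(durations, slots=3):
    """Single queue: start `slots` objects, wait for the longest, then re-queue."""
    total = 0
    for i in range(0, len(durations), slots):
        total += max(durations[i:i + slots])
    return total

def lane_makespan(durations, lanes=3):
    """Independent lanes: each lane takes the next object as soon as it is free."""
    free_at = [0] * lanes
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)        # earliest lane to become free
        heapq.heappush(free_at, start + duration)
    return max(free_at)

# Processing times in minutes; one long object mixed with short ones.
jobs = [360, 20, 30, 25, 40, 15, 120, 10, 35]
print(batch_makespan(jobs), lane_makespan(jobs))
```

With these made-up durations the batch queue finishes after 520 minutes, while the lanes finish after 360, because the short objects keep flowing past the long one.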

More hardware for Film transcodes (Stronger)
- A/V Transcoders (4)
  - Lenovo x3650m4, 48 CPU Threads, 128G RAM, 1.5T Scratch
- Three transcoder systems were added for film
  - Dell r730, 72 CPU Threads, 256G RAM, 7.3T Scratch SSD
- Each new transcoder has 5 lanes (old ones have 3)
  - 15 film transcodes simultaneously
- New transcoders used for both Film and A/V
  - 27 queue lanes, 408 CPU Threads, 1.2T RAM, 28T Scratch

Manual QC checks (Harder)
A/V
- 10% content checked
  - 1.2T per day
  - 5 days digitization weekly
  - 1 week of backlog = 6-10T
- Evaluation
  - Content transferred to workstation
  - Local tools used for checking
Film
- 100% content checked
  - 27T per day
  - 6 days of digitization weekly
  - 2 weeks of backlog > 324T
- Evaluation
  - Access content on file server
  - VidiCert needs to scan media
  - Local tools used for checking

Solutions for manual QC checks (Stronger)
- Working/backlog space
  - 324T is unaffordable!
  - Leave out preservation master
    - Normally not needed
    - Drops from 1.5T/hour to 400G/hour (2K) or 6T/hour to 1.5T/hour (4K)
    - Greatly reduces transfer times
  - 120T disk array will provide
    - Enough space for backlog
    - Space for post-production
    - Enough bandwidth for mezzanines on server
- Networking Updates
- VidiCert Servers
  - Two r730 w/gpu, 64G, small SSD
  - Running Windows Server 2012r2
  - Mounts storage via Samba
- Workflow optimization
  - QC Staff pass/fail by moving folder
  - Work space for exceptional conditions

Current derivatives unsuitable (Harder)
- Video assets in MDPI are NTSC video
  - NTSC quality is questionable, VHS even more so
  - 10 million pixels/second
- Film looks better
  - Outside of physical damage, quality can be very good
  - 75 million pixels/second (2K), 302 million pixels/second (4K)
  - Must be suitable for projection
Image caption: A VHS screen shot showing interlace combing and bottom of frame distortion (David Byrne in a 1985 Chrysler LeBaron).

Higher-quality Film derivatives (Better)
- Low quality derivative is the same, to allow streaming over poor networks
- Medium quality is the same resolution, but a higher bitrate leads to a better quality picture
- High quality has a higher resolution and double the bitrate
  - 50% more pixels than video
- Table is for a 4:3 film
  - Other ratios retain height and use the computed width

Quality   Setting      Video      Film
Low       Resolution   480x360    480x360
Low       Bitrate      500Kb/s    500Kb/s
Medium    Resolution   640x480    640x480
Medium    Bitrate      1Mb/s      2Mb/s
High      Resolution   960x720    1200x900
High      Bitrate      2Mb/s      4Mb/s
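For films that are not 4:3, the rule of retaining the height and computing the width could be sketched as follows. The heights come from the Film column of the table; rounding the width up to an even number is an assumption (many video encoders prefer even dimensions), and the function name is hypothetical.

```python
from fractions import Fraction

# Heights of the film derivatives from the table above.
FILM_HEIGHTS = {"low": 360, "medium": 480, "high": 900}

def derivative_size(quality, aspect):
    """Keep the table's height and compute the width from the aspect ratio."""
    height = FILM_HEIGHTS[quality]
    width = round(height * aspect)
    width += width % 2            # assumption: bump odd widths up to the next even value
    return width, height

print(derivative_size("high", Fraction(4, 3)))    # the 4:3 case from the table
print(derivative_size("high", Fraction(16, 9)))   # a wider film keeps the 900 height
```

The 4:3 case reproduces the 1200x900 entry in the table, and a 16:9 film at the same height comes out at 1600x900.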

Post-production activities
- Film staff need preservation file
  - Film restoration/clean up
  - Editing
  - Specific quality troubleshooting
- New born-digital content
  - From modifications above
  - New packages stored in SDA
- Automated Transcoding
  - Dropbox-based on QC server
  - Several formats available: ProRes mezzanine, Digital Cinema Package, DVD Quality
- Automated SDA ingest
  - Dropbox-based on QC server

Little surprises
- Aspect ratio precision issues
  - Given a ratio of 4:3, the height is 2048 / (4 / 3) = 1536
  - XML file specified 1.33:1, so 2048 / (1.33 / 1) = 1539
- Scanning device issues
  - Additional audio inserted into the soundtrack (7kHz noise)
  - Frame images having different shades on right/left halves
- Scanning software issues
  - Pops in soundtrack added due to audio alignment issues
  - Misc. format issues (aspect ratio metadata, DPX frame position, etc.)
Image caption: Right side of this frame is slightly more green than the left. The vertical line is the boundary between the two CCDs in the scanner.
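The aspect ratio surprise is easy to reproduce. The sketch below computes the height both ways; truncating to an integer is an assumption chosen because it matches the 1539 on the slide.

```python
from fractions import Fraction

def height_for(width, ratio_w, ratio_h=1):
    """Height implied by a width and an aspect ratio, truncated to whole pixels."""
    return int(width / (ratio_w / ratio_h))

exact = height_for(2048, Fraction(4, 3))   # true 4:3 ratio
rounded = height_for(2048, 1.33)           # ratio as written in the XML: 1.33:1

print(exact, rounded, rounded - exact)
```

Two decimal places in the ratio shift the expected height by 3 pixels, enough to fail an exact height check.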

Where are we now?
- Since film started, we've ingested 2PB every 3 months.
- If these trends continue...aaaay!
[Chart: Storage in TB by month, 2015/06 through 2018/04, for A/V, Film, and Both. Annotations: 1st PB in 12mo, 2nd PB in 7mo, 3rd PB in 5mo, 1st PB in 3mo]

What's next?
- A/V
  - A few new formats still coming (DVD-R)
  - Bulk digitization may wrap up by the end of the year
- Film
  - Workflow and processing improvements
  - Troubleshooting
- Both
  - SDA updates for end-to-end data integrity: throughput increase!
  - Off-site third copy of data

Thank You! Questions? Harder Better Faster Stronger