R&D White Paper WHP 119
September 2005

Mezzanine Compression for HDTV
R.T. Russell

Research & Development
BRITISH BROADCASTING CORPORATION
Abstract

The 1080p50 and 1080p60 standards are ideal for use as studio acquisition formats; however, a bit-rate of 3 Gb/s is required to transport them in an uncompressed form. A compression method is described which makes it possible to encode a 1080p50 or 1080p60 signal into a format that is compatible with the 1080i standard and suitable for transport via HD-SDI at 1.5 Gb/s. A novel feature is that the compressed signal is recognisable when displayed as 1080i, for ease of monitoring and identification. When that feature is not required a compression ratio of 2.5:1 is achievable.

This document was originally published in the Proceedings of the International Broadcasting Convention, September 2005.

© BBC 2005. All rights reserved.
White Papers are distributed freely on request. Authorisation of the Chief Scientist is required for publication.

Except as provided below, no part of this document may be reproduced in any material form (including photocopying or storing it in any medium by electronic means) without the prior written permission of BBC Research & Development except in accordance with the provisions of the (UK) Copyright, Designs and Patents Act 1988.

The BBC grants permission to individuals and organisations to make copies of the entire document (including this copyright notice) for their own internal use. No copies of this document may be published, distributed or made available to third parties whether by paper, electronic or other means without the BBC's prior written permission. Where necessary, third parties should be directed to the relevant page on BBC's website at http://www.bbc.co.uk/rd/pubs/whp for a copy of this document.
MEZZANINE COMPRESSION FOR HDTV

R. T. Russell

BBC Research & Development Department, U.K.

INTRODUCTION

The 1080p50 and 1080p60 standards are ideal for use as studio acquisition formats, because they combine a high spatial resolution (1920 pixels × 1080 lines) with a high temporal resolution (50 or 60 frames per second, progressive scan). In addition they can be easily converted to either of the common transmission formats, 1080i and 720p. However, these standards require approximately twice the bit-rate of the transmission formats to carry them in an uncompressed form (3 Gb/s rather than 1.5 Gb/s).

Various proposals have been made for dealing with this problem, including dual-link HD-SDI (multiplexing the 3 Gb/s data between two 1.5 Gb/s circuits), a 3 Gb/s Serial Digital Interface standard and the use of high-speed data over twisted-pair cabling.

This paper describes an alternative approach involving mild compression (bit-rate reduction), making it possible to encode a 1080p50/60 signal into a format that is compatible with the 1080i standard and therefore suitable for transport via HD-SDI at 1.5 Gb/s. Such a signal could be carried by existing HD cabling and routeing infrastructures in a broadcast centre, and would have the additional convenience of permitting the use of existing embedded audio and ancillary data standards within the signal.

This work is the subject of two patent applications (Ref. 1).
CHARACTERISTICS

The proposed compression algorithm has a number of properties which are desirable for this application:

• Low-delay coding and decoding (total codec delay 8 lines) eliminates picture-sound synchronization problems.
• Low-loss compression provides visually near-perfect reproduction.
• Multi-generation compression adds negligible additional loss.
• Small picture blocks limit the propagation of errors and avoid spatial cross-modulation (e.g. noise in one part of a picture affecting another part).
• Simple algorithm should be relatively easy to implement in hardware.
In addition the proposed algorithm provides a means to make the compressed data recognizable if it is displayed as a standard 1080i signal. This is achieved by encoding a coarsely quantized, interlaced, version of the picture in the most-significant bits of the compressed signal. Although this compatible picture is noisy it is good enough to allow identification and monitoring using existing HDTV displays.

COMPRESSION PROCESS

Block Structure

Each frame is divided into intra-coded macroblocks of 16 (luminance) pixels × 4 TV lines. Each macroblock contains 64 luminance pixels, 32 Cb chrominance pixels and 32 Cr chrominance pixels. For a 1080p frame this results in 120 × 270 = 32400 macroblocks in total. The height of the macroblock determines the minimum delay through the coder or decoder (i.e. four lines).

Integer Transforms

Each macroblock is subdivided into 4-pixel × 4-line transform blocks, so in each macroblock there are 4 luminance transform blocks, 2 Cb transform blocks and 2 Cr transform blocks. Each 4×4 block is transformed into frequency space using the following Integer transform R = TXT^T (Ref. 2):

$$
\begin{bmatrix}
r_{00} & r_{10} & r_{20} & r_{30} \\
r_{01} & r_{11} & r_{21} & r_{31} \\
r_{02} & r_{12} & r_{22} & r_{32} \\
r_{03} & r_{13} & r_{23} & r_{33}
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 \\
2 & 1 & -1 & -2 \\
1 & -1 & -1 & 1 \\
1 & -2 & 2 & -1
\end{bmatrix}
\begin{bmatrix}
x_{00} & x_{10} & x_{20} & x_{30} \\
x_{01} & x_{11} & x_{21} & x_{31} \\
x_{02} & x_{12} & x_{22} & x_{32} \\
x_{03} & x_{13} & x_{23} & x_{33}
\end{bmatrix}
\begin{bmatrix}
1 & 2 & 1 & 1 \\
1 & 1 & -1 & -2 \\
1 & -1 & -1 & 2 \\
1 & -2 & 1 & -1
\end{bmatrix}
\qquad (1)
$$

For the purposes of the computation the 10-bit input pixels (luminance and chrominance) are assumed to be signed values in the range -512 to +511 so zero luminance is mid grey; this is the convention used in BBC R&D's Kingswood format picture files. The transform has a maximum gain of 36.

Scaling

The 4×4 outputs from the Integer transforms are scaled and rounded according to the following algorithm:

$$
s_{ij} = \operatorname{Sign}(r_{ij}) \times \big( (\operatorname{Abs}(r_{ij} \times m_{ij}) + 32768) \gg 16 \big) \qquad (2)
$$

where:

$$
\begin{bmatrix}
m_{00} & m_{10} & m_{20} & m_{30} \\
m_{01} & m_{11} & m_{21} & m_{31} \\
m_{02} & m_{12} & m_{22} & m_{32} \\
m_{03} & m_{13} & m_{23} & m_{33}
\end{bmatrix}
=
\begin{bmatrix}
16384 & 10486 & 16384 & 10486 \\
10486 & 6711 & 10486 & 6711 \\
16384 & 10486 & 16384 & 10486 \\
10486 & 6711 & 10486 & 6711
\end{bmatrix}
\qquad (3)
$$

Note that the multipliers are approximations to the following constants (Ref. 2):

$$
16384 = 2^{22}/16/16, \qquad 10486 \approx 2^{22}/20/20, \qquad 6711 \approx 2^{22}/25/25
$$
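The transform and scaling stages can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the author's implementation: the function names are hypothetical, blocks are held as plain nested lists indexed [row][column], and the multiplier matrix is taken to be the symmetric array of equation (3).

```python
# Minimal sketch of the 4x4 integer transform R = T X T^T and the
# scaling of equation (2). Names and list layout are illustrative assumptions.

T = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]

# Multipliers from equation (3).
M = [[16384, 10486, 16384, 10486],
     [10486, 6711, 10486, 6711],
     [16384, 10486, 16384, 10486],
     [10486, 6711, 10486, 6711]]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def integer_transform(x):
    """Return R = T X T^T for a 4x4 block of signed pixel values."""
    return matmul(matmul(T, x), transpose(T))


def scale(r):
    """Apply Sign(r) * ((Abs(r * m) + 32768) >> 16) to every coefficient."""
    def one(value, multiplier):
        magnitude = (abs(value * multiplier) + 32768) >> 16
        return magnitude if value >= 0 else -magnitude
    return [[one(r[i][j], M[i][j]) for j in range(4)] for i in range(4)]


# A flat block has energy only in the DC term: 16 * 100 = 1600, scaled to 400.
flat = [[100] * 4 for _ in range(4)]
flat_r = integer_transform(flat)
flat_s = scale(flat_r)

# Worst case for the (1, 1) output: pixel signs follow the [2, 1, -1, -2] row.
signs = [1, 1, -1, -1]
worst = [[511 * signs[i] * signs[j] for j in range(4)] for i in range(4)]
worst_r = integer_transform(worst)   # worst_r[1][1] is 36 * 511
```

The second test block reproduces the stated maximum gain of 36, which arises because the absolute values in the row [2, 1, -1, -2] sum to 6 in each dimension.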
Hadamard Transforms

The DC terms ($s_{00}$) of the scaled Integer transforms, that is four values per macroblock from the luminance and two values each per macroblock from Cb and Cr, are further processed using 1-dimensional Hadamard transforms as follows:

Luminance:

$$
\begin{aligned}
h_a &= s_a + s_b + s_c + s_d \qquad (4) \\
h_b &= s_a + s_b - s_c - s_d \qquad (5) \\
h_c &= s_a - s_b + s_c - s_d \qquad (6) \\
h_d &= s_a - s_b - s_c + s_d \qquad (7)
\end{aligned}
$$

Chrominance:

$$
\begin{aligned}
h_a &= s_a + s_b \qquad (8) \\
h_b &= s_a - s_b \qquad (9)
\end{aligned}
$$

These transforms exploit the probable correlation between the DC terms in adjacent transform blocks, and result in three aggregated DC values per macroblock (Y, Cb, Cr) which are sent, uncompressed, as 10-bit numbers in the compressed data. This results in the DC terms being accurately reproduced at the decoder and reduces block artefacts.

Quantisation

To control the overall bit rate, thus ensuring that the compressed signal fits into the available data capacity, a variable degree of quantising is applied independently to each macroblock. The quantising is applied to all the non-zero transform coefficients in the macroblock, luminance and chrominance, except the three aggregated DC terms $h_a$ (Y, Cb and Cr).

Since the only purpose is to control the bit rate, it is undesirable to apply any ineffective quantising, i.e. the addition of quantisation noise without any consequent reduction in the total number of coded bits. This is a shortcoming of the quantising schemes used by some other compression techniques. To avoid this shortcoming the following constraints are imposed:

1. Quantisation consists always of bit shifts, i.e. divisions by powers of two.
2. The variable-length code chosen to encode the quantised coefficients has the property that dividing the coefficient value by two is guaranteed to result in a reduction in the number of coded bits.

The combination of these characteristics means that whenever an additional degree of quantisation is applied (i.e. a coefficient is divided by a greater power of two) the total number of bits is guaranteed to fall.
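As a brief illustration of the Hadamard stage in equations (4) to (9), the following Python sketch computes the aggregated DC terms. The function names are hypothetical. The decoder side is not described in the paper; the sketch only notes the mathematical fact that applying the same butterfly twice returns the inputs multiplied by 4 (luminance) or 2 (chrominance).

```python
# Minimal sketch of the 1-dimensional Hadamard transforms applied to the
# DC terms of one macroblock. Function names are illustrative assumptions.

def hadamard_luma(sa, sb, sc, sd):
    """Equations (4) to (7): four luminance DC terms in, (ha, hb, hc, hd) out."""
    return (sa + sb + sc + sd,
            sa + sb - sc - sd,
            sa - sb + sc - sd,
            sa - sb - sc + sd)


def hadamard_chroma(sa, sb):
    """Equations (8) and (9): two chrominance DC terms in, (ha, hb) out."""
    return (sa + sb, sa - sb)


luma = hadamard_luma(1, 2, 3, 4)
chroma = hadamard_chroma(5, 3)

# The transform matrix is symmetric with orthogonal rows, so a second pass
# yields the original values times 4 (luminance) or 2 (chrominance).
luma_twice = hadamard_luma(*luma)
chroma_twice = hadamard_chroma(*chroma)
```

With four equal DC terms only the first output is non-zero, which is the correlation between adjacent blocks that the transform is intended to exploit.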
Restricting quantisation to bit shifts also helps achieve the property of adding very little or no loss on a second or subsequent generation.

The simplest approach to such quantising would be to divide all (non-zero) coefficients in the macroblock by the same power of two, and signal that number to the decoder. However, such a coarse control could result in a much greater degree of quantisation noise than necessary to meet the bit budget.
What is needed is a method of quantising different coefficients to different degrees, without the overhead of signalling additional information to the decoder. The way this is achieved is as follows:

1. The degree of quantisation is controlled by a 5-bit unsigned number q, taking a value from 0 to 31.

2. The most-significant 3 bits of q, being a value in the range 0 to 7, determine a pair of powers of two by which each coefficient in a macroblock may be divided, as follows:

   q (3 MSBs)   Divisors
   0            divide by 1 or 2
   1            divide by 2 or 4
   2            divide by 4 or 8
   3            divide by 8 or 16
   4            divide by 16 or 32
   5            divide by 32 or 64
   6            divide by 64 or 128
   7            divide by 128 or 256

   Table 1 - Coarse divisor selection

3. The least-significant 2 bits of q, in conjunction with the value of the coefficient itself, determine which of the two possible divisors is used. A three bit number in the range 4 to 7 is derived from the coefficient value as follows:

   (a) Take the absolute value of the coefficient.
   (b) If the value is greater than 7 keep dividing by two until 7 or less.
   (c) If the value is less than 4 keep multiplying by two until 4 or more.

   The following table determines whether the smaller or larger divisor is used, where the rows correspond to the least-significant 2 bits of q and the columns to the number derived from the coefficient:

   q (2 LSBs)   4         5         6         7
   0            smaller   smaller   smaller   smaller
   1            larger    smaller   smaller   smaller
   2            larger    larger    smaller   smaller
   3            larger    larger    larger    smaller

   Table 2 - Fine divisor selection

So if the least-significant 2 bits of q are zero the smaller of the two divisors is always used, but otherwise which divisor is used depends on the value of the coefficient. In this way fine control over the degree of quantisation is achieved, such that the equivalent quantisation factor doubles for every increment of 4 in q.
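The divisor selection of Tables 1 and 2 can be sketched in Python as below. This is a simplified illustration of the scheme exactly as tabulated, with hypothetical function names; it assumes that "dividing by two" in step (b) means integer halving, and it does not model rounding or the decoder-side ambiguity.

```python
# Minimal sketch of divisor selection from q and the coefficient value.
# Assumes integer halving in step (b); rounding is deliberately not modelled.

def derived_number(coeff):
    """Steps (a) to (c): map a non-zero coefficient to a number in 4..7."""
    n = abs(coeff)
    if n == 0:
        raise ValueError("only non-zero coefficients are quantised")
    while n > 7:
        n >>= 1          # keep dividing by two until 7 or less
    while n < 4:
        n <<= 1          # keep multiplying by two until 4 or more
    return n


def select_divisor(coeff, q):
    """Return the power-of-two divisor chosen for this coefficient."""
    if not 0 <= q <= 31:
        raise ValueError("q is a 5-bit unsigned number")
    coarse = q >> 2      # most-significant 3 bits (Table 1)
    fine = q & 3         # least-significant 2 bits (Table 2)
    smaller = 1 << coarse
    larger = smaller << 1
    # Table 2: the larger divisor applies to derived numbers below 4 + fine.
    return larger if derived_number(coeff) - 4 < fine else smaller


example = [select_divisor(c, 13) for c in (4, 5, 100, -9)]
```

For q = 13 the coarse bits select the pair 8 or 16 and the fine bits equal 1, so only coefficients whose derived number is 4 (here 4 and -9) take the larger divisor.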
In practice this is complicated by two factors:

1. To minimise the added quantisation noise the coefficient values are rounded rather than simply truncated.
2. The decoder sees only the quantised value, which may not necessarily give rise to the same three-bit number (4-7) as that derived from the original coefficient.

As a result the actual algorithm used is slightly more complex.

The following process is used to determine the appropriate value of q to use for each macroblock:

(a) Initialise q to zero, corresponding to minimum quantising.
(b) Quantise all the (non-DC) coefficients in the macroblock according to the value of q, where the most-significant three bits of q determine a pair of possible divisors and the least-significant two bits of q (in association with the coefficient value) determine which of the two divisors to use.
(c) Determine the total number of bits needed to code the quantised macroblock (see below).
(d) If the number of bits exceeds the available capacity (512 bits) increment q.
(e) Repeat (b), (c) and (d) until the coded macroblock fits in the available space.

Variable Length Coding

The quantised coefficients (except the three DC terms) are coded according to the following signed Exp-Golomb code (Ref. 3):

   Value   Code         Value   Code
   0       1            7       0001110
   1       010          -7      0001111
   -1      011          8       000010000
   2       00100        -8      000010001
   -2      00101        9       000010010
   3       00110        -9      000010011
   -3      00111        10      000010100
   4       0001000      -10     000010101
   -4      0001001      11      000010110
   5       0001010      -11     000010111
   -5      0001011      12      000011000
   6       0001100      -12     000011001
   -6      0001101      etc.    etc.

   Table 3 - Signed Exp-Golomb code
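The code of Table 3 and the search in steps (a) to (e) can be sketched in Python as follows. The sketch is an illustration only: the function names are hypothetical, the capacity is left as a parameter, and `coarse_only_quantise` is a deliberately simplified stand-in (truncating shift by the three most-significant bits of q) rather than the rounding quantiser the paper alludes to.

```python
# Minimal sketch of the signed Exp-Golomb code and the search for q.
# coarse_only_quantise is a simplified stand-in, not the paper's quantiser.

def signed_exp_golomb(k):
    """Return the Table 3 code word for integer k as a string of bits."""
    code_num = 2 * abs(k) - (1 if k > 0 else 0)   # 0, +1, -1, +2, ... -> 0, 1, 2, 3, ...
    binary = bin(code_num + 1)[2:]
    return "0" * (len(binary) - 1) + binary


def coarse_only_quantise(coeff, q):
    """Stand-in quantiser: truncating shift by the 3 most-significant bits of q."""
    magnitude = abs(coeff) >> (q >> 2)
    return magnitude if coeff >= 0 else -magnitude


def choose_q(coeffs, quantise, capacity):
    """Steps (a) to (e): return the smallest q whose coded size fits."""
    for q in range(32):
        quantised = [quantise(c, q) for c in coeffs]
        bits = sum(len(signed_exp_golomb(v)) for v in quantised)
        if bits <= capacity:
            return q, quantised, bits
    raise ValueError("macroblock does not fit even at q = 31")


q, quantised, bits = choose_q([100] * 60, coarse_only_quantise, 512)
```

Each code word for a non-zero value of magnitude between 2^n and 2^(n+1) - 1 is 2n + 3 bits long and zero takes one bit, so a truncating halving always shortens the code word, which is the property the quantiser relies on.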
Packet Structure

Each macroblock is encoded into a data packet of 512 bits, corresponding to a 2.5:1 compression compared to the original 10-bits-per-pixel input. The value of q selected for a macroblock is the smallest that results in the size of the compressed data not exceeding 512 bits. The structure of each packet is as follows:

1. Quantisation code q, 5 bits, MSB first.
2. DC luminance coefficient divided by 16 and rounded, 10 bits, MSB first.
3. DC Cb chrominance coefficient divided by 8 and rounded, 10 bits, MSB first.
4. DC Cr chrominance coefficient divided by 8 and rounded, 10 bits, MSB first.
5. Remaining luminance coefficients, signed Exp-Golomb coded, in the order shown below:

    Y  4  8 12 | 16 20 24 28 | 32 36 40 44 | 48 52 56 60
    1  5  9 13 | 17 21 25 29 | 33 37 41 45 | 49 53 57 61
    2  6 10 14 | 18 22 26 30 | 34 38 42 46 | 50 54 58 62
    3  7 11 15 | 19 23 27 31 | 35 39 43 47 | 51 55 59 63

   Table 4 - Luminance coefficient ordering

6. Remaining chrominance coefficients, signed Exp-Golomb coded, in the order shown below:

   Cb  3  7 11 | 15 19 23 27 | 31 35 39 43 | 47 51 55 59
    1  5  9 13 | 17 21 25 29 | 33 37 41 45 | 49 53 57 61
   Cr  4  8 12 | 16 20 24 28 | 32 36 40 44 | 48 52 56 60
    2  6 10 14 | 18 22 26 30 | 34 38 42 46 | 50 54 58 62

   Table 5 - Chrominance coefficient ordering

Trailing zero coefficients in the packet are not counted when determining whether the data fits in the available space. Such coefficients need not be stored if the packet is already full (they would otherwise have needed one extra bit of capacity per coefficient). It is only for this reason that the order of the coefficients is significant. In particular where there is no chrominance (all chrominance coefficients are zero) the entire data packet is available for coded luminance.

It may be beneficial to fill any unused data bits with random values, to minimise the visibility of the data on the compatible signal.
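The trailing-zero rule can be made concrete with a short Python sketch of the size check. It assumes, as one reading of the packet structure, that the 35 header bits (q plus three 10-bit DC terms) count towards the 512-bit limit; the function names are hypothetical and the coefficient lists are taken to be already in the transmission order of Tables 4 and 5.

```python
# Minimal sketch of the packet size check with trailing zeros ignored.
# Assumes the 5-bit q and three 10-bit DC terms count towards the 512 bits.

PACKET_BITS = 512
HEADER_BITS = 5 + 10 + 10 + 10


def code_length(k):
    """Length in bits of the signed Exp-Golomb code word for k."""
    code_num = 2 * abs(k) - (1 if k > 0 else 0)
    return 2 * (code_num + 1).bit_length() - 1


def packet_bits(luma, chroma):
    """Bits needed for one macroblock, luminance first, then chrominance."""
    coeffs = list(luma) + list(chroma)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()                  # trailing zeros are not counted
    return HEADER_BITS + sum(code_length(k) for k in coeffs)


def fits(luma, chroma):
    return packet_bits(luma, chroma) <= PACKET_BITS


no_chroma = packet_bits([3] * 63, [0] * 62)        # chrominance costs nothing
last_only = packet_bits([0] * 63, [0] * 61 + [1])  # zeros before a value do count
```

In the first example the 62 zero chrominance coefficients cost no bits at all, which is the case the paper highlights; in the second, a single non-zero value in the last position forces every preceding zero to be coded at one bit each.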
When used to compress a 1080p50 or 1080p60 input to a 1080i-compatible output, the data packet is transported in the least-significant 8 bits of the (10-bit) video, requiring 64 consecutive video samples (32 luminance and 32 chrominance). Therefore 60 packets can be contained within a single TV-line of interlaced output. The first bit of the data packet is the most-significant bit of the first 8-bit value.
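The capacity figures quoted above can be cross-checked with a few lines of Python arithmetic. The variable names are illustrative; the only external fact assumed is that an active 1080i line carries 1920 luminance and 1920 chrominance samples (960 Cb plus 960 Cr).

```python
# Worked check of the macroblock, packet and line capacity figures.

WIDTH, HEIGHT = 1920, 1080           # active picture, pixels x lines
MB_WIDTH, MB_HEIGHT = 16, 4          # macroblock size in luminance pixels
PACKET_BITS = 512
DATA_BITS_PER_SAMPLE = 8             # least-significant 8 of each 10-bit sample

macroblocks_per_frame = (WIDTH // MB_WIDTH) * (HEIGHT // MB_HEIGHT)
samples_per_macroblock = 64 + 32 + 32            # Y + Cb + Cr
raw_bits_per_macroblock = samples_per_macroblock * 10
compression_ratio = raw_bits_per_macroblock / PACKET_BITS

samples_per_packet = PACKET_BITS // DATA_BITS_PER_SAMPLE
samples_per_line = WIDTH + WIDTH // 2 + WIDTH // 2   # Y + Cb + Cr per line
packets_per_line = samples_per_line // samples_per_packet
lines_per_frame = macroblocks_per_frame // packets_per_line

print(macroblocks_per_frame, compression_ratio, packets_per_line, lines_per_frame)
```

The last figure, 540 output lines per progressive input frame, equals the number of active lines in one 1080i field, so each progressive frame occupies one interlaced field of the carrier signal.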
Compatible Picture

The two most-significant bits of the 10-bit output can be used to carry a compatible interlaced version of the input picture, so that if the compressed signal is viewed as if it were standard 1080i video the content will be recognisable, although noisy. This makes it possible to monitor the signal for the purposes of identification and to give confidence that the coder is working. The steps to achieve this are as follows:

1. An interlaced version of the input is created, typically by averaging pairs of consecutive progressive lines. The phase of the line pairing must be different on even fields and odd fields in order to achieve the required interlaced structure.

2. Each 10-bit pixel value is modified by subtracting the 8-bit value of the coded data packet corresponding to that pixel's position and adding a two-dimensional halftone dither. After the addition and subtraction the result must be limited to a valid range. The dither consists of a repeating 8×8 pattern of values as follows:

     0  128   32  160    8  136   40  168
   192   64  224   96  200   72  232  104
    48  176   16  144   56  184   24  152
   240  112  208   80  248  120  216   88
    12  140   44  172    4  132   36  164
   204   76  236  108  196   68  228  100
    60  188   28  156   52  180   20  148
   252  124  220   92  244  116  212   84

   Table 6 - 2D halftone values

   The dither is added independently to the luminance, Cb chrominance and Cr chrominance components, where the row selected is determined by the line number evaluated modulo-8, and the column selected is determined by the pixel number (0-1919 for luminance, 0-959 for chrominance) evaluated modulo-8.

3. The least-significant 8 bits of the resulting pixel value are discarded and replaced by the 8-bit value from the coded data packet.

4. If the final 10-bit value is a forbidden TRS code (Ref. 4), the most-significant two bits are modified:

   00 000000 xx is modified to 01 000000 xx
   11 111111 yy is modified to 10 111111 yy
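Steps 2 to 4 can be sketched for a single sample in Python. This is a hedged illustration rather than the author's implementation: the function name is hypothetical, the interlaced pixel is assumed to be an unsigned 10-bit value, and the "valid range" of step 2 is assumed to be 0 to 1023.

```python
# Minimal sketch of steps 2 to 4 for one output sample.
# Assumes unsigned 10-bit pixels and a 0..1023 limit in step 2.

DITHER = [
    [0, 128, 32, 160, 8, 136, 40, 168],
    [192, 64, 224, 96, 200, 72, 232, 104],
    [48, 176, 16, 144, 56, 184, 24, 152],
    [240, 112, 208, 80, 248, 120, 216, 88],
    [12, 140, 44, 172, 4, 132, 36, 164],
    [204, 76, 236, 108, 196, 68, 228, 100],
    [60, 188, 28, 156, 52, 180, 20, 148],
    [252, 124, 220, 92, 244, 116, 212, 84],
]


def compatible_sample(pixel, data, line, column):
    """Combine an interlaced 10-bit pixel with one 8-bit byte of packet data."""
    # Step 2: subtract the data byte, add the halftone dither, then limit.
    value = pixel - data + DITHER[line % 8][column % 8]
    value = max(0, min(1023, value))
    # Step 3: keep the top two bits and replace the low eight with the data.
    out = (value & 0x300) | (data & 0xFF)
    # Step 4: avoid the forbidden codes whose top eight bits are all 0 or all 1.
    top8 = out >> 2
    if top8 == 0x00:
        out |= 0x100      # 00 000000 xx -> 01 000000 xx
    elif top8 == 0xFF:
        out &= 0x2FF      # 11 111111 yy -> 10 111111 yy
    return out


mid_grey = compatible_sample(512, 0, 0, 0)
black = compatible_sample(0, 0, 0, 0)
white = compatible_sample(1023, 255, 0, 0)
```

Subtracting the data byte before quantising compensates for the value that the data will contribute once it occupies the low eight bits, so the displayed level stays close to the intended one on average.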
The effect of the halftone dither can be seen from the following illustrations (magnified so the individual pixels are visible):

Figure 1 - Original linear ramp
Figure 2 - Ramp quantised to two bits
Figure 3 - Ramp quantised to two bits with 2D halftone dither
Figure 4 - Quantised ramp with dither and random data in LSBs

DECOMPRESSION PROCESS

In the interests of brevity details of the decompression process are not given in this paper, but it essentially involves reversing the steps in the compression process (apart from those concerned with generating the compatible interlaced signal and the need to determine the degree of quantisation).
TEST RESULTS

The compression method has been tested using both still pictures and progressively-scanned moving sequences. The moving sequences are more critical because quantising noise and aliasing can often be masked by detail in a still picture. The results have been assessed both subjectively and by means of PSNR measurements. The subjective tests demonstrated that, for the material used, the artefacts were generally invisible even when a difference picture was used to draw attention to them. The PSNR measurements are presented below.

   Picture                  PSNR (luminance)   PSNR (chrominance)
   Nepat PM                 55.5 dB            57.6 dB
   Kiel Harbour (1080)      51.4 dB            51.6 dB
   Test Card W              56.9 dB            60.0 dB
   Boy with toys            52.2 dB            51.9 dB
   Dick                     51.2 dB            50.8 dB
   Boat                     52.7 dB            52.3 dB
   Tree                     48.4 dB            47.6 dB
   Tree (10th generation)   48.0 dB            47.5 dB

   Table 7 - Still picture PSNR measurements

   Sequence                 PSNR (luminance)   PSNR (chrominance)
   Toys & Calendar (720)    50.5-51.1 dB       49.4-49.8 dB
   Yosemite (720)           50.1-51.1 dB       49.4-50.0 dB
   White City (720)         50.4-50.5 dB       49.4-49.5 dB
   BBCdisk                  49.0-50.0 dB       48.7-49.5 dB
   Football                 48.2-49.0 dB       47.8-48.5 dB
   Skate                    52.1-53.3 dB       51.7-53.0 dB
   Panslo                   49.9-51.5 dB       50.7-52.0 dB
   Panslo (7th gen) ¹       41.5-43.2 dB       42.9-44.3 dB

   Table 8 - Progressive sequence PSNR measurements

Figures 5 to 10 below show some sample pictures, illustrating the appearance of the compatible coded picture and the nature of the difference between the original and decoded pictures. A gain of 16 has been applied to the difference pictures.

¹ With pixel shifts between each generation.
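For readers wishing to reproduce measurements of this kind, PSNR can be computed as in the Python sketch below. The paper does not state which peak value was used; the sketch assumes the conventional definition with a peak of 1023 for 10-bit samples, and the function name is hypothetical.

```python
# Minimal sketch of a PSNR calculation over two equal-length sample sequences.
# The peak of 1023 (10-bit full scale) is an assumption, not taken from the paper.

import math


def psnr(original, decoded, peak=1023):
    """Return the peak signal-to-noise ratio in dB."""
    if len(original) != len(decoded) or not original:
        raise ValueError("sequences must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf              # identical pictures
    return 10 * math.log10(peak * peak / mse)


identical = psnr([100, 200, 300], [100, 200, 300])
one_level = psnr([0, 0, 0, 0], [1, 1, 1, 1])   # error of one level everywhere
```

Under this definition an error of one quantisation level on every sample corresponds to about 60.2 dB.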
Figure 5 - Compatible coded picture
Figure 6 - Decoded picture, 1st generation
Figure 7 - Difference picture, 1st generation (gain 16)
Figure 8 - Compatible coded picture
Figure 9 - Decoded picture, after 7 generations with pixel shifts
Figure 10 - Difference picture, after 7 generations (gain 16)
IMPLEMENTATION

It is anticipated that the suggested algorithm could be prototyped within a single inexpensive FPGA (Field Programmable Gate Array) device, certainly in the case of the decoder and probably in the case of the encoder. The low delay implies low storage requirements, which should be achievable with the internal resources of a modern FPGA. One possible implementation would be as an external dongle providing conversion from dual HD-SDI to single HD-SDI (encoder) or from single HD-SDI to dual HD-SDI (decoder).

The main practical consideration when deploying a system of this kind would be the delay, which although only 8 TV lines per codec may be significant in respect of video synchronisation, for example at the inputs of a vision mixer. It may be that some equipment could be made tolerant of delay differences of this sort at its inputs, but failing that it might be necessary to equalise the delays on different sources.

CONCLUSIONS

The compression method described shows promise in being able to achieve a 2:1 reduction in bit rate with artefacts below the threshold of visibility for typical 422 material. Further work will be required to assess the performance across a wide range of material and to determine its suitability for the suggested application.

REFERENCES

1. United Kingdom Patent Applications No. 0507105.5 and 0507106.3 (07 April 2005).
2. BS ISO/IEC 14496-10:2003. Information technology. Coding of audio-visual objects. Part 10: Advanced video coding, section 8.6.1.1.
3. BS ISO/IEC 14496-10:2003. Information technology. Coding of audio-visual objects. Part 10: Advanced video coding, section 9.1.
4. ITU-R BT.1120-4 (2003). Digital interfaces for HDTV studio signals, sections 2.1 and 2.3.

ACKNOWLEDGEMENTS

The author would like to thank the British Broadcasting Corporation for permission to publish this paper.