Picture-Quality Optimization for the High Definition TV Broadcast Chain


Technical Note PR-TN 2007/00338
Issued: 06/2007

Picture-Quality Optimization for the High Definition TV Broadcast Chain

A. Dimou; R.J. van der Vleuten; G. de Haan
Philips Research Europe

Unclassified
Koninklijke Philips Electronics N.V. 2007

Authors' address
A. Dimou WO 01
R.J. van der Vleuten WO 01
G. de Haan WO 01

KONINKLIJKE PHILIPS ELECTRONICS NV 2007. All rights reserved. Reproduction or dissemination in whole or in part is prohibited without the prior written consent of the copyright holder.

Title: Picture-Quality Optimization for the High Definition TV Broadcast Chain
Author(s): A. Dimou; R.J. van der Vleuten; G. de Haan
Reviewer(s): IPS Facilities
Technical Note: PR-TN 2007/00338
Additional Numbers:
Subcategory:
Project: Video-Codec Artefact Repair ( )
Customer:
Keywords: digital video broadcasting, HDTV, video compression, MPEG-2 compression, coding artifacts, broadcast systems, HDTV standards

Abstract:

The High Definition scene is constantly changing. The arrival of Full HD flat panel displays, the constant improvement of the AVC encoder, and the trend towards 1920x1080 progressive broadcasting have shifted the balance of the High Definition broadcasting chain. It is, therefore, necessary to revisit the TV-formats debate. This research project was set up to consider:

- End-viewer perceived quality for critical content when broadcasting 720p/50, 1080i/50, and 1080p/50.
- End-viewer perceived quality for critical content encoded with MPEG 2, VC1, and H.264/AVC at various bitrates.
- The industry trend to migrate from MPEG 2 at 18 Mbps to AVC at 12 Mbps.
- New techniques for better usage of the available bitrate.

Research during the project showed that:

- MPEG 2 and AVC have different types of artifacts, but the overall perceived quality is similar for the same bitrate. The choice depends on the viewer.
- 1080i/50 quality is highly dependent on the interlacer and de-interlacer used.
- A possible solution for better bitrate usage could be frame subsampling prior to encoding and frame interpolation at the decoder. Tests showed that it can work only for content that has smooth motion or has been temporally filtered.
- For a certain bitrate, there is an optimal tradeoff between the resolution format and the perceived quality. It is, though, highly content dependent and therefore not easily exploitable.

Conclusions:

The results of the perception test showed that:

- The quality differences between 720p, 1080i, and 1080p are not statistically significant for MPEG 2 encodings.
- 1080p and 720p are preferred to 1080i for AVC encodings.
- Viewers prefer MPEG 2 artifacts to AVC artifacts.
- Viewers prefer MPEG 2 at 18 Mbps to AVC at 12 Mbps.

Subjects that need to be investigated:

- The role of the Kell factor in the comparison between the 1080p and 720p formats.
- The simulation of the broadcast chain with less critical content.

Contents

1. Introduction
   1.1. Problem description
   1.2. Project objectives
   1.3. Project delimitations
   1.4. Report outline
2. HDTV & Broadcasting
   2.1. HDTV
   2.2. Broadcasting video chain
3. Simulations on the broadcasting video chain
   3.1. Background
      3.1.1. Scanning formats
      3.1.2. Evaluation methodology
   3.2. Progressive video chain
      3.2.1. 720p
      3.2.2. 1080p
      3.2.3. Non-Standard resolutions
      3.2.4. Frame-Rate Sub-Sampling
      3.2.5. Film content
   3.3. Interlaced video chain
      3.3.1. 1080i
4. Perception Experiment
   4.1. Introduction
   4.2. Experiment protocol
   4.3. Results
      4.3.1. CrowdRun
      4.3.2. DucksTakeOff
      4.3.3. OldTownCross
      4.3.4. All Sequences
5. Results & Conclusions
6. Discussion & Future Work
7. Acknowledgements
Bibliography
Appendix A. Hardware configuration
   A.1. Streamer
   A.2. Displays
      A.2.1. TFT-LCD Panel
      A.2.2. TV Set
Appendix B. Software configuration
   B.1. MPEG 2 - Encoder
      B.1.1. Introduction to MPEG 2
      B.1.2. Canopus Procoder 2
   B.2. H.264/AVC Encoder
      B.2.1. Introduction to H.264/AVC
      B.2.2. X.264 (r628)
      B.2.3. JM
   B.3. Scaling
      B.3.1. Scalers
      B.3.2. Scaling test
   B.4. De-interlacing
      B.4.1. De-interlacing test
Appendix C. Reference video sequences
   C.1. Video content sequences
   C.2. Video Sequences Processing
Appendix D. Baseline Document
   D.1. Phasing Plan
      D.1.1. Initiation
      D.1.2. Definition
      D.1.3. Design
      D.1.4. Preparation
      D.1.5. Realization
   D.2. Control Plan
      D.2.1. Time/Capacity
      D.2.2. Progress Control
      D.2.3. Money
      D.2.4. Quality
      D.2.5. Information
      D.2.6. Organization
      D.2.7. Risk Analysis

1. Introduction

1.1. Problem description

High-definition television (HDTV) is a product that received massive attention long before it was even available. The technology and consumer-electronics press, business and lifestyle magazines, news bulletins, and financial and virtually all other daily newspapers in the world continue to cover HDTV on a daily basis. This coverage highlights that HDTV is a complex subject: a technology story, a public policy story, a lifestyle story, science, politics, and marketing all rolled up together.

Moreover, large flat LCD screens, Plasma Display Panels (PDP), and projectors with HD resolution are becoming widely available with attractive features, at prices that the public seems able to afford. Blu-ray and HD-DVD discs, together with equipment to play and record them, are becoming available, stimulating the public appetite for HD. Even in PC displays and graphics technology, HD quality is the new trend.

Europe is now on the verge of taking the step towards HDTV, and it has a long road ahead before it actually gets there. TV creates a huge ecosystem around it, consisting of numerous parties: consumers, broadcasters, content providers, content producers, display manufacturers, acquisition hardware manufacturers, the European Commission, advertisers, and many more diverse stakeholders. European broadcasters, together with the other stakeholders, need to analyze three core technical issues when they plan a route to HD: the platform to use for delivering the HD package, the scanning format to use for the delivery channel, and the compression system to use for delivering HD. The debates on these issues are still quite heated.

The competing resolution formats are 1280x720 and 1920x1080, scanned in either progressive or interlaced format. 1280x720 is the lowest resolution defined as HD.
The European Broadcasting Union (EBU) proposes it in its progressive format, 720p, but to European consumers who have experienced PAL (720x576 interlaced) it does not seem like the spectacular change they have been waiting for. In contrast, the highly advertised 1920x1080, also in the progressive format (commonly marketed as Full HD), is extremely popular. Tech enthusiasts are already dreaming of 1920x1080 progressive (1080p), carried away by the fact that new TVs and game consoles now support it. With a constantly increasing number of home entertainment devices supporting 1080p, there is a strong trend towards it. In between the two progressive resolution standards, there is 1920x1080 interlaced (1080i), which is also proposed by the EBU. It combines a higher resolution than 720p with a lower bitrate requirement than 1080p, but it comes with the challenge of more complex processing.

The appearance of H.264/AVC and VC1 has raised yet more debates. These new competing compression standards, together with the already established MPEG 2, have added a new dimension to the problem. Despite its age, MPEG 2 delivers robust efficiency in encoding the HD signal. The implementations of the standard have matured, taking full advantage of its virtues. H.264/AVC is the newest standard available, claiming to improve efficiency by a factor of 2. The complexity of the AVC encoder, though, is up to 10 times greater than that of MPEG 2. VC1 positions itself as a lighter alternative to AVC: it has reduced complexity while aiming for the least possible loss of quality. VC1 is developed by a group of companies led by Microsoft.

The bitrate that broadcasters will use is another important issue. Consumers, carried away by the HD trend, have created huge expectations for the quality of the next generation of TVs. To quench this thirst for quality, a significant increase in bandwidth is imperative. On the other hand, the broadcasters are called on to make significant investments in new equipment and to pay for the extra bandwidth. They see AVC as a chance to provide high picture quality at a reasonable bitrate.

The issue has been under investigation for some years. Since 2000, European broadcasters have investigated the implications of the use of new types of displays. As early as 2001, the European Broadcasting Union (EBU) re-opened the discussions on higher-definition TV formats. A technical committee was created and two separate studies were completed. The first one, performed by RAI, Italy, focused on Main Level broadcasting (SDTV, 576i/50) targeting WideVGA panels (852x480 pixels). The second one, organized by the BBC, UK, studied the differences between Main Level broadcasting and High Level broadcasting (HDTV, 720p/50), targeting both WideVGA panels (852x480 pixels) and WideXGA panels (1366x768 pixels). In 2002, SVT, Sweden, and IRT, Germany, undertook further tests, targeting mainly WideXGA flat panels. Their report came out in April 2002, claiming that 720p is better overall (1). In 2005, another survey was done in Mainz, focusing on the comparison between 720p and 1080i (2).
A new survey (3), which took place at the end of 2006 and also included 1080p as a possible TV format, was presented by Hans Hoffmann from the EBU technical department.

The new inputs to the problem have created the need for new studies on High Definition TV. The entrance of 1080p into the competition and the constantly improving AVC encoders are changing the givens of the High Definition TV debate. In order to explore every aspect of this problem, we decided to perform new, independent quality tests. These tests include objective and subjective evaluation of all the options available for the HD broadcasting chain. The survey concludes with a perception test to confirm the results and analyze consumer behavior.

1.2. Project objectives

The objective of this project was to study every block of the broadcasting chain of High Definition TV, the effect it has on the chain, and the interaction between those blocks. A set of core parameters was evaluated, namely the scanning format, the resolution, the encoding standard, and the bitrate. To make an unbiased evaluation, however, special care had to be taken in the selection of all the remaining parameters. This last category includes the input video sequences, the scaler, the (de-)interlacer, color transformations, pre- and post-processing, and finally the display device.

The two competing scanning formats that were evaluated are progressive and interlaced. Progressive video can be processed easily and has good motion portrayal, but needs double the number of pixels for the same number of lines. Interlaced video, on the other hand, is regarded as better matched to the human visual system and is more bandwidth efficient, but it is also sensitive to vertical frequencies and complex motion. During encoding, the HD scanning formats employed can produce different types of artifacts. The sensitivity of the human eye to each type of artifact can vary significantly and must be taken into account. 720p and 1080i are the main representative standards proposed by the EBU. An important goal of the project was to evaluate the different artifacts that they produce and their behavior at different bitrates.

The resolution format to be adopted was also under research. 1280x720 and 1920x1080 are the main candidates. The EBU proposes 720p (1280x720 progressive) because it behaves very well during encoding, especially at low bitrates. The other candidate is 1920x1080. Although 1920x1080 is currently proposed by the EBU in its interlaced format, there is pressure to move to 1080p directly. The interaction between different combinations of resolution and scanning format, and the encoding parameters, was very important for the project.

Encoding lies on top of these parameters. The compression standards that needed to be assessed are MPEG 2, H.264/AVC, and VC1. Different codecs can perform differently for progressive and interlaced material, or for high and low bitrates. The interaction between the encoders and the previous parameters was explored thoroughly.
After some initial simulations, VC1 was found to render significantly lower quality than both MPEG 2 and AVC and was, therefore, excluded from the tests.

The bitrates that were tested target the broadcast industry. Following the ITU recommendations, the lowest bitrate tested was 8 Mbps and the highest was 18 Mbps. As an intermediate bitrate, 12 Mbps was also employed. The interaction of the encoding with the previous broadcast parameters was one of the main interests of this project.

The selection of input material can define the outcome of the tests. Our concern was to use a diverse set of sequences that would cover a wide range of television shows. The set of input sequences used for the project was created by SVT (Swedish television), who aimed for a set that is representative of the main categories of video. It includes sequences that represent sports, nature, panoramas, and action. In addition to those sequences, some film material was used for real-life scenarios.

As mentioned above, in our quest for objectivity, we had to make sure that side parameters were not biasing our results. The sequences from SVT were captured in a 3840x2160 progressive format with 16-bit color depth. Therefore, special care had to be taken in converting them into sequences with different color, resolution, and scanning formats. The transformation of the sequences includes scaling and (de-)interlacing. To ensure the highest quality of results, special care had to be taken with these modules. Some preliminary tests were done to assess the quality of the available scalers and (de-)interlacers before starting with the main work (see Appendix B).

The final quality of the image is also strongly affected by the pre- and post-processing procedures. Some of the methods included in these stages are sharpening and filtering. None of them is really independent of coding. Each interacts with the encoding procedure in its own way, improving or even degrading the image quality. They can enhance the signal but also amplify annoying artifacts caused by encoding at a limited bandwidth.

The evaluation of the differences in perceived quality between display devices was also part of our objectives. The experience of the viewer on a small TV can be significantly different from that on a large one. Every type of display has its own characteristics that have to be taken into account (e.g., an LCD panel that lacks a scanning backlight can have poor motion portrayal, blurring scenes with motion). Moreover, every display device has its own post-processing block. The way the picture and the artifacts are dealt with inside it was also among the targets of this project.

1.3. Project delimitations

An exhaustive survey of all pending questions on High Definition TV broadcast is beyond the capabilities of this project. Therefore, its delimitations were defined before the project started. They are stated here to make the objectives of the project even clearer.

The overarching goal was to help identify the way to better quality for consumers. Therefore, mainly input sequences of high quality and complexity are used for the simulations. The degree of complexity of the SVT sequences is higher than that of most of the current TV and movie content available to consumers. A short study on this type of non-critical content has been conducted and is presented in this report for awareness only.

The project focused on the subjective evaluation of the simulation results. The results were evaluated in two phases. During the first phase, results were assessed by members of the project organization. In this phase, the most important and interesting issues were identified. Subsequently, in the second phase, a perception test was organized to study the identified issues.
The participants of the test consisted of two groups of people: video experts and non-experts. Objective quality metrics were used only as a comparison reference for the results of the subjective evaluation, not to draw conclusions, since such conclusions can be deceiving.

During the project, we used only software (not hardware) solutions to process the sequences. Our goal was to keep every processing step under control. Since this is a research activity, implementation issues were not considered. The parameters were chosen for quality. The quality of the simulations had strict priority over the speed of all the procedures used during the project.

1.4. Report outline

This report is structured in 6 chapters and 4 appendices. Following this introduction, Chapter 2 gives an introduction to the history of High Definition TV. It also defines the broadcasting chain and breaks it down into stages and blocks. A short description is given of every stage and block.

Chapter 3 focuses the discussion on broadcasting chains. It gives some background information on the scanning format debate and the evaluation methods used in this project. It deals separately with the progressive and the interlaced video chains, reporting on both standardized and non-standardized formats. The Kell factor and how it affects the visual perception of high spatial frequencies is also discussed. All results are evaluated objectively and subjectively.

Chapter 4 describes a perception test that followed the simulations of the previous chapters. This perception test aims to validate scientifically the results obtained earlier and to extract the viewers' preferences.

Chapter 5 summarizes all the results and derives conclusions, and Chapter 6 opens a discussion of the results and proposes future research subjects for a follow-up phase.

More technical details about the simulations and the tools used for them are given in the appendices. Appendix A describes the hardware used for the simulations and defines the configuration used. Appendix B describes the software tools used and Appendix C the input material employed. The baseline document of this project is given in Appendix D.

2. HDTV & Broadcasting

2.1. HDTV

High-Definition Television (HDTV) is a television broadcasting system with a resolution significantly higher than that of traditional formats (NTSC, SECAM, and PAL). HDTV has at least twice the linear resolution of standard-definition television (SDTV), thus allowing much more detail to be shown compared with analog television or regular DVD. In addition, the technical standards for broadcasting HDTV are also able to handle 16:9 aspect ratio pictures without using letterboxing or anamorphic stretching, thus further increasing the effective resolution for such content.

The term "high definition" was used to describe the electronic television systems of the late 1930s and 1940s, beginning with the former British 405-line black-and-white system, introduced in 1936. However, this and the subsequent 525-line U.S. NTSC system, established in 1941, were high definition only in comparison with previous mechanical and electronic television systems; NTSC, along with the later European 625-line PAL and SECAM systems, is described as standard definition today.

On the other hand, the 819-line French black-and-white television system introduced after World War II was arguably high definition in the modern sense, as it had a line count and theoretical maximum resolution considerably higher than those of the 625-line systems introduced across most of post-war Europe. However, it required far more bandwidth than other systems, and was switched off in 1986, a year after the final British 405-line broadcasts.

Japan was the only country where commercial analog HDTV was launched and had some success. In Europe, analog HDTV (HD-MAC) failed. The United States also experimented with analog HDTV (there were about 10 proposed formats), but it soon moved towards a digital approach.
Except for early analog formats in Europe and Japan, HDTV is broadcast digitally and, therefore, its introduction sometimes coincides with the introduction of Digital Television (DTV). This technology was first introduced in the USA during the 1990s by the Digital HDTV Grand Alliance (grouping together AT&T, General Instrument, MIT, Philips, Sarnoff, Thomson, and Zenith).

While a number of high-definition television standards have competed with each other for niche markets or have been implemented on a limited basis, the current HDTV standards are defined in ITU-R BT.709 as 1080 active scan lines, interlaced or progressive, or 720 progressive scan lines, using a 16:9 aspect ratio. The term "high-definition" can refer to the resolution specifications themselves, or more loosely to media capable of similar sharpness, such as photographic film.

2.2. Broadcasting video chain

Nowadays, video content is produced on a massive scale. Capturing is performed with a wide variety of devices: professional and commercial analog/digital cameras, webcams, mobile phones, surveillance cameras, and photo cameras are some of the capturing devices available to would-be producers. All of them store video content in different resolutions, color formats, scanning formats, and quality levels.

At the other end of the video chain stand the consumers of video content. They use a range of displays and devices to watch content. Projectors, High Definition (HD) and Standard Definition (SD) TV sets, PC monitors, mobile phones, and multimedia players are typical examples of such devices. Each one displays video only in specific formats. They have a limited resolution, the ability to show a limited number of colors, and certain processing capabilities.

The quality standards expected by consumers depend a lot on the application. For example, consumers expect the best possible quality from their brand-new HDTV, while keeping their expectations relatively low when it comes to the screen of a mobile phone. It is evident that many different needs coexist. Above all those needs, the consumer asks for interoperability.

The task of matching the variety of produced content with the needs of the consumers is performed by a series of processing blocks, called the video chain. These blocks ensure that the display device will be able to show the transmitted content in the best possible quality. The broadcasting chain for High Definition TV is depicted in Figure 2.1.
[Figure 2.1: The video chain in stages and the corresponding processing blocks. Stages shown: Pre-Processing Stage, Encoding Stage, Post-Processing Stage (TV). Blocks shown: Content Acquisition, Scaling, Interlacing, Scanning Format Control, Color Format Control, Frame Rate Control, Encoding (MPEG 2, AVC, VC1), Conversion to Stream, Decoding, Scaling, De-Interlacing, Frame Rate Control, Artifact Reduction, Sharpening, User Controlled Preferences, Display Panel.]

The broadcasting video chain consists of three main stages. The first one is pre-processing, which takes place before the content is encoded into a compressed stream. The second one is the actual encoding of the signal. The last stage includes the decoding of the stream and the final post-processing blocks. This stage takes place inside the Set-Top Box (STB), if available, and the TV set, to produce the final result delivered to the consumer.

The pre-processing stage aims to convert the video sequence to a format suitable for the encoder as well as to enhance the subjective quality of the sequence. The video content to be transmitted is typically captured at a resolution higher than the displayed one, to allow effective post-production processing. Thus, a processing block is needed to downscale it to an application-suitable resolution. This can be the native resolution of the end display, or an even lower resolution that eases the encoding. In that case, the picture has to be upscaled again later, in the post-processing stage.

Frame-rate control is also necessary in this stage. The default frame rate is not the same in different countries and applications, so frame-rate conversions are needed to ensure interoperability in the global market. Frame-rate conversions can also be used for encoding purposes: by reducing the frame rate before encoding and increasing it after decoding, the encoder's load is reduced significantly.

Scanning format control is another block in the pre-processing stage. If the content is captured by a progressive camera and the display format is interlaced, or vice versa, a conversion is needed. Both the pre- and post-processing stages are candidates to perform the conversion, depending on the scanning format preferred for encoding. An interlacing or de-interlacing block, respectively, is added to the stage chosen.

Color format control in this stage governs the color depth and the color space. The captured color depth can vary from 1-bit to 64-bit color, depending on the capturing device. A variety of color space formats exists, not all of them compatible with every encoder and display. This block deals with such compatibility issues. Dynamic color range is another important parameter handled by the color format controller.

Another optional block in the pre-processing stage is subjective quality enhancement control. Sharpness enhancement can be applied before encoding instead of afterwards, where it would subjectively magnify the artifacts created by compression. After passing through the pre-processing stage, the video sequence is ready to be encoded.
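The scanning format conversion described above can be illustrated with a minimal Python sketch. It is not the interlacer or de-interlacer used in this project (those are covered in Appendix B); it only shows the idea of keeping alternate lines per field and of a naive "weave" reconstruction. The function names and the list-of-rows frame representation are assumptions made for the example, and practical interlacers usually add vertical filtering that is omitted here.

```python
from typing import List, Tuple

Frame = List[List[int]]  # a frame as a list of rows of sample values


def split_fields(frame: Frame) -> Tuple[Frame, Frame]:
    """Split a progressive frame into a top field (even rows) and a bottom field (odd rows)."""
    return frame[0::2], frame[1::2]


def interlace(frames: List[Frame]) -> List[Frame]:
    """Keep one field per input frame, alternating top and bottom fields over time."""
    fields = []
    for index, frame in enumerate(frames):
        top, bottom = split_fields(frame)
        fields.append(top if index % 2 == 0 else bottom)
    return fields


def weave(top: Frame, bottom: Frame) -> Frame:
    """Naive de-interlacing: interleave the lines of a top and a bottom field."""
    frame = []
    for top_row, bottom_row in zip(top, bottom):
        frame.append(top_row)
        frame.append(bottom_row)
    return frame
```

Weaving two fields taken from the same instant restores the frame exactly; weaving fields from different instants is what produces the combing artifacts on moving content that better de-interlacers try to avoid.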
The encoding stage includes compression of the video sequence and its transformation into a stream. The bitrates that can be used for broadcasting are limited. Due to limitations in the bandwidth available for broadcasting in Europe, the bitrate cannot exceed 18 Mbps. On the other hand, bitrates below 8 Mbps produce unacceptable quality that is no longer regarded as HD. Three bitrates were chosen for the tests based on market trends: 8, 12, and 18 Mbps.

Different encoding standards are available. In this project MPEG 2, VC1, and H.264/AVC are used. The first encoder used for this video broadcasting chain is MPEG 2. The implementation used is Canopus Procoder 2. It is regarded as one of the best implementations on the market, with very high and robust performance. The second encoder is based on H.264/AVC. The implementation is x264, an open implementation that combines the best quality with reasonable encoding time. A set of parameters has to be tuned to get the best possible encoding quality in every scenario.

The compressed sequence is then stored in a transport stream that is suitable for broadcasting. The stream is sent over the air, by satellite, or by cable to the consumer and reaches the TV.

The post-processing stage includes decoding, converting the video sequence to the native display format, and subjective picture enhancement. The decoder decompresses the sequence, making it ready for the next processing blocks. Interlaced video must be de-interlaced, because High Definition TVs have panels that can display only progressive sequences. The next step is to match the video format with the TV's native format. Thus, some scaling might be needed if the resolution does not match the native resolution of the TV. In particular, if the sequence was downscaled in the pre-processing stage for encoding purposes, the scaling has to be reverted. A frame-rate conversion may also be needed.

Another important block in the post-processing stage is artifact reduction. Due to possibly poor original quality and to compression, the final result can have artifacts. This block recognizes artifacts and tries to reduce them, enhancing the subjective quality of the picture. Finally, users individually control, through their remote control, picture properties such as contrast, brightness, saturation, white color temperature, and sharpness (see footnotes 1 and 2). The video chain reaches its end with the final result on the display.

Footnote 1: The order of the processing blocks can differ, depending on the device and the manufacturer.
Footnote 2: Depending on the manufacturer, a TV set can have different parameters that can be tuned by the user.
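The frame-rate conversion blocks mentioned for both ends of the chain (reducing the frame rate before encoding and restoring it after decoding) can be sketched as follows. This is an illustrative Python example under strong assumptions: the helper names are hypothetical, frames are plain lists of rows, and the missing frames are rebuilt by simple linear blending of neighbours, which stands in for whatever interpolation a real receiver would use.

```python
from typing import List

Frame = List[List[float]]  # a frame as a list of rows of sample values


def subsample_frames(frames: List[Frame]) -> List[Frame]:
    """Halve the frame rate by keeping every second frame (pre-processing side)."""
    return frames[0::2]


def blend(first: Frame, second: Frame) -> Frame:
    """Average two frames sample by sample."""
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first, second)
    ]


def interpolate_frames(frames: List[Frame]) -> List[Frame]:
    """Rebuild the dropped frames by inserting a blended frame between neighbours."""
    if not frames:
        return []
    restored = []
    for current, following in zip(frames, frames[1:]):
        restored.append(current)
        restored.append(blend(current, following))
    restored.append(frames[-1])
    return restored
```

With blending, a dropped frame is recovered exactly only when the content changes linearly between its neighbours, which gives some intuition for why such a scheme suits smooth motion better than complex motion.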

3. Simulations on the broadcasting video chain

The simulations performed on the broadcasting chain can be separated into two parts: the progressive and the interlaced broadcasting chain. These two categories are characterized by the scanning format used throughout the chain. The results are evaluated objectively and subjectively. Some more information on the scanning formats and the evaluation methodology is given as background information.

3.1. Background

Before presenting the methodology and the results of the simulations, it is useful to give a short introduction to the two scanning formats and the debate between them. The evaluation methodology is also briefly discussed in this introductory part.

3.1.1. Scanning formats

One of the main debates in the HDTV ecosystem concerns the scanning format to be used for the video chain. This debate has spread beyond the standardization process to academia, industry, and even enthusiasts among the consumers. It is, therefore, useful to introduce the two formats properly and to identify their advantages and disadvantages.

A video sequence is a sequence of still images representing scenes in motion, displayed rapidly one after another. Each of these still images is called a frame. In progressive scanning, every line of a frame is scanned. Interlaced scanning also scans line by line from top to bottom, but only every second line. The same process is then repeated, this time starting at the second row, in order to fill in the gaps left behind by the first pass over the alternate rows. Such a scan of every second line is called a field.

In terms of processing complexity, progressive scanning has an advantage over interlaced. Special care has to be taken in processing interlaced sequences. Neighboring pixels in interlaced frames are not as strongly correlated as in progressive ones.
Typically, every frame of an interlaced sequence stores two different fields, which belong to different moments in time. This complicates even simple processing such as filtering or compressing.

For the same number of lines, progressive scanning has higher vertical resolution with fewer motion artifacts. It also avoids the flickering artifacts in narrow horizontal patterns that are typical of interlaced sequences. On the other hand, progressive has twice as many pixels as interlaced, placing more stress on the encoder.

Interlaced video reduces the signal bandwidth (measured in megahertz for analog video, or bitrate for digital video) by a factor of two, for a given line count and refresh rate. For a given signal bandwidth and refresh rate, interlaced video can be used to provide a higher spatial resolution than progressive scan. For instance, 1920x1080 pixel resolution interlaced HDTV with a 50 Hz field rate (known as 1080i50) has a similar bandwidth to 1280x720 pixel progressive scan HDTV with a 50 Hz frame rate (720p50), but approximately 50% more spatial resolution in each dimension. (Note that this ignores the results of data compression, which tends to be more efficient when applied to progressive scan video, and assumes close to perfect de-interlacing.)
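The bandwidth comparison above can be checked with simple arithmetic. The Python sketch below computes the uncompressed luma sample rate of each format and the average bit budget per sample at the bitrates considered in this report. It is only an illustration: the table and function names are made up for the example, chroma samples are ignored, and 1 Mbps is assumed to mean 10^6 bits per second.

```python
# (width, height, rate in Hz, interlaced?) for the formats discussed in the text.
# For interlaced formats the rate is the field rate and each field has height / 2 lines.
FORMATS = {
    "720p50": (1280, 720, 50, False),
    "1080i50": (1920, 1080, 50, True),
    "1080p50": (1920, 1080, 50, False),
}


def pixel_rate(width: int, height: int, rate_hz: int, interlaced: bool) -> int:
    """Luma samples per second carried by the format."""
    lines_per_picture = height // 2 if interlaced else height
    return width * lines_per_picture * rate_hz


def bits_per_pixel(bitrate_mbps: float, format_name: str) -> float:
    """Average compressed bits available per luma sample at a given bitrate."""
    return bitrate_mbps * 1_000_000 / pixel_rate(*FORMATS[format_name])


if __name__ == "__main__":
    for name, parameters in FORMATS.items():
        rate = pixel_rate(*parameters)
        budgets = ", ".join(
            f"{mbps} Mbps: {bits_per_pixel(mbps, name):.3f}" for mbps in (8, 12, 18)
        )
        print(f"{name}: {rate / 1e6:.2f} Msamples/s, bits per sample at {budgets}")
```

Under these assumptions, 1080i50 and 720p50 come out at 51.84 and 46.08 million samples per second respectively, while 1080p50 needs twice the rate of 1080i50.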

Evaluation methodology

The results of the simulations are evaluated both objectively and subjectively. The objective evaluation is done by calculating the Peak Signal to Noise Ratio (PSNR) of the simulation products. The PSNR metric is defined in Equations 3.1 and 3.2:

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{255^2}{\mathrm{MSE}}\right) \qquad (3.1)$$

with the Mean Square Error (MSE) defined as:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \hat{x}_i\right)^2 \qquad (3.2)$$

where $N$ is the total number of pixels in the image, $x_i$ the value of the original pixel, and $\hat{x}_i$ the reconstructed one (4).

The results were also subjectively evaluated by Rene van der Vleuten (senior researcher in Philips Research) and Anastasios Dimou on a TFT-LCD panel (see Appendix A).

Progressive video chain

Progressive or non-interlaced scanning is a method for displaying video content. In the progressive scanning format, every frame is created by sequentially drawing all the lines. It was first used for television transmissions in the Baird 240-line system from Alexandra Palace, England, in 1936. The interlaced scanning format was subsequently preferred for standard definition (SD) TV transmission and progressive scanning was abandoned. Modern display devices like LCD, micro-mirror (DLP), or plasma, which are inherently progressive scan, have contributed to the re-introduction of the non-interlaced format.

The standardized progressive resolution formats (for broadcasting) today are 720p and 1080p (at 25 frames per second), with 720 and 1080 lines respectively. In this project, they have been tested thoroughly. Some intermediate resolutions have also been tested to get better insight into the progressive format and its behavior under different bitrates and encoding standards. The effect of film content and reduced frame-rate content is also examined.

720p

720p is the shorthand name for an HDTV resolution format. It was designed at AT&T Bell Laboratories in the late 1980s, under the supervision of Arun Netravali, for analog HDTV.
The number 720 stands for 720 lines of vertical display resolution, while the letter p stands for progressive scan. It is regarded as the minimum resolution for high definition TV. 720p assumes, as all HDTV formats do, a widescreen aspect ratio of 16:9 and, thus, a horizontal resolution of 1280 pixels, for a total of about 0.92 million pixels. The frame rate can be either implied by the context or specified in hertz after the letter p. The five 720p frame rates in common use are 24, 25, 30, 50 and 60 Hz (or fps). In general, traditional PAL and SECAM countries (Europe, Australia, much of Asia, Africa, and parts of South America) are or will be using the 25p and 50p frame rates, whereas traditional NTSC countries (North and Central America, Japan, South Korea, Philippines) are using 24p, and 60p for high-motion programming.

The behavior of 720p in the HDTV broadcasting chain is tested with a simple combination of processing blocks, depicted in Figure 3.1. The original 1080p video sequence is downscaled to 720p using the Avisynth Lanczos4 filter (see Appendix B), with some slight sharpening applied. The video sequence is subsequently passed to the encoder. The encoder outputs the encoded bitstream, which is decoded and upscaled to 1080p, again with the Avisynth Lanczos4 filter. The video sequence is then ready for evaluation on the 1080p display.

[Figure 3.1: 720p video broadcasting chain. Content Acquisition (1920x1080p) -> Downscale (1280x720p) -> Encoder/Decoder (1280x720p) -> Upscale (1920x1080p) -> Display Panel]

The simulation is run for the chosen bitrates, namely 8, 12, and 18 Mbps, for MPEG 2 and H.264/AVC. The SVT sequences (see Appendix C) are used as a test set for the simulation. The SVT material contains some sequences that are challenging to compress, captured with significant high-frequency content, intense motion, and detailed foreground and background, like CrowdRun, ParkJoy, and DucksTakeOff. There are also sequences that are easy to compress, like InToTree and OldTownCross.
This test set was created as a representative set of high-quality but diverse pictures of the kind that typically appear on consumers' TV sets. The results are evaluated both objectively and subjectively. The results of the PSNR comparisons for all sequences at 18 Mbps can be seen in Figure 3.2.

[Figure 3.2: PSNR comparison of MPEG 2 and H.264/AVC encodings for 720p sequences at 18 Mbps]

It is interesting to notice that the sequences characterized by high coding difficulty, namely CrowdRun, ParkJoy, and DucksTakeOff, have a better PSNR when compressed with H.264/AVC. On the other hand, InToTree and OldTownCross, which are characterized as easy to encode, have a better PSNR when encoded with MPEG 2. Exactly the same behavior is found at 8 and 12 Mbps. It seems that AVC has an advantage when the level of encoding complexity is high, while MPEG 2 performs better when the available bitrate is high relative to the complexity of the content. The absolute difference in PSNR is small in most cases (except ParkJoy, which is the most complex sequence). The evaluation results specifically for CrowdRun, one of the most demanding sequences, can be seen in Figure 3.3.

[Figure 3.3: PSNR comparison of MPEG 2 and AVC encodings for CrowdRun 720p at all bitrates]

It is evident that just downscaling the original 1080p source to 720p already introduces an important degradation of the PSNR. Compression degrades the PSNR further, by almost 6 dB at 18 Mbps using H.264/AVC and by almost 9 dB at 8 Mbps using MPEG 2. Both H.264/AVC and MPEG 2, though, show stability and robustness in the bitrate range that was tested. H.264/AVC seems to consistently produce a slightly better PSNR (<1 dB), but the difference is not significant objectively³.

A viewing session was subsequently held to evaluate the results of the simulation. The viewings showed, as expected, that the encoding bitrate is a crucial parameter for the quality of broadcasting. HDTV sets new requirements on bitrate, due to the increased resolution and the quality of the acquired sequences.

The viewing session showed that 8 Mbps is not enough for broadcasting demanding HD content, with either MPEG 2 or H.264/AVC. For the difficult sequences, namely CrowdRun, ParkJoy, and DucksTakeOff, the quality is so bad that a comparison is meaningless. MPEG 2 creates severe blocking artifacts, while H.264/AVC smooths out all the sharpness and produces mild blocking artifacts.

12 Mbps is the minimum bitrate needed for broadcasting this type of demanding content. Problems with blocking artifacts and smoothing still exist in the difficult sequences, but the quality becomes non-objectionable. For the easier sequences, like InToTree and OldTownCross, it seems to be adequate for broadcasting, especially when H.264 is used for the encoding.

18 Mbps is an acceptable bitrate. Both encoders have enough bitrate available to show their capabilities. The typical artifacts of both encoders still exist for the demanding sequences, and it is up to the viewer to prefer sharpness with blocking artifacts, or blurring with less noise.

1080p

1080p is the shorthand name for another HDTV resolution format. It has 1080 lines of vertical display resolution. It is regarded as the main resolution for high definition TV.
Therefore, it is sometimes referred to in marketing materials as "True High-Definition" or "Full High-Definition". 1080p assumes, as all HDTV formats do, a widescreen aspect ratio of 16:9 and, therefore, a horizontal resolution of 1920 pixels, for a total of about 2.07 million pixels. 1080p is currently the digital standard for filming digital motion pictures, and it is foreseen as the future broadcasting standard for production. The same frame rates apply to 1080p, namely 24, 25, 30, 50, and 60, although ATSC and DVB have standardized only the frame rates of 24, 25, and 30 frames per second, due to current bandwidth limitations of broadcast frequencies.

The broadcasting video chain for 1080p is straightforward, as can be seen in Figure 3.4. It consists only of an encoder, a decoder, and the display panel.

³ Only expert viewers are capable of detecting differences below 1 dB for the same type of encoding.
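To put the pixel counts quoted for 720p and 1080p next to the tested bitrates, the average bit budget per pixel can be estimated as below. This is a back-of-the-envelope sketch: the function name is hypothetical, a frame rate of 50 fps is assumed for both formats, and real encoders distribute bits very unevenly across frames and regions.

```python
def bits_per_pixel(bitrate_mbps, width, height, fps):
    """Average coded bits available per pixel (illustrative helper)."""
    return bitrate_mbps * 1_000_000 / (width * height * fps)


# Assumed formats: both at 50 frames per second.
formats = {"720p50": (1280, 720, 50), "1080p50": (1920, 1080, 50)}

for name, (width, height, fps) in formats.items():
    pixels = width * height
    for mbps in (8, 12, 18):
        bpp = bits_per_pixel(mbps, width, height, fps)
        print(f"{name} ({pixels / 1e6:.2f} Mpixel/frame) at {mbps} Mbps: "
              f"{bpp:.3f} bit/pixel")
```

At any given bitrate the 1080p chain has 2.25 times fewer bits per pixel than the 720p chain, which is the basic reason the encoder is under more stress at the higher resolution.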

[Figure 3.4: 1080p video broadcasting chain. Content Acquisition (1920x1080p) -> Encoder/Decoder (1920x1080p) -> Display Panel]

The simulation is run for the chosen bitrates, namely 8, 12, and 18 Mbps, for MPEG 2 and H.264/AVC. The results are evaluated both objectively and subjectively. The results of the objective evaluation, in terms of PSNR, for all sequences at 18 Mbps can be seen in Figure 3.5.

[Figure 3.5: PSNR comparison of MPEG 2 and H.264/AVC encodings for 1080p sequences at 18 Mbps]

The complex sequences, namely CrowdRun, ParkJoy, and DucksTakeOff, are again compressed better with H.264/AVC, in accordance with the results for 720p. The difference from MPEG 2 becomes somewhat bigger than in the 720p case. On the other hand, InToTree and OldTownCross still have a higher PSNR when encoded with MPEG 2, but the difference becomes smaller than in the 720p case.

At 8 Mbps the bitrate is not enough for 2.07 million pixels per frame. H.264/AVC performs better in terms of PSNR at this low bitrate, increasing the PSNR differences it already had at 18 Mbps and taking the lead in all cases. The PSNR scores in all cases can be seen in Figure 3.6.

[Figure 3.6: PSNR comparison of MPEG 2 and H.264/AVC encodings for 1080p sequences at 8 Mbps]

The evaluation results for CrowdRun at all examined bitrates can be seen in Figure 3.7.

[Figure 3.7: PSNR comparison of MPEG 2 and AVC encodings for CrowdRun 1080p at all bitrates]

Figure 3.7 shows that AVC gives better PSNR scores than MPEG 2. The difference is small at 18 Mbps (0.75 dB) and grows as the bitrate drops, reaching 2 dB at 8 Mbps. AVC seems to be more robust at lower bitrates, where the encoder is more stressed.

Another interesting observation can be made by comparing the PSNR values of 1080p and 720p at all bitrates. At 18 Mbps, 1080p gives better PSNR scores than 720p, due to the higher quality of the input. But this advantage is lost when the bitrate becomes 12 Mbps. Despite the superiority of the 1080p input, the encoder cannot handle this amount of data and produces results of lower quality. These phenomena are even more evident at 8 Mbps.

The results were also subjectively evaluated. The results of this evaluation are similar to those for 720p. At the lowest bitrate, 8 Mbps, all sequences are encoded poorly, resulting in unacceptable quality for the viewer. At 12 and 18 Mbps the quality improves significantly, but there are still artifacts. Compared to 720p, artifacts become more intense as the bitrate goes lower.

The quality offered by 720p and 1080p is similar. At 18 Mbps sequences are sharper in 1080p than in 720p, but the encoding artifacts in hard-to-encode sequences prevent the viewer from appreciating the added sharpness. At lower bitrates, the artifacts become the dominant criterion for the perceived quality and, therefore, 720p becomes more attractive for the viewer.

Non-Standard resolutions

Apart from the standard progressive resolution formats (720p and 1080p), any format that has more than 720 lines and keeps the aspect ratio at 16:9 can be a candidate resolution format. A set of resolutions was created⁴, namely 1728x976, 1632x920, 1536x864, 1440x808, 1344x768, and 1152x648⁵. While they are highly unlikely to be adopted as broadcasting standards, they can be used to gain some insight into the Kell factor and a better understanding of the tradeoff between higher resolution and compression artifacts.

The first part of the experiment was dedicated to exploring the Kell factor of our display device, an LCD panel. The Kell factor is an empirical parameter used to determine the effective resolution of a discrete display device. The number was first measured in 1934 by RCA engineer Raymond D. Kell and his associates. It has no fixed value, but is usually taken to be about 0.7 for electron gun scanning. Kell originally defined it as 0.64 and later revised it. The number can be higher than 0.9 when modern displays are used.
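The candidate resolutions listed above can be compared against an exact 16:9 scaling of 1920x1080. The sketch below is illustrative only: the helper name is hypothetical, and the block sizes checked at the end (16 pixels horizontally, 8 lines vertically) are assumptions made for this example rather than values stated in the note.

```python
FULL_WIDTH, FULL_HEIGHT = 1920, 1080

# Resolutions taken from the text.
candidates = [(1728, 976), (1632, 920), (1536, 864),
              (1440, 808), (1344, 768), (1152, 648)]


def describe(width, height):
    """Return (linear scale factor, exact 16:9 height, actual aspect ratio)."""
    scale = width / FULL_WIDTH
    exact_height = FULL_HEIGHT * scale
    return scale, exact_height, width / height


for width, height in candidates:
    scale, exact_height, aspect = describe(width, height)
    print(f"{width}x{height}: factor {scale:.2f}, "
          f"exact 16:9 height {exact_height:.0f}, aspect {aspect:.3f}")

# Assumed block alignment: widths on a 16-pixel grid, heights on an 8-line grid.
aligned = all(w % 16 == 0 and h % 8 == 0 for w, h in candidates)
print("all candidates block-aligned under these assumptions:", aligned)
```

The heights that differ from the exact 16:9 value (for example 976 instead of 972) are the ones whose aspect ratio deviates slightly from 1.778.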
For the experiment, the original 1080p sequences, CrowdRun and DucksTakeOff, were downscaled to each of the previously mentioned resolutions:

1728x976 = 0.90 (1920x1080)
1632x920 = 0.85 (1920x1080)
1536x864 = 0.80 (1920x1080)
1440x808 = 0.75 (1920x1080)
1344x768 = 0.70 (1920x1080)
1152x648 = 0.60 (1920x1080)

All the downscaled sequences were subsequently upscaled to 1080p and directly compared to the original. The methodology used is depicted in Figure 3.8. The target of the experiment is to eliminate the higher frequencies of the sequences and test whether the difference is visible. The resolution at which the difference becomes just visible is an indication of the role the Kell factor plays in the subjective quality of the sequences.

[Figure 3.8: Kell factor experiment chain. 1080p -> Downscale (factor a) -> (1080·a)p -> Upscale (factor 1/a) -> 1080p, with a ∈ {0.6, 0.7, 0.75, 0.8, 0.85, 0.9}]

The results of the experiment were evaluated by expert viewers in the Philips Research laboratories. They showed that for a factor a = 0.85 it is already difficult to identify the original sequence in a side-by-side comparison. For a = 0.9 the difference is no longer distinguishable. This factor will from now on be called the Maximum Perceived Resolution Factor (MPRF); it captures the practical effect of the Kell factor on the maximum effective resolution that can be shown. This means that the LCD panel used for the experiments has an MPRF of approximately 0.9.

In the second part of the experiment, the downscaled versions of the input sequences were encoded with both MPEG 2 and H.264/AVC at 8, 12, and 18 Mbps. The methodology used is depicted in Figure 3.9.

[Figure 3.9: Non-standard resolution experiment chain with encoding. 1080p -> Downscale (factor a) -> (1080·a)p -> Encoding/Decoding -> (1080·a)p -> Upscale (factor 1/a) -> 1080p, with a ∈ {0.6, 0.7, 0.75, 0.8, 0.85, 0.9}]

The results of this experiment were assessed both objectively and subjectively. For the objective evaluation the PSNR metric was used. The results for two different sequences, CrowdRun and DucksTakeOff, encoded with both MPEG 2 and H.264/AVC, can be seen in Figure 3.10, Figure 3.11, Figure 3.12, and Figure 3.13.

⁴ Not all of the resolution formats have an exact 16:9 ratio. This was done to ensure an integer number of coding blocks both horizontally and vertically, for encoding purposes.
⁵ Although 1152x648 does not have enough lines for HD, it was included in our test set for reasons that will become evident further on in the experiments.

[Figure 3.10: Sequence crowdrun encoded with MPEG 2 in various resolutions]

[Figure 3.11: Sequence ducks encoded with MPEG 2 in various resolutions]

In Figure 3.10 and Figure 3.11 the sequences CrowdRun and DucksTakeOff are encoded in all tested progressive resolutions with MPEG 2. The trend lines for all bitrates are almost linear in the range between 976p and 648p. For 1080p, on the contrary, the linearity holds only at 18 Mbps. At 12 and especially at 8 Mbps, there is a break in the trend line and the PSNR score is lower. This is a clear indication that the encoder has reached its limits and is unable to cope with the input data, which deteriorates the quality of the encoded sequence.

Comparing the trend lines of the two sequences at the same bitrate, e.g. 18 Mbps, reveals differences. For CrowdRun the encoder can take advantage of the detail in the full 1080p resolution, and the PSNR decreases in a close to linear way as the input resolution becomes lower. For DucksTakeOff, on the other hand, the scene is too complex for the encoder, which performs better at lower resolutions. In general there seems to be a trade-off between scene complexity and bitrate. In conclusion, for every sequence and bitrate there is a different resolution for which the encoder produces the optimal result.

In Figure 3.12 and Figure 3.13 the same simulation has been done with H.264/AVC for both CrowdRun and DucksTakeOff in the progressive scanning format.

[Figure 3.12: Sequence crowdrun encoded with H.264/AVC in various resolutions]

[Figure 3.13: Sequence ducks encoded with H.264/AVC in various resolutions]

The results show the same trend as with MPEG 2, but with slightly higher PSNR scores. The main difference lies in the 1080p case at 8 Mbps. AVC seems to perform better under stress, showing close to linear behavior over the whole range of resolution formats tested, including 1080p at 8 Mbps.

From these simulations we concluded that the optimal balance between bitrate and resolution is highly context dependent. It is therefore not straightforward to take advantage of this knowledge and apply it in an encoding technique.

Frame-Rate Sub-Sampling

High quality HD video content typically has a resolution of 1080p and a frame rate of 50 frames per second (in Europe). This adds up to an enormous number of pixels entering the encoder every second. One possible way to relax the load on the encoder significantly is to reduce the frame rate of the video using temporal sub-sampling prior to encoding and to up-sample again after decoding. This solution was put to the test in this experiment (simulation).

The input video content is sampled temporally by a factor of 2. The sampling is done without filtering, by throwing away half of the frames. The new content is still progressively scanned, with the same resolution, but it now has a frame rate of 25 fps. Thus, every frame is effectively encoded with twice the number of bits (18 Mbps was used). The encoded bitstream is then decoded and up-sampled using the Philips Digital Natural Motion algorithm to reconstruct the missing frames. Philips Digital Natural Motion uses advanced motion detection and interpolation techniques to create new frames between the existing ones, producing sharp moving pictures and eliminating film judder. The simulation configuration is depicted in Figure 3.14.

[Figure 3.14: Temporal Sampling and Up-sampling. 50 fps -> Temporal Sampling (factor 2) -> 25 fps -> Encoder/Decoder -> 25 fps -> Temporal Up-sampling (factor 2) -> 50 fps]

Encoding each frame with twice the number of bits produces excellent encoding results. The next step is to reproduce the frames that were thrown away. Philips Digital Natural Motion uses the neighboring frames to interpolate the required frames.
Motion compensation techniques are employed to ensure motion that is close to natural. The result is a frame that is slightly blurred due to the interpolation. These blurred frames cannot be distinguished by the average viewer because of the high frame rate. Thus, a high quality encoding can be achieved. This method, though, can create motion artifacts in scenes with complex motion. In some cases (e.g. CrowdRun) interpolation produces unnatural motion, creating an artifact that is annoying for the viewer.

The simulation showed that the subjective evaluation of the results depends heavily on the type of content that is used. For simple sequences, the results were very good. For complex sequences, on the other hand, the motion artifacts were a limiting factor in the evaluation. Therefore, it was decided to suspend further testing of this scheme.

Film content

Film content comes at a frame rate of 24 frames per second. Thus, this kind of content can be regarded as very close to the Frame-Rate Sub-Sampling content discussed above. Despite the similarities, film material has some special characteristics that make it a different type of content. The lower frame rate makes it less demanding to compress, as in the previous case, but the capturing procedure is completely different.

An important difference from the sequences tested until now is the camera settings. In the SVT sequences used so far, both foreground and background are captured with high detail. The camera settings are optimized for sharpness and high-frequency content. These settings yield sequences that are more demanding than average broadcast content. In film content the camera focuses on specific regions of the scene, capturing them with the greatest possible sharpness. On the other hand, it leaves other regions, which make up the background, out of focus and without sharpness. All these factors make film content less bandwidth-intensive to encode.

Another characteristic of film content is that the frames are temporally filtered. They are captured for a lower frame rate and, thus, contain fewer high temporal frequencies. As a consequence, applying the Philips Digital Natural Motion algorithm does not produce the annoying motion artifacts encountered with 50 fps content. Therefore, we can take advantage of the effectively doubled bitrate for the encoding.

A number of simulations were run with various film content in both 720p and 1080p format. A collection of scenes containing slow and fast motion was used for this purpose. For the simulations we used both MPEG 2 and AVC at three different bitrates: 8, 12, and 18 Mbps. The results showed that both encoders give acceptable quality at 18 and 12 Mbps, with negligible artifacts, for both 720p and 1080p. In 1080p at 8 Mbps, the MPEG 2 instance produces severe blocking artifacts in high-motion scenes and immediately after each scene cut.
AVC, in the same configuration, has some blocking artifacts, but they are smeared out by the deblocking loop filter.

It seems that film content is not challenging enough for the encoders, at least for bitrates higher than 12 Mbps. AVC produces acceptable quality even for 1080p at 8 Mbps. MPEG 2, on the other hand, can encode 720p with as little as 8 Mbps. For 1080p, it needs at least 12 Mbps in high-motion scenes and at scene cuts. Moreover, there is almost no subjective difference between the 720p and 1080p instances for the input material that was used. Since the project focuses on more critical content, no further work was done on film content.

Interlaced video chain

Interlacing is a technique for improving the picture quality of a video transmission without consuming extra bandwidth. Alternately, only the odd or only the even lines of each frame are scanned. It was invented by RCA engineer Randall C. Ballard in the 1930s. It was ubiquitous in television until the 1970s, when the needs of computer monitors resulted in the reintroduction of progressive scan. The killer application for the interlaced scan pattern was the CRT (cathode ray tube) display.

The only standardized interlaced resolution format today is 1080i. It features a resolution of 1920x1080 and a field rate of 50 fields per second.

1080i

1080i is the shorthand name for a category of video modes. The number 1080 stands for 1080 lines of vertical resolution, while the letter i stands for interlaced or non-progressive scan. 1080i is considered to be an HDTV video mode. The term usually assumes a widescreen aspect ratio of 16:9, implying a horizontal resolution of 1920 pixels, a frame resolution of 1920x1080, and a field resolution of 1920x1080/2 (because it is interlaced), or about 1.04 million pixels.

The field rate (not the frame rate) in hertz can be either implied by the context or specified after the letter i. The two field rates in common use are 50 and 60 Hz, with the former (1080i50) generally being used in traditional PAL and SECAM countries, and the latter (1080i60) in traditional NTSC⁶ countries. Both variants can be transported by both major digital television formats, ATSC and DVB.

The basic broadcasting video chain for interlaced video is depicted in Figure 3.15. The input video sequences are not natively interlaced. The progressive original versions were used to produce an interlaced instance.

[Figure 3.15: Interlaced broadcasting video chain. Content Acquisition (1920x1080p) -> Interlace (1920x1080i) -> Encoder/Decoder (1920x1080i) -> De-interlace (1920x1080p) -> Display Panel]

The simulation is run for the same bitrates, namely 8, 12, and 18 Mbps. MPEG 2 is a mature format that works for both interlaced and progressive material. H.264/AVC, on the other hand, is relatively new, and support for interlaced material was not fully implemented until recently. Version 12 of the reference software JM (see Appendix B) implements bitrate control for the encoding of interlaced material and is, thus, included in the tests. Other software solutions tend to treat two interleaved fields as one frame, encoding it as progressive.

The results are evaluated both objectively and subjectively. The PSNR values for all encoded sequences at 18 and 8 Mbps can be seen in Figure 3.16 and Figure 3.17.

⁶ Due to a revision of the NTSC format when color became available, the field rate of actual 1080i broadcasts is usually 0.1% slower than is implied. Both the straight 24/30/60 and the 23.976/29.97/59.94 frequencies are supported by current standards.

[Figure 3.16: PSNR comparison of MPEG 2 and H.264/AVC encodings for 1080i sequences at 18 Mbps]

The results show that AVC produces a better PSNR value than MPEG 2 when encoding at 18 Mbps. This is true for all the sequences tested. It may be the result of the blurring introduced by AVC's loop filter, which smooths out flickering. The difference is about 1 dB or less, so it is not large.

[Figure 3.17: Comparison of MPEG 2 and H.264/AVC for 1080p sequences compressed at 8 Mbps]

The same comparison at 8 Mbps shows the same result. The lead of AVC increases to about 2 dB. The evaluation results for CrowdRun alone at the different bitrates can be seen in Figure 3.18. This figure shows how the difference between AVC and MPEG 2 evolves from 18 to 8 Mbps, compared to the uncompressed original.

[Figure 3.18: Comparison of MPEG 2 and AVC encodings for CrowdRun 1080i at all bitrates]

It seems that the PSNR difference grows as the encoder becomes more stressed. Still, the difference is relatively small, and subjective evaluation is needed to decide which of the two instances has the better perceived quality.

The results were subsequently evaluated subjectively. The superiority of AVC in terms of PSNR does not seem to be confirmed visually. MPEG 2 gives a naturally detailed result with some flickering and blocking effects. AVC smooths out the artifacts, but it gives an unnatural blurring to portions of the picture, which is annoying for the viewer.

The overall quality of the sequences, regardless of the encoder, depends strongly on the bitrate used for compression. The quality at 18 Mbps is good enough for broadcasting. 12 Mbps also produces acceptable quality. As in the progressive case, 8 Mbps is a very low bitrate for content of this critical complexity, and the overall quality is unacceptable by HD standards.
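The objective results reported in this chapter are PSNR values as defined by Equations 3.1 and 3.2. A minimal sketch of that calculation is given below. It operates on flat lists of pixel values and assumes 8-bit samples (peak value 255); the function names are illustrative, and the tooling actually used for the measurements is not described here.

```python
import math


def mse(original, reconstructed):
    """Mean square error between two equally sized pixel sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = ((x - y) ** 2 for x, y in zip(original, reconstructed))
    return sum(squared) / len(original)


def psnr(original, reconstructed, peak=255):
    """PSNR in dB; peak=255 assumes 8-bit samples (an assumption here)."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / error)


reference = [100, 100, 100, 100]
decoded = [101, 99, 101, 99]  # every pixel off by one, so MSE = 1
print(f"PSNR = {psnr(reference, decoded):.2f} dB")
```

For a video sequence the per-frame values would typically be averaged, which is a further choice this sketch leaves open.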

Perception Experiment

4.1. Introduction

During the project, we have performed a number of simulations with different parameters. Different resolutions, scanning formats, encoding standards, and bitrates have been tested. To validate our results, a perception test was organized. Due to the overwhelming number of comparisons performed during the project, it was not possible to test all of them. We had to restrict ourselves to a number of comparisons that would fit within the limited duration of a single perception experiment and the time planning of the project. Our main goals in this experiment were the following:

- Compare the proposed high definition standards 720p and 1080i, together with 1080p.
- Compare the perceived quality for different encoding standards (MPEG 2, H.264/AVC).
- Evaluate the industry trend to move from the current encoding settings, MPEG 2 at 18 Mbps, to AVC at 12 Mbps.

Today, 720p and 1080i are the HD standards proposed by the European Broadcasting Union (EBU). A number of perception surveys have already been performed on this comparison, indicating that 720p performs better, especially at low bitrates. There are, however, voices inside the EBU asking for a path to 1080p to be set. Therefore, we included 1080p in this test to get a better understanding of its behavior at broadcasting bitrates and to compare its performance to the currently proposed standards.

MPEG 2 is already being used by broadcasters for High Definition TV, but AVC is just around the corner, threatening its position in the HD broadcasting chain. In this test we wanted to examine the behavior of both encoders for every scanning format. Moreover, we wanted to evaluate the type of artifacts that the encoders produce and the level of annoyance that they cause to the viewer.

As already mentioned, broadcasters currently use MPEG 2 for their broadcasts and have the infrastructure for it.
AVC seems promising, offering better compression ratios and, therefore, better quality. For the broadcasters this means that they will have to change their equipment to migrate to AVC, which costs money. An incentive to do so would be to reduce the bitrate from 18 Mbps to 12 Mbps. With this experiment, we wanted to assess the effect that this change would have on picture quality for the viewers.

We chose to use three of the SVT sequences as input material, namely CrowdRun, DucksTakeOff, and OldTownCross. CrowdRun was chosen because it includes a complex motion scene together with highly detailed nature in the background. It is regarded as a complex and challenging sequence to encode. DucksTakeOff was our second choice because of the presence of water, which is always challenging to encode, and the rapid motion of the ducks' wings. It is also challenging for test participants to evaluate. Our last choice was OldTownCross, which is a rather easy sequence to encode. It is a sequence with a lot of film-grain noise and not so much detail⁷.

Experiment protocol

The different parameters of the broadcasting chain employed in this survey include variations in:

- Resolution (1920x1080, 1280x720)
- Scanning format (progressive, interlaced)
- Encoder (MPEG 2, H.264/AVC)
- Encoding bitrate (8, 12, 18 Mbps)

Out of all possible combinations of processing, 6 cases were chosen for the comparison:

- MPEG2 720p 18 Mbps
- MPEG2 1080p 18 Mbps
- MPEG2 1080i 18 Mbps
- AVC 720p 12 Mbps
- AVC 1080p 12 Mbps
- AVC 1080i 12 Mbps

An exhaustive paired comparison was used to evaluate the image quality perception. A total of 6x5/2 = 15 different comparisons are produced for each video sequence. The test comprised 3 different video sequences, so the total number of comparisons in the test is 15x3 = 45.

Participants are seated in front of a TV (SONY KDL-40X2000) at a distance of 4 times the height of the visible display area of the TV, namely 4 x 0.50 = 2 m. The participants took the test one by one, which ensured that each had the best viewing angle (perpendicular to the middle of the screen) in front of the TV. The TV is set to the default parameters (setting "standard") defined by the manufacturer.

The experiment is held under stable artificial lighting conditions, with 50 lux measured on the screen in the direction of the viewer. The curtain around the screen gave a lighting level of 300 lux. Instead of simulating living room conditions, we adjusted the lighting to the settings assumed by the manufacturer. To achieve that, we used the light sensor built into the TV. Lighting conditions were adjusted such that the brightness of the screen would not be adjusted automatically by the TV.

The video sequences are originally in different resolutions and scanning formats. They have all been converted to the highest format, 1920x1080 progressive, for comparison purposes. Split-screens have been created, with both sides of the split-screen showing the same original sequence but in different picture formats and/or with different processing. Each instance is positioned on the screen either right or left, but overall the positioning is symmetrical. For every participant the comparisons are done in a different order, using a pseudorandom ordering.

The participants are requested to judge the overall image quality and choose which version of the video sequence they prefer (left or right). They do not know which instances of the sequences are being compared (blind test).

The perception test was taken by 20 participants. They were a mixed group of employees from the Philips Research Eindhoven Video Processing groups (experts) and students or employees with a different background (non-experts).

⁷ The characterization of the sequences as easy or difficult to encode has been made according to the SVT characterization and the PSNR scores of the sequences after compression.

Results

After the completion of the perception test, the data was given to the Video Processing and Visual Perception Group in Philips Research Eindhoven for statistical processing (5), and the processed results were sent back. During the processing no data was removed or excluded from the test set. The results were organized in four groups of cases according to the sequences included:

- CrowdRun
- DucksTakeOff
- OldTownCross
- All Sequences

After the statistical processing, every instance of each sequence is given a value called the z-score, which is calculated using the Thurstone method (6). The z-score can take values from 0 to 3. It is a metric showing the preference for this instance of the sequence. Using the cumulative normal probability function, the z-score can be mapped to a preference probability.
Two different instances of the same sequence differ with statistical significance if their preference probabilities differ by more than 20%. All instances of each sequence, both MPEG 2 and AVC, are directly compared. The results are presented sequence by sequence, and then all together, using the preference probability and the z-score. The z-score is presented on radar graphs. On those graphs the six TV formats are placed at the corners of a hexagon. The z-score of each format is given by the extent of the colored area along the radius from the center of the hexagon to the corner of that format. By studying the shape and the size of the colored area, it is easy to get an intuitive idea of the formats preferred during the test.
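As an illustration of the procedure described above, the following Python sketch enumerates the 15 format pairs of the exhaustive paired comparison and applies a simple Thurstone Case V scaling to a table of preference counts. The function names, the clamping of extreme proportions, the shift of the lowest score to zero, and the use of the normal CDF on score differences are assumptions made for this illustration. The actual statistical processing was carried out by the Visual Perception Group (5) and may differ in detail.

```python
from itertools import combinations
from statistics import NormalDist

# The six cases compared in the perception test.
FORMATS = [
    "MPEG 2 720p", "MPEG 2 1080p", "MPEG 2 1080i",
    "AVC 720p", "AVC 1080p", "AVC 1080i",
]


def all_pairs(items):
    """Every unordered pair of items: n*(n-1)/2 comparisons."""
    return list(combinations(items, 2))


def thurstone_case_v(wins, n_items):
    """Scale values from paired-comparison counts (Thurstone Case V).

    wins[(i, j)] is the number of times item i was preferred over item j.
    Returns one score per item, shifted so that the lowest score is 0
    (the shift is an assumption of this sketch).
    """
    normal = NormalDist()
    scores = []
    for i in range(n_items):
        z_sum = 0.0
        for j in range(n_items):
            if i == j:
                continue  # the self-comparison contributes z = 0
            w_ij = wins.get((i, j), 0)
            w_ji = wins.get((j, i), 0)
            total = w_ij + w_ji
            p = w_ij / total if total else 0.5
            # Clamp so that unanimous votes do not give infinite z-values.
            p = min(max(p, 0.01), 0.99)
            z_sum += normal.inv_cdf(p)
        # Mean over all n items, counting the zero self-comparison.
        scores.append(z_sum / n_items)
    lowest = min(scores)
    return [s - lowest for s in scores]


def preference_probability(score_a, score_b):
    """Probability that A is preferred over B, given their scale values."""
    return NormalDist().cdf(score_a - score_b)


if __name__ == "__main__":
    pairs = all_pairs(FORMATS)
    print(len(pairs), "comparisons per sequence,", len(pairs) * 3, "in total")
    # Toy data: item 0 preferred over item 1 by 15 of 20 participants.
    toy_scores = thurstone_case_v({(0, 1): 15, (1, 0): 5}, 2)
    print(toy_scores, preference_probability(toy_scores[0], toy_scores[1]))
```

With only two items, the sketch recovers the observed preference proportion (0.75 in the toy data), which is a quick sanity check of the scaling.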


Moreover, the preference probability graphs translate the results into a percentage format that is easier to understand quantitatively. Every format is depicted by a bar labeled with its preference probability. Every graph includes all examined formats.

CrowdRun

The z-scores for "CrowdRun" are depicted in Figure 4.1, and the preference probability in Figure 4.2.

Figure 4.1: The z-score for all instances of the sequence "CrowdRun"

Figure 4.2: Preference probability for all instances of the sequence "CrowdRun"

A quick look at the z-score diagram shows a strong preference for the MPEG 2 encodings. This preference is also statistically confirmed in all cases. Participants reported that the artifacts of AVC on the grass were very obvious and annoying. They indicated that it was a very easy selection because the overall perceived sharpness was strongly in favor of MPEG 2. The only AVC format that survived was 720p, but even so its difference from the lowest-ranked MPEG 2 format, 1080i, was statistically significant. The differences between the MPEG 2 instances are not statistically significant. On the AVC side, 720p is statistically better than 1080i and, with a difference of 19%, on the verge of being safely regarded as better than 1080p. During the feedback sessions after the perception test, participants commented that AVC produces blurring artifacts on the grass that are annoying for the viewer. For the 1080p instance, they commented that it produced even more blurring than the 720p instance.

DucksTakeOff

The z-scores for "DucksTakeOff" are depicted in Figure 4.3, and the preference probability in Figure 4.4.

Figure 4.3: z-score for all instances of the sequence "DucksTakeOff"

Figure 4.4: Preference probability for all instances of the sequence "DucksTakeOff"

The z-score diagram shows that for "DucksTakeOff" there is again a preference for MPEG 2, but also for AVC 720p. The preference probability graphs show that MPEG 2 720p, MPEG 2 1080i, MPEG 2 1080p, and AVC 720p cannot be safely ranked, since they gather almost the same percentage of preference. Their difference from AVC 1080i and 1080p, though, is significant, exceeding 20%. Many of the participants, in discussions after the end of the test, commented that it was a very difficult comparison for them. The artifact that some of them could notice was blockiness, which was more evident in the 1080p case. In their feedback, they also noted for AVC that despite the high level of detail in some regions of the image, some easy-to-encode regions had contouring and blocking artifacts.

OldTownCross

The z-scores for "OldTownCross" are depicted in Figure 4.5, and the preference probability in Figure 4.6.

Figure 4.5: z-score for all instances of the sequence "OldTownCross"

Figure 4.6: Preference probability for all instances of the sequence "OldTownCross"

For "OldTownCross", no firm conclusion can be drawn from the results. AVC seems to be better than MPEG 2 overall, but its lead is not conclusive. Another interesting fact is the advantage of the 1080p instances, for both AVC and MPEG 2. Given the high PSNR scores of this sequence after encoding, it seems that the sequence is easy to encode and that the encoders can take advantage of the extra detail of 1080p. On the other hand, the encodings were of good quality in every case and it was hard for some non-expert participants to spot the differences. The only exception was MPEG 2 1080i, which proved significantly worse than most of the other instances. Due to its interlaced nature, it suffered from flickering on horizontal edges. It seemed, though, that this flickering was much more annoying for the MPEG 2 instance than for the AVC instance, which performed better in the tests.

All Sequences

The z-scores for the complete set of sequences are depicted in Figure 4.7, and the preference probability in Figure 4.8.

Figure 4.7: z-score for all instances of the complete set of sequences

Figure 4.8: Preference probability for all instances of the complete set of sequences

Summing up all the results, we can make some overall comments. In the difficult-to-encode sequences, like "CrowdRun" and "DucksTakeOff", where artifacts are significant, the participants tended to like the sharpness and constant quality of the MPEG 2 encodings. In the same sequences, there was no significant difference between 720p, 1080p, and 1080i. On "OldTownCross", which is classified as easy to encode by SVT, AVC was preferred to MPEG 2, partly because of the film-grain noise filtering.

In this sequence, both encoders seem to take advantage of the extra detail of 1080p, and participants prefer it over 720p. On the interlaced side, AVC 1080i was the least preferred of all formats, with statistically significant differences from AVC 720p and all the MPEG 2 instances. MPEG 2 1080i performed better, staying close to the progressive formats in terms of participant preference.

Results & Conclusions

The evaluation of the HDTV broadcasting chain and its parameters is a highly complicated subject. It involves countless technical details that can drastically affect the final results, as well as political issues that can bias the results of a survey. In this project we tried to stay free of any influence or bias, and to create clear, transparent, and reproducible results that add to our knowledge of the HDTV broadcasting chain. We certainly do not claim to have carried out exhaustive research on the subject, but we tried to give some new insights and to dispel some existing myths. This chapter presents the results; they are discussed further in the Discussion section.

The quality of the output of the interlaced broadcasting chain is highly dependent on the quality of interlacing and de-interlacing. Using tools provided by Philips Research Laboratories (PTS - MBVP), an interlaced version of all input sequences was created. These interlaced sequences scored 4 dB higher in terms of PSNR than the SVT ones. Moreover, their perceived sharpness was higher.

MPEG 2 and AVC currently produce encodings of comparable objective quality for HDTV content. Comparing AVC with MPEG 2 objectively, in terms of PSNR, AVC has a small gain over MPEG 2 for hard-to-encode video sequences (e.g. "CrowdRun", "ParkJoy", and "DucksTakeOff"), or for high resolution formats (e.g. 1080p) encoded with low bitrates (e.g. 8 Mbps). For easy-to-encode sequences (e.g. "OldTownCross" and "InToTree") and low bitrates (e.g Mbps) MPEG 2 produces encodings with higher PSNR.

The perceived quality of the two encoders is also comparable. MPEG 2 and AVC produce different types of artifacts for the same bitrate, but subjectively the difference is small. MPEG 2 maintains sharpness but creates blocking artifacts. On the other hand, AVC smooths out blocking artifacts and noise, but also removes details and sharpness. Moreover, MPEG 2 produces an average but uniform quality, while AVC provides a region-dependent quality within every frame. It is up to the viewer to decide which type of artifacts is preferred.

This project was concluded with a perception test focusing on the industry trend of migrating from MPEG 2 at 18 Mbps to AVC at 12 Mbps. The participants of the perception test had a preference for MPEG 2 at 18 Mbps in the difficult-to-encode sequences. For the easy ones there was no clear preference. Overall, the subjects preferred the MPEG 2 encodings. They assessed the MPEG 2 artifacts as less annoying than the ones produced by AVC.

For MPEG 2, the progressive format is comparable in quality to the interlaced one; the difference is small, and in most cases not statistically significant. The only exception was "OldTownCross", where many vertical edges exist and the flickering was annoying for the test participants. For AVC, where the encoding of interlaced content is still under development, the difference between progressive and interlaced was bigger.
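Since PSNR is the objective quality measure used throughout, a minimal sketch of its computation may help to interpret figures such as the 4 dB gain quoted above. The sketch assumes 8-bit luma frames held as NumPy arrays. The function names and the peak value of 255 are illustrative assumptions and do not describe the tooling used in the project.

```python
import numpy as np


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal size."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))


def mse_ratio_for_gain(gain_db):
    """Factor by which the mean squared error shrinks for a given PSNR gain."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 256, size=(1080, 1920)).astype(np.float64)
    noisy = np.clip(frame + rng.normal(0.0, 2.0, size=frame.shape), 0, 255)
    print("PSNR of a lightly distorted frame:", psnr(frame, noisy))
    print("MSE ratio behind a 4 dB gain:", mse_ratio_for_gain(4.0))
```

Read this way, a 4 dB higher PSNR corresponds to a mean squared error that is lower by a factor of about 2.5.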

Discussion & Future Work

After the presentation of the results, this chapter discusses them more thoroughly. Conclusions, technical details, and choices made during the project are discussed. Furthermore, some proposals for future work and open subjects are stated here.

High quality content is the target of this project. The sequences shot by SVT are captured with settings optimized for maximum detail and sharpness, both in the foreground and the background. Their complexity is well above that of the content that is currently broadcast. Therefore, the conclusions drawn from this project apply only to demanding content. Some simulations with simpler content were done, but the results were quite different and could have been misleading for the rest of the project. Thus, these simulations were stopped and the results are briefly presented in the Film Content paragraph of the Progressive Video Chain. Further simulations with this kind of content remain a challenge for future research. The HDTV era has begun for the big production studios, and it will be interesting to work on material that is now being produced.

The scaler was one of the first choices made in this project. The Avisynth Lanczos filter performs some sharpening in addition to the scaling. Its purpose is to compensate for the sharpness lost in downscaling, but also to create additional perceived sharpness in upscaling. The contrast enhancement can be considered aggressive. It causes some light ringing effects, but also creates artificial higher-frequency content. These effects enhance the perceived detail of the picture. The sharpening applied by the filter levels the perceived sharpness with that of the original 1080p content. Thus, it forces the viewer to decide based on the detail of the tested sequence and not just the sharpness. Consequently, when sitting at a distance where the eye becomes the limiting factor for perceived detail, choosing between 720p and 1080p can become tricky.

The interlacer and the de-interlacer used for the broadcasting chain can significantly affect the quality of the end result. With the PTS Interlacer, the 1080i sequences that were created were about 4 dB better, in terms of PSNR, than the SVT sequences. The sharpness added by the new interlacing process was preserved to an extent in the encoded sequences (see Appendix C). Combined with the MBVP de-interlacer, the final result was comparable with the progressive encodings. This was confirmed by the perception test, where 720p and 1080p were not significantly better than 1080i. In contrast with previous perception tests in the literature, our test did not result in a clear victory of the progressive format. A bad interlacer and de-interlacer can be a limiting factor in the comparison of progressive and interlaced formats. Thus, extreme care has to be taken when choosing the tools to handle interlaced content.

MPEG 2 and AVC have different Rate-Distortion (R-D) optimization strategies. MPEG 2 optimizes its bitrate control based on the overall quality of the picture. Thus, the whole picture is encoded with a uniform quality, depending on the overall complexity. AVC, on the other hand, is optimized for PSNR. As a consequence, it spends a lot of bits on hard-to-encode areas of the image and fewer on easier parts. In complex sequences, difficult regions are encoded with high quality, whereas annoying contouring and blocking artifacts are introduced in easier regions. Therefore, its output has a higher PSNR than that of MPEG 2, but subjectively it is not preferred to the average, but uniform, quality of MPEG 2. In easy sequences, where the bitrate is sufficient, AVC has a slightly lower PSNR due to the loop filter, which removes high-frequency details. It produces, though, a clean result, without noise, that is pleasant for the average viewer. It would be interesting in the future to compare MPEG 2 and AVC with the same R-D optimization strategy.

MPEG 2 and AVC currently produce encodings of comparable subjective quality for HDTV content. AVC supporters insist that it produces the same quality as MPEG 2 at half the bitrate, but this proved not to be true in our simulations. The statement may be true for low bitrate video content, but for the high bitrates (8-18 Mbps) of a High Definition (HD) signal the difference is small. For even higher bitrates (e.g. 25 Mbps for Blu-ray or HD-DVD), the difference is not even perceivable. Comparing AVC with MPEG 2 objectively, in terms of PSNR, AVC has a small gain over MPEG 2 when the encoder is under intense pressure. This happens for complex video sequences (e.g. "CrowdRun", "ParkJoy", and "DucksTakeOff"), or for high resolution formats (e.g. 1080p) encoded with low bitrates (e.g. 8 Mbps). For easier sequences (e.g. "OldTownCross" and "InToTree") and higher bitrates (e.g Mbps), MPEG 2 produces encodings with higher PSNR. The two encoders produce different types of artifacts for the same bitrate, but subjectively the difference is small. MPEG 2 maintains sharpness but creates blocking artifacts. On the other hand, AVC smooths out blocking artifacts and noise, but it also removes details and sharpness. It is up to the viewer to decide which type of artifacts is preferred. These results reflect the current state of the encoders. Due to the constant optimization of AVC, this comparison should be regularly revisited until AVC matures.

During the subjective evaluations, a 1080p panel and a 1080p-capable TV set were used. This raises some concerns about the effect of the Kell factor on the results of the experiment. For the 1080p sequences, the Kell factor implies that the human eye will not be able to see the full range of frequencies available. In some short experiments, described in the Progressive Video Chain chapter, we determined the Maximum Perceived Resolution Factor (MPRF) of the TFT-LCD panel that we used. The same experiment was performed for the TV used for the perception test. According to the MPRF, the maximum frequency that we can see in the vertical direction is the number of lines divided by 2 and multiplied by the MPRF. That is 459 cycles for the panel and 486 for the TV set. This can be a limiting factor for the perceived high frequencies displayed and, thus, a bias towards 720p in the comparisons with 1080p instances.

In addition, for the perception tests we chose a distance of 4 times the picture height (about 2 meters). This choice also introduces some possible limitations on the high frequency content that we can perceive. Considering the human visual system, the frequencies that the average viewer can see are bounded by the product of the viewing angle (θ), as described in Figure 6.1, and his or her contrast sensitivity limit, which is taken as 30 cycles/degree (Figure 6.2).

Figure 6.1: Representation of TV viewing conditions

Figure 6.2: Contrast Sensitivity Function of the Human Visual System

For a distance of 4 times the picture height, this adds up to about 430 cycles, compared to the theoretical maximum of 540 cycles. Consequently, this choice sets even tighter constraints on the maximum frequencies that we can perceive.

To check the frequency content of the sequences displayed, we chose the most detailed of the sequences, "CrowdRun". The amplitude of the frequency content of the original, on a linear scale, is shown in Figure 6.3. For representation purposes the color scale has been bounded; values above the upper bound are represented by the same red color.
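The figure of about 430 cycles for a viewing distance of 4 picture heights can be checked with a few lines of Python. The sketch below assumes a viewer centered in front of the screen, so that the picture height subtends an angle of 2·atan(1/(2d)) at a distance of d picture heights, and uses the limit of 30 cycles/degree from the text. The function names are illustrative assumptions.

```python
import math


def viewing_angle_deg(distance_in_picture_heights):
    """Vertical angle (degrees) subtended by the picture at the viewer's eye."""
    return math.degrees(2.0 * math.atan(0.5 / distance_in_picture_heights))


def perceivable_cycles(distance_in_picture_heights, cycles_per_degree=30.0):
    """Upper bound on the vertical cycles per picture height the viewer can resolve."""
    return viewing_angle_deg(distance_in_picture_heights) * cycles_per_degree


def nyquist_cycles(lines):
    """Theoretical maximum: one cycle needs two lines."""
    return lines / 2


if __name__ == "__main__":
    print("Viewing angle at 4H (degrees):", viewing_angle_deg(4.0))
    print("Perceivable cycles at 4H:", perceivable_cycles(4.0))  # roughly 427.5
    print("Theoretical maximum for 1080 lines:", nyquist_cycles(1080))
```

Under these assumptions the viewing angle is about 14.25 degrees, which gives roughly 427.5 cycles, in line with the rounded value of 430 quoted above.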

Figure 6.3: Amplitude of the frequency content of 1080p "CrowdRun"

We can see that "CrowdRun" has most of its frequency content below 450 cycles, even in the original 1080p version. During encoding some of the high frequency content is lost, as can be seen in Figure 6.4 and Figure 6.5.

Figure 6.4: Amplitude of the frequency content of 1080p "CrowdRun" encoded with MPEG 2

For MPEG 2 we can see that the frequency spectrum of the content has been reduced. There is some additional content in high vertical-only and horizontal-only details. It seems that the blocking artifacts produce some additional frequencies. Most of the frequency content is bounded below 400 cycles.
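Amplitude spectra such as those in Figure 6.3 and Figure 6.4 can be obtained with a two-dimensional FFT of the luma plane of a frame. The following is a minimal sketch assuming NumPy. The function name, the optional clipping that mimics a bounded color scale, and the synthetic test frame are assumptions of this illustration, since the tool used to produce the figures is not stated.

```python
import numpy as np


def amplitude_spectrum(luma, clip_max=None):
    """Centered 2-D amplitude spectrum of a luma frame on a linear scale.

    After the shift, the zero frequency sits in the middle of the array and
    the distance in rows from the center is the vertical frequency in cycles
    per picture height.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(np.asarray(luma, dtype=np.float64)))
    amplitude = np.abs(spectrum)
    if clip_max is not None:
        # Bound the scale so that a few very strong components do not hide the rest.
        amplitude = np.minimum(amplitude, clip_max)
    return amplitude


if __name__ == "__main__":
    # Synthetic frame: a vertical sinusoid with 8 cycles per picture height.
    rows, cols, cycles = 64, 64, 8
    y = np.arange(rows)
    frame = np.sin(2 * np.pi * cycles * y / rows)[:, None] * np.ones((1, cols))
    amp = amplitude_spectrum(frame)
    peak_row, peak_col = np.unravel_index(np.argmax(amp), amp.shape)
    print("Peak at vertical frequency:", abs(int(peak_row) - rows // 2), "cycles")
```

For a real 1080-line frame the same distance from the center reads directly in the cycle counts used in this chapter, up to the theoretical maximum of 540.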

Figure 6.5: Amplitude of the frequency content of 1080p "CrowdRun" encoded with AVC

For AVC the frequency spectrum is further reduced due to the loop smoothing filter. Most of the frequency content is also bounded below 400 cycles.

In conclusion, the frequencies available in the encoded instances of "CrowdRun" are mostly bounded below the 430 cycles that the viewer could see in the perception test. It is known that the energy in the high frequency bands of video material is low compared to the low frequency bands, but it still makes a difference in perceived sharpness. Moreover, the artifacts in the sequences, even at 18 Mbps, introduce frequency content that is significantly higher (2-3 orders of magnitude) than the remaining high frequency content. Therefore, the artifacts are strong enough to attract the attention of the viewer, drastically reducing the perceived overall quality. The possible extra detail that the 1080p resolution introduces cannot be appreciated by the viewer due to the intensity of the artifacts. Choosing a distance of 4 times the picture height can actually help the viewer concentrate on the overall quality of the picture and not be distracted by the existing artifacts. In our opinion, the compression quality today, even at 18 Mbps, is insufficient to profit from 1080p for sequences of such complexity.

During our simulations we realized that there is no globally optimal encoding standard or scanning format. It is always the content itself that determines the best parameter set for the broadcasting chain. Some low-complexity, stationary content favors higher resolution (e.g. 1080p) and, depending on its vertical frequency content, interlaced scanning. On the other hand, high-complexity content will favor 720p because it yields less visible artifacts. The simulations performed showed that in most cases an intermediate resolution yields the (objectively and subjectively) best results. The same conclusion applies to the choice of encoder. For noisy video content viewers seem to prefer H.264/AVC, which smooths out the noise, to MPEG 2 (sometimes even to the uncompressed original). For other content they may choose MPEG 2, which gives higher perceived sharpness.

Acknowledgements

I would like to thank some people in Philips Research Laboratories for their contribution to this project.
- Rene van der Vleuten, for his continuous support, advice and help during the project. I couldn't have done it without him.
- Gerard de Haan, for his supervision, the interest he showed and his important remarks and support throughout the project.
- The participants in my perception test, for the time you spent for me and the patience you showed.
- Ingrid Heijnderickx, Arnold van Keersop, Ronald Kaptein, Roos Rajae-Joordens, and Ingrid Vogels from the Visual Perception Group of Philips Research Laboratories, for their support during the perception test.
- VPA and VPS groups, for their support and hospitality. Special thanks to Frank, Harold, Ihor, Leo Jan, and Nico.
- Ine van den Broek, for taking care of the administrative tasks.

I would also like to thank some people from TU/e for their support:
- Rian van Gaalen, secretary of the SAI-ICT program.
- Marja de Mol, secretary of the SAI-ICT program.
- Leon Kaufmann, Director of the SAI-ICT program.

Bibliography

1. SVT Corporate Development Technology. Overall-Quality Assessment When Targeting Wide-XGA Flat Panel Displays. Stockholm : s.n.,
2. Alexander Schuch (FH Köln), Tobias Schwahn (HdM Stuttgart). Eine qualitative Vergleichsuntersuchung der HD-Formate 720/50p und 1080/50i. Mainz, Germany : s.n.,
3. Hoffmann, Hans. High Definition TV. Was bringt die Zukunft? s.l. : EBU-UER, 1 17,
4. Hamon, Gregory. Objective Quality Measures For Compressed Images. Eindhoven : Nat.Lab, /
5. Rosemarie Rajae-Joordens, Jan Engel. Paired comparisons in visual perception studies using small sample sizes. Eindhoven, the Netherlands : Elsevier,
6. Thurstone, L.L. A law of comparative judgment. s.l. : Psychological Review, , pp
7. AU Optronics, Inc. Product Functional Specification: 46" Color TFT-LCD Module T460HW01 v.3. November 11,
8. SONY Corporation. LCD Digital Colour TV. BRAVIA KDL-40X
9. International Organization for Standardization. ISO/IEC :2000. Information technology -- Generic coding of moving pictures and associated audio information: Systems. 11 2,
10. Doom9.net. Codec shoot-out Final. Doom9. [Online]
11. MSU Graphics & Media Lab (Video Group). Third Annual MSU MPEG-4 AVC/H.264 Video Codec Comparison. Compression Project. [Online] _avc_h264_2006_en.html.
12. Bosma, Marco. Eco PixelPlus: Low cost PixelPlus for mainstream application. Eindhoven : Philips, Internal Report. PR-TN 2005/
13. Mitchell, D. P. and Netravali, A. N. Reconstruction filters in computer graphics. s.l. : ACM Press, SIGGRAPH '88 Proceedings. Vol. 22, pp
14. AviSynth. Resize. AviSynth, A Powerful Non-Linear Scripting Language For Video. [Online] April 2,
15. Bellers, Erwin. MBVP de-interlacing: An algorithmic overview. s.l. : Philips, Internal Report.
16. Ciuhu, Calina and Haan, Gerard de. A 2-dimensional generalised sampling theory and application to de-interlacing. s.l. : SPIE, VCIP Proceedings.
17. SVT. The SVT High Definition Multi Format Test Set. Stockholm : s.n., February

Appendix A. Hardware configuration

A.1. Streamer

During the evaluation the sequences were played uncompressed on both the TFT-LCD panel and the TV set. To stream the data to the display devices, a streamer PC was used. The configuration of this streamer can be seen in Table A.1.

Table A.1: PC streamer characteristics

PC Streamer
Processor | 2 x Opteron Ghz
Motherboard | Nvidia nforce 4 chipset
RAM | 2 Gbytes
Graphics Card | Nvidia GeForce 7800 GTX
Hard Disk | 8 x Maxtor 6 L200 MO SCSI; 1 x Maxtor 6 B200 MO
Network Adapter | 1 Gbit Lan adapter
Operating System | Windows XP Pro Service Pack 2

A.2. Displays

A.2.1. TFT-LCD Panel

The display panel is an AUO 46.0 inch Color TFT-LCD Module T460HW01. This LCD module has a TFT active matrix type liquid crystal panel of 1920x1080 pixels, with a diagonal size of 46 inch. The module supports 1920x1080 HDTV in progressive mode. It is intended for displays where brightness, wide viewing angles, high color saturation, and high color depth are very important. Some general information about the panel is given in Table A.2.

Table A.2: TFT panel general information

Items | Specification | Unit
Actual Screen Size | 46 | Inches
Display Area | (H) x (V) | mm
Outline Dimension | (H) x (V) x 47.68(D) | mm
Driver Element | a-Si TFT active matrix |
Display Colors | 16.7M | Colors
Number of Pixels | 1920 x 1080 | Pixel
Pixel Arrangement | RGB vertical stripe |
Pixel Pitch | 0.531(H) x 0.531(W) | mm
Display Mode | Normally Black |
Surface Treatment | Anti-Glare |
Viewing Angles | 178° x 178° | Degrees
Temperature Range | 0-50 | °Celsius
Interface | Low Voltage Differential Signaling |
Backlight | 28 Cold Cathode Fluorescent Lamps |
Inverter | Built-In |

Furthermore, for a perception test it is important to be aware of the optical characteristics of the display. The optimal viewing parameters can be extracted from those specifications. The main optical parameters are given in Table A.3.

Table A.3: TFT panel optical specifications

Parameter | Symbol | Value (Typ/Max) | Unit
Contrast Ratio | CR | 600 |
Surface Luminance, white | LWH | 600 | cd/m2
Luminance Variation | δ white(9p) | 1.33 |
Response Time | Tg | 8 | ms
Response Time | Tr | 15 | ms
Response Time | Td | 5 | ms
Viewing Angle, x axis, right (φ = 0°) | θr | 85 | Degree
Viewing Angle, x axis, left (φ = 180°) | θl | 85 | Degree
Viewing Angle, y axis, up (φ = 90°) | θu | 85 | Degree
Viewing Angle, y axis, down (φ = 0°) | θd | 85 | Degree

Color is another parameter of the optical specifications. In this panel each pixel is divided into Red, Green and Blue sub-pixels, or dots, which are arranged in vertical stripes. The gray scale, or brightness, of each sub-pixel color is determined by an 8-bit gray scale signal for each dot. More details on the color specification of the display are given in Table A.4.

Table A.4: TFT panel Color Specifications

Parameter | Symbol | Values (Min, Typ, Max)
Color Chromaticity, RED | Rx, Ry |
Color Chromaticity, GREEN | Gx, Gy |
Color Chromaticity, BLUE | Bx, By |
Color Chromaticity, WHITE | Wx, Wy |

For further details on the TFT-LCD panel please refer to the panel's white paper (7).

A.2.2. TV Set

For the subjective evaluations and the perception tests, a TV set was used. Since the comparisons were made in the 1080p scanning format, a TV capable of receiving 1080p as input was needed. The TV set that was used is a Sony KDL-40X2000.

Figure A.1: The SONY KDL-40X2000 TV

The main characteristics are given in Table A.5.

Table A.5: Sony KDL-40X2000 specifications

Characteristics | Value | Details
Screen size | 40 inches |
Display Resolution | 1920 x 1080 dots |
Panel Type | LCD Panel |
Dimensions | 1111(w) x 657(h) x 121(d) mm |
Mass | Approx. 30 Kg |
TV Signal Reception | Analogue | PAL, SECAM, NTSC; B/G/H, D/K, L, I
TV Signal Reception | Digital | MPEG 2 MP@ML, DVB-T
Power Consumption | 205 W |
Standby Power Consumption | 0.3 W |
Input | 2 x HDMI | 1080p, 1080i, 720p, 576p, 576i, 480p, 480i
Input | 1 x 4-pin S-Video |
Input | 1 x Composite |
Input | 2 x Component video | 1080i, 720p, 576p, 576i, 480p, 480i
Input | 3 x 21-pin Scart |
Input | 1 x RF |
Input | 1 x 15-pin D-Sub (PC Input) | VGA, SVGA, XGA, WXGA (only progressive)
Output | 1 x Optical Output |
Output | 1 x Audio Output |
Output | 1 x Headphone jack |

More details on the TV can be found in (8).

Appendix B. Software configuration

B.1. MPEG 2 - Encoder

B.1.1. Introduction to MPEG 2

MPEG-2 is a standard for "the generic coding of moving pictures and associated audio information" (9). It was developed by the Moving Picture Experts Group (MPEG) and is an international standard (ISO/IEC 13818). MPEG 2 was designed to have a practically limitless application range. This was achieved by not completely specifying the formats to be used with certain applications. Regional institutions adapt it to their needs by restricting and augmenting aspects of the standard and by making use of predefined Profiles and Levels (see Table B.1 and Table B.2). It is the core encoding standard of the digital television signals broadcast by terrestrial (over-the-air), cable, and direct broadcast satellite TV systems, and of DVDs.

MPEG-2 includes a Systems part (part 1) and a Video part (part 2). The Systems part defines two distinct container formats. One is the Transport Stream, which is designed to carry digital video and audio over a medium without guaranteed Quality of Service (QoS). The MPEG-2 Transport Stream is commonly used for broadcasting applications, such as ATSC and DVB. MPEG-2 Systems also defines the Program Stream, a container format that is designed for reasonably reliable media such as disks. The MPEG-2 Program Stream is used in the DVD, SVCD, and HD-DVD standards. The Video part (part 2) of MPEG-2 provides support for both progressive and interlaced video.

Table B.1: List of MPEG 2 Profiles

Abbr. | Name | Frames | YCbCr | Streams | Details
SP | Simple Profile | P, I | 4:2:0 | 1 | no interlacing
MP | Main Profile | P, I, B | 4:2: | |
P | 4:2:2 Profile | P, I, B | 4:2:2 | 1 |
SNR | SNR Profile | P, I, B | 4:2: | |
SP | Spatial Profile | P, I, B | 4:2: | |
HP | High Profile | P, I, B | 4:2: | |

Table B.2: List of MPEG 2 Levels
Low, Normal And High Quality Decoding

Abbr. | Name | Pixels/Line | Lines | Frame-rate (Hz) | Bitrate (Mbps)
LL | Low Level | | | |
ML | Main Level | | | |
H-14 | High | | | |
HL | High Level | | | |

B.1.2. Canopus Procoder 2

MPEG 2 has been on the market for about 10 years. During those years commercial encoders have matured and are now able to exploit the virtues of the codec fully. Various implementations are available. The encodings for the tests were done using Canopus Procoder 2, provided by Philips Research Laboratories. Procoder 2 is one of the best products on the market, with outstanding performance on both progressive and interlaced video. Moreover, it seems to be one of the favorite encoders among encoding enthusiasts, and it has won a number of awards. Canopus Procoder 2 is highly parameterized, enabling the user to get the best possible result. An example snapshot of these parameters for 1080i is shown in Figure B.1.

Figure B.1: Canopus Procoder 2 parameters snapshot

This project's main aim is broadcasting, so some guidelines have to be followed. Because broadcast channels have a constant bitrate, the transported stream must also have a constant bitrate (CBR). CBR encoding was therefore used for all tests. An overview of the settings used in this project for video content can be seen in Table B.3, Table B.4, and Table B.5.

Table B.3: Encoding parameters used for encoding 720p in Procoder 2

Stream Format                     Generic ISO MPEG Stream
Stream Type                       MPEG 2 Elementary Stream
Resolution                        1280x720
Frame Rate                        fps
Interlacing                       Not Interlaced
Aspect Ratio Code                 16:9
Quality/Speed                     Mastering Quality
Bitrate Type                      CBR
Number of Passes                  1 Pass
Video Bitrate                     8, 12, 18 Mbps
Profile/Level                     HL
Put Header on Each GOP            No
VBV Buffer Size                   1194
Maximum GOP size                  30
GOP Structure                     Automatic
Picture Structure                 Automatic
Intra DC Precision                9
Use Strict GOP Bitrate Control    No
Create DVD Compatible Stream      No

Table B.4: Encoding parameters used for encoding 1080i in Procoder 2

Stream Format                     Generic ISO MPEG Stream
Stream Type                       MPEG 2 Elementary Stream
Resolution                        1920x1080
Frame Rate                        25 fps
Interlacing                       Upper/Top Field First
Aspect Ratio Code                 16:9
Quality/Speed                     Mastering Quality
Bitrate Type                      CBR
Number of Passes                  1 Pass
Video Bitrate                     8, 12, 18 Mbps
Profile/Level                     HL
Put Header on Each GOP            No
VBV Buffer Size                   1194
Maximum GOP size                  15
GOP Structure                     Automatic
Picture Structure                 Automatic
Intra DC Precision                9
Use Strict GOP Bitrate Control    No
Create DVD Compatible Stream      No

Although 1080p at 50 frames per second is not standardized for MPEG 2, a trick was used to achieve the compression. The frame rate in the header of the input was changed from 50 to 25 frames per second, after which the encoding could be performed. Because the encoder then spreads its bit budget over what it believes to be half as many frames per second, the video bitrate (bits per second) was also halved to keep the total bitrate equal: 4, 6, and 9 Mbps at the nominal 25 frames per second correspond to 8, 12, and 18 Mbps at the true 50 frames per second. After decoding, the frame rate was corrected in the header.
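The arithmetic behind the frame-rate trick can be checked with a few lines of Python. This is only a sketch: the function name `nominal_bitrate` and its parameters are hypothetical and are not part of Procoder 2 or of any tool used in the project.

```python
def nominal_bitrate(target_bps, true_fps, header_fps):
    """Bitrate to request from an encoder that is told the clip runs at header_fps.

    The encoder spends nominal_bitrate / header_fps bits per frame on average.
    When the stream is later played at true_fps, that per-frame budget
    corresponds to target_bps.
    """
    bits_per_frame = target_bps / true_fps
    return bits_per_frame * header_fps


# The three target bitrates used for 720p and 1080i, applied to 1080p/50
# encoded with a 25 fps header.
for target_mbps in (8, 12, 18):
    nominal = nominal_bitrate(target_mbps * 1_000_000, true_fps=50, header_fps=25)
    print(f"{target_mbps} Mbps at 50 fps -> encode at {nominal / 1e6:g} Mbps with a 25 fps header")
```

The printed values reproduce the 4, 6, and 9 Mbps settings listed for 1080p in Table B.5.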

Table B.5: Encoding parameters used for encoding 1080p in Procoder 2

Stream Format                     Generic ISO MPEG Stream
Stream Type                       MPEG 2 Elementary Stream
Resolution                        1920x1080
Frame Rate                        25 fps
Interlacing                       Not Interlaced
Aspect Ratio Code                 16:9
Quality/Speed                     Mastering Quality
Bitrate Type                      CBR
Number of Passes                  1 Pass
Video Bitrate                     4, 6, 9 Mbps
Profile/Level                     HL
Put Header on Each GOP            No
VBV Buffer Size                   1194
Maximum GOP size                  30
GOP Structure                     Automatic
Picture Structure                 Automatic
Intra DC Precision                9
Use Strict GOP Bitrate Control    No
Create DVD Compatible Stream      No

For the film content the same settings are used. The only difference in settings is the frame-rate, which is either 24 or frames per second.

B.2. H.264/AVC Encoder

B.2.1. Introduction to H.264/AVC

H.264, MPEG-4 Part 10, or AVC (for Advanced Video Coding), is a digital video codec standard that is noted for achieving very high data compression. It was written by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership effort known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 Part 10 standard (formally, ISO/IEC ) are technically identical. The final drafting work on the first version of the standard was completed in May

The standard includes the following six profiles, targeting specific classes of applications:

Baseline Profile (BP): Primarily for lower-cost applications with limited computing resources, this profile is used widely in videoconferencing and mobile applications.

Main Profile (MP): Originally intended as the mainstream consumer profile for broadcast and storage applications, the importance of this profile faded when the High profile was developed for those applications.

Extended Profile (XP): Intended as the streaming video profile, this profile has relatively high compression capability and some extra tricks for robustness to data losses and server stream switching.

High Profile (HiP): The primary profile for broadcast and disc storage applications, particularly for high-definition television applications (this is the profile adopted into HD DVD and Blu-ray Disc, for example).

High 10 Profile (Hi10P): Going beyond today's mainstream consumer product capabilities, this profile builds on top of the High Profile, adding support for up to 10 bits per sample of decoded picture precision.

High 4:2:2 Profile (Hi422P): Primarily targeting professional applications that use interlaced video, this profile builds on top of the High 10 Profile, adding support for the 4:2:2 chroma sampling format while using up to 10 bits per sample of decoded picture precision.

B.2.2. X264 (r628)

There are different implementations of H.264/AVC on the market. The open-source X264 was chosen because it combines encoding efficiency with time efficiency. Even though it is a free encoder, it shows consistent performance and is among the best available. Comparisons between the available H.264/AVC encoders can be found in (10) and (11). The MeGUI interface was used for the encodings. A screenshot of MeGUI can be seen in Figure B.2.

Figure B.2: MeGUI interface for X264 encoder

The settings used for X264 are given in screenshots in Figure B.3.

Figure B.3: X264 configuration in screenshots

B.2.3. JM 12.2

JM is the reference H.264/AVC encoder/decoder. Until recently, bitrate control for encoding interlaced content did not work. This problem was fixed in version 12. Both X264 and JM were tested on some interlaced content. Unfortunately, the X264 streams showed decoding problems, so JM was chosen for the simulation. The settings used for JM 12.2 are the default ones, with the changes listed in Table B.6.

Table B.6: Changes in the default settings of JM 12.2 for 12 Mbps encodings

LevelIDC                 41
IntraPeriod              25
QPISlice                 35
QPPSlice                 38
FrameSkip                2
NumberReferenceFrames    2
NumberBFrames            2
BRefPicQPOffset          0
ReferenceReorder         0
PocMemoryManagement      0
BiPredMESubPel           1
PicInterlace             1
WeightedBiprediction     1
RDOptimization           0
RestrictRefFrames        1
RateControlEnable        1
InitialQP                35
BasicUnit                16
RCUpdateMode             2
AdaptiveRounding         0
SearchMode               2

B.3. Scaling

Because sequences have to be tested in different resolution formats, a scaler is an important component of the video chain. The scaler is used for down-scaling, to produce lower resolution sequences from the original one, which has a resolution of 3840 x 2160. From this original format, 1080p, 720p, and all the intermediate resolutions used in the simulations are produced. Up-scaling is also needed, to increase the resolution of an existing video sequence. In order to evaluate the resolution formats used in the simulation, all of them were converted to 1920 x 1080 progressive. Therefore, it is important to have the best possible scaler, so that the performance of the scaler does not interfere with the results.

B.3.1. Scalers

The selection process includes a comparison between the candidate scalers. In our experiments three different up-converters were available: PTS scale, Eco PixelPlus, and Avisynth Lanczos4Resize.

PTS scale: PTS is a library of tools for image and video processing developed by Philips Research. It is a toolbox intended for internal use only. One of its functions is scaling, which is used here.

Eco PixelPlus: Eco PixelPlus is a low cost implementation of the Philips PixelPlus algorithm (12).

Avisynth Lanczos filter: Avisynth Lanczos4Resize uses the Mitchell-Netravali two-part cubic filtering function depicted below (13),

$$
k(x) = \frac{1}{6}
\begin{cases}
(12 - 9B - 6C)\,|x|^3 + (-18 + 12B + 6C)\,|x|^2 + (6 - 2B) & \text{if } |x| < 1 \\
(-B - 6C)\,|x|^3 + (6B + 30C)\,|x|^2 + (-12B - 48C)\,|x| + (8B + 24C) & \text{if } 1 \le |x| < 2 \\
0 & \text{otherwise}
\end{cases}
$$

where B and C are constants. The parameters B and C can be used to adjust the properties of the cubic; they are sometimes referred to as "blurring" and "ringing" respectively. You have to set B + 2C = 1 for the numerically most accurate filter. Different values of these constants give different filters. For the Lanczos filter, B=0 and C= are used, which results in some sharpening.

As its name suggests, Lanczos4Resize is a 4-tap filter. For upscaling, the filter is sized such that the entire equation falls across 4 input samples, making it a 4-tap filter. For downscaling, the equation is sized so that it falls across 4 destination samples, which are spaced at wider intervals than the source samples. Thus the total number of taps needed for downscaling is the downscaling ratio multiplied by the number of lobes (downscaling by a factor T with LanczoskResize results in T*2*k taps). In practice, that number has to be rounded up to the next even integer. For upscaling, it is always 4 taps (14).

B.3.2. Scaling test

An informal test was performed between the three scalers. For this test the SVT material was used.
The sequences were downscaled from 1080p to 720p, and subsequently upscaled again to 1080p. The results were evaluated objectively and subjectively.
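The filter description in B.3.1 can be made concrete with a short Python sketch. It evaluates the Mitchell-Netravali cubic in its usual published form, the textbook Lanczos window, and the tap-count rule quoted from (14). All function names are hypothetical, none of this is Avisynth's actual code, and the Lanczos formula is the standard definition rather than something stated in this note.

```python
import math


def mitchell_netravali(x, b, c):
    """Two-part cubic of Mitchell and Netravali with parameters b and c."""
    x = abs(x)
    if x < 1:
        return ((12 - 9 * b - 6 * c) * x**3
                + (-18 + 12 * b + 6 * c) * x**2
                + (6 - 2 * b)) / 6
    if x < 2:
        return ((-b - 6 * c) * x**3
                + (6 * b + 30 * c) * x**2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6
    return 0.0


def lanczos(x, lobes):
    """Textbook Lanczos window: sinc(x) * sinc(x / lobes) for |x| < lobes."""
    if x == 0:
        return 1.0
    if abs(x) >= lobes:
        return 0.0
    px = math.pi * x
    return lobes * math.sin(px) * math.sin(px / lobes) / (px * px)


def downscale_taps(ratio, lobes):
    """Tap count for downscaling by `ratio`: T * 2 * k, rounded up to an even integer."""
    taps = math.ceil(ratio * 2 * lobes)
    return taps + (taps % 2)


# Downscaling 1080p to 720p is a ratio of 1.5.
print(downscale_taps(1080 / 720, lobes=2))   # 2-lobe kernel
print(downscale_taps(1080 / 720, lobes=4))   # 4-lobe kernel
```

Both kernels are symmetric and equal to zero outside their support, which is why the tap count grows in proportion to the downscaling ratio: the same kernel is stretched over more source samples.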

PTS Scale produced a subpixel phase shift, which resulted in very bad PSNR values, and was excluded. Eco PixelPlus produced a lot of aliasing and was also excluded. The Avisynth Lanczos filter produced sharp images with high PSNR values and without aliasing, and was therefore chosen for the experiments.

B.4. De-interlacing

De-interlacing doubles the vertical-temporal sampling frequency of a video sequence. This is not, however, a straightforward procedure. Complications are introduced by two different phenomena. First, interlaced sampling creates repeat spectra. Removing all repeat spectra is not easy, because in practice there is no filtering after the sampling performed by the TV screen. Second, the human visual system tends to track moving objects, making them stationary on the retina of the eye. Consequently, removing high frequencies that seem irrelevant can produce motion blur and deteriorate quality.

De-interlacing is a crucial component of the video chain. It allows the conversion of interlaced sequences for use with progressive displays, such as LCD and plasma (PDP) panels. Its result can significantly affect the outcome of the whole video chain. De-interlacing can produce its own artifacts, which are added to the compression artifacts. Researchers have proposed numerous de-interlacing algorithms. Linear and non-linear methods have been suggested, with or without motion compensation. For a fair comparison between a progressive and an interlaced video chain, it is necessary to ensure that the best possible de-interlacer is employed. Therefore, a de-interlacing test was made to decide which one would be used for this project.

B.4.1. De-interlacing test

De-interlacing exploits the spatial and temporal correlation in a video sequence to interpolate the data content that was removed during interlacing.
Methods using spatial correlation give good results in stationary videos, while methods using temporal correlation can give fewer artifacts in high motion videos. Most de-interlacers use a combination of those methods. Two different de-interlacers were made available by Philips Research Laboratories, namely MBVP and GST. MBVP is NXP's latest generation of motion adaptive de-interlacers, which includes features like edge adaptivity and local still detection. It has two modes: Motion Adaptive (MA) and Motion Compensation (MC) (15). For our tests, we used MBVP in the so-called 4-field mode with local still detection enabled. GST, on the other hand, is based on a further generalization of the generalized sampling theorem, used to design vector-adaptive inseparable 2D filters. These filters use samples from the current field and the motion compensated previous field that are not available for all vectors on a vertical line. The resulting inseparable filters give a better interpolation quality for a given number of input pixels. The algorithm can be made robust against inaccurate motion vectors (16).

The testing configuration uses as input the 1080i instances of the input sequences. MBVP was tested in both Motion Adaptive (MA) and Motion Compensation (MC) mode, against GST. The testing configuration is depicted in Figure B.4.

Figure B.4: De-interlacing test configuration (1080i source, MPEG 2 encode/decode, then MBVP in MA mode, MBVP in MC mode, and GST, each producing 1080p)

The de-interlacers gave 1080p instances of the sequences. The subjective evaluation of the produced sequences was done on a 1080p 46-inch TFT-LCD panel. GST produced more interlacing artifacts than both MBVP modes. The Motion Adaptive and Motion Compensation modes performed similarly, with Motion Compensation performing slightly better. Since quality is our main goal, MBVP in Motion Compensation mode was chosen for use in all our tests, even though it is computationally more intensive than MA mode.
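To illustrate how spatial and temporal interpolation can be combined, the following Python sketch implements a very simple three-field motion-adaptive de-interlacer. It is not MBVP or GST, whose internals are not described in this note. It is a generic textbook arrangement, and the function name, the field layout, and the threshold value are all assumptions made for the example.

```python
def deinterlace(cur, prev, nxt, top_field, threshold=8):
    """Rebuild a full frame from one field plus its two neighbouring fields.

    cur  : lines of the current field (list of rows of integer samples)
    prev : previous field, opposite parity, same size as cur
    nxt  : next field, opposite parity, same size as cur
    top_field : True if cur holds frame lines 0, 2, 4, ...
    """
    n = len(cur)
    frame = []
    for i in range(n):
        # Existing lines directly above and below the missing line.
        if top_field:
            above, below = cur[i], cur[min(i + 1, n - 1)]
        else:
            above, below = cur[max(i - 1, 0)], cur[i]
        missing = []
        for x in range(len(cur[i])):
            p, q = prev[i][x], nxt[i][x]
            if abs(p - q) > threshold:
                # The neighbouring fields disagree: treat as motion and
                # interpolate spatially (vertical line average).
                missing.append((above[x] + below[x]) // 2)
            else:
                # Locally still: interpolate temporally from the neighbouring fields.
                missing.append((p + q) // 2)
        if top_field:
            frame.extend([list(cur[i]), missing])
        else:
            frame.extend([missing, list(cur[i])])
    return frame


# A still scene: the missing lines are recovered from the neighbouring fields.
still = deinterlace([[10, 10], [30, 30]],
                    [[20, 20], [40, 40]],
                    [[20, 20], [40, 40]],
                    top_field=True)
print(still)
```

A motion-compensated de-interlacer replaces the fixed co-located samples `p` and `q` with samples fetched along estimated motion vectors, which is where the extra computational cost of an MC mode comes from.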

Appendix C. Reference video sequences

The material used for these tests came from the SVT (Sveriges Television AB) High Definition Multi Format Test Set.

C.1. Video content sequences

The sequences used for this test were picked from the demanding, but not unduly so, multi-genre TV-program Fairytale (by SVT), mastered in 3840x2160p/50. A list of all the sequences available can be found in Table C.1.

Table C.1: List of sequences

Name           Coding Difficulty  Duration
CrowdRun       Difficult          10 seconds
ParkJoy        Difficult          10 seconds
DucksTakeOff   Difficult          10 seconds
InToTree       Medium             10 seconds
OldTownCross   Medium             10 seconds

Figure C.1: Screenshot from sequence "CrowdRun"

Figure C.2: Screenshot from sequence "ParkJoy"

Figure C.3: Screenshot from sequence "DucksTakeOff"

Figure C.4: Screenshot from sequence "InToTree"

Figure C.5: Screenshot from sequence "OldTownCross"

The data itself comes in several widely used resolutions, all in 50 Hz motion portrayal, namely 2160p, 1080p, 1080i, 720p, and 576i. The lower resolutions were obtained by filtering the 2160p/50 master. Details on the output formats can be seen in Table C.2.

Table C.2: Output formats of the sequences

Width  Height  Scanning     FPS  Comments
               Progressive  50   Master
               Progressive
               Progressive
               Interlaced   25   Top field first
               Interlaced   25   Top field first, 16:9 'anamorphic'

At all resolutions the bit depth is full interval 16 bits per (RGB) colour plane, to preserve the very high quality of the original shots. Technical and production details are given in Table C.3.

Table C.3: Technical details for the sequences

Production details
Time of original on-location production   October (Post production during 2005).
Producer                                  Sveriges Television AB (SVT), SVT Technology/R&D. Lars Haglund,
Camera                                    ARRI ArriFlex 765 System for 65mm, 5 perf, film
Filming Speed                             50 fps ( fps for slow-motion)
Mirror shutter                            180 degrees resulting in 1/100 second exposure time
Lenses                                    Zeiss/ArriFlex prime lenses (30-700mm)
Film Stocks                               Exterior: Kodak Vision, 250D; Interior: Kodak Vision2, 500T
Transfer Scanner                          Filmlight 65mm NorthLight film scanner

Post processing
Post-processing software                  Apple Shake
Bit depth per color plane                 Full interval 16 bit (linear), no head room or foot room
Transfer and chromaticity parameters      ITU-R BT
Data format                               sgi16 (interleaved RGB), one file per frame
Filtering for down-sampling               Sinc filter (as in Apple Shake)

For interlacing to 1080i, every second 2164p-frame was shifted vertically two lines downwards. After deleting the first two and last two lines in the frames that were shifted (and the last four lines for the frames that were not shifted) to get 2160 lines again, each frame was filtered to 540 lines by line averaging (using Shake's Box Filter).
Horizontal filtering from 3840 to 1920 columns was performed using Shake's Sinc Filter, to benefit in perceived sharpness from the over-sampled master. The two 540-line fields were then woven into one single 1080-line interlaced frame. This process resembles the process in any video camera performing interlace in the basic default Field Integration Mode, i.e. like a 2160 line video camera sensor reading out the average of the sensor's line to Field 1; line to Field 2; line to Field 1 etc.

The data presented here for the reference sequences has been taken from (17). For further details on the production of the sequences please refer to it.

C.2. Video Sequences Processing

The original video sequences were captured in 2160p format with 16-bit color depth. SVT also provides instances of the sequences with lower resolution (1080p, 1080i, 720p) and lower color depth (10 bit). These versions were produced with the methodology described in (17). From the beginning of the project we tried to repeat the production of these instances of the sequences. After the tests we performed with the scaler and the de-interlacer, we decided to create our own versions and compare them to the instances provided by SVT.

The 1080p instance provided by SVT was used as reference material. Using PTS, the internal video processing library developed by Philips, the color depth was decreased to 8 bits per component. The color space was also converted from RGB to YUV422, to create an instance of the sequence that is suitable for broadcasting.

First we created the downscaled instances of the reference material. Avisynth Lanczos4Resize was used to make a 720p version of the material. The comparison between the Avisynth 720p and the SVT 720p showed that the Avisynth 720p was a lot sharper. The slight sharpening performed by the Lanczos4 filter compensated for the blurriness that the downscaling filter introduces. After getting this result we decided to use Avisynth to create the sequence instances that were used in the whole project.

Subsequently we created a 1080i instance of the material. The PTS library was used to interlace the 1080p instance considered as reference. PTS interlace with the filtering option activated was used.
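The field-integration interlacing that SVT applied to the master, as described in C.1, can be sketched in a few lines of Python. The sketch covers only the vertical part: each field line is the average of a group of master lines, and the second field is taken from the next frame with a vertical offset of two master lines. The group size of four follows from filtering 2160 lines to 540. The function names are hypothetical, the direction of the offset is an assumption, horizontal filtering is omitted, and this is not the PTS interlacer.

```python
def make_field(frame, offset, field_lines, group=4):
    """Average `group` consecutive lines per output line, starting at line `offset`."""
    field = []
    for i in range(field_lines):
        rows = frame[offset + i * group: offset + (i + 1) * group]
        field.append([sum(column) / group for column in zip(*rows)])
    return field


def interlace_pair(frame_a, frame_b, field_lines, group=4):
    """Weave a top field from frame_a and a bottom field from frame_b (top field first)."""
    top = make_field(frame_a, 0, field_lines, group)
    bottom = make_field(frame_b, group // 2, field_lines, group)
    woven = []
    for top_line, bottom_line in zip(top, bottom):
        woven.extend([top_line, bottom_line])
    return woven


# Toy example: a one-pixel-wide frame whose sample value equals its line number.
ramp = [[float(y)] for y in range(12)]
print(interlace_pair(ramp, ramp, field_lines=2))
```

On the toy ramp the woven lines sit at vertical positions 1.5, 3.5, 5.5, and 7.5, evenly spaced, which shows why the two-line offset between fields is needed.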
In order to compare the SVT 1080i instance with the PTS 1080i instance, we used the MBVP de-interlacer to convert both back to the progressive format. The results were compared visually in a split-screen, revealing that the PTS 1080i was sharper. Even though the PTS 1080i was sharper, it did not introduce any extra aliasing when de-interlaced. In addition to the subjective evaluation we performed an objective comparison in terms of Peak Signal to Noise Ratio (PSNR). The results for the uncompressed sequences after interlacing and de-interlacing can be seen in Figure C.6.
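For reference, the PSNR between a reference picture and a processed picture can be computed as in the following Python sketch. The function name is hypothetical, the pictures are assumed to be plain lists of rows of sample values, and the peak value of 255 assumes the 8-bit component depth used in this project. The note does not state which components or which tool were used for the actual measurements.

```python
import math


def psnr(reference, test, peak=255):
    """PSNR in dB between two equally sized pictures given as lists of rows."""
    squared_error = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    if squared_error == 0:
        return math.inf  # identical pictures
    mse = squared_error / count
    return 10 * math.log10(peak ** 2 / mse)


reference = [[0, 0], [0, 0]]
processed = [[0, 0], [0, 255]]
print(round(psnr(reference, processed), 2))
```

Because PSNR is logarithmic in the mean squared error, a gain of 1 dB corresponds to a mean squared error that is lower by a factor of about 1.26, and a gain of 3 dB to roughly a factor of two.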

Figure C.6: PSNR comparison between PTS and SVT interlacers

The next step was to compare the two interlaced versions in the interlaced broadcasting chain. The two instances were encoded using the MPEG 2 encoder and then decoded and de-interlaced. Subsequently, they were again compared in terms of PSNR. The results for an encoding bitrate of 18 Mbps can be found in Figure C.7.

Figure C.7: PSNR comparison between PTS and SVT interlacers with MPEG 2 encoding at 18 Mbps

Figure C.6 shows a difference of 3 to 4 dB between the uncompressed instances of the input material. After encoding, some of this gain is lost due to compression. Nevertheless, a difference of 1 dB remains even after compression (as seen in Figure C.7). Our tests showed that at lower bitrates this advantage becomes smaller, but it does not disappear. It is crucial, therefore, to pay attention to the set
