Enhanced Frame Buffer Management for HEVC Encoders and Decoders


1 Enhanced Frame Buffer Management for HEVC Encoders and Decoders BY ALBERTO MANNARI B.S., Politecnico di Torino, Turin, Italy, 2013 THESIS Submitted as partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Engineering in the Graduate College of the University of Illinois at Chicago, 2016 Chicago, Illinois Defense Committee: Dan Schonfeld, Chair and Advisor Wenjing Rao Maurizio Martina, Politecnico di Torino

2 ACKNOWLEDGMENTS I want to thank all the people that have been part of my life, because they shaped the person that I am and so they are part of my success today. First of all I want to thank my parents, who despite all the difficulties have always been strong and supported me. I needed many years to realize they gave me the best gift possible, made of love, principles, and values. They created the basis to let me grow and think independently, stimulating my curiosity and broadening my knowledge through reading and traveling. They supported my decision to travel abroad in every way, even though I know it has been difficult to watch me take that flight to the USA. I want to thank my sister, because she has always been present in my life. She is also one of my best friends and a person I know I will always be able to rely on. We have always been a reference for each other, silently trying to do better while admiring the success of the other. Despite the thousands of kilometers that can separate us, we are among the closest siblings that I know. Thank you because you always push me to do more and better. A special thanks to those friends who have always been close to me since our childhood, in particular Chiara, Matteo, and Matteo. You are all part of my family and I can remember at least one of you at all the most important events of my life. To be there when it matters is the real value of friendship. I also want to thank those old friends that I met along the way, such as Davide, Andrea, Michela, and all the other amici del mare.

3 ACKNOWLEDGMENTS (continued) Torino gave me a new friend that I already know will last, Andrea. Thank you, and thanks to all my other friends in Torino such as Giuditta, Laura and the other Sconosciuti. Thanks to the people that I met through this experience abroad, both before and during it. Eric, Ruggero and Paolo had a fundamental part in making this experience unforgettable, together with all my international and American friends. You are the newest friends that I have made, but no matter how long we have known each other or where we will be, these connections will not be lost. Last but not least I want to thank my advisors, for giving me the chance to work on an interesting topic, stimulating my curiosity and guiding me to this point. AM

4 TABLE OF CONTENTS

1 INTRODUCTION
    Background
    Overview on Data Compression
    Peculiarities of Video Compression
    Motivation
    The Challenge on Hardware
    Decoded Picture Buffer
    Contributions
    Organization
2 VIDEO COMPRESSION
    Overview
    Encoder
    Decoder
    HEVC Standard
    Levels, Profiles and Tiers
    Data Structure
    HEVC Algorithms
    Motion Estimation
    Residuals Coding
3 VIDEO HARDWARE
    HEVC Encoders
    Architecture
    Examples
    HEVC Decoders
    Architecture
    Example
4 COMPRESSED BUFFER DATA MANAGEMENT
    Motivation
    Compressed Buffer Data
    Early Approach
    The New Approach
    General Organization of the Algorithm
    ADPCM
    Truncated Bit Packaging
    Adaptive Compression Ratio
    Storage Analysis
    4.4 Computational Analysis
    Encoding
    Decoding
5 COMPUTER EXPERIMENTS
    Overview
    Matlab Simulations
    Results
    Discussion
6 CONCLUSIONS AND FUTURE WORKS
    Conclusions
    Future Works
APPENDIX
CITED LITERATURE
VITA

6 LIST OF TABLES

TABLE I: HEVC LEVELS DEFINITION FOR MAIN PROFILE
TABLE II: BASKETBALL DRIVE 3 BITS PER RESIDUAL CR
TABLE III: BASKETBALL DRIVE 2 BITS PER RESIDUAL CR
TABLE IV: KIMONO 2 BITS PER RESIDUAL CR
TABLE V: KIMONO 2 BITS PER RESIDUAL CR
TABLE VI: TRAFFIC 2 BITS PER RESIDUAL CR
TABLE VII: COMPARISON OF DIFFERENT BLOCK SIZES

7 LIST OF FIGURES

Figure 1: Typical video encoder scheme, adapted from [1, p. 69]
Figure 2: Typical video decoder scheme, adapted from [1, p. 69]
Figure 3: Example of group of pictures (GOP) IBBPBBPBB
Figure 4: Serpentine paths for DPCM
Figure 5: Hardware optimized paths for DPCM
Figure 6: Classical buffer memory organization
Figure 7: Proposed buffer memory organization

8 LIST OF ABBREVIATIONS

CPB      Coded Picture Buffer
CR       Compression Ratio
CTB      Coding Tree Block
CTU      Coding Tree Unit
DBF      DeBlocking Filter
(I)DCT   (Inverse) Discrete Cosine Transform
DPB      Decoded Picture Buffer
(S)DRAM  (Synchronous) Dynamic Random Access Memory
DPCM     Differential Pulse Code Modulation
DST      Discrete Sine Transform
FPGA     Field Programmable Gate Array
fps      Frames per Second
GOP      Group Of Pictures
HDR      High Dynamic Range
HEVC     High Efficiency Video Coding
IC       Integrated Circuit
LSB      Least Significant Bit

9 LIST OF ABBREVIATIONS (continued)

ME       Motion Estimation
MSB      Most Significant Bit
NAL      Network Abstraction Layer
NSQT     Non-Square Transform
PSNR     Peak Signal-to-Noise Ratio
PU       Prediction Unit
RExt     Range Extensions
RGB      Red, Green, Blue (color space)
SAO      Sample Adaptive Offset (filter)
SoC      System on Chip
SRAM     Static Random Access Memory
TB       Transform Block
TBP      Truncated Bit Packaging
UIC      University of Illinois at Chicago
VLC      Variable-Length Code
YCbCr    Luminance, Chroma: Blue, Chroma: Red (color space)

10 SUMMARY Huge quantities of data are continuously transmitted across the world to deliver news, data or simply something to watch on television. The amount of information transmitted every day is growing exponentially, and for multimedia content the increase in quality has to be taken into consideration too. Cameras are becoming more and more popular and now almost every phone embeds one, so that everybody can easily take dozens of pictures or videos every day, ready to be shared through messaging services or social networks. These cameras are able to record videos up to FullHD resolution or even higher, creating hours and hours of sequences that can quickly reach a size of many gigabytes. The constant growth of quality and size creates new challenges, such as finding efficient ways to store and transmit them. Image and video compression are techniques that are part of our everyday life, even if most of us do not even notice it, allowing delivery and retrieval of incredibly large amounts of data even from smartphones on a mobile connection. In order to achieve such a seamless integration, hardware and software have been continuously improved by bringing new encoding standards and dedicated hardware solutions. This thesis is going to focus on the latest video coding standard, High Efficiency Video Coding (HEVC, also known as H.265), which is taking the place of the previous standard H.264. This new standard allows even higher resolutions and frame rates, up to 8K at 120 fps, and this of course requires great computational resources if it needs to be done in real time. Devices with limited power and computational resources such as smartphones or televisions struggle to

11 SUMMARY (continued) seamlessly play or capture videos coded compliantly with that standard. New solutions need to be implemented to help hardware designers. The Decoded Picture Buffer (DPB) is a memory location where encoders and decoders need to store uncompressed frames during the execution of their operations. With the highest HEVC level the DPB can be larger than 200 MB, and if that memory needs to be embedded in a hardware encoder or decoder it will be very expensive and a big power drain. Since that memory also needs to be accessed thousands of times for every new compressed frame, the power consumption due to the read and write operations will be very high. The aim of this research is to find a way to reduce the memory requirement and to reduce its power consumption by limiting the number of read and write operations. Due to the high throughput imposed by HD and UltraHD videos it is also important to keep the solution computationally simple and hardware-friendly, so that its overhead will not affect the system performance. The proposed solution implements a customized version of adaptive differential pulse code modulation (ADPCM), adapted to exploit hardware parallelization. The truncated bit packaging (TBP) encoding system has been chosen to enable parallelism and guarantee a fixed compression ratio. TBP has been modified to solve some of its limitations, such as its reduced flexibility compared to other solutions. A new organization of the memory also allows an adaptive compression ratio (CR) to be used. This new feature is important to overcome some limitations of a completely fixed CR, without losing direct access to the desired block.

12 SUMMARY (continued) Results demonstrate how the proposed solution outperforms the reference algorithm by offering better features and higher quality. The hardware required for implementation is one of the simplest in the literature, but the performance of the system will not be affected, thanks to a structure tailored to the most efficient exploitation of the implemented resources.

13 CHAPTER 1 INTRODUCTION A few years ago the only way to have access to the internet was using a computer, while today we can access it using our phone, television or car, and even appliances are being developed to be internet-enabled too. The type of data being transferred changed too, from simple text to complex interactive pages with support for many different kinds of multimedia content. Among those, videos have experienced the biggest changes during the past years, with increasing resolutions and cheaper and more accessible ways to record them. Everyone can now take videos at any moment thanks to smartphones, and then easily share them through different social networks. The fundamental factors for keeping videos as accessible as they are today are that videos need to be easy to store and easy to transfer or stream, and video compression is the key for maintaining these two characteristics. Reducing the size of videos, with an acceptable quality loss, is the only way in which we are able to enjoy FullHD shows broadcast over a normal DSL or cable connection, as well as to store several hours of recordings on a simple flash drive.

14 1.1 Background Overview on Data Compression Data compression is divided into two major parts: encoding and decoding. Encoding means applying one or more transformations to data in order to reduce its size. Information theory [2] tells us that entropy is a measure of the information that is stored inside a certain number, symbol or pixel, and the smaller the entropy the higher the compression ratio we can achieve. The compression ratio is defined as

Compression Ratio = Uncompressed Data Size / Compressed Data Size

and the higher it is the better the algorithm performs. The goal of compression is to maximize the entropy of the data, in our case reducing the number of bits to the minimum required for transferring the information. For the purposes of image and video compression, the encoding process is usually divided into two phases. Energy reduction is when pixels are manipulated in order to reduce the information stored in them. This phase outputs new values, and their entropy should be smaller compared to the original data. Although it is common to find this phase in most compression algorithms, it is not a prerequisite for the following phase.
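To make these two quantities concrete, the short Python sketch below computes the compression ratio just defined and the Shannon entropy of a block of 8-bit samples; the entropy gives a lower bound, in bits per symbol, on what lossless entropy coding can achieve. The function names and the sample data are illustrative only and are not part of the thesis code.

    import math
    from collections import Counter

    def compression_ratio(uncompressed_bytes: int, compressed_bytes: int) -> float:
        # CR = uncompressed size / compressed size; higher is better.
        return uncompressed_bytes / compressed_bytes

    def shannon_entropy(samples) -> float:
        # Entropy in bits per symbol: H = -sum p(x) * log2 p(x).
        counts = Counter(samples)
        total = len(samples)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    # A smooth block of 8-bit pixels carries little information per sample...
    smooth_block = [128, 128, 129, 129, 130, 130, 130, 131]
    # ...while a noisy block carries close to the maximum of 8 bits per sample.
    noisy_block = [17, 201, 94, 250, 3, 188, 77, 132]

    print(shannon_entropy(smooth_block))   # low entropy: highly compressible
    print(shannon_entropy(noisy_block))    # high entropy: hard to compress
    print(compression_ratio(1920 * 1080 * 3, 350_000))  # e.g. an uncompressed FullHD RGB frame vs. a hypothetical 350 kB encoding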

15 Entropy coding is when the values containing the original numbers (or residuals, if we computed them) are substituted by specific symbols of different bit lengths, trying to maximize the entropy of the message. More common numbers carry less information and so they can be encoded using shorter symbols, while rare values carry more information and will therefore be encoded using longer symbols. This step is fundamental to abandoning fixed-size data types, thus obtaining an overall shorter version of the original image or video that still contains the same information. Decoding means reversing the operations done during encoding in order to reconstruct the original data. If the compression algorithm was lossless, the decoded data will be identical to the original, while a lossy algorithm will provide a modified version of the original data. In video broadcasting it is common to use compression algorithms that require fewer computational resources for decoding the data than for encoding it, since in most cases the video is encoded at one centralized source that can easily be made powerful enough to handle it, while the video needs to be decoded on all the receivers, where computational resources may be limited. Peculiarities of Video Compression Once data is compressed it will be much easier to store or broadcast it, and the space saving can be dramatic depending on the chosen algorithm as well as on the type of data. The high correlation among pixels in video sequences allows them to be efficiently compressed and

16 because of this the research in the field has room to improve and find ever better ways to achieve higher compression ratios. Pixels are not completely independent from each other: they tend to be similar to neighboring pixels, and this reduces the entropy of the information they carry. There are two types of correlation in videos: spatial correlation and temporal correlation. The former is due to the fact that images usually have smooth transitions moving from one pixel to any of the adjacent pixels, especially if they are real life images. The latter is due to the fact that frames in a video sequence are not unrelated images but subsequent pictures of the same scene. Apart from particular cases, two temporally consecutive frames share most of their information, often containing a slightly modified copy of the other frame shifted in some direction. If the content of a pixel is easily predictable, that means it is carrying a small amount of information and therefore it only needs a few bits to be encoded. An algorithm that exploits those two kinds of correlation can easily reach very high compression ratios, and indeed many people and companies have focused on the search for ever more efficient algorithms. Several approaches and algorithms have been used and some of them became standards, commonly used by the majority of us on a daily basis. The continuous increase in the number of videos recorded every day and their growing resolution push the development of new standards that are able to further reduce the bitrate while still maintaining a comparable quality. To achieve

17 this goal new standards implement more complex algorithms that can obtain better results, but their implementation has a higher computational cost. If the algorithms are becoming more complex, new and more powerful hardware architectures and solutions are needed to execute them. This can be challenging if there are strict design constraints, as explained in greater detail in the next section. 1.2 Motivation Standards are built on compression algorithms, and the more efficient they become the more complex they are. Smarter and more thorough search algorithms can find pixels with higher correlation, while more complex encoding systems can further reduce the number of bits required to store a certain element. The number and the complexity of operations are rising very quickly, triggering the development of new and more powerful hardware that will be able to execute those procedures compliantly with the constraints set by the specific application. The Challenge on Hardware Depending on the case, constraints can be very different, but they can all put huge limitations on the hardware design. Here are some of the most common constraints. Power is not an unlimited resource, and if the device is battery powered it must meet certain criteria in order to keep a desired battery life. This could impose limitations on the complexity of the architecture as well as on the number and type of components, therefore limiting the performance.

18 Area can be limited, especially if the design under discussion needs to be embedded in another integrated circuit or if the device itself has size constraints. Area is very expensive in integrated circuits, so its limitation can be fundamental in cost driven designs. For power driven designs, area is a key factor again because power consumption is closely related to it. Cost is almost always an aspect that needs to be minimized. As mentioned before, cost limitations can impose boundaries on the area but also on other aspects such as the quality of components and their number. These constraints, together with the previous ones, often limit performance. Performance can be an important requirement for a product design. It is usually set by the specific application and specified as a minimum number of frames that need to be encoded or decoded per second. Performance constraints are usually very strict when custom hardware is required for a design, because it means that a general purpose processor executing the software encoder or decoder is not fast enough. A specific performance requirement for a system is real-time operation, signifying that its speed must be greater than or equal to the rate of the source of the data that is decoded or encoded, sometimes including limitations on latency too. The list is not exhaustive and the different possible constraints are clearly not independent from each other, meaning that one can imply many others and that they can also be present in any combination. It is obvious how performance requirements push against all the other constraints, asking designers to find the best balance and trade-off between all these aspects as well as to find new solutions and architectures with better overall characteristics.

19 Every new standard allows for higher resolutions and frame rates compared to previous ones, setting high performance requirements for new architecture designs. The complexity of the algorithms implemented by new standards is continuously increasing too, therefore overstressing certain stages of the encoding or decoding chain. Among all of them, this thesis project is focused on the decoded picture buffer (DPB). Decoded Picture Buffer As explained in more detail in Section 2.3, the DPB is a memory that needs to store an uncompressed version of the frames found in the reference list of the currently encoded or decoded frame. This list contains the frames that can be used as a reference for motion compensation for a particular frame, and so all the frames listed in there need to be available in uncompressed form and easily accessible. For these reasons, every new frame that has been decoded is stored in the DPB if the following frames will use it as a reference, and since the frames are stored uncompressed this can require an enormous memory space. HEVC allows up to 16 frames in the DPB, and if the resolution is sufficiently high the buffer may need to be capable of storing tens or even hundreds of megabytes. Memories that can be embedded in integrated circuits, such as SRAMs, can be expensive and require a large area if their size grows excessively, while the use of external DRAMs can cause lower performance and higher power consumption. The goal of this thesis is to find a way to reduce the amount of memory required by the DPB, so that it will be possible to alleviate the pressure on that component due to the design constraints that certain applications impose.
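To give a feeling for the numbers involved, the following sketch estimates the raw DPB footprint for a few common configurations, assuming 4:2:0 sampling at 8 bits per sample (1.5 bytes per pixel on average) and ignoring any padding or metadata a real buffer would add; the resolutions and frame counts are examples, not values prescribed by the standard.

    def dpb_bytes(width: int, height: int, frames: int, bits_per_sample: int = 8) -> int:
        # 4:2:0 sampling: one luma sample per pixel plus two chroma samples shared by
        # every 2x2 group of pixels, i.e. 1.5 samples per pixel on average.
        samples_per_pixel = 1.5
        bytes_per_sample = (bits_per_sample + 7) // 8
        return int(width * height * samples_per_pixel * bytes_per_sample * frames)

    for name, w, h, frames in [
        ("FullHD, 16 frames", 1920, 1080, 16),
        ("4K UHD, 6 frames", 3840, 2160, 6),
        ("8K, 6 frames", 8192, 4320, 6),
    ]:
        print(f"{name}: {dpb_bytes(w, h, frames) / 2**20:.1f} MiB")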

20 Applying compression to the DPB can reduce the amount of data that we need to store, read, and write at each operation, and this can bring direct and indirect advantages in many different respects. The required memory can be smaller. Depending on the chosen compression algorithm, it is possible to have a fixed or a variable compression ratio, which also means a fixed or variable memory requirement. If the compression ratio is fixed, a smaller memory can be installed, therefore saving area and cost as well as simplifying the design process. With a variable compression ratio there may be no upper bound on the memory requirement, forcing the designer to install a memory as big as it would be without compression. The number of read and write operations can be reduced. If the data size is smaller, fewer read and write operations are required to store or retrieve the same number of pixels, increasing performance and reducing bus congestion if the memory is connected to a shared bus. The indirect advantage that can be achieved as a consequence of the two previous improvements is the reduction of power consumption. If the installed memory is smaller it will consume less power, but even with a memory as large as the original one there can be some energy saving by simply turning off the unused banks, depending on the achieved compression ratio at a particular time. Reducing the number of memory accesses is also a good way to decrease the power consumption, because memory

21 operations require more energy than normal computational cycles, especially if the memory is off-chip. Considering all these improvements together, it is clear that compressing the DPB can alleviate some of the problems that hardware designers need to face when studying new architectures for HEVC encoders and decoders, especially those targeted at high resolution or low power systems. 1.3 Contributions Many previous works have already focused on DPB recompression, and many solutions can be found in the literature. The issues that need to be solved are the memory size requirement and the number of accesses to memory, while keeping the computational effort as low as possible. Most of the previous solutions address one or more of these aspects, but the focus has hardly ever been on all of them at the same time. The aim of this thesis project is to propose a recompression algorithm that can be a valid solution for all the mentioned problems without ignoring any aspect. The solution will also be developed keeping in mind the fact that the algorithm will be implemented in hardware, so its structure will be hardware-friendly and tailored to best exploit the resources. 1.4 Organization After this introduction on data compression in general and how it is applied to videos, this thesis is going to give more detail on video compression in Chapter 2, focusing on: 1. an overview of video compression and its evolution 2. the HEVC standard

22 3. the HEVC implementation through its algorithms. The following Chapter 3 will instead discuss the hardware architectures that implement the algorithms mentioned above, marking the distinction and the differences between: 1. encoders and 2. decoders, with a more detailed analysis of the motion estimation (ME) module. This thesis is focused on relieving the stress introduced on the DPB by the HEVC standard with a new DPB compression algorithm, and the topic will be covered in Chapter 4, discussing: 1. the motivations that led to the choice of the suggested solution 2. the proposed buffer compression algorithm 3. its computational cost and possible hardware implementations 4. its storage requirement. In Chapter 5 the idea is supported by: 1. validating the approach through Matlab simulations 2. estimating its performance in comparison with another reference solution. In Appendix A the code is available for the reader to gain a better understanding of the implementation. The thesis will conclude in Chapter 6 with a summary of the activity and final thoughts, as well as ideas for future developments.

23 CHAPTER 2 VIDEO COMPRESSION The video compression field has seen huge improvements during the last decades, driven by a fast growing entertainment industry. More and more powerful hardware is readily available, enabling any type of device to easily encode and decode videos using very complex yet efficient algorithms. Thanks to more powerful means, researchers have more room for developing smarter and better strategies to achieve higher compression ratios while still maintaining the same perceived quality. This chapter is going to start with a general overview of video compression, and then the focus will move to HEVC in particular, giving details on its structure and the algorithms chosen for the different stages. 2.1 Overview In video compression the information is reduced in size in order to be more easily stored or transmitted. That requires everyone creating or accessing the video to agree on how the information is compressed, because without knowing how the video has been compressed it will be difficult to recover the original information. This is the purpose of video compression standards such as MPEG-2, MPEG-4, H.264 and the new HEVC. The high compression ratios achieved on videos are based on the fact that the human eye is not able to see all the details and little variations that images have, and so it is possible to discard

24 some of the information. The two most common techniques used to isolate and, if necessary, discard the less important information are placed at two different stages of the compression chain. The first of the two stages is the color mapping and sampling phase, while the second stage is the quantization step; both of them will be discussed in the following sections. The first step before starting any type of compression is to convert the color space from RGB to YCbCr (also called YUV). This conversion does not bring any size reduction, but reorganizes the information in order to allow us to discard some of the detail with only little impact on the quality. RGB stands for Red, Green and Blue, and if an image is coded using that color space it usually means that the information about each one of those colors is stored using the same amount of bits for every pixel in the picture. YCbCr is instead divided into one luminance (Y) and two chrominance (U, V) values, which respectively store the information about the light and about two color differences, blue and red. While with RGB the data about the light can only be obtained by combining all 3 colors, with YUV the light value is stored explicitly and the missing color component can be recovered from the luminance and the 2 chrominance values. Although using the YUV color space does not bring any direct space saving, it allows us to apply sub-sampling without major quality losses due to how the human eye works [3]. We are more sensitive to light than to color, so it is more important to preserve the information about the luminance rather than the chrominance. While with RGB it would be impossible to selectively work on color or light, the YUV color space has the light explicitly saved as a separate coefficient. Sub-sampling can therefore be applied to the chrominance information only,

25 preserving the luminance, and several color sampling modes become available, commonly 4:4:4, 4:2:2 and 4:2:0. When all 3 components are present for every pixel the sampling is called 4:4:4, while with the other schemes the image is sub-sampled. The 4:2:2 scheme has alternating rows (or columns) of pixels with all 3 components and rows (or columns) with only luminance samples, resulting in one luma value per pixel and a pair of chroma values every 2 pixels. In the 4:2:0 scheme the pair of chroma samples is shared by 4 pixels instead of 2, obtaining an image that needs half the bits compared to the original 4:4:4 one. Despite the great improvement in space requirement, the perceived quality of the image is not degraded too much, due to our insensitivity to small differences in color changes. In the following subsections a detailed analysis of the typical stages included in a modern video encoder and decoder is presented, giving the general structure and the type of operations that are involved. Encoder To better explain the different stages of a typical video encoder, Figure 1 shows a scheme that summarizes the whole process. The main data that needs to be available for proper functionality of the algorithm is the currently encoded frame and the reference frames; the latter are stored in the decoded picture buffer (DPB). If the image is encoded with Inter prediction, it will use information from previously encoded frames, so they need to remain available for the encoder. Together with reference frames, configuration parameters and encoding tables need to be available too along

26 Figure 1: Typical video encoder scheme, adapted from [1, p. 69] the process, usually set by the user at the start of the operation or dynamically adjusted during the encoding process. All the major parts of the encoder will now be described. Motion estimation The current frame is divided into blocks, and they are compared with previously encoded frames that are present in the DPB as a reference. Each block is compared against other blocks that are temporally and spatially shifted, searching for the best match. There can be different search algorithms as well as different cost functions to evaluate which block should be considered the best approximation, but the main goal is to minimize the residuals that are computed by subtracting the reference block from the current one. With the information about the temporal and spatial location of the block that best approximates the current one, the encoder can compute a motion vector that will be included in the bit stream to inform the decoder on how to reconstruct the block.
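The sketch below illustrates the core of the search described above: a full search that compares one block of the current frame against every candidate position inside a small window of a reference frame, using the sum of absolute differences (SAD) as the cost function. Real encoders use smarter search patterns and more elaborate cost functions, so this is only a minimal illustration of the principle.

    import numpy as np

    def full_search(cur_block: np.ndarray, ref_frame: np.ndarray,
                    top: int, left: int, radius: int = 8):
        """Return (best motion vector, best SAD) for cur_block located at (top, left)."""
        bh, bw = cur_block.shape
        best_mv, best_sad = (0, 0), float("inf")
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = top + dy, left + dx
                # Skip candidates that fall outside the reference frame.
                if y < 0 or x < 0 or y + bh > ref_frame.shape[0] or x + bw > ref_frame.shape[1]:
                    continue
                candidate = ref_frame[y:y + bh, x:x + bw]
                sad = np.abs(cur_block.astype(int) - candidate.astype(int)).sum()
                if sad < best_sad:
                    best_mv, best_sad = (dy, dx), sad
        return best_mv, best_sad

    # Toy example: the reference is the current frame shifted by (2, 3) pixels.
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    cur = np.roll(ref, shift=(-2, -3), axis=(0, 1))
    print(full_search(cur[16:24, 16:24], ref, 16, 16))  # expected motion vector (2, 3) with SAD 0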

27 Motion compensation As explained above, the reference block is subtracted from the current block and the newly computed values are called residuals. If the reference block is a good approximation of the current block, then the residuals will be very small values, concentrated around zero. Now that the information carried by these values is greatly reduced, it is possible to pass the residuals to the following stages of the encoder. These operations are lossless, so they do not introduce any quality reduction. The following stages of the encoding process will instead introduce some losses, and for this reason there is a reconstruction branch at the bottom of Figure 1 that behaves exactly like a decoder in order to reduce the error propagation, as will be explained later. The information about the reference block found by the motion estimation module is passed to the reconstruction branch too. Transform Coding With the previous stages the energy of the pixels in the block has already been reduced, passing from their original values to residuals. The following two stages are transform coding and quantization, and working together they will further decrease the information. Wavelet and Discrete Cosine Transform (DCT) are the two most common transform coding techniques, the latter being the one implemented in the recent H.264 and HEVC standards. Their purpose is to change the domain of the data, decorrelating it and concentrating most of the information in fewer values.

28 DCT is a type of transformation that can be done through matrix multiplication and can be inverted (IDCT) to reconstruct the original data. The process is lossless, and the residuals are converted into coefficients representing different frequencies and, in a certain way, different levels of detail. The first coefficient represents frequency zero, which is the average value of the residuals, while the following coefficients represent higher frequencies that add finer detail. The higher the frequency the finer the detail added, but high frequencies also give the smallest contribution to the overall image fidelity. This automatically translates into a rule to define which coefficients are more relevant: the first, larger coefficients are fundamental, while the last coefficients are usually small and negligible. DCT therefore reorganizes how the information is distributed and allows the quantization module to obtain many zero and few nonzero values. Quantization The quantization module simply takes all the coefficients from the DCT module and divides them by a factor Q. Q is usually a power of 2, so that the division can be easily implemented as a shift of bits, but even if the operation is very simple it has major effects on the quality and efficiency of the encoding process. The purpose of this operation is to trade some detail for space saving, discarding the information stored in the least significant bit(s). The operation is lossy because that information cannot be recovered anymore, but it has the great advantage of reducing many small coefficients to zero. Long sequences of zero values can be easily compressed by run-level encoding, and so high space savings can be achieved.
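A minimal sketch of the transform-and-quantize step on an 8×8 block of residuals is shown below, using an orthonormal DCT-II matrix and a power-of-two quantization factor Q. The integer transforms and quantizer of the actual standards differ in the details; the point here is only that most quantized coefficients become zero while the reconstruction error stays bounded.

    import numpy as np

    def dct_matrix(n: int) -> np.ndarray:
        # Orthonormal DCT-II basis: C[k, i] = a(k) * cos(pi * (2i + 1) * k / (2n)).
        k = np.arange(n).reshape(-1, 1)
        i = np.arange(n).reshape(1, -1)
        c = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2 / n)
        c[0, :] = np.sqrt(1 / n)
        return c

    N = 8
    C = dct_matrix(N)
    # A smooth block of small residuals concentrated around zero.
    residuals = np.round(4 * np.cos(np.arange(N) / 3)).astype(int) * np.ones((N, N), dtype=int)

    coeffs = C @ residuals @ C.T                   # 2-D DCT: energy collapses into few coefficients
    Q = 8                                          # power-of-two quantization factor
    quantized = np.round(coeffs / Q).astype(int)   # division by a power of two maps to a shift in hardware
    rescaled = quantized * Q                       # decoder-side rescale (left shift)
    reconstructed = C.T @ rescaled @ C             # inverse DCT

    print(np.count_nonzero(quantized), "nonzero coefficients out of", N * N)
    print("max reconstruction error:", np.abs(reconstructed - residuals).max())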

29 Quantization is very easily reversible by multiplying the truncated value by Q, an operation that can be done as a simple left shift. The information about the least significant bit(s) has been discarded and lost, and it is now replaced with bits that may differ from the original ones. The difference between the original value and the reconstructed one is the error introduced by quantization and cannot be recovered. If color sub-sampling is not counted as part of the encoding process, this is the only lossy operation in the video encoding process, and so the value of Q is what sets the quality loss and is one of the ways to regulate the compression ratio. Reordering and run-level encoding Now that the coefficients have been quantized there are many more zero values and few nonzero values. Since the coefficients are still in a square 2-dimensional matrix form, they need to be reordered into a 1-dimensional vector in order to be encoded into a bitstream. The way to do it is strongly dependent on the following stage of the process, called run-level encoding, as the encoder tries to achieve the best compression. Run-level is an encoding technique that exploits long runs of zeros by representing them compactly. Every nonzero value is coded using two values, where the first represents the number of zeros that preceded the nonzero value while the second is the nonzero value itself. In this way long sequences of zeros are very efficiently encoded, but this encoding method can also increase the overall data size if there are few zeros in the sequence or if they are not properly grouped together. To maximize the efficiency of the run-level encoder, the reordering module needs to scan the 2-D matrix in a way that creates a 1-D array with all the nonzero values concentrated at the

30 beginning and a long tail of zeros. Looking at how the DCT output is formatted, it is possible to notice that the low-frequency coefficients are grouped in the upper left corner while the high-frequency ones are grouped around the lower right corner. Different scan modes can be used, but they usually start from (0, 0), the DC coefficient, and proceed in a serpentine fashion, finally reaching the opposite corner. After scanning and run-level encoding the output of the quantization module, the nonzero values are compactly represented and passed to the entropy encoder. Entropy encoding The aim of this last stage is to take all the information from the previous stages, including coefficients, parameters and modes, and to maximize the entropy of that information, reducing it to the minimum number of bits necessary for storing or transmitting it. There are different techniques, but most of them rely on variable-length coding, where fixed-size symbols, such as the 8-bit values that we have worked with until now, are matched with codes of different lengths. How symbols and codes are matched depends on the probability of every symbol, as more common symbols carry less information than rare ones. Symbols that occur often are then encoded with a short code, while a rare symbol will need more bits. Using this procedure the total number of bits will be reduced, balancing the entropy and so the amount of information contained in the bitstream. The most efficient techniques are Huffman and arithmetic encoding, with the former being the most commonly chosen because of its good trade-off between efficiency and speed, while the

31 latter is able to get closer to the theoretical minimum bit utilization at the expense of a higher computational cost. As an example, Huffman coding requires defining a probability for every input symbol, and a code is then attributed to each of them following a tree structure, in order to assign longer codes to rare symbols and short codes to the most common values. The tree structure used to assign the codes also ensures the uniqueness of the decoding process, so that there cannot be any misinterpretation of a code even if its length is not known while reading the bitstream. Since the coding tables used to translate symbols to codes and vice versa need to be computed by analyzing the distribution of values, it is not efficient to do it every time that an image is encoded. To simplify the operations they are predetermined using sample sequences and then embedded in encoders and decoders, becoming part of the parameters used during the coding process. The bitstream is now ready to be stored or transmitted, and its size can be hundreds of times smaller than the original sequence of uncompressed frames. Other operations can have been applied during the process depending on the complexity of the encoder or on the chosen standard, enhancing the final quality of the image that will be decoded from that bitstream. One example is a de-blocking filter, in charge of smoothing the junctions between different blocks so that it reduces the artifacts introduced by the lossy encoding process. Decoder The decoder structure is depicted in Figure 2 and it is simpler and faster than the encoder one.
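Returning for a moment to the entropy coding stage, the small sketch below shows the Huffman construction just described: symbols with given probabilities are merged bottom-up into a tree, so that frequent symbols receive short codes and rare ones long codes. The toy probabilities are illustrative and the resulting codes are not the tables used by any video standard.

    import heapq, itertools

    def huffman_codes(probabilities: dict) -> dict:
        """Build a prefix-free code: frequent symbols get short codes, rare ones long codes."""
        counter = itertools.count()   # unique tie-breaker so equal probabilities never compare dicts
        heap = [(p, next(counter), {sym: ""}) for sym, p in probabilities.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p0, _, codes0 = heapq.heappop(heap)   # the two least probable subtrees...
            p1, _, codes1 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in codes0.items()}
            merged.update({s: "1" + c for s, c in codes1.items()})
            heapq.heappush(heap, (p0 + p1, next(counter), merged))  # ...are merged into one node
        return heap[0][2]

    # Toy distribution in which zero-like symbols dominate, as for quantized coefficients.
    probs = {"0": 0.60, "1": 0.20, "2": 0.10, "3": 0.06, "4": 0.04}
    codes = huffman_codes(probs)
    print(codes)  # "0" gets a 1-bit code, "4" a 4-bit code (exact bits depend on tie-breaking)
    print("average code length:", sum(probs[s] * len(c) for s, c in codes.items()), "bits per symbol")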

32 Figure 2: Typical video decoder scheme, adapted from [1, p. 69] The purpose of the decoder is to use the information contained in the bitstream to reconstruct a copy of the video that is as close as possible to the original one, ready to be displayed, modified or re-encoded. Encoder and decoder need to use the same standard to be fully compatible, so that the order of the encoded information as well as the tools and techniques used during the coding process are supported by both. Decoding Process The decoding operation follows the same steps as the encoding one, inverting the function of every module and reversing the order of the modules in the procedure flow. The bitstream is therefore first analyzed by the entropy decoder, which unpacks it, converting the codes back to the original symbols using the predefined parameters. The run-level decoder and reordering module expands the recently decoded symbols to recreate the 2-dimensional square matrix that represents the block to be decoded, recreating the zero values and placing the coefficients in the proper order.

33 The rescale module is the one in charge of reversing the effect of the quantization parameter Q by multiplying the elements of the matrix. Since most of the time Q is a power of 2, the rescale operation is performed by a simple left shift. In this phase the decoder is introducing an error into the process, because the information about the bits that the encoder quantization module discarded is not available anymore and cannot be recovered. After these steps the block matrix is composed of the DCT coefficients that need to be converted back into residuals. The DCT transformation is reversed with the Inverse Discrete Cosine Transform (IDCT), applying the transposed version of the matrices. The IDCT converts the matrix elements back from the frequency domain into residuals that can be used by the motion compensation module to reconstruct the block of pixels. The last stage consists of using the motion vector to load the reference block from one of the frames stored in the DPB and adding that information to the residuals in order to rebuild the block. The information that is now stored in the matrix corresponds to the Y, U, or V components of the pixels of the block that has just been decoded, and it is ready to be stored in the DPB. From there it can be displayed and, if required by the following frames, used as a reference. Differences and Common Parts Complexity is the first major difference between encoder and decoder, and it can be easily seen by simply comparing Figure 1 and Figure 2. The number of blocks and the structure of the encoder show that more operations are involved in the process, but the complexity of some common blocks can also be different.

34 The motion estimation block is one of the most computationally expensive modules inside the encoder, because it needs to search for the best approximation of the current block. If the search has a wide radius and a thorough algorithm, it will require many accesses to memory and thus long execution times. The result of all these computations is a motion vector, which is used for motion compensation and is then included in the bitstream. The decoder will simply have to read the motion vector, then read the reference block pointed to by it, and finally use it to reconstruct the current block. The difference is similar in other modules too, even if smaller. When different operating modes are available, the encoder needs to do extra calculations to be able to choose which one to use, increasing the complexity of the operations and their execution time. Again, the decoder will simply read the chosen mode from the bitstream and decode the data accordingly. Comparing Figure 1 and Figure 2 again, it is possible to recognize part of the decoder scheme inside the encoder, in a branch known as reconstruction. Decoder capabilities are copied inside the encoder in order to minimize error propagation. If the motion estimation module used frames from the original video as a reference, it would generate additional quality loss during the process. While decoding, the decoder will not have the original video available and its only source of references is the DPB, which contains the previously decoded frames. The encoder must use the same information as reference, so after encoding, every block is included in the bitstream but also immediately decoded in order to populate the DPB. It will be available as a reference for the following blocks, closing this feedback loop. In the case of lossless compression this

35 solution would not be necessary, but since quantization introduces modifications to the video sequence, the encoder needs to implement the decoding feedback explained above. 2.2 HEVC Standard HEVC stands for High Efficiency Video Coding, and it aims to achieve even higher compression ratios without major changes in the encoding or decoding time compared to the previous standard, H.264. Indeed, [4] shows how performance has changed from the previous standard to this one, with great efficiency improvements and little overhead. Using the highest profile and all the available tools, HEVC can provide an increase of 36% in compression compared to H.264 while maintaining an equal peak signal-to-noise ratio (PSNR) [5], and if only subjective quality is taken into consideration then the size reduction can be greater than 50% [6]. These improvements are brought by more complex and thorough algorithms, used in an efficient manner such that the time needed for encoding is about 10% greater than with H.264, while for decoding the execution time increases by 60% [7]. The standard mainly defines the layout of the bitstream and the structure and tools that decoders need to have in order to ensure that different decoders will always produce the same output given the same encoded video. On the other hand, the lack of rules about encoders gives flexibility to developers and designers, so that the best trade-off between performance and efficiency can be chosen depending on the specific application. Below is an analysis of the structure of a HEVC encoder and its available tools, starting from its organization and setup and moving to the implemented algorithms in Section 2.3.

36 Levels, Profiles and Tiers Levels and profiles are two different and cooperative ways to define and limit the requirements to decode a certain video sequence. This is very important to ensure conformance and to guarantee the communication between sources and different decoders. Tiers are also defined for a finer classification, and below is a description of all of them. Profile A profile defines the algorithms and tools that are available during the encoding process, limiting the encoder capabilities. An encoder does not need to use all the tools included with the chosen profile, but the importance of defining the profile is that any decoder conforming to it will have at least the same tools available, and therefore it will be able to decode the video sequence successfully. Version 1 of HEVC only has three profiles:
Main
Main 10
Main Still Picture
Since HEVC has been designed for high quality purposes, considering content from professional cameras, lossless compression and coding of screen content, some Range Extensions (RExt) of HEVC have been approved and are part of version 2. The extensions include new tools, available under 21 new profiles [8]:
Monochrome, Monochrome 12, Monochrome 16

37 Main 12
Main 4:2:2 10, Main 4:2:2 12
Main 4:4:4, Main 4:4:4 10, Main 4:4:4 12
Monochrome 12 Intra, Monochrome 16 Intra
Main 12 Intra
Main 4:2:2 10 Intra, Main 4:2:2 12 Intra
Main 4:4:4 Intra, Main 4:4:4 10 Intra, Main 4:4:4 12 Intra, Main 4:4:4 16 Intra
Main 4:4:4 Still Picture, Main 4:4:4 16 Still Picture
High Throughput 4:4:4 16 Intra
The Main profile is suitable for the most common case of video sequences, with 4:2:0 sampling and 8-bit depth. Main 10 allows an increase of the bit depth from 8 to 10, so that the number of representable values per component is multiplied by 4. The Main Still Picture profile is instead a subset of the Main profile and allows a single picture to be encoded using the same constraints as Main. Since video sequences can have very different characteristics and applications, these 3 profiles are not enough to exploit the full potential of the compression algorithms and adapt them to the different situations. The most relevant features introduced with the new profiles included in version 2 are the following. Monochrome support allows only the luma component, which can sometimes be specified as 4:0:0 chroma sampling (no chroma). This type of sequence is common in

38 applications dealing with Magnetic Resonance Imaging, or it can be used for compression of alpha channels and depth maps in 3D applications. 12- and 16-bit depth support provides an increased number of colors, allowing smoother transitions than the other profiles, especially for applications where High Dynamic Range (HDR) has to be considered. 4:2:2 support is often used in conjunction with 12 bits of depth, as it is important for Ultra High Definition (UHD) broadcast applications in order to maintain a higher quality level. 4:4:4 and High Throughput support is tailored to high fidelity applications for professional content, often associated with 16 bits of depth. The 4:4:4 sampling is also important for screen sharing applications, where the sharpness of details is better maintained through an RGB color space. Intra only support allows the video sequence to be encoded using Intra prediction only, while the other profiles can use both Intra and Inter prediction. The combination of all these different features led to the definition of 21 new profiles. Two scalable extension profiles and one multi-view profile are also part of version 2:
Scalable Main, Scalable Main 10
Multiview Main
while version 3 could include support for:
3D Main

39 Screen-Extended Main, Screen-Extended Main 10
Screen-Extended 4:4:4 Main, Screen-Extended 4:4:4 Main 10
Profiles also have a hierarchy, defining which profiles are subgroups of other profiles. The importance of such a definition is that it guarantees that a decoder conforming to a certain profile will also be able to decode any video encoded with a different profile within the subgroup. Levels and Tiers Levels define limits on the maximum sample rate, the size of the picture, the minimum bit rate and compression ratio, as well as the dimensions of the DPB and the coded picture buffer (CPB). Since some applications differ only in CPB size and maximum bitrate requirements, two tiers have been defined: the Main Tier, for generic applications, and the High Tier, for the most demanding applications. The standard provides 13 levels [9], as reported in Table I, and the level definition is fundamental to set constraints on the bitstream in order to limit the processing load on the decoder side, together with its memory requirements. The width and height of the pictures are limited in order to avoid problems in decoders due to extreme picture ratios. They have an upper limit that we can call M, defined as M = √(8 · Mps), where Mps is the maximum luma picture size as stated in Table I.
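As a worked example of this constraint, the sketch below computes the dimension limit for the largest levels; the maximum luma picture size used here is the level 6 value from Table I and is quoted only for illustration.

    import math

    def max_picture_dimension(max_luma_picture_size: int) -> int:
        # Width and height are each limited to sqrt(8 * Mps) to rule out extreme aspect ratios.
        return int(math.sqrt(8 * max_luma_picture_size))

    mps_level6 = 35_651_584          # max luma picture size for levels 6, 6.1, 6.2 (Table I)
    print(max_picture_dimension(mps_level6))   # about 16,888 luma samples per dimension
    # An 8192 x 4320 picture fits: both dimensions are below the limit and
    # 8192 * 4320 = 35,389,440 <= 35,651,584 luma samples.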

40 TABLE I: HEVC LEVELS DEFINITION FOR MAIN PROFILE

Level | Max sample rate (luma/s) | Max frame size (luma) | Max bitrate, Main tier (kbit/s) | Max bitrate, High tier (kbit/s)
1     |         552,960 |     36,864 |     128 |       -
2     |       3,686,400 |    122,880 |   1,500 |       -
2.1   |       7,372,800 |    245,760 |   3,000 |       -
3     |      16,588,800 |    552,960 |   6,000 |       -
3.1   |      33,177,600 |    983,040 |  10,000 |       -
4     |      66,846,720 |  2,228,224 |  12,000 |  30,000
4.1   |     133,693,440 |  2,228,224 |  20,000 |  50,000
5     |     267,386,880 |  8,912,896 |  25,000 | 100,000
5.1   |     534,773,760 |  8,912,896 |  40,000 | 160,000
5.2   |   1,069,547,520 |  8,912,896 |  60,000 | 240,000
6     |   1,069,547,520 | 35,651,584 |  60,000 | 240,000
6.1   |   2,139,095,040 | 35,651,584 | 120,000 | 480,000
6.2   |   4,278,190,080 | 35,651,584 | 240,000 | 800,000

The sizes of the two buffers follow two different rules. The CPB capacity is defined by the maximum bitrate, multiplying it by 1 second. This rule is valid for all levels apart from the first one, which has a capacity of 350,000 bits. The DPB capacity is instead defined by the number of frames that it needs to store, and its size can change depending on the encoded picture resolution. Independently of the level, it has a minimum dimension of 6 frames when the picture size is the biggest allowed by the level, while its capacity can increase as the picture size decreases, up to a maximum of 16 frames. The number of frames always includes the currently decoded frame and all the other frames that will be used in the future as reference or that still need to be displayed.
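The level-dependent DPB capacity can be expressed as a simple rule: 6 frames when the coded picture uses the largest size allowed by the level, growing toward the cap of 16 frames as the picture gets smaller. The thresholds used in the sketch below are an assumption based on the HEVC specification and are included only to make the mechanism concrete; the normative values are those in the standard text.

    def max_dpb_frames(pic_size_in_luma_samples: int, max_luma_ps: int) -> int:
        # Minimum of 6 frames at the largest allowed picture size, up to 16 for small pictures.
        base = 6
        if pic_size_in_luma_samples <= max_luma_ps // 4:
            return min(4 * base, 16)
        if pic_size_in_luma_samples <= max_luma_ps // 2:
            return min(2 * base, 16)
        if pic_size_in_luma_samples <= (3 * max_luma_ps) // 4:
            return min((4 * base) // 3, 16)
        return base

    max_luma_ps = 8_912_896                           # level 5.x maximum picture size (Table I)
    print(max_dpb_frames(3840 * 2160, max_luma_ps))   # a 4K picture at a level 5.x: 6 frames
    print(max_dpb_frames(1920 * 1080, max_luma_ps))   # a FullHD picture at the same level: 16 frames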

41 HEVC can support high resolutions and bitrates. At its highest level, HEVC can support resolutions up to the so-called 8K×4K (8192×4320) and even slightly higher, with frame rates of more than 120 fps. If the resolution is lower, the frame rate can reach 300 fps, still with a picture size much bigger than FullHD. Data Structure Every frame that needs to be encoded is partitioned into coding tree units (CTU), each composed of a luma and two chroma coding tree blocks (CTB). Given the usual case of 4:2:0 color sampling, if the luma CTB dimension is L×L samples then the chroma CTBs are L/2 × L/2, where L can vary among the values 16, 32, or 64. It is the first time that blocks have reached these dimensions, and it is due to the fact that HEVC targets Ultra-HD and other very high resolution formats. Larger blocks can enable greater savings in videos with such high pixel density, also reducing the overhead due to signaling the encoding parameters. Each CTB can be recursively partitioned into one or more coding blocks (CB), which can be further partitioned into one or more prediction blocks (PB), the basic elements of Inter or Intra picture prediction. Slices The encoder can divide a frame into one or more slices containing many blocks and assign a letter to each slice corresponding to the prediction mode used to encode it. The available slice types are: I slice: all the blocks are coded using only Intra prediction, which means all blocks will only use reference blocks contained in the already encoded or decoded part of the current frame

42 P slice: same as an I slice, but some blocks can also be coded using Inter prediction with only one motion vector; all such blocks will still have one motion vector and one reference block, but this block can also be in another frame among the ones available in the DPB. B slice: similar to a P slice, but some blocks can be coded using Inter prediction with two motion vectors, which means some blocks can have two reference frames, and the average of the two references will be used for residuals computation. Group of Pictures Another important concept is the group of pictures (GOP), which defines the usual encoding sequence of frames and the reference relations between them. First, it is important to note that the encoding order of frames can be different from the display order, giving a second purpose to the DPB. Other than storing frames for future reference, the DPB keeps the decoded frames that still need to be displayed, deleting one only when it has been displayed and no other frame will need it as a reference. An example of a GOP is shown in Figure 3; it is usually denoted IBBPBBPBB, where every character represents a frame respectively coded as an I, B or P slice. As explained before, I slices use only information from the frame itself and so they never have incoming arrows. Blocks inside P slices can have one motion vector each, and one reference frame is used to encode the frame, showing only one incoming arrow. B slices instead use information from two different frames and so they have two incoming arrows. I slices need more space because the accuracy of the prediction is lower than in P and B slices, due to the smaller number of available reference blocks, but since they do not rely on

43 Figure 3: Example of group of pictures (GOP) IBBPBBPBB information from previous frames, they are useful for recovering from transmission errors. Some data could indeed be corrupted if an error occurs during the bitstream transmission, and that error can propagate indefinitely in the video sequence if there are no more I slices. 2.3 HEVC Algorithms The algorithms used in the different parts of the coding process are the result of refinements and improvements of the approaches used by the previous standards. New solutions, as well as adaptations of widely used methods, are tailored to obtain the maximum performance based on how videos and applications are changing over time. Here is an analysis of the tools available to a HEVC encoder. Motion Estimation HEVC is a hybrid codec, which means two prediction modes are available: Intra picture and Inter picture prediction. As already explained in Section 2.1.1, the currently encoded block can

44 be compared to potential reference blocks in the same frame and in previously encoded frames, searching for the best approximation that will minimize the residuals. If the reference block is inside the same frame then the prediction mode is called Intra, while it is called Inter if the reference block is in another frame. Intra Prediction In Intra prediction, the prediction direction is encoded using a number that corresponds to an index into a predefined list of up to 33 directions. All of them point in directions between 225° and 360° and from 0° to 45°; this means that only the upper left half of the possible angles is used, because these are the only directions in which there can be already encoded or decoded blocks. The frame is processed from the top left corner to the bottom right one, and so at any point during encoding operations only the upper left half contains data that can be used as a reference. Intra prediction also includes: DC mode, which uses the mean of the pixels from the left column and top row; I PCM mode, which directly codes the pixels without using prediction; planar mode, which interpolates and replicates central pixels, again from the left column and the top row; and LM Chroma mode, which predicts chroma samples using the reconstructed information of the already encoded or decoded luma samples.
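The sketch below gives a minimal illustration of Intra prediction in the spirit of the DC mode listed above: the block is predicted as the mean of the reconstructed samples in the column to its left and the row above it, and only the residuals would then be transformed and coded. The boundary-sample filtering that HEVC applies in some cases is omitted, so this is not the normative DC mode.

    import numpy as np

    def dc_predict(recon: np.ndarray, top: int, left: int, size: int) -> np.ndarray:
        """Predict a size x size block from the already reconstructed row above and column to the left."""
        above = recon[top - 1, left:left + size]      # neighbours above the block
        beside = recon[top:top + size, left - 1]      # neighbours to the left of the block
        dc = int(round((above.sum() + beside.sum()) / (2 * size)))
        return np.full((size, size), dc, dtype=recon.dtype)

    # Toy reconstructed area with a nearly flat region around the block to be predicted.
    recon = np.full((16, 16), 120, dtype=np.int32)
    recon[8:, 8:] = 0                                  # the block we are about to predict
    prediction = dc_predict(recon, top=8, left=8, size=8)
    residuals = np.full((8, 8), 123, dtype=np.int32) - prediction   # actual block minus prediction
    print(prediction[0, 0], residuals[0, 0])           # 120 and 3: small residuals are cheap to code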

45 Inter Prediction Inter prediction uses information from one or two reference blocks placed in frames different from the currently encoded one. The block can then be partitioned into smaller parts called prediction units (PU), which can be square or rectangular. The motion vector can have 1/4-pel precision for luma and 1/8-pel precision for chroma samples, increasing the possibility of finding a good approximation of the current block. 1/2-pel precision means that not only blocks corresponding to real pixels can be used: the encoder can locally compute pixels in between existing pixels to upscale the resolution of the reference frame, using a filtered averaging system. The process can be reiterated, with some modifications, to enable 1/4- and 1/8-pel precision motion vectors. The motion vector is then coded using prediction again, estimating its value from temporally and spatially neighboring blocks and encoding only the difference between the predicted and the actual motion vector. Skip and Merge modes instead allow the motion vector to be inferred or copied from other blocks, so that only the information about which block to use as a reference needs to be sent. Furthermore, both encoder and decoder automatically generate a list of candidate reference blocks, which helps reduce the amount of information that needs to be encoded, because only the index needs to be sent. Residuals Coding Transform Coding The purpose of transform coding is to translate residuals into a different domain in which the information is not equally distributed, as explained in Section 2.1.1. The most used transform for coding residuals is the DCT, and it can work on blocks of dimension 4×4, 8×8, 16×16, and

The discrete sine transform (DST) [10] is also used in HEVC, for transforming 4×4 luma blocks coded using Intra prediction. To simplify encoders and decoders, only one transform matrix, of dimension 32×32, is defined, and sub-sampled copies of it are used for all the other block dimensions. In the case of rectangular PUs, the Non-Square Transform (NSQT) is used [4].

Scaling and Quantization

Scaling and quantization are done using a technique called uniform reconstruction quantization (URQ) together with a quantization parameter that can take values from 0 to 51 [11]. HEVC also supports the use of scaling matrices that depend on the block size, in order to have different quantization parameters for every frequency or group of frequencies. To reduce the dimension of these matrices, they are defined for 4×4 and 8×8 blocks, while blocks of dimension 16×16 and 32×32 reuse an 8×8 matrix; in this case every quantization parameter is used for more than one frequency. The DC value always has its own parameter instead.

Coefficients Reordering

Three scan patterns are available, and the new block partitioning system allows them to be more efficient than ever before. The most commonly used pattern is the diagonal one, which starts from the top-left corner of the block and ends at the opposite, bottom-right corner. This is the most efficient scan pattern because coefficients have a descending magnitude from the top-left corner, where the DC value is stored, to the bottom-right part, where the higher frequency coefficients are. In this way, most of the nonzero values will be grouped together at the beginning of the 1-D vector created by this scan pattern. In case of Intra prediction, horizontal and vertical scan patterns are also available.
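A small sketch of the diagonal scan just described, generating the scan order of an N×N block from the top-left (DC) position to the bottom-right one. It follows the description in the text rather than the exact order defined by the HEVC specification.

    def diagonal_scan(n):
        # (row, col) positions of an n x n block in diagonal scan order,
        # from the top-left (DC) coefficient to the bottom-right one.
        order = []
        for s in range(2 * n - 1):                   # s = row + col on each anti-diagonal
            for row in range(min(s, n - 1), max(0, s - n + 1) - 1, -1):
                order.append((row, s - row))
        return order

    # Scanning a 4x4 coefficient block: nonzero values cluster at the start of the vector
    block = [[9, 4, 1, 0],
             [3, 2, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 0]]
    print([block[r][c] for r, c in diagonal_scan(4)])
    # [9, 3, 4, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]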

A significant difference from the previous standard is in the block partitioning structure, which allows transform blocks (TB) bigger than 4×4. The HEVC approach is to divide any block bigger than 4×4 into 4×4 sub-blocks, which are then scanned using the chosen scan pattern. Finally, the 1-D vectors created by scanning the sub-blocks are concatenated following the same order defined by the scan pattern used inside the sub-blocks. This method allows some parallelization and a reduction in hardware complexity, because the sub-blocks can be scanned independently at the same time [12].

Entropy Coding

The most used entropy coding algorithm is called context adaptive binary arithmetic coding (CABAC), but Golomb-Rice and Exponential Golomb codes are used as well [12]. The CABAC algorithm has two different operating modes, regular and bypass. The regular mode uses the full potential of the algorithm, which relies on probability context models that are updated during the encoding operation. A context modeling stage selects the most appropriate model and then the symbol is coded; the result is finally used to update the model, implementing a feedback system. Since this algorithm is strongly data dependent and highly sequential, it is difficult to exploit the capability of modern hardware to execute parallel operations. In order to increase the throughput and the parallelization, the encoder can select the bypass mode, a faster and simpler version. In bypass mode an equiprobable model is assumed, so all the context selection and update operations are skipped, opening the feedback loop and enabling a better exploitation of the available hardware resources.

HEVC also introduces many other tools to improve efficiency, throughput, parallelization, required area, and computational time. Some of them are mode dependent coefficient scanning, multilevel significance maps, improved significance flag context modeling, last significant coefficient coding, and sign data hiding. An extensive discussion of all these tools is beyond the scope of this thesis.

In-Loop Filter

This stage is part of the decoding operation only, and its main purpose is to reduce the artifacts that the encoding process may have introduced. Since both encoders and decoders execute some decoding operations, this stage is part of both, in order to ensure consistency between the reference data available during encoding and decoding. HEVC has two different filters, called deblocking filter (DBF) and sample adaptive offset (SAO) filter [11], one executed after the other. The first one is the deblocking filter, which is applied to the samples closest to the boundaries of a PU, of a TU, or close to the picture boundary. The filter can have 3 different strengths, 0, 1, and 2: 0 means no filtering, while 1 and 2 select two different coefficient sets that will be used to compute the filtering on luma samples. Chroma samples can only have 2 filtering modes, no filtering and normal filtering, depending on the value of the strength set before: 0 means no filtering, while both 1 and 2 mean normal chroma filtering. While the deblocking filter is applied to the samples on the boundaries of blocks, the SAO filter is applied to all the samples, adding an offset. As for the DBF, there can be three different strengths, where 0 means disabled and 1 and 2 mean band offset and edge offset respectively.

In band mode the amplitude range of the samples is divided into 32 equal bands, and offsets are transmitted for four consecutive bands; each offset can be either positive or negative. If a sample value falls in one of these bands, the corresponding offset is added to it, and this operation is executed on all the samples in the block. Edge mode is based on gradients computed on the block itself using one of 4 different modes, corresponding to the vertical, horizontal, and two diagonal directions. Each sample is compared with two neighboring values and categorized from 0 to 4, and the offset is added depending on the assigned category: 0 means no offset, while categories from 1 to 4 each have an offset value, transmitted through a look-up table. Offsets corresponding to categories 1 and 2 are positive, while the other two are negative, so the filter generally smooths the block when working in edge mode.
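As an illustration of the band-offset mode described above, the sketch below splits the 8-bit range into 32 equal bands and adds the signalled offset to samples that fall into one of the four offset bands. The starting band and the offset values are hypothetical, chosen only for the example.

    def sao_band_offset(samples, start_band, offsets, bit_depth=8):
        # Apply SAO band offset: the sample range is split into 32 equal bands;
        # samples in the 4 consecutive bands starting at start_band receive the
        # corresponding signalled offset, clipped back to the valid range.
        band_width = (1 << bit_depth) // 32
        max_value = (1 << bit_depth) - 1
        filtered = []
        for s in samples:
            band = s // band_width
            if start_band <= band < start_band + 4:
                s = min(max(s + offsets[band - start_band], 0), max_value)
            filtered.append(s)
        return filtered

    # Hypothetical offsets for bands 12..15 (sample values 96..127)
    print(sao_band_offset([90, 100, 110, 125, 200], start_band=12, offsets=[2, -1, 3, -2]))
    # [90, 102, 109, 123, 200]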

CHAPTER 3

VIDEO HARDWARE

The new HEVC standard can reach much higher compression ratios without affecting the video quality. This goal has been achieved thanks to updates to algorithms and strategies at different stages of the encoding process, as described before. These improvements come with drawbacks such as an increased computational cost, and technology needs to be re-adapted to the new requirements imposed by HEVC. Because of the improvements, blocks can be much larger than before, so the motion estimation module needs to read more data in order to find the best reference block. Furthermore, the allowed resolutions are higher, and more pixels per frame greatly increase the number of searches required. Subpixel interpolation needs to be executed more often and on larger blocks. All the subsequent modules that analyze and process residuals need to be updated too: operations such as DCT, quantization, reordering, and entropy coding now need to process a much higher quantity of data, requiring adapted solutions. Both the amount of data per frame and the maximum allowed frame rate have increased, pushing the throughput requirements to new limits. Software solutions can be accepted for offline conversions, especially if a very high ratio between quality and bitrate is not necessary. If encoders and decoders need to exploit all the new tools available in HEVC, however, software solutions are no longer suitable.

General purpose processors are not optimized for the type of operations required and do not take advantage of hardware acceleration. Custom solutions have special hardware that has been specifically designed to execute particular operations that would otherwise be too slow on a general purpose processor. These custom designs can be found as standalone hardware encoders and decoders or as hardware accelerators embedded into larger systems on chip (SoC); smartphone processors are examples of SoCs that integrate video hardware accelerators. Specialized hardware is fundamental for obtaining high performance and also for lowering the power consumption. Custom hardware designs only implement the logic required to execute the operations of that particular encoding or decoding module, increasing power efficiency; when not needed, the hardware can also be shut down, avoiding any additional power consumption.

3.1 HEVC Encoders

The complexity of the operations required by HEVC for the encoding process depends on the target quality of the encoded sequence as well as on the tools that the encoder wants to use. If higher bitrates are allowed, the video will be encoded much faster, but it will also need more space to be stored since the compression ratio will be lower. Lower bitrates mean smaller encoded videos, but at the expense of a quality reduction. The development of new standards includes the improvement of the tools available to encoders, so that the quality of the video sequence can be better preserved. Encoders compliant with the highest profile have all the tools available and can therefore provide the best ratio between quality and bitrate, exploiting the full potential of the new algorithms.

The complexity of the new standard has been analyzed in [13]. In the passage from H.264 to HEVC most of the modules have been reviewed, and we can define two groups:

increased complexity:
  motion compensation
  Intra picture prediction
  transforms
reduced complexity:
  entropy coding
  deblocking filter

Among those, motion compensation, in-loop filtering, and entropy coding are the ones with the most relevant changes. The increased complexity for encoders is due to the availability of many more modes in tools such as Intra prediction, and to new structures like the quadtree. These changes force HEVC encoders to be several times more complex than before, but this was expected during the preparation of HEVC. The HEVC standard has been designed with awareness of its increased complexity and with knowledge of the latest trends in hardware architectures. In the last decade the quest for higher clock frequencies stopped, while the focus has been on the development of smarter architectures. Processing power is still following Moore's law, which means new processors are twice as powerful as the ones from two years ago, but reaching higher frequency limits than the ones we have today represents a big challenge.

Due to thermal and physical constraints, new architectures attempt to exploit parallelism to increase processing capabilities. Multi-thread and multi-core processors are now the most commonly installed solutions in computers and smartphones and can guarantee higher performance while keeping power consumption under control. To exploit parallelism at its best, algorithms and software have changed substantially in the last 10 years, and the same is happening for video standards. The HEVC design team was aware of this and introduced many changes to improve the parallelism of the encoding and decoding process. Thanks to slices, tiles, and wavefronts, encoders and decoders can process different parts of the same image at the same time, increasing the throughput if multiple identical processing units analyze different portions simultaneously. Even if slices have been designed mainly for error resilience and network transmission, they are now used for parallelization purposes too. They are sent through independent network adaptation layer (NAL) units and can therefore be decoded independently. Prediction cannot cross the boundaries of a slice, which reduces the coding efficiency but easily allows parallel processing. Tiles follow the same concept as slices, limiting prediction inside their boundaries. Even if similar to slices, they allow encoders to compose an image from different rectangular sources and encode them independently from each other. Wavefronts are instead rows of CTUs and allow each of them to be encoded or decoded by a different thread. Wavefronts do not set prediction boundaries, but the CABAC engine needs to be reinitialized for every row. To improve efficiency, the entropy encoder can inherit some of the initialization information from the previous row, speeding up the process and reducing the overhead due to such a fine division.

3.1.1 Architecture Examples

Because the standard is so recent, literature examples of architectures for HEVC encoders are not very common yet. In [14] we can get a general idea of the advantages a hardware solution offers compared to a software encoder. Just by using a field programmable gate array (FPGA), the authors of the paper managed to encode FullHD videos (1920×1080) at 60 fps with a bit depth of 10 bits. The chosen profile is Main 10 and the level is 6.1. From their results we can see that they obtained a large bitrate reduction together with improved image quality, while still operating in real time. Their design of course needed modifications to the standard, because some of the decision systems in the control logic of the standard model are too complex to allow real-time processing. One example is the rate-distortion optimization algorithm [15], the logic that decides and adapts the coding parameters during the encoding operations. This feedback system is too complex and computationally expensive to be used in real time, so they substituted it with a much faster sum-of-absolute-differences cost function. Another key point they needed to adapt is Intra prediction, because HEVC allows 35 different modes and an extensive full search is computationally not affordable. Their solution is a decision system divided into three levels, whose granularity becomes finer at each level.

Motion estimation is the focus of most of the research in the video hardware field, especially when related to HEVC. The broad choice of modes for Intra prediction allows greater compression ratios, but hardware architectures need to implement clever methodologies to overcome the long processing time. In [16] a new search algorithm is presented, designed with the goal of optimizing it for hardware implementation. Different solutions proposed in the past had to trade performance for precision, while the proposed one still allows sequential processing of blocks together with exact motion vector predictor computations. CUs can have different dimensions, ranging from 8×8 to 64×64. Cost calculations and search patterns differ depending on the block size, so the proposed architecture has four independent engines, one for each CU size. The output results of the four engines are then evaluated, from the smallest to the biggest, in order to find the best combination of CU and PU size. The search strategy is adapted too, reducing the inter-dependencies between different stages in order to increase the parallelism and reduce the worst-case hardware requirement. The search is divided into two parts that can work in parallel: the first one checks one candidate every 8 positions over a ±64 range, looking for changes in the motion patterns, while the second one looks for regular motion in smaller areas such as ±7. Thanks to this approach the algorithm checks only 285 candidates instead of 850, reducing the hardware area to one third. The proposed solution also includes a memory sharing system that allows the different engines to use the same local memory space, called the reference buffer.

The reference buffer needs to cache the portion of the decoded picture buffer needed for the motion estimation, and by merging the buffers of the different engines they can reduce the memory bandwidth too. To reduce the latency of the off-chip memory, a pre-fetching strategy is implemented together with all the other improvements: the information about the area to fetch from the memory is inferred from the neighboring blocks and is sent to the memory management unit as soon as possible. This is done before obtaining the actual, more precise value from the following stages of the process, trading precision for speed. Since a memory access can take on the order of hundreds or thousands of cycles, such a strategy can save a good amount of time. The results achieved by the proposed solution are a maximum bandwidth saving of about 47 times, a 16% reduction of the on-chip buffer size, and an area reduced to one third.

3.2 HEVC Decoders

Even though the complexity of the operations for HEVC encoders has greatly increased, decoders have almost the same requirements as those for H.264 [13]. This was an important goal during the HEVC design, because any device that wants to be able to reproduce an HEVC video needs to have a software or hardware decoder inside. Requiring device producers to upgrade their hardware in order to decode new videos could limit the diffusion of the new standard and would have the side effect of slowing the advance of the technology. Because of the relative simplicity of decoders compared to encoders, the former are still executable on general purpose processors. As discussed in [13], a good laptop equipped with an optimized software decoder can handle good bitrates even with high resolution videos at frame rates up to 60 fps. From their analysis it is possible to notice that about half of the time is usually spent on motion compensation.

As already discussed, memory bandwidth and access speed are key factors in determining the overall performance of decoders. At high bitrates it is also possible to notice that the entropy decoding module can become the bottleneck of the system.

3.2.1 Architecture Example

In [17] the authors demonstrate the advantages of a hardware solution for decoding UHD video sequences. While FullHD resolution is still manageable by a software decoder running on a general purpose processor, UltraHD videos contain an excessive amount of information to decode and a hardware solution is required for real-time processing. As already discussed in the previous section, the best way to exploit the new HEVC structure is through parallelization. The syntax of the standard allows different degrees of parallelization that can adapt to a multi-core or multi-thread system. Slices and tiles define independent areas of the image that can be decoded independently from each other, and therefore a multi-core system can easily process them at the same time. Once the structure of one core is defined, many can cooperate to increase the decoder throughput. The design of a core is based on the coding structure of HEVC. A frame or a tile is partitioned into CTUs that can have a maximum dimension of 64×64, and these are further partitioned into CUs and PUs, as discussed in Chapter 2. The decoding process needs to follow the internal hierarchical structure of the CTU, therefore the simplest solution is to implement a pipeline at the CTU level. This choice avoids the use of complex coherence and synchronization algorithms, which would be needed by a pipeline based on CUs or PUs, since these might require information from other CUs or PUs in the same CTU.

Stall bubbles would need to be inserted into the operation flow every time data depends on other stages, blocking the pipeline and slowing down the whole system. The pipeline has 5 stages:

entropy decoding
inverse quantization and transform
prediction and reconstruction
in-loop filtering
decoded picture storage

The first stage pre-processes the bitstream and decodes it, generating data and coefficients for the following stages. The second stage uses data and coefficients from the previous one to apply rescaling, reordering, and inverse transformation; with these steps the coefficients become residuals and the block can be reconstructed by the third stage. The prediction and reconstruction pipeline stage is in charge of inverting Intra or Inter prediction. The information about the mode chosen by the encoder, as well as the motion data, has been extracted from the bitstream in the first stage and passed directly to the third. The motion compensation module contains a cache and interpolation filters, and its internal structure is sub-pipelined to improve efficiency; furthermore, it presents processing blocks with different sizes to better adapt to the different block dimensions. The sub-pipelined version saves almost 82% of the local memory and also reduces the operating cycles by 30.5%. The motion compensation module also needs to fetch the required reference information from an external memory containing the decoded picture buffer, together with inheriting the parameters for in-loop filtering from the neighboring CTUs.

After the block reconstruction, the in-loop filtering stage applies the deblocking filter and the SAO filter to improve the image quality. Configuration parameters from the neighboring CTUs can be used for filtering, and new parameters are output for the processing of the next CTU. The last stage of the pipeline stores the completely reconstructed block in the external memory, so that it will be available for display and for future reference. The authors tested the proposed architecture by implementing it on an FPGA prototyping board. The board is equipped with a Xilinx Virtex-7, a general purpose RISC processor, a video controller, and 2 GB of RAM. The Virtex-7 FPGA chip contains the hardware decoder cores, while the general purpose processor parses the bitstream and manages the control and settings of the cores; once the hardware decoders are set, they do not need any additional input from the processor until new data needs to be decoded. The 2 GB RAM is a DDR3 SDRAM that stores the bitstream, the decoded picture buffer, and all the intermediate parameters. The video controller manages the HDMI communication with a 4K-UHD LCD screen to display the result of the conversion. The proposed solution should be able to decode UHD content in real time if implemented on a custom integrated circuit.

CHAPTER 4

COMPRESSED BUFFER DATA MANAGEMENT

In the previous chapters we discussed video coding in general as well as the innovations brought by HEVC. Hardware encoders and decoders struggle to manage the high memory requirements set by the new standard, and therefore new solutions are needed to overcome the problem. The next section presents a detailed explanation of the most relevant issues related to buffer data management and the possible solutions. The sections after it will present the proposed compression algorithm for the buffer data, followed by an analysis of the storage and computational advantages with respect to previous works. Chapter 5 will describe the code used to simulate the algorithm and the results of the simulations, both in terms of image quality and achieved compression ratio.

4.1 Motivation

Targets of the HEVC standard are not only consumer applications but also video sequences taken with professional cameras, as well as content that requires high fidelity and high quality. Resolutions are quickly increasing and HEVC has the objective of being applicable to as many applications as possible, today and in the upcoming years. For these reasons it allows picture sizes that can exceed the so-called 8k×4k (8192×4320) resolution at 120 fps, while the frame rate can rise up to 300 fps at smaller resolutions.

The numbers mentioned before make it clear that hardware architectures will need to be adapted and enhanced in order to meet those requirements, especially the components that have to stand the greatest stress. One of them is the DPB. Here is an analysis of the major issues that need to be addressed.

Buffer Size

As already explained, the dimension of the DPB depends on level, profile, and picture resolution. In the case of the Main profile and level 6.2, it is possible to have 16 frames with a resolution of 3840×2160 in the DPB at the same time. The formula to compute the DPB size requirement S is:

S = W · H · Bd · Cp · Fr

where:

W = frame width
H = frame height
Bd = bit depth
Cp = average components per pixel
Fr = number of frames.

W and H are defined by the resolution mentioned above, and the corresponding number of frames Fr is 16 due to the choice of level 6.2. The Main profile uses 4:2:0 chroma sampling, so the pictures have one luma sample for each pixel and 2 chroma samples every 4 pixels, for an average of 1.5 components per pixel.

The chosen profile also sets the bit depth to 8 bits, allowing us to compute S as:

S = 3840 · 2160 · 8 · 1.5 · 16 = 1,592,524,800 bits ≈ 190 MegaBytes

Almost 200 MegaBytes is a very large amount for a memory that should be embedded in encoders and decoders, because it is impossible to implement it as on-chip memory.
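The calculation above can be reproduced directly; the short sketch below simply evaluates the formula with the numbers used in the text (3840×2160, 8-bit, 4:2:0, 16 frames).

    def dpb_size_bits(width, height, bit_depth, components_per_pixel, frames):
        # DPB size requirement S = W * H * Bd * Cp * Fr, in bits.
        return width * height * bit_depth * components_per_pixel * frames

    # Main profile, level 6.2: 3840x2160, 8 bits, 4:2:0 (1.5 components/pixel), 16 frames
    size_bits = dpb_size_bits(3840, 2160, 8, 1.5, 16)
    print(size_bits)               # 1592524800.0 bits
    print(size_bits / 8 / 2**20)   # ~189.8, i.e. almost 190 MegaBytes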

Off-chip memories can be much larger, solving the memory size problem, but they introduce other major issues that some applications cannot accept. The area increase is a good example of a negative side effect, because off-chip memories need to be placed outside the main integrated circuit and they often also use different fabrication techniques: DRAMs, for instance, use a variation of the usual lithographic process and for that reason cannot be built on the same die as other kinds of integrated circuits. Another issue introduced by the use of a large memory is the power consumption, which is also a side effect of the area increase. In applications that require large memories or intense memory usage, memory is a very power-hungry component and it can easily be one of the main sources of power consumption, if not the first one. Power is critical especially when a device needs to be battery supplied or when there are very strict design constraints. Another important aspect is the performance, strongly dependent on the type of memory chosen for the design. Off-chip memories need longer interconnections and more complex interfaces, which slow down the overall memory speed. Even if the memory speed is the same, larger quantities of data require more time to be read or written; usually larger memories are also slower, and this worsens the performance degradation. The metric used to establish the performance of a memory depends on the application but usually takes into account latency and throughput. Latency is the time that the memory needs to provide a specific piece of data after its request, while throughput is the total amount of data that can be transmitted in a fixed time. Large off-chip memories usually have lower performance compared to smaller on-chip ones, both in terms of latency and throughput, and they sometimes also need additional hardware for their management. All the previous issues related to the need for a large off-chip memory can also translate into an additional cost because:

a larger area, if it is available at all, is more expensive
a higher power consumption may require more capable batteries, which are also more expensive
different or additional hardware may be needed in order to compensate for the reduced performance, increasing costs.

If it is impossible to increase area, power consumption, or cost, or if a performance reduction is not tolerable, then the increased memory requirement of HEVC might make the design impossible.

Number of Accesses

HEVC is based on both Intra and Inter prediction modes, and as discussed in Section 2.3.1 they work by scanning the spatially or temporally neighboring blocks searching for the best approximation.

Motion estimation can use different search algorithms in order to trade some efficiency for encoding speed, but in any case the number of readings is very high. In the worst cases, dozens of readings can be necessary just for the motion estimation of one block. HEVC can support very high resolution videos, which contain hundreds of thousands of blocks per frame, so many millions of read operations per frame will be necessary just for the motion estimation part. Even if this mainly affects encoders, decoders face the problem too, since the motion vector is usually not sent explicitly. Most of the time both encoders and decoders use search algorithms to define a list of the most probable reference blocks, using information from the previously encoded or decoded blocks. Transmitting the motion vector as the index of the chosen block in the list reduces the amount of information that needs to be sent. Even though this system improves the compression ratio, it requires a larger number of readings to decode an image: the decoding process will require more accesses to memory, although still fewer than the encoding process. Power consumption in memories depends on their architecture, but it is also strongly related to the number of read and write operations executed. For this reason HEVC can bring increased power consumption due to its highly data dependent encoding algorithms. Since each read or write operation can only transmit a certain amount of data, the number of these operations depends on the number of blocks to retrieve from the memory and on the block dimensions. Furthermore, if the memory is connected to the integrated circuit through a shared bus, a high number of read and write operations, as well as a high amount of data transmitted from and to the memory, can congest the bus.

This can slow down other parts of the chip that need to transfer data through the bus and find it busy. If the search algorithm cannot be modified, different approaches can be used to reduce one or the other aspect. Two valid examples are:

data caching
data compression.

A cache is a smaller and usually faster memory used to temporarily store the data read in recent read operations, in order to reuse that data in case it is requested again. The data can then be accessed more quickly, without the need to retrieve it again from the main memory. This approach can be useful if a certain area is read multiple times, but if new sections need to be read each time then data caching brings no advantage. A solution that aims to reduce the data size is to compress the data in the buffer. Less data to transmit means fewer read and write operations, therefore potentially lowering bus utilization and power consumption. This solution will be discussed in detail in the following sections of this chapter.

Access Speed

The number of read and write operations required by HEVC has increased, as well as the amount of data that needs to be transmitted to and from the buffer memory. Furthermore, the new standard allows higher resolutions and frame rates compared to the previous standards. Encoder and decoder architectures face big challenges if they need to process the video in real time, because more data needs to be processed in a shorter time. In order to respect the timing constraint, new architectures require more powerful hardware, especially if a certain component risks becoming a bottleneck for the rest of the system.

Due to the large amount of data that the motion estimation module needs to continuously read from the DPB, the access speed of the memory is a key factor in ensuring the performance of the whole encoder or decoder. Faster memories are more expensive, may require more area, and have an increased power consumption; these drawbacks are not always acceptable, depending on the application the IC will have. A simplified formula to compute the throughput requirement for the buffer is:

S = D / T

where:

S is the minimum throughput speed required
D is the amount of data that needs to be transmitted
T is the maximum allowed time to transmit that amount of data.

If the application requires real-time execution then T is fixed, because if the video is recorded at 30 fps then at least 30 frames need to be encoded or decoded every second. D is the amount of data that needs to be transmitted to and from the memory for processing those 30 frames, and reducing it also reduces the minimum throughput speed requirement S. As discussed above, DPB compression can be a valid solution to reduce the amount of data and therefore to allow the use of slower memories, potentially reducing cost, area, and power consumption.
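As a rough illustration of S = D / T, the sketch below estimates the required buffer throughput for a real-time scenario. The per-frame traffic factor is a made-up assumption for the example, not a figure taken from the text.

    def min_throughput(frame_bytes, traffic_factor, fps):
        # Minimum buffer throughput S = D / T with T = 1 second: D is the data
        # moved per second, i.e. frames per second times the reference/storage
        # traffic generated per coded frame.
        return frame_bytes * traffic_factor * fps   # bytes per second

    # 3840x2160, 4:2:0, 8-bit frame is ~12.4 MB; assume (hypothetically) that
    # motion estimation and storage move the equivalent of 4 frames per coded frame.
    frame_bytes = 3840 * 2160 * 1.5
    print(min_throughput(frame_bytes, traffic_factor=4, fps=30) / 1e9, "GB/s")  # ~1.49 GB/s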

4.2 Compressed Buffer Data

Our goal is to relieve some of the stress put on the DPB by the HEVC requirements, enabling an easier design of HEVC-compatible hardware. The research originally started in a different direction from the one finally proposed in this thesis, because we were attempting to solve the problem with a different approach. The first approach did not work as well as expected, but we present it anyway because the analysis of its results helped in the choice of the final solution. After describing the first approach, the new solution is presented, showing its different features and justifying the choices taken during the development by comparing it with previous works.

4.2.1 Early Approach

At the beginning of the design of the new algorithm we tried to explore a new direction. We noticed that the majority of the recompression algorithms rely only on the information contained in the frame that needs to be compressed, without considering any other data. The decoded picture buffer holds from 6 to 16 frames in HEVC, and they are not completely unrelated images: all the pictures come from the same video sequence, where they are also temporally close to each other, therefore guaranteeing similarity among them. Most recompression algorithms exploit the redundancy between neighboring pixels to reduce the information they store and to achieve good compression ratios. Video compression algorithms use the same approach when they encode with Intra prediction, looking for the candidate that best approximates the currently processed block. HEVC and its predecessors are however strongly based on Inter prediction too, a technique that uses reference information from previously encoded or decoded frames.

For this reason we tried to explore the possibility of designing a solution based on that. The first idea was to obtain residuals from the difference between pixel (x, y) in frame n and pixel (x, y) in frame n−1. Frame n is the one being compressed, while n−1 is the frame immediately before it in the buffer. The more time passes between two frames, the more different they will be, so the best case scenario is that frame n−1 is not only the previous frame in decoding order but also in display order. Because of the GOP structure this is rarely true, as discussed earlier and shown in Figure 3, but for a first analysis we can consider it as true. Residuals obtained with this technique can then be coded using VLCs such as Golomb-Rice or Exponential Golomb and compared with other solutions. The first tests on small and medium resolutions gave good results, comparable with other techniques. A characteristic of working at resolutions up to CIF is that the differences between neighboring pixels in the same frame are not very small, while the changes between consecutive frames are small. This is due to the low pixel density, where few pixels are needed to cover a large area, therefore accumulating all the information and detail in them. Furthermore, an object moving in the video will only change position by a few pixels between frames, keeping the differences between consecutive frames small. All these characteristics reduce the performance of recompression algorithms that exploit spatial redundancy and increase the efficiency of our first proposed solution. Even though videos with small and medium resolutions gave good results, the motivation behind this work is to alleviate the memory requirement on hardware encoders and decoders when it is extremely high.

This situation is much more likely to happen in the case of high and ultra-high definition videos, so we also tested the algorithm on some of the HEVC test sequences at those resolutions. The results on these kinds of videos showed that our solution was not adequate, and here is the analysis of the reasons. Even though the resolution of a video is very high, its field of view is about the same, and this means that many more pixels represent almost the same amount of information that a low resolution video needs to carry. Although higher resolutions bring greater detail, there is often much more spatial redundancy due to the increased pixel density. The effect of this is a reduction of the differences between neighboring pixels, therefore allowing recompression algorithms that rely on them to perform much better. The second reason why the temporal approach does not perform well at high resolutions is the change between consecutive frames. If an object moves from the left side of the screen to the center, in a high resolution video it will have moved by many more pixels than in a low resolution one. If the camera is changing its pointing direction, the entire frame will be shifted and the difference in terms of pixels will be much higher in high resolution sequences, even if the camera motion happens at the same speed. Given these results, we tried to use some of the solutions implemented by video coding standards when working with Inter prediction. Since there are usually some differences between frames, motion vectors describe the offset between the current block and the best approximation in the reference frame. Since it would be too computationally expensive to implement a full motion compensation algorithm inside the recompression routine, we simply tried to compute a global motion vector and apply it to the whole image.

Even though the image quality benefits from this modification, the improvement is so small that it is not worth the extra calculations. As discussed before, the aim of this recompression algorithm is to reduce the memory requirement while adding only a small overhead of extra computational effort, and this first solution does not perform well enough where it is needed. Since high and ultra-high resolution videos set the highest buffer requirements, we need an algorithm that performs well with that kind of sequence, exploiting its characteristics in the best way. The next sections explain the new approach and its features, comparing it with previous works.

4.2.2 The New Approach

The work on the first approach gave us a clear understanding of the characteristics of the video sequences the algorithm will have to process. The high pixel density will guarantee high correlation between neighboring pixels. Information from previous frames cannot be used without the implementation of a motion compensation system, which is too computationally expensive for this kind of application. The new recompression algorithm aims to:

reduce the buffer size requirement
reduce the number of memory accesses
keep the computational overhead as low as possible.

In the literature there is a large variety of papers about recompression algorithms, but only a few of them try to address all these problems at the same time. With the proposed solution we try to give a valid answer to all of them, but by simply enabling or disabling some of the features the effort can be focused on only some of the issues.

The following sections describe the overall structure and the different stages of the algorithm, highlighting the reasoning behind each choice while comparing it to previous works.

General Organization of the Algorithm

As mentioned above, the proposed recompression algorithm aims at addressing all the most relevant problems related to the decoded picture buffer in stress situations such as the encoding and decoding of HD and UHD videos. Here is how.

Buffer Size

A key factor in the choice of the strategy to follow is the buffer size requirement. The algorithms that are able to reach the highest compression or to maintain the best quality rely on variable compression ratios [18] [19] [20]. If the compression ratio is adapted during the process, it is always possible to have the best trade-off between quality and bit utilization, reducing the size when only little information needs to be sent and increasing it for areas with more detail. The disadvantage of this technique is that the final size of the compressed image cannot be predicted, and therefore it is impossible to define an upper bound for the size of the buffer. If the algorithm is well designed the maximum size of the compressed image is the original size, but if the worst case is not taken into consideration the maximum size can be even larger. Variable length codes can indeed be longer than the original data, and the worst case scenario of incompressible data would result in a size increase compared to the original frame.

Even if a variable compression ratio can give the best results, it is not suitable for reducing the buffer size, so we decided to use a fixed compression ratio. A fixed compression ratio does not guarantee lossless compression, because the data will not always fit into the allocated space. The choice of lossy compression is the only one that allows us to reduce the memory size requirement, because a fixed compression ratio algorithm always outputs a predetermined amount of data. Lossless algorithms with variable output size can still give some advantages, such as average memory bandwidth saving and some power saving, but a full-size memory still needs to be installed. The mentioned power saving can be achieved by removing power from the memory banks that are not used, but the area occupied by the memory chip will be the same as the original, and its cost might be too high for certain designs; the only way to solve the cost problem is to reduce the memory size. A lossy fixed-CR algorithm can bring the same advantages while also including the area reduction.

Number of Accesses

Another important goal of the algorithm is to reduce the number of memory accesses. A memory access operation is very expensive in terms of both power consumption and execution time, so great advantages can be obtained by finding a way to reduce them. This is another situation in which a variable compression ratio solution is not adequate, because the size of the compressed image, or of parts of it, cannot be predicted and therefore it is impossible to know with certainty where to find the required pixel [18]. When the motion estimation or compensation module requires access to a certain area of the image, it is important to know in advance which memory location needs to be accessed. In the case of variable CR it is impossible to know where to find a certain pixel unless some packing techniques are implemented to allow it [19] [20].

These techniques, however, usually bring an efficiency reduction either on the CR or on the number of accesses. If nothing is done to help find a location, the only way to reach a pixel is by scanning the whole image, and it is clear how impractical this solution is when compressing HD or UHD videos. In our case, a square block is used as the basic structure for compression. Its size can be chosen depending on the application and on the desired trade-off between quality preservation and compression ratio, but typical values range from 4 to 8 pixels. The choice is driven by the fact that the different independent structures are compressed without any information from other blocks, therefore allowing us to access the information in a block just by decompressing it. The use of larger structures forces us to read and decode the whole area even if we need only a small portion of it. On the other hand, an area that is too small is not convenient, because the space occupied by the parameters is fixed and would cause too high an overhead. This is also the solution most commonly adopted by the other algorithms.

Computational Effort

HD and UHD videos bring huge quantities of data to encode or decode, and hardware architectures need to be updated in order to be able to process them in real time. Since the timing constraints are already very strict, it is important that a recompression algorithm adds the smallest possible overhead. Ideally the process should be completely transparent from the point of view of the motion compensation or estimation modules, meaning that they should not notice any difference in the communication with the buffer. In order to reach this goal, the buffer management algorithm needs to be fast enough to provide the requested information with a very short latency, and this is possible only if the implemented operations are fast and allow a high degree of parallelization.

Solutions implementing complex transform operations on pixels or residuals [21] [22] introduce a computational overhead that can be afforded only at low resolutions. The high throughput requirement of HD and UHD videos only allows simple and fast operations, otherwise the performance of the entire system will be reduced. Furthermore, VLC techniques produce coefficients or residuals with an unpredictable number of bits, and for this reason it is impossible to directly access a specific value [22] [23]: the whole block bitstream needs to be read and parsed in order to find the desired coefficient, preventing any form of parallelization. To allow fast access and decompression, it must be possible to parallelize the operations, and independent direct access to the residuals needs to be available. A truncated bit encoding system can guarantee a fixed bit length, so that the position of all coefficients is known. Furthermore, it is important to avoid excessively long chains of operations because of their latency; the choice of the scanning order has been influenced by this too. The following sections give more details about the different implemented algorithms that allow the proposed recompression scheme to address all the critical issues mentioned above.

ADPCM

Differential pulse code modulation (DPCM) is a modulation technique that reduces the information that needs to be sent by computing the difference between consecutive samples. In our case, instead of trying to directly compress the pixel values, the algorithm first computes the difference between neighboring pixels to generate what we will call residuals.

As described before, HD and UHD videos have a high pixel density, so most of the time neighboring pixels will have very similar values. If the value of the pixels in a block is high but is almost always the same number, then it is redundant to try to compress the pixels directly, because the efficiency will be low. Using DPCM we can reduce the values to very small numbers that are much easier to compress. Compared to other techniques, one of the greatest advantages of this solution is the simplicity of the operations: a simple adder is able to execute the operation, without the need for the complex multipliers that other solutions such as the DCT require. To improve the efficiency of DPCM it is possible to use adaptive differential pulse code modulation (ADPCM), a modification of the original DPCM that adapts it to an algorithm with fixed output size. The additional flexibility is given by the use of quantization, which, as we know, can reduce the detail by dividing and scaling the original pixel value. In the case of a block with too large differences between neighboring pixels, it will be impossible to fit the information in the allocated space and therefore some of the detail needs to be discarded. A coefficient Q defines the magnitude of the quantization effect, and the corresponding quantization step is usually a power of two. This choice is the most popular because of the following relation:

p / q (q = 1, 2, 4, 8, ...) = p >> Q (Q = 0, 1, 2, 3, ...), with q = 2^Q

where the left operation is a division and the right one is a bit shift. Executing a division in hardware is a very demanding operation both in terms of area and time, while a bit shift only requires few hardware resources.
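The sketch below illustrates the combination of DPCM and shift-based quantization described above. It is a generic row-by-row illustration only (the parameter q here plays the role of the Q used in the text, i.e. the number of discarded bits); the actual scan paths of the proposed algorithm are discussed later and shown in Figure 5.

    def adpcm_encode_row(pixels, q):
        # Quantize by dropping q least significant bits (a shift instead of a division),
        # then DPCM-code the row: the first value is kept as is, the others become
        # differences between neighboring quantized values.
        quantized = [p >> q for p in pixels]
        return [quantized[0]] + [quantized[i] - quantized[i - 1]
                                 for i in range(1, len(quantized))]

    def adpcm_decode_row(residuals, q):
        # Undo DPCM by accumulating the residuals, then rescale; adding 2**(q-1) - 1
        # places each value in the middle of the interval of possible originals.
        offset = (1 << (q - 1)) - 1 if q > 0 else 0
        values, acc = [], 0
        for r in residuals:
            acc += r
            values.append((acc << q) + offset)
        return values

    row = [120, 122, 121, 125, 130, 131]
    print(adpcm_encode_row(row, q=2))                         # [30, 0, 0, 1, 1, 0]
    print(adpcm_decode_row(adpcm_encode_row(row, q=2), q=2))  # [121, 121, 121, 125, 129, 129]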

Many previous works use DPCM and ADPCM to compute small residuals, but their implementation is sometimes not tailored to be hardware-friendly. Here is an example.

Implementation in Previous Works

As discussed before, DPCM and ADPCM can obtain a great data reduction with very simple operations, and they are therefore commonly adopted techniques. However, this solution may not be the fastest if it is not implemented wisely. The reconstruction of the block is a sequential operation that starts from a non-differentially coded value, usually at one corner, and moves from one residual to the neighboring one, reconstructing the original values of the pixels. Without decoding all the previous residuals on the path it is impossible to directly reconstruct a pixel, so the choice of the path is fundamental for efficiency. Figure 4 shows two of the possible paths implemented in [23]. We can notice how there is always only one path that the decoder can follow to reconstruct the pixels, meaning that it is a strongly serialized process. The example is based on a 4×4 block, so the path counts 16 pixels and therefore requires 15 additions that can only be done one after the other. If the block is larger, for example 8×8, it requires 63 consecutive additions, with a consequently long delay. In their solution, 8 different paths are available, in order to have the same number of modes that are available in H.264 for coefficient scanning. Their algorithm uses the information from the previous encoder or decoder modules to know which scanning order could be optimal, and then compares it with another of its choice.

Figure 4: Serpentine paths for DPCM

The mode giving the smallest residuals is chosen and the mode number is signaled at the beginning of the block. The 8 different paths have different orientations in order to adapt to the various image patterns, such as horizontal, vertical, or diagonal stripes.

Proposed Solution

Figure 5 shows the proposed paths for DPCM; in [19] it is possible to see them offered as two of the four available modes. The greatest advantages of these kinds of paths are the reduced length of the longest sub-path and the availability of multiple sub-paths to follow.

Figure 5: Hardware optimized paths for DPCM

With the proposed paths, two different independent directions are available already from the starting pixel, which means that two different adders can start reconstructing the pixels at the same time. Looking at the left block in Figure 5, we can see that after reconstructing the first pixel toward the bottom, another independent horizontal path becomes available and a third adder can start decoding it. The same applies to the last row, therefore allowing different adders to work in parallel and greatly reducing the latency. For simplicity the example shows a 4×4 block, but for an 8×8 block the decoding operation can be 4.5 times faster than using only one adder, as Section 4.4 will analyze in greater detail. Since images usually present a microscopic structure organized in horizontal or vertical stripes, the algorithm allows two modes to best adapt to both situations, as shown in Figure 5.

The presence of diagonal stripes is less common and also more difficult to exploit efficiently using a square block structure, so the available modes are limited to two. The advantage of this choice is that only one bit is needed to encode the selected mode. The adaptive part of ADPCM is implemented through a parameter Q that gives the number of least significant bits discarded during encoding. The quantization operation is done by shifting the original pixel values to the right before applying DPCM, therefore discarding the finest detail. It is important to shift the pixel values and then apply exact DPCM, rather than just shifting the residuals, because in the latter case the introduced error would accumulate at every addition without any control on its maximum value; in the first case the maximum error is always the quantization error. The value of Q can be any integer from 0 to 7, so only three bits are needed to store it, and it is transmitted at the beginning of the block together with the mode selection bit. When reconstructing the block, the decompression algorithm reads the value of Q and shifts all the pixel values to the left by the corresponding number of bits. The rescaling operation through left shifting is usually done by inserting trailing zeros, but in order to reduce the quantization error one zero is inserted first, followed by all ones. This is equivalent to adding half of the maximum dynamic of the quantization error, thus placing the reconstructed pixel value in the middle of the interval of possible original values.

In the case where the original pixel value is 59 and Q = 4 we obtain:

59 = 0011 1011 (binary)
59 >> Q = 0011 = 3
(59 >> Q) << Q = 0011 0000 = 48                    (4.1)
(59 >> Q) << Q + 2^(Q-1) - 1 = 0011 0111 = 55      (4.2)

Inserting one zero followed by all ones is equivalent to adding 2^(Q-1) - 1, which minimizes the quantization error. All numbers from 0011 0000 = 48 to 0011 1111 = 63 are reduced to 0011 = 3 when quantized with Q = 4, and therefore Equation 4.2 gives a number that on average is closer to all the numbers in the initial interval than the one given by Equation 4.1.
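The two rules used in the example can be written as small helpers; a minimal sketch of what the text describes (right shift by Q, then left shift plus 2^(Q-1) - 1 on reconstruction).

    def quantize(pixel, q):
        # Discard the q least significant bits (equivalent to dividing by 2**q).
        return pixel >> q

    def reconstruct(value, q):
        # Shift back and insert one zero followed by all ones, i.e. add 2**(q-1) - 1,
        # placing the result in the middle of the interval of possible originals.
        return (value << q) + ((1 << (q - 1)) - 1 if q > 0 else 0)

    q = 4
    print(quantize(59, q))                   # 3
    print((59 >> q) << q)                    # 48, plain rescaling (Equation 4.1)
    print(reconstruct(quantize(59, q), q))   # 55 (Equation 4.2)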

The value of Q that ADPCM will use is determined by the truncated bit encoding system that compresses the residuals, reducing their bit length. The system is described in the following section.

Truncated Bit Packaging

After applying ADPCM, the value of each pixel is transformed into the difference between the pixel itself and its predecessor along the path defined in the previous section. Thanks to the high spatial redundancy, most of the pixels in a block have similar values, therefore generating small residuals. In [18] the authors present a graph with the probability distribution of the residuals computed with different techniques: it is clear that the majority of them have very small values concentrated around zero, so a proper encoding technique needs to be chosen to reduce the bit length. Indeed, no compression would be achieved without an encoding system that reduces the number of bits required to store the block. Since residuals can be either positive or negative, the chosen encoding technique needs to be able to efficiently reduce the bit length of both positive and negative numbers, as long as their absolute value is small. Negative binary numbers are represented in two's complement and therefore always need all the bits allowed by the system's dynamic range, which for HEVC means 8, 10, or 12 bits. In this case, a proper conversion formula should be defined so that small negative numbers only need a few bits too.

Comparison of Already Proposed Encoding Techniques

In the literature, different approaches have been taken to shorten residuals and therefore obtain good compression ratios. Most of the techniques rely on variable-length codes (VLC), encoding systems that assign codes of different lengths to residuals based on their value or probability. The solution implemented in [22] makes use of the Huffman VLC. The theory behind Huffman coding says that the most common numbers, those that appear more often, carry less information than numbers that appear rarely, and therefore they need a different number of bits to be encoded: common numbers can be encoded with just a few bits, while rare numbers require more bits. The association between numbers and the corresponding codes is defined through the Huffman tables, conversion look-up tables that contain the number-code pairs. The table is generated by analyzing all the possible numbers and their probability, creating a tree structure.

The two numbers with the smallest probability are grouped together and their probabilities are summed to create a new probability value; the process is repeated over and over again in order to create a tree structure. The following step is to start from the root of the tree and assign a zero and a one to the two branches that originate at each bifurcation. Once all the branches have been labeled with zeros and ones up to the leaves, the code corresponding to each number can be generated by reading the sequence of bits from the root of the tree to the corresponding leaf. Huffman coding ensures a great compression, but to get the best results the table must be tailored to the particular data set that needs to be compressed. This operation is long and computationally expensive, so the tables are usually generated using some sample data and then stored to be reused for encoding and decoding. Even though Huffman provides one of the best compression ratios, it has the disadvantage that every encoding or decoding operation needs to access the table to pass from the number to the code and vice versa. Since the table contains as many entries as there are possible numbers, an 8-bit system will require a 256-entry table. A table of such a dimension needs to be stored in a memory, and therefore every encoding or decoding operation will require a memory access, slowing down the system. Furthermore, a VLC is most of the time shorter than the original number, but the codes corresponding to rare numbers can be much longer than the number value itself. This can reduce the efficiency of the recompression algorithm and require high Q values in order to make the block fit into a predetermined space, therefore strongly reducing the quality.

The same problem is shared by other VLC encoding techniques such as Golomb-Rice, implemented in [23]. This algorithm assigns an increasing bit length as the numbers grow in value, therefore matching the smallest codes with the smallest numbers. The conversion algorithm works with positive numbers only, so negative numbers must be remapped in order to adapt it to encode residuals. One of the most commonly implemented conversions consists in shifting the absolute value of the number left by one bit and then subtracting one if the number was negative. The mathematical rule is:

y = 2x          if x >= 0
y = -2x - 1     if x < 0

In this way the (-128, 127) interval is transformed into the (0, 255) interval and numbers with small absolute value are mapped to small positive numbers. In Section 4.4 the computational effort of this operation will be analyzed, showing that a simple hardware solution can implement it. Golomb-Rice encoding also uses a parameter k that defines the division factor used inside the algorithm. A smaller k value generates smaller codes for small numbers, but the bit length also grows faster as the number value increases. A larger k value limits this growth, but small values get longer codes. The usual values of k are 1 and 2, and the choice between the two is generally made based on the expected magnitude of the numbers to encode. As discussed for Huffman encoding, a VLC code can be much longer than the original number and this can reduce the compression performance.
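As an illustration of the sign remapping and of how the Golomb-Rice code length depends on k, the following C sketch assumes the standard zigzag mapping and power-of-two (Rice) codes; the helper names are hypothetical and the code is not taken from the thesis implementation.

```c
#include <stdint.h>

/* Map a signed residual to an unsigned value: 0, -1, 1, -2, 2, ...
 * become 0, 1, 2, 3, 4, ... so small magnitudes stay small. */
static uint32_t to_unsigned(int32_t x)
{
    return (x >= 0) ? (uint32_t)(2 * x) : (uint32_t)(-2 * x - 1);
}

/* Inverse mapping used by the decoder. */
static int32_t to_signed(uint32_t y)
{
    return (y & 1u) ? -(int32_t)((y + 1u) >> 1) : (int32_t)(y >> 1);
}

/* Length in bits of the Rice code of y with parameter k:
 * unary quotient (q ones plus a terminating zero) plus k remainder
 * bits.  A small k is short for small y but grows quickly with y. */
static uint32_t rice_code_length(uint32_t y, uint32_t k)
{
    uint32_t quotient = y >> k;
    return quotient + 1u + k;
}
```

With k = 1, for instance, rice_code_length(0, 1) is 2 bits while rice_code_length(20, 1) is already 12 bits, which shows how quickly rare large residuals become expensive.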

Exponential Golomb is a variation of Golomb-Rice encoding that reduces the growth speed of the bit length, at the expense of longer codes for small numbers and a more complex algorithm. The compression efficiency of VLC systems comes from the fact that the length of the codes adapts to the value to encode, but a variable-size compression does not allow direct access to the data. Independently from the path used for ADPCM, it would be impossible to exploit parallelism to decode a block because the size of the encoded residuals, as well as their position in the bitstream, would be unknown. VLC decoding is a highly serialized operation and thus it blocks any form of parallelization. A technique that generates fixed-size codes is truncated bit packaging, as discussed in [18]. The original values are converted into all-positive numbers using the same conversion as before and then encoded by simply expressing each of them with the minimum number of bits that can represent all of them. If all the residuals of a block lie in the interval (-16, 15), they are converted into the interval (0, 31) and therefore they can all be represented using only five bits. The great advantage of this solution is that all residuals have a known bit length, so their position in the bitstream can be easily computed and directly accessed. This feature is fundamental to allow parallelization and to exploit the proposed ADPCM path system at its best. For this reason the truncated bit packaging (TBP) system is our choice for the encoding algorithm. Despite its good performance, the algorithm is less flexible than VLC solutions and therefore needs adaptations. The following paragraphs explain the proposed modifications to the standard TBP; a minimal sketch of the basic bit-length computation is shown below.
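The core of TBP is finding, for each block, the smallest bit width that covers every remapped residual. The following C sketch of that computation uses hypothetical helper names and assumes the residuals have already been remapped to non-negative values; it is an illustration, not the thesis code.

```c
#include <stdint.h>

/* Number of bits needed to represent the unsigned value v. */
static uint32_t bit_width(uint32_t v)
{
    uint32_t bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;            /* by this convention, 0 needs 0 bits */
}

/* Minimum per-residual bit length for a block of n remapped
 * (all-positive) residuals: the width of the largest value. */
static uint32_t tbp_block_bits(const uint32_t *residuals, int n)
{
    uint32_t max_bits = 1;  /* at least one bit per residual */
    for (int i = 0; i < n; i++) {
        uint32_t w = bit_width(residuals[i]);
        if (w > max_bits)
            max_bits = w;
    }
    return max_bits;
}
```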

ADPCM computes a first set of residuals with Q = 0 using both orientations of the path, and TBP analyzes the residuals to compute the maximum number of bits required. If neither path generates residuals that can be encoded in the predetermined space set by the desired CR, then Q is increased and the process restarts. If a path generates residuals that fit into the allocated space, the orientation that requires fewer bits is selected and used for the final encoding and packaging. A code sketch of this loop, combined with the adaptations described in the next paragraphs, is given after the small error adjustment discussion.

Position Based Length Adaptation

The main problem of TBP is its lack of flexibility: if all the residuals can be encoded with 4 bits but one requires 5 bits, then all residuals are encoded with 5 bits, forcing an unnecessarily larger size for the entire block. Given a 4x4 block, passing from 4 to 5 bits because of a single residual causes an increase of 15 bits, so a smarter solution is needed. As discussed before, images usually present a microscopic pattern of horizontal or vertical stripes. This means that pixels along a line have very similar values, while pixels on different lines present larger differences even when placed next to each other. Just as the Golomb-Rice k value is adapted in [23] when jumping from one line to the next, TBP can be modified to adapt as well. The proposed solution is tailored to the proposed ADPCM path, as shown in Figure 5. The path on the left block better adapts to an area with horizontal stripes, and the path on the right to vertical stripes. This means that we can expect small residuals while moving along the green sub-paths, while larger residuals will be computed along the red sub-path.

Since we can already expect these two different groups of residuals, TBP can be adapted by allowing one more bit for residuals on the red line than for residuals on the green lines. In the case of a 4x4 block this solution always requires 3 additional bits even when they are not needed, but if the additional bit avoids increasing the bit length of the entire block then the saving is 12 bits.

Small Error Adjustment

As mentioned before, the major problem of TBP is its lack of flexibility: if a residual needs one more bit than all the others, it forces the entire block to use one more bit. Given a target CR that requires residuals to be encoded with at most 4 bits, all residuals should stay inside the (-8, 7) interval. If one residual has a value of -9, it cannot fit into 4 bits and it would require an increase of Q. Supposing Q passes from 0 to 1 because of that residual, an error of magnitude 1 could be introduced on one or more residuals; if Q passes from 1 to 2 or even higher values, the additional error is even larger. The proposed modification is to recognize when a residual exceeds the required bit length only by a small amount and to correct the excess. In the previous example, if the value -9 is recognized, the residual is adjusted to -8 so that it fits into 4 bits. If the residual were simply saturated, the introduced error would propagate throughout the entire block during reconstruction, decreasing the overall quality. To preserve quality, when a slightly non-fitting residual is recognized during the ADPCM calculations, the residual is adjusted together with the value of the pixel that generated it. In this way the following subtractions compensate the introduced error, avoiding propagation.
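Putting the pieces together, the following C sketch outlines the encoder-side decision loop: residuals are computed for both path orientations at increasing Q until one orientation fits the budget, one extra bit is allowed on the cross ("red") sub-path, and residuals that miss the range by a single step are clamped instead of forcing a larger Q. The block geometry, the helper names, and the stubbed compute_residuals() are assumptions made for illustration only; this is a sketch of the idea, not the thesis implementation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_PIXELS    16   /* assumed 4x4 block                        */
#define CROSS_RESIDUALS 3    /* assumed residuals on the "red" sub-path  */
#define MAX_Q           7

/* Stand-in for the real ADPCM pass: fills the signed residuals of one
 * path orientation at quantization level q (index 0 is the raw pixel). */
static void compute_residuals(int orientation, int q,
                              int32_t res[BLOCK_PIXELS])
{
    (void)orientation; (void)q;
    memset(res, 0, BLOCK_PIXELS * sizeof res[0]);
}

/* True if every residual fits in n bits (n + 1 on the cross sub-path),
 * clamping values that miss the range by exactly one step instead of
 * forcing a larger Q (small error adjustment).  The thesis additionally
 * corrects the reconstructed pixel so the clamping error does not
 * propagate along the path. */
static bool fits_budget(int32_t res[BLOCK_PIXELS], uint32_t n)
{
    for (int i = 1; i < BLOCK_PIXELS; i++) {
        uint32_t bits = (i <= CROSS_RESIDUALS) ? n + 1 : n;
        int32_t lo = -(1 << (bits - 1));       /* e.g. -8 for 4 bits */
        int32_t hi =  (1 << (bits - 1)) - 1;   /* e.g.  7 for 4 bits */
        if (res[i] < lo - 1 || res[i] > hi + 1)
            return false;                      /* needs a larger Q   */
        if (res[i] < lo) res[i] = lo;          /* small adjustment   */
        if (res[i] > hi) res[i] = hi;
    }
    return true;
}

/* Increase Q until one path orientation fits the per-residual budget;
 * if both orientations fit at the same Q, the real encoder keeps the
 * one that needs fewer bits.  Returns the selected Q, or -1 on failure. */
static int choose_q_and_path(uint32_t n, int *orientation)
{
    int32_t res[BLOCK_PIXELS];
    for (int q = 0; q <= MAX_Q; q++)
        for (int o = 0; o < 2; o++) {
            compute_residuals(o, q, res);
            if (fits_budget(res, n)) { *orientation = o; return q; }
        }
    return -1;
}
```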

The proposed modification purposely introduces an error, but if that error is smaller than the additional quantization error that a larger Q would introduce, then the small error adjustment is worthwhile and helps preserve the block quality.

Adaptive Compression Ratio

In an earlier section we explained how we tried to take advantage of the availability of other frames in the buffer in order to obtain better results. We discussed why a temporal prediction technique cannot be efficient, but the fact that more than one frame is stored in the frame buffer at the same time gives us the possibility to gain some flexibility. Our proposed solution works with a fixed CR, so the number of bits allowed for TBP is fixed, even though for certain blocks it could be larger than what is actually needed. A fixed CR is important in order to know the maximum memory size requirement and to directly access any block in the reference frames, but it also reduces the flexibility of the algorithm. Some flexibility would allow us to better adapt to the data that needs to be compressed and therefore to better preserve the quality of the images. Given a fixed CR that imposes a maximum of 4 bits per residual, a frame can contain blocks that only require 3 or fewer bits to be encoded. The proposed solution is to encode a block with the minimum bit length it requires and to lend the saved space to the same block in a following frame. If the following frame has a block in the same position that requires more than 4 bits, it can use the additional available space and be encoded without increasing Q, thus preserving some quality.
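A minimal sketch of the bit-budget bookkeeping this implies, in C; the per-block-position slack counter and the helper name are assumptions made for illustration and do not reflect the thesis data structures.

```c
#include <stdint.h>

#define NUM_BLOCKS 3600      /* assumed number of blocks per frame     */
#define FIXED_BITS 4         /* per-residual budget from the fixed CR  */

/* Slack (in bits per residual) left unused by the previously stored
 * frame at each block position, available to the co-located block of
 * the next frame. */
static uint8_t slack[NUM_BLOCKS];

/* Decide the per-residual bit length granted to block `pos`, given the
 * minimum width `needed` computed by TBP.  Unused bits are recorded as
 * slack; a block that needs more than the fixed budget may consume the
 * slack left by the previous frame instead of increasing Q.  If even
 * the combined budget is too small, Q still has to be raised. */
static uint8_t grant_bits(int pos, uint8_t needed)
{
    uint8_t budget  = (uint8_t)(FIXED_BITS + slack[pos]);
    uint8_t granted = (needed <= budget) ? needed : budget;
    slack[pos] = (uint8_t)(budget - granted);  /* carried to the next frame */
    return granted;
}
```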

As already discussed, a variable CR means that the size of the blocks is unknown, so they cannot be accessed directly because their position is unknown. A trade-off between a fixed and a flexible CR system can be found by reorganizing how the frames are packed and stored into the DPB. Instead of storing all the blocks of the first frame followed by all the blocks of the second frame and so on, the proposed packing solution stores the first block of all the frames at the beginning of the memory, followed by all the second blocks, and so on. In this way, the fixed CR no longer fixes the maximum size of a single block, but the maximum size of the group of co-located blocks. Figure 6 and Figure 7 show the organization of the memory as it is usually done and as it is proposed. The number inside the cells corresponds to the block number (1 to 20), while the cell background indicates which frame the block belongs to (diagonal, horizontal, or vertical stripes). Whereas a globally variable CR would require reading the whole compressed image from the beginning until the desired block is found, with this approach we always know where to go to access a certain block, and we only have to find the desired block inside its group. Since the DPB can hold at most 16 frames, we will need to read at most 16 times the size of a compressed block. The proposed system introduces an overhead by increasing the amount of data that needs to be read from memory, but depending on the application this can still be an interesting solution. If a specific application prefers to focus on reducing the memory size rather than the memory bandwidth, this solution can guarantee higher quality preservation, especially when the DPB holds many frames.
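A sketch of the resulting addressing in C: the group base address is a direct computation, while locating one frame's block inside the group requires skipping at most 16 stored blocks. The 1-byte per-block size field and the constants are illustrative assumptions, not the thesis memory map.

```c
#include <stdint.h>

#define MAX_FRAMES      16     /* maximum DPB capacity                      */
#define MAX_BLOCK_BYTES 10     /* assumed worst-case compressed block size  */

/* Start address of the group holding block `blk` of every frame.
 * Groups have a fixed maximum size, so this is a direct computation. */
static uint32_t group_base(uint32_t blk)
{
    return blk * MAX_FRAMES * MAX_BLOCK_BYTES;
}

/* Locate block `blk` of frame slot `frame` by scanning the group:
 * each stored block is assumed to start with a 1-byte size field, so
 * at most MAX_FRAMES entries are skipped, never the whole frame. */
static uint32_t block_address(const uint8_t *dpb, uint32_t blk,
                              uint32_t frame)
{
    uint32_t addr = group_base(blk);
    for (uint32_t f = 0; f < frame; f++)
        addr += 1u + dpb[addr];        /* skip size field + payload */
    return addr;                       /* points at the size field  */
}
```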

Figure 6: Classical buffer memory organization

4.3 Storage Analysis

Thanks to the fixed CR it is possible to establish a maximum buffer memory size and to be sure that the data will always fit inside that boundary. Since the TBP system only allows an integer number of bits, not all block sizes are possible and therefore only certain compression ratios exist. Given a video sequence with 8 bits of depth, an L x L block size, and a maximum bit length of n for general residuals, the block size will be:

B = 1 + 3 + 8 + n(L^2 - 1) + (L - 1)

Figure 7: Proposed buffer memory organization

where:

- 1 is due to the ADPCM path orientation selection
- 3 is due to the Q value signaling
- 8 is due to the first pixel, which is not differentially encoded
- n(L^2 - 1) is the space occupied by the residuals
- (L - 1) accounts for the additional bit available for the first line or row

The compression ratio will then be defined as:

CR = B / (8 L^2) = (1 + 3 + 8 + n(L^2 - 1) + (L - 1)) / (8 L^2)
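As a worked example under the reconstruction above (which assumes the 1 + 3 + 8 header bits are counted in B), the following short C snippet evaluates both expressions for a 4x4 block with n = 4:

```c
#include <stdio.h>

int main(void)
{
    const int L = 4;                 /* block side                   */
    const int n = 4;                 /* bits per general residual    */
    const int depth = 8;             /* bits per uncompressed pixel  */

    int B = 1 + 3 + 8 + n * (L * L - 1) + (L - 1);    /* 75 bits   */
    double CR = (double)B / (depth * L * L);          /* 75 / 128  */

    printf("B = %d bits, CR = %.3f\n", B, CR);        /* ~0.586    */
    return 0;
}
```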
