626 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 22, NO. 4, APRIL 2012


A 135 MHz 542 k Gates High Throughput H.264/AVC Scalable High Profile Decoder

Gwo-Long Li, Yu-Chen Chen, Yuan-Hsin Liao, Po-Yuan Hsu, Meng-Hsun Wen, and Tian-Sheuan Chang, Senior Member, IEEE

Abstract: To satisfy the requirements of heterogeneous applications, scalable video coding, the latest H.264/AVC-based video coding standard, additionally includes temporal, SNR, and spatial scalabilities for frame rate, quality, and frame resolution adaptation. However, these additions significantly increase chip design difficulties in terms of decoding time, memory bandwidth, and area cost. This paper presents an H.264/AVC scalable high profile decoder realization with several optimization techniques to provide high throughput video decoding. For the decoding flow, this paper proposes a one-pass macroblock-based quality layer decoding flow for SNR scalability that saves 71% of external memory bandwidth and 66% of macroblock processing cycles. For texture padding in interlayer intra prediction, the modified padding flow saves 26% of decoding time. For the interlayer predictor design, this paper proposes a centralized concept for accumulation-based calculation of the corresponding spatial position, a simplified poly-phase interpolator, and an efficient motion vector generator to save area cost and decoding time. Furthermore, a residual reconstruction path with a parallel-pipeline architecture is proposed to cope with the additional decoding complexity, which leads to 54% gate count savings compared to the traditional serial-pipeline architecture. Finally, the proposed H.264/AVC scalable high profile decoder is implemented in 90 nm CMOS technology; it costs 542 k gates and Kbytes on-chip memory, and is capable of decoding 60 frames/s for CIF+SD480p+HD1080p resolution with three quality layers at a 135 MHz operating frequency.
Index Terms: Scalable video coding (SVC), SVC decoder, very large scale integration (VLSI) design.

Manuscript received March 18, 2011; revised June 13, 2011 and July 26, 2011; accepted September 10. Date of publication October 10, 2011; date of current version April 2. This paper was recommended by Associate Editor R. C. Lancini.
G.-L. Li is with the Industrial Technology Research Institute, Hsinchu 31040, Taiwan (glli@itri.org.tw).
Y.-C. Chen is with the VLSI Signal Processing Laboratory, National Chiao-Tung University, Hsinchu 300, Taiwan (ycchen@dragons.ee.nctu.edu.tw).
Y.-H. Liao and M.-H. Wen are with PixArt Imagine, Inc., Hsinchu 300, Taiwan (yhliao@dragons.ee.nctu.edu.tw; mhwen@dragons.ee.nctu.edu.tw).
P.-Y. Hsu and T.-S. Chang are with the Department of Electronics Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan (pyhsu@dragons.ee.nctu.edu.tw; tschang@dragons.ee.nctu.edu.tw).
Color versions of one or more of the figures in this paper are available online.
Digital Object Identifier /TCSVT
© 2011 IEEE

I. Introduction

WITH THE PROSPERITY of portable devices, digital televisions, and internet videos, video bitstreams are required to fit different video sizes, qualities, and frame rates. To meet these needs in a unified way, an extension of H.264/AVC called scalable video coding (SVC) [1] has recently been standardized; it can encode video sources with diverse qualities into a single bitstream and thus achieves temporal, spatial, and quality scalabilities [2]. Fig. 1 shows the block diagram of an SVC decoder, which includes the base layer of H.264/AVC and the scalable extension; the extracted bitstream with the necessary scalability layer is decoded by an entropy decoder for the following reconstruction processes. The reconstruction process uses existing H.264/AVC coding tools such as intra prediction and motion compensation for the base layer reconstruction and for the spatial and temporal scalability as well.
The quality refinement process derives the information of the different quality layers by successively accumulating scaled transform coefficients. Afterwards, the base layer information is upsampled by the corresponding interlayer prediction (ILP). Once all predictions have been derived, the pixel samples are reconstructed by adding residuals to the predicted samples. Finally, the deblocking filter is applied to remove blocking effects.

These advanced SVC techniques significantly increase the design complexity beyond the intrinsic complexity of H.264/AVC. Therefore, many works have been published recently to address the real-time hardware design issues. First, the memory storage issue due to the additional SVC data dependency is addressed in [3] and [4]. Reference [3] analyzed the spatial layer decoding flow and concluded that the frame-based decoding flow is the most memory efficient one. Reference [4] proposed a memory architecture for SVC with a focus on reducing the on-chip memory size. For interlayer prediction, [5] proposed a cost-efficient residual prediction hardware architecture for encoding. However, these works addressed only individual components rather than optimizing the overall SVC system. For whole-chip integration, the designs in [6]-[11] implemented and optimized H.264/AVC-only decoders with a focus on power consumption, on-chip memory demand, or gate count. Although these designs can support high visual quality applications, they do not support the scalabilities specified in SVC, so no application adaptation can be achieved. References [12] and [13] were the only two published works on the integration of an SVC decoder. However, these designs did not support combined scalabilities and supported only a 30 f/s frame rate, which is not enough for high visual quality video applications.

Fig. 1. Block diagram of an SVC decoder.

Fig. 2. Frame-based quality layer decoding flow.

To achieve high performance video decoding, we propose an H.264/AVC scalable high profile decoder with several advanced techniques to reduce the hardware costs and memory requirements. The proposed decoder not only supports the main features specified in the scalable high profile but also attains the following processing specifications:
1) decoding of H.264/AVC scalable high profile;
2) at most three spatial layers from QCIF to HD1080p, CIF+SD480p+HD1080p at 60 f/s (equivalent to at 51 f/s) for high visual quality applications;
3) at most three quality layers for any QP value setting;
4) all GOP sizes smaller than or equal to eight;
5) arbitrary spatial resolution ratios between spatial layers with extended spatial scalability (ESS).

The rest of this paper is organized as follows. Section II presents the analysis of the proposed H.264/AVC scalable high profile decoder and the overall hardware architecture design. The detailed architecture is described in Section III. Section IV shows implementation results and comparisons with other works. Finally, a conclusion is given in Section V.

II. Analysis and Architecture Overview of the Proposed H.264/AVC Scalable High Profile Decoder

The performance and cost of a video decoder design mainly depend on the adopted decoding flow and the corresponding memory access. In the following, we first analyze different decoding flows and select the most memory efficient one with memory access taken into consideration. With the selected flow, we then present the proposed four-stage pipelined H.264/AVC scalable high profile decoder.

A. Analysis of Decoding Flow

Among the three scalabilities of SVC, the temporal scalability adopts the hierarchical-B structure, and thus its decoding flow is the same as traditional B-slice decoding.
The spatial scalability with interlayer prediction is the most distinct part between H.264/AVC and SVC. In our previous analysis [3], the frame-based spatial decoding flow performs better than other approaches such as the row-based or macroblock (MB)-based approach. Therefore, the frame-based spatial decoding flow is adopted in this design.

In SVC, quality scalability with coarse-grain scalability sums the coefficients from the base layer and the coefficient differences of the enhancement layer, decoded by the entropy decoder, for reconstruction. The main design challenge is the complexity caused by the increasing number of quality layers. To deal with the coefficients of all quality layers, the decoding flow plays an important role in both processing time and memory access. Therefore, we first analyze two commonly used decoding flows, the frame-based and the MB-based flow, and then present our proposed one-pass quality layer decoding flow for a memory efficient design.

Fig. 2 shows the frame-based quality layer decoding approach, in which the enhancement layer frames are reconstructed after the frames in the base layer have been decoded. However, this flow introduces significant external memory access overhead, since the required reference data must be loaded (I/P slice in top spatial layer) to generate the predictions. In this case, the external memory access for prediction reference data is doubled when multiple quality layers are supported. In addition, the base layer coefficients also contribute to the external memory access overhead, since they must be stored for later reference in the enhancement layer. In summary, an external memory space of bytes is required for the frame size of HD1080p.

An alternative is the MB-based decoding approach [14]. In this method, the coefficients of different quality layers of the same MB are reconstructed in successive order, as shown in Fig. 3.
Thus, there is no need to store the quality coefficients in external memory, since they are referenced immediately within the same MB decoding process. Instead, internal memory is used to store the transform coefficients (384 × 2 bytes) for quality enhancement layer reconstruction. As a result, no external memory access for quality coefficients is required in the MB-based quality layer decoding approach. The only data to be accessed from external memory are the reference data for generating the predicted pixels. Although the external memory access overhead can be saved in the MB-based decoding flow, this approach still suffers from long decoding latency due to the entropy decoding and residual reconstruction of the additional quality layers.

Fig. 3. MB-based quality layer decoding flow.

TABLE I. Memory Bandwidth Requirement for Different Quality Decoding Flows
Columns: sequence, coding flow, prediction reference, quality coefficients, others, total (MB/s).
Sequences: Blue-Sky, Tractor, Pedestrian-area, and average; coding flows: frame-based, MB-based, one-pass.
Average total: frame-based 237 MB/s; MB-based 76 MB/s; one-pass 69 MB/s.
GOP: 8; frame rate: 30 f/s; QP: ; frame size: CIF-480p.
Others: the interlayer data or frame pixels to be written out.

Fortunately, all of the coefficients in each quality layer can be parsed individually, since the entropy decoding process of each quality layer is independent. Therefore, a parallel entropy decoding mechanism is adopted here to achieve higher entropy decoding throughput. In addition, with the combined scalability concept [15], the residual reconstruction path can be integrated into single MB processing. With the above flow, we propose a one-pass quality layer decoding flow, in which the quality decoding is done in one pass for all quality layers, as shown in Fig. 4. Compared to the frame-based and MB-based decoding flows, 66% of the cycles can be saved when supporting three quality layers (only 1/3 of the processing cycles are required), since the proposed one-pass flow processes three quality layers at the same time, whereas the frame-based and MB-based flows decode the quality layers one by one. Furthermore, since the quality layers are processed in parallel, the prediction generation can proceed in parallel with residual reconstruction, which leads to single-loop prediction generation.
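The savings quoted for the one-pass flow can be checked with a few lines of arithmetic. The sketch below uses only the average totals that Table I reports (237, 76, and 69 MB/s) and the three-quality-layer target; the helper name `saving` and the simple "one pass instead of three" cycle model are assumptions made for illustration.

```python
# Average total external-memory bandwidth per decoding flow, in MB/s,
# taken from the "Average" rows of Table I.
AVG_TOTAL_MBPS = {"frame-based": 237, "MB-based": 76, "one-pass": 69}

# Number of quality layers in the target specification.
QUALITY_LAYERS = 3


def saving(baseline: float, proposed: float) -> float:
    """Fraction of the baseline cost that the proposed flow avoids."""
    return 1.0 - proposed / baseline


bw_saving_vs_frame = saving(AVG_TOTAL_MBPS["frame-based"], AVG_TOTAL_MBPS["one-pass"])
bw_saving_vs_mb = saving(AVG_TOTAL_MBPS["MB-based"], AVG_TOTAL_MBPS["one-pass"])

# Assumed cycle model: a layer-by-layer flow needs QUALITY_LAYERS passes
# per MB, while the one-pass flow needs a single pass.
cycle_saving = saving(QUALITY_LAYERS, 1)

if __name__ == "__main__":
    print(f"bandwidth saving vs frame-based: {bw_saving_vs_frame:.1%}")
    print(f"bandwidth saving vs MB-based:    {bw_saving_vs_mb:.1%}")
    print(f"cycle saving with {QUALITY_LAYERS} layers:     {cycle_saving:.1%}")
```

Under this model, the bandwidth saving relative to the frame-based flow rounds to 71% and the cycle saving is two thirds, in line with the figures stated in the abstract.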
Therefore, the prediction reference data (i.e., reference pixels for motion compensation, base layer texture, base layer residual, and so on) are fetched only once from external memory in single MB processing. Table I shows the external memory bandwidth requirements of the above quality decoding flows; the results are derived by calculating the required data amount from simulations with the reference software JSVM 9.14. Compared to the frame-based quality decoding flow, the proposed one-pass quality decoding flow saves 71% of the memory bandwidth (237 MB/s versus 69 MB/s on average in Table I) and 66% of the processing cycles on average, as shown in Fig. 5.

Fig. 4. Proposed one-pass quality layer decoding flow.

In summary, the most memory efficient decoding flow is the frame-based spatial layer decoding with one-pass quality layer decoding, which is adopted in our architecture design.

B. Overview of Proposed Architecture

With the above decoding flow, Fig. 6 shows the overall four-stage pipeline architecture of our high throughput H.264/AVC scalable high profile decoder, which is briefly described as follows.

The first stage consists of parallelized entropy decoding components and a syntax parser; at most three DCT coefficients from three quality layers are parsed in this stage to support one-pass quality layer decoding. However, by adopting our previously proposed high throughput entropy decoding [16], [17], we need only two sets of context-based adaptive variable length coding (CAVLC)/context-based adaptive binary arithmetic coding (CABAC) decoders, which results in less hardware cost. In addition, the motion vector difference (MVD) is derived in this stage and passed to the next stage along with the parsed syntax parameters.

The second stage consists of the residual reconstruction path, the interlayer predictor, the motion vector generator, and other reconstruction elements.
In this stage, the coefficients received from the first stage are fed into the reconstruction path, that is, the inverse quantization, coefficient refinement, inverse transform, and residual accumulation pipeline chain, to derive the residual information. In this path, three sets of reconstruction pipeline chains are used to produce the residuals of different quality layers in parallel for one-pass quality layer decoding. The ILP module generates interlayer predictions by upsampling the information of the base layer, including motion vectors, reconstructed pixels, and residuals. The ILP interpolator is involved in the pipeline chain to obtain the final residuals if the residual prediction mode is applied to the current MB. In addition, the MV generator generates the motion vectors for the bi-directional reference of the current MB. With the derived motion vectors and partition sizes, the reference pixels for motion compensation can be fetched at once.

TABLE II. Comparison of Serial- and Parallel-Pipeline Strategy
Items: inverse quantization; coefficients refinement; inverse transform; residual accumulation; residual/texture input selector; interpolation (basic interpolator, Ver, Hor); total.
Memory requirement: 768 bytes (serial-pipeline); 0 (parallel-pipeline).
Synthesized by 90 nm CMOS process at 135 MHz operating frequency.

Fig. 5. Comparison of quality decoding flows in (a) memory bandwidth and (b) MB processing cycles.

The third stage generates the predicted pixels of both intercoded and intracoded modes. The produced predictions are then added to the residuals from the second stage to form the predeblocking samples. In the fourth stage, these samples are filtered by the deblocking filter and padded by the texture padding module for interlayer upsampling. In this design, the pipeline stages are separated by pipelined ping-pong buffers for interleaved read and write.

To support high visual quality applications at a reasonable operating frequency, we next discuss the cycle limit for designing our system.
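The cycle limit is the number of clock cycles available per macroblock: the clock frequency divided by the number of MBs that must be decoded each second. The sketch below evaluates it for the 135 MHz, 60 f/s, CIF+SD480p+HD1080p target. The pixel dimensions (352 × 288, 720 × 480, 1920 × 1080) and the rounding of each frame up to whole 16 × 16 MBs are assumptions that the text does not state.

```python
CLOCK_HZ = 135_000_000   # operating frequency of the decoder
FRAME_RATE = 60          # target frames per second
MB_SIZE = 16             # macroblock width and height in luma samples

# Assumed luma dimensions of the three spatial layers (not given in the text).
LAYERS = {"CIF": (352, 288), "SD480p": (720, 480), "HD1080p": (1920, 1080)}


def macroblocks(width: int, height: int) -> int:
    """Number of MBs covering a frame, rounding partial MBs up."""
    cols = (width + MB_SIZE - 1) // MB_SIZE
    rows = (height + MB_SIZE - 1) // MB_SIZE
    return cols * rows


# All spatial layers of one access unit must be decoded within a frame period.
total_mbs = sum(macroblocks(w, h) for w, h in LAYERS.values())

# Whole clock cycles available to each pipeline stage for one MB.
cycle_limit = CLOCK_HZ // (total_mbs * FRAME_RATE)

if __name__ == "__main__":
    print("MBs per access unit:", total_mbs)
    print("cycle limit per MB: ", cycle_limit)
```

Under these assumptions the result agrees with the 227-cycle limit used throughout this paper.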
Instead of designing our system for the highest possible operating frequency, we select an operating frequency of 135 MHz, since it is a multiple of 27 MHz (a commonly used base frequency in current video system design) and is easily achievable with modern 0.13 μm or 90 nm CMOS technology. With the above design and the target specification, the cycle limit for each pipeline stage is

$$227 = \frac{\text{Frequency}}{\text{total MBs in one frame} \times \text{frame rate}} \tag{1}$$

cycles for single MB decoding at the selected 135 MHz operating frequency. Under this clock cycle constraint, the optimization techniques are proposed with suitable parallelism and pipelining.

Fig. 6. Architecture of proposed decoder.

III. Detailed Architectures of the Proposed Decoder

A. First Stage Design

In our decoder design, the first stage consists of the entropy decoder, including the syntax parser and the CAVLC and CABAC decoders. Fig. 7 shows the system level architecture of the proposed entropy decoder, in which dual hardware units from our previous works [16], [17] are adopted to support high decoding throughput. In addition, to support the extra scalabilities in SVC, we have made some modifications to the overall entropy decoding process. First, we adopt a bitstream scanner that quickly detects the start points of the quality enhancement layers and nonquality enhancement layers by recognizing the start code, so that the two entropy decoding engines can work in parallel. Afterwards, the addresses are transmitted to the bitstream fetcher to access the quality enhancement layer bitstream separately from the nonquality enhancement layer bitstream. Second, to reduce the hardware cost overhead, the CABAC decoder for the quality enhancement layer is simplified by removing the context models that are unused in quality enhancement layers, since the MB information is inherited from the quality base layer when decoding the quality enhancement layers. As a result, bits memory space can be saved for storing upper MB information such as mvd and mb_type. As to the CAVLC decoder engine, no simplification can be applied, since it is designed for decoding residual block information.

Fig. 7. Detailed framework of proposed entropy decoder.

TABLE III. Symmetry of Coefficient Table: (a) Bi-Linear Filter, (b) 4-Tap Filter
Columns: (a) phase and coefficients; (b) phase and interpolation filter coefficients.

TABLE IV. Input Selection and Classification for Adders
Columns: mode, phase, and the inputs (in0, in1, in2) of Adder A, Adder B, Adder C, and Adder D, classified into Set A, Set B, and Set C.

Fig. 8. Residual reconstruction flow for SNR scalability.

B. Second Stage Design

In this stage, the interlayer prediction and related prediction modules are realized to derive the prediction information and residuals for the following reconstruction. Interlayer prediction includes interlayer texture prediction, interlayer residual prediction, and interlayer motion prediction. Moreover, to support nondyadic spatial scalability, a more complex mechanism, ESS [18], has been adopted to achieve nondyadic interlayer prediction. The interlayer prediction process with ESS includes two steps. The first step is the operation called calculation of corresponding spatial positions (CCSP), which computes the corresponding position in the base layer (BL). The second step executes the interlayer intra or residual prediction.
In the following subsections, the detailed design principles of the proposed ILP module are described.

1) Interlayer Residual Prediction Module (ILresP): Fig. 8 shows the typical residual reconstruction flow for SNR scalability. Coefficients in the enhancement layer are reconstructed by summing the delta coefficients from the previous quality layers. Then, these coefficients are summed with the interlayer residuals to derive the residuals of the different quality layers. This multilevel coefficient summation path makes the video quality scalable for different requirements, but at the cost of a complex reconstruction path.

To speed up the above reconstruction, a straightforward design is to use pipelining, as shown in Fig. 9. Each pipeline stage deals with a block of samples and passes its results to the next stage in every cycle. After the first four cycles, the residuals of a block from the same MB are generated in successive cycles. A straightforward pipelined design, called the serial-pipeline strategy, reconstructs the quality refinement coefficients of the different layers in serial order within different timing intervals. By reusing the computations in every quality layer, only one set of pipeline processing units is required. However, a coefficient refinement buffer (768 bytes in Table II) is required to store the coefficients of the quality base layer. In addition, to meet the 227-cycle timing constraint, the processing throughput has to be eight pixels per cycle in the serial-pipeline strategy.

Fig. 9. Pipeline chain of residual reconstruction.

TABLE V. List of Gate Count of Proposed Decoder
Modules: entropy decoder + syntax parser; motion compensation; deblocking filter; interlayer prediction (centralized CCSP, texture/residual upsample, MV upsample, external data buffer); residual reconstruction (inverse DCT and Hadamard transform, inverse quantization, reconstruction + control: 4095); intraprediction (prediction generator + control, neighboring pixel buffer: 7490); texture padding (padding unit, neighboring pixel buffer: 5472); memory controller; system control: 2001; total.
Synthesized by UMC90 at 135 MHz.

TABLE VI. List of SRAM Requirement of Proposed Decoder (Unit: Kbyte)
Columns: module, single port, dual/two port.
Modules: entropy decoder + syntax parser; motion compensation; deblocking filter (neighboring pixels (Q0+Qmax), others); interlayer prediction: 4; intraprediction (luma neighboring data, chroma neighboring pixels); texture padding; pipeline ping-pong buffer (stage: coefficients (Q0+Q1+Q2), MVDs; stage: residuals (Q0+Qmax), MC reference pixels, MVs; stage: reconstructed pels (Q0+Qmax), MVs); total; total (single port + two/dual port).

In this paper, we adopt the 4-pixel/cycle parallel-pipeline strategy, which triples the pipeline chains to reconstruct the residuals of the different quality layers separately, as shown in Fig. 10. Table II lists the synthesized gate counts and memory requirements of the major components in both strategies. From Table II, we can observe that the gate count of the serial-pipeline strategy is larger than that of the parallel-pipeline strategy. This situation can be analyzed as follows.
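The two throughput figures can be made concrete with simple cycle arithmetic. The sketch below assumes 384 residual samples per MB (256 luma plus 128 chroma in 4:2:0, matching the 384 × 2 byte coefficient buffer mentioned earlier), three quality layers, and no pipeline fill latency; the function name `cycles_per_mb` is hypothetical.

```python
CYCLE_LIMIT = 227      # cycles available per MB in each pipeline stage
SAMPLES_PER_MB = 384   # assumed 4:2:0 MB: 256 luma + 128 chroma samples
QUALITY_LAYERS = 3     # quality layers in the target specification


def cycles_per_mb(pixels_per_cycle: int, layers_in_series: int) -> int:
    """Cycles to reconstruct one MB when `layers_in_series` quality layers
    share a single pipeline chain (pipeline fill latency is ignored)."""
    per_layer = (SAMPLES_PER_MB + pixels_per_cycle - 1) // pixels_per_cycle
    return per_layer * layers_in_series


# Serial-pipeline: one chain processes the three layers one after another.
serial_at_4 = cycles_per_mb(4, QUALITY_LAYERS)
serial_at_8 = cycles_per_mb(8, QUALITY_LAYERS)

# Parallel-pipeline: three chains, each handling a single layer.
parallel_at_4 = cycles_per_mb(4, 1)

if __name__ == "__main__":
    for name, cycles in [("serial, 4 px/cycle", serial_at_4),
                         ("serial, 8 px/cycle", serial_at_8),
                         ("parallel, 4 px/cycle", parallel_at_4)]:
        verdict = "fits" if cycles <= CYCLE_LIMIT else "exceeds"
        print(f"{name}: {cycles} cycles ({verdict} the {CYCLE_LIMIT}-cycle limit)")
```

Under these assumptions, a single 4-pixel/cycle chain serving three layers in series overruns the budget, which is why the serial-pipeline strategy must double its datapath width while each chain of the parallel-pipeline strategy can stay at 4 pixels per cycle.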
First, to meet the 227-cycle timing constraint in our design, a throughput of 8 pixels/cycle has to be supported in the serial-pipeline strategy, but only 4 pixels/cycle in the parallel-pipeline strategy. Therefore, from the throughput perspective, the hardware cost of the serial-pipeline strategy is much higher than that of the parallel-pipeline strategy. In addition, although components such as inverse quantization and inverse transform are tripled for the different quality layers, the largest-area component, the interlayer prediction module, is not tripled in the parallel-pipeline strategy. This is because, with combined scalability, all quality layers use the same contents as their prediction. Thus, only one interpolator is required for the parallel reconstruction of the different quality layers. Moreover, by simplifying the most complex part of the interlayer interpolator, the total gate count of the interpolation is significantly reduced in spite of the increased gate count of the pipelined components. Furthermore, memory usage is saved in the parallel-pipeline strategy by the coefficient passing technique. As a result, the parallel-pipeline method is adopted in this paper.

Fig. 10. Parallel-pipeline chain for quality layer processing.

2) Proposed Centralized Accumulation-Based CCSP Engine: Compared to dyadic spatial scalability, nondyadic spatial scalability suffers from indirect position mapping between two spatial layers. ESS adopts the CCSP scheme to map the corresponding samples between two successive layers. First, this paper adopts the simplified form of the CCSP operation from [18]. As a result, the CCSP can be reduced to simple accumulators, which is more efficient for hardware architecture design. Second, the repeated CCSP operations can be eliminated to reduce the computational complexity.
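The idea of reducing the position calculation to simple accumulators can be sketched as follows. This is an illustrative model only, not the formula of [18]: it assumes a 16-bit fixed-point scale factor and 16 interpolation phases per base-layer sample, and it omits the offset and rounding terms of the actual SVC derivation. All function names are hypothetical.

```python
FRAC_BITS = 16   # assumed fixed-point precision of the scale factor
PHASE_BITS = 4   # assumed 16 interpolation phases per base-layer sample
PHASE_MASK = (1 << PHASE_BITS) - 1


def scale_factor(base_size: int, enh_size: int) -> int:
    """Base-layer samples per enhancement-layer sample, in fixed point."""
    return (base_size << FRAC_BITS) // enh_size


def split(ref: int) -> tuple:
    """Split a 1/16-sample position into (integer position, phase)."""
    return ref >> PHASE_BITS, ref & PHASE_MASK


def ccsp_by_multiplication(x: int, scale: int) -> tuple:
    """Direct mapping: one multiplication per enhancement-layer position."""
    return split((x * scale) >> (FRAC_BITS - PHASE_BITS))


def ccsp_by_accumulation(enh_size: int, scale: int) -> list:
    """Same mapping for positions 0..enh_size-1 using only additions."""
    acc = 0
    params = []
    for _ in range(enh_size):
        params.append(split(acc >> (FRAC_BITS - PHASE_BITS)))
        acc += scale  # running sum replaces the multiplier
    return params


if __name__ == "__main__":
    scale = scale_factor(352, 720)  # example nondyadic ratio (assumed sizes)
    # Computed once, then reusable by every interlayer prediction mode.
    params = ccsp_by_accumulation(720, scale)
    print(params[:4])
```

Because the accumulator holds exactly `x * scale` at step `x`, both functions return the same position and phase for every sample, while the accumulating version needs only an adder. Keeping the resulting list and sharing it across prediction modes corresponds to the centralized reuse described in this subsection.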
The CCSP operation calculates the corresponding position of the current MB in the BL by generating spatial information such as the interpolation phase and the corresponding MB partition. In the traditional interlayer prediction flow, the CCSP is executed several times to generate the corresponding spatial parameters for the various interlayer prediction modes. However, since these spatial parameters are the same for all interlayer prediction modes when decoding an MB, they can be stored temporarily for subsequent decoding, and the repeated computations for deriving them can thus be eliminated. Therefore, we propose a centralized CCSP approach to remove these repeated computations. In the proposed strategy, the accumulation-based CCSP is executed only once to generate the spatial parameters, including the interpolation phase and the corresponding MB partition, and the generated parameters are stored internally in the accumulation-based CCSP engine for further reuse. Afterwards, the stored spatial parameters can be used freely by all interlayer prediction modes without recalculation, which results in a flexible prediction flow and improved decoding time.

Fig. 11. Architecture of the proposed interpolator.

3) Shared Interpolation Components for Interlayer Intra and Residual Prediction: Both interlayer intra and interlayer residual prediction use the poly-phase 4-tap filter and the bilinear filter to generate the prediction samples for luma and chroma signals, respectively. These filter coefficients are symmetric, as listed in Table III, in which the phase indicates the index of the interpolation coefficient set used in the following interpolation operation. Thus, for hardware realization, half of the table contents can be removed by exchanging the inputs with the symmetric coefficient columns of the table. Fig. 11 shows our proposed hybrid interpolation module, which uses three pipeline stages to implement 4-tap and bilinear interpolation in a shared adder tree. In Stage I, the reference samples are rearranged according to the interpolation phase and filtering mode, as listed in the above table.
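The symmetry that allows half of the coefficient table to be dropped can be illustrated with the bilinear filter, whose coefficients follow directly from the phase. The sketch below assumes 16 phases with coefficient pairs (16 - p, p) and rounding by adding 8 before dividing by 16; the actual entries of Table III are not reproduced here, and the function names are hypothetical.

```python
PHASES = 16  # assumed number of interpolation phases

# Only phases 0..8 are stored; the remaining phases mirror them.
HALF_TABLE = [(PHASES - p, p) for p in range(PHASES // 2 + 1)]


def bilinear_full(a: int, b: int, phase: int) -> int:
    """Reference filter using a coefficient pair for every phase."""
    return ((PHASES - phase) * a + phase * b + PHASES // 2) // PHASES


def bilinear_half_table(a: int, b: int, phase: int) -> int:
    """Same filter using the half table plus an input exchange."""
    if phase > PHASES // 2:
        # Mirror the phase and swap the two reference samples, as the
        # Stage I rearrangement does before the shared adder tree.
        a, b = b, a
        phase = PHASES - phase
    c0, c1 = HALF_TABLE[phase]
    return (c0 * a + c1 * b + PHASES // 2) // PHASES


if __name__ == "__main__":
    for phase in (3, 13):
        print(phase, bilinear_full(100, 20, phase),
              bilinear_half_table(100, 20, phase))
```

Swapping the inputs for the mirrored phases yields the same weighted sum as a full table, so the stored coefficients shrink from 16 pairs to 9 in this model.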
In Stage II, the scaling engine produces scaled elements and classifies them into three sets. The classification strategy is listed in Table IV. Finally, Stage III uses a simple two-level adder tree to compute the interpolation values.

4) MV Generator for Interlayer Motion Prediction: To efficiently exploit the relationship between layers, SVC adopts the interlayer motion prediction mechanism, which upsamples the motion vectors if the spatial resolution changes. The process first finds the corresponding motion vectors in the base layer and then upsamples them as follows:

$$MV_{EL} = (MV_{BL} \times D_{mv}) \gg 16 \tag{2}$$

where MV_BL is the reference motion vector in the base layer, and D_mv is the spatial parameter that represents the ratio between spatial layers. MV_EL is the derived target motion vector in the current MB.

To implement motion vector upsampling, we use two multipliers to realize (2) for MVx and MVy in the proposed MV generator, which takes 16 cycles to scale the motion vectors in a reference list. Afterwards, the scaled motion vectors and the reference indexes are refined and merged by profiling their similarities to decide the new partitions and MVs, thus avoiding inconsistency. Furthermore, the MV upsampling can be accelerated by identifying the direct_8x8_inference_flag. This optional flag signals whether the other three motion vectors within the same 8 × 8 block should be set to the motion vector of the corner one in bi-directional slice type frames. Therefore, by using the direct_8x8_inference_flag, only eight motion vectors need to be upsampled, and the MV merge step is skipped because the motion vectors are already integrated. Fig. 12 shows the timing schedule of the proposed MV generator as determined by the slice type and the direct_8x8_inference_flag. With flag identification, processing time is saved by this adaptive scheduling.

Fig. 12. Timing schedule of MV generator.

C. Third Stage Design

This stage reconstructs the sample pixels from the decoded residual and prediction data of the previous stages through intra prediction and motion compensation, as described below.

1) Intra Prediction: Fig. 13 shows the architecture of the proposed intra predictor, in which four-pixel parallelism is adopted to achieve the best tradeoff between hardware cost and the cycle budget. With the required neighboring data, the prediction of the current MB can be generated in sequential order. The prediction data are added to the residuals to form the reconstructed pixels. During the reconstruction process, the reconstructed data of the lowest quality layer are written to the neighboring buffer for the unprocessed blocks. In addition, with these reconstructed samples, the highest quality layer residuals can be formed by a simple accumulation. The residuals of the different quality layers are accessed in parallel in this paper, and thus the reconstruction of two different quality layers can be processed within the same cycle.

Fig. 13. Data path of intra prediction.

TABLE VII. Comparison with Other State-of-the-Art Video Decoders
Columns: [7], [8], [11], [12], [13], Proposed
Technology: 0.18 μm, 0.18 μm, 0.09 μm, 0.03 μm, 0.09 μm, 0.09 μm
Max clock rate: 100 MHz, 120 MHz, 175 MHz, 163 MHz, 210 MHz, 135 MHz
Profile: MPEG-2 SP, H.264 L4 H.264 BP/MP H.264 High MPEG-2 MP, H.264 HP, SVC SBP SVC HP MVC HP SVC HP at L5
Max spec. (H.264): at 30 f/s at at 60 f/s at 30 f/s at 24 f/s at 60 f/s f/s
Max spec. (SVC): N/A N/A N/A N/A 1 SpatialScalability 3 CombinedScalability 2 SNRScalability
Gate count: K 160 K 662 K 439 K K K
Internal memory: Kbytes 4.5 Kbytes 59.6 Kbytes 10.9 Kbytes 8.99 Kbytes Kbytes
Max throughput: MB/s MB/s MB/s MB/s MB/s MB/s
Gate efficiency: MB/Kgates-s 1530 MB/Kgates-s MB/Kgates-s MB/Kgates-s MB/Kgates-s MB/Kgates-s
Max throughput = frame rate × processing MBs in (spatial + quality) layers.
1 SpatialScalability: ( )+( ) at 30 f/s.
2 SNRScalability: ( ) 4 quality layers at 30 f/s.
3 CombinedScalability: [( )+( )+( )] 3 quality layers at 60 f/s.

Fig. 14. Proposed pipeline architecture of motion compensation design.

Fig. 15. Proposed combined BL-level padding and deblocking flow.

2) Motion Compensation: The high memory bandwidth requirement of motion compensation is another bottleneck in video decoder design. Reusing the overlapped data inside a partitioned block [19]-[23] can solve this problem. To reduce memory bandwidth efficiently with less hardware complexity and cost, this paper adopts the block-size-based data request approach [20]-[22] with a direct data accessing mechanism to acquire the reference data for motion compensation.
In other words, the reference data are fetched from external memory only when necessary, without the aid of cache buffers or complex address generators, so that hardware cost and implementation complexity are reduced significantly. Simulation results show that the data bandwidth can be reduced by about 62%–74% [20]–[22].

Fig. 14 shows the block diagram of the proposed motion compensation architecture, which has two pipeline stages: data access and interpolation. The first stage consists of the motion vector generation and reference pixel accessing modules, which generate the MVs of the current MB and issue the corresponding data requests to the memory controller for accessing reference pixels from external memory. The returned reference pixels are collected in a pixel register array and then written to a reference data buffer for use by the next MB.

The data rearrangement for the pixel register array is briefly stated as follows. At the beginning, the reference data are requested from external memory according to the partition size. If the partition size of the current MB is 16×16, the reference pixels are acquired from external memory and stored into the pixel register array. For partition sizes smaller than 16×16, the reference data are fetched and stored one partition at a time. For example, if the partition size is 16×8, the reference data of one block are stored into the pixel register array first. Once this block has finished motion compensation, the pixel register array is refreshed immediately, and the reference data of the other block are fetched and stored. The same data rearranging strategy is applied to the other partition sizes.

The second stage consists of the interpolation module, which interpolates fractional pixels from the reference data and then reconstructs the pixels by adding the interpolated pixels and the residuals together.
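The per-partition fetch order described above can be illustrated with a small Python sketch. It is not the authors' implementation: the function names are hypothetical, and the model assumes the worst case in which the motion vector is fractional in both directions, so the H.264 6-tap luma filter needs five extra samples per dimension around each partition. Chroma and the reference data buffer are ignored.

```python
# Illustrative sketch only; names are hypothetical and not taken from the paper.

MB_SIZE = 16
FILTER_MARGIN = 5  # 6-tap luma filter: 2 extra samples before, 3 after


def ref_window(part_w, part_h, frac_x=True, frac_y=True):
    """Size of the luma reference window needed for one partition."""
    w = part_w + (FILTER_MARGIN if frac_x else 0)
    h = part_h + (FILTER_MARGIN if frac_y else 0)
    return w, h


def partitions_of_mb(part_w, part_h):
    """Partitions of one 16x16 MB, in the order they would be fetched."""
    count = (MB_SIZE // part_w) * (MB_SIZE // part_h)
    return [(part_w, part_h)] * count


def pixels_fetched(part_w, part_h):
    """Luma pixels requested for one MB when fetching partition by partition."""
    total = 0
    for pw, ph in partitions_of_mb(part_w, part_h):
        # The pixel register array is refilled once per partition.
        w, h = ref_window(pw, ph)
        total += w * h
    return total


if __name__ == "__main__":
    for size in [(16, 16), (16, 8), (8, 8), (4, 4)]:
        print(size, pixels_fetched(*size))
```

Under these assumptions, larger partitions request fewer pixels per MB because neighboring samples inside the partition share the filter margin. The numbers are illustrative and are not the simulation results reported in the text.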

D. Fourth Stage Design

This stage includes the deblocking filter and texture padding modules. The deblocking filter module adopts the designs from [24] and [25] to meet the high throughput requirement. The texture padding module generates the unavailable regions in the spatial base layer for the texture upsampling used in the interlayer intra prediction of the enhancement layer. To reduce the padding complexity, we adopt a BL-level padding flow [14] in our design. In the BL-level padding flow, the padding procedure is moved from the enhancement layer to the base layer. In other words, MB pre-padding is executed in the base layer to extend the reconstruction border and fill the unavailable inter-coded regions, so that no extension procedure is executed in the enhancement layer and cycle time is saved. It is worth mentioning that the deblocking filter is not applied to these MBs, since the inter-coded MBs in the spatial base layer are not reconstructed to derive the pixel samples. Thus, pre-padding of the unavailable region can be performed in place of deblocking while the inter-coded MBs are decoded. We therefore combine padding and deblocking in the same stage, as shown in Fig. 15, so that the padding cycles are hidden without additional penalty, since the deblocking filter is commonly designed as an independent pipeline stage in a video decoder. Simulation results show that the combined padding and deblocking flow saves 26% of decoding time on average for interlayer intra-coded MBs.

IV. Implementation Results

The proposed architecture is implemented in Verilog HDL and synthesized by Synopsys Design Compiler with the UMC 90 nm 1P9M CMOS technology library under the worst-case setting.
The decoded videos of our decoder are verified to be identical to those of the reference software JSVM 9.14 [26] under various test conditions (for example, the video sequences Bluesky, Pedestrian area, and Tractor with three spatial, temporal, and quality layers as specified in Section I). Table V shows the detailed gate count of each component. The total gate count of this design is about 542 k at the target 135 MHz operating frequency, of which the entropy decoder occupies the largest share because of its parallel design and its two types of decoding scheme. Table VI lists the internal memory requirements of the components and pipeline stages. In summary, the total internal memory requirement in this design is Kbytes. The major buffer cost comes from buffering all required key picture data of two reconstructed quality layers and the pipeline data for the parallel reconstruction.

The comparison of this design with other state-of-the-art video decoders is listed in Table VII. Since only two SVC decoder designs [12], [13] have been published so far, the H.264/AVC HD decoders [7], [8], and [11] are also included for comparison. In general, the gate count of an SVC decoder is larger than that of an H.264/AVC decoder because of the additional scalabilities. The increase comes mainly from interlayer prediction, which introduces high arithmetic complexity and numerous data buffers. SVC applications also have larger external memory requirements. Among the SVC decoders, [13] supports two spatial layers with only one quality layer each, and four quality layers inside a single spatial layer. In contrast, our design provides three spatial layers with three quality layers in each spatial layer. In addition, this design provides superior Max throughput for multiple scalabilities, where Max throughput represents the processing capability for combined spatial and quality scalability as defined in Table VII.
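As a worked illustration of the Max throughput definition in Table VII (frame rate times the MBs processed over the spatial and quality layers) and of the throughput-per-kilo-gate normalization reported there, the following Python sketch evaluates the CIF+SD480p+HD1080p configuration with three quality layers at 60 frames/s. The frame sizes (352×288, 720×480, 1920×1080), the rounding of each dimension up to a multiple of 16, and the function names are assumptions made for this example. The figures it produces depend on those assumptions and are not quoted from Table VII.

```python
# Illustrative sketch; frame sizes and names are assumptions, not paper data.

def macroblocks(width, height):
    """Number of 16x16 MBs in a frame, rounding each dimension up."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def max_throughput(resolutions, quality_layers, fps):
    """Frame rate x MBs processed over all spatial and quality layers (MB/s)."""
    mbs_per_quality_layer = sum(macroblocks(w, h) for w, h in resolutions)
    return fps * quality_layers * mbs_per_quality_layer


def gate_efficiency(throughput_mb_per_s, kilo_gates):
    """Throughput normalized by logic size (MB/Kgates-s)."""
    return throughput_mb_per_s / kilo_gates


if __name__ == "__main__":
    # Assumed frame sizes for CIF, SD480p, and HD1080p.
    layers = [(352, 288), (720, 480), (1920, 1080)]
    throughput = max_throughput(layers, quality_layers=3, fps=60)
    print("throughput (MB/s):", throughput)
    print("gate efficiency (MB/Kgates-s):", gate_efficiency(throughput, 542))
```

Dividing by the 542 k gate count stated in the text gives the normalized figure, which lets designs of different sizes be compared on a common basis.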
To normalize the performance, the Gate efficiency is calculated as the Max throughput per kilo gates. The results show that this design achieves better Gate efficiency than the other designs.

V. Conclusion

This paper presented a complete H.264/AVC scalable high profile decoder with optimizations ranging from decoding flow analysis to module implementation, with a focus on the design of the three scalabilities. For the decoding flow, we proposed a one-pass quality decoding method that saves 71% of memory bandwidth and at most 66% of MB processing time. For interlayer prediction, we proposed the combined decoding and padding flow, the centralized accumulation-based CCSP concept, the simplified poly-phase interpolator, and efficient motion vector upsampling to save area cost and decoding time. Furthermore, the proposed parallel-pipeline architecture for the residual reconstruction path achieves 54% gate count savings compared to the traditional serial-pipeline architecture. Implementation results show that the proposed decoder can simultaneously decode 60 frames/s for CIF+SD480p+HD1080p resolution with three quality layers at a 135 MHz operating frequency, with a 542 k gate count and Kbytes internal memory consumption.

References

[1] H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the scalable video coding extension of the H.264/AVC standard," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp , Sep
[2] Joint Draft 11 of SVC Amendment, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, Oct
[3] P.-Y. Hsu, G.-L. Li, and T.-S. Chang, "Memory analysis for H.264/AVC scalable extension decoder," in Proc. APSIPA Annu. Summit Conf., Oct. 2009, pp
[4] N. D. Narvekar, B. Konnanath, S. Mehta, S. Chintalapati, I. Alkamal, C. Chakrabarti, and L. J. Karam, "An H.264/SVC memory architecture supporting spatial and course-grained quality scalabilities," in Proc. IEEE Conf. Image Process., Nov. 2009, pp
[5] Y. H. Chen, T. D. Chuang, C. Y. Tsai, Y. J. Chen, and L. G. Chen, "A cost-efficient residual prediction VLSI architecture for H.264/AVC scalable extension," in Proc. Picture Coding Symp., Nov
[6] D. Zhou, Z. You, J. Zhu, J. Kong, Y. Hong, X. Chen, X. He, C. Xu, H. Zhang, J. Zhou, N. Deng, P. Liu, and S. Goto, "A 1080p@60fps multistandard video decoder chip designed for power and cost efficiency in a system perspective," in Proc. Symp. VLSI Circuit, Jun. 2009, pp
[7] T. M. Liu, T. A. Lin, S. Z. Wang, W. P. Lee, J. Y. Yang, K. C. Hou, and C. Y. Lee, "A 125 μW, fully scalable MPEG-2 and H.264/AVC video decoder for mobile applications," IEEE J. Solid-State Circuits, vol. 42, no. 1, pp , Jan
[8] C. C. Lin, J. I. Guo, H. C. Chang, Y. C. Yang, J. W. Chen, M. C. Tsai, and J. S. Wang, "A 160 kgate 4.5 kB SRAM H.264 video decoder for HDTV applications," in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 2006, pp
[9] C. D. Chien, C. A. Chien, J. C. Chu, J. I. Guo, and C. H. Cheng, "A 252 Kgates/4.9 Kbytes SRAM/71 mW multistandard video decoder for high definition video applications," ACM Trans. Design Autom. Electron. Syst., vol. 14, no. 1, p. 17:17, Jan
[10] V. Sze, D. F. Finchelstein, M. E. Sinangil, and A. P. Chandrakasan, "A 0.7 V 1.8-mW H.264/AVC 720p video decoder," IEEE J. Solid-State Circuits, vol. 4, no. 11, pp , Nov
[11] D. Zhou, J. Zhou, X. He, J. Zhu, J. Kong, P. Liu, and S. Goto, "A 530 Mpixels/s @60fps H.264/AVC high profile video decoder chip," IEEE J. Solid-State Circuits, vol. 46, no. 4, pp , Apr
[12] C. A. Chien, Y. C. Yang, H. C. Chang, J. W. Chen, C. Y. Chang, J. I. Guo, J. S. Wang, and C. W. Cheng, "A H.264/MPEG-2 dual mode video decoder chip supporting temporal/spatial scalable video," in Proc. Asia South Pac. Des. Autom. Conf., Jan. 2011, pp
[13] T. D. Chuang, P. K. Tsung, P. C. Lin, L. M. Chang, T. C. Ma, Y. H. Chen, Y. H. Chen, C. Y. Tsai, and L. G. Chen, "A 59.5 mW scalable/multiview video decoder chip for quad/3-D full HDTV and video streaming applications," in Proc. IEEE Solid-State Circuits Conf., Feb. 2010, pp
[14] T. D. Chuang, P. K. Tsung, P. C. Lin, L. M. Chang, T. C. Ma, Y. H. Chen, and L. G. Chen, "Low bandwidth decoder framework for H.264/AVC scalable extension," in Proc. IEEE Symp. Circuits Syst., May 2010, pp
[15] H. Schwarz, D. Marpe, T. Schierl, and T. Wiegand, "Combined scalability support for the scalable extension of H.264/AVC," in Proc. IEEE Int. Conf. Multimedia Expo, Jul. 2005, pp
[16] Y. H. Liao, G. L. Li, and T. S. Chang, "A 385 MHz K gates CAVLD decoder for level 5.1 H.264/AVC video," IEEE Trans. Circuits Syst. Video Technol., to be published.
[17] Y. H. Liao, G. L. Li, and T. S. Chang, "A high throughput VLSI design with hybrid memory architecture for H.264/AVC CABAD decoder," in Proc. IEEE Int. Symp. Circuits Syst., May 2010, pp
[18] C. A. Segall and G. J. Sullivan, "Spatial scalability within the H.264/AVC scalable video coding extension," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp , Sep
[19] C. Y. Tsai, T. C. Chen, T. W. Chen, and L. G. Chen, "Bandwidth optimized motion compensation hardware design for H.264/AVC HDTV decoder," in Proc. IEEE Int. Midwest Symp. Circuit Syst., vol. 2, Aug. 2005, pp
[20] T. D. Chuang, L. M. Chang, T. W. Chiu, Y. H. Chen, and L. G. Chen, "Bandwidth-efficient cache-based motion compensation architecture with DRAM-friendly data access control," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Apr. 2009, pp
[21] P. Chao and Y. L. Lin, "A motion compensation system with a high efficiency reference frame pre-fetch scheme for QFHD H.264/AVC decoding," in Proc. IEEE Int. Symp. Circuits Syst., May 2008, pp
[22] P. Chao and Y. L. Lin, "Reference frame access optimization for ultrahigh resolution H.264/AVC decoding," in Proc. IEEE Int. Conf. Multimedia Expo., Jun. 2008, pp
[23] P. Chao and Y. L. Lin, "An elastic software cache with fast prefetching for motion compensation in video decoding," in Proc. IEEE/ACM Int. Conf. Hardw.-Softw. Codesign Syst. Synthesis, Oct. 2010, pp
[24] F. Tobajas, G. M. Callico, P. A. Perez, V. de Armas, and R. Sarmiento, "An efficient double-filter hardware architecture for H.264/AVC deblocking filtering," IEEE Trans. Consumer Electron., vol. 54, no. 1, pp , Feb
[25] K. Xu and C.-S. Choy, "A five-stage pipeline, 204 cycles/MB, single-port SRAM-based deblocking filter for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 3, pp , Mar
[26] JSVM Software Version JSVM 9.14, ITU-T, I. JTC1.

Gwo-Long Li received the B.S. degree from the Department of Computer Science and Information Engineering, Shu-Te University, Kaohsiung, Taiwan, in 2004, the M.S. degree from the Department of Electrical Engineering, National Dong-Hwa University, Hualien, Taiwan, in 2006, and the Ph.D.
degree from the Department of Electronics Engineering, National Chiao-Tung University, Hsinchu, Taiwan. He is currently an Engineer with the Industrial Technology Research Institute, Hsinchu. His current research interests include video signal processing and its very large scale integration architecture design. Dr. Li received the Excellent Master Thesis Award from the Institute of Information and Computer Machinery.

Yu-Chen Chen received the M.S. degree from the Department of Electronics Engineering, National Chiao-Tung University (NCTU), Hsinchu, Taiwan. In 2008, he joined the VLSI Signal Processing Laboratory, NCTU. His current research interests include signal processing, scalable video coding, and very large scale integration architecture design of video decoders.

Yuan-Hsin Liao received the B.S. and M.S. degrees in electronics engineering from National Chiao-Tung University, Hsinchu, Taiwan, in 2008 and 2010, respectively. In 2010, he joined PixArt Imagine, Inc., Hsinchu. His current research interests include video processing, computer vision, IP, and system-on-chip design.

Po-Yuan Hsu received the B.S. and M.S. degrees from the Department of Electronics Engineering, National Chiao-Tung University, Hsinchu, Taiwan, in 2007 and 2009, respectively. His current research interests include digital signal processing, scalable video coding, and associated very large scale integration architectures.

Meng-Hsun Wen received the B.S. and M.S. degrees in electrical engineering from National Chiao-Tung University, Hsinchu, Taiwan, in 2009 and 2011, respectively. After graduation, he joined PixArt Imagine, Inc., Hsinchu. His major research interests include H.264/AVC video coding and associated very large scale integration architecture design.

Tian-Sheuan Chang (S'93–M'06–SM'07) received the B.S., M.S., and Ph.D. degrees in electronic engineering from National Chiao-Tung University (NCTU), Hsinchu, Taiwan, in 1993, 1995, and 1999, respectively. He is currently an Associate Professor with the Department of Electronics Engineering, NCTU. From 2000 to 2004, he was a Deputy Manager with Global Unichip Corporation, Hsinchu. His current research interests include (silicon) intellectual property and system-on-a-chip design, very large scale integration signal processing, and computer architecture.


Error concealment techniques in H.264 video transmission over wireless networks Error concealment techniques in H.264 video transmission over wireless networks M U L T I M E D I A P R O C E S S I N G ( E E 5 3 5 9 ) S P R I N G 2 0 1 1 D R. K. R. R A O F I N A L R E P O R T Murtaza

More information

We are IntechOpen, the world s leading publisher of Open Access books Built by scientists, for scientists. International authors and editors

We are IntechOpen, the world s leading publisher of Open Access books Built by scientists, for scientists. International authors and editors We are IntechOpen, the world s leading publisher of Open Access books Built by scientists, for scientists 4,000 116,000 120M Open access books available International authors and editors Downloads Our

More information

PERCEPTUAL QUALITY COMPARISON BETWEEN SINGLE-LAYER AND SCALABLE VIDEOS AT THE SAME SPATIAL, TEMPORAL AND AMPLITUDE RESOLUTIONS. Yuanyi Xue, Yao Wang

PERCEPTUAL QUALITY COMPARISON BETWEEN SINGLE-LAYER AND SCALABLE VIDEOS AT THE SAME SPATIAL, TEMPORAL AND AMPLITUDE RESOLUTIONS. Yuanyi Xue, Yao Wang PERCEPTUAL QUALITY COMPARISON BETWEEN SINGLE-LAYER AND SCALABLE VIDEOS AT THE SAME SPATIAL, TEMPORAL AND AMPLITUDE RESOLUTIONS Yuanyi Xue, Yao Wang Department of Electrical and Computer Engineering Polytechnic

More information

17 October About H.265/HEVC. Things you should know about the new encoding.

17 October About H.265/HEVC. Things you should know about the new encoding. 17 October 2014 About H.265/HEVC. Things you should know about the new encoding Axis view on H.265/HEVC > Axis wants to see appropriate performance improvement in the H.265 technology before start rolling

More information

Visual Communication at Limited Colour Display Capability

Visual Communication at Limited Colour Display Capability Visual Communication at Limited Colour Display Capability Yan Lu, Wen Gao and Feng Wu Abstract: A novel scheme for visual communication by means of mobile devices with limited colour display capability

More information

The H.263+ Video Coding Standard: Complexity and Performance

The H.263+ Video Coding Standard: Complexity and Performance The H.263+ Video Coding Standard: Complexity and Performance Berna Erol (bernae@ee.ubc.ca), Michael Gallant (mikeg@ee.ubc.ca), Guy C t (guyc@ee.ubc.ca), and Faouzi Kossentini (faouzi@ee.ubc.ca) Department

More information

Design Challenge of a QuadHDTV Video Decoder

Design Challenge of a QuadHDTV Video Decoder Design Challenge of a QuadHDTV Video Decoder Youn-Long Lin Department of Computer Science National Tsing Hua University MPSOC27, Japan More Pixels YLLIN NTHU-CS 2 NHK Proposes UHD TV Broadcast Super HiVision

More information

Joint Algorithm-Architecture Optimization of CABAC

Joint Algorithm-Architecture Optimization of CABAC Noname manuscript No. (will be inserted by the editor) Joint Algorithm-Architecture Optimization of CABAC Vivienne Sze Anantha P. Chandrakasan Received: date / Accepted: date Abstract This paper uses joint

More information

Implementation of Memory Based Multiplication Using Micro wind Software

Implementation of Memory Based Multiplication Using Micro wind Software Implementation of Memory Based Multiplication Using Micro wind Software U.Palani 1, M.Sujith 2,P.Pugazhendiran 3 1 IFET College of Engineering, Department of Information Technology, Villupuram 2,3 IFET

More information

Research Topic. Error Concealment Techniques in H.264/AVC for Wireless Video Transmission in Mobile Networks

Research Topic. Error Concealment Techniques in H.264/AVC for Wireless Video Transmission in Mobile Networks Research Topic Error Concealment Techniques in H.264/AVC for Wireless Video Transmission in Mobile Networks July 22 nd 2008 Vineeth Shetty Kolkeri EE Graduate,UTA 1 Outline 2. Introduction 3. Error control

More information

Parameters optimization for a scalable multiple description coding scheme based on spatial subsampling

Parameters optimization for a scalable multiple description coding scheme based on spatial subsampling Parameters optimization for a scalable multiple description coding scheme based on spatial subsampling ABSTRACT Marco Folli and Lorenzo Favalli Universitá degli studi di Pavia Via Ferrata 1 100 Pavia,

More information

Comparative Study of JPEG2000 and H.264/AVC FRExt I Frame Coding on High-Definition Video Sequences

Comparative Study of JPEG2000 and H.264/AVC FRExt I Frame Coding on High-Definition Video Sequences Comparative Study of and H.264/AVC FRExt I Frame Coding on High-Definition Video Sequences Pankaj Topiwala 1 FastVDO, LLC, Columbia, MD 210 ABSTRACT This paper reports the rate-distortion performance comparison

More information

Fast MBAFF/PAFF Motion Estimation and Mode Decision Scheme for H.264

Fast MBAFF/PAFF Motion Estimation and Mode Decision Scheme for H.264 Fast MBAFF/PAFF Motion Estimation and Mode Decision Scheme for H.264 Ju-Heon Seo, Sang-Mi Kim, Jong-Ki Han, Nonmember Abstract-- In the H.264, MBAFF (Macroblock adaptive frame/field) and PAFF (Picture

More information

Memory interface design for AVS HD video encoder with Level C+ coding order

Memory interface design for AVS HD video encoder with Level C+ coding order LETTER IEICE Electronics Express, Vol.14, No.12, 1 11 Memory interface design for AVS HD video encoder with Level C+ coding order Xiaofeng Huang 1a), Kaijin Wei 2, Guoqing Xiang 2, Huizhu Jia 2, and Don

More information

Project Proposal Time Optimization of HEVC Encoder over X86 Processors using SIMD. Spring 2013 Multimedia Processing EE5359

Project Proposal Time Optimization of HEVC Encoder over X86 Processors using SIMD. Spring 2013 Multimedia Processing EE5359 Project Proposal Time Optimization of HEVC Encoder over X86 Processors using SIMD Spring 2013 Multimedia Processing Advisor: Dr. K. R. Rao Department of Electrical Engineering University of Texas, Arlington

More information

A Reed Solomon Product-Code (RS-PC) Decoder Chip for DVD Applications

A Reed Solomon Product-Code (RS-PC) Decoder Chip for DVD Applications IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 2, FEBRUARY 2001 229 A Reed Solomon Product-Code (RS-PC) Decoder Chip DVD Applications Hsie-Chia Chang, C. Bernard Shung, Member, IEEE, and Chen-Yi Lee

More information

OL_H264e HDTV H.264/AVC Baseline Video Encoder Rev 1.0. General Description. Applications. Features

OL_H264e HDTV H.264/AVC Baseline Video Encoder Rev 1.0. General Description. Applications. Features OL_H264e HDTV H.264/AVC Baseline Video Encoder Rev 1.0 General Description Applications Features The OL_H264e core is a hardware implementation of the H.264 baseline video compression algorithm. The core

More information

A low-power portable H.264/AVC decoder using elastic pipeline

A low-power portable H.264/AVC decoder using elastic pipeline Chapter 3 A low-power portable H.64/AVC decoder using elastic pipeline Yoshinori Sakata, Kentaro Kawakami, Hiroshi Kawaguchi, Masahiko Graduate School, Kobe University, Kobe, Hyogo, 657-8507 Japan Email:

More information

Introduction to Video Compression Techniques. Slides courtesy of Tay Vaughan Making Multimedia Work

Introduction to Video Compression Techniques. Slides courtesy of Tay Vaughan Making Multimedia Work Introduction to Video Compression Techniques Slides courtesy of Tay Vaughan Making Multimedia Work Agenda Video Compression Overview Motivation for creating standards What do the standards specify Brief

More information

International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013

International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013 International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013 Design and Implementation of an Enhanced LUT System in Security Based Computation dama.dhanalakshmi 1, K.Annapurna

More information

Joint Optimization of Source-Channel Video Coding Using the H.264/AVC encoder and FEC Codes. Digital Signal and Image Processing Lab

Joint Optimization of Source-Channel Video Coding Using the H.264/AVC encoder and FEC Codes. Digital Signal and Image Processing Lab Joint Optimization of Source-Channel Video Coding Using the H.264/AVC encoder and FEC Codes Digital Signal and Image Processing Lab Simone Milani Ph.D. student simone.milani@dei.unipd.it, Summer School

More information

LUT Optimization for Memory Based Computation using Modified OMS Technique

LUT Optimization for Memory Based Computation using Modified OMS Technique LUT Optimization for Memory Based Computation using Modified OMS Technique Indrajit Shankar Acharya & Ruhan Bevi Dept. of ECE, SRM University, Chennai, India E-mail : indrajitac123@gmail.com, ruhanmady@yahoo.co.in

More information

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015 Optimization of Multi-Channel BCH Error Decoding for Common Cases Russell Dill Master's Thesis Defense April 20, 2015 Bose-Chaudhuri-Hocquenghem (BCH) BCH is an Error Correcting Code (ECC) and is used

More information

Into the Depths: The Technical Details Behind AV1. Nathan Egge Mile High Video Workshop 2018 July 31, 2018

Into the Depths: The Technical Details Behind AV1. Nathan Egge Mile High Video Workshop 2018 July 31, 2018 Into the Depths: The Technical Details Behind AV1 Nathan Egge Mile High Video Workshop 2018 July 31, 2018 North America Internet Traffic 82% of Internet traffic by 2021 Cisco Study

More information

Design of Memory Based Implementation Using LUT Multiplier

Design of Memory Based Implementation Using LUT Multiplier Design of Memory Based Implementation Using LUT Multiplier Charan Kumar.k 1, S. Vikrama Narasimha Reddy 2, Neelima Koppala 3 1,2 M.Tech(VLSI) Student, 3 Assistant Professor, ECE Department, Sree Vidyanikethan

More information

MULTI-CORE SOFTWARE ARCHITECTURE FOR THE SCALABLE HEVC DECODER. Wassim Hamidouche, Mickael Raulet and Olivier Déforges

MULTI-CORE SOFTWARE ARCHITECTURE FOR THE SCALABLE HEVC DECODER. Wassim Hamidouche, Mickael Raulet and Olivier Déforges 2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) MULTI-CORE SOFTWARE ARCHITECTURE FOR THE SCALABLE HEVC DECODER Wassim Hamidouche, Mickael Raulet and Olivier Déforges

More information

MPEG-2. ISO/IEC (or ITU-T H.262)

MPEG-2. ISO/IEC (or ITU-T H.262) 1 ISO/IEC 13818-2 (or ITU-T H.262) High quality encoding of interlaced video at 4-15 Mbps for digital video broadcast TV and digital storage media Applications Broadcast TV, Satellite TV, CATV, HDTV, video

More information

Video Compression - From Concepts to the H.264/AVC Standard

Video Compression - From Concepts to the H.264/AVC Standard PROC. OF THE IEEE, DEC. 2004 1 Video Compression - From Concepts to the H.264/AVC Standard GARY J. SULLIVAN, SENIOR MEMBER, IEEE, AND THOMAS WIEGAND Invited Paper Abstract Over the last one and a half

More information

Frame Processing Time Deviations in Video Processors

Frame Processing Time Deviations in Video Processors Tensilica White Paper Frame Processing Time Deviations in Video Processors May, 2008 1 Executive Summary Chips are increasingly made with processor designs licensed as semiconductor IP (intellectual property).

More information

Motion Re-estimation for MPEG-2 to MPEG-4 Simple Profile Transcoding. Abstract. I. Introduction

Motion Re-estimation for MPEG-2 to MPEG-4 Simple Profile Transcoding. Abstract. I. Introduction Motion Re-estimation for MPEG-2 to MPEG-4 Simple Profile Transcoding Jun Xin, Ming-Ting Sun*, and Kangwook Chun** *Department of Electrical Engineering, University of Washington **Samsung Electronics Co.

More information

Hardware Decoding Architecture for H.264/AVC Digital Video Standard

Hardware Decoding Architecture for H.264/AVC Digital Video Standard Hardware Decoding Architecture for H.264/AVC Digital Video Standard Alexsandro C. Bonatto, Henrique A. Klein, Marcelo Negreiros, André B. Soares, Letícia V. Guimarães and Altamiro A. Susin Department of

More information

OBJECT-BASED IMAGE COMPRESSION WITH SIMULTANEOUS SPATIAL AND SNR SCALABILITY SUPPORT FOR MULTICASTING OVER HETEROGENEOUS NETWORKS

OBJECT-BASED IMAGE COMPRESSION WITH SIMULTANEOUS SPATIAL AND SNR SCALABILITY SUPPORT FOR MULTICASTING OVER HETEROGENEOUS NETWORKS OBJECT-BASED IMAGE COMPRESSION WITH SIMULTANEOUS SPATIAL AND SNR SCALABILITY SUPPORT FOR MULTICASTING OVER HETEROGENEOUS NETWORKS Habibollah Danyali and Alfred Mertins School of Electrical, Computer and

More information

A video signal consists of a time sequence of images. Typical frame rates are 24, 25, 30, 50 and 60 images per seconds.

A video signal consists of a time sequence of images. Typical frame rates are 24, 25, 30, 50 and 60 images per seconds. Video coding Concepts and notations. A video signal consists of a time sequence of images. Typical frame rates are 24, 25, 30, 50 and 60 images per seconds. Each image is either sent progressively (the

More information

Module 8 VIDEO CODING STANDARDS. Version 2 ECE IIT, Kharagpur

Module 8 VIDEO CODING STANDARDS. Version 2 ECE IIT, Kharagpur Module 8 VIDEO CODING STANDARDS Lesson 24 MPEG-2 Standards Lesson Objectives At the end of this lesson, the students should be able to: 1. State the basic objectives of MPEG-2 standard. 2. Enlist the profiles

More information

MPEG has been established as an international standard

MPEG has been established as an international standard 1100 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 9, NO. 7, OCTOBER 1999 Fast Extraction of Spatially Reduced Image Sequences from MPEG-2 Compressed Video Junehwa Song, Member,

More information

/$ IEEE

/$ IEEE 568 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 17, NO. 5, MAY 2007 Fast Algorithm and Architecture Design of Low-Power Integer Motion Estimation for H.264/AVC Tung-Chien Chen,

More information

Variable Block-Size Transforms for H.264/AVC

Variable Block-Size Transforms for H.264/AVC 604 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 13, NO. 7, JULY 2003 Variable Block-Size Transforms for H.264/AVC Mathias Wien, Member, IEEE Abstract A concept for variable block-size

More information

Error Resilient Video Coding Using Unequally Protected Key Pictures

Error Resilient Video Coding Using Unequally Protected Key Pictures Error Resilient Video Coding Using Unequally Protected Key Pictures Ye-Kui Wang 1, Miska M. Hannuksela 2, and Moncef Gabbouj 3 1 Nokia Mobile Software, Tampere, Finland 2 Nokia Research Center, Tampere,

More information

COMPLEXITY REDUCTION FOR HEVC INTRAFRAME LUMA MODE DECISION USING IMAGE STATISTICS AND NEURAL NETWORKS.

COMPLEXITY REDUCTION FOR HEVC INTRAFRAME LUMA MODE DECISION USING IMAGE STATISTICS AND NEURAL NETWORKS. COMPLEXITY REDUCTION FOR HEVC INTRAFRAME LUMA MODE DECISION USING IMAGE STATISTICS AND NEURAL NETWORKS. DILIP PRASANNA KUMAR 1000786997 UNDER GUIDANCE OF DR. RAO UNIVERSITY OF TEXAS AT ARLINGTON. DEPT.

More information

Study of AVS China Part 7 for Mobile Applications. By Jay Mehta EE 5359 Multimedia Processing Spring 2010

Study of AVS China Part 7 for Mobile Applications. By Jay Mehta EE 5359 Multimedia Processing Spring 2010 Study of AVS China Part 7 for Mobile Applications By Jay Mehta EE 5359 Multimedia Processing Spring 2010 1 Contents Parts and profiles of AVS Standard Introduction to Audio Video Standard for Mobile Applications

More information

Dual Frame Video Encoding with Feedback

Dual Frame Video Encoding with Feedback Video Encoding with Feedback Athanasios Leontaris and Pamela C. Cosman Department of Electrical and Computer Engineering University of California, San Diego, La Jolla, CA 92093-0407 Email: pcosman,aleontar

More information

A Novel Architecture of LUT Design Optimization for DSP Applications

A Novel Architecture of LUT Design Optimization for DSP Applications A Novel Architecture of LUT Design Optimization for DSP Applications O. Anjaneyulu 1, Parsha Srikanth 2 & C. V. Krishna Reddy 3 1&2 KITS, Warangal, 3 NNRESGI, Hyderabad E-mail : anjaneyulu_o@yahoo.com

More information

Constant Bit Rate for Video Streaming Over Packet Switching Networks

Constant Bit Rate for Video Streaming Over Packet Switching Networks International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Constant Bit Rate for Video Streaming Over Packet Switching Networks Mr. S. P.V Subba rao 1, Y. Renuka Devi 2 Associate professor

More information

Key Techniques of Bit Rate Reduction for H.264 Streams

Key Techniques of Bit Rate Reduction for H.264 Streams Key Techniques of Bit Rate Reduction for H.264 Streams Peng Zhang, Qing-Ming Huang, and Wen Gao Institute of Computing Technology, Chinese Academy of Science, Beijing, 100080, China {peng.zhang, qmhuang,

More information

Video Compression. Representations. Multimedia Systems and Applications. Analog Video Representations. Digitizing. Digital Video Block Structure

Video Compression. Representations. Multimedia Systems and Applications. Analog Video Representations. Digitizing. Digital Video Block Structure Representations Multimedia Systems and Applications Video Compression Composite NTSC - 6MHz (4.2MHz video), 29.97 frames/second PAL - 6-8MHz (4.2-6MHz video), 50 frames/second Component Separation video

More information

Fast Mode Decision Algorithm for Intra prediction in H.264/AVC Video Coding

Fast Mode Decision Algorithm for Intra prediction in H.264/AVC Video Coding 356 IJCSNS International Journal of Computer Science and Network Security, VOL.7 No.1, January 27 Fast Mode Decision Algorithm for Intra prediction in H.264/AVC Video Coding Abderrahmane Elyousfi 12, Ahmed

More information

Low Power H.264 Deblocking Filter Hardware Implementations

Low Power H.264 Deblocking Filter Hardware Implementations 808 IEEE Transactions on Consumer Electronics, Vol. 54, No. 2, MAY 2008 Low Power H.264 Deblocking Filter Hardware Implementations Mustafa Parlak and Ilker Hamzaoglu Abstract In this paper, we present

More information

COMP 249 Advanced Distributed Systems Multimedia Networking. Video Compression Standards

COMP 249 Advanced Distributed Systems Multimedia Networking. Video Compression Standards COMP 9 Advanced Distributed Systems Multimedia Networking Video Compression Standards Kevin Jeffay Department of Computer Science University of North Carolina at Chapel Hill jeffay@cs.unc.edu September,

More information