A QFHD 30 fps HEVC Decoder Design
Pai-Tse Chiang, Yi-Ching Ting, Hsuan-Ku Chen, Shiau-Yu Jou, I-Wen Chen, Hang-Chiu Fang and Tian-Sheuan Chang, Senior Member, IEEE

Abstract: The HEVC video standard provides superior compression with large, variable-sized coding units and advanced prediction modes, which lead to high buffer cost, high memory bandwidth and irregular computation in ultra high definition (HD) video decoding hardware. This paper therefore presents an HEVC decoder with a four-stage mixed block size pipeline that reduces the pipeline stage buffer size by approximately 91% compared with a 64x64 block-based pipeline. The high memory bandwidth of motion compensation is reduced by 88% through block-based data access, precision-based data access and a smart buffer. For irregular computation, a reconfigurable architecture was adopted to unify the variable-size transforms. A common intra prediction module with 4x4 block-based bottom-up computation was also designed to handle variable-size intra prediction and its modes in a regular manner. Furthermore, corner position computation for the motion vector predictor (MVP) was applied to handle variable-size motion compensation. Finally, the implementation in the TSMC 90 nm CMOS process used 467 K logic gates and KBytes of on-chip memory and supported 4096x2160@30fps video decoding at an operating frequency of 270 MHz.

Index Terms: High efficiency video coding, decoder, VLSI implementation

Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org. Manuscript received Jan. 28, 2014; revised May , Aug , Nov. 28, 2014, and Feb. 1, 2015; and accepted Feb. This work was supported by the Ministry of Science and Technology, R. O. C., under Grant E. All the authors are with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, R. O. C. (tschang@mail.nctu.edu.tw).

I. INTRODUCTION

To meet the demand for ultra high definition (HD) video compression, high efficiency video coding (HEVC) [1][2], the latest video coding standard, has recently been standardized to provide a 50% bit rate reduction over the previously popular H.264/AVC standard. Although HEVC uses a hybrid coding scheme similar to that of H.264, consisting of inter- and intra-frame prediction, transform units, an in-loop filter, and entropy coding, it displays a marked improvement over H.264 in several aspects.

1) Large hierarchical units: Instead of the fixed macroblocks in H.264, HEVC partitions a frame into raster-scanned coding tree units (CTUs) whose size is fixed to 64x64, 32x32 or 16x16. Each CTU is recursively split into smaller coding units (CUs), down to a size of 8x8. Each CU is split into one, two or four prediction units (PUs), from 64x64 down to 4x4 (intra) or 8x4/4x8 (inter), with mixed intra or inter prediction instead of the all-inter or all-intra prediction of H.264. This approach creates data dependency between the intra and the inter predictions. Each CU is also recursively split into transform units (TUs) from 32x32 to 4x4. These large, hierarchical and flexible CUs, PUs and TUs can encode dynamic content very well, but the largest CTU is 16x larger than the previous macroblock, which significantly increases the required buffer size.

2) Advanced predictions: HEVC uses 35 intra-prediction modes (DC, planar and 33 angular modes) for all PU sizes, far more than the 9 modes in H.264. In addition, inter prediction in HEVC uses an 8-tap interpolation filter instead of the 6-tap filter of H.264. These predictions model content better; however, they also lead to irregular computation and high memory bandwidth.
3) Simplified structure: The in-loop filter uses a simpler deblocking filter that operates on an 8x8 grid instead of the small 4x4 grid of H.264. This approach enables parallel filtering of different edges. HEVC also introduces a new loop filter, sample adaptive offset (SAO), to reduce the distortion between the reconstructed and original data. Moreover, HEVC uses only one type of entropy coding to simplify the design.

The above-mentioned improvements yield better coding efficiency but, at the same time, require significant computational complexity [3], large on-chip storage and high memory bandwidth, especially for real-time ultra HD video processing, which in turn demands a hardware implementation to meet the real-time requirements.

To meet these requirements, several decoder designs have recently been proposed [4]-[6]. The FPGA prototype in [4] decodes 1080p@60fps with a 7-stage pipeline architecture. To avoid complex synchronization of the variable PU and TU sizes within a CTU, it adopts the largest CTU level (64x64) as its basic pipeline unit, which leads to simple control but incurs a significant pipeline buffer cost. The design in [5] is a 3840x2160@30fps decoder that uses a two-stage sub-pipeline scheme. Its pipeline scheme is also CTU-based but can adapt to different CTU sizes. It adopts variable-sized pipeline blocks with a fixed height (64) and CTU-dependent widths (16, 32 or 64) to reduce the data access switching between luma and chroma pixels. Although this approach unifies the control flow, the pipeline buffer size is still based on 64x64 for the worst-case condition. Furthermore, [6] proposed a 1080p@30fps decoder that uses a four-stage pipeline with embedded compression to reduce bandwidth. Several single-module designs have also been proposed [7]-[14]. However, these module designs still need to be tailored to the needs of an entire decoder.

This paper presents an efficient HEVC decoder design with a four-stage pipeline that addresses, in particular, the buffer, irregular computation and memory bandwidth issues. For the buffer size, we analyzed the data dependency of the decoder and proposed a mixed block size pipeline that uses 32x32 for the first stage and 16x16 for the rest of the stages, instead of 64x64 or a fixed height for the entire pipeline. Complex pipeline control was avoided through a 4x4 block-based prediction structure that adapts to variable PU sizes. This approach can reduce the total buffer size by 87.2% compared with the previous design [5]. The memory bandwidth of motion compensation (MC) was reduced by block-based data access and a reference data cache with a simplified addressing scheme. Moreover, to handle irregular computations, a reconfigurable transform architecture, common intra-prediction modules with a 4x4 bottom-up structure, and corner position computation for motion vector prediction (MVP) were used.
All these optimizations led to a 35% gate count reduction compared with the previously reported design [5].

The rest of the paper is organized as follows. Section II presents the design analysis and pipeline overview of the proposed decoder. Section III describes the component designs in detail. Section IV shows the implementation results and design comparisons. Finally, Section V concludes the paper.

II. MIXED BLOCK SIZE PIPELINE

A. Overview of the four-stage pipeline

Fig. 1 shows the pipeline of the proposed HEVC decoder. For the target specification, 4096x2160@30fps at 270 MHz with the 90 nm CMOS process, the decoder was divided into four pipeline stages: entropy decoder, IQ/IT and reference data loading, reconstruction of prediction, and loop filters. The overall functions of the four stages are as follows. The first stage, the entropy decoder, decodes residual coefficients and other information with the context adaptive binary arithmetic decoder (CABAD). The second stage then reconstructs the residual data through inverse quantization (IQ) and inverse transform (IT). In the same stage, the corresponding motion vector (MV) is generated to load reference data into a smart buffer. The third stage reconstructs pixels by adding the residual data to the prediction values. Finally, the reconstructed data are filtered by the two in-loop filters, deblocking and SAO, for the final output.

B. Analysis of the pipeline

The pipeline block size directly affects the final scheduling and the cost of both stage buffers and intermediate buffers. The stage buffer size can be estimated as follows. Assume a 2Nx2N pipeline block.
The buffer size for the transform coefficient buffer, residual buffer, or pre-filter buffer is given by

$$(\text{size for Y} + \text{size for Cb and Cr}) \times 2 = (2N \times 2N + N \times N \times 2) \times 2 = 12N^2 \quad (1)$$

This size consists of buffers for Y, Cb, and Cr in the 4:2:0 format with the ping-pong buffer style to match the throughput differences between the stages.

Fig. 1 Pipeline of the proposed HEVC decoder

The size of the smart buffer (reference data buffer for MC) can be roughly estimated as the number of 8x4 Y blocks times the 8x4 interpolation window, plus the number of 4x2 CbCr blocks times the 4x2 interpolation window, all doubled:

$$\left[\frac{2N \times 2N}{8 \times 4} \times (15 \times 11) + \frac{N \times N \times 2}{4 \times 2} \times (7 \times 5)\right] \times 2 = 58.75N^2 \text{ Bytes} \quad (2)$$

This size considers only the data for one pipeline block for simplicity and consists of buffers for Y, Cb, and Cr in the 4:2:0 format with the ping-pong buffer style. Thus, the total stage buffer size is the sum of three $12N^2$-sample buffers, stored at 16-b, 9-b and 8-b precision for the transform coefficient buffer, residual buffer, and pre-filter buffer, respectively, plus the $58.75N^2$-byte smart buffer. A direct design with a 64x64 block size (N=32) will need KB SRAM. The buffer size can be reduced to the Y buffer size alone if Y and CbCr data are interleaved.

The optimized buffer size should consider the trade-off between the computational dependency and efficiency within and between the modules, the buffer cost and the external memory bandwidth, which is derived below.

For the first stage, the only restriction on the buffer between CABAD and IQ/IT is that it must accommodate the maximum IT size, 32x32. A buffer larger than 32x32, say 64x64, is unnecessary and does not provide any remarkable benefit, whereas a smaller buffer cannot compute a larger transform without complicated scheduling. In addition, the transform coefficients decoded by CABAD have to be reordered before the inverse transform. Hence, we set the transform coefficient buffer size to 32x32x2 instead of 64x64x2. The corresponding IT pipeline block was set to 32x32 for the first 1-D IT. However, the second 1-D IT can be decomposed into two units with even-odd decomposition. Thus, the maximum required size of the IT output buffer, the residual buffer, is 16x16x2 instead of 32x32x2, which saves 3/4 of the SRAM.

For the second stage, Fig.
2 shows the ratios of the external data access increase and the smart buffer size for MC interpolation, relative to a 64x64 block, for a typical video sequence. The amount of data access depends on the selected PU size and the MC pipeline block size. If the PU is larger than the MC pipeline block, the PU access must be split into smaller accesses that fit the pipeline block size, which results in accessing duplicated data. Thus, a smaller pipeline block leads to much higher data access even though it has a smaller buffer cost. Therefore, we chose 16x16 as a tradeoff. The above analysis ignored the reuse of duplicated data between blocks. If we consider the possible data reuse schemes discussed in Section III, the smart buffer size can be increased to (32x32)x2 = 2048 Bytes to save memory bandwidth for list 0 and list 1.

Fig. 2 Ratio of data access increase and buffer size relative to the 64x64 pipeline block.

For the final stage, the pre-filter pixel buffer stores the reconstructed output for the following deblocking filter. Thus, the stage buffer size was set to 16x16x2 to store data from the previous stage with a 16x16 pipeline block. Its internal processing is decomposed into 8x8 sub-pipeline blocks to fit the 8x8-grid edges of the deblocking filter.

For the line buffers that store neighboring information, such as neighboring MVs, prediction information, reference data for intra prediction, and deblocking data, we store only part of them on-chip (e.g., three LCU-wide buffers) and the rest off-chip to reduce buffer sizes, similar to previous H.264 decoder chips and an HEVC decoder design [6].

In summary, the pipeline size was set to 32x32 for CABAD and IQ/IT and to 16x16 for the rest of the modules. The stage buffer size was 7.232KB, which saves 62% and 91% of the buffer cost when compared with the 32x32 and 64x64 cases, respectively.

C. Scheduling of the proposed design

Fig.
3 shows the mixed block size scheduling of the proposed design with four pipeline stages and interleaved luma and chroma block processing. First, the design decodes the bitstream with CABAD at the first stage in a 32x32 block size and then applies IQ and the first 1-D IT at the second stage in a 32x32 block size for luma blocks. The second 1-D IT operates on a 16x16 block. The chroma blocks are processed in a 16x16 block size. Once the required data are available, the subsequent operations can start. The cycle count in each module is not fixed because of the variable PU size and different complexity.

III. COMPONENT DESIGNS

A. CABAD

For the targeted 4096x2160@30fps HEVC decoder design, the required number of bins was 110 Mbins/sec to 120 Mbins/sec based on our simulations for the target bit rate specification [1]. The limits on particular CTUs and 32x32 units could be much higher for the worst-case condition (e.g., 4x4 block with CTB), which rarely occurs in practical conditions. Thus, based on the simulations, 270 Mbins/sec is enough to cover bit rate variations. A single-bin-per-cycle architecture was therefore proposed, as shown in Fig. 4, with a throughput of 270 Mbins/s when operating at 270 MHz. The input bitstream is first decoded through binary arithmetic decoding, and the decoded bin is then passed directly to the context selection because of the smaller critical delay. The context selection produces one context address from two parallel options, selected by the decoded bin, to avoid dependency and ensure a single bin per cycle. The context address and the context state from the context modeling are then sent to the binary arithmetic decoding for the next bin.

B. Inverse transform

The inverse transform matrix in HEVC possesses the property of coefficient symmetry: the same coefficients are present in the even rows, and similar coefficients are present in the odd rows, for all TU sizes. Thus, this paper proposes a reconfigurable inverse transform architecture [7] that is well suited to various TU sizes. This architecture adopts the commonly used row-column decomposition to convert a 2-D transform into two consecutive 1-D transforms. Each 1-D 32-point transform is further decomposed into two 1-D 16-point transforms with even-odd decomposition. The even part of a 16-point transform can again be decomposed into smaller transforms to adapt to various TU sizes by reusing the same coefficients. The odd part, however, has different but similar coefficients. Thus, we decompose the coefficients in the odd part into a base coefficient for one TU size and a refinement term for another TU size to compensate for the difference. The scheduling of the proposed design processes each 32x32 unit row by row, regardless of the TU combination within a single 32x32 unit.
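As a small illustration of the even-odd decomposition mentioned above, the following sketch applies it to the 4-point HEVC core transform and checks it against a direct matrix product. It is not the authors' reconfigurable hardware: the function names are hypothetical, only the 4-point case is shown, and the rounding and scaling stages of a real decoder are omitted.

```python
# HEVC 4-point core transform matrix; each row is one basis function.
M4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]


def inverse_direct(coeffs):
    """Reference 1-D inverse transform: x[n] = sum over k of M4[k][n] * c[k]."""
    return [sum(M4[k][n] * coeffs[k] for k in range(4)) for n in range(4)]


def inverse_even_odd(coeffs):
    """Same result using even-odd decomposition (illustrative helper).

    Even-indexed coefficients feed the even part, odd-indexed ones the odd
    part; the outputs are formed by one add and one subtract per pair, which
    exploits the symmetry of the matrix columns.
    """
    c0, c1, c2, c3 = coeffs
    e0 = 64 * c0 + 64 * c2   # even part, outputs 0 and 3
    e1 = 64 * c0 - 64 * c2   # even part, outputs 1 and 2
    o0 = 83 * c1 + 36 * c3   # odd part, outputs 0 and 3
    o1 = 36 * c1 - 83 * c3   # odd part, outputs 1 and 2
    return [e0 + o0, e1 + o1, e1 - o1, e0 - o0]


if __name__ == "__main__":
    sample = [10, -3, 7, 2]
    print(inverse_direct(sample))
    print(inverse_even_odd(sample))
```

The even part of a larger transform has the same structure as the next smaller transform, which is what allows the same coefficients to be reused across TU sizes.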
Moreover, one 32-point row is reconfigured into any legal TU combination from (4, 8, 16, 32) to maintain regular processing and full hardware utilization.

Fig. 3 Schedule of the decoder (32x32 blocks B0 to B3 across CABAD, IQ/IT, MC/intra prediction and in-loop filtering; 1098 cycles marked)

Fig. 4 Single bin CABAD architecture

C. Deblocking filter and SAO

Fig. 5 shows the interleaved scheduling of the proposed non-pipelined deblocking filter (see the details in Table I), which operates on a 4x4 block basis. The processing of luma and chroma blocks is also interleaved. The number of clock cycles for a block is 182. The proposed filter design first takes one line of pixels from two adjacent 4x4 blocks as input in every cycle, computes the strength and finally applies the suitable filter. Each edge in a 4x4 block takes a total of 8 cycles: 2 cycles for boundary strength, 1 cycle for filter strength, 4 cycles for edge filtering, and 1 cycle to write back the remaining output.

In the interleaved scheduling, horizontal filtering is first performed over the two vertical edges, from the top four-line unit to the bottom four-line unit (edge 1 and edge 2). Because all the data are then available for the horizontal edge 3, vertical filtering is performed next over the top horizontal edge (edge 3 to edge 4). This horizontal-vertical interleaved approach is repeated for every 8x8 block in the Z-scan order. It reuses all the available data immediately to reduce the intermediate buffer cost.

Table I. Scheduling of the deblocking filter

Cycle unit (8 cycles)   P block   Q block   Processing edge   Filtered block to SAO
C1                      B4        B5        E1
C2                      B8        B9        E2
C3                      B0        B4        E3                B0, B4
C4                      B1        B5        E4                B1, B5

Fig. 5 Interleaved scheduling of the proposed deblocking filter

Fig. 6 shows the deblocking filter architecture, in which horizontal and vertical filtering share the same filters. The input comes from the pre-filter buffer or from the line buffer that stores partially filtered data, depending on the edge location. All memory access schemes follow the interleaved scheduling.

Fig. 7 shows the proposed SAO design, which is tightly coupled with the deblocking filter. This design receives four columns of 8-pixel input from the deblocking filter every 16 cycles, then computes and accumulates the offset cost for band offset (BO) or edge offset (EO). Once the data are ready, it selects the minimum as the final offset operation for every 8x8 block. BO has no dependency on different rows of pixels. EO has four patterns (horizontal, vertical, 45 degree and 135 degree), all of which except the horizontal pattern require the neighboring upper and lower rows of reference pixels. The cost computation is a direct implementation, as these equations have been optimized in the standard.
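The row dependency of the four EO patterns can be made concrete with a short sketch of edge-offset classification. The neighbor positions and the category mapping follow the HEVC edge-offset definition; the function and table names are hypothetical, and this is an illustration of the classification step only, not of the SAO datapath in Fig. 7.

```python
# Neighbor offsets (dx, dy) of the two reference pixels for each EO pattern.
EO_NEIGHBORS = {
    "horizontal": ((-1, 0), (1, 0)),
    "vertical": ((0, -1), (0, 1)),
    "135_degree": ((-1, -1), (1, 1)),
    "45_degree": ((1, -1), (-1, 1)),
}


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def eo_category(pixels, x, y, pattern):
    """Classify pixel (x, y) against its two neighbors along the pattern.

    Returns 1 for a local minimum, 2 or 3 for edge pixels, 4 for a local
    maximum, and 0 when no offset applies.
    """
    (dx0, dy0), (dx1, dy1) = EO_NEIGHBORS[pattern]
    center = pixels[y][x]
    first = pixels[y + dy0][x + dx0]
    second = pixels[y + dy1][x + dx1]
    total = sign(center - first) + sign(center - second)
    return {-2: 1, -1: 2, 0: 0, 1: 3, 2: 4}[total]


def rows_needed(pattern):
    """Relative row offsets that a pattern reads, including the current row."""
    return sorted({dy for _, dy in EO_NEIGHBORS[pattern]} | {0})


if __name__ == "__main__":
    block = [
        [10, 10, 10],
        [3, 5, 9],
        [10, 10, 10],
    ]
    for name in EO_NEIGHBORS:
        print(name, eo_category(block, 1, 1, name), rows_needed(name))
```

Only the horizontal pattern stays within the current row, which is consistent with the statement that the other three patterns need the upper and lower reference rows.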
Fig. 6 Architecture of the proposed deblocking filter

Fig. 7 Architecture of the proposed SAO

D. Reconstruction of intra prediction

The main challenges in the design of intra prediction were the various PU sizes, ranging from 64x64 to 4x4, and the 35 prediction modes, which complicate computation and on-chip data fetching. A previously reported design [11] used several parallel data paths to calculate the prediction equations for different PU sizes, but the only supported PU sizes were 4x4 and 8x8. Extending this method to more PU sizes would result in low hardware utilization. Therefore, we propose an efficient hardware architecture suitable for all intra PU sizes, as shown in Fig. 8, which involves a common intra prediction module with the 4x4 bottom-up structure and an adaptive sample fetch controller to compute predictions for different PU sizes in a regular manner. The proposed intra prediction module also implements the DC mode with tree-based average buffers, the directional mode with pixel-based prediction mapping, and PU size adaptive reconstruction buffers for different PU sizes.

Fig. 9 displays the corresponding schedule for each PU size with a two-stage sub-pipeline. In this schedule, each 4x4 block requires 9 cycles to load the reference data, calculate the prediction values and wait for reconstruction. In addition, based on the decoding order, the chroma blocks are computed after the luma block is completed for each PU. Therefore, the total necessary cycles are 216 per 16x16 block.

Fig. 8 Architecture of the intra prediction

Fig. 9 Decoding schedule of intra prediction for each PU size (two-stage sub-pipeline of data loading and 2-pixel prediction, 9 cycles per 4x4 block; PU 4x4 and 8x8: 36 + 9 (U) + 9 (V) = 54 cycles for an 8x8 block; PU 16x16 and 32x32: 144 + 36 (U) + 36 (V) = 216 cycles for a 16x16 block)

Fig. 10 The intra prediction datapath
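The cycle counts of the intra prediction schedule follow directly from the 9 cycles per 4x4 block and the 4:2:0 format. The sketch below reproduces that arithmetic; the function name is hypothetical, and it assumes that chroma is also processed in 4x4 units at 9 cycles each, which matches the per-component figures in the schedule.

```python
CYCLES_PER_4X4 = 9  # load reference data, predict, wait for reconstruction


def intra_cycles(block_size):
    """Return (luma, cb, cr, total) cycles for a block_size x block_size block.

    Assumes 4:2:0 sampling, so each chroma component covers
    (block_size / 2) x (block_size / 2) samples, and assumes every 4x4 unit
    costs the same 9 cycles.
    """
    luma_units = (block_size // 4) ** 2
    chroma_units = (block_size // 8) ** 2  # per chroma component
    luma = luma_units * CYCLES_PER_4X4
    cb = chroma_units * CYCLES_PER_4X4
    cr = cb
    return luma, cb, cr, luma + cb + cr


if __name__ == "__main__":
    print(intra_cycles(8))   # 8x8 block
    print(intra_cycles(16))  # 16x16 block
```

For a 16x16 block this gives 16 luma units and 4 units per chroma component, that is 144 + 36 + 36 = 216 cycles.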
1) 4x4 bottom-up structure and PU size adaptive sample fetching

To support the recursive combination of PUs with a simple control, a 4x4 block was selected as the basic unit to form a bottom-up structure. That is, each 16x16 block is processed in a double-z-scan order with 16 basic units instead of row by row. Therefore, the processing order of each block is regular, regardless of the PU sizes.

For this processing, we implemented a PU size-adaptive sample fetch controller, which fetches the corresponding reference samples adaptively based on the PU size and the coordinates of the current block, because the prediction equations of different PU sizes are similar. The reconstructed data are stored in the PU size adaptive reconstruction buffer (Fig. 11), which can adapt to different PU sizes. With this buffer, no extra buffer is needed for each PU size.

Fig. 11 An example of the PU size-adaptive reconstruction buffer

2) Common intra prediction modules for all PU sizes

Fig. 10 shows a common intra prediction data path for different PU sizes with a 2-pixel-per-cycle throughput to satisfy the processing rate. In this data path, the planar, DC, and directional modes are separated into independent flows because of their different prediction equations. The corresponding reference sample filters are also integrated into the prediction datapath but are not shown, for figure clarity.

Planar mode: Fig. 10 and Fig. 12 show the prediction data path and buffer for the planar mode. With the 2-pixel-per-cycle throughput, the adaptive sample fetch controller updates the top reference pixels, $T_0$ and $T_1$, every cycle and the left reference pixel, $L$, every two cycles. The values of $T_N$ and $L_N$ are updated only for a new PU. Next, the values of the bottom reference pixels, $B_0$ and $B_1$, and the right reference pixel, $R$, for the following blocks are calculated with subtractors, as displayed in Fig. 12. Finally, the prediction values are computed with a position-related linear combination, which is simplified with shift operators as follows:

$$\mathrm{Pred}_0(x_0, y) = \{[(T_0 \ll n) + (B_0 \times y)] + [(L \ll n) + (R \times x_0) + 2^n]\} \gg (n + 1) \quad (3)$$

$$\mathrm{Pred}_1(x_1, y) = \{[(T_1 \ll n) + (B_1 \times y)] + [(L \ll n) + (R \times x_1) + 2^n]\} \gg (n + 1) \quad (4)$$

Fig. 12 The prediction buffers of the planar mode

DC mode with tree-based average buffers: The DC mode requires all the reference samples to estimate the average value for different PU sizes and positions. Thus, to share these average values across the different cases, two four-level tree-based average buffers are implemented, one for the left and one for the above reference samples.

Directional mode with pixel-based prediction mapping: In the current HEVC reference software, a single function covers all 33 prediction modes to simplify the software implementation. To fit into this function, the top reference and the left reference samples are exchanged first when the prediction mode belongs to the horizontal modes (modes 2 through 17). Then, the prediction values are calculated with the corresponding vertical modes (modes 19 through 33). After all prediction values in a PU are computed, the block is flipped back. In hardware, however, this flow is a block-based operation that requires additional buffers to store the unflipped block, and time is wasted on flipping. To solve these problems, the common prediction equation was decomposed into a pixel-based mapping equation with the angle parameters shown in Table II, which does not need to flip the PU:

$$\mathrm{Pred}(x, y) = [(32 - f(x, y)) \times S_0 + f(x, y) \times S_1 + 16] \gg 5 \quad (5)$$

$$f(x, y) = [(x\_ang \times x) + (y\_ang \times y)] \bmod 32 \quad (6)$$

where $x$ and $y$ are the coordinates of the pixel, and $S_0$ and $S_1$ are the corresponding reference samples. In addition, the equation to calculate the extension of the main reference samples was also modified as follows.
$$\mathrm{Ext\_Index}(x, y) = [128 + invang \times ang(x, y)] \gg 8 \quad (7)$$

$$ang(x, y) = \begin{cases} [(x\_ang \times x) + (y\_ang \times y)] \gg 5, & mode < 11 \text{ or } mode \ge 26 \\ [(y\_ang \times y) - (x\_ang \times x)] \gg 5, & 11 \le mode < 18 \\ [(x\_ang \times x) - (y\_ang \times y)] \gg 5, & 18 \le mode < 26 \end{cases} \quad (8)$$

Table II. The angle parameter of the directional modes

Mode   x_ang   y_ang   invang

E. Motion compensation

The MC design in HEVC is more complex than in previous standards because of the various PU sizes, more complicated interpolation, and MVP, which considerably increase the computation irregularity, buffer size and bandwidth. In the proposed MC design, the bandwidth and buffer size were reduced first with block access, precision-based access, and a smart buffer. Then, the computation irregularity was avoided and the area cost was reduced with corner index-based MVP generation and 4x4 interpolators with common subexpression sharing.

1) Bandwidth reduction

The memory bandwidth reduction method in MC reuses the reference samples in the highly overlapped data between different blocks, similar to the previous approaches [15]-[17] for H.264. Other approaches, such as efficient DRAM data mapping [24] and embedded compression [25], can be combined directly to save more bandwidth. The current work focuses on data reuse only, based on three approaches: partition-based access, precision-based access and cache-based design.

The partition-based access loads reference data according to different partition sizes. A straightforward method to load reference data is to decompose a partition into 4x4 blocks and load an 11x11 block for each 4x4 interpolation. However, if the partition is larger than 4x4, the reference data of the 4x4 blocks in the same partition overlap heavily, and they can be reused to reduce the bandwidth.
Thus, the previous work for H.264 [15]-[17] either saved the upper and upper-left blocks to reuse the overlapped part of the current block [15] or accessed (M+5)x(N+5) blocks for MxN blocks with M, N = 4 or 8 [16], or up to 16 in [17]. In general, a larger access size needs less bandwidth but increases the buffer cost. This method also works for HEVC. However, the maximum PU size of HEVC is 64x64 instead of the 16x16 in H.264, which would lead to significant data storage. In the current design, 16x16 block-based data access was adopted instead of 64x64, which reuses overlapped samples for PU sizes up to 16x16. Moreover, the proposed smart buffer is also capable of reusing data for larger PUs, as described later. Thus, this size selection is compatible with the processing unit in MC and provides a good tradeoff between the buffer cost and the bandwidth.

The precision-based data access loads the reference data according to the fractional MV position, similar to previous H.264 work [18]-[21]. For HEVC, an MxN PU loads MxN, (M+7)xN, Mx(N+7) or (M+7)x(N+7) data according to its fractional MV position, which avoids loading the (M+7)x(N+7) data for all types of MV positions.

A cache in MC can further increase data reuse between adjacent block accesses (e.g., [21]-[23] for H.264) to reduce the bandwidth. The design in [22] proposed a 6-way cache to reduce the bandwidth by more than 70%. However, it used a large memory (6x Bytes), a large number of registers and complex address mapping to implement the cache. These issues were also found in the HEVC design of [5]. Moreover, the PU size in HEVC is up to 64x64, which is much larger than the 16x16 in H.264. To address these issues, this paper proposes a cache design with a simplified addressing scheme, denoted as a smart buffer, as shown in Fig. 13. Table III lists the percent savings of the data loading amount compared with the original 4x4 block-based loading. The proposed approach saves 88% on average. Other sequences show similar results. The table also shows that the savings for cache sizes larger than 32x32 are quite similar. Thus, 32x32 was chosen for the current method, which fits our selection of the processing unit and access size.
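The partition-based and precision-based access sizes described above can be compared with a few lines of arithmetic. The sketch below counts luma reference samples only, assumes motion vectors in quarter-sample units and an 8-tap filter that needs 7 extra samples in each filtered direction; the function names are hypothetical. It does not model the smart buffer, so it is not expected to reproduce the measured savings in Table III.

```python
def luma_ref_size(m, n, mv_x, mv_y, taps=8):
    """Reference window (width, height) for an m x n luma block.

    mv_x and mv_y are in quarter-sample units. Extra samples are loaded only
    in a direction whose MV component has a fractional part.
    """
    extra = taps - 1
    width = m + (extra if mv_x % 4 else 0)
    height = n + (extra if mv_y % 4 else 0)
    return width, height


def samples_4x4_based(m, n):
    """Baseline: split the block into 4x4 units and load 11x11 for each."""
    return (m // 4) * (n // 4) * 11 * 11


def samples_block_based(m, n, mv_x, mv_y):
    """Load one window for the whole block, sized by the fractional MV."""
    width, height = luma_ref_size(m, n, mv_x, mv_y)
    return width * height


if __name__ == "__main__":
    print(samples_4x4_based(16, 16))          # sixteen 11x11 windows
    print(samples_block_based(16, 16, 5, 6))  # both components fractional
    print(samples_block_based(16, 16, 5, 8))  # horizontal fraction only
    print(samples_block_based(16, 16, 4, 8))  # integer MV
```

For a 16x16 block, the 4x4-based baseline loads 1936 samples, whereas a single window needs at most 23x23 = 529 samples and only 256 when the MV is an integer.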
The proposed design partitions a 32x32 buffer into sixteen 8x8 blocks with a double-z-scan (DZS) index, as shown in Fig. 14, to fit the DZS computing order in HEVC. The same partition method and DZS index were also applied to a 64x64 block (Fig. 15). An 8x8 block in a 64x64 block can thus be easily mapped to a block in the smart buffer using the DZS index modulo 16 as the mapping function. This addressing scheme provides an easy way to update the smart buffer block without any complex address control.

Table III. Percent savings of data loading amount compared with the 4x4 block-based access

Method      BasketballDrive   RaceHorsesC   BasketballPass   Avg.
B           24.7%             24.3%         68.1%            39.0%
C           79.9%             78.9%         84.2%            81.0%
D(16x16)    82.1%             80.9%         86.1%            83.0%
D(32x32)    89.1%             86.9%         88.1%            88.0%
D(64x32)    90.6%             88.2%         88.7%            89.1%
D(64x64)    91.5%             89.2%         89.4%            90.0%

Note: B: precision-based request (4x4 based); C: B + 16x16 block size request; D: C + smart buffer.

Fig. 15 A smart buffer example (8x8 pixels for each word)

2) MC design

The proposed MC architecture is divided into two stages. The first stage computes the final MV from the MVP and the motion vector difference (MVD) and accesses the reference data. After the reference pixels of a 4x4 block are ready in the stage buffer, the second stage starts to interpolate the reference data and reconstruct the pixels.

To generate MVs in a regular way that adapts to variable PU sizes, the basic block of MVs is set to 4x4. Thus, the neighboring MVs are stored on a 4x4 basis instead of a PU basis to provide a regular way of accessing them. Next, the corner index of the PU is computed to access the corresponding MVP candidates for different PU sizes, as shown in Fig. 16. If the size of the PU is larger than, only the top leftmost 4x4 block will then go through the MVP generation.
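The modulo-16 mapping of the smart buffer described above can be sketched as follows, under the assumption that the double-z-scan index of an 8x8 block coincides with the usual Z-order (Morton) index of its block coordinates inside the 64x64 block. The function names are hypothetical, and the tag compare and reload logic of Fig. 13 is not modeled.

```python
def dzs_index(bx, by, bits=3):
    """Z-order index of the 8x8 block at column bx, row by.

    With bits=3 the coordinates span an 8x8 grid of blocks, i.e. a 64x64
    pixel area. Column bits go to even positions, row bits to odd positions.
    """
    index = 0
    for i in range(bits):
        index |= ((bx >> i) & 1) << (2 * i)
        index |= ((by >> i) & 1) << (2 * i + 1)
    return index


def smart_buffer_slot(bx, by):
    """Slot (0..15) in the 32x32 smart buffer for a block of the 64x64 area."""
    return dzs_index(bx, by) % 16


if __name__ == "__main__":
    for by in range(8):
        print([smart_buffer_slot(bx, by) for bx in range(8)])
```

Because the low four bits of a Z-order index depend only on the low two bits of each coordinate, the slot of a block equals its Z-order index inside its own 32x32 quadrant, so the mapping needs no arithmetic beyond dropping the upper index bits.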
For interpolation, the 4x4 block was chosen as the basic unit to process the recursive PU combinations with a bottom-up structure, as shown in Fig. 17. The area cost of the filters was reduced through hardware sharing for the same coefficients and the common subexpression sharing for different coefficients. Thus, the proposed interpolator can save 19.77% and 55% of the area cost compared with the original costs for the luma and chroma components, respectively.

Fig. 13 The smart buffer architecture

Fig. 14 Double-z scan order

Fig. 16 Corner position for different partition sizes
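To make the 4x4 basic unit concrete, the following sketch performs separable luma interpolation for one 4x4 block. The 8-tap coefficients are the HEVC luma interpolation filters; everything else (the function names, the 11x11 reference window layout, and the single combined rounding step) is an illustrative assumption and is not bit-exact with the standard's intermediate rounding or with the proposed hardware.

```python
# HEVC 8-tap luma filters, indexed by quarter-sample fraction (0..3).
LUMA_TAPS = {
    0: (0, 0, 0, 64, 0, 0, 0, 0),
    1: (-1, 4, -10, 58, 17, -5, 1, 0),
    2: (-1, 4, -11, 40, 40, -11, 4, -1),
    3: (0, 1, -5, 17, 58, -10, 4, -1),
}


def fir8(samples, taps):
    """Apply one 8-tap filter to eight consecutive samples."""
    return sum(s * t for s, t in zip(samples, taps))


def horizontal_pass(ref, frac_x):
    """ref is an 11x11 window; ref[3][3] is the block's top-left integer pixel.
    Returns 11 rows x 4 columns of horizontally filtered values."""
    taps = LUMA_TAPS[frac_x]
    return [[fir8(row[c:c + 8], taps) for c in range(4)] for row in ref]


def interpolate_4x4(ref, frac_x, frac_y):
    """Horizontal pass, then vertical pass, then one rounding and clipping step."""
    hor = horizontal_pass(ref, frac_x)
    taps = LUMA_TAPS[frac_y]
    out = []
    for r in range(4):
        row = []
        for c in range(4):
            v = fir8([hor[r + k][c] for k in range(8)], taps)
            v = (v + 2048) >> 12  # both passes have a gain of 64 (simplified rounding)
            row.append(min(255, max(0, v)))
        out.append(row)
    return out
```

The horizontal pass yields 11 x 4 = 44 intermediate values, which is the local storage that a horizontal-first ordering implies for one 4x4 block.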
To further reduce the temporary buffer cost, all the interpolations are scheduled at the earliest opportunity (Fig. 17). The original scheduling computes all the horizontal filters first, stores the results, and finally computes the vertical filter, which requires saving 44 results in the local buffer. The proposed scheduling computes the left half of the horizontal filters first, directly consumes the results with the left-half vertical filters, and then repeats these two steps for the right half. With this order, only 22 local results need to be saved.

Fig. 17 Proposed computational order of the luma interpolator (legend: pixel in current 4x4, reference pixel, horizontal FIR result, vertical FIR result, compute order)

IV. IMPLEMENTATION RESULTS AND COMPARISONS

The proposed decoder has been implemented and synthesized with the TSMC 90 nm 1P9M CMOS technology with main profile tools support, as shown in Tables IV and V. The prediction part occupies the largest area because of the various prediction modes and complex memory data access, while the inverse transform occupies the second largest area because it supports sizes up to 32x32. Table VI summarizes the techniques used in the current design to reduce the buffer size and the area cost while also increasing the adaptability to variable size processing.

Tables VII and VIII show comparisons with the previously published designs [4]-[6][26]. All of these designs support the main profile tools with B-frames, except [6], which implemented an earlier HEVC version with CAVLD as the entropy decoder and without any SAO. [4] requires the largest area and buffer cost because of its 64x64 pipeline. When compared with the intra prediction in [4], the proposed intra prediction reduces the gate count by 71.5% and the buffer size by 83% because of the common prediction units and smaller pipeline size. Other modules in [4] also need a higher area cost than those of the proposed design. The design in [6] has a gate count and buffer cost similar to those of the proposed design but is only capable of 1920x1080@30fps video processing. However, a detailed comparison was not possible because detailed design information is unavailable.

Both the proposed design and [5] can support similar real-time decoding capability with variable-size pipelines. However, [5] uses a 64x64 pipeline size as the worst case, which significantly increases the area cost. By contrast, our pipeline size is 32x32 in the first stage and is reduced for the remaining stages. The smaller pipeline size decision and module optimization also help to reduce the gate count. The proposed design reduces the gate count by 35% and the buffer size by 87.2% in comparison to [5]. A detailed module comparison in Table VII shows that every module in the proposed design needs a lower gate count than that in [5]. The major gate count savings come from the module optimizations, such as single bin CABAD, reconfigurable prediction datapath, interleaved deblocking filter, and smart buffer scheme for MC cache (which avoids complicated register files in the cache tag). A commercial IP from [26] supports 4Kp120 processing with a 700K gate count, and no other details are disclosed. However, its on-chip memory requirement is as high as 151 KB.

Table IV. Detailed gate count of the proposed decoder

Pipeline stage   Module                      Gate count (270 MHz, 90 nm)
                 Control                      21,342
1st stage        CABAD                        48,430
2nd stage        Inverse Quantization         48,556
                 Inverse Transform            63,844
                 Motion Vector Generation     84,730
                 Reference Pixel Accessing    15,841
                 Smart buffer                 47,130
3rd stage        Interpolation                18,742
                  -luma interpolator          14,462
                  -chroma interpolator         4,280
                 Intra Prediction             76,812
4th stage        Deblocking Filter            20,551
                 SAO                          21,752
Total                                        467,730

Table V. Buffer size of the proposed design

Module                       On-chip memory (Byte)
Inverse transform            2048
MC                           1230
Intra prediction             704
SAO                          384
Others                       4180
Pipeline ping-pong buffer
Coefficients
Smart buffer
Residual buffer              576
-Pre-filter pixel            512
Total                        15778
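The headline numbers can be cross-checked with a few lines of arithmetic. The sketch below sums the module gate counts of Table IV, recomputes the gate-count saving against the 715K reported for [5], and derives the pixel rate and the clock cycles available per pixel for 4096x2160 at 30 fps and 270 MHz. The variable names are illustrative only.

```python
# Module gate counts from Table IV (the interpolation entry already
# includes its luma and chroma sub-items).
TABLE_IV = {
    "Control": 21_342,
    "CABAD": 48_430,
    "Inverse Quantization": 48_556,
    "Inverse Transform": 63_844,
    "Motion Vector Generation": 84_730,
    "Reference Pixel Accessing": 15_841,
    "Smart buffer": 47_130,
    "Interpolation": 18_742,
    "Intra Prediction": 76_812,
    "Deblocking Filter": 20_551,
    "SAO": 21_752,
}

total_gates = sum(TABLE_IV.values())

# Gate-count saving relative to the 715K gates reported for [5].
saving_vs_ref5 = 1 - 467 / 715

# Pixel rate for 4096x2160 at 30 fps and cycles per pixel at 270 MHz.
pixels_per_second = 4096 * 2160 * 30
cycles_per_pixel = 270e6 / pixels_per_second

print(total_gates)
print(f"{saving_vs_ref5:.1%}")
print(pixels_per_second, round(cycles_per_pixel, 3))
```

The sum reproduces the 467,730 total, the saving rounds to 35%, and the decoder has roughly one clock cycle per output pixel (about 1.02).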
Table VI. Summary of the design techniques

Module              Techniques
CABAD               Single bin per cycle
IT                  Reconfigurable for all TU sizes
Intra prediction    A common intra prediction module for various PU sizes
                    4x4-based bottom-up structure
                    Adaptive sample fetch controller
MC                  4x4-based structure for various PU sizes
                    Cache with simplified addressing scheme
                    Corner position computation for MVP for various PU sizes
Deblocking filter   Interleaved processing to save intermediate buffer
Pipeline            Mixed block size (32x32 and ) to save pipeline buffer

Table VII. Module comparison with other designs

Design feature     [5]                                  [4]                          Proposed
Standard           HEVC WD4                             HEVC WD6                     HEVC
Gate count         715K                                 1763K                        467K
-Entropy decoder   94.5K (CAVLD)                        138K (CABAC)                 48.4K (CABAC)
-ITIQ              121.1K                               779K                         112.4K
-Prediction        K (inter + intra)                    270K (intra)                 76.8K (intra)
-MC                126K (cache)                         440K (inter + cache)         166.4K (inter + cache)
-deblocking        K                                    84K                          20.5K
-SAO               NA                                   32K                          21.7K
-others            131.2K (register files for cache     20K (signal communication)   21K
                   tag, MEM interface)

Table VIII. Decoder comparison with other designs

Design feature            [5]                      [4]                [6]         [26]                      Proposed
Standard                  HEVC main                HEVC WD6           HEVC        HEVC main/main10          HEVC main
Version                   WD4 (no SAO, CAVLC,      WD6                NA          Final                     Final
                          fewer intra modes)
Technology                40nm                     65nm               90nm        NA                        90nm
Frequency                 200MHz                   266MHz             224MHz      600MHz                    270MHz
Resolution                3840x2160                                   1920x1080                             4096x2160
Frame rate                30 fps                   60 fps             35 fps      120 fps                   30 fps
Pipeline                  Variable                 64x64 (7 stages)   4 stages    NA                        Variable (4 stages)
Gate count                715K                     1763K              446.6K      700K (8b)/840K (10b)      467K
Memory (KB)                                                                       151 (8b)/179 (10b)
Throughput (Mpixel/sec)

V. CONCLUSION

This paper has presented a real-time HEVC decoder that is capable of decoding 4096x2160@30fps at 270 MHz. The proposed design adopts a 4-stage mixed-block-size pipeline to create a tradeoff between the buffer cost and the bandwidth, and it has reduced the stage buffer cost by approximately 90%. The required bandwidth was reduced by approximately 88% through block access, precision-based access and a smart buffer scheme. The complex irregular computation was unified by a reconfigurable design and 4x4 block-based processing for the inverse transform and prediction, respectively. With these approaches, the gate count was reduced to 467 K and the on-chip memory to 15,778 bytes (Table V). These values correspond to a saving of 35% in gate count and 87.2% in buffer cost when compared with the other published design with the same processing capability.

REFERENCES

[1] High Efficiency Video Coding, ITU-T Rec. H.265, Apr.
[2] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) standard," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, Dec. 2012, pp.
[3] B. Li, G. J. Sullivan, and J. Xu, "Comparison of compression performance of HEVC Draft 9 with AVC high profile and performance of HM9.0 with temporal scalability characteristics," JCTVC-L0322, Jan.
[4] S. Cho and H. Kim, "Hardware implementation of a HEVC decoder," JCTVC-L0096, Jan.
[5] C.-T. Huang, M. Tikekar, C. Juvekar, V. Sze, and A. Chandrakasan, "A 249Mpixel/s HEVC video-decoder chip for Quad Full HD applications," in ISSCC Dig. Tech. Papers, 2013, pp.
[6] C.-H. Tsai, H.-T. Wang, C.-L. Liu, Y. Li, and C.-Y. Lee, "A 446.6K-gates V H.265/HEVC decoder for next generation video applications," in Proc. ASSCC, 2013, pp.
[7] P.-T. Chiang and T. S. Chang, "A reconfigurable inverse transform architecture design for HEVC decoder," in Proc. ISCAS, 2013, pp.
[8] J.-S. Park, W.-J. Nam, S.-M. Han, and S. Lee, "2-D large inverse transform (, 32x32) for HEVC (High Efficiency Video Coding)," J. Semi. Tech. Sci., vol. 12, no. 2, pp., June
[9] J. Zhu, Z. Liu, and D. Wang, "Fully pipelined DCT/IDCT/Hadamard unified transform architecture for HEVC codec," in Proc. ISCAS, 2013, pp.
[10] W. Zhao, T. Onoye, and T. Song, "High-performance multiplierless transform architecture for HEVC," in Proc. ISCAS, 2013, pp.
[11] E. Kalali, Y. Adibelli, and I. Hamzaoglu, "A high performance and low energy intra prediction hardware for HEVC video decoding," in Proc. DASIP,
[12] Z. Guo, D. Zhou, and S. Goto, "An optimized MC interpolation architecture for HEVC," in Proc. ICASSP, 2012, pp.
[13] V. Afonso, H. Maich, L. Agostini, and D. Franco, "Low cost and high throughput FME interpolation for the HEVC emerging video coding standard," in Proc. LASCAS, 2013, pp.
[14] W. Shen, Q. Shang, S. Shen, Y. Fan, and X. Zeng, "A high-throughput VLSI architecture for deblocking filter in HEVC," in Proc. ISCAS, 2013, pp.
[15] D.-Y. Shen and T.-H. Tsai, "A 4x4-block level pipeline and bandwidth optimized motion compensation hardware design for H.264/AVC decoder," in Proc. ICME, 2009, pp.
[16] C.-C. Lin, J.-I. Guo, H.-C. Chang, Y.-C. Yang, J.-W. Chen, M.-C. Tsai, and J.-S. Wang, "A 160kGate 4.5kB SRAM H.264 video decoder for HDTV applications," in ISSCC Dig. Tech. Papers, 2006, pp.
[17] C. Li, K. Huang, X. Yan, J. Feng, D. Ma, and H. Ge, "A high efficient memory architecture for H.264/AVC motion compensation," in Proc. ASAP, 2010, pp.
[18] R.-G. Wang, J.-T. Li, and C. Huang, "Motion compensation memory access optimization strategies for H.264/AVC decoder," in Proc. ICASSP, 2005, vol. 5, pp.
[19] Y. Li and Y. He, "Bandwidth optimized and high performance interpolation architecture in motion compensation for H.264/AVC HDTV decoder," J. Signal Proc. Syst., vol. 52, no. 2, pp., Aug.
[20] E. Matei, C. Praet, J. Bauweklinck, P. Cautereels, and E. Lumley, "Novel data storage for H.264 motion compensation: system architecture and hardware implementation," EURASIP J. Image and Video Proc., pp. 1-12, Dec.
[21] Y. Li, Y. Qu, and Y. He, "Memory cache based motion compensation architecture for HDTV H.264/AVC decoder," in Proc. ISCAS, 2007, pp.
[22] T.-D. Chuang, L.-H. Chang, T.-W. Chiu, Y.-H. Chen, and L.-G. Chen, "Bandwidth-efficient cache-based motion compensation architecture with DRAM-friendly data access control," in Proc. ICASSP, 2009, pp.
[23] X. Chen, L. Peilin, Z. Jiayi, Z. Dajiang, and S. Goto, "Block-pipelining cache for motion compensation in high definition H.264/AVC video decoder," in Proc. ISCAS, pp.
[24] G.-S. Yu and T. S. Chang, "Optimal data mapping for motion compensation in H.264 video decoding," in Proc. SiPS, 2007, pp.
[25] L. C. Chiu and T. S. Chang, "A lossless embedded compression codec engine for HD video decoding," in Proc. VLSI-DAT, 2012, pp.
[26] ovics, ViC-1 HEVC 4Kp120 Decoder, ProdBrief pdf, 2014

Pai-Tse Chiang received the B.S. and M.S. degrees in electronics engineering from National Chiao-Tung University, Hsinchu, Taiwan, in 2011 and 2013, respectively. His current research interests include digital signal processing, video coding and system-on-chip design.
Yi-Ching Ting received the B.S. and M.S. degrees in electronics engineering from National Chiao Tung University, Hsinchu, Taiwan, in 2011 and 2013, respectively. He is an engineer with PixArt Imaging Inc., Hsinchu, Taiwan. His current research interests include image processing and multimedia applications.

Hsuan-Ku Chen received the B.S. and M.S. degrees in Electronics Engineering from National Chiao Tung University (NCTU), Hsinchu, Taiwan, R.O.C., in 2011 and 2013, respectively. He is an engineer with PixArt Imaging Inc., Hsinchu, Taiwan. His current research interests are video coding and digital integrated circuits.

Shiaw-Yu Jou received the B.S. and M.S. degrees in Electronics Engineering from National Chiao Tung University (NCTU), Hsinchu, Taiwan, R.O.C., in 2011 and 2013, respectively. He is an engineer with PixArt Imaging Inc., Hsinchu, Taiwan. His current research interests are image processing and digital integrated circuits.

I-Wen Chen received the B.S. and M.S. degrees in Electronics Engineering from National Chiao Tung University, Hsinchu, Taiwan, R.O.C., in 2012, and She is an engineer with PixArt Imaging Inc., Hsinchu, Taiwan. Her current research interests are image processing and digital integrated circuits.

Hang-Chiu Fang received the B.S. degree in Electrical Engineering from National Chung Hsing University, Taichung, Taiwan, R.O.C., in 2012, and the M.S. degree in Electronics Engineering from National Chiao Tung University, Hsinchu, Taiwan, R.O.C., in He is an engineer with ELan Inc., Hsinchu, Taiwan. His current research interests are video coding and digital integrated circuits.

Tian-Sheuan Chang (S'93-M'06-SM'07) received the B.S., M.S., and Ph.D. degrees in electronic engineering from National Chiao-Tung University (NCTU), Hsinchu, Taiwan, in 1993, 1995, and 1999, respectively. From 2000 to 2004, he was a Deputy Manager with Global Unichip Corporation, Hsinchu, Taiwan. In 2004, he joined the Department of Electronics Engineering, NCTU, where he is currently a Professor. In 2009, he was a visiting scholar at IMEC, Belgium. His current research interests include system-on-a-chip design, VLSI signal processing, and computer architecture. Dr. Chang received the Excellent Young Electrical Engineer award from the Chinese Institute of Electrical Engineering in 2007, and the Outstanding Young Scholar award from the Taiwan IC Design Society in He has been actively involved in many international conferences as an organizing committee or technical program committee member. He is currently an Editorial Board Member of IEEE Transactions on Circuits and Systems for Video Technology.
More informationIEEE TRANSACTIONS ON MULTIMEDIA, VOL. 19, NO. 3, MARCH GHEVC: An Efficient HEVC Decoder for Graphics Processing Units
IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 19, NO. 3, MARCH 2017 459 GHEVC: An Efficient HEVC Decoder for Graphics Processing Units Diego F. de Souza, Student Member, IEEE, Aleksandar Ilic, Member, IEEE, Nuno
More informationAn Overview of Video Coding Algorithms
An Overview of Video Coding Algorithms Prof. Ja-Ling Wu Department of Computer Science and Information Engineering National Taiwan University Video coding can be viewed as image compression with a temporal
More informationIMAGE SEGMENTATION APPROACH FOR REALIZING ZOOMABLE STREAMING HEVC VIDEO ZARNA PATEL. Presented to the Faculty of the Graduate School of
IMAGE SEGMENTATION APPROACH FOR REALIZING ZOOMABLE STREAMING HEVC VIDEO by ZARNA PATEL Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment of
More informationMultimedia Communications. Image and Video compression
Multimedia Communications Image and Video compression JPEG2000 JPEG2000: is based on wavelet decomposition two types of wavelet filters one similar to what discussed in Chapter 14 and the other one generates
More informationUniversity of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /ISCAS.2005.
Wang, D., Canagarajah, CN., & Bull, DR. (2005). S frame design for multiple description video coding. In IEEE International Symposium on Circuits and Systems (ISCAS) Kobe, Japan (Vol. 3, pp. 19 - ). Institute
More informationAN IMPROVED ERROR CONCEALMENT STRATEGY DRIVEN BY SCENE MOTION PROPERTIES FOR H.264/AVC DECODERS
AN IMPROVED ERROR CONCEALMENT STRATEGY DRIVEN BY SCENE MOTION PROPERTIES FOR H.264/AVC DECODERS Susanna Spinsante, Ennio Gambi, Franco Chiaraluce Dipartimento di Elettronica, Intelligenza artificiale e
More informationParallel Implementation of Sample Adaptive Offset Filtering Block for Low-Power HEVC Chip. Luis A. Fernández Lara
Parallel Implementation of Sample Adaptive Offset Filtering Block for Low-Power HEVC Chip by Luis A. Fernández Lara B.S., Massachusetts Institute of Technology (2014) Submitted to the Department of Electrical
More informationInto the Depths: The Technical Details Behind AV1. Nathan Egge Mile High Video Workshop 2018 July 31, 2018
Into the Depths: The Technical Details Behind AV1 Nathan Egge Mile High Video Workshop 2018 July 31, 2018 North America Internet Traffic 82% of Internet traffic by 2021 Cisco Study
More informationIntroduction to Video Compression Techniques. Slides courtesy of Tay Vaughan Making Multimedia Work
Introduction to Video Compression Techniques Slides courtesy of Tay Vaughan Making Multimedia Work Agenda Video Compression Overview Motivation for creating standards What do the standards specify Brief
More informationThe H.263+ Video Coding Standard: Complexity and Performance
The H.263+ Video Coding Standard: Complexity and Performance Berna Erol (bernae@ee.ubc.ca), Michael Gallant (mikeg@ee.ubc.ca), Guy C t (guyc@ee.ubc.ca), and Faouzi Kossentini (faouzi@ee.ubc.ca) Department
More informationFAST SPATIAL AND TEMPORAL CORRELATION-BASED REFERENCE PICTURE SELECTION
FAST SPATIAL AND TEMPORAL CORRELATION-BASED REFERENCE PICTURE SELECTION 1 YONGTAE KIM, 2 JAE-GON KIM, and 3 HAECHUL CHOI 1, 3 Hanbat National University, Department of Multimedia Engineering 2 Korea Aerospace
More informationCOMP 249 Advanced Distributed Systems Multimedia Networking. Video Compression Standards
COMP 9 Advanced Distributed Systems Multimedia Networking Video Compression Standards Kevin Jeffay Department of Computer Science University of North Carolina at Chapel Hill jeffay@cs.unc.edu September,
More informationLUT Optimization for Memory Based Computation using Modified OMS Technique
LUT Optimization for Memory Based Computation using Modified OMS Technique Indrajit Shankar Acharya & Ruhan Bevi Dept. of ECE, SRM University, Chennai, India E-mail : indrajitac123@gmail.com, ruhanmady@yahoo.co.in
More informationH.261: A Standard for VideoConferencing Applications. Nimrod Peleg Update: Nov. 2003
H.261: A Standard for VideoConferencing Applications Nimrod Peleg Update: Nov. 2003 ITU - Rec. H.261 Target (1990)... A Video compression standard developed to facilitate videoconferencing (and videophone)
More informationTHE High Efficiency Video Coding (HEVC) standard is
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 22, NO. 12, DECEMBER 2012 1649 Overview of the High Efficiency Video Coding (HEVC) Standard Gary J. Sullivan, Fellow, IEEE, Jens-Rainer
More informationAuthors: Glenn Van Wallendael, Sebastiaan Van Leuven, Jan De Cock, Peter Lambert, Joeri Barbarien, Adrian Munteanu, and Rik Van de Walle
biblio.ugent.be The UGent Institutional Repository is the electronic archiving and dissemination platform for all UGent research publications. Ghent University has implemented a mandate stipulating that
More informationProject Interim Report
Project Interim Report Coding Efficiency and Computational Complexity of Video Coding Standards-Including High Efficiency Video Coding (HEVC) Spring 2014 Multimedia Processing EE 5359 Advisor: Dr. K. R.
More informationImplementation of Memory Based Multiplication Using Micro wind Software
Implementation of Memory Based Multiplication Using Micro wind Software U.Palani 1, M.Sujith 2,P.Pugazhendiran 3 1 IFET College of Engineering, Department of Information Technology, Villupuram 2,3 IFET
More informationAn Efficient Low Bit-Rate Video-Coding Algorithm Focusing on Moving Regions
1128 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 11, NO. 10, OCTOBER 2001 An Efficient Low Bit-Rate Video-Coding Algorithm Focusing on Moving Regions Kwok-Wai Wong, Kin-Man Lam,
More informationWHITE PAPER. Perspectives and Challenges for HEVC Encoding Solutions. Xavier DUCLOUX, December >>
Perspectives and Challenges for HEVC Encoding Solutions Xavier DUCLOUX, December 2013 >> www.thomson-networks.com 1. INTRODUCTION... 3 2. HEVC STATUS... 3 2.1 HEVC STANDARDIZATION... 3 2.2 HEVC TOOL-BOX...
More informationOn Complexity Modeling of H.264/AVC Video Decoding and Its Application for Energy Efficient Decoding
1240 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 13, NO. 6, DECEMBER 2011 On Complexity Modeling of H.264/AVC Video Decoding and Its Application for Energy Efficient Decoding Zhan Ma, Student Member, IEEE, HaoHu,
More informationISSCC 2006 / SESSION 14 / BASEBAND AND CHANNEL PROCESSING / 14.6
ISSCC 2006 / SESSION 14 / BASEBAND AND CHANNEL PROSSING / 14.6 14.6 A 1.8V 250mW COFDM Baseband Receiver for DVB-T/H Applications Lei-Fone Chen, Yuan Chen, Lu-Chung Chien, Ying-Hao Ma, Chia-Hao Lee, Yu-Wei
More informationMPEG has been established as an international standard
1100 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 9, NO. 7, OCTOBER 1999 Fast Extraction of Spatially Reduced Image Sequences from MPEG-2 Compressed Video Junehwa Song, Member,
More informationOL_H264MCLD Multi-Channel HDTV H.264/AVC Limited Baseline Video Decoder V1.0. General Description. Applications. Features
OL_H264MCLD Multi-Channel HDTV H.264/AVC Limited Baseline Video Decoder V1.0 General Description Applications Features The OL_H264MCLD core is a hardware implementation of the H.264 baseline video compression
More informationFast Mode Decision Algorithm for Intra prediction in H.264/AVC Video Coding
356 IJCSNS International Journal of Computer Science and Network Security, VOL.7 No.1, January 27 Fast Mode Decision Algorithm for Intra prediction in H.264/AVC Video Coding Abderrahmane Elyousfi 12, Ahmed
More informationPHASE-LOCKED loops (PLLs) are widely used in many
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 5, MAY 2005 233 A Portable Digitally Controlled Oscillator Using Novel Varactors Pao-Lung Chen, Ching-Che Chung, and Chen-Yi Lee
More informationStudy of AVS China Part 7 for Mobile Applications. By Jay Mehta EE 5359 Multimedia Processing Spring 2010
Study of AVS China Part 7 for Mobile Applications By Jay Mehta EE 5359 Multimedia Processing Spring 2010 1 Contents Parts and profiles of AVS Standard Introduction to Audio Video Standard for Mobile Applications
More informationMemory efficient Distributed architecture LUT Design using Unified Architecture
Research Article Memory efficient Distributed architecture LUT Design using Unified Architecture Authors: 1 S.M.L.V.K. Durga, 2 N.S. Govind. Address for Correspondence: 1 M.Tech II Year, ECE Dept., ASR
More informationStandardized Extensions of High Efficiency Video Coding (HEVC)
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Standardized Extensions of High Efficiency Video Coding (HEVC) Sullivan, G.J.; Boyce, J.M.; Chen, Y.; Ohm, J-R.; Segall, C.A.: Vetro, A. TR2013-105
More informationInternational Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013 Design and Implementation of an Enhanced LUT System in Security Based Computation dama.dhanalakshmi 1, K.Annapurna
More informationDesign and Implementation of Partial Reconfigurable Fir Filter Using Distributed Arithmetic Architecture
Design and Implementation of Partial Reconfigurable Fir Filter Using Distributed Arithmetic Architecture Vinaykumar Bagali 1, Deepika S Karishankari 2 1 Asst Prof, Electrical and Electronics Dept, BLDEA
More information