COMPLEXITY REDUCTION FOR HEVC INTRAFRAME LUMA MODE DECISION USING IMAGE STATISTICS AND NEURAL NETWORKS.

DILIP PRASANNA KUMAR 1000786997 UNDER GUIDANCE OF DR. RAO UNIVERSITY OF TEXAS AT ARLINGTON. DEPT. OF ELECTRICAL ENGINEERING

1. INTRODUCTION

High Efficiency Video Coding (HEVC) is a new standard being developed as a joint project by ITU-T VCEG and ISO/IEC MPEG, working together in a partnership known as the Joint Collaborative Team on Video Coding (JCT-VC) [1]. It has been designed to address all existing applications of H.264/MPEG-4 AVC and to focus particularly on two key issues: increased video resolution and increased use of parallel processing architectures [2]. HEVC aims to achieve a 50% bit-rate reduction relative to H.264 while maintaining similar visual quality [2]. This compression gain comes at the cost of increased encoder complexity.

HEVC supports visually lossless and lossless compression, and picture sizes from QVGA (320x240) to Ultra HD (8k x 4k). It allows random access, fast channel switching, trick modes and intra-only coding. HEVC employs a traditional hybrid coding model that combines temporal and spatial prediction, spatial transforms, quantization, entropy coding and in-loop filtering. The block diagram of a typical HEVC encoder is shown in figure 1.

Figure 1: HEVC encoder block diagram. [1]

Figure 2: HEVC decoder block diagram. [12]

The HEVC encoder is functionally similar to the encoders of older video coding standards such as H.264/MPEG-4 AVC. The notable differences are the new coding tree unit in place of the macroblock, new in-loop filtering techniques and a single entropy coding method, Context Adaptive Binary Arithmetic Coding (CABAC). However, modifications have been made to almost all aspects of the encoder, and these differences contribute to the increased compression achieved by HEVC [1].

2. CODING TREE UNIT IN HEVC

The most significant change is the new coding tree unit, which replaces the macroblock. In principle, the quadtree coding structure in HEVC is described by means of blocks and units. A block defines an array of samples and its size, whereas a unit encapsulates one luma block and the corresponding chroma blocks together with the syntax needed to code them. Consequently, a Coding Tree Unit (CTU) includes coding tree blocks (CTB) and syntax specifying coding data and further subdivision. This subdivision results in coding unit (CU) leaves with coding blocks (CB). Each CU incorporates further entities for the purpose of prediction, called prediction units (PU), and of transform, called transform units (TU). Similarly, each CB is split into prediction blocks (PB) and transform blocks (TB) [3]. Figure 3 shows the coding units for a frame from the sequence Traffic.

Figure 3: Detail of 4kx2k sequence Traffic showing the coding block (white) and nested transform block (red) structure resulting from the recursive quadtree structure. [3]

The coding units are traversed in Z-scan order, as shown in figure 4. The size of the luma CTB can be chosen as 16x16, 32x32 or 64x64 samples [2]. The larger sizes typically enable better compression [4].

Figure 4: HEVC Z-scan order for traversing the coding units. Figure adapted from the documentation accompanying the HM 8.0 source code [1][2].
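The Z-scan traversal can be illustrated with a short recursive sketch. This is only an illustration and is not taken from the HM source: the function name z_scan_order is hypothetical, and the block sizes are assumed to be powers of two. It lists the top-left coordinates of the smallest blocks inside one CTB in the order a recursive quadtree walk would visit them.

```python
def z_scan_order(size, min_size):
    """Return top-left (x, y) coordinates of the min_size x min_size blocks
    inside a size x size CTB, listed in Z-scan (recursive quadtree) order.

    Assumes size and min_size are powers of two with min_size <= size.
    """
    def visit(x, y, s):
        if s == min_size:
            return [(x, y)]
        half = s // 2
        order = []
        # Quadrants are visited top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                order.extend(visit(x + dx, y + dy, half))
        return order

    return visit(0, 0, size)


# Example: a 16x16 CTB traversed down to 4x4 blocks.
order = z_scan_order(16, 4)
```

In this example the first four entries cover the top-left 8x8 quadrant before the traversal moves on to the top-right quadrant, which is the "Z" pattern repeated at each level of the tree.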

3. INTRA PREDICTION IN HEVC FOR LUMA SAMPLES

HEVC introduces 33 directional modes, a planar mode and a DC mode for intra prediction of luma samples [Fig 5]. The angles are intentionally designed to provide denser coverage for near-horizontal and near-vertical angles and coarser coverage for near-diagonal angles [1]. The angular displacements are 0, ±2, ±5, ±9, ±13, ±17, ±21, ±26 and ±32, expressed in units of 1/32 of a sample (not in degrees) and measured from the horizontal and vertical directions. In addition, HEVC supports two alternative prediction methods, planar and DC, to target smooth regions that lack strong directional edges.

Figure 5: Luma modes for intra prediction in HEVC. Mode 0 is planar mode and Mode 1 is DC mode. [1]

The prediction process of the Intra_Angular modes can involve extrapolating samples from the projected reference sample location according to a given directionality. To remove sample-by-sample switching between the reference row and column buffers, all extrapolations in a PB refer to a single reference row or column, depending on the mode number [1].

The reference samples used for intra prediction are sometimes filtered by a 3-tap [1 2 1]/4 smoothing filter, in a manner similar to what was used for 8x8 intra prediction in H.264/MPEG-4 AVC. However, HEVC applies this smoothing operation more adaptively, according to the directionality and the block size. As in H.264/MPEG-4 AVC [2], the smoothing filter is not applied for 4x4 blocks. For 8x8 blocks, only the diagonal directions 2, 18 and 34 use reference sample smoothing. For 16x16 blocks, the reference samples are filtered for all directions except the near-horizontal and near-vertical directions (directions in the range of 9 to 11 and 25 to 27). For 32x32 blocks, all directions except the exactly horizontal and exactly vertical directions use the smoothing filter.

The Intra_Planar mode also uses the smoothing filter when the block size is greater than or equal to 8x8, and smoothing is not used (or useful) for the Intra_DC case [2].

Figure 6: Reference samples used with mode 29. [1]
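The smoothing rules described above can be summarized in a small sketch. The function names smooth_reference and uses_smoothing are hypothetical and are not part of the HM software. The sketch assumes that mode 10 is the exactly horizontal direction and mode 26 the exactly vertical direction, that the filter uses integer rounding, and that the two end samples of the reference array are left unfiltered.

```python
PLANAR, DC = 0, 1
HORIZONTAL, VERTICAL = 10, 26  # assumed mode numbers of the pure directions


def smooth_reference(samples):
    """Apply the 3-tap [1 2 1]/4 filter to a list of integer reference samples.

    The first and last samples are copied unfiltered (an assumption of this
    sketch); the +2 term rounds the integer division by 4.
    """
    if len(samples) < 3:
        return list(samples)
    out = [samples[0]]
    for i in range(1, len(samples) - 1):
        out.append((samples[i - 1] + 2 * samples[i] + samples[i + 1] + 2) >> 2)
    out.append(samples[-1])
    return out


def uses_smoothing(block_size, mode):
    """Return True if the text's rules call for reference smoothing."""
    if mode == DC or block_size == 4:
        return False
    if mode == PLANAR:
        return block_size >= 8
    if block_size == 8:
        return mode in (2, 18, 34)
    if block_size == 16:
        return not (9 <= mode <= 11 or 25 <= mode <= 27)
    if block_size == 32:
        return mode not in (HORIZONTAL, VERTICAL)
    raise ValueError("unsupported block size: %r" % block_size)


# Example: an isolated peak in the reference row is spread over its neighbours.
filtered = smooth_reference([10, 10, 50, 10, 10])
```

Keeping the decision rule separate from the filter makes it easy to check each block size and mode combination against the text before any prediction is generated.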

4. PROBLEM STATEMENT

Intra prediction in HEVC is computationally expensive because of the large number of possible modes. This is further complicated by the quadtree structure: the encoder must decide both the quadtree depth and the prediction direction for each coding unit. A brute force search for the best intra mode consists of evaluating 735 modes per LCU (footnote 1).

Figure 7: The recursive coding tree search employed by the HM encoder. [2]

Evaluating each mode consists of generating a prediction and subtracting it from the original image to obtain the residual. The residual is then used to compute the cost for rate-distortion optimization. The reference software HM 8.0 employs this brute force method, and it has been shown that in the all-intra configuration a quarter of the total time is spent on rate-distortion optimization and an additional 16% on intra prediction [3].

The problem being considered is to evaluate a method that reduces the number of modes that need to be evaluated for each coding unit, and to generate data regarding its performance. It is hoped that similar attempts by other researchers will generate sufficient data so that the best method can eventually be determined by comparing the results of all proposed methods.

Footnote 1: 35 modes + 4 * (35 modes + (4 * 35 modes)) = 735 modes
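The count of 735 modes can be reproduced with a few lines of code. The sketch below assumes three quadtree levels with 35 candidate modes per block, as in the footnote. The sum of absolute differences (SAD) is used here only as a simple stand-in for the residual cost; it is an assumption of this sketch and not necessarily the cost measure used by HM 8.0. Both function names are hypothetical.

```python
def brute_force_mode_count(levels=3, modes_per_block=35):
    """Count mode evaluations when every block at every quadtree level is
    tested with every mode. Level d contains 4**d blocks."""
    return sum(modes_per_block * 4 ** depth for depth in range(levels))


def sad_cost(original, prediction):
    """Sum of absolute differences between a block and its prediction.

    Both arguments are equally sized lists of rows of integer samples.
    """
    return sum(
        abs(o - p)
        for orig_row, pred_row in zip(original, prediction)
        for o, p in zip(orig_row, pred_row)
    )


total = brute_force_mode_count()  # 35 + 4*35 + 16*35
cost = sad_cost([[10, 12], [14, 16]], [[13, 13], [13, 13]])
```

Each additional quadtree level multiplies the number of blocks by four, which is why pruning the tree depth early removes a large share of the evaluations.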

5. HYPOTHESIS

The hypothesis is that the optimum coding unit size, and hence the quadtree depth, can be determined as a function of the variance of the pixel values within a coding unit and their correlation to the reference pixels. In addition, the statistics of the most frequently used tree depth among the neighboring CUs can help determine the order in which the possible modes are tested. Further, once the coding tree depth has been determined by analyzing the statistics of the image, the choice of intra mode direction within each prediction unit can be viewed as a classification problem based on image features. Thus an artificial neural network classifier can be trained to identify the most probable intra mode direction [13].

6. PROPOSED WORK

To verify the hypothesis, the following work is proposed:

1. Obtain a large set of sample pixels and corresponding intra mode decisions by modifying the HM 8.0 software [1][2] to log the relevant data, and then compressing the test sequences using the modified encoder.
2. Develop a method to interpret the logged data and determine the image statistics. Use this to obtain the statistics for the data samples.
3. Search for a relation between the image statistics and the coding tree depth. Specifically, determine whether the variance within an image and the correlation to the reference pixels can be used to obtain the quadtree depth.
4. Develop MATLAB code that determines the quadtree depth from variance and correlation using the developed method, and apply it to the reference samples. Compare the results with those of the brute force search employed by HM 8.0.
5. Train a neural network to identify the most probable intra mode direction. Collect the required training data from the modified HM 8.0 encoder.
6. Develop MATLAB code that uses the neural network to identify the intra mode for the reference samples. Compare these with the results of the HM 8.0 encoder.
7. Study the implementation complexity and verify that the proposed method is less complex than the brute force method. Also, compare the results with other complexity reduction techniques [6][7][8].
8. Implement the above model in HM 8.0 and measure the gain in encoding time. Also, measure the increase in BD-bitrate and the change in BD-PSNR [15][16][17]. Compare the results with those achieved by other techniques [6][7][8].
9. Use the data to conclude whether the proposed solution is a feasible answer to the complexity of the HEVC encoder. Determine the limitations of the proposed solution and possible improvements to it.
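The image statistics named in the hypothesis, the variance within a coding unit and the correlation to the reference pixels, could be computed along the following lines. The proposal does not define the correlation measure, so this sketch assumes the Pearson correlation between the first row of the block and the reference row above it, and between the first column and the reference column to its left. All function names are hypothetical, and the sketch is written in Python for illustration rather than in the MATLAB proposed for the actual work.

```python
from math import sqrt
from statistics import mean, pvariance


def block_variance(block):
    """Population variance of all samples in a block (a list of rows)."""
    return pvariance([p for row in block for p in row])


def pearson(a, b):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant, where the correlation
    is undefined (a convention chosen for this sketch).
    """
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def reference_correlation(block, top_ref, left_ref):
    """Correlate the block's first row with the row above it and the
    block's first column with the column to its left."""
    first_row = block[0]
    first_col = [row[0] for row in block]
    return pearson(first_row, top_ref), pearson(first_col, left_ref)


# Example on a 2x2 block with its neighbouring reference samples.
block = [[1, 2], [3, 4]]
variance = block_variance(block)
top_corr, left_corr = reference_correlation(block, [1, 2], [3, 1])
```

Any mapping from these features to a quadtree depth, for example a variance threshold, would have to be derived from the logged encoder data and is deliberately left out of the sketch.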

REFERENCES

1. B. Bross, W.-J. Han, J.-R. Ohm and T. Wiegand, "High efficiency video coding (HEVC) text specification draft 8," ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) document JCTVC-J1003, July 2012.
2. G. J. Sullivan, J.-R. Ohm, W.-J. Han and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, pp., December 2012.
3. F. Bossen, B. Bross, K. Sühring and D. Flynn, "HEVC complexity and implementation analysis," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, pp., December 2012.
4. G. J. Sullivan and R. L. Baker, "Efficient quadtree coding of images and video," IEEE Transactions on Image Processing, vol. 3, no. 3, pp. 327-31, Jan 1994.
5. Y. Tan, C. Yeo, H. Tan and Z. Li, "On residual quad-tree coding in HEVC," in IEEE International Workshop on Multimedia Signal Processing (MMSP), pp. 1-4, October 2011.
6. S.-W. Teng, H.-M. Hang and Y.-F. Chen, "Fast mode decision algorithm for residual quadtree coding in HEVC," in The Visual Communications and Image Processing (VCIP) Conference, pp. 1-4, November 2011.
7. K. Choi and E. Jang, "Fast coding unit decision method based on coding tree pruning for high efficiency video coding," Opt. Eng., vol. 51, no. 3, pp. 030502-1 to 030502-3, doi: 10.1117/1.OE.51.3.030502, March 2012.
8. L. Zhao, L. Zhang, S. Ma and D. Zhao, "Fast mode decision algorithm for intra prediction in HEVC," in The Visual Communications and Image Processing (VCIP) Conference, pp. 1-4, November 2011.
9. W. Jiang, H. Ma and Y. Chen, "Gradient based fast mode decision algorithm for intra prediction in HEVC," in International Conference on Consumer Electronics, Communications and Networks (CECNet), pp. 1836-1840, April 2012.
10. X. Shen, L. Yu and J. Chen, "Fast coding unit size selection for HEVC based on Bayesian decision rule," in Picture Coding Symposium (PCS), pp. 453-456, May 2012.
11. G. Tian and S. Goto, "Content based hierarchical fast coding unit decision algorithm for HEVC," in Picture Coding Symposium (PCS), vol. 1, pp. 56-59, May 2012.
12. C. Fogg, "Suggested figures for the HEVC specification," ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) document JCTVC-J0292r1, July 2012.
13. C. H. Lampert, "Machine learning for video compression: macroblock mode decision," in 18th International Conference on Pattern Recognition (ICPR 2006), vol. 1, pp. 936-940, 2006.
14. G. J. Sullivan, "HEVC: The next generation in video compression," keynote speech at VCIP 2012, Nov 29th 2012.
15. G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," ITU-T SG16, Doc. VCEG-M33, 13th VCEG meeting, Austin, TX, April 2001.
16. G. Bjontegaard, "Improvements of the BD-PSNR model," ITU-T SG16 Q.6, Doc. VCEG-AI11, Berlin, Germany, July 2008.
17. K. Anderson, R. Sjoberg and A. Norkin, "BD measurements based on MOS," (online), ITU-T Q6/SG16, document VCEG-AL23, Geneva, Switzerland, July 2009. Available at http://wfpt3.itu.int/av-arch/video-site/0906_lg/vceg-al23.zip
18. J. Lainema et al., "Intra coding of the HEVC standard," IEEE Trans. Circuits and Systems for Video Technology, vol. 22, pp., Dec 2012.
19. G. Correa et al., "Performance and computational complexity assessment of high efficiency video encoders," IEEE Trans. Circuits and Systems for Video Technology, vol. 22, pp., Dec 2012.
20. M. Zhou et al., "HEVC lossless coding and improvements," IEEE Trans. Circuits and Systems for Video Technology, vol. 22, pp., Dec 2012.