A Novel Parallel-friendly Rate Control Scheme for HEVC

Jianfeng Xie, Li Song, Rong Xie, Zhengyi Luo, Min Chen

Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University
Cooperative Medianet Innovation Center, Shanghai, China
School of Electronics and Information Engineering, Shanghai University of Electric Power
Multicoreware
Email: {richrd, song li, xierong}@sjtu.edu.cn, lzy@shiep.edu.cn, chenm003@163.com

Abstract

Rate control plays a key role in video coding and has a significant effect on encoder performance. As parallel video coding frameworks become more popular, rate control suited to parallel coding is highly desirable. However, most rate control algorithms focus only on rate-distortion performance and ignore the data correlation in parallel coding. In this paper, based on the parallel framework of the x265 encoder, we propose a parallel-friendly rate control scheme for HEVC coding that supports both frame-level and slice-level parallelism. Experimental results show that the proposed algorithm achieves not only highly accurate rate control but also excellent rate-distortion performance under parallel coding.

I. INTRODUCTION

In recent years, high-resolution video applications have become more and more popular. To satisfy the demand for high-quality video services, multiple parallel technologies, such as frame parallelism, slice/tile parallelism and Wavefront Parallel Processing (WPP) [1], have been designed to accomplish real-time video coding. However, a video encoder that uses a parallel framework faces the challenge of data dependency. Data dependency affects not only the speedup but also the rate control (RC) algorithm in the parallel framework.

Rate control, which is closely related to RD performance, is adopted to minimize distortion under a target bitrate constraint. Rate control roughly consists of two steps. First, appropriate target bits, or bit budgets, are allocated at different coding levels.
Then appropriate coding parameters are set to produce the allocated bits. The bit budgets should be dynamically adjusted according to previous coding information, such as actually used bits, content complexity and buffer status. However, with the introduction of parallel coding, multiple coding units may be encoded at the same time, which usually makes the coding information of the immediately preceding units unavailable and degrades rate control performance. Therefore, conventional rate control algorithms have to be adapted to parallel coding frameworks.

In [2][3], a parallel rate control algorithm based on spatial image division was proposed for MPEG-2 MP@HL (Main Profile at High Level). Every frame is divided into several parts of equal size, which are encoded independently at the same time. After encoding, all bitstreams are merged into one integrated stream. A global rate control algorithm allocates target bits to the different parts according to previous coding information.

In [4], a parallel rate control algorithm for H.264 SVC (Scalable Video Coding) was proposed based on the dependency between different layers. For every slice in a frame, the target bit is allocated according to the coding complexity of the co-located slice in the previous frame, where coding complexity is defined as the product of actual bits and quantization step. A similar strategy is adopted to allocate target bits at the MB level.

In [5], a parallel rate control algorithm for H.264/AVC was proposed. This scheme performs target bit allocation at the GOP and the frame level. At the GOP level, the target bit is allocated based on the buffer occupancy rate. For every frame in a GOP, the target bit is calculated in advance according to its frame type, which enables parallel encoding of multiple frames.

All of the above methods are designed for previous generations of coding standards. As far as we know, no such rate control algorithm has been designed specifically for HEVC.
In this paper, we propose a novel parallel-friendly rate control scheme that supports parallel coding at both the frame and the slice level. When applied, the proposed scheme requires few modifications to the original parallel framework. In addition, computational complexity stays low, which enables real-time applications as well.

The rest of this paper is organized as follows. Section II describes the proposed parallel-friendly rate control algorithm. Section III shows the experimental results and discusses the coding performance. Section IV draws the conclusion.

II. PROPOSED PARALLEL-FRIENDLY RATE CONTROL SCHEME

The proposed rate control algorithm can be applied to parallel coding down to the slice level. As shown in Fig. 1, every frame is divided into multiple parts of equal size, i.e. slices, which are encoded independently at the same time. The output stream is composed of the bitstreams from the different sub-coders. In addition to the internal rate control in every coder, a global rate control module periodically synchronizes the internal controllers and reallocates bitrate among the slices based on previous coding information. This method adapts efficiently to slice parallelism and minimizes the target bit estimation error caused by content variation.

Fig. 1. Joint rate control scheme of parallel video coding architecture

This parallel rate control scheme consists mainly of four modules: frame level bit allocation, slice level bit allocation, the rate-lambda (R-λ) model and rate control status update. The bit allocation modules allocate bits at different granularity levels according to actually coded bits, content complexity, buffer status, frame type, etc. The R-λ model is used to set appropriate encoding parameters, i.e. lambda and QP, to produce the allocated bits. The rate control status update module covers the R-λ model parameter update and the buffer status update.

A. Frame level target bit allocation

All frames are classified into four types: I, P, B and b, where B denotes a reference B frame and b denotes a non-reference B frame. Frames of different types are rate-controlled differently. For non-I frames, QP is determined by the R-λ model. For I frames, the interval between consecutive I frames is usually large, so the correlation between them is assumed to be weak and is neglected. Hence the original R-λ model used to set encoding parameters is no longer applicable, and a simple QP estimation method is designed for I frames.

1) Non-I frame: The bit allocation depends mainly on three conditions: global target bitrate, virtual buffer status and frame type.

First, the average bits per frame is calculated as

$$T_{avg} = \frac{R_{tar}}{fps} \tag{1}$$

where $R_{tar}$ is the global target bitrate and $fps$ is the frame rate. The average frame bit serves as the benchmark for adaptive bit allocation.

Second, the virtual buffer occupancy is calculated by

$$V_i = \begin{cases} L, & i = 0 \\ V_{i-1} + b_i - T_{avg}, & \text{otherwise} \end{cases} \tag{2}$$

where $L$ denotes the target buffer occupancy, which is set to 0.5 times $R_{tar}$:

$$L = 0.5 \times R_{tar} \tag{3}$$

In other words, the buffer can tolerate at most 0.5 seconds' worth of bitrate fluctuation. A larger target occupancy lets the virtual buffer tolerate more bitrate fluctuation, at the cost of a longer delay.
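As a rough illustration of equations (1) to (3), the virtual buffer can be kept as a small state object that is updated once per coded frame. This is a minimal sketch rather than the x265 implementation: the class name `VirtualBuffer`, its field names, and the example values (6 Mbit/s at 50 fps) are assumptions made for the illustration, and all quantities are taken to be in bits.

```python
from dataclasses import dataclass


@dataclass
class VirtualBuffer:
    """Virtual buffer of equations (1)-(3); all quantities are in bits."""

    target_bitrate: float  # R_tar, bits per second
    fps: float             # frame rate

    def __post_init__(self) -> None:
        self.avg_bits = self.target_bitrate / self.fps   # T_avg, eq. (1)
        self.target_level = 0.5 * self.target_bitrate    # L, eq. (3)
        self.level = self.target_level                   # V_0 = L, eq. (2)

    def update(self, actual_bits: float) -> float:
        """Apply V_i = V_{i-1} + b_i - T_avg after a frame has been coded."""
        self.level += actual_bits - self.avg_bits
        return self.level


# Hypothetical example values, not taken from the paper's experiments.
buf = VirtualBuffer(target_bitrate=6_000_000, fps=50)
buf.update(150_000)   # frame used more than T_avg: occupancy rises
buf.update(100_000)   # frame used less than T_avg: occupancy falls
```

With these numbers $T_{avg}$ is 120,000 bits and $L$ is 3,000,000 bits, so a frame that overshoots $T_{avg}$ by 30,000 bits raises the occupancy by exactly that amount.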
In our experiments, 0.5 times $R_{tar}$ is a good empirical value that balances bitrate fluctuation against delay. The virtual buffer occupancy is initialized to the target buffer occupancy $L$, and after each frame is encoded, the occupancy is updated using the actually generated bits $b_i$.

The buffer status steers bit allocation in two ways. When the buffer occupancy lies between 10% and 90% of the target buffer occupancy, a low risk of overflow or underflow is assumed and the frame target bit is slightly corrected by

$$B = \frac{L - V}{SW} \tag{4}$$

where $SW$ is the size of the sliding window, which is used to smooth the bitrate adjustment. $SW$ is set to 40 in our experiments. When the buffer occupancy is less than 10% or larger than 90% of the buffer size, a high risk of underflow or overflow is assumed, so the frame target bit needs a further adjustment to avoid that situation. The target bit considering buffer status is calculated by

$$T_{norm} = \alpha \times T_{avg} + B \tag{5}$$

where $\alpha$ is defined as

$$\alpha = \begin{cases} 0.9, & V > 0.9 \times T_{avg} \\ 1, & 0.1 \times T_{avg} \le V \le 0.9 \times T_{avg} \\ 1.1, & V < 0.1 \times T_{avg} \end{cases} \tag{6}$$

Third, the frame type should also be taken into account. The final target bit is defined as

$$T = T_{norm} \times \omega_p = (\alpha \times T_{avg} + B) \times \omega_p \tag{7}$$

where $\omega_p$ is the frame-type-dependent weight, which can be fitted by pre-analysis of coding information. It is determined by the GOP structure settings, mainly the key frame interval keyint and the number of consecutive b-frames bframes. For example, when bframes is set to the default value 4, the frame weight is

$$\omega_p = a \times keyint^{\,b} + c \tag{8}$$

where the parameters $a$, $b$ and $c$ are given in Table I.

TABLE I
PARAMETER OF FRAME WEIGHT CALCULATION

| Frame type | a | b | c |
|---|---|---|---|
| P | -7.272 | -0.451 | 3.589 |
| B | -1.333 | -0.627 | 0.6468 |
| b | -0.39 | -0.4974 | 0.2842 |

For other GOP structures, the corresponding parameters can be fitted in a similar way.

2) I frame: The coding pattern of I frames is quite different from that of non-I frames, so an I frame typically consumes many more bits than other frames.
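To make the non-I-frame allocation of equations (4) to (8) concrete, the sketch below chains the buffer correction, the factor α and the Table I frame weight into one target-bit computation. It is an illustration under stated assumptions, not the authors' code: the function names are invented here, and because equation (6) states its thresholds as fractions of $T_{avg}$ while the prose speaks of fractions of the buffer size, the reference quantity is passed in as the argument `reference` rather than fixed. The buffer size of 6,000,000 bits in the example is hypothetical.

```python
# Table I coefficients (a, b, c), valid for bframes = 4.
FRAME_WEIGHT_COEFFS = {
    "P": (-7.272, -0.451, 3.589),
    "B": (-1.333, -0.627, 0.6468),
    "b": (-0.39, -0.4974, 0.2842),
}


def frame_weight(frame_type: str, keyint: int) -> float:
    """Equation (8): w_p = a * keyint**b + c."""
    a, b, c = FRAME_WEIGHT_COEFFS[frame_type]
    return a * keyint ** b + c


def buffer_alpha(level: float, reference: float) -> float:
    """Equation (6): 0.9, 1.0 or 1.1 depending on the buffer occupancy.

    `reference` is the quantity the 10% / 90% thresholds are fractions of
    (an explicit argument here because the text and the equation differ).
    """
    if level > 0.9 * reference:
        return 0.9
    if level < 0.1 * reference:
        return 1.1
    return 1.0


def frame_target_bits(avg_bits, target_level, level, reference,
                      frame_type, keyint, window=40):
    """Equations (4), (5) and (7) for one non-I frame."""
    correction = (target_level - level) / window                    # eq. (4)
    norm = buffer_alpha(level, reference) * avg_bits + correction   # eq. (5)
    return norm * frame_weight(frame_type, keyint)                  # eq. (7)


target = frame_target_bits(
    avg_bits=120_000, target_level=3_000_000, level=2_600_000,
    reference=6_000_000,  # hypothetical buffer size for the 10%/90% test
    frame_type="P", keyint=30,
)
```

With keyint = 30 the fitted weights come out at roughly 2.0 for P, 0.5 for B and 0.2 for b frames, so a P frame receives about twice the buffer-corrected average.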
Exact bit allocation and QP estimation remain a challenge for I frames, mainly because of the weak correlation between neighboring I frames due to the large interval. Fortunately, since rate control usually runs over a number of frames rather than a single one, the bitrate can still be regulated afterwards despite possible inaccuracy. Here we use a simple QP estimation method to determine the quantization parameter for I frames.

Usually, in view of the roles the different frame types play, the QP of B frames should be larger than that of P frames, while the QP of P frames should be larger than that of the periodically inserted I frames. The QP of I and B frames can be expressed as a rough conversion of that of P frames as follows:

$$QP_I = QP_P - 6 \times ipfactor \tag{9}$$

$$QP_B = QP_P + 6 \times pbfactor \tag{10}$$

where $ipfactor$ and $pbfactor$ are the conversion factors between I and P frames and between P and B frames, set to 1.4 and 1.3, respectively. The QP of every frame, expressed in equivalent P frame format, can then be tracked via an exponential average of the coded frames' QP with a forgetting factor of 0.95:

$$QP_n = \frac{\sum_{i=0}^{n-1} QP_i \times 0.95^{\,n-1-i} + 0.24 \times 0.95^{\,n}}{\sum_{i=1}^{n} 0.95^{\,n-i} + 0.01 \times 0.95^{\,n}} \tag{11}$$

where $QP_i$ is the QP of the i-th coded frame in equivalent P frame format and $QP_n$ is the estimated QP of the current frame in equivalent P frame format. If the current frame is an I frame, its QP is estimated through the above equation. Note that this estimate is used only for I frames. Considering the limited number of I frames, the method is feasible despite slight inaccuracy.

B. Slice level target bit allocation

As shown in Fig. 1, the slices of one image are encoded independently. Considering the possible content differences between slices, uniform allocation of bits to all slices may lead to significantly different quality. For example, banding may appear at the slice boundaries. To avoid that, a global rate control module periodically synchronizes the rate control status of all coders and reallocates the target bitrate among the slices according to previous coding information.
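The exponential average of equation (11) in the I-frame discussion above does not need the full QP history: the numerator and the denominator can each be carried as a running value that is decayed by 0.95 and incremented per coded frame. The sketch below shows that recursive form; the class and method names are assumptions, and the seeds 0.24 and 0.01 are the constants that appear in equation (11).

```python
class EquivalentPQpTracker:
    """Running form of equation (11) with forgetting factor 0.95."""

    def __init__(self, decay: float = 0.95,
                 seed_num: float = 0.24, seed_den: float = 0.01) -> None:
        self.decay = decay
        self.num = seed_num   # becomes sum(QP_i * 0.95^(n-1-i)) + 0.24 * 0.95^n
        self.den = seed_den   # becomes sum(0.95^(n-i)) + 0.01 * 0.95^n

    def add(self, qp_p_equiv: float) -> None:
        """Record the QP of a coded frame, already in equivalent P format."""
        self.num = self.num * self.decay + qp_p_equiv
        self.den = self.den * self.decay + 1.0

    def estimate(self) -> float:
        """Return QP_n, the estimate in equivalent P frame format."""
        return self.num / self.den


tracker = EquivalentPQpTracker()
initial = tracker.estimate()      # ratio of the two seeds before any frame
for _ in range(200):
    tracker.add(30.0)             # hypothetical constant QP history
steady = tracker.estimate()       # converges towards 30
```

Before any frame is coded the estimate equals the ratio of the seeds, 0.24 / 0.01 = 24, and the influence of that starting value fades as frames accumulate. The result is still in equivalent P frame format; mapping it to an I-frame QP would presumably go through equation (9).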
Specifically, three parameters are calculated for every slice before a frame is encoded: average target bit, target buffer occupancy and actual buffer occupancy. Let $m$ denote the number of slices. The jth slice's average target bit $T^j_{avg}$, target buffer occupancy $L^j$ and actual buffer occupancy $V^j$ are recalculated through (12), (13) and (14), respectively:

$$T^j_{avg} = \frac{SATD^j}{\sum_{k=1}^{m} SATD^k} \times T_{avg} \tag{12}$$

$$L^j = \frac{SATD^j}{\sum_{k=1}^{m} SATD^k} \times L \tag{13}$$

$$V^j = \frac{SATD^j}{\sum_{k=1}^{m} SATD^k} \times V \tag{14}$$

where $SATD^j$ is the jth slice's weighted Sum of Absolute Transformed Differences (SATD), accumulated over the previous co-located slices with a forgetting factor of 0.5. The weighting makes the reallocation smoother. The weight indicates how important the SATD of a past frame is to the current frame; frames far from the current frame receive a low weight:

$$SATD^j_n = \sum_{i=0}^{n} w_i \times SATD^j_i \tag{15}$$

$$w_i = \frac{0.5^{\,n-i}}{\sum_{k=0}^{n} 0.5^{\,n-k}} \tag{16}$$

After the reallocation, the target bit calculation proceeds as in the frame level bit allocation.

C. λ and QP determination with R-λ model

Except for I frames, whose QP is determined using the method above, the R-λ model is adopted to determine the QP of non-I frames. The R-λ model is the latest rate control model in HEVC and has been adopted by the HEVC reference software HM. Based on an analysis of the RD relationship in HEVC, Li [6] models the relationship between rate and the Lagrange multiplier λ as a power function:

$$\lambda = \alpha \times bpp^{\beta} \tag{17}$$

where $bpp$ is the number of bits per pixel. If the target bit is $T$ and the number of pixels is $N$, then $bpp$ is defined as

$$bpp = \frac{T}{N} \tag{18}$$

The model parameters $\alpha$ and $\beta$ are updated according to the actually used bits after every frame is coded:

$$\lambda_{comp} = \alpha_{old} \times bpp_{real}^{\beta_{old}} \tag{19}$$

$$\alpha_{new} = \alpha_{old} + \delta_{\alpha} \times (\ln \lambda_{real} - \ln \lambda_{comp}) \times \alpha_{old} \tag{20}$$

$$\beta_{new} = \beta_{old} + \delta_{\beta} \times (\ln \lambda_{real} - \ln \lambda_{comp}) \times \ln bpp_{real} \tag{21}$$

where $bpp_{real}$ is the actual number of bits per pixel.
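The slice-level reallocation of equations (12) to (16) amounts to two small steps: smooth each slice's SATD history with geometrically decaying weights, then split $T_{avg}$, $L$ and $V$ in proportion to the smoothed values. The following sketch illustrates that under assumptions: the function names and the sample SATD histories are invented for the example, and the degenerate case where every slice has zero SATD is not handled.

```python
def weighted_satd(history, decay=0.5):
    """Equations (15)-(16): weighted SATD of one slice.

    `history` holds the SATD of the co-located slice in frames 0..n,
    oldest first, so the most recent frame gets the largest weight.
    """
    n = len(history) - 1
    weights = [decay ** (n - i) for i in range(n + 1)]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, history)) / total


def split_by_satd(satd_per_slice, avg_bits, target_level, level):
    """Equations (12)-(14): share T_avg, L and V in proportion to SATD.

    Returns one (T_avg^j, L^j, V^j) tuple per slice.
    """
    total = sum(satd_per_slice)
    shares = [s / total for s in satd_per_slice]
    return [(sh * avg_bits, sh * target_level, sh * level) for sh in shares]


# Two slices with hypothetical SATD histories (oldest frame first).
satd = [weighted_satd([100, 200, 400]), weighted_satd([100, 100, 100])]
budgets = split_by_satd(satd, avg_bits=120_000,
                        target_level=3_000_000, level=2_900_000)
```

In this example the first slice has grown more complex over the last frames, so its weighted SATD is three times that of the second slice and it receives three quarters of each frame-level quantity. Because all three quantities are scaled by the same share, the per-slice buffers sum back to the frame-level buffer.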
QP is then determined through the empirical relation between λ and QP:

$$QP = 3 \log_2 \lambda + 12 \tag{22}$$

To keep the quality of the coded video consistent, QP is clipped to an appropriate range as follows. First, the difference from the QP of the last frame of a different type should not exceed 10:

$$QP_{last\_diff\_type} - 10 \le QP \le QP_{last\_diff\_type} + 10 \tag{23}$$

Second, the difference from the QP of the last frame of the same type should not exceed 3:

$$QP_{last\_same\_type} - 3 \le QP \le QP_{last\_same\_type} + 3 \tag{24}$$
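The chain from target bits to a clipped QP in equations (17) to (24) can be sketched as four small functions. This is an illustration under assumptions, not the x265 or HM code: the function names are invented, equation (22) is implemented in the form it reads here, no rounding to an integer QP is applied because the text does not describe one, and the starting values for α, β, δ_α and δ_β in the example are illustrative only since the paper does not state them.

```python
import math


def lambda_from_bits(target_bits, num_pixels, alpha, beta):
    """Equations (17)-(18): lambda from the target bits of a frame."""
    bpp = target_bits / num_pixels           # eq. (18)
    return alpha * bpp ** beta               # eq. (17)


def update_model(alpha, beta, lambda_real, bpp_real, delta_alpha, delta_beta):
    """Equations (19)-(21): update alpha and beta after coding a frame."""
    lambda_comp = alpha * bpp_real ** beta                       # eq. (19)
    err = math.log(lambda_real) - math.log(lambda_comp)
    new_alpha = alpha + delta_alpha * err * alpha                # eq. (20)
    new_beta = beta + delta_beta * err * math.log(bpp_real)      # eq. (21)
    return new_alpha, new_beta


def qp_from_lambda(lam):
    """Equation (22) as written here: QP = 3 * log2(lambda) + 12."""
    return 3.0 * math.log2(lam) + 12.0


def clip_qp(qp, last_same_type, last_diff_type):
    """Equations (23) then (24), applied in the order given in the text."""
    qp = min(max(qp, last_diff_type - 10), last_diff_type + 10)  # eq. (23)
    return min(max(qp, last_same_type - 3), last_same_type + 3)  # eq. (24)


# Illustrative model parameters; the paper does not give initial values.
lam = lambda_from_bits(120_000, 1920 * 1080, alpha=3.2003, beta=-1.367)
qp = clip_qp(qp_from_lambda(lam), last_same_type=30, last_diff_type=28)
```

Since the same-type clip is applied last, it decides the final value whenever the two ranges disagree; in the example the unclipped QP of roughly 33.9 is limited to 33 by the ±3 rule.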

III. EXPERIMENTAL RESULTS

Experiments are conducted to test the performance of the proposed rate control scheme. The main indexes are R-D performance and rate control accuracy, where R-D performance is measured by PSNR and SSIM, and rate control accuracy is measured by the error between target bitrate and actual bitrate. The benchmark used in our experiments is x265 1.6, which supports frame parallel and WPP encoding. Two rate control schemes in the original x265, ABR and VBV, are used for comparison, where ABR means average bit rate and VBV means video buffer verifier. ABR mode has good RD performance, but its rate control accuracy is poor. VBV mode is a plug-in mode that can be combined with most rate control schemes to further adapt the QP and achieve better rate control accuracy, but RD performance degrades considerably. Our ultimate goal is to obtain rate control accuracy close to that of VBV mode together with an RD performance improvement.

The x265 preset is set to medium. Because rate stability is required, scene cut detection is turned off: the unpredictable I frames it introduces lead to drastic rate fluctuation and significantly harm rate control performance. In fact, scene cut detection is usually not used in real-time coding applications. The key frame interval is set to 30 frames and the number of consecutive b-frames is set to 4. The hierarchical depth is two by default. All the 1080p HD sequences (Kimono1, ParkScene, Cactus, BasketballDrive and BQTerrace) of Class B in the HEVC standard test sequences are used. The target bitrate is set according to the HEVC call for proposals [7]. For VBV mode, the VBV buffer size is set to one second's worth of bits at the target rate, and the VBV maximum rate and initial occupancy use the encoder defaults.

A. Performance comparing to x265 anchor

First, to validate the RD performance improvement, two quality metrics, PSNR and SSIM, are used in our experiments.
A growing body of research agrees that SSIM is a more effective video quality metric than PSNR and provides a good approximation of perceptual visual quality degradation.

TABLE II
BD-RATE TO ORIGINAL RC ALGORITHM

| BD-Rate | vs. ABR, psnr | vs. ABR, ssim | vs. VBV, psnr | vs. VBV, ssim |
|---|---|---|---|---|
| Kimono1 | 6.00% | 6.39% | -0.34% | -1.32% |
| ParkScene | 1.29% | 1.77% | -3.13% | -3.78% |
| Cactus | -1.50% | -2.54% | -2.95% | -4.32% |
| BasketballDrive | 1.77% | -0.70% | 0.11% | -3.38% |
| BQTerrace | -2.40% | -1.88% | -4.44% | -5.06% |
| Average | 1.03% | 0.61% | -2.15% | -3.57% |

The BD-rate relative to the original x265 ABR mode and VBV mode is listed in Table II. The psnr and ssim columns list the BD-rate computed with PSNR and SSIM as the quality metric, respectively. From this table, we can see that the RD performance of the proposed algorithm is slightly worse than that of ABR mode, while it improves significantly on VBV mode. In particular, for the SSIM metric, the proposed algorithm performs close to ABR mode, and even better on some sequences. Compared to VBV mode, the proposed method achieves an average gain of up to 3.57%.

Fig. 2. R-D curves: (a) Rate-PSNR curve of Cactus, (b) Rate-SSIM curve of Cactus, (c) Rate-PSNR curve of BQTerrace, (d) Rate-SSIM curve of BQTerrace

Fig. 2 shows the Rate-PSNR and Rate-SSIM curves of two sequences, Cactus and BQTerrace. For these sequences the RD performance of the proposed algorithm improves on both the original ABR and VBV algorithms. This is largely because we adopt a more reasonable target bit allocation method and a more accurate R-Q model than the original rate control module.

Second, to compare the rate control accuracy of our algorithm, a mismatch ratio is defined by

$$M\% = \frac{|R_{actual} - R_{target}|}{R_{target}} \times 100\% \tag{25}$$

where $R_{target}$ and $R_{actual}$ denote the target bitrate and the actual bitrate, respectively. As stated before, each sequence uses the same target bitrates for the anchor algorithms and the proposed algorithm. Table III compares the bitrate mismatch of the two x265 anchor rate control algorithms and the proposed rate control algorithm. It shows that the rate control accuracy of the proposed algorithm is much better than that of ABR mode, while slightly worse than that of VBV mode. Across all sequences, the worst-performing one is Kimono1. The main reason is that this sequence contains a scene cut, which degrades rate control performance. Even so, the proposed method still performs much better than ABR mode, whose maximum mismatch reaches 18%.
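For reference, the mismatch ratio of equation (25) is a one-line computation. The sketch below takes the absolute value of the difference, which is an assumption consistent with every entry of Tables III and V being non-negative; the function name and the sample bitrates are hypothetical and do not come from the paper's measurements.

```python
def mismatch_percent(actual_kbps: float, target_kbps: float) -> float:
    """Equation (25): relative bitrate error as a percentage.

    The absolute value is an assumption (see the lead-in above).
    """
    return abs(actual_kbps - target_kbps) / target_kbps * 100.0


# Hypothetical measurement: 6150 kbps produced for a 6000 kbps target.
m = mismatch_percent(6150.0, 6000.0)

# Averaging over several hypothetical (actual, target) pairs, in the way
# the "Average" rows of the mismatch tables summarize per-rate results.
pairs = [(6150.0, 6000.0), (3960.0, 4000.0), (1616.0, 1600.0)]
average = sum(mismatch_percent(a, t) for a, t in pairs) / len(pairs)
```

An overshoot and an undershoot of the same relative size contribute equally to the average under this definition.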
TABLE III
MISMATCH COMPARING

| Sequence | Target (kbps) | ABR | VBV | Proposed |
|---|---|---|---|---|
| Kimono1 | 6000 | 9.05% | 2.49% | 6.10% |
| Kimono1 | 4000 | 13.65% | 3.47% | 5.30% |
| Kimono1 | 1600 | 17.73% | 5.17% | 3.25% |
| Kimono1 | 1000 | 18.09% | 5.67% | 1.83% |
| ParkScene | 6000 | 5.% | 1.% | 4.68% |
| ParkScene | 4000 | 4.99% | 1.82% | 3.79% |
| ParkScene | 1600 | 5.20% | 2.78% | 2.10% |
| ParkScene | 1000 | 5.10% | 3.26% | 1.52% |
| Cactus | 10000 | 1.41% | 0.90% | 1.79% |
| Cactus | 7000 | 0.69% | 0.95% | 1.72% |
| Cactus | 3000 | 0.54% | 0.93% | 1.48% |
| Cactus | 2000 | 0.97% | 1.01% | 1.30% |
| BasketballDrive | 10000 | 2.15% | 1.80% | 3.17% |
| BasketballDrive | 7000 | 1.62% | 1.73% | 3.23% |
| BasketballDrive | 3000 | 0.24% | 1.31% | 3.08% |
| BasketballDrive | 2000 | 1.04% | 1.14% | 2.86% |
| BQTerrace | 10000 | 3.02% | 1.60% | 0.16% |
| BQTerrace | 7000 | 2.42% | 1.40% | 0.72% |
| BQTerrace | 3000 | 0.33% | 0.79% | 1.28% |
| BQTerrace | 2000 | 0.52% | 0.74% | 1.70% |
| Average | | 4.71% | 2.02% | 2.55% |

Fig. 3. Frame actual bits versus frame number: (a) P frame bits, (b) B frame bits

To illustrate the rate control performance more intuitively, Fig. 3 shows the actual frame bits of the three rate control methods. The sequence used is the concatenation of all test sequences mentioned above, which makes it closer to a real video with scene cuts. Fig. 3(a) shows the bits of P frames and Fig. 3(b) shows the bits of B frames. The proposed method has a smoother frame bit variation than the original algorithm.

To sum up, the proposed algorithm obtains rate control accuracy close to that of VBV mode. Meanwhile, a significant RD performance improvement is achieved.

B. Performance of the joint rate control module

One important aspect we need to validate is the performance of the joint rate control module. The test conditions are designed as follows. Two image division strategies are used in our experiments. One divides each image into 2 equal parts and the other divides each image into 4 equal parts. For each division strategy, two bit allocation schemes are used for the slices.
The first allocates the frame bits equally to each slice and is labeled "parts equal" in the following. The second uses the proposed slice target bit allocation scheme described in Section II and is labeled "parts satd" in the following. Table IV shows the RD performance improvement of the proposed slice bit allocation scheme compared to the equal slice bit allocation. The psnr and ssim columns list the BD-rate computed with PSNR and SSIM as the quality metric, respectively. The RD gain is about 1% on PSNR for both division strategies, while about 2% is achieved on SSIM.

TABLE IV
BD-RATE TO EQUAL BIT ALLOCATION

| BD-Rate | 2parts psnr | 2parts ssim | 4parts psnr | 4parts ssim |
|---|---|---|---|---|
| Kimono1 | -1.52% | -2.53% | -1.71% | -4.49% |
| ParkScene | 0.63% | 1.24% | 0.47% | 1.% |
| Cactus | -0.09% | 1.21% | 0.16% | 1.06% |
| BasketballDrive | -2.77% | -1.85% | -4.58% | -4.58% |
| BQTerrace | -1.64% | -6.21% | 0.64% | -5.12% |
| Average | -1.08% | -1.63% | -1.00% | -2.% |

Notice that three sequences, Kimono1, BasketballDrive and BQTerrace, show a higher RD performance improvement than the others. In these sequences the content differs more between the divided slices than in the others, which verifies that the proposed slice bit allocation scheme adapts better to content. Fig. 4 shows the Rate-PSNR and Rate-SSIM curves of the sequence BasketballDrive when frames are divided into 2 parts and 4 parts, respectively. Compared to the equal slice bit allocation, the proposed method achieves a significant performance improvement.

TABLE V
MISMATCH COMPARING

| Sequence | Target bitrate (kbps) | 2parts equal | 2parts satd | 4parts equal | 4parts satd |
|---|---|---|---|---|---|
| Kimono1 | 6000 | 6.59% | 6.25% | 6.58% | 6.10% |
| Kimono1 | 4000 | 5.73% | 5.63% | 5.88% | 5.57% |
| Kimono1 | 1600 | 3.46% | 3.43% | 3.51% | 3.55% |
| Kimono1 | 1000 | 2.02% | 2.07% | 2.33% | 2.39% |
| ParkScene | 6000 | 5.03% | 5.05% | 4.91% | 4.96% |
| ParkScene | 4000 | 3.93% | 3.88% | 3.83% | 3.80% |
| ParkScene | 1600 | 1.59% | 1.24% | 1.29% | 0.90% |
| ParkScene | 1000 | 0.38% | 0.10% | 0.68% | 0.14% |
| Cactus | 10000 | 2.30% | 2.29% | 2.22% | 2.21% |
| Cactus | 7000 | 2.13% | 2.09% | 2.03% | 1.97% |
| Cactus | 3000 | 1.45% | 1.45% | 1.32% | 1.28% |
| Cactus | 2000 | 1.08% | 0.98% | 0.90% | 0.68% |
| BasketballDrive | 10000 | 2.95% | 2.93% | 2.85% | 2.79% |
| BasketballDrive | 7000 | 2.82% | 2.83% | 2.67% | 2.68% |
| BasketballDrive | 3000 | 2.59% | 2.70% | 2.43% | 2.57% |
| BasketballDrive | 2000 | 2.% | 2.47% | 2.32% | 2.40% |
| BQTerrace | 10000 | 0.06% | 0.14% | 0.09% | 0.29% |
| BQTerrace | 7000 | 0.46% | 0.66% | 0.43% | 0.79% |
| BQTerrace | 3000 | 1.09% | 1.15% | 0.97% | 1.34% |
| BQTerrace | 2000 | 0.92% | 1.28% | 1.22% | 1.57% |
| Average | | 2.45% | 2.43% | 2.42% | 2.40% |

Table V lists the bitrate mismatch of the two slice bit allocation schemes. The proposed scheme gives a slight improvement, which also verifies that the proposed slice bit allocation scheme adapts better to content.
Fig. 4. R-D curves of BasketballDrive: (a) Rate-PSNR curve of 2 parts, (b) Rate-SSIM curve of 2 parts, (c) Rate-PSNR curve of 4 parts, (d) Rate-SSIM curve of 4 parts

IV. CONCLUSION

In this paper, we propose a novel parallel-friendly rate control scheme that supports parallel coding at both the frame and the slice level. Experimental results show that the proposed algorithm obtains a rate accuracy close to that of the original x265 plus VBV mode, with a significant improvement in RD performance. In addition, SATD-based slice bit allocation provides better content adaptation, which makes the proposed algorithm more robust to content variation than other schemes.

ACKNOWLEDGMENT

This work was supported by Shanghai Zhangjiang national independent innovation demonstration zone development fund (201501-pd-sb-b201-001) and NSFC (61671296, 61527804, 61521062).

REFERENCES

[1] Chi C C, Alvarez-Mesa M, Juurlink B, et al. Parallel scalability and efficiency of HEVC parallelization approaches[J]. Circuits and Systems for Video Technology, IEEE Transactions on, 2012, 22(12): 1827-1838.
[2] Nakamura K, Ikeda M, Yoshitome T, et al. Global rate control scheme for MPEG-2 HDTV parallel encoding system[C]//Information Technology: Coding and Computing, 2000. Proceedings. International Conference on. IEEE, 2000: 195-200.
[3] Nog S. A study on rate control method for MP@HL encoder with parallel encoder architecture using picture partitioning[C]//Image Processing, 1999. ICIP 99. Proceedings. 1999 International Conference on. IEEE, 1999, 4: 261-265.
[4] Sanz-Rodriguez S, Mayer T, Alvarez-Mesa M, et al. A low-complexity parallel-friendly rate control algorithm for ultra-low delay high definition video coding[C]//Multimedia and Expo Workshops (ICMEW), 2013 IEEE International Conference on. IEEE, 2013: 1-4.
[5] Wang J, Gao Z, Zhang X. Efficient parallel-friendly rate control for realtime UHD video encoder on many-core platform[C]//Multimedia and Expo Workshops (ICMEW), 2014 IEEE International Conference on. IEEE, 2014: 1-6.
[6] Li B, Li H, Li L, Zhang J. λ domain rate control algorithm for High Efficiency Video Coding[J]. Image Processing, IEEE Transactions on, 2014, 23(9): 3841-3854.
[7] ITU-T Q6/16, ISO/IEC JTC1/SC29/WG11, VCEG-AM91. Joint Call for Proposals on Video Compression Technology, 22 January 2010, Kyoto, Japan.