Visual Communication at Limited Colour Display Capability

Yan Lu, Wen Gao and Feng Wu

Abstract: A novel scheme is proposed for visual communication by mobile devices with limited colour display capability. A colour quantisation technique is first proposed based on the perception of the human visual system of skin colour. An efficient coding scheme is then developed to compress the colour-quantised video at very low bit rates.

I. INTRODUCTION

With the rapid development of multimedia and the Internet, more and more images and videos are transmitted and viewed in digital form. However, people still experience difficulty accessing images and videos despite the great achievements in visual communication techniques. One of the main problems arises from the limited colour display capability of mobile devices. Full-colour display devices use 24 bits to specify the colour of each pixel on the screen. However, most mobile devices lack the capability to display full-colour video. Therefore, a reduced colour-resolution screen is often used, with 8 bits per pixel. Instead of directly specifying a colour, the 8 bits give a colour index into a colour map (or palette). Video represented by indexes associated with a colour palette is known as pseudo-colour video. A further problem of mobile communication is the low channel bandwidth, which requires the source video to be coded at very low bit rates. Efficient compression of pseudo-colour video is generally very difficult because the indexes lose spatial correlation. A straightforward approach is to code the raw full-colour video directly with available coding techniques such as H.263 and MPEG-4, and then apply dynamic colour quantisation at the decoder side prior to display. However, good dynamic colour quantisation usually requires intensive computation that most mobile devices cannot afford [1].
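For illustration, pseudo-colour display amounts to a palette lookup per pixel. A minimal sketch (the palette contents below are arbitrary placeholders, not from the Letter):

```python
import numpy as np

# Each 8-bit pixel value is an index into a 256-entry palette of
# 24-bit RGB colours; the palette here is an arbitrary red ramp.
palette = np.zeros((256, 3), dtype=np.uint8)
palette[:, 0] = np.arange(256)

# A tiny pseudo-colour frame of index values.
indices = np.array([[0, 17], [255, 128]], dtype=np.uint8)

# Fancy indexing expands indices into a full-colour frame for display.
rgb = palette[indices]   # shape (2, 2, 3)
```

The lookup itself is trivial; the hard part, as noted above, is choosing the palette well, which is what dynamic colour quantisation does at a computational cost mobile devices cannot afford.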
In this Letter we first propose a colour quantisation technique that considers the perception of the human visual system (HVS) of human skin colour. The skin colour model is trained on yellow and white skin tones as an example; models for other skin tones can easily be produced. After colour
quantisation, the video sequence still consists of luminance and chrominance values, but with fewer bits; we call this limited colour video in this Letter. Existing video coding techniques remain applicable to limited colour video. Therefore, an efficient coding scheme is developed from JVT [2] with significant modifications to the chrominance compression. At the decoder side, the reconstructed luminance and chrominance can be properly displayed with the proposed colour map.

II. COLOUR QUANTISATION

Since the YUV colour space is used in most video coding systems, the colour quantisation algorithm proposed in this Letter operates in that space. The HVS understands and recognises a luminance (grey) image much more readily than a chrominance image; it is therefore reasonable to allocate more bits to the luminance (Y) component. Assume that N (N < 8) bits are assigned to the Y component. The raw luminance data is divided by M to obtain the quantised luminance, where M = 2^(8-N). In the designed colour map, the N-bit value s is mapped to sM + M/2 for display. Fig. 1b shows the 6-bit luminance representation of the original image in Fig. 1a, which provides acceptable visual quality; thus we select N = 6. Since the chrominance signal is a residual one and carries less important information, it can be specified with few bits. We propose to represent each U/V component with only one bit. Facial skin colour is emphasised in the reconstructed colour space because face-shoulder video is the dominant content in mobile visual communication. Statistics from over 1,000 images show that skin colour pixels are concentrated in a small region of the U-V space. Hence skin colour can be represented by the U/V value of the distribution centre. In order to distinguish skin colour from non-skin colour effectively, the U/V thresholds are set close to the distribution boundaries. Fig.
1c shows the image produced by the proposed luminance and chrominance quantisation. It clearly provides more natural visual quality than the image in Fig. 1d, which is produced by the traditional pseudo-colour quantisation technique.
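The luminance quantisation and its colour-map reconstruction can be sketched as follows (a minimal sketch; the function names are ours):

```python
import numpy as np

N = 6             # bits kept for luminance
M = 2 ** (8 - N)  # quantisation step: M = 4 for N = 6

def quantise_luma(y):
    """Quantise 8-bit luminance to N bits: s = y // M."""
    return np.asarray(y, dtype=np.uint8) // M

def display_luma(s):
    """Colour-map entry for index s: sM + M/2, the mid-point of the bin."""
    return s * M + M // 2

y = np.array([0, 17, 128, 255], dtype=np.uint8)
s = quantise_luma(y)            # indices in [0, 2**N - 1]
recon = display_luma(s)         # values shown on screen
```

Mapping each index to the mid-point of its bin halves the worst-case reconstruction error compared with mapping to the bin's lower edge.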
Fig. 1 Typical images. a 24-bit full colour; b 6-bit luminance; c proposed colour quantisation; d traditional pseudo-colour quantisation.

III. LIMITED COLOUR VIDEO CODING

Fig. 2 shows the block diagram of the proposed encoder, in which different schemes are used to code the luminance and chrominance components. The baseline of the luminance coding is the JVT standard [2]. As shown in Fig. 2, the input raw luminance data is first quantised to 6 bits with the proposed luminance quantisation (LQ). The JVT coding scheme is then applied to the quantised luminance. The major modules, such as motion compensation with adaptive block size, the 4x4 integer transform, entropy coding, and the loop filter, are similar to the JVT standard. However, some coding parameters have to be adjusted because of the different range of the input data. For the chrominance coding, the key problem is not only achieving high coding efficiency for a given sequence, but also generating a chrominance sequence that can be compressed easily while retaining acceptable perceptual quality. Adaptive chrominance quantisation embedded in the encoder serves this purpose. As shown in Fig. 2, the input U/V component can be coded in either intra-mode or inter-mode; the selection is controlled by switch S1. For inter-coding, motion compensation with adaptive block size is first performed, reusing the motion vectors (MVs) obtained in the luminance coding. For intra-coding, adaptive context-based prediction is used: the prediction of a pixel comes from its upper and left neighbouring pixels. After inter/intra prediction, the raw chrominance data is binarised.
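The context-based intra prediction can be sketched as follows (a minimal sketch: the Letter states only that the upper and left neighbours form the context; the logical-AND combination below is our assumption):

```python
import numpy as np

def context_predict(binary_plane):
    """Predict each pixel of a binary chrominance plane from its upper
    and left neighbours (logical AND; pixels outside the plane count
    as 0). An assumed combination rule, for illustration only."""
    h, w = binary_plane.shape
    pred = np.zeros((h, w), dtype=np.uint8)
    for i in range(h):
        for j in range(w):
            up = binary_plane[i - 1, j] if i > 0 else 0
            left = binary_plane[i, j - 1] if j > 0 else 0
            pred[i, j] = up & left
    return pred

plane = np.array([[1, 1], [1, 0]], dtype=np.uint8)
pred = context_predict(plane)
```

Because skin regions form large uniform areas in the one-bit U/V planes, such a causal-neighbour prediction is right for most pixels, which is what makes the subsequent XOR residual highly compressible.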
Suppose that t_s is the chrominance threshold defined according to the skin colour model, and t_c is a constant. If the predicted value is 0, we set the threshold T = t_s + t_c; otherwise, T = t_s - t_c. Notice that for intra coding, the adaptive binarisation has to be done pixel by pixel along with the intra prediction. After inter/intra prediction and binarisation, an XOR is applied to the quantised data and the predicted data, and the XOR results are coded with a context-based arithmetic encoding (CAE) scheme.

Fig. 2 Block diagram of the proposed limited colour video coding scheme.

IV. EXPERIMENTAL RESULTS

To evaluate the proposed visual communication scheme with limited colour video, we performed experiments on four MPEG-4 test sequences. The sequence Akiyo represents a scene with little head motion, Grandma a scene with relatively larger head motion, Salesman has a complex still background, and Foreman has a moving background. Each sequence is in QCIF format at a frame rate of 10 fps. Fig. 3 illustrates a typical frame of each decoded sequence. The left column shows the images coded with the proposed encoder and displayed with the proposed colour map. The middle column shows the images coded with the JVT standard and displayed in 24-bit colour. The right column shows the images converted from the full-colour images in the middle column to pseudo-colour images.
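The adaptive thresholding and XOR step can be sketched as follows (a sketch with assumed example values for t_s and t_c; `predicted` is the binary prediction plane, obtained by motion compensation in inter mode or, pixel by pixel, from the context rule in intra mode):

```python
import numpy as np

T_S, T_C = 128, 8   # t_s, t_c: assumed example values

def adaptive_binarise(raw, predicted):
    """Binarise a raw U/V plane with a threshold that leans away from
    the predicted bit: T = t_s + t_c when the prediction is 0,
    T = t_s - t_c otherwise, so pixels near t_s follow the prediction."""
    T = np.where(predicted == 0, T_S + T_C, T_S - T_C)
    return (raw >= T).astype(np.uint8)

# The XOR of the binarised data and the prediction is the symbol stream
# handed to the CAE coder; a good prediction gives a near-all-zero residual.
raw = np.array([[120, 130, 140]], dtype=np.uint8)
pred = np.array([[0, 0, 1]], dtype=np.uint8)
binary = adaptive_binarise(raw, pred)
residual = binary ^ pred
```

Note that the value 130 lies above t_s but is still binarised to 0 here: the margin t_c biases ambiguous pixels toward the prediction, which stabilises the binary planes over time and keeps the residual sparse.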
The results show that although the raw sequences are quantised with the proposed technique, the proposed scheme achieves very good visual quality at very low bit rates, especially in the face region. The decoded images are very close to those reconstructed by the JVT standard, except for slight colour distortion in the background. The conventional JVT standard, however, cannot be used directly for coding limited colour video. The images produced by the traditional colour quantisation in the right column clearly have much worse visual quality than those in the left column; when the right-column sequence is displayed continuously, the perceived quality degrades further because it suffers from flicker.

Fig. 3 Typical frame of each decoded sequence. a Akiyo at 4.9 kbps; b Grandma at 7.0 kbps; c Salesman at 8.3 kbps; d Foreman at 16.5 kbps.

V. CONCLUSIONS

We have presented a novel scheme for visual communication by mobile devices with limited colour display capability. With the assistance of the proposed colour quantisation and limited colour video coding techniques, the scheme provides very good visual quality for mobile communication.
REFERENCES

1. Roytman, E., and Gotsman, C.: 'Dynamic color quantization of video sequences', IEEE Trans. on Visualization and Computer Graphics, 1995, 1, pp. 274-286
2. Joint Video Team of ISO/IEC MPEG and ITU-T VCEG: 'Working draft number 2 (WD-2)', Geneva, Switzerland, February 2002

Yan Lu (Department of Computer Science, Harbin Institute of Technology, Harbin, 150001, China) Email: ylu@jdl.ac.cn or ylu@ieee.org
Wen Gao (Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100080, China)
Feng Wu (Microsoft Research Asia, Beijing, 100080, China)