MULTI-VIEW VIDEO COMPRESSION USING DYNAMIC BACKGROUND FRAME AND 3D MOTION ESTIMATION

Manoranjan Paul, Junbin Gao, Michael Antolovich, and Terry Bossomaier
School of Computing and Mathematics, Charles Sturt University, Bathurst, Australia
Email: {mpaul, jbgao, mantolovich, bossomaier}@csu.edu.au

ABSTRACT

The H.264/MVC multi-view video coding standard provides a better compression rate compared to the simulcast coding technique (i.e., H.264/AVC) by exploiting inter- and intra-view redundancy. However, this technique imposes random access frame delay and requires huge computational time. In this paper three novel techniques are proposed to overcome the above-mentioned problems. Firstly, a simulcast video coding technique is proposed where each view is encoded individually using two reference frames: the immediate previous frame and a dynamic background frame (popularly known as the McFIS, the most common frame in a scene) of the corresponding view. Secondly, a novel technique is proposed using 3D motion estimation (3D-ME), where a 3D frame is formed using the same temporal frames of all views and ME is carried out for the current 3D frame using the immediate previous 3D frame as a reference frame. Thereafter, a fractional ME refinement is also conducted on the individual frames of the current 3D frame using individual reference frames. Finally, a modification of the 3D-ME technique is proposed where an extra reference frame, namely a 3D McFIS, is used for 3D-ME. As the correlation among the intra-view images is higher than the correlation among the inter-view images, the proposed 3D-ME techniques reduce the overall computational time and eliminate the frame delay with comparable rate-distortion (RD) performance compared to H.264/MVC. Experimental results reveal that the proposed techniques outperform H.264/MVC in terms of improved RD performance, reduced computational time, and elimination of the random access frame delay.

Index Terms— McFIS, 3D motion estimation, 3D video coding, uncovered background, hierarchical B-picture, MRFs.

1. INTRODUCTION

In multi-view video coding (MVC), a scene is captured by different video cameras from different angles, which provides a more realistic experience of the scene compared to that of a single video camera. Obviously, transmission and storage of multi-view videos require huge amounts of computation and data manipulation compared to single-view video, although there is a significant amount of data redundancy among the views. Recently, H.264/MVC [1]-[3] proposed a reference structure among view (S) and temporal (T) images. In the reference structure of MVC, the hierarchical B-picture prediction format [4][5] is used for intra- and inter-view prediction. The technique exploits the redundancy of neighbouring frames from both inter- and intra-view as references to encode the current frame. The inter- and intra-view referencing technique provides 20% more bitstream reduction compared to the simulcast technique, where no inter-view redundancy is exploited, i.e., each view is encoded separately [1].

Fig 1: Prediction structure recommended by the H.264/MVC standard for referencing different views (S) and different temporal (T) images for coding.

Fig 1 shows the prediction structure of MVC recommended by the H.264/MVC standard where eight views are used. According to the prediction structure, a frame may use a maximum of four frames as reference frames. Sometimes, encoding (or decoding) a frame using multiple inter- and intra-view reference frames requires encoding (or decoding) a number of frames in advance; thus, the structure introduces random access delay due to the dependency on other frames.
The random access delay is measured as the maximum number of frames that must be decoded in order to access a B-frame in the hierarchical structure. The access delay for the highest hierarchical order is given by

F_max = 3·level_max + 2·⌊(N − 1)/2⌋    (1)

where level_max is the highest hierarchical order and N is the total number of views [3]. For instance, in order to access a B-frame in the 4th hierarchical order (the B4-frames in Fig 1), 18 frames (F_max = 18) must be decoded. Due to this random access problem, some applications such as interactive real-time communication may not be possible with the existing prediction structure.
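
A minimal numerical check of Eq. (1), assuming the ⌊·⌋ term is evaluated as integer floor division; the function name below is illustrative only.

```python
def random_access_frames(level_max: int, num_views: int) -> int:
    """Access delay of Eq. (1): frames that must be decoded to reach a
    B-frame at the highest hierarchical order for num_views views."""
    return 3 * level_max + 2 * ((num_views - 1) // 2)

# Example from the text: 4th hierarchical order, 8 views -> 18 frames.
print(random_access_frames(4, 8))  # 18
```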

The H.264/AVC video coding standard improves coding performance by reducing the bitstream by up to 50% compared to its predecessor H.263, at the cost of increasing the computational complexity by up to 10 times [7] for a single-view video. Moreover, when H.264/MVC encodes multi-view videos, it requires several times the computational time of H.264/AVC. This enormous computational time requirement limits the scope of 3D video coding applications, especially for electronic devices with limited processing and battery power such as smart phones. Although the simulcast coding technique using H.264/AVC for multi-view videos (where each view is encoded individually) is inferior to H.264/MVC in terms of rate-distortion (RD) performance, it does not have the random access delay problem.

Recently, a video coding technique was proposed where a dynamic background frame (i.e., the McFIS, the most common frame in a scene) is used as an extra reference frame [8][9] for encoding the current frame, assuming that the moving parts of the current frame are referenced using the immediate previous frame and the static background parts are referenced using the McFIS. The McFIS is generated using a Gaussian mixture model [10]-[12]. In this paper a simulcast video coding approach (named H.264/AVC-McFIS) based on H.264/AVC is proposed where the McFIS of each view is used to encode the corresponding view. This technique improves RD performance compared to H.264/MVC and simulcast H.264/AVC on multi-view videos which have significant amounts of background area. Its computational time requirement is, however, higher compared to H.264/MVC and H.264/AVC.

In this paper a new technique is also proposed using 3D motion estimation (3D-ME) to overcome the random access frame delay and computational time problems of the existing MVC technique. In the 3D-ME technique, a 3D frame is formed using the same temporal frames (i.e., the i-th frames) of all views, and motion estimation (ME) is carried out for a macroblock of the current 3D frame using the immediate previous 3D frame as the reference frame (which is formed by the (i−1)-th frames of all views). Thereafter, a fractional ME refinement is also conducted on the individual frames of the current 3D frame using individual reference frames to capture the different motions of each view. As the correlation among the intra-view images is higher than the correlation among the inter-view images, the proposed 3D-ME technique does not degrade the rate-distortion performance significantly, but reduces the overall computational time and eliminates the random access frame delay compared to H.264/MVC, which enables interactive real-time communications.

Another technique (named 3D-ME-McFIS) is also proposed in this paper where an extra reference 3D matrix comprising the McFISes of all views is used for 3D-ME. Experimental results reveal that the proposed 3D-ME-McFIS technique outperforms H.264/MVC by improving rate-distortion performance and reducing computational time without frame delay for most of the video sequences.

The rest of the paper is organized as follows. Section 2 describes the proposed simulcast H.264/AVC-McFIS coding technique with details of the McFIS generation steps. Section 3 explains the proposed 3D-ME video coding technique with its experimental rationale. Section 4 describes the third proposed technique, 3D-ME-McFIS.
Section 5 analyses the computational complexity of the proposed techniques against the state-of-the-art method. Section 6 describes the experimental setup and analyses the experimental results, while Section 7 concludes the paper.

2. PROPOSED SIMULCAST VIDEO CODING WITH MCFIS (H.264/AVC-MCFIS)

The simulcast coding technique using H.264/AVC is inferior to H.264/MVC in terms of rate-distortion performance; however, the simulcast technique does not have the random access frame delay problem. Moreover, the McFIS-based coding technique outperforms the H.264/AVC technique by exploiting static and uncovered background areas. Thus, a new simulcast technique is proposed where the current frame is encoded using two reference frames: the immediate previous frame and a McFIS. Note that the McFIS of a view is generated using the frames of the corresponding view. The ultimate reference frame is selected at block and sub-block levels using the Lagrangian multiplier [7]. The McFIS-based coding technique outperforms H.264/AVC mainly by exploiting the uncovered background areas of a scene.

Generally we consider a pixel to be part of the background if it keeps its intensity for a number of frames. Based on this assumption, dynamic background modeling (DBM) [10]-[12] is formulated. We assume that the k-th Gaussian at time t represents a pixel intensity with mean μ_{k,t}, standard deviation (STD) σ_{k,t}, recent value γ_{k,t}, and weight ω_{k,t}, such that Σ_k ω_{k,t} = 1. The learning parameter α is used to balance the current and past values of parameters such as the weight, STD, mean, etc. After initialization, every new observation X_t (the pixel intensity at time t) is first matched against the existing models in order to find one (say, the k-th model) such that |X_t − μ_{k,t−1}| ≤ 2.5 σ_{k,t−1}. If such a model exists, the corresponding recent value parameter is updated with X_t, and the other parameters are updated with the learning rate α as:

μ_{k,t} = (1 − α) μ_{k,t−1} + α X_t;
σ²_{k,t} = (1 − α) σ²_{k,t−1} + α (X_t − μ_{k,t})ᵀ(X_t − μ_{k,t});
ω_{k,t} = (1 − α) ω_{k,t−1} + α;

and the weights of the remaining Gaussians (i.e., the l-th models with l ≠ k) are updated as ω_{l,t} = (1 − α) ω_{l,t−1}. After each iteration, the weights are normalized. If no matching model exists, a new Gaussian model is introduced with γ = μ = X_t, σ = 30, and ω = 0.001 by evicting the K-th model (ranked by ω/σ in descending order) if it exists. For more details on the modeling and model updating, please refer to [8]-[12]. To get the background pixel intensity for a particular pixel from the above-mentioned models, we take the average of the mean pixel intensity and the recent pixel value of the model that has the highest weight/standard-deviation ratio among the models of that pixel.
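
The following is a minimal per-pixel sketch of the dynamic background modeling described above (matching within 2.5 STD, updates with learning rate α, eviction of the weakest model ranked by ω/σ, and the background intensity taken from the strongest model). It is an illustration for scalar intensities only, not the encoder-side implementation; the class name, the default of K = 3 models, and α = 0.01 are assumptions, while σ = 30 and ω = 0.001 for a new model follow the text.

```python
import math

class PixelGMM:
    """Per-pixel Gaussian mixture for dynamic background (McFIS) modeling.
    Simplified scalar-intensity sketch; K and alpha are assumed defaults."""

    def __init__(self, K=3, alpha=0.01):
        self.K, self.alpha = K, alpha
        self.mu, self.sigma = [], []       # means and standard deviations
        self.omega, self.recent = [], []   # weights and recent matched values

    def update(self, x):
        a = self.alpha
        # 1. Find a matching Gaussian: |x - mu| <= 2.5 * sigma.
        k = next((i for i in range(len(self.mu))
                  if abs(x - self.mu[i]) <= 2.5 * self.sigma[i]), None)
        if k is not None:
            # 2. Update recent value, mean, variance and weights of the match.
            self.recent[k] = x
            self.mu[k] = (1 - a) * self.mu[k] + a * x
            var = (1 - a) * self.sigma[k] ** 2 + a * (x - self.mu[k]) ** 2
            self.sigma[k] = math.sqrt(var)
            self.omega = [(1 - a) * w + (a if i == k else 0.0)
                          for i, w in enumerate(self.omega)]
        else:
            # 3. No match: new Gaussian (sigma = 30, omega = 0.001), evicting
            #    the model with the lowest omega/sigma if K models are in use.
            if len(self.mu) == self.K:
                worst = min(range(self.K),
                            key=lambda i: self.omega[i] / self.sigma[i])
                for lst in (self.mu, self.sigma, self.omega, self.recent):
                    del lst[worst]
            self.mu.append(float(x)); self.sigma.append(30.0)
            self.omega.append(0.001); self.recent.append(float(x))
        # 4. Normalize the weights after every iteration.
        s = sum(self.omega)
        self.omega = [w / s for w in self.omega]

    def background_intensity(self):
        # McFIS pixel: average of the mean and the recent value of the model
        # with the highest omega/sigma ratio.
        best = max(range(len(self.mu)),
                   key=lambda i: self.omega[i] / self.sigma[i])
        return 0.5 * (self.mu[best] + self.recent[best])
```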

Fig 2: Examples of McFISes and uncovered/previously occluded background for the Vassar, Ball Room, Exit, and Break Dancing video sequences: (a), (b), (c), and (d) an original frame of the Vassar, Ball Room, Exit, and Break Dancing sequences respectively; (e), (f), (g), and (h) the corresponding McFISes of these videos.

Four examples of McFIS are shown in Fig 2 using frames of the Vassar, Ball Room, Exit, and Break Dancing video sequences. Fig 2 (a), (b), (c), and (d) show the original frames of the corresponding videos, and (e), (f), (g), and (h) show the McFISes. The red-dotted ovals/rectangles in (e), (f), (g), and (h) indicate the uncovered/occluded background captured by the corresponding McFIS. Capturing the uncovered background with any single reference frame is impossible unless the uncovered background is visible in that frame and the frame is used as a second reference frame. The experimental results on computational time requirements and RD performance are analyzed in Section 5 and Section 6 together with the other schemes. The simulcast H.264/AVC-McFIS requires more encoding time compared to H.264/MVC; however, it can outperform the MVC technique if a video has a significant amount of background area.

3. PROPOSED 3D MOTION ESTIMATION (3D-ME) TECHNIQUE

A scene is captured by a number of cameras placed at different angles in a multi-view system. As the same scene is captured by all cameras, there are inter- and intra-view redundancies. In general, we can assume that the relative object movement within a view is very similar to that of the other views. To find the motion similarity, we have investigated the motion vector relationship among the views of multi-view video sequences using four standard video sequences: Exit, Ball Room, Vassar, and Break Dancing. First we determine the motion vectors of all macroblocks for each frame of a view using a 16×16 block-size full-search ME technique with a ±15 search length. Then we find the similarity of the motion vectors of a view with those of the other views. Fig 3(a) shows the average similarity of the motion vectors among different views, where the first 10 frames are used for each view of each sequence. The figure confirms that the similarity is 51% to 93%; that is, the motion vector of a macroblock in the i-th frame of the j-th view has 51% to 93% similarity with that of the co-located macroblock in the i-th frame of the other views. We have also plotted the absolute motion vector differences for the four standard multi-view video sequences in Fig 3 (b), (c), (d), and (e) using the first two frames. The figures confirm that a significant number of macroblocks have zero motion vector difference between two views; the Vassar and Ball Room sequences in particular have more zero motion vector differences. We can exploit this relationship to avoid the random access delay and computational time problems of the existing H.264/MVC prediction structure.

In the proposed 3D-ME technique, we form a 3D frame comprising the i-th frames of all views, and ME is carried out with an integer search length for a 3D macroblock (the third dimension is formed by the co-located macroblocks of the different views), where the reference 3D frame is formed from the immediate previous, i.e., the (i−1)-th, frames of all views. Thereafter, a fractional ME refinement is also conducted on the individual frames of the current 3D frame using individual reference frames to capture the different motions of each view. As we have seen that the motion similarity is not very high for some video sequences, the fractional ME refinement for each view improves the RD performance. In the proposed 3D-ME technique, we do not exploit inter-view redundancy explicitly, for the following three reasons: (i) the correlation among the intra-view images is higher than the correlation among the inter-view images [1]-[3], (ii) to avoid random access frame delay, and (iii) to reduce computational time.
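
As an illustration of this shared integer-pel 3D motion search, the sketch below accumulates the block-matching cost of a macroblock over all views so that a single motion vector is estimated per 3D macroblock; the per-view fractional refinement and the optional second search against a 3D McFIS (Section 4) are only indicated by a closing comment. This is a simplified sketch, not the actual encoder: the function names, the SAD cost, and the frame-boundary handling are assumptions, while the 16×16 block size and ±15 search length follow the text.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def motion_estimation_3d(curr_views, ref_views, mb_y, mb_x,
                         mb_size=16, search=15):
    """Integer-pel ME for one 3D macroblock.

    curr_views / ref_views: lists of co-temporal frames (one 2D array per
    view), i.e. the i-th and (i-1)-th frames of all views.  The cost of a
    candidate vector is the SAD accumulated over all views, so one motion
    vector is returned for the whole 3D macroblock.
    """
    h, w = curr_views[0].shape
    cur_blocks = [v[mb_y:mb_y + mb_size, mb_x:mb_x + mb_size]
                  for v in curr_views]
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = mb_y + dy, mb_x + dx
            if y < 0 or x < 0 or y + mb_size > h or x + mb_size > w:
                continue  # skip candidates that fall outside the frame
            cost = sum(sad(cb, rv[y:y + mb_size, x:x + mb_size])
                       for cb, rv in zip(cur_blocks, ref_views))
            if cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    # A per-view fractional (quarter-pel) refinement around best_mv, and an
    # optional second search against the 3D McFIS reference, would follow.
    return best_mv, best_cost
```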

Fig 3: Motion relationship among multi-view frames: (a) average similarity of the motion vectors among different views for the four standard multi-view video sequences, where the first 10 frames are used for each view of each sequence; (b), (c), (d), and (e) motion vector differences between two views for the Exit, Ball Room, Vassar, and Break Dancing video sequences respectively.

Fig 4: The different 3D frames used for motion estimation and compensation in the proposed 3D-ME-McFIS method, where ME and MC of a macroblock in the current 3D frame (comprising the i-th frames of all views) are carried out using both the reference 3D frame (comprising the (i−1)-th frames of all views) and the 3D McFIS (comprising the McFISes of all views up to the (i−1)-th frames).

Fig 4 shows the formation of the 3D frames using the i-th frames and the (i−1)-th frames of all views, where the former is the current 3D frame and the latter is the reference 3D frame (the third dimension is formed from the i-th and (i−1)-th frames of the different views respectively). We will discuss the other 3D reference frame in Section 4. The proposed 3D-ME method does not require any disparity estimation [13] between views as we do not explicitly use any inter-view relationships. Instead of multiple motion estimations, one per reference frame (e.g., the B4-frame of the S3 view at the T3 position in Fig 1 requires four motion estimations using 4 reference frames), the proposed method requires only one motion estimation. A significant reduction in computational time can therefore be achieved, as the proposed method needs neither disparity estimation nor motion estimation over multiple reference frames. The proposed method also does not incur any frame delay for random access, which is another benefit compared to the existing prediction structure, as all frames at T_i are available for encoding/decoding the T_{i+1} frames (see Fig 1).

4. PROPOSED 3D MOTION ESTIMATION TECHNIQUE WITH 3D MCFIS (3D-ME-MCFIS)

Although the proposed 3D-ME method successfully overcomes the two limitations of computational time and random access frame delay, in its current state it could not outperform H.264/MVC in terms of rate-distortion performance, as the experimental results (see Fig 3) reveal that the motion vector similarity is not 100%. The experimental results also reveal that in some cases, such as the motion-active video sequences (Exit and Break Dancing), the motion vector similarity is only around 50%; as a result, the proposed method degrades the rate-distortion performance in those cases. It is therefore worth investigating how the computational gain of the proposed method can be utilized to improve the rate-distortion performance without sacrificing the computational gain or reintroducing random access delay. As mentioned earlier, a McFIS can successfully capture the static background, including occluded background areas (if exposed once), of a scene of a video sequence.
We have formed a 3D McFIS using the McFISes of all views and used it as a second reference frame when 3D motion estimation is carried out for the current 3D frame. The proposed 3D-ME-McFIS technique thus uses an additional reference frame compared to the proposed 3D-ME technique. Obviously the 3D-ME-McFIS technique requires additional computational time compared to the 3D-ME technique due to the McFIS modelling and the extra motion estimation against the 3D McFIS; however, better rate-distortion performance is achieved due to foreground referencing (using the immediate previous 3D frame) and background referencing (using the 3D McFIS). In the 3D-ME-McFIS technique, after encoding a 3D frame, we update the 3D McFIS by updating the individual McFIS of each view using the latest encoded frame of the corresponding view. For example, the (i−1)-th 3D McFIS is used while the i-th current 3D frame is encoded, and the i-th 3D McFIS is then updated using the i-th encoded frames. The benefit of updating the 3D McFIS is to keep it relevant for referencing.

Fig 5: Average computational complexity reduction by the proposed methods (3D-ME, 3D-ME-McFIS, and simulcast H.264/AVC-McFIS) against H.264/MVC using four video sequences, where search lengths of (a) 15 and (b) 31 are used (computational saving (%) plotted against quantization parameters (QPs) 20 to 36).

5. COMPUTATIONAL COMPLEXITY

One of the objectives of the proposed methods is to reduce the computational time of the existing multi-view video coding standard in order to enhance the scope of 3D video coding applications. Fig 5 shows the computational time comparisons among the proposed methods and the existing video coding standard using two search lengths (i.e., 15 and 31) on four standard video sequences. The figure reveals that the proposed methods (3D-ME, 3D-ME-McFIS, and H.264/AVC-McFIS) reduced the computational time by around 51%, 37%, and −30% respectively compared to the H.264/MVC standard when a search length of 15 was used. The corresponding figures are 53%, 67%, and −13% when a search length of 31 was used. Experiments were conducted on a PC with an Intel(R) Core(TM)2 CPU 6600 @ 2.40 GHz and 3.50 GB of RAM. Note that the proposed H.264/AVC-McFIS required 30% and 13% more computational time than H.264/MVC for the 15 and 31 search lengths respectively. The proposed 3D-ME-McFIS scheme compares more favourably with the other schemes in terms of computational time when a larger search length is used, as the McFIS modelling requires a fixed number of operations that does not depend on the search length. We also note that the first three views were used to calculate the results. Due to the fixed number of operations required for the McFIS modelling, the proposed 3D-ME-McFIS method reduced the computational time slightly less than the proposed 3D-ME scheme. When a large search length was used, the computational time requirement of the McFIS modelling was negligible compared to that of the motion estimation. Moreover, the proposed 3D-ME-McFIS scheme used a small search length (e.g., 2) for the motion estimation against the McFIS, as the McFIS is only used for referencing the background, which has no motion. Thus, the computational time reduction of the two methods is almost the same for a large search length.

6. EXPERIMENTAL RESULTS

To compare the performance of the proposed schemes (H.264/AVC-McFIS, 3D-ME, and 3D-ME-McFIS), we have implemented all the algorithms based on the H.264/MVC recommendations with a 25 Hz frame rate, ±15 as the search length with quarter-pel accuracy, and 16 as the GOP size.
In the proposed 3D-ME scheme we have considered the IBBP prediction format, and in the proposed 3D-ME-McFIS method we have used only the IPPP format, whereas the hierarchical B-picture prediction structure is used for the H.264/MVC and H.264/AVC-McFIS schemes. Obviously the proposed H.264/AVC-McFIS and 3D-ME-McFIS techniques take some extra operations to generate the McFIS. We use the same technique for modeling the McFIS at the encoder and the decoder; thus, we do not need to encode and transmit the McFISes to the decoder.

Fig 6 shows the rate-distortion performance of H.264/MVC and the three proposed schemes (H.264/AVC-McFIS, 3D-ME, and 3D-ME-McFIS) using the first three views of four standard multi-view video sequences. The figure reveals that the rate-distortion performance of the proposed 3D-ME scheme is comparable to that of H.264/MVC. However, the proposed 3D-ME scheme outperforms H.264/MVC by reducing the computational time by 37~67% (see Fig 5) and eliminating the random access delay. The proposed 3D-ME-McFIS scheme outperforms H.264/MVC in terms of rate-distortion performance (Break Dancing is an exception) by improving PSNR by more than 0.25 dB, reducing the computational time by more than 51%, and eliminating the random access frame delay. Due to its large motion (for which background modeling is less effective), the proposed 3D-ME-McFIS method does not outperform H.264/MVC for the Break Dancing sequence. The proposed H.264/AVC-McFIS scheme outperforms H.264/MVC in terms of rate-distortion performance; however, it takes 13% to 30% more computational time than H.264/MVC.

Fig 6: Rate-distortion performance of H.264/MVC and the proposed schemes (3D-ME, 3D-ME-McFIS, and H.264/AVC-McFIS) using four standard video sequences, namely Exit, Ball Room, Vassar, and Break Dancing.

7. CONCLUSIONS

In this paper, we proposed a new 3D motion estimation and motion compensation scheme to reduce the computational time and eliminate the random access frame delay of the existing H.264/MVC multi-view video coding standard. To eliminate the random access frame delay, we first proposed a simulcast McFIS-based technique on the H.264/AVC platform. The proposed technique outperforms H.264/MVC in terms of RD performance; however, it takes up to 30% more computational time. In the proposed 3D-ME technique, a 3D frame is formed using the same temporal frames of all views and motion estimation is carried out for a block of the current 3D frame using the immediate previous 3D frame as the reference frame. A motion estimation refinement on the individual current images is also conducted after the integer-level 3D motion estimation to improve the RD performance. This technique outperforms the existing standard by reducing the computational time by more than 51% and eliminating the random access frame delay without significantly degrading the rate-distortion performance compared to the state-of-the-art method, i.e., H.264/MVC. This paper also proposes another technique (3D-ME-McFIS) where an extra 3D reference frame is used in addition to the immediate previous 3D frame. The extra 3D frame is formed using the dynamic background frames of each view, popularly known as McFISes (the most common frame in a scene), based on Gaussian mixture modelling. The experimental results reveal that 3D-ME-McFIS outperforms the H.264/MVC coding standard by improving PSNR by 0.25 dB, reducing the computational time by 51%, and eliminating the random access frame delay of the existing H.264/MVC multi-view video coding. The proposed techniques enhance the scope of 3D video coding applications for interactive real-time video communications.

8. REFERENCES

[1] A. Vetro, T. Wiegand, and G. J. Sullivan, "Overview of the Stereo and Multiview Video Coding Extensions of the H.264/MPEG-4 AVC Standard," Proceedings of the IEEE, 99(4), pp. 626-642, 2011.
[2] P. Pandit, A. Vetro, and Y. Chen, "Joint Multiview Video Model (JMVM) 7 Reference Software," N9579, MPEG of ISO/IEC JTC1/SC29/WG11, Antalya, Jan. 2008.
[3] M. Talebpourazad, "3D-TV content generation and multiview video coding," PhD thesis, 2010.
[4] H. Schwarz, D. Marpe, and T. Wiegand, "Analysis of hierarchical B-pictures and MCTF," IEEE International Conference on Multimedia and Expo, pp. 1929-1932, 2006.
[5] M. Paul, W. Lin, C. T. Lau, and B. S. Lee, "McFIS in hierarchical bipredictive picture-based video coding for referencing the stable area in a scene," IEEE International Conference on Image Processing (IEEE ICIP-11), 2011.
[6] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, 2003.
[7] M. Paul, M. Frater, and J. Arnold, "An Efficient Mode Selection Prior to the Actual Encoding for H.264/AVC Encoder," IEEE Transactions on Multimedia, vol. 11, no. 4, pp. 581-588, June 2009.
[8] M. Paul, W. Lin, C. T. Lau, and B. S. Lee, "Explore and model better I-frame for video coding," IEEE Transactions on Circuits and Systems for Video Technology, 2011.
[9] M. Paul, W. Lin, C. T. Lau, and B. S. Lee, "Video coding using the most common frame in scene," IEEE International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP-10), pp. 734-737, 2010.
[10] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," IEEE CVPR, vol. 2, pp. 246-252, 1999.
[11] D.-S. Lee, "Effective Gaussian mixture learning for video background subtraction," IEEE Transactions on PAMI, 27(5), pp. 827-832, May 2005.
[12] M. Haque, M. Murshed, and M. Paul, "On Stable Dynamic Background Generation Technique using Gaussian Mixture Models for Robust Object Detection," IEEE International Conference on Advanced Video and Signal Based Surveillance (IEEE AVSS-08), pp. 41-48, 2008.
[13] X. Li, D. Zhao, S. Ma, and W. Gao, "Fast disparity and motion estimation based on correlations for multi-view video coding," IEEE Transactions on Consumer Electronics, 54(4), pp. 2037-2044, 2008.