A Quantization-Friendly Separable Convolution for MobileNets

Tao Sheng (tsheng@qti.qualcomm.com), Chen Feng (chenf@qti.qualcomm.com), Shaojie Zhuo (shaojiez@qti.qualcomm.com), Xiaopeng Zhang (parker.zhang@gmail.com), Liang Shen (liang.shen@qti.qualcomm.com), Mickey Aleksic (maleksic@qti.qualcomm.com)
Qualcomm Technologies, Inc.
arXiv:1803.08607v1 [cs.CV] 22 Mar 2018

Abstract

As deep learning (DL) is rapidly being pushed to edge computing, researchers have invented various ways to make inference computation more efficient on mobile/IoT devices, such as network pruning and parameter compression. Quantization, as one of the key approaches, can effectively offload the GPU and makes it possible to deploy DL on a fixed-point pipeline. Unfortunately, not all existing network designs are friendly to quantization. For example, although the popular lightweight MobileNetV1 [1] successfully reduces parameter size and computation latency with separable convolution, our experiments show that its quantized models have a large accuracy gap relative to their floating-point counterparts. To resolve this, we analyzed the root cause of the quantization loss and propose a quantization-friendly separable convolution architecture. Evaluated on the image classification task on the ImageNet2012 dataset, our modified MobileNetV1 model achieves a top-1 accuracy of 68.03% with 8-bit inference, almost closing the gap to the float pipeline.

Keywords: Separable Convolution, MobileNetV1, Quantization, Fixed-point Inference

1 Introduction

Quantization is crucial for DL inference on mobile/IoT platforms, which have a very limited budget for power and memory consumption. Such platforms often rely on fixed-point computational hardware blocks, such as a Digital Signal Processor (DSP), to achieve higher power efficiency than a floating-point processor such as a GPU. For existing DL models such as VGGNet [2], GoogleNet [3], and ResNet [4], although quantization may not impact inference accuracy because of their over-parameterized design, it would be difficult to deploy those models on mobile platforms due to their large computation latency. Many lightweight networks, however, can trade off accuracy for efficiency by replacing conventional convolution with depthwise separable convolution, as shown in Figure 1(a)(b). For example, the MobileNets proposed by Google drastically shrink parameter size and memory footprint, and are thus becoming increasingly popular on mobile platforms. The downside is that the separable convolution core layer in MobileNetV1 causes a large quantization loss, resulting in significant feature representation degradation in the 8-bit inference pipeline.

Figure 1. Our proposed quantization-friendly separable convolution core layer design vs. the separable convolution in MobileNets and standard convolution: (a) standard convolution; (b) MobileNet separable convolution, i.e., depthwise convolution with BN/ReLU6 followed by pointwise convolution with BN/ReLU6; (c) the proposed quantization-friendly separable convolution, i.e., depthwise convolution feeding the pointwise convolution directly.

To demonstrate the quantization issue, we selected the TensorFlow implementations of MobileNetV1 [6] and InceptionV3 [7], and compared their accuracy on the float pipeline against the 8-bit quantized pipeline. The results are summarized in Table 1. The top-1 accuracy of InceptionV3 drops only slightly after applying 8-bit quantization, while the accuracy loss is significant for MobileNetV1.

Table 1. Top-1 accuracy on the ImageNet2012 validation dataset

Networks      Float Pipeline   8-bit Pipeline   Comments
InceptionV3   78.00%           76.92%           Only standard convolution
MobileNetV1   70.50%           1.80%            Mainly separable convolution

There are a few ways to potentially address this issue. The most straightforward approach is quantization with more bits. For example, increasing from 8-bit to 16-bit could

boost the accuracy [14], but this is largely limited by the capability of the target platforms. Alternatively, we could re-train the network to generate a dedicated quantized model for fixed-point inference. Google proposed a quantized training framework [5] co-designed with the quantized inference to minimize the loss of accuracy from quantization on inference models. The framework simulates quantization effects in the forward pass of training, whereas back-propagation still enforces the float pipeline. This re-training framework can reduce the quantization loss specifically for the fixed-point pipeline, but at the cost of extra training, and the system needs to maintain multiple models for different platforms.

In this paper, we focus on a new architecture design for the separable convolution layer to build lightweight quantization-friendly networks. The proposed architecture requires only a single training pass in the float pipeline, and the trained model can then be deployed to different platforms with float or fixed-point inference pipelines with minimum accuracy loss. To achieve this, we look deep into the root causes of the accuracy degradation of MobileNetV1 in the 8-bit inference pipeline. Based on these findings, we propose a re-architected quantization-friendly MobileNetV1 that maintains a competitive accuracy in the float pipeline, together with a much higher inference accuracy in the quantized 8-bit pipeline. Our main contributions are:

1. We identified that batch normalization and ReLU6 are the major root causes of the quantization loss for MobileNetV1.
2. We proposed a quantization-friendly separable convolution, and empirically proved its effectiveness based on MobileNetV1 in both the float pipeline and the fixed-point pipeline.

2 Quantization Scheme and Loss Analysis

In this section, we explore the TensorFlow (TF) [8] 8-bit quantized MobileNetV1 model and find the root cause of the accuracy loss in the fixed-point pipeline. Figure 2 shows a typical 8-bit quantized pipeline. A TF 8-bit quantized model is directly generated from a pre-trained float model, where all weights are first quantized offline. During inference, any float input is quantized to an 8-bit unsigned value before being passed to a fixed-point runtime operation, such as QuantizedConv2d, QuantizedAdd, or QuantizedMul. These operations produce a 32-bit accumulated result, which is converted down to an 8-bit output through an activation re-quantization step. Note that this output becomes the input to the next operation.

Figure 2. A fixed-point quantized pipeline: float32 weights are quantized offline to uint8, float inputs are quantized to uint8 at runtime, each operation accumulates into int32 and is re-quantized back to uint8, introducing input quantization loss, weight quantization loss, runtime saturation loss, activation re-quantization loss, and clipping loss.

2.1 TensorFlow 8-bit Quantization Scheme

TensorFlow 8-bit quantization uses a uniform quantizer, in which all quantization steps are of equal size. Let x_float represent the float value of signal x; the TF 8-bit quantized value, denoted as x_quant8, can be calculated as

x_{quant8} = \left[ x_{float} / \Delta_x \right] - \delta_x,  (1)

where

\Delta_x = \frac{x_{max} - x_{min}}{2^b - 1} \quad \text{and} \quad \delta_x = \left[ x_{min} / \Delta_x \right],  (2)

\Delta_x represents the quantization step size; b is the bit-width, i.e., b = 8; and \delta_x is the offset value such that the float value 0 is exactly representable. x_{min} and x_{max} are the min and max values of x in the float domain, and [\cdot] represents the nearest rounding operation. In the TensorFlow implementation, it is defined as

[x] = \mathrm{sgn}(x) \, \lfloor |x| + 0.5 \rfloor,  (3)

where \mathrm{sgn}(x) is the sign of the signal x, and \lfloor \cdot \rfloor represents the floor operation.
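For concreteness, the uniform quantizer of equations (1)-(3) can be sketched in a few lines of NumPy. This is an illustrative re-implementation rather than the TensorFlow source; the saturation of the codes to the [0, 2^b - 1] range is our assumption about how out-of-range values are handled.

```python
import numpy as np

def round_nearest(x):
    """Nearest rounding of equation (3): sgn(x) * floor(|x| + 0.5)."""
    return np.sign(x) * np.floor(np.abs(x) + 0.5)

def quantize_uint8(x_float, x_min, x_max, num_bits=8):
    """Uniform 8-bit quantization following equations (1)-(2)."""
    delta = (x_max - x_min) / (2 ** num_bits - 1)     # step size, eq. (2)
    offset = round_nearest(x_min / delta)             # offset so that float 0.0 is exactly representable
    x_quant = round_nearest(x_float / delta) - offset # eq. (1)
    x_quant = np.clip(x_quant, 0, 2 ** num_bits - 1)  # assumed saturation to the uint8 range
    return x_quant.astype(np.uint8), delta, offset

def dequantize(x_quant, delta, offset):
    """Approximate float recovery: x_float is roughly delta * (x_quant + offset)."""
    return delta * (x_quant.astype(np.float32) + offset)
```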
Based on the definitions above, the accumulated result of a convolution operation is computed by

accum_{float} = \sum_i x_{float}^{(i)} w_{float}^{(i)} = \sum_i \Delta_x \left( x_{quant8}^{(i)} + \delta_x \right) \cdot \Delta_w \left( w_{quant8}^{(i)} + \delta_w \right) = \Delta_x \Delta_w \cdot accum_{int32}.  (4)

Finally, given the known min and max values of the output, by combining equations (1) and (4), the re-quantized output can be calculated by multiplying the accumulated result by \Delta_x \Delta_w / \Delta_{output} and then subtracting the output offset \delta_{output}:

output_{quant8} = \left[ \frac{1}{\Delta_{output}} accum_{float} \right] - \delta_{output} = \left[ \frac{\Delta_x \Delta_w}{\Delta_{output}} accum_{int32} \right] - \delta_{output}.  (5)

2.2 Metric for Quantization Loss

As depicted in Figure 2, there are five types of loss in the fixed-point quantized pipeline: input quantization loss, weight quantization loss, runtime saturation loss, activation re-quantization loss, and possible clipping loss for certain non-linear operations, such as ReLU6. To better understand the contribution of each type of loss, we use the Signal-to-Quantization-Noise Ratio (SQNR), defined as the power of the unquantized signal x divided by the power of the quantization error n, as a metric to evaluate the quantization accuracy at each layer output:

SQNR = 10 \log_{10} \left( E(x^2) / E(n^2) \right) \ \text{in dB}.  (6)
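The fixed-point arithmetic of equations (4)-(5) and the SQNR of equation (6) can likewise be sketched for a single inner product. This is a simplified model of the pipeline with per-tensor scales; the rounding and clipping details, and the variable names, are our assumptions and not the TensorFlow runtime implementation.

```python
import numpy as np

def quantized_dot(x_q, w_q, delta_x, off_x, delta_w, off_w,
                  out_min, out_max, num_bits=8):
    """Inner product in the fixed-point pipeline, per equations (4)-(5)."""
    # eq. (4): int32 accumulation of (x_q + input offset) * (w_q + weight offset)
    accum_int32 = np.sum((x_q.astype(np.int32) + int(off_x)) *
                         (w_q.astype(np.int32) + int(off_w)))
    # eq. (5): rescale by delta_x * delta_w / delta_out, then subtract the output offset
    delta_out = (out_max - out_min) / (2 ** num_bits - 1)
    off_out = np.round(out_min / delta_out)
    out_q = np.round(delta_x * delta_w / delta_out * accum_int32) - off_out
    return np.clip(out_q, 0, 2 ** num_bits - 1).astype(np.uint8)

def sqnr_db(x_float, x_dequant):
    """Empirical SQNR of equation (6): signal power over quantization-noise power, in dB."""
    noise = x_float - x_dequant
    return 10.0 * np.log10(np.mean(x_float ** 2) / np.mean(noise ** 2))
```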

Since the average magnitude of the input signal x is much larger than the quantization step size \Delta_x, it is reasonable to assume that the quantization error is zero mean with uniform distribution and that its probability density function (PDF) integrates to 1 [10]. Therefore, for an 8-bit linear quantizer, the noise power can be calculated by

E(n^2) = \int_{-\Delta_x/2}^{\Delta_x/2} \frac{1}{\Delta_x} n^2 \, dn = \frac{\Delta_x^2}{12}.  (7)

Substituting equations (2) and (7) into equation (6), we get

SQNR = 58.92 - 10 \log_{10} \frac{(x_{max} - x_{min})^2}{E(x^2)} \ \text{in dB}.  (8)

SQNR is tightly coupled with the signal distribution. From equation (8), it is obvious that SQNR is determined by two terms: the power of the signal x and the quantization range. Therefore, increasing the signal power or decreasing the quantization range helps to increase the output SQNR.

2.3 Quantization Loss Analysis on MobileNetV1

2.3.1 Batch Normalization in the Depthwise Convolution Layer

As shown in Figure 1(b), a typical MobileNetV1 core layer consists of a depthwise convolution and a pointwise convolution, each of which is followed by a batch normalization [9] and a non-linear activation function, respectively. In the TensorFlow implementation, ReLU6 [11] is used as the non-linear activation function. Consider a layer input x = (x^{(1)}, ..., x^{(d)}) with d channels and m elements in each channel within a mini-batch; the batch normalization transform in the depthwise convolution layer is applied on each channel independently and can be expressed as

y_i^{(k)} = \gamma^{(k)} \hat{x}_i^{(k)} + \beta^{(k)} = \gamma^{(k)} \frac{x_i^{(k)} - \mu^{(k)}}{\sqrt{(\sigma^{(k)})^2 + \epsilon}} + \beta^{(k)}, \quad i = 1,...,m, \; k = 1,...,d,  (9)

where \hat{x}_i^{(k)} represents the normalized value of x_i^{(k)} on channel k, \mu^{(k)} and (\sigma^{(k)})^2 are the mean and variance over the mini-batch, and \gamma^{(k)} and \beta^{(k)} are the scale and shift. Note that \epsilon is a given small constant value; in the TensorFlow implementation, \epsilon = 0.0010000000475.

The batch normalization transform can be further folded in the fixed-point pipeline. Let

\alpha^{(k)} = \frac{\gamma^{(k)}}{\sqrt{(\sigma^{(k)})^2 + \epsilon}} \quad \text{and} \quad \tilde{\beta}^{(k)} = \beta^{(k)} - \frac{\gamma^{(k)} \mu^{(k)}}{\sqrt{(\sigma^{(k)})^2 + \epsilon}};  (10)

then equation (9) can be reformulated as

y_i^{(k)} = \alpha^{(k)} x_i^{(k)} + \tilde{\beta}^{(k)}, \quad i = 1,...,m, \; k = 1,...,d.  (11)

In the TensorFlow implementation, for each channel k, \alpha^{(k)} can be combined with the weights and folded into the convolution operations to further reduce the computation cost.

Figure 3. An example of \alpha values across the 32 channels of the first depthwise convolution layer of the MobileNetV1 float model: six outlier channels have \alpha values between roughly 14 and 31, while all remaining channels are below 4.

Depthwise convolution is applied on each channel independently. However, the min and max values used for weight quantization are taken collectively from all channels. An outlier in one channel can easily cause a huge quantization loss for the whole model due to an enlarged data range. Without correlation across channels, depthwise convolution is prone to producing all-zero values in one channel, leading to zero variance ((\sigma^{(k)})^2 = 0) for that specific channel. This is commonly observed in MobileNetV1 models. Referring to equation (10), zero variance of channel k produces a very large value of \alpha^{(k)} due to the small constant value of \epsilon. Figure 3 shows the observed \alpha values across the 32 channels extracted from the first depthwise convolution layer of the MobileNetV1 float model. Notice that the 6 outliers of \alpha caused by the zero-variance issue largely increase the quantization range. As a result, quantization bits are wasted on preserving those large values, since they all correspond to all-zero-value channels, while the small \alpha values corresponding to informative channels are not well preserved after quantization, which badly hurts the representation power of the model.
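The folding of equations (10)-(11) and the effect of a zero-variance channel can be illustrated with a short sketch; the \gamma, \beta, \mu and variance values below are made-up numbers for illustration, not values taken from the model.

```python
import numpy as np

def fold_batchnorm(gamma, beta, mean, var, eps=0.0010000000475):
    """Per-channel folding of the batch normalization transform, equations (10)-(11)."""
    alpha = gamma / np.sqrt(var + eps)                      # eq. (10), folded into the conv weights
    beta_folded = beta - gamma * mean / np.sqrt(var + eps)  # remaining per-channel bias
    return alpha, beta_folded

# A depthwise output channel that is all zeros has zero variance, so its alpha
# blows up to roughly gamma / sqrt(eps), i.e. about 31.6 * gamma, stretching the
# collective weight quantization range of the layer (the outliers seen in Figure 3).
gamma, beta, mean = np.ones(4), np.zeros(4), np.zeros(4)
var = np.array([0.0, 0.25, 1.0, 4.0])           # channel 0 is an all-zero channel
alpha, _ = fold_batchnorm(gamma, beta, mean, var)
print(alpha)                                    # approximately [31.6, 2.0, 1.0, 0.5]
```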
From our experiments, without retraining, properly handling the zero-variance issue by changing the variance of a channel with all-zero values to the mean of the variances of the remaining channels in that layer improves the top-1 accuracy of the quantized MobileNetV1 on the ImageNet2012 validation dataset dramatically, from 1.80% to 45.73%, on the TF8 inference pipeline.

A standard convolution both filters and combines inputs into a new set of outputs in one step. In MobileNetV1, the depthwise separable convolution splits this into two layers, a depthwise layer for filtering and a pointwise layer for combining [1], thus drastically reducing computation and model size while preserving feature representations. Based on this principle, we can remove the non-linear operations, i.e., batch normalization and ReLU6, between the two layers, and let the network learn proper weights to handle the batch normalization transform directly. This procedure preserves all the feature representations while making the model quantization-friendly.
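A minimal sketch of this variance repair (no retraining; only the folded \alpha values change) might look as follows. The exact zero-detection rule and the handling of a layer whose channels are all zero-variance are our assumptions, as the text above does not spell them out.

```python
import numpy as np

def repair_zero_variance(var):
    """Replace the variance of all-zero (zero-variance) channels with the mean
    variance of the remaining channels in the same layer."""
    var = np.asarray(var, dtype=np.float64).copy()
    zero = (var == 0.0)
    if zero.any() and (~zero).any():   # leave the layer untouched if every channel is zero-variance
        var[zero] = var[~zero].mean()
    return var
```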

To further understand the per-layer output accuracy of the network, we use SQNR, defined in equation (8), as a metric to observe the quantization loss in each layer.

Figure 4. A comparison of the averaged per-layer output SQNR (in dB) of MobileNetV1 with different core layer designs, reported at the depthwise and pointwise convolution outputs of each layer: ReLU in all pointwise layers, ReLU6 in all pointwise layers, and the original MobileNetV1 with \alpha folded.

Figure 4 compares the averaged per-layer output SQNR of the original MobileNetV1 with \alpha folded into the convolution weights (black curve) against a variant that simply removes batch normalization and ReLU6 in all depthwise convolution layers (blue curve). We still keep the batch normalization and ReLU6 in all pointwise convolution layers. 1000 images are randomly selected from the ImageNet2012 validation dataset (one from each class). From our experiment, introducing batch normalization and ReLU6 between the depthwise convolution and the pointwise convolution in fact largely degrades the per-layer output SQNR.

2.3.2 ReLU6 or ReLU

In this section, we still use SQNR as a metric to measure the effect of choosing different activation functions in all pointwise convolution layers. Note that for a linear quantizer, SQNR is higher when the signal distribution is more uniform, and lower otherwise. Figure 4 shows the averaged per-layer output SQNR of MobileNetV1 using ReLU and ReLU6 as the activation function at all pointwise convolution layers. A huge SQNR drop is observed at the first pointwise convolution layer when using ReLU6. Based on equation (8), although ReLU6 helps to reduce the quantization range, the signal power is also reduced by the clipping operation. Ideally, this should produce an SQNR similar to that of ReLU. However, clipping the signal at early layers may have the side effect of distorting the signal distribution and making it less quantization-friendly, as a result of compensating for the clipping loss during training. As we observed, this leads to a large SQNR drop from one layer to the next. Experimental results on the accuracy improvement obtained by replacing ReLU6 with ReLU are shown in Section 4.

2.3.3 L2 Regularization on Weights

Since SQNR is tightly coupled with the signal distribution, we further enable L2 regularization on the weights in all depthwise convolution layers during training. The L2 regularization penalizes weights with large magnitudes. Large weights could potentially increase the quantization range and make the weight distribution less uniform, leading to a large quantization loss. By enforcing a better weight distribution, a quantized model with an increased top-1 accuracy can be expected.

3 Quantization-Friendly Separable Convolution for MobileNets

Based on the quantization loss analysis in the previous section, we propose a quantization-friendly separable convolution framework for MobileNets. The goal is to solve the large quantization loss problem so that the quantized model can achieve an accuracy similar to the float model, while no re-training is required for the fixed-point pipeline.

3.1 Architecture of the Quantization-Friendly Separable Convolution

Figure 1(b) shows the separable convolution core layer in the current MobileNetV1 architecture, in which a batch normalization and a non-linear activation operation are introduced between the depthwise convolution and the pointwise convolution. From our analysis, due to the nature of depthwise convolution, this architecture leads to a problematic quantization model. Therefore, in Figure 1(c), three major changes are made to make the separable convolution core layer quantization-friendly.

1. Batch normalization and ReLU6 are removed from all depthwise convolution layers.
We believe that a separable convolution should consist of a depthwise convolution followed directly by a pointwise convolution, without any non-linear operation between the two. This procedure not only preserves the feature representations well, but is also quantization-friendly.

2. ReLU6 is replaced with ReLU in the remaining layers. In the TensorFlow implementation of MobileNetV1, ReLU6 is used as the non-linear activation function. However, we think 6 is a very arbitrary number. Although [11] indicates that ReLU6 can encourage a model to learn sparse features earlier, clipping the signal at early layers may lead to a quantization-unfriendly signal distribution, and thus largely decreases the SQNR of the layer output.

3. L2 regularization on the weights in all depthwise convolution layers is enabled during training.

3.2 A Quantization-Friendly MobileNetV1 Model

The layer structure of the proposed quantization-friendly MobileNetV1 model is shown in Table 2, which follows the overall layer structure defined in [1]. The separable convolution core layer has been replaced with the quantization-friendly version described in the previous section. This model still inherits the efficiency in terms of computational cost and model size, while achieving high precision on fixed-point processors. A sketch of the resulting core layer is given below.
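As a sketch, the proposed core layer of Figure 1(c) could be written with the TensorFlow Keras API as below. The 3x3 depthwise kernel, 'same' padding, and the 4e-5 L2 weight decay are illustrative assumptions; the paper does not state these hyperparameters, and this is not the authors' training code.

```python
import tensorflow as tf

def quantization_friendly_separable_conv(x, out_channels, stride=1, weight_decay=4e-5):
    """Depthwise conv -> pointwise conv -> BN -> ReLU, with L2 regularization on
    the depthwise weights and no BN/ReLU6 between the two convolutions."""
    x = tf.keras.layers.DepthwiseConv2D(
        kernel_size=3, strides=stride, padding='same', use_bias=False,
        depthwise_regularizer=tf.keras.regularizers.l2(weight_decay))(x)
    # change 1: no batch normalization or ReLU6 between the depthwise and pointwise convolutions
    x = tf.keras.layers.Conv2D(out_channels, kernel_size=1, padding='same',
                               use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)   # change 2: plain ReLU instead of ReLU6
    return x
```

Stacking this block with the repeats and strides listed in Table 2 reproduces the layer structure described above.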

Table 2. Quantization-friendly modified MobileNetV1 (DC = depthwise convolution, PC = pointwise convolution)

Input        Operator              Repeat   Stride
224x224x3    Conv2d + ReLU         1        2
112x112x32   DC + PC + BN + ReLU   1        1
112x112x64   DC + PC + BN + ReLU   1        2
56x56x128    DC + PC + BN + ReLU   1        1
56x56x128    DC + PC + BN + ReLU   1        2
28x28x256    DC + PC + BN + ReLU   1        1
28x28x256    DC + PC + BN + ReLU   1        2
14x14x512    DC + PC + BN + ReLU   5        1
14x14x512    DC + PC + BN + ReLU   1        2
7x7x1024     DC + PC + BN + ReLU   1        2
7x7x1024     AvgPool               1        1
1x1x1024     Conv2d + ReLU         1        1
1x1x1000     Softmax               1        1

4 Experimental Results

We train the proposed quantization-friendly MobileNetV1 float models using the TensorFlow training framework. We follow the same training hyperparameters as MobileNetV1, except that we use one Nvidia GeForce GTX TITAN X card and a batch size of 128 during training. The ImageNet2012 dataset is used for training and validation. Note that training is only required for the float models.

The experimental results of applying each change to the original MobileNetV1 model, in both the float pipeline and the 8-bit quantized pipeline, are shown in Figure 5. In the float pipeline, our trained float model achieves a top-1 accuracy similar to the original MobileNetV1 TF model. In the 8-bit pipeline, by removing batch normalization and ReLU6 in all depthwise convolution layers, the top-1 accuracy of the quantized model is dramatically improved from 1.80% to 61.50%. In addition, by simply replacing ReLU6 with ReLU, the top-1 accuracy of 8-bit quantized inference is further improved to 67.80%. Furthermore, by enabling L2 regularization on the weights in all depthwise convolution layers during training, the overall accuracy of the 8-bit pipeline improves by another 0.23%. From our experiments, the proposed quantization-friendly MobileNetV1 model achieves an accuracy of 68.03% in the 8-bit quantized pipeline, while maintaining an accuracy of 70.77% in the float pipeline for the same model.

Figure 5. Top-1 accuracy with different core layer designs on the ImageNet2012 validation dataset

Core Layer Design                                            Float Pipeline   8-bit Pipeline
Original (BN and ReLU6 in all layers)                        70.50%           1.80%
Proposed: remove BN/ReLU6 in depthwise layers (ReLU6 kept)   70.55%           61.50%
Proposed: + replace ReLU6 with ReLU                          70.80%           67.80%
Proposed: + L2 regularizer on depthwise weights              70.77%           68.03%

5 Conclusion and Future Work

We proposed an effective quantization-friendly separable convolution architecture and integrated it into MobileNets for image classification. Without reducing the accuracy in the float pipeline, our proposed architecture shows a significant accuracy boost in the 8-bit quantized pipeline. To generalize this architecture, we will keep applying it to more networks based on separable convolution, e.g., MobileNetV2 [12] and ShuffleNet [13], and verify their fixed-point inference accuracy. We will also apply the proposed architecture to object detection and instance segmentation applications, and measure the power and latency of the proposed quantization-friendly MobileNets on device.

References

[1] A. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. Apr. 17, 2017, https://arxiv.org/abs/1704.04861.
[2] K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. Sep. 4, 2014, https://arxiv.org/abs/1409.1556.
[3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on CVPR, pages 1-9, 2015.
[4] K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. Dec. 10, 2015, https://arxiv.org/abs/1512.03385.
[5] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. Dec. 15, 2017, https://arxiv.org/abs/1712.05877.
[6] Google TensorFlow MobileNetV1 Model. https://storage.googleapis.com/download.tensorflow.org/models/tflite/mobilenet_v1_1.0_224_float_2017_11_08.zip
[7] Google TensorFlow InceptionV3 Model. http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
[8] Google TensorFlow Framework. https://www.tensorflow.org/
[9] S. Ioffe and C. Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Feb. 11, 2015, https://arxiv.org/abs/1502.03167.
[10] Udo Zölzer. Digital Audio Signal Processing, Chapter 2. John Wiley & Sons, Dec. 15, 1997.
[11] A. Krizhevsky. Convolutional Deep Belief Networks on CIFAR-10. http://www.cs.utoronto.ca/~kriz/conv-cifar10-aug2010.pdf
[12] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen. Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation. Jan. 13, 2018, https://arxiv.org/abs/1801.04381.
[13] X. Zhang, X. Zhou, M. Lin, and J. Sun. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. Dec. 7, 2017, https://arxiv.org/abs/1707.01083.
[14] J. Cheng, P. Wang, G. Li, Q. Hu, and H. Lu. Recent Advances in Efficient Computation of Deep Convolutional Neural Networks. Feb. 11, 2018, https://arxiv.org/abs/1802.00939.