Variation-and-Aging Aware Low Power embedded SRAM for Multimedia Applications

Na Gong, Shixiong Jiang, Anoosha Challapalli, Manpinder Panesar and Ramalingam Sridhar
University at Buffalo, State University of New York, Buffalo, NY, USA
rsridhar@buffalo.edu

ABSTRACT

This paper presents a low power embedded SRAM design for MPEG-4 video processors. Considering both process variation and the aging effect, the proposed design adopts an optimal high voltage for spatial voltage scaling to achieve high power efficiency. Simulations in FreePDK 45 nm CMOS technology show that the proposed technique achieves 85%, 90%, and 79% reduction in write power, read power, and leakage current, respectively, with graceful degradation (~5.6%) in video quality, as compared to a conventional SRAM design.

I. INTRODUCTION

The growing popularity of powerful smartphones and other portable devices has driven exponential growth in the demand for multimedia applications. MPEG-4 (Moving Picture Experts Group) is one of the most popular video codec standards [1, 2] in multimedia communications. Because of their intensive computation, these multimedia applications require highly frequent embedded memory accesses. Accordingly, embedded SRAM consumes a large amount of power, limiting the battery lifetime of portable devices.

As a popular low power technique, supply voltage scaling has been widely used in CMOS VLSI systems [2-4]. However, SRAM cells are highly vulnerable to failures in low-voltage operation, so aggressive voltage scaling of all bits leads to considerable image/video quality degradation. To counter this effect, Minki Cho et al. have recently explored spatial voltage scaling (SVS) [5], which applies different voltages to the lower-order bits (LOBs) and higher-order bits (HOBs) of an SRAM array. HOBs at normal voltage preserve acceptable multimedia quality, while LOBs at lower voltage reduce the power consumption effectively.
However, since the HOBs are still stored in cells at normal voltage, the power savings provided by SVS are limited. In this paper, we propose a new low power SRAM design that applies SVS with an optimal high voltage for the HOBs. Compared to existing work, our scheme differs in two ways: (1) it considers both process variation and the aging effect; (2) the optimal high voltage is lower than the normal voltage, thereby offering greater potential for power reduction.

The paper is organized as follows. Section II discusses SRAM failures under process variation and the aging effect. Section III presents an overview of the MPEG-4 decoder. A detailed description of the proposed SRAM design is given in Section IV. Simulation results and analysis are provided in Section V, and the paper is concluded in Section VI. The analysis in this paper is based on FreePDK 45 nm CMOS technology [6].

II. SRAM FAILURE WITH VARIATION AND AGING EFFECT

A. SRAM failure analysis

Fig. 1 (a) shows a standard 6T SRAM cell design (W_PU : W_PD : W_AX = 1 : 2 : 1.5). In SRAM cells, read failure and write failure are the two most important failure mechanisms. A read failure occurs if the voltage difference between the two bit-lines is smaller than the offset voltage of the sense amplifier. The minimum acceptable bit-line voltage difference in the state of the art is 100 mV [4]. A write failure, on the other hand, occurs if a cell cannot be written successfully.

978-1-4673-1295-0/12/$31.00 ©2012 IEEE

Figure 1: SRAM SNM with voltage scaling. (a) The schematic of the 6T SRAM cell (W_AX/L_AX = 75/50, W_PU/L_PU = 50/50, W_PD/L_PD = 100/50); (b) Read SNM; (c) Write SNM.

Traditionally, the reliability of SRAM cells is measured in terms of static noise margin (SNM), which is the side length of the largest square that can be embedded inside the butterfly curves. The read SNM and write SNM of the SRAM cell at different voltages are shown in Fig. 1 (b) and (c), respectively. As shown, both the read SNM and the write SNM are reduced significantly as the supply voltage V_dd scales down, so stability becomes an important issue in voltage-scaled SRAM. It is also shown that, as the voltage scales, the read SNM is much smaller than the write SNM. For example, when V_dd equals 0.6 V, the write SNM is 0.225 V, while the read SNM is as low as 0.081 V. This indicates that read failure dominates SRAM failures in low-voltage operation.

B. Impact of process variation

SRAM failures at low voltage become more severe with the increasing process variation as technology scales. In particular, the random dopant fluctuation (RDF) effect leads to threshold voltage (V_th) variation and SRAM cell failures [4]. In low-voltage operation with process variation, Fast-NMOS and Slow-PMOS (FS) and Slow-NMOS and Fast-PMOS (SF) are the worst process corners for read and write operations, respectively [2]. Accordingly, the failure probability of an SRAM cell, P_F, can be expressed as

$$P_F = P_{RF}(FS) + P_{WF}(SF) \qquad (1)$$

where P_RF(FS) and P_WF(SF) are the read failure probability in the FS corner and the write failure probability in the SF corner, respectively. The read failure probability at the FS corner is much larger than the write failure probability at the SF corner [2], that is,

$$P_{RF}(FS) \gg P_{WF}(SF) \qquad (2)$$

Therefore, the worst-case SRAM cell failures happen in the read operation in the FS process corner. Fig. 2 shows the read SNM in the FS corner. We can see that the SNM decreases from 0.098 V to 0.029 V under process variation.
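As a brief worked step, and assuming that (1) combines the read and write failure probabilities additively, relation (2) allows the write term to be neglected, so the cell failure probability is governed by the FS-corner read failure. The second line works out the relative read SNM loss implied by the two values reported for Fig. 2; it is simple arithmetic on those values, not an additional result.

```latex
\begin{align}
P_F &= P_{RF}(FS) + P_{WF}(SF) \approx P_{RF}(FS)
      && \text{since } P_{RF}(FS) \gg P_{WF}(SF), \\
\frac{\Delta \mathrm{SNM}}{\mathrm{SNM}} &= \frac{0.098 - 0.029}{0.098} \approx 70\%
      && \text{(read SNM loss in the FS corner).}
\end{align}
```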

C. Impact of aging effect

With CMOS technology scaling, the negative bias temperature instability (NBTI) effect is another important factor that results in large memory failures [7, 8]. In an SRAM cell, the negatively biased pull-up PMOS transistor generates considerable interface traps, leading to an increase in the V_th of the PMOS. This NBTI-induced V_th increase degrades the read stability of SRAM cells. In our analysis, we used a predictive model to calculate the V_th shift due to the NBTI effect after seven years, which is the typical lifetime of modern processors [7]. The supply voltage is 1 V and the temperature is 110 °C. Since the NBTI effect only occurs when 0 is applied to the gate of the PMOS, the V_th shift due to NBTI depends on the probability that Q is zero. We assume this zero probability is 0.5. As shown in Fig. 2, considering the NBTI aging effect, the read SNM is reduced from 0.098 V to 0.084 V. Therefore, with the aging effect, SRAM is more vulnerable to failures.

Figure 2: Impact of process variation and NBTI on read SNM of embedded memory.

Figure 3: MPEG-4 decoder processor. Blocks: video streams, buffer, entropy decoder, IQ&IT, motion compensator, embedded SRAM with spatial voltage scaling (SVS), off-chip SDRAM buffer.

III. EMBEDDED SRAM DESIGN FOR MPEG-4 DECODER

Fig. 3 shows the general block diagram of the MPEG-4 decoder. The decoding process involves three frame types: I (intra-coded) frames, P (predicted) frames, and B (bidirectionally predicted) frames. An I frame is intra-coded and serves as the reference frame of a P frame, while a B frame is obtained from both P and I frames. The MPEG decoding process is as follows. By performing entropy decoding, inverse quantization (IQ), and inverse transformation (IT), the residual error of the P/B frames is reconstructed from the compressed video streams. The motion compensator uses the previously reconstructed frames stored in the memory and the transmitted motion vectors (MVs) to construct new frames.
Therefore, except for the first frame, all frames are derived from their previous frames. The final reconstructed frame is obtained by combining the motion-compensated frame with the residual error [2]. In this process, all of the reconstructed frames have to be stored. To reduce the implementation cost, external synchronous DRAM (SDRAM) is used instead of a large on-chip SRAM. In the state of the art, on-chip SRAM is usually below 25 Kbits for low power decoders. Therefore, the previously decoded frame is stored in the on-chip SRAM, while the whole decoded frame data is sent to the off-chip SDRAM. The stored frame data is fetched from the off-chip SDRAM when needed. Then, the incoming MVs are added to the previous frame data in SRAM to obtain the present frame. This present frame in SRAM acts as the previous frame for the next set of MVs for the following frame. The decoding process continues in this way to produce all the decoded frames, which are combined to give the final decoded video output.

In our analysis, we use peak signal-to-noise ratio (PSNR) as the frame quality metric [4]:

$$\mathrm{PSNR} = 20 \log_{10} \left( \frac{255}{\sqrt{\mathrm{MSE}}} \right) \qquad (3)$$

where MSE is the mean square error between the original video (Org) and the degraded video (Deg), as expressed in (4):

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ \mathrm{Org}(i,j) - \mathrm{Deg}(i,j) \right]^2 \qquad (4)$$

IV. VARIATION-AGING AWARE LOW POWER SRAM DESIGN

Because of its highly frequent accesses, the embedded SRAM consumes a large amount of power and is the dominant contributor to the power of the whole MPEG-4 decoder [2]. Accordingly, to reduce the power consumption, the voltage of the on-chip SRAM is usually over-scaled in MPEG-4 decoders. However, the scaled V_dd degrades the output quality significantly because errors propagate through the frame reconstruction process. Fig. 4 shows the PSNR as a function of n, where n is the number of non-failed bits in the high-order positions. If n = 1, all bits except the highest-order bit fail; if n = 8, none of the eight bits fail. PSNR decreases significantly as n becomes smaller: when n is 5, the PSNR degradation is only 5.6%, but it increases to 20% when n is 3. This is due to the larger contribution of the HOBs to the frame quality. Therefore, SVS is an effective approach for low power embedded SRAM design: in SRAM arrays, HOBs are stored in cells at high voltage to maintain the video quality, and LOBs are stored in cells at low voltage to reduce power consumption. With SVS, cell failures are confined to the LOBs, so acceptable video quality is maintained.

Figure 4: PSNR (dB) vs. n. Annotated PSNR degradation: 63.6% at n = 1, 5.9% at n = 4, 1.6% at n = 5.

To achieve higher power savings, we propose a novel memory design based on SVS: the n HOBs are stored in cells with a high V_dd (Vdd_Hi) to enhance their reliability, and the LOBs are stored in cells with a low V_dd (Vdd_Lo). Different from SVS, we apply an optimal high voltage instead of the normal voltage to improve the power efficiency.

As mentioned before, SNM has been the standard way of analyzing SRAM cell stability. However, the read operation is not a static process. It also depends strongly on the bit-line capacitance and the word-line period [9].
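To make the dependence on bit-line capacitance and word-line period concrete, the sketch below uses a first-order constant-current discharge approximation, ΔV ≈ I_read · T_WL / C_BL, and compares the result with the 100 mV sensing threshold from Section II-A. The approximation, the function names, and every numeric value in the usage lines are illustrative assumptions; they are not the simulation setup or data of this paper.

```python
SENSE_MARGIN_V = 0.100  # minimum bit-line difference from Section II-A (100 mV)


def bitline_swing(i_read_a, t_wl_s, c_bl_f, vdd_v=1.0):
    """First-order bit-line voltage difference developed during a read.

    Assumes a constant cell read current (an idealisation). The swing is
    capped at the supply because a bit-line cannot discharge below ground.
    """
    if c_bl_f <= 0:
        raise ValueError("bit-line capacitance must be positive")
    return min(i_read_a * t_wl_s / c_bl_f, vdd_v)


def read_fails(delta_v, margin_v=SENSE_MARGIN_V):
    """A read fails when the bit-line difference is below the sensing margin."""
    return delta_v < margin_v


# Hypothetical operating points (placeholder numbers, not measured data):
# same word-line period and bit-line capacitance, different read currents.
strong_cell = bitline_swing(i_read_a=20e-6, t_wl_s=1e-9, c_bl_f=100e-15)
weak_cell = bitline_swing(i_read_a=5e-6, t_wl_s=1e-9, c_bl_f=100e-15)

print("strong cell read fails:", read_fails(strong_cell))
print("weak cell read fails:", read_fails(weak_cell))
```

In this simplified model, a longer word-line period or a smaller bit-line capacitance increases the developed swing, which is one way to see why a purely static metric does not capture read behaviour.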
SNM assumes an infinite word-line period and therefore overestimates the read failure. Fig. 5 compares the simulation results based on SNM and on the dynamic approach. Based on the SNM simulation, Vdd_Hi should be equal to V_dd, as adopted in SVS. However, based on the dynamic simulation approach, when Vdd_Hi is 0.6 V, the voltage difference between the two bit-lines is 100 mV. If Vdd_Hi is scaled further, the bit-line voltage difference will be less than 100 mV, which results in a read failure, as discussed in Section II-A. Accordingly, the optimal Vdd_Hi in the proposed design is 0.6 V.

Figure 5: Determining Vdd_Hi. (a) SNM based result; (b) Dynamic approach based result.

Another important design concern is the number of cells with Vdd_Hi (n). In high performance systems, reliability improves as n increases, but the power consumption increases at the same time. We determined n based on the PSNR degradation characteristics. As shown in Fig. 4, as n decreases from 4 to 3, the PSNR degradation increases from 5.9% to 20%. Therefore, to keep the frame quality without significant degradation (<6%), we select n = 4.

The schematic of the proposed SRAM is shown in Fig. 6. As in SVS, the largest challenge is to generate two different word-line voltages in a single array. We adopt the voltage configuration scheme of [5], which generates the different voltages using inverters. When an output of the decoder is zero, the memory array connected to the low word-line is enabled. By using two inverters supplied with Vdd_Hi and Vdd_Lo, the high word-line (WL_Hi) and low word-line (WL_Lo) signals are generated. Note that, since the low-to-high delay is more critical for obtaining fast word-line signals, large PMOS transistors are preferable in the inverters.

Figure 6: Schematic of proposed SRAM design. Inverters supplied with Vdd_Hi = 0.6 V and Vdd_Lo = 0.4 V drive WL_Hi and WL_Lo; bits 8 to 5 are the n = 4 HOBs and bits 4 to 1 are the (8-n) = 4 LOBs.

V. SIMULATION RESULT AND ANALYSIS

A. Simulation Result

We have used 50 frames of the standard gray scale FOOTBALL video sequence, which is in raster format. The frame size in our simulation is 352×240 pixels. In order to meet the performance requirements of high quality video formats (e.g. 1080p/720p), the target frequency of the different embedded memories is 500 MHz. We generated errors in the memory array for the worst case (considering both process variation and the aging effect) and then evaluated the quality of the output frames. Fig. 7 shows frames of the FOOTBALL video with different SRAM designs and their corresponding PSNR values. The memory design with V_dd = 0.4 V results in the largest video quality degradation, while our proposed design achieves a 14.37 dB PSNR improvement over the standard cell design with V_dd = 0.4 V. Another important observation is that the video quality of our proposed design is as good as that of the SVS design with Vdd_Hi = 1 V.

Fig. 8 shows the power consumption and leakage current improvement of our proposed technique over the standard SRAM design and the SVS design with Vdd_Hi = 1 V. Here, we estimated the write power, read power, and leakage current in the FS corner at 110 °C, with a word-line period of 20 ns. Significant power savings are achieved with our technique. As compared to the standard SRAM design, our technique achieves 85%, 90%, and 79% reduction in write power, read power, and leakage current, respectively. In addition, as compared to the SVS design with Vdd_Hi = 1 V, an additional 72% reduction in write power, 80% in read power, and 63% in leakage current is obtained at the same degradation level.
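The quality evaluation can be illustrated with a small sketch. It flips the (8 - n) low-order bits of every 8-bit pixel, in the spirit of the "LSBs flipped" labels in Fig. 7, and then evaluates MSE and PSNR following (3) and (4). The deterministic flip model, the function names, and the synthetic 16×16 ramp frame are assumptions made for illustration; the sketch does not use the FOOTBALL sequence or reproduce the PSNR values reported in this paper.

```python
import math


def mse(org, deg):
    """Mean square error between two equally sized 8-bit frames, as in (4)."""
    rows, cols = len(org), len(org[0])
    total = sum((org[i][j] - deg[i][j]) ** 2
                for i in range(rows) for j in range(cols))
    return total / (rows * cols)


def psnr(org, deg):
    """PSNR in dB for 8-bit pixels, as in (3); infinite for identical frames."""
    err = mse(org, deg)
    if err == 0:
        return math.inf
    return 20 * math.log10(255 / math.sqrt(err))


def flip_lobs(frame, n_hob):
    """Flip the (8 - n_hob) low-order bits of each pixel; HOBs stay intact."""
    if not 0 <= n_hob <= 8:
        raise ValueError("n_hob must be between 0 and 8")
    mask = (1 << (8 - n_hob)) - 1
    return [[pixel ^ mask for pixel in row] for row in frame]


# Synthetic test frame (an assumption): a ramp with each value 0..255 once.
frame = [[16 * r + c for c in range(16)] for r in range(16)]

for n in (8, 5, 4, 3, 1):
    degraded = flip_lobs(frame, n)
    print(f"n={n}: MSE={mse(frame, degraded):.1f}, "
          f"PSNR={psnr(frame, degraded):.2f} dB")
```

On this synthetic ramp the MSE is 85 for n = 4 and 5461 for n = 1, which shows how quickly quality falls once higher-order bits are allowed to fail. A random failure model driven by per-bit failure probabilities would be a natural refinement of the flip-everything assumption used here.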

Figure 7: Quality and PSNR of FOOTBALL video frame (frame 50) using different SRAM designs. Panels: input frame; normal design with Vdd = 1 V, PSNR = 25.47 dB; normal design with Vdd = 0.4 V (all 8 bits bad/flipped), PSNR = 9.06 dB; SVS design with Vdd_Hi = 1 V and Vdd_Lo = 0.4 V (only the 4 LSBs, bits 4:1, flipped), PSNR = 23.43 dB; our design with Vdd_Hi = 0.6 V and Vdd_Lo = 0.4 V (only the 4 LSBs, bits 4:1, flipped), PSNR = 23.43 dB.

Figure 8: Read power, write power, and leakage current improvement of our proposed design as compared to the standard design (Vdd_Hi = Vdd_Lo = 1 V) and the SVS design in [5] (Vdd_Hi = 1 V and Vdd_Lo = 0.4 V).

B. Area Overhead

Based on conservative MOSIS deep submicrometer design rules [10], we designed the layout of the proposed SRAM, as shown in Fig. 9. The areas of the inverters and of a cell are 900 × 360 nm² and 1316.25 × 360 nm², respectively. Therefore, the inserted inverters take up about 68% of a cell area. Accordingly, we can express the area overhead of the proposed memory design as:

$$\mathrm{Area\_Overhead} = \frac{68}{N}\,\% \qquad (5)$$

where N is the number of 6T cells in a word. Therefore, if N is 8, the area overhead of our technique is around 8%. If we design the embedded SRAM with 16 bits, the area overhead can be reduced to around 5%.

Figure 9: Layout design. Labels: Vdd_H, Vdd_L, inverters; annotated dimensions 11812.5 nm and 720 nm.

VI. CONCLUSION

A variation and aging effect aware low power embedded SRAM is presented for MPEG-4 video processors. Based on spatial voltage scaling, we determine the optimal voltage for the HOBs, thereby achieving additional power savings. Simulation results show that 85%, 90%, and 79% reduction in write power, read power, and leakage current can be obtained as compared to the standard SRAM design. At the same time, 72%, 80%, and 62% reduction in write power, read power, and leakage current can be achieved as compared to the existing SVS design.

REFERENCES

1. MPEG [Online]. Available: http://www.mpeg4.org
2. I. Chang, D. Mohapatra, and K. Roy, "A priority-based 6T/8T hybrid SRAM architecture for aggressive voltage scaling in video applications," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 2, pp. 101-112, Feb. 2011.
3. Masood Qazi, Mahmut E. Sinangil, and Anantha P. Chandrakasan, "Challenges and Directions for Low-Voltage SRAM," IEEE Design & Test of Computers, vol. 28, no. 1, pp. 32-43, Jan. 2011.
4. Jinmo Kwon, Insoo Lee, and Jongsun Park, "Heterogeneous SRAM Cell Sizing for Low Power H.264 Applications," IEEE Transactions on Circuits and Systems I (TCAS I), vol. 99, no. 2, pp. 1-10, Feb. 2012.
5. Minki Cho, Jason Schlessman, Wayne Wolf, and Saibal Mukhopadhyay, "Reconfigurable SRAM Architecture With Spatial Voltage Scaling for Low Power Mobile Multimedia Applications," IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 1, pp. 161-165, Jan. 2011.
6. FreePDK45. Available: http://www.eda.ncsu.edu/wiki/freepdk45
7. Fahad Ahmed and Linda Milor, "NBTI Resistant SRAM Design," in Proc. IWASI, 2011, pp. 82-87.
8. Sang Park, Kaushik Roy, and Kunhyuk Kang, "Reliability Implications of Bias-Temperature Instability in Digital ICs," IEEE Design & Test of Computers, pp. 8-17, Dec. 2009.
9. Jiajing Wang and Benton H. Calhoun, "Minimum Supply Voltage and Yield Estimation for Large SRAMs Under Parametric Variations," IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 1, pp. 2120-2125, Jan. 2011.
10. MOSIS deep design rules. http://www.mosis.com