Reconfigurable Architectures. Greg Stitt ECE Department University of Florida

How can hardware be reconfigurable?
Problem: a fabricated chip can't be changed; ASICs are fixed.
Solution: create components that can be made to function in different ways.

History: SPLDs
SPLD: Simple Programmable Logic Device. Examples: PAL (programmable array logic) and PLA (programmable logic array).
Basically a 2-level grid of AND and OR gates, with programmable connections between the gates.
Initially programmed using fuses/PROM, so a device could only be programmed once!
GAL (generic array logic) could be reprogrammed using EPROM/EEPROM, but reprogramming took a long time.
Implements hundreds of gates, at most. [Wikipedia]

History: CPLDs
CPLD: Complex Programmable Logic Device.
Initially, a group of SPLDs on a single chip.
More recent CPLDs combine macrocells/logic blocks. Macrocells can implement array logic or other common combinational and sequential logic functions. [Xilinx]

Current/Future Directions
FPGAs (field-programmable gate arrays), mid 1980s.
Misleading name: there is no array of gates. Instead, an FPGA is an array of fine-grained configurable components (architecture discussed shortly).
FPGAs currently support millions of gates.
Coarse-grained RC architectures: an array of coarse-grained components (multipliers, DSP units, etc.).
Potentially larger capacity than an FPGA, but applications may not map well, leading to wasted resources and inefficient execution.

FPGA Architectures
How can we implement any circuit in an FPGA? First, focus on combinational logic.
Example: half adder. Combinational logic can be represented by a truth table.
What kind of hardware can implement a truth table?
  A B | S        A B | C
  0 0 | 0        0 0 | 0
  0 1 | 1        0 1 | 0
  1 0 | 1        1 0 | 0
  1 1 | 0        1 1 | 1

Look-up tables (LUTs)
Implement the truth table in small memories (LUTs), usually SRAM.
Logic inputs connect to the address inputs; the logic output is the memory output.
[Figure: two 2-input, 1-output LUTs. Inputs A and B drive Addr on both; one LUT outputs S and the other outputs C.]

Look-up tables (LUTs)
Alternatively, we could have used a single 2-input, 2-output LUT, since outputs commonly use the same inputs.
[Figure: the two 1-output LUTs for S and C merged into one 2-input, 2-output LUT.]

Look-up tables (LUTs)
Slightly bigger example: full adder.
Combinational logic can be implemented in a LUT with the same number of inputs and outputs, here a 3-input, 2-output LUT.
  A B Cin | S Cout
  0 0  0  | 0  0
  0 0  1  | 1  0
  0 1  0  | 1  0
  0 1  1  | 0  1
  1 0  0  | 1  0
  1 0  1  | 0  1
  1 1  0  | 0  1
  1 1  1  | 1  1
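The idea of a LUT as a small memory addressed by the logic inputs can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the class name LUT and the helper full_adder are hypothetical, and the first input is arbitrarily treated as the most significant address bit.

```python
from itertools import product


class LUT:
    """A look-up table: a tiny memory whose address is formed by the logic inputs."""

    def __init__(self, num_inputs, num_outputs, func):
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        # Fill the memory by evaluating func for every input combination,
        # in address order (first input = most significant address bit).
        self.memory = [tuple(func(*bits))
                       for bits in product((0, 1), repeat=num_inputs)]

    def read(self, *bits):
        """Form the address from the input bits and return the stored outputs."""
        addr = 0
        for b in bits:
            addr = (addr << 1) | b
        return self.memory[addr]


def full_adder(a, b, cin):
    """Reference behavior used only to fill the table: returns (S, Cout)."""
    total = a + b + cin
    return (total & 1, total >> 1)


# A 3-input, 2-output LUT holding the full adder truth table.
full_adder_lut = LUT(3, 2, full_adder)
print(full_adder_lut.read(1, 1, 0))  # S and Cout for A=1, B=1, Cin=0
```

Once the memory is filled, the LUT never evaluates any gates; it only performs a memory read.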

Look-up tables (LUTs)
Why aren't FPGAs just one big LUT?
The size of a truth table grows exponentially with the number of inputs: 3 inputs = 8 rows, 4 inputs = 16 rows, 5 inputs = 32 rows, etc.
A LUT has the same number of rows as its truth table, so LUTs also grow exponentially with the number of inputs.
Number of SRAM bits in a LUT = 2^i * o, where i = # of inputs and o = # of outputs.
Example: 64-input combinational logic with 1 output would require 2^64 SRAM bits, about 1.84 x 10^19.
Clearly, it is not feasible to use large LUTs. So how do FPGAs implement logic with many inputs?
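The growth in LUT size can be checked directly. The short sketch below evaluates the 2^i * o formula; the function name lut_sram_bits is a hypothetical helper, not part of any tool.

```python
def lut_sram_bits(num_inputs: int, num_outputs: int) -> int:
    """SRAM bits in a LUT: one row per input combination, one bit per output."""
    return (2 ** num_inputs) * num_outputs


# Rows double with every added input.
for i in (3, 4, 5, 64):
    print(f"{i}-input, 1-output LUT: {lut_sram_bits(i, 1)} SRAM bits")
```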

Look-up tables (LUTs)
Fortunately, we can map circuits onto multiple LUTs.
Divide the circuit into smaller circuits that each fit in a LUT (no more inputs and outputs than the LUT provides).
Example: 3-input, 2-output LUTs.

Look-up tables (LUTs)
What if the circuit doesn't map perfectly?
More inputs in the LUT than in the circuit: the truth table handles this problem; unused inputs are ignored.
More outputs in the LUT than in the circuit: the extra outputs are simply not used. Space is wasted, so multiple outputs should be used whenever possible.

Look-up tables (LUTs)
Important point: the number of gates in a circuit has no effect on the mapping into a LUT. All that matters is the number of inputs and outputs.
Unfortunately, it isn't common to see large circuits with only a few inputs.
[Figure: a 1-gate circuit and a 1,000,000-gate circuit.] Both of these circuits can be implemented in a single 3-input, 1-output LUT.

Sequential Logic
Problem: how to handle sequential logic? Truth tables don't work.
Possible solution: add a flip-flop (FF) to each output of the LUT.
[Figure: FFs on the outputs of a 3-in, 1-out LUT, a 3-in, 2-out LUT, etc.]

Sequential Logic
Example: 8-bit register using 3-input, 2-output LUTs. Input: x, output: y.
[Figure: four 3-in, 2-out LUTs with FFs; each LUT takes two bits of x(7)..x(0) and drives the corresponding two bits of y(7)..y(0).]
What does each LUT need to do to implement the register?

Sequential Logic
Example, cont.: the LUT simply passes each input to the appropriate output, so y(1) = x(1) and y(0) = x(0).
Corresponding truth table, which is also the LUT contents (the third LUT input is unused):
  x(1) x(0) | y(1) y(0)
   0    0   |  0    0
   0    1   |  0    1
   1    0   |  1    0
   1    1   |  1    1

Sequential Logic
Isn't it a waste to use LUTs for registers? YES! (when the LUT could be used for something else)
Registers are commonly used for pipelined circuits.
Example: pipelined adder. The adder and its output register are combined in the same LUTs and FFs, not a separate LUT for each.

Sequential Logic
Existing FPGAs don't have a flip-flop hard-wired to the LUT outputs. Why not?
The flip-flop would have to be used! Pure combinational logic would be impossible, and the FF adds latency to the circuit.
Actual solution: Configurable Logic Blocks (CLBs).

Configurable Logic Blocks (CLBs)
CLBs are the basic FPGA functional unit.
First issue: how to make the flip-flop optional? Simplest way: use a mux, so the circuit can use the output from the LUT or from the FF.
Where does the mux select come from? (answered shortly)
[Figure: 3-in, 1-out LUT feeding an FF, with a 2x1 mux choosing between the LUT output and the FF output.]
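A minimal behavioral sketch of this optional flip-flop is shown below. The class CLB, its attribute names, and the convention that the first input is the most significant address bit are all assumptions made for illustration; real CLBs contain considerably more logic.

```python
class CLB:
    """Toy CLB: one 3-in, 1-out LUT, one flip-flop, and a 2x1 output mux."""

    def __init__(self, lut_bits, use_ff):
        assert len(lut_bits) == 8      # 2^3 rows for a 3-input LUT
        self.lut_bits = list(lut_bits)
        self.use_ff = use_ff           # mux select: a stored configuration bit
        self.ff = 0                    # flip-flop state

    def lut_out(self, a, b, c):
        # The logic inputs form the LUT address.
        return self.lut_bits[(a << 2) | (b << 1) | c]

    def clock(self, a, b, c):
        """Rising clock edge: the FF captures the current LUT output."""
        self.ff = self.lut_out(a, b, c)

    def output(self, a, b, c):
        """The mux picks the registered or the purely combinational output."""
        return self.ff if self.use_ff else self.lut_out(a, b, c)


XOR3 = [0, 1, 1, 0, 1, 0, 0, 1]          # 3-input XOR truth table
combinational = CLB(XOR3, use_ff=False)  # FF bypassed
registered = CLB(XOR3, use_ff=True)      # output comes from the FF
```

With use_ff set to False the block behaves as pure combinational logic with no added latency; with it set to True the output only changes after a clock edge.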

Configurable Logic Blocks (CLBs)
CLBs usually contain more than 1 LUT. Why?
It is an efficient way of handling common I/O between adjacent LUTs, and it saves routing resources (not discussed yet).
[Figure: CLB with two 3-in, 2-out LUTs, FFs, and 2x1 muxes.]

Configurable Logic Blocks (CLBs)
Example: ripple-carry adder.
Each LUT implements a full adder, and efficient connections between the LUTs are used for the carry signals.
[Figure: two 3-in, 2-out LUTs in one CLB. The first takes A(0), B(0), Cin(0) and produces S(0), Cout(0); Cout(0) feeds Cin(1) of the second, which takes A(1), B(1) and produces S(1), Cout(1).]
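The ripple-carry mapping can be sketched by chaining full adder look-ups, one per bit position. The names FULL_ADDER_LUT and ripple_carry_add are hypothetical, and the address ordering (A, B, Cin from most to least significant bit) is an assumption of this sketch.

```python
# Full adder truth table stored as LUT contents.
# Address = (A << 2) | (B << 1) | Cin, entry = (S, Cout).
FULL_ADDER_LUT = [
    (0, 0), (1, 0), (1, 0), (0, 1),
    (1, 0), (0, 1), (0, 1), (1, 1),
]


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (least significant bit first).

    Each iteration is one LUT look-up; the carry out of one LUT is wired
    to the carry in of the next, as in a ripple-carry adder.
    """
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = FULL_ADDER_LUT[(a << 2) | (b << 1) | carry]
        sum_bits.append(s)
    return sum_bits, carry


# 3 + 1 with 2-bit operands, LSB first.
print(ripple_carry_add([1, 1], [1, 0]))
```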

Configurable Logic Blocks (CLBs)
CLBs often have specialized connections between adjacent CLBs, which further improves carry chains and avoids using general routing resources.
Some commercial CLBs are even more complex.
Xilinx Virtex 4 CLB consists of 4 slices; 1 slice = 2 LUTs + 2 FFs + other stuff, so 1 Virtex 4 CLB = 8 LUTs.
Altera devices have LABs (Logic Array Blocks), which consist of 16 LEs (logic elements) that each have a 4-input LUT.

What Else?
The basic building block is the CLB, which can implement combinational + sequential logic.
All circuits consist of combinational and sequential logic. So what else is needed?

Reconfigurable Interconnect
FPGAs need some way of connecting CLBs together: reconfigurable interconnect.
But we can only put fixed wires on a chip. Problem: how to make reconfigurable connections with fixed wires?
Main challenge: the interconnect should be flexible enough to support almost any circuit.

Reconfigurable Interconnect
Problem 2: if the FPGA doesn't know which CLBs will be connected, where does it put the wires?
Solution: put wires everywhere! These are referred to as channel wires, routing channels, routing tracks, and many other names.
CLBs are typically arranged in a grid, with wires on all sides.

Reconfigurable Interconnect
Problem 3: how to connect a CLB to the wires?
Solution: the connection box, a device that allows the inputs and outputs of a CLB to connect to different wires.

Reconfigurable Interconnect
Connection box characteristics: flexibility, the number of wires a CLB input/output can connect to.
[Figure: connection boxes with flexibility = 2 and flexibility = 3; dots represent possible connections.]

Reconfigurable Interconnect
Connection box characteristics: topology, which defines the specific wires each CLB I/O can connect to.
[Figure: examples with the same flexibility but different topology; dots represent possible connections.]

Reconfigurable Interconnect
Connection boxes allow CLBs to connect to routing wires. But that only allows us to move signals along a single wire, which is not very useful.
Problem 4: how do FPGAs connect wires together?

Reconfigurable Interconnect
Solution: switch boxes (switch matrices), which connect horizontal and vertical routing channels.

Reconfigurable Interconnect
Switch boxes:
Flexibility defines how many wires a single wire can connect to.
Topology defines which wires can be connected.
Planar/subset switch box: only connects tracks with the same id/offset (e.g. 0 to 0, 1 to 1, etc.).
Wilton switch box: connects tracks with different offsets.
[Figure: planar and Wilton switch boxes with numbered tracks; not all possible connections shown.]

Reconfigurable Interconnect
Why do flexibility and topology matter?
Routability: a measure of the number of circuits that can be routed. Higher flexibility = better routability; Wilton switch box topology = better routability.
[Figure: src/dest routing example in which there is no possible route from src to dest.]

Reconfigurable Interconnect
Switch boxes:
Short channels are useful for connecting adjacent CLBs.
Long channels are useful for connecting CLBs that are separated, and allow reduced routing delay for non-adjacent CLBs.
[Figure: short channel and long channel.]

FPGA Fabrics
The FPGA layout is called a fabric: a 2-dimensional array of CLBs and programmable interconnect, sometimes referred to as an island-style architecture.
It can implement any circuit. But should the fabric include something else?

FPGA Fabrics
What about memory? We could use the FFs in CLBs to create a memory.
Example: create a 1 MB memory from CLBs with a single 3-input, 2-output LUT.
Each CLB = 2 bits of memory (because of the 2 outputs).
Total CLBs = (1 MB * 8 bits/byte) / 2 bits/CLB = 4 million CLBs!!!!
FPGAs commonly have tens of thousands of LUTs; large devices have 100-200k LUTs; state-of-the-art devices have ~800k LUTs.
Even if FPGAs were large enough, using a whole chip to implement 1 MB of memory is not smart.
Conclusion: bad idea!! Huge waste of resources!
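The CLB count follows from simple arithmetic, reproduced in the sketch below. The helper name clbs_for_memory is hypothetical, and 1 MB is assumed here to mean 2^20 bytes.

```python
def clbs_for_memory(num_bytes, bits_per_clb=2):
    """CLBs needed to hold num_bytes if each CLB stores bits_per_clb bits."""
    total_bits = num_bytes * 8
    return -(-total_bits // bits_per_clb)  # ceiling division


ONE_MB = 2 ** 20  # assumption: 1 MB = 2^20 bytes
print(clbs_for_memory(ONE_MB))  # roughly 4 million CLBs
```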

FPGA Memory Components
Solution 1: use LUTs for logic or memory. LUTs are small SRAMs, so why not use them as memory? Xilinx refers to this as distributed RAM.
Solution 2: include dedicated RAM components in the FPGA fabric. Xilinx refers to these as Block RAM.
Block RAM can be single- or dual-ported, can be combined into arbitrary sizes, can be used as a FIFO, and supports different clock speeds for reads and writes.
Altera has Memory Blocks. M4K: 4k bits of RAM. Others: M9K, M20K, M144K.

FPGA Memory Components
Fabric with Block RAM: Block RAM can be placed anywhere, but is typically placed in columns of the fabric.
[Figure: grid of CLBs with columns of Block RAM (BR).]

DSP Components
FPGAs are commonly used for DSP applications, so it makes sense to include custom DSP units instead of mapping onto LUTs. A custom unit is faster and smaller.
Example: Xilinx DSP48. Includes multipliers, adders, subtractors, etc.: 18x18 multiplication, 48-bit addition/subtraction.
Provides an efficient way of implementing add/subtract/multiply, MAC (multiply-accumulate), barrel shifter, FIR filter, square root, etc.
Altera devices have multiplier blocks, which can be configured as one 18x18 or 2 separate 9x9 multipliers.

Existing Fabrics
Existing FPGAs are 2-dimensional arrays of CLBs, DSP, Block RAM, and programmable interconnect. The actual layout/placement differs between FPGAs.
[Figure: fabric of CLBs with columns of Block RAM (BR) and DSP units.]

Programming FPGAs
How to program/configure an FPGA to implement a circuit?
So far, we've mapped a circuit onto the FPGA fabric. This is known as technology mapping: the process of converting a circuit in one representation into a representation that corresponds to physical components (gates to LUTs, memory to Block RAMs, multiplications to DSP48s, etc.).
But we still need some way of configuring each component to behave as desired. Examples: how to store truth tables in LUTs? How to connect wires in switch boxes?

Programming FPGAs
General idea: include FFs in the fabric to control programmable components.
Example: we need a way to specify the select for the CLB's mux. The FPGA can be programmed to use or skip the flip-flop by storing the appropriate select bit for the mux.
[Figure: 3-in, 1-out LUT and FF feeding a 2x1 mux whose select comes from a control FF.]

Programming FPGAs
Example 2: connection/switch boxes. FFs are needed to specify the connections.

Programming FPGAs
FPGAs are programmed with a bitfile: a file containing all the information needed to program the FPGA. It contains the bits for each control FF and also the bits to fill the LUTs.
But how do you get the bitfile into the FPGA? There are > 100k LUTs but only a small number of pins.

Programming FPGAs
Solution: shift registers.
General idea: make one huge shift register out of all the programmable components (LUTs, control FFs). The configuration bits are input at one end and the bitfile is shifted in one bit at a time; the shift register moves the bits to the appropriate locations in the FPGA.

Programming FPGAs
Example: program a CLB with a 3-input, 1-output LUT to implement the sum output of a full adder.
  A B Cin | S
  0 0  0  | 0
  0 0  1  | 1
  0 1  0  | 1
  0 1  1  | 0
  1 0  0  | 1
  1 0  1  | 0
  1 1  0  | 0
  1 1  1  | 1
[Figure: the CLB's shift register, the direction in which data is shifted in, and the contents it should hold after programming.]

Programming FPGAs
Example, cont.: the bitfile is just a sequence of bits, ordered according to the shift register.
[Figure: contents of the CLB during programming and after programming.]

Programming FPGAs
Example, cont.: once all the bits have been shifted in, the CLB is programmed to implement the full adder! This is easily extended to program the entire FPGA.
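The shift-register programming scheme can be simulated in a few lines. This is a sketch under stated assumptions: the chain layout (eight LUT bits in address order followed by one mux-select bit) and the names shift_in, DESIRED_CONFIG, and bitfile are hypothetical and do not describe any real device's bitstream format.

```python
def shift_in(chain_length, bitfile):
    """Shift a bitfile into a configuration chain, one bit per clock."""
    chain = [0] * chain_length
    for bit in bitfile:
        # The new bit enters at position 0; every stored bit moves one
        # position further down the chain and the last bit falls off.
        chain = [bit] + chain[:-1]
    return chain


# Assumed chain layout: positions 0..7 hold the LUT contents for the sum
# output of a full adder (address = A, B, Cin), position 8 holds the mux
# select (0 = use the LUT output directly, skipping the flip-flop).
DESIRED_CONFIG = [0, 1, 1, 0, 1, 0, 0, 1, 0]

# The first bit shifted in travels furthest, so the bitfile lists the
# configuration in the reverse of the chain order.
bitfile = list(reversed(DESIRED_CONFIG))

programmed = shift_in(len(DESIRED_CONFIG), bitfile)
print(programmed == DESIRED_CONFIG)
```

Only one data pin is needed regardless of chain length, which is why the approach scales to an entire FPGA, at the cost of one clock cycle per configuration bit.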

Programming FPGAs
Problem: reconfiguring an FPGA is slow. Shifting in 1 bit at a time is not efficient, and bitfiles can be larger than 1 MB. This eliminates one of the main advantages of RC.
Partial reconfiguration: with shift registers, the entire FPGA has to be reconfigured. Solutions? Virtex II allows columns to be reconfigured; Virtex IV allows custom regions to be reconfigured.
This requires a lot of user effort; better tools are needed.

FPGA Architecture Tradeoffs
LUTs with many inputs can implement large circuits efficiently. Why not just use LUTs with many inputs?
High flexibility in routing resources improves routability. Why not just allow all possible connections?
Answer: architectural tradeoffs. Any time one component is increased or improved, there is less area for other components.
Larger LUTs => fewer total LUTs, fewer routing resources.
More Block RAM => fewer LUTs, fewer DSPs.
More DSPs => fewer LUTs, less Block RAM.

FPGA Architecture Tradeoffs
Example: determine the best LUTs for the following circuit. [Figure: example circuit.]
Choices: 4-input, 2-output LUT (delay = 2 ns) or 5-input, 2-output LUT (delay = 3 ns).
Assume each SRAM cell is 6 transistors:
4-input LUT = 6 * 2^4 * 2 = 192 transistors
5-input LUT = 6 * 2^5 * 2 = 384 transistors
With 5-input LUTs: propagation delay = 6 ns, total transistors = 384 * 2 = 768.
With 4-input LUTs: propagation delay = 4 ns, total transistors = 192 * 2 = 384.
4-input LUTs are 1.5x faster and use 1/2 the area.

FPGA Architecture Tradeoffs
Example 2: determine the best LUTs for the following circuit, with the same LUT choices and transistor counts. [Figure: second example circuit.]
With a 5-input LUT: propagation delay = 3 ns, total transistors = 384.
With 4-input LUTs: propagation delay = 4 ns, total transistors = 384.
5-input LUTs are 1.3x faster and use the same area.
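The delay and area figures in both examples can be reproduced with a small calculation. The circuits themselves are not available here, so the LUT counts passed in below are assumptions inferred from the stated totals (two LUTs in series wherever the delay is twice the single-LUT delay); the function names are hypothetical.

```python
TRANSISTORS_PER_SRAM_CELL = 6


def lut_transistors(num_inputs, num_outputs):
    """Transistors in one LUT: 6 per SRAM cell, 2^i * o cells."""
    return TRANSISTORS_PER_SRAM_CELL * (2 ** num_inputs) * num_outputs


def mapping_cost(num_inputs, num_outputs, lut_delay_ns, luts_used, luts_in_series):
    """Return (propagation delay in ns, total transistors) for one mapping."""
    delay = lut_delay_ns * luts_in_series
    area = lut_transistors(num_inputs, num_outputs) * luts_used
    return delay, area


# Example 1 (assumed: 2 LUTs in series for either LUT size).
ex1_4in = mapping_cost(4, 2, lut_delay_ns=2, luts_used=2, luts_in_series=2)
ex1_5in = mapping_cost(5, 2, lut_delay_ns=3, luts_used=2, luts_in_series=2)

# Example 2 (assumed: one 5-input LUT, or two 4-input LUTs in series).
ex2_4in = mapping_cost(4, 2, lut_delay_ns=2, luts_used=2, luts_in_series=2)
ex2_5in = mapping_cost(5, 2, lut_delay_ns=3, luts_used=1, luts_in_series=1)

print(ex1_4in, ex1_5in, ex2_4in, ex2_5in)
```

The same LUT size wins or loses depending on how well the circuit fills it, which is the point of the two examples.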

FPGA Architecture Tradeoffs
Large LUTs are fast when all inputs are used, but waste transistors otherwise.
Total chip area must also be considered: wasting transistors may be OK if there are plenty of LUTs.
Virtex V uses 6-input LUTs; Virtex IV uses 4-input LUTs.

FPGA Architecture Tradeoffs
How to design an FPGA fabric? There is no overall best fabric; design the fabric for different domains.
DSP will require many DSP units; HPC may require a balance of units; embedded systems may require microprocessors.
Example: Xilinx Virtex IV comes in three different devices:
LX: designed for logic-intensive apps
SX: designed for signal processing apps
FX: designed for embedded systems apps; has 450 MHz PowerPC cores embedded in the fabric