FSM Implementations
TIE-50206 Logic Synthesis
Arto Perttula
Tampere University of Technology
Fall 2016
[Figure: Moore machine block diagram (Input, Next State, Current state, Output)]

Acknowledgements
- Prof. Pong P. Chu provided official slides for the book, which is gratefully acknowledged
  - See also: http://academic.csuohio.edu/chu_p/
- Most slides were originally made by Ari Kulmala
  - and other previous lecturers (Teemu Pitkänen, Konsta Punkka, Mikko Alho, Erno Salminen)
- M. Perkowski, ECE 590. DIGITAL SYSTEM DESIGN USING HARDWARE DESCRIPTION LANGUAGES, Portland State University, June 2008, http://web.cecs.pdx.edu/~mperkows/class_vhdl_99/june2008/
Arto Perttula 15.11.2016

FINITE STATE MACHINES
All the previous teachings are still valid; only the description style changes.
[Figure: example state diagram with states Stop (y=z1), Play (y=z2), Play x2 (y=z3), Rewplay x2 (y=z4), Next_track (y=z5) and Prev_track (y=z6); arcs are labelled with input values such as in=pl, in=st, in=x2, in=-x2, in=next, in=prev, in=others and always]

Finite-State Machine (FSM)
- The application defines the set of correct state machines
- State machines are usually the hardest part of logic design
  - You must be extra careful that actions happen on the correct clock cycle!
- Two basic flavors: Mealy and Moore
  - In both cases, one must select whether to include the output registers or not
- Moreover, you decide the VHDL presentation of the FSM
  - Description style: how many processes
  - Encoding of states, if not automated in synthesis
[Figure: block diagrams of the Moore and Mealy machines (Input, Next State, Current state, Output)]

FSM Implementation in VHDL
- General form: we define our own type for the state machine states
  - ALWAYS use an enumeration type for state machine states
  - Synthesis software, e.g., Quartus II, does not recognize the FSM otherwise
- Sometimes the next state (NS) is a separate VHDL signal, but not always

    architecture rtl of traffic_light is
      -- Enumeration type for states
      type states_type is (red, yellow, green);
      -- Init state is explicitly defined in reset, not here.
      -- This signal is the register for the current state.
      signal present_state_r : states_type;
      ...
    begin -- rtl

At Least 5 Implementation Styles
1. 1 sequential process
2. 2 processes
   a) Seq: curr. state registers and output, Comb: next state logic
   b) Seq: curr. state, Comb: next state, output
   c) Seq: next and curr. state, Comb: output
3. 3 processes (Seq: curr. state, Comb: output, Comb: next state logic separated)
[Figure: block diagrams of the Moore and Mealy machines (Input, Next State, Current state, Output)]

Coding Style: 1seg-Moore/Reg-Mealy
- One-segment coding style uses 1 sequential process

    sync_all : process (clk, rst_n)
    begin
      if rst_n = '0' then
        <INIT STATE and OUTPUT OF THE FSM>
      elsif clk'event and clk = '1' then
        <Define new value of curr state>
        <Define output. All these outputs become registers!>
      end if;
    end process sync_all;

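The one-segment template could be filled in for the traffic light roughly as follows. This is a minimal sketch, not the course's reference code: the entity name traffic_light_1seg, the port names req_green and req_red, and the transition rules are assumptions, and the yellow timer is left out. The request inputs are assumed to be held until the light has changed. Because r_y_g is assigned inside the clocked process, it becomes a register that changes together with curr_state_r (registered Mealy).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: names and behavior are assumptions for illustration.
entity traffic_light_1seg is
  port (
    clk       : in  std_logic;
    rst_n     : in  std_logic;                      -- active-low async reset
    req_green : in  std_logic;                      -- held until light is green
    req_red   : in  std_logic;                      -- held until light is red
    r_y_g     : out std_logic_vector(2 downto 0));  -- one-hot: red, yellow, green
end entity traffic_light_1seg;

architecture rtl of traffic_light_1seg is
  type states_type is (red, yellow, green);
  signal curr_state_r : states_type;
begin

  -- Single clocked process: state and output are both registers
  sync_all : process (clk, rst_n)
  begin
    if rst_n = '0' then
      curr_state_r <= red;
      r_y_g        <= "100";
    elsif clk'event and clk = '1' then
      case curr_state_r is
        when red =>
          if req_green = '1' then
            curr_state_r <= yellow;
            r_y_g        <= "010";
          end if;
        when yellow =>
          -- Direction is taken from the (still held) request
          if req_green = '1' then
            curr_state_r <= green;
            r_y_g        <= "001";
          else
            curr_state_r <= red;
            r_y_g        <= "100";
          end if;
        when green =>
          if req_red = '1' then
            curr_state_r <= yellow;
            r_y_g        <= "010";
          end if;
      end case;
    end if;
  end process sync_all;

end architecture rtl;
```

Note that the output value is written next to every state assignment; forgetting one of them leaves the output register holding its old value.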
Coding Style: 2seg-Moore/Mealy
- Two-segment coding style uses 1 sequential process and 1 combinatorial process (or concurrent assignments)

    sync_ps : process (clk, rst_n)
    begin
      if rst_n = '0' then
        <INIT STATE OF THE FSM>
      elsif clk'event and clk = '1' then
        <Synchronous part of the FSM; assign next state to curr state>
      end if;
    end process sync_ps;

    comb_output : process (ctrl_r, input)
    begin
      <Combinational part; define output>
      <(Mealy looks the same as Moore, but also considers the input when determining the output)>
    end process comb_output;

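A two-segment Moore version of the traffic light might look like the sketch below. The entity name traffic_light_2seg, the ports req_green and req_red, and the transition rules are assumptions made for illustration; the yellow timer is omitted and the requests are assumed to be held until the light has changed. The output is decoded combinatorially from ctrl_r only, so no output registers are inferred.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: names and behavior are assumptions for illustration.
entity traffic_light_2seg is
  port (
    clk       : in  std_logic;
    rst_n     : in  std_logic;                      -- active-low async reset
    req_green : in  std_logic;
    req_red   : in  std_logic;
    r_y_g     : out std_logic_vector(2 downto 0));  -- one-hot: red, yellow, green
end entity traffic_light_2seg;

architecture rtl of traffic_light_2seg is
  type states_type is (red, yellow, green);
  signal ctrl_r : states_type;
begin

  -- Segment 1: state register together with the next-state decisions
  sync_ps : process (clk, rst_n)
  begin
    if rst_n = '0' then
      ctrl_r <= red;
    elsif clk'event and clk = '1' then
      case ctrl_r is
        when red =>
          if req_green = '1' then
            ctrl_r <= yellow;
          end if;
        when yellow =>
          if req_green = '1' then
            ctrl_r <= green;
          else
            ctrl_r <= red;
          end if;
        when green =>
          if req_red = '1' then
            ctrl_r <= yellow;
          end if;
      end case;
    end if;
  end process sync_ps;

  -- Segment 2: Moore output, a function of the current state only.
  -- A Mealy version would add the inputs to the sensitivity list
  -- and test them inside the case branches.
  comb_output : process (ctrl_r)
  begin
    case ctrl_r is
      when red    => r_y_g <= "100";
      when yellow => r_y_g <= "010";
      when green  => r_y_g <= "001";
    end case;
  end process comb_output;

end architecture rtl;
```

Every branch of comb_output assigns r_y_g, which keeps the process purely combinatorial and avoids inferring latches.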
Coding Style: 3seg-Moore/Mealy
- Three-segment coding style uses 1 sequential process and 2 combinatorial processes

    sync_ps : process (clk, rst_n)
    begin
      if rst_n = '0' then
        <INIT STATE OF THE FSM>
      elsif clk'event and clk = '1' then
        <curr state assignment>
      end if;
    end process sync_ps;

    comb_ns : process (ctrl_r, input)
    begin
      <Combinational part; define next state>
    end process comb_ns;

    comb_output : process (ctrl_r, input)
    begin
      <Combinational part; define output>
    end process comb_output;

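For comparison, the same hypothetical traffic light split into three segments could be sketched as follows. The entity name traffic_light_3seg, the signal next_state, the ports req_green and req_red, and the transition rules are assumptions; the yellow timer is omitted and the requests are assumed to be held until the light has changed. Here the next state is a separate VHDL signal, and the sequential process only copies it into the state register.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: names and behavior are assumptions for illustration.
entity traffic_light_3seg is
  port (
    clk       : in  std_logic;
    rst_n     : in  std_logic;                      -- active-low async reset
    req_green : in  std_logic;
    req_red   : in  std_logic;
    r_y_g     : out std_logic_vector(2 downto 0));  -- one-hot: red, yellow, green
end entity traffic_light_3seg;

architecture rtl of traffic_light_3seg is
  type states_type is (red, yellow, green);
  signal ctrl_r     : states_type;  -- state register
  signal next_state : states_type;  -- combinatorial, not a register
begin

  -- Segment 1: state register only
  sync_ps : process (clk, rst_n)
  begin
    if rst_n = '0' then
      ctrl_r <= red;
    elsif clk'event and clk = '1' then
      ctrl_r <= next_state;
    end if;
  end process sync_ps;

  -- Segment 2: next-state logic
  comb_ns : process (ctrl_r, req_green, req_red)
  begin
    next_state <= ctrl_r;  -- default: stay; also prevents latches
    case ctrl_r is
      when red =>
        if req_green = '1' then
          next_state <= yellow;
        end if;
      when yellow =>
        if req_green = '1' then
          next_state <= green;
        else
          next_state <= red;
        end if;
      when green =>
        if req_red = '1' then
          next_state <= yellow;
        end if;
    end case;
  end process comb_ns;

  -- Segment 3: Moore output logic
  comb_output : process (ctrl_r)
  begin
    case ctrl_r is
      when red    => r_y_g <= "100";
      when yellow => r_y_g <= "010";
      when green  => r_y_g <= "001";
    end case;
  end process comb_output;

end architecture rtl;
```

The default assignment at the top of comb_ns is a design choice: it covers every path that does not change state, so the individual branches only need to list the transitions.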
Examples: Traffic Light FSM Implemented with Various Styles
- Simple traffic light controller
  - Two inputs: request green and request red
  - One output: one-hot encoded color
- Implemented in various styles: Moore and Mealy, 1-3 segments
  - VHDL at http://www.tkt.cs.tut.fi/kurssit/50200/s16/luennot.html
- Output latency is larger for Moore and registered Mealy
  - However, all the implementations keep the yellow light on for the same amount of time
- The examples also show the usage of a counter in a state machine
  - It acts as a timer for showing the yellow light
  - Sometimes the designer must modify the timer limit values (e.g., if counter_r = limit-1 instead of counter_r = limit)
[Figure: state diagram with states red, yellow and green]

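The timer idea can be sketched as a one-segment FSM with a counter register. This is an illustrative assumption, not the code from the course page: the entity name traffic_light_timer, the ports req_green and req_red, and the transition rules are made up here, while the generic name yellow_length_g is borrowed from the synthesis log shown later. The requests are assumed to be held until the light has changed. Since the counter starts from 0 when yellow is entered, the comparison uses yellow_length_g-1 so that yellow stays on for exactly yellow_length_g clock cycles.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: names and behavior are assumptions for illustration.
entity traffic_light_timer is
  generic (
    yellow_length_g : positive := 5);               -- yellow time in clock cycles
  port (
    clk       : in  std_logic;
    rst_n     : in  std_logic;                      -- active-low async reset
    req_green : in  std_logic;
    req_red   : in  std_logic;
    r_y_g     : out std_logic_vector(2 downto 0));  -- one-hot: red, yellow, green
end entity traffic_light_timer;

architecture rtl of traffic_light_timer is
  type states_type is (red, yellow, green);
  signal curr_state_r : states_type;
  signal counter_r    : integer range 0 to yellow_length_g-1;
begin

  sync_all : process (clk, rst_n)
  begin
    if rst_n = '0' then
      curr_state_r <= red;
      counter_r    <= 0;
      r_y_g        <= "100";
    elsif clk'event and clk = '1' then
      case curr_state_r is
        when red =>
          if req_green = '1' then
            curr_state_r <= yellow;
            counter_r    <= 0;       -- restart the timer
            r_y_g        <= "010";
          end if;
        when yellow =>
          -- limit-1, because the counter starts from 0
          if counter_r = yellow_length_g-1 then
            if req_green = '1' then
              curr_state_r <= green;
              r_y_g        <= "001";
            else
              curr_state_r <= red;
              r_y_g        <= "100";
            end if;
          else
            counter_r <= counter_r + 1;
          end if;
        when green =>
          if req_red = '1' then
            curr_state_r <= yellow;
            counter_r    <= 0;       -- restart the timer
            r_y_g        <= "010";
          end if;
      end case;
    end if;
  end process sync_all;

end architecture rtl;
```

Comparing against yellow_length_g instead would keep yellow on one cycle longer and would also require a wider counter range, which is the kind of off-by-one adjustment the slide warns about.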
Timing of Traffic Light FSMs
- All versions show the yellow light for the same amount of time
  - r_y_g = 010 for 5 cycles
- However, there is a 1-cycle difference in when yellow turns ON
  - Inside the DUV, state and counter values are aligned differently as well
I. Mealy 2a, 2b, 3 react immediately
  - No output register, only combinatorial delay from request to r_y_g (= 0 ns in RTL simulation)
II. Mealy 1 needs 1 cycle
  - Output register and curr_state_r change simultaneously
  - VHDL code assigns them when the state changes
III. Moore 2 needs 1 cycle
  - Updating curr_state_r takes 1 cycle
  - Output assigned combinatorially from curr_state_r (= 0 ns in RTL simulation)

Comparison of Implementation Styles: Coding Style
- 1-segment:
  - Just one synchronous process
  - Automatically inferred output registers
  - Simple view of the design, everything in one place
  - Safe; a registered Mealy machine is easy to implement with this style
  - Recommended (as opposed to some books!)
- 2-segment, 3-segment:
  - Only way to implement unregistered FSM outputs
  - Modular
  - Long ago, synthesis tools did not recognize 1-segment FSMs correctly
    - Not the case anymore
    - Recommended style in many books, partially because of those limitations of the old tools
  - Useful for quite simple control machines that do not include, e.g., delay counters
  - Complex state machines are cumbersome to read
    - The code does not proceed smoothly; the reader has to jump around the code
    - The same condition may be repeated in many processes

Quartus II design flow after you've simulated and verified the design
1. Synthesis: generic gate-level representation, just gates and flip-flops
2. Place and route: places and routes the logic into a device (logic elements, macros and routing cells)
3. Converts the post-fit netlist into an FPGA programming file (.sof)
4. Analyzes and validates the timing performance of all logic in the design
5. Run on FPGA

Examples: Extracted State Diagram, Tool A
- Note the encoding: it's neither basic binary nor one-hot.
[Figure: state diagram extracted by Tool A]

RTL Synthesis Result: Tool A
[Figure: RTL schematic of the registered Mealy machine (traffic light VHDL), annotated with:]
- State register
- Register for output bit 2
- Registers for output bits 1..0
- Next state logic, incl. counter for showing the yellow light long enough
- Combinatorial output logic
- Comb path from input to output logic

Technology Schematic, Tool A
- Logic on the previous slide has been mapped to the FPGA's primitives
[Figure: technology schematic of the registered Mealy machine (traffic light VHDL), annotated with:]
- Single flip-flops
- Look-up tables, max 6 inputs

RTL Synthesis Result: Tool B
- Same VHDL, slightly different result
- Note the different state encoding

    # Info: [45144]: Extracted FSM in module work.traffic_light(rtl){generic map (n_colors_g => 3 yellow_length_g => 10)}, with state variable = ctrl_r[1:0], async set/reset state(s) = 00, number of states = 3.
    # Info: [45144]: Preserving the original encoding in 3 state FSM
    # Info: [45144]: FSM: State encoding table.
    # Info: [40000]: FSM: Index Literal Encoding
    # Info: [40000]: FSM: 0 00 00
    # Info: [40000]: FSM: 1 01 01
    # Info: [40000]: FSM: 2 10 10

[Figure: RTL schematic of the registered Mealy machine (traffic light VHDL)]

Technology Schematic, Tool B
[Figure: technology schematic of the registered Mealy machine (traffic light VHDL), annotated with:]
- LUTs
- Multi-bit registers

Physical Placement On-Chip
- traffic_light.vhd placed and routed
- Stratix 2S180, 143 000 ALUTs (~LUTs)
- Plenty of unused resources...
[Figure: chip floorplan showing the placed design]

Implementation Area And Frequency
- Note that no strict generalization can be made about which tool is better

    Tool A    Total ALUTs: 15    ALUTs with register: 10
    Tool B    LUTs: 16           Registers: 9

- The one-register difference is due to the different state encoding
  - The state encoding can be explicitly defined or left for the tool to choose (as in this case)

Synthesis of Different VHDLs

    Design                            AREA [LUT]   AREA [reg]   Lines of Code
    mealy (single)                    16           9            104
    mealy (output separated)          13           6            126
    mealy_2proc. (out+ns separated)   11           6            125
    mealy_3proc                       11           6            150
    Moore                             11           6            108

- Functionally equivalent
  - Timing aspects vary
  - Different max frequency
- Only the single-process Mealy has output registers (3 bits)
- Coding style has a minor effect here
  - Readability of the code is just as crucial!

Comparison of Implementation Styles: Moore and Mealy
- The number of processes does not affect HW area and speed deterministically. The differences are mainly in readability.
- Generally, we want the outputs to be registered
- Traditionally, the Mealy machine is sometimes problematic due to possible combinatorial paths or loops
- For registered outputs, use a registered Mealy machine
  - Its outputs are registered, but it has shorter latency than a Moore machine with registered outputs
- Otherwise, opt for a Moore machine

Notes on Finite State Machines
- Quite often datapath and control get mixed in the HDL description
- Start with a slow and simple FSM if the neighbouring blocks allow that
  - Takes a few extra cycles but has less branching and hence simpler conditions
  - Easier to get working at all
  - You can later save a few clock cycles by skipping some state in certain conditions (e.g., by adding the red arc wait_ack -> write)
- Be careful with the timing of the output register
- Always mark the 1st state clearly
- Draw also the self-loops
[Figure: state diagram with states Wait ack, Wait data and write]

Synthesis Observations
- Make sure that you are aware of which signals in the shown code have been implemented as registers!
- In most cases, use enumeration and let the synthesis tool decide the state encoding
  - Not much difference in delay or area in realistic circuits
  - One-hot encoding is easiest to debug!
- Different tools produce slightly different results even in small designs
  - Synthesis tools are heuristic due to the very large design space
  - A modest effect (e.g., -10% to +10%) is also achievable by tuning the tool settings
- Even a single tool may produce slightly different results on different runs!
  - Optimization heuristics utilize randomness
- However, no tool can convert a bad design into a good one!

Conclusions
- Finite state machines can be coded in a variety of ways
  - Prefer simplicity, according to the TUT coding rules
- Synthesis tools create different but functionally equivalent netlists even for small designs