
Audio Broadcasting using FPGA (Pod-Casting)
By Haider Ali and Kent Borecky
ECE 445, Senior Design Project
Summer 2005
TA: Alex Spektor
07/31/2005
Project No. 3

ABSTRACT
Pod-Casting is an audio-content streaming project with Ethernet support. It is essentially a proof-of-concept project: although several software-based solutions exist for audio streaming and management (for example, Nullsoft's SHOUTcast), hardware-based implementations that both decode compressed audio formats and provide network connectivity are uncommon. Pod-Casting allows XUP development boards to communicate with each other and stream PCM audio content. Initial attempts to implement a functional Ethernet MAC layer did not succeed; however, subsequent efforts produced a functioning MAC layer that provides full network connectivity, as proposed in the design review. Future work on Pod-Casting would involve implementing the complete solution, including an MPEG decoder. In addition, any future work in this area using the XUP development hardware would benefit from the experience gained in implementing Pod-Casting.

TABLE OF CONTENTS
1. INTRODUCTION
1.1 MOTIVATION AND PURPOSE
1.2 SPECIFICATIONS
1.3 SUBPROJECTS
1.3.1 LM4550 AC 97 AUDIO CODEC
1.3.2 MEDIA ACCESS LAYER AND LXT972A PHY TRANSCEIVER
1.3.3 PS/2 KEYBOARD INTERFACE
1.3.4 SVGA ADAPTOR AND FMS3818 TRIPLE VIDEO DAC
2. DESIGN PROCEDURE
2.1 LM4550 AC 97 AUDIO CODEC
2.2 MEDIA ACCESS LAYER AND LXT972A PHY TRANSCEIVER
2.3 SRAM INTERFACE
2.4 PS/2 KEYBOARD INTERFACE
2.5 SVGA ADAPTOR AND FMS3818 TRIPLE VIDEO DAC
2.5.1 CLK_DIVIDER
2.5.2 DEBOUNCE_XUP
2.5.3 DEFAULTS
2.5.4 VGA_SYNC
2.6 CHARACTER GENERATION
3. DESIGN DETAILS
3.1 LM4550 AC 97 AUDIO CODEC
3.1.1 CONFIGURING THE CODEC
3.1.2 WAVE FILE PLAYBACK FROM SRAM AND MIC-IN/LINE-IN
3.2 MEDIA ACCESS LAYER AND LXT972A PHY TRANSCEIVER
3.3 PS/2 KEYBOARD INTERFACE
3.4 SVGA ADAPTOR AND FMS3818 TRIPLE VIDEO DAC
3.4.1 VGA_SYNC
3.5 CHARACTER GENERATION
4. DESIGN VERIFICATION
4.1 TESTS
4.1.1 COUNTER SIMULATION TESTING
4.1.2 AC97 CODEC SIMULATION TESTING
4.1.3 VGA TIMING GENERATOR SIMULATION TESTING
4.1.3 AC97, ETHERNET TESTING
4.1.4 MAC TESTING
4.2 CONCLUSIONS
5. COST
6. CONCLUSIONS
REFERENCES
APPENDIX A BLOCK DIAGRAMS
APPENDIX B HARDWARE DESCRIPTION LANGUAGE SCHEMATICS
APPENDIX C SIMULATIONS
APPENDIX D SPECTRAL ANALYSIS

1. INTRODUCTION
Pod-Casting is a hardware-based audio-content streaming project designed and implemented to run on the XUP development boards from Digilent. These boards feature Xilinx's Virtex-II Pro FPGA along with numerous other I/O interfacing chips, including National Semiconductor's LM4550 AC97 audio codec and Intel's LXT972 Ethernet transceiver.

1.1 Motivation and Purpose
The motivation for Pod-Casting is to make devices like iPods more extensible by adding network connectivity, so that similar devices in close physical proximity can share their music. The general idea is to bring the extensible and popular world of peer-to-peer (P2P) file sharing into daily life without requiring a complete PC. The social and legal ramifications of such widespread usage would have to be considered once the technology became fully functional. Two technical issues with such device-based P2P networks are bandwidth and compatibility. To address bandwidth, further research would be required to formulate a resource-sharing mechanism similar to the way the BitTorrent network operates: every user, while downloading a given file, also uploads file content to other users who are downloading the same file. Extending this idea to device-based P2P (Device-P2P), a typical situation might involve a single user broadcasting the same audio to several client users; it would then only be equitable for the client users to share whatever content they have already received from the broadcaster with their fellow clients. Although this is a very high-level description of the bandwidth-sharing regime that Device-P2P would need in order to become popular, it captures the general idea of spreading the load equitably across all users rather than placing it solely on the primary broadcaster who holds the audio content.
Although the current implementation of Pod-Casting uses Ethernet for network connectivity, for compatibility reasons the project would need to support something like Bluetooth to serve users with audio devices from different vendors.

1.2 Specifications
Pod-Casting was initially envisioned to implement both MPEG decoding and networking functionality. However, it was soon realized that, in the limited time available for the project, it would be best to pursue only one of these major goals; as such, the project was initially scoped as an audio broadcasting project with an Ethernet MAC layer for connectivity. The LM4550 audio codec interface and the LXT972 Ethernet transceiver interface were developed independently and eventually brought together and connected. Besides the interfaces to these chips, FIFO queues and arbiters had to be included to accommodate the different clock speeds at which the devices operate (for example, the LXT972 operating at 100 Mbps is clocked at 25 MHz, while the LM4550 operates at approximately 12.5 MHz). In addition, keyboard and VGA devices were incorporated to provide an asynchronous means of displaying textual information about the current stream.

Since limited memory was an issue during development, Pod-Casting uses the external add-on SRAM module to hold audio file content. Even though the Virtex-II Pro has 136 on-chip block SelectRAM+ memories, loading arbitrary audio files into the FPGA would be too cumbersome and infeasible, since the block RAMs are not intended to replace permanent storage but rather to support computation. Pod-Casting uses the add-on USB module to transfer files onto the SRAM module. Interfacing with a hard drive over SATA, or with CompactFlash, would be an adequate way to address the storage issue.

1.3 Subprojects
The design was broken into several modules, each of which performs a specific task. The modules we decided to implement were:

1.3.1 LM4550 AC 97 Audio Codec
The LM4550 [1] is an audio codec (coupled with a stereo amplifier) that supports Intel's widely used AC97 audio standard. The AC97 standard abstracts the process of sampling and playing back audio by handling all A/D and D/A conversions internally; the interface presented to the designer is serial PCM (Pulse Code Modulation) data. Hardware was required to configure the codec both for digital playback and for sampling data from the stereo Line-In input. The codec interface was responsible for providing the audio data for streaming.

1.3.2 Media Access Layer and LXT972A PHY Transceiver
The Ethernet network was used to stream the audio and textual data. The onboard LXT972A PHY transceiver [2] provides only rudimentary, low-level functionality, such as line coding for transmission across the wire (4B/5B block coding with MLT-3 signaling at 100 Mbps, Manchester encoding at 10 Mbps). Hardware was still required to control the PHY; in addition, for extended connectivity, a Media Access Control (MAC) layer operating at the frame level was implemented. It was responsible for generating Ethernet frames and embedding the audio data within them.

1.3.3 PS/2 Keyboard Interface
To interface with the PS/2 port and keyboard, a hardware controller was implemented that reads data from PS2_DATA using the PS2_CLOCK signal coming from the PS/2 port. The standard device-to-host communication protocol was followed to extract data from the PS2_DATA signal.
1.3.4 SVGA Adaptor and FMS3818 Triple Video DAC
Driving an SVGA display involves passing the digital pixel values through the video DAC (the FMS3818 chip [4]), which converts them into the analog voltages used by the monitor. However, the FMS3818 is simply a video DAC [4]. Separate hardware was needed to drive the control signals to the display and to supply pixel values at the right rate, as well as to manage the timing of the VGA signals (HSYNCH, VSYNCH, and BLANK). We targeted VGA resolution (640 x 480 at 256 colors).

2. DESIGN PROCEDURE

2.1 LM4550 AC 97 Audio Codec
National Semiconductor's LM4550 chip implements Intel's AC97 audio codec standard. The AC97 standard supports sample rates from 4 kHz to 48 kHz with samples of up to 20 bits (the LM4550 supports only 18-bit samples). The codec operates directly on Pulse Code Modulated (PCM) audio samples, performing the D/A conversion needed to output sound on Line Out. The AC97 standard specifies communication with the codec over a serial interface in frames of 256 bits. The codec provides a BIT_CLK of 12.288 MHz (a period of roughly 80 ns); data on the SDATA_OUT and SDATA_IN signals changes on rising edges of BIT_CLK. Frames are the unit of communication with the codec, which discards any frame that is not exactly 256 bits long. Frames are used both to configure the codec and to transmit audio samples. Each 256-bit frame consists of one 16-bit tag slot followed by 12 data slots of 20 bits each (16 + 12 x 20 = 256); the start of a new frame is indicated by holding the SYNCH signal high for 16 clock cycles, as shown in Figure 1.

Figure 1. AC97 frame format [5]

Before the LM4550 can be used to play back sampled WAVE files or streamed data, it must be configured to enable the gain registers on the playback path. By default, the codec mutes the Line Out and Headphone Out ports as well as the DAC gain registers. These registers, along with others on the path from Line-In/MIC-In to the FPGA, must be configured using frames. Another non-standard feature of the LM4550 is that it enters a self-test mode if configured improperly and requires a cold reset to return to functional mode. Although there are 12 data slots of 20 bits each, only slots 3 and 4 are used to carry the sampled left and right channel audio data.
The tag slot informs the codec of the status of the slots that follow: the first bit indicates whether the current frame is valid, and subsequent bits indicate whether each data slot contains valid data (unused bits are zeroed throughout the frame). When configuring the codec, slots 1 and 2 carry the address of the register to configure and the value to write to it. As Figure 1 shows, SYNCH should be asserted one clock cycle before the tag slot starts and de-asserted before the last bit of the tag slot is transmitted to the codec over SDATA_OUT. Data sampled from Line-In/MIC-In is transmitted from the codec to the controller (the FPGA) over the SDATA_IN signal, following the same protocol and conventions as controller-to-codec communication. After being sampled and passed through the ADCs, audio data is transmitted in slots 3 and 4 (left and right channels) and is ready for playback on the following frame.
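To make the frame layout concrete, the following is a minimal Python behavioural sketch (not the project's VHDL) that serializes one AC97 output frame and the SYNCH waveform described above. It assumes the AC97 slot conventions as we understand them: tag bit 15 marks the frame valid and bits 14 to 3 mark slots 1 to 12 valid; in slot 1, bit 19 = 0 selects a register write and bits 18 to 12 hold the register index; 16-bit audio samples are MSB-justified in the 20-bit slots. These details should be checked against [1] and [5]; the helper names are hypothetical.

```python
# Behavioural sketch of one AC97 output frame on SDATA_OUT (not the project's VHDL).
TAG_BITS = 16
SLOT_BITS = 20
NUM_SLOTS = 12
FRAME_BITS = TAG_BITS + NUM_SLOTS * SLOT_BITS  # 16 + 12 * 20 = 256


def to_bits(value, width):
    """Return `value` as a list of `width` bits, most significant bit first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def make_tag(valid_slots):
    """Tag slot: bit 15 = frame valid, bits 14..3 = data slots 1..12 valid."""
    tag = 0
    for slot in valid_slots:
        tag |= 1 << (15 - slot)
    if tag:
        tag |= 1 << 15  # mark the frame valid only if some slot carries data
    return tag


def register_write(reg, value):
    """Slots 1 and 2 for a register write (assumed: slot 1 bit 19 = 0 means write)."""
    return {1: (reg & 0x7F) << 12, 2: (value & 0xFFFF) << 4}


def pcm_sample(left, right):
    """16-bit left/right samples, MSB-justified in the 20-bit slots 3 and 4."""
    return {3: (left & 0xFFFF) << 4, 4: (right & 0xFFFF) << 4}


def build_frame(slots):
    """Serialize one frame; slots missing from `slots` are zero-filled."""
    bits = to_bits(make_tag(slots), TAG_BITS)
    for slot in range(1, NUM_SLOTS + 1):
        bits += to_bits(slots.get(slot, 0), SLOT_BITS)
    assert len(bits) == FRAME_BITS  # the codec discards frames of any other length
    return bits


def sync_waveform():
    """SYNCH per BIT_CLK within one frame: high on the last bit of the previous
    frame and dropped before the last tag bit, i.e. 16 cycles high in total."""
    return [1] * (TAG_BITS - 1) + [0] * (FRAME_BITS - TAG_BITS) + [1]
```

For example, `build_frame(register_write(0x18, 0x0808))` yields a 256-bit configuration frame whose tag is 0xE000 (frame, slot 1 and slot 2 valid), while `build_frame(pcm_sample(l, r))` yields a data frame tagged 0x9800 (frame, slot 3 and slot 4 valid).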

2.2 Media Access Layer and LXT972A PHY Transceiver
The LXT972 Ethernet transceiver from Intel operates at the physical layer of the OSI 7-layer model. Figure 2 shows its position and functionality.

Figure 2. Functionality and positioning of the PHY transceiver within the OSI 7-layer model [2]

The LXT972 is responsible for transmitting data over the CAT-5 cable and for establishing and negotiating a link with other Ethernet devices. In 100BASE-TX mode, the transceiver uses 4B/5B block coding, which is then encoded into MLT-3 symbols for transmission over the wire (in 10BASE-T mode the data is Manchester encoded). In addition to negotiating and establishing links, the LXT972 implements the Media Independent Interface (MII), which allows the transceiver to be connected to any MAC that also supports MII; the MAC operates at layer 2 of the OSI model. The MAC implements the IEEE 802.3 frame-level functions and sits between higher layers, such as a TCP/IP stack operating on IP packets, and the low-level transceiver. As part of the MII, the transceiver provides a Management Data Input/Output (MDIO) interface for configuring the chip and checking the status of its internal registers. MDIO is also a serial interface; it operates at up to 8 MHz, with the clock (MDC) supplied by the controller (the FPGA) to the transceiver. Fortunately, configuring the LXT972 requires minimal effort because of the way it is wired on the XUP board: it is automatically set to auto-negotiate links at 10/100 Mbps, half/full duplex. In auto-negotiation mode, the transceiver sends Fast Link Pulse (FLP) bursts to the device at the other end of the link; through these bursts both devices advertise their capabilities, and the fastest mode of operation they have in common is chosen.
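The MDIO interface mentioned above carries management frames in the IEEE 802.3 Clause 22 format: 32 preamble ones, a start code, an opcode, a 5-bit PHY address, a 5-bit register address, a 2-bit turnaround, and 16 data bits. The Python sketch below lists the bits the FPGA would drive; it is a behavioural model, not the project's VHDL, and the PHY address used in the example is hypothetical because the report does not state the address strapped on the XUP board. Register 1 is the standard 802.3 basic status register, whose bit 2 reports link status.

```python
# Sketch of the bits the FPGA drives on MDIO for an IEEE 802.3 Clause 22 frame.
PREAMBLE = [1] * 32
START = [0, 1]
OP_READ = [1, 0]
OP_WRITE = [0, 1]


def field(value, width):
    """`value` as `width` bits, most significant bit first (MDIO field order)."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def mdio_write(phy_addr, reg_addr, data):
    """Complete 64-bit write frame; the controller drives every bit."""
    return (PREAMBLE + START + OP_WRITE + field(phy_addr, 5) + field(reg_addr, 5)
            + [1, 0]            # turnaround, driven by the controller on a write
            + field(data, 16))


def mdio_read_request(phy_addr, reg_addr):
    """First 46 bits of a read frame. The controller then releases MDIO: the PHY
    drives the second turnaround bit (0) followed by 16 data bits, which the
    controller samples on rising edges of MDC."""
    return PREAMBLE + START + OP_READ + field(phy_addr, 5) + field(reg_addr, 5)
```

A status poll such as `mdio_read_request(0, 1)` (PHY address 0 is a hypothetical choice) is followed by the PHY's 16-bit reply. At the 8 MHz MDC limit, a complete 64-bit frame takes 8 microseconds, which is negligible compared with link setup time.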
The project's interface includes support for testing whether the link is operational and for configuring the transceiver LEDs to show link status, speed, and transmit/receive activity. Pod-Casting supports communication between multiple XUP boards using standard Ethernet cables and a network switch; the frames it sends follow the IEEE 802.3 Ethernet frame format. Figure 3 illustrates the basic Ethernet frame structure.

Figure 3. Ethernet frame formats [2]

The MAC is designed by constructing the individual fields shown in Figure 3 separately and then combining them to compute the Cyclic Redundancy Check (CRC), after which all the fields are transmitted through the PHY transceiver. Separate shift-register storage is allocated to the fixed-size fields: the Preamble (64 bits, including the Start Frame Delimiter), the Destination and Source Addresses (48 bits each), and the Length field (16 bits). Ethernet frames must be at least 64 bytes long (excluding the preamble), which means the Data field must carry at least 46 bytes; shorter data is padded to this length so that the Ethernet switch does not classify the frame as a runt and drop it. Before the MAC frame can be transmitted over the wire, the Frame Check Sequence (FCS), i.e. the CRC, must be calculated, and generating it is not trivial. The CRC is computed over the Destination and Source Addresses, the Length field, and the Data field. CRC generation treats the input data as a polynomial M(x), multiplies it by x^32, and divides the result (modulo 2) by a generator polynomial G(x); the remainder forms the FCS. For Ethernet, G(x) is the IEEE 802.3 CRC-32 polynomial given in Equation 1.

$$G(x) = x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1 \qquad (1)$$

Another important feature of Ethernet frames is that every field is transmitted least significant bit first, in the order shown in Figure 3; the FCS, however, is transmitted most significant bit first. A suitable inter-frame gap (96 bit times) must also be inserted between frames to prevent successive frames from being discarded as invalid.

2.3 SRAM Interface
The add-on expansion SRAM module [3] is used together with the USB module. Files are copied from a PC using Digilent's USB software, while a separate bitfile (also from Digilent) interfaces between the USB and SRAM modules and writes the data to the SRAM chips.
The SRAM module consists of two 8-bit, 512 KB ISSI chips, which are addressed separately using a 19-bit address bus and chip-select signals (CS0, CS1). The SRAM supports a relatively fast access time of 10/12 ns; however, because the chips are single-ported, attempting simultaneous read and write operations proved unreliable. SRAM access speed is not critical in this design because the audio codec runs with a much longer clock period (approximately 80 ns). In addition, apart from reading data for playback every frame, there is no time-critical use of the SRAM.

2.4 PS/2 Keyboard Interface
Our scan code reader consists of a single block that takes the PS2_CLK1_H and PS2_DATA1_H PS/2 signals and the PB_ENTER signal (a push button on the XUP board). The interface reads from PS2_DATA1_H until the stop bit is received and the keyboard clock stops toggling. This design is simple and efficient for the given task. We did not have to handle break codes separately, since they consist of the scan code prefixed with a constant (0xF0 in the default scan code set 2). A design using a state machine would achieve the same result but would require a separate state for each bit received.

2.5 SVGA Adaptor and FMS3818 Triple Video DAC
The SVGA design consists of several high-level blocks:

2.5.1 Clk_divider
This block provides clock down-conversion. It accepts the system clock, a divisor, and reset as inputs, and outputs a down-converted clock as determined by the divisor. Specifically, if T is the period of the input clock and D is the integer divisor, the output clock has a period of 2*D*T. This unit supplies the clock for the debouncing circuit.

2.5.2 Debounce_XUP
This block debounces keypresses with the help of the 100 Hz clock generator. It accepts a raw button input and a clock, and outputs a debounced, non-inverted button signal.

2.5.3 Defaults
This block supplies default values to the other blocks. For example, it defines and outputs the lengths of the horizontal and vertical blanking intervals, as well as most of the constant values ported into other blocks.

2.5.4 VGA_sync
Given the relevant parameters for the desired resolution (clock, horizontal and vertical resolution, blanking lengths, and synchronization lengths), this block produces horizontal and vertical synchronization signals along with blanking signals. It also derives a 25 MHz pixel clock (for 640 x 480 at a 60 Hz refresh rate) from the input clock.
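As an illustration of what VGA_sync produces along one line, here is a small Python behavioural sketch that maps the horizontal pixel counter to the states later described in Section 3.4.1 (Idle, Assert_blank, Assert_hsync, Deassert_hSync). The report gives only the 640 visible and 800 total pixels per line, so the 16/96/48-pixel front porch, sync, and back porch values are an assumption (the commonly used 640 x 480 values), and the signals are modelled active-high purely for readability.

```python
# Behavioural sketch of the horizontal side of VGA_sync for 640x480.
# ASSUMPTION: 16/96/48-pixel front porch / sync / back porch (common 640x480@60
# values); the report states only 640 visible and 800 total pixels per line.
# Signals are modelled active-high here for readability.
H_VISIBLE = 640
H_FRONT = 16
H_SYNC = 96
H_BACK = 48
H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK  # 800 pixel clocks per line


def horiz_state(h_count):
    """Map a pixel counter value (0..799) to (state, blank, hsync)."""
    if not 0 <= h_count < H_TOTAL:
        raise ValueError("h_count out of range")
    if h_count < H_VISIBLE:
        return ("Idle", False, False)              # visible pixels
    if h_count < H_VISIBLE + H_FRONT:
        return ("Assert_blank", True, False)       # front porch
    if h_count < H_VISIBLE + H_FRONT + H_SYNC:
        return ("Assert_hsync", True, True)        # sync pulse
    return ("Deassert_hSync", True, False)         # back porch


def refresh_rate_hz(pixel_clock_hz, h_total, v_total):
    """Frame rate implied by the pixel clock and the total line/frame lengths."""
    return pixel_clock_hz / (h_total * v_total)
```

With the 800-pixel line and the 520-line frame quoted in Section 3.4.1, `refresh_rate_hz(25e6, 800, 520)` comes out at about 60.1 Hz, consistent with the 60 Hz target.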
2.6 Character Generation
The Character Generation unit consists of a central pixel-data ROM, a finite state machine (FSM) that controls how the characters are displayed, and several combinational logic blocks for address generation and bookkeeping (for example, the number of pixels displayed and the number of characters completed).

Character pixel data had to be created by hand from ASCII art because the XUP board provides no built-in character font (palette). Alternatively, a BMP image file containing font data could have been used, but the difficulty of decoding the BMP file, together with storage concerns, led us to reject this approach. One important factor was that the SRAM holding the BMP file would have to be read continuously to keep the VGA display valid, and previous experience with the SRAM module showed that it could become unreliable under excessive read requests. Drawing the characters on the VGA display was implemented with an FSM. The FSM sequences through all the characters to be drawn, copying the first row of pixel data for each character before moving on to the next row for all the characters. This is necessary because of the low-level nature of the implementation, in which we deal directly with the VGA timing signals. In conventional implementations, the data to be drawn would be copied to a dedicated video memory (frame buffer) that is scanned out directly; since this design has no video memory, the approach above had to be used.

3. DESIGN DETAILS

3.1 LM4550 AC 97 Audio Codec
The LM4550 interface controller is responsible both for configuring the audio codec and for supplying it with either PCM samples from the WAVE file stored in the SRAM or the audio data sampled from MIC-In/Line-In. Audio is sampled in and played out via Line-Out at 48 kHz (although the codec also supports Variable Rate Audio).

3.1.1 Configuring the Codec
To simplify configuring the codec, a configuration ROM holds (tag, register-address slot, register-value slot) triples for the various codec registers. To allow audio data to be output on the Line-Out and Headphone jacks, the PCM Gain Register (0x18), the Master Volume Register (0x02), and the Headphone Volume Register (0x04) must be unmuted and set to an appropriate gain. In the input path, the MIC Gain Register (0x0E) and the Record Gain Register (0x1C) must be set up appropriately. In addition, the National 3D Sound circuitry and the PC Beep tone registers are disabled to avoid signal interference. The first frames are therefore purely configuration frames, with the configuration data carried in slots 1 and 2. Counters count the rising edges of AC97_BIT_CLOCK on which data is transmitted. A separate counter tracks the number of complete frames transmitted and is used to index into the ROM. Besides the configuration triples, the ROM also holds one entry for the data-frame tag, which is read for every frame once configuration is complete. When a new frame starts, the state machine always selects the ROM output as the input to the left shift register that supplies the data transmitted on AC97_SDATA_OUT.
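A configuration ROM indexed by the frame counter, as described above, can be sketched in Python as follows. The register addresses are the ones listed in this section; the register values, however, are illustrative assumptions (AC97-style unmuted settings) and are not taken from the project, so they must be checked against the LM4550 datasheet [1]. The 3D Sound and PC Beep entries are omitted from this sketch.

```python
# Sketch of the codec configuration ROM indexed by the frame counter.
# Register addresses come from Section 3.1.1; the register VALUES are illustrative
# assumptions (AC97-style unmuted settings) and must be checked against [1].
CONFIG_TAG = 0xE000  # frame valid + slot 1 valid + slot 2 valid
DATA_TAG = 0x9800    # frame valid + slot 3 valid + slot 4 valid


def entry(reg, value):
    """(tag, slot 1, slot 2) triple for a register write."""
    return (CONFIG_TAG, (reg & 0x7F) << 12, (value & 0xFFFF) << 4)


CONFIG_ROM = [
    entry(0x02, 0x0000),  # Master Volume: unmuted (illustrative value)
    entry(0x04, 0x0000),  # Headphone Volume: unmuted (illustrative value)
    entry(0x18, 0x0808),  # PCM Gain: unmuted (illustrative value)
    entry(0x0E, 0x0008),  # MIC Gain (illustrative value)
    entry(0x1C, 0x0000),  # Record Gain (illustrative value)
    # 3D Sound and PC Beep entries would follow here; omitted in this sketch.
]


def rom_lookup(frame_count):
    """Configuration triple for the first frames, then the data-frame tag."""
    if frame_count < len(CONFIG_ROM):
        return CONFIG_ROM[frame_count]
    return (DATA_TAG, None, None)  # slots 3/4 then carry audio, not ROM data
```

Keeping the triples in a table like this mirrors the design choice described above: adding or removing a configuration step means editing one ROM entry rather than the state machine.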
3.1.2 WAVE File Playback from SRAM and MIC-In/Line-In
The core of the audio codec controller is a state machine that transmits data on the rising edges of AC97_BIT_CLOCK and asserts SYNCH at the appropriate times during frame transmission. Since only the first four data slots (plus the tag slot, which every frame requires) are needed for both configuration frames and data frames, a single state machine handles both types of transmission. Based on the current count of frames sent, the controller decides whether to send empty data in slots 1 and 2 or to transmit valid data. States slot_x and slot_y represent the generic slots in which valid data is transmitted (slots 1 and 2 for configuration frames, slots 3 and 4 for data frames). Valid data is preloaded by the preceding state before either of these states is entered; the only action performed in slot_x and slot_y is to enable the shift signal of the left shift register, which supplies the data MSB first on the SDATA_OUT line. When WAVE files are being played, a separate SRAM address generation block produces the address from the current frame count and a counter that counts up to 4. The WAVE files store 16-bit stereo sound, so every sample consists of 4 bytes: bytes 0 and 1 hold the left-channel sub-sample and bytes 2 and 3 the right-channel sub-sample. Before entering slot_x, the state machine asserts the signals that start fetching data from the SRAM. As mentioned earlier, timing is not critical: not only is the period of AC97_BIT_CLOCK approximately 80 ns, but 40 BIT_CLOCK cycles also elapse before the left-channel data is needed, and the right-channel data is fetched starting on the 19th and 20th bits of slot_x. Also, because the WAVE file samples are stored little-endian and the codec requires big-endian data, their bytes must be swapped before being transmitted to the codec. A user-controlled DIP switch on the XUP board selects between playback of the WAVE file contents from the SRAM and playback of audio sampled from MIC-In/Line-In. A single 40-bit left shift register holds one incoming sample at a time; this is all that is required for on-demand playback, because data sampled in the current frame is available for playback in the next frame. In support of the project's audio streaming goal, the codec interface controller was also extended to accept a 40-bit streamed PCM sample for playback. In addition, sampled audio data or WAVE file samples serve as the source for transmission over Ethernet.

3.2 Media Access Layer and LXT972A PHY Transceiver
The Ethernet PHY interface establishes a link between XUP boards and then programs the transceiver LEDs. Once the interface controller has determined the transceiver status and configuration, the rest of the design can use the connection to transmit and receive data.
MDIO communication is serial and requires a controller-generated clock: the 32 MHz board clock is divided to derive an 8 MHz MDC clock for bidirectional data transfer over the MDIO signal. As with the audio codec configuration, read and write commands for the transceiver are stored in a ROM for ease of programming, updating, and extensibility (configuration commands are much easier to add or remove in a ROM than by other means). The transceiver samples MDIO on the rising edge of MDC, so the controller must ensure that its data is valid before each rising edge; when the transceiver returns data, that data is valid on the rising edge of MDC. The transceiver provides the controller with both TX_CLOCK and RX_CLOCK (approximately 25 MHz in 100BASE-TX mode). The interface unit fetches the 36-bit audio sample from the audio codec and packages it into an Ethernet frame: it prepends the 64-bit preamble, adds the Destination and Source Addresses and the Length field, generates the CRC over the addresses, Length field, and data, and transmits the frame over TX_DATA while asserting TX_VALID to indicate that the data is valid. Because the transceiver uses two pairs of wires (one for transmitting and one for receiving), similar receiving logic can simultaneously perform the reverse operations on data received over RX_DATA. As mentioned in Section 2.2, the MAC stores the contents of an Ethernet frame in separate registers and collates them when needed, so shift registers feature prominently in its implementation. The Preamble field has a standard value (seven 0x55 bytes followed by the 0xD5 Start Frame Delimiter) and is therefore hard-coded.
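The transmit path just described (fixed preamble, addresses, Length field, padded data, CRC-32 FCS) can be modelled in a few lines of Python. This is a bit-serial software equivalent for checking values, not the parallel hardware CRC from the Xilinx application note [8] that the project used. The reflected constant 0xEDB88320 is G(x) with its bit order reversed; presetting the register to all ones and complementing the result correspond to the CRC quirks noted later in this section, and the result equals Python's `zlib.crc32`. The split into 4-bit nibbles (low nibble first) reflects the 4-bit MII data path; the 5-byte payload and the addresses in the example are hypothetical (a 36-bit sample rounds up to 5 bytes).

```python
# Sketch: assemble an 802.3 frame, append the CRC-32 FCS, split it into MII nibbles.
PREAMBLE_SFD = bytes([0x55] * 7 + [0xD5])  # 7 preamble bytes + Start Frame Delimiter
MIN_DATA_BYTES = 46
CRC32_POLY_REFLECTED = 0xEDB88320  # G(x) (0x04C11DB7) with its bit order reversed


def ethernet_crc32(data):
    """Bit-serial CRC-32 over `data`, processing each byte LSB first like the wire."""
    crc = 0xFFFFFFFF  # all-ones preset complements the first 32 bits of the input
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32_POLY_REFLECTED if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF  # final complement


def build_frame(dst, src, payload):
    """Destination, Source, Length, padded Data, then the 4-byte FCS."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses are 6 bytes")
    length = len(payload).to_bytes(2, "big")
    data = payload + bytes(max(0, MIN_DATA_BYTES - len(payload)))  # pad to 46 bytes
    body = dst + src + length + data
    # Little-endian byte order, sent LSB first, puts the x^31 term on the wire first.
    return body + ethernet_crc32(body).to_bytes(4, "little")


def mii_nibbles(frame):
    """TXD[3:0] values for preamble + frame: low nibble of each byte first."""
    nibbles = []
    for byte in PREAMBLE_SFD + frame:
        nibbles += [byte & 0x0F, byte >> 4]
    return nibbles
```

For a 5-byte payload, `build_frame` produces the 64-byte minimum frame (6 + 6 + 2 + 46 + 4), and `mii_nibbles` turns it into 144 nibbles whose first 16 are the preamble and SFD (fifteen 0x5 nibbles followed by 0xD).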
For the Destination and Source Addresses, the onboard DIP switches are used to assign a unique address to each board; since we are not on a real network, MAC address conflicts are not a concern. (The onboard Silicon Serial Number contains a unique 48-bit MAC address that could be used if real addresses were needed; however, those addresses are not registered with the IEEE, so they would not be useful in a real network either.) An implementation detail discovered through successive rounds of testing was that Ethernet switches do not forward successive frames whose Destination Address is the broadcast address (FF:FF:FF:FF:FF:FF) to the other ports on the switch; this prevents broadcast storms, a common network problem. As a result, we had to sequence through all the available addresses and send every other frame to a particular device. Another approach we used was to send a dummy frame between broadcast frames to reset the switch and allow the port to keep transmitting (switches generally disable a port that sends out broadcast frames for a timeout of 1-100 ms; however, a frame with a unique destination address sent from the same port resets the timeout).

The CRC generation proved to be a difficult part to implement. We used the hardware design given in the Xilinx application note [8] to generate our CRCs. Quirks of CRC generation include the fact that the resulting CRC has to be bit-reversed and complemented. In addition, the first 32 bits of the input data have to be complemented, which is accomplished by initializing the CRC register to all 1s.

3.3 PS/2 Keyboard Interface
The PS/2 interface was implemented as a VHDL process, sensitive to PS2_CLK1_H and PB_ENTER, that reads the PS2_DATA1_H signal on each falling edge of PS2_CLK1_H (per the standard device-to-host communication protocol) and stores it in a register. A 10-bit shift register is implicitly defined in the block to store the scan code and parity. A shift register is necessary because the keyboard sends data in 11-bit frames (start bit, 8 data bits, parity bit, stop bit), least significant bit first. As with the design, the implementation is simple yet efficient.
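The 11-bit device-to-host frame can be checked with the Python sketch below, which decodes the bits sampled on successive falling edges of PS2_CLK1_H: start bit 0, eight data bits LSB first, an odd-parity bit, and stop bit 1. It is a behavioural model only; the encoder exists just to exercise the decoder, and the timeout detection of the VHDL block is not modelled.

```python
def ps2_encode(byte):
    """Build an 11-bit device-to-host frame (used only to exercise the decoder)."""
    data = [(byte >> i) & 1 for i in range(8)]   # data bits, LSB first
    parity = 1 - (sum(data) % 2)                 # odd parity over data + parity
    return [0] + data + [parity] + [1]           # start, data, parity, stop


def ps2_decode(bits):
    """Decode 11 bits sampled on successive falling edges of PS2_CLK1_H."""
    if len(bits) != 11:
        raise ValueError("expected 11 bits")
    start, data, parity, stop = bits[0], bits[1:9], bits[9], bits[10]
    if start != 0 or stop != 1:
        raise ValueError("framing error")
    if (sum(data) + parity) % 2 != 1:
        raise ValueError("parity error")
    return sum(bit << i for i, bit in enumerate(data))  # reassemble LSB first
```

For example, the scan code 0x1C arrives as `[0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]`: three data ones already give odd parity, so the parity bit is 0.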
The block also detects when a break code is sent and, for validation purposes, includes a timeout after which it is certain that the device-to-host communication has finished.

3.4 SVGA Adaptor and FMS3818 Triple Video DAC
Only the implementation of the non-trivial blocks is covered here.

3.4.1 VGA_sync
To produce the required 25 MHz pixel clock from the 100 MHz system clock, a standard clock divider is used with an input divisor of 1. This down-converts the 100 MHz clock appropriately. Two counters track when horizontal and vertical synchronization pulses should be sent. The horizontal sync counter is clocked by the pixel clock and counts up to one line's worth of pixels (800), as specified by the input port Hres. When its maxcount output goes high, one line has been traversed. This output clocks the second counter, which tracks lines in a similar manner. When the second counter's maxcount port goes high (after 520 lines), an entire frame has been displayed. To output the synchronization signals at the correct times, two small state machines generate the horizontal and vertical sync and blanking signals, respectively. They are almost identical in structure, so only the state_horiz state machine is detailed here. The state_horiz machine requires two external parameters: the lengths of the synchronization and blanking intervals, in pixel clock cycles.

Upon reset, the FSM remains in the Idle state until 640 pixels of real data have been displayed, as tracked by the horizontal sync counter (known to the FSM as h_count). Next, the front porch is counted off in the Assert_blank state, where only the blanking signal is asserted. Then the horizontal sync is asserted along with blanking in the Assert_hsync state, where the FSM remains for hsync_len cycles. Finally, the FSM reaches the back porch of the VGA signal in the Deassert_hSync state; here the blanking output remains asserted, but the FSM lifts the horizontal sync. The FSM exits Deassert_hSync once the entire line has been read and maxcount on the coupled counter goes high (the signal h_trigger to the FSM). The return to Idle marks the start of a new line. The state_vert FSM works similarly, except that its parameters are stated in terms of lines. The blanking signals from both state machines are ORed together to produce Blank_Z, the net blanking signal. An OR is used because both outputs are active low. Although it may be possible to implement this component without a formal state machine, some memory of state, whether in the form of a counter or flip-flops, would still be required. Additional combinational logic could reproduce the same functionality, but at a significantly higher level of complexity. As such, the use of two relatively simple state machines is preferred in our design.

3.5 Character Generation
The implementation of Character Generation is fairly intuitive. The FSM takes several inputs, such as the outputs of counters tracking the number of pixels and characters displayed, as well as the current horizontal and vertical pixel counts generated by the VGA timing generator described in Section 3.4. The FSM starts when the horizontal and vertical pixel counts reach a fixed location (the top right).
This is necessary because the display is refreshed every 16.67 ms (60 Hz refresh rate), and the content must be drawn at the same position on every refresh; otherwise, the text would appear to move across the screen. The subsequent states set the control signals that load the combinational logic, which converts the ASCII representation of each character into its character ROM address. The 8-bit row data from the character ROM is loaded into a left-shift register, which is then shifted left 8 times; each bit is copied to the VGA display through the VGA RGB signals. After the first line of the first character is drawn, the first line of the next character is fetched and the process is repeated. Once the FSM has drawn the first lines of all the characters, it resets and moves on to the second lines of all the characters. This process repeats until all the characters are fully drawn (see Appendix C.2 for the simulation result).
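The row-by-row drawing order and the shift-register output can be sketched in Python. The 8x8 glyph size, the address mapping `ascii_code * 8 + row`, and the two-glyph ROM contents are hypothetical; the report does not specify the ROM layout.

```python
ROWS_PER_CHAR = 8  # assumed glyph height


def rom_address(ascii_code: int, row: int) -> int:
    """Hypothetical mapping from (ASCII code, glyph row) to a character ROM address."""
    return ascii_code * ROWS_PER_CHAR + row


# Hypothetical ROM contents holding only 'H' and 'I' (MSB = leftmost pixel).
GLYPHS = {
    "H": [0x66, 0x66, 0x66, 0x7E, 0x66, 0x66, 0x66, 0x00],
    "I": [0x3C, 0x18, 0x18, 0x18, 0x18, 0x18, 0x3C, 0x00],
}
CHAR_ROM = {
    rom_address(ord(c), r): byte
    for c, rows in GLYPHS.items()
    for r, byte in enumerate(rows)
}


def shift_out(byte: int):
    """Load a byte into an 8-bit left-shift register and emit its MSB 8 times."""
    reg = byte & 0xFF
    for _ in range(8):
        yield (reg >> 7) & 1     # bit driven onto the RGB outputs
        reg = (reg << 1) & 0xFF  # shift left, zero fill


def render(text: str):
    """Draw row 0 of every character, then row 1, and so on."""
    lines = []
    for row in range(ROWS_PER_CHAR):
        pixels = []
        for ch in text:
            pixels.extend(shift_out(CHAR_ROM[rom_address(ord(ch), row)]))
        lines.append("".join("#" if p else "." for p in pixels))
    return lines
```

For example, the first string returned by `render("HI")` is `.##..##...####..`: the top row of 'H' followed by the top row of 'I'.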

4. DESIGN VERIFICATION

The primary design verification we wanted to carry out depended on a successful implementation of the Ethernet MAC layer: we would connect five XUP boards to an Ethernet switch and try different configurations of boards broadcasting and streaming data, to verify that our implementation could be used in a larger networked context. This is precisely what we did, and it verified that the MAC layer was fully functional.

4.1 Tests

Because we were working with an FPGA development board, direct testing was difficult: we could not simply attach oscilloscope or logic analyzer probes to packaged IC chips and observe the state of the device. Instead, we relied primarily on simulation to verify that the hardware designs we implemented were functional and behaved as expected. It is also difficult to simulate the behavior of external chips such as the audio codec and the Ethernet PHY transceiver without external libraries that describe their behavior to the simulator. As such, for simulation purposes we had to assume that these chips would behave as outlined in their datasheets. The development board has a limited number of debug header pins that can be connected to a logic analyzer; we used them for individual component testing as a last resort to determine why a component was not working as designed. Besides simulation and logic analyzer testing, a successful run of the synthesis tool indicated to a large extent that the design had been mapped into hardware successfully and did not rely heavily on behavioral components.

4.1.1 Counter Simulation Testing

We designed a 20-bit counter with a separate enable pin to avoid gating the clock. In addition, two separate inputs define the value from which to start counting and the value at which to stop.
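A behavioral sketch of such a counter in Python is shown below. The class name, port names, and the wrap-to-start behavior at the stop value are assumptions for illustration; the point is that the enable input holds the register while the clock itself keeps running, which is what avoids a gated clock.

```python
class Counter20:
    """Hypothetical behavioral model of a 20-bit counter with enable, start, and stop."""

    WIDTH = 20
    MASK = (1 << WIDTH) - 1

    def __init__(self, start: int, stop: int):
        self.start = start & self.MASK
        self.stop = stop & self.MASK
        self.value = self.start

    @property
    def maxcount(self) -> bool:
        """High while the count equals the stop value."""
        return self.value == self.stop

    def clock(self, enable: bool) -> None:
        """One rising edge of the free-running clock."""
        if not enable:
            return  # hold: the clock still runs, the register simply does not load
        # Assumed behavior: wrap back to the start value after reaching stop.
        self.value = self.start if self.maxcount else (self.value + 1) & self.MASK
```

Configured with `start=0, stop=799`, for instance, this would play the role of the horizontal pixel counter from Section 3.4.1.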
The counter was used extensively throughout the project, so we felt that simulation verification of its correct operation was necessary. The simulation output, with a description of the test, can be found in Appendix C.1.

4.1.2 AC97 Codec Simulation Testing

The AC97 codec simulation was used to verify that the SYNC signal was asserted at the appropriate time and that the frames were of the correct length. The simulation output, with a description of the test, can be found in Appendix C.3.

4.1.3 VGA Timing Generator Simulation Testing

Initially, a misunderstanding of the VGA standard caused considerable problems: we erroneously thought that elapsed time, rather than pixel count, dictated the length of the various porches. After the design was changed to count pixels and lines for the porches, it functioned as expected. The simulation in Appendix C.4 shows the assertion of the HSYNC, Blank, and VSYNC signals at the appropriate times; these were matched against the VGA timing reference diagram given in Appendix A.4.
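The two properties checked in the AC97 simulation of Section 4.1.2 can be restated as a small Python model of the AC'97 serial frame: 256 BIT_CLK cycles per frame (a 16-bit tag slot followed by twelve 20-bit slots), with SYNC high during the tag slot. At the standard 12.288 MHz BIT_CLK this gives a 48 kHz frame rate. The function names are hypothetical, and the model ignores slot contents entirely.

```python
BIT_CLK_HZ = 12_288_000  # AC'97 serial bit clock
TAG_BITS = 16            # slot 0 (tag)
DATA_SLOTS = 12
SLOT_BITS = 20
FRAME_BITS = TAG_BITS + DATA_SLOTS * SLOT_BITS  # 256 BIT_CLK cycles


def sync_waveform(frames: int):
    """SYNC level per BIT_CLK cycle: high for the 16 tag bits of every frame."""
    return [1 if bit < TAG_BITS else 0
            for _ in range(frames) for bit in range(FRAME_BITS)]


def rising_edges(sync):
    """Indices where SYNC goes from low to high (index 0 counts if SYNC starts high)."""
    edges, prev = [], 0
    for i, level in enumerate(sync):
        if level and not prev:
            edges.append(i)
        prev = level
    return edges


def frame_lengths(sync):
    """Frame length measured as the distance between successive SYNC rising edges."""
    edges = rising_edges(sync)
    return [b - a for a, b in zip(edges, edges[1:])]
```

A simulation or logic analyzer trace that passes this kind of check shows SYNC rising every 256 BIT_CLK cycles and staying high for exactly 16 of them.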

4.1.3 AC97, Ethernet Testing

Even though functional testing verified that the AC97 plus Ethernet design worked as expected, we performed logic analyzer testing to ensure that the frames were of the appropriate lengths. In addition, the sound on the remote side at times had noise and static; we used the logic analyzer to isolate and solve that problem. The logic analyzer output can be found in Appendix C.5 with a description of the output.

4.1.4 MAC Testing

To ensure the functionality of the MAC layer, we took an incremental approach to testing. The first step was to produce a design that appeared to simulate properly in ModelSim. The next step was to synthesize the design and download it to the XUP boards. For our initial tests, we connected two boards to each other using a crossover cable and used a logic analyzer to verify that data could be transferred properly between the two devices. Next, we connected the two devices to an Ethernet switch. The logic analyzer traces showed that our frames were being dropped, which prompted a close examination of the frames we were sending. This led to our next test, which involved connecting one device directly to a computer using a crossover cable. The computer would try to capture the frames produced by our device using the program Ethereal [6]. However, this approach did not work, because frames that the Ethernet card considers invalid are discarded without notifying software. This led us to examine what a valid frame actually looks like. WinPcap [7] is a library that provides an API for capturing and sending raw frames through the Ethernet card. We modified an example program to send a frame to our device and then examined the results captured by the logic analyzer. We found that the preamble and SFD (Start Frame Delimiter) differed from those specified in the IEEE 802.3 standard.
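For reference, the framing that IEEE 802.3 requires can be sketched in Python: seven 0x55 preamble octets, a 0xD5 SFD, and a CRC-32 frame check sequence (FCS) computed over the header and padded payload. Octets are transmitted least significant bit first, so the preamble appears on the wire as 10101010 and the SFD as 10101011. The MAC addresses and EtherType used with this sketch are hypothetical placeholders, and the bit-serial CRC below is a straightforward textbook form, not the circuit from the Xilinx application note [8].

```python
PREAMBLE = bytes([0x55] * 7)  # 7 octets of alternating 1/0
SFD = bytes([0xD5])           # Start Frame Delimiter
MIN_FRAME = 60                # minimum header + payload length before the 4-byte FCS


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32 as used for the Ethernet FCS: reflected polynomial
    0xEDB88320, register preset to all ones, result complemented,
    each octet consumed least significant bit first."""
    crc = 0xFFFFFFFF
    for byte in data:
        for i in range(8):
            bit = (byte >> i) & 1
            if (crc ^ bit) & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def build_frame(dst: bytes, src: bytes, ethertype: int, payload: bytes) -> bytes:
    """Return preamble + SFD + header + padded payload + FCS, in transmit order."""
    body = dst + src + ethertype.to_bytes(2, "big") + payload
    body += bytes(max(0, MIN_FRAME - len(body)))       # zero-pad short frames
    fcs = crc32_bitwise(body).to_bytes(4, "little")    # FCS low-order byte goes first
    return PREAMBLE + SFD + body + fcs


def wire_bits(data: bytes) -> str:
    """Bits in transmission order: each octet least significant bit first."""
    return "".join(str((b >> i) & 1) for b in data for i in range(8))
```

Comparing `wire_bits(PREAMBLE + SFD)` with a logic analyzer capture is one way to spot bit-ordering mistakes of the kind described above; a receiver recomputes the CRC over the body and compares it with the received FCS.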
This extensive testing finally led to a breakthrough when we were able to generate a valid CRC, which had been our main problem. A misreading of the Xilinx CRC application note [8] had led to a misunderstanding of how the CRC module should be used.

4.2 Conclusions

The testing outlined above helped verify that the components were working as expected and gave us greater confidence that the final integrated design would work properly. In the end, we followed our initial Design Review testing plan to verify the correct functioning of our project.

5. Cost

Table 1 Labor and Parts Breakdown
Labor
Haider Ali: $50/hr x 2.5 x 100 hrs = $12,500
Kent Borecky: $50/hr x 2.5 x 100 hrs = $12,500
Parts
XUP Development Board (available): $300
ISSI IS61LV5128T 512K x 8 SRAM module (available): $50
Cypress CY7C68013 USB controller (available): $50
D-Link Ethernet switch: $25
Dynex 7 CAT6 network cable (5 cables at $15 each): $75
TOTAL: $25,500

In the end, our labor and parts costs were in line with our Design Review estimates, as Table 1 shows. Since the project was a proof of concept, it would not be feasible to project a price at which it could be sold to end users. Rather, if it were to be sold to customers, an ASIC or a packaged FPGA device with only the necessary external devices would be more appropriate. As a ballpark reference, given that iPods currently sell for around $300, a device with iPod functionality plus network connectivity would likely fall in the $400-$500 range.
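The totals in Table 1 can be re-derived with a few lines of Python; the 2.5 multiplier is taken from the table as given, and the function name is hypothetical.

```python
def table1_totals():
    """Recompute the labor, parts, and grand totals listed in Table 1."""
    rate, multiplier, hours = 50, 2.5, 100  # $/hr, factor from Table 1, hours per person
    labor = {
        "Haider Ali": rate * multiplier * hours,
        "Kent Borecky": rate * multiplier * hours,
    }
    parts = {
        "XUP Development Board": 300,
        "ISSI IS61LV5128T SRAM module": 50,
        "Cypress CY7C68013 USB controller": 50,
        "D-Link Ethernet switch": 25,
        "Dynex CAT6 network cables (5 x $15)": 5 * 15,
    }
    labor_total = sum(labor.values())
    parts_total = sum(parts.values())
    return labor_total, parts_total, labor_total + parts_total
```

Calling `table1_totals()` returns `(25000.0, 500, 25500.0)`, matching the $25,500 total.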

6. CONCLUSIONS

From the outset, Pod-Casting was a rather ambitious project, even when focusing only on the Ethernet MAC implementation. Although we were initially unable to get the MAC layer fully functional, we did manage to fix the problems we were having with the CRC module of the Ethernet frame structure. After fixing this problem, our project was able to support multiple XUP boards communicating via an Ethernet switch, simultaneously broadcasting and listening to streams. In addition, we integrated the keyboard and VGA for asynchronous audio information updates into the rest of the design. As such, Pod-Casting adhered to the initial project plan to implement an audio streaming architecture, with all the features proposed in the design review.

One clear improvement on the current implementation would be a more efficient means of broadcasting data. In the current paradigm, the broadcaster acts like an AM/FM radio station, sending data to everyone. Given the constraints on bandwidth, it might be more useful to let other devices detect the presence of a broadcasting device and manually choose to join the broadcast. The issue of not being able to use the broadcast address would then reduce to sequencing through all connected devices listening to the audio stream and ensuring that every listener receives all the audio frames, preventing static and noise on the receiving side. Because we are operating at such a low level of abstraction, with no reliable delivery mechanism, the next step toward an improved streaming experience would be to implement IP and carry the audio data in TCP and/or UDP packets. Given the widespread availability of cheap DSP chips that perform hardware-based MPEG Layer III decoding, it is questionable whether it is really worth reinventing the wheel, at least for the decoder part of the project.
Nonetheless, even after this experience, the project developers feel that the basic premise of the project is valid, and it would be interesting to see in which direction a peer-to-peer (P2P) network of devices would develop. A more realistic undertaking could perhaps use a soft-core RISC processor on the Virtex-II Pro, on which the MPEG decoder could be implemented.

REFERENCES

[1] National Semiconductor, LM4550 AC '97 Rev 2.1 Multi-Channel Audio Codec with Stereo Headphone Amplifier, Sample Rate Conversion and National 3D Sound, [Online Document], May 2004 [cited 12 Jun 2005], Available HTTP: http://www.national.com/ds.cgi/lm/lm4550.pdf
[2] Intel, Intel LXT972A 3.3V Dual-Speed Fast Ethernet Transceiver Datasheet, [Online Document], August 2002 [cited 16 Jun 2005], Available HTTP: http://www.intel.com/design/network/products/lan/datashts/24918603.pdf
[3] ISSI, 512K x 8 High-Speed CMOS Static RAM, [Online Document], February 2003 [cited 16 Jun 2005], Available HTTP: http://courses.ece.uiuc.edu/ece412/docs/xup/61lv5128al.pdf
[4] Fairchild Semiconductor, FMS3818 Triple Video D/A Converter 3 x 8 bit, 180 Ms/s, [Online Document], November 2001 [cited 16 Jun 2005], Available HTTP: http://courses.ece.uiuc.edu/ece412/docs/xup/fms3818krc.pdf
[5] MIT, Audio Input and Output, [Online Document], April 2004 [cited 30 Jun 2005], Available HTTP: http://web.mit.edu/6.111/www/s2004/newkit/audio.shtml
[6] Ethereal, Network Analyzer Software, [Online Website], July 2005 [cited 30 Jun 2005], Available HTTP: http://www.ethereal.com/
[7] WinPcap, Network Capture API, [Online Website], July 2005 [cited 30 Jun 2005], Available HTTP: http://www.winpcap.org/
[8] Xilinx, 802.3 CRC Application Note (XAPP209), [Online Document], July 2005 [cited 30 Jun 2005], Available HTTP: http://direct.xilinx.com/bvdocs/appnotes/xapp209.pdf

Appendix A Block Diagrams
Appendix B Hardware Description Language Schematics
Appendix C Simulations
Appendix D Spectral Analysis