XILINX

XC4000 Series Field Programmable Gate Arrays

July 30, 1996 (Version 1.03)    Product Specification

XC4000-Series Features

Note: XC4000-Series devices described in this data sheet include the XC4000E, XC4000EX, XC4000L, and XC4000XL. This information does not apply to the older Xilinx families: XC4000, XC4000A, XC4000D or XC4000H. For information on these devices, see the Xilinx WEBLINX at http://www.xilinx.com.

* Third Generation Field-Programmable Gate Arrays
  - Select-RAM memory: on-chip ultra-fast RAM with
    - synchronous write option
    - dual-port RAM option
  - Fully PCI compliant (speed grades -3 and faster)
  - Abundant flip-flops
  - Flexible function generators
  - Dedicated high-speed carry logic
  - Wide edge decoders on each edge
  - Hierarchy of interconnect lines
  - Internal 3-state bus capability
  - 8 global low-skew clock or signal distribution networks
* System Performance to 66 MHz
* Flexible Array Architecture
* Systems-Oriented Features
  - IEEE 1149.1-compatible boundary scan logic support
  - Individually programmable output slew rate
  - Programmable input pull-up or pull-down resistors
  - 12-mA sink current per XC4000E output (4 mA per XC4000L output)
* Configured by Loading Binary File
  - Unlimited reprogrammability
* Readback Capability
* Backward Compatible with XC4000 Devices
* XACTstep Development System runs on '386/'486/Pentium-type PC, Sun-4, and Hewlett-Packard 700 series
  - Interfaces to popular design environments
  - Fully automatic mapping, placement and routing
  - Interactive design editor for design optimization
  - RAM/ROM compiler

Low-Voltage Versions Available

* Low-Voltage Devices Function at 3.0 - 3.6 Volts
* XC4000L: Low-Voltage Versions of XC4000E devices
* XC4000XL: Low-Voltage Versions of XC4000EX devices

Additional XC4000EX/XL Features

* Highest Capacity - Over 130,000 Usable Gates
* Additional Routing Over XC4000E
  - almost twice the routing capacity for high-density designs
* Buffered Interconnect for Maximum Speed
* New Latch Capability in Configurable
Logic Blocks
* Improved VersaRing I/O Interconnect for Better Fixed Pinout Flexibility
* Flexible New High-Speed Clock Network
  - 8 additional Early Buffers for shorter clock delays
  - 4 additional FastCLK buffers for fastest clock input
  - Virtually unlimited number of clock signals
* Optional Multiplexer or 2-input Function Generator on Device Outputs
* High-Speed Parallel Express Configuration Mode
* Improved I/O Setup and Clock-to-Output with FastCLK and Global Early Buffers
* 4 Additional Address Bits in Master Parallel Configuration Mode

Introduction

XC4000-Series high-performance, high-capacity Field Programmable Gate Arrays (FPGAs) provide the benefits of custom CMOS VLSI, while avoiding the initial cost, long development cycle, and inherent risk of a conventional masked gate array.

The result of eleven years of FPGA design experience and feedback from thousands of customers, these FPGAs combine architectural versatility, on-chip Select-RAM memory with edge-triggered and dual-port modes, increased speed, abundant routing resources, and new, sophisticated software to achieve fully automated implementation of complex, high-density, high-performance designs.

The XC4000 Series currently has 19 members, as shown in Table 1.

Table 1: XC4000-Series Field Programmable Gate Arrays

             Max Logic  Max. RAM    Typical                                         Max. Decode
             Gates      Bits        Gate Range         CLB     Total  Number of     Inputs     Max.
Device       (No RAM)   (No Logic)  (Logic and RAM)*   Matrix  CLBs   Flip-Flops    per side   User I/O
XC4003E       3,000      3,200       2,000 - 5,000     10x10     100     360           30         80
XC4005E/L     5,000      6,272       3,000 - 9,000     14x14     196     616           42        112
XC4006E       6,000      8,192       4,000 - 12,000    16x16     256     768           48        128
XC4008E       8,000     10,368       6,000 - 15,000    18x18     324     936           54        144
XC4010E/L    10,000     12,800       7,000 - 20,000    20x20     400   1,120           60        160
XC4013E/L    13,000     18,432      10,000 - 30,000    24x24     576   1,536           72        192
XC4020E      20,000     25,088      13,000 - 40,000    28x28     784   2,016           84        224
XC4025E      25,000     32,768      15,000 - 45,000    32x32   1,024   2,560           96        256
XC4028EX/XL  28,000     32,768      18,000 - 50,000    32x32   1,024   2,560           96        256
XC4036EX/XL  36,000     41,472      22,000 - 65,000    36x36   1,296   3,168          108        288
XC4044EX/XL  44,000     51,200      27,000 - 80,000    40x40   1,600   3,840          120        320
XC4052XL     52,000     61,952      33,000 - 100,000   44x44   1,936   4,576          132        352
XC4062XL     62,000     73,728      40,000 - 130,000   48x48   2,304   5,376          144        384

Larger Devices Available in the First Half of 1997

Note: Throughout the functional descriptions in this document, references to the XC4000E device family include the XC4000L, and references to the XC4000EX device family include the XC4000XL, unless explicitly stated otherwise. References to the XC4000 Series include the XC4000E, XC4000EX, XC4000L, and XC4000XL families. All functionality in low-voltage families is the same as in the corresponding 5-Volt family, except where numerical references are made to timing, power, or current-sinking capability.

Description

XC4000-Series devices are implemented with a regular, flexible, programmable architecture of Configurable Logic Blocks (CLBs), interconnected by a powerful hierarchy of versatile routing resources, and surrounded by a perimeter of programmable Input/Output Blocks (IOBs). They have generous routing resources to accommodate the most complex interconnect patterns.

The devices are customized by loading configuration data into internal memory cells.
The FPGA can either actively read its configuration data from an external serial or byte-parallel PROM (master modes), or the configuration data can be written into the FPGA from an external device (slave, peripheral and Express modes).

* Table 1: Max values of Typical Gate Range include 20-30% of CLBs used as RAM.

XC4000-Series FPGAs are supported by powerful and sophisticated software, covering every aspect of design from schematic or behavioral entry, floorplanning, simulation, automatic block placement and routing of interconnects, to the creation, downloading, and readback of the configuration bit stream.

Because Xilinx FPGAs can be reprogrammed an unlimited number of times, they can be used in innovative designs where hardware is changed dynamically, or where hardware must be adapted to different user applications. FPGAs are ideal for shortening design and development cycles, and also offer a cost-effective solution for production rates well beyond 5,000 systems per month. For lowest high-volume unit cost, a design can first be implemented in the XC4000E or XC4000EX, then migrated to one of Xilinx's compatible HardWire mask-programmed devices.

Table 2 shows density and performance for a few common circuit functions that can be implemented in XC4000-Series devices.

Table 2: Density and Performance for Several Common Circuit Functions in XC4000E (Note 1)

Design Class  Function                                   CLBs Used  XC4000E-3  XC4000E-2  Units
Memory        256 x 8 Single Port (read/modify/write)        72         63         80     MHz
              32 x 16 bit FIFO
                simultaneous read/write                      48         63         80     MHz
                MUXed read/write                             32         63         80     MHz
Logic         9 bit Shift Register (with enable)              5        170        200     MHz
              16 bit Pre-Scaled Counter                       8        142        170     MHz
              16 bit Loadable Counter                         8         65         76     MHz
              16 bit Accumulator                              9         65         76     MHz
              8 bit, 16 tap FIR Filter, sample rate
                parallel                                    400         55         65     MHz
                serial                                       68        8.1         10     MHz
              8 x 8 Parallel Multiplier
                single stage, register to register            8         37         30     ns
              16 bit Address Decoder (internal decode)        3        4.7        3.9     ns
              9 bit Parity Checker                            1        4.3        2.7     ns

Note 1: Most functions are faster in XC4000EX due to faster carry logic, direct connects, and other additional interconnect.

Taking Advantage of Reconfiguration

FPGA devices can be reconfigured to change logic function while resident in the system. This capability gives the system designer a new degree of freedom not available with any other type of logic.

Hardware can be changed as easily as software. Design updates or modifications are easy, and can be made to products already in the field. An FPGA can even be reconfigured dynamically to perform different functions at different times.

Reconfigurable logic can be used to implement system self-diagnostics, create systems capable of being reconfigured for different environments or operations, or implement multi-purpose hardware for a given application. As an added benefit, using reconfigurable FPGA devices simplifies hardware design and debugging and shortens product time-to-market.

XC4000E and XC4000EX Families Compared to the XC4000

For readers already familiar with the XC4000 family of Xilinx Field Programmable Gate Arrays, the major new features in the XC4000-Series devices are listed in this section.

The biggest advantages of XC4000E and XC4000EX devices are significantly increased system speed, greater capacity, and new architectural features, particularly Select-RAM memory. The XC4000EX devices also offer many new routing features, including special high-speed clock buffers that can be used to capture input data with minimal delay.
Any XC4000E device is pinout- and bitstream-compatible with the corresponding XC4000 device. An existing XC4000 bitstream can be used to program an XC4000E device. However, since the XC4000E includes many new features, an XC4000E bitstream cannot be loaded into an XC4000 device.

Most XC4000EX devices have no corresponding XC4000 devices, because of the larger CLB arrays. The XC4028EX has the same array size as the XC4025 and XC4025E, but is not bitstream-compatible. However, the XC4025, XC4025E, and XC4028EX are all pinout-compatible.

Improvements in XC4000E and XC4000EX

Increased System Speed

Delays in FPGA-based designs are layout dependent. As a rule of thumb, the system clock rate should not exceed one third to one half of the specified toggle rate. Critical portions of a design, such as shift registers and simple counters, can run faster, at approximately two thirds of the specified toggle rate.

XC4000E and XC4000EX devices can run at synchronous system clock rates of up to 66 MHz, and internal performance can exceed 150 MHz. This increase in performance over the previous families stems from improvements in both device processing and system architecture. XC4000-Series devices use a sub-micron triple-layer metal process. In addition, many architectural improvements have been made, as described below.

PCI Compliance

XC4000-Series -3 and faster speed grades are fully PCI compliant. XC4000E and XC4000EX devices can be used to implement a one-chip PCI solution.

Carry Logic

The speed of the carry logic chain has increased dramatically. Some parameters, such as the delay on the carry chain through a single CLB (TBYP), have improved by as much as 50% from XC4000 values. See Fast Carry Logic on page 21 for more information.

Select-RAM Memory: Edge-Triggered, Synchronous RAM Modes

The RAM in any CLB can be configured for synchronous, edge-triggered write operation.
The read operation is not affected by this change to an edge-triggered write.

Dual-Port RAM

A separate option converts the 16x2 RAM in any CLB into a 16x1 dual-port RAM with simultaneous Read/Write.

The function generators in each CLB can be configured as either level-sensitive (asynchronous) single-port RAM, edge-triggered (synchronous) single-port RAM, edge-triggered (synchronous) dual-port RAM, or as combinatorial logic.

Configurable RAM Content

The RAM content can now be loaded at configuration time, so that the RAM starts up with user-defined data.

H Function Generator

In XC4000-Series devices, the H function generator is more versatile than in the XC4000. Its inputs can come not only from the F and G function generators but also from up to three of the four control input lines. The H function generator can thus be totally or partially independent of the other two function generators, increasing the maximum capacity of the device.

IOB Clock Enable

The two flip-flops in each IOB have a common clock enable input, which through configuration can be activated individually for the input or output flip-flop or both. This clock enable operates exactly like the EC pin on the XC4000 CLB. This new feature makes the IOBs more versatile, and avoids the need for clock gating.

Output Drivers

The output pull-up structure defaults to a TTL-like totem-pole. This driver is an n-channel pull-up transistor, pulling to a voltage one transistor threshold below Vcc, just like the XC4000 outputs. Alternatively, XC4000-Series devices can be globally configured with CMOS outputs, with p-channel pull-up transistors pulling to Vcc. Also, the configurable pull-up resistor in the XC4000 Series is a p-channel transistor that pulls to Vcc, whereas in the XC4000 it is an n-channel transistor that pulls to a voltage one transistor threshold below Vcc.
Input Thresholds

The input thresholds can be globally configured for either TTL (1.2 V threshold) or CMOS (2.5 V threshold), just like XC2000 and XC3000 inputs. The two global adjustments of input threshold and output level are independent of each other.

Global Signal Access to Logic

There is additional access from global clocks to the F and G function generator inputs.

Configuration Pin Pull-Up Resistors

During configuration, the three mode pins, M0, M1, and M2, have weak pull-up resistors. For the most popular configuration mode, Slave Serial, the mode pins can thus be left unconnected.

The three mode inputs can be individually configured with or without weak pull-up or pull-down resistors after configuration.

The PROGRAM input pin has a permanent weak pull-up.

Soft Start-up

Like the XC3000A, XC4000-Series devices have Soft Start-up. When the configuration process is finished and the device starts up, the first activation of the outputs is automatically slew-rate limited. This feature avoids potential ground bounce when all outputs are turned on simultaneously. Immediately after start-up, the slew rate of the individual outputs is, as in the XC4000 family, determined by the individual configuration option.

XC4000 and XC4000A Compatibility

Existing XC4000 bitstreams can be used to configure an XC4000E device. XC4000A bitstreams must be recompiled for use with the XC4000E due to improved routing resources, although the devices are pin-for-pin compatible.

Additional Improvements in XC4000EX Only

Increased Routing

New interconnect in the XC4000EX includes twenty-two additional vertical lines in each column of CLBs and twelve new horizontal lines in each row of CLBs. The twelve Quad Lines in each CLB row and column include optional repowering buffers for maximum speed. Additional high-performance routing near the IOBs enhances pin flexibility.
Faster Input and Output

A fast, dedicated early clock sourced by global clock buffers is available for the IOBs. To ensure synchronization with the regular global clocks, a Fast Capture latch driven by the early clock is available. The input data can be initially loaded into the Fast Capture latch with the early clock, then transferred to the input flip-flop or latch with the low-skew global clock. A programmable delay on the input can be used to avoid hold-time requirements. See IOB Input Signals on page 24 for more information.

Latch Capability in CLBs

Storage elements in the XC4000EX CLB can be configured as either flip-flops or latches. This capability makes the FPGA highly synthesis-compatible.

IOB Output MUX From Output Clock

A multiplexer in the IOB allows the output clock to select either the output data or the IOB clock enable as the output to the pad. Thus, two different data signals can share a single output pad, effectively doubling the number of device outputs without requiring a larger, more expensive package. This multiplexer can also be configured as an AND-gate to implement a very fast pin-to-pin path. See IOB Output Signals on page 27 for more information.

Express Configuration Mode

A new slave configuration mode accepts parallel data input. Data is processed in parallel, rather than serialized internally. Therefore, the data rate is eight times that of the six conventional configuration modes.

Additional Address Bits

Larger devices require more bits of configuration data. A daisy chain of several large XC4000EX devices may require a PROM that cannot be addressed by the eighteen address bits supported in the XC4000E. The XC4000EX family therefore extends the addressing in Master Parallel configuration mode to 22 bits.

Table 3: CLB Count of Selected XC4000-Series Soft Macros

7400 Equivalents        CLBs
'138                      5
'139                      2
'147                      5
'148                      6
'150                      5
'151                      3
'152                      3
'153                      2
'154                     16
'157                      2
'158                      2
'160                      ?
'161                      ?
'162                      ?
'163                      ?
'164                      4
'165s                     9
'166                      5
'168                      7
'174                      3
'194                      5
'195                      3
'280                      3
'283                      8
'298                      2
'352                      2
'390                      3
'518                      3
'521                      3

Barrel Shifters         CLBs
brlshft4                  4
brlshft8                 13

4-Bit Counters          CLBs
cd4ce                     3
cd4cle                    5
cd4rle                    6
cb4ce                     3
cb4cle                    6
cb4re                     5

8- and 16-Bit Counters  CLBs
cb8ce                     6
cb8re                    10
cc16ce                    ?
cc16cle                   9
cc16cled                 21

Identity Comparators    CLBs
comp4                     1
comp8                     2
comp16                    5

Magnitude Comparators   CLBs
compm4                    ?
compm8                    9
compm16                  20

Multiplexers            CLBs
m2-1e                     1
m4-1e                     1
m8-1e                     3
m16-1e                    5

Registers               CLBs
rd4r                      2
rd8r                      4
rd16r                     8

Shift Registers         CLBs
sr8ce                     4
sr16re                    8

Decoders                CLBs
d2-4e                     2
d3-8e                     4
d4-16e                   16

RAMs                    CLBs
ram16x4                   2
ram16x4s                  2
ram16x4d                  4

Explanation of counter nomenclature
cb = binary counter
cd = BCD counter
cc = cascadable binary counter
d = bidirectional
l = loadable
e = clock enable
r = synchronous reset
c = asynchronous clear

Explanation of RAM nomenclature
s = single-port edge-triggered
d = dual-port edge-triggered
no extension = level-sensitive

Detailed Functional Description

XC4000-Series devices achieve high speed through advanced semiconductor technology and improved architecture. The XC4000E and XC4000EX support system clock rates of up to 66 MHz and internal performance in excess of 150 MHz. Compared to older Xilinx FPGA families, XC4000-Series devices are more powerful. They offer on-chip edge-triggered and dual-port RAM, clock enables on I/O flip-flops, and wide-input decoders. They are more versatile in many applications, especially those involving RAM. Design cycles are faster due to a combination of increased routing resources and more sophisticated software.

Basic Building Blocks

Xilinx user-programmable gate arrays include two major configurable elements: configurable logic blocks (CLBs) and input/output blocks (IOBs).

* CLBs provide the functional elements for constructing the user's logic.
* IOBs provide the interface between the package pins and internal signal lines.

Three other types of circuits are also available:

* 3-State buffers (TBUFs) driving horizontal longlines are associated with each CLB.
* Wide edge decoders are available around the periphery of each device.
* An on-chip oscillator is provided.

Programmable interconnect resources provide routing paths to connect the inputs and outputs of these configurable elements to the appropriate networks.

The functionality of each circuit block is customized during configuration by programming internal static memory cells. The values stored in these memory cells determine the logic functions and interconnections implemented in the FPGA.

Each of these available circuits is described in this section.

Configurable Logic Blocks (CLBs)

Configurable Logic Blocks implement most of the logic in an FPGA. The principal CLB elements are shown in Figure 1. The number of CLBs needed to implement selected soft macros is shown in Table 3.

Two 4-input function generators (F and G) offer unrestricted versatility. Most combinatorial logic functions need four or fewer inputs. However, a third function generator (H) is provided. The H function generator has three inputs. Zero, one, or two of these inputs can be the outputs of F and G; the other input(s) are from outside the CLB. The CLB can, therefore, implement certain functions of up to nine variables, like parity check or expandable-identity comparison of two sets of four inputs.

Each CLB contains two storage elements that can be used to store the function generator outputs. However, the storage elements and function generators can also be used independently. These storage elements can be configured as flip-flops in both XC4000E and XC4000EX devices; in the XC4000EX they can optionally be configured as latches. DIN can be used as a direct input to either of the two storage elements. H1 can drive the other through the H function generator.
Function generator outputs can also drive two outputs independent of the storage element outputs. This versatility increases logic capacity and simplifies routing.

Thirteen CLB inputs and four CLB outputs provide access to the function generators and storage elements. These inputs and outputs connect to the programmable interconnect resources outside the block.

Function Generators

Four independent inputs are provided to each of two function generators (F1 - F4 and G1 - G4). These function generators, with outputs labeled F and G, are each capable of implementing any arbitrarily defined Boolean function of four inputs. The function generators are implemented as memory look-up tables. The propagation delay is therefore independent of the function implemented.

A third function generator, labeled H, can implement any Boolean function of its three inputs. Two of these inputs can optionally be the F and G function generator outputs. Alternatively, one or both of these inputs can come from outside the CLB (H2, H0). The third input must come from outside the block (H1).

Signals from the function generators can exit the CLB on two outputs. F or H can be connected to the X output. G or H can be connected to the Y output.

A CLB can be used to implement any of the following functions:

* any function of up to four variables, plus any second function of up to four unrelated variables, plus any third function of up to three unrelated variables (Note 1)
* any single function of five variables
* any function of four variables together with some functions of six variables
* some functions of up to nine variables.

Note 1: When three separate functions are generated, one of the function outputs must be captured in a flip-flop internal to the CLB. Only two unregistered function generator outputs are available from the CLB.
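
The look-up-table behavior described above can be illustrated with a short Python sketch. It is a behavioral model only, not Xilinx software: the helper names `make_lut`, `read_lut`, and `parity9` are hypothetical, and the choice of the first input as the most significant address bit is an assumption made for the example. It builds a nine-variable parity check from two four-input tables (F and G) feeding a three-input table (H) whose third input is the external H1 signal.

```python
from itertools import product

def make_lut(func, n_inputs):
    """Tabulate func for every input combination, as a memory look-up table would."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]

def read_lut(table, *bits):
    """Use the input bits as an address into the table (first bit is the MSB)."""
    address = 0
    for bit in bits:
        address = (address << 1) | bit
    return table[address]

# F and G each see four inputs; H sees F, G, and the external H1 input.
F = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
G = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
H = make_lut(lambda f, g, h1: f ^ g ^ h1, 3)

def parity9(bits):
    """Return 1 when an odd number of the nine input bits are 1."""
    f = read_lut(F, *bits[0:4])
    g = read_lut(G, *bits[4:8])
    return read_lut(H, f, g, bits[8])
```

Because every function is just a different table content, the lookup cost is the same whatever function is stored, which mirrors the statement that propagation delay is independent of the function implemented.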
[Figure 1: Simplified Block Diagram of XC4000-Series CLB (RAM and Carry Logic functions not shown)]

Implementing wide functions in a single block reduces both the number of blocks required and the delay in the signal path, achieving both increased capacity and speed.

The versatility of the CLB function generators significantly improves system speed. In addition, the design-software tools can deal with each function generator independently. This flexibility improves cell usage.

Flip-Flops

The CLB can pass the combinatorial output(s) to the interconnect network, but can also store the combinatorial results or other incoming data in one or two flip-flops, and connect their outputs to the interconnect network as well.

The two edge-triggered D-type flip-flops have common clock (K) and clock enable (EC) inputs. Either or both clock inputs can also be permanently enabled. Storage element functionality is described in Table 4.

Latches (XC4000EX only)

The CLB storage elements can also be configured as latches. The two latches have common clock (K) and clock enable (EC) inputs. Storage element functionality is described in Table 4.

Table 4: CLB Storage Element Functionality (active rising edge is shown)

Mode              K     EC    SR    D    Q
Power-Up or GSR   X     X     X     X    SR
Flip-Flop         X     X     1     X    SR
                  __/   1*    0*    D    D
                  0     X     0*    X    Q
Latch             1     1*    0*    X    Q
                  0     1*    0*    D    D
Both              X     0     0*    X    Q

Legend:
X     Don't care
__/   Rising edge
SR    Set or Reset value. Reset is default.
0*    Input is Low or unconnected (default value)
1*    Input is High or unconnected (default value)

Clock Input

Each flip-flop can be triggered on either the rising or falling clock edge. The clock pin is shared by both storage elements.
However, the clock is individually invertible for each storage element. Any inverter placed on the clock input is automatically absorbed into the CLB.

Clock Enable

The clock enable signal (EC) is active High. The EC pin is shared by both storage elements. If left unconnected for either, the clock enable for that storage element defaults to the active state. EC is not invertible within the CLB.

Set/Reset

An asynchronous storage element input (SR) can be configured as either set or reset. This configuration option determines the state in which each flip-flop becomes operational after configuration. It also determines the effect of a Global Set/Reset pulse during normal operation, and the effect of a pulse on the SR pin of the CLB. All three set/reset functions for any single flip-flop are controlled by the same configuration data bit.

The set/reset state can be independently specified for each flip-flop. This input can also be independently disabled for either flip-flop.

The set/reset state is specified by using the INIT attribute, or by placing the appropriate set or reset flip-flop library symbol.

SR is active High. It is not invertible within the CLB.

Global Set/Reset

A separate Global Set/Reset line (not shown in Figure 1) sets or clears each storage element during power-up, reconfiguration, or when a dedicated Reset net is driven active. This global net (GSR) does not compete with other routing resources; it uses a dedicated distribution network.

Each flip-flop is configured as either globally set or reset in the same way that the local set/reset (SR) is specified. Therefore, if a flip-flop is set by SR, it is also set by GSR. Similarly, a reset flip-flop is reset by both SR and GSR.

GSR can be driven from any user-programmable pin as a global reset input. To use this global net, place an input pad and input buffer in the schematic or HDL code, driving the GSR pin of the STARTUP symbol. (See Figure 2.)
A specific pin location can be assigned to this input using a LOC attribute or property, just as with any other user-programmable pad. An inverter can optionally be inserted after the input buffer to invert the sense of the Global Set/Reset signal. Alternatively, GSR can be driven from any internal node.

[Figure 2: Schematic Symbols for Global Set/Reset]

Data Inputs and Outputs

The source of a storage element data input is programmable. It is driven by any of the functions F, G, and H, or by the Direct In (DIN) block input. The flip-flops or latches drive the XQ and YQ CLB outputs.

Two fast feed-through paths are available, as shown in Figure 1. A two-to-one multiplexer on each of the XQ and YQ outputs selects between a storage element output and any of the control inputs. This bypass is sometimes used by the automated router to repower internal signals.

Control Signals

Multiplexers in the CLB map the four control inputs (C1 - C4 in Figure 1) into the four internal control signals (H1, DIN/H2, SR/H0, and EC). Any of these inputs can drive any of the four internal control signals.

When the logic function is enabled, the four inputs are:

* EC - Enable Clock
* SR/H0 - Asynchronous Set/Reset or H function generator Input 0
* DIN/H2 - Direct In or H function generator Input 2
* H1 - H function generator Input 1.

When the memory function is enabled, the four inputs are:

* EC - Enable Clock
* WE - Write Enable
* D0 - Data Input to F and/or G function generator
* D1 - Data input to G function generator (16x1 and 16x2 modes) or 5th Address bit (32x1 mode).

Using FPGA Flip-Flops and Latches

The abundance of flip-flops in the XC4000 Series invites pipelined designs. This is a powerful way of increasing performance by breaking the function into smaller subfunctions and executing them in parallel, passing on the results through pipeline flip-flops.
This method should be seriously considered wherever throughput is more important than latency.

To include a CLB flip-flop, place the appropriate library symbol. For example, FDCE is a D-type flip-flop with clock enable and asynchronous clear. The corresponding latch symbol (for the XC4000EX only) is called LDCE.

In XC4000-Series devices, the flip-flops can be used as registers or shift registers without blocking the function generators from performing a different, perhaps unrelated task. This ability increases the functional capacity of the devices.

The CLB setup time is specified between the function generator inputs and the clock input K. Therefore, the specified CLB flip-flop setup time includes the delay through the function generator.

Using Function Generators as RAM

Optional modes for each CLB make the memory look-up tables in the F and G function generators usable as an array of Read/Write memory cells. Available modes are level-sensitive (similar to the XC4000/A/H families), edge-triggered, and dual-port edge-triggered. Depending on the selected mode, a single CLB can be configured as either a 16x2, 32x1, or 16x1 bit array.

Supported CLB memory configurations and timing modes for single- and dual-port modes are shown in Table 5.

XC4000-Series devices are the first programmable logic devices with edge-triggered (synchronous) and dual-port RAM accessible to the user. Edge-triggered RAM simplifies system timing. Dual-port RAM doubles the effective throughput of FIFO applications. These features can be individually programmed in any XC4000-Series CLB.

Advantages of On-Chip and Edge-Triggered RAM

The on-chip RAM is extremely fast. The read access time is the same as the logic delay. The write access time is slightly slower. Both access times are much faster than any off-chip solution, because they avoid I/O delays.
Edge-triggered RAM, also called synchronous RAM, is a feature never before available in a Field Programmable Gate Array. The simplicity of designing with edge-triggered RAM, and the markedly higher achievable performance, add up to a significant improvement over existing devices with on-chip RAM.

Three application notes are available from Xilinx that discuss edge-triggered RAM: "XC4000E Edge-Triggered and Dual-Port RAM Capability," "Implementing FIFOs in XC4000E RAM," and "Synchronous and Asynchronous FIFO Designs." All three application notes apply to both XC4000E and XC4000EX RAM.

Table 5: Supported RAM Modes

                                    Edge-Triggered  Level-Sensitive
              16x1   16x2   32x1    Timing          Timing
Single-Port   Yes    Yes    Yes     Yes             Yes
Dual-Port     Yes    -      -       Yes             -

RAM Configuration Options

The function generators in any CLB can be configured as RAM arrays in the following sizes:

* Two 16x1 RAMs: two data inputs and two data outputs with identical or, if preferred, different addressing for each RAM
* One 32x1 RAM: one data input and one data output.

One F or G function generator can be configured as a 16x1 RAM while the other function generators are used to implement any function of up to 5 inputs.

Additionally, the XC4000-Series RAM may have either of two timing modes:

* Edge-Triggered (Synchronous): data written by the designated edge of the CLB clock. WE acts as a true clock enable.
* Level-Sensitive (Asynchronous): an external WE signal acts as the write strobe.

The selected timing mode applies to both function generators within a CLB when both are configured as RAM.

The number of read ports is also programmable:

* Single Port: each function generator has a common read and write port
* Dual Port: both function generators are configured together as a single 16x1 dual-port RAM with one write port and two read ports. Simultaneous read and write operations to the same or different addresses are supported.

RAM configuration options are selected by placing the appropriate library symbol.
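
As an illustration of the dual-port option, the following Python sketch models a 16x1 RAM with one write port and two read ports. It is a behavioral sketch under stated assumptions, not a description of the silicon: the class name `DualPortRam16x1` and its methods are hypothetical, and the model assumes that one read port shares the write-port address while the other has its own address, which the text above does not spell out.

```python
class DualPortRam16x1:
    """Behavioral model: one synchronous write port, two combinational read ports."""

    def __init__(self, init=None):
        # 16 one-bit cells; contents may be preloaded, as at configuration time.
        self.cells = list(init) if init is not None else [0] * 16

    def clock_edge(self, we, write_addr, data):
        """Active clock edge: store data at write_addr only if WE is high."""
        if we:
            self.cells[write_addr & 0xF] = data & 1

    def read(self, write_addr, read_addr):
        """Return (bit at the write-port address, bit at the second read address)."""
        return self.cells[write_addr & 0xF], self.cells[read_addr & 0xF]
```

For example, after `ram.clock_edge(we=1, write_addr=3, data=1)`, a call to `ram.read(3, 5)` reads the freshly written cell on the first port and an unrelated cell on the second, which is the access pattern a FIFO uses when it writes and reads in the same cycle.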
Choosing a RAM Configuration Mode

The appropriate choice of RAM mode for a given design should be based on timing and resource requirements, desired functionality, and the simplicity of the design process. Recommended usage is shown in Table 6.

The difference between level-sensitive, edge-triggered, and dual-port RAM is only in the write operation. Read operation and timing are identical for all modes of operation.

Table 6: RAM Mode Selection

                           Level-       Edge-        Dual-Port
                           Sensitive    Triggered    Edge-Triggered
Use for New Designs?       No           Yes          Yes
Size (16x1, Registered)    1/2 CLB      1/2 CLB      1 CLB
Simultaneous Read/Write    No           No           Yes
Relative Performance       X            2X           2X (4X effective)

Figure 3: 16x2 (or 16x1) Edge-Triggered Single-Port RAM

Figure 4: 32x1 Edge-Triggered Single-Port RAM (F and G addresses are identical)

RAM Inputs and Outputs

The F1-F4 and G1-G4 inputs to the function generators act as address lines, selecting a particular memory cell in each look-up table.

The functionality of the CLB control signals changes when the function generators are configured as RAM. The DIN/H2, H1, and SR/H0 lines become the two data inputs (D0, D1) and the Write Enable (WE) input for the 16x2 memory. When the 32x1 configuration is selected, D1 acts as the fifth address bit and D0 is the data input.
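The 32x1 addressing described above can be sketched as follows. This is a minimal Python illustration, not a description of the silicon: the function names are hypothetical, and the assumption that D1 = 0 selects the F function generator (and D1 = 1 selects G) is made only for the example. The text states only that D1 acts as the fifth address bit and that the remaining four bits are presented on the function generator inputs.

```python
def split_32x1_address(addr: int) -> tuple[int, int]:
    """Split a 5-bit RAM address into (d1, lut_addr).

    d1       -- the fifth address bit, carried on the D1 control line
    lut_addr -- the 4-bit address applied to F1-F4 and G1-G4
    """
    if not 0 <= addr < 32:
        raise ValueError("a 32x1 RAM has addresses 0..31")
    d1 = (addr >> 4) & 1
    lut_addr = addr & 0xF
    return d1, lut_addr


def select_function_generator(d1: int) -> str:
    """Pick the look-up table holding the addressed cell.

    ASSUMPTION (not stated in the text): D1 = 0 selects F, D1 = 1 selects G.
    """
    return "G" if d1 else "F"
```

Under that assumption, address 19 (binary 10011) splits into D1 = 1 and a 4-bit address of 3, so the cell would sit in the G look-up table.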
The contents of the memory cell(s) being addressed are available at the F and G function-generator outputs. They can exit the CLB through its X and Y outputs, or can be captured in the CLB flip-flop(s).

Configuring the CLB function generators as Read/Write memory does not affect the functionality of the other portions of the CLB, with the exception of the redefinition of the control signals. In 16x2 and 16x1 modes, the H function generator can be used to implement Boolean functions of F, G, and D1, and the D flip-flops can latch the F, G, H, or D0 signals.

Single-Port Edge-Triggered Mode

Edge-triggered (synchronous) RAM simplifies timing requirements. XC4000-Series edge-triggered RAM timing operates like writing to a data register. Data and address are presented. The register is enabled for writing by a logic High on the write enable input, WE. Then a rising or falling clock edge loads the data into the register, as shown in Figure 5.

Complex timing relationships between address, data, and write enable signals are not required, and the external write enable pulse becomes a simple clock enable. The active edge of WCLK latches the address, input data, and WE signals. An internal write pulse is generated that performs the write. See Figure 3 and Figure 4 for block diagrams of a CLB configured as 16x2 and 32x1 edge-triggered, single-port RAM.

The relationships between CLB pins and RAM inputs and outputs for single-port, edge-triggered mode are shown in Table 7.

Figure 5: Edge-Triggered RAM Write Timing

The Write Clock input (WCLK) can be configured as active on either the rising edge (default) or the falling edge. It uses the same CLB pin (K) used to clock the CLB flip-flops, but it can be independently inverted. Consequently, the RAM output can optionally be registered within the same CLB either by the same clock edge as the RAM, or by the opposite edge of this clock.
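A minimal behavioral sketch of the edge-triggered write described above, in Python. The class and method names are hypothetical, the model covers a single 16x1 array, and it ignores all setup, hold, and write-pulse timing. It shows only that a write takes place on the active WCLK edge when WE is High, while a read simply returns the addressed cell.

```python
class EdgeTriggeredRam16x1:
    """Hypothetical behavioral model of one 16x1 edge-triggered RAM."""

    def __init__(self, rising_edge: bool = True):
        self.cells = [0] * 16
        self.rising_edge = rising_edge  # rising edge is the default sense
        self._prev_wclk = 0             # model starts with WCLK Low

    def read(self, addr: int) -> int:
        """Return the addressed cell; no clock is involved in reading."""
        return self.cells[addr & 0xF]

    def clock(self, wclk: int, we: int, addr: int, din: int) -> bool:
        """Apply a new WCLK level together with the current WE, address, and data.

        Returns True if this call performed a write.
        """
        rising = self._prev_wclk == 0 and wclk == 1
        falling = self._prev_wclk == 1 and wclk == 0
        active = rising if self.rising_edge else falling
        self._prev_wclk = wclk
        if active and we:
            # WE behaves as a clock enable: the edge alone does not write.
            self.cells[addr & 0xF] = din & 1
        return active and bool(we)
```

In this model, driving WCLK High with WE High stores the data bit, whereas the following falling edge, or a rising edge with WE Low, leaves the contents unchanged. Constructing the model with rising_edge=False mirrors the inverted WCLK option.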
The sense of WCLK applies to both function generators in the CLB when both are configured as RAM.

The WE pin is active-High and is not invertible within the CLB.

Note: The pulse following the active edge of WCLK (TWPS in Figure 5) must be less than one millisecond wide. For most applications, this requirement is not overly restrictive; however, it must not be forgotten. Stopping WCLK at this point in the write cycle could result in excessive current and even damage to the larger devices if many CLBs are configured as edge-triggered RAM.

Table 7: Single-Port Edge-Triggered RAM Signals

RAM Signal   CLB Pin                  Function
D            D0 or D1 (16x2, 16x1),   Data In
             D0 (32x1)
A[3:0]       F1-F4 or G1-G4           Address
A[4]         D1 (32x1)                Address
WE           WE                       Write Enable
WCLK         K