# RASP 2.8: A New Generation of Floating-gate based Field Programmable Analog Array

Arindam Basu\*, Christopher M. Twigg<sup>†</sup>, Stephen Brink\*, Paul Hasler\*, Csaba Petre\*, Shubha Ramakrishnan\*, Scott Koziol\* and Craig Schlottmann\*

\*School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332

<sup>†</sup>Department of Electrical and Computer Engineering, Binghamton University, Binghamton, New York 13902

Email: {arindamb,phasler}@ece.gatech.edu, ctwigg@binghamton.edu

Abstract—The RASP 2.8 is a powerful reconfigurable analog computing platform with thirty-two computational analog blocks (CABs). Each CAB has a wide variety of sub-circuits, ranging in granularity from multipliers and programmable-offset wide linear range Gm blocks to NMOS and PMOS transistors. The programmable interconnects and circuit elements in the CAB are implemented using floating-gate transistors. This system exhibits significant performance enhancements over its predecessor in achievable signal bandwidth (> 50 MHz), accuracy (> 9 bits), dynamic range (> 7 decades of current), floating-gate programming speed (> 200 gates/s) and isolation between ON and OFF switches. The improved bandwidth is primarily due to an improved routing fabric that includes nearest-neighbor connections. Programming performance improved drastically by implementing the entire algorithm on-chip with an SPI digital interface. Several complex system examples are presented.

## I. RASP 2.8: OVERVIEW

The RASP 2.8 FPAA is a powerful system comprising over fifty thousand programmable floating-gate elements, which can be used both as programmable interconnect and as adaptive computational elements. This yields a platform capable of performing signal processing and computational tasks beyond those of a typical digital signal processor, at a fraction of its power. Similar computational advantages have been demonstrated in [1]; beyond this, the RASP 2.8 marks a paradigm shift in analog design, since its performance, superior to that of earlier designs, enables it to serve not only as a prototyping tool but also as an attractive option for the final implementation.

The RASP 2.8 has thirty-two CABs connected by multilevel routing. The CABs are of two major types, as shown in Fig. 1(c). The first type has three operational transconductance amplifiers (OTAs), three floating capacitors (500 fF each), two multi-input floating gates that can be used to construct translinear circuits using multiple-input translinear element (MITE) architectures, a voltage buffer, a transmission gate with a dummy switch for switched-capacitor applications, and nMOS/pMOS transistor arrays with two common terminals for easily constructing source-follower or current-mirror topologies. All the OTAs are biased using floating-gate transistors, giving the user the option to trade off bandwidth, noise and power. Cascode biasing circuits valid across inversion levels are included as well. Two of the OTAs have floating-gate differential pairs, which enable programming the offset of the amplifier and provide the wide input linear range that is essential for reducing distortion in Gm-C filters and oscillators. The second type of CAB has a folded Gilbert multiplier in addition to a wide linear range OTA. The multiplier also has floating-gate differential pairs to reduce distortion. These CAB components are connected through a switch matrix of floating-gate switches which, unlike conventional digital switch implementations, can also be used for analog computation.

Fig. 1: **Chip level architecture:** (a) Routing architecture showing multi-level routing lines with different capacitances for improved bandwidth and connectivity. (b) Die photo of the fabricated IC. (c) Two basic CAB types with internal components. These are complemented by circuits compiled in the switch fabric.
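To make the bandwidth/power tradeoff of floating-gate biasing concrete, the sketch below uses the textbook subthreshold model of a simple differential-pair OTA, $G_m = \kappa I_b / (2U_T)$, driving one of the 500 fF CAB capacitors. The values of $\kappa$ and $U_T$ are assumptions, and the WLR OTA deliberately has a lower $G_m$ than this simple model, so the numbers are only indicative.

```python
import math

# Assumed parameters (not from the paper): room-temperature thermal voltage
# and a typical subthreshold slope factor.
U_T = 0.0258      # thermal voltage, V
KAPPA = 0.7       # subthreshold slope factor (assumed)
VDD = 2.4         # supply voltage from Table I, V
C_CAB = 500e-15   # one CAB floating capacitor, F

def ota_gm(i_bias):
    """Transconductance of a simple subthreshold differential-pair OTA, in S."""
    return KAPPA * i_bias / (2 * U_T)

def gmc_bandwidth(i_bias, c_load):
    """Corner frequency of a Gm-C stage, Gm / (2*pi*C), in Hz."""
    return ota_gm(i_bias) / (2 * math.pi * c_load)

def static_power(i_bias):
    """Static power drawn by the bias current from VDD, in W."""
    return i_bias * VDD

for i_b in (1e-9, 10e-9, 100e-9):
    print(f"I_b = {i_b:.0e} A: f = {gmc_bandwidth(i_b, C_CAB) / 1e3:.1f} kHz, "
          f"P = {static_power(i_b) * 1e9:.1f} nW")
```

Because both $G_m$ and static power scale linearly with $I_b$ in subthreshold, each decade of bias current buys a decade of bandwidth at the cost of a decade of power; noise is not modeled here.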

## II. RASP 2.8: ARCHITECTURE

The present generation of FPAA exhibits significantly improved performance over its predecessor [2] primarily because of several architectural modifications that are described next.



Fig. 2: **Improved Isolation:** (a) Source side selection together with indirect programming of the floating gate switches allows the RASP 2.8 to display impressive isolation between ON and OFF transistors. (b) Grayscale values in the figure correspond to the logarithm of the measured current from the array after the switches were programmed in this pattern.

### A. Switch Isolation and Programming

The programmable switch matrix in the earlier FPAA achieved isolation by applying a high gate voltage or a high drain voltage to unselected devices, while the selected device had a low voltage at both its gate and drain. However, this method has several disadvantages, the primary one being over-injection of devices beyond the isolation point [3]. This IC instead employs source-side selection [3] coupled with indirect programming, as shown in Fig. 2. The signal 'rsel' selects the desired row and removes the source current from all other rows, thus preventing injection in them. This yields significantly better isolation than [2], which used ' $V_{gate}$ ' to control isolation. As a result, devices programmed to accurate currents can be measured even when they share a column with an ON switch. Fig. 2 shows the current levels programmed into a 12 × 12 pattern of switches, with the grayscale values representing measured currents. All the ON switches conduct more than 20 µA, while the OFF (black) devices conduct less than 40 pA.
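The row-selection logic can be illustrated with a deliberately simplified model, assuming that a floating-gate pFET injects only when it both carries channel current and has its drain pulled low; this abstraction and the function names are ours, not the paper's. The sketch also converts the measured 20 µA and 40 pA levels into an ON/OFF isolation figure.

```python
import math

def injecting_devices(n_rows, n_cols, sel_row, sel_cols):
    """Return the (row, col) positions that can inject under source-side selection.

    Simplified model (assumption): a device injects only if it carries channel
    current AND its drain is pulled low. 'rsel' removes the source current from
    every row except sel_row, so unselected rows cannot inject even when their
    drains are low.
    """
    injecting = set()
    for r in range(n_rows):
        has_source_current = (r == sel_row)   # rsel gates the source line
        for c in range(n_cols):
            drain_low = c in sel_cols         # selected columns pull drains low
            if has_source_current and drain_low:
                injecting.add((r, c))
    return injecting

def isolation_decades(i_on, i_off):
    """ON/OFF current ratio expressed in decades."""
    return math.log10(i_on / i_off)

# Select row 3 and pull the drains of columns 5 and 7 low in a 12 x 12 array.
print(sorted(injecting_devices(12, 12, 3, {5, 7})))   # [(3, 5), (3, 7)]
print(round(isolation_decades(20e-6, 40e-12), 2))      # 5.7
```

In this model the drains of columns 5 and 7 are low on every row, yet only row 3 can inject, which is the property that prevents the over-injection problem described above. The measured levels correspond to roughly 5.7 decades of isolation.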

### B. Routing

The routing architecture of the IC, shown in Fig. 1(a), has four major types of interconnect: local, nearest neighbor vertical and horizontal (nnv and nnh), and global. This granularity allows high-speed interconnects to be routed on low-capacitance lines, such as local or nearest-neighbor lines, while global lines are used only for I/O after internal processing is complete. This results in substantial power savings and facilitates low-power adaptive designs. Fig. 3 shows the configuration used to estimate the routing capacitances. The wide linear range OTA is biased at a Gm of 32 nS. Routing lines corresponding to various capacitive loads are added successively, and the step responses are measured. A voltage buffer isolates the pad capacitance from the Gm-C element. These measurements show that routing between CABs incurs lower parasitic capacitance than in the earlier version and can achieve bandwidths of approximately 6 MHz at around 100 nA of current. The achievable bandwidth within a CAB should be an order of magnitude higher. In addition to bandwidth, this characterization allows one to use the routing capacitance to reduce kT/C noise as desired. The other feature of the routing scheme is bridge transistors, which allow local lines to be bridged between CABs, facilitating variable-length connections without incurring the capacitance penalty of global lines.

Fig. 3: **Capacitance estimation:** Step responses are measured after adding one and three nearest neighbor vertical (nnv) lines, three nearest neighbor horizontal (nnh) lines and one global line, respectively, as capacitive load to the Gm-C filter. Capacitances estimated from the resulting time constants are 151 fF for nnv, 228 fF for nnh and 763 fF for global lines.
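The capacitance-estimation procedure rests on the first-order Gm-C relation $\tau = C/G_m$, so $C = G_m \tau$. The sketch below extracts $\tau$ as the 63.2 % rise time of a sampled step response, converts it to a capacitance at the stated $G_m$ of 32 nS, and evaluates the resulting $\sqrt{kT/C}$ noise. The synthetic waveform, the 300 K temperature and the helper names are assumptions for illustration, not the authors' extraction code.

```python
import math

GM = 32e-9           # OTA transconductance used in the measurement, S
K_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # assumed temperature, K

def tau_from_step(times, values):
    """Time at which a rising first-order step response first reaches 63.2 %
    of its total change (linear interpolation between samples)."""
    v0, vf = values[0], values[-1]
    target = v0 + (1 - math.exp(-1)) * (vf - v0)
    for (t0, y0), (t1, y1) in zip(zip(times, values), zip(times[1:], values[1:])):
        if y0 < target <= y1:
            return t0 + (target - y0) * (t1 - t0) / (y1 - y0)
    raise ValueError("step response never crosses 63.2 %")

def cap_from_tau(tau, gm=GM):
    """For a Gm-C stage, tau = C / Gm, so C = Gm * tau."""
    return gm * tau

def ktc_noise_rms(c):
    """RMS kT/C noise voltage on capacitance c, in V."""
    return math.sqrt(K_B * T / c)

# Synthetic check: a 151 fF load gives tau = C / Gm, about 4.7 us.
c_true = 151e-15
tau_true = c_true / GM
ts = [i * 1e-8 for i in range(5000)]                 # 10 ns samples over 50 us
vs = [1 - math.exp(-t / tau_true) for t in ts]
c_est = cap_from_tau(tau_from_step(ts, vs))
print(f"estimated C = {c_est * 1e15:.1f} fF, "
      f"kT/C noise = {ktc_noise_rms(c_est) * 1e6:.0f} uV rms")
```

In a measurement where lines are added successively, the differences between successive capacitance estimates would isolate the contribution of each added routing line from the fixed parasitics.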

### C. On-chip Programming

Earlier generations of the FPAA used off-chip current measurement circuits, which introduced measurement inaccuracy due to noise and limited the minimum measurable current to the ESD leakage level. Serial communication with a picoammeter is also time consuming, resulting in long programming times. Fig. 4(a) shows the architecture of the current programming scheme, which performs all measurement operations on-chip and provides a digital SPI interface to a microprocessor (µP). Binary-weighted, current-mode 7-bit DACs supply the gate and drain voltages during both programming and operational modes. The drain selection circuit sets the drain to one of four choices depending on the current state of the programming algorithm. While the source of the floating gate is being ramped up, the drain is set to Vdd to prevent injection. For switch injection or accurate bias programming, the drain is then set to GND or to a DAC voltage, respectively, for a fixed amount of time. Lastly, to measure the charge currently programmed on the floating gate, the drain is switched to the current measurement circuitry.
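The four drain choices map naturally onto a small state machine. The sketch below encodes only the mapping stated in the text; the state names, the tuple return format and the example cycle order are hypothetical, and the pulse durations and DAC code-to-voltage conversion are not modeled.

```python
from enum import Enum, auto

class ProgState(Enum):
    SOURCE_RAMP = auto()     # source of the floating gate is being ramped up
    SWITCH_INJECT = auto()   # injection pulse to turn a switch fully ON
    BIAS_INJECT = auto()     # controlled injection toward a target bias current
    MEASURE = auto()         # read back the currently programmed current

def drain_setting(state, dac_voltage=None):
    """Return (drain connection, voltage) for each programming state."""
    if state is ProgState.SOURCE_RAMP:
        return ("VDD", None)              # drain at Vdd: no injection
    if state is ProgState.SWITCH_INJECT:
        return ("GND", None)              # drain at ground for a fixed time
    if state is ProgState.BIAS_INJECT:
        if dac_voltage is None:
            raise ValueError("bias injection needs a DAC drain voltage")
        return ("DAC", dac_voltage)       # DAC-set drain for a fixed time
    if state is ProgState.MEASURE:
        return ("CURRENT_MEASURE", None)  # route drain to the log TIA
    raise ValueError(f"unknown state {state}")

# One hypothetical bias-programming cycle: ramp, inject, then measure.
for s in (ProgState.SOURCE_RAMP, ProgState.BIAS_INJECT, ProgState.MEASURE):
    print(s.name, drain_setting(s, dac_voltage=1.2))
```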

The large improvements in measurement accuracy, speed and dynamic range are obtained by using the logarithmic transimpedance amplifier described in [4]. Fig. 4(b) compares the current measured with an off-chip picoammeter, which saturates at the ESD leakage level of around 100 pA, with the current inferred from the logamp, which extends below 100 fA. The logamp is followed by a lowpass filter that limits bandwidth and improves noise performance [4]; the resulting measurement accuracy is around 9 bits.
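Inferring current from the logamp output amounts to inverting a logarithmic I-V law. The sketch assumes an idealized law $V = V_{ref} + (U_T/\kappa)\ln(I/I_{ref})$ with hypothetical constants; the actual transfer characteristic of the amplifier in [4] would be obtained by calibration.

```python
import math

# Hypothetical model parameters (not taken from [4]).
U_T = 0.0258     # thermal voltage, V
KAPPA = 0.7      # subthreshold slope factor
V_REF = 1.0      # output voltage at the reference current, V
I_REF = 1e-9     # reference current, A

def logamp_out(i_in):
    """Idealized log TIA: V = V_REF + (U_T / KAPPA) * ln(I / I_REF)."""
    return V_REF + (U_T / KAPPA) * math.log(i_in / I_REF)

def inferred_current(v_out):
    """Invert the log law to recover the input current from the output voltage."""
    return I_REF * math.exp((v_out - V_REF) * KAPPA / U_T)

# Output swing needed to span 100 fA to 10 uA (eight decades) under this model.
swing = logamp_out(10e-6) - logamp_out(100e-15)
print(f"swing = {swing:.3f} V")
```

Because $d(\ln I)/dV = \kappa/U_T$ is constant, a fixed voltage error maps to a fixed *relative* current error at every current level, which is why a logarithmic front end suits a measurement range of many decades.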

The output voltage of the logarithmic amplifier is quantized by a ramp ADC, as shown in Fig. 4. The ADC clock is currently generated by a microprocessor and is limited to 25 MHz, resulting in an average conversion time of $500\mu s$; this time should decrease in inverse proportion to the clock frequency. The digital output word is sent over an SPI interface to a microprocessor that implements the programming algorithm.

Fig. 4: **On-chip Programming:** (a) The scheme for on-chip programming is shown. DACs supply voltages to drain and gate. The programmed current is measured using a logarithmic I-V converter followed by a linear ramp ADC. (b) Measured I-V characteristic of logarithmic TIA showing improved accuracy over off-chip ammeter. (c) Ramp ADC characteristic displaying good linearity.
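A ramp (single-slope) ADC counts clock cycles until a linear ramp crosses the input, so its conversion time is the cycle count divided by the clock frequency. The sketch below assumes one ramp step per clock and a hypothetical resolution and input range; the RASP 2.8's actual ramp rate and cycles per conversion are not given in the text.

```python
def ramp_adc(v_in, v_min, v_max, n_bits):
    """Single-slope ADC: the output code is the number of clock cycles before
    a ramp starting at v_min reaches v_in (one ramp step per clock, assumed)."""
    n_codes = 2 ** n_bits
    step = (v_max - v_min) / n_codes
    count = 0
    # Comparator trips at the first clock where the ramp reaches v_in.
    while count < n_codes - 1 and v_min + count * step < v_in:
        count += 1
    return count

def conversion_time(cycles, f_clk):
    """Time taken by a conversion lasting `cycles` clock periods, in s."""
    return cycles / f_clk

code = ramp_adc(0.5, 0.0, 1.0, 8)          # mid-scale input on a 0-1 V range
worst = conversion_time(2 ** 8, 25e6)       # full-scale ramp at 25 MHz
print(code, f"{worst * 1e6:.2f} us worst case")
```

Since the cycle count depends only on the input, doubling the clock frequency halves every conversion time, which is the scaling noted in the text.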

Another important improvement is the introduction of row-parallel programming for switches. The rows of the floating-gate array are selected by a decoder, but the columns are selected using a shift register, which enables multiple columns to be selected in each row. Consequently, the switch programming time is $N_{rows} \times 100 \mu s$.
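Row-parallel programming can be sketched as building one column-select word per row and charging a fixed time per row. The 100 µs figure is from the text; the bit ordering of the shift register (bit $c$ selects column $c$) and the function names are assumptions.

```python
ROW_TIME_S = 100e-6   # switch programming time per row, from the text

def column_masks(pattern):
    """pattern: list of rows, each a list of 0/1 (1 = switch to be turned ON).
    Returns one integer per row whose bit c is set when column c should be
    selected by the column shift register (bit ordering assumed)."""
    masks = []
    for row in pattern:
        mask = 0
        for c, on in enumerate(row):
            if on:
                mask |= 1 << c
        masks.append(mask)
    return masks

def switch_programming_time(n_rows):
    """All selected switches in a row are programmed together."""
    return n_rows * ROW_TIME_S

pattern = [[1, 0, 1, 1],
           [0, 1, 0, 0],
           [0, 0, 0, 0]]
print([bin(m) for m in column_masks(pattern)])
print(f"{switch_programming_time(len(pattern)) * 1e6:.0f} us")
```

The programming time depends on the number of rows rather than the number of ON switches, because every selected column in a row is handled in the same step.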

### D. I/O Pad and Scanner Shift Register

This IC incorporates special bidirectional I/O pads with buffer amplifiers that can drive large capacitive loads when enabled; their bandwidth is set by a programmable floating-gate device. An analog 16-bit shift register is also available to scan through and observe different lines, allowing the user to debug a circuit in an almost SPICE-like fashion.
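One common way to use a scanner register is to clock a single 1 through it, so that exactly one line is routed to the observation output at a time. The text does not specify how the RASP 2.8 scanner is driven, so the one-hot scheme and helper names below are hypothetical.

```python
N_LINES = 16   # register length, from the text

def scan_sequence(n_lines=N_LINES):
    """Yield successive register contents when a single 1 is clocked through
    an n_lines-bit shift register (one-hot scanning, assumed)."""
    state = 1
    for _ in range(n_lines):
        yield state
        state = (state << 1) & ((1 << n_lines) - 1)

def selected_line(state):
    """Index of the line selected by a one-hot register state."""
    return state.bit_length() - 1

for state in scan_sequence():
    print(f"{state:016b} -> observe line {selected_line(state)}")
```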

## III. CIRCUIT AND SYSTEM EXAMPLES

In this section we describe a few of the systems that have been implemented on the RASP 2.8. Fig. 5(a) shows the wide linear range (WLR) OTA that is part of the CABs. To demonstrate that its offset is programmable, the OTA was used as a comparator and linearly spaced charge differences were programmed into the differential pair. In the experiment, one terminal was fixed at 1 V while the other was swept from 0 to 2.4 V. The resulting curves, plotted in Fig. 5(b), show the threshold shifts. Fig. 5(c) plots the measured threshold against the programmed charge difference. The deviations from linearity are primarily caused by inaccurate extraction of the threshold from the high-gain characteristics. This set of curves directly demonstrates the feasibility of implementing a flash ADC in the FPAA with thresholds accurate to the 9-bit programming accuracy.

Fig. 5: **Programmable Offset OTA:** (a) The circuit for a wide input linear range OTA with a floating-gate differential pair and bias. The floating gates allow programming bias currents and offsets. (b) The OTA is used as a voltage comparator to demonstrate the different programmed offsets. (c) The measured threshold offsets are linearly related to the programmed charge difference.
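The flash-ADC idea follows directly from these measurements: a bank of comparators whose programmed offsets place their thresholds at evenly spaced voltages produces a thermometer code, and counting the ones gives the binary output. In the sketch, the comparators are ideal and the charge-to-threshold slope `alpha` is a hypothetical calibration constant standing in for the linear relation of Fig. 5(c).

```python
def thresholds_from_charge(v_ref, alpha, charge_diffs):
    """Threshold = v_ref + alpha * dQ (linear relation of Fig. 5(c));
    alpha is a hypothetical calibration constant in V per unit charge."""
    return [v_ref + alpha * dq for dq in charge_diffs]

def linear_thresholds(v_lo, v_hi, n_comparators):
    """Evenly spaced thresholds strictly inside (v_lo, v_hi)."""
    step = (v_hi - v_lo) / (n_comparators + 1)
    return [v_lo + (k + 1) * step for k in range(n_comparators)]

def flash_adc(v_in, thresholds):
    """Ideal comparators: each outputs 1 when v_in exceeds its threshold.
    For monotonically increasing thresholds, the number of ones in the
    thermometer code equals the binary output code."""
    therm = [1 if v_in > th else 0 for th in thresholds]
    return therm, sum(therm)

# A 3-bit converter needs 2**3 - 1 = 7 comparators over the 0-2.4 V sweep range.
th = linear_thresholds(0.0, 2.4, 7)
print(flash_adc(1.0, th))   # ([1, 1, 1, 0, 0, 0, 0], 3)
```

Because each extra bit doubles the comparator count, the practical resolution of such a converter is set by both threshold accuracy and the number of available OTAs.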

The next system is a second-order current-mode delta-sigma converter. Fig. 6(a) shows its implementation in the FPAA. The first integration is performed by a cascoded floating-gate current source integrating charge on a capacitor. These floating gates are elements of the switch matrix and demonstrate the computational power of the analog switches. The second integration is performed by the WLR OTA on an explicitly drawn capacitor. Fig. 6(b) and (c) show the input sinusoidal signal and the modulator output, respectively. A better implementation, in which the current source is switched using a differential pair instead of being turned off, can easily be built but is not shown for lack of space.

Fig. 6: **Current Mode Delta Sigma:** (a) The circuit for a second-order current mode delta sigma modulator. The first integration is performed by integrating current on a capacitor and the second one is implemented using a WLR OTA. (b) Input sinusoidal signal to the modulator. (c) Output digital waveform from the modulator.
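As a rough behavioral check of the loop, the sketch below simulates an idealized discrete-time second-order delta-sigma modulator with two delaying integrators and a 1-bit quantizer. It abstracts away the current-mode circuit entirely (the floating-gate current source charging a capacitor and the WLR OTA integrator), and the feedback gain of 2 into the second integrator is a standard textbook choice, assumed here, that yields $Y(z) = z^{-2}U(z) + (1 - z^{-1})^2 E(z)$.

```python
import math

def delta_sigma_2nd_order(u):
    """Idealized second-order delta-sigma modulator with two delaying
    integrators and a 1-bit quantizer producing +/-1 output samples."""
    x1 = x2 = 0.0
    out = []
    for sample in u:
        y = 1.0 if x2 >= 0 else -1.0   # 1-bit quantizer on the second integrator
        out.append(y)
        x2 = x2 + x1 - 2.0 * y         # second integrator uses the previous x1
        x1 = x1 + sample - y           # first integrator
    return out

# DC input: the density of +1 outputs tracks the input level.
bits = delta_sigma_2nd_order([0.3] * 10000)
print(sum(bits) / len(bits))

# Slow sinusoid, recovered by a crude 32-sample moving average (decimation filter).
u = [0.5 * math.sin(2 * math.pi * n / 1000) for n in range(4000)]
b = delta_sigma_2nd_order(u)
recovered = [sum(b[n - 32:n]) / 32 for n in range(32, len(b))]
```

The second-order noise shaping pushes quantization error to high frequencies, which is why even a simple moving average recovers the slow input from the bit stream.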

The third circuit fully exploits the computational ability of the switches [5]. It is a four-quadrant vector matrix multiplier in which the weight matrix is stored as charge differences on the floating gates. Fig. 7(a) shows the fully differential version of the circuit, which performs four-quadrant multiplication. All the floating gates shown are part of the switch fabric; thus a computationally intensive operation is obtained without dedicated hardware in the CAB. The strengths of this architecture are its low power and its scalability. Fig. 7(b) shows measured four-quadrant multiplication on a linear scale.
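The reason a fully differential arrangement yields four-quadrant multiplication with only non-negative (physical) currents is algebraic: cross-coupling the outputs gives $I^+ - I^- = \sum_j (w^+_j - w^-_j)(x^+_j - x^-_j)$. The sketch below models this with ideal weights; the common-mode values and the helper names are illustrative assumptions, not the circuit's actual bias levels.

```python
def split(v, common_mode):
    """Represent a signed value as a pair of non-negative quantities whose
    difference equals v (common_mode must exceed |v| / 2)."""
    return common_mode + v / 2, common_mode - v / 2

def diff_vmm(w_pos, w_neg, x_pos, x_neg):
    """Fully differential current-mode VMM with cross-coupled outputs:
    I+ = sum(w+ x+ + w- x-),  I- = sum(w+ x- + w- x+),
    so I+ - I- = sum((w+ - w-) * (x+ - x-)), a four-quadrant product."""
    out = []
    for wp_row, wn_row in zip(w_pos, w_neg):
        terms = list(zip(wp_row, wn_row, x_pos, x_neg))
        i_p = sum(wp * xp + wn * xn for wp, wn, xp, xn in terms)
        i_n = sum(wp * xn + wn * xp for wp, wn, xp, xn in terms)
        out.append(i_p - i_n)
    return out

# Signed weights and inputs, encoded around hypothetical common-mode levels.
W = [[0.5, -1.0], [-2.0, 0.25]]
x = [2.0, -3.0]
w_pairs = [[split(w, 1.5) for w in row] for row in W]
w_pos = [[p for p, _ in row] for row in w_pairs]
w_neg = [[n for _, n in row] for row in w_pairs]
x_pos, x_neg = zip(*(split(v, 4.0) for v in x))
print(diff_vmm(w_pos, w_neg, x_pos, x_neg))   # approximately [4.0, -4.75]
```

The common-mode components cancel in the output difference, so the result depends only on the signed weights and inputs.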

Many other systems, including amplitude and frequency modulators, oscillators and winner-take-all circuits, can also be implemented easily.

| Parameter                      | Value                               |
|--------------------------------|-------------------------------------|
| Process                        | $0.35\,\mu m$                       |
| Die size                       | $3\,mm \times 3\,mm$                |
| Power supply                   | 2.4 V                               |
| Injection Vdd                  | 5.6 V                               |
| Number of CABs                 | 32                                  |
| Switch programming time        | $N_{rows} \times 100\,\mu s$        |
| Bias programming time          | 5 ms/element                        |
| Programming accuracy and range | 9 bits over 100 fA to 10 $\mu A$    |

TABLE I: Table of Parameters



Fig. 7: **Vector Matrix Multiplier:** (a) The circuit for a four quadrant vector matrix multiplier implemented in current mode. (b) Measured four quadrant multiplication showing different programmed weights.

## IV. CONCLUSION

The RASP 2.8 generation of FPAA devices provides a powerful platform for prototyping and implementing large-scale signal processing applications. The programmable switch matrix, composed of floating-gate devices, shows excellent isolation and can readily be used for computation. The on-chip programming interface allows current measurements below 100 fA, and the fully digital interface allows easy integration with a microprocessor. Programming times are around 5 ms per accurate bias and 100 $\mu s$ per row of switches. Multiple levels of routing allow implementation of high-performance circuits while allowing fast turnaround times.

## REFERENCES

- [1] G. Cowan, R. Melville, and Y. Tsividis, "A VLSI analog computer/digital computer accelerator," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 1, pp. 42–53, Jan. 2006.
- [2] C. M. Twigg and P. E. Hasler, "A large-scale reconfigurable analog signal processor IC," in *Proceedings of the IEEE Custom Integrated Circuits Conference*, Sept. 2006, pp. 5–8.
- [3] C. M. Twigg and P. E. Hasler, "Programmable conductance switches for FPAAs," in *Proceedings of the International Symposium on Circuits and Systems*, May 2007, pp. 173–176.
- [4] A. Basu, R. Robucci, and P. Hasler, "A low-power, compact, adaptive logarithmic transimpedance amplifier operating over seven decades of current," *IEEE Transactions on Circuits and Systems I*, vol. 54, no. 10, pp. 2167–2177, Oct. 2007.
- [5] C. M. Twigg, J. Gray, and P. Hasler, "Programmable floating gate FPAA switches are not dead weight," in *Proceedings of the International Symposium on Circuits and Systems*, May 2007, pp. 169–172.

## MP-16-4