# An SoC FPAA Based Programmable, Ladder-Filter Based, Linear-Phase Analog Filter

Jennifer Hasler, Senior Member, IEEE, and Sahil Shah

Abstract—This work demonstrates a Continuous-Time (CT) Ladder filter using transconductance amplifiers as an approximate delay stage implemented on a large-scale Field Programmable Analog Array (FPAA) and characterized on an SoC FPAA. We experimentally demonstrate a reprogrammable CT Analog linear-phase filter by utilizing the ladder filter delay element and Vector-Matrix Multiplication (VMM) both compiled on the SoC FPAA. Using the Ladder Filter as a programmable CT delay operation enables a traditionally difficult analog signal operation. This effort extensively models and characterizes the ladder filter delay stage in terms of its transfer function, delay tunability, power requirements, distortion, and SNR. The theoretical development is compared to experimental measurements on an SoC FPAA with programmable ladder filter delay of  $2.9\mu s$  and  $4.2\mu s$ for multiple input frequencies (e.g. 5kHz, 20kHz). In addition, we show that VMMs can compensate the non-idealities found in the ladder filter delay-line operation.

*Index Terms*—Floating-gate, field programmable analog array (FPAA), ladder filters, linear-phase filters.

**THIS** paper shows a programmable and configurable Continuous-Time (CT) Analog (approximate) linear-phase filter, utilizing a Ladder Filter as an approximate delay stage, on a large-scale Field Programmable Analog Array (FPAA) fabric. A linear-phase filter using a delay line and Vector-Matrix Multiplication (VMM) computation is analogous to a digital Finite Impulse Response (FIR) linear-phase filter approach (Fig. 1). Our objective is to demonstrate, model, and experimentally characterize a CT analog ladder filter approximate delay line on an SoC FPAA [1], [2], and use this block to demonstrate a programmable CT linear-phase filter. Analog computation, using an ASIC or an FPAA device (e.g. [1]), is roughly $1000 \times$ more energy-efficient than digital computation.

This paper focuses on generating near constant delays, generating an orthogonal vector of signals from a single signal, required for linear-phase filtering. The linear CT ladder filter delay block solves this challenging question, where other approaches such as bandpass filter banks provide an easier signal decomposition for a CT filter [3]. The difficulty of building efficient analog delay elements typically directs analog computation to utilize almost any signal decomposition other than delays. Previous discussions have demonstrated other time-domain signal decompositions (e.g. wavelet, classifier [1], [2]). This work is a first demonstration of a programmable and reconfigurable approximate analog delay signal decomposition.

Fig. 1. Block diagram showing the key blocks for implementing a Continuous-Time (CT) Analog (approximate) FIR filter in an SoC FPAA IC: an approximate CT delay line using a Ladder filter (with slight amplitude attenuation), and a Vector-Matrix Multiplication (VMM) that can be programmed to compensate for ladder filter amplitude attenuations.

Jennifer Hasler is with the School of Electrical and Computer Engineering (ECE), Georgia Institute of Technology, Atlanta, GA 30332-0250 USA (e-mail: jennifer.hasler@ece.gatech.edu).

Sahil Shah was with Georgia Tech ECE, Atlanta, GA 30332-0250 USA. He is now with the University of Maryland, College Park, MD 20742 USA.

Digital Object Identifier 10.1109/TCSI.2020.3038360

Ladder filters start from an Inductor–Capacitor (LC) based prototype filter to build a particular filter design. Ladder filters have been used for either a fixed single or a selectable corner frequency with applications that include switched capacitor [4] and CT channel [5] circuit approaches. This paper utilizes ladder filters as a properly terminated LC circuit to approximate a delay line. One challenge is to characterize the amplitude attenuation per tap (vector **A** in Fig. 1), and to compensate for the attenuation by programming the VMM weights (**W**). VMM and TA FG programming enable higher precision operation for the delay line and the overall analog FIR operation.

Although delays are the most natural operation for any digital system [6], wideband delays are the hardest computation for analog systems to implement competitively with any digital solution. Multiple attempts at delay lines have been proposed towards building linear-phase CT filters, typical of the digital FIR equivalent structure. Approaches for potential delay-lines include a sequence of sample and hold elements [9], a sequence of first-order low-pass or all-pass voltage mode [10], [11] or current mode stages [16], as well as digitally encoding the input analog signal (e.g. Sigma Delta modulation) before a digital delay line and analog VMM filter [17], [18]. The ladder-filter delay-line approach allows for an approximate CT delay line with a nearly constant delay as expected from an ideal, finite-stage Inductor-Capacitor line, as opposed to relying on a cascade of first-order filter stages. A linear-phase filter could give an approximately constant delay over most of its passband region; a short cascade of linear phase filters can retain this linear phase property in particular implementations (e.g. [12], [13], [15]). This ladder-filter delay line enables nearly constant wideband delays for a large number of programmable taps, more closely related to typical FIR filters used in digital implementations. The Vector-Matrix Multiplication (VMM) block for the computation has been a stable programmable solution both in custom [7], as well as configurable [8] (e.g. FPAA) hardware applications, utilizing Floating-Gate (FG) circuits for programmability.

Manuscript received April 21, 2020; revised July 25, 2020 and September 21, 2020; accepted October 24, 2020. Date of publication December 30, 2020; date of current version January 12, 2021. This article was recommended by Associate Editor A. Worapishet. (*Corresponding author: Jennifer Hasler.*)

1549-8328 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

This paper shows a delay / Analog CT linear-phase filter block compiled into, and measured from, an FPAA system, enabling a reusable IP block for other engineers across a range of applications. FPAA implementation enables experimental verification, just as with custom IC experimental verification, although with some different parasitic values. On-chip experimental demonstration in this work follows traditions of circuits experimentally measured on an FPAA (e.g. [2]). This effort starts by discussing the CT Delay Line framework for a linear-phase filter (Sec. I), then by discussing the CT Ladder filter implementation and data (Sec. II), then showing the CT linear-phase filter built using this ladder filter structure and VMM block (Sec. III), and finishing by summarizing the results, potential metrics, and potential directions (Sec. IV).

## I. CT DELAY LINE FRAMEWORK

The linear phase filter is a combination of the delay and the VMM computation (Fig. 2). The digital implementation (analog in / analog out) requires an input ADC / sampler, delay line, VMM (multiple Multiply ACcumulate (MAC) units), and output DAC + filtering. The analog implementation uses a CT delay line and VMM for a CT linear phase filter. The ideal delay line output is a tap with its particular delay; for a uniform delay ( $\tau$ ) per stage, the $k^{th}$ stage output is

$$V_k(t) = V_{in}(t - k\tau). \tag{1}$$

A digital approximation utilizes a digital delay line (after a sampled ADC) through precise clocking structure (Fig. 2).

For an analog delay-line approximation (Fig. 2), the output of the taps of the ladder filter structure is

$$V_k(t) = A_k V_{in}(t - k\tau) \tag{2}$$

where **A** is the vector of attenuation weights resulting from the ladder filter approximation at each delay tap; these attenuation levels could be a function of frequency. The FG ladder-filter line allows for different delays per stage.

The linear-phase filter requires combining the delay structure (ladder-filter) with a VMM stage. An FG crossbar structure (e.g. FPAA routing fabric) has an output current (Fig. 2)

$$\mathbf{I} = g \mathbf{W} \mathbf{V},\tag{3}$$



Fig. 2. The proposed FPAA analog delay line and linear-phase filter concept. The ideal approach of a CT delay line can be approximated by a digital delay line using a Digital Shift Register, or by an analog CT delay line implemented through a Ladder-Filter circuit approximation. The sampled digital delay line ideally delays the resulting word (from an ADC) by one clock cycle. The CT analog delay line (not requiring sampling or an ADC) is built using two Floating-Gate (FG) programmed Transconductance Amplifiers (TA) per delay stage, followed by a VMM. The VMM, multiplying an input vector ( $\mathbf{x}$ ) by a stored FG weight ( $\mathbf{W}$ ), compensates for amplitude attenuation.

where the FG circuit elements have a nominal conductance, g, and programmed weight values, $W_{k,l}$, associated with the FG charge, to perform multiplication. These current outputs are summed along the single output wire to form an output current vector **I**. An on-chip transimpedance circuit with a transformation resistance, $R$, is used to convert the current vector to the voltage vector $\mathbf{V}_{out}$, given by

$$V_{out,l} = \frac{g}{R} \sum_{k=1}^{m} W_{k,l} V_k. \tag{4}$$

This combined structure (CT delay line + VMM) gives the (linear) frequency response,

$$H_{out,l}(f) = \frac{V_{out,l}(f)}{V_{in}(f)} = \frac{g}{R} \sum_{k=1}^{m} W_{k,l} A_k e^{-j2k\pi f\tau}, \quad (5)$$

enabling linear-phase CT filters along the same strategies utilized for digital linear-phase FIR filters [19]. The frequency response will be continuous and look similar to the Z-transform approach for frequencies (f) between 0 and $1/(2\pi \tau)$.
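As a minimal sketch of how (5) can be evaluated, the following Python fragment computes the response of a boxcar filter whose VMM weights are pre-scaled by $1/A_k$ to cancel the tap attenuation. The function name `fir_response`, the lumped `gain` parameter (standing in for the $g/R$ prefactor), the tap count, and the decay value `a` are assumptions made for illustration and are not taken from the measured system; only the $4.2\mu s$ per-tap delay comes from the text.

```python
import cmath
import math


def fir_response(weights, attenuation, tau, freq, gain=1.0):
    """Evaluate (5): H(f) = gain * sum_k W_k * A_k * exp(-j*2*pi*k*f*tau), k = 1..m."""
    total = 0j
    for k, (w, a_k) in enumerate(zip(weights, attenuation), start=1):
        total += w * a_k * cmath.exp(-2j * math.pi * k * freq * tau)
    return gain * total


m = 8                # number of taps (hypothetical)
tau = 4.2e-6         # per-tap delay in seconds (value reported in the abstract)
a = 0.05             # assumed per-tap decay exponent (hypothetical)

A = [math.exp(-a * k) for k in range(1, m + 1)]  # tap attenuation vector A
W = [1.0 / a_k / m for a_k in A]                 # boxcar weights pre-scaled by 1/A_k

center = (m + 1) / 2 * tau   # delay of the symmetric m-tap filter (taps at tau..m*tau)
for f in (1e3, 5e3, 20e3):
    H = fir_response(W, A, tau, f)
    # Removing the linear-phase factor should leave a (nearly) purely real number.
    residual = H * cmath.exp(2j * math.pi * f * center)
    print(f"{f / 1e3:5.1f} kHz  |H| = {abs(H):.4f}  imag(residual) = {residual.imag:+.2e}")
```

Because the compensated products $W_k A_k$ are symmetric about the center tap, the phase is linear in frequency; an uncompensated boxcar ($W_k = 1/m$) would not have this symmetry.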

A potential CT delay block, described using a Ladder-Filter Topology (Sec. II), enables comparisons between digital and analog linear-phase filters. The sampled digital delay line using a multibit digital shift register ideally delays (by one clock) the resulting word, typically originating from an ADC (Fig. 2). Digital delay lines are fairly efficient implementations if data is not moved through external memories [21]. The average power required comes from the charging and discharging of the total capacitance (C) in the Flip-Flop (FF) and sampling clock circuitry, for each word line bit (e.g. 16-bit word), as $\frac{1}{2}CV_{dd}^2 f$, where f is the sampling frequency. The analog and digital efficiency comparison primarily rests on the relative sizes of their capacitances, as well as the energy required for the minimum delay resolution. The relative analog delay line capacitance eventually is limited by the desired SNR of the delay line. The minimum digital delay is set by the sampling rate that must be at least the Nyquist rate for the incoming signal. The minimum analog delay is set by bias currents and not directly constrained by Nyquist sampling as the CT signal is not sampled, although the signal might be attenuated by the circuit dynamics. One often expects similar power consumption between these two techniques for moderate analog SNR (e.g. 10-12 bits); at very high SNR (e.g. 24-bit), the analog cost would be substantially higher (e.g. [21]).

Comparisons between analog and digital VMM computations follow from classical discussions where analog computation tends to be a 1000× improvement over digital approaches [7], [8], [20], [23]. A VMM computation operating at the energy-efficiency wall [20], at 16-bit precision would be 40MMAC / mW, roughly 1000× the VMM computation in an FPAA device [8]. A moderate digital linear filter of 64 taps would achieve 10-12 bits of output VMM SNR due to summation errors, corresponding to the typical 10-12 bit SNR FPAA analog implementation [21]. The VMM summation errors for digital implementations grow with a larger number of taps [21]. Increasing analog resolution to very high levels (e.g. 16-bit) requires larger capacitances where the SNR signal power scales linearly with increased capacitance, including routing between delay stages to the multiplier blocks.

## II. CT DELAYS: FPAA LADDER FILTER

The ladder filter emulates the alternate integration of current (capacitor) and voltage (inductor) of an LC line (Fig. 3a)

$$\begin{aligned}
L\frac{dI_k}{dt} &= V_{k-1} - V_{k+1}, \quad k \text{ odd}, \\
C\frac{dV_k}{dt} &= I_{k-1} - I_{k+1}, \quad k \text{ even},
\end{aligned} \tag{6}$$

resulting in the second-order space and time equation

$$LC\frac{d^2V_k}{dt^2} = V_{k-2} - 2V_k + V_{k+2}, \ k \text{ even}$$

where  $\tau^2 = LC$ . The boundary condition at the last stage (m) is represented by the resistive termination,  $R = \sqrt{L/C}$ ,

$$\tau^2 \frac{d^2 V_m}{dt^2} = V_{m-2} - V_m - L \frac{dI_{m+1}}{dt} = V_{m-2} - V_m - \tau \frac{dV_m}{dt},$$

for forward propagating waves (no reflections).

The programmable Transconductance Amplifier (TA) Ladder Filter builds from an LC delay line filter approximation. A TA-C filter stage implements the first-order dynamics in (6): I<sub>k</sub> for odd k and V<sub>k</sub> for even k (Fig. 3a):

$$\tau_{1,2} \frac{dV_k}{dt} = V_{k-1} - V_{k+1}, \ \tau_1 \ k \text{ odd}, \ \tau_2 \ k \text{ even}, \tag{7}$$

where  $\tau_1 = C_2/G_{m1}$  (downward TA), and  $\tau_2 = C_1/G_{m2}$  (upward TA), and  $G_{m1}$ ,  $G_{m2}$  are the corresponding transconductances. The second-order space and time equation is

$$\tau^2 \frac{d^2 V_k}{dt^2} = V_{k-2} - 2V_k + V_{k+2},\tag{8}$$



Fig. 3. A programmable TA-Capacitor Ladder Filter is built from an LC-delay line filter approximation implemented in an SoC FPAA IC. (a) Circuit Diagram showing the transformation from an LC-prototype circuit. The arrows illustrate the equivalence between these two blocks, as well as the TA used to model the differential equation of the inductor block. The odd numbered TA filters model the inductor current, and the even numbered TA filters model the capacitor voltages. The TA elements have a programmable bias current to set the individual transconductances, and therefore the resulting line delays. The line is terminated by an equivalent conductance equal to  $\sqrt{L/C}$ . (b) Measured outputs of a 10-tap ladder-filter delay line (5kHz (98mV-pp) sinusoidal input). Measured FPAA device data shows a total delay of  $42\mu s$  (75.6 degrees phase shift) with a constant  $4.2\mu s$  delay per tap. (c) Measurement of attenuation along the delay line resulting from finite FG TA gain.

where  $\tau^2 = \tau_1 \tau_2$ . A resistive termination final TA (V<sub>m</sub> to the negative TA terminal) sets the x=1 boundary condition:

$$\tau^2 \frac{d^2 V_m}{dt^2} = V_{m-2} - V_m - \tau \frac{dV_m}{dt}, \tag{9}$$

resulting in only forward propagating waves. A matrix formulation for a single input frequency ( $\omega$ ) calculates the frequency response along this line:

$$\left(
\begin{bmatrix}
2 & -1 & & & \\
-1 & 2 & -1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 2 & -1 \\
 & & & -1 & 1+j\omega\tau
\end{bmatrix} - (\omega\tau)^{2} I
\right)
\begin{bmatrix}
V_{1} \\
V_{2} \\
\vdots \\
V_{m-1} \\
V_{m}
\end{bmatrix}
= \begin{bmatrix}
1 \\
0 \\
\vdots \\
0 \\
0
\end{bmatrix} \tag{10}$$
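One way to evaluate (10) numerically is sketched below in plain Python. Rather than inverting the tridiagonal matrix, the sketch walks the rows of (10) backward from the terminated end and then normalizes so that the input node equals 1; this is an implementation choice of the example, not necessarily the procedure used for Fig. 5. The function name `ladder_tap_response` and the chosen tap count and frequency are hypothetical.

```python
import cmath


def ladder_tap_response(m, omega_tau):
    """Solve (10) for the m tap phasors, with a unit-amplitude input.

    Row m of (10) gives   V_{m-1} = (1 + j*x - x^2) * V_m,
    rows 1..m-1 give      V_{k-1} = (2 - x^2) * V_k - V_{k+1},
    where x = omega*tau and V_0 is the input node.
    """
    x = omega_tau
    v = [0j] * (m + 1)      # v[0] is the input node, v[1..m] are the taps
    v[m] = 1.0 + 0j         # arbitrary scale, fixed by the normalization below
    v[m - 1] = (1.0 + 1j * x - x * x) * v[m]
    for k in range(m - 1, 0, -1):
        v[k - 1] = (2.0 - x * x) * v[k] - v[k + 1]
    return [vk / v[0] for vk in v[1:]]   # normalize so that V_0 = V_in = 1


m, x = 10, 0.1              # 10 taps at normalized frequency omega*tau = 0.1
taps = ladder_tap_response(m, x)
for k, vk in enumerate(taps, start=1):
    # Phase stays well inside (-pi, pi] here, so no unwrapping is needed.
    delay_in_tau = -cmath.phase(vk) / x
    print(f"tap {k:2d}: |V| = {abs(vk):.4f}, delay = {delay_in_tau:.3f} tau")
```

For frequencies well below $\omega\tau = 1$ the computed delay grows by close to one $\tau$ per tap and the magnitude stays near unity, which is the behavior the matrix formulation is meant to capture.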

The circuit effectively approximates a second-order hyperbolic PDE through the approximation

$$V_{k-2} - 2V_k + V_{k+2} \to \Delta x^2 \frac{\partial^2 V}{\partial x^2},\tag{11}$$

where $\Delta x$ is a distance between the two nodes, resulting in the continuous approximation

$$\tau^2 \frac{\partial^2 V}{\partial t^2} - \Delta x^2 \frac{\partial^2 V}{\partial x^2} = 0. \tag{12}$$

Fig. 4. Measured signal magnitude and phase shift for each tap along the same programmed ladder-filter line, first for a 5kHz sine-wave input, and then for a 20kHz sine-wave input. Both cases show $2.9\mu$s per stage for this 100mV peak-to-peak input signal.

As the line input is the positive TA input, the circuit does not resistively load the previous stage and significantly reduces any input matching constraints other than handling the TA's input capacitance. One can assume a normalized distance for x between 0 and 1, or one can use the physical distance between stages, or another consistent spatial definition; without loss of generality for this analysis, we will assume a normalized distance for x between 0 and 1, and $\Delta x = 1/m$. This PDE can be written as

$$\tau^2 \left( \frac{\partial}{\partial t} - v \frac{\partial}{\partial x} \right) \left( \frac{\partial}{\partial t} + v \frac{\partial}{\partial x} \right) V = 0 \tag{13}$$

resulting in forward or backward wave propagation of velocity, v, from the flow lines of each operator, where

$$v = \frac{\Delta x}{\tau} = \frac{\Delta x}{\sqrt{LC}}. \tag{14}$$

One can normalize the time (and resulting frequency) response by $\tau$. The right boundary condition (at x=1) for matched resistive termination results in waves propagating only in the single forward direction, eliminating reflections at the right boundary. The second operator in (13) describes the line dynamics for matched resistive termination. The approximate ideal LC line voltages from (9), (12), or (13) are

$$V_k(t) = V_{in}\left(t - \frac{k}{2}\tau\right) \ (k \text{ even}), \qquad V_k(t) = V_1\left(t - \frac{k-1}{2}\tau\right) \ (k \text{ odd}), \tag{15}$$

effectively showing a constant delay ( $\tau$ ) between each stage that corresponds with experimental data (Fig. 3b).

Figures 3 and 4 show measurements from a 10- and 12-tap delay line compiled on an FPAA with equally programmed transconductances and capacitive loads. We get a nearly constant delay of the input sinusoid for different input frequencies and $\tau$. The delay is roughly the same per stage, with the phase shift changing nearly linearly as a function of the input frequency. Capacitances and parallel routing lines have less than 1% mismatch, and current sources are programmed to better than 1% accuracy even for subthreshold currents [24]. Any further issues, for example indirect programming on TA bias currents, can be eliminated through built-in self-test loops [25]. Numerical modeling of the original circuit (8) further demonstrates these delay functions (Fig. 5a).
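The relation between a constant delay and a linearly growing phase shift can be checked with a few lines of arithmetic. The sketch below uses only the tap counts, delays, and frequencies quoted in the captions of Figs. 3 and 4; the helper name `phase_shift_deg` is introduced here for illustration.

```python
def phase_shift_deg(freq_hz, delay_s):
    """Phase shift, in degrees, of a sinusoid at freq_hz delayed by delay_s."""
    return 360.0 * freq_hz * delay_s


# Fig. 3b: 10 taps at 4.2 us per tap, 5 kHz input.
total_delay = 10 * 4.2e-6
print(f"total delay = {total_delay * 1e6:.1f} us, "
      f"phase = {phase_shift_deg(5e3, total_delay):.1f} degrees")

# Fig. 4: 2.9 us per stage at 5 kHz and 20 kHz. A fixed delay gives a
# per-tap phase step that scales directly with the input frequency.
for f in (5e3, 20e3):
    print(f"{f / 1e3:4.0f} kHz: {phase_shift_deg(f, 2.9e-6):.2f} degrees per tap")
```

The first line reproduces the $42\mu s$ and 75.6 degree figures reported for the 10-tap line.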

The following subsections further discuss the ladder filter block, including showing finite TA gain setting line attenuation (Sec. II-A), deriving a frequency response for the taps (Sec. II-B), showing output noise at each tap (Sec. II-C), giving an overview of the FG TA in an FPAA fabric (Sec. II-D), and showing the nonlinear analysis of delay taps (Sec. II-E).

### A. Finite TA Gain Set Line Attenuation

The output amplitude of the delay line attenuates for increasing position (Figs. 3c, 4) due to the finite gain of TA amplifiers. An additional term models finite TA gain (Fig. 6):

$$\tau_{1,2}\frac{dV_k}{dt} + aV_k = V_{k-1} - V_{k+1}, \ \tau_1 \ k \text{ odd}, \ \tau_2 \ k \text{ even},$$

where  $a = 1/A_v = 1/(G_m R_o)$ ,  $\tau$  is fixed,  $A_v$  is the maximum TA gain, and  $R_o$  is the TA output resistance, resulting in

$$\tau^2 \frac{d^2 V_k}{dt^2} + 2a\tau \frac{dV_k}{dt} + a^2 V_k = V_{k-2} - 2V_k + V_{k+2}, \tag{16}$$

with its spatially continuous version as

$$\left(\tau\frac{\partial}{\partial t} + a - \Delta x\frac{\partial}{\partial x}\right)\left(\tau\frac{\partial}{\partial t} + a + \Delta x\frac{\partial}{\partial x}\right)V(x,t) = 0. \quad (17)$$

The resulting first operator shows the forward wave (delay) propagation with the expected velocity, as well as an expected (small) decay in time represented by the *a* term. From the forward propagating operator above, one would expect a decay per stage of $e^{-a}$, corresponding to the attenuation seen in Figs. 3c and 4, and also seen in a modified numerical model of (10) with $A_v = 20$ in Fig. 5b. In practice, one would want the TA open-loop gain $10 \times$ or higher to minimize any decay in these structures. The VMM has programmable gain to compensate for this attenuation at each tap. An adaptive algorithm could be developed to compensate for these gain levels as part of an FIR adaptive filter.
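The expected per-stage decay of $e^{-a}$ and the corresponding VMM compensation can be tabulated directly. The sketch below assumes the idealized model $A_k = e^{-ka}$ with $a = 1/A_v$ and $A_v = 20$ on a 12-tap line, as in Fig. 5b; the function name and the choice of $W_k = 1/A_k$ as the compensating weight are illustrative assumptions, and measured attenuation would replace the model in practice.

```python
import math


def tap_attenuation(num_taps, open_loop_gain):
    """Model attenuation A_k = exp(-k * a) at taps k = 1..num_taps, with a = 1/A_v."""
    a = 1.0 / open_loop_gain
    return [math.exp(-a * k) for k in range(1, num_taps + 1)]


A = tap_attenuation(12, 20.0)          # 12-tap line, A_v = 20
W_comp = [1.0 / a_k for a_k in A]      # VMM gain that restores unit amplitude per tap

for k, (a_k, w_k) in enumerate(zip(A, W_comp), start=1):
    print(f"tap {k:2d}: attenuation = {a_k:.3f}, compensating weight = {w_k:.3f}")
```

With these values the last tap is attenuated to a little over half of the input amplitude, so the largest compensating weight stays below 2.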



Fig. 5. Numerically computed results for the ladder filter delay circuit (8) using (10) for a line with constant  $\tau$  stages. The operating frequency is normalized by the programmed ladder-filter  $\tau$ . (a) 32-tap line numerical phase computation for multiple frequencies with no finite gain TA effects. The resulting delay (slope of phase curves) linearly increases with the number of taps and is effectively independent of frequency. The nearly constant delay continues beyond the normalized frequency,  $\omega \tau = 1$ . (b) Numerical computation of the amplitude attenuation on a 12-tap line including a TA finite gain of 20. The attenuation levels are similar to measurements (Figs. 3,4).

### B. General Tap Frequency Response

This formulation describes the frequency response at each tap starting from (17). For an input sinusoid at $\omega_0$,

$$V_{in} = V(0, t) = e^{j\omega_0 t},$$

we assume a form of the solution

$$V(x,t) = H(x,\omega_0)e^{j\omega_0(t-(x/v)) - (a/\Delta x)x}. \tag{18}$$

Solving (17) with a matched impedance results in only a forward propagating wave (18), with no reflections on the line. Reflections show oscillatory patterns of gain/attenuation down a ladder-filter line. Attenuating those oscillatory patterns is achieved with improved line matching. Changing the TA bias current near $V_m$ would improve the resulting resistive termination, bringing the circuit elements closer to their ideal, non-mismatched, values. No reflections means the backwards propagating wave, the second operator in (17), is absent, and the forward wave operator governs this computation. In this case $\frac{\partial H}{\partial x} = 0$, so $H(\omega_0, x)$ is constant with position. As the system is driven by an input, the initial condition is

$$H(\omega_0, x = 0) = 1$$



Fig. 6. Small signal and noise analysis for the programmable TA elements. The TA elements and their respective transconductances  $(G_{m1}, G_{m2})$  result in an equivalent LC circuit with a resulting noise source per node. At low frequencies, the noise source and Q is limited by the open-loop TA gains.

resulting in the frequency response at the $k^{th}$ of m taps,

$$V_k(t) = e^{j\omega_0 t} e^{-k(a+j\omega_0\tau)}. \tag{19}$$

If the impedance were not matched, both a forward and a backward wave would be modeled, where the ratio of the two waves is governed by the calculated reflection, resulting in a frequency and spatially dependent effect on the resulting wave transmission (e.g. passband region ripples). The slight shift in the response (Fig. 4) is consistent with a 10-20% mismatch from the ideal resistive termination. The mismatch in routing capacitance modifies the ideal termination resistance.

The frequency response at each tap can eventually be affected by the frequency dynamics of each TA. A TA could have an internal pole due to internal capacitances. Modeling each TA with  $\tau_1 = \tau_2 = \tau$ , and a second higher-frequency pole corresponding to  $\epsilon \tau$  results in the model equations

$$\tau \frac{dV_k}{dt} + \epsilon \tau^2 \frac{d^2 V_k}{dt^2} = V_{k-1} - V_{k+1}, \tag{20}$$

resulting in the line partial differential equation

$$\left(\tau \frac{\partial}{\partial t} + \epsilon \tau^2 \frac{\partial^2}{\partial t^2} - \Delta x \frac{\partial}{\partial x}\right)\left(\tau \frac{\partial}{\partial t} + \epsilon \tau^2 \frac{\partial^2}{\partial t^2} + \Delta x \frac{\partial}{\partial x}\right) V(x, t) = 0. \tag{21}$$

The forward wave dynamics with a forward velocity ( $v = \Delta x/\tau$ ) model the matched-line dynamics. The first-order response at tap k of m taps for a single frequency input ( $\omega_0$ ) is

$$V_k(t) = \frac{e^{-k(a+j\omega_0\tau)}}{1+j\epsilon\omega_0\tau} e^{j\omega_0 t}. \tag{22}$$

Typically $\epsilon$ is small, as the bias current for the conductances would be similar, and the load capacitance typically would be $10 \times$ to $100 \times$ larger (> $100 \times$ larger for the FPAA devices). A high-performance delay line would use a fully differential TA with no internal poles (e.g. FG CM feedback [22]).

### C. Delay Block Noise Analysis

Figure 6 shows a single-ended ladder filter, configured to emulate an LC delay line, its linearized behavior, as well as its noise performance. By modifying (7) we can model the TA current noise  $(\hat{I}_1, \hat{I}_2)$  as

$$\tau_{1,2} \frac{dV_k}{dt} = \frac{\hat{I}_{1,2}}{G_{m1,2}} + V_{k-1} - V_{k+1}, \ \tau_1 \ k \ \text{odd}, \ \tau_2 \ k \ \text{even}, \tag{23}$$

$$\tau^2 \frac{d^2 V_k}{dt^2} = V_{k-2} - 2V_k + V_{k+2} + \frac{\hat{I}_1}{G_{m1}} + \tau_1 \frac{d}{dt} \frac{\hat{I}_2}{G_{m2}}, \tag{24}$$

where  $\tau^2 = \tau_1 \tau_2$  and channel current noise in a saturated MOSFET is given as  $\hat{I}^2 = 2 q I \Delta f$ . The noise will be wideband for the delay line at least until  $\omega \tau > 1$ , and therefore the  $\hat{I}_1$  term will effectively lead to kT/C type noise modified by the number of noise transistors ( $\nu$ ) and TA linear range. The  $\hat{I}_2$  term is noise-shaped in this band, resulting in lower noise. Treating  $\hat{I}_2$  term as a kT/C noise source as well overestimates the total noise. The noise input at a given stage would be

$$\text{noise voltage} = \sqrt{\frac{q \nu V_{L1}}{C_1} + \frac{q \nu V_{L2}}{C_2}}. \tag{25}$$

The noise power at node k looks like an input injected at that point due to the forward wave propagation. It would have a noise component from the previous stages, resulting in  $k \times$ more noise power. Although the modeling focuses on thermal noise, 1/f noise modeling follows similarly depending on the relevant bandwidth. Typically subthreshold and near-threshold circuits are biased such that 1/f noise is a smaller component of the total system noise over the desired signal bandwidth. If 1/f noise was significant where one can choose transistor sizes, one would utilize larger W × L transistors, effectively averaging over the noise sources, and decreasing the 1/f corner frequency. The introduction of advanced transistor technologies, such as high-electron mobility transistors, would significantly improve the 1/f noise performance as well as possibly improving the thermal noise performance.
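To give a feel for the magnitudes in (25) and for the $k \times$ accumulation of noise power along the line, the following sketch evaluates both. The values $V_L \approx 1$V and $C \approx 1$pF are the rough figures quoted in Sec. II-D; the number of noise transistors, $\nu = 2$, is a purely hypothetical choice for illustration, as are the function names.

```python
import math

Q_ELECTRON = 1.602176634e-19   # electron charge in coulombs


def stage_noise_voltage(nu, v_l1, c1, v_l2, c2):
    """RMS noise voltage injected at one stage, following (25)."""
    return math.sqrt(Q_ELECTRON * nu * v_l1 / c1 + Q_ELECTRON * nu * v_l2 / c2)


def tap_noise_voltage(k, stage_noise):
    """RMS noise at tap k when noise power adds over k stages (k times the power)."""
    return math.sqrt(k) * stage_noise


# Assumed values: nu = 2 (hypothetical), V_L = 1 V and C = 1 pF (rough Sec. II-D figures).
v_stage = stage_noise_voltage(nu=2, v_l1=1.0, c1=1e-12, v_l2=1.0, c2=1e-12)
print(f"per-stage noise: {v_stage * 1e3:.2f} mV rms")
for k in (1, 4, 12):
    print(f"tap {k:2d}: {tap_noise_voltage(k, v_stage) * 1e3:.2f} mV rms")
```

Since (25) treats the $\hat{I}_2$ term as a kT/C source, which the text notes is an overestimate, these numbers should be read as an upper-bound style estimate under the stated assumptions.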

### D. FPAA Ladder Filter Implementation

An SoC FPAA device (Fig. 7) [1], [2] provided the experimental platform to demonstrate the ladder-filter delay line (e.g. Figs. 3, 4). These devices enable both analog and digital computation [2], while retaining the $1000\times$ improvement (as predicted by [23]) in computational energy-efficiency compared to custom digital solutions (e.g. [8]). The VMM implementation utilizes the CAB local routing fabric. This SoC FPAA, implemented in a 350nm CMOS process, enables FG programming [24] that achieves better than 1% subthreshold programming accuracy over 6 orders of magnitude of current across roughly half a million FG devices on the SoC FPAA IC. The Computational Analog Block (CAB) includes two FG and two non-FG input TAs [2].

Fig. 7. Block diagram of the SoC FPAA IC and a view of the TA elements in a typical analog Computational Analog Block (CAB). The Ladder Filter utilizes the CAB TAs (two FG and two non-FG input), whereas the VMM utilizes the CAB local routing fabric for these implementations.

Fig. 8. FG circuit design opportunities using TA / OTA design include programmable FG current sources (FG charge programming), programmable FG differential-pair input offsets (FG charge programming), and selectable differential-pair linear range due to FG capacitive coupling ratios.

The design of our TA blocks is empowered using FG circuit approaches. FG circuit design opportunities using TA design include programmable FG current sources (subthreshold through above threshold), programmable FG differential-pair inputs (e.g. programmable offset voltages), and selectable differential-pair linear range due to FG capacitive coupling ratios without adding additional noise (Fig. 8). For TAs operating below or near $I_{th}$ (< 200mV overdrive), their differential output current ( $I_{out}$ ) is a function of the positive ( $V^+$ ) and negative ( $V^-$ ) input voltages,

$$I_{out} = I_{bias} \tanh\left(\frac{V^+ - V^-}{V_L}\right),\tag{26}$$

where $V_L$ is the resulting linear range of the device, $V_L = 2 U_T / \kappa$ (approximately 70mV) for a subthreshold $I_{bias}$. Linear range can be extended using floating-gate input devices. Each TA has a linear range, $V_{L1}$ and $V_{L2}$, resulting in time constants and transconductances

$$\tau_1 = \frac{C_1 V_{L1}}{I_{bias1}}, \quad \tau_2 = \frac{C_2 V_{L2}}{I_{bias2}}, \quad G_{m1} = \frac{I_{bias1}}{V_{L1}}, \quad G_{m2} = \frac{I_{bias2}}{V_{L2}}.$$

These approaches can achieve temperature insensitive operation, particularly if the gate of the current source is biased using an FG bootstrap current source [26].

The circuit components involve transconductance amplifiers and transistors (and similar components) with current sources programmable over six orders of magnitude in current (and therefore time constant) [24]. The FG TAs were biased with 200nA current, and were selected for the largest linear range of roughly 1V (open loop gain $\approx 20$), resulting in $G_{m1} = 200nA/V$ (e.g. Fig. 3b). Each stage used a non-FG buffer biased with $2\mu$A. The $4.2\mu$s delay (Fig. 3b) corresponds to roughly a 1pF load capacitance, typical of an FPAA routing to nearby CAB components. The total power required for a 12-tap ladder filter ( $V_{dd} = 2.5V$ ) is $18\mu$W; scaled down and specialized architectures would dramatically decrease the load capacitance and the resulting energy [27]. FG devices can be used to set cascade voltages with no additional circuitry required for biasing. For most applications, the output load is primarily a capacitor with series resistances for switches.
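The quoted bias point can be cross-checked against the time-constant relation $\tau = C V_L / I_{bias}$. The sketch below assumes matched TAs (so the per-tap delay equals a single $\tau$) and takes the linear range as exactly 1V, although the text only gives it as roughly 1V; the helper names are introduced here for illustration.

```python
def transconductance(i_bias, v_l):
    """G_m = I_bias / V_L for a TA operated within its linear range."""
    return i_bias / v_l


def time_constant(c, v_l, i_bias):
    """tau = C * V_L / I_bias."""
    return c * v_l / i_bias


def load_capacitance(tau, v_l, i_bias):
    """Invert tau = C * V_L / I_bias to estimate the load capacitance."""
    return tau * i_bias / v_l


i_bias = 200e-9   # 200 nA FG TA bias (from the text)
v_l = 1.0         # linear range, assumed to be exactly 1 V for this estimate

print(f"G_m = {transconductance(i_bias, v_l) * 1e9:.0f} nA/V")
print(f"C for a 4.2 us delay: {load_capacitance(4.2e-6, v_l, i_bias) * 1e12:.2f} pF")
print(f"tau for a 1 pF load:  {time_constant(1e-12, v_l, i_bias) * 1e6:.1f} us")
```

The estimated capacitance of just under 1pF is consistent with the roughly 1pF routing load stated above.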

### E. Nonlinear Analysis and Distortion Effects of Ladder Line

The focus of this section is to extend the analysis of the TA delay line to nonlinear behavior (e.g. harmonic distortion) of this ladder filter structure. The primary nonlinearity occurs at the differential pair for the transconductance stage; the effect of the output resistance can be made negligible. We assume a single linear range $V_L$ and a single time constant ( $\tau$ ) at a given node. Using the scaling of voltages, $x_k = V_k/V_L$ and $x_{in} = V_{in}/V_L$, and the scaling of time, $t = \tau t_1$, the line is described by the following set of differential equations

$$\tau \frac{dV_k}{dt} = V_L \tanh\left(\frac{V_{k-1} - V_{k+1}}{V_L}\right), \quad \frac{dx_k}{dt_1} = \tanh\left(x_{k-1} - x_{k+1}\right) \tag{27}$$

for a line of delay length m, where $x_0 = x_{in}$. For the sake of clarity, we will focus the analysis first on the single-ended case; the differential-ended case will have lower distortion when programmed to balance the resulting functions.

To analyze (27) for distortion measures, assume an input,

$$x_{in} = \epsilon \sin(\omega t) = \epsilon \sin(\omega_1 t_1) \tag{28}$$

with an input amplitude, $\epsilon$, and angular frequency, $\omega = \omega_1/\tau$. The resulting solution (Appendix 1) for the first-order sine-wave input into the delay line, for the even k line, is

$$\text{harmonics at } k = 8\epsilon^3 \sin^3\left(\frac{\omega_1}{2}\right) \left(\frac{3}{4}\cos\left(\omega_1\left(t-\frac{k+1}{2}\right)\right) + \frac{1}{4}\cos\left(3\omega_1\left(t-\frac{k+1}{2}\right)\right)\right),$$

where the even k lines are the tap outputs. From these components, we can calculate the resulting third-order components (Fig. 9), both as a function of normalized frequency $\omega_1$ and then as a function of amplitude $\epsilon$. Choosing $\epsilon$ depends on the particular input signal. The parameter $\epsilon$ is proportional to the amplitude and directly ties to typical distortion measurements, such as single or two-tone tests. The distortion components decrease as a cubic function of frequency; for many applications, the resulting bandwidth is typically less than a single delay element. A more complex input requires setting $\epsilon$ with the maximum input size. Frequencies near $1/\tau$ have the highest resulting distortion with no filtering improvements.
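The amplitude and frequency dependence of the injected components can be read off the expression above. The sketch below evaluates the two amplitudes (the correction at the fundamental and the third-harmonic term) as written, without claiming to reproduce the exact normalization used in Fig. 9; the function name `injected_components` is introduced for illustration.

```python
import math


def injected_components(eps, omega1):
    """Amplitudes of the injected terms at the fundamental and at the third harmonic.

    Both share the prefactor 8 * eps^3 * sin^3(omega1 / 2), weighted by 3/4 and 1/4.
    """
    scale = 8.0 * eps ** 3 * math.sin(omega1 / 2.0) ** 3
    return 0.75 * scale, 0.25 * scale


# Cubic dependence on amplitude at a fixed normalized frequency omega1 = 1.
for eps in (0.25, 0.5, 1.0):
    fundamental, third = injected_components(eps, 1.0)
    print(f"eps = {eps:4.2f}: fundamental term = {fundamental:.4f}, third harmonic = {third:.4f}")

# Near-cubic dependence on frequency well below omega1 = 1.
low = injected_components(1.0, 0.01)[1]
high = injected_components(1.0, 0.02)[1]
print(f"doubling a low frequency scales the third harmonic by {high / low:.3f}")
```

Halving the input amplitude therefore reduces the injected third-harmonic amplitude by a factor of 8, and the same holds for halving the frequency when $\omega_1 \ll 1$.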

The injected components propagate down the delay line, and the signal power of these components adds down the line.



Fig. 9. Calculated  $3^{rd}$  order harmonic distortion from ladder filter delay line. (a) distortion versus input amplitude for  $\omega_1 = 1$ . (b) distortion versus normalized frequency  $\omega_1$  for input amplitude at the linear range (V<sub>L</sub>), or  $\epsilon = 1$  for V<sub>1</sub> (dip after  $\omega = 1$ ) and V<sub>2</sub> taps. The ladder filter circuit continues to have nearly the same delay for frequencies above  $\omega \tau = 1$ .

The input nonlinearities happen before any filtering, so the distortion is a static result. By superposition, each component propagates down the delay line with the expected delay; the effects are a modification of the primary frequency component and an additive third-harmonic term. Distortion accumulation is manageable because of the approximate gain of 1 per stage. The distortion power, arising from uncorrelated signals, is expected to be N times the distortion of the first stage.
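As a rough numerical companion to the even-k expression in this section, the following Python sketch evaluates the amplitudes of the fundamental-frequency and third-harmonic terms, $8\epsilon^3\sin^3(\omega_1/2)$ weighted by 3/4 and 1/4. The function name `injected_components` and the sample amplitudes are illustrative assumptions, not part of the measured system.

```python
import math


def injected_components(eps, w1):
    """Return (fundamental, third_harmonic) amplitudes injected at an even-k tap.

    eps: input amplitude normalized to the linear range V_L.
    w1:  normalized angular frequency (omega * tau).
    """
    scale = 8.0 * eps**3 * math.sin(w1 / 2.0) ** 3
    return 0.75 * scale, 0.25 * scale


if __name__ == "__main__":
    # Sweep the normalized amplitude at w1 = 1, as in Fig. 9(a).
    for eps in (0.1, 0.5, 1.0):
        fund, third = injected_components(eps, w1=1.0)
        # Injected third-harmonic term relative to the input amplitude.
        rel_db = 20.0 * math.log10(abs(third) / eps)
        print(f"eps={eps:.1f}: third-harmonic term {third:.4g} ({rel_db:.1f} dB re input)")
```

Doubling $\epsilon$ raises the injected third-harmonic term by a factor of 8, which is the cubic amplitude dependence described in the text.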

# III. CT FPAA LINEAR-PHASE FILTER

Building on the delay-line approximation using a TA-based ladder filter cascade, this section shows measurements from a CT linear-phase filter experimentally demonstrated on the SoC FPAA. The analog delay line is an analog approximation of a linear-phase FIR filter. These filters are required where attenuation in the particular medium is low, but the traveling speed is finite and significant given the waveforms being processed, as is the case in acoustic and ultrasound processing for echo cancellation and beamforming.

#### A. Linear Phase Demonstration

The approximate-delay signal temporal decomposition is combined with a VMM implemented in the crossbar routing fabric (Fig. 10). This particular VMM circuit only required positive weights, requiring only one source-coupled FG routing cell per line. The resulting filter is a boxcar filter with a sinusoidal modulation to include a zero to the low-pass filter (Fig. 10). The measured chirp response shows the low-pass frequency response (Fig. 10). This result was an initial demonstration of the filter's functionality.

The delay-line computation is the most challenging component of a linear-phase filter. The ladder filter approach enables significantly higher frequencies to have a near constant delay compared to other delay approximations, such as a $1^{st}$ order all-pass response (e.g. [10], [11], [16]). A $1^{st}$ order all-pass



Fig. 10. Experimental demonstration of a linear-phase filter composed of a ladder filter as a delay line approximation and a source-coupled FG transistor VMM configuration. The output VMM current is transformed to a voltage signal using an on-chip transimpedance amplifier built from two TAs. This VMM used positive weights that were proportional to their programmed values. After applying a chirp signal to $V_{in}$ (100mV peak-to-peak), one can extract the resulting amplitude versus input frequency for the filter.

TABLE I Comparison of This FPAA Ladder-Filter Delay Line Approximation With Other Delay-Line Approximations

|           | Delay<br>(typical) | Power       | IC<br>Process | FOM    | Type         | Prog? |
|-----------|--------------------|-------------|---------------|--------|--------------|-------|
| This Work | $2.9 \mu s$        | $1.5 \mu W$ | 350nm         | 35.5pJ | Ladder, FPAA | P     |
| [16]      | 2.1ns              | 2.1mW       | 350nm         | 36.0pJ | All-pass     | T     |
| [11]      | 25ps               | 4mW         | 40nm          | 62.5pJ | All-pass     | F     |
| [10]      | 75ps               | 5.5mW       | 28nm          | 526pJ  | All-pass     | T     |
| [14]      | 15.3ps             | 8mW         | 65nm          | 29.0pJ | $G_m$-LC     | F     |

Ladder filter approaches allow a wider bandwidth for a given $\tau$ compared to all-pass filters or other responses. The FPAA structure is fully programmable (P), as compared to some globally tunable lines (T), or fixed-delay (F) lines.

response with perfect matching has a similar low-signal attenuation between stages like the ladder-filter approach, as opposed to the smaller bandwidth and signal attenuation from a cascade of LPF blocks (e.g. [33]). For the ladder filter, the resulting power required for a particular delay is directly a function of the load capacitances as well as the supply voltage and signal linear range. Assuming capacitance effectively scales quadratically with IC process, a Figure Of Merit (FOM) is

$$\text{FOM} = \frac{\text{Delay}\,(=\tau) \times \text{Power}}{(\text{IC Process Normalized to } 1\mu m)^2},$$

where a design would prefer a smaller FOM. Including linearity and related measures would modify the FOM, although we will utilize this metric at this stage. Programmability, as in the SoC FPAA, enables local tunability of a single delay element or global tuning of the delay line through a current source bias. This SoC FPAA compiled implementation, for which ladder filters were not envisioned in the IC design, compares well to other techniques implemented in custom Si that did not include the programmable ladder-filter functionality (Table I). An

TABLE II Analog Delay Line Modeling

|              | Delay   | Power                 | Noise                  | SNR              |
|--------------|---------|-----------------------|------------------------|------------------|
| 1 tap        | $\tau$  | $2CV_{dd}V_L/\tau$    | $\sqrt{q\nu V_L/C}$    | $CV_L/(q\nu)$    |
| $k^{th}$ tap | $k\tau$ | $2kCV_{dd}V_L/\tau$   | $\sqrt{q\nu kV_L/C}$   | $CV_L/(q\nu k)$  |

| Property    |             |        |        |        |        |
|-------------|-------------|--------|--------|--------|--------|
| Delay (1)   | $4.2\mu s$  | 1ns    | 1ns    | 100ps  | 25ps   |
| Taps        | 12          | 32     | 32     | 32     | 16     |
| C           | 1pF         | 100fF  | 350fF  | 350fF  | 350fF  |
| $V_L$       | 1V          | 0.2V   | 1V     | 0.5V   | 0.3V   |
| $V_{dd}$    | 2.5V        | 1V     | 1V     | 1V     | 1V     |
| Power       | $14.3\mu W$ | 1.28mW | 22.4mW | 112mW  | 134mW  |
| Noise (max) | 2.4mV       | 5.54mV | 6.62mV | 4.68mV | 2.57mV |
| SNR (min)   | 52.4dB      | 31.2dB | 43.6dB | 40.6dB | 41.4dB |

Summary of analytic expressions and representative numerical values showing the performance of these delay lines. Power is for all of the taps used.
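The analytic expressions of Table II can be evaluated directly. The Python sketch below does so for the column with a 1ns stage delay, 32 taps, 100fF, $V_L$ = 0.2V, and $V_{dd}$ = 1V. The noise factor $\nu$ is not given numerically in this section; the default `nu=3.0` is an assumption chosen because it reproduces the tabulated noise and SNR values. The function name `delay_line_model` is likewise hypothetical.

```python
import math

Q = 1.602e-19  # electron charge (C)


def delay_line_model(taps, tau, cap, v_lin, v_dd, nu=3.0):
    """Evaluate the Table II expressions at the last tap (k = taps).

    nu is an ASSUMED noise factor (not stated in the text); nu = 3
    reproduces the tabulated noise and SNR entries.
    """
    power = 2.0 * taps * cap * v_dd * v_lin / tau        # W, summed over all taps
    noise = math.sqrt(Q * nu * taps * v_lin / cap)       # V, at the k-th tap
    snr_db = 10.0 * math.log10(cap * v_lin / (Q * nu * taps))
    return {"delay": taps * tau, "power": power, "noise": noise, "snr_db": snr_db}


if __name__ == "__main__":
    r = delay_line_model(taps=32, tau=1e-9, cap=100e-15, v_lin=0.2, v_dd=1.0)
    print(f"power = {r['power'] * 1e3:.2f} mW")
    print(f"noise = {r['noise'] * 1e3:.2f} mV")
    print(f"SNR   = {r['snr_db']:.1f} dB")
```

Lowering `v_lin` reduces power linearly but also lowers the SNR, which is the trade-off visible across the columns of the table.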

unoptimized FPAA structure typically has larger capacitances due to the routing infrastructure. If this SoC FPAA structure were scaled to 40nm CMOS, the smaller delays (e.g. 15-75ps) described earlier could be reached [28].
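The FOM column of Table I follows from the delay, power, and process entries in the same table. The Python sketch below reproduces that arithmetic as a check; the helper name `figure_of_merit` and the row tuples are illustrative, with values copied from Table I.

```python
def figure_of_merit(delay_s, power_w, process_um):
    """Delay times power, normalized by (process linewidth in um)^2. Returns joules."""
    return delay_s * power_w / process_um**2


# (label, delay in s, power in W, IC process in um), copied from Table I
ROWS = [
    ("This work", 2.9e-6, 1.5e-6, 0.350),
    ("[16]", 2.1e-9, 2.1e-3, 0.350),
    ("[11]", 25e-12, 4.0e-3, 0.040),
    ("[10]", 75e-12, 5.5e-3, 0.028),
    ("[14]", 15.3e-12, 8.0e-3, 0.065),
]

if __name__ == "__main__":
    for label, delay, power, process in ROWS:
        fom_pj = figure_of_merit(delay, power, process) * 1e12
        print(f"{label:10s} FOM = {fom_pj:6.1f} pJ")
```

Because the metric divides by the square of the linewidth, a design in an older process is not penalized for the larger capacitances that come with it.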

FG rows easily generate several parallel outputs from this single delay line. The VMM structure and source-node interfacing are similar to the VMM+WTA classifiers [29]. The weight values were programmed with above-threshold currents $(1-2\mu A)$. The VMM source voltages are biased 100mV below the power supply, resulting in a 40-80nA bias current. These techniques are directly extendable to negative weights using two FG elements [8]. The output was measured using two TAs in a transimpedance configuration (similar to that used in [1]), where an FG TA is used in the feedback path to convert the current signal into a voltage. This FG TA used a $2\mu$A bias current. As the VMM computation corner frequency was > 50kHz, the bias currents could have been decreased, reducing the power requirements. The transimpedance amplifier required more power ($\approx 4\mu A$ bias current) than our programmed VMM computation (700nA bias current). The total VMM computation required $16\mu W$ with another $10\mu W$ for the transimpedance amplifier. This delay line power, VMM power, and transimpedance power are similar to other delay-line approximations (e.g. [10]), except that the VMM computation requires more power as the tunable parameters are DACs.

## B. Analog FIR System Modeling

Having demonstrated these linear-phase analog filters, one can model these effects and consider how these techniques scale for different applications (Table II). These comparisons start by summarizing the modeling equations for the ladder-filter delay line (Table II). The power consumption for a single delay ( $P_d$ ) and M delay stages (M  $P_d$ ) are modeled:

$$\begin{aligned} P_{d} &= \left(\frac{C_{2}V_{L1}}{\tau_{1}} + \frac{C_{1}V_{L2}}{\tau_{2}}\right)V_{dd} \\ MP_{d} &= M\left(\frac{C_{2}V_{L1}}{\tau_{1}} + \frac{C_{1}V_{L2}}{\tau_{2}}\right)V_{dd} \end{aligned} \tag{29}$$

where  $V_{dd}$  is the power supply. For a symmetric line we get

$$P_d = \frac{2CV_L}{\tau} V_{dd}, \quad M P_d = \frac{2CV_L M}{\tau} V_{dd} \tag{30}$$

TABLE III SOURCE-INPUT VMM MODELING PARAMETERS

| Property         | Bandwidth $(f)$                 | Computation | Power $(P)$       | Noise $(\hat{i}_{out}/(2MI_{s0}))$ | SNR                                     |
|------------------|---------------------------------|-------------|-------------------|------------------------------------|-----------------------------------------|
| Expression       | $\frac{I_{s0}}{4\pi C_d U_T}$   | $N M f$     | $2NMI_{s0}V_{dd}$ | $\sqrt{q/(C_d M U_T)}$             | $10\log_{10}\left(U_T C_d M / q\right)$ |
| 26pA             | 4kHz                            | 400kMAC(/s) | 12.5nW            | 0.5%                               | 46 dB                                   |
| 2.6nA            | 400kHz                          | 40MMAC(/s)  | $1.25\mu W$       | 0.5%                               | 46 dB                                   |
| 260nA            | 40MHz                           | 4GMAC(/s)   | $125\mu W$        | 0.5%                               | 46 dB                                   |
| 26µA             | 4GHz                            | 400GMAC(/s) | 12.5mW            | 0.5%                               | 46 dB                                   |
| MMAC(/s)/$\mu W$ | $\frac{1}{8\pi C_d U_T V_{dd}}$ | 31MMAC/µW   |                   |                                    |                                         |

Computation for 12×8 (M×N) 4-quadrant VMM operation and $V_{dd}$ = 2.5V. $C_d$ is set to 20fF, a typical value for moderate size FG-based switches.

Total noise and SNR at the $k^{th}$ tap for a single stage are

$$\text{noise voltage} = \sqrt{\frac{q\nu kV_L}{C}}, \quad \text{SNR} = \frac{CV_L}{q\nu k}. \tag{31}$$

These equations model the noise floor of the system; in the 12-stage delay line shown previously, the total noise at the last stage would be 2.4mV. Modeling the four-quadrant source-coupled VMM block (adapted from a similar structure in [8]) requires the programmed bias current $(I_{s0})$ for an individual FG element as well as the total capacitance $(C_d)$ from a single FG element node, including the source-drain junctions to well / substrate as well as routing capacitance. A $C_d$ of 20fF is typical for a moderate size 350nm FG device, or typical of a large W/L (e.g. 40) 40nm FG device. The power, computation, and bandwidth all vary proportionally to $I_{s0}$ (Table III) as all these devices are operating with subthreshold currents ($25\mu$A bias current for a larger W/L 40nm process). $V_{dd} = 2.5V$ is used as a common comparison between these devices, although it would scale with IC process (e.g. 1V for 40nm).
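The Table III expressions can be checked numerically. The Python sketch below evaluates bandwidth, computation rate, power, relative noise, and SNR for a 12×8 (M×N) VMM with $C_d$ = 20fF and $V_{dd}$ = 2.5V. The thermal voltage `U_T = 25.8 mV` is an assumed room-temperature value, and the function name `vmm_model` is hypothetical.

```python
import math

Q = 1.602e-19   # electron charge (C)
U_T = 0.0258    # thermal voltage (V); ASSUMED room-temperature value


def vmm_model(i_bias, m=12, n=8, c_d=20e-15, v_dd=2.5):
    """Evaluate the Table III expressions for one programmed bias current (A)."""
    bandwidth = i_bias / (4.0 * math.pi * c_d * U_T)   # Hz
    mac_rate = n * m * bandwidth                       # MAC/s
    power = 2.0 * n * m * i_bias * v_dd                # W
    noise_frac = math.sqrt(Q / (c_d * m * U_T))        # noise relative to 2*M*I_s0
    snr_db = 10.0 * math.log10(U_T * c_d * m / Q)
    return {
        "bandwidth": bandwidth,
        "mac_rate": mac_rate,
        "power": power,
        "noise_frac": noise_frac,
        "snr_db": snr_db,
        "mmac_per_uw": mac_rate / power * 1e-12,       # (MMAC/s) per uW
    }


if __name__ == "__main__":
    for i_bias in (26e-12, 2.6e-9, 260e-9, 26e-6):
        r = vmm_model(i_bias)
        print(f"I={i_bias:.3g} A: f={r['bandwidth']:.3g} Hz, "
              f"{r['mac_rate']:.3g} MAC/s, P={r['power']:.3g} W")
```

Bandwidth, computation rate, and power all scale linearly with the bias current, so the efficiency in MMAC(/s)/µW, the relative noise, and the SNR are the same for every row.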

These computational models allow comparisons of these methods between analog and digital techniques. A 12-tap $4.2\mu$s CT delay, as demonstrated previously, could be part of 8 parallel analog filters with a $12 \times 8$, 400kHz bandwidth VMM requiring 2.6nA bias current per element. The $14.3\mu W$ CT delay line interfaces to the $1.25\mu$W VMM through a pFET transistor that also supplies the VMM current. The VMM module used earlier was programmed to much higher bias currents (65nA) to avoid any effects from the VMM module. The output SNR of these filters would likely span 52 to 65dB depending on the particular application. A similar digital filter requires at least a 1-2$\mu$s sample period, and would lose (by an input filter) or alias (no filter) the higher frequencies. We are not considering the additional cost of any input ADCs or DACs or signal conditioning in these discussions. A $2\mu$s sample period would require 6MMAC(/s) per filter output, requiring 2.5mW for 16bit operations near the energy-efficiency wall [20]. For 8 filter outputs, the operation would require 20mW, slightly more than $1000\times$ the power of the equivalent analog operation. The digital delay line energy is negligible if it is implemented near the multiply units in front of a set of parallel multipliers; moving digital data in and out of a memory bank can drastically increase the resulting energy/power estimate. The resulting digital computation would roughly have 12bit (66dB) SNR at the output in the best case, only slightly higher than the

TABLE IV ENERGY EFFICIENCY CALCULATIONS FOR SPATIO-TEMPORAL FILTERS

|                                         | Delay<br>Power<br>(mW) | VMM<br>Power<br>(mW) | Total<br>Power<br>(mW) | VMM<br>(GMAC(/s)) |
|-----------------------------------------|------------------------|----------------------|------------------------|-------------------|
| Temporal Filter (1 input, N=1)          | 67.2                   | 4.16                 | 71.36                  | 128               |
| Spatio-Temporal Filters (1 input, N=10) | 67.2                   | 41.6                 | 108.8                  | 1280              |
| Temporal Filter (8 input, N=1)          | 537.6                  | 32.6                 | 570.2                  | 1024              |
| Spatio-Temporal Filters (8 input, N=10) | 537.6                  | 326                  | 863.6                  | 10240             |

M=32 taps (3.2ns delay). Target bandwidth of 4GHz (0-4GHz) with 100ps delay stages. $V_{dd}$ at 1V, consistent with a 40nm CMOS process. Output SNR would range from 45 to 55dB depending on the particular application.

resulting analog system. For the analog system, the delay line can be costly compared to other approaches like bandpass wavelet filters, and yet it can match a particular application well and make the overall approach more effective.
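The analog-versus-digital comparison above reduces to a few lines of arithmetic. The Python sketch below restates it using only the figures quoted in the text (12 taps, $2\mu$s sample period, 2.5mW per digital filter output, 8 outputs, $14.3\mu W$ delay line, $1.25\mu W$ VMM). The function names are hypothetical, and ADC, DAC, and signal-conditioning costs are left out, as in the text.

```python
def digital_mac_rate(taps, sample_period_s):
    """MAC operations per second for one FIR output at the given sample period."""
    return taps / sample_period_s


def power_ratio(digital_w, analog_w):
    """How many times more power the digital estimate needs than the analog one."""
    return digital_w / analog_w


TAPS = 12
OUTPUTS = 8
DIGITAL_POWER_PER_OUTPUT = 2.5e-3   # W, 16-bit operations, figure quoted in the text
ANALOG_DELAY_POWER = 14.3e-6        # W, CT delay line
ANALOG_VMM_POWER = 1.25e-6          # W, 12x8 VMM at 2.6 nA per element

if __name__ == "__main__":
    rate = digital_mac_rate(TAPS, 2e-6)
    digital = OUTPUTS * DIGITAL_POWER_PER_OUTPUT
    analog = ANALOG_DELAY_POWER + ANALOG_VMM_POWER
    print(f"digital: {rate / 1e6:.0f} MMAC/s per output, {digital * 1e3:.0f} mW total")
    print(f"analog:  {analog * 1e6:.2f} uW total")
    print(f"ratio:   {power_ratio(digital, analog):.0f}x")
```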

One could extend these modeling discussions towards temporal analog-FIR type filters and their extension to spatio-temporal analog-FIR type filters for low RF (e.g. 4GHz bandwidth) applications, possibly still in a configurable architecture (Table IV). For this architecture, one would likely use a 40nm CMOS FPAA design with $V_{dd} = 1V$; scaling of FPAAs has been addressed elsewhere [27]. Scaling programmable TA-based ladder filters and VMMs results in potentially decreased capacitance size. FPAA capacitance decreases quadratically with decreasing IC process linewidth, resulting in a quadratic increase in energy efficiency. FG programmability enables roughly the same bias current range, a current range that becomes mostly subthreshold for small IC process linewidths (e.g. 40nm and smaller). In these cases, the resulting improvement over digital computation remains at a similar level using specialized digital architectures.

#### IV. SUMMARY AND IMPACT

This work describes a programmable and configurable CT analog (approximate) linear-phase filter utilizing a Ladder Filter as an approximate delay stage and a VMM on a large-scale Field Programmable Analog Array (FPAA). Generally, analog linear-phase filters are custom ASICs designed for a single application (e.g. [12], [13], [15]), losing the wider applicability of programmable FIR filters with linear phase response. This work is a first demonstration of a programmable and reconfigurable approximate analog delay signal decomposition on an FPAA, adding these capabilities for analog computation along with other analog signal decompositions.

These approaches open up a wider range of PDE computations (e.g. [30], [31]) in a programmable Gm-C configurable (e.g. FPAA) environment. CT delay lines built from first-order amplifiers (e.g. [33]) and their linear-phase extensions using all-pass stages (e.g. [10], [11], [16]) are approximations of a first-order hyperbolic ODE with significant spatial diffusion (e.g. heat equation). These techniques extend small ladder filter Gm-C implementations (e.g. [4], [5]) to systematic signal delay lines. These ladder-filter topologies enable second-order time and space wave-guiding based delay lines that bring near constant delay near and past the time constants of each stage and have significantly reduced diffusion and frequency attenuation. The CT delay lines, as opposed to bucket brigade and other sampled systems [32], preserve the full time and frequency information for the next computation.

The FPAA implementation, built using a ladder filter structure and a VMM block, and the experimental measurements demonstrated the CT delay line framework for a linear-phase filter, the CT ladder filter implementation, and the CT linear-phase filter. Programming the VMM weights (**W**) allows compensating amplitude attenuation in the ladder-filter block. This amplitude attenuation can be predicted through the ladder-filter modeling. The theory and modeling can guide the Analog Built-in Self Test (e.g. [25]) to automatically make these corrections before deployment or in the field. Programmable linear-phase analog filters open an area of analog computation that has a challenging implementation history. These techniques scale for different applications, and demonstrate a $1000\times$ improvement over digital computation even in FPAA devices (e.g. [1]).

# Appendix 1: Analytical Model for Distortion of TA Ladder Filter Delay Line

This appendix analyzes the circuit equations to model distortion for the ladder filter delay line. To analyze the coupled ODE system in (27) for distortion measures, assume an input,

$$x_{in} = \epsilon \sin(\omega t) = \epsilon \sin(\omega_1 t_1) \tag{32}$$

with an input amplitude,  $\epsilon$ , and angular frequency,  $\omega = \omega_1/\tau$ . Without loss of generality,  $t_1 \rightarrow t$ , through this derivation. We expect the resulting steady-state output solutions as

$$x_k = x_{k,0}(t) + \epsilon x_{k,1}(t) + \epsilon^2 x_{k,2}(t) + \epsilon^3 x_{k,3}(t) + O(\epsilon^4) \tag{33}$$

For this solution, we will need the Taylor expansion of

$$\tanh(x) = x - \frac{1}{3}x^3 + \frac{2}{15}x^5 + O(x^7). \tag{34}$$

For $x_{k,0}(t)$, given a stable ODE with no forcing input at this order, we get $x_{k,0}(t) = 0$ for all k. The solution for the first-order term requires linearizing $\tanh(x) \approx x$, resulting in the first-order system of equations

$$\frac{dx_{k,1}}{dt} = x_{k-1,1} - x_{k+1,1}, \ x_{0,1} = \sin(\omega_1 t), \tag{35}$$

the mathematical representation of the linear behavior of the ladder-filter line. The output node is sinusoidal at a single frequency because the input is sinusoidal at that frequency.

The resulting functional forms are computed iteratively by solving each stage starting from $V_1(t)$. Starting with an input

$$V_{in}(t) = \sin(\omega_1 t), \quad V_2(t) = \sin(\omega_1(t-1))$$

($\epsilon$ normalized in t), the ODE for $V_1(t)$ is

$$\tau \frac{dV_1(t)}{dt} = \sin(\omega_1 t) - \sin(\omega_1 (t-1)) = 2\cos(\omega_1 (t-1/2))\sin(-\omega_1/2)$$

Fig. 11. Plot of the magnitude for $V_k$ for odd values of k. We find that the magnitude for small $\omega_1 < 1$ is nearly 1, and decreases quickly afterwards.

The solution for the single sine-wave input results in

$$V_1(t) = -\frac{\sin(\omega_1/2)}{\omega_1/2}\cos(\omega_1(t-1/2))$$

where we obtain a $\tau/2$ delay. For a small delay ($\omega_1/2 \ll 1$), the resulting amplitude is 1; for a large delay ($\omega_1/2 \gg 1$), realizing $|\sin(\omega_1/2)| \le 1$, the amplitude is at most $\frac{2}{\omega_1}$. For some $\omega_1$, the amplitude for large delays goes directly to zero. The integration stage keeps the resulting amplitude from being too large. The solution to an input sine wave is

$$\begin{aligned} V_{k,1} &= \sin(\omega_1(t-k/2)), \quad \text{k even} \\ V_{k,1} &= -\frac{\sin(\omega_1/2)}{\omega_1/2}\cos(\omega_1(t-(k+1)/2)), \quad \text{k odd} \end{aligned} \tag{36}$$

as expected for the modeling of the voltage nodes and current nodes, respectively, for an LC ladder filter. Figure 11 shows the normalized frequency response magnitude for the odd stages; the magnitude of the even stages is identical to that of the input.
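The odd-stage magnitude plotted in Fig. 11 is the factor $|\sin(\omega_1/2)/(\omega_1/2)|$ from the first-order solution. The short Python sketch below evaluates it at a few normalized frequencies; the function name `odd_stage_magnitude` is a hypothetical helper.

```python
import math


def odd_stage_magnitude(w1):
    """Magnitude |sin(w1/2) / (w1/2)| of the odd-k nodes for normalized frequency w1."""
    if w1 == 0.0:
        return 1.0  # limit of sin(x)/x as x -> 0
    half = w1 / 2.0
    return abs(math.sin(half) / half)


if __name__ == "__main__":
    for w1 in (0.1, 0.5, 1.0, 2.0, 4.0, 2.0 * math.pi):
        print(f"w1 = {w1:5.2f}: magnitude = {odd_stage_magnitude(w1):.4f}")
```

The magnitude stays close to 1 for $\omega_1 < 1$, never exceeds $2/\omega_1$, and reaches zero at $\omega_1 = 2\pi$.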

The higher-order terms are driven from nonlinear computations of lower order terms. Substituting (33) into (34),

$$\begin{aligned} \tanh(x_{k+1} - x_{k-1}) &= \epsilon(x_{k+1,1}(t) - x_{k-1,1}(t)) \\ &+ \epsilon^2(x_{k+1,2}(t) - x_{k-1,2}(t)) + \epsilon^3(x_{k+1,3}(t) - x_{k-1,3}(t)) \\ &- \frac{1}{3}\left(\epsilon(x_{k+1,1}(t) - x_{k-1,1}(t)) + O(\epsilon^2)\right)^3 + O(\epsilon^4). \end{aligned} \tag{37}$$

For the second order terms, the equations are identical to (35) with no driving term. As a result,  $x_{k,2}(t) = 0$  for all k. The third order terms satisfy the following ODE:

$$\begin{aligned} \frac{dx_{k,3}}{dt} &= x_{k-1,3} - x_{k+1,3} + \left(x_{k+1,1}(t) - x_{k-1,1}(t)\right)^3 \\ &= x_{k-1,3} - x_{k+1,3} + \left(\frac{dx_{k,1}}{dt}\right)^3, \\ x_{m+1,3} &= x_{0,3} = 0. \end{aligned} \tag{38}$$


The sine-wave input solution gives the first-order driving terms:

$$\begin{aligned} \frac{dx_{k,1}}{dt} &= \omega_1 \cos(\omega_1(t - k/2)), \quad \text{(even k)} \\ \left(\frac{dx_{k,1}}{dt}\right)^3 &= \omega_1^3 \cos^3(\omega_1(t - k/2)) \\ &= \omega_1^3 \left(\frac{3}{4}\cos(\omega_1(t - k/2)) + \frac{1}{4}\cos(3\omega_1(t - k/2))\right) \\ \frac{dx_{k,1}}{dt} &= -2\sin\left(\frac{\omega_1}{2}\right)\cos(\omega_1(t - (k + 1)/2)), \quad \text{(odd k)} \\ \left(\frac{dx_{k,1}}{dt}\right)^3 &= -8\sin^3\left(\frac{\omega_1}{2}\right)\cos^3(\omega_1(t - (k + 1)/2)) \\ &= -8\sin^3\left(\frac{\omega_1}{2}\right)\left(\frac{3}{4}\cos(\omega_1(t - (k + 1)/2)) + \frac{1}{4}\cos(3\omega_1(t - (k + 1)/2))\right) \end{aligned} \tag{39}$$

Each stage injects first and third-order harmonic components that propagate down the odd k line

$$\epsilon^{3}\omega_{1}^{3}\left(\frac{3}{4}\cos(\omega_{1}(t-k/2))+\frac{1}{4}\cos(3\omega_{1}(t-k/2))\right),$$

and harmonics at even k of

$$8\epsilon^{3}\sin^{3}\left(\frac{\omega_{1}}{2}\right)\left(\frac{3}{4}\cos(\omega_{1}(t-(k+1)/2))+\frac{1}{4}\cos(3\omega_{1}(t-(k+1)/2))\right),$$

where the even k line are the tap outputs. The resulting third-order components are calculated from these components. The injected components propagate down the delay line, and the signal power of these components would add down the line. The distortion is a static function because the input nonlinearities occur before filtering. The output filtering stage, given that the total current is more than the signal, should contribute only small distortion for the maximum signal. Choosing $\epsilon$ depends on the particular input signal. For typical distortion tests, such as single or two-tone tests, $\epsilon$ is the amplitude. A more complex input requires setting $\epsilon$ to the maximum input size.
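The 3/4 and 1/4 weights in this appendix come from the identity $\cos^3\theta = \frac{3}{4}\cos\theta + \frac{1}{4}\cos 3\theta$. As a hedged numerical check, the Python sketch below projects a waveform onto its cosine harmonics and applies the projection both to $\cos^3\theta$ and to a tanh nonlinearity driven by a small cosine. The helper `cosine_coefficient` and the drive amplitude 0.1 are illustrative assumptions.

```python
import math


def cosine_coefficient(func, harmonic, samples=256):
    """Fourier cosine coefficient of func(theta) at the given harmonic (>= 1).

    Uses a uniform sum over one period, which is exact (to rounding) for
    trigonometric polynomials of degree well below `samples`.
    """
    total = 0.0
    for i in range(samples):
        theta = 2.0 * math.pi * i / samples
        total += func(theta) * math.cos(harmonic * theta)
    return 2.0 * total / samples


if __name__ == "__main__":
    def cube(theta):
        return math.cos(theta) ** 3

    print("cos^3 harmonics 1..3:",
          [round(cosine_coefficient(cube, n), 6) for n in (1, 2, 3)])

    # Third harmonic produced by tanh for a small drive amplitude a;
    # the cubic term of the Taylor series predicts about -a**3 / 12.
    a = 0.1
    third = cosine_coefficient(lambda th: math.tanh(a * math.cos(th)), 3)
    print(f"tanh third harmonic: {third:.3e} (cubic-term estimate {-a**3 / 12:.3e})")
```

For small drive amplitudes the cubic term of the tanh expansion dominates the third harmonic, which is why the analysis stops at third order in $\epsilon$.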

#### REFERENCES

- [1] S. George *et al.*, "A programmable and configurable mixed-mode FPAA SoC," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 6, pp. 2253–2261, Jun. 2016.
- [2] J. Hasler, "Large-scale field-programmable analog arrays," Proc. IEEE, vol. 108, no. 8, pp. 1283–1302, Aug. 2020.
- [3] M. Kucic, A. Low, P. Hasler, and J. Neff, "A programmable continuous-time floating-gate Fourier processor," *IEEE Trans. Circuits Syst. II. Analog Digit. Signal Process.*, vol. 48, no. 1, pp. 90–99, Jan. 2001.
- [4] W. C. Black, D. J. Allstot, and R. A. Reed, "A high performance low power filter," in *Proc. JSSC*, 1980, vol. 15, no. 6, pp. 929–938.
- [5] F. Behbahani, W. Tan, A. Karimi-Sanjaani, A. Roithmeier, and A. A. Abidi, "A broad-band tunable CMOS channel-select filter for a low-IF wireless receiver," *IEEE J. Solid-State Circuits*, vol. 35, no. 4, pp. 476–489, Apr. 2000.
- [6] J. Hasler, "Opportunities in physical computing driven by analog realization," in *Proc. IEEE Int. Conf. Rebooting Comput. (ICRC)*, Oct. 2016, pp. 1–8.
- [7] R. Chawla, A. Bandyopadhyay, V. Srinivasan, and P. Hasler, "A 531 nW/MHz, 128×132 current-mode programmable analog vector-matrix multiplier with over two decades of linearity," in *Proc. IEEE Custom Integr. Circuits Conf.*, Oct. 1992, pp. 651–654.
- [8] C. R. Schlottmann and P. E. Hasler, "A highly dense, low power, programmable analog vector-matrix multiplier: The FPAA implementation," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 1, no. 3, pp. 403–411, Sep. 2011.

- [9] V. Srinivasan, G. Rosen, and P. Hasler, "Low-power realization of FIR filters using current-mode analog design techniques," in *Proc. Asilomar Conf. Signals, Syst. Comput.*, vol. 2, Nov. 2004, pp. 2223–2227.
- [10] E. Mammei et al., "8.3 A power-scalable 7-tap FIR equalizer with tunable active delay line for 10-to-25Gb/s multi-mode fiber EDC in 28nm LP-CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 142–143.
- [11] R. Boesch, K. Zheng, and B. Murmann, "A 0.003 mm<sup>2</sup> 5.2 mW/tap 20 GBd inductor-less 5-tap analog RX-FFE," in *Proc. IEEE Symp. VLSI Circuits*, Jun. 2016, pp. 1–2.
- [12] G. A. De Veirman and R. G. Yamasaki, "A 27 MHz programmable bipolar 0.05 degrees equiripple linear-phase lowpass filter," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Dec. 1992, pp. 64–65.
- [13] N. Rao, V. Balan, and R. Contreras, "A 3-V, 10-100-MHz continuous-time seventh-order 0.05° equiripple linear phase filter," *IEEE J. Solid-State Circuits*, vol. 34, no. 11, pp. 1676–1682, Nov. 1999.
- [14] A. Momtaz and M. M. Green, "An 80 mW 40 Gb/s 7-tap T/2-spaced feed-forward equalizer in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 629–639, Mar. 2010.
- [15] H. Wei Su and Y. Sun, "A CMOS 100 MHz continuous-time seventh order 0.05° equiripple linear phase leapfrog multiple loop feedback G<sub>m</sub>-C filter," in *Proc. IEEE Int. Symp. Circuits Systems. Process.*, May 2002, pp. 1–5.
- [16] G. Gurun, J. S. Zahorian, A. Sisman, M. Karaman, P. E. Hasler, and F. L. Degertekin, "An analog integrated circuit beamformer for high-frequency medical ultrasound imaging," *IEEE Trans. Biomed. Circuits Syst.*, vol. 6, no. 5, pp. 454–467, Oct. 2012.
- [17] H. Qiuting and G. S. Moschytz, "Analog FIR filters with an oversampled Sigma–Delta modulator," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 39, no. 9, pp. 658–663, 1992.
- [18] E. Ozalevli, W. Huang, P. E. Hasler, and D. V. Anderson, "A reconfigurable mixed-signal VLSI implementation of distributed arithmetic used for finite-impulse response filtering," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 2, pp. 510–521, Mar. 2008.
- [19] A. Oppenheim and R. W. Schafer, *Digital Signal Processing*. Upper Saddle River, NJ, USA: Prentice-Hall, 1975.
- [20] H. B. Marr, B. Degnan, P. Hasler, and D. Anderson, "Minimization of energy per op in an asynchronous pipeline above and below threshold," *Proc. IEEE Trans. VLSI*, Feb. 2012, pp. 1–5.
- [21] J. Hasler, "Analog architecture and complexity theory to empowering ultra-low power configurable analog and mixed mode SoC systems," *J. Low Power Electron. Appl.*, vol. 12, pp. 1–37, Jan. 2019.
- [22] R. Chawla, F. Adil, G. Serrano, and P. Hasler, "Programmable G<sub>m</sub>-C filters using floating-gate operational transconductance amplifiers," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 54, no. 3, pp. 481–491, Mar. 2007.
- [23] C. Mead, "Neuromorphic electronic systems," *Proc. IEEE*, vol. 78, no. 10, pp. 1629–1636, Dec. 1990.
- [24] S. Kim, J. Hasler, and S. George, "Integrated floating-gate programming environment for system-level ICs," *IEEE Trans. Very Large Scale Integr.* (VLSI) Syst., vol. 24, no. 6, pp. 2244–2252, Jun. 2016.
- [25] S. Shah and J. Hasler, "Tuning of multiple parameters with a BIST system," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 7, pp. 1772–1780, Jul. 2017.
- [26] V. Srinivasan, G. Serrano, C. M. Twigg, and P. Hasler, "A floating-gate-based programmable CMOS reference," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 11, pp. 3448–3456, Dec. 2008.
- [27] J. Hasler, S. Kim, and F. Adil, "Scaling floating-gate devices predicting behavior for programmable and configurable circuits and systems," *J. Low Power Electron. Appl.*, vol. 6, no. 13, pp. 1–19, 2016.
- [28] J. Hasler and H. Wang, "A fine-grain FPAA fabric for RF+baseband," in *Proc. GOMAC*, Mar. 2015, pp. 1–8.
- [29] J. Hasler and S. Shah, "VMM + WTA embedded classifiers learning algorithm implementable on SoC FPAA devices," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 8, no. 1, pp. 65–76, Mar. 2018.
- [30] E. Afshari, H. S. Bhat, and A. Hajimiri, "Ultrafast analog Fourier transform using 2-D LC lattice," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 8, pp. 2332–2343, Sep. 2008.
- [31] G. N. Lilis, J. Park, W. Lee, G. Li, H. S. Bhat, and E. Afshari, "Harmonic generation using nonlinear LC lattices," *IEEE Trans. Microw. Theory Techn.*, vol. 58, no. 7, pp. 1713–1723, Jul. 2010.
- [32] R. Marston, "Analog delay lines," in *Proc. Radio Electronics*, Oct. 1986, pp. 66–80.
- [33] C. Mead, Analog VLSI and Neural Systems. Reading, MA, USA: Addison-Wesley, 1989.