Model order reduction for multi-terminal circuits

Citation for published version (APA):

Document status and date:
Published: 01/01/2009

Document Version:
Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)

Please check the document version of this publication:
• A submitted manuscript is the version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher’s website.
• The final author version and the galley proof are versions of the publication after peer review.
• The final published version features the final layout of the paper including the volume, issue and page numbers.

Link to publication

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain
• You may freely distribute the URL identifying the publication in the public portal.

If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, please follow below link for the End User Agreement:
www.tue.nl/taverne

Take down policy
If you believe that this document breaches copyright please contact us at:
openaccess@tue.nl
providing details and we will investigate your claim.

Download date: 05. Mar. 2024
Model order reduction for multi-terminal circuits

by

R. Ionutiu, J. Rommes

Centre for Analysis, Scientific computing and Applications
Department of Mathematics and Computer Science
Eindhoven University of Technology
P.O. Box 513
5600 MB Eindhoven, The Netherlands
ISSN: 0926-4507
Model order reduction for multi-terminal circuits

Roxana Ionutiu\textsuperscript{1} and Joost Rommes\textsuperscript{2}

Abstract Analysis of effects due to parasitics is of vital importance during the design of large-scale integrated circuits, since it gives insight into how circuit performance is affected by undesired parasitic effects. Due to the increasing amount of interconnect and metal layers, parasitic extraction and simulation may become very time consuming or even unfeasible. Developments are presented, for reducing systems describing $R$ and $RC$ netlists resulting from parasitic extraction. The methods exploit tools from graph theory to improve sparsity preservation especially for circuits with multi-terminals. Circuit synthesis is applied after model reduction, and the resulting reduced netlists are tested with industrial circuit simulators. With the novel $RC$ reduction method SparseMA, experiments show reduction of 95\% in the number of elements and 68x speed-up in simulation time.

1 Introduction

Analysis of effects due to parasitics is of vital importance during the design of large-scale integrated circuits and derived products. One way to model parasitics is by means of parasitic extraction, which results in large linear $RCL(k)$ networks. In ESD analysis, for instance, the interconnect network is modeled by resistors with resistances that are based on the metal properties. In other (RF) applications one needs $RC$ or even $RCLk$ extractions to deal accurately with higher frequencies as well.

Roxana Ionutiu
Eindhoven University of Technology, Department of Mathematics and Computer Science, P.O. Box 513 5600 MB Eindhoven jointly with the School of Engineering and Science, Jacobs University Bremen, 28725 Bremen, Germany, e-mail: r.ionutiu@tue.nl, r.ionutiu@jacobs-university.de

Joost Rommes
NXP Semiconductors, HTC46-2.206, PostBox WDA-2, NL-5656 AE Eindhoven, e-mail: joost.rommes@nxp.com
The resulting parasitic networks may contain up to millions of resistors, capacitors, and inductors, and hundreds of thousands of internal nodes, and thousands of external nodes (nodes with connections to active elements such as transistors). Simulation of such large networks within reasonable time is often not possible, and including such networks in full system simulations may be even unfeasible. Hence, there is need for much smaller networks that accurately or even exactly describe the behavior of the original network, but allow for fast analysis.

In this chapter we describe recently developed methods for the reduction of large $R$ networks, and present a new approach for the reduction of large $RC$ networks. We show how insights from graph theory, numerical linear algebra, and matrix reordering algorithms can be used to construct a reduced network with the same number of external nodes, but much fewer internal nodes and circuit elements (resistors and capacitors). The approach is illustrated by numerical results.

The chapter is organized as follows. Sect. 2 revisits recent work on reduction of $R$ networks [8, 9]. It provides the basis for understanding how graph theoretical tools can be used to significantly improve the sparsity of the reduced models, which are later synthesized [29] into reduced netlists. Sect. 3 deals with the reduction of $RC$ networks. Sect. 3.1 first reviews an existing method which employs pole analysis and congruence transformations (PACT) [1] to reduce $RC$ netlists with multi-terminals. In Sect. 3.2 the new method Sparse Modal Approximation (SparseMA) is presented, where graph-theoretical tools are brought in to enhance sparsity preservation for the reduced models. The numerical results for both $R$ and $RC$ netlist reduction are presented in Sect. 4. Sect. 5 concludes.

2 Reduction of $R$ networks

In this section we review the approach for reducing $R$ networks, as developed in [8, 9]. Reduction of $R$ networks, i.e., networks that consist of resistors only, is needed in electro-static discharge analysis (ESD) analysis, where large extracted $R$ networks are used to model the interconnect. Accurate modeling of interconnect is required here, since the costs involved may vary from a few cents to millions if, due to interconnect failures, a respin of the chip is needed. An example of a damaged piece of interconnect that was too small to conduct the amount of current is shown in Figure 1.

2.1 Circuit equations and matrices

Kirchhoff’s Current Law and Ohm’s Law for resistors lead to the following system of equations for a resistor network with $N$ resistors (resistor $i$ having resistance $r_i$) and $n$ nodes ($n < N$):
Fig. 1 Example of a piece of interconnect that was damaged because it was too small to conduct the amount of current caused by a peak charge.

\[
\begin{bmatrix}
R & P \\
-P^T & 0
\end{bmatrix}
\begin{bmatrix}
i_b \\
v
\end{bmatrix} =
\begin{bmatrix}
0 \\
i_n
\end{bmatrix},
\]

(1)

where \( R = \text{diag}(r_1, \ldots, r_N) \in \mathbb{R}^{N \times N} \) is the resistor matrix, \( P \in \{-1, 0, 1\}^{N \times n} \) is the incidence matrix, \( i_b \in \mathbb{R}^N \) are the resistor currents, \( i_e \in \mathbb{R}^n \) are the injected node currents, and \( v \in \mathbb{R}^n \) are the node voltages.

The MNA (modified nodal analysis) formulation [11] can be derived from (1) by eliminating the resistor currents \( i_b = -R^{-1}Pv \):

\[
Gv = i_n,
\]

(2)

where \( G = P^T R^{-1}P \in \mathbb{R}^{n \times n} \) is symmetric positive semidefinite. Since currents can only be injected in external nodes, and not in internal nodes of the network, system (2) has the following structure:

\[
\begin{bmatrix}
G_{11} & G_{12} \\
G_{12}^T & G_{22}
\end{bmatrix}
\begin{bmatrix}
v_e \\
v_i
\end{bmatrix} =
\begin{bmatrix}
B \\
0
\end{bmatrix}
i_e
\]

(3)

where \( v_e \in \mathbb{R}^{n_e} \) and \( v_i \in \mathbb{R}^{n_i} \) are the voltages at external and internal nodes, respectively \( (n = n_e + n_i) \), \( i_e \in \mathbb{R}^{n_e} \) are the currents injected in external nodes, \( B \in \{-1, 0, 1\}^{n_e \times n_e} \) is the incidence matrix for the current injections, and \( G_{11} = G_{11}^T \in \mathbb{R}^{n_e \times n_e} \), \( G_{12} \in \mathbb{R}^{n_e \times n_i} \), and \( G_{22} = G_{22}^T \in \mathbb{R}^{n_i \times n_i} \). The block \( G_{11} \) is also referred to as the terminal block.

A current source (with index \( s \)) between terminals \( a \) and \( b \) with current \( j \) results in contributions \( B_{a,s} = 1, B_{b,s} = -1 \), and \( i_e(s) = j \). If current is only injected in a terminal \( a \) (for instance if \( a \) connects the network to the top-level circuit), the contributions are \( B_{a,s} = 1 \) and \( i_e(s) = j \).

Finally, systems (1)–(3) must be made consistent by grounding a node \( \text{gnd} \), i.e., setting \( v(\text{gnd}) = 0 \) and removing the corresponding equations. In the following we will still use the notation \( G \) for the grounded system matrix, if this does not lead to confusion.
2.2 Problem formulation

The problem is: given a very large resistor network described by (1), find an equivalent network with (a) the same external nodes, (b) exactly the same path resistances between external nodes, (c) \( \hat{n} \ll n \) internal nodes, and (d) \( \hat{r} \ll r \) resistors. Additionally, (e) the reduced network must be realizable as a netlist so that it can be (re)used in the design flow as subcircuit of large systems.

Simply eliminating all internal nodes will lead to an equivalent network that satisfies conditions (a)–(c), but violates (d) and (e): for large numbers \( m \) of external nodes, the number of resistors \( \hat{r} = (m^2 - m)/2 \) in the dense reduced network is in general much larger than the number of resistors in the sparse original network (\( r \) of \( O(n) \)), leading to increased memory and CPU requirements.

2.3 Existing approaches

There are several approaches to deal with large resistor networks. In some cases the need for an equivalent reduced network can be circumvented in some way: due to sparsity of the original network, memory usage and computational complexity are in principle not an issue, since solving linear systems with the related conductance matrices is typically of complexity \( O(n^\alpha) \), where \( 1 < \alpha \leq 2 \), instead of the traditional \( O(n^3) \) [17]. Of course, \( \alpha \) depends on the sparsity and will rapidly increase as sparsity decreases. This also explains why eliminating all internal nodes does not work in practice: the large reduction in unknowns is easily undone by the enormous increase in number of resistors, mutually connecting all external nodes.

However, if we want to (re)use the network in full system simulations, a reduced equivalent network is needed to limit simulation times or make simulation possible at all. In [20] approaches based on large-scale graph partitioning packages such as (h)METIS [21] are described, but only applied to small networks. Structure preserving projection methods for model reduction [22, 23], finally, have the disadvantage that they lead to dense reduced-order models if the number of terminals is large. There is commercial software [18, 19] available for the reduction of parasitic reduction networks.

2.4 Improved approach

Knowing that eliminating all internal nodes is not an option and that projection methods lead to dense reduced-order models, we use concepts from matrix reordering algorithms such as AMD [24] and BBBD [25], usually used as preprocessing step for (parallel) LU- or Cholesky-factorization, to determine which nodes to eliminate. The fill-in reducing properties of these methods also guarantee sparsity of the reduced network. Similar ideas have also been used in [20, 26].
Our main motivation for this approach is that large resistor networks in ESD typically are extracted networks with a structure that is related to the underlying (interconnect) layout. Unfortunately, the extracted networks are usually produced by extraction software of which the algorithms are unknown, and hence the structure of the extracted network is difficult to recover. Standard tools from graph theory, however, can be used to recover at least part of the structure.

Our approach can be summarized as follows:

1. The first step is to compute the strongly connected components [12] of the network. The presence of strongly connected components is very natural in extracted networks: a piece of interconnect connecting two other elements such as diodes or transistors, for instance, results in an extracted network with two terminals, disconnected from the rest of the extracted circuit. By splitting the network into connected components, we have simplified the problem of reduction because we can deal with the connected components one by one.

2. The second step is to selectively eliminate internal nodes in the individual connected components. For resistor networks, this can be done using the Schur complement [28], and no approximation error is made. The key here is that those internal nodes are eliminated that give the least fill-in. First, (Constrained) AMD [13] is used to reorder the unknowns such that the terminal nodes will be among the last to eliminate. To find the optimal reduction, internal nodes are eliminated one-by-one in the order computed by AMD, while keeping track of the reduced system with fewest resistors.

   Since the ordering is chosen to minimize fill-in, the resulting reduced matrix is sparse. Note that all operations are exact, i.e., we do not make any approximations. As a result, the path resistances between external nodes remain equal to the path resistances in the original network.

3. Finally, the reduced conductance matrix can be realized as a reduced resistor network that is equivalent to the original network. This is done easily by un-stampaig the values in the $G$ matrix intro the corresponding resistor values and their node connections in the netlist [5]. Since the number of resistors (and number of nodes) is smaller than in the original network, also the resulting netlist is smaller in size.

   An additional reduction could be obtained by removing relatively large resistors from the resulting reduced network. However, this will introduce an approximation error that might be hard to control a priori, since no sharp upper bounds on the error are available [7]. Another issue that is subject to further research is that the optimal ratio of number of (internal) nodes to resistors (sparsity) may also depend on the ratio of number of external to internal nodes, and on the type of simulation that will be done with the network.

   In the following sections we will describe how strongly connected components and fill-in minimizing reorderings can be used for the reduction of $RC$ networks as well.
3 Reduction of RC networks

This section presents the developments for RC netlist reduction, first by reviewing an existing approach called PACT (Pole Analysis via Congruence Transformations). Then, graph-based tools are brought in to enhance sparsity preservation with the novel reduction method, SparseMA (Sparse Modal Approximation).

Following the problem description in [1], consider the modified nodal analysis (MNA) description of an input impedance type RC circuit, driven by input currents:

\[(G + sC)x(s) = Bu(s), \quad (4)\]

where \(x\) denote the node voltages, and \(u\) represent the currents injected into the terminals (also called ports or external nodes). The number of internal nodes is \(n\), and the number of terminals is \(p\), thus \(G \in \mathbb{R}^{(p+n) \times (p+n)}\), \(C \in \mathbb{R}^{(p+n) \times (p+n)}\) and \(B \in \mathbb{R}^{(p+n) \times p}\). A natural choice for the system outputs are the voltage drops at the terminal nodes, i.e., \(y(s) = B^T x(s)\). Thus the transfer function of (4) is the input impedance:

\[Z(s) = \frac{y(s)}{u(s)} = B^T (G + sC)^{-1}B. \quad (5)\]

Modal approximation is a method to reduce (4), by preserving its most dominant eigenmodes. The dominant eigenmodes are a subset of the poles of \(Z(s)\) [i.e. of the generalized eigenvalues \(\Lambda ((-G, C))\)] and can be computed using specialized eigenvalue solvers (SADPA [4] or SAMDP [2,10]). For the complete discussion on modal approximation and its implementation we refer to [3,4,10]. Here, we emphasize that applying modal approximation to reduce (4) directly is unsuitable especially if the underlying RC circuit has many terminals (inputs). This is because modal approximation does not preserve the structure of \(B\) and \(B^T\) during reduction (for ease of understanding we denote the input-output structure loss as non-preservation of terminals) [5]. Modeling the input-output connectivity of the reduced model would require synthesis via controlled sources at the circuit terminals, and furthermore would connect all terminals with one-another [5]. In this chapter we present several alternatives for reducing RC netlists where not only the terminals are preserved, but also the sparsity of the reduced models.

Grouping the node voltages so that \(x_P \in \mathbb{R}^p\) are the voltages measured at the terminal nodes, and \(x_I \in \mathbb{R}^n\) are the voltages at the internal nodes, we can partition (4) as follows:

\[
\begin{pmatrix}
G_P & G_C \\
G_C & G_I
\end{pmatrix}
+ s
\begin{pmatrix}
C_P & C_C \\
C_C & C_I
\end{pmatrix}
\begin{bmatrix}
x_P \\
x_I
\end{bmatrix}
= \begin{bmatrix}
B_P \\
0
\end{bmatrix} u. \quad (6)
\]

Since no current is injected into internal nodes, the non-zero contribution from the input is \(B_P \in \mathbb{R}^{(p \times p)}\). Eliminating \(x_I\), system (6) is equivalent to:
In (7) the matrix blocks \((G_P + sC_P)\) corresponding to the circuit terminals are isolated. Applying modal approximation on \(Y_I(s)\) would reduce the system and preserve the location of the terminals. This would involve for instance computing the dominant eigenmodes of \((-G_I, C_I)\) via a variant of SAMDP (called here frequency dependent SAMDP, because the input-output matrices \((G_C + sC_C)\) depend on the frequency \(s\)). We have implemented this approach, but it turns out that a large number of dominant eigenmodes of \((-G_I, C_I)\) would be needed to capture the DC and offset of the full system \(Y(s)\). Instead, two alternatives are presented that improve the quality of the approximation: an existing method called PACT (Pole Analysis via Congruence Transformations) [1] and a novel graph-based reduction called SparseMA (Sparse Modal Approximation).

### 3.1 Existing method: PACT

In [1] the authors propose to capture the DC and offset of \(Y(s)\) via a congruence transformation which reveals the first two moments of \(Y(s)\) as follows. Since \(G_I\) is symmetric positive definite, the Cholesky factorization \(LL^T = G_I\) exists. Using the following congruence transformation:

\[
X = \begin{bmatrix}
1 & 0 \\
-G_I^{-1}G_C & L^{-T}
\end{bmatrix}, \quad G' = X^T G X = \begin{bmatrix} G'_P & 0 \\ 0 & 1 \end{bmatrix}, \quad C' = X^T C X = \begin{bmatrix} C'_P & C'_I \\ C'_C & C'_I \end{bmatrix}
\]

equations (7), (8) are rewritten as:

\[
\begin{align*}
[(G'_P + sC'_P) - s^2 C'_C (I + sC'_I)^{-1} C'_C] x'_P &= B_P u \quad (10) \\
Y'_P(s) - s^2 C'_C (I + sC'_I)^{-1} C'_C] x'_P &= B_P u
\end{align*}
\]

\[
Y'(s) = Y'_P(s) - Y'_I(s), \quad (11)
\]

where:

\[
G'_P = G_P - G_T G_C , \quad M = G_I^{-1} G_C \quad (12)
\]

\[
C'_P = C_P - N^T M - M^T C_C, \quad N = C_C - C_I M \quad (13)
\]

\[
C'_C = L^{-1} N, \quad C'_I = L^{-1} C_I L^{-T}. \quad (14)
\]

In (10), the term \(Y'_P(s)\) captures the first two moments of \(Y'(s)\) and is preserved in the reduced model. The reduction is performed on \(Y_I(s)\) only. In [1] this is done via modal approximation as described next. Using the symmetric eigendecomposition
\[ C_I' = U \Lambda_I' U^T, \quad U^T U = I, \]

the system matrices (9) are block diagonalized as follows:

\[
\begin{align*}
X' &= \begin{bmatrix} 1 & 0 \\ 0 & U \end{bmatrix}, \quad G'' = X'^T G X' = \begin{bmatrix} G^p & 0 \\ 0 & 1 \end{bmatrix} = G', \\
C'' &= X'^T C X' = \begin{bmatrix} C^p & C^T C U \\ U^T C^T + U^T C U \\ C^T C & U \end{bmatrix}, \\
Y''(s) &= Y_p'(s) - s^2 k \sum_{i=1}^{k} \frac{r_i^T r_i}{1 + s \lambda_i}, \quad r_i^T = C^T C [i, i], \quad \lambda_i = \Lambda_I'[i,i].
\end{align*}
\] (15)

The reduced model is obtained by selecting only \( k \) of the \( n \) eigenvalues from \( \Lambda_I' \):

\[
Y_k''(s) = Y_p'(s) - s^2 \sum_{i=1}^{k} \frac{r_i^T r_i}{1 + s \lambda_i}, \quad r_i^T = C^T C [i, i], \quad \lambda_i = \Lambda_I'[i,i].
\] (18)

In [1], a selection criterion for \( \lambda_i' \), \( i = 1 \ldots k \) is proposed, based on a user-specified error and a maximum frequency. These eigenmodes are computed in [1] via the Lanczos algorithm. The criterion proposed in [3, 10] can also be used to compute the dominant eigenmodes \( \lambda_i' \) via SAMDP.

The advantage of the PACT reduction method is the preservation of the first two moments of \( Y(s) \) in \( Y_p'(s) \). This ensures that the DC and offset of the response is approximated well in the reduced model. The main costs of such an approach are:

1. Performing a Cholesky factorization of \( C_I \) (which becomes expensive when \( n \) is very large),
2. Solving an eigenvalue problem from a dense \( C_I \) matrix and, most importantly,
3. The fill-in in the port block matrices \( G_p, C_p \) and \( C_c \).

It turns out that (2) can be solved more efficiently by keeping \( C_I' \) as a product of sparse matrices during computation, and will be addressed elsewhere. Avoiding problems (1) and (3) however require new strategies to improve sparsity, and are presented in Sect. 3.2.

The fill-in introduced in \( G_p, C_p \) becomes especially important for RC netlists with many terminals \([p \sim O(10^3)]\). Compared to the original model where the port blocks \( G_p, C_p \) were sparse, the dense \( G_p, C_p \) will yield many \( R \) and \( C \) components during synthesis, resulting in a reduced netlist where almost all the nodes are interconnected. Simulating such netlists might require longer time measures than the original circuit simulation, hence sparser reduced models (and netlists) are desired.

Next, we present several ideas for improving the sparsity of RC reduced models via a combination of tools including: netlist partitioning, graph-based node reordering strategies, and efficient algorithms for modal approximation.

### 3.2 Improved graph-based method: SparseMA

In this section we present an improved model reduction method for RC circuits, which overcomes the disadvantages of PACT: it requires no matrix factorizations prior to reduction, performs all numerical computations on sparse matrices, and
most importantly, preserves the sparsity of the matrix blocks corresponding to the external nodes. The method is called **sparse modal approximation (SparseMA)** and uses tools from graph theory to identify a partitioning and reordering of nodes that, when applied prior to the model reduction step, can significantly improve the sparsity of the reduced model.

The idea is to reorder the nodes in the $RC$ netlist so that some of the internal nodes ($m$) are promoted as external nodes, together with the circuit terminals ($p$). We will denote as **selected nodes** the collection of $p + m$ terminals and promoted internal nodes. The $n - m$ internal nodes are the **remaining nodes**. Supposing one has already identified such a partitioning of nodes, the following structure is revealed, where without loss of generality we assume the selected nodes appear in the border of the $G$ and $C$ matrices:

$$
\begin{bmatrix}
G_R & G_K \\
G_K^T & G_S
\end{bmatrix} + s
\begin{bmatrix}
C_R & C_K \\
C_K^T & C_S
\end{bmatrix}
\begin{x} x_R \\
x_S
\end{bmatrix} = \begin{bmatrix} 0 \\
B_S
\end{bmatrix} u. \tag{19}
$$

Note that in $B_S$ the rows corresponding to the promoted $m$ internal nodes are still zero. Similarly to (7), the admittance is expressed as:

$$
\begin{bmatrix}
(G_S + sC_S) - (G_K + sC_K)^T (G_R + sC_R)^{-1} (G_K + sC_K)
\end{bmatrix} x_S = B_S u \tag{20}
$$

$$
Y(s) = Y_S(s) - Y_R(s). \tag{21}
$$

Recall that reducing $Y_I(s)$ directly from the simple partitioning (6) and (7) is not a method of choice, because by preserving $Y_P(s)$ only, the DC and offset of $Y(s)$ would not be accurately matched. Using instead the improved partitioning (19) and (20), one aims at better approximating the DC and offset of $Y(s)$ by preserving $Y_S(s)$ (which now encaptures not only the external nodes but also a subset of the internal nodes). Finding the partitioning (19) only requires a reordering of nodes, thus no Cholesky factorization or fill-introducing congruence transformation is needed prior to the MOR step. One can reduce $Y_R(s)$ directly with modal approximation (via frequency dependent SAMDP), and preserve the sparsity of the extended port blocks from $Y_S(s)$.

By interpolating $k$ dominant eigenmodes from the symmetric eigendecoposition $[A_R, V] = \text{eig}(-G_R, C_R)$, the reduced model is obtained:

$$
Y_k(s) = Y_S(s) - \sum_{i=1}^{k} \frac{q_i^T q_i}{1 + s\lambda_i}, \quad q_i^T = (G_K + sC_K)^T V_{[:,i]}, \quad \lambda_i = A_{R[i,i]}. \tag{22}
$$

In matrix terms, the reduced model is easily constructed by re-connecting the preserved selected matrix blocks to the reduced blocks:

$$
\begin{bmatrix}
G_R & G_K \\
G_K^T & G_S
\end{bmatrix} + s
\begin{bmatrix}
C_R & C_K \\
C_K^T & C_S
\end{bmatrix}
\begin{x} x_R \\
x_S
\end{bmatrix} = \begin{bmatrix} 0 \\
B_S
\end{bmatrix} u. \tag{23}
$$
where:

$$
\tilde{G}_R = V^T_{\cdot,1:k} G_R V_{\cdot,1:k} \rightarrow \text{diagonal}, \quad \tilde{G}_K = V^T_{\cdot,1:k} G_K, \quad G_S \rightarrow \text{sparse} \tag{24}
$$

$$
\tilde{C}_R = V^T_{\cdot,1:k} C_R V_{\cdot,1:k} \rightarrow \text{diagonal}, \quad \tilde{C}_K = V^T_{\cdot,1:k} C_K, \quad C_S \rightarrow \text{sparse}. \tag{25}
$$

The remaining problem is how to determine the selected nodes and the partitioning (19). Inspired from the results obtained for $R$ networks, we propose to first find the permutation $P$ which identifies the strongly connected components (sccs) of $G$. Both $G$ and $C$ are reordered according to $P$, revealing the structure (19). With this permutation, the circuit terminals are redistributed according to the sccs of $G$, and several clusters of nodes can be identified: a large component consisting of internal nodes and very few (or no) terminals, and clusters formed each by internal nodes plus some terminals. We propose to leave all clusters consisting of internal nodes and terminals intact, and denote these nodes as the selected nodes mentioned above. If there are still terminals outside these clusters, they are added to these selected nodes and complete the blocks $G_S$, $C_S$. The remaining cluster of internal nodes forms $G_R$ and $C_R$. The model reduction step is performed on $G_R$ and $C_R$ (and implicitly on $G_K$ and $C_K$). We also note that matrices $G_K$ and $C_K$ resulting from this partitioning usually have many zero columns, thus $\tilde{G}_K$ and $\tilde{C}_K$ will preserve these zero columns.

The procedure is illustrated in Sect. 4 through a medium-sized example. Larger netlists can be treated via a similar reordering and partitioning strategy, possibly in a recursive manner (for instance when after an initial reordering the number of selected nodes is too large, the same partitioning strategy could be re-applied to $G_S$ and $C_S$ and further reduce these blocks). Certainly, other reorderings of $G$ and $C$ could be exploited, for instance according to a permutation which identifies the sccs of $C$ instead of $G$. The choice for either using $G$ or $C$ to determine the permutation $P$ is made according to the structure of the underlying system and may depend on the application. We also emphasize that the reduced models for both PACT and SparseMA are passive and therefore also stable. Passivity is ensured by the fact that all transformations applied throughout are congruence transformations on symmetric positive definite matrices, thus the reduced system matrices remain symmetric positive definite.

4 Numerical results

The graph-based reduction procedures were applied on several networks resulting from parasitic extraction. We present results for both $R$ and $RC$ networks.
4.1 R network reduction

Table 1 shows results for three resistor networks of realistic interconnect layouts. The number of nodes is reduced by a factor $> 10$ and the number of resistors by a factor $> 3$. As a result, the computing time for calculating path resistances in the original network (including nonlinear elements such as diodes) is 10 times smaller.

Table 1 Results of reduction algorithm

<table>
<thead>
<tr>
<th></th>
<th>Network I</th>
<th></th>
<th>Network II</th>
<th></th>
<th>Network III</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Original</td>
<td>Reduced</td>
<td>Original</td>
<td>Reduced</td>
<td>Original</td>
<td>Reduced</td>
</tr>
<tr>
<td>#external nodes</td>
<td>274</td>
<td>3399</td>
<td>1978</td>
<td>2015</td>
<td>1978</td>
<td>1902</td>
</tr>
<tr>
<td>#internal nodes</td>
<td>5558</td>
<td>516</td>
<td>99112</td>
<td>6012</td>
<td>101571</td>
<td>1902</td>
</tr>
<tr>
<td>#resistors</td>
<td>8997</td>
<td>1505</td>
<td>161183</td>
<td>62685</td>
<td>164213</td>
<td>39011</td>
</tr>
<tr>
<td>CPU time</td>
<td>10 s</td>
<td>1 s</td>
<td>67 hrs</td>
<td>7 hrs</td>
<td>20 hrs</td>
<td>2 hrs</td>
</tr>
<tr>
<td>Speed up</td>
<td>10x</td>
<td>9.5x</td>
<td></td>
<td></td>
<td>10x</td>
<td></td>
</tr>
</tbody>
</table>

4.2 RC network reduction

We reduce an RC netlist with $n = 3231$ internal nodes and $p = 22$ terminals (external nodes). The structure of the original G and C matrices is shown in Figures 2 and 3, where the $p = 22$ terminals correspond to their first 22 rows and columns.

The permutation revealing the strongly connected components of G reorders the matrices as shown in Figures 4 and 5. The reordering is especially visible in the “arrow-form” capacitance matrix. There, the $p = 22$ terminal nodes together with $m = 40$ internal nodes are promoted to the border, revealing the 62 selected nodes that will be preserved in the reduced model [i.e. the $G_S$ and $C_S$ blocks in (19)]. The
first $n - m = 3191$ nodes are the remaining internal nodes and form the $G_R$ and $C_R$ blocks in (19). The $G_K$ block has only 1 non-zero column, and also in $C_K$ many zero columns can be identified.

The reduced SparseMA model is obtained according to (22) and (23) and is shown in Figures 6 and 7. The internal blocks $G_R$ and $C_R$ were reduced from dimension 3191 to $\hat{G}_R$ and $\hat{C}_R$ of dimension $k = 7$, by interpolating the 7 most dominant eigenmodes of $[A_R, V] = \text{eig}(-G_R, C_R)$. Note that $\hat{G}_R$ and $\hat{C}_R$ are diagonal. The selected 62 nodes corresponding to the $G_S$ and $C_S$ blocks are preserved, evidently preserving sparsity. The only fill-in introduced by the proposed reduction procedure is in the non-zero columns of $G_K$ and $C_K$. It is worth noticing that $G_K$ only has 1 non-zero column, thus remains sparse.
The sparsity structure of the PACT reduced model (18) is shown in Figures 13 and 9. The blocks corresponding to the first 22 nodes (the preserved external nodes) are full, as are the capacitive connection blocks to the reduced internal part. Only the reduced internal blocks remain sparse (diagonal).

Aside from sparsity preservation, one is interested in the quality of the approximation for the reduced model. In Fig. 10, we show that the SparseMA model accurately matches the original response for a wide frequency range ($1 \text{ Hz} \rightarrow 10 \text{ THz}$). The Pstar [6] simulations of the synthesized model are identical to the Matlab simulations (the synthesized model was obtained via the RLCSYN unstamping procedure [7, 30]). In Fig. 11, the relative errors between the original model and three reduced models are presented: SparseMA, PACT and the commercial software Jivaro [18]. The SparseMA model is the most accurate for the entire frequency range.
Fig. 12 shows a different AC circuit simulation, where the SparseMA model performs comparably to the reduced model obtained with the commercial software Jivaro [18]. Finally, the transient simulation in Fig. 13 confirms that the SparseMA model is both accurate and stable.

Table 2 shows the reduction results for the RC network. For the 3 reduced models: SparseMA, PACT and Jivaro we assess the effect of the reduction by means of several factors. With all methods, both the number of nodes and the number of circuit elements was reduced significantly, resulting in at least 68x speed-up in AC simulation time. It should be noted that the SparseMA model and the Jivaro model have lower ratios of \# elements/\# unknowns and \# elements/\# nodes than the PACT model. Even though the Jivaro and the PACT model are faster to simulate for this network, the SparseMA model gives a good trade-off between approximation quality, sparsity preservation and CPU speed-up. Recall that the matrix blocks corresponding to the circuit terminals become dense with PACT, but remain sparse with SparseMA. As for circuits with more terminals \( \sim O(10^3) \) the corresponding matrix blocks become larger, preserving their sparsity via SparseMA is an additional advantage. Hence, the improvement on simulation time could be greater with SparseMA when applied on larger models with many terminals.

5 Concluding remarks

New approaches were presented for reducing R and RC circuits with multi-terminals, using tools from graph theory. It was shown how netlist partitioning and node reordering strategies can be combined with existing model reduction techniques, to improve the sparsity of the reduced RC models and implicitly their simulation time. The proposed sparsity preserving method, SparseMA, performs comparably to the
Table 2 Results with SparseMA reduction on RC netlist

<table>
<thead>
<tr>
<th></th>
<th>Original</th>
<th>Red. SparseMA</th>
<th>Red. PACT</th>
<th>Red. Jivaro</th>
</tr>
</thead>
<tbody>
<tr>
<td>#external nodes</td>
<td>3231</td>
<td>47</td>
<td>7</td>
<td>12</td>
</tr>
<tr>
<td>#internal nodes</td>
<td>3253</td>
<td>69</td>
<td>29</td>
<td>34</td>
</tr>
<tr>
<td>#unknowns</td>
<td>3466</td>
<td>383</td>
<td>414</td>
<td>97</td>
</tr>
<tr>
<td>#resistors</td>
<td>7944</td>
<td>73</td>
<td>68</td>
<td>28</td>
</tr>
<tr>
<td>#capacitors</td>
<td>3466</td>
<td>383</td>
<td>414</td>
<td>97</td>
</tr>
<tr>
<td>#elements</td>
<td>3.53</td>
<td>9.8</td>
<td>68.8</td>
<td>10.4</td>
</tr>
<tr>
<td>#int. nodes</td>
<td>3.5</td>
<td>6.7</td>
<td>16.6</td>
<td>3.67</td>
</tr>
<tr>
<td>CPU time</td>
<td>6.8 s</td>
<td>0.1 s</td>
<td>0.06 s</td>
<td>0.02 s</td>
</tr>
<tr>
<td>Speed up</td>
<td>68x</td>
<td>113x</td>
<td>340x</td>
<td></td>
</tr>
</tbody>
</table>

Acknowledgements The first author acknowledges the Marie Curie Fellowship Programme, COMSON project MRTN-CT-2005-019417, for the work within TU/Eindhoven and NXP Semiconductors, Eindhoven, the Netherlands. The second author was supported by the O-MOORE-NICE! project, Marie Curie Actions of FP6, MTKI-CT-2006-042477.

References