Automated Extraction of Expert Knowledge in Analog Topology Selection and Sizing
Trent McConaghy\textsuperscript{1,2}, Pieter Palmers\textsuperscript{1}, Georges Gielen\textsuperscript{1}, Michiel Steyaert\textsuperscript{1}

1 ESAT-MICAS, K. U. Leuven, Kasteelpark Arenberg 10, Leuven, Belgium
2 Solido Design Automation Inc., Saskatoon, Canada


ABSTRACT
This paper presents a methodology for analog designers to maintain their insights into the relationship among performance specifications, topology choice, and sizing variables, despite those insights being constantly challenged by changing process nodes and new specs. The methodology is to take a data-mining perspective on a Pareto Optimal Set of sized analog circuit topologies, and then to perform: extraction of a specs-to-topology decision tree; global nonlinear sensitivity analysis on topology and sizing variables; and determination of analytical expressions of performance tradeoffs. These approaches are all complementary as they answer different designer questions. Once the knowledge is extracted, it can be readily distributed to help other designers, without needing further synthesis. Results are shown for operational amplifier design on a database containing thousands of Pareto Optimal designs across five objectives.

1. INTRODUCTION
Analog designers use their experience and intuition to choose circuit topologies and to design new topologies. Unfortunately, the topology used may not be optimal, with possible adverse effects on the related product’s performance, power, area, yield, and profitability. The design may be suboptimal because the designer is on an unfamiliar process node, is time-constrained, or simply lacks deep experience (it is well recognized that analog design takes decades to master [1]). Whatever the cause, the result is the same: a suboptimal topology may be used. Hence, it is desirable to provide support for the designer in topology selection and design, and ideally to catalyze the learning process. Prior CAD research has focused on automated topology selection and design (with nice successes [2]), but has had little emphasis on giving insight back to the user. In fact, by deferring control to automated tools, a designer’s learning might slow. Even worse, the designer could end up poorly equipped when problems arise.

This paper asks: is there a means to help analog designers maintain and build expert topology-performance knowledge? The starting point is a recent innovation, which traverses thousands of circuit topologies to automatically generate a database of Pareto-optimal sized circuits [3]. Contributions are:

1. A data-mining perspective on the database to extract the following expert knowledge: (a) a specs-to-topology decision tree, (b) global nonlinear sensitivities on topology and sizing variables, and (c) analytical performance-tradeoff models.
2. A suggested flow in which even reluctant users can conveniently use the extracted knowledge (Figure 1). The database generation and knowledge extraction only needs to be done once per process node, e.g. by a single designer or a modeling group. The knowledge can be stored in a document (e.g. pdf, html), and simply made available to other designers.

Figure 1: Target flow. The extracted knowledge is readily available to all designers, without requiring them to invoke automated sizing.

This paper’s knowledge extraction procedures will be explained using a reference database, generated as described in Section 2. Section 3 describes how a specs-to-topology decision tree is extracted from the database. Section 3.1 describes extraction of global nonlinear sensitivities, and Section 4 describes extraction of analytical tradeoff models. Section 5 concludes.

2. GENERATION OF DATABASE
This section describes the setup to generate the sized-topologies database. Table 1 lists the search space and goals. The technology was 0.18 μm CMOS with 1.8 V supply voltage. The output DC voltage was 0.9 V, and load capacitance 1 pF. HSPICE\textsuperscript{TM} was the simulator.

<table>
<thead>
<tr>
<th>Table 1: Problem Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Search Space</td>
</tr>
<tr>
<td>Objectives</td>
</tr>
<tr>
<td>Constraints</td>
</tr>
</tbody>
</table>
We use MOJITO [3] search, which is an evolutionary algorithm with an age-layered population structure [4] to prevent premature convergence, and NSGA-II at each layer to handle constrained multi-objective optimization [5]. Settings were: 100 individuals per age layer; 10 age layers; maximum age per layer: 19, 39, ..., 159, 179, ∞. While MOJITO is tuned for cell-level circuits like op amps and bias generators, reference [6] generates system-level multi-objective multi-topology databases, e.g. for ADCs. The knowledge-extraction approaches of this paper apply to any technique that can generate such a database.
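To make the notion of a Pareto-optimal database concrete, the following is a minimal sketch of nondominated filtering, the basic test that multi-objective selection relies on. It is not MOJITO or NSGA-II: the function names `dominates` and `pareto_front` and the toy (power, negated gain) values are illustrative assumptions, and every objective is assumed to be minimized.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical designs as (power in mW, negated gain in dB); both are minimized,
# so negating gain expresses "maximize gain".
designs = [(1.0, -40.0), (2.0, -60.0), (2.5, -55.0), (3.0, -80.0), (1.5, -35.0)]
front = pareto_front(designs)
print(front)
```

In this toy set, the designs at 2.5 mW and 1.5 mW are dropped because a cheaper design with higher gain exists for each of them.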

3. EXTRACTION OF SPECS-TO-TOPOLOGY DECISION TREE

This section describes the automatic extraction of decision (CART) trees [7] that map from performance values to topology choices. Decision trees have a double use: they can directly suggest a choice based on inputs, yet also expose the series of steps underlying the decision. CART trees are in widespread use in fields such as medicine: “In medical decision making (classification, diagnosing, etc.) there are many situations where decision must be made effectively and reliably. … Decision trees are a reliable and effective decision making technique that provide high classification accuracy with a simple representation of gathered knowledge.” Decision trees have not gone unnoticed in analog CAD either, as they have been proposed as the centerpiece of topology-choosing “expert systems”, e.g. [9]. Unfortunately, these trees had to be constructed manually, which took weeks to months of effort, and they were based on rules of thumb that became obsolete as soon as the process node changed. In contrast, this paper constructs the specs-to-topology decision tree automatically from data. This is only possible now, because a prerequisite to get the data was a competent multi-topology multi-objective sizer that could output a diverse set of topologies.

As input to the tree construction, each topology was assigned a fitness degree [5]. The resulting tree shows which topologies are appropriate for which performance ranges, and even gives a suggested topology from a set of input specs. We see that low-frequency gain ($A_{dc}$) is the first variable selected on, and following through the tree, we see that all specifications play a role for selecting some topologies: gain-bandwidth (GBW), power, slew rate (SR), and dynamic range (DR). When specifications require low gain, the tree suggests single-stage topologies, and two stages when higher gain is required. In cases where very large gain is required with a limited power budget, a two-stage amplifier with large degrees of cascoding (K) is suggested. If power is less of an issue, one can also use a non-cascoded two-stage amplifier (G). Since only Pareto-optimal individuals are used to generate the tree, the choice for the more power-efficient variant implies lower performance for one or more other metrics (in this case e.g. dynamic range). Also reassuring is that while there were thousands of possible topologies, just 15 were returned. This is in line with many analog designers’ expectation that just a couple dozen opamp topologies serve most purposes. The challenge, of course, is which topologies those are, and for what specs they are appropriate.

It is important to remember that the tree is a classifier at its core, which can help avoid reading too much into it. To aid understanding, we outline its construction. The algorithm starts with just a root node holding all data points. From among all possible combinations of \( [\text{split\_variable}, \text{split\_value}] \), it chooses the one that splits off the most data points. That split creates a left and right child, each getting a subset of the data according to the chosen variable and value. The algorithm recurses, splitting each leaf node until there is just one sample at each leaf node or another stopping criterion is hit. There are CART extensions to capture sensitivities to exact split values, but this is at a cost of additional complexity in the reported tree. Another extension is for the user to give preference to choosing certain split variables first, which may result in interesting alternative trees.
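The recursive construction outlined above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it scores candidate splits with Gini impurity (the criterion commonly associated with CART, assumed here), stops only when a node is pure or cannot be split, and uses made-up spec values and topology labels. The names `gini`, `best_split`, `build_tree`, and `predict` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Try every [split_variable, split_value] pair; return (score, variable, value) with lowest impurity."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0  # candidate value midway between neighbours
            left = [lab for row, lab in zip(X, y) if row[j] <= thr]
            right = [lab for row, lab in zip(X, y) if row[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, thr)
    return best


def build_tree(X, y):
    """Return a leaf (a topology label) or a node (variable, value, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:  # all rows identical, so no split is possible
        return majority
    _, j, thr = split
    left = [(r, lab) for r, lab in zip(X, y) if r[j] <= thr]
    right = [(r, lab) for r, lab in zip(X, y) if r[j] > thr]
    return (j, thr,
            build_tree([r for r, _ in left], [lab for _, lab in left]),
            build_tree([r for r, _ in right], [lab for _, lab in right]))


def predict(tree, x):
    """Follow the splits from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if x[j] <= thr else right
    return tree


# Hypothetical Pareto-optimal points: [A_dc in dB, power in mW] -> topology label.
X_specs = [[40, 1.0], [45, 1.2], [70, 2.0], [75, 2.5], [90, 2.2], [95, 2.4]]
y_topology = ["one-stage", "one-stage", "two-stage", "two-stage",
              "two-stage cascoded", "two-stage cascoded"]
tree = build_tree(X_specs, y_topology)
print(predict(tree, [60, 2.0]))
```

On this toy data the root splits on the first variable (gain), and the query at 60 dB falls into the plain two-stage leaf.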

Tree extraction has an additional benefit when there are more than 2-3 objectives, because the raw data is then difficult to visualize: the tree gives an alternate perspective across the 5 objectives, highlighting which topologies cover which performance regions.

3.1 GLOBAL NONLINEAR SENSITIVITY ANALYSIS

The aim here is to address questions such as: “how much does each topology choice matter? Should I be changing the topology or device sizes? Which block or variables should I change?”

There may even be more specific questions, such as “how much does cascoding affect gain?” Our approach to handle such questions is to perform global nonlinear sensitivity analysis. We need to be global -- across the range of variables -- because we have thousands of training points, and one cannot do small perturbations on integer-valued design variables such as topology-choice variables.

We cannot assume linearity for three reasons: the analysis is not local, so a Taylor approximation does not apply; topology-choice variables are categorical; and small ad-hoc tests showed that linear models fit poorly.

The sensitivity extraction flow we follow for each performance metric \( y \) is:

1. Given: a set of \( \{ \text{X}, \text{y} \} = \{ x_k, y_k \}, k=1..N \) Pareto-optimal points where \( x_k \) is a d-dimensional topology/sizing input point and \( y_k \) is a corresponding performance value
2. Build a regression model \( m \) that maps \( \text{X} \) to \( \text{y} \)
3. From \( m \), compute nonparametric sensitivities \( \varepsilon = \{ e_i \}, i=1..d \)
4. Return \( \varepsilon \)

Steps 2 and 3 have specific challenges. Step 2, regressor construction, needs to handle numerical and categorical input variables, which prevents usage of polynomials, splines / piecewise polynomials, support vector machines, kriging, and neural networks. CAFFEINE [10] works on categorical variables, but it would run very slowly on 50 input variables and 1500 training samples. A CART tree is not appropriate because the model needs to do regression, not classification. However, a relatively recent technology achieves the effect of regression on CART trees by boosting them: stochastic gradient boosting (SGB) [11]. SGB also has acceptable scaling and prediction properties, so we employ it here.
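As a rough illustration of how boosting turns small trees into a regressor, the sketch below implements stochastic gradient boosting for squared-error loss. It is a simplification under stated assumptions, not the paper's code: each stage is a depth-1 regression stump rather than a deeper tree, the stage-count variable is treated as an ordered integer rather than a true categorical, and the toy data and the names `fit_stump`, `fit_sgb`, and `sgb_predict` are hypothetical.

```python
import random


def fit_stump(X, r):
    """Least-squares stump: one [variable, threshold] split with a mean residual per side."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X))[:-1]:  # both sides stay non-empty
            left = [ri for row, ri in zip(X, r) if row[j] <= thr]
            right = [ri for row, ri in zip(X, r) if row[j] > thr]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    if best is None:  # no variable varies in this sample: predict the mean everywhere
        mean = sum(r) / len(r)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, x):
    j, thr, ml, mr = stump
    return ml if x[j] <= thr else mr


def fit_sgb(X, y, n_trees=200, learning_rate=0.10, subsample=0.5, seed=0):
    """Each stage fits a stump to the residuals of a random subsample (the 'stochastic' part)."""
    rng = random.Random(seed)
    f0 = sum(y) / len(y)  # initial constant model
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        idx = rng.sample(range(len(y)), max(2, int(subsample * len(y))))
        Xs = [X[k] for k in idx]
        rs = [y[k] - pred[k] for k in idx]  # negative gradient of squared error
        stump = fit_stump(Xs, rs)
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, x) for p, x in zip(pred, X)]
    return f0, learning_rate, stumps


def sgb_predict(model, x):
    f0, lr, stumps = model
    return f0 + lr * sum(stump_predict(s, x) for s in stumps)


# Hypothetical data: x = [number of stages, bias current]; y is a made-up performance value.
X_train = [[s, i] for s in (1, 2) for i in (1.0, 2.0, 3.0, 4.0)]
y_train = [40.0 * s + 2.0 * i for s, i in X_train]
model = fit_sgb(X_train, y_train)
mean_y = sum(y_train) / len(y_train)
var_y = sum((v - mean_y) ** 2 for v in y_train) / len(y_train)
mse = sum((sgb_predict(model, x) - v) ** 2 for x, v in zip(X_train, y_train)) / len(y_train)
print(mse, var_y)
```

The learning rate of 0.10 mirrors the setting reported in this paper; the subsample fraction and stage count are arbitrary choices for the toy problem.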

Step 3 needs to compute sensitivities from the model, yet be global, nonlinear, and ideally, nonparametric. The proposed solution defines the global nonlinear sensitivity (impact) of a variable \( x_i \) as the prediction error that results when \( x_i \) is scrambled, relative to the errors that result when each of the other variables \( x_j \), \( j=1..d, j \neq i \), is scrambled. Table 2 gives the algorithm that uses this concept to extract impacts (inspired by chapter 10 of [11]). \( NS \) is the number of scrambles; \( \text{nmse} \) is normalized mean-squared error.

<table>
<thead>
<tr>
<th>Table 2: Procedure RegressorImpacts()</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Input:</strong> X, y, m</td>
</tr>
<tr>
<td><strong>Output:</strong> e</td>
</tr>
<tr>
<td>For \( i = 1 \) to \( d \):</td>
</tr>
<tr>
<td>&nbsp;&nbsp;\( e_i = 0 \)</td>
</tr>
<tr>
<td>&nbsp;&nbsp;Repeat \( NS \) times:</td>
</tr>
<tr>
<td>&nbsp;&nbsp;&nbsp;&nbsp;\( X_{scr} = X \) except randomly permute row \( i \)</td>
</tr>
<tr>
<td>&nbsp;&nbsp;&nbsp;&nbsp;\( y_{scr} = m.\text{simulate}(X_{scr}) \)</td>
</tr>
<tr>
<td>&nbsp;&nbsp;&nbsp;&nbsp;\( e_i = e_i + \text{nmse}(y, y_{scr}) \)</td>
</tr>
<tr>
<td>\( S = \sum_{i=1}^{d} e_i \)</td>
</tr>
<tr>
<td>\( e_i = e_i / S, \; i=1..d \)</td>
</tr>
</tbody>
</table>
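The procedure in Table 2 can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the model is any callable that maps one input point to a prediction, `nmse` is taken to be mean-squared error divided by the variance of \( y \) (the paper does not spell out the normalization), and the data is stored with one design point per row, so variable \( i \) is a column here rather than a row. The names `nmse` and `regressor_impacts` and the toy linear model are hypothetical.

```python
import random


def nmse(y, y_hat):
    """Mean-squared error normalized by the variance of y (assumed normalization)."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / (n * var)


def regressor_impacts(X, y, model, n_scrambles=50, seed=0):
    """Relative impact of each input variable, estimated by scrambling it."""
    rng = random.Random(seed)
    d = len(X[0])
    e = [0.0] * d
    for i in range(d):
        for _ in range(n_scrambles):
            column = [row[i] for row in X]
            rng.shuffle(column)  # scramble variable i only
            X_scr = [row[:i] + [v] + row[i + 1:] for row, v in zip(X, column)]
            y_scr = [model(x) for x in X_scr]
            e[i] += nmse(y, y_scr)
    total = sum(e)
    if total == 0.0:  # the model ignores every variable
        return e
    return [ei / total for ei in e]


# Toy stand-in for a trained regressor: strong, weak, and zero dependence.
def toy_model(x):
    return 10.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]


data_rng = random.Random(1)
X_pts = [[data_rng.uniform(0.0, 1.0) for _ in range(3)] for _ in range(20)]
y_pts = [toy_model(x) for x in X_pts]
impacts = regressor_impacts(X_pts, y_pts, toy_model)
print(impacts)
```

Scrambling a variable the model ignores leaves the predictions unchanged, so its impact is zero, while the variable with the largest coefficient receives most of the normalized impact.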

With this flow, we extracted sensitivities for each performance metric. SGB and CART were coded in about 500 lines of Python. SGB parameters were: maximum number of trees = 500, learning rate \( \alpha = 0.10 \), minimum tree depth = 2, maximum tree depth = 7, \( NS = 500 \). Build time for an SGB model was about 15 s on a 2.0 GHz Linux machine; impact extraction from the model took about 25 s.

Figure 4 illustrates results for GBW’s 10 most-impacting variables. We see that the most important variable is \( \text{chosen\_part\_index} \), which selects one vs. two stages. The variables that are commonly associated with the GBW of opamps -- bias current of the first stage and size of compensation capacitance -- also show up. Interestingly, the figure also indicates a large influence of the length of the transistors in the first stage (input, folding and load). This can be readily explained: these lengths directly influence impedance on the internal nodes, and hence the location of the non-dominant pole. The phase margin requirement (>65°) translates into the requirement that this non-dominant pole frequency is sufficiently higher than the GBW (approx 2x) [13]. It is also interesting to see that for GBW, only one topology parameter made it into the top 10 variables; sizing parameters comprise the other 9. This means that once one vs. two stages is chosen, changing the right sizing variables will make the biggest difference to GBW. Of course, the most sensitive variables can be different for different performance metrics, and the designer must consider all metrics.
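The "approx 2x" figure can be checked with a standard two-pole approximation. This is a sketch under assumptions not stated in the text: the dominant pole lies far below GBW and so contributes about 90° of phase lag at the unity-gain frequency, the unity-gain frequency is approximately GBW, and a single non-dominant pole at frequency \( f_{nd} \) (a symbol introduced here for illustration) is the only other contributor.

\[ PM \approx 90^\circ - \arctan\left( \frac{GBW}{f_{nd}} \right) \]

\[ f_{nd} = 2 \cdot GBW \;\Rightarrow\; PM \approx 90^\circ - \arctan(0.5) \approx 63.4^\circ \]

\[ PM = 65^\circ \;\Rightarrow\; \frac{f_{nd}}{GBW} = \frac{1}{\tan(25^\circ)} \approx 2.14 \]

Under these assumptions, a non-dominant pole at roughly twice GBW gives a phase margin close to the 65° requirement, and meeting 65° exactly needs a ratio slightly above 2.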

4. EXTRACTION OF ANALYTICAL PERFORMANCE TRADEOFFS

Designers often manually manipulate equations that relate performance tradeoffs [13][14]. Equations facilitate understanding because a direct relationship is expressed and the model is manipulatable to change the output variable. The problem is that hand-derived analytical expressions are based on 1st- or 2nd-order approximations and may have little relation to …

5. CONCLUSION

This paper presented a methodology to help designers maintain their expert insights on the topology-sizing-specs relationship, which is a challenge due to changing process nodes and more. The approach is to take a data-mining perspective on a Pareto Optimal Set of sized analog circuit topologies: extract a specs-to-topology decision tree (via CART); do global nonlinear sensitivity analysis on topology and sizing variables (via SGB and a variable-scrambling heuristic); and generate analytical whitebox models to capture tradeoffs among performances (via CAFFEINE). These approaches are all complementary as they answer different designer questions. Once extracted, the knowledge for a circuit type on a process node can readily be distributed to other designers, without need for more synthesis. Results were shown for operational amplifier design on a database containing thousands of Pareto Optimal designs across five objectives. As a final note, we must emphasize once again that these techniques are meant to augment designer experience, not replace it. The designer is key.

6. REFERENCES