# Asymptotically Efficient Retiming Under Setup and Hold Constraints

Marios C. Papaefthymiou

Advanced Computer Architecture Laboratory Department of Electrical Engineering and Computer Science Univesity of Michigan Ann Arbor, MI 48109

## Abstract

In this paper we present a polynomial-time algorithm for retiming synchronous circuits with edge-triggered registers under setup and hold constraints. Given a circuit G and a target clock period c, our algorithm computes in  $O(V^3E)$  steps a retimed circuit that achieves c and is free of hold violations, where V is the circuit's gate count, and E is the number of wires in the circuit. This is the first polynomial-time algorithm ever reported for retiming with constraints on both long and short paths. The asymptotically efficient operation of our algorithm is based on a novel formulation of the timing constraints as an integer monotonic program with  $O(E^2)$  inequalities.

#### 1 Introduction

Retiming is a general architectural-level optimization that can improve circuit performance by shifting storage elements across combinational logic blocks. The first algorithmic investigation of retiming for reducing the clock period and area of edge-triggered circuits appeared in [7, 8]. Since then, retiming has been extended to optimize the timing and area of level-clocked circuits [4, 9, 10, 13] and has been further explored in the context of logic synthesis, power optimization, and testability [2, 11].

Over the years, most retiming research has focused on optimum timing. The original work in [7] has been followed by a flurry of research on minimum clock-period retiming of edge-triggered and level-clocked circuits under various delay models and multiphase clocking disciplines [3, 4, 6, 9, 10, 12, 17]. The polynomial-time retiming algorithms described in these papers can only be used to satisfy setup timing constraints, however. An algorithm for retiming single-phase level-clocked circuits under setup and hold constraints has been presented in [15]. Its worst-case running time is exponential, however. A polynomial-time algorithm for pipelining a combinational circuit with level-clocked latches in order to achieve minimum clock period under a twophase clocking scheme while satisfying all setup and hold constraints has been presented in [5]. That algorithm is applicable only to acyclic combinational circuits, however, and cannot be used to retime general circuits with sequential cycles.

In this paper we present a polynomial-time algorithm for minimum clock-period retiming of circuits with edge-triggered registers under both setup and hold constraints. This problem remained open until now and appeared to be intractable due to the competing requirements imposed by the two kinds of constraints. Surprisingly, our algorithm always terminates with the correct answer and does so in a polynomial number of steps in the size of the input circuit. Previous retiming algorithms do not consider hold constraints and can result in retimed circuits with short-path violations. In contrast, our algorithm always computes a retimed circuit in which every setup and every hold constraint is satisfied.

Given any edge-triggered sequential circuit G, a target clock period c, a setup time S, and a hold time H, our algorithm computes a retimed circuit  $G_r$  in which all long-path and all short-path constraints are satisfied. If the problem is infeasible for the given input, it reports so and terminates. The worst-case running time of our algorithm is  $O(V^3E)$ , where V and E are the gate count and the number of wires in G, respectively. In conjunction with binary search, this algorithm can determine a minimum clock period in  $O(V^3E \lg V)$ time. The delay model in our procedure encompasses minimum/maximum gate delays and clock skew.

The asymptotically efficient operation of our algorithm stems from the formulation of retiming under setup and hold constraints as an *integer monotonic pro*gram with  $O(E^2)$  inequalities. An integer monotonic program is a special case of integer programming that involves monotonic functions of the unknowns and can be solved using iterative relaxation. The single-source shortest-paths problem is an example of a simple such program. The integer monotonic program we derive in this paper is more complex than a shortest-paths program. Yet, it is still solvable in polynomial time.

The remainder of this paper has six sections. In Section 2 we give an example that illustrates the inadequacy of previous retiming algorithms in the presence of hold constraints. Section 3 gives background material on retiming and presents the delay model used in our procedure. In Section 4 we give  $O(E^2)$  necessary and sufficient conditions for the retimed circuit to be free of hold violations. These conditions along with the setup constraints are cast as an equivalent integer monotonic program in Section 5. Section 6 describes an  $O(V^3E)$ -time algorithm for solving this program. Section 7 discusses extensions of our contributions to levelclocked circuits and mentions a number of important open problems.



Figure 1: (a) Original circuit. (b) Retimed circuit with hold violation. (c) Retimed circuit with no timing violations.

#### 2 Example

In this section we describe the retiming problem under setup and hold constraints. We also argue that the existing retiming algorithms can result in circuits with hold time violations, since they ignore hold constraints.

The sequential circuit shown in Figure 1(a) has five combinational logic blocks connected in a ring and two edge-triggered registers. The pair of integers in each block gives the maximum and the minimum propagation delay of the data through that block. For example, whenever data propagate through block A, they always require at least 1 time unit and never more than 10 time units. For simplicity, each register is assumed to have zero delay, zero setup time, and a hold time of 4. Moreover, clock skew is assumed to be zero. (Nonzero delays, setup times, and clock skews can be introduced without any loss of generality.) Thus, the shortest clock period achievable by this circuit is 10 + 30 + 20 = 60, since ABC is the longest combinational path in it. There are no hold violations in this circuit, since the minimum propagation delays along ABC and DE are 7 and 5, respectively, both exceeding the register hold time.

The retimed circuit in Figure 1(b) is obtained by shifting the register k across block C and is functionally equivalent to the one in Figure 1(a). This circuit can be computed by applying the retiming algorithm in [8] with a target clock period of 40. Since the longest combinational path in this circuit is AB with a delay of 10 + 30 = 40, there are no setup violations in the retimed circuit. The minimum delay along the path AB is 1 + 2 = 3 < 4, however. Therefore, a fast signal propagating along AB can contaminate the inputs of register k and result in erroneous circuit operation.

The retimed circuit in Figure 1(c) satisfies all setup and hold constraints and achieves a clock period of 50.



Figure 2: (a) Vertex u in original circuit. (b) Vertex u after retiming by r(u) = 1.

This is the shortest clock period that can be achieved by retiming the original circuit. Our algorithm is guaranteed to compute such an optimal retiming in polynomially many steps. All previous polynomial-time algorithms for retiming ignore hold constraints and may return suboptimal circuits or circuits that contain races.

# 3 Preliminaries

We model an edge-triggered circuit as a directed multigraph  $G = \langle V, E, d, \delta, w, \sigma \rangle$ . With the exception of the parameters  $\delta$  and  $\sigma$  that are introduced to account for minimum gate delays and clock skew, respectively, this model is identical to the one used in [8]. Each vertex uin the vertex-set V corresponds to a combinational logic block. The nonnegative weights d(u) and  $\delta(u)$  associated with each vertex u denote the maximum and minimum data propagation delay through u, respectively. Each edge  $u \xrightarrow{e} v$  in the edge-set E corresponds to a wire from u to v in the circuit. All wires are assumed to have zero delay. For each edge  $u \xrightarrow{e} v$ , the nonnegative integer edge-weight w(e) denotes the register count of the corresponding wire from u to v. All registers in the circuit are assumed to have equal positive setup times S and equal positive hold times H. Without loss of generality, register delays can be assumed to be zero.

In addition to the register count w(e), each edge  $u \stackrel{e}{\rightarrow} v$  is associated with a weight  $\sigma(e)$  that gives the delay in the propagation of the clock signal from the clock source to the wire denoted by e. Clock skew is assumed to be *monotonic*, that is, the "effective delay" of a path increases with the number of combinational gates in it. Given a path  $x \stackrel{i}{\rightarrow} u \stackrel{p}{\rightarrow} v \stackrel{j}{\rightarrow} y$ , where p is combinational,  $w(i) \geq 1$ , and  $w(j) \geq 1$ , the *minimum effective delay* of p is given by the expression  $\delta(p) + \sigma(i) - \sigma(j)$ , where  $\delta(p) = \sum_{x \in p} \delta(x)$ . Clock skew monotonicity is ensured

if for each edge pair  $x \xrightarrow{i} u$ ,  $u \xrightarrow{j} y$  in E, we have

$$\delta(u) + \sigma(i) - \sigma(j) \ge 0 . \tag{1}$$

A retiming of an edge-triggered circuit  $G = \langle V, E, d, \delta, w, \sigma \rangle$  is an integer-valued vertex-labeling  $r: V \to \mathbb{Z}$ . This labeling denotes a transformation of the original circuit G into a functionally equivalent circuit  $G_r = \langle V, E, d, \delta, w_r, \sigma \rangle$ , where for each edge  $u \xrightarrow{e} v$  in  $G, w_r$  is defined by the equation

$$w_r(e) = w(e) + r(v) - r(u)$$
 (2)

Figure 2 illustrates the retiming transformation for a vertex u in V. When retiming with r(u) = 1, a stage of registers is shifted from the outputs of u to its inputs. Thus, the output of u's computation in  $G_r$  is generated r(u) clock cycles later than in G, and the label r(u) is called the *lag* of u. In order for  $G_r$  to be *well-formed*, for all edges  $e \in E$ , we must have

$$w_r(e) \ge 0 \quad . \tag{3}$$

A retiming that satisfies Inequality (3) is called *legal*.

Equation (2) can be used to show that for every vertex pair u, v in V, the change in the register count along any path  $u \rightsquigarrow v$  depends solely on the lags of its two endpoints. Thus, for any path  $u \stackrel{p}{\longrightarrow} v$ , we have

$$w_r(p) = w(p) + r(v) - r(u) , \qquad (4)$$

where  $w(p) = \sum_{e \in p} w(e)$ . From Equation (4) it follows that when the path p is a cycle, then retiming does not change its original register count.

Equation (4) can be used to identify paths that can become critical after retiming. Based on this equation and Inequality (3), the maximum change in the register count of any path  $u \rightsquigarrow v$  is given by the expression

$$W(u, v) = \min\left\{w(p) : u \stackrel{p}{\rightsquigarrow} v\right\} . \tag{5}$$

Thus, the only paths  $u \xrightarrow{p} v$  that can become combinational (and possibly critical) in  $G_r$  are those for which w(p) = W(u, v) in G. For each of the  $O(V^2)$  vertex pairs u, v in V, the quantity

$$D(u,v) = \max\left\{d(p) : u \stackrel{p}{\rightsquigarrow} v, w(p) = W(u,v)\right\}$$
(6)

gives the longest propagation delay from u to v whenever the retimed circuit includes a combinational path between the two vertices. Therefore, the clock period of any retimed circuit  $G_r$  is always some element in the  $O(V^2)$ -size set of D(u, v)'s.

For the retimed circuit  $G_r$  to satisfy all setup constraints, every combinational path in  $G_r$  whose propagation delay exceeds the target clock period c must be interrupted by a register. Thus, for each vertex pair u, v in V with D(u, v) > c, the retimed register count  $W_r(u, v)$  of each path  $u \rightsquigarrow v$  that can become combinational is given by the equality

$$W_{r}(u, v) = W(u, v) + r(v) - r(u)$$
(7)

and must satisfy the inequality

$$W(u, v) + r(v) - r(u) \ge 1 .$$
(8)

The  $O(V^2)$  difference constraints specified by Inequalities (3) and (8) can be computed in  $O(VE + V^2 \lg V)$  steps by an all-pairs shortest-paths algorithm and can be solved in  $O(V^3)$  steps by a Bellman-Ford algorithm for single-source shortest-paths. An asymptotically more efficient procedure that solves these constraints in O(VE) steps has been presented in [8].

#### 4 Timing constraints

This section gives a mathematical statement of the hold constraints that must be satisfied in a correctly timed edge-triggered circuit. It also describes a set of  $O(E^2)$ necessary and sufficient conditions for a retimed circuit to be free of hold violations. These conditions are independent of the setup constraints.

Discovering an efficiently solvable set of constraints that captures the hold requirements in the retimed circuit is a surprisingly challenging task. At first, it seems that one could derive such a set by a straightforward adaptation of Inequality (8). Intuitively, this inequality states that if a long path can become combinational in a retimed circuit, it must be interrupted by at least one register. This idea cannot be applied in the case of hold constraints, however. If a potentially combinational path is too short, then we may or may *not* need to insert a register in it. The decision depends on the existence of registers in the path's boundaries.

A retimed circuit  $G_r$  is free of hold violations if and only if there exists no register-to-register combinational path whose minimum effective propagation delay is shorter than the register hold time H. This statement is formalized in the following proposition.

**Proposition 1** Let  $G = \langle V, E, d, \delta, w, \sigma \rangle$  be an edgetriggered circuit, and let H be the register hold time. Given any legal retiming function  $r : V \to Z$ , the retimed circuit  $G_r$  is free of hold violations if and only if for every path  $x \xrightarrow{i} u \xrightarrow{p} v \xrightarrow{j} y$  in  $G_r$ , where p is combinational,  $w_r(i) \ge 1$ , and  $w_r(j) \ge 1$ , we have that  $\delta(p) + \sigma(i) - \sigma(j) \ge H$ .

A direct consequence of the hold requirements is that no wire in the retimed circuit can have more than one register.

**Lemma 2** Let  $G = \langle V, E, d, \delta, w, \sigma \rangle$  be an edge-triggered circuit. Given any legal retiming function  $r : V \to Z$ , the retimed circuit  $G_r$  has no hold violations only if for every edge  $u \stackrel{e}{\to} v$  in E, we have

$$w_r(e) \le 1 \ . \tag{9}$$

*Proof.* By contradiction. Suppose that for some wire  $u \stackrel{e}{\rightarrow} v$  in E, we have  $w_r(e) \ge 2$ . Then, any two registers on e are connected by a zero-delay path and result in a hold violation.

**Corollary 3** Let  $G = \langle V, E, d, \delta, w, \sigma \rangle$  be an edge-triggered circuit. Given any legal retiming function  $r: V \rightarrow Z$ , the retimed circuit  $G_r$  has no hold violations only if for every simple path or simple cycle  $u \stackrel{p}{\leadsto} v$  in G, we have

$$w_r(p) \le |V| \ . \tag{10}$$

Let us now turn to paths with more than one edge. Given two vertices u and v, the only paths  $u \xrightarrow{p} v$  that can possibly give rise to a hold violation between u and v are the ones that can become combinational after retiming, that is, the paths that satisfy w(p) = W(u, v). Among these paths, we only need to focus on the ones with the shortest minimum propagation delay. To that end, we define the  $O(V^2)$  values  $\Delta(u, v)$  as follows:

$$\Delta(u,v) = \min\left\{\delta(p) : u \stackrel{p}{\rightsquigarrow} v, w(p) = W(u,v)\right\} . \tag{11}$$

These values are used in the following main theorem that gives a set of necessary and sufficient conditions for the absence of any hold violations in the retimed circuit. **Theorem 4** Let  $G = \langle V, E, d, \delta, w, \sigma \rangle$  be an edge-triggered circuit, and let H be the hold time of each register. Given any legal retiming function  $r: V \to \mathbb{Z}$  that satisfies the condition of Lemma 2, the retimed circuit  $G_r$ has no hold violations if and only if for every edge pair  $x \stackrel{i}{\to} u, v \stackrel{j}{\to} y$  in E such that  $\Delta(u, v) + \sigma(i) - \sigma(j) < H$ , we have

$$W_r(u,v) + w_r(v,y) > 1 \implies w_r(x,u) = 0$$
. (12)

*Proof.*  $(\Rightarrow)$  We will prove the contrapositive: If there

exists an edge pair  $x \xrightarrow{i} u$ ,  $v \xrightarrow{j} y$  in E such that  $\Delta(u, v) + \sigma(i) - \sigma(j) < H$  for which  $W_r(u, v) + w_r(v, y) \geq 1$  and  $w_r(x, u) \geq 1$ , then the retimed circuit  $G_r$  has a hold violation.

Consider a path  $u \xrightarrow{p} v$  in  $G_r$  such that w(p) = W(u, v) and  $\delta(p) = \Delta(u, v)$ . Since  $W_r(u, v) + w_r(v, y) \ge 1$ , there exists at least one edge in the path  $q = u \xrightarrow{p} v \rightarrow y$  whose register count is positive. Let  $s \rightarrow t$  be the first such edge in q, and let  $q' = u \rightsquigarrow s$  be a prefix subpath of q. Then, q' is a combinational path, and since  $w_r(x, u) \ge 1$ , it connects a register on  $x \rightarrow u$  with a register on  $s \rightarrow t$ . From Inequality (1) we have

$$\delta(q') + \sigma(i) - \sigma(k) \le \delta(p) + \sigma(i) - \sigma(j)$$

where k is the edge following  $s \to t$  in q. Since  $\delta(p) = \Delta(u, v)$ , it follows that

$$\delta(q') + \sigma(i) - \sigma(k) < H .$$

Therefore, a hold violation occurs along the path q' in the retimed circuit.

 $(\Leftarrow)$  By contradiction. Suppose that for every edge pair  $u' \xrightarrow{e} u, v \xrightarrow{e'} v'$  in E with  $\Delta(u, v) + \sigma(e) - \sigma(e') < H$ , we have that  $W_r(u, v) + w_r(v, v') \ge 1$  implies  $w_r(u', u) = 0$ . Moreover, suppose that the retimed circuit  $G_r$  has a hold violation. We will derive a contradiction of the implication.

Since some hold constraint is violated in  $G_r$ , there exists a path  $x' \xrightarrow{i} x \xrightarrow{p} y \xrightarrow{i'} y'$  in  $G_r$  such that  $w_r(p) = 0$ ,  $w_r(x', x) \ge 1$ ,  $w_r(y, y') \ge 1$ , and  $\delta(p) + \sigma(i) - \sigma(i') < H$ . Since q is combinational, it follows that w(p) = W(x, y). Without loss of generality, assume that p is such that  $\delta(p) = \Delta(x, y)$ . Since  $\delta(p) + \sigma(i) - \sigma(i') < H$ , it follows that  $\Delta(x, y) + \sigma(i) - \sigma(i') < H$ . Moreover,  $w_r(y, y') \ge 1$  implies that  $W_r(x, y) + w_r(y, y') \ge 1$ . Since  $w_r(x', x) = 1$ , the path p contradicts the implication in the theorem's statement.

# 5 Integer monotonic program

In this section we give an integer monotonic program that is equivalent to the necessary and sufficient conditions in Theorem 4. We then argue that the setup constraints can also be expressed as a monotonic program. A solution to the two sets of constraints can be computed using iterative relaxation.

The necessary and sufficient conditions in Theorem 4 can be recast as a set of linear inequalities.

**Theorem 5** Let  $G = \langle V, E, d, \delta, w, \sigma \rangle$  be an edge-triggered circuit, and let H be the hold time of the circuit's registers. Given a legal retiming function  $r: V \to Z$  that satisfies the conditions of Lemma 2, the retimed circuit has no hold violations if and only if for every edge pair  $x \xrightarrow{i} u, v \xrightarrow{j} y$  in E such that  $\Delta(u, v) + \sigma(i) - \sigma(j) < H$ , we have

$$(|V|+1) \cdot w_r(x,u) + (W_r(u,v) + w_r(v,y)) \le |V|+1.$$
(13)

*Proof.* We first show that if Inequality (13) holds then Relation (12) holds. The proof is by a straightforward case analysis. If  $W_r(u, v) + w_r(v, y) \ge 1$ , then Inequality (13) and Inequality (3) imply that  $w_r(u', u) = 0$ , and therefore Relation (12) holds. If  $W_r(u, v) + w_r(v, v') = 0$ , then Inequality (13) implies that  $w_r(x, u) \le 1$ . This constraint is identical to Inequality (9), and therefore Relation (12) holds.

We now prove that Relation (12) yields Inequality (13). If  $W_r(u, v) + w_r(v, y) \ge 1$ , then  $w_r(x, u) = 0$ . From Inequality (9), we have that  $w_r(v, y) \le 1$ . Moreover, from Inequality (10) we have that  $W_r(u, v) \le |V|$ . Therefore,  $W_r(u, v) + w_r(v, y) \le |V| + 1$  and Inequality (13) holds. If  $W_r(u, v) + w_r(v, y) = 0$ , then Inequality (13) is equivalent to  $w_r(x, u) \le 1$ , which already holds from Inequality (9).

The inequalities in Theorem 5 can be rewritten as a *simple integer monotonic program* [4], which is defined as follows:

**Problem SIMP** (Simple Integer Monotonic Programming) Let S be a set of constraints over the unknowns  $x_1, x_2, \dots, x_n$ , in which the kth constraint has the form

$$f_k(x_i) \ge g_k(x_1, x_2, \cdots, x_n) , \qquad (14)$$

where the function  $f_k$  is a monotonically increasing in the single unknown  $x_i$  and  $g_k$  is monotonically increasing in the n-1 unknowns  $x_j$ , for  $j = 1, 2, \dots, n, j \neq i$ . The simple integer monotonic programming problem is to find a vector  $x = \langle x_1, x_2, \dots, x_n \rangle$  of integers satisfying S, or to determine that no feasible vector exists.

**Theorem 6** The retiming problem under setup and hold constraints can be reduced to the Problem SIMP with  $O(E^2)$  constraints.

*Proof.* For a given edge-triggered circuit  $G = \langle V, E, d, \delta, w, \sigma \rangle$ , a target clock period c, a setup time S, and a hold time H, the retiming problem under setup and hold constraints is expressed by the following inequalities:

• For every edge  $u \to v$  in E, we have

$$w(u,v) + r(v) \ge r(u) . \tag{15}$$

• For every edge pair  $x \xrightarrow{i} u, v \xrightarrow{j} y$  in V such that  $D(u, v) + \sigma(i) - \sigma(j) > c - S$ , we have

$$W(u, v) + r(v) \ge r(u) + 1$$
. (16)

• For every edge  $u \to v$  in E, we have

(

$$r(u) + 1 \ge w(u, v) + r(v)$$
. (17)

• For every edge pair  $x \xrightarrow{i} u, v \xrightarrow{j} y$  in E such that  $\Delta(u, v) + \sigma(i) - \sigma(j) < H$ , we have

$$|V|+1) \cdot (r(x)+1) \ge$$
(18)

$$r(y) + |V| \cdot r(u) + w(v, y) + W(u, v) + w(x, u)$$
.

MonoRelax (S)1 for  $i \leftarrow 1$  to n2  $x_i \leftarrow 0$ 3 while there exists an unsatisfied constraint in Slet  $f_k(x_i) \ge g_k(x_1, x_2, \cdots, x_n)$  be unsatisfied 4 5repeat  $x_i \leftarrow x_i + 1$ until constraint satisfied 6 7  $S \leftarrow S \cup \{ \text{all constraints with } x_i \text{ on r.h.s.} \}$ return  $\langle x_1, x_2, \cdots, x_n \rangle$ 8

Figure 3: Algorithm MONORELAX for solving a simple integer monotonic program S with unknowns  $x_1, x_2, \dots, x_n$ . The procedure returns a solution if S is satisfiable.

The setup constraints are captured by Inequalities (15) and (16), which are derived by extending the constraints in Section 3 to encompass monotonic clock skew. The hold constraints are captured by Inequalities (17) and (18) and follow from Theorem 5. All these  $O(E^2)$  inequalities are in the form of Problem SIMP.

# 6 Polynomial-time algorithm

In this section, we describe our  $O(V^3 E)$ -time algorithm for retiming with setup and hold constraints. We first outline a general algorithm for solving integer monotonic programs. Subsequently, we adapt that algorithm to solve the retiming problem.

A simple integer monotonic program can be solved using Algorithm MONORELAX which is described in Figure 3. This algorithm relies on the fact that simple integer monotonic programs can be solved by iterative relaxation [4]. Its correctness stems from the following theorem which has been proved in [4].

**Theorem 7** For any simple integer monotonic program having a solution in which  $x_i \ge 0$  for i = 1, 2, ..., n, there exists a unique minimum solution  $\langle \overline{x}_1, \overline{x}_2, ..., \overline{x}_n \rangle$ such that for all other nonnegative solutions  $\langle x_1, x_2, ..., x_n \rangle$ , we have  $\overline{x}_i \le x_i$  for i = 1, 2, ..., n.

Theorem 7 applies to the retiming problem described by Inequalities (15)-(18). If  $\overline{r}$  is a solution for these inequalities, then  $\overline{r} + k$  is also a solution, where k is any integer. Consequently, any solution can be shifted up by a constant to yield a nonnegative solution. Moreover, no variable r(v) needs to increase above a minimum value  $\overline{r}(v)$ , since by the monotonicity of f and g that minimum value is also a solution. Therefore, if the retiming problem is feasible, one can find a minimum nonnegative solution to it.

The following lemma shows that if the retiming problem is feasible, then |V| gives an upper bound for the minimum solution  $\overline{r}$ .

**Lemma 8** Let  $G = \langle V, E, d, \delta, w, \sigma \rangle$  be an edge-triggered circuit, let c be a target clock period, let S be the register setup time, and let H be the register hold time. Moreover, let  $\overline{r}$  be the minimum nonnegative retiming such that the retimed circuit  $G_r$  achieves c with no hold violations. Then for every vertex  $v \in V$ , we have

$$\overline{r}(v) \le |V| . \tag{19}$$

RSH(G, c, S, H)1  $Q \leftarrow \{\text{Inequalities (15)-(18) from Theorem 6}\}$ 2 for every vertex  $v \in V$ 3  $r(v) \leftarrow 0$ while  $Q \neq \emptyset$ 4 5remove " $f(r(x)) \ge g(r(u), r(y))$ " from Q 6 **if** f(r(x)) < g(r(u), r(y))7repeat  $r(x) \leftarrow r(x) + 1$  $\mathbf{if} \ r(x) > |V|$ 8 9 return fail 10**until**  $f(r(x)) \ge g(r(u), r(y))$ 11 $Q \leftarrow Q \cup \{ \text{all constraints with } r(x) \text{ on r.h.s.} \}$ 

12 return r

Figure 4: Algorithm RSH for retiming to achieve a specified clock period c with no hold violations.

*Proof.* By contradiction. Let  $\overline{r}$  be a minimum retiming, and let  $v \in V$  be a vertex with  $\overline{r}(v) > |V|$ . Moreover, let u be a vertex such that  $\overline{r}(u) \leq \overline{r}(v)$  for all  $v \in$ V. Without any loss of generality, we can assume that  $\overline{r}(u) = 0$ . (Otherwise, the assignment  $\overline{r}(v) - \overline{r}(u)$  to each  $v \in V$  also satisfies Inequalities (15)-(18), thus contradicting the minimality of  $\overline{r}$ .)

Now, consider a simple path  $v \stackrel{p}{\rightarrow} u$  with w(p) = W(v, u). Adding Inequality (3) along p, we obtain

$$\overline{r}(v) - \overline{r}(u) \le W(v, u) \; .$$

Since  $\overline{r}(u) = 0$ , we have  $\overline{r}(v) \leq W(v, u)$ . From Inequality (10) it follows that  $\overline{r}(v) \leq |V|$ , which contradicts our original assumption.

Our Algorithm RSH for efficient retiming with setup and hold constraints is given in Figure 4. Initially, all constraints corresponding to Inequalities (15)-(18)are inserted into a queue, and all variables r(v) are set to zero. The algorithm proceeds by iteratively removing constraints from the queue. For each violated constraint, it increases the value of the variable on its lefthand side and reinserts into the queue any constraints that are violated as a result of this operation. This iterative scheme continues until all constraints are satisfied or until the value of some variable exceeds |V|. In the latter case, the algorithm concludes that the specified clock period cannot be achieved by retiming.

**Theorem 9** Let  $G = \langle V, E, d, \delta, w, \sigma \rangle$  be an edge-triggered circuit, let c be a target clock period, let S be the register setup time, and let H be the register hold time. In  $O(V^3E)$  time, Algorithm RSH determines a retiming r such that the retimed circuit achieves c and is free of any hold violations or determines that no such retiming exists.

**Proof.** From Theorem 7 and Lemma 8, Algorithm RSH computes a unique nonnegative solution for Inequalities (15)-(18) in which every variable r(v) attains its minimum value that does not exceed |V|.

We now show that Algorithm RSH terminates in  $O(V^3E)$  steps. The parameters D,  $\Delta$ , and W in Step 1 can be computed in  $O(VE + V^2 \lg V)$  time using Johnson's algorithm for all-pairs shortest-paths [1]. The  $O(E^2)$  Inequalities (15)–(18) can be computed in  $O(E^2)$  time.

The functions f(r(x)) and g(r(u), r(y)) can be computed in O(1) time. Since we have only V variables and no variable can exceed |V|, the body of the **while** loop is executed  $O(V^2)$  times. Since for each increment of variable r(v), at most O(VE) constraints with r(v) on their right-hand side are inserted in the queue, Step 11 takes O(VE) time. Consequently, the total running time of Algorithm RSH is  $O(VE + V^2 \lg V + E^2 + V^3 E) =$  $O(V^3 E)$ .

**Theorem 10** Let  $G = \langle V, E, d, \delta, w, \sigma \rangle$  be an edge-triggered circuit, let c be a target clock period, let S be the register setup time, and let H be the register hold time. An optimal retuning r such that the retimed circuit achieves the minimum possible clock period and is free of any hold violations can be determined in  $O(V^3 E \lg V)$ time.

*Proof.* Algorithm RSH is applied  $O(\lg V)$  times in a binary search among the  $O(V^2)$  possible clock periods D(u, v).

# 7 Conclusion

We presented the first polynomial-time algorithm ever reported for retiming with setup and hold constraints, thus settling a long-standing open problem. Our algorithm runs in  $O(V^3E)$  steps, where V is the circuit's gate count and E is the number of wires in it. The asymptotically efficient operation of the algorithm relies on an integer monotonic programming formulation of the problem that comprises  $O(E^2)$  inequalities and takes into accout minimum/maximum propagation delays and clock skew. Although the results in this paper are stated in terms of edge-triggered circuits, they can be adapted in a straightforward manner to encompass the case of single-phase level-clocked circuits.

An interesting problem for further investigation is the design of an asymptotically faster algorithm than RSH. We conjecture that an  $O(V^2E)$ -time algorithm can be designed using an approach similar to the one described in [8]. From a practical standpoint, it is also interesting to investigate schemes for reducing the number of constraints in the integer monotonic program.

The most fertile area for further research appears to be the investigation of retiming in level-clocked circuits that use multiple overlapping phases. In the presence of level-clocked latches, the problem of optimal timing with constraints on short and long paths is particularly challenging even when latch locations are fixed [14]. The insights we have developed for edge-triggered circuits may result in efficient retiming algorithms for multiphase level-clocked circuits.

#### References

- T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. McGraw-Hill, MIT Press, 1990.
- [2] S. Dey and S. Chakradhar. Retiming sequential circuits to enhance testability. In Proceedings of the 12th IEEE VLSI Test Symposium, pages 28-33, April 1994.
- [3] A. T. Ishii. Retiming gated-clocks and precharged circuit structures. In Digest of Technical Papers of the 1993 IEEE International Conference on CAD, pages 300-307, November 1993.

- [4] A. T. Ishii, C. E. Leiserson, and M. C. Papaefthymiou. Optimizing two-phase, level-clocked circuitry. Journal of the Association for Computing Machinery, 41(1):148-199, January 1997.
- [5] A. T. Ishii and M. C. Papaefthymiou. Efficient pipelining of level-clocked circuits with min-max propagation delays. In TAU'95 ACM International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, November 1995.
- [6] K. N. Lalgudi and M. C. Papaefthymiou. Retiming edgetriggered circuits under general delay models. *IEEE Transactions on Computer-Aided Design of Integrated Circuits*, 16(12):1393-1408, December 1997.
- [7] C. E. Leiserson and J. B. Saxe. Optimizing synchronous systems. Journal of VLSI and Computer Systems, 1(1):41-67, 1983.
- [8] C. E. Leiserson and J. B. Saxe. Retiming synchronous circuitry. Algorithmica, 6(1), 1991. Also available as MIT/LCS/TM-372.
- [9] B. Lockyear and C. Ebeling. Optimal retiming of multi-phase, level-clocked circuits. In Advanced Research in VLSI and Parallel Systems: Proc. of the 1992 Brown/MIT Conference. MIT Press, March 1992.
- [10] B. Lockyear and C. Ebeling. The practical application of retiming to the design of high performance systems. In Digest of Technical Papers of the 1993 IEEE International Conference on CAD, pages 288-295, November 1993.
- [11] S. Malik, E. Sentovich, R. K. Brayton, and A. Sangiovanni-Vincentelli. Retiming and resynthesis: Optimizing sequential networks with combinational techniques. In Proc. of the Hawaii International Conference on System Sciences, June 1990.
- [12] M. C. Papaefthymiou. Understanding retiming through maximum average-delay cycles. *Mathematical Systems Theory*, 27:65-84, 1994.
- [13] M. C. Papaefthymiou and K. H. Randall. TIM: a timing package for two-phase, level-clocked circuitry. In Proceedings of the 30th ACM/IEEE Design Automation Conference, June 1993. Also available as an MIT VLSI Memo 92-693, October 1992.
- [14] K. A. Sakallah, T. N. Mudge, T. M. Burks, and E. S. Davidson. Synchronization of pipelines. *IEEE Transac*tions on Computer-Aided Design of Integrated Circuits and Systems, 12(8):1132-1146, August 1993.
- [15] N. Shenoy, R. K. Brayton, and A. Sangiovanni-Vincentelli. Retiming of circuits with single phase levelsensitive latches. In International Conference on Computer Design, October 1991.
- [16] N. V. Shenoy, R. K. Brayton, and A. L. Sangiovanni-Vincentelli. Minimum padding to satisfy short path constraints. In Digest of Technical Papers of the 1993 IEEE/ACM International Conference on CAD, November 1993.
- [17] T. Soyata, E. Friedman, and J. Mulligan. Incorporating interconnect, register, and clock distribution delays into the retiming process. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 16(1):105-120, January 1997.