## On the NP-completeness of Regular 2-D FPGA Routing Architectures and A Novel Solution

Yu-Liang Wu<sup>1</sup>

Douglas Chang<sup>2</sup>

Department of electrical and Computer Engineering<sup>1</sup> Department of Computer Science<sup>2</sup> University of California, Santa Barbara, CA 93106

## Abstract

Several industrial FPGA routing architectures have been shown to have no efficient routing algorithms (unless P=NP) [3,4]. Here, we further investigate if the intractability of the routing problem on a regular 2-D FPGA routing architecture can be alleviated by adding routing switches. We show that on this routing architecture, even with a substantial increase in switching flexibility, a polynomial time, predictable routing algorithm is still not likely to exist, and there is no constant ratio bound of the detailed over global routing channel densities. We also show that a perfect routing is unachievable on this architecture even with near complete (maximum) switching flexibility.

We also discuss a new, greedy routing architecture, that possesses predictable and other desired routing properties, yet requires fewer routing resources than regular architectures. This theoretical result may suggest an alternative approach in routing architecture designs.

## 1. Introduction

Routability has become one of the major concerns in current FPGA technology because of its dominating impact on circuit performance and chip area. In this paper, we study routability issues of a 2-dimensional (2-D) Look-up-table and SRAM technology based Xilinx-style [1,5] FPGA architecture.

In the past, the <u>mappability</u> of a global routing of <u>all nets with</u> <u>arbitrary topologies</u> to a feasible detailed routing has been used in justifying the routability of a given architecture [3,4]. Since a routing with just a couple of nets unrouted is as bad as any other failed routing, such a global view reflects <u>the desired 100% routing rate</u>. This routing problem has been shown to be NP-complete for an industrial model of this type [3]. The previous experimental [5] and probabilistic modeling [6] results show that the routability of this routing architecture monotonically increases as the switching resources increase. Here, we analyze the following three routing properties of the investigated architecture with increasing switching resources:

[Existence of an efficient routing algorithm]

• Polynomial Routing Decision Algorithm (PRDA): A polynomial time algorithm to determine the following: Given a global routing, is there a corresponding feasible detailed routing?

[Existence of a routing solution (allowing exhaustive search)]

- <u>Constant Mapping Ratio (CMR)</u>: The detailed routing channel density over the global routing channel density is always bounded by a constant.
- <u>Perfect Routing (PR)</u>: A detailed routing using the same number of tracks as the global routing is always achievable.





Permission to copy without fee all or part of this material is granted, provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

## 2. Routing Model, Terminology

The investigated architecture is depicted in Fig. 1. It is a 2-D array of look-up-tables (L), connection (C) boxes and switch (S) boxes [1,3,6]. Wire segments, spanning between pairs of adjacent switch boxes, run in both vertical and horizontal channels with the same number of tracks W. In each channel, tracks ids are numbered from right to left, or from top to bottom. The C boxes contain routing switches that can be programmed to connect logic pins to wire segments. The S boxes contain switches that allow one wire segment to be connected to another. The switching flexibility,  $\underline{F}_{S_2}$  of a track surrounding an S box is defined to be the number of outgoing tracks to which a track can connect. Here, we assume a logic pin in any C box can connect to any wire running through it. The global channel density, Wg, is the maximum number of nets which run in parallel in any channel as decided by a global router. The detailed channel density, Wd, is the minimum number of tracks needed to realize the connection topology decided by the global router. We define the Mapping Ratio (MR) to be  $W_d/W_g$  and X(G) to be the chromatic number of a graph G.

A 2-D FPGA routing architecture is homogeneous if

1. all L blocks and routing boxes are allocated as a symmetrical 2-D array and routing boxes of the same type have the same connection topology, and

2. the S boxes (except the boundary S boxes) are symmetric (i.e., from one side to any other side has the same connectivity).

A routing architecture is disjoint if, when all S box switches are closed, then all wire segments are partitioned into D>1 disjoint wire sets, called domains, and a given domain will cover the same number of tracks in every C box, which is defined to be the domain capacity,  $\underline{D}_{C}$ , of this domain. A disjoint architecture is even if the domain capacities of all domains are the same, otherwise it is uneven. An architecture is regular if it is both homogeneous and disjoint. Without loss of generality, we can assume the set of track ids belonging to a domain of a regular architecture is the same in every C box. For example, Fig. 1 shows an uneven regular architecture with two domains. Any wire segment with track id 1 or 2 belongs to the first domain and can connect to any other neighboring tracks with track id 1 or 2. The second domain has domain capacity of just 1 which is the same as the popular XC4000 architecture [1]. In this second domain, each diagonal intersection point represents six switches, for which an arbitrary subset can be programmed to be closed to make a desired junction. In this paper, we do not discuss general non-disjoint type architectures. An analysis of this type with Fs = 3 can be found in [3].

# **3.** Routing Decision Problem on the 2-D Regular Routing Architecture

The Routing Decision Problem (RDP) for an even regular architecture with  $D_c = 1$ , which is similar to the popular XC4000 architecture, has been shown to be NP-complete [2]. Here we investigate if the intractability of this routing problem can be alleviated by purely adding switching resources. By multiplying the F<sub>s</sub> value of each track surrounding every S box, the architecture remains even regular with a multiplied  $D_c$  value.

[RDP of 2-D even regular routing architecture]

Given: A global routing of a circuit on a 2-D even regular routing architecture with channel capacity W, domain capacity  $D_c$ , and  $D = W/D_c$  domains.

Question: Does there exist a valid W track (D domain) detailed routing for the given global routing? I.e., is the global routing W track routable (D domain routable)?

In [2, 3], the graph coloring problem was reduced to the  $D_c=1$  RDP by considering each node in the graph as a net. If there is an edge between two nodes, then the two corresponding nets run through the same C box. Then the X(G) of the graph will be the number of tracks (capacity one domains) needed for the detailed routing. Thus the RDP was shown to be NP-complete for W 3. In the case of W = 2 and  $D_c = 1$ , this RDP can be solved by a polynomial 2-coloring graph algorithm. (Certainly, an even regular structure with just 2 tracks is not realistic.) To see the complexity of a routing architecture with domain capacity greater than one, we first examine an example of a regular architecture with domain capacity two.

In Fig. 2, a global routing of 6 nets is given. Can the global routing be mapped to a detailed routing on a chip with two domains each of capacity 2? Note that all wires in a net must belong to the same domain because only routing tracks within the same domain have switches to connect to each other. Fig. 2a depicts a situation where nets  $a_1$  and  $a_2$  are assigned into the same domain, say D<sub>1</sub>; and nets b<sub>1</sub> and b<sub>2</sub> are assigned to the other, D<sub>2</sub>. Then nets  $a_1$  and  $a_2$  have to stay in D<sub>1</sub>, and because the domain capacity is just 2, nets  $c_1$  and  $c_2$  must be assigned to the domain D<sub>2</sub>. This leads  $c_1$  and  $c_2$  to conflict with b<sub>1</sub> and b<sub>2</sub> in the lower right C box. On the other hand, the domain assignment shown in Fig. 2b is a feasible solution ( $a_1$  and  $a_2$  are in two different domains).





Fig. 2a. An ilkgaldetaikd route of 6 nets on an even regular architecture with 2 domains (each with Dc = 2).

Fig. 2b. A legal detailed route of the same global route

**Lemma 1.** Problem [ RDP of 2-D Even Regular Routing Architecture ] is NP-complete for any fixed S box topology with  $D_c = 2$ , and D = 3.

#### Proof:

This problem is certainly in NP.

We now reduce the 3-colorable graph problem to our problem. We say two nets have a routing <u>track constraint</u> if their global routes run through a same C box, and have a <u>domain</u> <u>constraint</u> if they must be assigned into two different domains. Be aware that a routing track constraint does not necessarily imply a domain constraint when  $D_c>1$ .

Let G(V,E) be any arbitrary graph. We will construct a 2-pin net routing instance whose domain constraints are given by the edges of G. Thus G(V,E) is 3-colorable iff the constructed FPGA routing instance is 3-domain routable.

Let the edges in G be numbered from 1 to |E|. Associated with each edge  $e_i$ , say (a,b) here, there is a partial global routing in 7 (D\*D<sub>c</sub>+1 in general) diagonally placed C boxes with 4 nets {a<sub>1</sub>, a<sub>2</sub>, b<sub>1</sub>, b<sub>2</sub>} ({a<sub>1</sub>...,a<sub>D<sub>c</sub></sub>, b<sub>1</sub>,...,b<sub>D<sub>c</sub>} in general) and 17 enforcer nets ((D<sub>c</sub>\*D-1)D+(D-2)D<sub>c</sub> in general) {x<sub>i1</sub>, x<sub>i2</sub>, y<sub>i1</sub>, ..., y<sub>i5</sub>, y'<sub>i1</sub>, ..., y'<sub>i5</sub>, and y"<sub>i1</sub>, ..., y"<sub>i5</sub>}.</sub>



The 7 C boxes (for edge  $e_i = (a,b)$ ) arranged in a diagonal fashion will contain net segments as following:

| $(x_{i1}, x_{i2}, a_1, a_2, b_1, b_2),(1)$                    |
|---------------------------------------------------------------|
| $(x_{i1}, y_{i1}, y_{i2}, y_{i3}, y_{i4}, y_{i5})$ (2)        |
| $(x_{i2}, y_{i1}, y_{i2}, y_{i3}, y_{i4}, y_{i5}),(3)$        |
| $(a_1, y'_{i1}, y'_{i2}, y'_{i3}, y'_{i4}, y'_{i5})$ (4)      |
| $(a_2, y'_{i1}, y'_{i2}, y'_{i3}, y'_{i4}, y'_{i5}),(5)$      |
| $(b_1, y''_{i1}, y''_{i2}, y''_{i3}, y''_{i4}, y''_{i5})$ (6) |
| $(b_2, y''_{i1}, y''_{i2}, y''_{i3}, y''_{i4}, y''_{i5})(7)$  |

By allocating the C boxes diagonally, we can build a partial global route, starting from the leftmost (highest) C box, that connects all segments belonging to the same net by using a vertical route starting at the adjacent lower right C box then to the nearest lower C box containing the same net (as shown in Fig. 3). To complete the global routing, the appropriate nets from the partial global routings must be connected. For example, if there is an edge (a,b) and an edge (a,c) then the net segments a<sub>1</sub> and a<sub>2</sub> from both partial global routings need to be connected. This process is done for all nets. This global route can be built without adding new track constraints, since any additional track constraint created by this is a proper subset of track constraints created by one of the original C boxes.

The goal of this partial global routing in these 7 C boxes is to enforce the following domain constraints:

- Nets x<sub>i1</sub> and x<sub>i2</sub> must be in the same domain. Similarly for nets a<sub>1</sub>, a<sub>2</sub> and nets b<sub>1</sub>, b<sub>2</sub>.
- 2. Nets a<sub>1</sub> and a<sub>2</sub> must be in a different domain from nets b<sub>1</sub> and b<sub>2</sub>.

We claim that with the above partial global routing, these 2 domain constraints will be fulfilled. First, from (2) and (3), nets  $x_{i1}$  and  $x_{i2}$  must be in the same domain. Suppose not. Then let  $x_{i1}$  be in domain  $D_1$  and  $x_{i2}$  be in domain  $D_2$ . Then there must be a  $y_{ij}$  in  $D_1$  and a  $y_{ik}$  in  $D_2$ , say  $y_{i1}$  in  $D_1$ ,  $y_{i2}$  in  $D_2$ . However that leaves  $y_{i3}$ ,  $y_{i4}$ , and  $y_{i5}$  to fit in only one domain (capacity only 2), which is impossible. Thus nets  $x_{i1}$  and  $x_{i2}$  must be in the same domain. A similar argument holds for (4) and (5) concerning nets  $a_1$  and  $a_2$ , and (6) and (7) concerning nets  $b_1$  and  $b_2$ . Therefore, domain constraint 1 is satisfied. Finally, from (1) nets  $x_{i1}$  and  $x_{i2}$ , nets  $a_1$  and  $a_2$ , and nets  $b_1$  and  $b_2$ , all must be in different domains, thus domain constraint 2 is satisfied as well.

Every coloring constraint edge in G(V,E) will impose a similar routing domain constraint in the FPGA routing instance, therefore, G is 3-colorable iff the FPGA routing instance is 3-domain routable.

As the 3-coloring problem is NP-complete for a general graph, the considered 3-disjoint domain routability problem is also NP-complete. Q.E.D.

For general D domain routable problem with D 3 and D<sub>c</sub> 2, we similarly build a routing instance with  $Wg = D^*D_c$  by adding  $(D_c^*D-1)D$  y-type +  $(D-2)D_c$  x-type enforcing nets, and reduce a D-colorable graph coloring problem to this D-domain routable problem. For each edge  $e_i = (a,b)$ , we build a partial global routing with the following  $D^*D_c+1$  C boxes:

 $\begin{array}{l} (x^{i}_{i1},..,x^{i}_{ibc},...,x^{D^{-2}}_{i1},...,x^{D^{-2}}_{ibc}, a_{1},..,a_{bc}, b_{1},..,b_{bc}), \\ (a_{1}, y^{i}_{i1},...,x^{D^{-2}}_{ibc}, a_{1},..,a_{bc}, b_{1},..,b_{bc}), \\ (a_{1}, y^{i}_{i1},...,x^{D^{-2}}_{ik}) \text{ where } k = Wg - 1 \\ \dots \\ (a_{bc}, y^{i}_{i1},...,y^{i}_{ik}), \\ (b_{1}, y^{2}_{i1},...,y^{2}_{ik}), \\ (x^{i}_{i1}, y^{3}_{i1},...,y^{2}_{ik}), \\ (x^{i}_{i1}, y^{3}_{i1},...,y^{3}_{ik}), \\ \dots \\ (x^{D^{-2}}_{i1}, y^{D}_{i1},...,y^{D}_{ik}), \\ \dots \\ (x^{D^{-2}}_{ibc}, y^{D}_{i1},...,y^{D}_{ik}) \\ \dots \\ (x^{D^{-2}}_{ibc}, y^{D}_{i1},...,y^{D}_{ik}) \\ \end{array}$ 

The complexity of this problem remains NP-complete. Thus we have the following theorem.

**Theorem 1.** Problem [ RDP of 2-D Even Regular Routing Architecture ] is NP-complete for any fixed S box topology with  $D_c$  1 and D 3.

From this theorem we conclude that because the D domain routability problem is NP-complete for any even regular architecture with any large number of routing domains and capacity, it is not likely that we can alleviate this mapping intractability by purely adding switches. Similar to the approach shown in [7], this result can be easily extended into multi-pin routing cases with or without doglegs.

## 4. Mapping Ratio Bound and Perfect Routing Problem

The next questions are: Does there exist a 2-D regular routing architecture that can achieve perfect routing? And what is the worst case mapping ratio bound of this architecture? **Lemma 2.** In an even regular routing architecture, there is no constant mapping ratio bound, if  $W_g > D_c$ . In the worst case, MR = N/W<sub>g</sub>, where N is the number of nets.

#### Proof

If  $W_g = D_c$ , all the nets can be contained in a single domain. Otherwise, if  $D_c < W_g$  we claim that there is no constant ratio bound and consequently no perfect routing. We can show the existence of a worst case by building a completely distributed global route of N nets over Wg density. This is a net segment distribution such that for every Wg size subset of N nets, there is a C box containing segments of these nets. Thus there will be C(N, Wg) connection boxes each with a different combination of Wg nets. Given a completely distributed global routing, any routing domain with capacity D<sub>c</sub> can only pack (detailed route)  $D_c$  nets. This is because if this domain packs  $D_c + 1$  nets then it will have a resource conflict at those C boxes that contain these  $D_{c}$  + 1 nets. Therefore, after the first domain is packed, N -  $D_{c}$ nets will be left unpacked. Similarly, the next routing domain can only pack another D<sub>c</sub> of the leftover nets. The same reasoning is valid until finally the number of leftover nets are less then or equal to D<sub>c</sub> and packable in one domain. In this case, there is no detailed routing solution without using exactly N tracks, and MR = N/W<sub>g</sub>, which is clearly not constant bounded. Consequently, there can be no perfect routing for this architecture either. The key left for this proof is to show this kind of completely distributed global route can be built on regular architectures.





Fig. 4 shows an example on how a completely distributed global route with N = 4, and  $W_g = 3$ , can be built on a regular routing architecture. By allocating each of such C box diagonally, we can assign a unique vertical route for each net and then connect it to all its segments in all the appropriate C boxes. It is clear that this is a valid global route without adding new routing constraints. Q.E.D.

It is not hard to see that these undesired results are not just on even regular structures, it is also true on any uneven regular model .

**Theorem 2.** All 2-D (even or non-even) regular routing architectures do not have constant mapping ratio bound if  $W_g > max$  ( $|D_i|$ ), where  $|D_i|$  is the domain capacity of the ith domain, and cannot achieve perfect routing.

Considering an uneven regular routing architecture with just two routing domains, one with capacity W - 1 and the other with capacity one. In such a routing architecture, the switch density of a switch box is  $6(W - 1)^2 + 6$ , which is close to the <u>complete flex-ibility</u> of  $6W^2$ , however, both CMR and PR are still not achievable.

## 5. Greedy Routing Architectures

The design style of a routing architecture seems to have a dominating impact on the existence of some desired routing properties. In this section, we show a different design style, a greedy routing structure, that can lead to some better routing properties.

A <u>Greedy Routing Architecture (GRA)</u> is defined to be a routing architecture possessing the following <u>greedy routing proper-</u><u>ty</u>: "A global optimum routing can be extended from a local optimal routing." A classical example of GRA is the channel routing problem with no vertical constraint for which an optimum routing can be achieved from greedily packing each track domain using the left-edge algorithm. Most other known routing architectures do not possess this greedy routing property, therefore an optimally routed partial routing can not be extended into an optimum global solution. For example, a single net may be routed optimally, but this route may be a poor choice when the routing of all nets is considered. Among many possibilities, we just show two simple examples to illustrate this notion.

## 5.1 type I: 3-Way Switch Box



A T-box is defined to be an S Box with incoming and outgoing nets in only 3 directions. Two T-box structures are shown at Fig. 5. Fig. 5a shows a T-box with  $F_s$  of 2 for each track while in Fig. 5b, complete flexibility (In fact, models with less flexibilities can also achieve these same properties.) is added in between the top and the left sides. Using these T-boxes, the global routing of all nets can be limited into a tree type topology. And an optimum greedy routing can be carried out in linear time with the worst case mapping ratio of 3/2 for T-box-1 based architecture [7]. Here, we demonstrate a simple way of conducting a greedy perfect routing for a T-box-2 based architecture by solving a local [ one-side constrained 3-way routing problem ]:

Instance: A T-box-2 routing architecture, a global route of nets on the 3 surrounding C boxes of the T-box-2, with detailed route of one side (top) pre-determined (one-side constrained).

Question: Does there exist a valid detailed routing of this global routing on the given T-box-2 routing architecture?

The routing problem can be solved by

1. first applying the left edge routing algorithm from the top to the right C boxes, then from the right to the left C boxes.

2. applying bipartite matching between the top and left C boxes. This algorithm solves the PR problem of the T-box-2 locally, and a global optimum routing can be achieved by propagating this local routing to the whole chip by either depth first or breadth-first manner from the root. (Fig. 6.)



#### 5.2 Type II: Non-Symmetric 4-Way Switch Box

By binding a routing architecture to a pre-specified routing procedure (or algorithm), many greedy routing structures can also be built. The architecture shown at Fig. 7 is an example of applying a prespecified spiral routing sequence with the preallocated three kinds of non-symmetric 4-way S boxes. A 4-way nonsymmetric S box can be built by adding more switches on certain side pairs to make a prescribed routing procedure perfect. The number, starting from 1, shown in the figure is the routing sequence of those C boxes. In this example three kinds of S boxes are used, each kind is used to implement a 4-way routing with one-side, two-side, or three-side constrained respectively. Compared to a regular structure that requires  $6W^2$  (complete flexibility) switches per S box to achieve CMR or PR, this greedy architecture requires fewer switches while achieving CMR and PR without any additional limitation on net topologies [7].

**Theorem 3.** Both Type I and II greedy architectures can achieve PRDA, CMR, and PR with incomplete switch box flexibilities.

## 6. Conclusions

In this paper, we investigate several fundamental routing problems on a well-known regular 2-D FPGA architecture with substantial addition of switching flexibility. The results seem to be a bit counterintuitive. We also see that two types of greedy architectures can be developed by either limiting the global route topology (3-way S box) or applying non-symmetric 4-way S box-es without limiting the global route topology. The common theme of these greedy architectures is a strong tie between the routing algorithms and their correspondingly designed architectures. Probably, a more interesting point is: the routing algorithm is decided first and then the architecture is.

Although in the worst-case analysis (CMR and PR), the greedy architecture shows a clear superiority over the regular one, it is still unknown how a more practical "semi-greedy" architecture performs on average cases over a regular architecture with comparable routing resources. And it is still open if a type I greedy architecture would pay-off with the sacrifice of some topological freedom in choosing a global route. With respect to the PRDA, some regular architectures may still possess efficient routing heuristics [8]. It will be a future challenge to justify these

complicated trade-offs in choosing these two different style of architectures and its extension to higher design automation levels. We suspect that this may be a strongly application dependent issue and greedy architectures should find some application in very large real-time reconfigurable circuitries. Our theoretical results help understand the implications the routing architecture imposes on routing problems and also suggest a new view in routing architecture designs.

## 7. Acknowledgements

The authors would like to thank Professor Malgorzata Marek-Sadowska of University of California, Santa Barbara, and Professor Shuji Tsukiyama of Chuo University, Tokyo, for their many helpful technical discussions.

This work was supported in part by the National Science Foundation under Grant MIP 9117328 and in part by AT&T Bell Laboratories and Digital Equipment Corporation through the California MICRO program.

## **References**:

[1]. "The Programmable Logic Data Book", Xilinx, 1994.

[2]. Yu-Liang Wu, and Malgorzata Marek-Sadowska, "Graph Based Analysis of FPGA Routing", Proceedings of EURO-DAC with EURO-VHDL, pp. 104-109, 1993.

[3]. Yu-Liang Wu, Shuji Tsukiyama, and Malgorzata Marek-Sadowska, "On Computational Complexity of a Detailed Routing Problem in Two-Dimensional FPGAs", Proceedings of 4th Great Lakes Symposium on VLSI, March 1994.

[4]. Vwani P. Roychowdhury, Jonathan W. Greene, and Abbas El Gamal, "Segmented Channel Routing", IEEE Trans. on CAD, Vol. 12, No. 1. pp. 79-95, January 1993.

[5]. S. Brown, J. Rose, and Z. G. Vranesic, "A Detailed Router for Field-Programmable Gate Arrays", IEEE Trans. on CAD, Vol. 11, No. 5. pp. 620-628, May 1992.

[6]. Stephen Brown, J. Rose, and Zvonko G. Vranesic, "A Stochastic Model to Predict the Routability of Field-Programmable Gate Arrays", IEEE Trans. on CAD, Vol 12, No. 12, pp. 1827-1838, December 1993.

[7]. Yu-Liang Wu, Shuji Tsukiyama, and Malgorzata Marek-Sadowska, "Graph Based Analysis of 2-D FPGA Routing", submitted for publication.

[8]. Yu-Liang Wu, and Malgorzata Marek-Sadowska, "An Efficient Router for 2-D Field Programmable Gate Arrays", Proceedings of EDAC, pp. 412-416, 1994.