HDL Code Restructuring Using Timed Decision Tables

Jian Li
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

Rajesh K. Gupta
Information & Computer Science
University of California, Irvine
Irvine, CA 92697

http://www.ics.uci.edu/~iesag

Abstract

Behavioral HDL descriptions from designers are structured in a program calling hierarchy for the purposes of programming convenience and conceptualization. This code structure is often not suitable for direct synthesis into digital hardware. For instance, information regarding operation exclusivity and resource sharing can be explored by restructuring the code. In this paper, we present a method to restructure behavioral HDL code using merging and decomposition of Timed Decision Tables (TDTs).

1 Introduction

With the maturity of synthesis tools at various levels of design abstractions, hardware design is getting closer to software programming [1]. As in programming, designers often use language level structures such as subprograms (functions/procedure calls) to organize the HDL descriptions. This structure in HDL descriptions is useful for programming convenience and design conceptualization. However, the subprogram calling hierarchy most suitable for design development and design comprehension is not always suitable for synthesis. In this paper, we propose a method to change the subprogram hierarchy in designer-specified HDL descriptions into one that is more suitable for synthesis. The method is implemented using a tabular model called Timed Decision Tables (TDTs). We first use TDT merging to flatten subprogram calling hierarchy. Then we use TDT decomposition to re-build a calling hierarchy under cost matrices related to operation/subprogram size and calling frequency.

Due to space limitation, we focus on use of TDT transformations rather than transformation algorithms. Specifically, we use TDT merging and decomposition transformations. We explain concepts of TDT modeling, TDT merging, and TDT decomposition using examples in HardwareC [2].

The rest of this paper is organized as follows. Section 2 introduces the TDT model and merging and decomposition transformations on TDTs. Section 3 demonstrates code restructuring using an example followed by a discussion of implementation and results.

2 Hierarchy Merging & Decomposition

In this section we first briefly introduce the TDT model before we present TDT merging and decomposition.

2.1 The TDT Model

The TDT representation models a hardware component with a set of three tables: a control table, a dependency table, and a delay table. The control table models the control-flow of input HDL description. It is organized in the form of decision tables [3, 4]. A decision table consists of four quadrants: condition stub, condition entries, action stub, and action entries. The condition stub list conditions which are condition checking in the conditional branches and conditional loops in the input HDL. The action stub list actions which are operations in the behavioral code. Condition entries and action entries are two matrices, each column of which forms a control path. The dependency table specifies information such as data-dependency, concurrency type [2], and serialization relation between each pair of operations. The delay table lists the execution delay associated with each action. More details of TDT can be found in [5].

Example 2.1. Consider the following control table of a TDT model with two action sets $a_2$ and $a_3$ in its two different control paths. In this simple case we ignore the dependency table and take the control table as the whole representation.

$$
TDT_1 = \begin{bmatrix}
   c_1 & 1 & 0 \\
   a_2 & 1 & 0 \\
   a_2 & 0 & 1 
\end{bmatrix}
$$

Note that a '1' in the condition part indicates that condition $c_1$ takes a logic value TRUE while a '0' in the condition indicates
a logic value FALSE. A ‘1’ in the action part indicates that the action in the same row is selected for execution while a ‘0’ indicates that the corresponding action is not selected for execution in this control path.

A TDT may call another TDT as its action. This forms a hierarchical TDT representation.

**Example 2.2.** Now suppose a1 in Example 1.1. is modeled by TDT2 below.

\[
\begin{array}{c|cc}
  c_2 & 1 & 0 & 0 \\
  s_1 & 1 & 0 \\
  s_4 & 1  \\
  s_5 & 1 \\
\end{array}
\]

Then we have a hierarchical TDT representation consisting of TDT1 calling TDT2. Note that for clarity, we have dropped the ‘0’s in the action part.

There are two types of TDTs: procedure TDTs, and process TDTs. A procedure TDT corresponds to nested conditional branches or a subprogram (procedure/function). It is executed only once every time it is invoked. Both TDT1 and TDT2 are examples of procedure TDTs. A process TDT corresponds to a process or a loop in a behavioral HDL. It is executed repeatedly once invoked. A process TDT representing a conditional loop exits at a certain point while a process TDT representing a process runs indefinitely.

**Example 2.3.** A simple loop “while(c) a;” may be represented as a process TDT:

\[
\begin{array}{c|cc}
  S & 0 & 0 \\
  c & 1 & 0 \\
  a & 1  \\
  S \text{ (end of loop)} & 0 & 1 \\
\end{array}
\]

S is a state variable. The double outlines indicate that this is a process table. It starts when \( S = 0 \) and stops when \( S = 1 \).

**2.2 Transformations on TDTs**

This section briefly describes important TDT transformations for hierarchy optimization. There are two categories of TDT transformations: (1) transformations involving multiple TDTs, and (2) transformations on a single TDT.

Transformations involving multiple TDTs include TDT merging and TDT decomposition. Merging is the process of creating a flattened TDT representation from a hierarchical TDT representation. Decomposition is the process of breaking a flattened TDT into TDTs organized in a calling hierarchy.

Transformations on a single TDT are either used for the purpose of reducing the size of TDTs or make other TDT transformations possible. We have presented transformations targeting at size reduction in [5]. Here we show one transformation which is used in merging.

Serialization specified in the dependency table may be explicitly included in the control table by introducing a state variable in the control table. In the sequel, we first show a TDT with a dependency table, followed by an equivalent TDT with serialization information included in the control table.

**Example 2.4.** Consider the following TDT representation:

\[
\begin{array}{c|cc}
  a_1 & a_2 & a_3 \\
  c_1 & 1 & 0 & 0 \\
  c_2 & - & 1 & 0 \\
  a_1 & 1 & 1 \\
  a_2 & 1 & 1 \\
  a_3 & 1 & 0  \\
\end{array}
\]

The ‘m’ in the dependency table indicates that action \( a_3 \) modifies condition \( c_2 \). After introducing a state variable \( S \) to explicitly serialize \( a_1 \) and \( a_2 \) in the control table, we get the following TDT:

\[
\begin{array}{c|cc|c}
  a_1 & a_2 & a_3 & S \\
  c_1 & 1 & 1 & 0 & 0 \\
  c_2 & 1 & 1 & 0 & 0 \\
  a_1 & 1 & 1 & 1 \\
  a_2 & 1 & 1 & 1 \\
  a_3 & 1 & 1 & 1 \\
  S & 0 & 0 & 0 \\
  S & 1 & 1 & 1 \\
\end{array}
\]

The new TDT is a process TDT, which starts execution from \( S = 0 \) and stops when \( S = 2 \). Actions \( a_2 \) and \( a_3 \) can also be serialized in a similar fashion.

**2.3 TDT Merging**

Merging is the process of creating a big flattened TDT representation from a hierarchical TDT representation. Merging involving only procedure TDTs results in a procedure TDT. Merging involving a process TDT always results in a process TDT.

**2.3.1 Three Basic Merging Cases**

Merging involving only procedure TDTs may be classified into three types: (1) merging TDTs in a hierarchy, (2) merging TDTs in a sequence, and (3) merging TDT with a following or preceding action set. Below we show one example for each merging case.

**Base Case 1: merging TDTs in a hierarchy.**

This corresponds to merging TDTs converted from behavioral HDL code with subprogram calls or nested conditional branches.

**Example 2.5.** Given the hierarchical TDT representation in Example 2.2, which consists of TDT1 calling TDT2 as an action set. They can be merged into one table as follows.
Base Case 2: merging TDTs in a sequence. Merging in this case is valid when a concurrency type of parallel is specified, or when a type of data-parallel is specified and no action in a preceding TDT modifies conditions of following TDTs. The dependency tables need to be modified accordingly when there is data dependency between actions in different TDTs.

Example 2.6. Given a TDT sequence \( \{TDT_5; TDT_6\} \) with details of the two TDTs shown below.

\[
\begin{array}{c|c|c|c}
TDT_5: & c_1 & 0 & 1 \\
& c_2 & 1 & 0 \\
& a_1 & 1 \\
\end{array}
\begin{array}{c|c|c|c}
TDT_6: & c_1 & 0 & 1 \\
& c_2 & 1 & 0 \\
& a_2 & a_3 \\
\end{array}
\]

These two TDTs can then be merged into \( TDT_m \) where

\[
\begin{array}{c|c|c|c}
TDT_m: & c_1 & 1 & 0 \\
& c_2 & 0 & 1 \\
& a_1 & 1 \\
& a_2 & 1 \\
& a_3 & 1 \\
\end{array}
\]

Base Case 3: merging a TDT with an action set. This transformation is invalid when one action modifies the condition in a following TDT.

Example 2.7. We assume \( a_3 \) follows \( TDT_1 \) shown in Example 2.1, in an action set of concurrency type serial. Action \( a_3 \) and \( TDT_1 \) can be merged as shown in the following.

\[
\begin{array}{c|c|c|c}
TDT_1: & a_1 & 0 & 1 \\
& a_2 & a_3 \\
\end{array}
\begin{array}{c|c|c|c}
& c & 1 & 0 \\
& a_1 & 1 & 0 \\
& a_2 & 0 & 1 \\
& a_3 & 1 & 1 \\
\end{array}
\]

Note that to preserve the specified behavior, we have to modify the dependency table. The symbol 'n' in column \( a_1 \) indicates that if both \( a_1 \) and \( a_3 \) are selected for execution, they are run in a sequential order where \( a_3 \) is always run after \( a_1 \) finishes.

2.3.2 Merging Involving Data-Dependent Conditions

To handle a data-dependent condition, we introduce a state variable in TDTs to serialize the modification and use of the condition. This may be better explained using one example.

Example 2.8. Given a HardwareC code fragment:

\[
c = e > f; \text{ if } (c) \ a_2; \text{ else } a_3;
\]

The if statement can be converted into a TDT with one condition \( c \). We call this table \( TDT_8 \):
where $TDT_s$ is a sub-TDT containing two serialized actions $a_1$ and $a_2$:

<table>
<thead>
<tr>
<th>$a_1$</th>
<th>$a_2$</th>
<th>$a_3$</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>2</td>
</tr>
</tbody>
</table>

Note that the dependency table has been split accordingly as well. □

In [6] we have presented an algorithm to automatically decompose a flattened TDT. This algorithm uses a two-level algebraic representation of the control table. It consists four major steps:

1. Construct the two-level algebraic representations of the control table.
2. Extract the algebraic kernels of the above expression.
3. Select kernels that lead to valid transformations using specification in the dependency table. (these kernels are called compatible primary kernels [6])
4. Construct the hierarchical TDT representation using original TDT and the compatible primary kernels.

**Example 2.11.** The two-level algebraic expression of control table $TDT_{src}$ is

\[ S_{c1}a_1 + S_{c2}a_2 + S_{c1}c_1a_1 + S_{c1}c_2a_2 + S_{c1}c_3a_3 \]

If we introduce one more state bit $S_1$ to include in control table the serialization between $a_2$ and $a_3$, we get

\[ S_{s1}S_{c1}a_1 + S_{s1}S_{c2}a_2 + S_{s1}S_{c1}c_1a_1 + S_{s1}S_{c1}c_2a_2 + S_{s1}S_{c1}c_3a_3 \]

A primary kernel which has two co-kernels and does not appear to be a kernel of another such kernel is $S_{s1} + S_{s2}$. This kernel is essentially two actions $a_1$ and $a_2$ executing in a sequence. They may be factored out as a procedure if this leads to resource reduction in synthesis. □

The decomposition algorithm is driven by cost functions related to maximizing the number of times a subtable is called, and to minimizing the size of the subtable. Decomposition is done in an iterative manner with merging to construct alternative hierarchical structure.

### 3 HDL Code Restructuring

The merging and decomposition algorithms have been implemented in a program called PUMPKIN. We use TDT merging and TDT decomposition in HDL code restructuring. In this section we give one example to show how restructuring is done.

**Example 3.1.** Consider the following code fragment of the receiver synchronous component of the i8251 UART, written as a HardwareC process `rcvr_sync` which calls a procedure named `hunt_mode`. The process `rcvr_sync` reads data from a serial line, packs it up according to a control command from the main control process called `i8251`, and sends it to process `i8251`. Procedure `hunt_mode` finds the synchronization point in time for process `rcvr_sync` before `rcvr_sync` starts packing and sending data.

```hardwarec
procedure hunt_mode( rxd, drdy, sync1, sync2, mode )

if ( valid ) {
  ncount = ncount - 1;
  data = ncount - 1;
  if ( mode[7:7] == 0 ) {
    if ( sync_mode ) {
      send( i8251, data);
      ncount = ncount - 1;
      if ( ncount < 0 ) {
        done = FALSE;
      } else {
        ncount = ncount + 1;
      }
    }
    goto( done );
  }
}
```

```hardwarec
process rcvr_sync(rxd, drdy, valid, mode, control, sync1, sync2, mode)

if ( valid ) {
  ncount = ncount - 1;
  if ( mode[7:7] == 0 ) {
    if ( sync_mode ) {
      send( i8251, data);
      ncount = ncount - 1;
      if ( ncount < 0 ) {
        done = FALSE;
      } else {
        ncount = ncount + 1;
      }
    }
    goto( done );
  }
}
```

We first convert the HardwareC programs into a hierarchical TDT representation. Then we perform merging to get a flattened TDT representation. Using the algorithms presented in [6], we have found two primary compatible kernels $k_1$ and $k_2$ as indicated in Example 3.1. above. We pick $k_2$ containing a subtraction `ncount - 1` operation and a shifting operation `data >> 1` since it leads to bigger potential size reduction. We reconstruct a sub-TDT using this primary compatible kernel. From this sub-TDT, we generate a HardwareC procedure as shown in Figure 1.

Using the co-kernels and the flattened TDT representation, we can construct the main TDT in the...
procedure sharable(rxd, drdy, data, mode) in port rxd, drdy; in port mode[8];
out port data[8];
[
    boolean ncount[3];
    ncount[2:2] = 1;
    ncount[0:1] = mode[1:2];
    while ( ncount ) [
        wait ( drdy );
        data[7:7] = rxd;
        data = data » 1;
        ncount = ncount - 1;
    ]
]

Figure 1: The extracted sharable procedure.

resulting hierarchical TDT representation. From this TDT representation, we generate the rest of the process rcvr_sync as shown in Figure 2.

4 Discussion and Summary

We have presented a TDT-based method to modify the subprogram hierarchy in behavioral HDL descriptions. This code restructuring is a generalization of code transformation and motion techniques in compilers.

Earlier work on code structuring forms a part of compiler optimizations in high level programming languages [7, 8]. The TDT based code-restructuring is different from conventional compiler optimizations [7] in several ways. First, there are multiple levels of nested concurrent operations in a behavioral HDL description in contrast to a sequential code. While parallelism between operations is defined and used extensively, the correctness of the final result is defined by the original sequential code (i.e., sequential consistency must be maintained). In HDL code, however, correctness of the transformed code is defined by the partial order and specification of concurrent operations in the original HDL. For this reason, timing semantics of HDL statements in terms of number of cycles taken by an operation and partial ordering of operations must be maintained through the transformations. We capture the timing attributes of operations in the delay table of TDTs and use it to generate concurrent final HDL code. Second, the objective in TDT-based optimizations is very different from that in conventional compilers. Compilers mainly optimize to reduce the execution time of software programs while in HDL code transformations, we are targeting at reducing sensitivity of synthesis results to HDL coding styles, increasing resource sharing, as well as reducing the schedule length when possible.

We are conducting experiments on benchmark designs to identify exactly what hierarchical structure is needed that leads to most efficient synthesis. Results from the experiments will be presented at the workshop.

References