# **Covalidation of Hardware-Software Systems**

# Ian G. Harris

### Department of Computer Science University of California Irvine

- Hardware/Software Codesign and Covalidation
- Evaluation of Fault Models
- Timing Fault Model
- Automatic Test Generation
- Conclusions

### Hardware/Software Codesign



Specification: Describe the behavior of each task

- Partitioning: Group tasks to be implemented together
- Allocation: Map partitions to hardware
- Scheduling: Ordering the execution of tasks
- Communication Synthesis: Map data transfers to comm. structures

### Hardware-Software Codesign/Covalidation Flow



- Design is refined at each step
- Each design step may introduce errors

#### • Validation/verification is a bottleneck in the design process

- High cost of design debug (designers, time-to-market)
- High cost of faulty designs (loss of life, product recall)

#### • Hardware/Software covalidation problem is more acute

- Hardware and software are often used together
- Hardware and software are designed separately
- Covalidation is performed late in the process, necessitating long redesign loops

# **General Verification Approaches**

### Formal verification

- Time complexity is high
- Confidence is high for specified properties

### Simulation based validation

- Full-chip validation
- Confidence is lower
- Pentium 4 bugs found by FV (492) vs. Validation (5809) [1]

[1] B. Bentley, "Validating the Intel Pentium 4 Microprocessor," DAC'01.

# **Stages of Covalidation**



# Challenges in Hardware-Software Covalidation

### Covalidation Fault Model

- Describes a set of potential design defects
- Provides goals for test generation

### Automatic Test Generation

- Create a test sequence which guarantees detection of design defects
- Allows the covalidation process to be more fully automated

# **Covalidation Fault Models**

- A covalidation fault models the behavior of a set of design errors
- Properties of a covalidation fault model:

#### **Modeling Accuracy**

 Detection of all faults should imply the detection of a set of modeled faults



• Few faults should model many design errors

 Assume that a defect is associated with a single statement in the behavioral description

The defect associated with each line is non-specific

Detection of the fault is assumed if the statement is executed

**Goal**: To measure the accuracy of a fault model.

•Quantitative evaluation is essential for model development

- Identify weaknesses in existing models
- Build confidence in the use of a fault model
- Avoid the fate of software test (use manufacturing test as a model)

#### •Compare *fault coverage* to *error coverage*

•Fault coverage is a fast approximation of error coverage

 Compute both fault and error coverage for many benchmarks and test sequences

Examine the difference between the fault and error coverages and the standard deviation of the difference

# **Error Coverage Computation**

•A design error model is needed to compute error coverage

Design Error Model Requirements

•Must describe a well defined subset of real design errors

Must be small enough to be tractable

 Error coverage is computed by injecting design errors and performing simulation

 Error is detected if the output of the correct behavior is different from the output of the erroneous behavior

Example: a = b \* c (correct)  $a = \mathbf{x} * c$  (incorrect)

Error is not detected if c = 0 or if statement is not executed

# **Design Error Model**

• "Goof" errors - simple typographical mistakes

Accounted for 12.7% of design errors found in the Pentium 4\*

•Goof errors are described in research in *mutation analysis* 

- 1. Arithmetic Operator Replacement Each +, -, \*, / is replaced by each other
- Relational Operator Replacement Each >, <, =, !=, ... is replaced by each other
- 3. Variable Replacement Each variable is replaced with each other variable of same type
- \* B. Bentley, "Validating the Intel Pentium 4 Microprocessor", DAC 2001

# **Evaluation Experiments**

- •We evaluate statement coverage and branch coverage
- •We use three small examples written in Java

| Benchmark | Statemts | Branches | Errors |
|-----------|----------|----------|--------|
| GCD       | 19       | 1        | 162    |
| Diffeq    | 21       | 7        | 502    |
| TLC       | 65       | 14       | 214    |

- •20 random test sequences are used
- Each sequence achieves 80% 100% fault coverage
  - Only high coverage values are interesting

# **Evaluation Result Summary**

 Experiment demonstrates the type of information that can be gained from this evaluation technique

Data is not sufficient to draw conclusions

|            | Statement Coverage |               | Branch Coverage |               |
|------------|--------------------|---------------|-----------------|---------------|
| Benchmarks | Average diff       | St. Dev. diff | Average diff    | St. Dev. diff |
| GCD        | 16.35              | 3.17          | 16.68           | 3.16          |
| Diffeq     | 8.88               | 11.59         | 14.60           | 20.45         |
| TLC        | 10.72              | 5.06          | 2.28            | 6.24          |

- St. dev. is more important than average
  - Low st. dev shows consistency/fidelity
- •St. dev is roughly proportional to the number of errors

## **Development of Hardware/Software Fault Models**

### Must be formulated for a behavioral description

Manufacturing test has focused on logic level

### Must consider both hardware and software features

### • Hardware-Oriented Language Features:

Timing - signals vs. variables Concurrency - processes

### Static faults

Independent of absolute event timing

# Timing faults

Depends on a specific timing of events



- Producer/Consumer with FIFO to allow rate mismatch
- Delay on "empty" signal impacts synchronization



# Mis-Timed Event (MTE) Fault

- Signal refs are classified as either **Definition** or **Use**
- Two types of MTE faults can occur
  - MTE<sub>early</sub> definition occurs earlier than the correct time
  - MTE<sub>late</sub> definition occurs later than the correct time



- All definition-use (*du*) pairs must be executed during testing
  - MTE early faults are detected by du pairs
  - MTE late faults are detected by ud pairs
- Time difference between the definition and use must not exceed the error span threshold  $\delta$ 
  - Def and Use must be close in time so that a small time variation will reorder the def and use.
  - Magnitude of  $\delta$  determines sensitivity to timing variation

# Error Span Threshold $\boldsymbol{\delta}$

Determines the certainty that a def-use pair is reordered in the presence of a fault



 Large δ: small change in def time may not reorder the def and use

Testing requirements are less stringent, high coverage

Small δ: small change in def time is more likely to reorder def and use

Testing requirements are more stringent, low coverage

# MTE Coverage, Experimental Results

| Bench-<br>mark | # of<br>stmt. | # of<br>signals | # of<br>pairs | MTE<br>cov. * | Stmt.<br>Cov. |
|----------------|---------------|-----------------|---------------|---------------|---------------|
| AAL1           | 538           | 22              | 1732          | 0.1           | 0.70          |
| switch         | 402           | 29              | 200           | 0.65          | 0.93          |
| risc8          | 306           | 37              | 1032          | 0.54          | 0.60          |
| DTMF           | 1257          | 17              | 262           | 0.39          | 0.54          |

Industrial examples in Verilog

- Functional testbenches provided with each example
- \* Error span threshold is infinitely large. FC is maximized.

### Impact of $\delta$ on MTE Coverage, Data Switch



- Create test sequences to detect MTE faults
- Formulate the problem as a constraint logic programming (CLP) problem
- Use a generic CLP solver to perform test generation
- Solver searches the space of all computations to identify one which detects each fault

# **Test Generation Process**



- Input: *a* behavioral system specification
- Output: test sequence to detect a timing fault
- CCG: Computation Constraints Generator
- Computation Constraints describe system behavior

# **Behavioral Representation: CFSM model**

- **Co-design Finite State Machine**: A system is defined as a network of CFSMs where each CFSM describes a concurrent process in the system. The CFSMs communicates via events on signals.
- An Example of CFSMs System: traffic light controller



• Each edge is a cause-reaction pair

•MTE<sub>early</sub> fault on \*short signal: asserted while the *highway* is in the *green* state

# **Computation Model**

Each computation of the system must be described by the values of a set of integer variables **Computation Variables:** State Variable contains the value of state for each CFSM c at a given time step t.  $SV_{highway,0}$  = "green" *Edge Variable* the edge in each CFSM *c* which is traversed at a given time step t.  $EV_{highway,1} = e^{1}$ Signal Variable the value of signal at a given time step. *trigger signal* \*short\_0 = 0 value signal havecar\_2 = 1

# State Constraints of CFSM Computation

- a CFSM can be in state *s* at time t (SV<sub>*c*,*t*</sub> = *s*) if one of the following statements is true:
- The CFSM is in state s at time t-1 and the CFSM does not traverse an edge at time t-1;
- The CFSM is in a state s<sub>p</sub> at time t-1 and an edge from state s<sub>p</sub> to s is traversed at time t-1.



### **Example of State Constraints**



# Edge Constraints of CFSM Computation

a CFSM will traverse an edge e if all of the following statements are :

- The CFSM is in state s<sub>p</sub> at time t, where s<sub>p</sub> is the predecessor state of edge e;
- All of the trigger conditions of edge e are satisfied at time t.



## Example of Edge Constraints

Example: edge constraints of CFSM highway at time 1,

$$(\mathsf{EV}_{highway,1} = "e1") \rightarrow (\mathsf{SV}_{highway,1} = "green") \cap (\mathsf{car}_1 = 1) \cap (\mathsf{havecar}_1 = 1) \quad (1)$$

$$(\mathsf{EV}_{highway,1} = "e2") \rightarrow (\mathsf{SV}_{highway,1} = "yellow") \cap (\mathsf{short}_1 = 1) \quad (2)$$

$$(\mathsf{EV}_{highway,1} = "e3") \rightarrow (\mathsf{SV}_{highway,1} = "red") \cap (\mathsf{long2}_1 = 1) \quad (3)$$

$$(\mathsf{EV}_{highway,1} = "null") \rightarrow \mathsf{NOT1} \cap \mathsf{NOT2} \cap \mathsf{NOT3}$$



# Signal Constraints of CFSM Computation

two types of signals exist in CFSM system

- Trigger Signal: such as \*short
  - \*tsig<sub>t</sub> = 1 if at least one edge emitting this signal is traversed at time (t-1);

 $*tsig_t = 0$  otherwise.





### Example of Trigger Signal Constraints

Example: trigger signal \*short constraints of CFSM highway at time 3,

$$(\text{short}_3 = 1) \rightarrow (\text{EV}_{timer,2} = \text{``e8''}) \cup (\text{EV}_{timer,2} = \text{``e11''})$$
$$(\text{short}_3 = 0) \rightarrow (\text{EV}_{timer,2} \neq \text{``e8''}) \cap (\text{EV}_{timer,2} \neq \text{``e11''})$$



# Formulation of Fault Detection Constraints

• Fault Detection Constraints: i.e.\*short occurs early

| time step     | 0 | 1     | 2 |
|---------------|---|-------|---|
| highway state | - | green |   |
| *short        | - | 1     |   |

• *Equations:*  $SV_{highway,1} = green$ , short<sub>1</sub> = 1.



## Test Generation Result, Traffic Light Controller

- *Fault*: \*short occurs early when highway is green
- Fault Detection Constraints: equations given earlier
- Environment:

Machine P4, 2GHz CPU, 256MB MEM GNU-Prolog version1.2.1

Performance: 45 ms

| time step        | 0     | 1     | 2      |
|------------------|-------|-------|--------|
| highway<br>state | -     | green | -      |
| *short           | -     | 1     |        |
| street<br>state  | red   | red   | red    |
| highway<br>state | green | green | yellow |
| signal<br>*short | 0     | 1     | 0      |

| *tick | 1 | 1 | 1 |
|-------|---|---|---|
| *car  | 0 | 1 | 0 |
| car   | 0 | 1 | 1 |

#### **CFSM** Computation

# Summary

- A method to evaluate the accuracy of a fault model
- A fault model for timing and synchronization errors
- A CLP-based test generation tool