A Strategyproof Mechanism for Scheduling Divisible Loads in Tree Networks

Thomas E. Carroll and Daniel Grosu
Wayne State University
Department of Computer Science
5143 Cass Avenue, Detroit, MI 48202, USA.
{tec, dgrosu}@cs.wayne.edu

Abstract

The underlying assumption of Divisible Load Scheduling is that the processors composing the network are obedient, i.e., they do not “cheat” the algorithm. This assumption is unrealistic if the processors are owned by autonomous, self-interested organizations that have no a priori motivation for cooperation and they will manipulate the algorithm if it is beneficial to do so. In this paper we propose the strategyproof mechanism DLS-TL for scheduling divisible loads in tree networks. Our proposal augments Divisible Load Theory (DLT) with incentives such that it is beneficial for processors to report their true processing capacity and compute their assignments at full processing capacity. Additionally, incentives are provided for processors to report algorithm deviants. Deviants are penalized which abates the processors’ willingness to deviate.

1. Introduction

One of the most studied topics in distributed systems is scheduling. Poor scheduling decisions lead to inefficiencies, underutilized resources, and suboptimal performance. In this paper we focus on the problem of scheduling divisible loads. Divisible load problems are characterized by data parallelism. These problems have large data sets where every element within the set requires an identical type of processing. The set can be partitioned into any number of fractions where each fraction requires scheduling. These problems commonly arise in many domains including image processing [15], databases [8], linear algebra [9], visualization [4], and multimedia broadcasting [5].

Scheduling divisible loads is the subject of Divisible Load Theory (DLT) which was extensively studied in [6] where influences such as network architectures (e.g., linear, bus, tree), task arrangements, and optimality conditions are explored. The underlying assumption of DLT is that the processors are obedient, i.e., under no circumstances will the processors “cheat”. In the real world, the assumption is unrealistic as the nodes may be owned by autonomous, self-interested entities that have no a priori motivation for cooperation and they are tempted to manipulate the algorithms in hope of increased benefits. In this type of environment, the processors are properly modeled as strategic agents. New protocols for DLT must account for this self-interested behavior. Mechanism design theory [21] — a field of economics that has recently garnered interest in computer science — provides the framework for solving such problems involving self-interested parties. The theory addresses incentive compatibility: rational agents (self-interested, utility-maximizing) are provided incentives which induce a behavior that maximizes the social welfare. Of interest are the strategyproof mechanisms. Each participant in a mechanism is characterized by private parameters. A strategyproof mechanism will result in a participant maximizing its utility if it truthfully reports its private parameters and follows the specified algorithm.

In our previous work [13], we showed how DLT can be augmented with incentives. We designed the strategyproof DLS-BL mechanism for scheduling divisible loads in bus networks. DLS-BL provides incentives to the processors to participate and to report their processing capacity to the centralized, trusted scheduler. The agents maximize their welfare by truthful reporting their values to the mechanism and executing their assignments as reported.

In this paper we look to augment DLT with incentives for tree networks, where the network comprises strategic processors. In this model, the load is distributed from the root of the tree downward through intermediate processors until all processors are assigned load. We propose the strategyproof mechanism DLS-TL to optimally distribute load among the processors in a tree network. The mechanism provides incentives to the processors to participate and report their full processing capacity. The DLS-TL is an example of autonomous node mechanism, where the agents (i.e., the processors) have control over both the inputs to
the algorithm and the algorithm itself. The self-interested processors will implement a different algorithm if it is beneficial to do such. To combat this scenario, processors are provided incentives to report algorithm deviants. Penalties abate processors’ willingness to deviate.

**Related work.** The divisible load scheduling problem was studied extensively in recent years resulting in a cohesive theory called Divisible Load Theory (DLT). A reference book on DLT is [6]. Two recent surveys on DLT are [7] and [22]. This theory has been used for scheduling loads on heterogeneous distributed systems in the context of different applications such as image processing [15], databases [8], linear algebra [9], and multimedia broadcasting [5]. Scheduling divisible loads in grids has been investigated in [25]. New results and open research problems in DLT are presented in [3]. All these works assumed that the participants in the load scheduling algorithms are obedient and follow the algorithm. Recently, several researchers considered the mechanism design theory to solve several computational problems that involve self-interested participants. These problems include resource allocation and task scheduling [19, 23, 24], routing [10] and multicast transmission [11]. In their seminal paper, Nisan and Ronen [20] considered for the first time the mechanism design problem in a computational setting. They proposed and studied a VCG (Vickrey-Clarke-Groves) type mechanism for the shortest path in graphs where edges belong to self-interested agents. They also provided a mechanism for solving the task scheduling on unrelated machines problem. A general framework for designing strategyproof mechanisms for one parameter agent was proposed by Archer and Tardos [1]. They developed a general method to design strategyproof mechanisms for optimization problems that have general objective functions and restricted form for valuations. In a subsequent paper [2] the same authors investigated the frugality of shortest path mechanisms. Grosu and Chronopoulos [14] derived a strategyproof mechanism that gives the overall optimal solution for the static load balancing problem in distributed systems. A strategyproof mechanism with verification combining incentives and DLT was proposed by Grosu and Carroll [13]. The results and the challenges of designing distributed mechanisms are surveyed in [12]. The strategyproof computing paradigm proposed in [17] considers the self-interest and incentives of participants in distributed computing systems. Ng et al. [18] proposed a strategyproof system for dynamic resource allocation in data staging. Mitchell and Teague [16] extended the distributed mechanism in [11] devising a new model where the agents themselves implement the mechanism, thus allowing them to deviate from the algorithm.

**Our contributions.** The main contribution of this paper is the design of a strategyproof mechanism with verification for scheduling divisible loads in tree networks assuming a linear cost model for the processors. We define the mechanism and prove its properties.

**Organization.** The paper is structured as follows. In Section 2 we present a description of the divisible load scheduling problem in the context of tree networks. In Section 3 we discuss the mechanism design foundations. In Section 4 we present our proposed mechanism. In Section 5 we prove the mechanism’s properties. In Section 6 we draw conclusions and present future directions.

### 2. Divisible Load Scheduling Problem

We first consider a distributed system comprising $m + 1$ processors interconnected in a single-level tree network. The network is composed of a set of terminal processors, $(P_1, \ldots, P_m)$, and their parent, $P_0$. Processor $P_i$ is characterized by $w_i$, which is the time it takes to process a unit load. The processor is assigned $\alpha_i$ units of load and it takes time $\alpha_i w_i$ to compute the assignment. This corresponds to a linear cost model. The parent $P_0$ is the load-originating processor that is responsible for distributing the load to the children. We assume that the processors have front-ends that allow simultaneous communication and processing and that the root processor can communicate with only one terminal processor at a given time (i.e., we assume the one-port model). Processor $P_i$ transmits $\alpha_j$ units of load to $P_j$ in time $\alpha_i z_j$, where $z_j$ is the time it takes to communicate a unit load from $P_0$ to child $P_j$. We denote by $\alpha = (\alpha_0, \ldots, \alpha_m)$ the vector of load allocations. Processor $P_i$ finishes its assignment in time $T_i(\alpha)$, which is the total time to receive (if $i \neq 0$) and process its assignment. The execution on a single-level tree is depicted in Figure 1.

The scheduling problem denoted as SINGLE-LEVEL TREE-LINEAR is to determine the optimal load allocation $\alpha$ which minimizes the total execution time $T(\alpha) = \max(T_0(\alpha), \ldots, T_m(\alpha))$. The finish time $T_i(\alpha)$ of processor $P_i$ is

$$T_i(\alpha) = \begin{cases} 0 & \text{if } \alpha_i = 0 \\ \alpha_i w_i & \text{if } i = 0 \text{ and } \alpha_i > 0 \\ \sum_{j=0}^{i-1} \alpha_j z_j + \alpha_i w_i & \text{if } i \neq 0 \text{ and } \alpha_i > 0. \end{cases} \quad (1)$$

The SINGLE-LEVEL TREE-LINEAR problem can be formalized as

$$\min_{\alpha} T(\alpha) \quad (2)$$

subject to constraints: (i) $\alpha_i \geq 0, i = 0, \ldots, m$, and (ii) $\sum_{i=0}^{m} \alpha_i = 1$

The following theorems proved in [6] characterize the optimal solution.
Theorem 2.1. (Sequencing) An optimal solution to the SINGLE-LEVEL TREE-LINEAR problem is obtained when \( z_1 \leq z_2 \leq \ldots \leq z_m \).

Theorem 2.2. (Participation) The optimal solution for the SINGLE-LEVEL TREE-LINEAR problem is obtained when all processors participate and they all finish executing their assigned load at the same time, i.e., \( T_0(\alpha) = T_1(\alpha) = \ldots = T_m(\alpha) \).

From Figure 1, we derive the following recursive equations for the optimal load allocation

\[
\begin{align*}
\alpha_0 w_0 &= \alpha_1 z_1 + \alpha_1 w_1 \\
\alpha_j w_j &= \alpha_{j+1} z_{j+1} + \alpha_{j+1} w_{j+1}, & j = 1, \ldots, m - 1.
\end{align*}
\]  

(3) \hspace{2cm} (4)

We can reduce the single-level tree to a single processor with equivalent processing capacity. Figure 2 illustrates the reduction. We compute the equivalent processing capacity \( w_{eq} \) by

\[
w_{eq} = \max(T_0(\alpha), \ldots, T_m(\alpha)).
\]  

(5)

Time \( T_i \) is the time \( P_i \) takes to complete its portion of the unit load. Since \( \sum_{j=0}^m \alpha_j = 1 \), the processing capacity of the tree is equal to the finish time of the slowest processor. If \( \alpha \) is optimal (i.e., \( \alpha \) resulted from (3) and (4)), then \( w_{eq} \) reduces to

\[
w_{eq} = T_i(\alpha), \quad i = 0, \ldots, m.
\]  

(6)

The subsequent algorithm solves the SINGLE-LEVEL TREE-LINEAR problem.

Algorithm 2.1. (SINGLE-LEVEL TREE-LINEAR)

Input: Processing capacities \( w_0, \ldots, w_m \);
Link capacities \( z_1, \ldots, z_m \) such that \( z_1 \leq \ldots \leq z_m \);

Output: Load allocations \( \alpha_0, \ldots, \alpha_m \);
Equivalent processing capacity \( w_{eq} \);
1. Compute \( \alpha_0, \ldots, \alpha_m \) by using equations (3) and (4);
2. Compute \( w_{eq} \) by using equation (6).

The algorithm is executed by \( P_0 \) when a new load needs processing. In order to compute \( \alpha \) and \( w_{eq} \), processor \( P_i \) (\( i = 1, \ldots, m \)) reports \( w_i \) to \( P_0 \). It is assumed that \( z_1, z_2, \ldots, z_m \) are known to \( P_0 \).

We now consider the problem of scheduling divisible loads in multi-level tree networks [6]. Figure 3 depicts an execution on a multi-level tree distributed system composed of eight processors. Notice that after non-terminal processors receive load, they simultaneously begin computing a portion and transmitting the remainder of it to its children. A tree network comprises processor set \( \{P_0, \ldots, P_p\} \) (with processing capacities \( w_0, \ldots, w_p \) and link capacities \( z_1, \ldots, z_p \), where \( P_0 \) is the root). In the rest of the paper we denote by \( S_i \) the single-level subtree with root \( P_i \). The load is distributed from top to bottom, passing through each level. The TREE-LINEAR scheduling is defined similarly as SINGLE-LEVEL TREE-LINEAR where we desire to optimally distribute load across the tree such that we minimize the total execution time. The following theorem characterizes the optimal allocations for trees.

Theorem 2.3. (Reduction) The optimal solution is obtained by traversing tree \( T \) from bottom to top, replacing single-level subtrees with single equivalent processors until \( T \) is reduced to one processor. For single-level subtree \( S_i \), we compute load distribution \( \alpha^i \) and \( w_{eq}^i \) by SINGLE-LEVEL TREE-LINEAR algorithm and replace \( S_i \) with a processor with equivalent processing capacity \( w_{eq}^i \).
The following algorithm solves the TREE-LINEAR problem.

**Algorithm 2.2. (TREE-LINEAR)**

**Input:** Processor capacities $w_0, \ldots, w_p$;
Link capacities $z_1, \ldots, z_p$;

**Output:** Load allocations $(\alpha_{00}', \ldots, \alpha_{0m_0}')$, $(\alpha_{i0}', \ldots, \alpha_{im_i}')$, where $(\alpha_{00}', \ldots, \alpha_{im_i}')$ is the allocation for single-level tree $S_i$;
1. For any other processor $i \neq 0$, do
2. $\alpha_{00}' \leftarrow 1$;
3. Until a single processor remains; do
4. Find a single-level subtree $S_j = (P_{0j}', \ldots, P_{m_j}')$, where $P_{0j}' = \bar{P}_j$;
5. $\alpha_{00}', \ldots, \alpha_{im_i}', w_{eq}' \leftarrow$ SINGLE-LEVEL TREE-LINEAR$(w_0', \ldots, w_{m_j}'; z_0', \ldots, z_{m_j}')$
6. Replace $S_j$ with a single processor with processor capacity $w_{eq}'$

In the above algorithm we set $\alpha_{00}' = 1$ for terminal processor $P_j$, which conveys the fact that $\bar{P}_j$ computes all the work that it receives. Note that $\alpha_{00}', \ldots, \alpha_{im_i}'$ are the load allocation fractions for single-level tree $S_j$ and not for the entire tree $\mathcal{T}$. The root $P_0'$ distributes load to processor $P_i'$ ($i = 1, \ldots, m_j$) according to these fractions. Let $\bar{P}_j$ be the root of the subtree which corresponds to equivalent processor $P_j'$. Processor $\bar{P}_j$ receives $\gamma_i$ fractions of the total work and sends $\alpha_{ij}'$ fractions to $\bar{P}_i$. Processor $\bar{P}_j$ receives $\gamma_i$ fractions of the total work given by

$$
\gamma_i = \begin{cases} 
1 & \text{if } i = 0, \\
\gamma_j \alpha_{ij}' & \text{if } i \neq 0.
\end{cases}
$$

(7)

At the start, the root processor $\bar{P}_0$ is the only processor with access to load and thus, $\gamma_0 = 1$. For any other processor $\bar{P}_i$ ($i \neq 0$), its load is dependent on the load received by
its parent. Out of γ fractions of total work, processor $\bar{P}_i$ computes $c_{\alpha_i}^{\gamma}$ fractions of it; thus, it computes $c_{\alpha_i}^{\gamma,\gamma}$ fractions of the total work.

In the above algorithm it is assumed that $P_i$ reports its true processing capacity to its parent. When the processors are owned and operated by disparate, autonomous organizations that are self-interested and welfare-maximizing, they will misreport their processing capacity or deviate from the algorithm in hope of generating increased profits. In the subsequent sections, we present a mechanism that provides incentives to the agents to report truthfully and that fine agents that deviate from the algorithm.

3. Mechanism Design Framework

In this section, we introduce the main concepts of mechanism design theory. We limit our discussion to mechanism design for one parameter agents. Each agent in this mechanism design problem is characterized by private data represented by a single real value. We define the problem in the following.

A mechanism design problem for one parameter agents is characterized by

(i) A finite set $L$ of allowed outputs. The output is a vector $\alpha(w) = (\alpha_1(w), \ldots, \alpha_m(w)) \in L$, computed according to the agents' bids, $w = (w_1, \ldots, w_m)$. Here, $w_i$ is the bid of agent $i$.

(ii) Each agent $i$ ($i = 1, \ldots, m$) has a privately known value $t_i$ called the true value and a publicly known parameter $\bar{w}_i \geq t_i$ called the actual value. The preferences of agent $i$ are given by a function called valuation $V_i(\alpha, \bar{w})$.

(iii) Each agent goal is to maximize its utility. The utility of agent $i$ is $U_i(w, \bar{w}) = Q_i(w, \bar{w}) + V_i(\alpha(w), \bar{w})$, where $Q_i$ is the payment handed by the mechanism to agent $i$ and $\bar{w}$ is the vector of execution values. The payments are handed to the agents after the mechanism learns $\bar{w}$.

(iv) The goal of the mechanism is to select an output $\alpha$ that optimizes a given cost function $g(w, \alpha)$.

Definition 3.1. (Mechanism with Verification) A mechanism with verification is characterized by two functions.

(i) The output function $\alpha(w) = (\alpha_1(w), \ldots, \alpha_m(w))$. The input to this function is the vector of agents' bids $w = (w_1, \ldots, w_m)$ and returns an output $\alpha \in L$.

(ii) The payment function $Q_i(w, \bar{w}) = (Q_1(w, \bar{w}), \ldots, Q_m(w, \bar{w}))$, where $Q_i(w, \bar{w})$ is the payment handed by the mechanism to agent $i$.

Note. In the rest of the paper, we denote by $w_{-i}$ the vector of bids excluding the bid of agent $i$. The vector $w$ is represented by $(w_{-i}, w_i)$.

The following defines an important property in that an agent will maximize its utility when $\bar{w}_i = w_i = t_i$ independent of the actions of the other agents.

Definition 3.2. (Strategy-proof Mechanism) A mechanism is called strategy-proof if for every agent $i$ of type $t_i$ and for every bids $w_{-i}$ of the other agents, the agent's utility is maximized when it declares its real type $t_i$ (i.e., truth-telling is a dominant strategy).

The next property guarantees non-negative utility for truthful agents. This is important as agents willfully participate in hope of profits.

Definition 3.3. (Voluntary Participation Mechanism) We say that a mechanism satisfies the voluntary participation condition if $U_i((w_{-i}, \bar{w}_i)) \geq 0$ for every agent $i$, true value $t_i$, and other agents' bids $w_{-i}$ (i.e., truthful agents never incur a loss).

There are two models for characterizing distributed mechanisms. They differ in the degree of control that the agents have. A mechanism is a tamper-proof mechanism if the agents control the inputs. In these types of mechanism, the agents can only specify its inputs and thus, the only method of cheating is altering its inputs. A more general model is the autonomous node model. A mechanism is an autonomous node mechanism if the agents control the inputs and the algorithm. An agent will implement an algorithm different from what is specified if it is beneficial to do so.

In our model, each processor $P_i$ is characterized by a valuation function $V_i$ which in this case is equal to the cost of processing a given load. A processor $P_i$ wants to maximize its utility $U_i$ which is the sum of its valuation $V_i$ and the payment $Q_i$ given to it. A processor $P_i$ is parametrized by true processing capacity $t_i$. It bids processing capacity $w_i$ to the mechanism; $w_i$ may be different than true capacity $t_i$. It may choose to process its assignment at a different speed than either its true capacity $t_i$ or bid capacity $w_i$. This is its actual processing capacity $\bar{w}_i$, where $\bar{w}_i \geq t_i$.

4. The Proposed Mechanism

We propose the DLS-TL mechanism for scheduling divisible loads in tree networks. The system model comprises a tree $T$ of $p + 1$ processors, where the root $P_0$ is obedient and processors $P_1, \ldots, P_p$ are strategic. Processor $P_0$ is special as it performs functions the ensure mechanism validity. We assume that the links are obedient and that the communication protocols are tamper-proof.

Notation. We use the following notations in this section.

- Tree $T$ has processor set $(P_0, \ldots, P_p)$; processor $P_0$ is the root of $T$. Tree $S_i$ is a single-level subtree of the reduced $T$ with root $P_0$ (with processing capacity $w_0^i$) and leaves $(P_1^i, \ldots, P_m^i)$ (with processing capacities $w_1^i, \ldots, w_m^i$ and link capacities $z_1 \leq \ldots \leq z_m$).
- Processor $P_0^i$ is an alias for $P_i$. Equivalent processor
$P^i_j$ ($i = 1, \ldots, m_j$) corresponds to the subtree rooted at $P_j$. Processor $P_k$ is the parent to processor $P_j$.

- Let $SK_j$ be the private key of processor $P_j$. The secure digital signature of message $m$ under $SK_j$ is $\text{sig}(m)$. The message $dsm_i(m) = (m, \text{sig}(m))$ is the digitally signed message $m$ under $SK_i$.

The description of DLS-TL mechanism follows. Informally, we assume the existence of a payment infrastructure and a public key infrastructure (PKI). We assume that all processors have a public cryptographic key set and that the public key from the set is registered with the PKI. Furthermore, we assume a processor recognizes its parent, grand-parent, siblings, and children and is capable of verifying their signatures.

A processor $P_i$ may process its assigned load at a different processing rate given by the actual processing capacity, where $\tilde{\tau}_i \geq \tau_i$. Thus, processor $P_i$ may process its assigned load at a slower rate than its true processing capacity. We cope with this situation by employing a strategyproof mechanism with verification. The goal of a strategyproof mechanism with verification is to give incentives to agents such that it is beneficial for them to report their values and process the assigned loads using their full processing capacity. Each processor $P_i$ is augmented with a tamper-proof meter that records $\tilde{\tau}_i$ and reports the value as $dsm_0(\tilde{\tau}_i)$.

**DLS-TL Mechanism**

**Phase I (Computing load allocation - bottom-up)** We apply the TREE-LINEAR algorithm with the following modifications to bid transmission. In single-level subtree $S_j$, processor $P^i_j$ transmits its digitally-signed bid $dsm_i(w^j_i)$ to parent $P^0_i$. We denote by $w^j_i = (w^j_{0_1}, \ldots, w^j_{0_m})$ the vector of bids and by $w^j_{dsm_i}$ the vector of signed bids. Processor $P^i_j$ verifies the authenticity of the received messages and it will terminate the protocol if it receives contradictory messages from source $P^0_i$. Processor $P^0_i$ submits the evidence to $P_0$, which attempts to substantiate the claim. If the claim is true, $P^0_i$ is fined $C$ and $P^0_i$ is rewarded $C$. If the claim is false, $P^0_i$ is fined $C$ and $P^0_i$ is rewarded $C$. Fine $C$ must be larger than any potential profits that are attainable by cheating. If $P^0_i$ receives an unverifiable message, the protocol is terminated and no rewards or fines are issued as either sender or recipient may have invalidated the message. We denote by $\alpha^j = (\alpha^j_{0_1}, \ldots, \alpha^j_{0_m})$ the vector of load fractions for subtree $S_j$. Processor $P^0_i$ computes $\alpha^j$ and $w^j_{eq}$ using the SINGLE-LEVEL TREE-LINEAR algorithm, and it then transmits $dsm_i(w^j_{eq})$ to its parent $P_k$. The correctness of the $w^j_{eq}$ will be verified in Phase II. All load allocations $\alpha^0, \ldots, \alpha^p$ are computed in this phase.

**Phase II (Computing load allocation - top-down)** We compute the allocation $\gamma_i$, which is the fraction of the total load that processor $P_i$ receives. We recursively undo the transformations performed in Phase I. The root $P^0_i$ sends message

$$G_i = (dsm_k(\gamma_j), dsm_j(\gamma_i),$$

$$dsm_{i}(w^j_{eq}), dsm_{j}(w^j_{0_1}, \ldots, w^j_{0_m}))$$

(8)

to child $P^1_i$, where $P_k$ is the parent of $P_j$. $\gamma_j$ is the fraction of total load transmitted to $P_j$ from $P_k$, and $\gamma_i$ is the fraction of total load transmitted to $P_i$ from $P_j$ as computed by (7). Processor $P^1_i$ verifies the various signatures and computations in $G_i$. Specifically, it verifies that $\sum_{j=0}^{m} w^j = w^j_{eq}$ and that $\gamma_i = \alpha^j \gamma_j$. Processor $P_i$ terminates the protocol if it cannot validate the computations, receives contradictory messages, or cannot validate the signatures. In the first two scenarios, $P_i$ submits a complaint to $P_0$. If the complaint is confirmed, $P_j$ is fined $C$ and $P_i$ is rewarded $C$. If the complaint is debunked, $P_i$ is fined $C$ and $P_j$ is rewarded $C$. If a signature is unverifiable, all parties experience zero utility.

**Phase III (Load distribution and execution)** The load is distributed to the processors from root to leaves. Beginning with processor $P_0$, processor $P^0_i$ distributes $\alpha^j$ fractions of load to $P^1_i$; processor $P^0_i$ computes $\alpha^j$ fractions itself. A processor begins computing its load as soon as it receives its entire assignment. Processor $P^0_i$ may attempt to increase its utility by decreasing its fraction ($\alpha^j < \alpha^j_0$) and increasing its children fractions ($\alpha^j > \alpha^j_0$) such that $\sum_{j=0}^{m} \alpha^j = 1$. In this scenario, $\gamma_i > \gamma_j$, where $\gamma_i$ is the actual fraction of total work transmitted to processor $P_i$. At this point, processor $P_i$ has no choice but to accept the greater load. When computing completes, processor $P_i$ notifies $P_0$ about $\gamma_i > \gamma_j$. We assume that the data is embedded with a device $L_i$ such that $L_i$ permits processor $P_i$ to prove it has received $\gamma_i \leq \gamma_i$ fractions of load, i.e., a processor cannot show more work than it has received. Processor $P_0$ substantiates the claim by verifying that $\gamma_i > \gamma_j$ and, if the claim is false, $P_i$ is fined $C$. If the claim is confirmed, processor $P_0$ begins traversing the predecessors of $P_i$. Let processors $P_i$ and $P_j$ be predecessors of $P_i$ such that $P_x$ is a child of $P_y$; furthermore,
let $\tilde{P}_i$ be the root of equivalent processor $P_i^e$. Processor $P_0$ fines $C$ to all processors $P_i$ where $\tilde{\gamma}_i > \alpha_i \gamma_i$. The fines are pooled and rewarded to $\tilde{P}_i$. A processor may be fined multiple times depending on the number of processors impacted by its transgression.

**Phase IV (Payment computation)** The payment for processor $\tilde{P}_i$ (with parent $\tilde{P}_j$) is computed by $\tilde{P}_i$. In the following, $\alpha(\mathbf{w}^i) = (\alpha_{0i}, \ldots, \alpha^m_{0i})$ is the allocations for single-level subtree $S_i$ computed from bids $\mathbf{w} = (w^i_0, \ldots, w^i_m)$; $\mathbf{w}^j = (w^j_0, \ldots, w^j_m)$ are the bids for $S_j$; $\alpha^0_i$ is the actual load allocation for $\tilde{P}_i$; $\tilde{w}_i$ is the actual processing capacity for $\tilde{P}_i$; and $\tilde{\gamma}_i$ is the actual fraction of total work received by $\tilde{P}_i$. The goal of processor $\tilde{P}_i$ is to maximize its utility. The utility of $\tilde{P}_i$ is

$$ U_i(\alpha(\mathbf{w}^i), \mathbf{w}^i, \alpha_0^i, \tilde{w}_i, \tilde{\gamma}_i) = V_i(\alpha^0_i, \tilde{w}_i, \tilde{\gamma}_i) + Q_i(\alpha(\mathbf{w}^i), \mathbf{w}^i, \tilde{w}_i, \tilde{\gamma}_i), $$

where $V_i$ is the valuation function and $Q_i$ is the payment function. The valuation function $V_i$ is

$$ V_i(\alpha^0_i, \tilde{w}_i, \tilde{\gamma}_i) = -\gamma_i \alpha^0_i \tilde{w}_i $$

The payment function $Q_i$ is

$$ Q_i(\alpha(\mathbf{w}^i), \mathbf{w}^i, \tilde{w}_i, \tilde{\gamma}_i) = \begin{cases} 0 & \text{if } \tilde{\gamma}_i = 0, \\ C_i(\alpha(\mathbf{w}^i), \tilde{w}_i, \tilde{\gamma}_i) + B_i(\alpha(\mathbf{w}^i), \mathbf{w}^i, \tilde{w}_i) & \text{if } \tilde{\gamma}_i > 0, \end{cases} $$

where

$$ C_i(\alpha(\mathbf{w}^i), \tilde{w}_i, \tilde{\gamma}_i) = \tilde{\gamma}_i \alpha^0_i \tilde{w}_i $$

is the compensation function and

$$ B_i(\alpha(\mathbf{w}^i), \mathbf{w}^i, \tilde{w}_i) \equiv w^i_{eq}(\alpha(\mathbf{w}^i_{-j}), \mathbf{w}^i_{-j}) - w^i_{eq}(\alpha(\mathbf{w}^i_{-j}, \tilde{w}_i)), (\mathbf{w}^i_{-j}, \tilde{w}_i) $$

is the work bonus function. The function $w^i_{eq}(\alpha(\mathbf{w}^i_{-j}), \mathbf{w}^i_{-j})$ is the optimal equivalent processing capacity of single-level subtree $S_j$ without subtree $S_i$. The function $w^i_{eq}(\alpha((\mathbf{w}^i_{-j}, \tilde{w}_i)), (\mathbf{w}^i_{-j}, \tilde{w}_i))$ is the equivalent processing capacity adjusted for the actual processing capacity of $\tilde{P}_i$, where

$$ \tilde{w}_i = \begin{cases} \tilde{w}_i & \text{if } \tilde{P}_i \text{ is a terminal processor in tree } T, \\ \tilde{w}_i \alpha^0_i \tilde{\gamma}_i & \text{if } \tilde{w}_i \geq \tilde{w}_i, \\ \tilde{w}_i & \text{if } \tilde{w}_i < \tilde{w}_i. \end{cases} $$

In the above, we adjust the performance of subtree $S_i$ for the actual performance of processor $\tilde{P}_i$. In the event that $\tilde{P}_i$ runs slower ($\tilde{\gamma}_i > \alpha_i \gamma_i$), the equivalent processing capacity of $S_i$, $w^i_{eq}$, is dominated by $\tilde{P}_i$’s performance; if $\tilde{P}_i$ runs faster ($\tilde{\gamma}_i < \alpha_i \gamma_i$), $w^i_{eq}$ remains unchanged from $w^i_i$. Processor $\tilde{P}_i$ saves

$$ \text{Proof}_i = (G_i, w^i_{ds}, ds_{0i}(\tilde{w}_i), ds_{mi}(\tilde{\gamma}_i), L_i) $$

as evidence of correctly computing the payment. It submits $Q_i$ to the payment infrastructure. With probability $q$ (where $0 < q \leq 1$), processor $\tilde{P}_i$ requests $\text{Proof}_i$ from $\tilde{P}_i$. If $\tilde{P}_i$ fails to provide a valid proof for $Q_i$, it receives fine $C/q$.

This concludes the description of the DLS-TL mechanism. The mechanism as described above is valid for selfish-but-agreeable agents but not selfish-and-annoying agents. A selfish-but-agreeable agent will deviate from the algorithm only if it strictly improves its welfare, while a selfish-and-annoying agent will only follow the algorithm if it is the only action that maximizes its welfare. In the case of DLS-TL, selfish-and-annoying processors will subvert the mechanism by performing undesirable actions (e.g., corrupting data, sending the same data set to multiple children, etc.) where their behavior is not constrained by incentives or penalties. If the load is associated with a problem where the solution can be verified (e.g., searches, factorizations), we can easily amend the mechanism to tolerate selfish-and-annoying processors. We begin by altering (11) to

$$ Q_i(\alpha(\mathbf{w}^i), \mathbf{w}^i, \tilde{w}_i, \tilde{\gamma}_i) = \begin{cases} 0 & \text{if } \tilde{\gamma}_i = 0, \\ C_i(\alpha(\mathbf{w}^i), \tilde{w}_i, \tilde{\gamma}_i) + B_i(\alpha(\mathbf{w}^i), \mathbf{w}^i, \tilde{w}_i) + S & \text{if } \tilde{\gamma}_i > 0, \end{cases} $$

where $S = 0$ if a solution is not found, and $s$ if a solution is found. The function $S$ is called the solution bonus. The bonus $s$ is a small, positive quantity that rewards agents for following the given algorithm. Selfish-and-annoying agents will not risk the loss of $s$; hence, they will not deviate.

**5. DLS-TL Properties**

In this section we study the properties of the DLS-TL mechanism. The first property we investigate is strategyproofness. To prove this, we first show that all the processors do not have incentives to deviate from the algorithm. Then we show that if the processors do not deviate (but they can still execute differently than they bid), then truthful processors receive the maximum profit. Combining the two, we prove that the mechanism is strategyproof.

**Lemma 5.1.** A selfish-but-agreeable processor will be fined for deviating from DLS-TL.
Proof. Let processor $\bar{P}_i$ be a selfish-but-agreeable agent and $\bar{P}_k$ be a child of $\bar{P}_i$. A selfish-but-agreeable agent will deviate if the action increases its utility. Processor $\bar{P}_i$ may deviate from the protocol by either (i) sending contradictory messages, (ii) incorrectly computing $w_{eq}^j$ or $\gamma_h$, (iii) decreasing its workload by increasing the workload of its children (i.e., $\gamma_h > \gamma_q$), (iv) requesting a payment greater than $Q_i$, (v) falsely accusing a processor of cheating. Processor $\bar{P}_i$ will not deviate in other fashions (e.g., corrupt data) because there is no benefit to do so. To combat deviation, incentives are provided to processors for monitoring one another. In case (i), the recipient will report $\bar{P}_i$ and obtain reward $C$; processor $\bar{P}_i$ will be fined $C$. Fine $C$ is a deterrent as it is greater than any profit attainable by cheating. In case (ii), the selfish-and-annoying agent does not have incentives to deviate. Processor $\bar{P}_i$ will be fined for and only for deviating. Therefore, processor $\bar{P}_i$ does not have incentives to deviate from DLS-TL.

Lemma 5.2. A processor receives a fine only if it has deviated from DLS-TL.

Proof. Processor $\bar{P}_i$ is fined for either deviating from the protocol or another processor $\bar{P}_j$ produces contradictory messages signed by $\bar{P}_i$. In the first case, $\bar{P}_i$ clearly deviates from DLS-TL. In the second case, $\bar{P}_i$ sends the messages either by successfully forging signatures or by possessing the private key $SK_i$. We assume that the forging of signatures is impossible. Processor $\bar{P}_i$ obtains $SK_i$ either by $\bar{P}_i$ sharing it or by stealing it from $\bar{P}_i$. It is a violation of the mechanism for a second party to possess $SK_i$. Thus, $\bar{P}_i$ is fined for protocol deviation.

Theorem 5.1. (Selfish-but-Agreeable Agent Compliance) A selfish-but-agreeable processor does not have incentives to deviate from DLS-TL.

Proof. From Lemma 5.1 and 5.2, a selfish-but-agreeable processor will be fined for and only for deviating. Therefore, the processor does not have incentives to deviate from DLS-TL.

Theorem 5.2. (Selfish-and-Annoying Agent Compliance) A selfish-and-annoying agent does not have incentives to deviate from DLS-TL if the solution bonus function is employed.

Proof. Let processor $\bar{P}_i$ be a selfish-and-annoying agent. Theorem 5.1 handles the cases in which deviation results in greater utility, i.e., $U_i' - U_i > 0$, where $U_i'$ is the utility of a deviating $\bar{P}_i$. Processor $\bar{P}_i$ will also deviate if it does not reduce its utility, i.e., $U_i' - U_i = 0$. Such actions include data corruption and sending the same data to different children. These actions reduce the possibility of obtaining a solution to the given problem and thus, reduce the possibility of receiving the solution bonus. Processor $\bar{P}_i$ is welfare maximizing and thus, will not choose to do such. Therefore, processor $\bar{P}_i$ does not have incentives to deviate from DLS-TL.

Lemma 5.3. The mechanism is strategyproof if the processors do not deviate from the algorithm.

Proof. Assume processor $\bar{P}_i$ is the parent of $\bar{P}_j$. The utility $U_i$ of processor $\bar{P}_j (i = 1, \ldots, p)$ is

$$U_i = V_i + Q_i = -\bar{\gamma}_i\bar{\alpha}_i\bar{w}_i + \bar{\gamma}_i\bar{\alpha}_i\bar{w}_i + w^j_{eq}(\alpha(w_{i,-j}), w_{i,-j}^j - w^j_{eq}(\alpha((w_{i,-j}, w_{i}^j)), (w_{i,-j}, \bar{w}_i)))$$

We assume that the processors do not deviate and thus abide by the computed load allocations; therefore, $\alpha_i = \alpha_i^0$. The utility $U_i$ is

$$U_i = w^j_{eq}(\alpha(w_{i,-j}), w_{i,-j}^j - w^j_{eq}(\alpha((w_{i,-j}, w_{i}^j)), (w_{i,-j}, \bar{w}_i))).$$

We consider two cases:

(i) $\bar{w}_i = t_i$, i.e., processor $\bar{P}_i$ computes the load at full capacity. Assume $\bar{P}_i$ to be a terminal processor in tree $T$. If $\bar{P}_i$ bids its true value $(w^j_{eq} = t_i)$, then its utility $U_i^{[e]}$ is

$$U_i^{[e]} = w^j_{eq}(\alpha(w_{i,-j}), w_{i,-j}^j) - w^j_{eq}(\alpha((w_{i,-j}, t_i)), (w_{i,-j}, t_i))$$

$$= w^j_{eq}(\alpha(w_{i,-j}), w_{i,-j}^j) - w^j_{eq}.$$  \hspace{1cm} (18)

If $\bar{P}_i$ bids lower $(w^j_{eq} < t_i)$, then its utility $U_i^{[l]}$ is

$$U_i^{[l]} = w^j_{eq}(\alpha(w_{i,-j}), w_{i,-j}) - w^j_{eq}(\alpha((w_{i,-j}, w^j_{eq})), (w_{i,-j}, t_i))$$

$$= w^j_{eq}(\alpha(w_{i,-j}), w_{i,-j}) - w^j_{eq}.$$  \hspace{1cm} (19)

We want to show $U_i^{[e]} \geq U_i^{[l]}$, which reduces to showing $w^j_{eq} \leq w^j_{eq}$. By the SINGLE-LEVEL TREE-LINEAR algorithm, we know that $\alpha((w_{i,-j}, t_i))$ is optimal. By bidding lower than the true value, $\bar{P}_i$ is assigned more load and the other processors are assigned less load. The greater load will increase the execution time of $\bar{P}_i$ and increase the equivalent processing capacity such that $w^j_{eq} \leq w^j_{eq}$. Therefore, $U_i^{[e]} \geq U_i^{[l]}$. The other possibility is that $\bar{P}_i$ bids higher $(w^j_{eq} > t_i)$. Its utility $U_i^{[h]}$ is

$$U_i^{[h]} = w^j_{eq}(\alpha(w_{i,-j}), w_{i,-j}) - w^j_{eq}(\alpha((w_{i,-j}, w^j_{eq})), (w_{i,-j}, t_i))$$

$$= w^j_{eq}(\alpha(w_{i,-j}), w_{i,-j}) - w^j_{eq}.$$  \hspace{1cm} (20)
Similar to above, we want to show \( U_j^{[e]} \geq U_i^{[h]} \). Bidding higher than the true value, results in reduced load to \( P_i \) and increased load to the other processors. Since \( \alpha((w_i, t_i)) \) is optimal, \( w_j^{[e]} \leq w_j^{[h]} \) and thus, \( U_j^{[e]} \geq U_i^{[h]} \).

We now assume \( P_i \) to be an interior processor of tree \( T \). If \( P_i \) bids its true value \( (w_i^j = t_i) \), then its utility \( U_i^{[j]} \) is

\[
U_i^{[j]} = w_j^{[e]}(\alpha(w_i^j), w_{-i}^j) - w_j^{[e]}(\alpha((w_i^j, w_i^{[j]})), (w_{-i}^j, \alpha_i^{[j]})) = w_j^{[e]}(\alpha(w_i^j), w_{-i}^j) - w_j^{[e]}(\alpha_i^{[j]}),
\]

where \( w_j^{[i]} \) is the processing capacity of \( S_i \). If \( P_i \) bids lower \( (w_i^j < t_i) \), then its utility \( U_i^{[j]} \) is

\[
U_i^{[j]} = w_j^{[e]}(\alpha(w_i^j), w_{-i}^j) - w_j^{[e]}(\alpha((w_i^j, w_i^{[j]})), (w_{-i}^j, \alpha_i^{[j]})) = w_j^{[e]}(\alpha(w_i^j), w_{-i}^j) - w_j^{[e]}(\alpha_i^{[j]}),
\]

where \( w_j^{[i]} \) is the processing capacity of \( S_i \) and \( \alpha_i^{[j]} \) is the fraction of load assigned to \( P_i \), both of which are computed with bids \( (w_i^j, w_i^{[j]}, ..., w_m^j) \). We know that \( \alpha((w_i^j, w_i^{[j]})) \) is the optimal allocation by the SINGLE-LEVEL TREE-LINEAR algorithm. By bidding lower, \( P_i \) is assigned more load, \( i.e. \), \( \alpha_i^{[j]} \geq \alpha_i^{[e]} \). The performance of the single-level subtree is constrained by \( P_i \). Thus, \( w_j^{[e]} \leq w_j^{[i]} \) which proves \( U_i^{[j]} \geq U_i^{[i]} \). Finally, if \( P_i \) bids higher \( (w_i^j \geq t_i) \), then its utility \( U_i^{[j]} \) is

\[
U_i^{[j]} = w_j^{[e]}(\alpha(w_i^j), w_{-i}^j) - w_j^{[e]}(\alpha((w_i^j, w_i^{[j]})), (w_{-i}^j, \alpha_i^{[j]})) = w_j^{[e]}(\alpha(w_i^j), w_{-i}^j) - w_j^{[e]}(\alpha_i^{[j]}).
\]

Another useful property is voluntary participation. When a mechanism satisfies the voluntary participation condition, a truthful processor will never obtain negative utility \( (i.e., U_i \geq 0) \).

**Lemma 5.4.** If the processors do not deviate from the protocol, the DLS-TL mechanism satisfies the voluntary participation condition.

**Proof.** The utility of processor \( P_i \) (with parent \( P_j \)) when it bids its true value is

\[
U_i = w_j^{[e]}(\alpha(w_i^j), w_{-i}^j) - w_j^{[e]}(\alpha((w_i^j, w_i^{[j]})), (w_{-i}^j, \alpha_i^{[j]})).
\]

If \( P_i \) is an intermediate processor, then \( w_i^j = \hat{w}_i = \alpha_i^{[j]} t_i \). If it is a terminal processor, then \( \hat{w}_i = t_i \). If we set \( \alpha_i^{[j]} = 1 \) for terminal processor \( P_i \) (as we did in Section 2), then we simplify (24) to

\[
U_i = w_j^{[e]}(\alpha(w_i^j), w_{-i}^j) - w_j^{[e]}(\alpha((w_i^j, \alpha_i^{[j]})), (w_{-i}^j, \alpha_i^{[j]})).
\]

The equivalent processing capacity \( w_j^{[e]}(\alpha(w_i^j), w_{-i}^j) \) is obtained by using all the processors except for \( P_i \) and its subtree \( S_i \). By Theorem 2.4, we know that the optimal equivalent processing capacity \( w_j^{[e]}(\alpha(\alpha_i^{[j]} t_i)), (w_{-i}^j, \alpha_i^{[j]})) \leq w_j^{[e]}(\alpha(w_i^j), w_{-i}^j) \). Therefore, the utility is \( U_i \geq 0 \).

**Theorem 5.4.** (Voluntary Participation) The DLS-TL mechanism satisfies the voluntary participation condition.

**Proof.** We construct the proof similar to Theorem 5.3. Lemma 5.4 states that the mechanism satisfies voluntary participation if no deviation occurs. By Theorems 5.1 and 5.2, we know that processors will not deviate. Hence, DLS-TL mechanism satisfies the voluntary participation condition.

Due to the lack of space, we quickly examine the communication complexity of DLS-TL. Disregarding the communication with \( P_i \) for algorithm enforcement and the distribution of load, we see that Phase II dominates communication in terms of message quantity and size. In that phase, the root of a single-level subtree must send a \( m + 1 \) sized message to each of its \( m \) children. The worst case is then a single-level tree with \( n \) processors in which the root must transmit an \( n \)-sized message to each of its \( n - 1 \) children; thus, the communication complexity in the worst-case is \( \Theta(n^2) \).
6. Conclusion

In this paper we proposed the strategyproof mechanism DLS-TL for scheduling divisible loads in tree networks. In a tree network, the load originates at the root and it is dispersed through intermediate processors until the load is distributed to all. Through the use of incentives, processors report and process their assignments at full capacity. Incentives are also provided for reporting algorithm deviation. A processor will readily report a deviation in order to receive a reward. The fine for deviation is greater than any profits attainable by cheating, which will dissuade processors from attempting it. In a final note, DLS-TL satisfies the voluntary participation condition. All truthful, non-deviating processors will obtain non-negative utility.

We are planning to continue our investigation into the augmentation of DLT with incentives. By examining various influences such as network architectures and the rational behavior of links, we hope to achieve a cohesive theory combining DLT with incentives.

Acknowledgment. This research was supported in part by Wayne State University Faculty Research Grant.

References