Awards:
Best Paper: Approximating Bipartite Minimum Vertex Cover in the CONGEST model, by Salwa Faour and Fabian Kuhn.
Best Student Paper: Fast Byzantine SGD, by Amine Boussetta, El-Mahdi El-Mhamdi, Rachid Guerraoui, Alexandre Maurer, and Sébastien Rouault.
All times are in Central European Time.
In this work, we provide new upper and lower bounds for broadcasting in multi-channel radio networks, from the perspective of resource competitiveness. Our algorithms work for arbitrary $n,C$ values, require minimal prior knowledge, and can tolerate a powerful adaptive adversary. More specifically, in our algorithms, for large $T$ values, each node’s runtime is $O(T/C)$, and each node’s energy cost is $\tilde{O}(\sqrt{T/n})$. We also complement algorithmic results with lower bounds, proving both the time complexity and the energy complexity of our algorithms are optimal or near-optimal (within a poly-log factor). Our technical contributions lie in using “epidemic broadcast” to achieve time efficiency and resource competitiveness, and employing coupling techniques in the analysis to handle the adaptivity of the adversary. At the lower bound side, we first derive a new energy complexity lower bound for 1-to-1 communication in the multi-channel setting, and then apply simulation and reduction arguments to obtain the desired result.
The problem we tackle in this paper is that of \emph{dynamic} reliable broadcast, i.e., enabling processes to join or leave the system. This property is desirable for long-lived applications (aiming to be highly available), yet has been precluded in previous asynchronous reliable broadcast protocols. We study this property in a general adversarial (i.e., Byzantine) environment.
We introduce the first specification of a dynamic Byzantine reliable broadcast (\name) primitive that is amenable to an asynchronous implementation. We then present an algorithm implementing this specification in an asynchronous network. Our \name algorithm ensures that if any correct process in the system broadcasts a message, then every correct process delivers that message unless it leaves the system. Moreover, if a correct process delivers a message, then every correct process that has not expressed its will to leave the system delivers that message. We assume that more than $2⁄3$ of processes in the system are correct at all times, which is tight in our context.
We also show that if only one process in the system can fail—and it can fail only by crashing—then it is impossible to implement a stronger primitive, ensuring that if any correct process in the system broadcasts (resp., delivers) a message, then every correct process in the system delivers that message—including those that leave.
This paper investigates the achievability of broadcast in general networks, i.e., networks where only some subsets of minicast channels may be available, thereby addressing open problems posed by Jaffe et al. [PODC’12], Raykov [ICALP’15]. To that end, we propose a hierarchy over all possible general adversaries, and identify for each class of general adversaries 1) a set of minicast channels that are necessary to achieve broadcast and 2) a set of minicast channels that are sufficient to achieve broadcast. In particular, this allows us to derive bounds on the amount of $b$-minicasts that are necessary and that suffice towards constructing broadcast in general $b$-minicast networks.
This paper presents local algorithms which ensure a maximally resilient path restoration for a large family of product graphs, including the widely used tori and generalized hypercube topologies. Our algorithms provably ensure that even under multiple link failures, traffic is rerouted to the other endpoint of every failed link whenever possible (i.e. detouring failed links), enforcing waypoints and hence accounting for the network policy. The algorithms are particularly well-suited for emerging segment routing networks based on label stacks.
These assumptions, however, are unsuitable for cross-domain applications, including blockchains, where not all acceptors are equally trustworthy, and not all learners have the same assumptions and priorities. We present the first consensus algorithm to be heterogeneous in all three respects. Learners set their own mixed failure tolerances over differently trusted sets of acceptors. We express these assumptions in a novel Learner Graph, and demonstrate sufficient conditions for consensus.
We present Heterogeneous Paxos: an extension of Byzantine Paxos. Heterogeneous Paxos achieves consensus for any viable Learner Graph in best-case three message sends, which is optimal. We present a proof-of-concept implementation, and demonstrate how tailoring for heterogeneous scenarios can save resources and latency.
-Multi-threshold reliable broadcast is possible if and only if $\max{t_c,t_v} + 2t_t < n$. -Multi-threshold almost-surely consensus is possible if $\max{t_c, t_v} + 2t_t < n$, $2t_v + t_t < n$ and $t_t < n/3$. Assuming a global coin, it is possible if and only if $\max{t_c, t_v} + 2t_t < n$ and $2t_v + t_t < n$. -Multi-threshold $(1-\epsilon)$-consensus is possible if and only if $\max{t_c, t_v} + 2t_t < n$ and $2t_v + t_t < n$.
Inspired by the CGC filter developed by Gupta and Vaidya, PODC 2020 \cite{vaidya-2f-redundancy}, we propose a gradient descent-based algorithm, Echo-CGC. Our main novelty is a mechanism to utilize the \textit{broadcast properties} of the radio network to avoid transmitting the raw gradients (full $d$-dimensional vectors). In the radio network, each worker can overhear previous gradients that were transmitted to the parameter server.
Roughly speaking, in Echo-CGC, %each worker will compute its local stochastic gradient, and if it
if a worker agrees'' with a combination of prior gradients, it will broadcast the
echo message” instead of the its raw local gradient. The echo message contains a vector of coefficients (of size at most $n$) and the ratio of the magnitude between two gradients (a float). In comparison, the traditional approaches need to send $n$ local gradients in each round, where each gradient is typically a vector in a ultra-high dimensional space ($d \gg n$).
%Each worker $i$ computes the angle between the received gradients in the current round and its own gradient (on the local cost function). If there exists a linear combination of previous gradients broadcast by other workers whose angle is less than a certain bound $\alpha$, then worker $i$ broadcasts the ``echo” message, which contains a vector of coefficients (of size at most $n$) and the ratio of the magnitude between two gradients (a float). Compared to the traditional approach that sends the entire raw gradient – which is typically a vector in a ultra-high dimensional space – the echo message contains only a few bytes \qinzi{Is this still true with the Span method?} \lewis{No, we can say it’s in the order of $n$ and will be beneficial when $n << d$}; hence, Echo-CGC effectively reduces the communication complexity.
The improvement on communication complexity of our algorithm depends on multiple factors, including number of nodes, number of faulty workers in an execution, and the cost function. We numerically analyze the improvement, and show that with a large number of nodes, Echo-CGC reduces 80\% of the communication under suitable assumptions.
Many alternative \emph{gradient aggregation rules} (GARs) have recently been proposed to tolerate a maximum number $f$ of Byzantine workers. These GARs differ according to (1) the complexity of their computation time, (2) the maximal number of Byzantine workers despite which convergence can still be ensured (breakdown point), and (3) their accuracy, which can be captured by (3.1) their angular error, namely the angle with the true gradient, as well as (3.2) their ability to aggregate full gradients. In particular, many are not full gradients for they operate on each dimension separately, which results in a coordinate-wise blended gradient, leading to low accuracy in practical situations where the number ($s$) of workers that are actually Byzantine in an execution is small ($s<<f$). Essentially, this is because they do not aggregate full gradients.
We propose \textsc{Aksel}, a new scalable median-based GAR with an optimal time complexity ($\mathcal{O}(n)$),
an optimal breakdown point ($n > 2f$) and the lowest upper bound on the \textit{expected angular error} ($\mathcal{O}(1)$) among \textit{full gradient} approaches.
We also study the \textit{actual angular error} of \textsc{Aksel} when the gradient distribution is normal and show that it only grows in $\mathcal{O}(\log{n})$, which is the first logarithmic upper bound ever proven assuming an optimal breakdown point. We also report on an empirical evaluation of \textsc{Aksel}
on various classification tasks, which we compare to alternative GARs against state-of-the-art attacks. \textsc{Aksel} is the only GAR reaching top accuracy when there is actually none or few Byzantine workers while maintaining a good defense even under the extreme case ($s=f$).
For simplicity of presentation, we consider a scheme with a single server. However, as we explain in the paper, \textsc{Aksel} can also easily
be adapted to multi-server architectures.
ACE is aimed at boosting the prevailing approach for practical fault tolerance. This approach, often named \emph{partial synchrony}, is based on a leader-based paradigm: a good leader makes progress and a bad leader does no harm. The partial synchrony approach focuses on safety and forgoes liveness under targeted and dynamic attacks. Specifically, an attacker might block specific leaders, e.g., through a denial of service, to prevent progress. ACE provides boosting by running \emph{waves} of parallel leaders and selecting a \emph{winning} leader only retroactively, achieving boosting at a linear communication cost increase.
ACE is agnostic to the fault model, inheriting it s failure model from the wrapped solution assumptions. As our evaluation shows, an asynchronous Byzantine fault tolerance (BFT) replication system built with ACE around an existing partially synchronous BFT protocol demonstrates reasonable slow-down compared with the base BFT protocol during faultless synchronous scenarios, yet exhibits significant speedup while the system is under attack.
Based on models considered in recent protocols for populations with persistent-state agents (Dudek et al. 2017; Alistarh et al. 2017; Alistarh et al. 2020), we formalize a Catalytic Input (CI) model comprised of $n$ input agents and $m$ worker agents. For $m = \Theta(n)$, we show that computing the PARITY of the input population with high probability requires at least $\Omega(n^2)$ total interactions, demonstrating a strong separation between the CI and standard population models. On the other hand, we show that the third-state dynamics can be naturally adapted to this new model to solve approximate majority in $O(n \log n)$ total steps with high probability when the input margin is $\Omega(\sqrt{n \log n})$, which preserves the time and space efficiency of the process in the original model.
Moreover we show the robustness of third-state dynamics protocols to the transient leak faults considered by (Alistarh et al. 2017; Alistarh et al 2020). In both the original and CI models, these protocols successfully compute approximate majority with high probability in the presence of leaks occurring at each time step with probability $\beta \leq O\left(\frac{\sqrt{n \log n}}{n}\right)$. The resilience of these dynamics to adversarial leaks exhibits a subtle connection to previous results involving Byzantine agents.
The tasks for which we deduce such an algorithm are maximal matching, $(degree+1)$-coloring, 2-approximation for minimum weight vertex cover, and maximal independent set (which is the most subtle case). For some of these tasks, node insertions can also be among the allowed topology changes, and for some of them also abrupt node deletions.
Furthermore, we study bi-chromatic variants of these parameters defined on a graph whose vertices are colored either red or blue, and one focuses only on distances for pairs of vertices that are colored differently. Motivated by applications in computational geometry, bi-chromatic diameter, radius and eccentricities have been recently studied in the sequential setting [Backurs et al. STOC’18, Dalirrooyfard et al. ICALP’19]. We provide the first distributed upper and lower bounds for such problems.
Our technical contributions include introducing the notion of approximate pseudo-center, which extends the pseudo-centers of [Choudhury and Gold SODA’20], and presenting an efficient distributed algorithm for computing approximate pseudo-centers. On the lower bound side, our constructions introduce the usage of new functions into the framework of reductions from 2-party communication complexity to distributed algorithms.
We demonstrate the power of hybrid networks by presenting algorithms to compute Single-Source Shortest Paths and the diameter very efficiently in sparse graphs. Specifically, we present exact $O(\log n)$ time algorithms for cactus graphs (i.e., graphs in which each edge is contained in at most one cycle), and $3$-approximations for graphs that have at most $n + O(n^{1⁄3})$ edges and arboricity $O(\log n)$. For these graph classes, our algorithms provide exponentially faster solutions than the best known algorithms for general graphs in this model. Beyond shortest paths, we also provide a variety of useful tools and techniques for hybrid networks, which may be of independent interest.
The algorithms which showed these improvements in operation time tracked a complete execution history, storing complete object state at all n processes in the system, leading to n copies of every stored data element. In this paper, we consider the question of reducing the space complexity of linearizable implementations of shared data types, which provide intuitive behavior through strong consistency guarantees. We improve the existing algorithm for a relaxed queue, showing that it is possible to store only one copy of each element in a shared queue, while still having a low amortized time cost. This is one of several important steps towards making these data types practical in real world systems.
Our key building blocks are: * A $D$-process abortable RME algorithm, for $D \leq W$, with $O(1)$ passage complexity and $O(1+F)$ super-passage complexity. We obtain this algorithm by using the Fetch-And-Add (FAA) primitive, unlike prior work on RME that uses Fetch-And-Store (FAS/SWAP). * A generic transformation that transforms any abortable RME algorithm with passage complexity of $B < W$, into an abortable RME lock with passage complexity of $O(\min(K,B))$.