Technical Program

Paper Detail

Paper Title Rateless Codes for Distributed Computations with Sparse Compressed Matrices
Paper IdentifierFR3.R2.1
Authors Ankur Mallick, Gauri Joshi, Carnegie Mellon University, United States
Session Coding for Stragglers
Location Saint Germain, Level 3
Session Time Friday, 12 July, 14:30 - 16:10
Presentation Time Friday, 12 July, 14:30 - 14:50
Manuscript  Click here to download the manuscript
Abstract Unpredictable slowdown of worker nodes, or node straggling, is a major bottleneck when performing large matrix computations such as matrix-vector multiplication in a distributed fashion. For sparse matrices, the problem is compounded by irregularities in the distribution of non-zero elements, which leads to an imbalance in the computation load at different nodes. To mitigate the effect of stragglers we use \emph{rateless} codes that generate redundant linear combinations of the matrix rows (or columns) and distribute them across workers. This coding scheme utilizes all partial work done by worker nodes, and eliminates tail latency. We also propose a balanced row-allocation strategy for allocating rows of a sparse matrix to workers that ensures that equal amount of non-zero matrix entries are assigned to each worker. The entire scheme is designed to work with compressed, memory-efficient formats like CSR/CSC that are used to store sparse matrices in practice. Theoretical analysis and simulations show that our balanced rateless coding strategy achieves significantly lower overall latency than conventional sparse matrix-vector multiplication strategies.