Technical Program

Paper Detail

Paper Title Distributed Matrix Multiplication with MDS Array BP-XOR Codes for Scaling Clusters
Paper Identifier TH1.R2.4
Authors Suayb Arslan, MEF University, Turkey
Session Coded Computing II
Location Saint Germain, Level 3
Session Time Thursday, 11 July, 09:50 - 11:10
Presentation Time Thursday, 11 July, 10:50 - 11:10
Abstract This study presents a novel coded computation technique for distributed matrix-matrix multiplication at massive scale that outperforms well-known previous strategies in terms of total execution time. Our method achieves this performance by distributing the encoding operation over the cluster (slave) nodes at the expense of increased master-slave communication. The product is computed using MDS array Belief Propagation (BP)-decodable codes based on pure XOR operations. In addition, our scheme is configurable and suited to modern compute node architectures equipped with multiple processing units organized hierarchically. Assuming the number of backup nodes is sublinear in the size of the product, we demonstrate that the proposed scheme achieves order-optimal computation from an end-to-end latency perspective while keeping communication requirements within what today's high-speed network links can handle.
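The idea of workers encoding their own results with XOR, and the master decoding by BP (peeling), can be illustrated with a toy sketch. This is not the paper's MDS array BP-XOR construction: it assumes a simple (3,2) single-parity code, small integer matrices, and hypothetical helper names (`matmul`, `xor_blocks`, `worker`, `peel`) chosen only for illustration.

```python
from functools import reduce

def matmul(a, b):
    """Plain integer matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def xor_blocks(x, y):
    """Element-wise bitwise XOR of two equally sized integer matrices."""
    return [[p ^ q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def worker(task, a_blocks, b):
    """Compute every assigned sub-product, then XOR them locally.

    The encoding happens at the worker, after the multiplication.
    """
    products = [matmul(a_blocks[i], b) for i in task]
    return reduce(xor_blocks, products)

def peel(received, k):
    """Peeling (BP) decoder over XOR-combined results.

    `received` is a list of (task, value) pairs, where `task` lists the
    sub-product indices XORed into `value`. Returns a dict of the k
    recovered sub-products, or None if decoding stalls.
    """
    known = {}
    progress = True
    while len(known) < k and progress:
        progress = False
        for task, value in received:
            unknown = [i for i in task if i not in known]
            if len(unknown) == 1:
                # Cancel the already-known terms; what remains is the unknown one.
                for i in task:
                    if i in known:
                        value = xor_blocks(value, known[i])
                known[unknown[0]] = value
                progress = True
    return known if len(known) == k else None

# Assumed toy setup: A is split into two row blocks, B is shared by all workers.
a_blocks = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
b = [[1, 0], [2, 1]]
tasks = [(0,), (1,), (0, 1)]  # the third worker returns the XOR parity

# Suppose worker 0 straggles: only workers 1 and 2 respond.
received = [(tasks[j], worker(tasks[j], a_blocks, b)) for j in (1, 2)]
recovered = peel(received, k=2)
```

In this sketch any two of the three responses are enough to rebuild both sub-products, which is the MDS property for this small code. The cost is that the parity worker computes two sub-products instead of one, so some work is traded for straggler tolerance.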