As the technology progresses, interconnect delays have become bottlenecks of chip performance. Three dimensional (3D) integrated circuits are proposed as one way to address this p...
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representati...
David E. Culler, Richard M. Karp, David A. Patters...
Strassen’s matrix multiplication (MM) has benefits with respect to any (highly tuned) implementations of MM because Strassen’s reduces the total number of operations. Strasse...
Memory system bottlenecks limit performance for many applications, and computations with strided access patterns are among the hardest hit. The streams used in such applications h...
In this paper, we study the effectiveness of the multilevel paradigm in considerably reducing the diagnosis latency of distributed algorithms for fault detection in networks with ...