We revisit and use the dependence transformation method to generate parallel algorithms suitable for cluster and grid computing. We illustrate this method in two applications: to o...
Ulisses Kendi Hayashida, Kunio Okuda, Jairo Panett...
Message progression schemes that enable communication and computation to be overlapped have the potential to improve the performance of parallel applications. With curre...
In this contribution we introduce a low-complexity bit-parallel algorithm for computing square roots over binary extension fields. Our proposed method can be applied for any type ...
Most application-level fault tolerance schemes in the literature are non-adaptive in the sense that the fault tolerance schemes incorporated in applications are usually designed witho...
Zizhong Chen, Ming Yang, Guillermo A. Francia III,...
Multithreading has been proposed as an architectural strategy for tolerating latency in multiprocessors and, through limited empirical studies, shown to offer promise. This paper ...
Rafael H. Saavedra-Barrera, David E. Culler, Thors...