The development of efficient parallel out-of-core applications is often tedious, because of the need to explicitly manage the movement of data between files and data structures ...
The thesis of this research is that the task of exposing the parallelism in a given application should be left to the algorithm designer, who has intimate knowledge of the applica...
For a specified application, there is an opportunity to improve cache performance by smart choosing of index bits of a cache. A texture cache for texture mapping of 3D computer gr...
In this paper we present the design, implementation and evaluation of a runtime system based on collective I/O techniques for irregular applications. Its main goal is to provide pa...
We design and implement Mars, a MapReduce framework, on graphics processors (GPUs). MapReduce is a distributed programming framework originally proposed by Google for the ease of ...
Bingsheng He, Wenbin Fang, Qiong Luo, Naga K. Govi...