Data locality is critical to achievinghigh performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus ...
Recent work has shown that the use of switched current methods can provide an effective route to implementation of analog IC functionality using a standard digital CMOS process. Fu...
-Multi-dimensional applications, such as image processing and seismic analysis, usually require the optimized performance obtained from instruction-level parallelism. The critical ...
Customizing architectures for particular applications is a promising approach to yield highly energy-efficient designs for embedded systems. This work explores the benefits of arc...
The MPI Standard supports derived datatypes, which allow users to describe noncontiguous memory layout and communicate noncontiguous data with a single communication function. Thi...
Surendra Byna, William D. Gropp, Xian-He Sun, Raje...