We recently proposed a new approach to parallelization that decomposes the time domain instead of the conventional space domain. This improves latency tolerance, and we demonstrat...
Increasing non-recurring engineering (NRE) and mask costs are making it harder to turn to hardwired Application-Specific Integrated Circuit (ASIC) solutions for high performance a...
OpenMP allows programmers to specify nested parallelism in parallel applications. In the case of scientific applications, parallel loops are the most important source of paralleli...
Chip multiprocessors (CMPs) are widely used for high-performance computing. Further, these CMPs are being configured hierarchically to compose a node in a cluster system....
Xingfu Wu, Valerie E. Taylor, Charles W. Lively, S...
Distributed applications, especially I/O-intensive ones, often access the storage subsystem non-sequentially (stride requests). Since such behaviors lower the ove...