Single-particle 3D reconstruction from cryo-electron microscopy (cryo-EM) images is a kernel application of biological molecules analysis, as the computational requirement of whic...
New static source routing algorithms for High Performance Computing (HPC) are presented in this work. The target parallel architectures are based on the commonly used fattree netw...
Modern GPUs offer much computing power at a very modest cost. Even though CUDA and other related recent developments are accelerating the use of GPUs for general purpose applicati...
The L2 cache is commonly managed using LRU policy. For workloads that have a working set larger than L2 cache, LRU behaves poorly, resulting in a great number of less reused lines...
Molecular Dynamics applications enhance our understanding of biological phenomena through bio-molecular simulations. Large-scale parallelization of MD simulations is challenging b...
“Is transactional memory useful?” is the question that cannot be answered until we provide substantial applications that can evaluate its capabilities. While existing TM appli...
Vladimir Gajinov, Ferad Zyulkyarov, Osman S. Unsal...
This paper proposes DCC (Dynamic Cache Clustering), a novel distributed cache management scheme for large-scale chip multiprocessors. Using DCC, a per-core cache cluster is compri...
To sustain emerging data-intensive scientific applications, High Performance Computing (HPC) centers invest a notable fraction of their operating budget on a specialized fast sto...
Henry M. Monti, Ali Raza Butt, Sudharshan S. Vazhk...
In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state of the art high-level synthesis tool AutoPilot from AutoESL, to...