This paper presents a task-centric memory model for 1000-core compute accelerators. Visual computing applications are emerging as an important class of workloads that can exploit ...
John H. Kelm, Daniel R. Johnson, Steven S. Lumetta...
Graphics processing units (GPUs) provide a low cost platform for accelerating high performance computations. The introduction of new programming languages, such as CUDA and OpenCL...
Amir Hormati, Mehrzad Samadi, Mark Woh, Trevor N. ...
An increasing number of distributed applications are being constructed by composing them out of existing applications. The resulting applications can be very complex in structure,...
Abstract— The success of classical high level synthesis has been limited by the complexity of the applications it can handle, typically not large enough to necessitate the depart...
This paper describes a proposal for a set of Parallel Basic Linear Algebra Subprograms PBLAS. The PBLAS are targeted at distributed vector-vector, matrix-vector and matrixmatrix...
Jaeyoung Choi, Jack Dongarra, Susan Ostrouchov, An...