To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of...
Bilel Hadri, Hatem Ltaief, Emmanuel Agullo, Jack D...
: The exposed memory hierarchies employed in many network processors (NPs) are expensive in terms of meeting the worst-case processing requirement. Moreover, it is difficult to ef...
Growing demand for high performance in embedded systems is creating new opportunities for Instruction-Level Parallelism ILP techniques that are traditionally used in high perform...
Daniel A. Connors, Jean-Michel Puiatti, David I. A...
For high-data-rate applications, turbo-like iterative decoders are implemented with parallel hardware architecture. However, to achieve high throughput, concurrent accesses to each...
Awais Sani, Philippe Coussy, Cyrille Chavet, Eric ...
This paper presents an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information ...
Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. P...