In the era of multicores, many applications that tend to require substantial compute power and data crunching (aka Throughput Computing Applications) can now be run on desktop PCs...
Chi-Keung Luk, Ryan Newton, William Hasenplaugh, M...
Parallel bit stream algorithms exploit the SWAR (SIMD within a register) capabilities of commodity processors in high-performance text processing applications such as UTF8 to UTF-...
A generic and retargetable tool flow is presented that enables the export of timing data from software running on a cycle-accurate Virtual Prototype (VP) to a concurrent function...
Trevor Meyerowitz, Alberto L. Sangiovanni-Vincente...
Nowadays, key characteristics of a processor's instruction set are only exploited in high-level languages by using inline assembly or compiler intrinsics. Inserting intrinsic...
Modern embedded systems often require high degrees of instruction-level parallelism (ILP) within strict constraints on power consumption and chip cost. Unfortunately, a high-perfo...