Sciweavers

64 search results - page 11 / 13
» Optimal register assignment to loops for embedded code gener...
Sort
View
CF
2009
ACM
14 years 1 months ago
Mapping the LU decomposition on a many-core architecture: challenges and solutions
Recently, multi-core architectures with alternative memory subsystem designs have emerged. Instead of using hardwaremanaged cache hierarchies, they employ software-managed embedde...
Ioannis E. Venetis, Guang R. Gao
ASPLOS
2008
ACM
13 years 9 months ago
Communication optimizations for global multi-threaded instruction scheduling
The recent shift in the industry towards chip multiprocessor (CMP) designs has brought the need for multi-threaded applications to mainstream computing. As observed in several lim...
Guilherme Ottoni, David I. August
CODES
2004
IEEE
13 years 10 months ago
Operation tables for scheduling in the presence of incomplete bypassing
Register bypassing is a powerful and widely used feature in modern processors to eliminate certain data hazards. Although complete bypassing is ideal for performance, bypassing ha...
Aviral Shrivastava, Eugene Earlie, Nikil D. Dutt, ...
LCTRTS
2005
Springer
14 years 13 days ago
Cache aware optimization of stream programs
Effective use of the memory hierarchy is critical for achieving high performance on embedded systems. We focus on the class of streaming applications, which is increasingly preval...
Janis Sermulins, William Thies, Rodric M. Rabbah, ...
LCTRTS
2005
Springer
14 years 13 days ago
Complementing software pipelining with software thread integration
Software pipelining is a critical optimization for producing efficient code for VLIW/EPIC and superscalar processors in highperformance embedded applications such as digital sign...
Won So, Alexander G. Dean