Sciweavers

1652 search results - page 149 / 331
» A performance analysis of local synchronization
DAC
2003
ACM
16 years 5 months ago
Multilevel global placement with retiming
Multiple clock cycles are needed to cross the global interconnects for multi-gigahertz designs in nanometer technologies. For synchronous designs, this requires retiming and pipel...
Jason Cong, Xin Yuan
ASPLOS
2009
ACM
16 years 5 months ago
QR decomposition on GPUs
QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and upper triangular matrix R. Adaptive sys...
Andrew Kerr, Dan Campbell, Mark Richards
PPOPP
2006
ACM
15 years 10 months ago
High-performance IPv6 forwarding algorithm for multi-core and multithreaded network processor
IP forwarding is one of the main bottlenecks in Internet backbone routers, as it requires performing the longest-prefix match at 10 Gbps speed or higher. IPv6 forwarding further ex...
Xianghui Hu, Xinan Tang, Bei Hua
MICRO
2002
IEEE
15 years 9 months ago
Dynamic frequency and voltage control for a multiple clock domain microarchitecture
We describe the design, analysis, and performance of an on-line algorithm to dynamically control the frequency/voltage of a Multiple Clock Domain (MCD) microarchitecture. The MC...
Greg Semeraro, David H. Albonesi, Steve Dropsho, G...
IJHPCN
2006
15 years 4 months ago
Implications of application usage characteristics for collective communication offload
The performance of collective communication operations is known to have a significant impact on the scalability of some applications. Indeed, the global, synchronous nat...
Ron Brightwell, Sue Goudy, Arun Rodrigues, Keith D...