Sciweavers

249 search results - page 35 / 50
» Design and Implementation of the NUMAchine Multiprocessor
ARC
2007
Springer
14 years 2 months ago
Systematic Customization of On-Chip Crossbar Interconnects
Abstract. In this paper, we present a systematic design and implementation of reconfigurable interconnects on demand. The proposed on-chip interconnection network provides identic...
Jae Young Hur, Todor Stefanov, Stephan Wong, Stama...
CODES
2006
IEEE
14 years 2 months ago
Automatic run-time extraction of communication graphs from multithreaded applications
Embedded system synthesis, multiprocessor synthesis, and thread assignment policy design all require detailed knowledge of the runtime communication patterns among different threa...
Ai-Hsin Liu, Robert P. Dick
MICRO
2005
IEEE
14 years 2 months ago
Automatic Thread Extraction with Decoupled Software Pipelining
Until recently, a steadily rising clock rate and other uniprocessor microarchitectural improvements could be relied upon to consistently deliver increasing performance for a wide ...
Guilherme Ottoni, Ram Rangan, Adam Stoler, David I...
PC
1998
13 years 8 months ago
BSPlib: The BSP programming library
BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPl...
Jonathan M. D. Hill, Bill McColl, Dan C. Stefanesc...
ICPP
2009
IEEE
13 years 6 months ago
Using Coherence Information and Decay Techniques to Optimize L2 Cache Leakage in CMPs
This paper evaluates several techniques to save leakage in CMP L2 caches by selectively switching off the less used lines. We primarily focus on private snoopy L2 caches. In this c...
Matteo Monchiero, Ramon Canal, Antonio Gonzá...