Sciweavers

1993 search results - page 357 / 399
» Designing and Building Parallel Program
Sort
View
CANPC
1999
Springer
14 years 2 months ago
Implementing Application-Specific Cache-Coherence Protocols in Configurable Hardware
Streamlining communication is key to achieving good performance in shared-memory parallel programs. While full hardware support for cache coherence generally offers the best perfo...
David Brooks, Margaret Martonosi
MICRO
2010
IEEE
153views Hardware» more  MICRO 2010»
13 years 8 months ago
Throughput-Effective On-Chip Networks for Manycore Accelerators
As the number of cores and threads in manycore compute accelerators such as Graphics Processing Units (GPU) increases, so does the importance of on-chip interconnection network des...
Ali Bakhoda, John Kim, Tor M. Aamodt
ICS
2009
Tsinghua U.
14 years 4 months ago
High-performance regular expression scanning on the Cell/B.E. processor
Matching regular expressions (regexps) is a very common workload. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in ever...
Daniele Paolo Scarpazza, Gregory F. Russell
IPPS
2009
IEEE
14 years 4 months ago
Implementing and evaluating multithreaded triad census algorithms on the Cray XMT
Commonly represented as directed graphs, social networks depict relationships and behaviors among social entities such as people, groups, and organizations. Social network analysi...
George Chin Jr., Andrès Márquez, Sut...
ICS
2001
Tsinghua U.
14 years 2 months ago
Slice-processors: an implementation of operation-based prediction
We describe the Slice Processor micro-architecture that implements a generalized operation-based prefetching mechanism. Operation-based prefetchers predict the series of operation...
Andreas Moshovos, Dionisios N. Pnevmatikatos, Amir...