Sciweavers

309 search results - page 22 / 62
» Parallel Memory Architecture for Arbitrary Stride Accesses
Sort
View
IJPP
2010
156views more  IJPP 2010»
13 years 4 months ago
ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures
Exploiting the full computational power of current hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-uniform ar...
François Broquedis, Nathalie Furmento, Bric...
ICS
2005
Tsinghua U.
14 years 1 months ago
A heterogeneously segmented cache architecture for a packet forwarding engine
As network traffic continues to increase and with the requirement to process packets at line rates, high performance routers need to forward millions of packets every second. Eve...
Kaushik Rajan, Ramaswamy Govindarajan
SPAA
1995
ACM
13 years 11 months ago
Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors
For years, the computation rate of processors has been much faster than the access rate of memory banks, and this divergence in speeds has been constantly increasing in recent yea...
Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias,...
IPPS
2009
IEEE
14 years 2 months ago
Exploiting DMA to enable non-blocking execution in Decoupled Threaded Architecture
DTA (Decoupled Threaded Architecture) is designed to exploit fine/medium grained Thread Level Parallelism (TLP) by using a distributed hardware scheduling unit and relying on exi...
Roberto Giorgi, Zdravko Popovic, Nikola Puzovic
IEEEPACT
2002
IEEE
14 years 14 days ago
Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures
In this paper, we describe an algorithm and implementation of locality optimizations for architectures with instruction sets such as Intel’s SSE and Motorola’s AltiVec that su...
Jaewook Shin, Jacqueline Chame, Mary W. Hall