Sciweavers

244 search results - page 19 / 49
» Optimizing Loop Performance for Clustered VLIW Architectures
ICPP
1998
IEEE
14 years 1 day ago
A memory-layout oriented run-time technique for locality optimization
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Yong Yan, Xiaodong Zhang, Zhao Zhang
MICRO
2005
IEEE
14 years 1 months ago
Exploiting Vector Parallelism in Software Pipelined Loops
An emerging trend in processor design is the addition of short vector instructions to general-purpose and embedded ISAs. Frequently, these extensions are employed using traditiona...
Samuel Larsen, Rodric M. Rabbah, Saman P. Amarasin...
ECIR
2004
Springer
13 years 9 months ago
Performance Analysis of Distributed Architectures to Index One Terabyte of Text
We simulate different architectures of a distributed Information Retrieval system on a very large Web collection, in order to work out the optimal setting for a particular set of r...
Fidel Cacheda, Vassilis Plachouras, Iadh Ounis
PROCEDIA
2011
12 years 10 months ago
10x10: A General-purpose Architectural Approach to Heterogeneity and Energy Efficiency
Two decades of microprocessor architecture driven by quantitative 90/10 optimization has delivered an extraordinary 1000-fold improvement in microprocessor performance, enabled by...
Andrew A. Chien, Allan Snavely, Mark Gahagan
MICRO
1998
IEEE
14 years 1 day ago
Effective Cluster Assignment for Modulo Scheduling
Clustering is one solution to the demand for wide-issue machines and fast clock cycles because it allows for smaller, less ported register files and simpler bypass logic while rema...
Erik Nystrom, Alexandre E. Eichenberger