Sciweavers

244 search results - page 18 / 49
» Optimizing Loop Performance for Clustered VLIW Architectures
Sort
View
PLDI
1993
ACM
13 years 12 months ago
Global Optimizations for Parallelism and Locality on Scalable Parallel Machines
Data locality is critical to achievinghigh performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus ...
Jennifer-Ann M. Anderson, Monica S. Lam
ISCAS
2005
IEEE
191views Hardware» more  ISCAS 2005»
14 years 1 months ago
Behavioural modeling and simulation of a switched-current phase locked loop
Recent work has shown that the use of switched current methods can provide an effective route to implementation of analog IC functionality using a standard digital CMOS process. Fu...
Peter R. Wilson, Reuben Wilcock
ISCAPDCS
2001
13 years 9 months ago
Branch Prediction of Conditional Nested Loops through an Address Queue
-Multi-dimensional applications, such as image processing and seismic analysis, usually require the optimized performance obtained from instruction-level parallelism. The critical ...
Zhigang Jin, Nelson L. Passos, Virgil Andronache
CASES
2006
ACM
13 years 11 months ago
Efficient architectures through application clustering and architectural heterogeneity
Customizing architectures for particular applications is a promising approach to yield highly energy-efficient designs for embedded systems. This work explores the benefits of arc...
Lukasz Strozek, David Brooks
CLUSTER
2003
IEEE
14 years 1 months ago
Improving the Performance of MPI Derived Datatypes by Optimizing Memory-Access Cost
The MPI Standard supports derived datatypes, which allow users to describe noncontiguous memory layout and communicate noncontiguous data with a single communication function. Thi...
Surendra Byna, William D. Gropp, Xian-He Sun, Raje...