Search Sciweavers | Sciweavers

45 search results - page 2 / 9

Performance Evaluation of Tiling for the Register Level

CF 2009, ACM
178 views, Applied Computing

Mapping the LU decomposition on a many-core architecture: challenges and solutions

Download www.capsl.udel.edu

Recently, multi-core architectures with alternative memory subsystem designs have emerged. Instead of using hardware-managed cache hierarchies, they employ software-managed embedde...

Ioannis E. Venetis, Guang R. Gao

ICS 1995, Tsinghua U.
104 views, Distributed And Parallel Com...

Optimum Modulo Schedules for Minimum Register Requirements

Download domino.research.ibm.com

Modulo scheduling is an efficient technique for exploiting instruction-level parallelism in a variety of loops, resulting in high-performance code but increased register requirement...

Alexandre E. Eichenberger, Edward S. Davidson, San...

HPCA 2008, IEEE
174 views, Distributed And Parallel Com...

An OS-based alternative to full hardware coherence on tiled CMPs

Download www.inf.ed.ac.uk

The interconnect mechanisms (shared bus or crossbar) used in current chip-multiprocessors (CMPs) are expected to become a bottleneck that prevents these architectures from scaling...

Christian Fensch, Marcelo Cintra

MICRO 2000, IEEE
176 views, Hardware

An Advanced Optimizer for the IA-64 Architecture

Download www.info.uni-karlsruhe.de

...level of abstraction, compared with the program representation for scalar optimizations. For example, loop unrolling and loop unroll-and-jam transformations exploit the large regist...

Rakesh Krishnaiyer, Dattatraya Kulkarni, Daniel M....

EUROPAR 2010, Springer
189 views, Distributed And Parallel Com...

Optimized Dense Matrix Multiplication on a Many-Core Architecture

Download www.capsl.udel.edu

Abstract. Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), b...

Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guan...
