ICS 2009 | Sciweavers

ICS
2009
Tsinghua U.

Designing multi-socket systems using silicon photonics

15 years 10 months ago

Future single-board multi-socket systems may be unable to deliver the needed memory bandwidth electrically due to power limitations, which will hurt their ability to drive perform...

Scott Beamer, Krste Asanovic, Christopher Batten, ...

MPI-aware compiler optimizations for improving communication-computation overlap

15 years 10 months ago

Download www.eecis.udel.edu

Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as ...

Anthony Danalis, Lori L. Pollock, D. Martin Swany,...

Pattern-based sparse matrix representation for memory-efficient SMVM kernels

15 years 10 months ago

Download people.cs.vt.edu

Mehmet Belgin, Godmar Back, Calvin J. Ribbens

Fast and scalable list ranking on the GPU

15 years 10 months ago

Download researchweb.iiit.ac.in

General purpose programming on the graphics processing units (GPGPU) has received a lot of attention in the parallel computing community as it promises to offer the highest perfo...

M. Suhail Rehman, Kishore Kothapalli, P. J. Naraya...

Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs

15 years 10 months ago

Download www.cs.virginia.edu

Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture...

Jiayuan Meng, Kevin Skadron

Cancellation of loads that return zero using zero-value caches

15 years 10 months ago

Download www.ce.chalmers.se

The speed gap between processor and memory continues to limit performance. To address this problem, we explore the potential of eliminating Zero Loads—loads accessing memory loc...

Md. Mafijul Islam, Sally A. McKee, Per Stenströ...

High-performance regular expression scanning on the Cell/B.E. processor

15 years 10 months ago

Download domino.research.ibm.com

Matching regular expressions (regexps) is a very common workload. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in ever...

Daniele Paolo Scarpazza, Gregory F. Russell

Using many-core hardware to correlate radio astronomy signals

15 years 10 months ago

Download www.astron.nl

Rob van Nieuwpoort, John W. Romein

Parametric multi-level tiling of imperfectly nested loops

15 years 10 months ago

Download www.cse.ohio-state.edu

Tiling is a crucial loop transformation for generating high performance code on modern architectures. Efficient generation of multilevel tiled code is essential for maximizing da...

Albert Hartono, Muthu Manikandan Baskaran, Cé...

Combining thread level speculation helper threads and runahead execution

15 years 10 months ago

Download homepages.inf.ed.ac.uk

With the current trend toward multicore architectures, improved execution performance can no longer be obtained via traditional single-thread instruction level parallelism (ILP), ...

Polychronis Xekalakis, Nikolas Ioannou, Marcelo Ci...
