ICS
2009
Tsinghua U.

High-performance CUDA kernel execution on FPGAs

15 years 11 months ago

In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL, to...

Alexandros Papakonstantinou, Karthik Gururaj, John...

Designing multi-socket systems using silicon photonics

15 years 11 months ago

Future single-board multi-socket systems may be unable to deliver the needed memory bandwidth electrically due to power limitations, which will hurt their ability to drive perform...

Scott Beamer, Krste Asanovic, Christopher Batten, ...

MPI-aware compiler optimizations for improving communication-computation overlap

15 years 11 months ago

Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as ...

Anthony Danalis, Lori L. Pollock, D. Martin Swany,...

Pattern-based sparse matrix representation for memory-efficient SMVM kernels

15 years 11 months ago

Mehmet Belgin, Godmar Back, Calvin J. Ribbens

Fast and scalable list ranking on the GPU

15 years 11 months ago

General-purpose programming on graphics processing units (GPGPU) has received a lot of attention in the parallel computing community as it promises to offer the highest perfo...

M. Suhail Rehman, Kishore Kothapalli, P. J. Naraya...

Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs

15 years 11 months ago

Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture...

Jiayuan Meng, Kevin Skadron

Cancellation of loads that return zero using zero-value caches

15 years 11 months ago

The speed gap between processor and memory continues to limit performance. To address this problem, we explore the potential of eliminating Zero Loads—loads accessing memory loc...

Md. Mafijul Islam, Sally A. McKee, Per Stenström...

High-performance regular expression scanning on the Cell/B.E. processor

15 years 11 months ago

Matching regular expressions (regexps) is a very common workload. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in ever...

Daniele Paolo Scarpazza, Gregory F. Russell

Using many-core hardware to correlate radio astronomy signals

15 years 11 months ago

Rob van Nieuwpoort, John W. Romein

Parametric multi-level tiling of imperfectly nested loops

15 years 11 months ago

Tiling is a crucial loop transformation for generating high-performance code on modern architectures. Efficient generation of multilevel tiled code is essential for maximizing da...

Albert Hartono, Muthu Manikandan Baskaran, Cé...
