Sciweavers

449 search results - page 7 / 90
» Optimizing the use of GPU memory in applications with large ...
Sort
View
SI3D
2010
ACM
14 years 2 months ago
Parallel Banding Algorithm to compute exact distance transform with the GPU
We propose a Parallel Banding Algorithm (PBA) on the GPU to compute the exact Euclidean Distance Transform (EDT) for a binary image in 2D and higher dimensions. Partitioning the i...
Thanh-Tung Cao, Ke Tang, Anis Mohamed, Tiow Seng T...
ISCA
1993
IEEE
112views Hardware» more  ISCA 1993»
13 years 11 months ago
Working Sets, Cache Sizes, and Node Granularity Issues for Large-Scale Multiprocessors
The distribution of resources among processors, memory and caches is a crucial question faced by designers of large-scale parallel machines. If a machine is to solve problems with...
Edward Rothberg, Jaswinder Pal Singh, Anoop Gupta
ICCAD
2009
IEEE
118views Hardware» more  ICCAD 2009»
13 years 5 months ago
Memory organization and data layout for instruction set extensions with architecturally visible storage
Present application specific embedded systems tend to choose instruction set extensions (ISEs) based on limitations imposed by the available data bandwidth to custom functional un...
Panagiotis Athanasopoulos, Philip Brisk, Yusuf Leb...
ASPDAC
2000
ACM
120views Hardware» more  ASPDAC 2000»
13 years 11 months ago
Data memory minimization by sharing large size buffers
- This paper presents software synthesis techniques to deal with non-primitive data type from graphical dataflow programs based on the synchronous dataflow (SDF) model. Non-primiti...
Hyunok Oh, Soonhoi Ha
PLDI
1995
ACM
13 years 11 months ago
Unifying Data and Control Transformations for Distributed Shared Memory Machines
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Contr...
Michal Cierniak, Wei Li