Sciweavers

901 search results - page 95 / 181
» Hiding Communication Latency in Data Parallel Applications
VLSID
2003
IEEE
14 years 8 months ago
Design of a 2D DCT/IDCT application specific VLIW processor supporting scaled and sub-sampled blocks
We present an innovative design of an accurate 2D DCT/IDCT processor, which handles scaled and sub-sampled input blocks efficiently. In the IDCT mode, the latency of the processo...
Rohini Krishnan, Om Prakash Gangwal, Jos T. J. van...
DAGM
2003
Springer
14 years 1 months ago
Domain Decomposition for Parallel Variational Optical Flow Computation
We present an approach to parallel variational optical flow computation by using an arbitrary partition of the image plane and iteratively solving related local variational proble...
Timo Kohlberger, Christoph Schnörr, André...
SIGCOMM
2009
ACM
14 years 2 months ago
Safe and effective fine-grained TCP retransmissions for datacenter communication
This paper presents a practical solution to a problem facing high-fan-in, high-bandwidth synchronized TCP workloads in datacenter Ethernets—the TCP incast problem. In these netw...
Vijay Vasudevan, Amar Phanishayee, Hiral Shah, Eli...
LCPC
2005
Springer
14 years 1 months ago
Optimizing Packet Accesses for a Domain Specific Language on Network Processors
Programming network processors remains a challenging task since their birth until recently when high-level programming environments for them are emerging. By employing domain speci...
Tao Liu, Xiao-Feng Li, Lixia Liu, Chengyong Wu, Ro...
IPPS
2007
IEEE
14 years 2 months ago
Load Miss Prediction - Exploiting Power Performance Trade-offs
Modern CPUs operate at GHz frequencies, but the latencies of memory accesses are still relatively large, in the order of hundreds of cycles. Deeper cache hierarchies with large...
Konrad Malkowski, Greg M. Link, Padma Raghavan, Ma...