Sciweavers

FPGA
2016
ACM
A Study of Pointer-Chasing Performance on Shared-Memory Processor-FPGA Systems
The advent of FPGA acceleration platforms with direct coherent access to processor memory creates an opportunity for accelerating applications with irregular parallelism governed ...
Gabriel Weisz, Joseph Melber, Yu Wang, Kermin Flem...
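"Pointer chasing" describes traversals where the address of each memory access comes from the result of the previous access, as in walking a linked list, so the loads cannot be overlapped. The following minimal sketch is a generic illustration, not code from the paper. It assumes a hypothetical array-backed linked list (`next_idx`, with -1 marking the end) to make the serial dependency chain explicit:

```python
def chase(next_idx, values, head):
    """Sum the values along an array-backed linked list (hypothetical layout).

    Each step needs next_idx[node] before it knows which element to load
    next, so the loads form a serial chain that cannot be overlapped.
    """
    total = 0
    node = head
    while node != -1:              # -1 marks the end of the list
        total += values[node]
        node = next_idx[node]      # dependent load: address comes from the previous load
    return total


# List order: 2 -> 0 -> 3 -> 1
next_idx = [3, -1, 0, 1]
values = [10, 20, 30, 40]
print(chase(next_idx, values, head=2))  # 100
```

Because each load's address is only known once the previous load returns, traversal time is bounded by memory latency rather than bandwidth, which is why latency to shared memory matters for this class of workload.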
FPGA
2016
ACM
Automatically Optimizing the Latency, Area, and Accuracy of C Programs for High-Level Synthesis
Loops are pervasive in numerical programs, so high-level synthesis (HLS) tools use state-of-the-art scheduling techniques to pipeline them efficiently. Still, the run time perform...
Xitong Gao, John Wickerson, George A. Constantinid...
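For background on why loop pipelining matters for run time, a commonly used first-order model counts a pipelined loop as (N - 1) * II + D cycles, where N is the trip count, II the initiation interval, and D the latency of one iteration. The sketch below applies that model to a hypothetical 100-iteration loop. It is illustrative only and is not the optimization method of the paper; it ignores memory stalls and other real-tool effects.

```python
def pipelined_cycles(n_iters, ii, depth):
    """First-order cycle count for a pipelined loop.

    A new iteration is issued every `ii` cycles (the initiation interval),
    and each iteration takes `depth` cycles from issue to completion.
    """
    if n_iters <= 0:
        return 0
    return (n_iters - 1) * ii + depth


def unpipelined_cycles(n_iters, depth):
    """Cycle count when each iteration starts only after the previous one ends."""
    return max(n_iters, 0) * depth


# Hypothetical loop: 100 iterations, 10-cycle body.
print(unpipelined_cycles(100, 10))    # 1000
print(pipelined_cycles(100, 1, 10))   # 109: ideal II = 1
print(pipelined_cycles(100, 4, 10))   # 406: II = 4, e.g. limited by a 4-cycle loop-carried add
```

A loop-carried dependence, such as a floating-point accumulation, can force II above 1, and in this model that dominates the total for long loops.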
FPGA
2016
ACM
FPRESSO: Enabling Express Transistor-Level Exploration of FPGA Architectures
In theory, tools like VTR—a retargetable toolchain mapping circuits onto easily-described hypothetical FPGA architectures—could play a key role in the development of wildly in...
Grace Zgheib, Manana Lortkipanidze, Muhsen Owaida,...
FPGA
2016
ACM
CASK: Open-Source Custom Architectures for Sparse Kernels
Sparse matrix vector multiplication (SpMV) is an important kernel in many scientific applications. To improve the performance and applicability of FPGA based SpMV, we propose an ...
Paul Grigoras, Pavel Burovskiy, Wayne Luk
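To make the SpMV kernel concrete, here is a reference computation of y = A x with A stored in compressed sparse row (CSR) form. CSR is one common sparse format. The abstract does not say which formats CASK supports, so this sketch should not be read as its design. The indirect read `x[col_idx[k]]` is what makes the memory access pattern irregular.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for A in CSR form.

    row_ptr[i]:row_ptr[i+1] spans the nonzeros of row i in vals/col_idx.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]  # indirect, data-dependent access into x
        y[i] = acc
    return y


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 2, 0, 1]
vals = [1, 2, 3, 4, 5]
print(spmv_csr(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```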
FPGA
2016
ACM
A Case for Work-stealing on FPGAs with OpenCL Atomics
We provide a case study of work-stealing, a popular method for run-time load balancing, on FPGAs. Following the Cederman–Tsigas implementation for GPUs, we synchronize workitems...
Nadesh Ramanathan, John Wickerson, Felix Winterste...
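Work-stealing gives each worker its own double-ended queue of tasks. The owner takes work from one end, and an idle worker steals from the other end of some victim's queue. The sketch below is a sequential, round-robin simulation of that idea, written for illustration. It is not the Cederman–Tsigas algorithm and does not use OpenCL. On real hardware the steal must be made safe with atomic operations, which is the concern of the paper. Tasks are modeled as hypothetical `(name, children)` pairs.

```python
from collections import deque


def run_work_stealing(initial, n_workers):
    """Round-robin simulation of deque-based work-stealing.

    Each task is a (name, children) pair; running a task pushes its children
    onto the running worker's own deque. Owners pop from the right (newest);
    an idle worker steals from the left (oldest) of the first non-empty victim.
    Returns the list of task names each worker executed, in order.
    """
    queues = [deque() for _ in range(n_workers)]
    queues[0].extend(initial)  # all work starts on worker 0
    done = [[] for _ in range(n_workers)]
    while any(queues):
        for w in range(n_workers):
            q = queues[w]
            if not q:
                victim = next((v for v in range(n_workers) if v != w and queues[v]), None)
                if victim is None:
                    continue
                q.append(queues[victim].popleft())  # steal the oldest task
            name, children = q.pop()
            done[w].append(name)
            q.extend(children)
    return done


tree = ("root", [("a", []), ("b", []), ("c", [])])
print(run_work_stealing([tree], 2))  # [['root', 'c'], ['a', 'b']]
```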
FPGA
2016
ACM
Resolve: Generation of High-Performance Sorting Architectures from High-Level Synthesis
Field Programmable Gate Array (FPGA) implementations of sorting algorithms have proven to be efficient, but existing implementations lack portability and maintainability because t...
Janarbek Matai, Dustin Richmond, Dajung Lee, Zac B...
FPGA
2016
ACM
GPU-Accelerated High-Level Synthesis for Bitwidth Optimization of FPGA Datapaths
Bitwidth optimization of FPGA datapaths can save hardware resources by choosing the fewest number of bits required for each datapath variable to achieve a desired quality of resul...
Nachiket Kapre, Deheng Ye
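The abstract frames bitwidth optimization as picking the smallest width for each datapath variable that still meets a quality target. The following simplified illustration is not the paper's GPU-accelerated method. It assumes each variable has a known integer range and an absolute error budget, and computes the narrowest two's-complement integer part and the fewest fractional bits under round-to-nearest.

```python
def signed_int_bits(lo, hi):
    """Fewest bits n such that a two's-complement n-bit integer covers [lo, hi]."""
    n = 1
    while not (-(1 << (n - 1)) <= lo and hi <= (1 << (n - 1)) - 1):
        n += 1
    return n


def frac_bits_for_error(max_err):
    """Fewest fractional bits f whose round-to-nearest error 2**-(f+1) is <= max_err."""
    if max_err <= 0:
        raise ValueError("max_err must be positive")
    f = 0
    while 2.0 ** -(f + 1) > max_err:
        f += 1
    return f


# Hypothetical variable: values in [-5, 100], absolute error budget 0.01.
print(signed_int_bits(-5, 100))    # 8
print(frac_bits_for_error(0.01))   # 6
```

A real tool has to account for how rounding errors propagate through the whole datapath rather than bounding each variable in isolation, which is what makes the search expensive.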