The advent of FPGA acceleration platforms with direct coherent access to processor memory creates an opportunity for accelerating applications with irregular parallelism governed ...
Gabriel Weisz, Joseph Melber, Yu Wang, Kermin Flem...
Loops are pervasive in numerical programs, so high-level synthesis (HLS) tools use state-of-the-art scheduling techniques to pipeline them efficiently. Still, the run time perform...
Xitong Gao, John Wickerson, George A. Constantinid...
In theory, tools like VTR—a retargetable toolchain mapping circuits onto easily-described hypothetical FPGA architectures—could play a key role in the development of wildly in...
Sparse matrix-vector multiplication (SpMV) is an important kernel in many scientific applications. To improve the performance and applicability of FPGA-based SpMV, we propose an ...
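For reference, the SpMV kernel computes y = A·x where A is sparse. The sketch below is a plain software version over the common compressed sparse row (CSR) format. It is an illustrative assumption, not the storage format or FPGA architecture the abstract goes on to propose (that part is truncated). The function name `spmv_csr` and its parameters are hypothetical. The indirect read `x[col_idx[k]]` is the data-dependent access pattern that makes SpMV awkward to stream through hardware.

```python
from typing import List


def spmv_csr(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x, with A stored in CSR form (hypothetical helper).

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Walk only the stored nonzeros of row i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Irregular, data-dependent read of x selected by col_idx.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]] in CSR form:
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(spmv_csr(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

Only the nonzeros are stored and visited, so work scales with the nonzero count rather than with rows × columns. The row lengths vary, however, so per-row work is irregular.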
We provide a case study of work-stealing, a popular method for run-time load balancing, on FPGAs. Following the Cederman–Tsigas implementation for GPUs, we synchronize workitems...
Nadesh Ramanathan, John Wickerson, Felix Winterste...
Field Programmable Gate Array (FPGA) implementations of sorting algorithms have proven to be efficient, but existing implementations lack portability and maintainability because t...
Bitwidth optimization of FPGA datapaths can save hardware resources by choosing the smallest number of bits required for each datapath variable to achieve a desired quality of resul...
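To make the idea concrete, here is a minimal brute-force sketch for a single fixed-point variable. It assumes round-to-nearest quantization and takes "quality of result" to mean a maximum absolute error over a set of sample values. Both of these assumptions are illustrative and are not the optimization method the abstract describes (that part is truncated). The helpers `quantize`, `min_frac_bits`, and `int_bits` are hypothetical names.

```python
import math
from typing import Sequence


def quantize(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (assumed rounding mode)."""
    scale = 2 ** frac_bits
    return round(x * scale) / scale


def min_frac_bits(samples: Sequence[float], tol: float, max_bits: int = 32) -> int:
    """Smallest fractional bitwidth whose worst-case error on samples is <= tol.

    Hypothetical sample-driven search: try widths in increasing order and
    stop at the first one meeting the (assumed) error budget.
    """
    for b in range(max_bits + 1):
        worst = max(abs(quantize(v, b) - v) for v in samples)
        if worst <= tol:
            return b
    raise ValueError("tolerance not reachable within max_bits")


def int_bits(samples: Sequence[float]) -> int:
    """Integer bits (sign bit excluded) needed to cover the largest magnitude."""
    m = max(abs(v) for v in samples)
    return math.floor(math.log2(m)) + 1 if m >= 1 else 0


samples = [0.1, 0.7, 1.3]
f = min_frac_bits(samples, tol=0.01)
i = int_bits(samples)
print(f, i, 1 + i + f)  # fractional, integer, and total bits (assumed signed format)
```

A real tool would analyze value ranges and error propagation across the whole datapath rather than sampling one variable. The search above only shows the trade-off: each extra fractional bit halves the worst-case rounding error, and it also widens the hardware for that variable.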