Code generation for embedded processors creates opportunities for several performance optimizations not applicable to traditional compilers. We present techniques for improving d...
Preeti Ranjan Panda, Nikil D. Dutt, Alexandru Nico...
Synchronization primitives for large shared-memory multiprocessors need to minimize latency and contention. Software queue-based locks address these goals, but suffer if a process...
Robert W. Wisniewski, Leonidas I. Kontothanassis, ...
We report efficient implementation techniques for FFT-based dense multivariate polynomial arithmetic over finite fields, targeting multi-cores. We have extended a preliminary study...
In this paper, we present a novel method for reducing the computational complexity of a Support Vector Machine (SVM) classifier without significant loss of accuracy. We a...
In this paper, we present a general and efficient algorithm for the automatic selection of new application-specific instructions under hardware resource constraints. The instructi...
Carlo Galuzzi, Elena Moscu Panainte, Yana Yankova,...