Targeted optimization of program segments can provide an additional program speedup over the highest default optimization level, such as -O3 in GCC. The key challenge is how to au...
Haiping Wu, Eunjung Park, Mihailo Kaplarevic, Ying...
Previous studies have shown that array regrouping and structure splitting significantly improve data locality. The most effective technique relies on profiling every access to eve...
Similarity search, namely, finding approximate nearest neighborhoods, is the core of many large scale machine learning or vision applications. Recently, many research results dem...
Dynamic software bug detection tools are commonly used because they leverage run-time information. However, they suffer from a fundamental limitation, the Path Coverage Problem: t...
The current trend in HPC hardware is towards clusters of shared-memory (SMP) compute nodes. For applications developers the major question is how best to program these SMP cluster...