Testing the performance scalability of parallel programs can be a time-consuming task, involving many performance runs for different computer configurations, processor numbers, and p...
Allen D. Malony, Vassilis Mertsiotakis, Andreas Qu...
Many modern embedded processors (esp. DSPs) support partitioned memory banks (also called X-Y memory or dual bank memory) along with parallel load/store instructions to achieve co...
Xiaotong Zhuang, Santosh Pande, John S. Greenland ...
With the emerging many-core paradigm, parallel programming must extend beyond its traditional realm of scientific applications. Converting existing sequential applications as w...
Jiangtian Li, Xiaosong Ma, Karan Singh, Martin Sch...
We previously introduced the massively parallel global cellular automata (GCA) model. Parallel algorithms derived from applications can be mapped straightforwardly onto this model. In thi...
This paper presents a novel technique to perform global optimization of communication and preprocessing calls in the presence of array accesses with arbitrary subscripts. Our sche...