This paper presents the Xetal-Pro SIMD processor, which is based on Xetal-II, one of the most computationally efficient processors (in terms of GOPS/W) available today. Xetal-Pro supports ultra-wide VDD scaling, from the nominal supply down to the sub-threshold region. Aggressive VDD scaling causes severe throughput degradation, but the massive parallelism inherent in the Xetal family can compensate for it. Xetal-II, the predecessor of Xetal-Pro, includes a large on-chip frame memory (FM) that cannot operate reliably at ultra-low voltage. We therefore investigate both different FM realizations and alternative memory organizations. We propose a hybrid memory architecture that reduces non-local memory traffic and enables further VDD scaling. Compared to Xetal-II operating at nominal voltage, we could gain more than 10
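The trade-off described above can be illustrated with a first-order model. Lowering VDD cuts energy per operation but slows each lane, and adding lanes can offset the slowdown. The sketch below is a minimal illustration under stated assumptions, not the authors' model. It assumes dynamic energy per operation of C_eff·VDD² and one operation per lane per cycle. It ignores leakage, which becomes significant near and below threshold. All parameter values (`C_EFF`, the supply/clock pairs, and the 320-lane array) are hypothetical.

```python
import math

# First-order model only (hypothetical, for illustration): dynamic energy per
# operation is C_eff * VDD^2. Leakage energy, which grows in importance near and
# below threshold, is deliberately ignored.

def energy_per_op(c_eff: float, vdd: float) -> float:
    """Dynamic switching energy per operation in joules (C_eff in farads, VDD in volts)."""
    return c_eff * vdd ** 2

def throughput(lanes: int, f_clk: float) -> float:
    """Operations per second of a SIMD array issuing one op per lane per cycle."""
    return lanes * f_clk

def lanes_for_target(target_ops: float, f_clk: float) -> int:
    """Smallest number of lanes that reaches target_ops at clock frequency f_clk."""
    return math.ceil(target_ops / f_clk)

if __name__ == "__main__":
    C_EFF = 1e-12            # hypothetical effective switched capacitance per op (F)
    NOMINAL = (1.2, 100e6)   # hypothetical (VDD in V, f_clk in Hz) at nominal supply
    LOW = (0.4, 5e6)         # hypothetical (VDD in V, f_clk in Hz) near threshold

    gain = energy_per_op(C_EFF, NOMINAL[0]) / energy_per_op(C_EFF, LOW[0])
    print(f"energy/op reduction: {gain:.1f}x")

    target = throughput(320, NOMINAL[1])  # hypothetical 320-lane array at nominal VDD
    print(f"lanes needed at low VDD for the same throughput: "
          f"{lanes_for_target(target, LOW[1])}")
```

With these made-up numbers, the model reports a 9x drop in dynamic energy per operation. Matching the nominal throughput at the low supply, however, requires 20x as many lanes. The real balance also depends on leakage and on the actual frequency-versus-VDD curve, and this sketch models neither.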
Yifan He, Yu Pu, Richard P. Kleihorst, Zhenyu Ye,