Heterogeneous coarse-grained processing elements: A template architecture for embedded processing acceleration