—Sensor processing is a common task in many embedded system domains; in control systems, for example, sensor feedback is used for actuator control. In this paper we survey several embedded system domains and extract kernels of computation that are common across applications within a given domain or across domains. We show that adding architectural support for executing these common kernels can improve overall system performance. We present a lightweight, simplified prototype of a Sensor Processing Unit (SPU) that offloads these computations from the main Arithmetic Logic Unit (ALU) of an embedded processor and accesses sensor data with low latency. Our SPU prototype achieves an average speedup factor of 2.48 over executing these kernels on an embedded PowerPC processor. A large portion of this speedup is due to our low-latency method for accessing sensor data. Isolating the speedup to computation alone still shows