Complex application-specific media instructions and kernels are emulated with simple-to-implement extended subword instructions. We show that, with register file entries extended to accommodate intermediate results and a few simple instructions implemented, packing/unpacking, saturation, and frequently used complex instructions can be practically eliminated. Most emulations show a potential performance improvement, which makes the proposed scheme suitable for embedded processors with a limited hardware budget.
Ben H. H. Juurlink, Asadollah Shahbahrami, Stamati