Due to the continuously decreasing cost of FPGAs, they have become a valid implementation platform for SOCs. Typically, a soft core processor implementation is used to execute the software parts of the SOC. As each system is individually designed for a particular application, the idea is natural to support compute intensive parts of the code through customized hardware acceleration. Two different architectural variants have been proposed for this purpose in SOCs: either as an instruction set extension with specialized pipeline implementation or as a peripheral component that is programmed through memory mapping. In this contribution we analyze the efficiency (speedup related to LUTs) of those two variants.