We describe results of a case study whose intent was to determine whether new techniques for hardware/software partitioning of an application’s binary are competitive with partitioning at the C source code level. While such competitiveness has been shown previously for standard benchmark suites involving smaller or unoptimized applications, the case study instead focuses on a complete 16,000-line highlyoptimized commercial-grade application, namely an H.264 video decoder. The several month study revealed that binary partitioning was indeed competitive, achieving nearly identical 2.5x speedups as source level partitioning, compared to a standard microprocessor. Furthermore, the study revealed that several simple C-level coding modifications, including pass by value-return, function specialization, algorithmic specialization, hardware-targeted reimplementation, global array elimination, hoisting and sinking of error code, and conversion to explicit control flow, could lead to improved...