Application-specific instruction set processor (ASIP) has become an important design choice for embedded systems. It can achieve both high flexibility offered by the base processor core and high performance and energy efficiency offered by the dedicated hardware extensions. Although a lot of efforts have been devoted to computation acceleration, e.g., automatic custom instruction identification and synthesis, the limited on-chip data storage elements, including the register file and data cache, have become a potential performance bottleneck. In this paper, we propose a hardware/software cooperative approach and a linear scan register allocation algorithm to utilize the existing custom registers in ASIPs for eliminating register spills. The data traffic between the processor and memory can be reduced through efficient on-chip communications between the base processor core and custom hardware extensions. Our experimental results demonstrate that a promising performance gain can be achie...