Hardware translation of bytecodes improves the performance of a Java Virtual Machine (JVM) with only a small overhead in hardware resources and complexity. Instruction folding further improves JVM performance by reducing the redundancy inherent in its stack-based operations. However, the variable instruction length of Java bytecode makes the folding logic complex. In this paper, we propose a folding scheme with reduced hardware complexity and evaluate its performance. On eleven benchmark cases, the proposed scheme folded 7.1% to 36.8% of the bytecodes, which corresponds to 74.0% to 99.7% of the folding performance of PicoJava-II.
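
To make the idea of instruction folding concrete, the following is a minimal software sketch of a single folding pattern: two local-variable loads, an arithmetic operation, and a store collapsed into one register-style operation. It is illustrative only and is not the scheme proposed in this paper. The function name `fold`, the tuple encoding of bytecodes, the restriction to one four-instruction pattern, and the way eliminated bytecodes are counted are all assumptions made for this example.

```python
# Illustrative only: a toy folder for one pattern, not the paper's scheme.
# The mnemonics are real JVM bytecodes; the tuple encoding is hypothetical.
LOADS = {"iload"}
ALU_OPS = {"iadd", "isub", "imul"}
STORES = {"istore"}


def fold(bytecodes):
    """Fold each `load, load, op, store` run into one three-operand operation.

    `bytecodes` is a list of (mnemonic, operand) tuples, where operand is
    None for instructions that take none. Returns (folded_list, eliminated),
    where `eliminated` counts the original bytecodes removed by folding.
    """
    out = []
    eliminated = 0
    i = 0
    while i < len(bytecodes):
        window = bytecodes[i:i + 4]
        if (len(window) == 4
                and window[0][0] in LOADS
                and window[1][0] in LOADS
                and window[2][0] in ALU_OPS
                and window[3][0] in STORES):
            # Folded form: (op, dst, src1, src2). The operands are read from
            # and written to local variables directly, with no stack traffic.
            out.append((window[2][0], window[3][1], window[0][1], window[1][1]))
            eliminated += 3  # four bytecodes became one operation
            i += 4
        else:
            # No match at this position: pass the bytecode through unchanged.
            out.append(bytecodes[i])
            i += 1
    return out, eliminated


# Bytecode for `local3 = local1 + local2; return local3`.
program = [
    ("iload", 1), ("iload", 2), ("iadd", None), ("istore", 3),
    ("iload", 3), ("ireturn", None),
]
folded_program, eliminated = fold(program)
```

Here the first four bytecodes collapse into the single operation `("iadd", 3, 1, 2)`, while the trailing `iload` and `ireturn` pass through unfolded. The sketch works on already-decoded instructions, so it sidesteps the difficulty noted above: a hardware folder must first locate instruction boundaries in a variable-length byte stream before it can match any pattern.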