In this paper, we present an architecture exploration methodology for low-end embedded systems in which cost reduction is a primary design concern. Exploring the architecture of such systems requires searching a wide design space spanned by detailed architecture parameters, using cycle-accurate performance estimation. For fast exploration, the proposed methodology combines an efficient evolutionary algorithm, called QEA, with trace-driven simulation to evaluate architecture candidates quickly. As a case study, we applied the proposed methodology to a NAND flash-based Multimedia Card, considering the following design parameters: buffer size, flash memory configuration, clock, communication architecture, and memory allocation. The experimental results validate the proposed methodology by showing the optimal architecture configurations under varying performance constraints and design parameters.
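
To make the exploration loop concrete, the following is a minimal sketch of cost-driven design space exploration under a performance constraint. It is not QEA itself: it uses a plain mutation-based evolutionary search as a stand-in. The parameter names and values in `DESIGN_SPACE`, the `cost` model, and the `throughput` function (a toy replacement for trace-driven simulation) are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical design space; names and values are illustrative assumptions,
# not the parameters or ranges used in the paper.
DESIGN_SPACE = {
    "buffer_kb": [1, 2, 4, 8, 16],
    "flash_channels": [1, 2, 4],
    "clock_mhz": [20, 40, 60, 80],
}


def cost(cfg):
    """Toy cost model (assumption): bigger and faster components cost more."""
    return 2 * cfg["buffer_kb"] + 5 * cfg["flash_channels"] + 0.1 * cfg["clock_mhz"]


def throughput(cfg):
    """Toy stand-in for trace-driven simulation (assumption)."""
    raw = min(0.5 * cfg["clock_mhz"], 10 * cfg["flash_channels"])
    return raw * (1 + 0.05 * cfg["buffer_kb"])


def fitness(cfg, min_throughput):
    """Cost plus a large penalty for missing the performance constraint."""
    shortfall = max(0.0, min_throughput - throughput(cfg))
    return cost(cfg) + 1000 * shortfall


def mutate(cfg, rng):
    """Return a copy of cfg with one randomly chosen parameter re-sampled."""
    child = dict(cfg)
    key = rng.choice(list(DESIGN_SPACE))
    child[key] = rng.choice(DESIGN_SPACE[key])
    return child


def explore(min_throughput, generations=200, offspring=8, seed=0):
    """Simple mutation-based evolutionary search (not QEA)."""
    rng = random.Random(seed)
    # Start from the largest configuration in every dimension.
    best = {name: max(values) for name, values in DESIGN_SPACE.items()}
    for _ in range(generations):
        for _ in range(offspring):
            child = mutate(best, rng)
            # Keep the child if it is at least as good as the current best.
            if fitness(child, min_throughput) <= fitness(best, min_throughput):
                best = child
    return best


if __name__ == "__main__":
    best = explore(min_throughput=30)
    print(best, cost(best), throughput(best))
```

In this sketch the performance constraint enters the search as a penalty term, so a candidate that misses the constraint is never preferred over one that meets it for the values used here. A real flow would replace `throughput` with a cycle-accurate, trace-driven estimate.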