We propose a methodology for the non-intrusive design of concurrently self-testable FSMs. Unlike duplication schemes, in which a replica of the original FSM acts as a predictor-comparator that detects potential faults immediately, the proposed method selects and optimizes only a minimal portion of the original FSM that is adequate to detect all possible faults, at the cost of possible fault detection latency. In contrast to concurrent error detection approaches, which presume the ability to re-synthesize the FSM and exploit parity-based state encoding, the proposed method does not interfere with the state encoding or the implementation of the FSM. Experimental results on FSMs of various sizes and densities indicate that the proposed method detects more than 92% of all faults with an average latency of 4 clock cycles and more than 99% of all faults with an average latency of 35 clock cycles. Furthermore, a hardware overhead cost reduction of up to 30% is achieve...