Different devices, such as mobile phones, soft phones, and desktop phones, have varying processing power, bandwidth, and media capabilities. A heterogeneous P2P voice system built around a single fixed set of capabilities will not suit devices whose capabilities differ. In this paper, we present an architecture for P2P voice systems that can dynamically change their P2P overlay mechanisms to better suit the device, user, and feature requirements of the system. As a first step towards realizing this architecture, we propose a P2P-SIP based architecture that separates the P2P mechanisms from SIP. Our architecture allows dynamic changes to the P2P structure, limits bloating of the SIP protocol, and lays a foundation for a flexible hierarchical system.