How an internal observer, given no a priori knowledge or interpretation of what its sensors receive, can learn to imitate seems to be a formidable issue from the viewpoint of a constructivist approach, both for establishing design principles for intelligent robots and for understanding human intelligence. This paper discusses two issues concerning imitation by an internal observer. The first is how a robot can construct a representation of its own body from vision and proprioception. The second is how a mapping of vocalization can be constructed between agents with different articulation systems. Preliminary results with real robots are given.