Abstract. Infants acquire spoken language by hearing and imitating utterances, mainly from their parents [1,2,3], yet they never imitate their parents’ voices as they are. What in those voices do infants imitate? Because of their poor phonological awareness, infants find it difficult to decode an input utterance into a string of small linguistic units such as phonemes [3,4,5,6], and therefore also difficult to convert those individual units into sounds with their mouths. What, then, do infants acoustically imitate? Developmental psychology claims that they extract the holistic sound pattern of an input word, called the word Gestalt [3,4,5], and reproduce it with their mouths. We address the question “What is the acoustic definition of word Gestalt?” [7] Any such definition must be speaker-invariant, because infants extract the same word Gestalt for a particular input word irrespective of who speaks that word to them. Here, we aim to answer the above question by regarding speech as timbre-based melody th...