Speech communication consists of three steps: production, transmission, and hearing. Every step inevitably introduces acoustic distortions. These arise from differences in speaker gender and age, from microphone and room characteristics, and so on. In spite of these variations, listeners extract linguistic information from speech as easily as if the signal had not been affected by them at all. One may hypothesize that listeners adapt their internal acoustic models whenever extralinguistic factors change. Another possibility is that the linguistic information in speech can be represented separately from the extralinguistic factors. In this study, inspired by findings from studies of humans and animals, a novel solution to the problem of these intrinsic variations is proposed. Speech structures that are invariant to these variations are derived as transform-invariant features, and their linguistic validity is discussed. Their high robustness is demonstrated by applying the speech structures to automatic speech recognition and ...