Multi-stream spectro-temporal and cepstral features based on data-driven hierarchical phoneme clusters