This paper investigates the combination of different neural network topologies for probabilistic feature extraction. On the one hand, a five-layer neural network used for bottleneck feature extraction makes it possible to obtain features of arbitrary size without a dimensionality-reducing transform, independently of the training targets. On the other hand, a hierarchical processing technique is effective and robust across several conditions. Even though hierarchical and bottleneck processing perform equally well, combining both topologies improves the system by 5% relative. Furthermore, the MFCC baseline system is improved by up to 20% relative. This behaviour was confirmed on two different tasks. In addition, we analyse the influence of multi-resolution RASTA filtering and long-term spectral features as input to the neural network feature extraction.
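To illustrate why the bottleneck feature size is independent of the training targets, the following minimal sketch reads features from the middle layer of a five-layer network (input, hidden, bottleneck, hidden, output). All layer sizes, the sigmoid nonlinearity, the random untrained weights, and the names `init_mlp` and `bottleneck_features` are assumptions made for illustration, not the configuration used in this work.

```python
import numpy as np


def init_mlp(layer_sizes, seed=0):
    """Create random (untrained) weights and zero biases for each layer pair."""
    rng = np.random.default_rng(seed)
    return [
        (rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bottleneck_features(params, frames, bottleneck_index=1):
    """Forward frames up to the bottleneck and return its linear activations.

    With five layers there are four weight matrices; index 1 maps the first
    hidden layer to the bottleneck. Taking the activations before the
    nonlinearity is an assumption of this sketch.
    """
    hidden = frames
    for i, (weights, bias) in enumerate(params):
        activation = hidden @ weights + bias
        if i == bottleneck_index:
            return activation
        hidden = sigmoid(activation)
    raise ValueError("bottleneck_index exceeds the number of layers")


# Hypothetical sizes: 351 inputs, 64 hidden units, 30-dim bottleneck, 45 targets.
params = init_mlp([351, 64, 30, 64, 45])
frames = np.zeros((5, 351))  # five dummy input frames
features = bottleneck_features(params, frames)
```

In this sketch the feature dimension is set by the bottleneck width alone, so changing the number of output targets leaves the extracted feature size unchanged.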