Predictions computed by a classification tree are typically constant on the axis-parallel hyperrectangles corresponding to its leaves and exhibit abrupt jumps at their boundaries. The density function of the underlying class distribution, however, may be continuous, and its gradient vector need not be parallel to any coordinate axis. In such cases a better approximation can be expected if the prediction function of the original tree is replaced by a more complex continuous one. The approximation is constructed from the same training data on which the original tree was grown, and the structure of the tree is preserved. The present paper adopts the model of trees with soft splits suggested by Quinlan and implemented in C4.5; the training algorithm, however, is substantially different. The method uses simulated annealing and is therefore computationally expensive, but this allows the soft thresholds to be adjusted in groups of nodes simultaneously, in a way that better captures interactions between sev...
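As an illustrative sketch only (not the training algorithm described above), the idea of a soft split can be shown on a single-node tree: the hard decision x <= threshold is replaced by a sigmoid branching probability, so the prediction becomes a continuous blend of the two leaf values rather than a step function. The function names and the sharpness parameter beta below are hypothetical choices for illustration.

```python
import math

def soft_split(x, threshold, beta):
    """Probability of taking the right branch.

    A sigmoid centered at `threshold` replaces the hard indicator
    x > threshold; larger `beta` makes the transition sharper, and
    in the limit beta -> infinity the hard split is recovered.
    """
    return 1.0 / (1.0 + math.exp(-beta * (x - threshold)))

def soft_stump_predict(x, threshold, beta, left_value, right_value):
    """Continuous prediction of a one-split tree (a stump).

    The two constant leaf predictions are mixed according to the
    soft branching probability, so the prediction varies smoothly
    across the split boundary instead of jumping.
    """
    p_right = soft_split(x, threshold, beta)
    return (1.0 - p_right) * left_value + p_right * right_value
```

For a deeper tree the same construction applies recursively: each leaf's contribution is weighted by the product of the soft branching probabilities along its path, which yields a continuous prediction function on the whole input space.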