Learning to classify short and sparse text & web with hidden topics from large-scale data collections