Blogs are a new form of internet phenomenon and a vast everincreasing information resource. Mining blog files for information is a very new research direction in data mining. We p...
Phrase has been considered as a more informative feature term for improving the effectiveness of document clustering. In this paper, we propose a phrase-based document similarity t...
Clustering web search engine results for ambiguous keyword searches poses unique challenges. First, we show that one cannot readily import the frequency based feature ranking to c...
Today, a huge amount of text is being generated for social purposes on social networking services on the Web. Unlike traditional documents, such text is usually extremely short an...
Category ranking provides a way to classify plain text documents into a pre-determined set of categories. This work proposes to have a look at typical document collections and ana...