In this paper we study the problem of finding the most topical named entities among all entities in a document, which we refer to as focused named entity recognition. We show that th...
With the rapid growth of PDF documents in digital libraries, recognizing the document structure and detecting specific document components are useful for document storage, classifica...
Large-scale text categorization is an important research topic for Web data mining. One of the challenges in large-scale text categorization is how to reduce the amount of human e...
In this paper we investigate a novel and important problem in multi-document summarization, i.e., how to extract an easy-to-understand English summary for non-native readers. Exist...
Nonnegative Matrix Factorization (NMF) has been proven to be effective in text mining. However, since NMF is a well-known unsupervised component analysis technique, the existing ...