The World Wide Web currently has a huge amount of data, with practically no classification information, and this makes it extremely difficult to handle effectively. It has been re...
Documents in many corpora, such as digital libraries and webpages, contain both content and link information. In a traditional topic model which plays an important role in the uns...
—Probabilistic topic models were originally developed and utilised for document modeling and topic extraction in Information Retrieval. In this paper we describe a new approach f...
Abstract. The PROBADO project is a research effort to develop Digital Library support for non-textual documents. The main goal is to contribute to all parts of the Digital Library...
We present CiteSeer: an autonomous citation indexing system which indexes academic literature in electronic format (e.g. Postscript files on the Web). CiteSeer understands how to ...