In this paper, we propose a novel document clustering method based on the non-negative factorization of the termdocument matrix of the given document corpus. In the latent semanti...
Among the retrieval models that have been proposed in the last years, the ESA model of Gabrilovich and Markovitch received much attention. The authors report on a significant imp...
We describe ongoing research on segmenting and labeling HTML medical journal articles. In contrast to existing approaches in which HTML tags usually serve as strong indicators, we...
This paper describes a codesign environment based on the WWW (World Wide Web) and its implementation. Tool invocations and their respective results are linked using hypertext docu...
In this paper, we present an analysis based on linguistic and typographic features that allows for the identification of titles in web documents. We focus in particular on procedu...