Abstract. In this paper we present a system, DoLSuD, for the automatic discovery of relevant substructures in a document layout. DoLSuD, Document Layout Substructure Discovery, ext...
Documents, such as those seen on Wikipedia and Folksonomy, have tended to be assigned with multiple topics as a meta-data. Therefore, it is more and more important to analyze a re...
Relevance feedback has been considered as a means of incorporating learning into information retrieval systems for quite sometime now. This paper discusses the research results of...
We propose a hybrid clustering strategy by integrating heterogeneous information sources as graphs. The hybrid clustering method is extended on the basis of modularity based Louva...
Xinhai Liu, Shi Yu, Yves Moreau, Frizo A. L. Janss...
Information extraction from HTML pages has been conventionally treated as plain text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...