: This work focuses on clustering a site into groups of documents that are predictive of future user accesses. Two approaches have been developed and tested. The first approach use...
We present an empirical evaluation and comparison of two content extraction methods in HTML: absolute XPath expressions and relative XPath expressions. We argue that the relative ...
Marek Kowalkiewicz, Maria E. Orlowska, Tomasz Kacz...
Successfully structuring information in databases, OLAP cubes, and XML is a crucial element in managing data nowadays. However this process brought new challenges to usability. It...
The paper argues for the use of general and intuitive knowledge representation languages (and simpler notational variants, e.g. subsets of natural languages) for indexing the cont...
One of the main limitations when accessing the web is the lack of explicit structure, whose presence may help in understanding data semantics. Schema for web data can be constructe...