A wealth of information is available only in web pages, patents, publications etc. Extracting information from such sources is challenging, both due to the typically complex langu...
In this paper, we describe a system that can extract record structures from web pages with no direct human supervision. Records are commonly occurring HTML-embedded data tuples th...
We present DL8, an exact algorithm for finding a decision tree that optimizes a ranking function under size, depth, accuracy and leaf constraints. Because the discovery of optimal...
Web users display their preferences implicitly by navigating through a sequence of pages or by providing numeric ratings to some items. Web usage mining techniques are used to ext...
The number of web pages available on Internet increases day after day, and consequently finding relevant information becomes more and more a hard task. However, when we consider ...
Aliaksandr Birukou, Enrico Blanzieri, Paolo Giorgi...