A substantial subset of the web data follows some kind of underlying structure. Nevertheless, HTML does not contain any schema or semantic information about the data it represents...
Nowadays, data mining is based on low-level specifications of the employed techniques, typically bound to a specific analysis platform. Therefore, data mining lacks a modelling arc...
The Iceberg SemiJoin (ISJ) of two datasets R and S returns the tuples in R which join with at least k tuples of S. The ISJ operator is essential in many practical applications incl...
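To make the ISJ definition above concrete, here is a minimal in-memory sketch. It assumes an equality join on a key extracted from each tuple; the abstract does not state the join predicate, so that choice, along with the names `iceberg_semijoin`, `key_r`, and `key_s`, is hypothetical and only for illustration. It is a naive counting baseline, not the method proposed in the paper.

```python
from collections import Counter


def iceberg_semijoin(r, s, k, key_r=lambda t: t[0], key_s=lambda t: t[0]):
    """Return the tuples of r that join with at least k tuples of s.

    Assumption: "join" means equality of the keys extracted by key_r and key_s.
    """
    # Count how many tuples of S carry each join key.
    matches_per_key = Counter(key_s(t) for t in s)
    # Keep a tuple of R only if its key has at least k matches in S.
    # Counter returns 0 for keys that never occur in S.
    return [t for t in r if matches_per_key[key_r(t)] >= k]


# Illustrative data, not taken from the paper.
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(1, "x"), (1, "y"), (2, "z")]
result = iceberg_semijoin(R, S, k=2)  # [(1, "a")]
```

With `k=1` the operator reduces to an ordinary semijoin, which is why only the threshold distinguishes the two.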
Mohammed Kasim Imthiyaz, Dong Xiaoan, Panos Kalnis
This paper investigates methods to automatically infer structural information from large XML documents. Using XML as a reference format, we approach the schema generation problem ...
Many websites have large collections of pages generated dynamically from an underlying structured source like a database. The data of a category are typically encoded into similar...