Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...
For extracting the characteristics a specific geographic entity, and notably a place, we propose to use dynamic Extreme Tagging Systems in combination with the classic approach of...
This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...
Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning deterministic regular expressions from sets of ...
Geert Jan Bex, Wouter Gelade, Frank Neven, Stijn V...
This paper is concerned with the problem of Imbalanced Classification (IC) in web mining, which often arises on the web due to the "Matthew Effect". As web IC applicatio...