When automatically extracting information from the world wide web, most established methods focus on spotting single HTMLdocuments. However, the problem of spotting complete web s...
Martin Ester, Hans-Peter Kriegel, Matthias Schuber...
This study explores language's fragmenting effect on usergenerated content by examining the diversity of knowledge representations across 25 different Wikipedia language edit...
Given only the URL of a web page, can we identify its topic? This is the question that we examine in this paper. Usually, web pages are classified using their content [7], but a U...
We describe an infrastructure for the collection and management of large amounts of text, and discuss the possibility of information extraction and visualisation from text corpora...
: ? Towards Combining Web Classification and Web Information Extraction: a Case Study Ping Luo, Fen Lin, Yuhong Xiong, Yong Zhao, Zhongzhi Shi HP Laboratories HPL-2009-86 Classific...