Text is ubiquitous and, not surprisingly, many important applications rely on textual data for a variety of tasks. As a notable example, information extraction applications derive...
Panagiotis G. Ipeirotis, Eugene Agichtein, Pranay ...
Many malicious activities on the Web today make use of compromised Web servers, because these servers often have high pageranks and provide free resources. Attackers are therefore...
John P. John, Fang Yu, Yinglian Xie, Arvind Krishn...
We consider the problem of template-independent news extraction. The state-of-the-art news extraction method is based on template-level wrapper induction, which has two serious li...
Junfeng Wang, Xiaofei He, Can Wang, Jian Pei, Jiaj...
An important precondition for the success of the Semantic Web is founded on the principle that the content of web pages will be semantically annotated. In this paper, we propose a ...
The proliferation of electronic content has notably lead to the apparition of large corpora of interrelated structured documents (such as HTML and XML Web pages) and semantic annot...