The integration of heterogeneous data sources is a crucial step for the upcoming semantic web – if existing information is not integrated, where will the data come from that the s...
We present an algorithm, witch, that learns to detect spam hosts or pages on the Web. Unlike most other approaches, it simultaneously exploits the structure of the Web graph as we...
Jacob Abernethy, Olivier Chapelle, Carlos Castillo
This paper presents a grammar-induction based approach to partitioning a Web page into several small pages while each small page fits not only spatially but also logically for mob...
Discovering the complex relationships between entities is one way of benefitting from the Semantic Web. This paper discusses new approaches to implementing ρ-operators into RDF quer...
High-quality structured data from structured Web sources is invaluable for many applications. Hidden Web databases are not directly crawlable by Web search engines and are on...