This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is not a trivial but very important problem. Selecting proper...
We present GoGetIt!, a tool for generating structure-driven crawlers that requires a minimum effort from the users. The tool takes as input a sample page and an entry point to a W...
Altigran Soares da Silva, Edleno Silva de Moura, J...
Currently most link-related applications treat all links in the same web page to be identical. One link-related application usually requires one certain property of hyperlinks but...
Mingliang Zhu, Weiming Hu, Ou Wu, Xi Li, Xiaoqin Z...
Efficient management of RDF data is an important factor in realizing the Semantic Web vision. The existing approaches store RDF data based on triples instead of a relation model. I...
SALSA is a link-based ranking algorithm that takes the result set of a query as input, extends the set to include additional neighboring documents in the web graph, and performs a...