Sciweavers

Search query: Crawling the web for structured documents
PDIS
1996
IEEE
Querying the World Wide Web
The World Wide Web is a large, heterogeneous, distributed collection of documents connected by hypertext links. The most common technology currently used for searching the Web depend...
Alberto O. Mendelzon, George A. Mihaila, Tova Milo
WWW
2009
ACM
Enhancing diversity, coverage and balance for summarization through structure learning
Document summarization plays an increasingly important role with the exponential growth of documents on the Web. Many supervised and unsupervised approaches have been proposed to ...
Liangda Li, Ke Zhou, Gui-Rong Xue, Hongyuan Zha, Y...
ERCIMDL
2005
Springer
mod_oai: An Apache Module for Metadata Harvesting
We describe mod_oai, an Apache 2.0 module that implements the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The OAI-PMH is the de facto standard for metadata...
Michael L. Nelson, Herbert Van de Sompel, Xiaoming...
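For readers unfamiliar with OAI-PMH, a harvester talks to a repository by sending HTTP requests whose `verb` argument names one of the protocol's six operations. The sketch below builds such a request URL. It is not part of mod_oai; the helper name `oai_request_url` and the base URL are hypothetical and used only for illustration.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}

def oai_request_url(base_url, verb, **args):
    """Build an OAI-PMH GET request URL (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # resumptionToken is an exclusive argument: only the verb may accompany it.
    if "resumptionToken" in args and len(args) > 1:
        raise ValueError("resumptionToken must not be combined with other arguments")
    # urlencode keeps insertion order, so 'verb' comes first in the query string.
    return f"{base_url}?{urlencode({'verb': verb, **args})}"

# Hypothetical repository base URL, for illustration only.
print(oai_request_url("http://www.example.org/modoai/", "ListRecords",
                      metadataPrefix="oai_dc"))
```

Here `oai_dc` (unqualified Dublin Core) is the metadata format every OAI-PMH repository is required to support, which makes it a safe default for a first harvest.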
NIPS
2001
The Intelligent surfer: Probabilistic Combination of Link and Content Information in PageRank
The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a pa...
Matthew Richardson, Pedro Domingos
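As background to the snippet above, the following is a minimal sketch of standard PageRank computed by power iteration over a link graph. It is the classic, query-independent algorithm, not the probabilistic link-and-content combination proposed in this paper. The damping factor 0.85 and the uniform redistribution of rank from pages without out-links are common conventions, assumed here.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    d: damping factor (probability of following a link rather than jumping).
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                # Each page splits its rank evenly among its out-links.
                new[q] += d * rank[p] / len(outs)
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank

print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```

Because every iteration redistributes exactly the total rank, the scores always sum to 1 and can be read as the stationary probabilities of a random surfer.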
SIGIR
2008
ACM
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
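To illustrate the general idea of signature-based near-duplicate detection, the sketch below anchors short word chains at common stopwords and compares two documents by the Jaccard similarity of their signature sets. This is a simplified illustration only, not the paper's extraction or matching procedure. The stopword set, the chain length, and the function names are hypothetical choices.

```python
# Hypothetical antecedent (stopword) set and chain length, for illustration only.
ANTECEDENTS = {"a", "the", "is", "there", "said"}
CHAIN_LEN = 2

def spot_signatures(text, antecedents=ANTECEDENTS, k=CHAIN_LEN):
    """Return the set of (stopword, next k non-stopwords) tuples in text."""
    words = text.lower().split()
    sigs = set()
    for i, w in enumerate(words):
        if w in antecedents:
            # Chain the next k words that are not themselves antecedents.
            chain = [x for x in words[i + 1:] if x not in antecedents][:k]
            if len(chain) == k:
                sigs.add((w, *chain))
    return sigs

def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two signature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

s1 = spot_signatures("the senator said the bill is dead")
s2 = spot_signatures("breaking news the senator said the bill is dead")
print(jaccard(s1, s2))
```

Because the signatures are anchored at stopwords that occur mostly in running prose, page furniture such as navigation menus tends to contribute few signatures. A pair of pages would then be flagged as near duplicates when their similarity exceeds some chosen threshold.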