Search Sciweavers | Sciweavers

563 search results - page 41 / 113

» Crawling the web for structured documents

PDIS
1996
IEEE

Querying the World Wide Web

Download www.cs.utsa.edu

The World Wide Web is a large, heterogeneous, distributed collection of documents connected by hypertext links. The most common technology currently used for searching the Web depend...

Alberto O. Mendelzon, George A. Mihaila, Tova Milo

WWW
2009
ACM

Enhancing diversity, coverage and balance for summarization through structure learning

Download www2009.org

Document summarization plays an increasingly important role with the exponential growth of documents on the Web. Many supervised and unsupervised approaches have been proposed to ...

Liangda Li, Ke Zhou, Gui-Rong Xue, Hongyuan Zha, Y...

ERCIMDL
2005
Springer

mod_oai: An Apache Module for Metadata Harvesting

Download public.lanl.gov

We describe mod_oai, an Apache 2.0 module that implements the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The OAI-PMH is the de facto standard for metadata...

Michael L. Nelson, Herbert Van de Sompel, Xiaoming...
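For readers new to OAI-PMH, the protocol is plain HTTP: a harvester sends a request whose `verb` parameter names the operation (for example `ListRecords`) and pages through long result lists with `resumptionToken`. The sketch below shows the harvester side, not mod_oai itself. The base URL `http://example.org/oai`, the sample identifiers, and the function names `list_records_url` and `parse_list_records` are hypothetical; only the verb, argument, and element names come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 response namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (base_url is a hypothetical repository endpoint)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        # oai_dc (unqualified Dublin Core) is the format every repository must support.
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken element means this is the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical response page used for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-01-01T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier><datestamp>2005-01-01</datestamp></header></record>
    <record><header><identifier>oai:example.org:2</identifier><datestamp>2005-01-01</datestamp></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvest loop would fetch `list_records_url(base)`, parse it, and keep requesting `list_records_url(base, resumption_token=token)` until the token comes back as `None`.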

NIPS
2001

The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank

Download research.microsoft.com

The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a pa...

Matthew Richardson, Pedro Domingos
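As background for the abstract, classic PageRank can be computed by power iteration: a random surfer follows an out-link with probability `d` (0.85 is the commonly cited damping factor) or jumps to a uniformly random page otherwise. The sketch below is that classic algorithm, with one assumption added for illustration: a hypothetical `weight` function that biases which out-link is followed toward "relevant" targets, loosely echoing the idea of mixing link and content information. It is not the paper's query-dependent formulation.

```python
def pagerank(graph, d=0.85, weight=None, iters=200, tol=1e-12):
    """Power-iteration PageRank.

    graph: dict mapping each node to a list of nodes it links to.
    weight: hypothetical optional function node -> nonnegative score; out-links are
            followed with probability proportional to the target's score.
            With weight=None every out-link is equally likely (classic PageRank).
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    w = weight or (lambda v: 1.0)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(iters):
        # Random-jump share: (1 - d) spread uniformly over all pages.
        new = {v: (1.0 - d) / n for v in nodes}
        dangling = 0.0
        for u in nodes:
            out = graph.get(u, [])
            total = sum(w(v) for v in out)
            if total == 0:
                # No usable out-links: redistribute this page's mass uniformly.
                dangling += rank[u]
                continue
            for v in out:
                new[v] += d * rank[u] * w(v) / total
        for v in nodes:
            new[v] += d * dangling / n
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank
```

For example, in the graph `{"a": ["c"], "b": ["c"], "c": ["a"]}` page `c` ends up with the highest score because two pages link to it, and the scores still sum to 1.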

SIGIR
2008
ACM

SpotSigs: robust and efficient near duplicate detection in large web collections

Download ilpubs.stanford.edu

Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...

Martin Theobald, Jonathan Siddharth, Andreas Paepc...
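The snippet is cut off before it describes the signatures, so the following is only a simplified illustration of signature-based near-duplicate detection, under stated assumptions: each signature is anchored at a common function word (the `ANTECEDENTS` set and `CHAIN_LEN` below are hypothetical choices) followed by the next few non-anchor words, and two documents are compared by the Jaccard similarity of their signature sets. The efficiency techniques a real system needs for large collections are omitted.

```python
import re
from itertools import islice

# Hypothetical anchor words and chain length; not taken from the paper.
ANTECEDENTS = {"a", "an", "the", "is", "was", "there"}
CHAIN_LEN = 2


def spot_signatures(text, antecedents=ANTECEDENTS, chain_len=CHAIN_LEN):
    """Return a set of (anchor, word1, ..., wordK) tuples extracted from text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    sigs = set()
    for i, word in enumerate(words):
        if word not in antecedents:
            continue
        # Take the next chain_len words that are not themselves anchors.
        following = (words[j] for j in range(i + 1, len(words)) if words[j] not in antecedents)
        chain = list(islice(following, chain_len))
        if len(chain) == chain_len:
            sigs.add((word, *chain))
    return sigs


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(doc1, doc2, threshold=0.8):
    """Hypothetical threshold test on signature-set similarity."""
    return jaccard(spot_signatures(doc1), spot_signatures(doc2)) >= threshold
```

Because navigation text such as "Home | About | Contact" rarely contains the anchor words, prepending it to an article leaves the signature set unchanged, which is the kind of robustness to page boilerplate that matters for news-site archives.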
