Sciweavers

2423 search results for "Hypertext Information Retrieval for the Web"

EDBT 2006 (ACM)
Indexing Shared Content in Information Retrieval Systems
Abstract. Modern document collections often contain groups of documents with overlapping or shared content. However, most information retrieval systems process each document separa...
Andrei Z. Broder, Nadav Eiron, Marcus Fontoura, Mi...

CIKM 2003 (Springer)
Extracting unstructured data from template generated web documents
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
Ling Ma, Nazli Goharian, Abdur Chowdhury, Misun Ch...

WWW 2008 (ACM)
Recrawl scheduling based on information longevity
It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the t...
Christopher Olston, Sandeep Pandey

LREC 2008
Experiments to Investigate the Connection between Case Distribution and Topical Relevance of Search Terms in an Information Retr...
We have performed a set of experiments made to investigate the utility of morphological analysis to improve retrieval of documents written in languages with relatively large morph...
Jussi Karlgren, Hercules Dalianis, Bart Jongejan

SAINT 2007 (IEEE)
A Generic API for Retrieving Human-Oriented Information from Social Network Services
A unique type of Web service, called a Social Network Service (SNS), first appeared in 2003. Some researchers have suggested a method to extract meaningful information from SNSs. Such ...
Teruaki Yokoyama, Shigeru Kashihara, Takeshi Okuda...