Sciweavers

Search results for: Ontology-Focused Crawling of Web Documents
MAICS
2004
Creation of a Style Independent Intelligent Autonomous Citation Indexer to Support Academic Research
This paper describes the current state of RUgle, a system for classifying and indexing papers made available on the World Wide Web, in a domain-independent and universal manner. B...
Eric G. Berkowitz, Mohamed Reda Elkhadiri
SIGMOD
2006
ACM
To search or to crawl?: towards a query optimizer for text-centric tasks
Text is ubiquitous and, not surprisingly, many important applications rely on textual data for a variety of tasks. As a notable example, information extraction applications derive...
Panagiotis G. Ipeirotis, Eugene Agichtein, Pranay ...
SIGMOD
2000
ACM
Finding Replicated Web Collections
Many web documents (such as JAVA FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times....
Junghoo Cho, Narayanan Shivakumar, Hector Garcia-M...
SPIRE
1999
Springer
CoBWeb - A Crawler for the Brazilian Web
One of the key components of current Web search engines is the document collector. This paper describes CoBWeb, an automatic document collector, whose architecture is distributed ...
Altigran Soares da Silva, Eveline A. Veloso, Paulo...
NSDI
2010
The Architecture and Implementation of an Extensible Web Crawler
Many Web services operate their own Web crawlers to discover data of interest, despite the fact that large-scale, timely crawling is complex, operationally intensive, and expensive...
Jonathan M. Hsieh, Steven D. Gribble, Henry M. Lev...