Sciweavers

611 search results - page 20 / 123
» Random web crawls
Sort
View
PDP
2008
IEEE
14 years 2 months ago
Bulk-Synchronous On-Line Crawling on Clusters of Computers
This paper describes the design of a crawler devised to perform the periodic retrieval of Web documents for a search engine able to accept on-line updates in a concurrent manner. ...
Mauricio Marín, Carolina Bonacic
CIKM
2011
Springer
12 years 7 months ago
Focusing on novelty: a crawling strategy to build diverse language models
Word prediction performed by language models has an important role in many tasks as e.g. word sense disambiguation, speech recognition, hand-writing recognition, query spelling an...
Luciano Barbosa, Srinivas Bangalore
CIKM
2005
Springer
14 years 1 months ago
Focused crawling for both topical relevance and quality of medical information
Subject-specific search facilities on health sites are usually built using manual inclusion and exclusion rules. These can be expensive to maintain and often provide incomplete c...
Thanh Tin Tang, David Hawking, Nick Craswell, Kath...
WEBI
2007
Springer
14 years 1 months ago
Question Answering over Implicitly Structured Web Content
Implicitly structured content on the Web such as HTML tables and lists can be extremely valuable for web search, question answering, and information retrieval, as the implicit str...
Eugene Agichtein, Chris Burges, Eric Brill
GCC
2005
Springer
14 years 1 months ago
Parallel Web Spiders for Cooperative Information Gathering
Web spider is a widely used approach to obtain information for search engines. As the size of the Web grows, it becomes a natural choice to parallelize the spider’s crawling proc...
Jiewen Luo, Zhongzhi Shi, Maoguang Wang, Wei Wang