Sciweavers

72 search results - page 13 / 15
» Ontology-Focused Crawling of Web Documents
Sort
View
ECCV
2008
Springer
14 years 9 months ago
Learning Visual Shape Lexicon for Document Image Content Recognition
Developing effective content recognition methods for diverse imagery continues to challenge computer vision researchers. We present a new approach for document image content catego...
Guangyu Zhu, Xiaodong Yu, Yi Li, David S. Doermann
WWW
2005
ACM
14 years 8 months ago
The infocious web search engine: improving web searching through linguistic analysis
In this paper we present the Infocious Web search engine [23]. Our goal in creating Infocious is to improve the way people find information on the Web by resolving ambiguities pre...
Alexandros Ntoulas, Gerald Chao, Junghoo Cho
WSDM
2010
ACM
204views Data Mining» more  WSDM 2010»
14 years 2 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
WWW
2010
ACM
14 years 2 months ago
New-web search with microblog annotations
Web search engines discover indexable documents by recursively ‘crawling’ from a seed URL. Their rankings take into account link popularity. While this works well, it introduc...
Tom Rowlands, David Hawking, Ramesh Sankaranarayan...
HPDC
2003
IEEE
14 years 1 months ago
Distributed Pagerank for P2P Systems
This paper defines and describes a fully distributed implementation of Google’s highly effective Pagerank algorithm, for “peer to peer”(P2P) systems. The implementation is ...
Karthikeyan Sankaralingam, Simha Sethumadhavan, Ja...