Sciweavers

Search results for: Postal Address Detection from Web Documents
CIKM 2008, Springer
Achieving both high precision and high recall in near-duplicate detection
To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...
Lian'en Huang, Lei Wang, Xiaoming Li
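
The CIKM 2008 entry names Broder's shingling and Charikar's simhash as fingerprint-based paradigms for near-duplicate detection. The following is a minimal sketch of both ideas, not the method of Huang, Wang, and Li. It assumes 3-word shingles, a 64-bit hash taken from the first 8 bytes of MD5, and a hypothetical near-duplicate threshold of 3 differing bits. All of these choices are for illustration only.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of contiguous k-word shingles of `text` (Broder-style)."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Broder-style resemblance: |S(a) & S(b)| / |S(a) | S(b)| over shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


def simhash(text, bits=64, k=3):
    """Charikar-style simhash over shingles: vote +1/-1 per bit, keep the sign."""
    counts = [0] * bits
    for sh in shingles(text, k):
        # 64-bit hash from the first 8 bytes of MD5 (an arbitrary choice for this sketch).
        h = int.from_bytes(hashlib.md5(sh.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if counts[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(x, y):
    """Number of bit positions in which two fingerprints differ."""
    return bin(x ^ y).count("1")


def near_duplicate(a, b, threshold=3):
    """Hypothetical rule: documents whose fingerprints differ in <= threshold bits."""
    return hamming(simhash(a), simhash(b)) <= threshold
```

Shingling compares full shingle sets, so its cost grows with document length. Simhash compresses each document into a single 64-bit value that can be compared with one XOR and a bit count. That compactness is why fingerprint methods scale. It is also why a single bit threshold has to trade precision against recall.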
NDQA 2003
Panel on Web-Based Question Answering
Early TREC-style Question Answering Systems were characterized by the following features: (a) the answer of the question was known to be included in a given local corpus, (b) the ...
Dragomir R. Radev
AIRWEB 2009, Springer
Looking into the past to better classify web spam
Web spamming techniques aim to achieve undeserved rankings in search results. Research has been widely conducted on identifying such spam and neutralizing its influence. However,...
Na Dai, Brian D. Davison, Xiaoguang Qi
WSDM 2009, ACM
Finding text reuse on the web
With the overwhelming number of reports on similar events originating from different sources on the web, it is often hard, using existing web search paradigms, to find the origi...
Michael Bendersky, W. Bruce Croft
CVPR 2010, IEEE
Harvesting Large-Scale Weakly-Tagged Image Databases from the Web
To leverage large-scale weakly-tagged images for computer vision tasks (such as object detection and scene recognition), a novel cross-modal tag cleansing and junk image filtering...
Jianping Fan