Sciweavers

CIKM 2008, Springer
Achieving both high precision and high recall in near-duplicate detection
To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...
Lian'en Huang, Lei Wang, Xiaoming Li

ICDE 2006, IEEE
Reasoning About Approximate Match Query Results
Join techniques deploying approximate match predicates are fundamental data cleaning operations. A variety of predicates have been utilized to quantify approximate match in such o...
Sudipto Guha, Nick Koudas, Divesh Srivastava, Xiao...

WWW 2004, ACM
Web data integration using approximate string join
Web data integration is an important preprocessing step for web mining. It is highly likely that several records on the web whose textual representations differ may represent the ...
Yingping Huang, Gregory R. Madey

WWW 2008, ACM
Contextual advertising by combining relevance with click feedback
Contextual advertising supports much of the Web's ecosystem today. User experience and revenue (shared by the site publisher and the ad network) depend on the relevance of the...
Deepayan Chakrabarti, Deepak Agarwal, Vanja Josifo...

ICDE 2003, IEEE
Scalable template-based query containment checking for web semantic caches
Semantic caches, originally proposed for client-server database systems, have recently been deployed to accelerate the serving of dynamic web content by transparently caching data...
Khalil Amiri, Sanghyun Park, Renu Tewari, Sriram P...